Running Socket.IO Applications on Kubernetes
In this post, I'm going to go through the challenges faced when running a WebSocket based application on Kubernetes, and how to deal with these challenges.
I'll be considering an AWS based setup. A similar approach would work for other environments as well.
Intro to WebSockets
If you've worked with WebSockets before, you might remember that requests associated with a particular session ID have to connect to the process that originated them. This is required since certain transports, like XHR Polling and JSONP Polling, fire several requests during the lifetime of a WebSocket, as explained in the Socket.IO docs.
Basically, the Socket.IO client and server exchange multiple requests to perform a handshake and establish a connection. With multiple servers running, those requests may arrive at different servers, which will break the handshake protocol.
We're good as long as the Socket.IO application is running on a single server. But when we move the application to production, we need to run multiple servers for a number of reasons, for example: high availability, fault tolerance, and load balancing.
Typically, a load balancer sits in front of multiple backend servers and round-robins requests to different servers. This will break the Socket.IO sessions.
To ensure all the requests for a particular session go to the same server, we need to use Sticky Sessions, a.k.a. Session Affinity. A common way to achieve session stickiness is via something known as Client-IP hashing.
WebSockets, Meet Kubernetes
In Kubernetes, load balancing is done by kube-proxy. The default settings send traffic randomly to one of the backend pods. Client-IP based session affinity can be selected by setting sessionAffinity to ClientIP (the default value is None) in the Kubernetes Service config.
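As a sketch, a Service with client-IP affinity enabled might look like this (the name, labels, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: socketio-service        # hypothetical service name
spec:
  type: NodePort
  sessionAffinity: ClientIP     # route a given client IP to the same pod
  selector:
    app: socketio-server        # hypothetical pod label
  ports:
  - port: 3000
    targetPort: 3000
```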
When the service is exposed publicly via NodePorts, the incoming traffic would be routed based on client IPs to ensure sticky sessions. Everything happens just the way we wanted. But things get a little different when we put a load balancer in front of the NodePorts, especially in case of AWS.
Let's take a look at some of the gotchas.
Gotcha #1: Missing Source IPs
When running on AWS, the best way to serve traffic across different availability zones in a particular region is with an Elastic Load Balancer (ELB).
However, as mentioned, when we use ELBs, kube-proxy can no longer maintain session affinity consistently. The reason is that kube-proxy sees the IP address of the ELB instance instead of that of the actual client that made the request: when doing TCP load balancing, the ELB forwards traffic to backend instances without modifying it, so the client IP is never passed along. At least, ELBs don't pass it along by default.
To get the real client IP address from an ELB, we need to enable Proxy Protocol v1 support. It prepends a human-readable line with connection information, including the source IP address, destination IP address, and port numbers, to the connection. This provides a way to transport connection information to the NodePort where kube-proxy is listening.
So, let's take a look at enabling Proxy Protocol support for an ELB.
As of today, we can't enable proxy protocol support via the AWS Management Console, so we'll need to use the AWS CLI. We can do that by creating a new proxy protocol policy for our load balancer by running:
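For a classic ELB, the AWS CLI command looks like this (the uppercase placeholders are values of your choosing):

```sh
aws elb create-load-balancer-policy \
  --load-balancer-name LOADBALANCER_NAME \
  --policy-name POLICY_NAME \
  --policy-type-name ProxyProtocolPolicyType \
  --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true
```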
Here, you need to replace LOADBALANCER_NAME and POLICY_NAME with your own values.
Enable Policy on the Required Port
Now that the policy has been created, we need to enable it by associating it with the particular port the backend servers are listening on. This is required for every port in the list of listeners.
This command replaces the current set of enabled policies:
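A sketch of the command, again with uppercase placeholders for your values:

```sh
aws elb set-load-balancer-policies-for-backend-server \
  --load-balancer-name LOADBALANCER_NAME \
  --instance-port INSTANCE_PORT \
  --policy-names POLICY_NAME EXISTING_POLICY_NAMES
```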
Before running this, be sure to replace the uppercase variables. LOADBALANCER_NAME should be the value you previously set, and INSTANCE_PORT is the NodePort for the respective service. You must specify both the new policy you're adding (POLICY_NAME) as well as any existing policies (EXISTING_POLICY_NAMES) that are enabled. You can get these existing policies using the same command we use for verification in the next section.
Verify That Proxy Protocol Is Enabled
We can confirm that our newly created policy has been enabled for a particular NodePort by using the following command:
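The verification uses the describe command for the load balancer (replace the placeholder with your load balancer's name):

```sh
aws elb describe-load-balancers \
  --load-balancer-names LOADBALANCER_NAME
```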
Again, be sure to replace the uppercase variable before running this.
The response shows that the POLICY_NAME policy is associated with the NodePort INSTANCE_PORT, and looks something like this:
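The relevant part of the JSON response is the BackendServerDescriptions section, illustrated here with the placeholder values:

```json
"BackendServerDescriptions": [
    {
        "InstancePort": INSTANCE_PORT,
        "PolicyNames": [
            "POLICY_NAME"
        ]
    }
]
```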
That's about it.
The ELB would now send client IP information to the backend in the first line of the connection, using the following format:
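The Proxy Protocol v1 line has this shape, where the fields are the client's and the proxy's IP addresses and ports:

```
PROXY TCP4 CLIENT_IP PROXY_IP CLIENT_PORT PROXY_PORT\r\n
```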
If we have an nginx backend, the access logs would show something like this:
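An illustrative entry (the client IP, timestamp, and request are made-up values), with the real client address now appearing in the first field instead of the ELB's:

```
203.0.113.45 - - [01/Jan/2024:00:00:00 +0000] "GET /socket.io/?EIO=3&transport=polling HTTP/1.1" 200 101 "-" "Mozilla/5.0"
```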
Gotcha #2: Unsupported Headers in Kube-Proxy
We're getting the Proxy Protocol headers to kube-proxy now. Great! But here's the thing: kube-proxy doesn't handle these headers. This is expected to be fixed in future releases, but we're on our own for now.
A way to get past this is using another proxy server (e.g. nginx or haproxy) that handles the Proxy Protocol headers before the traffic reaches kube-proxy.
The nginx config would look like this:
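A sketch of such a config, assuming the Socket.IO service is reachable at a SOCKET_SERVER upstream (a placeholder for its DNS name):

```nginx
server {
    # Accept Proxy Protocol from the ELB
    listen 80 proxy_protocol;

    # Take the real client IP from the Proxy Protocol line;
    # in production, restrict set_real_ip_from to the ELB's address range
    set_real_ip_from 0.0.0.0/0;
    real_ip_header proxy_protocol;

    location / {
        proxy_pass http://SOCKET_SERVER;

        # Headers required for the WebSocket upgrade
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Forward the real client IP as plain HTTP headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $proxy_protocol_addr;
        proxy_set_header X-Forwarded-For $proxy_protocol_addr;
    }
}
```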
This will translate the Proxy Protocol headers into headers kube-proxy can understand, most importantly X-Real-IP, which gives kube-proxy the actual client IP and allows it to enforce session stickiness.
We run these proxy servers as a set of replicated pods (via a Deployment controller) in the same Kubernetes cluster as our Socket.IO server pods; they proxy the WebSocket requests to the Socket.IO server. We also expose this as a Kubernetes Service, with sessionAffinity: ClientIP set in the service config for both nginx as well as the upstream Socket.IO service.
The manifest for the Nginx deployment and service looks like this:
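A minimal sketch of that manifest (the image, names, labels, and ports are illustrative; the nginx image would carry the Proxy Protocol config from above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-proxy
  template:
    metadata:
      labels:
        app: nginx-proxy
    spec:
      containers:
      - name: nginx
        image: nginx:stable          # image with the Proxy Protocol config baked in
        ports:
        - containerPort: 80
        env:
        - name: SOCKET_SERVER        # DNS name of the Socket.IO service
          value: socketio-service.default.svc.cluster.local
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-proxy
spec:
  type: NodePort
  sessionAffinity: ClientIP          # sticky sessions by client IP
  selector:
    app: nginx-proxy
  ports:
  - port: 80
    targetPort: 80
```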
The SOCKET_SERVER environment variable is the DNS name of the Socket.IO service, and should be resolvable via KubeDNS.
In this post, we explored the problems and solutions associated with running WebSocket based applications on Kubernetes.
We looked at the limitations of the AWS Elastic Load Balancer and kube-proxy when working with WebSockets, and at how to work around these limitations by making use of Proxy Protocol and an intermediary nginx pod.