Deis Workflow: Migrating From fleet to Kubernetes
This month, we released Deis Workflow.
This was the second major release of the Deis v1 PaaS.
One of the big changes under the covers (and the reason we bumped the major version number) was the shift from fleet to Kubernetes. There are several fundamental changes that needed to occur for this to happen.
In this post, I'll look at some of the challenges we faced and the solutions we came up with as we migrated from one scheduler to another. Hopefully, others can learn from our experience.
First, a small history lesson.
The Deis v1 PaaS uses fleet as the container scheduler for both hosted applications and for Deis itself. fleet is a distributed init system, relying on etcd and systemd to provide a simple job scheduler to run processes across a cluster.
Deis has been using fleet as the scheduler since v0.8.0, which is when we made the switch from our rudimentary Chef scheduler.
While fleet made a good basic scheduler, it lacked sophistication. For example, fleet schedules jobs to whichever machine is running the fewest jobs. As a result, it is common for a particular host to be swamped with jobs that each demand many CPU cycles or lots of RAM. In other words, fleet is not resource aware.
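To illustrate the difference, here's a toy Python sketch contrasting the two strategies. This is not fleet's or Kubernetes' actual code; the node data and function names are invented for illustration.

```python
def least_jobs(nodes):
    """fleet-style placement: pick the node running the fewest jobs,
    ignoring how much CPU/RAM those jobs actually consume."""
    return min(nodes, key=lambda n: len(n["jobs"]))

def resource_aware(nodes, cpu_needed, ram_needed):
    """Kubernetes-style placement: only consider nodes with enough
    free CPU and RAM, then pick the least loaded of those."""
    fits = [n for n in nodes
            if n["free_cpu"] >= cpu_needed and n["free_ram"] >= ram_needed]
    if not fits:
        raise RuntimeError("no node can satisfy the job's resource request")
    return min(fits, key=lambda n: len(n["jobs"]))

nodes = [
    {"name": "a", "jobs": ["x"],      "free_cpu": 0.2, "free_ram": 128},
    {"name": "b", "jobs": ["y", "z"], "free_cpu": 2.0, "free_ram": 4096},
]

# fleet-style picks node "a" (fewest jobs) even though it is nearly full...
print(least_jobs(nodes)["name"])                 # a
# ...while a resource-aware scheduler skips it for a demanding job.
print(resource_aware(nodes, 1.0, 1024)["name"])  # b
```

The second function is what saves a near-full host from being handed yet another hungry job.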
In addition, service discovery between containers was never something fleet provided. It had to be handled by the application developer.
Working Around fleet Limitations
To work around these problems, we started using etcd as our service discovery mechanism, as well as for platform configuration.
Since etcd is a key/value store, it is well suited to holding configuration: for example, the registration mode for the controller, whether the router forces redirection to port 443, or the maximum number of connections allowed for the database.
We then had every container write its host information, such as its IP address and port number, to etcd every few seconds.
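The registration pattern looked roughly like this, sketched in Python with an in-memory dict standing in for etcd. The key layout, TTL, and component names here are illustrative, not Deis's exact schema.

```python
import time

# In-memory stand-in for etcd: key -> (value, expiry timestamp).
store = {}
TTL = 10  # seconds; the real heartbeat loop re-registered every few seconds

def register(component, host, port, now):
    """What each container's heartbeat did: write its host info under a
    well-known key with a TTL, so a crashed container simply vanishes
    from the registry once its key expires."""
    store[f"/deis/services/{component}"] = (f"{host}:{port}", now + TTL)

def discover(component, now):
    """How peers looked each other up: read the key, treating an
    expired entry as 'service not available'."""
    value, expiry = store.get(f"/deis/services/{component}", (None, 0))
    return value if now < expiry else None

now = time.time()
register("logger", "10.0.1.5", 514, now)
print(discover("logger", now))       # 10.0.1.5:514
print(discover("logger", now + 60))  # None -- heartbeat missed, key expired
```

The TTL is the crux: if a component stops heartbeating (or etcd can't be reached), its entry expires and everyone else loses sight of it, which is exactly the stutter described below.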
Because of this heavy reliance on etcd for service discovery and configuration, testing the application outside of fleet was very difficult.
For example, we had a logging component called Logspout. Logspout's only job was to attach itself to containers running on the host and ship those containers' logs to the logger. The logger was discovered via etcd, so in order to test logger connections we had to mock out an etcd cluster.
As you might've guessed, etcd eventually became a single point of failure for the system.
If etcd started to suffer from network latency issues (a common problem on public clouds), components would start to stutter: they could no longer find each other or fetch their configuration.
This inevitably caused platform downtime.
So, we needed a better abstraction to solve these issues.
Switching to Kubernetes
With the switch to Kubernetes, a lot of these problems have gone away.
Kubernetes is a resource-aware scheduler that schedules jobs based on their needs and requirements. It has built-in service discovery and the capability to change container configuration on-the-fly.
We took these builtins and decided to architect our applications around them.
Service discovery within Kubernetes is handled via DNS. This removes our heavy reliance on etcd for service discovery.
If you have a pod backed by a service named foo, that pod can be discovered by any other container in the same namespace via the DNS hostname foo. This made testing communication significantly easier, because you no longer had to use etcd (or mock out an etcd cluster) for service discovery.
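Behind the short name, Kubernetes assigns every service a fully qualified DNS name following a fixed pattern, which is why lookups need no registry at all. A small sketch of that naming convention (the namespace here is invented; cluster.local is the conventional default cluster domain):

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the fully qualified DNS name Kubernetes gives a service.
    Pods in the same namespace can use just the short service name."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("foo", "deis"))  # foo.deis.svc.cluster.local
```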
Custom container configuration in Kubernetes is handled by environment variables and secrets. Both are top-level Kubernetes concepts.
Environment variables are injected into the container, and secrets are mounted into the filesystem. This is much easier than having to bootstrap a mock etcd cluster, set keys, boot up the application containers, and have them communicate with etcd.
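A minimal Python sketch of how a component might consume both. The variable name, secret contents, and defaults are invented for illustration, and a temp file stands in for a secret volume mount.

```python
import os
import tempfile

def load_config(secret_path):
    """Read configuration the Kubernetes way: plain settings arrive as
    environment variables, sensitive ones as files mounted from a secret.
    No etcd cluster (real or mocked) is needed to test this."""
    return {
        "registration_mode": os.environ.get("REGISTRATION_MODE", "admin_only"),
        "db_password": open(secret_path).read().strip(),
    }

# Simulate what Kubernetes does: inject an env var and mount a secret file.
os.environ["REGISTRATION_MODE"] = "enabled"
with tempfile.NamedTemporaryFile("w", delete=False, suffix="-password") as f:
    f.write("s3cr3t\n")

cfg = load_config(f.name)
print(cfg["registration_mode"])  # enabled
print(cfg["db_password"])        # s3cr3t
```

In a test, you set an environment variable and write a file; compare that with bootstrapping a mock etcd cluster just to hand a container its settings.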
We are all really happy with the switch to Kubernetes.
Not only does it make testing components easier, which increases code quality, but the stability of the Deis platform has also improved dramatically.
Gone are the times when a component is stuck trying to fetch its configuration from etcd because of network latency issues. A node no longer comes down because a job scheduled to the node ran away with all the RAM.
Ultimately, the massive improvements in both functionality and stability have to be experienced to be believed.
If you're interested in trying out Deis Workflow, check out the quickstart!