Fleet on CoreOS, Part One
Servers crash all the time. But it is important to make sure that applications, and hence the business, don't suffer. This is why service availability is one of the biggest concerns for operations engineers deploying applications in the cloud.
Fleet—a CoreOS tool—solves this problem and frees you from worry by automatically routing application execution to healthy nodes.
So, how does this work?
How does Fleet know if a node is down? How does the rerouting happen?
We covered this in detail in a previous post, but if you're in a hurry, here's a quick recap.
Each node in a CoreOS cluster runs the fleet daemon, which keeps a tab on the node’s health and is responsible for communicating with other nodes. The daemons coordinate to elect a leader during cluster startup, or when the current leader fails. The leader schedules new services on the nodes whenever a new service request is submitted to the cluster, or when a node goes down taking services with it.
In this miniseries, we’ll get some services up-and-running on a cluster, then take down a node to see how fleet reshuffles things. We’ll then move on and take a closer look at some additional fleet functionality.
Let's take a look at Fleet in action. Specifically, let's look at how the rerouting takes place when a node goes down.
To get started, you should have a CoreOS cluster running. If you're not sure how to do this, check out this tutorial on how to install CoreOS on AWS. Note: in this tutorial, I have used a three-node CoreOS cluster running on AWS EC2.
Once you have your cluster ready, connect to the cluster.
That should look something like this:
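For example, SSHing in as the core user (the key file path and hostname below are placeholders, not values from a real cluster):

```
ssh -i ~/.ssh/your-key.pem core@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```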
Make sure to change the path to the actual key file path and the public hostname of your EC2 server.
After you’re connected, check you have fleet installed by running:
You should see something like this:
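Something along these lines (the version number here is just illustrative; yours may differ):

```
$ fleetctl version
fleetctl version 0.11.5
```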
Define Your Services
While fleet comes pre-installed on CoreOS, it doesn’t start automatically.
To start fleet, we’ll need to add at least one unit file. A unit file describes a service you want to run on your cluster.
Here’s an example unit file:
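A minimal sketch, using the busybox image as a stand-in for a real application (the container name and the command it runs are just for illustration):

```
[Unit]
Description=MyApp
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
# The leading "-" tells systemd to ignore failures from these cleanup commands
ExecStartPre=-/usr/bin/docker kill myapp
ExecStartPre=-/usr/bin/docker rm myapp
ExecStartPre=/usr/bin/docker pull busybox
# Note: no -d flag, so the container runs as a child of this unit
ExecStart=/usr/bin/docker run --name myapp busybox /bin/sh -c "while true; do echo Hello; sleep 1; done"
ExecStop=/usr/bin/docker stop myapp
```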
We can name this file myapp.service.
Description shows up in the logs, so it is a good idea to set this to something you’ll understand later.
Requires=docker.service means this unit will only start after docker.service is active.
ExecStart allows you to specify a command to run when the unit is started.
ExecStartPre specifies commands to be executed before the command specified by ExecStart. This can be used for cleanup, setup, and so on.
Do not run Docker containers with -d, as this will prevent the container from starting as a child of this process. In that case, systemd will think the process has exited, and the unit will be stopped.
ExecStop specifies commands that will run when this unit is stopped or considered failed.
To start this service, move or create the myapp.service file on the node you're already connected to.
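With the file in place, launch it with fleetctl (the machine ID and IP in the output are examples from my cluster; yours will differ):

```
$ fleetctl start myapp.service
Unit myapp.service launched on 85c0c595.../172.31.5.9
```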
You will see that the service starts.
Confirm this by running:
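For example (the hash and target values below are from my cluster and will differ on yours):

```
$ fleetctl list-unit-files
UNIT           HASH     DSTATE    STATE     TARGET
myapp.service  d4c61bf  launched  launched  85c0c595.../172.31.5.9
```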
This command displays all unit files running in the cluster along with the node IP addresses.
Now we've got one service running on the cluster, we can get multiple services running by repeating the process. Just remember that unit file names need to be unique across the cluster.
Once done, run the above command again.
It should look something like this:
```
$ fleetctl list-unit-files
UNIT                HASH     DSTATE    STATE     TARGET
myapp.service       d4c61bf  launched  launched  85c0c595.../172.31.5.9
anotherapp.service  e55c0ae  launched  launched  113f16a7.../172.31.22.187
someapp.service     391d247  launched  launched  a0b7a5f7.../172.31.22.22
```
Note here we have three services (myapp.service, anotherapp.service, and someapp.service) running on three hosts, indicated by the three different IP addresses under the TARGET column.
Fleet in Action
As mentioned at the start, the best bit about fleet is that it not only schedules units across the cluster as requests are submitted, but also automatically reroutes application execution to healthy nodes.
To see this automatic rerouting in action, let’s take down one node.
I did this via the AWS console, by stopping one of my EC2 instances. But you can use whatever management console you’re running your virtual machines with.
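If you'd rather stay on the command line, the AWS CLI can stop an instance too (the instance ID here is a placeholder):

```
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
```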
After you've taken down the node, run fleetctl list-unit-files again:
You should see output like this:
```
UNIT                HASH     DSTATE    STATE     TARGET
myapp.service       d4c61bf  launched  launched  85c0c595.../172.31.5.9
anotherapp.service  e55c0ae  launched  launched  113f16a7.../172.31.22.187
someapp.service     391d247  launched  launched  a0b7a5f7.../172.31.22.187
```
someapp.service was running on 172.31.22.22, but now it's running on 172.31.22.187, along with anotherapp.service. What's happened here is that 172.31.22.22 was removed from the cluster, and someapp.service was moved to the healthy node already running anotherapp.service.
Fleet did this automatically, with no interaction from us.
Pretty cool, huh?
In this post, we learnt:
- How to create unit files that describe a service we want to run on our cluster
- How to launch a service on our cluster using fleet
- How fleet detects node failure and automatically shuffles downed services onto healthy nodes
In the next post in this miniseries, we’ll look at more advanced ways to interact with and use fleet.