Run Self-Sufficient Containers on CoreOS
In my last post we learnt about CoreOS and took a look at the steps needed to install it on your laptop (using VirtualBox). We also learnt that CoreOS doesn't ship with a package manager. Instead, it comes with Docker pre-installed, so for every service you need (e.g. web server, database, cache, and so on) you can simply create and run a Docker container.
So, what’s the deal with self-sufficient containers?
Containers are self-sufficient by default, right? Well, that depends on what you call self-sufficient. Containers are certainly self-sufficient in the sense that they don't depend on any software running outside their logical boundaries. But they still need kernel support and computing resources from the host machine.
Let's look at it this way: containers are generally deployed on a network of several nodes, with each node running one or more containers. Each container may be self-sufficient, but the node on which the container runs is not! Nodes may crash, run slowly, or get disconnected from the network. In such cases, you need to detect the offline node and re-route the traffic meant for it to other nodes.
But how do we do that?
In this post, we'll take a look at three components: systemd, fleet, and etcd. We'll learn how CoreOS uses them to help you create self-sufficient containers.
systemd is an init system that provides features for starting, stopping, and managing processes during the boot process of a computer.
To let systemd know about the processes you'd like to run on system startup, you write configuration files called units. There are several types of units; in the CoreOS context, we will mostly be concerned with target units.
Target units are used for grouping units into well-known synchronization points during system startup. A simple way to identify a target unit is its extension: .target. Returning to CoreOS, multi-user.target is the target we'll interact with the most, as it holds links to all the general-purpose unit files.
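For illustration, here is a minimal systemd unit that starts a Docker container; the file name hello.service and the container details are made up for this sketch:

```ini
# /etc/systemd/system/hello.service -- hypothetical example
[Unit]
Description=Hello World container
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container of the same name, then run a fresh one
ExecStartPre=-/usr/bin/docker rm -f hello
ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo Hello; sleep 1; done"
ExecStop=/usr/bin/docker stop hello

[Install]
WantedBy=multi-user.target
```

Enabling the unit with systemctl enable hello.service creates a symlink under multi-user.target.wants/, which is exactly how multi-user.target "holds links" to the units started at boot.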
It is important to note that systemd is not a CoreOS-specific feature; rather, systemd is a separate project that existed long before CoreOS.
Simply put, fleet is a cluster manager that extends systemd to the cluster level. Usefully, fleet works with standard systemd unit files and adds a few fleet-specific properties.
There are two types of units that can be run on a CoreOS cluster: standard and global units.
Standard units are long-running processes that are scheduled onto a single machine, for example a database server or a web server. If that machine goes offline, the unit is automatically migrated to a new machine and started.
Global units are run on all machines in the cluster. These are ideal for common services like monitoring agents or components of higher-level orchestration systems like Kubernetes, Mesos, or OpenStack.
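In practice, the difference between the two types is expressed through an [X-Fleet] section appended to an ordinary systemd unit file. A sketch, with made-up unit and image names:

```ini
# monitor.service -- a global unit: fleet runs it on every machine
[Unit]
Description=Monitoring agent

[Service]
ExecStart=/usr/bin/docker run --name monitor example/monitor-agent

[X-Fleet]
Global=true
```

Omitting Global=true gives you a standard unit that fleet schedules onto exactly one machine; other X-Fleet options such as Conflicts and MachineOf let you influence which machine that is.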
To use fleet from a CoreOS machine that is part of the cluster, you can use the fleetctl command directly. If you connect to the cluster remotely, it is just a matter of logging in to one of the nodes via SSH and then using fleetctl. The options provided by fleetctl are analogous to those of systemctl, systemd's own command-line interface.
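A typical session looks something like the following. Here hello.service is a hypothetical unit file in the current directory, and the output is indicative rather than captured from a real cluster:

```
$ fleetctl start hello.service
$ fleetctl list-units
UNIT           MACHINE                   ACTIVE   SUB
hello.service  148a18ff.../10.10.1.1     active   running
$ fleetctl status hello.service
$ fleetctl destroy hello.service
```

Note the parallel with systemctl: start, status, and list-units mean what a systemd user would expect, except that they now operate across the whole cluster.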
Running a high-availability service using fleet
CoreOS offers high availability for nodes running in the cluster. If any node is down, the tasks running on that node are automatically re-launched on another node. Let’s see how this works.
As we discussed above, each node in the cluster runs the fleet daemon, so all the nodes are constantly in communication with each other. After the cluster starts up (or the current engine fails), the nodes elect a leader, called the engine, to make scheduling decisions. The engine parses new requests (generally unit files), finds qualified nodes to run them, and finally instructs those nodes to start the units.
To put things in perspective, imagine you need an Apache web server and a MySQL database service: specifically, four Apache instances and three MySQL instances. The obvious first step is to write the corresponding unit files. The next step is to push those unit files to the cluster. The fleet engine then parses the request, assigns nodes to the units, and the corresponding services are started.
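fleet supports systemd-style template units, which make this kind of setup convenient: one template file can launch any number of instances. A sketch, with made-up file names and assuming a bash shell for the brace expansion:

```
$ fleetctl submit apache@.service mysql@.service
$ fleetctl start apache@{1..4}.service
$ fleetctl start mysql@{1..3}.service
$ fleetctl list-units
```

Each instantiated unit (apache@1.service, apache@2.service, and so on) is scheduled independently, so the engine is free to spread the seven units across the available nodes.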
When a node fails, it misses the deadline to heartbeat back to the leader. On sensing this, the leader marks all the units running on that node for rescheduling. As soon as qualified machines are found for all the units, the units are started on the new nodes. If the old node comes back up, the leader asks it to stop running the units that have been rescheduled.
When applications run on a single computer, they have access to the computer's file system, so they can easily save any incoming information (as files) and read it back whenever needed.
But as explained previously, applications are moving into containers, and containers run on clusters. Applications can no longer take the file system for granted; there isn't even a fixed server on which the application will run. Everything is decided on the fly.
Take applications like WordPress, Magento, and so on. They generally have a config file that is written to disk during setup. Every time the application runs, it reads its settings from that file. Now, imagine running WordPress on a distributed setup like a CoreOS cluster, where a node failure may cause the application to be reassigned to another node. How do you keep access to the configuration information consistent?
etcd solves this problem. Think of etcd as a file system where applications can store operational data such as database details, cache settings, and application-specific data, without worrying about a node going offline or a container being removed.
Because etcd runs on all the nodes of the cluster and can gracefully handle network partitions, it serves as the backbone of distributed systems by providing a system-wide hub of information. You can read and write data in etcd using cURL or any other HTTP library.
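As a sketch, here is how a WordPress-style application might store and retrieve its database host through etcd's v2 HTTP API. The key name and value are made up, 127.0.0.1:2379 assumes etcd's default client port on the local node, and the JSON responses are abbreviated:

```
$ curl -L -X PUT http://127.0.0.1:2379/v2/keys/wordpress/db_host -d value="10.10.1.5:3306"
{"action":"set","node":{"key":"/wordpress/db_host","value":"10.10.1.5:3306", ...}}

$ curl -L http://127.0.0.1:2379/v2/keys/wordpress/db_host
{"action":"get","node":{"key":"/wordpress/db_host","value":"10.10.1.5:3306", ...}}
```

Because every node serves the same replicated store, the application can issue these requests against whichever node it happens to be running on.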
To define it formally: etcd is a distributed key-value store that provides a reliable way to store data across a cluster. Though developed by the CoreOS team, etcd is an independent project and works with several other operating systems.
In this post, we took a look at three things:
- systemd lets you manage processes during system bootup.
- fleet helps you manage the CoreOS cluster and do things like starting many instances of a container across the entire cluster. fleet also makes the setup highly available: if a node fails, units started with fleet are restarted on other nodes in the cluster.
- etcd lets applications manage their configuration easily and reliably in a way that works for stateless, ephemeral containers.
Subsystems like systemd, fleet, and etcd add a lot of possibilities to CoreOS. They make CoreOS a truly distributed operating system suitable for scalable operations.
In the next posts in this series, we’ll take a look at using these tools.