Docker Architecture – Engine, Containerd, runc

People are often confused, when looking at the Docker architecture, about what the various components do. In this article I will attempt to demystify a few of them. The latest Docker version at the time of writing is 18.09.
Here’s an overall hierarchy of Docker

The heart of Docker is the docker engine.

1. Docker Engine

Docker Architecture

The Docker Engine consists of:

    1. The Docker server, implemented by dockerd (the Docker daemon). It is responsible for creating and managing images, containers, networks and volumes.
    2. A RESTful API used to talk to the Docker server.
    3. A command-line client, a.k.a. the docker command.

1.1 Docker Server

The Docker server takes care of creating and maintaining containers (using containerd), networking, persistent storage, orchestration and distribution.

1.1.1 Persistent storage

The file system in Docker is managed by the container runtime, which uses a storage driver to write to the container's writable layer. However, to persist data outside the container there are three options: volumes, bind mounts and tmpfs mounts.

1.1.1.1 Volumes

Volumes are created on the host, maintained by the Docker server, and managed through the API or the client. Their lifecycle is independent of the container's: a volume survives after the containers using it are removed. A volume can be mounted into multiple containers at once, works on both Linux and Windows, and, with volume drivers, can even live on remote hosts or with cloud providers.
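A quick sketch of the volume lifecycle (the names `app-data` and `web` are placeholders; this assumes a running Docker daemon):

```shell
# Create a named volume; Docker manages its storage on the host
docker volume create app-data

# Mount the same volume into two containers; both see the same files
docker run -d --name web --mount source=app-data,target=/usr/share/nginx/html nginx
docker run --rm --mount source=app-data,target=/data alpine ls /data

# The volume outlives both containers
docker volume inspect app-data
```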

1.1.1.2 Bind Mounts

You can mount a file or directory from the host machine into a container using bind mounts. They are limited in functionality compared to volumes and rely on an absolute path on the host system. Volumes are preferable to bind mounts for new applications.
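For illustration, a bind mount of a hypothetical `src` directory (note the host side must be an absolute path, hence `$(pwd)`):

```shell
# --mount syntax: bind the host directory ./src to /app inside the container
docker run --rm --mount type=bind,source="$(pwd)"/src,target=/app alpine ls /app

# Equivalent older -v syntax
docker run --rm -v "$(pwd)"/src:/app alpine ls /app
```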

1.1.1.3 tmpfs Mounts

tmpfs mounts can be used to store temporary data: the data is never persisted on disk, but is kept in the host machine's memory and disappears when the container stops.
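A minimal sketch (assumes a running Docker daemon); the file written to the tmpfs exists only for the lifetime of the container:

```shell
# Mount a tmpfs at /scratch; writes go to host memory, not disk
docker run --rm --mount type=tmpfs,target=/scratch \
    alpine sh -c 'echo hello > /scratch/f && cat /scratch/f'
```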

1.1.2 Networking

The diagram below, from the Docker networking documentation, explains how networking in Docker works.

Docker Networks

We will not go through all the details here, but let's highlight the main parts.

1.1.2.1 Sandbox

This is the network stack of the container itself. It holds the container's routing tables, interfaces and DNS settings.

1.1.2.2 Endpoint

Endpoints join a sandbox to a network. Their main goal is to abstract the driver implementation.

1.1.2.3 Network Driver

The network drivers are used by the Docker Engine to connect to the actual network infrastructure. There are two types: native drivers, which are built into the Docker Engine, and remote drivers, which are maintained by the community and by vendors. The following native drivers exist:

  • Host – the container uses the host's networking stack directly.
  • Bridge – Docker creates a Linux bridge; containers attached to it can talk to each other over it.
  • Overlay – creates an overlay network that supports multiple hosts out of the box.
  • MACVLAN – gives each container its own MAC address, so it appears as a physical device on the network.
  • None – creates a networking stack and namespace but does not give the container an interface, leaving it completely isolated.

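To make the bridge driver concrete, here is a sketch using a hypothetical network named `backend` (containers on a user-defined bridge can resolve each other by name):

```shell
# Create a user-defined bridge network and attach two containers to it
docker network create --driver bridge backend
docker run -d --name db --network backend redis
docker run --rm --network backend alpine ping -c 1 db

# List networks: the built-in bridge, host and none, plus user-defined ones
docker network ls
```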
1.1.2.4 IPAM Driver

Manages IP addresses for Docker.

1.1.3 containerd

The heart of the container system is containerd. It is the container runtime that the Docker Engine uses to create and manage containers. It abstracts away system- or OS-specific calls so that containers can run on Windows, Solaris and other operating systems. The scope of containerd includes the following:

  • Create, start, stop, pause, resume, signal and delete containers
  • Support for overlay, AUFS and other copy-on-write file systems for containers
  • Build, push and pull images, and manage image storage
  • Container-level metrics
  • Creating and managing network interfaces
  • Persisting container logs

containerd is scoped to a single host. In this article we will discuss two of its core features: the snapshotter and the runtime.
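You can talk to containerd directly, bypassing dockerd, with its bundled debugging client ctr. A sketch (assumes containerd is installed and running; `demo` is an arbitrary container ID):

```shell
# Pull an image and run a container straight through containerd
sudo ctr images pull docker.io/library/alpine:latest
sudo ctr run --rm docker.io/library/alpine:latest demo echo "hello from containerd"
```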

1.1.3.1 Container runtime – runC

The container runtime is implemented through runC. runC is a CLI tool that follows the Open Container Initiative (OCI) as specified at https://www.opencontainers.org/. The OCI provides specifications for the runtime (runtime-spec) and for the image format (image-spec). runC tracks the runtime-spec.

runC provides a Go implementation for creating containers using namespaces, cgroups, filesystem access controls and Linux security capabilities. For more information, look at the libcontainer (parent of runC) specification at https://github.com/opencontainers/runc/blob/master/libcontainer/SPEC.md.
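To see runC without Docker in the picture, you can run a container from a raw OCI bundle, i.e. a rootfs directory plus a config.json. A sketch (the `bundle` path and `demo` ID are placeholders; the rootfs is borrowed from an Alpine image):

```shell
# Build an OCI bundle: a rootfs plus a runtime-spec config.json
mkdir -p bundle/rootfs
docker export "$(docker create alpine)" | tar -C bundle/rootfs -xf -
cd bundle
runc spec            # generates a default config.json (a runtime-spec document)
sudo runc run demo   # starts the container described by the bundle
```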

1.1.3.2 Snapshotter

Docker containers use a system known as layers. Layers allow modifications to a file system to be stored as a changeset on top of a base layer. Docker used to take snapshots with the graphdriver; containerd uses the snapshotter instead.

A snapshot is a filesystem state. Each snapshot has a parent, and a snapshot with no parent has the empty string as its parent. A layer is the diff between two snapshots. When a container is created, a writable layer is added on top of all the image layers, and all changes are written to it. This writable layer is what differentiates a container from an image.

All containers share the same base layers. If a layer needs changes, a new layer is created as a copy of it (and of all layers on top of it) and the changes are added to that new layer. The new layers are visible only to the container that requested the changes; other containers keep using the original layers. Likewise, if you pull two images from a Docker registry that share common base layers, Docker will download those layers only once.
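You can watch layer sharing in action (assuming a running daemon; any two image tags built on the same base will do):

```shell
# Show the layers that make up an image
docker pull python:3.7-slim
docker history python:3.7-slim

# Pulling a second image that shares base layers reports "Already exists"
# for the layers Docker has downloaded before
docker pull python:3.7
```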

docker layers

Different storage drivers implement this functionality, and Docker uses a pluggable strategy so that they can be swapped. The supported drivers are overlay2, aufs (used by older versions), devicemapper (older CentOS and RHEL kernel versions), btrfs and zfs (when the host file system supports them), and vfs (intended for testing).
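To check which storage driver your daemon is using:

```shell
# Prints the active storage driver, e.g. overlay2
docker info --format '{{.Driver}}'
```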

1.2 Docker RESTful API

The Docker API receives commands from the client and passes them on to the daemon. The main endpoints the API exposes are:

  • /containers – list, create, inspect, list processes, get logs, manage containers
  • /images – list, build, create, inspect, push to registry, remove, search
  • /volumes – list, create, inspect, remove
  • /networks – list, inspect, create, connect containers, remove
  • /nodes – list, inspect, remove
  • /swarm – inspect, init, leave, update
  • /services – list, create, remove, inspect, update
  • /tasks – list, inspect

There are also endpoints to check authentication and to monitor events.

Docker API
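Because the API is served on a UNIX socket by default, you can query it with plain curl (v1.39 is the API version matching Docker 18.09; adjust for your daemon):

```shell
# List running containers and local images straight from the REST API
curl --unix-socket /var/run/docker.sock http://localhost/v1.39/containers/json
curl --unix-socket /var/run/docker.sock http://localhost/v1.39/images/json
```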

1.3 Docker Command line client

As the diagram shows, the user sends commands to the Docker server using the command-line client, also known as docker. The commands are received by the Docker RESTful API, which in turn instructs the daemon to perform tasks. To see a list of the commands the client supports, type docker help.
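The client/server split is visible in the client itself:

```shell
docker version   # reports the Client version and the Server (Engine) version separately
docker help      # lists the available commands
```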

This finishes our introduction to the Docker architecture.
