How it Works
This is a distributed application, built on top of Docker Swarm (the new, integrated "Swarm Mode", not the legacy docker-swarm utility). It is composed of seven different Docker services.
Services
The application is made up of several Docker services: redis, mongo, builder, runner, visualizer, dashboard, and distcc. Let's discuss what each of them is there for, and what it does.
Redis
This service runs the official Redis image on the manager. It is needed by RQ (Redis Queue), which we use for job queueing.
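For illustration, this is roughly how a job can be put on a queue with RQ; the queue name build matches the builder service described below, while the function name and its argument are hypothetical placeholders, not taken from the actual code.

from redis import Redis
from rq import Queue

redis_conn = Redis(host='redis')                 # the redis service, resolved by name inside the swarm
build_queue = Queue('build', connection=redis_conn)

# Enqueue a build job; an RQ worker (the builder service) will pick it up.
job = build_queue.enqueue('tasks.run_build', 'some-inet-version')
print(job.id)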
Mongo
Mongo is a database that we use for temporary storage of binaries and result files.
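As a rough sketch (the exact database and collection layout is not shown here), storing and retrieving a binary with pymongo's GridFS could look like this; the database name is a hypothetical placeholder.

import gridfs
from pymongo import MongoClient

client = MongoClient('mongo')            # the mongo service, resolved by name inside the swarm
fs = gridfs.GridFS(client['storage'])    # database name is a hypothetical placeholder

# The builder stores the compiled library...
with open('libINET.so', 'rb') as f:
    file_id = fs.put(f, filename='libINET.so')

# ...and a runner fetches it later by the returned id (or by filename).
data = fs.get(file_id).read()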
Builder
This is one of the two services running an RQ worker. It starts a single container on the manager, and listens on the build queue for jobs.
It uses the distcc servers from the distcc service, through the buildnet network, to distribute the compilation tasks across all nodes.
Once a build is done, it submits the binaries (actually, the libINET.so file) to the Mongo service, so the runner containers can access them later.
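In RQ terms, the builder's entry point essentially boils down to starting a worker bound to the build queue. A minimal sketch, not the actual code of the service:

from redis import Redis
from rq import Queue, Worker

redis_conn = Redis(host='redis')
# Listen only on the "build" queue; the job functions themselves live in
# the builder image and are not shown here.
Worker([Queue('build', connection=redis_conn)], connection=redis_conn).work()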
Runner
The other RQ worker. It runs as many containers on each host as that host's number of CPU cores, except on the manager, which needs some extra juice to run the other services, like redis and mongo. This is done by requesting a large number of containers (100), but reserving 95% of a core for each container, so they automatically "expand" to "fill" the available number of (remaining) CPU cores, like a liquid.
It gets the built libINET.so file from the MongoDB server, and also submits the simulation results there.
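Expressed through the Docker SDK for Python (the stack file declares the same thing), this reservation trick looks roughly like the following; the image name is a placeholder:

import docker

client = docker.from_env()

# Ask for far more replicas than we will ever get (100), but reserve about
# 95% of a core for each one (reservations are given in nano-CPUs), so the
# scheduler only places as many runners on a node as it has free cores.
client.services.create(
    'example/runner',                               # placeholder image name
    name='runner',
    mode=docker.types.ServiceMode('replicated', replicas=100),
    resources=docker.types.Resources(cpu_reservation=int(0.95 * 1e9)),
    networks=['interlink'],
)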
Visualizer
This service starts a single container on the manager, using the official docker-swarm-visualizer image.
In that container, a web server runs that lets you quickly inspect the state of the swarm, including its nodes, services, and containers, just using your web browser. It listens on port 8080, so once your swarm application is up and running (and you are connected to the swarm if it is on AWS), you can check it out at http://localhost:8080/.
Dashboard
Similarly to the visualizer, this is an auxiliary service, running a single container on the manager, with a web server in it.
This one lets you see, and manage in a limited way, the RQ queues, workers, and jobs.
See: http://localhost:9181/.
distcc
This service starts exactly one container on all nodes (workers and the manager alike). They all run a distcc server, listening for incoming requests for compilation (completely independent from RQ).
They are only attached to the buildnet network, and have deterministic IP addresses.
When the builder container starts a build in a build job, it will try to connect to the distcc containers, and will use them to distribute the compilation tasks to all nodes.
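A hedged sketch of what such a distributed compilation amounts to; the IP addresses are only hypothetical examples of the deterministic addresses mentioned above:

import os
import subprocess

# Point distcc at the distcc containers on buildnet (placeholder addresses).
os.environ['DISTCC_HOSTS'] = '10.0.42.2 10.0.42.3 10.0.42.4 10.0.42.5'

# Run the build with distcc wrapping the compilers, using many parallel jobs.
subprocess.run(['make', '-j32', 'CC=distcc gcc', 'CXX=distcc g++'], check=True)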
Networks
The stack also contains two virtual networks. Each service is attached to one or both of these networks. The networks are:
- interlink
- buildnet
Both of them use the overlay driver, meaning that these are entirely virtual networks, not interfering with the underlying real one between the nodes.
Interlink
This is the main network; all services except distcc are attached to it.
Buildnet
The buildnet network connects the containers of the distcc service with the builder service. It operates on a fixed subnet.
This was only necessary to give the distcc containers deterministic and known IP addresses. On interlink they didn't always get the same addresses; they were randomly interleaved with the containers of all the other services.
This would not be necessary at all if multicast traffic worked on overlay networks between nodes, because then we could just use the built-in zeroconf service discovery capabilities of distcc (the software itself). However, until this issue is resolved, we have to resort to this solution.
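Created through the Docker SDK for Python, such a network would look roughly like this; the subnet value is only an illustrative placeholder, not necessarily the one the stack actually uses:

import docker

client = docker.from_env()

# An attachable overlay network with a fixed subnet, so the distcc
# containers get predictable addresses (placeholder subnet).
ipam = docker.types.IPAMConfig(
    pool_configs=[docker.types.IPAMPool(subnet='10.0.42.0/24')]
)
client.networks.create('buildnet', driver='overlay', ipam=ipam, attachable=True)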
Operation
aws_swarm_tool.py init:
This deploys the official CloudFormation template supplied by Docker, called Docker for AWS.
Using the default settings, the script creates 1 manager and 3 workers, each of them a c4.4xlarge type instance.
It also creates an alarm and an AutoScaling policy that make sure that all machines are shut down after 1 hour of inactivity (precisely, if the maximum CPU utilization of the manager machine was below 10 percent for 4 consecutive 15-minute periods). This is to reduce the chances that they are forgotten about and left running indefinitely, generating unexpected expenditure.
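Under the hood, this amounts to a CloudFormation create_stack call. A rough sketch with boto3 follows; the stack name, template URL, and parameter names and values are placeholders rather than the exact ones used by aws_swarm_tool.py:

import boto3

cloudformation = boto3.client('cloudformation')

# Stack name, template URL and parameters below are placeholders.
cloudformation.create_stack(
    StackName='inet-swarm',
    TemplateURL='https://example.com/Docker-for-AWS.tmpl',
    Parameters=[
        {'ParameterKey': 'ManagerSize', 'ParameterValue': '1'},
        {'ParameterKey': 'ClusterSize', 'ParameterValue': '3'},
        {'ParameterKey': 'ManagerInstanceType', 'ParameterValue': 'c4.4xlarge'},
        {'ParameterKey': 'InstanceType', 'ParameterValue': 'c4.4xlarge'},
    ],
    Capabilities=['CAPABILITY_IAM'],
)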
To be able to connect to the Swarm we are about to create, we must first create an SSH keypair. The aws_swarm_tool.py script can do this for us.
Connecting to the Swarm is essentially opening an SSH connection to the manager machine, and forwarding a handful of ports through that tunnel from the local machine to the swarm.
There is no need to do this manually; the aws_swarm_tool.py script has a command for it:
$ aws_swarm_tool.py connect
In addition to bringing up the SSH connection, the script also saves the process ID (PID) of the SSH client process into a file (a so-called PID file) in a temporary directory (most likely /tmp).
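Conceptually, the connect command does something like the following sketch; the key file, user, host, and PID file name are placeholders, and only the two ports mentioned above are shown:

import subprocess

# Open the tunnel: forward the visualizer (8080) and dashboard (9181)
# ports from the manager to the local machine (placeholder key, user, host).
tunnel = subprocess.Popen([
    'ssh', '-i', 'swarm-key.pem', '-N',
    '-L', '8080:localhost:8080',
    '-L', '9181:localhost:9181',
    'docker@<manager-public-ip>',
])

# Save the PID so a later command can find and terminate the tunnel.
with open('/tmp/swarm_ssh.pid', 'w') as f:     # placeholder PID file name
    f.write(str(tunnel.pid))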
What is Docker Swarm?
With version 1.12.0, Docker introduced Swarm Mode. It makes it possible to connect multiple computers (called hosts or nodes) on a network into a cluster, called a swarm.
This new feature enables something called "container orchestration". It makes the development, deployment, and maintenance of distributed, multi-container applications easier. You can read more about it here.
One great advantage of this is that wherever a Docker Swarm is configured, any application can be run, be it on local machines or on any cloud computing platform.