How it Works
The application is made up of several Docker services: redis, mongo, builder, runner, visualizer, dashboard, distcc. Let's discuss what each of them is there for, and what it does.
The redis service runs the official Redis image on the manager. It is needed by RQ (Redis Queue), which we use for job queueing.
Mongo is a database that we use for temporary storage of binaries and result files.
The builder service is one of the two services running an RQ worker. It starts a single container on the manager and listens on the build queue for jobs. It uses the distcc servers from the distcc service through the buildnet network to distribute the compilation tasks across all nodes. Once a build is done, it submits the binaries (actually, the libINET.so file) to the Mongo service, so the runner containers can access it later.
The runner service is the other RQ worker. It runs as many containers on each host as that host has CPU cores, except on the manager, which needs some spare capacity for running the other services, such as redis and mongo. This is done by requesting a large number of containers (100) but reserving 95% of a core for each of them, so they automatically "expand" to "fill" the available (remaining) CPUs, like a liquid.
It gets the built libINET.so from the MongoDB server, and also submits the simulation results there.
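The "liquid" behaviour follows from simple arithmetic: a node can host only as many runner containers as there are 0.95-core reservations that fit into its unreserved CPU capacity. The sketch below illustrates this under stated assumptions. The function name, the node names, the 16-core node size, and the idea that other services hold two cores' worth of reservations on the manager are all hypothetical and not taken from the actual stack definition.

```python
def runner_containers_per_node(free_cpu_percent, requested=100, reservation_percent=95):
    """Return how many runner containers fit on each node.

    free_cpu_percent maps a (hypothetical) node name to its unreserved CPU
    capacity in percent of one core, e.g. 1600 for 16 free cores.
    Integer percentages avoid floating-point rounding surprises.

    Nodes are filled in order here; the real scheduler spreads containers,
    which only matters when fewer containers are requested than fit.
    """
    placed = {}
    remaining = requested
    for node, free in free_cpu_percent.items():
        fits = free // reservation_percent    # whole reservations that fit
        placed[node] = min(fits, remaining)   # never exceed the requested total
        remaining -= placed[node]
    return placed


# Hypothetical 16-core nodes; the manager is assumed to have 2 cores'
# worth of CPU already reserved by other services.
nodes = {"manager": 1400, "worker1": 1600, "worker2": 1600, "worker3": 1600}
print(runner_containers_per_node(nodes))
# {'manager': 14, 'worker1': 16, 'worker2': 16, 'worker3': 16}
```

Because far more containers are requested than can ever fit, the number actually running is determined by the cluster size alone, and adding a node raises it without any configuration change.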
The visualizer service starts a single container on the manager, using the official docker-swarm-visualizer image. In that container, a web server runs that lets you quickly inspect the state of the swarm, including its nodes, services and containers, using just your web browser. It listens on port 8080, so once your swarm application is up and running (and you are connected to the swarm if it is on AWS), you can check it out at [http://localhost:8080/].
Similarly to the visualizer, the dashboard is an auxiliary service, running a single container on the manager with a web server in it. This one lets you see, and manage in a limited way, the RQ queues, workers, and jobs.
The distcc service starts exactly one container on every node (workers and the manager alike). Each of them runs a distcc server, listening for incoming compilation requests (completely independent of RQ). They are only attached to the buildnet network, and have deterministic IP addresses. When the builder container starts a build in a build job, it will try to connect to the distcc containers, and will use them to distribute the compilation tasks to all nodes.
The stack also contains two virtual networks, interlink and buildnet. Each service is attached to one or both of these networks. Both of them use the overlay driver, meaning that these are entirely virtual networks, not interfering with the underlying real one between the nodes.
interlink is the main network; all services except distcc are attached to it.
The buildnet network connects the containers of the builder and distcc services. It operates on a fixed subnet. This was only necessary to give the distcc containers deterministic and known IP addresses. On interlink they didn't always get the same addresses; they were randomly interleaved with the containers of all the other services.
This would not be necessary at all if multicast traffic worked on overlay networks between nodes, because then we could just use the built-in zeroconf service discovery capabilities of distcc (the software itself). However, until this issue is resolved, we have to resort to this solution.
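With deterministic addresses, the list of distcc servers can be derived without any service discovery. The following is a minimal sketch of that idea, not the project's actual code. It assumes a hypothetical subnet (192.0.2.0/24, a documentation-only range, not the one used by buildnet), a hypothetical starting offset for the first container, and that the resulting list is passed to distcc through its DISTCC_HOSTS environment variable.

```python
import ipaddress


def distcc_hosts(subnet, node_count, first_offset=2):
    """Build a space-separated DISTCC_HOSTS value from a fixed subnet.

    Assumes (hypothetically) that the distcc containers occupy consecutive
    addresses starting at network address + first_offset.
    """
    network = ipaddress.ip_network(subnet)
    hosts = [network.network_address + first_offset + i for i in range(node_count)]
    # Guard against running off the end of the subnet.
    if any(host not in network for host in hosts):
        raise ValueError("subnet too small for the requested number of nodes")
    return " ".join(str(host) for host in hosts)


# Hypothetical subnet and node count (1 manager + 3 workers).
print(distcc_hosts("192.0.2.0/24", 4))
# 192.0.2.2 192.0.2.3 192.0.2.4 192.0.2.5
```

The point of the sketch is that the host list is a pure function of the subnet and the node count, which is exactly what randomly interleaved addresses would make impossible.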
Deployment on AWS is done with the official CloudFormation template supplied by Docker, called Docker for AWS. With the default settings, the script creates 1 manager and 3 workers, each of them a c4.4xlarge instance.
It also creates an alarm and an AutoScaling policy that make sure all machines are shut down after 1 hour of inactivity (precisely, if the maximum CPU utilization of the manager machine was below 10 percent for 4 consecutive 15-minute periods). This reduces the chance that they are forgotten about and left running indefinitely, generating unexpected costs.
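The inactivity rule can be expressed in a few lines. This is only an illustration of the condition described above, not the CloudWatch alarm definition itself; the function name and the sample values are made up.

```python
def should_shut_down(max_cpu_per_period, threshold=10.0, periods=4):
    """Return True if the last `periods` samples are all below `threshold`.

    Each sample is the maximum CPU utilization (percent) of the manager
    over one 15-minute period, oldest first.
    """
    recent = max_cpu_per_period[-periods:]
    return len(recent) == periods and all(cpu < threshold for cpu in recent)


print(should_shut_down([55.0, 3.2, 4.1, 2.7, 1.9]))  # True: four quiet periods
print(should_shut_down([3.2, 4.1, 42.0, 1.9]))       # False: one busy period
```

A single busy period resets the countdown, so a swarm that is still processing jobs keeps running.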
To be able to connect to the Swarm we are about to create, we must first create an SSH key pair.
aws_swarm_tool.py can do this for us.
Connecting to the Swarm essentially means opening an SSH connection to the manager machine, and forwarding a handful of ports through that tunnel from the local machine to the swarm.
There is no need to do this manually; the aws_swarm_tool.py script has a command for it:
$ aws_swarm_tool.py connect
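For readers curious what such a tunnel looks like, here is a rough sketch of an equivalent ssh invocation. It is not the actual implementation of aws_swarm_tool.py. The host name, key file name, user name, and the choice of forwarded ports are assumptions for illustration; only port 8080 is mentioned in this document.

```python
def ssh_tunnel_command(manager_host, key_file, ports, user="docker"):
    """Build an ssh command that forwards local ports to the manager.

    -N : do not run a remote command, only forward ports
    -i : private key to authenticate with
    -L : forward local port P to localhost:P as seen from the manager
    """
    command = ["ssh", "-N", "-i", key_file]
    for port in ports:
        command += ["-L", f"{port}:localhost:{port}"]
    command.append(f"{user}@{manager_host}")
    return command


# Hypothetical host and key file; 8080 is the visualizer port mentioned above.
print(" ".join(ssh_tunnel_command("manager.example.com", "swarm-key.pem", [8080])))
# ssh -N -i swarm-key.pem -L 8080:localhost:8080 docker@manager.example.com
```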
In addition to bringing up the SSH connection, the script also saves the process ID (PID) of the SSH client process into a file (a so-called PID file) in a temporary directory (most likely
What is Docker Swarm?
With version 1.12.0, Docker introduced Swarm Mode. It makes it possible to connect multiple computers (called hosts or nodes) on a network into a cluster, called a swarm.
This new feature enables something called "container orchestration". It makes the development, deployment, and maintenance of distributed, multi-container applications easier. You can read more about it here.
One great advantage of this is that any application can be run wherever a Docker Swarm is configured, whether on local machines or on any cloud computing platform.