Speeding up your local Docker builds

raver119
4 min read · Sep 22, 2020

Every day I build containers locally. Many of us do — Docker is a crucial tool in the developer arsenal these days. Building and maintaining 1–2 containers is fine, but what about an app built from 10 containers? 15? 20? Each local build, even with the cache enabled, takes a significant amount of time. Multiply that by the number of daily rebuilds — and you’ll see hours of your precious time leaking away!

Sure, there are CI/CD tools capable of parallel builds, but what about local builds? I’ve spent some time googling for a solution to this small problem and, to my surprise, wasn’t able to find one.

So I thought I’d make a useful tool for myself, plus a blog post for others :)

Step 0. Let’s define the problem and requirements.

I need an app, I’ll call it Krane, that is able to build multiple Docker images in parallel. There are a few special requirements for this app:

  • Build configuration must be persistent. I’d hate describing 10–20–30 images via CLI flags, so a JSON or YAML build configuration would be good.
  • Krane should take care of internal dependencies: if image2 depends on image1, it must guarantee that image2 is built only after image1 is built.
  • Krane should be aware of the build outcome: if one of the images fails to build, the remaining builds should be aborted as well.
  • Krane should be 100% compatible with Minikube, which I use for local development. However, this requirement is satisfied automatically as long as I just use the Docker executable.

Since this app needs concurrent execution, Python wouldn’t be my first choice. Golang fits way better, I believe.

Step 1. Build configuration in a file.

My new tool should be able to read the build configuration from a file passed as an argument to the app. Something like app -f configFile.yml.

Golang provides JSON support out of the box, without any external dependencies. However, since the Docker/Kubernetes environment is YAML-centric, it makes sense to use YAML for the build configuration as well.

Golang has no issues with YAML either: there are multiple libraries providing YAML support. I prefer this one.

Deserializing YAML build configuration

Now that I’m able to read arbitrary YAML files, it’s time to make sure I can pass the configuration file as a CLI argument.

Golang has a built-in package for that as well: flag. Defining all the needed input arguments is really trivial:

Parsing CLI arguments

Now, something like krane -f config.yml is definitely going to work :)

Step 2. Find dependencies within the build task.

When you’re building a bunch of independent containers, dependency tracking is not a problem — by definition. But if some images depend on other images within the task, those dependencies must be built before the images that depend on them.

In other words: I have to search for internal dependencies before building anything. Thanks to the Docker developers, the Dockerfile format is pretty straightforward: there’s a dedicated FROM keyword, so good old regular expressions will do the job.

Finding dependencies for each container in the build job

When applied to every Dockerfile in the job, this gets me a full map of dependencies, where the key is a container name and the value is a slice of the containers it depends on.

Step 3. Organizing the build process.

The map of dependencies is good, but how can I use it to organize the build process? One of the simplest ways is to represent the build as a sequence of sequences of independent build steps — basically a topological sort of the graph, where the outcome isn’t a 1D sequence but a 2D one, to allow parallelism.

It might sound tough, but it’s really trivial. Imagine the following algorithm:

  • All independent containers are built first, in parallel. Let’s call this group a “Layer”.
  • Containers whose dependencies are all in previous layers are built next, in parallel.

The last step is repeated until all containers are built.

With this approach, the executor dispatches individual build jobs to separate goroutines within each Layer. Once all jobs are dispatched, the executor waits until they are finished before switching to the next layer.

Step 4. Handling the outcome.

The last requirement is build state handling and transfer: if one of the jobs fails, the failure shouldn’t be silently swallowed. I must be aware of problems as soon as they arise: there’s no sense in waiting for the full build to finish if one of the jobs has already failed. So early stopping would be a really nice feature to have.

Luckily, Golang has channels for communication between goroutines, so each worker gets a channel for reading build jobs and a channel for reporting. The reporting channel is used for tracking the outcome of each build.

Blocking dispatcher internals

Final step. Comparing the apples.

It’s time to see the numbers. For the performance test, I’ve put together a fairly realistic sample deployment: 4 containers building React apps (the frontend part), 2 containers building Go apps (the backend part), and an ML-deployment container (an almost static one). I’ll compare build times twice: the first run with the no-cache option, and the second run without it.

no-cache sequential build time:

real 7m15,544s
user 0m5,941s
sys 0m7,696s

no-cache parallel build time:

real 2m46,595s
user 0m5,800s
sys 0m8,451s

partially cached sequential build time:

real 2m57,410s
user 0m6,304s
sys 0m7,320s

partially cached parallel build time:

real 0m17,323s
user 0m6,111s
sys 0m7,808s

So the relative speedup is somewhere between 2.5x and 10x, which is just great for me: my typical builds are partially cached. I’ll save lots of time using this small tool.

I hope you’ll find it useful too.

Feel free to contact me if you have any questions :)
As usual, the source code for this app is available on GitHub.
