July 22, 2015

Building a minimum viable PhantomJS 2 Docker image

As part of something I’ve been hacking on the side, I need to run a bunch of PhantomJS 2.0 containers on a Docker host. I could’ve just built an image that includes its binary and considered it done, but the phantomjs binary currently has to be built from source for Linux machines. Not only is that a PITA, it also requires some “juggling” to clean up build-time dependencies, and it still produces a fairly large Docker image (somewhere around 400MB).

After some initial research I found a Docker image that does the heavy lifting, but I still wanted a smaller image: I have been bitten by the “minimalist Docker images” bug after coming across this blog post and getting to know Alpine Linux.

Compiling from sources under gliderlabs/alpine (failed)

My first attempt was to use the gliderlabs/alpine image and build phantomjs from source, but unfortunately that didn’t work: I got lots of weird compilation errors. I’m not sure what the actual root cause is. My guess is that it has to do with musl libc or the compiler toolchain available on Alpine Linux, but I haven’t had a chance to dig into it yet. I even tried using Alpine’s prebuilt QtWebKit packages (required by PhantomJS), with no luck either.

Using dockerize to produce a minimalist image

After failing to compile PhantomJS from source, I decided to take a stab at using dockerize to produce the bare minimum required to run the phantomjs CLI.

dockerize is a pretty cool tool for creating minimal Docker images from dynamically linked ELF binaries. Its CLI has many options, but the idea is that you can run something as simple as dockerize -t sed /bin/sed and get a minimal image, built from scratch, with everything sed needs to run.
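To give an idea of the core technique dockerize automates, here is a hypothetical shell helper (stage_binary, not part of dockerize and not its actual implementation) that copies an ELF binary plus the shared libraries reported by ldd into a directory laid out like /. It is a sketch assuming GNU coreutils (cp --parents) and the usual "name => /path (0x...)" ldd output. Note that ldd only sees libraries the binary is linked against, so anything loaded later with dlopen() or read as data (fonts, SSL certificates) is missed.

```shell
#!/bin/sh
# Hypothetical helper, NOT dockerize's implementation: copy a dynamically
# linked ELF binary plus the shared libraries ldd reports into a directory
# laid out like /, which could then back a FROM scratch image.
# Assumes GNU coreutils (cp --parents) and the usual ldd output format.
set -eu

stage_binary() {
  bin=$1    # absolute path of the executable
  root=$2   # staging directory that plays the role of / in the image
  mkdir -p "$root"
  # Copy the binary itself, recreating its full path under $root.
  cp --parents "$bin" "$root"
  # ldd lines look like "libc.so.6 => /lib/.../libc.so.6 (0x...)" or
  # "/lib64/ld-linux-x86-64.so.2 (0x...)"; grab every absolute path and
  # copy the file it points to (-L dereferences symlinked libraries).
  ldd "$bin" | grep -o '/[^ ]*' | while read -r lib; do
    cp -L --parents "$lib" "$root"
  done
}

# Example: stage sed; libraries loaded later via dlopen() are NOT picked up.
stage_binary "$(command -v sed)" ./sed-root
find ./sed-root -type f
```

The resulting ./sed-root could then be archived and added to a FROM scratch image; dockerize takes care of that last step (and more) for you.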

In order to make things simpler, I created an environment based off of rosenhouse/phantomjs2 with dockerize in place and started hacking away on top of it. After lots of experiments, this is what I put together to produce the sources for a minimalist PhantomJS image:

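# For the most up to date version of this, please check the link below:
#   https://github.com/fgrehm/docker-phantomjs2/blob/master/dockerize-phantomjs
#
# -n: only generate the Dockerfile and files, don't run docker build
# -o: output directory; -e: entrypoint for the image
# -a SRC DST: add files that are not shared library dependencies
#   (a /bin/sh, the /etc/fonts configuration and SSL certificates)
# curl is listed next to phantomjs so its shared libraries get vendored too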
dockerize -n -o dockerized-phantomjs \
          -e $(which phantomjs) \
          -a /bin/dash /bin/sh \
          -a /etc/fonts /etc  \
          -a /etc/ssl /etc  \
          --verbose \
          $(which phantomjs) \
          /usr/bin/curl

Can’t get any easier than that, right?

But it actually took me a while to get it right. As you might imagine, things did not work on my first attempts with dockerize. The tool itself works perfectly, but for reasons I haven’t figured out yet, the phantomjs CLI did not work properly on my new image unless I also vendored curl-related dependencies. Another thing that gave me trouble was that system fonts were not being included in the new image, so screenshots produced by phantomjs for some apps came out with blank text blocks.

My advice, in case you decide to try out dockerize with some other executable, is to install the executable in a container with docker run, then run docker diff CONTAINER_ID afterwards and look for potential “suspects” that might be missing from your minimal image.
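A small filter can make that docker diff output easier to scan. docker diff prints one change per line, prefixed with A (added), C (changed) or D (deleted); the hypothetical list_suspects function below keeps the added and changed paths and drops a few directories that are usually noise. The exclusion list is an assumption, so adjust it to your case.

```shell
#!/bin/sh
# Hypothetical filter for `docker diff` output. Each line is a change type
# (A = added, C = changed, D = deleted) followed by a path. Added/changed
# paths are candidates ("suspects") to vendor into the minimal image.
list_suspects() {
  # Keep A and C entries and strip the 2-character prefix (this also keeps
  # paths with spaces intact), then drop directories that are usually just
  # noise; the exclusion list is an assumption, tune it for your executable.
  awk '/^[AC] / { print substr($0, 3) }' |
    grep -Ev '^/(tmp|root|var/cache|var/lib/apt)(/|$)' || true
}

# Typical use (requires Docker and a container you provisioned yourself):
#   docker diff CONTAINER_ID | list_suspects
```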

Publishing the image on Docker Hub

With everything running smoothly locally, the next step was to set up an Automated Build and get the image on the Hub. There is just a small gotcha: AFAIK there is no way to run docker inside a docker build, since docker build does not support the --privileged flag required to run nested Docker instances.

To work around that, I created a GitHub release on the project with a tarball of all dependencies that can be extracted under /.
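One way to produce a tarball like that (a sketch, not necessarily how the release tarball was built) is to stage the files in a directory that mirrors / and archive that directory’s contents rather than the directory itself, so the entries land in the right place when extracted with tar xz -C /. The rootfs directory and the hello stand-in binary are hypothetical.

```shell
#!/bin/sh
set -eu
# Hypothetical layout: ./rootfs mirrors the final image, so a file meant to
# live at /usr/local/bin/hello is staged at ./rootfs/usr/local/bin/hello.
mkdir -p rootfs/usr/local/bin
printf '#!/bin/sh\necho hello\n' > rootfs/usr/local/bin/hello  # stand-in binary
chmod +x rootfs/usr/local/bin/hello

# Archive the *contents* of rootfs (note the "-C rootfs ."), so entries are
# stored as ./usr/local/bin/hello and `tar xz -C /` puts them under /.
tar czf rootfs.tar.gz -C rootfs .

# List the entries to double check the paths before publishing the tarball.
tar tzf rootfs.tar.gz
```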

Based on the Dockerfile I used to build the image locally, my initial Dockerfile looked like this:

FROM scratch
ADD https://github.com/fgrehm/docker-phantomjs2/releases/download/v2.0.0-20150722/dockerized-phantomjs.tar.gz /
ENTRYPOINT ["/usr/local/bin/phantomjs"]

But that did not work and the reason can be found in the ADD instruction docs:

If <src> is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory. Resources from remote URLs are not decompressed. When a directory is copied or unpacked, it has the same behavior as tar -x, the result is the union of:

  1. Whatever existed at the destination path and
  2. The contents of the source tree, with conflicts resolved in favor of “2.” on a file-by-file basis.
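The “union” rule in that quote is the same thing plain tar -x does when extracting into an existing directory, which is also what you get when extracting a tarball onto / by hand. The sketch below uses throwaway files under a mktemp directory to show that files already at the destination survive and that the archive’s copy wins on a name clash. Keep in mind Docker only does this for local archives; a remote URL passed to ADD is copied as-is.

```shell
#!/bin/sh
set -eu
# Demonstrates the "union" rule using plain tar -x: existing files survive,
# and on a name clash the archive's copy wins.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/dest/etc"

# Contents of the archive ("the source tree").
echo from-archive > "$work/src/etc/shared.conf"
echo only-in-archive > "$work/src/etc/new.conf"
tar czf "$work/layer.tar.gz" -C "$work/src" .

# Whatever already exists at the destination path.
echo pre-existing > "$work/dest/etc/shared.conf"   # clashes, gets overwritten
echo untouched > "$work/dest/etc/keep.conf"        # not in the archive, kept

tar xzf "$work/layer.tar.gz" -C "$work/dest"

cat "$work/dest/etc/shared.conf"   # from-archive
cat "$work/dest/etc/keep.conf"     # untouched
cat "$work/dest/etc/new.conf"      # only-in-archive
```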

I got tricked by the fact that local files ADDed to an image are automagically extracted but remote ADDed files are not. Since the scratch image has an empty filesystem (no shell, tar or curl to unpack the download myself), my solution was to keep things simple again and switch to an Alpine Linux base image, install curl, and pipe the tarball from GitHub into tar to extract it on top of the image’s / without relying on the ADD behavior:

FROM gliderlabs/alpine:3.2
RUN apk-install curl \
    && curl -Ls https://github.com/fgrehm/docker-phantomjs2/releases/download/v2.0.0-20150722/dockerized-phantomjs.tar.gz \
       | tar xz -C /
ENTRYPOINT ["/usr/local/bin/phantomjs"]

I went with the download mostly because the phantomjs tarball is ~50MB and I didn’t want to keep it in source control. If your executable is small, though, I’d recommend keeping the tarball around and sticking to the FROM scratch + ADD tarball.tgz / combo when possible.
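For reference, a sketch of that FROM scratch + ADD combo, assuming a small tarball.tgz (with entries relative to /) is committed next to the Dockerfile. The heredoc just writes the Dockerfile; the build step is left as a comment since it needs Docker, and /usr/local/bin/myapp is a hypothetical entrypoint standing in for your executable.

```shell
#!/bin/sh
set -eu
# Sketch of the "FROM scratch + ADD tarball.tgz /" combo. Assumes tarball.tgz
# (entries relative to /) is committed next to the Dockerfile. Because it is
# a *local* archive, ADD unpacks it into / at build time.
# /usr/local/bin/myapp is a hypothetical entrypoint; use your executable.
cat > Dockerfile <<'EOF'
FROM scratch
ADD tarball.tgz /
ENTRYPOINT ["/usr/local/bin/myapp"]
EOF

# Then build it (requires Docker):
#   docker build -t my-minimal-image .
```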

That’s it

The image is already available on the Docker Hub as fgrehm/phantomjs2 so feel free to give it a try.

Some of you might question the security of using my image, since it involves extracting a remote tarball on top of /. If you are one of those, feel free to build the image yourself: it’s as easy as a git clone followed by make build.local. If you want to go hardcore and/or have the time to spend building phantomjs from source, you can run make phantomjs.build build.local :)

So far things have been working fine for the examples provided by the phantomjs project itself and some other hacks I put together, but please let me know in case you have any trouble!

© Fabio Rehm 2013-2022