May 24, 2013

Stop wasting bandwidth with vagrant-cachier

If you have done any kind of Puppet manifest or Chef cookbook development using Vagrant, chances are that you’ve been staring at your screen for really long periods of time waiting for the machine to be provisioned, especially when you need to destroy the VM and start over.

A while ago I came across this gist, which solves part of the issue by caching downloaded packages on the host machine and sharing them among similar VM instances. After copying and pasting it into different projects, I decided to extract it into a Vagrant plugin and expand its usage by supporting multiple Linux distros and package managers, allowing others to benefit from it as well.

I started spiking the plugin a while ago and, after using it on a couple of projects, today I went ahead and open sourced it. The code is not the best you’ll find around, but right now it supports caching for APT, Yum, Pacman and RubyGems packages, and I’m planning to add others as needed.

On a side note, this is probably the first Vagrant plugin to make use of guest capabilities that I’m aware of ;)

How does it work?

From the project’s README:

Under the hood, the plugin will hook into calls to Vagrant::Builtin::Provision during vagrant up / vagrant reload and will set things up for each configured cache bucket. Before halting the machine, it will revert the changes required to set things up by hooking into calls to Vagrant::Builtin::GracefulHalt so that you can repackage the machine for others to use without requiring users to install the plugin as well.

Cache buckets will be available from /tmp/vagrant-cachier on your guest and the appropriate folders will get symlinked to the right path after the machine is up but right before it gets provisioned. We could potentially do it in one go and share the buckets’ folders directly to the right path if we were only using VirtualBox, since it shares folders after booting the machine, but the LXC provider does that as part of the boot process (shared folders are actually lxc-start parameters) and as of now we are not able to get some information that this plugin requires about the guest machine before it is actually up and running.
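To make the symlinking step a bit more concrete, here is a minimal sketch of the idea in plain Ruby. It is not the plugin's actual code: the `link_bucket` and `unlink_bucket` helpers are hypothetical names made up for illustration, and a temporary directory stands in for the guest filesystem so it can run anywhere.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of vagrant-cachier): point a guest path at a
# bucket folder that lives under the shared cache root.
def link_bucket(cache_root, bucket, guest_path)
  bucket_dir = File.join(cache_root, bucket)
  FileUtils.mkdir_p(bucket_dir)
  FileUtils.mkdir_p(File.dirname(guest_path))

  unless File.symlink?(guest_path)
    # Drop the guest's own (uncached) directory so the link can take its place.
    FileUtils.rm_rf(guest_path)
    FileUtils.ln_s(bucket_dir, guest_path)
  end
  guest_path
end

# Hypothetical helper: undo the link before halting, leaving a plain empty
# directory behind so the machine no longer depends on the shared folder.
def unlink_bucket(guest_path)
  return unless File.symlink?(guest_path)

  File.delete(guest_path)      # removes the symlink only, not the cached files
  FileUtils.mkdir_p(guest_path)
end

# Demo against a throwaway directory standing in for the guest's root.
Dir.mktmpdir do |root|
  cache_root = File.join(root, "tmp", "vagrant-cachier")
  apt_path   = File.join(root, "var", "cache", "apt", "archives")

  link_bucket(cache_root, "apt", apt_path)
  puts "linked:   #{File.symlink?(apt_path)}"

  unlink_bucket(apt_path)
  puts "reverted: #{!File.symlink?(apt_path)}"
end
```

Anything a package manager writes through the linked path ends up in the bucket folder, which is what lets a freshly created VM find the packages already downloaded.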

Please keep in mind that this plugin won’t do magic: if you are compiling things during provisioning or manually downloading packages that do not fit into a “cache bucket”, you won’t see that much of an improvement.

UPDATE: Please refer to the project’s docs for the most up-to-date information about it. Things have changed a bit lately and are likely to change a bit more ;)

Show me the numbers!

I’ve done some pretty basic testing on four different boxes doing something along the lines of this script on VirtualBox VMs with NFS and machine cache scope enabled:

The times shown below are just for provisioning after the machine has already been brought up on a 35mb connection:

[Table columns: First provision | Second provision | Diff. | APT cache]

As I said, the plugin does not do any magic and it will just save you from downloading packages. For instance, my rails-base-box compiles Ruby 2.0 from source using ruby-build and there’s nothing much we can do about it (apart from using precompiled rubies of course).

If you do the maths, on average those numbers represent a ~41% drop in provisioning time. In my opinion this alone is a huge win, especially if you are running a CI server, as it means a faster feedback loop. It also means that if a mirror is slower than usual for some weird reason, or if you are on a 3G connection, it’ll save you a few MBs worth of package downloads. Not to mention that it is “against etiquette towards package hosters” to download those files over and over throughout the day.
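For anyone who wants to run the same maths on their own boxes, a rough sketch of the calculation follows. The helper names are hypothetical, and the timings in the example are made-up placeholders, not the measurements from my tests.

```ruby
# Hypothetical helper: percentage drop between a first (cold cache) and a
# second (warm cache) provisioning run, both given in seconds.
def percent_drop(first_secs, second_secs)
  raise ArgumentError, "first run must take a positive time" unless first_secs.positive?

  ((first_secs - second_secs) / first_secs.to_f * 100).round(1)
end

# Hypothetical helper: plain average of the per-box drops.
def average_drop(runs)
  drops = runs.map { |first_secs, second_secs| percent_drop(first_secs, second_secs) }
  (drops.sum / drops.size).round(1)
end

# Made-up timings for illustration only: [first provision, second provision].
example_runs = [[300, 180], [200, 100]]

example_runs.each do |first_secs, second_secs|
  puts "#{first_secs}s -> #{second_secs}s: #{percent_drop(first_secs, second_secs)}% drop"
end
puts "average: #{average_drop(example_runs)}%"
```

Note that this averages the per-box percentages, so a small box counts as much as a big one; summing all the times first would give a different figure.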

So please, be nice and stop wasting your own and other people’s bandwidth :)


I’ve been watching what people have been saying about the plugin on Twitter, and it looks like some have experienced an even bigger drop in provisioning time.

© Fabio Rehm 2013-2022
