This month, I'll be writing about how I set up my self-hosted continuous integration system (powered by Jenkins and Docker). In this initial post, I want to provide some motivation for why one would want to do this and explain what continuous integration (CI) actually is.

Setting it up took a lot of scouring the internet, so this tutorial aims to be a centralized reference for minimally using Jenkins with containerized builds, along with a sample configuration you can adapt.

I'll link the individual parts here as I write them:

Why even test research code?

It's no secret that researchers have terrible code quality. Unlike in software engineering, where the code is the product, in research, code is merely a means for running experiments and getting results. As a result, it's naturally not a primary focus of researchers. That being said, I think that open science and reproducibility are critical for scientific progress. If more people released better code, it'd be easier to build on prior work and to have reproducible, trustworthy baselines.

To this end, I've always found unit tests to be vital in my machine learning / NLP research code. Machine learning systems are full of moving parts and complex logic. To me, unit tests have a few roles (a small sketch of what such tests can look like follows the list):

  • Make sure that my data pipeline does what I want it to.
  • Ensure that my models minimally train without crashing on toy data.
  • Verify that new code I write doesn't break anything I've already written.
  • Check that my models can save / load properly, and that the results are reproducible across runs.
  • Help me catch bugs in models before I actually train on real data.
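
To make a couple of these concrete, here's a minimal sketch of what such tests can look like, assuming a pytest-style test suite. The `ToyModel` class, its `fit` / `save` / `load` / `predict` methods, and the tiny synthetic dataset are all hypothetical stand-ins for whatever your project actually defines; the point is the shape of the tests, not the specifics.

```python
import numpy as np

from my_project.models import ToyModel  # hypothetical model class


def make_toy_data(n=32, dim=4, seed=0):
    """Build a tiny synthetic dataset so the tests run in seconds."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n, dim)
    y = (X.sum(axis=1) > 0).astype(int)
    return X, y


def test_model_trains_on_toy_data():
    # At a minimum, training should complete without crashing on toy data.
    X, y = make_toy_data()
    model = ToyModel(input_dim=X.shape[1])
    model.fit(X, y, epochs=2)


def test_save_load_is_reproducible(tmp_path):
    # Predictions should be identical before and after a save / load round trip.
    X, y = make_toy_data()
    model = ToyModel(input_dim=X.shape[1])
    model.fit(X, y, epochs=2)
    before = model.predict(X)

    checkpoint = tmp_path / "model.ckpt"
    model.save(str(checkpoint))
    restored = ToyModel.load(str(checkpoint))

    np.testing.assert_allclose(before, restored.predict(X))
```

Tests like these are cheap to run on every push, which is exactly what a CI server automates.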

Suppose you're trying out a new research idea, be it a new model architecture, initialization strategy, or something else. If you run the code and don't get the results you want, you're left in a quandary --- is something wrong with your code, or does the idea simply not work? Unit tests allow me to rule out the first option with some degree of confidence (assuming the tests are good).

Why set up your own CI server?

In short, continuous integration (CI) tools automatically run your tests after each push to a branch in your version control system to make sure that things still work -- ideally, this prevents you from ever merging code that fails tests into master. In addition, CI can test your code across a host of different environments (e.g. different versions of Python, CPU vs. GPU, etc.).

There are many great options for cloud-hosted continuous integration out there. I'm partial to TravisCI, as it's quite easy to configure and very reliable. They generously provide free access to their service for testing open source repos, so it's a very attractive option if you're working in a public Github repository.

However, much of the research code I write is in private repos. TravisCI provides students with 1 concurrent job for private repos as part of the Github student developer pack. For research projects where the tests can take a non-trivial amount of time to run, this 1-job cap can really slow down progress. For example, I was recently working on a project where each build took around 7 minutes. Since I ran Travis on Python 2.7, 3.5, and 3.6, each push took a whopping 21 minutes to build! Waiting 21 minutes to verify that a push passes tests is far too long, especially when you're usually writing more code in the meantime and you want to merge features as fast as possible. As a result, I turned to setting up my own continuous integration server with the open-source Jenkins to speed up my builds and let me iterate faster.
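
For reference, the Travis setup described above boils down to a configuration roughly like this (a hedged sketch: the install and test commands are assumptions, here a requirements file plus a pytest-based suite, and will differ per project):

```yaml
language: python
python:
  - "2.7"
  - "3.5"
  - "3.6"
install:
  - pip install -r requirements.txt
  - pip install pytest
script:
  - pytest
```

With only 1 concurrent job available, these three builds run one after another, which is where the roughly 21-minute turnaround comes from.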

Next: Self-hosted CI for Research, Part 1: Running Jenkins builds in Docker Containers