Introducing Strong-Bootcamp: An Ultra-lightweight Solution for Rigorous Machine Learning Development

At Strong, we help companies integrate machine learning into products, internal tools and automated processes.

Through our work, we’ve learned a lot about what makes a machine learning project or platform successful. Of course, a large part of an ML project's success can be attributed to doing the necessary algorithmic research and optimization to settle on the best solution. But another major contributor, separate from the particulars of any solution, is how one approaches the problem:

How should we compare models?
How can we collaborate on the problem?
How should we measure performance?
How should we integrate the ML components with the rest of the project?
How will we share training and validation data?
How do we modify models after deployment?

Answering these questions isn't easy, and doing so is well beyond the scope of this brief blog post (sorry!). But when you do answer them, it's important to reify your approach by building a technical ecosystem that enforces your decisions.

By settling on a rigorous, well-defined approach, you can avoid problematic practices we see in the field, such as:

Training and comparing models on different datasets. Which model is better? Well, place your bets, then throw those bets out the window because you'll never really know!
Building applications around the particulars of a certain algorithmic implementation, instead of allowing for new implementations to be easily swapped in later.
Building models in a Jupyter notebook and, when happy with performance, doing some manual export/import process to embed and deploy the model (with fingers crossed, of course).
Not logging and archiving experimentation results. On the one hand, it’s a huge wasted effort for your data science team. On the other hand, it can be a real bonding experience for every new data scientist to unknowingly run the same experiments and run into the same issues.

Each of these practices makes model development and deployment less rigorous and more error-prone. More generally, they are hallmarks of unreproducible science — processes that are inadequately documented and difficult or impossible to replicate.

Existing data-science-platform-as-a-service offerings like Domino can enforce rigorous standards by requiring data scientists to adopt the platform's workflow. However, third-party services like this can also add a lot of overhead in terms of cost and process complexity for an otherwise simple project executed by a small team.

For these simpler cases, we built strong-bootcamp, an ultra-lightweight Python package that provides a flexible ecosystem in which to develop, compare, and deploy machine learning flows with increased rigor and reproducibility.

How it works

Build a “bootcamp”

A bootcamp is a simple Python class that gathers data (most likely in __init__()), passes that data to a candidate model in train(), and passes validation data to the trained model in validate() before calculating standardized performance metrics from the model’s predictions.

Example:

from collections import namedtuple

# define an example data structure that will be used
text = namedtuple("text", ["string"])


class Camp(object):
    def __init__(self):
        # pre-load data for use in training and testing
        self.training_data = [
            text(string="I am a nice string"),
            text(string="I am also a very average string"),
        ]

        self.training_classes = [True, False]

        self.validation_data = [
            text(string="I am a new string"),
            text(string="I am another new string"),
        ]

        self.validation_classes = [False, False]

    def train(self, model):
        """
        Receive an untrained model and pass it whatever is required to train it.
        """
        model.train(text=self.training_data, classes=self.training_classes)

    def validate(self, model):
        """
        Receive a trained model and pass it whatever is required to validate it. Return
        a dictionary of validation metrics.
        """

        # before doing statistical validation, we can do some miscellaneous validations
        if type(model).__name__ == "random":
            raise Exception(
                "Models can't be named random. It scares the marketing team."
            )

        # run the validation and return the results
        predictions = model.predict(text=self.validation_data)

        return {
            "accuracy": sum(
                [
                    1 if p == self.validation_classes[i] else 0
                    for i, p in enumerate(predictions)
                ]
            )
            / float(len(predictions))
        }
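
For illustration, here is a minimal sketch of a model that this bootcamp could train and validate. The class name `KeywordModel`, its `n_epochs` default, and its deliberately trivial learning rule are assumptions made for this example and are not part of strong-bootcamp. The only things taken from the bootcamp above are the `train(text, classes)` and `predict(text)` calls that `Camp` makes.

```python
from collections import namedtuple

# Same example data structure as in the bootcamp above
text = namedtuple("text", ["string"])


class KeywordModel(object):
    """Hypothetical toy model: remembers words seen only in positive examples."""

    def __init__(self, n_epochs=1):
        # n_epochs is accepted for interface compatibility; this toy model ignores it
        self.n_epochs = n_epochs
        self.positive_words = set()

    def train(self, text, classes):
        positive, negative = set(), set()
        for item, label in zip(text, classes):
            words = set(item.string.lower().split())
            (positive if label else negative).update(words)
        # keep only the words that never appear in a negative example
        self.positive_words = positive - negative

    def predict(self, text):
        # predict True when an item contains any positive-only word
        return [
            bool(self.positive_words & set(item.string.lower().split()))
            for item in text
        ]


model = KeywordModel()
model.train(
    text=[
        text(string="I am a nice string"),
        text(string="I am also a very average string"),
    ],
    classes=[True, False],
)
predictions = model.predict(
    text=[text(string="I am a new string"), text(string="I am another new string")]
)
```

Because the model only ever sees data through `train()` and `predict()`, any other class with the same two methods could be dropped in without touching the bootcamp.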

Configure your Bootcamp

Bootcamp configuration files are YAML files that point to the location of your Bootcamp class (above), define the minimum requirements for models that can solve this problem, and define the metrics that will be gathered.

These simple configuration files are loaded and used to check models before training begins, so that an interface bug doesn't surface a few hours into a run. They also serve to document the problem and the requirements for potential solutions. Ideally, this encourages model building, especially later, after the details of the problem have been largely forgotten and companies run the risk of leaving models as-is because people are afraid of breaking things.

An example configuration file:

bootcamp:
  module: example.camp
  callable: Camp
model_requirements:
  parameters:
    - n_epochs
  methods:
    train: [text, classes]
    save: [path]
    load: [path]
    predict: [text]
  validation_metrics: accuracy
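
To make the idea of checking models up front concrete, here is a minimal sketch of how such requirements could be verified with Python's standard `inspect` module. This is not strong-bootcamp's actual implementation: the `check_requirements` helper, the `GoodModel` and `IncompleteModel` classes, and the dict form of the configuration (with `parameters` treated as a list) are all assumptions made for illustration.

```python
import inspect


def check_requirements(model_class, requirements):
    """Return a list of problems; an empty list means the class passes."""
    problems = []

    # every required hyperparameter must be accepted by the constructor
    init_params = inspect.signature(model_class).parameters
    for name in requirements.get("parameters", []):
        if name not in init_params:
            problems.append("missing parameter: %s" % name)

    # every required method must exist and accept the listed arguments
    for method_name, arg_names in requirements.get("methods", {}).items():
        method = getattr(model_class, method_name, None)
        if not callable(method):
            problems.append("missing method: %s" % method_name)
            continue
        method_params = inspect.signature(method).parameters
        for arg in arg_names:
            if arg not in method_params:
                problems.append("%s() is missing argument: %s" % (method_name, arg))

    return problems


class GoodModel(object):
    """Hypothetical model that satisfies every requirement."""

    def __init__(self, n_epochs=10):
        self.n_epochs = n_epochs

    def train(self, text, classes):
        pass

    def save(self, path):
        pass

    def load(self, path):
        pass

    def predict(self, text):
        return [False for _ in text]


class IncompleteModel(object):
    """Hypothetical model with a wrong train() signature and missing methods."""

    def train(self, text):
        pass


# the model_requirements section, written as a plain dict for this sketch
requirements = {
    "parameters": ["n_epochs"],
    "methods": {
        "train": ["text", "classes"],
        "save": ["path"],
        "load": ["path"],
        "predict": ["text"],
    },
}

good_problems = check_requirements(GoodModel, requirements)
bad_problems = check_requirements(IncompleteModel, requirements)
```

A check like this only looks at signatures, so it costs almost nothing to run before every training job.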

Configure your models

All that’s left is a final configuration file that declares the active candidate models for the bootcamp (aside from, you know, actually building and implementing the models…).

Model configuration files name each model, describe where it can be found (models are assumed to be Python classes that live in an importable module), and list the ranges of hyperparameters to explore in a grid search every time you run the bootcamp.

Here’s an example of a model configuration file for two models with slightly different parameterizations:

models:
  linear:
    module: example.models.linear
    callable: LinearModel
    parameters:
      l1: [.1, .5, .9]
  loglinear:
    module: example.models.loglinear
    callable: LogLinearModel
    parameters:
      l2: [.1, .2, .3]
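
As a rough sketch of what a grid search over these ranges involves, the snippet below expands each model's `parameters` into one run per combination of values using `itertools.product`. The `expand_grid` helper and the plain-dict form of the configuration are assumptions made for this example. How strong-bootcamp itself enumerates the grid may differ.

```python
from itertools import product


def expand_grid(parameters):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = sorted(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# the "models" section, written as a plain dict for this sketch
models = {
    "linear": {"parameters": {"l1": [0.1, 0.5, 0.9]}},
    "loglinear": {"parameters": {"l2": [0.1, 0.2, 0.3]}},
}

# one (model name, parameter setting) pair per training run
runs = [
    (name, params)
    for name, spec in models.items()
    for params in expand_grid(spec["parameters"])
]
```

With one hyperparameter of three values per model, this yields six runs in total. Adding a second hyperparameter to a model multiplies its run count by the number of values listed, which is worth keeping in mind when choosing ranges.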

Run the bootcamp

Once everything is in place, you can run the bootcamp at any time from the command line:

bootcamp --config=bootcamp.yml --models=models.yml --results=./results/

Once started, the bootcamp will:

Validate each of your models against the bootcamp's model requirements
Prepare the bootcamp
Prepare the grid-search for each candidate model
Initialize and then pass each model for training
Validate each model and save the metrics (and parameter/model metadata) to the results folder.
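
The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the package's real code: `load_callable`, `run_bootcamp`, `TinyCamp`, and `ConstantModel` are hypothetical names, and results are kept in memory rather than written to a results folder.

```python
import importlib


def load_callable(module_name, callable_name):
    """Resolve a module / callable pair from a config file to a Python object."""
    return getattr(importlib.import_module(module_name), callable_name)


def run_bootcamp(camp, candidates):
    """Train and validate every (name, model_class, params) candidate on one camp."""
    results = []
    for name, model_class, params in candidates:
        model = model_class(**params)   # initialize with one grid-search setting
        camp.train(model)               # the camp supplies the shared training data
        metrics = camp.validate(model)  # the camp computes the shared metrics
        results.append({"model": name, "parameters": params, "metrics": metrics})
    return results


class TinyCamp(object):
    """Stand-in bootcamp with two training and two validation examples."""

    def train(self, model):
        model.train(text=["good", "bad"], classes=[True, False])

    def validate(self, model):
        predictions = model.predict(text=["good", "bad"])
        expected = [True, False]
        correct = sum(p == e for p, e in zip(predictions, expected))
        return {"accuracy": correct / float(len(expected))}


class ConstantModel(object):
    """Stand-in model that always predicts the value it was configured with."""

    def __init__(self, value=True):
        self.value = value

    def train(self, text, classes):
        pass  # nothing to learn

    def predict(self, text):
        return [self.value for _ in text]


candidates = [
    ("constant", ConstantModel, {"value": True}),
    ("constant", ConstantModel, {"value": False}),
]
results = run_bootcamp(TinyCamp(), candidates)

# load_callable works on any importable module; a stdlib class is used here
ordered_dict_class = load_callable("collections", "OrderedDict")
```

The loop never inspects a model beyond calling its constructor, which is what lets every candidate see exactly the same data and be scored with exactly the same metrics.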

You can monitor progress as you go in the logs. Results are saved as JSON files with metadata and metrics, and are therefore easily importable into R or Python for further analysis and visualization.
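
As an example of that kind of follow-up analysis, the sketch below reads a folder of JSON results and picks the run with the highest accuracy. The file names and the `model`, `parameters`, and `metrics` keys are assumptions made for this example, since the exact layout of strong-bootcamp's result files is not described here. The two result files it writes contain placeholder values, not real experiment results.

```python
import json
import os
import tempfile


def load_results(results_dir):
    """Read every .json file in results_dir into a list of dicts."""
    loaded = []
    for filename in sorted(os.listdir(results_dir)):
        if filename.endswith(".json"):
            with open(os.path.join(results_dir, filename)) as handle:
                loaded.append(json.load(handle))
    return loaded


def best_run(results, metric):
    """Return the run with the highest value for the given metric."""
    return max(results, key=lambda run: run["metrics"][metric])


# Write two made-up result files so that the sketch runs on its own.
# Assumed schema and placeholder numbers; adjust to the real files.
results_dir = tempfile.mkdtemp()
example_runs = [
    {"model": "linear", "parameters": {"l1": 0.1}, "metrics": {"accuracy": 0.5}},
    {"model": "loglinear", "parameters": {"l2": 0.2}, "metrics": {"accuracy": 1.0}},
]
for index, run in enumerate(example_runs):
    with open(os.path.join(results_dir, "run_%d.json" % index), "w") as handle:
        json.dump(run, handle)

results = load_results(results_dir)
best = best_run(results, "accuracy")
```

Since each result carries its parameter settings alongside its metrics, the best run identifies both which model to deploy and how to configure it.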

Why use Strong Bootcamp?

Going back to the questions we touched on earlier, and the kinds of problems we saw in machine learning, one can see how strong-bootcamp implements a few key principles that can help make machine learning development projects successful. It helps you to:

Build and test models collaboratively in a shared ecosystem.
Strictly enforce shared training/validation data across models.
Gather the same metrics for each candidate model.
Log experiment results in an archive with appropriate metadata.

Moreover, because it’s so lightweight, it lets you train and validate models in situ (i.e., in the context of your broader application). Models expose APIs, but the bootcamp and the rest of the application otherwise remain completely naïve to the models’ internals. This solves two additional problems:

Deploying models should be easy. Models are validated and tested before deployment and, in most cases, ‘deployment’ should just mean pointing to your best-performing model without any error-prone, manual intervention required. A good rule of thumb (and a sign that your bootcamp is properly set up): A model that makes it through your bootcamp is fit to deploy.
You won't get cornered into a certain implementation. By defining the minimum requirements for a model and validating those that are essential for integration into your application, you encourage exploration and iteration on all other fronts.

We don’t always use strong-bootcamp. Sometimes it’s too simple, and it’s far from the full-featured data platforms that some projects and teams demand. Nevertheless, it can provide just the right amount of structure to facilitate model development and deployment for some projects.

Interested in trying it out yourself? Check out the strong-bootcamp GitHub repository!