What is CI/CD for Machine Learning?

In traditional software, CI/CD (Continuous Integration/Continuous Deployment) is about automatically testing and deploying code. In Machine Learning, the system is composed of code + data + model. A change in any of these components can break your system.

CI/CD for ML is the practice of automating the testing and deployment of the entire ML pipeline, ensuring that every change, whether to code, data, or model, results in a robust, validated system.

1. Unit Testing Your Model

Your trained model is a software artifact. You should test it like one! Model unit tests don't check for statistical performance (like accuracy), but for programmatic correctness and expected behavior.

What to test:

  • Input/Output Shape: Does the model crash if it receives an input with the wrong number of features? Does its output have the expected shape?
  • Data Types: Does the model handle the correct data types (float32, int64, etc.)?
  • Prediction on a Known Example: If you give the model a specific input for which you know the expected output, does it produce it? This catches regressions.
  • Invariance: The model's prediction shouldn't change based on irrelevant factors. For example, for tabular data, the prediction should be the same even if you shuffle the column order (assuming you use names, not indices).

Code Example (using pytest):

Python


import pytest
import joblib
import pandas as pd

@pytest.fixture
def model():
    # Load a pre-trained model for testing
    return joblib.load("models/titanic_classifier.joblib")

def test_model_prediction_on_known_sample(model):
    """Test the model's prediction on a single, known data point."""
    # Data for a passenger who should be classified as 'survived'
    sample_data = pd.DataFrame({
        'Pclass': [1], 'Sex': ['female'], 'Age': [38.0],
        'SibSp': [1], 'Parch': [0], 'Fare': [71.2833]
    })

    prediction = model.predict(sample_data)[0]
    assert prediction == 1, "Prediction for known 'survived' sample is incorrect"

def test_model_output_shape(model):
    """Test that the model output is a 1D array."""
    sample_data = pd.DataFrame({
        'Pclass': [3], 'Sex': ['male'], 'Age': [22.0],
        'SibSp': [1], 'Parch': [0], 'Fare': [7.25]
    })
    prediction = model.predict(sample_data)
    assert len(prediction.shape) == 1
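The invariance property from the checklist above can be tested in the same style. A minimal, self-contained sketch follows; the `NameBasedModel` class is a hypothetical stand-in so the example runs on its own, whereas the real suite would use the `model` fixture that loads `models/titanic_classifier.joblib`:

```python
import pandas as pd

class NameBasedModel:
    """Hypothetical stand-in for the trained classifier.

    Selects features by column name, so predictions should be
    invariant to column order.
    """
    def predict(self, df):
        # Predicts 1 ('survived') when Sex == 'female'
        return (df['Sex'] == 'female').astype(int).to_numpy()

def test_prediction_invariant_to_column_order(model=None):
    """Shuffling the column order must not change the prediction."""
    model = model or NameBasedModel()
    sample = pd.DataFrame({
        'Pclass': [1], 'Sex': ['female'], 'Age': [38.0],
        'SibSp': [1], 'Parch': [0], 'Fare': [71.2833]
    })
    # Same row, columns in a different order
    shuffled = sample[['Fare', 'Age', 'Sex', 'Parch', 'Pclass', 'SibSp']]
    assert (model.predict(sample) == model.predict(shuffled)).all()
```

If this test fails, it usually means the model (or its preprocessing) relies on positional indices rather than column names.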

2. Testing and Validating Your Data

Your model is only as good as your data. Data tests are crucial for preventing model failures caused by unexpected changes in the input data: the classic "Garbage In, Garbage Out" problem.

Libraries like Pandera or Great Expectations are excellent for this. They allow you to define a "schema" for your data and validate that your dataframes conform to it.

Code Example (using pandera):

Python


import pandas as pd
import pandera as pa
import pytest

# Define the expected schema for your training data
data_schema = pa.DataFrameSchema({
    "Age": pa.Column(float, checks=pa.Check.in_range(0, 100), nullable=True),
    "Sex": pa.Column(str, checks=pa.Check.isin(["male", "female"])),
    "Pclass": pa.Column(int, checks=pa.Check.isin([1, 2, 3])),
    "Survived": pa.Column(int, checks=pa.Check.isin([0, 1])),
})

def test_training_data_schema():
    """Validates the schema of the raw training data."""
    raw_data = pd.read_csv("data/train.csv")
    try:
        data_schema.validate(raw_data, lazy=True)
    except pa.errors.SchemaErrors as err:
        pytest.fail(f"Schema validation failed: {err}")

3. Automating with a CI Pipeline

You can put these tests together in a CI pipeline using tools like GitHub Actions or GitLab CI. The pipeline will run automatically whenever new code or data is pushed.

Conceptual CI Pipeline (.github/workflows/ci.yml):

YAML


name: ML Model CI

on: [push]

jobs:
  test-and-validate:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Validate data
        # This step would run your Pandera/Great Expectations tests
        run: pytest tests/test_data.py

      - name: Test model
        # This step would run your model unit tests
        run: pytest tests/test_model.py

      - name: Train model if tests pass
        # You could add a step here to retrain the model
        # and store it as an artifact if all tests are successful
        run: python src/train.py
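The final step calls src/train.py. A minimal sketch of what such a script could look like is below; the two-feature training data is synthetic, the output path mirrors the one used earlier, and scikit-learn is assumed:

```python
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression

def train(output_path="models/titanic_classifier.joblib"):
    # Tiny synthetic stand-in for the real preprocessed feature matrix
    X = [[1, 38.0], [3, 22.0], [1, 35.0], [3, 28.0], [2, 40.0], [3, 19.0]]
    y = [1, 0, 1, 0, 1, 0]

    model = LogisticRegression().fit(X, y)

    # Save the artifact where a later CI step can pick it up
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, output_path)
    return model

if __name__ == "__main__":
    train()
```

In a real pipeline you would typically follow this with an upload step (e.g. actions/upload-artifact) so the saved model file is available to downstream deployment jobs.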