Why Refactor?

Jupyter notebooks are amazing for:

  • Quick prototyping
  • Visualization
  • Storytelling

But they’re poorly suited to production because:

  • Hidden state → results depend on cell execution order
  • Hard to test → no modular functions
  • Messy structure → one giant notebook with 1,000 lines

That’s why we refactor into modular scripts.

🔹 Step 1: Modularize Functions

Instead of copy-pasting code across cells, extract reusable functions.

Before (Notebook Cell):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

After (Script):

# src/preprocessing.py
from sklearn.preprocessing import StandardScaler

def scale_features(X_train, X_test):
    # Fit on the training split only, then apply the same scaling to both
    scaler = StandardScaler()
    return scaler.fit_transform(X_train), scaler.transform(X_test)

🔹 Step 2: Config Management

Hardcoded params → YAML/JSON configs.

# config.yaml
model:
  type: RandomForest
  n_estimators: 100
  max_depth: 5

import yaml  # PyYAML

with open("config.yaml") as f:
    params = yaml.safe_load(f)

print(params["model"]["n_estimators"])  # 100
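Since JSON is the other option mentioned above, here is the same pattern using only the standard library (the file is temporary and its contents simply mirror the YAML config; a sketch, not a prescribed layout):

```python
import json
import os
import tempfile

# Hypothetical config mirroring config.yaml above, as JSON text
config_text = '{"model": {"type": "RandomForest", "n_estimators": 100, "max_depth": 5}}'

# Write it to a temporary file so this sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(config_text)
    path = f.name

# Load it back the same way a script would load config.json
with open(path) as f:
    params = json.load(f)
os.remove(path)

print(params["model"]["n_estimators"])  # 100
```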

🔹 Step 3: Separate Stages into Scripts

  • data_loader.py → load/clean data
  • features.py → feature engineering
  • train.py → training logic
  • evaluate.py → metrics, plots

Now you can run:

python src/train.py --config config.yaml
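For that command to work, train.py needs a small CLI entry point. A minimal sketch with argparse — only the --config flag comes from the command above; everything else is illustrative:

```python
# Sketch of src/train.py's command-line interface
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a model from a config file")
    parser.add_argument("--config", required=True, help="path to a YAML config file")
    return parser.parse_args(argv)

# Simulate `python src/train.py --config config.yaml`
args = parse_args(["--config", "config.yaml"])
print(args.config)  # config.yaml
```

In the real script, parse_args() would read sys.argv and the config path would be handed to the YAML loader from Step 2.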

🔹 Step 4: Automate Pipelines

Use a Makefile or an orchestrator such as Prefect or Airflow to chain tasks.

Makefile Example:

train:
	python src/train.py --config config.yaml
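Extending the single-target example, the four stage scripts from Step 3 can be chained through Make's target dependencies. This is a sketch: it assumes each script accepts the same --config flag shown for train.py, and uses phony targets rather than file-based ones for simplicity.

```makefile
.PHONY: data features train evaluate

data:
	python src/data_loader.py --config config.yaml

features: data
	python src/features.py --config config.yaml

train: features
	python src/train.py --config config.yaml

evaluate: train
	python src/evaluate.py --config config.yaml
```

Running `make evaluate` then executes the whole chain in order.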

🔹 Step 5: Testing & CI

Write unit tests for functions, e.g.:

# tests/test_preprocessing.py
from src.preprocessing import scale_features

def test_scale_features():
    X_train, X_test = [[1], [2]], [[3]]
    X_train_scaled, X_test_scaled = scale_features(X_train, X_test)
    assert X_train_scaled.shape[0] == 2
    assert X_test_scaled.shape[0] == 1

Run the tests automatically on every push with GitHub Actions.
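A minimal workflow file might look like this — a sketch, where the file path, action versions, Python version, and the existence of a requirements.txt are all assumptions:

```yaml
# .github/workflows/ci.yml
name: CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/
```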

🔹 Example Refactor Flow

  1. Notebook → experiment + visualization only
  2. Move functions → src/
  3. Save config in YAML
  4. Add unit tests in tests/
  5. Automate with Makefile + CI