Start with simple baselines
- Mean predictor for regression (always predict the training-set mean).
- Most frequent class for classification.
- Logistic Regression, k-NN, or Decision Tree as early baselines.
Baselines give a minimal performance bar: anything more complex should significantly beat the baseline.
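A minimal sketch of both baselines using scikit-learn's DummyRegressor and DummyClassifier; the synthetic datasets and split are illustrative stand-ins for real data:
from sklearn.datasets import make_classification, make_regression
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.model_selection import train_test_split

# Regression baseline: always predict the training-set mean (synthetic data as a stand-in)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline_reg = DummyRegressor(strategy="mean").fit(X_train, y_train)
print("Baseline R^2:", baseline_reg.score(X_test, y_test))

# Classification baseline: always predict the most frequent class
X, y = make_classification(n_samples=200, n_features=5, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline_clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Baseline accuracy:", baseline_clf.score(X_test, y_test))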
Criteria for model selection
- Accuracy / performance on validation/test.
- Interpretability: feature importance, coefficients.
- Training / inference time and resource usage.
- Maintenance: complexity of deployment and debugging.
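A rough sketch of how two of these criteria can be inspected in scikit-learn; the dataset and models here are placeholders, not a prescription:
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Interpretability: coefficients for linear models, feature importances for trees
logreg = LogisticRegression(max_iter=1000).fit(X, y)
print("Coefficients:", logreg.coef_[0])
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("Feature importances:", tree.feature_importances_)

# Training time: a crude wall-clock measurement
start = time.perf_counter()
LogisticRegression(max_iter=1000).fit(X, y)
print("Training time (s):", time.perf_counter() - start)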
Classification metrics
- Accuracy: (TP+TN)/total — can be misleading with class imbalance.
- Precision: TP / (TP + FP) — proportion of positive predictions that were correct.
- Recall (Sensitivity): TP / (TP + FN) — fraction of actual positives found.
- F1-score: harmonic mean of precision & recall.
- ROC AUC: area under ROC curve — threshold-independent.
- Confusion matrix for deeper analysis.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
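Reusing the imports above, a short usage sketch with toy label arrays (values chosen only for illustration); note that roc_auc_score takes scores or probabilities rather than hard labels:
# Toy labels and predicted probabilities, purely for illustration
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_proba = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]  # predicted P(class = 1)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_proba))  # uses probabilities, not hard labels
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))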
Regression metrics
- MAE (Mean Absolute Error) — average absolute errors.
- RMSE (Root Mean Squared Error) — penalizes large errors more.
- R² (coefficient of determination) — proportion of variance explained.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
rmse = mean_squared_error(y_true, y_pred, squared=False)  # in scikit-learn >= 1.4, root_mean_squared_error(y_true, y_pred) does the same
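Reusing the imports above, a short sketch with toy arrays (values are illustrative only):
# Toy values, purely for illustration
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print("MAE :", mean_absolute_error(y_true, y_pred))
print("RMSE:", mean_squared_error(y_true, y_pred, squared=False))
print("R^2 :", r2_score(y_true, y_pred))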
Model comparison via cross-validation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="f1")
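A sketch of comparing several candidate models on the same folds; the candidate list and synthetic dataset are illustrative choices:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data as a stand-in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")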
Threshold tuning & calibration
- For probability outputs, choose a decision threshold that balances precision and recall for your application.
- Calibration (Platt scaling, isotonic regression) makes predicted probabilities match observed frequencies.
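A sketch of both ideas on synthetic data: sweeping the decision threshold on predicted probabilities, then wrapping a model in CalibratedClassifierCV with isotonic calibration; the thresholds and dataset are arbitrary examples:
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data as a stand-in for a real problem
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Threshold tuning: move the cutoff away from 0.5 to trade precision against recall
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
for threshold in (0.3, 0.5, 0.7):  # arbitrary example thresholds
    pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: precision={precision_score(y_test, pred):.2f}, recall={recall_score(y_test, pred):.2f}")

# Calibration: isotonic regression fitted via internal cross-validation
calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
print("Calibrated probabilities:", calibrated.predict_proba(X_test)[:5, 1])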