🌍 Why Drift Matters
- ML models operate in changing environments.
- Over time, data and user behavior can shift.
- This affects model performance, even if the model doesn’t change.
- Monitoring drift helps catch problems before users are impacted.
📦 Types of Drift
1. Data Drift (Covariate Shift)
- What: Change in distribution of input features (X).
- Why it matters: Model sees new types of data it wasn’t trained on.
- Example: Seasonal changes in customer behavior.
- Detection Metrics:
- KL Divergence
- PSI (Population Stability Index)
- KS Test (Kolmogorov–Smirnov)
- Earth Mover’s Distance (EMD)
2. Concept Drift
- What: Change in the relationship between features (X) and target (Y).
- Why it matters: The model’s understanding becomes outdated.
- Example: What counts as “spam” changes over time.
- Detection Metrics:
- Accuracy decline (if true labels are available)
- DDM (Drift Detection Method)
- ADWIN (Adaptive Windowing)
- Confidence shift in model predictions
🛠 What to Monitor
Compare new incoming data to a reference dataset (from when the model was working well).
Compare Distributions Of:
- 📥 Input features (X)
- 🤖 Model predictions (P)
- ✅ True labels (Y) (if available)
⚠️ Key Insight
- Drift doesn’t always mean the model is broken, but it’s a strong signal to:
- Investigate
- Possibly retrain or adjust the model