Quote
It’s hard to train ML models, but it’s even harder to build production services for these models.
After some time, the quality of ML models starts to degrade, so we need to monitor production services.
There are 5 things to monitor:
- Monitoring service health
- Monitoring model health
- Monitoring data health
- Data and concept drift
- Other ML metrics to collect from monitoring
Each of them has its own metrics to measure. Model and data health monitoring are specific to ML products; ordinary services do not need them.
Summary
Problem: models degrade over time.
Solution: monitoring (collect metrics).
You can collect metrics related to different aspects of the system:
- Service-related metrics, such as uptime and memory usage, are crucial and must be implemented.
- Model-related metrics are split into 2 categories: general and problem-specific metrics.
- Data-related metrics act as a powerful proxy for model quality when ground truth labels are unavailable or delayed. By monitoring changes in input features or prediction distributions, we can detect potential model degradation early.
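To make the data-related idea concrete, here is a minimal sketch of one possible drift check: the Population Stability Index (PSI) between a reference sample (for example, training data) and a recent production sample of a single numeric feature. PSI is only one of several options and is not prescribed by these notes. The function name, the bin count, and the `DRIFT_THRESHOLD` value are assumptions chosen for illustration; the threshold is a rule of thumb that should be tuned per feature.

```python
import math

# Assumed rule-of-thumb threshold, not a universal constant; tune per feature.
DRIFT_THRESHOLD = 0.25


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Return the PSI of `current` relative to `reference` for one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is empty.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        total = len(values)
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / total, eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def has_drifted(reference, current):
    """Flag a feature whose PSI exceeds the assumed threshold."""
    return population_stability_index(reference, current) > DRIFT_THRESHOLD
```

The same function can be applied to the model's prediction scores instead of an input feature, which covers the "prediction distributions" case without needing ground truth labels.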
Reusing the existing monitoring architecture for ML models can save time and resources as you don’t need to build a new monitoring system from scratch. You can start by adding a couple of dashboards and expand to a more sophisticated system later.
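As a rough illustration of reusing one monitoring path for everything, the sketch below records service, model, and data metrics through the same client. `MetricsRegistry` is a hypothetical in-memory stand-in for whatever monitoring client is already in place, and the metric names and the `predict_with_monitoring` wrapper are assumptions made for this example.

```python
import time
from collections import defaultdict
from statistics import mean


class MetricsRegistry:
    """Hypothetical in-memory stand-in for an existing monitoring client."""

    def __init__(self):
        self._samples = defaultdict(list)

    def observe(self, name, value):
        # Store one numeric observation under the given metric name.
        self._samples[name].append(float(value))

    def summary(self, name):
        # Return the count and mean of everything observed for this metric.
        values = self._samples[name]
        if not values:
            return {"count": 0, "mean": None}
        return {"count": len(values), "mean": mean(values)}


registry = MetricsRegistry()


def predict_with_monitoring(model, features):
    """Call `model` and record service, model, and data metrics in one place."""
    start = time.perf_counter()
    prediction = model(features)
    # Service health: how long the prediction took.
    registry.observe("service.latency_seconds", time.perf_counter() - start)
    # Model health: the distribution of predictions.
    registry.observe("model.prediction", prediction)
    # Data health: the distribution of each numeric input feature.
    for feature_name, value in features.items():
        registry.observe(f"data.feature.{feature_name}", value)
    return prediction
```

In a real service, the `observe` calls would go to the existing metrics backend, and the first dashboards would simply chart these new metric names alongside the service ones.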