ML monitoring

Quote

It’s hard to train ML models, but it’s even harder to build production services for these models.

Over time, the quality of ML models starts to degrade, so we need to monitor production services.

There are 4 things to monitor:

Each of them has its own metrics to measure. Model health and data health monitoring are specific to ML products rather than ordinary services.

Summary

Problem: models degrade over time.
Solution: monitoring (collecting metrics).

You can collect metrics related to different aspects of the system:

Service-related metrics, such as uptime and memory usage, are crucial and must be implemented. This is a MUST.
Model-related metrics fall into 2 categories: general metrics and problem-specific metrics.
Data-related metrics act as a powerful proxy for model quality when ground truth labels are unavailable or delayed. By monitoring changes in input features or prediction distributions, we can detect potential model degradation early.
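The data-drift idea can be made concrete by comparing a feature's current values against a reference window. The sketch below does this with the Population Stability Index (PSI), computed as PSI = Σ (c_i − r_i) · ln(c_i / r_i) over histogram bins, where r_i and c_i are the reference and current shares of each bin. PSI is only one common drift score, chosen here for illustration; the note does not prescribe a method. The function names, the 10 equal-width bins, and the 0.25 alert threshold (a commonly quoted rule of thumb) are all assumptions to adapt.

```python
import bisect
import math
from typing import Mapping, Sequence


def _inner_bin_edges(reference: Sequence[float], n_bins: int) -> list[float]:
    """Equal-width bin edges over the reference range (interior edges only)."""
    lo, hi = min(reference), max(reference)
    if lo == hi:  # constant reference feature: widen the range so bins have a width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def _bin_proportions(values: Sequence[float], inner_edges: list[float], eps: float = 1e-6) -> list[float]:
    """Share of values per bin; out-of-range values land in the first or last bin."""
    counts = [0] * (len(inner_edges) + 1)
    for v in values:
        counts[bisect.bisect_right(inner_edges, v)] += 1
    # Floor empty bins at eps so the log ratio in psi() stays finite.
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index of `current` against `reference` (both non-empty)."""
    if not reference or not current:
        raise ValueError("reference and current must be non-empty")
    edges = _inner_bin_edges(reference, n_bins)
    ref_p = _bin_proportions(reference, edges)
    cur_p = _bin_proportions(current, edges)
    # Each term is >= 0 because (c - r) and log(c / r) always share a sign.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drifted_features(
    reference: Mapping[str, Sequence[float]],
    current: Mapping[str, Sequence[float]],
    threshold: float = 0.25,  # assumed alert level; tune it per feature
) -> dict[str, float]:
    """Return {feature: PSI} for every feature whose PSI exceeds `threshold`.

    Assumes `current` contains every feature present in `reference`.
    """
    scores = {name: psi(reference[name], current[name]) for name in reference}
    return {name: s for name, s in scores.items() if s > threshold}


# Usage with synthetic data: one unchanged feature and one shifted upwards.
ref = {"age": [x / 100 for x in range(100)], "income": [x / 100 for x in range(100)]}
cur = {"age": [x / 100 for x in range(100)], "income": [x / 100 + 0.5 for x in range(100)]}
print(drifted_features(ref, cur))  # only "income" is reported
```

The same `psi` call can be pointed at the model's prediction scores instead of an input feature, which covers the "prediction distribution" case when ground truth labels arrive late.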

Reusing the existing monitoring architecture for ML models can save time and resources as you don’t need to build a new monitoring system from scratch. You can start by adding a couple of dashboards and expand to a more sophisticated system later.
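One low-effort way to reuse an existing stack is to expose the ML-specific numbers, such as per-feature drift scores, on the service's `/metrics` endpoint in the Prometheus text format. The existing scraper and dashboards can then pick them up like any other service metric. This assumes the team already runs Prometheus-style scraping, which the note does not specify. The sketch uses only the standard library; the metric name `ml_feature_psi`, the `feature` label, and `MetricsHandler` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from typing import Mapping


def _escape_label(value: str) -> str:
    """Escape a label value (backslash, double quote, newline) for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(name: str, help_text: str, samples: Mapping[str, float], label: str = "feature") -> str:
    """Render one gauge metric family with one sample per label value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label}="{_escape_label(label_value)}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves GET /metrics from an in-memory dict (hypothetical; refresh it after each drift check)."""

    latest_psi: dict[str, float] = {}

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauges(
            "ml_feature_psi", "Population Stability Index per input feature", self.latest_psi
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, `HTTPServer(("", 8000), MetricsHandler).serve_forever()` from the same `http.server` module is enough; the port is arbitrary. The official `prometheus_client` package provides `Gauge` and `start_http_server`, which do the same job with less code once you move beyond a couple of dashboards.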

M.K Hassan
