MLflow is an open-source platform for managing the machine learning lifecycle.
Lifecycle here means the whole process of building and maintaining a machine learning model.
It is a Python package that you can install with pip install mlflow.
It consists of four main components:
- Tracking: Used for logging and querying experiments, runs, parameters, metrics, artifacts, etc.
  - Python API: mlflow.log_param(), mlflow.log_metric(), mlflow.log_artifact()
  - UI to compare and visualize runs.
  - CLI and REST API available.
  - Can log data automatically from ML libraries like scikit-learn, XGBoost, TensorFlow, etc.
- Models: Manages and packages machine learning models so they can be easily deployed across different platforms.
- Model Registry: A centralized model store, API, and UI for managing the model lifecycle.
- Projects
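The Tracking calls listed above can be combined as in the following minimal sketch. It assumes MLflow is installed and uses its default local storage; `evaluate` and `tracked_run` are hypothetical names, and the "accuracy" is a made-up stand-in for real training.

```python
import tempfile
from pathlib import Path


def evaluate(learning_rate: float) -> float:
    """Hypothetical stand-in for real training; returns a made-up accuracy."""
    return round(1.0 - learning_rate, 4)


def tracked_run(learning_rate: float) -> float:
    """Log one parameter, one metric, and one artifact for a single run."""
    import mlflow  # imported lazily so evaluate() works without MLflow installed

    accuracy = evaluate(learning_rate)
    with mlflow.start_run():
        mlflow.log_param("learning_rate", learning_rate)
        mlflow.log_metric("accuracy", accuracy)
        # Artifacts are files, so write one to a temporary directory first.
        with tempfile.TemporaryDirectory() as tmp:
            report = Path(tmp) / "report.txt"
            report.write_text(f"accuracy={accuracy}\n")
            mlflow.log_artifact(str(report))
    return accuracy
```

Calling `tracked_run(0.1)` and then running `mlflow ui` from the same working directory should show the run in the browser UI, since MLflow stores runs locally unless a tracking server is configured.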
General Concepts
- ML experiment ⇒ the process of building an ML model.
- Experiment run ⇒ each trial in an ML experiment.
- Run artifacts ⇒ any file associated with that run.
- Experiment metadata ⇒ all information related to the experiment.
The MLflow Tracking component lets you organize your experiments into runs. Each run is a single trial of an experiment, so an experiment is a collection of runs.
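As a rough illustration of how one experiment groups several runs, the sketch below starts one run per learning rate under a single experiment. The function names, the experiment name passed in, and the `lr-...` run naming scheme are all assumptions made for this example.

```python
def run_name_for(learning_rate: float) -> str:
    """Hypothetical naming scheme for the runs in this sketch."""
    return f"lr-{learning_rate}"


def learning_rate_sweep(experiment_name: str, learning_rates: list[float]) -> list[str]:
    """Create one run per learning rate inside a single experiment."""
    import mlflow  # imported lazily so run_name_for() works without MLflow

    # Select the experiment (MLflow creates it if it does not exist yet).
    mlflow.set_experiment(experiment_name)

    run_ids = []
    for learning_rate in learning_rates:
        # Each `with` block is one run, i.e. one trial of the experiment.
        with mlflow.start_run(run_name=run_name_for(learning_rate)) as run:
            mlflow.log_param("learning_rate", learning_rate)
            run_ids.append(run.info.run_id)
    return run_ids
```

For example, `learning_rate_sweep("demo-experiment", [0.1, 0.01])` would be expected to produce two runs listed under the same experiment.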
For each run, you can keep track of the following entities:
- Parameters: Hyperparameters and anything else that affects the metrics of the run (for example, the path to the data used or any preprocessing applied to the data).
- Metrics: Metrics calculated on the training, validation, and test datasets, such as accuracy and F1 score.
- Metadata: For example, tags. Adding tags to each run lets you search, filter, and compare runs.
- Artifacts: Any file associated with the run, including visualizations created from the metrics or the dataset.
- Models
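The entities above map fairly directly onto MLflow's fluent API. The sketch below is one possible way to log them in a single run; the parameter values, tag names, and helper functions are invented for illustration. Model logging is left out because it goes through flavor-specific functions such as `mlflow.sklearn.log_model`, which would add a dependency on a modeling library.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def log_run_details(y_true: list[int], y_pred: list[int], data_path: str) -> None:
    """Log parameters, metrics, tags, and a small artifact for one run."""
    import mlflow  # imported lazily so accuracy() works without MLflow

    with mlflow.start_run():
        # Parameters: hyperparameters plus anything else that affects the metrics.
        mlflow.log_params({"max_depth": 5, "data_path": data_path})
        # Metrics: numeric results of the run.
        mlflow.log_metrics({"val_accuracy": accuracy(y_true, y_pred)})
        # Metadata: tags used later to search and filter runs.
        mlflow.set_tags({"stage": "baseline"})
        # Artifacts: here a small CSV of predictions written from a string.
        rows = "\n".join(f"{t},{p}" for t, p in zip(y_true, y_pred))
        mlflow.log_text("label,prediction\n" + rows + "\n", "predictions.csv")


def find_baseline_runs():
    """Return runs of the active experiment tagged stage=baseline."""
    import mlflow

    return mlflow.search_runs(filter_string="tags.stage = 'baseline'")
```

Tags are what make the search step possible: `find_baseline_runs()` filters on the same `stage` tag that `log_run_details` sets.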
MLflow also logs the following information automatically:
- Source Code
- Version of the code
- Start and end time
- Author
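To see where this automatically recorded information ends up, the sketch below reads it back from a finished run. It assumes the run ID is already known; `describe_run` and `ms_to_utc` are hypothetical helpers. MLflow stores the source, code version, and author as system tags, and the Git commit tag is typically only present when the code was run from inside a Git repository.

```python
from datetime import datetime, timezone


def ms_to_utc(milliseconds: int) -> datetime:
    """Convert MLflow's millisecond timestamps to a UTC datetime."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def describe_run(run_id: str) -> dict:
    """Collect the automatically recorded details of one run."""
    import mlflow  # imported lazily so ms_to_utc() works without MLflow

    run = mlflow.get_run(run_id)
    tags = run.data.tags
    end_time = run.info.end_time  # None while the run is still active
    return {
        "source": tags.get("mlflow.source.name"),
        "code_version": tags.get("mlflow.source.git.commit"),
        "author": tags.get("mlflow.user"),
        "start": ms_to_utc(run.info.start_time),
        "end": ms_to_utc(end_time) if end_time is not None else None,
    }
```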
Warning
Experiment tracking is a general concept. It is part of the machine learning lifecycle itself, specifically the model management area. You can do it with many tools, from pen and paper to a fully automated tool like MLflow.
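To make the point concrete, experiment tracking can be done without MLflow at all. The sketch below is a deliberately simple, hypothetical tracker that appends one JSON line per run to a file and finds the best run by a metric. It is not how MLflow stores data, only an illustration of the general idea.

```python
import json
import time
from pathlib import Path


def log_run_manually(log_file: Path, params: dict, metrics: dict) -> dict:
    """Append one run record (timestamp, params, metrics) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def best_run(log_file: Path, metric: str) -> dict:
    """Return the recorded run with the highest value for the given metric."""
    with log_file.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return max(records, key=lambda record: record["metrics"][metric])
```

A tool like MLflow covers the same ground (parameters, metrics, timestamps) and adds the UI, artifact storage, and search on top.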