Saving models using mlflow

There are 2 ways to save a model:

Using mlflow.log_artifact(): this is the most basic way to store a trained model, but you need to save the model locally first and then log it to MLflow as an artifact.

import pickle
import mlflow

with open("models/booster.bin", "wb") as model_file:
    pickle.dump(booster, model_file)

# Upload the local file to the active MLflow run as an artifact
mlflow.log_artifact(local_path="models/booster.bin", artifact_path="pickle_models")

NOTE

I had already saved it locally, so why also log it to MLflow as an artifact? See the reasons in ML model management.

The problem with this approach is that, to use the model, you first need to download it and then load it with pickle before you can predict on new data.

# Here I load the copy already on the local file system; typically you would download the model from the MLflow artifacts first
with open("models/booster.bin", "rb") as booster_file:
	loaded_booster = pickle.load(booster_file)
  
temp = loaded_booster.predict(valid)

Using mlflow.<framework name>.log_model(), for example with XGBoost:

mlflow.xgboost.log_model(booster, name="models_mlflow")

Runs and Saved model separation

In recent MLflow versions, saved models are separated from runs. In older versions the saved model was stored inside the run's folder; now it is stored at the experiment level and is only referenced by the run that owns it.

Example: notice how the models folder sits outside the run folder (1ea...) and belongs to the experiment folder 558046090145187780:

558046090145187780
├── 1eaff8eb6b2a488d9118a231e41dd395
└── models

Logging with log_model() stores more information about the saved model and the dependencies needed to run it. Besides that, you can load the model directly from MLflow, without downloading the file and unpickling it yourself.

It produces several useful files; the most important one is MLmodel:

 
artifact_path: models_mlflow
flavors:
  python_function:
    data: model.xgb
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.xgboost
    python_version: 3.9.23
  xgboost:
    code: null
    data: model.xgb
    model_class: xgboost.core.Booster
    model_format: xgb
    xgb_version: 2.1.4
mlflow_version: 2.22.1
model_size_bytes: 3209279
model_uuid: a28888eb24434c2697d05709d2634e3e
prompts: null
run_id: d7892e89580341e197f77a5900370dc2
utc_time_created: '2025-06-11 22:09:41.968097'
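Since MLmodel is plain YAML, you can inspect it yourself to see which flavors a saved model offers. The sketch below is only an illustration: `list_flavors` is a hypothetical helper, and it assumes PyYAML is installed (it normally comes along with MLflow) and that you have a local copy of the MLmodel file.

```python
import yaml  # PyYAML


def list_flavors(mlmodel_path: str) -> list[str]:
    """Return the flavor names declared in an MLmodel file, sorted by name."""
    with open(mlmodel_path, "r", encoding="utf-8") as mlmodel_file:
        spec = yaml.safe_load(mlmodel_file)
    # "flavors" maps each flavor name to its loader settings
    return sorted(spec.get("flavors", {}))
```

For the file above this would list the `python_function` and `xgboost` flavors, which are the two ways the model can be loaded.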

TIP

This is also a good place to save any preprocessors, so that you can load and reuse them later when predicting on new data. How you save one depends on the preprocessor's framework: if MLflow supports it, use the MLflow integration; otherwise you can just use pickle and mlflow.log_artifact().

Summary

Using the log_model() function of an MLflow-supported framework, you can save the model in the MLflow model format, which records a lot of extra information (flavors, environment files, versions).

You can then easily load the model in the flavor you need (Python function or the framework's native model object) and deploy it on whatever platform you want.

M.K Hassan
