Breaking KServe's MLflow Version Lock-in
We needed to serve MLflow models on KServe at work. Should be straightforward — KServe has native MLflow support. Specify modelFormat: mlflow, point it at your model, done.
Except it wasn’t.
The problem
When you set modelFormat.name: mlflow in your InferenceService, KServe pulls a pre-built seldonio/mlserver image. That image is hardcoded in KServe’s runtime kustomization:
image: seldonio/mlserver:1.7.1-mlflow
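That line comes in through the shipped ClusterServingRuntime for MLServer. Abbreviated, it looks something like this (a sketch; the real resource also covers other formats like sklearn and xgboost):

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-mlserver
spec:
  supportedModelFormats:
    - name: mlflow
      autoSelect: true
  containers:
    - name: kserve-container
      image: seldonio/mlserver:1.7.1-mlflow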
This ships with MLflow 2.2. We needed 3.2.0. And we had models with different dependency requirements — one needs specific NumPy versions, another needs custom packages. The pre-built image is one-size-fits-all, and it fits nobody.
Figuring out what’s actually going on
This is where it got frustrating. The abstraction layers in this ecosystem are deep:
- KServe orchestrates model serving on Kubernetes
- MLServer (by Seldon) is the actual inference server that runs inside KServe containers
- MLflow is the model format and runtime
To understand how these pieces connect, I had to read the source code of all three projects. AI tools were no help here — the interactions between these systems are too specific and the docs are sparse.

What I figured out:
- KServe fetches a default Docker image based on model format
- You can override that image — I didn’t know this initially
- You can also bypass the default runtime entirely and use your own container
- MLflow bundles dependency info (conda.yaml, requirements.txt) with the model artifacts (a quick way to verify this is sketched below)
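That last point is easy to check yourself; downloading a model and printing the bundled requirements takes a few lines (the bucket path is illustrative):

import mlflow.artifacts

# Pulls down MLmodel, conda.yaml, requirements.txt, and the serialized model
local_path = mlflow.artifacts.download_artifacts(
    "s3://bucket/path/to/model/artifacts/",
    dst_path="/tmp/model-check",
)
with open(f"{local_path}/requirements.txt") as f:
    print(f.read())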
The people behind this ecosystem have made it genuinely confusing — arguably so managed platforms built on top of it can sell the “easy” version. Smart for them, annoying for us.
The actual solution
Once I understood the layers, the fix was surprisingly simple: build a custom base image with the dependencies we control, and load model-specific requirements at container startup.
Base image
FROM python:3.10.17-slim

ENV MLSERVER_MODEL_URI=/mnt/models
ENV MODELS_DIR=/mnt/models
ENV MLSERVER_MODELS_DIR=/mnt/models
ENV MLSERVER_ENV_TARBALL=/mnt/models/environment.tar.gz
ENV MLSERVER_PATH=/opt/mlserver
ENV MLSERVER_MODEL_IMPLEMENTATION=mlserver_mlflow.MLflowRuntime
ENV MLSERVER_GRPC_PORT=9000
ENV MLSERVER_HTTP_PORT=8080

# Pin the server and runtime; everything else is a default a model can override
RUN pip install --no-cache-dir \
    mlflow==3.2.0 \
    cloudpickle==3.1.1 \
    mlserver==1.7.1 \
    mlserver-mlflow==1.7.1 \
    boto3 \
    pyyaml \
    numpy \
    pandas \
    scikit-learn

# Minimal MLServer config (see the sketch below) plus the startup script
COPY settings.json /opt/mlserver/settings.json
COPY entrypoint.sh /opt/mlserver/entrypoint.sh
RUN chmod +x /opt/mlserver/entrypoint.sh

ENTRYPOINT ["/opt/mlserver/entrypoint.sh"]
The base image has the common stuff. Individual models can override versions at startup.
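The settings.json the image copies in can stay nearly empty, since MLServer also reads its configuration from the MLSERVER_-prefixed environment variables set above. A minimal sketch:

{
    "debug": false
}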
Entrypoint
The entrypoint downloads the model from S3, installs any model-specific dependencies, then starts MLServer:
#!/bin/bash
set -e

# Download model artifacts from S3
python -c "
import os
import mlflow.artifacts

mlflow.artifacts.download_artifacts(
    os.environ['MLFLOW_MODEL_S3_URI'],
    dst_path='/mnt/models',
)
"

# Install model-specific dependencies if bundled
if [ -f "/mnt/models/requirements.txt" ]; then
    pip install --upgrade -r "/mnt/models/requirements.txt"
fi

# Start MLServer; it expects the folder containing settings.json, not the file itself
mlserver start /opt/mlserver
KServe InferenceService
Instead of using modelFormat: mlflow (which pulls the locked-down default image), we define our own container:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "similarity-model-test"
namespace: "inference"
spec:
predictor:
containers:
- name: "similarity-model-test"
image: "your-registry/mlflow-base:v3"
env:
- name: MLFLOW_MODEL_S3_URI
value: "s3://bucket/path/to/model/artifacts/"
- name: MLSERVER_MODEL_NAME
value: "similarity-model-test"
- name: AWS_DEFAULT_REGION
value: "us-east-2"
Model packaging
The MLflow side doesn’t change at all. You still log models the same way. When you pass conda_env, MLflow writes conda.yaml and a requirements.txt (derived from the pip section) next to the artifacts; that requirements.txt is exactly what the entrypoint installs, so the dependency info gets used at runtime now instead of being ignored:
mlflow.pyfunc.log_model(
    name="similarity-model",
    python_model=YourModel(),
    conda_env={
        'dependencies': [
            'python=3.10.12',
            {'pip': [
                'mlflow==3.2.0',
                'numpy==1.24.3',
                'pandas==2.0.3',
                'custom-package==1.0.0'
            ]}
        ]
    }
)
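Before deploying, it’s cheap to load the logged model back and confirm it round-trips. A minimal sketch, where info is the ModelInfo returned by the log_model call above and sample_input is a hypothetical input your model accepts:

import mlflow.pyfunc

# info = the ModelInfo returned by log_model above
model = mlflow.pyfunc.load_model(info.model_uri)
print(model.predict(sample_input))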
Istio routing
For production, we set up an Istio VirtualService to route to the model:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ml-inference-vs
  namespace: inference
spec:
  hosts:
    - ml.yourdomain.com
  http:
    - match:
        - uri:
            prefix: /inference/similarity-model/
      rewrite:
        uri: /
      route:
        - destination:
            host: similarity-model-predictor.inference.svc.cluster.local
            port:
              number: 80
Applied the InferenceService, applied the Istio config, curled the endpoint — and it just worked. ✨
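For reference, that curl goes through the Istio route to MLServer’s V2 inference endpoint. Roughly (the payload shape depends entirely on your model’s signature):

curl -s https://ml.yourdomain.com/inference/similarity-model/v2/models/similarity-model-test/infer \
    -H "Content-Type: application/json" \
    -d '{"inputs": [{"name": "input-0", "shape": [1, 3], "datatype": "FP32", "data": [0.1, 0.2, 0.3]}]}'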
What this gets you
| | Default KServe MLflow | Custom runtime loading |
|---|---|---|
| MLflow version | Fixed at 2.2 | Whatever you want |
| Dependencies | Locked to image | Per-model |
| Updating models | Rebuild images | Just redeploy |
| Customization | Limited | Full control |
One base image serves all your models. Each model gets its own dependencies loaded at startup. You keep all the KServe benefits — autoscaling, traffic management, canary deployments — without being locked into outdated versions.
Takeaways
The actual code change was small, as it usually is. Most of the work was understanding the system — reading through KServe, MLServer, and MLflow source code to figure out which layer does what and where the extension points are.
A few things I learned:
- You can override KServe runtime images. The docs mention this but it’s easy to miss (a sketch follows this list).
- You can bypass the default runtime entirely by specifying containers directly instead of using modelFormat. This gives you full control.
- MLServer is the real server. KServe is orchestration. Understanding this separation makes everything click.
- The pre-built images install way more than you need. Building your own keeps things lean.
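On the first point, an image override is just a ServingRuntime that points at your own image (a sketch; the runtime name is illustrative):

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: mlserver-mlflow-custom
  namespace: inference
spec:
  supportedModelFormats:
    - name: mlflow
      autoSelect: true
  containers:
    - name: kserve-container
      image: your-registry/mlflow-base:v3

An InferenceService can then select it explicitly via spec.predictor.model.runtime: mlserver-mlflow-custom.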
If you’re hitting version conflicts with KServe’s MLflow support, you don’t need to fight the system. Just step around it.