Breaking KServe's MLflow Version Lock-in
We needed to serve MLflow models on KServe at work. Should be straightforward — KServe has native MLflow support. Specify modelFormat: mlflow, point it at your model, done.
Except it wasn’t.
The problem
When you set modelFormat.name: mlflow in your InferenceService, KServe pulls a pre-built seldonio/mlserver image. That image is hardcoded in KServe’s runtime kustomization:
image: seldonio/mlserver:1.7.1-mlflow
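That line comes in through the shipped ClusterServingRuntime for MLServer. Abbreviated, it looks something like this (a sketch; the real resource also covers other formats like sklearn and xgboost):

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-mlserver
spec:
  supportedModelFormats:
    - name: mlflow
      autoSelect: true
  containers:
    - name: kserve-container
      image: seldonio/mlserver:1.7.1-mlflow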
This ships with MLflow 2.2. We needed 3.2.0. And we had models with different dependency requirements — one needs specific NumPy versions, another needs custom packages. The pre-built image is one-size-fits-all, and it fits nobody.
Figuring out what’s actually going on
This is where it got frustrating. The abstraction layers in this ecosystem are deep:
- KServe orchestrates model serving on Kubernetes
- MLServer (by Seldon) is the actual inference server that runs inside KServe containers
- MLflow is the model format and runtime
To understand how these pieces connect, I had to read the source code of all three projects. AI tools were no help here — the interactions between these systems are too specific and the docs are sparse.

What I figured out:
- KServe fetches a default Docker image based on model format
- You can override that image — I didn’t know this initially
- You can also bypass the default runtime entirely and use your own container
- MLflow bundles dependency info (conda.yaml, requirements.txt) with the model artifacts (a quick way to verify this is sketched below)
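That last point is easy to check yourself; downloading a model and printing the bundled requirements takes a few lines (the bucket path is illustrative):

import mlflow.artifacts

# Pulls down MLmodel, conda.yaml, requirements.txt, and the serialized model
local_path = mlflow.artifacts.download_artifacts(
    "s3://bucket/path/to/model/artifacts/",
    dst_path="/tmp/model-check",
)
with open(f"{local_path}/requirements.txt") as f:
    print(f.read())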
The people behind this ecosystem have made it genuinely confusing — arguably so managed platforms built on top of it can sell the “easy” version. Smart for them, annoying for us.
The actual solution
Once I understood the layers, the fix was surprisingly simple: build a custom base image with the dependencies we control, and load model-specific requirements at container startup.
Base image
FROM python:3.10.17-slim

ENV MLSERVER_MODEL_URI=/mnt/models
ENV MODELS_DIR=/mnt/models
ENV MLSERVER_MODELS_DIR=/mnt/models
ENV MLSERVER_ENV_TARBALL=/mnt/models/environment.tar.gz
ENV MLSERVER_PATH=/opt/mlserver
ENV MLSERVER_MODEL_IMPLEMENTATION=mlserver_mlflow.MLflowRuntime
ENV MLSERVER_GRPC_PORT=9000
ENV MLSERVER_HTTP_PORT=8080

# Pin the server and runtime; everything else is a default a model can override
RUN pip install --no-cache-dir \
    mlflow==3.2.0 \
    cloudpickle==3.1.1 \
    mlserver==1.7.1 \
    mlserver-mlflow==1.7.1 \
    boto3 \
    pyyaml \
    numpy \
    pandas \
    scikit-learn

# Minimal MLServer config (see the sketch below) plus the startup script
COPY settings.json /opt/mlserver/settings.json
COPY entrypoint.sh /opt/mlserver/entrypoint.sh
RUN chmod +x /opt/mlserver/entrypoint.sh

ENTRYPOINT ["/opt/mlserver/entrypoint.sh"]
The base image has the common stuff. Individual models can override versions at startup.
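The settings.json the image copies in can stay nearly empty, since MLServer also reads its configuration from the MLSERVER_-prefixed environment variables set above. A minimal sketch:

{
    "debug": false
}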
Entrypoint
The entrypoint downloads the model from S3, installs any model-specific dependencies, then starts MLServer:
#!/bin/bash
set -e

# Download model artifacts from S3
python -c "
import os
import mlflow.artifacts

mlflow.artifacts.download_artifacts(
    os.environ['MLFLOW_MODEL_S3_URI'],
    dst_path='/mnt/models',
)
"

# Install model-specific dependencies if bundled
if [ -f "/mnt/models/requirements.txt" ]; then
    pip install --upgrade -r "/mnt/models/requirements.txt"
fi

# Start MLServer; it expects the folder containing settings.json, not the file itself
mlserver start /opt/mlserver
KServe InferenceService
Instead of using modelFormat: mlflow (which pulls the locked-down default image), we define our own container:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "similarity-model-test"
namespace: "inference"
spec:
predictor:
containers:
- name: "similarity-model-test"
image: "your-registry/mlflow-base:v3"
env:
- name: MLFLOW_MODEL_S3_URI
value: "s3://bucket/path/to/model/artifacts/"
- name: MLSERVER_MODEL_NAME
value: "similarity-model-test"
- name: AWS_DEFAULT_REGION
value: "us-east-2"
Model packaging
The MLflow side doesn’t change at all. You still log models the same way. When you pass conda_env, MLflow writes conda.yaml and a requirements.txt (derived from the pip section) next to the artifacts; that requirements.txt is exactly what the entrypoint installs, so the dependency info gets used at runtime now instead of being ignored:
mlflow.pyfunc.log_model(
    name="similarity-model",
    python_model=YourModel(),
    conda_env={
        'dependencies': [
            'python=3.10.12',
            {'pip': [
                'mlflow==3.2.0',
                'numpy==1.24.3',
                'pandas==2.0.3',
                'custom-package==1.0.0'
            ]}
        ]
    }
)
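Before deploying, it’s cheap to load the logged model back and confirm it round-trips. A minimal sketch, where info is the ModelInfo returned by the log_model call above and sample_input is a hypothetical input your model accepts:

import mlflow.pyfunc

# info = the ModelInfo returned by log_model above
model = mlflow.pyfunc.load_model(info.model_uri)
print(model.predict(sample_input))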
Istio routing
For production, we set up an Istio VirtualService to route to the model:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ml-inference-vs
  namespace: inference
spec:
  hosts:
    - ml.yourdomain.com
  http:
    - match:
        - uri:
            prefix: /inference/similarity-model/
      rewrite:
        uri: /
      route:
        - destination:
            host: similarity-model-predictor.inference.svc.cluster.local
            port:
              number: 80
Applied the InferenceService, applied the Istio config, curled the endpoint — and it just worked. ✨
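For reference, that curl goes through the Istio route to MLServer’s V2 inference endpoint. Roughly (the payload shape depends entirely on your model’s signature):

curl -s https://ml.yourdomain.com/inference/similarity-model/v2/models/similarity-model-test/infer \
    -H "Content-Type: application/json" \
    -d '{"inputs": [{"name": "input-0", "shape": [1, 3], "datatype": "FP32", "data": [0.1, 0.2, 0.3]}]}'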
What this gets you
| | Default KServe MLflow | Custom runtime loading |
|---|---|---|
| MLflow version | Fixed at 2.2 | Whatever you want |
| Dependencies | Locked to image | Per-model |
| Updating models | Rebuild images | Just redeploy |
| Customization | Limited | Full control |
One base image serves all your models. Each model gets its own dependencies loaded at startup. You keep all the KServe benefits — autoscaling, traffic management, canary deployments — without being locked into outdated versions.
Takeaways
The actual code change was small, as it usually is. Most of the work was understanding the system — reading through KServe, MLServer, and MLflow source code to figure out which layer does what and where the extension points are.
A few things I learned:
- You can override KServe runtime images. The docs mention this but it’s easy to miss (a sketch follows this list).
- You can bypass the default runtime entirely by specifying containers directly instead of using modelFormat. This gives you full control.
- MLServer is the real server. KServe is orchestration. Understanding this separation makes everything click.
- The pre-built images install way more than you need. Building your own keeps things lean.
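On the first point, an image override is just a ServingRuntime that points at your own image (a sketch; the runtime name is illustrative):

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: mlserver-mlflow-custom
  namespace: inference
spec:
  supportedModelFormats:
    - name: mlflow
      autoSelect: true
  containers:
    - name: kserve-container
      image: your-registry/mlflow-base:v3

An InferenceService can then select it explicitly via spec.predictor.model.runtime: mlserver-mlflow-custom.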
If you’re hitting version conflicts with KServe’s MLflow support, you don’t need to fight the system. Just step around it.