Model server - Wrap any Python predictor class into a production-ready FastAPI inference API using a simple YAML configuration file.
You have a Python class with `predict()` and/or `predict_proba()` methods. This tool:
- Wraps it in a FastAPI server with `/predict` and `/predict_proba` endpoints
- Adds Prometheus metrics, structured logging, and health checks
- Handles input validation and format conversion automatically
- Provides Docker containerization with version tracking
- Generates GitHub Actions workflows for automated CI/CD builds
```bash
pip install git+https://github.com/core64-lab/merve.git
```

For development:
```bash
git clone https://github.com/core64-lab/merve.git
cd merve
pip install -e ".[dev]"
```

Define a predictor class:

```python
# mlserver_predictor.py
import joblib

class MyPredictor:
    def __init__(self, model_path: str):
        self.model = joblib.load(model_path)

    def predict(self, data):
        # data: list of dicts or numpy array
        return self.model.predict(data)

    def predict_proba(self, data):
        return self.model.predict_proba(data)
```

Describe it in a YAML config (e.g. `mlserver.yaml`):

```yaml
predictor:
  module: mlserver_predictor
  class_name: MyPredictor
  init_kwargs:
    model_path: ./model.pkl

classifier:
  name: my-classifier
  version: 1.0.0
```

That's the minimal configuration. The server defaults to `0.0.0.0:8000`.
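Before starting the server, you can sanity-check the predictor directly. This is a minimal sketch: it assumes `./model.pkl` exists and that your underlying model accepts the record format used in the example above.

```python
# Quick local check of the predictor before serving it.
# Assumes ./model.pkl exists and the model accepts list-of-dict input,
# as in the MyPredictor example above.
from mlserver_predictor import MyPredictor

predictor = MyPredictor(model_path="./model.pkl")
sample = [{"feature1": 1.0, "feature2": 2.0}]

print(predictor.predict(sample))
print(predictor.predict_proba(sample))
```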
Start the server:

```bash
mlserver serve
```

```bash
# Single prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"feature1": 1.0, "feature2": 2.0}]}'

# Batch prediction (same endpoint)
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [
        {"feature1": 1.0, "feature2": 2.0},
        {"feature1": 3.0, "feature2": 4.0}
      ]}'
```

| Endpoint | Method | Description |
|---|---|---|
| `/predict` | POST | Model predictions (single or batch) |
| `/predict_proba` | POST | Probability predictions |
| `/healthz` | GET | Health check |
| `/info` | GET | Server and model metadata |
| `/status` | GET | Detailed status information |
| `/metrics` | GET | Prometheus metrics |
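The same calls can be made from Python with `requests`. This is a minimal client sketch: it assumes the server from the quick start is running on `localhost:8000` and simply prints the JSON body, since the exact response schema depends on your predictor.

```python
import requests

BASE = "http://localhost:8000"
payload = {"instances": [{"feature1": 1.0, "feature2": 2.0},
                         {"feature1": 3.0, "feature2": 4.0}]}

# Predictions (single or batch)
resp = requests.post(f"{BASE}/predict", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())

# Class probabilities
resp = requests.post(f"{BASE}/predict_proba", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())
```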
Minimal configuration:

```yaml
predictor:
  module: mlserver_predictor
  class_name: MyPredictor
  init_kwargs:
    model_path: ./model.pkl

classifier:
  name: my-classifier
  version: 1.0.0
```

Full configuration with all options:

```yaml
server:
  host: 0.0.0.0
  port: 8000
  workers: 1
  log_level: INFO
  cors:
    allow_origins: []

predictor:
  module: mlserver_predictor
  class_name: MyPredictor
  init_kwargs:
    model_path: ./model.pkl

classifier:
  name: my-classifier
  version: 1.0.0
  description: My ML classifier

api:
  adapter: auto                  # auto | records | ndarray
  feature_order: [col1, col2]    # or path to JSON file
  thread_safe_predict: false
  max_concurrent_predictions: 1
  warmup_on_start: true

endpoints:
  predict: true
  predict_proba: true

observability:
  metrics: true
  structured_logging: true
  correlation_ids: true
  log_payloads: false
```

Serve multiple models from one repository:
```yaml
server:
  host: 0.0.0.0
  port: 8000

classifiers:
  sentiment:
    predictor:
      module: sentiment_predictor
      class_name: SentimentPredictor
      init_kwargs:
        model_path: ./models/sentiment.pkl
    classifier:
      name: sentiment
      version: 1.0.0
  fraud:
    predictor:
      module: fraud_predictor
      class_name: FraudPredictor
      init_kwargs:
        model_path: ./models/fraud.pkl
    classifier:
      name: fraud
      version: 2.0.0
```

Run a specific classifier:

```bash
mlserver serve --classifier sentiment
mlserver build --classifier sentiment
```

| Command | Description |
|---|---|
| `mlserver serve [config.yaml]` | Start the server |
| `mlserver build --classifier <name>` | Build Docker container |
| `mlserver tag <patch\|minor\|major> -c <name>` | Create version tag |
| `mlserver push --classifier <name>` | Push to container registry |
| `mlserver run --classifier <name>` | Run container locally |
| `mlserver images` | List built images |
| `mlserver clean --classifier <name>` | Remove built images |
| `mlserver list-classifiers` | List classifiers in config |
| `mlserver version [--json]` | Show version info |
| `mlserver status` | Show system status |
| `mlserver validate` | Validate configuration |
| `mlserver doctor` | Diagnose common issues |
| `mlserver test` | Test against running server |
| `mlserver init` | Initialize new project |
| `mlserver init-github` | Generate GitHub Actions workflow |
```bash
# Build container
mlserver build --classifier my-classifier

# Run locally
mlserver run --classifier my-classifier

# Or manually
docker run -p 8000:8000 my-repo/my-classifier:latest
```

Create hierarchical tags that track both the classifier and mlserver versions:

```bash
# Create patch version bump (1.0.0 -> 1.0.1)
mlserver tag patch --classifier my-classifier

# Push to trigger GitHub Actions
git push --tags
```

Tag format: `<classifier>-v<version>-mlserver-<commit>`
Example: `my-classifier-v1.0.1-mlserver-abc123d`
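If you need to take a tag apart in your own scripts or CI steps (for example to recover the classifier name, version, and commit), a hedged sketch follows: the regex mirrors the format above and assumes semantic versions and short hex commit hashes.

```python
import re

# <classifier>-v<version>-mlserver-<commit>
TAG_RE = re.compile(
    r"^(?P<classifier>.+)-v(?P<version>\d+\.\d+\.\d+)-mlserver-(?P<commit>[0-9a-f]+)$"
)

m = TAG_RE.match("my-classifier-v1.0.1-mlserver-abc123d")
if m:
    print(m.group("classifier"), m.group("version"), m.group("commit"))
    # -> my-classifier 1.0.1 abc123d
```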
Initialize the workflow:
```bash
mlserver init-github
```

This creates `.github/workflows/ml-classifier-container-build.yml`, which:
- Triggers on hierarchical tags
- Installs the exact mlserver version from the tag
- Builds and tests the container
- Pushes to GHCR or ECR
Configure the registry in `mlserver.yaml`:

```yaml
deployment:
  registry:
    type: ghcr        # or ecr
    namespace: your-org
```

The API auto-detects the input format.

Records (list of dicts):

```json
{"instances": [{"age": 25, "income": 50000}]}
```

ndarray (nested lists):

```json
{"instances": [[25, 50000]]}
```

Force a specific format with `api.adapter: records` or `api.adapter: ndarray`.
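To make the two formats concrete, here is a conceptual sketch of what a records-style adapter has to do before handing data to an array-based model. This is illustrative only, not the library's internal code; the `feature_order` argument plays the same role as the `api.feature_order` setting above.

```python
import numpy as np

def records_to_ndarray(instances, feature_order):
    """Convert a list of dicts into a 2D array with columns in a fixed order."""
    return np.array([[row[name] for name in feature_order] for row in instances])

records = [{"age": 25, "income": 50000}, {"age": 40, "income": 72000}]
X = records_to_ndarray(records, feature_order=["age", "income"])
print(X)
# [[   25 50000]
#  [   40 72000]]
```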
Built-in observability with no extra configuration:

- Prometheus metrics at `/metrics`
- Structured JSON logging with correlation IDs
- Health checks at `/healthz`
- Request tracing via the `X-Correlation-ID` header

An example Prometheus + Grafana setup is included in the `monitoring/` directory.
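For example, you can attach your own correlation ID to a request so the matching structured log lines can be tied back to it. This is a sketch only: it assumes the defaults above (`correlation_ids: true`), a server on `localhost:8000`, and that a caller-supplied `X-Correlation-ID` is honored.

```python
import uuid
import requests

# Supply a correlation ID so this request can be traced in the logs.
cid = str(uuid.uuid4())
resp = requests.post(
    "http://localhost:8000/predict",
    json={"instances": [{"feature1": 1.0, "feature2": 2.0}]},
    headers={"X-Correlation-ID": cid},
    timeout=10,
)
print(cid, resp.status_code)

# Health and metrics endpoints can be probed the same way.
print(requests.get("http://localhost:8000/healthz", timeout=5).status_code)
```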
- Python 3.9+
- Docker (for containerization)
- Git (for version tagging)
```
Request -> FastAPI -> InputAdapter -> Predictor.predict() -> Response
                          |
                  Metrics + Logging
```
See the `examples/` directory for complete working examples:
- Single classifier setup
- Multi-classifier repository
- Custom preprocessing
- Model ensembles
Run the test suite:
```bash
# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=mlserver --cov-report=term-missing

# Run specific test categories
pytest tests/unit/          # Unit tests
pytest tests/integration/   # Integration tests
```

Current status: 860 tests passing, 63% coverage.
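For your own project, a minimal smoke test against a running server can look like the sketch below. It assumes the quick-start server is up on `localhost:8000`; it is not part of the bundled test suite.

```python
# test_smoke.py -- run with: pytest test_smoke.py
import requests

BASE = "http://localhost:8000"

def test_healthz():
    # The server should report healthy.
    assert requests.get(f"{BASE}/healthz", timeout=5).status_code == 200

def test_predict_batch():
    # A batch request to /predict should succeed.
    payload = {"instances": [{"feature1": 1.0, "feature2": 2.0},
                             {"feature1": 3.0, "feature2": 4.0}]}
    resp = requests.post(f"{BASE}/predict", json=payload, timeout=10)
    assert resp.status_code == 200
```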
```bash
# Validate configuration
mlserver validate

# Diagnose environment issues
mlserver doctor

# Check server status
mlserver status
```

Common issues:
- Import errors: ensure the predictor module is on the Python path
- Memory issues: reduce `server.workers` (each worker loads the full model)
- Slow first request: enable `api.warmup_on_start: true`
Apache License 2.0 - see LICENSE file.
- MLflow Models - Full ML lifecycle platform
- BentoML - Feature-rich model serving
- TorchServe / TensorFlow Serving - Framework-specific
This tool focuses on simplicity: wrap any Python predictor with minimal configuration, no framework lock-in.