This repo contains the code for a FastAPI service that serves LUME models from MLflow. A single Docker image supports multiple model deployments, each configured through environment variables. Any change to the client code automatically rebuilds the base image through GitHub Actions CI. Each deployment is configured with the following environment variables:
- MLFLOW_TRACKING_URI: set in the shared mlflow-config ConfigMap.
- MODEL_NAME: the registered MLflow model name, e.g. lcls_cu_inj_model or lcls-fel-surrogate
- MODEL_VERSION: the model version to serve, e.g. 1
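These variables map directly onto an MLflow model URI. The sketch below shows the idea, assuming the service loads models through mlflow.pyfunc; the actual loading code in this repo may differ in detail:

```python
import os

import mlflow
import mlflow.pyfunc

# MLFLOW_TRACKING_URI is injected from the shared mlflow-config ConfigMap.
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])

model_name = os.environ["MODEL_NAME"]        # e.g. "lcls-fel-surrogate"
model_version = os.environ["MODEL_VERSION"]  # e.g. "1"

# MLflow resolves "models:/<name>/<version>" against the model registry.
model = mlflow.pyfunc.load_model(f"models:/{model_name}/{model_version}")
```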
To test the functionality of the image, users can create a temporary pod from the test-client image (also built by the CI) to run checks:
kubectl run test -n inference-service --image=ghcr.io/slaclab/inference-service/test-client:latest --rm -it --restart=Never --env="INFERENCE_SERVICE_URL=http://inference-service:8000" -- python test_validation.py
kubectl run test -n inference-service --image=ghcr.io/slaclab/inference-service/test-client:latest --rm -it --restart=Never --env="INFERENCE_SERVICE_URL=http://inference-service:8000" -- python test_client.py
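For orientation, a check along the lines of test_client.py might look like the sketch below. The /predict path and the payload shape are assumptions for illustration; consult the actual test scripts for the real request format.

```python
import os

import requests

# The test pod receives the service URL via INFERENCE_SERVICE_URL.
base_url = os.environ["INFERENCE_SERVICE_URL"]

# Hypothetical payload; the real input schema depends on the deployed model.
payload = {"inputs": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post(f"{base_url}/predict", json=payload, timeout=30)
response.raise_for_status()
print(response.json())
```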
The Copier template generates Kubernetes manifests for deploying ML inference services. There are two ways to use it:
- User can write a simple YAML data file that Copier then uses to generate the deployment YAML, for example:
service_name: iris-service
namespace: inference-service
model_name: iris-model
model_version: "1"
# mlflow_uri removed - it's in the shared mlflow-config ConfigMap
container_registry: ghcr.io/slaclab/inference-service
replicas: 2
memory_request: "4Gi"
memory_limit: "8Gi"
cpu_request: "1000m"
cpu_limit: "4000m"
An example data file is at model-configs/iris-model.yaml. User can generate the deployment manifests in the deployments/iris-model directory by running this command from the repo root:
copier copy --data-file model-configs/iris-model.yaml copier-template-k8s deployments/iris-model
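The same step can be scripted with Copier's Python API, which can be handy in CI. A minimal sketch, assuming the copier and PyYAML packages are installed:

```python
import yaml
from copier import run_copy

# Load the same answers that the --data-file flag would supply.
with open("model-configs/iris-model.yaml") as f:
    data = yaml.safe_load(f)

# Render copier-template-k8s into deployments/iris-model without prompting.
run_copy("copier-template-k8s", "deployments/iris-model", data=data)
```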
- User can instead run the Copier command below and answer its questions to generate the deployment YAML:
copier copy copier-template-k8s deployments/iris-model
This creates the deployment YAML in the folder given as the last argument. These are the questions the template asks:
(test-bed) bash-5.3$ copier copy copier-template-k8s deployments/fel-model
🎤 What is the service name?
inference-service-fel
🎤 Which Kubernetes namespace?
lume-online-ml
🎤 What is the MLflow model name?
lcls-fel-surrogate
🎤 What model version to deploy?
1
🎤 Container registry (e.g., ghcr.io/username/repo)?
ghcr.io/slaclab/inference-service
🎤 Number of replicas?
1
🎤 Memory request (e.g., 2Gi)?
2Gi
🎤 Memory limit (e.g., 4Gi)?
4Gi
🎤 CPU request (e.g., 500m)?
500m
🎤 CPU limit (e.g., 2000m)?
2000m
Copying from template version None
create deployment.yaml
User can then deploy the generated YAML to the lume-online-ml namespace manually using
kubectl apply -f deployment.yaml -n lume-online-ml
We have also configured ArgoCD to automate the deployments.