**Audience:** Microsoft Field Engineers
**Goal:** Provide guidance on how to demo Anyscale on AKS functionality
- Prerequisites
- Demo 1: Multi-modal Batch Inference
- Demo 2: Deploy LLMs
- Tips for a Successful Demo
- Support
Contact [email protected] for access credentials to the demo Anyscale organization.
Navigate to console.anyscale.com and sign in with your credentials.
## Demo 1: Multi-modal Batch Inference

- **Launch a workspace with the Multi-Modal AI template**
  - From the Anyscale console, create a new workspace (Create from template)
  - Select the "Multi-Modal AI" template
  - Launch the template
- **Modify the compute configuration**
  - Terminate the workspace (if already running)
  - Navigate to the compute configuration settings
  - Change the head node:
    - From: `2CPU-8GB`
    - To: `8CPU-32GB`
  - Change the worker nodes:
    - From: `Auto-select workers`
    - To: `4 x T4 GPUs`
- **Modify the container image**
  - Select the image `anyscale/ray:2.49.1-py312-cu128`
- **Re-launch the workspace**
  - Navigate to the VSCode interface (not VSCode Desktop)
  - Open `notebooks/01-Batch-Inference.ipynb`
  - [Optional] Modify the Batch Inference notebook to use the shared storage mount: at the start of the "Data ingestion" section, replace the code that reads from S3 with:
    ```python
    # Load data from the shared storage mount.
    ds = ray.data.read_images(
        "/mnt/shared_storage/doggos-dataset/train",
        include_paths=True,
        shuffle="files",
    )
    ds.take(1)
    ```
- **Run through the notebook** up to the "Monitoring and Debugging" section (a sketch of the batch-inference pattern the notebook builds toward follows this list)
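The notebook itself is the source of truth for the inference code. Purely for orientation, here is a minimal, hypothetical sketch of the Ray Data batch-inference pattern it demonstrates; the `EmbedImages` class and its "model" are placeholders, not the template's actual code:

```python
import numpy as np
import ray


class EmbedImages:
    """Placeholder GPU worker; the template's real model and logic differ."""

    def __init__(self):
        # Pretend to load a model onto the GPU once per actor.
        self.model = lambda images: np.zeros((len(images), 8))

    def __call__(self, batch: dict) -> dict:
        batch["embedding"] = self.model(batch["image"])
        return batch


ds = ray.data.read_images(
    "/mnt/shared_storage/doggos-dataset/train",
    include_paths=True,
)

# Stream batches through a pool of GPU actors, one per T4 worker.
ds = ds.map_batches(
    EmbedImages,
    batch_size=32,
    num_gpus=1,
    concurrency=4,
)
ds.take(1)
```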
## Demo 2: Deploy LLMs

- **Launch a workspace with the Deploy LLMs template**
  - From the Anyscale console, create a new workspace
  - Select the "Deploy LLMs" template
- **Modify the compute configuration**
  - Terminate the workspace (if already running)
  - Navigate to the compute configuration settings
  - Change the head node:
    - From: `2CPU-8GB`
    - To: `8CPU-32GB`
  - Change the worker nodes:
    - From: `Auto-select workers`
    - To: `2 x A100 nodes`
- **Set up a Hugging Face token**
  - Sign in to Hugging Face (create an account if required)
  - Navigate to Profile → Access Tokens
  - Create a new token with read permissions
  - Copy the token for the next step
- **Configure environment variables**
  - In the Anyscale workspace settings, navigate to Dependencies → Environment Variables
  - Add the following environment variable: `HF_TOKEN=<YOUR_HF_TOKEN>`
  - Replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token (a sketch showing how the token is picked up follows this list)
- **Launch the workspace**
  - Start the workspace with the new configuration
- **Modify `small-size-llm/notebook.ipynb` as follows:**
  - Use `accelerator_type="A100"` instead of `accelerator_type="L4"`
  - Add your Hugging Face token in the right locations (two locations)
- **Modify `small-size-llm/serve_llama_3_1_8b.py` as follows** (see the config sketch after this list):
  - Use `accelerator_type="A100"` instead of `accelerator_type="L4"`
- **Modify `small-size-llm/service.yaml` as follows:**
  ```yaml
  # service.yaml
  name: deploy-llama-3-8b
  image_uri: anyscale/ray-llm:2.50.1-py311-cu128  # Anyscale Ray Serve LLM image. Use `containerfile: ./Dockerfile` to use a custom Dockerfile.
  compute_config:
    auto_select_worker_config: true
    head_node:
      instance_type: 8CPU-32GB
  working_dir: .
  cloud:
  applications:
    # Point to your app in your Python module
    - import_path: serve_llama_3_1_8b:app
  ```
- **Follow the instructions in the notebook `small-size-llm/notebook.ipynb`** to deploy and query the service (a query sketch follows this list)
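Libraries that download gated models (huggingface_hub, transformers, vLLM) pick up the `HF_TOKEN` environment variable configured above automatically. If you ever need to authenticate explicitly in a notebook cell, a minimal sketch:

```python
import os

from huggingface_hub import login

# HF_TOKEN comes from the workspace environment variables set earlier.
login(token=os.environ["HF_TOKEN"])
```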
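For orientation while editing `serve_llama_3_1_8b.py`, the config it builds looks roughly like the following. This is a minimal sketch based on Ray Serve LLM's public API, not the template's exact file; the model id and autoscaling values are assumptions:

```python
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="meta-llama/Llama-3.1-8B-Instruct",  # assumed model id
    ),
    # The demo change: schedule replicas on A100 workers instead of L4s.
    accelerator_type="A100",
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Build an OpenAI-compatible Serve app; service.yaml points at this `app`.
app = build_openai_app({"llm_configs": [llm_config]})
```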
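Once the service is live (the notebook covers deployment; from a workspace terminal, `anyscale service deploy -f service.yaml` should work as well), it exposes an OpenAI-compatible endpoint. A minimal query sketch; the base URL, bearer token, and model id below are placeholders, so copy the real values from the service's Query page in the Anyscale console:

```python
from openai import OpenAI

# Placeholder values: copy the real URL and token from the Anyscale console.
client = OpenAI(
    base_url="https://deploy-llama-3-8b-XXXX.anyscale.app/v1",
    api_key="<SERVICE_BEARER_TOKEN>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Give me a one-line demo greeting."}],
)
print(response.choices[0].message.content)
```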
## Tips for a Successful Demo

- Ensure nodes are provisioned before the demo to avoid wait times (a resource-check sketch follows this list)
- Test the workflows in advance to familiarize yourself with the UI
- Prepare talking points about AKS integration benefits
- Have backup examples ready in case of any technical issues
- Emphasize scalability and Azure-native features
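To verify that nodes are provisioned, you can inspect cluster resources from a workspace notebook or terminal; a minimal sketch (the GPU count reflects however many workers have scaled up):

```python
import ray

# Inside an Anyscale workspace, ray.init() attaches to the running cluster.
ray.init()

# For Demo 1, expect 8 head-node CPUs and up to 4 GPUs from the T4 workers.
print(ray.cluster_resources())
```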
## Support

For questions or issues, contact [email protected].