Demo: Zava Media AI Assistant
Multi-Agent Architecture
for Image & Video Processing - Overview

Costa Rica

Last updated: 2026-01-16

List of References

Foundry Models sold directly by Azure - models available
Timelines for Foundry Models - retirement dates
Azure OpenAI in Microsoft Foundry model deprecations and retirements - deprecation Date
Use model router for Microsoft Foundry - model-router LLMs
Model summary table and region availability - table summary
Baseline architecture for an Azure Kubernetes Service (AKS) cluster
Run your functions from a package file in Azure
What is Microsoft Translator Pro?
Model leaderboards in Microsoft Foundry portal (preview)
AI Leaderboards - general ref
How to Stream Agent Responses
How to enable Live Streaming over Direct Line for a Copilot Studio - deployed agent?
Azure OpenAI Responses API
Foundry Control Plane: Managing AI agents at scale | BRK202

Important

Disclaimer: This repository contains a demo of the Zava Media AI Assistant, a hybrid system that uses 2 Azure AI Agents (via Azure AI Agents Service) for conversational orchestration and cropping, and code-based orchestration for other media tasks (video, image generation, document processing). It features a fully automated "Zero-Touch" deployment pipeline orchestrated by Terraform, which provisions infrastructure, creates specialized AI agents in MSFT Foundry, and deploys the complete application stack. This is only a reference implementation, so feel free to modify it as needed. Please refer to TechWorkshop L300: AI Apps and Agents, and if needed contact Microsoft directly (Microsoft Sales and Support) for more guidance.

Important

The deployment process typically takes 15-20 minutes.

Adjust the values in terraform.tfvars.
Initialize Terraform with terraform init (see the deployment process documentation for more detail).
Run terraform apply. This automatically handles the entire deployment, including agent creation and configuration.
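
The steps above can also be scripted. The following is a minimal sketch, not part of the repository: it assumes the Terraform files live in the terraform-infrastructure folder and that the terraform CLI is on your PATH. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: Terraform configuration and terraform.tfvars live in this folder.
TERRAFORM_DIR = Path("terraform-infrastructure")


def deployment_commands(auto_approve: bool = False) -> list[list[str]]:
    """Return the Terraform commands in the order the steps list them."""
    apply_cmd = ["terraform", "apply"]
    if auto_approve:
        # -auto-approve skips Terraform's interactive confirmation prompt.
        apply_cmd.append("-auto-approve")
    return [["terraform", "init"], apply_cmd]


def deploy(workdir: Path = TERRAFORM_DIR, auto_approve: bool = False) -> None:
    """Run each command in workdir and stop at the first failure."""
    for cmd in deployment_commands(auto_approve):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling deploy() after editing terraform.tfvars runs both commands in sequence; check=True raises an exception if either command exits with an error, so a failed init never proceeds to apply.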

Key Features

Warning

Multi-Region Deployment: Sweden Central hosts 4 models + 2 agents, East US hosts 1 model.
All models use GlobalStandard SKU for optimal performance and availability.

[Screenshots: East US and Sweden Central deployments]

Hybrid Agent Architecture: 2 Azure AI Agents for chat-based orchestration + code-based orchestration for media processing
Multi-Region Deployment:
- Sweden Central: 4 models + 2 agents
  - Models: model-router, GPT-4o, Sora, FLUX.1-Kontext-pro
  - Agents: zava-media-orchestrator, vision-analyst
- East US: 1 model (no agents)
  - Models: FLUX.2-pro
2 Azure AI Agents (chat-based via Responses API):
- zava-media-orchestrator: Central request router using the model-router chat model, which selects from 18+ underlying models.
- vision-analyst: Object detection and coordinate analysis using the GPT-4o chat model with vision. It analyzes images to detect objects and returns bounding box coordinates as JSON over HTTPS. Application code handles the actual image manipulation (cropping, resizing, etc.) using those coordinates.
Code-Based Orchestration for generation tasks:
- Video Generation: Direct calls to Sora (Sweden Central), a video generation model that is not used by agents and is called directly from code.
- Image Generation: Direct calls to FLUX.1-Kontext-pro (Sweden Central) and FLUX.2-pro (East US), image generation models that are not used by agents and are called directly from code.
Real-Time Image Processing: Upload or paste images directly into the chat for immediate agent action
Real MSFT Foundry Agents: Integrates with MSFT Foundry to create and host persistent agents across multiple projects
Zero-Touch Deployment: A single terraform apply command handles the entire lifecycle
Advanced Task Coordination: Inter-agent task delegation (e.g., "Crop this, then change background, then add text")
Dynamic Configuration: All settings are managed via terraform.tfvars, so no code changes are needed; just add your values there.
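
To illustrate the split between vision-analyst (which returns coordinates) and application code (which does the cropping), here is a minimal sketch of turning a JSON bounding box into a crop box. The JSON shape (a "bounding_box" object with x, y, width, height in pixels) and the function name are assumptions for illustration; the actual response schema used by the demo is not documented here.

```python
import json


def crop_box_from_json(payload: str, image_width: int, image_height: int) -> tuple[int, int, int, int]:
    """Convert a JSON bounding box into (left, top, right, bottom), clamped to the image."""
    # Assumed schema: {"bounding_box": {"x": ..., "y": ..., "width": ..., "height": ...}}
    box = json.loads(payload)["bounding_box"]
    left = max(0, int(box["x"]))
    top = max(0, int(box["y"]))
    right = min(image_width, int(box["x"] + box["width"]))
    bottom = min(image_height, int(box["y"] + box["height"]))
    if right <= left or bottom <= top:
        raise ValueError("bounding box lies outside the image")
    return left, top, right, bottom


# Hypothetical agent response for a 320x240 image.
response = '{"label": "car", "bounding_box": {"x": 40, "y": -10, "width": 300, "height": 200}}'
print(crop_box_from_json(response, 320, 240))  # (40, 0, 320, 190)
```

The returned tuple uses the (left, upper, right, lower) ordering that Pillow's Image.crop expects, so it could be passed to an imaging library directly. Clamping guards against a model returning coordinates slightly outside the image.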

Architecture Overview

Important

Agents use CHAT models only (not image generation models). GPT-4o is a chat model with vision: it can see and analyze images in a conversation, but it does not generate images.

How It Works:

Orchestrator Agent (model-router - chat model) receives user requests and routes appropriately

Vision Analyst Agent (GPT-4o - chat model with vision) can SEE images in chat and provide object detection coordinates via JSON

Code Orchestration calls generation models directly:

Video generation (Sora - not an agent, direct API call)

Image generation (FLUX.1-Kontext-pro - not an agent, direct API call)

Key Distinction:

Agents = Chat Models (model-router, GPT-4o) for conversation and analysis

Code = Generation Models (Sora, FLUX) for creating videos/images

GPT-4o is a CHAT model that can see images, NOT an image generation model
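
The agent/code distinction can be sketched as a small routing table. This is an illustrative assumption, not the application's actual dispatch logic: the task keys ("chat", "vision", "video", "image") and the Route type are hypothetical, while the agent, model, and region names come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    kind: str    # "agent" (chat model behind an Azure AI Agent) or "direct" (API call from code)
    target: str  # agent name or model deployment name
    region: str


# Hypothetical task keys mapped to the targets described in this README.
ROUTES = {
    "chat": Route("agent", "zava-media-orchestrator", "Sweden Central"),
    "vision": Route("agent", "vision-analyst", "Sweden Central"),
    "video": Route("direct", "Sora", "Sweden Central"),
    "image": Route("direct", "FLUX.1-Kontext-pro", "Sweden Central"),
}


def resolve(task: str) -> Route:
    """Look up where a task should go; unknown tasks are rejected explicitly."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no route for task {task!r}") from None


print(resolve("video"))
```

Keeping generation models out of the agent entries makes the rule above explicit in data: only chat models sit behind agents, and everything else is a direct call.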

Warning

Azure Quota and Model Availability: The deployed models (model-router, GPT-4o, FLUX.2-pro, FLUX.1-Kontext-pro, Sora) require GPU capacity and are subject to Azure quotas. If you encounter deployment errors related to "Insufficient Quota", request a quota increase through Azure Support.

Architecture

graph TD
    User[User] <--> UI[Media Studio UI]
    UI <--> App[FastAPI Application]
    
    App <--> Orchestrator[zava-media-orchestrator<br/>Model Router Chat Model<br/>Sweden Central]
    App <--> Vision[vision-analyst<br/>GPT-4o Chat + Vision<br/>Object Detection & Coordinates<br/>Sweden Central]
    
    App <--> CodeOrch[Code-Based Orchestration]
    
    CodeOrch --> Sora[Sora<br/>Video Generation<br/>Sweden Central]
    CodeOrch --> FLUX1[FLUX.1-Kontext-pro<br/>Image Generation<br/>Sweden Central]
    CodeOrch --> FLUX2[FLUX.2-pro<br/>Image Generation<br/>East US]
    
    subgraph "Azure AI Agents - Chat Models Only"
        Orchestrator
        Vision
    end
    
    subgraph "Sweden Central - Generation Models"
        Sora
        FLUX1
    end
    
    subgraph "East US - Generation Models"
        FLUX2
    end

Architecture Distribution:

2 Azure AI Agents (Sweden Central): zava-media-orchestrator (model-router), vision-analyst (GPT-4o)

Generation Models: Sora, FLUX.1-Kontext-pro (Sweden Central), FLUX.2-pro (East US)

Key: At present, agents use chat models, per the Azure AI Agents SDK design.
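
The distribution above can be written down as data and checked. This is a sketch for sanity-checking your own terraform.tfvars choices against the layout described here; the variable and function names are hypothetical.

```python
from collections import Counter

# (region, model deployment) pairs as described in this README.
DEPLOYMENTS = [
    ("Sweden Central", "model-router"),
    ("Sweden Central", "GPT-4o"),
    ("Sweden Central", "Sora"),
    ("Sweden Central", "FLUX.1-Kontext-pro"),
    ("East US", "FLUX.2-pro"),
]

# Agent name -> backing model. Both agents live in Sweden Central.
AGENTS = {"zava-media-orchestrator": "model-router", "vision-analyst": "GPT-4o"}
CHAT_MODELS = {"model-router", "GPT-4o"}


def models_per_region(deployments: list[tuple[str, str]]) -> Counter:
    """Count model deployments in each region."""
    return Counter(region for region, _ in deployments)


def agents_use_chat_models(agents: dict[str, str]) -> bool:
    """True when every agent is backed by a chat model."""
    return all(model in CHAT_MODELS for model in agents.values())


print(dict(models_per_region(DEPLOYMENTS)))
print(agents_use_chat_models(AGENTS))
```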

What Happens Under the Hood?

When you run terraform apply, the following automated sequence occurs:

Infrastructure Provisioning:
- Creates Resource Group, 2 Azure AI Foundry projects (Sweden Central + East US), Key Vault, Storage Account, and Container Registry (ACR)
- Multi-Region Model Deployment:
  - Sweden Central (4 models):
    - Model Router (Orchestrator - automatic model selection from 18+ options)
    - GPT-4o (Vision and cropping tasks)
    - Sora (Native video generation)
    - FLUX.1-Kontext-pro (Image generation and context-aware image editing)
  - East US (1 model):
    - FLUX.2-pro (Background generation, thumbnail creation, artistic image manipulation)
- All models use GlobalStandard SKU for optimal performance
- All resources use Managed Identity for secure authentication (no API keys stored)
Automated Agent Creation:
- Fully automated by Terraform: No manual intervention required
- Installs the azure-ai-projects SDK and connects to MSFT Foundry projects in both regions
- Creates specialized media processing agents:
  - Sweden Central: zava-media-orchestrator, vision-analyst
  - East US: No agents (Models accessed directly via code)
- Automatically stores agent IDs, with region prefixes, in Azure Key Vault for secure access
- Web app retrieves agent configuration from Key Vault automatically
- Zero manual configuration - Terraform handles all multi-region agent deployment and setup
Application Deployment:
- Builds the Docker container in the cloud (ACR Build)
- Configures the Azure Web App with the generated Agent IDs and Managed Identity
- Deploys the container and restarts the app
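
As an illustration of the "region prefix" idea for stored agent IDs, the sketch below builds a Key Vault secret name from a region and an agent name. The naming scheme (lowercased region, hyphen, agent name, "-id" suffix) is an assumption; the names the Terraform configuration actually writes may differ. The validation reflects the Key Vault rule that secret names are 1-127 characters of letters, digits, and hyphens.

```python
import re

# Key Vault secret names: 1-127 characters, alphanumerics and hyphens only.
_SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def agent_secret_name(region: str, agent_name: str) -> str:
    """Build a region-prefixed secret name (hypothetical scheme) and validate it."""
    prefix = re.sub(r"[^0-9a-z]", "", region.lower())  # "Sweden Central" -> "swedencentral"
    name = f"{prefix}-{agent_name}-id"
    if not _SECRET_NAME.fullmatch(name):
        raise ValueError(f"invalid Key Vault secret name: {name!r}")
    return name


print(agent_secret_name("Sweden Central", "zava-media-orchestrator"))
# swedencentral-zava-media-orchestrator-id
```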

Verification

After deployment completes, verify the system:

Check the Web App:
- The Terraform output will provide the application_url
- Visit https://<your-app-name>.azurewebsites.net
- You should see the Zava Media AI interface
  
  How.the.Web.App.looks.like.mp4
Verify Agent Architecture:
- Go to the MSFT Foundry Portal
- Check Sweden Central Project -> Build -> Agents:
  - Should see: zava-media-orchestrator and vision-analyst
- Check East US Project:
  - Note: No agents are created in East US. The FLUX.2-pro model is accessed directly via code.
- Agent IDs are automatically stored in Azure Key Vault with region prefixes and retrieved by the web app
Test Processing, for example:
- Chat: Ask a general question, such as "What is GitHub Copilot?"
- Image Upload: Upload an image and ask "Crop the main subject"
- Background: "Change the background to a beach scene" (routed to East US for fast generation)
- Thumbnail: "Create a thumbnail with the text 'AMAZING'" (routed to East US)
- Multi-Step: "Crop the car, put it on a race track background, and add the text 'SPEED' in red"
- Video: "Generate a 5-second video of a sunset over mountains" (Sweden Central - Sora)
- Document: "Extract all text from this PDF" or "Summarize this document" (Sweden Central - FLUX.1-Kontext-pro)
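
The first verification step (checking that the web app responds) can be automated with a small smoke check. This is a sketch using only the standard library; the helper names are hypothetical, and "my-app" stands in for your own app name from the Terraform application_url output.

```python
import urllib.request


def app_url(app_name: str) -> str:
    """Build the default Azure Web App URL for an app name."""
    return f"https://{app_name}.azurewebsites.net"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses: covers DNS failures,
        # timeouts, refused connections, and 4xx/5xx responses.
        return False


print(app_url("my-app"))  # https://my-app.azurewebsites.net
```

Calling is_reachable(app_url("<your-app-name>")) shortly after terraform apply may return False while the container is still starting, so a retry after a minute or two is reasonable before investigating further.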

Refresh Date: 2026-01-16

MicrosoftCloudEssentials-LearningHub/Agentic-AI-Media-Assistant