Azure OAI Proxy is a lightweight, high-performance proxy server that enables seamless integration between Azure OpenAI Services and applications built for OpenAI-compatible API endpoints. It bridges the gap for tools and services that are written against OpenAI's API structure but need to use Azure's OpenAI services, including support for the latest reasoning models through Azure's Responses API.
- ✅ API Compatibility: Translates requests from the OpenAI API format to the Azure OpenAI Services format on the fly.
- 🧠 Advanced Reasoning Model Support: Full support for Azure's advanced reasoning models (O1, O3, O4 series) through automatic Responses API integration.
- 📡 Streaming Support: Real-time streaming for both traditional chat models and reasoning models with proper format conversion.
- 🗺️ Model Mapping: Automatically maps OpenAI model names to Azure deployment names, with a comprehensive failsafe list.
- 🔄 Dynamic Model List: Fetches available models directly from your Azure OpenAI deployment using a dedicated API version.
- 🌐 Support for Multiple Endpoints: Handles various API endpoints including image, speech, completions, chat completions, embeddings, responses API, and more.
- 🚦 Error Handling: Provides meaningful error messages and logging for easier debugging.
- ⚙️ Configurable: Easy to set up with environment variables for Azure AI/Azure OAI endpoint, API keys, and API versions.
- 🔐 Serverless Deployment Support: Supports Azure AI serverless deployments with custom authentication.
- 🔀 Automatic API Selection: Intelligently routes requests to Chat Completions API or Responses API based on model capabilities.
This proxy is particularly useful for:
- Running applications like Open WebUI with Azure OpenAI Services, including advanced reasoning models like O3 and O1.
- Seamlessly using Azure's latest reasoning models in tools built for OpenAI API.
- Testing Azure OpenAI capabilities using tools built for the OpenAI API.
- Transitioning projects from OpenAI to Azure OpenAI with minimal code changes.
- Accessing Azure-exclusive models and features through familiar OpenAI interfaces.
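For most OpenAI-compatible tools, switching to the proxy is just a base-URL change. A minimal sketch, assuming the proxy is reachable at `localhost:11437` (the exact variable names depend on your client; `OPENAI_BASE_URL`/`OPENAI_API_KEY` are what the official OpenAI SDKs read):

```bash
# Point an OpenAI-compatible client at the proxy instead of api.openai.com.
# The key you pass through is your Azure OpenAI key, not an OpenAI one.
export OPENAI_BASE_URL="http://localhost:11437/v1"
export OPENAI_API_KEY="your-azure-api-key"
```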
While Azure OAI Proxy serves as a convenient bridge, it's recommended to use the official Azure OpenAI SDK or API directly in production environments or when building new services.
Direct integration offers:
- Better performance
- More reliable and up-to-date feature support
- Simplified architecture with one less component to maintain
- Direct access to Azure-specific features and optimizations
This proxy is ideal for testing, development, and scenarios where modifying the original application to use Azure OpenAI directly is not feasible.
I also strongly recommend using TLS/SSL for secure communication between the client and the proxy. This is especially important when using the proxy in a production environment (even though you shouldn't, but well, here you are anyway). TBD: Add a docker compose file including nginx proxy manager.
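Until that compose file lands, any TLS-terminating reverse proxy in front of the container will do. A minimal sketch using Caddy's built-in reverse proxy (the hostname is a placeholder; nginx proxy manager or plain nginx work just as well):

```bash
# Terminate TLS at Caddy and forward plaintext traffic to the proxy on 11437.
# Caddy provisions a certificate for the hostname automatically.
caddy reverse-proxy --from proxy.example.com --to localhost:11437
```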
The latest version of Azure OAI Proxy supports the following APIs:
| Path | Status | Notes |
|---|---|---|
| /v1/chat/completions | ✅ | Auto-routes to Responses API for reasoning models |
| /v1/completions | ✅ | |
| /v1/embeddings | ✅ | |
| /v1/images/generations | ✅ | |
| /v1/fine_tunes | ✅ | |
| /v1/files | ✅ | |
| /v1/models | ✅ | |
| /v1/responses | ✅ | New - Azure Responses API support |
| /v1/responses/:response_id | ✅ | New - Retrieve, delete, cancel operations |
| /v1/responses/:response_id/input_items | ✅ | New - List input items |
| /deployments | ✅ | |
| /v1/audio/speech | ✅ | |
| /v1/audio/transcriptions | ✅ | |
| /v1/audio/translations | ✅ | |
| /v1/models/:model_id/capabilities | ✅ | |
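For example, a quick way to confirm the proxy is up and can reach your Azure resource is to list models through it (assuming the default listen address):

```bash
# Returns the models/deployments fetched from your Azure OpenAI endpoint.
curl http://localhost:11437/v1/models \
  -H "Authorization: Bearer your-azure-api-key"
```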
The proxy automatically detects model capabilities and routes requests appropriately:
- GPT-5.2 series: gpt-5.2, gpt-5.2-chat (NEW - Preview)
- GPT-5.1 series: gpt-5.1, gpt-5.1-chat (NEW)
- GPT-5 series: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat
- GPT-4.1 series: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- GPT-4o series: gpt-4o, gpt-4o-mini, gpt-4o-2024-11-20, etc.
- GPT-4 series: gpt-4, gpt-4-turbo, gpt-4-32k, etc.
- GPT-3.5 series: gpt-3.5-turbo, gpt-3.5-turbo-16k, etc.
- Claude series (Azure Foundry - Anthropic Messages API): claude-opus-4.5, claude-sonnet-4.5, claude-haiku-4.5, claude-opus-4.1
  - ⚠️ Note: Claude models must be deployed in your Azure Foundry account first. Claude uses the Anthropic Messages API (NOT the Responses API).
  - Deployment name must match your Azure deployment (e.g., use `AZURE_OPENAI_MODEL_MAPPER` if needed).
- Phi series (Azure Foundry): phi-3, phi-3-mini, phi-3-small, phi-3-medium, phi-4
- Open Source Models: Mistral, Llama, gpt-oss-120b, gpt-oss-20b (via serverless/managed deployments)
- O1 Series: o1, o1-preview, o1-mini
- O3 Series: o3, o3-pro, o3-mini, o3-deep-research
- O4 Series: o4, o4-mini
- Codex Models: codex-mini, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, gpt-5-codex
- Specialized: computer-use-preview, gpt-5-pro
- Realtime Audio: gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, gpt-realtime-mini
- Audio Generation: gpt-4o-audio-preview, gpt-4o-mini-audio-preview, gpt-audio, gpt-audio-mini
- Speech-to-Text: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, whisper
- Text-to-Speech: gpt-4o-mini-tts, tts, tts-hd
- Image Generation: gpt-image-1, gpt-image-1-mini, dall-e-2, dall-e-3
- Video Generation: sora, sora-2
Reasoning models automatically use Azure's Responses API while maintaining OpenAI chat completion interface compatibility.
| Parameter | Description | Default Value | Required |
|---|---|---|---|
| AZURE_OPENAI_ENDPOINT | Azure OpenAI Endpoint | | Yes |
| AZURE_OPENAI_PROXY_ADDRESS | Service listening address | 0.0.0.0:11437 | No |
| AZURE_OPENAI_PROXY_MODE | Proxy mode, can be either "azure" or "openai" | azure | No |
| AZURE_OPENAI_APIVERSION | Azure OpenAI API version (for general operations) | 2024-08-01-preview | No |
| AZURE_OPENAI_MODELS_APIVERSION | Azure OpenAI API version (for fetching models) | 2024-10-21 | No |
| AZURE_OPENAI_RESPONSES_APIVERSION | Azure OpenAI API version (for Responses API/O-series) | 2024-08-01-preview | No |
| ANTHROPIC_APIVERSION | Anthropic API version (for Claude models) | 2023-06-01 | No |
| AZURE_OPENAI_MODEL_MAPPER | Comma-separated list of model=deployment pairs | | No |
| AZURE_AI_STUDIO_DEPLOYMENTS | Comma-separated list of serverless deployments | | No |
| AZURE_OPENAI_KEY_* | API keys for serverless deployments (replace * with uppercase model name) | | No |
Here's an example `compose.yaml` file with all possible environment variable options:
```yaml
services:
  azure-oai-proxy:
    image: 'gyarbij/azure-oai-proxy:latest'
    # container_name: azure-oai-proxy
    # Alternatively, use GitHub Container Registry:
    # image: 'ghcr.io/gyarbij/azure-oai-proxy:latest'
    restart: always
    environment:
      - AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
      - AZURE_OPENAI_APIVERSION=2024-08-01-preview
      - AZURE_OPENAI_MODELS_APIVERSION=2024-10-21
      - AZURE_OPENAI_RESPONSES_APIVERSION=2024-08-01-preview
      - ANTHROPIC_APIVERSION=2023-06-01
      # - AZURE_OPENAI_PROXY_ADDRESS=0.0.0.0:11437
      # - AZURE_OPENAI_PROXY_MODE=azure
      # - AZURE_OPENAI_MODEL_MAPPER=gpt-3.5-turbo=gpt-35-turbo,gpt-4=gpt-4-turbo
      # - AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral,llama-3.1-405B=Meta-Llama-3-1-405B-Instruct:northcentralus,claude-sonnet-4.5=Claude-Sonnet-45:eastus2
      # - AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key-1
      # - AZURE_OPENAI_KEY_LLAMA-3.1-405B=your-api-key-2
      # - AZURE_OPENAI_KEY_CLAUDE-SONNET-4.5=your-api-key-3
    ports:
      - '11437:11437'
    # Uncomment the following line to use an .env file:
    # env_file: .env
```

To use this configuration:

1. Save the above content in a file named `compose.yaml`.
2. Replace the placeholder values (e.g., `your-endpoint`, `your-api-key-1`, etc.) with your actual Azure OpenAI configuration.
3. Run the following command in the same directory as your `compose.yaml` file:

```bash
docker compose up -d
```

To use an `.env` file instead of environment variables in the Compose file:
1. Create a file named `.env` in the same directory as your `compose.yaml`.
2. Add your environment variables to the `.env` file, one per line:

   ```env
   AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
   AZURE_OPENAI_APIVERSION=2024-08-01-preview
   AZURE_OPENAI_MODELS_APIVERSION=2024-10-21
   AZURE_OPENAI_RESPONSES_APIVERSION=2024-08-01-preview
   ANTHROPIC_APIVERSION=2023-06-01
   AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral,llama-3.1-405B=Meta-Llama-3-1-405B-Instruct:northcentralus,claude-sonnet-4.5=Claude-Sonnet-45:eastus2
   AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key-1
   AZURE_OPENAI_KEY_LLAMA-3.1-405B=your-api-key-2
   AZURE_OPENAI_KEY_CLAUDE-SONNET-4.5=your-api-key-3
   ```

3. Uncomment the `env_file: .env` line in your `compose.yaml`.
4. Run `docker compose up -d` to start the container with the environment variables from the `.env` file.
To run the Azure OAI Proxy using the image from GitHub Container Registry:
```bash
docker run -d -p 11437:11437 \
  -e AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ \
  -e AZURE_OPENAI_MODELS_APIVERSION=2024-10-21 \
  -e AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral \
  -e AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key \
  ghcr.io/gyarbij/azure-oai-proxy:latest
```

Replace the placeholder values with your actual Azure OpenAI configuration.
Once the proxy is running, you can call it using the OpenAI API format:
```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

- Claude models must be deployed in your Azure Foundry account before use
- They use the Anthropic Messages API (automatically converted from OpenAI chat completions format)
- The proxy automatically handles the conversion - just use the standard OpenAI format
- Requests are routed to the `/anthropic/v1/messages` endpoint
- Responses are automatically converted back to OpenAI chat completion format
Example - Standard OpenAI Format Works Seamlessly:
```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    "max_tokens": 1000
  }'
```

Behind the scenes:

- Request is automatically converted to Anthropic Messages API format
- Routed to `https://your-endpoint.services.ai.azure.com/anthropic/v1/messages` (no Azure api-version query parameter)
- Response is converted back to OpenAI chat completion format
- System messages are extracted and passed as the `system` parameter
- Headers are automatically adjusted (`x-api-key`, `anthropic-version: 2023-06-01`)
- Note: Uses the `ANTHROPIC_APIVERSION` environment variable (default: `2023-06-01`)
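For illustration, the upstream request the proxy constructs looks roughly like this (a sketch based on the public Anthropic Messages API shape; the exact payload the proxy builds may differ):

```bash
# Approximate equivalent of what the proxy sends to Azure Foundry:
curl https://your-endpoint.services.ai.azure.com/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-azure-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}]
  }'
```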
Example with custom deployment name:
If your Claude deployment has a different name (e.g., Claude-Sonnet-45-20251001), use the model mapper:
```env
AZURE_OPENAI_MODEL_MAPPER=claude-sonnet-4.5=Claude-Sonnet-45-20251001
```

Example - Phi model (Azure Foundry):

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "What is machine learning?"}]
  }'
```

Example - Reasoning model with streaming:

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "o3-pro",
    "messages": [{"role": "user", "content": "Solve this complex reasoning problem..."}],
    "stream": true
  }'
```

Example - Calling the Responses API directly:

```bash
curl http://localhost:11437/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "o3-pro",
    "input": "What are the implications of quantum computing?",
    "stream": false
  }'
```

For serverless deployments, use the model name as defined in your `AZURE_AI_STUDIO_DEPLOYMENTS` configuration.
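For example, with the `AZURE_AI_STUDIO_DEPLOYMENTS` mapping from the compose file above (deployment names are placeholders), a serverless Mistral deployment is called like any other model; the proxy authenticates upstream with the matching `AZURE_OPENAI_KEY_MISTRAL-LARGE-2407` key:

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "mistral-large-2407",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```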
These are the default mappings for the most common models. If your Azure OpenAI deployment uses different names, you can set the `AZURE_OPENAI_MODEL_MAPPER` environment variable to define custom mappings. The proxy also includes a comprehensive failsafe list to handle a wide variety of model names:
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `o1` | `o1` |
| `o1-preview` | `o1-preview` |
| `o1-mini` | `o1-mini` |
| `o1-mini-2024-09-12` | `o1-mini-2024-09-12` |
| `o3` | `o3` |
| `o3-mini` | `o3-mini` |
| `o3-pro` | `o3-pro` |
| `o3-pro-2025-06-10` | `o3-pro-2025-06-10` |
| `o4` | `o4` |
| `o4-mini` | `o4-mini` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `claude-opus-4.5` | `claude-opus-4.5` |
| `claude-opus-4-5` | `claude-opus-4.5` |
| `claude-sonnet-4.5` | `claude-sonnet-4.5` |
| `claude-sonnet-4-5` | `claude-sonnet-4.5` |
| `claude-haiku-4.5` | `claude-haiku-4.5` |
| `claude-haiku-4-5` | `claude-haiku-4.5` |
| `claude-opus-4.1` | `claude-opus-4.1` |
| `claude-opus-4-1` | `claude-opus-4.1` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `gpt-4o` | `gpt-4o` |
| `gpt-4o-2024-05-13` | `gpt-4o-2024-05-13` |
| `gpt-4o-2024-08-06` | `gpt-4o-2024-08-06` |
| `gpt-4o-2024-11-20` | `gpt-4o-2024-11-20` |
| `gpt-4o-mini` | `gpt-4o-mini` |
| `gpt-4o-mini-2024-07-18` | `gpt-4o-mini-2024-07-18` |
| `gpt-4` | `gpt-4-0613` |
| `gpt-4-turbo` | `gpt-4-turbo` |
| `gpt-4-turbo-2024-04-09` | `gpt-4-turbo-2024-04-09` |
| `gpt-3.5-turbo` | `gpt-35-turbo-0613` |
| `gpt-3.5-turbo-16k` | `gpt-35-turbo-16k-0613` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `phi-3` | `phi-3` |
| `phi-3-mini` | `phi-3-mini` |
| `phi-3-small` | `phi-3-small` |
| `phi-3-medium` | `phi-3-medium` |
| `phi-4` | `phi-4` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `text-embedding-3-small` | `text-embedding-3-small-1` |
| `text-embedding-3-large` | `text-embedding-3-large-1` |
| `dall-e-2` | `dall-e-2-2.0` |
| `dall-e-3` | `dall-e-3-3.0` |
| `tts` | `tts-001` |
| `tts-hd` | `tts-hd-001` |
| `whisper` | `whisper-001` |
For custom fine-tuned models, the model name can be passed directly. For models with deployment names different from the model names, custom mapping relationships can be defined, such as:
| Model Name | Deployment Name |
|---|---|
| gpt-3.5-turbo | gpt-35-turbo-upgrade |
| gpt-3.5-turbo-0301 | gpt-35-turbo-0301-fine-tuned |
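Expressed in the `AZURE_OPENAI_MODEL_MAPPER` format shown earlier, the table above would be set as:

```env
AZURE_OPENAI_MODEL_MAPPER=gpt-3.5-turbo=gpt-35-turbo-upgrade,gpt-3.5-turbo-0301=gpt-35-turbo-0301-fine-tuned
```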
The proxy automatically detects when you're using reasoning models (O1, O3, O4 series) and:
- Routes to Responses API: Automatically converts `/v1/chat/completions` requests to use Azure's `/openai/v1/responses` endpoint
- Converts Request Format: Transforms OpenAI chat messages to Responses API input format
- Handles Streaming: Converts Responses API SSE events to OpenAI-compatible streaming format
- Maintains Compatibility: Your client code doesn't need to change - use standard OpenAI format

Supported reasoning model families:

- O1 Family: `o1`, `o1-preview`, `o1-mini`, `o1-mini-2024-09-12`
- O3 Family: `o3`, `o3-pro`, `o3-mini`, `o3-pro-2025-06-10`
- O4 Family: `o4`, `o4-mini`
When using reasoning models, you get access to:
- Advanced Reasoning: Enhanced problem-solving capabilities
- Reasoning Traces: Detailed reasoning process (when available)
- Background Processing: Support for long-running reasoning tasks
- Chain of Thought: Structured reasoning outputs
- Always use HTTPS in production environments for secure communication.
- Regularly update the proxy to ensure compatibility with the latest Azure OpenAI API changes.
- Monitor your Azure OpenAI usage and costs, especially when using this proxy in high-traffic scenarios.
- Reasoning models may have higher latency due to their advanced processing capabilities.
- Some reasoning models may have usage limits or require special access permissions.
✅ NEW: Native Anthropic Messages API Support

- Claude models now use the Anthropic Messages API (`/anthropic/v1/messages`)
- Automatic conversion from OpenAI chat completions format
- Automatic response conversion back to OpenAI format
- No configuration changes needed - use standard OpenAI format
Error: "This model is not supported by Responses API"
- Fixed: Claude models now correctly use Anthropic Messages API (not Responses API or standard Chat Completions)
- Solution: Update to the latest version - the proxy now automatically routes Claude to the correct endpoint
Error: "Unknown model: claude-sonnet-4-5" or similar
- Cause: The deployment name in Azure doesn't match the model name you're using
- Solution: Use
AZURE_OPENAI_MODEL_MAPPERto map the model name to your actual Azure deployment name:# If your deployment is named something like "Claude-Sonnet-45-20251001" AZURE_OPENAI_MODEL_MAPPER=claude-sonnet-4.5=Claude-Sonnet-45-20251001 - Tip: Check your Azure Foundry portal to see the exact deployment name
Deployment Requirements:

- Claude models must be deployed in your Azure Foundry account (East US2 or Sweden Central)
- They require Global Standard deployment
- The endpoint format is `https://your-resource.services.ai.azure.com`
- Uses the `x-api-key` header and `anthropic-version: 2023-06-01`
Error: "Resource not found" (404)
- Check deployment exists: Verify the model is deployed in your Azure account
- Check deployment name: Use the detailed logging to see what deployment name is being used
- Use model mapper: Map model names to your actual deployment names if they differ
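A quick way to see the deployment name the proxy resolved is to follow its logs while sending a request (a sketch; assumes the `container_name: azure-oai-proxy` line in the compose example is uncommented):

```bash
# Watch proxy logs to see which Azure deployment each request maps to.
docker logs -f azure-oai-proxy
```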
- 2025-12-14 (Latest) Added native Anthropic Messages API support for Claude models:
  - Claude models now use the `/anthropic/v1/messages` endpoint (correct format for Azure Foundry)
  - Automatic bidirectional conversion between OpenAI and Anthropic formats
  - System messages extracted and handled correctly
  - Headers automatically adjusted (`x-api-key`, `anthropic-version`)
  - Seamless integration - use standard OpenAI chat completions format
- 2025-12-14 Added comprehensive Azure OpenAI in Microsoft Foundry support including:
- GPT-5.2 series (gpt-5.2, gpt-5.2-chat) - NEW preview models
- GPT-5.1 series (gpt-5.1, gpt-5.1-chat, gpt-5.1-codex variants)
- GPT-5 series (gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-5-codex, gpt-5-pro)
- GPT-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano)
- Claude 4.x models (Opus 4.5, Sonnet 4.5, Haiku 4.5, Opus 4.1)
- Complete O-series reasoning models (o1, o3, o4 variants, o3-deep-research)
- Codex models (codex-mini, gpt-5.1-codex variants)
- Audio models (gpt-4o audio/realtime/transcribe, gpt-realtime, gpt-audio variants)
- Image generation (gpt-image-1, gpt-image-1-mini)
- Video generation (sora, sora-2)
- Open-weight models (gpt-oss-120b, gpt-oss-20b)
- Specialized models (computer-use-preview)
- Updated API versions to 2024-08-01-preview (general and Responses API - supports all Azure Foundry models)
- 2025-08-03 (v1.0.8) Added comprehensive support for Azure OpenAI Responses API with automatic reasoning model detection and streaming conversion.
- 2025-01-24 Added support for Azure OpenAI API version 2024-12-01-preview and new model fetching mechanism.
- 2024-07-25 Implemented support for Azure AI Studio deployments with support for Meta LLama 3.1, Mistral-2407 (mistral large 2), and other open models including from Cohere AI.
- 2024-07-18 Added support for `gpt-4o-mini`.
- 2024-06-23 Implemented dynamic model fetching for the `/v1/models` endpoint, replacing the hardcoded model list.
- 2024-06-23 Unified token handling mechanism across the application, improving consistency and security.
- 2024-06-23 Added support for audio-related endpoints: `/v1/audio/speech`, `/v1/audio/transcriptions`, and `/v1/audio/translations`.
- 2024-06-23 Implemented flexible environment variable handling for configuration (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_TOKEN).
- 2024-06-23 Added support for the model capabilities endpoint `/v1/models/:model_id/capabilities`.
- 2024-06-23 Improved cross-origin resource sharing (CORS) handling with OPTIONS requests.
- 2024-06-23 Enhanced proxy functionality to better handle various Azure OpenAI API endpoints.
- 2024-06-23 Implemented fallback model mapping for unsupported models.
- 2024-06-22 Added support for image generation (`/v1/images/generations`), fine-tuning operations (`/v1/fine_tunes`), and file management (`/v1/files`).
- 2024-06-22 Implemented better error handling and logging for API requests.
- 2024-06-22 Improved handling of rate limiting and streaming responses.
- 2024-06-22 Updated model mappings to include the latest models (gpt-4-turbo, gpt-4-vision-preview, dall-e-3).
- 2024-06-23 Added support for deployments management (/deployments).
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
This project is not officially associated with or endorsed by Microsoft Azure or OpenAI. Use at your own discretion and ensure compliance with all relevant terms of service.