Azure OAI Proxy is a lightweight, high-performance proxy server that enables seamless integration between Azure OpenAI Services and applications built for OpenAI-compatible API endpoints. It bridges the gap for tools and services that are written against OpenAI's API structure but need to use Azure's OpenAI services, including support for the latest reasoning models through Azure's Responses API.
- ✅ API Compatibility: Translates requests from the OpenAI API format to the Azure OpenAI Services format on the fly.
- 🧠 Advanced Reasoning Model Support: Full support for Azure's advanced reasoning models (O1, O3, O4 series) through automatic Responses API integration.
- 📡 Streaming Support: Real-time streaming for both traditional chat models and reasoning models with proper format conversion.
- 🗺️ Model Mapping: Automatically maps OpenAI model names to Azure deployment names, with a comprehensive failsafe list.
- 🔄 Dynamic Model List: Fetches available models directly from your Azure OpenAI deployment using a dedicated API version.
- 🌐 Support for Multiple Endpoints: Handles various API endpoints including image, speech, completions, chat completions, embeddings, responses API, and more.
- 🚦 Error Handling: Provides meaningful error messages and logging for easier debugging.
- ⚙️ Configurable: Easy to set up with environment variables for Azure AI/Azure OAI endpoint, API keys, and API versions.
- 🔐 Serverless Deployment Support: Supports Azure AI serverless deployments with custom authentication.
- 🔀 Automatic API Selection: Intelligently routes requests to Chat Completions API or Responses API based on model capabilities.
This proxy is particularly useful for:
- Running applications like Open WebUI with Azure OpenAI Services, including advanced reasoning models like O3 and O1.
- Seamlessly using Azure's latest reasoning models in tools built for OpenAI API.
- Testing Azure OpenAI capabilities using tools built for the OpenAI API.
- Transitioning projects from OpenAI to Azure OpenAI with minimal code changes.
- Accessing Azure-exclusive models and features through familiar OpenAI interfaces.
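For most OpenAI-compatible tools, switching to the proxy is just a base-URL change. A minimal sketch, assuming the proxy is reachable at `localhost:11437` (the exact variable names depend on your client; `OPENAI_BASE_URL`/`OPENAI_API_KEY` are what the official OpenAI SDKs read):

```bash
# Point an OpenAI-compatible client at the proxy instead of api.openai.com.
# The key you pass through is your Azure OpenAI key, not an OpenAI one.
export OPENAI_BASE_URL="http://localhost:11437/v1"
export OPENAI_API_KEY="your-azure-api-key"
```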
While Azure OAI Proxy serves as a convenient bridge, it's recommended to use the official Azure OpenAI SDK or API directly in production environments or when building new services.
Direct integration offers:
- Better performance
- More reliable and up-to-date feature support
- Simplified architecture with one less component to maintain
- Direct access to Azure-specific features and optimizations
This proxy is ideal for testing, development, and scenarios where modifying the original application to use Azure OpenAI directly is not feasible.
I also strongly recommend using TLS/SSL for secure communication between the client and the proxy. This is especially important when using the proxy in a production environment (even though you shouldn't, but well, here you are anyway). TBD: Add a docker compose file including nginx proxy manager.
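Until that compose file lands, any TLS-terminating reverse proxy in front of the container will do. A minimal sketch using Caddy's built-in reverse proxy (the hostname is a placeholder; nginx proxy manager or plain nginx work just as well):

```bash
# Terminate TLS at Caddy and forward plaintext traffic to the proxy on 11437.
# Caddy provisions a certificate for the hostname automatically.
caddy reverse-proxy --from proxy.example.com --to localhost:11437
```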
The latest version of Azure OAI Proxy supports the following APIs:
| Path | Status | Notes |
|---|---|---|
| /v1/chat/completions | ✅ | Auto-routes to Responses API for reasoning models |
| /v1/completions | ✅ | |
| /v1/embeddings | ✅ | |
| /v1/images/generations | ✅ | |
| /v1/fine_tunes | ✅ | |
| /v1/files | ✅ | |
| /v1/models | ✅ | |
| /v1/responses | ✅ | New - Azure Responses API support |
| /v1/responses/:response_id | ✅ | New - Retrieve, delete, cancel operations |
| /v1/responses/:response_id/input_items | ✅ | New - List input items |
| /deployments | ✅ | |
| /v1/audio/speech | ✅ | |
| /v1/audio/transcriptions | ✅ | |
| /v1/audio/translations | ✅ | |
| /v1/models/:model_id/capabilities | ✅ | |
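For example, a quick way to confirm the proxy is up and can reach your Azure resource is to list models through it (assuming the default listen address):

```bash
# Returns the models/deployments fetched from your Azure OpenAI endpoint.
curl http://localhost:11437/v1/models \
  -H "Authorization: Bearer your-azure-api-key"
```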
The proxy automatically detects model capabilities and routes requests appropriately:
- GPT-5.2 series: gpt-5.2, gpt-5.2-chat (NEW - Preview)
- GPT-5.1 series: gpt-5.1, gpt-5.1-chat (NEW)
- GPT-5 series: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat
- GPT-4.1 series: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- GPT-4o series: gpt-4o, gpt-4o-mini, gpt-4o-2024-11-20, etc.
- GPT-4 series: gpt-4, gpt-4-turbo, gpt-4-32k, etc.
- GPT-3.5 series: gpt-3.5-turbo, gpt-3.5-turbo-16k, etc.
- Claude series (Azure Foundry - Anthropic Messages API): claude-opus-4.5, claude-sonnet-4.5, claude-haiku-4.5, claude-opus-4.1
  - ⚠️ Note: Claude models must be deployed in your Azure Foundry account first. Claude uses the Anthropic Messages API (NOT the Responses API).
  - Deployment name must match your Azure deployment (e.g., use `AZURE_OPENAI_MODEL_MAPPER` if needed).
- Phi series (Azure Foundry): phi-3, phi-3-mini, phi-3-small, phi-3-medium, phi-4
- Open Source Models: Mistral, Llama, gpt-oss-120b, gpt-oss-20b (via serverless/managed deployments)
- O1 Series: o1, o1-preview, o1-mini
- O3 Series: o3, o3-pro, o3-mini, o3-deep-research
- O4 Series: o4, o4-mini
- Codex Models: codex-mini, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, gpt-5-codex
- Specialized: computer-use-preview, gpt-5-pro
- Realtime Audio: gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, gpt-realtime-mini
- Audio Generation: gpt-4o-audio-preview, gpt-4o-mini-audio-preview, gpt-audio, gpt-audio-mini
- Speech-to-Text: gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, whisper
- Text-to-Speech: gpt-4o-mini-tts, tts, tts-hd
- Image Generation: gpt-image-1, gpt-image-1-mini, dall-e-2, dall-e-3
- Video Generation: sora, sora-2
Reasoning models automatically use Azure's Responses API while maintaining OpenAI chat completion interface compatibility.
| Parameter | Description | Default Value | Required |
|---|---|---|---|
| AZURE_OPENAI_ENDPOINT | Azure OpenAI Endpoint | | Yes |
| AZURE_OPENAI_PROXY_ADDRESS | Service listening address | 0.0.0.0:11437 | No |
| AZURE_OPENAI_PROXY_MODE | Proxy mode, can be either "azure" or "openai" | azure | No |
| AZURE_OPENAI_APIVERSION | Azure OpenAI API version (for general operations) | 2024-08-01-preview | No |
| AZURE_OPENAI_MODELS_APIVERSION | Azure OpenAI API version (for fetching models) | 2024-10-21 | No |
| AZURE_OPENAI_RESPONSES_APIVERSION | Azure OpenAI API version (for Responses API/O-series) | 2024-08-01-preview | No |
| ANTHROPIC_APIVERSION | Anthropic API version (for Claude models) | 2023-06-01 | No |
| AZURE_OPENAI_MODEL_MAPPER | Comma-separated list of model=deployment pairs | | No |
| AZURE_AI_STUDIO_DEPLOYMENTS | Comma-separated list of serverless deployments | | No |
| AZURE_OPENAI_KEY_* | API keys for serverless deployments (replace * with uppercase model name) | | No |
Here's an example `compose.yaml` file with all possible environment variable options:
```yaml
services:
  azure-oai-proxy:
    image: 'gyarbij/azure-oai-proxy:latest'
    # container_name: azure-oai-proxy
    # Alternatively, use GitHub Container Registry:
    # image: 'ghcr.io/gyarbij/azure-oai-proxy:latest'
    restart: always
    environment:
      - AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
      - AZURE_OPENAI_APIVERSION=2024-08-01-preview
      - AZURE_OPENAI_MODELS_APIVERSION=2024-10-21
      - AZURE_OPENAI_RESPONSES_APIVERSION=2024-08-01-preview
      - ANTHROPIC_APIVERSION=2023-06-01
      # - AZURE_OPENAI_PROXY_ADDRESS=0.0.0.0:11437
      # - AZURE_OPENAI_PROXY_MODE=azure
      # - AZURE_OPENAI_MODEL_MAPPER=gpt-3.5-turbo=gpt-35-turbo,gpt-4=gpt-4-turbo
      # - AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral,llama-3.1-405B=Meta-Llama-3-1-405B-Instruct:northcentralus,claude-sonnet-4.5=Claude-Sonnet-45:eastus2
      # - AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key-1
      # - AZURE_OPENAI_KEY_LLAMA-3.1-405B=your-api-key-2
      # - AZURE_OPENAI_KEY_CLAUDE-SONNET-4.5=your-api-key-3
    ports:
      - '11437:11437'
    # Uncomment the following line to use an .env file:
    # env_file: .env
```

To use this configuration:

1. Save the above content in a file named `compose.yaml`.
2. Replace the placeholder values (e.g., `your-endpoint`, `your-api-key-1`, etc.) with your actual Azure OpenAI configuration.
3. Run the following command in the same directory as your `compose.yaml` file:

```bash
docker compose up -d
```

To use an `.env` file instead of environment variables in the Compose file:
1. Create a file named `.env` in the same directory as your `compose.yaml`.
2. Add your environment variables to the `.env` file, one per line:

   ```env
   AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
   AZURE_OPENAI_APIVERSION=2024-08-01-preview
   AZURE_OPENAI_MODELS_APIVERSION=2024-10-21
   AZURE_OPENAI_RESPONSES_APIVERSION=2024-08-01-preview
   ANTHROPIC_APIVERSION=2023-06-01
   AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral,llama-3.1-405B=Meta-Llama-3-1-405B-Instruct:northcentralus,claude-sonnet-4.5=Claude-Sonnet-45:eastus2
   AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key-1
   AZURE_OPENAI_KEY_LLAMA-3.1-405B=your-api-key-2
   AZURE_OPENAI_KEY_CLAUDE-SONNET-4.5=your-api-key-3
   ```

3. Uncomment the `env_file: .env` line in your `compose.yaml`.
4. Run `docker compose up -d` to start the container with the environment variables from the `.env` file.
To run the Azure OAI Proxy using the image from GitHub Container Registry:
```bash
docker run -d -p 11437:11437 \
  -e AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ \
  -e AZURE_OPENAI_MODELS_APIVERSION=2024-10-21 \
  -e AZURE_AI_STUDIO_DEPLOYMENTS=mistral-large-2407=Mistral-large2:swedencentral \
  -e AZURE_OPENAI_KEY_MISTRAL-LARGE-2407=your-api-key \
  ghcr.io/gyarbij/azure-oai-proxy:latest
```

Replace the placeholder values with your actual Azure OpenAI configuration.
Once the proxy is running, you can call it using the OpenAI API format:
```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

- Claude models must be deployed in your Azure Foundry account before use
- They use the Anthropic Messages API (automatically converted from OpenAI chat completions format)
- The proxy automatically handles the conversion - just use the standard OpenAI format
- Requests are routed to the `/anthropic/v1/messages` endpoint
- Responses are automatically converted back to OpenAI chat completion format
Example - Standard OpenAI Format Works Seamlessly:
```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    "max_tokens": 1000
  }'
```

Behind the scenes:

- Request is automatically converted to Anthropic Messages API format
- Routed to `https://your-endpoint.services.ai.azure.com/anthropic/v1/messages` (no Azure api-version query parameter)
- Response is converted back to OpenAI chat completion format
- System messages are extracted and passed as the `system` parameter
- Headers are automatically adjusted (`x-api-key`, `anthropic-version: 2023-06-01`)
- Note: Uses the `ANTHROPIC_APIVERSION` environment variable (default: `2023-06-01`)
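For illustration, the upstream request the proxy constructs looks roughly like this (a sketch based on the public Anthropic Messages API shape; the exact payload the proxy builds may differ):

```bash
# Approximate equivalent of what the proxy sends to Azure Foundry:
curl https://your-endpoint.services.ai.azure.com/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-azure-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4.5",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}]
  }'
```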
Example with custom deployment name:
If your Claude deployment has a different name (e.g., Claude-Sonnet-45-20251001), use the model mapper:
```env
AZURE_OPENAI_MODEL_MAPPER=claude-sonnet-4.5=Claude-Sonnet-45-20251001
```

Example - Phi model (Azure Foundry):

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "What is machine learning?"}]
  }'
```

Example - Reasoning model with streaming:

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "o3-pro",
    "messages": [{"role": "user", "content": "Solve this complex reasoning problem..."}],
    "stream": true
  }'
```

Example - Calling the Responses API directly:

```bash
curl http://localhost:11437/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "o3-pro",
    "input": "What are the implications of quantum computing?",
    "stream": false
  }'
```

For serverless deployments, use the model name as defined in your `AZURE_AI_STUDIO_DEPLOYMENTS` configuration.
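For example, with the `AZURE_AI_STUDIO_DEPLOYMENTS` mapping from the compose file above (deployment names are placeholders), a serverless Mistral deployment is called like any other model; the proxy authenticates upstream with the matching `AZURE_OPENAI_KEY_MISTRAL-LARGE-2407` key:

```bash
curl http://localhost:11437/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-azure-api-key" \
  -d '{
    "model": "mistral-large-2407",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```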
These are the default mappings for the most common models. If your Azure OpenAI deployment uses different names, you can set the `AZURE_OPENAI_MODEL_MAPPER` environment variable to define custom mappings. The proxy also includes a comprehensive failsafe list to handle a wide variety of model names:
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `o1` | `o1` |
| `o1-preview` | `o1-preview` |
| `o1-mini` | `o1-mini` |
| `o1-mini-2024-09-12` | `o1-mini-2024-09-12` |
| `o3` | `o3` |
| `o3-mini` | `o3-mini` |
| `o3-pro` | `o3-pro` |
| `o3-pro-2025-06-10` | `o3-pro-2025-06-10` |
| `o4` | `o4` |
| `o4-mini` | `o4-mini` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `claude-opus-4.5` | `claude-opus-4.5` |
| `claude-opus-4-5` | `claude-opus-4.5` |
| `claude-sonnet-4.5` | `claude-sonnet-4.5` |
| `claude-sonnet-4-5` | `claude-sonnet-4.5` |
| `claude-haiku-4.5` | `claude-haiku-4.5` |
| `claude-haiku-4-5` | `claude-haiku-4.5` |
| `claude-opus-4.1` | `claude-opus-4.1` |
| `claude-opus-4-1` | `claude-opus-4.1` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `gpt-4o` | `gpt-4o` |
| `gpt-4o-2024-05-13` | `gpt-4o-2024-05-13` |
| `gpt-4o-2024-08-06` | `gpt-4o-2024-08-06` |
| `gpt-4o-2024-11-20` | `gpt-4o-2024-11-20` |
| `gpt-4o-mini` | `gpt-4o-mini` |
| `gpt-4o-mini-2024-07-18` | `gpt-4o-mini-2024-07-18` |
| `gpt-4` | `gpt-4-0613` |
| `gpt-4-turbo` | `gpt-4-turbo` |
| `gpt-4-turbo-2024-04-09` | `gpt-4-turbo-2024-04-09` |
| `gpt-3.5-turbo` | `gpt-35-turbo-0613` |
| `gpt-3.5-turbo-16k` | `gpt-35-turbo-16k-0613` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `phi-3` | `phi-3` |
| `phi-3-mini` | `phi-3-mini` |
| `phi-3-small` | `phi-3-small` |
| `phi-3-medium` | `phi-3-medium` |
| `phi-4` | `phi-4` |
| OpenAI Model | Azure OpenAI Model |
|---|---|
| `text-embedding-3-small` | `text-embedding-3-small-1` |
| `text-embedding-3-large` | `text-embedding-3-large-1` |
| `dall-e-2` | `dall-e-2-2.0` |
| `dall-e-3` | `dall-e-3-3.0` |
| `tts` | `tts-001` |
| `tts-hd` | `tts-hd-001` |
| `whisper` | `whisper-001` |
For custom fine-tuned models, the model name can be passed directly. For models with deployment names different from the model names, custom mapping relationships can be defined, such as:
| Model Name | Deployment Name |
|---|---|
| gpt-3.5-turbo | gpt-35-turbo-upgrade |
| gpt-3.5-turbo-0301 | gpt-35-turbo-0301-fine-tuned |
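Expressed in the `AZURE_OPENAI_MODEL_MAPPER` format shown earlier, the table above would be set as:

```env
AZURE_OPENAI_MODEL_MAPPER=gpt-3.5-turbo=gpt-35-turbo-upgrade,gpt-3.5-turbo-0301=gpt-35-turbo-0301-fine-tuned
```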
The proxy automatically detects when you're using reasoning models (O1, O3, O4 series) and:
- Routes to Responses API: Automatically converts `/v1/chat/completions` requests to use Azure's `/openai/v1/responses` endpoint
- Converts Request Format: Transforms OpenAI chat messages to Responses API input format
- Handles Streaming: Converts Responses API SSE events to OpenAI-compatible streaming format
- Maintains Compatibility: Your client code doesn't need to change - use standard OpenAI format

Supported reasoning model families:

- O1 Family: `o1`, `o1-preview`, `o1-mini`, `o1-mini-2024-09-12`
- O3 Family: `o3`, `o3-pro`, `o3-mini`, `o3-pro-2025-06-10`
- O4 Family: `o4`, `o4-mini`
When using reasoning models, you get access to:
- Advanced Reasoning: Enhanced problem-solving capabilities
- Reasoning Traces: Detailed reasoning process (when available)
- Background Processing: Support for long-running reasoning tasks
- Chain of Thought: Structured reasoning outputs
- Always use HTTPS in production environments for secure communication.
- Regularly update the proxy to ensure compatibility with the latest Azure OpenAI API changes.
- Monitor your Azure OpenAI usage and costs, especially when using this proxy in high-traffic scenarios.
- Reasoning models may have higher latency due to their advanced processing capabilities.
- Some reasoning models may have usage limits or require special access permissions.
✅ NEW: Native Anthropic Messages API Support

- Claude models now use the Anthropic Messages API (`/anthropic/v1/messages`)
- Automatic conversion from OpenAI chat completions format
- Automatic response conversion back to OpenAI format
- No configuration changes needed - use standard OpenAI format
Error: "This model is not supported by Responses API"
- Fixed: Claude models now correctly use Anthropic Messages API (not Responses API or standard Chat Completions)
- Solution: Update to the latest version - the proxy now automatically routes Claude to the correct endpoint
Error: "Unknown model: claude-sonnet-4-5" or similar
- Cause: The deployment name in Azure doesn't match the model name you're using
- Solution: Use
AZURE_OPENAI_MODEL_MAPPERto map the model name to your actual Azure deployment name:# If your deployment is named something like "Claude-Sonnet-45-20251001" AZURE_OPENAI_MODEL_MAPPER=claude-sonnet-4.5=Claude-Sonnet-45-20251001 - Tip: Check your Azure Foundry portal to see the exact deployment name
Deployment Requirements:

- Claude models must be deployed in your Azure Foundry account (East US2 or Sweden Central)
- They require Global Standard deployment
- The endpoint format is `https://your-resource.services.ai.azure.com`
- Uses the `x-api-key` header and `anthropic-version: 2023-06-01`
Error: "Resource not found" (404)
- Check deployment exists: Verify the model is deployed in your Azure account
- Check deployment name: Use the detailed logging to see what deployment name is being used
- Use model mapper: Map model names to your actual deployment names if they differ
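A quick way to see the deployment name the proxy resolved is to follow its logs while sending a request (a sketch; assumes the `container_name: azure-oai-proxy` line in the compose example is uncommented):

```bash
# Watch proxy logs to see which Azure deployment each request maps to.
docker logs -f azure-oai-proxy
```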
- 2025-12-14 (Latest) Added native Anthropic Messages API support for Claude models:
  - Claude models now use the `/anthropic/v1/messages` endpoint (correct format for Azure Foundry)
  - Automatic bidirectional conversion between OpenAI and Anthropic formats
  - System messages extracted and handled correctly
  - Headers automatically adjusted (`x-api-key`, `anthropic-version`)
  - Seamless integration - use standard OpenAI chat completions format
- 2025-12-14 Added comprehensive Azure OpenAI in Microsoft Foundry support including:
- GPT-5.2 series (gpt-5.2, gpt-5.2-chat) - NEW preview models
- GPT-5.1 series (gpt-5.1, gpt-5.1-chat, gpt-5.1-codex variants)
- GPT-5 series (gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-5-codex, gpt-5-pro)
- GPT-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano)
- Claude 4.x models (Opus 4.5, Sonnet 4.5, Haiku 4.5, Opus 4.1)
- Complete O-series reasoning models (o1, o3, o4 variants, o3-deep-research)
- Codex models (codex-mini, gpt-5.1-codex variants)
- Audio models (gpt-4o audio/realtime/transcribe, gpt-realtime, gpt-audio variants)
- Image generation (gpt-image-1, gpt-image-1-mini)
- Video generation (sora, sora-2)
- Open-weight models (gpt-oss-120b, gpt-oss-20b)
- Specialized models (computer-use-preview)
- Updated API versions to 2024-08-01-preview (general and Responses API - supports all Azure Foundry models)
- 2025-08-03 (v1.0.8) Added comprehensive support for Azure OpenAI Responses API with automatic reasoning model detection and streaming conversion.
- 2025-01-24 Added support for Azure OpenAI API version 2024-12-01-preview and new model fetching mechanism.
- 2024-07-25 Implemented support for Azure AI Studio deployments with support for Meta LLama 3.1, Mistral-2407 (mistral large 2), and other open models including from Cohere AI.
- 2024-07-18 Added support for `gpt-4o-mini`.
- 2024-06-23 Implemented dynamic model fetching for the `/v1/models` endpoint, replacing the hardcoded model list.
- 2024-06-23 Unified token handling mechanism across the application, improving consistency and security.
- 2024-06-23 Added support for audio-related endpoints: `/v1/audio/speech`, `/v1/audio/transcriptions`, and `/v1/audio/translations`.
- 2024-06-23 Implemented flexible environment variable handling for configuration (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_TOKEN).
- 2024-06-23 Added support for the model capabilities endpoint `/v1/models/:model_id/capabilities`.
- 2024-06-23 Improved cross-origin resource sharing (CORS) handling with OPTIONS requests.
- 2024-06-23 Enhanced proxy functionality to better handle various Azure OpenAI API endpoints.
- 2024-06-23 Implemented fallback model mapping for unsupported models.
- 2024-06-22 Added support for image generation (`/v1/images/generations`), fine-tuning operations (`/v1/fine_tunes`), and file management (`/v1/files`).
- 2024-06-22 Implemented better error handling and logging for API requests.
- 2024-06-22 Improved handling of rate limiting and streaming responses.
- 2024-06-22 Updated model mappings to include the latest models (gpt-4-turbo, gpt-4-vision-preview, dall-e-3).
- 2024-06-23 Added support for deployments management (/deployments).
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
This project is not officially associated with or endorsed by Microsoft Azure or OpenAI. Use at your own discretion and ensure compliance with all relevant terms of service.