Ray Serve backend for Arabic Speech Recognition, Text Correction, and Text-to-Speech (TTS). This implementation uses Ray Serve to deploy our models — DeepAr for Arabic speech-to-text and AraFix for text correction — as scalable microservices. The models are included in this project as Git submodules and are also available on our CUAIStudents HuggingFace organization.
## Features

- **Dynamic Batching**: Ray Serve automatically batches incoming requests to maximize GPU utilization and throughput
- **Scalability**: Easily scale to multiple replicas to handle increased load
- **Fault Tolerance**: Automatic recovery from worker failures
- **Resource Management**: Fine-grained control over CPU/GPU resources
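Replica count and per-replica resources are typically set in the Ray Serve config file. The fragment below is a hypothetical sketch of what such a config might look like — the application and deployment names, import path, and resource values are illustrative assumptions, not taken from this repo's `src/config.yaml`:

```yaml
# Illustrative Ray Serve config fragment (names/values are assumptions)
applications:
  - name: transcriber
    route_prefix: /api/v1/audio
    import_path: src.apps.transcriber:app
    deployments:
      - name: Transcriber
        num_replicas: 2          # scale out to handle increased load
        ray_actor_options:
          num_gpus: 0.5          # fractional GPU per replica
```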
## Prerequisites

Make sure `ffmpeg` and Python are installed on your system.

**macOS:**

```bash
# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install ffmpeg
brew install ffmpeg

# Install Python (if needed)
brew install python@3.11
```

**Ubuntu/Debian:**
```bash
# Update package list
sudo apt update

# Install ffmpeg
sudo apt install ffmpeg

# Install Python
sudo apt install python3.11 python3.11-venv
```

## Installation

Clone with submodules (includes DeepAr + AraFix):
```bash
git clone --recurse-submodules https://github.com/AbdoAlshoki2/Cairo-Dictionary-AI-Ray-Backend.git <any_dir>
cd <any_dir>/ray-api
```

Create a virtual environment and install dependencies:
```bash
python -m venv venv
.\venv\Scripts\activate     # Windows
# source venv/bin/activate  # Linux/macOS
pip install -r src/requirements.txt
```

Start Ray and deploy the services:
```bash
# Start Ray in the background
ray start --head

# Deploy the services
serve run src/config.yaml
```

## API Endpoints

### POST /api/v1/audio
- Upload an audio file in the request body
- Returns the transcribed text
### POST /api/v1/text

- Input: `{"text": "your arabic text"}`
- Returns corrected Arabic text
### POST /api/v1/voice_generator

- Input: `{"text": "your arabic text"}`
- Returns an audio stream
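A client might call these endpoints roughly as follows. This is a sketch using only the standard library; the base URL (Ray Serve's default HTTP port is 8000) and the assumption that the endpoints return plain UTF-8 text are illustrative — check `src/config.yaml` for the actual routes:

```python
import json
import urllib.request

# Assumed Serve HTTP address (8000 is Ray Serve's default port)
BASE = "http://localhost:8000"

def correction_payload(text: str) -> bytes:
    """Encode the JSON body used by /api/v1/text and /api/v1/voice_generator."""
    return json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")

def post(url: str, body: bytes, content_type: str) -> bytes:
    """POST raw bytes and return the raw response body."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def transcribe(path: str) -> str:
    """Upload an audio file and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    return post(f"{BASE}/api/v1/audio", audio, "application/octet-stream").decode("utf-8")

def correct(text: str) -> str:
    """Send Arabic text for correction and return the corrected text."""
    return post(f"{BASE}/api/v1/text", correction_payload(text), "application/json").decode("utf-8")
```

The response handling (treating the body as plain text) is an assumption; adapt it if the services return structured JSON.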
## Dynamic Batching

This implementation leverages Ray Serve's built-in dynamic batching to maximize throughput:
- Requests are automatically batched based on model requirements
- Batch size is dynamically adjusted for optimal performance
- Reduces latency by processing multiple requests simultaneously
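Conceptually, dynamic batching collects requests that arrive within a short window and runs them through the model in a single call. The sketch below illustrates the idea with plain `asyncio` — it is a simplified illustration of the technique, not this project's code; in practice Ray Serve's `@serve.batch` decorator handles all of this for you:

```python
import asyncio

class DynamicBatcher:
    """Collect concurrent requests and process them in one batched call."""

    def __init__(self, max_batch_size: int = 8, wait_timeout_s: float = 0.01):
        self.max_batch_size = max_batch_size
        self.wait_timeout_s = wait_timeout_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self):
        while True:
            # Block for the first request, then keep pulling until the
            # batch is full or the wait window expires.
            batch = [await self.queue.get()]
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(await asyncio.wait_for(
                        self.queue.get(), timeout=self.wait_timeout_s))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            results = self.model_forward(items)  # one call for the whole batch
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)

    @staticmethod
    def model_forward(texts):
        # Stand-in for a batched model call (e.g. one corrected text per input).
        return [t.upper() for t in texts]

async def main():
    b = DynamicBatcher()
    # Three concurrent requests end up in a single model_forward call.
    return await asyncio.gather(*(b.submit(t) for t in ["a", "b", "c"]))
```

With `@serve.batch`, the queueing, timeout, and result fan-out above collapse into a decorator on an async method that receives a list of inputs and returns a list of outputs.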
## Project Structure

```
ray-api/
├── src/
│   ├── apps/                # Ray Serve applications
│   │   ├── __init__.py
│   │   ├── text_corrector.py
│   │   ├── transcriber.py
│   │   └── voice_generator.py
│   │
│   ├── models/              # Model implementations
│   │   ├── araFix/          # Text correction model
│   │   └── whisper/         # Speech recognition model
│   │
│   ├── schemas/             # Pydantic models
│   │   ├── correction.py
│   │   └── tts.py
│   │
│   ├── config.yaml          # Ray Serve configuration
│   └── main.py              # Entry point for Ray Serve
│
├── .gitmodules              # Git submodules configuration
└── README.md                # This file
```