Cairo Dictionary AI – Backend with Ray Serve

Ray Serve backend for Arabic Speech Recognition, Text Correction, and Text-to-Speech (TTS). This implementation uses Ray Serve to deploy our models (DeepAr for Arabic speech-to-text and AraFix for text correction) as scalable microservices. The models are included in this project as Git submodules and are also available on our CUAIStudents Hugging Face organization.

Key Features

  • Dynamic Batching: Ray Serve automatically batches incoming requests to maximize GPU utilization and throughput
  • Scalability: Easily scale to multiple replicas to handle increased load
  • Fault Tolerance: Automatic recovery from worker failures
  • Resource Management: Fine-grained control over CPU/GPU resources

Installation

Make sure ffmpeg and Python are installed on your system.

macOS:

# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install ffmpeg
brew install ffmpeg

# Install Python (if needed)
brew install python@3.11

Ubuntu/Debian:

# Update package list
sudo apt update

# Install ffmpeg
sudo apt install ffmpeg

# Install Python
sudo apt install python3.11 python3.11-venv

Clone with submodules (includes DeepAr + AraFix):

git clone --recurse-submodules https://github.com/AbdoAlshoki2/Cairo-Dictionary-AI-Ray-Backend.git <any_dir>
cd <any_dir>/ray-api

Create a virtual environment and install dependencies:

python -m venv venv
.\venv\Scripts\activate  # Windows
# source venv/bin/activate  # Linux/Mac
pip install -r src/requirements.txt

Running the API

Development Mode

Start Ray and deploy the services:

# Start Ray in the background
ray start --head

# Deploy the services
serve run src/config.yaml

API Endpoints

Speech-to-Text

POST /api/v1/audio

  • Upload audio file in the request body
  • Returns transcribed text

Text Correction

POST /api/v1/text

  • Input: {"text": "your arabic text"}
  • Returns corrected Arabic text
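A minimal client sketch for this endpoint, using only the Python standard library. The JSON payload shape comes from the input shown above; the base URL (Ray Serve's default `http://127.0.0.1:8000`) and the function names are assumptions for illustration, and the response is returned undecoded because its format is not documented here.

```python
import json
import urllib.request

# Assumption: Ray Serve's default HTTP address; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8000"


def build_correction_request(
    text: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/text request."""
    # ensure_ascii=False keeps Arabic characters readable in the payload.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/text",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def correct_text(text: str, timeout_s: float = 30.0) -> str:
    """Send the text and return the raw response body as a string."""
    request = build_correction_request(text)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")
```

With the services running, `correct_text("your arabic text")` sends the request and returns the server's reply.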

Text-to-Speech

/api/v1/voice_generator

  • Input: {"text": "your arabic text"}
  • Returns audio stream
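Because this endpoint returns a stream, a client will usually want to write the audio to disk in chunks and not hold it all in memory. The helper below is a sketch of that step only; its name and the chunk size are hypothetical, and it works with any file-like object that has a `read()` method. The HTTP method and the audio format of the response are not stated in this README, so the sketch does not assume them.

```python
import io
from typing import BinaryIO


def save_audio_stream(
    response: BinaryIO, destination: BinaryIO, chunk_size: int = 64 * 1024
) -> int:
    """Copy a streamed response to `destination` chunk by chunk.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream is finished
            break
        destination.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a real HTTP response: an in-memory byte stream.
    fake_response = io.BytesIO(b"\x00" * 150_000)
    output = io.BytesIO()
    written = save_audio_stream(fake_response, output)
    print(f"wrote {written} bytes")
```

The object returned by `urllib.request.urlopen(...)` is file-like, so it can be passed as `response`, with an open binary file as `destination`.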

Dynamic Batching

This implementation leverages Ray Serve's built-in dynamic batching to maximize throughput:

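  • Concurrent requests to the same model are grouped into a single batch automatically
  • Batch size varies with load, up to a configured maximum
  • Several requests are processed in one model call, which raises throughput and, under load, reduces the time each request spends waiting in the queue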
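To make the batching idea concrete, here is a small pure-`asyncio` sketch of the general mechanism: collect requests until a batch is full or a short wait expires, run the model once, then hand each caller its own result. This is an illustration only, not Ray Serve's implementation and not this project's code. In Ray Serve itself the behavior is exposed through the `@serve.batch` decorator, whose `max_batch_size` and `batch_wait_timeout_s` parameters the sketch borrows as names. `MicroBatcher` and `fake_model` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class MicroBatcher:
    """Groups single requests and passes them to a batch handler together."""

    def __init__(
        self,
        handler: Callable[[List[str]], Awaitable[List[str]]],
        max_batch_size: int = 4,
        batch_wait_timeout_s: float = 0.05,
    ) -> None:
        self._handler = handler
        self._max_batch_size = max_batch_size
        self._timeout = batch_wait_timeout_s
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None
        self.batch_sizes: List[int] = []  # recorded only for illustration

    async def submit(self, item: str) -> str:
        """Enqueue one request and wait for its individual result."""
        loop = asyncio.get_running_loop()
        if self._worker is None:
            # Created lazily so the queue belongs to the running event loop.
            self._queue = asyncio.Queue()
            self._worker = loop.create_task(self._run())
        future = loop.create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._timeout
            # Fill the batch until it is full or the wait budget is spent.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self._queue.get(), remaining)
                    )
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            # One handler call for the whole batch.
            results = await self._handler([item for item, _ in batch])
            for (_, future), result in zip(batch, results):
                future.set_result(result)

    def close(self) -> None:
        """Stop the background worker."""
        if self._worker is not None:
            self._worker.cancel()


async def fake_model(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a model that handles a batch in one call."""
    await asyncio.sleep(0.01)
    return [text.upper() for text in batch]


async def demo():
    batcher = MicroBatcher(fake_model, max_batch_size=4)
    results = await asyncio.gather(
        *(batcher.submit(f"req{i}") for i in range(6))
    )
    batcher.close()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(results, sizes)
```

The trade-off is visible in the two parameters: a larger `max_batch_size` gives the model more work per call, while a longer `batch_wait_timeout_s` makes a lone request wait longer before it is processed.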

Project Structure

ray-api/
├── src/
│   ├── apps/               # Ray Serve applications
│   │   ├── __init__.py
│   │   ├── text_corrector.py
│   │   ├── transcriber.py
│   │   └── voice_generator.py
│   │
│   ├── models/             # Model implementations
│   │   ├── araFix/         # Text correction model
│   │   └── whisper/        # Speech recognition model
│   │
│   ├── schemas/            # Pydantic models
│   │   ├── correction.py
│   │   └── tts.py
│   │
│   ├── config.yaml         # Ray Serve configuration
│   └── main.py            # Entry point for Ray Serve
│
├── .gitmodules            # Git submodules configuration
└── README.md              # This file
