A real-time, multilingual voice assistant prototype that enables natural spoken conversations in multiple Indic languages — including Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Odia, Punjabi, and Assamese — as well as English.
The application is built on the Pipecat open-source framework and leverages AWS services (Amazon Bedrock for LLM, Amazon Transcribe for STT, Amazon Polly for TTS, and Amazon Nova Sonic for speech-to-speech) alongside third-party providers like Smallest AI for broader Indic language STT/TTS coverage. A React frontend connects to a FastAPI backend over WebSocket, streaming audio in real time for low-latency, interactive voice conversations. The infrastructure is defined as code using AWS CDK for easy deployment.
Key Features:
- Real-time bidirectional audio streaming via WebSocket
- Support for multiple Indic languages
- Pluggable STT/TTS pipeline — switch between AWS-native services, Smallest AI, or Amazon Nova Sonic
- Tool-calling support (e.g., weather lookup) powered by Strands Agents and Pipecat function calling
- Voice Activity Detection (VAD) using Silero for natural turn-taking
- Interruption handling: the user can speak over the bot mid-response
- 3D animated avatar or spectrogram visualization on the frontend
- Dynamic, contextual waiting messages for tools that take longer to process
- AWS CDK infrastructure for VPC, ECS, and Cognito-based authentication
How it works:
- Connect: The React frontend authenticates via Cognito and opens a WebSocket to the FastAPI backend, passing the selected pipeline and language.
- Capture: The browser captures microphone audio at 16 kHz, encodes it as base64 PCM16, and streams it to the backend over WebSocket.
- Process: The backend runs a Pipecat pipeline of STT → LLM (Bedrock) → TTS, or Nova Sonic for end-to-end speech-to-speech. Silero VAD handles turn-taking, and Strands Agents enable tool use.
- Respond: Generated audio is streamed back to the frontend, played via an AudioWorklet, and used to drive the visualizations.
- Interrupt: If the user speaks mid-response, VAD triggers an interruption that stops playback immediately.
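The capture and playback steps above come down to converting between float audio samples and base64-encoded 16-bit PCM. The browser does this in JavaScript. The Python sketch below shows the same conversion, which could be useful for a test client or for decoding on the backend. The JSON message shape (`type`, `sample_rate`, `data`) is a hypothetical placeholder, not the project's actual wire format.

```python
import base64
import json
import struct

SAMPLE_RATE_HZ = 16_000  # capture rate used by the frontend


def floats_to_pcm16_b64(samples: list[float]) -> str:
    """Clamp floats to [-1.0, 1.0], scale to signed 16-bit little-endian PCM, then base64-encode."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    raw = struct.pack(f"<{len(ints)}h", *ints)
    return base64.b64encode(raw).decode("ascii")


def pcm16_b64_to_ints(data: str) -> list[int]:
    """Decode a base64 PCM16 payload back into signed 16-bit sample values."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def make_audio_message(samples: list[float]) -> str:
    # Hypothetical message shape; the real frontend/backend protocol may differ.
    return json.dumps({
        "type": "audio",
        "sample_rate": SAMPLE_RATE_HZ,
        "data": floats_to_pcm16_b64(samples),
    })


if __name__ == "__main__":
    payload = json.loads(make_audio_message([0.0, 0.5, -1.0, 1.2]))
    print(pcm16_b64_to_ints(payload["data"]))  # out-of-range 1.2 is clamped to 32767
```

Assuming mono audio, one second at 16 kHz is 32,000 bytes of PCM16 before base64 encoding, which adds roughly a third on top.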
Prerequisites:
- Python 3.12+ with pip
- Node.js 22+ with npm
- AWS CLI configured with appropriate credentials
- AWS CDK (`npm install -g aws-cdk`)
- Docker (for local testing and CDK deployments)
- Git
- Active AWS account with admin permissions
- A development machine with the above tools installed, or optionally a new Ubuntu EC2 instance on which you install them
- Smallest.ai API key (for Smallest AI STT/TTS, an alternative to the AWS services)
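The command-line tools listed above can be checked with a small preflight script before deploying. This is a hypothetical helper, not part of the repository. It only checks that the tools are on `PATH` and that Python is recent enough. It does not check Node.js or CDK versions, nor whether AWS credentials are configured.

```python
import shutil
import sys
from collections.abc import Sequence

# Command-line tools from the prerequisites list. On some systems pip is
# installed as "pip3"; adjust the names to match your machine.
REQUIRED_TOOLS = ("pip", "node", "npm", "aws", "cdk", "docker", "git")
MIN_PYTHON = (3, 12)


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if the (major, minor, ...) version is at least MIN_PYTHON."""
    return version[:2] >= MIN_PYTHON


if __name__ == "__main__":
    problems = [f"{tool} not found on PATH" for tool in missing_tools()]
    if not python_ok():
        problems.insert(0, f"Python 3.12+ required, found {sys.version.split()[0]}")
    # Only presence is checked; tool versions and AWS credentials are not.
    print("\n".join(problems) if problems else "All prerequisite tools found.")
```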
Deployment:
- In your AWS account, create an Amazon Bedrock knowledge base and a secret in AWS Secrets Manager. To create the secret for the Smallest.ai API key with the AWS CLI:
```bash
aws secretsmanager create-secret \
  --name "smallest_key" \
  --description "Smallest.ai API key for TTS/STT" \
  --secret-string "sk_your_actual_api_key_here" \
  --region ap-south-1
```
- Clone this repo to an Ubuntu machine.
- Update `infra/cdk.json` with the knowledge base ID and the secret name.
- Install frontend dependencies and deploy the infrastructure.
```bash
# Install frontend dependencies
cd frontend
npm install

# Deploy the infrastructure (automatically builds the frontend).
# If CDK has never been used in this account/region, run "cdk bootstrap" once first.
cd ../infra
pip install -r requirements.txt
cdk deploy
```
- In your AWS account, find the Cognito user pool created by the deployment and add a user.
- In your browser, open the endpoint URL shown in the `cdk deploy` output and log in with the user you just created.
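Adding the Cognito user can also be scripted with boto3 instead of using the console. This is a minimal sketch under assumptions. The pool name, username, password and the `ap-south-1` region in `main()` are placeholders, because the actual pool name depends on the CDK stack. `list_user_pools`, `admin_create_user` and `admin_set_user_password` are real `cognito-idp` client operations. The client is passed in so that the logic can be exercised without AWS. If the pool signs users in by email, use an email address as the username.

```python
from typing import Any


def find_user_pool_id(cognito: Any, pool_name: str) -> str:
    """Page through list_user_pools and return the ID of the pool named pool_name."""
    kwargs: dict[str, Any] = {"MaxResults": 60}  # 60 is the maximum page size
    while True:
        resp = cognito.list_user_pools(**kwargs)
        for pool in resp.get("UserPools", []):
            if pool["Name"] == pool_name:
                return pool["Id"]
        token = resp.get("NextToken")
        if not token:
            raise LookupError(f"No Cognito user pool named {pool_name!r}")
        kwargs["NextToken"] = token


def create_user(cognito: Any, pool_id: str, username: str, password: str) -> None:
    """Create a user without sending an invitation, then give it a permanent password."""
    cognito.admin_create_user(
        UserPoolId=pool_id,
        Username=username,
        MessageAction="SUPPRESS",  # don't send an invitation email/SMS
    )
    # Without this step the new user stays in FORCE_CHANGE_PASSWORD status.
    cognito.admin_set_user_password(
        UserPoolId=pool_id,
        Username=username,
        Password=password,
        Permanent=True,
    )


def main() -> None:
    """Example usage; needs boto3 and AWS credentials. All values are placeholders."""
    import boto3

    client = boto3.client("cognito-idp", region_name="ap-south-1")  # assumed region
    pool_id = find_user_pool_id(client, "YOUR_POOL_NAME")  # hypothetical pool name
    create_user(client, pool_id, "demo-user@example.com", "Replace-With-A-Strong-Password-1")
```

The password must satisfy the user pool's password policy. Otherwise `admin_set_user_password` raises an error.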
Customization:
To change the bot's behaviour, edit the prompt and the set of tools in `backend/usecase.py`. You may also want to load the right set of files into the Bedrock knowledge base so that the bot answers based on facts. Redeploy with `cdk deploy` after making changes.
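The way `backend/usecase.py` declares its prompt and tools isn't shown here, so the following is only an illustration under assumptions. It describes a hypothetical `get_weather` tool in the `toolSpec` shape used by the Amazon Bedrock Converse API. It then dispatches a model's `toolUse` request to the matching Python function. In this project, tools are wired through Strands Agents and Pipecat function calling, whose APIs differ from this plain-dictionary version.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical system prompt; replace with your own use case.
SYSTEM_PROMPT = "You are a helpful voice assistant. Keep answers short; they will be spoken aloud."


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical tool with canned data; a real version would call a weather API."""
    return {"city": city, "forecast": "sunny", "temperature_c": 31}


# Tool description in the Bedrock Converse API "toolSpec" shape.
TOOL_CONFIG = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"city": {"type": "string", "description": "City name"}},
                        "required": ["city"],
                    }
                },
            }
        }
    ]
}

# Maps tool names the model may request to the Python functions that implement them.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_use(tool_use: dict[str, Any]) -> dict[str, Any]:
    """Run the function named in a toolUse block and wrap its result as a toolResult block."""
    result = TOOLS[tool_use["name"]](**tool_use["input"])
    return {
        "toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"json": result}],
        }
    }
```

Keeping the schema and the function registry side by side makes it harder to add a tool description without a matching implementation, or the reverse.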
Cleanup:
- In your AWS account, delete the Bedrock knowledge base and the secret in AWS Secrets Manager.
- Run the following commands to destroy the deployed CDK stack:
```bash
cd infra
cdk destroy
```
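You can also script the cleanup of the knowledge base and secret instead of using the console. This is a minimal boto3 sketch under assumptions. It assumes you know the knowledge base ID (the one you put in `infra/cdk.json`) and the secret name `smallest_key` from the deployment step. The `ap-south-1` default is likewise assumed from that step. `delete_knowledge_base` (on the `bedrock-agent` client) and `delete_secret` (on `secretsmanager`) are real boto3 operations. The clients are passed in so that the function can be tried with stubs.

```python
from typing import Any


def delete_demo_resources(bedrock_agent: Any, secrets: Any, kb_id: str,
                          secret_name: str, force: bool = False) -> None:
    """Delete the Bedrock knowledge base and the Secrets Manager secret used by the demo."""
    bedrock_agent.delete_knowledge_base(knowledgeBaseId=kb_id)
    if force:
        # Deletes immediately; the secret cannot be restored.
        secrets.delete_secret(SecretId=secret_name, ForceDeleteWithoutRecovery=True)
    else:
        # Schedules deletion after the minimum 7-day recovery window.
        secrets.delete_secret(SecretId=secret_name, RecoveryWindowInDays=7)


def main(kb_id: str, secret_name: str = "smallest_key", region: str = "ap-south-1") -> None:
    """Example usage; needs boto3 and AWS credentials. The region is an assumption."""
    import boto3

    delete_demo_resources(
        boto3.client("bedrock-agent", region_name=region),
        boto3.client("secretsmanager", region_name=region),
        kb_id,
        secret_name,
    )
```

Deleting the knowledge base does not necessarily remove the vector store or the S3 data source behind it. Check those resources separately.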
You should consider doing your own independent assessment before using the content in this sample for production purposes. This may include (amongst other things) testing, securing, and optimizing the content provided in this sample, based on your specific quality control practices and standards.
