Introduction

This project is a real-time, multilingual voice assistant prototype that enables natural spoken conversations in English and in multiple Indic languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Odia, Punjabi, and Assamese.

The application is built on the Pipecat open-source framework and leverages AWS services (Amazon Bedrock for LLM, Amazon Transcribe for STT, Amazon Polly for TTS, and Amazon Nova Sonic for speech-to-speech) alongside third-party providers like Smallest AI for broader Indic language STT/TTS coverage. A React frontend connects to a FastAPI backend over WebSocket, streaming audio in real time for low-latency, interactive voice conversations. The infrastructure is defined as code using AWS CDK for easy deployment.

Key Features:

Real-time bidirectional audio streaming via WebSocket
Support for multiple Indic languages
Pluggable STT/TTS pipeline — switch between AWS-native services, Smallest AI, or Amazon Nova Sonic
Tool-calling support (e.g., weather lookup) powered by Strands Agents and Pipecat function calling
Voice Activity Detection (VAD) using Silero for natural turn-taking
Interruption handling, so the user can speak over the assistant mid-response
3D animated avatar or spectrogram on the frontend
Dynamic, contextual waiting messages for tools that take longer to respond
AWS CDK infrastructure for VPC, ECS, and Cognito-based authentication
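
The pluggable pipeline feature can be pictured as a lookup from a pipeline name to the provider that handles each stage. The sketch below is illustrative only: the pipeline names (`aws`, `smallest`, `nova_sonic`), the `PipelineSpec` type, and the `select_pipeline` and `describe` helpers are hypothetical and are not taken from this repository or from Pipecat. It also assumes that the Smallest AI option keeps Amazon Bedrock as the LLM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineSpec:
    """Which provider handles each stage (hypothetical structure)."""
    stt: Optional[str]  # None when the model consumes audio directly
    llm: str
    tts: Optional[str]  # None when the model produces audio directly


# Hypothetical pipeline names; the identifiers used by the backend may differ.
PIPELINES = {
    "aws": PipelineSpec("Amazon Transcribe", "Amazon Bedrock", "Amazon Polly"),
    "smallest": PipelineSpec("Smallest AI", "Amazon Bedrock", "Smallest AI"),
    "nova_sonic": PipelineSpec(None, "Amazon Nova Sonic", None),
}


def select_pipeline(name: str) -> PipelineSpec:
    """Look up a pipeline by name, failing loudly on unknown names."""
    try:
        return PIPELINES[name]
    except KeyError:
        known = ", ".join(sorted(PIPELINES))
        raise ValueError(f"Unknown pipeline {name!r}; expected one of: {known}") from None


def describe(spec: PipelineSpec) -> str:
    """Render the stages in order, skipping stages the pipeline does not use."""
    stages = [s for s in (spec.stt, spec.llm, spec.tts) if s is not None]
    return " -> ".join(stages)
```

Keeping the choice in one table means that adding a provider only touches the table, not the code that builds the conversation loop.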

Architecture

Data Flow

Connect — The React frontend authenticates via Cognito and opens a WebSocket to the FastAPI backend, passing the selected pipeline and language.
Capture — The browser captures microphone audio at 16 kHz, encodes it as base64 PCM16, and streams it to the backend over WebSocket.
Process — The backend runs a Pipecat pipeline: STT → LLM (Bedrock) → TTS (or Nova Sonic for end-to-end speech-to-speech). Silero VAD handles turn-taking, and Strands Agents enable tool use.
Respond — Generated audio is streamed back to the frontend, played via an AudioWorklet, and used to drive the visualizations.
Interrupt — If the user speaks mid-response, VAD triggers an interruption that stops playback immediately.
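
The Capture step's audio encoding can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes mono audio and little-endian 16-bit samples, and the function names are hypothetical. The actual message framing used on the WebSocket is not described here.

```python
import base64
import struct

SAMPLE_RATE_HZ = 16_000  # capture rate stated in the data flow


def encode_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to base64-encoded PCM16.

    Little-endian byte order is an assumption of this sketch.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    raw = struct.pack(f"<{len(ints)}h", *ints)
    return base64.b64encode(raw).decode("ascii")


def decode_pcm16(payload):
    """Decode a base64 PCM16 payload back into a list of 16-bit integers."""
    raw = base64.b64decode(payload)
    count = len(raw) // 2  # two bytes per sample
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))


def duration_ms(payload):
    """Length of the audio chunk in milliseconds at 16 kHz."""
    return 1000 * len(decode_pcm16(payload)) / SAMPLE_RATE_HZ
```

At 16 kHz, each 10 ms of audio is 160 samples, or 320 bytes before base64 encoding, which adds roughly a third to the payload size.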

Prerequisites

Required Tools for Development & Deployment

Python 3.12+ with pip
Node.js 22+ with npm
AWS CLI configured with appropriate credentials
AWS CDK (npm install -g aws-cdk)
Docker (for local testing and CDK deployments)
Git

AWS Account Requirements

Active AWS account with admin permissions
A development machine with the tools above installed. Optionally, launch a new Ubuntu EC2 instance and install the tools there.

Other Requirements

Smallest.ai API key (used as an alternative to the AWS STT/TTS services)

Setting up

In your AWS account, create a Bedrock knowledge base and a secret in AWS Secrets Manager. The following AWS CLI command creates the secret for the Smallest.ai API key:

aws secretsmanager create-secret \
    --name "smallest_key" \
    --description "Smallest.ai API key for TTS/STT" \
    --secret-string "sk_your_actual_api_key_here" \
    --region ap-south-1
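
Once the secret exists, code running with suitable IAM permissions can read it through the Secrets Manager API. The sketch below shows one way to do that with boto3's `get_secret_value` call. The `load_api_key` helper is hypothetical, and how the backend in this repository loads the key may differ.

```python
def load_api_key(client, secret_name="smallest_key"):
    """Return the secret string stored under `secret_name`.

    `client` is a Secrets Manager client; passing it in keeps the
    function easy to exercise without AWS credentials.
    """
    response = client.get_secret_value(SecretId=secret_name)
    return response["SecretString"]


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   client = boto3.client("secretsmanager", region_name="ap-south-1")
#   api_key = load_api_key(client)
```

The region must match the one used when the secret was created (`ap-south-1` in the command above).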

Clone this repo to an Ubuntu machine.
Update infra/cdk.json with the knowledge base ID and the secret name.
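
Editing the file by hand is usually enough. For scripted setups, the update can be sketched as below. This assumes the values live under the standard CDK `context` object. The key names `knowledge_base_id` and `secret_name` in the usage comment are hypothetical, so check infra/cdk.json for the names the stack actually reads.

```python
import json
from pathlib import Path


def set_cdk_context(cdk_json_path, values):
    """Merge `values` into the `context` object of a cdk.json file."""
    path = Path(cdk_json_path)
    config = json.loads(path.read_text())
    # Create the context object if the file does not have one yet.
    config.setdefault("context", {}).update(values)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Usage (key names are placeholders, not confirmed by the repository):
#   set_cdk_context("infra/cdk.json", {
#       "knowledge_base_id": "YOUR_KB_ID",
#       "secret_name": "smallest_key",
#   })
```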

Install frontend dependencies and deploy the infrastructure.

# Install frontend dependencies
cd frontend
npm install

# Deploy the infrastructure (automatically builds frontend)
cd ../infra
pip install -r requirements.txt
cdk deploy

In your AWS account, find the Cognito user pool created by the deployment above and add a user.
In your browser, open the endpoint printed by the deployment command and log in with the user you just created.
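
The user can be added in the Cognito console. As an alternative, the sketch below uses boto3's `admin_create_user` call. It assumes the user pool accepts an email address as the username, which may not match how this stack configures the pool, and the `create_user` helper is hypothetical.

```python
def create_user(client, user_pool_id, email, temporary_password):
    """Create a Cognito user with a temporary password and return the username.

    `client` is a `cognito-idp` client. Assumes the pool allows email
    addresses as usernames (an assumption of this sketch).
    """
    response = client.admin_create_user(
        UserPoolId=user_pool_id,
        Username=email,
        TemporaryPassword=temporary_password,
        UserAttributes=[
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        MessageAction="SUPPRESS",  # do not send the invitation message
    )
    return response["User"]["Username"]


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   client = boto3.client("cognito-idp", region_name="ap-south-1")
#   create_user(client, "YOUR_USER_POOL_ID", "user@example.com", "Temp#Passw0rd")
```

Cognito normally asks a user created this way to set a new password at first login.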

Customizing

To change the behaviour of the bot, edit the prompt and the set of tools in the backend/usecase.py file. You may also want to load the right set of files into the Bedrock knowledge base so that the bot can ground its answers in facts. Remember to redeploy after your changes.

Tearing down

In your AWS account, delete the Bedrock knowledge base and the secret in AWS Secrets Manager.
Run the following commands to remove the deployed infrastructure.
```
cd infra
cdk destroy
```

Disclaimer

You should consider doing your own independent assessment before using the content in this sample for production purposes. This may include (amongst other things) testing, securing, and optimizing the content provided in this sample, based on your specific quality control practices and standards.
