RAG Voice Agent

A voice-enabled AI assistant that combines Retrieval-Augmented Generation (RAG) with speech recognition and speech synthesis to give context-aware answers through both voice and text interactions.

🌟 Key Features

Voice Interaction: Seamless voice input and output using advanced ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models
RAG-Powered Knowledge Base: Intelligent document retrieval and response generation using vector database technology
Multi-Modal Interface: Supports both voice and text-based interactions
Web Search Integration: Capability to search the internet for up-to-date information
Conversational Memory: Maintains context through conversation history
Extensible Tool System: Modular architecture with support for adding new tools and capabilities

🛠️ Technology Stack

ASR Model: Faster Whisper (Medium variant) for accurate speech recognition
Embedding Model: Sentence Transformers (all-MiniLM-L6-v2) for document embeddings
LLM: TinyLlama-1.1B-Chat for response generation
TTS Model: Glow-TTS for natural speech synthesis
Vector Database: Qdrant for efficient similarity search
Web Search: DuckDuckGo integration for real-time information
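
To make "similarity search" concrete, here is a minimal pure-Python sketch of what a vector database computes conceptually: it ranks stored embeddings by their similarity to a query embedding. The function names (`cosine_similarity`, `top_k`), the chunk ids, and the toy 3-dimensional vectors are illustrative assumptions and are not taken from this repository. In the project this work is delegated to Qdrant, and all-MiniLM-L6-v2 produces 384-dimensional vectors rather than 3-dimensional ones.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, store: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored ids with the highest cosine similarity to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "vector store": hypothetical chunk ids mapped to made-up embeddings.
store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
```

A real vector database avoids this exhaustive scan by using an approximate nearest-neighbour index, which is what keeps search fast on large collections.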

🏗️ Architecture

The system consists of several key components:

VoiceRAGAgent: Core agent class that orchestrates all components and manages the interaction flow
AudioHandler: Manages voice input/output, including recording and speech synthesis
TaskHandler: Processes user queries and determines appropriate actions
ComponentInitializer: Handles initialization of all AI models and components
Tools: Modular system including:
- SearchDocumentsTool: RAG knowledge base search
- WebSearchTool: Internet search capability
- SaveNoteTool: Note-taking functionality
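
A modular tool system of this kind is often built around a shared interface plus a registry. The sketch below shows one possible shape, under stated assumptions: the `Tool` base class, `ToolRegistry`, and `InMemoryNoteTool` are hypothetical names invented for illustration, and they do not reproduce the classes in Tools.py.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Tool(ABC):
    """Hypothetical common interface: every tool has a name and a run method."""

    name: str = "tool"
    description: str = ""

    @abstractmethod
    def run(self, query: str) -> str:
        """Execute the tool on a text query and return a text result."""


class InMemoryNoteTool(Tool):
    """Illustrative stand-in for a note-taking tool; keeps notes in a list."""

    name = "save_note"
    description = "Store a short note in memory."

    def __init__(self) -> None:
        self.notes: List[str] = []

    def run(self, query: str) -> str:
        self.notes.append(query)
        return f"Saved note #{len(self.notes)}"


class ToolRegistry:
    """Maps tool names to instances so an agent can look tools up by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, query: str) -> str:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name].run(query)


registry = ToolRegistry()
registry.register(InMemoryNoteTool())
reply = registry.call("save_note", "buy milk")
```

With this layout, adding a capability means writing one new `Tool` subclass and registering it; the agent loop itself does not change.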

🚀 Key Features In-Depth

Voice Interaction

Real-time voice input processing with automatic silence detection
Natural-sounding speech output using advanced TTS
Seamless switching between voice and text modes
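
Automatic silence detection is commonly done by measuring the energy of short audio frames and stopping once enough consecutive quiet frames follow speech. The following is a minimal sketch of that idea, not the logic in audio_handler.py: the function names, the RMS threshold of 0.02, and the limit of three silent frames are all assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def record_until_silence(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,
    max_silent_frames: int = 3,
) -> List[Sequence[float]]:
    """Collect frames until `max_silent_frames` quiet frames in a row follow speech.

    Leading silence is kept but does not count toward the stop condition,
    so recording does not end before the speaker has started talking.
    """
    captured: List[Sequence[float]] = []
    silent_run = 0
    heard_speech = False
    for frame in frames:
        captured.append(frame)
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return captured
```

In a live setting the `frames` iterable would come from a microphone stream; here it can be any sequence of sample lists, which keeps the logic easy to test.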

Knowledge Processing

RAG-based document retrieval for accurate information access
Web search integration for real-time information
Context-aware response generation
Conversation memory for maintaining context
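
One simple way to keep conversation context is a fixed-size window of recent turns that is prepended to each new prompt. The sketch below illustrates that approach only; `BoundedMemory` and its methods are hypothetical names, and the actual behavior of ConversationMemory.py may differ.

```python
from collections import deque
from typing import Deque, Tuple


class BoundedMemory:
    """Keeps the most recent `max_turns` (role, text) pairs and drops older ones."""

    def __init__(self, max_turns: int = 3) -> None:
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self._turns.append((role, text))

    def as_prompt(self) -> str:
        """Render the remembered turns as one 'role: text' line per turn."""
        return "\n".join(f"{role}: {text}" for role, text in self._turns)

    def clear(self) -> None:
        self._turns.clear()


memory = BoundedMemory(max_turns=2)
memory.add("user", "hi")
memory.add("assistant", "hello")
memory.add("user", "bye")  # the oldest turn ("user: hi") is evicted here
context = memory.as_prompt()
```

Bounding the window matters for a small model such as TinyLlama, whose context length is limited, so older turns have to be dropped or summarized at some point.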

System Commands

Voice input control ("voice" to start/stop)
System status checks
Memory management
Tool listing and help commands
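
System commands like these can be handled by checking input against a command table before treating it as a normal query. This is a hedged sketch of that pattern: only the "voice" command string comes from the description above, while "clear", `CommandState`, `build_commands`, and `handle_input` are invented for illustration.

```python
from typing import Callable, Dict, List, Optional


class CommandState:
    """Hypothetical mutable state that the commands act on."""

    def __init__(self) -> None:
        self.voice_enabled = False
        self.history: List[str] = []


def build_commands(state: CommandState) -> Dict[str, Callable[[], str]]:
    """Create a table mapping command words to zero-argument handlers."""

    def toggle_voice() -> str:
        state.voice_enabled = not state.voice_enabled
        return "voice on" if state.voice_enabled else "voice off"

    def clear_memory() -> str:
        state.history.clear()
        return "memory cleared"

    return {"voice": toggle_voice, "clear": clear_memory}


def handle_input(text: str, commands: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Return the command's reply, or None when the text is an ordinary query."""
    handler = commands.get(text.strip().lower())
    return handler() if handler else None
```

Returning `None` for non-commands lets the caller fall through to regular query processing without a separate "is this a command" check.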

💡 Use Cases

Knowledge Base Queries: Access information from ingested documents with natural language
Real-time Information: Get updated information through web searches
Interactive Conversations: Engage in context-aware dialogue
Voice-First Interaction: Hands-free operation for various tasks

🔧 Implementation Details

Data Ingestion

PDF document processing with chunking
Vector embedding generation
Efficient storage in Qdrant vector database
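
Chunking usually splits extracted text into overlapping windows so that a sentence cut at one boundary still appears whole in a neighbouring chunk. The sketch below shows a character-based version under assumed parameters: `chunk_text`, the chunk size of 200, and the overlap of 40 are illustrative choices, not values read from data__ingestion.py or config.py.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Each chunk would then be passed to the embedding model and stored with its vector. Splitting on sentence or token boundaries instead of raw characters is a common refinement.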

Query Processing

Speech-to-text conversion
Query understanding and routing
Context-aware response generation
Text-to-speech synthesis
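
The four stages above can be read as a pipeline in which each stage is a replaceable function. The following sketch wires them together with stand-in callables. Everything here is an assumption made for illustration: `answer_voice_query` and its parameter names are hypothetical, and the lambdas only mimic the roles that Faster Whisper, the retriever, TinyLlama, and Glow-TTS would play.

```python
from typing import Callable


def answer_voice_query(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    retrieve: Callable[[str], str],
    generate: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice query through the four stages and return synthesized audio."""
    text = transcribe(audio)          # speech-to-text
    context = retrieve(text)          # query routing and context lookup
    reply = generate(text, context)   # context-aware response generation
    return synthesize(reply)          # text-to-speech


# Stand-in stages so the flow can be exercised without any models installed.
result = answer_voice_query(
    b"raw-audio",
    transcribe=lambda audio: "what is rag",
    retrieve=lambda query: "RAG = retrieval augmented generation",
    generate=lambda query, ctx: f"Answer to '{query}' using: {ctx}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Passing the stages in as arguments keeps the orchestration testable and makes it straightforward to swap one model for another.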

🎯 Main Components

main_agent.py: Core agent implementation
audio_handler.py: Voice I/O management
task_handler.py: Query processing and routing
Tools.py: Implementation of various tools
data__ingestion.py & rebuild_db.py: Document processing and storage
AgentState.py: State management
config.py: System configuration

📝 Configuration

Settings are centralized in config.py, which allows customization of:

Model selections and parameters
Audio processing settings
Database configurations
System behaviors and timeouts
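
A central configuration module of this kind is often expressed as a typed settings object. The sketch below is one hypothetical layout, not the contents of config.py: the model names come from the Technology Stack section, while the class name, every field name, and the collection, timeout, and `top_k` values are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical settings object; field names are illustrative only."""

    # Model selections (names taken from the Technology Stack section)
    asr_model: str = "medium"
    embedding_model: str = "all-MiniLM-L6-v2"
    llm_model: str = "TinyLlama-1.1B-Chat"
    tts_model: str = "glow-tts"

    # Audio, database, and behavior settings (all values are assumptions)
    silence_timeout_s: float = 2.0
    qdrant_collection: str = "documents"
    top_k: int = 3


default_config = AgentConfig()
# Frozen dataclasses are immutable, so overrides produce a modified copy.
custom_config = replace(default_config, top_k=5)
```

Keeping the object immutable means a component cannot silently change a setting that another component depends on.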

🌟 Features in Development

Enhanced multi-document support
Improved context understanding
Additional tool integrations
Extended web search capabilities

👨‍💻 Developer

Sandeep (@sandeep231004)
Last Updated: 2025-05-27

The agent combines speech recognition, document retrieval, a language model, and speech synthesis into one interactive system that accepts voice or text input and returns context-aware responses.
