VAAK (वाक्)

Voice-Activated Autonomous Knowledge System

Scalable · Memory-Safe · Real-Time · Deterministic

Conversational AI Platform

Sales · Support · Discovery · Education · CRM

Air-Gapped · GDPR-Ready · Data Sovereign · Compliant Architecture

Edge AI · Sub-500ms · Distributed · Open Architecture

22 Indian Languages · Multilingual · Enterprise-Grade

Domain-Agnostic Voice / Text / Chat Agent Built with Rust · Runs on CPU · No Cloud Dependencies

📋 Table of Contents

Executive Summary
Why VAAK?
Key Differentiators
Architecture Overview
System Components
Technology Stack
Data Flow
Performance Benchmarks
Deployment Models
Configuration & Customization
For Product Managers
For Software Architects
Research & References
Project Structure
Getting Started
Acknowledgments

🎯 Executive Summary

VAAK (वाक् - Rigvedic Sanskrit for "Speech/Voice") is a production-grade conversational AI platform that enables enterprises to deploy intelligent voice, text, and chat agents without cloud dependencies, without LLM API costs, and without compromising on data privacy.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│    🔒 100% ON-PREMISE          🖥️ CPU-ONLY INFERENCE       🌐 22 LANGUAGES │
│    No data leaves your         No GPU required              All Indian      │
│    infrastructure              Works on commodity HW        languages       │
│                                                                             │
│    ⚡ <500ms LATENCY           🔧 DOMAIN AGNOSTIC          🦀 PURE RUST    │
│    Real-time voice             YAML-based config            Memory-safe     │
│    conversations               for any industry             & performant    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

The Problem We Solve

Challenge	Traditional AI Agents	VAAK
Data Privacy	Customer data sent to cloud	All processing on-premise
LLM Costs	$0.01-0.10 per conversation	Zero API costs after deployment
Latency	1-3 seconds typical	<500ms end-to-end
Compliance	Complex DPAs needed	RBI/GDPR compliant by design
Indian Languages	Limited or poor quality	Native 22-language support
Customization	Complex code changes	YAML configuration only
Vendor Lock-in	Tied to cloud provider	Fully open-source stack

💡 Why VAAK?

For Business Leaders

                    ┌─────────────────────────────────────────┐
                    │         COST COMPARISON (Annual)         │
                    ├─────────────────────────────────────────┤
                    │                                         │
                    │  Cloud AI Agent (100K conversations):   │
                    │  ├── LLM API Costs:     ₹50-100 Lakhs   │
                    │  ├── STT/TTS APIs:      ₹20-40 Lakhs    │
                    │  ├── Cloud Infra:       ₹10-20 Lakhs    │
                    │  └── TOTAL:             ₹80-160 Lakhs   │
                    │                                         │
                    │  VAAK (Unlimited conversations):   │
                    │  ├── One-time Setup:    ₹15-25 Lakhs    │
                    │  ├── Hardware (3 nodes): ₹5-10 Lakhs    │
                    │  ├── Annual Maintenance: ₹5-10 Lakhs    │
                    │  └── TOTAL:             ₹25-45 Lakhs    │
                    │                                         │
                    │  💰 SAVINGS: 60-70% over 3 years        │
                    │                                         │
                    └─────────────────────────────────────────┘

For Technical Leaders

No Python runtime dependencies - Single binary deployment
Memory-safe Rust - No segfaults, no GC pauses, no data races
True parallelism - No GIL, scales with CPU cores
Deterministic behavior - Same input = same output (no temperature drift)
Audit trails - SHA-256 merkle chain for regulatory compliance

⚡ Key Differentiators

┌────────────────────────────────────────────────────────────────────────────────┐
│                          VAAK vs. ALTERNATIVES                             │
├────────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│                    VAAK    │  Cloud APIs   │  Open Source  │  In-House   │
│   ─────────────────────────────────────────────────────────────────────────   │
│   Air-gapped Ready      ✅      │      ❌       │      ⚠️       │     ⚠️      │
│   CPU-Only             ✅      │      N/A      │      ⚠️       │     ❌      │
│   22 Indian Languages   ✅      │      ⚠️       │      ❌       │     ❌      │
│   <500ms Latency        ✅      │      ❌       │      ⚠️       │     ⚠️      │
│   Zero API Costs        ✅      │      ❌       │      ✅       │     ✅      │
│   Production Ready      ✅      │      ✅       │      ⚠️       │     ❌      │
│   Domain Agnostic       ✅      │      ✅       │      ⚠️       │     ❌      │
│   Memory Safe           ✅      │      N/A      │      ❌       │     ⚠️      │
│   RAG Built-in          ✅      │      ⚠️       │      ⚠️       │     ❌      │
│   Compliance Tools      ✅      │      ⚠️       │      ❌       │     ❌      │
│                                                                                │
│   Legend: ✅ Full Support  ⚠️ Partial/Complex  ❌ Not Available               │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

🏗️ Architecture Overview

High-Level System Architecture

graph TB
    subgraph Client[Client Layer]
        WEB[Web Browser]
        MOB[Mobile App]
        TEL[Telephony]
        API[REST API]
    end

    subgraph Transport[Transport Layer]
        WS[WebSocket Server]
        WEBRTC[WebRTC Signaling]
        HTTP[HTTP Endpoints]
    end

    subgraph Pipeline[Core Pipeline]
        VAD[VAD]
        STT[STT]
        NLU[NLU]
        AGENT[Agent]
        TTS[TTS]
    end

    subgraph Intelligence[Intelligence Layer]
        RAG[Hybrid RAG Engine]
        LLM[Local LLM Inference]
        DST[Dialogue State Tracker]
        TOOLS[MCP Tool Executor]
    end

    subgraph Data[Data Layer]
        QDRANT[Qdrant Vector DB]
        SCYLLA[ScyllaDB]
        CONFIG[YAML Configs]
        MODELS[ONNX Models]
    end

    WEB --> WS
    MOB --> WEBRTC
    TEL --> WS
    API --> HTTP

    WS --> VAD
    WEBRTC --> VAD
    HTTP --> NLU

    VAD --> STT
    STT --> NLU
    NLU --> AGENT
    AGENT --> TTS
    TTS --> WS
    TTS --> WEBRTC

    AGENT --> RAG
    AGENT --> LLM
    AGENT --> DST
    AGENT --> TOOLS

    RAG --> QDRANT
    DST --> SCYLLA
    TOOLS --> SCYLLA
    CONFIG --> AGENT
    MODELS --> STT
    MODELS --> TTS
    MODELS --> LLM

Component Interaction Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           VAAK COMPONENT ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                              SERVER CRATE                                │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐  │   │
│  │  │   Axum    │ │  WebRTC   │ │ WebSocket │ │    MCP    │ │ Metrics  │  │   │
│  │  │   HTTP    │ │ Signaling │ │  Handler  │ │  Server   │ │Prometheus│  │   │
│  │  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └────┬─────┘  │   │
│  └────────┼─────────────┼─────────────┼─────────────┼────────────┼────────┘   │
│           │             │             │             │            │             │
│  ┌────────┼─────────────┼─────────────┼─────────────┼────────────┼────────┐   │
│  │        └─────────────┴─────────────┴─────────────┘            │        │   │
│  │                              ▼                                 │        │   │
│  │  ┌─────────────────────────────────────────────────────────────────┐   │   │
│  │  │                        PIPELINE CRATE                           │   │   │
│  │  │                                                                 │   │   │
│  │  │   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐        │   │   │
│  │  │   │   VAD   │──▶│   STT   │──▶│  Turn   │──▶│   TTS   │        │   │   │
│  │  │   │ Silero/ │   │  Indic  │   │Detector │   │ IndicF5/│        │   │   │
│  │  │   │MagicNet │   │Conformer│   │Semantic │   │  Piper  │        │   │   │
│  │  │   └─────────┘   └─────────┘   └─────────┘   └─────────┘        │   │   │
│  │  │                                                                 │   │   │
│  │  └─────────────────────────────────────────────────────────────────┘   │   │
│  │                              │                                         │   │
│  │                              ▼                                         │   │
│  │  ┌─────────────────────────────────────────────────────────────────┐   │   │
│  │  │                         AGENT CRATE                             │   │   │
│  │  │                                                                 │   │   │
│  │  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │   │   │
│  │  │  │  Stage   │ │  Memory  │ │   DST    │ │   Lead   │           │   │   │
│  │  │  │ Manager  │ │ Agentic  │ │ Tracker  │ │ Scoring  │           │   │   │
│  │  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘           │   │   │
│  │  │       └────────────┼────────────┼────────────┘                 │   │   │
│  │  │                    ▼            ▼                               │   │   │
│  │  │  ┌──────────────────────────────────────────────────────────┐  │   │   │
│  │  │  │              GoldLoanAgent (Orchestrator)                │  │   │   │
│  │  │  └──────────────────────────────────────────────────────────┘  │   │   │
│  │  └─────────────────────────────────────────────────────────────────┘   │   │
│  │                              │                                         │   │
│  │              ┌───────────────┼───────────────┐                         │   │
│  │              ▼               ▼               ▼                         │   │
│  │  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐                │   │
│  │  │   RAG CRATE   │ │   LLM CRATE   │ │  TOOLS CRATE  │                │   │
│  │  │               │ │               │ │               │                │   │
│  │  │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │                │   │
│  │  │ │  Hybrid   │ │ │ │  Ollama   │ │ │ │   MCP     │ │                │   │
│  │  │ │ Retriever │ │ │ │  Claude   │ │ │ │ Protocol  │ │                │   │
│  │  │ └───────────┘ │ │ │  OpenAI   │ │ │ └───────────┘ │                │   │
│  │  │ ┌───────────┐ │ │ └───────────┘ │ │ ┌───────────┐ │                │   │
│  │  │ │ Reranker  │ │ │ ┌───────────┐ │ │ │ Gold Loan │ │                │   │
│  │  │ │EarlyExit  │ │ │ │Speculative│ │ │ │  Tools    │ │                │   │
│  │  │ └───────────┘ │ │ │ Decoding  │ │ │ └───────────┘ │                │   │
│  │  └───────────────┘ │ └───────────┘ │ └───────────────┘                │   │
│  │              │     └───────────────┘         │                        │   │
│  └──────────────┼───────────────────────────────┼────────────────────────┘   │
│                 ▼                               ▼                             │
│  ┌───────────────────────────────┐ ┌───────────────────────────────┐         │
│  │     TEXT_PROCESSING CRATE     │ │      PERSISTENCE CRATE        │         │
│  │                               │ │                               │         │
│  │  • Grammar Correction (LLM)   │ │  • ScyllaDB Sessions          │         │
│  │  • Translation (IndicTrans2)  │ │  • Audit Logging (Merkle)     │         │
│  │  • PII Detection (Hybrid)     │ │  • SMS/Appointment Store      │         │
│  │  • Compliance Checking        │ │  • Gold Price Service         │         │
│  │  • Sentiment Analysis         │ │                               │         │
│  │  • Entity Extraction          │ └───────────────────────────────┘         │
│  └───────────────────────────────┘                                           │
│                 │                                                             │
│                 ▼                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐   │
│  │                            CORE CRATE                                  │   │
│  │                                                                        │   │
│  │  Unified Traits │ Audio Types │ 22 Languages │ Error Types │ PII      │   │
│  │                                                                        │   │
│  └───────────────────────────────────────────────────────────────────────┘   │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘

🔧 System Components

Crate Dependency Graph

graph TD
    subgraph Entry[Entry Point]
        SERVER[server]
    end

    subgraph Orchestration[Orchestration]
        AGENT[agent]
        PIPELINE[pipeline]
    end

    subgraph Intel[Intelligence]
        RAG[rag]
        LLM[llm]
        TOOLS[tools]
        TEXT[text_processing]
    end

    subgraph Infra[Infrastructure]
        TRANSPORT[transport]
        PERSISTENCE[persistence]
    end

    subgraph Foundation[Foundation]
        CONFIG[config]
        CORE[core]
    end

    SERVER --> AGENT
    SERVER --> PIPELINE
    SERVER --> PERSISTENCE
    SERVER --> TRANSPORT
    SERVER --> CONFIG

    AGENT --> RAG
    AGENT --> LLM
    AGENT --> TOOLS
    AGENT --> TEXT
    AGENT --> PIPELINE

    PIPELINE --> CORE
    PIPELINE --> CONFIG

    RAG --> LLM
    RAG --> CORE
    RAG --> CONFIG

    LLM --> CORE
    LLM --> CONFIG

    TOOLS --> CORE
    TOOLS --> CONFIG
    TOOLS --> PERSISTENCE

    TEXT --> CORE
    TEXT --> CONFIG

    TRANSPORT --> CORE
    TRANSPORT --> CONFIG

    PERSISTENCE --> CORE

    CONFIG --> CORE

Component Summary

Crate	Lines	Purpose	Key Features
core	~2,000	Foundation traits & types	13 traits, 22 languages, audio types
config	~1,500	Configuration management	YAML/TOML, hot-reload, domain config
pipeline	~4,000	Audio processing	VAD, STT, TTS, turn detection, barge-in
agent	~3,500	Conversation orchestration	DST, memory, lead scoring, stages
rag	~2,500	Retrieval system	Hybrid search, reranking, caching
llm	~1,500	LLM integration	Multi-provider, speculative decoding
text_processing	~2,000	NLP pipeline	Grammar, translation, PII, compliance
tools	~1,200	MCP tool interface	JSON-RPC, gold loan tools
transport	~800	Audio transport	WebRTC, WebSocket, Opus codec
persistence	~1,000	Data storage	ScyllaDB, audit logs, sessions
server	~1,500	HTTP/WS server	Axum, metrics, auth

🛠️ Technology Stack

Core Technologies

┌─────────────────────────────────────────────────────────────────────────────┐
│                           TECHNOLOGY STACK                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  LAYER              TECHNOLOGY                  PURPOSE                     │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Runtime            Rust 1.75+                  Memory-safe systems lang    │
│  Async              Tokio                       High-performance async I/O  │
│  Web                Axum 0.7                    HTTP/WebSocket framework    │
│                                                                             │
│  ML Inference       ONNX Runtime (ort)          Cross-platform inference    │
│                     Candle                      Pure Rust neural networks   │
│                                                                             │
│  Speech             IndicConformer              22 Indian language STT      │
│                     IndicF5 / Piper             Indian language TTS         │
│                     Silero VAD                  Voice activity detection    │
│                                                                             │
│  LLM                Qwen 2.5 (1.5B-7B)          Local reasoning             │
│                     Ollama                      Model serving               │
│                     Claude/OpenAI               Cloud fallback (optional)   │
│                                                                             │
│  Translation        IndicTrans2                 22 language translation     │
│                                                                             │
│  Vector Search      Qdrant                      Semantic search             │
│  Text Search        Tantivy                     BM25 full-text search       │
│                                                                             │
│  Database           ScyllaDB                    High-availability storage   │
│                                                                             │
│  Observability      Prometheus                  Metrics collection          │
│                     OpenTelemetry               Distributed tracing         │
│                     Tracing                     Structured logging          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Model Stack

Model	Size	Purpose	Quantization	Latency
Silero VAD	2MB	Voice detection	FP32	<10ms
IndicConformer	600MB	Indian STT	FP16	~100ms
SmolLM2-135M	270MB	Turn detection	Q4	~30ms
Qwen2.5-1.5B	1.2GB	Fast responses	Q4_K_M	~120ms
Qwen2.5-7B	4.5GB	Complex reasoning	Q4_K_M	~300ms
IndicF5	500MB	Indian TTS	FP16	~60ms
E5-Multilingual	278MB	Embeddings	FP16	~20ms
BGE-Reranker-M3	500MB	Cross-encoder	FP16	~20ms

🔄 Data Flow

Voice Conversation Flow

sequenceDiagram
    participant U as User
    participant T as Transport
    participant P as Pipeline
    participant A as Agent
    participant R as RAG
    participant L as LLM
    participant S as TTS

    U->>T: Audio Stream
    T->>P: Raw Audio Frames

    loop VAD Processing
        P->>P: Silero VAD Detection
    end

    P->>P: IndicConformer STT
    P->>P: Semantic Turn Detection
    P->>A: Final Transcript

    par Parallel Processing
        A->>R: RAG Query
        R->>R: Hybrid Search
        R->>R: Cross-Encoder Reranking
        R-->>A: Relevant Documents
    and DST Update
        A->>A: Update DST State
        A->>A: Detect Intent
    end

    A->>L: Generate Response

    loop Token Streaming
        L-->>A: Token Chunk
        A->>A: Sentence Detection
        A->>S: Complete Sentence
        S-->>T: Audio Frame
        T-->>U: Play Audio
    end

    Note over U,S: First audio plays before LLM completes

Latency Breakdown

┌─────────────────────────────────────────────────────────────────────────────┐
│                        LATENCY BUDGET (Target: <500ms)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Stage                          Target      Technique                       │
│  ───────────────────────────────────────────────────────────────────────   │
│  Audio Receive                  10ms        WebRTC low-latency              │
│  VAD Detection                  10ms        10ms frame processing           │
│  STT Streaming                  100ms       Partial results + prefetch      │
│  Turn Detection                 30ms        SmolLM2-135M semantic           │
│  RAG (speculative)              50ms        Prefetch on partial transcript  │
│  Reranking                      20ms        PABEE early-exit                │
│  LLM First Token                120ms       SLM race + speculative          │
│  TTS First Audio                60ms        Word-level streaming            │
│  Audio Send                     10ms        Low-latency buffer              │
│  ───────────────────────────────────────────────────────────────────────   │
│  TOTAL                          ~410ms      Typical end-to-end              │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         TIMELINE VIEW                                │   │
│  │                                                                      │   │
│  │  0ms   100ms   200ms   300ms   400ms   500ms                        │   │
│  │  │──────│──────│──────│──────│──────│                               │   │
│  │  │ VAD+STT     │ RAG+RANK │ LLM    │ TTS  │                         │   │
│  │  │             │          │        │      │                         │   │
│  │  │◄───────────►│◄────────►│◄──────►│◄────►│                         │   │
│  │  │    ~110ms   │  ~70ms   │ ~120ms │~60ms │                         │   │
│  │  │                                        │                         │   │
│  │  │                        First audio ────┘                         │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

📊 Performance Benchmarks

Latency Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    END-TO-END LATENCY COMPARISON (P95)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  System                      Latency        Hardware                        │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  VAAK (this project)    ~450ms         8-core CPU, 32GB RAM           │
│  OpenAI Realtime API         ~300ms         Cloud (with network)           │
│  Bland.ai                    ~800ms         Cloud                          │
│  Retell.ai                   ~700ms         Cloud                          │
│  Vapi                        ~1000ms        Cloud                          │
│  Custom Python Stack         ~1200ms        8-core CPU, 32GB RAM           │
│                                                                             │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  LATENCY BAR CHART                                                   │   │
│  │                                                                      │   │
│  │  VAAK        ████████████████                           450ms  │   │
│  │  OpenAI Realtime  ██████████                                 300ms  │   │
│  │  Bland.ai         ██████████████████████████                 800ms  │   │
│  │  Retell.ai        ███████████████████████                    700ms  │   │
│  │  Vapi             ████████████████████████████████          1000ms  │   │
│  │  Python Stack     ████████████████████████████████████████  1200ms  │   │
│  │                   │         │         │         │         │         │   │
│  │                   0       300       600       900      1200ms       │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Throughput & Resource Usage

Metric	Value	Configuration
Concurrent Sessions	100+	8-core CPU, 32GB RAM
Requests/Second	50+	Per CPU core
Memory (Idle)	~2GB	Base server
Memory (Loaded)	~8GB	With all models
Binary Size	~150MB	Release build
Startup Time	~5s	Model loading

Quality Metrics

Metric	VAAK	Cloud Baseline
Hindi STT WER	8-12%	10-15%
Turn Detection F1	0.92	0.85
Intent Accuracy	94%	91%
Response Relevance	4.2/5	4.0/5

🚀 Deployment Models

Deployment Architecture Options

graph TB
    subgraph Single[Single Node Deployment]
        SN_APP[VAAK Binary]
        SN_QD[Qdrant]
        SN_SC[ScyllaDB]
        SN_OL[Ollama]

        SN_APP --> SN_QD
        SN_APP --> SN_SC
        SN_APP --> SN_OL
    end

graph TB
    subgraph Distributed[Distributed Deployment]
        LB[Load Balancer]
        APP1[VAAK-1]
        APP2[VAAK-2]
        APP3[VAAK-3]
        QD[Qdrant Cluster]
        SC[ScyllaDB Cluster]
        OL[Ollama Pool]

        LB --> APP1
        LB --> APP2
        LB --> APP3

        APP1 --> QD
        APP2 --> QD
        APP3 --> QD

        APP1 --> SC
        APP2 --> SC
        APP3 --> SC

        APP1 --> OL
        APP2 --> OL
        APP3 --> OL
    end

Hardware Requirements

┌─────────────────────────────────────────────────────────────────────────────┐
│                         HARDWARE REQUIREMENTS                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  TIER              CPU         RAM      STORAGE    CONCURRENT SESSIONS     │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Development       4 cores     16GB     50GB       5-10                     │
│  Production-S      8 cores     32GB     100GB      50-100                   │
│  Production-M      16 cores    64GB     200GB      200-500                  │
│  Production-L      32 cores    128GB    500GB      500-1000                 │
│                                                                             │
│  Notes:                                                                     │
│  • GPU optional but improves inference speed by 2-3x                        │
│  • SSD storage strongly recommended                                         │
│  • ScyllaDB requires additional nodes for high availability                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

⚙️ Configuration & Customization

Domain Configuration (YAML-Based)

VAAK is 100% configurable via YAML files. No code changes needed for:

New products/services
Different languages
Custom prompts
Business rules
Compliance requirements

# config/domain.yaml - Example Gold Loan Configuration

domain: gold_loan
version: "1.0.0"

# Product Configuration
product:
  variants:
    - id: standard
      name: "Gold Loan Standard"
      interest_rate_min: 9.5
      interest_rate_max: 11.5
      benefits:
        - "30-minute approval"
        - "Zero foreclosure charges"

    - id: shakti_gold
      name: "Shakti Gold (Women)"
      interest_rate_min: 9.0
      interest_rate_max: 11.0
      target_segment: [women]

# Agent Persona
prompts:
  system_prompt:
    agent_name: "Priya"
    company_name: "Kotak Mahindra Bank"
    instructions:
      - "Help customers understand gold loan options"
      - "Highlight competitive advantages"
      - "Capture leads for follow-up"
    compliance:
      - "Never guarantee loan approval"
      - "Disclose all applicable fees"

Knowledge Base Structure

knowledge/
├── manifest.yaml      # Index of all knowledge files
├── products.yaml      # Product information (EN + Hindi)
├── rates.yaml         # Interest rates, fees
├── objections.yaml    # Objection handling scripts
├── switching.yaml     # Balance transfer benefits
├── eligibility.yaml   # Eligibility criteria
├── segments.yaml      # Customer segment content
├── faqs.yaml          # Frequently asked questions
├── safety.yaml        # Safety, compliance info
├── process.yaml       # Application process steps
├── branches.yaml      # Branch locations
└── competitors.yaml   # Competitor comparison

👔 For Product Managers

Business Value Proposition

┌─────────────────────────────────────────────────────────────────────────────┐
│                         BUSINESS IMPACT SUMMARY                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  METRIC                    BEFORE            AFTER           IMPROVEMENT   │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Average Handle Time       8 min             3 min           62.5% ↓       │
│  Cost per Conversation     ₹45               ₹8              82% ↓         │
│  Lead Qualification Rate   35%               65%             85% ↑         │
│  Customer Satisfaction     3.2/5             4.1/5           28% ↑         │
│  Language Coverage         3                 22              633% ↑        │
│  24/7 Availability         No                Yes             ∞             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Feature Matrix

Feature	Status	Description
Multi-Channel	✅	Voice, Text, Chat interfaces
22 Indian Languages	✅	Native support via AI4Bharat models
Dialogue State Tracking	✅	Track loan amount, purity, tenure
Lead Scoring	✅	Automatic lead qualification
Objection Handling	✅	Data-driven rebuttals
CRM Integration	✅	Lead capture, appointment scheduling
Compliance Checking	✅	RBI-compliant responses
Audit Logging	✅	SHA-256 merkle chain
A/B Testing	🔄	Experiment framework (planned)
Analytics Dashboard	🔄	Prometheus + Grafana

Conversation Flow

stateDiagram-v2
    [*] --> Greeting: Call Connected

    Greeting --> Exploration: Customer Responds

    Exploration --> ValueProposition: Need Identified
    Exploration --> Exploration: Gathering Info

    ValueProposition --> ObjectionHandling: Objection Raised
    ValueProposition --> Closing: Interest Shown

    ObjectionHandling --> ValueProposition: Objection Resolved
    ObjectionHandling --> FollowUp: Not Convinced

    Closing --> Converted: Appointment Booked
    Closing --> FollowUp: Needs Time

    Converted --> [*]: Lead Captured
    FollowUp --> [*]: Callback Scheduled

🏛️ For Software Architects

Design Principles

Principle	Implementation
Trait-Based Abstraction	All components implement core traits for swappability
Plugin Architecture	Pluggable backends for STT, TTS, LLM, RAG
Async/Streaming First	tokio + futures for real-time processing
Feature Gates	Heavy deps (WebRTC, OTEL) are optional features
Configuration-Driven	All behavior configurable via YAML/TOML
Error Propagation	Consistent Result<T, Error> with From impls
Event-Driven	Broadcast channels for component communication

Trait Hierarchy

// Core traits that all components implement

pub trait SpeechToText: Send + Sync + 'static {
    async fn transcribe(&self, audio: &AudioFrame) -> Result<TranscriptFrame>;
    fn transcribe_stream(&self, audio: impl Stream<Item = AudioFrame>)
        -> impl Stream<Item = Result<TranscriptFrame>>;
}

pub trait TextToSpeech: Send + Sync + 'static {
    async fn synthesize(&self, text: &str, config: &VoiceConfig) -> Result<AudioFrame>;
    fn synthesize_stream(&self, text: impl Stream<Item = String>)
        -> impl Stream<Item = Result<AudioFrame>>;
}

pub trait LanguageModel: Send + Sync + 'static {
    async fn generate(&self, request: GenerateRequest) -> Result<GenerateResponse>;
    fn generate_stream(&self, request: GenerateRequest)
        -> impl Stream<Item = Result<StreamChunk>>;
}

pub trait Retriever: Send + Sync + 'static {
    async fn retrieve(&self, query: &str, options: &RetrieveOptions) -> Result<Vec<Document>>;
    async fn retrieve_agentic(&self, query: &str, context: &QueryContext, max_iter: usize)
        -> Result<Vec<Document>>;
}

Memory Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                      HIERARCHICAL MEMORY SYSTEM                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                         CORE MEMORY                                   │  │
│  │  • Current turn context                                               │  │
│  │  • Active slot values (loan amount, purity, phone)                   │  │
│  │  • Immediate conversation state                                       │  │
│  │  └── Token Budget: 500 tokens                                        │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                    │                                        │
│                                    ▼                                        │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                         RECALL MEMORY                                 │  │
│  │  • Recent conversation turns (last 8)                                │  │
│  │  • Numbered memory notes with IDs                                    │  │
│  │  • Working memory for multi-turn reasoning                           │  │
│  │  └── Token Budget: 1500 tokens                                       │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                    │                                        │
│                                    ▼                                        │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                        ARCHIVAL MEMORY                                │  │
│  │  • Compressed conversation summaries                                  │  │
│  │  • Customer profile and preferences                                   │  │
│  │  • Long-term facts and decisions                                      │  │
│  │  └── Token Budget: 500 tokens (summarized)                           │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

RAG Pipeline

graph LR
    subgraph QueryProc[Query Processing]
        Q[User Query] --> QE[Query Expansion]
        QE --> CL[Normalize]
    end

    subgraph Retrieval[Parallel Retrieval]
        CL --> DS[Dense Search]
        CL --> SS[Sparse Search]
    end

    subgraph Ranking[Fusion and Ranking]
        DS --> RRF[RRF Fusion]
        SS --> RRF
        RRF --> RR[Reranker]
        RR --> EE{Exit}
        EE -->|Yes| OUT[Results]
        EE -->|No| RR
    end

    subgraph Context[Context Management]
        OUT --> CS[Sizing]
        CS --> CC[Compress]
        CC --> LLM[LLM]
    end

📚 Research & References

Academic Papers

Paper	Contribution	Link
Full-Duplex Dialogue Survey	Architecture taxonomy for voice agents	arXiv:2509.14515
Turnsense	Semantic turn detection with small LMs	ACL 2024
MagicNet	Low-latency VAD with 10ms frames	ICASSP 2023
PABEE	Early-exit inference for transformers	EMNLP 2020
EAGLE	Speculative decoding for LLMs	ICML 2024
IndicConformer	Indian language ASR	INTERSPEECH 2023
IndicTrans2	Multilingual translation for Indian languages	TACL 2023

Open Source Tools

Tool	Purpose	License	Link
ONNX Runtime	Cross-platform ML inference	MIT	ort
Candle	Pure Rust neural networks	Apache-2.0	candle
Qdrant	Vector search engine	Apache-2.0	qdrant
Tantivy	Full-text search engine	MIT	tantivy
ScyllaDB	High-performance database	AGPL-3.0	scylladb
Tokio	Async runtime	MIT	tokio
Axum	Web framework	MIT	axum
Silero VAD	Voice activity detection	MIT	silero-vad
sherpa-onnx	Speech toolkit	Apache-2.0	sherpa-onnx
AI4Bharat	Indian language models	CC-BY-4.0	ai4bharat
Ollama	Local LLM serving	MIT	ollama

Industry References

Source	Insight	Link
Cresta Engineering	Voice agent latency optimization	Blog
Deepgram Research	16% satisfaction drop per second of latency	Research
Red Hat Developer	Rust vs Python for agentic AI	Article
LiveKit Realtime	195ms full-duplex voice agents	Docs
Letta Memory	Hierarchical memory for agents	Blog

📁 Project Structure

goldloan-study/
├── voice-agent/                    # Main Voice Agent Application
│   ├── backend/                    # Rust Backend (11 crates)
│   │   ├── crates/
│   │   │   ├── core/               # Foundation: traits, types, 22 languages
│   │   │   ├── config/             # Configuration: YAML loading, domain config
│   │   │   ├── pipeline/           # Audio: VAD, STT, TTS, turn detection
│   │   │   ├── agent/              # Orchestration: DST, memory, lead scoring
│   │   │   ├── rag/                # Retrieval: hybrid search, reranking
│   │   │   ├── llm/                # LLM: multi-provider, speculative decoding
│   │   │   ├── text_processing/    # NLP: grammar, translation, PII
│   │   │   ├── tools/              # MCP: tool definitions, execution
│   │   │   ├── transport/          # Transport: WebRTC, WebSocket
│   │   │   ├── persistence/        # Storage: ScyllaDB, audit logs
│   │   │   └── server/             # HTTP: Axum server, endpoints
│   │   ├── config/                 # Configuration files
│   │   │   ├── default.yaml        # Default settings
│   │   │   ├── domain.yaml         # Domain-specific config
│   │   │   └── production.yaml     # Production overrides
│   │   ├── knowledge/              # RAG Knowledge Base
│   │   │   ├── manifest.yaml       # Knowledge index
│   │   │   ├── products.yaml       # Product information
│   │   │   ├── rates.yaml          # Interest rates
│   │   │   └── ...                 # Other knowledge files
│   │   └── models/                 # ONNX Model Files
│   │       ├── vad/                # Silero VAD
│   │       ├── stt/                # IndicConformer
│   │       ├── tts/                # IndicF5, Piper
│   │       ├── embeddings/         # E5-Multilingual
│   │       └── reranker/           # BGE-Reranker
│   ├── frontend/                   # Web Frontend (React/Vite)
│   └── scripts/                    # Build & Dev Scripts
│
├── docs/                           # Architecture Documentation
│   ├── ARCHITECTURE_v2.md          # High-level design
│   ├── INDEX.md                    # Documentation index
│   ├── rust-ecosystem.md           # Library decisions
│   ├── architecture/               # Component documentation
│   │   ├── agent/                  # Agent framework docs
│   │   ├── pipeline/               # Pipeline documentation
│   │   ├── rag/                    # RAG strategy docs
│   │   ├── personalization/        # Personalization docs
│   │   └── interfaces/             # Interface documentation
│   ├── deployment/                 # Deployment guides
│   ├── experiments/                # Experiment logs
│   └── troubleshooting/            # Issue reports & fixes
│       ├── DEEP_DIVE_REPORT.md
│       ├── FIX_PLAN.md
│       └── ISSUES_SUMMARY.md
│
├── vendor/                         # External Dependencies (Submodules)
│   ├── floneum/                    # Kalosm ML Framework
│   ├── ort/                        # ONNX Runtime Rust Bindings
│   └── turn-rs/                    # TURN Server for WebRTC
│
├── case-studies/                   # Domain-Specific Research
│   └── kotak-gold-loan/            # Kotak Gold Loan Case Study
│       ├── research_docs/          # Market research
│       ├── latex-presentation/     # PDF presentation
│       └── slides/                 # Presentation slides
│
├── assets/                         # Branding & Media
│   ├── vaak-logo.svg               # Project logo
│   └── demo.mp4                    # Demo video
│
└── data/                           # Runtime Data
    ├── storage/                    # Qdrant vector store
    └── snapshots/                  # Data snapshots

🚀 Getting Started

Prerequisites

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable

# Install ONNX Runtime (for CPU inference)
# Linux
wget https://github.com/microsoft/onnxruntime/releases/download/v1.16.3/onnxruntime-linux-x64-1.16.3.tgz
tar -xzf onnxruntime-linux-x64-1.16.3.tgz
export ORT_LIB_LOCATION=$PWD/onnxruntime-linux-x64-1.16.3

# Install Qdrant
docker run -p 6333:6333 qdrant/qdrant

# Install Ollama (for local LLM)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b

Build & Run

cd voice-agent/backend

# Build
./scripts/build-backend.sh

# Or manually:
export LIBRARY_PATH="$PWD/onnxruntime/lib"
export ORT_LIB_LOCATION="$PWD/onnxruntime"
cargo build --release

# Run
./target/release/voice-agent-server

# Or with environment:
RUST_LOG=info cargo run --release

Configuration

# Copy default config
cp config/default.yaml config/local.yaml

# Edit for your environment
# - Set Qdrant endpoint
# - Set Ollama endpoint
# - Configure model paths

# Start with custom config
VOICE_AGENT__CONFIG_PATH=config/local.yaml ./target/release/voice-agent-server

🙏 Acknowledgments

AI4Bharat Team

For IndicConformer, IndicTrans2, and other Indian language AI models that make this project possible.

Open Source Community

Hugging Face - Candle framework and model hub
Qdrant Team - Vector search engine
Tokio Team - Async Rust runtime
k2-fsa - sherpa-onnx speech toolkit

Research Institutions

IIT Madras - AI4Bharat initiative
Microsoft Research - ONNX Runtime

📄 License

This project is proprietary software. See LICENSE for details.

Built with ❤️ in Rust for the Indian Enterprise

VAAK - Where Speech Becomes Intelligence

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
assets		assets
case-studies/kotak-gold-loan		case-studies/kotak-gold-loan
data/storage		data/storage
docs		docs
vendor		vendor
voice-agent		voice-agent
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
.qdrant-initialized		.qdrant-initialized
README.md		README.md

ayushmaanbhav/Voice-Activated-Autonomous-Knowledge-System

Folders and files

Latest commit

History

Repository files navigation