Skip to content

VAAK (वाक्) · Voice-Activated Autonomous-Knowledge-System · Domain-Agnostic Voice / Text / Chat Agent Built with Rust · Runs on CPU · No Cloud Dependencies · Scalable · Memory-Safe · Sovereign · Low-Cost · Real-Time · Deterministic · Compliant

Notifications You must be signed in to change notification settings

ayushmaanbhav/Voice-Activated-Autonomous-Knowledge-System

Repository files navigation

VAAK Logo

VAAK Demo

VAAK (वाक्)

Voice-Activated Autonomous Knowledge System


Scalable · Memory-Safe · Real-Time · Deterministic

Conversational AI Platform


Sales · Support · Discovery · Education · CRM


Air-Gapped · GDPR-Ready · Data Sovereign · Compliant Architecture

Edge AI · Sub-500ms · Distributed · Open Architecture

22 Indian Languages · Multilingual · Enterprise-Grade


Domain-Agnostic Voice / Text / Chat Agent Built with Rust · Runs on CPU · No Cloud Dependencies

Rust License Languages Latency


📋 Table of Contents


🎯 Executive Summary

VAAK (वाक् - Rigvedic Sanskrit for "Speech/Voice") is a production-grade conversational AI platform that enables enterprises to deploy intelligent voice, text, and chat agents without cloud dependencies, without LLM API costs, and without compromising on data privacy.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│    🔒 100% ON-PREMISE          🖥️ CPU-ONLY INFERENCE       🌐 22 LANGUAGES │
│    No data leaves your         No GPU required              All Indian      │
│    infrastructure              Works on commodity HW        languages       │
│                                                                             │
│    ⚡ <500ms LATENCY           🔧 DOMAIN AGNOSTIC          🦀 PURE RUST    │
│    Real-time voice             YAML-based config            Memory-safe     │
│    conversations               for any industry             & performant    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

The Problem We Solve

Challenge Traditional AI Agents VAAK
Data Privacy Customer data sent to cloud All processing on-premise
LLM Costs $0.01-0.10 per conversation Zero API costs after deployment
Latency 1-3 seconds typical <500ms end-to-end
Compliance Complex DPAs needed RBI/GDPR compliant by design
Indian Languages Limited or poor quality Native 22-language support
Customization Complex code changes YAML configuration only
Vendor Lock-in Tied to cloud provider Fully open-source stack

💡 Why VAAK?

For Business Leaders

                    ┌─────────────────────────────────────────┐
                    │         COST COMPARISON (Annual)         │
                    ├─────────────────────────────────────────┤
                    │                                         │
                    │  Cloud AI Agent (100K conversations):   │
                    │  ├── LLM API Costs:     ₹50-100 Lakhs   │
                    │  ├── STT/TTS APIs:      ₹20-40 Lakhs    │
                    │  ├── Cloud Infra:       ₹10-20 Lakhs    │
                    │  └── TOTAL:             ₹80-160 Lakhs   │
                    │                                         │
                    │  VAAK (Unlimited conversations):   │
                    │  ├── One-time Setup:    ₹15-25 Lakhs    │
                    │  ├── Hardware (3 nodes): ₹5-10 Lakhs    │
                    │  ├── Annual Maintenance: ₹5-10 Lakhs    │
                    │  └── TOTAL:             ₹25-45 Lakhs    │
                    │                                         │
                    │  💰 SAVINGS: 60-70% over 3 years        │
                    │                                         │
                    └─────────────────────────────────────────┘

For Technical Leaders

  • No Python runtime dependencies - Single binary deployment
  • Memory-safe Rust - No segfaults, no GC pauses, no data races
  • True parallelism - No GIL, scales with CPU cores
  • Deterministic behavior - Same input = same output (no temperature drift)
  • Audit trails - SHA-256 merkle chain for regulatory compliance

⚡ Key Differentiators

┌────────────────────────────────────────────────────────────────────────────────┐
│                          VAAK vs. ALTERNATIVES                             │
├────────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│                    VAAK    │  Cloud APIs   │  Open Source  │  In-House   │
│   ─────────────────────────────────────────────────────────────────────────   │
│   Air-gapped Ready      ✅      │      ❌       │      ⚠️       │     ⚠️      │
│   CPU-Only             ✅      │      N/A      │      ⚠️       │     ❌      │
│   22 Indian Languages   ✅      │      ⚠️       │      ❌       │     ❌      │
│   <500ms Latency        ✅      │      ❌       │      ⚠️       │     ⚠️      │
│   Zero API Costs        ✅      │      ❌       │      ✅       │     ✅      │
│   Production Ready      ✅      │      ✅       │      ⚠️       │     ❌      │
│   Domain Agnostic       ✅      │      ✅       │      ⚠️       │     ❌      │
│   Memory Safe           ✅      │      N/A      │      ❌       │     ⚠️      │
│   RAG Built-in          ✅      │      ⚠️       │      ⚠️       │     ❌      │
│   Compliance Tools      ✅      │      ⚠️       │      ❌       │     ❌      │
│                                                                                │
│   Legend: ✅ Full Support  ⚠️ Partial/Complex  ❌ Not Available               │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

🏗️ Architecture Overview

High-Level System Architecture

graph TB
    subgraph Client[Client Layer]
        WEB[Web Browser]
        MOB[Mobile App]
        TEL[Telephony]
        API[REST API]
    end

    subgraph Transport[Transport Layer]
        WS[WebSocket Server]
        WEBRTC[WebRTC Signaling]
        HTTP[HTTP Endpoints]
    end

    subgraph Pipeline[Core Pipeline]
        VAD[VAD]
        STT[STT]
        NLU[NLU]
        AGENT[Agent]
        TTS[TTS]
    end

    subgraph Intelligence[Intelligence Layer]
        RAG[Hybrid RAG Engine]
        LLM[Local LLM Inference]
        DST[Dialogue State Tracker]
        TOOLS[MCP Tool Executor]
    end

    subgraph Data[Data Layer]
        QDRANT[Qdrant Vector DB]
        SCYLLA[ScyllaDB]
        CONFIG[YAML Configs]
        MODELS[ONNX Models]
    end

    WEB --> WS
    MOB --> WEBRTC
    TEL --> WS
    API --> HTTP

    WS --> VAD
    WEBRTC --> VAD
    HTTP --> NLU

    VAD --> STT
    STT --> NLU
    NLU --> AGENT
    AGENT --> TTS
    TTS --> WS
    TTS --> WEBRTC

    AGENT --> RAG
    AGENT --> LLM
    AGENT --> DST
    AGENT --> TOOLS

    RAG --> QDRANT
    DST --> SCYLLA
    TOOLS --> SCYLLA
    CONFIG --> AGENT
    MODELS --> STT
    MODELS --> TTS
    MODELS --> LLM
Loading

Component Interaction Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                           VAAK COMPONENT ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                              SERVER CRATE                                │   │
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐  │   │
│  │  │   Axum    │ │  WebRTC   │ │ WebSocket │ │    MCP    │ │ Metrics  │  │   │
│  │  │   HTTP    │ │ Signaling │ │  Handler  │ │  Server   │ │Prometheus│  │   │
│  │  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └────┬─────┘  │   │
│  └────────┼─────────────┼─────────────┼─────────────┼────────────┼────────┘   │
│           │             │             │             │            │             │
│  ┌────────┼─────────────┼─────────────┼─────────────┼────────────┼────────┐   │
│  │        └─────────────┴─────────────┴─────────────┘            │        │   │
│  │                              ▼                                 │        │   │
│  │  ┌─────────────────────────────────────────────────────────────────┐   │   │
│  │  │                        PIPELINE CRATE                           │   │   │
│  │  │                                                                 │   │   │
│  │  │   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐        │   │   │
│  │  │   │   VAD   │──▶│   STT   │──▶│  Turn   │──▶│   TTS   │        │   │   │
│  │  │   │ Silero/ │   │  Indic  │   │Detector │   │ IndicF5/│        │   │   │
│  │  │   │MagicNet │   │Conformer│   │Semantic │   │  Piper  │        │   │   │
│  │  │   └─────────┘   └─────────┘   └─────────┘   └─────────┘        │   │   │
│  │  │                                                                 │   │   │
│  │  └─────────────────────────────────────────────────────────────────┘   │   │
│  │                              │                                         │   │
│  │                              ▼                                         │   │
│  │  ┌─────────────────────────────────────────────────────────────────┐   │   │
│  │  │                         AGENT CRATE                             │   │   │
│  │  │                                                                 │   │   │
│  │  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │   │   │
│  │  │  │  Stage   │ │  Memory  │ │   DST    │ │   Lead   │           │   │   │
│  │  │  │ Manager  │ │ Agentic  │ │ Tracker  │ │ Scoring  │           │   │   │
│  │  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘           │   │   │
│  │  │       └────────────┼────────────┼────────────┘                 │   │   │
│  │  │                    ▼            ▼                               │   │   │
│  │  │  ┌──────────────────────────────────────────────────────────┐  │   │   │
│  │  │  │              GoldLoanAgent (Orchestrator)                │  │   │   │
│  │  │  └──────────────────────────────────────────────────────────┘  │   │   │
│  │  └─────────────────────────────────────────────────────────────────┘   │   │
│  │                              │                                         │   │
│  │              ┌───────────────┼───────────────┐                         │   │
│  │              ▼               ▼               ▼                         │   │
│  │  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐                │   │
│  │  │   RAG CRATE   │ │   LLM CRATE   │ │  TOOLS CRATE  │                │   │
│  │  │               │ │               │ │               │                │   │
│  │  │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │                │   │
│  │  │ │  Hybrid   │ │ │ │  Ollama   │ │ │ │   MCP     │ │                │   │
│  │  │ │ Retriever │ │ │ │  Claude   │ │ │ │ Protocol  │ │                │   │
│  │  │ └───────────┘ │ │ │  OpenAI   │ │ │ └───────────┘ │                │   │
│  │  │ ┌───────────┐ │ │ └───────────┘ │ │ ┌───────────┐ │                │   │
│  │  │ │ Reranker  │ │ │ ┌───────────┐ │ │ │ Gold Loan │ │                │   │
│  │  │ │EarlyExit  │ │ │ │Speculative│ │ │ │  Tools    │ │                │   │
│  │  │ └───────────┘ │ │ │ Decoding  │ │ │ └───────────┘ │                │   │
│  │  └───────────────┘ │ └───────────┘ │ └───────────────┘                │   │
│  │              │     └───────────────┘         │                        │   │
│  └──────────────┼───────────────────────────────┼────────────────────────┘   │
│                 ▼                               ▼                             │
│  ┌───────────────────────────────┐ ┌───────────────────────────────┐         │
│  │     TEXT_PROCESSING CRATE     │ │      PERSISTENCE CRATE        │         │
│  │                               │ │                               │         │
│  │  • Grammar Correction (LLM)   │ │  • ScyllaDB Sessions          │         │
│  │  • Translation (IndicTrans2)  │ │  • Audit Logging (Merkle)     │         │
│  │  • PII Detection (Hybrid)     │ │  • SMS/Appointment Store      │         │
│  │  • Compliance Checking        │ │  • Gold Price Service         │         │
│  │  • Sentiment Analysis         │ │                               │         │
│  │  • Entity Extraction          │ └───────────────────────────────┘         │
│  └───────────────────────────────┘                                           │
│                 │                                                             │
│                 ▼                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐   │
│  │                            CORE CRATE                                  │   │
│  │                                                                        │   │
│  │  Unified Traits │ Audio Types │ 22 Languages │ Error Types │ PII      │   │
│  │                                                                        │   │
│  └───────────────────────────────────────────────────────────────────────┘   │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘

🔧 System Components

Crate Dependency Graph

graph TD
    subgraph Entry[Entry Point]
        SERVER[server]
    end

    subgraph Orchestration[Orchestration]
        AGENT[agent]
        PIPELINE[pipeline]
    end

    subgraph Intel[Intelligence]
        RAG[rag]
        LLM[llm]
        TOOLS[tools]
        TEXT[text_processing]
    end

    subgraph Infra[Infrastructure]
        TRANSPORT[transport]
        PERSISTENCE[persistence]
    end

    subgraph Foundation[Foundation]
        CONFIG[config]
        CORE[core]
    end

    SERVER --> AGENT
    SERVER --> PIPELINE
    SERVER --> PERSISTENCE
    SERVER --> TRANSPORT
    SERVER --> CONFIG

    AGENT --> RAG
    AGENT --> LLM
    AGENT --> TOOLS
    AGENT --> TEXT
    AGENT --> PIPELINE

    PIPELINE --> CORE
    PIPELINE --> CONFIG

    RAG --> LLM
    RAG --> CORE
    RAG --> CONFIG

    LLM --> CORE
    LLM --> CONFIG

    TOOLS --> CORE
    TOOLS --> CONFIG
    TOOLS --> PERSISTENCE

    TEXT --> CORE
    TEXT --> CONFIG

    TRANSPORT --> CORE
    TRANSPORT --> CONFIG

    PERSISTENCE --> CORE

    CONFIG --> CORE
Loading

Component Summary

Crate Lines Purpose Key Features
core ~2,000 Foundation traits & types 13 traits, 22 languages, audio types
config ~1,500 Configuration management YAML/TOML, hot-reload, domain config
pipeline ~4,000 Audio processing VAD, STT, TTS, turn detection, barge-in
agent ~3,500 Conversation orchestration DST, memory, lead scoring, stages
rag ~2,500 Retrieval system Hybrid search, reranking, caching
llm ~1,500 LLM integration Multi-provider, speculative decoding
text_processing ~2,000 NLP pipeline Grammar, translation, PII, compliance
tools ~1,200 MCP tool interface JSON-RPC, gold loan tools
transport ~800 Audio transport WebRTC, WebSocket, Opus codec
persistence ~1,000 Data storage ScyllaDB, audit logs, sessions
server ~1,500 HTTP/WS server Axum, metrics, auth

🛠️ Technology Stack

Core Technologies

┌─────────────────────────────────────────────────────────────────────────────┐
│                           TECHNOLOGY STACK                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  LAYER              TECHNOLOGY                  PURPOSE                     │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Runtime            Rust 1.75+                  Memory-safe systems lang    │
│  Async              Tokio                       High-performance async I/O  │
│  Web                Axum 0.7                    HTTP/WebSocket framework    │
│                                                                             │
│  ML Inference       ONNX Runtime (ort)          Cross-platform inference    │
│                     Candle                      Pure Rust neural networks   │
│                                                                             │
│  Speech             IndicConformer              22 Indian language STT      │
│                     IndicF5 / Piper             Indian language TTS         │
│                     Silero VAD                  Voice activity detection    │
│                                                                             │
│  LLM                Qwen 2.5 (1.5B-7B)          Local reasoning             │
│                     Ollama                      Model serving               │
│                     Claude/OpenAI               Cloud fallback (optional)   │
│                                                                             │
│  Translation        IndicTrans2                 22 language translation     │
│                                                                             │
│  Vector Search      Qdrant                      Semantic search             │
│  Text Search        Tantivy                     BM25 full-text search       │
│                                                                             │
│  Database           ScyllaDB                    High-availability storage   │
│                                                                             │
│  Observability      Prometheus                  Metrics collection          │
│                     OpenTelemetry               Distributed tracing         │
│                     Tracing                     Structured logging          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Model Stack

Model Size Purpose Quantization Latency
Silero VAD 2MB Voice detection FP32 <10ms
IndicConformer 600MB Indian STT FP16 ~100ms
SmolLM2-135M 270MB Turn detection Q4 ~30ms
Qwen2.5-1.5B 1.2GB Fast responses Q4_K_M ~120ms
Qwen2.5-7B 4.5GB Complex reasoning Q4_K_M ~300ms
IndicF5 500MB Indian TTS FP16 ~60ms
E5-Multilingual 278MB Embeddings FP16 ~20ms
BGE-Reranker-M3 500MB Cross-encoder FP16 ~20ms

🔄 Data Flow

Voice Conversation Flow

sequenceDiagram
    participant U as User
    participant T as Transport
    participant P as Pipeline
    participant A as Agent
    participant R as RAG
    participant L as LLM
    participant S as TTS

    U->>T: Audio Stream
    T->>P: Raw Audio Frames

    loop VAD Processing
        P->>P: Silero VAD Detection
    end

    P->>P: IndicConformer STT
    P->>P: Semantic Turn Detection
    P->>A: Final Transcript

    par Parallel Processing
        A->>R: RAG Query
        R->>R: Hybrid Search
        R->>R: Cross-Encoder Reranking
        R-->>A: Relevant Documents
    and DST Update
        A->>A: Update DST State
        A->>A: Detect Intent
    end

    A->>L: Generate Response

    loop Token Streaming
        L-->>A: Token Chunk
        A->>A: Sentence Detection
        A->>S: Complete Sentence
        S-->>T: Audio Frame
        T-->>U: Play Audio
    end

    Note over U,S: First audio plays before LLM completes
Loading

Latency Breakdown

┌─────────────────────────────────────────────────────────────────────────────┐
│                        LATENCY BUDGET (Target: <500ms)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Stage                          Target      Technique                       │
│  ───────────────────────────────────────────────────────────────────────   │
│  Audio Receive                  10ms        WebRTC low-latency              │
│  VAD Detection                  10ms        10ms frame processing           │
│  STT Streaming                  100ms       Partial results + prefetch      │
│  Turn Detection                 30ms        SmolLM2-135M semantic           │
│  RAG (speculative)              50ms        Prefetch on partial transcript  │
│  Reranking                      20ms        PABEE early-exit                │
│  LLM First Token                120ms       SLM race + speculative          │
│  TTS First Audio                60ms        Word-level streaming            │
│  Audio Send                     10ms        Low-latency buffer              │
│  ───────────────────────────────────────────────────────────────────────   │
│  TOTAL                          ~410ms      Typical end-to-end              │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         TIMELINE VIEW                                │   │
│  │                                                                      │   │
│  │  0ms   100ms   200ms   300ms   400ms   500ms                        │   │
│  │  │──────│──────│──────│──────│──────│                               │   │
│  │  │ VAD+STT     │ RAG+RANK │ LLM    │ TTS  │                         │   │
│  │  │             │          │        │      │                         │   │
│  │  │◄───────────►│◄────────►│◄──────►│◄────►│                         │   │
│  │  │    ~110ms   │  ~70ms   │ ~120ms │~60ms │                         │   │
│  │  │                                        │                         │   │
│  │  │                        First audio ────┘                         │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

📊 Performance Benchmarks

Latency Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    END-TO-END LATENCY COMPARISON (P95)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  System                      Latency        Hardware                        │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  VAAK (this project)    ~450ms         8-core CPU, 32GB RAM           │
│  OpenAI Realtime API         ~300ms         Cloud (with network)           │
│  Bland.ai                    ~800ms         Cloud                          │
│  Retell.ai                   ~700ms         Cloud                          │
│  Vapi                        ~1000ms        Cloud                          │
│  Custom Python Stack         ~1200ms        8-core CPU, 32GB RAM           │
│                                                                             │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  LATENCY BAR CHART                                                   │   │
│  │                                                                      │   │
│  │  VAAK        ████████████████                           450ms  │   │
│  │  OpenAI Realtime  ██████████                                 300ms  │   │
│  │  Bland.ai         ██████████████████████████                 800ms  │   │
│  │  Retell.ai        ███████████████████████                    700ms  │   │
│  │  Vapi             ████████████████████████████████          1000ms  │   │
│  │  Python Stack     ████████████████████████████████████████  1200ms  │   │
│  │                   │         │         │         │         │         │   │
│  │                   0       300       600       900      1200ms       │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Throughput & Resource Usage

Metric Value Configuration
Concurrent Sessions 100+ 8-core CPU, 32GB RAM
Requests/Second 50+ Per CPU core
Memory (Idle) ~2GB Base server
Memory (Loaded) ~8GB With all models
Binary Size ~150MB Release build
Startup Time ~5s Model loading

Quality Metrics

Metric VAAK Cloud Baseline
Hindi STT WER 8-12% 10-15%
Turn Detection F1 0.92 0.85
Intent Accuracy 94% 91%
Response Relevance 4.2/5 4.0/5

🚀 Deployment Models

Deployment Architecture Options

graph TB
    subgraph Single[Single Node Deployment]
        SN_APP[VAAK Binary]
        SN_QD[Qdrant]
        SN_SC[ScyllaDB]
        SN_OL[Ollama]

        SN_APP --> SN_QD
        SN_APP --> SN_SC
        SN_APP --> SN_OL
    end
Loading
graph TB
    subgraph Distributed[Distributed Deployment]
        LB[Load Balancer]
        APP1[VAAK-1]
        APP2[VAAK-2]
        APP3[VAAK-3]
        QD[Qdrant Cluster]
        SC[ScyllaDB Cluster]
        OL[Ollama Pool]

        LB --> APP1
        LB --> APP2
        LB --> APP3

        APP1 --> QD
        APP2 --> QD
        APP3 --> QD

        APP1 --> SC
        APP2 --> SC
        APP3 --> SC

        APP1 --> OL
        APP2 --> OL
        APP3 --> OL
    end
Loading

Hardware Requirements

┌─────────────────────────────────────────────────────────────────────────────┐
│                         HARDWARE REQUIREMENTS                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  TIER              CPU         RAM      STORAGE    CONCURRENT SESSIONS     │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Development       4 cores     16GB     50GB       5-10                     │
│  Production-S      8 cores     32GB     100GB      50-100                   │
│  Production-M      16 cores    64GB     200GB      200-500                  │
│  Production-L      32 cores    128GB    500GB      500-1000                 │
│                                                                             │
│  Notes:                                                                     │
│  • GPU optional but improves inference speed by 2-3x                        │
│  • SSD storage strongly recommended                                         │
│  • ScyllaDB requires additional nodes for high availability                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

⚙️ Configuration & Customization

Domain Configuration (YAML-Based)

VAAK is 100% configurable via YAML files. No code changes needed for:

  • New products/services
  • Different languages
  • Custom prompts
  • Business rules
  • Compliance requirements
# config/domain.yaml - Example Gold Loan Configuration

domain: gold_loan
version: "1.0.0"

# Product Configuration
product:
  variants:
    - id: standard
      name: "Gold Loan Standard"
      interest_rate_min: 9.5
      interest_rate_max: 11.5
      benefits:
        - "30-minute approval"
        - "Zero foreclosure charges"

    - id: shakti_gold
      name: "Shakti Gold (Women)"
      interest_rate_min: 9.0
      interest_rate_max: 11.0
      target_segment: [women]

# Agent Persona
prompts:
  system_prompt:
    agent_name: "Priya"
    company_name: "Kotak Mahindra Bank"
    instructions:
      - "Help customers understand gold loan options"
      - "Highlight competitive advantages"
      - "Capture leads for follow-up"
    compliance:
      - "Never guarantee loan approval"
      - "Disclose all applicable fees"

Knowledge Base Structure

knowledge/
├── manifest.yaml      # Index of all knowledge files
├── products.yaml      # Product information (EN + Hindi)
├── rates.yaml         # Interest rates, fees
├── objections.yaml    # Objection handling scripts
├── switching.yaml     # Balance transfer benefits
├── eligibility.yaml   # Eligibility criteria
├── segments.yaml      # Customer segment content
├── faqs.yaml          # Frequently asked questions
├── safety.yaml        # Safety, compliance info
├── process.yaml       # Application process steps
├── branches.yaml      # Branch locations
└── competitors.yaml   # Competitor comparison

👔 For Product Managers

Business Value Proposition

┌─────────────────────────────────────────────────────────────────────────────┐
│                         BUSINESS IMPACT SUMMARY                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  METRIC                    BEFORE            AFTER           IMPROVEMENT   │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  Average Handle Time       8 min             3 min           62.5% ↓       │
│  Cost per Conversation     ₹45               ₹8              82% ↓         │
│  Lead Qualification Rate   35%               65%             85% ↑         │
│  Customer Satisfaction     3.2/5             4.1/5           28% ↑         │
│  Language Coverage         3                 22              633% ↑        │
│  24/7 Availability         No                Yes             ∞             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Feature Matrix

Feature Status Description
Multi-Channel Voice, Text, Chat interfaces
22 Indian Languages Native support via AI4Bharat models
Dialogue State Tracking Track loan amount, purity, tenure
Lead Scoring Automatic lead qualification
Objection Handling Data-driven rebuttals
CRM Integration Lead capture, appointment scheduling
Compliance Checking RBI-compliant responses
Audit Logging SHA-256 merkle chain
A/B Testing 🔄 Experiment framework (planned)
Analytics Dashboard 🔄 Prometheus + Grafana

Conversation Flow

stateDiagram-v2
    [*] --> Greeting: Call Connected

    Greeting --> Exploration: Customer Responds

    Exploration --> ValueProposition: Need Identified
    Exploration --> Exploration: Gathering Info

    ValueProposition --> ObjectionHandling: Objection Raised
    ValueProposition --> Closing: Interest Shown

    ObjectionHandling --> ValueProposition: Objection Resolved
    ObjectionHandling --> FollowUp: Not Convinced

    Closing --> Converted: Appointment Booked
    Closing --> FollowUp: Needs Time

    Converted --> [*]: Lead Captured
    FollowUp --> [*]: Callback Scheduled
Loading

🏛️ For Software Architects

Design Principles

Principle Implementation
Trait-Based Abstraction All components implement core traits for swappability
Plugin Architecture Pluggable backends for STT, TTS, LLM, RAG
Async/Streaming First tokio + futures for real-time processing
Feature Gates Heavy deps (WebRTC, OTEL) are optional features
Configuration-Driven All behavior configurable via YAML/TOML
Error Propagation Consistent Result<T, Error> with From impls
Event-Driven Broadcast channels for component communication

Trait Hierarchy

// Core traits that all components implement

pub trait SpeechToText: Send + Sync + 'static {
    async fn transcribe(&self, audio: &AudioFrame) -> Result<TranscriptFrame>;
    fn transcribe_stream(&self, audio: impl Stream<Item = AudioFrame>)
        -> impl Stream<Item = Result<TranscriptFrame>>;
}

pub trait TextToSpeech: Send + Sync + 'static {
    async fn synthesize(&self, text: &str, config: &VoiceConfig) -> Result<AudioFrame>;
    fn synthesize_stream(&self, text: impl Stream<Item = String>)
        -> impl Stream<Item = Result<AudioFrame>>;
}

pub trait LanguageModel: Send + Sync + 'static {
    async fn generate(&self, request: GenerateRequest) -> Result<GenerateResponse>;
    fn generate_stream(&self, request: GenerateRequest)
        -> impl Stream<Item = Result<StreamChunk>>;
}

pub trait Retriever: Send + Sync + 'static {
    async fn retrieve(&self, query: &str, options: &RetrieveOptions) -> Result<Vec<Document>>;
    async fn retrieve_agentic(&self, query: &str, context: &QueryContext, max_iter: usize)
        -> Result<Vec<Document>>;
}

Memory Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                      HIERARCHICAL MEMORY SYSTEM                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                         CORE MEMORY                                   │  │
│  │  • Current turn context                                               │  │
│  │  • Active slot values (loan amount, purity, phone)                   │  │
│  │  • Immediate conversation state                                       │  │
│  │  └── Token Budget: 500 tokens                                        │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                    │                                        │
│                                    ▼                                        │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                         RECALL MEMORY                                 │  │
│  │  • Recent conversation turns (last 8)                                │  │
│  │  • Numbered memory notes with IDs                                    │  │
│  │  • Working memory for multi-turn reasoning                           │  │
│  │  └── Token Budget: 1500 tokens                                       │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                    │                                        │
│                                    ▼                                        │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                        ARCHIVAL MEMORY                                │  │
│  │  • Compressed conversation summaries                                  │  │
│  │  • Customer profile and preferences                                   │  │
│  │  • Long-term facts and decisions                                      │  │
│  │  └── Token Budget: 500 tokens (summarized)                           │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

RAG Pipeline

graph LR
    subgraph QueryProc[Query Processing]
        Q[User Query] --> QE[Query Expansion]
        QE --> CL[Normalize]
    end

    subgraph Retrieval[Parallel Retrieval]
        CL --> DS[Dense Search]
        CL --> SS[Sparse Search]
    end

    subgraph Ranking[Fusion and Ranking]
        DS --> RRF[RRF Fusion]
        SS --> RRF
        RRF --> RR[Reranker]
        RR --> EE{Exit}
        EE -->|Yes| OUT[Results]
        EE -->|No| RR
    end

    subgraph Context[Context Management]
        OUT --> CS[Sizing]
        CS --> CC[Compress]
        CC --> LLM[LLM]
    end
Loading

📚 Research & References

Academic Papers

Paper Contribution Link
Full-Duplex Dialogue Survey Architecture taxonomy for voice agents arXiv:2509.14515
Turnsense Semantic turn detection with small LMs ACL 2024
MagicNet Low-latency VAD with 10ms frames ICASSP 2023
PABEE Early-exit inference for transformers EMNLP 2020
EAGLE Speculative decoding for LLMs ICML 2024
IndicConformer Indian language ASR INTERSPEECH 2023
IndicTrans2 Multilingual translation for Indian languages TACL 2023

Open Source Tools

Tool Purpose License Link
ONNX Runtime Cross-platform ML inference MIT ort
Candle Pure Rust neural networks Apache-2.0 candle
Qdrant Vector search engine Apache-2.0 qdrant
Tantivy Full-text search engine MIT tantivy
ScyllaDB High-performance database AGPL-3.0 scylladb
Tokio Async runtime MIT tokio
Axum Web framework MIT axum
Silero VAD Voice activity detection MIT silero-vad
sherpa-onnx Speech toolkit Apache-2.0 sherpa-onnx
AI4Bharat Indian language models CC-BY-4.0 ai4bharat
Ollama Local LLM serving MIT ollama

Industry References

Source Insight Link
Cresta Engineering Voice agent latency optimization Blog
Deepgram Research 16% satisfaction drop per second of latency Research
Red Hat Developer Rust vs Python for agentic AI Article
LiveKit Realtime 195ms full-duplex voice agents Docs
Letta Memory Hierarchical memory for agents Blog

📁 Project Structure

goldloan-study/
├── voice-agent/                    # Main Voice Agent Application
│   ├── backend/                    # Rust Backend (11 crates)
│   │   ├── crates/
│   │   │   ├── core/               # Foundation: traits, types, 22 languages
│   │   │   ├── config/             # Configuration: YAML loading, domain config
│   │   │   ├── pipeline/           # Audio: VAD, STT, TTS, turn detection
│   │   │   ├── agent/              # Orchestration: DST, memory, lead scoring
│   │   │   ├── rag/                # Retrieval: hybrid search, reranking
│   │   │   ├── llm/                # LLM: multi-provider, speculative decoding
│   │   │   ├── text_processing/    # NLP: grammar, translation, PII
│   │   │   ├── tools/              # MCP: tool definitions, execution
│   │   │   ├── transport/          # Transport: WebRTC, WebSocket
│   │   │   ├── persistence/        # Storage: ScyllaDB, audit logs
│   │   │   └── server/             # HTTP: Axum server, endpoints
│   │   ├── config/                 # Configuration files
│   │   │   ├── default.yaml        # Default settings
│   │   │   ├── domain.yaml         # Domain-specific config
│   │   │   └── production.yaml     # Production overrides
│   │   ├── knowledge/              # RAG Knowledge Base
│   │   │   ├── manifest.yaml       # Knowledge index
│   │   │   ├── products.yaml       # Product information
│   │   │   ├── rates.yaml          # Interest rates
│   │   │   └── ...                 # Other knowledge files
│   │   └── models/                 # ONNX Model Files
│   │       ├── vad/                # Silero VAD
│   │       ├── stt/                # IndicConformer
│   │       ├── tts/                # IndicF5, Piper
│   │       ├── embeddings/         # E5-Multilingual
│   │       └── reranker/           # BGE-Reranker
│   ├── frontend/                   # Web Frontend (React/Vite)
│   └── scripts/                    # Build & Dev Scripts
│
├── docs/                           # Architecture Documentation
│   ├── ARCHITECTURE_v2.md          # High-level design
│   ├── INDEX.md                    # Documentation index
│   ├── rust-ecosystem.md           # Library decisions
│   ├── architecture/               # Component documentation
│   │   ├── agent/                  # Agent framework docs
│   │   ├── pipeline/               # Pipeline documentation
│   │   ├── rag/                    # RAG strategy docs
│   │   ├── personalization/        # Personalization docs
│   │   └── interfaces/             # Interface documentation
│   ├── deployment/                 # Deployment guides
│   ├── experiments/                # Experiment logs
│   └── troubleshooting/            # Issue reports & fixes
│       ├── DEEP_DIVE_REPORT.md
│       ├── FIX_PLAN.md
│       └── ISSUES_SUMMARY.md
│
├── vendor/                         # External Dependencies (Submodules)
│   ├── floneum/                    # Kalosm ML Framework
│   ├── ort/                        # ONNX Runtime Rust Bindings
│   └── turn-rs/                    # TURN Server for WebRTC
│
├── case-studies/                   # Domain-Specific Research
│   └── kotak-gold-loan/            # Kotak Gold Loan Case Study
│       ├── research_docs/          # Market research
│       ├── latex-presentation/     # PDF presentation
│       └── slides/                 # Presentation slides
│
├── assets/                         # Branding & Media
│   ├── vaak-logo.svg               # Project logo
│   └── demo.mp4                    # Demo video
│
└── data/                           # Runtime Data
    ├── storage/                    # Qdrant vector store
    └── snapshots/                  # Data snapshots

🚀 Getting Started

Prerequisites

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable

# Install ONNX Runtime (for CPU inference)
# Linux
wget https://github.com/microsoft/onnxruntime/releases/download/v1.16.3/onnxruntime-linux-x64-1.16.3.tgz
tar -xzf onnxruntime-linux-x64-1.16.3.tgz
export ORT_LIB_LOCATION=$PWD/onnxruntime-linux-x64-1.16.3

# Install Qdrant
docker run -p 6333:6333 qdrant/qdrant

# Install Ollama (for local LLM)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b

Build & Run

cd voice-agent/backend

# Build
./scripts/build-backend.sh

# Or manually:
export LIBRARY_PATH="$PWD/onnxruntime/lib"
export ORT_LIB_LOCATION="$PWD/onnxruntime"
cargo build --release

# Run
./target/release/voice-agent-server

# Or with environment:
RUST_LOG=info cargo run --release

Configuration

# Copy default config
cp config/default.yaml config/local.yaml

# Edit for your environment
# - Set Qdrant endpoint
# - Set Ollama endpoint
# - Configure model paths

# Start with custom config
VOICE_AGENT__CONFIG_PATH=config/local.yaml ./target/release/voice-agent-server

🙏 Acknowledgments

AI4Bharat Team

For IndicConformer, IndicTrans2, and other Indian language AI models that make this project possible.

Open Source Community

  • Hugging Face - Candle framework and model hub
  • Qdrant Team - Vector search engine
  • Tokio Team - Async Rust runtime
  • k2-fsa - sherpa-onnx speech toolkit

Research Institutions

  • IIT Madras - AI4Bharat initiative
  • Microsoft Research - ONNX Runtime

📄 License

This project is proprietary software. See LICENSE for details.


Built with ❤️ in Rust for the Indian Enterprise

VAAK - Where Speech Becomes Intelligence

About

VAAK (वाक्) · Voice-Activated Autonomous-Knowledge-System · Domain-Agnostic Voice / Text / Chat Agent Built with Rust · Runs on CPU · No Cloud Dependencies · Scalable · Memory-Safe · Sovereign · Low-Cost · Real-Time · Deterministic · Compliant

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •