Namo Turn Detector Plugin for LiveKit Agents

Turn detection plugin for LiveKit Agents using Namo Turn Detector models.

Installation

pip install livekit-plugins-namo-turn-detector

Features

  • Single-Language Models: Memory-efficient models for Vietnamese, English, Chinese (NEW ✨)
  • Multilingual Support: 23 languages with a unified multilingual model
  • High Accuracy: Language-specific models outperform baseline models
  • Fast & Efficient: Optimized inference with 66% less memory for single-language apps
  • Async API: Built on LiveKit's inference runner for optimal performance
  • Easy Integration: Drop-in replacement for existing turn detectors

Quick Start

🎯 Single-Language Models (Recommended for Production)

Most memory-efficient option - loads only one language model (~200MB):

Vietnamese Only

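from livekit import agents
from livekit.plugins import namo_turn_detector

async def entrypoint(ctx: agents.JobContext):
    # Load only the Vietnamese model (~200MB)
    model = namo_turn_detector.vi_model.VietnameseModel(threshold=0.7)
    # chat_ctx is the conversation's chat context, taken from your agent session (not shown here)
    prob = await model.predict_end_of_turn(chat_ctx)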

English Only

from livekit import agents
from livekit.plugins import namo_turn_detector

async def entrypoint(ctx: agents.JobContext):
    # Load only the English model (~200MB)
    model = namo_turn_detector.en_model.EnglishModel(threshold=0.7)
    prob = await model.predict_end_of_turn(chat_ctx)

Chinese Only

from livekit import agents
from livekit.plugins import namo_turn_detector

async def entrypoint(ctx: agents.JobContext):
    # Load only the Chinese model (~200MB)
    model = namo_turn_detector.zh_model.ChineseModel(threshold=0.7)
    prob = await model.predict_end_of_turn(chat_ctx)

Benefits:

  • 66% less memory than LanguageSpecificModel (~200MB vs ~600MB)
  • 3x faster initialization
  • Highest accuracy for the language
  • ✅ Best for single-language production apps
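
The 66% figure follows from the two approximate memory numbers quoted in this README. The short calculation below reproduces it; the constant and function names are illustrative only and are not part of the plugin.

```python
# Approximate figures quoted in this README, in megabytes.
SINGLE_LANGUAGE_MB = 200     # one single-language model
LANGUAGE_SPECIFIC_MB = 600   # LanguageSpecificModel loads en, vi, and zh


def memory_saving_percent(smaller_mb: float, larger_mb: float) -> float:
    """Return how much less memory `smaller_mb` uses than `larger_mb`, in percent."""
    return (larger_mb - smaller_mb) / larger_mb * 100


saving = memory_saving_percent(SINGLE_LANGUAGE_MB, LANGUAGE_SPECIFIC_MB)
print(f"{saving:.1f}% less memory")
```

Both inputs are rounded estimates, so the result is a ballpark rather than a measured value.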

Multi-Language Model (EN/VI/ZH Switching)

Use when you need to switch between English, Vietnamese, or Chinese:

from livekit import agents
from livekit.plugins.namo_turn_detector.language_specific import LanguageSpecificModel

async def entrypoint(ctx: agents.JobContext):
    # Loads all 3 models (en, vi, zh) - ~600MB
    model = LanguageSpecificModel(language="vi", threshold=0.7)
    prob = await model.predict_end_of_turn(chat_ctx)

Multilingual Model (23 Languages)

Use when you need support for many languages:

from livekit import agents
from livekit.plugins.namo_turn_detector.multilingual import MultilingualModel

async def entrypoint(ctx: agents.JobContext):
    # Loads the single multilingual model
    model = MultilingualModel(threshold=0.7)
    prob = await model.predict_end_of_turn(chat_ctx)
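
Every example above ends with a probability, and the calling code still has to turn it into a decision. The sketch below shows one way to do that, under stated assumptions: StubModel is a stand-in so the snippet runs without the plugin installed, is_end_of_turn and its deadline parameter are hypothetical names, and the ">=" comparison against the threshold is an assumption rather than documented plugin behavior.

```python
import asyncio


class StubModel:
    """Stand-in for a turn detector so this sketch runs alone; NOT part of the plugin."""

    def __init__(self, prob: float, delay: float = 0.0) -> None:
        self._prob = prob
        self._delay = delay

    async def predict_end_of_turn(self, chat_ctx, timeout: float = 10.0) -> float:
        await asyncio.sleep(self._delay)  # simulate inference latency
        return self._prob


async def is_end_of_turn(model, chat_ctx, threshold: float = 0.7, deadline: float = 1.0) -> bool:
    """Hypothetical helper: True when the predicted probability reaches `threshold`.

    If no prediction arrives within `deadline` seconds, assume the user is still speaking.
    """
    try:
        prob = await asyncio.wait_for(model.predict_end_of_turn(chat_ctx), timeout=deadline)
    except asyncio.TimeoutError:
        return False
    return prob >= threshold


print(asyncio.run(is_end_of_turn(StubModel(0.92), chat_ctx=None)))
print(asyncio.run(is_end_of_turn(StubModel(0.30), chat_ctx=None)))
```

Treating a timeout as "not finished" keeps the agent from interrupting the user when inference is slow; an application that prefers responsiveness could return True instead.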

Benchmark Results

Comparison across English, Vietnamese, and Chinese:

English Performance

Sample: "Hello, how are you?"
  • Namo Multilingual:     0.8757 (16ms) - EOT: True
  • Namo English-Specific: 0.0002 (13ms) - EOT: False
  • LiveKit Multilingual:  0.2838 (33ms) - EOT: True
  • LiveKit English:       0.4596 (4ms)  - EOT: True

Sample: "What's the weather like today?"
  • Namo Multilingual:     0.8032 (15ms) - EOT: True
  • Namo English-Specific: 0.9999 (9ms)  - EOT: True ⭐
  • LiveKit Multilingual:  0.7799 (27ms) - EOT: True
  • LiveKit English:       0.9409 (3ms)  - EOT: True

Vietnamese Performance

Sample: "Xin chào, bạn khỏe không?" (Hello, how are you?)
  • Namo Multilingual:        0.8651 (25ms) - EOT: True
  • Namo Vietnamese-Specific: 0.9857 (36ms) - EOT: True ⭐
  • LiveKit Multilingual:     0.0322 (20ms) - EOT: False

Sample: "Thời tiết hôm nay thế nào?" (What's the weather today?)
  • Namo Multilingual:        0.5168 (27ms) - EOT: False
  • Namo Vietnamese-Specific: 0.9952 (4ms)  - EOT: True ⭐
  • LiveKit Multilingual:     0.2988 (22ms) - EOT: False

Sample: "Vay ở đâu" (Where to borrow) - Incomplete phrase
  • Namo Multilingual:        0.6599 (20ms) - EOT: False
  • Namo Vietnamese-Specific: 0.9875 (10ms) - EOT: True ⭐
  • LiveKit Multilingual:     0.5106 (25ms) - EOT: False

Chinese Performance

Sample: "你好,你好吗?" (Hello, how are you?)
  • Namo Multilingual:     0.6525 (30ms) - EOT: False
  • Namo Chinese-Specific: 0.8777 (16ms) - EOT: True ⭐
  • LiveKit Multilingual:  0.8520 (20ms) - EOT: True

Sample: "今天天气怎么样?" (What's the weather today?)
  • Namo Multilingual:     0.6818 (18ms) - EOT: False
  • Namo Chinese-Specific: 0.9090 (34ms) - EOT: True ⭐
  • LiveKit Multilingual:  0.9707 (20ms) - EOT: True

Key Insights:

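  • Language-specific Namo models score higher than Namo Multilingual on their target language in every sample except "Hello, how are you?", where the English model returns 0.0002
  • Namo Multilingual provides consistent performance across all languages
  • Inference is fast: Namo predictions took 4-36ms in these samples, most of them between 10 and 30ms
  • The Vietnamese model scores 0.98 or higher on all three Vietnamese samples, while the LiveKit multilingual baseline stays below 0.52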
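
As a quick consistency check, the Namo rows above can be replayed against the 0.7 threshold used in the Quick Start examples. This is only a sketch over the numbers printed in this README: the ">=" comparison is an assumption, and the names NAMO_ROWS and decide are illustrative.

```python
from statistics import mean

NAMO_THRESHOLD = 0.7  # threshold used in the Quick Start examples

# (model, probability, latency in ms, EOT flag as reported above)
NAMO_ROWS = [
    ("Namo Multilingual", 0.8757, 16, True),
    ("Namo English-Specific", 0.0002, 13, False),
    ("Namo Multilingual", 0.8032, 15, True),
    ("Namo English-Specific", 0.9999, 9, True),
    ("Namo Multilingual", 0.8651, 25, True),
    ("Namo Vietnamese-Specific", 0.9857, 36, True),
    ("Namo Multilingual", 0.5168, 27, False),
    ("Namo Vietnamese-Specific", 0.9952, 4, True),
    ("Namo Multilingual", 0.6599, 20, False),
    ("Namo Vietnamese-Specific", 0.9875, 10, True),
    ("Namo Multilingual", 0.6525, 30, False),
    ("Namo Chinese-Specific", 0.8777, 16, True),
    ("Namo Multilingual", 0.6818, 18, False),
    ("Namo Chinese-Specific", 0.9090, 34, True),
]


def decide(prob: float, threshold: float = NAMO_THRESHOLD) -> bool:
    """Assumed decision rule: end of turn when the probability reaches the threshold."""
    return prob >= threshold


# Rows whose reported EOT flag disagrees with the assumed rule.
mismatches = [row for row in NAMO_ROWS if decide(row[1]) != row[3]]
latencies = [row[2] for row in NAMO_ROWS]

print("rows disagreeing with a 0.7 cutoff:", mismatches)
print("latency min/mean/max (ms):", min(latencies), mean(latencies), max(latencies))
```

The LiveKit rows are left out because their EOT flags do not follow a single cutoff (0.2838 is reported as True for English, 0.5106 as False for Vietnamese), which suggests those baselines apply their own per-language thresholds.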

API Reference

Single-Language Models (NEW ✨)

VietnameseModel

from livekit.plugins import namo_turn_detector

model = namo_turn_detector.vi_model.VietnameseModel(threshold=0.7)  # threshold: float, default 0.7

EnglishModel

from livekit.plugins import namo_turn_detector

model = namo_turn_detector.en_model.EnglishModel(threshold=0.7)  # threshold: float, default 0.7

ChineseModel

from livekit.plugins import namo_turn_detector

model = namo_turn_detector.zh_model.ChineseModel(threshold=0.7)  # threshold: float, default 0.7

Parameters:

  • threshold: Detection threshold (0.0-1.0), default 0.7

Properties:

  • language - Language code ("vi", "en", or "zh")
  • model - Model name (e.g., "namo-vi")
  • threshold - Current detection threshold

Methods:

  • predict_end_of_turn(chat_ctx, timeout=10.0) -> float - Returns probability (0.0-1.0)
  • unlikely_threshold(language) -> float - Get model's threshold for language

Memory Usage: ~200MB per model (loads only one language)


LanguageSpecificModel

LanguageSpecificModel(language: str, threshold: float = 0.7)

Parameters:

  • language: Language code ("en", "vi", "zh")
  • threshold: Detection threshold (0.0-1.0)

Methods:

  • predict_end_of_turn(chat_ctx, timeout=10.0) -> float - Returns probability (0.0-1.0)
  • unlikely_threshold(language) -> float - Get model's threshold for language

Memory Usage: ~600MB (loads all 3 models: en, vi, zh)


MultilingualModel

MultilingualModel(threshold: float = 0.7)

Methods:

  • predict_end_of_turn(chat_ctx, timeout=10.0) -> float - Returns probability (0.0-1.0)
  • unlikely_threshold(language) -> float - Get model's threshold for language

Memory Usage: ~400MB (single multilingual model for 23 languages)

Pre-download Models

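Run the download-files command of your agent script (main.py in this example) to fetch the model files ahead of time:

python main.py download-files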

Model Comparison

Choose the right model for your use case:

| Model | Languages | Memory | Init Speed | Accuracy | Best For |
|---|---|---|---|---|---|
| VietnameseModel | Vietnamese | ~200MB | ⚡⚡⚡ Fast | ⭐⭐⭐ Highest | Vietnamese-only apps |
| EnglishModel | English | ~200MB | ⚡⚡⚡ Fast | ⭐⭐⭐ Highest | English-only apps |
| ChineseModel | Chinese | ~200MB | ⚡⚡⚡ Fast | ⭐⭐⭐ Highest | Chinese-only apps |
| LanguageSpecificModel | EN, VI, ZH | ~600MB | ⚡ Slow | ⭐⭐⭐ High | Multi-lang apps (3 langs) |
| MultilingualModel | 23 languages | ~400MB | ⚡⚡ Medium | ⭐⭐ Good | Global apps (many langs) |

Recommendation: Use single-language models (VietnameseModel, EnglishModel, ChineseModel) for production apps serving one language. Compared with LanguageSpecificModel, they use about 66% less memory (~200MB vs ~600MB) and initialize about 3x faster.
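
The selection logic in the comparison above can be written down as a small helper. This is a sketch, not part of the plugin: recommend_model is a hypothetical name, it returns class names as strings so it runs without the plugin installed, and it does not check whether a language outside en/vi/zh is actually among the 23 covered by MultilingualModel.

```python
# Language codes with a dedicated single-language model, per this README.
SINGLE_LANGUAGE_MODELS = {
    "vi": "VietnameseModel",
    "en": "EnglishModel",
    "zh": "ChineseModel",
}


def recommend_model(languages: set[str]) -> str:
    """Return the name of the model class suggested for a set of language codes."""
    if not languages:
        raise ValueError("at least one language code is required")
    if len(languages) == 1:
        (code,) = languages
        if code in SINGLE_LANGUAGE_MODELS:
            return SINGLE_LANGUAGE_MODELS[code]  # ~200MB, fastest to initialize
    if languages <= set(SINGLE_LANGUAGE_MODELS):
        return "LanguageSpecificModel"  # ~600MB, switches between en, vi, and zh
    return "MultilingualModel"  # ~400MB, one model for many languages


print(recommend_model({"vi"}))
print(recommend_model({"en", "zh"}))
print(recommend_model({"en", "fr"}))
```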


Supported Languages

  • Single-Language Models: Vietnamese (vi), English (en), Chinese (zh)

  • Multi-Language Model (LanguageSpecificModel): English (en), Vietnamese (vi), Chinese (zh)

  • Multilingual Model (23 languages): Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Marathi, Norwegian, Polish, Portuguese, Russian, Spanish, Turkish, Ukrainian, Vietnamese

License

Apache-2.0

Credits

Citation

@software{namo2025,
  title = {Namo Turn Detector v1: Semantic Turn Detection for Conversational AI},
  author = {VideoSDK Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/collections/videosdk-live/namo-turn-detector-v1-68d52c0564d2164e9d17ca97}
}