@tanmaypawar-noise
No description provided.

Jagatheeswaran Senthilvelan and others added 14 commits September 22, 2025 17:34
- Enhanced getFirstSample() to specifically look for samples_jfk.wav first
- Added complete directory cleanup in copyAssets() to prevent stale files
- Improved file selection priority: samples_jfk.wav > any jfk file > first wav
- Added comprehensive logging for file selection and copying process
- Ensures transcribe sample always uses the correct JFK speech file
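The selection priority described above (samples_jfk.wav, then any JFK file, then the first WAV) can be sketched as follows. This is a minimal Python illustration, not the Kotlin code of `getFirstSample()`; the function name `pick_sample` and the case-insensitive "jfk" match are assumptions.

```python
from typing import Optional, Sequence


def pick_sample(filenames: Sequence[str]) -> Optional[str]:
    """Pick a sample file: samples_jfk.wav > any jfk file > first wav."""
    # Only WAV files are candidates (extension check is an assumption).
    wavs = [name for name in filenames if name.lower().endswith(".wav")]

    # 1. Exact preferred file.
    if "samples_jfk.wav" in wavs:
        return "samples_jfk.wav"

    # 2. Any file whose name mentions "jfk".
    for name in wavs:
        if "jfk" in name.lower():
            return name

    # 3. Fall back to the first WAV, or None when there is none.
    return wavs[0] if wavs else None
```

Returning `None` when no WAV exists leaves the decision about an empty assets directory to the caller.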
…tion

✨ Features Added:
- Real-time audio streaming with optimized chunking (500ms intervals)
- Automatic WAV file storage with timestamps in Downloads/WhisperRecordings
- CSV logging of all transcriptions with timestamps and filenames
- Prompt-based transcription for improved voice command accuracy
- Transcription timing display in UI
- Optimized performance for 2.5-3 second transcription latency
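As a rough illustration of the 500 ms chunking mentioned above, the sketch below splits a buffer of PCM samples into fixed-duration chunks. It is written in Python rather than the app's Kotlin, and the 16 kHz mono sample rate and the name `chunk_samples` are assumptions, not details taken from the commit.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumed: 16 kHz mono input
CHUNK_MS = 500           # chunk interval stated in the commit message


def chunk_samples(
    samples: Sequence[int],
    sample_rate: int = SAMPLE_RATE_HZ,
    chunk_ms: int = CHUNK_MS,
) -> Iterator[List[int]]:
    """Yield consecutive chunks of chunk_ms milliseconds; the last may be shorter."""
    chunk_len = sample_rate * chunk_ms // 1000  # samples per chunk
    for start in range(0, len(samples), chunk_len):
        yield list(samples[start:start + chunk_len])
```

With these defaults each chunk holds 8000 samples, so a 1.25 s buffer yields two full chunks and one partial chunk.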

🚀 Performance Optimizations:
- JNI parameters: Greedy sampling, optimized context (1024), max length (150)
- Audio processing: 8-second max capture, adaptive silence detection
- Threading: Limited to 4 threads for optimal mobile performance
- Smart segmentation: Single segment for short commands, multiple segments for longer ones
- Non-speech token suppression for cleaner command recognition

🎯 Key Improvements:
- Short commands (1-2s): ~2.5-3s transcription time
- Long commands (3-8s): Complete capture without truncation
- Enhanced accuracy with context-aware processing
- Robust file storage and permission handling
- Sample file selection fixes and UI improvements
…sification

✨ Added complete slot extraction system with 14+ slot types
🎯 Enhanced intent classification with TensorFlow Lite models
🏷️ Comprehensive NLU pipeline with intent + slot extraction

Key Features:
- SlotExtractor.kt: Complete slot extraction with pattern matching
- Enhanced IntentClassifier.kt: Integrated slot extraction pipeline
- Updated UI: Beautiful slot visualization with confidence scoring
- 14+ Slot Types: metric, time_ref, unit, qualifier, threshold, target, value, feature, state, action, tool, activity_type, app, contact, location, attribute, type, period, event_type
- Pattern Recognition: Advanced regex patterns with synonym support
- Contextual Inference: Smart slot extraction based on domain knowledge
- Intent-Specific Templates: Required slots per intent type

Technical Implementation:
- TensorFlow Lite 2.12.0: Compatible models for Android deployment
- Working Models: lightweight_sentence_encoder.tflite (657KB), intent_classifier.tflite (21KB)
- Enhanced UI: Tabbed interface with comprehensive slot display
- Performance: 2.5-3s transcription + real-time slot extraction

Example Commands with Slots:
- 'How many steps today?' → QueryPoint {metric: steps, time_ref: today}
- 'Set goal to 10000 steps' → SetGoal {metric: steps, target: 10000, unit: count}
- 'Turn on do not disturb' → ToggleFeature {feature: do not disturb, state: on}

Complete NLU pipeline now matches Python implementation capabilities! 🚀
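The example commands above can be reproduced with a small pattern-matching sketch. This is a Python illustration under stated assumptions, not the contents of SlotExtractor.kt: the patterns, the tiny metric vocabulary, and the rule that a step goal implies the unit "count" are all hypothetical and chosen only to cover the three examples. Patterns are compiled once at module load, which is the usual way to avoid recompiling a regex on every call.

```python
import re
from typing import Dict

# Hypothetical patterns, compiled once at import time.
_METRIC = re.compile(r"\b(steps|calories|distance)\b", re.IGNORECASE)
_TIME_REF = re.compile(r"\b(today|yesterday|tomorrow)\b", re.IGNORECASE)
_TARGET = re.compile(r"\bgoal to (\d+)\b", re.IGNORECASE)
_FEATURE = re.compile(r"\bturn (on|off) (.+)$", re.IGNORECASE)


def extract_slots(text: str) -> Dict[str, str]:
    """Extract a few slot types from one utterance using regex patterns."""
    slots: Dict[str, str] = {}
    cleaned = text.strip().rstrip("?.!")  # drop trailing punctuation

    match = _METRIC.search(cleaned)
    if match:
        slots["metric"] = match.group(1).lower()

    match = _TIME_REF.search(cleaned)
    if match:
        slots["time_ref"] = match.group(1).lower()

    match = _TARGET.search(cleaned)
    if match:
        slots["target"] = match.group(1)
        # Assumed contextual inference: a step goal is a plain count.
        if slots.get("metric") == "steps":
            slots["unit"] = "count"

    match = _FEATURE.search(cleaned)
    if match:
        slots["state"] = match.group(1).lower()
        slots["feature"] = match.group(2).lower()

    return slots
```

Slot values are kept as strings here; converting `target` to a number would be a separate design decision.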
… implementation

✨ Complete iOS implementation with SwiftUI and TensorFlow Lite
🎯 Feature parity with Android implementation
🍎 Native iOS optimizations with async/await

Key Features:
- IntentClassifier.swift: TensorFlow Lite inference engine with async/await
- SlotExtractor.swift: Complete slot extraction with 14+ slot types
- IntentTestView.swift: Beautiful SwiftUI interface for testing
- IntentModels.swift: Core data structures and error types
- Updated ContentView.swift: TabView integration

Technical Implementation:
- TensorFlow Lite Swift integration (2.12.0 compatible)
- Same models as Android: intent_classifier.tflite + lightweight_sentence_encoder.tflite
- Modern Swift patterns: async/await, ObservableObject, Result types
- SwiftUI optimizations: LazyVGrid, progressive disclosure, reactive state
- Comprehensive error handling and logging with os.log

Slot Extraction Capabilities:
- 14+ Slot Types: metric, time_ref, unit, qualifier, threshold, target, value, feature, state, action, tool, activity_type, app, contact, location, attribute, type, period, event_type
- Pattern Matching: Advanced Swift regex patterns with synonym support
- Contextual Inference: Smart extraction based on domain knowledge
- Intent-Specific Templates: Required slots per intent type
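One way to read "required slots per intent type" is a lookup table that is checked after extraction. The sketch below is in Python, not the Swift or Kotlin of the commits, and the template contents are hypothetical: the log does not list which slots each intent requires, so the entries are guessed from the example commands.

```python
from typing import Dict, List, Tuple

# Hypothetical templates; the real required-slot lists are not given in the log.
REQUIRED_SLOTS: Dict[str, Tuple[str, ...]] = {
    "QueryPoint": ("metric", "time_ref"),
    "SetGoal": ("metric", "target"),
    "ToggleFeature": ("feature", "state"),
}


def missing_slots(intent: str, slots: Dict[str, str]) -> List[str]:
    """Return the required slot names that were not extracted for this intent."""
    required = REQUIRED_SLOTS.get(intent, ())  # unknown intents require nothing
    return [name for name in required if name not in slots]
```

A non-empty result could drive either a follow-up question or a contextual default, depending on the intent.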

iOS-Specific Features:
- Native SwiftUI interface with color-coded sections
- Tabbed integration with existing Whisper app
- Async processing for smooth UI performance
- Bundle resource management for model files
- Memory management with automatic cleanup

Example Usage:
- 'How many steps today?' → QueryPoint {metric: steps, time_ref: today}
- 'Set goal to 10000 steps' → SetGoal {metric: steps, target: 10000, unit: count}
- 'Turn on do not disturb' → ToggleFeature {feature: do not disturb, state: on}

Documentation:
- Comprehensive README.md with setup instructions
- QUICK_SETUP.md for 5-minute installation
- API documentation and usage examples
- Troubleshooting guide and performance tips

Complete NLU pipeline now available for iOS with full Android feature parity! 🚀🍎
…ion accuracy

- Implemented Rust-based HuggingFace tokenizer using proper BERT WordPiece tokenization
- Added HFTokenizer.kt Kotlin wrapper for seamless Android integration
- Updated IntentClassifier.kt to use HF tokenizer instead of basic word tokenization
- Built native libraries for arm64-v8a and x86_64 Android architectures
- Added comprehensive documentation and build scripts
- Resolves tokenization accuracy issues by matching the tokenization used during model training
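For readers unfamiliar with WordPiece, the greedy longest-match-first rule used by BERT-style tokenizers can be sketched in a few lines. This Python version only illustrates the algorithm for a single word; it is not the Rust tokenizer or the HFTokenizer.kt wrapper from this commit, and any vocabulary passed in is a toy stand-in for the model's real vocabulary file.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Split one word into subword pieces by greedy longest-match-first."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces
```

With a toy vocabulary containing "play" and "##ing", the word "playing" splits into those two pieces, while a word with no matching pieces maps to the unknown token.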

Key improvements:
- Proper subword tokenization (e.g. 'playing' -> ['play', '##ing'])
- Better out-of-vocabulary word handling
- Significant accuracy improvements in intent classification
- Industry-standard tokenization approach
- Merged voice recognition and intent classification into unified workflow
- Added intent classification UI with confidence display and slot visualization
- Optimized audio recording pipeline for better performance
- Improved SlotExtractor with pre-compiled regex patterns for 10-100x performance boost
- Added default case handling for irrelevant input ('Sorry, please say again')
- Enhanced UI: centered prominent Start button, increased font size for accessibility
- Updated app branding to 'Noise AI ASR and Intent Demo'
- Renamed tab: 'Whisper' → 'ASR & Intent'; 'Intent Test' tab kept unchanged
- Removed benchmark and transcribe sample buttons for cleaner interface
- Added CSV logging with intent classification results
- Updated app branding: 'WhisperCppDemo' → 'Noise AI ASR and Intent Demo'
- Renamed main tab: 'Whisper' → 'ASR & Intent'
- Enhanced Start button: centered, 300dp width, 56dp height, larger font
- Removed benchmark and transcribe sample buttons for cleaner UI
- Improved contact extraction: now extracts any name after phone action keywords
- Added comprehensive intent classification UI with confidence display
- Implemented CSV logging with timestamp, audio filename, transcription, and intent
- Optimized SlotExtractor with pre-compiled regex patterns for 10-100x performance boost
- Added default case handling for irrelevant input ('Sorry, please say again')
- Enhanced accessibility with larger touch targets and better visual hierarchy
- Added SlotExtractor integration to extract slots from transcriptions
- Updated CSV headers to include 'slots' column
- Modified saveToCsv to include slots data as JSON string
- Slots are extracted after intent classification and saved alongside timestamp, audio file, transcription, and intent
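A CSV row that carries the slots as a JSON string might look like the sketch below. It is a Python illustration of the idea, not the `saveToCsv` implementation; apart from 'slots', the column names and the function name `format_row` are assumptions based on the fields listed above.

```python
import csv
import io
import json
from typing import Dict

# Assumed column names; only the 'slots' column name appears in the commit log.
CSV_HEADER = ["timestamp", "audio_file", "transcription", "intent", "slots"]


def format_row(
    timestamp: str,
    audio_file: str,
    transcription: str,
    intent: str,
    slots: Dict[str, str],
) -> str:
    """Serialize one result as a CSV line, with slots encoded as a JSON string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # json.dumps output contains commas and quotes; csv.writer escapes them.
    writer.writerow(
        [timestamp, audio_file, transcription, intent, json.dumps(slots, sort_keys=True)]
    )
    return buffer.getvalue()
```

Letting a CSV writer handle the quoting matters here because the JSON string itself contains commas and double quotes.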
…ence, and action extraction improvements

- Added extensive time reference patterns (tomorrow, this morning/afternoon/evening, this/next week/month, recently, all time, this/last year)
- Separated 'now' as distinct time reference from 'today'
- Refactored extractAction into intent-specific functions (extractTimerAction, extractMediaAction, extractAppAction, extractPhoneAction)
- Enhanced extractUnit with context-based inference for stress (score), blood oxygen (percent), sleep (hours), sleep quality (score), distance, calories, and walking movement
- Removed weight unit inference as requested
- Improved slot extraction accuracy for different intent types
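The context-based unit inference listed above can be pictured as an ordered keyword table. The sketch below is Python, not the Kotlin `extractUnit`, and covers only the four metric-to-unit pairs the commit message states explicitly (stress, blood oxygen, sleep, sleep quality); the substring matching and the name `infer_unit` are assumptions.

```python
from typing import Optional

# Ordered so that the more specific phrase "sleep quality" is tested before "sleep".
_UNIT_RULES = (
    ("sleep quality", "score"),
    ("blood oxygen", "percent"),
    ("stress", "score"),
    ("sleep", "hours"),
)


def infer_unit(text: str) -> Optional[str]:
    """Infer a unit from keywords in the utterance, or None if nothing matches."""
    lowered = text.lower()
    for keyword, unit in _UNIT_RULES:
        if keyword in lowered:
            return unit
    return None
```

Rule order is the main design point: testing "sleep" first would label every sleep-quality query with hours.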