Dev #3498
Open
tanmaypawar-noise wants to merge 14 commits into ggml-org:master from jagatheeswaran-noise:dev
- Enhanced getFirstSample() to specifically look for samples_jfk.wav first
- Added complete directory cleanup in copyAssets() to prevent stale files
- Improved file selection priority: samples_jfk.wav > any jfk file > first wav
- Added comprehensive logging for file selection and copying process
- Ensures transcribe sample always uses the correct JFK speech file
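The selection priority described above could look roughly like the sketch below. The PR's actual getFirstSample() is not shown on this page, so `pickSample` is a hypothetical helper, and restricting the "any jfk file" step to `.wav` files is an assumption.

```kotlin
// Hypothetical helper mirroring the stated priority:
// samples_jfk.wav > any jfk file > first wav.
// Assumption: only .wav files are candidates at every step.
fun pickSample(fileNames: List<String>): String? {
    val wavs = fileNames.filter { it.endsWith(".wav", ignoreCase = true) }
    return wavs.firstOrNull { it.equals("samples_jfk.wav", ignoreCase = true) } // 1. exact name
        ?: wavs.firstOrNull { it.contains("jfk", ignoreCase = true) }            // 2. any JFK file
        ?: wavs.firstOrNull()                                                     // 3. first WAV, if any
}
```

Returning `null` when no WAV file exists leaves the "no sample available" case to the caller.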
…tion
✨ Features Added:
- Real-time audio streaming with optimized chunking (500ms intervals)
- Automatic WAV file storage with timestamps in Downloads/WhisperRecordings
- CSV logging of all transcriptions with timestamps and filenames
- Prompt-based transcription for improved voice command accuracy
- Transcription timing display in UI
- Optimized performance for 2.5-3 second transcription latency
🚀 Performance Optimizations:
- JNI parameters: Greedy sampling, optimized context (1024), max length (150)
- Audio processing: 8-second max capture, adaptive silence detection
- Threading: Limited to 4 threads for optimal mobile performance
- Smart segmentation: Single segment for short commands, multi-segment for longer
- Non-speech token suppression for cleaner command recognition
🎯 Key Improvements:
- Short commands (1-2s): ~2.5-3s transcription time
- Long commands (3-8s): Complete capture without truncation
- Enhanced accuracy with context-aware processing
- Robust file storage and permission handling
- Sample file selection fixes and UI improvements
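To make the chunking numbers concrete, here is a minimal sketch of splitting captured audio into 500 ms chunks with an 8-second cap. It assumes 16 kHz mono 16-bit PCM (the sample rate Whisper models expect); the function names are illustrative and not taken from the PR.

```kotlin
// Assumption: 16 kHz mono PCM. The 500 ms and 8 s figures come from the commit message.
val SAMPLE_RATE_HZ = 16_000
val CHUNK_MS = 500
val MAX_CAPTURE_MS = 8_000

// Number of samples covering the given duration.
fun samplesFor(ms: Int): Int = SAMPLE_RATE_HZ * ms / 1000

// Caps the capture at MAX_CAPTURE_MS, then splits it into CHUNK_MS pieces.
// The last chunk may be shorter than the others.
fun chunk(samples: ShortArray): List<ShortArray> {
    val capped = samples.copyOf(minOf(samples.size, samplesFor(MAX_CAPTURE_MS)))
    val size = samplesFor(CHUNK_MS)
    return (capped.indices step size).map { start ->
        capped.copyOfRange(start, minOf(start + size, capped.size))
    }
}
```

Under these assumptions each full chunk holds 8,000 samples, and a capture at the 8-second cap yields 16 chunks.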
…sification
✨ Added complete slot extraction system with 14+ slot types
🎯 Enhanced intent classification with TensorFlow Lite models
🏷️ Comprehensive NLU pipeline with intent + slot extraction
Key Features:
- SlotExtractor.kt: Complete slot extraction with pattern matching
- Enhanced IntentClassifier.kt: Integrated slot extraction pipeline
- Updated UI: Beautiful slot visualization with confidence scoring
- 14+ Slot Types: metric, time_ref, unit, qualifier, threshold, target, value, feature, state, action, tool, activity_type, app, contact, location, attribute, type, period, event_type
- Pattern Recognition: Advanced regex patterns with synonym support
- Contextual Inference: Smart slot extraction based on domain knowledge
- Intent-Specific Templates: Required slots per intent type
Technical Implementation:
- TensorFlow Lite 2.12.0: Compatible models for Android deployment
- Working Models: lightweight_sentence_encoder.tflite (657KB), intent_classifier.tflite (21KB)
- Enhanced UI: Tabbed interface with comprehensive slot display
- Performance: 2.5-3s transcription + real-time slot extraction
Example Commands with Slots:
- 'How many steps today?' → QueryPoint {metric: steps, time_ref: today}
- 'Set goal to 10000 steps' → SetGoal {metric: steps, target: 10000, unit: count}
- 'Turn on do not disturb' → ToggleFeature {feature: do not disturb, state: on}
Complete NLU pipeline now matches Python implementation capabilities! 🚀
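The example commands above suggest a pattern-matching extractor combined with per-intent required-slot templates. The sketch below illustrates that idea only; the vocabularies, the `REQUIRED_SLOTS` table, and both function names are hypothetical, since the contents of SlotExtractor.kt are not visible on this page. It does not attempt the `unit` inference shown in the SetGoal example.

```kotlin
// Hypothetical vocabularies, not the real SlotExtractor.kt patterns.
val METRIC = Regex("""\b(steps|calories|distance|heart rate)\b""", RegexOption.IGNORE_CASE)
val TIME_REF = Regex("""\b(today|yesterday|tomorrow)\b""", RegexOption.IGNORE_CASE)
val NUMBER = Regex("""\b\d+\b""")

// Hypothetical required-slot templates per intent.
val REQUIRED_SLOTS = mapOf(
    "QueryPoint" to listOf("metric"),
    "SetGoal" to listOf("metric", "target"),
)

// Runs each pattern over the utterance and records the first match per slot.
fun extractSlots(intent: String, text: String): Map<String, String> {
    val slots = linkedMapOf<String, String>()
    METRIC.find(text)?.let { slots["metric"] = it.value.lowercase() }
    TIME_REF.find(text)?.let { slots["time_ref"] = it.value.lowercase() }
    // Only goal-setting intents treat a bare number as the target.
    if (intent == "SetGoal") NUMBER.find(text)?.let { slots["target"] = it.value }
    return slots
}

// Lists the required slots that extraction did not fill.
fun missingSlots(intent: String, slots: Map<String, String>): List<String> =
    REQUIRED_SLOTS[intent].orEmpty().filter { it !in slots }
```

A caller could use `missingSlots` to decide whether to act on the command or ask the user for the missing detail.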
… implementation
✨ Complete iOS implementation with SwiftUI and TensorFlow Lite
🎯 Feature parity with Android implementation
🍎 Native iOS optimizations with async/await
Key Features:
- IntentClassifier.swift: TensorFlow Lite inference engine with async/await
- SlotExtractor.swift: Complete slot extraction with 14+ slot types
- IntentTestView.swift: Beautiful SwiftUI interface for testing
- IntentModels.swift: Core data structures and error types
- Updated ContentView.swift: TabView integration
Technical Implementation:
- TensorFlow Lite Swift integration (2.12.0 compatible)
- Same models as Android: intent_classifier.tflite + lightweight_sentence_encoder.tflite
- Modern Swift patterns: async/await, ObservableObject, Result types
- SwiftUI optimizations: LazyVGrid, progressive disclosure, reactive state
- Comprehensive error handling and logging with os.log
Slot Extraction Capabilities:
- 14+ Slot Types: metric, time_ref, unit, qualifier, threshold, target, value, feature, state, action, tool, activity_type, app, contact, location, attribute, type, period, event_type
- Pattern Matching: Advanced Swift regex patterns with synonym support
- Contextual Inference: Smart extraction based on domain knowledge
- Intent-Specific Templates: Required slots per intent type
iOS-Specific Features:
- Native SwiftUI interface with color-coded sections
- Tabbed integration with existing Whisper app
- Async processing for smooth UI performance
- Bundle resource management for model files
- Memory management with automatic cleanup
Example Usage:
- 'How many steps today?' → QueryPoint {metric: steps, time_ref: today}
- 'Set goal to 10000 steps' → SetGoal {metric: steps, target: 10000, unit: count}
- 'Turn on do not disturb' → ToggleFeature {feature: do not disturb, state: on}
Documentation:
- Comprehensive README.md with setup instructions
- QUICK_SETUP.md for 5-minute installation
- API documentation and usage examples
- Troubleshooting guide and performance tips
Complete NLU pipeline now available for iOS with full Android feature parity! 🚀🍎
…ion accuracy
- Implemented Rust-based HuggingFace tokenizer using proper BERT WordPiece tokenization
- Added HFTokenizer.kt Kotlin wrapper for seamless Android integration
- Updated IntentClassifier.kt to use HF tokenizer instead of basic word tokenization
- Built native libraries for arm64-v8a and x86_64 Android architectures
- Added comprehensive documentation and build scripts
- Resolves tokenization accuracy issues by matching model training tokenization
Key improvements:
- Proper subword tokenization (e.g. 'playing' -> ['play', '##ing'])
- Better out-of-vocabulary word handling
- Significant accuracy improvements in intent classification
- Industry-standard tokenization approach
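For readers unfamiliar with WordPiece, the 'playing' example can be reproduced with the standard greedy longest-match-first procedure. The sketch below is a plain Kotlin illustration with a toy vocabulary. It is not the PR's Rust tokenizer or the HFTokenizer.kt wrapper, and it assumes the input word is already lowercased and split on whitespace.

```kotlin
// Greedy longest-match-first WordPiece for a single word.
// Non-initial pieces carry the "##" continuation prefix.
fun wordPiece(word: String, vocab: Set<String>, unk: String = "[UNK]"): List<String> {
    val pieces = mutableListOf<String>()
    var start = 0
    while (start < word.length) {
        var end = word.length
        var match: String? = null
        // Shrink the candidate from the right until it is found in the vocabulary.
        while (end > start) {
            val candidate = (if (start > 0) "##" else "") + word.substring(start, end)
            if (candidate in vocab) { match = candidate; break }
            end--
        }
        // If no prefix of the remainder is in the vocabulary, the whole word maps to [UNK].
        if (match == null) return listOf(unk)
        pieces += match
        start = end
    }
    return pieces
}
```

With the toy vocabulary `setOf("play", "##ing")`, `wordPiece("playing", vocab)` returns `["play", "##ing"]`, matching the example in the commit message.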
… extraction, and performance optimizations
- Merged voice recognition and intent classification into unified workflow
- Added intent classification UI with confidence display and slot visualization
- Optimized audio recording pipeline for better performance
- Improved SlotExtractor with pre-compiled regex patterns for 10-100x performance boost
- Added default case handling for irrelevant input ('Sorry, please say again')
- Enhanced UI: centered prominent Start button, increased font size for accessibility
- Updated app branding to 'Noise AI ASR and Intent Demo'
- Renamed tabs: 'Whisper' → 'ASR & Intent', 'Intent Test' maintained
- Removed benchmark and transcribe sample buttons for cleaner interface
- Added CSV logging with intent classification results
- Updated app branding: 'WhisperCppDemo' → 'Noise AI ASR and Intent Demo'
- Renamed main tab: 'Whisper' → 'ASR & Intent'
- Enhanced Start button: centered, 300dp width, 56dp height, larger font
- Removed benchmark and transcribe sample buttons for cleaner UI
- Improved contact extraction: now extracts any name after phone action keywords
- Added comprehensive intent classification UI with confidence display
- Implemented CSV logging with timestamp, audio filename, transcription, and intent
- Optimized SlotExtractor with pre-compiled regex patterns for 10-100x performance boost
- Added default case handling for irrelevant input ('Sorry, please say again')
- Enhanced accessibility with larger touch targets and better visual hierarchy
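The pre-compiled pattern and contact extraction changes can be pictured with the sketch below. The keyword list, the `ContactPatterns` object, and `extractContact` are hypothetical, as the PR's real patterns are not shown here. The point is only that the `Regex` is built once and reused rather than recompiled on every call.

```kotlin
object ContactPatterns {
    // Compiled once, when the object is first accessed, then reused on every call.
    // Hypothetical phone-action keywords; captures everything up to the next punctuation mark.
    val AFTER_PHONE_ACTION = Regex(
        """\b(?:call|dial|phone|ring)\s+([^.,!?]+)""",
        RegexOption.IGNORE_CASE,
    )
}

// Returns whatever follows a phone-action keyword, or null if no keyword is present.
// Naive by design: trailing words such as "now" are not filtered out.
fun extractContact(text: String): String? =
    ContactPatterns.AFTER_PHONE_ACTION.find(text)?.groupValues?.get(1)?.trim()
```

A `null` result is one place where a caller could fall back to the 'Sorry, please say again' response mentioned above.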
…and remove whisper.cpp-master folder
- Added SlotExtractor integration to extract slots from transcriptions
- Updated CSV headers to include 'slots' column
- Modified saveToCsv to include slots data as JSON string
- Slots are extracted after intent classification and saved alongside timestamp, audio file, transcription, and intent
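Storing a JSON string inside a CSV column needs careful quoting, because the JSON itself contains commas and double quotes. The sketch below shows one way to build such a row. The helper names are hypothetical and do not reflect the PR's saveToCsv, and the JSON escaping covers only backslashes and quotes. On Android, `org.json.JSONObject` would be the more robust choice for the JSON step.

```kotlin
// Minimal JSON string escaping: backslash and double quote only (control characters not handled).
fun esc(s: String): String = s.replace("\\", "\\\\").replace("\"", "\\\"")

// Serializes a flat string map as a JSON object.
fun slotsToJson(slots: Map<String, String>): String =
    slots.entries.joinToString(",", "{", "}") { (k, v) -> "\"${esc(k)}\":\"${esc(v)}\"" }

// CSV quoting: wrap in quotes and double any embedded quotes when the field needs it.
fun csvField(value: String): String =
    if (value.any { it == ',' || it == '"' || it == '\n' || it == '\r' })
        "\"" + value.replace("\"", "\"\"") + "\""
    else value

// Column order follows the commit message: timestamp, audio file, transcription, intent, slots.
fun csvRow(
    timestamp: String,
    audioFile: String,
    transcription: String,
    intent: String,
    slots: Map<String, String>,
): String =
    listOf(timestamp, audioFile, transcription, intent, slotsToJson(slots))
        .joinToString(",") { csvField(it) }
```

Quoting the slots field this way keeps each recording on a single CSV row that standard CSV parsers can read back.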
…ence, and action extraction improvements
- Added extensive time reference patterns (tomorrow, this morning/afternoon/evening, this/next week/month, recently, all time, this/last year)
- Separated 'now' as distinct time reference from 'today'
- Refactored extractAction into intent-specific functions (extractTimerAction, extractMediaAction, extractAppAction, extractPhoneAction)
- Enhanced extractUnit with context-based inference for stress (score), blood oxygen (percent), sleep (hours), sleep quality (score), distance, calories, and walking movement
- Removed weight unit inference as requested
- Improved slot extraction accuracy for different intent types
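The unit inference and time-reference rules listed above can be sketched as simple ordered lookups. This is an illustration under assumptions, not the PR's extractUnit: the function names and the underscore-joined slot values are hypothetical, and the units for distance, calories, and walking movement are omitted because the commit message does not state them.

```kotlin
// Context-based unit inference for the metrics whose units the commit message names.
fun inferUnit(text: String): String? {
    val t = text.lowercase()
    return when {
        "sleep quality" in t -> "score"   // checked before plain "sleep"
        "sleep" in t -> "hours"
        "blood oxygen" in t -> "percent"
        "stress" in t -> "score"
        else -> null                        // other metrics: units not stated, so not guessed
    }
}

// Time-reference phrases from the commit message; multi-word phrases come first.
// 'now' and 'today' are separate entries, so they yield distinct slot values.
val TIME_PATTERNS = listOf(
    "this morning", "this afternoon", "this evening",
    "this week", "next week", "this month", "next month",
    "this year", "last year", "all time",
    "tomorrow", "recently", "today", "now",
).map { it to Regex("""\b""" + Regex.escape(it) + """\b""") }

// Returns the first matching phrase as a slot value (spaces replaced by underscores: an assumption).
fun timeRef(text: String): String? {
    val t = text.lowercase()
    return TIME_PATTERNS.firstOrNull { (_, re) -> re.containsMatchIn(t) }
        ?.first?.replace(' ', '_')
}
```

The word-boundary anchors keep 'now' from matching inside words such as 'know'.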
No description provided.