A web application that uses Mistral AI's Voxtral models to automatically analyze your audio/video meetings, with:
- Direct analysis: Transcription and structured summary in one step
- 3 processing modes: Local (Transformers), MLX (Apple Silicon), API (Cloud)
- Quantized models: 4bit/8bit support for memory efficiency
- Smart diarization: Speaker identification and renaming
- Customizable summaries: Modular sections according to your needs
- Language-adaptive: Automatically detects and responds in meeting language
- Centralized UI: Clean English interface with multilingual analysis
- Hugging Face Spaces: Try the simplified version online at VincentGOURBIN/MeetingNotes-Voxtral-Analysis
Try MeetingNotes directly in your browser: VincentGOURBIN/MeetingNotes-Voxtral-Analysis
This simplified version uses standard Mistral Voxtral models optimized for Zero GPU with automatic chunk processing.
- Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd meetingnotes
pip install -r requirements.txt
```

- Configure your Hugging Face token:

```bash
cp .env.example .env
# Edit .env and add your Hugging Face token
```

- Launch the application:

```bash
python main.py
```

The web interface will be accessible at http://localhost:7860.
Get an access token from Hugging Face and add it to `.env`:

```bash
HUGGINGFACE_TOKEN=your_token_here
```

To use cloud API mode, get an API key from Mistral AI:

```bash
MISTRAL_API_KEY=your_mistral_api_key
```

Choose the mode that best fits your hardware and needs:
**Local (Transformers)**
- Local processing: Everything runs on your machine with PyTorch
- Privacy: No data sent to external servers
- GPU acceleration: Automatic CUDA/MPS detection

**MLX (Apple Silicon)**
- Optimized for Mac: M1/M2/M3 processors with the MLX framework
- Best performance: Native Apple Silicon acceleration
- Memory efficient: Optimized quantized models

**API (Cloud)**
- Cloud processing: Uses the Mistral Cloud API
- No local resources: Minimal memory usage
- Always up-to-date: Latest models and improvements
| Model | Precision | Repository | Memory Usage |
|---|---|---|---|
| Voxtral Mini | Default | mistralai/Voxtral-Mini-3B-2507 | ~6GB |
| Voxtral Mini | 8bit | mzbac/voxtral-mini-3b-8bit | ~3.5GB |
| Voxtral Mini | 4bit | mzbac/voxtral-mini-3b-4bit-mixed | ~2GB |
| Voxtral Small | Default | mistralai/Voxtral-Small-24B-2507 | ~48GB |
| Voxtral Small | 8bit | VincentGOURBIN/voxtral-small-8bit | ~24GB |
| Voxtral Small | 4bit | VincentGOURBIN/voxtral-small-4bit-mixed | ~12GB |
- Automatic identification: Detection of different speakers with pyannote.audio
- Reference segments: Listen to audio samples for each speaker
- Custom renaming: Assign human names to speakers
- Context integration: Use speaker information in summaries
Modular sections: Choose the sections to include according to your needs
- Executive Summary: Global overview of the meeting
- Main Discussions: Main topics addressed
- Action Plan: Actions, responsibilities, deadlines
- Decisions Made: Validated decisions
- Next Steps: Follow-up actions
- Main Topics: Information presented
- Key Points: Insights and key data
- Questions & Discussions: Questions asked and answers
- Follow-up Elements: Clarifications needed
Predefined Profiles:
- Action Profile: Focus on tasks and decisions
- Information Profile: Focus on data and insights
- Complete Profile: All sections activated
- Audio: WAV, MP3, M4A, OGG, FLAC
- Video: MP4, AVI, MOV, MKV (automatic audio extraction)
- Choose the mode: Local, MLX or API
- Select the model: Mini or Small according to your needs
- Choose precision: Default, 8bit or 4bit to optimize memory
- File: Direct audio or video (automatic extraction)
- Optional trimming: Start/end trimming (leave empty for 0)
- Chunk size: Processing duration (5-25 minutes)
- Analyze speakers with pyannote.audio
- Listen to reference segments of each speaker
- Rename speakers with custom names
- Apply the renamings so the summary uses the enriched speaker context
- Modular sections: Enable only necessary sections
- Quick profiles: Action, Information or Complete
- Flexible configuration: Adapt summary to your usage
Click "Analyze Meeting" to get a customized structured summary.
The project follows a modular architecture with two versions:
```
src/meetingnotes/
├── ai/                          # Artificial Intelligence
│   ├── voxtral_analyzer.py      # Local Voxtral analyzer (Transformers)
│   ├── voxtral_api_analyzer.py  # Voxtral API analyzer
│   ├── voxtral_mlx_analyzer.py  # Voxtral MLX analyzer (Apple Silicon)
│   ├── diarization.py           # Speaker diarization (pyannote)
│   ├── memory_manager.py        # Optimized memory management
│   └── prompts_config.py        # Centralized prompt configuration
├── audio/                       # Audio Processing
│   ├── wav_converter.py         # Format conversion
│   └── normalizer.py            # Volume normalization
├── core/                        # Business Logic
│   ├── voxtral_direct.py        # Direct processing (Transformers)
│   ├── voxtral_api.py           # Mistral API interface
│   └── voxtral_mlx.py           # MLX Apple Silicon interface
├── ui/                          # User Interface
│   ├── main.py                  # Main Gradio interface
│   ├── handlers.py              # Event handlers
│   └── labels.py                # UI labels and text constants
└── utils/                       # Utilities
    ├── __init__.py              # Utils module
    ├── time_formatter.py        # Duration formatting
    └── token_tracker.py         # Token usage tracking
```
Simplified version deployed at VincentGOURBIN/MeetingNotes-Voxtral-Analysis:
```
huggingface-space/
├── src/
│   ├── ai/
│   │   ├── voxtral_spaces_analyzer.py  # HF Spaces optimized analyzer
│   │   └── prompts_config.py           # Shared prompts configuration
│   ├── ui/
│   │   ├── spaces_interface.py         # Simplified Gradio interface
│   │   └── labels.py                   # UI labels
│   └── utils/
│       ├── zero_gpu_manager.py         # Zero GPU management
│       └── token_tracker.py            # Token tracking
├── app.py                              # HF Spaces entry point
├── requirements.txt                    # HF Spaces dependencies
└── deploy.py                           # Deployment script
```
Key differences in HF Spaces version:
- Only Transformers backend (no MLX/API modes)
- Standard Mistral Voxtral models (optimized for Zero GPU)
- No speaker diarization (simplified interface)
- Progress bar with chunk-based tracking
- Automatic chunk duration optimization (15min Mini, 10min Small)
For more details, see ARCHITECTURE.md.
```bash
# Required for all modes
HUGGINGFACE_TOKEN=your_hf_token

# Optional for API mode
MISTRAL_API_KEY=your_mistral_key
```

- Mac M1/M2/M3: Use MLX mode for better performance
- NVIDIA GPU: Local mode with automatic CUDA acceleration
- CPU only: Prefer 4bit models to save memory
- Limited memory: Mini 4bit (~2GB) or Small 4bit (~12GB)
- Pre-quantized models: 4bit and 8bit for memory reduction
- Memory manager: Automatic cleanup between chunks
- Multi-platform support: MPS (Apple), CUDA (NVIDIA), optimized CPU
- 3 inference modes: Direct audio-chat without intermediate transcription
- Language-adaptive: Automatically detects meeting language and responds accordingly
- Adaptive chunks: Smart division of long files with synthesis
- Modular prompts: Customizable summary sections with centralized configuration
- Enriched context: Integration of diarization in analyses
- Token tracking: Comprehensive usage statistics across all processing modes
- Centralized UI labels: Clean English interface with maintainable text management
- Improved API layout: API key positioned next to model selection for better UX
- Interactive diarization: Speaker listening and renaming
- Modular sections: Advanced summary customization with preset profiles
- Real-time feedback: Detailed progress indicators and token consumption tracking
- gradio: Modern web user interface
- torch/torchaudio: Deep learning framework (Local mode)
- transformers: Hugging Face and Voxtral models
- mlx/mlx-voxtral: MLX framework optimized for Apple Silicon (macOS only)
- pyannote.audio: Speaker diarization
- pydub: Audio processing and conversion
- requests: Communication with Mistral API
- python-dotenv: Environment variables management
- Local processing: Option for entirely on-machine processing
- Environment variables: Tokens kept secure in `.env`
- No cloud storage: Your files remain local
- Automatic cleanup: Temporary files are removed
Version 2.2 - Hugging Face Spaces Integration
- HF Spaces version: Simplified online version at VincentGOURBIN/MeetingNotes-Voxtral-Analysis
- 3 processing modes: Local, MLX, API with improved layout (main version)
- Standard Mistral models: Original Voxtral models optimized for Zero GPU (HF Spaces)
- 6 model configurations: Mini/Small + Default/8bit/4bit (main version)
- Complete diarization: Speaker identification and renaming (main version only)
- Modular summaries: 9 customizable sections with preset profiles
- Language-adaptive AI: Automatically responds in detected meeting language
- Progress tracking: Real-time progress bar with chunk-based updates
- Centralized UI management: Clean English interface with maintainable labels
- Token tracking: Comprehensive usage statistics for all modes
- Improved UX: Better API mode layout and visual organization
- Multi-platform support: Windows, macOS, Linux (main), Zero GPU (HF Spaces)
To contribute to the project:
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests if necessary
- Open a Pull Request
This project is under MIT license. See the LICENSE file for more details.
MeetingNotes - Powered by Voxtral from Mistral AI | Intelligent meeting analysis | Secure local and cloud processing






