Skip to content

VincentGourbin/meetingnotes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

MeetingNotes - Intelligent Meeting Analysis with Voxtral

Web application using Voxtral AI from Mistral AI to automatically analyze your audio/video meetings with:

  • Direct analysis: Transcription and structured summary in one step
  • 3 processing modes: Local (Transformers), MLX (Apple Silicon), API (Cloud)
  • Quantized models: 4bit/8bit support for memory efficiency
  • Smart diarization: Speaker identification and renaming
  • Customizable summaries: Modular sections according to your needs
  • Language-adaptive: Automatically detects and responds in meeting language
  • Centralized UI: Clean English interface with multilingual analysis
  • πŸš€ Hugging Face Spaces: Try the simplified version online at VincentGOURBIN/MeetingNotes-Voxtral-Analysis

Meeting Analysis Interface

πŸš€ Quick Start

🌐 Online Version (Hugging Face Spaces)

Try MeetingNotes directly in your browser: VincentGOURBIN/MeetingNotes-Voxtral-Analysis

This simplified version uses standard Mistral Voxtral models optimized for Zero GPU with automatic chunk processing.

πŸ’» Local Installation (Full Version)

  1. Clone the repository and install dependencies:
git clone <repository-url>
cd meetingnotes
pip install -r requirements.txt
  1. Configure your Hugging Face token:
cp .env.example .env
# Edit .env and add your Hugging Face token
  1. Launch the application:
python main.py

The web interface will be accessible at http://localhost:7860

βš™οΈ Configuration

Hugging Face Token (Required)

Get an access token from Hugging Face and add it to .env:

HUGGINGFACE_TOKEN=your_token_here

Mistral API Key (Optional)

To use cloud API mode, get a key from Mistral AI:

MISTRAL_API_KEY=your_mistral_api_key

🎯 Features

Processing Modes

Choose the mode that best fits your hardware and needs:

Local Mode (Transformers)

Local Mode

  • Local processing: Everything runs on your machine with PyTorch
  • Privacy: No data sent to external servers
  • GPU acceleration: Automatic CUDA/MPS detection

MLX Mode (Apple Silicon)

MLX Mode

  • Optimized for Mac: M1/M2/M3 processors with MLX Framework
  • Best performance: Native Apple Silicon acceleration
  • Memory efficient: Optimized quantized models

API Mode (Cloud)

API Mode

  • Cloud processing: Uses Mistral Cloud API
  • No local resources: Minimal memory usage
  • Always up-to-date: Latest models and improvements

Models and Quantization

Model Precision Repository Memory Usage
Voxtral Mini Default mistralai/Voxtral-Mini-3B-2507 ~6GB
Voxtral Mini 8bit mzbac/voxtral-mini-3b-8bit ~3.5GB
Voxtral Mini 4bit mzbac/voxtral-mini-3b-4bit-mixed ~2GB
Voxtral Small Default mistralai/Voxtral-Small-24B-2507 ~48GB
Voxtral Small 8bit VincentGOURBIN/voxtral-small-8bit ~24GB
Voxtral Small 4bit VincentGOURBIN/voxtral-small-4bit-mixed ~12GB

Speaker Diarization

Speaker Diarization

  • Automatic identification: Detection of different speakers with pyannote.audio
  • Reference segments: Listen to audio samples for each speaker
  • Custom renaming: Assign human names to speakers
  • Context integration: Use speaker information in summaries

Customizable Summaries

Modular sections: Choose the sections to include according to your needs

  • πŸ“„ Executive Summary: Global overview of the meeting
  • πŸ’¬ Main Discussions: Main topics addressed
  • βœ… Action Plan: Actions, responsibilities, deadlines
  • βš–οΈ Decisions Made: Validated decisions
  • ⏭️ Next Steps: Follow-up actions
  • πŸ“Œ Main Topics: Information presented
  • ⭐ Key Points: Insights and key data
  • ❓ Questions & Discussions: Questions asked and answers
  • πŸ“ Follow-up Elements: Clarifications needed

Predefined Profiles:

  • 🎯 Action Profile: Focus on tasks and decisions
  • πŸ“Š Information Profile: Focus on data and insights
  • πŸ“‹ Complete Profile: All sections activated

Supported Formats

  • Audio: WAV, MP3, M4A, OGG, FLAC
  • Video: MP4, AVI, MOV, MKV (automatic audio extraction)

πŸ”§ Usage

1. Processing Mode Configuration

  1. Choose the mode: Local, MLX or API
  2. Select the model: Mini or Small according to your needs
  3. Choose precision: Default, 8bit or 4bit to optimize memory

2. Upload and Options

  • File: Direct audio or video (automatic extraction)
  • Optional trimming: Start/end trimming (leave empty for 0)

Trim Options

  • Chunk size: Processing duration (5-25 minutes)

3. Diarization (Optional)

  1. Analyze speakers with pyannote.audio
  2. Listen to reference segments of each speaker
  3. Rename speakers with custom names
  4. Apply renamings for enriched context

4. Summary Customization

  • Modular sections: Enable only necessary sections
  • Quick profiles: Action, Information or Complete
  • Flexible configuration: Adapt summary to your usage

5. Analysis and Results

Click "Analyze Meeting" to get a customized structured summary.

Meeting Summary

πŸ—οΈ Architecture

The project follows a modular architecture with two versions:

Main Project (src/meetingnotes/)

src/meetingnotes/
β”œβ”€β”€ ai/                    # Artificial Intelligence
β”‚   β”œβ”€β”€ voxtral_analyzer.py      # Local Voxtral analyzer (Transformers)
β”‚   β”œβ”€β”€ voxtral_api_analyzer.py  # Voxtral API analyzer
β”‚   β”œβ”€β”€ voxtral_mlx_analyzer.py  # Voxtral MLX analyzer (Apple Silicon)
β”‚   β”œβ”€β”€ diarization.py           # Speaker diarization (pyannote)
β”‚   β”œβ”€β”€ memory_manager.py        # Optimized memory management
β”‚   └── prompts_config.py        # Centralized prompt configuration
β”œβ”€β”€ audio/                 # Audio Processing
β”‚   β”œβ”€β”€ wav_converter.py         # Format conversion
β”‚   └── normalizer.py            # Volume normalization
β”œβ”€β”€ core/                  # Business Logic
β”‚   β”œβ”€β”€ voxtral_direct.py        # Direct processing (Transformers)
β”‚   β”œβ”€β”€ voxtral_api.py           # Mistral API interface
β”‚   └── voxtral_mlx.py           # MLX Apple Silicon interface
β”œβ”€β”€ ui/                    # User Interface
β”‚   β”œβ”€β”€ main.py                  # Main Gradio interface
β”‚   β”œβ”€β”€ handlers.py              # Event handlers
β”‚   └── labels.py                # UI labels and text constants
└── utils/                 # Utilities
    β”œβ”€β”€ __init__.py              # Utils module
    β”œβ”€β”€ time_formatter.py        # Duration formatting
    └── token_tracker.py          # Token usage tracking

Hugging Face Spaces Version (huggingface-space/)

Simplified version deployed at VincentGOURBIN/MeetingNotes-Voxtral-Analysis:

huggingface-space/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ ai/
β”‚   β”‚   β”œβ”€β”€ voxtral_spaces_analyzer.py   # HF Spaces optimized analyzer
β”‚   β”‚   └── prompts_config.py            # Shared prompts configuration
β”‚   β”œβ”€β”€ ui/
β”‚   β”‚   β”œβ”€β”€ spaces_interface.py          # Simplified Gradio interface
β”‚   β”‚   └── labels.py                    # UI labels
β”‚   └── utils/
β”‚       β”œβ”€β”€ zero_gpu_manager.py          # Zero GPU management
β”‚       └── token_tracker.py             # Token tracking
β”œβ”€β”€ app.py                               # HF Spaces entry point
β”œβ”€β”€ requirements.txt                     # HF Spaces dependencies
└── deploy.py                           # Deployment script

Key differences in HF Spaces version:

  • Only Transformers backend (no MLX/API modes)
  • Standard Mistral Voxtral models (optimized for Zero GPU)
  • No speaker diarization (simplified interface)
  • Progress bar with chunk-based tracking
  • Automatic chunk duration optimization (15min Mini, 10min Small)

For more details, see ARCHITECTURE.md.

πŸ”§ Advanced Configuration

Environment Variables

# Required for all modes
HUGGINGFACE_TOKEN=your_hf_token

# Optional for API mode
MISTRAL_API_KEY=your_mistral_key

Hardware Optimizations

  • Mac M1/M2/M3: Use MLX mode for better performance
  • NVIDIA GPU: Local mode with automatic CUDA acceleration
  • CPU only: Prefer 4bit models to save memory
  • Limited memory: Mini 4bit (~2GB) or Small 4bit (~12GB)

πŸ” Technical Features

Memory Optimizations

  • Pre-quantized models: 4bit and 8bit for memory reduction
  • Memory manager: Automatic cleanup between chunks
  • Multi-platform support: MPS (Apple), CUDA (NVIDIA), optimized CPU

Intelligent Processing

  • 3 inference modes: Direct audio-chat without intermediate transcription
  • Language-adaptive: Automatically detects meeting language and responds accordingly
  • Adaptive chunks: Smart division of long files with synthesis
  • Modular prompts: Customizable summary sections with centralized configuration
  • Enriched context: Integration of diarization in analyses
  • Token tracking: Comprehensive usage statistics across all processing modes

Modern Interface

  • Centralized UI labels: Clean English interface with maintainable text management
  • Improved API layout: API key positioned next to model selection for better UX
  • Interactive diarization: Speaker listening and renaming
  • Modular sections: Advanced summary customization with preset profiles
  • Real-time feedback: Detailed progress indicators and token consumption tracking

πŸ“¦ Main Dependencies

  • gradio: Modern web user interface
  • torch/torchaudio: Deep learning framework (Local mode)
  • transformers: Hugging Face and Voxtral models
  • mlx/mlx-voxtral: MLX framework optimized for Apple Silicon (macOS only)
  • pyannote.audio: Speaker diarization
  • pydub: Audio processing and conversion
  • requests: Communication with Mistral API
  • python-dotenv: Environment variables management

πŸ”’ Security and Privacy

  • Local processing: Option for entirely on-machine processing
  • Environment variables: Secure tokens via .env
  • No cloud storage: Your files remain local
  • Automatic cleanup: Temporary files removal

🚦 Project Status

βœ… Version v2.2 - Hugging Face Spaces Integration

  • πŸš€ HF Spaces version: Simplified online version at VincentGOURBIN/MeetingNotes-Voxtral-Analysis
  • 3 processing modes: Local, MLX, API with improved layout (main version)
  • Standard Mistral models: Original Voxtral models optimized for Zero GPU (HF Spaces)
  • 6 model configurations: Mini/Small + Default/8bit/4bit (main version)
  • Complete diarization: Speaker identification and renaming (main version only)
  • Modular summaries: 9 customizable sections with preset profiles
  • Language-adaptive AI: Automatically responds in detected meeting language
  • Progress tracking: Real-time progress bar with chunk-based updates
  • Centralized UI management: Clean English interface with maintainable labels
  • Token tracking: Comprehensive usage statistics for all modes
  • Improved UX: Better API mode layout and visual organization
  • Multi-platform support: Windows, macOS, Linux (main), Zero GPU (HF Spaces)

🀝 Contributing

To contribute to the project:

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Add tests if necessary
  5. Open a Pull Request

πŸ“„ License

This project is under MIT license. See the LICENSE file for more details.


MeetingNotes - Powered by Voxtral from Mistral AI | πŸš€ Intelligent meeting analysis | πŸ’Ύ Secure local and cloud processing

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages