Transform your meetings into actionable insights with real-time transcription, AI-powered summaries, and intelligent analysis.
Documentation • Quick Start • Features • Architecture • Installation
- Overview
- Key Features
- System Architecture
- Components
- Quick Start
- Installation
- Usage Guide
- API Documentation
- Development
- Testing
- Deployment
- Troubleshooting
- Contributing
- License
AI MOM (AI Minutes of Meeting) is a comprehensive, production-ready meeting intelligence platform that combines cutting-edge AI technologies to revolutionize how you capture, transcribe, and analyze meetings. Built with a modular architecture, it offers three powerful ways to work with meeting content:
- Real-Time Intelligence: Live audio transcription with speaker diarization and instant AI summaries
- Multi-Format Processing: Upload and process pre-recorded audio files (MP3, WAV, M4A, AAC, OGG, FLAC)
- Browser Extension: Capture any online meeting with screen recording and a live transcription overlay
- 5-Model AI Processing: Concurrent processing using Groq (Llama 3.3/3.1) and OpenRouter (GPT-4o Mini, Claude Haiku, Gemini Flash)
- GPU Acceleration: Optimized Whisper model with automatic CUDA detection for up to 10x faster transcription
- Cost-Effective: Free for development and normal usage (within API free tiers), with built-in API cost monitoring
- Speaker Recognition: Advanced speaker diarization with visual color coding and personalized alerts
- Privacy-First: All processing happens on your infrastructure, with optional cloud AI services
- Live microphone input processing with WebSocket streaming
- Instant transcription with speaker identification
- Dynamic speaker color coding for easy conversation tracking
- Personalized notifications when you're mentioned
- Custom keyword alerts for important topics
- Session save/restore functionality
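The name-mention and keyword alerts above could be implemented with a simple matcher like this (a sketch; the function and field names are illustrative, not the project's actual API):

```python
import re

def find_alerts(segment: str, user_name: str, keywords: list[str]) -> list[str]:
    """Return the alert terms (user name or custom keywords) found in a transcript segment."""
    hits = []
    for term in [user_name, *keywords]:
        # Whole-word, case-insensitive match so "Ann" does not fire on "announcement"
        if re.search(rf"\b{re.escape(term)}\b", segment, re.IGNORECASE):
            hits.append(term)
    return hits

# Example: triggers on the user's name and one tracked keyword
print(find_alerts("Alice, can you own the deadline for the API migration?",
                  "Alice", ["deadline", "budget"]))
# -> ['Alice', 'deadline']
```

In a real pipeline this check would run on each final transcription segment arriving over the WebSocket, with matches pushed to the notification UI.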
- Drag-and-drop or click-to-upload interface
- Support for multiple audio formats (MP3, WAV, M4A, AAC, OGG, FLAC)
- File validation (type, size up to 100MB)
- Progress visualization with real-time status updates
- Batch processing capability
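The validation rules above (format whitelist, 100MB cap) can be sketched as a small check run before processing; the function name is illustrative:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # 100MB cap from the upload rules above

def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Check an uploaded file's type and size before transcription starts."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported format: {ext or 'none'}"
    if size_bytes > MAX_SIZE_BYTES:
        return False, "File exceeds the 100MB limit"
    return True, "OK"

print(validate_upload("standup.mp3", 5 * 1024 * 1024))  # (True, 'OK')
print(validate_upload("notes.txt", 1024))               # (False, 'Unsupported format: .txt')
```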
- Screen Capture with Audio: Record system audio during screen sharing
- Multi-Platform Support: Google Meet, Zoom, Microsoft Teams, Zoho Meeting, YouTube
- Floating Overlay: Draggable real-time transcription display
- Auto-Detection: Automatically detects meeting state
- Keyboard Shortcuts: Platform-specific shortcuts for quick control
- Professional UI: Clean, responsive interface
Automatically generates structured insights:
- Meeting Overview: Comprehensive summary of discussions
- Key Points: Important topics and decisions
- Action Items: Tasks and responsibilities with assignees
- Conclusions: Final outcomes and next steps
- Participants: Detected speakers and attendees
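The five sections above map naturally onto a small schema (a sketch only; the backend's real response model may differ):

```python
from dataclasses import dataclass, field

@dataclass
class MeetingSummary:
    """The five structured-insight sections generated for each meeting."""
    overview: str = ""
    key_points: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    participants: list[str] = field(default_factory=list)

s = MeetingSummary(overview="Sprint planning",
                   action_items=["Alice: draft API spec"])
print(s.overview, len(s.action_items))  # Sprint planning 1
```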
- Role-based analysis customization (Developer, Manager, Designer, etc.)
- Project tracking and contextual insights
- Custom keyword monitoring
- Personalized meeting summaries
- Real-time API cost tracking
- Performance analytics with detailed metrics
- Comprehensive error handling and recovery
- WebSocket health monitoring
- Rate limiting and request throttling
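Rate limiting of the kind listed above can be sketched as a per-client sliding-window counter (illustrative only; the project's actual throttling mechanism is not shown in this README):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per client."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = {}  # client_id -> deque of request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(client_id, deque())
        while q and now - q[0] > self.window:  # drop requests outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=2, window=1.0)
print(limiter.allow("a", now=0.0), limiter.allow("a", now=0.1), limiter.allow("a", now=0.2))
# True True False
```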
```mermaid
graph TB
    subgraph "Frontend Layer"
        A[Web Application<br/>HTML/CSS/JS]
        B[Chrome Extension<br/>Manifest V3]
    end

    subgraph "Backend Services"
        C[FastAPI Server<br/>Python 3.9+]
        D[WebSocket Handler<br/>Real-time Communication]
        E[Audio Processor<br/>Whisper + PyTorch]
        F[Multi-API Processor<br/>5 Concurrent Models]
    end

    subgraph "AI Processing Pipeline"
        G[Groq Llama 3.3 70B]
        H[Groq Llama 3.1 70B]
        I[OpenRouter GPT-4o Mini]
        J[OpenRouter Claude Haiku]
        K[OpenRouter Gemini Flash]
    end

    subgraph "Services"
        L[Speaker Diarization<br/>PyAnnote]
        M[AI Summarizer<br/>Intelligent Analysis]
        N[User Profile Service<br/>Personalization]
        O[Cost Monitor<br/>Usage Analytics]
    end

    A --> C
    B --> C
    C --> D
    C --> E
    E --> F
    E --> L
    F --> G
    F --> H
    F --> I
    F --> J
    F --> K
    G --> M
    H --> M
    I --> M
    J --> M
    K --> M
    L --> N
    M --> C
    N --> C
    C --> O

    style A fill:#4CAF50
    style B fill:#2196F3
    style C fill:#FF9800
    style F fill:#9C27B0
    style M fill:#E91E63
```
| Layer | Technologies |
|---|---|
| Frontend | HTML5, CSS3, Vanilla JavaScript, WebSocket API |
| Extension | Chrome Manifest V3, Screen Capture API, Tab Capture API |
| Backend | FastAPI, Uvicorn, Python 3.9+, WebSockets |
| AI/ML | OpenAI Whisper, PyTorch, Groq API, OpenRouter API |
| Audio Processing | PyAudio, FFmpeg, Pydub, NumPy, SciPy |
| Speaker Diarization | PyAnnote Audio, Scikit-learn |
| Testing | Pytest, Pytest-asyncio, Pytest-cov |
| Monitoring | Custom Cost Monitor, Performance Analytics |
FastAPI-based REST API and WebSocket server
- Purpose: Core server handling transcription, AI processing, and real-time communication
- Key Technologies: FastAPI, Whisper, Groq, OpenRouter, WebSockets
- Features:
- GPU-accelerated audio transcription
- 5-model concurrent AI processing
- Real-time WebSocket streaming
- Speaker diarization
- User profile management
- API cost monitoring
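The 5-model concurrent fan-out could look roughly like this (an asyncio sketch; the model list matches this README, but the function names and stub client are hypothetical stand-ins for the real Groq/OpenRouter HTTP calls):

```python
import asyncio

MODELS = [
    ("groq", "llama-3.3-70b-versatile"),
    ("groq", "llama-3.1-70b-versatile"),
    ("openrouter", "openai/gpt-4o-mini"),
    ("openrouter", "anthropic/claude-3-haiku"),
    ("openrouter", "google/gemini-flash-1.5"),
]

async def call_model(provider, model, transcript):
    # Stub standing in for a real Groq/OpenRouter API request
    await asyncio.sleep(0)
    return {"provider": provider, "model": model,
            "summary": f"summary of {len(transcript)} chars"}

async def summarize_concurrently(transcript):
    """Fan the transcript out to all five models at once and gather the results."""
    tasks = [call_model(p, m, transcript) for p, m in MODELS]
    # return_exceptions=True keeps one failing model from sinking the whole batch
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(summarize_concurrently("Team discussed the Q4 roadmap..."))
print(len(results))  # 5
```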
Modern web application for meeting management
- Purpose: User-friendly web interface for real-time capture and file processing
- Key Technologies: HTML5, CSS3, Vanilla JavaScript, WebSocket Client
- Features:
- Real-time meeting capture interface
- File upload with drag-and-drop
- Live transcription display
- Speaker color coding
- User authentication system
- Profile management
- Settings customization
Chrome extension for online meeting capture
- Purpose: Capture any online meeting with screen recording and live transcription
- Key Technologies: Chrome Manifest V3, Screen Capture API, Content Scripts
- Features:
- Screen capture with system audio
- Multi-platform support (Meet, Zoom, Teams, etc.)
- Floating transcription overlay
- Auto-detection of meeting state
- WebSocket backend integration
- Platform-specific keyboard shortcuts
- Python 3.9+ (with pip)
- CUDA-capable GPU (optional, for 10x faster transcription)
- FFmpeg (for audio processing)
- API Keys (free tier available):
- Groq API Key (Get it here)
- OpenRouter API Key (Get it here)
git clone https://github.com/Baisampayan1324/AI-MOM.git
cd AI-MOM

# Navigate to backend
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
copy .env.example .env # Windows
cp .env.example .env # macOS/Linux
# Edit .env and add your API keys
notepad .env # Windows
nano .env    # macOS/Linux

Required Environment Variables:
GROQ_API_KEY=your_groq_api_key_here
OPENROUTER_API_KEY=your_openrouter_api_key_here
GROQ_MODEL=llama-3.3-70b-versatile
GROQ_MODEL_2=llama-3.1-70b-versatile
OPENROUTER_MODEL=openai/gpt-4o-mini
OPENROUTER_MODEL_2=anthropic/claude-3-haiku
OPENROUTER_MODEL_3=google/gemini-flash-1.5
HOST=localhost
PORT=8000

Option A: One-click scripts (Windows)
# From project root
./start_backend.bat # starts FastAPI backend on http://localhost:8000
./start_frontend.bat  # opens frontend/index.html in your default browser

Option B: Manual Start
# Terminal 1 - Start Backend
cd backend
python main.py
# Terminal 2 - Open Frontend
# Open frontend/index.html in your browser
# Or use: start frontend/index.html (Windows)

- Open Chrome and navigate to `chrome://extensions/`
- Enable "Developer mode" (top-right corner)
- Click "Load unpacked"
- Select the `extension` folder from this project
- Pin the extension to your toolbar
- Visit Google Meet, Zoom, or Teams and click the extension icon
- Check Backend Health:
  `curl http://localhost:8000/health`  # Should return: {"status": "healthy"}
- Test Frontend:
  - Open `frontend/index.html` in your browser
  - Click "Test Connection" - should show "Connected"
- Test Extension:
  - Visit Google Meet
  - Click the AI MOM extension icon
  - Click "Test Connection" - should show "Connected"
- Open Frontend: Navigate to `frontend/real.html`
- Connect: Click "Connect to Meeting"
- Configure:
- Set your name for speaker alerts
- Add custom keywords to monitor
- Choose notification preferences
- Start Recording: Click "Start Recording"
- Monitor: Watch live transcription with speaker colors
- Stop & Download: Click "Stop Recording" and export results
- Open Frontend: Navigate to `frontend/file.html`
- Upload Audio:
- Drag and drop audio file, OR
- Click "Browse Files" to select
- Process: Wait for transcription and AI analysis
- Review Results:
- Read full transcript
- Review AI-generated summary
- Check key points and action items
- Export: Download transcript or summary as text
- Join Meeting: Go to Google Meet, Zoom, or Teams
- Open Extension: Click AI MOM icon in toolbar
- Configure Settings:
  - Backend URL: `http://localhost:8000`
  - Language: Auto-detect or specific
  - Enable overlay and auto-summary
- Start Recording: Click "Start Recording"
- Share Screen:
  - Select window or entire screen
  - CRITICAL: Check "Share system audio"
  - Click "Share"
- Monitor: Watch the floating overlay with live transcription
- Stop: Click "Stop Recording" in the extension or overlay
- Access Profile: Navigate to `frontend/profile.html`
- Set Role: Choose your role (Developer, Manager, etc.)
- Add Projects: List current projects for context
- Keywords: Add important keywords to track
- Save: Click "Update Profile"
- Benefit: Get personalized summaries and alerts
GET /health

Response:
{
"status": "healthy",
"timestamp": "2025-10-07T10:30:00Z"
}

POST /api/process-audio
Content-Type: multipart/form-data
file: <audio_file>

Response:
{
"transcript": "Full meeting transcript...",
"summary": {
"overview": "Meeting summary...",
"key_points": ["Point 1", "Point 2"],
"action_items": ["Task 1", "Task 2"],
"conclusions": ["Conclusion 1"],
"participants": ["Speaker 1", "Speaker 2"]
},
"speakers": {
"SPEAKER_00": ["segment1", "segment2"],
"SPEAKER_01": ["segment3"]
},
"processing_time": 45.2,
"word_count": 1523
}

ws://localhost:8000/ws/audio

Client → Server (Audio Data):
{
"type": "audio",
"data": "<base64_encoded_audio>",
"format": "int16",
"sampleRate": 16000
}

Server → Client (Transcription):
{
"type": "transcription",
"text": "Transcribed text...",
"speaker": "SPEAKER_00",
"timestamp": "00:05:23",
"is_final": true
}

Server → Client (Summary):
{
"type": "summary",
"summary": {
"overview": "...",
"key_points": [...],
"action_items": [...],
"conclusions": [...]
}
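A minimal client for the socket messages above might look like this (a sketch assuming the third-party `websockets` package; the message shapes follow this README, and error handling is omitted):

```python
import asyncio
import base64
import json

def make_audio_message(pcm_bytes, sample_rate=16000):
    """Build a Client -> Server audio frame in the shape documented above."""
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
        "format": "int16",
        "sampleRate": sample_rate,
    })

async def stream_audio(chunks):
    # pip install websockets (assumed dependency; imported lazily here)
    import websockets
    async with websockets.connect("ws://localhost:8000/ws/audio") as ws:
        for chunk in chunks:
            await ws.send(make_audio_message(chunk))
            reply = json.loads(await ws.recv())
            if reply.get("type") == "transcription":
                print(f'[{reply["speaker"]}] {reply["text"]}')

# asyncio.run(stream_audio(mic_chunks))  # requires the backend to be running
```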
}See a live, always-current overview in PROJECT_STRUCTURE.md.
cd backend
pytest test/ -v
# With coverage
pytest test/ --cov=app --cov-report=html
# Specific test
pytest test/test_api_costs.py -v

cd backend
python test/performance_test.py

cd backend
python test/api_cost_monitor.py

# Format code
black app/
# Lint code
flake8 app/
# Type checking
mypy app/

| Component | Coverage | Tests |
|---|---|---|
| Backend API | 85% | 45 tests |
| Audio Processor | 90% | 12 tests |
| AI Summarizer | 80% | 15 tests |
| WebSocket Handler | 75% | 8 tests |
| User Profile | 95% | 10 tests |
| Multi-API Processor | 88% | 18 tests |
- Unit tests for all major components
- Integration tests for API endpoints
- WebSocket connection tests
- Performance benchmarking
- API cost tracking
- Error handling validation
- Concurrent processing tests
# Manual
cd backend
pytest test/ -v --cov=app

Using Uvicorn (recommended for production):
cd backend
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

Using Docker:
# Build image
docker build -t ai-mom-backend ./backend
# Run container
docker run -d \
-p 8000:8000 \
-e GROQ_API_KEY=your_key \
-e OPENROUTER_API_KEY=your_key \
--name ai-mom \
ai-mom-backend

- Static Hosting: Upload `frontend/` to any static host (Netlify, Vercel, GitHub Pages)
- Update Backend URL: Modify WebSocket and API URLs in the JavaScript files
- CORS Configuration: Ensure backend allows your domain
- Chrome Web Store:
  - Create developer account
  - Package extension as ZIP
  - Submit for review
  - Follow Chrome Web Store guidelines
- Internal Distribution:
  - Share the `extension/` folder
  - Users load it as an unpacked extension
# API Keys
GROQ_API_KEY=prod_groq_key
OPENROUTER_API_KEY=prod_openrouter_key
# Models
GROQ_MODEL=llama-3.3-70b-versatile
OPENROUTER_MODEL=openai/gpt-4o-mini
# Server Config
HOST=0.0.0.0
PORT=8000
ENVIRONMENT=production
# CORS
ALLOWED_ORIGINS=https://yourdomain.com,https://www.yourdomain.com
# Rate Limiting
RATE_LIMIT=100/minute
# Logging
LOG_LEVEL=INFO

Problem: ModuleNotFoundError: No module named 'fastapi'
Solution:
cd backend
pip install -r requirements.txt

Problem: Frontend shows "WebSocket connection failed"
Solution:
- Verify backend is running: `curl http://localhost:8000/health`
- Check backend logs for errors
- Ensure firewall allows port 8000
- Try changing the URL to `127.0.0.1:8000` instead of `localhost:8000`
Problem: Extension popup shows "Could not establish connection"
Solution:
- Reload Extension:
  - Go to `chrome://extensions/`
  - Find "AI MOM Meeting Intelligence"
  - Click the reload button
- Reload Meeting Page: Refresh the Google Meet/Zoom page
- Check Backend: Ensure the backend is running on `http://localhost:8000`
- Check Logs: Open the browser console (F12) for errors
Problem: Recording starts but no text appears
Solution:
- Check Audio: Ensure you checked "Share system audio" when screen sharing
- Test Microphone: Verify microphone works in other apps
- Backend Logs: Check for audio processing errors
- WebSocket: Ensure WebSocket connection is established (check browser console)
Problem: Transcription takes too long
Solution:
- GPU: Ensure CUDA is installed for GPU acceleration
- Check GPU: Backend should log "Using device: cuda"
- CPU Mode: If no GPU, transcription is slower but works
- File Size: Files near the 100MB upload limit take longer to process
Problem: ValueError: GROQ_API_KEY environment variable not set
Solution:
- Create the `backend/.env` file
- Add API keys:
  GROQ_API_KEY=your_actual_key_here
  OPENROUTER_API_KEY=your_actual_key_here
- Restart backend server
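The error above comes from a fail-fast check at startup; a minimal version looks like this (illustrative; in the real backend, `python-dotenv`'s `load_dotenv()` would first populate the environment from `backend/.env`):

```python
import os

def require_env(name):
    """Fail fast, mirroring the backend's error message, if a key is missing."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} environment variable not set")
    return value

# Demo only: with the variable set the check passes; unset, it raises ValueError
os.environ["GROQ_API_KEY"] = "demo_key"
print(require_env("GROQ_API_KEY"))  # demo_key
```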
- Check Documentation: Component-specific READMEs in each folder and `docs/`
- Quick Start: See `docs/setup/QUICK_START.md`
- View Logs: Backend console shows detailed error messages
- GitHub Issues: Report bugs
| Service | Model | Input Cost | Output Cost | Free Tier |
|---|---|---|---|---|
| Groq | Llama 3.3 70B | $0.59/1M tokens | $0.79/1M tokens | Generous |
| Groq | Llama 3.1 70B | $0.59/1M tokens | $0.79/1M tokens | Generous |
| OpenRouter | GPT-4o Mini | $0.15/1M tokens | $0.60/1M tokens | Available |
| OpenRouter | Claude Haiku | $0.25/1M tokens | $1.25/1M tokens | Available |
| OpenRouter | Gemini Flash | $0.075/1M tokens | $0.30/1M tokens | Available |
| Meeting Type | Duration | Tokens Used | Total Cost | Notes |
|---|---|---|---|---|
| Small Meeting | 15 min | ~5,000 | FREE | Well within free tier |
| Standard Meeting | 1 hour | ~20,000 | $0.02 | Negligible cost |
| Long Meeting | 2 hours | ~40,000 | $0.04 | Still very cheap |
| All-Day Workshop | 6 hours | ~120,000 | $0.12 | Extremely affordable |
For normal usage, AI MOM is essentially free!
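The per-meeting figures above follow directly from the per-token rates; here is a quick estimator (rates copied from the pricing table; the 75/25 input/output token split is an illustrative assumption, not a project-documented figure):

```python
def estimate_cost(total_tokens, input_rate, output_rate, input_share=0.75):
    """Estimate USD cost given per-1M-token rates and an input/output split."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# One-hour meeting (~20k tokens) through GPT-4o Mini alone: roughly half a cent
cost = estimate_cost(20_000, 0.15, 0.60)
```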
# Check API costs
cd backend
python test/api_cost_monitor.py
# View cost breakdown
pytest test/test_api_costs.py -v

- FastAPI Tutorial: fastapi.tiangolo.com
- WebSocket Guide: MDN WebSockets
- Chrome Extension Docs: developer.chrome.com
- Whisper Documentation: OpenAI Whisper
- Quick Start Guide: `docs/setup/QUICK_START.md`
- Environment Variables: `docs/configuration/ENV_VARIABLES_REFERENCE.md`
- Backend Guide: `backend/README.md`
- Frontend Guide: `frontend/README.md`
We welcome contributions! Here's how you can help:
- Report Bugs: Open an issue with detailed reproduction steps
- Suggest Features: Share your ideas for improvements
- Submit PRs: Fix bugs or add new features
- Improve Docs: Help make documentation clearer
- Share Feedback: Tell us about your experience
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Test thoroughly
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
- Follow PEP 8 for Python code
- Use ESLint for JavaScript
- Write tests for new features
- Update documentation
- Add comments for complex logic
This project is licensed under the MIT License - see the LICENSE file for details.
- Commercial use allowed
- Modification allowed
- Distribution allowed
- Private use allowed
- OpenAI Whisper - Audio transcription
- FastAPI - Web framework
- Groq - Fast AI inference
- OpenRouter - Multi-model API gateway
- PyAnnote - Speaker diarization
- Baisampayan Dey
- Sanjit Vinod Pandey
- Dhruv Motovall
- Aryan Patil
- Documentation: Check component READMEs
- GitHub Issues: Report problems
- Discussions: Community forum
- Star this repository to get updates
- Watch for new releases
- Follow on GitHub for news
- Mobile App: iOS and Android applications
- Cloud Storage: Integration with Google Drive, Dropbox
- Calendar Integration: Automatic meeting scheduling
- Video Processing: Video file transcription
- Language Support: 50+ languages
- Team Features: Multi-user workspaces
- Analytics Dashboard: Meeting insights and trends
- Export Formats: PDF, DOCX, PPTX
- Integrations: Slack, Teams, Notion
- Voice Commands: Hands-free control