An end-to-end AI system that transforms audio recordings into structured meeting insights using Google Gemini.
This project demonstrates real-world Retrieval-Augmented Generation (RAG) and advanced LLM reasoning to automate tasks such as:
- 🎧 Transcription
- 📝 Summarization
- 📌 Action item extraction
- 🧑‍🤝‍🧑 Speaker-based assignment
- 📤 Export to JSON & Markdown
This project converts raw meeting audio into actionable insights.
The workflow includes:
- Audio Transcription – Converts speech into text using Gemini’s audio capabilities
- Summarization – Extracts concise summaries of meeting discussions
- Action Item Extraction – Identifies to-dos and assigns them to speakers
- Fine-Tuning Pipeline – (Optional) Improves action-item detection using custom examples
- Exportable Outputs – Produces clean JSON + Markdown summaries
This enables teams to move from “What did we discuss?” to
“Here are the tasks, decisions, and next steps.”
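
The summarization and action-item steps of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes transcription has already produced a text transcript, the prompt wording and the names `run_pipeline`, `strip_code_fence`, and `generate` are hypothetical, and the model call is passed in as a plain function so the sketch does not depend on any particular Gemini client API.

```python
import json
from typing import Callable

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap around JSON

# Hypothetical prompts; the project's real prompts are not shown in this README.
SUMMARY_PROMPT = "Summarize the meeting transcript below as Markdown.\n\nTranscript:\n"
ACTION_PROMPT = (
    "Extract action items from the meeting transcript below. "
    'Reply with JSON only, shaped as {"action_items": [{"task": "...", '
    '"assigned_to": "...", "deadline": "..."}]}.\n\nTranscript:\n'
)


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence, if the reply has one."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json").
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    return text.strip()


def run_pipeline(transcript: str, generate: Callable[[str], str]) -> dict:
    """Summarize a transcript and extract its action items.

    `generate` is any function that sends a prompt to a model and returns
    the reply text (an assumption made to keep this sketch client-agnostic).
    """
    summary = generate(SUMMARY_PROMPT + transcript).strip()
    raw = generate(ACTION_PROMPT + transcript)
    items = json.loads(strip_code_fence(raw)).get("action_items", [])
    return {"summary": summary, "action_items": items}
```

Passing the model call in as a function also makes the flow easy to exercise offline with a stub that returns canned replies.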
- Audio Understanding — Convert audio → text
- Structured Summaries — Hierarchical, clean outputs
- Action Item Reasoning — Detect tasks, owners, and deadlines
- Speaker Attribution — Map tasks to individuals
- Model Fine-Tuning — Add domain-specific consistency
- Multi-format Output — JSON + Markdown for downstream systems
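
As an illustration of the action-item and speaker-attribution features above, the sketch below validates extracted items and groups them by owner. The field names (`task`, `assigned_to`, `deadline`) follow the sample JSON in this README; the `ActionItem` class, the helper names, and the "Unassigned" fallback are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ActionItem:
    task: str
    assigned_to: str = "Unassigned"  # assumption: fallback when no owner is detected
    deadline: Optional[str] = None   # kept as free text, e.g. "Friday"


def parse_action_items(payload: dict) -> List[ActionItem]:
    """Turn a decoded action_items payload into validated ActionItem objects."""
    items = []
    for entry in payload.get("action_items", []):
        task = str(entry.get("task", "")).strip()
        if not task:
            continue  # skip entries without a usable task description
        items.append(
            ActionItem(
                task=task,
                assigned_to=entry.get("assigned_to") or "Unassigned",
                deadline=entry.get("deadline") or None,
            )
        )
    return items


def group_by_owner(items: List[ActionItem]) -> Dict[str, List[ActionItem]]:
    """Map each owner to their tasks, preserving first-seen order."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.assigned_to].append(item)
    return dict(grouped)
```

Treating the deadline as optional matters in practice, since not every task mentioned in a meeting comes with a due date.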
git clone https://github.com/your-username/audio-summary.git
cd audio-summary
pip install -r requirements.txt

Create a .env file:

GEMINI_API_KEY=your_key_here

Run the tool on an audio file:

python audio_summary.py --audio sample_audio/meeting1.wav

Produces:
- transcript.txt
- summary.md
- action_items.json
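
One way the three output files could be written is sketched below. The file names come from the list above; the `write_outputs` function, its parameters, and the choice of an output directory are hypothetical and may differ from what `audio_summary.py` actually does.

```python
import json
from pathlib import Path


def write_outputs(out_dir, transcript, summary_md, action_items):
    """Write transcript.txt, summary.md, and action_items.json into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    paths = {
        "transcript": out / "transcript.txt",
        "summary": out / "summary.md",
        "action_items": out / "action_items.json",
    }
    paths["transcript"].write_text(transcript, encoding="utf-8")
    paths["summary"].write_text(summary_md, encoding="utf-8")
    # ensure_ascii=False keeps non-English names and text readable in the file.
    paths["action_items"].write_text(
        json.dumps({"action_items": action_items}, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return paths
```

Returning the paths lets a caller log or post-process the files without rebuilding their names.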
## 📝 Meeting Summary
### Key Discussion Points
- Budget approval is pending finance sign-off
- Marketing team needs final design assets by Friday
### Decisions Made
- Launch date confirmed for April 12

{
  "action_items": [
    {
      "task": "Prepare final design assets",
      "assigned_to": "Alex",
      "deadline": "Friday"
    },
    {
      "task": "Send updated budget to finance",
      "assigned_to": "Priya"
    }
  ]
}

- 🌍 Multi-language meeting support
- 📚 Domain-specific vocabulary enhancements
- 😊 Sentiment & tone analysis
- 🗂️ Topic classification across meetings
- 🕒 Decision tracking over time

This project provides a complete workflow for turning raw audio into structured, actionable insights. It serves as a strong foundation for:
- Meeting assistants
- Customer service analytics
- Interview summarization
- Enterprise knowledge capture
This project is licensed under the MIT License - see the LICENSE file for details.