🎙️ Audio_Summary_LLM_GenAI_Capstone_Kaggle

From Voice to Action: AI-Powered Meeting Assistant

An end-to-end AI system that transforms audio recordings into structured meeting insights using Google Gemini.
This project demonstrates real-world Retrieval-Augmented Generation (RAG) and advanced LLM reasoning to automate tasks such as:

🎧 Transcription
📝 Summarization
📌 Action item extraction
🧑‍🤝‍🧑 Speaker-based assignment
📤 Export to JSON & Markdown

🚀 Project Overview

This project converts raw meeting audio into actionable insights.
The workflow includes:

Audio Transcription – Converts speech into text using Gemini’s audio capabilities
Summarization – Extracts concise summaries of meeting discussions
Action Item Extraction – Identifies to-dos and assigns them to speakers
Fine-Tuning Pipeline – (Optional) Improves action-item detection using custom examples
Exportable Outputs – Produces clean JSON + Markdown summaries

This enables teams to move from “What did we discuss?” to
“Here are the tasks, decisions, and next steps.”
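
The workflow above can be sketched as a small three-step pipeline. This is a minimal sketch, not the repository's actual code: `ask_model` is a hypothetical callable standing in for a Gemini request, and the function names, prompts, and `MeetingInsights` container are illustrative assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a Gemini request: takes a prompt plus a payload
# (an audio path or a transcript) and returns the model's text reply.
AskModel = Callable[[str, str], str]


@dataclass
class MeetingInsights:
    transcript: str
    summary: str
    action_items: list


def run_pipeline(audio_path: str, ask_model: AskModel) -> MeetingInsights:
    """Chain the three core steps: transcribe, summarize, extract action items."""
    transcript = ask_model("Transcribe this meeting audio with speaker labels.", audio_path)
    summary = ask_model("Summarize the key points and decisions in Markdown.", transcript)
    raw_items = ask_model(
        'Return JSON of the form {"action_items": [{"task", "assigned_to", "deadline"}]}.',
        transcript,
    )
    # Assumes the reply is bare JSON; a real reply may need cleanup first.
    action_items = json.loads(raw_items).get("action_items", [])
    return MeetingInsights(transcript, summary, action_items)
```

Passing the model call in as a parameter keeps the pipeline testable offline: a stub function can replace the real API during development.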

🧠 GenAI Capabilities Demonstrated

Audio Understanding — Convert audio → text
Structured Summaries — Hierarchical, clean outputs
Action Item Reasoning — Detect tasks, owners, and deadlines
Speaker Attribution — Map tasks to individuals
Model Fine-Tuning — Add domain-specific consistency
Multi-format Output — JSON + Markdown for downstream systems

🔧 Installation

1️⃣ Clone repository

```bash
git clone https://github.com/your-username/audio-summary.git
cd audio-summary
```

2️⃣ Install dependencies

```bash
pip install -r requirements.txt
```

3️⃣ Add your Gemini API key

Create a .env file:

```
GEMINI_API_KEY=your_key_here
```
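
One way a script might read this key is sketched below, using only the standard library. This is an assumption for illustration: the repository may load the file differently (for example with `python-dotenv`), and `load_env_file` and `get_gemini_api_key` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines; skip blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear error avoids a less obvious authentication failure later in the run.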

▶️ Usage

Run the tool on an audio file:

```bash
python audio_summary.py --audio sample_audio/meeting1.wav
```

This produces three files:

transcript.txt
summary.md
action_items.json
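
The command-line interface and output files above could be wired up roughly as follows. This is a sketch under assumptions: only the `--audio` flag and the three file names come from this README; `build_parser`, `output_paths`, and the `out_dir` parameter are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Define the single --audio flag shown in the usage example."""
    parser = argparse.ArgumentParser(description="Summarize a meeting recording.")
    parser.add_argument("--audio", required=True, type=Path, help="Path to the audio file")
    return parser


def output_paths(out_dir: Path = Path(".")) -> dict:
    """Map each output named in this README to a path under out_dir."""
    return {
        "transcript": out_dir / "transcript.txt",
        "summary": out_dir / "summary.md",
        "action_items": out_dir / "action_items.json",
    }
```

For example, `build_parser().parse_args(["--audio", "sample_audio/meeting1.wav"])` yields a namespace whose `audio` attribute is a `Path`.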

📊 Sample Output Formats

📝 Summaries (Markdown)

```markdown
## 📝 Meeting Summary

### Key Discussion Points
- Budget approval is pending finance sign-off
- Marketing team needs final design assets by Friday

### Decisions Made
- Launch date confirmed for April 12
```
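
A Markdown summary in this shape could be rendered from two plain lists, as sketched here. `render_summary` is a hypothetical helper for illustration; the project may instead ask the model to emit the Markdown directly.

```python
def render_summary(discussion_points: list, decisions: list) -> str:
    """Build the Markdown summary from two lists of plain-text bullet items."""
    lines = ["## 📝 Meeting Summary", "", "### Key Discussion Points"]
    lines += [f"- {point}" for point in discussion_points]
    lines += ["", "### Decisions Made"]
    lines += [f"- {decision}" for decision in decisions]
    return "\n".join(lines) + "\n"
```

Rendering from structured data rather than free-form model text keeps the heading layout identical across meetings.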

📌 Action Items (JSON)

```json
{
  "action_items": [
    {
      "task": "Prepare final design assets",
      "assigned_to": "Alex",
      "deadline": "Friday"
    },
    {
      "task": "Send updated budget to finance",
      "assigned_to": "Priya"
    }
  ]
}
```
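
Downstream code can load this JSON into typed records. The sketch below assumes `task` and `assigned_to` are always present and `deadline` is optional, as the sample suggests; `ActionItem` and `parse_action_items` are hypothetical names, not part of the project.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    task: str
    assigned_to: str
    deadline: Optional[str] = None  # not every item carries a deadline


def parse_action_items(raw: str) -> list:
    """Parse the action_items JSON, rejecting entries missing required fields."""
    data = json.loads(raw)
    items = []
    for entry in data.get("action_items", []):
        if "task" not in entry or "assigned_to" not in entry:
            raise ValueError(f"Action item missing task or assigned_to: {entry}")
        items.append(ActionItem(entry["task"], entry["assigned_to"], entry.get("deadline")))
    return items
```

Validating at load time surfaces malformed model output immediately instead of in whatever system consumes the tasks.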

🔮 Future Improvements

🌍 Multi-language meeting support
📚 Domain-specific vocabulary enhancements
😊 Sentiment & tone analysis
🗂️ Topic classification across meetings
🕒 Decision tracking over time

🏁 Conclusion

This project provides a complete workflow for turning raw audio into structured, actionable insights. It serves as a strong foundation for:

Meeting assistants
Customer service analytics
Interview summarization
Enterprise knowledge capture

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
