
This project builds an LLM-powered audio summarization pipeline that converts spoken content into concise, meaningful text summaries. It integrates speech-to-text processing with large language models to demonstrate practical applications of Generative AI for content understanding and automation.


Mounika-Geriki/Audio_Summary_LLM_GenAI_Capstone_Kaggle


🎙️ Audio_Summary_LLM_GenAI_Capstone_Kaggle

From Voice to Action: AI-Powered Meeting Assistant


An end-to-end AI system that transforms audio recordings into structured meeting insights using Google Gemini.
This project demonstrates real-world Retrieval-Augmented Generation (RAG) and advanced LLM reasoning to automate tasks such as:

  • 🎧 Transcription
  • 📝 Summarization
  • 📌 Action item extraction
  • 🧑‍🤝‍🧑 Speaker-based assignment
  • 📤 Export to JSON & Markdown

🚀 Project Overview

This project converts raw meeting audio into actionable insights.
The workflow includes:

  1. Audio Transcription – Converts speech into text using Gemini’s audio capabilities
  2. Summarization – Extracts concise summaries of meeting discussions
  3. Action Item Extraction – Identifies to-dos and assigns them to speakers
  4. Fine-Tuning Pipeline (optional) – Improves action-item detection using custom examples
  5. Exportable Outputs – Produces clean JSON + Markdown summaries

This enables teams to move from “What did we discuss?” to “Here are the tasks, decisions, and next steps.”
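
The summarization and action-item stages of the workflow above can be sketched as two prompts run over the same transcript. This is a minimal illustration, not the project's actual code: `run_pipeline`, `MeetingResult`, and the prompt wording are hypothetical, and the model is passed in as a plain callable so the example runs without an API key. The transcription step is left out because it depends on the Gemini client the project uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns model text.
Generate = Callable[[str], str]


@dataclass
class MeetingResult:
    transcript: str
    summary: str
    action_items_raw: str


def run_pipeline(transcript: str, generate: Generate) -> MeetingResult:
    """Run the summary stage, then the action-item stage, on one transcript."""
    summary = generate(
        "Summarize the key discussion points and decisions in this meeting:\n\n"
        + transcript
    )
    action_items_raw = generate(
        "List the action items in this meeting as JSON with the fields "
        "task, assigned_to and deadline:\n\n" + transcript
    )
    return MeetingResult(transcript, summary, action_items_raw)
```

In a real run, `generate` would wrap a call to Gemini. Keeping it injectable also makes the pipeline easy to test with a stub function.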


🧠 GenAI Capabilities Demonstrated

  • Audio Understanding — Convert audio → text
  • Structured Summaries — Hierarchical, clean outputs
  • Action Item Reasoning — Detect tasks, owners, and deadlines
  • Speaker Attribution — Map tasks to individuals
  • Model Fine-Tuning — Add domain-specific consistency
  • Multi-format Output — JSON + Markdown for downstream systems
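
For action-item reasoning, one practical detail is turning the model's reply into data. The sketch below assumes the model is asked for JSON and may wrap its answer in a Markdown code fence, which LLMs sometimes do. `build_action_item_prompt` and `extract_json` are hypothetical helper names, and the field names are taken from the sample output in this README.

```python
import json
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def build_action_item_prompt(transcript: str) -> str:
    """Ask for tasks, owners, and deadlines in a fixed JSON shape."""
    return (
        "Extract the action items from the meeting transcript below. "
        'Reply with JSON of the form {"action_items": [{"task": ..., '
        '"assigned_to": ..., "deadline": ...}]}. Omit "deadline" when none '
        "was mentioned.\n\n" + transcript
    )


def extract_json(reply: str):
    """Parse JSON from a reply, unwrapping a Markdown code fence if present."""
    match = _FENCED_BLOCK.search(reply)
    candidate = match.group(1) if match else reply
    return json.loads(candidate.strip())
```

`extract_json` raises `json.JSONDecodeError` when the reply is not valid JSON, so the caller can decide whether to retry the request.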


🔧 Installation

1️⃣ Clone repository

git clone https://github.com/Mounika-Geriki/Audio_Summary_LLM_GenAI_Capstone_Kaggle.git
cd Audio_Summary_LLM_GenAI_Capstone_Kaggle

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Add your Gemini API key

Create a .env file:

GEMINI_API_KEY=your_key_here
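
A script can read the key from `.env` roughly as follows. This is a standard-library sketch under the assumption that the file holds simple `KEY=VALUE` lines. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical, and the project may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Keep `.env` out of version control (for example via `.gitignore`) so the key is not committed.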

▶️ Usage

Run the tool on an audio file:

python audio_summary.py --audio sample_audio/meeting1.wav

Produces:

  • transcript.txt
  • summary.md
  • action_items.json
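
The command-line entry point could be structured along these lines. This is a sketch only: the `--audio` flag and the three output file names come from this README, while `parse_args`, `write_outputs`, and the `out_dir` parameter are hypothetical and may not match `audio_summary.py`.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the single documented flag, --audio."""
    parser = argparse.ArgumentParser(description="Summarize a meeting recording.")
    parser.add_argument("--audio", required=True, help="Path to the audio file")
    return parser.parse_args(argv)


def write_outputs(out_dir, transcript, summary_md, action_items):
    """Write the three output files and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "transcript.txt").write_text(transcript, encoding="utf-8")
    (out / "summary.md").write_text(summary_md, encoding="utf-8")
    (out / "action_items.json").write_text(
        json.dumps({"action_items": action_items}, indent=2), encoding="utf-8"
    )
    return [out / name for name in ("transcript.txt", "summary.md", "action_items.json")]
```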

📊 Sample Output Formats

📝 Summaries (Markdown)

## 📝 Meeting Summary

### Key Discussion Points
- Budget approval is pending finance sign-off
- Marketing team needs final design assets by Friday

### Decisions Made
- Launch date confirmed for April 12
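
A summary in this layout can be produced from two plain lists. The function below is an illustrative sketch that mirrors the headings in the sample; `render_summary_markdown` is a hypothetical name, not necessarily how the project builds `summary.md`.

```python
def render_summary_markdown(discussion_points, decisions) -> str:
    """Render discussion points and decisions in the sample's Markdown layout."""
    lines = ["## 📝 Meeting Summary", "", "### Key Discussion Points"]
    lines += [f"- {point}" for point in discussion_points]
    lines += ["", "### Decisions Made"]
    lines += [f"- {decision}" for decision in decisions]
    return "\n".join(lines) + "\n"
```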

📌 Action Items (JSON)

{
  "action_items": [
    {
      "task": "Prepare final design assets",
      "assigned_to": "Alex",
      "deadline": "Friday"
    },
    {
      "task": "Send updated budget to finance",
      "assigned_to": "Priya"
    }
  ]
}
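
Downstream code can load this file into typed records. In the sketch below, `ActionItem`, `parse_action_items`, and `group_by_owner` are hypothetical names. Treating `task` and `assigned_to` as required and `deadline` as optional is an assumption based on the sample, where the second item has no deadline.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    task: str
    assigned_to: str
    deadline: Optional[str] = None  # absent when no deadline was mentioned


def parse_action_items(raw: str) -> list:
    """Parse the action_items JSON and check the assumed required fields."""
    data = json.loads(raw)
    items = []
    for entry in data.get("action_items", []):
        if "task" not in entry or "assigned_to" not in entry:
            raise ValueError(f"action item is missing a required field: {entry}")
        items.append(
            ActionItem(entry["task"], entry["assigned_to"], entry.get("deadline"))
        )
    return items


def group_by_owner(items) -> dict:
    """Map each owner to the list of tasks assigned to them."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.assigned_to, []).append(item.task)
    return grouped
```

Grouping by owner gives each speaker their own to-do list, which matches the speaker-based assignment described earlier.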

🔮 Future Improvements

  • 🌍 Multi-language meeting support
  • 📚 Domain-specific vocabulary enhancements
  • 😊 Sentiment & tone analysis
  • 🗂️ Topic classification across meetings
  • 🕒 Decision tracking over time

🏁 Conclusion

This project provides a complete workflow for turning raw audio into structured, actionable insights. It serves as a strong foundation for:

  • Meeting assistants
  • Customer service analytics
  • Interview summarization
  • Enterprise knowledge capture

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
