An advanced speech recognition system that goes beyond transcription to detect emotional context in spoken language. This student project combines OpenAI's Whisper for speech recognition with emotion detection models to analyze both vocal patterns and textual content.
This is an individual student project developed to explore the intersection of speech recognition and emotion detection. The goal was to create a practical application that demonstrates how AI can understand not just the words we say, but the emotional context behind them.
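At its core, the pipeline transcribes speech with Whisper and then scores emotions twice: once on the raw audio and once on the transcript. Here is a minimal sketch of that idea, assuming the `openai-whisper` and `transformers` packages; the `analyze` function and its structure are illustrative, not the project's actual API:

```python
# Minimal sketch of the dual-analysis idea; assumes `openai-whisper` and
# `transformers` are installed. Names here are illustrative, not the repo's API.
import whisper
from transformers import pipeline

def analyze(audio_path: str):
    # 1) Speech-to-text with Whisper (language is auto-detected).
    asr_model = whisper.load_model("small")
    transcript = asr_model.transcribe(audio_path)["text"]

    # 2) Emotion from vocal patterns, straight from the waveform.
    audio_emotions = pipeline(
        "audio-classification", model="superb/wav2vec2-base-superb-er"
    )(audio_path)

    # 3) Emotion from the words themselves, via the transcript.
    text_emotions = pipeline(
        "text-classification",
        model="j-hartmann/emotion-english-distilroberta-base",
        top_k=None,
    )(transcript)

    return transcript, audio_emotions, text_emotions
```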
## Features

- Dual Emotion Analysis: Detects emotions from both audio characteristics and textual content
- Multi-language Support: Automatic language detection with Whisper ASR
- Real-time Processing: Fast analysis with cached models for better performance
- Interactive Visualizations: Beautiful charts comparing audio vs. text emotions (see the sketch after this list)
- Web Interface: User-friendly Streamlit application
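As an illustration of the visualization feature, here is a hedged sketch of charting audio and text scores side by side in Streamlit; the numbers below are placeholders, and the real app derives them from the models:

```python
# Placeholder sketch of the audio-vs-text comparison chart; the real app
# computes these scores from the two emotion models instead of hard-coding them.
import pandas as pd
import streamlit as st

scores = pd.DataFrame(
    {
        "audio": {"anger": 0.10, "happiness": 0.60, "neutral": 0.20, "sadness": 0.10},
        "text": {"anger": 0.05, "happiness": 0.70, "neutral": 0.15, "sadness": 0.10},
    }
)
st.bar_chart(scores)  # one row per emotion, one series per modality
```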
Check out the deployed website here!
## Installation

- Clone the repository

  ```bash
  git clone https://github.com/antarades/emotion-aware-automatic-speech-recognition.git
  cd emotion-aware-automatic-speech-recognition
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Install system dependencies (for audio processing)

  ```bash
  # macOS
  brew install ffmpeg

  # Windows
  choco install ffmpeg
  ```
## Usage

Launch the web interface:

```bash
streamlit run app.py
```

Or run the CLI pipeline directly:

```bash
# Analyze an audio file
python src/pipeline.py --mode file --audio_file path/to/audio.wav

# Record and analyze audio via terminal
python src/pipeline.py --mode record
```
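Both CLI modes share one entry point. As a hypothetical sketch (the actual argument handling in `src/pipeline.py` may differ), the flags above could be wired up with `argparse` like this:

```python
# Hypothetical sketch of how the CLI flags above could be parsed; the actual
# argument handling in src/pipeline.py may differ.
import argparse

parser = argparse.ArgumentParser(description="Emotion-aware ASR pipeline")
parser.add_argument("--mode", choices=["file", "record"], required=True,
                    help="analyze an existing file or record from the microphone")
parser.add_argument("--audio_file", help="path to the audio file (file mode only)")
args = parser.parse_args()
```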
## Project Structure

```
emotion-aware-asr/
├── src/
│   ├── asr_whisper.py       # Whisper ASR wrapper
│   ├── emotion_model.py     # Audio and text emotion models
│   ├── pipeline.py          # CLI pipeline
│   └── record_audio.py      # Terminal audio recording utility
├── app.py                   # Streamlit web application
├── home-image.svg           # Home page illustration
├── requirements.txt
├── packages.txt
└── README.md
```
## Models

- ASR: OpenAI Whisper (small variant) for accurate speech-to-text conversion with multi-language support
- Audio Analysis: `superb/wav2vec2-base-superb-er` detects anger, happiness, neutrality, and sadness from vocal patterns
- Text Analysis: `j-hartmann/emotion-english-distilroberta-base` detects joy, sadness, anger, fear, surprise, disgust, and neutrality from text content
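One practical detail: the SUPERB ER checkpoint emits abbreviated labels such as `hap` and `neu`, so some mapping to readable names is presumably applied before results are displayed. A hedged sketch, with the mapping assumed rather than taken from the repo:

```python
# Hedged example: the SUPERB ER checkpoint emits abbreviated labels (e.g. "hap"),
# so a mapping to readable names is assumed here before printing.
from transformers import pipeline

LABELS = {"ang": "anger", "hap": "happiness", "neu": "neutral", "sad": "sadness"}

audio_clf = pipeline("audio-classification", model="superb/wav2vec2-base-superb-er")
for result in audio_clf("clip.wav"):  # top predictions with confidence scores
    print(LABELS.get(result["label"], result["label"]), round(result["score"], 3))
```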
## Future Enhancements

- Web-based Audio Recording: Direct audio recording capability within the web interface
- Additional Language Support: Expanded emotion detection for non-English languages
## Customization

You can customize the application by adjusting the options below (the first two are sketched after the list):

- Model Size: Change the Whisper model size in `app.py` (tiny, base, small, medium, large)
- Language Forcing: Force a specific language in the `transcribe_audio` function instead of relying on auto-detection
- Emotion Thresholds: Modify the confidence thresholds in `emotion_model.py`
- UI Styling: Customize the Streamlit interface in the CSS section of `app.py`
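For the first two options, here is an illustrative snippet of what the change looks like with the `openai-whisper` API; the exact code in `app.py` and `transcribe_audio` may differ:

```python
# Illustrative snippet for the Model Size and Language Forcing options;
# the exact code in app.py and transcribe_audio may differ.
import whisper

model = whisper.load_model("base")  # swap "small" for tiny/base/medium/large
result = model.transcribe("clip.wav", language="en")  # force English, skip auto-detect
print(result["text"])
```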
## Performance Notes

- The small Whisper model provides a good balance between accuracy and speed
- Emotion detection takes approximately 2-5 seconds, depending on audio length
- Models are cached after first load for faster subsequent processing (a caching sketch follows this list)
- Audio recording is currently terminal-based; web recording is planned for a future version
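A plausible pattern behind the caching note is Streamlit's `st.cache_resource`, which keeps a loaded model alive across reruns; the app's actual implementation may differ:

```python
# A plausible caching pattern behind "cached after first load", using
# Streamlit's cache_resource; the app's actual implementation may differ.
import streamlit as st
import whisper

@st.cache_resource  # loaded once per server process, reused across reruns
def get_asr_model(size: str = "small"):
    return whisper.load_model(size)
```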
## License

This project is licensed under the MIT License.
Built by Antara Srivastava 📧 [email protected] 🌐 github.com/antarades


