Meet Hearo - a real-time meeting companion that transcribes speech with Whisper (via faster-whisper), streams smart keywords alongside your call, and turns each tap into instant answers. Click any keyword to get a bite‑size summary, the newest relevant image, and up‑to‑the‑minute news - all without leaving your meeting.
Hearo is a real-time note‑taking and information‑lookup assistant for online meetings and classes. It tackles information overload and the risk of missing key points when you have to listen, take notes, and search for information all at once.
Hearo captures system audio (Zoom, Google Meet, YouTube, etc.), converts speech to text using Whisper, extracts important keywords with spaCy, and shows those keywords in a floating overlay so you can follow the main ideas without taking notes by hand. When you click a keyword, Hearo instantly returns a short summary, the latest image, and related news.
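The flow described above (capture, transcribe, extract keywords, display) can be sketched as a small producer/consumer loop. This is a minimal illustration, not Hearo's code: `run_pipeline` and the stage callables are hypothetical stand-ins for the soundcard, faster-whisper, and spaCy steps, and the bounded queue is an assumed way to let capture run ahead of transcription while capping memory use.

```python
import queue
import threading
from collections.abc import Callable, Iterable

# Hypothetical stage types: in Hearo these would wrap soundcard (capture),
# faster-whisper (transcription) and spaCy (keyword extraction).
AudioChunk = bytes
Transcriber = Callable[[AudioChunk], str]
KeywordExtractor = Callable[[str], list[str]]


def run_pipeline(
    chunks: Iterable[AudioChunk],
    transcribe: Transcriber,
    extract_keywords: KeywordExtractor,
    on_keywords: Callable[[list[str]], None],
) -> None:
    """Capture -> transcribe -> extract -> display.

    A background thread feeds audio chunks into a bounded queue so capture
    can run ahead of transcription without unbounded memory growth.
    """
    audio_q: queue.Queue = queue.Queue(maxsize=8)

    def capture() -> None:
        for chunk in chunks:
            audio_q.put(chunk)  # waits if 8 chunks are already pending
        audio_q.put(None)  # sentinel: no more audio

    producer = threading.Thread(target=capture, daemon=True)
    producer.start()

    while (chunk := audio_q.get()) is not None:
        keywords = extract_keywords(transcribe(chunk))
        if keywords:
            on_keywords(keywords)  # e.g. show chips in the overlay
    producer.join()
```

Passing fake stages (for example `transcribe=lambda c: c.decode()`) makes the loop testable without audio hardware or a model. In a real app, `on_keywords` would update the PyQt6 overlay, which Qt requires to happen on the GUI thread (for example through a signal).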
Project by team Free Five 🆓5️⃣:
- Tran Xuan Bao - 23020332
- Nguyen Hoang Tu - 23020428
- Bui Thanh Dan - 23020342
- Nguyen Duy Hai Bang - 23020335
- Le Vu Hieu - 23020365
| Component | Technologies / Notes |
|---|---|
| UI/Overlay | PyQt6 floating panel (glassmorphism‑style) |
| Speech-to-Text | OpenAI Whisper (via faster-whisper) |
| NLP for Keywords | spaCy (noun chunks & proper nouns) |
| Audio Capture | soundcard (system loopback) |
| Search Providers | Images / definitions / news via external APIs |
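The table lists spaCy noun chunks and proper nouns as the keyword source. Once those candidates are collected, they still need cleaning and ranking before they become chips. The sketch below assumes a simple frequency-based approach and takes candidate strings as plain input so it runs without a spaCy model; the determiner list, `top_k`, and `min_count` are illustrative assumptions, not Hearo's actual rules.

```python
from collections import Counter

# Assumed list of leading words to strip from noun chunks ("the model" -> "model").
LEADING_STOPWORDS = {"a", "an", "the", "this", "that", "these", "those", "our", "your", "my"}


def normalize(phrase: str) -> str:
    """Lower-case, collapse whitespace, and drop leading determiners."""
    words = phrase.lower().split()
    while words and words[0] in LEADING_STOPWORDS:
        words = words[1:]
    return " ".join(words)


def rank_keywords(candidates: list[str], top_k: int = 5, min_count: int = 1) -> list[str]:
    """Deduplicate candidate phrases and return the most frequent ones.

    Ties keep first-appearance order, because Counter.most_common orders
    equal counts by when they were first encountered.
    """
    counts = Counter(n for n in map(normalize, candidates) if n)
    return [kw for kw, c in counts.most_common() if c >= min_count][:top_k]


print(rank_keywords(["the Whisper model", "spaCy", "Whisper model", "a meeting"], top_k=2))
# ['whisper model', 'spacy']
```

With spaCy loaded, the candidate list could be built as `[c.text for c in doc.noun_chunks] + [t.text for t in doc if t.pos_ == "PROPN"]`. Lower-casing merges variants such as "Whisper model" and "whisper model"; an app that wants to display the original casing would need to keep the first-seen form separately.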
- Python 3.11+
- ffmpeg (recommended for audio/Whisper)
- Permission for system audio loopback recording (if you capture system sound)
- (Optional) Git for faster source download
```bash
# Install git if you don't have it
git clone https://github.com/its6ueq/Hearo.git Hearo
cd Hearo
pip install --upgrade pip
pip install -r requirements.txt
python -m Hearo.main
```
- STT and keyword extraction are processed locally with Whisper & spaCy.
- Lookups for images/news may call external APIs: please review and accept the privacy policy before enabling.
- Options to limit or hide sensitive content; audio is not stored unless you explicitly allow it.
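The privacy rules above (external lookups only after the policy is accepted, no audio stored by default) map naturally onto opt-in settings whose defaults are the most private choice. This is a minimal sketch; `PrivacySettings`, its field names, and `may_call_external_api` are hypothetical, not Hearo's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySettings:
    """Hypothetical settings object; every field defaults to the most private choice."""

    external_lookups_enabled: bool = False  # image/news APIs are opt-in
    privacy_policy_accepted: bool = False   # user must accept the policy first
    store_audio: bool = False               # audio is not kept unless allowed


def may_call_external_api(settings: PrivacySettings) -> bool:
    """External providers are called only after opt-in AND policy acceptance."""
    return settings.external_lookups_enabled and settings.privacy_policy_accepted
```

Making the dataclass frozen means a settings change produces a new object rather than silently mutating shared state, which keeps the check above easy to reason about.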
- Near real‑time speech‑to‑text; supports Vietnamese and multilingual scenarios.
- Maintains conversation context across the meeting.
- Automatically extracts keywords (noun chunks / proper nouns).
- Displays keywords as compact, clickable chips.
- Topic grouping and timeline tracking.
- For each keyword: returns a short summary, latest images, and related news.
- One click - zero context switching.
- Pluggable search providers (images/definitions/news).
- Source filters and timestamps for contextual reference.
Delivery phases
- Phase 1: Audio capture → Whisper → spaCy → Overlay → Search Engine pipeline.
- Phase 2: Desktop MVP; click‑to‑lookup.
- Phase 3: Speed optimizations for smooth, near‑real‑time performance on CPU.
- Phase 4: Beta testing with students/office workers/tech learners.
- Phase 5: Premium tier - content saving, account‑based keyword sync, context summaries.
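One common way to get near-real-time transcription on CPU is to feed the model short, overlapping audio windows instead of waiting for the whole recording. The sketch below assumes that approach; `sliding_windows` is a hypothetical helper, and the window and hop sizes are illustrative, not values from Hearo.

```python
from collections.abc import Iterable, Iterator


def sliding_windows(samples: Iterable[float], window: int, hop: int) -> Iterator[list[float]]:
    """Yield fixed-size windows that advance by `hop` samples.

    With hop < window, consecutive windows overlap by (window - hop) samples,
    so a word cut at one boundary is still whole in the next window.
    A trailing partial window is not emitted.
    """
    if not 0 < hop <= window:
        raise ValueError("require 0 < hop <= window")
    buf: list[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == window:
            yield list(buf)
            del buf[:hop]  # keep the last (window - hop) samples as overlap


print(list(sliding_windows(range(8), window=4, hop=2)))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Whisper models expect 16 kHz audio, so a 5-second window with 1 second of overlap would be `window=80_000, hop=64_000`. Because windows overlap, consecutive transcripts may repeat a few words, which the keyword stage should deduplicate.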
Additional directions
- Real‑time captions to support the hard‑of‑hearing community.
- Integrations with popular online meeting/learning platforms.
- Security hardening: least‑privilege options; opt‑out of external data sharing.
Highlights
- Live, interactive keyword overlay - a novel UX approach.
- Context‑aware grouping vs. isolated keywords.
- Clear path to monetization: standalone app, plugins, or B2B API.
- This document summarizes the system; deeper technical docs will be added when the codebase is ready.
- Questions or suggestions? Please contact the team members listed above.
- Made with ❤️ by Free Five.