This demo showcases a multimodal AI RAG agent that uses Speech-To-Text (STT) and Text-To-Speech (TTS) for LLM interactions, powered by Deepgram and Groq LPUs.
Sentence Transformers builds vector embeddings for the user message and the uploaded documents; the documents are then ranked by cosine similarity against the message so that only the most relevant ones are passed to the LLM as context. Both dense and sparse retrieval pipelines are available, with hybrid search options: BM25 for sparse search and ColBERT reranking.
Transcription sessions are persisted through a SQLAlchemy/ChromaDB database connection.
The demo streams both STT and TTS to reduce response latency.
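
The hybrid retrieval idea described above (cosine similarity over embeddings combined with BM25) can be illustrated with a small, dependency-free sketch. This is not the demo's actual code: the function names, the min-max score fusion, the `alpha` weight, and the hand-written toy vectors are all assumptions made for illustration. In the real app the vectors would come from Sentence Transformers, and ColBERT reranking is not shown here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter()
    for d in docs_tokens:
        doc_freq.update(set(d))  # count each term once per document
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_vec, query_tokens, doc_vecs, docs_tokens, alpha=0.5):
    """Return document indices, best first, by a weighted dense/sparse score.

    alpha is a hypothetical weight: 1.0 means dense only, 0.0 means BM25 only.
    """
    if not doc_vecs:
        return []
    dense = min_max([cosine_similarity(query_vec, v) for v in doc_vecs])
    sparse = min_max(bm25_scores(query_tokens, docs_tokens))
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Toy data: hand-written 3-d vectors standing in for real embeddings.
docs = [
    "groq runs llm inference",
    "deepgram streams speech to text",
    "flask serves the web app",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]]
ranking = hybrid_rank(
    query_vec=[0.2, 0.8, 0.1],
    query_tokens="speech to text".split(),
    doc_vecs=doc_vecs,
    docs_tokens=[d.split() for d in docs],
)
print([docs[i] for i in ranking])
```

Normalizing both score lists before mixing them keeps either signal from dominating just because of its scale; other fusion schemes (for example, rank-based fusion) would work as well.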

INSTALLATION

macOS:
- brew install ffmpeg portaudio
- pip install -r requirements.txt

Windows (PowerShell):
- Download FFmpeg:
  cd C:
  curl -L -o ffmpeg-release-essentials.zip https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip
- Extract the FFmpeg package:
  powershell -command "Expand-Archive -Path .\ffmpeg-release-essentials.zip -DestinationPath C:\ffmpeg"
- Add FFmpeg to the system PATH (replace ffmpeg- with the actual version directory inside C:\ffmpeg, e.g., ffmpeg-5.1-essentials_build):
  setx /M PATH "%PATH%;C:\ffmpeg\ffmpeg-\bin"

LAUNCH FLASK WEB APP:
- python3 alpha_app2.py
- Toggle the sidebar to open the AI RAG agent.

CLI:
- python3 Quickagent.py

Create a .env file with the following keys:
GROQ_API_KEY = ""
DEEPGRAM_API_KEY = ""
MAIL_USERNAME = ""
MAIL_PASSWORD = ""
MAIL_DEFAULT_SENDER = ""
OPENWEATHER_API_KEY = ""
X-Api-Key =
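
Before launching, it can help to confirm that the .env file actually defines every key listed above. The following is a minimal standard-library sketch, not part of the demo: `parse_env_text` and `missing_keys` are hypothetical helpers, the parser handles only simple `KEY = "value"` lines, and how the app itself loads the file (for example, through a dotenv library) is not stated here.

```python
from pathlib import Path

# Key names copied from the list above.
EXPECTED_KEYS = [
    "GROQ_API_KEY",
    "DEEPGRAM_API_KEY",
    "MAIL_USERNAME",
    "MAIL_PASSWORD",
    "MAIL_DEFAULT_SENDER",
    "OPENWEATHER_API_KEY",
    "X-Api-Key",
]


def parse_env_text(text):
    """Parse simple KEY = "value" lines; skip blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or set to an empty string."""
    return [k for k in expected if not values.get(k, "").strip()]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    for key in missing_keys(parse_env_text(text)):
        print(f"Missing or empty in .env: {key}")
```

Running the script from the project directory prints one line per key that still needs a value and prints nothing when all keys are set.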
