This medical chatbot uses RAG (Retrieval-Augmented Generation) to produce more accurate responses than the LLM alone by supplying context from a vectorized database that can be updated with more recent data. The added context reduces hallucinations, the system prompt keeps responses consistent, and each response also cites the top 3 sources that supplied the context. Although HTML for a simple UI is included, the main purpose is to expose a backend API endpoint using Flask that can be consumed by a frontend program with a better UI, which lives in another repository here.

The source data is the Gale Encyclopedia of Medicine. It was split into chunks and embedded with the all-MiniLM-L6-v2 model from Hugging Face, and the resulting vectors were stored in Pinecone. LangChain is used to load the documents, split them, build the retrieval chain that pulls context from Pinecone, and apply the prompt to the LLM. The LLM used is gemini-2.0-flash.
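For reference, here is a minimal sketch of the indexing step (roughly what store_index.py does). The data path, chunk sizes, and index name are illustrative assumptions, not necessarily the repository's exact values, and the Pinecone index is assumed to already exist with dimension 384:

```python
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

load_dotenv()  # reads PINECONE_API_KEY from the .env file

# Load the Gale Encyclopedia of Medicine PDFs (the data/ path is an assumption).
docs = PyPDFDirectoryLoader("data/").load()

# Split the pages into overlapping chunks small enough for the embedding model.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed each chunk with all-MiniLM-L6-v2 (384-dimensional vectors).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Upsert the vectors into an existing Pinecone index (the index name is an assumption).
PineconeVectorStore.from_documents(chunks, embedding=embeddings, index_name="medical-chatbot")
```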
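And a sketch of the serving side (roughly what app.py does): reconnect to the Pinecone index, build the retrieval chain, put gemini-2.0-flash behind the system prompt, and expose a Flask endpoint that returns the answer plus the sources of the top 3 retrieved chunks. The route name, form field, prompt text, port, and index name here are assumptions for illustration:

```python
from dotenv import load_dotenv
from flask import Flask, request, jsonify
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

load_dotenv()  # PINECONE_API_KEY and GOOGLE_API_KEY from .env

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Reconnect to the index built by store_index.py and retrieve the 3 closest chunks.
retriever = PineconeVectorStore.from_existing_index(
    index_name="medical-chatbot", embedding=embeddings
).as_retriever(search_kwargs={"k": 3})

# The system prompt keeps answers consistent; {context} is filled with the retrieved chunks.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a medical assistant. Answer using only the provided context.\n\n{context}"),
    ("human", "{input}"),
])

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
rag_chain = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, prompt))

app = Flask(__name__)

@app.route("/get", methods=["POST"])  # route name and "msg" field are assumptions
def ask():
    question = request.form["msg"]
    result = rag_chain.invoke({"input": question})
    # "context" holds the retrieved documents; surface their sources as citations.
    sources = [doc.metadata.get("source", "unknown") for doc in result["context"]]
    return jsonify({"answer": result["answer"], "sources": sources})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```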
Create a virtual environment:

python3.10 -m venv llmapp

Activate it:

Mac:
source ./llmapp/bin/activate

Windows (in CMD):
llmapp\Scripts\activate.bat

Install the dependencies:

pip install -r requirements.txt

Create a .env file in the project root with your API keys:

PINECONE_API_KEY = "..."
GOOGLE_API_KEY = "..."

Then, from the working directory, build the Pinecone index and start the Flask app:

cd <working directory>
python store_index.py
python app.py

Now you can ask medical questions to the bot!
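Once app.py is running, the endpoint can be exercised from any client. For example, a quick check with Python's requests library; the URL, port, and field names below match the hypothetical sketch above, not necessarily the repository's exact values:

```python
import requests

# Ask the chatbot a medical question via the Flask endpoint (URL and fields are assumptions).
resp = requests.post("http://localhost:8080/get", data={"msg": "What are the symptoms of anemia?"})
print(resp.json()["answer"])
print(resp.json()["sources"])  # citations for the top 3 retrieved chunks
```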