Smart Applicant Tracking System (ATS) that uses advanced NLP and Retrieval-Augmented Generation (RAG) to search a vector database of resumes for the best candidates and recommend the best one using a Large Language Model (LLM), Google Gemini.
- Collected data from several sources (resume PDF documents via file scrapers, resume images via an OCR model, and text from CSV files via Pandas) and stored it in a vector database using Chroma DB.
- Built a retrieval system that returns the documents most similar to the job description.
- Integrated an LLM that takes the retrieved documents and the job description, reformats them, and recommends the candidate who best fits the job.
- Designed a user-friendly graphical interface using Streamlit, with integrated sample job descriptions for trying out and testing the app.
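The retrieval step described above can be illustrated with a minimal, dependency-free sketch. It is not the code in `Retriever.py`: the project uses sentence-transformers embeddings stored in ChromaDB, while this example substitutes a toy bag-of-words vector and an in-memory dictionary so that it runs anywhere. The names `embed`, `cosine`, `top_k`, and the sample resumes are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (a stand-in for a sentence-transformers embedding)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(job_description: str, resumes: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k resumes most similar to the job description, best first."""
    query = embed(job_description)
    scored = [(name, cosine(query, embed(text))) for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical sample data (an in-memory stand-in for the vector database).
resumes = {
    "alice": "Python developer with NLP and machine learning experience",
    "bob": "Graphic designer skilled in Photoshop and branding",
    "carol": "Data engineer, Python, SQL, and ETL pipelines",
}
ranked = top_k("Python NLP machine learning engineer", resumes, k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In the real pipeline the ranking would presumably come from a ChromaDB similarity query over dense embeddings, which captures meaning beyond exact word overlap; the ranking logic (embed the query, score every document, keep the top k) is the same idea.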
This project enhances the traditional ATS by:
- Parsing and understanding resumes using AI
- Matching resumes with job descriptions
- Answering questions based on stored CV data using a RAG-based QA pipeline
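The generation half of the RAG pipeline can be sketched as prompt assembly: the retrieved resumes and the job description are merged into one prompt that is then sent to the LLM. The example below is an assumption about how such a prompt might look, not the project's actual prompt; `build_prompt` and its wording are hypothetical, and the call to Google Gemini is deliberately left out.

```python
def build_prompt(job_description: str, candidates: list[tuple[str, str]]) -> str:
    """Combine the job description and retrieved resumes into a single LLM prompt.

    `candidates` is a list of (name, resume_text) pairs, already ranked by the retriever.
    """
    blocks = [
        f"Candidate {i} ({name}):\n{text.strip()}"
        for i, (name, text) in enumerate(candidates, start=1)
    ]
    return (
        "You are an assistant for a recruiter.\n"
        "Using only the resumes below, recommend the single best candidate "
        "for the job and briefly justify the choice.\n\n"
        f"Job description:\n{job_description.strip()}\n\n"
        + "\n\n".join(blocks)
    )


# Hypothetical retrieved documents.
retrieved = [
    ("alice", "Python developer with NLP and machine learning experience"),
    ("carol", "Data engineer, Python, SQL, and ETL pipelines"),
]
prompt = build_prompt("Python NLP machine learning engineer", retrieved)
print(prompt)
```

Restricting the model to "only the resumes below" is one common way to keep answers grounded in the retrieved CV data rather than in the model's general knowledge.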
- Graphical Interface: Streamlit
- Backend: Python
- Vector Store: ChromaDB
- LLM: Google Gemini
- Embeddings: sentence-transformers
- Other Libraries: pandas, langchain, sentence_transformers
Smart-ATS/
├── .streamlit/ # Streamlit config files
├── Data/ # Data-related folders
│ ├── job_description/ # Sample or scraped job description texts
│ └── vector_db/ # Vector database files (Chroma DB)
├── images/ # Visual assets and screenshots
├── notebooks/ # Jupyter notebooks for experimentation
├── .gitattributes # Git settings
├── README.md # Project documentation
├── Retriever.py # Core retrieval logic for RAG
├── requirements.txt # Python dependencies
└── st_app.py # Streamlit app entry point
Install required packages:
pip install -r requirements.txt
Run the app:
streamlit run st_app.py
Hugging Face PDF resumes dataset | Kaggle PDF resumes dataset | Kaggle CSV file dataset | Kaggle image resumes dataset
Developed by Abdallah Fekry





