This repository contains a RAG (Retrieval-Augmented Generation) system powered by an open-source LLM, PGVector as the vector database, and LangChain for orchestration. The project includes full documentation, setup instructions, and evaluation tools.
- RAG model serving system based on an open-source LLM.
- Uses PGVector as the vector database (`rubythalib/pgvector:latest`).
- Complete documentation for setup and installation.
- Evaluation spreadsheet containing:
  - 25 questions
  - 25 ground-truth answers from SOP
  - 25 model-generated answers (LLM output)
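The evaluation compares each model-generated answer against its SOP ground truth. As an illustration only, here is a minimal token-overlap F1 scorer in Python — a hypothetical metric for this kind of comparison, not necessarily what `evaluate.py` actually computes:

```python
# Hypothetical sketch: token-overlap F1 between a ground-truth SOP answer
# and a model-generated answer. Illustrative only; the repository's
# evaluate.py may use a different metric.

def token_f1(ground_truth: str, prediction: str) -> float:
    """F1 over lowercase token sets of the two answers."""
    gt = set(ground_truth.lower().split())
    pred = set(prediction.lower().split())
    if not gt or not pred:
        return 0.0
    overlap = len(gt & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gt)
    return 2 * precision * recall / (precision + recall)

score = token_f1(
    "Overtime requires direct supervisor approval",
    "The requirement for overtime is direct supervisor approval",
)
print(round(score, 2))
```

A score near 1.0 indicates strong lexical agreement; semantically equivalent paraphrases can still score low, which is a known limitation of overlap metrics.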
| Component | Technology |
|---|---|
| LLM | llama-3.1-8b-instant (Groq) |
| Embedding | sentence-transformers/all-MiniLM-L6-v2 (HF) |
| Vector Store | PGVector |
| Orchestration | LangChain |
| API Serving | FastAPI |
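To illustrate what the vector store contributes to the stack above, here is a minimal sketch of cosine-similarity ranking over toy 3-dimensional embeddings. The document IDs and vectors are made up; `all-MiniLM-L6-v2` actually produces 384-dimensional embeddings, and PGVector performs this ranking inside Postgres:

```python
import math

# Toy sketch of vector-store ranking: cosine similarity between a query
# embedding and stored document embeddings. Vectors and doc IDs are
# illustrative assumptions, not values from the repository.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = {
    "overtime_sop": [0.9, 0.1, 0.2],
    "leave_policy": [0.1, 0.8, 0.3],
}
query = [0.8, 0.2, 0.1]

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the document whose embedding best matches the query
```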
```bash
# Install uv
curl -Ls https://astral.sh/uv/install.sh | bash

# Make sure ~/.local/bin is in PATH
export PATH="$HOME/.local/bin:$PATH"

# Clone the repository
git clone https://github.com/fahmiaziz98/technical_test.git
cd technical_test

# Create a virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt

# Start the PGVector container
docker run --name pgvector-container \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=user \
  -e POSTGRES_DB=SOP_perusahaan \
  -p 6024:5432 \
  -d rubythalib/pgvector:latest

# Copy the environment template
cp .env.example .env
```

Fill the `.env` file with your local credentials. Generate a Groq API key at: https://console.groq.com/keys

```bash
# Index the SOP documents, then start the API service
uv run etl/indexing.py
uv run src/service.py
```

API docs available at:
http://0.0.0.0:8000/docs
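Besides cURL, the same request can be assembled from Python's standard library. This sketch only builds the request object so it can be inspected without a running server; the commented-out `urlopen` call assumes the FastAPI service is up on port 8000:

```python
import json
import urllib.request

# Sketch of calling the /ask endpoint from Python. build_request only
# assembles the request; uncomment the urlopen block to actually send it
# once the service is running.

API_URL = "http://0.0.0.0:8000/api/v1/ask"

def build_request(query: str, session_id: str, method: str = "hybrid"):
    if method not in ("hybrid", "native"):
        raise ValueError("method must be 'hybrid' or 'native'")
    body = json.dumps(
        {"session_id": session_id, "query": query, "method": method}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("What are the requirements for working overtime?", "123sh")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["answer"])
```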
```bash
curl -X 'POST' \
  'http://0.0.0.0:8000/api/v1/ask' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "session_id": "123sh",
    "query": "What are the requirements for working overtime and how should it be reported?",
    "method": "hybrid"
  }'
```

Set `"method"` to either `"hybrid"` or `"native"` (JSON does not allow inline comments). Example response:

```json
{
  "session_id": "123sh",
  "query": "What are the requirements for working overtime and how should it be reported?",
  "answer": "The requirement for working overtime is direct supervisor approval. Employees working beyond regular hours are entitled to overtime compensation. Overtime reports must be submitted no later than 1 business day after the overtime is performed.",
  "metadata": {
    "method": "hybrid",
    "model": "llama-3.1-8b-instant",
    "retriever_config": {
      "type": "hybrid",
      "collection": "doc_SOP_v2",
      "top_k": 3,
      "vector_store_top_k": 3,
      "bm25_top_k": 3,
      "weights": [0.5, 0.5],
      "rerank_top_n": 5
    }
  }
}
```

Run the evaluation:

```bash
uv run evaluate.py --method native --delay 5 --input evaluasi_data.xlsx
```

Evaluation results will be saved into a spreadsheet containing:
- Column A: Question
- Column B: Ground-truth answer (SOP)
- Column C: Model-generated answer (LLM)
- Column D: Native output
- Column E: Hybrid output
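The `"weights": [0.5, 0.5]` entry in the response's `retriever_config` suggests an even blend of BM25 (keyword) and vector (semantic) relevance. A toy sketch of such weighted score fusion is below — the scores are made-up values, and the actual implementation may fuse ranks rather than raw scores:

```python
# Toy sketch of hybrid score fusion: each document's final score is a
# weighted sum of its BM25 and vector-similarity scores, then the top_k
# results are kept. Scores and doc IDs are illustrative assumptions.

def hybrid_scores(bm25, vector, weights=(0.5, 0.5), top_k=3):
    fused = {}
    for doc, s in bm25.items():
        fused[doc] = fused.get(doc, 0.0) + weights[0] * s
    for doc, s in vector.items():
        fused[doc] = fused.get(doc, 0.0) + weights[1] * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

bm25 = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.2}
vector = {"doc_b": 0.8, "doc_c": 0.7, "doc_d": 0.6}
print(hybrid_scores(bm25, vector))  # doc_b wins: strong in both retrievers
```

Documents that score moderately in both retrievers can outrank documents that excel in only one, which is the main argument for hybrid retrieval over either method alone.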
Stop and remove the PGVector container:

```bash
docker stop pgvector-container
docker rm pgvector-container
```
- RAG model serving based on company SOP documents
- PGVector with image `rubythalib/pgvector:latest`
- API service with FastAPI
- Documentation for installation & usage
- Script for indexing documents into the vector DB
- `/ask` endpoint with hybrid & native retrieval
- Spreadsheet-based evaluation of answers
- cURL example for manual testing
- `.env` and Groq API key handling
- Document extraction using SmolDocling VLM (GPU-based for accuracy)
- Hybrid retrieval + reranker for better contextual answers
- Improved embeddings for better retrieval quality, e.g.: