An end-to-end, well-documented Python project that demonstrates how to build an LLM-powered query–retrieval and decision engine over large, unstructured documents (PDFs, Word, e-mails).
- Multi-format ingestion – loaders for PDF, DOCX, and e-mail files extract text and metadata.
- Semantic vector store – documents are chunked, embedded via sentence-transformers, and indexed with FAISS for fast nearest-neighbour search.
- Natural-language query parser – rule-based + LLM fallback extracts structured fields (age, procedure, location, policy age, …).
- Decision engine – pluggable logic evaluates retrieved clauses and returns JSON containing:

      {
        "decision": "approved | rejected",
        "amount": 12345.67,
        "justification": "Text explanation …",
        "clauses": [ {"id": "…", "text": "…"} ]
      }

- CLI – python -m hackrx_llm --docs /path/to/folder --ask "46M knee surgery Pune 3-month policy".
- Extensible – swap embeddings, LLM provider, or decision logic.
- Test-driven – pytest covers parsing and retrieval.
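
To make the chunk, embed, and search steps of the vector store concrete, here is a minimal, dependency-free sketch. It is not the project's code: `chunk_text`, `toy_embed`, `cosine`, and `top_k` are hypothetical names, the bag-of-words counter stands in for a sentence-transformers model, and the brute-force cosine ranking stands in for a FAISS index. The sample clauses are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Brute-force nearest-neighbour search (assumption: stand-in for a FAISS index)."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


# Made-up clauses, each short enough to be a single chunk.
chunks = [
    "Knee surgery is covered after a waiting period of 24 months.",
    "Dental treatment is excluded unless caused by an accident.",
    "Claims must be filed within 30 days of discharge.",
]
results = top_k("knee surgery waiting period", chunks, k=1)
```

Swapping `toy_embed` for a dense embedding model and `top_k` for an approximate index changes the quality and speed of the ranking, but the chunk, embed, and search flow stays the same.
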
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Build vector store (runs ingestion automatically)
python -m hackrx_llm ingest --docs sample_docs
# Ask a question
python -m hackrx_llm ask --query "46-year-old male, knee surgery in Pune, 3-month policy" --top_k 5

If an OpenAI key is present (export OPENAI_API_KEY=…), the parser will enrich/validate fields via GPT automatically; otherwise, rule-based extraction is used.
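
As an illustration of what the rule-based path might look like, the sketch below pulls age, procedure, location, and policy age out of the two example queries with regular expressions. Everything in it is an assumption rather than the contents of parser.py: the `ParsedQuery` dataclass, the `parse_query` function, and the tiny lookup tables are hypothetical. Fields left as `None` are the ones an LLM pass could then try to fill in.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; a real parser would need far broader coverage.
KNOWN_PROCEDURES = ("knee surgery", "hip replacement", "cataract surgery")
KNOWN_LOCATIONS = ("Pune", "Mumbai", "Delhi")


@dataclass
class ParsedQuery:
    age: Optional[int] = None
    procedure: Optional[str] = None
    location: Optional[str] = None
    policy_age_months: Optional[int] = None


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()

    # Age: "46-year-old" / "46 years old", or the shorthand "46M" / "46F".
    age = (re.search(r"\b(\d{1,3})[-\s]*years?[-\s]*old\b", query, re.IGNORECASE)
           or re.search(r"\b(\d{1,3})\s?[MF]\b", query))
    if age:
        parsed.age = int(age.group(1))

    # Policy age: "3-month policy", "2 year old policy"; normalised to months.
    policy = re.search(r"\b(\d+)[-\s]*(month|year)s?[-\s]+(?:old\s+)?policy\b",
                       query, re.IGNORECASE)
    if policy:
        factor = 12 if policy.group(2).lower() == "year" else 1
        parsed.policy_age_months = int(policy.group(1)) * factor

    # Procedure and location: simple case-insensitive lookups.
    lowered = query.lower()
    parsed.procedure = next((p for p in KNOWN_PROCEDURES if p in lowered), None)
    parsed.location = next(
        (c for c in KNOWN_LOCATIONS if re.search(rf"\b{re.escape(c)}\b", query, re.IGNORECASE)),
        None,
    )
    return parsed


long_form = parse_query("46-year-old male, knee surgery in Pune, 3-month policy")
short_form = parse_query("46M knee surgery Pune 3-month policy")
```

Both query styles should yield the same structured result, which is what makes the downstream retrieval and decision steps independent of how the user phrases the request.
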
├── hackrx_llm/ ← Library package
│ ├── ingestion/ ← PDF, Word, e-mail loaders
│ ├── parser.py ← Query → structured data
│ ├── retriever.py ← Vector store + semantic search
│ ├── decision_engine.py
│ ├── schema.py ← Pydantic models
│ ├── cli.py ← Typer CLI
│ └── __init__.py
├── tests/ ← Unit tests (pytest)
├── requirements.txt
└── README.md ← You are here
graph TD
A(User Query) --> B(Parser)
B --> C[Structured Query]
C --> D(Retriever)
D --> E[Relevant Clauses]
C --> F(Decision Engine)
E --> F
F --> G[JSON Response]
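
The flow in the graph can be sketched as three functions wired together. This is a hedged illustration, not the package's implementation: `parse`, `retrieve`, `decide`, and `answer` are hypothetical stubs, plain dataclasses are used in place of the Pydantic models in schema.py to keep the example dependency-free, and the 24-month waiting rule, the clause text, and the amounts are invented purely to give the decision step something to evaluate.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Clause:
    id: str
    text: str


@dataclass
class Decision:
    decision: str  # "approved" or "rejected"
    amount: float
    justification: str
    clauses: list[Clause] = field(default_factory=list)


def parse(query: str) -> dict:
    # Stub for the Parser node: returns a fixed structured query.
    return {"procedure": "knee surgery", "policy_age_months": 3}


def retrieve(structured: dict) -> list[Clause]:
    # Stub for the Retriever node: returns one made-up clause.
    return [Clause("C1", "Knee surgery is covered after 24 months.")]


def decide(structured: dict, clauses: list[Clause]) -> Decision:
    # Stub for the Decision Engine node; the rule and amounts are invented.
    if structured["policy_age_months"] < 24:
        return Decision("rejected", 0.0,
                        "Policy is younger than the waiting period in clause C1.", clauses)
    return Decision("approved", 10000.0,
                    "Waiting period in clause C1 is satisfied.", clauses)


def answer(query: str) -> str:
    structured = parse(query)               # A -> B -> C
    clauses = retrieve(structured)          # C -> D -> E
    decision = decide(structured, clauses)  # C and E -> F
    return json.dumps(asdict(decision), indent=2)  # F -> G


response = json.loads(answer("46M knee surgery Pune 3-month policy"))
```

Because the decision step receives both the structured query and the retrieved clauses, its logic can be replaced without touching the parser or the retriever.
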
- Fork -> git clone -> create feature branch.
- Ensure pytest passes and run black/ruff.
- PR with clear description.
MIT © 2025 Bajaj Finserv Health Ltd.