HackRx 6.0 – LLM Document Processing System

An end-to-end, well-documented Python project that demonstrates how to build an LLM-powered query–retrieval and decision engine over large, unstructured documents (PDFs, Word, e-mails).

Features

Multi-format ingestion – loaders for PDF, DOCX, and E-mail files extract text + metadata.
Semantic vector store – documents are chunked, embedded via sentence-transformers, and indexed with FAISS for fast nearest-neighbour search.
Natural-language query parser – rule-based extraction, optionally enriched by an LLM, produces structured fields (age, procedure, location, policy age, …).

Decision engine – pluggable logic evaluates retrieved clauses and returns JSON containing:

{
  "decision": "approved | rejected",
  "amount": 12345.67,
  "justification": "Text explanation …",
  "clauses": [ {"id": "…", "text": "…"} ]
}

CLI – python -m hackrx_llm ingest --docs /path/to/folder, then python -m hackrx_llm ask --query "46M knee surgery Pune 3-month policy" (see Quick-start).
Extensible – swap embeddings, LLM provider, or decision logic.
Test-driven – pytest covers parsing and retrieval.
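
To make the rule-based side of the query parser more concrete, here is a minimal sketch of how a terse query such as "46M knee surgery Pune 3-month policy" might be turned into structured fields with regular expressions. This is not the project's actual parser.py: the names ParsedQuery, parse_query, and KNOWN_CITIES are hypothetical, the city list is a placeholder, and only the terse "46M" form of the age is handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical city list; a real parser would need a much larger gazetteer.
KNOWN_CITIES = ("Pune", "Mumbai", "Delhi", "Bengaluru")


@dataclass
class ParsedQuery:
    age: Optional[int] = None
    gender: Optional[str] = None
    procedure: Optional[str] = None
    location: Optional[str] = None
    policy_age_months: Optional[int] = None


def parse_query(text: str) -> ParsedQuery:
    """Extract structured fields from a terse query using regular expressions."""
    result = ParsedQuery()

    # "46M" / "46 F": an age followed by an uppercase gender letter.
    m = re.search(r"\b(\d{1,3})\s*([MF])\b", text)
    if m:
        result.age = int(m.group(1))
        result.gender = "male" if m.group(2) == "M" else "female"

    # "3-month policy" or "3 months": policy age in months.
    m = re.search(r"\b(\d+)[-\s]?months?\b", text, flags=re.IGNORECASE)
    if m:
        result.policy_age_months = int(m.group(1))

    # First known city mentioned in the query.
    for city in KNOWN_CITIES:
        if re.search(rf"\b{city}\b", text, flags=re.IGNORECASE):
            result.location = city
            break

    # A single word followed by "surgery" is treated as the procedure.
    m = re.search(r"\b(\w+\s+surgery)\b", text, flags=re.IGNORECASE)
    if m:
        result.procedure = m.group(1).lower()

    return result


if __name__ == "__main__":
    print(parse_query("46M knee surgery Pune 3-month policy"))
```

Fields that no rule matches stay None, which gives an LLM-based step a clear signal about what still needs to be filled in or validated.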

Quick-start

python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Build vector store (runs ingestion automatically)
python -m hackrx_llm ingest --docs sample_docs

# Ask a question
python -m hackrx_llm ask --query "46-year-old male, knee surgery in Pune, 3-month policy" --top_k 5

If an OpenAI key is present (export OPENAI_API_KEY=…), the parser will enrich/validate fields via GPT automatically; otherwise, rule-based extraction is used.
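
The key-based switch described above can be sketched as a small helper. The function name select_parser_mode and the "llm" / "rules" labels are assumptions made for illustration; the project may detect the key differently.

```python
import os
from typing import Mapping, Optional


def select_parser_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "llm" when an OpenAI key is configured, otherwise "rules"."""
    env = os.environ if env is None else env
    # An empty or whitespace-only value is treated as "no key present".
    key = env.get("OPENAI_API_KEY", "").strip()
    return "llm" if key else "rules"


if __name__ == "__main__":
    print(select_parser_mode())
```

Passing the environment in as a mapping keeps the check easy to unit-test without touching the real process environment.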

Project Structure

├── hackrx_llm/          ← Library package
│   ├── ingestion/       ← PDF, Word, e-mail loaders
│   ├── parser.py        ← Query → structured data
│   ├── retriever.py     ← Vector store + semantic search
│   ├── decision_engine.py
│   ├── schema.py        ← Pydantic models
│   ├── cli.py           ← Typer CLI
│   └── __init__.py
├── tests/               ← Unit tests (pytest)
├── requirements.txt
└── README.md            ← You are here
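
As a rough illustration of what retriever.py is described as doing (chunking documents and running a semantic nearest-neighbour search), the sketch below uses only the standard library. It deliberately swaps in simpler techniques: word-count vectors stand in for sentence-transformers embeddings, and a brute-force cosine ranking stands in for a FAISS index. All function names and default parameters here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: List[str], top_k: int = 5) -> List[Tuple[float, str]]:
    """Brute-force nearest-neighbour search (stand-in for a vector index)."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


if __name__ == "__main__":
    clauses = [
        "knee surgery is covered after two years",
        "dental treatment is excluded",
    ]
    print(search("knee surgery", clauses, top_k=1))
```

The top_k parameter plays the same role as the --top_k option of the ask command: it caps how many of the best-scoring chunks are handed on to the decision step.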

Design Diagram

graph TD
    A(User Query) --> B(Parser)
    B --> C[Structured Query]
    C --> D(Retriever)
    D --> E[Relevant Clauses]
    C --> F(Decision Engine)
    E --> F
    F --> G[JSON Response]
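
The data flow in the diagram can be sketched as a single function that wires three pluggable components together. The component signatures and the stub implementations below are assumptions made for illustration; they only mirror the arrows in the diagram and the JSON shape listed under Features, not the project's real decision logic.

```python
import json
from typing import Callable, Dict, List

Clause = Dict[str, str]
StructuredQuery = Dict[str, object]


def run_pipeline(
    query: str,
    parse: Callable[[str], StructuredQuery],
    retrieve: Callable[[StructuredQuery], List[Clause]],
    decide: Callable[[StructuredQuery, List[Clause]], dict],
) -> str:
    """Run query -> parser -> retriever -> decision engine and return JSON text."""
    structured = parse(query)                # A -> B -> C
    clauses = retrieve(structured)           # C -> D -> E
    response = decide(structured, clauses)   # C and E -> F
    return json.dumps(response)              # F -> G


# Stub components (hypothetical) so the sketch runs on its own.
def stub_parse(query: str) -> StructuredQuery:
    return {"raw": query}


def stub_retrieve(structured: StructuredQuery) -> List[Clause]:
    return [{"id": "c1", "text": "placeholder clause"}]


def stub_decide(structured: StructuredQuery, clauses: List[Clause]) -> dict:
    return {
        "decision": "approved" if clauses else "rejected",
        "amount": 0.0,
        "justification": "Stub decision; no real policy logic applied.",
        "clauses": clauses,
    }


if __name__ == "__main__":
    print(run_pipeline("46M knee surgery Pune 3-month policy",
                       stub_parse, stub_retrieve, stub_decide))
```

Passing the components in as callables is one way to get the "swap embeddings, LLM provider, or decision logic" flexibility: each stage can be replaced without touching the wiring.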

Contributing

Fork the repository, clone it, and create a feature branch.
Make sure pytest passes, and run black and ruff before committing.
Open a PR with a clear description.
