Minimal, fast agentic RAG with:
- Hybrid retrieval (BM25 + FAISS) with optional cross-encoder rerank (see the fusion sketch after this list)
- Plan selection (single, multi, needs_calc, needs_sql)
- Math tool (exact expression evaluation)
- SQL tool over CSVs via DuckDB, with LLM-assisted table selection/querying
- Citation-first synthesis + light claim verification (NLI) and auto-widening
- Streaming answers with TTFT and simple Web UI + API
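The hybrid retriever merges a lexical (BM25) ranking with a dense (FAISS) ranking before the optional cross-encoder rerank. The repo does not spell out its fusion method; the sketch below uses reciprocal rank fusion purely as an illustration of combining two ranked lists of chunk IDs — the real system may fuse scores differently.

```ts
// Minimal sketch: reciprocal rank fusion over two ranked lists of chunk IDs.
// Assumption: the fusion strategy and the constant k = 60 are illustrative, not taken from this repo.
function reciprocalRankFusion(
  bm25Ids: string[],
  faissIds: string[],
  k = 60,
  topN = 10,
): string[] {
  const scores = new Map<string, number>();
  for (const ids of [bm25Ids, faissIds]) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([id]) => id);
}

// Example: candidates from both retrievers are merged before the (optional) cross-encoder rerank.
const fused = reciprocalRankFusion(["d3", "d1", "d7"], ["d1", "d9", "d3"]);
console.log(fused); // d1 and d3 rise to the top because both retrievers agree on them
```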
- `datasets/`: raw downloads (SQuAD, HotpotQA, WikiSQL tarball)
- `data/`: processed runtime artifacts
- `data/docs/`: plain-text corpus from SQuAD/HotpotQA
- `data/tables/`: CSVs from WikiSQL (auto-loaded into DuckDB)
- `data/retrieval/`: FAISS index + metadata (`index.faiss`, `metadata.json`)
- `src/tools/`: DuckDB + table index builder, math tool
- Node 18+
- macOS/Linux (faiss-node prebuilds supported)
Set these in .env or your shell:
- `LM_BASE_URL`: OpenAI-compatible endpoint (e.g., LM Studio or OpenAI)
- `LM_API_KEY`: API key for the endpoint
- `LM_MODEL`: chat/completions model (used for planning, generation, NLI)
- `EMBED_MODEL`: embedding model for FAISS indexing/search
- `RERANK_MODEL` (optional): ms-marco-style embedding model for rerank
- `PORT` (optional): web server port (default 3000)
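A quick way to check these settings is a single chat-completions request against the OpenAI-compatible endpoint. This is a minimal sketch, not part of the repo, and it assumes `LM_BASE_URL` already includes the version prefix (e.g., `http://localhost:1234/v1`):

```ts
// smoke-test.ts — minimal check that LM_BASE_URL / LM_API_KEY / LM_MODEL are usable.
const res = await fetch(`${process.env.LM_BASE_URL}/chat/completions`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    authorization: `Bearer ${process.env.LM_API_KEY}`,
  },
  body: JSON.stringify({
    model: process.env.LM_MODEL,
    messages: [{ role: "user", content: "Reply with the single word: ok" }],
  }),
});
const data = await res.json();
console.log(data.choices?.[0]?.message?.content);
```

Run it with `npx tsx smoke-test.ts` once the variables are exported.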
```bash
npm install
npm run prep:all
# or run individually: prep:squad, prep:hotpot, prep:wikisql, prep:merge
```

```bash
# 1) Build text index (FAISS + BM25 metadata) and 2) build table index for SQL tool
npm run index
```
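`npm run index` is what produces the FAISS artifacts under `data/retrieval/`. The sketch below only illustrates the two moving parts behind such a build — an embeddings call to the OpenAI-compatible endpoint plus a `faiss-node` index; the repo's actual chunking, index type, and on-disk persistence are not shown and may differ, and the sketch again assumes `LM_BASE_URL` includes the `/v1` prefix.

```ts
// index-sketch.ts — illustrative only; real indexing lives in the repo's index script.
import { IndexFlatL2 } from "faiss-node";

async function embed(texts: string[]): Promise<number[][]> {
  const res = await fetch(`${process.env.LM_BASE_URL}/embeddings`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.LM_API_KEY}`,
    },
    body: JSON.stringify({ model: process.env.EMBED_MODEL, input: texts }),
  });
  const data = await res.json();
  return data.data.map((d: { embedding: number[] }) => d.embedding);
}

const chunks = ["Harold II was killed at Hastings.", "Anglo-Norman diverged from continental Norman."];
const vectors = await embed(chunks);
const index = new IndexFlatL2(vectors[0].length); // L2 flat index chosen for the sketch only
for (const v of vectors) index.add(v); // faiss-node takes a flat number[] per add

const [query] = await embed(["Who was killed at Hastings?"]);
const { labels, distances } = index.search(query, 1);
console.log(chunks[labels[0]], distances[0]);
```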
Run the CLI directly:

```bash
LM_BASE_URL=... LM_API_KEY=... LM_MODEL=... EMBED_MODEL=... \
npx tsx src/index.ts "Summarize key themes with citations."
```

Start the Web UI/API:

```bash
npm run web
# open http://localhost:3000
```

Query the API:

```bash
curl -s http://localhost:3000/api/ask \
  -H 'content-type: application/json' \
  -d '{"question":"Which sections discuss X? Cite passages."}' | jq .
```
Quick run (limit cases):

```bash
EVAL_LIMIT=20 npm run eval
# or: npm run eval -- --limit=20
```

Full run:

```bash
npm run eval
```

Outputs a recall@k proxy, latencies (p50/p95), and retrieval IDs per question.
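The recall@k proxy boils down to: for each document-grounded case, did any gold document ID appear among the retrieved IDs? A minimal sketch of that computation; the case and field names here are illustrative, not the eval runner's real schema:

```ts
// Illustrative recall@k proxy: fraction of cases whose gold doc appears in the top-k retrieved IDs.
interface EvalCase {
  goldDocIds: string[];
  retrievedIds: string[]; // assumed already truncated to k by the retriever
}

function recallAtK(cases: EvalCase[]): number {
  const hits = cases.filter((c) =>
    c.goldDocIds.some((gold) => c.retrievedIds.includes(gold)),
  ).length;
  return cases.length ? hits / cases.length : 0;
}

console.log(recallAtK([
  { goldDocIds: ["doc-12"], retrievedIds: ["doc-3", "doc-12", "doc-7"] },
  { goldDocIds: ["doc-40"], retrievedIds: ["doc-1", "doc-2", "doc-3"] },
])); // 0.5
```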
Notes
- Size: prep builds ~380 total cases (≈200 SQuAD + ≈120 Hotpot + ≈60 WikiSQL). The runner evaluates document-grounded cases by default; SQL-only cases are skipped in recall@k.
- Runtime: end-to-end runs with LLMs can be long; use `EVAL_LIMIT` for a quick sanity check.
- Plan selection: `single | multi | needs_calc | needs_sql` (control flow sketched after this list)
- Retrieval: BM25 + FAISS (requires embeddings); optional cross-encoder rerank
- Tools (conditional): math or DuckDB SQL (tables auto-loaded from `data/tables/`)
- Draft answer with citations; extract claims; NLI verify; widen context and re-synthesize if weak
- Stream answer + trace (timings, citations, tool calls)
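The steps above hinge on the plan label chosen up front. The control-flow sketch below keeps the repo's plan labels, but everything else — the keyword heuristic standing in for the LLM planner, the stubbed retrieve/tool/synthesize/verify steps, and the k values — is an illustrative assumption:

```ts
// Control-flow sketch of the agentic loop. Only the Plan labels come from the repo;
// the stubs below stand in for the LLM planner, FAISS/BM25 retrieval, the tools, and the NLI verifier.
type Plan = "single" | "multi" | "needs_calc" | "needs_sql";

function selectPlan(question: string): Plan {
  if (/compute|[-+*\/]\s*\d/i.test(question)) return "needs_calc";
  if (/how many|count|per team/i.test(question)) return "needs_sql";
  if (/ and .*which /i.test(question)) return "multi";
  return "single";
}

// Hypothetical stubs so the sketch type-checks and runs; not the repo's real implementations.
const retrieve = async (q: string, k: number) => [`passage for "${q}" (k=${k})`];
const runTool = async (plan: Plan, q: string) =>
  plan === "needs_calc" || plan === "needs_sql" ? `tool(${q})` : undefined;
const synthesize = async (q: string, ctx: string[], tool?: string) =>
  `answer with citations [1] (${ctx.length} passages${tool ? ", tool used" : ""})`;
const verifyClaims = async (_draft: string, _ctx: string[]) => true;

export async function answer(question: string): Promise<string> {
  const plan = selectPlan(question);
  let context = await retrieve(question, plan === "multi" ? 12 : 6);
  const toolResult = await runTool(plan, question);
  let draft = await synthesize(question, context, toolResult); // citation-first draft
  if (!(await verifyClaims(draft, context))) {
    context = await retrieve(question, 16); // auto-widen and re-synthesize when NLI support is weak
    draft = await synthesize(question, context, toolResult);
  }
  return draft;
}
```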
- Single (doc retrieval):
  - Who killed King Harold II at the Battle of Hastings?
- Multi-hop-ish (combine facts):
  - In the Norman context, who considered England their most important holding, and which language did Anglo‑Norman become distinct from?
- Needs calc (math tool):
  - Compute (12.5 - 3.2) * 4 + 7.
- Needs SQL (DuckDB on WikiSQL tables):
  - How many players are listed for Toronto in the 2005-06 season?
- Not answerable / out‑of‑scope:
  - What is the weather in San Francisco right now?
Tip: For SQL, you can also ask in natural language (e.g., "count players by team"); the system selects relevant tables and generates a query. Tables are derived from WikiSQL CSVs, so table names depend on dataset contents.
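Under the hood the SQL tool runs queries in DuckDB over the CSVs registered from `data/tables/`. A minimal sketch of that kind of query with the `duckdb` Node package; the CSV path and columns below are placeholders, not a real table from the dataset:

```ts
// duckdb-sketch.ts — illustrative only; the repo auto-loads every CSV under data/tables/ at index time.
import duckdb from "duckdb";

const db = new duckdb.Database(":memory:");
// read_csv_auto infers the schema from the CSV header and values.
db.all(
  "SELECT team, COUNT(*) AS players FROM read_csv_auto('data/tables/example.csv') GROUP BY team ORDER BY players DESC",
  (err, rows) => {
    if (err) throw err;
    console.table(rows);
  },
);
```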
- `npm run prep:*`: download and build corpora and tables
- `npm run index`: build FAISS and table index
- `npm run web`: start Web UI/API
- `npm run eval`: run quick eval over bundled sets
- FAISS/embeddings: ensure `EMBED_MODEL` is reachable via `LM_BASE_URL`; artifacts are stored under `data/retrieval/`
- Rerank disabled: set `LM_BASE_URL` and `RERANK_MODEL` (logs will note fallback)
- DuckDB: CSVs must exist in `data/tables/` (created by prep). Re-run `npm run index` after adding CSVs