FAQ-RAG: FAQ-Centric Vector Storage for Citation-Aware QA

Structured Knowledge Extraction for Trustworthy RAG Systems

AI · LLMs · Python · License: CC BY 4.0

Links: Preprint · YouTube · Blog

Next-Generation RAG Pipelines: Automatically transform documents into structured FAQs with grounded answers and dual-vector representations for accurate retrieval and citations.


(Figure: data ingestion pipeline)

🔥 Why FAQ-RAG Changes How You Build RAG Systems

Traditional RAG systems index raw text. FAQ-RAG indexes knowledge.
Instead of embedding arbitrary chunks, FAQ-RAG converts documents into explicit questions and grounded answers, producing retrieval units that are semantically complete, citation-aware, and optimized for QA.

🎯 The Problem with Traditional RAG

  • Chunk-Centric Indexing: Embeddings represent fragmented text rather than user-intent questions; a single chunk may lack the complete answer, and when several chunks are needed, retrieval may miss some of them
  • Weak Question Alignment: Queries must “match” text embeddings indirectly
  • Citation Ambiguity: Answers lack clear provenance at the page or file level
  • Reasoning Overhead: Systems compensate with slow, multi-step reasoning at query time
  • Known Limitations: These weaknesses are documented in recent citation-focused RAG research and evaluations

🚀 FAQ-RAG: A Structured Alternative

FAQ-RAG introduces a document-to-FAQ transformation pipeline that converts each page or file into a comprehensive set of frequently asked questions, paired with grounded answers derived strictly from the source content.

Each FAQ becomes a first-class retrieval unit.


Why store FAQs instead of raw text?

For every document (or page), FAQ-RAG:

  1. Extracts the content into clean, structured Markdown
  2. Generates all plausible FAQs that can be answered from that content
  3. Produces grounded answers for each question using only the extracted text
  4. Creates two embeddings per FAQ:
    • One vector from the question
    • One vector from the answer
  5. Stores both vectors with precise document location metadata

This design improves recall, precision, and citation fidelity without duplicating raw text unnecessarily.
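The steps above can be sketched as a minimal data structure. Note that `FAQEntry` and the toy `embed` function below are illustrative assumptions, not the project's actual API; a real pipeline would call an embedding model instead of the deterministic stand-in used here:

```python
from dataclasses import dataclass, field
from typing import List

def embed(text: str, dim: int = 4) -> List[float]:
    """Toy deterministic embedder (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # unit-normalized vector

@dataclass
class FAQEntry:
    question: str
    answer: str
    source_file: str  # precise provenance for citations
    page: int
    question_vec: List[float] = field(default_factory=list)
    answer_vec: List[float] = field(default_factory=list)

    def __post_init__(self):
        # Step 4: two embeddings per FAQ, one from each side of the pair.
        self.question_vec = embed(self.question)
        self.answer_vec = embed(self.answer)

entry = FAQEntry(
    question="What is compound interest?",
    answer="Interest earned on principal plus previously accumulated interest.",
    source_file="investopedia.pdf",
    page=12,
)
```

Storing the question and answer vectors alongside the location metadata (step 5) is what lets retrieval and citation use the same record.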


✨ Key Advantages

🎯 Citation-Ready by Construction

  • Explicit Provenance: Every FAQ is linked to file name, page number, and content scope
  • Deterministic Sources: Answers are generated from the document, not inferred later
  • Audit-Friendly: Ideal for regulated and research-heavy environments
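Because every FAQ carries its file name and page number, rendering an inline citation is a pure formatting step. A minimal sketch (the `format_citation` helper is hypothetical, but the output style matches the sample query result later in this README):

```python
def format_citation(meta: dict) -> str:
    # Render provenance metadata in the "(file, page N)" style
    # used by the sample output below.
    return f"({meta['source_file']}, page {meta['page']})"

answer = "Compound interest is interest on principal plus accumulated interest."
meta = {"source_file": "investopedia.pdf", "page": 12}
cited = f"{answer} {format_citation(meta)}"
```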

🧠 Semantically Complete Retrieval Units

  • Question-Aligned Indexing: Queries match stored questions directly
  • Answer-Aware Embeddings: Answer vectors improve semantic grounding and reranking
  • Reduced Hallucination Surface: Answers already exist before query time

⚡ Efficient and Scalable

  • No Heavy Reasoning at Query Time
  • Dual-Vector Retrieval: Flexible matching on intent (question) or substance (answer)
  • Lightweight Infrastructure: Standard vector databases, no complex orchestration
  • Scales Linearly: Suitable for large document collections
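One way dual-vector matching can work at query time: score each FAQ against both its question vector (intent) and its answer vector (substance) and keep the stronger signal. The max-fusion rule and the toy vectors below are illustrative assumptions, not the project's documented scoring function:

```python
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def faq_score(query_vec, q_vec, a_vec) -> float:
    # Match on intent (question) or substance (answer),
    # keeping whichever channel is stronger.
    return max(cosine(query_vec, q_vec), cosine(query_vec, a_vec))

query = [1.0, 0.0]  # toy query embedding
faqs = [
    ("What is compound interest?", [0.9, 0.1], [0.2, 0.8]),
    ("What is simple interest?", [0.1, 0.9], [0.0, 1.0]),
]
best = max(faqs, key=lambda f: faq_score(query, f[1], f[2]))
```

Max-fusion lets a query win on either channel; a production system might instead use a weighted sum or rerank answer-side hits.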

🛠️ Quick Start: From Document to FAQ Index

Prerequisites

  • Python 3.8+
  • Vector database (e.g., Pinecone)
  • API keys for OpenAI / Cohere (or compatible models)

Installation

git clone https://github.com/Pro-GenAI/FAQ-RAG
cd FAQ-RAG
pip install -e .
cp .env.example .env
# Configure your API keys in .env

Build the FAQ Index

# Host embedding / generation models
python faq_rag/host_models.py &

# Ingest a document:
# 1. Extract to Markdown
# 2. Generate FAQs
# 3. Generate grounded answers
# 4. Create dual embeddings (Q + A)
python -c "from faq_rag.utils.ingestion import ingest_document; ingest_document('your-document.pdf')"

Query with Structured Knowledge

import openai

client = openai.OpenAI(
	api_key="dummy",
	base_url="http://localhost:8001/v1"
)

response = client.chat.completions.create(
	model="RAG-app",
	messages=[{"role": "user", "content": "What is compound interest?"}]
)

print(response.choices[0].message.content)
# Sample output:
# "Compound interest is the interest calculated on both the initial principal
# and the accumulated interest from previous periods (investopedia.pdf, page 12)."

📊 How FAQ-RAG Compares

| Feature | FAQ-RAG | Traditional RAG | Reasoning-Based RAG |
| --- | --- | --- | --- |
| Retrieval Unit | FAQ (Q + A) | Text Chunk | Dynamic Context |
| Embedding Strategy | Dual (Question + Answer) | Single | Single |
| Citation Fidelity | ✅ Exact | ❌ Approximate | ✅ Exact |
| Query-Time Reasoning Cost | ✅ Low | ✅ Low | ❌ High |
| Scalability | ✅ High | ✅ High | ⚠️ Limited |

🎯 Ideal Use Cases

  • Academic & Scientific QA
  • Legal and Compliance Systems
  • Financial Research Platforms
  • Medical and Technical Documentation
  • Enterprise Knowledge Bases
  • Educational and Training Tools

🔧 Technical Architecture

Core Pipeline

  • Document → Markdown Extraction
  • Exhaustive FAQ Generation
  • Grounded Answer Synthesis
  • Dual-Vector Embedding (Q + A)
  • Vector DB Storage with Metadata
  • OpenAI-Compatible Retrieval API

Supported Inputs

  • PDF files with page-level tracking
  • File-level and page-level ingestion modes
  • Extensible to additional formats
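The two ingestion modes can be sketched as a splitter that decides the granularity of each ingestion unit; each unit then yields its own FAQs. The `to_records` helper and its record shape are assumptions for illustration, not the repository's actual ingestion code:

```python
from typing import Dict, List, Optional

def to_records(filename: str, pages: List[str], mode: str = "page") -> List[Dict]:
    """Split a document into ingestion units for FAQ generation."""
    if mode == "file":
        # File-level: one unit spanning the whole document; no page provenance.
        return [{"source_file": filename, "page": None,
                 "markdown": "\n\n".join(pages)}]
    # Page-level: one unit per page, preserving exact provenance for citations.
    return [
        {"source_file": filename, "page": n, "markdown": text}
        for n, text in enumerate(pages, start=1)
    ]

records = to_records("investopedia.pdf", ["Intro text.", "Compound interest text."])
```

Page-level mode trades more records for citations that can point at an exact page, which is what the sample output earlier in this README relies on.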

🚀 Why FAQ-RAG

FAQ-RAG treats questions as the atomic unit of knowledge. Instead of hoping a chunk contains an answer, the system guarantees that every indexed item is an answer—already validated, embedded, and traceable.


FAQ-RAG: Stop retrieving text. Start retrieving answers.

Built for precise, auditable, and scalable knowledge systems.