Note: This repository contains multiple projects. The main production-ready package is ContextTape, located in the `contexttape/` directory.
The production-ready, open-source RAG storage system is in:
```
contexttape/              ← THE MAIN PACKAGE
├── src/contexttape/      ← Source code
├── tests/                ← Test suite (55 tests)
├── examples/             ← Usage examples
├── docs/                 ← Documentation
└── README.md             ← Full package docs
```
Quick links:
| Directory | Status | Purpose |
|---|---|---|
| `contexttape/` | ✅ Production | Main RAG package |
| `cleanup/` | 🧪 Experimental | Data cleanup utilities |
| `newdbtype/`, `newrag/` | 🧪 Experimental | Database/RAG experiments |
| `runner.py` | 📝 Legacy | Old runner scripts |
ContextTape is a database-free retrieval architecture that replaces vector databases with a pure file-segment system.
Each content item is stored as two paired files:
- A text segment (`segment_T.is`) containing UTF-8 text.
- A vector segment (`segment_E.is`) containing an embedding.
Each segment begins with a fixed 32-byte header that encodes the metadata needed to link, identify, and retrieve content.
At query time, the system:
- Embeds the query into a vector.
- Scans only vector segments sequentially (no ANN index).
- Maintains a top-k heap of best matches.
- Late-dereferences corresponding text segments only for top results.
- Optionally applies hybrid re-ranking (vector + lexical/domain).
- Assembles the retrieval context for an LLM or other downstream use.
This design shifts the bottleneck from compute-bound vector search to sequential I/O, drastically reducing memory use and power draw.
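To make the retrieval loop concrete, here is a minimal, illustrative sketch of the scan described above. It is not the library's actual implementation: `store.iter_vectors()` is a hypothetical helper for walking decoded vector segments, and only `read_text` mirrors the documented API.

```python
import heapq
import numpy as np

def scan_top_k(store, query_vec, top_k=5, stride=1):
    """Sequential scan sketch: keep a min-heap of the best top_k cosine scores
    while visiting every stride-th vector segment; text is read only for the
    survivors (late dereference)."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    heap = []  # min-heap of (score, text_id, vec_id)
    for i, (vec_id, vec, text_id) in enumerate(store.iter_vectors()):  # hypothetical iterator
        if i % stride:
            continue  # stride scanning: trade a little recall for speed
        v = vec / (np.linalg.norm(vec) + 1e-12)
        score = float(q @ v)
        if len(heap) < top_k:
            heapq.heappush(heap, (score, text_id, vec_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, text_id, vec_id))
    hits = sorted(heap, reverse=True)
    # Late dereference: touch text segments only for the top results.
    return [(score, store.read_text(tid)) for score, tid, _vid in hits]
```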
- 🧩 No Vector DB — storage uses ordinary files, not Faiss/PGVector/Milvus/etc.
- 🧱 Segment Headers — 32B metadata block linking text and vectors.
- ⚡ Quantization — int8 vectors with per-segment scale → 4× smaller.
- 🔄 Late Dereference — read text only for top-k hits.
- 🧮 Stride Scanning — skip segments to trade accuracy for speed.
- 🌍 Multi-Store Fusion — merge results from multiple directories.
- 🧠 Coarse Prefilter — optional lightweight centroid filtering (sketched below, after this list).
- 🔒 Append-Only Writes — crash-safe, easy snapshotting.
- 🧾 Auditability — exact bytes passed to the model are reproducible.
- 🪫 Energy-Aware Mode — reduce top-k or stride under power limits.
- 🖼️ Visual Container Option — (future) deterministic vector-in-frame mapping.
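One plausible reading of the coarse prefilter, sketched below under stated assumptions, is: keep a centroid per block of vector segments and scan in full only the blocks whose centroid scores well against the query. This is an illustration of the idea, not the package's actual API.

```python
import numpy as np

def coarse_prefilter(block_centroids, block_ids, query_vec, keep_ratio=0.25):
    """Rank blocks by centroid similarity and keep only the best fraction.
    block_centroids is assumed to be an (n_blocks, dim) float32 array."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    c = block_centroids / (np.linalg.norm(block_centroids, axis=1, keepdims=True) + 1e-12)
    sims = c @ q
    n_keep = max(1, int(len(block_ids) * keep_ratio))
    order = np.argsort(-sims)[:n_keep]
    return [block_ids[i] for i in order]  # only these blocks are scanned in full
```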
```
src/contexttape/
├── storage.py    # File-segment store (this README documents it)
├── cli.py        # CLI: ingest, search, chat, bench, stat, reset
├── embed.py      # Embedding utilities
├── ingest_*      # Chunkers for text/wiki/etc
├── energy.py     # Power-aware tuning (optional)
└── search.py     # Hybrid vector/lexical rerank

wiki_store/       # Example corpus store
chat_ts/          # Chat memory store
bench/            # Benchmark outputs
dist/             # Exported playlists/manifests
```
```bash
pip install contexttape
```

```python
from contexttape import TSStore
import numpy as np

# Create a store
store = TSStore("my_knowledge_base")

# Add documents with embeddings
text = "Machine learning is transforming AI."
embedding = np.random.randn(1536).astype(np.float32)  # Use real embeddings in production
text_id, vec_id = store.append_text_with_embedding(text, embedding, quantize=True)

# Search
query_embedding = np.random.randn(1536).astype(np.float32)
results = store.search_by_vector(query_embedding, top_k=5)
for score, text_id, vec_id in results:
    print(f"Score: {score:.4f} | {store.read_text(text_id)}")
```

```python
from contexttape import TSStore, get_client, embed_text_1536
import os
# Set your API key
os.environ["OPENAI_API_KEY"] = "sk-..."
# Initialize
client = get_client()
store = TSStore("my_store")
# Ingest documents
docs = [
    "Python is a versatile programming language.",
    "Neural networks power modern AI systems.",
    "Data preprocessing is crucial for ML success."
]

for doc in docs:
    embedding = embed_text_1536(client, doc)
    store.append_text_with_embedding(doc, embedding, quantize=True)
# Search
query = "artificial intelligence programming"
query_emb = embed_text_1536(client, query)
results = store.search_by_vector(query_emb, top_k=3)
for score, tid, vid in results:
    print(f"{score:.4f}: {store.read_text(tid)}")
```

```python
from contexttape import ContextTapeClient
# Create client (handles embeddings automatically)
client = ContextTapeClient("my_store")
# Ingest documents
client.ingest("Document content here", metadata={"author": "Alice", "date": "2024-01-01"})
# Batch ingest
texts = ["Doc 1", "Doc 2", "Doc 3"]
client.ingest_batch(texts)
# Search
results = client.search("query text", top_k=5)
for result in results:
    print(f"{result.score:.4f}: {result.text}")
    if result.metadata:
        print(f"  Metadata: {result.metadata}")
```

```bash
# Ingest documents
ct ingest-path ./documents --out-dir my_store --quantize --verbose

# Search
ct search "machine learning" --wiki-dir my_store --topk 5

# Get statistics
ct stat --wiki-dir my_store

# Interactive chat with retrieval
ct chat --wiki-dir my_store --topk 8 --verbose
```

Check out the `examples/` directory:
- `quickstart.py` - Basic operations and workflows
- `advanced_usage.py` - Advanced patterns and integrations
- `tutorial.py` - Step-by-step learning tutorials
- `comprehensive_examples.py` - Production patterns
Run them with:
```bash
python examples/quickstart.py
python examples/tutorial.py
```

```bash
ct build-wiki \
--topics-file scripts/topics.example.txt \
--out-dir wiki_store \
--verbose
```

```bash
ct ingest-path ./docs \
--out-dir wiki_store \
--exts md txt pdf \
--max-pdf-pages 10 \
--verbose
```

During ingestion, each text chunk is embedded and stored as:
```
segment_<n>.is      # text
segment_<n+1>.is    # embedding (float32 or int8)
```
Link fields in the headers connect the two.
Each `.is` segment file contains:

```
[32-byte header][payload]
```
| Field | Type | Bytes | Description |
|---|---|---|---|
| next_id | int32 | 4 | link to paired vector/text |
| prev_id | int32 | 4 | reverse link |
| data_len | int32 | 4 | length of payload |
| data_type | int32 | 4 | 0=text, 1=vec_f32, 2=vec_i8, 100=coarse |
| dim | int32 | 4 | vector dimension |
| scale | float32 | 4 | quantization scale |
| reserved | bytes | 8 | timestamp / nonce / magic |
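To make the layout concrete, here is a minimal sketch of packing the 32-byte header and quantizing a vector payload. The exact field order, endianness, and `reserved` encoding are assumptions for illustration; the authoritative format lives in `storage.py`.

```python
import struct
import numpy as np

# Assumed order: next_id, prev_id, data_len, data_type, dim, scale, reserved (8 bytes).
HEADER_FMT = "<iiiiif8s"  # little-endian, 32 bytes total
assert struct.calcsize(HEADER_FMT) == 32

def pack_header(next_id, prev_id, data_len, data_type, dim, scale, reserved=b"\x00" * 8):
    return struct.pack(HEADER_FMT, next_id, prev_id, data_len, data_type, dim, scale, reserved)

def quantize_int8(vec):
    """int8 quantization with a per-segment scale: payload shrinks 4x vs float32."""
    scale = float(np.abs(vec).max()) / 127.0 or 1.0
    return (vec / scale).round().astype(np.int8), scale

def dequantize_int8(payload, scale):
    return payload.astype(np.float32) * scale

vec = np.random.randn(1536).astype(np.float32)
q, scale = quantize_int8(vec)
header = pack_header(next_id=-1, prev_id=7, data_len=q.nbytes,
                     data_type=2, dim=1536, scale=scale)  # data_type 2 = vec_i8
segment_bytes = header + q.tobytes()  # [32-byte header][payload]
```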
```python
from contexttape.storage import TSStore, MultiStore, write_playlist
import numpy as np
# Create store
store = TSStore("wiki_store")
# Append text + embedding
vec = np.random.randn(1536).astype(np.float32)
t_id, v_id = store.append_text_with_embedding("Photosynthesis converts light energy.", vec)
# Search
q = np.random.randn(1536).astype(np.float32)
hits = store.search_by_vector(q, top_k=5)
for score, tid, eid in hits:
    print(score, store.read_text(tid))
# Multi-store fusion
wiki = TSStore("wiki_store")
chat = TSStore("chat_ts")
ms = MultiStore([wiki, chat])
res = ms.search(q, per_shard_k=8, final_k=5)
```

Search stores for nearest vector matches.
ct search "photosynthesis basics" \
--wiki-dir wiki_store \
--chat-dir chat_ts \
--topk 5 \
--verboseHybrid retrieval + prompt assembly.
ct chatOR
ct chat \
--wiki-dir wiki_store --chat-dir chat_ts \
--topk 8 \
--alpha 0.6 \
--min-score 0.32 --min-lex 0.1 --min-hybrid 0.25 \
--verbose
```
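For intuition about the flags above, the hybrid score can be thought of as a weighted blend of vector and lexical similarity, gated by the minimum-score thresholds. This is a hedged sketch of that blending, assuming a simple token-overlap lexical score; the actual re-ranker lives in `search.py`.

```python
def hybrid_score(vec_score, query, text, alpha=0.6):
    """Blend vector similarity with a crude lexical overlap score.
    alpha mirrors --alpha; the overlap measure here is an assumption."""
    q_tokens, t_tokens = set(query.lower().split()), set(text.lower().split())
    lex_score = len(q_tokens & t_tokens) / max(len(q_tokens), 1)
    return alpha * vec_score + (1 - alpha) * lex_score, lex_score

def passes_thresholds(vec_score, lex_score, hyb_score,
                      min_score=0.32, min_lex=0.1, min_hybrid=0.25):
    """Illustrative gating mirroring --min-score / --min-lex / --min-hybrid."""
    return vec_score >= min_score and lex_score >= min_lex and hyb_score >= min_hybrid
```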
Show stats.

```bash
ct stat --wiki-dir wiki_store
```

Reset chat history.

```bash
ct reset-chat --chat-dir chat_ts
```

ContextTape includes a microbenchmark that measures:
- Query latency (ms)
- QPS (queries/sec)
- Memory footprint (RSS/PSS)
- Estimated or measured energy (J)
- Corpus size and structure
```bash
# Prepare queries
printf "photosynthesis\nquantum computing\n" > /tmp/queries.txt
# Run benchmark (5 repeats)
ct bench \
--wiki-dir wiki_store --chat-dir chat_ts \
--queries-file /tmp/queries.txt \
--repeats 5 \
--topk 5 \
--energy-aware \
--assume-power-watts 15 \
--out-json bench/bench.json \
--out-csv bench/bench.csv \
--out-md bench/bench.md \
--verbose
```

| Component | Description |
|---|---|
| Storage layer | File segments with headers and payloads |
| Retrieval | Sequential scan over vector segments |
| Dereference | Load text only for top-k |
| Quantization | int8 + scale, dequantized on read |
| Hybrid re-rank | Vector + lexical similarity |
| Multi-store fusion | Merge per-shard results |
| Energy module | Adjust stride/k under power budget |
| Playlist | Optional .m3u8 listing for streaming replication |
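The playlist row above refers to exporting an ordered manifest of segment files (the `write_playlist` helper imported earlier). Its exact format and signature are not documented here; the following is only a rough sketch of the idea, assuming an `.m3u8`-style text listing.

```python
from pathlib import Path

def write_playlist_sketch(store_dir, out_path="dist/store.m3u8"):
    """Write an .m3u8-style manifest listing segment files in order (illustrative)."""
    segments = sorted(Path(store_dir).glob("segment_*.is"))
    lines = ["#EXTM3U"] + [seg.as_posix() for seg in segments]
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

write_playlist_sketch("wiki_store")
```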
Example output:

```
Latency (ms): p50=47.71, p95=97.59, mean=49.90
Throughput: 18.66 QPS
Memory: RSS=162.9 MB, PSS=153.9 MB
Energy (est): 16.076 J @ 15 W
Segments: 50 pairs, 487k tokens
Corpus: wiki_store=2.50MB, chat_ts=0.00MB
```
---
✅ **How to run a benchmark (step-by-step)**
1. Make sure you’ve already ingested at least one store (`wiki_store`, `chat_ts`).
2. Create a query file:
```bash
printf "photosynthesis\nquantum computing\n" > /tmp/queries.txt
```
3. Run:
```bash
ct bench \
--wiki-dir wiki_store --chat-dir chat_ts \
--queries-file /tmp/queries.txt \
--repeats 5 \
--topk 5 \
--energy-aware \
--assume-power-watts 15 \
--verbose
```
4. Check results:
- `bench/bench.json` → structured results.
- `bench/bench.csv` → spreadsheet-ready metrics.
- `bench/bench.md` → human-readable summary.
- Energy (if supported) is computed from RAPL, else estimated via `assume-power-watts`.
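When RAPL counters are not available, the estimated energy is simply the assumed power multiplied by measured wall-clock time. A minimal sketch of that fallback, assuming the bench loop records total elapsed seconds:

```python
def estimate_energy_joules(total_wall_seconds, assume_power_watts=15.0):
    """Fallback energy estimate: E = P * t (used when RAPL is unavailable)."""
    return assume_power_watts * total_wall_seconds

# Roughly 1.07 s of query time at an assumed 15 W gives ~16 J,
# in line with the sample bench output above.
print(estimate_energy_joules(1.07, 15.0))  # 16.05
```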
---