# Textrawl


**Turn your documents into Claude's memory.**

Textrawl is a personal knowledge base that lets Claude search through your emails, PDFs, notes, and web pages. Ask questions about your own documents right from Claude Desktop - no copy-pasting, no context limits.

## How It Works

```text
                                    Your Knowledge Base
                                    ┌─────────────────────────────────┐
┌──────────────┐                    │                                 │
│              │                    │   Emails      PDFs      Notes   │
│    Claude    │◄───── search ─────►│     │          │          │     │
│   Desktop    │                    │     ▼          ▼          ▼     │
│              │                    │  ┌──────────────────────────┐   │
└──────────────┘                    │  │   Hybrid Search Engine   │   │
       │                            │  │  (semantic + keywords)   │   │
       │                            │  └──────────────────────────┘   │
       ▼                            │              │                  │
  "What did                         │              ▼                  │
   Sarah say                        │     PostgreSQL + pgvector       │
   about the                        │         (Supabase)              │
   project?"                        │                                 │
                                    └─────────────────────────────────┘
                                                   ▲
                                                   │
                                        ┌──────────┴──────────┐
                                        │                     │
                                   Desktop App            CLI Tools
                                  (drag & drop)        (batch import)
```

## Why Textrawl?

**Beyond keyword search.** Most search tools only match exact words. Textrawl combines semantic understanding (finds "automobile" when you search "car") with traditional keyword matching - so you get relevant results without missing exact phrases.

**Your data, your choice.** Use OpenAI's embeddings for best accuracy, or run completely locally with Ollama - no API costs, no data leaving your machine.

**Import everything.** Emails from Gmail exports, PDFs from your research, saved web pages, Google Takeout archives - Textrawl converts them all into searchable knowledge.
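The conversion step splits each document into paragraph-aware chunks with overlap before embedding (the "Smart Chunking" feature listed below). The sketch here is an illustrative approximation, not Textrawl's actual code; the function name, chunk size, and overlap amount are all assumptions:

```typescript
// Hedged sketch of paragraph-aware chunking with overlap. NOT Textrawl's
// real implementation; maxChars and overlapParas are illustrative defaults.
function chunkText(text: string, maxChars = 1000, overlapParas = 1): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let length = 0;
  for (const para of paragraphs) {
    if (length + para.length > maxChars && current.length > 0) {
      chunks.push(current.join("\n\n"));
      // Carry the last paragraph(s) into the next chunk for context overlap.
      current = current.slice(-overlapParas);
      length = current.reduce((n, p) => n + p.length, 0);
    }
    current.push(para);
    length += para.length;
  }
  if (current.length) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Carrying the last paragraph into the next chunk keeps cross-paragraph context searchable, at the cost of some duplicated text.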

## Features

| Feature | Description |
| --- | --- |
| Hybrid Search | Vector similarity + full-text search with Reciprocal Rank Fusion |
| Desktop App | Drag-and-drop file conversion and upload (macOS, Windows, Linux) |
| Multi-Format | PDF, DOCX, XLSX, PPTX, HTML, MBOX/EML emails, Google Takeout |
| MCP Integration | Works natively with Claude Desktop and other MCP clients |
| Flexible Embeddings | OpenAI (cloud) or Ollama (free, local) |
| Smart Chunking | Paragraph-aware splitting with overlap for context |
| CLI Tools | Batch processing for large archives |
| Cloud Ready | Deploy to Docker, Cloud Run, or any container platform |
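Reciprocal Rank Fusion, named in the Hybrid Search feature, merges the vector-similarity and full-text result lists by summing reciprocal-rank scores. The sketch below is illustrative rather than Textrawl's implementation; the function name and `k = 60` (a common default) are assumptions:

```typescript
// Illustrative Reciprocal Rank Fusion (RRF). Each ranked list contributes
// 1 / (k + rank + 1) to an item's fused score; items appearing high in
// several lists accumulate the most. k = 60 is an assumed common default.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Sort by fused score, highest first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const semantic = ["doc-a", "doc-b", "doc-c"];
const keyword = ["doc-b", "doc-c", "doc-a"];
console.log(rrfFuse([semantic, keyword])); // → ["doc-b", "doc-a", "doc-c"]
```

A document ranked near the top of both lists overtakes one that tops only a single list, which is why the fusion favors consistently relevant results.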

## Quick Start

### 1. Set Up the Server

```bash
git clone https://github.com/jeffgreendesign/textrawl.git
cd textrawl
pnpm install
pnpm setup    # Interactive setup for credentials
pnpm dev      # Start the server
```

### 2. Set Up Supabase

1. Create a free project at [supabase.com](https://supabase.com)
2. Run `scripts/setup-db.sql` in the SQL Editor (or `setup-db-ollama.sql` for Ollama)
3. (Optional) For memory tools, also run `scripts/setup-db-memory.sql` (or `setup-db-memory-ollama.sql`)
4. (Optional) For conversation tools, also run `scripts/setup-db-conversation.sql` (or `setup-db-conversation-ollama.sql` / `setup-db-conversation-ollama-v2.sql`)
5. Run `scripts/security-rls.sql` for security hardening
6. Copy your project URL and service role key to `.env`

### 3. Connect Claude Desktop

Add to your Claude config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS). Create this file if it doesn't exist:

```json
{
  "mcpServers": {
    "textrawl": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3000/mcp",
        "--header",
        "Accept: application/json, text/event-stream"
      ]
    }
  }
}
```

> **Note:** Requires Node.js 20+. If using nvm, ensure your default is set: `nvm alias default 22`

If you've set `API_BEARER_TOKEN` in `.env`, add the auth header to the `args` array:

```json
"--header",
"Authorization: Bearer YOUR_TOKEN_HERE"
```
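For reference, a complete `mcpServers` entry with both headers looks like the following; this merely combines the two snippets above, and the token value is a placeholder:

```json
{
  "mcpServers": {
    "textrawl": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3000/mcp",
        "--header",
        "Accept: application/json, text/event-stream",
        "--header",
        "Authorization: Bearer YOUR_TOKEN_HERE"
      ]
    }
  }
}
```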

Restart Claude Desktop - you'll now see Textrawl's tools available.

### 3b. Connect ChatGPT Desktop (Alternative)

ChatGPT Desktop supports MCP servers natively (Pro/Plus required):

1. Open Settings → Connectors → Advanced → Developer mode
2. Add a new connector with your server URL: `http://localhost:3000/mcp`
3. If using auth, add the `Authorization: Bearer YOUR_TOKEN` header

See the OpenAI MCP documentation for details.

### 4. Add Your Documents

**Option A: Desktop App (easiest)**

```bash
pnpm desktop:dev
```

Drag files onto the window to convert and upload.

**Option B: CLI (for batch imports)**

```bash
pnpm convert -- mbox ~/Mail/archive.mbox
pnpm upload -- ./converted/
```

## Documentation

| Guide | Description |
| --- | --- |
| CLI Tools | Batch conversion and upload from the command line |
| Security | Row Level Security and access controls |

## Configuration

| Variable | Required | Description |
| --- | --- | --- |
| `SUPABASE_URL` | Yes | `https://your-project.supabase.co` |
| `SUPABASE_SERVICE_KEY` | Yes | Service role key |
| `EMBEDDING_PROVIDER` | No | `openai` (default) or `ollama` |
| `OPENAI_API_KEY` | If OpenAI | For `text-embedding-3-small` |
| `OLLAMA_BASE_URL` | If Ollama | Default: `http://localhost:11434` |
| `OLLAMA_MODEL` | If Ollama | Default: `nomic-embed-text` |
| `API_BEARER_TOKEN` | Prod only | Min 32 chars (`openssl rand -hex 32`) |
| `PORT` | No | Default: `3000` |
| `LOG_LEVEL` | No | `debug`, `info`, `warn`, `error` |
| `ALLOWED_ORIGINS` | No | Comma-separated CORS origins |
| `ENABLE_MEMORY` | No | Enable memory tools (default: `true`); requires `setup-db-memory.sql` or `setup-db-memory-ollama.sql` |
| `ENABLE_CONVERSATIONS` | No | Enable conversation memory tools (default: `true`); requires `setup-db-conversation.sql`, `setup-db-conversation-ollama.sql`, or `setup-db-conversation-ollama-v2.sql` |
| `ENABLE_INSIGHTS` | No | Enable proactive insight tools (default: `true`) |
| `ENABLE_MEMORY_EXTRACTION` | No | Enable LLM-based memory extraction (default: `false`) |
| `ANTHROPIC_API_KEY` | If extraction | Required for the `extract_memories` tool |
| `EXTRACTION_MODEL` | No | Model for extraction (default: `claude-3-haiku-20240307`) |
| `COMPACT_RESPONSES` | No | Token-efficient responses (default: `true`) |
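As a worked example, a minimal `.env` for an OpenAI-backed setup might contain the following; every value here is a placeholder:

```bash
# Hypothetical .env — placeholder values, built from the variables above
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your-openai-key
PORT=3000
LOG_LEVEL=info
```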

## MCP Tools

### Document Tools

| Tool | Description |
| --- | --- |
| `search_knowledge` | Hybrid semantic + full-text search |
| `get_document` | Retrieve document by ID |
| `list_documents` | List with pagination and filtering |
| `update_document` | Update title and/or tags |
| `add_note` | Add a markdown note to the knowledge base |

### Memory Tools (Persistent Memory)

Enable with `ENABLE_MEMORY=true` (the default). Requires `scripts/setup-db-memory.sql` or `setup-db-memory-ollama.sql`.

| Tool | Description |
| --- | --- |
| `remember_fact` | Store facts about entities (people, projects, concepts) |
| `recall_memories` | Semantic search across stored memories |
| `relate_entities` | Create relationships between entities |
| `get_entity_context` | Get all memories and relations for an entity |
| `list_entities` | List all known entities |
| `forget_entity` | Delete an entity and all its memories |
| `memory_stats` | Get memory statistics |
| `extract_memories` | Extract entities and facts from text using an LLM |

### Conversation Tools (Conversation Memory)

Enable with `ENABLE_CONVERSATIONS=true` (the default). Requires running one of the conversation schema scripts:

- `scripts/setup-db-conversation.sql` (OpenAI embeddings)
- `scripts/setup-db-conversation-ollama.sql` (Ollama v1 - `nomic-embed-text`, 1024d)
- `scripts/setup-db-conversation-ollama-v2.sql` (Ollama v2 - `nomic-embed-text-v2-moe`, 768d)

| Tool | Description |
| --- | --- |
| `save_conversation_context` | Save a conversation summary and turns for recall |
| `recall_conversation` | Semantic search across past conversations |
| `list_conversations` | List recent conversation sessions |
| `get_conversation` | Get a full conversation by session ID or key |
| `delete_conversation` | Delete a conversation session |
| `conversation_stats` | Get conversation storage statistics |

### Unified Search

| Tool | Description |
| --- | --- |
| `search_with_context` | Search across documents, memories, and conversations simultaneously |
| `knowledge_stats` | Get statistics about the knowledge base |

### Insight Tools (Proactive Discovery)

Enable with `ENABLE_INSIGHTS=true` (the default).

| Tool | Description |
| --- | --- |
| `get_insights` | View discovered cross-source connections and patterns |
| `discover_connections` | Trigger an insight scan across the knowledge base |
| `dismiss_insight` | Dismiss an insight from the queue |
| `insight_stats` | Get insight queue and processing statistics |

### Search Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query |
| `limit` | number | 10 | Max results (1-50) |
| `fullTextWeight` | number | 1.0 | Keyword weight (0-2) |
| `semanticWeight` | number | 1.0 | Semantic weight (0-2) |
| `minScore` | number | 0 | Min relevance threshold (0-1) |
| `tags` | string[] | - | Filter by tags (AND logic) |
| `sourceType` | string | - | `note`, `file`, or `url` |
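To illustrate how the parameters combine, a `search_knowledge` call that leans on exact keyword matches within tagged notes might pass arguments like these (all values are illustrative):

```json
{
  "query": "quarterly budget review",
  "limit": 5,
  "fullTextWeight": 1.5,
  "semanticWeight": 0.5,
  "minScore": 0.2,
  "tags": ["finance"],
  "sourceType": "note"
}
```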

## REST API

### Upload Documents

```bash
curl -X POST http://localhost:3000/api/upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@document.pdf" \
  -F "title=Optional Title" \
  -F "tags=tag1,tag2"
```

**Limits:** 10MB max file size, 10 uploads/min. **Formats:** `.pdf`, `.docx`, `.txt`, `.md`

Response:

```json
{
  "success": true,
  "documentId": "uuid",
  "title": "Document Title",
  "tags": ["tag1", "tag2"],
  "chunksCreated": 12
}
```

### Health Checks

- `GET /health` - Basic health
- `GET /health/ready` - Readiness probe (checks DB)
- `GET /health/live` - Liveness probe

## Deployment

### Docker Compose

```bash
docker-compose up -d
docker-compose logs -f
```

### Google Cloud Run

```bash
# Create secrets in Secret Manager first
export GCP_PROJECT_ID=your-project-id
./scripts/deploy.sh
```

## Development

```bash
pnpm dev            # Watch mode
pnpm build          # Production build
pnpm start          # Run production
pnpm typecheck      # Type check
pnpm lint           # Biome lint check
pnpm quality        # Lint + typecheck combined
pnpm inspector      # MCP Inspector
pnpm setup          # Generate .env with secure token
pnpm desktop:dev    # Run desktop app
pnpm docs:dev       # Run docs site
```

### Local Database (Optional)

Run PostgreSQL + pgvector locally instead of using Supabase:

```bash
# Start local Postgres with pgvector
docker-compose -f docker-compose.local.yml up -d

# Initialize the database schema
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db.sql

# Optional: Start pgAdmin at http://localhost:5050
docker-compose -f docker-compose.local.yml --profile tools up -d
```

### Local Embeddings with Ollama (No API Key Required)

Run embeddings locally with Ollama instead of OpenAI:

```bash
# Start Postgres + Ollama
docker-compose -f docker-compose.local.yml --profile ollama up -d

# Pull the embedding model (~274MB)
docker exec textrawl-ollama ollama pull nomic-embed-text

# Use the Ollama-specific schema (1024 dimensions)
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db-ollama.sql
```

Set in `.env`:

```bash
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
```

Supported Ollama models: `nomic-embed-text` (recommended), `mxbai-embed-large`

> **Note:** OpenAI uses 1536-dimension embeddings, while the Ollama models use 1024. Use `setup-db.sql` for OpenAI or `setup-db-ollama.sql` for Ollama. You cannot mix providers without re-embedding all documents.

## Troubleshooting

| Issue | Solution |
| --- | --- |
| Invalid Supabase URL | Format: `https://your-project.supabase.co` (no trailing slash) |
| Missing service role key | Use the service role key from Settings > API, not the anon key |
| No search results | Check that the chunks table has embeddings; lower `minScore` |
| MCP tools not in Claude | Restart Claude Desktop; check `curl http://localhost:3000/health` |
| Rate limit exceeded | API: 100/min, Upload: 10/min |

## Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

## License

MIT - see LICENSE
