# Textrawl


**Turn your documents into Claude's memory.**

Textrawl is a personal knowledge base that lets Claude search through your emails, PDFs, notes, and web pages. Ask questions about your own documents right from Claude Desktop - no copy-pasting, no context limits.

## How It Works

```text
                                    Your Knowledge Base
                                    ┌─────────────────────────────────┐
┌──────────────┐                    │                                 │
│              │                    │   Emails      PDFs      Notes   │
│    Claude    │◄───── search ─────►│     │          │          │     │
│   Desktop    │                    │     ▼          ▼          ▼     │
│              │                    │  ┌──────────────────────────┐   │
└──────────────┘                    │  │   Hybrid Search Engine   │   │
       │                            │  │  (semantic + keywords)   │   │
       │                            │  └──────────────────────────┘   │
       ▼                            │              │                  │
  "What did                         │              ▼                  │
   Sarah say                        │     PostgreSQL + pgvector       │
   about the                        │         (Supabase)              │
   project?"                        │                                 │
                                    └─────────────────────────────────┘
                                                   ▲
                                                   │
                                        ┌──────────┴──────────┐
                                        │                     │
                                   Desktop App            CLI Tools
                                  (drag & drop)        (batch import)
```

## Why Textrawl?

**Beyond keyword search.** Most search tools only match exact words. Textrawl combines semantic understanding (finds "automobile" when you search "car") with traditional keyword matching - so you get relevant results without missing exact phrases.

**Your data, your choice.** Use OpenAI's embeddings for best accuracy, or run completely locally with Ollama - no API costs, no data leaving your machine.

**Import everything.** Emails from Gmail exports, PDFs from your research, saved web pages, Google Takeout archives - Textrawl converts them all into searchable knowledge.
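The conversion step splits each document into paragraph-aware chunks with overlap before embedding (the "Smart Chunking" feature listed below). The sketch here is an illustrative approximation, not Textrawl's actual code; the function name, chunk size, and overlap amount are all assumptions:

```typescript
// Hedged sketch of paragraph-aware chunking with overlap. NOT Textrawl's
// real implementation; maxChars and overlapParas are illustrative defaults.
function chunkText(text: string, maxChars = 1000, overlapParas = 1): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let length = 0;
  for (const para of paragraphs) {
    if (length + para.length > maxChars && current.length > 0) {
      chunks.push(current.join("\n\n"));
      // Carry the last paragraph(s) into the next chunk for context overlap.
      current = current.slice(-overlapParas);
      length = current.reduce((n, p) => n + p.length, 0);
    }
    current.push(para);
    length += para.length;
  }
  if (current.length) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Carrying the last paragraph into the next chunk keeps cross-paragraph context searchable, at the cost of some duplicated text.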

## Features

| Feature | Description |
| --- | --- |
| Hybrid Search | Vector similarity + full-text search with Reciprocal Rank Fusion |
| Desktop App | Drag-and-drop file conversion and upload (macOS, Windows, Linux) |
| Multi-Format | PDF, DOCX, XLSX, PPTX, HTML, MBOX/EML emails, Google Takeout |
| MCP Integration | Works natively with Claude Desktop and other MCP clients |
| Flexible Embeddings | OpenAI (cloud) or Ollama (free, local) |
| Smart Chunking | Paragraph-aware splitting with overlap for context |
| CLI Tools | Batch processing for large archives |
| Cloud Ready | Deploy to Docker, Cloud Run, or any container platform |
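Reciprocal Rank Fusion, named in the Hybrid Search feature, merges the vector-similarity and full-text result lists by summing reciprocal-rank scores. The sketch below is illustrative rather than Textrawl's implementation; the function name and `k = 60` (a common default) are assumptions:

```typescript
// Illustrative Reciprocal Rank Fusion (RRF). Each ranked list contributes
// 1 / (k + rank + 1) to an item's fused score; items appearing high in
// several lists accumulate the most. k = 60 is an assumed common default.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Sort by fused score, highest first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const semantic = ["doc-a", "doc-b", "doc-c"];
const keyword = ["doc-b", "doc-c", "doc-a"];
console.log(rrfFuse([semantic, keyword])); // → ["doc-b", "doc-a", "doc-c"]
```

A document ranked near the top of both lists overtakes one that tops only a single list, which is why the fusion favors consistently relevant results.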

## Quick Start

### 1. Set Up the Server

```bash
git clone https://github.com/jeffgreendesign/textrawl.git
cd textrawl
pnpm install
pnpm setup    # Interactive setup for credentials
pnpm dev      # Start the server
```

### 2. Set Up Supabase

1. Create a free project at [supabase.com](https://supabase.com)
2. Run `scripts/setup-db.sql` in the SQL Editor (or `setup-db-ollama.sql` for Ollama)
3. (Optional) For memory tools, also run `scripts/setup-db-memory.sql` (or `setup-db-memory-ollama.sql`)
4. (Optional) For conversation tools, also run `scripts/setup-db-conversation.sql` (or `setup-db-conversation-ollama.sql` / `setup-db-conversation-ollama-v2.sql`)
5. Run `scripts/security-rls.sql` for security hardening
6. Copy your project URL and service role key to `.env`

### 3. Connect Claude Desktop

Add to your Claude config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS). Create this file if it doesn't exist:

```json
{
  "mcpServers": {
    "textrawl": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3000/mcp",
        "--header",
        "Accept: application/json, text/event-stream"
      ]
    }
  }
}
```

> **Note:** Requires Node.js 20+. If using nvm, ensure your default is set: `nvm alias default 22`

If you've set `API_BEARER_TOKEN` in `.env`, add the auth header to the `args` array:

```json
"--header",
"Authorization: Bearer YOUR_TOKEN_HERE"
```
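For reference, a complete `mcpServers` entry with both headers looks like the following; this merely combines the two snippets above, and the token value is a placeholder:

```json
{
  "mcpServers": {
    "textrawl": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:3000/mcp",
        "--header",
        "Accept: application/json, text/event-stream",
        "--header",
        "Authorization: Bearer YOUR_TOKEN_HERE"
      ]
    }
  }
}
```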

Restart Claude Desktop - you'll now see Textrawl's tools available.

### 3b. Connect ChatGPT Desktop (Alternative)

ChatGPT Desktop supports MCP servers natively (Pro/Plus required):

1. Open Settings → Connectors → Advanced → Developer mode
2. Add a new connector with your server URL: `http://localhost:3000/mcp`
3. If using auth, add the `Authorization: Bearer YOUR_TOKEN` header

See the OpenAI MCP documentation for details.

### 4. Add Your Documents

**Option A: Desktop App (easiest)**

```bash
pnpm desktop:dev
```

Drag files onto the window to convert and upload.

**Option B: CLI (for batch imports)**

```bash
pnpm convert -- mbox ~/Mail/archive.mbox
pnpm upload -- ./converted/
```

## Documentation

| Guide | Description |
| --- | --- |
| CLI Tools | Batch conversion and upload from the command line |
| Security | Row Level Security and access controls |

## Configuration

| Variable | Required | Description |
| --- | --- | --- |
| `SUPABASE_URL` | Yes | `https://your-project.supabase.co` |
| `SUPABASE_SERVICE_KEY` | Yes | Service role key |
| `EMBEDDING_PROVIDER` | No | `openai` (default) or `ollama` |
| `OPENAI_API_KEY` | If OpenAI | For `text-embedding-3-small` |
| `OLLAMA_BASE_URL` | If Ollama | Default: `http://localhost:11434` |
| `OLLAMA_MODEL` | If Ollama | Default: `nomic-embed-text` |
| `API_BEARER_TOKEN` | Prod only | Min 32 chars (`openssl rand -hex 32`) |
| `PORT` | No | Default: `3000` |
| `LOG_LEVEL` | No | `debug`, `info`, `warn`, `error` |
| `ALLOWED_ORIGINS` | No | Comma-separated CORS origins |
| `ENABLE_MEMORY` | No | Enable memory tools (default: `true`); requires `setup-db-memory.sql` or `setup-db-memory-ollama.sql` |
| `ENABLE_CONVERSATIONS` | No | Enable conversation memory tools (default: `true`); requires `setup-db-conversation.sql`, `setup-db-conversation-ollama.sql`, or `setup-db-conversation-ollama-v2.sql` |
| `ENABLE_INSIGHTS` | No | Enable proactive insight tools (default: `true`) |
| `ENABLE_MEMORY_EXTRACTION` | No | Enable LLM-based memory extraction (default: `false`) |
| `ANTHROPIC_API_KEY` | If extraction | Required for the `extract_memories` tool |
| `EXTRACTION_MODEL` | No | Model for extraction (default: `claude-3-haiku-20240307`) |
| `COMPACT_RESPONSES` | No | Token-efficient responses (default: `true`) |
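As a worked example, a minimal `.env` for an OpenAI-backed setup might contain the following; every value here is a placeholder:

```bash
# Hypothetical .env — placeholder values, built from the variables above
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your-openai-key
PORT=3000
LOG_LEVEL=info
```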

## MCP Tools

### Document Tools

| Tool | Description |
| --- | --- |
| `search_knowledge` | Hybrid semantic + full-text search |
| `get_document` | Retrieve document by ID |
| `list_documents` | List with pagination and filtering |
| `update_document` | Update title and/or tags |
| `add_note` | Add a markdown note to the knowledge base |

### Memory Tools (Persistent Memory)

Enable with `ENABLE_MEMORY=true` (the default). Requires `scripts/setup-db-memory.sql` or `setup-db-memory-ollama.sql`.

| Tool | Description |
| --- | --- |
| `remember_fact` | Store facts about entities (people, projects, concepts) |
| `recall_memories` | Semantic search across stored memories |
| `relate_entities` | Create relationships between entities |
| `get_entity_context` | Get all memories and relations for an entity |
| `list_entities` | List all known entities |
| `forget_entity` | Delete an entity and all its memories |
| `memory_stats` | Get memory statistics |
| `extract_memories` | Extract entities and facts from text using an LLM |

### Conversation Tools (Conversation Memory)

Enable with `ENABLE_CONVERSATIONS=true` (the default). Requires running one of the conversation schema scripts:

- `scripts/setup-db-conversation.sql` (OpenAI embeddings)
- `scripts/setup-db-conversation-ollama.sql` (Ollama v1 - `nomic-embed-text`, 1024d)
- `scripts/setup-db-conversation-ollama-v2.sql` (Ollama v2 - `nomic-embed-text-v2-moe`, 768d)

| Tool | Description |
| --- | --- |
| `save_conversation_context` | Save a conversation summary and turns for recall |
| `recall_conversation` | Semantic search across past conversations |
| `list_conversations` | List recent conversation sessions |
| `get_conversation` | Get a full conversation by session ID or key |
| `delete_conversation` | Delete a conversation session |
| `conversation_stats` | Get conversation storage statistics |

### Unified Search

| Tool | Description |
| --- | --- |
| `search_with_context` | Search across documents, memories, and conversations simultaneously |
| `knowledge_stats` | Get statistics about the knowledge base |

### Insight Tools (Proactive Discovery)

Enable with `ENABLE_INSIGHTS=true` (the default).

| Tool | Description |
| --- | --- |
| `get_insights` | View discovered cross-source connections and patterns |
| `discover_connections` | Trigger an insight scan across the knowledge base |
| `dismiss_insight` | Dismiss an insight from the queue |
| `insight_stats` | Get insight queue and processing statistics |

### Search Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string | required | Search query |
| `limit` | number | 10 | Max results (1-50) |
| `fullTextWeight` | number | 1.0 | Keyword weight (0-2) |
| `semanticWeight` | number | 1.0 | Semantic weight (0-2) |
| `minScore` | number | 0 | Min relevance threshold (0-1) |
| `tags` | string[] | - | Filter by tags (AND logic) |
| `sourceType` | string | - | `note`, `file`, or `url` |
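To illustrate how the parameters combine, a `search_knowledge` call that leans on exact keyword matches within tagged notes might pass arguments like these (all values are illustrative):

```json
{
  "query": "quarterly budget review",
  "limit": 5,
  "fullTextWeight": 1.5,
  "semanticWeight": 0.5,
  "minScore": 0.2,
  "tags": ["finance"],
  "sourceType": "note"
}
```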

## REST API

### Upload Documents

```bash
curl -X POST http://localhost:3000/api/upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@document.pdf" \
  -F "title=Optional Title" \
  -F "tags=tag1,tag2"
```

**Limits:** 10MB max file size, 10 uploads/min. **Formats:** `.pdf`, `.docx`, `.txt`, `.md`

Response:

```json
{
  "success": true,
  "documentId": "uuid",
  "title": "Document Title",
  "tags": ["tag1", "tag2"],
  "chunksCreated": 12
}
```

### Health Checks

- `GET /health` - Basic health
- `GET /health/ready` - Readiness probe (checks DB)
- `GET /health/live` - Liveness probe

## Deployment

### Docker Compose

```bash
docker-compose up -d
docker-compose logs -f
```

### Google Cloud Run

```bash
# Create secrets in Secret Manager first
export GCP_PROJECT_ID=your-project-id
./scripts/deploy.sh
```

## Development

```bash
pnpm dev            # Watch mode
pnpm build          # Production build
pnpm start          # Run production
pnpm typecheck      # Type check
pnpm lint           # Biome lint check
pnpm quality        # Lint + typecheck combined
pnpm inspector      # MCP Inspector
pnpm setup          # Generate .env with secure token
pnpm desktop:dev    # Run desktop app
pnpm docs:dev       # Run docs site
```

### Local Database (Optional)

Run PostgreSQL + pgvector locally instead of using Supabase:

```bash
# Start local Postgres with pgvector
docker-compose -f docker-compose.local.yml up -d

# Initialize the database schema
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db.sql

# Optional: Start pgAdmin at http://localhost:5050
docker-compose -f docker-compose.local.yml --profile tools up -d
```

### Local Embeddings with Ollama (No API Key Required)

Run embeddings locally with Ollama instead of OpenAI:

```bash
# Start Postgres + Ollama
docker-compose -f docker-compose.local.yml --profile ollama up -d

# Pull the embedding model (~274MB)
docker exec textrawl-ollama ollama pull nomic-embed-text

# Use the Ollama-specific schema (1024 dimensions)
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db-ollama.sql
```

Set in `.env`:

```bash
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
```

Supported Ollama models: `nomic-embed-text` (recommended), `mxbai-embed-large`

> **Note:** OpenAI uses 1536-dimension embeddings, while the Ollama models use 1024. Use `setup-db.sql` for OpenAI or `setup-db-ollama.sql` for Ollama. You cannot mix providers without re-embedding all documents.

## Troubleshooting

| Issue | Solution |
| --- | --- |
| Invalid Supabase URL | Format: `https://your-project.supabase.co` (no trailing slash) |
| Missing service role key | Use the service role key from Settings > API, not the anon key |
| No search results | Check that the chunks table has embeddings; lower `minScore` |
| MCP tools not in Claude | Restart Claude Desktop; check `curl http://localhost:3000/health` |
| Rate limit exceeded | API: 100/min, Upload: 10/min |

## Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

## License

MIT - see LICENSE
