Memory is a TypeScript/Bun service that ingests, stores, and retrieves “memories” (text snippets and extracted facts) with semantic search and retrieval-augmented generation (RAG). It uses Express for the API, Prisma/PostgreSQL for structured storage, Qdrant for vector search, and OpenAI for embeddings, fact extraction, reranking, and answer generation.
- REST API for creating, updating, deleting, and querying memories
- Fact-first storage: extracts atomic facts before embedding for richer recall
- Semantic search via Qdrant vector DB with scoped filters (userId, agentId, runId)
- RAG Q&A: retrieves memories, optionally reranks, then generates answers with LLMs
- Deduplication using normalized content hashing
- Validation with Zod schemas on all inputs
- Configurable models for embedding, rerank, fact extraction, and answer generation
- API Layer: Express routes under `/api` with controllers handling validation and responses.
- Services:
  - Memory Service: core orchestration (create, batch ingest, search, ask/answer, dedupe).
  - Fact Extraction Service: OpenAI chat completion to produce concise facts.
  - Embedding Service: generates embeddings (OpenAI), stores/searches vectors in Qdrant (illustrated in the sketch below).
  - Rerank & Answer: optional reranking plus answer generation via OpenAI chat.
- Data Stores:
  - PostgreSQL (Prisma): memory metadata (source, tags, categories, attributes, summary, contentHash).
  - Qdrant: embeddings with payload metadata for filtered search.
- Utilities: hashing for deduplication, prompt templates for RAG flows.
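As a hedged illustration of how the Embedding Service ties OpenAI embeddings to Qdrant storage, the sketch below embeds a single fact and upserts it with scoped payload metadata. The helper name `storeFact`, the payload shape, and the fallback model/collection names are assumptions, not the service's actual code.

```typescript
// Illustrative sketch only: embeds one extracted fact and stores it in Qdrant.
// Env variable names come from the configuration section below; the payload
// shape and fallback values are assumptions.
import OpenAI from "openai";
import { QdrantClient } from "@qdrant/js-client-rest";
import { randomUUID } from "crypto";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const qdrant = new QdrantClient({
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
});

export async function storeFact(
  fact: string,
  scope: { userId: string; agentId?: string; runId?: string; memoryId: string }
) {
  // 1) Generate the embedding for the extracted fact.
  const res = await openai.embeddings.create({
    model: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
    input: fact,
  });
  const vector = res.data[0].embedding;

  // 2) Upsert the vector with payload metadata so searches can be filtered by scope.
  await qdrant.upsert(process.env.QDRANT_COLLECTION_NAME ?? "memories", {
    wait: true,
    points: [{ id: randomUUID(), vector, payload: { text: fact, ...scope } }],
  });
}
```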
Memory fields include: userId, agentId, runId, role, source, sourceId, timestamp, contentUrl, title, origin, tags[], category[], attribute (JSON), summary, type, importance, confidence, embeddingRef, contentHash (unique), createdAt, updatedAt. Indexed on contentHash, userId + contentHash, and userId + agentId + runId.
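For orientation, those fields map roughly onto a record shape like the following TypeScript sketch (not the generated Prisma types; the `id` field and per-field optionality are assumptions):

```typescript
// Rough shape of a memory row as described above; optionality is assumed.
interface MemoryRecord {
  id: string;
  userId: string;
  agentId?: string;
  runId?: string;
  role?: string;
  source?: string;
  sourceId?: string;
  timestamp?: Date;
  contentUrl?: string;
  title?: string;
  origin?: string;
  tags: string[];
  category: string[];
  attribute?: Record<string, unknown>; // JSON column
  summary?: string;
  type?: string;
  importance?: number;
  confidence?: number;
  embeddingRef?: string;
  contentHash: string; // unique; used for deduplication
  createdAt: Date;
  updatedAt: Date;
}
```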
Base path: `/api`
- `POST /memory` — Create a memory (fact extraction + embeddings).
- `PUT /memory` — Update memory metadata/content (does not currently re-embed).
- `DELETE /memory` — Delete a memory (the controller does not delete the vector; use the Embedding Service if needed).
- `GET /memory` — Get a memory by id.
- `GET /memory/user` — List memories for a user.
- `POST /memories` — Batch ingest messages; defaults to fact extraction unless `infer=false`.
- `POST /memories/search` — Semantic search with filters (userId/agentId/runId, limit, scoreThreshold).
- `POST /memories/answer` — Ask with an optional query override; returns answer + source memories.
- `POST /memories/ask` — Similar to answer; returns answer, memories, count, models.
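For example, a create request might look like the following sketch; the body field names (`content`, `source`, `tags`) are illustrative, since the authoritative shapes are the Zod schemas described below.

```typescript
// Hypothetical client call; the exact body fields are enforced by the Zod schemas.
const res = await fetch("http://localhost:8000/api/memory", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    userId: "user-123",
    content: "I moved to Berlin last month and prefer vegetarian food.",
    source: "chat",
    tags: ["profile"],
  }),
});
console.log(await res.json());
```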
Request/response schemas are enforced via Zod in `src/types/memory.types.ts`.
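As a hedged illustration of that validation style (not the project's actual schema), a search-request schema could look like:

```typescript
import { z } from "zod";

// Illustrative only: the real schemas live in src/types/memory.types.ts.
const SearchRequestSchema = z.object({
  query: z.string().min(1),
  userId: z.string(),
  agentId: z.string().optional(),
  runId: z.string().optional(),
  limit: z.number().int().positive().optional(),
  scoreThreshold: z.number().min(0).max(1).optional(),
});

type SearchRequest = z.infer<typeof SearchRequestSchema>;
```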
- Create: controller validates → Memory Service dedupes by hash → save row → extract facts → embed each fact → store vectors in Qdrant → update memory summary/embeddingRef → respond.
- Batch: iterate messages; `infer=true` follows the Create flow per message; `infer=false` stores a full-content embedding once.
- Search: generate a query embedding → Qdrant search with filters → return scored payloads (see the sketch below).
- Ask/Answer (RAG): search → optional rerank via LLM scores → format memories → answer via LLM.
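A minimal sketch of the search step, assuming the Qdrant JS client and a pre-computed query embedding (the helper name, payload key, and fallback collection name are illustrative):

```typescript
// Illustrative search step: run a filtered Qdrant search for an embedded query.
import { QdrantClient } from "@qdrant/js-client-rest";

const qdrant = new QdrantClient({
  url: process.env.QDRANT_URL,
  apiKey: process.env.QDRANT_API_KEY,
});

async function searchMemories(
  queryVector: number[],
  userId: string,
  limit = 10,
  scoreThreshold = 0.3
) {
  return qdrant.search(process.env.QDRANT_COLLECTION_NAME ?? "memories", {
    vector: queryVector,
    limit,
    score_threshold: scoreThreshold,
    // Only return points whose payload matches the requesting user.
    filter: { must: [{ key: "userId", match: { value: userId } }] },
  });
}
```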
- `PORT` — API port (default 8000)
- `OPENAI_API_KEY` — OpenAI key
- `EMBEDDING_MODEL` — OpenAI embedding model name
- `EMBEDDING_DIMENSION` — Embedding vector dimension (must match the Qdrant collection)
- `QDRANT_URL` — Qdrant endpoint
- `QDRANT_API_KEY` — Qdrant API key (if required)
- `QDRANT_COLLECTION_NAME` or `COLLECTION_NAME` — Target collection
- `ANSWER_MODEL` — Model for answer generation (default `gpt-4o-mini`)
- `RERANK_MODEL` — Model for reranking (default `gpt-4o-mini`)
- `RERANK_ENABLED` — `"true"` to enable rerank by default
- `RERANK_TOP_K` — Max docs after rerank
- `FACT_MODEL` — Model for fact extraction (default `gpt-4o-mini`)
- `NODE_ENV` — Controls Prisma logging
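An illustrative `.env` with placeholder values (the model, dimension, and collection values are examples, not requirements):

```
PORT=8000
OPENAI_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=
QDRANT_COLLECTION_NAME=memories
ANSWER_MODEL=gpt-4o-mini
RERANK_MODEL=gpt-4o-mini
RERANK_ENABLED=true
RERANK_TOP_K=5
FACT_MODEL=gpt-4o-mini
NODE_ENV=development
```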
- Install Bun: `curl -fsSL https://bun.sh/install | bash`
- Install deps: `bun install`
- Configure environment variables (e.g., in `.env`).
- Prepare the Postgres database and run `prisma generate` (ensure the Prisma generator output matches `src/generated/prisma`).
- Ensure Qdrant is reachable and the collection matches the embedding dimension.
- Dev/serve: `bun run index.ts`
- The server listens on `PORT` and exposes the `/api/...` routes.
- Deduplication is per `userId` (and agent/run attributes) using `contentHash` (see the hashing sketch after these notes).
- Fact extraction is mandatory in the create flow; if no facts are extracted, the memory is stored without embeddings.
- Updating a memory does not automatically update its embeddings in the current controllers.
- Qdrant collection is auto-created on first use with cosine distance and configured dimension.
- Rerank is optional and can be toggled per request or via env defaults.
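For reference, normalized content hashing in the spirit of `src/utils/hash.ts` could look roughly like this (a sketch; the real normalization rules may differ):

```typescript
import { createHash } from "crypto";

// Sketch of normalization + hashing for deduplication; the actual rules in
// src/utils/hash.ts (casing, whitespace, punctuation handling) may differ.
export function contentHash(content: string): string {
  const normalized = content.trim().toLowerCase().replace(/\s+/g, " ");
  return createHash("sha256").update(normalized).digest("hex");
}
```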
- `src/index.ts` — Express bootstrap
- `src/routes/memory.routes.ts` — Route definitions
- `src/controllers/memory.controller.ts` — HTTP handlers + validation
- `src/services/memory/memory.service.ts` — Core domain logic
- `src/services/extraction/factExtraction.service.ts` — Fact extraction (OpenAI)
- `src/services/embedding/embedding.service.ts` — Embeddings + Qdrant I/O
- `src/services/vector/qdrant.ts` — Qdrant client wrapper
- `src/services/embedding/openai.ts` — OpenAI client wrapper
- `src/types/memory.types.ts` — Zod schemas and TS types
- `src/utils/hash.ts` — Normalization and content hashing
- `src/prisma/schema.prisma` — Data model