Skip to content
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
---
title: "Adding Persistent Memory to OpenAI Codex with Hindsight"
authors: [benfrank241]
date: 2026-04-08T09:00
tags: [codex, openai, memory, persistent-memory, hindsight, tutorial, coding-agents, python]
image: /img/blog/adding-memory-to-codex-with-hindsight.png
description: "Give OpenAI Codex persistent memory across sessions with Hindsight. Auto-recall injects context before every prompt. Auto-retain extracts facts when sessions end."
hide_table_of_contents: true
---

![Adding Persistent Memory to OpenAI Codex with Hindsight](/img/blog/adding-memory-to-codex-with-hindsight.png)

## TL;DR

<!-- truncate -->

- Codex has no persistent memory built in. Every session starts fresh — no recollection of past decisions, preferences, or codebase context.
- The Hindsight plugin hooks into Codex via three Python scripts with zero `pip install` required. One curl command gets you running.
- Auto-recall queries your memory bank before every prompt and injects relevant facts as invisible context. Codex sees them; you don't have to repeat them.
- Auto-retain fires at the end of every session, extracts facts from the transcript, and stores them for future recall.
- For teams, point everyone's config at a shared Hindsight server with a fixed `bankId`. See [Shared Memory for AI Coding Agents](https://hindsight.vectorize.io/blog/2026/03/31/team-shared-memory-ai-coding-agents).
- Hindsight Cloud stores your memory bank server-side — no local daemon, and memory follows you across machines. [Sign up free.](https://ui.hindsight.vectorize.io/signup)

## The Problem: Codex Has No Persistent Memory

[Codex](https://github.com/openai/codex) is OpenAI's open-source coding agent CLI. You give it a task, it reads your files, runs commands, and iterates until it's done. It's capable and fast — but it has no memory.

Every session starts from nothing. Codex doesn't know which libraries your project uses, which patterns you've standardized on, which areas of the codebase are fragile, or what you were working on yesterday. You re-establish this context at the start of every session, either by explaining it directly or by pointing at an `AGENTS.md` file you've manually maintained.

`AGENTS.md` helps — it's a static markdown file that tells Codex baseline facts about your project on startup. But it captures what you remembered to write down, not what you actually encountered. The Redis TTL discrepancy you noticed Tuesday at 3pm, the JWT edge case that surfaced during code review, the reason you stopped using SQLAlchemy — these live in session transcripts that vanish when the window closes. Nobody updated `AGENTS.md`. Next session, that knowledge is gone.

## How Hindsight Adds Persistent Memory to Codex

[Hindsight](https://github.com/vectorize-io/hindsight) adds a persistent memory layer to Codex by hooking into its lifecycle at two points: before every prompt and after every session.

**Auto-recall.** On every user prompt, the recall hook queries Hindsight for memories relevant to what you're about to ask. The results are injected into Codex as `additionalContext` — prepended to the conversation before the model sees it, but not visible in your terminal output. Codex has the context; you didn't have to repeat it.

**Auto-retain.** When a Codex session ends (the `Stop` hook fires), the retain hook takes the session transcript, strips any injected memory tags to prevent feedback loops, and sends it to Hindsight. The extraction model reads it and pulls out discrete facts — decisions made, patterns observed, bugs found. These land in your memory bank, available for every future session.

**Full-session upsert.** The transcript is stored using the session ID as the document key. If a session is retained multiple times (in chunked mode), the content is upserted rather than duplicated. No accumulation of near-identical entries.

**Minimal dependencies.** The hook scripts use Python stdlib only — no pip install, no virtualenv, no version conflicts. Local daemon mode requires [`uvx`](https://docs.astral.sh/uv/) to run `hindsight-embed`; Cloud mode has no local prerequisites at all.

```
┌──────────────┐ UserPromptSubmit ┌─────────────────────────┐
│ Codex │ ─────────────────────────▶ │ recall.py │
│ CLI │ ◀───────────────────────── │ queries Hindsight, │
│ │ additionalContext inject │ injects as context │
│ │ └─────────────────────────┘
│ │ Stop ┌─────────────────────────┐
│ │ ─────────────────────────▶ │ retain.py │
└──────────────┘ │ strips tags, sends │
│ transcript to Hindsight│
└─────────────────────────┘
```

## Installing

```bash
curl -fsSL https://hindsight.vectorize.io/get-codex | bash
```

The installer guides you through choosing local or cloud mode and writes the hook scripts to `~/.hindsight/codex/`. It also enables `codex_hooks = true` in your `~/.codex/config.toml` automatically.

If you choose Hindsight Cloud during setup, there's nothing else to install — no local daemon, no `uvx` required.

Start a new Codex session — memory is live immediately. The first few sessions build up your bank. By the third or fourth, recall starts surfacing useful context you didn't have to re-explain.

To uninstall:

```bash
curl -fsSL https://hindsight.vectorize.io/get-codex | bash -s -- --uninstall
```

## Hindsight Cloud (Recommended)

For most users, Hindsight Cloud is the easier option: no daemon to manage, memory syncs across machines, and setup is two lines of JSON. The default setup runs a local `hindsight-embed` daemon, but switching to Cloud means you never have to think about it again. Edit `~/.hindsight/codex.json`:

```json
{
"hindsightApiUrl": "https://api.hindsight.vectorize.io",
"hindsightApiToken": "hsk_your_token"
}
```

Create an account and API key at [hindsight.vectorize.io](https://ui.hindsight.vectorize.io/signup). No daemon to manage — the cloud server handles extraction.

## What Gets Recalled

The recall hook fires on every `UserPromptSubmit` event. It takes your prompt (and optionally the previous turn for context), queries Hindsight, and injects the most relevant memories as a block at the top of the conversation:

```
<hindsight_memories>
Relevant memories from past conversations...
Current time - 2026-03-28 09:14

- Project uses FastAPI with asyncpg — not SQLAlchemy [world] (2026-03-26)
- Preferred testing framework: pytest with pytest-asyncio [world] (2026-03-26)
- Redis TTL in production is 15 minutes — README says 30, README is wrong [world] (2026-03-27)
</hindsight_memories>
```

Codex sees this block; it doesn't appear in your terminal output. The result: Codex starts every response with relevant context from past sessions, without you having to provide it.

You can tune how much to inject with `recallBudget` (`"low"`, `"mid"`, `"high"`) and `recallMaxTokens`.

### Before and after

Without persistent memory, you open a session and type: "Continue working on the payments module." Codex has no context. It reads the files, makes reasonable guesses, and may ask clarifying questions you've answered before, or worse, contradict decisions you made last week.

With Hindsight, the same session opens with recalled facts already in context: which payment provider you're using, that you decided against webhooks in favor of polling, and that the staging environment has a known issue with idempotency keys. Codex starts from where you left off, not from zero.

## Per-Project Memory

By default all Codex sessions share a single bank. To give each project its own isolated memory:

```json
{
"dynamicBankId": true,
"dynamicBankGranularity": ["agent", "project"]
}
```

With this config, running Codex in `~/projects/api` and `~/projects/frontend` maintains separate banks. Bank IDs are derived from the working directory path — switching projects automatically switches memory context. Per-project isolation works with Hindsight Cloud the same way it does locally — the bank ID is just routed to the Cloud API instead of a local daemon.

## Team Shared Memory

Individual persistent memory is useful. Shared memory across a team is transformative.

When everyone on a team points their Codex config at the same Hindsight bank, context accumulated by one developer becomes available to all. A bug discovered on Monday surfaces in recall on Tuesday, regardless of who's asking. Architecture decisions made in one session inform the next, without requiring anyone to update a shared doc.

To configure team shared memory, set a fixed `bankId` in each developer's config and point them at the same Hindsight Cloud endpoint:

```json
{
"hindsightApiUrl": "https://api.hindsight.vectorize.io",
"hindsightApiToken": "hsk_your_token",
"bankId": "my-team-project"
}
```

See [Shared Memory for AI Coding Agents](https://hindsight.vectorize.io/blog/2026/03/31/team-shared-memory-ai-coding-agents) for a full team setup guide including bank seeding, per-project isolation, and onboarding patterns.

## Key Configuration Options

Settings live in `~/.hindsight/codex.json`. Every setting can also be set via environment variable.

| Setting | Default | What it does |
|---------|---------|--------------|
| `retainMission` | generic | Steers fact extraction — tell it what to focus on |
| `retainEveryNTurns` | `10` | How often to retain mid-session in chunked mode |
| `recallBudget` | `"mid"` | Search depth: `"low"` (fast) / `"mid"` / `"high"` (thorough) |
| `autoRecall` | `true` | Master switch for recall |
| `autoRetain` | `true` | Master switch for retention |
| `dynamicBankId` | `false` | Enable per-project bank isolation |
| `debug` | `false` | Logs all Hindsight activity to stderr |

Example `retainMission` for a focused memory bank:

```json
{
"retainMission": "Extract technical decisions, known bugs and their workarounds, architecture choices, and coding conventions. Do not retain one-off debugging steps or editor preferences."
}
```

## Pitfalls

**Hooks not firing.** The installer sets `codex_hooks = true` in `~/.codex/config.toml` automatically, but if you installed manually or the file already existed, this may have been missed. Check the file and add the setting under `[features]` if it's missing.

**No memories recalled in the first session.** Recall returns results only after something has been retained. Complete one session first, or seed your bank manually using the [cookbook example](https://github.com/vectorize-io/hindsight-cookbook/tree/main/applications/codex-memory).

**Retention seems delayed.** `retainEveryNTurns` defaults to `10` in chunked mode — retain fires every 10 turns. In full-session mode (the default), retention fires once at session end. If you're testing, add `"retainEveryNTurns": 1` to your config.

**Nothing happening?** Check `~/.hindsight/codex/error.log` first — hook failures are written there. For a full trace of what each hook is doing, enable debug logging:

**Debug mode.** Add `"debug": true` to `~/.hindsight/codex.json` to log detailed activity to `~/.hindsight/codex/debug.log`:

```
[Hindsight] Recalling from bank 'codex', query length: 42
[Hindsight] Injecting 3 memories
[Hindsight] Retaining to bank 'codex', doc 'sess-abc123', 2 messages, 847 chars
```

## Tradeoffs

A few things worth knowing before you commit.

**Recall adds latency.** Every prompt triggers a Hindsight query before Codex sees it. In practice this is 100–300ms with Hindsight Cloud on a fast connection. For interactive sessions it's imperceptible; for automated scripts it may matter. Use `"recallBudget": "low"` or `"autoRecall": false` if you need to skip it.

**Retention is asynchronous.** The `retain.py` hook fires asynchronously so it doesn't block your session from exiting. Facts are typically available within seconds of retention completing, but timing depends on server load — they may appear sooner or later than the next session.

**Extraction quality depends on conversation quality.** Hindsight extracts facts from what's actually in the transcript. If you work through a problem entirely in file edits without narrating what you're doing, there may be little for the extraction model to work with. Brief explanations in your prompts help.

## Recap

| | Codex default | With Hindsight |
|---|---|---|
| Memory across sessions | None | Automatic |
| Memory setup | Manual `AGENTS.md` | Extracted from transcripts |
| Recall mechanism | File content on startup | Semantic search, injected per prompt |
| Per-project isolation | No | Optional via `dynamicBankId` |
| Team shared memory | No | Shared bank via Hindsight Cloud |

## Next Steps

- **Hindsight Cloud**: [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io/signup)
- **Install**: `curl -fsSL https://hindsight.vectorize.io/get-codex | bash`
- **Config reference**: [Codex integration docs](/sdks/integrations/codex)
- **Team memory**: [Shared Memory for AI Coding Agents](https://hindsight.vectorize.io/blog/2026/03/31/team-shared-memory-ai-coding-agents)
- **Cookbook**: [applications/codex-memory](https://github.com/vectorize-io/hindsight-cookbook/tree/main/applications/codex-memory)
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.