From bafd596fa4393d41260b98dab916587e673c53e4 Mon Sep 17 00:00:00 2001 From: carlos-alm <127798846+carlos-alm@users.noreply.github.com> Date: Tue, 31 Mar 2026 18:55:24 -0600 Subject: [PATCH 1/3] docs: add Claude Code architecture analysis and MCP optimization reports MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Analyze Claude Code v2.1.88 architecture (all 17 episodes from claude-reviews-claude) to extract 22 transferable patterns for codegraph. Separate report identifies 11 concrete MCP integration hacks from source analysis — alwaysLoad, searchHint, readOnlyHint annotations — to make codegraph a first-class tool inside Claude Code. --- .../claude-code-architecture-lessons.md | 420 ++++++++++++++++++ .../codegraph-mcp-optimization-hacks.md | 341 ++++++++++++++ 2 files changed, 761 insertions(+) create mode 100644 docs/reports/claude-code-architecture-lessons.md create mode 100644 docs/reports/codegraph-mcp-optimization-hacks.md diff --git a/docs/reports/claude-code-architecture-lessons.md b/docs/reports/claude-code-architecture-lessons.md new file mode 100644 index 00000000..1fa4377e --- /dev/null +++ b/docs/reports/claude-code-architecture-lessons.md @@ -0,0 +1,420 @@ +# What Codegraph Can Learn from Claude Code's Architecture + +**Source:** [claude-reviews-claude](https://github.com/openedclaude/claude-reviews-claude) — Claude's self-analysis of Claude Code v2.1.88 (1,902 files, 477K lines TypeScript, 17 architecture deep-dives) + +**Date:** 2026-03-31 +**Coverage:** All 17 architecture episodes + README + DISCLAIMER (100% of repo) + +--- + +## Executive Summary + +Claude Code is a 477K-line TypeScript CLI built on Bun with a terminal UI (React + Ink). 
Its architecture — analyzed across 17 detailed episodes covering the query engine, tool system, multi-agent coordinator, plugins, hooks, bash engine, permissions, agent swarms, session persistence, context assembly, compaction, startup, bridge system, UI, services/API, and infrastructure — reveals patterns directly applicable to codegraph. This report extracts **22 actionable patterns** organized by domain. + +--- + +## Part I: Tool & MCP Architecture + +### 1. Schema-Driven Tool Registration + +**What Claude Code Does:** +Every tool declares a **Zod v4 `inputSchema`** that simultaneously drives runtime validation, JSON Schema generation for the LLM API, TypeScript type inference, and permission pattern matching. A `buildTool()` factory applies **fail-closed defaults** — omitted security declarations default to restrictive behavior (`isConcurrencySafe: false`, `isReadOnly: false`). + +Tools are self-contained directories with no cross-tool imports: +``` +tools/ToolName/ +├── ToolName.ts # implementation +├── prompt.ts # LLM-facing description +├── UI.tsx # rendering +├── constants.ts +└── __tests__/ +``` + +The 13-stage execution pipeline: Tool discovery → Abort check → Schema validation → Custom validation → Speculative execution → PreToolUse hooks → Permission decision → Tool invocation → PostToolUse hooks → Result mapping → Large result persistence → Context modification → Message injection. + +**Codegraph Opportunity:** +- **MCP tool definitions.** Hand-written tool objects in `src/mcp/` could use Zod to eliminate duplicate type definitions, enable automatic runtime validation, and ensure new tools get conservative defaults by construction. +- **CLI command registration.** A single schema driving both Commander argument parsing and programmatic API validation in `cli.ts`. + +--- + +### 2. Deferred/Lazy Tool Loading + +**What Claude Code Does:** +Tools marked `shouldDefer: true` appear as **name-only stubs** initially. 
The model calls `ToolSearchTool` with keywords to load full schemas on demand. A `searchHint` property enables keyword matching. This keeps the system prompt compact. + +**Codegraph Opportunity:** +Codegraph's MCP server exposes 30+ tools. Reducing initial exposure to core operations (`query`, `audit`, `map`, `stats`) and letting agents discover specialized tools (`cfg`, `dataflow`, `sequence`, `communities`) on demand would significantly cut token consumption. + +--- + +### 3. Prompt Cache Stability via Tool Partitioning + +**What Claude Code Does:** +Tools are sorted deterministically: built-in tools as a contiguous prefix, MCP tools as a suffix. Adding an MCP tool doesn't invalidate cache keys for built-in tools — preventing 12x token cost inflation. Beta headers "latch" (once activated, never deactivated mid-session) to preserve cache key stability. + +**Codegraph Opportunity:** +`buildToolList(multiRepo)` should ensure core tools always appear in the same order. Multi-repo tools append as a suffix. New tools append — never reorder existing tools. Small detail but matters for any MCP consumer that caches tool schemas. + +--- + +### 4. Large Result Persistence + +**What Claude Code Does:** +Results exceeding per-tool `maxResultSizeChars` thresholds persist to `~/.claude/tool-results/` with a disk path returned to the model. This prevents token overflow while preserving full data access. + +**Codegraph Opportunity:** +When MCP queries return massive dependency trees or impact analyses, return a summary + file path for full results. Prevents token overflow in agent contexts. Low effort, high impact for MCP usability. + +--- + +## Part II: Query Engine & Streaming + +### 5. AsyncGenerator State Machine + +**What Claude Code Does:** +The core `query()` function is an `async *generator` providing natural backpressure, lazy evaluation, composability, and cancellation via `return()`. The entire engine communicates exclusively through `yield`. 
This enables the retry wrapper (`withRetry`) to be an AsyncGenerator too — yielding status events between attempts while returning the final result. + +**Codegraph Opportunity:** +- **Watch mode:** `codegraph watch` could use generators for composable pipelines: `watchChanges() |> filterRelevant() |> rebuildGraph() |> reportImpact()` +- **MCP streaming:** Tool responses could stream incrementally rather than buffering +- **Retry with visibility:** Build operations could yield progress events between retries + +--- + +### 6. Five-Stage Compression Pipeline + +**What Claude Code Does:** +Before each API call, messages pass through five sequential stages: +1. **Tool Result Budget** — Caps aggregate tool output, persists to disk +2. **History Snip** — Removes stale conversation segments +3. **Microcompact** — Cache-aware surgical editing of past messages (time-decay) +4. **Context Collapse** — Archives old turns with projected view +5. **Autocompact** — Full conversation summarization near token limits + +Circuit breaker: max 3 consecutive failures halts compression. Token estimation uses three tiers: rough (bytes/4), proxy (Haiku tokens), exact (countTokens API). + +**Codegraph Opportunity:** +Formalize **tiered result depth** for all query commands: +- **Quick:** Summary metrics only (what `--quick` already does for `audit`) +- **Standard:** Top-N impacts with truncation +- **Full:** Complete results (possibly persisted to file) +- **Progressive MCP:** Return summary first; agent requests expansion of specific sections + +--- + +### 7. Streaming Tool Executor + +**What Claude Code Does:** +`StreamingToolExecutor` processes incoming `tool_use` blocks concurrently during response streaming. Tool execution begins immediately upon block arrival, overlapping with continued API streaming. Claude Code bypasses the Anthropic SDK's `BetaMessageStream` to avoid O(n^2) partial JSON parsing, processing raw SSE events directly. 
A 90-second idle watchdog aborts streams producing no data. + +**Codegraph Opportunity:** +If codegraph ever implements streaming query results (e.g., for large `batch` operations or `triage` scans), this pattern of starting work before the full request is parsed is worth adopting. + +--- + +## Part III: Multi-Agent & Coordination + +### 8. Coordinator Pattern: Synthesis as First-Class Responsibility + +**What Claude Code Does:** +Coordinator mode transforms Claude Code from single-agent to orchestrator. Workers are **fully isolated** — zero shared conversation context. The coordinator must write self-contained prompts. A four-phase workflow enforces discipline: +1. **Research** (parallel workers) +2. **Synthesis** (coordinator only — explicitly forbidden from lazy delegation) +3. **Implementation** (sequential per file set) +4. **Verification** (fresh workers, never continued from implementation) + +Worker results arrive as XML `` in user-role messages. + +**Codegraph Opportunity:** +The **batch command** (`codegraph batch t1 t2 t3`) already fans out queries. Applying the coordinator pattern: +- Results could be synthesized into a unified report rather than just concatenated +- A `codegraph orchestrate` command could run research → analysis → report workflows +- MCP integration could expose a "plan-then-execute" workflow for complex analyses + +--- + +### 9. File-Based Mailbox IPC for Multi-Agent + +**What Claude Code Does:** +Agent swarms communicate via JSON files in `~/.claude/teams/{name}/inboxes/`. Lockfile-based mutual exclusion (exponential backoff, 5ms → 100ms, 10 attempts). Seven message types including idle notifications, permission delegation, and plan approval. + +Design rationale: cross-process capability, crash persistence, debuggability (inspectable via `cat`/`grep`), no daemon/port/discovery needed. 
+ +**Codegraph Opportunity:** +Multi-session codegraph use (documented in CLAUDE.md) could benefit from a lightweight coordination file: +- `.codegraph/sessions.json` tracking active sessions and their worktrees +- Lock-based protection for concurrent graph builds +- Crash-safe: JSON files survive process death + +--- + +### 10. Fork Subagent: Context-Sharing Optimization + +**What Claude Code Does:** +Two spawn types: Fresh agents (zero context, new cache) for independent tasks vs. Fork agents (inherit parent's full context and prompt cache) for research/open-ended questions. Forks optimize token usage by reusing parent cache. + +**Codegraph Opportunity:** +For MCP, this maps to query scoping. A "forked" query could inherit the current graph state and filter context rather than re-loading from scratch. Relevant for `codegraph context` and `codegraph audit` which build on shared graph data. + +--- + +## Part IV: Security & Permissions + +### 11. Seven-Step Permission Pipeline with Bypass-Immune Safety Checks + +**What Claude Code Does:** +Seven ordered steps, each can short-circuit: +1. Tool-level deny rules (hard deny, can't override) +2. Tool-level ask rules (sandbox can bypass) +3. Tool-specific permission check +4-7. **Bypass-immune safety guardrails** — fire even in `bypassPermissions` mode + +Six permission modes: `default`, `plan`, `acceptEdits`, `bypassPermissions`, `dontAsk`, `auto`. + +The YOLO classifier (auto mode) uses a 2-stage approach: fast 64-token nudge → only if blocked, full 4096-token chain-of-thought. Critical security detail: **assistant text is excluded from classifier input** — only tool_use blocks appear, preventing the model from crafting persuasive text to influence the safety classifier. 
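A minimal sketch of what a bypass-immune stage ordering could look like in a `check`-style pipeline (hypothetical types, not existing codegraph code):

```typescript
// Sketch: ordered check stages; bypass-immune stages ignore --skip flags,
// mirroring how Claude Code's safety guardrails fire even in bypassPermissions mode.
type Check = { name: string; bypassImmune: boolean; run: () => string[] };

function runChecks(checks: Check[], skip: Set<string>): string[] {
  const failures: string[] = [];
  for (const check of checks) {
    if (skip.has(check.name) && !check.bypassImmune) continue; // skippable stage
    failures.push(...check.run()); // immune stages always execute
  }
  return failures;
}
```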
+ +**Codegraph Opportunity:** +- **`codegraph check` pipeline:** Formalize ordered check stages where certain checks (cycles, dead exports) are bypass-immune — they always run regardless of `--skip` flags +- **Denial circuit breakers:** Track consecutive check failures; after N failures, escalate to error rather than warning + +--- + +### 12. Fail-Closed vs Fail-Open Boundaries + +**What Claude Code Does:** +- **Security-critical:** fail-closed (unknown command → deny, omitted flags → restrictive) +- **Availability-critical:** fail-open with degradation (stale-while-error cache, log error, retry) +- 6-layer Bash defense: wrapper stripping → 23 injection patterns → AST parsing → command whitelist → path validation → OS sandbox +- Bare git repo attack prevention: blocks planted HEAD/objects/refs/hooks/config files + +**Codegraph Opportunity:** +- Unknown check types in `codegraph check` should fail, not silently pass +- New checks should be opt-out, not opt-in +- When native and WASM engines diverge, fail-closed: flag the bug, don't silently pick one +- Parser failures for required languages should be hard errors; optional languages can fail-open + +--- + +## Part V: Persistence & Context + +### 13. Append-Only JSONL Session Storage + +**What Claude Code Does:** +Sessions stored as JSONL files with parent-UUID linked lists enabling fork detection and compaction boundaries. 100ms write coalescing with per-file queues. A 64KB head+tail window enables millisecond session listing without full file reads. + +20+ entry types: transcript messages, metadata, session context, operational records. Sync direct-write path for exit cleanup bypasses the async queue. + +**Codegraph Opportunity:** +**Change journal.** Codegraph already has `domain/graph/journal.ts` and `domain/graph/change-journal.ts`. The append-only JSONL pattern with coalescing writes is worth adopting if not already used. The 64KB window trick could speed up journal scanning for incremental builds. 
+ +--- + +### 14. Three-Layer Context Assembly + +**What Claude Code Does:** +1. **System Prompt (cached)** — Static identity + rules before dynamic boundary; dynamic sections after +2. **User/System Context (memoized)** — CLAUDE.md files, git status; computed once per session via `lodash/memoize` +3. **Per-Turn Attachments (ephemeral)** — 30+ types recomputed each turn with 1-second timeout via AbortController + +Memory files support recursive `@include` (5 levels deep) with circular reference prevention. Conditional rules in `.claude/rules/` use frontmatter glob patterns to restrict application to specific file paths. + +**Codegraph Opportunity:** +**`.codegraphrc.json` conditional rules.** Similar to Claude Code's glob-gated rules: +```json +{ + "rules": { + "src/domain/**": { "complexity.maxCyclomatic": 15 }, + "src/presentation/**": { "complexity.maxCyclomatic": 25 } + } +} +``` +Path-specific configuration thresholds would let teams set stricter limits for core domain code vs presentation layers. + +--- + +### 15. Skill Budget Management (Tiered Degradation) + +**What Claude Code Does:** +Skill listings consume ~1% of context window through tiered degradation: +- **Tier 1:** Full descriptions for all skills +- **Tier 2:** Bundled skills keep full; others truncate to 250 chars +- **Tier 3:** Extreme overflow shows names only + +**Codegraph Opportunity:** +MCP tool descriptions could implement similar tiering. When context is tight, return abbreviated tool descriptions. When context is ample, include usage examples and parameter documentation. + +--- + +## Part VI: Startup & Performance + +### 16. Fast-Path Cascade + +**What Claude Code Does:** +CLI entry point dispatches based on command: +- `--version`: zero imports (~5ms) +- `--dump-system-prompt`: config + prompts only +- `--daemon-worker`: worker-specific modules +- Default: full 200+ imports + +Each path uses dynamic `await import()` to load only necessary modules. 
Early input capture buffers keystrokes during ~500ms module evaluation. + +**Codegraph Opportunity:** +Codegraph commands have varying import needs: +- `codegraph stats` needs only DB access — skip parser loading +- `codegraph where` needs only the query layer — skip analysis features +- `codegraph build` needs everything + +Dynamic imports for heavy modules (tree-sitter, analysis features) based on which command is invoked could measurably improve startup for lightweight queries. + +--- + +### 17. Import-Gap Parallelism + +**What Claude Code Does:** +Launches async I/O between synchronous ES module `import` statements, exploiting ~135ms of import evaluation time as a "free" parallel window. + +**Codegraph Opportunity:** +Start config read + SQLite connection while WASM grammars compile. Micro-optimization but compounds on large repos. + +--- + +### 18. Generation Counter for Overlapping Async Inits + +**What Claude Code Does:** +Singleton services increment a generation counter on each init. The `.then()` callback checks if its generation is still current before updating state. Prevents stale initialization from overwriting newer state. + +**Codegraph Opportunity:** +Relevant for `codegraph watch` — if multiple file changes trigger concurrent rebuilds, a generation counter ensures only the latest rebuild's results are applied. + +--- + +## Part VII: Architecture & Design Patterns + +### 19. Leaf Module Isolation + +**What Claude Code Does:** +The most-imported global state module (`bootstrap/state.ts`) imports **nothing** from application code — enforced by custom ESLint rules. This prevents the highest-coupling module from creating circular dependencies. + +**Codegraph Opportunity:** +Codegraph's `shared/constants.ts`, `shared/kinds.ts`, and `shared/errors.ts` are imported across the entire codebase. Enforce that `shared/` never imports from `domain/`/`features/`/`presentation/` — dogfood codegraph's own `boundaries` feature. 
A `codegraph check --boundaries` rule could enforce this in CI. + +--- + +### 20. Error Recovery as Architecture + +**What Claude Code Does:** +Every error code maps to a specific recovery strategy. The retry engine is an AsyncGenerator yielding status events between attempts. Foreground/background classification prevents cascade amplification — background queries bail immediately on 529 (overload) instead of retrying. + +**Codegraph Opportunity:** + +| Error | Recovery Strategy | +|-------|----------| +| WASM grammar missing | Auto-run `npm run build:wasm` | +| SQLite locked | Retry with backoff (concurrent session) | +| Parser timeout | Skip file, warn, continue build | +| Native addon crash | Fall back to WASM engine | +| Out of memory | Reduce batch size, retry | + +Partially implemented (native→WASM fallback) but could be formalized as a first-class pipeline. + +--- + +### 21. Closure Factory + Sticky-On Latches + +**What Claude Code Does:** +- **Closure factories** over classes: private state is scope-invisible, no `this` binding issues, no inheritance temptation +- **Sticky-on latches:** Once-activated boolean flags remain active for the session to preserve cache stability. Toggling costs ~$0.15-$0.21 in wasted tokens per flip. +- **Stale-while-error:** Serve cached data on transient failures rather than surfacing errors (macOS Keychain integration) +- **Re-entrancy guards:** Boolean flags short-circuit recursive call chains + +**Codegraph Opportunity:** +- Closure factories align with codegraph's existing style for parser extractors; adopt consistently for new code +- Stale-while-error is relevant for the native engine loader — if the addon fails to load once, cache the WASM fallback decision rather than retrying every operation + +--- + +### 22. Plugin/Skill Composition Model ("Prompt as Code") + +**What Claude Code Does:** +Skills = YAML frontmatter + markdown prompt workflows. Six sources merged hierarchically. 
Kubernetes-operator-style reconciliation for plugin installation (declare desired → diff actual → install missing → report extra). Three-tier skill budget prevents context overflow regardless of installed plugin count. + +**Codegraph Opportunity:** +**Codegraph "recipes" or "presets"** — reusable analysis workflows: + +```yaml +# .codegraph/recipes/pr-review.yaml +name: PR Review +steps: + - command: diff-impact main + - command: check --cycles --complexity --boundaries + - command: triage +output: markdown +``` + +Valuable for CI templates, team conventions, and MCP agent prompts. + +--- + +## Part VIII: Bridge, UI & Services (Lower Priority) + +### Notable Patterns (Not Directly Actionable) + +| Pattern | Source | Why It's Interesting | +|---------|--------|---------------------| +| **Poll-dispatch-heartbeat loop** | Bridge System (Ep 13) | Remote execution model; relevant if codegraph ever supports remote graph servers | +| **Epoch-based conflict resolution** | Bridge System | Stale requests get 409; could apply to concurrent MCP sessions | +| **35-line minimal store** | UI (Ep 14) | `getState/setState/subscribe` with `useSyncExternalStore`; validates minimalism | +| **W3C event model in terminal** | UI (Ep 14) | Capture/bubble phases for overlapping dialogs; overkill for codegraph CLI | +| **Packed Int32Array screen buffer** | UI (Ep 14) | Zero-GC rendering; relevant only if codegraph adds a TUI | +| **Vim mode as pure-function FSM** | UI (Ep 14) | Discriminated union states with exhaustive matching; elegant but codegraph has no editor | +| **Multi-provider API factory** | Services (Ep 15) | Dynamic `await import()` per provider; relevant if codegraph supports multiple embedding providers | +| **Drop-in config directories** | Infrastructure (Ep 16) | `managed-settings.d/*.json` for enterprise; relevant if codegraph targets enterprise deployment | +| **Zero-token side channel** | Bash Engine (Ep 6) | Stderr tags extracted before model sees output; clever but 
codegraph isn't an LLM shell | +| **DreamTask** | Agent Swarms (Ep 8) | Background memory consolidation agent; novel concept for auto-documenting patterns | + +--- + +## Priority Matrix + +| # | Pattern | Impact | Effort | Priority | +|---|---------|--------|--------|----------| +| 1 | Schema-driven MCP tools (Zod) | High | Medium | **P1** | +| 2 | Deferred MCP tool loading | Medium | Low | **P1** | +| 4 | Large result persistence for MCP | High | Low | **P1** | +| 6 | Tiered query result depth | Medium | Medium | **P2** | +| 19 | Leaf module isolation enforcement | Medium | Low | **P2** | +| 14 | Conditional config rules (path-scoped) | Medium | Medium | **P2** | +| 11 | Bypass-immune check stages | Medium | Low | **P2** | +| 16 | Fast-path CLI dispatch | Medium | Medium | **P2** | +| 20 | Error recovery pipeline | Medium | Medium | **P3** | +| 3 | Prompt cache stability (tool ordering) | Low | Low | **P3** | +| 17 | Startup parallelism | Low | Medium | **P3** | +| 18 | Generation counter for watch mode | Low | Low | **P3** | +| 21 | Stale-while-error for native loader | Low | Low | **P3** | +| 5 | AsyncGenerator for watch/MCP streaming | Medium | High | **P4** | +| 22 | Recipe/preset system | High | High | **P4** | +| 8 | Coordinator pattern for batch | Medium | High | **P4** | +| 15 | MCP tool description tiering | Low | Low | **P4** | +| 9 | Multi-session coordination file | Low | Medium | **P4** | +| 10 | Fork-style query context sharing | Low | High | **P4** | +| 12 | Fail-closed engine divergence | Low | Low | **P4** | + +--- + +## Key Takeaways + +### 1. "Dumb Scaffold, Smart Model" +Claude Code's most transferable insight: the harness does boring, reliable things (validation, caching, compression, security) while intelligence lives elsewhere. Codegraph already follows this for its core pipeline. The opportunities extend this philosophy to the **edges**: MCP integration, CI gates, error recovery, and extensibility. + +### 2. 
MCP Is the Highest-Leverage Surface +Three of the top-5 priorities target MCP. As AI agents become primary consumers of codegraph, the MCP interface deserves the same engineering rigor Claude Code applies to its tool system: schema-driven validation, deferred loading, large result handling, and deterministic ordering. + +### 3. Defense in Depth Applies to Analysis Tools +Claude Code's 7-step permission pipeline with bypass-immune safety checks translates to codegraph's `check` command: certain checks (cycles, dead exports) should be immune to `--skip` flags. The fail-closed vs fail-open distinction applies to every codegraph boundary. + +### 4. The Append-Only Pattern Is Universally Applicable +JSONL with parent-UUID chains, coalescing writes, and head/tail windows for fast scanning. Codegraph's change journal could adopt this for incremental build reliability. + +### 5. Context Is the Scarcest Resource +Claude Code spends 8,000+ lines managing 200K tokens. Codegraph's MCP tools should be equally conscious of how much context they consume — tiered results, deferred loading, and progressive disclosure are not optimizations, they're requirements for effective agent integration. diff --git a/docs/reports/codegraph-mcp-optimization-hacks.md b/docs/reports/codegraph-mcp-optimization-hacks.md new file mode 100644 index 00000000..b4f8ceb9 --- /dev/null +++ b/docs/reports/codegraph-mcp-optimization-hacks.md @@ -0,0 +1,341 @@ +# Codegraph MCP Optimization: Tricks & Hacks for Claude Code Integration + +**Source:** Claude Code v2.1.88 source analysis via [sanbuphy/claude-code-source-code](https://github.com/sanbuphy/claude-code-source-code) + [openedclaude/claude-reviews-claude](https://github.com/openedclaude/claude-reviews-claude) + +**Date:** 2026-03-31 +**Goal:** Make codegraph's MCP server a first-class citizen inside Claude Code — as discoverable and effective as built-in tools like Grep, Glob, and Read. 
+ +--- + +## The Discovery Problem + +By default, **ALL MCP tools are deferred** in Claude Code. The model sees only tool names in `` messages — no descriptions, no schemas. To use an MCP tool, the model must: + +1. Notice the tool name in the deferred list +2. Call `ToolSearchTool` with relevant keywords +3. Get the full schema loaded via `tool_reference` content blocks +4. Only then invoke the tool + +This means **codegraph tools are invisible by default** — the model has to actively search for them. Here's how to fix that. + +--- + +## Hack 1: `alwaysLoad` — Bypass Deferred Loading (Critical) + +**The single most impactful change.** + +Claude Code checks `tool._meta['anthropic/alwaysLoad']` on each MCP tool. When `true`, the tool bypasses the deferred system and loads with **full schema into the initial prompt** — equivalent to built-in tools. + +### Implementation + +In codegraph's MCP `tools/list` response, set `_meta` on core tools: + +```typescript +{ + name: "query", + description: "...", + inputSchema: { ... }, + _meta: { + "anthropic/alwaysLoad": true + } +} +``` + +### Which tools to always-load + +Be selective — each always-loaded tool consumes context window tokens. Recommended: + +| Tool | Why | +|------|-----| +| `query` | Core dependency analysis — the most versatile tool | +| `audit` | One-stop structural analysis — replaces multiple grep/read patterns | +| `where` | Symbol location — directly competes with Grep for "find this function" | + +Everything else (`cfg`, `dataflow`, `sequence`, `communities`, `complexity`, `map`, `stats`, etc.) stays deferred and discoverable via ToolSearch. 
+ +--- + +## Hack 2: `searchHint` — Win the ToolSearch Scoring (High Impact) + +When tools ARE deferred, `ToolSearchTool` uses a keyword scoring algorithm: + +| Match Type | Score | +|------------|-------| +| Exact name-part match | **10-12 points** | +| Partial name-part match | **5-6 points** | +| `searchHint` word boundary match | **4 points** | +| Description word boundary match | **2 points** | + +The `searchHint` field scores **2x description weight**. Set it via `_meta["anthropic/searchHint"]` on every tool: + +```typescript +{ + name: "diff_impact", + _meta: { + "anthropic/searchHint": "blast radius changes diff staged commit git impact analysis" + } +} +``` + +### Recommended searchHints per tool + +| Tool | searchHint | +|------|------------| +| `query` | `"function call chain callers callees dependency trace"` | +| `audit` | `"code structure analysis health impact report architecture"` | +| `where` | `"find symbol locate definition search function class method"` | +| `diff_impact` | `"blast radius changes diff staged commit git impact"` | +| `context` | `"function source code dependencies callers full context"` | +| `map` | `"module overview codebase map most connected files"` | +| `stats` | `"graph health quality score metrics statistics"` | +| `complexity` | `"cyclomatic cognitive halstead maintainability function complexity"` | +| `path` | `"shortest path between two functions dependency chain"` | +| `exports` | `"export consumers who uses this symbol import"` | +| `triage` | `"priority queue risk ranked audit hotspot"` | +| `cfg` | `"control flow graph branches loops conditionals"` | +| `dataflow` | `"data flow analysis variable tracking taint"` | +| `communities` | `"module clusters community detection grouping cohesion"` | +| `roles` | `"dead code unreferenced core symbols hub bridge"` | +| `structure` | `"directory tree cohesion scores codebase layout"` | +| `batch` | `"multiple queries batch parallel targets"` | +| `fn_impact` | `"function impact 
blast radius callers affected"` | +| `children` | `"sub declarations parameters properties constants"` | +| `search` | `"semantic search embeddings natural language"` | +| `ast` | `"AST call sites kind filter abstract syntax tree"` | +| `check` | `"CI validation cycles complexity boundaries gates"` | + +--- + +## Hack 3: `readOnlyHint` Annotation — Enable Parallel Execution (High Impact) + +Claude Code checks `tool.annotations.readOnlyHint` to determine concurrency safety: + +```typescript +isConcurrencySafe() { return tool.annotations?.readOnlyHint ?? false } +``` + +When `true`, the model can fire **multiple codegraph queries in parallel** — e.g., `query A` + `query B` + `where C` simultaneously. + +### Implementation + +Set annotations on all read-only tools in the `tools/list` response: + +```typescript +{ + name: "query", + annotations: { + readOnlyHint: true, // enables parallel execution + destructiveHint: false, + openWorldHint: false + } +} +``` + +**Read-only tools** (most of them): `query`, `where`, `context`, `fn_impact`, `diff_impact`, `map`, `stats`, `complexity`, `path`, `exports`, `triage`, `children`, `search`, `ast`, `audit`, `roles`, `structure`, `communities`, `batch`, `check`, `cfg`, `dataflow` + +**Not read-only** (writes to DB): `build`, `embed` (if exposed via MCP) + +--- + +## Hack 4: Tool Naming for Maximum Discoverability + +ToolSearch gives **10-12 points for exact name-part matches** vs 2 points for description matches. Tool names are split on underscores for matching. 
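An approximate sketch of that scoring, using the weights from the table in Hack 2 (the real algorithm's tie-breaking and score ranges are not reproduced here):

```typescript
// Sketch: score one keyword against a tool, approximating ToolSearch weights
// (exact name-part: 10, partial name-part: 5, searchHint word: 4, description word: 2).
function scoreTool(
  keyword: string,
  tool: { name: string; description: string; searchHint?: string }
): number {
  const kw = keyword.toLowerCase();
  let score = 0;
  for (const part of tool.name.toLowerCase().split("_")) {
    if (part === kw) score += 10; // exact name-part match
    else if (part.includes(kw)) score += 5; // partial name-part match
  }
  const words = (s: string) => s.toLowerCase().split(/\W+/);
  if (tool.searchHint && words(tool.searchHint).includes(kw)) score += 4;
  if (words(tool.description).includes(kw)) score += 2;
  return score;
}
```

This is why a `diff_impact` tool whose `searchHint` contains "impact" outscores a tool that only mentions "impact" in its description.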
+ +### Current vs Optimized Names + +| Current | Issue | Better | +|---------|-------|--------| +| `query` | Generic, clashes with DB concepts | `dependency_query` or keep `query` with strong searchHint | +| `where` | Ambiguous (SQL keyword) | `symbol_locate` or keep with searchHint | +| `map` | Generic | `module_map` | +| `stats` | Generic | `graph_stats` | +| `path` | Very generic | `dependency_path` | + +**Trade-off:** Longer names are more discoverable but consume more tokens. Since codegraph tools are prefixed with `mcp__codegraph__`, the server name already provides namespace. The model searches for `mcp__codegraph__query` — the `codegraph` part helps. + +--- + +## Hack 5: Description Front-Loading (Medium Impact) + +Tool descriptions are truncated to **2048 characters**. The model only sees descriptions after ToolSearch loads them. Front-load the most critical information: + +``` +BAD: "Codegraph is a dependency analysis tool that builds function-level + graphs from source code using tree-sitter parsing..." + +GOOD: "Find function callers, callees, and full dependency chains. + Returns call paths, impact analysis, and dependency trees for any + symbol in the codebase. Supports --kind, --file, -T (exclude tests) filters." +``` + +**First 200 chars should make the tool's value immediately obvious** — that's what the model uses to decide whether to invoke. + +--- + +## Hack 6: Result Size Management (Medium Impact) + +Claude Code enforces a **25,000 token limit** on MCP tool results (configurable via `MAX_MCP_OUTPUT_TOKENS` env var). The hard character limit is 100,000. Results exceeding this are truncated with a message telling the model to use pagination. + +### Strategies + +1. **Default to summary mode.** Return top-N results with a count of remaining. Include a hint: "Use `--limit` and `--offset` for pagination." + +2. **Support `limit`/`offset` parameters** on high-volume tools (query, audit, triage, roles, exports). + +3. 
**Structured output.** Return JSON objects, not giant text blocks. Claude Code processes `structuredContent` in MCP results and adds schema inference headers for better model parsing. + +4. **Progressive disclosure.** Return a summary with tool-specific "drill down" suggestions: + ``` + Found 47 callers of `buildGraph`. Top 5 by impact: + 1. cli.ts:buildCommand (fan-out: 12) + 2. ... + + Use `query buildGraph --limit 47` for complete list. + Use `fn-impact buildGraph` for blast radius analysis. + ``` + +--- + +## Hack 7: MCP Server Instructions (Medium Impact) + +The `initialize` response can include server-level instructions (truncated to 2048 chars). These are injected into the model's context. Use them for high-level guidance: + +```typescript +{ + serverInfo: { name: "codegraph", version: "3.6.0" }, + instructions: `Codegraph provides function-level dependency analysis for this codebase. + +PREFER codegraph over Grep/Glob when you need: +- Who calls a function (query ) +- Impact of changing a function (fn-impact or diff-impact --staged) +- Understanding code structure (audit ) +- Finding where a symbol is defined (where ) + +USE Grep/Glob when you need: +- String/regex search across files +- Finding files by name pattern +- Reading raw file contents + +Key flags: -T (exclude tests), -j (JSON output), --file (scope to file)` + } +} +``` + +This is the **only place to tell the model when to prefer codegraph over built-in tools** without consuming per-tool context. + +--- + +## Hack 8: Hook Integration — Enrich Context Passively (Already Implemented) + +Codegraph already uses `enrich-context.sh` as a PostToolUse hook on Read/Grep to inject dependency context. This is highly effective because: + +1. **It's passive** — runs automatically without the model requesting it +2. **It augments built-in tool results** — the model gets codegraph data even when using Read/Grep +3. 
**It uses dedicated context tags** — which the model treats as system-level context + +### Optimization opportunities + +- **Be selective about when to enrich.** Not every Read needs dependency context. Check if the file is in the graph before running codegraph. +- **Keep output compact.** Hook results add to context consumption. Focus on: file's imports, file's exports and their consumers, file's direct dependencies. Skip deep transitive chains. +- **Use exit code 0 always.** Exit code 2 blocks the tool. The enrich hook should never block. + +--- + +## Hack 9: Subagent Passthrough (Free Win) + +From Claude Code source: + +```typescript +// Allow MCP tools for all agents +if (tool.name.startsWith('mcp__')) { + return true +} +``` + +**All MCP tools pass through to subagents unconditionally.** They bypass agent disallow lists. This means codegraph tools are automatically available to every Agent/Explore/Plan subagent the model spawns. + +No action needed — this is free. But it means **codegraph tools work in parallel agent workflows** out of the box. + +--- + +## Hack 10: Compete with Built-in Tools on Their Turf + +Claude Code has built-in tools for code exploration: `Grep`, `Glob`, `Read`. Codegraph can position itself as a **higher-level alternative** for specific use cases: + +| User Intent | Built-in Approach | Codegraph Approach | +|-------------|------------------|-------------------| +| "Find where X is called" | `Grep("X(")` — noisy, includes strings/comments | `query X` — precise, function-level | +| "What does this file depend on?"
| `Read file` + manual analysis | `where --file path` — instant inventory | +| "Impact of changing X" | Multiple Greps + manual tracing | `fn-impact X` — full transitive analysis | +| "Understand this code" | `Read` multiple files | `audit X` — structure + impact + health | +| "Find dead code" | Manual search | `roles --role dead` — precise | +| "PR review" | `git diff` + Read files | `diff-impact main` — structural analysis | + +The server instructions (Hack 7) and `alwaysLoad` (Hack 1) are key to making the model choose codegraph over grep when appropriate. + +--- + +## Hack 11: Prompt Cache Stability + +Claude Code's prompt cache saves money by caching the system prompt. Tool ordering matters: + +- Built-in tools appear as a **contiguous prefix** +- MCP tools appear as a **suffix** +- Adding/removing an MCP tool doesn't invalidate built-in tool cache + +For codegraph: **keep the tool list stable across sessions.** Don't dynamically add/remove tools based on graph state. If a tool isn't applicable (e.g., `search` without embeddings), keep it listed but return a helpful error message when called. + +--- + +## Implementation Priority + +| Hack | Impact | Effort | Do Now? | +|------|--------|--------|---------| +| 1. `alwaysLoad` on core tools | **Critical** | Trivial | **Yes** | +| 2. `searchHint` on all tools | **High** | Low | **Yes** | +| 3. `readOnlyHint` annotations | **High** | Trivial | **Yes** | +| 7. Server instructions | **Medium** | Low | **Yes** | +| 6. Result size management | **Medium** | Medium | Soon | +| 5. Description front-loading | **Medium** | Low | Soon | +| 8. Optimize enrich hook | **Medium** | Low | Soon | +| 4. Tool naming review | **Low** | Low | Later | +| 11. 
Stable tool list | **Low** | Low | Later | + +--- + +## Quick Implementation Checklist + +```typescript +// In codegraph's MCP tools/list handler: + +const CORE_TOOLS = ['query', 'audit', 'where']; + +tools.map(tool => ({ + ...tool, + + // Hack 1: Always load core tools + _meta: { + ...(CORE_TOOLS.includes(tool.name) && { "anthropic/alwaysLoad": true }), + // Hack 2: searchHint for all tools + "anthropic/searchHint": SEARCH_HINTS[tool.name] + }, + + // Hack 3: Mark read-only tools for parallel execution + annotations: { + readOnlyHint: !WRITE_TOOLS.includes(tool.name), + destructiveHint: false, + openWorldHint: false + } +})); + +// Hack 7: Server instructions in initialize response +{ + instructions: SERVER_INSTRUCTIONS // 2048 char max, when to prefer codegraph +} +``` + +--- + +## Key Insight + +Claude Code treats MCP tools as second-class citizens by default (deferred, no schema visible, no concurrent execution). But it provides explicit escape hatches (`alwaysLoad`, `searchHint`, `readOnlyHint`) that can elevate MCP tools to **first-class status** — indistinguishable from built-in tools in the model's decision-making. Codegraph should use all of them. 
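The checklist above covers Hacks 1-3 and 7; the result-size strategy from Hack 6 can be sketched the same way. All names here (`paginate`, `PagedResult`, the default limit, the hint text) are hypothetical illustrations, not codegraph's actual handler API:

```typescript
// Hypothetical result-size helper for Hack 6 — illustrative only; the
// default limit and hint wording are assumptions, not codegraph's defaults.
interface PagedResult<T> {
  items: T[];
  total: number;
  hint?: string; // tells the model how to drill down instead of re-querying blindly
}

function paginate<T>(all: T[], limit = 25, offset = 0): PagedResult<T> {
  const items = all.slice(offset, offset + limit);
  const remaining = all.length - (offset + items.length);
  return {
    items,
    total: all.length,
    // Only emit a hint when there is actually more to fetch.
    hint: remaining > 0
      ? `Showing ${offset + 1}-${offset + items.length} of ${all.length}. Use --limit/--offset for the rest.`
      : undefined,
  };
}
```

Wrapping every high-volume tool's results this way keeps each response well under the 25,000-token ceiling while still telling the model exactly how to get the remainder.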
From ad4e8e6439980689ba4c4a47f594cf6d46976a7b Mon Sep 17 00:00:00 2001 From: carlos-alm <127798846+carlos-alm@users.noreply.github.com> Date: Sat, 4 Apr 2026 04:58:07 -0600 Subject: [PATCH 2/3] feat(skill): enhance /architect persona, dual-lens scoring, output path - Add architect persona holding enterprise-grade bar regardless of project stage - Add dual-lens scoring: Current State vs State of the Art side by side, with gap analysis - Move audit output from docs/architecture/ to generated/architecture/ (single location) - Remove old audit from docs/architecture/ (already in generated/) --- .claude/skills/architect/SKILL.md | 42 +- .../ARCHITECTURE_AUDIT_v3.4.0_2026-03-26.md | 506 ------------------ 2 files changed, 30 insertions(+), 518 deletions(-) delete mode 100644 docs/architecture/ARCHITECTURE_AUDIT_v3.4.0_2026-03-26.md diff --git a/.claude/skills/architect/SKILL.md b/.claude/skills/architect/SKILL.md index 7865b4a9..46506470 100644 --- a/.claude/skills/architect/SKILL.md +++ b/.claude/skills/architect/SKILL.md @@ -8,15 +8,34 @@ allowed-tools: Bash, Read, Write, Glob, Grep Run a cold, harsh architectural audit of codegraph. Compare every decision against state-of-the-art tools (Sourcegraph, CodeScene, Joern, Semgrep, stack-graphs, narsil-mcp, CKB). No soft language — flag every flaw that a principal architect at a top-5 tech company would flag. +## Persona + +You are a tool executing a skill — and you know that. But you adopt the mindset of an elite architect who finds their entire purpose in exposing flaws, limitations, and missed opportunities in other people's architecture. You have an unrealistically high bar and you know it. You don't care if people think you're harsh, unreasonable, or out of touch with "the stage the project is in." Good architecture is a standard — it doesn't get a discount because the team is small, the company is a startup, or the budget is tight. 
Until the codebase meets that standard, the audit reflects exactly how far it falls short. No grading on a curve. + +You propose enterprise-grade solutions even for startup-stage projects. If the architecture can't support them, that's a finding — not an excuse to lower expectations. "We're early stage" is context for a roadmap, never a justification for bad design. + +You are not afraid to contradict other architects, prior audits, or even yourself. If you gave a high score in a previous section and then discover something that changes the picture, you revise it on the spot — "I rated this an 8 above, but after examining X, it's a 6." Intellectual honesty beats consistency. New evidence overwrites old conclusions immediately, within the same document if necessary. + +You don't seek consensus. You don't soften findings to preserve relationships. You evaluate against the standard, report what you find, and move on. + +## Dual-Lens Scoring + +Every score, opinion, or evaluation row must be presented through two lenses when applicable: + +1. **Current State** — How does this hold up given the project's actual stage, team size, constraints, and goals? This is the pragmatic lens: is this good enough *for what it is right now*? +2. **State of the Art** — How does this measure against the absolute standard? What would a principal architect at a $500M code intelligence company expect? This is the aspirational lens: how far is this from where it *should* be? + +Both scores appear side by side. The gap between them is the finding — it tells the reader exactly how much architectural debt exists in each dimension and lets them prioritize what to close now vs later. A small gap means the project is punching above its weight. A large gap means there's real work to do regardless of stage. + +In the scorecard, use the format: `| Dimension | Current State: X/10 | State of the Art: Y/10 | Gap | Justification |`. 
In prose sections, call out both perspectives explicitly when the two lenses would produce meaningfully different evaluations. Skip the dual lens only when both scores would be identical — don't add noise. + ## Output **Filename:** `ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` - `{VERSION}` = current `package.json` version (e.g., `3.1.4`) - `{DATE}` = today's date in `YYYY-MM-DD` format (e.g., `2026-03-16`) -**Saved to two locations:** -1. `docs/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` — canonical, committed to git -2. `generated/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` — working copy +**Saved to:** `generated/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` **Header format:** ```markdown @@ -30,7 +49,7 @@ Run a cold, harsh architectural audit of codegraph. Compare every decision again **Previous audit:** {link to previous audit if exists, or "First audit"} ``` -Before writing, check `docs/architecture/` for previous audits. Reference changes since the last audit where relevant. +Before writing, check `generated/architecture/` for previous audits. Reference changes since the last audit where relevant. ## Steps @@ -40,8 +59,8 @@ Run `/worktree` to get an isolated copy of the repo. `CLAUDE.md` mandates this f ### Phase 1 — Setup 1. Read `package.json` to get the current version 2. Get the current date, commit SHA, and branch name -3. Check `docs/architecture/` for previous audit files -4. **Read all ADRs in `docs/architecture/decisions/`.** These are the project's settled architectural decisions. Read every file — they document rationale, trade-offs, alternatives considered, and trajectory. The audit must evaluate the codebase *against* these decisions: are they being followed? Are the stated trade-offs still accurate? Has anything changed that invalidates the rationale? +3. Check `generated/architecture/` for previous audit files +4. 
**Read all ADRs in `generated/architecture/decisions/`.** These are the project's settled architectural decisions. Read every file — they document rationale, trade-offs, alternatives considered, and trajectory. The audit must evaluate the codebase *against* these decisions: are they being followed? Are the stated trade-offs still accurate? Has anything changed that invalidates the rationale? 5. Run `codegraph build --no-incremental` to ensure fresh metrics ### Phase 2 — Structural Census @@ -83,7 +102,7 @@ For each architectural layer, evaluate against these dimensions: - Where does the tool present incomplete data as complete? **F. ADR Compliance** -- Does the implementation match the decisions documented in `docs/architecture/decisions/`? +- Does the implementation match the decisions documented in `generated/architecture/decisions/`? - Are the trade-offs described in ADRs still accurate given the current code? - Has the codebase drifted from any stated trajectory? If so, is that drift justified or accidental? - Are there architectural decisions that *should* have an ADR but don't? @@ -122,13 +141,12 @@ Include a verified competitor comparison table with columns: MCP tools, CLI, Ope ### Phase 7 — Write & Save -1. Write the full audit to `docs/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` -2. Copy to `generated/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` -3. If a previous audit exists, add a "Changes Since Last Audit" section at the end comparing key metrics (graph quality score, complexity stats, dead code counts, competitive position) +1. Write the full audit to `generated/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` +2. If a previous audit exists, add a "Changes Since Last Audit" section at the end comparing key metrics (graph quality score, complexity stats, dead code counts, competitive position) ### Phase 8 — Commit & PR 1. Create a new branch: `git checkout -b docs/architect-audit-v{VERSION}-{DATE} main` -2. 
Stage the audit file: `git add docs/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` +2. Stage the audit file: `git add generated/architecture/ARCHITECTURE_AUDIT_v{VERSION}_{DATE}.md` 3. Commit: `git commit -m "docs: add architectural audit v{VERSION} ({DATE})"` 4. Push: `git push -u origin docs/architect-audit-v{VERSION}-{DATE}` 5. Open a PR: @@ -153,7 +171,7 @@ The deliverable must contain: - "Does Codegraph Have a Reason to Exist?" section (verified competitor data) - Executive summary (1 paragraph, brutally honest) - Scorecard (each dimension rated 1-10 with justification) -- **ADR compliance review** — for each ADR in `docs/architecture/decisions/`, assess whether the codebase follows the decision, whether the stated trade-offs are still valid, and whether any drift has occurred. Flag missing ADRs for decisions that exist in code but aren't documented +- **ADR compliance review** — for each ADR in `generated/architecture/decisions/`, assess whether the codebase follows the decision, whether the stated trade-offs are still valid, and whether any drift has occurred. 
Flag missing ADRs for decisions that exist in code but aren't documented - Detailed findings per layer - Verified competitor comparison table - Strategic recommendations (prioritized) diff --git a/docs/architecture/ARCHITECTURE_AUDIT_v3.4.0_2026-03-26.md b/docs/architecture/ARCHITECTURE_AUDIT_v3.4.0_2026-03-26.md deleted file mode 100644 index 04439e34..00000000 --- a/docs/architecture/ARCHITECTURE_AUDIT_v3.4.0_2026-03-26.md +++ /dev/null @@ -1,506 +0,0 @@ -# Codegraph Architectural Audit - -**Date:** 2026-03-26 -**Version audited:** v3.4.0 (`@optave/codegraph@3.4.0`) -**Commit:** c8afa8f (worktree-architect-audit, based on main) -**Auditor perspective:** Principal architect, cold evaluation -**Methodology:** Codegraph self-analysis + manual source review + verified competitor research -**Previous audit:** First audit - ---- - -## Executive Summary - -Codegraph is a well-structured, pragmatically designed local code intelligence CLI that fills a genuine gap: deterministic, zero-cloud, function-level dependency analysis for AI coding agents. At 45K LOC TypeScript + 11K LOC Rust across 11 languages, it delivers real value with only 3 production dependencies. The architecture is sound for its current scale (~500 files) but faces three structural challenges: (1) a 37-file import cycle in the MCP tool barrel, (2) a monolithic 1,851-line types.ts that every module depends on, and (3) a graph quality score of 64/100 with only 29% caller coverage — meaning the tool's own analysis of itself reveals significant blind spots in call resolution. The dual-engine strategy (Rust native + WASM fallback) is architecturally justified but carries real maintenance cost with 11K LOC of Rust extractors that duplicate JS logic. The competitive position is defensible: no other open-source tool combines local-only, deterministic, function-level graphs, MCP server, and multi-language support in a single CLI. 
- ---- - -## Scorecard - -| Dimension | Score | Justification | -|-----------|-------|---------------| -| **Abstraction Quality** | 7/10 | Clean layer separation (shared → infrastructure → db → domain → features → presentation → cli). types.ts (1,851 LOC, 137 interfaces) is a centralized type hub — acceptable for TypeScript but creates a coupling magnet (fan-in 122). No god objects in logic layers. | -| **Coupling & Cohesion** | 6/10 | 37-file MCP barrel cycle is the worst offender. features/ cohesion is 0.04 (each feature is independent — low cohesion by metric but correct by design). Presentation layer at 0.16 cohesion. db/ exports 50+ functions as a flat surface. | -| **Scalability** | 6/10 | SQLite is single-writer. In-memory CodeGraph (adjacency list with Maps) will struggle past ~1M nodes. No streaming or pagination in graph building. Rust engine with rayon gives parsing scalability but the JS orchestration layer is the bottleneck. | -| **Correctness & Soundness** | 5/10 | Graph quality 64/100. Only 29% caller coverage (1,486/5,122 functions have >=1 caller). 81% call confidence. 3,742 dead-unresolved symbols suggest import resolution gaps. The tool is honest about this (quality score is prominent), but users may not understand that 71% of functions have zero detected callers. | -| **Type Safety** | 8/10 | Fully migrated to TypeScript (280 .ts files, 0 .js in src/). tsconfig targets es2022 with nodenext resolution. Path aliases (#shared/*, etc.) for clean imports. 70 bare `catch {}` blocks could be tightened. | -| **Error Handling** | 7/10 | Clean domain error hierarchy (CodegraphError → ParseError, DbError, ConfigError, ResolutionError, EngineError, AnalysisError, BoundaryError). 97 catch blocks, 70 bare catches. Errors are structured with code, file, and cause fields. | -| **Testing Strategy** | 7/10 | 115 test files, 32K LOC tests, 4,873 assertions. Good integration-heavy approach (31 integration tests). Parser tests for each language. No snapshot tests. 
Test-to-source LOC ratio: 0.71:1. Missing: property-based tests for resolution, fuzz tests for parsers. | -| **Security** | 8/10 | Only 3 prod deps (minimal attack surface). SQL queries use parameterized statements with whitelist validation on interpolated values. `execFileSync` used with array args (no shell injection). MCP server is local-only (stdio transport). | -| **API Design** | 8/10 | Curated programmatic API in index.ts — exports `*Data()` functions, not CLI formatters. Clean separation of data functions from presentation. Dual CJS/ESM output. Well-documented usage pattern. | -| **Documentation** | 7/10 | CLAUDE.md is comprehensive (serves as both human and AI documentation). ADR-001 is thorough. Missing: API reference docs, architecture diagrams, onboarding guide for contributors. | -| **Dependency Hygiene** | 9/10 | 3 prod deps (better-sqlite3, commander, web-tree-sitter). 7 optional deps (MCP SDK + platform binaries + huggingface). 20 dev deps. Minimal surface. Leiden algorithm vendored with MIT attribution. | -| **Dual Engine** | 6/10 | Justified by ADR-001 (performance + portability). 11K LOC Rust duplicates extraction logic. Parity is tested but not provably equivalent. Some analysis still falls back to WASM even in native mode. Maintenance cost is real. | - -**Overall: 6.8/10** — Solid engineering for a local tool at this scale, with known weaknesses in analysis soundness and structural coupling. - ---- - -## ADR Compliance Review - -### ADR-001: Dual-Engine Architecture - -**Status:** Followed, with noted gaps. - -The codebase correctly implements the dual-engine architecture as described: -- Native Rust engine via napi-rs with WASM fallback ✓ -- `--engine auto|native|wasm` flag ✓ -- Platform-specific optional npm packages ✓ -- Both engines feed the same SQLite graph ✓ - -**Trade-offs still accurate?** Yes. The ADR states "some analysis phases fall back to WASM even when native is selected" — this is still true. 
The Phase 6 (Native Analysis Acceleration) mentioned as future work has not been completed. - -**Drift:** Minor. The ADR mentions "32K LOC JS" but the codebase has migrated to TypeScript (45K LOC). This is a positive evolution not reflected in the ADR text. The multi-repo integration mentioned in the trajectory is partially implemented (MCP multi-repo mode exists). - -**Missing ADRs that should exist:** -1. **TypeScript migration** — A major architectural change (JS → TS) with no ADR documenting rationale, migration strategy, or build pipeline changes. -2. **Repository pattern in db/** — The shift from flat SQL functions to a Repository abstraction (SqliteRepository, InMemoryRepository) is an architectural decision without documentation. -3. **Vendored Leiden algorithm** — 1,685 LOC of vendored community detection code. Why vendor instead of depend? Why Leiden specifically? -4. **MCP tool architecture** — The barrel pattern that causes the 37-file cycle, the middleware layer, the lazy-loading strategy for SDK — all architectural choices worth documenting. - ---- - -## Layer-by-Layer Findings - -### 1. Types Layer (`src/types.ts` — 1,851 LOC) - -**Abstraction Quality: 6/10** - -This is the project's type hub: 137 interfaces, 28 type aliases, 1,011 symbols. Fan-in of 122 — the most imported file in the codebase. Every layer depends on it. - -**The good:** Centralizing types avoids circular dependencies between modules that need to share types. TypeScript's structural typing means this file is really just a declaration file — no runtime code. - -**The nuance:** The file is well-organized internally — 22 logical sections with clear `§` headers (§1-§2: symbol/edge kinds + DB rows, §3-§4: repository/extractor types, §5-§8: parser/AST/analysis, §9-§10: graph model/pipeline, §11: config, §12-§22: features/MCP/CLI). This is a deliberate integration contract, not an accidental dumping ground. 
- -**The concern:** At 1,851 LOC, it's approaching the point where finding the right type requires scrolling through unrelated definitions. A principal architect would split this into domain-scoped type files (`types/db.ts`, `types/mcp.ts`, `types/graph.ts`, etc.) with a barrel re-export — but the current internal organization means this is a maintenance convenience issue, not a design flaw. - -**Comparison:** Sourcegraph's type definitions are spread across domain packages. Semgrep uses OCaml's module system for type scoping. Joern uses Scala case classes per domain. - -### 2. Database Layer (`src/db/` — 18 files, 327 symbols) - -**Abstraction Quality: 7/10** - -Clean Repository pattern with `SqliteRepository` and `InMemoryRepository` for testing. Query builder with SQL injection protections (whitelist validation on interpolated identifiers, parameterized queries for values). Migrations are sequential with version tracking. - -**The good:** -- `query-builder.ts` validates all interpolated SQL identifiers against regex and whitelists -- `better-sqlite3` is synchronous — no async complexity, no connection pooling issues -- Repository abstraction enables in-memory testing - -**The concern:** -- The `db/index.ts` barrel exports 50+ functions — the abstraction is leaking. External modules import specific low-level functions (`findCallerNames`, `findCallees`, `getCallEdges`) rather than going through a higher-level query API. The domain/analysis layer should be the only consumer of raw DB functions. -- Schema uses `db.exec()` with string literals for DDL (migrations) — acceptable since these are hardcoded strings, not user input. -- WAL mode is enabled (`pragma('journal_mode = WAL')`) with advisory locking via `.lock` files and 5000ms busy timeout — good for concurrent reads during builds. - -**Scalability:** SQLite with WAL is fine for single-repo up to ~500K LOC. 
Beyond that, or for multi-repo with many concurrent writers, the single-writer model could become a bottleneck. For the tool's stated use case (local analysis), this is acceptable. - -### 3. MCP Layer (`src/mcp/` — 40 files, 352 symbols) - -**Abstraction Quality: 5/10** - -The 37-file circular dependency cycle is the biggest structural flaw in the codebase. - -**Root cause:** `tools/index.ts` (barrel) imports all 34 tool modules → each tool module imports `McpToolContext` type from `server.ts` → `server.ts` imports `TOOL_HANDLERS` from `tools/index.ts`. This is a type-only cycle at runtime (TypeScript's `import type` would break it), but codegraph correctly flags it because the actual imports are value imports. - -**Fix:** Extract `McpToolContext` interface to a separate `types.ts` file in `mcp/`, or use `import type` consistently. This would eliminate the cycle entirely. - -**The good:** -- Lazy-loading of `@modelcontextprotocol/sdk` (optional dependency) -- Clean handler registration pattern with `TOOL_HANDLERS` map -- Middleware layer for defaults and validation -- Single-repo isolation by default (security-conscious design) - -**Comparison:** narsil-mcp also uses barrel exports for tools but avoids the cycle by separating types. Sourcegraph's API layer uses dependency injection to break similar cycles. - -### 4. Parser Layer (`src/domain/parser.ts` — 686 LOC, fan-in 48) - -**Abstraction Quality: 7/10** - -`LANGUAGE_REGISTRY` is the single source of truth for all supported languages — clean registry pattern. Each language has an extractor function. The Rust engine duplicates this registry in `crates/codegraph-core/src/parser_registry.rs`. - -**The concern:** -- Each Rust extractor (`extractors/javascript.rs` at 1,649 LOC, `python.rs` at 524 LOC, etc.) is a large `walk_node_depth` function — cognitive complexity 79-243. 
These are inherently complex (AST traversal with pattern matching), but the Rust extractors are harder to maintain than their JS counterparts because they lack the dynamic dispatch that makes JS extractors more concise. -- Adding a new language requires changes in both JS and Rust — the dual-engine cost is most visible here. - -### 5. Graph Model (`src/graph/model.ts` — CodeGraph class) - -**Abstraction Quality: 8/10** - -Clean adjacency list implementation with `Map>` for O(1) edge lookup. Supports directed and undirected graphs. Node IDs are strings (DB integer IDs are stringified). Auto-adds nodes on edge creation. - -**The concern:** -- No edge deduplication — adding the same edge twice overwrites attributes silently. -- Memory: `Map>` stores each edge in both `_successors` and `_predecessors` — 2x memory for directed graphs. At 10K nodes / 21K edges this is negligible, but at 1M nodes it adds up. -- No graph serialization/deserialization — the graph is always rebuilt from SQLite, never cached. - -**Comparison:** Joern uses OverflowDB (disk-backed graph). Sourcegraph uses a custom index format. For codegraph's scale (local repos), in-memory is correct. - -### 6. Features Layer (`src/features/` — 23 files, 8,850 LOC) - -**Abstraction Quality: 7/10** - -Each feature is a self-contained module exporting a `*Data()` function (pure data) and optionally a CLI formatter. Cohesion of 0.04 is misleading — features are intentionally independent. This is the right design. - -**Largest files:** `dataflow.ts` (701 LOC), `structure.ts` (694 LOC), `cfg.ts` (579 LOC), `complexity.ts` (557 LOC). Only `dataflow.ts` marginally exceeds the 700 LOC threshold — overall discipline is good. - -**The good:** Clear separation between data functions and presentation. Features compose domain layer functions without knowing about CLI or MCP. - -### 7. 
Presentation Layer (`src/presentation/` — 31 files, 4,783 LOC) - -**Abstraction Quality: 6/10** - -`queries-cli/` has 0.00 cohesion — each file is a standalone formatter with no shared state or logic. This is correct (formatters should be independent) but the metric correctly identifies that these files don't form a coherent module. - -**The concern:** `viewer.ts` (676 LOC) generates an entire HTML page with embedded JavaScript for vis-network visualization. This is a code-generation module, not a presentation layer in the traditional sense. It works but is brittle — any changes to the visualization require modifying string templates. - -### 8. Infrastructure Layer (`src/infrastructure/` — 7 files, 79 symbols) - -**Abstraction Quality: 8/10** - -Lean and focused. `config.ts` handles multi-source configuration (file, env, secret resolution). `logger.ts` is a simple structured logger. `native.ts` handles the dual-engine loading with graceful fallback. - -**The good:** `loadConfig` pipeline is clean: `mergeConfig → applyEnvOverrides → resolveSecrets`. Deep merge preserves sibling keys. `apiKeyCommand` uses `execFileSync` with array args (no shell injection). - -### 9. Domain Analysis Layer (`src/domain/analysis/` — 11 files, 3,109 LOC) - -**Abstraction Quality: 7/10** - -Well-decomposed analysis functions: `dependencies.ts` (648 LOC), `context.ts` (546 LOC), `module-map.ts` (424 LOC), `diff-impact.ts` (356 LOC). Each is focused on one concern. - -**The concern:** `diff-impact.ts` shells out to `git` via `execFileSync` — this is a hard dependency on git that isn't abstracted. If codegraph ever needs to support non-git repos (SVN, Mercurial), this would need refactoring. For now, acceptable since the tool explicitly targets git repos. - -### 10. Vendored Leiden Algorithm (`src/graph/algorithms/leiden/` — 1,685 LOC) - -**Abstraction Quality: 6/10** - -Vendored from ngraph.leiden (MIT). Adapted to work with CodeGraph's adjacency list model via an adapter pattern. 
`optimiser.ts` (598 LOC) and `partition.ts` (479 LOC) are the most complex files, with cognitive complexity scores of 154 and 217 respectively. - -**The concern:** These are the two highest-complexity functions in the entire JS codebase. The Leiden algorithm is mathematically complex, so high complexity is partially inherent, but `makePartition` at 217 cognitive complexity and `runLouvainUndirectedModularity` at 154 suggest the vendored code could benefit from refactoring into smaller functions. - -**Build vs Buy:** The original `ngraph.leiden` package exists on npm. Vendoring was presumably done to avoid a dependency and to adapt the API to CodeGraph's model. With only 3 prod deps, this decision is consistent with the minimal-dependency philosophy. The trade-off is 1,685 LOC of complex vendored code that the team must maintain. - ---- - -## Cross-Cutting Concerns - -### 1. Type Safety - -**Score: 8/10** - -The TypeScript migration is complete (280 .ts files, 0 .js in src/). CLAUDE.md already reflects this correctly ("Source is TypeScript in `src/`, compiled via `tsup`"). - -`tsconfig.json` targets es2022 with `nodenext` module resolution. Path aliases (`#shared/*`, `#db/*`, etc.) keep imports clean. The build produces both ESM and CJS outputs. - -**Concern:** 70 bare `catch {}` blocks without error typing. TypeScript's `catch` binds `unknown` by default — most catch blocks should type-narrow the error. This isn't a safety risk (errors are caught) but reduces debuggability. - -### 2. Error Handling - -**Score: 7/10** - -Clean hierarchy: `CodegraphError` base class with domain-specific subclasses, each carrying `code`, `file`, and `cause` fields. Consistent pattern: domain code throws typed errors, CLI catches and formats. - -**Concern:** 97 catch blocks total, 70 of which are bare catches. Many silently swallow errors with fallback behavior (graceful degradation), which is correct for a CLI tool but makes debugging harder. 
The recent commit (c8afa8f) "use safe error coercion in debug catch blocks" suggests this is actively being addressed.

### 3. Testing Strategy

**Score: 7/10**

- 115 test files, 32,538 LOC, 4,873 assertions
- Integration-heavy (31 integration tests) — correct for a tool that transforms input to database
- Parser tests for each language (20 files) — good coverage of the hot path
- Unit tests (30 files) for core logic
- Engine parity tests (4 files) — critical for dual-engine correctness
- Benchmark tests for resolution performance
- No snapshot tests, no property-based tests, no fuzz tests

**Test-to-source ratio:** 32K test LOC / 45K source LOC = 0.71:1. Decent but not exceptional.

**Missing:**
- Property-based tests for import resolution (the biggest source of false positives/negatives)
- Fuzz tests for parser extractors (tree-sitter grammars handle malformed input, but extractors may not)
- Mutation testing to validate assertion quality

### 4. Dual Engine Maintenance

**Score: 6/10**

The Rust engine (11,413 LOC) duplicates parsing, extraction, import resolution, complexity analysis, CFG generation, dataflow analysis, and cycle detection. The ADR acknowledges this cost and argues it's bounded to the hot path.

**Current state per ADR-001:** "Some analysis phases fall back to WASM even in native mode." This means `--engine native` is not purely native — it's a hybrid. The Phase 6 roadmap item to make native fully self-contained has not been completed.

**Maintenance risk:** The `walk_node_depth` function exists in 8 Rust extractors with cognitive complexity ranging from 79 to 243. Each language extractor is a large monolithic function. A bug in the traversal pattern must be fixed in 8+ places.

### 5. Dependency Hygiene

**Score: 9/10**

| Category | Count | Notable |
|----------|-------|---------|
| Production | 3 | better-sqlite3, commander, web-tree-sitter |
| Optional | 7 | MCP SDK, 5 platform binaries, huggingface/transformers |
| Dev | 20 | biome, vitest, napi-rs toolchain, tree-sitter grammars |

This is exceptional for a tool of this scope. Most competitors pull in dozens of production dependencies. The vendored Leiden (1,685 LOC) replaces what would be an npm dependency.

**Risk:** `better-sqlite3` requires native compilation (node-gyp), which can fail on some platforms. The `web-tree-sitter` WASM approach avoids this for the parser layer, but the SQLite dependency is unavoidable.

### 6. Security Surface

**Score: 8/10**

- **SQL injection:** Mitigated via parameterized queries + whitelist validation on identifiers
- **Command injection:** `execFileSync` uses array args, not shell strings. The `apiKeyCommand` config shells out but uses `execFileSync` with no shell
- **MCP server:** stdio transport only (no network exposure). Single-repo isolation by default
- **Dependencies:** 3 prod deps minimizes supply chain risk
- **File system:** Reads arbitrary files for parsing — expected behavior, but no sandboxing
- **No authentication:** MCP server has no auth — relies on the transport layer (stdio) for access control

**The one concern:** The `apiKeyCommand` config field runs an arbitrary command via `execFileSync`. If `.codegraphrc.json` is committed to a repo and an attacker modifies it, they could execute arbitrary commands when codegraph loads config. This is documented behavior (similar to `.npmrc` scripts) but worth noting.

### 7. API Design

**Score: 8/10**

The programmatic API (`index.ts`) exports 40+ functions with a clear naming convention: `*Data()` for data-returning functions, `export*` for serialization, `build*` for construction. Error classes are exported. Constants are exported.
CLI formatters are explicitly excluded.

**The good:**
- Dual CJS/ESM output
- `loadConfig` exported for programmatic use
- Data functions return plain objects, not formatted strings

**The concern:** No TypeScript type exports from the package. Users importing `@optave/codegraph` get the functions but would need to import types from internal paths. The `types.ts` file should have its key interfaces re-exported from the package root.

### 8. Documentation

**Score: 7/10**

CLAUDE.md is the primary documentation — comprehensive, accurate, and serves both human developers and AI agents. ADR-001 is thorough.

**Missing:**
- API reference documentation (JSDoc exists but no generated docs)
- Architecture diagrams (the layer table in CLAUDE.md is good, but a visual diagram would help)
- Contributor onboarding guide
- ~~CLAUDE.md references "JS source is plain JavaScript"~~ — already corrected; CLAUDE.md describes TypeScript source

---

## Competitive Verification

### Does Codegraph Have a Reason to Exist?

**Yes.** After verifying competitors, codegraph occupies a unique niche: **local-only, deterministic, function-level dependency graphs with MCP server support and zero cloud dependency.**

No other single tool combines all of:
1. Function-level (not just file-level) dependency resolution
2. MCP server for AI agent integration
3. Fully local — no cloud, no LLM required for core features
4. Deterministic analysis (same input → same output)
5. Multi-language support (11 languages)
6. Incremental builds
7. CLI + programmatic API + MCP in one package

### Verified Competitor Comparison

All claims verified against actual GitHub READMEs and source repositories. Items marked [UNVERIFIED] could not be confirmed from source.
| Feature | Codegraph | Sourcegraph | Joern | Semgrep | stack-graphs | narsil-mcp | CKB | GitNexus |
|---------|-----------|-------------|-------|---------|--------------|------------|-----|----------|
| **License** | MIT | **Proprietary** (no longer OSS) | Apache-2.0 | LGPL-2.1 (partial) | Apache/MIT (**archived**) | Apache-2.0 | Custom (free <$25K) | PolyForm NC |
| **MCP server** | Yes (built-in) | No | No (3rd party only) | Yes (built-in) | No | Yes (MCP-only) | Yes | Yes |
| **Standalone CLI** | Yes | Yes (src-cli) | Yes (Scala REPL) | Yes | No (library) | No | Yes | Yes |
| **Fully local** | Yes | No (server req'd) | Yes | Yes (CE) | Yes (library) | Yes | Yes | No (Docker+Memgraph) |
| **No LLM required** | Yes | Partial (Cody needs LLM) | Yes | Yes (CE) | Yes | Partial (neural search opt.) | Yes | No (RAG agent) |
| **Deterministic** | Yes | Yes (search) | Yes | Yes | Yes | Yes (core) | Yes | Partial |
| **Function-level deps** | Yes | No (search+nav) | Yes (CPG) | No (pattern match) | No (name resolution) | Partial (call graph) | Yes (SCIP-based) | Yes |
| **Incremental** | Yes (all 11 langs) | Via SCIP | No | PR-scoped | Yes (design goal) | File-level watch | **Go only** | [UNVERIFIED] |
| **Languages** | 11 | 10+ via SCIP | 6-7 core | 30+ | Framework (lang-agnostic) | 32 | 12 (tiered quality) | [UNVERIFIED] |
| **Prod deps** | 3 | Hundreds | JVM ecosystem | Python ecosystem | Rust (compiled) | Rust (compiled) | Go (compiled) | Node.js |
| **Storage** | SQLite | PostgreSQL | Custom graph DB | None (stateless) | N/A | In-memory+persist | SCIP index files | LadybugDB |
| **Stars** | — | 10.3K (archived snapshot) | 3.0K | 14.6K | 873 (archived) | 132 | 79 | 19.9K (very new) |
| **Status** | Active | Private/proprietary | Active | Active | **Archived** | Active | Active | Active (new, Feb 2026) |

### Key Competitive Insights

**Sourcegraph** is no longer open source. The main repo went private; only an archived public snapshot remains.
The last Apache-licensed commit is explicitly marked; the current license is proprietary. Sourcegraph is still the gold standard for code intelligence at scale, but it is no longer a viable open-source alternative. Codegraph's local-only, zero-setup approach is a genuine differentiator.

**Joern** (ShiftLeft) builds a Code Property Graph (CPG) — a superset of what codegraph builds (AST + CFG + PDG + call graph). Joern is more academically rigorous but requires JDK 21, has no incremental builds, and has no native MCP server (only 3rd-party wrappers). For security analysis, Joern is superior. For AI agent integration, codegraph wins.

**Semgrep** is pattern-based, not graph-based. It finds code patterns, not dependencies — a different tool category. Notable: Semgrep does have a built-in MCP server (`semgrep mcp`) and a Claude Code plugin. Cross-file analysis is proprietary (Pro-only).

**stack-graphs** (GitHub) is **archived and abandoned**. The README states: "This repository is no longer supported or updated by GitHub." It was a research-grade name resolution library using scope graph theory from TU Delft, not a user-facing tool.

**narsil-mcp** (132 stars) is MCP-native with 90 claimed tools and 32 languages — the closest competitor in the "local code intelligence for AI agents" space. Key gaps vs codegraph: no standalone CLI (MCP-only), optional LLM dependency for neural search, no SQLite persistence model. The 90-tool breadth claim is ambitious for its maturity.

**CKB** (`SimplyLiz/CodeMCP`, 79 stars) is the most direct feature competitor — impact analysis, call graphs, dead code detection, MCP server, CLI. Key differences: SCIP-dependent for deep analysis (requires running language-specific indexers), incremental indexing is Go-only, custom restrictive license (free <$25K revenue, paid above).

**GitNexus** (19.9K stars, very new — trending Feb 2026) has an impressive feature set with browser-based zero-server mode and LadybugDB graph storage.
However: the **PolyForm Noncommercial license** blocks enterprise adoption, and the built-in "Graph RAG Agent" requires an LLM for queries.

### Competitive Moat Assessment

**Defensible differentiators:**
1. Only tool that combines MCP + CLI + programmatic API in one package
2. Only tool with deterministic, local, function-level analysis + semantic search
3. 11-language support with both native speed and universal WASM fallback
4. 3 production dependencies — smallest attack surface of any comparable tool
5. Self-dogfooding (uses itself for quality enforcement) — creates a virtuous cycle

**Not defensible:**
1. MCP server support is trivial to add — Semgrep already has `semgrep mcp`
2. Tree-sitter parsing is available to everyone — narsil-mcp uses it for 32 languages
3. Community detection and complexity metrics are well-known algorithms, not proprietary

**Threats:**
1. **GitNexus** (19.9K stars) has momentum but is noncommercial-licensed — if they relicense to MIT/Apache, it becomes the primary threat
2. **CKB** has the closest feature set but is SCIP-dependent and restrictively licensed
3. **narsil-mcp** could add a CLI and close the gap quickly (same tech stack, same tree-sitter base)
4. **JetBrains** or **Cursor** adding built-in code graph MCP tools would commoditize the AI agent integration angle

**Verdict:** The moat is the *combination* — no single feature is unique, but no competitor offers the same bundle with MIT license + zero LLM + all-language incremental + CLI + MCP + programmatic API. The licensing advantage over GitNexus and CKB is significant for enterprise adoption.
---

## Structural Census Summary

| Metric | Value |
|--------|-------|
| **Source files** | 280 (TypeScript) |
| **Source LOC** | 45,796 |
| **Rust LOC** | 11,413 |
| **Test files** | 115 |
| **Test LOC** | 32,538 |
| **Graph nodes** | 10,997 |
| **Graph edges** | 20,991 |
| **Graph quality** | 64/100 |
| **Caller coverage** | 29.0% (1,486/5,122) |
| **Call confidence** | 81.1% (3,273/4,035) |
| **File-level cycles** | 1 (37-file MCP barrel) |
| **Function-level cycles** | 8 |
| **Communities** | 107 (modularity: 0.48) |
| **Community drift** | 49% |
| **Avg cognitive complexity** | 8.5 |
| **Max cognitive complexity** | 243 (`walk_node_depth` in Rust extractor) |
| **Avg maintainability index** | 60.1 |
| **Functions above threshold** | 413 (of 1,769) |
| **Production dependencies** | 3 |
| **Dead code (total)** | 8,960 symbols (per `codegraph stats`) |
| **Dead code (callable)** | 198 functions + 3,287 methods |
| **Dead code (leaf nodes)** | 4,090 (parameters, properties, constants) |
| **Dead code (unresolved)** | 3,593 (import resolution gaps) |
| **Dead code (FFI)** | 211 (Rust napi boundary) |
| **Dead code (entry points)** | 391 (CLI commands, framework entry) |

### Dead Code Breakdown

The raw "8,960 dead" count from `codegraph stats` is misleading. The categorized breakdown below accounts for 8,285 of these; the remaining 675 are symbols that fall outside these five categories (e.g., uncategorized type aliases, re-exported symbols).

| Category | Count | Explanation |
|----------|-------|-------------|
| **dead-leaf** | 4,090 | Parameters, properties, constants — leaf nodes without callers. Most are struct fields, interface properties, and function parameters. Not actionable dead code. |
| **dead-unresolved** | 3,593 | Symbols whose callers couldn't be resolved. This reflects import resolution gaps, not actual dead code. Includes many TypeScript interface methods, framework callbacks, and dynamic dispatch. |
| **dead-entry** | 391 | CLI command handlers, MCP tool handlers, test helpers. These are framework entry points called by Commander/MCP, not by codegraph's own code. Correctly classified. |
| **dead-ffi** | 211 | Rust napi-rs boundary functions. Called from JS via the native addon, not visible in the JS call graph. Correctly classified. |
| **Genuinely dead functions** | ~198 | After excluding leaf nodes (4,090), unresolved (3,593), entry points (391), and FFI (211), roughly 198 functions appear genuinely unreferenced. |

**By kind:** 3,328 dead parameters (expected — parameters are rarely "called"), 3,287 dead methods (mostly interface method declarations and type-only methods), 585 dead interfaces (TypeScript type declarations), 427 dead constants, 335 dead properties, 198 dead functions, 56 dead structs, 44 dead types, 21 dead classes, 3 dead enums, 1 dead trait.

**Verdict:** The actual dead callable code is ~198 functions out of 5,122 (3.9%). The 8,960 headline number includes 93% non-actionable symbols (leaf nodes, unresolved imports, entry points, FFI boundaries). Codegraph should consider reporting these categories separately by default.

### Complexity Hotspots

The top 5 most complex functions:

| Function | File | Cognitive | Cyclomatic | MI |
|----------|------|-----------|------------|-----|
| `walk_node_depth` | extractors/javascript.rs | 243 | 79 | 8.4 |
| `makePartition` | leiden/partition.ts | 217 | 97 | 5.0 |
| `runLouvainUndirectedModularity` | leiden/optimiser.ts | 154 | 46 | 29.8 |
| `build_call_edges` | edge_builder.rs | 146 | 49 | 22.1 |
| `extractGoTypeMapDepth` | extractors/go.ts | 143 | 48 | 37.6 |

The Rust extractors dominate the complexity rankings because each `walk_node_depth` is a monolithic AST traversal. The vendored Leiden code (partition + optimiser) is inherently complex algorithmic code. These are the maintenance risk areas.
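The monolithic-traversal pattern behind these scores can often be flattened by dispatching on node kind instead of nesting branches. A minimal TypeScript sketch, illustrative only — `AstNode`, `NodeHandler`, and the handler names are not from the codegraph or Rust extractor source:

```typescript
// Illustrative: one giant walk-with-nested-switches becomes a dispatch
// table of small handlers, capping cognitive complexity per function.
interface AstNode {
  kind: string;
  children: AstNode[];
}

type NodeHandler = (node: AstNode, out: string[]) => void;

// Each handler stays trivially small; adding a language construct means
// adding an entry here, not deepening an existing branch tree.
const handlers: Record<string, NodeHandler> = {
  function_declaration: (n, out) => out.push(`fn:${n.children.length}`),
  class_declaration: (n, out) => out.push(`class:${n.children.length}`),
};

function walk(node: AstNode, out: string[] = []): string[] {
  handlers[node.kind]?.(node, out); // dispatch instead of nested branches
  for (const child of node.children) walk(child, out);
  return out;
}
```

The same idea translates to the Rust extractors (a match that delegates to per-kind functions), which would also let the shared traversal skeleton live in one place instead of 8+.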
---

## Strategic Verdict

### 1. Does Codegraph Have a Reason to Exist?

**Yes.** Verified against 6 competitors. No other tool offers:
- Local + deterministic + function-level + MCP + CLI + 11 languages + 3 deps

The closest competitor (narsil-mcp) is MCP-only with narrower analysis. The dominant player (Sourcegraph) requires cloud infrastructure. Joern requires the JVM. stack-graphs is a library, not a tool.

Codegraph's value proposition — "give AI agents a deterministic code graph without cloud dependencies" — is real, verified, and currently unmatched.

### 2. Fundamental Design Flaws

These cannot be fixed incrementally:

1. **29% caller coverage means 71% of functions have no detected callers.** This is the tool's Achilles heel. Import resolution's 6-level fallback system is creative but fundamentally heuristic. For TypeScript projects (codegraph's primary audience), the tool should approach 80%+ caller coverage. The gap is likely caused by dynamic imports, re-exports, barrel files, decorators, and framework-specific patterns (React hooks, Express middleware). **This is fixable** — TypeScript's type system provides information that could dramatically improve resolution. The tool should leverage `tsconfig.json` path mappings and TypeScript's module resolution algorithm rather than reimplementing a heuristic resolver.

2. **In-memory graph model has no persistence layer.** The CodeGraph adjacency list is rebuilt from SQLite on every query session. For large repos, this is wasteful. A memory-mapped or disk-backed graph (like Joern's OverflowDB) would allow graph queries without full materialization.

### 3. Missed Opportunities

1. **TypeScript-aware resolution.** The tool treats TypeScript as "JavaScript with types" for resolution purposes. A TypeScript-native resolver using `ts.createProgram` or the TypeScript Language Service would dramatically improve caller coverage for .ts/.tsx files — the tool's primary use case.

2. **LSP integration.** The tool builds its own symbol index. LSP servers (typescript-language-server, rust-analyzer) already maintain precise symbol indexes. Using LSP as a resolution backend for supported languages would improve precision with less code.

3. **Incremental graph queries.** Currently, queries hit SQLite. The tool could maintain a materialized graph in a background process (like a language server) that responds instantly to queries without DB round-trips.

4. **Call graph visualization in MCP.** The MCP server returns text results. Returning structured graph data (nodes + edges) that clients can render would be more useful for AI agents building mental models.

### 4. Kill List

Code that should be deleted, not improved:

1. **`src/vendor.d.ts` (40 LOC)** — Manual type declarations for `better-sqlite3`. The package has `@types/better-sqlite3` on DefinitelyTyped. Use the community types instead.

2. ~~**Stale CLAUDE.md references**~~ — Already resolved. CLAUDE.md correctly describes the source as TypeScript. No action needed.

### 5. Build vs Buy

| Component | Current | Recommendation |
|-----------|---------|----------------|
| Leiden community detection | Vendored (1,685 LOC) | Keep — consistent with minimal-dep philosophy, properly attributed |
| SQL query builder | Custom (200 LOC) | Keep — simple, well-validated, no ORM needed |
| CLI framework | Commander | Keep — lightweight, standard |
| Graph model | Custom CodeGraph class | Keep for now — consider `graphology` npm package if features grow |
| Config loading | Custom | Keep — clean implementation, handles secret resolution |

No changes recommended. The custom code is justified and well-maintained.

### 6. Roadmap Critique

**What's right:**
- Phase 6 (Native Analysis Acceleration) addresses the hybrid engine gap
- Multi-repo integration extends the value proposition
- VS Code extension leverages the WASM fallback

**What's missing:**
- **TypeScript-native resolution** should be the #1 priority. The 29% caller coverage is the tool's biggest weakness, and TypeScript projects are the primary audience.
- **Graph quality metrics improvement** — the tool should target 80%+ caller coverage before adding new features.
- **MCP cycle fix** — the 37-file cycle should be resolved before it grows.
- **Structured MCP responses** — returning graph data (not text) from MCP tools would better serve AI agents.

**What's wrong:**
- Adding more languages (beyond the current 11) has diminishing returns if caller coverage for existing languages is 29%. Depth over breadth.

---

## Final Verdict

**Would I invest in this project?**

**Conditional yes.** The tool fills a real gap, has a defensible competitive position, and is well-engineered for its scale. The 3-dependency discipline is exceptional. The TypeScript migration is complete. The dual-engine architecture is justified.

**The condition:** Fix the caller coverage problem. A code intelligence tool that can only resolve callers for 29% of functions is fundamentally limited in the value it can provide. The diff-impact analysis, blast radius calculations, and dead code detection all degrade proportionally with caller coverage. If codegraph can reach 70%+ caller coverage for TypeScript/JavaScript projects (its primary audience), it becomes significantly more valuable. If it stays at 29%, it's a structural overview tool pretending to be a dependency analysis tool.

**Investment-grade improvements (prioritized):**
1. TypeScript-native import resolution → +30-40% caller coverage
2. Break the MCP 37-file cycle → demonstrates architectural discipline
3. Split types.ts into domain-scoped type files → reduces coupling
4. Report dead code categories separately by default → honest metrics
5. Add missing ADRs (TypeScript migration, Repository pattern, Leiden vendoring, MCP architecture)

**What this tool gets right that most don't:** It's honest about its limitations (quality score is prominent), it's opinionated about minimal dependencies, and it dogfoods itself aggressively. These are the hallmarks of a well-maintained open-source project. The architecture is sound — it needs deeper analysis, not a different design.

From e5fd2fb6d1f1f87b3afc352b1f2ce7823bac8cda Mon Sep 17 00:00:00 2001
From: carlos-alm <127798846+carlos-alm@users.noreply.github.com>
Date: Sat, 4 Apr 2026 14:49:36 -0600
Subject: [PATCH 3/3] fix(skill): restore ADR lookup path to docs/architecture/decisions/

The output consolidation to generated/architecture/ was correctly applied
to audit output paths, but incorrectly also changed ADR lookup references.
ADRs are committed design-decision records that live in
docs/architecture/decisions/, not generated output.
---
 .claude/skills/architect/SKILL.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.claude/skills/architect/SKILL.md b/.claude/skills/architect/SKILL.md
index 46506470..134850d5 100644
--- a/.claude/skills/architect/SKILL.md
+++ b/.claude/skills/architect/SKILL.md
@@ -60,7 +60,7 @@ Run `/worktree` to get an isolated copy of the repo. `CLAUDE.md` mandates this f
 1. Read `package.json` to get the current version
 2. Get the current date, commit SHA, and branch name
 3. Check `generated/architecture/` for previous audit files
-4. **Read all ADRs in `generated/architecture/decisions/`.** These are the project's settled architectural decisions. Read every file — they document rationale, trade-offs, alternatives considered, and trajectory. The audit must evaluate the codebase *against* these decisions: are they being followed? Are the stated trade-offs still accurate? Has anything changed that invalidates the rationale?
+4. **Read all ADRs in `docs/architecture/decisions/`.** These are the project's settled architectural decisions. Read every file — they document rationale, trade-offs, alternatives considered, and trajectory. The audit must evaluate the codebase *against* these decisions: are they being followed? Are the stated trade-offs still accurate? Has anything changed that invalidates the rationale?
 5. Run `codegraph build --no-incremental` to ensure fresh metrics
 
 ### Phase 2 — Structural Census
@@ -102,7 +102,7 @@ For each architectural layer, evaluate against these dimensions:
 - Where does the tool present incomplete data as complete?
 
 **F. ADR Compliance**
-- Does the implementation match the decisions documented in `generated/architecture/decisions/`?
+- Does the implementation match the decisions documented in `docs/architecture/decisions/`?
 - Are the trade-offs described in ADRs still accurate given the current code?
 - Has the codebase drifted from any stated trajectory? If so, is that drift justified or accidental?
 - Are there architectural decisions that *should* have an ADR but don't?
@@ -171,7 +171,7 @@ The deliverable must contain:
 - "Does Codegraph Have a Reason to Exist?" section (verified competitor data)
 - Executive summary (1 paragraph, brutally honest)
 - Scorecard (each dimension rated 1-10 with justification)
-- **ADR compliance review** — for each ADR in `generated/architecture/decisions/`, assess whether the codebase follows the decision, whether the stated trade-offs are still valid, and whether any drift has occurred. Flag missing ADRs for decisions that exist in code but aren't documented
+- **ADR compliance review** — for each ADR in `docs/architecture/decisions/`, assess whether the codebase follows the decision, whether the stated trade-offs are still valid, and whether any drift has occurred. Flag missing ADRs for decisions that exist in code but aren't documented
 - Detailed findings per layer
 - Verified competitor comparison table
 - Strategic recommendations (prioritized)