
perf: reduce memory and network cost for checkpoint metadata#680

Open
pjbgf wants to merge 15 commits into main from pjbgf/jsonl-data2

Conversation


@pjbgf pjbgf commented Mar 11, 2026

Summary

This PR focuses on two key optimisations:

1. Reduce memory allocations when reading checkpoint metadata

  • Streaming json.Decoder replaces json.Unmarshal for reading metadata.json files, avoiding loading each entire JSON document into memory as an intermediate []byte
  • Lite structs (sessionMetadataLite, checkpointSummaryLite) replace full CommittedMetadata/CheckpointSummary when only a few fields are needed — avoids allocating large unused nested objects (Summary, InitialAttribution, TokenUsage, etc.)
  • Shared helpers (decodeSummaryLiteFromTree, decodeSessionMetadataLite) deduplicate the metadata reading logic between ListCheckpoints and ReadCheckpointMetadata

2. Reduce network cost for pulling checkpoint data from remote

  • Shallow treeless fetch (--depth=1 --filter=blob:none) downloads only the tip commit and tree objects for the metadata branch — no blobs, no history
  • On-demand blob resolution via BlobResolver uses go-git's HasEncodedObject to check the local object store (loose objects + packfile indices) before fetching anything
  • Targeted subtree blob fetch via git fetch-pack requests all blobs within the checkpoint tree, so that a consistent view is present on disk. It automatically falls back to a full branch fetch if the server doesn't support allow-reachable-sha1-in-want

Resume flow (before → after)

Before: git fetch origin entire/checkpoints/v1 → downloads entire branch (all commits, trees, and every blob across all
checkpoints)

After:

  1. git fetch --depth=1 --filter=blob:none → tip commit + trees only
  2. go-git tree walk → find all blob hashes within the checkpoint sub-tree
  3. HasEncodedObject → skip blobs already in local store
  4. git fetch-pack <url> <hash>... → fetch only the missing blobs
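Step 3 is the piece that keeps repeat resumes cheap: only hashes absent from the local object store make it into the fetch-pack request. A minimal sketch of that filtering logic, with a hypothetical in-memory store standing in for go-git's HasEncodedObject over loose objects and packfile indices:

```go
package main

import "fmt"

// localStore models the local object store lookup from step 3. Here it is
// a simple set keyed by hex hash; the real code asks go-git's storer,
// which consults loose objects and packfile indices.
type localStore map[string]bool

// missingBlobs returns, preserving order, only the hashes not already
// present locally — these are what step 4 hands to git fetch-pack.
func missingBlobs(store localStore, wanted []string) []string {
	var missing []string
	for _, h := range wanted {
		if !store[h] {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	store := localStore{"aaa111": true} // already fetched on a prior resume
	wanted := []string{"aaa111", "bbb222", "ccc333"}
	fmt.Println(missingBlobs(store, wanted)) // only the two missing hashes are fetched
}
```

A second resume of the same branch therefore degenerates to an empty fetch list, since every transcript blob is already on disk.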

This means that after a single-branch clone of a repository, a user can run entire resume <branch> with efficient network and local storage consumption. Behind the scenes, entire fetches only the blobs contained in the checkpoint subtree, rather than the entire branch.

Test plan

  • BlobResolver unit tests: HasBlob present/missing, ReadBlob present/missing
  • CollectTranscriptBlobHashes: single session, multi-session, chunked, nonexistent checkpoint
  • Existing ListCheckpoints and ReadCheckpointMetadata tests still pass with lite structs
  • Full mise run lint clean
  • Manual E2E: entire resume on a branch with remote-only checkpoint data

Copilot AI review requested due to automatic review settings March 11, 2026 12:19

Copilot AI left a comment


Pull request overview

This PR optimizes entire resume and checkpoint metadata handling by reducing memory allocations when decoding checkpoint metadata and by reducing network transfer when resuming from remote-only checkpoint data.

Changes:

  • Switch metadata reads to streaming json.Decoder + “lite” structs for ListCheckpoints / ReadCheckpointMetadata.
  • Add a “treeless” metadata-branch fetch (--depth=1 --filter=blob:none) and best-effort prefetching of only required transcript blobs.
  • Introduce BlobResolver + tree-walk collection of transcript blob hashes (including chunked transcripts) and associated unit tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/entire/cli/strategy/common_helpers_test.go Adds a preserved “full deserialize” test helper for checkpoint metadata parsing.
cmd/entire/cli/strategy/common.go Refactors checkpoint listing/reading to stream JSON decoding and use minimal structs.
cmd/entire/cli/resume.go Attempts treeless fetch first and adds best-effort transcript blob prefetch before restore.
cmd/entire/cli/git_operations.go Adds treeless metadata fetch and a targeted blob fetch helper.
cmd/entire/cli/checkpoint/blob_resolver.go Adds local-object-store blob existence/read + transcript blob hash collection via tree entries.
cmd/entire/cli/checkpoint/blob_resolver_test.go Adds unit tests for blob resolution and transcript hash collection.

@pjbgf pjbgf force-pushed the pjbgf/jsonl-data2 branch 2 times, most recently from a5c9b4f to 8b9cac1 Compare March 16, 2026 10:19
@pjbgf pjbgf marked this pull request as ready for review March 16, 2026 10:28
@pjbgf pjbgf requested a review from a team as a code owner March 16, 2026 10:28
@pjbgf pjbgf force-pushed the pjbgf/jsonl-data2 branch from 0f69a62 to 270c665 Compare March 17, 2026 09:18
pjbgf and others added 7 commits March 18, 2026 09:47
…data

Avoid allocating large unused fields (Summary, InitialAttribution,
TokenUsage) when reading checkpoint metadata by introducing minimal
sessionMetadataLite and checkpointSummaryLite structs with streaming
json.Decoder instead of json.Unmarshal into full types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: f64fee85be5f
Add treeless fetch (--filter=blob:none) for the metadata branch so only
commit and tree objects are downloaded initially. Transcript blobs
(full.jsonl) are then fetched individually via git fetch-pack using their
SHA-1 hashes, skipping any that already exist in the local object store.

- BlobResolver: go-git HasEncodedObject checks for local blob existence
- CollectTranscriptBlobHashes: walks tree entries to find full.jsonl blob
  hashes without reading blob content (works after treeless fetch)
- FetchMetadataTreeOnly: git CLI fetch with --filter=blob:none
- FetchBlobsByHash: git fetch-pack for specific objects, with full-fetch
  fallback
- resume.go: ensureTranscriptBlobs wired before RestoreLogsOnly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 48e91bc15f41
Add --depth=1 to the filtered fetch so only the tip commit and its tree
objects are downloaded, skipping the entire branch history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 564aea6f7ca5
…ches

The single-checkpoint path in resumeFromCurrentBranch used
strategy.GetMetadataBranchTree directly, which fails on fresh clones
where entire/checkpoints/v1 doesn't exist locally. Now uses
getMetadataTree() with the full fallback chain (treeless fetch → local →
full fetch → remote tree), matching the multi-checkpoint path.

Also fixes BranchExistsOnRemote to query the actual remote via
git ls-remote when the local remote-tracking ref is missing, so it
works after a fresh clone without a prior fetch.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paulo Gomes <paulo@entire.io>
…e fallback

- resume_test: remove remote ref creation that now succeeds via
  getMetadataTree's remote tree fallback (added in 649ab7c), update
  assertion to expect nil when no metadata is available anywhere
- hooks_test: add clearGlobalHooksPath helper to override global
  core.hooksPath in test repos so git resolves to .git/hooks
- integration_test/testenv: set local core.hooksPath in InitRepo to
  override global setting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: cab75de47439
…t reads

Replace direct getSessionsBranchTree() calls with getFetchingTree() which
transparently fetches missing blobs when accessing treeless-cloned metadata.
Updates resume path to use the same fallback chain (treeless fetch → local →
full fetch → remote tree) for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 5eec47c03756
…CollectTranscriptBlobHashes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e5eaf035e386
@pjbgf pjbgf force-pushed the pjbgf/jsonl-data2 branch from 270c665 to 9ca78e1 Compare March 18, 2026 09:47
Soph and others added 2 commits March 18, 2026 12:58
The comment referenced "err from line 85" which was from a prior
revision. The actual suppressed error is treeErr from the session
enumeration loop, used as a break condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c69f1bbf8b6a
Previously, when fetch-pack failed and the fallback to full metadata
fetch succeeded, the fetch-pack error was silently discarded. Now logs
the error at debug level for traceability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d0b47f037e1a
@pjbgf pjbgf added the performance Performance issues label Mar 18, 2026
Soph and others added 6 commits March 18, 2026 13:56
fetch-pack bypasses git's credential helpers, so HTTPS repos with
auth tokens would always fail the initial attempt, adding latency
before the fallback. Switch to "git fetch origin <hash>" which goes
through the normal transport and credential helper chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a763734f96af
… reading

ReadCheckpointMetadata and ReadCheckpointMetadataFromSubtree had ~60
lines of identical decoding logic. Extract a shared decodeCheckpointInfo
helper that takes a normalizePath function to handle the only real
difference: how session metadata paths are transformed for tree lookups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a81cfc203db5
Add a note that opening a repo is cheap (reads .git/config + refs)
so the multiple opens across fallback attempts are acceptable for
correctness vs go-git's stale packfile index cache.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 22c23e450ee3
…Branch

The control flow ensures metadataTree and freshRepo are set together
by either resolveLatestCheckpoint or getMetadataTree, but the nil
initialization makes this non-obvious. Add a comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 01a34baed435
git fetch output may contain credential helper prompts, remote URLs
with embedded tokens, or other auth-related stderr. Drop the raw
output from both the debug log and the returned error message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 511569c9de2a

cursor bot commented Mar 18, 2026

PR Summary

Medium Risk
Changes resume/checkpoint reading to perform treeless fetches and on-demand blob fetching via git CLI; failures or stale-object edge cases could break resume/read flows, but scope is limited to checkpoint metadata access paths.

Overview
Reduces memory and network cost when resuming/reading checkpoint metadata. Checkpoint metadata decoding now uses streaming json.Decoder with minimal "lite" structs and shared helpers, plus a new ReadCheckpointMetadataFromSubtree path to avoid loading large nested fields.

Adds treeless fetch + on-demand blob reads for checkpoints. Introduces FetchingTree/BlobResolver to fetch missing blobs by hash after git fetch --filter=blob:none, with a git cat-file fallback for go-git storer cache staleness; GitStore/ManualCommitStrategy can now be configured with a blob fetcher.

Updates resume and related git ops. entire resume now attempts a shallow treeless metadata fetch first, opens a fresh repo after fetches to avoid stale indices, wraps checkpoint subtrees with FetchingTree, and improves diagnostics; adds new git helpers (FetchMetadataTreeOnly, FetchBlobsByHash) and strengthens BranchExistsOnRemote via git ls-remote fallback.

Tests/benchmarks expanded. Adds unit tests for blob resolution, adjusts resume tests for new behavior, adds benchmark coverage for metadata decoding/listing, adds E2E coverage for resuming from a cloned repo without a local metadata branch, and hardens tests against global core.hooksPath.

Written by Cursor Bugbot for commit db99d03. Configure here.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Unused resolver field allocated in every FetchingTree
    • Removed the unused resolver field from FetchingTree and the storer parameter from NewFetchingTree, eliminating the dead BlobResolver allocation on every FetchingTree creation.
  • ✅ Fixed: Per-blob fetching misses intended batch optimization
    • Added PrefetchBlobs() method that recursively collects all blob hashes from the tree and batch-fetches them in a single call, and wired it into all three resume flow FetchingTree creation sites to replace per-blob network round-trips.


Or push these changes by commenting:

@cursor push 899ccc0906
Preview (899ccc0906)
diff --git a/cmd/entire/cli/checkpoint/committed.go b/cmd/entire/cli/checkpoint/committed.go
--- a/cmd/entire/cli/checkpoint/committed.go
+++ b/cmd/entire/cli/checkpoint/committed.go
@@ -1303,7 +1303,7 @@
 	if err != nil {
 		return nil, err
 	}
-	return NewFetchingTree(ctx, tree, s.repo.Storer, s.blobFetcher), nil
+	return NewFetchingTree(ctx, tree, s.blobFetcher), nil
 }
 
 // getSessionsBranchTree returns the tree object for the entire/checkpoints/v1 branch.

diff --git a/cmd/entire/cli/checkpoint/fetching_tree.go b/cmd/entire/cli/checkpoint/fetching_tree.go
--- a/cmd/entire/cli/checkpoint/fetching_tree.go
+++ b/cmd/entire/cli/checkpoint/fetching_tree.go
@@ -10,8 +10,8 @@
 	"github.com/entireio/cli/cmd/entire/cli/logging"
 
 	"github.com/go-git/go-git/v6/plumbing"
+	"github.com/go-git/go-git/v6/plumbing/filemode"
 	"github.com/go-git/go-git/v6/plumbing/object"
-	"github.com/go-git/go-git/v6/plumbing/storer"
 )
 
 // BlobFetchFunc fetches missing blob objects by hash from a remote.
@@ -28,22 +28,18 @@
 // visible to go-git's storer. As a fallback, File() reads the blob via
 // "git cat-file" which always sees the current on-disk object store.
 type FetchingTree struct {
-	inner    *object.Tree
-	ctx      context.Context
-	resolver *BlobResolver
-	fetch    BlobFetchFunc
+	inner *object.Tree
+	ctx   context.Context
+	fetch BlobFetchFunc
 }
 
 // NewFetchingTree wraps a git tree with on-demand blob fetching.
-// The storer is used to check if blobs exist locally, and fetch is called
-// to download any that are missing. If fetch is nil, File() behaves
-// identically to the underlying tree.
-func NewFetchingTree(ctx context.Context, tree *object.Tree, s storer.EncodedObjectStorer, fetch BlobFetchFunc) *FetchingTree {
+// If fetch is nil, File() behaves identically to the underlying tree.
+func NewFetchingTree(ctx context.Context, tree *object.Tree, fetch BlobFetchFunc) *FetchingTree {
 	return &FetchingTree{
-		inner:    tree,
-		ctx:      ctx,
-		resolver: NewBlobResolver(s),
-		fetch:    fetch,
+		inner: tree,
+		ctx:   ctx,
+		fetch: fetch,
 	}
 }
 
@@ -156,13 +152,48 @@
 		return nil, fmt.Errorf("tree %s: %w", path, err)
 	}
 	return &FetchingTree{
-		inner:    subtree,
-		ctx:      t.ctx,
-		resolver: t.resolver,
-		fetch:    t.fetch,
+		inner: subtree,
+		ctx:   t.ctx,
+		fetch: t.fetch,
 	}, nil
 }
 
+// PrefetchBlobs recursively walks the tree entries and fetches all blob hashes
+// in a single batch call. This avoids per-blob network round-trips when multiple
+// File() calls will be made against a tree from a treeless fetch.
+// Errors are non-fatal: on-demand fetching in File() serves as a fallback.
+func (t *FetchingTree) PrefetchBlobs() error {
+	if t.fetch == nil {
+		return nil
+	}
+	hashes := collectBlobHashes(t.inner)
+	if len(hashes) == 0 {
+		return nil
+	}
+	logging.Debug(t.ctx, "FetchingTree.PrefetchBlobs: batch fetching blobs",
+		slog.Int("blob_count", len(hashes)),
+	)
+	return t.fetch(t.ctx, hashes)
+}
+
+// collectBlobHashes recursively walks a git tree and returns all blob hashes.
+// Only tree objects are resolved; blobs are collected by hash without reading them.
+// This works after a treeless fetch where tree objects are local but blobs are not.
+func collectBlobHashes(tree *object.Tree) []plumbing.Hash {
+	var hashes []plumbing.Hash
+	for _, entry := range tree.Entries {
+		if entry.Mode == filemode.Dir {
+			subtree, err := tree.Tree(entry.Name)
+			if err == nil {
+				hashes = append(hashes, collectBlobHashes(subtree)...)
+			}
+		} else {
+			hashes = append(hashes, entry.Hash)
+		}
+	}
+	return hashes
+}
+
 // RawEntries returns the direct tree entries (no blob reads needed).
 func (t *FetchingTree) RawEntries() []object.TreeEntry {
 	return t.inner.Entries

diff --git a/cmd/entire/cli/resume.go b/cmd/entire/cli/resume.go
--- a/cmd/entire/cli/resume.go
+++ b/cmd/entire/cli/resume.go
@@ -167,13 +167,11 @@
 	// Multiple checkpoints (squash merge): resolve latest by CreatedAt timestamp.
 	// resolveLatestCheckpoint also returns the metadata tree so we can reuse it
 	// for the ReadCheckpointMetadata call below without a redundant lookup.
-	// Both metadataTree and freshRepo are set together — either by resolveLatestCheckpoint
-	// (multi-checkpoint path) or by getMetadataTree (single-checkpoint path below).
-	// freshRepo is always non-nil before it reaches NewFetchingTree.
+	// metadataTree is set by either resolveLatestCheckpoint (multi-checkpoint path)
+	// or getMetadataTree (single-checkpoint path below).
 	var metadataTree *object.Tree
-	var freshRepo *git.Repository
 	if len(result.checkpointIDs) > 1 {
-		latest, tree, latestRepo, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
+		latest, tree, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
 		if err != nil {
 			// No metadata available — nothing to resume from
 			logging.Warn(logCtx, "resolveLatestCheckpoint failed",
@@ -189,13 +187,12 @@
 			len(result.checkpointIDs), result.commitHash[:7], skipped)
 		checkpointID = latest
 		metadataTree = tree
-		freshRepo = latestRepo
 	}
 
 	// Get metadata branch tree for lookups (reuse from resolveLatestCheckpoint if available)
 	if metadataTree == nil {
 		var treeErr error
-		metadataTree, freshRepo, treeErr = getMetadataTree(ctx)
+		metadataTree, _, treeErr = getMetadataTree(ctx)
 		if treeErr != nil {
 			logging.Warn(logCtx, "getMetadataTree failed, checking remote",
 				slog.String("checkpoint_id", checkpointID.String()),
@@ -238,10 +235,14 @@
 		slog.Any("entries", subtreeEntryNames),
 	)
 
-	// Wrap the checkpoint subtree with on-demand blob fetching.
-	// Use the fresh repo's storer (not the original repo) because a fetch may have
-	// created new packfiles that the original repo's storer doesn't know about.
-	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+	// Wrap the checkpoint subtree with on-demand blob fetching and prefetch all
+	// blobs in a single batch to avoid per-blob network round-trips.
+	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+	if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+		logging.Debug(logCtx, "PrefetchBlobs failed, will fetch on demand",
+			slog.String("error", prefetchErr.Error()),
+		)
+	}
 
 	// Read metadata from checkpoint subtree (paths are relative to checkpoint root)
 	metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
@@ -264,13 +265,13 @@
 }
 
 // resolveLatestCheckpoint reads metadata for each checkpoint ID and returns
-// the one with the latest CreatedAt, along with the metadata tree and fresh
-// repo for reuse. It tries the local metadata branch first, then fetches from
-// remote, then falls back to the remote tree directly.
-func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, *git.Repository, error) {
-	metadataTree, freshRepo, err := getMetadataTree(ctx)
+// the one with the latest CreatedAt, along with the metadata tree for reuse.
+// It tries the local metadata branch first, then fetches from remote, then
+// falls back to the remote tree directly.
+func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, error) {
+	metadataTree, _, err := getMetadataTree(ctx)
 	if err != nil {
-		return id.EmptyCheckpointID, nil, nil, err
+		return id.EmptyCheckpointID, nil, err
 	}
 
 	infoMap := make(map[id.CheckpointID]strategy.CheckpointInfo, len(checkpointIDs))
@@ -284,7 +285,13 @@
 			)
 			continue
 		}
-		ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+		ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+		if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+			logging.Debug(ctx, "resolveLatestCheckpoint: PrefetchBlobs failed, will fetch on demand",
+				slog.String("checkpoint_id", cpID.String()),
+				slog.String("error", prefetchErr.Error()),
+			)
+		}
 		metadata, metaErr := strategy.ReadCheckpointMetadataFromSubtree(ft, cpID.Path())
 		if metaErr != nil {
 			logging.Debug(ctx, "resolveLatestCheckpoint: checkpoint metadata read failed",
@@ -297,9 +304,9 @@
 	}
 	latest, found := strategy.ResolveLatestCheckpointFromMap(checkpointIDs, infoMap)
 	if !found {
-		return id.EmptyCheckpointID, nil, nil, errors.New("no checkpoint metadata found")
+		return id.EmptyCheckpointID, nil, errors.New("no checkpoint metadata found")
 	}
-	return latest.CheckpointID, metadataTree, freshRepo, nil
+	return latest.CheckpointID, metadataTree, nil
 }
 
 // getMetadataTree returns the metadata branch tree and a fresh repo handle.
@@ -590,7 +597,12 @@
 		fmt.Fprintf(os.Stderr, "  git fetch origin entire/checkpoints/v1:entire/checkpoints/v1\n")
 		return nil //nolint:nilerr // Informational message, not a fatal error
 	}
-	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, repo.Storer, FetchBlobsByHash)
+	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+	if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+		logging.Debug(logCtx, "checkRemoteMetadata: PrefetchBlobs failed, will fetch on demand",
+			slog.String("error", prefetchErr.Error()),
+		)
+	}
 	metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
 	if err != nil {
 		fmt.Fprintf(os.Stderr, "Checkpoint '%s' found in commit but its metadata could not be read from entire/checkpoints/v1.\n", checkpointID)

diff --git a/cmd/entire/cli/resume_test.go b/cmd/entire/cli/resume_test.go
--- a/cmd/entire/cli/resume_test.go
+++ b/cmd/entire/cli/resume_test.go
@@ -459,7 +459,7 @@
 	// Pass checkpoint IDs in reverse chronological order (newest first),
 	// simulating git CLI squash merge trailer order.
 	reverseOrderIDs := []id.CheckpointID{cpID3, cpID2, cpID1}
-	latest, tree, _, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
+	latest, tree, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
 	if err != nil {
 		t.Fatalf("resolveLatestCheckpoint() error = %v", err)
 	}
@@ -476,7 +476,7 @@
 
 	// Also verify with chronological order
 	chronologicalIDs := []id.CheckpointID{cpID1, cpID2, cpID3}
-	latest2, _, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
+	latest2, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
 	if err != nil {
 		t.Fatalf("resolveLatestCheckpoint() error = %v", err)
 	}


Comment @cursor review or bugbot run to trigger another review on this PR

type FetchingTree struct {
	inner    *object.Tree
	ctx      context.Context
	resolver *BlobResolver


Unused resolver field allocated in every FetchingTree

Low Severity

The resolver field (*BlobResolver) is allocated via NewBlobResolver(s) in every NewFetchingTree call and propagated in Tree(), but no FetchingTree method ever reads it. File() uses t.inner.File() as the existence check and t.fetch for fetching—t.resolver.HasBlob() and t.resolver.ReadBlob() are never called. This is dead code that wastes a heap allocation per FetchingTree instance.

Additional Locations (2)

)

// Fetch the blob from the remote.
if fetchErr := t.fetch(t.ctx, []plumbing.Hash{entry.Hash}); fetchErr != nil {


Per-blob fetching misses intended batch optimization

Medium Severity

FetchingTree.File() calls t.fetch with a single-element slice for every blob miss, triggering one git fetch origin <hash> subprocess per file. During resume, reading root metadata plus N session metadata files generates N+1 separate network round-trips, each spawning a subprocess. FetchBlobsByHash accepts a batch of hashes and CollectTranscriptBlobHashes was built to collect them, but neither is used to pre-fetch blobs before tree reads. The stated "targeted subtree blobs fetch" optimization is built but never wired into the resume flow.


@pjbgf pjbgf enabled auto-merge March 18, 2026 17:10

Labels

performance Performance issues


3 participants