perf: reduce memory and network cost for checkpoint metadata #680
Conversation
Pull request overview
This PR optimizes the entire resume flow and checkpoint metadata handling by reducing memory allocations when decoding checkpoint metadata and by reducing network transfer when resuming from remote-only checkpoint data.
Changes:
- Switch metadata reads to streaming json.Decoder + "lite" structs for ListCheckpoints/ReadCheckpointMetadata.
- Add a "treeless" metadata-branch fetch (--depth=1 --filter=blob:none) and best-effort prefetching of only required transcript blobs.
- Introduce BlobResolver + tree-walk collection of transcript blob hashes (including chunked transcripts) and associated unit tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/strategy/common_helpers_test.go | Adds a preserved “full deserialize” test helper for checkpoint metadata parsing. |
| cmd/entire/cli/strategy/common.go | Refactors checkpoint listing/reading to stream JSON decoding and use minimal structs. |
| cmd/entire/cli/resume.go | Attempts treeless fetch first and adds best-effort transcript blob prefetch before restore. |
| cmd/entire/cli/git_operations.go | Adds treeless metadata fetch and a targeted blob fetch helper. |
| cmd/entire/cli/checkpoint/blob_resolver.go | Adds local-object-store blob existence/read + transcript blob hash collection via tree entries. |
| cmd/entire/cli/checkpoint/blob_resolver_test.go | Adds unit tests for blob resolution and transcript hash collection. |
…data Avoid allocating large unused fields (Summary, InitialAttribution, TokenUsage) when reading checkpoint metadata by introducing minimal sessionMetadataLite and checkpointSummaryLite structs with streaming json.Decoder instead of json.Unmarshal into full types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: f64fee85be5f
Add treeless fetch (--filter=blob:none) for the metadata branch so only commit and tree objects are downloaded initially. Transcript blobs (full.jsonl) are then fetched individually via git fetch-pack using their SHA-1 hashes, skipping any that already exist in the local object store.
- BlobResolver: go-git HasEncodedObject checks for local blob existence
- CollectTranscriptBlobHashes: walks tree entries to find full.jsonl blob hashes without reading blob content (works after treeless fetch)
- FetchMetadataTreeOnly: git CLI fetch with --filter=blob:none
- FetchBlobsByHash: git fetch-pack for specific objects, with full-fetch fallback
- resume.go: ensureTranscriptBlobs wired before RestoreLogsOnly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 48e91bc15f41
Add --depth=1 to the filtered fetch so only the tip commit and its tree objects are downloaded, skipping the entire branch history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 564aea6f7ca5
…ches The single-checkpoint path in resumeFromCurrentBranch used strategy.GetMetadataBranchTree directly, which fails on fresh clones where entire/checkpoints/v1 doesn't exist locally. Now uses getMetadataTree() with the full fallback chain (treeless fetch → local → full fetch → remote tree), matching the multi-checkpoint path. Also fixes BranchExistsOnRemote to query the actual remote via git ls-remote when the local remote-tracking ref is missing, so it works after a fresh clone without a prior fetch. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Paulo Gomes <paulo@entire.io>
…e fallback - resume_test: remove remote ref creation that now succeeds via getMetadataTree's remote tree fallback (added in 649ab7c), update assertion to expect nil when no metadata is available anywhere - hooks_test: add clearGlobalHooksPath helper to override global core.hooksPath in test repos so git resolves to .git/hooks - integration_test/testenv: set local core.hooksPath in InitRepo to override global setting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: cab75de47439
…t reads Replace direct getSessionsBranchTree() calls with getFetchingTree() which transparently fetches missing blobs when accessing treeless-cloned metadata. Updates resume path to use the same fallback chain (treeless fetch → local → full fetch → remote tree) for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 5eec47c03756
…CollectTranscriptBlobHashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: e5eaf035e386
The comment referenced "err from line 85" which was from a prior revision. The actual suppressed error is treeErr from the session enumeration loop, used as a break condition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c69f1bbf8b6a
Previously, when fetch-pack failed and the fallback to full metadata fetch succeeded, the fetch-pack error was silently discarded. Now logs the error at debug level for traceability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: d0b47f037e1a
fetch-pack bypasses git's credential helpers, so HTTPS repos with auth tokens would always fail the initial attempt, adding latency before the fallback. Switch to "git fetch origin <hash>" which goes through the normal transport and credential helper chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: a763734f96af
… reading ReadCheckpointMetadata and ReadCheckpointMetadataFromSubtree had ~60 lines of identical decoding logic. Extract a shared decodeCheckpointInfo helper that takes a normalizePath function to handle the only real difference: how session metadata paths are transformed for tree lookups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: a81cfc203db5
Add a note that opening a repo is cheap (reads .git/config + refs) so the multiple opens across fallback attempts are acceptable for correctness vs go-git's stale packfile index cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 22c23e450ee3
…Branch The control flow ensures metadataTree and freshRepo are set together by either resolveLatestCheckpoint or getMetadataTree, but the nil initialization makes this non-obvious. Add a comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 01a34baed435
git fetch output may contain credential helper prompts, remote URLs with embedded tokens, or other auth-related stderr. Drop the raw output from both the debug log and the returned error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 511569c9de2a
Review Feedback
PR Summary (Medium Risk)
Overview: Adds treeless fetch + on-demand blob reads for checkpoints. Tests/benchmarks expanded: adds unit tests for blob resolution, adjusts resume tests for new behavior, adds benchmark coverage for metadata decoding/listing, adds E2E coverage for resuming from a cloned repo without a local metadata branch, and hardens tests against global core.hooksPath.
Written by Cursor Bugbot for commit db99d03.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Unused resolver field allocated in every FetchingTree
  - Removed the unused resolver field from FetchingTree and the storer parameter from NewFetchingTree, eliminating the dead BlobResolver allocation on every FetchingTree creation.
- ✅ Fixed: Per-blob fetching misses intended batch optimization
  - Added PrefetchBlobs() method that recursively collects all blob hashes from the tree and batch-fetches them in a single call, and wired it into all three resume flow FetchingTree creation sites to replace per-blob network round-trips.
Preview (899ccc0906)
diff --git a/cmd/entire/cli/checkpoint/committed.go b/cmd/entire/cli/checkpoint/committed.go
--- a/cmd/entire/cli/checkpoint/committed.go
+++ b/cmd/entire/cli/checkpoint/committed.go
@@ -1303,7 +1303,7 @@
if err != nil {
return nil, err
}
- return NewFetchingTree(ctx, tree, s.repo.Storer, s.blobFetcher), nil
+ return NewFetchingTree(ctx, tree, s.blobFetcher), nil
}
// getSessionsBranchTree returns the tree object for the entire/checkpoints/v1 branch.
diff --git a/cmd/entire/cli/checkpoint/fetching_tree.go b/cmd/entire/cli/checkpoint/fetching_tree.go
--- a/cmd/entire/cli/checkpoint/fetching_tree.go
+++ b/cmd/entire/cli/checkpoint/fetching_tree.go
@@ -10,8 +10,8 @@
"github.com/entireio/cli/cmd/entire/cli/logging"
"github.com/go-git/go-git/v6/plumbing"
+ "github.com/go-git/go-git/v6/plumbing/filemode"
"github.com/go-git/go-git/v6/plumbing/object"
- "github.com/go-git/go-git/v6/plumbing/storer"
)
// BlobFetchFunc fetches missing blob objects by hash from a remote.
@@ -28,22 +28,18 @@
// visible to go-git's storer. As a fallback, File() reads the blob via
// "git cat-file" which always sees the current on-disk object store.
type FetchingTree struct {
- inner *object.Tree
- ctx context.Context
- resolver *BlobResolver
- fetch BlobFetchFunc
+ inner *object.Tree
+ ctx context.Context
+ fetch BlobFetchFunc
}
// NewFetchingTree wraps a git tree with on-demand blob fetching.
-// The storer is used to check if blobs exist locally, and fetch is called
-// to download any that are missing. If fetch is nil, File() behaves
-// identically to the underlying tree.
-func NewFetchingTree(ctx context.Context, tree *object.Tree, s storer.EncodedObjectStorer, fetch BlobFetchFunc) *FetchingTree {
+// If fetch is nil, File() behaves identically to the underlying tree.
+func NewFetchingTree(ctx context.Context, tree *object.Tree, fetch BlobFetchFunc) *FetchingTree {
return &FetchingTree{
- inner: tree,
- ctx: ctx,
- resolver: NewBlobResolver(s),
- fetch: fetch,
+ inner: tree,
+ ctx: ctx,
+ fetch: fetch,
}
}
@@ -156,13 +152,48 @@
return nil, fmt.Errorf("tree %s: %w", path, err)
}
return &FetchingTree{
- inner: subtree,
- ctx: t.ctx,
- resolver: t.resolver,
- fetch: t.fetch,
+ inner: subtree,
+ ctx: t.ctx,
+ fetch: t.fetch,
}, nil
}
+// PrefetchBlobs recursively walks the tree entries and fetches all blob hashes
+// in a single batch call. This avoids per-blob network round-trips when multiple
+// File() calls will be made against a tree from a treeless fetch.
+// Errors are non-fatal: on-demand fetching in File() serves as a fallback.
+func (t *FetchingTree) PrefetchBlobs() error {
+ if t.fetch == nil {
+ return nil
+ }
+ hashes := collectBlobHashes(t.inner)
+ if len(hashes) == 0 {
+ return nil
+ }
+ logging.Debug(t.ctx, "FetchingTree.PrefetchBlobs: batch fetching blobs",
+ slog.Int("blob_count", len(hashes)),
+ )
+ return t.fetch(t.ctx, hashes)
+}
+
+// collectBlobHashes recursively walks a git tree and returns all blob hashes.
+// Only tree objects are resolved; blobs are collected by hash without reading them.
+// This works after a treeless fetch where tree objects are local but blobs are not.
+func collectBlobHashes(tree *object.Tree) []plumbing.Hash {
+ var hashes []plumbing.Hash
+ for _, entry := range tree.Entries {
+ if entry.Mode == filemode.Dir {
+ subtree, err := tree.Tree(entry.Name)
+ if err == nil {
+ hashes = append(hashes, collectBlobHashes(subtree)...)
+ }
+ } else {
+ hashes = append(hashes, entry.Hash)
+ }
+ }
+ return hashes
+}
+
// RawEntries returns the direct tree entries (no blob reads needed).
func (t *FetchingTree) RawEntries() []object.TreeEntry {
return t.inner.Entries
diff --git a/cmd/entire/cli/resume.go b/cmd/entire/cli/resume.go
--- a/cmd/entire/cli/resume.go
+++ b/cmd/entire/cli/resume.go
@@ -167,13 +167,11 @@
// Multiple checkpoints (squash merge): resolve latest by CreatedAt timestamp.
// resolveLatestCheckpoint also returns the metadata tree so we can reuse it
// for the ReadCheckpointMetadata call below without a redundant lookup.
- // Both metadataTree and freshRepo are set together — either by resolveLatestCheckpoint
- // (multi-checkpoint path) or by getMetadataTree (single-checkpoint path below).
- // freshRepo is always non-nil before it reaches NewFetchingTree.
+ // metadataTree is set by either resolveLatestCheckpoint (multi-checkpoint path)
+ // or getMetadataTree (single-checkpoint path below).
var metadataTree *object.Tree
- var freshRepo *git.Repository
if len(result.checkpointIDs) > 1 {
- latest, tree, latestRepo, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
+ latest, tree, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
if err != nil {
// No metadata available — nothing to resume from
logging.Warn(logCtx, "resolveLatestCheckpoint failed",
@@ -189,13 +187,12 @@
len(result.checkpointIDs), result.commitHash[:7], skipped)
checkpointID = latest
metadataTree = tree
- freshRepo = latestRepo
}
// Get metadata branch tree for lookups (reuse from resolveLatestCheckpoint if available)
if metadataTree == nil {
var treeErr error
- metadataTree, freshRepo, treeErr = getMetadataTree(ctx)
+ metadataTree, _, treeErr = getMetadataTree(ctx)
if treeErr != nil {
logging.Warn(logCtx, "getMetadataTree failed, checking remote",
slog.String("checkpoint_id", checkpointID.String()),
@@ -238,10 +235,14 @@
slog.Any("entries", subtreeEntryNames),
)
- // Wrap the checkpoint subtree with on-demand blob fetching.
- // Use the fresh repo's storer (not the original repo) because a fetch may have
- // created new packfiles that the original repo's storer doesn't know about.
- ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+ // Wrap the checkpoint subtree with on-demand blob fetching and prefetch all
+ // blobs in a single batch to avoid per-blob network round-trips.
+ ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+ if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+ logging.Debug(logCtx, "PrefetchBlobs failed, will fetch on demand",
+ slog.String("error", prefetchErr.Error()),
+ )
+ }
// Read metadata from checkpoint subtree (paths are relative to checkpoint root)
metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
@@ -264,13 +265,13 @@
}
// resolveLatestCheckpoint reads metadata for each checkpoint ID and returns
-// the one with the latest CreatedAt, along with the metadata tree and fresh
-// repo for reuse. It tries the local metadata branch first, then fetches from
-// remote, then falls back to the remote tree directly.
-func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, *git.Repository, error) {
- metadataTree, freshRepo, err := getMetadataTree(ctx)
+// the one with the latest CreatedAt, along with the metadata tree for reuse.
+// It tries the local metadata branch first, then fetches from remote, then
+// falls back to the remote tree directly.
+func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, error) {
+ metadataTree, _, err := getMetadataTree(ctx)
if err != nil {
- return id.EmptyCheckpointID, nil, nil, err
+ return id.EmptyCheckpointID, nil, err
}
infoMap := make(map[id.CheckpointID]strategy.CheckpointInfo, len(checkpointIDs))
@@ -284,7 +285,13 @@
)
continue
}
- ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+ ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+ if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+ logging.Debug(ctx, "resolveLatestCheckpoint: PrefetchBlobs failed, will fetch on demand",
+ slog.String("checkpoint_id", cpID.String()),
+ slog.String("error", prefetchErr.Error()),
+ )
+ }
metadata, metaErr := strategy.ReadCheckpointMetadataFromSubtree(ft, cpID.Path())
if metaErr != nil {
logging.Debug(ctx, "resolveLatestCheckpoint: checkpoint metadata read failed",
@@ -297,9 +304,9 @@
}
latest, found := strategy.ResolveLatestCheckpointFromMap(checkpointIDs, infoMap)
if !found {
- return id.EmptyCheckpointID, nil, nil, errors.New("no checkpoint metadata found")
+ return id.EmptyCheckpointID, nil, errors.New("no checkpoint metadata found")
}
- return latest.CheckpointID, metadataTree, freshRepo, nil
+ return latest.CheckpointID, metadataTree, nil
}
// getMetadataTree returns the metadata branch tree and a fresh repo handle.
@@ -590,7 +597,12 @@
fmt.Fprintf(os.Stderr, " git fetch origin entire/checkpoints/v1:entire/checkpoints/v1\n")
return nil //nolint:nilerr // Informational message, not a fatal error
}
- ft := checkpoint.NewFetchingTree(ctx, cpSubtree, repo.Storer, FetchBlobsByHash)
+ ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+ if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+ logging.Debug(logCtx, "checkRemoteMetadata: PrefetchBlobs failed, will fetch on demand",
+ slog.String("error", prefetchErr.Error()),
+ )
+ }
metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
if err != nil {
fmt.Fprintf(os.Stderr, "Checkpoint '%s' found in commit but its metadata could not be read from entire/checkpoints/v1.\n", checkpointID)
diff --git a/cmd/entire/cli/resume_test.go b/cmd/entire/cli/resume_test.go
--- a/cmd/entire/cli/resume_test.go
+++ b/cmd/entire/cli/resume_test.go
@@ -459,7 +459,7 @@
// Pass checkpoint IDs in reverse chronological order (newest first),
// simulating git CLI squash merge trailer order.
reverseOrderIDs := []id.CheckpointID{cpID3, cpID2, cpID1}
- latest, tree, _, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
+ latest, tree, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
if err != nil {
t.Fatalf("resolveLatestCheckpoint() error = %v", err)
}
@@ -476,7 +476,7 @@
// Also verify with chronological order
chronologicalIDs := []id.CheckpointID{cpID1, cpID2, cpID3}
- latest2, _, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
+ latest2, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
if err != nil {
t.Fatalf("resolveLatestCheckpoint() error = %v", err)
}
type FetchingTree struct {
	inner    *object.Tree
	ctx      context.Context
	resolver *BlobResolver
Unused resolver field allocated in every FetchingTree
Low Severity
The resolver field (*BlobResolver) is allocated via NewBlobResolver(s) in every NewFetchingTree call and propagated in Tree(), but no FetchingTree method ever reads it. File() uses t.inner.File() as the existence check and t.fetch for fetching—t.resolver.HasBlob() and t.resolver.ReadBlob() are never called. This is dead code that wastes a heap allocation per FetchingTree instance.
Additional Locations (2)
	)

	// Fetch the blob from the remote.
	if fetchErr := t.fetch(t.ctx, []plumbing.Hash{entry.Hash}); fetchErr != nil {
Per-blob fetching misses intended batch optimization
Medium Severity
FetchingTree.File() calls t.fetch with a single-element slice for every blob miss, triggering one git fetch origin <hash> subprocess per file. During resume, reading root metadata plus N session metadata files generates N+1 separate network round-trips, each spawning a subprocess. FetchBlobsByHash accepts a batch of hashes and CollectTranscriptBlobHashes was built to collect them, but neither is used to pre-fetch blobs before tree reads. The stated "targeted subtree blobs fetch" optimization is built but never wired into the resume flow.



Summary
This PR focuses on two key optimisations:
1. Reduce memory allocations when reading checkpoint metadata
- Streaming json.Decoder replaces json.Unmarshal for reading metadata.json files, avoiding loading entire JSON documents into memory as intermediate []byte
- Lite structs (sessionMetadataLite, checkpointSummaryLite) replace full CommittedMetadata/CheckpointSummary when only a few fields are needed, avoiding allocation of large unused nested objects (Summary, InitialAttribution, TokenUsage, etc.)
- Shared helpers (decodeSummaryLiteFromTree, decodeSessionMetadataLite) deduplicate the metadata reading logic between ListCheckpoints and ReadCheckpointMetadata
- A treeless fetch (--depth=1 --filter=blob:none) downloads only the tip commit and tree objects for the metadata branch: no blobs, no history
- BlobResolver uses go-git's HasEncodedObject to check the local object store (loose objects + packfile indices) before fetching anything
- git fetch-pack requests all blobs within the checkpoint tree, so that a consistent view is present on disk, automatically falling back to a full branch fetch if the server doesn't support allow-reachable-sha1-in-want
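The skip-already-local step can be sketched as a filter over the collected hashes. missingHashes is an illustrative helper; hasLocal stands in for BlobResolver.HasBlob, which the PR backs with go-git's HasEncodedObject (a nil error means the object is present).

```go
package main

import "fmt"

// missingHashes returns only the hashes not already in the local object
// store, so the subsequent fetch requests just the truly missing blobs.
func missingHashes(all []string, hasLocal func(string) bool) []string {
	var missing []string
	for _, h := range all {
		if !hasLocal(h) {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	local := map[string]bool{"aaa": true} // blobs already on disk
	fmt.Println(missingHashes([]string{"aaa", "bbb", "ccc"}, func(h string) bool { return local[h] }))
	// [bbb ccc]
}
```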
Before:
git fetch origin entire/checkpoints/v1 → downloads the entire branch (all commits, trees, and every blob across all checkpoints)
After:
- git fetch --depth=1 --filter=blob:none → tip commit + trees only
- HasEncodedObject → skip blobs already in the local store
- git fetch-pack <url> <hash>... → fetch only the missing blobs

This means that a user who has made a single-branch clone of a repository can run entire resume <branch> with efficient network and local storage consumption: behind the scenes, entire fetches only the blobs contained in the checkpoint subtree rather than the entire branch.
Test plan
- BlobResolver unit tests: HasBlob present/missing, ReadBlob present/missing
- CollectTranscriptBlobHashes: single session, multi-session, chunked, nonexistent checkpoint
- ListCheckpoints and ReadCheckpointMetadata tests still pass with lite structs
- mise run lint clean
- entire resume on a branch with remote-only checkpoint data