perf: reduce memory and network cost for checkpoint metadata by pjbgf · Pull Request #680 · entireio/cli

pjbgf · 2026-03-11T12:19:25Z

Summary

This PR focuses on two key optimisations:

1. Reduce memory allocations when reading checkpoint metadata

Streaming json.Decoder replaces json.Unmarshal for reading metadata.json files, avoiding loading entire JSON documents into memory as intermediate []byte
Lite structs (sessionMetadataLite, checkpointSummaryLite) replace full CommittedMetadata/CheckpointSummary when only a few fields are needed — avoids allocating large unused nested objects (Summary, InitialAttribution, TokenUsage, etc.)
Shared helpers (decodeSummaryLiteFromTree, decodeSessionMetadataLite) deduplicate the metadata reading logic between ListCheckpoints and ReadCheckpointMetadata

2. Reduce network cost for pulling checkpoint data from remote

Shallow treeless fetch (--depth=1 --filter=blob:none) downloads only the tip commit and tree objects for the metadata branch — no blobs, no history
On-demand blob resolution via BlobResolver uses go-git's HasEncodedObject to check the local object store (loose objects + packfile indices) before fetching anything
Targeted subtree blobs fetch via git fetch-pack requests all blobs within the checkpoint tree, so that a consistent view is present in disk. Automatically falling back to a full branch fetch if the server doesn't support allow-reachable-sha1-in-want

Resume flow (before → after)

Before: git fetch origin entire/checkpoints/v1 → downloads entire branch (all commits, trees, and every blob across all
checkpoints)

After:

git fetch --depth=1 --filter=blob:none → tip commit + trees only
go-git tree walk → find all blob hashes within the checkpoint sub-tree
HasEncodedObject → skip blobs already in local store
git fetch-pack <url> <hash>... → fetch only the missing blobs

This translates into an user upon doing a single branch clone of a repository being able to do a entire resume <branch> with a rather efficient network and local storage data consumption. Behind the scenes entire will only fetch the blobs contained in the checkpoint subtree, as opposed to the entire branch,

Test plan

BlobResolver unit tests: HasBlob present/missing, ReadBlob present/missing
CollectTranscriptBlobHashes: single session, multi-session, chunked, nonexistent checkpoint
Existing ListCheckpoints and ReadCheckpointMetadata tests still pass with lite structs
Full mise run lint clean
Manual E2E: entire resume on a branch with remote-only checkpoint data

Copilot

Pull request overview

This PR optimizes entire resume and checkpoint metadata handling by reducing memory allocations when decoding checkpoint metadata and by reducing network transfer when resuming from remote-only checkpoint data.

Changes:

Switch metadata reads to streaming json.Decoder + “lite” structs for ListCheckpoints / ReadCheckpointMetadata.
Add a “treeless” metadata-branch fetch (--depth=1 --filter=blob:none) and best-effort prefetching of only required transcript blobs.
Introduce BlobResolver + tree-walk collection of transcript blob hashes (including chunked transcripts) and associated unit tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
cmd/entire/cli/strategy/common_helpers_test.go	Adds a preserved “full deserialize” test helper for checkpoint metadata parsing.
cmd/entire/cli/strategy/common.go	Refactors checkpoint listing/reading to stream JSON decoding and use minimal structs.
cmd/entire/cli/resume.go	Attempts treeless fetch first and adds best-effort transcript blob prefetch before restore.
cmd/entire/cli/git_operations.go	Adds treeless metadata fetch and a targeted blob fetch helper.
cmd/entire/cli/checkpoint/blob_resolver.go	Adds local-object-store blob existence/read + transcript blob hash collection via tree entries.
cmd/entire/cli/checkpoint/blob_resolver_test.go	Adds unit tests for blob resolution and transcript hash collection.

cmd/entire/cli/resume.go

cmd/entire/cli/checkpoint/blob_resolver.go

cmd/entire/cli/git_operations.go

…data Avoid allocating large unused fields (Summary, InitialAttribution, TokenUsage) when reading checkpoint metadata by introducing minimal sessionMetadataLite and checkpointSummaryLite structs with streaming json.Decoder instead of json.Unmarshal into full types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: f64fee85be5f

Add treeless fetch (--filter=blob:none) for the metadata branch so only commit and tree objects are downloaded initially. Transcript blobs (full.jsonl) are then fetched individually via git fetch-pack using their SHA-1 hashes, skipping any that already exist in the local object store. - BlobResolver: go-git HasEncodedObject checks for local blob existence - CollectTranscriptBlobHashes: walks tree entries to find full.jsonl blob hashes without reading blob content (works after treeless fetch) - FetchMetadataTreeOnly: git CLI fetch with --filter=blob:none - FetchBlobsByHash: git fetch-pack for specific objects, with full-fetch fallback - resume.go: ensureTranscriptBlobs wired before RestoreLogsOnly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 48e91bc15f41

Add --depth=1 to the filtered fetch so only the tip commit and its tree objects are downloaded, skipping the entire branch history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 564aea6f7ca5

…ches The single-checkpoint path in resumeFromCurrentBranch used strategy.GetMetadataBranchTree directly, which fails on fresh clones where entire/checkpoints/v1 doesn't exist locally. Now uses getMetadataTree() with the full fallback chain (treeless fetch → local → full fetch → remote tree), matching the multi-checkpoint path. Also fixes BranchExistsOnRemote to query the actual remote via git ls-remote when the local remote-tracking ref is missing, so works after a fresh clone without a prior fetch. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Paulo Gomes <paulo@entire.io>

…e fallback - resume_test: remove remote ref creation that now succeeds via getMetadataTree's remote tree fallback (added in 649ab7c), update assertion to expect nil when no metadata is available anywhere - hooks_test: add clearGlobalHooksPath helper to override global core.hooksPath in test repos so git resolves to .git/hooks - integration_test/testenv: set local core.hooksPath in InitRepo to override global setting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: cab75de47439

…t reads Replace direct getSessionsBranchTree() calls with getFetchingTree() which transparently fetches missing blobs when accessing treeless-cloned metadata. Updates resume path to use the same fallback chain (treeless fetch → local → full fetch → remote tree) for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 5eec47c03756

…CollectTranscriptBlobHashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: e5eaf035e386

The comment referenced "err from line 85" which was from a prior revision. The actual suppressed error is treeErr from the session enumeration loop, used as a break condition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c69f1bbf8b6a

Previously, when fetch-pack failed and the fallback to full metadata fetch succeeded, the fetch-pack error was silently discarded. Now logs the error at debug level for traceability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: d0b47f037e1a

fetch-pack bypasses git's credential helpers, so HTTPS repos with auth tokens would always fail the initial attempt, adding latency before the fallback. Switch to "git fetch origin <hash>" which goes through the normal transport and credential helper chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: a763734f96af

… reading ReadCheckpointMetadata and ReadCheckpointMetadataFromSubtree had ~60 lines of identical decoding logic. Extract a shared decodeCheckpointInfo helper that takes a normalizePath function to handle the only real difference: how session metadata paths are transformed for tree lookups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: a81cfc203db5

Add a note that opening a repo is cheap (reads .git/config + refs) so the multiple opens across fallback attempts are acceptable for correctness vs go-git's stale packfile index cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 22c23e450ee3

…Branch The control flow ensures metadataTree and freshRepo are set together by either resolveLatestCheckpoint or getMetadataTree, but the nil initialization makes this non-obvious. Add a comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 01a34baed435

git fetch output may contain credential helper prompts, remote URLs with embedded tokens, or other auth-related stderr. Drop the raw output from both the debug log and the returned error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 511569c9de2a

Review Feedback

cursor · 2026-03-18T15:51:01Z

PR Summary

Medium Risk
Changes resume/checkpoint reading to perform treeless fetches and on-demand blob fetching via git CLI; failures or stale-object edge cases could break resume/read flows, but scope is limited to checkpoint metadata access paths.

Overview
Reduces memory and network cost when resuming/reading checkpoint metadata. Checkpoint metadata decoding now uses streaming json.Decoder with minimal "lite" structs and shared helpers, plus a new ReadCheckpointMetadataFromSubtree path to avoid loading large nested fields.

Adds treeless fetch + on-demand blob reads for checkpoints. Introduces FetchingTree/BlobResolver to fetch missing blobs by hash after git fetch --filter=blob:none, with a git cat-file fallback for go-git storer cache staleness; GitStore/ManualCommitStrategy can now be configured with a blob fetcher.

Updates resume and related git ops. entire resume now attempts a shallow treeless metadata fetch first, opens a fresh repo after fetches to avoid stale indices, wraps checkpoint subtrees with FetchingTree, and improves diagnostics; adds new git helpers (FetchMetadataTreeOnly, FetchBlobsByHash) and strengthens BranchExistsOnRemote via git ls-remote fallback.

Tests/benchmarks expanded. Adds unit tests for blob resolution, adjusts resume tests for new behavior, adds benchmark coverage for metadata decoding/listing, adds E2E coverage for resuming from a cloned repo without a local metadata branch, and hardens tests against global core.hooksPath.

^{Written by Cursor Bugbot for commit db99d03. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Unused resolver field allocated in every FetchingTree
- Removed the unused resolver field from FetchingTree and the storer parameter from NewFetchingTree, eliminating the dead BlobResolver allocation on every FetchingTree creation.
✅ Fixed: Per-blob fetching misses intended batch optimization
- Added PrefetchBlobs() method that recursively collects all blob hashes from the tree and batch-fetches them in a single call, and wired it into all three resume flow FetchingTree creation sites to replace per-blob network round-trips.

Or push these changes by commenting:

@cursor push 899ccc0906

Preview (899ccc0906)

diff --git a/cmd/entire/cli/checkpoint/committed.go b/cmd/entire/cli/checkpoint/committed.go
--- a/cmd/entire/cli/checkpoint/committed.go
+++ b/cmd/entire/cli/checkpoint/committed.go
@@ -1303,7 +1303,7 @@
 	if err != nil {
 		return nil, err
 	}
-	return NewFetchingTree(ctx, tree, s.repo.Storer, s.blobFetcher), nil
+	return NewFetchingTree(ctx, tree, s.blobFetcher), nil
 }
 
 // getSessionsBranchTree returns the tree object for the entire/checkpoints/v1 branch.

diff --git a/cmd/entire/cli/checkpoint/fetching_tree.go b/cmd/entire/cli/checkpoint/fetching_tree.go
--- a/cmd/entire/cli/checkpoint/fetching_tree.go
+++ b/cmd/entire/cli/checkpoint/fetching_tree.go
@@ -10,8 +10,8 @@
 	"github.com/entireio/cli/cmd/entire/cli/logging"
 
 	"github.com/go-git/go-git/v6/plumbing"
+	"github.com/go-git/go-git/v6/plumbing/filemode"
 	"github.com/go-git/go-git/v6/plumbing/object"
-	"github.com/go-git/go-git/v6/plumbing/storer"
 )
 
 // BlobFetchFunc fetches missing blob objects by hash from a remote.
@@ -28,22 +28,18 @@
 // visible to go-git's storer. As a fallback, File() reads the blob via
 // "git cat-file" which always sees the current on-disk object store.
 type FetchingTree struct {
-	inner    *object.Tree
-	ctx      context.Context
-	resolver *BlobResolver
-	fetch    BlobFetchFunc
+	inner *object.Tree
+	ctx   context.Context
+	fetch BlobFetchFunc
 }
 
 // NewFetchingTree wraps a git tree with on-demand blob fetching.
-// The storer is used to check if blobs exist locally, and fetch is called
-// to download any that are missing. If fetch is nil, File() behaves
-// identically to the underlying tree.
-func NewFetchingTree(ctx context.Context, tree *object.Tree, s storer.EncodedObjectStorer, fetch BlobFetchFunc) *FetchingTree {
+// If fetch is nil, File() behaves identically to the underlying tree.
+func NewFetchingTree(ctx context.Context, tree *object.Tree, fetch BlobFetchFunc) *FetchingTree {
 	return &FetchingTree{
-		inner:    tree,
-		ctx:      ctx,
-		resolver: NewBlobResolver(s),
-		fetch:    fetch,
+		inner: tree,
+		ctx:   ctx,
+		fetch: fetch,
 	}
 }
 
@@ -156,13 +152,48 @@
 		return nil, fmt.Errorf("tree %s: %w", path, err)
 	}
 	return &FetchingTree{
-		inner:    subtree,
-		ctx:      t.ctx,
-		resolver: t.resolver,
-		fetch:    t.fetch,
+		inner: subtree,
+		ctx:   t.ctx,
+		fetch: t.fetch,
 	}, nil
 }
 
+// PrefetchBlobs recursively walks the tree entries and fetches all blob hashes
+// in a single batch call. This avoids per-blob network round-trips when multiple
+// File() calls will be made against a tree from a treeless fetch.
+// Errors are non-fatal: on-demand fetching in File() serves as a fallback.
+func (t *FetchingTree) PrefetchBlobs() error {
+	if t.fetch == nil {
+		return nil
+	}
+	hashes := collectBlobHashes(t.inner)
+	if len(hashes) == 0 {
+		return nil
+	}
+	logging.Debug(t.ctx, "FetchingTree.PrefetchBlobs: batch fetching blobs",
+		slog.Int("blob_count", len(hashes)),
+	)
+	return t.fetch(t.ctx, hashes)
+}
+
+// collectBlobHashes recursively walks a git tree and returns all blob hashes.
+// Only tree objects are resolved; blobs are collected by hash without reading them.
+// This works after a treeless fetch where tree objects are local but blobs are not.
+func collectBlobHashes(tree *object.Tree) []plumbing.Hash {
+	var hashes []plumbing.Hash
+	for _, entry := range tree.Entries {
+		if entry.Mode == filemode.Dir {
+			subtree, err := tree.Tree(entry.Name)
+			if err == nil {
+				hashes = append(hashes, collectBlobHashes(subtree)...)
+			}
+		} else {
+			hashes = append(hashes, entry.Hash)
+		}
+	}
+	return hashes
+}
+
 // RawEntries returns the direct tree entries (no blob reads needed).
 func (t *FetchingTree) RawEntries() []object.TreeEntry {
 	return t.inner.Entries

diff --git a/cmd/entire/cli/resume.go b/cmd/entire/cli/resume.go
--- a/cmd/entire/cli/resume.go
+++ b/cmd/entire/cli/resume.go
@@ -167,13 +167,11 @@
 	// Multiple checkpoints (squash merge): resolve latest by CreatedAt timestamp.
 	// resolveLatestCheckpoint also returns the metadata tree so we can reuse it
 	// for the ReadCheckpointMetadata call below without a redundant lookup.
-	// Both metadataTree and freshRepo are set together — either by resolveLatestCheckpoint
-	// (multi-checkpoint path) or by getMetadataTree (single-checkpoint path below).
-	// freshRepo is always non-nil before it reaches NewFetchingTree.
+	// metadataTree is set by either resolveLatestCheckpoint (multi-checkpoint path)
+	// or getMetadataTree (single-checkpoint path below).
 	var metadataTree *object.Tree
-	var freshRepo *git.Repository
 	if len(result.checkpointIDs) > 1 {
-		latest, tree, latestRepo, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
+		latest, tree, err := resolveLatestCheckpoint(ctx, result.checkpointIDs)
 		if err != nil {
 			// No metadata available — nothing to resume from
 			logging.Warn(logCtx, "resolveLatestCheckpoint failed",
@@ -189,13 +187,12 @@
 			len(result.checkpointIDs), result.commitHash[:7], skipped)
 		checkpointID = latest
 		metadataTree = tree
-		freshRepo = latestRepo
 	}
 
 	// Get metadata branch tree for lookups (reuse from resolveLatestCheckpoint if available)
 	if metadataTree == nil {
 		var treeErr error
-		metadataTree, freshRepo, treeErr = getMetadataTree(ctx)
+		metadataTree, _, treeErr = getMetadataTree(ctx)
 		if treeErr != nil {
 			logging.Warn(logCtx, "getMetadataTree failed, checking remote",
 				slog.String("checkpoint_id", checkpointID.String()),
@@ -238,10 +235,14 @@
 		slog.Any("entries", subtreeEntryNames),
 	)
 
-	// Wrap the checkpoint subtree with on-demand blob fetching.
-	// Use the fresh repo's storer (not the original repo) because a fetch may have
-	// created new packfiles that the original repo's storer doesn't know about.
-	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+	// Wrap the checkpoint subtree with on-demand blob fetching and prefetch all
+	// blobs in a single batch to avoid per-blob network round-trips.
+	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+	if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+		logging.Debug(logCtx, "PrefetchBlobs failed, will fetch on demand",
+			slog.String("error", prefetchErr.Error()),
+		)
+	}
 
 	// Read metadata from checkpoint subtree (paths are relative to checkpoint root)
 	metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
@@ -264,13 +265,13 @@
 }
 
 // resolveLatestCheckpoint reads metadata for each checkpoint ID and returns
-// the one with the latest CreatedAt, along with the metadata tree and fresh
-// repo for reuse. It tries the local metadata branch first, then fetches from
-// remote, then falls back to the remote tree directly.
-func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, *git.Repository, error) {
-	metadataTree, freshRepo, err := getMetadataTree(ctx)
+// the one with the latest CreatedAt, along with the metadata tree for reuse.
+// It tries the local metadata branch first, then fetches from remote, then
+// falls back to the remote tree directly.
+func resolveLatestCheckpoint(ctx context.Context, checkpointIDs []id.CheckpointID) (id.CheckpointID, *object.Tree, error) {
+	metadataTree, _, err := getMetadataTree(ctx)
 	if err != nil {
-		return id.EmptyCheckpointID, nil, nil, err
+		return id.EmptyCheckpointID, nil, err
 	}
 
 	infoMap := make(map[id.CheckpointID]strategy.CheckpointInfo, len(checkpointIDs))
@@ -284,7 +285,13 @@
 			)
 			continue
 		}
-		ft := checkpoint.NewFetchingTree(ctx, cpSubtree, freshRepo.Storer, FetchBlobsByHash)
+		ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+		if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+			logging.Debug(ctx, "resolveLatestCheckpoint: PrefetchBlobs failed, will fetch on demand",
+				slog.String("checkpoint_id", cpID.String()),
+				slog.String("error", prefetchErr.Error()),
+			)
+		}
 		metadata, metaErr := strategy.ReadCheckpointMetadataFromSubtree(ft, cpID.Path())
 		if metaErr != nil {
 			logging.Debug(ctx, "resolveLatestCheckpoint: checkpoint metadata read failed",
@@ -297,9 +304,9 @@
 	}
 	latest, found := strategy.ResolveLatestCheckpointFromMap(checkpointIDs, infoMap)
 	if !found {
-		return id.EmptyCheckpointID, nil, nil, errors.New("no checkpoint metadata found")
+		return id.EmptyCheckpointID, nil, errors.New("no checkpoint metadata found")
 	}
-	return latest.CheckpointID, metadataTree, freshRepo, nil
+	return latest.CheckpointID, metadataTree, nil
 }
 
 // getMetadataTree returns the metadata branch tree and a fresh repo handle.
@@ -590,7 +597,12 @@
 		fmt.Fprintf(os.Stderr, "  git fetch origin entire/checkpoints/v1:entire/checkpoints/v1\n")
 		return nil //nolint:nilerr // Informational message, not a fatal error
 	}
-	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, repo.Storer, FetchBlobsByHash)
+	ft := checkpoint.NewFetchingTree(ctx, cpSubtree, FetchBlobsByHash)
+	if prefetchErr := ft.PrefetchBlobs(); prefetchErr != nil {
+		logging.Debug(logCtx, "checkRemoteMetadata: PrefetchBlobs failed, will fetch on demand",
+			slog.String("error", prefetchErr.Error()),
+		)
+	}
 	metadata, err := strategy.ReadCheckpointMetadataFromSubtree(ft, checkpointID.Path())
 	if err != nil {
 		fmt.Fprintf(os.Stderr, "Checkpoint '%s' found in commit but its metadata could not be read from entire/checkpoints/v1.\n", checkpointID)

diff --git a/cmd/entire/cli/resume_test.go b/cmd/entire/cli/resume_test.go
--- a/cmd/entire/cli/resume_test.go
+++ b/cmd/entire/cli/resume_test.go
@@ -459,7 +459,7 @@
 	// Pass checkpoint IDs in reverse chronological order (newest first),
 	// simulating git CLI squash merge trailer order.
 	reverseOrderIDs := []id.CheckpointID{cpID3, cpID2, cpID1}
-	latest, tree, _, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
+	latest, tree, err := resolveLatestCheckpoint(context.Background(), reverseOrderIDs)
 	if err != nil {
 		t.Fatalf("resolveLatestCheckpoint() error = %v", err)
 	}
@@ -476,7 +476,7 @@
 
 	// Also verify with chronological order
 	chronologicalIDs := []id.CheckpointID{cpID1, cpID2, cpID3}
-	latest2, _, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
+	latest2, _, err := resolveLatestCheckpoint(context.Background(), chronologicalIDs)
 	if err != nil {
 		t.Fatalf("resolveLatestCheckpoint() error = %v", err)
 	}

_{This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

cursor · 2026-03-18T15:59:48Z

cmd/entire/cli/checkpoint/fetching_tree.go

+type FetchingTree struct {
+	inner    *object.Tree
+	ctx      context.Context
+	resolver *BlobResolver


Unused resolver field allocated in every FetchingTree

Low Severity

The resolver field (*BlobResolver) is allocated via NewBlobResolver(s) in every NewFetchingTree call and propagated in Tree(), but no FetchingTree method ever reads it. File() uses t.inner.File() as the existence check and t.fetch for fetching—t.resolver.HasBlob() and t.resolver.ReadBlob() are never called. This is dead code that wastes a heap allocation per FetchingTree instance.

Additional Locations (2)

cmd/entire/cli/checkpoint/fetching_tree.go#L44-L45

cmd/entire/cli/checkpoint/fetching_tree.go#L160-L161

cursor · 2026-03-18T15:59:48Z

cmd/entire/cli/checkpoint/fetching_tree.go

+	)
+
+	// Fetch the blob from the remote.
+	if fetchErr := t.fetch(t.ctx, []plumbing.Hash{entry.Hash}); fetchErr != nil {


Per-blob fetching misses intended batch optimization

Medium Severity

FetchingTree.File() calls t.fetch with a single-element slice for every blob miss, triggering one git fetch origin <hash> subprocess per file. During resume, reading root metadata plus N session metadata files generates N+1 separate network round-trips, each spawning a subprocess. FetchBlobsByHash accepts a batch of hashes and CollectTranscriptBlobHashes was built to collect them, but neither is used to pre-fetch blobs before tree reads. The stated "targeted subtree blobs fetch" optimization is built but never wired into the resume flow.

Copilot AI review requested due to automatic review settings March 11, 2026 12:19

Copilot started reviewing on behalf of pjbgf March 11, 2026 12:19 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

cmd/entire/cli/resume.go Outdated Show resolved Hide resolved

cmd/entire/cli/checkpoint/blob_resolver.go Outdated Show resolved Hide resolved

cmd/entire/cli/git_operations.go Show resolved Hide resolved

jcleira mentioned this pull request Mar 11, 2026

Reduce memory and network cost when reading checkpoint metadata partio-io/minions#57

Open

6 tasks

pjbgf force-pushed the pjbgf/jsonl-data2 branch 2 times, most recently from a5c9b4f to 8b9cac1 Compare March 16, 2026 10:19

pjbgf marked this pull request as ready for review March 16, 2026 10:28

pjbgf requested a review from a team as a code owner March 16, 2026 10:28

pjbgf force-pushed the pjbgf/jsonl-data2 branch from 0f69a62 to 270c665 Compare March 17, 2026 09:18

pjbgf mentioned this pull request Mar 17, 2026

Use partial fetch (blob:none) for metadata session sync #712

Draft

pjbgf and others added 7 commits March 18, 2026 09:47

perf: use shallow treeless fetch for metadata branch

36db444

Add --depth=1 to the filtered fetch so only the tip commit and its tree objects are downloaded, skipping the entire branch history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 564aea6f7ca5

refactor: remove unused hasBaseFile and simplify chunk collection in …

9ca78e1

…CollectTranscriptBlobHashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: e5eaf035e386

pjbgf force-pushed the pjbgf/jsonl-data2 branch from 270c665 to 9ca78e1 Compare March 18, 2026 09:47

Soph and others added 2 commits March 18, 2026 12:58

pjbgf added the performance Performance issues label Mar 18, 2026

Soph and others added 6 commits March 18, 2026 13:56

Merge pull request #721 from entireio/soph/jsonl-data2-feedback

db99d03

Review Feedback

Soph approved these changes Mar 18, 2026

View reviewed changes

cursor bot reviewed Mar 18, 2026

View reviewed changes

pjbgf enabled auto-merge March 18, 2026 17:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reduce memory and network cost for checkpoint metadata#680

perf: reduce memory and network cost for checkpoint metadata#680
pjbgf wants to merge 15 commits intomainfrom
pjbgf/jsonl-data2

pjbgf commented Mar 11, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot commented Mar 18, 2026 •

edited

Loading

Uh oh!

cursor bot left a comment •

edited

Loading

Uh oh!

cursor bot Mar 18, 2026

Uh oh!

cursor bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Conversation

pjbgf commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Reduce memory allocations when reading checkpoint metadata

2. Reduce network cost for pulling checkpoint data from remote

Resume flow (before → after)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

cursor bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 18, 2026

Choose a reason for hiding this comment

Unused resolver field allocated in every FetchingTree

Uh oh!

cursor bot Mar 18, 2026

Choose a reason for hiding this comment

Per-blob fetching misses intended batch optimization

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

pjbgf commented Mar 11, 2026 •

edited

Loading

cursor bot commented Mar 18, 2026 •

edited

Loading

cursor bot left a comment •

edited

Loading

Unused `resolver` field allocated in every `FetchingTree`