Skip to content

Conversation

@zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Jan 7, 2026

Close #8308

This PR moves mtrie to a standalone service (ledger service), and introduces a flag to allow execution node to query register values from the remote ledger service.

Summary by CodeRabbit

  • New Features

    • Standalone Ledger gRPC service with remote client/server, factory-based ledger creation, new state query APIs (count and by-index), and CBOR trie-update encode/decode helpers.
  • Infrastructure

    • Ledger image build/push targets and localnet bootstrap options; protobuf tooling added; unified ledger-focused metrics collector; WAL directory file-locking utility.
  • Bug Fixes

    • Safer checkpoint symlink handling, robust WAL lock lifecycle, and whitespace/formatting cleanups.
  • Tests & Docs

    • Extensive tests for remote/local parity, encoding edge-cases, file-locking; new Ledger README and protobuf spec.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Adds a standalone gRPC Ledger service, factory-based ledger selection (local vs remote), remote client/server transport and protobufs, CBOR trie-update encoding, WAL file-locking and compactor wiring, metrics consolidation, localnet bootstrap support for ledger service containers, Makefile/CI image targets, and extensive tests and mocks.

Changes

Cohort / File(s) Summary
CLI & Server
cmd/ledger/main.go, cmd/ledger/README.md
New ledger gRPC server binary with flags, readiness/health checks, lifecycle and graceful shutdown.
Execution node wiring
cmd/execution_builder.go, cmd/execution_config.go, cmd/bootstrap/run/execution_state.go
ExecutionNode now uses ledger factory; ledgerStorage field type changed; removed WAL compactor boot method; added flags for remote ledger address and max message sizes.
Protobuf & Tooling
ledger/protobuf/ledger.proto, ledger/protobuf/buf.gen.yaml, ledger/protobuf/buf.yaml
New LedgerService protobuf and Buf config for go/go-grpc codegen (message/service definitions include nil-vs-empty Value handling).
Remote gRPC layer
ledger/remote/service.go, ledger/remote/client.go, ledger/remote/factory.go, ledger/remote/encoding.go, ledger/remote/protobuf_encoding_test.go
gRPC server + client implementing ledger.Ledger, transport encoding helpers, readiness probing, and protobuf encoding tests.
Factory & wiring
ledger/factory/factory.go, ledger/complete/factory.go, ledger/factory/factory_test.go
New ledger Factory abstraction selecting local vs remote; Result includes optional DiskWAL; extensive factory tests for parity between local and remote.
Ledger API & internals
ledger/ledger.go, ledger/complete/*.go, ledger/partial/*.go, ledger/config.go
Ledger interface adds StateCount/StateByIndex; LedgerWithCompactor and LocalLedgerFactory added; compactor config type introduced; sync and lifecycle fixes.
Trie encoding & tests
ledger/trie_encoder.go, ledger/trie_encoder_test.go
Adds CBOR encode/decode helpers (EncodeTrieUpdateCBOR/DecodeTrieUpdateCBOR), payload normalization for deterministic encoding, and expanded tests covering CBOR/protobuf/binary round-trips.
WAL & Checkpointer
ledger/complete/wal/wal.go, ledger/complete/wal/checkpointer.go, ledger/complete/wal/*_test.go
DiskWAL acquires a file lock; safer checkpoint symlink handling; test logger updates and test fixes.
File lock utility
utils/io/filelock.go, utils/io/filelock_test.go
New FileLock abstraction (flock-based) with Lock/Unlock/IsLocked/RemoveLockFile and comprehensive tests.
Integration / Localnet
integration/localnet/builder/bootstrap.go, integration/localnet/Makefile, integration/localnet/builder/ports.go, integration/localnet/Makefile
LEDGER_EXECUTION support and validation; ledger service bootstrap container, port ranges, and execution node wiring for remote-ledger usage; added stop-flow target.
Metrics
module/metrics/ledger.go, module/metrics/execution.go
New LedgerCollector implementing LedgerMetrics/WALMetrics; ExecutionCollector delegates many metrics to LedgerCollector.
Build & CI
.github/workflows/image_builds.yml, .github/workflows/ci.yml, Makefile
Added docker build/push targets and CI matrix entries for execution-ledger images/variants; PHONY and artifact updates.
Mocks & generated
ledger/mock/*
Mocks generated/updated for Factory and new Ledger methods (StateCount, StateByIndex).
Utilities & small edits
go.mod, various tests, whitespace changes
Promoted github.com/gofrs/flock to direct require; replaced unittest loggers with zerolog.Nop() in tests; minor whitespace cleanups.

Sequence Diagram(s)

sequenceDiagram
    participant Exec as Execution Node
    participant Factory as Ledger Factory
    participant Client as Remote Client
    participant Service as Ledger Service (gRPC)
    participant Ledger as Local Ledger Storage

    Exec->>Factory: NewLedger(config)
    alt ledger-service-addr set (remote)
        Factory->>Client: NewClient(grpcAddr, sizes)
        Client->>Service: Dial & Ready probe
        Exec->>Client: Get/Set/Prove RPC
        Client->>Service: RPC call
        Service->>Ledger: Query/Update/Prove
        Ledger-->>Service: Result
        Service-->>Client: Reply
        Client-->>Exec: Return values/proofs
    else local
        Factory->>Ledger: NewLocalLedgerFactory -> NewLedger()
        Ledger-->>Factory: ledger instance (+optional WAL)
        Factory-->>Exec: Return local ledger
    end
Loading
sequenceDiagram
    participant Bootstrap as Localnet Bootstrap
    participant LedgerSvc as Ledger Container
    participant ExecNode as Execution Container
    participant Trie as Shared Trie Dir

    Bootstrap->>Bootstrap: read LEDGER_EXECUTION
    alt ledger-execution > 0
        Bootstrap->>LedgerSvc: build & start ledger_service container
        LedgerSvc->>Trie: mount trie dir / copy checkpoint symlinks
        Bootstrap->>ExecNode: start exec nodes with --ledger-service-addr (for indices < LEDGER_EXECUTION)
        ExecNode->>LedgerSvc: RPC calls
        LedgerSvc->>Trie: read/write trie
    else ledger-execution == 0
        Bootstrap->>ExecNode: start exec nodes with local trie mount
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested Labels

Improvement, Breaking Change

Suggested Reviewers

  • fxamacker
  • janezpodhostnik
  • peterargue

Poem

🐇 I hopped across the build and test,
A ledger learned to serve the rest.
CBOR keeps bytes in tidy rows,
Locks hold fast where trie-root grows,
Services wake — the network rests.

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'POC Ledger Service' is vague and generic, using a non-descriptive term 'POC' without conveying the actual change—which is implementing a standalone ledger service with remote RPC communication. Use a more descriptive title such as 'Implement standalone ledger service with remote RPC' to clearly summarize the main change.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR fully implements the requirements from issue #8308: creates a ledger service conforming to the ledger interface, implements RPC communication via gRPC for remote queries, and enables execution nodes to use remote ledger service while maintaining local fallback.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the standalone ledger service and remote execution node integration. Minor adjustments to imports, dependency management, and test utilities are necessary supporting changes for the main objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch leo/poc-ledger-service

Tip

🧪 Unit Test Generation v2 is now available!

We have significantly improved our unit test generation capabilities.

To enable: Add this to your .coderabbit.yaml configuration:

reviews:
  finishing_touches:
    unit_tests:
      enabled: true

Try it out by using the @coderabbitai generate unit tests command on your code files or under ✨ Finishing Touches on the walkthrough!

Have feedback? Share your thoughts on our Discord thread!


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2026

Comment on lines +880 to +881
require.Equal(t, decoded1, decoded2, "EncodeTrieUpdate and EncodeTrieUpdateCBOR should produce same TrieUpdate")
require.Equal(t, decoded2, decoded3, "EncodeTrieUpdateCBOR and EncodeTrieUpdateProtoBuf should produce same TrieUpdate")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have 3 encoding methods for the TrieUpdate.

  1. The EncodeTrieUpdate is used by DiskWAL to encode trie update into WAL files
  2. The cbor.Marshal, which is always used by EncodeTrieUpdateCBOR, is used by the execution data provider to encode the ChunkExecutionData which contains TrieUpdate.
  3. The proto.Marshal is used by the ledger grpc server to encode the TrieUpdate included in response of ledger.Set method.

And this test case is to ensure given the same trie update, after encode and decode, the nil value will be normalized to empty slice, but the decoded value should be consistent.

@zhangchiqing zhangchiqing marked this pull request as ready for review January 14, 2026 00:26
@zhangchiqing zhangchiqing requested a review from a team as a code owner January 14, 2026 00:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ledger/complete/wal/checkpoint_v6_test.go (1)

450-457: Bug: compareFiles opens file1 twice instead of file1 and file2.

Line 451 opens file1 again instead of file2, causing the function to always compare a file with itself. This renders the comparison meaningless and tests like TestCheckpointV6IsDeterminstic will always pass regardless of actual file differences.

🐛 Proposed fix
-	closable2, err := os.Open(file1)
+	closable2, err := os.Open(file2)
🤖 Fix all issues with AI agents
In @integration/localnet/builder/ports.go:
- Around line 56-60: The comment for the "ledger" port range is incorrect: with
start: 8000, end: 8100 and portCount: 2 the capacity is (8100 - 8000) / 2 = 50
services, not 20; update the inline comment in the "ledger" block to reflect the
correct capacity (50 ledger services) or adjust portCount/end values if you
intended 20 services so the math matches.

In @ledger/complete/wal/checkpointer_test.go:
- Around line 495-498: The helper randomlyModifyFile should not call t.Skip()
because that skips the whole test; instead change randomlyModifyFile to return
an error (or a bool) and have callers handle it: remove t.Skip() and either
return nil / false when fileSize==0 or return a specific error, update callers
in Test_Checkpointing’s loop to check the return value and continue the loop (or
t.Log and continue) when the file is empty. Ensure the helper signature
(randomlyModifyFile) and all call sites are updated accordingly so no test-level
skip occurs.

In @ledger/complete/wal/checkpointer.go:
- Around line 1120-1126: The call to filepath.Rel when computing
expectedRelTarget in checkpointer.go ignores its error, so if filepath.Rel fails
expectedRelTarget may be wrong and the subsequent comparison (variables:
symlinkDir, expectedRelTarget, match, target, newPath) can mis-detect correct
symlinks; fix by checking the error returned from filepath.Rel, and on error
fall back to a safe comparison (for example use match as the expected target or
compute absolute paths with filepath.Abs for both target and match before
comparing) so the condition if target == expectedRelTarget || target == match
reliably reflects the actual link target.

In @ledger/remote/client.go:
- Around line 45-55: The client always uses insecure.NewCredentials() in the
grpc.NewClient call, so add a configurable switch to use TLS credentials for
production: detect a config/env flag (e.g., LEDGER_GRPC_TLS or use a
TLSCert/TLSRootCA path) and, when enabled, create secure credentials (e.g., via
credentials.NewClientTLSFromFile or equivalent) and pass
grpc.WithTransportCredentials(creds) instead of insecure.NewCredentials(); keep
the existing grpcAddr and call option handling (grpc.MaxCallRecvMsgSize,
grpc.MaxCallSendMsgSize) and ensure errors from loading certs are returned
(update the grpc.NewClient construction around the grpcAddr, maxResponseSize,
maxRequestSize block).
- Around line 75-94: InitialState currently swallows errors and returns
ledger.DummyState, which breaks tests; change the Client.InitialState signature
to return (ledger.State, error) and update the implementation to propagate
errors from c.client.InitialState and from the invalid hash-length check (return
zero value state and the error instead of DummyState), and then update all
callers and the ledger.Client interface implementations to handle the new
(state, error) return so the remote client never silently returns DummyState on
failures.

In @ledger/remote/service.go:
- Around line 230-236: The loop over protoKey.Parts can panic if an element is
nil; update the loop that builds keyParts to check for nil before accessing
part.Type/value (i.e., inside the for i, part := range protoKey.Parts block),
and return a status.Error(codes.InvalidArgument, "...") when a nil part is
encountered; keep the existing uint16 bounds check for part.Type and continue to
call ledger.NewKeyPart only after validating part is non-nil and its Type fits
in uint16 so you populate the ledger.KeyPart slice safely.
- Around line 150-168: The loop over req.Values must guard against nil elements
to avoid panics; before reading protoValue.Data or protoValue.IsNil in the loop
that builds values (the slice of ledger.Value), add a nil check for protoValue
and handle it explicitly (e.g., treat a nil protoValue as an original nil
ledger.Value or return a clear error), then proceed with the existing logic that
distinguishes protoValue.IsNil vs empty Data to reconstruct the ledger.Value;
reference req.Values, protoValue, protoValue.Data, protoValue.IsNil and the
values slice when locating where to add the check.

In @ledger/trie_encoder_test.go:
- Around line 705-712: Fix the mismatched assertions/messages around
decoded.Payloads: replace the incorrect require.NotNil on
decoded.Payloads[0].Value() with require.Nil and update its message to "Decoded
Payload 0 should have nil value"; keep require.NotNil for
decoded.Payloads[1].Value() (empty slice) and ensure its message says "Decoded
Payload 1 should have non-nil value"; finally update the final require.Equal
message to reflect the intended expectation ("Decoded nil and []byte{} should be
equal after normalization") so the assertion text matches the check of
decoded.Payloads[0].Value() == decoded.Payloads[1].Value().
🧹 Nitpick comments (16)
ledger/common/pathfinder/pathfinder.go (1)

117-117: Minor: Comment still has a grammatical issue.

The comment uses "an slice" which should be "a slice" (slice starts with a consonant sound).

📝 Suggested fix
-// UpdateToPayloads constructs an slice of payloads given ledger update.
+// UpdateToPayloads constructs a slice of payloads given a ledger update.
ledger/trie_encoder.go (1)

707-711: Consider returning an error instead of panicking on marshal failure.

While CBOR marshalling of a TrieUpdate struct is unlikely to fail in practice, panicking violates the project's error handling guidelines which state "NEVER log and continue on best effort basis; ALWAYS explicitly handle errors." Returning an error allows callers to handle the failure gracefully.

♻️ Suggested refactor
-func EncodeTrieUpdateCBOR(t *TrieUpdate) []byte {
+func EncodeTrieUpdateCBOR(t *TrieUpdate) ([]byte, error) {
 	if t == nil {
-		return []byte{}
+		return []byte{}, nil
 	}

 	encoded, err := cbor.Marshal(t)
 	if err != nil {
-		// This should not happen in normal operation
-		panic(fmt.Errorf("failed to encode trie update with CBOR: %w", err))
+		return nil, fmt.Errorf("failed to encode trie update with CBOR: %w", err)
 	}

-	return encoded
+	return encoded, nil
 }

Note: This would require updating callers to handle the returned error. Based on learnings, the project prefers comprehensive error wrapping over panics.

ledger/complete/factory.go (1)

21-38: Consider validating critical parameters.

The constructor accepts parameters without validation. If compactorConfig is nil, the call to NewLedgerWithCompactor will panic when accessing compactorConfig fields. Consider adding a nil check or documenting that compactorConfig must not be nil.

♻️ Optional validation
 func NewLocalLedgerFactory(
 	ledgerWAL wal.LedgerWAL,
 	capacity int,
 	compactorConfig *ledger.CompactorConfig,
 	metrics module.LedgerMetrics,
 	logger zerolog.Logger,
 	pathFinderVersion uint8,
 ) ledger.Factory {
+	if compactorConfig == nil {
+		panic("compactorConfig must not be nil")
+	}
 	return &LocalLedgerFactory{

Alternatively, document this precondition in the function's doc comment.

utils/io/filelock_test.go (1)

383-444: Platform-specific tests are appropriately documented.

The permission-denied tests (lines 383-444) are correctly documented as potentially platform-specific (may not work on Windows). The tests properly restore permissions in defer blocks to ensure cleanup.

Consider adding a skip condition for Windows if these tests are known to fail:

💡 Optional: Skip on Windows
if runtime.GOOS == "windows" {
    t.Skip("permission tests not reliable on Windows")
}
ledger/complete/ledger_with_compactor.go (1)

90-104: Consider error recovery in Ready() goroutine.

The goroutine in Ready() doesn't have panic recovery. If either lwc.ledger.Ready() or lwc.compactor.Ready() panics, the ready channel will never close, potentially causing callers to block indefinitely.

This follows the existing pattern in the codebase but is worth noting for robustness.

.github/workflows/image_builds.yml (1)

167-167: Update actions/checkout to v4.

The actions/checkout@v3 action is outdated. Update to actions/checkout@v4 for security updates and continued compatibility.

Suggested fix
-      - name: Checkout repo
-        uses: actions/checkout@v3
+      - name: Checkout repo
+        uses: actions/checkout@v4

Apply this change at both lines 167 and 199.

Also applies to: 199-199

ledger/remote/factory.go (1)

33-38: Minor simplification available.

The function body can be simplified by returning the client directly.

✨ Suggested simplification
 func (f *RemoteLedgerFactory) NewLedger() (ledger.Ledger, error) {
-	client, err := NewClient(f.grpcAddr, f.logger, f.maxRequestSize, f.maxResponseSize)
-	if err != nil {
-		return nil, err
-	}
-	return client, nil
+	return NewClient(f.grpcAddr, f.logger, f.maxRequestSize, f.maxResponseSize)
 }
cmd/ledger/main.go (1)

125-132: Consider adding a shutdown timeout.

grpcServer.GracefulStop() blocks until all active RPCs complete, which could hang indefinitely if a client maintains a long-running connection. Consider adding a timeout with fallback to Stop().

⏱️ Suggested timeout-based shutdown
+import (
+	"context"
+	"time"
+)

 // Graceful shutdown
 logger.Info().Msg("shutting down gRPC server...")
-grpcServer.GracefulStop()
+
+// Give in-flight requests time to complete, then force stop
+shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+
+done := make(chan struct{})
+go func() {
+	grpcServer.GracefulStop()
+	close(done)
+}()
+
+select {
+case <-done:
+	logger.Info().Msg("gRPC server stopped gracefully")
+case <-shutdownCtx.Done():
+	logger.Warn().Msg("graceful shutdown timed out, forcing stop")
+	grpcServer.Stop()
+}
cmd/execution_builder.go (1)

937-959: Remove unused diskWAL field assignment or document its purpose.

While the concern about result.WAL being nil for remote ledgers is valid, verification shows that exeNode.diskWAL is assigned at line 957 but never referenced anywhere else in the codebase. This appears to be dead code left over from refactoring. Either remove the assignment or add a comment explaining why this field is being stored.

ledger/remote/protobuf_encoding_test.go (1)

23-105: Tests simulate rather than verify actual protobuf encoding behavior.

These tests are valuable as documentation of the nil vs []byte{} distinction issue. However, they simulate protobuf behavior (lines 68-79) rather than actually encoding/decoding via protobuf marshaling. Consider adding a brief note or renaming to clarify these are conceptual demonstrations, or add a companion test that performs actual proto.Marshal/proto.Unmarshal round-trips to verify the real behavior.

integration/localnet/builder/bootstrap.go (1)

702-718: Consider documenting why directory removal is needed.

The new logic at lines 703-710 removes an existing directory before creating a file. While this handles an edge case, it's somewhat unexpected behavior for a function named openAndTruncate. A comment explaining the scenario that necessitated this change would help future maintainers understand why a directory might exist at a file path location.

📝 Suggested documentation
 func openAndTruncate(filename string) (*os.File, error) {
 	// Check if path exists and is a directory, remove it if so
+	// This can happen if a previous run created a directory at this location
+	// (e.g., due to a configuration change or bug)
 	if fi, err := os.Stat(filename); err == nil {
 		if fi.IsDir() {
ledger/factory/factory_test.go (2)

366-370: Potential port allocation race condition.

The pattern of calling net.Listen(":0"), extracting the address, closing the listener, and then re-opening at that address (line 419) creates a small window where another process could claim the port. While unlikely in test environments, consider keeping the original listener open and using it directly.

♻️ Suggested fix
 func startLedgerServer(t *testing.T, walDir string) (string, func()) {
-	// Find an available port
-	listener, err := net.Listen("tcp", ":0")
-	require.NoError(t, err)
-	addr := listener.Addr().String()
-	listener.Close()
-
 	logger := unittest.Logger()
 
 	// Create WAL
@@ -416,8 +410,11 @@ func startLedgerServer(t *testing.T, walDir string) (string, func()) {
 	ledgerpb.RegisterLedgerServiceServer(grpcServer, ledgerService)
 
 	// Start gRPC server
-	lis, err := net.Listen("tcp", addr)
+	lis, err := net.Listen("tcp", ":0")
 	require.NoError(t, err)
+	addr := lis.Addr().String()
 
 	// Start server in goroutine
 	serverErr := make(chan error, 1)
@@ -428,8 +425,6 @@ func startLedgerServer(t *testing.T, walDir string) (string, func()) {
 	}()
 
-	// Wait a bit for server to start
-	time.Sleep(100 * time.Millisecond)
-
 	// Cleanup function
 	cleanup := func() {

The static analysis warning about binding to :0 is a false positive for test code—this is a standard pattern for allocating random ports in tests.


494-506: Cleanup order may cause issues.

The defer block calls <-remoteLedger.Done() first, then <-localResult.WAL.Done(), then <-localLedger.Done(). However, the local ledger likely depends on the WAL, so stopping the WAL before the ledger could cause issues. Consider reversing the local cleanup order.

♻️ Suggested cleanup order
 	defer func() {
 		// Stop remote ledger
 		<-remoteLedger.Done()
 
-		// Stop local ledger and WAL
+		// Stop local ledger first, then WAL (ledger may depend on WAL)
+		<-localLedger.Done()
 		if localResult.WAL != nil {
 			<-localResult.WAL.Done()
 		}
-		<-localLedger.Done()
 
 		// Stop server
 		serverCleanup()
 	}()
ledger/factory/factory.go (1)

116-124: Type assertion may be fragile.

The type assertion diskWal.(*wal.DiskWAL) at line 120 will fail if wal.NewDiskWAL ever returns a different type implementing wal.LedgerWAL. Since NewDiskWAL is called just above and returns *DiskWAL, this is currently safe, but the interface-to-concrete-type pattern could break if the constructor changes.

Consider documenting this assumption or using the concrete type directly:

♻️ Alternative: use concrete type directly
 func NewLedger(config Config) (*Result, error) {
 	var factory ledger.Factory
-	var diskWal wal.LedgerWAL
+	var diskWAL *wal.DiskWAL
 
 	// Check if remote ledger service is configured
 	if config.LedgerServiceAddr != "" {
@@ -75,7 +75,7 @@ func NewLedger(config Config) (*Result, error) {
 
 		// Create WAL
 		var err error
-		diskWal, err = wal.NewDiskWAL(
+		diskWAL, err = wal.NewDiskWAL(
 			config.Logger.With().Str("subcomponent", "wal").Logger(),
 			config.MetricsRegisterer,
 			config.WALMetrics,
@@ -100,7 +100,7 @@ func NewLedger(config Config) (*Result, error) {
 
 		// Use factory to create ledger with internal compactor
 		factory = complete.NewLocalLedgerFactory(
-			diskWal,
+			diskWAL,
 			int(config.MTrieCacheSize),
 			compactorConfig,
 			config.LedgerMetrics,
@@ -113,16 +113,8 @@ func NewLedger(config Config) (*Result, error) {
 		return nil, fmt.Errorf("failed to create ledger: %w", err)
 	}
 
-	// Type assert to get the concrete DiskWAL type (only for local ledger)
-	var diskWAL *wal.DiskWAL
-	if diskWal != nil {
-		var ok bool
-		diskWAL, ok = diskWal.(*wal.DiskWAL)
-		if !ok {
-			return nil, fmt.Errorf("expected *wal.DiskWAL but got %T", diskWal)
-		}
-	}
-
 	return &Result{
 		Ledger: ledgerStorage,
 		WAL:    diskWAL,
ledger/protobuf/ledger.proto (1)

1-7: Consider addressing Buf linting suggestions for API versioning.

Static analysis flags that the package lacks a version suffix (e.g., ledger.v1) and the file path doesn't match the package name. These are common protobuf best practices for API evolution and breaking change management.

If this service may evolve with breaking changes, consider adopting versioned packages early. However, if this is an internal-only service with controlled clients, this can be deferred.

ledger/remote/service.go (1)

15-28: Logger is currently unused.

The logger field is never used in any of the RPC handlers. If it's intended for future observability, consider adding structured logging for request tracing and error scenarios. If not needed, consider removing it to avoid confusion.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9e3915 and e637a0d.

⛔ Files ignored due to path filters (2)
  • ledger/protobuf/ledger.pb.go is excluded by !**/*.pb.go
  • ledger/protobuf/ledger_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (37)
  • .github/workflows/image_builds.yml
  • Makefile
  • cmd/bootstrap/run/execution_state.go
  • cmd/execution_builder.go
  • cmd/execution_config.go
  • cmd/ledger/README.md
  • cmd/ledger/main.go
  • go.mod
  • integration/localnet/Makefile
  • integration/localnet/builder/bootstrap.go
  • integration/localnet/builder/ports.go
  • ledger/common/pathfinder/pathfinder.go
  • ledger/complete/factory.go
  • ledger/complete/ledger.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/complete/wal/checkpointer.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/wal.go
  • ledger/config.go
  • ledger/factory.go
  • ledger/factory/factory.go
  • ledger/factory/factory_test.go
  • ledger/protobuf/buf.gen.yaml
  • ledger/protobuf/buf.yaml
  • ledger/protobuf/ledger.proto
  • ledger/remote/client.go
  • ledger/remote/factory.go
  • ledger/remote/protobuf_encoding_test.go
  • ledger/remote/service.go
  • ledger/trie_encoder.go
  • ledger/trie_encoder_test.go
  • module/metrics/execution.go
  • module/metrics/ledger.go
  • utils/io/filelock.go
  • utils/io/filelock_test.go
🧰 Additional context used
📓 Path-based instructions (6)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/factory.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • utils/io/filelock.go
  • ledger/complete/ledger.go
  • cmd/bootstrap/run/execution_state.go
  • ledger/complete/wal/wal.go
  • utils/io/filelock_test.go
  • integration/localnet/builder/ports.go
  • ledger/remote/protobuf_encoding_test.go
  • cmd/ledger/main.go
  • ledger/factory/factory_test.go
  • module/metrics/execution.go
  • cmd/execution_config.go
  • ledger/factory/factory.go
  • ledger/config.go
  • ledger/complete/factory.go
  • integration/localnet/builder/bootstrap.go
  • ledger/trie_encoder_test.go
  • ledger/remote/factory.go
  • ledger/remote/client.go
  • ledger/common/pathfinder/pathfinder.go
  • module/metrics/ledger.go
  • ledger/remote/service.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/complete/wal/checkpointer.go
  • ledger/trie_encoder.go
  • cmd/execution_builder.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of *_test.go files for test naming
Use fixtures for realistic test data as defined in /utils/unittest/

Files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • utils/io/filelock_test.go
  • ledger/remote/protobuf_encoding_test.go
  • ledger/factory/factory_test.go
  • ledger/trie_encoder_test.go
{crypto,fvm,ledger,storage}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Security checks for cryptographic misuse must be enforced using gosec

Files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/factory.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • ledger/complete/ledger.go
  • ledger/complete/wal/wal.go
  • ledger/remote/protobuf_encoding_test.go
  • ledger/factory/factory_test.go
  • ledger/factory/factory.go
  • ledger/config.go
  • ledger/complete/factory.go
  • ledger/trie_encoder_test.go
  • ledger/remote/factory.go
  • ledger/remote/client.go
  • ledger/common/pathfinder/pathfinder.go
  • ledger/remote/service.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/complete/wal/checkpointer.go
  • ledger/trie_encoder.go
{crypto,fvm,ledger,access,engine}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/factory.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • ledger/complete/ledger.go
  • ledger/complete/wal/wal.go
  • ledger/remote/protobuf_encoding_test.go
  • ledger/factory/factory_test.go
  • ledger/factory/factory.go
  • ledger/config.go
  • ledger/complete/factory.go
  • ledger/trie_encoder_test.go
  • ledger/remote/factory.go
  • ledger/remote/client.go
  • ledger/common/pathfinder/pathfinder.go
  • ledger/remote/service.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/complete/wal/checkpointer.go
  • ledger/trie_encoder.go
{storage,ledger,execution,fvm}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

State consistency is paramount; use proper synchronization primitives

Files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/factory.go
  • ledger/complete/wal/checkpointer_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • ledger/complete/ledger.go
  • ledger/complete/wal/wal.go
  • ledger/remote/protobuf_encoding_test.go
  • ledger/factory/factory_test.go
  • ledger/factory/factory.go
  • ledger/config.go
  • ledger/complete/factory.go
  • ledger/trie_encoder_test.go
  • ledger/remote/factory.go
  • ledger/remote/client.go
  • ledger/common/pathfinder/pathfinder.go
  • ledger/remote/service.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/complete/wal/checkpointer.go
  • ledger/trie_encoder.go
{module,engine,cmd}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

All major processing components must implement the Component interface from /module/component/component.go to ensure consistent lifecycle management and graceful shutdown patterns

Files:

  • cmd/bootstrap/run/execution_state.go
  • cmd/ledger/main.go
  • module/metrics/execution.go
  • cmd/execution_config.go
  • module/metrics/ledger.go
  • cmd/execution_builder.go
🧠 Learnings (13)
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Use fixtures for realistic test data as defined in `/utils/unittest/`

Applied to files:

  • ledger/complete/wal/checkpoint_v6_test.go
  • ledger/complete/wal/checkpoint_v5_test.go
  • utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives

Applied to files:

  • ledger/complete/ledger.go
  • cmd/bootstrap/run/execution_state.go
  • ledger/complete/wal/wal.go
  • utils/io/filelock_test.go
  • cmd/ledger/main.go
  • integration/localnet/builder/bootstrap.go
  • integration/localnet/Makefile
  • ledger/complete/wal/checkpointer.go
  • cmd/execution_builder.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {crypto,fvm,ledger,access,engine}/**/*.go : Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Applied to files:

  • cmd/bootstrap/run/execution_state.go
  • cmd/ledger/main.go
  • integration/localnet/builder/bootstrap.go
  • integration/localnet/Makefile
  • ledger/complete/wal/checkpointer.go
  • ledger/trie_encoder.go
  • cmd/execution_builder.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Unit tests should be co-located with the code they test

Applied to files:

  • utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Follow the existing pattern of `*_test.go` files for test naming

Applied to files:

  • utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to integration/tests/**/*.go : Integration tests should go in `/integration/tests/`

Applied to files:

  • utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Uses golangci-lint with custom configurations (`.golangci.yml`) and custom linters for Flow-specific conventions (struct write checking)

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:19.865Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: .cursor/rules/godocs.mdc:0-0
Timestamp: 2025-12-23T00:28:19.865Z
Learning: Applies to **/*.go : Follow Go coding standards and conventions as documented in docs/agents/GoDocs.md

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Use mock generators: run `make generate-mocks` after interface changes

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:14.267Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: .cursor/rules/coding_conventions.mdc:0-0
Timestamp: 2025-12-23T00:28:14.267Z
Learning: Applies to **/*.go : Follow Go coding conventions as documented in docs/agents/CodingConventions.md

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Follow the existing module structure in `/module/`, `/engine/`, `/model/` and use dependency injection patterns for component composition

Applied to files:

  • ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Implement proper interfaces before concrete types

Applied to files:

  • ledger/protobuf/buf.gen.yaml
🧬 Code graph analysis (16)
cmd/bootstrap/run/execution_state.go (2)
ledger/complete/ledger.go (1)
  • DefaultPathFinderVersion (23-23)
model/flow/ledger.go (1)
  • DummyStateCommitment (283-283)
ledger/complete/wal/wal.go (2)
utils/io/filelock.go (2)
  • FileLock (15-18)
  • NewFileLock (24-34)
module/metrics.go (1)
  • WALMetrics (779-782)
utils/io/filelock_test.go (4)
fvm/fvm.go (1)
  • Run (78-85)
utils/io/filelock.go (2)
  • NewFileLock (24-34)
  • RemoveLockFile (101-107)
ledger/trie.go (1)
  • Path (21-21)
utils/unittest/fixtures/util.go (1)
  • NoError (6-10)
ledger/remote/protobuf_encoding_test.go (2)
ledger/protobuf/ledger.pb.go (3)
  • Value (153-165)
  • Value (169-169)
  • Value (170-172)
fvm/evm/types/call.go (1)
  • Data (323-323)
cmd/ledger/main.go (6)
ledger/factory/factory.go (2)
  • NewLedger (46-130)
  • Config (19-35)
module/metrics.go (2)
  • WALMetrics (779-782)
  • LedgerMetrics (726-777)
ledger/complete/ledger.go (1)
  • Ledger (38-46)
engine/access/rest/server.go (1)
  • NewServer (42-90)
ledger/remote/service.go (1)
  • NewService (23-28)
ledger/protobuf/ledger_grpc.pb.go (1)
  • RegisterLedgerServiceServer (161-163)
module/metrics/execution.go (1)
module/metrics/ledger.go (2)
  • LedgerCollector (19-43)
  • NewLedgerCollector (49-172)
ledger/config.go (2)
cmd/scaffold.go (1)
  • Metrics (97-105)
module/metrics.go (1)
  • WALMetrics (779-782)
ledger/complete/factory.go (6)
ledger/complete/wal/wal.go (1)
  • LedgerWAL (382-402)
ledger/config.go (1)
  • CompactorConfig (10-16)
module/metrics.go (1)
  • LedgerMetrics (726-777)
ledger/factory.go (1)
  • Factory (5-9)
ledger/complete/ledger.go (2)
  • NewLedger (49-87)
  • Ledger (38-46)
ledger/complete/ledger_with_compactor.go (1)
  • NewLedgerWithCompactor (25-61)
integration/localnet/builder/bootstrap.go (3)
integration/testnet/network.go (1)
  • GRPCPort (95-95)
model/bootstrap/filenames.go (2)
  • DirnameExecutionState (14-14)
  • FilenameWALRootCheckpoint (31-31)
ledger/complete/wal/checkpointer.go (1)
  • SoftlinkCheckpointFile (1091-1149)
ledger/trie_encoder_test.go (2)
ledger/protobuf/ledger.pb.go (3)
  • Value (153-165)
  • Value (169-169)
  • Value (170-172)
ledger/trie_encoder.go (4)
  • EncodeTrieUpdate (563-577)
  • DecodeTrieUpdate (612-628)
  • EncodeTrieUpdateCBOR (702-714)
  • DecodeTrieUpdateCBOR (719-730)
ledger/remote/factory.go (3)
ledger/factory.go (1)
  • Factory (5-9)
ledger/factory/factory.go (1)
  • NewLedger (46-130)
ledger/remote/client.go (1)
  • NewClient (30-65)
ledger/remote/client.go (2)
ledger/protobuf/ledger.pb.go (21)
  • State (25-30)
  • State (34-34)
  • State (35-37)
  • Value (153-165)
  • Value (169-169)
  • Value (170-172)
  • Key (113-118)
  • Key (122-122)
  • Key (123-125)
  • GetRequest (415-421)
  • GetRequest (425-425)
  • GetRequest (426-428)
  • SetRequest (503-510)
  • SetRequest (514-514)
  • SetRequest (515-517)
  • ProveRequest (607-613)
  • ProveRequest (617-617)
  • ProveRequest (618-620)
  • KeyPart (65-71)
  • KeyPart (75-75)
  • KeyPart (76-78)
ledger/trie_encoder.go (2)
  • DecodeTrieUpdateCBOR (719-730)
  • Type (30-30)
module/metrics/ledger.go (1)
module/metrics.go (2)
  • LedgerMetrics (726-777)
  • WALMetrics (779-782)
ledger/remote/service.go (2)
ledger/protobuf/ledger.pb.go (9)
  • Key (113-118)
  • Key (122-122)
  • Key (123-125)
  • Value (153-165)
  • Value (169-169)
  • Value (170-172)
  • KeyPart (65-71)
  • KeyPart (75-75)
  • KeyPart (76-78)
ledger/trie_encoder.go (2)
  • EncodeTrieUpdateCBOR (702-714)
  • Type (30-30)
ledger/complete/ledger_with_compactor.go (5)
ledger/complete/ledger.go (2)
  • Ledger (38-46)
  • NewLedger (49-87)
ledger/complete/wal/wal.go (1)
  • LedgerWAL (382-402)
ledger/config.go (1)
  • CompactorConfig (10-16)
module/metrics.go (1)
  • LedgerMetrics (726-777)
cmd/scaffold.go (1)
  • Metrics (97-105)
cmd/execution_builder.go (3)
ledger/complete/ledger.go (1)
  • Ledger (38-46)
ledger/trie.go (1)
  • RootHash (164-164)
ledger/factory/factory.go (2)
  • NewLedger (46-130)
  • Config (19-35)
🪛 actionlint (1.7.10)
.github/workflows/image_builds.yml

167-167: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


199-199: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 ast-grep (0.40.5)
ledger/factory/factory_test.go

[warning] 366-366: "Detected a network listener listening on 0.0.0.0 or an empty string.
This could unexpectedly expose the server publicly as it binds to all
available interfaces. Instead, specify another IP address that is not
0.0.0.0 nor the empty string."
Context: net.Listen("tcp", ":0")
Note: [CWE-200] Exposure of Sensitive Information to an Unauthorized Actor [REFERENCES]
- https://owasp.org/Top10/A01_2021-Broken_Access_Control

(avoid-bind-to-all-interfaces-go)

🪛 Buf (1.63.0)
ledger/protobuf/ledger.proto

3-3: Files with package "ledger" must be within a directory "ledger" relative to root but were in directory "ledger/protobuf".

(PACKAGE_DIRECTORY_MATCH)


3-3: Package name "ledger" should be suffixed with a correctly formed version, such as "ledger.v1".

(PACKAGE_VERSION_SUFFIX)


12-12: RPC request type "Empty" should be named "InitialStateRequest" or "LedgerServiceInitialStateRequest".

(RPC_REQUEST_STANDARD_NAME)


12-12: RPC response type "StateResponse" should be named "InitialStateResponse" or "LedgerServiceInitialStateResponse".

(RPC_RESPONSE_STANDARD_NAME)


15-15: RPC request type "StateRequest" should be named "HasStateRequest" or "LedgerServiceHasStateRequest".

(RPC_REQUEST_STANDARD_NAME)


18-18: RPC response type "ValueResponse" should be named "GetSingleValueResponse" or "LedgerServiceGetSingleValueResponse".

(RPC_RESPONSE_STANDARD_NAME)


27-27: RPC response type "ProofResponse" should be named "ProveResponse" or "LedgerServiceProveResponse".

(RPC_RESPONSE_STANDARD_NAME)

Comment on lines +56 to +60
"ledger": {
start: 8000, // 8000-8100 => 20 ledger services
end: 8100,
portCount: 2,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Comment inconsistency: port range supports 50 nodes, not 20.

With portCount: 2 and range [8000, 8100), the capacity is (8100 - 8000) / 2 = 50 ledger services, not 20 as stated.

Suggested fix
 	"ledger": {
-		start:     8000, // 8000-8100 => 20 ledger services
+		start:     8000, // 8000-8100 => 50 ledger services
 		end:       8100,
 		portCount: 2,
 	},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"ledger": {
start: 8000, // 8000-8100 => 20 ledger services
end: 8100,
portCount: 2,
},
"ledger": {
start: 8000, // 8000-8100 => 50 ledger services
end: 8100,
portCount: 2,
},
🤖 Prompt for AI Agents
In @integration/localnet/builder/ports.go around lines 56 - 60, The comment for
the "ledger" port range is incorrect: with start: 8000, end: 8100 and portCount:
2 the capacity is (8100 - 8000) / 2 = 50 services, not 20; update the inline
comment in the "ledger" block to reflect the correct capacity (50 ledger
services) or adjust portCount/end values if you intended 20 services so the math
matches.

Comment on lines +495 to +498
if fileSize == 0 {
t.Skip("file is empty, cannot modify")
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Using t.Skip() in a helper function may skip the entire test unexpectedly.

t.Skip() will skip the calling test entirely, not just this helper invocation. In the context of Test_Checkpointing where randomlyModifyFile is called in a loop (lines 381-383), if any checkpoint file happens to be empty, the entire subtest will be skipped rather than just skipping that particular file modification.

Consider returning an error or simply returning early without logging:

🔧 Suggested fix
 	if fileSize == 0 {
-		t.Skip("file is empty, cannot modify")
-		return
+		// Empty file, nothing to modify - this shouldn't happen in normal test scenarios
+		// but handle gracefully by returning early
+		return
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if fileSize == 0 {
t.Skip("file is empty, cannot modify")
return
}
if fileSize == 0 {
// Empty file, nothing to modify - this shouldn't happen in normal test scenarios
// but handle gracefully by returning early
return
}
🤖 Prompt for AI Agents
In @ledger/complete/wal/checkpointer_test.go around lines 495 - 498, The helper
randomlyModifyFile should not call t.Skip() because that skips the whole test;
instead change randomlyModifyFile to return an error (or a bool) and have
callers handle it: remove t.Skip() and either return nil / false when
fileSize==0 or return a specific error, update callers in Test_Checkpointing’s
loop to check the return value and continue the loop (or t.Log and continue)
when the file is empty. Ensure the helper signature (randomlyModifyFile) and all
call sites are updated accordingly so no test-level skip occurs.

Comment on lines +1120 to +1126
// Calculate expected relative target for comparison
symlinkDir := filepath.Dir(newPath)
expectedRelTarget, _ := filepath.Rel(symlinkDir, match)
if target == expectedRelTarget || target == match {
// Symlink already exists and points to correct target, skip creation
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Ignored error from filepath.Rel may cause incorrect symlink comparison.

If filepath.Rel fails and returns an error, expectedRelTarget could be an empty string or invalid, causing the comparison at line 1123 to behave unexpectedly. This could incorrectly flag a valid symlink as pointing to the wrong target.

🔧 Suggested fix
 				// Calculate expected relative target for comparison
 				symlinkDir := filepath.Dir(newPath)
-				expectedRelTarget, _ := filepath.Rel(symlinkDir, match)
-				if target == expectedRelTarget || target == match {
+				expectedRelTarget, relErr := filepath.Rel(symlinkDir, match)
+				if target == match || (relErr == nil && target == expectedRelTarget) {
 					// Symlink already exists and points to correct target, skip creation
 					continue
 				}
🤖 Prompt for AI Agents
In @ledger/complete/wal/checkpointer.go around lines 1120 - 1126, The call to
filepath.Rel when computing expectedRelTarget in checkpointer.go ignores its
error, so if filepath.Rel fails expectedRelTarget may be wrong and the
subsequent comparison (variables: symlinkDir, expectedRelTarget, match, target,
newPath) can mis-detect correct symlinks; fix by checking the error returned
from filepath.Rel, and on error fall back to a safe comparison (for example use
match as the expected target or compute absolute paths with filepath.Abs for
both target and match before comparing) so the condition if target ==
expectedRelTarget || target == match reliably reflects the actual link target.

Comment on lines +45 to +55
conn, err := grpc.NewClient(
grpcAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(
grpc.MaxCallRecvMsgSize(int(maxResponseSize)),
grpc.MaxCallSendMsgSize(int(maxRequestSize)),
),
)
if err != nil {
return nil, fmt.Errorf("failed to connect to ledger service: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the client.go file to understand the context
head -60 ledger/remote/client.go | cat -n

Repository: onflow/flow-go

Length of output: 2287


🏁 Script executed:

# Look for comments or documentation around the credentials setup
rg -B5 -A5 "insecure.NewCredentials" ledger/remote/client.go

Repository: onflow/flow-go

Length of output: 531


🏁 Script executed:

# Check if there's documentation about the remote ledger service
fd -t f "README|doc|spec" ledger/remote/ || echo "No docs found in remote/"

Repository: onflow/flow-go

Length of output: 40


🏁 Script executed:

# Search for other gRPC client configurations in the codebase
rg "grpc\.NewClient|NewConn" -t go --max-count=20

Repository: onflow/flow-go

Length of output: 9775


🏁 Script executed:

# Check for environment-specific or configuration-based credential handling
rg "TransportCredentials|tls\.|credentials" -t go ledger/remote/

Repository: onflow/flow-go

Length of output: 212


Add configuration to support TLS credentials for production deployments.

The client uses grpc.WithTransportCredentials(insecure.NewCredentials()) without any mechanism to switch to secure credentials for production. While insecure credentials are appropriate for local development and testing, this implementation provides no configuration option or conditional logic to use TLS in production environments. Given that this is a ledger service client handling cryptographic operations, add support for configurable TLS credentials that can be enabled based on deployment context (e.g., environment variables or configuration parameters).

🤖 Prompt for AI Agents
In @ledger/remote/client.go around lines 45 - 55, The client always uses
insecure.NewCredentials() in the grpc.NewClient call, so add a configurable
switch to use TLS credentials for production: detect a config/env flag (e.g.,
LEDGER_GRPC_TLS or use a TLSCert/TLSRootCA path) and, when enabled, create
secure credentials (e.g., via credentials.NewClientTLSFromFile or equivalent)
and pass grpc.WithTransportCredentials(creds) instead of
insecure.NewCredentials(); keep the existing grpcAddr and call option handling
(grpc.MaxCallRecvMsgSize, grpc.MaxCallSendMsgSize) and ensure errors from
loading certs are returned (update the grpc.NewClient construction around the
grpcAddr, maxResponseSize, maxRequestSize block).

Comment on lines +75 to +94
// InitialState returns the initial state of the ledger.
func (c *Client) InitialState() ledger.State {
ctx := context.Background()
resp, err := c.client.InitialState(ctx, &emptypb.Empty{})
if err != nil {
c.logger.Error().Err(err).Msg("failed to get initial state")
return ledger.DummyState
}

var state ledger.State
if len(resp.State.Hash) != len(state) {
c.logger.Error().
Int("expected", len(state)).
Int("got", len(resp.State.Hash)).
Msg("invalid state hash length")
return ledger.DummyState
}
copy(state[:], resp.State.Hash)
return state
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for ledger.Ledger interface definition
find . -name "*.go" -path "*/ledger/*" | head -20

Repository: onflow/flow-go

Length of output: 1138


🏁 Script executed:

# Find the interface definition more broadly
rg "type Ledger interface" -A 20

Repository: onflow/flow-go

Length of output: 3090


🏁 Script executed:

# Also look for any definitions in package ledger
fd -e go -path "*/ledger/*" | xargs rg "type Ledger interface" -A 20

Repository: onflow/flow-go

Length of output: 3276


🏁 Script executed:

# Search for implementations of the Ledger interface
rg "func.*InitialState\(\)" -A 5 --type go -B 2

Repository: onflow/flow-go

Length of output: 5393


🏁 Script executed:

# Look for implementations of ledger.Ledger interface
rg "type.*Client.*struct" ledger/remote/ -A 10

Repository: onflow/flow-go

Length of output: 670


🏁 Script executed:

# Check if there's any documentation or comment about this design choice
rg "InitialState" ledger/ledger.go -B 5 -A 10

Repository: onflow/flow-go

Length of output: 684


🏁 Script executed:

# Search for usage of InitialState() to understand how it's called
rg "\.InitialState\(\)" --type go -B 3 -A 3 | head -100

Repository: onflow/flow-go

Length of output: 5642


🏁 Script executed:

# Check the ledger.DummyState constant to understand what's being returned
rg "DummyState" ledger/ --type go -B 2 -A 2

Repository: onflow/flow-go

Length of output: 4212


InitialState silently returns DummyState on error, contradicting test expectations.

The InitialState() method logs errors but returns ledger.DummyState instead of propagating them. While this matches the interface signature (which has no error return), factory tests explicitly assert that InitialState() should never return DummyState (assert.NotEqual(t, ledger.DummyState, remoteState)). Returning DummyState on gRPC failures or invalid state hash length creates a silent failure that violates this expectation and prevents callers from detecting errors. Consider updating the interface to support error returns, or ensure the remote client never returns DummyState except as a valid state commitment.

🤖 Prompt for AI Agents
In @ledger/remote/client.go around lines 75 - 94, InitialState currently
swallows errors and returns ledger.DummyState, which breaks tests; change the
Client.InitialState signature to return (ledger.State, error) and update the
implementation to propagate errors from c.client.InitialState and from the
invalid hash-length check (return zero value state and the error instead of
DummyState), and then update all callers and the ledger.Client interface
implementations to handle the new (state, error) return so the remote client
never silently returns DummyState on failures.

Comment on lines +150 to +168
values := make([]ledger.Value, len(req.Values))
for i, protoValue := range req.Values {
var value ledger.Value
// Reconstruct the original value type using is_nil flag
// This preserves the distinction between nil and []byte{} that protobuf loses
if len(protoValue.Data) == 0 {
if protoValue.IsNil {
// Original value was nil
value = nil
} else {
// Original value was []byte{} (empty slice)
value = ledger.Value([]byte{})
}
} else {
// Non-empty value, use data as-is
value = ledger.Value(protoValue.Data)
}
values[i] = value
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing nil check for protoValue can cause panic.

In proto3 generated Go code, repeated message fields like Values are []*ledgerpb.Value, meaning individual elements can be nil. Accessing protoValue.Data or protoValue.IsNil on a nil element would cause a panic. Per coding guidelines, all inputs must be considered potentially Byzantine.

🐛 Proposed fix
 values := make([]ledger.Value, len(req.Values))
 for i, protoValue := range req.Values {
+	if protoValue == nil {
+		return nil, status.Error(codes.InvalidArgument, "value at index is nil")
+	}
 	var value ledger.Value
 	// Reconstruct the original value type using is_nil flag
🤖 Prompt for AI Agents
In @ledger/remote/service.go around lines 150 - 168, The loop over req.Values
must guard against nil elements to avoid panics; before reading protoValue.Data
or protoValue.IsNil in the loop that builds values (the slice of ledger.Value),
add a nil check for protoValue and handle it explicitly (e.g., treat a nil
protoValue as an original nil ledger.Value or return a clear error), then
proceed with the existing logic that distinguishes protoValue.IsNil vs empty
Data to reconstruct the ledger.Value; reference req.Values, protoValue,
protoValue.Data, protoValue.IsNil and the values slice when locating where to
add the check.

Comment on lines +230 to +236
keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
for i, part := range protoKey.Parts {
if part.Type > 65535 {
return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
}
keyParts[i] = ledger.NewKeyPart(uint16(part.Type), part.Value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing nil check for KeyPart elements can cause panic.

Similar to the Values issue, protoKey.Parts is []*ledgerpb.KeyPart in generated Go code. If a client sends a key with a nil part element, accessing part.Type would cause a nil pointer dereference panic.

🐛 Proposed fix
 keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
 for i, part := range protoKey.Parts {
+	if part == nil {
+		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part is nil")
+	}
 	if part.Type > 65535 {
 		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
 	}
🤖 Prompt for AI Agents
In @ledger/remote/service.go around lines 230 - 236, The loop over
protoKey.Parts can panic if an element is nil; update the loop that builds
keyParts to check for nil before accessing part.Type/value (i.e., inside the for
i, part := range protoKey.Parts block), and return a
status.Error(codes.InvalidArgument, "...") when a nil part is encountered; keep
the existing uint16 bounds check for part.Type and continue to call
ledger.NewKeyPart only after validating part is non-nil and its Type fits in
uint16 so you populate the ledger.KeyPart slice safely.

Comment on lines +705 to +712
require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value")
require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")

// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
"Decoded nil and []byte{} should remain distinct with CBOR encoding")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix inconsistent assertion messages.

The assertion messages don't match what's being tested:

  • Line 705: Uses require.NotNil but message says "should have nil value"
  • Line 712: Message says values "should remain distinct" but the test asserts they are equal
🔧 Suggested fix
-		require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value")
+		require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should be normalized to non-nil empty slice")
		require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
		require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
		require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")

		// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
		require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
-			"Decoded nil and []byte{} should remain distinct with CBOR encoding")
+			"Decoded nil and []byte{} should be identical after CBOR normalization")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value")
require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")
// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
"Decoded nil and []byte{} should remain distinct with CBOR encoding")
require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should be normalized to non-nil empty slice")
require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")
// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
"Decoded nil and []byte{} should be identical after CBOR normalization")
🤖 Prompt for AI Agents
In @ledger/trie_encoder_test.go around lines 705 - 712, Fix the mismatched
assertions/messages around decoded.Payloads: replace the incorrect
require.NotNil on decoded.Payloads[0].Value() with require.Nil and update its
message to "Decoded Payload 0 should have nil value"; keep require.NotNil for
decoded.Payloads[1].Value() (empty slice) and ensure its message says "Decoded
Payload 1 should have non-nil value"; finally update the final require.Equal
message to reflect the intended expectation ("Decoded nil and []byte{} should be
equal after normalization") so the assertion text matches the check of
decoded.Payloads[0].Value() == decoded.Payloads[1].Value().

Comment on lines 216 to 217
// Decode trie update if present using CBOR decoding to preserve nil vs []byte{} distinction
trieUpdate, err := ledger.DecodeTrieUpdateCBOR(resp.TrieUpdate)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before normalizing the nil and []byte{} cases in the payload value, we had to use CBOR encoding because EncodeTrieUpdate could not distinguish between the two, whereas CBOR encoding could.

Now that normalization has been integrated into the encoding process, both EncodeTrieUpdate and EncodeTrieUpdateCBOR can be used to encode the TrieUpdate. We can therefore choose the option with better performance.

I will create a separate PR for adding benchmark results.

cc @fxamacker

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/image_builds.yml:
- Line 195: The promote-to-public-registry job's role matrix is missing the
execution-ledger entry, so update the matrix used by promote-to-public-registry
(the role: [...] list) to include execution-ledger (i.e., add "execution-ledger"
into the role array alongside access, collection, consensus, execution,
observer, verification) so that the execution-ledger images built and tagged as
execution-ledger:${TAG} in public-build are promoted to the public registry.
🧹 Nitpick comments (1)
.github/workflows/image_builds.yml (1)

167-167: Consider upgrading actions/checkout from v3 to v4.

Static analysis indicates that actions/checkout@v3 is outdated. While v3 still functions, v4 offers improved performance and is the current recommended version.

♻️ Suggested upgrade
      - name: Checkout repo
-       uses: actions/checkout@v3
+       uses: actions/checkout@v4

Apply this change to both occurrences at lines 167 and 199.

Also applies to: 199-199

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e637a0d and 682d78f.

📒 Files selected for processing (1)
  • .github/workflows/image_builds.yml
🧰 Additional context used
🪛 actionlint (1.7.10)
.github/workflows/image_builds.yml

167-167: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


199-199: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)
🔇 Additional comments (1)
.github/workflows/image_builds.yml (1)

62-66: LGTM! New execution-ledger build commands follow the established pattern.

The build commands for the execution-ledger service are consistent with all other roles, covering all four image variants (with-adx, without-adx, without-netgo-without-adx, arm).

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@@ -187,18 +193,18 @@ jobs:
fail-fast: false
matrix:
role: [access, collection, consensus, execution, observer, verification]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for execution-ledger references in .github/
rg -n "execution-ledger" .github/

# Search for ledger-related promotion configurations
rg -n -i "ledger" .github/workflows/

Repository: onflow/flow-go

Length of output: 1408


🏁 Script executed:

# Get the full context of the image_builds.yml file around line 195
wc -l .github/workflows/image_builds.yml

# Read the promote-to-public-registry job and surrounding context
head -n 250 .github/workflows/image_builds.yml | tail -n 100

Repository: onflow/flow-go

Length of output: 5045


🏁 Script executed:

# Read the public-build job from the start to understand role mapping
head -n 120 .github/workflows/image_builds.yml

Repository: onflow/flow-go

Length of output: 5927


🏁 Script executed:

# Find and examine the promote-images action
find .github/actions -name "promote-images" -type d

# Check the action's implementation
cat .github/actions/promote-images/action.yml 2>/dev/null || cat .github/actions/promote-images/action.yaml 2>/dev/null

Repository: onflow/flow-go

Length of output: 108


🏁 Script executed:

# Also check how images are tagged in the Makefile
rg -n "docker.*execution-ledger" --max-count=20

Repository: onflow/flow-go

Length of output: 1372


🏁 Script executed:

# Find the promote-images action - it might be in actions directory
find . -path "./.*" -prune -o -name "promote-images" -type d -print

# Or search more broadly
find . -name "promote-images*" -type f 2>/dev/null | head -20

Repository: onflow/flow-go

Length of output: 82


🏁 Script executed:

# Check what promote-images action looks like
rg -l "promote-images" --type yaml | head -10

Repository: onflow/flow-go

Length of output: 40


🏁 Script executed:

# Examine the promote-images action
ls -la ./actions/promote-images/

# Read the action file
cat ./actions/promote-images/action.yml

Repository: onflow/flow-go

Length of output: 2632


🏁 Script executed:

# Also check if there's a script that implements the promotion logic
find ./actions/promote-images -type f

Repository: onflow/flow-go

Length of output: 93


Add execution-ledger to the promotion matrix.

The public-build job explicitly builds and pushes execution-ledger images to the private registry (tagged as execution-ledger:${TAG}), but the promote-to-public-registry job's role matrix does not include execution-ledger. This means the ledger service images built in public-build are not being promoted to the public registry. Update the role matrix to include execution-ledger:

role: [access, collection, consensus, execution, execution-ledger, observer, verification]
🤖 Prompt for AI Agents
In @.github/workflows/image_builds.yml at line 195, The
promote-to-public-registry job's role matrix is missing the execution-ledger
entry, so update the matrix used by promote-to-public-registry (the role: [...]
list) to include execution-ledger (i.e., add "execution-ledger" into the role
array alongside access, collection, consensus, execution, observer,
verification) so that the execution-ledger images built and tagged as
execution-ledger:${TAG} in public-build are promoted to the public registry.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@ledger/partial/ledger_test.go`:
- Around line 210-256: The test TestPartialLedger_StateByIndex incorrectly
asserts that StateByIndex(-1) should error even though the Ledger interface
documents -1 as the last index; update the test to treat -1 as valid for the
partial ledger by calling pled.StateByIndex(-1) and asserting no error and that
the returned state equals newState (keep assertions that StateByIndex(1) and
StateByIndex(-2) error); reference the test function
TestPartialLedger_StateByIndex and the partial ledger method StateByIndex
(constructed via partial.NewLedger) when making this change.

In `@ledger/partial/ledger.go`:
- Around line 174-181: StateByIndex currently rejects index == -1 even though
the Ledger interface treats -1 as the last index; update the Ledger.StateByIndex
method to treat index == -1 as index 0 for the partial ledger: if index == 0 ||
index == -1 return l.state, nil, otherwise return ledger.DummyState with the
existing out-of-range error message. Modify the conditional in the StateByIndex
function (referencing l.state, ledger.DummyState, and the method name
StateByIndex) to handle -1 as described and keep the same error formatting for
other indices.

In `@ledger/remote/client.go`:
- Around line 257-282: The exponential backoff in the retry loop (uses ctx :=
context.Background(), maxRetries, retryDelay and c.client.InitialState) has no
upper bound; add a cap such as maxRetryDelay (e.g., 5s or configurable) and
after computing retryDelay = time.Duration(float64(retryDelay) * 1.5) clamp it
so retryDelay = min(retryDelay, maxRetryDelay) (or use an if to set to
maxRetryDelay), ensuring the logged Dur("retry_delay", retryDelay) reflects the
capped value.
♻️ Duplicate comments (2)
ledger/remote/client.go (2)

45-55: Insecure credentials - previously flagged.

This was identified in a previous review. The client uses insecure credentials without a mechanism to enable TLS for production.


75-94: InitialState silently returns DummyState on error - previously flagged.

This was identified in a previous review. Returning DummyState on gRPC failures creates silent failures.

🧹 Nitpick comments (6)
integration/localnet/builder/bootstrap.go (2)

850-888: Exposed port may be unused if service listens on Unix socket.

The service configuration exposes testnet.GRPCPort (line 888) but the ledger service is configured to listen on a Unix socket (line 856). If the Unix socket configuration is intentional, the port exposure is unnecessary. If TCP is intended (to fix the mismatch issue raised above), the listening address needs to be updated.

♻️ Ensure consistency between listen address and exposed ports

If TCP is the intended transport:

 	service := Service{
 		name:  ledgerServiceName,
 		Image: "localnet-ledger",
 		Command: []string{
 			"--triedir=/trie",
-			"--ledger-service-addr=unix:///sockets/ledger.sock",
+			fmt.Sprintf("--ledger-service-addr=0.0.0.0:%s", testnet.GRPCPort),
 			"--mtrie-cache-size=100",

If Unix socket is intended, remove the port exposure and set up a shared socket volume between ledger service and execution nodes.


793-810: Consider simplifying the loop logic.

The loop at lines 797-806 finds the first execution node to share its trie directory with the ledger service. The condition executionIndex < ledgerExecutionCount on line 800 and the increment on line 804 suggest support for multiple execution nodes, but since ledgerExecutionCount > 0 is validated before this function is called, the first execution node found will always satisfy the condition, making the increment unreachable.

The intent appears to be to always use the first execution node's trie directory. Consider simplifying for clarity:

♻️ Simplified loop
-	var firstExecutionNode *testnet.ContainerConfig
-	executionIndex := 0
-	for _, container := range flowNodeContainerConfigs {
-		if container.Role == flow.RoleExecution {
-			if executionIndex < ledgerExecutionCount {
-				firstExecutionNode = &container
-				break
-			}
-			executionIndex++
-		}
-	}
+	var firstExecutionNode *testnet.ContainerConfig
+	for _, container := range flowNodeContainerConfigs {
+		if container.Role == flow.RoleExecution {
+			firstExecutionNode = &container
+			break
+		}
+	}
ledger/complete/ledger.go (1)

470-496: Consider clarifying the error message for negative out-of-range indices.

The implementation correctly supports negative indexing. However, the error message on line 487 displays index-count which shows the re-computed negative value, not the original user-provided index. This could be confusing for debugging.

Proposed improvement for clearer error message
 // Handle negative index (-1 means last index)
+	originalIndex := index
 	if index < 0 {
 		index = count + index
 		if index < 0 {
-			return ledger.DummyState, fmt.Errorf("index %d is out of range (count: %d)", index-count, count)
+			return ledger.DummyState, fmt.Errorf("index %d is out of range (count: %d)", originalIndex, count)
 		}
 	}
cmd/ledger/main.go (2)

101-108: Consider adding bounds checking for message size conversion.

The uint to int cast for maxRequestSize and maxResponseSize could silently overflow on 32-bit systems if values exceed math.MaxInt32. While 32-bit deployments are uncommon, explicit bounds checking would be defensive.

🔧 Suggested defensive check
+	import "math"
+
 	// Create gRPC server with max message size configuration.
+	if *maxRequestSize > uint(math.MaxInt32) || *maxResponseSize > uint(math.MaxInt32) {
+		logger.Fatal().Msg("message size exceeds maximum int value")
+	}
 	grpcServer := grpc.NewServer(
 		grpc.MaxRecvMsgSize(int(*maxRequestSize)),
 		grpc.MaxSendMsgSize(int(*maxResponseSize)),
 	)

174-188: Consider adding a timeout for ledger shutdown.

The graceful shutdown waits indefinitely on <-ledgerStorage.Done(). If the ledger hangs during shutdown, the process will never exit. Consider adding a context with timeout.

🔧 Suggested timeout for shutdown
 	logger.Info().Msg("waiting for ledger to stop...")
-	<-ledgerStorage.Done()
+	select {
+	case <-ledgerStorage.Done():
+		logger.Info().Msg("ledger stopped cleanly")
+	case <-time.After(30 * time.Second):
+		logger.Warn().Msg("ledger shutdown timed out")
+	}
 
 	logger.Info().Msg("ledger service stopped")
ledger/complete/ledger_with_compactor.go (1)

117-143: Add a public method to close the trie update channel instead of accessing private fields.

LedgerWithCompactor.Done() directly accesses private fields (closeTrieUpdateCh, trieUpdateCh) from Ledger, breaking encapsulation and creating unnecessary coupling. The shutdown order is correct—closing the channel first allows the compactor to drain remaining updates—and sync.Once properly prevents double-close panics. However, expose a public method on Ledger (e.g., CloseTrieUpdateChannel()) so the wrapper can trigger the closure without reaching into private fields. This maintains the same safety guarantees while following Go visibility conventions.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 682d78f and d2cdcfa.

📒 Files selected for processing (10)
  • cmd/ledger/README.md
  • cmd/ledger/main.go
  • integration/localnet/builder/bootstrap.go
  • ledger/complete/ledger.go
  • ledger/complete/ledger_test.go
  • ledger/complete/ledger_with_compactor.go
  • ledger/ledger.go
  • ledger/partial/ledger.go
  • ledger/partial/ledger_test.go
  • ledger/remote/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/ledger/README.md
🧰 Additional context used
📓 Path-based instructions (6)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • ledger/ledger.go
  • cmd/ledger/main.go
  • ledger/remote/client.go
  • ledger/complete/ledger.go
  • ledger/partial/ledger_test.go
  • ledger/complete/ledger_test.go
  • ledger/partial/ledger.go
  • ledger/complete/ledger_with_compactor.go
  • integration/localnet/builder/bootstrap.go
{crypto,fvm,ledger,storage}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Security checks for cryptographic misuse must be enforced using gosec

Files:

  • ledger/ledger.go
  • ledger/remote/client.go
  • ledger/complete/ledger.go
  • ledger/partial/ledger_test.go
  • ledger/complete/ledger_test.go
  • ledger/partial/ledger.go
  • ledger/complete/ledger_with_compactor.go
{crypto,fvm,ledger,access,engine}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Files:

  • ledger/ledger.go
  • ledger/remote/client.go
  • ledger/complete/ledger.go
  • ledger/partial/ledger_test.go
  • ledger/complete/ledger_test.go
  • ledger/partial/ledger.go
  • ledger/complete/ledger_with_compactor.go
{storage,ledger,execution,fvm}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

State consistency is paramount; use proper synchronization primitives

Files:

  • ledger/ledger.go
  • ledger/remote/client.go
  • ledger/complete/ledger.go
  • ledger/partial/ledger_test.go
  • ledger/complete/ledger_test.go
  • ledger/partial/ledger.go
  • ledger/complete/ledger_with_compactor.go
{module,engine,cmd}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

All major processing components must implement the Component interface from /module/component/component.go to ensure consistent lifecycle management and graceful shutdown patterns

Files:

  • cmd/ledger/main.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of *_test.go files for test naming
Use fixtures for realistic test data as defined in /utils/unittest/

Files:

  • ledger/partial/ledger_test.go
  • ledger/complete/ledger_test.go
🧠 Learnings (2)
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives

Applied to files:

  • cmd/ledger/main.go
  • ledger/remote/client.go
  • ledger/complete/ledger.go
  • ledger/complete/ledger_test.go
  • integration/localnet/builder/bootstrap.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {crypto,fvm,ledger,access,engine}/**/*.go : Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Applied to files:

  • cmd/ledger/main.go
  • ledger/remote/client.go
  • integration/localnet/builder/bootstrap.go
🧬 Code graph analysis (4)
ledger/remote/client.go (2)
ledger/ledger.go (9)
  • State (155-155)
  • DummyState (159-159)
  • QuerySingleValue (88-91)
  • Value (350-350)
  • Key (203-205)
  • Query (52-55)
  • Update (109-113)
  • Proof (188-188)
  • KeyPart (289-292)
ledger/trie_encoder.go (2)
  • DecodeTrieUpdateCBOR (719-730)
  • Type (30-30)
ledger/complete/ledger.go (2)
ledger/ledger.go (3)
  • Ledger (21-49)
  • State (155-155)
  • DummyState (159-159)
ledger/trie.go (1)
  • RootHash (164-164)
ledger/partial/ledger.go (1)
ledger/ledger.go (3)
  • Ledger (21-49)
  • State (155-155)
  • DummyState (159-159)
integration/localnet/builder/bootstrap.go (3)
integration/testnet/network.go (1)
  • GRPCPort (95-95)
model/bootstrap/filenames.go (2)
  • DirnameExecutionState (14-14)
  • FilenameWALRootCheckpoint (31-31)
ledger/complete/wal/checkpointer.go (1)
  • SoftlinkCheckpointFile (1091-1149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)
🔇 Additional comments (29)
integration/localnet/builder/bootstrap.go (6)

20-21: LGTM!

The new imports are appropriately aliased and used correctly for checkpoint symlinking and bootstrap filename constants.


67-67: LGTM!

The new flag with default value 0 maintains backward compatibility—all execution nodes use local ledger unless explicitly configured otherwise.

Also applies to: 110-110


183-189: LGTM!

The validation ensures ledgerExecutionCount is within the valid range [0, executionCount] with clear error messages.


203-206: LGTM!

The conditional creation of the ledger service is correctly placed after flow services are prepared, ensuring execution nodes can be configured with their dependencies first.


702-718: LGTM with a note.

The modification handles the edge case where a directory exists at the target file path. This is a reasonable defensive addition. The function is called for DockerComposeFile and PrometheusTargetsFile, both of which should always be regular files.


821-841: Symlink resolution is correct and will work inside the container.

The implementation of SoftlinkCheckpointFile correctly creates relative symlinks using filepath.Rel() to calculate the path from the symlink location (in /trie) to the target files (in /bootstrap/execution-state). The relative path calculated on the host (e.g., ../bootstrap/execution-state/checkpoint) preserves the same directory structure when both directories are mounted in the container. Since both /trie and /bootstrap are mounted as volumes, the symlinks resolve correctly inside the container. No code changes are needed.

ledger/ledger.go (1)

42-48: LGTM! Interface additions are well-documented.

The new StateCount() and StateByIndex() methods cleanly extend the Ledger interface. The documentation for -1 as the last index convention is clear.

ledger/partial/ledger.go (1)

168-172: LGTM!

The StateCount() implementation correctly returns 1 and the documentation clearly explains the partial ledger's single-state constraint.

ledger/partial/ledger_test.go (1)

173-208: LGTM!

The test properly validates the partial ledger's single-state constraint with thorough setup and cleanup.

ledger/complete/ledger_test.go (3)

817-854: LGTM!

The test validates state counting behavior correctly. The unused variable suppression on line 853 is acceptable for test clarity.


856-926: LGTM! Comprehensive test coverage for StateByIndex.

The test thoroughly covers:

  • Initial state retrieval at index 0
  • Last state retrieval using -1
  • Multiple updates and subsequent state access
  • Consistency between -1 and count-1 indices
  • Out-of-range error handling for both positive and negative indices

928-982: LGTM! Good integration test with real WAL and compactor.

The test exercises the LedgerWithCompactor wrapper, ensuring the state counting and index-based retrieval work correctly through the lifecycle-managed ledger.

ledger/complete/ledger.go (2)

44-44: LGTM! Good use of sync.Once for safe channel closure.

Using sync.Once to close trieUpdateCh prevents potential double-close panics if Done() is called multiple times. This aligns with the coding guidelines requiring proper synchronization primitives for state consistency.

Also applies to: 115-117


465-468: LGTM!

Simple and correct delegation to the existing ForestSize() method.

cmd/ledger/main.go (5)

23-32: Flag definitions look reasonable for the service configuration.

The flags provide sensible defaults and cover the necessary configuration options for the standalone ledger service.


34-48: Input validation and logging setup are correct.

The triedir validation and log level parsing with proper error handling follow good practices.


61-76: Ledger creation via factory is well-structured.

The factory pattern usage with proper error handling on Fatal is appropriate for a standalone service.


85-99: Health check validates ledger state availability.

The StateCount and StateByIndex checks ensure the ledger is properly initialized before accepting requests.


126-142: Unix socket cleanup and permission handling is reasonable.

Removing stale socket files on startup is appropriate. Warning but continuing on chmod failure is acceptable since the socket may still be usable with default permissions.

ledger/remote/client.go (6)

96-112: HasState silently returns false on gRPC error.

Similar to InitialState, this method swallows errors and returns false, which callers cannot distinguish from a legitimate "state not found" response. Consider whether this silent failure mode is acceptable for the use cases.


287-300: Done() goroutine may cause race if called concurrently with other operations.

The sync.Once ensures Close() is called once, but the goroutine closes the connection asynchronously. If Done() is called while other gRPC operations are in-flight, they may fail unexpectedly. Consider whether synchronous close or a graceful drain period is needed.


130-139: Nil vs empty byte slice handling is correctly implemented.

The is_nil flag usage properly preserves the distinction between nil and []byte{} that protobuf would otherwise lose. This is consistent with the server-side handling.


216-223: CBOR decoding for TrieUpdate preserves nil/empty semantics.

The use of DecodeTrieUpdateCBOR is appropriate here based on the previous discussion about encoding choices.


318-330: Key conversion helper is straightforward and correct.

The ledgerKeyToProtoKey function correctly maps ledger key parts to protobuf representation.


302-316: StateCount and StateByIndex are intentionally unimplemented for remote clients.

These methods are not supported because the gRPC service (ledger.proto) does not expose these operations. Returning 0 and an error respectively is intentional and documented. The code correctly reflects that health checks relying on StateCount() > 0 will fail for remote clients, which is the intended behavior given the current gRPC API limitations.

No changes are needed unless StateCount and StateByIndex are added to the gRPC service definition in a future update.

ledger/complete/ledger_with_compactor.go (4)

16-20: Wrapper struct cleanly encapsulates Ledger and Compactor.

The design hides the compactor as an implementation detail while managing both lifecycles together.


25-61: Constructor properly wires Ledger and Compactor with appropriate error handling.

The construction sequence and error wrapping follow good practices.


63-97: Delegated interface methods are straightforward pass-through.

All ledger operations correctly delegate to the underlying Ledger instance.


99-115: Ready() correctly sequences ledger initialization before compactor startup.

The sequential wait ensures WAL replay completes before the compactor starts processing.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +459 to +475
// Configure ledger service: execution nodes with index < ledgerExecutionCount use remote ledger
if i < ledgerExecutionCount {
// This execution node uses remote ledger service
service.Command = append(service.Command,
fmt.Sprintf("--ledger-service-addr=ledger_service_1:%s", testnet.GRPCPort),
)
// Execution node depends on ledger service
service.DependsOn = append(service.DependsOn, "ledger_service_1")
// Execution nodes using remote ledger should NOT mount the trie directory
// because the ledger service manages it
} else {
// Execution nodes with index >= ledgerExecutionCount use local ledger by default (no flag needed)
// These nodes need to mount the trie directory for their local ledger
service.Volumes = append(service.Volumes,
fmt.Sprintf("%s:/trie:z", trieDir),
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Address mismatch between execution node and ledger service configuration.

The execution node is configured to connect to the ledger service via TCP at ledger_service_1:9000 (line 463), but the ledger service in prepareLedgerService (line 856) is configured to listen on a Unix socket: --ledger-service-addr=unix:///sockets/ledger.sock.

This mismatch will cause connection failures at runtime.

🐛 Proposed fix: Use TCP address for ledger service

Either update the execution node to use the Unix socket (requires shared volume), or update the ledger service to listen on TCP. The simpler fix is to use TCP for the ledger service:

In prepareLedgerService, change line 856:

-		"--ledger-service-addr=unix:///sockets/ledger.sock",
+		fmt.Sprintf("--ledger-service-addr=0.0.0.0:%s", testnet.GRPCPort),

Comment on lines +210 to +256
func TestPartialLedger_StateByIndex(t *testing.T) {
w := &fixtures.NoopWAL{}

l, err := complete.NewLedger(w, 100, &metrics.NoopCollector{}, zerolog.Logger{}, complete.DefaultPathFinderVersion)
require.NoError(t, err)

compactor := fixtures.NewNoopCompactor(l)
<-compactor.Ready()

defer func() {
<-l.Done()
<-compactor.Done()
}()

// create update and get proof
state := l.InitialState()
keys := testutils.RandomUniqueKeys(2, 2, 2, 4)
values := testutils.RandomValues(2, 1, 32)
update, err := ledger.NewUpdate(state, keys, values)
require.NoError(t, err)

newState, _, err := l.Set(update)
require.NoError(t, err)

query, err := ledger.NewQuery(newState, keys)
require.NoError(t, err)
proof, err := l.Prove(query)
require.NoError(t, err)

pled, err := partial.NewLedger(proof, newState, partial.DefaultPathFinderVersion)
require.NoError(t, err)

// Test getting state at index 0 (only valid index for partial ledger)
state0, err := pled.StateByIndex(0)
require.NoError(t, err)
assert.Equal(t, newState, state0, "state at index 0 should match the ledger state")

// Test that other indices fail
_, err = pled.StateByIndex(1)
require.Error(t, err, "should error for index out of range")

_, err = pled.StateByIndex(-1)
require.Error(t, err, "should error for negative index (partial ledger only supports index 0)")

_, err = pled.StateByIndex(-2)
require.Error(t, err, "should error for negative index out of range")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test expects -1 to error, contradicting interface contract.

Line 251-252 expects StateByIndex(-1) to return an error, but the Ledger interface documents -1 as the last index. If the partial ledger implementation is updated to support -1, this test assertion will need adjustment.

Proposed test fix if implementation is updated
-	_, err = pled.StateByIndex(-1)
-	require.Error(t, err, "should error for negative index (partial ledger only supports index 0)")
+	// -1 should work as "last index" per interface contract
+	stateMinus1, err := pled.StateByIndex(-1)
+	require.NoError(t, err)
+	assert.Equal(t, newState, stateMinus1, "state at index -1 should match the ledger state")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestPartialLedger_StateByIndex(t *testing.T) {
w := &fixtures.NoopWAL{}
l, err := complete.NewLedger(w, 100, &metrics.NoopCollector{}, zerolog.Logger{}, complete.DefaultPathFinderVersion)
require.NoError(t, err)
compactor := fixtures.NewNoopCompactor(l)
<-compactor.Ready()
defer func() {
<-l.Done()
<-compactor.Done()
}()
// create update and get proof
state := l.InitialState()
keys := testutils.RandomUniqueKeys(2, 2, 2, 4)
values := testutils.RandomValues(2, 1, 32)
update, err := ledger.NewUpdate(state, keys, values)
require.NoError(t, err)
newState, _, err := l.Set(update)
require.NoError(t, err)
query, err := ledger.NewQuery(newState, keys)
require.NoError(t, err)
proof, err := l.Prove(query)
require.NoError(t, err)
pled, err := partial.NewLedger(proof, newState, partial.DefaultPathFinderVersion)
require.NoError(t, err)
// Test getting state at index 0 (only valid index for partial ledger)
state0, err := pled.StateByIndex(0)
require.NoError(t, err)
assert.Equal(t, newState, state0, "state at index 0 should match the ledger state")
// Test that other indices fail
_, err = pled.StateByIndex(1)
require.Error(t, err, "should error for index out of range")
_, err = pled.StateByIndex(-1)
require.Error(t, err, "should error for negative index (partial ledger only supports index 0)")
_, err = pled.StateByIndex(-2)
require.Error(t, err, "should error for negative index out of range")
}
func TestPartialLedger_StateByIndex(t *testing.T) {
w := &fixtures.NoopWAL{}
l, err := complete.NewLedger(w, 100, &metrics.NoopCollector{}, zerolog.Logger{}, complete.DefaultPathFinderVersion)
require.NoError(t, err)
compactor := fixtures.NewNoopCompactor(l)
<-compactor.Ready()
defer func() {
<-l.Done()
<-compactor.Done()
}()
// create update and get proof
state := l.InitialState()
keys := testutils.RandomUniqueKeys(2, 2, 2, 4)
values := testutils.RandomValues(2, 1, 32)
update, err := ledger.NewUpdate(state, keys, values)
require.NoError(t, err)
newState, _, err := l.Set(update)
require.NoError(t, err)
query, err := ledger.NewQuery(newState, keys)
require.NoError(t, err)
proof, err := l.Prove(query)
require.NoError(t, err)
pled, err := partial.NewLedger(proof, newState, partial.DefaultPathFinderVersion)
require.NoError(t, err)
// Test getting state at index 0 (only valid index for partial ledger)
state0, err := pled.StateByIndex(0)
require.NoError(t, err)
assert.Equal(t, newState, state0, "state at index 0 should match the ledger state")
// Test that other indices fail
_, err = pled.StateByIndex(1)
require.Error(t, err, "should error for index out of range")
// -1 should work as "last index" per interface contract
stateMinus1, err := pled.StateByIndex(-1)
require.NoError(t, err)
assert.Equal(t, newState, stateMinus1, "state at index -1 should match the ledger state")
_, err = pled.StateByIndex(-2)
require.Error(t, err, "should error for negative index out of range")
}
🤖 Prompt for AI Agents
In `@ledger/partial/ledger_test.go` around lines 210 - 256, The test
TestPartialLedger_StateByIndex incorrectly asserts that StateByIndex(-1) should
error even though the Ledger interface documents -1 as the last index; update
the test to treat -1 as valid for the partial ledger by calling
pled.StateByIndex(-1) and asserting no error and that the returned state equals
newState (keep assertions that StateByIndex(1) and StateByIndex(-2) error);
reference the test function TestPartialLedger_StateByIndex and the partial
ledger method StateByIndex (constructed via partial.NewLedger) when making this
change.

Comment on lines +174 to +181
// StateByIndex returns the state at the given index
// Partial ledger only has one state
func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
if index == 0 {
return l.state, nil
}
return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent handling of -1 index with interface contract.

The Ledger interface documents that -1 is the last index, but StateByIndex in partial ledger returns an error for -1. For a single-state ledger, -1 should resolve to index 0.

Proposed fix to support -1 index
 func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
-	if index == 0 {
+	if index == 0 || index == -1 {
 		return l.state, nil
 	}
 	return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// StateByIndex returns the state at the given index
// Partial ledger only has one state
func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
if index == 0 {
return l.state, nil
}
return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
}
// StateByIndex returns the state at the given index
// Partial ledger only has one state
func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
if index == 0 || index == -1 {
return l.state, nil
}
return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
}
🤖 Prompt for AI Agents
In `@ledger/partial/ledger.go` around lines 174 - 181, StateByIndex currently
rejects index == -1 even though the Ledger interface treats -1 as the last
index; update the Ledger.StateByIndex method to treat index == -1 as index 0 for
the partial ledger: if index == 0 || index == -1 return l.state, nil, otherwise
return ledger.DummyState with the existing out-of-range error message. Modify
the conditional in the StateByIndex function (referencing l.state,
ledger.DummyState, and the method name StateByIndex) to handle -1 as described
and keep the same error formatting for other indices.

Comment on lines +257 to +282
// Retry with exponential backoff for up to 30 seconds
ctx := context.Background()
maxRetries := 30
retryDelay := 100 * time.Millisecond

for i := 0; i < maxRetries; i++ {
_, err := c.client.InitialState(ctx, &emptypb.Empty{})
if err == nil {
c.logger.Info().Msg("ledger service ready")
return
}

if i < maxRetries-1 {
c.logger.Debug().
Err(err).
Int("attempt", i+1).
Dur("retry_delay", retryDelay).
Msg("ledger service not ready, retrying...")
time.Sleep(retryDelay)
retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
} else {
c.logger.Warn().Err(err).Msg("ledger service not ready after retries, proceeding anyway")
// Still close the channel to avoid blocking forever
// The execution node will fail later with a more specific error if the service is truly not ready
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Exponential backoff has no upper bound, leading to extremely long delays.

The retry delay grows unbounded: after ~20 iterations, delays exceed 1 hour each. With 30 max retries and no cap, the total wait time becomes astronomical. Cap the delay at a reasonable maximum.

🐛 Proposed fix to cap the backoff delay
 	maxRetries := 30
 	retryDelay := 100 * time.Millisecond
+	maxRetryDelay := 5 * time.Second

 	for i := 0; i < maxRetries; i++ {
 		_, err := c.client.InitialState(ctx, &emptypb.Empty{})
 		if err == nil {
 			c.logger.Info().Msg("ledger service ready")
 			return
 		}

 		if i < maxRetries-1 {
 			c.logger.Debug().
 				Err(err).
 				Int("attempt", i+1).
 				Dur("retry_delay", retryDelay).
 				Msg("ledger service not ready, retrying...")
 			time.Sleep(retryDelay)
-			retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
+			retryDelay = time.Duration(float64(retryDelay) * 1.5)
+			if retryDelay > maxRetryDelay {
+				retryDelay = maxRetryDelay
+			}
 		} else {
🤖 Prompt for AI Agents
In `@ledger/remote/client.go` around lines 257 - 282, The exponential backoff in
the retry loop (uses ctx := context.Background(), maxRetries, retryDelay and
c.client.InitialState) has no upper bound; add a cap such as maxRetryDelay
(e.g., 5s or configurable) and after computing retryDelay =
time.Duration(float64(retryDelay) * 1.5) clamp it so retryDelay =
min(retryDelay, maxRetryDelay) (or use an if to set to maxRetryDelay), ensuring
the logged Dur("retry_delay", retryDelay) reflects the capped value.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@ledger/remote/client.go`:
- Around line 179-224: In Set, guard against nil gRPC responses by first
verifying resp != nil and resp.NewState != nil before accessing
resp.NewState.Hash (and return ledger.DummyState with an error if they are nil),
then validate len(resp.NewState.Hash) as before; also defensively handle
resp.TrieUpdate being nil by either treating it as an empty trie update (skip
decode and set trieUpdate = nil) or returning an explicit error, and only call
decodeTrieUpdateFromTransport when resp.TrieUpdate != nil; update error messages
to mention which field was missing (resp or resp.NewState or resp.TrieUpdate) to
aid debugging.
♻️ Duplicate comments (5)
ledger/remote/client.go (3)

27-52: Add a TLS option for gRPC transport credentials.

The client always uses insecure.NewCredentials(), leaving no path to secure production deployments.


75-94: Avoid returning DummyState on RPC/validation errors.

Logging and returning ledger.DummyState hides connectivity or data issues from callers.


249-286: Cap the exponential backoff delay.

Without a cap, retry delays grow unbounded and can stretch readiness to many minutes.

🐛 Proposed fix
 maxRetries := 30
 retryDelay := 100 * time.Millisecond
+maxRetryDelay := 5 * time.Second

 for i := 0; i < maxRetries; i++ {
 	_, err := c.client.InitialState(ctx, &emptypb.Empty{})
 	if err == nil {
 		c.logger.Info().Msg("ledger service ready")
 		return
 	}

 	if i < maxRetries-1 {
 		c.logger.Debug().
 			Err(err).
 			Int("attempt", i+1).
 			Dur("retry_delay", retryDelay).
 			Msg("ledger service not ready, retrying...")
 		time.Sleep(retryDelay)
 		retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
+		if retryDelay > maxRetryDelay {
+			retryDelay = maxRetryDelay
+		}
 	} else {
ledger/remote/service.go (2)

150-168: Handle nil Value elements in Set requests.

req.Values is []*ledgerpb.Value; nil entries will panic on Data/IsNil access.

🐛 Proposed fix
 values := make([]ledger.Value, len(req.Values))
 for i, protoValue := range req.Values {
+	if protoValue == nil {
+		return nil, status.Error(codes.InvalidArgument, "value at index is nil")
+	}
 	var value ledger.Value
 	// Reconstruct the original value type using is_nil flag
 	// This preserves the distinction between nil and []byte{} that protobuf loses

As per coding guidelines, all inputs must be considered potentially byzantine.


225-237: Guard against nil KeyPart elements.

protoKey.Parts can contain nil entries, which will panic when accessing part.Type.

🐛 Proposed fix
 keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
 for i, part := range protoKey.Parts {
+	if part == nil {
+		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part is nil")
+	}
 	if part.Type > 65535 {
 		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
 	}
 	keyParts[i] = ledger.NewKeyPart(uint16(part.Type), part.Value)

As per coding guidelines, all inputs must be considered potentially byzantine.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2cdcfa and 0ef2d4c.

📒 Files selected for processing (3)
  • ledger/remote/client.go
  • ledger/remote/encoding.go
  • ledger/remote/service.go
🧰 Additional context used
📓 Path-based instructions (4)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • ledger/remote/client.go
  • ledger/remote/encoding.go
  • ledger/remote/service.go
{crypto,fvm,ledger,storage}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Security checks for cryptographic misuse must be enforced using gosec

Files:

  • ledger/remote/client.go
  • ledger/remote/encoding.go
  • ledger/remote/service.go
{crypto,fvm,ledger,access,engine}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Files:

  • ledger/remote/client.go
  • ledger/remote/encoding.go
  • ledger/remote/service.go
{storage,ledger,execution,fvm}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

State consistency is paramount; use proper synchronization primitives

Files:

  • ledger/remote/client.go
  • ledger/remote/encoding.go
  • ledger/remote/service.go
🧠 Learnings (3)
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {crypto,fvm,ledger,access,engine}/**/*.go : Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Applied to files:

  • ledger/remote/client.go
  • ledger/remote/service.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives

Applied to files:

  • ledger/remote/client.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented

Applied to files:

  • ledger/remote/service.go
🧬 Code graph analysis (1)
ledger/remote/encoding.go (1)
ledger/trie_encoder.go (2)
  • EncodeTrieUpdateCBOR (702-714)
  • DecodeTrieUpdateCBOR (719-730)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)
🔇 Additional comments (5)
ledger/remote/encoding.go (1)

7-33: LGTM: centralized transport encoding/decoding is consistent.

ledger/remote/client.go (3)

67-73: LGTM for lifecycle helpers and straightforward RPC wrappers.

Also applies to: 96-112, 226-247, 288-331


141-177: Add nil checks for nested proto message fields, not repeated field elements.

The coding guideline requires defensive validation of all inputs. However, the concern should target nested message fields (resp.Value in GetSingleValue, resp.State in InitialState, resp.NewState in Set), which can be nil if the server doesn't initialize them. Proto3 guarantees that repeated field elements are never nil after successful unmarshaling, so resp.Values elements in the Get method are safe.

The actual vulnerable locations are:

  • GetSingleValue (line 132): resp.Value.Data access without null check on resp.Value
  • InitialState (line 85): resp.State.Hash access without null check on resp.State
  • Set: resp.NewState.Hash access without null check on resp.NewState

Likely an incorrect or invalid review comment.


114-139: Remove the suggested nil checks on gRPC response structures.

In proto3, when a gRPC call succeeds (no error returned), the response message and all nested message fields are guaranteed to be properly initialized by the proto deserialization layer. The generated gRPC code allocates the response object before deserialization, ensuring it's never nil when err == nil. All other methods in this file (InitialState, HasState, Get, Set, Prove) follow the same pattern—checking for errors, then directly accessing response fields—without nil checks on the response structure itself. Validating response data correctness (as done with len(resp.State.Hash)) is appropriate, but nil checks on the response struct itself are redundant and inconsistent with the codebase's established practices.

Likely an incorrect or invalid review comment.

ledger/remote/service.go (1)

15-126: LGTM for service scaffolding and request validation.

Also applies to: 192-223

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +179 to +224
// Set updates keys with new values at a specific state and returns the new state.
func (c *Client) Set(update *ledger.Update) (ledger.State, *ledger.TrieUpdate, error) {
ctx := context.Background()
state := update.State()
req := &ledgerpb.SetRequest{
State: &ledgerpb.State{
Hash: state[:],
},
Keys: make([]*ledgerpb.Key, len(update.Keys())),
Values: make([]*ledgerpb.Value, len(update.Values())),
}

for i, key := range update.Keys() {
req.Keys[i] = ledgerKeyToProtoKey(key)
}

for i, value := range update.Values() {
// Distinguish between nil and []byte{} for protobuf encoding
// Protobuf cannot distinguish them, so we use is_nil flag
isNil := value == nil
req.Values[i] = &ledgerpb.Value{
Data: value,
IsNil: isNil,
}
}

resp, err := c.client.Set(ctx, req)
if err != nil {
return ledger.DummyState, nil, fmt.Errorf("failed to set values: %w", err)
}

var newState ledger.State
if len(resp.NewState.Hash) != len(newState) {
return ledger.DummyState, nil, fmt.Errorf("invalid new state hash length")
}
copy(newState[:], resp.NewState.Hash)

// Decode trie update using centralized decoding function to ensure
// client and server use the same encoding method
trieUpdate, err := decodeTrieUpdateFromTransport(resp.TrieUpdate)
if err != nil {
return ledger.DummyState, nil, fmt.Errorf("failed to decode trie update: %w", err)
}

return newState, trieUpdate, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "client.go" -path "*/ledger/remote/*" | head -20

Repository: onflow/flow-go

Length of output: 83


🏁 Script executed:

git ls-files | grep -E "ledger.*\.go$" | head -20

Repository: onflow/flow-go

Length of output: 1083


🏁 Script executed:

fd "client.go" ledger/remote --type f

Repository: onflow/flow-go

Length of output: 81


🏁 Script executed:

wc -l ledger/remote/client.go

Repository: onflow/flow-go

Length of output: 85


🏁 Script executed:

sed -n '179,224p' ledger/remote/client.go

Repository: onflow/flow-go

Length of output: 1483


🏁 Script executed:

find . -name "*.proto" | grep -i ledger | head -20

Repository: onflow/flow-go

Length of output: 88


🏁 Script executed:

rg "message SetResponse" --type proto

Repository: onflow/flow-go

Length of output: 87


🏁 Script executed:

cat ledger/protobuf/ledger.proto | grep -A 20 "message SetResponse"

Repository: onflow/flow-go

Length of output: 392


🏁 Script executed:

cat ledger/protobuf/ledger.proto | grep -A 5 "service Ledger"

Repository: onflow/flow-go

Length of output: 325


🏁 Script executed:

cat ledger/protobuf/ledger.proto | grep -B 5 -A 15 "message State"

Repository: onflow/flow-go

Length of output: 1040


🏁 Script executed:

sed -n '1,100p' ledger/remote/client.go | tail -50

Repository: onflow/flow-go

Length of output: 1231


🏁 Script executed:

rg "resp\." ledger/remote/client.go | head -20

Repository: onflow/flow-go

Length of output: 572


🏁 Script executed:

sed -n '130,150p' ledger/remote/client.go

Repository: onflow/flow-go

Length of output: 675


Add nil checks for gRPC response fields in the Set method.

The code directly accesses resp.NewState.Hash without checking if resp or resp.NewState is nil. Per the coding guideline that all inputs must be considered potentially byzantine, a malformed server response with a missing or nil NewState field would cause a nil pointer panic. Add defensive validation before accessing the response field:

 resp, err := c.client.Set(ctx, req)
 if err != nil {
 	return ledger.DummyState, nil, fmt.Errorf("failed to set values: %w", err)
 }
+if resp == nil || resp.NewState == nil {
+	return ledger.DummyState, nil, fmt.Errorf("invalid response: missing new state")
+}

 var newState ledger.State
 if len(resp.NewState.Hash) != len(newState) {
🤖 Prompt for AI Agents
In `@ledger/remote/client.go` around lines 179 - 224, In Set, guard against nil
gRPC responses by first verifying resp != nil and resp.NewState != nil before
accessing resp.NewState.Hash (and return ledger.DummyState with an error if they
are nil), then validate len(resp.NewState.Hash) as before; also defensively
handle resp.TrieUpdate being nil by either treating it as an empty trie update
(skip decode and set trieUpdate = nil) or returning an explicit error, and only
call decodeTrieUpdateFromTransport when resp.TrieUpdate != nil; update error
messages to mention which field was missing (resp or resp.NewState or
resp.TrieUpdate) to aid debugging.

- docker-cross-build-execution-arm docker-push-execution-arm

# execution Ledger Service Build Commands
- docker-build-execution-ledger-with-adx docker-push-execution-ledger-with-adx
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @manny-yes for the changes to the ci builds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move ledger to standalone service

4 participants