POC Ledger Service #8309
base: master
Conversation
📝 Walkthrough
Adds a standalone gRPC Ledger service, factory-based ledger selection (local vs remote), remote client/server transport and protobufs, CBOR trie-update encoding, WAL file-locking and compactor wiring, metrics consolidation, localnet bootstrap support for ledger service containers, Makefile/CI image targets, and extensive tests and mocks.
Sequence Diagram(s)
sequenceDiagram
participant Exec as Execution Node
participant Factory as Ledger Factory
participant Client as Remote Client
participant Service as Ledger Service (gRPC)
participant Ledger as Local Ledger Storage
Exec->>Factory: NewLedger(config)
alt ledger-service-addr set (remote)
Factory->>Client: NewClient(grpcAddr, sizes)
Client->>Service: Dial & Ready probe
Exec->>Client: Get/Set/Prove RPC
Client->>Service: RPC call
Service->>Ledger: Query/Update/Prove
Ledger-->>Service: Result
Service-->>Client: Reply
Client-->>Exec: Return values/proofs
else local
Factory->>Ledger: NewLocalLedgerFactory -> NewLedger()
Ledger-->>Factory: ledger instance (+optional WAL)
Factory-->>Exec: Return local ledger
end
sequenceDiagram
participant Bootstrap as Localnet Bootstrap
participant LedgerSvc as Ledger Container
participant ExecNode as Execution Container
participant Trie as Shared Trie Dir
Bootstrap->>Bootstrap: read LEDGER_EXECUTION
alt ledger-execution > 0
Bootstrap->>LedgerSvc: build & start ledger_service container
LedgerSvc->>Trie: mount trie dir / copy checkpoint symlinks
Bootstrap->>ExecNode: start exec nodes with --ledger-service-addr (for indices < LEDGER_EXECUTION)
ExecNode->>LedgerSvc: RPC calls
LedgerSvc->>Trie: read/write trie
else ledger-execution == 0
Bootstrap->>ExecNode: start exec nodes with local trie mount
end
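The selection step in the first diagram can be illustrated with a simplified, self-contained Go sketch; the type and function names below are stand-ins for the real flow-go factory code, not its actual API:

```go
package main

import "fmt"

// Simplified stand-ins for the real flow-go types: the actual factory lives
// in ledger/factory and returns a ledger.Ledger plus an optional DiskWAL.
type Ledger interface{ Name() string }

type remoteLedger struct{ addr string }

func (r remoteLedger) Name() string { return "remote ledger client for " + r.addr }

type localLedger struct{}

func (localLedger) Name() string { return "local ledger with DiskWAL and compactor" }

// newLedger mirrors the selection shown in the diagram: a non-empty
// --ledger-service-addr selects the gRPC client, otherwise a local ledger
// (with WAL and compactor) is constructed.
func newLedger(ledgerServiceAddr string) (Ledger, error) {
	if ledgerServiceAddr != "" {
		return remoteLedger{addr: ledgerServiceAddr}, nil
	}
	return localLedger{}, nil
}

func main() {
	l, err := newLedger("ledger-service:9000")
	if err != nil {
		panic(err)
	}
	fmt.Println(l.Name())
}
```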
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Codecov Report: ❌ Patch coverage is …
require.Equal(t, decoded1, decoded2, "EncodeTrieUpdate and EncodeTrieUpdateCBOR should produce same TrieUpdate")
require.Equal(t, decoded2, decoded3, "EncodeTrieUpdateCBOR and EncodeTrieUpdateProtoBuf should produce same TrieUpdate")
Now we have 3 encoding methods for the TrieUpdate:
- EncodeTrieUpdate is used by DiskWAL to encode trie updates into WAL files.
- cbor.Marshal, which is always used by EncodeTrieUpdateCBOR, is used by the execution data provider to encode the ChunkExecutionData, which contains the TrieUpdate.
- proto.Marshal is used by the ledger gRPC server to encode the TrieUpdate included in the response of the ledger.Set method.
This test case ensures that, given the same trie update, encoding and decoding normalizes a nil value to an empty slice and the decoded values stay consistent across the encodings.
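In simplified form, the property this test asserts is that a nil payload value is normalized to an empty slice during encoding, so every encoder's decode output agrees. A stdlib-only illustration of that normalization idea (not the project's actual encoders):

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeValue mirrors what the encoders now do on the wire: a nil payload
// value is written out, and therefore decoded, as an empty slice.
func normalizeValue(v []byte) []byte {
	if v == nil {
		return []byte{}
	}
	return v
}

func main() {
	original := [][]byte{nil, {}, {0x01, 0x02}}

	// Simulate two independent encode/decode round trips (e.g. WAL encoding
	// vs. CBOR encoding); both normalize nil the same way.
	decodedA := make([][]byte, len(original))
	decodedB := make([][]byte, len(original))
	for i, v := range original {
		decodedA[i] = normalizeValue(v)
		decodedB[i] = normalizeValue(v)
	}

	// The consistency check the test performs: decoded values agree byte-for-byte.
	for i := range decodedA {
		if !bytes.Equal(decodedA[i], decodedB[i]) {
			panic(fmt.Sprintf("decoded values diverge at index %d", i))
		}
	}
	fmt.Println("decoded values are consistent across encodings")
}
```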
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ledger/complete/wal/checkpoint_v6_test.go (1)
450-457: Bug: compareFiles opens file1 twice instead of file1 and file2. Line 451 opens file1 again instead of file2, causing the function to always compare a file with itself. This renders the comparison meaningless, and tests like TestCheckpointV6IsDeterminstic will always pass regardless of actual file differences.
🐛 Proposed fix
- closable2, err := os.Open(file1)
+ closable2, err := os.Open(file2)
🤖 Fix all issues with AI agents
In @integration/localnet/builder/ports.go:
- Around line 56-60: The comment for the "ledger" port range is incorrect: with
start: 8000, end: 8100 and portCount: 2 the capacity is (8100 - 8000) / 2 = 50
services, not 20; update the inline comment in the "ledger" block to reflect the
correct capacity (50 ledger services) or adjust portCount/end values if you
intended 20 services so the math matches.
In @ledger/complete/wal/checkpointer_test.go:
- Around line 495-498: The helper randomlyModifyFile should not call t.Skip()
because that skips the whole test; instead change randomlyModifyFile to return
an error (or a bool) and have callers handle it: remove t.Skip() and either
return nil / false when fileSize==0 or return a specific error, update callers
in Test_Checkpointing’s loop to check the return value and continue the loop (or
t.Log and continue) when the file is empty. Ensure the helper signature
(randomlyModifyFile) and all call sites are updated accordingly so no test-level
skip occurs.
In @ledger/complete/wal/checkpointer.go:
- Around line 1120-1126: The call to filepath.Rel when computing
expectedRelTarget in checkpointer.go ignores its error, so if filepath.Rel fails
expectedRelTarget may be wrong and the subsequent comparison (variables:
symlinkDir, expectedRelTarget, match, target, newPath) can mis-detect correct
symlinks; fix by checking the error returned from filepath.Rel, and on error
fall back to a safe comparison (for example use match as the expected target or
compute absolute paths with filepath.Abs for both target and match before
comparing) so the condition if target == expectedRelTarget || target == match
reliably reflects the actual link target.
In @ledger/remote/client.go:
- Around line 45-55: The client always uses insecure.NewCredentials() in the
grpc.NewClient call, so add a configurable switch to use TLS credentials for
production: detect a config/env flag (e.g., LEDGER_GRPC_TLS or use a
TLSCert/TLSRootCA path) and, when enabled, create secure credentials (e.g., via
credentials.NewClientTLSFromFile or equivalent) and pass
grpc.WithTransportCredentials(creds) instead of insecure.NewCredentials(); keep
the existing grpcAddr and call option handling (grpc.MaxCallRecvMsgSize,
grpc.MaxCallSendMsgSize) and ensure errors from loading certs are returned
(update the grpc.NewClient construction around the grpcAddr, maxResponseSize,
maxRequestSize block).
- Around line 75-94: InitialState currently swallows errors and returns
ledger.DummyState, which breaks tests; change the Client.InitialState signature
to return (ledger.State, error) and update the implementation to propagate
errors from c.client.InitialState and from the invalid hash-length check (return
zero value state and the error instead of DummyState), and then update all
callers and the ledger.Client interface implementations to handle the new
(state, error) return so the remote client never silently returns DummyState on
failures.
In @ledger/remote/service.go:
- Around line 230-236: The loop over protoKey.Parts can panic if an element is
nil; update the loop that builds keyParts to check for nil before accessing
part.Type/value (i.e., inside the for i, part := range protoKey.Parts block),
and return a status.Error(codes.InvalidArgument, "...") when a nil part is
encountered; keep the existing uint16 bounds check for part.Type and continue to
call ledger.NewKeyPart only after validating part is non-nil and its Type fits
in uint16 so you populate the ledger.KeyPart slice safely.
- Around line 150-168: The loop over req.Values must guard against nil elements
to avoid panics; before reading protoValue.Data or protoValue.IsNil in the loop
that builds values (the slice of ledger.Value), add a nil check for protoValue
and handle it explicitly (e.g., treat a nil protoValue as an original nil
ledger.Value or return a clear error), then proceed with the existing logic that
distinguishes protoValue.IsNil vs empty Data to reconstruct the ledger.Value;
reference req.Values, protoValue, protoValue.Data, protoValue.IsNil and the
values slice when locating where to add the check.
In @ledger/trie_encoder_test.go:
- Around line 705-712: Fix the mismatched assertions/messages around
decoded.Payloads: replace the incorrect require.NotNil on
decoded.Payloads[0].Value() with require.Nil and update its message to "Decoded
Payload 0 should have nil value"; keep require.NotNil for
decoded.Payloads[1].Value() (empty slice) and ensure its message says "Decoded
Payload 1 should have non-nil value"; finally update the final require.Equal
message to reflect the intended expectation ("Decoded nil and []byte{} should be
equal after normalization") so the assertion text matches the check of
decoded.Payloads[0].Value() == decoded.Payloads[1].Value().
🧹 Nitpick comments (16)
ledger/common/pathfinder/pathfinder.go (1)
117-117: Minor: Comment still has a grammatical issue. The comment uses "an slice" which should be "a slice" (slice starts with a consonant sound).
📝 Suggested fix
-// UpdateToPayloads constructs an slice of payloads given ledger update.
+// UpdateToPayloads constructs a slice of payloads given a ledger update.
ledger/trie_encoder.go (1)
707-711: Consider returning an error instead of panicking on marshal failure. While CBOR marshalling of a TrieUpdate struct is unlikely to fail in practice, panicking violates the project's error handling guidelines, which state "NEVER log and continue on best effort basis; ALWAYS explicitly handle errors." Returning an error allows callers to handle the failure gracefully.
♻️ Suggested refactor
-func EncodeTrieUpdateCBOR(t *TrieUpdate) []byte {
+func EncodeTrieUpdateCBOR(t *TrieUpdate) ([]byte, error) {
 	if t == nil {
-		return []byte{}
+		return []byte{}, nil
 	}
 	encoded, err := cbor.Marshal(t)
 	if err != nil {
-		// This should not happen in normal operation
-		panic(fmt.Errorf("failed to encode trie update with CBOR: %w", err))
+		return nil, fmt.Errorf("failed to encode trie update with CBOR: %w", err)
 	}
-	return encoded
+	return encoded, nil
 }
Note: This would require updating callers to handle the returned error. Based on learnings, the project prefers comprehensive error wrapping over panics.
ledger/complete/factory.go (1)
21-38: Consider validating critical parameters. The constructor accepts parameters without validation. If compactorConfig is nil, the call to NewLedgerWithCompactor will panic when accessing compactorConfig fields. Consider adding a nil check or documenting that compactorConfig must not be nil.
♻️ Optional validation
 func NewLocalLedgerFactory(
 	ledgerWAL wal.LedgerWAL,
 	capacity int,
 	compactorConfig *ledger.CompactorConfig,
 	metrics module.LedgerMetrics,
 	logger zerolog.Logger,
 	pathFinderVersion uint8,
 ) ledger.Factory {
+	if compactorConfig == nil {
+		panic("compactorConfig must not be nil")
+	}
 	return &LocalLedgerFactory{
Alternatively, document this precondition in the function's doc comment.
utils/io/filelock_test.go (1)
383-444: Platform-specific tests are appropriately documented. The permission-denied tests (lines 383-444) are correctly documented as potentially platform-specific (may not work on Windows). The tests properly restore permissions in defer blocks to ensure cleanup.
Consider adding a skip condition for Windows if these tests are known to fail:
💡 Optional: Skip on Windows
if runtime.GOOS == "windows" {
	t.Skip("permission tests not reliable on Windows")
}
ledger/complete/ledger_with_compactor.go (1)
90-104: Consider error recovery in Ready() goroutine. The goroutine in Ready() doesn't have panic recovery. If either lwc.ledger.Ready() or lwc.compactor.Ready() panics, the ready channel will never close, potentially causing callers to block indefinitely. This follows the existing pattern in the codebase but is worth noting for robustness; a sketch of one possible shape follows below.
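If panic recovery were added, one self-contained shape for the readiness wait could look like the following; this is illustrative only and does not follow the flow-go Component interface:

```go
package main

import (
	"fmt"
	"time"
)

// ready waits on two sub-component ready channels but guarantees the
// returned channel is closed even if the wait panics, so callers never
// block forever. Simplified sketch, not the project's actual pattern.
func ready(ledgerReady, compactorReady <-chan struct{}) <-chan struct{} {
	out := make(chan struct{})
	go func() {
		defer close(out) // always unblock callers
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("recovered while waiting for readiness:", r)
			}
		}()
		<-ledgerReady
		<-compactorReady
	}()
	return out
}

func main() {
	a, b := make(chan struct{}), make(chan struct{})
	go func() {
		close(a)
		time.Sleep(10 * time.Millisecond)
		close(b)
	}()
	<-ready(a, b)
	fmt.Println("both sub-components ready")
}
```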
.github/workflows/image_builds.yml (1)
167-167: Update actions/checkout to v4. The actions/checkout@v3 action is outdated. Update to actions/checkout@v4 for security updates and continued compatibility.
Suggested fix
-- name: Checkout repo
-  uses: actions/checkout@v3
+- name: Checkout repo
+  uses: actions/checkout@v4
Apply this change at both lines 167 and 199.
Also applies to: 199-199
ledger/remote/factory.go (1)
33-38: Minor simplification available. The function body can be simplified by returning the client directly.
✨ Suggested simplification
 func (f *RemoteLedgerFactory) NewLedger() (ledger.Ledger, error) {
-	client, err := NewClient(f.grpcAddr, f.logger, f.maxRequestSize, f.maxResponseSize)
-	if err != nil {
-		return nil, err
-	}
-	return client, nil
+	return NewClient(f.grpcAddr, f.logger, f.maxRequestSize, f.maxResponseSize)
 }
cmd/ledger/main.go (1)
125-132: Consider adding a shutdown timeout. grpcServer.GracefulStop() blocks until all active RPCs complete, which could hang indefinitely if a client maintains a long-running connection. Consider adding a timeout with a fallback to Stop().
⏱️ Suggested timeout-based shutdown
+import (
+	"context"
+	"time"
+)
 // Graceful shutdown
 logger.Info().Msg("shutting down gRPC server...")
-grpcServer.GracefulStop()
+
+// Give in-flight requests time to complete, then force stop
+shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+
+done := make(chan struct{})
+go func() {
+	grpcServer.GracefulStop()
+	close(done)
+}()
+
+select {
+case <-done:
+	logger.Info().Msg("gRPC server stopped gracefully")
+case <-shutdownCtx.Done():
+	logger.Warn().Msg("graceful shutdown timed out, forcing stop")
+	grpcServer.Stop()
+}
cmd/execution_builder.go (1)
937-959: Remove unused diskWAL field assignment or document its purpose. While the concern about result.WAL being nil for remote ledgers is valid, verification shows that exeNode.diskWAL is assigned at line 957 but never referenced anywhere else in the codebase. This appears to be dead code left over from refactoring. Either remove the assignment or add a comment explaining why this field is being stored.
ledger/remote/protobuf_encoding_test.go (1)
23-105: Tests simulate rather than verify actual protobuf encoding behavior. These tests are valuable as documentation of the nil vs []byte{} distinction issue. However, they simulate protobuf behavior (lines 68-79) rather than actually encoding/decoding via protobuf marshaling. Consider adding a brief note or renaming to clarify these are conceptual demonstrations, or add a companion test that performs actual proto.Marshal/proto.Unmarshal round-trips to verify the real behavior.
integration/localnet/builder/bootstrap.go (1)
702-718: Consider documenting why directory removal is needed. The new logic at lines 703-710 removes an existing directory before creating a file. While this handles an edge case, it's somewhat unexpected behavior for a function named openAndTruncate. A comment explaining the scenario that necessitated this change would help future maintainers understand why a directory might exist at a file path location.
📝 Suggested documentation
 func openAndTruncate(filename string) (*os.File, error) {
 	// Check if path exists and is a directory, remove it if so
+	// This can happen if a previous run created a directory at this location
+	// (e.g., due to a configuration change or bug)
 	if fi, err := os.Stat(filename); err == nil {
 		if fi.IsDir() {
ledger/factory/factory_test.go (2)
366-370: Potential port allocation race condition. The pattern of calling net.Listen(":0"), extracting the address, closing the listener, and then re-opening at that address (line 419) creates a small window where another process could claim the port. While unlikely in test environments, consider keeping the original listener open and using it directly.
♻️ Suggested fix
 func startLedgerServer(t *testing.T, walDir string) (string, func()) {
-	// Find an available port
-	listener, err := net.Listen("tcp", ":0")
-	require.NoError(t, err)
-	addr := listener.Addr().String()
-	listener.Close()
-
 	logger := unittest.Logger()
 	// Create WAL
@@ -416,8 +410,11 @@ func startLedgerServer(t *testing.T, walDir string) (string, func()) {
 	ledgerpb.RegisterLedgerServiceServer(grpcServer, ledgerService)
 	// Start gRPC server
-	lis, err := net.Listen("tcp", addr)
+	lis, err := net.Listen("tcp", ":0")
 	require.NoError(t, err)
+	addr := lis.Addr().String()
 	// Start server in goroutine
 	serverErr := make(chan error, 1)
@@ -428,8 +425,6 @@ func startLedgerServer(t *testing.T, walDir string) (string, func()) {
 	}()
-	// Wait a bit for server to start
-	time.Sleep(100 * time.Millisecond)
-
 	// Cleanup function
 	cleanup := func() {
The static analysis warning about binding to :0 is a false positive for test code—this is a standard pattern for allocating random ports in tests.
494-506: Cleanup order may cause issues. The defer block calls <-remoteLedger.Done() first, then <-localResult.WAL.Done(), then <-localLedger.Done(). However, the local ledger likely depends on the WAL, so stopping the WAL before the ledger could cause issues. Consider reversing the local cleanup order.
♻️ Suggested cleanup order
 defer func() {
 	// Stop remote ledger
 	<-remoteLedger.Done()
-	// Stop local ledger and WAL
+	// Stop local ledger first, then WAL (ledger may depend on WAL)
+	<-localLedger.Done()
 	if localResult.WAL != nil {
 		<-localResult.WAL.Done()
 	}
-	<-localLedger.Done()
 	// Stop server
 	serverCleanup()
 }()
ledger/factory/factory.go (1)
116-124: Type assertion may be fragile. The type assertion diskWal.(*wal.DiskWAL) at line 120 will fail if wal.NewDiskWAL ever returns a different type implementing wal.LedgerWAL. Since NewDiskWAL is called just above and returns *DiskWAL, this is currently safe, but the interface-to-concrete-type pattern could break if the constructor changes.
Consider documenting this assumption or using the concrete type directly:
♻️ Alternative: use concrete type directly
 func NewLedger(config Config) (*Result, error) {
 	var factory ledger.Factory
-	var diskWal wal.LedgerWAL
+	var diskWAL *wal.DiskWAL
 	// Check if remote ledger service is configured
 	if config.LedgerServiceAddr != "" {
@@ -75,7+75,7 @@ func NewLedger(config Config) (*Result, error) {
 	// Create WAL
 	var err error
-	diskWal, err = wal.NewDiskWAL(
+	diskWAL, err = wal.NewDiskWAL(
 		config.Logger.With().Str("subcomponent", "wal").Logger(),
 		config.MetricsRegisterer,
 		config.WALMetrics,
@@ -100,7 +100,7 @@ func NewLedger(config Config) (*Result, error) {
 	// Use factory to create ledger with internal compactor
 	factory = complete.NewLocalLedgerFactory(
-		diskWal,
+		diskWAL,
 		int(config.MTrieCacheSize),
 		compactorConfig,
 		config.LedgerMetrics,
@@ -113,16 +113,8 @@ func NewLedger(config Config) (*Result, error) {
 		return nil, fmt.Errorf("failed to create ledger: %w", err)
 	}
-	// Type assert to get the concrete DiskWAL type (only for local ledger)
-	var diskWAL *wal.DiskWAL
-	if diskWal != nil {
-		var ok bool
-		diskWAL, ok = diskWal.(*wal.DiskWAL)
-		if !ok {
-			return nil, fmt.Errorf("expected *wal.DiskWAL but got %T", diskWal)
-		}
-	}
-
 	return &Result{
 		Ledger: ledgerStorage,
 		WAL:    diskWAL,
ledger/protobuf/ledger.proto (1)
1-7: Consider addressing Buf linting suggestions for API versioning. Static analysis flags that the package lacks a version suffix (e.g., ledger.v1) and the file path doesn't match the package name. These are common protobuf best practices for API evolution and breaking change management. If this service may evolve with breaking changes, consider adopting versioned packages early. However, if this is an internal-only service with controlled clients, this can be deferred.
ledger/remote/service.go (1)
15-28: Logger is currently unused. The logger field is never used in any of the RPC handlers. If it's intended for future observability, consider adding structured logging for request tracing and error scenarios. If not needed, consider removing it to avoid confusion.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
ledger/protobuf/ledger.pb.go is excluded by !**/*.pb.go
ledger/protobuf/ledger_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (37)
.github/workflows/image_builds.yml
Makefile
cmd/bootstrap/run/execution_state.go
cmd/execution_builder.go
cmd/execution_config.go
cmd/ledger/README.md
cmd/ledger/main.go
go.mod
integration/localnet/Makefile
integration/localnet/builder/bootstrap.go
integration/localnet/builder/ports.go
ledger/common/pathfinder/pathfinder.go
ledger/complete/factory.go
ledger/complete/ledger.go
ledger/complete/ledger_with_compactor.go
ledger/complete/wal/checkpoint_v5_test.go
ledger/complete/wal/checkpoint_v6_test.go
ledger/complete/wal/checkpointer.go
ledger/complete/wal/checkpointer_test.go
ledger/complete/wal/wal.go
ledger/config.go
ledger/factory.go
ledger/factory/factory.go
ledger/factory/factory_test.go
ledger/protobuf/buf.gen.yaml
ledger/protobuf/buf.yaml
ledger/protobuf/ledger.proto
ledger/remote/client.go
ledger/remote/factory.go
ledger/remote/protobuf_encoding_test.go
ledger/remote/service.go
ledger/trie_encoder.go
ledger/trie_encoder_test.go
module/metrics/execution.go
module/metrics/ledger.go
utils/io/filelock.go
utils/io/filelock_test.go
🧰 Additional context used
📓 Path-based instructions (6)
**/*.go
📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)
Follow Go coding conventions as documented in @docs/agents/CodingConventions.md
Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md
**/*.go: Follow the existing module structure in/module/,/engine/,/model/and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: runmake generate-mocksafter interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoidfmt.Errorf, useirrecoverablepackage for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)
Files:
ledger/complete/wal/checkpoint_v6_test.goledger/factory.goledger/complete/wal/checkpointer_test.goledger/complete/wal/checkpoint_v5_test.goutils/io/filelock.goledger/complete/ledger.gocmd/bootstrap/run/execution_state.goledger/complete/wal/wal.goutils/io/filelock_test.gointegration/localnet/builder/ports.goledger/remote/protobuf_encoding_test.gocmd/ledger/main.goledger/factory/factory_test.gomodule/metrics/execution.gocmd/execution_config.goledger/factory/factory.goledger/config.goledger/complete/factory.gointegration/localnet/builder/bootstrap.goledger/trie_encoder_test.goledger/remote/factory.goledger/remote/client.goledger/common/pathfinder/pathfinder.gomodule/metrics/ledger.goledger/remote/service.goledger/complete/ledger_with_compactor.goledger/complete/wal/checkpointer.goledger/trie_encoder.gocmd/execution_builder.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of*_test.gofiles for test naming
Use fixtures for realistic test data as defined in/utils/unittest/
Files:
ledger/complete/wal/checkpoint_v6_test.goledger/complete/wal/checkpointer_test.goledger/complete/wal/checkpoint_v5_test.goutils/io/filelock_test.goledger/remote/protobuf_encoding_test.goledger/factory/factory_test.goledger/trie_encoder_test.go
{crypto,fvm,ledger,storage}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Security checks for cryptographic misuse must be enforced using gosec
Files:
ledger/complete/wal/checkpoint_v6_test.goledger/factory.goledger/complete/wal/checkpointer_test.goledger/complete/wal/checkpoint_v5_test.goledger/complete/ledger.goledger/complete/wal/wal.goledger/remote/protobuf_encoding_test.goledger/factory/factory_test.goledger/factory/factory.goledger/config.goledger/complete/factory.goledger/trie_encoder_test.goledger/remote/factory.goledger/remote/client.goledger/common/pathfinder/pathfinder.goledger/remote/service.goledger/complete/ledger_with_compactor.goledger/complete/wal/checkpointer.goledger/trie_encoder.go
{crypto,fvm,ledger,access,engine}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation
Files:
ledger/complete/wal/checkpoint_v6_test.goledger/factory.goledger/complete/wal/checkpointer_test.goledger/complete/wal/checkpoint_v5_test.goledger/complete/ledger.goledger/complete/wal/wal.goledger/remote/protobuf_encoding_test.goledger/factory/factory_test.goledger/factory/factory.goledger/config.goledger/complete/factory.goledger/trie_encoder_test.goledger/remote/factory.goledger/remote/client.goledger/common/pathfinder/pathfinder.goledger/remote/service.goledger/complete/ledger_with_compactor.goledger/complete/wal/checkpointer.goledger/trie_encoder.go
{storage,ledger,execution,fvm}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
State consistency is paramount; use proper synchronization primitives
Files:
ledger/complete/wal/checkpoint_v6_test.goledger/factory.goledger/complete/wal/checkpointer_test.goledger/complete/wal/checkpoint_v5_test.goledger/complete/ledger.goledger/complete/wal/wal.goledger/remote/protobuf_encoding_test.goledger/factory/factory_test.goledger/factory/factory.goledger/config.goledger/complete/factory.goledger/trie_encoder_test.goledger/remote/factory.goledger/remote/client.goledger/common/pathfinder/pathfinder.goledger/remote/service.goledger/complete/ledger_with_compactor.goledger/complete/wal/checkpointer.goledger/trie_encoder.go
{module,engine,cmd}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
All major processing components must implement the
Componentinterface from/module/component/component.goto ensure consistent lifecycle management and graceful shutdown patterns
Files:
cmd/bootstrap/run/execution_state.gocmd/ledger/main.gomodule/metrics/execution.gocmd/execution_config.gomodule/metrics/ledger.gocmd/execution_builder.go
🧠 Learnings (13)
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Use fixtures for realistic test data as defined in `/utils/unittest/`
Applied to files:
ledger/complete/wal/checkpoint_v6_test.goledger/complete/wal/checkpoint_v5_test.goutils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives
Applied to files:
ledger/complete/ledger.gocmd/bootstrap/run/execution_state.goledger/complete/wal/wal.goutils/io/filelock_test.gocmd/ledger/main.gointegration/localnet/builder/bootstrap.gointegration/localnet/Makefileledger/complete/wal/checkpointer.gocmd/execution_builder.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to {crypto,fvm,ledger,access,engine}/**/*.go : Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation
Applied to files:
cmd/bootstrap/run/execution_state.gocmd/ledger/main.gointegration/localnet/builder/bootstrap.gointegration/localnet/Makefileledger/complete/wal/checkpointer.goledger/trie_encoder.gocmd/execution_builder.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Unit tests should be co-located with the code they test
Applied to files:
utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*_test.go : Follow the existing pattern of `*_test.go` files for test naming
Applied to files:
utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to integration/tests/**/*.go : Integration tests should go in `/integration/tests/`
Applied to files:
utils/io/filelock_test.go
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Uses golangci-lint with custom configurations (`.golangci.yml`) and custom linters for Flow-specific conventions (struct write checking)
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:19.865Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: .cursor/rules/godocs.mdc:0-0
Timestamp: 2025-12-23T00:28:19.865Z
Learning: Applies to **/*.go : Follow Go coding standards and conventions as documented in docs/agents/GoDocs.md
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Use mock generators: run `make generate-mocks` after interface changes
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:14.267Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: .cursor/rules/coding_conventions.mdc:0-0
Timestamp: 2025-12-23T00:28:14.267Z
Learning: Applies to **/*.go : Follow Go coding conventions as documented in docs/agents/CodingConventions.md
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Follow the existing module structure in `/module/`, `/engine/`, `/model/` and use dependency injection patterns for component composition
Applied to files:
ledger/protobuf/buf.gen.yaml
📚 Learning: 2025-12-23T00:28:41.005Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:41.005Z
Learning: Applies to **/*.go : Implement proper interfaces before concrete types
Applied to files:
ledger/protobuf/buf.gen.yaml
🧬 Code graph analysis (16)
cmd/bootstrap/run/execution_state.go (2)
ledger/complete/ledger.go (1)
DefaultPathFinderVersion(23-23)model/flow/ledger.go (1)
DummyStateCommitment(283-283)
ledger/complete/wal/wal.go (2)
utils/io/filelock.go (2)
FileLock(15-18)NewFileLock(24-34)module/metrics.go (1)
WALMetrics(779-782)
utils/io/filelock_test.go (4)
fvm/fvm.go (1)
Run(78-85)utils/io/filelock.go (2)
NewFileLock(24-34)RemoveLockFile(101-107)ledger/trie.go (1)
Path(21-21)utils/unittest/fixtures/util.go (1)
NoError(6-10)
ledger/remote/protobuf_encoding_test.go (2)
ledger/protobuf/ledger.pb.go (3)
Value(153-165)Value(169-169)Value(170-172)fvm/evm/types/call.go (1)
Data(323-323)
cmd/ledger/main.go (6)
ledger/factory/factory.go (2)
NewLedger(46-130)Config(19-35)module/metrics.go (2)
WALMetrics(779-782)LedgerMetrics(726-777)ledger/complete/ledger.go (1)
Ledger(38-46)engine/access/rest/server.go (1)
NewServer(42-90)ledger/remote/service.go (1)
NewService(23-28)ledger/protobuf/ledger_grpc.pb.go (1)
RegisterLedgerServiceServer(161-163)
module/metrics/execution.go (1)
module/metrics/ledger.go (2)
LedgerCollector(19-43)NewLedgerCollector(49-172)
ledger/config.go (2)
cmd/scaffold.go (1)
Metrics(97-105)module/metrics.go (1)
WALMetrics(779-782)
ledger/complete/factory.go (6)
ledger/complete/wal/wal.go (1)
LedgerWAL(382-402)ledger/config.go (1)
CompactorConfig(10-16)module/metrics.go (1)
LedgerMetrics(726-777)ledger/factory.go (1)
Factory(5-9)ledger/complete/ledger.go (2)
NewLedger(49-87)Ledger(38-46)ledger/complete/ledger_with_compactor.go (1)
NewLedgerWithCompactor(25-61)
integration/localnet/builder/bootstrap.go (3)
integration/testnet/network.go (1)
GRPCPort(95-95)model/bootstrap/filenames.go (2)
DirnameExecutionState(14-14)FilenameWALRootCheckpoint(31-31)ledger/complete/wal/checkpointer.go (1)
SoftlinkCheckpointFile(1091-1149)
ledger/trie_encoder_test.go (2)
ledger/protobuf/ledger.pb.go (3)
Value(153-165)Value(169-169)Value(170-172)ledger/trie_encoder.go (4)
EncodeTrieUpdate(563-577)DecodeTrieUpdate(612-628)EncodeTrieUpdateCBOR(702-714)DecodeTrieUpdateCBOR(719-730)
ledger/remote/factory.go (3)
ledger/factory.go (1)
Factory(5-9)ledger/factory/factory.go (1)
NewLedger(46-130)ledger/remote/client.go (1)
NewClient(30-65)
ledger/remote/client.go (2)
ledger/protobuf/ledger.pb.go (21)
State(25-30)State(34-34)State(35-37)Value(153-165)Value(169-169)Value(170-172)Key(113-118)Key(122-122)Key(123-125)GetRequest(415-421)GetRequest(425-425)GetRequest(426-428)SetRequest(503-510)SetRequest(514-514)SetRequest(515-517)ProveRequest(607-613)ProveRequest(617-617)ProveRequest(618-620)KeyPart(65-71)KeyPart(75-75)KeyPart(76-78)ledger/trie_encoder.go (2)
DecodeTrieUpdateCBOR(719-730)Type(30-30)
module/metrics/ledger.go (1)
module/metrics.go (2)
LedgerMetrics(726-777)WALMetrics(779-782)
ledger/remote/service.go (2)
ledger/protobuf/ledger.pb.go (9)
Key(113-118)Key(122-122)Key(123-125)Value(153-165)Value(169-169)Value(170-172)KeyPart(65-71)KeyPart(75-75)KeyPart(76-78)ledger/trie_encoder.go (2)
EncodeTrieUpdateCBOR(702-714)Type(30-30)
ledger/complete/ledger_with_compactor.go (5)
ledger/complete/ledger.go (2)
Ledger(38-46)NewLedger(49-87)ledger/complete/wal/wal.go (1)
LedgerWAL(382-402)ledger/config.go (1)
CompactorConfig(10-16)module/metrics.go (1)
LedgerMetrics(726-777)cmd/scaffold.go (1)
Metrics(97-105)
cmd/execution_builder.go (3)
ledger/complete/ledger.go (1)
Ledger(38-46)ledger/trie.go (1)
RootHash(164-164)ledger/factory/factory.go (2)
NewLedger(46-130)Config(19-35)
🪛 actionlint (1.7.10)
.github/workflows/image_builds.yml
167-167: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
199-199: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 ast-grep (0.40.5)
ledger/factory/factory_test.go
[warning] 366-366: "Detected a network listener listening on 0.0.0.0 or an empty string.
This could unexpectedly expose the server publicly as it binds to all
available interfaces. Instead, specify another IP address that is not
0.0.0.0 nor the empty string."
Context: net.Listen("tcp", ":0")
Note: [CWE-200] Exposure of Sensitive Information to an Unauthorized Actor [REFERENCES]
- https://owasp.org/Top10/A01_2021-Broken_Access_Control
(avoid-bind-to-all-interfaces-go)
🪛 Buf (1.63.0)
ledger/protobuf/ledger.proto
3-3: Files with package "ledger" must be within a directory "ledger" relative to root but were in directory "ledger/protobuf".
(PACKAGE_DIRECTORY_MATCH)
3-3: Package name "ledger" should be suffixed with a correctly formed version, such as "ledger.v1".
(PACKAGE_VERSION_SUFFIX)
12-12: RPC request type "Empty" should be named "InitialStateRequest" or "LedgerServiceInitialStateRequest".
(RPC_REQUEST_STANDARD_NAME)
12-12: RPC response type "StateResponse" should be named "InitialStateResponse" or "LedgerServiceInitialStateResponse".
(RPC_RESPONSE_STANDARD_NAME)
15-15: RPC request type "StateRequest" should be named "HasStateRequest" or "LedgerServiceHasStateRequest".
(RPC_REQUEST_STANDARD_NAME)
18-18: RPC response type "ValueResponse" should be named "GetSingleValueResponse" or "LedgerServiceGetSingleValueResponse".
(RPC_RESPONSE_STANDARD_NAME)
27-27: RPC response type "ProofResponse" should be named "ProveResponse" or "LedgerServiceProveResponse".
(RPC_RESPONSE_STANDARD_NAME)
| "ledger": { | ||
| start: 8000, // 8000-8100 => 20 ledger services | ||
| end: 8100, | ||
| portCount: 2, | ||
| }, |
Comment inconsistency: port range supports 50 nodes, not 20.
With portCount: 2 and range [8000, 8100), the capacity is (8100 - 8000) / 2 = 50 ledger services, not 20 as stated.
Suggested fix
"ledger": {
- start: 8000, // 8000-8100 => 20 ledger services
+ start: 8000, // 8000-8100 => 50 ledger services
end: 8100,
portCount: 2,
},
🤖 Prompt for AI Agents
In @integration/localnet/builder/ports.go around lines 56 - 60, The comment for
the "ledger" port range is incorrect: with start: 8000, end: 8100 and portCount:
2 the capacity is (8100 - 8000) / 2 = 50 services, not 20; update the inline
comment in the "ledger" block to reflect the correct capacity (50 ledger
services) or adjust portCount/end values if you intended 20 services so the math
matches.
if fileSize == 0 {
	t.Skip("file is empty, cannot modify")
	return
}
Using t.Skip() in a helper function may skip the entire test unexpectedly.
t.Skip() will skip the calling test entirely, not just this helper invocation. In the context of Test_Checkpointing where randomlyModifyFile is called in a loop (lines 381-383), if any checkpoint file happens to be empty, the entire subtest will be skipped rather than just skipping that particular file modification.
Consider returning an error or simply returning early without logging:
🔧 Suggested fix
if fileSize == 0 {
- t.Skip("file is empty, cannot modify")
- return
+ // Empty file, nothing to modify - this shouldn't happen in normal test scenarios
+ // but handle gracefully by returning early
+ return
}
🤖 Prompt for AI Agents
In @ledger/complete/wal/checkpointer_test.go around lines 495 - 498, The helper
randomlyModifyFile should not call t.Skip() because that skips the whole test;
instead change randomlyModifyFile to return an error (or a bool) and have
callers handle it: remove t.Skip() and either return nil / false when
fileSize==0 or return a specific error, update callers in Test_Checkpointing’s
loop to check the return value and continue the loop (or t.Log and continue)
when the file is empty. Ensure the helper signature (randomlyModifyFile) and all
call sites are updated accordingly so no test-level skip occurs.
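One possible shape for that change, sketched with a hypothetical helper body (the real randomlyModifyFile and its callers live in checkpointer_test.go and may differ):

```go
package wal_test

import (
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

// randomlyModifyFile flips one byte of the file and reports whether anything
// was modified. An empty file is reported as "not modified" instead of
// skipping the whole test. Hypothetical sketch, not the project's helper.
func randomlyModifyFile(t *testing.T, path string) bool {
	t.Helper()

	info, err := os.Stat(path)
	require.NoError(t, err)
	if info.Size() == 0 {
		return false // nothing to corrupt; the caller can log and continue
	}

	f, err := os.OpenFile(path, os.O_RDWR, 0o644)
	require.NoError(t, err)
	defer f.Close()

	// Flip the first byte; any deterministic corruption is enough here.
	buf := make([]byte, 1)
	_, err = f.ReadAt(buf, 0)
	require.NoError(t, err)
	buf[0] ^= 0xFF
	_, err = f.WriteAt(buf, 0)
	require.NoError(t, err)
	return true
}
```

Callers in the Test_Checkpointing loop would then t.Log and continue when the helper returns false, so no test-level skip occurs.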
// Calculate expected relative target for comparison
symlinkDir := filepath.Dir(newPath)
expectedRelTarget, _ := filepath.Rel(symlinkDir, match)
if target == expectedRelTarget || target == match {
	// Symlink already exists and points to correct target, skip creation
	continue
}
Ignored error from filepath.Rel may cause incorrect symlink comparison.
If filepath.Rel fails and returns an error, expectedRelTarget could be an empty string or invalid, causing the comparison at line 1123 to behave unexpectedly. This could incorrectly flag a valid symlink as pointing to the wrong target.
🔧 Suggested fix
// Calculate expected relative target for comparison
symlinkDir := filepath.Dir(newPath)
- expectedRelTarget, _ := filepath.Rel(symlinkDir, match)
- if target == expectedRelTarget || target == match {
+ expectedRelTarget, relErr := filepath.Rel(symlinkDir, match)
+ if target == match || (relErr == nil && target == expectedRelTarget) {
// Symlink already exists and points to correct target, skip creation
continue
}🤖 Prompt for AI Agents
In @ledger/complete/wal/checkpointer.go around lines 1120 - 1126, The call to
filepath.Rel when computing expectedRelTarget in checkpointer.go ignores its
error, so if filepath.Rel fails expectedRelTarget may be wrong and the
subsequent comparison (variables: symlinkDir, expectedRelTarget, match, target,
newPath) can mis-detect correct symlinks; fix by checking the error returned
from filepath.Rel, and on error fall back to a safe comparison (for example use
match as the expected target or compute absolute paths with filepath.Abs for
both target and match before comparing) so the condition if target ==
expectedRelTarget || target == match reliably reflects the actual link target.
conn, err := grpc.NewClient(
	grpcAddr,
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	grpc.WithDefaultCallOptions(
		grpc.MaxCallRecvMsgSize(int(maxResponseSize)),
		grpc.MaxCallSendMsgSize(int(maxRequestSize)),
	),
)
if err != nil {
	return nil, fmt.Errorf("failed to connect to ledger service: %w", err)
}
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the client.go file to understand the context
head -60 ledger/remote/client.go | cat -nRepository: onflow/flow-go
Length of output: 2287
🏁 Script executed:
# Look for comments or documentation around the credentials setup
rg -B5 -A5 "insecure.NewCredentials" ledger/remote/client.goRepository: onflow/flow-go
Length of output: 531
🏁 Script executed:
# Check if there's documentation about the remote ledger service
fd -t f "README|doc|spec" ledger/remote/ || echo "No docs found in remote/"Repository: onflow/flow-go
Length of output: 40
🏁 Script executed:
# Search for other gRPC client configurations in the codebase
rg "grpc\.NewClient|NewConn" -t go --max-count=20Repository: onflow/flow-go
Length of output: 9775
🏁 Script executed:
# Check for environment-specific or configuration-based credential handling
rg "TransportCredentials|tls\.|credentials" -t go ledger/remote/Repository: onflow/flow-go
Length of output: 212
Add configuration to support TLS credentials for production deployments.
The client uses grpc.WithTransportCredentials(insecure.NewCredentials()) without any mechanism to switch to secure credentials for production. While insecure credentials are appropriate for local development and testing, this implementation provides no configuration option or conditional logic to use TLS in production environments. Given that this is a ledger service client handling cryptographic operations, add support for configurable TLS credentials that can be enabled based on deployment context (e.g., environment variables or configuration parameters).
🤖 Prompt for AI Agents
In @ledger/remote/client.go around lines 45 - 55, The client always uses
insecure.NewCredentials() in the grpc.NewClient call, so add a configurable
switch to use TLS credentials for production: detect a config/env flag (e.g.,
LEDGER_GRPC_TLS or use a TLSCert/TLSRootCA path) and, when enabled, create
secure credentials (e.g., via credentials.NewClientTLSFromFile or equivalent)
and pass grpc.WithTransportCredentials(creds) instead of
insecure.NewCredentials(); keep the existing grpcAddr and call option handling
(grpc.MaxCallRecvMsgSize, grpc.MaxCallSendMsgSize) and ensure errors from
loading certs are returned (update the grpc.NewClient construction around the
grpcAddr, maxResponseSize, maxRequestSize block).
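A hedged sketch of such a switch is shown below; the LEDGER_GRPC_TLS_CA variable name is hypothetical, while credentials.NewClientTLSFromFile, grpc.WithTransportCredentials, and insecure.NewCredentials are standard grpc-go APIs:

```go
package main

import (
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/credentials/insecure"
)

// transportOption picks transport security from an (assumed) environment
// variable. LEDGER_GRPC_TLS_CA is a hypothetical name, not an existing flag.
func transportOption() (grpc.DialOption, error) {
	caPath := os.Getenv("LEDGER_GRPC_TLS_CA")
	if caPath == "" {
		// Local development and tests keep the current plaintext behaviour.
		return grpc.WithTransportCredentials(insecure.NewCredentials()), nil
	}
	creds, err := credentials.NewClientTLSFromFile(caPath, "")
	if err != nil {
		return nil, fmt.Errorf("failed to load TLS root CA from %s: %w", caPath, err)
	}
	return grpc.WithTransportCredentials(creds), nil
}

func main() {
	opt, err := transportOption()
	if err != nil {
		panic(err)
	}
	fmt.Printf("selected dial option: %T\n", opt)
}
```

The resulting option would then be passed to grpc.NewClient alongside the existing MaxCallRecvMsgSize / MaxCallSendMsgSize call options.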
// InitialState returns the initial state of the ledger.
func (c *Client) InitialState() ledger.State {
	ctx := context.Background()
	resp, err := c.client.InitialState(ctx, &emptypb.Empty{})
	if err != nil {
		c.logger.Error().Err(err).Msg("failed to get initial state")
		return ledger.DummyState
	}

	var state ledger.State
	if len(resp.State.Hash) != len(state) {
		c.logger.Error().
			Int("expected", len(state)).
			Int("got", len(resp.State.Hash)).
			Msg("invalid state hash length")
		return ledger.DummyState
	}
	copy(state[:], resp.State.Hash)
	return state
}
🧩 Analysis chain
🏁 Script executed:
# Search for ledger.Ledger interface definition
find . -name "*.go" -path "*/ledger/*" | head -20Repository: onflow/flow-go
Length of output: 1138
🏁 Script executed:
# Find the interface definition more broadly
rg "type Ledger interface" -A 20Repository: onflow/flow-go
Length of output: 3090
🏁 Script executed:
# Also look for any definitions in package ledger
fd -e go -path "*/ledger/*" | xargs rg "type Ledger interface" -A 20Repository: onflow/flow-go
Length of output: 3276
🏁 Script executed:
# Search for implementations of the Ledger interface
rg "func.*InitialState\(\)" -A 5 --type go -B 2Repository: onflow/flow-go
Length of output: 5393
🏁 Script executed:
# Look for implementations of ledger.Ledger interface
rg "type.*Client.*struct" ledger/remote/ -A 10Repository: onflow/flow-go
Length of output: 670
🏁 Script executed:
# Check if there's any documentation or comment about this design choice
rg "InitialState" ledger/ledger.go -B 5 -A 10Repository: onflow/flow-go
Length of output: 684
🏁 Script executed:
# Search for usage of InitialState() to understand how it's called
rg "\.InitialState\(\)" --type go -B 3 -A 3 | head -100Repository: onflow/flow-go
Length of output: 5642
🏁 Script executed:
# Check the ledger.DummyState constant to understand what's being returned
rg "DummyState" ledger/ --type go -B 2 -A 2Repository: onflow/flow-go
Length of output: 4212
InitialState silently returns DummyState on error, contradicting test expectations.
The InitialState() method logs errors but returns ledger.DummyState instead of propagating them. While this matches the interface signature (which has no error return), factory tests explicitly assert that InitialState() should never return DummyState (assert.NotEqual(t, ledger.DummyState, remoteState)). Returning DummyState on gRPC failures or invalid state hash length creates a silent failure that violates this expectation and prevents callers from detecting errors. Consider updating the interface to support error returns, or ensure the remote client never returns DummyState except as a valid state commitment.
🤖 Prompt for AI Agents
In @ledger/remote/client.go around lines 75 - 94, InitialState currently
swallows errors and returns ledger.DummyState, which breaks tests; change the
Client.InitialState signature to return (ledger.State, error) and update the
implementation to propagate errors from c.client.InitialState and from the
invalid hash-length check (return zero value state and the error instead of
DummyState), and then update all callers and the ledger.Client interface
implementations to handle the new (state, error) return so the remote client
never silently returns DummyState on failures.
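A sketch of the suggested signature change, reusing the fields quoted from the current implementation (c.client, resp.State.Hash); the error wrapping text is illustrative, and the ledger.Ledger interface plus all callers would need the same change:

```go
// InitialState returns the initial state of the ledger, propagating RPC and
// validation failures instead of silently falling back to ledger.DummyState.
func (c *Client) InitialState() (ledger.State, error) {
	resp, err := c.client.InitialState(context.Background(), &emptypb.Empty{})
	if err != nil {
		return ledger.State{}, fmt.Errorf("failed to get initial state: %w", err)
	}
	// Extra guard beyond the current code: a missing state message is an error.
	if resp.State == nil {
		return ledger.State{}, fmt.Errorf("initial state response contains no state")
	}

	var state ledger.State
	if len(resp.State.Hash) != len(state) {
		return ledger.State{}, fmt.Errorf(
			"invalid state hash length: expected %d, got %d", len(state), len(resp.State.Hash))
	}
	copy(state[:], resp.State.Hash)
	return state, nil
}
```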
values := make([]ledger.Value, len(req.Values))
for i, protoValue := range req.Values {
	var value ledger.Value
	// Reconstruct the original value type using is_nil flag
	// This preserves the distinction between nil and []byte{} that protobuf loses
	if len(protoValue.Data) == 0 {
		if protoValue.IsNil {
			// Original value was nil
			value = nil
		} else {
			// Original value was []byte{} (empty slice)
			value = ledger.Value([]byte{})
		}
	} else {
		// Non-empty value, use data as-is
		value = ledger.Value(protoValue.Data)
	}
	values[i] = value
}
Missing nil check for protoValue can cause panic.
In proto3 generated Go code, repeated message fields like Values are []*ledgerpb.Value, meaning individual elements can be nil. Accessing protoValue.Data or protoValue.IsNil on a nil element would cause a panic. Per coding guidelines, all inputs must be considered potentially Byzantine.
🐛 Proposed fix
values := make([]ledger.Value, len(req.Values))
for i, protoValue := range req.Values {
+ if protoValue == nil {
+ return nil, status.Error(codes.InvalidArgument, "value at index is nil")
+ }
var value ledger.Value
// Reconstruct the original value type using is_nil flag
🤖 Prompt for AI Agents
In @ledger/remote/service.go around lines 150 - 168, The loop over req.Values
must guard against nil elements to avoid panics; before reading protoValue.Data
or protoValue.IsNil in the loop that builds values (the slice of ledger.Value),
add a nil check for protoValue and handle it explicitly (e.g., treat a nil
protoValue as an original nil ledger.Value or return a clear error), then
proceed with the existing logic that distinguishes protoValue.IsNil vs empty
Data to reconstruct the ledger.Value; reference req.Values, protoValue,
protoValue.Data, protoValue.IsNil and the values slice when locating where to
add the check.
keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
for i, part := range protoKey.Parts {
	if part.Type > 65535 {
		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
	}
	keyParts[i] = ledger.NewKeyPart(uint16(part.Type), part.Value)
}
Missing nil check for KeyPart elements can cause panic.
Similar to the Values issue, protoKey.Parts is []*ledgerpb.KeyPart in generated Go code. If a client sends a key with a nil part element, accessing part.Type would cause a nil pointer dereference panic.
🐛 Proposed fix
keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
for i, part := range protoKey.Parts {
+ if part == nil {
+ return ledger.Key{}, status.Error(codes.InvalidArgument, "key part is nil")
+ }
if part.Type > 65535 {
return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
}
🤖 Prompt for AI Agents
In @ledger/remote/service.go around lines 230 - 236, The loop over
protoKey.Parts can panic if an element is nil; update the loop that builds
keyParts to check for nil before accessing part.Type/value (i.e., inside the for
i, part := range protoKey.Parts block), and return a
status.Error(codes.InvalidArgument, "...") when a nil part is encountered; keep
the existing uint16 bounds check for part.Type and continue to call
ledger.NewKeyPart only after validating part is non-nil and its Type fits in
uint16 so you populate the ledger.KeyPart slice safely.
require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value")
require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")

// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
	"Decoded nil and []byte{} should remain distinct with CBOR encoding")
Fix inconsistent assertion messages.
The assertion messages don't match what's being tested:
- Line 705: Uses require.NotNil but the message says "should have nil value".
- Line 712: The message says the values "should remain distinct" but the test asserts they are equal.
🔧 Suggested fix
- require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value")
+ require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should be normalized to non-nil empty slice")
require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length")
require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value")
require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length")
// The key assertion: they should be the same after encoding/decoding due to normalization in encoding
require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(),
- "Decoded nil and []byte{} should remain distinct with CBOR encoding")
+ "Decoded nil and []byte{} should be identical after CBOR normalization")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should have nil value") | |
| require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length") | |
| require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value") | |
| require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length") | |
| // The key assertion: they should be the same after encoding/decoding due to normalization in encoding | |
| require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(), | |
| "Decoded nil and []byte{} should remain distinct with CBOR encoding") | |
| require.NotNil(t, decoded.Payloads[0].Value(), "Decoded Payload 0 should be normalized to non-nil empty slice") | |
| require.Equal(t, 0, len(decoded.Payloads[0].Value()), "Decoded Payload 0 should have 0 length") | |
| require.NotNil(t, decoded.Payloads[1].Value(), "Decoded Payload 1 should have non-nil value") | |
| require.Equal(t, 0, len(decoded.Payloads[1].Value()), "Decoded Payload 1 should have 0 length") | |
| // The key assertion: they should be the same after encoding/decoding due to normalization in encoding | |
| require.Equal(t, decoded.Payloads[0].Value(), decoded.Payloads[1].Value(), | |
| "Decoded nil and []byte{} should be identical after CBOR normalization") |
🤖 Prompt for AI Agents
In @ledger/trie_encoder_test.go around lines 705 - 712, Fix the mismatched
assertions/messages around decoded.Payloads: replace the incorrect
require.NotNil on decoded.Payloads[0].Value() with require.Nil and update its
message to "Decoded Payload 0 should have nil value"; keep require.NotNil for
decoded.Payloads[1].Value() (empty slice) and ensure its message says "Decoded
Payload 1 should have non-nil value"; finally update the final require.Equal
message to reflect the intended expectation ("Decoded nil and []byte{} should be
equal after normalization") so the assertion text matches the check of
decoded.Payloads[0].Value() == decoded.Payloads[1].Value().
ledger/remote/client.go
Outdated
// Decode trie update if present using CBOR decoding to preserve nil vs []byte{} distinction
trieUpdate, err := ledger.DecodeTrieUpdateCBOR(resp.TrieUpdate)
Before normalizing the nil and []byte{} cases in the payload value, we had to use CBOR encoding because EncodeTrieUpdate could not distinguish between the two, whereas CBOR encoding could.
Now that normalization has been integrated into the encoding process, both EncodeTrieUpdate and EncodeTrieUpdateCBOR can be used to encode the TrieUpdate. We can therefore choose the option with better performance.
I will create a separate PR for adding benchmark results.
cc @fxamacker
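Since the benchmark PR is still pending, the comparison would look roughly like the sketch below. It is only a sketch: it assumes EncodeTrieUpdate returns a bare []byte while EncodeTrieUpdateCBOR returns ([]byte, error), and sampleTrieUpdate is a hypothetical fixture helper, not part of this PR.

package ledger_test

import (
	"testing"

	"github.com/onflow/flow-go/ledger"
)

// sampleTrieUpdate is a hypothetical helper; a real benchmark would build the
// update from testutils fixtures instead of skipping.
func sampleTrieUpdate(b *testing.B) *ledger.TrieUpdate {
	b.Helper()
	b.Skip("construct a representative TrieUpdate here")
	return nil
}

func BenchmarkTrieUpdateEncoders(b *testing.B) {
	tu := sampleTrieUpdate(b)
	b.Run("default", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = ledger.EncodeTrieUpdate(tu) // assumed to return []byte
		}
	})
	b.Run("cbor", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := ledger.EncodeTrieUpdateCBOR(tu); err != nil { // assumed to return ([]byte, error)
				b.Fatal(err)
			}
		}
	})
}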
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/image_builds.yml:
- Line 195: The promote-to-public-registry job's role matrix is missing the
execution-ledger entry, so update the matrix used by promote-to-public-registry
(the role: [...] list) to include execution-ledger (i.e., add "execution-ledger"
into the role array alongside access, collection, consensus, execution,
observer, verification) so that the execution-ledger images built and tagged as
execution-ledger:${TAG} in public-build are promoted to the public registry.
🧹 Nitpick comments (1)
.github/workflows/image_builds.yml (1)
167-167: Consider upgrading actions/checkout from v3 to v4. Static analysis indicates that actions/checkout@v3 is outdated. While v3 still functions, v4 offers improved performance and is the current recommended version.
♻️ Suggested upgrade
  - name: Checkout repo
-   uses: actions/checkout@v3
+   uses: actions/checkout@v4
Apply this change to both occurrences at lines 167 and 199.
Also applies to: 199-199
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/image_builds.yml
🧰 Additional context used
🪛 actionlint (1.7.10)
.github/workflows/image_builds.yml
167-167: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
199-199: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (1)
.github/workflows/image_builds.yml (1)
62-66: LGTM! New execution-ledger build commands follow the established pattern. The build commands for the execution-ledger service are consistent with all other roles, covering all four image variants (with-adx, without-adx, without-netgo-without-adx, arm).
@@ -187,18 +193,18 @@ jobs:
      fail-fast: false
      matrix:
        role: [access, collection, consensus, execution, observer, verification]
Add execution-ledger to the promotion matrix.
The public-build job explicitly builds and pushes execution-ledger images to the private registry (tagged as execution-ledger:${TAG}), but the promote-to-public-registry job's role matrix does not include execution-ledger. This means the ledger service images built in public-build are not being promoted to the public registry. Update the role matrix to include execution-ledger:
role: [access, collection, consensus, execution, execution-ledger, observer, verification]
🤖 Prompt for AI Agents
In @.github/workflows/image_builds.yml at line 195, The
promote-to-public-registry job's role matrix is missing the execution-ledger
entry, so update the matrix used by promote-to-public-registry (the role: [...]
list) to include execution-ledger (i.e., add "execution-ledger" into the role
array alongside access, collection, consensus, execution, observer,
verification) so that the execution-ledger images built and tagged as
execution-ledger:${TAG} in public-build are promoted to the public registry.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@ledger/partial/ledger_test.go`:
- Around line 210-256: The test TestPartialLedger_StateByIndex incorrectly
asserts that StateByIndex(-1) should error even though the Ledger interface
documents -1 as the last index; update the test to treat -1 as valid for the
partial ledger by calling pled.StateByIndex(-1) and asserting no error and that
the returned state equals newState (keep assertions that StateByIndex(1) and
StateByIndex(-2) error); reference the test function
TestPartialLedger_StateByIndex and the partial ledger method StateByIndex
(constructed via partial.NewLedger) when making this change.
In `@ledger/partial/ledger.go`:
- Around line 174-181: StateByIndex currently rejects index == -1 even though
the Ledger interface treats -1 as the last index; update the Ledger.StateByIndex
method to treat index == -1 as index 0 for the partial ledger: if index == 0 ||
index == -1 return l.state, nil, otherwise return ledger.DummyState with the
existing out-of-range error message. Modify the conditional in the StateByIndex
function (referencing l.state, ledger.DummyState, and the method name
StateByIndex) to handle -1 as described and keep the same error formatting for
other indices.
In `@ledger/remote/client.go`:
- Around line 257-282: The exponential backoff in the retry loop (uses ctx :=
context.Background(), maxRetries, retryDelay and c.client.InitialState) has no
upper bound; add a cap such as maxRetryDelay (e.g., 5s or configurable) and
after computing retryDelay = time.Duration(float64(retryDelay) * 1.5) clamp it
so retryDelay = min(retryDelay, maxRetryDelay) (or use an if to set to
maxRetryDelay), ensuring the logged Dur("retry_delay", retryDelay) reflects the
capped value.
♻️ Duplicate comments (2)
ledger/remote/client.go (2)
45-55: Insecure credentials - previously flagged. This was identified in a previous review. The client uses insecure credentials without a mechanism to enable TLS for production.
75-94: InitialState silently returns DummyState on error - previously flagged. This was identified in a previous review. Returning DummyState on gRPC failures creates silent failures.
🧹 Nitpick comments (6)
integration/localnet/builder/bootstrap.go (2)
850-888: Exposed port may be unused if service listens on Unix socket. The service configuration exposes testnet.GRPCPort (line 888) but the ledger service is configured to listen on a Unix socket (line 856). If the Unix socket configuration is intentional, the port exposure is unnecessary. If TCP is intended (to fix the mismatch issue raised above), the listening address needs to be updated.
♻️ Ensure consistency between listen address and exposed ports
If TCP is the intended transport:
 service := Service{
 	name:  ledgerServiceName,
 	Image: "localnet-ledger",
 	Command: []string{
 		"--triedir=/trie",
-		"--ledger-service-addr=unix:///sockets/ledger.sock",
+		fmt.Sprintf("--ledger-service-addr=0.0.0.0:%s", testnet.GRPCPort),
 		"--mtrie-cache-size=100",
If Unix socket is intended, remove the port exposure and set up a shared socket volume between ledger service and execution nodes.
793-810: Consider simplifying the loop logic. The loop at lines 797-806 finds the first execution node to share its trie directory with the ledger service. The condition executionIndex < ledgerExecutionCount on line 800 and the increment on line 804 suggest support for multiple execution nodes, but since ledgerExecutionCount > 0 is validated before this function is called, the first execution node found will always satisfy the condition, making the increment unreachable.
The intent appears to be to always use the first execution node's trie directory. Consider simplifying for clarity:
♻️ Simplified loop
-	var firstExecutionNode *testnet.ContainerConfig
-	executionIndex := 0
-	for _, container := range flowNodeContainerConfigs {
-		if container.Role == flow.RoleExecution {
-			if executionIndex < ledgerExecutionCount {
-				firstExecutionNode = &container
-				break
-			}
-			executionIndex++
-		}
-	}
+	var firstExecutionNode *testnet.ContainerConfig
+	for _, container := range flowNodeContainerConfigs {
+		if container.Role == flow.RoleExecution {
+			firstExecutionNode = &container
+			break
+		}
+	}
ledger/complete/ledger.go (1)
470-496: Consider clarifying the error message for negative out-of-range indices. The implementation correctly supports negative indexing. However, the error message on line 487 displays index-count, which shows the re-computed negative value, not the original user-provided index. This could be confusing for debugging.
Proposed improvement for clearer error message
 // Handle negative index (-1 means last index)
+originalIndex := index
 if index < 0 {
 	index = count + index
 	if index < 0 {
-		return ledger.DummyState, fmt.Errorf("index %d is out of range (count: %d)", index-count, count)
+		return ledger.DummyState, fmt.Errorf("index %d is out of range (count: %d)", originalIndex, count)
 	}
 }
101-108: Consider adding bounds checking for message size conversion. The uint to int cast for maxRequestSize and maxResponseSize could silently overflow on 32-bit systems if values exceed math.MaxInt32. While 32-bit deployments are uncommon, explicit bounds checking would be defensive.
🔧 Suggested defensive check
+import "math"
+
 // Create gRPC server with max message size configuration.
+if *maxRequestSize > uint(math.MaxInt32) || *maxResponseSize > uint(math.MaxInt32) {
+	logger.Fatal().Msg("message size exceeds maximum int value")
+}
 grpcServer := grpc.NewServer(
 	grpc.MaxRecvMsgSize(int(*maxRequestSize)),
 	grpc.MaxSendMsgSize(int(*maxResponseSize)),
 )
174-188: Consider adding a timeout for ledger shutdown. The graceful shutdown waits indefinitely on <-ledgerStorage.Done(). If the ledger hangs during shutdown, the process will never exit. Consider adding a context with timeout.
🔧 Suggested timeout for shutdown
 logger.Info().Msg("waiting for ledger to stop...")
-<-ledgerStorage.Done()
+select {
+case <-ledgerStorage.Done():
+	logger.Info().Msg("ledger stopped cleanly")
+case <-time.After(30 * time.Second):
+	logger.Warn().Msg("ledger shutdown timed out")
+}
 logger.Info().Msg("ledger service stopped")
117-143: Add a public method to close the trie update channel instead of accessing private fields.
LedgerWithCompactor.Done() directly accesses private fields (closeTrieUpdateCh, trieUpdateCh) from Ledger, breaking encapsulation and creating unnecessary coupling. The shutdown order is correct—closing the channel first allows the compactor to drain remaining updates—and sync.Once properly prevents double-close panics. However, expose a public method on Ledger (e.g., CloseTrieUpdateChannel()) so the wrapper can trigger the closure without reaching into private fields. This maintains the same safety guarantees while following Go visibility conventions.
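A minimal, self-contained sketch of that suggestion follows; the Ledger type here is only a stand-in for complete.Ledger, mirroring the closeTrieUpdateCh and trieUpdateCh fields named above.

package main

import (
	"fmt"
	"sync"
)

// Ledger is an illustrative stand-in for complete.Ledger with the two private
// fields the review mentions.
type Ledger struct {
	trieUpdateCh      chan struct{}
	closeTrieUpdateCh sync.Once
}

// CloseTrieUpdateChannel is the suggested public accessor: callers (such as a
// LedgerWithCompactor wrapper) can invoke it any number of times because
// sync.Once guards the close.
func (l *Ledger) CloseTrieUpdateChannel() {
	l.closeTrieUpdateCh.Do(func() {
		close(l.trieUpdateCh)
	})
}

func main() {
	l := &Ledger{trieUpdateCh: make(chan struct{})}
	l.CloseTrieUpdateChannel()
	l.CloseTrieUpdateChannel() // second call is a no-op, no double-close panic
	_, open := <-l.trieUpdateCh
	fmt.Println("channel still open:", open) // false
}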
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
cmd/ledger/README.md
cmd/ledger/main.go
integration/localnet/builder/bootstrap.go
ledger/complete/ledger.go
ledger/complete/ledger_test.go
ledger/complete/ledger_with_compactor.go
ledger/ledger.go
ledger/partial/ledger.go
ledger/partial/ledger_test.go
ledger/remote/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/ledger/README.md
🔇 Additional comments (29)
integration/localnet/builder/bootstrap.go (6)
20-21: LGTM!The new imports are appropriately aliased and used correctly for checkpoint symlinking and bootstrap filename constants.
67-67: LGTM!The new flag with default value 0 maintains backward compatibility—all execution nodes use local ledger unless explicitly configured otherwise.
Also applies to: 110-110
183-189: LGTM!The validation ensures
ledgerExecutionCount is within the valid range [0, executionCount] with clear error messages.
203-206: LGTM!The conditional creation of the ledger service is correctly placed after flow services are prepared, ensuring execution nodes can be configured with their dependencies first.
702-718: LGTM with a note.The modification handles the edge case where a directory exists at the target file path. This is a reasonable defensive addition. The function is called for
DockerComposeFile and PrometheusTargetsFile, both of which should always be regular files.
821-841: Symlink resolution is correct and will work inside the container.The implementation of
SoftlinkCheckpointFile correctly creates relative symlinks using filepath.Rel() to calculate the path from the symlink location (in /trie) to the target files (in /bootstrap/execution-state). The relative path calculated on the host (e.g., ../bootstrap/execution-state/checkpoint) preserves the same directory structure when both directories are mounted in the container. Since both /trie and /bootstrap are mounted as volumes, the symlinks resolve correctly inside the container. No code changes are needed.
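For illustration, the relative-symlink idea behind SoftlinkCheckpointFile boils down to the following standalone sketch; the paths are made up, and the real function also handles checkpoint part files and error cases.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// softlink creates linkDir/<basename of targetFile> pointing at targetFile via
// a relative path, so the link keeps resolving as long as both directories are
// mounted at the same relative positions (e.g. /trie and /bootstrap).
func softlink(targetFile, linkDir string) error {
	rel, err := filepath.Rel(linkDir, targetFile) // e.g. "../bootstrap/execution-state/root.checkpoint"
	if err != nil {
		return fmt.Errorf("cannot compute relative path: %w", err)
	}
	return os.Symlink(rel, filepath.Join(linkDir, filepath.Base(targetFile)))
}

func main() {
	if err := softlink("/data/bootstrap/execution-state/root.checkpoint", "/data/trie"); err != nil {
		fmt.Println("symlink failed:", err)
	}
}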
ledger/ledger.go (1)
42-48: LGTM! Interface additions are well-documented. The new
StateCount() and StateByIndex() methods cleanly extend the Ledger interface. The documentation for -1 as the last index convention is clear.
ledger/partial/ledger.go (1)
168-172: LGTM!The
StateCount() implementation correctly returns 1 and the documentation clearly explains the partial ledger's single-state constraint.
ledger/partial/ledger_test.go (1)
173-208: LGTM!The test properly validates the partial ledger's single-state constraint with thorough setup and cleanup.
ledger/complete/ledger_test.go (3)
817-854: LGTM!The test validates state counting behavior correctly. The unused variable suppression on line 853 is acceptable for test clarity.
856-926: LGTM! Comprehensive test coverage forStateByIndex.The test thoroughly covers:
- Initial state retrieval at index 0
- Last state retrieval using -1
- Multiple updates and subsequent state access
- Consistency between
-1 and count-1 indices
- Out-of-range error handling for both positive and negative indices
928-982: LGTM! Good integration test with real WAL and compactor.The test exercises the
LedgerWithCompactorwrapper, ensuring the state counting and index-based retrieval work correctly through the lifecycle-managed ledger.ledger/complete/ledger.go (2)
44-44: LGTM! Good use ofsync.Oncefor safe channel closure.Using
sync.Onceto closetrieUpdateChprevents potential double-close panics ifDone()is called multiple times. This aligns with the coding guidelines requiring proper synchronization primitives for state consistency.Also applies to: 115-117
465-468: LGTM!Simple and correct delegation to the existing
ForestSize()method.cmd/ledger/main.go (5)
23-32: Flag definitions look reasonable for the service configuration.The flags provide sensible defaults and cover the necessary configuration options for the standalone ledger service.
34-48: Input validation and logging setup are correct.The triedir validation and log level parsing with proper error handling follow good practices.
61-76: Ledger creation via factory is well-structured.The factory pattern usage with proper error handling on Fatal is appropriate for a standalone service.
85-99: Health check validates ledger state availability.The StateCount and StateByIndex checks ensure the ledger is properly initialized before accepting requests.
126-142: Unix socket cleanup and permission handling is reasonable.Removing stale socket files on startup is appropriate. Warning but continuing on chmod failure is acceptable since the socket may still be usable with default permissions.
ledger/remote/client.go (6)
96-112: HasState silently returns false on gRPC error.Similar to
InitialState, this method swallows errors and returnsfalse, which callers cannot distinguish from a legitimate "state not found" response. Consider whether this silent failure mode is acceptable for the use cases.
287-300: Done() goroutine may cause race if called concurrently with other operations.The
sync.OnceensuresClose()is called once, but the goroutine closes the connection asynchronously. IfDone()is called while other gRPC operations are in-flight, they may fail unexpectedly. Consider whether synchronous close or a graceful drain period is needed.
130-139: Nil vs empty byte slice handling is correctly implemented.The
is_nil flag usage properly preserves the distinction between nil and []byte{} that protobuf would otherwise lose. This is consistent with the server-side handling.
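To make that convention concrete, a toy round-trip of the flag is sketched below; protoValue stands in for the generated ledgerpb.Value, and only the Data and IsNil field names come from the PR.

package main

import "fmt"

// protoValue mimics the Data/IsNil fields of the generated ledgerpb.Value.
type protoValue struct {
	Data  []byte
	IsNil bool
}

func encodeValue(v []byte) protoValue {
	return protoValue{Data: v, IsNil: v == nil}
}

func decodeValue(p protoValue) []byte {
	if p.IsNil {
		return nil
	}
	if p.Data == nil {
		// protobuf delivers empty bytes as nil; restore the empty slice
		return []byte{}
	}
	return p.Data
}

func main() {
	for _, v := range [][]byte{nil, {}, {0x01}} {
		out := decodeValue(encodeValue(v))
		fmt.Printf("in: nil=%v len=%d  out: nil=%v len=%d\n", v == nil, len(v), out == nil, len(out))
	}
}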
216-223: CBOR decoding for TrieUpdate preserves nil/empty semantics.The use of
DecodeTrieUpdateCBORis appropriate here based on the previous discussion about encoding choices.
318-330: Key conversion helper is straightforward and correct.The
ledgerKeyToProtoKeyfunction correctly maps ledger key parts to protobuf representation.
302-316: StateCount and StateByIndex are intentionally unimplemented for remote clients.These methods are not supported because the gRPC service (ledger.proto) does not expose these operations. Returning
0and an error respectively is intentional and documented. The code correctly reflects that health checks relying onStateCount() > 0will fail for remote clients, which is the intended behavior given the current gRPC API limitations.No changes are needed unless StateCount and StateByIndex are added to the gRPC service definition in a future update.
ledger/complete/ledger_with_compactor.go (4)
16-20: Wrapper struct cleanly encapsulates Ledger and Compactor.The design hides the compactor as an implementation detail while managing both lifecycles together.
25-61: Constructor properly wires Ledger and Compactor with appropriate error handling.The construction sequence and error wrapping follow good practices.
63-97: Delegated interface methods are straightforward pass-through.All ledger operations correctly delegate to the underlying Ledger instance.
99-115: Ready() correctly sequences ledger initialization before compactor startup.The sequential wait ensures WAL replay completes before the compactor starts processing.
// Configure ledger service: execution nodes with index < ledgerExecutionCount use remote ledger
if i < ledgerExecutionCount {
	// This execution node uses remote ledger service
	service.Command = append(service.Command,
		fmt.Sprintf("--ledger-service-addr=ledger_service_1:%s", testnet.GRPCPort),
	)
	// Execution node depends on ledger service
	service.DependsOn = append(service.DependsOn, "ledger_service_1")
	// Execution nodes using remote ledger should NOT mount the trie directory
	// because the ledger service manages it
} else {
	// Execution nodes with index >= ledgerExecutionCount use local ledger by default (no flag needed)
	// These nodes need to mount the trie directory for their local ledger
	service.Volumes = append(service.Volumes,
		fmt.Sprintf("%s:/trie:z", trieDir),
	)
}
Address mismatch between execution node and ledger service configuration.
The execution node is configured to connect to the ledger service via TCP at ledger_service_1:9000 (line 463), but the ledger service in prepareLedgerService (line 856) is configured to listen on a Unix socket: --ledger-service-addr=unix:///sockets/ledger.sock.
This mismatch will cause connection failures at runtime.
🐛 Proposed fix: Use TCP address for ledger service
Either update the execution node to use the Unix socket (requires shared volume), or update the ledger service to listen on TCP. The simpler fix is to use TCP for the ledger service:
In prepareLedgerService, change line 856:
- "--ledger-service-addr=unix:///sockets/ledger.sock",
+ fmt.Sprintf("--ledger-service-addr=0.0.0.0:%s", testnet.GRPCPort),| func TestPartialLedger_StateByIndex(t *testing.T) { | ||
| w := &fixtures.NoopWAL{} | ||
|
|
||
| l, err := complete.NewLedger(w, 100, &metrics.NoopCollector{}, zerolog.Logger{}, complete.DefaultPathFinderVersion) | ||
| require.NoError(t, err) | ||
|
|
||
| compactor := fixtures.NewNoopCompactor(l) | ||
| <-compactor.Ready() | ||
|
|
||
| defer func() { | ||
| <-l.Done() | ||
| <-compactor.Done() | ||
| }() | ||
|
|
||
| // create update and get proof | ||
| state := l.InitialState() | ||
| keys := testutils.RandomUniqueKeys(2, 2, 2, 4) | ||
| values := testutils.RandomValues(2, 1, 32) | ||
| update, err := ledger.NewUpdate(state, keys, values) | ||
| require.NoError(t, err) | ||
|
|
||
| newState, _, err := l.Set(update) | ||
| require.NoError(t, err) | ||
|
|
||
| query, err := ledger.NewQuery(newState, keys) | ||
| require.NoError(t, err) | ||
| proof, err := l.Prove(query) | ||
| require.NoError(t, err) | ||
|
|
||
| pled, err := partial.NewLedger(proof, newState, partial.DefaultPathFinderVersion) | ||
| require.NoError(t, err) | ||
|
|
||
| // Test getting state at index 0 (only valid index for partial ledger) | ||
| state0, err := pled.StateByIndex(0) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, newState, state0, "state at index 0 should match the ledger state") | ||
|
|
||
| // Test that other indices fail | ||
| _, err = pled.StateByIndex(1) | ||
| require.Error(t, err, "should error for index out of range") | ||
|
|
||
| _, err = pled.StateByIndex(-1) | ||
| require.Error(t, err, "should error for negative index (partial ledger only supports index 0)") | ||
|
|
||
| _, err = pled.StateByIndex(-2) | ||
| require.Error(t, err, "should error for negative index out of range") | ||
| } |
Test expects -1 to error, contradicting interface contract.
Line 251-252 expects StateByIndex(-1) to return an error, but the Ledger interface documents -1 as the last index. If the partial ledger implementation is updated to support -1, this test assertion will need adjustment.
Proposed test fix if implementation is updated
- _, err = pled.StateByIndex(-1)
- require.Error(t, err, "should error for negative index (partial ledger only supports index 0)")
+ // -1 should work as "last index" per interface contract
+ stateMinus1, err := pled.StateByIndex(-1)
+ require.NoError(t, err)
+ assert.Equal(t, newState, stateMinus1, "state at index -1 should match the ledger state")
🤖 Prompt for AI Agents
In `@ledger/partial/ledger_test.go` around lines 210 - 256, The test
TestPartialLedger_StateByIndex incorrectly asserts that StateByIndex(-1) should
error even though the Ledger interface documents -1 as the last index; update
the test to treat -1 as valid for the partial ledger by calling
pled.StateByIndex(-1) and asserting no error and that the returned state equals
newState (keep assertions that StateByIndex(1) and StateByIndex(-2) error);
reference the test function TestPartialLedger_StateByIndex and the partial
ledger method StateByIndex (constructed via partial.NewLedger) when making this
change.
// StateByIndex returns the state at the given index
// Partial ledger only has one state
func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
	if index == 0 {
		return l.state, nil
	}
	return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
}
Inconsistent handling of -1 index with interface contract.
The Ledger interface documents that -1 is the last index, but StateByIndex in partial ledger returns an error for -1. For a single-state ledger, -1 should resolve to index 0.
Proposed fix to support -1 index
func (l *Ledger) StateByIndex(index int) (ledger.State, error) {
- if index == 0 {
+ if index == 0 || index == -1 {
return l.state, nil
}
return ledger.DummyState, fmt.Errorf("index %d is out of range (partial ledger has 1 state)", index)
}
🤖 Prompt for AI Agents
In `@ledger/partial/ledger.go` around lines 174 - 181, StateByIndex currently
rejects index == -1 even though the Ledger interface treats -1 as the last
index; update the Ledger.StateByIndex method to treat index == -1 as index 0 for
the partial ledger: if index == 0 || index == -1 return l.state, nil, otherwise
return ledger.DummyState with the existing out-of-range error message. Modify
the conditional in the StateByIndex function (referencing l.state,
ledger.DummyState, and the method name StateByIndex) to handle -1 as described
and keep the same error formatting for other indices.
// Retry with exponential backoff for up to 30 seconds
ctx := context.Background()
maxRetries := 30
retryDelay := 100 * time.Millisecond

for i := 0; i < maxRetries; i++ {
	_, err := c.client.InitialState(ctx, &emptypb.Empty{})
	if err == nil {
		c.logger.Info().Msg("ledger service ready")
		return
	}

	if i < maxRetries-1 {
		c.logger.Debug().
			Err(err).
			Int("attempt", i+1).
			Dur("retry_delay", retryDelay).
			Msg("ledger service not ready, retrying...")
		time.Sleep(retryDelay)
		retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
	} else {
		c.logger.Warn().Err(err).Msg("ledger service not ready after retries, proceeding anyway")
		// Still close the channel to avoid blocking forever
		// The execution node will fail later with a more specific error if the service is truly not ready
	}
}
Exponential backoff has no upper bound, leading to extremely long delays.
The retry delay grows unbounded: after ~20 iterations, delays exceed 1 hour each. With 30 max retries and no cap, the total wait time becomes astronomical. Cap the delay at a reasonable maximum.
🐛 Proposed fix to cap the backoff delay
maxRetries := 30
retryDelay := 100 * time.Millisecond
+ maxRetryDelay := 5 * time.Second
for i := 0; i < maxRetries; i++ {
_, err := c.client.InitialState(ctx, &emptypb.Empty{})
if err == nil {
c.logger.Info().Msg("ledger service ready")
return
}
if i < maxRetries-1 {
c.logger.Debug().
Err(err).
Int("attempt", i+1).
Dur("retry_delay", retryDelay).
Msg("ledger service not ready, retrying...")
time.Sleep(retryDelay)
- retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
+ retryDelay = time.Duration(float64(retryDelay) * 1.5)
+ if retryDelay > maxRetryDelay {
+ retryDelay = maxRetryDelay
+ }
} else {🤖 Prompt for AI Agents
In `@ledger/remote/client.go` around lines 257 - 282, The exponential backoff in
the retry loop (uses ctx := context.Background(), maxRetries, retryDelay and
c.client.InitialState) has no upper bound; add a cap such as maxRetryDelay
(e.g., 5s or configurable) and after computing retryDelay =
time.Duration(float64(retryDelay) * 1.5) clamp it so retryDelay =
min(retryDelay, maxRetryDelay) (or use an if to set to maxRetryDelay), ensuring
the logged Dur("retry_delay", retryDelay) reflects the capped value.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@ledger/remote/client.go`:
- Around line 179-224: In Set, guard against nil gRPC responses by first
verifying resp != nil and resp.NewState != nil before accessing
resp.NewState.Hash (and return ledger.DummyState with an error if they are nil),
then validate len(resp.NewState.Hash) as before; also defensively handle
resp.TrieUpdate being nil by either treating it as an empty trie update (skip
decode and set trieUpdate = nil) or returning an explicit error, and only call
decodeTrieUpdateFromTransport when resp.TrieUpdate != nil; update error messages
to mention which field was missing (resp or resp.NewState or resp.TrieUpdate) to
aid debugging.
♻️ Duplicate comments (5)
ledger/remote/client.go (3)
27-52: Add a TLS option for gRPC transport credentials. The client always uses insecure.NewCredentials(), leaving no path to secure production deployments.
75-94: Avoid returning DummyState on RPC/validation errors. Logging and returning ledger.DummyState hides connectivity or data issues from callers.
249-286: Cap the exponential backoff delay. Without a cap, retry delays grow unbounded and can stretch readiness to many minutes.
🐛 Proposed fix
 maxRetries := 30
 retryDelay := 100 * time.Millisecond
+maxRetryDelay := 5 * time.Second
 for i := 0; i < maxRetries; i++ {
 	_, err := c.client.InitialState(ctx, &emptypb.Empty{})
 	if err == nil {
 		c.logger.Info().Msg("ledger service ready")
 		return
 	}
 	if i < maxRetries-1 {
 		c.logger.Debug().
 			Err(err).
 			Int("attempt", i+1).
 			Dur("retry_delay", retryDelay).
 			Msg("ledger service not ready, retrying...")
 		time.Sleep(retryDelay)
 		retryDelay = time.Duration(float64(retryDelay) * 1.5) // exponential backoff
+		if retryDelay > maxRetryDelay {
+			retryDelay = maxRetryDelay
+		}
 	} else {
ledger/remote/service.go (2)
150-168: Handle nil Value elements in Set requests.
req.Values is []*ledgerpb.Value; nil entries will panic on Data/IsNil access.
🐛 Proposed fix
 values := make([]ledger.Value, len(req.Values))
 for i, protoValue := range req.Values {
+	if protoValue == nil {
+		return nil, status.Error(codes.InvalidArgument, "value at index is nil")
+	}
 	var value ledger.Value
 	// Reconstruct the original value type using is_nil flag
 	// This preserves the distinction between nil and []byte{} that protobuf loses
As per coding guidelines, all inputs must be considered potentially byzantine.
225-237: Guard against nil KeyPart elements.
protoKey.Parts can contain nil entries, which will panic when accessing part.Type.
🐛 Proposed fix
 keyParts := make([]ledger.KeyPart, len(protoKey.Parts))
 for i, part := range protoKey.Parts {
+	if part == nil {
+		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part is nil")
+	}
 	if part.Type > 65535 {
 		return ledger.Key{}, status.Error(codes.InvalidArgument, "key part type exceeds uint16")
 	}
 	keyParts[i] = ledger.NewKeyPart(uint16(part.Type), part.Value)
As per coding guidelines, all inputs must be considered potentially byzantine.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ledger/remote/client.go
ledger/remote/encoding.go
ledger/remote/service.go
🔇 Additional comments (5)
ledger/remote/encoding.go (1)
7-33: LGTM: centralized transport encoding/decoding is consistent.
ledger/remote/client.go (3)
67-73: LGTM for lifecycle helpers and straightforward RPC wrappers.
Also applies to: 96-112, 226-247, 288-331
141-177: Add nil checks for nested proto message fields, not repeated field elements. The coding guideline requires defensive validation of all inputs. However, the concern should target nested message fields (resp.Value in GetSingleValue, resp.State in InitialState, resp.NewState in Set), which can be nil if the server doesn't initialize them. Proto3 guarantees that repeated field elements are never nil after successful unmarshaling, so resp.Values elements in the Get method are safe.
The actual vulnerable locations are:
- GetSingleValue (line 132): resp.Value.Data access without null check on resp.Value
- InitialState (line 85): resp.State.Hash access without null check on resp.State
- Set: resp.NewState.Hash access without null check on resp.NewState
Likely an incorrect or invalid review comment.
114-139: Remove the suggested nil checks on gRPC response structures. In proto3, when a gRPC call succeeds (no error returned), the response message and all nested message fields are guaranteed to be properly initialized by the proto deserialization layer. The generated gRPC code allocates the response object before deserialization, ensuring it's never nil when err == nil. All other methods in this file (InitialState, HasState, Get, Set, Prove) follow the same pattern—checking for errors, then directly accessing response fields—without nil checks on the response structure itself. Validating response data correctness (as done with len(resp.State.Hash)) is appropriate, but nil checks on the response struct itself are redundant and inconsistent with the codebase's established practices.
Likely an incorrect or invalid review comment.
ledger/remote/service.go (1)
15-126: LGTM for service scaffolding and request validation.
Also applies to: 192-223
// Set updates keys with new values at a specific state and returns the new state.
func (c *Client) Set(update *ledger.Update) (ledger.State, *ledger.TrieUpdate, error) {
	ctx := context.Background()
	state := update.State()
	req := &ledgerpb.SetRequest{
		State: &ledgerpb.State{
			Hash: state[:],
		},
		Keys:   make([]*ledgerpb.Key, len(update.Keys())),
		Values: make([]*ledgerpb.Value, len(update.Values())),
	}

	for i, key := range update.Keys() {
		req.Keys[i] = ledgerKeyToProtoKey(key)
	}

	for i, value := range update.Values() {
		// Distinguish between nil and []byte{} for protobuf encoding
		// Protobuf cannot distinguish them, so we use is_nil flag
		isNil := value == nil
		req.Values[i] = &ledgerpb.Value{
			Data:  value,
			IsNil: isNil,
		}
	}

	resp, err := c.client.Set(ctx, req)
	if err != nil {
		return ledger.DummyState, nil, fmt.Errorf("failed to set values: %w", err)
	}

	var newState ledger.State
	if len(resp.NewState.Hash) != len(newState) {
		return ledger.DummyState, nil, fmt.Errorf("invalid new state hash length")
	}
	copy(newState[:], resp.NewState.Hash)

	// Decode trie update using centralized decoding function to ensure
	// client and server use the same encoding method
	trieUpdate, err := decodeTrieUpdateFromTransport(resp.TrieUpdate)
	if err != nil {
		return ledger.DummyState, nil, fmt.Errorf("failed to decode trie update: %w", err)
	}

	return newState, trieUpdate, nil
}
Add nil checks for gRPC response fields in the Set method.
The code directly accesses resp.NewState.Hash without checking if resp or resp.NewState is nil. Per the coding guideline that all inputs must be considered potentially byzantine, a malformed server response with a missing or nil NewState field would cause a nil pointer panic. Add defensive validation before accessing the response field:
resp, err := c.client.Set(ctx, req)
if err != nil {
return ledger.DummyState, nil, fmt.Errorf("failed to set values: %w", err)
}
+if resp == nil || resp.NewState == nil {
+ return ledger.DummyState, nil, fmt.Errorf("invalid response: missing new state")
+}
var newState ledger.State
if len(resp.NewState.Hash) != len(newState) {🤖 Prompt for AI Agents
In `@ledger/remote/client.go` around lines 179 - 224, In Set, guard against nil
gRPC responses by first verifying resp != nil and resp.NewState != nil before
accessing resp.NewState.Hash (and return ledger.DummyState with an error if they
are nil), then validate len(resp.NewState.Hash) as before; also defensively
handle resp.TrieUpdate being nil by either treating it as an empty trie update
(skip decode and set trieUpdate = nil) or returning an explicit error, and only
call decodeTrieUpdateFromTransport when resp.TrieUpdate != nil; update error
messages to mention which field was missing (resp or resp.NewState or
resp.TrieUpdate) to aid debugging.
  - docker-cross-build-execution-arm docker-push-execution-arm

  # execution Ledger Service Build Commands
  - docker-build-execution-ledger-with-adx docker-push-execution-ledger-with-adx
cc @manny-yes for the changes to the ci builds
Close #8308
This PR moves the mtrie into a standalone service (the ledger service) and introduces a flag that allows an execution node to query register values from the remote ledger service.
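Conceptually, the new flag drives a selection like the sketch below; the interface and constructors are placeholders rather than the PR's actual factory code, and only the --ledger-service-addr flag name is taken from this PR.

package main

import (
	"flag"
	"fmt"
)

// Ledger stands in for the ledger.Ledger interface; the two implementations
// below are placeholders for the local and remote factories.
type Ledger interface{ Name() string }

type localLedger struct{}

func (localLedger) Name() string { return "local mtrie" }

type remoteLedger struct{ addr string }

func (r remoteLedger) Name() string { return "remote ledger service at " + r.addr }

func newLedger(serviceAddr string) Ledger {
	if serviceAddr != "" {
		return remoteLedger{addr: serviceAddr}
	}
	return localLedger{}
}

func main() {
	addr := flag.String("ledger-service-addr", "", "address of the remote ledger service (empty means local)")
	flag.Parse()
	fmt.Println("using:", newLedger(*addr).Name())
}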
Summary by CodeRabbit
New Features
Infrastructure
Bug Fixes
Tests & Docs