
fix: apply CGC immediately during network bootstrap phase #8697

Open

qu0b wants to merge 1 commit into sigp:bal-devnet-2 from qu0b:qu0b/fix/cgc-prefork-immediate-activation

Conversation

@qu0b
Contributor

@qu0b qu0b commented Jan 25, 2026

Summary

When validators register during early network bootstrap (epoch 0-1) or before PeerDAS activates, apply the custody group count (CGC) immediately instead of with the standard 30-second delay.

Problem

The standard delay exists to give nodes time to subscribe to new subnets and avoid inconsistent column counts within an epoch. However, during network bootstrap, this delay causes issues:

  1. Node starts with CGC = spec.custody_requirement (4 custody groups)
  2. VC connects and registers validators (should give CGC=128 for 128 validators)
  3. 30-second delay pushes CGC effective epoch to epoch 1+
  4. PeerDAS at Fulu fork needs full custody immediately
  5. Nodes with insufficient custody can't participate properly → chain split

This was observed in bal-devnet-2 testing, where pure Lighthouse networks without the --supernode flag would consistently split by epoch 4-5.

Solution

Apply CGC immediately when:

  • current_epoch <= 1 (bootstrap phase), OR
  • current_epoch < fulu_fork_epoch (pre-PeerDAS)

For established networks (epoch 2+), the standard delay is preserved to ensure smooth subnet subscription coordination.
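The two conditions above reduce to a single predicate. The sketch below is a hypothetical simplification for illustration only: the function name, bare `u64` epochs, and signature are not Lighthouse's actual `register_validators()` code in `custody_context.rs`.

```rust
/// Hypothetical simplification of the bootstrap check described above.
/// Names and types are illustrative, not Lighthouse's real API.
fn apply_cgc_immediately(current_epoch: u64, fulu_fork_epoch: u64) -> bool {
    // Bootstrap phase: the chain has not completed epoch 1, so there is
    // no established subnet topology the 30-second delay would protect.
    let is_bootstrap = current_epoch <= 1;
    // Pre-PeerDAS: before the Fulu fork, column custody is not enforced,
    // so subnet subscription coordination is irrelevant.
    let is_pre_peerdas = current_epoch < fulu_fork_epoch;
    is_bootstrap || is_pre_peerdas
}

fn main() {
    // With fulu_fork_epoch = 0 (the devnet config used in this PR), only
    // the bootstrap clause fires: epochs 0 and 1 qualify, epoch 2 does not.
    assert!(apply_cgc_immediately(0, 0));
    assert!(apply_cgc_immediately(1, 0));
    assert!(!apply_cgc_immediately(2, 0));
    // With a later fork, any pre-Fulu epoch also applies the CGC immediately.
    assert!(apply_cgc_immediately(5, 10));
    println!("immediate-CGC predicate behaves as described");
}
```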

Test Results

Configuration                  | Before Fix             | After Fix
3 LH + 3 Geth (no supernode)   | Chain split by epoch 5 | ✅ Stable to epoch 8+, epoch 7 finalized
3 LH + 3 Geth (all supernodes) | Stable                 | ✅ Stable (unchanged)

Tested on local Kurtosis devnet with preset: minimal, fulu_fork_epoch: 0, gloas_fork_epoch: 1.

Changes

  • beacon_node/beacon_chain/src/custody_context.rs: Modified register_validators() to check for bootstrap phase before applying delay

🤖 Generated with Claude Code

@cla-assistant

cla-assistant bot commented Jan 25, 2026

CLA assistant check
All committers have signed the CLA.

Member

@pawanjay176 pawanjay176 left a comment

I don't think this is safe.

register_validators is called post-genesis, so there could be an inconsistent state where a block is received and the DA check passes with cgc=4 before register_validators completes.
In that case, nobody has the full data for slot 1.

Can we not run as supernodes for kurtosis instead?

@qu0b
Contributor Author

qu0b commented Jan 26, 2026

I don't quite understand what you're describing here. How can there be an inconsistent state with this PR? Your description sounds as if register_validators is called twice, once after genesis and again after receiving a block?

@qu0b
Contributor Author

qu0b commented Jan 26, 2026

How can the check pass for cgc=4 if the LH node has 128 validators?

In the current implementation cgc=4 because of the delay, so nobody has the full data in an LH-only network.

@pawanjay176
Member

How can the check pass for cgc=4 if the LH node has 128 validators?

Because until the register_validators call completes, the cgc is still 4. So if we receive and try to validate block 1 before the register_validators call completes, then we are using cgc=4. After the call completes, the cgc is updated to 128 for the epoch, but we only have 4 columns for the already-validated block, leading to a different cgc value for block 1.
I understand that this isn't happening in your kurtosis runs, but it can still happen. I'm fine with merging this to the BAL branch, but won't be comfortable merging it to unstable.

I think the proper fix would be to allow the prepare_beacon_proposer API to run before genesis and then special-case genesis, instead of special-casing epoch 0 as you did in the PR.

Replace the `epoch <= 1 || is_before_peerdas` special-case with a
simpler approach: start the VC preparation service before genesis wait
so CGC registrations arrive at the BN before the first block.

Changes:
- Move preparation_service.start_update_service() before wait_for_genesis()
- Fix BN HTTP handler to use now_or_genesis() for the CGC slot read
  (chain.slot() returns Err pre-genesis, killing the registration path)
- In register_validators(), apply CGC at epoch 0 when slot == 0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@qu0b qu0b force-pushed the qu0b/fix/cgc-prefork-immediate-activation branch from 1818723 to 83b000a on February 2, 2026 at 13:15
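The reordering the commit message describes can be illustrated with a toy sketch. `PreparationService` and `wait_for_genesis` here are stand-ins, not Lighthouse's real types; the only point being made is the call order, with the preparation service started before the genesis wait.

```rust
/// Toy stand-in for the VC preparation service; not Lighthouse's real type.
struct PreparationService {
    started: bool,
}

impl PreparationService {
    fn start_update_service(&mut self) {
        // In the real client this spawns the task that pushes
        // prepare_beacon_proposer / validator registrations to the BN.
        self.started = true;
    }
}

/// Stand-in for the genesis wait: reports whether registrations were
/// already flowing when we began blocking on genesis.
fn wait_for_genesis(prep: &PreparationService) -> bool {
    prep.started
}

fn main() {
    let mut prep = PreparationService { started: false };
    // Fixed order: start the preparation service first, so CGC
    // registrations reach the BN before the first block...
    prep.start_update_service();
    // ...and only then block on the genesis wait.
    assert!(wait_for_genesis(&prep));
    println!("registrations in flight before genesis");
}
```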
@qu0b
Contributor Author

qu0b commented Feb 2, 2026

@pawanjay176 I let Claude cook up something based on your suggestion.
