Skip to content

Conversation

@ibalajiarun
Copy link
Contributor

@ibalajiarun ibalajiarun commented Nov 11, 2025

Description

Introduce OptQuorumStoreV2 Payload that can include BatchV2 in the payload. It uses the same generic trick as the prior PRs to have multiple implementation over BatchInfo and BatchInfoExt. Processing OptQuorumStoreV2 is disabled during proposal handling phase and will be enabled in a future PR.

@ibalajiarun ibalajiarun force-pushed the balaji/enc-pool-5 branch 2 times, most recently from 0680427 to 77720f1 Compare November 17, 2025 15:18
@ibalajiarun ibalajiarun force-pushed the balaji/enc-pool-5 branch 2 times, most recently from 0006bf4 to 68e12f2 Compare November 24, 2025 20:26
@ibalajiarun ibalajiarun changed the base branch from balaji/enc-pool-5 to graphite-base/18087 November 25, 2025 19:06
@ibalajiarun ibalajiarun changed the base branch from graphite-base/18087 to balaji/enc-pool-4 November 25, 2025 19:06
@ibalajiarun ibalajiarun force-pushed the balaji/enc-pool-6 branch 3 times, most recently from 88d173e to 1d108f7 Compare November 25, 2025 19:28
@ibalajiarun ibalajiarun marked this pull request as ready for review November 25, 2025 19:30
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Bug: Missing V2 handler causes unreachable panic

The get_transactions method handles OptQuorumStorePayload::V1 but not V2, causing V2 payloads to hit the unreachable!() macro and panic. This prevents processing of V2 payloads even though they're introduced in this PR and handled elsewhere in the codebase.

consensus/src/payload_manager/quorum_store_payload_manager.rs#L541-L548

},
_ => unreachable!(
"Wrong payload {} epoch {}, round {}, id {}",
payload,
block.block_data().epoch(),
block.block_data().round(),
block.id()
),

Fix in Cursor Fix in Web


Copy link
Contributor

@danielxiangzl danielxiangzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. cursor bot is good at reviewing pr like this..

Copy link
Contributor

@JoshLind JoshLind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

self.opt_batches.num_bytes() + self.proofs.num_bytes() + self.inline_batches.num_bytes()
pub(crate) fn extend(self, other: Self) -> Self {
match (self, other) {
(Self::V1(p1), Self::V1(p2)) => Self::V1(p1.extend(p2)),
Copy link
Contributor

@JoshLind JoshLind Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question, but do we imagine cases where we're extending V1 and V2 batches together, e.g., during a rollout? Or would they only be exclusive (i.e., only V1 extending other V1s, and V2 extending other V2s?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extend fn is just for the DAG implementation, where it is possible to have V1 extending V2, etc (we don't use that code path anymore, maybe time for a cleanup?). For our leader based consensus, we only propose one "Payload" and execute it.

@ibalajiarun ibalajiarun changed the base branch from balaji/enc-pool-4 to graphite-base/18087 December 2, 2025 01:05
@ibalajiarun ibalajiarun changed the base branch from graphite-base/18087 to main December 3, 2025 04:45
@ibalajiarun ibalajiarun enabled auto-merge (squash) December 3, 2025 04:54
.await?
},
Payload::OptQuorumStore(opt_qs_payload) => {
Payload::OptQuorumStore(OptQuorumStorePayload::V1(opt_qs_payload)) => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing V2 handler causes panic instead of graceful error

The get_transactions function only handles OptQuorumStorePayload::V1 and falls through to unreachable!() for V2. This is inconsistent with other functions in the same file (prefetch_payload_data, check_payload_availability, notify_commit, get_inline_transactions) which explicitly handle both V1 and V2 variants. The get_inline_transactions function demonstrates the expected pattern: handle V2 with error!("OptQSPayload V2 is not expected") and return gracefully. If V2 were to reach get_transactions, it would panic instead of returning an error.

Fix in Cursor Fix in Web

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ Forge suite compat success on 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f

Compatibility test results for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f (PR)
1. Check liveness of validators at old version: 92c0534410a26cafa4509b2dfda12c86811abf0a
compatibility::simple-validator-upgrade::liveness-check : committed: 12747.27 txn/s, latency: 2725.31 ms, (p50: 2800 ms, p70: 3000, p90: 3300 ms, p99: 3700 ms), latency samples: 422220
2. Upgrading first Validator to new version: d50062252700edf240ba9620a57ec7176fa69e9f
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6089.18 txn/s, latency: 5582.39 ms, (p50: 6200 ms, p70: 6300, p90: 6400 ms, p99: 6500 ms), latency samples: 206240
3. Upgrading rest of first batch to new version: d50062252700edf240ba9620a57ec7176fa69e9f
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6016.26 txn/s, latency: 5637.12 ms, (p50: 6200 ms, p70: 6300, p90: 6500 ms, p99: 6600 ms), latency samples: 208980
4. upgrading second batch to new version: d50062252700edf240ba9620a57ec7176fa69e9f
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9979.23 txn/s, latency: 3277.53 ms, (p50: 3300 ms, p70: 3700, p90: 4500 ms, p99: 5200 ms), latency samples: 331580
5. check swarm health
Compatibility test for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f passed
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ Forge suite realistic_env_max_load success on d50062252700edf240ba9620a57ec7176fa69e9f

two traffics test: inner traffic : committed: 13592.43 txn/s, latency: 2774.15 ms, (p50: 2700 ms, p70: 2900, p90: 3000 ms, p99: 3600 ms), latency samples: 5060840
two traffics test : committed: 99.98 txn/s, latency: 725.20 ms, (p50: 700 ms, p70: 700, p90: 900 ms, p99: 1100 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.277, avg: 2.183", "ConsensusProposalToOrdered: max: 0.171, avg: 0.166", "ConsensusOrderedToCommit: max: 0.053, avg: 0.049", "ConsensusProposalToCommit: max: 0.222, avg: 0.215"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.46s no progress at version 29848 (avg 0.07s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.27s no progress at version 2089757 (avg 0.27s) [limit 16].
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ Forge suite framework_upgrade success on 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f

Forge report malformed: Expecting property name enclosed in double quotes: line 6 column 1 (char 139)
'{\n  "metrics": [\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "submitted_txn",\n[2025-12-03T05:30:54Z INFO  aptos_forge::report] Test Ok\n      "value": 208881.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "expired_txn",\n      "value": 600.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_tps",\n      "value": 2314.8371241832647\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_latency",\n      "value": 1266.1030002736688\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p50_latency",\n      "value": 1200.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p90_latency",\n      "value": 1500.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p99_latency",\n      "value": 1900.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "submitted_txn",\n      "value": 135520.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "expired_txn",\n      "value": 520.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_tps",\n      "value": 1499.9998345833515\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_latency",\n      "value": 1970.2481407407408\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p50_latency",\n      "value": 1200.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p90_latency",\n      "value": 1800.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p99_latency",\n      "value": 11500.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "submitted_txn",\n      "value": 208563.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "expired_txn",\n      "value": 1000.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_tps",\n      "value": 2286.792608149354\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "avg_latency",\n      "value": 1270.4775947543637\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p50_latency",\n      "value": 1200.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p90_latency",\n      "value": 1500.0\n    },\n    {\n      "test_name": "framework_upgrade::framework-upgrade::full-framework-upgrade",\n      "metric": "p99_latency",\n      "value": 1800.0\n    }\n  ],\n  "text": "Compatibility test results for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f (PR)\\nUpgrade the nodes to version: d50062252700edf240ba9620a57ec7176fa69e9f\\nframework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2314.84 txn/s, submitted: 2321.51 txn/s, failed submission: 6.67 txn/s, expired: 6.67 txn/s, latency: 1266.10 ms, (p50: 1200 ms, p70: 1500, p90: 1500 ms, p99: 1900 ms), latency samples: 208281\\nframework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1500.00 txn/s, submitted: 1505.78 txn/s, failed submission: 5.78 txn/s, expired: 5.78 txn/s, latency: 1970.25 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 11500 ms), latency samples: 135000\\n5. check swarm health\\nCompatibility test for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f passed\\nUpgrade the remaining nodes to version: d50062252700edf240ba9620a57ec7176fa69e9f\\nframework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2286.79 txn/s, submitted: 2297.81 txn/s, failed submission: 11.02 txn/s, expired: 11.02 txn/s, latency: 1270.48 ms, (p50: 1200 ms, p70: 1500, p90: 1500 ms, p99: 1800 ms), latency samples: 207563\\nTest Ok"\n}'
Trailing Log Lines:
Compatibility test results for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f (PR)
Upgrade the nodes to version: d50062252700edf240ba9620a57ec7176fa69e9f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2314.84 txn/s, submitted: 2321.51 txn/s, failed submission: 6.67 txn/s, expired: 6.67 txn/s, latency: 1266.10 ms, (p50: 1200 ms, p70: 1500, p90: 1500 ms, p99: 1900 ms), latency samples: 208281
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1500.00 txn/s, submitted: 1505.78 txn/s, failed submission: 5.78 txn/s, expired: 5.78 txn/s, latency: 1970.25 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 11500 ms), latency samples: 135000
5. check swarm health
Compatibility test for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> d50062252700edf240ba9620a57ec7176fa69e9f passed
Upgrade the remaining nodes to version: d50062252700edf240ba9620a57ec7176fa69e9f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2286.79 txn/s, submitted: 2297.81 txn/s, failed submission: 11.02 txn/s, expired: 11.02 txn/s, latency: 1270.48 ms, (p50: 1200 ms, p70: 1500, p90: 1500 ms, p99: 1800 ms), latency samples: 207563
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="8f660244-f1a1-40ff-87ff-c4980924f729">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="framework_upgrade::framework-upgrade">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2025-12-03T05:30:54Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-framework-upgrade-pr-18087: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2025-12-03T05:30:54Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-framework-upgrade-pr-18087

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                     1/1     Running     0          11m
aptos-node-1-validator-0                     1/1     Running     0          11m
aptos-node-2-validator-0                     1/1     Running     0          3m56s
aptos-node-3-validator-0                     1/1     Running     0          3m18s
forge-testnet-deployer-jdfjv                 0/1     Completed   0          15m
genesis-aptos-genesis-eforge5a8d7bdf-fxvwp   0/1     Completed   0          15m

@ibalajiarun ibalajiarun merged commit 5c8402e into main Dec 3, 2025
121 of 131 checks passed
@ibalajiarun ibalajiarun deleted the balaji/enc-pool-6 branch December 3, 2025 05:38
0o-de-lally pushed a commit to 0LNetworkCommunity/zapatos that referenced this pull request Dec 9, 2025
* [consensus] Introduce OptQS::V2 Payload
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants