Skip to content

Conversation

@ibalajiarun
Copy link
Contributor

@ibalajiarun ibalajiarun commented Nov 11, 2025

Description

This PR introduces support for BatchV2 in QuorumStore, enabling batches to carry additional metadata (such as batch kind) while maintaining full backward compatibility with BatchV1. The implementation uses BatchInfoExt as a unified type that can represent both V1 and V2 batches, and adds new network message types to handle V2 batches. Sending BatchV2 is configured via a local Quorum Store Config flag.

BatchMsg and related QS messages are used to handle BatchInfo or BatchInfoExt::V1 batches normally. BatchMsgV2 and other V2 messages verify that BatchInfoExt::V2 is used. This is to simplify handling of different message types on the receiver.

@ibalajiarun ibalajiarun changed the base branch from balaji/enc-pool-3 to graphite-base/18085 November 11, 2025 21:37
@ibalajiarun ibalajiarun changed the base branch from graphite-base/18085 to balaji/enc-pool-3 November 12, 2025 00:20
@ibalajiarun ibalajiarun force-pushed the balaji/enc-pool-4 branch 2 times, most recently from f2dbced to 9244123 Compare November 14, 2025 02:23
@ibalajiarun ibalajiarun changed the title [qs] send proof v2 support with flag [qs] Support V2 SignedBatchInfo and ProofOfStore messages with BlockInfoExt Nov 14, 2025
@ibalajiarun ibalajiarun marked this pull request as ready for review November 14, 2025 03:57
Copy link
Contributor

@JoshLind JoshLind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me 😄

@ibalajiarun ibalajiarun changed the base branch from balaji/enc-pool-3 to graphite-base/18085 November 16, 2025 05:52
@ibalajiarun ibalajiarun changed the base branch from graphite-base/18085 to balaji/enc-pool-3 November 17, 2025 15:18
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on December 7

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

let batch: Batch<BatchInfoExt> = value.try_into().unwrap();
let batch: Batch<BatchInfo> = batch
.try_into()
.expect("Batch retieval requests must be for V1 batch");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Batch retrieval panics when serving V2 batches

The batch retrieval handler unconditionally attempts to convert Batch<BatchInfoExt> to Batch<BatchInfo> via try_into().expect(), which panics for V2 batches. The batch store now contains both V1 and V2 batches (as PersistedValue<BatchInfoExt>), and a BatchResponse::BatchV2 variant exists for V2 responses. However, when a retrieval request arrives for a V2 batch's digest, the code panics instead of either returning BatchV2 or gracefully handling the incompatibility. This could crash nodes when enable_batch_v2 is enabled and another node requests a V2 batch via the legacy retrieval mechanism.

Fix in Cursor Fix in Web

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

if !batch_info.is_v2() {
self.generate_signed_batch_info(batch_info.info().clone())
.ok()
.map(|inner| inner.into())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Signature verification fails due to type conversion mismatch

For V1 batches, generate_signed_batch_info(batch_info.info().clone()) creates a signature over BatchInfo, then .map(|inner| inner.into()) converts the result to SignedBatchInfo<BatchInfoExt> by wrapping the info in BatchInfoExt::V1. However, the signature remains unchanged. When verification occurs via optimistic_verify(self.signer, &self.info, &self.signature), it hashes BatchInfoExt::V1{info: BatchInfo} which has a different BCS serialization than BatchInfo (enum discriminator byte is prepended). This causes signature verification to fail because the signed data doesn't match the verified data. The PR author acknowledged this issue in the discussion.

Additional Locations (1)

Fix in Cursor Fix in Web

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

✅ Forge suite compat success on 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d

Compatibility test results for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d (PR)
1. Check liveness of validators at old version: 92c0534410a26cafa4509b2dfda12c86811abf0a
compatibility::simple-validator-upgrade::liveness-check : committed: 13395.68 txn/s, latency: 2586.87 ms, (p50: 2700 ms, p70: 2800, p90: 3300 ms, p99: 4200 ms), latency samples: 442300
2. Upgrading first Validator to new version: 0c49bf4e08275a47c56ed52473bf15632fe46c0d
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5959.69 txn/s, latency: 5707.30 ms, (p50: 6300 ms, p70: 6400, p90: 6500 ms, p99: 6600 ms), latency samples: 203700
3. Upgrading rest of first batch to new version: 0c49bf4e08275a47c56ed52473bf15632fe46c0d
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6000.22 txn/s, latency: 5646.27 ms, (p50: 6200 ms, p70: 6300, p90: 6400 ms, p99: 6500 ms), latency samples: 209780
4. upgrading second batch to new version: 0c49bf4e08275a47c56ed52473bf15632fe46c0d
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10449.33 txn/s, latency: 3114.16 ms, (p50: 3300 ms, p70: 3400, p90: 3500 ms, p99: 3700 ms), latency samples: 343500
5. check swarm health
Compatibility test for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d passed
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

✅ Forge suite framework_upgrade success on 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d

Compatibility test results for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d (PR)
Upgrade the nodes to version: 0c49bf4e08275a47c56ed52473bf15632fe46c0d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2171.19 txn/s, submitted: 2179.28 txn/s, failed submission: 8.09 txn/s, expired: 8.09 txn/s, latency: 1323.71 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 3000 ms), latency samples: 198660
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2200.93 txn/s, submitted: 2207.99 txn/s, failed submission: 7.06 txn/s, expired: 7.06 txn/s, latency: 1320.10 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 2700 ms), latency samples: 199560
5. check swarm health
Compatibility test for 92c0534410a26cafa4509b2dfda12c86811abf0a ==> 0c49bf4e08275a47c56ed52473bf15632fe46c0d passed
Upgrade the remaining nodes to version: 0c49bf4e08275a47c56ed52473bf15632fe46c0d
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2244.73 txn/s, submitted: 2252.11 txn/s, failed submission: 7.38 txn/s, expired: 7.38 txn/s, latency: 1325.48 ms, (p50: 1200 ms, p70: 1500, p90: 1600 ms, p99: 2400 ms), latency samples: 200781
Test Ok

@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

✅ Forge suite realistic_env_max_load success on 0c49bf4e08275a47c56ed52473bf15632fe46c0d

Forge report malformed: Expecting ',' delimiter: line 7 column 1 (char 136)
'{\n  "metrics": [\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "submitted_txn",\n      "value": 5088960.0\n[2025-12-02T01:42:15Z INFO  aptos_forge::report] Test Ok\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "expired_txn",\n      "value": 0.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "avg_tps",\n      "value": 13668.686596280544\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "avg_latency",\n      "value": 2756.7169472740993\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p50_latency",\n      "value": 2700.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p90_latency",\n      "value": 3000.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p99_latency",\n      "value": 3600.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "submitted_txn",\n      "value": 42680.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "expired_txn",\n      "value": 0.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "avg_tps",\n      "value": 100.02417686183487\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "avg_latency",\n      "value": 733.6994047619048\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p50_latency",\n      "value": 700.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p90_latency",\n      "value": 900.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p99_latency",\n      "value": 1000.0\n    }\n  ],\n  "text": "two traffics test: inner traffic : committed: 13668.69 txn/s, latency: 2756.72 ms, (p50: 2700 ms, p70: 2900, p90: 3000 ms, p99: 3600 ms), latency samples: 5088960\\ntwo traffics test : committed: 100.02 txn/s, latency: 733.70 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 1000 ms), latency samples: 1680\\nLatency breakdown for phase 0: [\\"MempoolToBlockCreation: max: 2.263, avg: 2.177\\", \\"ConsensusProposalToOrdered: max: 0.169, avg: 0.166\\", \\"ConsensusOrderedToCommit: max: 0.058, avg: 0.052\\", \\"ConsensusProposalToCommit: max: 0.224, avg: 0.218\\"]\\nMax non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.54s no progress at version 14521 (avg 0.07s) [limit 15].\\nMax epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.30s no progress at version 2410741 (avg 0.30s) [limit 16].\\nTest Ok"\n}'
Trailing Log Lines:
networkchaos.chaos-mesh.org "4-gcp--as-southeast1-to-3-gcp--us-east4-netem" deleted
test CompositeNetworkTest ... ok
Test Statistics: 
two traffics test: inner traffic : committed: 13668.69 txn/s, latency: 2756.72 ms, (p50: 2700 ms, p70: 2900, p90: 3000 ms, p99: 3600 ms), latency samples: 5088960
two traffics test : committed: 100.02 txn/s, latency: 733.70 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 1000 ms), latency samples: 1680
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.263, avg: 2.177", "ConsensusProposalToOrdered: max: 0.169, avg: 0.166", "ConsensusOrderedToCommit: max: 0.058, avg: 0.052", "ConsensusProposalToCommit: max: 0.224, avg: 0.218"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.54s no progress at version 14521 (avg 0.07s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.30s no progress at version 2410741 (avg 0.30s) [limit 16].
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="41dda6b1-0acb-493d-bc6c-7d3c51c5c1cf">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="CompositeNetworkTest(network:multi-region-network-emulation(two traffics test)) with ">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2025-12-02T01:42:15Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-e2e-pr-18085: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2025-12-02T01:42:15Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-e2e-pr-18085

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge5c1f69d2-0       1/1     Running     0          13m
aptos-node-0-validator-0                     1/1     Running     0          13m
aptos-node-1-fullnode-eforge5c1f69d2-0       1/1     Running     0          13m
aptos-node-1-validator-0                     1/1     Running     0          13m
aptos-node-2-fullnode-eforge5c1f69d2-0       1/1     Running     0          13m
aptos-node-2-validator-0                     1/1     Running     0          13m
aptos-node-3-fullnode-eforge5c1f69d2-0       1/1     Running     0          13m
aptos-node-3-validator-0                     1/1     Running     0          13m
aptos-node-4-fullnode-eforge5c1f69d2-0       1/1     Running     0          13m
aptos-node-4-validator-0                     1/1     Running     0          13m
aptos-node-5-validator-0                     1/1     Running     0          13m
aptos-node-6-validator-0                     1/1     Running     0          13m
forge-testnet-deployer-58c6w                 0/1     Completed   0          14m
genesis-aptos-genesis-eforge5c1f69d2-dhf4l   0/1     Completed   0          13m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants