-
Notifications
You must be signed in to change notification settings - Fork 25
fix(evm): preserve gas cost in cyclic chunks #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces cycle detection using Tarjan's strongly connected components (SCC) algorithm to prevent gas cost shifting into or out of cyclic blocks during the SPP (Safety, Precision, Polynomial-time) optimization phase. The change aims to preserve gas costs within cyclic chunks to maintain correct gas accounting in loops.
Changes:
- Added
computeInCyclefunction implementing Tarjan's SCC algorithm to identify blocks in cycles - Created
NonCycleMaskbitset to mark non-cyclic blocks - Modified gas chunk building logic to skip cyclic blocks during lemma614Update calls
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| std::vector<uint64_t> NonCycleMask(bitsetWordCount(Blocks.size()), 0); | ||
| for (size_t Id = 0; Id < Blocks.size(); ++Id) { | ||
| if (Id < InCycle.size() && InCycle[Id] == 0) { |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The redundant bounds check Id < InCycle.size() adds unnecessary complexity. Since InCycle is always sized to Blocks.size() (from line 370, 375), and the loop iterates from 0 to Blocks.size(), this check will always be true. This check is redundant and should be removed for clarity.
| std::function<void(uint32_t)> StrongConnect = [&](uint32_t Node) { | ||
| Index[Node] = CurIndex; | ||
| Lowlink[Node] = CurIndex; | ||
| ++CurIndex; | ||
| Stack.push_back(Node); | ||
| OnStack[Node] = 1; | ||
|
|
||
| for (uint32_t Succ : Blocks[Node].Succs) { | ||
| if (Succ >= NumBlocks) { | ||
| continue; | ||
| } | ||
| if (Index[Succ] < 0) { | ||
| StrongConnect(Succ); | ||
| Lowlink[Node] = std::min(Lowlink[Node], Lowlink[Succ]); | ||
| } else if (OnStack[Succ] != 0) { | ||
| Lowlink[Node] = std::min(Lowlink[Node], Index[Succ]); | ||
| } | ||
| } | ||
|
|
||
| if (Lowlink[Node] != Index[Node]) { | ||
| return; | ||
| } | ||
|
|
||
| std::vector<uint32_t> Scc; | ||
| while (!Stack.empty()) { | ||
| const uint32_t Top = Stack.back(); | ||
| Stack.pop_back(); | ||
| OnStack[Top] = 0; | ||
| Scc.push_back(Top); | ||
| if (Top == Node) { | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (Scc.size() > 1) { | ||
| for (uint32_t Id : Scc) { | ||
| InCycle[Id] = 1; | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| const uint32_t Only = Scc.front(); | ||
| for (uint32_t Succ : Blocks[Only].Succs) { | ||
| if (Succ == Only) { | ||
| InCycle[Only] = 1; | ||
| break; | ||
| } |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recursive lambda StrongConnect can cause stack overflow for deeply nested or large graphs due to unbounded recursion depth. For EVM bytecode with complex control flow graphs containing thousands of blocks, this could be problematic.
Consider converting this to an iterative implementation using an explicit stack to avoid potential stack overflow issues, or document the assumption about maximum CFG depth if that's acceptable for this use case.
| std::function<void(uint32_t)> StrongConnect = [&](uint32_t Node) { | |
| Index[Node] = CurIndex; | |
| Lowlink[Node] = CurIndex; | |
| ++CurIndex; | |
| Stack.push_back(Node); | |
| OnStack[Node] = 1; | |
| for (uint32_t Succ : Blocks[Node].Succs) { | |
| if (Succ >= NumBlocks) { | |
| continue; | |
| } | |
| if (Index[Succ] < 0) { | |
| StrongConnect(Succ); | |
| Lowlink[Node] = std::min(Lowlink[Node], Lowlink[Succ]); | |
| } else if (OnStack[Succ] != 0) { | |
| Lowlink[Node] = std::min(Lowlink[Node], Index[Succ]); | |
| } | |
| } | |
| if (Lowlink[Node] != Index[Node]) { | |
| return; | |
| } | |
| std::vector<uint32_t> Scc; | |
| while (!Stack.empty()) { | |
| const uint32_t Top = Stack.back(); | |
| Stack.pop_back(); | |
| OnStack[Top] = 0; | |
| Scc.push_back(Top); | |
| if (Top == Node) { | |
| break; | |
| } | |
| } | |
| if (Scc.size() > 1) { | |
| for (uint32_t Id : Scc) { | |
| InCycle[Id] = 1; | |
| } | |
| return; | |
| } | |
| const uint32_t Only = Scc.front(); | |
| for (uint32_t Succ : Blocks[Only].Succs) { | |
| if (Succ == Only) { | |
| InCycle[Only] = 1; | |
| break; | |
| } | |
| std::function<void(uint32_t)> StrongConnect = [&](uint32_t StartNode) { | |
| // Iterative version of Tarjan's algorithm to avoid recursive calls. | |
| struct Frame { | |
| uint32_t Node; | |
| size_t SuccIndex; | |
| }; | |
| std::vector<Frame> DfsStack; | |
| DfsStack.push_back(Frame{StartNode, 0}); | |
| while (!DfsStack.empty()) { | |
| Frame &CurFrame = DfsStack.back(); | |
| const uint32_t Node = CurFrame.Node; | |
| if (Index[Node] < 0) { | |
| Index[Node] = CurIndex; | |
| Lowlink[Node] = CurIndex; | |
| ++CurIndex; | |
| Stack.push_back(Node); | |
| OnStack[Node] = 1; | |
| } | |
| const auto &Succs = Blocks[Node].Succs; | |
| bool Descended = false; | |
| while (CurFrame.SuccIndex < Succs.size()) { | |
| const uint32_t Succ = Succs[CurFrame.SuccIndex]; | |
| ++CurFrame.SuccIndex; | |
| if (Succ >= NumBlocks) { | |
| continue; | |
| } | |
| if (Index[Succ] < 0) { | |
| // Tree edge: descend into successor. | |
| DfsStack.push_back(Frame{Succ, 0}); | |
| Descended = true; | |
| break; | |
| } | |
| if (OnStack[Succ] != 0) { | |
| // Successor in current SCC stack: update lowlink with its index. | |
| Lowlink[Node] = std::min(Lowlink[Node], Index[Succ]); | |
| } | |
| } | |
| if (Descended) { | |
| continue; | |
| } | |
| // All successors processed; check if Node is a root of an SCC. | |
| if (Lowlink[Node] == Index[Node]) { | |
| std::vector<uint32_t> Scc; | |
| while (!Stack.empty()) { | |
| const uint32_t Top = Stack.back(); | |
| Stack.pop_back(); | |
| OnStack[Top] = 0; | |
| Scc.push_back(Top); | |
| if (Top == Node) { | |
| break; | |
| } | |
| } | |
| if (Scc.size() > 1) { | |
| for (uint32_t Id : Scc) { | |
| InCycle[Id] = 1; | |
| } | |
| } else if (!Scc.empty()) { | |
| const uint32_t Only = Scc.front(); | |
| for (uint32_t Succ : Blocks[Only].Succs) { | |
| if (Succ == Only) { | |
| InCycle[Only] = 1; | |
| break; | |
| } | |
| } | |
| } | |
| } | |
| // Propagate lowlink information to the parent in the DFS stack. | |
| DfsStack.pop_back(); | |
| if (!DfsStack.empty()) { | |
| const uint32_t ParentNode = DfsStack.back().Node; | |
| Lowlink[ParentNode] = std::min(Lowlink[ParentNode], Lowlink[Node]); | |
| } |
| if (!UseLinearSPP) { | ||
| for (uint32_t NodeId : RevTopo) { | ||
| lemma614Update(NodeId, Blocks, &BackEdges, nullptr, Metering); | ||
| if (NodeId < InCycle.size() && InCycle[NodeId] != 0) { | ||
| continue; | ||
| } | ||
| lemma614Update(NodeId, Blocks, &BackEdges, &NonCycleMask, Metering); |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code now uses two different mechanisms to identify loops/cycles: (1) dominator-based back edges via BackEdges, and (2) SCC-based cycle detection via InCycle. These serve different purposes but may produce conflicting results.
Specifically, BackEdges identifies loop back edges (edges where target dominates source), while InCycle identifies all nodes in any strongly connected component. A node can be in a cycle without having any back edges (e.g., in an irreducible loop), and a node with back edges might not be in the SCC identified as a cycle.
This dual approach could lead to inconsistent treatment of loop nodes. Consider documenting why both mechanisms are needed and how they interact, or simplifying to use a single, consistent approach to loop identification.
| if (NodeId < InCycle.size() && InCycle[NodeId] != 0) { | ||
| continue; | ||
| } | ||
| lemma614Update(NodeId, Blocks, nullptr, &NonCycleMask, Metering); |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AllowedMask parameter has been changed from &Loops[LoopId].NodeMask to &NonCycleMask, which fundamentally alters the algorithm's behavior.
The original code restricted updates to successors within the specific loop being processed (via Loops[LoopId].NodeMask). The new code only restricts to non-cyclic successors globally (via NonCycleMask), which means it will incorrectly consider successors outside the current loop during loop-specific processing.
For example, if a non-cyclic node N in loop L1 has a successor S that is non-cyclic and in loop L2 (or not in any loop), the update will now incorrectly include S in the minimum cost calculation. This breaks the loop-local invariant that the SPP algorithm depends on.
The correct approach would be to compute an intersection of both masks, or to restructure the logic to maintain the loop-specific constraint while also excluding cyclic nodes.
| if (!UseLinearSPP) { | ||
| for (uint32_t NodeId : RevTopo) { | ||
| lemma614Update(NodeId, Blocks, &BackEdges, nullptr, Metering); | ||
| if (NodeId < InCycle.size() && InCycle[NodeId] != 0) { |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds check NodeId < InCycle.size() is redundant because InCycle is created with size NumBlocks (line 375), and NodeId comes from iterating over RevTopo which contains node indices less than NumBlocks. The same InCycle vector is used throughout without any size changes.
This check pattern appears consistently but unnecessarily at lines 967, 999, and 1010. While not incorrect, it adds unnecessary overhead and code complexity. Consider removing these redundant checks or add a comment explaining why they are needed if there's a specific edge case they guard against.
| if (NodeId < InCycle.size() && InCycle[NodeId] != 0) { | |
| if (InCycle[NodeId] != 0) { |
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note