Improve autopilot maintenance #4113
Conversation
Code Review
This pull request refactors the autopilot maintenance logic to run in a background task, using a watch channel for synchronization, in order to address the latency of ethflow order indexing. However, the current implementation introduces reliability issues. Most notably, an unchecked unwrap() in the background task creates a Denial of Service (DoS) risk: a panic there can crash the entire autopilot process. Additionally, unexpected stream termination could lead to panics and cascading failures, and the use of try_join! means a failure in one component could cancel the other indexing tasks. Addressing these issues will improve system robustness and availability.
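The unwrap() concern the review raises can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, not the PR's actual code): a maintenance loop that unwraps a fallible indexing step dies on the first transient error, while one that handles the `Result` keeps running.

```rust
// Minimal sketch of the failure mode flagged in the review: an unchecked
// `unwrap()` inside the maintenance loop turns any transient indexing
// error into a panic. Handling the `Result` keeps the loop alive.
// `index_block` and `run_maintenance` are hypothetical stand-ins.
fn index_block(block: u64) -> Result<(), String> {
    // Hypothetical indexing step that can fail transiently.
    if block == 3 {
        Err("rpc timeout".into())
    } else {
        Ok(())
    }
}

fn run_maintenance(blocks: &[u64]) -> Vec<u64> {
    let mut indexed = Vec::new();
    for &block in blocks {
        // Risky version: `index_block(block).unwrap()` — a single transient
        // error would panic and take the whole task (and process) down.
        match index_block(block) {
            Ok(()) => indexed.push(block),
            Err(err) => eprintln!("indexing block {block} failed: {err}; will retry later"),
        }
    }
    indexed
}

fn main() {
    // Block 3 fails, but the loop survives and keeps indexing.
    let indexed = run_maintenance(&[1, 2, 3, 4]);
    println!("indexed blocks: {indexed:?}");
}
```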
squadgazzz
left a comment
The change makes sense to me. But why has the SLI degraded? Is it because of the increased amount of orders or something else?
The SLI measures |
While what Jan wrote is correct, it does not explain why it suddenly got worse. The change that moved ethflow indexing off of the critical path landed a month or so before the ethflow-specific metric took a nosedive. It's possible that, on average, the rest of the order settlement pipeline was fast enough to compensate for the ethflow indexing degradation, but when another performance regression hit later, it suddenly became a lot more problematic. On a different note: I did some analysis, and according to the data in our DB only ~15% of ethflow orders make it into the first possible block. The rest have to wait at least 1 auction, which puts some numbers to the huge difference in the SLI.
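The ~15% figure implies a rough lower bound on the average extra delay for ethflow orders. A back-of-the-envelope sketch, assuming a missed auction costs at least one 12 s block (the numbers, not the code, come from the comment above):

```rust
fn main() {
    // Back-of-the-envelope from the numbers in the comment above.
    // Assumption: every order that misses the first possible block waits
    // at least one auction, i.e. at least one 12 s block.
    let first_block_share = 0.15; // ~15% of ethflow orders land in the first possible block
    let block_time_s = 12.0;
    let min_avg_extra_delay = (1.0 - first_block_share) * block_time_s;
    println!("lower bound on average extra delay: {min_avg_extra_delay:.1} s");
}
```

So on average an ethflow order is delayed by at least ~10 s relative to a regular order that makes the first auction, consistent with the large SLI gap.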
Description
While looking into the degraded time to happy moo SLI it became apparent that ethflow orders have a significantly worse SLI compared to "regular" orders.
Ethflow orders are not harder to solve for than any other orders but they are special in the way they enter the system. Instead of having a REST API call that puts those orders into the DB they get placed by calling the ethflow contract onchain. The autopilot then indexes those events and puts them into the DB.
Since the autopilot run loop is synced to the blockchain (a new auction starts right after a new block is seen), ethflow orders are comparable to regular orders that ALWAYS get placed at the worst possible time (immediately before cutting the auction).
Because we were overwhelmed with indexing ethflow orders during a trade incentive, we moved ethflow indexing off of the critical path (see here), but that also had the consequence that more ethflow orders did not make it into the first possible auction, which immediately delays them by at least 12s.
Changes
This PR puts ethflow order indexing back on the critical path while still avoiding the issue that caused us to move it off the critical path in the first place.
Previously, the autopilot triggered maintenance before building a new auction or after a new block appeared (while waiting for submitted solutions), with an additional background task that checked every second for new ethflow orders that needed indexing.
This PR instead moves autopilot maintenance (i.e. block indexing) completely into a background task that triggers as soon as the system sees a new block. To make sure the auction is only built after the blocks have been indexed, this background task feeds a channel of processed blocks. The autopilot then only has to wait for this channel to yield a high enough block number.
So the properties of the new solution are:
- ethflow order indexing is back on the critical path, so ethflow orders can make it into the first possible auction
- maintenance runs in a background task that triggers as soon as a new block is seen, instead of polling every second
- the auction is only built once the channel of processed blocks yields a high enough block number
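The synchronization described above (a background task publishes processed block numbers; the run loop blocks until a high enough block has been indexed) can be sketched with std-only primitives. The PR itself uses a tokio watch channel; this `Mutex` + `Condvar` version and all its names are a hypothetical simplification showing the same wait-for-block-number pattern:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Shared "latest processed block" state — a std-only stand-in for the
/// watch channel of processed blocks described in the PR (hypothetical).
struct ProcessedBlocks {
    latest: Mutex<u64>,
    changed: Condvar,
}

impl ProcessedBlocks {
    fn new() -> Self {
        Self { latest: Mutex::new(0), changed: Condvar::new() }
    }

    /// The background maintenance task publishes each block it finished indexing.
    fn publish(&self, block: u64) {
        let mut latest = self.latest.lock().unwrap();
        *latest = (*latest).max(block);
        self.changed.notify_all();
    }

    /// Auction building blocks until maintenance has caught up to `block`.
    fn wait_for(&self, block: u64) -> u64 {
        let mut latest = self.latest.lock().unwrap();
        while *latest < block {
            latest = self.changed.wait(latest).unwrap();
        }
        *latest
    }
}

fn main() {
    let state = Arc::new(ProcessedBlocks::new());

    // Background maintenance task: index blocks 1..=5 as they "appear".
    let bg = Arc::clone(&state);
    let handle = thread::spawn(move || {
        for block in 1..=5 {
            thread::sleep(Duration::from_millis(10)); // simulate indexing work
            bg.publish(block);
        }
    });

    // Run loop: only cut the auction once block 5 has been indexed.
    let latest = state.wait_for(5);
    println!("auction built on block {latest}");
    handle.join().unwrap();
}
```

A watch channel gives the same "only the newest value matters" semantics without manual condvar bookkeeping, which is presumably why the PR reaches for it.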
How to test
Covered by existing e2e tests