feat(monitor): add number_gte support with tiered backoff to allocation queries #912
Conversation
Coveralls: Pull Request Test Coverage Report for Build 21385200531
Add a tiered backoff strategy to handle Gateway routing to indexers with varying sync states. When queries fail due to "block too high" errors, the backoff progressively relaxes the number_gte constraint:

- Tier 0: exact block (no backoff)
- Tier 1: -100 blocks (~25 sec on Arbitrum)
- Tier 2: -1,000 blocks (~4 min on Arbitrum)
- Tier 3: -10,000 blocks (~40 min on Arbitrum)
- Tier 4+: use max(latest) from Gateway error

Includes parsing of Gateway error messages to extract available block numbers as a final fallback.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
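A minimal sketch of how the tier schedule above could map to code; the PR's actual TieredBlockBackoff internals are not quoted in this thread, so the names and shapes here are illustrative:

```rust
/// Block offset to subtract from the last known block at a given tier.
/// Illustrative only; see the commit message above for the real schedule.
fn tier_offset(tier: usize) -> Option<u64> {
    match tier {
        0 => Some(0),      // Tier 0: exact block (no backoff)
        1 => Some(100),    // Tier 1: ~25 sec of Arbitrum blocks
        2 => Some(1_000),  // Tier 2: ~4 min
        3 => Some(10_000), // Tier 3: ~40 min (removed in a later commit)
        _ => None,         // Tier 4+: use max(latest) from the Gateway error
    }
}

/// number_gte value for the next attempt; once the fixed tiers are
/// exhausted, fall back to the latest block parsed from the Gateway error.
fn next_min_block(last_block: u64, tier: usize, gateway_latest: Option<u64>) -> Option<u64> {
    match tier_offset(tier) {
        Some(offset) => Some(last_block.saturating_sub(offset)),
        None => gateway_latest,
    }
}
```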
Update indexer_allocations to use TieredBlockBackoff for handling Gateway routing to indexers with varying sync states:

- Add number_gte parameter to get_allocations_with_block_state
- Track block number from successful responses
- Use number_gte constraint on subsequent queries to ensure freshness
- Progress through backoff tiers on "block too high" errors
- Preserve existing allocation data when receiving empty results

The backoff logic is implemented directly in the allocations module for simplicity, avoiding unnecessary abstractions in the watcher crate.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
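A hedged sketch of the refresh flow this commit describes; GatewayResponse, fetch_allocations, and the error handling are stand-ins for the crate's real types, not its actual API:

```rust
struct GatewayResponse {
    block_number: u64,
    allocations: Vec<String>, // simplified: allocation ids only
}

// Stand-in for the real subgraph query; assumed signature.
async fn fetch_allocations(_number_gte: Option<u64>) -> Result<GatewayResponse, String> {
    unimplemented!("placeholder for the real Gateway query")
}

fn is_block_too_high(err: &str) -> bool {
    err.contains("missing block") // simplified; see the review discussion below
}

async fn refresh(
    last_good_block: &mut Option<u64>,
    cache: &mut Vec<String>,
) -> Result<(), String> {
    // Tier offsets: exact block, then -100, then -1,000 blocks.
    for offset in [0u64, 100, 1_000] {
        let number_gte = last_good_block.map(|b| b.saturating_sub(offset));
        match fetch_allocations(number_gte).await {
            Ok(resp) => {
                // Track the block number from the successful response.
                *last_good_block = Some(resp.block_number);
                // Preserve existing allocation data on empty results.
                if !resp.allocations.is_empty() {
                    *cache = resp.allocations;
                }
                return Ok(());
            }
            // Progress to the next backoff tier on "block too high" errors.
            Err(e) if is_block_too_high(&e) => continue,
            Err(e) => return Err(e),
        }
    }
    Err("exhausted backoff tiers".to_string())
}
```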
Apply the same number_gte with tiered backoff pattern to the closed allocations query in sender_account.rs. This ensures consistency with allocation queries and handles Gateway routing to indexers with varying sync states.

Changes:
- Add retry loop with TieredBlockBackoff for check_closed_allocations
- Extract query logic to query_closed_allocations helper
- Use number_gte constraint on initial query, hash for pagination
- Log backoff tier progression on block-too-high errors

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…query

Fixes #911

Tokio's time::interval fires immediately on the first tick, causing a redundant second query right after the initial query. This could overwrite valid data if the second query fails or returns stale results.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove 10,000 block tier (40 min too aggressive, 4 min sufficient)
- Add ARBITRUM_MIN_EXPECTED_BLOCK (400M) sanity check on initial query
- Retry up to 3 times if initial block is suspiciously low
- Change closed allocations retry to 2 attempts per tier before advancing

Co-Authored-By: Claude Opus 4.5 <[email protected]>
If you have time for a review, @suchapalaver
neithanmo left a comment
While I understand the intent to handle Gateway routing to out-of-sync indexers, this PR has issues that block merging:

- Hardcoded ARBITRUM_MIN_EXPECTED_BLOCK = 400_000_000 - breaks on Base, Sepolia, local dev, and any non-Arbitrum-mainnet chain.
- Brittle error parsing - string matching on "missing block" / "latest: \d+" is not a stable contract. Gateway error formats can change without notice.
- The "empty = stale" logic - the should_update check rejects empty results even from fresh indexers (this needs double-checking, though). If an indexer legitimately closes all allocations, the watcher will preserve stale data indefinitely.
On the broader approach: the current behavior without this PR is recoverable - queries fail, get logged, and are retried until the Gateway routes to a synced indexer.

We think this issue should be addressed on the Gateway side, as the problem is not severe enough to warrant a client-side workaround.
@MoonBoi9001 We are investigating the Gateway side to figure out the best approach.
Thanks for digging into this and putting together a detailed solution - the tiered backoff concept is solid thinking. Appreciate the effort to improve reliability here! 🙏
```rust
const MAX_RETRIES: usize = 3;

for attempt in 1..=MAX_RETRIES {
    match initial_result.block_number {
        Some(block) if block >= ARBITRUM_MIN_EXPECTED_BLOCK => break,
```
This constant is a magic number and is problematic because:

- It's Arbitrum-specific, but the code is meant to be generic.
- It will be wrong for other chains like Base, Base Sepolia, the local testnets we use for testing, and so on.
- One option is to make this value configurable, but that moves the burden of choosing a minimum block number threshold onto indexer operators (sketched below).
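For illustration, the configurable variant mentioned in the last bullet might look roughly like this; the struct and field name are hypothetical, not part of the crate:

```rust
// Hypothetical config shape; not code from this repository.
#[derive(Debug, Default)]
struct SubgraphConfig {
    /// Minimum block a response must be synced to before it is trusted.
    /// None disables the check, so Base, Sepolia, and local dev networks
    /// work without per-chain tuning.
    min_expected_block: Option<u64>,
}

fn passes_sanity_check(cfg: &SubgraphConfig, block: u64) -> bool {
    cfg.min_expected_block.map_or(true, |min| block >= min)
}
```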
```rust
/// Checks if an error message indicates a "block too high" / "missing block" error
/// from the Gateway.
pub fn is_block_too_high_error(error: &str) -> bool {
```
This is brittle because:

- The error message format is not a contract - the Gateway can change it at any time, and we do not control it.
- We would need an integration test against the latest Gateway version to ensure this keeps working.

If the Gateway changes "missing block" to "block unavailable" or "latest: 123" to "max_block: 123", this silently stops working and falls back to... what?
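To make the fragility concrete, here is a sketch of the kind of string matching under discussion (the PR's actual parsing code isn't quoted in this thread):

```rust
// Extracting the Gateway-reported latest block from free-form error text.
// Breaks silently if the Gateway ever rewords "latest:" in its messages.
fn parse_latest_block(error: &str) -> Option<u64> {
    let idx = error.find("latest: ")?;
    error[idx + "latest: ".len()..]
        .split(|c: char| !c.is_ascii_digit()) // keep only the leading digit run
        .next()?
        .parse()
        .ok()
}
```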
```rust
let mut time_interval = time::interval(interval);
time_interval.set_missed_tick_behavior(time::MissedTickBehavior::Skip);
// Consume the immediate first tick to avoid redundant query at startup
time_interval.tick().await;
```
This change affects ALL watchers using new_watcher, not just the allocations watcher this feature is trying to fix. While it prevents double-querying at startup, could it affect other watchers' expected behavior?
```rust
let (should_update, current_count) = {
    let current = tx.borrow();
    let should_update =
        !query_result.allocations.is_empty() || current.is_empty();
```
This is confusing: only update if the result is non-empty OR we have nothing yet. What happens if an indexer is up to date but has no open allocations at this particular time?
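One way to distinguish the two cases would be to trust empty results only when they provably come from a fresh indexer; this is a hypothetical refinement, not code from the PR:

```rust
// Hypothetical: accept an empty result when the serving indexer is at
// least as synced as the last response we accepted, instead of rejecting
// all empty results outright.
fn should_update(
    new_allocations_empty: bool,
    response_block: u64,
    last_accepted_block: Option<u64>,
) -> bool {
    if !new_allocations_empty {
        return true;
    }
    last_accepted_block.map_or(true, |last| response_block >= last)
}
```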
True, however there's an issue here: indexers aren't able to serve queries for their subgraphs when their allocation data is served back to them incorrectly, since their indexer-rs believes that any receipts it receives from the Gateway are invalid, so it will not accept them or respond to incoming queries. Indexers that run their own version of the network subgraph can bypass this, but for those that don't, when we look inside BigQuery we see that they have a really low success rate. This also affects their score as determined by both the ISA for query-time selection and the IISA for DIPS: the ISA will start to deprioritise them, affecting how much revenue they can generate from serving queries, since from its perspective it sees an indexer that cannot respond to the queries it sends.
Okay, I was not aware that the protocol also lives on Base and Base Sepolia and that those chains are being used for testing. It makes sense that ARBITRUM_MIN_EXPECTED_BLOCK should not be hardcoded; I had also forgotten that this would cause issues with Arbitrum Sepolia and local dev environments.
Appreciate the review and the other comments here. I'd be happy to address all review feedback and put this PR in draft for now until that's done, but given your earlier comment that "this issue must be addressed on the gateway side", do you think this repo is not the right place for such a PR?
Just seen the thread about this on Slack - didn't know this was being discussed already. Looks like I independently found the same bug that was being reported.
Summary
Adds a number_gte parameter to allocation queries to ensure responses come from indexers synced to at least the last successful block - meaning the block height of the last response we got from the Gateway.

Problem
The Graph Gateway routes queries to different indexers with varying sync states. Without block freshness constraints, a query might be served by a stale indexer that returns 0 allocations (because it hasn't indexed them yet), overwriting valid cached data.
Solution
Track the block number from successful responses and use number_gte to constrain subsequent queries. When queries fail because indexers are behind, progressively relax the constraint through the backoff tiers described in the commits above.

🤖 Generated with Claude Code
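For reference, the freshness constraint rides on The Graph's standard block filter; a minimal sketch of the query shape, with the entity fields simplified for illustration:

```rust
// The entity fields here are simplified, but block: { number_gte: ... }
// and the _meta block lookup are standard parts of The Graph's GraphQL API.
const ALLOCATIONS_QUERY: &str = r#"
    query Allocations($lastBlock: Int!) {
        allocations(block: { number_gte: $lastBlock }, first: 100) {
            id
        }
        _meta {
            block { number }  # block the response was served at
        }
    }
"#;
```

The _meta block number is what lets the watcher track the block height of each successful response and feed it back in as the next number_gte constraint.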