
Optimize total_surplus query #4116

Draft
jmg-duarte wants to merge 4 commits into main from jmgd/surplus

Conversation

@jmg-duarte
Contributor

@jmg-duarte jmg-duarte commented Feb 3, 2026

Description

We've been experiencing latency spikes on several endpoints. We've pinned this down to the time it takes to acquire DB connections from the pool, and when checking RDS monitoring, the surplus query consistently shows up at the top.

The current theory is that the query is slower than it needs to be: as more users request the main swap page, their surplus is loaded (even if they never open the wallet modal to actually view it), and users with a large number of orders hold connections for long enough to starve other endpoints of them.

I looked into the user distribution; here are the results:

log_bucket bucket_start num_users pct cumulative_users cumulative_pct
0 1 295329 83.86 295329 83.86
1 10 51334 14.58 346663 98.44
2 100 5071 1.44 351734 99.88
3 1000 306 0.09 352040 99.96
4 10000 106 0.03 352146 99.99
5 100000 28 0.01 352174 100.00
Distribution Query

WITH buckets AS (
    SELECT 
        floor(log(order_count))::int as log_bucket,
        power(10, floor(log(order_count)))::int as bucket_start,
        count(*) as num_users
    FROM (
        SELECT owner, count(*) as order_count 
        FROM orders 
        GROUP BY owner
    ) sub
    GROUP BY 1, 2
)
SELECT 
    log_bucket,
    bucket_start,
    num_users,
    round(100.0 * num_users / sum(num_users) over(), 2) as pct,
    sum(num_users) over(order by log_bucket) as cumulative_users,
    round(100.0 * sum(num_users) over(order by log_bucket) / sum(num_users) over(), 2) as cumulative_pct
FROM buckets
ORDER BY log_bucket;
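As a quick sanity check, the bucketing logic from the query above can be mirrored in Python (a hypothetical helper for illustration, not part of this PR):

```python
import math

def log_bucket(order_count: int) -> tuple[int, int]:
    """Mirror of the floor(log10(n)) bucketing used in the distribution query.

    Returns (log_bucket, bucket_start): e.g. a user with 7 orders lands in
    bucket 0 (start 1), one with 5000 orders in bucket 3 (start 1000).
    """
    b = math.floor(math.log10(order_count))
    return b, 10 ** b
```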

However, if it were just that, it would be too simple. Depending on the user, they might have no regular orders and only onchain orders (note that the maximum number of onchain orders per user is around 10k), and some users have skewed data distributions across tables, all of which makes analyzing and optimizing this query tricky.

There are two crucial changes to the query: removing ARRAY_AGG and adding indexes. The first means the DB no longer has to materialize a potentially big array in memory, which previously also led to bad row estimates; the second provides better access paths to some of the information the query requires.
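The query plans below reference covering indexes such as idx_orders_owner_covering, idx_trades_covering, and idx_auction_prices_covering. Their exact definitions are not shown here; based on the index-only scan nodes in the plans, they presumably look something like the following (the column lists, in particular the INCLUDE sets, are my guess, not the actual migration):

```sql
-- Hypothetical reconstruction of the covering indexes referenced in the
-- "After" plan; the real migration may use different INCLUDE columns.
CREATE INDEX idx_orders_owner_covering
    ON orders (owner)
    INCLUDE (uid, kind, buy_token, sell_token, buy_amount, sell_amount);

CREATE INDEX idx_trades_covering
    ON trades (order_uid)
    INCLUDE (buy_amount, sell_amount, fee_amount);

CREATE INDEX idx_auction_prices_covering
    ON auction_prices (auction_id, token)
    INCLUDE (price);
```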

Changes

  • Replace the query with the optimized one

Query Plans

Before

Aggregate  (cost=1033006.55..1033006.56 rows=1 width=8) (actual time=130139.025..130143.456 rows=1 loops=1)
  ->  Append  (cost=1032650.26..1033006.51 rows=2 width=100) (actual time=31175.499..130090.776 rows=10998 loops=1)
        ->  Subquery Scan on "*SELECT* 1"  (cost=1032650.26..1032856.24 rows=1 width=100) (actual time=31175.498..130085.109 rows=10998 loops=1)
              ->  Nested Loop  (cost=1032650.26..1032856.23 rows=1 width=100) (actual time=31175.497..130079.454 rows=10998 loops=1)
                    Join Filter: (t.order_uid = o.uid)
                    InitPlan 3 (returns $5)
                      ->  Finalize Aggregate  (cost=1032601.57..1032601.58 rows=1 width=32) (actual time=30761.210..30765.632 rows=1 loops=1)
                            ->  Gather  (cost=1032601.35..1032601.56 rows=2 width=32) (actual time=30678.267..30689.666 rows=3 loops=1)
                                  Workers Planned: 2
                                  Workers Launched: 2
                                  ->  Partial Aggregate  (cost=1031601.35..1031601.36 rows=1 width=32) (actual time=30675.862..30675.863 rows=1 loops=3)
                                        ->  Parallel Bitmap Heap Scan on orders  (cost=24591.99..1030861.56 rows=295913 width=57) (actual time=278.099..30619.932 rows=226281 loops=3)
                                              Recheck Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                              Rows Removed by Index Recheck: 737927
                                              Heap Blocks: exact=15162 lossy=55571
                                              ->  Bitmap Index Scan on order_owner  (cost=0.00..24414.44 rows=710192 width=0) (actual time=271.994..271.994 rows=749143 loops=1)
                                                    Index Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                    InitPlan 4 (returns $6)
                      ->  Aggregate  (cost=47.00..47.01 rows=1 width=32) (actual time=0.021..0.022 rows=1 loops=1)
                            ->  Bitmap Heap Scan on onchain_placed_orders  (cost=4.09..46.97 rows=11 width=57) (actual time=0.017..0.018 rows=0 loops=1)
                                  Recheck Cond: (sender = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                  ->  Bitmap Index Scan on order_sender  (cost=0.00..4.08 rows=11 width=0) (actual time=0.013..0.013 rows=0 loops=1)
                                        Index Cond: (sender = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                    ->  Nested Loop  (cost=1.12..136.08 rows=2 width=188) (actual time=31172.849..109529.707 rows=10914 loops=1)
                          ->  Index Scan using orders_pkey on orders o  (cost=0.56..85.71 rows=10 width=123) (actual time=31172.113..101635.999 rows=678843 loops=1)
                                Index Cond: (uid = ANY (array_cat($5, $6)))
                          ->  Index Only Scan using order_rewards_pkey on order_execution oe  (cost=0.56..5.01 rows=3 width=65) (actual time=0.011..0.011 rows=0 loops=678843)
                                Index Cond: (order_uid = o.uid)
                                Heap Fetches: 170
                    ->  Index Scan using trade_order_uid on trades t  (cost=0.56..0.74 rows=2 width=81) (actual time=0.416..0.416 rows=1 loops=10914)
                          Index Cond: (order_uid = oe.order_uid)
                    SubPlan 1
                      ->  Index Scan using auction_prices_pkey on auction_prices ap  (cost=0.58..35.00 rows=16 width=11) (actual time=1.428..1.428 rows=1 loops=9675)
                            Index Cond: ((auction_id = oe.auction_id) AND (token = o.buy_token))
                    SubPlan 2
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_1  (cost=0.58..35.00 rows=16 width=11) (actual time=1.596..1.596 rows=1 loops=1323)
                            Index Cond: ((auction_id = oe.auction_id) AND (token = o.sell_token))
        ->  Subquery Scan on "*SELECT* 2"  (cost=2.09..150.26 rows=1 width=100) (actual time=0.010..0.013 rows=0 loops=1)
              ->  Nested Loop  (cost=2.09..150.25 rows=1 width=100) (actual time=0.009..0.012 rows=0 loops=1)
                    Join Filter: (j.uid = t_1.order_uid)
                    ->  Nested Loop  (cost=1.54..79.45 rows=1 width=192) (actual time=0.009..0.010 rows=0 loops=1)
                          ->  Nested Loop Anti Join  (cost=0.98..74.32 rows=1 width=127) (actual time=0.009..0.010 rows=0 loops=1)
                                ->  Index Scan using jit_user_order_creation_timestamp on jit_orders j  (cost=0.42..33.62 rows=8 width=127) (actual time=0.008..0.009 rows=0 loops=1)
                                      Index Cond: (owner = '\x10dad59905d93ca37cd25a35f25349cb5956ba8e'::bytea)
                                ->  Index Only Scan using orders_pkey on orders o_1  (cost=0.56..5.08 rows=1 width=57) (never executed)
                                      Index Cond: (uid = j.uid)
                                      Heap Fetches: 0
                          ->  Index Only Scan using order_rewards_pkey on order_execution oe_1  (cost=0.56..5.11 rows=3 width=65) (never executed)
                                Index Cond: (order_uid = j.uid)
                                Heap Fetches: 0
                    ->  Index Scan using trade_order_uid on trades t_1  (cost=0.56..0.74 rows=2 width=81) (never executed)
                          Index Cond: (order_uid = oe_1.order_uid)
                    SubPlan 5
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_2  (cost=0.58..35.00 rows=16 width=11) (never executed)
                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = j.buy_token))
                    SubPlan 6
                      ->  Index Scan using auction_prices_pkey on auction_prices ap_3  (cost=0.58..35.00 rows=16 width=11) (never executed)
                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = j.sell_token))
Planning Time: 4.788 ms
Execution Time: 130157.850 ms

After

Aggregate  (cost=14722.58..14722.59 rows=1 width=8) (actual time=17.801..17.934 rows=1 loops=1)
  Buffers: shared hit=16124
  ->  Gather Merge  (cost=14668.25..14713.78 rows=391 width=136) (actual time=17.037..17.329 rows=917 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=16124
        ->  Sort  (cost=13668.22..13668.63 rows=162 width=136) (actual time=5.476..5.506 rows=306 loops=3)
              Sort Key: "*SELECT* 2".uid
              Sort Method: quicksort  Memory: 139kB
              Buffers: shared hit=16124
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Append  (cost=2.38..13662.28 rows=162 width=136) (actual time=0.017..5.331 rows=306 loops=3)
                    Buffers: shared hit=16110
                    ->  Subquery Scan on "*SELECT* 2"  (cost=24.37..13661.47 rows=53 width=136) (actual time=0.088..13.075 rows=709 loops=1)
                          Buffers: shared hit=13313
                          ->  Nested Loop Left Join  (cost=24.37..13660.94 rows=53 width=136) (actual time=0.087..12.985 rows=709 loops=1)
                                Buffers: shared hit=13313
                                ->  Nested Loop  (cost=23.67..13602.84 rows=53 width=156) (actual time=0.079..9.546 rows=709 loops=1)
                                      Buffers: shared hit=9567
                                      ->  Nested Loop  (cost=23.11..12524.27 rows=213 width=262) (actual time=0.074..6.789 rows=709 loops=1)
                                            Buffers: shared hit=6712
                                            ->  Nested Loop  (cost=22.56..8928.44 rows=774 width=181) (actual time=0.069..4.070 rows=711 loops=1)
                                                  Buffers: shared hit=3858
                                                  ->  Bitmap Heap Scan on onchain_placed_orders opo  (cost=22.00..2291.52 rows=774 width=57) (actual time=0.059..0.526 rows=711 loops=1)
                                                        Recheck Cond: (sender = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Heap Blocks: exact=300
                                                        Buffers: shared hit=303
                                                        ->  Bitmap Index Scan on order_sender  (cost=0.00..21.80 rows=774 width=0) (actual time=0.032..0.032 rows=711 loops=1)
                                                              Index Cond: (sender = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                              Buffers: shared hit=3
                                                  ->  Index Scan using orders_pkey on orders o  (cost=0.56..8.57 rows=1 width=124) (actual time=0.005..0.005 rows=1 loops=711)
                                                        Index Cond: (uid = opo.uid)
                                                        Filter: (owner <> '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Buffers: shared hit=3555
                                            ->  Index Only Scan using idx_trades_covering on trades t  (cost=0.56..4.63 rows=2 width=81) (actual time=0.003..0.003 rows=1 loops=711)
                                                  Index Cond: (order_uid = o.uid)
                                                  Heap Fetches: 0
                                                  Buffers: shared hit=2854
                                      ->  Index Only Scan using order_rewards_pkey on order_execution oe  (cost=0.56..5.03 rows=3 width=65) (actual time=0.003..0.004 rows=1 loops=709)
                                            Index Cond: (order_uid = o.uid)
                                            Heap Fetches: 16
                                            Buffers: shared hit=2855
                                ->  Index Only Scan using idx_auction_prices_covering on auction_prices ap  (cost=0.70..6.56 rows=16 width=40) (actual time=0.004..0.004 rows=1 loops=709)
                                      Index Cond: ((auction_id = oe.auction_id) AND (token = CASE o.kind WHEN 'sell'::orderkind THEN o.buy_token ELSE o.sell_token END))
                                      Heap Fetches: 0
                                      Buffers: shared hit=3746
                    ->  Subquery Scan on "*SELECT* 1"  (cost=2.38..4382.59 rows=337 width=136) (actual time=0.039..2.802 rows=208 loops=1)
                          Buffers: shared hit=2794
                          ->  Nested Loop  (cost=2.38..4379.22 rows=337 width=136) (actual time=0.039..2.777 rows=208 loops=1)
                                Join Filter: (t_1.order_uid = o_1.uid)
                                Buffers: shared hit=2794
                                ->  Nested Loop Left Join  (cost=1.82..3673.18 rows=150 width=150) (actual time=0.027..1.863 rows=208 loops=1)
                                      Buffers: shared hit=1958
                                      ->  Nested Loop  (cost=1.12..3511.39 rows=150 width=189) (actual time=0.018..0.945 rows=208 loops=1)
                                            Buffers: shared hit=872
                                            ->  Index Only Scan using idx_orders_owner_covering on orders o_1  (cost=0.56..371.00 rows=602 width=124) (actual time=0.010..0.061 rows=210 loops=1)
                                                  Index Cond: (owner = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                  Heap Fetches: 2
                                                  Buffers: shared hit=24
                                            ->  Index Only Scan using order_rewards_pkey on order_execution oe_1  (cost=0.56..5.19 rows=3 width=65) (actual time=0.004..0.004 rows=1 loops=210)
                                                  Index Cond: (order_uid = o_1.uid)
                                                  Heap Fetches: 4
                                                  Buffers: shared hit=848
                                      ->  Index Only Scan using idx_auction_prices_covering on auction_prices ap_1  (cost=0.70..6.68 rows=16 width=40) (actual time=0.004..0.004 rows=1 loops=208)
                                            Index Cond: ((auction_id = oe_1.auction_id) AND (token = CASE o_1.kind WHEN 'sell'::orderkind THEN o_1.buy_token ELSE o_1.sell_token END))
                                            Heap Fetches: 0
                                            Buffers: shared hit=1086
                                ->  Index Only Scan using idx_trades_covering on trades t_1  (cost=0.56..4.63 rows=2 width=81) (actual time=0.003..0.004 rows=1 loops=208)
                                      Index Cond: (order_uid = oe_1.order_uid)
                                      Heap Fetches: 0
                                      Buffers: shared hit=836
                    ->  Subquery Scan on "*SELECT* 3"  (cost=2.80..85.63 rows=1 width=136) (actual time=0.011..0.013 rows=0 loops=1)
                          Buffers: shared hit=3
                          ->  Nested Loop Left Join  (cost=2.80..85.62 rows=1 width=136) (actual time=0.011..0.013 rows=0 loops=1)
                                Buffers: shared hit=3
                                ->  Nested Loop  (cost=2.09..84.49 rows=1 width=159) (actual time=0.011..0.012 rows=0 loops=1)
                                      Buffers: shared hit=3
                                      ->  Nested Loop  (cost=1.54..79.43 rows=1 width=208) (actual time=0.010..0.011 rows=0 loops=1)
                                            Buffers: shared hit=3
                                            ->  Nested Loop Anti Join  (cost=0.98..74.32 rows=1 width=127) (actual time=0.010..0.011 rows=0 loops=1)
                                                  Buffers: shared hit=3
                                                  ->  Index Scan using jit_user_order_creation_timestamp on jit_orders j  (cost=0.42..33.62 rows=8 width=127) (actual time=0.010..0.010 rows=0 loops=1)
                                                        Index Cond: (owner = '\x8ef4fb956d0cb06ca9e3db76040f08154e8d0122'::bytea)
                                                        Buffers: shared hit=3
                                                  ->  Index Only Scan using orders_pkey on orders o_2  (cost=0.56..5.08 rows=1 width=57) (never executed)
                                                        Index Cond: (uid = j.uid)
                                                        Heap Fetches: 0
                                            ->  Index Only Scan using idx_trades_covering on trades t_2  (cost=0.56..5.09 rows=2 width=81) (never executed)
                                                  Index Cond: (order_uid = j.uid)
                                                  Heap Fetches: 0
                                      ->  Index Only Scan using order_rewards_pkey on order_execution oe_2  (cost=0.56..5.03 rows=3 width=65) (never executed)
                                            Index Cond: (order_uid = t_2.order_uid)
                                            Heap Fetches: 0
                                ->  Index Only Scan using idx_auction_prices_covering on auction_prices ap_2  (cost=0.70..7.06 rows=16 width=40) (never executed)
                                      Index Cond: ((auction_id = oe_2.auction_id) AND (token = CASE j.kind WHEN 'sell'::orderkind THEN j.buy_token ELSE j.sell_token END))
                                      Heap Fetches: 0
Planning:
  Buffers: shared hit=260
Planning Time: 2.178 ms
Execution Time: 18.047 ms

How to test

Because floating-point addition is not associative and the summation order in the old query is not deterministic (the ORDER BY uid in the new query is merely an approximation of it), the validation script leaves some room for differences: 1e-9, to be precise.
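The order-dependence is easy to demonstrate in standalone Python, independent of the queries:

```python
# Floating-point addition is commutative but not associative: summing the
# same terms in a different order can change the low-order bits of the result.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False
```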

Validation script

#!/usr/bin/env python3
"""
Compare original and optimized surplus queries for correctness.
Picks random addresses and verifies both queries return identical results.
"""

import os
import sys
import psycopg2

# Connection settings - override with environment variables
DB_CONFIG = {
    "host": os.getenv("DB_HOST", "localhost"),
    "port": os.getenv("DB_PORT", "5432"),
    "dbname": os.getenv("DB_NAME", "your_database"),
    "user": os.getenv("DB_USER", "your_user"),
    "password": os.getenv("DB_PASSWORD", ""),
}

ORIGINAL_QUERY = """
WITH regular_orders AS (
    SELECT ARRAY_AGG(uid) AS ids FROM orders WHERE owner = $1
),
onchain_orders AS (
    SELECT ARRAY_AGG(uid) AS ids FROM onchain_placed_orders WHERE sender = $1
),
trade_components AS (
    SELECT
       CASE kind
          WHEN 'sell' THEN t.buy_amount
          WHEN 'buy' THEN t.sell_amount - t.fee_amount
       END AS trade_amount,
       CASE kind
          WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
          WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
       END AS limit_amount,
       o.kind,
       CASE kind
          WHEN 'sell' THEN (SELECT price FROM auction_prices ap WHERE ap.token = o.buy_token AND ap.auction_id = oe.auction_id)
          WHEN 'buy' THEN (SELECT price FROM auction_prices ap WHERE ap.token = o.sell_token AND ap.auction_id = oe.auction_id)
       END AS surplus_token_native_price
    FROM orders o
    JOIN trades t ON o.uid = t.order_uid
    JOIN order_execution oe ON o.uid = oe.order_uid
    WHERE o.uid = ANY(ARRAY_CAT((SELECT ids FROM regular_orders), (SELECT ids FROM onchain_orders)))
    UNION ALL
    SELECT
       CASE j.kind
          WHEN 'sell' THEN t.buy_amount
          WHEN 'buy' THEN t.sell_amount - t.fee_amount
       END AS trade_amount,
       CASE j.kind
          WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * j.buy_amount / j.sell_amount
          WHEN 'buy' THEN t.buy_amount * j.sell_amount / j.buy_amount
       END AS limit_amount,
       j.kind,
       CASE j.kind
          WHEN 'sell' THEN (SELECT price FROM auction_prices ap WHERE ap.token = j.buy_token AND ap.auction_id = oe.auction_id)
          WHEN 'buy' THEN (SELECT price FROM auction_prices ap WHERE ap.token = j.sell_token AND ap.auction_id = oe.auction_id)
       END AS surplus_token_native_price
    FROM jit_orders j
    JOIN trades t ON j.uid = t.order_uid
    JOIN order_execution oe ON j.uid = oe.order_uid
    WHERE j.owner = $1 AND NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.uid = j.uid
    )
),
trade_surplus AS (
    SELECT
        CASE kind
            WHEN 'sell' THEN (trade_amount - limit_amount) * surplus_token_native_price
            WHEN 'buy' THEN (limit_amount - trade_amount) * surplus_token_native_price
        END / POWER(10, 18) AS surplus_in_wei
    FROM trade_components
)
SELECT
   COALESCE(SUM(surplus_in_wei), 0) AS total_surplus_in_wei
FROM trade_surplus;
"""

OPTIMIZED_QUERY = """
WITH trade_components AS (
    -- Regular orders: join trades first, then order_execution
    SELECT
        o.uid,
        CASE o.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE o.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
            WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
        END AS limit_amount,
        o.kind,
        ap.price AS surplus_token_native_price
    FROM orders o
    JOIN trades t ON t.order_uid = o.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE o.kind WHEN 'sell' THEN o.buy_token ELSE o.sell_token END
    WHERE o.owner = $1

    UNION ALL

    -- Onchain placed orders (if sender differs from owner)
    SELECT
        o.uid,
        CASE o.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE o.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * o.buy_amount / o.sell_amount
            WHEN 'buy' THEN t.buy_amount * o.sell_amount / o.buy_amount
        END AS limit_amount,
        o.kind,
        ap.price AS surplus_token_native_price
    FROM onchain_placed_orders opo
    JOIN orders o ON o.uid = opo.uid AND o.owner != $1
    JOIN trades t ON t.order_uid = o.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE o.kind WHEN 'sell' THEN o.buy_token ELSE o.sell_token END
    WHERE opo.sender = $1

    UNION ALL

    -- JIT orders
    SELECT
        j.uid,
        CASE j.kind
            WHEN 'sell' THEN t.buy_amount
            WHEN 'buy' THEN t.sell_amount - t.fee_amount
        END AS trade_amount,
        CASE j.kind
            WHEN 'sell' THEN (t.sell_amount - t.fee_amount) * j.buy_amount / j.sell_amount
            WHEN 'buy' THEN t.buy_amount * j.sell_amount / j.buy_amount
        END AS limit_amount,
        j.kind,
        ap.price AS surplus_token_native_price
    FROM jit_orders j
    JOIN trades t ON t.order_uid = j.uid
    JOIN order_execution oe ON oe.order_uid = t.order_uid
    LEFT JOIN auction_prices ap
        ON ap.auction_id = oe.auction_id
        AND ap.token = CASE j.kind WHEN 'sell' THEN j.buy_token ELSE j.sell_token END
    WHERE j.owner = $1
      AND NOT EXISTS (SELECT 1 FROM orders o WHERE o.uid = j.uid)
)
SELECT
    COALESCE(SUM(surplus_in_wei ORDER BY uid), 0) AS total_surplus_in_wei
FROM (
    SELECT
        uid,
        CASE kind
            WHEN 'sell' THEN (trade_amount - limit_amount) * surplus_token_native_price
            WHEN 'buy' THEN (limit_amount - trade_amount) * surplus_token_native_price
        END / POWER(10, 18) AS surplus_in_wei
    FROM trade_components
) ts;
"""

# Query to get random addresses with varying order counts
SAMPLE_ADDRESSES_QUERY = """
WITH address_order_counts AS (
    SELECT owner AS address, COUNT(*) AS order_count
    FROM orders
    GROUP BY owner
),
bucketed AS (
    SELECT
        address,
        order_count,
        CASE
            WHEN order_count < 10 THEN '0: 1-9'
            WHEN order_count < 100 THEN '1: 10-99'
            WHEN order_count < 1000 THEN '2: 100-999'
            WHEN order_count < 10000 THEN '3: 1000-9999'
            ELSE '4: 10000+'
        END AS bucket
    FROM address_order_counts
)
SELECT address, order_count, bucket
FROM (
    SELECT address, order_count, bucket,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY RANDOM()) AS rn
    FROM bucketed
) ranked
WHERE rn <= %s
ORDER BY bucket, order_count;
"""


def get_connection():
    return psycopg2.connect(**DB_CONFIG)


def fetch_sample_addresses(conn, samples_per_bucket=5):
    """Get random addresses from each order count bucket."""
    with conn.cursor() as cur:
        cur.execute(SAMPLE_ADDRESSES_QUERY, (samples_per_bucket,))
        return cur.fetchall()


def run_query(conn, query, address):
    """Run a query with the given address parameter."""
    # Convert $1 placeholder to %s for psycopg2
    pg_query = query.replace("$1", "%s")
    # Count how many parameters we need
    param_count = pg_query.count("%s")
    with conn.cursor() as cur:
        cur.execute(pg_query, tuple([address] * param_count))
        result = cur.fetchone()
        return result[0] if result else None


def compare_results(original, optimized, tolerance=1e-9):
    """Compare two numeric results with floating point tolerance."""
    if original is None and optimized is None:
        return True, "both NULL"
    if original is None or optimized is None:
        return False, f"NULL mismatch: original={original}, optimized={optimized}"

    # Convert to float for comparison
    orig_float = float(original)
    opt_float = float(optimized)

    if orig_float == opt_float:
        return True, "exact match"

    # Check relative difference for non-zero values
    if orig_float != 0:
        rel_diff = abs(orig_float - opt_float) / abs(orig_float)
        if rel_diff < tolerance:
            return True, f"within tolerance (rel_diff={rel_diff:.2e})"

    # Check absolute difference for values near zero
    abs_diff = abs(orig_float - opt_float)
    if abs_diff < tolerance:
        return True, f"within tolerance (abs_diff={abs_diff:.2e})"

    return False, f"MISMATCH: original={orig_float}, optimized={opt_float}, diff={abs_diff}"


def main():
    samples_per_bucket = int(sys.argv[1]) if len(sys.argv) > 1 else 5

    print(f"Comparing original vs optimized query ({samples_per_bucket} samples per bucket)")
    print("=" * 80)

    conn = get_connection()

    # Get sample addresses
    print("\nFetching sample addresses...")
    addresses = fetch_sample_addresses(conn, samples_per_bucket)
    print(f"Found {len(addresses)} addresses across buckets\n")

    passed = 0
    failed = 0
    current_bucket = None

    for address, order_count, bucket in addresses:
        if bucket != current_bucket:
            current_bucket = bucket
            print(f"\n--- Bucket: {bucket} ---")

        # Run both queries
        original_result = run_query(conn, ORIGINAL_QUERY, address)
        optimized_result = run_query(conn, OPTIMIZED_QUERY, address)

        # Compare
        match, reason = compare_results(original_result, optimized_result)

        addr_hex = "0x" + address.hex() if isinstance(address, (bytes, memoryview)) else str(address)
        status = "✓" if match else "✗"

        print(f"  {status} {addr_hex[:18]}... ({order_count:>5} orders): {reason}")

        if match:
            passed += 1
        else:
            failed += 1
            # Print detailed values on failure
            print(f"      Original:  {original_result}")
            print(f"      Optimized: {optimized_result}")

    # Summary
    print("\n" + "=" * 80)
    print(f"SUMMARY: {passed} passed, {failed} failed out of {passed + failed} tests")

    conn.close()

    return 0 if failed == 0 else 1


if __name__ == "__main__":
    sys.exit(main())

To validate the performance, I think it's best we give it a run in prod for anywhere from 30 minutes to 2 hours.
Even though it requires new indexes, the new query should be faster.

@jmg-duarte jmg-duarte marked this pull request as ready for review February 3, 2026 11:50
@jmg-duarte jmg-duarte requested a review from a team as a code owner February 3, 2026 11:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request optimizes the total_surplus query by replacing an inefficient ARRAY_AGG approach with a UNION ALL of separate CTEs, which yields significant performance improvements. The logic appears correct, and the performance gains are well-documented. The original comment regarding significant code duplication in the CTEs and the suggestion to refactor for improved maintainability is valid and aligns with best practices, as none of the provided rules contradict it. Therefore, the comment has been kept as is.

@jmg-duarte jmg-duarte added the hotfix label (PRs that should be applied into production right away) Feb 3, 2026
Contributor

@squadgazzz squadgazzz left a comment

Nice! A few comments.

Also, for such PRs, I'd expect to see EXPLAIN ANALYZE before and after the change.


-- Additional query for jit_orders
-- deduplicate orders
WHERE o.owner != $1
Contributor

If I didn't miss anything, the old query included orders where owner = $1 OR sender = $1 (via ARRAY_CAT). The new onchain_order_trades CTE explicitly excludes orders where owner = $1.
This means if an order has both owner = $1 and sender = $1, it will now be counted only once (in regular_order_trades), whereas before it may have been processed differently through the array concatenation.

While deduplication seems reasonable, I just want to ensure that doesn't break any business logic.

Contributor

@m-sz m-sz left a comment

Thanks for providing the data and script. LGTM.

@squadgazzz
Contributor

I got a benchmark test + result checker script
SQL Query Benchmark

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

@jmg-duarte
Contributor Author

jmg-duarte commented Feb 3, 2026

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

For both benchmarks I queried for different addresses, which is closer to what we would experience in prod. The user queries once and the subsequent queries would take advantage of that cache.

I'm open to ideas to improve them!

@fafk
Contributor

fafk commented Feb 4, 2026

I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.

For both benchmarks I queried for different addresses, which is closer to what we would experience in prod. The user queries once and the subsequent queries would take advantage of that cache.

I'm open to ideas to improve them!

There is a way to avoid caching but it's mighty unpleasant: pull the DB data, run a local postgres instance in docker, load the data. Then run query 1. Turn off docker. Turn on docker. Run query 2 & compare results. More details here: #3542 (comment) + https://pastebin.com/ijpEmVDs

To be fair I think as long as you are sure the queries are functionally equivalent (give the same results) I'd just deploy it and observe the speed diff and revert if it doesn't work.

@jmg-duarte
Contributor Author

There is a way to avoid caching but it's mighty unpleasant

Understatement for +1TB of data. Performing the same setup in AWS will also take a bunch of time 😭

@MartinquaXD
Contributor

To be fair I think as long as you are sure the queries are functionally equivalent (give the same results) I'd just deploy it and observe the speed diff and revert if it doesn't work.

Yeah, when no additional indexes are needed and confidence is reasonably high we can just temporarily deploy to prod for a bit.

@jmg-duarte jmg-duarte marked this pull request as draft February 4, 2026 10:23
@fafk
Contributor

fafk commented Feb 4, 2026

There is a way to avoid caching but it's mighty unpleasant

Understatement for +1TB of data. Performing the same setup in AWS will also take a bunch 😭

The trick is to only pull the tables you need. The biggest table (e.g. prices) you wouldn't need. Last time I did this I ended up with 10 gigs.

