Conversation
Code Review
The pull request optimizes the total_surplus query by replacing an inefficient ARRAY_AGG approach with a UNION ALL of separate CTEs, which yields significant performance improvements. The logic appears correct, and the performance gains are well-documented. The original comment pointing out significant code duplication in the CTEs and suggesting a refactor for improved maintainability is valid and aligns with best practices; none of the provided rules contradict it. The comment has therefore been kept as is.
squadgazzz
left a comment
Nice! A few comments.
Also, for such PRs, I'd expect to see EXPLAIN ANALYZE before and after the change.
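Something along these lines, run once with the old query and once with the new one, would make the comparison concrete. The statement below is only a stand-in, and the table and column names are assumptions rather than the actual schema:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) executes the statement and reports per-node
-- timings plus shared-buffer hits/reads, which also makes cache warm-up visible.
-- A representative user address is inlined because ANALYZE needs bound values.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM orders o
WHERE o.owner = '\x0000000000000000000000000000000000000000';
```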
-- Additional query for jit_orders
-- deduplicate orders
WHERE o.owner != $1
If I didn't miss anything, the old query included orders where owner = $1 OR sender = $1 (via ARRAY_CAT). The new onchain_order_trades CTE explicitly excludes orders where owner = $1.
This means if an order has both owner = $1 and sender = $1, it will now be counted only once (in regular_order_trades), whereas before it may have been processed differently through the array concatenation.
While deduplication seems reasonable, I just want to ensure that doesn't break any business logic.
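One way to check whether that overlap even occurs in practice (a sketch; the onchain orders table and column names are assumptions):

```sql
-- Hypothetical check: orders whose owner is $1 and whose onchain sender is
-- also $1, i.e. the case that could be counted differently by the two queries.
SELECT COUNT(*)
FROM orders o
JOIN onchain_placed_orders op ON op.uid = o.uid
WHERE o.owner = $1
  AND op.sender = $1;
```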
m-sz
left a comment
Thanks for providing the data and script. LGTM.
I am not sure this approach is correct. Postgres heavily uses caching. With the first query, you already warmed up a lot of things, so the CTE one runs much faster. For a proper comparison, this requires running it locally on a meaningful data snapshot in an isolated way.
For both benchmarks I queried for different addresses, which is closer to what we would experience in prod. The user queries once and the subsequent queries would take advantage of that cache. I'm open to ideas to improve them!
There is a way to avoid caching but it's mighty unpleasant: pull the DB data, run a local postgres instance in docker, load the data. Then run query 1. Turn off docker. Turn on docker. Run query 2 & compare results. More details here: #3542 (comment) + https://pastebin.com/ijpEmVDs. To be fair, I think as long as you are sure the queries are functionally equivalent (give the same results), I'd just deploy it, observe the speed diff, and revert if it doesn't work.
Understatement for +1TB of data. Performing the same setup in AWS will also take a while 😭
Yeah, when no additional indexes are needed and confidence is reasonably high, we can just temporarily deploy to prod for a bit.
The trick is to only pull the tables you need. The biggest table (e.g. prices) you wouldn't need. Last time I did this I ended up with 10 gigs.
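For the deploy-and-observe route, one low-effort option (assuming the pg_stat_statements extension is enabled on the RDS instance) is to compare what Postgres itself records for the query before and after the rollout:

```sql
-- Hypothetical check using pg_stat_statements (PG 13+ column names): average
-- and total execution time plus buffer hits/reads for statements matching the
-- surplus query. Adjust the filter to the actual query text.
SELECT calls,
       mean_exec_time,
       total_exec_time,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
WHERE query ILIKE '%surplus%'
ORDER BY total_exec_time DESC
LIMIT 5;
```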
Force-pushed from 0191b65 to e1fcb81
Description
We've been experiencing latency spikes on several endpoints, and we've pinned this down to the time it takes to acquire DB connections from the pool; when checking RDS monitoring, the surplus query always shows up at the top.
The current theory is that the query is a bit slower than it could be. As more users request the main swap page, their surplus is loaded (even if they don't explicitly request it by opening the wallet modal), and if some of those users have a large number of orders, they take up connections that other endpoints then can't get.
I looked into the user distribution; here are the results:
Distribution Query
However, if it were just this, it would be too simple. Depending on the user, they might have no regular orders and only onchain orders (note that the max number of onchain orders is around 10k), and some will have skewed data distributions across tables too, which makes this query a bit tricky to analyze and optimize.
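For illustration, the kind of skew check meant here could look like this (a hypothetical sketch; the actual distribution query is attached above, and the table and column names are assumptions):

```sql
-- Hypothetical skew check: how many orders the heaviest users have, since the
-- worst-case latency is driven by exactly those users.
SELECT o.owner, COUNT(*) AS order_count
FROM orders o
GROUP BY o.owner
ORDER BY order_count DESC
LIMIT 20;
```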
There are two crucial changes to the query: removing ARRAY_AGG and adding indexes. The first means the DB no longer has to materialize a potentially big array in memory, which would otherwise also lead to bad estimations; the second provides better "access paths" to some of the information the query requires.
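As a simplified illustration of the first change (this is a sketch, not the actual query; the real one builds the array with ARRAY_AGG and computes surplus from trade amounts, and the table and column names here are assumptions):

```sql
-- Old shape (simplified): build one big array of order uids up front and match
-- trades against it. The planner has to materialize the array and guess its size.
SELECT SUM(t.surplus)
FROM trades t
WHERE t.order_uid = ANY (
    ARRAY_CAT(
        ARRAY(SELECT uid FROM orders WHERE owner = $1),
        ARRAY(SELECT uid FROM onchain_placed_orders WHERE sender = $1)
    )
);

-- New shape (simplified): one CTE per order source, combined with UNION ALL,
-- so each branch can use its own index and gets a sane row estimate.
WITH regular_order_trades AS (
    SELECT t.surplus
    FROM trades t
    JOIN orders o ON o.uid = t.order_uid
    WHERE o.owner = $1
),
onchain_order_trades AS (
    SELECT t.surplus
    FROM trades t
    JOIN onchain_placed_orders op ON op.uid = t.order_uid
    JOIN orders o ON o.uid = t.order_uid
    WHERE op.sender = $1
      AND o.owner != $1  -- deduplicate against the first CTE
)
SELECT SUM(surplus)
FROM (
    SELECT surplus FROM regular_order_trades
    UNION ALL
    SELECT surplus FROM onchain_order_trades
) all_trades;
```

The UNION ALL shape lets the planner estimate and index each branch independently instead of probing one opaque in-memory array.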
Changes
Query Plans
Before
After
How to test
Because floating-point addition is not associative and the ordering in the old query is not deterministic (the ORDER BY uid is merely an approximation that happens to match), the validation script leaves some room for differences, 1e-9 to be precise.
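This is not the actual validation script (attached below); just to illustrate the shape of the tolerance check, with placeholder CTE bodies standing in for the two query variants and an assumed bytea owner column:

```sql
-- Hypothetical comparison: compute the surplus per owner with both variants
-- and flag any pair that differs by more than the 1e-9 tolerance.
WITH old_result(owner, total_surplus) AS (
    VALUES ($1::bytea, 0.0::double precision)  -- placeholder: old query goes here
),
new_result(owner, total_surplus) AS (
    VALUES ($1::bytea, 0.0::double precision)  -- placeholder: new query goes here
)
SELECT o.owner,
       o.total_surplus AS old_value,
       n.total_surplus AS new_value
FROM old_result o
JOIN new_result n USING (owner)
WHERE ABS(o.total_surplus - n.total_surplus) > 1e-9;
```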
Validation script
To validate the performance, I think it's best we give it a run in prod for anywhere from 30 minutes to 2 hours.
Even though it requires additional indexes, the new query should be faster.
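One rollout note on the index part: on a database of this size the indexes would typically be created concurrently so writes are not blocked while they build. The index name and columns below are hypothetical, not the ones from this PR:

```sql
-- Hypothetical example only: CONCURRENTLY avoids holding a write-blocking lock
-- for the duration of the build, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_onchain_orders_sender
    ON onchain_placed_orders (sender);
```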