Description
Been debugging some performance issues on our RDS database and came across this ugly chart.
pg_stat_statements says it's this query:
```sql
SELECT event_id, json
FROM event_json
LEFT JOIN event_relations USING (event_id)
WHERE event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'
  AND event_relations.event_id IS NULL
ORDER BY event_id
LIMIT 100
```
This is the top query on our DB by a factor of 3, as it needs to do a surprisingly slow and large index scan of the index on event_relations to check whether any relations exist for the events we'd like to process.
```
Limit  (cost=1.12..90.44 rows=100 width=1064) (actual time=2295.323..2295.979 rows=100 loops=1)
  ->  Merge Anti Join  (cost=1.12..20178406.57 rows=22592023 width=1064) (actual time=2295.322..2295.968 rows=100 loops=1)
        Merge Cond: (event_json.event_id = event_relations.event_id)
        ->  Index Scan using event_json_event_id_key on event_json  (cost=0.57..19154088.46 rows=24493063 width=1064) (actual time=0.022..0.630 rows=108 loops=1)
              Index Cond: (event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'::text)
        ->  Index Only Scan using event_relations_id on event_relations  (cost=0.56..926766.22 rows=6923535 width=44) (actual time=0.003..1908.093 rows=5003206 loops=1)
              Heap Fetches: 778015
Planning Time: 0.153 ms
Execution Time: 2296.006 ms
```
2 seconds to scan through the whole event_relations_id index looking for entries that don't exist.
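My understanding of why it's so slow (an illustrative sketch of a merge anti join in general, not Postgres's actual executor): both inputs are consumed in event_id order, so before the join can emit its first event_json row, it has to advance the event_relations cursor past every index entry that sorts below our starting event_id, roughly 5 million entries in the plan above.

```python
def merge_anti_join(outer, inner):
    """Yield items of `outer` with no match in `inner`; both sorted ascending."""
    it = iter(inner)
    cur = next(it, None)
    for key in outer:
        # Advance the inner cursor up to the current outer key -- with a
        # small outer batch starting high up the key space, this loop is
        # the long event_relations index scan.
        while cur is not None and cur < key:
            cur = next(it, None)
        if cur != key:
            yield key

# Outer rows 3 and 8 have no matching inner entry:
print(list(merge_anti_join([2, 3, 5, 8], [1, 2, 4, 5, 6, 7])))  # -> [3, 8]
```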
Note that our database is quite large: our event_json table alone is about 145 GB. We're only on the $i events in this background update, and we already have 7 million rows in our event_relations table.
It feels like the batch size calculation here is actually hurting us: https://github.com/matrix-org/synapse/blob/develop/synapse/storage/background_updates.py#L274
The query takes 2s+ even though we only want each run to take 100ms, so the calculation always clamps us to the minimum batch size (100 entries). However, the nature of the query means we're scanning the same index over and over again to find pretty small sets of entries.
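Roughly what I mean (my paraphrase of the scaling logic at the linked line, not the exact Synapse code): the next batch size is scaled from observed throughput to hit a target duration, then clamped to a floor, and our throughput is so low that the floor always wins.

```python
from typing import Optional

TARGET_DURATION_MS = 100    # how long we want one run to take
MINIMUM_BATCH_SIZE = 100    # floor on rows per run

def next_batch_size(items_per_ms: Optional[float], default: int = 100) -> int:
    """Pick the next batch size from the observed throughput."""
    if items_per_ms is None:
        return default  # no timing history yet
    return max(int(TARGET_DURATION_MS * items_per_ms), MINIMUM_BATCH_SIZE)

# Our runs process ~100 rows in ~2300ms => ~0.043 items/ms, so the scaled
# size is int(100 * 0.043) = 4 and the floor bumps it back up to 100: we
# can never escape the minimum.
print(next_batch_size(100 / 2300))  # -> 100
```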
As a test, running the same query with LIMIT 1000 instead of LIMIT 100 only takes half a second more.
```
Limit  (cost=1.12..894.30 rows=1000 width=1064) (actual time=2291.379..2778.058 rows=1000 loops=1)
  ->  Merge Anti Join  (cost=1.12..20178846.86 rows=22592349 width=1064) (actual time=2291.378..2777.892 rows=1000 loops=1)
        Merge Cond: (event_json.event_id = event_relations.event_id)
        ->  Index Scan using event_json_event_id_key on event_json  (cost=0.57..19154443.67 rows=24493419 width=1064) (actual time=0.011..485.704 rows=1088 loops=1)
              Index Cond: (event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'::text)
        ->  Index Only Scan using event_relations_id on event_relations  (cost=0.56..926849.84 rows=6923643 width=44) (actual time=0.003..1904.018 rows=5003394 loops=1)
              Heap Fetches: 778134
Planning Time: 0.140 ms
Execution Time: 2778.188 ms
```
I'm a little scared to bump the minimum batch size across the board, as I'm not sure what impact that will have on different background updates.
Is there a way to do this work in a smarter way that doesn't hit the database so hard?
Should we have a way of tuning batch sizes on a per background update job basis so we avoid badly performing cases like this?
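For example (a purely hypothetical sketch; neither the override table nor the update name below exists in Synapse), even a simple per-update floor would let an update like this one, whose cost is dominated by the repeated index scan rather than the rows themselves, run in much bigger batches:

```python
# Hypothetical per-update batch size floors -- nothing here is real
# Synapse code, and "rebuild_event_relations" is a made-up update name.
MIN_BATCH_SIZE_OVERRIDES = {
    "rebuild_event_relations": 10_000,  # cost is per-scan, not per-row
}

DEFAULT_MIN_BATCH_SIZE = 100

def min_batch_size(update_name: str) -> int:
    """Floor used by the adaptive batch sizing, overridable per update."""
    return MIN_BATCH_SIZE_OVERRIDES.get(update_name, DEFAULT_MIN_BATCH_SIZE)
```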
Version information
- Homeserver: matrix.beeper.com
- Version: Lightly modified version of v1.46
- Install method: Rolling our own Docker container
- Platform: Kubernetes on AWS + RDS for the database