-
Notifications
You must be signed in to change notification settings - Fork 90
Problem: Finding replay_lsn with lsn tracking is 7.5 times slower #731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem: Finding replay_lsn with lsn tracking is 7.5 times slower #731
Conversation
I'm not sure about your work here. The idea is that we are using Our code is a little complex around this, the goal is to report a trustworthy replay_lsn to the source system and replication slot, in order to ensure we can still replay any failed transaction later. Maybe you could try removing all the code around the LSN mapping and using |
@dimitri Since we read the origin progress using a separate postgres connection, isn't the origin progress LSN representing a txn which is committed to disk? |
So... the origin LSN is valid in the context of the source database. The commit-to-disk aspect is asked in the context of the target database, where the transaction has its own (different) LSN. The replay transaction is associated with a source LSN and happens on the target database with That's why we have this complexity here. We want to make sure to report a replay_lsn to the source database that has already made it safely to disk on the target database, and that needs to be a source lsn, not a target lsn. The only way to bypass this complexity that I can see is to use |
@dimitri, let's consider a scenario where we're replicating 100 transactions from source to target with |
0d243ff
to
48663f9
Compare
Given Postgres docs for the function, it might work, yes. I am not sure it does though. I do not know how to compare the wording “flushed to disk” with the asynchronous_commit aspects of what we do. What kind of testing did you do around it? Did you study Postgres source code already to better understand the guarantees here?
|
@dimitri, I spent sometime with the postgres source, as per my understanding, Please refer this code snippet => https://github.com/postgres/postgres/blob/6f132ed693b6a0be0997f092c42abf14878362ac/src/backend/replication/logical/origin.c#L1044-L1045 Additionally, I wrote a simple SQL test to confirm, Create replication origin
Advance replication origin and verify it is always less than flush lsn
I had used |
Thanks for checking on that @arajkumar ! Next step would be to get rid of the whole lsn tracking infrastructure in the code then, including the SQLite catalogs entirely. Can you update your PR to get rid of all the extra code that's not needed anymore? |
Currently, we maintain a LSN mapping which tracks the target insert LSN against current DML operation's LSN. This mapping will be used to determine the replay_lsn. The mapping is implemented using LIST and every insert executes `pg_current_wal_insert_lsn` procedure on target server. When there is a need find current replay_lsn, another procedure `pg_current_wal_flush_lsn` executed on target and respective source LSN is determinted by iterating the mapping LIST. Additionally, we persist the list into a sqlite table on each sentinel sync leading more delay due to local disk write. Solution: Use replication origin progress lsn as replay_lsn We create postgres replication origin on target and use it to track the apply progress which includes source operation LSN. We can simply reuse the origin progress tracking to find the replay_lsn. By doing this, we eliminate a Postgres procedure call on each transaction and bunch of local disk write by sqlite. Removing the above mentioned functionality improves the overall write throughput by ~7.5 times. Without fix(main): ``` time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql 09:48:19.678 112637 INFO Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb" 09:48:19.726 112637 INFO Using work dir "/tmp/pgcopydb" 09:48:19.793 112637 INFO Setting up previous LSN from replication origin "pgcopydb" progress at 0/0 09:48:19.868 112637 INFO Replaying changes from file "/home/ubuntu/dimitri/bench.sql" pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql 9.25s user 11.48s system 7% cpu 4:54.31 total ``` 122987 / 4m54s = 419 records/sec With fix: ``` time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql 10:01:42.701 113419 INFO Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb" 10:01:42.750 113419 INFO Using work dir "/tmp/pgcopydb" 10:01:42.814 113419 INFO Setting up previous LSN from replication origin "pgcopydb" progress at 0/0 10:01:42.889 113419 INFO Replaying changes from file "/home/ubuntu/dimitri/bench.sql" pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql 13.22s user 5.46s system 47% cpu 39.205 total ``` 122987 / 39.2s = 3,137 records/sec Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
48663f9
to
8fc6a40
Compare
@dimitri, I removed all the unwanted lsn tracking infra. |
Thanks again for all the work @arajkumar ! |
Currently, we maintain a LSN mapping which tracks the target insert LSN against current DML operation's LSN. This mapping will be used to determine the replay_lsn. The mapping is implemented using LIST and every insert executes
pg_current_wal_insert_lsn
procedure on target server. When there is a need find current replay_lsn, another procedurepg_current_wal_flush_lsn
executed on target and respective source LSN is determinted by iterating the mapping LIST. Additionally, we persist the list into a sqlite table on each sentinel sync leading more delay due to local disk write.Solution: Use replication origin progress lsn as replay_lsn
We create postgres replication origin on target and use it to track the apply progress which includes source operation LSN. We can simply reuse the origin progress tracking to find the replay_lsn. By doing this, we eliminate a Postgres procedure call on each transaction and bunch of local disk write by sqlite. Removing the above mentioned functionality improves the overall write throughput by ~7.5 times.
Without fix(main):
122987 / 4m54s = 419 records/sec
With fix:
122987 / 39.2s = 3,137 records/sec
TODO