8000 Problem: Finding replay_lsn with lsn tracking is 7.5 times slower by arajkumar · Pull Request #731 · dimitri/pgcopydb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Problem: Finding replay_lsn with lsn tracking is 7.5 times slower #731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arajkumar
Copy link
Contributor
@arajkumar arajkumar commented Apr 1, 2024

Currently, we maintain a LSN mapping which tracks the target insert LSN against current DML operation's LSN. This mapping will be used to determine the replay_lsn. The mapping is implemented using LIST and every insert executes pg_current_wal_insert_lsn procedure on target server. When there is a need find current replay_lsn, another procedure pg_current_wal_flush_lsn executed on target and respective source LSN is determinted by iterating the mapping LIST. Additionally, we persist the list into a sqlite table on each sentinel sync leading more delay due to local disk write.

Solution: Use replication origin progress lsn as replay_lsn

We create postgres replication origin on target and use it to track the apply progress which includes source operation LSN. We can simply reuse the origin progress tracking to find the replay_lsn. By doing this, we eliminate a Postgres procedure call on each transaction and bunch of local disk write by sqlite. Removing the above mentioned functionality improves the overall write throughput by ~7.5 times.

Without fix(main):

time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql
09:48:19.678 112637 INFO   Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb"
09:48:19.726 112637 INFO   Using work dir "/tmp/pgcopydb"
09:48:19.793 112637 INFO   Setting up previous LSN from replication origin "pgcopydb" progress at 0/0
09:48:19.868 112637 INFO   Replaying changes from file "/home/ubuntu/dimitri/bench.sql"
pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql  9.25s user 11.48s system 7% cpu 4:54.31 total

122987 / 4m54s = 419 records/sec

With fix:

time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql
10:01:42.701 113419 INFO   Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb"
10:01:42.750 113419 INFO   Using work dir "/tmp/pgcopydb"
10:01:42.814 113419 INFO   Setting up previous LSN from replication origin "pgcopydb" progress at 0/0
10:01:42.889 113419 INFO   Replaying changes from file "/home/ubuntu/dimitri/bench.sql"
pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql  13.22s user 5.46s system 47% cpu 39.205 total

122987 / 39.2s = 3,137 records/sec

TODO

  • Remove lsn_tracking functionality

@dimitri
Copy link
Owner
dimitri commented Apr 2, 2024

I'm not sure about your work here. The idea is that we are using { "synchronous_commit", "off" } when applying changes on the target, in order to avoid waiting for the disk flush at each transaction commit. Then we can't take the replay_lsn (nor the origin lsn) at face value, because at COMMIT time we did not wait to make sure the replayed transaction was indeed committed to disk yet.

Our code is a little complex around this, the goal is to report a trustworthy replay_lsn to the source system and replication slot, in order to ensure we can still replay any failed transaction later.

Maybe you could try removing all the code around the LSN mapping and using { "synchronous_commit", "on" } in the replay connection and see what are the performance implications of that?

@arajkumar
Copy link
Contributor Author

I'm not sure about your work here. The idea is that we are using { "synchronous_commit", "off" } when applying changes on the target, in order to avoid waiting for the disk flush at each transaction commit. Then we can't take the replay_lsn (nor the origin lsn) at face value, because at COMMIT time we did not wait to make sure the replayed transaction was indeed committed to disk yet.

@dimitri Since we read the origin progress using a separate postgres connection, isn't the origin progress LSN representing a txn which is committed to disk?

@dimitri
Copy link
Owner
dimitri commented Apr 2, 2024

@dimitri Since we read the origin progress using a separate postgres connection, isn't the origin progress LSN representing a txn which is committed to disk?

So... the origin LSN is valid in the context of the source database. The commit-to-disk aspect is asked in the context of the target database, where the transaction has its own (different) LSN. The replay transaction is associated with a source LSN and happens on the target database with synchronous_commit set to off, which means that when the COMMIT query is sent and the code receives an answer, we don't know yet if the COMMIT made it safely to disk on the target.

That's why we have this complexity here. We want to make sure to report a replay_lsn to the source database that has already made it safely to disk on the target database, and that needs to be a source lsn, not a target lsn. The only way to bypass this complexity that I can see is to use synchronous_commit set to on, which means that the source LSN being replayed is safe to publish as the replay_lsn to the source system right at COMMIT response time (on the target).

@arajkumar
Copy link
Contributor Author

@dimitri, let's consider a scenario where we're replicating 100 transactions from source to target with SET synchronous_commit=off. While sending those 100 transactions, we also update the origin progress with each transaction's commit LSN (from the source). Suppose that, so far, the target has co 8000 mmitted up to the 50th transaction to disk. If we run SELECT pg_replication_origin_progress('pgcopydb', true) on the target, wouldn't it return the LSN of the 50th transaction, which could then be considered as the replay_lsn?

@arajkumar arajkumar force-pushed the arajkumar/origin-progress-as-replay-lsn branch from 0d243ff to 48663f9 Compare April 3, 2024 06:15
@dimitri
Copy link
Owner
dimitri commented Apr 3, 2024

@dimitri, let's consider a scenario where we're replicating 100 transactions from source to target with SET synchronous_commit=off. While sending those 100 transactions, we also update the origin progress with each transaction's commit LSN (from the source). Suppose that, so far, the target has committed up to the 50th transaction to disk. If we run SELECT pg_replication_origin_progress('pgcopydb', true) on the target, wouldn't it return the LSN of the 50th transaction, which could then be considered as the replay_lsn?

Given Postgres docs for the function, it might work, yes. I am not sure it does though. I do not know how to compare the wording “flushed to disk” with the asynchronous_commit aspects of what we do. What kind of testing did you do around it? Did you study Postgres source code already to better understand the guarantees here?

Returns the replay location for the given replication origin. The parameter flush determines whether the corresponding local transaction will be guaranteed to have been flushed to disk or not.

@arajkumar
Copy link
Contributor Author
arajkumar commented Apr 5, 2024

@dimitri, I spent sometime with the postgres source, as per my understanding, SELECT pg_replication_origin_progress('pgcopydb', true) would ensure the local txn is flushed before returning the progress LSN.

Please refer this code snippet => https://github.com/postgres/postgres/blob/6f132ed693b6a0be0997f092c42abf14878362ac/src/backend/replication/logical/origin.c#L1044-L1045

Additionally, I wrote a simple SQL test to confirm,

Create replication origin

psql "$PG_URI" -f << EOF
SELECT pg_replication_origin_create('pgcopydb_test');
CREATE TABLE IF NOT EXISTS test(id int8);
EOF

Advance replication origin and verify it is always less than flush lsn

psql "$PG_URI" -f - << EOF
SELECT pg_replication_origin_session_setup('pgcopydb_test');

BEGIN;
INSERT INTO test VALUES(1);
SELECT pg_current_wal_insert_lsn() AS lsn \gset
SELECT pg_replication_origin_xact_setup(:'lsn'::pg_lsn, now());
INSERT INTO test VALUES(2);

SELECT  pg_replication_origin_progress('pgcopydb_test', true) < pg_current_wal_flush_lsn();
COMMIT;
EOF

I had used pg_current_wal_insert_lsn as a origin LSN for the ease of demonstration.

@dimitri
Copy link
Owner
dimitri commented Apr 5, 2024

Thanks for checking on that @arajkumar ! Next step would be to get rid of the whole lsn tracking infrastructure in the code then, including the SQLite catalogs entirely. Can you update your PR to get rid of all the extra code that's not needed anymore?

Currently, we maintain a LSN mapping which tracks the target insert LSN
against current DML operation's LSN. This mapping will be used to
determine the replay_lsn. The mapping is implemented using LIST and
every insert executes `pg_current_wal_insert_lsn` procedure on target server.
When there is a need find current replay_lsn, another procedure
`pg_current_wal_flush_lsn` executed on target and respective source LSN
is determinted by iterating the mapping LIST. Additionally, we
persist the list into a sqlite table on each sentinel sync leading
more delay due to local disk write.

Solution: Use replication origin progress lsn as replay_lsn

We create postgres replication origin on target and use it to track the
apply progress which includes source operation LSN. We can simply reuse
the origin progress tracking to find the replay_lsn. By doing this, we
eliminate a Postgres procedure call on each transaction and bunch of
local disk write by sqlite. Removing the above mentioned functionality
improves the overall write throughput by ~7.5 times.

Without fix(main):
```
time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql
09:48:19.678 112637 INFO   Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb"
09:48:19.726 112637 INFO   Using work dir "/tmp/pgcopydb"
09:48:19.793 112637 INFO   Setting up previous LSN from replication origin "pgcopydb" progress at 0/0
09:48:19.868 112637 INFO   Replaying changes from file "/home/ubuntu/dimitri/bench.sql"
pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql  9.25s user 11.48s system 7% cpu 4:54.31 total
```
122987 / 4m54s = 419 records/sec

With fix:
```
time pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql
10:01:42.701 113419 INFO   Running pgcopydb version 0.15.38.gc1bc3d9 from "/home/ubuntu/dimitri/pgcopydb/src/bin/pgcopydb/pgcopydb"
10:01:42.750 113419 INFO   Using work dir "/tmp/pgcopydb"
10:01:42.814 113419 INFO   Setting up previous LSN from replication origin "pgcopydb" progress at 0/0
10:01:42.889 113419 INFO   Replaying changes from file "/home/ubuntu/dimitri/bench.sql"
pgcopydb stream apply --resume --not-consistent ~/dimitri/bench.sql  13.22s user 5.46s system 47% cpu 39.205 total
```
122987 / 39.2s = 3,137 records/sec

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
@arajkumar arajkumar force-pushed the arajkumar/origin-progress-as-replay-lsn branch from 48663f9 to 8fc6a40 Compare April 5, 2024 14:00
@arajkumar
Copy link
Contributor Author

@dimitri, I removed all the unwanted lsn tracking infra.

@dimitri dimitri added the enhancement New feature or request label Apr 5, 2024
@dimitri dimitri added this to the v0.16 milestone Apr 5, 2024
@dimitri dimitri merged commit 3f07e0b into dimitri:main Apr 5, 2024
@dimitri
Copy link
Owner
dimitri commented Apr 5, 2024

Thanks again for all the work @arajkumar !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0