8000 Fix `lsn` for KEEPALIVE action. by shubhamdhama · Pull Request #164 · dimitri/pgcopydb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix lsn for KEEPALIVE action. #164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 16, 2023

Conversation

shubhamdhama
Copy link
Contributor
@shubhamdhama shubhamdhama commented Dec 9, 2022

It fixes an issue under the following conditions,

  1. Source doesn't have any activity.
  2. endpos is set to current LSN.
  3. The action that crosses the endpos is KEEPALIVE.

in this case, the written lsn that corresponds to KEEPALIVE isn't correct, i.e. less than endpos. This ultimately leads to Apply process waiting for a statement >= endpos which never happens, hence it never terminates.

@shubhamdhama
Copy link
Contributor Author
shubhamdhama commented Dec 9, 2022

Draft as I will write a unit test when no activity on source happens (sort of like edge case).
I'm not able to come up with a test case where I can deterministically set the endpos right after a K action because we also have B/C action due to activity on the sentinel table. So I was thinking the best we can do is create a test case where we have no activity for 10 seconds, prefetch, and put a timeout on apply for 10s so that migration should finish quickly.

Also, I have a question about some other tests (cdc/*) I see that Keepalive is in the expected JSON. How is every test run emitting the Keepalive at the same position relative to other actions?

@dimitri
Copy link
Owner
dimitri commented Jan 11, 2023

My reading of the Postgres source code makes me think that this PR is okay, we actually want to advance to WalEnd when receiving a keepalive message. See function WalSndUpdateProgress in src/backend/replication/walsender.c:

	 * When skipping empty transactions in synchronous replication, we send a
	 * keepalive message to avoid delaying such transactions.

About the test cases and the Keepalive (and other messages) position, we just skip comparing them because each run may provide a different one...

@shubhamdhama shubhamdhama marked this pull request as ready for review January 13, 2023 08:10
@shubhamdhama
Copy link
Contributor Author

Thanks for confirming that this fix is correct.

I've published the PR since this change is important and I'm couldn't finish the unit test I mentioned. I will share them in a separate PR once I am done.

@dimitri
Copy link
Owner
dimitri commented Jan 13, 2023

It seems you have added an extra KeepAlive message in some of the tests output, and the expected file has not been updated by your PR so tests are failing. Please see about updating the expected tests results.

@shubhamdhama shubhamdhama force-pushed the fix-keepalive-reaching-endpos branch from 64ee757 to b6a34c2 Compare January 16, 2023 11:02
@shubhamdhama
Copy link
Contributor Author

Oh right, I have updated the PR.

@dimitri dimitri merged commit 2598444 into dimitri:main Jan 16, 2023
arajkumar added a commit to arajkumar/pgcopydb that referenced this pull request Sep 18, 2024
In a Postgres-to-Timescale migration, pgcopydb currently executes VACUUM
ANALYZE at the hypertable level, which processes the underlying chunks
serially. For hypertables with a significant number of chunks, this
approach can be time-consuming.

This commit disables pgcopydb’s default vacuum operation and instead
performs vacuuming at the chunk level using parallel workers. This
change allows for better utilization of available resources on the
target system, speeding up the overall process.

Parallel vacuum analyze at chunk level:
```
Completed Vacuum analyze tables in 23 seconds
```

Vacuum analyze on hypertable:
```
vacuum analyze metrics_tmp ;
VACUUM
Time: 35636.559 ms (00:35.637)
``` 

The parallel vacuum analyze is 34% better!

Fixes
https://linear.app/timescale/issue/DAT-66/pg-to-ts-vaccum-analyze-via-pgcopydb-is-slow-for-hypertables-with-many

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
arajkumar added a commit to arajkumar/pgcopydb that referenced this pull request Sep 18, 2024
06347f0 Run VACUUM ANALYZE in Parallel for Hypertable Chunks (dimitri#164)
e59f88b Fix ctid based same table copy concurrency (dimitri#165)
57908cc Make queries compatible with PG12 (dimitri#163)

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0