8000 Fix issue with missing data from transactions with BEGIN LSN less than consistent_point. by shubhamdhama · Pull Request #295 · dimitri/pgcopydb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix issue 8000 with missing data from transactions with BEGIN LSN less than consistent_point. #295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

shubhamdhama
Copy link
Contributor

Currently we skip applying SQL statements whose source/origin LSN is less than previousLSN. When using pgcopydb clone --follow, the initial previousLSN matches theconsistent_point obtained during the snapshot.

Comparing LSN of each SQL statement creates a problem when parallel transactions are running during the snapshot capture. These transactions could have statements whose LSN is less than the previousLSN. Consequently, the initial portion of some transactions, containing statements with LSN less than the consistent_point, are inadvertently omitted.

The fix here is to compare the COMMIT LSN instead of each statement's LSN as it is guaranteed that the COMMIT LSN of every streamed message will be greater than consistent_point.

Fixes: #282

Copy link
Owner
@dimitri dimitri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! please edit the main comment of that PR as indicated below.

@shubhamdhama shubhamdhama force-pushed the fix-waiting-indefinitely-for-message branch 3 times, most recently from cde70ba to 60ff15e Compare May 24, 2023 06:34
@shubhamdhama shubhamdhama marked this pull request as ready for review May 24, 2023 06:35
@shubhamdhama shubhamdhama force-pushed the fix-waiting-indefinitely-for-message branch from f1e4913 to 65969e9 Compare May 24, 2023 06:43
@shubhamdhama
Copy link
Contributor Author
shubhamdhama commented May 24, 2023

We have two important follow-ups for this PR (or maybe in this PR only),

  1. We need to take care of < and <=.
  2. If the transaction spills to the next file, then we don't have COMMIT LSN.

@shubhamdhama shubhamdhama requested a review from dimitri May 24, 2023 07:07
Comment on lines 23 to 26
# run concurrent transactions in background and take snapshot in between to
# verify we copy newly inserted data when the snapshot is taken
bash ./run-background-traffic.sh &
BACKGROUND_TRAFFIC_PID=$!
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to move the background traffic to the inject service that runs in the background already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to have on-going traffic for at least some time before we take the snapshot (2 seconds here), that can be easily achieved here IMO. What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then let's add a comment that's what is happening, and that having control over the sequence of events is not a good fit for multiple services.

Also, can we use the bash coproc facility more than once? I believe we can give each coproc a name, see https://www.gnu.org/software/bash/manual/html_node/Coprocesses.html. Would be best to use the same background job mechanism twice rather than two different ways to do the same thing.

Copy link
Contributor Author
@shubhamdhama shubhamdhama May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried coproc at first, but it was giving this error,

warning: execute_coproc: coproc still exists

on a quick search, it seems we can't have a parallel coproc without closing an existing one in bash, it seems to have some problem with it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use the coproc syntax with a NAME, you can't run multiple coproc with the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I gave them different unique names. Let me see if I can still reproduce that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it worked but it still gave me this warning

/usr/src/pgcopydb/copydb.sh: line 31: warning: execute_coproc: coproc [35:BACKGROUND_TRAFFIC] still exists

here are the changes

diff --git a/tests/follow-data-only/copydb.sh b/tests/follow-data-only/copydb.sh
index 594e699..fc340b7 100755
--- a/tests/follow-data-only/copydb.sh
+++ b/tests/follow-data-only/copydb.sh
@@ -22,14 +22,13 @@ psql -v a=1 -v b=10 -d ${PGCOPYDB_SOURCE_PGURI} -f /usr/src/pgcopydb/dml.sql

 # run concurrent transactions in background and take snapshot in between to
 # verify we copy newly inserted data when the snapshot is taken
-bash ./run-background-traffic.sh &
-BACKGROUND_TRAFFIC_PID=$!
+coproc BACKGROUND_TRAFFIC ( bash ./run-background-traffic.sh )

 # wait for few seconds to allow snapshot to happen in between the traffic
 sleep 2

 # grab a snapshot on the source database
-coproc ( pgcopydb snapshot --follow --plugin wal2json --notice )
+coproc SNAPSHOT ( pgcopydb snapshot --follow --plugin wal2json --notice )

 sleep 2

@@ -51,8 +50,8 @@ pgcopydb follow --notice
 # the follow command ends when reaching endpos, set in inject.sh
 pgcopydb stream cleanup

-kill -TERM ${COPROC_PID}
-wait ${COPROC_PID}
+kill -TERM ${SNAPSHOT_PID}
+wait ${SNAPSHOT_PID}

 # check how many rows we have on source and target
 sql="select count(*), sum(some_field) from table_a"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's add a comment that's what is happening

I have already added some comments, I think they are enough, can you please suggest what are we missing?

and that having control over the sequence of events is not a good fit for multiple services.

Can you please elaborate?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay Sorry about that, I am now reading more about coproc and bash seems to only guarantee coproc for a single sub-process at a time. Let's keep your initial implementation. About the comment, here is my suggestion:

- # run concurrent transactions in background and take snapshot in between to
- # verify we copy newly inserted data when the snapshot is taken
+ # take a snapshot with concurrent activity happening
+ # it would be hard to sync concurrent activity in the inject service, so use a job instead

Copy link
Contributor Author
@shubhamdhama shubhamdhama May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes, please review.

@shubhamdhama shubhamdhama force-pushed the fix-waiting-indefinitely-for-message branch from 65969e9 to 6d64265 Compare May 25, 2023 04:16
@shubhamdhama shubhamdhama requested a review from dimitri May 25, 2023 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

First few prefetched rows not applied to target even after transformation
2 participants
0