-
Notifications
You must be signed in to change notification settings - Fork 90
Fix issue 8000 with missing data from transactions with BEGIN LSN less than consistent_point. #295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with missing data from transactions with BEGIN LSN less than consistent_point. #295
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! please edit the main comment of that PR as indicated below.
cde70ba
to
60ff15e
Compare
f1e4913
to
65969e9
Compare
We have two important follow-ups for this PR (or maybe in this PR only),
|
tests/follow-data-only/copydb.sh
Outdated
# run concurrent transactions in background and take snapshot in between to | ||
# verify we copy newly inserted data when the snapshot is taken | ||
bash ./run-background-traffic.sh & | ||
BACKGROUND_TRAFFIC_PID=$! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to move the background traffic to the inject service that runs in the background already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to have on-going traffic for at least some time before we take the snapshot (2 seconds here), that can be easily achieved here IMO. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then let's add a comment that's what is happening, and that having control over the sequence of events is not a good fit for multiple services.
Also, can we use the bash coproc
facility more than once? I believe we can give each coproc a name, see https://www.gnu.org/software/bash/manual/html_node/Coprocesses.html. Would be best to use the same background job mechanism twice rather than two different ways to do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried coproc
at first, but it was giving this error,
warning: execute_coproc: coproc still exists
on a quick search, it seems we can't have a parallel coproc
without closing an existing one in bash, it seems to have some problem with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use the coproc syntax with a NAME, you can't run multiple coproc with the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I gave them different unique names. Let me see if I can still reproduce that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it worked but it still gave me this warning
/usr/src/pgcopydb/copydb.sh: line 31: warning: execute_coproc: coproc [35:BACKGROUND_TRAFFIC] still exists
here are the changes
diff --git a/tests/follow-data-only/copydb.sh b/tests/follow-data-only/copydb.sh
index 594e699..fc340b7 100755
--- a/tests/follow-data-only/copydb.sh
+++ b/tests/follow-data-only/copydb.sh
@@ -22,14 +22,13 @@ psql -v a=1 -v b=10 -d ${PGCOPYDB_SOURCE_PGURI} -f /usr/src/pgcopydb/dml.sql
# run concurrent transactions in background and take snapshot in between to
# verify we copy newly inserted data when the snapshot is taken
-bash ./run-background-traffic.sh &
-BACKGROUND_TRAFFIC_PID=$!
+coproc BACKGROUND_TRAFFIC ( bash ./run-background-traffic.sh )
# wait for few seconds to allow snapshot to happen in between the traffic
sleep 2
# grab a snapshot on the source database
-coproc ( pgcopydb snapshot --follow --plugin wal2json --notice )
+coproc SNAPSHOT ( pgcopydb snapshot --follow --plugin wal2json --notice )
sleep 2
@@ -51,8 +50,8 @@ pgcopydb follow --notice
# the follow command ends when reaching endpos, set in inject.sh
pgcopydb stream cleanup
-kill -TERM ${COPROC_PID}
-wait ${COPROC_PID}
+kill -TERM ${SNAPSHOT_PID}
+wait ${SNAPSHOT_PID}
# check how many rows we have on source and target
sql="select count(*), sum(some_field) from table_a"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's add a comment that's what is happening
I have already added some comments, I think they are enough, can you please suggest what are we missing?
and that having control over the sequence of events is not a good fit for multiple services.
Can you please elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay Sorry about that, I am now reading more about coproc and bash seems to only guarantee coproc for a single sub-process at a time. Let's keep your initial implementation. About the comment, here is my suggestion:
- # run concurrent transactions in background and take snapshot in between to
- # verify we copy newly inserted data when the snapshot is taken
+ # take a snapshot with concurrent activity happening
+ # it would be hard to sync concurrent activity in the inject service, so use a job instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes, please review.
65969e9
to
6d64265
Compare
Currently we skip applying SQL statements whose source/origin LSN is less than
previousLSN
. When usingpgcopydb clone --follow
, the initialpreviousLSN
matches theconsistent_point
obtained during the snapshot.Comparing LSN of each SQL statement creates a problem when parallel transactions are running during the snapshot capture. These transactions could have statements whose LSN is less than the
previousLSN
. Consequently, the initial portion of some transactions, containing statements with LSN less than theconsistent_point
, are inadvertently omitted.The fix here is to compare the COMMIT LSN instead of each statement's LSN as it is guaranteed that the COMMIT LSN of every streamed message will be greater than
consistent_point
.Fixes: #282