crosscluster: monitor lagging spans #134090
unrelated unit test flake
)

type rangeStatsByProcessorID struct {
	mu syncutil.Mutex
Is the mutex required here? Are rows and producer metas produced in parallel?
I don't know if they can be produced in parallel, but I thought better safe than sorry, especially because these APIs aren't called too often.
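A minimal sketch of the "better safe than sorry" approach discussed above: a per-processor stats map guarded by a mutex, so that concurrent metadata producers cannot race on it. Only the type name `rangeStatsByProcessorID` and its `mu` field come from the diff; the `rangeStats` fields, the `Add` and `Aggregate` methods, and the use of the standard library `sync.Mutex` in place of `syncutil.Mutex` are assumptions made for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeStats is a hypothetical per-processor snapshot of source-side ranges.
// The field names are assumptions, not taken from the PR.
type rangeStats struct {
	RangeCount         int64
	ScanningRangeCount int64
	LaggingRangeCount  int64
}

// rangeStatsByProcessorID keeps the latest snapshot reported by each processor.
// The mutex makes Add and Aggregate safe even if metadata arrives in parallel.
type rangeStatsByProcessorID struct {
	mu    sync.Mutex // stand-in for syncutil.Mutex
	stats map[int32]rangeStats
}

func newRangeStatsByProcessorID() *rangeStatsByProcessorID {
	return &rangeStatsByProcessorID{stats: make(map[int32]rangeStats)}
}

// Add replaces the snapshot for one processor with its most recent report.
func (r *rangeStatsByProcessorID) Add(processorID int32, s rangeStats) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.stats[processorID] = s
}

// Aggregate sums the latest snapshot of every processor.
func (r *rangeStatsByProcessorID) Aggregate() rangeStats {
	r.mu.Lock()
	defer r.mu.Unlock()
	var total rangeStats
	for _, s := range r.stats {
		total.RangeCount += s.RangeCount
		total.ScanningRangeCount += s.ScanningRangeCount
		total.LaggingRangeCount += s.LaggingRangeCount
	}
	return total
}

func main() {
	collector := newRangeStatsByProcessorID()
	var wg sync.WaitGroup
	// Simulate four processors reporting concurrently.
	for id := int32(0); id < 4; id++ {
		wg.Add(1)
		go func(id int32) {
			defer wg.Done()
			collector.Add(id, rangeStats{RangeCount: 10, LaggingRangeCount: int64(id)})
		}(id)
	}
	wg.Wait()
	fmt.Printf("%+v\n", collector.Aggregate())
}
```

Because each processor overwrites only its own entry, a repeated report from the same processor does not double count. The lock is uncontended in the common case, which matches the observation that these calls are infrequent.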
pkg/sql/execinfrapb/data.proto
Outdated
@@ -329,6 +329,8 @@ message RemoteProducerMetadata {
      (gogoproto.customname) = "FlowID",
      (gogoproto.customtype) = "FlowID"];
  optional bool drained = 9 [(gogoproto.nullable) = false];
  // ProcessorID is the ID of the processor that published the metadata.
nit / (why isn't there a formatter for this 😢 ): this comment is indented with tabs when it should be spaces
i can't tell you how many times i've tried to coax vscode to use spaces instead of tabs.
@@ -532,13 +552,31 @@ func (rh *rowHandler) handleRow(ctx context.Context, row tree.Datums) error {
			HighWater: &replicatedTime,
		}
	}
	progress.RunningStatus = fmt.Sprintf("logical replication running: %s", replicatedTime.GoTime())
	progress.RunningStatus = status
	if fractionCompleted > 0 {
I think we only want to show the progress bar if 0 < fractionCompleted < 1. Currently, this is always overwriting the high watermark, so we will no longer show the high watermark when everything is caught up and advancing.
I like that idea
if fraction completed is 0, the status is now "all %d ranges are caught up"
This patch teaches LDR to collect and aggregate the count of source side ranges undergoing catchup and initial scans. In addition, this patch reports this information in the job's running status and fraction completed.
Epic: none
Release note: none
Epic: none
Release note: this patch adds the following LDR metrics:
- logical_replication.catchup_ranges: the number of source side ranges conducting catchup scans.
- logical_replication.scanning_ranges: the number of source side ranges conducting initial scans.
Note that in the DB Console, these metrics are not accurate if multiple LDR jobs are running, though equivalent labeled metrics exist for a user to consume via Prometheus.
LGTM
TFTR! bors r=dt
This patch teaches LDR to collect and aggregate the count of source side ranges undergoing catchup and initial scans. In addition, this patch reports this information in the job's running status and fraction completed.
Epic: none
Release note: none