Checking wal_level before querying for control checkpoint metrics #20490
I prefer the first branch of an if to be the "normal" case and the else to be the exception, so it would be good to restructure it that way if possible.
Other than that LGTM
Review from jasonmp85 is dismissed. Related teams and files:
- database-monitoring-agent
- postgres/datadog_checks/postgres/postgres.py
Codecov Report: All modified and coverable lines are covered by tests ✅
    if self.wal_level == 'logical':
        self.log.debug("wal_level is logical, adding control checkpoint metrics")

        if self.version >= V10:
            queries.append(QUERY_PG_CONTROL_CHECKPOINT)
        else:
            queries.append(QUERY_PG_CONTROL_CHECKPOINT_LT_10)
As mentioned, this is an Aurora-specific issue. Calling pg_current_wal_lsn() on an Aurora instance fails with:
SELECT pg_current_wal_lsn();
ERROR: wal_level must be set to 'logical'
HINT: WAL control functions cannot be executed when wal_level < logical.
Standard PostgreSQL can run this query regardless of the WAL level, so the check should also account for self.is_aurora being False.
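The suggestion above, combined with the earlier request to put the "normal" case in the first branch, could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code that was merged: pick_control_checkpoint_query is a hypothetical helper, the two query constants are placeholder strings standing in for the real query definitions, and version_major, wal_level, and is_aurora are assumed to be plain values already read from the instance.

```python
# Placeholder values; the real constants are query definitions in the integration.
QUERY_PG_CONTROL_CHECKPOINT = "control_checkpoint_v10_plus"
QUERY_PG_CONTROL_CHECKPOINT_LT_10 = "control_checkpoint_pre_v10"


def pick_control_checkpoint_query(version_major, wal_level, is_aurora):
    """Hypothetical helper: return the query to schedule, or None to skip it."""
    # Per the review comment, only Aurora rejects WAL control functions
    # when wal_level is below 'logical'; standard PostgreSQL does not.
    wal_functions_available = (not is_aurora) or wal_level == "logical"

    if wal_functions_available:
        # Normal case first: choose the query that matches the server version.
        if version_major >= 10:
            return QUERY_PG_CONTROL_CHECKPOINT
        return QUERY_PG_CONTROL_CHECKPOINT_LT_10

    # Exceptional case: Aurora with wal_level below 'logical', so skip the query.
    return None
```

With this shape, a non-Aurora instance always gets a control checkpoint query, and an Aurora instance only gets one when wal_level is logical.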
…point metrics (#20500)
* Remove extra setting of internal resource tag (#20485)
* Initial LiteLLM integration (#20480)
* litellm poc
* fixtures
* litellm implementation
* Delete ibm_db2/simple_ibm_db2_check.py
* e2e test skip
* Rename 1.added to 20480.added
* Update manifest.json
* Update manifest.json
* license
* sort metadata
* license header
* fix manifest
* remove unusable metric
* labeler
* Delete celery/.junit/test-e2e-py3.12.xml
* Delete silk/.junit/test-unit-py3.12.xml
* Delete strimzi/.junit/test-e2e-py3.12-0.34.xml
* Update common.py
* Update test_unit.py
* Add Agent Integrations to Falco codeowners (#20486)
* Falco Integration (#20449)
* Falco Integration
* E2E
* undo manifest changes
* add docker-compose.yaml
* Lint and fix docker-compose
* Falco service check and lint
* Add metadata.csv
* fix metadata.csv
* Changelog fix and add towncrier header
* Add version to Agent metadata
* Remove falco.version from manifest
* Added wal_level check
* Added test
* updated changelog
* Formatting
* Updated some logic
* checking wal_level before quering for control checkpoint metrics (#20490)
* Added wal_level check
* Added test
* updated changelog
* Formatting
* Updated some logic
* Release new integrations for 7.68.x (#20491) (#20492)
* Release new integrations for 7.68.x (#20491)
* [Release] Bumped eset_protect version to 1.0.0
* [Release] Bumped kuma version to 1.0.0
* [Release] Bumped litellm version to 1.0.0
* [Release] Bumped microsoft_dns version to 1.0.0
* [Release] Bumped watchguard_firebox version to 1.0.0
* [Release] Update metadata
* Remove the in-toto new file to backport to master
* Add supported OS classifiers to Falco (#20495)
* Updated check and test to account for aurora environments
* Added changelog and formatted code
* [SQLServer] - Add AO failover monitor template (#20488)
* [SQLServer] - Add AO failover monitor template
* manifest
* Formatting
---------
Co-authored-by: Eric Weaver <eweaver755@gmail.com>
Co-authored-by: Steven Yuen <steven.yuen@datadoghq.com>
Co-authored-by: Kyle Neale <kyle.neale@datadoghq.com>
Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@datadoghq.com>
Co-authored-by: Joel Marcotte <91903666+joelmarcotte@users.noreply.github.com>
…point metrics (#20500): same commit message as above (cherry picked from commit 08389f8)
What does this PR do?
Makes sure that the wal_level is logical before trying to query for the control checkpoint metrics introduced in this PR.
Motivation
This change addresses the issue raised in this case.
The user was getting the error
Error querying pg_control_checkpoint: wal_level must be set to 'logical'
when the Agent tried to collect the control checkpoint metrics.
Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- Add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.