checking wal_level before quering for control checkpoint metrics by aldrickdev · Pull Request #20490 · DataDog/integrations-core · GitHub

checking wal_level before quering for control checkpoint metrics #20490

Conversation

aldrickdev
Contributor

What does this PR do?

Makes sure that the wal_level is logical before trying to query for the control checkpoint metrics introduced in this PR.

Motivation

The motivation was to address the issues brought up in this case.

A user was getting the error "Error querying pg_control_checkpoint: wal_level must be set to 'logical'" whenever the Agent tried to collect the control checkpoint metrics.
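For illustration, one way to learn the server's wal_level before deciding whether to run the checkpoint query is PostgreSQL's `SHOW wal_level` statement. The sketch below is not the integration's actual code: `fetch_wal_level` and `FakeCursor` are hypothetical names, and the cursor is assumed to follow the Python DB-API (`execute` / `fetchone`).

```python
def fetch_wal_level(cursor):
    """Return the server's wal_level as a lowercase string, or None if unavailable.

    `cursor` is assumed to be a DB-API style cursor (hypothetical wiring,
    not the integration's real connection handling).
    """
    cursor.execute("SHOW wal_level")
    row = cursor.fetchone()
    if not row or not row[0]:
        return None
    return row[0].strip().lower()


class FakeCursor:
    """Stand-in cursor so the sketch can run without a database."""

    def __init__(self, value):
        self._value = value
        self.last_query = None

    def execute(self, query):
        # Record the query instead of sending it to a server.
        self.last_query = query

    def fetchone(self):
        return (self._value,) if self._value is not None else None


if __name__ == "__main__":
    print(fetch_wal_level(FakeCursor("logical")))
```

Returning None when no row comes back lets the caller treat an unknown wal_level the same as a non-logical one and skip the query.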

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@aldrickdev aldrickdev requested review from a team as code owners June 11, 2025 22:31
@aldrickdev aldrickdev added the qa/skip-qa (Automatically skip this PR for the next QA) label Jun 11, 2025
@aldrickdev aldrickdev changed the title from "Sdbm 1804 postgres error querying pg control checkpoint wal level must be set to logical" to "checking wal_level before quering for control checkpoint metrics" Jun 11, 2025
jasonmp85
jasonmp85 previously approved these changes Jun 11, 2025
Contributor
@jasonmp85 jasonmp85 left a comment

I prefer the first branch of an if to be the "normal" case and the else to be the exception, so if we could do that, it would be good…

Other than that LGTM

@temporal-github-worker-1 temporal-github-worker-1 bot dismissed jasonmp85’s stale review June 11, 2025 23:08

Review from jasonmp85 is dismissed. Related teams and files:

  • database-monitoring-agent
    • postgres/datadog_checks/postgres/postgres.py
@aldrickdev aldrickdev enabled auto-merge June 12, 2025 00:46
@aldrickdev aldrickdev added this pull request to the merge queue Jun 12, 2025
codecov bot commented Jun 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.34%. Comparing base (d2dd062) to head (f57ec0b).
Report is 4 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
postgres 93.12% <100.00%> (+3.51%) ⬆️
presto ?
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Merged via the queue into master with commit 5cc1053 Jun 12, 2025
23 checks passed
@aldrickdev aldrickdev deleted the SDBM-1804-postgres-Error-querying-pg_control_checkpoint-wal_level-must-be-set-to-logical branch June 12, 2025 00:48
Comment on lines +322 to +329
if self.wal_level == 'logical':
    self.log.debug("wal_level is logical, adding control checkpoint metrics")

    if self.version >= V10:
        queries.append(QUERY_PG_CONTROL_CHECKPOINT)

    else:
        queries.append(QUERY_PG_CONTROL_CHECKPOINT_LT_10)
Contributor

As mentioned, this is an Aurora-specific issue. Calling pg_current_wal_lsn() on an Aurora instance will fail with:

SELECT pg_current_wal_lsn();
ERROR:  wal_level must be set to 'logical'
HINT:  WAL control functions cannot be executed when wal_level < logical.

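Standard PostgreSQL can run this query regardless of the WAL level, so the check should also take self.is_aurora into account: the wal_level requirement only needs to apply when self.is_aurora is True, and the metrics should always be collected when self.is_aurora is False.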
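A minimal sketch of the suggested condition, written as standalone functions rather than the check's real method. The function names, the integer major-version argument, and the placeholder query strings are assumptions for illustration and do not come from the integration's code.

```python
# Placeholder values standing in for the real query definitions (hypothetical).
QUERY_PG_CONTROL_CHECKPOINT = "control_checkpoint_query"
QUERY_PG_CONTROL_CHECKPOINT_LT_10 = "control_checkpoint_query_lt_10"


def should_collect_control_checkpoint(is_aurora, wal_level):
    """Decide whether the control checkpoint query is safe to run."""
    if not is_aurora:
        # Standard PostgreSQL runs the query at any wal_level.
        return True
    # On Aurora the WAL control functions fail unless wal_level is logical.
    return wal_level == "logical"


def control_checkpoint_queries(is_aurora, wal_level, major_version):
    """Return the list of control checkpoint queries to schedule (possibly empty)."""
    if not should_collect_control_checkpoint(is_aurora, wal_level):
        return []
    if major_version >= 10:
        return [QUERY_PG_CONTROL_CHECKPOINT]
    return [QUERY_PG_CONTROL_CHECKPOINT_LT_10]


if __name__ == "__main__":
    print(control_checkpoint_queries(is_aurora=True, wal_level="replica", major_version=14))
    print(control_checkpoint_queries(is_aurora=False, wal_level="replica", major_version=14))
```

Keeping the "normal" non-Aurora path as the first branch also follows the earlier review suggestion about branch ordering.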

@aldrickdev aldrickdev restored the SDBM-1804-postgres-Error-querying-pg_control_checkpoint-wal_level-must-be-set-to-logical branch June 12, 2025 12:44
aldrickdev added a commit that referenced this pull request Jun 12, 2025
)

* Added wal_level check

* Added test

* updated changelog

* Formatting

* Updated some logic
github-merge-queue bot pushed a commit that referenced this pull request Jun 13, 2025
…point metrics (#20500)

* Remove extra setting of internal resource tag (#20485)

* Initial LiteLLM integration (#20480)

* litellm poc

* fixtures

* litellm implementation

* Delete ibm_db2/simple_ibm_db2_check.py

* e2e test skip

* Rename 1.added to 20480.added

* Update manifest.json

* Update manifest.json

* license

* sort metadata

* license header

* fix manifest

* remove unusable metric

* labeler

* Delete celery/.junit/test-e2e-py3.12.xml

* Delete silk/.junit/test-unit-py3.12.xml

* Delete strimzi/.junit/test-e2e-py3.12-0.34.xml

* Update common.py

* Update test_unit.py

* Add Agent Integrations to Falco codeowners (#20486)

* Falco Integration (#20449)

* Falco Integration

* E2E

* undo manifest changes

* add docker-compose.yaml

* Lint and fix docker-compose

* Falco service check and lint

* Add metadata.csv

* fix metadata.csv

* Changelog fix and add towncrier header

* Add version to Agent metadata

* Remove falco.version from manifest

* Added wal_level check

* Added test

* updated changelog

* Formatting

* Updated some logic

* checking wal_level before quering for control checkpoint metrics (#20490)

* Added wal_level check

* Added test

* updated changelog

* Formatting

* Updated some logic

* Release new integrations for 7.68.x (#20491) (#20492)

* Release new integrations for 7.68.x (#20491)

* [Release] Bumped eset_protect version to 1.0.0

* [Release] Bumped kuma version to 1.0.0

* [Release] Bumped litellm version to 1.0.0

* [Release] Bumped microsoft_dns version to 1.0.0

* [Release] Bumped watchguard_firebox version to 1.0.0

* [Release] Update metadata

* Remove the in-toto new file to backport to master

* Add supported OS classifiers to Falco (#20495)

* Updated check and test to account for aurora environments

* Added changelog and formatted code

* [SQLServer] - Add AO failover monitor template (#20488)

* [SQLServer] - Add AO failover monitor template

* manifest

* Formatting

---------

Co-authored-by: Eric Weaver <eweaver755@gmail.com>
Co-authored-by: Steven Yuen <steven.yuen@datadoghq.com>
Co-authored-by: Kyle Neale <kyle.neale@datadoghq.com>
Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
Co-authored-by: Ilia Kurenkov <ilia.kurenkov@datadoghq.com>
Co-authored-by: Joel Marcotte <91903666+joelmarcotte@users.noreply.github.com>
datadog-agent-integrations-bot bot pushed a commit that referenced this pull request Jun 25, 2025
…point metrics (#20500)

(cherry picked from commit 08389f8)
3 participants