DCGM label report fix and other minor improvements by cmisale · Pull Request #67 · IBM/autopilot · GitHub

DCGM label report fix and other minor improvements #67


Merged 1 commit on Feb 24, 2025


@cmisale (Collaborator) commented Feb 24, 2025

Summary

This PR:

  • fixes Prometheus metrics that were missed when the EVICT label is added to a node
  • adds a custom TTL for the dcgm invasive jobs, so their logs remain available for later analysis
  • adds a verbose parameter to dcgm; the invasive jobs use it to print the output, which is otherwise only held in memory by Python and parsed
  • makes minor improvements to code and alerts
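
To illustrate the verbose and TTL items above, here is a minimal Python sketch. It is not the code from this PR. The function names (run_diagnostic, failed_checks, job_manifest), the "<name>: Fail" output format, and the image and command values are all hypothetical. The sketch also assumes the custom TTL maps to the Kubernetes Job field ttlSecondsAfterFinished, which the PR description does not confirm.

```python
import subprocess
import sys


def run_diagnostic(cmd, verbose=False):
    """Run a command and keep its output in memory; echo it only if verbose."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if verbose:
        # Echo the raw output so it ends up in the job's logs.
        print(result.stdout, end="")
    return result.returncode, result.stdout


def failed_checks(output):
    """Return the names on lines shaped like '<name>: Fail' (hypothetical format)."""
    failed = []
    for line in output.splitlines():
        name, sep, status = line.partition(":")
        if sep and status.strip().lower() == "fail":
            failed.append(name.strip())
    return failed


def job_manifest(name, image, command, ttl_seconds):
    """Build a Job manifest as a dict, with a TTL applied after the Job finishes.

    Assumption: the TTL is expressed through ttlSecondsAfterFinished.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }
```

With verbose=False the output is only parsed in memory; with verbose=True the same output is also printed, so it can be read from the pod logs for as long as the TTL keeps the finished job around.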

Scope and Impact

  • API Changes?
  • No

GitHub Issue

  • None

How was this Pull-Request Tested and Validated?

  • Left running for a week and manually injected errors

Pull-Request Reminders

  • Does the Autopilot Readme require updates?

    • No
  • Are there any new software dependencies introduced to this Pull-Request?

    • No

Signed-off-by: Claudia <c.misale@ibm.com>
@Vezio (Member) left a comment

lgtm - good additions to capturing in global.go

@polaya07 (Collaborator) left a comment

Changes look good!

@cmisale cmisale merged commit b051b4a into main Feb 24, 2025
4 checks passed
@cmisale cmisale deleted the bugfix-label branch February 24, 2025 21:59
@cmisale (Collaborator, Author) commented Feb 24, 2025

Thank you!

3 participants: @cmisale, @polaya07, @Vezio