Docker compose file to set up NVIDIA GPU monitoring on a single server
Updated Aug 16, 2022 - Shell
Horizontal Pod Autoscaling for Kubernetes using Nvidia GPU Metrics
This project shows how to add a GPU-enabled node pool to an existing AKS cluster and how to autoscale and monitor GPU-enabled worker nodes
GPU Auto Scaling based on Prometheus custom metric on EKS
This repository provides a comprehensive, production-ready solution for monitoring NVIDIA GPU metrics using open-source tools. By leveraging DCGM Exporter, Prometheus, and Grafana, it enables real-time visibility into GPU performance, health, and utilization. Designed for ease of deployment with Docker Compose, this stack is ideal for data centers, …
Prometheus remote_write agent
Ansible role collection
Real-time NVIDIA GPU monitoring stack with Docker Compose, Prometheus and Grafana.
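Several of the projects above (Horizontal Pod Autoscaling on NVIDIA GPU metrics, GPU autoscaling on EKS) scale workloads from metrics that DCGM Exporter publishes to Prometheus. The Python below is a minimal sketch of that loop, not code from any listed repository. It assumes the exporter's Prometheus text output exposes the `DCGM_FI_DEV_GPU_UTIL` gauge and assumes a target of 50% average utilization. It parses the samples and applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentValue / targetValue)`, skipping changes within the default 0.1 tolerance. The helpers `parse_samples` and `desired_replicas` and the scrape values are hypothetical. Real HPA behaviour such as stabilization windows and missing-metric handling is omitted.

```python
import math
import re

# Simplified matcher for Prometheus text-format sample lines, e.g.
#   DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 90
# Assumption: label values contain no '}' and no escaped quotes.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text, metric):
    """Return (labels, value) pairs for one metric name (hypothetical helper)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m and m.group("name") == metric:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            samples.append((labels, float(m.group("value"))))
    return samples


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """HPA-style rule: keep the replica count if the ratio is within tolerance."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical scrape output; the values are made up for illustration.
SCRAPE = """\
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 90
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb"} 70
"""

samples = parse_samples(SCRAPE, "DCGM_FI_DEV_GPU_UTIL")
avg = sum(value for _, value in samples) / len(samples)
print(f"average GPU utilization: {avg:.1f}%")  # prints 80.0%
print("desired replicas:", desired_replicas(2, avg, 50.0))  # prints 4
```

With two replicas averaging 80% against a 50% target, the ratio is 1.6, so the sketch scales to `ceil(2 * 1.6) = 4`. An average of 52% falls inside the tolerance band and leaves the count unchanged.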