dcgm-exporter

Here are 8 public repositories matching this topic...

hongshibao / gpu-monitoring-docker-compose

Docker compose file to set up NVIDIA GPU monitoring on a single server

docker monitoring docker-compose grafana prometheus nvidia gpu-temperature gpu-monitoring gpu-utilization gpu-metrics dcgm-exporter

Updated Aug 16, 2022
Shell

ashrafgt / k8s-gpu-hpa

Horizontal Pod Autoscaling for Kubernetes using NVIDIA GPU Metrics

kubernetes prometheus nvidia horizontal-pod-autoscaler gpu-metrics dcgm-exporter

Updated May 10, 2021

paolosalvatori / aks-gpu

This project shows how to add a GPU-enabled node pool to an existing AKS cluster and how to autoscale and monitor GPU-enabled worker nodes

azure gpu grafana prometheus gpu-computing grafana-dashboard prometheus-metrics aks aks-cluster gpu-container dcgm-exporter

Updated Jul 7, 2021
Shell

DevSecOpsSamples / eks-gpu-autoscaling

GPU Auto Scaling based on Prometheus custom metric on EKS

python kubernetes aws devops gpu container prometheus nvidia kubernetes-operator custom-metrics autoscaling cdk hpa eks dcgm-exporter

Updated Jan 13, 2023
TypeScript

Made-Jaya / NVIDIA-DCGM-Exporter

This repository provides a comprehensive, production-ready solution for monitoring NVIDIA GPU metrics using open-source tools. By leveraging DCGM Exporter, Prometheus, and Grafana, it enables real-time visibility into GPU performance, health, and utilization. Designed for ease of deployment with Docker Compose, this stack is ideal for data centers,

docker-compose prometheus-exporter grafana-dashboard nvidia-docker nvidia-gpu dcgm-exporter