dashboards/nvidia-gpu/metadata.yaml (34 lines of code) (raw):
sample_dashboards:
-
category: NVIDIA GPUs
id: nvidia-overview
display_name: NVIDIA GPU Monitoring Overview (GCE & GKE)
description: |-
Displays GPU metrics for both GKE Nodes and GCE VMs. GPU metrics for the GCE VMs require the Ops Agent to be installed.
-
category: NVIDIA GPUs
id: nvidia-dcgm
display_name: NVIDIA GPU Monitoring Advanced DCGM Metrics (GCE Only)
description: |-
Displays Advanced GPU metrics from NVIDIA Datacenter GPU Manager (DCGM). This requires a specific setup (e.g. installing DCGM, installing the Ops Agent, and configuring it to receive DCGM metrics).
related_integrations:
- id: dcgm
platform: GCE
-
category: NVIDIA GPUs
id: nvidia-dcgm-prometheus
display_name: NVIDIA GPU Monitoring Advanced DCGM Metrics (GKE Only)
description: |-
Displays Advanced GPU metrics from NVIDIA Datacenter GPU Manager (DCGM). This requires a specific setup (e.g. installing DCGM and DCGM exporter).
related_integrations:
- id: dcgm
platform: GKE
-
category: NVIDIA GPUs
id: nvidia-triton-prometheus
display_name: NVIDIA Triton Inference Server (GKE Only)
description: |-
This dashboard has charts displaying throughput, latency, resource usage, errors, and other metrics for inference.
related_integrations:
- id: nvidia-triton
platform: GKE