Metrics

Hardware Observer exposes the following GPU metrics using dcgm-exporter and node_exporter:

| Metric Name | Description | Labels |
| --- | --- | --- |
| DCGM_FI_DEV_GPU_TEMP | GPU temperature (in °C) | DCGM_FI_DEV_BAR1_TOTAL, DCGM_FI_DEV_BRAND, DCGM_FI_DEV_CC_MODE, DCGM_FI_DEV_COMPUTE_MODE, DCGM_FI_DEV_COUNT, DCGM_FI_DEV_CUDA_COMPUTE_CAPABILITY, DCGM_FI_DEV_ECC_CURRENT, DCGM_FI_DEV_ECC_INFOROM_VER, DCGM_FI_DEV_ENFORCED_POWER_LIMIT, DCGM_FI_DEV_FB_TOTAL, DCGM_FI_DEV_GPU_MAX_OP_TEMP, DCGM_FI_DEV_INFOROM_IMAGE_VER, DCGM_FI_DEV_MAX_MEM_CLOCK, DCGM_FI_DEV_MAX_SM_CLOCK, DCGM_FI_DEV_MINOR_NUMBER, DCGM_FI_DEV_NAME, DCGM_FI_DEV_OEM_INFOROM_VER, DCGM_FI_DEV_PERSISTENCE_MODE, DCGM_FI_DEV_POWER_MGMT_LIMIT, DCGM_FI_DEV_POWER_MGMT_LIMIT_MAX, DCGM_FI_DEV_POWER_MGMT_LIMIT_MIN, DCGM_FI_DEV_SERIAL, DCGM_FI_DEV_SHUTDOWN_TEMP, DCGM_FI_DEV_SLOWDOWN_TEMP, DCGM_FI_DEV_VBIOS_VERSION, DCGM_FI_DEV_VIRTUAL_MODE, DCGM_FI_DRIVER_VERSION, DCGM_FI_NVML_VERSION, Hostname, UUID, device, gpu, modelName, pci_bus_id |
| DCGM_FI_DEV_POWER_USAGE | Power draw (in W) | Same as DCGM_FI_DEV_GPU_TEMP |
| DCGM_FI_DEV_GPU_UTIL | GPU utilization (in %) | Same as DCGM_FI_DEV_GPU_TEMP |
| DCGM_FI_DEV_FAN_SPEED | Fan speed (in %, 0-100) | Same as DCGM_FI_DEV_GPU_TEMP |
| DCGM_FI_DEV_MEM_CLOCK | Memory clock frequency (in MHz) | Same as DCGM_FI_DEV_GPU_TEMP |
| DCGM_FI_DEV_MEM_COPY_UTIL | Memory utilization (in %) | Same as DCGM_FI_DEV_GPU_TEMP |
| DCGM_FI_DEV_CLOCK_THROTTLE_REASONS | Throttling reasons bitmask | Same as DCGM_FI_DEV_GPU_TEMP |
| node_hwmon_chip_names | Annotation metric for human-readable chip names | chip, chip_name |
| node_hwmon_temp_celsius | Hardware monitor for temperature (input) | chip, sensor |
| node_hwmon_power_average_watt | Hardware monitor for power usage in watts (average) | chip, sensor |
| node_hwmon_freq_freq_mhz | Hardware monitor for GPU frequency in MHz | sensor, chip |
| node_hwmon_fan_rpm | Hardware monitor for fan revolutions per minute (input) | sensor, chip |
| node_hwmon_fan_max_rpm | Hardware monitor for fan revolutions per minute (max) | sensor, chip |
| node_drm_card_info | Card information | card, chip, memory_vendor, power_performance_level, unique_id |
| node_drm_gpu_busy_percent | How busy the GPU is as a percentage | card, chip |
| node_drm_memory_vram_used_bytes | The used amount of VRAM in bytes | card, chip |
| node_drm_memory_vram_size_bytes | The size of VRAM in bytes | card, chip |

NOTE: This is the subset of metrics used for alerts and the GPU dashboard. Please see this file to learn about other DCGM metrics.

NOTE: Metrics prefixed with node_ are provided by the node_exporter DRM and HWmon collectors for any GPU using open-source drivers. node_exporter is deployed by the grafana-agent charm, not by hardware-observer. These metrics are listed here for convenience.
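As an illustration of how the two VRAM metrics in the table combine, the sketch below computes VRAM usage per card as a percentage from Prometheus text exposition output. The function name `vram_used_percent`, the regular expressions, and the sample values are assumptions made for this example; they are not part of Hardware Observer or node_exporter. The parser is deliberately minimal and assumes samples without a trailing timestamp.

```python
import re

# Matches labelled samples such as: metric_name{label="value",...} 123.0
SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def vram_used_percent(exposition_text: str) -> dict[str, float]:
    """Map each card label to its VRAM usage as a percentage of total VRAM."""
    used: dict[str, float] = {}
    size: dict[str, float] = {}
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines, and unlabelled samples
        card = dict(LABEL_RE.findall(match["labels"])).get("card")
        if card is None:
            continue
        if match["name"] == "node_drm_memory_vram_used_bytes":
            used[card] = float(match["value"])
        elif match["name"] == "node_drm_memory_vram_size_bytes":
            size[card] = float(match["value"])
    return {
        card: 100.0 * used[card] / size[card]
        for card in used
        if size.get(card)  # ignore cards with a missing or zero size
    }


if __name__ == "__main__":
    # Hypothetical sample: label values and numbers are made up for illustration.
    sample = """\
# HELP node_drm_memory_vram_used_bytes The used amount of VRAM in bytes
node_drm_memory_vram_used_bytes{card="card0",chip="example"} 2.147483648e+09
node_drm_memory_vram_size_bytes{card="card0",chip="example"} 8.589934592e+09
"""
    print(vram_used_percent(sample))  # {'card0': 25.0}
```

In a deployed setup the same ratio would more likely be computed in a dashboard or alert query; the Python form is only meant to show which labels and metrics pair up.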

Alerts

Hardware Observer provides the following alerts for NVIDIA GPUs:

| Alert Rule Name | Description | Severity |
| --- | --- | --- |
| GPUPowerBrakeThrottle | NVIDIA GPU Hardware Power Brake Slowdown throttling detected | Warning |
| GPUThermalHWThrottle | NVIDIA GPU Hardware Thermal throttling detected | Warning |
| GPUThermalSWThrottle | NVIDIA GPU Software Thermal throttling detected | Warning |
| GPUSyncBoostThrottle | NVIDIA GPU Sync Boost throttling detected | Warning |
| GPUSlowdownThrottle | GPU Hardware Slowdown throttling detected | Warning |
| GPUPowerThrottle | GPU Software Power throttling detected | Warning |

For more details, please see NVIDIA Clocks Throttle reasons.

Throttling detection is currently only available for NVIDIA GPUs.
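The alerts above correspond to individual bits of the DCGM_FI_DEV_CLOCK_THROTTLE_REASONS bitmask. The following minimal sketch shows how such a bitmask could be decoded. The bit values follow the nvmlClocksThrottleReason* constants in NVIDIA's nvml.h; the mapping from bits to alert rule names is an assumption inferred from the alert descriptions, and the helper `decode_throttle_reasons` is hypothetical. The alert rule expressions actually shipped by Hardware Observer are not shown here and may be written differently.

```python
# Bit values follow the nvmlClocksThrottleReason* constants in NVIDIA's nvml.h.
# The alert-name mapping is an assumption inferred from the alert descriptions.
THROTTLE_BITS = {
    0x04: "GPUPowerThrottle",       # software power cap
    0x08: "GPUSlowdownThrottle",    # hardware slowdown
    0x10: "GPUSyncBoostThrottle",   # sync boost
    0x20: "GPUThermalSWThrottle",   # software thermal slowdown
    0x40: "GPUThermalHWThrottle",   # hardware thermal slowdown
    0x80: "GPUPowerBrakeThrottle",  # hardware power brake slowdown
}


def decode_throttle_reasons(bitmask: int) -> list[str]:
    """Return the alert names whose bit is set in the bitmask, lowest bit first."""
    return [name for bit, name in sorted(THROTTLE_BITS.items()) if bitmask & bit]


if __name__ == "__main__":
    # 0x48 = 0x40 | 0x08: hardware thermal slowdown plus hardware slowdown.
    print(decode_throttle_reasons(0x48))
    # ['GPUSlowdownThrottle', 'GPUThermalHWThrottle']
```

Prometheus reports sample values as floats, so a value read from the metric would need converting with `int()` before being passed to this function. Bits without a matching alert, such as 0x01 (GPU idle), are ignored.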


Last updated 1 year, 8 months ago.