Hardware Observer

  • Canonical BootStack Charmers
Channel           Revision  Published    Runs on
latest/stable     84        02 Jul 2024  Ubuntu 22.04, 20.04, 18.04
latest/stable     13        01 Nov 2023  Ubuntu 22.04, 20.04, 18.04
latest/candidate  113       15 Oct 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/candidate  112       15 Oct 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/candidate  13        30 Oct 2023  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/edge       125       19 Nov 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/edge       124       19 Nov 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/edge       119       11 Nov 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/edge       118       11 Nov 2024  Ubuntu 24.04, 22.04, 20.04, 18.04
latest/edge       15        03 Nov 2023  Ubuntu 24.04, 22.04, 20.04, 18.04

juju deploy hardware-observer --channel edge

Hardware Observer is a subordinate machine charm that monitors hardware resources on bare-metal infrastructure and raises alerts about them. It relies on the following exporters to provide detailed metrics:

  • Hardware Exporter: For collecting metrics from BMCs and RAID controllers.

  • Smartctl Exporter: For collecting SMART metrics from storage devices.

  • DCGM Exporter: For collecting metrics from NVIDIA GPUs (if present).

This charm is ideal for monitoring hardware resources when used in conjunction with the Canonical Observability Stack.

Hardware Exporter

Hardware Observer collects and exports Prometheus metrics from BMCs (using IPMI and the newer Redfish protocol) and from various SAS and RAID controllers by means of the prometheus-hardware-exporter project. It also configures Prometheus alert rules that fire when the status of any metric is suboptimal.
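
To make the idea of a "suboptimal" metric concrete, here is a minimal Python sketch that scans text in the Prometheus exposition format and lists the samples whose value differs from a healthy one. It is only an illustration: the real alert rules are evaluated by Prometheus itself, not by code like this. The metric name in the usage note and the convention that 1 means healthy are assumptions made for the example, not names taken from prometheus-hardware-exporter.

```python
import re

# One sample line of the Prometheus text exposition format, for example:
#   some_metric{label="value"} 1
# Lines carrying an optional trailing timestamp are not handled by this sketch.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$')


def suboptimal_samples(exposition, healthy_value=1.0):
    """Return (metric, labels, value) for every sample not equal to healthy_value.

    Assumption: a value of 1 means "healthy". Real alert rules may use
    different metrics, thresholds, and expressions.
    """
    flagged = []
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, raw_value = match.groups()
        value = float(raw_value)
        if value != healthy_value:
            flagged.append((name, labels or "", value))
    return flagged
```

Given two samples of a hypothetical gauge `example_controller_healthy`, one with value 1 and one with value 0, the function returns only the second sample.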

The appropriate collectors and alert rules are installed depending on which of the following RAID/SAS controllers are present:

  • Broadcom MegaRAID controller

  • Dell PowerEdge RAID Controller

  • LSI SAS-2 controller

  • LSI SAS-3 controller

  • HPE Smart Array controller
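
The selection described above can be pictured as a lookup from detected controllers to collectors. The Python sketch below shows one way such a lookup could work. The collector names on the right-hand side are hypothetical labels invented for this example, and the charm's actual detection and naming may differ.

```python
# Hypothetical mapping from a controller family (as listed above) to an
# illustrative collector name. These identifiers are assumptions, not the
# names used by the charm or by prometheus-hardware-exporter.
CONTROLLER_COLLECTORS = {
    "Broadcom MegaRAID controller": "mega_raid",
    "Dell PowerEdge RAID Controller": "poweredge_raid",
    "LSI SAS-2 controller": "lsi_sas_2",
    "LSI SAS-3 controller": "lsi_sas_3",
    "HPE Smart Array controller": "hpe_smart_array",
}


def select_collectors(detected_controllers):
    """Return the sorted collector names for the controllers that were detected.

    Controllers that are not in the mapping are ignored, so a machine
    without any supported controller yields an empty list.
    """
    return sorted(
        CONTROLLER_COLLECTORS[name]
        for name in detected_controllers
        if name in CONTROLLER_COLLECTORS
    )
```

A machine with only an LSI SAS-3 controller would get a single collector under this scheme, and a machine with none of the listed controllers would get no RAID/SAS collectors at all.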

Smartctl Exporter

The Smartctl Exporter integrates with Hardware Observer to monitor the health of storage devices via SMART data. Metrics are collected and exported to Prometheus using the smartctl-exporter-snap.

DCGM Exporter

NOTE: requires revision ≥ 113

The DCGM exporter integrates with Hardware Observer to monitor NVIDIA GPUs by collecting various metrics. These metrics are then exported to Prometheus using the DCGM snap, enabling GPU performance tracking and monitoring. The snap is only installed if the charm detects the presence of NVIDIA GPUs.
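
The conditional install can be sketched in Python as follows. This is not the charm's implementation: it assumes that GPU detection is done by scanning `lspci -nn` output for display-class devices carrying NVIDIA's PCI vendor ID (10de), and the function names are made up for the example. The revision threshold comes from the note above.

```python
import re

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA
DCGM_MIN_REVISION = 113    # minimum charm revision, per the note above

# Matches the [vendor:device] pair that `lspci -nn` prints for each device.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def has_nvidia_gpu(lspci_nn_output):
    """Return True if any display-class device in the output is from NVIDIA.

    Assumption: the input is the text produced by `lspci -nn`. The charm's
    real detection mechanism may be different.
    """
    for line in lspci_nn_output.splitlines():
        # Only consider VGA controllers (class 0300) and 3D controllers (0302).
        if "[0300]" not in line and "[0302]" not in line:
            continue
        for vendor, _device in PCI_ID_RE.findall(line.lower()):
            if vendor == NVIDIA_VENDOR_ID:
                return True
    return False


def should_install_dcgm(charm_revision, lspci_nn_output):
    """Combine the revision requirement with the GPU presence check."""
    return charm_revision >= DCGM_MIN_REVISION and has_nvidia_gpu(lspci_nn_output)
```

Passing the hardware listing in as a string keeps the sketch testable on machines without a GPU or without `lspci` installed.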

Security, bugs and feature requests

If you find a bug in this application or want to request a specific feature, use the following links:

  • Raise issues or feature requests on GitHub.
  • Report security issues in Hardware Observer through Launchpad. Please do not file GitHub issues about security issues.

Contributing

Please see the Juju SDK docs for best-practice guidelines on enhancing this charm, and CONTRIBUTING.md for developer guidance.

License

Hardware Observer is free software, distributed under the Apache License, Version 2.0. See LICENSE for more information.