Hardware Observer

By Canonical BootStack Charmers

Channel           Revision  Published    Runs on
latest/stable     59        03 Apr 2024  Ubuntu 22.04, Ubuntu 20.04
latest/candidate  70        06 May 2024  Ubuntu 22.04, Ubuntu 20.04
latest/edge       70        06 May 2024  Ubuntu 22.04, Ubuntu 20.04

juju deploy hardware-observer --channel edge

Tutorial: Get started

The tutorial consists of the following steps:

  1. Prerequisites
  2. Deploy hardware-observer
  3. View the metrics
  4. Integrate with COS
  5. Clean up the Environment

Prerequisites

To follow this tutorial, you need access to a Juju controller that can deploy machines in a physical environment (for example, a MAAS cloud).

Deploy hardware-observer

Add a new model to perform the deployments for the tutorial.

juju add-model hw-obs-tutorial

Deploy Ubuntu as the principal application, with hardware-observer and grafana-agent as subordinate applications, in the model. Relate all three and wait for the hardware-observer installation to complete.

# deploy applications
juju deploy ubuntu
juju deploy hardware-observer
juju deploy grafana-agent

# add relations
juju relate ubuntu hardware-observer
juju relate ubuntu grafana-agent
juju relate hardware-observer grafana-agent
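
The wait for the installation can also be scripted. The following is a minimal sketch, not part of the charm: the helper names app_ready and wait_until_ready are hypothetical, and it assumes the default tabular juju status output, in which each unit row starts with the unit name, the workload state and the agent state.

```shell
# Reads tabular `juju status` output on stdin and succeeds only if the
# given application has at least one unit and every unit is
# "active" (workload) and "idle" (agent).
app_ready() {
    awk -v app="$1" '
        {
            n = split($1, parts, "/")
            # Unit rows look like: "hardware-observer/1*  active  idle ..."
            if (n == 2 && parts[1] == app && parts[2] ~ /^[0-9]+\*?$/) {
                units++
                if ($2 == "active" && $3 == "idle") ready++
            }
        }
        END { exit !(units > 0 && units == ready) }
    '
}

# Polls `juju status` until the application is ready.
# Usage: wait_until_ready APPLICATION [TRIES] [DELAY_SECONDS]
wait_until_ready() {
    app="$1"
    tries="${2:-60}"
    delay="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        if juju status | app_ready "$app"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

With these defaults, running wait_until_ready hardware-observer polls up to 60 times at 10 second intervals and returns a non-zero status if the units never settle.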

The charm automatically sets up the tools required to monitor BMCs via IPMI or Redfish and, if necessary, the HPE SSA CLI. Other hardware, such as Dell and Broadcom RAID controllers, requires you to fetch the vendor-specific command-line utilities and upload them yourself. This mechanism is described in more detail in the documentation on hardware support detection.
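
Uploading a vendor utility is typically done by attaching it to the application as a Juju resource. The sketch below wraps juju attach-resource in a small helper. The helper name attach_vendor_tool is hypothetical, and the resource name you pass to it is an assumption to verify against the output of juju resources hardware-observer before use.

```shell
# Attach a locally downloaded vendor tool to the charm as a Juju resource.
# Usage: attach_vendor_tool RESOURCE_NAME FILE [APPLICATION]
attach_vendor_tool() {
    resource="$1"
    file="$2"
    app="${3:-hardware-observer}"
    # Refuse to call Juju if the file was not downloaded yet.
    if [ ! -f "$file" ]; then
        echo "file not found: $file" >&2
        return 1
    fi
    juju attach-resource "$app" "$resource=$file"
}
```

For example, attach_vendor_tool storcli-deb ./storcli.deb would attach a downloaded package, assuming the charm defines a resource called storcli-deb; the file name here is only illustrative.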

After performing these steps, you should have a model that looks like this.

Model            Controller        Cloud/Region             Version  SLA          Timestamp
hw-obs-tutorial  serverstack-ctrl  serverstack/serverstack  3.1.5    unsupported  07:44:22Z

App                Version  Status   Scale  Charm              Channel           Rev  Exposed  Message
grafana-agent               blocked      1  grafana-agent      latest/candidate   12  no       logging-consumer: off, grafana-cloud-config: off, send-remote-write: off
hardware-observer           active       1  hardware-observer  latest/edge        12  no       Unit is ready
ubuntu             22.04    active       1  ubuntu             stable             24  no

Unit                    Workload  Agent  Machine  Public address  Ports  Message
ubuntu/1*               active    idle   1        10.5.3.177
  grafana-agent/1*      blocked   idle            10.5.3.177             logging-consumer: off, grafana-cloud-config: off, send-remote-write: off
  hardware-observer/1*  active    idle            10.5.3.177             Unit is ready

Machine  State    Address     Inst id                               Base          AZ    Message
1        started  10.5.3.177  b2b06a81-c014-4f12-93fc-83bc2745f726  ubuntu@22.04  nova  ACTIVE

Integration provider         Requirer                        Interface              Type         Message
grafana-agent:peers          grafana-agent:peers             grafana_agent_replica  peer
hardware-observer:cos-agent  grafana-agent:cos-agent         cos_agent              subordinate
ubuntu:juju-info             grafana-agent:juju-info         juju-info              subordinate
ubuntu:juju-info             hardware-observer:general-info  juju-info              subordinate

View the metrics

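The exported Prometheus metrics can be viewed directly on the machine hosting the charm units, at the local endpoint http://localhost:10200. Use the unit name shown in your own juju status output; in the sample status above the unit is hardware-observer/1, while the command below assumes hardware-observer/0.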

juju ssh hardware-observer/0 curl localhost:10200

# metrics output shortened for readability

(...)
# HELP ipmi_dcmi_power_cosumption_watts Current power consumption in watts
# TYPE ipmi_dcmi_power_cosumption_watts gauge
ipmi_dcmi_power_cosumption_watts 162.0
# HELP ipmi_dcmi_command_success Indicates if the ipmi dcmi command is successful or not
# TYPE ipmi_dcmi_command_success gauge
ipmi_dcmi_command_success 1.0
# HELP ipmi_sel_command_success Indicates if the ipmi sel command succeeded or not
# TYPE ipmi_sel_command_success gauge
ipmi_sel_command_success 1.0
# TYPE ipmi_fan_speed_rpm gauge
ipmi_fan_speed_rpm{name="Fan1",state="Nominal",unit="RPM"} 9240.0
# HELP ipmi_fan_speed_rpm Fan speed measure, in rpm
# TYPE ipmi_fan_speed_rpm gauge
ipmi_fan_speed_rpm{name="Fan2",state="Nominal",unit="RPM"} 9600.0
# HELP ipmi_fan_speed_rpm Fan speed measure, in rpm
# TYPE ipmi_fan_speed_rpm gauge
# HELP ipmi_temperature_celsius Temperature measure from temperature sensors
# TYPE ipmi_temperature_celsius gauge
ipmi_temperature_celsius{name="Exhaust Temp",state="Nominal",unit="C"} 27.0
(...)
# HELP lsi_sas_3_controllers Number of LSI SAS-{self.version} controllers
# TYPE lsi_sas_3_controllers gauge
lsi_sas_3_controllers 1.0
# HELP sas3ircu_command_success Indicates if the command is successful or not
# TYPE sas3ircu_command_success gauge
sas3ircu_command_success 1.0
# HELP lsi_sas_3_ir_volumes Number of IR volumes
# TYPE lsi_sas_3_ir_volumes gauge
lsi_sas_3_ir_volumes{controller_id="0"} 0.0
(...)
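
Because the full output is long, it can help to filter it. The helpers below are a minimal sketch with hypothetical names (metric_samples, failed_collectors). They assume the plain Prometheus text format shown above, read from standard input, and that a *_command_success value of 1.0 means the underlying command succeeded, as the HELP lines suggest.

```shell
# Print only the sample lines (no "# HELP" / "# TYPE" comments) whose
# metric name starts with the given prefix.
metric_samples() {
    grep -v '^#' | grep "^$1"
}

# Print the name of every *_command_success metric whose value is not 1,
# that is, collectors whose underlying command did not succeed.
failed_collectors() {
    awk '!/^#/ && $1 ~ /_command_success/ && ($NF + 0) != 1 { print $1 }'
}
```

For example, piping the curl output through metric_samples ipmi_fan prints only the fan speed samples, and piping it through failed_collectors prints nothing when every collector command succeeded.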

Integrate with COS

For a more real-world use case, Hardware Observer can be used alongside COS. The grafana-agent machine charm will scrape the exported metrics from the endpoint and push them to Prometheus in COS.

For details, see the how-to guide on integrating with COS.

Clean up the Environment

When you have finished exploring the setup, destroy the tutorial model to clean up.

juju destroy-model hw-obs-tutorial

Congratulations! You now have a basic understanding of how the Hardware Observer charm works.