Hw Health

juju deploy hw-health
You will need Juju 2.9 to run this command.
Channel           Version  Revision  Published    Runs on
latest/stable     13       13        13 Oct 2021  Ubuntu 20.04, 18.04, 16.04
latest/candidate  13       13        11 Oct 2021  Ubuntu 20.04, 18.04, 16.04
latest/edge       8        8         01 Feb 2021  Ubuntu 20.04, 18.04, 16.04

About

Hardware Monitoring for Nagios

Overview

This charm installs various hardware system monitoring tools and configures Nagios NRPE checks. It will only work for bare-metal installations on specific hardware.

Currently supported hardware is:

  • Any controller supported by the megaraid_sas driver (i.e., any controller handled by the MegaRAID CLI)
  • Supermicro: LSI SAS3008 RAID card with sas3ircu (Broadcom's SAS3IRCU_P16)
  • Huawei: LSI SAS2308 RAID card with sas2ircu (Huawei FusionServer Tools InfoCollect)
  • SSD cards: Intel's PCIe Data Center SSD, Samsung's NVMe controllers for SM961/PM961 and 172Xa/172Xb.

Hardware-independent tools:

  • Linux software RAID (mdadm)
  • IPMI as implemented by freeipmi (the enable_ipmi config option is enabled by default)
  • ipmiseld from the freeipmi suite, for logging System Event Log (SEL) entries to syslog (the enable_ipmiseld config option is enabled by default)

Hewlett-Packard support is in the backlog: the hp-health logic still needs to be backported to support HP Gen8 and older equipment (HP controllers with hpacucli).

Other hardware on the roadmap:

  • Huawei's ES3000 V2 PCIe SSD Card with hio_info (Huawei ES3000 V2 Driver)
  • S.M.A.R.T. Monitoring tool (smartctl)

Usage

juju deploy ubuntu
juju deploy hw-health
juju deploy nrpe
juju add-relation ubuntu nrpe
juju add-relation ubuntu hw-health
juju add-relation hw-health nrpe

The Charm Store version already ships a resource. However, this resource is empty to avoid violating the vendors' software redistribution licenses. For the charm to be useful, you must attach a new resource that includes your hardware manufacturer's RAID tools:

  • Option 1: juju deploy hw-health --resource tools=/tmp/zipfile.zip
  • Option 2: juju attach-resource hw-health tools=/tmp/zipfile.zip

In both cases, zipfile.zip must be created with one of the following commands (or similar):

zip /tmp/zipfile.zip megacli sas2ircu sas3ircu
zip /tmp/zipfile.zip megacli
etc.
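If you prefer to script the packaging, the zip commands above can be approximated with Python's standard zipfile module. This is a minimal sketch, not part of the charm: the function name build_tools_zip is hypothetical, and it assumes the vendor binaries are already extracted and named as the charm expects (megacli, sas2ircu, sas3ircu). Like the zip command, it stores each file at the first level of the archive and keeps its Unix permission bits.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def build_tools_zip(binaries: Iterable[str], dest: str) -> None:
    """Pack vendor binaries into a flat ZIP for the "tools" resource.

    Hypothetical helper: each binary is stored under its base name only, so
    "vendor/bin/megacli" ends up as "megacli" at the top of the archive.
    """
    seen = set()
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in binaries:
            p = Path(path)
            # Two inputs with the same base name would collide at the top level.
            if p.name in seen:
                raise ValueError(f"duplicate tool name: {p.name}")
            seen.add(p.name)
            # ZipFile.write records the file's st_mode, so executable bits survive.
            zf.write(p, arcname=p.name)
```

For example, build_tools_zip(["megacli", "sas2ircu", "sas3ircu"], "/tmp/zipfile.zip") mirrors the first zip command above.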

IPMI SEL

SEL entries can be filtered by date, so that SEL content can still be monitored without having to clear the SEL.

To filter out all current SEL entries, use the ack-sel action. This excludes all SEL entries older than today from the IPMI check.

The ack-sel action optionally takes a date parameter. SEL entries older than that date will be ignored by the check.

The show-sel action also obeys the date filter.

juju run-action hw-health/8 ack-sel --wait
# or
juju run-action hw-health/8 ack-sel date=2019-08-24 --wait
# view
juju run-action hw-health/8 show-sel --wait

To clear the filter (i.e., to consider all SEL entries present), use the unack-sel action.
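The filtering rules described above can be modelled in a few lines. This is a hypothetical sketch of the semantics only, not the charm's code: SEL entries are represented as (date, description) tuples, ack_date=None stands for the state after unack-sel, and the function names ack_sel and visible_sel_entries are invented for illustration.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple

# Hypothetical representation of one SEL entry: (date logged, description).
SelEntry = Tuple[date, str]


def ack_sel(requested: Optional[date] = None) -> date:
    """Date recorded by ack-sel: the given date parameter, or today if omitted."""
    return requested if requested is not None else date.today()


def visible_sel_entries(entries: Iterable[SelEntry], ack_date: Optional[date]) -> List[SelEntry]:
    """Entries the IPMI check (and show-sel) would still consider.

    ack_date=None models the state after unack-sel: nothing is filtered out.
    Otherwise, entries strictly older than ack_date are ignored.
    """
    if ack_date is None:
        return list(entries)
    return [entry for entry in entries if entry[0] >= ack_date]
```

With ack_sel(date(2019, 8, 24)), an entry logged on 2019-08-23 is hidden while entries from 2019-08-24 onwards are still reported.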

Under the hood, the filtering is done by appending a --seloptions --date-range=... parameter to the check_ipmi_sensor NRPE plugin. The charm handles the case where a --seloptions parameter is already present in the ipmi_check_options config, but the SEL filter set by the ack-sel action takes precedence over any date filter set manually through ipmi_check_options. For example:

$ date
Wed Mar 10 15:36:36 UTC 2021
$ juju config hw-health ipmi_check_options='--seloptions --date-range=07/02/2019-now'
$ juju run-action hw-health/0 ack-sel

This causes SEL entries older than 10 March 2021 (the day ack-sel was run) to be ignored, rather than applying the date range set in ipmi_check_options.
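The precedence rule can be sketched as a function that rewrites the ipmi_check_options string. This is a hypothetical illustration, not the charm's implementation: the function name apply_sel_ack is invented, and it assumes the MM/DD/YYYY-now date format used in the example above. Any other ipmi-sel options passed through --seloptions are kept; only a manually set --date-range is replaced by the acknowledged date.

```python
import shlex
from datetime import date
from typing import List, Optional


def apply_sel_ack(check_options: str, ack_date: Optional[date]) -> str:
    """Return check options in which the ack-sel date filter wins (hypothetical)."""
    if ack_date is None:
        # unack-sel state: leave the operator's options untouched.
        return check_options
    # Assumed format, copied from the documented example value.
    date_range = f"--date-range={ack_date.strftime('%m/%d/%Y')}-now"
    tokens = shlex.split(check_options)
    out: List[str] = []
    found = False
    i = 0
    while i < len(tokens):
        if tokens[i] == "--seloptions":
            value = tokens[i + 1] if i + 1 < len(tokens) else ""
            # Keep other ipmi-sel options, drop any manually set --date-range.
            sel_opts = [o for o in shlex.split(value) if not o.startswith("--date-range")]
            sel_opts.append(date_range)
            out += ["--seloptions", shlex.join(sel_opts)]
            found = True
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    if not found:
        # No --seloptions configured: append the filter as a new parameter.
        out += ["--seloptions", date_range]
    return shlex.join(out)
```

With the values from the example, apply_sel_ack("--seloptions --date-range=07/02/2019-now", date(2021, 3, 10)) returns "--seloptions --date-range=03/10/2021-now".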

Known Limitations and Issues

The only way the charm can install the hardware tools is via Juju resources. There are plans to support snaps, but the Snap Store only supports strictly confined snaps, and hardware monitoring tools need special permissions that are still under development.

See https://forum.snapcraft.io/t/request-for-classic-confinement-sas2ircu/9023

The "tools" resource must be attached in ZIP format, and the hardware monitoring tool(s) must sit at the first level of the archive tree (not inside a subdirectory).
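A quick check of the archive layout before attaching it can catch this mistake early. The sketch below is hypothetical (the charm's own validation is not documented here): the function name check_tools_zip and its expected parameter are invented, and it uses Python's zipfile module to report entries stored inside subdirectories and any expected tool missing from the first level.

```python
import zipfile
from typing import Iterable, List


def check_tools_zip(path: str, expected: Iterable[str]) -> List[str]:
    """Return a list of layout problems with a "tools" ZIP (empty if none found)."""
    try:
        zf = zipfile.ZipFile(path)
    except zipfile.BadZipFile:
        return [f"{path} is not a ZIP archive"]
    with zf:
        names = zf.namelist()
    problems = []
    for name in names:
        # ZIP member names always use "/" as the separator.
        if "/" in name:
            problems.append(f"{name}: nested path, tools must be at the first level")
    top_level = {name for name in names if "/" not in name}
    for tool in expected:
        if tool not in top_level:
            problems.append(f"{tool}: missing from the first level of the archive")
    return problems
```

For example, an archive created from a tools/ directory (tools/megacli) yields two problems: the nested entry and the missing top-level megacli.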

Building the tools.zip resource

To build the tools.zip resource, first download the binaries from the respective vendor support pages.

For example: megacli, sas3ircu, sas2ircu.

You will then have to extract, rename, and compress the binaries to obtain the following structure:

$ zipinfo tools.zip
Archive:  tools.zip
Zip file size: 1204457 bytes, number of entries: 3
-rwxr-xr-x  3.0 unx  2720320 bx defN 19-Jan-16 11:31 megacli
-rwxrwxr-x  3.0 unx   559164 bx defN 19-Jan-16 11:31 sas2ircu
-rwxrwxr-x  3.0 unx   562560 bx defN 19-Jan-16 11:31 sas3ircu
3 files, 3842044 bytes uncompressed, 1204005 bytes compressed:  68.7%

Two more zip resources may be needed for functional tests to succeed:

  • tools-checksum.zip replaces the megacli tool with an empty file.
  • tools-missing.zip removes the megacli tool from the resource.

$ zipinfo tools-checksum.zip
Archive:  tools-checksum.zip
Zip file size: 547860 bytes, number of entries: 3
-rwxr-xr-x  3.0 unx        0 bx stor 19-Jan-16 11:35 megacli
-rwxr-xr-x  3.0 unx   559164 bx defN 19-Jan-16 11:31 sas2ircu
-rwxr-xr-x  3.0 unx   660560 bx defN 19-Jan-16 11:31 sas3ircu
3 files, 1219724 bytes uncompressed, 547408 bytes compressed:  55.1%

$ zipinfo tools-missing.zip
Archive:  tools-missing.zip
Zip file size: 547718 bytes, number of entries: 2
-rwxr-xr-x  3.0 unx   559164 bx defN 19-Jan-16 11:31 sas2ircu
-rwxr-xr-x  3.0 unx   660560 bx defN 19-Jan-16 11:31 sas3ircu
2 files, 1219724 bytes uncompressed, 547408 bytes compressed:  55.1%
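Both test resources can be derived from a working tools.zip instead of being assembled by hand. This is a hypothetical helper (derive_test_zips is not part of the charm) using Python's zipfile module; it assumes, as in the listings above, that megacli is the tool to blank out or remove, and it copies every other entry with its name and Unix permissions unchanged.

```python
import zipfile


def _clone(info: zipfile.ZipInfo) -> zipfile.ZipInfo:
    """Fresh ZipInfo with the same name, timestamp and permission bits."""
    new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
    new.external_attr = info.external_attr  # keeps the Unix mode
    new.compress_type = zipfile.ZIP_DEFLATED
    return new


def derive_test_zips(src: str, checksum_dest: str, missing_dest: str, tool: str = "megacli") -> None:
    """Write tools-checksum.zip and tools-missing.zip variants of src."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(checksum_dest, "w") as zchk, \
            zipfile.ZipFile(missing_dest, "w") as zmiss:
        for info in zin.infolist():
            if info.filename == tool:
                # Checksum variant: same name and mode, but empty content,
                # so it cannot match any known checksum.
                zchk.writestr(_clone(info), b"")
                # Missing variant: the tool is simply left out.
                continue
            data = zin.read(info)
            zchk.writestr(_clone(info), data)
            zmiss.writestr(_clone(info), data)
```

A separate ZipInfo is created for each output archive because zipfile records per-archive offsets on the ZipInfo object it is given.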

Note: vendor tools may be updated over time. The charm verifies that the supplied binaries match a set of known checksums. If the checksum of a valid binary is missing, please file a bug (see Contact Information below) and it will be added.
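The checksum verification can be pictured as looking up each binary's digest in a per-tool allow list. This is a sketch under stated assumptions: SHA-256 as the hash, the KNOWN_CHECKSUMS table with its placeholder value, and the function names sha256_of and is_known_binary are all hypothetical, since the charm's actual list and algorithm are not given here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set

# Hypothetical allow list: tool name -> set of accepted hex digests.
# The value below is a placeholder, not a real megacli checksum.
KNOWN_CHECKSUMS: Dict[str, Set[str]] = {
    "megacli": {"<sha256 of a known megacli build>"},
}


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_known_binary(path: Path, known: Dict[str, Set[str]] = KNOWN_CHECKSUMS) -> bool:
    """True if the file's digest is listed for a tool with the same name."""
    return sha256_of(path) in known.get(path.name, set())
```

An empty megacli, as in tools-checksum.zip, fails this kind of check because its digest cannot be in the allow list. sha256_of can also compute the digest to quote when reporting a missing checksum.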

Configuration

The manufacturer config option must be left in auto mode.

Contact Information

Please contact the Nagios charmers via the "Submit a bug" link.

Upstream Project Name