Hardware Observer
- Canonical BootStack Charmers
juju deploy hardware-observer
Platform: 24.04, 22.04, 20.04, 18.04
Metrics
Hardware Observer exposes the following S.M.A.R.T. metrics through smartctl_exporter:
Metric Name | Description | Labels |
---|---|---|
smartctl_device | Device info | ata_additional_product_id, device, ata_version, firmware_version, form_factor, interface, model_family, model_name, protocol, sata_version, scsi_vendor, scsi_product, serial_number, scsi_revision |
smartctl_devices | Number of devices configured or dynamically discovered | |
smartctl_device_attribute | Device attributes | attribute_flags_long, attribute_flags_short, attribute_id, attribute_name, attribute_value_type, device |
smartctl_device_available_spare | Normalized percentage (0 to 100%) of the remaining spare capacity available | device |
smartctl_device_available_spare_threshold | When the Available Spare falls below the threshold indicated in this field, an asynchronous event completion may occur. The value is indicated as a normalized percentage (0 to 100%) | device |
smartctl_device_block_size | Device block size | blocks_type, device |
smartctl_device_bytes_read | Device bytes read | device |
smartctl_device_bytes_written | Device bytes written | device |
smartctl_device_capacity_blocks | Device capacity in blocks | device |
smartctl_device_capacity_bytes | Device capacity in bytes | device |
smartctl_device_nvme_capacity_bytes | NVMe device total capacity bytes | device |
smartctl_device_critical_warning | This field indicates critical warnings for the state of the controller | device |
smartctl_device_interface_speed | Device interface speed, bits per second | device, speed_type |
smartctl_device_media_errors | Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field | device |
smartctl_device_num_err_log_entries | Contains the number of Error Information log entries over the life of the controller | device |
smartctl_device_error_log_count | Device S.M.A.R.T. error log count | device, error_log_type |
smartctl_device_percentage_used | Contains a vendor specific estimate of the percentage of NVM subsystem life used. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state). | device |
smartctl_device_power_cycle_count | Device power cycle count | device |
smartctl_device_power_on_seconds | Device power on seconds | device |
smartctl_device_rotation_rate | Device rotation rate | device |
smartctl_device_smart_status | General S.M.A.R.T. status | device |
smartctl_device_smartctl_exit_status | Exit status of smartctl on device | device |
smartctl_device_statistics | Device statistics | device, statistic_table, statistic_name, statistic_flags_short, statistic_flags_long |
smartctl_device_temperature | Device temperature in degrees Celsius | device, temperature_type |
smartctl_version | smartctl version | build_info, json_format_version, smartctl_version, svn_revision |
smartctl_device_self_test_log_count | Device S.M.A.R.T. self test log count | device, self_test_log_type |
smartctl_device_self_test_log_error_count | Device S.M.A.R.T. self test log error count | device, self_test_log_type |
smartctl_device_erc_seconds | Device S.M.A.R.T. Error Recovery Control Seconds | device, op_type |
smartctl_scsi_grown_defect_list | Device SCSI grown defect list counter | device |
smartctl_read_errors_corrected_by_rereads_rewrites | Read Errors Corrected by ReReads/ReWrites | device |
smartctl_read_errors_corrected_by_eccfast | Read Errors Corrected by ECC Fast | device |
smartctl_write_errors_corrected_by_eccdelayed | Write Errors Corrected by ECC Delayed | device |
smartctl_write_total_uncorrected_errors | Write Total Uncorrected Errors | device |
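
The metrics in the table above are served in the Prometheus text exposition format, so they can be inspected with a few lines of code. The following is a minimal sketch, not part of Hardware Observer: the sample text, the device names, the helper names `parse_metrics` and `worn_devices`, and the 90% threshold are all assumptions made for illustration. The parser is deliberately simplified and ignores optional timestamps.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; device names and values are made up.
SAMPLE = """\
# lines starting with '#' are comments or metadata and are skipped
smartctl_device_percentage_used{device="nvme0"} 12
smartctl_device_percentage_used{device="nvme1"} 97
smartctl_device_temperature{device="nvme0",temperature_type="current"} 41
smartctl_devices 2
"""

# metric_name, optional {labels}, then a numeric value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {metric_name: [(labels_dict, value), ...]} for each sample line."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples[match.group("name")].append((labels, float(match.group("value"))))
    return samples


def worn_devices(samples, threshold=90.0):
    """List devices whose smartctl_device_percentage_used is at or above threshold.

    The threshold is an illustrative assumption, not a value used by the charm.
    """
    return sorted(
        labels.get("device", "")
        for labels, value in samples.get("smartctl_device_percentage_used", [])
        if value >= threshold
    )


if __name__ == "__main__":
    print(worn_devices(parse_metrics(SAMPLE)))
```

Because `smartctl_device_percentage_used` is allowed to exceed 100, the comparison uses `>=` against a threshold instead of assuming the value is capped.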
Alerts
Hardware Observer provides the following alerts for S.M.A.R.T.:
Alert Rule Name | Description | Severity |
---|---|---|
SmartctlCriticalWarning | Critical warnings present for the state of the controller | Critical |
SmartctlDeviceSmartStatusFail | S.M.A.R.T. status for device is 0 | Critical |
SmartctlExitStatusFail | Non-zero exit status for smartctl command | Warning |
SmartclDeviceAttributeFailureWarning | S.M.A.R.T. attributes correlating strongly with failure have been detected | Warning |
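
To make the first three rule descriptions concrete, the sketch below restates them as plain comparisons. It is an illustration only: the actual alerts are Prometheus rules shipped with the charm, and their exact expressions and evaluation durations are not shown here. The `DeviceSample` type, the `firing_alerts` helper, and the reading of "critical warnings present" as a value greater than zero are assumptions. The attribute-based rule is omitted because its condition is not specified in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceSample:
    """Hypothetical container for one device's current metric values."""
    device: str
    critical_warning: float  # smartctl_device_critical_warning
    smart_status: float      # smartctl_device_smart_status
    exit_status: float       # smartctl_device_smartctl_exit_status


def firing_alerts(sample):
    """Return (alert_name, severity) pairs whose described condition holds."""
    alerts = []
    # Assumption: any non-zero value means a critical warning is present.
    if sample.critical_warning > 0:
        alerts.append(("SmartctlCriticalWarning", "Critical"))
    # The table describes this alert as "status for device is 0".
    if sample.smart_status == 0:
        alerts.append(("SmartctlDeviceSmartStatusFail", "Critical"))
    # The table describes this alert as a non-zero smartctl exit status.
    if sample.exit_status != 0:
        alerts.append(("SmartctlExitStatusFail", "Warning"))
    return alerts


if __name__ == "__main__":
    for sample in (
        DeviceSample("nvme0", critical_warning=0, smart_status=1, exit_status=0),
        DeviceSample("sda", critical_warning=0, smart_status=0, exit_status=4),
    ):
        print(sample.device, firing_alerts(sample))
```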