Prometheus
- By Canonical Observability
Channel | Revision | Published
---|---|---
latest/stable | 103 | 31 Jan 2023
latest/candidate | 103 | 31 Jan 2023
latest/beta | 103 | 31 Jan 2023
latest/edge | 117 | Yesterday
1.0/stable | 103 | 31 Jan 2023
1.0/candidate | 103 | 31 Jan 2023
1.0/beta | 103 | 31 Jan 2023
1.0/edge | 103 | 31 Jan 2023
juju deploy prometheus-k8s
You will need Juju 2.9 to run this command.
charms.prometheus_k8s.v0.prometheus_scrape
- Last updated: Yesterday
- Library version: 0
Prometheus Scrape Library.
Overview
This document explains how to integrate with the Prometheus charm for the purpose of providing a metrics endpoint to Prometheus. It also explains how alternative implementations of the Prometheus charm may maintain the same interface and be backward compatible with all currently integrated charms. Finally, this document is the authoritative reference on the structure of relation data that is shared between Prometheus charms and any other charm that intends to provide a scrape target for Prometheus.
Source code
Source code can be found on GitHub at: https://github.com/canonical/prometheus-k8s-operator/tree/main/lib/charms/prometheus_k8s
Dependencies
Using this library requires you to fetch the juju_topology library from observability-libs.
charmcraft fetch-lib charms.observability_libs.v0.juju_topology
Provider Library Usage
This Prometheus charm interacts with its scrape targets using its charm library. Charms seeking to expose metrics endpoints for the Prometheus charm must do so using the `MetricsEndpointProvider` object from this charm library. For the simplest use cases, using the `MetricsEndpointProvider` object only requires instantiating it, typically in the constructor of your charm (the one which exposes a metrics endpoint). The `MetricsEndpointProvider` constructor requires the name of the relation over which a scrape target (metrics endpoint) is exposed to the Prometheus charm. This relation must use the `prometheus_scrape` interface. By default, the address of the metrics endpoint is set to the unit IP address by each unit of the `MetricsEndpointProvider` charm. These units set their address in response to the `PebbleReady` event of each container in the unit, since container restarts of Kubernetes charms can result in a change of IP address. The default name for the metrics endpoint relation is `metrics-endpoint`. It is strongly recommended to use the same relation name for consistency across charms; doing so also obviates the need for an additional constructor argument. The `MetricsEndpointProvider` object may be instantiated as follows:
```python
from charms.prometheus_k8s.v0.prometheus_scrape import MetricsEndpointProvider

def __init__(self, *args):
    super().__init__(*args)
    ...
    self.metrics_endpoint = MetricsEndpointProvider(self)
    ...
```
Note that the first argument (`self`) to `MetricsEndpointProvider` is always a reference to the parent (scrape target) charm.
An instantiated `MetricsEndpointProvider` object will ensure that each unit of its parent charm is a scrape target for the `MetricsEndpointConsumer` (Prometheus) charm. By default, `MetricsEndpointProvider` assumes each unit of the metrics provider charm exports its metrics at the path `/metrics` on port 80. These defaults may be changed by providing the `MetricsEndpointProvider` constructor an optional argument (`jobs`) that represents a Prometheus scrape job specification using standard Python data structures. This job specification is a subset of Prometheus' own [scrape configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) format but represented using Python data structures. More than one job may be provided using the `jobs` argument. Hence `jobs` accepts a list of dictionaries where each dictionary represents one `<scrape_config>` object as described in the Prometheus documentation. The currently supported configuration subset is: `job_name`, `metrics_path`, `static_configs`.
Suppose it is required to change the port on which scraped metrics are exposed to 8000. This may be done by providing the following data structure as the value of `jobs`:
```python
[
    {
        "static_configs": [
            {
                "targets": ["*:8000"]
            }
        ]
    }
]
```
The wildcard ("*") host specification implies that the scrape targets will automatically be set to the host addresses advertised by each unit of the consumer charm.
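For instance, the structure above may be passed to the `MetricsEndpointProvider` constructor via the `jobs` argument (a minimal sketch, using the default `metrics-endpoint` relation name):
```python
self.metrics_endpoint = MetricsEndpointProvider(
    self,
    jobs=[
        {
            # scrape each unit of this charm on port 8000
            "static_configs": [{"targets": ["*:8000"]}],
        }
    ],
)
```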
It is also possible to change the metrics path and scrape multiple ports, for example
```python
[
    {
        "metrics_path": "/my-metrics-path",
        "static_configs": [
            {
                "targets": ["*:8000", "*:8081"],
            }
        ]
    }
]
```
More complex scrape configurations are possible. For example
```python
[
    {
        "static_configs": [
            {
                "targets": ["10.1.32.215:7000", "*:8000"],
                "labels": {
                    "some_key": "some-value"
                }
            }
        ]
    }
]
```
This example scrapes the target "10.1.32.215" at port 7000 in addition to scraping each unit at port 8000. There is, however, one difference between wildcard targets (specified using "*") and fully qualified targets (such as "10.1.32.215"). The Prometheus charm automatically associates labels with metrics generated by each target. These labels localise the source of metrics within the Juju topology by specifying its "model name", "model UUID", "application name" and "unit name". However, the unit name label is associated only with wildcard targets, not with fully qualified targets.
Multiple jobs with different metrics paths and labels are allowed, but each job must be given a unique name:
```python
[
    {
        "job_name": "my-first-job",
        "metrics_path": "one-path",
        "static_configs": [
            {
                "targets": ["*:7000"],
                "labels": {
                    "some_key": "some-value"
                }
            }
        ]
    },
    {
        "job_name": "my-second-job",
        "metrics_path": "another-path",
        "static_configs": [
            {
                "targets": ["*:8000"],
                "labels": {
                    "some_other_key": "some-other-value"
                }
            }
        ]
    }
]
```
Important: `job_name` should be a fixed string (e.g. a hardcoded literal). If you include variable elements, like your `unit.name`, it may break the continuity of the metrics time series gathered by Prometheus when the leader unit changes (e.g. on upgrade or rescale).
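For example (an illustrative sketch; the job name itself is arbitrary):
```python
# Stable: a hardcoded literal survives leader changes, upgrades and rescaling.
jobs = [{"job_name": "my-app-job", "static_configs": [{"targets": ["*:8000"]}]}]

# Fragile: embedding the unit name breaks the continuity of the time series
# when the leader unit changes.
jobs = [{"job_name": f"job-{self.unit.name}", "static_configs": [{"targets": ["*:8000"]}]}]
```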
Additionally, it is also technically possible, but strongly discouraged, to configure the following scrape-related settings, which behave as described by the Prometheus documentation:
- `static_configs`
- `scrape_interval`
- `scrape_timeout`
- `proxy_url`
- `relabel_configs`
- `metrics_relabel_configs`
- `sample_limit`
- `label_limit`
- `label_name_length_limit`
- `label_value_length_limit`
The settings above are supported by the `prometheus_scrape` library only for the sake of specialized facilities, like the Prometheus Scrape Config charm. Virtually no charms should use these settings, and charmers definitely should not expose them to the Juju administrator via configuration options.
Consumer Library Usage
The `MetricsEndpointConsumer` object may be used by Prometheus charms to manage relations with their scrape targets. For this purpose a Prometheus charm needs to do two things:
- Instantiate the `MetricsEndpointConsumer` object by providing it a reference to the parent (Prometheus) charm and, optionally, the name of the relation that the Prometheus charm uses to interact with scrape targets. This relation must conform to the `prometheus_scrape` interface and it is strongly recommended that this relation be named `metrics-endpoint`, which is its default value.
For example, a Prometheus charm may instantiate the `MetricsEndpointConsumer` in its constructor as follows:
```python
from charms.prometheus_k8s.v0.prometheus_scrape import MetricsEndpointConsumer

def __init__(self, *args):
    super().__init__(*args)
    ...
    self.metrics_consumer = MetricsEndpointConsumer(self)
    ...
```
- A Prometheus charm also needs to respond to the `TargetsChangedEvent` event of the `MetricsEndpointConsumer` by adding itself as an observer for these events, as in:
```python
self.framework.observe(
    self.metrics_consumer.on.targets_changed,
    self._on_scrape_targets_changed,
)
```
In responding to the `TargetsChangedEvent` event the Prometheus charm must update the Prometheus configuration so that any new scrape targets are added and/or old ones removed from the list of scraped endpoints. For this purpose the `MetricsEndpointConsumer` object exposes a `jobs()` method that returns a list of scrape jobs. Each element of this list is the Prometheus scrape configuration for that job. In order to update the Prometheus configuration, the Prometheus charm needs to replace the current list of jobs with the list provided by `jobs()`, as follows:
```python
def _on_scrape_targets_changed(self, event):
    ...
    # Replace the current scrape configuration with the jobs currently
    # advertised over all metrics-endpoint relations.
    scrape_jobs = self.metrics_consumer.jobs()
    for job in scrape_jobs:
        prometheus_scrape_config.append(job)
    ...
```
Alerting Rules
This charm library also supports gathering alerting rules from all related `MetricsEndpointProvider` charms and enabling corresponding alerts within the Prometheus charm. Alert rules are automatically gathered by `MetricsEndpointProvider` charms when using this library, from a directory conventionally named `prometheus_alert_rules`. This directory must reside at the top level in the `src` folder of the metrics provider charm. Each file in this directory is assumed to be in one of two formats:
- the official Prometheus alert rule format, conforming to the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
- a single rule format, which is a simplified subset of the official format, comprising a single alert rule per file, using the same YAML fields.
The file name must have one of the following extensions: `.rule`, `.rules`, `.yml`, or `.yaml`.
An example of the contents of such a file in the custom single rule format is shown below.
```yaml
alert: HighRequestLatency
expr: job:request_latency_seconds:mean5m{my_key="my_value"} > 0.5
for: 10m
labels:
  severity: Medium
  type: HighLatency
annotations:
  summary: High request latency for {{ $labels.instance }}.
```
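For comparison, the same rule in the official Prometheus format is wrapped in a named rule group (a sketch following the upstream rules-file layout; the group name is arbitrary):
```yaml
groups:
  - name: my-alert-group
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{my_key="my_value"} > 0.5
        for: 10m
        labels:
          severity: Medium
          type: HighLatency
        annotations:
          summary: High request latency for {{ $labels.instance }}.
```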
The `MetricsEndpointProvider` will read all available alert rules and also inject "filtering labels" into the alert expressions. The filtering labels ensure that alert rules are localised to the metrics provider charm's Juju topology (application, model and its UUID). Such a topology filter is essential to ensure that alert rules submitted by one provider charm generate alerts only for that same charm. When alert rules are embedded in a charm, and the charm is deployed as a Juju application, the alert rules from that application have their expressions automatically updated to filter for metrics coming from the units of that application alone. This removes the risk of spurious evaluation, e.g., when you have multiple deployments of the same charm monitored by the same Prometheus.
Not all alerts one may want to specify can be embedded in a charm. Some alert rules will be specific to a user's use case. This is the case, for example, for alert rules that are based on business constraints, like expecting a certain number of requests to a specific API every five minutes. Such alert rules can be specified via the COS Config Charm, which allows importing alert rules and other settings, like dashboards, from a Git repository.
Gathering alert rules and generating rule files within the Prometheus charm is easily done using the `alerts()` method of `MetricsEndpointConsumer`, as sketched after the list below. Alerts generated by Prometheus will automatically include Juju topology labels in the alerts. These labels indicate the source of the alert. The following labels are automatically included with each alert:
- `juju_model`
- `juju_model_uuid`
- `juju_application`
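For instance, the Prometheus charm could regenerate its rule files along these lines (a minimal sketch; the helper name, file path and the exact shape of the returned mapping are illustrative assumptions, not the charm's actual implementation):
```python
import yaml

def _update_alert_rules(self):  # hypothetical helper in the Prometheus charm
    # alerts() returns the gathered alert rule groups, keyed per related source.
    for identifier, rule_groups in self.metrics_consumer.alerts().items():
        rules_file = f"/etc/prometheus/rules/juju_{identifier}.rules"  # illustrative path
        with open(rules_file, "w") as f:
            yaml.safe_dump(rule_groups, f)
```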
Relation Data
The Prometheus charm uses both application and unit relation data to obtain information regarding its scrape jobs, alert rules and scrape targets. This relation data is in JSON format and it closely resembles the YAML structure of the Prometheus [scrape configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
Units of metrics provider charms advertise their names and addresses over unit relation data using the `prometheus_scrape_unit_name` and `prometheus_scrape_unit_address` keys, while the `scrape_metadata`, `scrape_jobs` and `alert_rules` keys in the application relation data of metrics provider charms hold eponymous information.
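As an informal illustration only (the actual payloads are JSON-encoded strings and may carry additional fields; names and addresses below are made up), the relation data might look roughly like this:
```yaml
# Unit relation data (one bag per metrics provider unit)
prometheus_scrape_unit_name: my-app/0
prometheus_scrape_unit_address: 10.1.32.10

# Application relation data
scrape_metadata: '{"model": "my-model", "model_uuid": "...", "application": "my-app"}'
scrape_jobs: '[{"metrics_path": "/metrics", "static_configs": [{"targets": ["*:8000"]}]}]'
alert_rules: '{"groups": [...]}'
```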
Index
class PrometheusConfig
Description
A namespace for utility functions for manipulating the prometheus config dict.
Methods
PrometheusConfig.sanitize_scrape_config(job: dict)
Restrict permissible scrape configuration options.
Arguments
Returns
Description
PrometheusConfig.sanitize_scrape_configs(scrape_configs)
Description
PrometheusConfig.prefix_job_names(scrape_configs, prefix: str)
Description
PrometheusConfig.expand_wildcard_targets_into_individual_jobs(scrape_jobs, hosts, topology)
Extract wildcard hosts from the given scrape_configs list into separate jobs.
Arguments
PrometheusConfig.render_alertmanager_static_configs(alertmanagers)
Render the alertmanager static_configs section from a list of URLs.
Arguments
Returns
Description
class RelationNotFoundError
Description
Raised if no relation with the given name is found.
Methods
RelationNotFoundError.__init__(self, relation_name: str)
class RelationInterfaceMismatchError
Description
Raised if the relation with the given name has a different interface.
Methods
RelationInterfaceMismatchError.__init__(self, relation_name: str, expected_relation_interface: str, actual_relation_interface: str)
class RelationRoleMismatchError
Description
Raised if the relation with the given name has a different role.
Methods
RelationRoleMismatchError.__init__(self, relation_name: str, expected_relation_role: RelationRole, actual_relation_role: RelationRole)
class InvalidAlertRuleEvent
Event emitted when alert rule files are not parsable.
Description
Enables us to set a clear status on the provider.
Methods
InvalidAlertRuleEvent.__init__(self, handle, errors: str, valid: bool)
InvalidAlertRuleEvent.snapshot(self)
Description
InvalidAlertRuleEvent.restore(self, snapshot)
Description
class InvalidScrapeJobEvent
Description
Event emitted when scrape jobs are not valid.
Methods
InvalidScrapeJobEvent.__init__(self, handle, errors: str)
InvalidScrapeJobEvent.snapshot(self)
Description
InvalidScrapeJobEvent.restore(self, snapshot)
Description
class MetricsEndpointProviderEvents
Description
Events raised by :class:`InvalidAlertRuleEvent`s.
class InvalidAlertRulePathError
Description
Raised if the alert rules folder cannot be found or is otherwise invalid.
Methods
InvalidAlertRulePathError.__init__(self, alert_rules_absolute_path: Path, message: str)
class AlertRules
Utility class for amalgamating prometheus alert rule files and injecting juju topology.
Description
An `AlertRules` object supports aggregating alert rules from files and directories in both official and single rule file formats using the `add_path()` method. All the alert rules read are annotated with Juju topology labels and amalgamated into a single data structure in the form of a Python dictionary using the `as_dict()` method. Such a dictionary can be easily dumped into JSON format and exchanged over relation data. The dictionary can also be dumped into YAML format and written directly into an alert rules file that is read by Prometheus. Note that multiple `AlertRules` objects must not be written into the same file, since Prometheus allows only a single list of alert rule groups per alert rules file.
The official Prometheus format is a YAML file conforming to the Prometheus documentation (https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/). The custom single rule format is a subsection of the official YAML, having a single alert rule, effectively "one alert per file".
Methods
AlertRules.__init__(self, topology)
Build an alert rules object.
Arguments
AlertRules.add_path(self, path: str)
Add rules from a dir path.
Arguments
Returns
Description
AlertRules.as_dict(self)
Return standard alert rules file in dict representation.
Returns
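A usage sketch for `AlertRules` (assuming the `juju_topology` dependency fetched above exposes a `JujuTopology.from_charm` helper; the rules directory path is illustrative):
```python
from charms.observability_libs.v0.juju_topology import JujuTopology

topology = JujuTopology.from_charm(self)
alert_rules = AlertRules(topology=topology)
alert_rules.add_path("./src/prometheus_alert_rules")
rules = alert_rules.as_dict()  # ready to be JSON-dumped into relation data
```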
class TargetsChangedEvent
Description
Event emitted when Prometheus scrape targets change.
Methods
TargetsChangedEvent.__init__(self, handle, relation_id)
TargetsChangedEvent.snapshot(self)
Description
TargetsChangedEvent.restore(self, snapshot)
Description
class MonitoringEvents
Description
Event descriptor for events raised by `MetricsEndpointConsumer`.
class MetricsEndpointConsumer
Description
A Prometheus based Monitoring service.
Methods
MetricsEndpointConsumer.__init__(self, charm: CharmBase, relation_name: str)
A Prometheus based Monitoring service.
Arguments
MetricsEndpointConsumer.jobs(self)
Fetch the list of scrape jobs.
Returns
MetricsEndpointConsumer.alerts(self)
Fetch alerts for all relations.
Returns
Description
class MetricsEndpointProvider
Description
A metrics endpoint for Prometheus.
Methods
MetricsEndpointProvider.__init__(self, charm, relation_name: str, jobs, alert_rules_path: str, refresh_event, external_url: str, lookaside_jobs_callable)
Construct a metrics provider for a Prometheus charm.
Arguments
Description
MetricsEndpointProvider.update_scrape_job_spec(self, jobs)
Description
MetricsEndpointProvider.set_scrape_job_spec(self, _)
Ensure scrape target information is made available to Prometheus.
Description
class PrometheusRulesProvider
Forward rules to Prometheus.
Arguments
Description
This object may be used to forward rules to Prometheus. At present it only supports forwarding alert rules. This is unlike :class:`MetricsEndpointProvider`, which is used for forwarding both scrape targets and associated alert rules. This object is typically used when there is a desire to forward rules that apply globally (across all deployed charms and units) rather than to a single charm. All rule files are forwarded using the same 'prometheus_scrape' interface that is also used by `MetricsEndpointProvider`.
Methods
PrometheusRulesProvider.__init__(self, charm: CharmBase, relation_name: str, dir_path: str, recursive)
class MetricsEndpointAggregator
Aggregate metrics from multiple scrape targets.
Description
`MetricsEndpointAggregator` collects scrape target information from one or more related charms and forwards this to a `MetricsEndpointConsumer` charm, which may be in a different Juju model. However, it is essential that `MetricsEndpointAggregator` itself resides in the same model as its scrape targets, as this is currently the only way to ensure in Juju that the `MetricsEndpointAggregator` will be able to determine the model name and uuid of the scrape targets.
`MetricsEndpointAggregator` should be used in place of `MetricsEndpointProvider` in the following two use cases:
1. Integrating one or more scrape targets that do not support the `prometheus_scrape` interface.
2. Integrating one or more scrape targets through cross model relations. Although the [Scrape Config Operator](https://charmhub.io/cos-configuration-k8s) may also be used for the purpose of supporting cross model relations.
Using `MetricsEndpointAggregator` to build a Prometheus charm client only requires instantiating it. Instantiating `MetricsEndpointAggregator` is similar to `MetricsEndpointProvider` except that it requires specifying the names of three relations: the relation with scrape targets, the relation for alert rules, and that with the Prometheus charms. For example
```python
self._aggregator = MetricsEndpointAggregator(
    self,
    {
        "prometheus": "monitoring",
        "scrape_target": "prometheus-target",
        "alert_rules": "prometheus-rules",
    },
)
```
`MetricsEndpointAggregator` assumes that each unit of a scrape target sets in its unit-level relation data two entries with keys "hostname" and "port". If it is required to integrate with charms that do not honor these assumptions, it is always possible to derive from `MetricsEndpointAggregator`, overriding the `_get_targets()` method, which is responsible for aggregating the unit name, host address ("hostname") and port of the scrape target.
`MetricsEndpointAggregator` also assumes that each unit of a scrape target sets in its unit-level relation data a key named "groups". The value of this key is expected to be the string representation of a list of Prometheus alert rules in YAML format. An example of a single such alert rule is
```yaml
- alert: HighRequestLatency
  expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
  for: 10m
  labels:
    severity: page
  annotations:
    summary: High request latency
```
Once again, if it is required to integrate with charms that do not honour these assumptions about alert rules, then an object derived from `MetricsEndpointAggregator` may be used by overriding the `_get_alert_rules()` method.
`MetricsEndpointAggregator` ensures that Prometheus scrape job specifications and alert rules are annotated with Juju topology information, just like `MetricsEndpointProvider` and `MetricsEndpointConsumer` do.
By default, `MetricsEndpointAggregator` ensures that Prometheus "instance" labels refer to Juju topology. This ensures that instance labels are stable over unit recreation. While it is not advisable to change this option, if required it can be done by setting the "relabel_instance" keyword argument to `False` when constructing an aggregator object.
Methods
MetricsEndpointAggregator.__init__(self, charm, relation_names, relabel_instance, resolve_addresses)
Construct a `MetricsEndpointAggregator`.
Arguments
MetricsEndpointAggregator.set_target_job_data(self, targets: dict, app_name: str)
Update scrape jobs in response to scrape target changes.
Arguments
Description
MetricsEndpointAggregator.remove_prometheus_jobs(self, job_name: str, unit_name)
Given a job name and unit name, remove scrape jobs associated.
Description
MetricsEndpointAggregator.set_alert_rule_data(self, name: str, unit_rules: dict, label_rules: bool)
Update alert rule data.
Description
MetricsEndpointAggregator.remove_alert_rules(self, group_name: str, unit_name: str)
Description
MetricsEndpointAggregator.group_name(self, unit_name: str)
Construct name for an alert rule group.
Arguments
Returns
Description
class CosTool
Description
Uses cos-tool to inject label matchers into alert rule expressions and validate rules.
Methods
CosTool.__init__(self, charm)
CosTool.path(self)
Description
CosTool.apply_label_matchers(self, rules)
Description
CosTool.validate_alert_rules(self, rules: dict)
Description
CosTool.validate_scrape_jobs(self, jobs: list)
Description
CosTool.inject_label_matchers(self, expression, topology)
Description