Grafana Agent

When we relate a metrics provider (e.g. some server) to Prometheus, we expect Prometheus to fire an alert if the server stops responding. With Prometheus's PromQL this can be expressed generically with the up and absent expressions:

  • up < 1
  • absent(up)

Instead of having every charm in the ecosystem duplicate the same alert rules, the rules are generated automatically by the prometheus_scrape and prometheus_remote_write charm libraries. This saves charm authors from implementing their own HostHealth rules per charm and reduces implementation errors.
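For instance, a Kubernetes charm that exposes a metrics endpoint only needs to instantiate the scrape library; the generic HostHealth rules ride along with whatever the library sends over the relation. The following is a minimal sketch rather than a real charm: the class name, port and job definition are placeholders.

# Minimal sketch of a metrics provider charm; the job definition and port
# below are illustrative placeholders.
from ops.charm import CharmBase
from ops.main import main
from charms.prometheus_k8s.v0.prometheus_scrape import MetricsEndpointProvider

class SomeCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # The library builds the scrape job and ships the generic HostHealth
        # alert rules over the relation; the charm author does not write an
        # `up < 1` or `absent(up)` rule themselves.
        self.metrics_endpoint = MetricsEndpointProvider(
            self,
            jobs=[{"static_configs": [{"targets": ["*:8080"]}]}],
        )

if __name__ == "__main__":
    main(SomeCharm)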

Avoiding alert fatigue

The alert rules are designed to remain in the Pending state for 5 minutes before transitioning to the Firing state. This avoids false positives in cases such as a new installation or flapping metric behaviour.

“Host down” vs. “metrics missing”

Note that HostHealth has slightly different semantics between remote-write and scrape (the query sketch after this list illustrates the difference):

  • If Prometheus fails to scrape a target, the target is down (up < 1).
  • If Grafana Agent fails to remote-write (regardless of whether its own scrape succeeded), the metrics are missing (absent(up)).
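
To see the difference by hand, the two expressions can be run as instant queries against the Prometheus HTTP API. This is only a sketch: the Prometheus address and the juju_application label value are assumptions for illustration.

# Sketch: telling "host down" apart from "metrics missing" with instant
# queries against the Prometheus HTTP API. Address and labels are placeholders.
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://10.0.0.2:9090"  # placeholder Prometheus address

def instant_query(expr: str) -> list:
    """Run an instant query and return the result vector."""
    url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["result"]

# "Host down": the `up` series exists but the target failed its last scrape.
down = instant_query('up{juju_application="some-charm"} < 1')
# "Metrics missing": no `up` series at all, e.g. remote-write stopped.
missing = instant_query('absent(up{juju_application="some-charm"})')

print("units down:", [r["metric"].get("juju_unit") for r in down])
print("metrics missing:", bool(missing))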

Scrape

With support for centralized (generic) alerts, Prometheus provides a HostDown alert for each charm and each of its units via alert labels.

The alert rule within prometheus_scrape contains (ignoring annotations):

groups:
  - name: HostHealth
    rules:
    - alert: HostDown
      expr: up < 1
      for: 5m
      labels:
        severity: critical
    - alert: HostMetricsMissing
      # This alert is applicable only when the provider is linked via
      # an aggregator (such as grafana agent)
      expr: absent(up)
      for: 5m
      labels:
        severity: critical

Note: We use absent(up) together with for: 5m so that the alert transitions from Pending to Firing. If query portability is desired, absent_over_time(up[5m]) is an alternative, but it fires after 5 minutes without ever entering the Pending state.

Remote write

With support for centralized (generic) alerts, Prometheus provides a HostMetricsMissing alert for Grafana Agent itself and each application that is aggregated by it.

Note: The HostMetricsMissing alert does not show each unit, only the application!

The alert rule within prometheus_remote_write contains (ignoring annotations):

groups:
  - name: AggregatorHostHealth
    rules:
    - alert: HostMetricsMissing
      expr: absent(up)
      for: 5m
      labels:
        severity: critical

Alerting scenarios

Centralized (generic) alerts are supported in the following deployment scenarios:

Note: In these examples, the aggregator is Grafana Agent.

Note: Check Alertmanager for labelled alerts at either the unit level (HostDown) or at the application level (HostMetricsMissing).
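
As a rough illustration of that check, the sketch below lists active alerts through the Alertmanager v2 HTTP API and prints the relevant labels; the Alertmanager address is a placeholder.

# Sketch: listing active HostHealth alerts via the Alertmanager v2 API.
# The address is a placeholder; adjust it to your deployment.
import json
import urllib.request

ALERTMANAGER = "http://10.0.0.3:9093"  # placeholder Alertmanager address

with urllib.request.urlopen(f"{ALERTMANAGER}/api/v2/alerts?active=true") as resp:
    alerts = json.load(resp)

for alert in alerts:
    labels = alert["labels"]
    if labels.get("alertname") in ("HostDown", "HostMetricsMissing"):
        # HostDown carries a juju_unit label; HostMetricsMissing is
        # application-level only.
        print(
            labels["alertname"],
            labels.get("juju_application"),
            labels.get("juju_unit", "<application-level>"),
        )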

Metrics endpoint (k8s charms)

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

With an aggregator (k8s charms)

Scrape

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

Remote write

  1. When Grafana Agent is down for 5 minutes, the HostMetricsMissing alert fires for both the HostHealth and AggregatorHostHealth groups in the Prometheus UI.

With an aggregator (machine charms)

Scrape

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

Remote write

  1. When Grafana Agent is down for 5 minutes, the HostMetricsMissing alert fires for both the HostHealth and AggregatorHostHealth groups in the Prometheus UI.

With cos-proxy (machine charms)

  1. When cos-proxy is down for 5 minutes, the HostDown alert fires in the Prometheus UI.
