Grafana Agent

When we relate a metrics provider (e.g. some server) to Prometheus, we expect Prometheus to fire an alert if the server stops responding. With Prometheus's PromQL this can be expressed generically with the up and absent expressions:

  • up < 1
  • absent(up)

Instead of having every charm in the ecosystem duplicate the same alert rules, the rules are generated automatically by the prometheus_scrape and prometheus_remote_write charm libraries. This saves charm authors from implementing their own HostHealth rules per charm and reduces implementation errors.
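For instance, a Kubernetes charm that exposes a metrics endpoint only needs to instantiate the scrape library; the generic HostHealth rules ride along with whatever the library sends over the relation. The following is a minimal sketch rather than a real charm: the class name, port and job definition are placeholders.

# Minimal sketch of a metrics provider charm; the job definition and port
# below are illustrative placeholders.
from ops.charm import CharmBase
from ops.main import main
from charms.prometheus_k8s.v0.prometheus_scrape import MetricsEndpointProvider

class SomeCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # The library builds the scrape job and ships the generic HostHealth
        # alert rules over the relation; the charm author does not write an
        # `up < 1` or `absent(up)` rule themselves.
        self.metrics_endpoint = MetricsEndpointProvider(
            self,
            jobs=[{"static_configs": [{"targets": ["*:8080"]}]}],
        )

if __name__ == "__main__":
    main(SomeCharm)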

Avoiding alert fatigue

The alert rules are designed to remain in the Pending state for 5 minutes before transitioning to the Firing state. This avoids false positives in cases such as a new installation or flapping metric behaviour.

“Host down” vs. “metrics missing”

Note that HostHealth has slightly different semantics between remote-write and scrape (the query sketch after this list illustrates the difference):

  • If Prometheus fails to scrape a target, the target is down (up < 1).
  • If Grafana Agent fails to remote-write (regardless of whether its own scrape succeeded), the metrics are missing (absent(up)).
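
To see the difference by hand, the two expressions can be run as instant queries against the Prometheus HTTP API. This is only a sketch: the Prometheus address and the juju_application label value are assumptions for illustration.

# Sketch: telling "host down" apart from "metrics missing" with instant
# queries against the Prometheus HTTP API. Address and labels are placeholders.
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://10.0.0.2:9090"  # placeholder Prometheus address

def instant_query(expr: str) -> list:
    """Run an instant query and return the result vector."""
    url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["result"]

# "Host down": the `up` series exists but the target failed its last scrape.
down = instant_query('up{juju_application="some-charm"} < 1')
# "Metrics missing": no `up` series at all, e.g. remote-write stopped.
missing = instant_query('absent(up{juju_application="some-charm"})')

print("units down:", [r["metric"].get("juju_unit") for r in down])
print("metrics missing:", bool(missing))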

Scrape

With support for centralized (generic) alerts, Prometheus provides a HostDown alert for each charm and each of its units via alert labels.

The alert rule within prometheus_scrape contains (ignoring annotations):

groups:
  - name: HostHealth
    rules:
    - alert: HostDown
      expr: up < 1
      for: 5m
      labels:
        severity: critical
    - alert: HostMetricsMissing
      # This alert is applicable only when the provider is linked via
      # an aggregator (such as grafana agent)
      expr: absent(up)
      for: 5m
      labels:
        severity: critical

Note: We use absent(up) together with for: 5m so that the alert transitions from Pending to Firing. If query portability is desired, absent_over_time(up[5m]) is an alternative, but it fires after 5 minutes without ever entering the Pending state.

Remote write

With support for centralized (generic) alerts, Prometheus provides a HostMetricsMissing alert for Grafana Agent itself and each application that is aggregated by it.

Note: The HostMetricsMissing alert does not show each unit, only the application!

The alert rule within prometheus_remote_write contains (ignoring annotations):

groups:
  - name: AggregatorHostHealth
    rules:
    - alert: HostMetricsMissing
      expr: absent(up)
      for: 5m
      labels:
        severity: critical

Alerting scenarios

Centralized (generic) alerts are supported in the following deployment scenarios:

Note: In these examples, the aggregator is Grafana Agent.

Note: Check Alertmanager for labelled alerts at either the unit level (HostDown) or at the application level (HostMetricsMissing).
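
As a rough illustration of that check, the sketch below lists active alerts through the Alertmanager v2 HTTP API and prints the relevant labels; the Alertmanager address is a placeholder.

# Sketch: listing active HostHealth alerts via the Alertmanager v2 API.
# The address is a placeholder; adjust it to your deployment.
import json
import urllib.request

ALERTMANAGER = "http://10.0.0.3:9093"  # placeholder Alertmanager address

with urllib.request.urlopen(f"{ALERTMANAGER}/api/v2/alerts?active=true") as resp:
    alerts = json.load(resp)

for alert in alerts:
    labels = alert["labels"]
    if labels.get("alertname") in ("HostDown", "HostMetricsMissing"):
        # HostDown carries a juju_unit label; HostMetricsMissing is
        # application-level only.
        print(
            labels["alertname"],
            labels.get("juju_application"),
            labels.get("juju_unit", "<application-level>"),
        )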

Metrics endpoint (k8s charms)

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

With an aggregator (k8s charms)

Scrape

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

Remote write

  1. When Grafana Agent is down for 5 minutes, the HostMetricsMissing alert fires for both the HostHealth and AggregatorHostHealth groups in the Prometheus UI.

With an aggregator (machine charms)

Scrape

  1. When a unit of some-charm is down for 5 minutes, the HostDown alert fires in the Prometheus UI (showing the specific unit).
  2. If multiple units are down, they appear in the alert labels as well.

Remote write

  1. When Grafana Agent is down for 5 minutes, the HostMetricsMissing alert fires for both the HostHealth and AggregatorHostHealth groups in the Prometheus UI.

With cos-proxy (machine charms)

  1. When cos-proxy is down for 5 minutes, the HostDown alert fires in the Prometheus UI.
