grafana-agent

Grafana Agent
By Canonical Observability
| Channel | Revisions | Published | Runs on |
|---|---|---|---|
| latest/stable | 452–457 | Yesterday | Ubuntu 24.04, 22.04, 20.04 |
| latest/candidate | 452–457 | Yesterday | Ubuntu 24.04, 22.04, 20.04 |
| latest/beta | 452–457 | Yesterday | Ubuntu 24.04, 22.04, 20.04 |
| latest/edge | 452–457 | Yesterday | Ubuntu 24.04, 22.04, 20.04 |
```bash
juju deploy grafana-agent --channel candidate
```

Due to a Juju bug fixed in Juju 3.5.4, upgrading from certain charm revisions to certain other charm revisions (it is hard to tell exactly which) will break the grafana-agent charm, and in fact any charm using certain revisions of the tempo_coordinator_k8s.v0.charm_tracing charm library, in particular its opentelemetry dependencies.
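If you are unsure whether your controller is affected, check the agent versions first. A minimal sketch, assuming a 3.x client (output formats vary slightly between Juju versions):

```bash
# Client version:
juju version
# Controller agent versions (the fix landed in Juju 3.5.4):
juju controllers --format yaml | grep agent-version
# Model agent version:
juju status --format yaml | grep -m1 version
```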

The symptom :mask:

If you see a stack trace like the following:

hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1

Failed to load context: contextvars_context, fallback to contextvars_context
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/context/__init__.py", line 43, in _load_runtime_context
    return next(  # type: ignore
StopIteration
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/context/__init__.py", line 43, in _load_runtime_context
    return next(  # type: ignore
StopIteration
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/./src/charm.py", line 19, in <module>
    from charms.tempo_coordinator_k8s.v0.charm_tracing import trace_charm
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/lib/charms/tempo_coordinator_k8s/v0/charm_tracing.py", line 316, in <module>
    from opentelemetry.exporter.otlp.proto.common._internal.trace_encoder import (
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py", line 46, in <module>
    from opentelemetry.sdk.trace import Resource
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/sdk/trace/__init__.py", line 45, in <module>
    from opentelemetry import context as context_api
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/context/__init__.py", line 67, in <module>
    _RUNTIME_CONTEXT = _load_runtime_context()
  File "/var/lib/juju/agents/unit-cos-agent-0/charm/venv/opentelemetry/context/__init__.py", line 57, in _load_runtime_context
    return next(  # type: ignore
StopIteration
    
juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1

then you’re affected by this issue.

The issue: during a refresh, Juju didn't properly wipe the virtualenv the charm shipped with, leaving empty directories behind. Consequently, some Python packages making use of dark importlib sorcery break on import (that is, before the charm has a chance to even run).
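You can verify this on a unit by looking for leftover empty directories in the charm's venv. A quick diagnostic sketch (the unit name is an example; substitute your own):

```bash
# List (without deleting) any empty directories left behind in the
# charm's venv after the refresh; any output means the unit is affected.
juju ssh grafana-agent/0 'find /var/lib/juju/agents/unit-*/charm/venv -type d -empty'
```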

The cure(s) :pill:

Depending on how hardcore your environment is, one of the following solutions will apply:

```mermaid
flowchart TD
    Z[are you too late and the units are already in error?] -->|yes| Y(manual fix)
    Z -->|no| A
    A[can you afford some downtime?] -->|yes| B(redeploy all affected units instead of refreshing them)
    A -->|no| C(can you upgrade the controller?)
    C -->|yes| D(upgrade the controller to >=3.5.4, then refresh the units)
    C -->|no| E(manual fix)
    D -->|and then still| E(manual fix)
```
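For the two non-manual paths, the commands look roughly like this. A sketch assuming the application is named grafana-agent (adjust names and channels to your deployment):

```bash
# Path B: tear down and redeploy instead of refreshing (incurs downtime).
juju remove-application grafana-agent
juju deploy grafana-agent --channel latest/stable
# ...then re-create any integrations the old application had.

# Path D: upgrade the controller first, then refresh as usual.
juju upgrade-controller
juju refresh grafana-agent --channel latest/stable
```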

The manual fix :hammer_and_wrench:

For a charm already deployed with an affected Juju version, upgrading the Juju controller or unit agent will not clean up the empty directories that already exist inside the charm directories on the units; the upgrade only ensures that future refreshes do not introduce new ones. To repair units that are already broken, apply one of the manual fixes below.

Strategy one

Manually wipe the empty directories from the charm sources on the affected Juju units:

```bash
juju exec --application the-charm "find /var/lib/juju/agents/unit-*/charm -type d -empty -delete"
```
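If the units were already in an error state, you will still need to retry the failed hook afterwards. For example (the unit name is an example):

```bash
# Retry the failed hook now that the empty directories are gone:
juju resolve grafana-agent/0
juju status grafana-agent
```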

Strategy two

Manually trigger the built-in patch in charm code.

The first thing you should try is to check whether the latest revision of the charm (from edge) contains the patch that works around this issue. If so, it may be enough to refresh to that revision and ensure the charm gets to process the upgrade-charm hook.
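For example, assuming the application is named grafana-agent:

```bash
# Refresh to the latest edge revision, which may contain the workaround:
juju refresh grafana-agent --channel latest/edge
```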

If the charm was in an error state before you attempted the refresh, Juju won't let it proceed far enough to see that hook. All you can do in this case is skip ahead with `juju resolve --no-retry` until it hits upgrade-charm; at that point the patch should be applied and things fixed.
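A rough way to script the skipping, assuming a single affected unit named grafana-agent/0 (a hypothetical name) and that the unit's status message contains "hook failed" while in error:

```bash
# Keep skipping failed hooks until the unit leaves the error state;
# upgrade-charm will then run and apply the built-in patch.
while juju status grafana-agent/0 | grep -q 'hook failed'; do
    juju resolve --no-retry grafana-agent/0
    sleep 10
done
```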

A hackier but quicker solution is to use `jhack fire myunit upgrade-charm` to directly trigger the patch code.
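If you don't have jhack installed, it is available as a snap. For example (the unit name is an example):

```bash
# Install jhack (a third-party charm debugging tool) and fire the
# upgrade-charm event directly on the affected unit:
sudo snap install jhack
jhack fire grafana-agent/0 upgrade-charm
```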