Grafana
- Canonical Observability
Channel | Revision | Published | Runs on |
---|---|---|---|
latest/stable | 117 | 10 Sep 2024 | |
latest/candidate | 122 | 03 Dec 2024 | |
latest/beta | 126 | 03 Dec 2024 | |
latest/edge | 128 | 19 Dec 2024 | |
1.0/stable | 93 | 12 Dec 2023 | |
1.0/candidate | 93 | 22 Nov 2023 | |
1.0/beta | 93 | 22 Nov 2023 | |
1.0/edge | 93 | 22 Nov 2023 |
juju deploy grafana-k8s
Deploy Kubernetes operators easily with Juju, the Universal Operator Lifecycle Manager. Need a Kubernetes cluster? Install MicroK8s to create a full CNCF-certified Kubernetes system in under 60 seconds.
Platform:
charms.grafana_k8s.v0.grafana_dashboard
-
- Last updated 11 Jun 2024
- Revision Library version 0.36
# Copyright 2021 Canonical Ltd.
# See LICENSE file for licensing details.
"""## Overview.
This document explains how to integrate with the Grafana charm
for the purpose of providing a dashboard which can be used by
end users. It also explains the structure of the data
expected by the `grafana-dashboard` interface, and may provide a
mechanism or reference point for providing a compatible interface
or library by providing a definitive reference guide to the
structure of relation data which is shared between the Grafana
charm and any charm providing datasource information.
## Provider Library Usage
The Grafana charm interacts with its dashboards using its charm
library. The goal of this library is to be as simple to use as
possible, and instantiation of the class with or without changing
the default arguments provides a complete use case. For the simplest
use case of a charm which bundles dashboards and provides a
`provides: grafana-dashboard` interface,
requires:
grafana-dashboard:
interface: grafana_dashboard
creation of a `GrafanaDashboardProvider` object with the default arguments is
sufficient.
:class:`GrafanaDashboardProvider` expects that bundled dashboards should
be included in your charm with a default path of:
path/to/charm.py
path/to/src/grafana_dashboards/*.{json|json.tmpl|.tmpl}
Where the files are Grafana dashboard JSON data either from the
Grafana marketplace, or directly exported from a Grafana instance.
Refer to the [official docs](https://grafana.com/tutorials/provision-dashboards-and-data-sources/)
for more information.
When constructing a dashboard that is intended to be consumed by COS, make sure to use variables
for your datasources, and name them "prometheusds" and "lokids". You can also use the following
juju topology variables in your dashboards: $juju_model, $juju_model_uuid, $juju_application
and $juju_unit. Note, however, that if metrics are coming via peripheral charms (scrape-config
or cos-config) then topology labels would not exist.
The default constructor arguments are:
`charm`: `self` from the charm instantiating this library
`relation_name`: grafana-dashboard
`dashboards_path`: "/src/grafana_dashboards"
If your configuration requires any changes from these defaults, they
may be set from the class constructor. It may be instantiated as
follows:
from charms.grafana_k8s.v0.grafana_dashboard import GrafanaDashboardProvider
class FooCharm:
def __init__(self, *args):
super().__init__(*args, **kwargs)
...
self.grafana_dashboard_provider = GrafanaDashboardProvider(self)
...
The first argument (`self`) should be a reference to the parent (providing
dashboards), as this charm's lifecycle events will be used to re-submit
dashboard information if a charm is upgraded, the pod is restarted, or other.
An instantiated `GrafanaDashboardProvider` validates that the path specified
in the constructor (or the default) exists, reads the file contents, then
compresses them with LZMA and adds them to the application relation data
when a relation is established with Grafana.
Provided dashboards will be checked by Grafana, and a series of dropdown menus
providing the ability to select query targets by Juju Model, application instance,
and unit will be added if they do not exist.
To avoid requiring `jinja` in `GrafanaDashboardProvider` users, template validation
and rendering occurs on the other side of the relation, and relation data in
the form of:
{
"event": {
"valid": `true|false`,
"errors": [],
}
}
Will be returned if rendering or validation fails. In this case, the
`GrafanaDashboardProvider` object will emit a `dashboard_status_changed` event
of the type :class:`GrafanaDashboardEvent`, which will contain information
about the validation error.
This information is added to the relation data for the charms as serialized JSON
from a dict, with a structure of:
```
{
"application": {
"dashboards": {
"uuid": a uuid generated to ensure a relation event triggers,
"templates": {
"file:{hash}": {
"content": `{compressed_template_data}`,
"charm": `charm.meta.name`,
"juju_topology": {
"model": `charm.model.name`,
"model_uuid": `charm.model.uuid`,
"application": `charm.app.name`,
"unit": `charm.unit.name`,
}
},
"file:{other_file_hash}": {
...
},
},
},
},
}
```
This is ingested by :class:`GrafanaDashboardConsumer`, and is sufficient for configuration.
The [COS Configuration Charm](https://charmhub.io/cos-configuration-k8s) can be used to
add dashboards which are not bundled with charms.
## Consumer Library Usage
The `GrafanaDashboardConsumer` object may be used by Grafana
charms to manage relations with available dashboards. For this
purpose, a charm consuming Grafana dashboard information should do
the following things:
1. Instantiate the `GrafanaDashboardConsumer` object by providing it a
reference to the parent (Grafana) charm and, optionally, the name of
the relation that the Grafana charm uses to interact with dashboards.
This relation must confirm to the `grafana-dashboard` interface.
For example a Grafana charm may instantiate the
`GrafanaDashboardConsumer` in its constructor as follows
from charms.grafana_k8s.v0.grafana_dashboard import GrafanaDashboardConsumer
def __init__(self, *args):
super().__init__(*args)
...
self.grafana_dashboard_consumer = GrafanaDashboardConsumer(self)
...
2. A Grafana charm also needs to listen to the
`GrafanaDashboardConsumer` events emitted by the `GrafanaDashboardConsumer`
by adding itself as an observer for these events:
self.framework.observe(
self.grafana_source_consumer.on.sources_changed,
self._on_dashboards_changed,
)
Dashboards can be retrieved the :meth:`dashboards`:
It will be returned in the format of:
```
[
{
"id": unique_id,
"relation_id": relation_id,
"charm": the name of the charm which provided the dashboard,
"content": compressed_template_data
},
]
```
The consuming charm should decompress the dashboard.
"""
import base64
import hashlib
import json
import logging
import lzma
import os
import platform
import re
import subprocess
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
import yaml
from ops.charm import (
CharmBase,
HookEvent,
RelationBrokenEvent,
RelationChangedEvent,
RelationCreatedEvent,
RelationEvent,
RelationRole,
)
from ops.framework import (
EventBase,
EventSource,
Object,
ObjectEvents,
StoredDict,
StoredList,
StoredState,
)
from ops.model import Relation
# The unique Charmhub library identifier, never change it
LIBID = "c49eb9c7dfef40c7b6235ebd67010a3f"
# Increment this major API version when introducing breaking changes
LIBAPI = 0
# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 36
logger = logging.getLogger(__name__)
DEFAULT_RELATION_NAME = "grafana-dashboard"
DEFAULT_PEER_NAME = "grafana"
RELATION_INTERFACE_NAME = "grafana_dashboard"
TOPOLOGY_TEMPLATE_DROPDOWNS = [ # type: ignore
{
"allValue": ".*",
"datasource": "${prometheusds}",
"definition": "label_values(up,juju_model)",
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Juju model",
"multi": True,
"name": "juju_model",
"query": {
"query": "label_values(up,juju_model)",
"refId": "StandardVariableQuery",
},
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": False,
},
{
"allValue": ".*",
"datasource": "${prometheusds}",
"definition": 'label_values(up{juju_model=~"$juju_model"},juju_model_uuid)',
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Juju model uuid",
"multi": True,
"name": "juju_model_uuid",
"query": {
"query": 'label_values(up{juju_model=~"$juju_model"},juju_model_uuid)',
"refId": "StandardVariableQuery",
},
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": False,
},
{
"allValue": ".*",
"datasource": "${prometheusds}",
"definition": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid"},juju_application)',
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Juju application",
"multi": True,
"name": "juju_application",
"query": {
"query": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid"},juju_application)',
"refId": "StandardVariableQuery",
},
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": False,
},
{
"allValue": ".*",
"datasource": "${prometheusds}",
"definition": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid",juju_application=~"$juju_application"},juju_unit)',
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Juju unit",
"multi": True,
"name": "juju_unit",
"query": {
"query": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid",juju_application=~"$juju_application"},juju_unit)',
"refId": "StandardVariableQuery",
},
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": False,
},
]
DATASOURCE_TEMPLATE_DROPDOWNS = [ # type: ignore
{
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Prometheus datasource",
"multi": True,
"name": "prometheusds",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"type": "datasource",
},
{
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "Loki datasource",
"multi": True,
"name": "lokids",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"type": "datasource",
},
]
REACTIVE_CONVERTER = { # type: ignore
"allValue": None,
"datasource": "${prometheusds}",
"definition": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid",juju_application=~"$juju_application"},host)',
"description": None,
"error": None,
"hide": 0,
"includeAll": True,
"label": "hosts",
"multi": True,
"name": "host",
"options": [],
"query": {
"query": 'label_values(up{juju_model=~"$juju_model",juju_model_uuid=~"$juju_model_uuid",juju_application=~"$juju_application"},host)',
"refId": "StandardVariableQuery",
},
"refresh": 1,
"regex": "",
"skipUrlSync": False,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": False,
}
class RelationNotFoundError(Exception):
"""Raised if there is no relation with the given name."""
def __init__(self, relation_name: str):
self.relation_name = relation_name
self.message = "No relation named '{}' found".format(relation_name)
super().__init__(self.message)
class RelationInterfaceMismatchError(Exception):
"""Raised if the relation with the given name has a different interface."""
def __init__(
self,
relation_name: str,
expected_relation_interface: str,
actual_relation_interface: str,
):
self.relation_name = relation_name
self.expected_relation_interface = expected_relation_interface
self.actual_relation_interface = actual_relation_interface
self.message = (
"The '{}' relation has '{}' as "
"interface rather than the expected '{}'".format(
relation_name, actual_relation_interface, expected_relation_interface
)
)
super().__init__(self.message)
class RelationRoleMismatchError(Exception):
"""Raised if the relation with the given name has a different direction."""
def __init__(
self,
relation_name: str,
expected_relation_role: RelationRole,
actual_relation_role: RelationRole,
):
self.relation_name = relation_name
self.expected_relation_interface = expected_relation_role
self.actual_relation_role = actual_relation_role
self.message = "The '{}' relation has role '{}' rather than the expected '{}'".format(
relation_name, repr(actual_relation_role), repr(expected_relation_role)
)
super().__init__(self.message)
class InvalidDirectoryPathError(Exception):
"""Raised if the grafana dashboards folder cannot be found or is otherwise invalid."""
def __init__(
self,
grafana_dashboards_absolute_path: str,
message: str,
):
self.grafana_dashboards_absolute_path = grafana_dashboards_absolute_path
self.message = message
super().__init__(self.message)
def _resolve_dir_against_charm_path(charm: CharmBase, *path_elements: str) -> str:
"""Resolve the provided path items against the directory of the main file.
Look up the directory of the charmed operator file being executed. This is normally
going to be the charm.py file of the charm including this library. Then, resolve
the provided path elements and return its absolute path.
Raises:
InvalidDirectoryPathError if the resolved path does not exist or it is not a directory
"""
charm_dir = Path(str(charm.charm_dir))
if not charm_dir.exists() or not charm_dir.is_dir():
# Operator Framework does not currently expose a robust
# way to determine the top level charm source directory
# that is consistent across deployed charms and unit tests
# Hence for unit tests the current working directory is used
# TODO: updated this logic when the following ticket is resolved
# https://github.com/canonical/operator/issues/643
charm_dir = Path(os.getcwd())
dir_path = charm_dir.absolute().joinpath(*path_elements)
if not dir_path.exists():
raise InvalidDirectoryPathError(str(dir_path), "directory does not exist")
if not dir_path.is_dir():
raise InvalidDirectoryPathError(str(dir_path), "is not a directory")
return str(dir_path)
def _validate_relation_by_interface_and_direction(
charm: CharmBase,
relation_name: str,
expected_relation_interface: str,
expected_relation_role: RelationRole,
) -> None:
"""Verifies that a relation has the necessary characteristics.
Verifies that the `relation_name` provided: (1) exists in metadata.yaml,
(2) declares as interface the interface name passed as `relation_interface`
and (3) has the right "direction", i.e., it is a relation that `charm`
provides or requires.
Args:
charm: a `CharmBase` object to scan for the matching relation.
relation_name: the name of the relation to be verified.
expected_relation_interface: the interface name to be matched by the
relation named `relation_name`.
expected_relation_role: whether the `relation_name` must be either
provided or required by `charm`.
Raises:
RelationNotFoundError: If there is no relation in the charm's metadata.yaml
named like the value of the `relation_name` argument.
RelationInterfaceMismatchError: If the relation interface of the
relation named as the provided `relation_name` argument does not
match the `expected_relation_interface` argument.
RelationRoleMismatchError: If the relation named as the provided `relation_name`
argument has a different role than what is specified by the
`expected_relation_role` argument.
"""
if relation_name not in charm.meta.relations:
raise RelationNotFoundError(relation_name)
relation = charm.meta.relations[relation_name]
actual_relation_interface = relation.interface_name
if actual_relation_interface and actual_relation_interface != expected_relation_interface:
raise RelationInterfaceMismatchError(
relation_name, expected_relation_interface, actual_relation_interface
)
if expected_relation_role == RelationRole.provides:
if relation_name not in charm.meta.provides:
raise RelationRoleMismatchError(
relation_name, RelationRole.provides, RelationRole.requires
)
elif expected_relation_role == RelationRole.requires:
if relation_name not in charm.meta.requires:
raise RelationRoleMismatchError(
relation_name, RelationRole.requires, RelationRole.provides
)
else:
raise Exception("Unexpected RelationDirection: {}".format(expected_relation_role))
def _encode_dashboard_content(content: Union[str, bytes]) -> str:
if isinstance(content, str):
content = bytes(content, "utf-8")
return base64.b64encode(lzma.compress(content)).decode("utf-8")
def _decode_dashboard_content(encoded_content: str) -> str:
return lzma.decompress(base64.b64decode(encoded_content.encode("utf-8"))).decode()
def _convert_dashboard_fields(content: str, inject_dropdowns: bool = True) -> str:
"""Make sure values are present for Juju topology.
Inserts Juju topology variables and selectors into the template, as well as
a variable for Prometheus.
"""
dict_content = json.loads(content)
datasources = {}
existing_templates = False
template_dropdowns = (
TOPOLOGY_TEMPLATE_DROPDOWNS + DATASOURCE_TEMPLATE_DROPDOWNS # type: ignore
if inject_dropdowns
else DATASOURCE_TEMPLATE_DROPDOWNS
)
# If the dashboard has __inputs, get the names to replace them. These are stripped
# from reactive dashboards in GrafanaDashboardAggregator, but charm authors in
# newer charms may import them directly from the marketplace
if "__inputs" in dict_content:
for field in dict_content["__inputs"]:
if "type" in field and field["type"] == "datasource":
datasources[field["name"]] = field["pluginName"].lower()
del dict_content["__inputs"]
# If no existing template variables exist, just insert our own
if "templating" not in dict_content:
dict_content["templating"] = {"list": list(template_dropdowns)} # type: ignore
else:
# Otherwise, set a flag so we can go back later
existing_templates = True
for template_value in dict_content["templating"]["list"]:
# Build a list of `datasource_name`: `datasource_type` mappings
# The "query" field is actually "prometheus", "loki", "influxdb", etc
if "type" in template_value and template_value["type"] == "datasource":
datasources[template_value["name"]] = template_value["query"].lower()
# Put our own variables in the template
for d in template_dropdowns: # type: ignore
if d not in dict_content["templating"]["list"]:
dict_content["templating"]["list"].insert(0, d)
dict_content = _replace_template_fields(dict_content, datasources, existing_templates)
return json.dumps(dict_content)
def _replace_template_fields( # noqa: C901
dict_content: dict, datasources: dict, existing_templates: bool
) -> dict:
"""Make templated fields get cleaned up afterwards.
If existing datasource variables are present, try to substitute them.
"""
replacements = {"loki": "${lokids}", "prometheus": "${prometheusds}"}
used_replacements = [] # type: List[str]
# If any existing datasources match types we know, or we didn't find
# any templating variables at all, template them.
if datasources or not existing_templates:
panels = dict_content.get("panels", {})
if panels:
dict_content["panels"] = _template_panels(
panels, replacements, used_replacements, existing_templates, datasources
)
# Find panels nested under rows
rows = dict_content.get("rows", {})
if rows:
for row_idx, row in enumerate(rows):
if "panels" in row.keys():
rows[row_idx]["panels"] = _template_panels(
row["panels"],
replacements,
used_replacements,
existing_templates,
datasources,
)
dict_content["rows"] = rows
# Finally, go back and pop off the templates we stubbed out
deletions = []
for tmpl in dict_content["templating"]["list"]:
if tmpl["name"] and tmpl["name"] in used_replacements:
deletions.append(tmpl)
for d in deletions:
dict_content["templating"]["list"].remove(d)
return dict_content
def _template_panels(
panels: dict,
replacements: dict,
used_replacements: list,
existing_templates: bool,
datasources: dict,
) -> dict:
"""Iterate through a `panels` object and template it appropriately."""
# Go through all the panels. If they have a datasource set, AND it's one
# that we can convert to ${lokids} or ${prometheusds}, by stripping off the
# ${} templating and comparing the name to the list we built, replace it,
# otherwise, leave it alone.
#
for panel in panels:
if "datasource" not in panel or not panel.get("datasource"):
continue
if not existing_templates:
datasource = panel.get("datasource")
if isinstance(datasource, str):
if "loki" in datasource:
panel["datasource"] = "${lokids}"
elif "grafana" in datasource:
continue
else:
panel["datasource"] = "${prometheusds}"
elif isinstance(datasource, dict):
# In dashboards exported by Grafana 9, datasource type is dict
dstype = datasource.get("type", "")
if dstype == "loki":
panel["datasource"]["uid"] = "${lokids}"
elif dstype == "prometheus":
panel["datasource"]["uid"] = "${prometheusds}"
else:
logger.debug("Unrecognized datasource type '%s'; skipping", dstype)
continue
else:
logger.error("Unknown datasource format: skipping")
continue
else:
if isinstance(panel["datasource"], str):
if panel["datasource"].lower() in replacements.values():
# Already a known template variable
continue
# Strip out variable characters and maybe braces
ds = re.sub(r"(\$|\{|\})", "", panel["datasource"])
if ds not in datasources.keys():
# Unknown, non-templated datasource, potentially a Grafana builtin
continue
replacement = replacements.get(datasources[ds], "")
if replacement:
used_replacements.append(ds)
panel["datasource"] = replacement or panel["datasource"]
elif isinstance(panel["datasource"], dict):
dstype = panel["datasource"].get("type", "")
if panel["datasource"].get("uid", "").lower() in replacements.values():
# Already a known template variable
continue
# Strip out variable characters and maybe braces
ds = re.sub(r"(\$|\{|\})", "", panel["datasource"].get("uid", ""))
if ds not in datasources.keys():
# Unknown, non-templated datasource, potentially a Grafana builtin
continue
replacement = replacements.get(datasources[ds], "")
if replacement:
used_replacements.append(ds)
panel["datasource"]["uid"] = replacement
else:
logger.error("Unknown datasource format: skipping")
continue
return panels
def _inject_labels(content: str, topology: dict, transformer: "CosTool") -> str:
"""Inject Juju topology into panel expressions via CosTool.
A dashboard will have a structure approximating:
{
"__inputs": [],
"templating": {
"list": [
{
"name": "prometheusds",
"type": "prometheus"
}
]
},
"panels": [
{
"foo": "bar",
"targets": [
{
"some": "field",
"expr": "up{job="foo"}"
},
{
"some_other": "field",
"expr": "sum(http_requests_total{instance="$foo"}[5m])}
}
],
"datasource": "${someds}"
}
]
}
`templating` is used elsewhere in this library, but the structure is not rigid. It is
not guaranteed that a panel will actually have any targets (it could be a "spacer" with
no datasource, hence no expression). It could have only one target. It could have multiple
targets. It could have multiple targets of which only one has an `expr` to evaluate. We need
to try to handle all of these concisely.
`cos-tool` (`github.com/canonical/cos-tool` as a Go module in general)
does not know "Grafana-isms", such as using `[$_variable]` to modify the query from the user
interface, so we add placeholders (as `5y`, since it must parse, but a dashboard looking for
five years for a panel query would be unusual).
Args:
content: dashboard content as a string
topology: a dict containing topology values
transformer: a 'CosTool' instance
Returns:
dashboard content with replaced values.
"""
dict_content = json.loads(content)
if "panels" not in dict_content.keys():
return json.dumps(dict_content)
# Go through all the panels and inject topology labels
# Panels may have more than one 'target' where the expressions live, so that must be
# accounted for. Additionally, `promql-transform` does not necessarily gracefully handle
# expressions with range queries including variables. Exclude these.
#
# It is not a certainty that the `datasource` field will necessarily reflect the type, so
# operate on all fields.
panels = dict_content["panels"]
topology_with_prefix = {"juju_{}".format(k): v for k, v in topology.items()}
# We need to use an index so we can insert the changed element back later
for panel_idx, panel in enumerate(panels):
if not isinstance(panel, dict):
continue
# Use the index to insert it back in the same location
panels[panel_idx] = _modify_panel(panel, topology_with_prefix, transformer)
return json.dumps(dict_content)
def _modify_panel(panel: dict, topology: dict, transformer: "CosTool") -> dict:
"""Inject Juju topology into panel expressions via CosTool.
Args:
panel: a dashboard panel as a dict
topology: a dict containing topology values
transformer: a 'CosTool' instance
Returns:
the panel with injected values
"""
if "targets" not in panel.keys():
return panel
# Pre-compile a regular expression to grab values from inside of []
range_re = re.compile(r"\[(?P<value>.*?)\]")
# Do the same for any offsets
offset_re = re.compile(r"offset\s+(?P<value>-?\s*[$\w]+)")
known_datasources = {"${prometheusds}": "promql", "${lokids}": "logql"}
targets = panel["targets"]
# We need to use an index so we can insert the changed element back later
for idx, target in enumerate(targets):
# If there's no expression, we don't need to do anything
if "expr" not in target.keys():
continue
expr = target["expr"]
if "datasource" not in panel.keys():
continue
if isinstance(panel["datasource"], str):
if panel["datasource"] not in known_datasources:
continue
querytype = known_datasources[panel["datasource"]]
elif isinstance(panel["datasource"], dict):
if panel["datasource"]["uid"] not in known_datasources:
continue
querytype = known_datasources[panel["datasource"]["uid"]]
else:
logger.error("Unknown datasource format: skipping")
continue
# Capture all values inside `[]` into a list which we'll iterate over later to
# put them back in-order. Then apply the regex again and replace everything with
# `[5y]` so promql/parser will take it.
#
# Then do it again for offsets
range_values = [m.group("value") for m in range_re.finditer(expr)]
expr = range_re.sub(r"[5y]", expr)
offset_values = [m.group("value") for m in offset_re.finditer(expr)]
expr = offset_re.sub(r"offset 5y", expr)
# Retrieve the new expression (which may be unchanged if there were no label
# matchers in the expression, or if tt was unable to be parsed like logql. It's
# virtually impossible to tell from any datasource "name" in a panel what the
# actual type is without re-implementing a complete dashboard parser, but no
# harm will some from passing invalid promql -- we'll just get the original back.
#
replacement = transformer.inject_label_matchers(expr, topology, querytype)
if replacement == target["expr"]:
# promql-tranform caught an error. Move on
continue
# Go back and substitute values in [] which were pulled out
# Enumerate with an index... again. The same regex is ok, since it will still match
# `[(.*?)]`, which includes `[5y]`, our placeholder
for i, match in enumerate(range_re.finditer(replacement)):
# Replace one-by-one, starting from the left. We build the string back with
# `str.replace(string_to_replace, replacement_value, count)`. Limit the count
# to one, since we are going through one-by-one through the list we saved earlier
# in `range_values`.
replacement = replacement.replace(
"[{}]".format(match.group("value")),
"[{}]".format(range_values[i]),
1,
)
for i, match in enumerate(offset_re.finditer(replacement)):
# Replace one-by-one, starting from the left. We build the string back with
# `str.replace(string_to_replace, replacement_value, count)`. Limit the count
# to one, since we are going through one-by-one through the list we saved earlier
# in `range_values`.
replacement = replacement.replace(
"offset {}".format(match.group("value")),
"offset {}".format(offset_values[i]),
1,
)
# Use the index to insert it back in the same location
targets[idx]["expr"] = replacement
panel["targets"] = targets
return panel
def _type_convert_stored(obj):
"""Convert Stored* to their appropriate types, recursively."""
if isinstance(obj, StoredList):
return list(map(_type_convert_stored, obj))
if isinstance(obj, StoredDict):
rdict = {} # type: Dict[Any, Any]
for k in obj.keys():
rdict[k] = _type_convert_stored(obj[k])
return rdict
return obj
class GrafanaDashboardsChanged(EventBase):
"""Event emitted when Grafana dashboards change."""
def __init__(self, handle, data=None):
super().__init__(handle)
self.data = data
def snapshot(self) -> Dict:
"""Save grafana source information."""
return {"data": self.data}
def restore(self, snapshot):
"""Restore grafana source information."""
self.data = snapshot["data"]
class GrafanaDashboardEvents(ObjectEvents):
"""Events raised by :class:`GrafanaSourceEvents`."""
dashboards_changed = EventSource(GrafanaDashboardsChanged)
class GrafanaDashboardEvent(EventBase):
"""Event emitted when Grafana dashboards cannot be resolved.
Enables us to set a clear status on the provider.
"""
def __init__(self, handle, errors: List[Dict[str, str]] = [], valid: bool = False):
super().__init__(handle)
self.errors = errors
self.error_message = "; ".join([error["error"] for error in errors if "error" in error])
self.valid = valid
def snapshot(self) -> Dict:
"""Save grafana source information."""
return {
"error_message": self.error_message,
"valid": self.valid,
"errors": json.dumps(self.errors),
}
def restore(self, snapshot):
"""Restore grafana source information."""
self.error_message = snapshot["error_message"]
self.valid = snapshot["valid"]
self.errors = json.loads(str(snapshot["errors"]))
class GrafanaProviderEvents(ObjectEvents):
"""Events raised by :class:`GrafanaSourceEvents`."""
dashboard_status_changed = EventSource(GrafanaDashboardEvent)
class GrafanaDashboardProvider(Object):
"""An API to provide Grafana dashboards to a Grafana charm."""
_stored = StoredState()
on = GrafanaProviderEvents() # pyright: ignore
def __init__(
self,
charm: CharmBase,
relation_name: str = DEFAULT_RELATION_NAME,
dashboards_path: str = "src/grafana_dashboards",
) -> None:
"""API to provide Grafana dashboard to a Grafana charmed operator.
The :class:`GrafanaDashboardProvider` object provides an API
to upload dashboards to a Grafana charm. In its most streamlined
usage, the :class:`GrafanaDashboardProvider` is integrated in a
charmed operator as follows:
self.grafana = GrafanaDashboardProvider(self)
The :class:`GrafanaDashboardProvider` will look for dashboard
templates in the `<charm-py-directory>/grafana_dashboards` folder.
Additionally, dashboard templates can be uploaded programmatically
via the :method:`GrafanaDashboardProvider.add_dashboard` method.
To use the :class:`GrafanaDashboardProvider` API, you need a relation
defined in your charm operator's metadata.yaml as follows:
provides:
grafana-dashboard:
interface: grafana_dashboard
If you would like to use relation name other than `grafana-dashboard`,
you will need to specify the relation name via the `relation_name`
argument when instantiating the :class:`GrafanaDashboardProvider` object.
However, it is strongly advised to keep the default relation name,
so that people deploying your charm will have a consistent experience
with all other charms that provide Grafana dashboards.
It is possible to provide a different file path for the Grafana dashboards
to be automatically managed by the :class:`GrafanaDashboardProvider` object
via the `dashboards_path` argument. This may be necessary when the directory
structure of your charmed operator repository is not the "usual" one as
generated by `charmcraft init`, for example when adding the charmed operator
in a Java repository managed by Maven or Gradle. However, unless there are
such constraints with other tooling, it is strongly advised to store the
Grafana dashboards in the default `<charm-py-directory>/grafana_dashboards`
folder, in order to provide a consistent experience for other charmed operator
authors.
Args:
charm: a :class:`CharmBase` object which manages this
:class:`GrafanaProvider` object. Generally this is
`self` in the instantiating class.
relation_name: a :string: name of the relation managed by this
:class:`GrafanaDashboardProvider`; it defaults to "grafana-dashboard".
dashboards_path: a filesystem path relative to the charm root
where dashboard templates can be located. By default, the library
expects dashboard files to be in the `<charm-py-directory>/grafana_dashboards`
directory.
"""
_validate_relation_by_interface_and_direction(
charm, relation_name, RELATION_INTERFACE_NAME, RelationRole.provides
)
try:
dashboards_path = _resolve_dir_against_charm_path(charm, dashboards_path)
except InvalidDirectoryPathError as e:
logger.warning(
"Invalid Grafana dashboards folder at %s: %s",
e.grafana_dashboards_absolute_path,
e.message,
)
super().__init__(charm, relation_name)
self._charm = charm
self._relation_name = relation_name
self._dashboards_path = dashboards_path
# No peer relation bucket we can rely on providers, keep StoredState here, too
self._stored.set_default(dashboard_templates={}) # type: ignore
self.framework.observe(self._charm.on.leader_elected, self._update_all_dashboards_from_dir)
self.framework.observe(self._charm.on.upgrade_charm, self._update_all_dashboards_from_dir)
self.framework.observe(self._charm.on.config_changed, self._update_all_dashboards_from_dir)
self.framework.observe(
self._charm.on[self._relation_name].relation_created,
self._on_grafana_dashboard_relation_created,
)
self.framework.observe(
self._charm.on[self._relation_name].relation_changed,
self._on_grafana_dashboard_relation_changed,
)
def add_dashboard(self, content: str, inject_dropdowns: bool = True) -> None:
"""Add a dashboard to the relation managed by this :class:`GrafanaDashboardProvider`.
Args:
content: a string representing a Jinja template. Currently, no
global variables are added to the Jinja template evaluation
context.
inject_dropdowns: a :boolean: indicating whether topology dropdowns should be
added to the dashboard
"""
# Update of storage must be done irrespective of leadership, so
# that the stored state is there when this unit becomes leader.
stored_dashboard_templates: Any = self._stored.dashboard_templates # pyright: ignore
encoded_dashboard = _encode_dashboard_content(content)
# Use as id the first chars of the encoded dashboard, so that
# it is predictable across units.
id = "prog:{}".format(encoded_dashboard[-24:-16])
stored_dashboard_templates[id] = self._content_to_dashboard_object(
encoded_dashboard, inject_dropdowns
)
stored_dashboard_templates[id]["dashboard_alt_uid"] = self._generate_alt_uid(id)
if self._charm.unit.is_leader():
for dashboard_relation in self._charm.model.relations[self._relation_name]:
self._upset_dashboards_on_relation(dashboard_relation)
def remove_non_builtin_dashboards(self) -> None:
"""Remove all dashboards to the relation added via :method:`add_dashboard`."""
# Update of storage must be done irrespective of leadership, so
# that the stored state is there when this unit becomes leader.
stored_dashboard_templates: Any = self._stored.dashboard_templates # pyright: ignore
for dashboard_id in list(stored_dashboard_templates.keys()):
if dashboard_id.startswith("prog:"):
del stored_dashboard_templates[dashboard_id]
self._stored.dashboard_templates = stored_dashboard_templates
if self._charm.unit.is_leader():
for dashboard_relation in self._charm.model.relations[self._relation_name]:
self._upset_dashboards_on_relation(dashboard_relation)
def update_dashboards(self) -> None:
"""Trigger the re-evaluation of the data on all relations."""
if self._charm.unit.is_leader():
for dashboard_relation in self._charm.model.relations[self._relation_name]:
self._upset_dashboards_on_relation(dashboard_relation)
def _update_all_dashboards_from_dir(
self, _: Optional[HookEvent] = None, inject_dropdowns: bool = True
) -> None:
"""Scans the built-in dashboards and updates relations with changes."""
# Update of storage must be done irrespective of leadership, so
# that the stored state is there when this unit becomes leader.
# Ensure we do not leave outdated dashboards by removing from stored all
# the encoded dashboards that start with "file/".
if self._dashboards_path:
stored_dashboard_templates: Any = self._stored.dashboard_templates # pyright: ignore
for dashboard_id in list(stored_dashboard_templates.keys()):
if dashboard_id.startswith("file:"):
del stored_dashboard_templates[dashboard_id]
# Path.glob uses fnmatch on the backend, which is pretty limited, so use a
# custom function for the filter
def _is_dashboard(p: Path) -> bool:
return p.is_file() and p.name.endswith((".json", ".json.tmpl", ".tmpl"))
for path in filter(_is_dashboard, Path(self._dashboards_path).glob("*")):
# path = Path(path)
id = "file:{}".format(path.stem)
stored_dashboard_templates[id] = self._content_to_dashboard_object(
_encode_dashboard_content(path.read_bytes()), inject_dropdowns
)
stored_dashboard_templates[id]["dashboard_alt_uid"] = self._generate_alt_uid(id)
self._stored.dashboard_templates = stored_dashboard_templates
if self._charm.unit.is_leader():
for dashboard_relation in self._charm.model.relations[self._relation_name]:
self._upset_dashboards_on_relation(dashboard_relation)
def _generate_alt_uid(self, key: str) -> str:
"""Generate alternative uid for dashboards.
Args:
key: A string used (along with charm.meta.name) to build the hash uid.
Returns: A hash string.
"""
raw_dashboard_alt_uid = "{}-{}".format(self._charm.meta.name, key)
return hashlib.shake_256(raw_dashboard_alt_uid.encode("utf-8")).hexdigest(8)
def _reinitialize_dashboard_data(self, inject_dropdowns: bool = True) -> None:
"""Triggers a reload of dashboard outside of an eventing workflow.
Args:
inject_dropdowns: a :bool: used to indicate whether topology dropdowns should be added
This will destroy any existing relation data.
"""
try:
_resolve_dir_against_charm_path(self._charm, self._dashboards_path)
self._update_all_dashboards_from_dir(inject_dropdowns=inject_dropdowns)
except InvalidDirectoryPathError as e:
logger.warning(
"Invalid Grafana dashboards folder at %s: %s",
e.grafana_dashboards_absolute_path,
e.message,
)
stored_dashboard_templates: Any = self._stored.dashboard_templates # pyright: ignore
for dashboard_id in list(stored_dashboard_templates.keys()):
if dashboard_id.startswith("file:"):
del stored_dashboard_templates[dashboard_id]
self._stored.dashboard_templates = stored_dashboard_templates
# With all the file-based dashboards cleared out, force a refresh
# of relation data
if self._charm.unit.is_leader():
for dashboard_relation in self._charm.model.relations[self._relation_name]:
self._upset_dashboards_on_relation(dashboard_relation)
def _on_grafana_dashboard_relation_created(self, event: RelationCreatedEvent) -> None:
"""Watch for a relation being created and automatically send dashboards.
Args:
event: The :class:`RelationJoinedEvent` sent when a
`grafana_dashboaard` relationship is joined
"""
if self._charm.unit.is_leader():
self._update_all_dashboards_from_dir()
self._upset_dashboards_on_relation(event.relation)
def _on_grafana_dashboard_relation_changed(self, event: RelationChangedEvent) -> None:
"""Watch for changes so we know if there's an error to signal back to the parent charm.
Args:
event: The `RelationChangedEvent` that triggered this handler.
"""
if self._charm.unit.is_leader():
data = json.loads(event.relation.data[event.app].get("event", "{}")) # type: ignore
if not data:
return
valid = bool(data.get("valid", True))
errors = data.get("errors", [])
if valid and not errors:
self.on.dashboard_status_changed.emit(valid=valid) # pyright: ignore
else:
self.on.dashboard_status_changed.emit( # pyright: ignore
valid=valid, errors=errors
)
def _upset_dashboards_on_relation(self, relation: Relation) -> None:
"""Update the dashboards in the relation data bucket."""
# It's completely ridiculous to add a UUID, but if we don't have some
# pseudo-random value, this never makes it across 'juju set-state'
stored_data = {
"templates": _type_convert_stored(self._stored.dashboard_templates), # pyright: ignore
"uuid": str(uuid.uuid4()),
}
relation.data[self._charm.app]["dashboards"] = json.dumps(stored_data)
def _content_to_dashboard_object(self, content: str, inject_dropdowns: bool = True) -> Dict:
return {
"charm": self._charm.meta.name,
"content": content,
"juju_topology": self._juju_topology if inject_dropdowns else {},
"inject_dropdowns": inject_dropdowns,
}
# This is not actually used in the dashboards, but is present to provide a secondary
# salt to ensure uniqueness in the dict keys in case individual charm units provide
# dashboards
@property
def _juju_topology(self) -> Dict:
return {
"model": self._charm.model.name,
"model_uuid": self._charm.model.uuid,
"application": self._charm.app.name,
"unit": self._charm.unit.name,
}
@property
def dashboard_templates(self) -> List:
"""Return a list of the known dashboard templates."""
return list(self._stored.dashboard_templates.values()) # type: ignore
class GrafanaDashboardConsumer(Object):
"""A consumer object for working with Grafana Dashboards."""
on = GrafanaDashboardEvents() # pyright: ignore
_stored = StoredState()
def __init__(
self,
charm: CharmBase,
relation_name: str = DEFAULT_RELATION_NAME,
) -> None:
"""API to receive Grafana dashboards from charmed operators.
The :class:`GrafanaDashboardConsumer` object provides an API
to consume dashboards provided by a charmed operator using the
:class:`GrafanaDashboardProvider` library. The
:class:`GrafanaDashboardConsumer` is integrated in a
charmed operator as follows:
self.grafana = GrafanaDashboardConsumer(self)
To use this library, you need a relation defined as follows in
your charm operator's metadata.yaml:
requires:
grafana-dashboard:
interface: grafana_dashboard
If you would like to use a different relation name than
`grafana-dashboard`, you need to specify the relation name via the
`relation_name` argument. However, it is strongly advised not to
change the default, so that people deploying your charm will have
a consistent experience with all other charms that consume Grafana
dashboards.
Args:
charm: a :class:`CharmBase` object which manages this
:class:`GrafanaProvider` object. Generally this is
`self` in the instantiating class.
relation_name: a :string: name of the relation managed by this
:class:`GrafanaDashboardConsumer`; it defaults to "grafana-dashboard".
"""
_validate_relation_by_interface_and_direction(
charm, relation_name, RELATION_INTERFACE_NAME, RelationRole.requires
)
super().__init__(charm, relation_name)
self._charm = charm
self._relation_name = relation_name
self._tranformer = CosTool(self._charm)
self._stored.set_default(dashboards={}) # type: ignore
self.framework.observe(
self._charm.on[self._relation_name].relation_changed,
self._on_grafana_dashboard_relation_changed,
)
self.framework.observe(
self._charm.on[self._relation_name].relation_broken,
self._on_grafana_dashboard_relation_broken,
)
self.framework.observe(
self._charm.on[DEFAULT_PEER_NAME].relation_changed,
self._on_grafana_peer_changed,
)
def get_dashboards_from_relation(self, relation_id: int) -> List:
"""Get a list of known dashboards for one instance of the monitored relation.
Args:
relation_id: the identifier of the relation instance, as returned by
:method:`ops.model.Relation.id`.
Returns: a list of known dashboards coming from the provided relation instance.
"""
return [
self._to_external_object(relation_id, dashboard)
for dashboard in self._get_stored_dashboards(relation_id)
]
def _on_grafana_dashboard_relation_changed(self, event: RelationChangedEvent) -> None:
"""Handle relation changes in related providers.
If there are changes in relations between Grafana dashboard consumers
and providers, this event handler (if the unit is the leader) will
get data for an incoming grafana-dashboard relation through a
:class:`GrafanaDashboardsChanged` event, and make the relation data
available in the app's datastore object. The Grafana charm can
then respond to the event to update its configuration.
"""
changes = False
if self._charm.unit.is_leader():
changes = self._render_dashboards_and_signal_changed(event.relation)
if changes:
self.on.dashboards_changed.emit() # pyright: ignore
def _on_grafana_peer_changed(self, _: RelationChangedEvent) -> None:
"""Emit dashboard events on peer events so secondary charm data updates."""
if self._charm.unit.is_leader():
return
self.on.dashboards_changed.emit() # pyright: ignore
def update_dashboards(self, relation: Optional[Relation] = None) -> None:
"""Re-establish dashboards on one or more relations.
If something changes between this library and a datasource, try to re-establish
invalid dashboards and invalidate active ones.
Args:
relation: a specific relation for which the dashboards have to be
updated. If not specified, all relations managed by this
:class:`GrafanaDashboardConsumer` will be updated.
"""
if self._charm.unit.is_leader():
relations = (
[relation] if relation else self._charm.model.relations[self._relation_name]
)
for relation in relations:
self._render_dashboards_and_signal_changed(relation)
def _on_grafana_dashboard_relation_broken(self, event: RelationBrokenEvent) -> None:
"""Update job config when providers depart.
When a Grafana dashboard provider departs, the configuration
for that provider is removed from the list of dashboards
"""
if not self._charm.unit.is_leader():
return
self._remove_all_dashboards_for_relation(event.relation)
def _render_dashboards_and_signal_changed(self, relation: Relation) -> bool: # type: ignore
"""Validate a given dashboard.
Verify that the passed dashboard data is able to be found in our list
of datasources and will render. If they do, let the charm know by
emitting an event.
Args:
relation: Relation; The relation the dashboard is associated with.
Returns:
a boolean indicating whether an event should be emitted
"""
other_app = relation.app
raw_data = relation.data[other_app].get("dashboards", "") # pyright: ignore
if not raw_data:
logger.warning(
"No dashboard data found in the %s:%s relation",
self._relation_name,
str(relation.id),
)
return False
data = json.loads(raw_data)
# The only piece of data needed on this side of the relations is "templates"
templates = data.pop("templates")
# The dashboards are WAY too big since this ultimately calls out to Juju to
# set the relation data, and it overflows the maximum argument length for
# subprocess, so we have to use b64, annoyingly.
# Worse, Python3 expects absolutely everything to be a byte, and a plain
# `base64.b64encode()` is still too large, so we have to go through hoops
# of encoding to byte, compressing with lzma, converting to base64 so it
# can be converted to JSON, then all the way back.
rendered_dashboards = []
relation_has_invalid_dashboards = False
for _, (fname, template) in enumerate(templates.items()):
content = None
error = None
topology = template.get("juju_topology", {})
try:
content = _decode_dashboard_content(template["content"])
inject_dropdowns = template.get("inject_dropdowns", True)
content = self._manage_dashboard_uid(content, template)
content = _convert_dashboard_fields(content, inject_dropdowns)
if topology:
content = _inject_labels(content, topology, self._tranformer)
content = _encode_dashboard_content(content)
except lzma.LZMAError as e:
error = str(e)
relation_has_invalid_dashboards = True
except json.JSONDecodeError as e:
error = str(e.msg)
logger.warning("Invalid JSON in Grafana dashboard: {}".format(fname))
continue
# Prepend the relation name and ID to the dashboard ID to avoid clashes with
# multiple relations with apps from the same charm, or having dashboards with
# the same ids inside their charm operators
rendered_dashboards.append(
{
"id": "{}:{}/{}".format(relation.name, relation.id, fname),
"original_id": fname,
"content": content if content else None,
"template": template,
"valid": (error is None),
"error": error,
}
)
if relation_has_invalid_dashboards:
self._remove_all_dashboards_for_relation(relation)
invalid_templates = [
data["original_id"] for data in rendered_dashboards if not data["valid"]
]
logger.warning(
"Cannot add one or more Grafana dashboards from relation '{}:{}': the following "
"templates are invalid: {}".format(
relation.name,
relation.id,
invalid_templates,
)
)
relation.data[self._charm.app]["event"] = json.dumps(
{
"errors": [
{
"dashboard_id": rendered_dashboard["original_id"],
"error": rendered_dashboard["error"],
}
for rendered_dashboard in rendered_dashboards
if rendered_dashboard["error"]
]
}
)
# Dropping dashboards for a relation needs to be signalled
return True
stored_data = rendered_dashboards
currently_stored_data = self._get_stored_dashboards(relation.id)
coerced_data = _type_convert_stored(currently_stored_data) if currently_stored_data else {}
if not coerced_data == stored_data:
stored_dashboards = self.get_peer_data("dashboards")
stored_dashboards[relation.id] = stored_data
self.set_peer_data("dashboards", stored_dashboards)
return True
return None # type: ignore
def _manage_dashboard_uid(self, dashboard: str, template: dict) -> str:
"""Add an uid to the dashboard if it is not present."""
dashboard_dict = json.loads(dashboard)
if not dashboard_dict.get("uid", None) and "dashboard_alt_uid" in template:
dashboard_dict["uid"] = template["dashboard_alt_uid"]
return json.dumps(dashboard_dict)
def _remove_all_dashboards_for_relation(self, relation: Relation) -> None:
"""If an errored dashboard is in stored data, remove it and trigger a deletion."""
if self._get_stored_dashboards(relation.id):
stored_dashboards = self.get_peer_data("dashboards")
stored_dashboards.pop(str(relation.id))
self.set_peer_data("dashboards", stored_dashboards)
self.on.dashboards_changed.emit() # pyright: ignore
def _to_external_object(self, relation_id, dashboard):
return {
"id": dashboard["original_id"],
"relation_id": relation_id,
"charm": dashboard["template"]["charm"],
"content": _decode_dashboard_content(dashboard["content"]),
}
@property
def dashboards(self) -> List[Dict]:
"""Get a list of known dashboards across all instances of the monitored relation.
Returns: a list of known dashboards. The JSON of each of the dashboards is available
in the `content` field of the corresponding `dict`.
"""
dashboards = []
for _, (relation_id, dashboards_for_relation) in enumerate(
self.get_peer_data("dashboards").items()
):
for dashboard in dashboards_for_relation:
dashboards.append(self._to_external_object(relation_id, dashboard))
return dashboards
def _get_stored_dashboards(self, relation_id: int) -> list:
"""Pull stored dashboards out of the peer data bucket."""
return self.get_peer_data("dashboards").get(str(relation_id), {})
def _set_default_data(self) -> None:
"""Set defaults if they are not in peer relation data."""
data = {"dashboards": {}} # type: ignore
for k, v in data.items():
if not self.get_peer_data(k):
self.set_peer_data(k, v)
def set_peer_data(self, key: str, data: Any) -> None:
"""Put information into the peer data bucket instead of `StoredState`."""
self._charm.peers.data[self._charm.app][key] = json.dumps(data) # type: ignore[attr-defined]
def get_peer_data(self, key: str) -> Any:
"""Retrieve information from the peer data bucket instead of `StoredState`."""
data = self._charm.peers.data[self._charm.app].get(key, "") # type: ignore[attr-defined]
return json.loads(data) if data else {}
class GrafanaDashboardAggregator(Object):
"""API to retrieve Grafana dashboards from machine dashboards.
The :class:`GrafanaDashboardAggregator` object provides a way to
collate and aggregate Grafana dashboards from reactive/machine charms
and transport them into Charmed Operators, using Juju topology.
For detailed usage instructions, see the documentation for
:module:`cos-proxy-operator`, as this class is intended for use as a
single point of intersection rather than use in individual charms.
Since :class:`GrafanaDashboardAggregator` serves as a bridge between
Canonical Observability Stack Charmed Operators and Reactive Charms,
deployed in a Reactive Juju model, both a target relation which is
used to collect events from Reactive charms and a `grafana_relation`
which is used to send the collected data back to the Canonical
Observability Stack are required.
In its most streamlined usage, :class:`GrafanaDashboardAggregator` is
integrated in a charmed operator as follows:
self.grafana = GrafanaDashboardAggregator(self)
Args:
charm: a :class:`CharmBase` object which manages this
:class:`GrafanaProvider` object. Generally this is
`self` in the instantiating class.
target_relation: a :string: name of a relation managed by this
:class:`GrafanaDashboardAggregator`, which is used to communicate
with reactive/machine charms it defaults to "dashboards".
grafana_relation: a :string: name of a relation used by this
:class:`GrafanaDashboardAggregator`, which is used to communicate
with charmed grafana. It defaults to "downstream-grafana-dashboard"
"""
_stored = StoredState()
on = GrafanaProviderEvents() # pyright: ignore
def __init__(
self,
charm: CharmBase,
target_relation: str = "dashboards",
grafana_relation: str = "downstream-grafana-dashboard",
):
super().__init__(charm, grafana_relation)
# Reactive charms may be RPC-ish and not leave reliable data around. Keep
# StoredState here
self._stored.set_default( # type: ignore
dashboard_templates={},
id_mappings={},
)
self._charm = charm
self._target_relation = target_relation
self._grafana_relation = grafana_relation
self.framework.observe(
self._charm.on[self._grafana_relation].relation_joined,
self._update_remote_grafana,
)
self.framework.observe(
self._charm.on[self._grafana_relation].relation_changed,
self._update_remote_grafana,
)
self.framework.observe(
self._charm.on[self._target_relation].relation_changed,
self.update_dashboards,
)
self.framework.observe(
self._charm.on[self._target_relation].relation_broken,
self.remove_dashboards,
)
def update_dashboards(self, event: RelationEvent) -> None:
"""If we get a dashboard from a reactive charm, parse it out and update."""
if self._charm.unit.is_leader():
self._upset_dashboards_on_event(event)
def _upset_dashboards_on_event(self, event: RelationEvent) -> None:
"""Update the dashboards in the relation data bucket."""
dashboards = self._handle_reactive_dashboards(event)
if not dashboards:
logger.warning(
"Could not find dashboard data after a relation change for {}".format(event.app)
)
return
for id in dashboards:
self._stored.dashboard_templates[id] = self._content_to_dashboard_object( # type: ignore
dashboards[id], event
)
self._stored.id_mappings[event.app.name] = dashboards # type: ignore
self._update_remote_grafana(event)
def _update_remote_grafana(self, _: Optional[RelationEvent] = None) -> None:
"""Push dashboards to the downstream Grafana relation."""
# It's still ridiculous to add a UUID here, but needed
stored_data = {
"templates": _type_convert_stored(self._stored.dashboard_templates), # pyright: ignore
"uuid": str(uuid.uuid4()),
}
if self._charm.unit.is_leader():
for grafana_relation in self.model.relations[self._grafana_relation]:
grafana_relation.data[self._charm.app]["dashboards"] = json.dumps(stored_data)
def remove_dashboards(self, event: RelationBrokenEvent) -> None:
"""Remove a dashboard if the relation is broken."""
app_ids = _type_convert_stored(self._stored.id_mappings.get(event.app.name, "")) # type: ignore
if not app_ids:
logger.info("Could not look up stored dashboards for %s", event.app.name) # type: ignore
return
del self._stored.id_mappings[event.app.name] # type: ignore
for id in app_ids:
del self._stored.dashboard_templates[id] # type: ignore
stored_data = {
"templates": _type_convert_stored(self._stored.dashboard_templates), # pyright: ignore
"uuid": str(uuid.uuid4()),
}
if self._charm.unit.is_leader():
for grafana_relation in self.model.relations[self._grafana_relation]:
grafana_relation.data[self._charm.app]["dashboards"] = json.dumps(stored_data)
# Yes, this has a fair amount of branching. It's not that complex, though
def _strip_existing_datasources(self, dash: dict) -> dict: # noqa: C901
"""Remove existing reactive charm datasource templating out.
This method iterates through *known* places where reactive charms may set
data in contributed dashboards and removes them.
`dashboard["__inputs"]` is a property sometimes set when exporting dashboards from
the Grafana UI. It is not present in earlier Grafana versions, and can be disabled
in 5.3.4 and above (optionally). If set, any values present will be substituted on
import. Some reactive charms use this for Prometheus. COS uses dropdown selectors
for datasources, and leaving this present results in "default" datasource values
which are broken.
Similarly, `dashboard["templating"]["list"][N]["name"] == "host"` can be used to
set a `host` variable for use in dashboards which is not meaningful in the context
of Juju topology and will yield broken dashboards.
Further properties may be discovered.
"""
try:
if "list" in dash["templating"]:
for i in range(len(dash["templating"]["list"])):
if (
"datasource" in dash["templating"]["list"][i]
and dash["templating"]["list"][i]["datasource"] is not None
):
if "Juju" in dash["templating"]["list"][i].get("datasource", ""):
dash["templating"]["list"][i]["datasource"] = r"${prometheusds}"
# Strip out newly-added 'juju_application' template variables which
# don't line up with our drop-downs
dash_mutable = dash
for i in range(len(dash["templating"]["list"])):
if (
"name" in dash["templating"]["list"][i]
and dash["templating"]["list"][i].get("name", "") == "app"
):
del dash_mutable["templating"]["list"][i]
if dash_mutable:
dash = dash_mutable
except KeyError:
logger.debug("No existing templating data in dashboard")
if "__inputs" in dash:
inputs = dash
for i in range(len(dash["__inputs"])):
if dash["__inputs"][i].get("pluginName", "") == "Prometheus":
del inputs["__inputs"][i]
if inputs:
dash["__inputs"] = inputs["__inputs"]
else:
del dash["__inputs"]
return dash
def _handle_reactive_dashboards(self, event: RelationEvent) -> Optional[Dict]:
"""Look for a dashboard in relation data (during a reactive hook) or builtin by name."""
if not self._charm.unit.is_leader():
return {}
templates = []
id = ""
# Reactive data can reliably be pulled out of events. In theory, if we got an event,
# it's on the bucket, but using event explicitly keeps the mental model in
# place for reactive
for k in event.relation.data[event.unit].keys(): # type: ignore
if k.startswith("request_"):
templates.append(json.loads(event.relation.data[event.unit][k])["dashboard"]) # type: ignore
for k in event.relation.data[event.app].keys(): # type: ignore
if k.startswith("request_"):
templates.append(json.loads(event.relation.data[event.app][k])["dashboard"]) # type: ignore
builtins = self._maybe_get_builtin_dashboards(event)
if not templates and not builtins:
logger.warning("NOTHING!")
return {}
dashboards = {}
for t in templates:
# This seems ridiculous, too, but to get it from a "dashboards" key in serialized JSON
# in the bucket back out to the actual "dashboard" we _need_, this is the way
# This is not a mistake -- there's a double nesting in reactive charms, and
# Grafana won't load it. We have to unbox:
# event.relation.data[event.<type>]["request_*"]["dashboard"]["dashboard"],
# and the final unboxing is below.
#
# Apparently SOME newer dashboards (such as Ceph) do not have this double nesting, so
# now we get to account for both :toot:
dash = t.get("dashboard", {}) or t
# Replace values with LMA-style templating
dash = self._strip_existing_datasources(dash)
dash = json.dumps(dash)
# Replace the old-style datasource templates
dash = re.sub(r"<< datasource >>", r"${prometheusds}", dash)
dash = re.sub(r'"datasource": "prom.*?"', r'"datasource": "${prometheusds}"', dash)
dash = re.sub(
r'"datasource": "\$datasource"', r'"datasource": "${prometheusds}"', dash
)
dash = re.sub(r'"uid": "\$datasource"', r'"uid": "${prometheusds}"', dash)
dash = re.sub(
r'"datasource": "(!?\w)[\w|\s|-]+?Juju generated.*?"',
r'"datasource": "${prometheusds}"',
dash,
)
# Yank out "new"+old LMA topology
dash = re.sub(
r'(,?\s?juju_application=~)\\"\$app\\"', r'\1\\"$juju_application\\"', dash
)
# Replace old piechart panels
dash = re.sub(r'"type": "grafana-piechart-panel"', '"type": "piechart"', dash)
from jinja2 import DebugUndefined, Template
content = _encode_dashboard_content(
Template(dash, undefined=DebugUndefined).render(datasource=r"${prometheusds}") # type: ignore
)
id = "prog:{}".format(content[-24:-16])
dashboards[id] = content
return {**builtins, **dashboards}
def _maybe_get_builtin_dashboards(self, event: RelationEvent) -> Dict:
"""Tries to match the event with an included dashboard.
Scans dashboards packed with the charm instantiating this class, and tries to match
one with the event. There is no guarantee that any given event will match a builtin,
since each charm instantiating this class may include a different set of dashboards,
or none.
"""
builtins = {}
dashboards_path = None
try:
dashboards_path = _resolve_dir_against_charm_path(
self._charm, "src/grafana_dashboards"
)
except InvalidDirectoryPathError as e:
logger.warning(
"Invalid Grafana dashboards folder at %s: %s",
e.grafana_dashboards_absolute_path,
e.message,
)
if dashboards_path:
def is_dashboard(p: Path) -> bool:
return p.is_file() and p.name.endswith((".json", ".json.tmpl", ".tmpl"))
for path in filter(is_dashboard, Path(dashboards_path).glob("*")):
# path = Path(path)
if event.app.name in path.name: # type: ignore
id = "file:{}".format(path.stem)
builtins[id] = self._content_to_dashboard_object(
_encode_dashboard_content(path.read_bytes()), event
)
return builtins
def _content_to_dashboard_object(self, content: str, event: RelationEvent) -> Dict:
return {
"charm": event.app.name, # type: ignore
"content": content,
"juju_topology": self._juju_topology(event),
"inject_dropdowns": True,
}
# This is not actually used in the dashboards, but is present to provide a secondary
# salt to ensure uniqueness in the dict keys in case individual charm units provide
# dashboards
def _juju_topology(self, event: RelationEvent) -> Dict:
return {
"model": self._charm.model.name,
"model_uuid": self._charm.model.uuid,
"application": event.app.name, # type: ignore
"unit": event.unit.name, # type: ignore
}
class CosTool:
"""Uses cos-tool to inject label matchers into alert rule expressions and validate rules."""
_path = None
_disabled = False
def __init__(self, charm):
self._charm = charm
@property
def path(self):
"""Lazy lookup of the path of cos-tool."""
if self._disabled:
return None
if not self._path:
self._path = self._get_tool_path()
if not self._path:
logger.debug("Skipping injection of juju topology as label matchers")
self._disabled = True
return self._path
def apply_label_matchers(self, rules: dict, type: str) -> dict:
"""Will apply label matchers to the expression of all alerts in all supplied groups."""
if not self.path:
return rules
for group in rules["groups"]:
rules_in_group = group.get("rules", [])
for rule in rules_in_group:
topology = {}
# if the user for some reason has provided juju_unit, we'll need to honor it
# in most cases, however, this will be empty
for label in [
"juju_model",
"juju_model_uuid",
"juju_application",
"juju_charm",
"juju_unit",
]:
if label in rule["labels"]:
topology[label] = rule["labels"][label]
rule["expr"] = self.inject_label_matchers(rule["expr"], topology, type)
return rules
def validate_alert_rules(self, rules: dict) -> Tuple[bool, str]:
"""Will validate correctness of alert rules, returning a boolean and any errors."""
if not self.path:
logger.debug("`cos-tool` unavailable. Not validating alert correctness.")
return True, ""
with tempfile.TemporaryDirectory() as tmpdir:
rule_path = Path(tmpdir + "/validate_rule.yaml")
# Smash "our" rules format into what upstream actually uses, which is more like:
#
# groups:
# - name: foo
# rules:
# - alert: SomeAlert
# expr: up
# - alert: OtherAlert
# expr: up
transformed_rules = {"groups": []} # type: ignore
for rule in rules["groups"]:
transformed = {"name": str(uuid.uuid4()), "rules": [rule]}
transformed_rules["groups"].append(transformed)
rule_path.write_text(yaml.dump(transformed_rules))
args = [str(self.path), "validate", str(rule_path)]
# noinspection PyBroadException
try:
self._exec(args)
return True, ""
except subprocess.CalledProcessError as e:
logger.debug("Validating the rules failed: %s", e.output)
return False, ", ".join([line for line in e.output if "error validating" in line])
def inject_label_matchers(self, expression: str, topology: dict, type: str) -> str:
"""Add label matchers to an expression."""
if not topology:
return expression
if not self.path:
logger.debug("`cos-tool` unavailable. Leaving expression unchanged: %s", expression)
return expression
args = [str(self.path), "--format", type, "transform"]
variable_topology = {k: "${}".format(k) for k in topology.keys()}
args.extend(
[
"--label-matcher={}={}".format(key, value)
for key, value in variable_topology.items()
]
)
# Pass a leading "--" so expressions with a negation or subtraction aren't interpreted as
# flags
args.extend(["--", "{}".format(expression)])
# noinspection PyBroadException
try:
return re.sub(r'="\$juju', r'=~"$juju', self._exec(args))
except subprocess.CalledProcessError as e:
logger.debug('Applying the expression failed: "%s", falling back to the original', e)
return expression
def _get_tool_path(self) -> Optional[Path]:
arch = platform.machine()
arch = "amd64" if arch == "x86_64" else arch
res = "cos-tool-{}".format(arch)
try:
path = Path(res).resolve()
path.chmod(0o777)
return path
except NotImplementedError:
logger.debug("System lacks support for chmod")
except FileNotFoundError:
logger.debug('Could not locate cos-tool at: "{}"'.format(res))
return None
def _exec(self, cmd) -> str:
result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE)
output = result.stdout.decode("utf-8").strip()
return output