Telemetry labels in the Grafana ecosystem
Any application on any node may produce telemetry (e.g. metrics, logs). When telemetry from multiple sources is stored in a centralized database, we need to be able to differentiate telemetry by source (origin). This is accomplished with telemetry labels.
A telemetry label is a key-value pair. Telemetry labels can be specified:
- in the telemetry items themselves
- in ingestion jobs (“scrape configs”)
Telemetry labels are used throughout the Grafana ecosystem.
Metric labels
An app may expose labelled metrics under a /metrics endpoint.
A simple way to see this in action is to find an instrumented app and curl its /metrics endpoint.
One such app is Prometheus:
$ sudo snap install prometheus
$ curl localhost:9090/metrics
# -- snip --
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 14
# -- snip --
# HELP prometheus_http_requests_total Counter of HTTP requests.
# TYPE prometheus_http_requests_total counter
prometheus_http_requests_total{code="200",handler="/metrics"} 128
prometheus_http_requests_total{code="302",handler="/"} 1
# ...
In the example above:
- process_open_fds is a metric without any labels
- prometheus_http_requests_total is a metric with two labels (code and handler)
Scrape job labels for metrics
While metric labels are set by the app developer, the monitoring service can append an additional fixed set of labels to all the metrics scraped by the same scrape job. Prometheus and Grafana Agent are two examples of monitoring services capable of scraping metrics.
For Prometheus (or Grafana Agent) to scrape our apps (targets), we need to specify in its configuration file where to find them. This is also where we specify telemetry labels.
scrape_configs:
  - job_name: "some-app-scrape-job"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["hostname.for.my.app:8080"]
        labels:
          location: "second_floor_third_server_from_the_left"
          purpose: "weather_station_cluster"
Labels that are specified under a static_configs entry are automatically “appended” to all metrics scraped from the targets:
$ curl -s --data-urlencode 'match[]={__name__="prometheus_http_requests_total"}' localhost:9090/api/v1/series | jq '.data'
[
  {
    "__name__": "prometheus_http_requests_total",
    "code": "200",
    "handler": "/metrics",
    "instance": "localhost:9090",
    "job": "prometheus",
    "location": "second_floor_third_server_from_the_left",
    "purpose": "weather_station_cluster"
  },
  {
    "__name__": "prometheus_http_requests_total",
    "code": "302",
    "handler": "/",
    "instance": "localhost:9090",
    "job": "prometheus",
    "location": "second_floor_third_server_from_the_left",
    "purpose": "weather_station_cluster"
  }
]
Similarly, “service labels” can be specified when using Prometheus’s remote-write endpoint or Pushgateway, and in Grafana Agent’s config file.
Log labels
Logs (“streams”) ingested by Loki are searchable by the specified labels. If you push logs directly to Loki, you can attach labels to every “stream” pushed. In Loki’s terminology, a stream is a sequence of log lines that share the same set of labels; a single push request can carry one or more streams:
{
  "streams": [
    {
      "stream": {
        "label": "value"
      },
      "values": [
        [ "<unix epoch in nanoseconds>", "<log line>" ],
        [ "<unix epoch in nanoseconds>", "<log line>" ]
      ]
    }
  ]
}
Scrape job labels for logs
Log files can be scraped by Promtail or Grafana Agent, which then stream the log lines to Loki using Loki’s push API endpoint.
Promtail, similar to Grafana Agent, has a scrape_configs section in its config file for specifying targets (log filenames) and associating labels with them.
See also Grafana Agent’s config file docs.
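As a rough sketch of what such a Promtail config can look like, the fragment below tails a set of log files and attaches fixed labels to every line read from them. The Loki URL, job name, file path, and label values are illustrative assumptions, not taken from a real deployment.

```yaml
# Hypothetical Promtail config fragment; URL, names, and paths are assumptions.
clients:
  - url: http://localhost:3100/loki/api/v1/push   # assumed Loki address

scrape_configs:
  - job_name: "some-app-logs"
    static_configs:
      - targets: ["localhost"]
        labels:
          # Fixed labels attached to every log line from this job.
          job: "some-app"
          location: "second_floor_third_server_from_the_left"
          # __path__ is the special label that tells Promtail which files to tail.
          __path__: /var/log/some-app/*.log
```

With a config along these lines, the resulting log streams could be selected in Loki with a query such as {job="some-app"}.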
Alert labels
By design, Prometheus (and Loki) store all alert rules in a centralized fashion: if you want your alerts to be evaluated, you must place the rule files on the filesystem somewhere accessible by Prometheus, and specify their paths in Prometheus’s config file:
rule_files:
  - /path/to/*.rules
  - /another/one/*.yaml
Alert definitions are not tied to any particular node, application or metric. This gives high flexibility in defining an alert. You could define an alert that triggers for any node that runs out of space, and another alert that triggers only for a specific application on a specific node. Narrowing down the scope of an alert is accomplished by using telemetry labels.
- expr: process_cpu_seconds_total > 0.12 would trigger if the value of any metric with this name (regardless of any labels) exceeds 0.12.
- expr: process_cpu_seconds_total{region="europe", app="nginx"} > 0.12 would trigger only for metrics with this name that are also labeled app="nginx" and region="europe".
When an on-caller receives an alert (via Alertmanager, Karma or similar), they see a rendering of the alert, which includes the expr and label values, among a few additional fields.
Additional alert labels can be specified in the alert definition:
labels:
  severity: critical
This is useful for:
- Filtering alert rules (see grouping, inhibition, silences).
- Enriching the message an on-caller sees with additional metadata.
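To show where expr and labels sit within a rule file, here is a minimal sketch of a complete Prometheus alert rule. The group name, alert name, for duration, and annotation text are hypothetical; only the expression and the severity label come from the text above.

```yaml
# Hypothetical rule file, e.g. one matched by a rule_files glob.
groups:
  - name: "example-alerts"            # assumed group name
    rules:
      - alert: "NginxHighCpuEurope"   # assumed alert name
        # Label matchers narrow the alert to one app in one region.
        expr: process_cpu_seconds_total{region="europe", app="nginx"} > 0.12
        # Assumed duration: the condition must hold this long before firing.
        for: 5m
        # Extra labels attached to the alert itself, on top of the series labels.
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
```

The labels carried by the matching series (such as region, app, and instance) are attached to the fired alert alongside severity, so routing and silencing in Alertmanager can match on any of them.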
Relabeling
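relabel_configs and metric_relabel_configs rewrite labels at two different stages of a scrape job: relabel_configs is applied to a target’s labels before the scrape, while metric_relabel_configs is applied to each scraped sample (including its metric name, which is stored in the __name__ label) before ingestion.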
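The fragment below is a small sketch of both settings inside one scrape job, reusing the job name and target from the earlier example. In Prometheus, relabel_configs runs against a target's labels before the scrape, and metric_relabel_configs runs against the scraped samples afterwards. The host label and the go_ prefix rule are illustrative assumptions.

```yaml
scrape_configs:
  - job_name: "some-app-scrape-job"
    static_configs:
      - targets: ["hostname.for.my.app:8080"]
    relabel_configs:
      # Before the scrape: copy the target address into a custom "host" label
      # (the default action is "replace").
      - source_labels: [__address__]
        target_label: host
    metric_relabel_configs:
      # After the scrape: drop every series whose metric name starts with "go_".
      - source_labels: [__name__]
        regex: "go_.*"
        action: drop
```

Because the metric name is exposed as the __name__ label, the same mechanism covers both label rewrites and metric-name filtering.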
Last updated 8 months ago.