Kubeflow

  • By Kubeflow Charmers | bundle

This guide presents the Grafana dashboards provided by Charmed Kubeflow (CKF). See Grafana dashboards for more details.

All Grafana dashboards provided by CKF use the ckf tag.
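
Because the dashboards share the ckf tag, they can be listed programmatically through Grafana's dashboard search endpoint (`/api/search`), which accepts a `tag` filter. The following is a minimal sketch, not part of CKF itself: the base URL, the helper names and the sample entries are hypothetical, and authentication is left out.

```python
from urllib.parse import urlencode, urljoin


def ckf_search_url(grafana_base_url: str, tag: str = "ckf") -> str:
    """Build a Grafana search URL that lists dashboards carrying `tag`."""
    # type=dash-db restricts the search results to dashboards (no folders).
    query = urlencode({"type": "dash-db", "tag": tag})
    # Keep any path prefix of the base URL (e.g. when Grafana sits behind an ingress).
    return urljoin(grafana_base_url.rstrip("/") + "/", "api/search") + "?" + query


def dashboard_titles(search_results: list[dict], tag: str = "ckf") -> list[str]:
    """Return the sorted titles of search results that carry `tag`."""
    return sorted(r["title"] for r in search_results if tag in r.get("tags", []))


# Hypothetical entries, shaped like items of a Grafana search response.
sample = [
    {"title": "Istio control plane", "tags": ["ckf", "istio"]},
    {"title": "Some other dashboard", "tags": ["misc"]},
    {"title": "CKF charms state", "tags": ["ckf"]},
]

print(ckf_search_url("http://grafana.example:3000"))
print(dashboard_titles(sample))
```

In a real deployment, the URL would be fetched with an authenticated HTTP client and the JSON response passed to `dashboard_titles`.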

Generic dashboards

CKF charms state

This dashboard shows the state of each CKF charm: up (green) or down (red). It covers only charms that provide metrics. See Prometheus metrics for the list of those charms.
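
As an illustration of how an up/down view can be derived, the sketch below parses a Prometheus instant-query response for the standard `up` series (1 when a scrape target is reachable, 0 otherwise). It rests on assumptions: that the dashboard relies on `up` is not stated in this guide, and the `juju_application` label, the function name and the sample values are hypothetical choices for the example.

```python
def charm_states(query_response: dict, label: str = "juju_application") -> dict[str, str]:
    """Map each application to "up" or "down" from an instant-query result for `up`."""
    states: dict[str, str] = {}
    for series in query_response["data"]["result"]:
        name = series["metric"].get(label, "unknown")
        # Instant-vector samples are [timestamp, "value-as-string"] pairs.
        _timestamp, value = series["value"]
        is_up = float(value) == 1.0
        # An application counts as up only if every one of its targets is up.
        if not is_up or name not in states:
            states[name] = "up" if is_up else "down"
    return states


# Hypothetical response in the shape returned by Prometheus /api/v1/query?query=up.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "juju_application": "katib-controller"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "juju_application": "minio"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

print(charm_states(sample))
```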

[Image: ckf-generic-dashboard]

Istio control plane

This dashboard provides a general overview of the health and performance of the Istio control plane. It combines metrics from istio-pilot and istio-gateway.

See Visualizing Istio metrics with Grafana for more details.

[Image: istio-control-plane]

Pipelines

The following dashboards provide visualisations related to Kubeflow Pipelines (KFP).

ArgoWorkflow metrics

The metrics from the Argo Workflow controller expose the status of Argo Workflow custom resources, including the following information:

  1. The number of workflows that have failed or are in an error state.
  2. The time workflows spend in the queue before being run.
  3. The total size of captured logs that are pushed to S3 from the workflow pods.

Envoy service

The metrics from the Envoy proxy expose the history of requests proxied from the KFP user interface to the ML Metadata (MLMD) application, including the following information:

  1. The total number of requests.
  2. The success rate of requests, where any response with a non-5xx status code counts as a success, as well as the number of requests with a 4xx response, either upstream or downstream.
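
The success rate described above treats any non-5xx response as a success, so 4xx responses count towards it even though they are also reported separately. A small worked sketch of that arithmetic follows; the function name, the input shape and the sample counts are hypothetical and are not taken from the dashboard's actual queries.

```python
def success_rate(requests_by_class: dict[str, int]) -> float:
    """Return the fraction of requests whose response was not 5xx."""
    total = sum(requests_by_class.values())
    if total == 0:
        # Design choice for this sketch: no traffic means no failures.
        return 1.0
    failed = requests_by_class.get("5xx", 0)
    return (total - failed) / total


# Hypothetical request counts per response class over some time window.
counts = {"2xx": 940, "4xx": 50, "5xx": 10}

print(f"success rate: {success_rate(counts):.2%}")
print(f"4xx responses: {counts['4xx']}")
```

With these sample counts, 990 of the 1000 requests are non-5xx, so the success rate is 99%.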

MinIO dashboard

The metrics from MinIO expose the status of the S3 storage instance used by KFP, including the following information:

  1. S3 available space and storage capacity.
  2. S3 traffic.
  3. S3 API request errors and data transferred.
  4. Node CPU, memory, file descriptors and IO usage.

Notebooks

The following dashboards provide visualisations related to Kubeflow Notebooks.

Jupyter Notebook controller

The metrics from the Jupyter controller expose the status of Jupyter Notebook custom resources.

Experiments

The following dashboards provide visualisations related to Katib experiments.

Katib status

The metrics from the Katib controller expose the status of Experiment and Trial custom resources.

Serving models

The following dashboards provide visualisations related to serving ML models.

Seldon Core

The metrics from the Seldon Core controller expose the status of Seldon Deployment custom resources, also called models, including the Seldon deployments currently available on the controller.