Kubeflow

  • By Kubeflow Charmers | bundle
  • Cloud
Channel Revision Published
latest/stable 414 01 Dec 2023
latest/candidate 294 24 Jan 2022
latest/beta 430 30 Aug 2024
latest/edge 423 26 Jul 2024
1.9/stable 426 31 Jul 2024
1.9/beta 420 19 Jul 2024
1.9/edge 425 31 Jul 2024
1.8/stable 414 22 Nov 2023
1.8/beta 411 22 Nov 2023
1.8/edge 413 22 Nov 2023
1.7/stable 409 27 Oct 2023
1.7/beta 408 27 Oct 2023
1.7/edge 407 27 Oct 2023
1.6/stable 329 07 Sep 2022
1.6/beta 326 23 Aug 2022
1.6/edge 328 07 Sep 2022
1.4/stable 321 30 Jun 2022
1.4/edge 320 30 Jun 2022
juju deploy kubeflow --channel edge

Upgrading Charmed Kubeflow (CKF) from 1.8 to 1.9 requires upgrading each charm individually. New relations must be added separately. Most charms can be upgraded with a plain juju refresh; however, certain components require additional steps.

CKF 1.9 is incompatible with Charmed MLflow 2.1. If you have Charmed MLflow deployed, do not upgrade to 1.9 until a newer version of Charmed MLflow is released.

Before the upgrade

Before upgrading CKF, you should do the following:

  • Make sure:
    • All pipeline runs are completed and there are no recurring runs enabled.
    • Katib experiments, training jobs and notebooks are not in progress or pending.
  • Back up any important data according to your organisation’s policies. For databases, MinIO bucket pipelines and ML metadata, refer to the backup guide for further details. For restoring that data, refer to the restore guide.

The backup guide above does not guarantee the backup of all Kubeflow resources, such as notebooks and profiles. Make sure to take the appropriate actions to avoid accidental data loss.

  • Record all charm versions, including revisions, in your existing CKF deployment. This can be done by running juju export-bundle.
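As a minimal sketch of the last item, the bundle can be written to a timestamped file so the pre-upgrade channels and revisions are easy to find later. The function name, the default directory and the file name pattern are illustrative assumptions, not part of CKF or Juju.

```shell
# Save the current model's bundle to a timestamped YAML file.
# Hypothetical helper: the directory and file name pattern are arbitrary choices.
save_bundle_snapshot() {
    backup_dir="${1:-$HOME/ckf-upgrade}"
    snapshot="$backup_dir/ckf-bundle-$(date +%Y%m%d-%H%M%S).yaml"
    mkdir -p "$backup_dir" || return 1
    # `juju export-bundle --filename` writes the bundle to the given file.
    juju export-bundle --filename "$snapshot" || return 1
    # Print the path so the caller can record it.
    printf '%s\n' "$snapshot"
}
```

Calling save_bundle_snapshot with no argument writes under $HOME/ckf-upgrade; pass a directory as the first argument to store the file elsewhere.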

Upgrade environment

Juju

As with the latest 1.8 update, Charmed Kubeflow 1.9 is supported on Juju 3.4 (>= 3.4.3). Make sure you are running a compatible version. If needed, follow the instructions to upgrade the deployment.

Kubernetes

Due to Istio, CKF requires a Kubernetes cluster >=1.27 (see Supported versions). Before upgrading to CKF 1.9, make sure this requirement is met.
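The two minimum versions above can be checked with a small comparison helper. This is a sketch that assumes GNU sort with the -V (version sort) option is available; the function name is hypothetical. It only checks the lower bound, so it does not tell you whether a newer Juju series is supported.

```shell
# Return 0 when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (GNU coreutils version sort) is available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (commented out so the helper can be sourced on its own).
# `juju version` is assumed to print something like 3.4.3-genericlinux-amd64,
# so everything after the first dash is stripped before comparing:
#   juju_version="$(juju version)"
#   version_at_least "${juju_version%%-*}" 3.4.3 || echo "Juju is too old"
#   version_at_least 1.29.4 1.27 && echo "Kubernetes version is sufficient"
```

For Kubernetes, pass the server version reported by kubectl version, without the leading "v".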

Upgrade charms

To upgrade charms, you should follow the steps below in the proposed order.

Some charms may go to Blocked state during some steps of the upgrade process. Once the upgrade is completed, all charms should be green and in Active state.

Istio

Istio needs to be upgraded to version 1.22, assuming the deployed istio-pilot and istio-ingressgateway versions are 1.17.

  1. Scale down the istio-ingressgateway application to 0:
juju scale-application istio-ingressgateway 0
  2. To make sure the istio-ingressgateway deployment is removed, run the following command. It should print 0:
kubectl -n kubeflow get deploy istio-ingressgateway-workload 2> >(grep -q "NotFound" && echo $?)
  3. Upgrade the istio-pilot charm through every intermediate version. Run each of the following commands separately and wait until the charm reaches Active state before running the next one:
juju refresh istio-pilot --channel 1.18/stable
juju refresh istio-pilot --channel 1.19/stable
juju refresh istio-pilot --channel 1.20/stable
juju refresh istio-pilot --channel 1.21/stable
juju refresh istio-pilot --channel 1.22/stable
  4. Upgrade and scale up the istio-ingressgateway charm:
juju refresh istio-ingressgateway --channel 1.22/stable
juju scale-application istio-ingressgateway 1

If you encounter any issues during the upgrade, refer to Istio upgrade troubleshooting for more details.
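The channel-by-channel refresh of istio-pilot can be sketched as a loop. The function name is hypothetical, and the use of juju wait-for with this query and timeout is an assumption about the Juju 3.x client rather than something this guide prescribes.

```shell
# Refresh an application through a list of channels, one at a time,
# waiting for Active status after each step.
# Usage: refresh_through_channels APP CHANNEL [CHANNEL...]
refresh_through_channels() {
    app="$1"
    shift
    for channel in "$@"; do
        echo "Refreshing $app to $channel"
        juju refresh "$app" --channel "$channel" || return 1
        # Assumption: `juju wait-for application` accepts this query and
        # timeout; adjust both to your Juju version and environment.
        juju wait-for application "$app" \
            --query='status=="active"' --timeout=15m || return 1
    done
}

# Example (commented out):
#   refresh_through_channels istio-pilot \
#       1.18/stable 1.19/stable 1.20/stable 1.21/stable 1.22/stable
```

The application may still report Active for a moment right after juju refresh, so treat the wait as a convenience and confirm with juju status before moving on to the next channel.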

PodSpec to Sidecar charms

Some charms were rewritten from PodSpec to Sidecar between CKF 1.8 and 1.9.

Mlmd

  1. Back up ML metadata following the instructions from this guide for MLMD <= 1.14 and CKF 1.8.

  2. Remove the relation with requirer charms (envoy and kfp-metadata-writer):

juju remove-relation envoy mlmd
juju remove-relation kfp-metadata-writer mlmd

Note that the grpc relations are restored once the requirer charms are upgraded. You will do that in the “Add grpc relations” step of the Charms with refresh section.

  3. Remove the mlmd application:

This wipes out the storage attached to the mlmd charm, that is, the database handled by this charm. Make sure you have performed the backup from step 1.

You must wait for the application to disappear (takes less than a minute).

juju remove-application mlmd --destroy-storage
  4. Deploy mlmd from the channel corresponding to CKF 1.9:
juju deploy mlmd --channel ckf-1.9/stable --trust
  5. Restore ML metadata following the instructions for MLMD > 1.14 and CKF 1.9.
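Waiting for the mlmd application to disappear before redeploying it can be sketched as a polling loop. The function name and the default retry values are hypothetical. The sketch treats any failure of juju show-application as "application gone", which is also what happens if the controller is unreachable, so check juju status if the result looks suspicious.

```shell
# Poll until `juju show-application APP` fails, which is taken here to mean
# the application no longer exists in the model.
# Usage: wait_until_removed APP [ATTEMPTS] [DELAY_SECONDS]
wait_until_removed() {
    app="$1"
    attempts="${2:-30}"   # number of polls before giving up
    delay="${3:-5}"       # seconds to sleep between polls
    while [ "$attempts" -gt 0 ]; do
        if ! juju show-application "$app" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "$app is still present" >&2
    return 1
}

# Example (commented out):
#   juju remove-application mlmd --destroy-storage
#   wait_until_removed mlmd && juju deploy mlmd --channel ckf-1.9/stable --trust
```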

Rest of PodSpec charms

With Juju 3.4, each of these applications must be scaled down, refreshed, and then scaled back up.

If CKF is deployed on AKS, skip this section and instead follow the Rest of PodSpec charms on AKS section.

  1. Scale down applications:

You must wait for the units to disappear (takes less than a minute).

juju scale-application katib-controller 0
juju scale-application kubeflow-volumes 0
juju scale-application envoy 0
  2. Refresh to the new charms:
juju refresh katib-controller --channel 0.17/stable --trust
juju refresh kubeflow-volumes --channel 1.9/stable --trust
juju refresh envoy --channel 2.2/stable --trust
  3. Scale up the applications:
juju scale-application katib-controller 1
juju scale-application kubeflow-volumes 1
juju scale-application envoy 1
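The three phases above can be sketched as two small helpers driven by one list of application and channel pairs. The variable and function names are hypothetical; the applications and channels are the ones listed in the steps.

```shell
# APP:CHANNEL pairs taken from the steps above.
PODSPEC_CHARMS="katib-controller:0.17/stable kubeflow-volumes:1.9/stable envoy:2.2/stable"

# Scale every listed application to the given number of units.
podspec_scale() {
    for pair in $PODSPEC_CHARMS; do
        juju scale-application "${pair%%:*}" "$1" || return 1
    done
}

# Refresh every listed application to its target channel.
podspec_refresh() {
    for pair in $PODSPEC_CHARMS; do
        juju refresh "${pair%%:*}" --channel "${pair#*:}" --trust || return 1
    done
}

# Example (commented out):
#   podspec_scale 0      # then wait for the units to disappear
#   podspec_refresh
#   podspec_scale 1
```

The sketch does not wait between phases. Check juju status after scaling down and continue only once the units are gone.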

Rest of PodSpec charms on AKS

Due to this bug, the standard PodSpec charm upgrade path with juju refresh on AKS leaves the charms stuck in Unknown status, unable to spin up a new refreshed unit. Instead, you can apply the following workaround:

  1. The commands below prevent the loss of your workloads created by Katib:
kubectl annotate crd experiments.kubeflow.org controller.juju.is/id-
kubectl annotate crd experiments.kubeflow.org model.juju.is/id-
kubectl label crd experiments.kubeflow.org app.juju.is/created-by-
kubectl label crd experiments.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd experiments.kubeflow.org app.kubernetes.io/name-
kubectl label crd experiments.kubeflow.org model.juju.is/name-

kubectl annotate crd trials.kubeflow.org controller.juju.is/id-
kubectl annotate crd trials.kubeflow.org model.juju.is/id-
kubectl label crd trials.kubeflow.org app.juju.is/created-by-
kubectl label crd trials.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd trials.kubeflow.org app.kubernetes.io/name-
kubectl label crd trials.kubeflow.org model.juju.is/name-

kubectl annotate crd suggestions.kubeflow.org controller.juju.is/id-
kubectl annotate crd suggestions.kubeflow.org model.juju.is/id-
kubectl label crd suggestions.kubeflow.org app.juju.is/created-by-
kubectl label crd suggestions.kubeflow.org app.kubernetes.io/managed-by-
kubectl label crd suggestions.kubeflow.org app.kubernetes.io/name-
kubectl label crd suggestions.kubeflow.org model.juju.is/name-
  2. Remove PodSpec charms:
juju remove-application katib-controller
juju remove-application kubeflow-volumes
juju remove-application envoy
  3. Wait until the applications have been removed. To make sure all related resources are removed, run the following commands. Each should print 0:
juju show-application katib-controller 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy katib-controller 2> >(grep -q "NotFound" && echo $?)
juju show-application kubeflow-volumes 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy kubeflow-volumes 2> >(grep -q "NotFound" && echo $?)
juju show-application envoy 2> >(grep -q "not found" && echo $?)
kubectl -n kubeflow get deploy envoy 2> >(grep -q "NotFound" && echo $?)
  4. Deploy the new charms and add the relations:
juju deploy envoy --channel 2.2/stable --trust
juju deploy kubeflow-volumes --channel 1.9/stable --trust
juju deploy katib-controller --channel 0.17/stable --trust
juju integrate kubeflow-dashboard:links kubeflow-volumes:dashboard-links
juju integrate istio-pilot:ingress kubeflow-volumes:ingress
juju integrate istio-pilot:ingress envoy:ingress
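Step 1 of the workaround repeats the same six commands for three CRDs. The same calls can be sketched as a nested loop; the function name is hypothetical, and the annotation and label keys are exactly those listed in step 1.

```shell
# Remove the Juju ownership annotations and labels from the given CRDs,
# so that removing the charm does not delete them.
# Usage: strip_juju_metadata CRD [CRD...]
strip_juju_metadata() {
    for crd in "$@"; do
        # A trailing dash tells kubectl to remove the key.
        for key in controller.juju.is/id model.juju.is/id; do
            kubectl annotate crd "$crd" "$key-" || return 1
        done
        for key in app.juju.is/created-by app.kubernetes.io/managed-by \
                   app.kubernetes.io/name model.juju.is/name; do
            kubectl label crd "$crd" "$key-" || return 1
        done
    done
}

# Example (commented out):
#   strip_juju_metadata experiments.kubeflow.org trials.kubeflow.org \
#       suggestions.kubeflow.org
```

The function stops at the first kubectl failure, so a partial run is easy to spot before the charms are removed.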

Charms with refresh

  1. Upgrade the rest of the charms with juju refresh:
juju refresh admission-webhook --channel 1.9/stable
juju refresh argo-controller --channel 3.4/stable
juju refresh dex-auth --channel 2.39/stable
juju refresh jupyter-controller --channel 1.9/stable
juju refresh jupyter-ui --channel 1.9/stable
juju refresh katib-db-manager --channel 0.17/stable
juju refresh katib-ui --channel 0.17/stable
juju refresh kfp-api --channel 2.2/stable
juju refresh kfp-metadata-writer --channel 2.2/stable
juju refresh kfp-persistence --channel 2.2/stable
juju refresh kfp-profile-controller --channel 2.2/stable
juju refresh kfp-schedwf --channel 2.2/stable
juju refresh kfp-ui --channel 2.2/stable
juju refresh kfp-viewer --channel 2.2/stable
juju refresh kfp-viz --channel 2.2/stable
juju refresh knative-eventing --channel 1.12/stable
juju refresh knative-operator --channel 1.12/stable
juju refresh knative-serving --channel 1.12/stable
juju refresh kserve-controller --channel 0.13/stable
juju refresh kubeflow-dashboard --channel 1.9/stable
juju refresh kubeflow-profiles --channel 1.9/stable
juju refresh kubeflow-roles --channel 1.9/stable
juju refresh metacontroller-operator --channel 3.0/stable
juju refresh minio --channel ckf-1.9/stable
juju refresh oidc-gatekeeper --channel ckf-1.9/stable
juju refresh pvcviewer-operator --channel 1.9/stable
juju refresh tensorboard-controller --channel 1.9/stable
juju refresh tensorboards-web-app --channel 1.9/stable
juju refresh training-operator --channel 1.8/stable
  2. Add grpc relations to mlmd:
juju integrate envoy:grpc mlmd:grpc
juju integrate kfp-metadata-writer:grpc mlmd:grpc
  3. Add new relations:
juju integrate katib-db-manager:k8s-service-info katib-controller:k8s-service-info
juju integrate kubeflow-dashboard:links training-operator:dashboard-links
juju integrate oidc-gatekeeper:dex-oidc-config dex-auth:dex-oidc-config
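With this many charms to refresh in step 1, it may help to drive juju refresh from a list. The sketch below reads "APP CHANNEL" pairs from standard input and stops at the first failure, naming the charm so the run can be resumed from there. The function name and the input format are assumptions made for illustration.

```shell
# Read "APP CHANNEL" pairs from standard input and refresh each in order.
# Blank lines and lines starting with '#' are skipped.
refresh_from_list() {
    while read -r app channel; do
        case "$app" in ''|'#'*) continue ;; esac
        echo "Refreshing $app to $channel"
        # Redirect stdin so juju cannot consume the remaining list.
        juju refresh "$app" --channel "$channel" </dev/null || {
            echo "Refresh failed at $app" >&2
            return 1
        }
    done
}

# Example (commented out), with a hypothetical charms.txt containing lines
# such as "admission-webhook 1.9/stable" and "argo-controller 3.4/stable":
#   refresh_from_list < charms.txt
```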

See also