Kubeflow
- Kubeflow Charmers | bundle
- Cloud

| Channel | Revision | Published |
|---|---|---|
| latest/candidate | 294 | 24 Jan 2022 |
| latest/beta | 430 | 30 Aug 2024 |
| latest/edge | 423 | 26 Jul 2024 |
| 1.9/stable | 426 | 31 Jul 2024 |
| 1.9/beta | 420 | 19 Jul 2024 |
| 1.9/edge | 425 | 31 Jul 2024 |
| 1.8/stable | 414 | 22 Nov 2023 |
| 1.8/beta | 411 | 22 Nov 2023 |
| 1.8/edge | 413 | 22 Nov 2023 |
| 1.7/stable | 409 | 27 Oct 2023 |
| 1.7/beta | 408 | 27 Oct 2023 |
| 1.7/edge | 407 | 27 Oct 2023 |

juju deploy kubeflow --channel 1.9/beta
Backup Charmed Kubeflow
The following instructions allow you to back up and restore the Charmed Kubeflow (CKF) control plane data to a compatible S3 storage.
These steps are expected to be followed as a whole when backing up the CKF control plane, that is, backing up all databases, the pipelines MinIO bucket, and the ML Metadata database at the same time. Failing to do so may result in data loss.
Running Kubeflow pipelines and Katib experiments can affect the outcome of the backup. Please make sure all pipelines and experiments are stopped and no other processes are calling them (e.g. Jupyter Notebooks).
User workloads in user namespaces will not be backed up.
Pre-requisites
- Access to an S3 storage - only AWS S3 and S3 RadosGW are supported. This S3 storage will be used for storing all backup data from the CKF control plane.
- Admin access to the Kubernetes cluster where CKF is deployed.
- Juju admin access to the kubeflow model.
- The yq binary.
- Enough local storage to back up the data.
Configure rclone

rclone is a tool that allows file management in cloud storage. It will be used for backing up several files throughout this guide, and it can be installed as a snap:

sudo snap install rclone
Connect to a shared S3 storage

1. Configure rclone to connect to the shared S3 storage. The following can be used as reference:

[remote-s3]
type = s3
provider = AWS
env_auth = true
access_key_id = ...
secret_access_key = ...
region = eu-central-1
acl = private
server_side_encryption = AES256

You can check where this configuration file is located with rclone config file.

2. Save the name of the S3 remote in an environment variable:

RCLONE_S3_REMOTE=remote-s3
Connect to CKF MinIO

1. The following steps require an accessible MinIO endpoint, which can be achieved by port forwarding the minio Service:

kubectl port-forward -n kubeflow svc/minio 9000:9000
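Optionally, you can confirm the endpoint is reachable before continuing. A minimal sketch, assuming the port-forward above is running and that MinIO serves its standard liveness endpoint:

```shell
# Probe MinIO's liveness endpoint through the port-forward.
# An HTTP 200 response means the endpoint is reachable.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9000/minio/health/live
```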
2. Get minio's secret-key value:

juju show-unit kfp-ui/0 \
    | yq '.kfp-ui/0.relation-info.[] | select (.endpoint == "object-storage") | .application-data.data' \
    | yq '.secret-key'

3. Get minio's access-key:

juju config minio access-key

4. Configure rclone to connect to CKF MinIO. The following can be used as reference:

[minio-ckf]
type = s3
provider = Minio
access_key_id = minio
secret_access_key = ...
endpoint = http://localhost:9000
acl = private

5. Save the name of the MinIO remote in an environment variable:

RCLONE_MINIO_CKF_REMOTE=minio-ckf
Backup CKF databases to S3 storage

CKF uses katib-db and kfp-db as databases for Katib and Kubeflow pipelines respectively.

1. Deploy and configure the s3-integrator to connect to the shared S3 storage. Follow the S3 AWS and S3 RadosGW configuration guides for this step.
2. Scale up kfp-db and katib-db. This step prevents the Primary database from becoming unavailable during backup:

juju scale-application kfp-db 2
juju scale-application katib-db 2
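Before creating the backups, it can help to confirm both applications have settled after scaling. A simple check, assuming a standard Juju client:

```shell
# Inspect the status of both database applications; wait until all
# units report active/idle before proceeding with the backup
juju status kfp-db katib-db
```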
3. Create a backup for each database. Please replace mysql-k8s with the name of the database you intend to back up in the commands from that guide, e.g. katib-db instead of mysql-k8s.
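As a concrete sketch, the Charmed MySQL K8s charm exposes a create-backup action; assuming a Juju 3.x client and that the s3-integrator relations from the referenced guide are in place, the per-database backups could look like:

```shell
# Run the charm's create-backup action on each database's leader unit
# (Juju 3.x syntax is assumed; Juju 2.9 uses `juju run-action --wait` instead)
juju run kfp-db/leader create-backup
juju run katib-db/leader create-backup
```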
Backup ML Metadata using sqlite3

The mlmd charm uses a SQLite database to store ML metadata generated from Kubeflow pipelines.

1. Install the required tools inside the application container.

This step expects the mlmd application container to have internet access. If that is not the case, please check Backup ML Metadata using kubectl cp.
# MLMD > 1.14, CKF 1.9
MLMD_POD="mlmd-0"
MLMD_CONTAINER="mlmd-grpc-server"
# MLMD 1.14, CKF 1.8
MLMD_POD="mlmd-0"
MLMD_CONTAINER="mlmd"
kubectl exec -n kubeflow $MLMD_POD -c $MLMD_CONTAINER -- \
/bin/bash -c "apt update && apt install sqlite3 -y"
2. Scale down kfp-metadata-writer
This is done to prevent any additional writes to MLMD.
juju scale-application kfp-metadata-writer 0
3. Perform a database backup
This will dump all the contents of the database into a compressed text file inside the mlmd-0 container.
MLMD_BACKUP=mlmd-$(date -d "today" +"%Y-%m-%d-%H-%M").dump.gz
kubectl exec -n kubeflow $MLMD_POD -c $MLMD_CONTAINER -- \
/bin/bash -c \
"sqlite3 /data/mlmd.db .dump | gzip -c >/tmp/$MLMD_BACKUP"
4. Copy the backup file to local storage.
In this step we’ll copy the dump of MLMD DB into the local machine that executes the commands.
kubectl cp -n kubeflow -c $MLMD_CONTAINER \
$MLMD_POD:/tmp/$MLMD_BACKUP \
./$MLMD_BACKUP
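Optionally, you can verify that the copied archive is not corrupted before uploading it; gzip -t tests integrity without extracting:

```shell
# Test the integrity of the gzip archive; exits non-zero if it is corrupted
gzip -t ./$MLMD_BACKUP && echo "backup archive OK"
```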
5. Copy the MLMD backup data to the S3 storage
In this step we’ll move the local copy of the MLMD DB dump to the S3 bucket that will store all the backup artifacts.
S3_BUCKET=backup-bucket-2024
RCLONE_S3_REMOTE=remote-s3
RCLONE_BWIDTH_LIMIT=20M
rclone --size-only copy \
--bwlimit $RCLONE_BWIDTH_LIMIT \
./$MLMD_BACKUP \
$RCLONE_S3_REMOTE:$S3_BUCKET
Optionally, you can remove the MLMD dump from your local machine:
rm -rf $MLMD_BACKUP
6. Scale up kfp-metadata-writer
juju scale-application kfp-metadata-writer 1
Backup mlpipeline MinIO bucket

Sync all files from minio to the shared S3 storage:

S3_BUCKET=backup-bucket-2024
RCLONE_S3_REMOTE=remote-s3
RCLONE_BWIDTH_LIMIT=20M
rclone --size-only sync \
 --bwlimit $RCLONE_BWIDTH_LIMIT \
 $RCLONE_MINIO_CKF_REMOTE:mlpipeline \
 $RCLONE_S3_REMOTE:$S3_BUCKET/mlpipeline
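To verify the sync completed, rclone check can compare the two sides; with --size-only it uses the same match criteria as the transfer above (assuming the same remotes and bucket):

```shell
# Compare source and destination by file size; reports any missing
# or differing files between the MinIO bucket and the S3 copy
rclone check --size-only \
  $RCLONE_MINIO_CKF_REMOTE:mlpipeline \
  $RCLONE_S3_REMOTE:$S3_BUCKET/mlpipeline
```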
Alternative backup methods
Backup ML Metadata using kubectl cp
The mlmd charm uses a SQLite database to store ML metadata generated from Kubeflow pipelines.
1. Scale down kfp-metadata-writer
This is done to prevent any additional writes to MLMD.
juju scale-application kfp-metadata-writer 0
2. Copy the backup file to local storage.

This step creates a copy of the MLMD DB on the local machine that executes the commands.

# MLMD > 1.14, CKF 1.9
MLMD_POD="mlmd-0"
MLMD_CONTAINER="mlmd-grpc-server"

# MLMD 1.14, CKF 1.8
MLMD_POD="mlmd-0"
MLMD_CONTAINER="mlmd"

MLMD_BACKUP=mlmd-$(date -d "today" +"%Y-%m-%d-%H-%M").db
kubectl cp -n kubeflow -c $MLMD_CONTAINER \
$MLMD_POD:/data/mlmd.db \
./$MLMD_BACKUP
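Since this method copies the raw SQLite file rather than a dump, you can optionally confirm the copy is a consistent database before uploading it. This assumes the sqlite3 binary is installed on the local machine:

```shell
# PRAGMA integrity_check prints "ok" for a healthy SQLite database
sqlite3 ./$MLMD_BACKUP "PRAGMA integrity_check;"
```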
3. Copy the MLMD backup data to the S3 storage

In this step we’ll move the local copy of the MLMD DB to the S3 bucket that will store all the backup artifacts.
S3_BUCKET=backup-bucket-2024
RCLONE_S3_REMOTE=remote-s3
RCLONE_BWIDTH_LIMIT=20M
rclone --size-only copy \
--bwlimit $RCLONE_BWIDTH_LIMIT \
./$MLMD_BACKUP \
$RCLONE_S3_REMOTE:$S3_BUCKET
Optionally, you can remove the MLMD backup from your local machine:
rm -rf $MLMD_BACKUP
4. Scale up kfp-metadata-writer
juju scale-application kfp-metadata-writer 1