Charmed Spark K8s

Manage Service Accounts using the Python API

The spark-client snap relies on the spark8t toolkit. spark8t provides both a CLI and a programmatic interface to enhanced Apache Spark client functionalities.

Here we describe how to use the spark8t toolkit (as part of the spark-client snap) to manage service accounts using Python.

Preparation

The spark8t package is already part of the snap. However, if the Python package is used outside of the snap context, please make sure that the environment settings (described in the tool’s README) are correctly configured.

Furthermore, you need to make sure that PYTHONPATH contains the location where the spark8t libraries are installed within the snap (something like /snap/spark-client/current/lib/python3.10/site-packages).
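As a minimal sketch, you can also extend the search path at runtime before importing spark8t. The path below is the one mentioned above; the Python version embedded in it may differ depending on the snap revision, so verify it on your system:

```python
import sys

# Location of the spark8t libraries inside the snap (taken from the docs);
# the Python version in this path may differ on your system.
SNAP_SITE_PACKAGES = "/snap/spark-client/current/lib/python3.10/site-packages"

# Append the snap's site-packages so `import spark8t` can resolve.
if SNAP_SITE_PACKAGES not in sys.path:
    sys.path.append(SNAP_SITE_PACKAGES)
```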

Bind to Kubernetes

The following snippet imports the relevant environment variables into a Defaults object, which also auto-infers the location of your kubeconfig file.

import os
from spark8t.domain import Defaults
from spark8t.services import KubeInterface

# Defaults for spark-client
defaults = Defaults(dict(os.environ))  # General defaults

# Create an interface connection to k8s
kube_interface = KubeInterface(defaults.kube_config)

Note that if you want to override some of these settings, you can extend the Defaults class accordingly.

Alternatively, you can let the interface auto-detect the connection using the kubectl command:

from spark8t.services import KubeInterface

kube_interface = KubeInterface.autodetect(kubectl_cmd="kubectl")

Once bound to the k8s cluster, you have some properties of the connection readily available, e.g.

kube_interface.namespace
kube_interface.api_server

You can also issue kubectl commands using the exec method:

service_accounts = kube_interface.exec("get sa -A")
service_accounts_namespace = kube_interface.exec(
    "get sa", namespace=kube_interface.namespace
)

Manage Spark Service Accounts

All functionalities for managing Apache Spark service accounts are embedded within the K8sServiceAccountRegistry, which can be instantiated using the kube_interface object we defined above:

from spark8t.services import K8sServiceAccountRegistry

registry = K8sServiceAccountRegistry(kube_interface)

Once this object is instantiated, we can perform several operations, as outlined in the sections below.

Create new Apache Spark service accounts

New Apache Spark service accounts can be created by first creating a ServiceAccount domain object, optionally specifying extra properties, e.g.

from spark8t.domain import PropertyFile, ServiceAccount

configurations = PropertyFile({"my-key": "my-value"})
service_account = ServiceAccount(
    name="my-spark",
    namespace="default",
    api_server=kube_interface.api_server,
    primary=False,
    extra_confs=configurations,
)

The account can then be created using the registry:

service_account_id = registry.create(service_account)

This returns an id of the form {namespace}:{username}, e.g. “default:my-spark”.
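Since the id follows the {namespace}:{username} pattern, it can be split back into its components with plain string handling. This is a minimal illustration of the format, not a spark8t API:

```python
# Example id in the form returned by registry.create ("{namespace}:{username}").
service_account_id = "default:my-spark"

# Split on the first colon only, in case later components contain colons.
namespace, username = service_account_id.split(":", 1)
```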

Listing spark service accounts

Once Apache Spark service accounts have been created, these can be listed via

spark_service_accounts = registry.all()

or retrieved using their ids

retrieved_account = registry.get(service_account_id)

Delete service account

The registry can also be used to delete existing service accounts:

registry.delete(service_account_id)

or using an already existing ServiceAccount object:

registry.delete(service_account.id)

Manage Primary Accounts

spark8t and the spark-client snap have the notion of a so-called ‘primary’ service account: the one chosen by default when no specific account is provided. The primary Apache Spark service account can be set using:

registry.set_primary(service_account_id)

or using an already existing ServiceAccount object:

registry.set_primary(service_account.id)

The primary Apache Spark service account can be retrieved using

primary_account = registry.get_primary()

Manage configurations of Spark service accounts

Apache Spark service accounts can have a configuration that is provided (unless overridden) during each execution of Spark jobs. This configuration is stored in the PropertyFile object, which can be provided on the creation of a ServiceAccount object (extra_confs argument).

The PropertyFile object can either be created from a dictionary, as done above

from spark8t.domain import PropertyFile

static_property = PropertyFile({"my-key": "my-value"})

or also read from a file, e.g.:

from spark8t.domain import PropertyFile

static_property = PropertyFile.read(defaults.static_conf_file)

PropertyFile objects can be merged using the + operator:

merged_property = static_property + service_account.extra_confs

And ServiceAccount properties can be updated using new “merged” properties via the API provided by the registry:

registry.set_configurations(service_account.id, merged_property)

Alternatively, you can also store these properties in files:

with open("my-file", "w") as fid:
    merged_property.log().write(fid)