Charmed Spark K8s
Manage Service Accounts using the Python API
The spark-client snap relies on the spark8t toolkit, which provides both a CLI and a programmatic interface to enhanced Apache Spark client functionalities.
Here we describe how to use the spark8t toolkit (as part of the spark-client snap) to manage service accounts using Python.
Preparation
The spark8t package is already part of the snap. However, if the Python package is used outside of the snap context, make sure that the environment settings (described in the tool's README) are correctly configured.
Furthermore, make sure that PYTHONPATH contains the location where the spark8t libraries are installed within the snap (something like /snap/spark-client/current/lib/python3.10/site-packages).
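Outside the snap, one way to satisfy this requirement from within a Python script is to extend sys.path at runtime. This is a minimal sketch; the site-packages path below is the one from the example above, and the python3.10 segment depends on the Python version bundled with the snap:

```python
import sys

# Location where the spark-client snap ships the spark8t libraries
# (adjust the python3.10 segment to match the snap's bundled Python version)
snap_site_packages = "/snap/spark-client/current/lib/python3.10/site-packages"

if snap_site_packages not in sys.path:
    sys.path.append(snap_site_packages)
```

Exporting PYTHONPATH in the shell before launching the interpreter achieves the same effect.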
Bind to Kubernetes
The following snippet imports the relevant environment variables into a single object; among other things, this auto-infers your kubeconfig file location.
import os
from spark8t.domain import Defaults
from spark8t.services import KubeInterface
# Defaults for spark-client
defaults = Defaults(dict(os.environ)) # General defaults
# Create an interface connection to k8s
kube_interface = KubeInterface(defaults.kube_config)
Note that if you want to override some of these settings, you can extend the Defaults class accordingly.
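For example, a subclass could pin the kubeconfig location rather than inferring it from the environment. This is a hypothetical sketch: it assumes kube_config is exposed as an overridable property, so check the spark8t source for the exact attribute to override:

```python
from spark8t.domain import Defaults


class PinnedDefaults(Defaults):
    """Hypothetical subclass forcing a fixed kubeconfig path."""

    @property
    def kube_config(self) -> str:
        # Override the auto-inferred location with an explicit one
        return "/home/ubuntu/.kube/config"
```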
Alternatively, you can use auto-inference via the kubectl command:
from spark8t.services import KubeInterface
kube_interface = KubeInterface.autodetect(kubectl_cmd="kubectl")
Once bound to the k8s cluster, you have some properties of the connection readily available, e.g.
kube_interface.namespace
kube_interface.api_server
You can also issue some kubectl commands using the exec method:
service_accounts = kube_interface.exec("get sa -A")
service_accounts_namespace = kube_interface.exec(
    "get sa", namespace=kube_interface.namespace
)
Manage Spark Service Accounts
All functionalities for managing Apache Spark service accounts are embedded within the K8sServiceAccountRegistry, which can be instantiated using the kube_interface object defined above:
from spark8t.services import K8sServiceAccountRegistry
registry = K8sServiceAccountRegistry(kube_interface)
Once this object is instantiated, we can perform several operations, as outlined in the sections below.
Create new Apache Spark service accounts
New Apache Spark service accounts can be created by first creating a ServiceAccount domain object, optionally specifying extra properties, e.g.
from spark8t.domain import PropertyFile, ServiceAccount
configurations = PropertyFile({"my-key": "my-value"})
service_account = ServiceAccount(
    name="my-spark",
    namespace="default",
    api_server=kube_interface.api_server,
    primary=False,
    extra_confs=configurations,
)
The account can then be created using the registry:
service_account_id = registry.create(service_account)
This returns an id, which is effectively {namespace}:{username}, e.g. "default:my-spark".
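Since the id is just this colon-separated pair, it can be split back into its components with plain string operations, e.g.:

```python
# A registry id has the form "{namespace}:{username}"
account_id = "default:my-spark"
namespace, username = account_id.split(":", 1)
```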
Listing spark service accounts
Once Apache Spark service accounts have been created, they can be listed via
spark_service_accounts = registry.all()
or retrieved using their ids
retrieved_account = registry.get(service_account_id)
Delete service account
The registry can also be used to delete existing service accounts
registry.delete(service_account_id)
or using an already existing ServiceAccount object:
registry.delete(service_account.id)
Manage Primary Accounts
spark8t and the spark-client snap have the notion of a so-called 'primary' service account: the one chosen by default when no specific account is provided. The primary Apache Spark service account can be set using:
registry.set_primary(service_account_id)
or using an already existing ServiceAccount object:
registry.set_primary(service_account.id)
The primary Apache Spark service account can be retrieved using:
primary_account = registry.get_primary()
Manage configurations of Spark service accounts
Apache Spark service accounts can have a configuration that is provided (unless overridden) during each execution of Spark jobs. This configuration is stored in a PropertyFile object, which can be provided on creation of a ServiceAccount object (the extra_confs argument).
The PropertyFile object can either be created from a dictionary, as done above:
from spark8t.domain import PropertyFile
static_property = PropertyFile({"my-key": "my-value"})
or also read from a file, e.g.:
from spark8t.domain import PropertyFile
static_property = PropertyFile.read(defaults.static_conf_file)
PropertyFile objects can be merged using the + operator:
merged_property = static_property + service_account.extra_confs
ServiceAccount properties can then be updated with the new, merged properties via the API provided by the registry:
registry.set_configurations(service_account.id, merged_property)
Alternatively, you can also store these properties in files:
with open("my-file", "w") as fid:
    merged_property.log().write(fid)