Charmed Spark K8s

Canonical | bundle

Channel	Revision	Published
latest/edge	4	06 Aug 2024
3.4/edge	4	06 Aug 2024

Security hardening guide

This document provides an overview of security features and guidance for hardening the security of Charmed Apache Spark K8s, including setting up and managing a secure environment.

Environment

The environment where applications operate can be divided into two components:

Kubernetes
Juju

Kubernetes

Charmed Apache Spark can be deployed on top of several Kubernetes distributions. The following table provides references for the security documentation for the main supported cloud platforms.

Cloud	Security guide
Charmed Kubernetes	Security in Charmed Kubernetes
AWS EKS	Best Practices for Security, Identity and Compliance, AWS security credentials, Security in EKS
Azure AKS	Azure security best practices and patterns, Managed identities for Azure resources, Security in AKS

Juju

Juju is the component responsible for orchestrating the entire lifecycle, from deployment to Day 2 operations, of all applications. Therefore, Juju must be set up securely. For more information, see the Juju security page and the How to harden your deployment guide.

Cloud credentials

When configuring the cloud credentials to be used with Juju, ensure that the users have the correct permissions to operate at the required level on the Kubernetes cluster. Juju superusers responsible for bootstrapping and managing controllers require elevated permissions to manage several kinds of resources. For this reason, the K8s user used for bootstrapping and managing the deployments should have full permissions, such as:

create, delete, patch, and list:
- namespaces
- services
- deployments
- stateful sets
- pods
- PVCs

In general, it is common practice to run Juju using the admin role of K8s, to have full permissions on the Kubernetes cluster.
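As an illustration, the permissions listed above can be written down as a Kubernetes RBAC manifest. The following is a minimal sketch in Python that builds such a manifest as a dictionary and prints it as JSON. The role name `juju-operator-sketch` and the helper `build_cluster_role` are hypothetical, and the rule set only mirrors the examples listed in this section; it is not a verified, complete set of permissions for Juju.

```python
import json

# Verbs taken from the list in this section.
VERBS = ["create", "delete", "patch", "list"]


def build_cluster_role(name: str = "juju-operator-sketch") -> dict:
    """Return a ClusterRole manifest granting VERBS on the listed resources.

    The role name is hypothetical. A ClusterRole is used (rather than a
    namespaced Role) because namespaces are cluster-scoped resources.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # The core API group is written as an empty string.
                "apiGroups": [""],
                "resources": [
                    "namespaces",
                    "services",
                    "pods",
                    "persistentvolumeclaims",
                ],
                "verbs": VERBS,
            },
            {
                # Deployments and stateful sets live in the "apps" group.
                "apiGroups": ["apps"],
                "resources": ["deployments", "statefulsets"],
                "verbs": VERBS,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_role(), indent=2))
```

Since `kubectl` accepts JSON as well as YAML manifests, the printed output could be reviewed and then applied by a cluster administrator. When Juju is run with the K8s admin role, as described above, no such dedicated role is needed.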

Juju users

Juju users must be set up with the minimal permissions required for the scope of their operations. Refer to the User access levels documentation for more information on the access levels and corresponding abilities that the different users can be granted.

Juju user credentials must be stored securely and rotated regularly to limit the chances of unauthorized access due to credential leakage.

Applications

In the following, we provide guidance on how to harden your deployment using:

Base images
Charmed operator security upgrades
Encryption
Authentication
Monitoring and auditing

Base images

Charmed Apache Spark K8s runs on a set of Rockcraft-based images. All of them are built from the same Apache Spark distribution binaries, available on the Apache Spark release page, on top of Ubuntu 22.04. The images, which can be found in the Charmed Apache Spark rock images GitHub repo, are used as the base images for the pods of both Spark jobs and charms. The following table summarises the relation between each component and its underlying base image.

Component	Image
Spark Job (Driver)	`charmed-spark`
Spark Job (Executor)	`charmed-spark`
Spark History Server	`charmed-spark`
Charmed Apache Kyuubi	`charmed-spark-kyuubi`
Spark Job (Driver) - GPU Support	`charmed-spark-gpu`
Spark Job (Executor) - GPU Support	`charmed-spark-gpu`
Integration Hub	`spark-integration-hub`

New versions of the Charmed Apache Spark images may be released to provide patching of vulnerabilities (CVEs).

Charmed operator security upgrades

Charmed Apache Spark K8s operators, including Spark History Server, Charmed Apache Kyuubi, and Integration Hub, install a pinned revision of the Charmed Apache Spark images outlined in the previous table to provide reproducible and secure environments. New versions of Charmed Apache Spark K8s operators may therefore be released to provide patching of vulnerabilities (CVEs). Refresh the charms regularly to make sure the workload is as secure as possible.

Encryption

We recommend deploying Charmed Apache Spark K8s with encryption enabled to secure the communication between components, whenever available and supported by the server and client applications. In the following, we provide further information on how to encrypt the various data flows between the different components of the solution:

Client <> Kubernetes API connections
Object storage connections
Apache Kyuubi <> PostgreSQL connection
Apache Kyuubi <> Apache ZooKeeper connection
Spark History Server client connection
Kyuubi Client <> Kyuubi Server connection
Spark jobs communications

Client <> Kubernetes API connections

Make sure that the Kubernetes API service is encrypted and exposed over HTTPS. Refer to the documentation above for the main substrates and/or the documentation of your distribution. Ensure that the various components of the solution, e.g. spark-client, pods, etc., are correctly configured with the trusted CA certificate of the K8s cluster.
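To illustrate what being configured with the trusted CA certificate means on the client side, here is a minimal sketch using only the Python standard library. The helper names `build_tls_context` and `is_https` are hypothetical, as is any CA file path passed to them; the sketch is not part of spark-client.

```python
import ssl
from typing import Optional
from urllib.parse import urlsplit


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that verifies the server certificate.

    ca_file is a hypothetical path to the cluster CA certificate in PEM
    format. When it is None, the system trust store is used instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def is_https(url: str) -> bool:
    """Return True when the URL uses the https scheme."""
    return urlsplit(url).scheme == "https"
```

A context created this way rejects servers whose certificate is not signed by the given CA, or whose hostname does not match the certificate, which is the behaviour clients of the Kubernetes API should have.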

Object storage connections

Make sure that the object storage service is encrypted and exposed over HTTPS. Refer to the documentation of your object storage backend to make sure this option is supported and enabled. Ensure that the various components of the solution, e.g. spark-client, pods, etc., are correctly configured with the trusted CA certificate used by the object storage service. See the how-to manage certificates guide for more information.
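As a sketch of how such a check could look, the snippet below inspects a set of Spark properties for unencrypted object storage access. It assumes an S3-compatible backend accessed through the Hadoop S3A connector, whose `fs.s3a.endpoint` and `fs.s3a.connection.ssl.enabled` options are passed to Spark with the `spark.hadoop.` prefix. The function name `object_storage_tls_issues` and the example endpoints are hypothetical.

```python
ENDPOINT_KEY = "spark.hadoop.fs.s3a.endpoint"
SSL_KEY = "spark.hadoop.fs.s3a.connection.ssl.enabled"


def object_storage_tls_issues(conf: dict) -> list:
    """Return a list of problems that would leave S3A traffic unencrypted."""
    issues = []

    # An endpoint may be given with or without a scheme; only an explicit
    # http:// scheme is treated as a problem here.
    endpoint = conf.get(ENDPOINT_KEY, "").strip().lower()
    if endpoint.startswith("http://"):
        issues.append(f"{ENDPOINT_KEY} uses plain HTTP")

    # S3A enables SSL by default, so only an explicit opt-out is flagged.
    ssl_enabled = conf.get(SSL_KEY, "true").strip().lower()
    if ssl_enabled != "true":
        issues.append(f"{SSL_KEY} is not set to true")

    return issues


if __name__ == "__main__":
    example = {ENDPOINT_KEY: "http://minio.example.internal:9000"}
    for issue in object_storage_tls_issues(example):
        print(issue)
```

An empty result only means that the configuration requests encrypted connections; certificate trust still has to be configured as described above.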

Apache Kyuubi <> PostgreSQL connection

Charmed Apache Kyuubi integration with PostgreSQL can be secured by enabling encryption for the PostgreSQL K8s charm. See the PostgreSQL K8s how-to enable TLS user guide for more information on how to enable and customize encryption.

Apache Kyuubi <> Apache ZooKeeper connection

Charmed Apache Kyuubi integration with Apache ZooKeeper can be secured by enabling encryption for the Apache ZooKeeper K8s charm. See the Apache Kafka K8s how-to enable TLS user guide for more information on how to enable and customize encryption for Apache ZooKeeper.

Spark History Server client connection

Spark History Server implements encryption terminated at the ingress level. Therefore, internal Kubernetes communication between the ingress and Spark History Server is unencrypted. To enable encryption, see the how-to expose Spark History Server user guide.

Kyuubi Client <> Kyuubi Server connection

The Apache Kyuubi charm exposes a JDBC-compliant endpoint that JDBC-compliant clients, such as Beeline, can connect to. Encryption is not currently supported; it is planned for 25.04.

Spark jobs communications

To secure the RPC channel used for communication between the driver and executors, use the dedicated Apache Spark properties. Refer to the how-to manage Spark accounts guide for more information on how to customize the Apache Spark service account with additional properties, or to the Spark Configuration Management explanation page for more information on how Spark workloads can be further configured.
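As an example, upstream Apache Spark documents `spark.authenticate` and `spark.network.crypto.enabled` for authenticating and encrypting RPC traffic. Assuming these are the properties meant here, the following sketch renders them as `--conf key=value` arguments in the style accepted by `spark-submit`. The helper `to_conf_args` is hypothetical, and the exact set of properties suitable for a given deployment should be checked against the Apache Spark security documentation.

```python
# Properties from the upstream Apache Spark security documentation.
# Treating these two as the "dedicated properties" is an assumption.
RPC_SECURITY_CONF = {
    # Require authentication of Spark internal connections.
    "spark.authenticate": "true",
    # Enable AES-based encryption of RPC traffic.
    "spark.network.crypto.enabled": "true",
}


def to_conf_args(conf: dict) -> list:
    """Render properties as a flat list of --conf key=value arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_conf_args(RPC_SECURITY_CONF)))
```

The same key-value pairs could instead be stored on the Apache Spark service account, so that every job submitted with that account inherits them.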

Authentication

Charmed Apache Spark K8s provides external authentication capabilities for:

Kubernetes API
Spark History Server
Kyuubi JDBC endpoint

Kubernetes API

Authentication to the Kubernetes API follows standard implementations, as described in the upstream Kubernetes documentation. Please make sure that the distribution being used supports the authentication used by clients, and that the Kubernetes cluster has been correctly configured.

Generally, client applications store credentials locally in a KUBECONFIG file. Pods created by the charms and the Spark job workloads, on the other hand, receive credentials via shared secrets, mounted at the default location /var/run/secrets/kubernetes.io/serviceaccount/. See the upstream documentation for more information.
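To make the in-pod case concrete, the sketch below reads the files that Kubernetes conventionally mounts at that location (`token`, `namespace`, and `ca.crt`). The function name `read_service_account` is hypothetical, and the directory is a parameter so that the code can also be exercised outside a pod.

```python
from pathlib import Path

# Default mount point of the service account secret inside a pod.
DEFAULT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def read_service_account(base_dir: Path = DEFAULT_DIR) -> dict:
    """Read the service account credentials mounted into a pod.

    Returns the bearer token, the pod namespace, and the path of the CA
    certificate used to verify the Kubernetes API server.
    """
    base_dir = Path(base_dir)
    return {
        "token": (base_dir / "token").read_text().strip(),
        "namespace": (base_dir / "namespace").read_text().strip(),
        "ca_file": str(base_dir / "ca.crt"),
    }
```

The token is a credential and should never be logged or written to shared storage.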

Spark History Server

Authentication can be enabled in the Spark History Server, when exposed using Traefik, by leveraging the Oathkeeper integration. Oathkeeper provides a cloud native Identity & Access Proxy (IAP) and Access Control Decision API that authenticates, authorizes, and mutates incoming HTTP(S) requests, and is fully integrated with the Canonical Identity Platform.

Refer to the how-to enable authorization in the Spark History Server user guide for more information. In terms of permissions, allow lists of authorised users can be provided using the Spark History Server charm configuration option.

Kyuubi JDBC endpoint

Authentication can be enabled for Charmed Apache Kyuubi via its integration with the PostgreSQL charm on the auth-db interface. Currently, only one admin user is enabled, whose credentials can be retrieved using the get-jdbc-endpoint action.

Monitoring and auditing

Charmed Apache Spark provides native integration with the Canonical Observability Stack (COS). To reduce the blast radius of infrastructure disruptions, the general recommendation is to deploy COS and the observed application into separate environments, isolated from one another. Refer to the COS production deployments best practices page for more information.

For more information on how to enable and customise monitoring with COS, see the Guide.

Additional resources

For further information and details on the security and cryptographic specifications used by Charmed Apache Spark, refer to the Cryptography page.
