Data Platform Libs

Channel        Revision  Published    Runs on
latest/stable  81        19 Nov 2024  Ubuntu 22.04
latest/edge    81        19 Nov 2024  Ubuntu 22.04
juju deploy data-platform-libs

charms.data_platform_libs.v0.upgrade

Library to manage in-place upgrades for charms running on VMs and K8s.

This library contains handlers for upgrade relation events used to coordinate between units in an application during a juju refresh, as well as Pydantic models for instantiating, validating and comparing dependencies.

An upgrade on VMs is initiated with the command juju refresh. Once executed, the following events are emitted to each unit at random:
- upgrade-charm
- config-changed
- leader-settings-changed (non-leader units only)

Charm authors can implement the classes defined in this library to coordinate which unit upgrades when. Coordination is achieved by updating unit-data state throughout the upgrade.

At a high-level, the upgrade steps are as follows:
- Run pre-checks on the cluster to confirm it is safe to upgrade
- Create stack of unit.ids, to serve as the upgrade order (generally workload leader is last)
- Start the upgrade by issuing a Juju CLI command
- The unit at the top of the stack gets permission to upgrade
- The unit handles the upgrade and restarts its service
- Repeat, until all units have restarted
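The steps above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: rolling_upgrade, pre_check and upgrade_unit are hypothetical names, and the relation data is replaced by a plain Python list. The only behaviour it aims to show is that units are taken from the top (end) of the stack one at a time, so the unit at index 0 upgrades last.

```python
from typing import Callable


def rolling_upgrade(
    stack: list[int],
    pre_check: Callable[[], bool],
    upgrade_unit: Callable[[int], bool],
) -> list[int]:
    """Upgrade units one at a time, popping from the top (end) of the stack.

    Returns the unit ids in the order they were upgraded.
    Raises RuntimeError if the pre-check fails or a unit fails to upgrade.
    """
    if not pre_check():
        raise RuntimeError("cluster is not ready to upgrade")

    remaining = list(stack)  # copy, so the caller's list is not mutated
    upgraded: list[int] = []
    while remaining:
        unit_id = remaining.pop()  # the unit at the top of the stack goes next
        if not upgrade_unit(unit_id):
            raise RuntimeError(f"unit {unit_id} failed to upgrade")
        upgraded.append(unit_id)
    return upgraded


# Hypothetical cluster: unit 0 is the workload leader, so it sits at index 0.
order = rolling_upgrade([0, 1, 2], pre_check=lambda: True, upgrade_unit=lambda _: True)
print(order)  # [2, 1, 0]
```

In a real charm each pop happens on a different unit in response to a relation-changed event, rather than in a single loop.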

Usage by charm authors
upgrade relation

Charm authors must implement an additional peer-relation.

As this library uses relation data exchanged between units to coordinate, charm authors need to add a new relation interface. The relation name does not matter.

metadata.yaml

peers:
  upgrade:
    interface: upgrade

Dependencies JSON/Dict

Charm authors must implement a dict object tracking current charm versions, requirements and upgradability.

Many workload versions may be incompatible with older or newer versions. The same can apply to charm or snap versions. Workloads with required related applications (e.g. Kafka + ZooKeeper) also need to ensure their versions are compatible during an upgrade, to avoid cluster failure.

As such, it is necessary to freeze any dependencies within each published charm. An example of this could be creating a DEPENDENCIES dict within the charm code, with the following structure:

src/literals.py

DEPENDENCIES = {
    "kafka_charm": {
        "dependencies": {"zookeeper": ">50"},
        "name": "kafka",
        "upgrade_supported": ">90",
        "version": "100",
    },
    "kafka_service": {
        "dependencies": {"zookeeper": "^3"},
        "name": "kafka",
        "upgrade_supported": ">=0.8",
        "version": "3.3.2",
    },
}

The first-level key names are arbitrary labels for tracking what those versions and dependencies are for. The second-level values are as follows:
- dependencies: a key-value map of any required external applications, and the versions this packaged charm can support.
- upgrade_supported: the requirements from which an in-place upgrade can be supported by the charm.
- version: the current version of this packaged charm.

All requirements must comply with Poetry's dependency specification.

Dependency Model

Charm authors must implement their own class inheriting from DependencyModel.

Using a Pydantic model to instantiate the aforementioned DEPENDENCIES dict gives stronger type safety and additional layers of validation.

The implementation just needs to ensure that the top-level key names from DEPENDENCIES are defined as attributes in the model.

src/upgrade.py

from charms.data_platform_libs.v0.upgrade import DependencyModel
from pydantic import BaseModel

class KafkaDependenciesModel(BaseModel):
    kafka_charm: DependencyModel
    kafka_service: DependencyModel
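
For readers who want to see what this kind of validation buys without installing Pydantic, the sketch below is a rough standard-library stand-in. It is an illustration under stated assumptions, not how DependencyModel works: Dependency and load_dependencies are hypothetical names, and the only check performed is that each entry carries exactly the four second-level keys shown in the DEPENDENCIES example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Dependency:
    """Hypothetical stand-in for one entry of the DEPENDENCIES dict."""

    dependencies: dict[str, str]
    name: str
    upgrade_supported: str
    version: str


def load_dependencies(raw: dict[str, dict]) -> dict[str, Dependency]:
    """Load each labelled entry, rejecting entries with missing or extra keys."""
    expected = {field.name for field in fields(Dependency)}
    loaded = {}
    for label, entry in raw.items():
        if set(entry) != expected:
            raise ValueError(
                f"{label}: expected keys {sorted(expected)}, got {sorted(entry)}"
            )
        loaded[label] = Dependency(**entry)
    return loaded


DEPENDENCIES = {
    "kafka_charm": {
        "dependencies": {"zookeeper": ">50"},
        "name": "kafka",
        "upgrade_supported": ">90",
        "version": "100",
    },
}

deps = load_dependencies(DEPENDENCIES)
print(deps["kafka_charm"].version)  # 100
```

The Pydantic model additionally validates the constraint strings themselves, which this sketch does not attempt.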

Overrides for DataUpgrade

Charm authors must define their own class, inheriting from DataUpgrade, overriding all required abstract methods.

class ZooKeeperUpgrade(DataUpgrade):
    def __init__(self, charm: "ZooKeeperCharm", **kwargs):
        super().__init__(charm, **kwargs)
        self.charm = charm

Implementation of pre_upgrade_check()

Before permitting an upgrade, it's a good idea to check that the cluster is stable and healthy. Here, charm authors can validate upgrade safety through API calls, relation-data checks, etc. If any of these checks fail, raise ClusterNotReadyError.

    @override
    def pre_upgrade_check(self) -> None:
        default_message = "Pre-upgrade check failed and cannot safely upgrade"
        try:
            if not self.client.members_broadcasting or len(self.client.server_members) != len(
                self.charm.cluster.peer_units
            ):
                raise ClusterNotReadyError(
                    message=default_message,
                    cause="Not all application units are connected and broadcasting in the quorum",
                )

            if self.client.members_syncing:
                raise ClusterNotReadyError(
                    message=default_message, cause="Some quorum members are syncing data"
                )

            if not self.charm.cluster.stable:
                raise ClusterNotReadyError(
                    message=default_message, cause="Charm has not finished initialising"
                )

        except QuorumLeaderNotFoundError:
            raise ClusterNotReadyError(message=default_message, cause="Quorum leader not found")
        except ConnectionClosedError:
            raise ClusterNotReadyError(
                message=default_message, cause="Unable to connect to the cluster"
            )

Implementation of build_upgrade_stack() - VM ONLY

Oftentimes, it is necessary to ensure that the workload leader is the last unit to upgrade, to ensure high-availability during the upgrade process. Here, charm authors can create a LIFO stack of unit.ids, represented as a list of integer unit.ids, with the leader unit at index 0.

@override
def build_upgrade_stack(self) -> list[int]:
    upgrade_stack = []
    for unit in self.charm.cluster.peer_units:
        config = self.charm.cluster.unit_config(unit=unit)

        # upgrade quorum leader last
        if config["host"] == self.client.leader:
            upgrade_stack.insert(0, int(config["unit_id"]))
        else:
            upgrade_stack.append(int(config["unit_id"]))

    return upgrade_stack
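
To make the resulting order concrete, here is a small standalone sketch of the same idea without the charm objects. build_stack and the host addresses are assumptions made for illustration; the point is only that inserting the leader at index 0 places it at the bottom of the stack, while pop() takes units from the end.

```python
def build_stack(unit_hosts: dict[int, str], leader_host: str) -> list[int]:
    """Return unit ids as a LIFO stack with the leader's unit at index 0."""
    stack: list[int] = []
    for unit_id, host in sorted(unit_hosts.items()):
        if host == leader_host:
            stack.insert(0, unit_id)  # bottom of the stack: upgraded last
        else:
            stack.append(unit_id)
    return stack


# Hypothetical three-unit cluster where unit 1 hosts the quorum leader.
hosts = {0: "10.0.0.1", 1: "10.0.0.2", 2: "10.0.0.3"}
stack = build_stack(hosts, leader_host="10.0.0.2")
print(stack)  # [1, 0, 2]

first_to_upgrade = stack.pop()
print(first_to_upgrade)  # 2
print(stack[0])  # 1
```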

Implementation of _on_upgrade_granted()

On relation-changed events, each unit will check the current upgrade-stack persisted to relation data. If that unit is at the top of the stack, it will emit an upgrade-granted event, which must be handled. Here, workloads can be re-installed with new versions, checks can be made, data synced etc. If the new unit successfully rejoined the cluster, call set_unit_completed(). If the new unit failed to rejoin the cluster, call set_unit_failed().

NOTE - It is essential here to manually call on_upgrade_changed if the unit is the current leader. This ensures that the leader gets its own relation-changed event, and updates the upgrade-stack for other units to follow suit.

@override
def _on_upgrade_granted(self, event: UpgradeGrantedEvent) -> None:
    self.charm.snap.stop_snap_service()

    if not self.charm.snap.install():
        logger.error("Unable to install ZooKeeper Snap")
        self.set_unit_failed()
        return None

    logger.info(f"{self.charm.unit.name} upgrading service...")
    self.charm.snap.restart_snap_service()

    try:
        logger.debug("Running post-upgrade check...")
        self.pre_upgrade_check()

        logger.debug("Marking unit completed...")
        self.set_unit_completed()

        # ensures leader gets its own relation-changed when it upgrades
        if self.charm.unit.is_leader():
            logger.debug("Re-emitting upgrade-changed on leader...")
            self.on_upgrade_changed(event)

    except ClusterNotReadyError as e:
        logger.error(e.cause)
        self.set_unit_failed()

Implementation of log_rollback_instructions()

If the upgrade fails, manual intervention may be required for cluster recovery. Here, charm authors can log out any necessary steps to take to recover from a failed upgrade. When a unit fails, this library will automatically log out this message.

@override
def log_rollback_instructions(self) -> None:
    logger.error("Upgrade failed. Please run `juju refresh` to previous version.")

Instantiating in the charm and deferring events

Charm authors must instantiate their child class of DataUpgrade as an attribute of the main charm. They must also ensure that any non-upgrade events that may be unsafe to handle during an upgrade are deferred unless the unit is in the idle state, i.e. not currently upgrading.

class ZooKeeperCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.upgrade = ZooKeeperUpgrade(
            self,
            relation_name="upgrade",
            substrate="vm",
            dependency_model=ZooKeeperDependencyModel(**DEPENDENCIES),
        )

    def restart(self, event) -> None:
        # defer events that are unsafe to handle while this unit is not idle
        if self.upgrade.state != "idle":
            event.defer()
            return None

        self.restart_snap_service()
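
The deferral pattern can be exercised in isolation with stub objects. The sketch below assumes nothing about the ops framework beyond an event exposing defer(); FakeEvent and FakeCharm are hypothetical stand-ins, and upgrade_state plays the role of self.upgrade.state.

```python
class FakeEvent:
    """Stand-in for an event; only records whether defer() was called."""

    def __init__(self) -> None:
        self.deferred = False

    def defer(self) -> None:
        self.deferred = True


class FakeCharm:
    """Stand-in charm that counts restarts instead of touching a service."""

    def __init__(self, upgrade_state: str) -> None:
        self.upgrade_state = upgrade_state
        self.restarts = 0

    def restart(self, event: FakeEvent) -> None:
        # Not idle means an upgrade is in progress: handle the event later.
        if self.upgrade_state != "idle":
            event.defer()
            return
        self.restarts += 1


busy_charm, busy_event = FakeCharm("upgrading"), FakeEvent()
busy_charm.restart(busy_event)
print(busy_event.deferred, busy_charm.restarts)  # True 0

idle_charm, idle_event = FakeCharm("idle"), FakeEvent()
idle_charm.restart(idle_event)
print(idle_event.deferred, idle_charm.restarts)  # False 1
```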

def verify_requirements(
    version: str,
    requirement: str
)

Verifies a specified version against a defined constraint.

Arguments

version

the version currently in use

requirement

Poetry version constraint

Returns

True if version meets defined requirement. Otherwise False

Description

Supports Poetry version constraints https://python-poetry.org/docs/dependency-specification/#version-constraints
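
To give a feel for how such constraints behave, the following is a deliberately simplified matcher written with the standard library only. It is a sketch, not the function documented here: meets_requirement is a hypothetical name, and it handles only single comparison operators plus the caret (^) and tilde (~) forms on purely numeric versions of up to three parts. Pre-releases, wildcards and comma-separated constraints are out of scope.

```python
import operator

# Two-character symbols come first so ">=" is not mistaken for ">".
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version: str) -> tuple[tuple[int, ...], int]:
    """Return the version padded to three parts, and how many parts were given."""
    given = [int(part) for part in version.strip().split(".")]
    if not 1 <= len(given) <= 3:
        raise ValueError(f"unsupported version: {version!r}")
    return tuple(given + [0] * (3 - len(given))), len(given)


def _bump(version: tuple[int, ...], index: int) -> tuple[int, ...]:
    """Increment the part at index and zero everything after it."""
    return version[:index] + (version[index] + 1,) + (0,) * (2 - index)


def meets_requirement(version: str, requirement: str) -> bool:
    current, _ = _parse(version)
    requirement = requirement.strip()
    if requirement[:1] in ("^", "~"):
        lower, given = _parse(requirement[1:])
        if requirement[0] == "^":
            # Caret: the left-most non-zero part must not change.
            nonzero = [i for i in range(given) if lower[i] != 0]
            index = nonzero[0] if nonzero else given - 1
        else:
            # Tilde: the minor part is fixed, or the major if only that is given.
            index = 0 if given == 1 else 1
        return lower <= current < _bump(lower, index)
    for symbol, compare in _COMPARATORS.items():
        if requirement.startswith(symbol):
            bound, _ = _parse(requirement[len(symbol):])
            return compare(current, bound)
    raise ValueError(f"unsupported requirement: {requirement!r}")


print(meets_requirement("3.3.2", "~3.3"))  # True
print(meets_requirement("4.0.0", "^3"))  # False
```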

class DependencyModel

Manager for a single dependency.

Description

To be used as part of another model representing a collection of arbitrary dependencies.

Example:

from charms.data_platform_libs.v0.upgrade import DependencyModel
from pydantic import BaseModel

class KafkaDependenciesModel(BaseModel):
    kafka_charm: DependencyModel
    kafka_service: DependencyModel

deps = {
    "kafka_charm": {
        "dependencies": {"zookeeper": ">5"},
        "name": "kafka",
        "upgrade_supported": ">5",
        "version": "10",
    },
    "kafka_service": {
        "dependencies": {"zookeeper": "^3.6"},
        "name": "kafka",
        "upgrade_supported": "~3.3",
        "version": "3.3.2",
    },
}

model = KafkaDependenciesModel(**deps)  # loading dict into model

print(model.dict())  # exporting back validated deps

Methods

DependencyModel.dependencies_validator(cls, value)

Description

Validates version constraint.

DependencyModel.version_upgrade_supported_validator(cls, values)

Description

Validates specified version meets upgrade_supported requirement.

DependencyModel.can_upgrade(self, dependency)

Compares two instances of DependencyModel for upgradability.

Arguments

dependency

a dependency model to compare this model against

Returns

True if current model can upgrade from dependent model. Otherwise False

class UpgradeError

Description

Base class for upgrade related exceptions in the module.

Methods

UpgradeError.__init__(self, message: str, cause, resolution)

UpgradeError.__repr__(self)

Description

Representation of the UpgradeError class.

UpgradeError.__str__(self)

Description

String representation of the UpgradeError class.
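
As a rough illustration of how an exception carrying message, cause and resolution can be raised and reported, consider the sketch below. The class and function names are hypothetical and prefixed with Sketch to avoid confusion with the library's own classes; the default values and the log format are assumptions.

```python
from typing import Optional


class SketchUpgradeError(Exception):
    """Hypothetical stand-in for an upgrade exception carrying extra context."""

    def __init__(
        self,
        message: str,
        cause: Optional[str] = None,
        resolution: Optional[str] = None,
    ):
        super().__init__(message)
        self.message = message
        self.cause = cause or ""
        self.resolution = resolution or ""


class SketchClusterNotReadyError(SketchUpgradeError):
    """Hypothetical subclass: the cluster is not healthy enough to upgrade."""


def describe(error: SketchUpgradeError) -> str:
    """Build a single log line from the fields carried by the exception."""
    line = f"{error.message} (cause: {error.cause})"
    if error.resolution:
        line += f" - resolution: {error.resolution}"
    return line


try:
    raise SketchClusterNotReadyError(
        message="Pre-upgrade check failed and cannot safely upgrade",
        cause="Some quorum members are syncing data",
    )
except SketchUpgradeError as err:
    # Catching the base class also catches every subclass.
    summary = describe(err)

print(summary)
```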

class ClusterNotReadyError

Exception flagging that the cluster is not ready to start upgrading.

Arguments

message

string message to be logged out

cause

short human-readable description of the cause of the error

resolution

short human-readable instructions for manual error resolution (optional)

Description

For example, if the cluster fails DataUpgrade._on_pre_upgrade_check_action

Methods

ClusterNotReadyError.__init__(self, message: str, cause: str, resolution)

class KubernetesClientError

Exception flagging that a call to Kubernetes API failed.

Arguments

message

string message to be logged out

cause

short human-readable description of the cause of the error

resolution

short human-readable instructions for manual error resolution (optional)

Description

For example, if the cluster fails DataUpgrade._set_rolling_update_partition

Methods

KubernetesClientError.__init__(self, message: str, cause: str, resolution)

class VersionError

Exception flagging that the old version fails to meet the new upgrade_supported requirements.

Arguments

message

string message to be logged out

cause

short human-readable description of the cause of the error

resolution

short human-readable instructions for manual solutions to the error (optional)

Description

For example, upgrades from version 2.x --> 4.x, but 4.x only supports upgrading from 3.x onwards

Methods

VersionError.__init__(self, message: str, cause: str, resolution)

class DependencyError

Exception flagging that some new dependency is not being met.

Arguments

message

string message to be logged out

cause

short human-readable description of the cause of the error

resolution

short human-readable instructions for manual solutions to the error (optional)

Description

For example, new version requires related App version 2.x, but currently is 1.x

Methods

DependencyError.__init__(self, message: str, cause: str, resolution)

class UpgradeGrantedEvent

Description

Used to tell units that they can process an upgrade.

class UpgradeFinishedEvent

Description

Used to tell units that they finished the upgrade.

class UpgradeEvents

Upgrade events.

Description

This class defines the events that the lib can emit.

class DataUpgrade

Description

Manages upgrade relation operations for in-place upgrades.

Methods

DataUpgrade.__init__(self, charm: CharmBase, dependency_model: BaseModel, relation_name: str, substrate)

DataUpgrade.peer_relation(self)

Description

The upgrade peer relation.

DataUpgrade.app_units(self)

Description

The peer-related units in the application.

DataUpgrade.state(self)

Description

The unit state from the upgrade peer relation.

DataUpgrade.stored_dependencies(self)

Description

The application dependencies from the upgrade peer relation.

DataUpgrade.upgrade_stack(self)

Gets the upgrade stack from the upgrade peer relation.

Returns

List of integer unit.ids, ordered in upgrade order in a stack

Description

Unit.ids are ordered Last-In-First-Out (LIFO), i.e. the unit.id at index -1 is the first unit to upgrade and the unit.id at index 0 is the last unit to upgrade.

DataUpgrade.upgrade_stack(self, stack)

Sets the upgrade stack to the upgrade peer relation.

Description

Unit.ids are ordered Last-In-First-Out (LIFO), i.e. the unit.id at index -1 is the first unit to upgrade and the unit.id at index 0 is the last unit to upgrade.

DataUpgrade.other_unit_states(self)

Current upgrade state for other units.

Returns

Unsorted list of upgrade states for other units.

DataUpgrade.unit_states(self)

Current upgrade state for all units.

Returns

Unsorted list of upgrade states for all units.

DataUpgrade.cluster_state(self)

Current upgrade state for cluster units.

Returns

String of upgrade state from the furthest behind unit.

Description

Determined from DataUpgrade.STATE, taking the lowest ordinal unit state.

For example, if units have states: ["ready", "upgrading", "completed"], the overall state for the cluster is ready.
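
The "lowest ordinal" rule can be sketched in a few lines. The ordering below is an assumption made for illustration (only the relative order of ready, upgrading and completed is implied by the example above), and furthest_behind is a hypothetical name rather than part of the library.

```python
# Assumed ordering, from "furthest behind" to "furthest ahead".
# The list used by the library may differ; check DataUpgrade before relying on it.
ASSUMED_STATE_ORDER = ["failed", "idle", "ready", "upgrading", "completed"]


def furthest_behind(unit_states: list[str]) -> str:
    """Return the state with the lowest position in ASSUMED_STATE_ORDER."""
    return min(unit_states, key=ASSUMED_STATE_ORDER.index)


print(furthest_behind(["ready", "upgrading", "completed"]))  # ready
```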

DataUpgrade.idle(self)

Flag for whether the cluster is in an idle upgrade state.

Returns

True if all application units are in the idle state. Otherwise False

DataUpgrade.pre_upgrade_check(self)

Runs necessary checks validating the cluster is in a healthy state to upgrade.

Description

Called by all units during _on_pre_upgrade_check_action.

DataUpgrade.build_upgrade_stack(self)

Builds an ordered iterable of all application unit.ids, defining the order in which they upgrade.

Returns

Iterable of integer unit.ids, LIFO ordered in upgrade order, e.g. for [5, 2, 4, 1, 3], unit 3 upgrades first and unit 5 upgrades last

Description

Called by leader unit during _on_pre_upgrade_check_action.

DataUpgrade.log_rollback_instructions(self)

Sets charm state and logs out rollback instructions.

Description

Called by all units when state=failed found during _on_upgrade_changed.

DataUpgrade.set_unit_failed(self, cause)

Sets unit state=failed to the upgrade peer data.

Arguments

cause

short description of cause of failure

DataUpgrade.set_unit_completed(self)

Description

Sets unit state=completed to the upgrade peer data.

DataUpgrade.on_upgrade_changed(self, event: EventBase)

Description

Handler for upgrade-relation-changed events.