Data Platform Libs
- Canonical
- Databases
| Channel | Revision | Published | Runs on |
|---|---|---|---|
| latest/stable | 81 | 19 Nov 2024 | |
| latest/edge | 81 | 19 Nov 2024 | |
juju deploy data-platform-libs
charms.data_platform_libs.v0.upgrade
- Last updated 01 Jul 2024
- Revision: library version 0.18
Library to manage in-place upgrades for charms running on VMs and K8s.
This library contains handlers for `upgrade` relation events, used to coordinate between units in an application during a `juju refresh`, as well as Pydantic models for instantiating, validating and comparing dependencies.
An upgrade on VMs is initiated with the command `juju refresh`. Once it is executed, the following events are emitted to each unit at random:
- `upgrade-charm`
- `config-changed`
- `leader-settings-changed` (non-leader units only)
Charm authors can implement the classes defined in this library to streamline the process of coordinating which unit upgrades when, which is achieved by updating unit-data state throughout.
At a high level, the upgrade steps are as follows:
- Run pre-checks on the cluster to confirm it is safe to upgrade
- Create a stack of unit.ids to serve as the upgrade order (generally the workload leader is last)
- Start the upgrade by issuing a Juju CLI command
- The unit at the top of the stack gets permission to upgrade
- The unit handles the upgrade and restarts its service
- Repeat until all units have restarted
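The steps above can be sketched as a single-process simulation. This is a hypothetical illustration, not how the library is implemented: the real library coordinates through peer relation data and per-unit events, whereas `run_upgrade`, its callables, and the returned state strings are assumptions made for this sketch. It does follow the stack convention documented for `upgrade_stack` (index -1 upgrades first, index 0 last).

```python
from typing import Callable


def run_upgrade(
    stack: list[int],
    pre_check: Callable[[], bool],
    upgrade_unit: Callable[[int], bool],
) -> tuple[list[int], str]:
    """Hypothetical simulation of the coordinated upgrade loop.

    Returns the unit ids in the order they upgraded, plus a final state string
    ("idle", "failed" or "completed"; these labels are assumptions).
    """
    if not pre_check():
        # Refuse to start if the cluster is not healthy enough to upgrade.
        return [], "idle"
    stack = list(stack)  # copy so the caller's list is not mutated
    upgraded: list[int] = []
    while stack:
        unit_id = stack.pop()  # index -1 is the next unit granted the upgrade
        if not upgrade_unit(unit_id):
            # Stop the rollout; remaining units stay on the old version.
            return upgraded, "failed"
        upgraded.append(unit_id)
    return upgraded, "completed"
```

For example, `run_upgrade([0, 2, 1], lambda: True, lambda u: True)` upgrades units 1, 2 and then 0, so a leader placed at index 0 goes last. Stopping at the first failure reflects that a failed upgrade may need manual intervention before the rollout continues.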
Usage by charm authors
`upgrade` relation
Charm authors must implement an additional peer relation.
As this library uses relation data exchanged between units to coordinate, charm authors need to add a new relation interface. The relation name does not matter, but it must match the `relation_name` passed to `DataUpgrade`.
metadata.yaml
peers:
  upgrade:
    interface: upgrade
Dependencies JSON/Dict
Charm authors must implement a dict object tracking current charm versions, requirements and upgradability.
Many workload versions may be incompatible with older or newer versions. The same applies to charm or snap versions. Workloads with required related applications (e.g. Kafka and ZooKeeper) also need to ensure their versions are compatible during an upgrade, to avoid cluster failure.
As such, it is necessary to freeze any dependencies within each published charm. One way to do this is to create a `DEPENDENCIES` dict within the charm code, with the following structure:
src/literals.py
DEPENDENCIES = {
    "kafka_charm": {
        "dependencies": {"zookeeper": ">50"},
        "name": "kafka",
        "upgrade_supported": ">90",
        "version": "100",
    },
    "kafka_service": {
        "dependencies": {"zookeeper": "^3"},
        "name": "kafka",
        "upgrade_supported": ">=0.8",
        "version": "3.3.2",
    },
}
The first-level key names are arbitrary labels for tracking what those versions and dependencies are for.
The `dependencies` second-level values are a key-value map of any required external applications and the versions of them this packaged charm can support.
The `upgrade_supported` second-level values are requirements on the versions from which the charm supports an in-place upgrade.
The `version` second-level values correspond to the current version of this packaged charm.
All requirements comply with Poetry's dependency specifications.
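Since requirement strings follow Poetry's syntax, it may help to see how the common operators resolve. The sketch below is a simplified, hypothetical re-implementation for illustration only, not the library's own `verify_requirements`. It assumes purely numeric `X[.Y[.Z]]` versions and handles `^`, `~`, comparison operators, bare exact versions and comma-separated AND clauses, but not wildcards, `||` or pre-release tags.

```python
import re

# Optional operator followed by a numeric version of one to three components.
_CLAUSE = re.compile(r"(\^|~|>=|<=|>|<|==|!=)?\s*(\d+(?:\.\d+){0,2})")


def _parse(version: str) -> tuple[int, ...]:
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a zero-padded tuple of ints."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def _bump(version: str, index: int) -> tuple[int, ...]:
    """Increment component `index` and zero every component after it."""
    parts = list(_parse(version))
    bumped = parts[:index] + [parts[index] + 1]
    return tuple(bumped + [0] * (3 - len(bumped)))


def satisfies(version: str, requirement: str) -> bool:
    """Simplified Poetry-style check (hypothetical, illustration only)."""
    v = _parse(version)
    for clause in requirement.split(","):
        match = _CLAUSE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        op, target = match.group(1) or "==", match.group(2)
        t = _parse(target)
        given = [int(p) for p in target.split(".")]
        if op == "^":
            # Caret: the first non-zero component given may not change.
            first = next((i for i, p in enumerate(given) if p != 0), len(given) - 1)
            ok = t <= v < _bump(target, first)
        elif op == "~":
            # Tilde: minor version pinned if given, else major.
            ok = t <= v < _bump(target, 1 if len(given) > 1 else 0)
        else:
            ok = {">=": v >= t, "<=": v <= t, ">": v > t,
                  "<": v < t, "==": v == t, "!=": v != t}[op]
        if not ok:
            return False
    return True
```

With the `DEPENDENCIES` values above, `satisfies("100", ">90")` and `satisfies("3.6.1", "^3")` hold, while `satisfies("4.0.0", "^3")` does not, because `^3` expands to `>=3.0.0,<4.0.0`.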
Dependency Model
Charm authors must implement their own class inheriting from `DependencyModel`.
Using a Pydantic model to instantiate the aforementioned `DEPENDENCIES` dict gives stronger type safety and additional layers of validation.
The implementation just needs to ensure that the top-level key names from `DEPENDENCIES` are defined as attributes in the model.
src/upgrade.py
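from pydantic import BaseModel
from charms.data_platform_libs.v0.upgrade import DependencyModel

# One attribute per top-level key in DEPENDENCIES
class KafkaDependenciesModel(BaseModel):
    kafka_charm: DependencyModel
    kafka_service: DependencyModel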
Overrides for DataUpgrade
Charm authors must define their own class, inheriting from `DataUpgrade` and overriding all required `abstractmethod`s.
from charms.data_platform_libs.v0.upgrade import DataUpgrade

class ZooKeeperUpgrade(DataUpgrade):
    def __init__(self, charm: "ZooKeeperCharm", **kwargs):
        super().__init__(charm, **kwargs)
        self.charm = charm
Implementation of pre_upgrade_check()
Before upgrading a cluster, it's a good idea to check that it is stable and healthy.
Here, charm authors can validate upgrade safety through API calls, relation-data checks, etc.
If any of these checks fail, raise `ClusterNotReadyError`.
@override
def pre_upgrade_check(self) -> None:
    default_message = "Pre-upgrade check failed and cannot safely upgrade"
    try:
        if not self.client.members_broadcasting or len(self.client.server_members) != len(
            self.charm.cluster.peer_units
        ):
            raise ClusterNotReadyError(
                message=default_message,
                cause="Not all application units are connected and broadcasting in the quorum",
            )
        if self.client.members_syncing:
            raise ClusterNotReadyError(
                message=default_message, cause="Some quorum members are syncing data"
            )
        if not self.charm.cluster.stable:
            raise ClusterNotReadyError(
                message=default_message, cause="Charm has not finished initialising"
            )
    except QuorumLeaderNotFoundError:
        raise ClusterNotReadyError(message=default_message, cause="Quorum leader not found")
    except ConnectionClosedError:
        raise ClusterNotReadyError(
            message=default_message, cause="Unable to connect to the cluster"
        )
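Keeping the check logic separate from the client calls can make it easier to unit-test. The sketch below is a hypothetical, self-contained variant: `ClusterHealth` and its fields are invented for illustration, and `ClusterNotReadyError` is a local stand-in that only mirrors the `(message, cause, resolution)` arguments documented for the library's exception. In a real charm you would raise the library's own class instead.

```python
from dataclasses import dataclass
from typing import Optional


class ClusterNotReadyError(Exception):
    """Local stand-in mirroring the library's (message, cause, resolution) arguments."""

    def __init__(self, message: str, cause: str, resolution: Optional[str] = None):
        super().__init__(message)
        self.message = message
        self.cause = cause
        self.resolution = resolution


@dataclass
class ClusterHealth:
    """Hypothetical snapshot of what a charm might query before upgrading."""

    connected_members: int
    expected_members: int
    syncing: bool
    stable: bool


def pre_upgrade_check(health: ClusterHealth) -> None:
    message = "Pre-upgrade check failed and cannot safely upgrade"
    # Each (passed, cause) pair is evaluated in order; the first failure is raised.
    checks = [
        (health.connected_members == health.expected_members,
         "Not all application units are connected"),
        (not health.syncing, "Some quorum members are syncing data"),
        (health.stable, "Charm has not finished initialising"),
    ]
    for passed, cause in checks:
        if not passed:
            raise ClusterNotReadyError(message=message, cause=cause)
```

The charm's override would then only gather a `ClusterHealth` snapshot from its client and delegate to the pure function, which can be tested without a running cluster.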
Implementation of build_upgrade_stack() (VM only)
Often, the workload leader must be the last unit to upgrade, to maintain high availability during the upgrade process. Here, charm authors can create a LIFO stack of unit.ids, represented as a list of integer unit.ids, with the leader unit at index 0.
@override
def build_upgrade_stack(self) -> list[int]:
    upgrade_stack = []
    for unit in self.charm.cluster.peer_units:
        config = self.charm.cluster.unit_config(unit=unit)
        # upgrade quorum leader last
        if config["host"] == self.client.leader:
            upgrade_stack.insert(0, int(config["unit_id"]))
        else:
            upgrade_stack.append(int(config["unit_id"]))
    return upgrade_stack
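Because the ordering rule does not depend on Juju itself, it can be factored into a pure function and tested in isolation. In this hypothetical sketch, `unit_hosts` (a map of unit id to host) and `leader_host` stand in for the charm's `cluster.unit_config()` and `client.leader` lookups used above.

```python
def build_upgrade_stack(unit_hosts: dict[int, str], leader_host: str) -> list[int]:
    """Order unit ids so that the workload leader upgrades last.

    The stack is LIFO: index -1 upgrades first, index 0 (the leader) last.
    """
    stack: list[int] = []
    # Sorting makes the resulting order deterministic between runs.
    for unit_id, host in sorted(unit_hosts.items()):
        if host == leader_host:
            stack.insert(0, unit_id)  # leader goes to the bottom of the stack
        else:
            stack.append(unit_id)
    return stack
```

Sorting the units first is a choice made here so the stack is reproducible; the override above simply iterates over peer units in whatever order they are returned.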
Implementation of _on_upgrade_granted()
On relation-changed events, each unit checks the current upgrade stack persisted to relation data.
If that unit is at the top of the stack, it emits an `upgrade-granted` event, which must be handled.
Here, workloads can be re-installed with new versions, checks can be made, data synced, etc.
If the new unit successfully rejoined the cluster, call `set_unit_completed()`.
If the new unit failed to rejoin the cluster, call `set_unit_failed()`.
NOTE: It is essential here to manually call `on_upgrade_changed` if the unit is the current leader.
This ensures that the leader gets its own relation-changed event and updates the upgrade stack for other units to follow suit.
@override
def _on_upgrade_granted(self, event: UpgradeGrantedEvent) -> None:
    self.charm.snap.stop_snap_service()
    if not self.charm.snap.install():
        logger.error("Unable to install ZooKeeper Snap")
        self.set_unit_failed()
        return None
    logger.info(f"{self.charm.unit.name} upgrading service...")
    self.charm.snap.restart_snap_service()
    try:
        logger.debug("Running post-upgrade check...")
        self.pre_upgrade_check()
        logger.debug("Marking unit completed...")
        self.set_unit_completed()
        # ensures leader gets its own relation-changed when it upgrades
        if self.charm.unit.is_leader():
            logger.debug("Re-emitting upgrade-changed on leader...")
            self.on_upgrade_changed(event)
    except ClusterNotReadyError as e:
        logger.error(e.cause)
        self.set_unit_failed()
Implementation of log_rollback_instructions()
If the upgrade fails, manual intervention may be required for cluster recovery. Here, charm authors can log any steps necessary to recover from a failed upgrade. When a unit fails, this library automatically logs this message.
@override
def log_rollback_instructions(self) -> None:
    logger.error("Upgrade failed. Please run `juju refresh` to the previous version.")
Instantiating in the charm and deferring events
Charm authors must add an attribute holding an instance of their `DataUpgrade` child class in the main charm.
They must also ensure that any non-upgrade-related events that may be unsafe to handle during an upgrade are deferred if the unit is not in the `idle` state, i.e. not currently upgrading.
from ops.charm import CharmBase

class ZooKeeperCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.upgrade = ZooKeeperUpgrade(
            self,
            relation_name="upgrade",
            substrate="vm",
            dependency_model=ZooKeeperDependencyModel(
                **DEPENDENCIES
            ),
        )

    def restart(self, event) -> None:
        # defer anything unsafe while this unit is mid-upgrade
        if self.upgrade.state != "idle":
            event.defer()
            return None
        self.restart_snap_service()
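If several handlers need the same guard, it could be factored into a decorator. `defer_unless_idle` below is a hypothetical helper, not part of the library; it assumes only that the charm exposes its `DataUpgrade` instance as `self.upgrade` (as in the example above) and that events provide ops' `EventBase.defer()` method.

```python
import functools
from typing import Callable


def defer_unless_idle(handler: Callable) -> Callable:
    """Hypothetical decorator: defer the event unless the upgrade state is "idle"."""

    @functools.wraps(handler)
    def wrapper(charm, event) -> None:
        if charm.upgrade.state != "idle":
            event.defer()  # the framework re-emits deferred events on a later dispatch
            return None
        return handler(charm, event)

    return wrapper
```

Applied as `@defer_unless_idle` above a handler such as `restart(self, event)`, the wrapper defers the event and returns early whenever the unit is not `idle`, so the handler body only contains the restart logic.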
Index
def verify_requirements(version: str, requirement: str)
Verifies a specified version against defined constraint.
Arguments
- version: the version currently in use
- requirement: Poetry version constraint
Returns
True if `version` meets the defined `requirement`. Otherwise False
Description
Supports Poetry version constraints https://python-poetry.org/docs/dependency-specification/#version-constraints
class DependencyModel
Manager for a single dependency.
Description
To be used as part of another model representing a collection of arbitrary dependencies.
Example:
from pydantic import BaseModel
from charms.data_platform_libs.v0.upgrade import DependencyModel

class KafkaDependenciesModel(BaseModel):
    kafka_charm: DependencyModel
    kafka_service: DependencyModel

deps = {
    "kafka_charm": {
        "dependencies": {"zookeeper": ">5"},
        "name": "kafka",
        "upgrade_supported": ">5",
        "version": "10",
    },
    "kafka_service": {
        "dependencies": {"zookeeper": "^3.6"},
        "name": "kafka",
        "upgrade_supported": "~3.3",
        "version": "3.3.2",
    },
}

model = KafkaDependenciesModel(**deps)  # loading dict into model
print(model.dict())  # exporting back validated deps
Methods
DependencyModel.dependencies_validator(cls, value)
Description
Validates version constraint.
DependencyModel.version_upgrade_supported_validator(cls, values)
Description
Validates that the specified `version` meets the `upgrade_supported` requirement.
DependencyModel.can_upgrade(self, dependency)
Compares two instances of `DependencyModel` for upgradability.
Arguments
- dependency: a dependency model to compare this model against
Returns
True if the current model can upgrade from the dependent model. Otherwise False
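The exact comparison performed by `can_upgrade` is not spelled out here. One plausible reading, sketched below under that assumption, is that both models must describe the same component (`name`) and the old `version` must satisfy the new model's `upgrade_supported` constraint. `Dependency` is a hypothetical dataclass stand-in for `DependencyModel`'s fields, and `simple_verify` is a toy checker that only understands `>N` and `>=N` integer constraints; the library's own `verify_requirements` supports full Poetry syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Dependency:
    """Hypothetical stand-in for the fields of DependencyModel."""

    name: str
    version: str
    upgrade_supported: str
    dependencies: dict[str, str]


def simple_verify(version: str, requirement: str) -> bool:
    """Toy checker: only integer '>N' and '>=N' constraints are supported."""
    if requirement.startswith(">="):
        return int(version) >= int(requirement[2:])
    if requirement.startswith(">"):
        return int(version) > int(requirement[1:])
    raise ValueError(f"unsupported constraint: {requirement}")


def can_upgrade(new: Dependency, old: Dependency,
                verify: Callable[[str, str], bool]) -> bool:
    """Assumed semantics: same component, and the old version is one the
    new model declares it can upgrade from."""
    return new.name == old.name and verify(old.version, new.upgrade_supported)
```

With the `kafka_charm` entry from `DEPENDENCIES` as the new model (`upgrade_supported: ">90"`), an installed revision `"95"` would be upgradable while `"85"` would not.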
class UpgradeError
Description
Base class for upgrade-related exceptions in the module.
Methods
UpgradeError.__init__(self, message: str, cause, resolution)
UpgradeError.__repr__(self)
Description
Representation of the UpgradeError class.
UpgradeError.__str__(self)
Description
String representation of the UpgradeError class.
class ClusterNotReadyError
Exception flagging that the cluster is not ready to start upgrading.
Arguments
- message: string message to be logged out
- cause: short human-readable description of the cause of the error
- resolution: short human-readable instructions for manual error resolution (optional)
Description
For example, if the cluster fails `DataUpgrade._on_pre_upgrade_check_action`.
Methods
ClusterNotReadyError.__init__(self, message: str, cause: str, resolution)
class KubernetesClientError
Exception flagging that a call to Kubernetes API failed.
Arguments
- message: string message to be logged out
- cause: short human-readable description of the cause of the error
- resolution: short human-readable instructions for manual error resolution (optional)
Description
For example, if the cluster fails `DataUpgrade._set_rolling_update_partition`.
Methods
KubernetesClientError.__init__(self, message: str, cause: str, resolution)
class VersionError
Exception flagging that the old `version` fails to meet the new `upgrade_supported` requirement.
Arguments
- message: string message to be logged out
- cause: short human-readable description of the cause of the error
- resolution: short human-readable instructions for manual solutions to the error (optional)
Description
For example, upgrading from version `2.x` to `4.x` when `4.x` only supports upgrading from `3.x` onwards.
Methods
VersionError.__init__(self, message: str, cause: str, resolution)
class DependencyError
Exception flagging that some new `dependency` is not being met.
Arguments
- message: string message to be logged out
- cause: short human-readable description of the cause of the error
- resolution: short human-readable instructions for manual solutions to the error (optional)
Description
For example, the new version requires related app version `2.x`, but the current one is `1.x`.
Methods
DependencyError.__init__(self, message: str, cause: str, resolution)
class UpgradeGrantedEvent
Description
Used to tell units that they can process an upgrade.
class UpgradeFinishedEvent
Description
Used to tell units that they finished the upgrade.
class UpgradeEvents
Upgrade events.
Description
This class defines the events that the lib can emit.
class DataUpgrade
Description
Manages `upgrade` relation operations for in-place upgrades.
Methods
DataUpgrade.__init__(self, charm: CharmBase, dependency_model: BaseModel, relation_name: str, substrate)
DataUpgrade.peer_relation(self)
Description
The upgrade peer relation.
DataUpgrade.app_units(self)
Description
The peer-related units in the application.
DataUpgrade.state(self)
Description
The unit state from the upgrade peer relation.
DataUpgrade.stored_dependencies(self)
Description
The application dependencies from the upgrade peer relation.
DataUpgrade.upgrade_stack(self)
Gets the upgrade stack from the upgrade peer relation.
Returns
List of integer unit.ids, ordered in upgrade order in a stack
Description
Unit.ids are ordered Last-In-First-Out (LIFO), i.e. the unit.id at index -1 is the first unit to upgrade, and the unit.id at index 0 is the last unit to upgrade.
DataUpgrade.upgrade_stack(self, stack)
Sets the upgrade stack in the upgrade peer relation.
Description
Unit.ids are ordered Last-In-First-Out (LIFO), i.e. the unit.id at index -1 is the first unit to upgrade, and the unit.id at index 0 is the last unit to upgrade.
DataUpgrade.other_unit_states(self)
Current upgrade state for other units.
Returns
Unsorted list of upgrade states for other units.
DataUpgrade.unit_states(self)
Current upgrade state for all units.
Returns
Unsorted list of upgrade states for all units.
DataUpgrade.cluster_state(self)
Current upgrade state for cluster units.
Returns
String of upgrade state from the furthest-behind unit.
Description
Determined from `DataUpgrade.STATE`, taking the lowest ordinal unit state.
For example, if units have states `["ready", "upgrading", "completed"]`, the overall state for the cluster is `ready`.
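The lowest-ordinal rule can be sketched as a `min` over a state ordering. The full ordering of `DataUpgrade.STATE` is not listed here, so `STATES` below is an assumption: only `ready` < `upgrading` < `completed` follows from the example above, while placing `failed` and `idle` first is a guess.

```python
# Assumed ordering (lowest ordinal first). Only "ready" < "upgrading" < "completed"
# is implied by the docs; the positions of "failed" and "idle" are a guess.
STATES = ["failed", "idle", "ready", "upgrading", "completed"]


def cluster_state(unit_states: list[str]) -> str:
    """Return the state of the furthest-behind unit (lowest ordinal)."""
    if not unit_states:
        raise ValueError("no unit states")
    return min(unit_states, key=STATES.index)
```

Taking the minimum means a single lagging or failed unit determines the reported cluster state, which is the conservative choice when deciding whether the rollout can proceed.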
DataUpgrade.idle(self)
Flag for whether the cluster is in an idle upgrade state.
Returns
True if all application units are in the idle state. Otherwise False
DataUpgrade.pre_upgrade_check(self)
Runs necessary checks validating the cluster is in a healthy state to upgrade.
Description
Called by all units during `_on_pre_upgrade_check_action`.
DataUpgrade.build_upgrade_stack(self)
Builds ordered iterable of all application unit.ids to upgrade in.
Returns
Iterable of integer unit.ids, LIFO ordered in upgrade order, i.e. `[5, 2, 4, 1, 3]`: unit `3` upgrades first, `5` upgrades last
Description
Called by the leader unit during `_on_pre_upgrade_check_action`.
DataUpgrade.log_rollback_instructions(self)
Sets charm state and logs out rollback instructions.
Description
Called by all units when `state=failed` is found during `_on_upgrade_changed`.
DataUpgrade.set_unit_failed(self, cause)
Sets unit `state=failed` in the upgrade peer data.
Arguments
- cause: short description of the cause of failure
DataUpgrade.set_unit_completed(self)
Description
Sets unit `state=completed` in the upgrade peer data.
DataUpgrade.on_upgrade_changed(self, event: EventBase)
Description
Handler for `upgrade-relation-changed` events.