Rolling Ops Library and Example Charm
- Canonical
Channel | Revision | Published | Runs on
---|---|---|---
latest/stable | 5 | 26 Nov 2024 |
latest/edge | 14 | 10 Jan 2025 |
juju deploy rolling-ops
charms.rolling_ops.v0.rollingops
- Last updated 30 Apr 2024
- Library version 0.7
This library enables "rolling" operations across units of a charmed Application.
For example, a charm author might use this library to implement a "rolling restart", in which all units in an application restart their workload, but no two units execute the restart at the same time.
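The guarantee described above (units take turns and never overlap) is the classic mutual-exclusion property. As an in-process analogy only, the sketch below uses Python threads as hypothetical "units" and a threading.Lock in place of the library's relation-backed lock, then records how many units were restarting at once. RestartTracker and rolling_restart are illustrative names, not part of the library.

```python
from __future__ import annotations

import threading
import time


class RestartTracker:
    """Records how many simulated units are 'restarting' at the same moment."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the counters below
        self.active = 0
        self.max_active = 0
        self.order: list[str] = []

    def enter(self, unit: str) -> None:
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            self.order.append(unit)

    def leave(self) -> None:
        with self._guard:
            self.active -= 1


def rolling_restart(units: list[str]) -> RestartTracker:
    tracker = RestartTracker()
    restart_lock = threading.Lock()  # stands in for the distributed lock

    def unit_worker(unit: str) -> None:
        with restart_lock:  # only the current holder may restart
            tracker.enter(unit)
            time.sleep(0.01)  # pretend to restart the workload
            tracker.leave()

    threads = [threading.Thread(target=unit_worker, args=(u,)) for u in units]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker
```

After rolling_restart(["app/0", "app/1", "app/2"]), every unit appears in tracker.order and tracker.max_active stays at 1. The real library gets the same property across machines by storing lock state in peer-relation data rather than in process memory.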
To implement the rolling restart, a charm author would do the following:
- Add a peer relation called 'restart' to a charm's metadata.yaml:
```yaml
peers:
  restart:
    interface: rolling_op
```
- Import this library into src/charm.py, and initialize a RollingOpsManager in the charm's __init__. The charm should also define a callback routine, which will be executed when a unit holds the distributed lock:
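src/charm.py
```python
# ...
from ops.charm import CharmBase

# Assumes the systemd charm library (operator-libs-linux) has been fetched into lib/.
from charms.operator_libs_linux.v1 import systemd
from charms.rolling_ops.v0.rollingops import RollingOpsManager
# ...


class SomeCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # ...
        # relation="restart" matches the peer relation declared in metadata.yaml.
        self.restart_manager = RollingOpsManager(
            charm=self, relation="restart", callback=self._restart
        )
        # ...

    def _restart(self, event):
        # Runs only while this unit holds the distributed lock.
        systemd.service_restart("foo")
```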
To kick off the rolling restart, emit this library's AcquireLock event. The simplest way to do so would be with an action, though it might make sense to acquire the lock in response to another event.
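```python
def _on_trigger_restart(self, event):
    # A charm method, e.g. registered in __init__ with
    # self.framework.observe(self.on.restart_action, self._on_trigger_restart)
    self.on[self.restart_manager.name].acquire_lock.emit()
```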
To trigger the restart, an operator would run the following command on the CLI (on Juju 3.x, run-action has been replaced by juju run):
```shell
juju run-action some-charm/0 some-charm/1 <... some-charm/n> restart
```
Note that every unit that should restart must receive the action and emit the acquire event. Any unit that does not run its acquire handler is left out of the rolling restart. (An operator can take advantage of this to recover from a failed rolling operation without restarting workloads that restarted successfully: simply omit the successful units from a subsequent run-action call.)
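The recovery trick in the parenthetical can be modelled without Juju. The sketch below is a hypothetical simulation, not library code: each unit that "received the action" is queued as a lock request, the lock is handed to one unit at a time, and the callback's success or failure is recorded. rolling_run and failed_units are illustrative names, and the first-come grant order is an assumption of the sketch.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List


def rolling_run(
    units_with_action: Iterable[str], restart: Callable[[str], bool]
) -> Dict[str, bool]:
    """Run `restart` on each requesting unit in turn, never two at once."""
    # Every unit that received the action emits its acquire event: queue it.
    pending: Deque[str] = deque(units_with_action)
    outcome: Dict[str, bool] = {}
    while pending:
        holder = pending.popleft()  # the lock is granted to exactly one unit
        outcome[holder] = restart(holder)  # the callback runs while it holds the lock
        # The holder releases the lock; the loop grants it to the next pending unit.
    return outcome


def failed_units(outcome: Dict[str, bool]) -> List[str]:
    """Units whose callback reported failure, in the order they ran."""
    return [unit for unit, ok in outcome.items() if not ok]
```

For example, rolling_run(["app/0", "app/1", "app/2"], lambda u: u != "app/1") reports app/1 as failed; passing only failed_units(...) to a second rolling_run retries app/1 while leaving the healthy units untouched, which mirrors omitting them from the next run-action call.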
Index
class LockNoRelationError
Description
Raised if we are trying to process a lock, but do not appear to have a relation yet.
class LockState
Possible states for our distributed lock.
Description
Note that there are two states set on the unit, and two on the application.
class Lock
A class that keeps track of a single asynchronous lock.
Description
Warning: a Lock has permission to update relation data, which means that there are side effects to invoking the .acquire, .release and .grant methods. Running any one of them will trigger a RelationChanged event, once per transition from one internal status to another.
This class tracks state across the cloud by implementing a peer relation interface. There are two parts to the interface:
- The data on a unit's peer relation (defined in metadata.yaml). Each unit updates its own data. The only meaningful values are "acquire" and "release", which represent a request to acquire the lock and a request to release the lock, respectively.
- The application data in the relation. This tracks whether the lock has been "granted" or has been released (and reverted to idle). There are two valid states: "granted" or None. If a lock is in the "granted" state, a unit should emit a RunWithLock event and then release the lock.
If a lock is in the None state, either the unit has not yet requested the lock, or its request has been completed.
In more detail, here is the relation structure:
```
relation.data:
  <unit n>:
    status: 'acquire|release'
  <application>:
    <unit n>: 'granted|None'
```
Note that this class makes no attempts to timestamp the locks and thus handle multiple requests in a row. If a unit re-requests a lock before being granted the lock, the lock will simply stay in the "acquire" state. If a unit wishes to clear its lock, it simply needs to call lock.release().
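To make the two halves of the state concrete, here is a simplified model built on plain dictionaries. It is a sketch under stated assumptions, not the library's implementation: FakePeerRelation and ToyLock are hypothetical, the unset/None state is written as "idle", and a write is recorded as a "changed" entry only when the value actually changes, mimicking the one-RelationChanged-per-transition behaviour described in the warning above. Units write only their own bag; grant and clear touch only the application bag, which in Juju only the leader may write.

```python
from typing import Dict, List, Tuple

IDLE = "idle"  # stands in for the unset/None state described above


class FakePeerRelation:
    """Hypothetical stand-in for peer-relation data: unit bags plus one app bag."""

    def __init__(self) -> None:
        self.unit_data: Dict[str, Dict[str, str]] = {}
        self.app_data: Dict[str, str] = {}
        # One entry per real write, mimicking one RelationChanged per transition.
        self.changed: List[Tuple[str, str, str]] = []

    def write(self, bag: Dict[str, str], owner: str, key: str, value: str) -> None:
        if bag.get(key, IDLE) != value:  # rewriting the same value is not a transition
            bag[key] = value
            self.changed.append((owner, key, value))


class ToyLock:
    """Simplified per-unit lock; not the library's implementation."""

    def __init__(self, relation: FakePeerRelation, unit: str) -> None:
        self.relation = relation
        self.unit = unit
        relation.unit_data.setdefault(unit, {})

    def _unit_status(self) -> str:
        return self.relation.unit_data[self.unit].get("status", IDLE)

    def _app_status(self) -> str:
        return self.relation.app_data.get(self.unit, IDLE)

    def _set_unit(self, value: str) -> None:
        # Written by the unit itself, to its own data bag.
        self.relation.write(self.relation.unit_data[self.unit], self.unit, "status", value)

    def acquire(self) -> None:
        self._set_unit("acquire")

    def release(self) -> None:
        self._set_unit("release")

    def grant(self) -> None:
        # Written by the leader, to the application data bag.
        self.relation.write(self.relation.app_data, "app", self.unit, "granted")

    def clear(self) -> None:
        self.relation.write(self.relation.app_data, "app", self.unit, IDLE)

    def is_held(self) -> bool:
        return self._app_status() == "granted" and self._unit_status() != "release"

    def release_requested(self) -> bool:
        return self._app_status() == "granted" and self._unit_status() == "release"

    def is_pending(self) -> bool:
        return self._app_status() == IDLE and self._unit_status() == "acquire"
```

Calling acquire() twice before a grant records only one change, matching the note that a repeated request simply leaves the lock in the "acquire" state.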
Methods
Lock.__init__(self, manager, unit)
Lock.acquire(self)
Description
Request that a lock be acquired.
Lock.release(self)
Description
Request that a lock be released.
Lock.clear(self)
Description
Unset a lock.
Lock.grant(self)
Description
Grant a lock to a unit.
Lock.is_held(self)
Description
This unit holds the lock.
Lock.release_requested(self)
Description
A unit has reported that they are finished with the lock.
Lock.is_pending(self)
Description
Is this unit waiting for a lock?
class Locks
Description
Generator that returns a list of locks.
Methods
Locks.__init__(self, manager)
Locks.__iter__(self)
Description
Yields a lock for each unit we can find on the relation.
class RunWithLock
Description
Event to signal that this unit should run the callback.
class AcquireLock
Description
Signals that this unit wants to acquire a lock.
Methods
AcquireLock.__init__(self, handle, callback_override)
AcquireLock.snapshot(self)
Description
Snapshot of lock event.
AcquireLock.restore(self, snapshot)
Description
Restores lock event.
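In the ops framework, a deferred event is stored and re-emitted later, and custom events persist their extra fields through snapshot() and restore(). The AcquireLock signature above carries a callback_override argument; the sketch below assumes (this page does not confirm it) that this is such a field. ToyAcquireLock and revive are hypothetical. Because ops can rebuild an event object without running __init__ before calling restore(), the sketch's restore() sets every custom attribute itself.

```python
import json
from typing import Any, Dict, Optional


class ToyAcquireLock:
    """Hypothetical event-like object with one custom field to persist."""

    def __init__(self, handle: str, callback_override: Optional[str] = None) -> None:
        self.handle = handle
        self.callback_override = callback_override

    def snapshot(self) -> Dict[str, Any]:
        # Save every custom field as plain, serializable data.
        return {"callback_override": self.callback_override}

    def restore(self, snapshot: Dict[str, Any]) -> None:
        # Reapply the saved fields; do not rely on __init__ having run.
        self.callback_override = snapshot.get("callback_override")


def revive(handle: str, stored: str) -> ToyAcquireLock:
    """Rebuild an event from stored JSON without calling __init__."""
    event = ToyAcquireLock.__new__(ToyAcquireLock)
    event.handle = handle
    event.restore(json.loads(stored))
    return event
```

Round-tripping ToyAcquireLock("h", "_alt_restart").snapshot() through json.dumps and revive("h", ...) yields an object whose callback_override is "_alt_restart"; keeping the snapshot JSON-friendly is what lets it survive being stored between hook invocations.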
class ProcessLocks
Description
Used to tell the leader to process all locks.
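One plausible shape for a leader's pass over all locks, written against plain dictionaries, is sketched below. It is an assumption-laden model rather than the library's code: process_locks is a hypothetical helper, the grant order is simply sorted unit name (the library's actual ordering may differ), and the leader modifies only the application bag, since in Juju a unit cannot write another unit's data bag.

```python
from typing import Dict, List, Optional


def process_locks(unit_status: Dict[str, str], app_grants: Dict[str, str]) -> Optional[str]:
    """One leader pass over every unit's lock; returns the newly granted unit, if any.

    unit_status: unit -> "acquire" | "release" | "idle"  (each unit's own data bag)
    app_grants:  unit -> "granted" | "idle"              (application data bag)
    Only app_grants is modified: the leader cannot write other units' bags.
    """
    pending: List[str] = []
    for unit in sorted(unit_status):  # one lock per unit found on the relation
        granted = app_grants.get(unit, "idle") == "granted"
        status = unit_status[unit]
        if granted and status == "release":
            app_grants[unit] = "idle"  # finished: revert this lock to idle
        elif granted:
            return None  # a unit still holds the lock; wait for its release
        elif status == "acquire":
            pending.append(unit)  # waiting for the lock
    if pending:
        app_grants[pending[0]] = "granted"  # grant to exactly one waiting unit
        return pending[0]
    return None
```

Calling it after every relation change walks the units through the lock one at a time: the first pass grants app/0, passes while app/0 still holds the lock do nothing, and once app/0 writes "release" the next pass clears its grant and hands the lock to the next requester.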
class RollingOpsManager
Description
Emitters and handlers for rolling ops.
Methods
RollingOpsManager.__init__(self, charm: CharmBase, relation: AnyStr, callback: Callable)
Register our custom events.
Description
Parameters:
- charm: the charm we are attaching this to.
- relation: an identifier, by convention based on the name of the relation in metadata.yaml, which identifies this instance of RollingOpsManager, distinct from other instances that may be handling other events.
- callback: a closure to run when we hold the lock. (It must take a CharmBase object and an EventBase object as arguments.)
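The relation argument also acts as a namespace. A reading consistent with the on[<name>].acquire_lock lookup in the trigger example is that each manager's events are keyed by its identifier, so two managers (say, one for restarts and one for upgrades, each with its own peer relation) do not collide. The sketch below assumes exactly that; ToyEventBus, ToyRollingManager and the "<relation>_<event>" naming are hypothetical, and the callbacks take no arguments for brevity.

```python
from typing import Callable, Dict, List


class ToyEventBus:
    """Hypothetical registry mapping event names to their handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[], None]]] = {}

    def define(self, name: str) -> None:
        if name in self._handlers:
            raise ValueError(f"event {name!r} is already defined")
        self._handlers[name] = []

    def observe(self, name: str, handler: Callable[[], None]) -> None:
        self._handlers[name].append(handler)

    def emit(self, name: str) -> None:
        for handler in self._handlers[name]:
            handler()


class ToyRollingManager:
    """Hypothetical manager whose event names are prefixed by its relation identifier."""

    EVENTS = ("acquire_lock", "run_with_lock", "process_locks")

    def __init__(self, bus: ToyEventBus, relation: str, callback: Callable[[], None]) -> None:
        self.name = relation
        for event in self.EVENTS:
            bus.define(f"{relation}_{event}")  # e.g. "restart_run_with_lock"
        bus.observe(f"{relation}_run_with_lock", callback)
```

With managers registered for "restart" and "upgrade", emitting "restart_run_with_lock" runs only the restart callback, and registering a second "restart" manager is rejected, which is why each instance needs its own identifier.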