Rolling Ops Library and Example Charm

By Canonical Data Platform

Channel	Revision	Published	Runs on
latest/stable	3	23 Apr 2024	Ubuntu 22.04
latest/edge	3	17 Apr 2024	Ubuntu 22.04

charms.rolling_ops.v0.rollingops

- Fetch library
  
  charmcraft fetch-lib charms.rolling_ops.v0.rollingops
- 17 Apr 2024
- Library version 0.6

This library enables "rolling" operations across units of a charmed Application.

For example, a charm author might use this library to implement a "rolling restart", in which all units in an application restart their workload, but no two units execute the restart at the same time.

To implement the rolling restart, a charm author would do the following:

Add a peer relation called 'restart' to a charm's metadata.yaml:

peers:
    restart:
        interface: rolling_op

Import this library into src/charm.py, and initialize a RollingOpsManager in the Charm's __init__. The Charm should also define a callback routine, which will be executed when a unit holds the distributed lock:

src/charm.py

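# ...
# Assumed import: the systemd helper from the operator-libs-linux charm library.
from charms.operator_libs_linux.v1 import systemd
from charms.rolling_ops.v0.rollingops import RollingOpsManager
# ...
class SomeCharm(...):
    def __init__(...):
        # ...
        self.restart_manager = RollingOpsManager(
            charm=self, relation="restart", callback=self._restart
        )
        # ...

    def _restart(self, event):
        # Runs only while this unit holds the distributed lock.
        systemd.service_restart('foo')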

To kick off the rolling restart, emit this library's AcquireLock event. The simplest way to do so would be with an action, though it might make sense to acquire the lock in response to another event.

    def _on_trigger_restart(self, event):
        # This handler lives on the charm itself, so the events are on self.on.
        self.on[self.restart_manager.name].acquire_lock.emit()

To trigger the restart, a human operator would run the following command on the CLI (on Juju 3.x, the command is juju run rather than juju run-action):

juju run-action some-charm/0 some-charm/1 <... some-charm/n> restart

Note that all units that plan to restart must receive the action and emit the acquire event. Any units that do not run their acquire handler will be left out of the rolling restart. (An operator might take advantage of this fact to recover from a failed rolling operation without restarting workloads that were able to successfully restart: simply omit the successful units from a subsequent run-action call.)
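
The selection behaviour described above can be illustrated with a small, standalone simulation. This is only a sketch under stated assumptions, not the library's code: simulate_rolling_restart, the unit names, and the in-memory dictionaries are hypothetical stand-ins for the peer relation data, and the order in which pending units are served is arbitrary here.

```python
from typing import Dict, List, Optional


def simulate_rolling_restart(units: List[str], requesting: List[str]) -> List[str]:
    """Return the order in which units run their callback, one at a time.

    Hypothetical model: `unit_status` mimics each unit's side of the peer
    relation and `granted` mimics the application side kept by the leader.
    """
    unit_status: Dict[str, Optional[str]] = {
        unit: ("acquire" if unit in requesting else None) for unit in units
    }
    granted: Dict[str, Optional[str]] = {unit: None for unit in units}
    order: List[str] = []

    while True:
        # Units that asked for the lock and have not been served yet.
        pending = [
            unit for unit in units
            if unit_status[unit] == "acquire" and granted[unit] is None
        ]
        if not pending:
            break

        holder = pending[0]  # assumption: any pending unit may be chosen
        granted[holder] = "granted"
        # At most one unit may hold the lock at any moment.
        assert list(granted.values()).count("granted") == 1

        order.append(holder)             # the unit runs its callback (the restart)
        unit_status[holder] = "release"  # the unit reports that it is finished
        granted[holder] = None           # the leader clears the lock

    return order


restarted = simulate_rolling_restart(
    units=["some-charm/0", "some-charm/1", "some-charm/2"],
    requesting=["some-charm/0", "some-charm/2"],
)
print(restarted)
```

In this run some-charm/1 never requested the lock, so it does not appear in the result, which mirrors how an operator can leave already-restarted units out of a retry.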

Index

class LockNoRelationError
class LockState
class Lock
- def __init__(self, manager, unit)
- def acquire(self)
- def release(self)
- def clear(self)
- def grant(self)
- def is_held(self)
- def release_requested(self)
- def is_pending(self)
class Locks
- def __init__(self, manager)
- def __iter__(self)
class RunWithLock
class AcquireLock
- def __init__(self, handle, callback_override)
- def snapshot(self)
- def restore(self, snapshot)
class ProcessLocks
class RollingOpsManager
- def __init__(self, charm, relation, callback)

class LockNoRelationError

Description

Raised if we are trying to process a lock, but do not appear to have a relation yet.

class LockState

Description

Note that there are two states set on the unit ("acquire" and "release"), and two on the application ("granted" and None).

class Lock

Description

Warning: a Lock has permission to update relation data, which means that there are side effects to invoking the .acquire, .release and .grant methods. Running any one of them will trigger a RelationChanged event, once per transition from one internal status to another.

This class tracks state across the cloud by implementing a peer relation interface. There are two parts to the interface:

- The data on a unit's peer relation (defined in metadata.yaml). Each unit can update this data. The only meaningful values are "acquire" and "release", which represent a request to acquire the lock and a request to release the lock, respectively.
- The application data in the relation. This tracks whether the lock has been "granted" or has been released (and reverted to idle). There are two valid states: "granted" or None. If a lock is in the "granted" state, a unit should emit a RunWithLock event and then release the lock.

  If a lock is in the None state, this means that a unit has not yet requested the lock, or that the request has been completed.

In more detail, here is the relation structure:

relation.data:
    <unit n>:
        status: 'acquire|release'
    <application>:
        <unit n>: 'granted|None'

Note that this class makes no attempt to timestamp the locks, and so it does not handle multiple requests in a row. If a unit re-requests a lock before being granted the lock, the lock will simply stay in the "acquire" state. If a unit wishes to clear its lock, it simply needs to call lock.release().
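
As a rough illustration of how the two halves of the relation data combine, the sketch below models the structure as a plain dictionary. It assumes the keys exactly as laid out above ("status" on the unit side, the unit name on the application side). The APP constant and the three helper functions are hypothetical names that only echo the Lock methods, and the library's real implementation may differ.

```python
from typing import Dict, Optional

RelationData = Dict[str, Dict[str, Optional[str]]]

APP = "some-charm"  # hypothetical application name


def is_pending(data: RelationData, unit: str) -> bool:
    """The unit has asked for the lock and the application has not granted it."""
    return data[unit].get("status") == "acquire" and data[APP].get(unit) is None


def is_held(data: RelationData, unit: str) -> bool:
    """The application has granted the lock and the unit has not released it."""
    return data[APP].get(unit) == "granted" and data[unit].get("status") != "release"


def release_requested(data: RelationData, unit: str) -> bool:
    """The unit was granted the lock and now reports that it is finished."""
    return data[APP].get(unit) == "granted" and data[unit].get("status") == "release"


data: RelationData = {
    "some-charm/0": {"status": "acquire"},  # unit side: still waiting
    "some-charm/1": {"status": "acquire"},  # unit side: request was granted
    "some-charm/2": {},                     # unit side: never asked
    APP: {"some-charm/1": "granted"},       # application side
}

waiting = is_pending(data, "some-charm/0")
holding = is_held(data, "some-charm/1")

# The holder finishes its work and asks for the lock to be released.
data["some-charm/1"]["status"] = "release"
finished = release_requested(data, "some-charm/1")
```

Here waiting, holding and finished all evaluate to True, while some-charm/2 is neither pending nor holding because it never wrote a request.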

Methods

Lock.__init__(self, manager, unit)

Lock.acquire(self)

Description

Request that a lock be acquired.

Lock.release(self)

Description

Request that a lock be released.

Lock.clear(self)

Description

Unset a lock.

Lock.grant(self)

Description

Grant a lock to a unit.

Lock.is_held(self)

Description

This unit holds the lock.

Lock.release_requested(self)

Description

A unit has reported that they are finished with the lock.

Lock.is_pending(self)

Description

Is this unit waiting for a lock?

class Locks

Description

Generator that returns a list of locks.

Methods

Locks.__init__(self, manager)

Locks.__iter__(self)

Description

Yields a lock for each unit we can find on the relation.
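
The "one lock per unit" iteration can be pictured with a generator-based __iter__. The following is a minimal sketch with hypothetical stand-in classes (LockSketch and LocksSketch) that take a plain list of unit names instead of a manager and a live relation.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class LockSketch:
    """Hypothetical stand-in for a Lock: it only remembers its unit."""

    unit: str


class LocksSketch:
    """Hypothetical stand-in for Locks: iterable over one lock per unit."""

    def __init__(self, units: List[str]):
        self.units = units

    def __iter__(self) -> Iterator[LockSketch]:
        # A generator method: locks are created lazily, one per unit.
        for unit in self.units:
            yield LockSketch(unit)


locks = LocksSketch(["some-charm/0", "some-charm/1"])
units_seen = [lock.unit for lock in locks]
```

Because __iter__ builds a fresh generator on every call, the same object can be looped over repeatedly, for example once per processing pass.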

class RunWithLock

Description

Event to signal that this unit should run the callback.

class AcquireLock

Description

Signals that this unit wants to acquire a lock.

Methods

AcquireLock.__init__(self, handle, callback_override)

AcquireLock.snapshot(self)

Description

Snapshot of lock event.

AcquireLock.restore(self, snapshot)

Description

Restores lock event.
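
The snapshot/restore pair can be read as a simple round trip: whatever the event needs later is written to a dictionary and read back from it. The sketch below is framework-free and uses a hypothetical AcquireLockSketch class. It assumes that callback_override is the only state worth persisting and that it is stored as a string. The handle and callback names are made up for illustration.

```python
from typing import Dict, Optional


class AcquireLockSketch:
    """Hypothetical stand-in for AcquireLock, without the ops framework."""

    def __init__(self, handle: str, callback_override: Optional[str] = None):
        self.handle = handle
        # Assumption: the override is kept as a plain string so it can be serialized.
        self.callback_override = callback_override or ""

    def snapshot(self) -> Dict[str, str]:
        """Return the state that must survive until the event is handled."""
        return {"callback_override": self.callback_override}

    def restore(self, snapshot: Dict[str, str]) -> None:
        """Rebuild the event's state from a previous snapshot."""
        self.callback_override = snapshot["callback_override"]


original = AcquireLockSketch("restart_acquire_lock", callback_override="_restart_quickly")
saved = original.snapshot()

revived = AcquireLockSketch("restart_acquire_lock")
revived.restore(saved)
```

After the round trip, revived carries the same callback_override as original, which is the property a persisted or deferred event relies on.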

class ProcessLocks

Description

Used to tell the leader to process all locks.

class RollingOpsManager

Description

Emitters and handlers for rolling ops.

Methods

RollingOpsManager.__init__(self, charm: CharmBase, relation: AnyStr, callback: Callable)

Description

params:
- charm: the charm we are attaching this to.
- relation: an identifier, by convention based on the name of the relation in the metadata.yaml, which identifies this instance of RollingOpsManager, distinct from other instances that may be handling other events.
- callback: a closure to run when we have a lock. (It must take a CharmBase object and EventBase object as args.)
