Rolling Ops Library and Example Charm
| Channel | Revision | Published | Runs on |
|---|---|---|---|
| latest/stable | 5 | 26 Nov 2024 | |
| latest/edge | 46 | 23 Apr 2026 | |
juju deploy rolling-ops
charms.rolling_ops.v1.rollingops
- Last updated 03 Apr 2026
- Revision Library version 1.0
Rolling Ops v1 — coordinated rolling operations for Juju charms.
This library provides a reusable mechanism for coordinating rolling operations across units of a Juju application using a peer-relation distributed lock.
The library guarantees that at most one unit executes a rolling operation at any time, while allowing multiple units to enqueue operations and participate in a coordinated rollout.
Data model (peer relation)
Unit databag
Each unit maintains a FIFO queue of operations it wishes to execute.
Keys:
- operations: JSON-encoded list of queued Operation objects
- state: "idle" | "request" | "retry-release" | "retry-hold"
- executed_at: UTC timestamp string indicating when the current operation last ran
Each Operation contains:
- callback_id: identifier of the callback to execute
- kwargs: JSON-serializable arguments for the callback
- requested_at: UTC timestamp when the operation was enqueued
- max_retry (optional): maximum retry count. None means unlimited
- attempt: current attempt number
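To make the unit databag layout above more concrete, here is a minimal sketch that models one queued operation and encodes a queue as a single JSON string. The names `QueuedOperation`, `encode_queue` and `decode_queue` are hypothetical and are not the library's own `Operation` or `OperationQueue` classes. The field names are taken from the lists above, and the initial `attempt` value of 0 is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class QueuedOperation:
    """Hypothetical stand-in for one entry of the `operations` list."""

    callback_id: str
    kwargs: Dict[str, Any] = field(default_factory=dict)
    # ISO 8601 timestamp in UTC, recorded when the operation is enqueued.
    requested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    max_retry: Optional[int] = None  # None means unlimited retries
    attempt: int = 0  # assumption: counting starts at 0


def encode_queue(operations: List[QueuedOperation]) -> str:
    """Encode the whole queue as one JSON string (databag values are strings)."""
    return json.dumps([asdict(op) for op in operations], sort_keys=True)


def decode_queue(data: str) -> List[QueuedOperation]:
    """Rebuild the queue from its JSON string form."""
    return [QueuedOperation(**item) for item in json.loads(data)]


# What a unit databag could look like after one request (illustrative only).
unit_databag = {
    "state": "request",
    "operations": encode_queue(
        [QueuedOperation("restart", {"force": True}, max_retry=3)]
    ),
}
```

Keeping the whole queue in one string value reflects the constraint that relation databags only hold strings.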
Application databag
The application databag represents the global lock state.
Keys:
- granted_unit: unit identifier (unit name), or empty
- granted_at: UTC timestamp indicating when the lock was granted
Operation semantics
- Units enqueue operations instead of overwriting a single pending request.
- Duplicate operations (same callback_id and kwargs) are ignored if they are already the last queued operation.
- When granted the lock, a unit executes exactly one operation (the queue head).
- After execution, the lock is released so that other units may proceed.
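The enqueue rule above can be sketched with a plain list. This is an illustration only: the `enqueue` helper and the tuple-based queue entries are hypothetical and do not reproduce the library's `OperationQueue`.

```python
from typing import Any, Dict, List, Optional, Tuple

# Simplified, hypothetical queue entry: (callback_id, kwargs).
QueueEntry = Tuple[str, Dict[str, Any]]


def enqueue(
    queue: List[QueueEntry],
    callback_id: str,
    kwargs: Optional[Dict[str, Any]] = None,
) -> bool:
    """Append an entry unless it equals the last queued one.

    Returns True if the entry was appended, False if it was ignored.
    """
    entry = (callback_id, dict(kwargs or {}))
    if queue and queue[-1] == entry:
        return False  # duplicate of the most recently queued operation
    queue.append(entry)
    return True


queue: List[QueueEntry] = []
enqueue(queue, "restart", {"force": True})  # appended
enqueue(queue, "restart", {"force": True})  # ignored: same as the last entry
enqueue(queue, "reload")                    # appended
enqueue(queue, "restart", {"force": True})  # appended: the last entry is "reload"
```

Only the tail of the queue is compared, so the same operation can appear more than once as long as a different operation sits between the copies.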
Retry semantics
- If a callback returns OperationResult.RETRY_RELEASE, the unit will release the lock and retry the operation later.
- If a callback returns OperationResult.RETRY_HOLD, the unit will keep the lock and retry immediately.
- Retry state (attempt) is tracked per operation.
- When max_retry is exceeded, the failing operation is dropped and the unit proceeds to the next queued operation, if any.
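The retry rules above can be read as a small state transition on the queue head. The sketch below is a hedged illustration: `Result` and `handle_result` are hypothetical names, operations are plain dicts, and it assumes that `attempt` starts at 0 and is incremented once per failed run before being compared with `max_retry`.

```python
from enum import Enum
from typing import List


class Result(Enum):
    """Hypothetical mirror of the callback outcomes named above."""

    RELEASE = "release"
    RETRY_RELEASE = "retry-release"
    RETRY_HOLD = "retry-hold"


def handle_result(queue: List[dict], result: Result) -> str:
    """Update the queue after the head callback ran; return the next unit state."""
    head = queue[0]
    if result is Result.RELEASE:
        queue.pop(0)  # success: the head operation is done
        return "request" if queue else "idle"

    head["attempt"] += 1  # retry state is tracked per operation
    max_retry = head["max_retry"]
    if max_retry is not None and head["attempt"] > max_retry:
        queue.pop(0)  # retries exhausted: drop the failing operation
        return "request" if queue else "idle"

    # Still allowed to retry: the state records how the lock is handled.
    return result.value


queue = [
    {"callback_id": "restart", "max_retry": 1, "attempt": 0},
    {"callback_id": "reload", "max_retry": None, "attempt": 0},
]
first = handle_result(queue, Result.RETRY_RELEASE)  # one retry is allowed
second = handle_result(queue, Result.RETRY_HOLD)    # limit exceeded, head dropped
```

Under these assumptions, `max_retry=0` drops the operation on its first failure, and `max_retry=None` never drops it.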
Scheduling semantics
- Only the leader schedules lock grants.
- If a valid lock grant exists, no new unit is scheduled.
- Requests are preferred over retries.
- Among requests, the operation with the oldest requested_at timestamp is selected.
- Among retries, the operation with the oldest executed_at timestamp is selected.
- Stale grants (e.g., pointing to departed units) are automatically released.
All timestamps are stored in UTC using ISO 8601 format.
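The selection order described above can be sketched as a single function that the leader might run. This is not the library's scheduler: `pick_next_unit` and the dict-based view of unit state are hypothetical, and the sketch assumes every requesting unit exposes a `requested_at` value and every retrying unit an `executed_at` value in ISO 8601 form.

```python
from datetime import datetime
from typing import Dict, Optional


def pick_next_unit(units: Dict[str, dict], granted_unit: str = "") -> Optional[str]:
    """Return the unit that should receive the lock next, or None."""
    if granted_unit in units:
        return None  # a valid grant exists, so nothing new is scheduled
    # A grant naming a unit that is no longer present is treated as released.

    requests = {name: u for name, u in units.items() if u["state"] == "request"}
    if requests:  # requests are preferred over retries
        return min(
            requests,
            key=lambda name: datetime.fromisoformat(requests[name]["requested_at"]),
        )

    retries = {name: u for name, u in units.items() if u["state"] == "retry-release"}
    if retries:
        return min(
            retries,
            key=lambda name: datetime.fromisoformat(retries[name]["executed_at"]),
        )
    return None


units = {
    "app/0": {"state": "retry-release", "executed_at": "2026-04-03T09:00:00+00:00"},
    "app/1": {"state": "request", "requested_at": "2026-04-03T10:05:00+00:00"},
    "app/2": {"state": "request", "requested_at": "2026-04-03T10:01:00+00:00"},
    "app/3": {"state": "idle"},
}
chosen = pick_next_unit(units)
```

Parsing the timestamps before comparing them avoids relying on string ordering, which only matches chronological order when every value uses the same offset and precision.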
Using the library in a charm
1. Declare a peer relation
peers:
  restart:
    interface: rolling_op
Import this library into src/charm.py, and initialize a RollingOpsManagerV1 in the Charm's
__init__. The Charm should also define a callback routine, which will be executed when
a unit holds the distributed lock:
src/charm.py
from ops.charm import CharmBase

from charms.rolling_ops.v1.rollingops import RollingOpsManagerV1, OperationResult


class SomeCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # Map each callback_id to the method that runs while the unit holds the lock.
        self.rolling_ops = RollingOpsManagerV1(
            charm=self,
            relation_name="restart",
            callback_targets={
                "restart": self._restart,
                "failed_restart": self._failed_restart,
                "defer_restart": self._defer_restart,
            },
        )

    def _restart(self, force: bool) -> OperationResult:
        # perform restart logic, then release the lock
        return OperationResult.RELEASE

    def _failed_restart(self) -> OperationResult:
        # perform restart logic; on failure, release the lock and retry later
        return OperationResult.RETRY_RELEASE

    def _defer_restart(self) -> OperationResult:
        if not self.some_condition():
            # not ready yet: keep the lock and retry immediately
            return OperationResult.RETRY_HOLD
        # do restart logic
        return OperationResult.RELEASE
Request a rolling operation
def _on_restart_action(self, event) -> None:
    self.rolling_ops.request_async_lock(
        callback_id="restart",
        kwargs={"force": True},
        max_retry=3,
    )
All participating units must enqueue the operation in order to be included in the rolling execution.
Units that do not enqueue the operation will be skipped, allowing operators to recover from partial failures by reissuing requests selectively.
Do not include sensitive information in the kwargs of the callback. These values will be stored in the databag.
Make sure that callback_targets is not dynamic and that the mapping contains the expected values at the moment of the callback execution.
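One way to see why the mapping has to be stable: only the callback_id and kwargs are stored in the databag, so the callable has to be looked up again by name when the operation finally runs. The sketch below illustrates that lookup under stated assumptions. The `dispatch` helper and the stored JSON shape are hypothetical and are not the library's internals.

```python
import json
from typing import Any, Callable, Dict


def dispatch(callback_targets: Dict[str, Callable[..., Any]], stored_operation: str) -> Any:
    """Resolve a stored operation by its callback_id and call it with its kwargs."""
    operation = json.loads(stored_operation)
    callback = callback_targets.get(operation["callback_id"])
    if callback is None:
        # The mapping no longer contains the id that was enqueued earlier.
        raise KeyError(f"no callback registered for {operation['callback_id']!r}")
    return callback(**operation["kwargs"])


# The stored form holds names and plain data only, never the function itself.
stored = json.dumps({"callback_id": "restart", "kwargs": {"force": True}})

targets = {"restart": lambda force: f"restarted (force={force})"}
outcome = dispatch(targets, stored)
```

If the mapping were built conditionally, an operation enqueued earlier could name a callback that is missing by the time it runs.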
Index
class RollingOpsNoRelationError
Description
Raised if we are trying to process a lock, but do not appear to have a relation yet.
class RollingOpsDecodingError
Description
Raised if the content of the databag cannot be processed.
class RollingOpsInvalidLockRequestError
Description
Raised if the lock request is invalid.
class Operation
Description
A single queued operation.
Methods
Operation.__post_init__(self)
Description
Validate the class attributes.
Operation.create(cls, callback_id: str, kwargs, max_retry)
Description
Create a new operation from a callback id and kwargs.
Operation.to_string(self)
Description
Serialize to a string suitable for a Juju databag.
Operation.increase_attempt(self)
Description
Increment the attempt counter.
Operation.is_max_retry_reached(self)
Description
Return True if attempt exceeds max_retry (unless max_retry is None).
Operation.from_string(cls, data: str)
Deserialize from a Juju databag string.
Operation.__eq__(self, other: object)
Description
Equality comparison for the operation.
Operation.__hash__(self)
Description
Hash for the operation.
class OperationQueue
Description
In-memory FIFO queue of Operations with encode/decode helpers for storing in a databag.
Methods
OperationQueue.__init__(self, operations)
OperationQueue.__len__(self)
Description
Return the number of operations in the queue.
OperationQueue.empty(self)
Description
Return True if there are no queued operations.
OperationQueue.peek(self)
Description
Return the first operation in the queue if it exists.
OperationQueue.dequeue(self)
Description
Drop the first operation in the queue if it exists and return it.
OperationQueue.increase_attempt(self)
Description
Increment the attempt counter for the head operation and persist it.
OperationQueue.enqueue_lock_request(self, callback_id: str, kwargs, max_retry)
Description
Append operation only if it is not equal to the last enqueued operation.
OperationQueue.to_string(self)
Description
Encode entire queue to a single string.
OperationQueue.from_string(cls, data: str)
Decode queue from a string.
class LockIntent
Description
Unit-level lock intents stored in unit databags.
class OperationResult
Description
Callback return values.
class Lock
State machine view over peer relation databags for a single unit.
Description
This class is the only component that should directly read/write the peer relation databags for lock state, queue state, and grant state.
Important:
- All relation databag values are strings.
- This class updates both unit databags and app databags, which triggers relation-changed events.
Methods
Lock.__init__(self, model: Model, relation_name: str, unit: Unit)
Lock.request(self, callback_id: str, kwargs, max_retry)
Enqueue an operation and mark this unit as requesting the lock.
Arguments
callback_id: identifies which callback to execute.
kwargs: dict of callback kwargs.
max_retry: None -> unlimited retries, else explicit integer.
Lock.retry_release(self)
Description
Indicate that the operation should be retried but the lock should be released.
Lock.retry_hold(self)
Description
Indicate that the operation should be retried but the lock should be kept.
Lock.complete(self)
Mark the head operation as completed successfully and pop it from the queue.
Description
Update unit state depending on whether more operations remain.
Lock.release(self)
Description
Clear the application-level grant.
Lock.grant(self)
Description
Grant a lock to a unit.
Lock.is_granted(self)
Description
Return True if the unit holds the lock.
Lock.should_run(self)
Description
Return True if the lock has been granted to the unit and it is time to execute the callback.
Lock.should_release(self)
Description
Return True if the unit finished executing the callback and its lock should be released.
Lock.is_waiting(self)
Description
Return True if this unit is waiting for a lock to be granted.
Lock.is_completed(self)
Description
Return True if this unit has completed its callback but still has the grant (leader should clear).
Lock.is_retry(self)
Description
Return True if this unit requested retry but still has the grant (leader should clear).
Lock.is_waiting_retry(self)
Description
Return True if the unit requested retry and is waiting for the lock to be granted.
Lock.is_retry_hold(self)
Description
Return True if the unit requested retry and wants to keep the lock.
Lock.get_current_operation(self)
Description
Return the head operation for this unit, if any.
Lock.get_last_completed(self)
Description
Get the time the unit requested a retry of the head operation.
Lock.get_requested_at(self)
Description
Get the time the head operation was requested at.
class LockIterator
Description
Iterator over Lock objects for each unit present on the peer relation.
Methods
LockIterator.__init__(self, model: Model, relation_name: str)
LockIterator.__iter__(self)
Description
Yields a lock for each unit we can find on the relation.
def pick_oldest_completed(locks)
Description
Choose the retry lock with the oldest executed_at timestamp.
def pick_oldest_request(locks)
Description
Choose the lock with the oldest head operation.
class RollingOpsLockGrantedEvent
Description
Custom event emitted when the background worker grants the lock.
class RollingOpsManagerV1
Description
Emitters and handlers for rolling ops.
Methods
RollingOpsManagerV1.__init__(self, charm: CharmBase, relation_name: str, callback_targets)
Register our custom events.
Arguments
charm: the charm we are attaching this to.
relation_name: the peer relation name from metadata.yaml.
callback_targets: mapping from callback_id -> callable.
RollingOpsManagerV1.request_async_lock(self, callback_id: str, kwargs, max_retry)
Enqueue a rolling operation and request the distributed lock.
Arguments
callback_id: Identifier for the callback to execute when this unit is granted the lock. Must be a non-empty string and must exist in the manager's callback registry.
kwargs: Keyword arguments to pass to the callback when executed. If omitted, an empty dict is used. Must be JSON-serializable because it is stored in Juju relation databags.
max_retry: Retry limit for this operation. None means unlimited retries. 0 means no retries (drop immediately on first failure). Must be >= 0 when provided.
Description
This method appends an operation (identified by callback_id and kwargs) to the calling unit's FIFO queue stored in the peer relation databag and marks the unit as requesting the lock. It does not execute the operation directly.
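The argument constraints listed for request_async_lock can be checked up front in a few lines. The sketch below is only an illustration of those constraints: `validate_request` is a hypothetical helper, and it raises `ValueError` as a stand-in because the exact exception the library raises for each case is not stated here.

```python
import json
from typing import Any, Dict, Iterable, Optional


def validate_request(
    callback_id: str,
    kwargs: Optional[Dict[str, Any]],
    max_retry: Optional[int],
    known_callbacks: Iterable[str],
) -> Dict[str, Any]:
    """Check the documented constraints and return the normalized request."""
    if not isinstance(callback_id, str) or not callback_id:
        raise ValueError("callback_id must be a non-empty string")
    if callback_id not in set(known_callbacks):
        raise ValueError(f"unknown callback_id: {callback_id!r}")

    kwargs = {} if kwargs is None else kwargs  # omitted kwargs become an empty dict
    try:
        json.dumps(kwargs)  # the values end up in a relation databag as JSON
    except (TypeError, ValueError) as exc:
        raise ValueError("kwargs must be JSON-serializable") from exc

    if max_retry is not None and max_retry < 0:
        raise ValueError("max_retry must be >= 0 when provided")

    return {"callback_id": callback_id, "kwargs": kwargs, "max_retry": max_retry}


request = validate_request("restart", None, 3, {"restart", "reload"})
```

Running a check like this before enqueueing keeps invalid data out of the peer relation, where it would otherwise surface only once the unit is granted the lock.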
class RollingOpsAsyncWorker
Description
Spawns and manages the external rolling-ops worker process.
Methods
RollingOpsAsyncWorker.__init__(self, charm: CharmBase, relation_name: str)
RollingOpsAsyncWorker.start(self)
Description
Start a new worker process.
RollingOpsAsyncWorker.stop(self)
Description
Stop the running worker process if it exists.
def main()
Description
Juju hook event dispatcher.