Execution Lifecycle States
This state diagram represents the lifecycle of a Flyte execution (referred to as an "Action" in the SDK). The states are derived from the ActionPhase enum found in src/flyte/models.py.
The lifecycle begins in an Undefined state (representing the protobuf ACTION_PHASE_UNSPECIFIED) and moves to Queued upon creation. From there, it progresses through resource allocation (Waiting for Resources) and setup (Initializing) before entering the Running state.
The terminal states are Succeeded, Failed, Aborted, and Timed Out. The diagram also captures the retry mechanism: a Running action can transition back to Queued if a retryable failure occurs. Transitions can be triggered by the Flyte backend (scheduling, execution completion, timeouts) or by the user via the SDK (e.g., calling Action.abort()).
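The transitions described above can be modeled as a small state machine. The sketch below is illustrative only. `Phase`, `can_transition`, and `count_attempts` are hypothetical names and do not reproduce the SDK's `ActionPhase`, and the edge list is one reading of the prose. It also goes beyond what the text states in two places, both labeled as assumptions in the code. First, it assumes a user abort or a backend timeout can interrupt any live, non-terminal phase. Second, it assumes each Running → Queued retry edge starts a new attempt.

```python
from enum import Enum
from typing import Sequence


class Phase(Enum):
    """Hypothetical mirror of the lifecycle states above; not the SDK's ActionPhase."""

    UNDEFINED = "undefined"
    QUEUED = "queued"
    WAITING_FOR_RESOURCES = "waiting_for_resources"
    INITIALIZING = "initializing"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ABORTED = "aborted"
    TIMED_OUT = "timed_out"

    @property
    def is_terminal(self) -> bool:
        # Terminal phases have no outgoing transitions.
        return self in TERMINAL


TERMINAL = frozenset({Phase.SUCCEEDED, Phase.FAILED, Phase.ABORTED, Phase.TIMED_OUT})

# Forward path taken from the prose, plus RUNNING -> QUEUED for a retryable failure.
FORWARD = {
    Phase.UNDEFINED: {Phase.QUEUED},
    Phase.QUEUED: {Phase.WAITING_FOR_RESOURCES},
    Phase.WAITING_FOR_RESOURCES: {Phase.INITIALIZING},
    Phase.INITIALIZING: {Phase.RUNNING},
    Phase.RUNNING: {Phase.SUCCEEDED, Phase.FAILED, Phase.QUEUED},
}

# Assumption: user aborts and backend timeouts can interrupt any live phase.
INTERRUPTS = {Phase.ABORTED, Phase.TIMED_OUT}


def can_transition(src: Phase, dst: Phase) -> bool:
    """Return True if this sketch's state machine allows src -> dst."""
    if src.is_terminal:
        return False
    # UNDEFINED is treated as "not yet created", so it cannot be interrupted.
    if dst in INTERRUPTS and src is not Phase.UNDEFINED:
        return True
    return dst in FORWARD.get(src, set())


def count_attempts(history: Sequence[Phase]) -> int:
    """Count attempts, treating each RUNNING -> QUEUED retry edge as a new attempt (assumption)."""
    retries = sum(
        1
        for a, b in zip(history, history[1:])
        if a is Phase.RUNNING and b is Phase.QUEUED
    )
    return 1 + retries
```

Encoding the edges as data keeps every allowed transition in one place, so an observed phase history can be checked against the diagram pair by pair. For example, the history Undefined, Queued, Waiting for Resources, Initializing, Running, Queued, Waiting for Resources, Initializing, Running, Succeeded passes `can_transition` at every step, and `count_attempts` reports 2 for it.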
Key findings from the code:
- The `ActionPhase` enum defines the core states.
- The `is_terminal` property in `ActionPhase` identifies the final states.
- The `Action.abort()` method in `src/flyte/remote/_action.py` explicitly triggers a transition to the Aborted state.
- The `Action.wait()` and `Action.watch()` methods allow users to monitor these state transitions in real time.
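Conceptually, monitoring an action means observing phase changes until a terminal phase is reached. The polling sketch below illustrates that idea. `watch_phases`, `wait_for_completion`, and the `get_phase` callable are hypothetical stand-ins and do not reflect the signatures of `Action.watch()` or `Action.wait()`. The source also does not say whether the SDK polls or streams updates, so the loop should not be read as the SDK's mechanism.

```python
import time
from typing import Callable, Iterator, Optional

# Terminal phase names as listed in this section.
TERMINAL_PHASES = frozenset({"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT"})


def watch_phases(get_phase: Callable[[], str], interval_s: float = 1.0) -> Iterator[str]:
    """Yield each newly observed phase, stopping right after the first terminal one.

    `get_phase` is a hypothetical callable that returns the current phase name;
    it is not part of the Flyte SDK.
    """
    last: Optional[str] = None
    while True:
        phase = get_phase()
        if phase != last:  # collapse repeated observations of the same phase
            yield phase
            last = phase
        if phase in TERMINAL_PHASES:
            return
        time.sleep(interval_s)


def wait_for_completion(get_phase: Callable[[], str], interval_s: float = 1.0) -> Optional[str]:
    """Block until a terminal phase is observed and return it (a wait()-style helper)."""
    final: Optional[str] = None
    for final in watch_phases(get_phase, interval_s):
        pass
    return final
```

For example, if successive calls to `get_phase` return QUEUED, QUEUED, RUNNING, RUNNING, SUCCEEDED, `watch_phases` yields QUEUED, RUNNING, SUCCEEDED: repeated observations are dropped and iteration ends at the first terminal phase. Building the blocking helper on top of the generator keeps a single loop responsible for the terminal check.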
Key Architectural Findings:
- The `ActionPhase` enum in `src/flyte/models.py` defines the primary execution states: QUEUED, WAITING_FOR_RESOURCES, INITIALIZING, RUNNING, SUCCEEDED, FAILED, ABORTED, and TIMED_OUT.
- Terminal states are explicitly defined in the code via the `is_terminal` property and the `_action_done_check` utility function.
- The SDK provides an `abort()` method on the `Action` class to manually transition an execution to the ABORTED state.
- Retry logic, though managed by the backend, is reflected in the SDK through attempt tracking and the ability of an action to return to a non-terminal state after a failure.
- The `ActionPhase.from_protobuf` method maps the underlying Flyte IDL phases to the SDK's enum, including the UNSPECIFIED (Undefined) state.
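The protobuf-to-enum mapping can be pictured as a name lookup with a special case for the unspecified value. This is an assumption-laden illustration and not the body of `ActionPhase.from_protobuf`. It works on phase names rather than wire values, and the `from_proto_name` method is hypothetical. It also assumes that every IDL name is the SDK member name prefixed with `ACTION_PHASE_`. Only `ACTION_PHASE_UNSPECIFIED` is confirmed by the text above.

```python
from enum import Enum

# Hypothetical IDL naming scheme; only ACTION_PHASE_UNSPECIFIED appears in the source text.
_PROTO_PREFIX = "ACTION_PHASE_"


class Phase(Enum):
    """Hypothetical SDK-side enum mirroring the states listed above."""

    UNDEFINED = "UNDEFINED"
    QUEUED = "QUEUED"
    WAITING_FOR_RESOURCES = "WAITING_FOR_RESOURCES"
    INITIALIZING = "INITIALIZING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    ABORTED = "ABORTED"
    TIMED_OUT = "TIMED_OUT"

    @classmethod
    def from_proto_name(cls, proto_name: str) -> "Phase":
        """Map an IDL phase name such as "ACTION_PHASE_RUNNING" to a Phase member."""
        if not proto_name.startswith(_PROTO_PREFIX):
            raise ValueError(f"not an action phase name: {proto_name!r}")
        suffix = proto_name[len(_PROTO_PREFIX):]
        # The unspecified protobuf default surfaces as the SDK's Undefined state.
        if suffix == "UNSPECIFIED":
            return cls.UNDEFINED
        # Assumption: UNDEFINED is SDK-only, so it is rejected as an IDL suffix.
        if suffix == "UNDEFINED" or suffix not in cls.__members__:
            raise ValueError(f"unknown action phase: {proto_name!r}")
        return cls[suffix]
```

In this sketch, `Phase.from_proto_name("ACTION_PHASE_UNSPECIFIED")` returns `Phase.UNDEFINED`, and an unrecognized name raises `ValueError`. Raising rather than silently falling back to Undefined surfaces IDL phases the SDK does not yet know about; the real SDK may handle that case differently.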