Extensibility & Sandbox
Extensibility and sandboxing in this SDK are designed to decouple the orchestration of workflows from the execution of specialized or isolated logic. This is achieved through a pluggable connector architecture for external services, a dual-mode sandboxing system for safe code execution, and deep git integration for versioning and traceability.
Extensibility via Custom Connectors
The SDK provides a standardized way to integrate with external systems (such as BigQuery, Snowflake, or custom batch systems) through the AsyncConnector interface. This design lets Flyte manage the lifecycle of external jobs (creation, status polling, and cleanup) without the core engine needing to understand the specifics of each external service.
The AsyncConnector Interface
All custom connectors must inherit from AsyncConnector (defined in src/flyte/connectors/_connector.py) and implement three core methods:
- create(): Initiates the job in the external system and returns a ResourceMeta object (e.g., one holding a job ID).
- get(): Polls the external system for the current status of the job, returning a Resource object that contains the phase (e.g., SUCCEEDED, RUNNING) and any outputs.
- delete(): Cleans up the external resource. This method is designed to be idempotent.
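```python
from dataclasses import dataclass

from flyte.connectors._connector import AsyncConnector, Resource, ResourceMeta
from flyteidl2.core.execution_pb2 import TaskExecution


@dataclass
class MyBatchMetadata(ResourceMeta):
    # ResourceMeta is subclassed to carry connector-specific fields such as the job ID
    job_id: str


class MyBatchConnector(AsyncConnector):
    name = "My Batch Service"
    task_type_name = "my_batch_task"
    metadata_type = MyBatchMetadata  # the metadata class this connector creates and consumes

    async def create(self, task_template, output_prefix, inputs=None, **kwargs) -> MyBatchMetadata:
        # Trigger the external job and return a handle for later polling
        return MyBatchMetadata(job_id="123")

    async def get(self, resource_meta: MyBatchMetadata, **kwargs) -> Resource:
        # Query the external system for the status of resource_meta.job_id
        return Resource(phase=TaskExecution.SUCCEEDED, outputs={"result": 42})

    async def delete(self, resource_meta: MyBatchMetadata, **kwargs):
        # Cancel or clean up the job; must be safe to call more than once
        pass
```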
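Idempotency matters for delete() because cleanup may be attempted more than once, for example after a timeout or an aborted execution. The sketch below shows the pattern against a hypothetical in-memory stand-in for an external batch service; FakeBatchService, its methods, and the free-standing delete() helper are illustrative assumptions, not part of the SDK.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeBatchService:
    """Hypothetical stand-in for an external batch system (not part of the SDK)."""

    jobs: dict[str, str] = field(default_factory=dict)
    next_id: int = 0

    def submit(self) -> str:
        self.next_id += 1
        job_id = str(self.next_id)
        self.jobs[job_id] = "RUNNING"
        return job_id

    def cancel(self, job_id: str) -> None:
        # pop() with a default turns a repeated cancellation into a no-op instead of a KeyError
        self.jobs.pop(job_id, None)


async def delete(service: FakeBatchService, job_id: str) -> None:
    # Mirrors the shape of AsyncConnector.delete(): cleanup that is safe to repeat
    service.cancel(job_id)


async def demo() -> dict[str, str]:
    service = FakeBatchService()
    job_id = service.submit()
    await delete(service, job_id)
    await delete(service, job_id)  # second call completes without raising
    return service.jobs
```

Against a real service, the same property usually means treating a "job not found" response from the cancel call as success rather than as an error.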
Registration and Local Execution
Connectors are registered globally via the ConnectorRegistry. This registry is used by the Flyte service to dispatch tasks to the correct connector based on the task_type_name.
For local development, the AsyncConnectorExecutorMixin allows developers to run connector-based tasks locally. When a task using this mixin is executed, it retrieves the connector from the registry and enters a polling loop, simulating the behavior of the remote Flyte propeller.
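The dispatch-and-poll behavior described above can be modeled in plain asyncio. In this sketch, ToyRegistry, ToyConnector, Phase, and run_locally are hypothetical stand-ins rather than the SDK's ConnectorRegistry or AsyncConnectorExecutorMixin; the poll interval and the choice to call delete() only when polling is cancelled are assumptions.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Phase(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class ToyResource:
    phase: Phase
    outputs: Optional[dict[str, Any]] = None


class ToyConnector:
    """Hypothetical connector that reports RUNNING a few times, then SUCCEEDED."""

    task_type_name = "my_batch_task"

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done

    async def create(self, inputs: dict[str, Any]) -> str:
        return "job-1"

    async def get(self, job_id: str) -> ToyResource:
        if self._remaining > 0:
            self._remaining -= 1
            return ToyResource(Phase.RUNNING)
        return ToyResource(Phase.SUCCEEDED, outputs={"result": 42})

    async def delete(self, job_id: str) -> None:
        pass


class ToyRegistry:
    """Process-wide table mapping task_type_name to a connector instance."""

    _connectors: dict[str, ToyConnector] = {}

    @classmethod
    def register(cls, connector: ToyConnector) -> None:
        cls._connectors[connector.task_type_name] = connector

    @classmethod
    def get(cls, task_type_name: str) -> ToyConnector:
        return cls._connectors[task_type_name]


async def run_locally(
    task_type_name: str, inputs: dict[str, Any], poll_interval: float = 0.01
) -> dict[str, Any]:
    """Create the job, then poll get() until it reaches a terminal phase."""
    connector = ToyRegistry.get(task_type_name)  # dispatch on task_type_name
    job_id = await connector.create(inputs)
    try:
        while True:
            resource = await connector.get(job_id)
            if resource.phase is Phase.SUCCEEDED:
                return resource.outputs or {}
            if resource.phase is Phase.FAILED:
                raise RuntimeError(f"job {job_id} failed")
            await asyncio.sleep(poll_interval)
    except asyncio.CancelledError:
        # If local execution is interrupted, clean up the external job before propagating
        await connector.delete(job_id)
        raise
```

Registering `ToyConnector()` and calling `asyncio.run(run_locally("my_batch_task", {}))` polls through the RUNNING phases and returns the connector's outputs.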
Isolated Sandboxes
The SDK offers two distinct sandboxing mechanisms to handle different execution requirements: Orchestration Sandboxes for lightweight control flow and Code Sandboxes for arbitrary, dependency-heavy execution.
Orchestration Sandboxes (Monty)
Orchestration sandboxes (found in src/flyte/sandbox/) are powered by the Monty runtime. They are designed for "pure" Python logic, such as routing, aggregation, and simple transformations, and start up in microseconds.
To maintain safety and performance, Monty enforces strict constraints:
- No IO or Network: Logic must be side-effect free.
- Restricted Syntax: No imports, no try/except blocks, and no class definitions are allowed.
- Immutable Data: Subscript assignment (e.g., d[key] = value) is forbidden to prevent side effects.
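To make the syntax rules concrete, here is a hypothetical static checker built on Python's standard `ast` module that flags the constructs listed above. It illustrates the constraints only; it is not how Monty enforces them, and the IO/network restriction cannot be enforced by a syntax check alone. The name check_orchestration_code is an assumption.

```python
import ast

# Constructs rejected by this illustrative checker; this is not Monty's actual enforcement logic
FORBIDDEN_NODES: tuple[type, ...] = (ast.Import, ast.ImportFrom, ast.Try, ast.ClassDef)
if hasattr(ast, "TryStar"):  # `try/except*` blocks (Python 3.11+)
    FORBIDDEN_NODES += (ast.TryStar,)


def check_orchestration_code(source: str) -> list[str]:
    """Return constraint violations found in `source`; an empty list means it passes."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN_NODES):
            problems.append(f"line {node.lineno}: {type(node).__name__} is not allowed")
        elif isinstance(node, ast.Subscript) and isinstance(node.ctx, (ast.Store, ast.Del)):
            # Catches d[k] = v, d[k] += v, and del d[k]: all mutate a container in place
            problems.append(f"line {node.lineno}: subscript assignment is not allowed")
    return problems
```

A routing function such as `def route(score): return "high" if score > 0.5 else "low"` passes, while `import os` or `d["k"] = 1` is reported.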
Code Sandboxes
For tasks that require external libraries (like numpy or pandas) or shell access, flyte.sandbox.create provides a Docker-based environment. It supports three modes:
- Code Mode: The SDK automatically wires inputs and outputs. Python code is provided as a string, and variables are injected into the local scope.
- Verbatim Mode: The script is responsible for reading inputs from /var/inputs and writing outputs to /var/outputs.
- Command Mode: Executes an arbitrary shell command or entrypoint inside the container.
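In verbatim mode the script itself owns serialization. The sketch below shows what such a script might look like, assuming (purely for illustration) one file per input under /var/inputs, named after the input and holding a JSON value, and the same layout for outputs; the SDK's actual file format may differ. The demo() helper is hypothetical and exercises main() against temporary directories.

```python
import json
import statistics
import tempfile
from pathlib import Path

# Assumed layout (illustrative only): one JSON-encoded file per input/output, named after it
INPUTS_DIR = Path("/var/inputs")
OUTPUTS_DIR = Path("/var/outputs")


def main(inputs_dir: Path = INPUTS_DIR, outputs_dir: Path = OUTPUTS_DIR) -> None:
    # Read the "values" input, compute the mean, and write it as the "mean" output
    values = json.loads((inputs_dir / "values").read_text())
    outputs_dir.mkdir(parents=True, exist_ok=True)
    (outputs_dir / "mean").write_text(json.dumps(statistics.fmean(values)))


def demo() -> float:
    # Run main() against temporary directories instead of /var/inputs and /var/outputs
    with tempfile.TemporaryDirectory() as tmp:
        in_dir, out_dir = Path(tmp, "inputs"), Path(tmp, "outputs")
        in_dir.mkdir()
        (in_dir / "values").write_text(json.dumps([1.0, 2.0, 3.0]))
        main(in_dir, out_dir)
        return json.loads((out_dir / "mean").read_text())
```

Inside the container, the script would simply call main() with its default paths.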
```python
from flyte.sandbox import create

# Code mode example
stats_sandbox = create(
    name="numpy-stats",
    code="import numpy as np; mean = float(np.mean(values))",
    inputs={"values": list[float]},
    outputs={"mean": float},
    packages=["numpy"],
)
```
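In the code-mode example, the snippet reads `values` and assigns `mean` without declaring either: inputs are injected into the code's scope and outputs are collected by name. The toy model below shows that contract in-process with `exec`. run_code_mode is hypothetical and deliberately provides no isolation, which is exactly what the Docker-based sandbox adds.

```python
from typing import Any


def run_code_mode(code: str, inputs: dict[str, Any], output_names: list[str]) -> dict[str, Any]:
    """Toy model of code mode: inject inputs as variables, run the code, read outputs back."""
    scope: dict[str, Any] = dict(inputs)  # inputs become ordinary variables in the code's scope
    exec(code, scope)  # no isolation here; the real sandbox runs the code inside a container
    missing = [name for name in output_names if name not in scope]
    if missing:
        raise KeyError(f"code did not define outputs: {missing}")
    return {name: scope[name] for name in output_names}
```

For example, `run_code_mode("total = sum(values)", {"values": [1, 2, 3]}, ["total"])` returns `{"total": 6}`.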
Git-Integrated Versioning
Traceability is a core principle of the SDK, implemented via the GitStatus class in src/flyte/git/_config.py. During deployment or local execution, the SDK automatically discovers git metadata to link tasks back to their source code.
Metadata Discovery
GitStatus.from_current_repo() uses subprocess calls to git to extract:
- The current Commit SHA.
- The Remote URL (normalized to HTTPS).
- The Clean Status of the working tree.
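The three pieces of metadata can be gathered with standard git commands. The following is a hypothetical re-implementation for illustration; GitInfo, discover_git_info, and normalize_remote_url are assumed names, and GitStatus's actual commands and normalization rules (including whether a trailing `.git` is stripped) may differ.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


def normalize_remote_url(url: str) -> str:
    """Convert common SSH remote forms to an HTTPS base URL (illustrative rules)."""
    url = url.strip()
    scp_like = re.match(r"^[\w.-]+@([^:/]+):(.+)$", url)  # e.g. git@github.com:org/repo.git
    if scp_like:
        url = f"https://{scp_like.group(1)}/{scp_like.group(2)}"
    elif url.startswith("ssh://"):
        # ssh://git@host/org/repo.git -> https://host/org/repo.git (ports are not handled)
        url = "https://" + url[len("ssh://"):].split("@", 1)[-1]
    return url[:-4] if url.endswith(".git") else url


@dataclass
class GitInfo:
    commit_sha: str
    remote_url: str
    is_clean: bool


def _git(*args: str) -> str:
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def discover_git_info() -> Optional[GitInfo]:
    try:
        sha = _git("rev-parse", "HEAD")
        remote = _git("config", "--get", "remote.origin.url")
        changes = _git("status", "--porcelain")  # empty output means a clean working tree
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, not a repository, or no "origin" remote configured
    return GitInfo(sha, normalize_remote_url(remote), is_clean=(changes == ""))
```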
Source Linking
This metadata is used to build documentation links. The GIT_URL_BUILDER_REGISTRY contains logic for different providers (GitHub, GitLab) to construct deep links to specific files and line numbers. For example, GithubUrlBuilder generates URLs in the format {remote_url}/blob/{commit_sha}/{file_path}#L{line_number}.
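A provider registry of this kind can be sketched as a mapping from host to a link-building function. The names URL_BUILDERS, github_url, gitlab_url, and build_source_link are hypothetical, and keying on the exact host name is an assumption (the SDK's GIT_URL_BUILDER_REGISTRY may select providers differently). The GitHub format follows the one quoted above; GitLab serves file views under `/-/blob/` with the same `#L<n>` anchor.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Each builder turns (remote_url, commit_sha, file_path, line_number) into a deep link
UrlBuilder = Callable[[str, str, str, int], str]


def github_url(remote_url: str, commit_sha: str, file_path: str, line_number: int) -> str:
    return f"{remote_url}/blob/{commit_sha}/{file_path}#L{line_number}"


def gitlab_url(remote_url: str, commit_sha: str, file_path: str, line_number: int) -> str:
    return f"{remote_url}/-/blob/{commit_sha}/{file_path}#L{line_number}"


# Hypothetical registry keyed by host name
URL_BUILDERS: dict[str, UrlBuilder] = {"github.com": github_url, "gitlab.com": gitlab_url}


def build_source_link(
    remote_url: str, commit_sha: str, file_path: str, line_number: int
) -> Optional[str]:
    host = urlparse(remote_url).hostname or ""
    builder = URL_BUILDERS.get(host)
    return builder(remote_url, commit_sha, file_path, line_number) if builder else None
```

Exact-host lookup would miss self-hosted instances on custom domains; a real implementation might match on configuration or on a substring of the host instead.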
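This gives every task registered with Flyte a link to the exact commit of the code that defined it, facilitating debugging and auditing in production environments. Such a link resolves only if that commit has been pushed to the remote, and it reflects the task's code exactly only when the working tree was clean at registration time.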