Flytekit SDK System Context Diagram
The Flytekit SDK serves as the primary interface for users (Data Scientists and Developers) to author, manage, and execute workflows on the Flyte platform.
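Authoring a task essentially means declaring a typed interface that the SDK can later serialize for registration. The sketch below illustrates only that idea, using the standard library. The names sketch_task, TASK_REGISTRY, serialize_registry and the "o0" output key are hypothetical. This is not Flytekit's actual @task decorator or its serialization format.

```python
import inspect
import json
from typing import Any, Callable, Dict, get_type_hints

# Hypothetical stand-in for the SDK's collection of authored tasks.
TASK_REGISTRY: Dict[str, Dict[str, Any]] = {}


def sketch_task(fn: Callable) -> Callable:
    """Hypothetical decorator: record a function's typed interface as a plain dict.

    Not Flytekit's @task; it only illustrates capturing an interface for later
    serialization.
    """
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    spec = {
        "name": f"{fn.__module__}.{fn.__qualname__}",
        # Unannotated parameters are recorded as "unknown".
        "inputs": {p: getattr(hints.get(p), "__name__", "unknown") for p in params},
        # "o0" is an illustrative key for a single unnamed output.
        "outputs": (
            {"o0": getattr(hints["return"], "__name__", "unknown")}
            if "return" in hints
            else {}
        ),
    }
    TASK_REGISTRY[fn.__qualname__] = spec
    return fn  # the function itself is unchanged and still callable locally


def serialize_registry() -> str:
    # Sorted keys give deterministic output that is easy to diff.
    return json.dumps(TASK_REGISTRY, sort_keys=True)


@sketch_task
def add(a: int, b: int) -> int:
    return a + b
```

Returning the original function keeps it callable for local testing. The recorded spec is the kind of artifact a registration step would send to the backend.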
Key Components and Interactions:
- Flytekit SDK: The core Python library and CLI tool. It handles workflow/task definition, serialization, and communication with the Flyte backend.
- Flyte Remote API: A set of gRPC/ConnectRPC services (Project, Task, Run, etc.) that manage the lifecycle of Flyte entities. The SDK uses these APIs for registration and execution requests.
- Flyte Propeller: The core execution engine that orchestrates workflow graphs and manages task execution. The SDK does not talk to Propeller directly. Instead, the workflow and task definitions it serializes and registers define the instructions Propeller follows.
- Flyte Storage: S3, GCS, or Azure Blob Storage, used for storing task inputs, outputs, and artifacts. The SDK interacts with storage via fsspec, obstore, and signed URLs provided by the DataProxy service.
- Container Images: Stores the Docker images used for task execution. The SDK can build and push images locally (using Docker/Podman) or trigger remote builds.
- Connectors: Integrated through the SDK's plugin system, which allows Flyte tasks to interact with distributed computing frameworks (Spark, Ray), data warehouses (BigQuery, Snowflake), and AI services (OpenAI, Anthropic).
- Flyte Console: The web UI for monitoring and managing executions. The SDK provides helper methods to generate direct links to resources in the Console.
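The storage entry above mentions signed URLs provided by the DataProxy service. The code below is a minimal sketch of the client side of that pattern. It builds an HTTP PUT to an already-issued signed URL using only the standard library. The function names are hypothetical, and obtaining the URL from DataProxy is not shown. The SDK itself may perform transfers differently, for example through fsspec or obstore.

```python
import urllib.request
from typing import Mapping, Optional


def build_signed_put(
    signed_url: str,
    payload: bytes,
    headers: Optional[Mapping[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that uploads payload to a signed URL.

    Assumptions (hypothetical helper): signed_url was already issued by the
    backend, e.g. the DataProxy service. headers must match whatever the URL
    was signed with; Azure Blob Storage uploads, for instance, also need
    "x-ms-blob-type: BlockBlob".
    """
    return urllib.request.Request(
        signed_url, data=payload, method="PUT", headers=dict(headers or {})
    )


def upload(req: urllib.request.Request, timeout: float = 30.0) -> int:
    # Sends the request and returns the HTTP status code.
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Headers are passed in explicitly because signing schemes can bind specific headers into the signature. The client should send exactly the headers the URL was signed with.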
Key Architectural Findings:
- Flytekit SDK uses ConnectRPC to communicate with a suite of backend services including Project, Task, Run, and DataProxy services.
- Storage access is abstracted through fsspec and obstore, covering S3, GCS, and Azure Blob Storage. Signed URLs issued by the DataProxy service provide an additional access path.
- Image management includes local builds via Docker/Podman and remote builds triggered through the Flyte backend.
- A comprehensive plugin architecture enables direct integration with external platforms like Spark, Ray, Databricks, Snowflake, and various LLM providers.
- The SDK includes a rich CLI (built with click) for deploying applications, managing resources, and fetching logs.
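The findings above note that the SDK communicates with its backend services over ConnectRPC. Under the Connect protocol, a unary call is an HTTP POST to /<fully.qualified.Service>/<Method> whose body is the encoded request message. This makes the wire format easy to illustrate with the standard library. In this sketch, the following are assumptions and not Flyte's actual definitions:
- the function names
- the bearer-token header
- the service name in the usage note

The sketch is not meant to reflect how the SDK itself issues these calls.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_connect_unary(
    base_url: str,
    service: str,
    method: str,
    message: Dict[str, Any],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Connect-protocol unary call with JSON encoding.

    Hypothetical helper: a unary RPC is an HTTP POST to
    /<fully.qualified.Service>/<Method> carrying the encoded request message.
    """
    headers = {
        "Content-Type": "application/json",
        # Identifies the Connect protocol version; some servers require it.
        "Connect-Protocol-Version": "1",
    }
    if token is not None:
        # Assumption: bearer-token auth; real deployments may use other schemes.
        headers["Authorization"] = f"Bearer {token}"
    url = f"{base_url.rstrip('/')}/{service}/{method}"
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def call(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    # A successful unary response is HTTP 200 with the JSON-encoded message.
    # Non-200 responses raise urllib.error.HTTPError; under Connect, their JSON
    # body carries "code" and "message" fields.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, build_connect_unary("https://flyte.example.com", "example.project.v1.ProjectService", "ListProjects", {}) targets https://flyte.example.com/example.project.v1.ProjectService/ListProjects. Both the service and method names are placeholders.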