Autonomous Python Code Generation

The AutoCoderAgent is an orchestrator that turns natural-language requirements into validated, executable Python scripts. It automates the full code-generation lifecycle: planning, dependency detection, sandbox image building, and iterative testing.

Because the agent executes code in isolated Flyte sandboxes, generated scripts are checked for more than syntax: they must pass tests against real or sample data before they are used in production.

Core Concepts

The autonomous generation process relies on several key structures defined in flyteplugins.codegen.core.types:

CodePlan: Before writing any code, the LLM generates a high-level approach. This includes a description of the solution and the algorithm it intends to use.
CodeSolution: This contains the final generated Python code along with any required system-level packages (e.g., gcc, libpq-dev).
CodeGenEvalResult: The final output of a generation run. It encapsulates the success status, the built Flyte image, the solution, and metadata like token usage and conversation history.
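
To make the relationship between these three structures concrete, here is a minimal sketch using plain dataclasses. The field names (description, approach, code, system_packages, success, image, total_tokens, conversation) are illustrative assumptions derived from the descriptions above; the actual definitions in flyteplugins.codegen.core.types may use different names and types.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative stand-ins only: field names are assumptions,
# not the plugin's real definitions.


@dataclass
class CodePlan:
    description: str  # what the solution is meant to do
    approach: str     # the algorithm the LLM intends to use


@dataclass
class CodeSolution:
    code: str                                                  # generated Python source
    system_packages: list[str] = field(default_factory=list)  # e.g. ["gcc", "libpq-dev"]


@dataclass
class CodeGenEvalResult:
    success: bool
    solution: Optional[CodeSolution] = None
    image: Optional[str] = None  # reference to the built sandbox image
    total_tokens: int = 0
    conversation: list[dict] = field(default_factory=list)


plan = CodePlan(
    description="Compute monthly sales growth",
    approach="Group rows by month, then compare consecutive totals",
)
result = CodeGenEvalResult(
    success=True,
    solution=CodeSolution(code="print('hi')", system_packages=["gcc"]),
    image="example-image:tag",  # hypothetical image reference
)
```

The plan is produced first and the solution later, so the sketch keeps them as separate types, with the result wrapping the solution together with run metadata.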

The Generation Lifecycle

When you call agent.generate(), the AutoCoderAgent (located in plugins/codegen/src/flyteplugins/codegen/auto_coder_agent.py) follows a structured workflow:

Data Context Extraction: If samples are provided, the agent inspects the data to infer schemas (using Pandera), statistics, and patterns.
Planning: The LLM generates a CodePlan based on the prompt and data context.
Iterative Coding & Testing:
- The agent generates a CodeSolution.
- It detects required Python packages from imports.
- It builds a sandbox image containing these dependencies.
- It runs pytest-based tests within the sandbox.
- If tests fail, it feeds the errors back to the LLM and iterates (up to max_iterations).

Data-First Generation

A distinguishing feature of this implementation is its "Data-First" approach. In addition to a prompt, you can provide sample pd.DataFrame or flyte.io.File objects that describe the data the code will process.

from flyte.io import File
from flyteplugins.codegen import AutoCoderAgent

agent = AutoCoderAgent(model="gpt-4.1", base_packages=["pandas"])

# The agent will sample 'sales_data' to understand its columns and types
result = await agent.generate.aio(
    prompt="Calculate the monthly growth rate of sales.",
    samples={"sales_data": File("s3://my-bucket/sales.csv")},
    outputs={"growth_rate": float}
)

The agent uses these samples to build an "enhanced prompt" that includes inferred Pandera schemas, so the generated code can reference the actual column names and data types rather than guessing them.
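
As a rough illustration of the idea, the sketch below infers column types from a small CSV sample and appends them to the prompt. It is an assumption-laden stand-in: the agent itself uses Pandera for schema inference, whereas this version uses only the standard library, and infer_column_type and build_enhanced_prompt are hypothetical helper names.

```python
import csv
import io


def infer_column_type(values: list) -> str:
    """Guess a column type from string samples: int, float, or str."""
    if not values:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "str"


def build_enhanced_prompt(prompt: str, name: str, csv_text: str, max_rows: int = 100) -> str:
    """Append an inferred column listing for one sample input to the prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(max_rows), reader)]  # sample at most max_rows
    lines = [prompt, "", f"Input '{name}' has columns:"]
    for column in reader.fieldnames or []:
        column_type = infer_column_type([row[column] for row in rows])
        lines.append(f"- {column}: {column_type}")
    return "\n".join(lines)


sample = "month,sales,region\n2024-01,1000.5,EU\n2024-02,1200,US\n"
enhanced = build_enhanced_prompt(
    "Calculate the monthly growth rate of sales.", "sales_data", sample
)
print(enhanced)
```

Sampling only the first rows keeps the prompt small; the trade-off is that rare values later in the file can contradict the inferred type.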

Execution Backends

The AutoCoderAgent supports two distinct execution strategies via the backend parameter:

litellm (Default): Uses a structured loop where the agent follows a predefined sequence of plan -> code -> test. This is highly predictable and works with any model supported by LiteLLM (e.g., GPT-4, Claude).
claude: Uses the Claude Agent SDK to create a fully autonomous agent. In this mode, the agent decides when to write code, when to run tests, and how to fix errors using tool-calling capabilities. This requires the flyteplugins-codegen[agent] extra.

Working with Results

Once generation is successful, the CodeGenEvalResult provides two primary ways to execute the code.

One-off Execution with `run()`

The run() method executes the generated code in a sandbox immediately. If samples were provided during generation, they serve as default input values; pass an input by keyword to override its sample.

# Overriding the sample with real production data
final_output = await result.run.aio(
    sales_data=File("s3://prod-bucket/2024_sales.csv")
)

Reusable Tasks with `as_task()`

For integration into larger Flyte workflows, as_task() converts the generated solution into a standard Flyte task. This task uses the specific container image built during the generation phase, ensuring environment parity.

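import flyte

# Create a reusable task from the generated code; it reuses the image built during generation
processing_task = result.as_task(
    name="prod_sales_processor",
    resources=flyte.Resources(cpu=2, memory="4Gi"),
)

# Use it like any other Flyte task (awaited from within an async context)
prod_file = File("s3://prod-bucket/2024_sales.csv")
output = await processing_task(sales_data=prod_file)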

Sandbox Configuration and Security

Because the generated code is untrusted, AutoCoderAgent provides several security and resource controls:

Network Isolation: Set block_network=True to prevent the generated code from making outbound network calls.
Resource Limits: Define resources (e.g., flyte.Resources(cpu=1, memory="1Gi")) to constrain the sandbox environment.
Secrets: Pass flyte.Secret objects via the secrets argument to make sensitive credentials available to the sandbox securely.
Caching: The cache parameter (defaulting to "auto") controls whether the sandbox execution results are cached, which is useful for expensive data processing tasks.

The agent manages the underlying flyte.sandbox lifecycle, including the automatic mapping of inputs to /var/inputs and outputs to /var/outputs within the container.
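
To give a feel for the input and output mapping, the following sketch mimics a generated script that reads from an inputs directory and writes to an outputs directory. Everything about the layout is an assumption made for illustration: the one-file-per-name convention, the plain-text encoding, and the run_generated_script helper are hypothetical, and temporary directories stand in for /var/inputs and /var/outputs so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def run_generated_script(inputs_dir: Path, outputs_dir: Path) -> None:
    """Stand-in for generated code: read one named input, write one named output."""
    text = (inputs_dir / "sales_data").read_text()
    values = [float(line) for line in text.split()]
    growth_rate = (values[-1] - values[0]) / values[0]
    (outputs_dir / "growth_rate").write_text(str(growth_rate))


with tempfile.TemporaryDirectory() as tmp:
    inputs_dir = Path(tmp) / "inputs"    # stands in for /var/inputs
    outputs_dir = Path(tmp) / "outputs"  # stands in for /var/outputs
    inputs_dir.mkdir()
    outputs_dir.mkdir()

    # The orchestrator would place each declared input here before execution.
    (inputs_dir / "sales_data").write_text("100\n150\n")

    run_generated_script(inputs_dir, outputs_dir)

    # After execution, each declared output is read back by name.
    growth_rate = float((outputs_dir / "growth_rate").read_text())

print(growth_rate)
```

Fixed mount points like these let the generated code stay unaware of where its data actually lives, which is what allows the same script to run against samples and production inputs.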
