Autonomous Python Code Generation
The AutoCoderAgent is a high-level orchestrator designed to bridge the gap between natural language requirements and validated, executable Python scripts. It automates the entire lifecycle of code generation: planning, dependency detection, sandbox image building, and iterative testing.
By executing code in isolated Flyte sandboxes, the agent ensures that generated scripts are not only syntactically correct but also functionally valid against real or sample data before they are ever used in production.
Core Concepts
The autonomous generation process relies on several key structures defined in flyteplugins.codegen.core.types:
- CodePlan: Before writing any code, the LLM generates a high-level approach. This includes a description of the solution and the algorithm it intends to use.
- CodeSolution: This contains the final generated Python code along with any required system-level packages (e.g., gcc, libpq-dev).
- CodeGenEvalResult: The final output of a generation run. It encapsulates the success status, the built Flyte image, the solution, and metadata like token usage and conversation history.
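To make the relationship between these three structures concrete, here is a minimal sketch using plain dataclasses. The class names carry a `Sketch` suffix and every field name is an assumption inferred from the descriptions above. None of it is copied from flyteplugins.codegen.core.types, and the real definitions may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


# Illustrative stand-ins only: the field names are assumptions, not the
# real definitions in flyteplugins.codegen.core.types.
@dataclass
class CodePlanSketch:
    description: str  # what the solution is supposed to do
    approach: str     # the algorithm the LLM intends to use


@dataclass
class CodeSolutionSketch:
    code: str  # the generated Python source
    system_packages: list[str] = field(default_factory=list)  # e.g. gcc, libpq-dev


@dataclass
class CodeGenEvalResultSketch:
    success: bool
    solution: CodeSolutionSketch
    image: Optional[str] = None  # reference to the built sandbox image
    total_tokens: int = 0        # token usage metadata
    conversation: list[dict] = field(default_factory=list)  # message history


plan = CodePlanSketch(
    description="Monthly growth rate of sales",
    approach="pct_change over monthly sums",
)
solution = CodeSolutionSketch(code="print('hello')", system_packages=["gcc"])
result = CodeGenEvalResultSketch(success=True, solution=solution, image="sandbox:abc123")
```

The point of the sketch is the nesting: a result wraps a solution, while the plan is produced earlier and guides how the solution is written.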
The Generation Lifecycle
When you call agent.generate(), the AutoCoderAgent (located in plugins/codegen/src/flyteplugins/codegen/auto_coder_agent.py) follows a structured workflow:
- Data Context Extraction: If samples are provided, the agent inspects the data to infer schemas (using Pandera), statistics, and patterns.
- Planning: The LLM generates a CodePlan based on the prompt and data context.
- Iterative Coding & Testing:
  - The agent generates a CodeSolution.
  - It detects required Python packages from imports.
  - It builds a sandbox image containing these dependencies.
  - It runs pytest-based tests within the sandbox.
  - If tests fail, it feeds the errors back to the LLM and iterates (up to max_iterations).
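The iterative part of this workflow can be sketched in a few lines of plain Python. This is not the agent's actual implementation: `detect_packages`, `generate_until_green`, `fake_llm`, and `fake_tests` are hypothetical names, the LLM and the sandboxed pytest run are replaced by local stubs, and the image build step is reduced to a comment.

```python
import ast
import sys
from typing import Callable, Optional


def detect_packages(code: str) -> set[str]:
    """Collect top-level module names imported by the code, minus the stdlib."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    # sys.stdlib_module_names exists on Python 3.10+; older versions filter nothing.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    # Note: import names do not always equal package names (sklearn vs scikit-learn).
    return found - set(stdlib)


def generate_until_green(
    write_code: Callable[[str, Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    prompt: str,
    max_iterations: int = 3,
) -> tuple[bool, str, int]:
    """Return (success, last_code, iterations_used)."""
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_iterations + 1):
        code = write_code(prompt, feedback)  # stand-in for the LLM call
        detect_packages(code)                # a real agent would build the image from this
        feedback = run_tests(code)           # None means every test passed
        if feedback is None:
            return True, code, attempt
    return False, code, max_iterations


def fake_llm(prompt: str, feedback: Optional[str]) -> str:
    # The first attempt contains a bug; the fix appears once feedback is supplied.
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def fake_tests(code: str) -> Optional[str]:
    namespace: dict = {}
    exec(code, namespace)  # the real agent runs pytest inside an isolated sandbox
    if namespace["add"](2, 3) == 5:
        return None
    return "add(2, 3) returned the wrong value"


ok, final_code, used = generate_until_green(fake_llm, fake_tests, "Add two numbers")
```

With these stubs the first attempt fails its test, the error message is fed back, and the second attempt passes, so `used` is 2.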
Data-First Generation
A distinguishing feature of this implementation is its "Data-First" approach: in addition to a prompt, you can supply sample pd.DataFrame or flyte.io.File objects for the agent to inspect.
from flyte.io import File
from flyteplugins.codegen import AutoCoderAgent
agent = AutoCoderAgent(model="gpt-4.1", base_packages=["pandas"])
# The agent will sample 'sales_data' to understand its columns and types
result = await agent.generate.aio(
    prompt="Calculate the monthly growth rate of sales.",
    samples={"sales_data": File("s3://my-bucket/sales.csv")},
    outputs={"growth_rate": float},
)
The agent uses these samples to build an "enhanced prompt" that includes inferred Pandera schemas, ensuring the generated code correctly references column names and data types.
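The idea behind the enhanced prompt can be illustrated without Pandera. The sketch below swaps in a deliberately simple type guesser built on the standard library's csv module. `infer_column_types` and `build_enhanced_prompt` are hypothetical helpers, the sample data is made up for illustration, and the prompt layout is an assumption rather than the format the agent actually emits.

```python
import csv
import io
from typing import Optional

_ORDER = ["int", "float", "str"]  # narrowest to widest


def _classify(value: Optional[str]) -> str:
    """Return the narrowest of int, float, str that can represent the value."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except (ValueError, TypeError):
            continue
    return "str"


def _widen(current: Optional[str], new: str) -> str:
    """Keep the wider of two type labels so mixed columns are not under-typed."""
    if current is None:
        return new
    return max(current, new, key=_ORDER.index)


def infer_column_types(csv_text: str, max_rows: int = 100) -> dict[str, str]:
    """Guess a coarse type for each column from the first max_rows rows."""
    types: dict[str, str] = {}
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        if i >= max_rows:
            break
        for name, value in row.items():
            types[name] = _widen(types.get(name), _classify(value))
    return types


def build_enhanced_prompt(prompt: str, name: str, csv_text: str) -> str:
    """Append the inferred column types to the user's prompt."""
    schema = infer_column_types(csv_text)
    lines = [f"  - {col}: {typ}" for col, typ in schema.items()]
    return f"{prompt}\n\nInput '{name}' has columns:\n" + "\n".join(lines)


# Made-up sample rows, for illustration only.
sample = "month,region,sales\n2024-01,EU,1200\n2024-02,EU,1350.5\n"
enhanced = build_enhanced_prompt(
    "Calculate the monthly growth rate of sales.", "sales_data", sample
)
```

Because the `sales` column mixes `1200` and `1350.5`, it is widened to `float`, which is the kind of detail that keeps generated code from assuming integer arithmetic.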
Execution Backends
The AutoCoderAgent supports two distinct execution strategies via the backend parameter:
- litellm (default): Uses a structured loop where the agent follows a predefined sequence of plan -> code -> test. This is highly predictable and works with any model supported by LiteLLM (e.g., GPT-4, Claude).
- claude: Uses the Claude Agent SDK to create a fully autonomous agent. In this mode, the agent decides when to write code, when to run tests, and how to fix errors using tool-calling capabilities. This requires the flyteplugins-codegen[agent] extra.
Working with Results
Once generation is successful, the CodeGenEvalResult provides two primary ways to execute the code.
One-off Execution with run()
The run() method executes the generated code in a sandbox immediately. If samples were provided during generation, they are used as default values.
# Overriding the sample with real production data
final_output = await result.run.aio(
    sales_data=File("s3://prod-bucket/2024_sales.csv")
)
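The "samples as defaults" behavior amounts to merging call-time arguments over the values captured at generation time. The sketch below shows only that merge. `resolve_inputs` is a hypothetical helper, the `region` input is invented for illustration, and how the real run() handles missing or unexpected arguments is not covered here.

```python
def resolve_inputs(sample_defaults: dict, overrides: dict) -> dict:
    """Call-time overrides win; anything not overridden falls back to the sample."""
    return {**sample_defaults, **overrides}


# Values captured during generation (the 'region' input is hypothetical).
defaults = {"sales_data": "s3://my-bucket/sales.csv", "region": "EU"}

# Only sales_data is overridden, so region keeps its sample value.
resolved = resolve_inputs(defaults, {"sales_data": "s3://prod-bucket/2024_sales.csv"})
```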
Reusable Tasks with as_task()
For integration into larger Flyte workflows, as_task() converts the generated solution into a standard Flyte task. This task uses the specific container image built during the generation phase, ensuring environment parity.
import flyte

# Create a reusable task from the generated code
processing_task = result.as_task(
    name="prod_sales_processor",
    resources=flyte.Resources(cpu=2, memory="4Gi"),
)
# Use it like any other Flyte task; prod_file is a flyte.io.File defined elsewhere
output = await processing_task(sales_data=prod_file)
Sandbox Configuration and Security
Because the generated code is untrusted, AutoCoderAgent provides several security and resource controls:
- Network Isolation: Set block_network=True to prevent the generated code from making outbound network calls.
- Resource Limits: Define resources (e.g., flyte.Resources(cpu=1, memory="1Gi")) to constrain the sandbox environment.
- Secrets: Pass flyte.Secret objects via the secrets argument to make sensitive credentials available to the sandbox securely.
- Caching: The cache parameter (defaulting to "auto") controls whether the sandbox execution results are cached, which is useful for expensive data processing tasks.
The agent manages the underlying flyte.sandbox lifecycle, including the automatic mapping of inputs to /var/inputs and outputs to /var/outputs within the container.
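The input and output mapping can be pictured with a local stand-in that uses temporary directories in place of /var/inputs and /var/outputs. The layout assumed here, one file per argument named after that argument, is a guess made for illustration, and `run_in_fake_sandbox` and `generated_script` are hypothetical names. The real flyte.sandbox mechanics may differ.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_in_fake_sandbox(
    inputs: dict[str, str],
    script: Callable[[Path, Path], None],
    output_names: list[str],
) -> dict[str, str]:
    """Write inputs to files, run the script, then read the named outputs back."""
    with tempfile.TemporaryDirectory() as root:
        in_dir = Path(root, "inputs")    # stands in for /var/inputs
        out_dir = Path(root, "outputs")  # stands in for /var/outputs
        in_dir.mkdir()
        out_dir.mkdir()
        for name, value in inputs.items():
            (in_dir / name).write_text(value)  # assumed layout: one file per input
        script(in_dir, out_dir)
        return {name: (out_dir / name).read_text() for name in output_names}


def generated_script(in_dir: Path, out_dir: Path) -> None:
    """Plays the role of generated code: read inputs, compute, write outputs."""
    values = [float(v) for v in (in_dir / "sales").read_text().split(",")]
    growth = (values[-1] - values[0]) / values[0]
    (out_dir / "growth_rate").write_text(str(growth))


outputs = run_in_fake_sandbox({"sales": "100,125"}, generated_script, ["growth_rate"])
```

The generated code only ever sees two directories, which is what lets the same script run unchanged against sample data and production data.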