Configuring Validation Behavior
To control whether a Flyte task should fail or simply log a warning when Pandera validation fails, use the ValidationConfig class within a typing.Annotated type hint.
Basic Configuration
By default, any Pandera validation failure will raise an exception and cause the Flyte task to fail. You can change this behavior to log a warning instead by setting on_error="warn".
```python
from typing import Annotated

import pandas as pd
import pandera.pandas as pa
import pandera.typing.pandas as pt
from flyteplugins.pandera import ValidationConfig

import flyte


class EmployeeSchema(pa.DataFrameModel):
    employee_id: int = pa.Field(ge=0)
    name: str


@flyte.task(report=True)
async def process_data(
    # Validation failure on input will only log a warning
    df: Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")],
) -> Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")]:
    # If input validation failed, the task still runs with the provided data
    return df
```
Configuration Options
The ValidationConfig class (defined in flyteplugins.pandera.config) supports the following parameter:
- on_error: A string literal that determines the failure mode:
  - "raise" (default): Raises a pandera.errors.SchemaErrors exception on failure, stopping task execution.
  - "warn": Logs the validation error to the task logs and continues execution.
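The two failure modes can be modeled in a few lines of standard-library Python. This is a hypothetical sketch of the semantics listed above, not the plugin's implementation: `ValidationFailed` stands in for `pandera.errors.SchemaErrors`, and `handle_validation_errors` is an illustrative name.

```python
import logging
from typing import Literal

logger = logging.getLogger("validation_sketch")

OnError = Literal["raise", "warn"]


class ValidationFailed(Exception):
    """Hypothetical stand-in for pandera.errors.SchemaErrors."""


def handle_validation_errors(errors: list[str], on_error: OnError = "raise") -> bool:
    """Apply an on_error policy to a list of validation error messages.

    Returns True if there were no errors, False if there were errors in
    "warn" mode, and raises ValidationFailed if there were errors in
    "raise" mode.
    """
    if on_error not in ("raise", "warn"):
        raise ValueError(f"on_error must be 'raise' or 'warn', got {on_error!r}")
    if not errors:
        return True
    message = "; ".join(errors)
    if on_error == "raise":
        # Default mode: stop execution by raising
        raise ValidationFailed(message)
    # "warn" mode: record the problem and let the caller continue
    logger.warning("Validation failed: %s", message)
    return False
```

For example, `handle_validation_errors(["employee_id < 0 in row 3"], "warn")` logs a warning and returns `False`, while the same call with the default `"raise"` raises `ValidationFailed`.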
Applying Validation Behavior
You can apply different validation behaviors to inputs and outputs independently within the same task.
```python
# Reuses the imports and EmployeeSchema from the previous example
@flyte.task(report=True)
async def mixed_validation_task(
    # Warn if input is invalid
    df: Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")],
) -> Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="raise")]:
    # The task fails when the returned DataFrame does not match EmployeeSchema
    return df
```
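The following is a hypothetical, standard-library stand-in for the validation that surrounds a task body. It illustrates why the input and output settings are independent: the input is validated under one policy, the function runs, and then the return value is validated under a separate policy. Rows are plain dicts, `check_employee` mimics the `EmployeeSchema` constraints, and `run_with_validation` is an illustrative name, not a Flyte or Pandera API.

```python
import logging
from typing import Callable, Literal

logger = logging.getLogger("mixed_validation_sketch")

OnError = Literal["raise", "warn"]
Row = dict


def check_employee(rows: list[Row]) -> list[str]:
    """Mimic EmployeeSchema: employee_id must be an int >= 0, name must be a str."""
    errors = []
    for i, row in enumerate(rows):
        emp_id = row.get("employee_id")
        if not isinstance(emp_id, int) or emp_id < 0:
            errors.append(f"row {i}: employee_id={emp_id!r} fails ge=0")
        if not isinstance(row.get("name"), str):
            errors.append(f"row {i}: name is not a str")
    return errors


def _apply(errors: list[str], on_error: OnError, where: str) -> None:
    """Raise or log, depending on the policy for this side of the task."""
    if not errors:
        return
    if on_error == "raise":
        raise ValueError(f"{where} validation failed: {errors}")
    logger.warning("%s validation failed: %s", where, errors)


def run_with_validation(
    fn: Callable[[list[Row]], list[Row]],
    rows: list[Row],
    input_on_error: OnError,
    output_on_error: OnError,
) -> list[Row]:
    # Input is checked before the body runs, under its own policy
    _apply(check_employee(rows), input_on_error, "input")
    result = fn(rows)
    # Output is checked after the body returns, under a separate policy
    _apply(check_employee(result), output_on_error, "output")
    return result
```

With the same `"warn"`/`"raise"` split as `mixed_validation_task`, an invalid input that is returned unchanged only produces a warning on the way in but raises at the output check. A body that filters out the offending rows passes both checks.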
Validation Reports in Flyte Decks
The Flyte-Pandera integration generates a validation report whether on_error is set to "raise" or "warn". If the task is configured with report=True, the Flyte Deck for that task execution shows the detailed Pandera validation results, including which rows or columns failed.
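The following hypothetical sketch illustrates the per-row, per-column detail that such a report carries. It collects failure cases into records loosely modeled on the failure-case table that Pandera attaches to `SchemaErrors` (column, check, failing value, row index), and it builds them the same way in either `on_error` mode. It does not reproduce the Flyte Deck rendering. `FailureCase`, `CHECKS`, `collect_failure_cases`, and `render_report` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FailureCase:
    index: int          # row position that failed
    column: str         # column that failed
    check: str          # human-readable check description
    failure_case: Any   # offending value


# Hypothetical column checks mirroring EmployeeSchema
CHECKS: dict[str, tuple[str, Callable[[Any], bool]]] = {
    "employee_id": ("ge=0", lambda v: isinstance(v, int) and v >= 0),
    "name": ("is str", lambda v: isinstance(v, str)),
}


def collect_failure_cases(rows: list[dict]) -> list[FailureCase]:
    """Gather every failing (row, column) pair; the on_error mode plays no part here."""
    cases = []
    for i, row in enumerate(rows):
        for column, (label, predicate) in CHECKS.items():
            value = row.get(column)
            if not predicate(value):
                cases.append(FailureCase(i, column, label, value))
    return cases


def render_report(cases: list[FailureCase]) -> str:
    """Render a plain-text summary, standing in for the Deck's report view."""
    if not cases:
        return "All checks passed."
    lines = [f"{len(cases)} failure case(s):"]
    for c in cases:
        lines.append(
            f"  row {c.index}, column {c.column!r}: {c.check} failed for {c.failure_case!r}"
        )
    return "\n".join(lines)
```

Because the failure cases are collected before any raise-or-warn decision, the same report content is available in both modes.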
Troubleshooting
- Annotated requirement: ValidationConfig must be wrapped in typing.Annotated. If you provide the config directly as a type hint (e.g., df: ValidationConfig), the Flyte type engine will not recognize it as a Pandera configuration.
- Import path: Ensure you import ValidationConfig from flyteplugins.pandera.
- Task execution: When using on_error="warn", the task continues with the data as-is. If you choose this setting, ensure your task logic can handle data that may not strictly adhere to the schema.