Configuring Validation Behavior

To control whether a Flyte task should fail or simply log a warning when Pandera validation fails, use the ValidationConfig class within a typing.Annotated type hint.

Basic Configuration

By default, any Pandera validation failure will raise an exception and cause the Flyte task to fail. You can change this behavior to log a warning instead by setting on_error="warn".

from typing import Annotated

import pandas as pd
import pandera.pandas as pa
import pandera.typing.pandas as pt
from flyteplugins.pandera import ValidationConfig
import flyte


class EmployeeSchema(pa.DataFrameModel):
    employee_id: int = pa.Field(ge=0)
    name: str


@flyte.task(report=True)
async def process_data(
    # Validation failure on input will only log a warning
    df: Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")],
) -> Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")]:
    # If input validation failed, the task still runs with the provided data
    return df

Configuration Options

The ValidationConfig class (defined in flyteplugins.pandera.config) supports the following parameter:

  • on_error: A string literal that determines the failure mode.
    • "raise" (default): Raises a pandera.errors.SchemaErrors exception on failure, stopping task execution.
    • "warn": Logs the validation error to the task logs and continues execution.
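
The difference between the two modes can be illustrated with a minimal, standard-library-only sketch. This is not the plugin's implementation: SketchValidationConfig, apply_validation, and require_non_negative are hypothetical names introduced here, and a plain ValueError stands in for the Pandera exception.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Literal

logger = logging.getLogger("validation_sketch")


@dataclass(frozen=True)
class SketchValidationConfig:
    """Hypothetical stand-in for ValidationConfig (not the plugin's class)."""

    on_error: Literal["raise", "warn"] = "raise"


def apply_validation(data, validate: Callable, config: SketchValidationConfig):
    """Run validate(data) and handle a failure according to config.on_error."""
    try:
        return validate(data)
    except ValueError as exc:
        if config.on_error == "raise":
            # "raise": propagate the error, which would stop the task
            raise
        # "warn": log the problem and hand back the data unchanged
        logger.warning("validation failed, continuing with data as-is: %s", exc)
        return data


def require_non_negative(values):
    """Hypothetical validator mirroring the ge=0 rule on employee_id."""
    if any(v < 0 for v in values):
        raise ValueError("negative value found")
    return values
```

With on_error="warn", apply_validation([1, -2], require_non_negative, SketchValidationConfig(on_error="warn")) returns the list untouched after logging a warning. With the default config, the same call raises.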

Applying Validation Behavior

You can apply different validation behaviors to inputs and outputs independently within the same task.

@flyte.task(report=True)
async def mixed_validation_task(
    # Warn if input is invalid
    df: Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="warn")]
) -> Annotated[pt.DataFrame[EmployeeSchema], ValidationConfig(on_error="raise")]:
    # Task will fail here if the returned DataFrame does not match EmployeeSchema
    return df
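
When the input only warns but the output must pass, the task body usually needs to repair the data before returning it. The sketch below shows one way to do that with plain pandas, assuming the only rule to enforce is the employee_id >= 0 constraint from EmployeeSchema. drop_invalid_employees is a hypothetical helper, not part of the plugin.

```python
import pandas as pd


def drop_invalid_employees(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that satisfy the EmployeeSchema rule employee_id >= 0."""
    valid = df["employee_id"] >= 0
    return df.loc[valid].reset_index(drop=True)


# Example input with one row that violates the schema
raw = pd.DataFrame({"employee_id": [1, -5, 3], "name": ["Ann", "Bob", "Cy"]})
cleaned = drop_invalid_employees(raw)
```

Calling a helper like this inside mixed_validation_task before the return statement would let a task accept imperfect input while still returning data that matches the schema.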

Validation Reports in Flyte Decks

Regardless of whether on_error is set to "raise" or "warn", the Flyte-Pandera integration still generates a validation report. If the task is configured with report=True, you can view the detailed Pandera validation results (including which rows or columns failed) in the Flyte Deck for that task execution.

Troubleshooting

  • Annotated Requirement: ValidationConfig must be wrapped in typing.Annotated. If you provide the config directly as a type hint (e.g., df: ValidationConfig), the Flyte type engine will not recognize it as a Pandera configuration.
  • Import Path: Import ValidationConfig from the top-level flyteplugins.pandera package, as in the example above (from flyteplugins.pandera import ValidationConfig).
  • Task Execution: When using on_error="warn", the task continues with the data as-is. Ensure your task logic can handle data that may not strictly adhere to the schema if you choose this setting.
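
To see why the Annotated wrapper matters, it helps to look at how metadata attached to a type hint can be read back using only the standard typing module. The sketch below is an illustration, not the Flyte type engine's actual lookup: SketchConfig and find_config are hypothetical names, and list stands in for the DataFrame type.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ValidationConfig."""

    on_error: str = "raise"


def find_config(func, param: str) -> Optional[SketchConfig]:
    """Return the SketchConfig attached to a parameter via Annotated, if any."""
    # include_extras=True keeps the Annotated metadata instead of stripping it
    hint = get_type_hints(func, include_extras=True)[param]
    if get_origin(hint) is not Annotated:
        # A bare hint carries no metadata, so there is nothing to find
        return None
    # get_args returns (base_type, *metadata); skip the base type
    for meta in get_args(hint)[1:]:
        if isinstance(meta, SketchConfig):
            return meta
    return None


def annotated_task(df: Annotated[list, SketchConfig(on_error="warn")]):
    return df


def bare_task(df: SketchConfig):
    return df
```

Here find_config(annotated_task, "df") returns the config with on_error="warn", while find_config(bare_task, "df") returns None: in the bare form the config class is the type itself, so there is no data type to validate and no metadata slot to read the config from.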