Automated Validation Reporting

Flyte provides an automated reporting mechanism for data validation when using the Pandera plugin. This system captures validation results (both successes and failures) and renders them as rich HTML reports directly within the Flyte UI.

Core Reporting Architecture

The reporting system is built on a multi-layered architecture that connects Flyte's type system with Pandera's validation engine:

  1. PanderaDataFrameTransformer: Located in plugins/pandera/src/flyteplugins/pandera/transformers/base.py, this class intercepts data during type conversion (both to_literal for outputs and to_python_value for inputs). It triggers the validation process and manages the lifecycle of the report.
  2. PanderaReportRenderer: A protocol defined in plugins/pandera/src/flyteplugins/pandera/renderers/base.py that specifies the interface for generating HTML from validation data.
  3. PanderaPandasReportRenderer: The concrete implementation in plugins/pandera/src/flyteplugins/pandera/renderers/pandas.py. It uses the great_tables library to transform raw validation errors into formatted HTML tables.
  4. Report and Tab: Defined in src/flyte/report/_report.py, these classes manage the aggregation of HTML content into a multi-tabbed interface that Flyte can persist and display.
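The way these layers fit together can be illustrated with a small, self-contained sketch. It is not the plugin's code: ReportRenderer, PlainListRenderer, and Transformer are hypothetical stand-ins, the data is a plain list of dicts rather than a DataFrame, and the to_html signature is assumed for illustration. It only shows the pattern of a transformer that validates on both conversion directions and delegates HTML generation to any object satisfying a protocol.

```python
from typing import Any, Protocol


class ReportRenderer(Protocol):
    """Hypothetical stand-in for the renderer protocol: anything with to_html()."""

    def to_html(self, data: Any, errors: list[str]) -> str: ...


class PlainListRenderer:
    """Toy renderer; it satisfies the protocol structurally, without inheritance."""

    def to_html(self, data: Any, errors: list[str]) -> str:
        items = "".join(f"<li>{e}</li>" for e in errors) or "<li>no errors</li>"
        return f"<h2>{len(data)} rows</h2><ul>{items}</ul>"


class Transformer:
    """Toy transformer: validates in both directions and stores one report per side."""

    def __init__(self, renderer: ReportRenderer) -> None:
        self._renderer = renderer
        self.reports: dict[str, str] = {}

    def _validate(self, rows: list[dict], title: str) -> list[dict]:
        # Assumed toy rule: ids must be non-negative.
        errors = [f"row {i}: id < 0" for i, r in enumerate(rows) if r["id"] < 0]
        self.reports[title] = self._renderer.to_html(rows, errors)
        return rows

    def to_python_value(self, rows: list[dict]) -> list[dict]:
        return self._validate(rows, "input")  # task input side

    def to_literal(self, rows: list[dict]) -> list[dict]:
        return self._validate(rows, "output")  # task output side
```

Because the renderer is typed as a protocol, a different renderer (for another DataFrame library, say) could be swapped in without touching the transformer.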

Enabling Automated Reports

To generate validation reports, enable reporting on the task by passing report=True to the task decorator. With reporting enabled, the PanderaDataFrameTransformer automatically creates a tab named "Pandera report: input" or "Pandera report: output", depending on whether the validation occurs on a task input or a task output.

import pandas as pd
import pandera as pa
import flyte.typing as pt
from flyte.extend import task

class Schema(pa.DataFrameModel):
    id: int = pa.Field(ge=0)
    name: str

@task(report=True)
def validate_data(df: pd.DataFrame) -> pt.DataFrame[Schema]:
    # The transformer will validate the return value against Schema
    # and generate a report tab automatically.
    return df

Report Structure and Content

The PanderaPandasReportRenderer constructs a comprehensive view of the data state using the PandasReport dataclass, which holds several key components:

  • Summary: A high-level overview including the Schema Name, Data Shape (rows x columns), and total error counts.
  • Data Preview: A snapshot of the first 5 rows (defined by DATA_PREVIEW_HEAD) of the validated DataFrame.
  • Schema-level Errors: Detailed in a table if metadata validation fails (e.g., missing columns or incorrect dtypes).
  • Data-level Errors: A breakdown of value-level failures (e.g., nulls where not allowed). This table includes a percent_valid column and a list of specific failure cases, limited to the first 10 instances (FAILURE_CASE_LIMIT) to prevent report bloat.
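As a rough illustration of how such a report could be assembled, the sketch below builds a small dataclass from a list of row dicts. The names ToyReport and build_report are hypothetical, and the percent_valid formula (share of rows passing the check) is an assumption, not taken from the plugin. Only the two limits, 5 preview rows and 10 failure cases, come from the description above.

```python
from dataclasses import dataclass, field

DATA_PREVIEW_HEAD = 5    # preview rows, value taken from the text
FAILURE_CASE_LIMIT = 10  # failure cases listed per check, value taken from the text


@dataclass
class ToyReport:
    """Hypothetical, simplified stand-in for the report dataclass."""

    schema_name: str
    shape: tuple[int, int]  # (rows, columns)
    preview: list[dict]
    data_errors: list[dict] = field(default_factory=list)


def build_report(schema_name, rows, column, check, check_name):
    """Apply one value-level check to one column and summarize the outcome."""
    failures = [r[column] for r in rows if not check(r[column])]
    errors = []
    if failures:  # non-empty failures implies non-empty rows, so no division by zero
        errors.append({
            "column": column,
            "check": check_name,
            # Assumed definition: percentage of rows that pass the check.
            "percent_valid": 100.0 * (len(rows) - len(failures)) / len(rows),
            # Truncate the listed cases so the report stays small.
            "failure_cases": failures[:FAILURE_CASE_LIMIT],
        })
    n_cols = len(rows[0]) if rows else 0
    return ToyReport(schema_name, (len(rows), n_cols), rows[:DATA_PREVIEW_HEAD], errors)
```

Truncating the failure cases while still reporting percent_valid keeps the full extent of the problem visible even when only a few examples are listed.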

Customizing Validation Behavior

You can control how validation failures affect task execution using the ValidationConfig class. By wrapping types with Annotated, you can specify whether a validation failure should raise an exception or merely log a warning while still generating the report.

from typing import Annotated
from flyteplugins.pandera import ValidationConfig

# Schema, pt, and task are defined as in the previous example.
@task(report=True)
def process_data(
    # Validation failures will log a warning but allow the task to continue
    df: Annotated[pt.DataFrame[Schema], ValidationConfig(on_error="warn")]
) -> pt.DataFrame[Schema]:
    return df

When validation fails, PanderaDataFrameTransformer._validate checks the configuration:

  • If on_error="raise" (default), the SchemaErrors exception is re-raised after the report is generated.
  • If on_error="warn", the renderer is called with warn=True, which changes the report icon to a warning symbol (⚠️) instead of an error (❌), and the task continues.
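The branching described above can be sketched without Flyte or Pandera. Everything here is a hypothetical stand-in: ToySchemaErrors plays the role of SchemaErrors, ToyValidationConfig mirrors only the on_error field, and the "no errors" text for the passing case is an assumption. The point is the ordering: the report entry is produced first, and only then does the mode decide between raising and continuing.

```python
import logging
from dataclasses import dataclass
from typing import Literal

logger = logging.getLogger("validation_sketch")


class ToySchemaErrors(Exception):
    """Stand-in for Pandera's SchemaErrors."""


@dataclass(frozen=True)
class ToyValidationConfig:
    """Stand-in that mirrors only the on_error setting."""

    on_error: Literal["raise", "warn"] = "raise"


def validate(values: list[int], config: ToyValidationConfig, reports: list[str]) -> list[int]:
    bad = [v for v in values if v < 0]  # assumed toy rule: values must be >= 0
    if not bad:
        reports.append("no errors")
        return values
    warn = config.on_error == "warn"
    # Render the report before deciding whether to raise, so it exists in both modes.
    reports.append(f"{'⚠️' if warn else '❌'} {len(bad)} failing value(s)")
    if warn:
        logger.warning("validation failed for %d value(s); continuing", len(bad))
        return values
    raise ToySchemaErrors(f"{len(bad)} failing value(s)")
```

Calling validate([1, -1], ToyValidationConfig(on_error="warn"), reports) returns the data unchanged and records a warning entry, while the default configuration records an error entry and then raises.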

Technical Implementation Details

The integration relies on Flyte's internal reporting context. When _validate is called, it performs the following steps:

  1. Validation: Executes schema.validate(data, lazy=True).
  2. HTML Generation: Calls self._report_renderer.to_html(...).
  3. Tab Management: Uses flyte.report.get_tab(report_title).replace(html) to insert the generated HTML into the current task's report.
  4. Persistence: Calls await flyte.report.flush.aio() to write the report to the configured Flyte metadata storage (e.g., S3 or GCS).
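The four steps can be mimicked with plain asyncio. In this sketch ToyTab and ToyReportContext are hypothetical stand-ins for Flyte's tab and report context, the storage dict stands in for the blob store, and the validation rule is a toy one. It is meant only to show the order of operations, including that replace overwrites a tab's previous content rather than appending to it.

```python
import asyncio


class ToyTab:
    """Stand-in for a report tab holding one HTML fragment."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content = ""

    def replace(self, html: str) -> None:
        self.content = html  # overwrite, do not append


class ToyReportContext:
    """Stand-in for the task's report context; `storage` mimics S3 or GCS."""

    def __init__(self) -> None:
        self.tabs: dict[str, ToyTab] = {}
        self.storage: dict[str, str] = {}

    def get_tab(self, name: str) -> ToyTab:
        # Create the tab on first use, then reuse it.
        return self.tabs.setdefault(name, ToyTab(name))

    async def flush(self) -> None:
        await asyncio.sleep(0)  # placeholder for the asynchronous upload
        self.storage["report.html"] = "\n".join(
            f"<section title='{t.name}'>{t.content}</section>" for t in self.tabs.values()
        )


async def validate_and_report(values: list[int], ctx: ToyReportContext, title: str) -> list[int]:
    errors = [v for v in values if v < 0]  # 1. validation, collecting all failures
    html = f"<p>{len(errors)} error(s) in {len(values)} value(s)</p>"  # 2. HTML generation
    ctx.get_tab(title).replace(html)  # 3. tab management
    await ctx.flush()  # 4. persistence
    return values
```

Collecting every failure before rendering is what lazy validation provides: the report can list all problems at once instead of stopping at the first.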

The Report class in src/flyte/report/_report.py handles the final assembly by injecting the tab content into an HTML template (_template.html) using Python's string.Template. This template includes the necessary JavaScript to handle tab switching in the Flyte UI.
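A minimal sketch of template-based assembly with string.Template follows. The PAGE template and the assemble function are hypothetical; the contents of the real _template.html are not reproduced here. One detail worth noting for any template that embeds JavaScript: a literal dollar sign must be written as $$ so that string.Template does not treat it as a placeholder.

```python
from string import Template

# Hypothetical miniature page template (not Flyte's _template.html).
PAGE = Template(
    "<html><body>\n"
    "<nav>$nav</nav>\n"
    "$panels\n"
    "<script>/* tab switching: show($$id) */</script>\n"
    "</body></html>"
)


def assemble(tabs: dict[str, str]) -> str:
    """Render one button and one panel per tab, in insertion order."""
    nav = "".join(
        f"<button data-tab='tab{i}'>{name}</button>" for i, name in enumerate(tabs)
    )
    panels = "\n".join(
        f"<div id='tab{i}'>{html}</div>" for i, html in enumerate(tabs.values())
    )
    # substitute() raises KeyError if a placeholder is missing, which surfaces
    # template mistakes early; substituted values are not re-scanned for "$".
    return PAGE.substitute(nav=nav, panels=panels)
```

For example, assemble({"Pandera report: input": "<p>ok</p>"}) yields a page with a single button and a single panel.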