Automated Validation Reporting
Flyte provides an automated reporting mechanism for data validation when using the Pandera plugin. This system captures validation results—both successes and failures—and renders them as rich HTML reports directly within the Flyte UI.
Core Reporting Architecture
The reporting system is built on a multi-layered architecture that connects Flyte's type system with Pandera's validation engine:
- PanderaDataFrameTransformer: Located in plugins/pandera/src/flyteplugins/pandera/transformers/base.py, this class intercepts data during type conversion (both to_literal for outputs and to_python_value for inputs). It triggers the validation process and manages the lifecycle of the report.
- PanderaReportRenderer: A protocol defined in plugins/pandera/src/flyteplugins/pandera/renderers/base.py that specifies the interface for generating HTML from validation data.
- PanderaPandasReportRenderer: The concrete implementation in plugins/pandera/src/flyteplugins/pandera/renderers/pandas.py. It uses the great_tables library to transform raw validation errors into formatted HTML tables.
- Report and Tab: Defined in src/flyte/report/_report.py, these classes manage the aggregation of HTML content into a multi-tabbed interface that Flyte can persist and display.
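To illustrate how a renderer protocol decouples the transformer from any one HTML backend, here is a minimal sketch using only the standard library. The names ReportRenderer and PlainListRenderer, and the exact to_html signature, are assumptions made for illustration; they are not the plugin's actual definitions.

```python
from dataclasses import dataclass
from html import escape
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReportRenderer(Protocol):
    """Hypothetical stand-in for the PanderaReportRenderer protocol."""

    def to_html(self, data: Any, errors: list[str], warn: bool = False) -> str:
        ...


@dataclass
class PlainListRenderer:
    """Toy renderer: it satisfies the protocol structurally, without inheriting from it."""

    title: str = "Validation report"

    def to_html(self, data: Any, errors: list[str], warn: bool = False) -> str:
        if not errors:
            status = "passed"
        else:
            status = "warning" if warn else "failed"
        # Escape error text so it cannot inject markup into the report.
        items = "".join(f"<li>{escape(e)}</li>" for e in errors)
        return f"<h2>{escape(self.title)}: {status}</h2><ul>{items}</ul>"


renderer = PlainListRenderer()
html_out = renderer.to_html(data=None, errors=["column 'id' has values < 0"])
```

Because the protocol is structural, a renderer for another DataFrame library would only need to expose a compatible to_html method.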
Enabling Automated Reports
To generate validation reports, pass report=True to the task decorator. When this is enabled, the PanderaDataFrameTransformer automatically creates a tab named "Pandera report: input" or "Pandera report: output", depending on whether validation runs on a task input or a task output.
import pandas as pd
import pandera as pa
import flyte.typing as pt
from flyte.extend import task


class Schema(pa.DataFrameModel):
    id: int = pa.Field(ge=0)
    name: str


@task(report=True)
def validate_data(df: pd.DataFrame) -> pt.DataFrame[Schema]:
    # The transformer will validate the return value against Schema
    # and generate a report tab automatically.
    return df
Report Structure and Content
The PanderaPandasReportRenderer constructs a comprehensive view of the data state using the PandasReport dataclass, which holds several key components:
- Summary: A high-level overview including the Schema Name, Data Shape (rows x columns), and total error counts.
- Data Preview: A snapshot of the first 5 rows (defined by DATA_PREVIEW_HEAD) of the validated DataFrame.
- Schema-level Errors: Detailed in a table if metadata validation fails (e.g., missing columns or incorrect dtypes).
- Data-level Errors: A breakdown of value-level failures (e.g., nulls where not allowed). This table includes a percent_valid column and a list of specific failure cases, limited to the first 10 instances (FAILURE_CASE_LIMIT) to prevent report bloat.
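To make the data-level error table concrete, the following sketch shows one plausible way a percent_valid figure and a capped failure list could be derived for a single check. CheckSummary and summarize_check are hypothetical names, and the percentage formula is an assumption; only the limit of 10 comes from the description above.

```python
from dataclasses import dataclass, field

FAILURE_CASE_LIMIT = 10  # limit stated in the text above


@dataclass
class CheckSummary:
    """Hypothetical row of the data-level error table."""

    column: str
    check: str
    percent_valid: float
    failure_cases: list = field(default_factory=list)


def summarize_check(column, check, values, is_valid):
    """Summarize one check: share of valid values plus a capped list of failures."""
    failures = [v for v in values if not is_valid(v)]
    if values:
        percent_valid = 100.0 * (len(values) - len(failures)) / len(values)
    else:
        percent_valid = 100.0  # assumption: an empty column counts as fully valid
    # Keep only the first few failures so a bad column cannot bloat the report.
    return CheckSummary(column, check, percent_valid, failures[:FAILURE_CASE_LIMIT])


ids = list(range(-12, 8))  # 20 values, 12 of them negative
summary = summarize_check("id", "ge(0)", ids, lambda v: v >= 0)
```

Here 8 of the 20 values pass the check, so percent_valid is 40.0, and only the first 10 of the 12 failing values are retained.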
Customizing Validation Behavior
You can control how validation failures affect task execution using the ValidationConfig class. By wrapping types with Annotated, you can specify whether a validation failure should raise an exception or merely log a warning while still generating the report.
from typing import Annotated

from flyteplugins.pandera import ValidationConfig

# pt, task, and Schema are defined in the previous example.


@task(report=True)
def process_data(
    # Validation failures will log a warning but allow the task to continue
    df: Annotated[pt.DataFrame[Schema], ValidationConfig(on_error="warn")]
) -> pt.DataFrame[Schema]:
    return df
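For readers unfamiliar with how metadata attached through Annotated can be recovered at runtime, here is a standard-library sketch. SketchValidationConfig and find_config are hypothetical stand-ins, and the plugin may locate its ValidationConfig differently; the typing calls themselves (get_type_hints, get_origin, get_args) are standard.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class SketchValidationConfig:
    """Hypothetical stand-in for ValidationConfig."""

    on_error: str = "raise"


def find_config(annotation):
    """Return the config attached via Annotated, or a default if none is present."""
    if get_origin(annotation) is Annotated:
        # get_args yields (base_type, metadata_1, metadata_2, ...)
        for meta in get_args(annotation)[1:]:
            if isinstance(meta, SketchValidationConfig):
                return meta
    return SketchValidationConfig()


def process(df: Annotated[list, SketchValidationConfig(on_error="warn")]) -> list:
    return df


# include_extras=True preserves the Annotated wrapper instead of stripping it.
hints = get_type_hints(process, include_extras=True)
input_config = find_config(hints["df"])
output_config = find_config(hints["return"])
```

The input annotation carries an explicit config, while the unannotated return type falls back to the default "raise" behavior.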
In PanderaDataFrameTransformer._validate, the configuration is checked:
- If on_error="raise" (default), the SchemaErrors exception is re-raised after the report is generated.
- If on_error="warn", the renderer is called with warn=True, which changes the report icon to a warning symbol (⚠️) instead of an error (❌), and the task continues.
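The two branches above can be sketched as a small control-flow example. SketchSchemaErrors, SketchConfig, and run_validation are hypothetical stand-ins so that the snippet runs without pandera or Flyte; it illustrates the ordering (report first, then raise or continue), not the plugin's actual code.

```python
from dataclasses import dataclass


class SketchSchemaErrors(Exception):
    """Stand-in for pandera's SchemaErrors, so the sketch runs without pandera."""


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ValidationConfig."""

    on_error: str = "raise"


def run_validation(validate, config, reports):
    """Run validate(); record a report line on failure, then raise or continue."""
    try:
        validate()
    except SketchSchemaErrors as exc:
        warn = config.on_error == "warn"
        icon = "⚠️" if warn else "❌"
        reports.append(f"{icon} {exc}")  # the report is written before any re-raise
        if not warn:
            raise
        return False
    return True


def failing_check():
    raise SketchSchemaErrors("id: 3 values below 0")


reports = []
# "warn": the failure is recorded and execution continues.
continued = run_validation(failing_check, SketchConfig(on_error="warn"), reports)

# "raise" (default): the failure is recorded, then the exception propagates.
try:
    run_validation(failing_check, SketchConfig(), reports)
    raised = False
except SketchSchemaErrors:
    raised = True
```

In both modes a report entry exists afterwards; the modes differ only in whether the caller sees an exception.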
Technical Implementation Details
The integration relies on Flyte's internal reporting context. When _validate is called, it performs the following steps:
- Validation: Executes schema.validate(data, lazy=True).
- HTML Generation: Calls self._report_renderer.to_html(...).
- Tab Management: Uses flyte.report.get_tab(report_title).replace(html) to insert the generated HTML into the current task's report.
- Persistence: Calls await flyte.report.flush.aio() to write the report to the configured Flyte metadata storage (e.g., S3 or GCS).
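The four steps above can be sketched end to end with in-memory stand-ins. Tab, InMemoryReport, validate_lazily, and validate_and_report are all hypothetical and use only the standard library; the real implementation works on DataFrames and writes to Flyte's metadata storage rather than to a list.

```python
import asyncio


class Tab:
    """Stand-in for a report tab whose content can be replaced."""

    def __init__(self, name):
        self.name = name
        self.content = ""

    def replace(self, html):
        self.content = html


class InMemoryReport:
    """Stand-in for the task report: tabs by name, plus an async flush."""

    def __init__(self):
        self.tabs = {}
        self.flushed = []

    def get_tab(self, name):
        return self.tabs.setdefault(name, Tab(name))

    async def flush(self):
        # A real flush would upload to object storage; here we only snapshot.
        self.flushed = [(t.name, t.content) for t in self.tabs.values()]


def validate_lazily(rows):
    """Collect every failure instead of stopping at the first (the idea behind lazy=True)."""
    errors = []
    for i, row in enumerate(rows):
        if not isinstance(row.get("id"), int) or row["id"] < 0:
            errors.append(f"row {i}: id must be an int >= 0")
        if not isinstance(row.get("name"), str):
            errors.append(f"row {i}: name must be a str")
    return errors


async def validate_and_report(rows, report, title):
    errors = validate_lazily(rows)                            # 1. validation
    items = "".join(f"<li>{e}</li>" for e in errors)
    html = f"<p>{len(errors)} error(s)</p><ul>{items}</ul>"   # 2. HTML generation
    report.get_tab(title).replace(html)                       # 3. tab management
    await report.flush()                                      # 4. persistence
    return errors


report = InMemoryReport()
rows = [{"id": 1, "name": "a"}, {"id": -1, "name": None}]
errors = asyncio.run(validate_and_report(rows, report, "Pandera report: output"))
```

Collecting all errors before rendering is what lets a single report show every failing check at once, rather than only the first one encountered.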
The Report class in src/flyte/report/_report.py handles the final assembly by injecting the tab content into an HTML template (_template.html) using Python's string.Template. This template includes the necessary JavaScript to handle tab switching in the Flyte UI.
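As a rough illustration of template-based assembly, the sketch below injects tab content into a page with string.Template. The placeholder names ($nav, $panels), the markup, and the assemble helper are assumptions made for this example; the actual _template.html and its JavaScript are not reproduced here.

```python
from string import Template

# Hypothetical page skeleton; the real template also ships tab-switching JavaScript.
PAGE = Template(
    "<html><body>\n"
    "<nav>$nav</nav>\n"
    "<main>$panels</main>\n"
    "</body></html>"
)


def assemble(tabs: dict[str, str]) -> str:
    """Build one navigation button and one content panel per tab, then fill the template."""
    nav = "".join(
        f'<button data-tab="{i}">{name}</button>' for i, name in enumerate(tabs)
    )
    panels = "".join(
        f'<section id="tab-{i}">{html}</section>'
        for i, html in enumerate(tabs.values())
    )
    return PAGE.substitute(nav=nav, panels=panels)


page = assemble(
    {
        "Pandera report: input": "<p>0 errors</p>",
        "Pandera report: output": "<p>2 errors</p>",
    }
)
```

One practical detail of string.Template is that a literal dollar sign in the template text itself must be written as $$, which matters for templates that embed JavaScript.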