Experiment Tracking with Weights & Biases

To track experiments with Weights & Biases (W&B) in this project, use the @wandb_init decorator to automatically manage run lifecycles and the wandb_config utility to propagate settings across tasks.

Basic Experiment Tracking

The most direct way to log metrics is to decorate your task with @wandb_init and use get_wandb_run() to access the active W&B run object.

from flyteplugins.wandb import wandb_init, get_wandb_run
import flyte

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def train_model(learning_rate: float) -> str:
    # Access the automatically initialized run
    wandb_run = get_wandb_run()
    
    # Log metrics as you normally would with wandb
    wandb_run.log({"loss": 0.5, "learning_rate": learning_rate})
    
    return wandb_run.id
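
The stacking order in the example above matters because Python applies decorators bottom-up: the decorator closest to the function runs first, and the outermost one receives whatever the inner one returned. The sketch below illustrates this with two hypothetical stand-ins, fake_task and fake_wandb_init. They are not part of flyte or flyteplugins and only mimic the shape of the real decorators.

```python
def fake_task(fn):
    """Hypothetical stand-in for @flyte.task: wraps the function in a task object."""
    class Task:
        def __init__(self, func):
            self.func = func
            self.name = func.__name__

        def __call__(self, *args, **kwargs):
            return self.func(*args, **kwargs)

    return Task(fn)


def fake_wandb_init(obj):
    """Hypothetical stand-in for @wandb_init: records what kind of object it received."""
    obj.received_type = type(obj).__name__
    return obj


@fake_wandb_init  # applied second: receives the Task object
@fake_task        # applied first: receives the plain function
def train():
    return "done"


print(train.received_type)  # Task
print(train())              # done
```

With the two decorators swapped, fake_wandb_init would receive the plain function instead of the task object, which is one plausible reason an outermost placement is required.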

Configuring W&B via Context

Instead of hardcoding project and entity names in decorators, you can use wandb_config to set these globally for a workflow execution. This configuration propagates to all child tasks.

import flyte
from flyteplugins.wandb import get_wandb_run, wandb_config, wandb_init

@wandb_init
@flyte.task
async def task_with_inherited_config():
    run = get_wandb_run()
    run.log({"metric": 1.0})

# Run with global context
result = flyte.with_runcontext(
    custom_context=wandb_config(
        project="flyte-wandb-test",
        entity="my-team",
        tags=["experiment-v1"]
    )
).run(task_with_inherited_config)

Managing Parent and Child Tasks

The run_mode parameter in wandb_config or @wandb_init determines how child tasks handle W&B runs relative to their parents.

auto (Default): Uses the parent's run ID if available; otherwise, creates a new run.
shared: Forces the child to use the parent's run ID.
new: Always creates a unique W&B run for the child task.

import flyte
from flyteplugins.wandb import wandb_init, get_wandb_run

@wandb_init(run_mode="new")  # This child will always have its own unique run
@flyte.task
async def independent_child():
    run = get_wandb_run()
    run.log({"child_val": 42})

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def parent_task():
    run = get_wandb_run()
    run.log({"parent_val": 100})
    
    # This call will create a separate run in W&B
    await independent_child()
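
The three run_mode values described above can be summarized as a small decision function. This is a minimal sketch of the documented rules, not the plugin's implementation: resolve_run_id and _new_id are hypothetical helpers, and raising an error for shared mode without a parent is an assumption, since the behavior in that case is not specified here.

```python
import uuid
from typing import Optional


def _new_id() -> str:
    """Hypothetical helper: generate a short unique run ID."""
    return uuid.uuid4().hex[:8]


def resolve_run_id(run_mode: str, parent_run_id: Optional[str]) -> str:
    """Pick the run ID a child task would use under each run_mode."""
    if run_mode == "new":
        # Always start a fresh run, even if a parent run exists.
        return _new_id()
    if run_mode == "shared":
        # Assumption: sharing without a parent run is treated as an error.
        if parent_run_id is None:
            raise ValueError("run_mode='shared' requires a parent run ID")
        return parent_run_id
    if run_mode == "auto":
        # Reuse the parent's run when there is one, otherwise create a new run.
        return parent_run_id if parent_run_id is not None else _new_id()
    raise ValueError(f"unknown run_mode: {run_mode!r}")


print(resolve_run_id("auto", "parent-123"))    # parent-123
print(resolve_run_id("shared", "parent-123"))  # parent-123
```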

Distributed Training Logging

When using distributed training (e.g., with Elastic), the rank_scope parameter controls which processes initialize W&B runs.

global (Default): Only the global rank 0 process creates a run.
worker: The local rank 0 process on each worker (node) creates a run.

Combined with run_mode, you can achieve different logging patterns:

import flyte
from flyteplugins.pytorch.task import Elastic
from flyteplugins.wandb import wandb_init, get_wandb_run

# 1. Single shared run for all ranks
@wandb_init(run_mode="shared", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def shared_logging():  # Synchronous: async task functions are not supported with Elastic
    run = get_wandb_run()
    run.log({"global_metric": 1.0})

# 2. Separate runs for every rank (grouped in W&B UI)
@wandb_init(run_mode="new", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def separate_logging():  # Synchronous: async task functions are not supported with Elastic
    run = get_wandb_run()
    run.log({"rank_specific_metric": 2.0})
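
The rank_scope rule described earlier in this section can be sketched as a predicate over a process's global and local rank. This is an illustration under stated assumptions rather than the plugin's code: should_create_run is a hypothetical helper, and it models only the rank_scope description, not how run_mode changes which ranks log. In a torchrun-style launch, these ranks are typically exposed through the RANK and LOCAL_RANK environment variables.

```python
def should_create_run(rank_scope: str, global_rank: int, local_rank: int) -> bool:
    """Return True if this process would create a W&B run under the given scope."""
    if rank_scope == "global":
        # One creator for the whole job: the process with global rank 0.
        return global_rank == 0
    if rank_scope == "worker":
        # One creator per worker (node): the process with local rank 0.
        return local_rank == 0
    raise ValueError(f"unknown rank_scope: {rank_scope!r}")


# Enumerate (global_rank, local_rank) pairs for 2 nodes with 2 processes each.
NNODES, NPROC_PER_NODE = 2, 2
ranks = [
    (node * NPROC_PER_NODE + local, local)
    for node in range(NNODES)
    for local in range(NPROC_PER_NODE)
]

global_creators = [g for g, l in ranks if should_create_run("global", g, l)]
worker_creators = [g for g, l in ranks if should_create_run("worker", g, l)]

print(global_creators)  # [0]
print(worker_creators)  # [0, 2]
```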

Manual Link Integration

If you need to manage W&B runs manually (e.g., calling wandb.init() yourself), you can still ensure the Flyte UI displays the correct link by using the Wandb link class.

import flyte
import wandb
from flyteplugins.wandb import Wandb

@flyte.task(
    links=(
        Wandb(
            project="my-project",
            entity="my-team",
            id="custom-manual-id",
            run_mode="new"
        ),
    )
)
async def manual_task():
    # You must ensure the ID here matches the ID in the Wandb link above
    run = wandb.init(project="my-project", entity="my-team", id="custom-manual-id")
    run.log({"manual": True})
    run.finish()

Troubleshooting and Gotchas

Decorator Order: The @wandb_init decorator must be the outermost decorator on your Flyte task.
Async and Distributed: Async task functions are not supported when using Elastic (distributed) training; use synchronous functions instead.
Log Downloads: Setting download_logs=True in wandb_config is not supported for distributed tasks.
Traces: If using @flyte.trace, do not use @wandb_init on the traced function. Instead, call get_wandb_run() inside the function to access the run initialized by the parent task.
Missing Project/Entity: If either project or entity is not provided via the decorator or the context, the W&B link in the Flyte UI falls back to the base host URL (e.g., https://wandb.ai).
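
The fallback described in the last item can be pictured with a small URL builder. This is a sketch only: wandb_link is a hypothetical helper, not a function exported by the plugin, and it assumes the standard W&B URL layout of host/entity/project/runs/run_id.

```python
from typing import Optional


def wandb_link(
    host: str = "https://wandb.ai",
    entity: Optional[str] = None,
    project: Optional[str] = None,
    run_id: Optional[str] = None,
) -> str:
    """Build a W&B link, falling back to the host when entity or project is missing."""
    host = host.rstrip("/")
    if not entity or not project:
        # Without both entity and project there is no run page to point at.
        return host
    base = f"{host}/{entity}/{project}"
    # Link to the specific run when an ID is known, otherwise to the project page.
    return f"{base}/runs/{run_id}" if run_id else base


print(wandb_link())
# https://wandb.ai
print(wandb_link(entity="my-team", project="my-project", run_id="custom-manual-id"))
# https://wandb.ai/my-team/my-project/runs/custom-manual-id
```

If the link in the Flyte UI points at the bare host, checking that both project and entity reach the task (through the decorator or wandb_config) is a reasonable first step.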
