Experiment Tracking with Weights & Biases

To track experiments with Weights & Biases (W&B) in this project, use the @wandb_init decorator to automatically manage run lifecycles and the wandb_config utility to propagate settings across tasks.

Basic Experiment Tracking

The most direct way to log metrics is to decorate your task with @wandb_init and use get_wandb_run() to access the active W&B run object.

from flyteplugins.wandb import wandb_init, get_wandb_run
import flyte

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def train_model(learning_rate: float) -> str:
    # Access the automatically initialized run
    wandb_run = get_wandb_run()

    # Log metrics as you normally would with wandb
    wandb_run.log({"loss": 0.5, "learning_rate": learning_rate})

    return wandb_run.id
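
To make "automatically manage run lifecycles" concrete, here is a minimal stdlib-only sketch of the pattern: a decorator opens a run before the task body, exposes it through an accessor, and closes it afterwards even if the body raises. FakeRun, managed_run, and get_run are hypothetical stand-ins invented for this illustration. They are not the plugin's classes and say nothing about how @wandb_init is actually implemented.

```python
import asyncio
import contextvars
import functools

# Hypothetical stand-ins for illustration only; NOT the plugin's API.
_current_run = contextvars.ContextVar("current_run", default=None)


class FakeRun:
    """Minimal stand-in for a W&B run: keeps logged rows in memory."""

    def __init__(self, project, entity):
        self.project = project
        self.entity = entity
        self.rows = []
        self.finished = False

    def log(self, data):
        self.rows.append(dict(data))

    def finish(self):
        self.finished = True


def managed_run(project, entity):
    """Open a run before the task body and close it when the body exits."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            run = FakeRun(project, entity)
            token = _current_run.set(run)
            try:
                return await fn(*args, **kwargs)
            finally:
                # Executed on success and on error, so the run is never left open.
                run.finish()
                _current_run.reset(token)

        return wrapper

    return decorator


def get_run():
    """Return the run opened by the enclosing managed_run, or None."""
    return _current_run.get()


@managed_run(project="my-project", entity="my-team")
async def train_model(learning_rate: float):
    run = get_run()
    run.log({"loss": 0.5, "learning_rate": learning_rate})
    return run


finished_run = asyncio.run(train_model(0.01))
```

The task body never calls finish() itself. The wrapper owns setup and teardown, which is the same division of labor the decorator-plus-accessor pair above gives you.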

Configuring W&B via Context

Instead of hardcoding project and entity names in decorators, you can use wandb_config to set these globally for a workflow execution. This configuration propagates to all child tasks.

import flyte
from flyteplugins.wandb import get_wandb_run, wandb_config, wandb_init

@wandb_init
@flyte.task
async def task_with_inherited_config():
    # project, entity, and tags come from the run context set below
    run = get_wandb_run()
    run.log({"metric": 1.0})

# Run with W&B settings applied to the whole execution
result = flyte.with_runcontext(
    custom_context=wandb_config(
        project="flyte-wandb-test",
        entity="my-team",
        tags=["experiment-v1"],
    )
).run(task_with_inherited_config)

Managing Parent and Child Tasks

The run_mode parameter in wandb_config or @wandb_init determines how child tasks handle W&B runs relative to their parents.

  • auto (Default): Uses the parent's run ID if available; otherwise, creates a new run.
  • shared: Forces the child to use the parent's run ID.
  • new: Always creates a unique W&B run for the child task.

import flyte
from flyteplugins.wandb import wandb_init, get_wandb_run

@wandb_init(run_mode="new")  # This child will always have its own unique run
@flyte.task
async def independent_child():
    run = get_wandb_run()
    run.log({"child_val": 42})

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def parent_task():
    run = get_wandb_run()
    run.log({"parent_val": 100})

    # This call will create a separate run in W&B
    await independent_child()
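
The three run_mode values amount to a small decision rule, sketched here as a pure function. resolve_run_id is a hypothetical helper written for this page, not part of the plugin, and the ID format is arbitrary. The text above does not say what "shared" does when there is no parent run, so raising an error in that case is an assumption.

```python
import uuid
from typing import Optional


def _new_id() -> str:
    # Arbitrary 8-character ID for illustration; real W&B IDs are generated by wandb.
    return uuid.uuid4().hex[:8]


def resolve_run_id(run_mode: str, parent_run_id: Optional[str]) -> str:
    """Pick the run ID a child task would log to under each run_mode."""
    if run_mode == "new":
        # Always a fresh run, regardless of the parent.
        return _new_id()
    if run_mode == "shared":
        if parent_run_id is None:
            # ASSUMPTION: the documented behavior for this case is not stated.
            raise ValueError("run_mode='shared' needs a parent run ID")
        return parent_run_id
    if run_mode == "auto":
        # Reuse the parent's run when there is one, otherwise start a new run.
        return parent_run_id if parent_run_id is not None else _new_id()
    raise ValueError(f"unknown run_mode: {run_mode!r}")


parent_id = "abc12345"
auto_id = resolve_run_id("auto", parent_id)      # same as the parent
shared_id = resolve_run_id("shared", parent_id)  # same as the parent
new_id = resolve_run_id("new", parent_id)        # a different run
orphan_id = resolve_run_id("auto", None)         # a new run, since no parent exists
```

With a parent present, "auto" and "shared" behave identically. They differ only when no parent run exists.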

Distributed Training Logging

When using distributed training (e.g., with Elastic), the rank_scope parameter controls which processes initialize W&B runs.

  • global (Default): Only the global rank 0 process creates a run.
  • worker: The local rank 0 process on each worker (node) creates a run.
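
The two scopes can be read as a predicate over a process's ranks. The sketch below models only the two bullets as written and ignores any interaction with run_mode. should_create_run and ranks_from_env are hypothetical helpers, not plugin functions. RANK and LOCAL_RANK are the environment variables that torchrun sets for each process.

```python
import os
from typing import Mapping, Tuple


def ranks_from_env(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Read (global_rank, local_rank) from torchrun-style environment variables."""
    return int(env.get("RANK", "0")), int(env.get("LOCAL_RANK", "0"))


def should_create_run(rank_scope: str, global_rank: int, local_rank: int) -> bool:
    """Decide whether this process creates a run, per the two bullets above."""
    if rank_scope == "global":
        return global_rank == 0
    if rank_scope == "worker":
        return local_rank == 0
    raise ValueError(f"unknown rank_scope: {rank_scope!r}")


# Enumerate a 2-node x 2-process layout as (global_rank, local_rank) pairs.
NNODES, NPROC_PER_NODE = 2, 2
layout = [
    (node * NPROC_PER_NODE + local, local)
    for node in range(NNODES)
    for local in range(NPROC_PER_NODE)
]

global_creators = [g for g, l in layout if should_create_run("global", g, l)]
worker_creators = [g for g, l in layout if should_create_run("worker", g, l)]
```

For this layout, the global scope selects one process in the whole job, while the worker scope selects one process per node (global ranks 0 and 2).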

Combined with run_mode, you can achieve different logging patterns:

import flyte
from flyteplugins.pytorch.task import Elastic
from flyteplugins.wandb import wandb_init, get_wandb_run

# Elastic tasks are written as synchronous functions: async task functions
# are not supported for distributed training.

# 1. Single shared run for all ranks
@wandb_init(run_mode="shared", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def shared_logging():
    run = get_wandb_run()
    run.log({"global_metric": 1.0})

# 2. Separate runs for every rank (grouped in W&B UI)
@wandb_init(run_mode="new", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def separate_logging():
    run = get_wandb_run()
    run.log({"rank_specific_metric": 2.0})

If you need to manage W&B runs manually (e.g., calling wandb.init() yourself), you can still ensure the Flyte UI displays the correct link by using the Wandb link class.

import flyte
import wandb
from flyteplugins.wandb import Wandb

@flyte.task(
    links=(
        Wandb(
            project="my-project",
            entity="my-team",
            id="custom-manual-id",
            run_mode="new",
        ),
    )
)
async def manual_task():
    # You must ensure the ID here matches the ID in the Wandb link above
    run = wandb.init(project="my-project", entity="my-team", id="custom-manual-id")
    run.log({"manual": True})
    run.finish()

Troubleshooting and Gotchas

  • Decorator Order: The @wandb_init decorator must be the outermost decorator on your Flyte task.
  • Async and Distributed: Async task functions are not supported when using Elastic (distributed) training; use synchronous functions instead.
  • Log Downloads: Setting download_logs=True in wandb_config is not supported for distributed tasks.
  • Traces: If using @flyte.trace, do not use @wandb_init on the traced function. Instead, call get_wandb_run() inside the function to access the run initialized by the parent task.
  • Missing Project/Entity: If either project or entity is not provided via the decorator or the context, the W&B link in the Flyte UI falls back to the base host URL (e.g., https://wandb.ai).
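
The link fallback in the last item can be pictured as a small URL builder. wandb_link is a hypothetical helper written for this page, not the plugin's code. The path layout follows the usual W&B run URL shape (host/entity/project/runs/run_id). Returning the project page when only the run ID is missing is an assumption.

```python
from typing import Optional


def wandb_link(
    host: str = "https://wandb.ai",
    entity: Optional[str] = None,
    project: Optional[str] = None,
    run_id: Optional[str] = None,
) -> str:
    """Build a W&B link, falling back to the host when entity or project is missing."""
    host = host.rstrip("/")
    if not entity or not project:
        # Without both pieces there is no run page to point at.
        return host
    if run_id is None:
        # ASSUMPTION: with no run ID, point at the project page.
        return f"{host}/{entity}/{project}"
    return f"{host}/{entity}/{project}/runs/{run_id}"


full_link = wandb_link(entity="my-team", project="my-project", run_id="custom-manual-id")
fallback_link = wandb_link(project="my-project", run_id="custom-manual-id")
```

If the Flyte UI link lands on the W&B home page instead of a run, a missing project or entity is the first thing to check.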