Experiment Tracking with Weights & Biases
To track experiments with Weights & Biases (W&B) in this project, use the @wandb_init decorator to automatically manage run lifecycles and the wandb_config utility to propagate settings across tasks.
Basic Experiment Tracking
The most direct way to log metrics is to decorate your task with @wandb_init and use get_wandb_run() to access the active W&B run object.
```python
import flyte
from flyteplugins.wandb import get_wandb_run, wandb_init

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def train_model(learning_rate: float) -> str:
    # Access the automatically initialized run
    wandb_run = get_wandb_run()
    # Log metrics as you normally would with wandb
    wandb_run.log({"loss": 0.5, "learning_rate": learning_rate})
    return wandb_run.id
```
Configuring W&B via Context
Instead of hardcoding project and entity names in decorators, you can use wandb_config to set these globally for a workflow execution. This configuration propagates to all child tasks.
```python
import flyte
from flyteplugins.wandb import get_wandb_run, wandb_config, wandb_init

@wandb_init
@flyte.task
async def task_with_inherited_config():
    run = get_wandb_run()
    run.log({"metric": 1.0})

# Launch the task with W&B settings supplied through the run context
result = flyte.with_runcontext(
    custom_context=wandb_config(
        project="flyte-wandb-test",
        entity="my-team",
        tags=["experiment-v1"],
    )
).run(task_with_inherited_config)
```
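One way to picture how a task resolves its settings is a layered lookup: first the arguments given to the decorator, then the values from the run context. The precedence order is an assumption made for illustration, because the text above does not say which source wins when both are set. resolve_wandb_settings is a hypothetical helper and is not part of flyteplugins.wandb.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping


def resolve_wandb_settings(
    decorator_kwargs: Mapping[str, Any],
    run_context: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge decorator arguments over run-context values (hypothetical helper)."""
    # ASSUMPTION: explicit decorator arguments win over the context, and a
    # decorator argument left as None counts as "not set".
    explicit = {k: v for k, v in decorator_kwargs.items() if v is not None}
    # ChainMap looks keys up in `explicit` first, then falls back to `run_context`.
    return dict(ChainMap(explicit, run_context))


# Settings supplied once at launch, mirroring the wandb_config call above.
context = {"project": "flyte-wandb-test", "entity": "my-team", "tags": ["experiment-v1"]}

# A bare @wandb_init: every setting is inherited from the context.
inherited = resolve_wandb_settings({}, context)

# A decorator that pins its own project but leaves entity unset.
overridden = resolve_wandb_settings({"project": "other-project", "entity": None}, context)
print(overridden["project"], overridden["entity"])  # other-project my-team
```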
Managing Parent and Child Tasks
The run_mode parameter in wandb_config or @wandb_init determines how child tasks handle W&B runs relative to their parents.
- auto (Default): Uses the parent's run ID if one is available; otherwise, creates a new run.
- shared: Forces the child to use the parent's run ID.
- new: Always creates a unique W&B run for the child task.
```python
import flyte
from flyteplugins.wandb import get_wandb_run, wandb_init

# This child always gets its own unique run
@wandb_init(run_mode="new")
@flyte.task
async def independent_child():
    run = get_wandb_run()
    run.log({"child_val": 42})

@wandb_init(project="my-project", entity="my-team")
@flyte.task
async def parent_task():
    run = get_wandb_run()
    run.log({"parent_val": 100})
    # This call creates a separate run in W&B
    await independent_child()
```
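The three modes reduce to a small decision about which run ID a child task ends up using. The function below is a hypothetical sketch of that rule as described above, not the plugin's code. Raising an error for shared when there is no parent run is an assumption, because the text does not say what happens in that case.

```python
import uuid
from typing import Optional


def resolve_run_id(run_mode: str, parent_run_id: Optional[str]) -> str:
    """Return the run ID a child task would use (illustrative only)."""
    if run_mode == "auto":
        # Reuse the parent's run when there is one; otherwise start fresh.
        return parent_run_id if parent_run_id is not None else uuid.uuid4().hex[:8]
    if run_mode == "shared":
        # ASSUMPTION: with no parent run to share, treat it as an error.
        if parent_run_id is None:
            raise ValueError("run_mode='shared' requires a parent run")
        return parent_run_id
    if run_mode == "new":
        # Always a fresh run, even when a parent run exists.
        return uuid.uuid4().hex[:8]
    raise ValueError(f"unknown run_mode: {run_mode!r}")


parent_id = "parent01"
print(resolve_run_id("auto", parent_id))  # parent01
print(resolve_run_id("new", parent_id))   # a fresh, random 8-character ID
```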
Distributed Training Logging
When using distributed training (e.g., with Elastic), the rank_scope parameter controls which processes initialize W&B runs.
With the default run_mode (auto), the two scopes behave as follows:
- global (Default): Only the global rank 0 process creates a run.
- worker: The local rank 0 process on each worker (node) creates a run.
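Under the default run mode, the decision comes down to each process's rank. The sketch below rests on two assumptions. First, each process reads its ranks from the RANK and LOCAL_RANK environment variables that torchrun sets. Second, global ranks are numbered contiguously node by node. should_init_run and simulate are hypothetical helpers, not plugin APIs.

```python
import os
from typing import List, Mapping, Optional


def should_init_run(rank_scope: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether this process initializes a run (default run_mode only)."""
    env = os.environ if env is None else env
    # torchrun sets RANK (global) and LOCAL_RANK (per node) for each process;
    # fall back to 0 when they are absent, i.e. a single-process run.
    global_rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    if rank_scope == "global":
        return global_rank == 0
    if rank_scope == "worker":
        return local_rank == 0
    raise ValueError(f"unknown rank_scope: {rank_scope!r}")


def simulate(rank_scope: str, nnodes: int, nproc_per_node: int) -> List[int]:
    """Global ranks that would create a run across the whole job."""
    creators = []
    for node in range(nnodes):
        for local in range(nproc_per_node):
            # ASSUMPTION: contiguous rank numbering per node.
            rank = node * nproc_per_node + local
            env = {"RANK": str(rank), "LOCAL_RANK": str(local)}
            if should_init_run(rank_scope, env):
                creators.append(rank)
    return creators


print(simulate("global", nnodes=2, nproc_per_node=2))  # [0]
print(simulate("worker", nnodes=2, nproc_per_node=2))  # [0, 2]
```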
Combined with run_mode, you can achieve different logging patterns:
```python
import flyte
from flyteplugins.pytorch.task import Elastic
from flyteplugins.wandb import get_wandb_run, wandb_init

# Elastic tasks must be synchronous (see Troubleshooting and Gotchas below)

# 1. Single shared run for all ranks
@wandb_init(run_mode="shared", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def shared_logging():
    run = get_wandb_run()
    run.log({"global_metric": 1.0})

# 2. Separate runs for every rank (grouped in W&B UI)
@wandb_init(run_mode="new", rank_scope="global")
@flyte.task(plugin_config=Elastic(nnodes=2, nproc_per_node=2))
def separate_logging():
    run = get_wandb_run()
    run.log({"rank_specific_metric": 2.0})
```
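The Elastic configuration above uses 2 nodes with 2 processes each, for 4 ranks in total. The lookup below is a hypothetical summary of how many W&B runs each pattern described in this section produces. It is not plugin code. It deliberately covers only the combinations the text describes, and other pairings, such as shared with worker, are rejected rather than guessed.

```python
def expected_run_count(run_mode: str, rank_scope: str, nnodes: int, nproc_per_node: int) -> int:
    """Number of W&B runs a job produces, for the combinations described above."""
    table = {
        ("auto", "global"): 1,                       # only global rank 0 creates a run
        ("auto", "worker"): nnodes,                  # local rank 0 on each node
        ("shared", "global"): 1,                     # one run shared by all ranks
        ("new", "global"): nnodes * nproc_per_node,  # one run per rank
    }
    try:
        return table[(run_mode, rank_scope)]
    except KeyError:
        raise ValueError(f"combination not covered here: {(run_mode, rank_scope)}") from None


print(expected_run_count("shared", "global", nnodes=2, nproc_per_node=2))  # 1
print(expected_run_count("new", "global", nnodes=2, nproc_per_node=2))     # 4
```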
Manual Link Integration
If you need to manage W&B runs manually (e.g., calling wandb.init() yourself), you can still ensure the Flyte UI displays the correct link by using the Wandb link class.
```python
import flyte
import wandb
from flyteplugins.wandb import Wandb

@flyte.task(
    links=(
        Wandb(
            project="my-project",
            entity="my-team",
            id="custom-manual-id",
            run_mode="new",
        ),
    )
)
async def manual_task():
    # The ID passed here must match the ID in the Wandb link above
    run = wandb.init(project="my-project", entity="my-team", id="custom-manual-id")
    run.log({"manual": True})
    run.finish()
```
Troubleshooting and Gotchas
- Decorator Order: The @wandb_init decorator must be the outermost decorator on your Flyte task.
- Async and Distributed: Async task functions are not supported with Elastic (distributed) training; use synchronous functions instead.
- Log Downloads: Setting download_logs=True in wandb_config is not supported for distributed tasks.
- Traces: If you use @flyte.trace, do not apply @wandb_init to the traced function. Instead, call get_wandb_run() inside it to access the run initialized by the parent task.
- Missing Project/Entity: If project or entity is not provided via the decorator or the context, the W&B link in the Flyte UI defaults to the base host URL (e.g., https://wandb.ai).
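The last gotcha is easier to see when you build a run link by hand. The helper below is a hypothetical sketch that assumes the standard https://wandb.ai/<entity>/<project>/runs/<run_id> run-page layout. It is not how the plugin constructs its link. It does show two things. A missing project or entity leaves nothing more specific than the host to link to. The ID given to a manual Wandb link must also match the one passed to wandb.init(), or the link points at a different run.

```python
from typing import Optional


def wandb_run_url(
    entity: Optional[str],
    project: Optional[str],
    run_id: str,
    host: str = "https://wandb.ai",
) -> str:
    """Build a W&B run URL, falling back to the bare host (hypothetical helper)."""
    if not entity or not project:
        # Without both path components, only the host is a valid target.
        return host
    return f"{host.rstrip('/')}/{entity}/{project}/runs/{run_id}"


print(wandb_run_url("my-team", "my-project", "custom-manual-id"))
# https://wandb.ai/my-team/my-project/runs/custom-manual-id
print(wandb_run_url(None, "my-project", "custom-manual-id"))
# https://wandb.ai
```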