## 🧱 Metaflow Object Hierarchy

Here's the hierarchy in top-down order:

```
Flow
 └── Run
      └── Step
           └── Task
                └── Artifact
```
Think of it like a tree-shaped object model that helps you programmatically navigate past executions, debug, inspect data, or build automation.
### 1. Flow
Represents the entire flow class (your pipeline).
```python
from metaflow import Flow

flow = Flow('MyFlow')
```

- `.latest_run` — the most recent run
- `.runs()` — iterator of all runs
- `.name` — name of the flow
### 2. Run
Represents a single execution of your flow.
```python
run = Flow('MyFlow').latest_run
```

- `.id` — unique run ID (e.g., `1699374571123927`)
- `.steps()` — iterator over the steps in this run
- `.successful` — boolean
- `.created_at` — timestamp
- `.user_tags`, `.system_tags` — tag sets
You can loop through runs:
```python
for run in Flow('MyFlow').runs():
    print(run.id, run.successful)
```
### 3. Step
Represents a single step (e.g., start, train, join) inside a run.
```python
step = run['train']
```

- `.name` — step name
- `.tasks()` — iterator of tasks for this step (can be more than one for `foreach`)
- `.successful` — whether the step finished without error
### 4. Task
Represents a single task (actual execution unit of a step).
```python
task = step.task  # or list(step.tasks())[0]
```

- `.attempt` — retry attempt number
- `.finished` — whether the task finished
- `.stdout`, `.stderr` — captured logs
- `.tags`, `.metadata` — tags and execution metadata
Most importantly:
```
task.data.<artifact_name>
```
This is how you retrieve artifacts from past runs.
5. Artifact
This is the actual data (variable) persisted by a step.
```python
acc = task.data.accuracy
```
These are the `self.var = ...` assignments in your step code.
Artifacts:
- Are stored in Metaflow's data store
- Are tied to the task, step, and run that created them
- Are read-only from the perspective of another run
## 🧪 Example: Traversing the Object Tree
```python
from metaflow import Flow

flow = Flow('GridSearchFlow')
run = flow.latest_run
step = run['train_model']  # step where training occurred

for task in step.tasks():
    result = task.data.result
    print(result)
```
## 🏷️ Common Properties (Available on All Objects)
| Property | Description |
|---|---|
| `user_tags` | Tags added manually (CLI or code) |
| `system_tags` | Automatically generated tags (e.g., user, runtime, version) |
| `tags` | Union of user and system tags |
| `created_at` | When the object was created |
| `parent` | Parent object (e.g., a task's parent is a step) |
| `pathspec` | Fully qualified string path (e.g., `MyFlow/123/start/abcde`) |
| `path_components` | `pathspec` split into a list (e.g., `['MyFlow', '123', 'start', 'abcde']`) |
These are useful when writing generic traversal utilities.
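Because every object carries a `pathspec`, generic traversal utilities can work at any level of the tree. Here is a minimal sketch in plain Python (no Metaflow required) of what `path_components` and `parent` compute; the helper names are my own:

```python
def path_components(pathspec):
    """Split a pathspec like 'MyFlow/123/start/abcde' into its parts."""
    return pathspec.split('/')

def parent_pathspec(pathspec):
    """Drop the last component: a task's parent pathspec is its step's."""
    return '/'.join(pathspec.split('/')[:-1])

print(path_components('MyFlow/123/start/abcde'))  # ['MyFlow', '123', 'start', 'abcde']
print(parent_pathspec('MyFlow/123/start/abcde'))  # MyFlow/123/start
```

In real code you would use the built-in `path_components` and `parent` properties directly; the sketch only shows the relationship between them.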
## 🔍 Flow-Level (`Flow` object)

Accessed via:

```python
from metaflow import Flow

flow = Flow('MyFlow')
```
| Property | Description |
|---|---|
| `runs()` | Iterator of all runs in the current namespace |
| `latest_run` | Most recent run (finished or not) |
| `latest_successful_run` | Most recent successful and finished run |
➡️ Great for programmatically pulling the last N runs, analyzing outputs, or rerunning failed jobs.
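Since `runs()` is an iterator (most recent first), pulling the last N runs is just a slice. A sketch of the pattern with stand-in run objects, so it needs no live Metaflow deployment; with Metaflow you would iterate `Flow('MyFlow').runs()` instead:

```python
from itertools import islice
from collections import namedtuple

# Stand-in for metaflow.Run objects, newest first.
Run = namedtuple('Run', ['id', 'successful'])
runs = iter([Run('105', True), Run('104', False), Run('103', True), Run('102', True)])

# Take the three most recent runs; collect failures as a retry list.
recent = list(islice(runs, 3))
failed = [run.id for run in recent if not run.successful]
print(failed)  # ['104']
```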
## 🔍 Run-Level (`Run` object)

```python
run = Flow('MyFlow').latest_run
```
| Property | Description |
|---|---|
| `steps()` | Iterator of steps in this run |
| `data` | Shortcut to `run.end_task.data`, i.e., the final step's artifacts |
| `successful` | `True` if the run finished successfully |
| `finished` | `True` if the run finished (success or fail) |
| `finished_at` | `datetime` of when the run finished |
| `code` | If saved, the code used in the run |
| `end_task` | Shortcut to the task of the last step in the DAG |
| `trigger` | Info on what triggered the run (e.g., a schedule or user) |
➡️ The `data` property is especially handy for getting final results fast without drilling down manually.
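To make the shortcut concrete, here is a toy model of the relationship using stand-in classes (not the real Metaflow client): `run.data` simply resolves to the artifacts of `run.end_task`.

```python
class FakeTask:
    def __init__(self, data):
        self.data = data

class FakeRun:
    def __init__(self, end_task):
        self.end_task = end_task

    @property
    def data(self):
        # run.data is just a shortcut to the end task's artifacts
        return self.end_task.data

run = FakeRun(FakeTask({'accuracy': 0.93}))
print(run.data is run.end_task.data)  # True
```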
## 🔍 Step-Level (`Step` object)

```python
step = run['train']
```
| Property | Description |
|---|---|
| `task` | A single `Task` of this step (any one of them, if there are multiple) |
| `tasks()` | Iterator of all `Task`s (for `foreach` steps) |
| `finished_at` | When the step completed (i.e., all tasks finished) |
| `environment_info` | Execution environment metadata (e.g., Python version, OS) |
➡️ If using `foreach`, this is where `.tasks()` becomes important for aggregating parallel task results.
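The typical aggregation pattern is to collect one artifact from every parallel task and reduce it. A sketch with plain dictionaries standing in for each task's `task.data` (the `lr`/`accuracy` names are illustrative):

```python
# Each dict stands in for one foreach task's artifacts (task.data).
task_results = [
    {'lr': 0.01, 'accuracy': 0.88},
    {'lr': 0.10, 'accuracy': 0.93},
    {'lr': 1.00, 'accuracy': 0.71},
]

# With Metaflow this would be built as:
#   task_results = [{'lr': t.data.lr, 'accuracy': t.data.accuracy} for t in step.tasks()]
best = max(task_results, key=lambda r: r['accuracy'])
print(best['lr'])  # 0.1
```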
## ⚙️ Task-Level (`Task` object)

```python
task = step.task
```
| Property | Description |
|---|---|
| `data` | Artifact values (i.e., variables set via `self.var = ...`) |
| `artifacts` | Individual `DataArtifact` objects (vs. just the values) |
| `successful` | Did the task succeed? |
| `finished` | Did the task complete (even if it failed)? |
| `finished_at` | When the task finished |
| `exception` | Exception info if the task failed |
| `stdout` / `stderr` | Standard output/error strings |
| `code` | Source code used for this task (if persisted) |
| `environment_info` | Dict with system-level info (e.g., Python, Conda, image) |
➡️ The `data` and `stdout` properties are the most common ones to use in inspection scripts or dashboards.
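For dashboards or inspection scripts, a small helper built on these properties might look like this. The `summarize` function and the stand-in object are my own, not part of Metaflow; a real Metaflow `Task` exposes the same `successful` and `stdout` attributes:

```python
def summarize(task):
    """Return a one-line status summary for a task-like object."""
    status = 'OK' if task.successful else 'FAILED'
    tail = task.stdout.strip().splitlines()[-1] if task.stdout else ''
    return f"[{status}] last stdout line: {tail}"

# Stand-in with the same attributes as a Metaflow Task.
class FakeTask:
    successful = False
    stdout = "loading data\nTraceback: division by zero\n"

print(summarize(FakeTask()))  # [FAILED] last stdout line: Traceback: division by zero
```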
## 🧪 Minimal Working Example (From Docs)

```python
from metaflow import Step

step = Step('DebugFlow/2/start')  # format: FlowName/RunID/StepName

if step.task.successful:
    print("Finished at:", step.task.finished_at)
    print("Stdout:")
    print(step.task.stdout)
    print("Artifacts:", [artifact.id for artifact in step.task.artifacts])
```
## 🧠 Bonus Tips

- `pathspec` can be used to fetch any object directly using `Step(...)`, `Task(...)`, etc.
- Use `.tags` in combination with filters to group runs or detect anomalies.
- `code` and `environment_info` are useful for reproducibility and audit logging.
- Combine `.data` with `pandas.DataFrame` or `plotly` to visualize experiment results.
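For the last tip, the usual shape is one row per run: collect artifacts into a list of dicts, then hand it to `pandas.DataFrame(rows)`. A sketch of the row-building step, with hard-coded values standing in for `run.data`:

```python
# With Metaflow, rows would be built from the client API, e.g.:
#   rows = [{'run': r.id, 'accuracy': r.data.accuracy}
#           for r in Flow('MyFlow').runs() if r.successful]
rows = [
    {'run': '103', 'accuracy': 0.91},
    {'run': '102', 'accuracy': 0.87},
]

# pandas.DataFrame(rows) gives one run per row; plain Python works for sorting too:
rows.sort(key=lambda r: r['accuracy'], reverse=True)
print(rows[0]['run'])  # 103
```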
## ✅ Summary Table

| Level | Key Object | Must-Know Properties | Purpose |
|---|---|---|---|
| Flow | `Flow` | `latest_run`, `runs()` | Get recent runs |
| Run | `Run` | `data`, `successful`, `end_task` | Check results |
| Step | `Step` | `tasks()`, `finished_at` | Inspect step behavior |
| Task | `Task` | `data`, `stdout`, `exception` | Debug and extract outputs |