AI/ML notes


🧱 Metaflow Object Hierarchy

Here's the hierarchy in top-down order:

Flow
 └── Run
      └── Step
           └── Task
                └── Artifact

Think of it like a tree-shaped object model that helps you programmatically navigate past executions, debug, inspect data, or build automation.


1. Flow

Represents the entire flow class (your pipeline).

from metaflow import Flow
flow = Flow('MyFlow')
  • .latest_run → most recent run
  • .runs() → iterator of all runs
  • .name → name of the flow

2. Run

Represents a single execution of your flow.

run = Flow('MyFlow').latest_run
  • .id → unique run ID (e.g., 1699374571123927)
  • .steps() → iterator of steps in this run
  • .successful → boolean, True if the run succeeded
  • .created_at → timestamp
  • .user_tags, .system_tags

You can loop through runs:

for run in Flow('MyFlow').runs():
    print(run.id, run.successful)

3. Step

Represents a single step (e.g., start, train, join) inside a run.

step = run['train']
  • .name → step name
  • .tasks() → iterator of tasks for this step (can be more than one for foreach)
  • .successful → True if the step finished without error

4. Task

Represents a single task (actual execution unit of a step).

task = step.task  # or next(step.tasks())
  • .attempt → retry attempt number
  • .finished → True if the task finished
  • .stdout, .stderr
  • .tags
  • .metadata

Most importantly:

task.data.<artifact_name>

This is how you retrieve artifacts from past runs.


5. Artifact

This is the actual data (variable) persisted by a step.

acc = task.data.accuracy

These are the self.var = ... assignments in your step code.

Artifacts:

  • Are stored in Metaflow's data store
  • Are tied to the task, step, and run that created them
  • Are read-only from the perspective of another run

🧪 Example: Traversing the Object Tree

from metaflow import Flow

flow = Flow('GridSearchFlow')
run = flow.latest_run
step = run['train_model']  # Step where training occurred
for task in step.tasks():
    result = task.data.result
    print(result)

🏷️ ✅ Common Properties (Available on All Objects)

  • user_tags → tags added manually (CLI or code)
  • system_tags → tags generated automatically (e.g., user, runtime, version)
  • tags → union of user and system tags
  • created_at → when the object was created
  • parent → the parent object (e.g., a task's parent is a step)
  • pathspec → fully qualified string path (e.g., MyFlow/123/start/abcde)
  • path_components → pathspec split into a list (e.g., ['MyFlow', '123', 'start', 'abcde'])

These are useful when writing generic traversal utilities.
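As a sketch of such a traversal utility, here is a pure string helper that derives every ancestor pathspec from any object's pathspec (mirroring what repeated .parent lookups would walk; the example pathspec is the one from above):

```python
def ancestor_pathspecs(pathspec):
    """Yield the pathspecs of an object's ancestors, nearest first.

    'MyFlow/123/start/abcde' yields 'MyFlow/123/start',
    then 'MyFlow/123', then 'MyFlow'.
    """
    parts = pathspec.split('/')
    for i in range(len(parts) - 1, 0, -1):
        yield '/'.join(parts[:i])
```

Each yielded string can be handed to Task(...), Step(...), Run(...), or Flow(...) to instantiate the corresponding client object.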


🔄 Flow-Level (Flow object)

Accessed via:

from metaflow import Flow
flow = Flow('MyFlow')
  • runs() → iterator of all runs in the current namespace
  • latest_run → most recent run (finished or not)
  • latest_successful_run → most recent successful and finished run

➡️ Great for programmatically pulling the last N runs, analyzing outputs, or rerunning failed jobs.
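Pulling the last N runs can be sketched like this (assuming runs() yields newest first, and using the placeholder flow name 'MyFlow'):

```python
from itertools import islice

def last_n_runs(runs, n):
    """Take the first n items from an iterator of runs (newest first)."""
    return list(islice(runs, n))

if __name__ == "__main__":
    from metaflow import Flow
    for run in last_n_runs(Flow('MyFlow').runs(), 5):
        print(run.id, run.successful)
```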


πŸ” Run-Level (Run object)

run = Flow('MyFlow').latest_run
  • steps() → iterator of steps in this run
  • data → shortcut to run.end_task.data, i.e., the final step's artifacts
  • successful → True if the run finished successfully
  • finished → True if the run finished (success or fail)
  • finished_at → datetime of when the run finished
  • code → the code used in the run, if saved
  • end_task → shortcut to the task of the last step in the DAG
  • trigger → info on what triggered the run (e.g., a schedule or user)

➡️ The data property is especially handy to get final results fast without needing to drill down manually.
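Since a run's data only contains artifacts that were actually persisted, a defensive read helper can be handy. This sketch assumes the .data container supports `in` membership tests (as the client docs suggest):

```python
def artifact_or_default(data, name, default=None):
    """Read an artifact from a .data container, falling back if it was never set."""
    if name in data:
        return getattr(data, name)
    return default
```

For example, `artifact_or_default(Flow('MyFlow').latest_successful_run.data, 'accuracy')` with a placeholder flow and artifact name.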


πŸ” Step-Level (Step object)

step = run['train']
  • task → the single Task of this step (or any one of them, if there are multiple)
  • tasks() → iterator of all Tasks (for foreach steps)
  • finished_at → when the step completed (i.e., all tasks finished)
  • environment_info → execution environment metadata (e.g., Python version, OS)

➡️ If using foreach, this is where .tasks() becomes important for aggregating parallel task results.
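A sketch of that aggregation pattern (the 'GridSearchFlow' flow, 'train' step, and result artifact names are placeholders):

```python
def summarize(values):
    """Aggregate per-task numeric results from a foreach fan-out."""
    vals = list(values)
    return {"count": len(vals), "best": max(vals), "mean": sum(vals) / len(vals)}

if __name__ == "__main__":
    from metaflow import Flow
    step = Flow('GridSearchFlow').latest_run['train']
    print(summarize(task.data.result for task in step.tasks()))
```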


βš™οΈ Task-Level (Task object)

task = step.task
  • data → artifact values (i.e., variables set via self.var = ...)
  • artifacts → container of individual DataArtifact objects (rather than just values)
  • successful → True if the task succeeded
  • finished → True if the task completed (even if it failed)
  • finished_at → when the task finished
  • exception → exception info if the task failed
  • stdout / stderr → standard output/error strings
  • code → source code used for this task (if persisted)
  • environment_info → dict with system-level info (e.g., Python, Conda, image)

➡️ The data and stdout properties are the most common to use in inspection scripts or dashboards.


🧪 Minimal Working Example (From Docs)

from metaflow import Step

step = Step('DebugFlow/2/start')  # format: FlowName/RunID/StepName

if step.task.successful:
    print("Finished at:", step.task.finished_at)
    print("Stdout:")
    print(step.task.stdout)
    print("Artifacts:", [artifact.id for artifact in step.task.artifacts])

🧠 Bonus Tips

  • pathspec can be used to fetch any object directly using Step(...), Task(...), etc.
  • Use .tags in combination with filters to group runs or detect anomalies
  • code and environment_info are useful for reproducibility and audit logging
  • Combine .data with pandas.DataFrame or plotly to visualize experiment results

✅ Summary Table

  • Flow level (Flow object): latest_run, runs() → get recent runs
  • Run level (Run object): data, successful, end_task → check results
  • Step level (Step object): tasks(), finished_at → inspect step behavior
  • Task level (Task object): data, stdout, exception → debug and extract outputs