# Deep dive

How everything works under the hood.
## Lifecycle Mnemonic

DAG ➜ start ➜ next step ➜ isolate ➜ run ➜ pass artifacts ➜ repeat ➜ end
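The mnemonic can be sketched as a loop. A toy model (plain Python, not Metaflow's real engine): walk a linear DAG, run each step in turn, and pass an artifacts dict along.

```python
# Toy model of the lifecycle loop: resolve the next step, run it,
# and hand the accumulated artifacts to the following step.
def run_flow(dag, steps, entry="start"):
    artifacts = {}                # stands in for the datastore
    step = entry
    while step is not None:
        steps[step](artifacts)    # "isolate" and run the step
        step = dag.get(step)      # resolve the next step in the DAG
    return artifacts

dag = {"start": "train", "train": "end", "end": None}
steps = {
    "start": lambda a: a.update(x=1),
    "train": lambda a: a.update(x=a["x"] + 1),
    "end":   lambda a: None,
}
print(run_flow(dag, steps))  # {'x': 2}
```

Real Metaflow differs in two key ways: steps can fan out (branches and foreaches, so the "next step" is a set, not a single name), and each step runs in its own isolated process.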
## 🧠 Key Files and Responsibilities
| Location | Description |
|---|---|
| `metaflow/flowspec.py` | 🧠 Core class `FlowSpec`, the base class your flows inherit from. Handles `@step` methods, `self.next(...)`, data artifact tracking, etc. |
| `metaflow/runtime.py` | 🚀 Entry point for execution when you do `python myflow.py run`. Manages flow instantiation and hands off to the runtime engine. |
| `metaflow/runtime/engine.py` | ⚙️ Abstract engine that defines the interface for running steps. Backend-specific engines (local, batch, etc.) subclass this. |
| `metaflow/runtime/step_functions/step_function.py` | 🌿 Handles calling step methods, managing inputs, outputs, and error handling. |
| `metaflow/runtime/executor.py` | 🔄 Responsible for executing steps, resolving dependencies, managing state transitions, etc. |
| `metaflow/datastore/` | 📦 Code for persisting and retrieving data artifacts across steps. |
| `metaflow/decorators/` | 💡 Contains the logic for decorators like `@batch`, `@retry`, `@conda`, etc. They wrap the step execution logic. |
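The last table row says decorators "wrap the step execution logic". A minimal sketch of that idea, using a plain function wrapper; the `retry` here is a hypothetical stand-in, since Metaflow's real decorators are classes with lifecycle hooks rather than simple wrappers:

```python
# Toy sketch of a @retry-style decorator wrapping a step function.
import functools

def retry(times=3):
    def deco(step_fn):
        @functools.wraps(step_fn)
        def wrapper(self):
            for attempt in range(times):
                try:
                    return step_fn(self)
                except Exception:
                    if attempt == times - 1:
                        raise  # out of retries: surface the error
        return wrapper
    return deco

class FlakyStep:
    attempts = 0

    @retry(times=3)
    def start(self):
        FlakyStep.attempts += 1
        if FlakyStep.attempts < 3:
            raise RuntimeError("transient failure")
        return "ok"

result = FlakyStep().start()  # succeeds on the third attempt
```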
## 🪜 How Execution Works Internally (Code Flow)
Here’s a simplified flow of what happens when you run `python myflow.py run`:
1. `runtime.py`
   - Parses CLI args.
   - Locates the flow class in your script.
   - Instantiates it.
2. `FlowSpec` (`flowspec.py`)
   - Collects all `@step` methods using decorators.
   - Builds up internal metadata (`_steps`, `_edges`, etc.).
   - Runs the entry-point step (`start()`).
3. `self.next(...)` (in `FlowSpec`)
   - Records edges in the DAG (who runs after whom).
   - Actual execution is deferred until Metaflow resolves the step DAG at runtime.
4. `engine.py` + `executor.py`
   - The engine resolves the next step(s) to run.
   - Each step is isolated and run (locally or via a backend like AWS Batch).
5. `step_function.py`
   - Runs the step method (`step_fn(self)`).
   - Captures outputs, stores artifacts.
6. `datastore/`
   - Persists variables assigned in the step (`self.x = ...`) as pickled blobs.
   - Automatically loads them in the next step.
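Step 6 can be sketched in a few lines of stdlib Python. A toy datastore with hypothetical helper names (the real one is content-addressed and backed by local disk or S3), showing the pickle round trip that moves `self.x` between steps:

```python
# Toy datastore: persist artifacts as pickled blobs, load them back later.
import os
import pickle
import tempfile

STORE = tempfile.mkdtemp()   # stands in for a local .metaflow dir or S3 bucket

def save_artifact(name, value):
    with open(os.path.join(STORE, name + ".pkl"), "wb") as f:
        pickle.dump(value, f)

def load_artifact(name):
    with open(os.path.join(STORE, name + ".pkl"), "rb") as f:
        return pickle.load(f)

save_artifact("x", {"epochs": 10})   # step A assigns self.x = {"epochs": 10}
print(load_artifact("x"))            # step B reads self.x back: {'epochs': 10}
```

This is also why artifacts must be picklable: anything you assign to `self` has to survive serialization between step processes.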
## 🧭 Entry Points to Explore

- `metaflow/flowspec.py`: `FlowSpec`
- `metaflow/runtime.py`: `main()`
- `metaflow/runtime/executor.py`: `StepExecutor`
- `metaflow/runtime/engine.py`: `Engine`
- `metaflow/runtime/local/local_runner.py` (for local runs)
## 🧪 Tip for Debugging or Hacking

Add print statements or breakpoints in:

- `FlowSpec.next()` to trace DAG construction.
- `StepExecutor._run_step_function()` to watch step execution.
- `DataStore` classes to track how variables get stored and loaded.
## Stages of Metaflow

In Metaflow (and in many dynamic workflow systems), there is a meaningful distinction between definition time and runtime.
### 📌 1. Definition Time

- When: happens when Python imports and parses your script.
- What happens:
  - Your `FlowSpec` class is instantiated (or inspected via reflection).
  - All methods decorated with `@step` are registered.
  - All decorators (`@batch`, `@retry`, etc.) are applied and locked in.
  - The DAG structure is not yet built; only the static wiring is made available.
- Key constraint: this is when Python and Metaflow set up metadata and decorators.
- Limitation: you can't use flow artifacts or conditional logic based on runtime parameters here.

🧠 Think of it as the "blueprint building" phase: just loading and preparing the class, not actually running steps.
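The "blueprint building" phase is ordinary Python class machinery and can be demonstrated without Metaflow. In this sketch the `step` decorator is a hypothetical stand-in that just records method names, mimicking how registration happens while the class body executes at import time:

```python
# Definition time in plain Python: decorators run while the class body
# executes (at import), long before any instance of the flow exists.
registered = []

def step(fn):
    # hypothetical stand-in for Metaflow's @step: record the method name
    registered.append(fn.__name__)
    return fn

class MyFlow:
    @step
    def start(self):
        self.x = 10   # this body does NOT run at definition time

print(registered)  # ['start'] -- recorded before any step has executed
```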
### 📌 2. Runtime

- When: starts when you execute something like `python myflow.py run`.
- What happens:
  - Metaflow starts from the entry point (`start()`).
  - Each `@step` method runs in isolation.
  - Variables assigned (e.g., `self.x = 10`) become artifacts.
  - Metaflow builds the DAG dynamically based on `self.next(...)`.
  - The backend (local, AWS Batch, etc.) actually runs your code step by step.

🔄 Each step is its own "process," run with serialized inputs and outputs passed between steps.
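The dynamic DAG construction described above can be mimicked in plain Python. A toy `MiniFlow` (not Metaflow's API; real Metaflow also validates the graph statically before running) records an edge each time `next(...)` is called during execution:

```python
# Toy model of runtime DAG construction: self.next(...) records an edge
# as the flow executes, and the runner follows those edges.
class MiniFlow:
    def __init__(self):
        self.edges = []

    def next(self, *funcs):
        # record "current step -> successor(s)" as the step runs
        self.edges.append((self._current, [f.__name__ for f in funcs]))

    def run(self, entry="start"):
        queue = [entry]
        while queue:
            name = queue.pop(0)
            self._current = name
            getattr(self, name)()                 # run the step method
            if self.edges and self.edges[-1][0] == name:
                queue.extend(self.edges[-1][1])   # follow recorded edges

    def start(self):
        self.x = 10            # becomes an "artifact"
        self.next(self.end)    # edge discovered at runtime

    def end(self):
        pass

flow = MiniFlow()
flow.run()
print(flow.edges)  # [('start', ['end'])]
```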
### 📌 3. Scheduling Time (Optional)

If you're using Metaflow with Airflow, Argo Workflows, or another scheduler:

- There's a third layer called scheduling or orchestration time.
- This happens before runtime but after definition time, and is when the scheduler:
  - Decides when and where to trigger a new flow run.
  - May inspect the static flow definition.
  - Doesn't run steps or generate artifacts; it just orchestrates triggers.
## 🧠 Why This Matters

- Conditional logic based on flow parameters does not work in decorators:

  `@batch(cpu=2 if self.cpu_heavy else 1)  # ❌ Won’t work at definition time!`

  `self.cpu_heavy` doesn’t exist yet; `self` only exists at runtime, not at definition time.
- `self.next(...)` is called at runtime, allowing dynamic DAG construction.
- `@step`, `@batch`, etc. are applied at definition time and cannot see runtime context.
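The failure above is plain Python behavior, reproducible without Metaflow. In this sketch `deco` is a hypothetical stand-in for `@batch`; the point is that the decorator's argument expression is evaluated while the class body runs, where `self` is simply not defined:

```python
# Hypothetical stand-in for @batch: just records its argument on the function.
def deco(cpu):
    def wrap(fn):
        fn.cpu = cpu
        return fn
    return wrap

caught = None
try:
    class Broken:
        # The argument expression runs NOW, while the class body executes;
        # no instance exists yet, so referencing self raises NameError.
        @deco(cpu=2 if self.cpu_heavy else 1)
        def start(self):
            pass
except NameError as exc:
    caught = exc
    print("fails at definition time:", exc)
```

In real Metaflow, decorator arguments must therefore be constants (or values computed from imports and environment at module load), never flow state.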
## ✅ Rule of Thumb

| Phase | What you can use |
|---|---|
| Definition time | Class names, static imports, constants, decorators |
| Runtime | Flow parameters, `self` variables, control flow, `self.next(...)` |
| Scheduling time | Environment variables, scheduled triggers |