Parameters
🧩 What Are Parameters in Metaflow?
Metaflow provides the `Parameter` class to declare runtime parameters that are accessible via `self.<param_name>` in your steps.
They make your flow:
- Reusable across runs
- Testable with different configurations
- Integrable with automation (e.g., running in production or experimentation)
🛠️ Basic Syntax
```python
from metaflow import FlowSpec, step, Parameter


class MyFlow(FlowSpec):

    my_param = Parameter(
        'my_param',
        help='This is a sample parameter',
        default='default_value'
    )

    @step
    def start(self):
        print("Parameter value:", self.my_param)
        self.next(self.end)

    @step
    def end(self):
        print("Flow complete.")


if __name__ == '__main__':
    MyFlow()
```
🧪 Example CLI Run
You can override the parameter from the command line like this:

```bash
python my_flow.py run --my-param hello_world
```

Expected output:

```
Parameter value: hello_world
Flow complete.
```

If you don't pass the parameter, the default is used:

```bash
python my_flow.py run
```

Output:

```
Parameter value: default_value
Flow complete.
```
🧮 Parameter Types
With no explicit type, a parameter's type is inferred from its `default` (and is a string when there is no default). You can pass `type=` explicitly to control how the CLI value is parsed.
Examples:
```python
from metaflow import FlowSpec, step, Parameter


class TypedFlow(FlowSpec):

    count = Parameter('count', help='Number of items', type=int, default=3)
    threshold = Parameter('threshold', type=float, default=0.8)
    active = Parameter('active', type=bool, default=True)

    @step
    def start(self):
        print("count:", self.count)
        print("threshold:", self.threshold)
        print("active:", self.active)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == '__main__':
    TypedFlow()
```
CLI:

```bash
python typed_flow.py run --count 10 --threshold 0.95 --active False
```

Output:

```
count: 10
threshold: 0.95
active: False
```

Note: Metaflow's CLI is built on click, whose boolean parsing accepts values such as `true`/`false`, `1`/`0`, and `yes`/`no`, case-insensitively, so `--active false` works just as well as `--active False`.
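To make the boolean parsing rule concrete, here is a rough pure-Python sketch of click-style boolean conversion. The exact token sets mirror click's documented behavior, but treat the mapping to Metaflow as an assumption about its CLI layer:

```python
def parse_bool(value: str) -> bool:
    # Sketch of click-style boolean parsing (assumption: Metaflow delegates
    # bool parameters to click's BOOL type)
    truthy = {"1", "true", "t", "yes", "y", "on"}
    falsy = {"0", "false", "f", "no", "n", "off"}
    v = value.strip().lower()
    if v in truthy:
        return True
    if v in falsy:
        return False
    raise ValueError(f"{value!r} is not a valid boolean")

print(parse_bool("False"))  # False
print(parse_bool("yes"))    # True
```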
🧪 Testing via Jupyter or Script
Suppose the following flow is saved as `my_test_flow.py`:

```python
from metaflow import FlowSpec, step, Parameter


class MyTestFlow(FlowSpec):

    name = Parameter('name', default='default')

    @step
    def start(self):
        print("Hello,", self.name)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == '__main__':
    MyTestFlow()
```

Note that you cannot simply instantiate the class and call a `run()` method; flows always execute through Metaflow's CLI machinery. To launch a run programmatically (e.g., from Jupyter or a test script), use the Runner API (available since Metaflow 2.10), passing parameters as keyword arguments:

```python
from metaflow import Runner

with Runner('my_test_flow.py').run(name='Alice') as running:
    print(running.run)
```
✅ Recap
| Feature | Description |
|---|---|
| `Parameter(...)` | Declare a configurable parameter |
| `type=` | Set the expected type (e.g., `int`, `float`, `bool`) |
| `default=` | Provide a fallback value |
| `--param value` | CLI syntax to override a parameter |
Let's walk through a simple ML hyperparameter sweep using Metaflow parameters, simulating a grid search over a model training process.

We'll use:
- `Parameter` to pass hyperparameters like the learning rate and the regularization strength `C`
- Metaflow's `foreach` branching (`self.next(self.train_model, foreach='param_grid')`) to fan out over the grid of parameter combinations
- A dummy model using scikit-learn's `LogisticRegression` trained on `sklearn.datasets.make_classification`
🧪 Example: Grid Search with Metaflow Parameters
```python
from itertools import product

from metaflow import FlowSpec, step, Parameter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


class GridSearchFlow(FlowSpec):

    learning_rates = Parameter(
        'learning_rates',
        help='Comma-separated learning rates',
        default='0.01,0.1,1.0'
    )

    c_values = Parameter(
        'c_values',
        help='Comma-separated C values (inverse regularization)',
        default='0.1,1.0,10.0'
    )

    @step
    def start(self):
        # Parse the parameters into lists. Parameters are read-only
        # artifacts, so the parsed lists get new names.
        lr_list = [float(x) for x in self.learning_rates.split(',')]
        c_list = [float(x) for x in self.c_values.split(',')]

        # Cartesian product of hyperparameters
        self.param_grid = list(product(lr_list, c_list))
        print(f"Total configs to try: {len(self.param_grid)}")
        self.next(self.train_model, foreach='param_grid')

    @step
    def train_model(self):
        lr, c = self.input
        print(f"Training with learning_rate={lr}, C={c}")

        # Generate dummy data
        X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

        # Simulate the learning rate via max_iter, as LogisticRegression
        # doesn't expose a learning rate directly
        model = LogisticRegression(C=c, max_iter=int(1000 * lr), solver='lbfgs')
        model.fit(X_train, y_train)

        preds = model.predict(X_test)
        acc = accuracy_score(y_test, preds)
        print(f"Accuracy: {acc:.4f}")

        self.result = {
            'learning_rate': lr,
            'C': c,
            'accuracy': acc
        }
        self.next(self.aggregate)

    @step
    def aggregate(self, inputs):
        self.results = [inp.result for inp in inputs]

        # Sort results by accuracy, best first
        self.results.sort(key=lambda x: x['accuracy'], reverse=True)
        print("Top results:")
        for r in self.results[:3]:
            print(r)
        self.next(self.end)

    @step
    def end(self):
        print("Grid search complete!")


if __name__ == '__main__':
    GridSearchFlow()
```
🖥️ Run it via CLI
```bash
python grid_search_flow.py run \
    --learning-rates 0.01,0.1 \
    --c-values 0.1,1.0
```
This runs 4 parallel model trainings:
- (0.01, 0.1)
- (0.01, 1.0)
- (0.1, 0.1)
- (0.1, 1.0)
Each of these runs as a separate `train_model` task, in parallel, thanks to `foreach='param_grid'`.
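The fan-out is just the cartesian product computed in the `start` step, so you can sanity-check which combinations a given pair of CLI strings will produce with plain Python, no Metaflow required:

```python
from itertools import product

# The same parsing the flow's start step performs on the CLI strings
learning_rates = [float(x) for x in "0.01,0.1".split(',')]
c_values = [float(x) for x in "0.1,1.0".split(',')]

param_grid = list(product(learning_rates, c_values))
print(param_grid)  # [(0.01, 0.1), (0.01, 1.0), (0.1, 0.1), (0.1, 1.0)]
```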
📌 Key Concepts Demonstrated
| Concept | Usage |
|---|---|
| `Parameter` | Inject hyperparameters via the CLI |
| `self.next(..., foreach=...)` | Dynamically fan out steps across combinations |
| `self.input` | Access the `(learning_rate, C)` tuple in each branch |
| `aggregate(self, inputs)` | Join the branches and collect the best models |
🧪 JSON Config Parameter
```python
from metaflow import FlowSpec, step, Parameter, JSONType


class JsonParamFlow(FlowSpec):

    config = Parameter(
        'config',
        type=JSONType,
        help='JSON string of model config',
        default='{"lr": 0.1, "batch_size": 32, "dropout": 0.3}'
    )

    @step
    def start(self):
        print("Parsed config:", self.config)
        print("Learning Rate:", self.config['lr'])
        print("Batch Size:", self.config['batch_size'])
        print("Dropout Rate:", self.config['dropout'])
        self.next(self.end)

    @step
    def end(self):
        print("Done.")


if __name__ == '__main__':
    JsonParamFlow()
```
🧪 CLI Run with JSON
```bash
python json_param_flow.py run \
    --config '{"lr": 0.05, "batch_size": 64, "dropout": 0.2}'
```

✅ Metaflow automatically parses the JSON string into a Python dict:

```python
{'lr': 0.05, 'batch_size': 64, 'dropout': 0.2}
```

And you'll get CLI output like:

```
Parsed config: {'lr': 0.05, 'batch_size': 64, 'dropout': 0.2}
Learning Rate: 0.05
Batch Size: 64
Dropout Rate: 0.2
Done.
```
⚠️ Pro Tip for Bash Users
Make sure to:
- Use single quotes `'` around the whole JSON string
- Use double quotes `"` inside for the JSON keys/values

So this works:

```bash
--config '{"key": "value"}'
```

But this fails:

```bash
--config "{'key': 'value'}"  # not valid JSON
```
🧠 Tradeoff: JSONType vs JSON File
🧷 Option 1: JSONType (Inline Parameter)
```bash
--config '{"lr": 0.01, "batch_size": 32}'
```

Pros:
- Quick and easy for small configs
- Great for prototyping or ad hoc runs
- Can be used directly in CI/CD or Airflow triggers

Cons:
- Quoting issues in the shell (escaping `"` inside `"..."` is a headache)
- Not human-friendly for large configs
- Hard to version or document properly
- Gets messy fast when nested
📁 Option 2: Pass JSON file path as string
```python
from metaflow import FlowSpec, step, Parameter


class MyFlow(FlowSpec):

    config_file = Parameter('config_file', default='config.json')

    @step
    def start(self):
        import json
        with open(self.config_file, 'r') as f:
            self.config = json.load(f)
        print("Config:", self.config)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == '__main__':
    MyFlow()
```

```bash
python flow.py run --config-file configs/model_v1.json
```
Pros:
- Clean, readable, reusable
- Easily version-controlled
- Good for deep configs or experiments
- Less prone to quoting errors
Cons:
- Slightly more boilerplate (need to read the file)
- More moving parts in a fully automated run
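Stripped of Metaflow, the file-based pattern is a plain JSON round-trip: dump the config once, version it, and load it inside the flow. A self-contained sketch (the file name `model_v1.json` is just illustrative):

```python
import json
import os
import tempfile

# Hypothetical config; stands in for a version-controlled configs/model_v1.json
config = {"lr": 0.01, "batch_size": 32, "dropout": 0.3}

path = os.path.join(tempfile.mkdtemp(), "model_v1.json")
with open(path, "w") as f:
    json.dump(config, f, indent=2)

# What the flow's start step does with self.config_file
with open(path) as f:
    loaded = json.load(f)

print("Config:", loaded)
```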
🧪 When Metaflow's JSONType Shines
JSONType is best for lightweight structured inputs that change often, e.g.:
```bash
python train.py run \
    --config '{"model":"xgb", "features":["f1","f2"], "cv":5}'
```

Or if you're launching experiments programmatically from a notebook, via the Runner API (passing the JSON as a string is the safe choice across Metaflow versions):

```python
from metaflow import Runner

with Runner('my_flow.py').run(config='{"lr": 0.01, "dropout": 0.2}') as running:
    print(running.run)
```
It’s not intended for storing full model configs, deployment setups, or anything you'd want under version control.
| Situation | Best Choice |
|---|---|
| ✅ Small structured configs (e.g., 2–5 fields) | type=JSONType |
| ✅ Running flows from CI or notebooks where passing JSON inline is convenient | type=JSONType |
| ❌ Large, nested, or reused config (e.g., 10+ fields, multiple layers) | JSON file path + load inside flow |
| ❌ Config shared across multiple flows/scripts | JSON file |
| ✅ Config needs Git version control | JSON file |