AI/ML notes

Configs


🎯 Config-Driven Experimentation

Metaflow's config-driven experimentation lets you separate experiment configuration (parameters, hyperparameters, environment variables, etc.) from flow code. This makes it easier to:

  • Run many experiments with different configs
  • Share, version, and reproduce experiments
  • Keep flow code clean and DRY
  • Support team-wide configuration standards

🧩 Key Concepts

1. Config Files

You define configs in external files (usually .yaml or .json):

# example_config.yaml
lr: 0.01
dropout: 0.3
epochs: 5

2. Loading Configs

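Use the from_config() helper to load these values into a flow. Recent Metaflow releases document this through the Config object, declared as a flow class attribute, for example config = Config("config", default="example_config.json"); if from_config is not importable in your installation, use Config instead: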

from metaflow import FlowSpec, step, from_config

config = from_config("example_config.yaml")

class MyFlow(FlowSpec):

    @step
    def start(self):
        self.lr = config["lr"]
        self.next(self.end)

    @step
    def end(self):
        pass

3. Combining with Parameters

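You can still define Parameter attributes (Parameter is a class-level object, not a decorator) and override values via the CLI:

python my_flow.py run --lr 0.05

Note that config.get("lr", self.lr) returns the config value whenever the key is present, so it gives the config file precedence. To give CLI overrides precedence, declare the Parameter with default=None and fall back to the config only when it was not set: lr = self.lr if self.lr is not None else config["lr"].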
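
The precedence rule can be checked outside a flow. The sketch below is a minimal illustration, assuming an unset CLI value shows up as None; resolve is a hypothetical helper, not part of Metaflow.

```python
from typing import Any, Mapping, Optional


def resolve(key: str, cli_value: Optional[Any],
            config: Mapping[str, Any], default: Any = None) -> Any:
    """Return the CLI value if one was given, else the config value, else the default."""
    if cli_value is not None:
        return cli_value
    return config.get(key, default)


# Hypothetical config contents, mirroring example_config.yaml
config = {"lr": 0.01, "dropout": 0.3}

print(resolve("lr", 0.05, config))  # CLI override wins
print(resolve("lr", None, config))  # falls back to the config file value
```

Checking for None rather than truthiness keeps a legitimate override such as 0 or 0.0 from being discarded.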


βš™οΈ How It Works Under the Hood

  • Configs are loaded at runtime using from_config()
  • Configs can be Python modules, YAML, or JSON
  • Values are treated as regular Python dict entries
  • You can load multiple config files and merge them
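
Because the loaded values are plain dicts, merging several sources needs nothing Metaflow-specific. A minimal sketch, assuming later sources should override earlier ones and nested sections should merge key by key; merge_configs is a hypothetical helper:

```python
from typing import Any, Dict


def merge_configs(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge dicts left to right; later values win, nested dicts merge recursively."""
    merged: Dict[str, Any] = {}
    for cfg in configs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are dicts: merge them instead of replacing the section
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = value
    return merged


base = {"lr": 0.01, "model": {"dropout": 0.3, "layers": 2}}
override = {"lr": 0.05, "model": {"dropout": 0.5}}
merged = merge_configs(base, override)
print(merged)
```

A shallow {**base, **override} would replace the whole "model" section and silently drop "layers", which is why the sketch recurses.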

🔧 Advanced Features

| Feature | Description |
| --- | --- |
| Multiple config files | Combine and override values across sources |
| Dynamic configs | Load Python-based config logic (e.g., environment-aware) |
| Experiment tracking | Combine configs with Metaflow tags/metadata/cards |
| YAML/JSON support | Native parsing of structured config files |

🎯 Basic configs

Metaflow provides a configuration system to control behavior globally or per environment, enabling you to customize how flows run, store data, use compute resources, and more, all without modifying your code.


πŸ“ Where Configurations Come From

Configurations can be set in four layers (in priority order):

| Layer | Example | Priority |
| --- | --- | --- |
| 1. Runtime overrides | CLI: METAFLOW_DEFAULT_DATASTORE=s3 python flow.py run | 🔺 Highest |
| 2. Environment files | ~/.metaflowconfig/config.json | |
| 3. Python code | from metaflow import metaflow_config | |
| 4. Built-in defaults | Metaflow internal settings | 🔻 Lowest |
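
The layering can be pictured as a stack of mappings searched from top to bottom. This is a rough sketch of the lookup order only, not Metaflow's implementation; effective_settings and BUILTIN_DEFAULTS are hypothetical names, and the default value is illustrative.

```python
import os
from collections import ChainMap
from typing import Mapping

# Stand-in for built-in defaults (illustrative value only)
BUILTIN_DEFAULTS = {"METAFLOW_DEFAULT_DATASTORE": "local"}


def effective_settings(file_settings: Mapping[str, str],
                       code_settings: Mapping[str, str],
                       environ: Mapping[str, str] = os.environ) -> ChainMap:
    """Stack the layers so the first mapping that contains a key wins."""
    runtime = {k: v for k, v in environ.items() if k.startswith("METAFLOW_")}
    return ChainMap(runtime, dict(file_settings), dict(code_settings), BUILTIN_DEFAULTS)


settings = effective_settings(
    file_settings={"METAFLOW_DEFAULT_DATASTORE": "s3"},
    code_settings={},
    environ={"METAFLOW_DEFAULT_DATASTORE": "local", "HOME": "/tmp"},
)
print(settings["METAFLOW_DEFAULT_DATASTORE"])  # the runtime layer shadows the file
```

ChainMap keeps each layer intact, so it stays possible to inspect which layer supplied a value through settings.maps.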

🔧 How to Set Configs

✅ Option 1: Runtime environment variables (most common)

export METAFLOW_PROFILE=my-aws-profile
export METAFLOW_DEFAULT_DATASTORE=s3

✅ Option 2: Config file (.metaflowconfig/config.json)

{
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-bucket/metaflow"
}

✅ Option 3: Python code (read the resolved values)

from metaflow import metaflow_config
print(metaflow_config.DEFAULT_DATASTORE)

✅ Summary Benefits

  • 🔧 Configure flows without hardcoding
  • 🧪 Make your code environment-agnostic
  • 📦 Simplify running flows under different profiles (e.g., dev vs prod)
  • 🌩 Work seamlessly with cloud deployments

πŸ“ .metaflowconfig/config.json – TEMPLATE

Metaflow reads its configuration from ~/.metaflowconfig by default; set METAFLOW_HOME to point it at another directory, such as one in your project root. Each profile lives in its own file named config_<profile>.json (plain config.json is used when METAFLOW_PROFILE is unset).

✅ Local & AWS Profile Example

config_default.json:

{
  "METAFLOW_DEFAULT_DATASTORE": "local",
  "METAFLOW_DEFAULT_METADATA": "local",
  "METAFLOW_CARD_DIR": "./_cards"
}

config_aws.json:

{
  "METAFLOW_DEFAULT_DATASTORE": "s3",
  "METAFLOW_DATASTORE_SYSROOT_S3": "s3://your-bucket/metaflow",
  "METAFLOW_DEFAULT_METADATA": "service",
  "METAFLOW_SERVICE_URL": "https://your-metaflow-metadata-service.com",
  "METAFLOW_CARD_DIR": "./_cards",
  "METAFLOW_CARD_VIEWER": "https://your-metaflow-ui/cards"
}

πŸ” This allows you to switch between profiles using:

export METAFLOW_PROFILE=default  # for local
export METAFLOW_PROFILE=aws      # for cloud
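
Metaflow resolves a profile by file name: with METAFLOW_PROFILE=aws it looks for config_aws.json in its config directory. The sketch below mimics that lookup so you can check which file a given environment would select; config_path is a hypothetical helper written for these notes, not a Metaflow function.

```python
import os
from typing import Mapping


def config_path(environ: Mapping[str, str] = os.environ) -> str:
    """Return the config file that the given environment would select."""
    home = environ.get("METAFLOW_HOME", "~/.metaflowconfig")
    profile = environ.get("METAFLOW_PROFILE")
    # A named profile maps to config_<profile>.json; no profile means config.json
    name = f"config_{profile}.json" if profile else "config.json"
    return os.path.expanduser(os.path.join(home, name))


print(config_path({"METAFLOW_HOME": "/etc/metaflow", "METAFLOW_PROFILE": "aws"}))
print(config_path({"METAFLOW_HOME": "/etc/metaflow"}))
```

Passing the environment as an argument keeps the helper easy to test without touching the real os.environ.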

🧪 Test it!

Try running:

export METAFLOW_PROFILE=aws
python my_flow.py run

Then switch back:

export METAFLOW_PROFILE=default
python my_flow.py run

Each profile will:

  • Use the correct artifact storage (local folder vs. S3)
  • Store metadata locally or in a shared service
  • Send cards to the appropriate viewer

πŸ› οΈ Useful Configuration Keys

| Key | Description |
| --- | --- |
| METAFLOW_PROFILE | Profile name (default, aws, etc.) |
| METAFLOW_DEFAULT_DATASTORE | local or s3 |
| METAFLOW_DATASTORE_SYSROOT_S3 | S3 root path for all artifacts |
| METAFLOW_DEFAULT_METADATA | local or service |
| METAFLOW_SERVICE_URL | URL to your metadata service |
| METAFLOW_CARD_DIR | Directory where local cards are stored |
| METAFLOW_CARD_VIEWER | Optional external URL for UI card viewing |

🧠 Pro Tips

  • Keep .metaflowconfig/config.json in source control (but never hardcode credentials)
  • Store secrets like AWS credentials in ~/.aws/credentials or IAM roles
  • You can add more profiles like staging, dev, prod, etc.
  • Use metaflow metadata get and status to confirm the correct setup

πŸ› οΈ Parsing Configs in Metaflow

🧩 Core Tool: from_config()

Metaflow provides a utility:

from metaflow import from_config

This function loads a config file into a Python dictionary. The file can be:

  • .yaml or .yml
  • .json
  • .py (Python config file)

🧪 Example Usage

YAML file: config.yaml

learning_rate: 0.01
dropout: 0.3

Flow code:

from metaflow import FlowSpec, step, from_config

config = from_config("config.yaml")

class MyFlow(FlowSpec):
    @step
    def start(self):
        self.lr = config["learning_rate"]
        print("LR:", self.lr)
        self.next(self.end)

    @step
    def end(self):
        pass

🧠 Supported Formats & Behavior

| Format | Notes |
| --- | --- |
| YAML | Parsed with PyYAML (must be installed) |
| JSON | Parsed with Python json module |
| Python | Must define a CONFIG dict |

.py file example:
# config.py
CONFIG = {
  "learning_rate": 0.1,
  "epochs": 5
}

Then use:

config = from_config("config.py")

πŸ›‘οΈ Safety Notes

  • Python config files are executed, so use with caution (don’t load untrusted files).
  • If config is missing or malformed, from_config() raises an error.
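
The behavior described in this section (dispatch on file extension, a CONFIG dict for Python files, errors on missing or malformed input) can be sketched with the standard library. This is an illustration of the idea rather than Metaflow's code: load_config is a hypothetical stand-in, and YAML is left out so the sketch needs no third-party package.

```python
import json
import runpy
from pathlib import Path
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Load a .json or .py config file into a dict, raising on bad input."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"config file not found: {path}")
    if file.suffix == ".json":
        # json.loads raises json.JSONDecodeError (a ValueError) on malformed input
        data = json.loads(file.read_text(encoding="utf-8"))
    elif file.suffix == ".py":
        # This executes the file: only load Python configs you trust
        data = runpy.run_path(str(file)).get("CONFIG")
    else:
        raise ValueError(f"unsupported config format: {file.suffix!r}")
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not define a dict config")
    return data
```

Calling load_config("config.py") on the example file above would return the CONFIG dict; a Python file without CONFIG fails the final isinstance check instead of returning None.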

πŸ› οΈ Custom Parsers in Metaflow

Custom parsers allow you to extend Metaflow’s configuration system by defining your own rules for reading configuration files. This is useful when:

  • Your configuration format is non-standard or specialized.
  • You need custom preprocessing before the configuration values are used.
  • You want to integrate with legacy systems or non-YAML/JSON formats.

🧩 Core Concepts

  • Parser Interface:
    Metaflow expects parsers to adhere to a standard interface. Custom parsers are classes that implement methods for reading and parsing configuration files.

  • Registration:
    Your custom parser must be registered with Metaflow so that from_config() can detect and use it based on the file extension (or other heuristics).

  • Extensibility:
    By writing your own parser, you can:

    • Handle new file extensions.
    • Preprocess configuration data (e.g., environment variable substitution, validation).
    • Merge multiple config files or sources in a custom manner.
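
As one example of preprocessing, environment variable substitution can be applied to an already-parsed config. A minimal sketch, assuming $VAR / ${VAR} placeholder syntax; expand_env is a hypothetical helper and the key names are made up for illustration.

```python
import os
from string import Template
from typing import Any, Mapping


def expand_env(value: Any, environ: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace $VAR / ${VAR} in strings; unknown variables are left as-is."""
    if isinstance(value, str):
        return Template(value).safe_substitute(environ)
    if isinstance(value, dict):
        return {k: expand_env(v, environ) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, environ) for v in value]
    return value  # numbers, booleans, None pass through unchanged


raw = {"data_dir": "${DATA_ROOT}/train", "tags": ["$USER_TAG"], "epochs": 5}
print(expand_env(raw, {"DATA_ROOT": "/mnt/data"}))
```

safe_substitute is used instead of substitute so that a missing variable leaves the placeholder visible rather than raising in the middle of a load.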

🔧 Implementation Overview

  1. Define Your Parser:
    Create a class that typically inherits from a base parser provided by Metaflow (or implements the necessary interface). Implement at least a parse() method that accepts a file path and returns a Python dictionary.

  2. Register the Parser:
    Add your parser to Metaflow’s parser registry. This is often done by appending your parser (or its file extension mapping) to an internal list. Metaflow then knows to use your custom parser when encountering a file with the associated extension.

  3. Use with from_config():
    Once registered, you can load your configuration file using from_config(), and Metaflow will automatically invoke your custom parser if the file type matches.
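
The define / register / dispatch steps above can be sketched with a plain dictionary keyed by file extension. Everything here is hypothetical (PARSERS, register_parser, parse_key_value, parse_config) and independent of Metaflow; it only shows the shape of the pattern, using a simple "key = value" format for the custom extension.

```python
import os
from typing import Any, Callable, Dict

Parser = Callable[[str], Dict[str, Any]]
PARSERS: Dict[str, Parser] = {}  # maps file extension -> parser function


def register_parser(extension: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one file extension."""
    def decorator(func: Parser) -> Parser:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register_parser(".myext")
def parse_key_value(text: str) -> Dict[str, Any]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    result: Dict[str, Any] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()  # values are kept as strings
    return result


def parse_config(path: str, text: str) -> Dict[str, Any]:
    """Pick a parser from the file extension and apply it to the file contents."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in PARSERS:
        raise ValueError(f"no parser registered for {extension!r}")
    return PARSERS[extension](text)


print(parse_config("config.myext", "# demo\nsome_parameter = 42\n"))
```

Keeping parsers as functions from text to dict, rather than from path to dict, separates file access from parsing and makes each parser testable with an inline string.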


📋 Example Flow (Conceptual)

from metaflow import from_config, FlowSpec, step
from my_custom_parser import MyCustomParser  # Your custom parser class

# Ensure your parser is registered with Metaflow
# (This registration might be handled in your parser module)
MyCustomParser.register()

# Load configuration using your custom parser
config = from_config("config.myext")  # 'myext' is the custom file extension

class MyFlow(FlowSpec):

    @step
    def start(self):
        self.param = config["some_parameter"]
        print("Loaded parameter:", self.param)
        self.next(self.end)

    @step
    def end(self):
        print("Flow complete.")

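Note: The above is a conceptual example. In the documented Metaflow API, a custom parser is a function that takes the file contents as a string and returns a dictionary, and it is passed through the parser argument of Config rather than registered on a class.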


✅ Benefits

  • Flexibility: Integrate any configuration format you need.
  • Customization: Preprocess and validate configurations exactly as required.
  • Seamless Integration: Once set up, your custom parser works transparently with Metaflow’s from_config() utility.