AI/ML notes

Metadata

In Metaflow, metadata is the execution history that Metaflow records for every flow, including:

  • Run status (started, finished, succeeded, failed)
  • Step/task execution details
  • Artifacts
  • Tags
  • System events
  • User-defined metadata (e.g., recorded via artifacts)

It powers the metaflow status CLI, the UI (including Metaflow Cards), and introspection via the Client API (Flow(), Run(), etc.).
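The Client API objects above are addressed by pathspec strings such as MyFlow/123/train/456 (flow/run/step/task). As a minimal, illustrative sketch (this helper is not part of Metaflow's API), those strings decompose like this:

```python
# Hypothetical helper showing how a Metaflow pathspec decomposes.
# The Client API itself consumes these strings directly, e.g. Task('MyFlow/123/train/456').
def split_pathspec(pathspec):
    """Return (flow, run_id, step, task_id), padding missing parts with None."""
    parts = pathspec.split('/')
    return tuple(parts + [None] * (4 - len(parts)))

print(split_pathspec('MyFlow/123/train/456'))  # ('MyFlow', '123', 'train', '456')
print(split_pathspec('MyFlow/123'))            # ('MyFlow', '123', None, None)
```

With Metaflow installed, the same string drives introspection: Task('MyFlow/123/train/456').metadata.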


🧠 Metadata Provider: What Is It?

The metadata provider is the backend service or database where Metaflow stores all this execution history.

🔧 Common Providers:

| Provider | Description |
| --- | --- |
| Local (local) | Default for local runs; stores metadata in a .metaflow directory in (or above) the working directory |
| Service (service) | Stores metadata in a remote Metaflow Metadata Service (used in production/cloud) |
| Custom | Implement your own provider (e.g., backed by PostgreSQL or a REST service) |

You can configure the provider using:

export METAFLOW_DEFAULT_METADATA=local     # default
export METAFLOW_DEFAULT_METADATA=service   # use with the Metaflow Metadata Service
export METAFLOW_SERVICE_URL=http://<metadata-service-host>:8080

Or set it in code:

from metaflow import metadata
metadata('local')                   # switch back to local metadata
metadata('http://localhost:8080')   # point at a metadata service URL

πŸ” Viewing Current Metadata Provider

You can check the metadata provider being used:

from metaflow import get_metadata
print(get_metadata())

Example output (format is provider@location):

local@/path/to/.metaflow

🧪 Writing Custom Metadata (User-defined)

Metaflow records run/task metadata automatically; there is no public FlowSpec method for writing arbitrary metadata entries from inside a step. The idiomatic way to attach key-value information to a task is to store it as artifacts, which are versioned and queryable later.

Example:

@step
def train(self):
    self.model_score = 0.93

    # Artifacts double as custom, queryable metadata
    self.experiment_info = {
        "model": "xgboost",
        "score": self.model_score,
        "note": "Baseline experiment"
    }

    self.next(self.end)

You can inspect it later via the Client API:

from metaflow import Task
task = Task('MyFlow/123/train/abcdef')
print(task.data)      # artifacts recorded by the step
print(task.metadata)  # metadata entries stored with the task

Each entry in task.metadata exposes:

  • name
  • value
  • type (e.g., user)
  • created_at (timestamp)

🧪 Reading System Metadata

System-generated metadata includes:

  • Host info
  • Execution time
  • Attempt #
  • Retry status
  • Environment (Python, Conda, Docker, etc.)

from metaflow import Step
task = Step('MyFlow/2/train').task
print(task.metadata)  # all key-value metadata entries

You can filter or log metadata like:

for m in task.metadata:
    if m.type == 'user':
        print(f"{m.name}: {m.value}")
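The filtering pattern above can be tried without a live task by mimicking the attributes of client metadata entries (name, value, type); these stand-in records are illustrative only:

```python
# Stand-in records mimicking Metaflow client Metadata objects,
# so the type-based filter can run without a metadata backend.
from collections import namedtuple

Meta = namedtuple('Meta', ['name', 'value', 'type'])
entries = [Meta('score', '0.93', 'user'), Meta('attempt', '0', 'attempt')]

# Keep only the user-typed entries, as in the loop above
user_meta = {m.name: m.value for m in entries if m.type == 'user'}
print(user_meta)  # {'score': '0.93'}
```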

πŸ“ Where Is Metadata Stored?

In local mode:

  • Metadata is written to a .metaflow directory in (or above) the working directory where the flow runs

You can list the flows it knows about with:

metaflow status

In service mode:

  • Metadata is stored in a centralized Metaflow Metadata Service
  • Supports team-wide collaboration, scalability, and monitoring tools
  • Used with Metaflow UI / Outerbounds cloud

✅ Summary

| Concept | Description |
| --- | --- |
| Metadata | Execution history: run/task state, events, artifacts |
| Provider | Backend (local directory, service, or custom) |
| Artifacts (self.x = ...) | Record custom key-value information during a run |
| task.metadata | Inspect all metadata entries for a task |
| metadata(...) | Switch provider in code |
| get_metadata() | Show the current provider |

🔄 Comparing Tags vs Metadata

| Feature | Tags (add_tag) | Metadata / artifacts |
| --- | --- | --- |
| Shape | Single string per tag | Key + value |
| Indexable | Yes | No |
| UI support | Yes | Partial |
| Searchable | Yes | Not directly |
| Purpose | Grouping runs | Rich logging / audit |
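The distinction in the table can be sketched with a toy model (this is an illustration, not Metaflow's implementation): tags are bare strings attached to a run, while metadata entries are typed key-value records attached to a task.

```python
# Toy model of the tags-vs-metadata distinction (not Metaflow internals).
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    tags: set = field(default_factory=set)        # bare strings, used for grouping/search
    metadata: list = field(default_factory=list)  # (name, value, type) records for logging/audit

r = RunRecord()
r.tags.add("experiment:baseline")                 # tag: key only
r.metadata.append(("score", "0.93", "user"))      # metadata: key + value + type
print(r)
```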

Setting up the Metaflow Metadata Service in a team environment is a critical step for enabling collaboration, observability, and production readiness. It lets multiple users share the same metadata backend, so everyone can view the same runs, tags, artifacts, and status, no matter where the flow was triggered from.


Metadata Service

Metaflow Metadata Service is a centralized HTTP service that stores all metadata for:

  • Flow definitions
  • Runs, steps, tasks
  • Tags
  • User-defined metadata
  • Execution history

It is what the service metadata provider talks to (replacing the default local mode), and it is typically run as a Docker container, often behind a reverse proxy (e.g., nginx, AWS ALB).

🧱 1. Prerequisites

  • Docker or Python environment
  • A PostgreSQL database for the service to persist metadata
  • Persistent storage for artifacts (e.g., S3)
  • Shared infrastructure (VM, ECS, Kubernetes, etc.)

🚀 2. Run the Metadata Service

🔸 Option A: Run locally via Docker

The service persists its state in PostgreSQL, so point it at a reachable database (environment variable names follow the metaflow-service README):

docker run -p 8080:8080 --rm \
  -e MF_METADATA_DB_HOST=<postgres-host> \
  -e MF_METADATA_DB_PORT=5432 \
  -e MF_METADATA_DB_USER=postgres \
  -e MF_METADATA_DB_PSWD=postgres \
  -e MF_METADATA_DB_NAME=metaflow \
  netflixoss/metaflow_metadata_service

This starts a local metadata service at http://localhost:8080.

🔸 Option B: Deploy in the cloud

Deploy using:

  • ECS / Fargate
  • Kubernetes (Helm or raw YAML)
  • VM (EC2, GCP Compute Engine, etc.)

Docker image:

netflixoss/metaflow_metadata_service

Use a reverse proxy (e.g., nginx) for TLS and authentication if needed.


🧪 3. Configure Clients to Use the Service

On every team member’s machine or in CI:

export METAFLOW_DEFAULT_METADATA=service
export METAFLOW_SERVICE_URL=http://<your-metadata-url>:8080

You can also set this in your .bashrc, .env, or a central configuration file if you're using Metaflow Profiles.


💾 4. Configure a Datastore

The metadata service tracks metadata only. You still need a datastore for the actual flow data/artifacts (e.g., models, tensors):

export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3=s3://your-bucket/metaflow
export AWS_DEFAULT_REGION=us-west-2
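A small sanity-check helper can confirm the client-side configuration before launching a flow. The variable names below follow current Metaflow conventions; adjust the list to your deployment:

```python
# Hedged helper: report which of the expected Metaflow client settings are unset.
import os

REQUIRED = ['METAFLOW_DEFAULT_METADATA', 'METAFLOW_SERVICE_URL',
            'METAFLOW_DEFAULT_DATASTORE', 'METAFLOW_DATASTORE_SYSROOT_S3']

def missing_config(env=None):
    """Return the REQUIRED keys that are absent or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED if not env.get(k)]

# Example with a partial configuration: lists the three unset variables
print(missing_config({'METAFLOW_DEFAULT_METADATA': 'service'}))
```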

πŸ§‘β€πŸ€β€πŸ§‘ 5. Team Collaboration

Once set up:

  • All runs are visible to every team member via the Metaflow UI and the Client API
  • Tags, metadata, and code can be shared
  • Workflows can be orchestrated centrally via Airflow, Argo, etc.
  • Metaflow Cards can point to the same backend

✅ Example Config for All Users

Create a .env.metaflow_team:

METAFLOW_DEFAULT_METADATA=service
METAFLOW_SERVICE_URL=https://metaflow.yourcompany.com
METAFLOW_DEFAULT_DATASTORE=s3
METAFLOW_DATASTORE_SYSROOT_S3=s3://my-company-metaflow-data
AWS_DEFAULT_REGION=us-west-2

Source it with (use set -a so the variables are exported to child processes):

set -a; source .env.metaflow_team; set +a
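The same file can be loaded from Python (e.g., in CI) instead of sourcing it in a shell; a minimal sketch, assuming plain KEY=VALUE lines as shown above:

```python
# Minimal .env loader for plain KEY=VALUE files (comments and blank lines skipped).
import os

def load_env_file(path, env=None):
    """Parse KEY=VALUE lines from path into env (defaults to os.environ); returns env."""
    env = os.environ if env is None else env
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, _, value = line.partition('=')
                env[key.strip()] = value.strip()
    return env

# load_env_file('.env.metaflow_team')  # populates os.environ for this process
```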

Or place in a config management system (Ansible, Terraform, etc.).


🔒 Optional: Secure the Service

  • Deploy behind nginx + basic auth or OAuth
  • Use HTTPS with Let's Encrypt or company certificate
  • Restrict by IP range or cloud IAM

🧠 Monitoring and Debugging

  • The metadata service logs HTTP requests
  • Enable Prometheus metrics via env variables
  • Use curl http://<host>:8080/ping to check health
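The /ping health check above can also be scripted from Python; a small sketch, assuming the service answers GET /ping with HTTP 200 when healthy:

```python
# Probe a metadata service health endpoint; returns False on any connection error.
from urllib.request import urlopen
from urllib.error import URLError

def is_healthy(base_url, timeout=2):
    """Return True if GET <base_url>/ping responds with HTTP 200."""
    try:
        with urlopen(base_url.rstrip('/') + '/ping', timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print(is_healthy('http://localhost:8080'))  # True once the service is up
```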

📋 Verify Setup

from metaflow import get_metadata
print(get_metadata())

Expected output (format is provider@location):

service@http://your-service

Run a flow:

python my_flow.py run --tag test:team
metaflow status

✅ Summary

| Task | How |
| --- | --- |
| Start metadata service | Docker or deploy in cloud |
| Configure clients | METAFLOW_DEFAULT_METADATA=service and METAFLOW_SERVICE_URL |
| Store artifacts | METAFLOW_DEFAULT_DATASTORE=s3 plus METAFLOW_DATASTORE_SYSROOT_S3 |
| Share across team | All users point to the same URL |
| Secure access | TLS, reverse proxy, IP/IAM control |