# Metadata
In Metaflow, metadata is the internal record of everything that happens when flows execute, including:
- Run status (started, finished, succeeded, failed)
- Step/task execution details
- Artifacts
- Tags
- System events
- User-defined metadata (you can write your own)
It's what powers the `metaflow status` CLI, the Metaflow UI (including Cards), and introspection from the Python client API (`Flow()`, `Run()`, etc.).
## What Is the Metadata Provider?
The metadata provider is the backend service or database where Metaflow stores all this execution history.
### Common Providers

| Provider | Description |
|---|---|
| Local (`local`) | Default for local runs; stores metadata in a local `.metaflow` directory |
| Service (`service`) | Stores metadata in a remote metadata service (used in production/cloud) |
| Custom | You can implement your own (e.g., PostgreSQL, Mongo, REST service) |
You can configure the provider via environment variables:

```bash
export METAFLOW_DEFAULT_METADATA=local    # default
export METAFLOW_DEFAULT_METADATA=service  # use with the Metaflow Metadata Service
```
Or set it in code:

```python
from metaflow import metadata

metadata('service')  # programmatic switch
```
## Viewing the Current Metadata Provider
You can check which metadata provider is in use from Python:

```python
from metaflow import get_metadata

print(get_metadata())
```

Example output:

```
local@/path/to/.metaflow
```
## Writing Custom (User-Defined) Metadata
Metaflow does not expose a public API for writing arbitrary key-value metadata from inside a step; the idiomatic way to attach custom information to a task is to store it as an artifact, which is tracked in the metadata store along with everything else.

Example:

```python
@step
def train(self):
    self.model_score = 0.93
    # Artifacts double as custom metadata: they are versioned and
    # recorded with the task in the metadata store.
    self.experiment_info = {
        "model": "xgboost",
        "score": self.model_score,
        "note": "Baseline experiment",
    }
    self.next(self.end)
```
You can inspect it later:

```python
from metaflow import Task

task = Task('MyFlow/123/train/abcdef')
print(task.metadata)  # metadata entries recorded for this task
```
Each metadata item is stored with the following fields:

- `field_name` – the key
- `value` – the stored value
- `type` – e.g., `user` or a system-generated type
- `ts_epoch` – timestamp
## Reading System Metadata
System-generated metadata includes:
- Host info
- Execution time
- Attempt #
- Retry status
- Environment (Python, Conda, Docker, etc.)
```python
from metaflow import Step

task = Step('MyFlow/2/train').task
print(task.metadata)  # all key-value metadata entries
```
You can filter metadata entries, for example to show only user-level ones:

```python
for m in task.metadata:
    if m.type == 'user':
        print(f"{m.name}: {m.value}")
```
## Where Is Metadata Stored?
In local mode:

- Metadata is written to a local `.metaflow` directory (by default under the directory where you run the flow)

You can list the flows known to the current provider with:

```bash
metaflow status
```
In service mode:
- Metadata is stored in a centralized Metaflow Metadata Service
- Supports team-wide collaboration, scalability, and monitoring tools
- Used with Metaflow UI / Outerbounds cloud
## Summary
| Concept | Description |
|---|---|
| Metadata | Execution history, run/task state, events, artifacts |
| Provider | Backend (local file, service, or custom) |
| Artifacts | Idiomatic way to record custom key-value data with a task |
| `task.metadata` | Inspect all metadata entries from a task |
| `metadata('service')` | Switch provider in code |
| `get_metadata()` | Show the current provider |
## Comparing Tags vs Metadata
| Feature | Tags (`add_tag`) | Metadata |
|---|---|---|
| Key-Value | Key only | Key + value |
| Indexable | Yes | No |
| UI Support | Yes | Partial |
| Searchable | Yes | Not directly |
| Purpose | Grouping | Rich logging / audit |
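The table's distinction can be made concrete with a toy model (plain Python, not Metaflow APIs): tags are a flat set of strings, so membership tests and grouping are direct, while metadata entries carry a key, a value, and a type, so they suit logging and audits but need a scan to query:

```python
from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Toy model contrasting tags (bare strings) with metadata (key-value records)."""
    tags: set = field(default_factory=set)
    metadata: list = field(default_factory=list)

    def add_tag(self, tag):
        self.tags.add(tag)  # key only: good for grouping and direct lookup

    def record_metadata(self, name, value, type_="user"):
        # key + value + type: good for rich logging / audit trails
        self.metadata.append({"field_name": name, "value": value, "type": type_})


run = ToyRun()
run.add_tag("experiment:baseline")
run.record_metadata("score", 0.93)

# Tags support direct membership tests; metadata requires a scan.
print("experiment:baseline" in run.tags)  # True
print([m for m in run.metadata if m["type"] == "user"])
```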
Setting up the Metaflow Metadata Service in a team environment is a critical step for enabling collaboration, observability, and production readiness. It lets multiple users share the same metadata backend, so everyone can view the same runs, tags, artifacts, and status, no matter where the flow was triggered from.
## The Metaflow Metadata Service
The Metaflow Metadata Service is a centralized HTTP service that stores all metadata for:
- Flow definitions
- Runs, steps, tasks
- Tags
- User-defined metadata
- Execution history
It replaces the default `local` mode when you switch the provider to `service`, and it is typically run as a Docker container, often behind a reverse proxy (e.g., nginx, AWS ALB).
### 1. Prerequisites
- Docker or Python environment
- Persistent storage (e.g., S3 for artifacts)
- Shared infrastructure (VM, ECS, Kubernetes, etc.)
### 2. Run the Metadata Service

#### Option A: Run locally via Docker
```bash
docker run -p 8080:8080 --rm netflixoss/metaflow_metadata_service
```

Note that the service needs a backing database for persistence, configured through `MF_METADATA_DB_*` environment variables (host, port, user, password, database name).
This starts a local metadata service at http://localhost:8080.
#### Option B: Run in the cloud (recommended for teams)
Deploy using:
- ECS / Fargate
- Kubernetes (Helm or raw YAML)
- VM (EC2, GCP Compute Engine, etc.)
Docker image:

```
netflixoss/metaflow_metadata_service
```
Use a reverse proxy (e.g., nginx) for TLS and authentication if needed.
### 3. Configure Clients to Use the Service
On every team memberβs machine or in CI:
```bash
export METAFLOW_DEFAULT_METADATA=service
export METAFLOW_SERVICE_URL=http://<your-metadata-url>:8080
```
You can also set this in your .bashrc, .env, or a central configuration file if you're using Metaflow Profiles.
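Before team members run their first flow, it can help to sanity-check these settings. The helper below is a sketch (the function name is invented; the env var names are the ones this guide exports):

```python
import os
from urllib.parse import urlparse


def check_metaflow_client_config(env=None):
    """Sanity-check client-side settings; returns a list of problems (empty = OK)."""
    env = os.environ if env is None else env
    problems = []
    if env.get("METAFLOW_DEFAULT_METADATA") != "service":
        problems.append("METAFLOW_DEFAULT_METADATA should be 'service'")
    url = env.get("METAFLOW_SERVICE_URL", "")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("METAFLOW_SERVICE_URL must be an http(s) URL")
    return problems


# A correctly configured environment produces no complaints:
ok_env = {
    "METAFLOW_DEFAULT_METADATA": "service",
    "METAFLOW_SERVICE_URL": "http://metadata.internal:8080",
}
print(check_metaflow_client_config(ok_env))  # []
```

Running it against an empty environment returns both complaints, which makes it handy as a pre-flight step in CI.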
### 4. Use S3 for Artifact Storage (recommended)
The metadata service tracks metadata only. You still need a datastore for actual flow data/artifacts (e.g., models, tensors).
```bash
export METAFLOW_DEFAULT_DATASTORE=s3
export METAFLOW_DATASTORE_SYSROOT_S3=s3://your-bucket/metaflow
export AWS_DEFAULT_REGION=us-west-2
```
### 5. Team Collaboration
Once set up:
- All runs are visible to every team member (e.g., via `metaflow status` or the Metaflow UI)
- Tags, metadata, and code can be shared
- Workflows can be orchestrated centrally via Airflow, Argo, etc.
- Metaflow Cards can point to the same backend
### Example Config for All Users
Create a `.env.metaflow_team`:

```bash
METAFLOW_DEFAULT_METADATA=service
METAFLOW_SERVICE_URL=https://metaflow.yourcompany.com
METAFLOW_DEFAULT_DATASTORE=s3
METAFLOW_DATASTORE_SYSROOT_S3=s3://my-company-metaflow-data
AWS_DEFAULT_REGION=us-west-2
```
Source it with `set -a` so the variables are exported to child processes:

```bash
set -a; source .env.metaflow_team; set +a
```
Or place in a config management system (Ansible, Terraform, etc.).
### Optional: Secure the Service
- Deploy behind nginx + basic auth or OAuth
- Use HTTPS with Let's Encrypt or company certificate
- Restrict by IP range or cloud IAM
### Monitoring and Debugging
- The metadata service logs HTTP requests
- Enable Prometheus metrics via env variables
- Check health with:

```bash
curl http://<host>:8080/ping
```
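The `/ping` probe is easy to automate. The sketch below stands up a stand-in HTTP server so the example is self-contained (in practice you would probe the real metadata service URL), then checks health the way a monitor would:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Stand-in for the metadata service's health endpoint."""

    def do_GET(self):
        if self.path == "/ping":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"pong")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/ping answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(base_url + "/ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}"
print(is_healthy(url))  # True while the service is up
server.shutdown()
```

The same `is_healthy` function works unchanged against a deployed service; wire it into a cron job or a Kubernetes liveness probe script.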
### Verify Setup
```python
from metaflow import get_metadata

print(get_metadata())
```

Expected output similar to:

```
service@http://your-service
```
Run a flow and confirm it shows up for everyone:

```bash
python my_flow.py run --tag test:team
metaflow status
```
## Summary
| Task | How |
|---|---|
| Start metadata service | Docker or deploy in cloud |
| Configure clients | `METAFLOW_DEFAULT_METADATA=service` and `METAFLOW_SERVICE_URL` |
| Store artifacts | S3 via `METAFLOW_DEFAULT_DATASTORE=s3` |
| Share across team | All users point to the same URL |
| Secure access | TLS, reverse proxy, IP/IAM control |