Apache Airflow
Euno connects to Apache Airflow to discover workflow orchestration metadata, including Airflow instances, DAGs, tasks, and Airflow datasets or Airflow 3 assets.
Euno's Airflow integration supports auto-discovery of:
Airflow instances
DAGs
Tasks
Datasets and assets
Prerequisites
Airflow 2 with the stable REST API, or Airflow 3 with REST API v2.
An Airflow API token, or an Airflow username and password.
Network access from Euno to the Airflow webserver.
The configured identity must be able to read DAGs and tasks. Dataset or asset permissions are needed to discover dataset-aware scheduling metadata.
Euno detects the supported Airflow API version automatically during validation. If the dataset or asset endpoint is not present or the identity cannot read it, Euno continues without those resources.
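The detection described above can be pictured with a small sketch. This is not Euno's implementation. It only assumes that Airflow 2 serves its stable REST API under /api/v1 and Airflow 3 under /api/v2, and that datasets or assets are treated as optional. The get_status helper is hypothetical: it takes a path and returns an HTTP status code, so any HTTP client or a test stub can be plugged in.

```python
from typing import Callable, Optional

# Airflow 3 serves its REST API under /api/v2, Airflow 2 under /api/v1.
# Probing the newer prefix first is an assumption made for this sketch.
CANDIDATE_PREFIXES = ("/api/v2", "/api/v1")


def detect_api_prefix(get_status: Callable[[str], int]) -> Optional[str]:
    """Return the first API prefix whose /version endpoint answers 200.

    `get_status` is a hypothetical helper: path in, HTTP status code out.
    """
    for prefix in CANDIDATE_PREFIXES:
        if get_status(f"{prefix}/version") == 200:
            return prefix
    return None


def optional_resources_available(get_status: Callable[[str], int], prefix: str) -> bool:
    """Check the asset (Airflow 3) or dataset (Airflow 2) listing endpoint.

    Any non-200 answer (for example 403 or 404) means discovery should
    continue without those resources instead of failing.
    """
    path = f"{prefix}/assets" if prefix == "/api/v2" else f"{prefix}/datasets"
    return get_status(path) == 200
```

With a stub that answers 200 for /api/v1/version and 403 for /api/v1/datasets, the sketch selects /api/v1 and reports that datasets are unavailable, which mirrors the "continue without those resources" behavior.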
Stage 1: Configure Airflow
Step 1: Create an API token or service account
Euno supports two authentication methods: a bearer token (recommended) or username/password basic auth.
Option A: API token (recommended)
Log in to your Airflow webserver and go to Admin → Users.
Open or create the user you want Euno to authenticate as.
Scroll to the Extra section and generate a new token, or use the Airflow CLI:
Copy the token; it is shown only once.
The Airflow user must have at minimum the built-in Viewer role so it can read DAGs, tasks, and dataset metadata. If you also want Euno to collect Airflow connections for lineage resolution (the Resolve warehouse lineage option), the user additionally needs the Op role or a custom role that includes can_read on Connections.
Airflow 2 requires the stable REST API to be enabled. In airflow.cfg, set [api] auth_backends = airflow.api.auth.backend.basic_auth (or jwt_auth for token-based auth). Airflow 3 enables the REST API v2 by default.
Option B: Username and password
If your Airflow deployment does not support token authentication, prepare a username and password for a dedicated service account with the permissions described above. Euno will use HTTP Basic auth.
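To make the two authentication methods concrete, here is a minimal sketch of how a client could build the Authorization header for either option. The function name and signature are illustrative and not part of Euno. The only behavior taken from this page is that a token is sent as a bearer token and that username and password are sent with HTTP Basic auth; preferring the token when both are supplied is how the source configuration describes it.

```python
import base64
from typing import Dict, Optional


def build_auth_headers(api_token: Optional[str] = None,
                       username: Optional[str] = None,
                       password: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for an Airflow REST API request."""
    if api_token:
        # Option A: bearer token. It wins when both methods are configured.
        return {"Authorization": f"Bearer {api_token}"}
    if username and password is not None:
        # Option B: HTTP Basic auth, base64 of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError("Provide an API token, or a username and password")
```

Basic credentials are only encoded, not encrypted, which is one more reason to reach the webserver over HTTPS.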
Step 2: Verify network access
Ensure Euno's servers can reach the Airflow webserver URL over HTTPS. If your Airflow instance uses a private CA or self-signed certificate, disable SSL verification in the source configuration (see Stage 2).
Do not expose your Airflow webserver directly to the public internet if it is not already. Prefer network-level controls (VPN, VPC peering, IP allowlist) to allow only Euno's egress IPs to reach your Airflow instance.
Stage 2: Configure New Airflow Source in Euno
Step 1: Access the Sources Page
Navigate to Settings → Sources and click Add New Source. Select Airflow from the integration list.
Step 2: General Configuration
Asterisk (*) means a mandatory field.
Base URL*: Airflow webserver URL. Host-only values are canonicalized to HTTPS.
API token: Bearer token for Airflow API access. If present, Euno uses token auth.
Username: Basic auth username. Used only when no API token is configured.
Password: Basic auth password. Used only when no API token is configured.
Verify SSL certificates: Keep enabled for production. Disable only for local labs or private certificates. Default: enabled.
DAG pattern: Optional allow/deny regex pattern for DAG IDs. Use this to limit discovery to selected DAGs.
Connection pattern: Optional allow/deny regex pattern for Airflow connection IDs to include in lineage resolution.
Observe execution history: Collect bounded DAG run and task instance events for metrics and operator lineage. Default: enabled.
Execution history days: Lookback window for execution history. Range: 1 to 60 days. Default: 30.
Observe datasets: Collect Airflow datasets or Airflow 3 assets when the endpoint is available. Default: enabled.
Resolve warehouse lineage: Collect raw connection and operator evidence for warehouse lineage resolution. Default: enabled.
Connection mapping: Optional manual mapping from Airflow connection IDs to Euno resource URIs.
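The DAG pattern and Connection pattern fields take allow/deny regular expressions. The exact matching rules are not spelled out here, so the following is only a sketch under stated assumptions: patterns must match the whole ID, a deny match always wins, and an empty allow list admits everything. The filter_ids function and its parameters are hypothetical names used for illustration.

```python
import re
from typing import Iterable, List, Sequence


def filter_ids(ids: Iterable[str],
               allow: Sequence[str] = (),
               deny: Sequence[str] = ()) -> List[str]:
    """Keep IDs that match an allow pattern and no deny pattern.

    Assumptions for this sketch: full-string matching, deny takes
    precedence, and an empty allow list means "allow all".
    """
    allow_res = [re.compile(p) for p in allow]
    deny_res = [re.compile(p) for p in deny]
    kept = []
    for item in ids:
        if any(r.fullmatch(item) for r in deny_res):
            continue  # explicitly denied
        if allow_res and not any(r.fullmatch(item) for r in allow_res):
            continue  # an allow list exists and this ID is not on it
        kept.append(item)
    return kept
```

For example, allowing `etl_.*` while denying `.*_debug` keeps `etl_orders` and drops both `etl_tmp_debug` and `ml_train`. Testing a pattern this way before saving the source can help avoid silently excluding DAGs.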
Large Airflow deployments are protected by internal safety limits; if a limit is reached, the run report explains what stopped or was truncated.
Step 3: Schedule
Enable the Schedule option and choose how often Euno crawls the Airflow source:
Hourly: Set the interval in hours (e.g., every 4 hours). Recommended for active pipelines where DAG and task metadata changes frequently.
Weekly: Set specific days and times for a lighter crawl cadence.
Recommended: Schedule the Airflow integration to run every 1 to 4 hours if you have execution history collection enabled. This keeps DAG run metrics and task lineage close to real time. Manual runs are also supported at any time.
Step 4: Resource Cleanup
Configure the cleanup policy to control how Euno handles resources that disappear from Airflow:
Immediate Cleanup: Resources not detected in the most recent successful run are removed immediately. Use this to keep the catalog tightly in sync with your Airflow deployment.
TTL-based Cleanup: Resources are retained for a configurable number of days after they were last seen, then removed. Useful when DAGs are temporarily disabled or when Airflow is redeployed.
No Cleanup: Resources are retained indefinitely even if no longer detected by Airflow.
Recommended: Use Immediate Cleanup for most deployments. This ensures that retired or deleted DAGs and tasks are promptly removed from the catalog.
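The three cleanup policies differ only in when a missing resource is removed. The sketch below expresses that decision as one function. It is an illustration, not Euno's code: the CleanupPolicy enum, the should_remove function, and the strict "older than the TTL" comparison are all assumptions.

```python
from datetime import datetime, timedelta
from enum import Enum


class CleanupPolicy(Enum):
    IMMEDIATE = "immediate"
    TTL = "ttl"
    NONE = "none"


def should_remove(policy: CleanupPolicy,
                  seen_in_last_run: bool,
                  last_seen: datetime,
                  now: datetime,
                  ttl_days: int = 0) -> bool:
    """Decide whether a catalog resource should be removed after a successful run."""
    if seen_in_last_run or policy is CleanupPolicy.NONE:
        return False  # still present, or retained indefinitely
    if policy is CleanupPolicy.IMMEDIATE:
        return True  # missing from the latest successful run
    # TTL-based: remove only once the retention window has fully elapsed.
    return now - last_seen > timedelta(days=ttl_days)
```

With a resource last seen nine days ago, a 14-day TTL keeps it and a 7-day TTL removes it, which is the buffer that helps when DAGs are temporarily disabled.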
What Euno Discovers
Airflow instances
DAGs
Tasks
Datasets and assets
Airflow dataset and asset payloads are kept as raw source evidence. When a dataset or asset URI is a supported warehouse table URI, Euno also emits non-authoritative database, schema, and table observations from that explicit Airflow metadata. Global processors then derive the DAG and task lineage from the same evidence. When Airflow defines an output table, Euno also applies the upstream table lineage of the defining DAG or task to that output table. When the lineage processor reprocesses a previous Airflow-defined output and sees that Airflow no longer defines it, the stale Airflow-owned output lineage is cleared.
Operator SQL lineage is more heuristic. When execution history and connection metadata provide a complete, supported input or output target, Euno emits non-authoritative warehouse database, schema, and table observations for that SQL or operator table. Temporary tables created inside the SQL are skipped as warehouse observations. MySQL and Postgres operator lineage still enriches only warehouse tables that already exist in Euno.
When execution history collection is enabled, Euno stores Airflow DAG runs and task instances as DAMA events:
airflow_dag_run stores the raw Airflow DAG run payload on the DAG URI.
airflow_task_instance stores the raw Airflow task instance payload on the task URI.
Euno derives DAG and task execution metrics from these events. These include latest run status, failing task IDs, run counts, success rates, duration averages, retry rates, and failure streaks. Operator lineage processors also use task instance events when rendered operator fields are needed.
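As a rough illustration of how such metrics can be derived from raw DAG run payloads, the sketch below computes a few of them. It is not Euno's processor. It assumes each payload is a dict carrying the state, start_date, and end_date fields that the Airflow REST API returns for DAG runs, and it counts only runs that finished as success or failed. The function name and the output keys are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def parse_ts(value: str) -> datetime:
    # Airflow returns ISO 8601 timestamps; map a trailing "Z" to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def dag_run_metrics(runs: List[Dict[str, Optional[str]]]) -> Dict[str, object]:
    """Summarize finished DAG runs: count, success rate, duration, failure streak."""
    finished = [
        r for r in runs
        if r.get("state") in ("success", "failed")
        and r.get("start_date") and r.get("end_date")
    ]
    finished.sort(key=lambda r: parse_ts(r["start_date"]))
    if not finished:
        return {"run_count": 0, "success_rate": None,
                "avg_duration_seconds": None, "failure_streak": 0,
                "latest_state": None}
    successes = sum(1 for r in finished if r["state"] == "success")
    durations = [
        (parse_ts(r["end_date"]) - parse_ts(r["start_date"])).total_seconds()
        for r in finished
    ]
    # Failure streak: consecutive failed runs counted back from the latest one.
    streak = 0
    for r in reversed(finished):
        if r["state"] != "failed":
            break
        streak += 1
    return {"run_count": len(finished),
            "success_rate": successes / len(finished),
            "avg_duration_seconds": sum(durations) / len(durations),
            "failure_streak": streak,
            "latest_state": finished[-1]["state"]}
```

Runs that are still queued or running are ignored here so that they do not distort success rates or durations; a real processor may treat them differently.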
Current operator lineage heuristics cover generic SQL operators, Snowflake operators, MySqlOperator, PostgresOperator, DatabricksSubmitRunOperator, S3ToSnowflakeOperator, BigQueryInsertJobOperator, and BigQueryExecuteQueryOperator. Unsupported operators are reported as unresolved lineage evidence rather than guessed.
SQL operator lineage uses Airflow connection metadata or Connection mapping to resolve warehouse URI namespaces. If Euno cannot read Airflow connections, or if an input or output target is partial or ambiguous, Euno does not create warehouse observations from that SQL evidence. MySQL, Postgres, and Databricks need a single matching mapped warehouse prefix when rendered fields omit conn_id. Databricks submit-run lineage reads only SQL task query_text from rendered JSON.
For the full list of discovered resource types, properties, and relationships, see Airflow Integration Discovered Resources.