Airflow Integration Discovered Resources
Ingested Resources & Properties
Airflow Instance (airflow_instance)
The top-level resource representing a single Airflow deployment. All DAGs and datasets discovered from one webserver are children of this instance.
| Property | Description |
| --- | --- |
| name | Hostname extracted from the Airflow webserver URL |
| type | Always airflow_instance |
| native_id | Same as name; the hostname used as the stable identifier |
| subtype | Always airflow_instance |
| airflow_base_url | Canonical Airflow webserver URL |
| airflow_api_version | Detected REST API version (v1 for Airflow 2, v2 for Airflow 3) |
| airflow_version | Airflow version string reported by the health endpoint |
| airflow_connections | List of Airflow connections collected for lineage resolution (only present when Resolve warehouse lineage is enabled) |
| is_active | true when all Airflow health components report a healthy status |
| external_links | Link to the Airflow UI |
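As a rough illustration of how the identity and is_active properties above could be derived, the following is a minimal sketch, not the integration's actual code. The function names are hypothetical, and the health payload layout (a mapping of component name to an object with a status field) is an assumption based on the shape of Airflow's health endpoint response.

```python
from urllib.parse import urlparse


def instance_identity(base_url: str) -> dict:
    """Build the identity fields of an airflow_instance from its webserver URL."""
    # urlparse lowercases the hostname and drops the port and path.
    host = urlparse(base_url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {base_url!r}")
    return {
        "name": host,
        "type": "airflow_instance",
        "native_id": host,  # same value as name
        "subtype": "airflow_instance",
    }


def is_instance_active(health: dict) -> bool:
    """Return True only when every reported component has status 'healthy'.

    Assumption: a component with a missing or null status counts as unhealthy,
    and an empty payload counts as inactive.
    """
    return bool(health) and all(
        component.get("status") == "healthy" for component in health.values()
    )
```

For example, `instance_identity("https://airflow.example.com:8080/home")` yields `airflow.example.com` for both name and native_id, which is why the hostname can serve as a stable identifier across URL variations.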
DAG (airflow_dag)
An Airflow DAG observed from the Airflow REST API.
| Property | Description |
| --- | --- |
| name | DAG display name, or the DAG ID when no display name is set |
| type | Always airflow_dag |
| native_id | DAG ID |
| subtype | Always airflow_dag |
| parent_container | Parent airflow_instance URI |
| is_active | true when the Airflow scheduler still sees the DAG file; independent of is_paused. For Airflow 3, this is derived from is_stale |
| description | DAG description (when available) |
| source_path | Relative or absolute path to the DAG file |
| tags | List of DAG tags |
| native_owners | List of DAG owners; each entry contains name, display_name, and email |
| external_links | Link to the DAG in the Airflow UI |
| airflow_last_run_pass | true if the most recent DAG run succeeded, false if it failed, null if unknown |
| airflow_last_run_id | Run ID of the most recent DAG run |
| airflow_last_run_duration | Duration of the most recent DAG run in seconds |
| airflow_last_run_failing_tasks | List of task IDs that failed in the most recent run |
| airflow_last_successful_run | Timestamp of the most recent successful run |
| airflow_failure_streak | Number of consecutive failed runs (resets to 0 on success) |
| airflow_total_runs_14d | Total DAG runs in the last 14 days |
| airflow_total_runs_30d | Total DAG runs in the last 30 days |
| airflow_total_runs_60d | Total DAG runs in the last 60 days |
| airflow_success_rate_14d | Fraction of successful runs (0–1) over the last 14 days |
| airflow_success_rate_30d | Fraction of successful runs (0–1) over the last 30 days |
| airflow_success_rate_60d | Fraction of successful runs (0–1) over the last 60 days |
| airflow_avg_duration_14d | Average DAG run duration in seconds over the last 14 days |
| airflow_avg_duration_30d | Average DAG run duration in seconds over the last 30 days |
| airflow_avg_duration_60d | Average DAG run duration in seconds over the last 60 days |
Execution metrics (airflow_last_run_*, airflow_total_runs_*, airflow_success_rate_*, airflow_avg_duration_*, airflow_failure_streak) are derived from stored DAG run events and require Observe execution history to be enabled.
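The windowed metrics and the failure streak described above can be sketched as follows. This is an illustrative computation under stated assumptions, not the integration's implementation: the run record keys (end, state, duration) and both function names are hypothetical, and only finished runs are assumed to be present in the input.

```python
from datetime import datetime, timedelta


def dag_run_metrics(runs: list, now: datetime, window_days: int) -> dict:
    """Summarize finished DAG runs that ended within the last window_days.

    Each run is a dict with hypothetical keys: 'end' (datetime),
    'state' ('success' or 'failed'), and 'duration' (seconds).
    """
    cutoff = now - timedelta(days=window_days)
    recent = [r for r in runs if r["end"] >= cutoff]
    total = len(recent)
    successes = sum(1 for r in recent if r["state"] == "success")
    return {
        "total_runs": total,
        # Rates and averages are undefined for an empty window.
        "success_rate": successes / total if total else None,
        "avg_duration": sum(r["duration"] for r in recent) / total if total else None,
    }


def failure_streak(runs: list) -> int:
    """Count consecutive failed runs, newest first, stopping at the first run that did not fail."""
    streak = 0
    for run in sorted(runs, key=lambda r: r["end"], reverse=True):
        if run["state"] != "failed":
            break
        streak += 1
    return streak
```

Calling `dag_run_metrics` with `window_days` set to 14, 30, and 60 would produce the three variants of each windowed property.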
Task (airflow_task)
An individual task within an Airflow DAG.
| Property | Description |
| --- | --- |
| name | Task display name, or the task ID when no display name is set |
| type | Always airflow_task |
| native_id | Task ID |
| subtype | Operator class name (e.g. BashOperator, PythonOperator, SnowflakeOperator) |
| parent_container | Parent airflow_dag URI |
| parent_container_name | Display name of the parent DAG |
| airflow_task_avg_duration_14d | Average task instance duration in seconds over the last 14 days |
| airflow_task_avg_duration_30d | Average task instance duration in seconds over the last 30 days |
| airflow_task_retry_rate_14d | Fraction of task instances that required at least one retry over the last 14 days |
| airflow_task_retry_rate_30d | Fraction of task instances that required at least one retry over the last 30 days |
Task execution metrics (airflow_task_avg_duration_*, airflow_task_retry_rate_*) are derived from stored task instance events and require Observe execution history to be enabled.
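The retry rate is a fraction of task instances, not of attempts. A minimal sketch of that distinction follows, assuming each stored task instance event carries a try_number field in which 1 means the first attempt succeeded or failed without a retry; the function name and the record shape are hypothetical.

```python
from typing import Optional


def task_retry_rate(instances: list) -> Optional[float]:
    """Fraction of task instances that needed at least one retry.

    Assumption: 'try_number' is 1 for an instance that ran once and
    greater than 1 for an instance that was retried.
    """
    if not instances:
        return None  # undefined when the window holds no task instances
    retried = sum(1 for ti in instances if ti["try_number"] > 1)
    return retried / len(instances)
```

An instance that was retried three times counts once, the same as an instance retried a single time.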
Dataset / Asset (airflow_dataset)
An Airflow dataset (Airflow 2) or asset (Airflow 3) observed from the Airflow REST API. Dataset and asset payloads are kept as raw source evidence for lineage resolution.
| Property | Description |
| --- | --- |
| name | Dataset name, or the URI string when no explicit name is set |
| type | Always airflow_dataset |
| native_id | Dataset URI (Airflow 2) or asset URI (Airflow 3) |
| subtype | dataset for Airflow 2 resources, asset for Airflow 3 resources, or the dataset group name when one is assigned |
| parent_container | Parent airflow_instance URI |
airflow_dataset resources are only discovered when Observe datasets is enabled.
Database (database) - non-authoritative
A warehouse database inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. These observations are non-authoritative and are enriched or overridden by the primary warehouse integration when one is configured.
| Property | Description |
| --- | --- |
| name | Database name extracted from the dataset URI |
| database_technology | Warehouse technology (e.g. snowflake, bigquery) |
| platform_uri_prefix | Warehouse platform prefix (e.g. snowflake://, bigquery://) |
Schema (database_schema) - non-authoritative
A warehouse schema inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target.
| Property | Description |
| --- | --- |
| name | Schema name extracted from the dataset URI |
| database_database | Database name used for warehouse hierarchy matching |
| database_technology | Warehouse technology (e.g. snowflake, bigquery) |
| platform_uri_prefix | Warehouse platform prefix |
Table (table) - non-authoritative
A warehouse table inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. Temporary tables created inside operator SQL are skipped. These observations are non-authoritative; final lineage and property resolution is handled by global processors.
| Property | Description |
| --- | --- |
| name | Table name extracted from the dataset URI |
| database_database | Database name |
| database_schema | Schema name |
| database_technology | Warehouse technology |
| defined_by | Set to airflow when the table is currently defined by an Airflow DAG or task |
| platform_uri_prefix | Warehouse platform prefix |
| table_dependencies | Derived upstream warehouse tables inherited from the Airflow DAG or task that defines this output table |
When the lineage global processor reprocesses a previous Airflow-defined output and sees that Airflow no longer defines it, stale Airflow-owned output lineage is cleared and final table dependencies are recomputed.
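To make the database, schema, and table properties above concrete, here is a hedged sketch of mapping a dataset or asset URI onto the three non-authoritative resources. The URI layout (scheme://host/database/schema/table), the function name, and the restriction to exactly three path segments are assumptions for illustration; the layouts the integration actually accepts are not specified here.

```python
from urllib.parse import urlparse

# Technologies named as examples in the text; the real supported set may differ.
SUPPORTED_TECHNOLOGIES = {"snowflake", "bigquery"}


def warehouse_records(uri: str) -> list:
    """Return database, schema, and table records for a supported URI, else []."""
    parsed = urlparse(uri)
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed layout: <technology>://<host>/<database>/<schema>/<table>
    if parsed.scheme not in SUPPORTED_TECHNOLOGIES or len(parts) != 3:
        return []
    database, schema, table = parts
    common = {
        "database_technology": parsed.scheme,
        "platform_uri_prefix": f"{parsed.scheme}://",
    }
    return [
        {"type": "database", "name": database, **common},
        {"type": "database_schema", "name": schema,
         "database_database": database, **common},
        {"type": "table", "name": table, "database_database": database,
         "database_schema": schema, **common},
    ]
```

URIs that do not match a supported technology, such as an object storage path, produce no warehouse records in this sketch, so the dataset remains only an airflow_dataset resource.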
Relationships
| Source | Relationship | Target | Notes |
| --- | --- | --- | --- |
| airflow_instance | has child | airflow_dag | DAGs are direct children of the Airflow instance |
| airflow_instance | has child | airflow_dataset | Datasets and assets are direct children of the Airflow instance |
| airflow_dag | has child | airflow_task | Tasks are grouped under their parent DAG |
| airflow_dag | has upstream | table, airflow_dataset | Tables and datasets the DAG reads from; derived by the lineage global processor from dataset metadata and operator SQL heuristics |
| airflow_dag | defines | table, airflow_dataset | Tables and datasets the DAG writes to; derived by the lineage global processor |
| airflow_task | has upstream | airflow_task | Upstream tasks within the same DAG, derived from Airflow downstream_task_ids metadata (upstream_task_ids) |
| airflow_task | has upstream | table, airflow_dataset | Tables and datasets the task reads from; derived by the lineage global processor |
| airflow_task | defines | table, airflow_dataset | Tables and datasets the task writes to; derived by the lineage global processor |
| airflow_dataset | defines | table | Only when the dataset or asset URI maps to a supported warehouse table URI (e.g. Snowflake or BigQuery table path) |
| table | has upstream | table | Airflow-defined output tables inherit upstream warehouse table lineage from the defining DAG or task |
| database | has child | database_schema | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets |
| database_schema | has child | table | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets |
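The task-to-task has upstream relationship is described as coming from downstream_task_ids metadata. One plausible reading is that the downstream lists are inverted to obtain each task's upstream tasks; the sketch below shows that inversion. The function name is hypothetical, and the input shape assumes task objects with task_id and downstream_task_ids fields, as returned by the Airflow REST API task listing.

```python
def upstream_edges(tasks: list) -> dict:
    """Invert downstream_task_ids into a mapping of task_id -> sorted upstream task IDs."""
    # Start every known task with an empty upstream list so roots appear too.
    upstream = {t["task_id"]: [] for t in tasks}
    for t in tasks:
        for downstream_id in t.get("downstream_task_ids", []):
            # If A lists B as downstream, then B has A as upstream.
            upstream.setdefault(downstream_id, []).append(t["task_id"])
    return {task_id: sorted(ids) for task_id, ids in upstream.items()}


tasks = [
    {"task_id": "extract", "downstream_task_ids": ["transform"]},
    {"task_id": "transform", "downstream_task_ids": ["load"]},
    {"task_id": "load", "downstream_task_ids": []},
]
edges = upstream_edges(tasks)
```

Here `edges["load"]` contains only `transform`, and `extract` has no upstream tasks, so each has upstream edge stays within the same DAG.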