Airflow Integration Discovered Resources

Ingested Resources & Properties

Airflow Instance (airflow_instance)

The top-level resource representing a single Airflow deployment. All DAGs and datasets discovered from one webserver are children of this instance.

| Property | Description |
| --- | --- |
| name | Hostname extracted from the Airflow webserver URL |
| type | Always airflow_instance |
| native_id | Same as name; the hostname used as the stable identifier |
| subtype | Always airflow_instance |
| airflow_base_url | Canonical Airflow webserver URL |
| airflow_api_version | Detected REST API version (v1 for Airflow 2, v2 for Airflow 3) |
| airflow_version | Airflow version string reported by the health endpoint |
| airflow_connections | List of Airflow connections collected for lineage resolution (only present when Resolve warehouse lineage is enabled) |
| is_active | true when all Airflow health components report a healthy status |
| external_links | Link to the Airflow UI |
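
The name, native_id, and is_active properties above can be illustrated with a minimal sketch. The helper names (instance_identity, is_instance_active) and the shape of the health payload are assumptions made for illustration; they are not the integration's actual code. In particular, treating a component that reports no status as unhealthy is an assumption.

```python
from urllib.parse import urlparse


def instance_identity(base_url: str) -> dict:
    """Derive name and native_id from a webserver URL (hypothetical helper)."""
    host = urlparse(base_url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {base_url!r}")
    return {
        "name": host,
        "native_id": host,  # same value as name, used as the stable identifier
        "type": "airflow_instance",
        "subtype": "airflow_instance",
    }


def is_instance_active(health: dict) -> bool:
    """Return True only when every component reports status 'healthy'.

    Assumes a payload shaped like {"scheduler": {"status": "healthy"}, ...}.
    A missing or null status counts as not healthy (an assumption).
    """
    statuses = [c.get("status") for c in health.values() if isinstance(c, dict)]
    return bool(statuses) and all(s == "healthy" for s in statuses)
```

For example, instance_identity("https://airflow.example.com:8080/home") yields the hostname without the port or path, so the identifier stays stable if the UI path changes.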

DAG (airflow_dag)

An Airflow DAG observed from the Airflow REST API.

| Property | Description |
| --- | --- |
| name | DAG display name, or the DAG ID when no display name is set |
| type | Always airflow_dag |
| native_id | DAG ID |
| subtype | Always airflow_dag |
| parent_container | Parent airflow_instance URI |
| is_active | true when the Airflow scheduler still sees the DAG file; independent of is_paused. For Airflow 3, this is derived from is_stale |
| description | DAG description (when available) |
| source_path | Relative or absolute path to the DAG file |
| tags | List of DAG tags |
| native_owners | List of DAG owners; each entry contains name, display_name, and email |
| external_links | Link to the DAG in the Airflow UI |
| airflow_last_run_pass | true if the most recent DAG run succeeded, false if it failed, null if unknown |
| airflow_last_run_id | Run ID of the most recent DAG run |
| airflow_last_run_duration | Duration of the most recent DAG run in seconds |
| airflow_last_run_failing_tasks | List of task IDs that failed in the most recent run |
| airflow_last_successful_run | Timestamp of the most recent successful run |
| airflow_failure_streak | Number of consecutive failed runs (resets to 0 on success) |
| airflow_total_runs_14d | Total DAG runs in the last 14 days |
| airflow_total_runs_30d | Total DAG runs in the last 30 days |
| airflow_total_runs_60d | Total DAG runs in the last 60 days |
| airflow_success_rate_14d | Fraction of successful runs (0–1) over the last 14 days |
| airflow_success_rate_30d | Fraction of successful runs (0–1) over the last 30 days |
| airflow_success_rate_60d | Fraction of successful runs (0–1) over the last 60 days |
| airflow_avg_duration_14d | Average DAG run duration in seconds over the last 14 days |
| airflow_avg_duration_30d | Average DAG run duration in seconds over the last 30 days |
| airflow_avg_duration_60d | Average DAG run duration in seconds over the last 60 days |

Execution metrics (airflow_last_run_*, airflow_total_runs_*, airflow_success_rate_*, airflow_avg_duration_*, airflow_failure_streak) are derived from stored DAG run events and require Observe execution history to be enabled.
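
As a rough illustration of how windowed metrics of this kind could be computed from stored run events, here is a minimal sketch. The event shape (dicts with state, start, and end keys), the function names, and the choice to count only finished runs are assumptions for illustration, not the integration's actual implementation.

```python
from datetime import datetime, timedelta, timezone

FINISHED_STATES = ("success", "failed")


def dag_run_metrics(runs: list[dict], now: datetime, window_days: int) -> dict:
    """Compute total runs, success rate, and average duration for one window.

    Each run is a dict with 'state', 'start', and 'end' (timezone-aware
    datetimes). This shape is hypothetical.
    """
    cutoff = now - timedelta(days=window_days)
    finished = [
        r for r in runs
        if r["state"] in FINISHED_STATES and r["end"] >= cutoff
    ]
    total = len(finished)
    successes = sum(1 for r in finished if r["state"] == "success")
    durations = [(r["end"] - r["start"]).total_seconds() for r in finished]
    return {
        "total_runs": total,
        "success_rate": successes / total if total else None,
        "avg_duration": sum(durations) / total if total else None,
    }


def failure_streak(runs: list[dict]) -> int:
    """Count consecutive failed runs, newest first; stops at the first success."""
    finished = [r for r in runs if r["state"] in FINISHED_STATES]
    streak = 0
    for run in sorted(finished, key=lambda r: r["end"], reverse=True):
        if run["state"] != "failed":
            break
        streak += 1
    return streak
```

Calling dag_run_metrics once each with window_days set to 14, 30, and 60 would yield the three variants of each windowed property.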

Task (airflow_task)

An individual task within an Airflow DAG.

| Property | Description |
| --- | --- |
| name | Task display name, or the task ID when no display name is set |
| type | Always airflow_task |
| native_id | Task ID |
| subtype | Operator class name (e.g. BashOperator, PythonOperator, SnowflakeOperator) |
| parent_container | Parent airflow_dag URI |
| parent_container_name | Display name of the parent DAG |
| airflow_task_avg_duration_14d | Average task instance duration in seconds over the last 14 days |
| airflow_task_avg_duration_30d | Average task instance duration in seconds over the last 30 days |
| airflow_task_retry_rate_14d | Fraction of task instances that required at least one retry over the last 14 days |
| airflow_task_retry_rate_30d | Fraction of task instances that required at least one retry over the last 30 days |

Task execution metrics (airflow_task_avg_duration_*, airflow_task_retry_rate_*) are derived from stored task instance events and require Observe execution history to be enabled.
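
The retry rate and average duration can be sketched in the same spirit. The sketch below assumes each stored task instance event carries duration (seconds, possibly null) and try_number fields, as Airflow's task instance records do, and that a try_number above 1 means at least one retry. The function name and the pre-filtering to a single window are assumptions for illustration.

```python
def task_metrics(instances: list[dict]) -> dict:
    """Average duration and retry rate for task instances in one window.

    `instances` is assumed to be already filtered to the 14- or 30-day window.
    """
    if not instances:
        return {"avg_duration": None, "retry_rate": None}
    # Instances without a recorded duration are left out of the average only.
    durations = [ti["duration"] for ti in instances if ti.get("duration") is not None]
    retried = sum(1 for ti in instances if ti.get("try_number", 1) > 1)
    return {
        "avg_duration": sum(durations) / len(durations) if durations else None,
        "retry_rate": retried / len(instances),
    }
```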

Dataset / Asset (airflow_dataset)

An Airflow dataset (Airflow 2) or asset (Airflow 3) observed from the Airflow REST API. Dataset and asset payloads are kept as raw source evidence for lineage resolution.

| Property | Description |
| --- | --- |
| name | Dataset name, or the URI string when no explicit name is set |
| type | Always airflow_dataset |
| native_id | Dataset URI (Airflow 2) or asset URI (Airflow 3) |
| subtype | dataset for Airflow 2 resources, asset for Airflow 3 resources, or the dataset group name when one is assigned |
| parent_container | Parent airflow_instance URI |

airflow_dataset resources are only discovered when Observe datasets is enabled.
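
The fallback rules for name and subtype can be read as the small sketch below. The function name and its parameters are hypothetical, and the precedence of the group name over the default subtype is an interpretation of the property description rather than confirmed behavior.

```python
from typing import Optional


def dataset_identity(
    uri: str,
    api_version: str,
    name: Optional[str] = None,
    group: Optional[str] = None,
) -> dict:
    """Build the identifying properties of an airflow_dataset resource."""
    # v2 is the REST API version reported for Airflow 3, v1 for Airflow 2.
    default_subtype = "asset" if api_version == "v2" else "dataset"
    return {
        "type": "airflow_dataset",
        "native_id": uri,
        "name": name or uri,  # fall back to the URI string
        "subtype": group or default_subtype,  # assumed: group name wins when set
    }
```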

Database (database), non-authoritative

A warehouse database inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. These observations are non-authoritative and are enriched or overridden by the primary warehouse integration when one is configured.

| Property | Description |
| --- | --- |
| name | Database name extracted from the dataset URI |
| database_technology | Warehouse technology (e.g. snowflake, bigquery) |
| platform_uri_prefix | Warehouse platform prefix (e.g. snowflake://, bigquery://) |
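
To make the inference from a dataset URI concrete, the sketch below splits a URI into database, schema, and table observations. The URI layout it assumes (scheme://database/schema/table) is purely illustrative; the formats the integration actually accepts are not specified here, and parse_table_uri is a hypothetical helper.

```python
from typing import Optional

SUPPORTED_TECHNOLOGIES = {"snowflake", "bigquery"}


def parse_table_uri(uri: str) -> Optional[dict]:
    """Split an assumed scheme://database/schema/table URI into observations.

    Returns None when the scheme is unsupported or the path does not have
    exactly three segments.
    """
    scheme, sep, rest = uri.partition("://")
    tech = scheme.lower()
    if not sep or tech not in SUPPORTED_TECHNOLOGIES:
        return None
    parts = [p for p in rest.split("/") if p]
    if len(parts) != 3:
        return None
    database, schema, table = parts
    common = {"database_technology": tech, "platform_uri_prefix": f"{tech}://"}
    return {
        "database": {"name": database, **common},
        "database_schema": {"name": schema, "database_database": database, **common},
        "table": {
            "name": table,
            "database_database": database,
            "database_schema": schema,
            **common,
        },
    }
```

A URI with an unsupported scheme, such as an object-store path, produces no warehouse observations in this sketch.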

Schema (database_schema), non-authoritative

A warehouse schema inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target.

| Property | Description |
| --- | --- |
| name | Schema name extracted from the dataset URI |
| database_database | Database name used for warehouse hierarchy matching |
| database_technology | Warehouse technology (e.g. snowflake, bigquery) |
| platform_uri_prefix | Warehouse platform prefix |

Table (table), non-authoritative

A warehouse table inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. Temporary tables created inside operator SQL are skipped. These observations are non-authoritative; final lineage and property resolution is handled by global processors.

| Property | Description |
| --- | --- |
| name | Table name extracted from the dataset URI |
| database_database | Database name |
| database_schema | Schema name |
| database_technology | Warehouse technology |
| defined_by | Set to airflow when the table is currently defined by an Airflow DAG or task |
| platform_uri_prefix | Warehouse platform prefix |
| table_dependencies | Derived upstream warehouse tables inherited from the Airflow DAG or task that defines this output table |

When the lineage global processor reprocesses an output that Airflow previously defined and finds that Airflow no longer defines it, the stale Airflow-owned output lineage is cleared and the final table dependencies are recomputed.
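
The inheritance and clean-up behavior described above can be approximated with the following sketch. It is a simplified model under stated assumptions: each defining DAG or task is a dict with reads and defines sets of table identifiers, and a stale output ends up with an empty dependency set. The real processor may merge lineage from other sources; recompute_table_dependencies is a hypothetical name.

```python
def recompute_table_dependencies(
    definers: list[dict],
    previously_defined: set[str],
) -> dict[str, set[str]]:
    """Map each output table to the upstream tables of its defining DAG or task.

    Outputs that Airflow defined before but no longer defines are reset to an
    empty set, modeling the clearing of stale Airflow-owned lineage.
    """
    deps: dict[str, set[str]] = {}
    for definer in definers:
        for output in definer["defines"]:
            # An output inherits everything its definer reads, except itself.
            deps.setdefault(output, set()).update(definer["reads"] - {output})
    for stale in previously_defined - set(deps):
        deps[stale] = set()
    return deps
```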


Relationships

| Source type(s) | Relationship | Target type(s) | Notes |
| --- | --- | --- | --- |
| airflow_instance | has child | airflow_dag | DAGs are direct children of the Airflow instance |
| airflow_instance | has child | airflow_dataset | Datasets and assets are direct children of the Airflow instance |
| airflow_dag | has child | airflow_task | Tasks are grouped under their parent DAG |
| airflow_dag | has upstream | table, airflow_dataset | Tables and datasets the DAG reads from; derived by the lineage global processor from dataset metadata and operator SQL heuristics |
| airflow_dag | defines | table, airflow_dataset | Tables and datasets the DAG writes to; derived by the lineage global processor |
| airflow_task | has upstream | airflow_task | Upstream tasks within the same DAG; upstream_task_ids are derived from Airflow downstream_task_ids metadata |
| airflow_task | has upstream | table, airflow_dataset | Tables and datasets the task reads from; derived by the lineage global processor |
| airflow_task | defines | table, airflow_dataset | Tables and datasets the task writes to; derived by the lineage global processor |
| airflow_dataset | defines | table | Only when the dataset or asset URI maps to a supported warehouse table URI (e.g. Snowflake or BigQuery table path) |
| table | has upstream | table | Airflow-defined output tables inherit upstream warehouse table lineage from the defining DAG or task |
| database | has child | database_schema | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets |
| database_schema | has child | table | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets |
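
The task-to-task relationship relies on Airflow's downstream_task_ids metadata. One plausible way to obtain upstream task IDs from it is to invert the downstream edges, as sketched below. The input shape mirrors task entries carrying task_id and downstream_task_ids; the function name is hypothetical and the inversion is an assumption about how the derivation works.

```python
def upstream_task_ids(tasks: list[dict]) -> dict[str, list[str]]:
    """Invert downstream_task_ids into a task_id -> upstream task IDs map."""
    upstream: dict[str, list[str]] = {t["task_id"]: [] for t in tasks}
    for task in tasks:
        for downstream_id in task.get("downstream_task_ids", []):
            # Each downstream edge A -> B means A is upstream of B.
            upstream.setdefault(downstream_id, []).append(task["task_id"])
    return {task_id: sorted(ids) for task_id, ids in upstream.items()}
```

For a DAG where extract feeds transform and load, and transform also feeds load, the result lists both extract and transform as upstream of load, and nothing upstream of extract.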
