# Airflow Integration Discovered Resources

### Ingested Resources & Properties

#### Airflow Instance (`airflow_instance`)

The top-level resource representing a single Airflow deployment. All DAGs and datasets discovered from one webserver are children of this instance.

| Property              | Description                                                                                                               |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `name`                | Hostname extracted from the Airflow webserver URL                                                                         |
| `native_id`           | Same as `name`; the hostname used as the stable identifier                                                                |
| `subtype`             | Always `airflow_instance`                                                                                                 |
| `airflow_base_url`    | Canonical Airflow webserver URL                                                                                           |
| `airflow_api_version` | Detected REST API version (`v1` for Airflow 2, `v2` for Airflow 3)                                                        |
| `airflow_version`     | Airflow version string reported by the health endpoint                                                                    |
| `airflow_connections` | List of Airflow connections collected for lineage resolution (only present when **Resolve warehouse lineage** is enabled) |
| `is_active`           | `true` when all Airflow health components report a healthy status                                                         |
| `external_links`      | Link to the Airflow UI                                                                                                    |
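
As a rough illustration of how `name`, `native_id`, `airflow_api_version`, and `is_active` relate to the webserver URL and the health payload, the following Python sketch derives them from sample inputs. The helper names, the version-string mapping, and the shape of the `health` dictionary (component name mapped to a `status` field) are assumptions made for this example, not the integration's actual code.

```python
from urllib.parse import urlparse


def instance_name(base_url: str) -> str:
    """Return the hostname of the webserver URL (used for `name` and `native_id`)."""
    hostname = urlparse(base_url).hostname
    if hostname is None:
        raise ValueError(f"no hostname in URL: {base_url!r}")
    return hostname


def api_version(airflow_version: str) -> str:
    """Assumed mapping from Airflow major version to REST API version (2 -> v1, 3 -> v2)."""
    major = int(airflow_version.split(".")[0])
    return "v2" if major >= 3 else "v1"


def is_instance_active(health: dict) -> bool:
    """True only when every reported health component has status 'healthy'."""
    return bool(health) and all(
        component.get("status") == "healthy" for component in health.values()
    )


# Hypothetical health payload with two components.
health = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "healthy"},
}
print(instance_name("https://airflow.example.com:8080/home"))  # airflow.example.com
print(api_version("3.0.1"))                                    # v2
print(is_instance_active(health))                              # True
```

Using the hostname alone keeps the identifier stable when the URL path or port changes, which is one plausible reason for treating it as `native_id`.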

#### DAG (`airflow_dag`)

An Airflow DAG observed from the Airflow REST API.

| Property                         | Description                                                                                                                           |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                           | DAG display name, or the DAG ID when no display name is set                                                                           |
| `native_id`                      | DAG ID                                                                                                                                |
| `subtype`                        | Always `airflow_dag`                                                                                                                  |
| `is_active`                      | `true` when the Airflow scheduler still sees the DAG file; independent of `is_paused`. For Airflow 3, this is derived from `is_stale` |
| `description`                    | DAG description (when available)                                                                                                      |
| `source_path`                    | Relative or absolute path to the DAG file                                                                                             |
| `tags`                           | List of DAG tags                                                                                                                      |
| `native_owners`                  | List of DAG owners; each entry contains `name`, `display_name`, and `email`                                                           |
| `external_links`                 | Link to the DAG in the Airflow UI                                                                                                     |
| `airflow_last_run_pass`          | `true` if the most recent DAG run succeeded, `false` if it failed, `null` if unknown                                                  |
| `airflow_last_run_id`            | Run ID of the most recent DAG run                                                                                                     |
| `airflow_last_run_duration`      | Duration of the most recent DAG run in seconds                                                                                        |
| `airflow_last_run_failing_tasks` | List of task IDs that failed in the most recent run                                                                                   |
| `airflow_last_successful_run`    | Timestamp of the most recent successful run                                                                                           |
| `airflow_failure_streak`         | Number of consecutive failed runs (resets to 0 on success)                                                                            |
| `airflow_total_runs_14d`         | Total DAG runs in the last 14 days                                                                                                    |
| `airflow_total_runs_30d`         | Total DAG runs in the last 30 days                                                                                                    |
| `airflow_total_runs_60d`         | Total DAG runs in the last 60 days                                                                                                    |
| `airflow_success_rate_14d`       | Fraction of successful runs (0–1) over the last 14 days                                                                               |
| `airflow_success_rate_30d`       | Fraction of successful runs (0–1) over the last 30 days                                                                               |
| `airflow_success_rate_60d`       | Fraction of successful runs (0–1) over the last 60 days                                                                               |
| `airflow_avg_duration_14d`       | Average DAG run duration in seconds over the last 14 days                                                                             |
| `airflow_avg_duration_30d`       | Average DAG run duration in seconds over the last 30 days                                                                             |
| `airflow_avg_duration_60d`       | Average DAG run duration in seconds over the last 60 days                                                                             |

{% hint style="info" %}
Execution metrics (`airflow_last_run_*`, `airflow_total_runs_*`, `airflow_success_rate_*`, `airflow_avg_duration_*`, `airflow_failure_streak`) are derived from stored DAG run events and require **Observe execution history** to be enabled.
{% endhint %}
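
To make the windowed metrics concrete, here is a minimal Python sketch that computes a run count, success rate, average duration, and failure streak from a list of run records. The `DagRun` shape, the helper names, and the choice to window by run end time are assumptions for illustration; the stored event format used by the integration is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DagRun:
    run_id: str
    state: str       # assumed terminal states: "success" or "failed"
    start: datetime
    end: datetime


def in_window(runs: List[DagRun], now: datetime, days: int) -> List[DagRun]:
    """Runs that ended within the last `days` days (windowing by end time is an assumption)."""
    cutoff = now - timedelta(days=days)
    return [r for r in runs if r.end >= cutoff]


def success_rate(runs: List[DagRun], now: datetime, days: int) -> Optional[float]:
    """Fraction (0-1) of successful runs in the window, or None when there are no runs."""
    window = in_window(runs, now, days)
    if not window:
        return None
    return sum(r.state == "success" for r in window) / len(window)


def avg_duration(runs: List[DagRun], now: datetime, days: int) -> Optional[float]:
    """Mean run duration in seconds over the window, or None when there are no runs."""
    window = in_window(runs, now, days)
    if not window:
        return None
    return sum((r.end - r.start).total_seconds() for r in window) / len(window)


def failure_streak(runs: List[DagRun]) -> int:
    """Count consecutive failed runs, newest first, stopping at the first non-failure."""
    streak = 0
    for run in sorted(runs, key=lambda r: r.end, reverse=True):
        if run.state != "failed":
            break
        streak += 1
    return streak


now = datetime(2024, 1, 31, tzinfo=timezone.utc)


def make_run(run_id: str, state: str, days_ago: int, seconds: int) -> DagRun:
    start = now - timedelta(days=days_ago)
    return DagRun(run_id, state, start, start + timedelta(seconds=seconds))


runs = [
    make_run("r1", "success", 20, 100),
    make_run("r2", "failed", 10, 200),
    make_run("r3", "failed", 1, 300),
]

print(len(in_window(runs, now, 30)))  # 3
print(success_rate(runs, now, 14))    # 0.0
print(avg_duration(runs, now, 14))    # 250.0
print(failure_streak(runs))           # 2
```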

#### Task (`airflow_task`)

An individual task within an Airflow DAG.

| Property                        | Description                                                                       |
| ------------------------------- | --------------------------------------------------------------------------------- |
| `name`                          | Task display name, or the task ID when no display name is set                     |
| `native_id`                     | Task ID                                                                           |
| `subtype`                       | Operator class name (e.g. `BashOperator`, `PythonOperator`, `SnowflakeOperator`)  |
| `parent_container_name`         | Display name of the parent DAG                                                    |
| `airflow_task_avg_duration_14d` | Average task instance duration in seconds over the last 14 days                   |
| `airflow_task_avg_duration_30d` | Average task instance duration in seconds over the last 30 days                   |
| `airflow_task_retry_rate_14d`   | Fraction of task instances that required at least one retry over the last 14 days |
| `airflow_task_retry_rate_30d`   | Fraction of task instances that required at least one retry over the last 30 days |

{% hint style="info" %}
Task execution metrics (`airflow_task_avg_duration_*`, `airflow_task_retry_rate_*`) are derived from stored task instance events and require **Observe execution history** to be enabled.
{% endhint %}
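
The task-level metrics can be sketched the same way. The example below assumes each task instance record carries a `try_number` (total attempts, so `1` means no retry was needed) and a duration in seconds, and that the list has already been filtered to the 14- or 30-day window. These field names and semantics are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskInstance:
    task_id: str
    try_number: int   # assumed: total attempts; 1 means no retry was needed
    duration: float   # seconds


def retry_rate(instances: List[TaskInstance]) -> Optional[float]:
    """Fraction of task instances that required at least one retry."""
    if not instances:
        return None
    retried = sum(1 for ti in instances if ti.try_number > 1)
    return retried / len(instances)


def avg_task_duration(instances: List[TaskInstance]) -> Optional[float]:
    """Mean task instance duration in seconds."""
    if not instances:
        return None
    return sum(ti.duration for ti in instances) / len(instances)


# Hypothetical instances of one task, already limited to the window of interest.
instances = [
    TaskInstance("load", 1, 30.0),
    TaskInstance("load", 3, 90.0),
    TaskInstance("load", 1, 30.0),
    TaskInstance("load", 2, 50.0),
]
print(retry_rate(instances))         # 0.5
print(avg_task_duration(instances))  # 50.0
```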

#### Dataset / Asset (`airflow_dataset`)

An Airflow dataset (Airflow 2) or asset (Airflow 3) observed from the Airflow REST API. Dataset and asset payloads are kept as raw source evidence for lineage resolution.

| Property    | Description                                                                                                        |
| ----------- | ------------------------------------------------------------------------------------------------------------------ |
| `name`      | Dataset name, or the URI string when no explicit name is set                                                       |
| `native_id` | Dataset URI (Airflow 2) or asset URI (Airflow 3)                                                                   |
| `subtype`   | `dataset` for Airflow 2 resources, `asset` for Airflow 3 resources, or the dataset group name when one is assigned |

{% hint style="info" %}
`airflow_dataset` resources are only discovered when **Observe datasets** is enabled.
{% endhint %}
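
The fallback rules for `name` and `subtype` in the table above can be read as the following Python sketch. The function names and the `group` parameter are hypothetical, and giving the group name precedence over the version-based default is an interpretation of the table, not confirmed behavior.

```python
from typing import Optional


def dataset_name(uri: str, name: Optional[str] = None) -> str:
    """Use the explicit name when set, otherwise fall back to the URI string."""
    return name or uri


def dataset_subtype(api_version: str, group: Optional[str] = None) -> str:
    """Group name when assigned (assumed precedence), else 'asset' for v2 or 'dataset' for v1."""
    if group:
        return group
    return "asset" if api_version == "v2" else "dataset"


print(dataset_name("s3://bucket/orders.parquet"))  # s3://bucket/orders.parquet
print(dataset_subtype("v1"))                       # dataset
print(dataset_subtype("v2", group="finance"))      # finance
```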

#### Database (`database`) — non-authoritative

A warehouse database inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. These observations are non-authoritative and are enriched or overridden by the primary warehouse integration when one is configured.

| Property              | Description                                                    |
| --------------------- | -------------------------------------------------------------- |
| `name`                | Database name from the dataset URI or SQL/operator target      |
| `database_technology` | Warehouse technology (e.g. `snowflake`, `bigquery`)            |
| `platform_uri_prefix` | Warehouse platform prefix (e.g. `snowflake://`, `bigquery://`) |

#### Schema (`database_schema`) — non-authoritative

A warehouse schema inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target.

| Property              | Description                                             |
| --------------------- | ------------------------------------------------------- |
| `name`                | Schema name from the dataset URI or SQL/operator target |
| `database_database`   | Database name used for warehouse hierarchy matching     |
| `database_technology` | Warehouse technology (e.g. `snowflake`, `bigquery`)     |
| `platform_uri_prefix` | Warehouse platform prefix                               |

#### Table (`table`) — non-authoritative

A warehouse table inferred from an Airflow dataset/asset URI or from a supported SQL/operator input or output target. Temporary tables created inside operator SQL are skipped. These observations are non-authoritative; final lineage and property resolution is handled by global processors.

| Property              | Description                                                                                             |
| --------------------- | ------------------------------------------------------------------------------------------------------- |
| `name`                | Table name from the dataset URI or SQL/operator target                                                  |
| `database_database`   | Database name                                                                                           |
| `database_schema`     | Schema name                                                                                             |
| `database_technology` | Warehouse technology                                                                                    |
| `defined_by`          | Set to `airflow` when the table is currently defined by an Airflow DAG or task                          |
| `platform_uri_prefix` | Warehouse platform prefix                                                                               |
| `table_dependencies`  | Derived upstream warehouse tables inherited from the Airflow DAG or task that defines this output table |

{% hint style="info" %}
When the lineage global processor reprocesses an output that Airflow previously defined and finds that Airflow no longer defines it, the stale Airflow-owned output lineage is cleared and the final table dependencies are recomputed.
{% endhint %}
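
As an illustration of how a dataset or asset URI could yield the non-authoritative `database`, `database_schema`, and `table` properties, the Python sketch below splits a URI into those parts. The URI layout `<technology>://<account>/<database>/<schema>/<table>` is a hypothetical convention chosen for the example; the layouts the integration actually recognizes are not specified on this page.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# The two technologies this page names as examples.
SUPPORTED = {"snowflake", "bigquery"}


@dataclass(frozen=True)
class TableRef:
    database_technology: str
    platform_uri_prefix: str
    database_database: str
    database_schema: str
    name: str


def parse_table_uri(uri: str) -> Optional[TableRef]:
    """Return a TableRef for a supported warehouse URI, or None when it does not match."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED:
        return None
    # Assumed layout: the path holds exactly database/schema/table.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3:
        return None
    database, schema, table = parts
    return TableRef(parsed.scheme, f"{parsed.scheme}://", database, schema, table)


ref = parse_table_uri("snowflake://my_account/ANALYTICS/PUBLIC/ORDERS")
print(ref)
print(parse_table_uri("s3://bucket/key"))  # None
```

Returning `None` for unrecognized URIs mirrors the rule that warehouse resources are emitted only when the URI maps to a supported warehouse table.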

***

### Relationships

| Source type(s)     | Relationship | Target type(s)             | Notes                                                                                                                             |
| ------------------ | ------------ | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `airflow_instance` | has child    | `airflow_dag`              | DAGs are direct children of the Airflow instance                                                                                  |
| `airflow_instance` | has child    | `airflow_dataset`          | Datasets and assets are direct children of the Airflow instance                                                                   |
| `airflow_dag`      | has child    | `airflow_task`             | Tasks are grouped under their parent DAG                                                                                          |
| `airflow_dag`      | has upstream | `table`, `airflow_dataset` | Tables and datasets the DAG reads from; derived by the lineage global processor from dataset metadata and operator SQL heuristics |
| `airflow_dag`      | defines      | `table`, `airflow_dataset` | Tables and datasets the DAG writes to; derived by the lineage global processor                                                    |
| `airflow_task`     | has upstream | `table`, `airflow_dataset` | Tables and datasets the task reads from; derived by the lineage global processor                                                  |
| `airflow_task`     | defines      | `table`, `airflow_dataset` | Tables and datasets the task writes to; derived by the lineage global processor                                                   |
| `airflow_dataset`  | defines      | `table`                    | Only when the dataset or asset URI maps to a supported warehouse table URI (e.g. Snowflake or BigQuery table path)                |
| `table`            | has upstream | `table`                    | Airflow-defined output tables inherit upstream warehouse table lineage from the defining DAG or task                              |
| `database`         | has child    | `database_schema`          | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets                                |
| `database_schema`  | has child    | `table`                    | Non-authoritative warehouse hierarchy emitted from dataset URIs and supported SQL/operator targets                                |
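
The `table` has upstream `table` row can be pictured as a small propagation step: every table a DAG or task defines inherits the upstream tables of that DAG or task. The Python sketch below shows that idea on plain dictionaries. The function name, the string identifiers, and the exclusion of self-references are assumptions for illustration and do not describe the lineage global processor's actual implementation.

```python
from typing import Dict, Set


def inherit_table_dependencies(
    upstream: Dict[str, Set[str]],
    defines: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each defined table to the upstream tables of the DAG or task that defines it."""
    deps: Dict[str, Set[str]] = {}
    for definer, outputs in defines.items():
        sources = upstream.get(definer, set())
        for table in outputs:
            # Assumption: a table is never recorded as its own upstream.
            deps.setdefault(table, set()).update(sources - {table})
    return deps


# Hypothetical lineage for one DAG: two input tables, one output table.
upstream = {"dag:daily_orders": {"raw.orders", "raw.customers"}}
defines = {"dag:daily_orders": {"analytics.orders_enriched"}}

deps = inherit_table_dependencies(upstream, defines)
print(sorted(deps["analytics.orders_enriched"]))  # ['raw.customers', 'raw.orders']
```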

