# Databricks Integration Discovered Resources

The Euno Databricks integration discovers and updates the following resource families:

* Databricks workspace
* Unity Catalog databases (catalogs)
* Unity Catalog schemas
* Databricks tables (including views, materialized views, and external tables)
* Databricks table columns
* Databricks notebooks
* Usage and lineage properties across the resources above

## Discovery inputs

The integration combines multiple Databricks data sources:

* `system.information_schema.*` for metadata
* `system.query.history` for usage and notebook execution evidence
* `system.access.table_lineage` for table lineage evidence
* `system.access.column_lineage` for regular column-level lineage and notebook reporting evidence
* Workspace API for notebook discovery

## Resource hierarchy

The Databricks integration follows this container hierarchy:

```
Databricks Workspace
└── Catalog (Database)
    └── Schema
        ├── Table / View / Materialized View / External Table
        │   └── Column
        └── ...

Databricks Workspace
└── Notebook
```

### Ingested resources and properties

#### Databricks workspace (`databricks_workspace`)

| Property              | Description                                                                                         |
| --------------------- | --------------------------------------------------------------------------------------------------- |
| `name`                | Workspace display name (from configuration; defaults to hostname if **Workspace Name** is not set). |
| `type`                | Always `databricks_workspace`.                                                                      |
| `subtype`             | Always `databricks_workspace`.                                                                      |
| `description`         | Short generated description that includes the workspace hostname.                                   |
| `database_technology` | Always `databricks`.                                                                                |

**URI pattern**

```
platform.databricks.databricks_workspace.<hostname>
```

Example: `platform.databricks.databricks_workspace.dbc-50e7cad0-c2f9.cloud.databricks.com`

#### Unity Catalog database (`database`)

| Property                   | Description                                       |
| -------------------------- | ------------------------------------------------- |
| `name`                     | Catalog name.                                     |
| `type`                     | Always `database`.                                |
| `subtype`                  | Always `database`.                                |
| `parent_container`         | Parent workspace URI.                             |
| `database_technology`      | Always `databricks`.                              |
| `description`              | Catalog comment from Unity Catalog, when present. |
| `native_owners`            | Catalog owner from Unity Catalog metadata.        |
| `created_at`, `updated_at` | Catalog timestamps when available.                |
| `created_by`, `updated_by` | Creator and last modifier when available.         |
| `tags`                     | Tag names with empty or null values.              |
| `meta`                     | Tag key–value pairs when values are non-empty.    |
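
The split between `tags` and `meta` can be illustrated with a short sketch. It assumes tags arrive as a mapping of tag name to an optional value; the helper name `split_tags` and the input shape are hypothetical and are not taken from the integration's code.

```python
from typing import Optional


def split_tags(raw_tags: dict[str, Optional[str]]) -> tuple[list[str], dict[str, str]]:
    """Split tag name/value pairs into Euno-style `tags` and `meta`.

    Hypothetical helper: tags whose value is empty or None become plain
    tag names; tags with a non-empty value become key-value pairs.
    """
    tags: list[str] = []
    meta: dict[str, str] = {}
    for name, value in raw_tags.items():
        if value is None or value == "":
            tags.append(name)
        else:
            meta[name] = value
    return sorted(tags), meta


tags, meta = split_tags({"pii": None, "deprecated": "", "team": "finance"})
print(tags)  # ['deprecated', 'pii']
print(meta)  # {'team': 'finance'}
```

The same split applies to the schema and table resources described below, which use identical `tags` and `meta` rules.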

**URI pattern**

```
databricks.<hostname>.<catalog_name>
```

#### Unity Catalog schema (`database_schema`)

| Property                   | Description                                    |
| -------------------------- | ---------------------------------------------- |
| `name`                     | Schema name.                                   |
| `type`                     | Always `database_schema`.                      |
| `subtype`                  | Always `database_schema`.                      |
| `parent_container`         | Parent catalog URI.                            |
| `database_technology`      | Always `databricks`.                           |
| `database`                 | Catalog name.                                  |
| `description`              | Schema comment when present.                   |
| `native_owners`            | Schema owner when present.                     |
| `created_at`, `updated_at` | Schema timestamps when available.              |
| `created_by`, `updated_by` | Creator and last modifier when available.      |
| `tags`                     | Tag names with empty or null values.           |
| `meta`                     | Tag key–value pairs when values are non-empty. |

**URI pattern**

```
databricks.<hostname>.<catalog_name>.<schema_name>
```

#### Databricks table (`table`)

Databricks table resources use these subtypes: `base_table`, `external_table`, `view`, `materialized_view`.

| Property                        | Description                                                                                                                                                                                                                                                                       |
| ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                          | Table or view name.                                                                                                                                                                                                                                                               |
| `type`                          | Always `table`.                                                                                                                                                                                                                                                                   |
| `subtype`                       | One of the Databricks subtypes above.                                                                                                                                                                                                                                             |
| `parent_container`              | Parent schema URI.                                                                                                                                                                                                                                                                |
| `database`                      | Catalog name.                                                                                                                                                                                                                                                                     |
| `database_schema`               | Schema name.                                                                                                                                                                                                                                                                      |
| `database_technology`           | Always `databricks`.                                                                                                                                                                                                                                                              |
| `description`                   | Table or view comment when present.                                                                                                                                                                                                                                               |
| `native_owners`                 | Owner when present.                                                                                                                                                                                                                                                               |
| `created_at`, `updated_at`      | Object timestamps when available.                                                                                                                                                                                                                                                 |
| `created_by`, `updated_by`      | Creator and last modifier when available.                                                                                                                                                                                                                                         |
| `tags`                          | Tag names with empty or null values.                                                                                                                                                                                                                                              |
| `meta`                          | Tag key–value pairs when values are non-empty.                                                                                                                                                                                                                                    |
| `sql_dialect`                   | SQL dialect (`databricks`).                                                                                                                                                                                                                                                       |
| `platform_uri_prefix`           | URI prefix for this workspace (`databricks.<hostname>`).                                                                                                                                                                                                                          |
| `table_properties.materialized` | Whether the object is materialized, derived from Databricks table type and view metadata.                                                                                                                                                                                         |
| `table_schema`                  | Structured column list for observed columns.                                                                                                                                                                                                                                      |
| `table_dependencies`            | Upstream tables and views for lineage and dependencies. Values combine warehouse lineage from `system.access.table_lineage`, dependencies parsed from view SQL when applicable, and other merged inputs; Euno exposes the result as this resource’s upstream table relationships. |

**URI pattern**

```
databricks.<hostname>.<catalog_name>.<schema_name>.<table_name>
```

**System exclusions**

The crawler skips catalog discovery for `information_schema` and `hive_metastore`. Metadata queries for tables, schemas, and related objects exclude the `system` and `samples` catalogs and the `information_schema` schema.
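
As a rough illustration, the exclusion rules can be expressed as two predicates. The catalog and schema names come from the paragraph above; the function names and the case-insensitive comparison are assumptions made for this sketch.

```python
# Names taken from the exclusion rules above; helper names are hypothetical.
SKIPPED_DISCOVERY_CATALOGS = {"information_schema", "hive_metastore"}
EXCLUDED_METADATA_CATALOGS = {"system", "samples"}
EXCLUDED_SCHEMAS = {"information_schema"}


def is_catalog_discovered(catalog: str) -> bool:
    """True when the crawler would discover this catalog."""
    return catalog.lower() not in SKIPPED_DISCOVERY_CATALOGS


def is_object_in_scope(catalog: str, schema: str) -> bool:
    """True when metadata queries would include objects in catalog.schema."""
    return (
        is_catalog_discovered(catalog)
        and catalog.lower() not in EXCLUDED_METADATA_CATALOGS
        and schema.lower() not in EXCLUDED_SCHEMAS
    )


print(is_object_in_scope("main", "sales"))               # True
print(is_object_in_scope("system", "access"))            # False
print(is_object_in_scope("main", "information_schema"))  # False
```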

#### Table usage metrics

Usage windows are emitted for `14d`, `30d`, and `60d` intervals.

#### Read usage properties

Applicable to table resources (`base_table`, `external_table`, `view`, `materialized_view`).

| Property family                                                                                      | Description                                             |
| ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| `total_read_queries_14d`, `total_read_queries_30d`, `total_read_queries_60d`                         | Number of read queries.                                 |
| `total_read_runtime_14d`, `total_read_runtime_30d`, `total_read_runtime_60d`                         | Total active compute runtime (ms) for read queries.     |
| `total_read_bytes_processed_14d`, `total_read_bytes_processed_30d`, `total_read_bytes_processed_60d` | Total read bytes.                                       |
| `distinct_users_14d`, `distinct_users_30d`, `distinct_users_60d`                                     | Distinct users running read queries.                    |
| `total_read_dbu_14d`, `total_read_dbu_30d`, `total_read_dbu_60d`                                     | Total Databricks read DBU-seconds (compute-time proxy). |
| `average_read_dbu_14d`, `average_read_dbu_30d`, `average_read_dbu_60d`                               | Average read DBU-seconds per read query.                |

Read DBU values are derived from query-history active compute time.
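
The property names follow a `<metric>_<window>` pattern, which the sketch below enumerates. The averaging helper is an assumption based on the description "average read DBU-seconds per read query"; returning `0.0` when there are no read queries is a choice made for this example, not documented behavior.

```python
WINDOWS = ("14d", "30d", "60d")
READ_METRICS = (
    "total_read_queries",
    "total_read_runtime",
    "total_read_bytes_processed",
    "distinct_users",
    "total_read_dbu",
    "average_read_dbu",
)


def read_usage_property_names() -> list[str]:
    """List every read usage property name, metric by metric."""
    return [f"{metric}_{window}" for metric in READ_METRICS for window in WINDOWS]


def average_read_dbu(total_read_dbu: float, total_read_queries: int) -> float:
    """Average DBU-seconds per read query (assumed: 0.0 when no queries ran)."""
    if total_read_queries == 0:
        return 0.0
    return total_read_dbu / total_read_queries


print(read_usage_property_names()[:3])
# ['total_read_queries_14d', 'total_read_queries_30d', 'total_read_queries_60d']
print(average_read_dbu(120.0, 4))  # 30.0
```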

#### Write usage properties

Write applicability follows Databricks `table_type` classification:

| Observed subtype    | Databricks `table_type` values                         | Write usage applicable? | Behavior                                                                              |
| ------------------- | ------------------------------------------------------ | ----------------------- | ------------------------------------------------------------------------------------- |
| `view`              | `VIEW`                                                 | No                      | Write metrics are emitted as zeros each crawl to clear stale values.                  |
| `materialized_view` | `MATERIALIZED_VIEW`                                    | Yes                     | Write metrics from write statements; zero-filled when there is no in-window activity. |
| `base_table`        | `MANAGED`, `STREAMING_TABLE`, `MANAGED_SHALLOW_CLONE`  | Yes                     | Same as materialized view.                                                            |
| `external_table`    | `EXTERNAL`, `EXTERNAL_SHALLOW_CLONE`, `EXTERNAL_TABLE` | Yes                     | Same as materialized view.                                                            |
| `external_table`    | `FOREIGN`                                              | No                      | Write metrics are not applicable.                                                     |
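
The classification table above can be read as a lookup from `table_type` to subtype and write applicability. The sketch below transcribes it directly; the names `TableClassification` and `classify_table_type` are hypothetical, and returning `None` for an unlisted `table_type` is an assumption, since the behavior for other values is not described here.

```python
from typing import NamedTuple, Optional


class TableClassification(NamedTuple):
    subtype: str
    write_usage_applicable: bool


# Mapping transcribed from the table above.
TABLE_TYPE_RULES = {
    "VIEW": TableClassification("view", False),
    "MATERIALIZED_VIEW": TableClassification("materialized_view", True),
    "MANAGED": TableClassification("base_table", True),
    "STREAMING_TABLE": TableClassification("base_table", True),
    "MANAGED_SHALLOW_CLONE": TableClassification("base_table", True),
    "EXTERNAL": TableClassification("external_table", True),
    "EXTERNAL_SHALLOW_CLONE": TableClassification("external_table", True),
    "EXTERNAL_TABLE": TableClassification("external_table", True),
    "FOREIGN": TableClassification("external_table", False),
}


def classify_table_type(table_type: str) -> Optional[TableClassification]:
    """Return the classification, or None for a table_type not listed above."""
    return TABLE_TYPE_RULES.get(table_type.upper())


print(classify_table_type("STREAMING_TABLE"))
# TableClassification(subtype='base_table', write_usage_applicable=True)
print(classify_table_type("FOREIGN").write_usage_applicable)  # False
```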

| Property family                                                                                         | Description                                          |
| ------------------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
| `total_write_queries_14d`, `total_write_queries_30d`, `total_write_queries_60d`                         | Number of write queries.                             |
| `total_write_runtime_14d`, `total_write_runtime_30d`, `total_write_runtime_60d`                         | Total active compute runtime (ms) for write queries. |
| `total_write_bytes_processed_14d`, `total_write_bytes_processed_30d`, `total_write_bytes_processed_60d` | Total bytes processed by write queries.              |

#### Databricks column (`column`)

| Property               | Description                                                                                                                                                                                                    |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                 | Column name.                                                                                                                                                                                                   |
| `type`                 | Always `column`.                                                                                                                                                                                               |
| `subtype`              | Always `column`.                                                                                                                                                                                               |
| `description`          | Column comment when present.                                                                                                                                                                                   |
| `parent_container`     | Parent table URI.                                                                                                                                                                                              |
| `database`             | Catalog name.                                                                                                                                                                                                  |
| `database_schema`      | Schema name.                                                                                                                                                                                                   |
| `database_technology`  | Always `databricks`.                                                                                                                                                                                           |
| `native_data_type`     | Native Databricks data type.                                                                                                                                                                                   |
| `normalized_data_type` | Euno-normalized data type.                                                                                                                                                                                     |
| `upstream_fields`      | Upstream column URIs for lineage. Values combine warehouse column lineage from `system.access.column_lineage` with other merged inputs; Euno exposes the result as this column’s upstream field relationships. |

**URI pattern**

```
databricks.<hostname>.<catalog_name>.<schema_name>.<table_name>.column.<column_name>
```
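
The warehouse URI patterns nest: each level appends one segment to its parent's URI, and columns add a literal `column` segment. A minimal sketch of that composition follows. The function names and the sample catalog, schema, table, and column names are hypothetical, and the sketch assumes names are inserted as-is, without escaping or case normalization.

```python
def catalog_uri(hostname: str, catalog: str) -> str:
    return f"databricks.{hostname}.{catalog}"


def schema_uri(hostname: str, catalog: str, schema: str) -> str:
    return f"{catalog_uri(hostname, catalog)}.{schema}"


def table_uri(hostname: str, catalog: str, schema: str, table: str) -> str:
    return f"{schema_uri(hostname, catalog, schema)}.{table}"


def column_uri(hostname: str, catalog: str, schema: str, table: str, column: str) -> str:
    # Columns insert a literal "column" segment before the column name.
    return f"{table_uri(hostname, catalog, schema, table)}.column.{column}"


host = "dbc-50e7cad0-c2f9.cloud.databricks.com"
print(column_uri(host, "main", "sales", "orders", "order_id"))
# databricks.dbc-50e7cad0-c2f9.cloud.databricks.com.main.sales.orders.column.order_id
```

Because each URI extends its parent's, a resource's `parent_container` is the URI one level up: the column above has `table_uri(host, "main", "sales", "orders")` as its parent.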

#### Column usage metrics

Column usage is emitted for `14d`, `30d`, and `60d` windows.

| Property family                                                              | Description                               |
| ---------------------------------------------------------------------------- | ----------------------------------------- |
| `total_read_queries_14d`, `total_read_queries_30d`, `total_read_queries_60d` | Distinct statements that read the column. |
| `distinct_users_14d`, `distinct_users_30d`, `distinct_users_60d`             | Distinct users reading the column.        |
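
Both column metrics are distinct counts, so a statement that touches the same column several times counts once. The sketch below shows that counting over a hypothetical list of read events; the `ColumnRead` record and its fields are assumptions for illustration, and filtering events to a 14-, 30-, or 60-day window is left out for brevity.

```python
from typing import Iterable, NamedTuple


class ColumnRead(NamedTuple):
    statement_id: str
    user: str
    column_uri: str


def column_usage(reads: Iterable[ColumnRead], column_uri: str) -> dict[str, int]:
    """Count distinct statements and distinct users that read one column."""
    statements = set()
    users = set()
    for read in reads:
        if read.column_uri == column_uri:
            statements.add(read.statement_id)
            users.add(read.user)
    return {"total_read_queries": len(statements), "distinct_users": len(users)}


reads = [
    ColumnRead("stmt-1", "ana", "orders.column.order_id"),
    ColumnRead("stmt-1", "ana", "orders.column.order_id"),  # same statement, counted once
    ColumnRead("stmt-2", "ben", "orders.column.order_id"),
    ColumnRead("stmt-3", "ana", "orders.column.order_status"),
]
print(column_usage(reads, "orders.column.order_id"))
# {'total_read_queries': 2, 'distinct_users': 2}
```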

#### Databricks notebook (`databricks_notebook`)

| Property                   | Description                                                                                                                                        |
| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                     | Notebook display name (path basename).                                                                                                             |
| `type`                     | Always `databricks_notebook`.                                                                                                                      |
| `subtype`                  | Always `databricks_notebook`.                                                                                                                      |
| `native_id`                | Databricks notebook object ID.                                                                                                                     |
| `parent_container`         | Parent workspace URI.                                                                                                                              |
| `created_at`, `updated_at` | Notebook object timestamps from the Workspace API.                                                                                                 |
| `description`              | Notebook description when the API provides it.                                                                                                     |
| `native_last_data_update`  | Latest observed execution time from query history evidence.                                                                                        |
| `defines`                  | Table URIs the notebook is observed to define or update, based on table-lineage evidence.                                                          |
| `table_dependencies`       | Table URIs the notebook reads from (`system.access.table_lineage` with notebook context); stored on the notebook.                                  |
| `upstream_fields`          | Column URIs referenced in notebook column-lineage evidence for reporting-style relationships; stored on the **notebook** (not on the column rows). |

**URI pattern**

```
platform.databricks.databricks_workspace.<hostname>.notebook.<notebook_object_id>
```
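
Workspace and notebook URIs use the `platform.databricks` prefix rather than the `databricks.<hostname>` prefix used by catalogs, schemas, tables, and columns. A small sketch of the two patterns follows; the function names and the sample object ID are hypothetical.

```python
def workspace_uri(hostname: str) -> str:
    return f"platform.databricks.databricks_workspace.{hostname}"


def notebook_uri(hostname: str, notebook_object_id: object) -> str:
    # The notebook URI extends its parent workspace URI.
    return f"{workspace_uri(hostname)}.notebook.{notebook_object_id}"


host = "dbc-50e7cad0-c2f9.cloud.databricks.com"
print(workspace_uri(host))
# platform.databricks.databricks_workspace.dbc-50e7cad0-c2f9.cloud.databricks.com
print(notebook_uri(host, 1234567890))
# platform.databricks.databricks_workspace.dbc-50e7cad0-c2f9.cloud.databricks.com.notebook.1234567890
```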

#### Statement-level column dependencies for views

For Databricks views, Euno also captures statement-level column dependencies and records them in `upstream_fields` at the **table level**, that is, on the view resource itself.

As a result, a view can show upstream field relationships to columns that appear only in SQL logic (for example, filters), in addition to columns in the `SELECT` list.

Included SQL logic clauses:

* `JOIN` conditions
* `WHERE`
* `GROUP BY`
* `HAVING`
* `QUALIFY`
* `ORDER BY`

Example: for `SELECT order_id FROM orders_raw WHERE order_status = 'COMPLETE'`, the view can relate upstream to `orders_raw.column.order_status` even when `order_status` is not projected.
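
To make the example concrete, the sketch below unions the columns referenced by each clause into a single list of column URIs. It assumes a SQL parser has already reported the columns per clause; the function name, the input shape, the placeholder hostname `example-host`, and the simplification that all source tables live in one schema are assumptions for illustration only.

```python
from typing import Iterable, Mapping, Tuple

# The SELECT list plus the SQL logic clauses listed above.
TRACKED_CLAUSES = ("SELECT", "JOIN", "WHERE", "GROUP BY", "HAVING", "QUALIFY", "ORDER BY")


def view_upstream_fields(
    schema_uri: str,
    columns_by_clause: Mapping[str, Iterable[Tuple[str, str]]],
) -> list[str]:
    """Union the columns referenced by each tracked clause into column URIs.

    `columns_by_clause` maps a clause name to (table_name, column_name) pairs,
    as a SQL parser might report them.
    """
    uris = set()
    for clause in TRACKED_CLAUSES:
        for table_name, column_name in columns_by_clause.get(clause, ()):
            uris.add(f"{schema_uri}.{table_name}.column.{column_name}")
    return sorted(uris)


# Columns for: SELECT order_id FROM orders_raw WHERE order_status = 'COMPLETE'
fields = view_upstream_fields(
    "databricks.example-host.main.sales",
    {
        "SELECT": [("orders_raw", "order_id")],
        "WHERE": [("orders_raw", "order_status")],
    },
)
for uri in fields:
    print(uri)
# databricks.example-host.main.sales.orders_raw.column.order_id
# databricks.example-host.main.sales.orders_raw.column.order_status
```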

## Relationships

| Source type(s)        | Relationship         | Target type(s)         | Notes                                                                                   |
| --------------------- | -------------------- | ---------------------- | --------------------------------------------------------------------------------------- |
| `database`            | has parent           | `databricks_workspace` | `parent_container` on the catalog.                                                      |
| `database_schema`     | has parent           | `database`             | `parent_container` on the schema.                                                       |
| `table`               | has parent           | `database_schema`      | `parent_container` on the table.                                                        |
| `column`              | has parent           | `table`                | `parent_container` on the column.                                                       |
| `databricks_notebook` | has parent           | `databricks_workspace` | `parent_container` on the notebook.                                                     |
| `table`               | has upstream         | `table`                | Upstream warehouse tables and views (merged `table_dependencies` / lineage).            |
| `databricks_notebook` | has upstream         | `table`                | Tables the notebook reads (`table_dependencies` on the notebook).                       |
| `column`              | has upstream\_fields | `column`               | Column-to-column lineage (merged `upstream_fields`).                                    |
| `databricks_notebook` | has upstream\_fields | `column`               | Notebook reporting lineage: the notebook lists source column URIs in `upstream_fields`. |
| `databricks_notebook` | defines              | `table`                | Tables the notebook is observed to define or update (`defines` on the notebook).        |
| `table`               | has definer          | `databricks_notebook`  | Inverse of **defines** when the table is tied to a defining notebook.                   |
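
The last two rows describe the same edge from both ends. One way to picture this is to invert the notebook-side `defines` lists into a table-side lookup, as in the sketch below. The function name and the shortened URIs are hypothetical, and the sketch says nothing about how Euno stores the inverse relationship internally.

```python
from collections import defaultdict


def invert_defines(defines_by_notebook: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each table URI to the notebook URIs that define or update it."""
    definers: dict[str, list[str]] = defaultdict(list)
    for notebook_uri, table_uris in defines_by_notebook.items():
        for table_uri in table_uris:
            definers[table_uri].append(notebook_uri)
    return {table: sorted(notebooks) for table, notebooks in definers.items()}


defines = {
    "workspace.notebook.101": ["main.sales.orders", "main.sales.customers"],
    "workspace.notebook.202": ["main.sales.orders"],
}
print(invert_defines(defines)["main.sales.orders"])
# ['workspace.notebook.101', 'workspace.notebook.202']
```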

## Notes and caveats

* Notebook execution evidence and notebook-derived relationships use `system.query.history`, `system.access.table_lineage`, and `system.access.column_lineage` with a 30-day lookback window.
* Regular Databricks column lineage reads from `system.access.column_lineage` with a 30-day lookback; graph relationships use the merged column upstream field model described above.
* Table and column usage metrics use `14d`, `30d`, and `60d` windows.
* Usage and lineage relationships are emitted only for resources observed in the current integration scope.
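
The 30-day lookback amounts to a cutoff on event time. A minimal sketch follows, assuming timezone-aware timestamps and inclusive boundaries; the names `LINEAGE_LOOKBACK` and `within_lookback` are hypothetical, and exact boundary handling in the integration is not documented here.

```python
from datetime import datetime, timedelta, timezone

LINEAGE_LOOKBACK = timedelta(days=30)


def within_lookback(
    event_time: datetime,
    now: datetime,
    lookback: timedelta = LINEAGE_LOOKBACK,
) -> bool:
    """True when event_time falls inside the lookback window ending at now."""
    return now - lookback <= event_time <= now


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(within_lookback(datetime(2024, 6, 10, tzinfo=timezone.utc), now))  # True
print(within_lookback(datetime(2024, 5, 1, tzinfo=timezone.utc), now))   # False
```

Lineage events older than the cutoff would not contribute to notebook relationships or column lineage, whereas usage metrics are computed separately over their own 14-, 30-, and 60-day windows.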


