# Databricks

Euno's Databricks integration supports auto-discovery of:

* Databricks workspaces
* Unity Catalog databases (catalogs)
* Unity Catalog schemas
* Unity Catalog tables, views, materialized views, and external tables
* Table and column metadata (including tags)
* Table usage metrics for 14-, 30-, and 60-day windows
* Column usage metrics for 14-, 30-, and 60-day windows
* Query-history-backed column lineage
* Databricks notebooks and notebook-driven relationships

## Setting up Databricks integration

### Overview

Databricks is a unified analytics platform that combines data engineering, data science, and analytics workflows.

In Euno, the Databricks integration combines:

* Unity Catalog metadata (`system.information_schema.*`)
* Databricks system lineage/usage evidence (`system.access.*`, `system.query.history`)
* Workspace notebook discovery (Workspace API)

This allows Euno to observe warehouse resources, usage, lineage, and notebook artifacts in one integration.
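
To make the first of these sources concrete, the sketch below queries `system.information_schema.tables` with the open-source `databricks-sql-connector` package. It only illustrates the kind of metadata the integration reads; it is not Euno's actual implementation, and the hostname, HTTP path, and token are placeholders.

```python
# Illustrative sketch of a Unity Catalog metadata read -- not Euno's code.
# Requires: pip install databricks-sql-connector
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/warehouse-id",              # placeholder
    access_token="<personal-access-token>",                    # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT table_catalog, table_schema, table_name, table_type "
            "FROM system.information_schema.tables LIMIT 10"
        )
        for row in cursor.fetchall():
            print(row)
```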

### Step 1: Prepare your Databricks workspace

#### Ensure Databricks SQL warehouse access

1. Verify that Unity Catalog is enabled.
2. Ensure at least one SQL warehouse is available and running.
3. Note your workspace hostname (for example `dbc-xxxxxxxx-xxxx.cloud.databricks.com`).
4. Note the SQL warehouse HTTP path (for example `/sql/1.0/warehouses/warehouse-id`).
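
If you are unsure which warehouse ID belongs in the HTTP path, one way to enumerate the available warehouses is the SQL Warehouses REST endpoint, as in the sketch below. It assumes you already have a personal access token (created in the following section); hostname and token are placeholders, and the same list is visible in the Databricks UI under **SQL Warehouses**.

```python
# Convenience sketch: list SQL warehouses to find the ID used in the
# HTTP path /sql/1.0/warehouses/<id>.
import requests

HOST = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                # created in the next section

resp = requests.get(
    f"https://{HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for warehouse in resp.json().get("warehouses", []):
    print(warehouse["id"], warehouse["name"], warehouse["state"])
```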

#### Create a Databricks personal access token

1. In Databricks, go to **Settings** -> **Developer** -> **Access tokens**.
2. Click **Generate new token**.
3. Set a comment (for example `Euno integration`).
4. Set token lifetime as needed.
5. Click **Generate**.
6. Copy and store the token securely.
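
An optional way to confirm the token authenticates before configuring Euno is to call the SCIM `Me` endpoint, which returns the identity the token belongs to. Hostname and token below are placeholders.

```python
# Optional sanity check: the SCIM "Me" endpoint returns the identity
# that owns the token. A 200 response means the token authenticates.
import requests

HOST = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                # placeholder

resp = requests.get(
    f"https://{HOST}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("userName"))
```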

**Required permissions**

The identity that owns the token should have:

* Access to the Databricks workspace
* Permission to use the configured SQL warehouse
* `SELECT` access to required system tables/views
* Access to catalogs/schemas/tables you want to observe
* Workspace API visibility to notebook paths you want to observe
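
One possible way to provision the catalog and system-table access above is with Unity Catalog `GRANT` statements, sketched below. The `<principal>`, `<catalog>`, and connection values are placeholders, the relevant system schemas must already be enabled in your account, and your organization may manage these grants differently.

```python
# Hedged example: one possible set of Unity Catalog grants covering the
# bullets above. Run with an identity allowed to grant these privileges.
from databricks import sql

GRANTS = [
    # System tables used for metadata, lineage, and usage evidence
    "GRANT USE CATALOG ON CATALOG system TO `<principal>`",
    "GRANT USE SCHEMA ON SCHEMA system.access TO `<principal>`",
    "GRANT SELECT ON SCHEMA system.access TO `<principal>`",
    "GRANT USE SCHEMA ON SCHEMA system.query TO `<principal>`",
    "GRANT SELECT ON SCHEMA system.query TO `<principal>`",
    # A catalog you want Euno to observe
    "GRANT USE CATALOG ON CATALOG <catalog> TO `<principal>`",
    "GRANT USE SCHEMA ON CATALOG <catalog> TO `<principal>`",
    "GRANT SELECT ON CATALOG <catalog> TO `<principal>`",
]

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/warehouse-id",              # placeholder
    access_token="<admin-access-token>",                       # placeholder
) as connection:
    with connection.cursor() as cursor:
        for statement in GRANTS:
            cursor.execute(statement)
```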

### Step 2: Configure the Databricks integration in Euno

| Configuration   | Description                                                                          |
| --------------- | ------------------------------------------------------------------------------------ |
| Server Hostname | Databricks workspace hostname (for example `dbc-xxxxxxxx-xxxx.cloud.databricks.com`) |
| HTTP Path       | Databricks SQL warehouse HTTP path (for example `/sql/1.0/warehouses/warehouse-id`)  |
| Access Token    | Databricks personal access token                                                     |
| Workspace Name  | Optional display name in Euno (defaults to hostname)                                 |

### Step 3: Schedule

* Enable scheduling.
* Choose one option:
  1. **Weekly**, with a specific day and time.
  2. **Hourly**, with an interval.

### Step 4: Resource cleanup

* **Immediate Cleanup**: remove resources not detected in the latest successful run.
* **No Cleanup**: keep resources even if no longer detected.

For cleanup semantics, see [Resource Sponsorship in Euno](https://docs.euno.ai/developer-reference/technical-concepts/resource-sponsorship-and-cleanup-in-euno).

### Step 5: Advanced settings

| Configuration     | Description                                                   |
| ----------------- | ------------------------------------------------------------- |
| Override Base URI | Override the hostname used in generated URIs                  |
| Database Pattern  | Include/exclude catalogs by regex pattern                     |
| Notebooks Pattern | Include/exclude notebooks by regex against full notebook path |

**Pattern examples**

* `.*` - include all resources
* `production_.*` - include only names starting with `production_`
* allow `.*` and deny `test_.*` - include all except names starting with `test_`
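
Assuming these patterns are standard regular expressions applied as full matches, with deny taking precedence over allow, the semantics can be sketched like this (illustrative only, not Euno's matching code):

```python
# Sketch of allow/deny pattern semantics under the stated assumptions:
# full-match regular expressions, deny evaluated before allow.
import re

def included(name: str, allow: str = ".*", deny: str | None = None) -> bool:
    if deny is not None and re.fullmatch(deny, name):
        return False
    return re.fullmatch(allow, name) is not None

assert included("production_sales", allow="production_.*")
assert not included("staging_sales", allow="production_.*")
assert not included("test_scratch", allow=".*", deny="test_.*")
```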

### Step 6: Save

Click **Test & Save** to validate connectivity and save the source.

## Important behavior notes

* Table read usage metrics are emitted for 14-, 30-, and 60-day windows.
* Databricks read DBU metrics are emitted as:
  * `total_read_dbu_14d`, `total_read_dbu_30d`, `total_read_dbu_60d`
  * `average_read_dbu_14d`, `average_read_dbu_30d`, `average_read_dbu_60d`
* Column usage metrics are emitted for 14-, 30-, and 60-day windows.
* Notebook execution evidence and notebook-derived relationships are evaluated from `system.query.history`, `system.access.table_lineage`, and `system.access.column_lineage` with a 30-day lookback window.
* Regular Databricks CLL (`upstream_fields` on target columns) is evaluated from `system.access.column_lineage` with a 30-day lookback window.
* Notebook observation is enabled by default and filtered by the **Notebooks Pattern** setting.
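
For reference, the shape of a 30-day lookback over the column-lineage system table looks roughly like the sketch below. Column names follow the documented schema of `system.access.column_lineage`; the query is illustrative and not necessarily the exact query Euno runs, and the connection values are placeholders (see Step 2 for the real ones).

```python
# Illustrative 30-day lookback over Databricks column-lineage evidence.
from databricks import sql

LINEAGE_QUERY = """
    SELECT source_table_full_name,
           source_column_name,
           target_table_full_name,
           target_column_name,
           event_time
    FROM system.access.column_lineage
    WHERE event_date >= date_sub(current_date(), 30)
    LIMIT 100
"""

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/warehouse-id",              # placeholder
    access_token="<personal-access-token>",                    # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(LINEAGE_QUERY)
        for row in cursor.fetchall():
            print(row)
```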

## Discovered resources

For the full Databricks resource/property/relationship contract, see [Databricks Integration Discovered Resources](https://docs.euno.ai/sources/data-warehouses/databricks-integration/databricks-integration-discovered-resources).
