# OpenLineage

Euno's OpenLineage integration enables seamless ingestion of data lineage events from any system that produces [OpenLineage](https://openlineage.io/) compliant events. This integration automatically processes lineage metadata, table information, and tags to build a comprehensive view of your data pipeline dependencies and transformations.

## How It Works

The integration follows these steps:

1. **Provides a secure endpoint**: Euno generates a unique integration key and endpoint URL for receiving OpenLineage events
2. **Accepts OpenLineage events**: The integration accepts both single events and arrays of events in standard OpenLineage format via HTTP POST
3. **Processes Lineage and Metadata**
   * Extracts table information from input and output datasets
   * Creates lineage relationships between input and output tables
   * Processes tags and metadata from dataset facets
   * Validates naming conventions for supported data warehouses
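
The lineage extraction described above can be sketched in a few lines. This is an illustrative simplification, not Euno's actual implementation; it only assumes the standard OpenLineage `inputs`/`outputs` event structure:

```python
def extract_lineage(event: dict) -> list[tuple[str, str]]:
    """Return (input, output) lineage edges derived from one OpenLineage event."""
    inputs = [d["name"] for d in event.get("inputs", [])]
    outputs = [d["name"] for d in event.get("outputs", [])]
    # Each output table of the run gets a lineage edge from every input table.
    return [(i, o) for o in outputs for i in inputs]
```

For a run with two inputs and one output, this yields two edges, both pointing at the output table.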

## Supported Data Warehouses

Euno's OpenLineage integration currently supports:

* **Snowflake** - with naming convention:
  * namespace: `snowflake://org-account`
  * name: `database.schema.table`
* **BigQuery** - with naming convention:
  * namespace: `bigquery`
  * name: `project.dataset.table`

For detailed naming conventions, see the [OpenLineage Naming Specification](https://openlineage.io/docs/spec/naming).
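
A naming-convention check along these lines might look as follows. The regular expressions here are hypothetical, modeled only on the conventions listed above, and are not Euno's actual validation rules:

```python
import re

# Hypothetical patterns derived from the naming conventions above:
# (namespace pattern, name pattern) per supported warehouse.
PATTERNS = {
    "snowflake": (r"^snowflake://[\w-]+$", r"^[\w$]+\.[\w$]+\.[\w$]+$"),
    "bigquery": (r"^bigquery$", r"^[\w-]+\.\w+\.\w+$"),
}

def is_valid_dataset(namespace: str, name: str) -> bool:
    """Check a dataset's namespace and name against the supported conventions."""
    return any(
        re.match(ns_pat, namespace) and re.match(name_pat, name)
        for ns_pat, name_pat in PATTERNS.values()
    )
```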

## Setting up Euno's OpenLineage Integration

### Step 1: Configure New OpenLineage Source in Euno

#### Access the Sources Page

1. Navigate to the **Sources** page in the Euno application
2. Click on the **Add New Source** button
3. Select **OpenLineage** from the available integrations

### Step 2: General Configuration

1. **Name**: Enter a descriptive name for your OpenLineage source (e.g., "Data Pipeline Lineage")
2. **Configuration Details**:
   * The OpenLineage integration requires minimal configuration because it is push-based
   * No schedule is needed, since events are pushed in real time

### Step 3: Resource Cleanup Options

Configure automatic **resource cleanup** options to manage outdated resources:

* **Time-Based Cleanup (default)**: Remove resources not detected for X days (default: 7 days)
* **Immediate Cleanup**: Remove resources not detected in the most recent run
* **No Cleanup**: Keep all resources indefinitely
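
The three policies amount to a simple decision rule, sketched below. This is an illustrative model of the behavior described above, not Euno's implementation; the policy names are assumptions:

```python
from datetime import datetime, timedelta, timezone

def should_remove(last_seen: datetime, policy: str, retention_days: int = 7) -> bool:
    """Decide whether a resource absent from the latest run should be removed."""
    if policy == "none":
        return False  # No Cleanup: keep all resources indefinitely
    if policy == "immediate":
        return True   # Immediate Cleanup: remove anything missing from the latest run
    # Time-based (default): remove once undetected longer than the retention window
    return datetime.now(timezone.utc) - last_seen > timedelta(days=retention_days)
```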

### Step 4: Save Configuration

Click the **Save** button, and Euno will generate an integration key. **Copy and save this key securely**, as it will not be displayed again.

### Step 5: Get the Upload Endpoint

1. Click **"Reset Trigger Key"** to get the endpoint URL
2. Copy the provided endpoint URL where you'll send OpenLineage events
3. Use the integration key from Step 4 as the Bearer token in your Authorization header
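
If you prefer a programmatic client over cURL, the same authenticated POST can be built in Python with the standard library. The endpoint URL and key below are placeholders; the request shape (JSON body, Bearer token) follows the steps above:

```python
import json
import urllib.request

def build_event_request(endpoint: str, integration_key: str, events) -> urllib.request.Request:
    """Build an authenticated POST request carrying one event or a list of events."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(events).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {integration_key}",
        },
        method="POST",
    )

# Sending is then a one-liner (requires network access):
# urllib.request.urlopen(build_event_request(url, key, event))
```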

## Sending OpenLineage Events

### Example: Single Event with Lineage and Tags

Here's a complete example of an OpenLineage event with input/output lineage and tags:

```json
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-15T10:30:00.001Z",
  "run": {
    "runId": "my-etl-run-12345"
  },
  "job": {
    "namespace": "production-pipeline",
    "name": "customer-analytics-etl"
  },
  "inputs": [
    {
      "namespace": "snowflake://myorg-account123",
      "name": "raw_data.public.customer_events"
    },
    {
      "namespace": "bigquery", 
      "name": "external_data.staging.product_catalog"
    }
  ],
  "outputs": [
    {
      "namespace": "snowflake://myorg-account123",
      "name": "analytics.public.customer_analytics",
      "facets": {
        "tags": [
          {
            "key": "environment",
            "value": "production"
          },
          {
            "key": "team", 
            "value": "data-engineering"
          },
          {
            "key": "contains_pii"
          },
          {
            "key": "data_classification",
            "value": "sensitive"
          }
        ],
        "schema": {
          "fields": [
            {
              "name": "customer_id",
              "type": "BIGINT",
              "description": "Unique customer identifier"
            },
            {
              "name": "total_purchases",
              "type": "DECIMAL(10,2)",
              "description": "Total purchase amount"
            }
          ]
        }
      }
    }
  ],
  "producer": "https://my-etl-system.com/v1.2.0",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
```

### cURL Command Examples

#### Single Event Upload

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTEGRATION_KEY_HERE" \
  -d '{
    "eventType": "COMPLETE",
    "eventTime": "2024-01-15T10:30:00.001Z",
    "run": {"runId": "simple-etl-123"},
    "job": {"namespace": "my-pipeline", "name": "daily-aggregation"},
    "inputs": [{
      "namespace": "snowflake://myorg-account123", 
      "name": "raw.public.events"
    }],
    "outputs": [{
      "namespace": "snowflake://myorg-account123",
      "name": "analytics.public.daily_stats",
      "facets": {
        "tags": [
          {"key": "environment", "value": "prod"},
          {"key": "automated"}
        ]
      }
    }],
    "producer": "my-pipeline-v1.0",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
  }' \
  https://api.app.euno.ai/accounts/YOUR_ACCOUNT_ID/integrations/YOUR_INTEGRATION_ID/run
```

#### Multiple Events Upload

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTEGRATION_KEY_HERE" \
  -d '[
    {
      "eventType": "START", 
      "eventTime": "2024-01-15T10:00:00.001Z",
      "run": {"runId": "batch-job-456"},
      "job": {"namespace": "etl", "name": "batch-processor"},
      "producer": "scheduler-v2.1",
      "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
    },
    {
      "eventType": "COMPLETE",
      "eventTime": "2024-01-15T10:15:00.001Z", 
      "run": {"runId": "batch-job-456"},
      "job": {"namespace": "etl", "name": "batch-processor"},
      "inputs": [{"namespace": "bigquery", "name": "raw.events.user_actions"}],
      "outputs": [{"namespace": "bigquery", "name": "processed.analytics.user_metrics"}],
      "producer": "scheduler-v2.1", 
      "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
    }
  ]' \
  https://api.app.euno.ai/accounts/YOUR_ACCOUNT_ID/integrations/YOUR_INTEGRATION_ID/run
```

#### Upload from File

```bash
# Save your event to a file
cat > event.json << 'EOF'
{
  "eventType": "COMPLETE",
  "eventTime": "2024-01-15T10:30:00.001Z",
  "run": {"runId": "file-upload-test"},
  "job": {"namespace": "testing", "name": "file-upload"},
  "outputs": [{
    "namespace": "snowflake://myorg-account123",
    "name": "test.public.sample_table"
  }],
  "producer": "test-script", 
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
EOF

# Upload the file
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INTEGRATION_KEY_HERE" \
  -d @event.json \
  https://api.app.euno.ai/accounts/YOUR_ACCOUNT_ID/integrations/YOUR_INTEGRATION_ID/run
```

## What Gets Observed in Euno

When OpenLineage events are processed, Euno observes:

### Table Resources

* **Tables** from both inputs and outputs with properties:
  * `name`: Table name
  * `database`: Database name
  * `schema`: Schema name
  * `database_technology`: `snowflake` or `bigquery`
  * `type`: `table`
  * `meta`: see below
  * `tags`: see below
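
The mapping from an OpenLineage dataset to these table properties can be sketched as follows. This is an illustrative reading of the naming conventions above, not Euno's actual code; for BigQuery, `project` and `dataset` fill the `database` and `schema` slots:

```python
def dataset_to_table(namespace: str, name: str) -> dict:
    """Map an OpenLineage dataset to the table properties described above."""
    technology = "bigquery" if namespace == "bigquery" else "snowflake"
    database, schema, table = name.split(".")
    return {
        "name": table,
        "database": database,   # "project" for BigQuery
        "schema": schema,       # "dataset" for BigQuery
        "database_technology": technology,
        "type": "table",
    }
```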

### Lineage Relationships

* **Lineage**: Output tables get lineage pointing to input tables

### Tags and Metadata

* **Tags with values** (e.g., `{"key": "environment", "value": "prod"}`) become **meta properties**
* **Tags without values** (e.g., `{"key": "pii"}`) become **simple tags**
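
This split is easy to express directly. A minimal sketch of the rule above, assuming the tag facet structure shown in the earlier JSON examples:

```python
def split_tags(tag_facets: list[dict]) -> tuple[dict, list[str]]:
    """Separate tag facets into meta properties (key + value) and simple tags (key only)."""
    meta = {t["key"]: t["value"] for t in tag_facets if "value" in t}
    tags = [t["key"] for t in tag_facets if "value" not in t]
    return meta, tags
```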

**Generating a new trigger key:** If you need to create a new integration key, go to the **Sources** page and click on the three-dot menu next to your OpenLineage source. Select **"Reset Trigger Key"** to generate a new key and endpoint URL.
