Optimizing Complex Searches

Complex EQL queries, especially those involving multiple relationship checks, can sometimes be slow or time out. Computed properties can help by pre-calculating parts of these complex conditions, making the final query much faster.

A Real-World Example

Let's look at a common use case: finding all dashboards that depend on dbt models but don't depend on any raw tables not managed by dbt.

The Direct Approach

You might try to write this query directly:

type ~ 'dashboard' AND HAS upstream(type = 'dbt_model') AND NOT HAS upstream(type = 'table')

This query needs to:

  1. Find all dashboards

  2. For each dashboard, check all upstream dependencies for dbt models

  3. For each dashboard, check all upstream dependencies for tables

  4. Combine these conditions

With a large data model, this query can easily time out (default timeout is 10 seconds) because it needs to traverse the entire dependency graph multiple times.

The Optimized Approach

Instead, we can break this into two parts:

  1. First, create a computed boolean property that pre-calculates the expensive part of the query

  2. Then use this property in a simpler query

Step 1: Create the Computed Property

Create a new computed boolean property:

  1. Name: has_table_in_upstream

  2. Description: "Indicates whether this resource has any raw tables in its upstream dependencies"

  3. Type: Computed Boolean

  4. Scope: type ~ 'dashboard'

  5. Condition: HAS upstream(type = 'table')

[Screenshot: Creating the computed property]

This property will be calculated once for each dashboard and stored, rather than being computed every time you run a query.

Step 2: Use the Property

Now you can write a much simpler and faster query:

type ~ 'dashboard' AND HAS upstream(type = 'dbt_model') AND has_table_in_upstream IS FALSE

This query is faster because:

  • The expensive HAS upstream(type = 'table') check is pre-computed

  • The query only needs to check for dbt model dependencies

  • The boolean property check is a simple lookup

Best Practices for Query Optimization

  1. Identify Expensive Parts

    • Relationship checks (HAS upstream, HAS downstream) are expensive

    • Multiple relationship checks compound the performance impact

    • Negations of relationship checks (NOT HAS) are particularly expensive

  2. Create Focused Properties

    • Use the scope field to limit where the property is calculated

    • This reduces computation time and storage needs

    • Makes the property's purpose clear to other users

  3. Document Your Properties

    • Add clear descriptions explaining what the property represents

    • Include example queries in the description

    • Note any assumptions or limitations

  4. Monitor and Maintain

    • Check if the property is still being used

    • Review if the property needs to be recalculated when data changes

    • Consider cleaning up unused properties

When to Use This Pattern

Consider creating a computed property for optimization when:

  • Your queries frequently time out

  • You're using the same complex conditions in multiple queries

  • You need to negate relationship checks (NOT HAS)

  • You're combining multiple relationship checks in one query

Example Use Cases

  1. Data Quality Monitoring

    # Pre-compute
    name: has_quality_issues
    scope: type = 'dbt_model'
    condition: HAS downstream(type = 'quality_check' AND status = 'failed')
  2. Dependency Analysis

    # Pre-compute
    name: is_isolated_dashboard
    scope: type = 'dashboard'
    condition: NOT HAS upstream(TRUE)
  3. Usage Tracking

    # Pre-compute
    name: has_recent_usage
    scope: type IN ('dashboard', 'look')
    condition: total_views_30d > 0

Last updated