Extra Metrics

re_data provides some metrics that are not computed by default for monitored tables. You can compute them by updating the configuration for a specific table or by adding them to a metrics group that is computed by default.

Sample table for which we compute extra metrics
__      title               rental_rate rating      created_at
1       Chamber Italian     4.99        NC-17       2021-09-01T11:00:00
2       Grosse Wonderful    4.99        R           2021-09-01T12:00:00
3       Airport Pollock     4.99        R           2021-09-01T15:00:00
4       Bright Encounters   4.99        PG-13       2021-09-01T09:00:00
5       Academy Dinosaur    0.99        PG-13       2021-09-01T08:00:00
6       Ace Goldfinger      4.99        G           2021-09-01T10:00:00
7       Adaptation Holes    2.99        NC-17       2021-09-01T11:00:00
8       Affair Prejudice    2.99        G           2021-09-01T19:00:00
9       African Egg         2.99        G           2021-09-01T20:00:00
10      Agent Truman        2.99        PG          2021-09-01T07:00:00
11      Airplane Sierra     4.99        PG-13       2021-09-02T09:00:00
12      Alabama Devil       2.99        PG-13       2021-09-02T10:00:00
13      Aladdin Calendar    4.99        NC-17       2021-09-02T11:00:00
14      Alamo Videotape     0.99        G           2021-09-02T12:00:00
15      Alaska Phantom      0.99        PG          2021-09-02T13:00:00
16      Date Speed          0.99        R           2021-09-02T14:00:00
17      Ali Forever         4.99        PG          2021-09-02T15:00:00
18      Alice Fantasia      0.99        NC-17       2021-09-02T16:00:00
19      Alien Center        2.99        NC-17       2021-09-02T17:00:00

Extra Table Metrics

distinct_table_rows

(source code)

This metric computes the number of distinct rows in the given table.

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
distinct_table_rows = 10
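As a rough illustration of the arithmetic only, the sketch below reproduces this count in plain Python. The `rows` list is a hypothetical in-memory copy of part of the sample table, and `distinct_table_rows` is a stand-in helper written for this example, not re_data's implementation (re_data computes the metric in the warehouse).

```python
from datetime import datetime

# Hypothetical in-memory copy of part of the sample table:
# (title, rental_rate, rating, created_at)
rows = [
    ("Chamber Italian", 4.99, "NC-17", "2021-09-01T11:00:00"),
    ("Grosse Wonderful", 4.99, "R", "2021-09-01T12:00:00"),
    ("Airport Pollock", 4.99, "R", "2021-09-01T15:00:00"),
    ("Bright Encounters", 4.99, "PG-13", "2021-09-01T09:00:00"),
    ("Academy Dinosaur", 0.99, "PG-13", "2021-09-01T08:00:00"),
    ("Ace Goldfinger", 4.99, "G", "2021-09-01T10:00:00"),
    ("Adaptation Holes", 2.99, "NC-17", "2021-09-01T11:00:00"),
    ("Affair Prejudice", 2.99, "G", "2021-09-01T19:00:00"),
    ("African Egg", 2.99, "G", "2021-09-01T20:00:00"),
    ("Agent Truman", 2.99, "PG", "2021-09-01T07:00:00"),
    ("Airplane Sierra", 4.99, "PG-13", "2021-09-02T09:00:00"),
    ("Alabama Devil", 2.99, "PG-13", "2021-09-02T10:00:00"),
]


def distinct_table_rows(rows, start, end):
    """Count distinct whole rows with start <= created_at < end."""
    in_window = {
        row for row in rows
        if start <= datetime.fromisoformat(row[3]) < end
    }
    return len(in_window)


window_start = datetime(2021, 9, 1)
window_end = datetime(2021, 9, 2)
result = distinct_table_rows(rows, window_start, window_end)
print(result)
```

Because the rows are collected into a set, an exact copy of a row that is loaded twice is counted once, which is the point of the metric.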

Extra Column Metrics

Info: regex_match_expression is resolved at runtime depending on the database in use.

match_regex

(source code)

Determines the count of values in a given column that match the specified regex. Suppose we want to check whether values in the rating column match a specific regular expression pattern, which we define in our dbt_project.yml file.

A regex must be specified for this metric to work; otherwise a compiler exception is raised.

specifying match_regex config
vars:
  re_data:monitored:
    - tables:
        - name: sample_table
          time_filter: created_at

          metrics:
            column:
              rating:
                - match_regex:
                    regex: ([0-9]+)

select coalesce(
        sum(
            case when {{ regex_match_expression('rating', '([0-9]+)') }}
                then 1
            else 0
            end
        ), 0
    ) from your_table
where created_at >= time_window_start and created_at < time_window_end

------------------------------------------------------------------------
1       Chamber Italian     4.99        NC-17       2021-09-01T11:00:00
4       Bright Encounters   4.99        PG-13       2021-09-01T09:00:00
5       Academy Dinosaur    0.99        PG-13       2021-09-01T08:00:00
7       Adaptation Holes    2.99        NC-17       2021-09-01T11:00:00

match_regex = 4 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00
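The count above can be checked with a small Python sketch. It assumes "match" means the pattern is found anywhere in the value, and it uses Python's `re.search` as a stand-in for the database-specific regex operator, whose exact semantics may differ. The `ratings` list and the `match_regex` helper are hypothetical names used only for this example.

```python
import re

# Ratings of the ten sample rows created on 2021-09-01, in table order.
ratings = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def match_regex(values, pattern):
    """Count values in which the pattern is found; None (NULL) never matches."""
    compiled = re.compile(pattern)
    return sum(1 for v in values if v is not None and compiled.search(v))


count = match_regex(ratings, r"([0-9]+)")
print(count)
```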

match_regex_percent

(source code)

Determines the percentage of values in a given column that match the specified regex.

Using the same configuration as for the match_regex metric above, we have:
match_regex_percent = 40 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

not_match_regex

(source code)

Determines the count of values in a given column that do not match the specified regex.

Suppose we pass in ([0-9]+) as our regex parameter,
not_match_regex = 6 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00
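The percentage and the non-matching count both follow from the matching count and the number of rows in the window. A minimal sketch of that arithmetic, assuming the column contains no NULLs and again using Python's `re.search` in place of the database's regex operator (the variable names are illustrative, not re_data identifiers):

```python
import re

# Ratings of the ten sample rows created on 2021-09-01, in table order.
ratings = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]
pattern = re.compile(r"([0-9]+)")

matched = sum(1 for rating in ratings if pattern.search(rating))
not_matched = len(ratings) - matched        # rows without any digit
percent = 100 * matched / len(ratings)      # share of matching rows, in percent

print(matched, not_matched, percent)
```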

distinct_values

(source code)

Determines the count of distinct values in a given column.

rating  count
-----------------
PG-13   2
G       3
NC-17   2
PG      1
R       2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
distinct_values = 5 (PG-13, G, NC-17, PG, R)

duplicate_values

(source code)

Determines the count of distinct values in a given column that occur more than once.

rating  count
-----------------
PG-13   2
G       3
NC-17   2
PG      1
R       2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
duplicate_values = 4 (PG-13, G, NC-17, R)
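Both value-level metrics, distinct_values and duplicate_values, can be read off a frequency table like the one above. The following sketch rebuilds that table with Python's `collections.Counter`; it only illustrates the counting and is not how re_data runs the metric.

```python
from collections import Counter

# Ratings of the ten sample rows created on 2021-09-01, in table order.
ratings = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]

counts = Counter(ratings)  # rating -> number of rows with that rating

# Every key is one distinct value.
distinct_values = len(counts)
# A value is duplicated when it appears on more than one row.
duplicate_values = sum(1 for n in counts.values() if n > 1)

print(distinct_values, duplicate_values)
```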

duplicate_rows

(source code)

Determines the count of rows whose value in a given column also appears in at least one other row.

rating  count
-----------------
PG-13   2
G       3
NC-17   2
PG      1
R       2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
duplicate_rows = 9 (PG-13[2], G[3], NC-17[2], R[2])

unique_rows

(source code)

Determines the count of rows whose value in a given column appears exactly once.

rating  count
-----------------
PG-13   2
G       3
NC-17   2
PG      1
R       2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
unique_rows = 1 (PG)
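The row-level metrics count rows rather than values, so duplicate_rows and unique_rows together account for every row in the window. A short sketch of that relationship, using Python's `collections.Counter` on a hypothetical in-memory copy of the rating column:

```python
from collections import Counter

# Ratings of the ten sample rows created on 2021-09-01, in table order.
ratings = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]

counts = Counter(ratings)  # rating -> number of rows with that rating

# Rows whose rating is shared with at least one other row: 2 + 3 + 2 + 2.
duplicate_rows = sum(n for n in counts.values() if n > 1)
# Rows whose rating appears exactly once: only the single PG row.
unique_rows = sum(n for n in counts.values() if n == 1)

print(duplicate_rows, unique_rows)
```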

approx_distinct_values

(source code)

Determines the approximate distinct count of values in a given column. This metric is useful for large tables where an approximation is sufficient and query performance matters.
Note: Unlike BigQuery, Snowflake and Redshift, Postgres does not support approximate counts of distinct values.
