Extra Metrics

There are metrics provided by re_data but are not computed by default in monitored tables. You can enable them by updating the configuration for the specific table. You can also make the base metrics by adding them to re_data:metrics_base.

Sample table for which we compute extra metrics

__      title               rental_rate rating      created_at1       Chamber Italian     4.99        NC-17       2021-09-01T11:00:002       Grosse Wonderful    4.99        R           2021-09-01T12:00:003       Airport Pollock     4.99        R           2021-09-01T15:00:004       Bright Encounters   4.99        PG-13       2021-09-01T09:00:005       Academy Dinosaur    0.99        PG-13       2021-09-01T08:00:006       Ace Goldfinger      4.99        G           2021-09-01T10:00:007       Adaptation Holes    2.99        NC-17       2021-09-01T11:00:008       Affair Prejudice    2.99        G           2021-09-01T19:00:009       African Egg         2.99        G           2021-09-01T20:00:0010      Agent Truman        2.99        PG          2021-09-01T07:00:0011      Airplane Sierra     4.99        PG-13       2021-09-02T09:00:0012      Alabama Devil       2.99        PG-13       2021-09-02T10:00:0013      Aladdin Calendar    4.99        NC-17       2021-09-02T11:00:0014      Alamo Videotape     0.99        G           2021-09-02T12:00:0015      Alaska Phantom      0.99        PG          2021-09-02T13:00:0016      Date Speed          0.99        R           2021-09-02T14:00:0017      Ali Forever         4.99        PG          2021-09-02T15:00:0018      Alice Fantasia      0.99        NC-17       2021-09-02T16:00:0019      Alien Center        2.99        NC-17       2021-09-02T17:00:00

Extra Table Metrics#

distinct_table_rows#

(source code)#

This metric computes the distinct number of rows in the given table

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00distinct_rows = 10

Extra Column Metrics#

info

regex_match_expression is resolved at runtime depending on the database in use.

match_regex#

(source code)#

Determines the count of values in a given column that matches the specified regex. Suppose we want to check if the rating column matches a specific regular expression pattern and we define it in our dbt_project.yml file.

regex must be specified for this metric to work else a compiler exception would be raised.

specifying match_regex config

vars:  re_data:monitored:    - tables:        - name: sample_table          time_filter: created_at
          metrics:            column:              rating:                - match_regex:                    regex: ([0-9]+)

select coalesce(        sum(            case when {{ regex_match_expression('rating', '([0-9]+)') }}                then 1            else 0            end        ), 0    ) from your_tablewhere created_at between time_window_start and time_window_end
------------------------------------------------------------------------1       Chamber Italian     4.99        NC-17       2021-09-01T11:00:004       Bright Encounters   4.99        PG-13       2021-09-01T09:00:005       Academy Dinosaur    0.99        PG-13       2021-09-01T08:00:007       Adaptation Holes    2.99        NC-17       2021-09-01T11:00:00
match_regex = 4 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

match_regex_percent#

(source code)#

Determines the percentage of values in a given column that matches the specified regex.

Suppose we use the same configuration for the match_regex metric above, we havematch_regex_percent = 40 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

not_match_regex#

(source code)#

Determines the count of values in a given column that does not match the specified regex.

Suppose we pass in ([0-9]+) as our regex parameter,not_match_regex = 6 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

distinct_values#

(source code)#

Determines the count of values in a given column that are unique.

rating  count-----------------PG-13   2G       3NC-17   2PG      1R       2time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00distinct_values = 5. (PG)

duplicate_values#

(source code)#

Determines the count of values in a given column that are duplicated.

rating  count-----------------PG-13   2G       3NC-17   2PG      1R       2
time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00duplicate_values = 4. (PG-13, G, NC-17, R)

duplicate_rows#

(source code)#

Determines the count of rows in a given column that have values which are duplicates.

rating  count-----------------PG-13   2G       3NC-17   2PG      1R       2
time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00duplicate_count = 9. (PG-13[2], G[3], NC-17[2], R[2])

unique_rows#

(source code)#

Determines the count of rows in a given column that have values which are unique.

rating  count-----------------PG-13   2G       3NC-17   2PG      1R       2
time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00distinct_count = 1 (PG)

approx_distinct_values#

(source code)#

Determines the approximate distinct count of values in a given column. This metric is useful in large tables where an approximation is sufficient and query performance is required.
Note: Postgres does not support for approximate count of distinct values unlike bigquery, snowflake and redshift.