Skip to main content

Extra Metrics

There are metrics provided by re_data but are not computed by default in monitored tables. You can compute them by updating the configuration for the specific table or adding them to a metrics groups which are computed by default.

Sample table for which we compute extra metrics
__      title               rental_rate rating      created_at
1 Chamber Italian 4.99 NC-17 2021-09-01T11:00:00
2 Grosse Wonderful 4.99 R 2021-09-01T12:00:00
3 Airport Pollock 4.99 R 2021-09-01T15:00:00
4 Bright Encounters 4.99 PG-13 2021-09-01T09:00:00
5 Academy Dinosaur 0.99 PG-13 2021-09-01T08:00:00
6 Ace Goldfinger 4.99 G 2021-09-01T10:00:00
7 Adaptation Holes 2.99 NC-17 2021-09-01T11:00:00
8 Affair Prejudice 2.99 G 2021-09-01T19:00:00
9 African Egg 2.99 G 2021-09-01T20:00:00
10 Agent Truman 2.99 PG 2021-09-01T07:00:00
11 Airplane Sierra 4.99 PG-13 2021-09-02T09:00:00
12 Alabama Devil 2.99 PG-13 2021-09-02T10:00:00
13 Aladdin Calendar 4.99 NC-17 2021-09-02T11:00:00
14 Alamo Videotape 0.99 G 2021-09-02T12:00:00
15 Alaska Phantom 0.99 PG 2021-09-02T13:00:00
16 Date Speed 0.99 R 2021-09-02T14:00:00
17 Ali Forever 4.99 PG 2021-09-02T15:00:00
18 Alice Fantasia 0.99 NC-17 2021-09-02T16:00:00
19 Alien Center 2.99 NC-17 2021-09-02T17:00:00

Extra Table Metricsโ€‹

distinct_table_rowsโ€‹

(source code)โ€‹

This metric computes the distinct number of rows in the given table

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
distinct_rows = 10

Extra Column Metricsโ€‹

info

regex_match_expression is resolved at runtime depending on the database in use.

match_regexโ€‹

(source code)โ€‹

Determines the count of values in a given column that matches the specified regex. Suppose we want to check if the rating column matches a specific regular expression pattern and we define it in our dbt_project.yml file.

  • regex must be specified for this metric to work else a compiler exception would be raised.
specifying match_regex config
vars:
re_data:monitored:
- tables:
- name: sample_table
time_filter: created_at

metrics:
column:
rating:
- match_regex:
regex: ([0-9]+)
select coalesce(
sum(
case when {{ regex_match_expression('rating', '([0-9]+)') }}
then 1
else 0
end
), 0
) from your_table
where created_at between time_window_start and time_window_end

------------------------------------------------------------------------
1 Chamber Italian 4.99 NC-17 2021-09-01T11:00:00
4 Bright Encounters 4.99 PG-13 2021-09-01T09:00:00
5 Academy Dinosaur 0.99 PG-13 2021-09-01T08:00:00
7 Adaptation Holes 2.99 NC-17 2021-09-01T11:00:00

match_regex = 4 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

match_regex_percentโ€‹

(source code)โ€‹

Determines the percentage of values in a given column that matches the specified regex.

Suppose we use the same configuration for the match_regex metric above, we have
match_regex_percent = 40 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

not_match_regexโ€‹

(source code)โ€‹

Determines the count of values in a given column that does not match the specified regex.

Suppose we pass in ([0-9]+) as our regex parameter,
not_match_regex = 6 where created_at is between 2021-09-01T00:00:00 and 2021-09-02T00:00:00

distinct_valuesโ€‹

(source code)โ€‹

Determines the count of values in a given column that are unique.

rating  count
-----------------
PG-13 2
G 3
NC-17 2
PG 1
R 2
time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
distinct_values = 5. (PG)

duplicate_valuesโ€‹

(source code)โ€‹

Determines the count of values in a given column that are duplicated.

rating  count
-----------------
PG-13 2
G 3
NC-17 2
PG 1
R 2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
duplicate_values = 4. (PG-13, G, NC-17, R)

duplicate_rowsโ€‹

(source code)โ€‹

Determines the count of rows in a given column that have values which are duplicates.

rating  count
-----------------
PG-13 2
G 3
NC-17 2
PG 1
R 2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
duplicate_count = 9. (PG-13[2], G[3], NC-17[2], R[2])

unique_rowsโ€‹

(source code)โ€‹

Determines the count of rows in a given column that have values which are unique.

rating  count
-----------------
PG-13 2
G 3
NC-17 2
PG 1
R 2

time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
distinct_count = 1 (PG)

approx_distinct_valuesโ€‹

(source code)โ€‹

Determines the approximate distinct count of values in a given column. This metric is useful in large tables where an approximation is sufficient and query performance is required.
Note: Postgres does not support for approximate count of distinct values unlike bigquery, snowflake and redshift.