Default metrics

Default metrics are computed for all monitored tables. If you would rather not compute some of them, you can change the default metrics list via the re_data:default_metrics variable.

Sample table for example metrics
row  title              rental_rate  rating  created_at           is_available
1    Chamber Italian    4.99         NC-17   2021-09-01T11:00:00  true
2    Grosse Wonderful   4.99         R       2021-09-01T12:00:00  true
3    Airport Pollock    4.99         R       2021-09-01T15:00:00  false
4    Bright Encounters  4.99         PG-13   2021-09-01T09:00:00  true
5    Academy Dinosaur   0.99         PG-13   2021-09-01T08:00:00  false
6    Ace Goldfinger     4.99         G       2021-09-01T10:00:00  false
7    Adaptation Holes   2.99         NC-17   2021-09-01T11:00:00  true
8    Affair Prejudice   2.99         G       2021-09-01T19:00:00  true
9    African Egg        2.99         G       2021-09-01T20:00:00  true
10   Agent Truman       2.99         PG      2021-09-01T07:00:00  false
11   Airplane Sierra    4.99         PG-13   2021-09-02T09:00:00  true
12   Alabama Devil      2.99         PG-13   2021-09-02T10:00:00  false
13   Aladdin Calendar   4.99         NC-17   2021-09-02T11:00:00  false
14   Alamo Videotape    0.99         G       2021-09-02T12:00:00  false
15   Alaska Phantom     0.99         PG      2021-09-02T13:00:00  true
16   Date Speed         0.99         R       2021-09-02T14:00:00  true
17   Ali Forever        4.99         PG      2021-09-02T15:00:00  true
18   Alice Fantasia     0.99         NC-17   2021-09-02T16:00:00  true
19   Alien Center       2.99         NC-17   2021-09-02T17:00:00  true

Below is a list of currently available metrics and how they are computed internally by re_data:

Table level metrics

row_count

Number of rows added to the table in a specific time range.

row_count = 10 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
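As a rough illustration of the half-open time window used in the example above (start inclusive, end exclusive), the following Python sketch recounts the rows of the sample table. It is not how re_data computes the metric internally; the `in_window` helper and the variable names are made up for this example.

```python
from datetime import datetime

# created_at values copied from the sample table (rows 1-19).
created_at = [
    "2021-09-01T11:00:00", "2021-09-01T12:00:00", "2021-09-01T15:00:00",
    "2021-09-01T09:00:00", "2021-09-01T08:00:00", "2021-09-01T10:00:00",
    "2021-09-01T11:00:00", "2021-09-01T19:00:00", "2021-09-01T20:00:00",
    "2021-09-01T07:00:00", "2021-09-02T09:00:00", "2021-09-02T10:00:00",
    "2021-09-02T11:00:00", "2021-09-02T12:00:00", "2021-09-02T13:00:00",
    "2021-09-02T14:00:00", "2021-09-02T15:00:00", "2021-09-02T16:00:00",
    "2021-09-02T17:00:00",
]

window_start = datetime.fromisoformat("2021-09-01T00:00:00")
window_end = datetime.fromisoformat("2021-09-02T00:00:00")


def in_window(ts: str) -> bool:
    """Hypothetical helper: True if ts falls in [window_start, window_end)."""
    return window_start <= datetime.fromisoformat(ts) < window_end


# Count only the rows whose created_at lies inside the window.
row_count = sum(in_window(ts) for ts in created_at)
print(row_count)  # 10
```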

freshness

Information about the latest record in a given time frame. Suppose we calculate the freshness metric for the table above with the time window [2021-09-01T00:00:00, 2021-09-02T00:00:00). The latest record in that time frame is row 9, with created_at=2021-09-01T20:00:00. freshness is the difference, in seconds, between the end of the time window and the timestamp of that latest record. For this example, re_data would calculate freshness as:

2021-09-02T00:00:00 - 2021-09-01T20:00:00 = 14400 seconds
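The same subtraction can be reproduced with a short Python sketch. This only mirrors the arithmetic described above on the sample data; it is not re_data's implementation, and the variable names are chosen for illustration.

```python
from datetime import datetime

# created_at values of rows 1-10, i.e. the rows inside the example window.
created_at_in_window = [
    "2021-09-01T11:00:00", "2021-09-01T12:00:00", "2021-09-01T15:00:00",
    "2021-09-01T09:00:00", "2021-09-01T08:00:00", "2021-09-01T10:00:00",
    "2021-09-01T11:00:00", "2021-09-01T19:00:00", "2021-09-01T20:00:00",
    "2021-09-01T07:00:00",
]

window_end = datetime.fromisoformat("2021-09-02T00:00:00")

# Latest record in the window (row 9 in the sample table).
latest_record = max(datetime.fromisoformat(ts) for ts in created_at_in_window)

# Seconds between the end of the window and the latest record.
freshness = (window_end - latest_record).total_seconds()
print(freshness)  # 14400.0
```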

schema_changes

Information about schema changes in the monitored table.

Stored separately from the rest of the metrics in the re_data_schema_changes model.

Caution

Schema changes are a different kind of metric from the rest. Because information about schema changes is gathered by comparing schemas between re_data runs, this metric doesn't filter changes to the specified time window and, in fact, doesn't use time_window settings at all.

Column level metrics

min

Minimal value appearing in a given numeric column.

min(rental_rate) = 0.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max

Maximal value appearing in a given numeric column.

max(rental_rate) = 4.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

avg

Average of all values appearing in a given numeric column.

avg(rental_rate) = 3.79 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

stddev

The standard deviation of all values appearing in a given numeric column.

stddev(rental_rate) = 1.3984117975602022 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

variance

The variance of all values appearing in a given numeric column.

variance(rental_rate) = 1.9555555555555557 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
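The avg, stddev and variance values above are consistent with the sample (n - 1) formulas applied to the ten rental_rate values inside the window. The sketch below checks this with Python's standard statistics module. Which formula a given data warehouse applies in its own stddev and variance functions is an assumption here, not something this page states.

```python
import statistics

# rental_rate values of rows 1-10 (the rows inside the example window).
rental_rate = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]

avg = statistics.mean(rental_rate)

# Sample variance and standard deviation (divide by n - 1).
variance = statistics.variance(rental_rate)
stddev = statistics.stdev(rental_rate)

# Population variance (divide by n), shown only for contrast.
population_variance = statistics.pvariance(rental_rate)

print(round(avg, 2))                  # 3.79
print(round(variance, 4))             # 1.9556
print(round(stddev, 4))               # 1.3984
print(round(population_variance, 2))  # 1.76
```

The population variant gives 1.76 rather than 1.9556, so the documented values match the sample form.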

min_length

Minimal length of all strings appearing in a given column.

min_length(rating) = 1 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max_length

Maximal length of all strings appearing in a given column.

max_length(rating) = 5 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
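For illustration, the string-length metrics can be recomputed from the rating values of the rows inside the window. This is a plain Python sketch, not re_data's SQL, and it assumes length is counted in characters.

```python
# rating values of rows 1-10 (the rows inside the example window).
rating = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]

# Length of each string, counted in characters (an assumption of this sketch).
lengths = [len(value) for value in rating]

min_length = min(lengths)
max_length = max(lengths)

print(min_length, max_length)  # 1 5
```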

avg_length

The average length of all strings appearing in a given column.

avg_length(rating) = 2.7 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

nulls_count

The number of nulls in a given column for the specific time range.

nulls_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_count

The number of nulls and empty string values in a given column for the specific time range.

missing_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_percent

The percentage of nulls and empty string values in a given column for the specific time range.

missing_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

nulls_percent

The percentage of null values in a given column for the specific time range.

nulls_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
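Because the sample rating column has no nulls or empty strings, all four metrics above come out as 0. To show how the "nulls" and "missing" variants differ, the sketch below uses a small hypothetical column that is not part of the sample table. It also assumes the percent metrics are expressed on a 0-100 scale relative to the number of rows in the window.

```python
# Hypothetical column values (NOT from the sample table): one null, one empty string.
values = ["G", None, "", "PG"]

# nulls: only actual null values.
nulls_count = sum(value is None for value in values)

# missing: nulls plus empty strings.
missing_count = sum(value is None or value == "" for value in values)

# Percentages relative to all rows, on a 0-100 scale (an assumption of this sketch).
nulls_percent = nulls_count / len(values) * 100
missing_percent = missing_count / len(values) * 100

print(nulls_count, missing_count)      # 1 2
print(nulls_percent, missing_percent)  # 25.0 50.0
```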

count_true

The total count of true values in a given boolean column for the specific time range.

count_true(is_available) = 6 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

count_false

The total count of false values in a given boolean column for the specific time range.

count_false(is_available) = 4 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00