Base metrics

Base metrics are computed for all monitored tables. If you would rather not compute some of them, you can change the list of base metrics through the re_data:metrics_base variable.

Sample table for example metrics
row  title              rental_rate  rating  created_at
1    Chamber Italian    4.99         NC-17   2021-09-01T11:00:00
2    Grosse Wonderful   4.99         R       2021-09-01T12:00:00
3    Airport Pollock    4.99         R       2021-09-01T15:00:00
4    Bright Encounters  4.99         PG-13   2021-09-01T09:00:00
5    Academy Dinosaur   0.99         PG-13   2021-09-01T08:00:00
6    Ace Goldfinger     4.99         G       2021-09-01T10:00:00
7    Adaptation Holes   2.99         NC-17   2021-09-01T11:00:00
8    Affair Prejudice   2.99         G       2021-09-01T19:00:00
9    African Egg        2.99         G       2021-09-01T20:00:00
10   Agent Truman       2.99         PG      2021-09-01T07:00:00
11   Airplane Sierra    4.99         PG-13   2021-09-02T09:00:00
12   Alabama Devil      2.99         PG-13   2021-09-02T10:00:00
13   Aladdin Calendar   4.99         NC-17   2021-09-02T11:00:00
14   Alamo Videotape    0.99         G       2021-09-02T12:00:00
15   Alaska Phantom     0.99         PG      2021-09-02T13:00:00
16   Date Speed         0.99         R       2021-09-02T14:00:00
17   Ali Forever        4.99         PG      2021-09-02T15:00:00
18   Alice Fantasia     0.99         NC-17   2021-09-02T16:00:00
19   Alien Center       2.99         NC-17   2021-09-02T17:00:00

Below is a list of currently available metrics and how they are computed internally by re_data:

Base table level metrics

row_count

(source code)

Number of rows added to the table in a specific time range.

row_count = 10 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
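The window filter behind this example can be reproduced outside the warehouse. The following is a minimal Python sketch, not re_data's own implementation (re_data computes its metrics inside the database); the helper name `row_count` and the in-memory list are assumptions made for illustration.

```python
from datetime import datetime

# created_at values from the sample table (rows 1-19).
created_at = [
    "2021-09-01T11:00:00", "2021-09-01T12:00:00", "2021-09-01T15:00:00",
    "2021-09-01T09:00:00", "2021-09-01T08:00:00", "2021-09-01T10:00:00",
    "2021-09-01T11:00:00", "2021-09-01T19:00:00", "2021-09-01T20:00:00",
    "2021-09-01T07:00:00", "2021-09-02T09:00:00", "2021-09-02T10:00:00",
    "2021-09-02T11:00:00", "2021-09-02T12:00:00", "2021-09-02T13:00:00",
    "2021-09-02T14:00:00", "2021-09-02T15:00:00", "2021-09-02T16:00:00",
    "2021-09-02T17:00:00",
]


def row_count(timestamps, window_start, window_end):
    """Count rows with window_start <= created_at < window_end (hypothetical helper)."""
    return sum(
        window_start <= datetime.fromisoformat(ts) < window_end
        for ts in timestamps
    )


count = row_count(created_at, datetime(2021, 9, 1), datetime(2021, 9, 2))
print(count)  # 10
```

The window is half-open: a row stamped exactly 2021-09-02T00:00:00 would be counted in the next window, not this one.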

freshness

(source code)

Time elapsed, in seconds, between the latest record in a given time window and the end of that window. Suppose we calculate the freshness metric for the table above over the time window [2021-09-01T00:00:00, 2021-09-02T00:00:00). The latest record in that window is row 9, with created_at=2021-09-01T20:00:00. freshness is the difference, in seconds, between the end of the time window and the timestamp of that latest record. For this example, re_data would calculate freshness as:

2021-09-02T00:00:00 - 2021-09-01T20:00:00 = 14400 seconds
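The same subtraction can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `freshness` and its behavior for an empty window (returning None) are illustrative choices, not documented re_data behavior.

```python
from datetime import datetime


def freshness(timestamps, window_start, window_end):
    """Seconds between window_end and the latest timestamp inside the window.

    Returns None when no record falls in the window (an assumption of this
    sketch, not documented re_data behavior).
    """
    in_window = [t for t in timestamps if window_start <= t < window_end]
    if not in_window:
        return None
    return (window_end - max(in_window)).total_seconds()


# created_at values for rows 8, 9, 10 and 11 of the sample table.
created_at = [
    datetime(2021, 9, 1, 19), datetime(2021, 9, 1, 20),
    datetime(2021, 9, 1, 7), datetime(2021, 9, 2, 9),
]
result = freshness(created_at, datetime(2021, 9, 1), datetime(2021, 9, 2))
print(result)  # 14400.0
```

Row 11 is ignored because its timestamp falls outside the window, so row 9 remains the latest record.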

schema_changes

Information about schema changes in the monitored table.

Stored separately from the rest of the metrics in the re_data_schema_changes model.

caution

Schema changes differ from the other metrics. Because information about schema changes is gathered by comparing schemas between re_data runs, this metric doesn't filter changes to the specified time window and, in fact, doesn't use the time_window settings at all.
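To make the "comparing schemas between runs" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `diff_schemas` helper, the snapshot dictionaries, and the three kinds of change it reports are illustrative assumptions and do not describe the actual layout of the re_data_schema_changes model.

```python
def diff_schemas(previous, current):
    """Compare two {column_name: data_type} snapshots (hypothetical helper).

    Returns three sorted lists: added columns, removed columns, and columns
    whose data type differs between the two snapshots.
    """
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    retyped = sorted(
        col for col in set(previous) & set(current)
        if previous[col] != current[col]
    )
    return added, removed, retyped


# Hypothetical snapshots taken on two consecutive runs.
previous_run = {"title": "text", "rental_rate": "numeric", "rating": "text"}
current_run = {"title": "text", "rental_rate": "float", "created_at": "timestamp"}

changes = diff_schemas(previous_run, current_run)
print(changes)  # (['created_at'], ['rating'], ['rental_rate'])
```

Note that no timestamp column is consulted anywhere: the comparison depends only on the two snapshots, which is why a time window has no effect on this metric.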

Base column level metrics

min

(source code)

Minimum value appearing in a given numeric column.

min(rental_rate) = 0.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max

(source code)

Maximum value appearing in a given numeric column.

max(rental_rate) = 4.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

avg

(source code)

Average of all values appearing in a given numeric column.

avg(rental_rate) = 3.79 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

stddev

(source code)

The standard deviation of all values appearing in a given numeric column.

stddev(rental_rate) = 1.3984117975602022 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

variance

(source code)

The variance of all values appearing in a given numeric column.

variance(rental_rate) = 1.9555555555555557 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
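The numeric examples above (min, max, avg, stddev, variance) can be cross-checked with Python's standard statistics module. This is a sketch for verification only, not how re_data computes the values. One observation from the sample data: the documented stddev and variance match the sample formulas (dividing by n - 1), which is what statistics.stdev and statistics.variance compute.

```python
import statistics

# rental_rate values for rows 1-10 (the rows inside the 2021-09-01 window).
rental_rate = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]

metrics = {
    "min": min(rental_rate),
    "max": max(rental_rate),
    "avg": statistics.mean(rental_rate),
    "stddev": statistics.stdev(rental_rate),      # sample formula (n - 1)
    "variance": statistics.variance(rental_rate),  # sample formula (n - 1)
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The population formulas (statistics.pstdev and statistics.pvariance) would give smaller values, so they do not reproduce the documented numbers for this sample.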

min_length

(source code)

Minimum length of all strings appearing in a given column.

min_length(rating) = 1 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max_length

Maximum length of all strings appearing in a given column.

max_length(rating) = 5 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

avg_length

(source code)

The average length of all strings appearing in a given column.

avg_length(rating) = 2.7 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
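The three string-length metrics can be recomputed from the rating column of the ten rows inside the window. The snippet below is an illustrative Python sketch rather than re_data's implementation; for these rows the lengths sum to 27, so the average works out to 27 / 10 = 2.7.

```python
# rating values for rows 1-10 (the rows inside the 2021-09-01 window).
rating = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]

lengths = [len(value) for value in rating]

min_length = min(lengths)
max_length = max(lengths)
avg_length = sum(lengths) / len(lengths)

print(min_length, max_length, avg_length)  # 1 5 2.7
```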

nulls_count

(source code)

Number of null values in a given column.

nulls_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_count

(source code)

Number of null and empty-string values in a given column for the specific time range.

missing_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_percent

(source code)

Percentage of null and empty-string values in a given column for the specific time range.

missing_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

nulls_percent

(source code)

Percentage of null values in a given column for the specific time range.

nulls_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
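Because the sample rating column has no gaps, all four null and missing metrics are 0, which hides the difference between them. The Python sketch below adds a hypothetical column with one null and one empty string to show how the metrics diverge. The helper names, the 0-100 percentage scale, and the None result for an empty window are assumptions of this sketch, not confirmed re_data behavior.

```python
def nulls_count(values):
    """Number of None (null) values."""
    return sum(v is None for v in values)


def missing_count(values):
    """Number of values that are either None or an empty string."""
    return sum(v is None or v == "" for v in values)


def percent(part, total):
    """Share of `part` in `total` on a 0-100 scale (assumed scale).

    Returns None for an empty window (an assumption of this sketch).
    """
    return 100.0 * part / total if total else None


# rating values for rows 1-10 of the sample table: no nulls, no empty strings.
rating = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]
print(nulls_count(rating), missing_count(rating))  # 0 0

# Hypothetical column with one null and one empty string.
with_gaps = ["R", None, "", "PG", "G"]
print(nulls_count(with_gaps), missing_count(with_gaps))  # 1 2
print(percent(nulls_count(with_gaps), len(with_gaps)))    # 20.0
print(percent(missing_count(with_gaps), len(with_gaps)))  # 40.0
```

An empty string counts as missing but not as null, so missing_count is always greater than or equal to nulls_count for the same column and window.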