Base metrics

Base metrics are computed for all monitored tables. If you would rather not compute some of them, you can change the list of base metrics via the re_data:metrics_base variable.

Sample table for example metrics

__      title               rental_rate rating      created_at
1       Chamber Italian     4.99        NC-17       2021-09-01T11:00:00
2       Grosse Wonderful    4.99        R           2021-09-01T12:00:00
3       Airport Pollock     4.99        R           2021-09-01T15:00:00
4       Bright Encounters   4.99        PG-13       2021-09-01T09:00:00
5       Academy Dinosaur    0.99        PG-13       2021-09-01T08:00:00
6       Ace Goldfinger      4.99        G           2021-09-01T10:00:00
7       Adaptation Holes    2.99        NC-17       2021-09-01T11:00:00
8       Affair Prejudice    2.99        G           2021-09-01T19:00:00
9       African Egg         2.99        G           2021-09-01T20:00:00
10      Agent Truman        2.99        PG          2021-09-01T07:00:00
11      Airplane Sierra     4.99        PG-13       2021-09-02T09:00:00
12      Alabama Devil       2.99        PG-13       2021-09-02T10:00:00
13      Aladdin Calendar    4.99        NC-17       2021-09-02T11:00:00
14      Alamo Videotape     0.99        G           2021-09-02T12:00:00
15      Alaska Phantom      0.99        PG          2021-09-02T13:00:00
16      Date Speed          0.99        R           2021-09-02T14:00:00
17      Ali Forever         4.99        PG          2021-09-02T15:00:00
18      Alice Fantasia      0.99        NC-17       2021-09-02T16:00:00
19      Alien Center        2.99        NC-17       2021-09-02T17:00:00

Below is a list of currently available metrics and how they are computed internally by re_data:

Base table level metrics

row_count

(source code)

Number of rows added to the table in a specific time range.

row_count = 10 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
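The example value can be reproduced outside the warehouse. The following is a minimal Python sketch, not re_data's own implementation (re_data computes metrics in SQL); the function name row_count and the list created_at are chosen here for illustration, and the window is treated as half-open, matching the >= and < bounds above.

```python
from datetime import datetime

# created_at values of the 19 sample rows, in row order (rows 1-10, then 11-19).
created_at = (
    [datetime(2021, 9, 1, h) for h in (11, 12, 15, 9, 8, 10, 11, 19, 20, 7)]
    + [datetime(2021, 9, 2, h) for h in range(9, 18)]
)


def row_count(timestamps, start, end):
    """Count rows whose timestamp lies in the half-open window [start, end)."""
    return sum(start <= ts < end for ts in timestamps)


window_start = datetime(2021, 9, 1)
window_end = datetime(2021, 9, 2)
print(row_count(created_at, window_start, window_end))  # 10
```

Because the upper bound is exclusive, a row stamped exactly 2021-09-02T00:00:00 would be counted in the next day's window, not this one.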

freshness

Information about the latest record in a given time frame. Suppose we calculate the freshness metric for the table above over the time window [2021-09-01T00:00:00, 2021-09-02T00:00:00). The latest record in that time frame is row 9, with created_at=2021-09-01T20:00:00. freshness is the difference, in seconds, between the end of the time window and the timestamp of the latest record in the time frame. For this example, re_data would calculate freshness as:

2021-09-02T00:00:00 - 2021-09-01T20:00:00 = 14400
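The same subtraction can be written as a short Python sketch. This is an illustration of the definition above, not re_data's SQL; the function name freshness is hypothetical, and returning None for a window without rows is an assumption made here, since the text does not say what happens in that case.

```python
from datetime import datetime

# created_at values of the 19 sample rows, in row order (rows 1-10, then 11-19).
created_at = (
    [datetime(2021, 9, 1, h) for h in (11, 12, 15, 9, 8, 10, 11, 19, 20, 7)]
    + [datetime(2021, 9, 2, h) for h in range(9, 18)]
)


def freshness(timestamps, start, end):
    """Seconds between the window end and the latest record in [start, end)."""
    in_window = [ts for ts in timestamps if start <= ts < end]
    if not in_window:
        return None  # assumption: no rows in the window, so no value is reported
    return (end - max(in_window)).total_seconds()


print(freshness(created_at, datetime(2021, 9, 1), datetime(2021, 9, 2)))  # 14400.0
```

For the following day's window the latest record is row 19 at 17:00:00, so the same function yields 25200 seconds.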

schema_changes

Information about schema changes in the monitored table.

Stored separately from the rest of the metrics in the re_data_schema_changes model.

caution

The schema_changes metric differs from the rest. Because schema changes are detected by comparing schemas between re_data runs, this metric does not filter changes to the specified time window and, in fact, does not use the time_window settings at all.
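To make the run-to-run comparison concrete, here is a rough Python sketch of diffing two schema snapshots. Everything in it is hypothetical: the function name diff_schemas, the snapshot shape (a mapping of column name to data type), the operation labels, and the example schemas are illustrative assumptions and do not describe the actual contents of the re_data_schema_changes model.

```python
def diff_schemas(previous, current):
    """Compare two {column_name: data_type} snapshots from consecutive runs."""
    changes = []
    for column in current.keys() - previous.keys():
        changes.append(("column_added", column, None, current[column]))
    for column in previous.keys() - current.keys():
        changes.append(("column_removed", column, previous[column], None))
    for column in previous.keys() & current.keys():
        if previous[column] != current[column]:
            changes.append(("type_change", column, previous[column], current[column]))
    # Sort by (operation, column) so the output order is deterministic.
    return sorted(changes, key=lambda change: (change[0], change[1]))


# Hypothetical snapshots taken on two consecutive runs.
previous_run = {"title": "text", "rental_rate": "numeric", "rating": "text"}
current_run = {
    "title": "text",
    "rental_rate": "double precision",
    "rating": "text",
    "created_at": "timestamp",
}

for change in diff_schemas(previous_run, current_run):
    print(change)
```

Note that nothing in the comparison depends on row timestamps, which is why a time window has no meaning for this metric.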

Base column level metrics

min

(source code)

Minimum value appearing in a given numeric column.

min(rental_rate) = 0.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max

(source code)

Maximum value appearing in a given numeric column.

max(rental_rate) = 4.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

avg

(source code)

Average of all values appearing in a given numeric column.

avg(rental_rate) = 3.79 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

stddev

(source code)

The standard deviation of all values appearing in a given numeric column.

stddev(rental_rate) = 1.3984117975602022 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

variance

(source code)

The variance of all values appearing in a given numeric column.

variance(rental_rate) = 1.9555555555555557 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
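The five numeric examples above can be checked with Python's standard library. This is a sketch for verification only, not how re_data computes the metrics; it uses the rental_rate values of rows 1 to 10, the rows inside the example window. The stddev and variance values shown above match the sample formulas (dividing by n - 1), which is what statistics.stdev and statistics.variance implement; the population variance of the same values would be about 1.76.

```python
import statistics

# rental_rate values of rows 1-10 (the rows inside the example time window).
rental_rate = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]

metrics = {
    "min": min(rental_rate),
    "max": max(rental_rate),
    "avg": statistics.mean(rental_rate),
    "stddev": statistics.stdev(rental_rate),       # sample standard deviation (n - 1)
    "variance": statistics.variance(rental_rate),  # sample variance (n - 1)
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Whether a given warehouse's stddev and variance functions default to the sample or the population formula depends on the database, so results there may differ from this sketch.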

min_length

(source code)

Minimum length of all strings appearing in a given column.

min_length(rating) = 1 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

max_length

Maximum length of all strings appearing in a given column.

max_length(rating) = 5 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

avg_length

(source code)

The average length of all strings appearing in a given column.

avg_length(rating) = 2.7 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
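The three length metrics can be recomputed from the rating values of rows 1 to 10 in the sample table. The Python sketch below is only an illustration of the definitions; it measures length in characters with len, which is an assumption about how string length is counted.

```python
# rating values of rows 1-10 (the rows inside the example time window).
rating = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]

# Character length of every string in the column.
lengths = [len(value) for value in rating]

min_length = min(lengths)
max_length = max(lengths)
avg_length = sum(lengths) / len(lengths)

print(min_length, max_length, avg_length)
```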

nulls_count

(source code)

The number of null values in a given column.

nulls_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_count

(source code)

The number of null and empty-string values in a given column for the specific time range.

missing_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

missing_percent

(source code)

The percentage of null and empty-string values in a given column for the specific time range.

missing_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00

nulls_percent

(source code)

The percentage of null values in a given column for the specific time range.

nulls_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
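Since the sample rating column contains no nulls or empty strings, all four metrics are zero there. The Python sketch below shows how the four relate, using both the sample column and a small hypothetical column that does contain gaps. It rests on assumptions not stated above: percentages are expressed on a 0 to 100 scale, only the exact empty string "" counts as missing (not whitespace), and the function name null_and_missing_metrics is made up for this illustration.

```python
def null_and_missing_metrics(values):
    """Compute null/missing counts and percentages for one column."""
    total = len(values)
    nulls = sum(value is None for value in values)
    # "Missing" covers nulls plus empty strings.
    missing = sum(value is None or value == "" for value in values)

    def percent(count):
        # Assumption: 0-100 scale; undefined (None) when the window has no rows.
        return 100.0 * count / total if total else None

    return {
        "nulls_count": nulls,
        "missing_count": missing,
        "nulls_percent": percent(nulls),
        "missing_percent": percent(missing),
    }


# rating values of rows 1-10: no nulls and no empty strings.
rating = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]
print(null_and_missing_metrics(rating))

# Hypothetical column with two nulls and one empty string.
with_gaps = ["G", None, "", "PG", None]
print(null_and_missing_metrics(with_gaps))
```

In the hypothetical column, missing_count is always at least nulls_count, because every null is also counted as missing.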