
Base metrics

Base metrics are computed for all monitored tables. If you would rather not compute some of them, you can change the base metrics list via the re_data:metrics_base variable.

Sample table for example metrics
row  title               rental_rate  rating  created_at
1    Chamber Italian     4.99         NC-17   2021-09-01T11:00:00
2    Grosse Wonderful    4.99         R       2021-09-01T12:00:00
3    Airport Pollock     4.99         R       2021-09-01T15:00:00
4    Bright Encounters   4.99         PG-13   2021-09-01T09:00:00
5    Academy Dinosaur    0.99         PG-13   2021-09-01T08:00:00
6    Ace Goldfinger      4.99         G       2021-09-01T10:00:00
7    Adaptation Holes    2.99         NC-17   2021-09-01T11:00:00
8    Affair Prejudice    2.99         G       2021-09-01T19:00:00
9    African Egg         2.99         G       2021-09-01T20:00:00
10   Agent Truman        2.99         PG      2021-09-01T07:00:00
11   Airplane Sierra     4.99         PG-13   2021-09-02T09:00:00
12   Alabama Devil       2.99         PG-13   2021-09-02T10:00:00
13   Aladdin Calendar    4.99         NC-17   2021-09-02T11:00:00
14   Alamo Videotape     0.99         G       2021-09-02T12:00:00
15   Alaska Phantom      0.99         PG      2021-09-02T13:00:00
16   Date Speed          0.99         R       2021-09-02T14:00:00
17   Ali Forever         4.99         PG      2021-09-02T15:00:00
18   Alice Fantasia      0.99         NC-17   2021-09-02T16:00:00
19   Alien Center        2.99         NC-17   2021-09-02T17:00:00

Below is a list of currently available metrics and how they are computed internally by re_data:

Base table-level metrics

row_count

Number of rows added to the table in a specific time range.

row_count = 10 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
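A minimal Python sketch of the windowing behind this number, assuming the half-open window [start, end) stated above. It only illustrates the arithmetic on the sample table; re_data computes the metric itself, and the helper name `row_count` here is hypothetical.

```python
from datetime import datetime

# created_at hours of the sample rows: rows 1-10 fall on 2021-09-01, rows 11-19 on 2021-09-02
DAY1_HOURS = [11, 12, 15, 9, 8, 10, 11, 19, 20, 7]
DAY2_HOURS = [9, 10, 11, 12, 13, 14, 15, 16, 17]
CREATED_AT = [datetime(2021, 9, 1, h) for h in DAY1_HOURS] + [
    datetime(2021, 9, 2, h) for h in DAY2_HOURS
]


def row_count(timestamps, start, end):
    """Hypothetical helper: count rows whose timestamp lies in [start, end)."""
    return sum(1 for ts in timestamps if start <= ts < end)


window_start = datetime(2021, 9, 1)
window_end = datetime(2021, 9, 2)
print(row_count(CREATED_AT, window_start, window_end))  # 10
```

Because the end bound is exclusive, a row stamped exactly 2021-09-02T00:00:00 would be counted in the following day's window rather than this one.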

freshness

Information about the latest record in a given time frame: the time, in seconds, between the end of the time window and the latest record within that window. Suppose we calculate the freshness metric for the table above over the time window [2021-09-01T00:00:00, 2021-09-02T00:00:00). The latest record in that window is row 9, with created_at=2021-09-01T20:00:00, so re_data would calculate freshness as:

2021-09-02T00:00:00 - 2021-09-01T20:00:00 = 14400
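The same subtraction can be sketched in Python with the standard `datetime` module. The function name is hypothetical, and returning `None` for a window with no records is an assumption made for the sketch, not documented re_data behavior.

```python
from datetime import datetime

# created_at values of the sample rows (rows 1-10 on 2021-09-01, rows 11-19 on 2021-09-02)
CREATED_AT = [datetime(2021, 9, 1, h) for h in (11, 12, 15, 9, 8, 10, 11, 19, 20, 7)] + [
    datetime(2021, 9, 2, h) for h in range(9, 18)
]


def freshness(timestamps, start, end):
    """Seconds between the window end and the latest timestamp inside [start, end)."""
    in_window = [ts for ts in timestamps if start <= ts < end]
    if not in_window:
        return None  # assumption: an empty window yields no freshness value
    return (end - max(in_window)).total_seconds()


print(freshness(CREATED_AT, datetime(2021, 9, 1), datetime(2021, 9, 2)))  # 14400.0
```

For the next day's window, the latest record is row 19 at 17:00, which gives 7 hours, or 25200 seconds.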

schema_changes

Information about schema changes in the monitored table.

Stored separately from the rest of the metrics in the re_data_schema_changes model.

caution

Schema changes differ from the rest of the metrics. Because information about schema changes is gathered by comparing schemas between re_data runs, this metric doesn't filter changes to the specified time window and, in fact, doesn't use the time_window settings at all.
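To make the "compare schemas between runs" idea concrete, here is a hypothetical Python sketch that diffs two `{column_name: data_type}` snapshots. The change labels, the tuple layout, the data types, and the `release_year` column are all invented for illustration; the actual re_data_schema_changes model may record changes differently.

```python
def diff_schemas(previous, current):
    """Compare two {column_name: data_type} snapshots from consecutive runs.

    The change labels below are hypothetical, not re_data's actual values.
    """
    changes = []
    for column in sorted(current.keys() - previous.keys()):
        changes.append(("column_added", column, None, current[column]))
    for column in sorted(previous.keys() - current.keys()):
        changes.append(("column_removed", column, previous[column], None))
    for column in sorted(previous.keys() & current.keys()):
        if previous[column] != current[column]:
            changes.append(("type_changed", column, previous[column], current[column]))
    return changes


# Hypothetical snapshots of the sample table; the data types are illustrative only.
previous_run = {"title": "text", "rental_rate": "numeric", "rating": "text", "created_at": "timestamp"}
current_run = {"title": "text", "rental_rate": "double precision", "rating": "text",
               "created_at": "timestamp", "release_year": "integer"}

for change in diff_schemas(previous_run, current_run):
    print(change)
```

Note that no timestamps appear anywhere in the comparison, which is why this metric ignores the time window.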

Base column-level metrics

min

Minimum value appearing in a given numeric column.

min(rental_rate) = 0.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
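A small Python illustration using the ten rental_rate values in the window (rows 1-10). Skipping `None` mirrors how SQL `MIN` ignores NULLs; the helper name is hypothetical and this is not re_data's code.

```python
# rental_rate of rows 1-10, the rows whose created_at falls on 2021-09-01
RENTAL_RATE = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]


def column_min(values):
    """Smallest non-null value, or None if the window has no non-null values."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


print(column_min(RENTAL_RATE))  # 0.99
```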

max

Maximum value appearing in a given numeric column.

max(rental_rate) = 4.99 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
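The counterpart sketch for the maximum, again on the in-window rental_rate values and again skipping nulls as SQL `MAX` does. Here `max(..., default=None)` covers the empty case without a separate branch; the helper name is hypothetical.

```python
# rental_rate of rows 1-10, the rows whose created_at falls on 2021-09-01
RENTAL_RATE = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]


def column_max(values):
    """Largest non-null value; None when the window has no non-null values."""
    return max((v for v in values if v is not None), default=None)


print(column_max(RENTAL_RATE))  # 4.99
```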

avg

Average of all values appearing in a given numeric column.

avg(rental_rate) = 3.79 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
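The arithmetic behind 3.79 is (5 × 4.99 + 0.99 + 4 × 2.99) / 10 = 37.90 / 10. A minimal Python sketch follows; excluding nulls from both the sum and the count matches SQL `AVG`, and the result is rounded only because binary floating point cannot represent 3.79 exactly. The helper name is hypothetical.

```python
# rental_rate of rows 1-10, the rows whose created_at falls on 2021-09-01
RENTAL_RATE = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]


def column_avg(values):
    """Mean of the non-null values; NULLs are excluded from both sum and count."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None


print(round(column_avg(RENTAL_RATE), 2))  # 3.79
```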

stddev

The sample standard deviation of all values appearing in a given numeric column.

stddev(rental_rate) = 1.3984117975602022 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
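The example value matches the sample formula, which divides by n − 1: the squared deviations from the mean 3.79 sum to 17.6, 17.6 / 9 ≈ 1.9556, and its square root is ≈ 1.3984. A hand-written Python sketch of that formula (hypothetical helper, nulls skipped as in the other metrics):

```python
import math

# rental_rate of rows 1-10, the rows whose created_at falls on 2021-09-01
RENTAL_RATE = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]


def sample_stddev(values):
    """Sample standard deviation (n - 1 denominator) of the non-null values."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 2:
        return None  # undefined with fewer than two values
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


print(round(sample_stddev(RENTAL_RATE), 4))  # 1.3984
```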

variance

The sample variance of all values appearing in a given numeric column.

variance(rental_rate) = 1.9555555555555557 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
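Python's standard `statistics` module can confirm which variance the example uses: `statistics.variance` divides by n − 1 and reproduces the value above, while `statistics.pvariance` divides by n and gives 17.6 / 10 = 1.76 instead. This is only a cross-check on the sample data, not how re_data computes the metric.

```python
import statistics

# rental_rate of rows 1-10, the rows whose created_at falls on 2021-09-01
RENTAL_RATE = [4.99, 4.99, 4.99, 4.99, 0.99, 4.99, 2.99, 2.99, 2.99, 2.99]

sample_var = statistics.variance(RENTAL_RATE)       # divides by n - 1
population_var = statistics.pvariance(RENTAL_RATE)  # divides by n

print(round(sample_var, 4))      # 1.9556
print(round(population_var, 4))  # 1.76
```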

min_length

Minimum length of all strings appearing in a given column.

min_length(rating) = 1 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
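A Python sketch over the ten rating values in the window; the shortest ones ("R" and "G") have length 1. Null values are skipped, which is an assumption consistent with SQL aggregate semantics, and the helper name is hypothetical.

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def min_length(values):
    """Length of the shortest non-null string, or None if there is none."""
    lengths = [len(v) for v in values if v is not None]
    return min(lengths) if lengths else None


print(min_length(RATING))  # 1
```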

max_length

Maximum length of all strings appearing in a given column.

max_length(rating) = 5 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
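The same idea for the longest string: "NC-17" and "PG-13" both have 5 characters. A hypothetical Python helper, skipping nulls:

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def max_length(values):
    """Length of the longest non-null string, or None if there is none."""
    return max((len(v) for v in values if v is not None), default=None)


print(max_length(RATING))  # 5
```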

avg_length

The average length of all strings appearing in a given column.

avg_length(rating) = 2.7 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
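The ten rating values in the window have lengths 5, 1, 1, 5, 5, 1, 5, 1, 1 and 2, which sum to 27, so their average length is 27 / 10 = 2.7. A Python sketch of that calculation (hypothetical helper, nulls skipped):

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def avg_length(values):
    """Mean string length over the non-null values, or None if there are none."""
    lengths = [len(v) for v in values if v is not None]
    return sum(lengths) / len(lengths) if lengths else None


print(avg_length(RATING))  # 2.7
```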

nulls_count

Number of nulls in a given column for the specific time range.

nulls_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
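The sample window contains no nulls, so the count is 0. The short Python sketch below also runs on a hypothetical list to show that an empty string is not counted as a null; the helper name is illustrative.

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def nulls_count(values):
    """Number of None (SQL NULL) entries."""
    return sum(1 for v in values if v is None)


print(nulls_count(RATING))                # 0
print(nulls_count(["G", None, "", "R"]))  # 1 (hypothetical values)
```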

missing_count

Number of nulls and empty string values in a given column for the specific time range.

missing_count(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
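missing_count widens nulls_count to also include empty strings. In this Python sketch the same hypothetical list now yields 2; whitespace-only strings are deliberately not treated as missing, since the description above mentions only nulls and empty strings.

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def missing_count(values):
    """Number of entries that are None or the empty string."""
    return sum(1 for v in values if v is None or v == "")


print(missing_count(RATING))                # 0
print(missing_count(["G", None, "", "R"]))  # 2 (hypothetical values)
```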

missing_percent

Percentage of nulls and empty string values in a given column for the specific time range.

missing_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
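A Python sketch of the percentage, assuming the denominator is every row in the window and the result is on a 0-100 scale; both choices, and returning `None` for an empty window, are assumptions for illustration rather than confirmed re_data behavior.

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def missing_percent(values):
    """Share of None or empty-string entries, on an assumed 0-100 scale."""
    if not values:
        return None  # assumption: undefined when the window has no rows
    missing = sum(1 for v in values if v is None or v == "")
    return 100.0 * missing / len(values)


print(missing_percent(RATING))                # 0.0
print(missing_percent(["G", None, "", "R"]))  # 50.0 (hypothetical values)
```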

nulls_percent

Percentage of null values in a given column for the specific time range.

nulls_percent(rating) = 0 where time window is >= 2021-09-01T00:00:00 and < 2021-09-02T00:00:00
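The null-only variant, under the same stated assumptions (all rows in the window as the denominator, a 0-100 scale, `None` for an empty window). On the hypothetical list, one null out of four entries gives 25.0, while the empty string is ignored.

```python
# rating values of rows 1-10 (created_at on 2021-09-01)
RATING = ["NC-17", "R", "R", "PG-13", "PG-13", "G", "NC-17", "G", "G", "PG"]


def nulls_percent(values):
    """Share of None entries, on an assumed 0-100 scale."""
    if not values:
        return None  # assumption: undefined when the window has no rows
    return 100.0 * sum(1 for v in values if v is None) / len(values)


print(nulls_percent(RATING))                # 0.0
print(nulls_percent(["G", None, "", "R"]))  # 25.0 (hypothetical values)
```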