Quickstart - dbt users
This introduction assumes you are already using dbt in your company and tables you would like to monitor are managed by dbt. To fully use re_data you would need to install both:
We'll go over the steps required to do that & explain what possibilities those packages create for you.
Installing re_data dbt package
Add the re_data dbt package to your main dbt repo project.
You need to update your packages.yml
file with re_data package like that:
packages:
***
- package: re-data/re_data
version: [">=0.10.0", "<0.11.0"]
And then install dbt packages dependencies by running:
dbt deps
You can do that locally, in your dbt cloud environment, or Airflow etc. scheduler enviornment.
On production, you most likely are already running dbt deps
as part of dbt models computation. So this step maybe only necessary for your local environment.
Configuring tables
Computing metrics & anomalies for your dbt models & sources requires configuring them to be observed by re_data. You can do it in a couple of ways, all of them described in re_data configuration reference part. A simple configuration for a single model contains just information that the model should be monitored & timestamp expression (usually column name) to be used when computing re_data time-based stats.
{{
config(
re_data_monitored=true,
re_data_time_filter='time_column_name',
)
}}
select ...
dbt package functionality
Let's go over some of the things you already can use with re_data dbt package.
For specifics look into reference section:
- re_data dbt models
- re_data metrics
- re_data asserts
- re_data tests history
- re_data data cleaning, filtering, normalization, validation macros
dbt auto generated documentation, together with our models graph is also available: here
re_data macros don't require any configuration and can be just used after you add re_data into your environment.
Computing first metrics
To compute re_data models containing metrics & anomalies you can just run standard dbt command.
dbt run --models package:re_data
single re_data run produces single data points about your tables for a time window. The default time window when you run re_data without parameters is yesterday. (from yesterday's 00:00 AM up until today 00:00 AM) To compare tables over time you would need to run the re_data dbt package multiple times (by some scheduler, re_data python package or manually).
The following would create tables inside your {default_schema}_re
schema of your database. This is configured in dbt and can be overwritten in your dbt_project.yml
.
Storing tests history (optional)
re_data enables you to store dbt tests results to investigate them later on. You can enable this functionality by setting:
vars:
re_data:save_test_history: true
In your dbt_project.yml
file. After that when you run:
dbt test
re_data will store test history and with option --store-failures
is added, it will also store failures in re_data_test_history
model.
Installing re_data python package
To generate re_data reliability UI, send re_data alerts to Slack and easily backfill re_data models you will need to install the re_data python library. For this step, you need to have a python environment (with dbt installation) setup. Install re_data by executing:
pip install re_data
re_data python library should be installed in the same python environment where your dbt is installed. re_data makes use of dbt to run queries against your database. Because of that, you don't need to pass any DB credentials to re_data configuration. re_data by default will run dbt with the same credentials & profiles which you have in your dbt_project.yml
and ~/.dbt/profiles.yml
files. You can also change this behaviour by passing options to the re_data command.
Python package functionality
Python package add enabled you to use this functionality:
- re_data overview UI - for generating & displaying re_data UI
- re_data notify - for notifying external services about alerts (currently Slack)
- re_data run - for easily backfilling re_data dbt data
Generate & Serve UI
Let's go over 2 commands for generating & serving UI. It works quite similarly to dbt docs. First you create files by calling re_data overview generate
and then serving already existing files by re_data overview serve
. For more details on paramters accepted by this & other re_data commands check re_data CLI reference
re_data overview generate
re_data overview serve
Learning more
More detailed instrutions on running re_data are described in out toy_shop example tutorial 😊