Welcome to toy shop!
Welcome to a re_data getting started tutorial. We will prepare, analyze and monitor toy shop data here. The toy shop is a fictional e-commerce shop 🙂
It's an example of how a re_data project can work and help you improve data in your data warehouse
Setting up toy shop project
Install re_data if you don't have it already
pip install re_data
Set up a dbt project containing the toy shop data using the re_data CLI.
re_data init toy_shop
cd toy_shop/
Toy shop data
You would observe that the project has two seed files included:
- toy_shop/seeds/customers.csv
- toy_shop/seeds/orders.csv
And it also contains one model:
- toy_shop/models/pending_orders_per_customer.sql
Both seeds & model are already configured to be monitored by re_data, we will describe this configuration later on during the tutorial
We use seeds instead of sources much more often than you would normally do in dbt. This is due to the convenient setup dbt offers for seeds
Profile setup
For re_data to work you will need dbt connection with the name toy_shop
(project name we used) Here are examples of how it could look like in dbs supported by re_data, you can also check more details on connection configuration in dbt profiles docs.
- BigQuery
- Snowflake
- Redshift
- Postgres
toy_shop:
target: dev
outputs:
dev:
type: snowflake
account: xxx
user: xxx
password: xxx
database: database
warehouse: warehouse
schema: toy_shop
toy_shop:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: xxx
schema: toy_shop
location: US
threads: 4
toy_shop:
target: dev
outputs:
dev:
type: redshift
host: xxx
user: xxx
password: xxx
port: 5439
dbname: xxx
schema: toy_shop
threads: 4
toy_shop:
target: dev
outputs:
dev:
type: postgres
host: xxx
user: xxx
password: xxx
port: 5432
dbname: xxx
schema: toy_shop
threads: 4
Loading data & creating models
Now you are ready to load toy_shop seed data & create project models.
# load seed files into the database
dbt seed
# Compute models for toy_shop project, only pending_orders_per_customer table in this case.
dbt run --select package:toy_shop
After this step 3 tables should be available in the data warehouse of your choice:
toy_shop=> SELECT * FROM toy_shop.orders;
id | customer_id | status | amount | time_created
-----+-------------+-----------------+--------+---------------------
1 | 2 | PAID | 20000 | 2021-01-02 14:10:54
2 | 3 | SHIPPED | 20000 | 2021-01-06 06:39:15
3 | 4 | DELIVERED | 40000 | 2021-01-10 20:46:55
4 | 5 | PENDING_PAYMENT | 20000 | 2021-01-10 12:15:55
5 | 6 | PAID | 25000 | 2021-01-09 21:38:54
..
..
..
toy_shop=> SELECT * FROM toy_shop.customers;
id | age | name
----+-----+-------------------
1 | 25 | Matias Douglas
2 | 38 | Raelyn Harrison
3 | 34 | Anaya Reed
4 | 46 | Mario Harris
5 | 28 | John Roberts
..
..
..
toy_shop=> SELECT * FROM toy_shop.pending_orders_per_customer;
id | amount | status | time_created | customer_id | age
-----+--------+-----------------+---------------------+-------------+-----
4 | 20000 | PENDING_PAYMENT | 2021-01-10 12:15:55 | 5 | 28
8 | 5000 | PENDING_PAYMENT | 2021-01-05 11:41:49 | 9 | 60
12 | 20000 | PENDING_PAYMENT | 2021-01-08 13:10:48 | 13 | 38
16 | 30000 | PENDING_PAYMENT | 2021-01-05 13:57:46 | 2 | 38
20 | 30000 | PENDING_PAYMENT | 2021-01-09 20:07:28 | 6 | 39
24 | 10000 | PENDING_PAYMENT | 2021-01-06 06:42:35 | 10 | 29
28 | 45000 | PENDING_PAYMENT | 2021-01-02 10:03:27 | 14 | 20
..
..
..