Commit 421e097: Prepare repo for assignment

1 parent f72efd2

30 files changed: +4287, -281 lines

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+* @dbt-labs/dx

.github/workflows/validate.yml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+name: Validate dbt project
+
+on: push
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+      - name: Install the project
+        run: uv sync --all-extras --dev
+      - name: Run DBT
+        run: uv run dbt build
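The two `run` steps of this workflow can be reproduced locally before pushing. The helper below is a minimal sketch, assuming `uv` is already installed and the commands are run from the repository root; the function name `validate_locally` is hypothetical and not part of the commit.

```shell
# Hypothetical helper mirroring the two `run` steps of
# .github/workflows/validate.yml so they can be tried before pushing.
validate_locally() {
  # "Install the project" step: all optional extras plus dev dependencies.
  uv sync --all-extras --dev || return 1
  # "Run DBT" step: seed, run and test every node in the project.
  uv run dbt build
}
```

Calling `validate_locally` stops after a failed `uv sync` instead of running `dbt build` against a half-installed environment.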

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 
 target/
+dbt_packages/
 dbt_modules/
 logs/
 **/.DS_Store
+.user.yml
+venv/
+env/
+**/*.duckdb
+**/*.duckdb.wal
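To check that the new DuckDB patterns take effect, `git check-ignore` can be pointed at the database files. A minimal sketch, assuming the database is written to `jaffle_shop.duckdb` in the repository root, as the README's query examples suggest; `check_duckdb_ignored` is a hypothetical name.

```shell
# Hypothetical helper: confirm the new .gitignore patterns match the local
# DuckDB artifacts. `git check-ignore -v` prints the .gitignore source,
# line number and pattern that matched each ignored path, and exits
# non-zero when none of the given paths are ignored.
check_duckdb_ignored() {
  git check-ignore -v jaffle_shop.duckdb jaffle_shop.duckdb.wal
}
```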

.sqlfluff

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[sqlfluff]
+
+dialect = duckdb
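Because `.sqlfluff` sets `dialect = duckdb` at the repository root, SQLFluff can lint the models without a `--dialect` flag. A minimal sketch, assuming `sqlfluff` is installed in the uv-managed environment (this diff does not show it as a dependency); `lint_models` is a hypothetical name.

```shell
# Hypothetical helper: lint the dbt models with SQLFluff.
# SQLFluff picks up `dialect = duckdb` from the .sqlfluff file in the
# repository root, so no --dialect flag is passed here.
# Assumes sqlfluff is installed in the uv-managed environment.
lint_models() {
  uv run sqlfluff lint models/
}
```

`sqlfluff fix` is the companion command that rewrites files to resolve fixable violations.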
File renamed without changes.

README.md

Lines changed: 82 additions & 58 deletions
@@ -1,78 +1,102 @@
-## Testing dbt project: `jaffle_shop`
+# Storio - Data Principal Engineer Assignment
 
-`jaffle_shop` is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers and orders model ready for analytics.
+This repo contains a scuffed version of `jaffle_shop`, a fictional ecommerce store. This project will be used to test your refactoring skills. In a nutshell: This dbt project transforms raw data from an app database into models ready for analytics.
+However, the project is not well-structured, and the code is not very readable. Your task is to refactor the code to make it more readable and maintainable.
+Things to consider:
 
-### What is this repo?
-What this repo _is_:
-- A self-contained playground dbt project, useful for testing out scripts, and communicating some of the core dbt concepts.
+- Warehouse layers
+- Code readability
+- Testing
 
-What this repo _is not_:
-- A tutorial — check out the [Getting Started Tutorial](https://docs.getdbt.com/tutorial/setting-up) for that. Notably, this repo contains some anti-patterns to make it self-contained, namely the use of seeds instead of sources.
-- A demonstration of best practices — check out the [dbt Learn Demo](https://github.com/fishtown-analytics/dbt-learn-demo-v2-archive) repo instead. We want to keep this project as simple as possible. As such, we chose not to implement:
-- our standard file naming patterns (which make more sense on larger projects, rather than this five-model project)
-- a pull request flow
-- CI/CD integrations
-- A demonstration of using dbt for a high-complex project, or a demo of advanced features (e.g. macros, packages, hooks, operations) — we're just trying to keep things simple here!
+The project contains [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds) that includes some (fake) raw data from a fictional app along with some basic dbt [models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models), tests, and docs for this data.
 
-### What's in this repo?
-This repo contains [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds) that includes some (fake) raw data from a fictional app.
+## Running this project
 
-The raw data consists of customers, orders, and payments, with the following entity-relationship diagram:
+Prerequisities: Python >= 3.8
 
-![Jaffle Shop ERD](/etc/jaffle_shop_erd.png)
+1. Install the project in a virtual environment using your favorite python/env management tool
+- `uv`
+- `pipenv`
+- `poetry`
+- `venv`
+- ...
+2. (`uv run`) `dbt build`
+3. (`uv run`) `dbt docs generate`
+4. (`uv run`) `dbt docs serve`
 
+## Verifying your environment
 
-### Running this project
-To get up and running with this project:
-1. Install dbt using [these instructions](https://docs.getdbt.com/docs/installation).
+1. Ensure your [profile](https://docs.getdbt.com/reference/profiles.yml) is setup correctly from the command line:
 
-2. Clone this repository.
+```shell
+dbt --version
+dbt debug
+```
 
-3. Change into the `jaffle_shop` directory from the command line:
-```bash
-$ cd jaffle_shop
-```
+2. Load the CSVs with the demo data set, run the models, and test the output of the models using the [dbt build](https://docs.getdbt.com/reference/commands/build) command:
 
-4. Set up a profile called `jaffle_shop` to connect to a data warehouse by following [these instructions](https://docs.getdbt.com/docs/configure-your-profile). If you have access to a data warehouse, you can use those credentials – we recommend setting your [target schema](https://docs.getdbt.com/docs/configure-your-profile#section-populating-your-profile) to be a new schema (dbt will create the schema for you, as long as you have the right privileges). If you don't have access to an existing data warehouse, you can also setup a local postgres database and connect to it in your profile.
+```shell
+dbt build
+```
 
-5. Ensure your profile is setup correctly from the command line:
-```bash
-$ dbt debug
-```
+3. Query the data:
 
-6. Load the CSVs with the demo data set. This materializes the CSVs as tables in your target schema. Note that a typical dbt project **does not require this step** since dbt assumes your raw data is already in your warehouse.
-```bash
-$ dbt seed
-```
+Launch a DuckDB command-line interface (CLI):
 
-7. Run the models:
-```bash
-$ dbt run
-```
+```shell
+duckcli jaffle_shop.duckdb
+```
 
-> **NOTE:** If this steps fails, it might mean that you need to make small changes to the SQL in the models folder to adjust for the flavor of SQL of your target database. Definitely consider this if you are using a community-contributed adapter.
+Run a query at the prompt and exit:
 
-8. Test the output of the models:
-```bash
-$ dbt test
-```
+```
+select * from customers_with_order_info where customer_id = 42;
+exit;
+```
 
-9. Generate documentation for the project:
-```bash
-$ dbt docs generate
-```
+Alternatively, use a single-liner to perform the query:
 
-10. View the documentation for the project:
-```bash
-$ dbt docs serve
-```
+```shell
+duckcli jaffle_shop.duckdb -e "select * from customers_with_order_info where customer_id = 42"
+```
 
-### What is a jaffle?
-A jaffle is a toasted sandwich with crimped, sealed edges. Invented in Bondi in 1949, the humble jaffle is an Australian classic. The sealed edges allow jaffle-eaters to enjoy liquid fillings inside the sandwich, which reach temperatures close to the core of the earth during cooking. Often consumed at home after a night out, the most classic filling is tinned spaghetti, while my personal favourite is leftover beef stew with melted cheese.
+or:
 
----
-For more information on dbt:
-- Read the [introduction to dbt](https://docs.getdbt.com/docs/introduction).
-- Read the [dbt viewpoint](https://docs.getdbt.com/docs/about/viewpoint).
-- Join the [dbt community](http://community.getdbt.com/).
----
+```shell
+echo 'select * from customers_with_order_info where customer_id = 42' | duckcli jaffle_shop.duckdb
+```
+
+4. Generate and view the documentation for the project:
+```shell
+dbt docs generate
+dbt docs serve
+```
+
+## Running `build` steps independently
+
+1. Load the CSVs with the demo data set. This materializes the CSVs as tables in your target schema. Note that a typical dbt project **does not require this step** since dbt assumes your raw data is already in your warehouse.
+
+```shell
+dbt seed
+```
+
+2. Run the models:
+
+```shell
+dbt run
+```
+
+> **NOTE:** If you decide to run this project in your own data warehouse (outside of this DuckDB demo) and steps fail, it might mean that you need to make small changes to the SQL in the models folder to adjust for the flavor of SQL of your target database. Definitely consider this if you are using a community-contributed adapter.
+
+3. Test the output of the models using the [test](https://docs.getdbt.com/reference/commands/test) command:
+```shell
+dbt test
+```
+
+## Browsing the data
+
+Some options:
+
+- [DuckDB UI](https://duckdb.org/docs/stable/extensions/ui.html)
+- [duckcli](https://pypi.org/project/duckcli/)
+- [DuckDB CLI](https://duckdb.org/docs/installation/?environment=cli)
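The README lists the official DuckDB CLI as an alternative to `duckcli`; the same one-liner can be written with its `-c` option, which runs a single command and exits. A minimal sketch, assuming the `duckdb` binary is on `PATH` and `dbt build` has already created `jaffle_shop.duckdb`; `query_customer` is a hypothetical name.

```shell
# Hypothetical helper: query the customers_with_order_info model with the
# DuckDB CLI instead of duckcli. Assumes `duckdb` is on PATH and that
# `dbt build` has already produced jaffle_shop.duckdb.
query_customer() {
  customer_id="$1"
  # Accept only a plain integer so the value can be inlined into SQL safely.
  case "$customer_id" in
    ''|*[!0-9]*) echo "customer_id must be an integer" >&2; return 1 ;;
  esac
  # -c runs the given statement and exits.
  duckdb jaffle_shop.duckdb \
    -c "select * from customers_with_order_info where customer_id = ${customer_id}"
}
```

Usage: `query_customer 42`. The integer check is there because the ID is interpolated directly into the SQL string.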

dbt_project.yml

Lines changed: 13 additions & 12 deletions
@@ -1,23 +1,24 @@
+name: "jaffle_shop"
 
-name: 'jaffle_shop'
-version: '0.1'
-profile: 'jaffle_shop'
 config-version: 2
+version: "0.1"
 
-source-paths: ["models"]
-analysis-paths: ["analysis"]
+profile: "jaffle_shop"
+
+model-paths: ["models"]
+seed-paths: ["seeds"]
 test-paths: ["tests"]
-data-paths: ["data"]
+analysis-paths: ["analysis"]
 macro-paths: ["macros"]
 
 target-path: "target"
 clean-targets:
-    - "target"
-    - "dbt_modules"
-    - "logs"
+  - "target"
+  - "dbt_modules"
+  - "logs"
+
+require-dbt-version: [">=1.0.0", "<2.0.0"]
 
 models:
   jaffle_shop:
-      materialized: table
-      staging:
-        materialized: view
+    materialized: table
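The `clean-targets` list determines what `dbt clean` deletes, so a full reset followed by a rebuild can be scripted as below. A minimal sketch, assuming the same uv-managed environment as the CI workflow; `rebuild_from_scratch` is a hypothetical name.

```shell
# Hypothetical helper: wipe build artifacts, then rebuild everything.
# `dbt clean` deletes the directories listed under clean-targets in
# dbt_project.yml ("target", "dbt_modules", "logs").
rebuild_from_scratch() {
  uv run dbt clean || return 1
  uv run dbt build
}
```

`dbt_packages/`, the default package install directory since dbt 1.0, is not listed in `clean-targets` here, so this helper leaves installed packages in place.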

etc/dbdiagram_definition.txt

Lines changed: 0 additions & 24 deletions
This file was deleted.

etc/jaffle_shop_erd.png

-53.9 KB
Binary file not shown.

models/customers.sql

Lines changed: 0 additions & 64 deletions
This file was deleted.
