|
1 | | -## Testing dbt project: `jaffle_shop` |
| 1 | +# Storio - Data Principal Engineer Assignment |
2 | 2 |
|
3 | | -`jaffle_shop` is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers and orders model ready for analytics. |
| 3 | +This repo contains a scuffed version of `jaffle_shop`, a fictional ecommerce store, which will be used to test your refactoring skills. In a nutshell, this dbt project transforms raw data from an app database into models ready for analytics. |
| 4 | +However, the project is not well-structured, and the code is not very readable. Your task is to refactor the code to make it more readable and maintainable. |
| 5 | +Things to consider: |
4 | 6 |
|
5 | | -### What is this repo? |
6 | | -What this repo _is_: |
7 | | -- A self-contained playground dbt project, useful for testing out scripts, and communicating some of the core dbt concepts. |
| 7 | +- Warehouse layers |
| 8 | +- Code readability |
| 9 | +- Testing |
8 | 10 |
|
9 | | -What this repo _is not_: |
10 | | -- A tutorial — check out the [Getting Started Tutorial](https://docs.getdbt.com/tutorial/setting-up) for that. Notably, this repo contains some anti-patterns to make it self-contained, namely the use of seeds instead of sources. |
11 | | -- A demonstration of best practices — check out the [dbt Learn Demo](https://github.com/fishtown-analytics/dbt-learn-demo-v2-archive) repo instead. We want to keep this project as simple as possible. As such, we chose not to implement: |
12 | | - - our standard file naming patterns (which make more sense on larger projects, rather than this five-model project) |
13 | | - - a pull request flow |
14 | | - - CI/CD integrations |
15 | | -- A demonstration of using dbt for a high-complex project, or a demo of advanced features (e.g. macros, packages, hooks, operations) — we're just trying to keep things simple here! |
| 11 | +The project contains [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds) that include some (fake) raw data from a fictional app, along with some basic dbt [models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models), tests, and docs for this data. |
16 | 12 |
|
17 | | -### What's in this repo? |
18 | | -This repo contains [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds) that includes some (fake) raw data from a fictional app. |
| 13 | +## Running this project |
19 | 14 |
|
20 | | -The raw data consists of customers, orders, and payments, with the following entity-relationship diagram: |
| 15 | +Prerequisites: Python >= 3.8 |
21 | 16 |
|
22 | | - |
| 17 | +1. Install the project in a virtual environment using your favorite Python/env management tool: |
| 18 | + - `uv` |
| 19 | + - `pipenv` |
| 20 | + - `poetry` |
| 21 | + - `venv` |
| 22 | + - ... |
| 23 | +2. (`uv run`) `dbt build` |
| 24 | +3. (`uv run`) `dbt docs generate` |
| 25 | +4. (`uv run`) `dbt docs serve` |
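The Python prerequisite above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper name `version_at_least` is hypothetical, and the usage line assumes a `python3` executable is on your `PATH`.

```shell
# Hypothetical helper: succeeds when a "major.minor[.patch]" version string
# is at least the required major.minor pair.
version_at_least() {
    version="$1"; need_major="$2"; need_minor="$3"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text before the next dot, if any
    [ "$major" -gt "$need_major" ] ||
        { [ "$major" -eq "$need_major" ] && [ "$minor" -ge "$need_minor" ]; }
}

# Usage (assumes python3 is on PATH):
#   version_at_least "$(python3 -c 'import platform; print(platform.python_version())')" 3 8 \
#       && echo "Python is recent enough"
```

The comparison is done on the numeric major and minor parts rather than on the raw string, so `3.11` is correctly treated as newer than `3.8`.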
23 | 26 |
|
| 27 | +## Verifying your environment |
24 | 28 |
|
25 | | -### Running this project |
26 | | -To get up and running with this project: |
27 | | -1. Install dbt using [these instructions](https://docs.getdbt.com/docs/installation). |
| 29 | +1. Ensure your [profile](https://docs.getdbt.com/reference/profiles.yml) is set up correctly from the command line: |
28 | 30 |
|
29 | | -2. Clone this repository. |
| 31 | + ```shell |
| 32 | + dbt --version |
| 33 | + dbt debug |
| 34 | + ``` |
30 | 35 |
|
31 | | -3. Change into the `jaffle_shop` directory from the command line: |
32 | | -```bash |
33 | | -$ cd jaffle_shop |
34 | | -``` |
| 36 | +2. Load the CSVs with the demo data set, run the models, and test the output of the models using the [dbt build](https://docs.getdbt.com/reference/commands/build) command: |
35 | 37 |
|
36 | | -4. Set up a profile called `jaffle_shop` to connect to a data warehouse by following [these instructions](https://docs.getdbt.com/docs/configure-your-profile). If you have access to a data warehouse, you can use those credentials – we recommend setting your [target schema](https://docs.getdbt.com/docs/configure-your-profile#section-populating-your-profile) to be a new schema (dbt will create the schema for you, as long as you have the right privileges). If you don't have access to an existing data warehouse, you can also setup a local postgres database and connect to it in your profile. |
| 38 | + ```shell |
| 39 | + dbt build |
| 40 | + ``` |
37 | 41 |
|
38 | | -5. Ensure your profile is setup correctly from the command line: |
39 | | -```bash |
40 | | -$ dbt debug |
41 | | -``` |
| 42 | +3. Query the data: |
42 | 43 |
|
43 | | -6. Load the CSVs with the demo data set. This materializes the CSVs as tables in your target schema. Note that a typical dbt project **does not require this step** since dbt assumes your raw data is already in your warehouse. |
44 | | -```bash |
45 | | -$ dbt seed |
46 | | -``` |
| 44 | + Launch a DuckDB command-line interface (CLI): |
47 | 45 |
|
48 | | -7. Run the models: |
49 | | -```bash |
50 | | -$ dbt run |
51 | | -``` |
| 46 | + ```shell |
| 47 | + duckcli jaffle_shop.duckdb |
| 48 | + ``` |
52 | 49 |
|
53 | | -> **NOTE:** If this steps fails, it might mean that you need to make small changes to the SQL in the models folder to adjust for the flavor of SQL of your target database. Definitely consider this if you are using a community-contributed adapter. |
| 50 | + Run a query at the prompt and exit: |
54 | 51 |
|
55 | | -8. Test the output of the models: |
56 | | -```bash |
57 | | -$ dbt test |
58 | | -``` |
| 52 | + ``` |
| 53 | + select * from customers_with_order_info where customer_id = 42; |
| 54 | + exit; |
| 55 | + ``` |
59 | 56 |
|
60 | | -9. Generate documentation for the project: |
61 | | -```bash |
62 | | -$ dbt docs generate |
63 | | -``` |
| 57 | + Alternatively, use a one-liner to perform the query: |
64 | 58 |
|
65 | | -10. View the documentation for the project: |
66 | | -```bash |
67 | | -$ dbt docs serve |
68 | | -``` |
| 59 | + ```shell |
| 60 | + duckcli jaffle_shop.duckdb -e "select * from customers_with_order_info where customer_id = 42" |
| 61 | + ``` |
69 | 62 |
|
70 | | -### What is a jaffle? |
71 | | -A jaffle is a toasted sandwich with crimped, sealed edges. Invented in Bondi in 1949, the humble jaffle is an Australian classic. The sealed edges allow jaffle-eaters to enjoy liquid fillings inside the sandwich, which reach temperatures close to the core of the earth during cooking. Often consumed at home after a night out, the most classic filling is tinned spaghetti, while my personal favourite is leftover beef stew with melted cheese. |
| 63 | + or: |
72 | 64 |
|
73 | | ---- |
74 | | -For more information on dbt: |
75 | | -- Read the [introduction to dbt](https://docs.getdbt.com/docs/introduction). |
76 | | -- Read the [dbt viewpoint](https://docs.getdbt.com/docs/about/viewpoint). |
77 | | -- Join the [dbt community](http://community.getdbt.com/). |
78 | | ---- |
| 65 | + ```shell |
| 66 | + echo 'select * from customers_with_order_info where customer_id = 42' | duckcli jaffle_shop.duckdb |
| 67 | + ``` |
| 68 | + |
| 69 | +4. Generate and view the documentation for the project: |
| 70 | + ```shell |
| 71 | + dbt docs generate |
| 72 | + dbt docs serve |
| 73 | + ``` |
| 74 | + |
| 75 | +## Running `build` steps independently |
| 76 | + |
| 77 | +1. Load the CSVs with the demo data set. This materializes the CSVs as tables in your target schema. Note that a typical dbt project **does not require this step** since dbt assumes your raw data is already in your warehouse. |
| 78 | + |
| 79 | + ```shell |
| 80 | + dbt seed |
| 81 | + ``` |
| 82 | + |
| 83 | +2. Run the models: |
| 84 | + |
| 85 | + ```shell |
| 86 | + dbt run |
| 87 | + ``` |
| 88 | + |
| 89 | + > **NOTE:** If you decide to run this project in your own data warehouse (outside of this DuckDB demo) and steps fail, it might mean that you need to make small changes to the SQL in the models folder to adjust for the flavor of SQL of your target database. Definitely consider this if you are using a community-contributed adapter. |
| 90 | +
|
| 91 | +3. Test the output of the models using the [test](https://docs.getdbt.com/reference/commands/test) command: |
| 92 | + ```shell |
| 93 | + dbt test |
| 94 | + ``` |
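The three independent steps above can be chained so that a failure stops the sequence early. This is a minimal sketch under stated assumptions: the function name `run_build_steps` and the `DBT` override variable are hypothetical and not part of this repo, and the wrapper only follows the seed, run, test order listed here rather than replacing `dbt build`.

```shell
# Hypothetical wrapper: runs `dbt seed`, `dbt run`, and `dbt test` in order,
# stopping at the first failing step.
# DBT can be overridden, e.g. DBT="uv run dbt" (assumed variable name).
run_build_steps() {
    for step in seed run test; do
        # ${DBT:-dbt} is left unquoted on purpose so "uv run dbt" splits into words.
        ${DBT:-dbt} "$step" || {
            echo "dbt $step failed" >&2
            return 1
        }
    done
}

# Usage:
#   run_build_steps                    # plain dbt
#   DBT="uv run dbt" run_build_steps   # through uv
```

Keeping the command prefix in a variable means the same function works whether dbt is invoked directly or through an environment manager.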
| 95 | + |
| 96 | +## Browsing the data |
| 97 | + |
| 98 | +Some options: |
| 99 | + |
| 100 | +- [DuckDB UI](https://duckdb.org/docs/stable/extensions/ui.html) |
| 101 | +- [duckcli](https://pypi.org/project/duckcli/) |
| 102 | +- [DuckDB CLI](https://duckdb.org/docs/installation/?environment=cli) |