Commit 1e8515b

fix
1 parent 6c9d32d commit 1e8515b

8 files changed: +123 -34 lines changed

.github/workflows/onpush.yml

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ jobs:
         run: |
           make deploy-dev
 
+      - name: Run job
+        run: |
+          make run-dev
+
       - name: Run
         run: |
           make deploy-ci

Makefile

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 install:
+	python3 -m pip install --upgrade pip
+	pip install pipenv
 	pipenv install packages
 	pipenv run pytest tests/
+	pipenv run pip list
 	pipenv shell
 
 pre-commit:

Pipfile

Lines changed: 1 addition & 2 deletions
@@ -5,12 +5,10 @@ name = "pypi"
 
 [packages]
 funcy = "==2.0"
-packages = "*"
 numpy = "==1.23.5"
 pandas = "==1.5.3"
 pyarrow = "8.0.0"
 pydantic = "==2.7.4"
-unidecode = "==1.3.8"
 wheel = "==0.44.0"
 coverage = "==7.6.1"
 setuptools = "==72.1.0"
@@ -19,6 +17,7 @@ pytest = "==8.3.2"
 jinja2 = "==3.1.4"
 pyspark = "==3.5.1"
 pytest-cov = "==5.0.0"
+packages = "*"
 
 [dev-packages]
 
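
Review note: this hunk moves `packages = "*"` instead of dropping it, and `pyarrow = "8.0.0"` does not carry the `==` operator that the other pins use. A check along the following lines could flag such entries before they reach CI. It is only an illustrative sketch and not part of the commit: `find_unpinned` and the inline sample are hypothetical, and the parser handles nothing beyond simple `name = "spec"` lines.

```python
import re

# Hypothetical sample mirroring a few [packages] entries from the hunk above.
SAMPLE = '''
[packages]
funcy = "==2.0"
pyarrow = "8.0.0"
pytest-cov = "==5.0.0"
packages = "*"

[dev-packages]
'''

# Matches simple entries of the form: name = "spec"
ENTRY = re.compile(r'^([A-Za-z0-9_.-]+)\s*=\s*"([^"]*)"\s*$')


def find_unpinned(pipfile_text, section="packages"):
    """Return (name, spec) pairs in `section` whose spec does not start with '=='."""
    current = None
    flagged = []
    for raw in pipfile_text.splitlines():
        line = raw.strip()
        # Track which [section] we are currently inside.
        if line.startswith("[") and line.endswith("]"):
            current = line.strip("[]")
            continue
        match = ENTRY.match(line)
        if current == section and match and not match.group(2).startswith("=="):
            flagged.append((match.group(1), match.group(2)))
    return flagged


print(find_unpinned(SAMPLE))
```

Whether a bare version or `*` is acceptable is a project decision; the sketch only surfaces the entries so that the choice is deliberate.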

README.md

Lines changed: 2 additions & 8 deletions
@@ -16,11 +16,12 @@ This project template demonstrates how to:
 - utilize [pytest package](https://pypi.org/project/pytest/) to run unit tests on transformations.
 - utilize [argparse package](https://pypi.org/project/argparse/) to build a flexible command line interface to start your jobs.
 - utilize [funcy package](https://pypi.org/project/funcy/) to log the execution time of each transformation.
-- utilize [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html) and (the new!!!) [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html) to package/deploy/run a Python wheel package on Databricks.
+- utilize [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html) and [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html) to package/deploy/run a Python wheel package on Databricks.
 - utilize [Databricks SDK for Python](https://docs.databricks.com/en/dev-tools/sdk-python.html) to manage workspaces and accounts. This script enables your metastore system tables that have [relevant data about billing, usage, lineage, prices, and access](https://www.youtube.com/watch?v=LcRWHzk8Wm4).
 - utilize [Databricks Unity Catalog](https://www.databricks.com/product/unity-catalog) instead of Hive as your data catalog and earn for free data lineage for your tables and columns and a simplified permission model for your data.
 - utilize [Databricks Workflows](https://docs.databricks.com/en/workflows/index.html) to execute a DAG and [task parameters](https://docs.databricks.com/en/workflows/jobs/parameter-value-references.html) to share context information between tasks (see [Task Parameters section](#task-parameters)). Yes, you don't need Airflow to manage your DAGs here!!!
 - utilize [Databricks job clusters](https://docs.databricks.com/en/workflows/jobs/use-compute.html#use-databricks-compute-with-your-jobs) to reduce costs.
+- define clusters on AWS and Azure.
 - execute a CI/CD pipeline with [Github Actions](https://docs.github.com/en/actions) after a repo push.
 
 For a debate about the use of notebooks x Python packages, please refer to:
@@ -98,13 +99,6 @@ Update "job_clusters" properties on wf_template.yml file. There are different pr
 
 Configure [Github Actions repository secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions) DATABRICKS_HOST and DATABRICKS_TOKEN.
 
-### 5) enable system tables on Catalog Explorer
-
-python sdk_system_tables.py
-
-
-... and now you can code the transformations for each task and run unit and integration tests.
-
 
 # Task parameters
 
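
Review note: the README context above says the template logs the execution time of each transformation with funcy. For readers unfamiliar with the pattern, here is a minimal standard-library sketch of the idea. It is not funcy's API and not the project's code: `log_duration`, the logger name, and the `generate_orders` body are all hypothetical stand-ins.

```python
import functools
import logging
import time

logger = logging.getLogger("template")  # hypothetical logger name


def log_duration(func):
    """Log how long each call to `func` takes, in seconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)
    return wrapper


@log_duration
def generate_orders(rows):
    # Hypothetical stand-in for a real transformation.
    return [r for r in rows if r.get("status") == "paid"]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    generate_orders([{"status": "paid"}, {"status": "open"}])
```

Logging in a `finally` clause means a failing transformation still reports its duration, which can help when a task times out in a workflow run.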

conf/wf_template.yml

Lines changed: 25 additions & 11 deletions
@@ -17,7 +17,7 @@ resources:
       tasks:
 
         - task_key: extract_source1
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -29,7 +29,7 @@ resources:
             - whl: ../dist/*.whl
 
         - task_key: extract_source2
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -44,7 +44,7 @@ resources:
           depends_on:
             - task_key: extract_source1
             - task_key: extract_source2
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -58,7 +58,7 @@ resources:
         - task_key: generate_orders_agg
           depends_on:
             - task_key: generate_orders
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -70,12 +70,26 @@ resources:
             - whl: ../dist/*.whl
 
       job_clusters:
-        - job_cluster_key: cluster-dev
+        # - job_cluster_key: cluster-dev-azure
+        # new_cluster:
+        # spark_version: 15.3.x-scala2.12
+        # node_type_id: Standard_D8as_v5
+        # num_workers: 1
+        # azure_attributes:
+        # first_on_demand: 1
+        # availability: SPOT_AZURE
+        # data_security_mode: SINGLE_USER
+
+        - job_cluster_key: cluster-dev-aws
           new_cluster:
-            spark_version: 15.3.x-scala2.12
-            node_type_id: Standard_D8as_v5
-            num_workers: 2
-            azure_attributes:
+            spark_version: 14.2.x-scala2.12
+            node_type_id: c5d.xlarge
+            num_workers: 1
+            aws_attributes:
               first_on_demand: 1
-              availability: SPOT_AZURE
-            data_security_mode: SINGLE_USER
+              availability: SPOT_WITH_FALLBACK
+              zone_id: auto
+              spot_bid_price_percent: 100
+              ebs_volume_count: 0
+            policy_id: 001934F3ABD02D4A
+            data_security_mode: SINGLE_USER
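
Review note: this file renames the cluster key in four tasks and in the `job_clusters` entry, so a missed occurrence would leave a task pointing at a cluster that no longer exists. A check like the one below could catch that early. It is a hedged sketch over plain dictionaries: the `undefined_cluster_keys` helper and the sample job are hypothetical, and loading the real YAML file is left out on purpose.

```python
def undefined_cluster_keys(job):
    """Map task_key -> job_cluster_key for tasks whose cluster key is not defined."""
    defined = {c["job_cluster_key"] for c in job.get("job_clusters", [])}
    return {
        t["task_key"]: t["job_cluster_key"]
        for t in job.get("tasks", [])
        # Tasks without a job_cluster_key (other compute options) are skipped.
        if "job_cluster_key" in t and t["job_cluster_key"] not in defined
    }


# Hypothetical job definition; the second task keeps the old key on purpose.
job = {
    "tasks": [
        {"task_key": "extract_source1", "job_cluster_key": "cluster-dev-aws"},
        {"task_key": "extract_source2", "job_cluster_key": "cluster-dev"},
    ],
    "job_clusters": [{"job_cluster_key": "cluster-dev-aws"}],
}

print(undefined_cluster_keys(job))
```

An empty result means every task references a declared job cluster. The same function would apply unchanged to conf/workflow.yml once parsed into a dictionary.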

conf/workflow.yml

Lines changed: 27 additions & 13 deletions
@@ -3,15 +3,15 @@ resources:
   jobs:
 
     default_python_job:
-      name: data_reporting_${bundle.target}
+      name: template_${bundle.target}
       timeout_seconds: 3600
 
-
+
 
       tasks:
 
         - task_key: extract_source1
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -23,7 +23,7 @@ resources:
             - whl: ../dist/*.whl
 
         - task_key: extract_source2
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -38,7 +38,7 @@ resources:
           depends_on:
             - task_key: extract_source1
             - task_key: extract_source2
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -52,7 +52,7 @@ resources:
         - task_key: generate_orders_agg
           depends_on:
             - task_key: generate_orders
-          job_cluster_key: cluster-dev
+          job_cluster_key: cluster-dev-aws
           max_retries: 0
           python_wheel_task:
             package_name: template
@@ -64,12 +64,26 @@ resources:
             - whl: ../dist/*.whl
 
       job_clusters:
-        - job_cluster_key: cluster-dev
+        # - job_cluster_key: cluster-dev-azure
+        # new_cluster:
+        # spark_version: 15.3.x-scala2.12
+        # node_type_id: Standard_D8as_v5
+        # num_workers: 1
+        # azure_attributes:
+        # first_on_demand: 1
+        # availability: SPOT_AZURE
+        # data_security_mode: SINGLE_USER
+
+        - job_cluster_key: cluster-dev-aws
           new_cluster:
-            spark_version: 15.3.x-scala2.12
-            node_type_id: Standard_D8as_v5
-            num_workers: 2
-            azure_attributes:
+            spark_version: 14.2.x-scala2.12
+            node_type_id: c5d.xlarge
+            num_workers: 1
+            aws_attributes:
               first_on_demand: 1
-              availability: SPOT_AZURE
-            data_security_mode: SINGLE_USER
+              availability: SPOT_WITH_FALLBACK
+              zone_id: auto
+              spot_bid_price_percent: 100
+              ebs_volume_count: 0
+            policy_id: 001934F3ABD02D4A
+            data_security_mode: SINGLE_USER
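
Review note: the `depends_on` entries visible in these hunks define a small DAG. The sketch below derives the order in which the tasks could run, using the standard library's `graphlib`. It is illustrative only and does not describe how Databricks Workflows schedules tasks internally. One assumption is labeled in the code: the hunk does not show the `task_key` of the task that depends on both extract tasks, and it is taken here to be `generate_orders`.

```python
from graphlib import TopologicalSorter

# task_key -> set of upstream task_keys, read from the depends_on entries above.
deps = {
    "extract_source1": set(),
    "extract_source2": set(),
    # Assumption: the task depending on both extracts is generate_orders.
    "generate_orders": {"extract_source1", "extract_source2"},
    "generate_orders_agg": {"generate_orders"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within a wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(deps), start=1):
    print(number, wave)
```

Under that assumption the two extract tasks form the first wave and can run in parallel, followed by `generate_orders` and then `generate_orders_agg`.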

docs/ci_cd.drawio

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+<mxfile host="65bd71144e">
+    <diagram id="mtFdcSvoKdh9C-5KIGSu" name="Page-1">
+        <mxGraphModel dx="1048" dy="638" grid="0" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" background="#ffffff" math="0" shadow="0">
+            <root>
+                <mxCell id="0"/>
+                <mxCell id="1" parent="0"/>
+                <mxCell id="2" value="VS Code and notebooks" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;" vertex="1" parent="1">
+                    <mxGeometry x="440" y="190" width="120" height="60" as="geometry"/>
+                </mxCell>
+                <mxCell id="3" value="prototype" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontColor=#333333;" vertex="1" parent="1">
+                    <mxGeometry x="269" y="235" width="60" height="30" as="geometry"/>
+                </mxCell>
+                <mxCell id="5" value="dev catalog&lt;br&gt;(dev workspace)" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;" vertex="1" parent="1">
+                    <mxGeometry x="440" y="310" width="120" height="60" as="geometry"/>
+                </mxCell>
+                <mxCell id="6" value="stage catalog&lt;br&gt;(dev workspace)" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;" vertex="1" parent="1">
+                    <mxGeometry x="440" y="428" width="120" height="60" as="geometry"/>
+                </mxCell>
+                <mxCell id="8" value="prod catalog&lt;br&gt;(prod workspace)" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;" vertex="1" parent="1">
+                    <mxGeometry x="440" y="545" width="120" height="60" as="geometry"/>
+                </mxCell>
+                <mxCell id="9" value="" style="endArrow=classic;html=1;fillColor=#f5f5f5;strokeColor=#666666;" edge="1" parent="1">
+                    <mxGeometry width="50" height="50" relative="1" as="geometry">
+                        <mxPoint x="255" y="441" as="sourcePoint"/>
+                        <mxPoint x="405" y="461" as="targetPoint"/>
+                    </mxGeometry>
+                </mxCell>
+                <mxCell id="10" value="" style="endArrow=classic;html=1;fillColor=#f5f5f5;strokeColor=#666666;" edge="1" parent="1">
+                    <mxGeometry width="50" height="50" relative="1" as="geometry">
+                        <mxPoint x="260" y="370" as="sourcePoint"/>
+                        <mxPoint x="420" y="345" as="targetPoint"/>
+                    </mxGeometry>
+                </mxCell>
+                <mxCell id="11" value="" style="endArrow=classic;html=1;fillColor=#f5f5f5;strokeColor=#666666;" edge="1" parent="1">
+                    <mxGeometry width="50" height="50" relative="1" as="geometry">
+                        <mxPoint x="250" y="315" as="sourcePoint"/>
+                        <mxPoint x="410" y="225" as="targetPoint"/>
+                    </mxGeometry>
+                </mxCell>
+                <mxCell id="12" value="make &lt;br&gt;deploy-dev" style="text;html=1;align=center;verticalAlign=middle;resizable=0;points=[];autosize=1;strokeColor=none;fillColor=none;fontColor=#333333;" vertex="1" parent="1">
+                    <mxGeometry x="290" y="310" width="80" height="40" as="geometry"/>
+                </mxCell>
+                <mxCell id="13" value="&lt;div style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;background-color: initial;&quot;&gt;open PR&lt;/span&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;background-color: initial;&quot;&gt;for every push:&lt;/span&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;background-color: initial;&quot;&gt;&amp;nbsp; - run unit tests&lt;/span&gt;&lt;/div&gt;&lt;div style=&quot;text-align: left;&quot;&gt;&lt;span style=&quot;background-color: initial;&quot;&gt;&amp;nbsp; - deploy and run workflow&lt;/span&gt;&lt;/div&gt;" style="text;html=1;align=center;verticalAlign=middle;resizable=0;points=[];autosize=1;strokeColor=none;fillColor=none;fontColor=#333333;" vertex="1" parent="1">
+                    <mxGeometry x="271" y="379" width="170" height="70" as="geometry"/>
+                </mxCell>
+                <mxCell id="14" value="&lt;div style=&quot;text-align: left;&quot;&gt;PR approved&lt;/div&gt;" style="text;html=1;align=center;verticalAlign=middle;resizable=0;points=[];autosize=1;strokeColor=none;fillColor=none;fontColor=#333333;" vertex="1" parent="1">
+                    <mxGeometry x="290" y="488" width="90" height="30" as="geometry"/>
+                </mxCell>
+                <mxCell id="15" value="" style="endArrow=classic;html=1;fillColor=#f5f5f5;strokeColor=#666666;" edge="1" parent="1">
+                    <mxGeometry width="50" height="50" relative="1" as="geometry">
+                        <mxPoint x="250" y="495" as="sourcePoint"/>
+                        <mxPoint x="400" y="565" as="targetPoint"/>
+                    </mxGeometry>
+                </mxCell>
+                <mxCell id="18" value="" style="outlineConnect=0;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;shape=mxgraph.aws3.user;fillColor=#D2D3D3;gradientColor=none;" vertex="1" parent="1">
+                    <mxGeometry x="150" y="360" width="45" height="63" as="geometry"/>
+                </mxCell>
+            </root>
+        </mxGraphModel>
+    </diagram>
+</mxfile>

docs/ci_cd.png

38.2 KB