- Learn how to provision computing resources for running Big Data analyses using the Infrastructure as Code (IaC) approach.
- Learn how to set up opinionated CI/CD pipelines to deploy cloud infrastructure.
- Learn how to utilize linters for detecting security vulnerabilities in cloud infrastructure.
- Learn how to run Apache Spark code in a distributed way on a Hadoop cluster using Vertex AI notebooks and Dataproc services on GCP.
- Learn how to use Workload Identity Federation for secure authentication from GitHub Actions to Google Cloud.

- Google Cloud SDK
- terraform ~> 1.11.0
- gsutil
- pre-commit
- Terraform (Requirements)
- Python ~> 3.8
- Linux/macOS
- pre-commit-terraform dependencies
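
Before starting, it can help to confirm that the command-line tools listed above are on your `PATH`. The following is only a minimal sketch: `missing_tools` is a hypothetical helper, and the binary names (in particular `python3`) are assumptions that may differ on your system. It does not check versions.

```shell
# Hypothetical helper: print every name from the argument list
# that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names follow the requirements list; python3 is an assumed binary name.
missing=$(missing_tools gcloud terraform gsutil pre-commit python3)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

The check only reports what is missing and never exits with an error, so it is safe to paste into an interactive shell.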
- Redeem a GCP coupon to create a billing account
- Authenticate to GCP to obtain the default credentials used for running the code:

      # first remove the stored credentials, if any exist
      gcloud auth application-default revoke
      # log in and get the new application credentials
      gcloud auth application-default login

- Fork this repository to your own GitHub account.
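
The export step below builds `GOOGLE_BILLING_PROJECT` by joining the semester and the user ID and lowercasing the result. As a quick sanity check before exporting, this sketch reproduces that derivation with the example values from that step. The format check at the end assumes the value is also used as a GCP project ID (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter); `project_id` is an illustrative variable name.

```shell
# Example values copied from the export step; replace them with your own.
TF_VAR_tbd_semester=2025Z
TF_VAR_user_id=9901

# Same derivation as in the export step: join the parts, then lowercase.
project_id=$(echo "tbd-${TF_VAR_tbd_semester}-${TF_VAR_user_id}" | tr '[:upper:]' '[:lower:]')
echo "$project_id"   # prints tbd-2025z-9901

# Assumption: the value doubles as a GCP project ID, so check its format.
if echo "$project_id" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'; then
  echo "looks like a valid project ID"
else
  echo "unexpected project ID format: $project_id"
fi
```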
- Export shared environment variables:

      export TF_VAR_tbd_semester=2025Z
      # format: 20xx for teachers, student ID number for students
      export TF_VAR_user_id=9901
      # use your own billing account ID
      export TF_VAR_billing_account=01A068-6FDD3F-47FD8C
      # for budget creation
      export GOOGLE_BILLING_PROJECT=$(echo "tbd-${TF_VAR_tbd_semester}-${TF_VAR_user_id}" | tr '[:upper:]' '[:lower:]')

- Enter the `bootstrap` folder, then initialize the project and the Terraform state bucket:

      cd bootstrap
      terraform init
      terraform apply
      cd ..

- CI/CD (GitHub Actions setup using Workload Identity Federation)
- Edit the `env/backend.tfvars` file and set the `bucket` variable to the Terraform state bucket.
- Edit the `env/project.tfvars` file and set the `project_name` and `iac_service_account` variables using the output from the `bootstrap` phase, e.g.:
- Edit `cicd_bootstrap/conf/github_actions.tfvars` to set `github_org` and `github_repo`, e.g.:

      github_org = "mwiewior"
      github_repo = "tbd-workshop-1"
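
If you are unsure which values to use for your fork, they can be read off the repository's remote URL. The sketch below is one way to do that: `github_org_repo` is a hypothetical helper, not part of this repository, and it assumes a `github.com` remote in either HTTPS or SSH form.

```shell
# Hypothetical helper: split a GitHub remote URL into "org repo".
# Handles https://github.com/ORG/REPO(.git) and git@github.com:ORG/REPO(.git).
github_org_repo() {
  url=${1%.git}                 # drop a trailing .git, if present
  path=${url#*github.com[:/]}   # keep only ORG/REPO
  printf '%s %s\n' "${path%%/*}" "${path#*/}"
}

# Example URL built from the values shown above; inside your fork you could
# pass "$(git remote get-url origin)" instead.
set -- $(github_org_repo "https://github.com/mwiewior/tbd-workshop-1.git")
printf 'github_org  = "%s"\ngithub_repo = "%s"\n' "$1" "$2"
```

The output is already in `tfvars` syntax, so it can be compared against `cicd_bootstrap/conf/github_actions.tfvars` directly.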
- Init the state file and set the environment variables:

      cd cicd_bootstrap
      terraform init -backend-config=../env/backend.tfvars

- Apply:

      # authenticate Docker backend with GCP
      gcloud auth configure-docker
      # create CI/CD integration using Workload Identity
      terraform apply -var-file ../env/project.tfvars -var-file conf/github_actions.tfvars -compact-warnings
      cd ..

- Use the output variables to configure the GitHub Actions workflow `.github/workflows/pull-request.yml`, e.g.:

  Please do not hardcode these values in the YAML file; set them as GitHub Actions secrets instead, preserving the secret names `GCP_WORKLOAD_IDENTITY_PROVIDER_NAME` and `GCP_WORKLOAD_IDENTITY_SA_EMAIL`.
  Also, set the `INFRACOST_API_KEY` secret. Register at infracost.io to obtain your API key.
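
The secrets can be set in the repository settings in the browser. If you prefer the command line, the sketch below assumes the GitHub CLI (`gh`) is installed and authenticated, which this workshop does not otherwise require. `set_secrets` and the `APPLY` switch are hypothetical names. By default the script only prints the commands; with `APPLY=1`, `gh` prompts for each value, so nothing sensitive ends up in your shell history.

```shell
# Secret names as required by the workflow; values are typed at the gh prompt.
secret_names="GCP_WORKLOAD_IDENTITY_PROVIDER_NAME GCP_WORKLOAD_IDENTITY_SA_EMAIL INFRACOST_API_KEY"

# Dry run by default: print each command. Set APPLY=1 to actually run gh
# (assumption: gh is installed and you run this inside your fork).
set_secrets() {
  for name in $secret_names; do
    if [ "${APPLY:-0}" = "1" ]; then
      gh secret set "$name"
    else
      echo "gh secret set $name"
    fi
  done
}

set_secrets
```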
- Install and configure `pre-commit`:

      pre-commit install

- Commit the changes, push to a branch and open a PR against the main/master branch of YOUR repository.
  If you see a warning like this, please enable the workflows:

  ...and push your changes again!

  Once all pull request checks have passed, please merge your PR and wait until your release job finishes.
- IMPORTANT ❗ ❗ ❗ Please remember to destroy all the resources after the workshop:

      terraform init -backend-config=env/backend.tfvars
      terraform destroy -no-color -var-file env/project.tfvars

| Name | Version |
|---|---|
| terraform | ~> 1.11.0 |
| docker | 3.0.2 |
| | ~> 5.44.0 |
| kubernetes | 2.24.0 |

| Name | Version |
|---|---|
| | 5.44.2 |
| kubernetes | 2.24.0 |

| Name | Source | Version |
|---|---|---|
| composer | ./modules/composer | n/a |
| data-pipelines | ./modules/data-pipeline | n/a |
| dataproc | ./modules/dataproc | n/a |
| dbt_docker_image | ./modules/dbt_docker_image | n/a |
| gcr | ./modules/gcr | n/a |
| vpc | ./modules/vpc | n/a |

| Name | Type |
|---|---|
| google_compute_firewall.allow-all-internal | resource |
| kubernetes_service.dbt-task-service | resource |
| google_client_config.provider | data source |
| google_container_cluster.composer-gke-cluster | data source |

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| project_name | Project name | `string` | n/a | yes |
| region | GCP region | `string` | `"europe-west1"` | no |

No outputs.
