Repository Structure
This folder is a bit of a catch-all for more admin/devops-related things. For now it contains two subfolders.
This contains all files related to maintaining our run environments. This includes
- python requirements. We define our requirements in requirements.in, and compile them using uv to resolve versions in requirements.txt. We also generate a constraints.txt, which is requirements.txt but slightly reformatted so that it can be used by pip as a constraints file. This constraints file is then used to ensure that all of our docker images, regardless of what packages are actually installed on that image, are using the same versions of each python package we use. The script that compiles them lives in admin/ops
- a docker subfolder. This folder contains bash scripts, Dockerfiles, and more that are used to manage our various docker images. In addition to being used for our dev container image for development, these images are used for most of our github actions. See more about our docker images here
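The "slightly reformatted" step above can be sketched roughly as follows. This is not the actual script in admin/ops; the function name `to_constraints` and the specific transformations are assumptions for illustration. It relies on the fact that pip constraints entries cannot specify extras, so a line like `uvicorn[standard]==0.30.0` has to become `uvicorn==0.30.0`.

```python
import re

# Matches a leading package name followed by an extras block, e.g. "uvicorn[standard]".
_NAME_WITH_EXTRAS = re.compile(r"^([A-Za-z0-9._-]+)\[[^\]]*\]")


def to_constraints(requirements_text: str) -> str:
    """Hypothetical helper: turn compiled requirements text into constraints text.

    Assumption: the only reformatting needed is dropping comments, pip options,
    and extras. The real script may do more, or something different.
    """
    constraints = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (e.g. "# via ..." annotations).
        if not stripped or stripped.startswith("#"):
            continue
        # Skip pip options such as "--index-url ...".
        if stripped.startswith("-"):
            continue
        # Drop any trailing inline comment.
        requirement = stripped.split(" #", 1)[0].strip()
        # Remove extras from the package name, keeping the version pin.
        constraints.append(_NAME_WITH_EXTRAS.sub(r"\1", requirement))
    return "\n".join(constraints) + "\n"
```

A file produced this way can be passed to pip with `pip install -c constraints.txt <package>`, which pins versions without forcing every listed package to be installed.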
This folder contains various narrow-scoped scripts that we use for various devops-related tasks.
Our apps folder contains any apps that we produce. For now, this is one - our QA Streamlit app, which is deployed on DigitalOcean. Each app folder should contain all code necessary for running it and deploying it (outside of GitHub Actions).
This folder contains bash utilities that are used across product builds. We're increasingly moving away from our bash utilities in favor of managing control flow of our processes in python, but for now they're still used across the codebase.
dcpy is our internal python package. Python is increasingly our language of choice for various parts of our product lifecycle, and dcpy contains numerous submodules for things like utilities, connectors to third parties, and our orchestrating lifecycle code. For more info, see dcpy
Various code-generated documentation
The products folder contains one folder for each of our data products, plus an extra one: "template" is our sandbox data product for testing out new workflows and technology.
Each of these folders contains all information and code needed to build a product. The goal is for this to really be two things:
- a recipe file. This is a yaml file used by dcpy.lifecycle.builds to resolve versions of source datasets and load them into our build engine database
- transformation logic. We're moving in the direction of this being sql files (postgres) that are run by dbt, but have a variety of structures and approaches across our products at the moment. In addition, every product still has some amount of bash scripting specific to that product, be it for running specific transformation steps (specifying order of sql files for many products), or generating export files. This logic will likely be moved eventually to dcpy as well, so that our product definitions can really be just two things - declarative metadata/instructions in yaml, and actual transformation logic in sql.
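To illustrate what "resolving versions of source datasets" from a recipe could look like, here is a minimal sketch. It does not use or reproduce the dcpy.lifecycle.builds API: the recipe layout (an `inputs` list with `name` and `version` keys), the `"latest"` placeholder, and the `resolve_versions` function are all hypothetical, and the recipe is shown as an already-parsed dict rather than a yaml file.

```python
from copy import deepcopy


def resolve_versions(recipe: dict, available: dict[str, list[str]]) -> dict:
    """Hypothetical sketch: pin every input dataset in a recipe to a concrete version.

    `available` maps a dataset name to the versions that exist for it.
    Assumption: versions are date-like strings (e.g. "20240101"), so the
    lexicographically largest string is the most recent one.
    """
    resolved = deepcopy(recipe)  # leave the caller's recipe untouched
    for dataset in resolved["inputs"]:
        versions = available[dataset["name"]]
        requested = dataset.get("version", "latest")
        if requested == "latest":
            dataset["version"] = max(versions)
        elif requested not in versions:
            raise ValueError(
                f"{dataset['name']}: version {requested} is not available"
            )
    return resolved


# Example usage with made-up dataset names and versions.
recipe = {
    "product": "template",
    "inputs": [
        {"name": "dataset_a", "version": "latest"},
        {"name": "dataset_b", "version": "20230101"},
    ],
}
available = {
    "dataset_a": ["20230101", "20240101"],
    "dataset_b": ["20230101", "20240101"],
}
pinned = resolve_versions(recipe, available)
```

Keeping the recipe declarative and producing a separate pinned copy means a build can record exactly which source versions it used.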