Most Important
How do we integrate this with dcpy.lifecycle?
Wrap our functions with Prefect decorators, and deploy flows. Very straightforward.
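A minimal sketch of what that wrapping could look like. The `dcpy.lifecycle` module and function names here (`ingest.run`, `distribute.run`) are illustrative placeholders, not the real API; the point is just that existing functions can be turned into tasks without modifying them.

```python
from prefect import flow, task

# Hypothetical dcpy.lifecycle entrypoints; illustrative names, not the real API.
from dcpy.lifecycle import ingest, distribute

# Wrap the existing lifecycle functions as Prefect tasks without touching their code.
ingest_task = task(name="ingest")(ingest.run)
distribute_task = task(name="distribute")(distribute.run)


@flow(name="lifecycle")
def lifecycle_flow(dataset_id: str):
    raw = ingest_task(dataset_id)
    distribute_task(raw)
```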
How do you run this on Azure?
We'd need:
- A `prefect server` instance. Handles API traffic, and delegates to the `prefect services`.
- A `prefect services` instance. Handles distributing/managing jobs. Not needed for your local setup.
- A Redis queue
- A Postgres DB
In the initial stage, we could even run all of these services in one container via supervisord, or similar, as we don't really need to persist much of anything (e.g. flow runs) long-term.
In GitHub, for every PR, in addition to creating a docker tag, we'd probably deploy flows to Prefect (and similarly, for deleted branches, we'd prune the flow on Prefect). The deployments themselves would look something like this:
```yaml
build:
  image: myrepo/myimage:latest
pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your/repo.git
      branch: my-feature-branch
      token: "{{ prefect.blocks.secret.GITHUB_TOKEN }}"
      subdirectory: path/to/your/code
      destination: /opt/prefect/flows
deployments:
  - name: my-deployment
    entrypoint: /opt/prefect/flows/my_flow.py:my_flow
    work_pool:
      name: my-pool
```

For any given deployment, we'd specify the source code to pull and mount into the image, quite similar to GitHub Actions (but probably quite different from the typical Airflow pattern).
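With a `prefect.yaml` like the above, the CI step is basically `prefect deploy`. Newer Prefect releases also expose a programmatic route, which might look something like the sketch below; the repo URL, entrypoint, and names are the same placeholders as in the YAML, and the exact `.deploy()` keywords should be checked against the Prefect version we pin.

```python
from prefect import flow

if __name__ == "__main__":
    # Pull flow code from the feature branch and register/refresh its deployment.
    flow.from_source(
        source="https://github.com/your/repo.git",
        entrypoint="path/to/your/code/my_flow.py:my_flow",
    ).deploy(
        name="my-deployment",
        work_pool_name="my-pool",
    )
```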
How do you spawn tasks via the UI?
For any deployed flow, you can click a button to kick off a run. Very easy. Form inputs can be generated off of pydantic classes.
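For example, a flow parameter typed as a pydantic model gets rendered as a form when you launch a run from the UI. The `DistributeOptions` model and field names below are made up for illustration.

```python
from prefect import flow
from pydantic import BaseModel


class DistributeOptions(BaseModel):
    # Each field becomes a form input when kicking off a run from the UI.
    dataset_id: str
    dry_run: bool = True
    destinations: list[str] = ["socrata"]


@flow
def distribute(options: DistributeOptions):
    print(options)
```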
Can you configure automatic retries for a task? E.g. if a Socrata distribute job fails, can you parse the response and retry under certain conditions?
Yes, very easy.
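A rough sketch of conditional retries, modeled on Prefect's `retry_condition_fn` hook; the Socrata task itself and the "retry on 429/5xx" policy are placeholders, not our actual distribute logic.

```python
import httpx
from prefect import task


def retry_on_throttle(task, task_run, state) -> bool:
    """Retry only when the failure looks transient (e.g. Socrata throttling)."""
    try:
        state.result()
    except httpx.HTTPStatusError as exc:
        # Retry on 429 or server errors; give up on other client errors.
        return exc.response.status_code == 429 or exc.response.status_code >= 500
    except Exception:
        return False
    return False


@task(retries=3, retry_delay_seconds=[10, 60, 300], retry_condition_fn=retry_on_throttle)
def push_to_socrata(dataset_id: str):
    ...
```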
Can you retry a task (specifically, a sub component of a job) via the UI?
Yes, very easy.
Can tasks wait for human input (e.g. someone clicking a button in the UI). If so, how do you implement this?
Yes, there are a few mechanisms in the Prefect Python lib to pause and wait. See the example in distribute/publish.
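The general shape is a `RunInput` model plus `pause_flow_run`, which blocks the flow until someone submits the generated form in the UI. The sketch below is an approximation of that pattern, not the actual distribute/publish code; the task names are invented, and the exact import paths vary a bit between Prefect versions.

```python
from prefect import flow, task, pause_flow_run
from prefect.input import RunInput


class ApprovalInput(RunInput):
    approve: bool
    note: str = ""


@task
def stage(dataset_id: str):
    print(f"staged {dataset_id} for review")


@task
def push_live(dataset_id: str):
    print(f"published {dataset_id}")


@flow
def publish(dataset_id: str):
    stage(dataset_id)
    # Blocks until someone fills out the form in the UI (or the pause times out).
    decision = pause_flow_run(wait_for_input=ApprovalInput, timeout=3600)
    if decision.approve:
        push_live(dataset_id)
```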
Infra: How do you set this up locally? E.g. suppose we had a new developer start. How would we get them set up?
Docker Compose can replicate the production env, and just running via Python will spin up a little server with a UI. Very easy.
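Concretely, the local loop is roughly: run a flow as a plain function call, and start the local UI with `prefect server start` when you want to watch runs. A toy example:

```python
from prefect import flow


@flow(log_prints=True)
def hello(name: str = "world"):
    print(f"hello {name}")


if __name__ == "__main__":
    # Calling the flow like a normal function runs it locally;
    # `prefect server start` in another terminal serves the UI (default localhost:4200).
    hello()
```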
Other Considerations
Can other, non-pythonista teams (e.g. AE) implement workflows?
Kinda... Say AE wanted to run a Node app. They would need to execute a Node process from Python. We could pretty easily write some glue to ease the impedance mismatch.
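One way that glue could look, as a hypothetical sketch: wrap the Node entrypoint in a subprocess call inside a task, so it gets the same retries, logging, and UI visibility as our Python tasks.

```python
import subprocess

from prefect import task


@task(retries=1)
def run_node_script(script: str, args: list[str] | None = None) -> str:
    """Run a Node.js entrypoint as a subprocess so it behaves like any other task."""
    result = subprocess.run(
        ["node", script, *(args or [])],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit code fails (and retries) the task
    )
    print(result.stdout)
    return result.stdout
```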
Conceptually, how easy/straightforward is it to understand? How much jargon does it introduce?
There are a handful of straightforward terms:
- Deployments
- Flows
- Tasks
- Work Pools
Are there any special integrations with dbt?
Yes, but I can't vouch for it. A tool like this will execute a dbt DAG and generate a diagram.
Downsides to mention
The Task/Flow diagrams are messy, especially the default view:

You can't change the default yet, either.
The Temporal Sequence view is a little bit better, but you need to click to swap that view EVERY SINGLE TIME you load the page:

The UI for waiting-for-user-input is half-baked: you could train a non-technical person to use it, but I don't believe they could intuit things on their own.
Verdict
This would definitely work for our use case. The infrastructure and concepts are straightforward, and the local development experience is quite good. We'd struggle with the flow diagrams, but otherwise the UI is nice and quite intuitive.