Orchestration Tool POC: Prefect #1998

@alexrichey

Most Important

How do we integrate this with dcpy.lifecycle?

Wrap our functions with Prefect decorators (@flow and @task) and deploy flows. Very straightforward.

How do you run this on Azure?

We'd need:

  • A Prefect server instance. Handles API traffic and delegates to the Prefect services.
  • A Prefect services instance. Handles distributing/managing jobs. Not needed for a local setup.
  • A Redis queue
  • A Postgres database

In the initial stage, we could even run all of these services in one container via supervisord, or similar, as we don't really need to persist much of anything (e.g. flow runs) long-term.

In GitHub, for every PR, in addition to creating a Docker tag, we'd probably deploy flows to Prefect (and similarly, for deleted branches, we'd prune the flows from Prefect). The deployments themselves would look something like this:

pull:
  - prefect.deployments.steps.git_clone:
      repository: https://github.com/your/repo.git
      branch: my-feature-branch
      access_token: "{{ prefect.blocks.secret.github-token }}"

deployments:
  - name: my-deployment
    entrypoint: path/to/your/code/my_flow.py:my_flow
    work_pool:
      name: my-pool
      job_variables:
        image: myrepo/myimage:latest

For any given deployment, we'd specify the image to run and the source code to pull into it at runtime, quite similar to GitHub Actions (but probably quite different from the typical Airflow pattern).

How do you spawn tasks via the UI?

For any deployed flow, you can click a button to kick off a run. Very easy. Form inputs can be generated from Pydantic models.

Can you configure automatic retries for a task? E.g. if a Socrata distribute job fails, can you parse the response and retry under certain conditions?

Yes, very easy.

Can you retry a task (specifically, a sub component of a job) via the UI?

Yes, very easy.

Can tasks wait for human input (e.g. someone clicking a button in the UI)? If so, how do you implement this?

Yes, there are a few mechanisms in the Prefect Python library to pause and wait. See the example in distribute/publish.

Infra: How do you set this up locally? E.g. suppose we had a new developer start. How would we get them set up?

Docker Compose can replicate the production environment, and just running a flow via Python will spin up a little server with a UI. Very easy.

Other Considerations

Can other, non-pythonista teams (e.g. AE) implement workflows?

Kinda. Say AE wanted to run a Node app: they would need to execute a Node process from Python. We could pretty easily write some glue to ease the impedance mismatch.

Conceptually, how easy/straightforward is it to understand? How much jargon does it introduce?

There are a handful of straightforward terms:

  • Deployments
  • Flows
  • Tasks
  • Work Pools

Are there any special integrations with dbt?

Yeah, but I can't vouch for it. A tool like this will execute a dbt DAG and generate a diagram.

Downsides to mention

The Task/Flow diagrams are messy, especially the default view:
[Screenshot: default Task/Flow diagram view]
You can't change the default view yet, either.

The Temporal Sequence view is a little better, but you need to click to switch to it EVERY SINGLE TIME you load the page:
[Screenshot: Temporal Sequence view]

The UI for waiting on user input is half-baked: you could train a non-technical person to use it, but I don't believe they could figure it out on their own.

Verdict

This would definitely work for our use case. The infrastructure and concepts are straightforward, and the local development experience is quite good. We'd struggle with the flow diagrams, but otherwise the UI is nice and quite intuitive.

Metadata

Labels: No labels
Status: New
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests