Build Stage Inputs and Outputs #1575
alexrichey started this conversation in Ideas
This expands on our roadmapping discussion on Wed, April 10, and on this mock PR.
We'd like our builds to have an identical set of commands to run them end-to-end, i.e.:
One way to accomplish that is to ensure that each stage in the build provides enough information for the next stage to start. If we were to use the existing build_metadata.json, the flow might look like this (using YAML here just for ease of reading):
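A minimal Python sketch of the chaining idea, assuming each stage writes a JSON metadata file and returns its path, and the next stage starts from that path alone. The stage signature, the `*_metadata.json` file names, and the `stages_completed` key are hypothetical, not dcpy's actual API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical stage signature: receives the previous stage's metadata path
# (None for the first stage) and returns the path to its own metadata file.
Stage = Callable[[Optional[Path]], Path]


def make_stage(name: str, work_dir: Path) -> Stage:
    """Build a toy stage that carries upstream metadata forward and records itself."""

    def stage(prev_metadata: Optional[Path]) -> Path:
        upstream = json.loads(prev_metadata.read_text()) if prev_metadata else {}
        metadata = {
            **upstream,  # keep everything earlier stages recorded
            "stages_completed": upstream.get("stages_completed", []) + [name],
        }
        out = work_dir / f"{name}_metadata.json"
        out.write_text(json.dumps(metadata, indent=2))
        return out

    return stage


def run_build(stages: list[Stage]) -> Path:
    """Run stages in order; each one starts from the previous stage's output."""
    if not stages:
        raise ValueError("a build needs at least one stage")
    metadata_path: Optional[Path] = None
    for stage in stages:
        metadata_path = stage(metadata_path)
    return metadata_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names = ["plan", "build", "publish_draft"]
        final = run_build([make_stage(n, Path(tmp)) for n in names])
        print(json.loads(final.read_text())["stages_completed"])
```

Because every product runs through the same `run_build` loop, the end-to-end commands stay identical; only the stage implementations differ.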
Additionally, `build.plan` would need to set a few additional variables in the recipe_lock for subsequent stages, such as the BUILD_NOTE, the output directory for the build, etc.

Lifecycle functions like `dcpy.lifecycle.build.build` could potentially just return a path to the build_metadata.json. Then `dcpy.lifecycle.build.publish_draft` would have enough information to do its thing.

We also discussed making the steps of the build explicit, and potentially including packaging and distribution. However, we decided that since we'd probably want similar machinery for datasets that we don't produce (e.g. we'd want to use package for LION datasets), those types of instructions probably belong in the product metadata repo instead.
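To make the plan → build → publish_draft hand-off concrete, here is a hedged sketch. The functions below only mimic the roles of `build.plan`, `dcpy.lifecycle.build.build`, and `dcpy.lifecycle.build.publish_draft`; their signatures, the JSON lock file `recipe.lock.json`, the lowercase keys `build_note` and `output_dir`, and the draft path format are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins for dcpy's plan / build / publish_draft; not dcpy's API.


def plan(recipe: dict, build_note: str, output_dir: Path) -> Path:
    """Write a recipe lock carrying the variables later stages need."""
    lock = {
        **recipe,
        "build_note": build_note,       # plays the role of BUILD_NOTE
        "output_dir": str(output_dir),  # where build artifacts should go
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    lock_path = output_dir / "recipe.lock.json"
    lock_path.write_text(json.dumps(lock, indent=2))
    return lock_path


def build(lock_path: Path) -> Path:
    """Run the build from the lock alone; return the build_metadata.json path."""
    lock = json.loads(lock_path.read_text())
    output_dir = Path(lock["output_dir"])
    # The real build steps would run here, writing artifacts into output_dir.
    metadata = {
        "product": lock["product"],
        "build_note": lock["build_note"],
        "output_dir": lock["output_dir"],
        "recipe_lock": str(lock_path),
    }
    metadata_path = output_dir / "build_metadata.json"
    metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata_path


def publish_draft(build_metadata_path: Path) -> str:
    """Everything needed to publish comes from build_metadata.json."""
    metadata = json.loads(build_metadata_path.read_text())
    # Hypothetical draft location; a real version would upload output_dir.
    return f"{metadata['product']}/draft/{metadata['build_note']}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock = plan({"product": "example_product"}, "my-note", Path(tmp) / "output")
        print(publish_draft(build(lock)))  # example_product/draft/my-note
```

The key design point is that `publish_draft` takes a single path and never re-derives the build note or output directory itself.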
There's also the question of how this plays with DAG tools like Prefect or Argo. But even in GHA, this would simplify things tremendously.
As a first step, we should start adding additional outputs to the recipe in `plan`, such as the build note.
In the distant future, maybe the build_metadata (or equivalent) lives in a configurable location rather than just on the filesystem, e.g. in a builds database, on S3, etc.
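One way to keep that option open is to have stages pass around an opaque location string and go through a small storage interface, rather than assuming a filesystem `Path`. The sketch below assumes a hypothetical `MetadataStore` protocol; `InMemoryStore` is only a stand-in for a builds database or S3 backend, and the `memory://` scheme is invented for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class MetadataStore(Protocol):
    """Anything that can save build metadata and load it back by location."""

    def write(self, key: str, metadata: dict) -> str: ...

    def read(self, location: str) -> dict: ...


class LocalStore:
    """Current behavior: build_metadata.json lives on the filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, metadata: dict) -> str:
        path = self.root / key / "build_metadata.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(metadata))
        return str(path)

    def read(self, location: str) -> dict:
        return json.loads(Path(location).read_text())


class InMemoryStore:
    """Hypothetical stand-in for a builds database or S3 bucket."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def write(self, key: str, metadata: dict) -> str:
        location = f"memory://{key}/build_metadata.json"
        self._data[location] = json.dumps(metadata)
        return location

    def read(self, location: str) -> dict:
        return json.loads(self._data[location])


def publish_from(store: MetadataStore, location: str) -> str:
    """A downstream stage needs only the store and the location it was handed."""
    return store.read(location)["build_note"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (LocalStore(Path(tmp)), InMemoryStore()):
            loc = store.write("example_build", {"build_note": "my-note"})
            print(type(store).__name__, publish_from(store, loc))
```

Returning a string instead of a `Path` is deliberate: an S3 URI or database key isn't a filesystem path, so downstream stages shouldn't assume one.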