Putting pieces together: prospective recipes #17

@liamhuber

Description

Since we've got tools in different places, and they are starting to inter-operate more deeply and broadly, I wanted to have a discussion (@samwaseda, @jan-janssen, and anyone else who's interested) about how those pieces will fit together.

At a high level, I would describe part of the process as Prospective Recipe -> WfMS Engine -> Retrospective Graph. We already have a couple examples of the prospective side, including the python-workflow-definition and the workflow code here (and (mostly?) duplicated in semantikon). At a stretch, one could describe the un-executed pyiron_workflow graphs as a "prospective recipe", but I would rather make a semantic distinction between a real recipe representation, and a WfMS being interoperable with such recipe representations.

The retrospective graph question is super interesting, and I think we can already make simple claims about it (pure DAG, etc.) and I do think it's worthwhile considering that side of the coin in the same universal and interoperable way -- e.g. "python-workflow-data", or inclusion here. I also think that the complex part of that question is how to serialize the python objects making up the data, so to the extent that we deal with values and defaults in the prospective graph, these will look very similar. I can imagine that there is lots of shared code, or even (nearly?) identical data structures used for both. But here I really just want to focus on the prospective graph.

I would make a few claims about what I think the prospective graph tool should do, with links to how far along we are in flowrep/semantikon:

  • The prospective recipe should have clear type annotations at all times
    • A big pain point for me has been trying to develop around dictionaries, where I can only see what winds up in them by running the code, or reading the source code and making inferences based on use cases I see
    • We've started pushing towards this in semantikon.datastructures, but it is not yet fully integrated (a minimal annotation sketch follows this list)
  • The prospective recipe should support procedural programming
    • We can think about extensions as we go, but this is a solid foundation and plays extremely well with graph-based descriptions of processes
    • That means we need to support
      • Steps/functions -- Done and supported in python-workflow-definition, and obviously also (without formal connection) in pyiron_workflow.
        • There is no decorator in semantikon/flowrep to attach a function node recipe to a function object -- I think we probably want one.
      • Groups/subgraphs/macros -- I don't have a discussion to reference here, but these are effectively done, and Sam parses them with a decorator. Also supported in python-workflow-definition and (without formal connection) in pyiron_workflow.
      • For -- We have some idea here, and a not-formally-connected use case in pyiron_workflow
        • I think our current discussion needs to further clarify the difference between nested and zipped for-loops, or perhaps to separately introduce the concept of zipping, as for-loops come in those two important flavours (see the plain-Python sketch after this list).
      • While -- We have a clear picture, and a not-formally-connected use case in pyiron_workflow
      • If/Elif/Else -- I think we've made tonnes of progress in the discussion, but this is fundamentally a really tricky one. We necessarily place some restrictions on the user to make a sensible definition, and I'm open to more conversation on what those restrictions are and how we frame them (one candidate restriction is sketched after this list).
      • Try/Except/Finally -- This is a python-special that is extremely similar to if/elif/else, and once we get that licked we should be able to transfer our learning to solve this very easily.
    • In terms of my node taxonomy...
      • I think we always want nodes to be static -- i.e. having prospectively defined IO -- otherwise they can't be used productively in recipes
      • For, While, If, and Try nodes are all dynamic -- we will prospectively know their interface, but not their subgraph bodies! I see zero problem with this.
        • Note that in the retrospective graph, these are all turned into simple DAGs (already the case for pyiron_workflow for- and while-loops, not yet for if)
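
To make the annotation point concrete, here is a minimal sketch of the always-visible typing I mean, using semantikon's u as shown in its README (the exact keyword arguments may vary by version, and the function itself is an invented example):

```python
from semantikon.typing import u


def speed(
    distance: u(float, units="meter"),
    time: u(float, units="second"),
) -> u(float, units="meter/second"):
    # The full interface -- names, types, units -- is visible prospectively,
    # with no need to run anything or reverse-engineer a dict.
    return distance / time
```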
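
For the for-loop flavours, plain Python already shows the distinction; nothing below is flowrep or pyiron_workflow API:

```python
a = [1, 2, 3]
b = [10, 20]

# Nested: cartesian product of the inputs -> len(a) * len(b) bodies
nested = [(x, y) for x in a for y in b]

# Zipped: element-wise pairing -> min(len(a), len(b)) bodies
zipped = list(zip(a, b))

print(nested)  # [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 20)]
print(zipped)  # [(1, 10), (2, 20)]
```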
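
And for If/Elif/Else, one candidate restriction -- purely illustrative, not the agreed-upon rule -- is requiring every branch to bind the same output names, so the node's interface stays static:

```python
def branch_example(x: float) -> float:
    # Every branch binds the same name, y, so the output interface is
    # knowable prospectively, without executing either branch.
    if x > 0:
        y = x ** 2
    else:
        y = -x
    return y
```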

IMO, that is the state of our business, but what shall we do with it? We have a variety of tools floating around, where do the borders fall for the responsibility of each, and how do they interact? With the caveat that I hold some of these opinions quite loosely as a "best thought so far", and that this is not meant to be an exhaustive list but focuses on modules relevant to what I'm supposed to be doing for PMD:

  • semantikon
    • Purpose: Handles complex typing, including units and ontology
    • Key user interface: @u decorator for attaching annotations, ...?
    • Non-responsibilities: anything to do with workflows
    • Dependent on: nothing here
  • flowrep
    • Purpose: parse, validate (for prospective graphs), and represent workflow graphs
      • Prospective recipes for now, but both sides eventually
    • Key user interface: @node, @macro, etc. decorators for attaching prospective recipes to python code (a toy sketch of the missing function-node decorator follows this list)
    • Non-responsibilities: actually executing the graphs
    • Dependent on: semantikon
      • Since semantikon doesn't explicitly know about graphs, we'll need to leverage its tools here, in the context of graph connections, in order to validate recipes
  • pyiron_workflow
    • Purpose: build and execute workflows, and provide retrospective data investigation
    • Key user interface: @function, @macro, for_node, etc. wrappers to turn python code/functions into graph nodes
      • Don't worry about the actual names here, I'm just trying to get across the spirit of the matter
    • Non-responsibilities: parsing and validating recipes
    • Dependent on: flowrep

I see room for bagofholding to manage serialization in different places, and @jan-janssen I haven't mentioned executorlib at all yet, but obviously pyiron_workflow already integrates that and I can imagine that it may be possible to parse at least the non-dynamic DAG flowrep recipes directly into executorlib executions. We can dig into these things in this thread; it's just not what I'm driving at this exact moment.

The end goal I am driving at is that I would like for pyiron_workflow parsers to internally first-and-always leverage flowrep parsers to generate a flowrep recipe for the object they are parsing, and then to build a pyiron_workflow engine object directly from that recipe. As far as I can see, that attack has no real cons and three big pros (sketched in code after the list):

  1. By construction, all pyiron_workflow nodes make a flowrep recipe trivially available -- i.e. automatic outbound-interoperability with other flowrep-compliant WfMS
  2. We only need to develop and maintain a single python-code-to-graph parser -- the one in flowrep (i.e. @samwaseda's clever ast work)
  3. We get automatic inbound-interoperability with recipes generated by other flowrep-compliant WfMS, since under the hood we always generate node classes from a parsed recipe -- we can just as well pass an externally produced recipe directly to the generator
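
As a sketch of that first-and-always pipeline -- every name here is an invented stand-in, since this wiring doesn't exist yet:

```python
def parse_to_recipe(func):
    """Stand-in for the flowrep ast-based parser (invented name)."""
    ...


def engine_node_from_recipe(recipe):
    """Stand-in for a pyiron_workflow node generator (invented name)."""
    ...


def pwf_function_node(func):
    # Always go through the recipe, never around it:
    recipe = parse_to_recipe(func)          # pros 1 & 2: one parser, and the
                                            # recipe is trivially available
    return engine_node_from_recipe(recipe)  # pro 3: a recipe produced by any
                                            # other flowrep-compliant WfMS can
                                            # enter here directly
```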

Above I talked about how there is no "formal connection" between the flowrep/semantikon data structures and pyiron_workflow. This parsing setup is what I mean by formal connection. However, there is already an extremely good conceptual connection, so I am optimistic this sort of more formal connection will be mostly straightforward to build. This sort of decomposition allows us to offload the heavy-lifting part of parsing as well as connection validation over to flowrep, which should allow pyiron_workflow to slim down and focus more on the "engine" part of the job, which I like and I suspect @jan-janssen will be particularly happy with.

These are the concrete steps I see to get from where we are to where I'd like to go:

I'd like to unify the concepts of a connection (we already only allow one inbound) and a value_receiver (for negotiating subgraph IO) into a single source concept (a small data-structure sketch below), which will complete the conceptual alignment of flowrep and pyiron_workflow; after that I think (c) and (d) above can nicely be done in parallel -- it's not all-or-nothing to generate pwf nodes from recipes, we can do it as they become available.
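
As a data-structure sketch of that unification -- all names invented for illustration -- an input port would carry at most one optional source, whether that points at a sibling node's output (today's connection) or at the owning subgraph's own input (today's value_receiver):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    label: str
    # One concept covering both of today's cases: a sibling node's output
    # ("connection") or the owning subgraph's input ("value_receiver").
    # At most one source, matching the existing single-inbound rule.
    source: Optional["Port"] = None
```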

Thoughts?
