Putting pieces together: prospective recipes #17

@liamhuber

Description

Since we've got tools in different places, and they are starting to inter-operate more deeply and broadly, I wanted to have a discussion (@samwaseda, @jan-janssen, and anyone else who's interested) about how those pieces will fit together.

At a high level, I would describe part of the process as Prospective Recipe -> WfMS Engine -> Retrospective Graph. We already have a couple examples of the prospective side, including the python-workflow-definition and the workflow code here (and (mostly?) duplicated in semantikon). At a stretch, one could describe the un-executed pyiron_workflow graphs as a "prospective recipe", but I would rather make a semantic distinction between a real recipe representation, and a WfMS being interoperable with such recipe representations.

The retrospective graph question is super interesting, and I think we can already make simple claims about it (pure DAG, etc.) and I do think it's worthwhile considering that side of the coin in the same universal and interoperable way -- e.g. "python-workflow-data", or inclusion here. I also think that the complex part of that question is how to serialize the python objects making up the data, so to the extent that we deal with values and defaults in the prospective graph, these will look very similar. I can imagine that there is lots of shared code, or even (nearly?) identical data structures used for both. But here I really just want to focus on the prospective graph.

I would make a few claims about what I think the prospective graph tool should do, with links to how far along we are in flowrep/semantikon:

  • The prospective recipe should have clear type annotations at all times
    • A big pain point for me has been trying to develop around dictionaries, where I can only see what winds up in them by running the code, or reading the source code and making inferences based on use cases I see
    • We've started pushing towards this in semantikon.datastructures, but it is not yet fully integrated (a minimal annotation sketch follows this list)
  • The prospective recipe should support procedural programming
    • We can think about extensions as we go, but this is a solid foundation and plays extremely well with graph-based descriptions of processes
    • That means we need to support
      • Steps/functions -- Done and supported in python-workflow-definition, and obviously also (without formal connection) in pyiron_workflow.
        • There is no decorator in semantikon/flowrep to attach a function node recipe to a function object -- I think we probably want one.
      • Groups/subgraphs/macros -- I don't have a discussion to reference here, but these are effectively done, and Sam parses them with a decorator. Also supported in python-workflow-definition and (without formal connection) in pyiron_workflow.
      • For -- We have some idea here, and a not-formally-connected use case in pyiron_workflow
        • I think our current discussion needs to further clarify the difference between nested and zipped for-loops, or perhaps to separately introduce the concept of zipping, as for-loops come in those two important flavours (see the plain-Python sketch after this list).
      • While -- We have a clear picture, and a not-formally-connected use case in pyiron_workflow
      • If/Elif/Else -- I think we've made tonnes of progress in the discussion, but this is fundamentally a really tricky one. We necessarily place some restrictions on the user to make a sensible definition, and I'm open to more conversation on what those restrictions are and how we frame them (one candidate restriction is sketched after this list).
      • Try/Except/Finally -- This is a python-special that is extremely similar to if/elif/else, and once we get that licked we should be able to transfer our learning to solve this very easily.
    • In terms of my node taxonomy...
      • I think we always want nodes to be static -- i.e. having prospectively defined IO -- otherwise they can't be used productively in recipes
      • For, While, If, and Try nodes are all dynamic -- we will prospectively know their interface, but not their subgraph bodies! I see zero problem with this.
        • Note that in the retrospective graph, these are all turned into simple DAGs (already the case for pyiron_workflow for- and while-loops, not yet for if)
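
To make the annotation point concrete, here is a minimal sketch of the always-visible typing I mean, using semantikon's u as shown in its README (the exact keyword arguments may vary by version, and the function itself is an invented example):

```python
from semantikon.typing import u


def speed(
    distance: u(float, units="meter"),
    time: u(float, units="second"),
) -> u(float, units="meter/second"):
    # The full interface -- names, types, units -- is visible prospectively,
    # with no need to run anything or reverse-engineer a dict.
    return distance / time
```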
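
For the for-loop flavours, plain Python already shows the distinction; nothing below is flowrep or pyiron_workflow API:

```python
a = [1, 2, 3]
b = [10, 20]

# Nested: cartesian product of the inputs -> len(a) * len(b) bodies
nested = [(x, y) for x in a for y in b]

# Zipped: element-wise pairing -> min(len(a), len(b)) bodies
zipped = list(zip(a, b))

print(nested)  # [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 20)]
print(zipped)  # [(1, 10), (2, 20)]
```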
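
And for If/Elif/Else, one candidate restriction -- purely illustrative, not the agreed-upon rule -- is requiring every branch to bind the same output names, so the node's interface stays static:

```python
def branch_example(x: float) -> float:
    # Every branch binds the same name, y, so the output interface is
    # knowable prospectively, without executing either branch.
    if x > 0:
        y = x ** 2
    else:
        y = -x
    return y
```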

IMO, that is the state of our business, but what shall we do with it? We have a variety of tools floating around, where do the borders fall for the responsibility of each, and how do they interact? With the caveat that I hold some of these opinions quite loosely as a "best thought so far", and that this is not meant to be an exhaustive list but focuses on modules relevant to what I'm supposed to be doing for PMD:

  • semantikon
    • Purpose: Handles complex typing, including units and ontology
    • Key user interface: @u decorator for attaching annotations, ...?
    • Non-responsibilities: anything to do with workflows
    • Dependent on: nothing here
  • flowrep
    • Purpose: parse, validate (for prospective graphs), and represent workflow graphs
      • Prospective recipes for now, but both sides eventually
    • Key user interface: @node, @macro, etc. decorators for attaching prospective recipes to python code (a toy sketch of the missing function-node decorator follows this list)
    • Non-responsibilities: actually executing the graphs
    • Dependent on: semantikon
      • Since semantikon doesn't explicitly know about graphs, we'll need to leverage its tools here, in the context of graph connections, in order to validate recipes
  • pyiron_workflow
    • Purpose: build and execute workflows, and provide retrospective data investigation
    • Key user interface: @function, @macro, for_node, etc. wrappers to turn python code/functions into graph nodes
      • Don't worry about the actual names here, I'm just trying to get across the spirit of the matter
    • Non-responsibilities: parsing and validating recipes
    • Dependent on: flowrep

I see room for bagofholding to manage serialization in different places, and @jan-janssen I haven't mentioned executorlib at all yet, but obviously pyiron_workflow already integrates that and I can imagine that it may be possible to parse at least the non-dynamic DAG flowrep recipes directly into executorlib executions. We can dig into these things in this thread; it's just not what I'm driving at this exact moment.

The end goal I am driving at is that I would like for pyiron_workflow parsers to internally first-and-always leverage flowrep parsers to generate a flowrep recipe for the object they are parsing, and then to build a pyiron_workflow engine object directly from that recipe. As far as I can see, that attack has no real cons and three big pros (sketched in code after the list):

  1. By construction, all pyiron_workflow nodes make a flowrep recipe trivially available -- i.e. automatic outbound-interoperability with other flowrep-compliant WfMS
  2. We only need to develop and maintain a single python-code-to-graph parser -- the one in flowrep (i.e. @samwaseda's clever ast work)
  3. We get automatic inbound-interoperability with recipes generated by other flowrep-compliant WfMS, since under the hood we always generate node classes from a parsed recipe -- we can just as well pass an externally produced recipe directly to the generator
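
As a sketch of that first-and-always pipeline -- every name here is an invented stand-in, since this wiring doesn't exist yet:

```python
def parse_to_recipe(func):
    """Stand-in for the flowrep ast-based parser (invented name)."""
    ...


def engine_node_from_recipe(recipe):
    """Stand-in for a pyiron_workflow node generator (invented name)."""
    ...


def pwf_function_node(func):
    # Always go through the recipe, never around it:
    recipe = parse_to_recipe(func)          # pros 1 & 2: one parser, and the
                                            # recipe is trivially available
    return engine_node_from_recipe(recipe)  # pro 3: a recipe produced by any
                                            # other flowrep-compliant WfMS can
                                            # enter here directly
```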

Above I talked about how there is no "formal connection" between the flowrep/semantikon data structures and pyiron_workflow. This parsing setup is what I mean by formal connection. However, there is already an extremely good conceptual connection, so I am optimistic this sort of more formal connection will be mostly straightforward to build. This sort of decomposition allows us to offload the heavy-lifting part of parsing as well as connection validation over to flowrep, which should allow pyiron_workflow to slim down and focus more on the "engine" part of the job, which I like and I suspect @jan-janssen will be particularly happy with.

These are the concrete steps I see to get from where we are to where I'd like to go:

I'd like to unify the concepts of a connection (we already only allow one inbound) and a value_receiver (for negotiating subgraph IO) into a single source concept (a small data-structure sketch below), which will complete the conceptual alignment of flowrep and pyiron_workflow; after that I think (c) and (d) above can nicely be done in parallel -- it's not all-or-nothing to generate pwf nodes from recipes, we can do it as they become available.
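
As a data-structure sketch of that unification -- all names invented for illustration -- an input port would carry at most one optional source, whether that points at a sibling node's output (today's connection) or at the owning subgraph's own input (today's value_receiver):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    label: str
    # One concept covering both of today's cases: a sibling node's output
    # ("connection") or the owning subgraph's input ("value_receiver").
    # At most one source, matching the existing single-inbound rule.
    source: Optional["Port"] = None
```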

Thoughts?
