flexible, and config-driven pipeline for loading workflow data into Neo4j #38
This pull request introduces a new, flexible, and config-driven pipeline for loading workflow data into Neo4j, designed to support multiple data schemas and use cases. The pipeline includes scripts for converting workflow JSON to CSV, generating Cypher files, and loading data directly or via Cypher into Neo4j. The changes are grouped as follows:
Pipeline Implementation and Data Conversion:
- `convert_json_to_csv.py`, which flattens workflow JSON into multiple CSV files for workflows, workflow files, steps, inputs, outputs, input connections, and tools. This script ensures schema flexibility and supports optional tool extraction.
- `README.md` describing the pipeline architecture, components, ID strategy, relationships, and usage instructions for the generic loader.
- `README.md` in the `data` directory.

Neo4j Loading and Cypher Generation:
- `generic_csv_loader.py`, a config-driven loader that ingests the generated CSVs into Neo4j using deterministic IDs and merges nodes and relationships based on the configuration.
- `generate_cypher_files.py`, which emits Cypher files (`nodes.cypher` and `edges.cypher`) from the CSVs, using the same configuration and ID strategy as the loader.
- `cypher_batch_loader.py`, which executes the generated Cypher files against Neo4j, processing nodes before edges so that relationship endpoints exist when the edges are created.
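
To make the ID strategy and the nodes-before-edges ordering concrete, here is a minimal sketch of how CSV rows could be turned into idempotent Cypher statements. It is not the code from this pull request: the hashing scheme, the helper names (`deterministic_id`, `cypher_literal`, `node_statements`, `edge_statement`), the CSV columns, the `Workflow` and `Step` labels, and the `HAS_STEP` relationship type are all assumptions chosen for illustration.

```python
import csv
import hashlib
import io


def deterministic_id(label: str, *key_parts: str) -> str:
    """Derive a stable ID from a label and natural-key values (assumed scheme)."""
    raw = "|".join((label, *key_parts))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def cypher_literal(value: str) -> str:
    """Quote a string as a Cypher literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def node_statements(csv_text: str, label: str, key_columns: list[str]) -> list[str]:
    """Emit one MERGE per CSV row, keyed on the deterministic ID."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        node_id = deterministic_id(label, *(row[c] for c in key_columns))
        props = ", ".join(f"n.{k} = {cypher_literal(v)}" for k, v in row.items())
        statements.append(
            f"MERGE (n:{label} {{id: {cypher_literal(node_id)}}}) SET {props};"
        )
    return statements


def edge_statement(src_label: str, src_id: str, rel_type: str,
                   dst_label: str, dst_id: str) -> str:
    """MATCH both endpoints by ID, then MERGE the relationship between them."""
    return (
        f"MATCH (a:{src_label} {{id: {cypher_literal(src_id)}}}), "
        f"(b:{dst_label} {{id: {cypher_literal(dst_id)}}}) "
        f"MERGE (a)-[:{rel_type}]->(b);"
    )


# Hypothetical sample data; the real CSV schemas come from the pipeline config.
workflows_csv = "workflow_id,name\nwf1,Demo\n"
steps_csv = "workflow_id,step_index,name\nwf1,0,Align\n"

nodes = (
    node_statements(workflows_csv, "Workflow", ["workflow_id"])
    + node_statements(steps_csv, "Step", ["workflow_id", "step_index"])
)
edges = [
    edge_statement(
        "Workflow", deterministic_id("Workflow", "wf1"),
        "HAS_STEP",
        "Step", deterministic_id("Step", "wf1", "0"),
    )
]

# Nodes are written first: the edge statement MATCHes its endpoints,
# so it creates nothing if they do not exist yet.
script = "\n".join(nodes + edges)
print(script)
```

Because the ID is a pure function of the label and key columns, re-running the same input produces the same `MERGE` keys, so repeated loads update existing nodes instead of duplicating them. The same function can be shared by a direct loader and a Cypher generator, which is one way to keep the two paths consistent.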