flexible, and config-driven pipeline for loading workflow data into Neo4j #38

Johnnas12 · 2025-12-25T13:35:22Z

This pull request introduces a new, flexible, and config-driven pipeline for loading workflow data into Neo4j, designed to support multiple data schemas and use cases. The pipeline includes scripts for converting workflow JSON to CSV, generating Cypher files, and loading data directly or via Cypher into Neo4j. The changes are grouped as follows:

Pipeline Implementation and Data Conversion:

Added convert_json_to_csv.py to flatten workflow JSON into multiple CSV files for workflows, workflow files, steps, inputs, outputs, input connections, and tools. This script ensures schema flexibility and supports optional tool extraction.
Created a new README.md describing the pipeline architecture, components, ID strategy, relationships, and usage instructions for the generic loader.
Added a placeholder README.md in the data directory.
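To illustrate the flattening step, here is a minimal sketch of splitting one nested workflow into per-table rows and serializing them as CSV. The workflow shape, the column names, and the helpers `flatten_workflow` and `rows_to_csv` are assumptions made for illustration; the actual layout produced by convert_json_to_csv.py may differ.

```python
import csv
import io

# Hypothetical workflow shape; the real JSON handled by the script may differ.
workflow = {
    "id": "wf1",
    "name": "Example workflow",
    "steps": {
        "0": {"name": "Input dataset", "tool_id": None, "input_connections": {}},
        "1": {
            "name": "Filter",
            "tool_id": "filter_tool",
            "input_connections": {"input": {"id": 0, "output_name": "output"}},
        },
    },
}


def flatten_workflow(wf):
    """Split one nested workflow dict into flat row lists, one list per CSV table."""
    tables = {"workflows": [], "steps": [], "input_connections": []}
    tables["workflows"].append({"workflow_id": wf["id"], "name": wf["name"]})
    # Step keys are strings in this sketch, so sort them numerically.
    for index, step in sorted(wf["steps"].items(), key=lambda kv: int(kv[0])):
        tables["steps"].append({
            "workflow_id": wf["id"],
            "step_index": int(index),
            "name": step["name"],
            "tool_id": step.get("tool_id") or "",  # steps without a tool get an empty cell
        })
        for input_name, conn in step.get("input_connections", {}).items():
            tables["input_connections"].append({
                "workflow_id": wf["id"],
                "target_step": int(index),
                "input_name": input_name,
                "source_step": conn["id"],
                "source_output": conn["output_name"],
            })
    return tables


def rows_to_csv(rows):
    """Serialize a list of uniform dicts to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tables = flatten_workflow(workflow)
steps_csv = rows_to_csv(tables["steps"])
print(steps_csv)
```

Keeping each entity in its own table means a new entity type only needs a new row list, not a change to the existing columns.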

Neo4j Loading and Cypher Generation:

Added generic_csv_loader.py, a config-driven loader that ingests the generated CSVs into Neo4j using deterministic IDs and merges nodes and relationships based on the configuration.
Added generate_cypher_files.py to emit Cypher files (nodes.cypher and edges.cypher) from CSVs, using the same configuration and ID strategy as the loader.
Added cypher_batch_loader.py, which executes the generated Cypher files against Neo4j, processing nodes before edges for correct graph construction.
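The loader and the Cypher generator are described as sharing one deterministic ID strategy. One way such a strategy could look is sketched below using name-based UUIDs. The namespace URL, the `|` separator, and the helpers `deterministic_id` and `merge_node_query` are hypothetical and not taken from the PR.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as every script shares it.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/workflow-graph")


def deterministic_id(label, *key_parts):
    """Derive a stable ID from a node label and its natural key.

    The same inputs always give the same ID, so re-running an import
    matches existing nodes instead of creating duplicates.
    Note: key parts containing "|" could collide in this simple sketch.
    """
    key = "|".join([label, *map(str, key_parts)])
    return str(uuid.uuid5(NAMESPACE, key))


def merge_node_query(label):
    """Build a parameterized MERGE query for one node label.

    Labels cannot be passed as Cypher parameters, so the label is
    interpolated; it should only ever come from trusted configuration.
    """
    return f"MERGE (n:{label} {{id: $id}}) SET n += $props"


step_id = deterministic_id("Step", "wf1", 1)
print(step_id, merge_node_query("Step"))
```

Because edges look up their endpoints by these IDs, loading nodes before edges ensures every endpoint already exists when a relationship is merged.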

Introduces YAML and Pydantic-based schema definitions for nodes and relationships in the ingestion pipeline, enabling flexible, strongly-typed modeling of workflows, tools, categories, steps, and their interconnections. Facilitates easier schema evolution and validation for ingesting graph-structured data.
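As a rough picture of what typed node and relationship definitions provide, the sketch below validates a small schema. The PR uses YAML and Pydantic; to stay dependency-free, this sketch substitutes standard-library dataclasses and a plain dict standing in for the parsed YAML. All labels, field names, and the `parse_schema` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeDef:
    label: str
    id_fields: tuple
    properties: tuple = ()

    def __post_init__(self):
        if not self.label:
            raise ValueError("node label must be non-empty")
        if not self.id_fields:
            raise ValueError(f"node {self.label!r} needs at least one id field")


@dataclass(frozen=True)
class RelationshipDef:
    type: str
    source: str
    target: str


def parse_schema(raw):
    """Turn a parsed config dict into typed definitions, rejecting dangling endpoints."""
    nodes = {
        label: NodeDef(label, tuple(spec["id_fields"]), tuple(spec.get("properties", ())))
        for label, spec in raw["nodes"].items()
    }
    relationships = []
    for spec in raw["relationships"]:
        rel = RelationshipDef(spec["type"], spec["source"], spec["target"])
        for endpoint in (rel.source, rel.target):
            if endpoint not in nodes:
                raise ValueError(
                    f"relationship {rel.type} references unknown node {endpoint!r}"
                )
        relationships.append(rel)
    return nodes, relationships


# Stand-in for the content of a YAML schema file (hypothetical names).
raw_schema = {
    "nodes": {
        "Workflow": {"id_fields": ["workflow_id"], "properties": ["name"]},
        "Step": {"id_fields": ["workflow_id", "step_index"], "properties": ["name"]},
        "Tool": {"id_fields": ["tool_id"]},
    },
    "relationships": [
        {"type": "HAS_STEP", "source": "Workflow", "target": "Step"},
        {"type": "USES_TOOL", "source": "Step", "target": "Tool"},
    ],
}

nodes, relationships = parse_schema(raw_schema)
```

Validating the schema up front means a typo in a relationship endpoint fails before any data reaches the database.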

Introduces a generic, schema-flexible loader to convert workflow JSON into normalized CSVs and ingest them into Neo4j via a configurable mapping. Enables reusable, idempotent data import of workflows, tools, and relationships without hardcoding schema, supporting future extensibility and simplified dataset integration.

Introduces a new relationship definition to model the association between workflow steps and tools, enabling more granular tracking of tool usage within workflows. This enhances data integrity and supports future analytical capabilities regarding step dependencies.

Streamlines the loader file by deleting an extensive docstring with usage instructions and configuration notes. Aims to reduce redundancy and improve maintainability, assuming documentation is maintained elsewhere.

Introduces a utility for converting CSV-based node and relationship data into Cypher statements suitable for database import. Automates property handling, identifier generation, and supports configurable schemas to streamline graph data loading workflows.
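The conversion from CSV rows to Cypher statements might look roughly like the sketch below. The `node_statements` and `cypher_literal` helpers, the sample columns, and the choice to skip empty cells are assumptions for illustration and do not describe the utility's actual behavior. Rendering strings with `json.dumps` relies on JSON escapes overlapping with Cypher's double-quoted string escapes, which is worth verifying against your own data.

```python
import csv
import io
import json


def cypher_literal(value):
    """Render a CSV string as a double-quoted literal with quotes and backslashes escaped."""
    return json.dumps(value)


def node_statements(csv_text, label, id_column):
    """Emit one MERGE statement per CSV row, skipping empty property cells."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = {k: v for k, v in row.items() if k != id_column and v != ""}
        stmt = f"MERGE (n:{label} {{id: {cypher_literal(row[id_column])}}})"
        if props:
            assignments = ", ".join(
                f"n.{key} = {cypher_literal(value)}" for key, value in props.items()
            )
            stmt += f" SET {assignments}"
        statements.append(stmt + ";")
    return statements


# Hypothetical tools table; the second row has a quoted name and no version.
tools_csv = 'id,name,version\nt1,Filter,1.0\nt2,"Say ""hi""",\n'
stmts = node_statements(tools_csv, "Tool", "id")
for stmt in stmts:
    print(stmt)
```

Using MERGE keyed on the ID keeps the generated file safe to replay, since a second run updates properties rather than duplicating nodes.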

Introduces a script to execute generated Cypher files for batch-loading nodes and edges into Neo4j. Automates the import process by reading Cypher files, splitting statements, and running them sequentially, improving reproducibility and simplifying data ingestion workflows.

Adds batch Cypher loader for automated Neo4j imports

Introduces a script to automate loading nodes and edges into Neo4j from generated Cypher files. Enhances reproducibility and simplifies the data ingestion workflow by running Cypher statements in order, reducing manual intervention and potential for errors.
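Splitting a Cypher file into statements and running them in order can be sketched as follows. The `split_statements` and `run_files` helpers are hypothetical, and the `run` callable is injected so the sketch works without a database; with the official Neo4j Python driver it could wrap `session.run`. The splitter only tracks quoted strings and does not handle comments or backtick-quoted identifiers.

```python
def split_statements(text):
    """Split Cypher text on semicolons that sit outside quoted strings."""
    statements, current = [], []
    quote, escaped = None, False
    for ch in text:
        if quote:
            current.append(ch)
            if escaped:
                escaped = False          # character after a backslash is taken literally
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None             # closing quote ends the string
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_files(files, run):
    """Run every statement of each (name, text) pair in the given order."""
    executed = 0
    for _name, text in files:
        for stmt in split_statements(text):
            run(stmt)
            executed += 1
    return executed


nodes_text = 'MERGE (a:Step {id: "s1"});\nMERGE (b:Tool {id: "t;1"});\n'
edges_text = (
    'MATCH (a:Step {id: "s1"}), (b:Tool {id: "t;1"}) '
    "MERGE (a)-[:USES_TOOL]->(b);\n"
)

log = []  # stands in for a database session in this sketch
count = run_files([("nodes.cypher", nodes_text), ("edges.cypher", edges_text)], log.append)
print(count, "statements executed")
```

Passing nodes.cypher before edges.cypher mirrors the ordering described above, so each MATCH in the edge file finds its endpoints.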


Johnnas12 added 6 commits December 8, 2025 14:07

Removes verbose module docstring from loader script

8b133d5


Johnnas12 merged commit 1643b9a into main Dec 25, 2025
0 of 2 checks passed

aprilyab pushed a commit to aprilyab/galaxy-agent-xp-II that referenced this pull request Dec 28, 2025

Merge pull request iCog-Labs-Dev#38 from iCog-Labs-Dev/integration/ty…

9ba472d

…pe-and-dynamic-configs flexible, and config-driven pipeline for loading workflow data into Neo4j
