@Johnnas12
Collaborator

This pull request introduces a new, flexible, and config-driven pipeline for loading workflow data into Neo4j, designed to support multiple data schemas and use cases. The pipeline includes scripts for converting workflow JSON to CSV, generating Cypher files, and loading data directly or via Cypher into Neo4j. The changes are grouped as follows:

Pipeline Implementation and Data Conversion:

  • Added convert_json_to_csv.py to flatten workflow JSON into multiple CSV files for workflows, workflow files, steps, inputs, outputs, input connections, and tools. This script ensures schema flexibility and supports optional tool extraction.
  • Created a new README.md describing the pipeline architecture, components, ID strategy, relationships, and usage instructions for the generic loader.
  • Added a placeholder README.md in the data directory.
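The flattening step described above can be sketched as follows. This is a minimal illustration, not the contents of convert_json_to_csv.py: the input shape (a Galaxy-style `steps` mapping with `input_connections`), the column names, and the helpers `flatten_workflow` and `to_csv` are all assumptions made for the example.

```python
import csv
import io


def flatten_workflow(workflow: dict) -> dict:
    """Split one workflow document into per-table lists of flat rows.

    The keys read here ("id", "name", "steps", "tool_id",
    "input_connections") are an assumed input shape, not the PR's schema.
    """
    wf_id = workflow["id"]
    tables = {"workflows": [], "steps": [], "input_connections": []}
    tables["workflows"].append(
        {"workflow_id": wf_id, "name": workflow.get("name", "")}
    )
    for step_key, step in workflow.get("steps", {}).items():
        tables["steps"].append(
            {
                "workflow_id": wf_id,
                "step_index": step_key,
                # Input steps have no tool, so store an empty cell.
                "tool_id": step.get("tool_id") or "",
            }
        )
        for input_name, conn in step.get("input_connections", {}).items():
            tables["input_connections"].append(
                {
                    "workflow_id": wf_id,
                    "target_step": step_key,
                    "input_name": input_name,
                    "source_step": conn["id"],
                    "source_output": conn["output_name"],
                }
            )
    return tables


def to_csv(rows: list) -> str:
    """Serialize a list of uniform dict rows to CSV text with a header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical two-step workflow used only for illustration.
example = {
    "id": "wf1",
    "name": "Demo",
    "steps": {
        "0": {"tool_id": None, "input_connections": {}},
        "1": {
            "tool_id": "cat1",
            "input_connections": {
                "input1": {"id": 0, "output_name": "output"}
            },
        },
    },
}
tables = flatten_workflow(example)
```

Keeping one row list per entity type means each CSV maps to a single node or relationship kind, which is what lets a config-driven loader treat every file the same way.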

Neo4j Loading and Cypher Generation:

  • Added generic_csv_loader.py, a config-driven loader that ingests the generated CSVs into Neo4j using deterministic IDs and merges nodes and relationships based on the configuration.
  • Added generate_cypher_files.py to emit Cypher files (nodes.cypher and edges.cypher) from CSVs, using the same configuration and ID strategy as the loader.
  • Added cypher_batch_loader.py, which executes the generated Cypher files against Neo4j, processing nodes before edges for correct graph construction.
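To make the shared ID strategy concrete, here is a rough sketch of how deterministic IDs and idempotent MERGE statements could be produced from the same inputs. The PR does not say how IDs are derived, so the SHA-256 hashing scheme, the 16-character truncation, and the function names `deterministic_id`, `node_merge`, and `edge_merge` are assumptions for illustration only.

```python
import hashlib
import json


def deterministic_id(label: str, *key_parts: str) -> str:
    """Derive a stable ID from a label and its natural key parts.

    Assumed scheme: the same inputs always hash to the same ID, so
    re-running the load merges into existing nodes instead of duplicating.
    """
    raw = "|".join((label, *key_parts))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def node_merge(label: str, key_parts: list, props: dict) -> str:
    """Build one MERGE statement for a node, keyed on its deterministic ID."""
    node_id = deterministic_id(label, *key_parts)
    # json.dumps is used here as an approximate string-literal encoder;
    # a production loader would more likely pass query parameters.
    assignments = ", ".join(
        f"n.{key} = {json.dumps(value)}" for key, value in sorted(props.items())
    )
    stmt = f"MERGE (n:{label} {{id: {json.dumps(node_id)}}})"
    if assignments:
        stmt += f" SET {assignments}"
    return stmt + ";"


def edge_merge(src_label: str, src_id: str, rel_type: str,
               dst_label: str, dst_id: str) -> str:
    """Build one MERGE statement for a relationship between existing nodes."""
    return (
        f"MATCH (a:{src_label} {{id: {json.dumps(src_id)}}}), "
        f"(b:{dst_label} {{id: {json.dumps(dst_id)}}}) "
        f"MERGE (a)-[:{rel_type}]->(b);"
    )
```

Because the edge statement uses MATCH for both endpoints, it creates nothing if either node is missing, which is one reason nodes need to be loaded before edges.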

Introduces YAML and Pydantic-based schema definitions for nodes and relationships in the ingestion pipeline, enabling flexible, strongly typed modeling of workflows, tools, categories, steps, and their interconnections. Facilitates easier schema evolution and validation when ingesting graph-structured data.
Introduces a generic, schema-flexible loader to convert workflow JSON into normalized CSVs and ingest them into Neo4j via a configurable mapping. Enables reusable, idempotent data import of workflows, tools, and relationships without hardcoding schema, supporting future extensibility and simplified dataset integration.
Introduces a new relationship definition to model the association between workflow steps and tools, enabling more granular tracking of tool usage within workflows. This enhances data integrity and supports future analytical capabilities regarding step dependencies.
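As an illustration of what typed node and relationship definitions buy, the sketch below models a tiny schema and checks that every relationship points at a declared node label. It deliberately uses standard-library dataclasses as a stand-in for the Pydantic models so it runs without dependencies; the class names, field names, CSV file names, and the `USES_TOOL` relationship type are hypothetical and not taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSchema:
    """Describes one node label and the CSV columns that feed it."""
    label: str
    csv_file: str
    id_columns: tuple
    properties: tuple = ()


@dataclass(frozen=True)
class RelationshipSchema:
    """Describes one relationship type between two node labels."""
    rel_type: str
    csv_file: str
    source_label: str
    target_label: str


def validate(nodes: list, relationships: list) -> list:
    """Return a list of error strings; an empty list means the schema is consistent."""
    labels = {node.label for node in nodes}
    errors = []
    for rel in relationships:
        for endpoint in (rel.source_label, rel.target_label):
            if endpoint not in labels:
                errors.append(f"{rel.rel_type}: unknown label {endpoint}")
    return errors


# Hypothetical schema: steps that use tools.
nodes = [
    NodeSchema("Step", "steps.csv", ("workflow_id", "step_index")),
    NodeSchema("Tool", "tools.csv", ("tool_id",), ("name",)),
]
relationships = [
    RelationshipSchema("USES_TOOL", "steps.csv", "Step", "Tool"),
]
schema_errors = validate(nodes, relationships)
```

Catching a dangling label at schema-load time is cheaper than discovering it after a partial import, which is the practical argument for validating the configuration before touching the database.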
Streamlines the loader file by deleting an extensive docstring with usage instructions and configuration notes. Aims to reduce redundancy and improve maintainability, assuming documentation is maintained elsewhere.
Introduces a utility for converting CSV-based node and relationship data into Cypher statements suitable for database import. Automates property handling, identifier generation, and supports configurable schemas to streamline graph data loading workflows.
Introduces a script to execute generated Cypher files for batch-loading nodes and edges into Neo4j. Automates the import process by reading Cypher files, splitting statements, and running them sequentially, improving reproducibility and simplifying data ingestion workflows.

Adds batch Cypher loader for automated Neo4j imports

Introduces a script to automate loading nodes and edges into Neo4j
from generated Cypher files. Enhances reproducibility and simplifies
the data ingestion workflow by running Cypher statements in order,
reducing manual intervention and potential for errors.
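The read, split, and run-in-order behavior described above might look roughly like this. It is a sketch under assumptions, not the code in cypher_batch_loader.py: the quote-aware splitter, the function names, and the `run` callback are illustrative, and the callback stands in for whatever executes a statement against Neo4j.

```python
from typing import Callable


def split_statements(text: str) -> list:
    """Split Cypher text on semicolons, ignoring semicolons inside quotes."""
    statements, current, quote = [], [], None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character so it cannot close the string.
                current.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def load_in_order(nodes_text: str, edges_text: str,
                  run: Callable[[str], object]) -> int:
    """Run every node statement, then every edge statement; return the count."""
    count = 0
    for text in (nodes_text, edges_text):
        for stmt in split_statements(text):
            run(stmt)
            count += 1
    return count
```

Passing the executor in as `run` keeps the ordering logic testable without a database; against a live instance it could be a session's `run` method from the Neo4j Python driver, though that wiring is not shown in this PR description.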
@Johnnas12 Johnnas12 merged commit 1643b9a into main Dec 25, 2025
0 of 2 checks passed
aprilyab pushed a commit to aprilyab/galaxy-agent-xp-II that referenced this pull request Dec 28, 2025
…pe-and-dynamic-configs

 flexible, and config-driven pipeline for loading workflow data into Neo4j