Issue 93 #96

khoale88 · 2025-11-27T17:13:55Z

handle multigraph to match with neo4j -> treat each edge as a row
return [] for no edge, {} for 1 edge and [{}, {}] for multi edge
refactor node/edge mapping to easier to follow
handle zero edge hop better
fix various bug

codspeed-hq · 2025-11-27T17:18:12Z

CodSpeed Performance Report

Merging #96 will degrade performances by 25.02%

_{Comparing khoale88:issue-93 (f26c01f) with master (e9dcd88)}

Summary

⚡ 1 improvement
❌ 3 regressions
✅ 93 untouched
🆕 2 new

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`test_alias_with_order_by`	48.2 ms	21.7 ms	×2.2
❌	`test_simple_multi_edge`	108 ms	144.1 ms	-25.02%
❌	`test_skip_and_limit`	138.1 ms	175.4 ms	-21.26%
❌	`test_skip_only`	108.8 ms	144.8 ms	-24.86%
🆕	`test_node_type_edge_zero_hop_complex[DiGraph]`	N/A	58.5 ms	N/A
🆕	`test_node_type_edge_zero_hop_complex[MultiDiGraph]`	N/A	58.6 ms	N/A

- no edge: []. - 1 edge: {key: value} - 2 edges: [{k:v}, {k: v}]. Cannot access attributes in return and where.

khoale88 · 2025-11-29T09:52:41Z

Hi @j6k4m8 , I'm doing this big refactor with a lot of changes. I need your help in reviewing and analyzing the impact and correctness of this.

The PR comes with 4 commites so I hope it's eaiser to follow.

return and check edge following neo4j

unwinde multi edge
no edge: [].
1 edge: {key: value}
2 edges: [{k:v}, {k: v}]. Cannot access attributes in return and
where.

In this commit, I separate the get_edge and get_node into 2 different utilities.

Also I unwind multiedge graph such that each edge is a row. This mean no more labels and attribute aggregation and confusion. I'll explain the unwind multi edge later in the 3rd commit. But it's the same as edge hop, I will take every single edge key between 2 nodes, and we have catesian product where each edge with unique edge key is a row.

Along the way, I treat relation as a list of edges, not just None or the one aggregated edge as we are having. This means some thing break if we access edge properties in where statement and return statement. As stated in the commit message

no edge is equivalent to []
one edge is equivalent to the edge it self {}. So the property access is working here
two or more edges is equivalent to [{}, {}]. Property access is confusing and is prohibited. I believe we need to implement relationship, ANY, ALL, and some other function to achieve the filtering on relationship

refactor edgehop to util

Our code is supporting variadic edge hop by generate different motif for every combination of edge hop value. The edge hop catesian production and generation is refactored to use models and classes.

I introduce the struct.HopSpec to store the original edge, the nodes after hop, the and the hop_count for this. It's important to store the hop_count for clarity and later usage

I delegate the realization of the motif to materialize_motif while I store the catesian product of edge hop edges in struct.HopAssignment (just a dictionary), and under the help of generate_edge_hop_specs and generate_hop_assignments

docs and test

This is important one as wel as I test and document some refactored function earlier like get_edge_from_host, find_multiedge_keys, generate_multiedge_edge_hop_key, etc.

To improve maintainability, We now have so many data structures like key, hop, edge mapping, node mapping (match) passing around. I deliberately introduce EdgePath, kindda similar to path = (a, b, c, d, e). With EdgePath, we can access the path itself, or the edges for the path, similar to edges = ((a, b), (b, c), (c, d), (d, e). And of course, we have the edge stored inside.

EdgeMapping is a wrapper for edge hops and edge keys together. You can access EdgePath of a relation from EdgeMapping without knowing much how to construct an EdgePath

Match model now is a combination of NodeMapping, EdgeMapping, and where_result (idk what is this for). I introduce a small model call MotifToHostView or mth. This should give us a lot of convenience not to map to a host node everytime we access a node in an edge

access nodes with mth
instead of assert match[0]["A"] == "a", we do assert match.mth.node("A") == "a'

access edges with mth
instead of

edge_path = edge_mapping[("A", "C")]
edge_path_edges = list(zip(edge_path[:-1], edge_path[1:]))
assert [(match[0][a], match[0][b]) for a, b in edge_path_edges] == [("a", "b"), ("b, "c")]

we do

edge_path = match.mth.edge("A", "C")
assert edge_path .edges == [("a", "b"), ("b, "c")]  # this is simplified example, the real model include edge key

touch up

some cosmetic docs and model-keys renamed.

Let me know if you have any concern.

P.S

I'm not sure why performance is not at par, do you have any insight? the catesian of multiedge keys should not be a main factor. This is not tested with nx.Graph

j6k4m8 · 2025-11-29T14:57:23Z

WOOHOOO I'm reading through this now! 🎉

I'm not sure why performance is not at par, do you have any insight? the catesian of multiedge keys should not be a main factor. This is not tested with nx.Graph

I wouldn't worry about this, this performance benchmarks seem a little wonky anyway. I will override this if we need to, I don't think that this is a real performance regression but I'll double-check before we merge.

j6k4m8

I just did a first pass and things look great — are you able to see the codspeed reports? Basically this looks like a bug on THEIR end, not something in the code:

CPython internals are none of my business :)

j6k4m8 · 2025-11-29T15:05:07Z

grandcypher/struct.py

+    u: str
+    v: str


Suggested change

u: str

v: str

u: NodeID

v: NodeID

Thanks, I make change in my local, together with future comments from you

j6k4m8 · 2025-11-29T15:14:07Z

grandcypher/struct.py

+        )
+
+
+class MotifToHostView:


This is such a smart data structure to have around

j6k4m8 · 2025-11-29T15:29:26Z

I'm still reading a bit to make sure I understand what's going on, but this is an amazing update!
Will double check with some existing workflows to make sure it doesn't break anything in end-to-end tests on my side

khoale88 · 2025-11-30T17:22:01Z

@j6k4m8 I have added the 5th commit. Sorry for this inconvenience. I merged the #92 into this actually

Collapse Nodes of Zero Edge Hop using Disjoint Set Union

We have some issues with handling zero edge hop.

chain zero hop edge

match (A) -[*0]->(B)-[*0]->(C)

This should collapse not only B to A, but also C to A

edge referencing to collapsed node

match  (A) -[*0]->(B) -->(C)

The edge (B, C) should become (A, C)

using DSU as a solution

DSU is known for solving this efficiently
after passing zero hop to DSU, we have alias that can be used to identify where is B and/or C is referenced to

Additional improvement

docs and tests

more docs and tests

Double check multi graph edges

As grandiso cannot efficiently filter multiple edges betwen 2 nodes, we have to do this in our code (line 1294-1309). We only do the double check if there are more than 1 edge between 2 nodes

if edge.u != edge.v and self._target_graph.number_of_edges(host_u, host_v) < 2:
    continue

direct equijoin fix

Grandiso doesn't work well with equijoin.

match (n) --> (n)

We fix this by double check the edge if it's a direct equijoin or skip the double check if it's not

if edge.u != edge.v and self._target_graph.number_of_edges(host_u, host_v) < 2:
    continue

j6k4m8 · 2025-12-06T14:14:06Z

I finally got a chance to take a look at the latest — this is fabulous, I'm really excited to get this merged in :)

Okay with you for me to merge this? I am overriding the codspeed performance regression as it looks like it's a CPython internal.

@davidmezzetti I imagine this may impact some existing userbases if they have saved queries; my intention is to roll this as a v2.0.0 so it's not auto-updated in the existing consuming libraries and tools. Is that compatible with your use?

khoale88 · 2025-12-06T14:55:03Z

I'm okay to merge. Regarding the release management, are we going to maintain the v1.x branches (fix bug, etc) for a while?

If possible, can we hold on to the release of v2.0? I'm going to have other PR soon.

j6k4m8 · 2025-12-06T15:17:51Z

definitely, I'll merge this in but hold off on the 2* release!

davidmezzetti · 2025-12-11T23:42:14Z

I finally got a chance to take a look at the latest — this is fabulous, I'm really excited to get this merged in :)

Okay with you for me to merge this? I am overriding the codspeed performance regression as it looks like it's a CPython internal.

@davidmezzetti I imagine this may impact some existing userbases if they have saved queries; my intention is to roll this as a v2.0.0 so it's not auto-updated in the existing consuming libraries and tools. Is that compatible with your use?

@j6k4m8 Exciting to see the library evolving and progressing! As long as the breaking changes are documented, I'll fix it in TxtAI where possible. When not possible, I'll at least be able to point to the docs on how to update queries.

khoale88 added 3 commits November 29, 2025 15:46

return and check edge following neo4j

544c345

- no edge: []. - 1 edge: {key: value} - 2 edges: [{k:v}, {k: v}]. Cannot access attributes in return and where.

refactor edgehop to util

2413569

docs and tests

5d8264a

khoale88 force-pushed the issue-93 branch from b4e0ffd to 9283d18 Compare November 29, 2025 08:50

touch up

47381cf

khoale88 force-pushed the issue-93 branch from 9283d18 to 47381cf Compare November 29, 2025 09:08

j6k4m8 reviewed Nov 29, 2025

View reviewed changes

khoale88 force-pushed the issue-93 branch from 73e2a25 to 156817a Compare November 30, 2025 17:05

use union find to collapse nodes of zero hop edges

f26c01f

khoale88 force-pushed the issue-93 branch from 156817a to f26c01f Compare November 30, 2025 17:06

j6k4m8 merged commit bf30e3e into aplbrain:master Dec 6, 2025
6 of 7 checks passed

-    u: str
-    v: str
+    u: NodeID
+    v: NodeID

		)


		class MotifToHostView:

Issue 93 #96

Issue 93 #96

Uh oh!

Conversation

khoale88 commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #96 will degrade performances by 25.02%

Summary

Benchmarks breakdown

Uh oh!

khoale88 commented Nov 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

return and check edge following neo4j

refactor edgehop to util

docs and test

touch up

P.S

Uh oh!

j6k4m8 commented Nov 29, 2025

Uh oh!

j6k4m8 left a comment

Choose a reason for hiding this comment

Uh oh!

j6k4m8 Nov 29, 2025

Choose a reason for hiding this comment

Uh oh!

khoale88 Nov 29, 2025

Choose a reason for hiding this comment

Uh oh!

j6k4m8 Nov 29, 2025

Choose a reason for hiding this comment

Uh oh!

j6k4m8 commented Nov 29, 2025

Uh oh!

khoale88 commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Collapse Nodes of Zero Edge Hop using Disjoint Set Union

chain zero hop edge

edge referencing to collapsed node

using DSU as a solution

Additional improvement

docs and tests

Double check multi graph edges

direct equijoin fix

Uh oh!

j6k4m8 commented Dec 6, 2025

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

j6k4m8 commented Dec 6, 2025

Uh oh!

Uh oh!

davidmezzetti commented Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

khoale88 commented Nov 27, 2025 •

edited

Loading

codspeed-hq bot commented Nov 27, 2025 •

edited

Loading

khoale88 commented Nov 29, 2025 •

edited

Loading

khoale88 commented Nov 30, 2025 •

edited

Loading