-
Notifications
You must be signed in to change notification settings - Fork 15
Issue 93 #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 93 #96
Conversation
CodSpeed Performance ReportMerging #96 will degrade performances by 25.02%Comparing Summary
Benchmarks breakdown
|
- no edge: [].
- 1 edge: {key: value}
- 2 edges: [{k:v}, {k: v}]. Cannot access attributes in return and
where.
|
Hi @j6k4m8 , I'm doing this big refactor with a lot of changes. I need your help in reviewing and analyzing the impact and correctness of this. The PR comes with 4 commites so I hope it's eaiser to follow. return and check edge following neo4j
In this commit, I separate the get_edge and get_node into 2 different utilities. Also I unwind multiedge graph such that each edge is a row. This mean no more labels and attribute aggregation and confusion. I'll explain the unwind multi edge later in the 3rd commit. But it's the same as edge hop, I will take every single edge key between 2 nodes, and we have catesian product where each edge with unique edge key is a row. Along the way, I treat relation as a list of edges, not just None or the one aggregated edge as we are having. This means some thing break if we access edge properties in where statement and return statement. As stated in the commit message
refactor edgehop to utilOur code is supporting variadic edge hop by generate different motif for every combination of edge hop value. The edge hop catesian production and generation is refactored to use models and classes. I introduce the I delegate the realization of the motif to docs and testThis is important one as wel as I test and document some refactored function earlier like To improve maintainability, We now have so many data structures like key, hop, edge mapping, node mapping (match) passing around. I deliberately introduce EdgePath, kindda similar to path = (a, b, c, d, e). With EdgePath, we can access the path itself, or the edges for the path, similar to edges = ((a, b), (b, c), (c, d), (d, e). And of course, we have the edge stored inside. EdgeMapping is a wrapper for edge hops and edge keys together. You can access EdgePath of a relation from EdgeMapping without knowing much how to construct an EdgePath Match model now is a combination of NodeMapping, EdgeMapping, and where_result (idk what is this for). I introduce a small model call MotifToHostView or mth. This should give us a lot of convenience not to map to a host node everytime we access a node in an edge access nodes with mth access edges with mth edge_path = edge_mapping[("A", "C")]
edge_path_edges = list(zip(edge_path[:-1], edge_path[1:]))
assert [(match[0][a], match[0][b]) for a, b in edge_path_edges] == [("a", "b"), ("b, "c")]we do edge_path = match.mth.edge("A", "C")
assert edge_path .edges == [("a", "b"), ("b, "c")] # this is simplified example, the real model include edge keytouch upsome cosmetic docs and model-keys renamed. Let me know if you have any concern. P.SI'm not sure why performance is not at par, do you have any insight? the catesian of multiedge keys should not be a main factor. This is not tested with nx.Graph |
|
WOOHOOO I'm reading through this now! 🎉
I wouldn't worry about this, this performance benchmarks seem a little wonky anyway. I will override this if we need to, I don't think that this is a real performance regression but I'll double-check before we merge. |
j6k4m8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grandcypher/struct.py
Outdated
| u: str | ||
| v: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| u: str | |
| v: str | |
| u: NodeID | |
| v: NodeID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I make change in my local, together with future comments from you
| ) | ||
|
|
||
|
|
||
| class MotifToHostView: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a smart data structure to have around
|
I'm still reading a bit to make sure I understand what's going on, but this is an amazing update! |
|
@j6k4m8 I have added the 5th commit. Sorry for this inconvenience. I merged the #92 into this actually Collapse Nodes of Zero Edge Hop using Disjoint Set UnionWe have some issues with handling zero edge hop. chain zero hop edgematch (A) -[*0]->(B)-[*0]->(C)This should collapse not only B to A, but also C to A edge referencing to collapsed nodematch (A) -[*0]->(B) -->(C)The edge (B, C) should become (A, C) using DSU as a solutionDSU is known for solving this efficiently Additional improvementdocs and testsmore docs and tests Double check multi graph edgesAs grandiso cannot efficiently filter multiple edges betwen 2 nodes, we have to do this in our code (line 1294-1309). We only do the double check if there are more than 1 edge between 2 nodes if edge.u != edge.v and self._target_graph.number_of_edges(host_u, host_v) < 2:
continuedirect equijoin fixGrandiso doesn't work well with equijoin. match (n) --> (n)We fix this by double check the edge if it's a direct equijoin or skip the double check if it's not if edge.u != edge.v and self._target_graph.number_of_edges(host_u, host_v) < 2:
continue |
|
I finally got a chance to take a look at the latest — this is fabulous, I'm really excited to get this merged in :) Okay with you for me to merge this? I am overriding the codspeed performance regression as it looks like it's a CPython internal. @davidmezzetti I imagine this may impact some existing userbases if they have saved queries; my intention is to roll this as a v2.0.0 so it's not auto-updated in the existing consuming libraries and tools. Is that compatible with your use? |
|
I'm okay to merge. Regarding the release management, are we going to maintain the v1.x branches (fix bug, etc) for a while? If possible, can we hold on to the release of v2.0? I'm going to have other PR soon. |
|
definitely, I'll merge this in but hold off on the 2* release! |
@j6k4m8 Exciting to see the library evolving and progressing! As long as the breaking changes are documented, I'll fix it in TxtAI where possible. When not possible, I'll at least be able to point to the docs on how to update queries. |

Uh oh!
There was an error while loading. Please reload this page.