Scalar aggregation classes by khoale88 · Pull Request #97 · aplbrain/grand-cypher

khoale88 · 2025-12-06T15:46:40Z

No description provided.

codspeed-hq · 2025-12-06T15:51:04Z

CodSpeed Performance Report

Merging #97 will degrade performances by 10.85%

_{Comparing khoale88:scalar_aggregation_classes (4646928) with master (bf30e3e)}

Summary

❌ 2 regressions
✅ 97 untouched

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`test_nested_nots_in_statements[MultiDiGraph]`	33.3 ms	37.4 ms	-10.85%
❌	`test_nested_nots_in_statements[DiGraph]`	33.3 ms	37.3 ms	-10.84%

khoale88 · 2025-12-06T15:57:17Z

I know there might be some performance hit and I 'll take a look at that. I will provide details for this PR later, but basically, Refactoring and better support for scalar functions like ID, toLower, toUpper, Trim, Type, Coalesce, Size (return and left side condition). Support for better aggregation as well, including Collect, Relationships, list comprehensive syntax. And lastly Condition on list such as ANY, ALL, None, Single

khoale88 · 2025-12-06T16:42:27Z

Commit 1: Support list predicates all, any, none, single, size, relationships

support list_expression and relationships function

The list expression can be the entity_id which is a list by itself or through relationships function

list_expression      : "relationships"i "(" entity_id ")"  -> relationships_function
                             | entity_id  -> entity_list

The relationships function is to retrieve all relationships in a mattern matching, but I don't have that ATM. Right now we simply have this in our test and support

        MATCH (a)-[r*1..3]->(c)
        WHERE size(relationships(r)) = 2
        RETURN r

Basically, it just return the value (already a list). But this should be the base sample for future support. This is an extension of ListExpression

class RelationshipsFunction(ListExpression)

support all/none/single/any

This works on list_expression. One of the sample is

        MATCH (a)-[r*2]->(c)
        WHERE all(edge IN r WHERE edge.weight > 5)
        RETURN r

These classes are just conditions. And they accept nested compound condition.

        MATCH (a)-[r*2]->(c)
        WHERE all(edge IN r WHERE edge.weight > 5 AND edge.type = "friend")
        RETURN r

SIZE function

we support size to compute size of a list. In this commit, it is an instance of Comdition, but it should be a Scalar Function.

Introduction to Scope.

the signatures of these function are quite the same as Condition, except that they accept scope (dict) parameter. Consider the following query, the scope variable is the "edge", element of r, which is not a, neither r, neither c. so match doesn't work.

        MATCH (a)-[r*2]->(c)
        WHERE all(edge IN r WHERE edge.weight > 5 AND edge.type = "friend")
        RETURN r

khoale88 · 2025-12-06T16:59:05Z

Commit 2: refactor ID(A) to Scalar Function

ID and ScalarFunction

We have an intermediate class call ScalarFunction, it is an extension of Condition (idk why I inherit it as a Condition, but it doesn't really matter for now)

Under Scalar we have ID, and SIZE. Basically, the ID is getting a node id from match. So we nolonger have to manually handle ID(A) in the where, and the return.

For example in the lookup, we just call the scalar function data_path(...) to get the value. This should work with all scalar functions like ID and SIZE in both WHERE and RETURN statements

for data_path in data_paths:
            if isinstance(data_path, ScalarFunction):
                # Evaluate scalar function for each match
                ret = []
                for match in true_matches:
                    result_value = data_path(
                        match,
                        self._target_graph,
                        self._return_edges,
                        scope=None
                    )
                    ret.append(result_value)

                # Use str(data_path) as key: "ID(A)", "size(r)", etc.
                result[str(data_path)] = ret[offset_limit]
                processed_paths.add(data_path)
                processed_paths.add(str(data_path))
                continue

khoale88 · 2025-12-06T17:39:00Z

Commit 3+4: refactor aggregation to classes

Class AggregationFunction and aggregation.

Introduce classes AggregationFunction (abstract) and concrete Count, Sum, Avg, Max, Min, Collect
The names are self-explainatory, some we already support in aggregation, but this is a little bit different. we have separated classes for aggregation instead of just a function handling if-else cases.
This should pave the way for handling aggregation in WHERE clause in the future with WITH statement.
Along the way, we discover something we do it wrong, and I haven't fix it. The following query should not work because we return AVR(r.value) and r.value at the same time, which is invalid query.

        MATCH (n)-[r]->()
        RETURN n.name, AVG(r.value), r.value
        ORDER BY AVG(r.value) DESC

ScalarFunction and others

Introduce ScalarFunction, ToLower, ToUpper, Trim, Type, and Coalesce
This should work in both RETURN and WHERE statements
Just realize I forgot to test scalar in WHERE, but this should work (manually checked)

    def test_string_functions_with_where(self):
        """Test string functions in WHERE clause"""
        host = nx.DiGraph()
        host.add_node("a", name="ALICE")
        host.add_node("b", name="BOB")
    
        qry = """
        MATCH (n)
        WHERE toLower(n.name) = "alice"
        RETURN n.name
        """
        res = GrandCypher(host).run(qry)
        assert set(res["n.name"]) == {"ALICE"}

introduction to EntityAttributeGetter

This is to answer the question: We handle Return ID(A), size(r), how about Return A, B.name? I just move the logic of getting variables EntityAttributeGetter, including getting match node/match edge/scope variable

def evaluate(self, match: Match, host: nx.DiGraph,
                 return_edges: dict = None, scope: dict = None):
        """
        Evaluate this entity reference against a match.

        Priority order for resolution:
        1. Scope variables (highest priority - for list predicates)
        2. Node mappings (standard case)
        3. Edge mappings (for edge references)
        4. None (not found)

        Args:
            match: The current match containing node mappings
            host: The graph to query
            return_edges: Optional edge mappings for edge references
            scope: Optional scope dictionary for list predicate variables

        Returns:
            The attribute value if found, None otherwise
        """
        # 1. Check scope first (highest priority for list predicates)
        if scope and self.entity in scope:
            element = scope[self.entity]
            if self.attribute:
                # Scope variable with attribute access: e.related
                return element.get(self.attribute) if isinstance(element, dict) else None
            # Simple scope variable: e
            return element

        # 2. Check node mappings (standard case)
        if self.entity in match.node_mappings:
            node_id = match.node_mappings[self.entity]
            if self.attribute:
                # Node with attribute: n.name
                return host.nodes[node_id].get(self.attribute)
            # Simple node reference: n - return full node dictionary
            return dict(host.nodes[node_id])

        # 3. Check edge mappings (for edge references)
        if return_edges and self.entity in return_edges:
            edge_mapping = return_edges[self.entity]
            host_edges = match.mth.edge(*edge_mapping).edges
            return get_edge_from_host(host, host_edges, self.attribute)

        return None

Commit 5: docs

let's check docs/architecture docs

khoale88 · 2025-12-06T17:41:51Z

More work to be done

more tests regarding scalar and aggregation, simple and complex, in WHERE clause.
more throughout test with auto-hint to see these changes does trigger auto hints properly or not?

j6k4m8 · 2025-12-19T01:14:24Z

@khoale88 sorry I haven't been commenting — I HAVE been following this closely, this is really exciting work, and I think gets us away from a lot of the hacks this repo was using earlier on and puts us on much firmer ground. THANK YOU!

I'm wondering how you're thinking about approaching functions like size where they can act on a list or on, say, a string:

size('hello') // should return 5

[docs]

It looks like the function should support it, though I think the grammar expects a list right now?

I'm super excited about this PR!!!

khoale88 · 2025-12-21T16:01:52Z

you are right @j6k4m8 , syntax support should be easy, let me add that ... after the break :D.

khoale88 added 5 commits December 6, 2025 22:42

support list predicates all, any, none, single, size, relationships

6918a1d

refactor ID(A) to Scalar Function

df736f6

refactor aggregation to classes

1a5adbc

refactor entity getter to scalar functions

9a27df5

architecture docs

4583f32

khoale88 force-pushed the scalar_aggregation_classes branch from 2fdc232 to acc89ca Compare December 13, 2025 17:52

suppport nested scalar in aggregation, more tests

4646928

khoale88 force-pushed the scalar_aggregation_classes branch from acc89ca to 4646928 Compare December 13, 2025 18:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scalar aggregation classes#97

Scalar aggregation classes#97
khoale88 wants to merge 6 commits intoaplbrain:masterfrom
khoale88:scalar_aggregation_classes

khoale88 commented Dec 6, 2025

Uh oh!

codspeed-hq bot commented Dec 6, 2025 •

edited

Loading

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

khoale88 commented Dec 6, 2025 •

edited

Loading

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

j6k4m8 commented Dec 19, 2025

Uh oh!

khoale88 commented Dec 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

khoale88 commented Dec 6, 2025

Uh oh!

codspeed-hq bot commented Dec 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #97 will degrade performances by 10.85%

Summary

Benchmarks breakdown

Uh oh!

khoale88 commented Dec 6, 2025

Uh oh!

khoale88 commented Dec 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commit 1: Support list predicates all, any, none, single, size, relationships

support list_expression and relationships function

support all/none/single/any

SIZE function

Introduction to Scope.

Uh oh!

khoale88 commented Dec 6, 2025

Commit 2: refactor ID(A) to Scalar Function

ID and ScalarFunction

Uh oh!

khoale88 commented Dec 6, 2025

Commit 3+4: refactor aggregation to classes

Class AggregationFunction and aggregation.

ScalarFunction and others

introduction to EntityAttributeGetter

Commit 5: docs

Uh oh!

khoale88 commented Dec 6, 2025

More work to be done

Uh oh!

j6k4m8 commented Dec 19, 2025

Uh oh!

khoale88 commented Dec 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codspeed-hq bot commented Dec 6, 2025 •

edited

Loading

khoale88 commented Dec 6, 2025 •

edited

Loading