-
-
Notifications
You must be signed in to change notification settings - Fork 3
Designing an universal & relational query engine #9
Description
Introduction
We want to design a query engine that can accept any representation of SQL query statement, including but not limited to GraphQL and SQL. Then, the engine will parse the input query and convert it into a subquery tree that can be executed in parallel. Finally, execute those subquery and consolidate the result of each to produce the final output.
Architecture
Query Parser
It will parse any types that implemented
QueryParserTraitinto the internal typeQueryAST, which represents the query syntax tree.
By default, QueryParserTrait for GraphQL and SQL will be implemented. Of cause, QueryParserTrait can be implemented by any other foreign types as well.
QueryAST is a struct that have all the necessary information to reconstruct query statement on the target database or data storage, including...
- SELECT
- FROM
- JOIN
- WHERE
- ORDER BY
- GROUP BY
- LIMIT
- OFFSET
Query Planner
Split the
QueryASTinto a non-relational query tree that can be executed in parallel. And, come up with a strategy to consolidate those query results to form the final result.
If a query have subquery in it, those subquery would be split off and execute as a independent query. There are three place we could find subquery in a query, including...
- SELECT
- FROM
- WHERE
Also, we have to consider whether the subquery is correlated subquery or not.
- Correlated subquery: the correlated part, in the following case,
p1.productline, should be substituted with a concrete value and fetch oneAVG(buyprice)for each products. Or, group byproductlineand match the list ofAVG(buyprice)accordingly.SELECT productname, buyprice FROM products p1 --p1 was used in the subquery WHERE buyprice > (SELECT AVG(buyprice) FROM products WHERE productline = p1.productline)
- Independent subquery: fetch the result of subquery and apply the filter or match it against parent query
SELECT productname, buyprice FROM products WHERE buyprice > (SELECT AVG(buyprice) FROM products)
Query Executor
Execute the non-relational query tree in parallel and sort the result in a way that is easy to be consolidated as the final result.
How are we going to sort and aggregate the result efficiently? I think it's better to ask db engine to sort the result for us.
Brainstorming
- Are we going to support cross database querying? E.g. query aggregated data from multiple databases. It can be homogeneous: between multiple database of the same kind, such as between two MySQL. Or, it can be heterogeneous: between multiple database of different kind, such as between MySQL and PostgreSQL.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status