This directory contains the technical documentation for the lifecycle of the cloudSQL migration from C to C++, and its subsequent expansion into a distributed engine.
Focus: Type safety and Paged Storage.
- Modernized `Value` system using `std::variant`.
- Binary-compatible `StorageManager` and thread-safe `BufferPoolManager`.
- Slot-based `HeapTable` implementation.
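
As a rough illustration of the `std::variant` approach to a type-safe value system, the sketch below defines a small value type and visits it. The set of alternatives and the helper names `to_text` and `is_null` are assumptions made for this example; the project's actual `Value` class may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the project's Value type. The real set of
// alternatives is not listed in this document.
using Value = std::variant<std::monostate, std::int64_t, double, std::string>;

// True when the value holds no data (SQL NULL in this sketch).
inline bool is_null(const Value& v) {
    return std::holds_alternative<std::monostate>(v);
}

// Render a value as text by visiting whichever alternative is active.
// The compiler rejects the visitor if an alternative is left unhandled,
// which is the type-safety benefit over a tagged C union.
inline std::string to_text(const Value& v) {
    return std::visit(
        [](const auto& x) -> std::string {
            using T = std::decay_t<decltype(x)>;
            if constexpr (std::is_same_v<T, std::monostate>) {
                return "NULL";
            } else if constexpr (std::is_same_v<T, std::string>) {
                return x;
            } else {
                return std::to_string(x);  // int64_t or double
            }
        },
        v);
}
```

A default-constructed `Value` holds `std::monostate`, so "no value" needs no separate flag.
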
Focus: Volcano Model & Communication.
- Iterator-based physical operators (`SeqScan`, `Filter`, `Project`, `HashJoin`).
- POSIX-based internal RPC layer.
- PostgreSQL Wire Protocol (Handshake + Simple Query).
- Local `LockManager` for concurrency control.
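
The iterator-based (Volcano) operators listed above can be sketched as a pull-based interface in which each operator asks its child for one tuple at a time. This is a minimal sketch only: the `Operator` base class, the `Tuple` alias, and the `run` helper are hypothetical, and the project's real operator signatures are not shown in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

using Tuple = std::vector<int>;  // simplified row: a list of integer columns

// Hypothetical operator interface: open, pull tuples one by one, close.
class Operator {
public:
    virtual ~Operator() = default;
    virtual void open() = 0;
    virtual std::optional<Tuple> next() = 0;  // std::nullopt means exhausted
    virtual void close() = 0;
};

// Leaf operator: yields the rows of an in-memory table in order.
class SeqScan : public Operator {
public:
    explicit SeqScan(std::vector<Tuple> rows) : rows_(std::move(rows)) {}
    void open() override { pos_ = 0; }
    std::optional<Tuple> next() override {
        if (pos_ >= rows_.size()) return std::nullopt;
        return rows_[pos_++];
    }
    void close() override {}
private:
    std::vector<Tuple> rows_;
    std::size_t pos_ = 0;
};

// Pulls from its child and passes on only tuples that satisfy the predicate.
class Filter : public Operator {
public:
    Filter(std::unique_ptr<Operator> child, std::function<bool(const Tuple&)> pred)
        : child_(std::move(child)), pred_(std::move(pred)) {}
    void open() override { child_->open(); }
    std::optional<Tuple> next() override {
        while (auto t = child_->next()) {
            if (pred_(*t)) return t;
        }
        return std::nullopt;
    }
    void close() override { child_->close(); }
private:
    std::unique_ptr<Operator> child_;
    std::function<bool(const Tuple&)> pred_;
};

// Drain a plan from its root operator into a result vector.
inline std::vector<Tuple> run(Operator& root) {
    std::vector<Tuple> out;
    root.open();
    while (auto t = root.next()) out.push_back(std::move(*t));
    root.close();
    return out;
}
```

Because every operator exposes the same three calls, a plan is just a tree of operators, and the executor only needs to drive the root.
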
Focus: SQL Ingestion & Metadata.
- Recursive Descent Parser for DDL and DML.
- Global `Catalog` for schema management.
- Integration of System Tables for persistence.
Focus: Raft Consistency.
- Core Raft implementation (Leader Election, Heartbeats, Replication).
- Catalog-Raft integration for consistent metadata.
- `ClusterManager` for node discovery and membership.
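
Two majority rules sit at the heart of Raft leader election and replication, and they can be sketched in a few lines. The function names below are hypothetical and are not taken from the project's Raft code. The sketch also leaves out Raft's extra condition that a leader only commits entries from its own term by counting replicas.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// A candidate becomes leader once a strict majority of the cluster
// (counting its own vote) has granted it a vote.
inline bool wins_election(std::size_t votes_granted, std::size_t cluster_size) {
    return votes_granted > cluster_size / 2;
}

// match_index holds, for every node including the leader, the highest log
// index known to be stored on that node. The result is the highest index
// that is present on a majority of nodes.
inline std::uint64_t majority_match_index(std::vector<std::uint64_t> match_index) {
    assert(!match_index.empty());
    std::sort(match_index.begin(), match_index.end(),
              std::greater<std::uint64_t>());
    // After a descending sort, the element at position n/2 is stored on at
    // least n/2 + 1 nodes, which is a strict majority.
    return match_index[match_index.size() / 2];
}
```
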
Focus: Performance & Advanced Joins.
- Shard Pruning logic for targeted routing.
- Global Aggregation Merging (COUNT/SUM).
- Broadcast Join orchestration.
- Inter-node data redistribution (Shuffle infrastructure).
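
Global aggregation merging for COUNT and SUM can be pictured as each shard returning a partial result that the coordinator combines. The sketch below assumes that shape; `PartialAgg` and `merge_partials` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Partial aggregate computed locally by one shard (hypothetical layout).
struct PartialAgg {
    std::int64_t count;
    std::int64_t sum;
};

// Combine per-shard partials into the global result. COUNT and SUM are
// both merged by addition, so shard order does not matter.
inline PartialAgg merge_partials(const std::vector<PartialAgg>& parts) {
    PartialAgg total{0, 0};
    for (const PartialAgg& p : parts) {
        total.count += p.count;
        total.sum += p.sum;
    }
    return total;
}
```

An average would be derived at the coordinator as `sum / count` from the merged pair, since averaging per-shard averages gives the wrong answer when shards hold different row counts.
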
Focus: High-throughput Data Redistribution.
- Context-aware Shuffle infrastructure in `ClusterManager`.
- Implementation of `ShuffleFragment` and `PushData` RPC protocols.
- Two-phase Shuffle Join orchestration in `DistributedExecutor`.
- Bloom Filter Optimization: Probabilistic tuple filtering to reduce network traffic in shuffle joins.
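
To make the Bloom filter optimization concrete, here is a minimal sketch, assuming integer join keys: the build side inserts its keys into a filter, and the probe side drops tuples whose key is definitely absent before shuffling them. The class layout, the hash mixing function, and `prefilter` are illustrative assumptions rather than the project's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BloomFilter {
public:
    BloomFilter(std::size_t num_bits, std::size_t num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {
        assert(num_bits > 0 && num_hashes > 0);
    }

    void insert(std::int64_t key) {
        for (std::size_t i = 0; i < num_hashes_; ++i) bits_[index(key, i)] = true;
    }

    // false: the key was definitely never inserted.
    // true:  the key may have been inserted (false positives are possible).
    bool may_contain(std::int64_t key) const {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            if (!bits_[index(key, i)]) return false;
        }
        return true;
    }

private:
    // Simple 64-bit bit mixer used as the hash for this sketch.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return x;
    }
    // Double hashing: derive the i-th bit position from two base hashes.
    std::size_t index(std::int64_t key, std::size_t i) const {
        const std::uint64_t h1 = mix(static_cast<std::uint64_t>(key));
        const std::uint64_t h2 = mix(h1 ^ 0x9e3779b97f4a7c15ULL) | 1ULL;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t num_hashes_;
};

// Probe side: keep only keys that might match, preserving input order.
inline std::vector<std::int64_t> prefilter(const BloomFilter& bf,
                                           const std::vector<std::int64_t>& probe_keys) {
    std::vector<std::int64_t> kept;
    for (std::int64_t k : probe_keys) {
        if (bf.may_contain(k)) kept.push_back(k);
    }
    return kept;
}
```

The filter never produces false negatives, so no matching tuple is lost. A false positive only costs an unnecessary transfer that the join itself then discards.
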
Focus: Fault Tolerance & Data Redundancy.
- Multi-Group Raft management via `RaftManager`.
- Log-based data replication for DML operations.
- Leader-aware query routing and automatic failover.
Focus: Columnar Storage & Vectorized Execution.
- Native Columnar storage implementation with binary persistence.
- Batch-at-a-time vectorized execution model (Scan, Filter, Project, Aggregate).
- High-performance `NumericVector` and `VectorBatch` data structures.
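
The batch-at-a-time model can be illustrated with a filter that emits a selection vector and an aggregate that consumes it. The `Int64Column` type below is a deliberately simplified, hypothetical stand-in; it is not the project's `NumericVector` or `VectorBatch`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-column batch: values stored contiguously.
struct Int64Column {
    std::vector<std::int64_t> data;
};

// Vectorized filter: scan the whole column in one tight loop and return
// the positions of qualifying rows (a selection vector).
inline std::vector<std::size_t> filter_greater(const Int64Column& col,
                                               std::int64_t threshold) {
    std::vector<std::size_t> sel;
    sel.reserve(col.data.size());
    for (std::size_t i = 0; i < col.data.size(); ++i) {
        if (col.data[i] > threshold) sel.push_back(i);
    }
    return sel;
}

// Vectorized aggregate: sum only the rows named by the selection vector.
inline std::int64_t sum_selected(const Int64Column& col,
                                 const std::vector<std::size_t>& sel) {
    std::int64_t total = 0;
    for (std::size_t i : sel) total += col.data[i];
    return total;
}
```

Compared with a tuple-at-a-time iterator, each operator call here processes a whole batch, which amortizes per-call overhead and keeps the inner loops over contiguous memory.
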
Focus: Engine Robustness & E2E Validation.
- Advanced Execution: Full support for `LEFT`, `RIGHT`, and `FULL` outer joins.
- Transactional Integrity: Persistent connection-based execution state and comprehensive `ROLLBACK` support for all DML operations.
- Logic Validation: Integration of the SqlLogicTest (SLT) suite with 80+ logic test cases covering Joins, Transactions, Aggregates, and Indexes.
- Automation: Standardized cross-platform test orchestration via `run_test.sh` with automatic CPU detection.
- Standard: C++17
- Build System: CMake
- Tests: GoogleTest
- Protocol: Binary RPC (internal) / PostgreSQL Wire Protocol (external).