cloudSQL C++ Migration & Distributed Roadmap

This directory contains the technical documentation for the cloudSQL migration from C to C++ and its subsequent expansion into a distributed engine.

Lifecycle Phases

Focus: Type safety and Paged Storage.

  • Modernized Value system using std::variant.
  • Binary-compatible StorageManager and thread-safe BufferPoolManager.
  • Slot-based HeapTable implementation.
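
As an illustration of what a std::variant-based value type can look like, here is a minimal sketch. The `Null` tag, the chosen alternatives, and the `to_string` helper are assumptions made for the example and are not taken from the cloudSQL sources.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical SQL value: the set of alternatives is an assumption.
struct Null {};
using Value = std::variant<Null, std::int64_t, double, std::string>;

// Render a Value as text by visiting whichever alternative is active.
inline std::string to_string(const Value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, Null>) {
            return "NULL";
        } else if constexpr (std::is_same_v<T, std::string>) {
            return x;
        } else {
            return std::to_string(x);  // int64_t or double
        }
    }, v);
}
```

Because the visitor must handle every alternative, adding a new type to `Value` produces a compile error wherever a case is missing, which is the type-safety benefit this phase targets.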

Focus: Volcano Model & Communication.

  • Iterator-based physical operators (SeqScan, Filter, Project, HashJoin).
  • POSIX-based internal RPC layer.
  • PostgreSQL Wire Protocol (Handshake + Simple Query).
  • Local LockManager for concurrency control.
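
The iterator-based (Volcano) model can be sketched as follows: every operator exposes `open`/`next`/`close`, and a parent pulls one tuple at a time from its child. The `Operator` interface, the `Tuple` alias, and the `drain` helper are simplified assumptions for illustration; the actual cloudSQL operator signatures may differ.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

using Tuple = std::vector<int>;  // simplified row: assumption for the sketch

// Pull-based operator interface (hypothetical shape).
class Operator {
public:
    virtual ~Operator() = default;
    virtual void open() = 0;
    virtual std::optional<Tuple> next() = 0;  // nullopt signals end of stream
    virtual void close() = 0;
};

// Leaf operator: yields rows from an in-memory table one at a time.
class SeqScan : public Operator {
public:
    explicit SeqScan(std::vector<Tuple> rows) : rows_(std::move(rows)) {}
    void open() override { pos_ = 0; }
    std::optional<Tuple> next() override {
        if (pos_ >= rows_.size()) return std::nullopt;
        return rows_[pos_++];
    }
    void close() override {}
private:
    std::vector<Tuple> rows_;
    std::size_t pos_ = 0;
};

// Pulls from its child until a tuple satisfies the predicate.
class Filter : public Operator {
public:
    Filter(std::unique_ptr<Operator> child, std::function<bool(const Tuple&)> pred)
        : child_(std::move(child)), pred_(std::move(pred)) {}
    void open() override { child_->open(); }
    std::optional<Tuple> next() override {
        while (auto t = child_->next()) {
            if (pred_(*t)) return t;
        }
        return std::nullopt;
    }
    void close() override { child_->close(); }
private:
    std::unique_ptr<Operator> child_;
    std::function<bool(const Tuple&)> pred_;
};

// Runs an operator tree to completion and collects its output.
inline std::vector<Tuple> drain(Operator& op) {
    std::vector<Tuple> out;
    op.open();
    while (auto t = op.next()) out.push_back(std::move(*t));
    op.close();
    return out;
}
```

Operators such as Project or HashJoin fit the same interface, so a plan is just a tree of `Operator` objects driven from the root.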

Focus: SQL Ingestion & Metadata.

  • Recursive Descent Parser for DDL and DML.
  • Global Catalog for schema management.
  • Integration of System Tables for persistence.

Focus: Raft Consistency.

  • Core Raft implementation (Leader Election, Heartbeats, Replication).
  • Catalog-Raft integration for consistent metadata.
  • ClusterManager for node discovery and membership.
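
For readers unfamiliar with Raft leader election, the vote-granting rule from the Raft algorithm can be sketched as below. The struct and function names are hypothetical and this is not the project's implementation; it only shows the three checks a follower applies to a vote request (term, prior vote in the term, and log freshness).

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-node election state.
struct RaftState {
    std::uint64_t current_term = 0;
    std::optional<int> voted_for;       // candidate voted for in current_term
    std::uint64_t last_log_index = 0;
    std::uint64_t last_log_term = 0;
};

// Hypothetical RequestVote payload.
struct VoteRequest {
    std::uint64_t term;
    int candidate_id;
    std::uint64_t last_log_index;
    std::uint64_t last_log_term;
};

// Returns true if the vote is granted; updates state accordingly.
inline bool handle_request_vote(RaftState& s, const VoteRequest& r) {
    if (r.term < s.current_term) return false;  // stale candidate
    if (r.term > s.current_term) {              // newer term: forget old vote
        s.current_term = r.term;
        s.voted_for.reset();
    }
    // The candidate's log must be at least as up to date as ours.
    const bool log_ok =
        r.last_log_term > s.last_log_term ||
        (r.last_log_term == s.last_log_term &&
         r.last_log_index >= s.last_log_index);
    if (log_ok && (!s.voted_for || *s.voted_for == r.candidate_id)) {
        s.voted_for = r.candidate_id;
        return true;
    }
    return false;
}
```

A full node would also step down to follower on seeing a newer term and persist `current_term` and `voted_for` before replying; both are omitted here for brevity.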

Focus: Performance & Advanced Joins.

  • Shard Pruning logic for targeted routing.
  • Global Aggregation Merging (COUNT/SUM).
  • Broadcast Join orchestration.
  • Inter-node data redistribution (Shuffle infrastructure).
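
Global aggregation merging for COUNT and SUM works because both are decomposable: each shard computes a partial result, and the coordinator adds the partials together. A minimal sketch, with `PartialAgg` and `merge_partials` as hypothetical names:

```cpp
#include <cstdint>
#include <vector>

// Partial aggregate produced by one shard (hypothetical layout).
struct PartialAgg {
    std::int64_t count = 0;
    std::int64_t sum = 0;
};

// Coordinator-side merge: global COUNT and SUM are sums of the partials.
inline PartialAgg merge_partials(const std::vector<PartialAgg>& parts) {
    PartialAgg total;
    for (const auto& p : parts) {
        total.count += p.count;
        total.sum += p.sum;
    }
    return total;
}
```

The same shape would extend to AVG by shipping both fields and dividing only at the coordinator, since averaging per-shard averages gives a wrong result when shards hold different row counts.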

Focus: High-throughput Data Redistribution.

  • Context-aware Shuffle infrastructure in ClusterManager.
  • Implementation of ShuffleFragment and PushData RPC protocols.
  • Two-phase Shuffle Join orchestration in DistributedExecutor.
  • Bloom Filter Optimization: Probabilistic tuple filtering to reduce network traffic in shuffle joins.
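
The Bloom filter optimization can be illustrated with a small sketch: the build side inserts its join keys, the filter is shipped to the probe side, and probe tuples whose keys are definitely absent are dropped before they are shuffled. The class below is an assumption-laden illustration (string keys, double hashing built on `std::hash`), not the filter cloudSQL uses.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal Bloom filter. `bits` must be greater than zero.
class BloomFilter {
public:
    BloomFilter(std::size_t bits, std::size_t hashes)
        : bits_(bits, false), hashes_(hashes) {}

    void add(const std::string& key) {
        for (std::size_t i = 0; i < hashes_; ++i) bits_[index(key, i)] = true;
    }

    // false: key was never added. true: key may have been added.
    bool might_contain(const std::string& key) const {
        for (std::size_t i = 0; i < hashes_; ++i) {
            if (!bits_[index(key, i)]) return false;
        }
        return true;
    }

private:
    // Double hashing: derive the i-th probe position from two base hashes.
    // Salting the key with "#" for the second hash is an assumption of
    // this sketch, chosen only to get a second, different hash value.
    std::size_t index(const std::string& key, std::size_t i) const {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + "#");
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t hashes_;
};
```

The filter never produces false negatives, so no matching tuple is lost; false positives only mean that some non-matching tuples are still sent over the network and rejected later by the join itself.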

Focus: Fault Tolerance & Data Redundancy.

  • Multi-Group Raft management via RaftManager.
  • Log-based data replication for DML operations.
  • Leader-aware query routing and automatic failover.

Focus: Columnar Storage & Vectorized Execution.

  • Native Columnar storage implementation with binary persistence.
  • Batch-at-a-time vectorized execution model (Scan, Filter, Project, Aggregate).
  • High-performance NumericVector and VectorBatch data structures.
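
To show what batch-at-a-time execution means in practice, here is a simplified sketch. The `NumericVector` and `VectorBatch` definitions below are stand-ins invented for the example and do not reflect the project's actual layouts; the point is that each operator runs a tight loop over a whole column and passes a selection vector downstream instead of handling one tuple per call.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for the project's data structures (assumed layout).
struct NumericVector {
    std::vector<double> values;
};

struct VectorBatch {
    std::vector<NumericVector> columns;
    std::size_t row_count = 0;
};

// Vectorized filter: returns the row indices where column `col` > threshold.
inline std::vector<std::size_t> filter_greater(const VectorBatch& batch,
                                               std::size_t col,
                                               double threshold) {
    std::vector<std::size_t> selection;
    const auto& v = batch.columns[col].values;
    for (std::size_t i = 0; i < batch.row_count; ++i) {
        if (v[i] > threshold) selection.push_back(i);
    }
    return selection;
}

// Vectorized aggregate: sums column `col` over the selected rows only.
inline double sum_selected(const VectorBatch& batch, std::size_t col,
                           const std::vector<std::size_t>& selection) {
    double total = 0.0;
    const auto& v = batch.columns[col].values;
    for (std::size_t i : selection) total += v[i];
    return total;
}
```

Compared with the tuple-at-a-time iterator model, the per-row virtual call disappears and the inner loops operate on contiguous memory, which is what makes them friendly to caches and compiler auto-vectorization.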

Phase 9 — Stability & Testing Refinement

Focus: Engine Robustness & E2E Validation.

  • Advanced Execution: Full support for LEFT, RIGHT, and FULL outer joins.
  • Transactional Integrity: Persistent connection-based execution state and comprehensive ROLLBACK support for all DML operations.
  • Logic Validation: Integration of the SqlLogicTest (SLT) suite with 80+ logic test cases covering Joins, Transactions, Aggregates, and Indexes.
  • Automation: Standardized cross-platform test orchestration via run_test.sh with automatic CPU detection.

Technical Standards

  • Standard: C++17
  • Build System: CMake
  • Tests: GoogleTest
  • Protocols: Binary RPC (internal), PostgreSQL Wire Protocol (external).