
feat: high-performance engine optimizations and Prepared Statement API#19

Open
poyrazK wants to merge 20 commits into main from feature/engine-optimizations

Conversation


@poyrazK poyrazK commented Apr 8, 2026

This PR implements a series of high-performance engine optimizations to close the throughput gap with SQLite.

Key Changes:

  1. Zero-Allocation Execution: Integrated a PMR-compliant ArenaAllocator into the Tuple and Operator classes, eliminating redundant heap allocations during query execution.
  2. Prepared Statement API: Implemented a new PreparedStatement structure that caches Statement plans, Schema metadata, HeapTable references, and BTreeIndex objects to bypass SQL parsing and metadata resolution overhead.
  3. Storage Optimizations:
    • Last-Page Caching: Implemented a caching mechanism in HeapTable to keep the current insertion page pinned, significantly reducing fetch_page calls and latch contention.
    • Lightweight Lookups: Refactored BufferPoolManager and LockManager to use non-string keys (PageKey and TupleId), eliminating performance-killing string allocations and formatting.
  4. Parameter Binding: Added support for ? placeholders in the Lexer, Parser, and Expression evaluator.

Performance Impact:

Point insertion throughput increased from ~10k rows/s to ~15.6k rows/s on Apple M3 Pro, with sequential scans reaching ~3.1M items/s.

Summary by CodeRabbit

  • New Features

    • Parameterized queries using "?" and first-class prepared statements.
  • Performance Improvements

    • Statement caching and prepared-statement execution to reduce parse/plan overhead.
    • Faster bulk inserts with batched execution inside explicit transactions.
    • Scoped arena-backed allocation and pmr usage to lower per-query memory churn and speed up execution.
  • Other

    • Reduced noisy debug/error logging.


coderabbitai bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds prepared-statement support with parameter placeholders and caching; introduces a PMR arena allocator and propagates memory resources; extends parser and expressions to support ? parameters; changes storage page-keying and HeapTable single-page caching/optimized inserts; replaces string RIDs with structured TupleId for locking.

Changes

  • Prepared Statements & Executor (include/executor/query_executor.hpp, src/executor/query_executor.cpp, benchmarks/sqlite_comparison_bench.cpp): Added PreparedStatement type, prepare() and execute(prepared, params) overloads, global statement cache + mutex, and an INSERT fast path using cached table/index metadata; benchmark updated to prepare once, reuse preallocated params, and wrap iterations in a single transaction.
  • Parameter Parsing & Expression Eval (include/parser/expression.hpp, src/parser/expression.cpp, src/parser/lexer.cpp, src/parser/parser.cpp): Lexer recognizes ?; parser returns ParameterExpr and tracks param count; added ExprType::Parameter; expression evaluate(..., params) APIs added and propagated; ParameterExpr implemented to read bound params.
  • Arena & PMR Integration (include/common/arena_allocator.hpp, include/executor/types.hpp, include/executor/operator.hpp, src/executor/operator.cpp): New common::ArenaAllocator (pmr memory_resource) with reset(); Tuple uses std::pmr::vector; Operator stores and propagates a memory_resource* and a params pointer; operators allocate temporaries from the provided memory resource.
  • Storage: BufferPool & HeapTable (include/storage/buffer_pool_manager.hpp, src/storage/buffer_pool_manager.cpp, include/storage/heap_table.hpp, src/storage/heap_table.cpp): Replaced string page keying with PageKey{file_id, page_id} and get_file_id() mapping; HeapTable gained cached_page_ and cached_page_id_, optimized insert serialization (stack buffer + overflow), an iterator that accepts a memory_resource*, and a destructor that unpins the cached page.
  • Transaction Locking Types (include/transaction/lock_manager.hpp, src/transaction/lock_manager.cpp, include/transaction/transaction.hpp, tests/transaction_coverage_tests.cpp, tests/lock_manager_tests.cpp): Replaced string RIDs with storage::HeapTable::TupleId across LockManager APIs and Transaction lock-tracking containers; tests updated to use TupleId.
  • Misc & Logging / Cleanup (src/catalog/catalog.cpp, src/parser/expression.cpp, src/executor/query_executor.cpp): Removed various debug std::cerr prints; adjusted expression numeric/logical factories; QueryExecutor resets the arena between queries and propagates params/MR into plans.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Parser
    participant QueryExecutor
    participant OperatorTree as Operator
    participant Storage
    participant Index as BTreeIndex

    Client->>QueryExecutor: prepare(sql)
    activate QueryExecutor
    QueryExecutor->>Parser: parse(sql)
    activate Parser
    Parser-->>QueryExecutor: AST (with ParameterExpr nodes)
    deactivate Parser
    QueryExecutor->>Storage: load table/schema/index metadata
    activate Storage
    Storage-->>QueryExecutor: TableInfo, Schema, HeapTable, indexes
    deactivate Storage
    QueryExecutor->>QueryExecutor: cache PreparedStatement
    QueryExecutor-->>Client: PreparedStatement
    deactivate QueryExecutor

    Client->>QueryExecutor: execute(prepared, params)
    activate QueryExecutor
    QueryExecutor->>QueryExecutor: set current_params_, arena_.reset()
    QueryExecutor->>OperatorTree: build_plan(prepared, arena, params)
    activate OperatorTree
    OperatorTree->>OperatorTree: set_memory_resource(arena)
    OperatorTree->>OperatorTree: set_params(params)
    OperatorTree-->>QueryExecutor: root operator
    deactivate OperatorTree

    QueryExecutor->>OperatorTree: open/next()
    activate OperatorTree
    OperatorTree->>Parser: evaluate(expr, tuple, schema, params)
    activate Parser
    Parser->>Parser: ParameterExpr reads params[index]
    Parser-->>OperatorTree: value
    deactivate Parser
    OperatorTree->>Storage: insert/lookup using evaluated values
    activate Storage
    Storage->>Index: update/lookups
    Index-->>Storage: result
    deactivate Storage
    OperatorTree-->>QueryExecutor: tuple/result
    deactivate OperatorTree

    QueryExecutor->>QueryExecutor: reset arena, clear current_params_
    QueryExecutor-->>Client: QueryResult
    deactivate QueryExecutor
sequenceDiagram
    participant HeapTable as HT
    participant BPM as BufferPoolManager
    participant PageCache as PageTable

    HT->>HT: insert(tuple)
    HT->>HT: if cached_page_ exists and has space
    alt cached page usable
        HT->>HT: serialize into cached_page_
    else
        HT->>HT: unpin/flush cached_page_ if present
        HT->>BPM: fetch_page(PageKey{file_id,page_id})
        activate BPM
        BPM->>PageCache: lookup PageKey in page_table_
        PageCache-->>BPM: page or miss
        alt page present
            BPM-->>HT: pinned page
        else
            BPM-->>HT: create/pin new page
        end
        deactivate BPM
        HT->>HT: set cached_page_ to pinned page
        HT->>HT: serialize into cached_page_
    end
    HT->>HT: update metadata / last_page_id_

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Poem

🐇 I bind each "?" with a twitch and a hop,
Arenas stack memory — nibble, don't stop,
Pages cached snug in a burrowed retreat,
TupleId locks click — no loose string to meet,
Prepared statements bound, my inserts go pop! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 23.39%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the PR title accurately reflects the primary changes: high-performance optimizations (arena allocator, caching, lightweight lookups) and the Prepared Statement API with parameter binding support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/transaction/lock_manager.cpp (1)

67-83: ⚠️ Potential issue | 🟠 Major

Lock upgrade leaves stale shared lock request in queue.

When upgrading from shared to exclusive, the code detects the existing shared lock (upgrade = true, it = iter) but never removes it. After the exclusive lock is granted, the transaction has both S and X entries in the queue. When unlock() is called, only one entry is removed, leaving the other behind.

🐛 Proposed fix
     queue.request_queue.push_back({txn->get_id(), LockMode::EXCLUSIVE, false});
     const auto my_req = std::prev(queue.request_queue.end());
+
+    // If upgrading, remove the old shared lock entry
+    if (upgrade) {
+        queue.request_queue.erase(it);
+    }
 
     const auto timeout = std::chrono::milliseconds(1000);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transaction/lock_manager.cpp` around lines 67 - 83, The upgrade path
currently marks the existing SHARED request (upgrade=true, it=iter) but never
removes it, leaving a stale S entry; fix by erasing the existing shared request
from queue.request_queue when you detect an upgrade (use the saved iterator it /
iter to call queue.request_queue.erase(it)) before proceeding to enqueue or
grant the EXCLUSIVE lock, ensuring iterator validity and preserving correct
behavior of unlock().
src/storage/heap_table.cpp (1)

227-279: ⚠️ Potential issue | 🔴 Critical

Reject tuples that cannot be placed on a page.

A row whose serialized payload is larger than the usable page area can never satisfy the insert predicate, so this loop just keeps advancing pages forever. Also, a payload that exactly fills the remaining space on a fresh page is currently rejected because the capacity check uses < instead of <=.

💡 Possible guard
-    const auto required = static_cast<uint16_t>(payload_size);
+    const size_t max_payload =
+        Page::PAGE_SIZE - sizeof(PageHeader) - (DEFAULT_SLOT_COUNT * sizeof(uint16_t));
+    if (payload_size > max_payload) {
+        return {0, 0};
+    }
+    const auto required = static_cast<uint16_t>(payload_size);
...
-        if (header.free_space_offset + required < Page::PAGE_SIZE &&
+        if (header.free_space_offset + required <= Page::PAGE_SIZE &&
             header.num_slots < DEFAULT_SLOT_COUNT) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/heap_table.cpp` around lines 227 - 279, The loop can spin forever
for payloads too large for any page and currently rejects payloads that exactly
fit because it uses '<'; modify Insert logic in this block (symbols: required,
payload_size, Page::PAGE_SIZE, header.free_space_offset, DEFAULT_SLOT_COUNT,
last_page_id_, cached_page_) to first compute the usable space per fresh page
(Page::PAGE_SIZE minus sizeof(PageHeader) minus
DEFAULT_SLOT_COUNT*sizeof(uint16_t)) and immediately return failure (e.g.,
{0,0}) if required > usable_space_per_page; then change the free-space check
from "header.free_space_offset + required < Page::PAGE_SIZE" to use "<=
Page::PAGE_SIZE" (or equivalently ensure the comparison allows exact-fill) so a
payload that exactly fills remaining space is accepted. Ensure this oversized
check runs before advancing pages (so the loop won't spin) and that header
initialization and buffer writes remain unchanged for normal inserts.
🧹 Nitpick comments (4)
benchmarks/sqlite_comparison_bench.cpp (1)

141-158: Consider wrapping SQLite benchmark in explicit transaction for consistency.

The cloudSQL benchmark uses BEGIN/COMMIT while SQLite relies on synchronous = OFF. For a more apples-to-apples comparison, consider adding an explicit transaction:

💡 Optional improvement
     sqlite3_stmt* stmt;
     sqlite3_prepare_v2(ctx.db, "INSERT INTO bench_table VALUES (?, ?, ?)", -1, &stmt, nullptr);
 
+    sqlite3_exec(ctx.db, "BEGIN", nullptr, nullptr, nullptr);
     for (auto _ : state) {
         sqlite3_bind_int64(stmt, 1, state.iterations());
         sqlite3_bind_double(stmt, 2, 3.14);
         sqlite3_bind_text(stmt, 3, "some_payload_data", -1, SQLITE_STATIC);
         
         sqlite3_step(stmt);
         sqlite3_reset(stmt);
     }
+    sqlite3_exec(ctx.db, "COMMIT", nullptr, nullptr, nullptr);
     
     sqlite3_finalize(stmt);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/sqlite_comparison_bench.cpp` around lines 141 - 158, The SQLite
benchmark BM_SQLite_Insert should wrap the repeated inserts in an explicit
transaction to match cloudSQL's BEGIN/COMMIT behavior: before the benchmark loop
call BEGIN (e.g., via sqlite3_exec(ctx.db, "BEGIN", ...)) and after the loop
call COMMIT, keeping the existing prepared statement (stmt) and finalize/reset
logic unchanged; reference BM_SQLite_Insert, SQLiteContext, and
sqlite3_exec/sqlite3_prepare_v2/sqlite3_step/sqlite3_reset/sqlite3_finalize when
making the change.
include/storage/heap_table.hpp (1)

55-60: Hash function may cause collisions for large page numbers.

The hash shifts page_num by only 16 bits, which discards the upper 16 bits when page_num > 65535. For databases with many pages, this could degrade hash table performance.

Consider using a stronger mixing function:

🔧 Suggested improvement
         struct Hash {
             std::size_t operator()(const TupleId& tid) const {
-                return (static_cast<size_t>(tid.page_num) << 16) ^
-                       static_cast<size_t>(tid.slot_num);
+                // Use full page_num range with better mixing
+                std::size_t h = static_cast<size_t>(tid.page_num);
+                h ^= h >> 16;
+                h *= 0x85ebca6b;
+                h ^= static_cast<size_t>(tid.slot_num);
+                return h;
             }
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/storage/heap_table.hpp` around lines 55 - 60, The current
Hash::operator() truncates page_num by shifting only 16 bits which risks
collisions when page_num > 65535; update struct Hash to combine the full
page_num and slot_num (TupleId) using a stronger mixing strategy—e.g., cast
page_num and slot_num into a single 64-bit value before hashing, or use
std::hash on a std::pair/tuple of page_num and slot_num or a well-known combine
routine (like boost::hash_combine or a rotate-xor mix) so both fields contribute
fully to the resulting std::size_t; locate Hash::operator() in
include/storage/heap_table.hpp and replace the simple (page_num<<16) ^ slot_num
logic with one of these robust combine approaches.
include/parser/expression.hpp (1)

63-65: Pass bound parameters through the vectorized API as well.

ParameterExpr can resolve ? in scalar evaluation, but evaluate_vectorized() still has no params input anywhere in the interface. If a parameterized expression reaches a vectorized operator, it has no clean way to materialize the bound value.

Also applies to: 186-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/parser/expression.hpp` around lines 63 - 65, The vectorized
evaluation API lacks a params parameter so ParameterExpr cannot resolve bound
parameters in vectorized execution; update the virtual method
evaluate_vectorized (and its overloads/usages) to accept const
std::vector<common::Value>* params = nullptr and propagate this parameter
through all implementations and callers (e.g.,
ParameterExpr::evaluate_vectorized, all Expression subclasses, and any operators
that call evaluate_vectorized) so bound parameters are available during
vectorized materialization; ensure the new parameter is forwarded wherever
evaluate_vectorized is invoked and add null checks where needed.
src/executor/query_executor.cpp (1)

915-924: Let ? predicates use the index fast path too.

The optimizer still only treats Column = Constant as eligible, so Column = ? falls back to a sequential scan even though evaluate(..., current_params_) can now resolve the bound value. Extending this branch to accept ExprType::Parameter would let prepared SELECTs benefit from indexes as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/query_executor.cpp` around lines 915 - 924, The code in the
binary-expression handling only marks Column=Constant as eligible for the index
fast path; change the type checks in the block that examines
bin_expr->left()/bin_expr->right() so that it accepts ExprType::Parameter in
addition to ExprType::Constant (i.e., treat Constant or Parameter as the
constant side), then call evaluate(nullptr, nullptr, current_params_) as you do
now to resolve the bound value into const_val and set eligible=true; update both
branches that currently check only parser::ExprType::Constant (the checks around
bin_expr->left().type() / bin_expr->right().type()) and ensure col_name,
const_val and eligible are assigned the same way when the constant side is a
Parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/common/arena_allocator.hpp`:
- Around line 32-36: The oversized-allocation path in ArenaAllocator currently
appends a dedicated buffer to chunks_ and leaves current_chunk_idx_ /
current_offset_ unchanged, allowing subsequent small allocations to wrongly
reuse that buffer; change the logic so that when bytes > chunk_size_ you
allocate a dedicated buffer but do NOT add it into chunks_ (or instead add it
into a separate container like large_chunks_); keep current_chunk_idx_ and
current_offset_ unchanged so the reusable chunk pool is not polluted, and update
the destructor (~ArenaAllocator) to free buffers from both chunks_ and the
separate large_chunks_ (or otherwise ensure dedicated buffers are deleted) —
apply the same fix to the other oversized branches referenced near the other
allocation sites (the blocks around the other bytes > chunk_size_ checks).

In `@include/executor/query_executor.hpp`:
- Around line 32-36: PreparedStatement currently caches mutable storage handles
(table_meta, schema, table, indexes) which are mutated per-execution; remove
these members from PreparedStatement and stop storing storage::HeapTable and
storage::BTreeIndex pointers on the prepared object. Instead, allocate or
resolve table_meta, create Schema, instantiate storage::HeapTable and the vector
of storage::BTreeIndex inside the per-execution path (e.g., in
execute()/ExecuteQuery() or an ExecutionContext object) so each execution/thread
gets its own handles and DDL changes aren't stale; update any code referencing
PreparedStatement::table_meta, ::schema, ::table, or ::indexes to fetch them
from the per-execution context.

In `@include/storage/buffer_pool_manager.hpp`:
- Around line 113-117: The Hash::operator() currently does a left shift of
static_cast<size_t>(key.file_id) by 32 which is undefined on 32-bit platforms
and causes collisions; update PageKey::Hash (the operator() used by page_table_)
to use a portable hash combine instead (e.g., compute separate hashes for
key.file_id and key.page_id with std::hash and mix/combine them with XOR and a
rotation or multiply-by-constant) so it does not rely on a 64-bit shift and
yields distinct values across platforms.

In `@src/executor/query_executor.cpp`:
- Around line 341-342: The arena_ is being reset while returned rows are still
backed by it, causing dangling storage; in the execute_select() path ensure
tuples are detached/copied out of arena_ (e.g., move data into QueryResult-owned
buffers or invoke the existing detach/move routine) before calling
arena_.reset(); specifically, update the code around arena_.reset() so it only
runs after execute_select() has finished moving/owning all materialized tuples
into QueryResult (or call the function that detaches rows) to avoid reclaiming
the arena-backed memory prematurely.
- Around line 198-217: After prepared.table->insert(tuple, xmin) succeeds you
must immediately create rollback state so a partial failure (e.g. in
apply_index_write) can undo the heap tuple; move or add the
txn->add_undo_log(transaction::UndoLog::Type::INSERT, prepared.table_meta->name,
tid) call directly after the insert (before the index loop), and ensure
lock_manager_.acquire_exclusive(txn, tid) is called at an appropriate time
(either before insert or immediately after adding the undo log) so that any
thrown exceptions during apply_index_write on prepared.indexes[i] will trigger
the undo logic; update error handling to rely on the undo log for cleanup rather
than only adding it after index updates.
- Around line 174-223: The fast-path in QueryExecutor::execute for
PreparedStatement INSERTs bypasses the sharding/routing and durability/WAL logic
used by normal inserts; change it to follow the same path as execute_insert by
delegating to or reusing its internals instead of writing local heap/indexes
directly. Specifically, inside QueryExecutor::execute when prepared.stmt->type()
== parser::StmtType::Insert and prepared.table is set, invoke the same
sharding/forwarding and WAL/append-log steps (or call execute_insert(prepared,
params)) so routing, cluster forwarding, and inserting an INSERT log record are
preserved; also ensure transaction handling (is_auto_commit, txn start/commit)
and index/undo-log updates remain consistent with execute_insert's behavior.
- Around line 46-48: The process-wide statement_cache_ in QueryExecutor is
unbounded and will leak memory for ad-hoc / literal-heavy SQL; replace the raw
std::unordered_map with a bounded LRU or size-limited cache (e.g., maintain an
eviction list + map or use a thread-safe LRU helper) and enforce a configurable
max_entries; update accesses in QueryExecutor to acquire cache_mutex_ while
looking up/inserting and to evict the least-recently-used entry when the limit
is reached; apply the same bounded-cache change to the other statement cache
instance referenced in the file (the other declaration/use around the later
block) so both caches obey the size limit and remain thread-safe.

In `@src/parser/expression.cpp`:
- Around line 337-343: ParameterExpr::evaluate_vectorized currently does nothing
and the vectorized API lacks a params argument so parameter values can't be
substituted; either (A) change the evaluate_vectorized signature (declared in
include/parser/expression.hpp) to accept a params argument (mirror whatever type
is used by row evaluation), update all callers (e.g., VectorizedFilterOperator
and VectorizedProjectOperator) to pass params, and implement
ParameterExpr::evaluate_vectorized to fill executor::ColumnVector using
parameter values, or (B) if adding params is too invasive now, make
ParameterExpr::evaluate_vectorized explicitly reject vectorized execution by
returning/setting an error or raising an assertion so queries with ParameterExpr
are forced to fallback to row-by-row evaluation via evaluate(&t, &schema); pick
one approach and apply it consistently for ParameterExpr and any other
expressions that use params.
- Around line 324-335: Operator evaluate calls are missing the bound parameter
vector so ParameterExpr evaluates to NULL; update all evaluate(...) invocations
in the operator implementations to pass the operator's bound params via
get_params(). Specifically, change calls like condition_->evaluate(&tuple,
&schema_) in FilterOperator, col->evaluate(&input, &input_schema) in
ProjectOperator, the evaluate calls in SortOperator, the expression evaluations
in AggregateOperator, and the evaluate calls in HashJoinOperator to include
get_params() as the third argument (matching ParameterExpr::evaluate signature)
so bound parameters propagate into expression evaluation.

In `@src/storage/heap_table.cpp`:
- Around line 288-300: The cached-page fast paths (using cached_page_,
cached_page_id_, tuple_id and reading PageHeader into header) skip the slow-path
validation and can index past the slot directory; fix by validating the page is
initialized and that tuple_id.slot_num < header.num_slots before reading/writing
the slot entry or writing xmax (i.e., after memcpy into PageHeader check a
plausible initialization marker or at minimum header.num_slots > 0 and
tuple_id.slot_num < header.num_slots), and if validation fails return false;
apply the same validation to the other cached-page branches noted (around the
regions corresponding to the diffs at 333-338 and 376-385).
- Around line 41-44: The destructor HeapTable::~HeapTable() currently forces a
flush by calling bpm_.unpin_page(filename_, cached_page_id_, true) on
cached_page_, which can persist uncommitted/failed inserts; change it to avoid
unconditional flush: when cached_page_ != nullptr call
bpm_.unpin_page(filename_, cached_page_id_, false) (i.e., do not mark
dirty/force write) or otherwise only write back when the page is known safe to
persist (e.g., after commit/rollback bookkeeping), ensuring you keep the same
cached_page_/cached_page_id_ checks and clear cached_page_ afterwards.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc6bbfab-e130-46cb-9ff1-2ca78cb22c64

📥 Commits

Reviewing files that changed from the base of the PR and between 0cedf2a and 41fb871.

📒 Files selected for processing (20)
  • benchmarks/sqlite_comparison_bench.cpp
  • include/common/arena_allocator.hpp
  • include/executor/operator.hpp
  • include/executor/query_executor.hpp
  • include/executor/types.hpp
  • include/parser/expression.hpp
  • include/parser/parser.hpp
  • include/storage/buffer_pool_manager.hpp
  • include/storage/heap_table.hpp
  • include/transaction/lock_manager.hpp
  • include/transaction/transaction.hpp
  • src/catalog/catalog.cpp
  • src/executor/operator.cpp
  • src/executor/query_executor.cpp
  • src/parser/expression.cpp
  • src/parser/lexer.cpp
  • src/parser/parser.cpp
  • src/storage/buffer_pool_manager.cpp
  • src/storage/heap_table.cpp
  • src/transaction/lock_manager.cpp
💤 Files with no reviewable changes (1)
  • src/catalog/catalog.cpp

Comment on lines +32 to +36
~ArenaAllocator() override {
for (auto* chunk : chunks_) {
delete[] chunk;
}
}

⚠️ Potential issue | 🔴 Critical

Keep oversized allocations out of the reusable chunk list.

The bytes > chunk_size_ branch appends the dedicated buffer to chunks_ but leaves current_chunk_idx_ / current_offset_ unchanged. The next small allocation can therefore be handed out from that same buffer at offset 0, aliasing the still-live large allocation.

💡 One way to isolate large allocations
     ~ArenaAllocator() override {
         for (auto* chunk : chunks_) {
             delete[] chunk;
         }
+        for (auto* chunk : large_chunks_) {
+            delete[] chunk;
+        }
     }
...
         if (bytes > chunk_size_) {
             auto* large_chunk = new uint8_t[bytes];
-            chunks_.push_back(large_chunk);
+            large_chunks_.push_back(large_chunk);
             // We don't make this the "current" chunk for small allocations
             // to avoid wasting space. We just return it.
             return large_chunk;
         }
...
     std::vector<uint8_t*> chunks_;
+    std::vector<uint8_t*> large_chunks_;
     size_t current_chunk_idx_;
     size_t current_offset_;

Also applies to: 79-85, 111-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/common/arena_allocator.hpp` around lines 32 - 36, The
oversized-allocation path in ArenaAllocator currently appends a dedicated buffer
to chunks_ and leaves current_chunk_idx_ / current_offset_ unchanged, allowing
subsequent small allocations to wrongly reuse that buffer; change the logic so
that when bytes > chunk_size_ you allocate a dedicated buffer but do NOT add it
into chunks_ (or instead add it into a separate container like large_chunks_);
keep current_chunk_idx_ and current_offset_ unchanged so the reusable chunk pool
is not polluted, and update the destructor (~ArenaAllocator) to free buffers
from both chunks_ and the separate large_chunks_ (or otherwise ensure dedicated
buffers are deleted) — apply the same fix to the other oversized branches
referenced near the other allocation sites (the blocks around the other bytes >
chunk_size_ checks).

Comment on lines +32 to +36
// Cached execution state for hot-path optimization
const TableInfo* table_meta = nullptr;
std::unique_ptr<Schema> schema;
std::unique_ptr<storage::HeapTable> table;
std::vector<std::unique_ptr<storage::BTreeIndex>> indexes;

⚠️ Potential issue | 🔴 Critical

Don't cache mutable storage handles inside a reusable prepared statement.

PreparedStatement is reused by design and even passed around as const, but these members are mutated on every execution. Sharing one prepared statement across threads will race on heap-page caching and B-tree writes, and any DDL after prepare() leaves this cached execution state stale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/executor/query_executor.hpp` around lines 32 - 36, PreparedStatement
currently caches mutable storage handles (table_meta, schema, table, indexes)
which are mutated per-execution; remove these members from PreparedStatement and
stop storing storage::HeapTable and storage::BTreeIndex pointers on the prepared
object. Instead, allocate or resolve table_meta, create Schema, instantiate
storage::HeapTable and the vector of storage::BTreeIndex inside the
per-execution path (e.g., in execute()/ExecuteQuery() or an ExecutionContext
object) so each execution/thread gets its own handles and DDL changes aren't
stale; update any code referencing PreparedStatement::table_meta, ::schema,
::table, or ::indexes to fetch them from the per-execution context.

Comment on lines +113 to +117
struct Hash {
std::size_t operator()(const PageKey& key) const {
return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
}
};

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

C++ shift by width of type undefined behavior

💡 Result:

In C++, for the built-in shift operators E1 << E2 and E1 >> E2, the behavior is undefined if the shift count E2 is:

  • negative, or
  • greater than or equal to the bit-width of the promoted left operand (i.e., after integral promotions are applied to E1).
    This is the key point: the limit is based on the width of the type after promotion, not necessarily the original type of E1. [1]

This rule is stated in the C++ standard wording for shift expressions. [1] It’s also summarized the same way by cppreference. [2]

Practical implications / examples (typical 32-bit int):

  • 1u << 32 is UB (32 is not < 32).
  • If std::uint8_t x = 1;, then x << 8 is well-defined because x is promoted to int, and the shift is performed in int (commonly 32-bit), so a count of 8 is below the promoted width; x << 32 would still be UB for the same reason as above. [1][2]

To avoid UB, guard the shift:

  • Ensure 0 <= shift && shift < std::numeric_limits<decltype(+lhs)>::digits (using +lhs to force promotion), or shift in a known wider unsigned type. [1][2]
    Some CPUs mask the shift count (so “it seems to work”), but the language still says it’s UB, and compilers may optimize assuming it never happens. [3]

Sources:
[1] C++ draft (eel.is), shift operator rule (“undefined if … >= the width of the promoted left operand”)
[2] cppreference, shift operator undefined-behavior condition
[3] SEI CERT C guidance noting oversized shifts are UB and hardware may reduce the count modulo the width


🏁 Script executed:

# Check for platform-specific code or 32-bit support indicators
rg -i "32.bit|32bit|x86|arm|platform" include/storage/buffer_pool_manager.hpp

# Check if PageKey is used with unordered_map/unordered_set (where Hash is used)
rg "PageKey" --type cpp --type hpp | head -20

# Check CMakeLists or build configuration for platform targets
fd -e cmake -e txt | xargs grep -l "32\|platform\|architecture" 2>/dev/null | head -5

Repository: poyrazK/cloudSQL

Length of output: 87


🏁 Script executed:

# Verify the proposed fix syntax is valid C++
# Check if std::hash is included or available in this file
rg "#include.*hash" include/storage/buffer_pool_manager.hpp

# Check what headers are already included
head -50 include/storage/buffer_pool_manager.hpp | grep "#include"

Repository: poyrazK/cloudSQL

Length of output: 274


🏁 Script executed:

# Check the actual context around the Hash function to understand the complete picture
sed -n '100,125p' include/storage/buffer_pool_manager.hpp

Repository: poyrazK/cloudSQL

Length of output: 799


🏁 Script executed:

# Search for PageKey usage in the codebase
rg "PageKey" . --type cpp --type py | head -30

Repository: poyrazK/cloudSQL

Length of output: 837


🏁 Script executed:

# Check specifically for unordered_map or unordered_set with PageKey
rg "unordered_map<PageKey|unordered_set<PageKey" . --type cpp --type py

Repository: poyrazK/cloudSQL

Length of output: 171


🏁 Script executed:

# Look for the full class to see member variables that use PageKey hash
sed -n '100,200p' include/storage/buffer_pool_manager.hpp | tail -80

Repository: poyrazK/cloudSQL

Length of output: 1469


Hash function has undefined behavior on 32-bit platforms and causes hash collisions.

Shifting size_t left by 32 bits is undefined behavior when size_t is 32 bits (per C++20 standard, any shift amount ≥ type width is UB). The page_table_ member actively uses PageKey::Hash, making this a correctness issue. On 32-bit systems, the shift would result in 0, causing all keys with the same page_id to collide and degrade performance or cause data loss.

🔧 Proposed fix using portable hash combining
         struct Hash {
             std::size_t operator()(const PageKey& key) const {
-                return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
+                // Portable hash combining that works on both 32-bit and 64-bit systems
+                std::size_t h1 = std::hash<uint32_t>{}(key.file_id);
+                std::size_t h2 = std::hash<uint32_t>{}(key.page_id);
+                return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
             }
         };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
struct Hash {
std::size_t operator()(const PageKey& key) const {
return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
}
};
struct Hash {
std::size_t operator()(const PageKey& key) const {
// Portable hash combining that works on both 32-bit and 64-bit systems
std::size_t h1 = std::hash<uint32_t>{}(key.file_id);
std::size_t h2 = std::hash<uint32_t>{}(key.page_id);
return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
}
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/storage/buffer_pool_manager.hpp` around lines 113 - 117, The
Hash::operator() currently does a left shift of static_cast<size_t>(key.file_id)
by 32 which is undefined on 32-bit platforms and causes collisions; update
PageKey::Hash (the operator() used by page_table_) to use a portable hash
combine instead (e.g., compute separate hashes for key.file_id and key.page_id
with std::hash and mix/combine them with XOR and a rotation or
multiply-by-constant) so it does not rely on a 64-bit shift and yields distinct
values across platforms.

Comment on lines +46 to +48
// Define static members for statement cache
std::unordered_map<std::string, std::shared_ptr<parser::Statement>> QueryExecutor::statement_cache_;
std::mutex QueryExecutor::cache_mutex_;

⚠️ Potential issue | 🟠 Major

Bound the process-wide statement cache.

statement_cache_ grows for every distinct SQL string and never evicts. Workloads with ad-hoc SQL or literal-heavy statements will turn this into unbounded memory growth.

Also applies to: 244-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/query_executor.cpp` around lines 46 - 48, The process-wide
statement_cache_ in QueryExecutor is unbounded and will leak memory for ad-hoc /
literal-heavy SQL; replace the raw std::unordered_map with a bounded LRU or
size-limited cache (e.g., maintain an eviction list + map or use a thread-safe
LRU helper) and enforce a configurable max_entries; update accesses in
QueryExecutor to acquire cache_mutex_ while looking up/inserting and to evict
the least-recently-used entry when the limit is reached; apply the same
bounded-cache change to the other statement cache instance referenced in the
file (the other declaration/use around the later block) so both caches obey the
size limit and remain thread-safe.

Comment on lines +174 to +223
QueryResult QueryExecutor::execute(const PreparedStatement& prepared,
const std::vector<common::Value>& params) {
// Fast-path for INSERT
if (prepared.stmt->type() == parser::StmtType::Insert && prepared.table) {
const auto start = std::chrono::high_resolution_clock::now();
QueryResult result;
current_params_ = &params;

const bool is_auto_commit = (current_txn_ == nullptr);
transaction::Transaction* txn = current_txn_;
if (is_auto_commit) txn = transaction_manager_.begin();

try {
const auto& insert_stmt = dynamic_cast<const parser::InsertStatement&>(*prepared.stmt);
uint64_t rows_inserted = 0;
const uint64_t xmin = (txn != nullptr) ? txn->get_id() : 0;

for (const auto& row_exprs : insert_stmt.values()) {
std::pmr::vector<common::Value> values(&arena_);
values.reserve(row_exprs.size());
for (const auto& expr : row_exprs) {
values.push_back(expr->evaluate(nullptr, nullptr, current_params_));
}

const Tuple tuple(std::move(values));
const auto tid = prepared.table->insert(tuple, xmin);

// Index updates using cached index objects
std::string err;
for (size_t i = 0; i < prepared.indexes.size(); ++i) {
const auto& idx_info = prepared.table_meta->indexes[i];
uint16_t pos = idx_info.column_positions[0];
if (!apply_index_write(*prepared.indexes[i], tuple.get(pos), tid,
IndexOp::Insert, err)) {
throw std::runtime_error(err);
}
}

if (txn != nullptr) {
txn->add_undo_log(transaction::UndoLog::Type::INSERT, prepared.table_meta->name,
tid);
if (!lock_manager_.acquire_exclusive(txn, tid)) {
throw std::runtime_error("Failed to acquire exclusive lock");
}
}
rows_inserted++;
}

if (is_auto_commit && txn != nullptr) transaction_manager_.commit(txn);
result.set_rows_affected(rows_inserted);

⚠️ Potential issue | 🔴 Critical

Keep prepared INSERTs on the same routing and durability path as normal INSERTs.

This shortcut writes the local heap and indexes directly, but it never runs the sharding forward path or appends an INSERT log record the way execute_insert() does. In clustered or WAL-enabled deployments, execute(*prepared, params) will diverge from regular INSERT semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/query_executor.cpp` around lines 174 - 223, The fast-path in
QueryExecutor::execute for PreparedStatement INSERTs bypasses the
sharding/routing and durability/WAL logic used by normal inserts; change it to
follow the same path as execute_insert by delegating to or reusing its internals
instead of writing local heap/indexes directly. Specifically, inside
QueryExecutor::execute when prepared.stmt->type() == parser::StmtType::Insert
and prepared.table is set, invoke the same sharding/forwarding and
WAL/append-log steps (or call execute_insert(prepared, params)) so routing,
cluster forwarding, and inserting an INSERT log record are preserved; also
ensure transaction handling (is_auto_commit, txn start/commit) and
index/undo-log updates remain consistent with execute_insert's behavior.

Comment on lines +337 to +343
void ParameterExpr::evaluate_vectorized(const executor::VectorBatch& batch,
const executor::Schema& schema,
executor::ColumnVector& result) const {
(void)batch;
(void)schema;
(void)result;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for evaluate_vectorized call sites to understand usage patterns
echo "=== evaluate_vectorized call sites ==="
rg -n "evaluate_vectorized" --type cpp -C3

echo ""
echo "=== Check if VectorBatch is used in query execution paths ==="
rg -n "VectorBatch" --type cpp -l

Repository: poyrazK/cloudSQL

Length of output: 12709


🏁 Script executed:

# Check specific implementations mentioned in the review
echo "=== BinaryExpr::evaluate_vectorized (around line 71-130) ==="
sed -n '71,130p' src/parser/expression.cpp

echo ""
echo "=== UnaryExpr::evaluate_vectorized (around line 209-240) ==="
sed -n '209,240p' src/parser/expression.cpp

echo ""
echo "=== FunctionExpr::evaluate_vectorized (around line 372-420) ==="
sed -n '372,420p' src/parser/expression.cpp

Repository: poyrazK/cloudSQL

Length of output: 5959


🏁 Script executed:

# Check if any evaluate_vectorized implementations call evaluate() with params
echo "=== Search for evaluate() calls within evaluate_vectorized implementations ==="
rg -A 15 "void.*evaluate_vectorized" src/parser/expression.cpp | grep -E "(evaluate_vectorized|evaluate\()" | head -40

Repository: poyrazK/cloudSQL

Length of output: 940


🏁 Script executed:

# Check tests for ParameterExpr with vectorized execution
echo "=== Search for ParameterExpr in tests ==="
rg -n "ParameterExpr" tests/ --type cpp -B2 -A2

echo ""
echo "=== Search for params with vectorized tests ==="
rg -n "evaluate_vectorized" tests/ --type cpp -B5 -A5

Repository: poyrazK/cloudSQL

Length of output: 980


🏁 Script executed:

# Check vectorized_operator.hpp for any validation or comments about params/ParameterExpr
echo "=== vectorized_operator.hpp content (around condition/expression handling) ==="
sed -n '100,180p' include/executor/vectorized_operator.hpp

echo ""
echo "=== Search for any validation of expression types in vectorized context ==="
rg -n "ParameterExpr|params" include/executor/vectorized_operator.hpp src/executor/vectorized_operator.cpp

Repository: poyrazK/cloudSQL

Length of output: 3338


🏁 Script executed:

# Check if there are comments explaining the design choice for params in vectorized API
echo "=== Search for comments about vectorized and params in header files ==="
rg -B3 "void evaluate_vectorized" include/parser/expression.hpp | head -30

Repository: poyrazK/cloudSQL

Length of output: 1834


🏁 Script executed:

# Check how vectorized_operator creates/filters expressions to see if ParameterExpr is ever used
echo "=== Check vectorized operator implementation for expression setup ==="
rg -n "expressions_" include/executor/vectorized_operator.hpp src/executor/vectorized_operator.cpp | head -20

Repository: poyrazK/cloudSQL

Length of output: 669


ParameterExpr::evaluate_vectorized() is an unimplemented stub that silently produces incorrect results.

This method does not populate the result vector. Additionally, the evaluate_vectorized() API signature (defined in include/parser/expression.hpp:70-72) lacks a params parameter entirely, while other expression types that fall back to row-by-row evaluation call evaluate(&t, &schema) without parameters. This means:

  1. If a query with parameter placeholders reaches vectorized execution (via VectorizedFilterOperator or VectorizedProjectOperator), the result vector will remain unpopulated or stale.
  2. Even if this method were implemented, it cannot receive bound parameter values since the vectorized API does not support them.

Recommend either: (1) extending the evaluate_vectorized() signature to include a params parameter and implementing proper parameter substitution, or (2) explicitly rejecting vectorized evaluation of expressions containing parameters until full support is added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/expression.cpp` around lines 337 - 343,
ParameterExpr::evaluate_vectorized currently does nothing and the vectorized API
lacks a params argument so parameter values can't be substituted; either (A)
change the evaluate_vectorized signature (declared in
include/parser/expression.hpp) to accept a params argument (mirror whatever type
is used by row evaluation), update all callers (e.g., VectorizedFilterOperator
and VectorizedProjectOperator) to pass params, and implement
ParameterExpr::evaluate_vectorized to fill executor::ColumnVector using
parameter values, or (B) if adding params is too invasive now, make
ParameterExpr::evaluate_vectorized explicitly reject vectorized execution by
returning/setting an error or raising an assertion so queries with ParameterExpr
are forced to fallback to row-by-row evaluation via evaluate(&t, &schema); pick
one approach and apply it consistently for ParameterExpr and any other
expressions that use params.

Comment on lines +41 to +44
HeapTable::~HeapTable() {
if (cached_page_ != nullptr) {
bpm_.unpin_page(filename_, cached_page_id_, true);
cached_page_ = nullptr;

⚠️ Potential issue | 🔴 Critical

Don't flush cached_page_ unconditionally from the destructor.

The stack-allocated HeapTable instances in the non-prepared DML paths can unwind through this destructor after insert() dirties the page but before rollback bookkeeping is established. unpin_page(..., true) then persists a row from a failed statement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/heap_table.cpp` around lines 41 - 44, The destructor
HeapTable::~HeapTable() currently forces a flush by calling
bpm_.unpin_page(filename_, cached_page_id_, true) on cached_page_, which can
persist uncommitted/failed inserts; change it to avoid unconditional flush: when
cached_page_ != nullptr call bpm_.unpin_page(filename_, cached_page_id_, false)
(i.e., do not mark dirty/force write) or otherwise only write back when the page
is known safe to persist (e.g., after commit/rollback bookkeeping), ensuring you
keep the same cached_page_/cached_page_id_ checks and clear cached_page_
afterwards.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/executor/types.hpp`:
- Around line 133-138: Add back backward-compatible constructors to Tuple to
accept plain std::vector<common::Value> and initializer lists: implement
Tuple(std::vector<common::Value> values) that constructs the internal
std::pmr::vector<common::Value> values_ from the std::vector contents
(delegating/transforming into values_ so existing call sites that pass
std::vector compile), and implement Tuple(std::initializer_list<common::Value>
init) (and the variant taking std::pmr::memory_resource* as well) that
constructs values_ from the initializer list; update Tuple’s overloads (class
Tuple and member values_) so these new constructors forward/convert into the
existing pmr-backed storage without changing other APIs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08294201-15ef-48e0-8dcd-0db6f334b521

📥 Commits

Reviewing files that changed from the base of the PR and between 41fb871 and 1d09a7a.

📒 Files selected for processing (2)
  • include/executor/types.hpp
  • tests/transaction_coverage_tests.cpp

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
include/storage/buffer_pool_manager.hpp (1)

113-117: ⚠️ Potential issue | 🔴 Critical

Hash function has undefined behavior on 32-bit platforms.

The left shift by 32 bits is undefined behavior when size_t is 32 bits (C++ standard: shift amount ≥ type width is UB). On such platforms, file_id contribution becomes 0, causing all keys with the same page_id to collide.

🔧 Proposed fix using portable hash combining
         struct Hash {
             std::size_t operator()(const PageKey& key) const {
-                return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
+                // Portable hash combining that works on both 32-bit and 64-bit systems
+                std::size_t h1 = std::hash<uint32_t>{}(key.file_id);
+                std::size_t h2 = std::hash<uint32_t>{}(key.page_id);
+                return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
             }
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/storage/buffer_pool_manager.hpp` around lines 113 - 117, The
Hash::operator() currently shifts file_id left by 32 bits which is UB on 32-bit
platforms; replace that logic with a portable hash-combine of PageKey.file_id
and PageKey.page_id (e.g., compute std::hash for each field and mix them with a
standard combine like XOR with a golden constant and shifts or use
std::hash<std::pair<...>>/std::uint64_t hashes) so both file_id and page_id
contribute safely to the resulting std::size_t; update the Hash struct
(operator(), PageKey/file_id/page_id references) to use this portable combine
approach.
🧹 Nitpick comments (1)
include/executor/types.hpp (1)

142-150: Add null guards to memory_resource constructors to prevent undefined behavior.

Lines 142 and 149-150 accept memory_resource* without validation. The C++ standard requires pmr::polymorphic_allocator to receive non-null pointers; passing nullptr violates this precondition and causes undefined behavior. Although these constructors are currently unused in the codebase, hardening them prevents future misuse.

Proposed fix
 class Tuple {
    private:
     std::pmr::vector<common::Value> values_;
+    static std::pmr::memory_resource* require_resource(std::pmr::memory_resource* mr) {
+        if (mr == nullptr) {
+            throw std::invalid_argument("Tuple memory_resource must not be null");
+        }
+        return mr;
+    }

    public:
-    explicit Tuple(std::pmr::memory_resource* mr) : values_(mr) {}
+    explicit Tuple(std::pmr::memory_resource* mr) : values_(require_resource(mr)) {}

-    Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr)
-        : values_(values.begin(), values.end(), mr) {}
+    Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr)
+        : values_(values.begin(), values.end(), require_resource(mr)) {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/executor/types.hpp` around lines 142 - 150, The Tuple constructors
that accept std::pmr::memory_resource* (explicit
Tuple(std::pmr::memory_resource* mr) and Tuple(const std::vector<common::Value>&
values, std::pmr::memory_resource* mr)) must guard against nullptr to avoid
undefined behavior; update both constructors to check if mr is null and either
replace it with std::pmr::get_default_resource() or throw a clear exception
(e.g., std::invalid_argument) before constructing values_, ensuring the
polymorphic_allocator always receives a non-null memory_resource pointer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/executor/types.hpp`:
- Around line 146-147: The code uses std::make_move_iterator in the Tuple
constructor (Tuple(std::vector<common::Value>&& values) :
values_(std::make_move_iterator(values.begin()),
std::make_move_iterator(values.end())) {}), but the <iterator> header is
missing; add an `#include` <iterator> at the top of the file so
std::make_move_iterator is declared and the code is portable across standard
library implementations.

---

Duplicate comments:
In `@include/storage/buffer_pool_manager.hpp`:
- Around line 113-117: The Hash::operator() currently shifts file_id left by 32
bits which is UB on 32-bit platforms; replace that logic with a portable
hash-combine of PageKey.file_id and PageKey.page_id (e.g., compute std::hash for
each field and mix them with a standard combine like XOR with a golden constant
and shifts or use std::hash<std::pair<...>>/std::uint64_t hashes) so both
file_id and page_id contribute safely to the resulting std::size_t; update the
Hash struct (operator(), PageKey/file_id/page_id references) to use this
portable combine approach.

---

Nitpick comments:
In `@include/executor/types.hpp`:
- Around line 142-150: The Tuple constructors that accept
std::pmr::memory_resource* (explicit Tuple(std::pmr::memory_resource* mr) and
Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr))
must guard against nullptr to avoid undefined behavior; update both constructors
to check if mr is null and either replace it with
std::pmr::get_default_resource() or throw a clear exception (e.g.,
std::invalid_argument) before constructing values_, ensuring the
polymorphic_allocator always receives a non-null memory_resource pointer.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 37bade2a-3438-43a9-899b-8a333d33e3a2

📥 Commits

Reviewing files that changed from the base of the PR and between 378b9c2 and 9d165e0.

📒 Files selected for processing (4)
  • include/executor/types.hpp
  • include/storage/buffer_pool_manager.hpp
  • src/storage/buffer_pool_manager.cpp
  • tests/lock_manager_tests.cpp

Comment on lines +146 to +147
Tuple(std::vector<common::Value>&& values)
: values_(std::make_move_iterator(values.begin()), std::make_move_iterator(values.end())) {}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify make_move_iterator usage and whether <iterator> is explicitly included
rg -n --type=cpp '\bmake_move_iterator\s*\(' include/executor/types.hpp
rg -n --type=cpp '^#include <iterator>$' include/executor/types.hpp || true

Repository: poyrazK/cloudSQL

Length of output: 164


🏁 Script executed:

head -30 include/executor/types.hpp

Repository: poyrazK/cloudSQL

Length of output: 808


Add the missing <iterator> include for std::make_move_iterator.

Line 147 uses std::make_move_iterator, which is defined in the <iterator> header. While some implementations provide it transitively through other includes, this is not guaranteed by the C++ standard and can cause portability issues on stricter toolchains or different standard library implementations.

Proposed fix
 #include <initializer_list>
+#include <iterator>
 #include <memory>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Tuple(std::vector<common::Value>&& values)
: values_(std::make_move_iterator(values.begin()), std::make_move_iterator(values.end())) {}
#include <initializer_list>
#include <iterator>
#include <memory>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/executor/types.hpp` around lines 146 - 147, The code uses
std::make_move_iterator in the Tuple constructor
(Tuple(std::vector<common::Value>&& values) :
values_(std::make_move_iterator(values.begin()),
std::make_move_iterator(values.end())) {}), but the <iterator> header is
missing; add an `#include <iterator>` at the top of the file so
std::make_move_iterator is declared and the code is portable across standard
library implementations.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/storage/heap_table.cpp`:
- Around line 57-58: The Iterator constructor stores mr_ which can be nullptr
causing undefined behavior when used by next_meta(); change the initialization
so mr_ is never nullptr (use the same fallback pattern as
Operator::get_memory_resource()), e.g., set mr_ to mr if non-null otherwise to
std::pmr::get_default_resource() (or call the existing
Operator::get_memory_resource()-style helper) so subsequent code in
HeapTable::Iterator::next_meta() can safely construct pmr containers with mr_.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0ba190c-10c4-4b6a-a325-92d2f2bc5434

📥 Commits

Reviewing files that changed from the base of the PR and between 9d165e0 and 8ceaa93.

📒 Files selected for processing (2)
  • src/storage/heap_table.cpp
  • tests/cloudSQL_tests.cpp
