feat: high-performance engine optimizations and Prepared Statement API (#19)
Note: Reviews paused — it looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds prepared-statement support with parameter placeholders and caching; introduces a PMR arena allocator and propagates memory resources; extends the parser and expressions to support `?` parameter placeholders.
Sequence Diagram(s)

sequenceDiagram
actor Client
participant Parser
participant QueryExecutor
participant OperatorTree as Operator
participant Storage
participant Index as BTreeIndex
Client->>QueryExecutor: prepare(sql)
activate QueryExecutor
QueryExecutor->>Parser: parse(sql)
activate Parser
Parser-->>QueryExecutor: AST (with ParameterExpr nodes)
deactivate Parser
QueryExecutor->>Storage: load table/schema/index metadata
activate Storage
Storage-->>QueryExecutor: TableInfo, Schema, HeapTable, indexes
deactivate Storage
QueryExecutor->>QueryExecutor: cache PreparedStatement
QueryExecutor-->>Client: PreparedStatement
deactivate QueryExecutor
Client->>QueryExecutor: execute(prepared, params)
activate QueryExecutor
QueryExecutor->>QueryExecutor: set current_params_, arena_.reset()
QueryExecutor->>OperatorTree: build_plan(prepared, arena, params)
activate OperatorTree
OperatorTree->>OperatorTree: set_memory_resource(arena)
OperatorTree->>OperatorTree: set_params(params)
OperatorTree-->>QueryExecutor: root operator
deactivate OperatorTree
QueryExecutor->>OperatorTree: open/next()
activate OperatorTree
OperatorTree->>Parser: evaluate(expr, tuple, schema, params)
activate Parser
Parser->>Parser: ParameterExpr reads params[index]
Parser-->>OperatorTree: value
deactivate Parser
OperatorTree->>Storage: insert/lookup using evaluated values
activate Storage
Storage->>Index: update/lookups
Index-->>Storage: result
deactivate Storage
OperatorTree-->>QueryExecutor: tuple/result
deactivate OperatorTree
QueryExecutor->>QueryExecutor: reset arena, clear current_params_
QueryExecutor-->>Client: QueryResult
deactivate QueryExecutor
sequenceDiagram
participant HeapTable as HT
participant BPM as BufferPoolManager
participant PageCache as PageTable
HT->>HT: insert(tuple)
HT->>HT: if cached_page_ exists and has space
alt cached page usable
HT->>HT: serialize into cached_page_
else
HT->>HT: unpin/flush cached_page_ if present
HT->>BPM: fetch_page(PageKey{file_id,page_id})
activate BPM
BPM->>PageCache: lookup PageKey in page_table_
PageCache-->>BPM: page or miss
alt page present
BPM-->>HT: pinned page
else
BPM-->>HT: create/pin new page
end
deactivate BPM
HT->>HT: set cached_page_ to pinned page
HT->>HT: serialize into cached_page_
end
HT->>HT: update metadata / last_page_id_
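The prepare/execute split in the first diagram — parse once, bind parameters per call — can be modeled with a toy sketch. The `prepare`/`execute` helpers below are hypothetical; the real engine binds typed values into `ParameterExpr` nodes rather than rewriting SQL text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the prepare/execute flow shown above: prepare() records
// placeholder positions once; execute() binds parameters on every call.
struct PreparedStatement {
    std::string sql;                 // original text with '?' placeholders
    std::vector<std::size_t> slots;  // offsets of each '?'
};

PreparedStatement prepare(const std::string& sql) {
    PreparedStatement ps{sql, {}};
    for (std::size_t i = 0; i < sql.size(); ++i)
        if (sql[i] == '?') ps.slots.push_back(i);
    return ps;
}

// Substitute bound values textually (illustration only; the real executor
// evaluates ParameterExpr against the bound params vector instead).
std::string execute(const PreparedStatement& ps,
                    const std::vector<std::string>& params) {
    std::string out = ps.sql;
    // Replace from the back so earlier offsets stay valid.
    for (std::size_t i = ps.slots.size(); i-- > 0; )
        out.replace(ps.slots[i], 1, params[i]);
    return out;
}
```

The key property is that the parse cost is paid once in `prepare`, while `execute` only touches the placeholder slots.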
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/transaction/lock_manager.cpp (1)
67-83: ⚠️ Potential issue | 🟠 Major — Lock upgrade leaves a stale shared lock request in the queue.

When upgrading from shared to exclusive, the code detects the existing shared lock (`upgrade = true`, `it = iter`) but never removes it. After the exclusive lock is granted, the transaction has both an S and an X entry in the queue. When `unlock()` is called, only one entry is removed, leaving the other behind.

🐛 Proposed fix

  queue.request_queue.push_back({txn->get_id(), LockMode::EXCLUSIVE, false});
  const auto my_req = std::prev(queue.request_queue.end());
+
+ // If upgrading, remove the old shared lock entry
+ if (upgrade) {
+   queue.request_queue.erase(it);
+ }
  const auto timeout = std::chrono::milliseconds(1000);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/transaction/lock_manager.cpp` around lines 67-83: the upgrade path currently marks the existing SHARED request (upgrade=true, it=iter) but never removes it, leaving a stale S entry; fix by erasing the existing shared request from queue.request_queue when an upgrade is detected (use the saved iterator to call queue.request_queue.erase(it)) before enqueuing or granting the EXCLUSIVE lock, ensuring iterator validity and preserving correct behavior of unlock().

src/storage/heap_table.cpp (1)
227-279: ⚠️ Potential issue | 🔴 Critical — Reject tuples that cannot be placed on any page.

A row whose serialized payload is larger than the usable page area can never satisfy the insert predicate, so this loop just keeps advancing pages forever. Also, a payload that exactly fills the remaining space on a fresh page is currently rejected because the capacity check uses `<` instead of `<=`.

💡 Possible guard

- const auto required = static_cast<uint16_t>(payload_size);
+ const size_t max_payload =
+     Page::PAGE_SIZE - sizeof(PageHeader) - (DEFAULT_SLOT_COUNT * sizeof(uint16_t));
+ if (payload_size > max_payload) {
+   return {0, 0};
+ }
+ const auto required = static_cast<uint16_t>(payload_size);
  ...
- if (header.free_space_offset + required < Page::PAGE_SIZE &&
+ if (header.free_space_offset + required <= Page::PAGE_SIZE &&
      header.num_slots < DEFAULT_SLOT_COUNT) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/storage/heap_table.cpp` around lines 227-279: the loop can spin forever for payloads too large for any page, and payloads that exactly fit are rejected because the check uses `<`. Compute the usable space per fresh page (Page::PAGE_SIZE minus sizeof(PageHeader) minus DEFAULT_SLOT_COUNT * sizeof(uint16_t)) and immediately return failure (e.g., {0, 0}) if required exceeds it; then change the free-space check from `<` to `<=` so a payload that exactly fills the remaining space is accepted. Ensure the oversized check runs before advancing pages (so the loop cannot spin) and that header initialization and buffer writes remain unchanged for normal inserts.
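The guard above — compute the usable area of a fresh page once, reject oversized payloads before scanning, and accept an exact fill with `<=` — can be sketched in isolation. The constants here are illustrative stand-ins; the real `Page::PAGE_SIZE`, `PageHeader` layout and `DEFAULT_SLOT_COUNT` live in the storage headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-ins for the real page layout constants.
constexpr std::size_t PAGE_SIZE = 4096;
constexpr std::size_t HEADER_SIZE = 16;         // stand-in for sizeof(PageHeader)
constexpr std::size_t DEFAULT_SLOT_COUNT = 64;

// Largest payload a completely fresh page can ever hold.
constexpr std::size_t max_payload() {
    return PAGE_SIZE - HEADER_SIZE - DEFAULT_SLOT_COUNT * sizeof(uint16_t);
}

// True when the payload fits on a page whose data region currently starts
// at free_space_offset; note the <= so an exact fill is accepted.
bool fits_on_page(std::size_t free_space_offset, std::size_t payload) {
    if (payload > max_payload()) return false;  // reject before scanning pages
    return free_space_offset + payload <= PAGE_SIZE;
}
```

Because the oversized check compares against the best case (a fresh page), it can run once before the page loop, which is what prevents the infinite scan.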
🧹 Nitpick comments (4)
benchmarks/sqlite_comparison_bench.cpp (1)
141-158: Consider wrapping the SQLite benchmark in an explicit transaction for consistency.

The cloudSQL benchmark uses `BEGIN`/`COMMIT` while SQLite relies on `synchronous = OFF`. For a more apples-to-apples comparison, consider adding an explicit transaction:

💡 Optional improvement

  sqlite3_stmt* stmt;
  sqlite3_prepare_v2(ctx.db, "INSERT INTO bench_table VALUES (?, ?, ?)", -1, &stmt, nullptr);
+ sqlite3_exec(ctx.db, "BEGIN", nullptr, nullptr, nullptr);
  for (auto _ : state) {
    sqlite3_bind_int64(stmt, 1, state.iterations());
    sqlite3_bind_double(stmt, 2, 3.14);
    sqlite3_bind_text(stmt, 3, "some_payload_data", -1, SQLITE_STATIC);
    sqlite3_step(stmt);
    sqlite3_reset(stmt);
  }
+ sqlite3_exec(ctx.db, "COMMIT", nullptr, nullptr, nullptr);
  sqlite3_finalize(stmt);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@benchmarks/sqlite_comparison_bench.cpp` around lines 141-158: wrap the repeated inserts in BM_SQLite_Insert in an explicit transaction to match cloudSQL's BEGIN/COMMIT behavior — call BEGIN via sqlite3_exec(ctx.db, "BEGIN", ...) before the benchmark loop and COMMIT after it, keeping the existing prepared-statement (stmt) and finalize/reset logic unchanged.

include/storage/heap_table.hpp (1)
55-60: Hash function may cause collisions for large page numbers.

The hash shifts `page_num` by only 16 bits, which discards the upper 16 bits when `page_num > 65535`. For databases with many pages, this could degrade hash table performance. Consider using a stronger mixing function:

🔧 Suggested improvement

  struct Hash {
    std::size_t operator()(const TupleId& tid) const {
-     return (static_cast<size_t>(tid.page_num) << 16) ^
-            static_cast<size_t>(tid.slot_num);
+     // Use full page_num range with better mixing
+     std::size_t h = static_cast<size_t>(tid.page_num);
+     h ^= h >> 16;
+     h *= 0x85ebca6b;
+     h ^= static_cast<size_t>(tid.slot_num);
+     return h;
    }
  };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@include/storage/heap_table.hpp` around lines 55-60: Hash::operator() truncates page_num by shifting only 16 bits, risking collisions when page_num > 65535; combine the full page_num and slot_num using a stronger mixing strategy — e.g., pack both fields into a single 64-bit value before hashing, use std::hash on a std::pair/tuple of the two fields, or a well-known combine routine (like boost::hash_combine or a rotate-xor mix) — so both fields contribute fully to the resulting std::size_t.

include/parser/expression.hpp (1)
63-65: Pass bound parameters through the vectorized API as well.

`ParameterExpr` can resolve `?` in scalar evaluation, but `evaluate_vectorized()` still has no `params` input anywhere in the interface. If a parameterized expression reaches a vectorized operator, it has no clean way to materialize the bound value.

Also applies to: 186-190
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/parser/expression.hpp` around lines 63-65: the vectorized evaluation API lacks a params parameter, so ParameterExpr cannot resolve bound parameters in vectorized execution; update the virtual method evaluate_vectorized (and its overloads/usages) to accept `const std::vector<common::Value>* params = nullptr` and propagate it through all implementations and callers (e.g., ParameterExpr::evaluate_vectorized, all Expression subclasses, and any operators that call evaluate_vectorized), forwarding the new parameter wherever evaluate_vectorized is invoked and adding null checks where needed.

src/executor/query_executor.cpp (1)
915-924: Let `?` predicates use the index fast path too.

The optimizer still only treats `Column = Constant` as eligible, so `Column = ?` falls back to a sequential scan even though `evaluate(..., current_params_)` can now resolve the bound value. Extending this branch to accept `ExprType::Parameter` would let prepared SELECTs benefit from indexes as well.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/executor/query_executor.cpp` around lines 915-924: the binary-expression handling only marks Column = Constant as eligible for the index fast path; change the type checks examining bin_expr->left()/bin_expr->right() so they accept ExprType::Parameter in addition to ExprType::Constant (i.e., treat Constant or Parameter as the constant side), then call evaluate(nullptr, nullptr, current_params_) as now to resolve the bound value into const_val and set eligible = true; update both branches that currently check only parser::ExprType::Constant and ensure col_name, const_val and eligible are assigned the same way when the constant side is a Parameter.
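The extended eligibility test amounts to treating either a literal or a bound `?` as the resolvable side of the equality. The `ExprType` tags below are hypothetical stand-ins mirroring `parser::ExprType`:

```cpp
#include <cassert>

// Hypothetical expression-type tags mirroring parser::ExprType.
enum class ExprType { Column, Constant, Parameter, Binary };

// An equality predicate is index-eligible when one side is a column and the
// other side can be resolved to a single value: a literal or a bound '?'.
bool index_eligible(ExprType lhs, ExprType rhs) {
    auto resolvable = [](ExprType t) {
        return t == ExprType::Constant || t == ExprType::Parameter;
    };
    return (lhs == ExprType::Column && resolvable(rhs)) ||
           (rhs == ExprType::Column && resolvable(lhs));
}
```

With this shape, the planner resolves the value afterwards (evaluating the constant or looking up the bound parameter) and takes the same index lookup path in both cases.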
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/common/arena_allocator.hpp`:
- Around line 32-36: The oversized-allocation path in ArenaAllocator currently
appends a dedicated buffer to chunks_ and leaves current_chunk_idx_ /
current_offset_ unchanged, allowing subsequent small allocations to wrongly
reuse that buffer; change the logic so that when bytes > chunk_size_ you
allocate a dedicated buffer but do NOT add it into chunks_ (or instead add it
into a separate container like large_chunks_); keep current_chunk_idx_ and
current_offset_ unchanged so the reusable chunk pool is not polluted, and update
the destructor (~ArenaAllocator) to free buffers from both chunks_ and the
separate large_chunks_ (or otherwise ensure dedicated buffers are deleted) —
apply the same fix to the other oversized branches referenced near the other
allocation sites (the blocks around the other bytes > chunk_size_ checks).
In `@include/executor/query_executor.hpp`:
- Around line 32-36: PreparedStatement currently caches mutable storage handles
(table_meta, schema, table, indexes) which are mutated per-execution; remove
these members from PreparedStatement and stop storing storage::HeapTable and
storage::BTreeIndex pointers on the prepared object. Instead, allocate or
resolve table_meta, create Schema, instantiate storage::HeapTable and the vector
of storage::BTreeIndex inside the per-execution path (e.g., in
execute()/ExecuteQuery() or an ExecutionContext object) so each execution/thread
gets its own handles and DDL changes aren't stale; update any code referencing
PreparedStatement::table_meta, ::schema, ::table, or ::indexes to fetch them
from the per-execution context.
In `@include/storage/buffer_pool_manager.hpp`:
- Around line 113-117: The Hash::operator() currently does a left shift of
static_cast<size_t>(key.file_id) by 32 which is undefined on 32-bit platforms
and causes collisions; update PageKey::Hash (the operator() used by page_table_)
to use a portable hash combine instead (e.g., compute separate hashes for
key.file_id and key.page_id with std::hash and mix/combine them with XOR and a
rotation or multiply-by-constant) so it does not rely on a 64-bit shift and
yields distinct values across platforms.
In `@src/executor/query_executor.cpp`:
- Around line 341-342: The arena_ is being reset while returned rows are still
backed by it, causing dangling storage; in the execute_select() path ensure
tuples are detached/copied out of arena_ (e.g., move data into QueryResult-owned
buffers or invoke the existing detach/move routine) before calling
arena_.reset(); specifically, update the code around arena_.reset() so it only
runs after execute_select() has finished moving/owning all materialized tuples
into QueryResult (or call the function that detaches rows) to avoid reclaiming
the arena-backed memory prematurely.
- Around line 198-217: After prepared.table->insert(tuple, xmin) succeeds you
must immediately create rollback state so a partial failure (e.g. in
apply_index_write) can undo the heap tuple; move or add the
txn->add_undo_log(transaction::UndoLog::Type::INSERT, prepared.table_meta->name,
tid) call directly after the insert (before the index loop), and ensure
lock_manager_.acquire_exclusive(txn, tid) is called at an appropriate time
(either before insert or immediately after adding the undo log) so that any
thrown exceptions during apply_index_write on prepared.indexes[i] will trigger
the undo logic; update error handling to rely on the undo log for cleanup rather
than only adding it after index updates.
- Around line 174-223: The fast-path in QueryExecutor::execute for
PreparedStatement INSERTs bypasses the sharding/routing and durability/WAL logic
used by normal inserts; change it to follow the same path as execute_insert by
delegating to or reusing its internals instead of writing local heap/indexes
directly. Specifically, inside QueryExecutor::execute when prepared.stmt->type()
== parser::StmtType::Insert and prepared.table is set, invoke the same
sharding/forwarding and WAL/append-log steps (or call execute_insert(prepared,
params)) so routing, cluster forwarding, and inserting an INSERT log record are
preserved; also ensure transaction handling (is_auto_commit, txn start/commit)
and index/undo-log updates remain consistent with execute_insert's behavior.
- Around line 46-48: The process-wide statement_cache_ in QueryExecutor is
unbounded and will leak memory for ad-hoc / literal-heavy SQL; replace the raw
std::unordered_map with a bounded LRU or size-limited cache (e.g., maintain an
eviction list + map or use a thread-safe LRU helper) and enforce a configurable
max_entries; update accesses in QueryExecutor to acquire cache_mutex_ while
looking up/inserting and to evict the least-recently-used entry when the limit
is reached; apply the same bounded-cache change to the other statement cache
instance referenced in the file (the other declaration/use around the later
block) so both caches obey the size limit and remain thread-safe.
In `@src/parser/expression.cpp`:
- Around line 337-343: ParameterExpr::evaluate_vectorized currently does nothing
and the vectorized API lacks a params argument so parameter values can't be
substituted; either (A) change the evaluate_vectorized signature (declared in
include/parser/expression.hpp) to accept a params argument (mirror whatever type
is used by row evaluation), update all callers (e.g., VectorizedFilterOperator
and VectorizedProjectOperator) to pass params, and implement
ParameterExpr::evaluate_vectorized to fill executor::ColumnVector using
parameter values, or (B) if adding params is too invasive now, make
ParameterExpr::evaluate_vectorized explicitly reject vectorized execution by
returning/setting an error or raising an assertion so queries with ParameterExpr
are forced to fallback to row-by-row evaluation via evaluate(&t, &schema); pick
one approach and apply it consistently for ParameterExpr and any other
expressions that use params.
- Around line 324-335: Operator evaluate calls are missing the bound parameter
vector so ParameterExpr evaluates to NULL; update all evaluate(...) invocations
in the operator implementations to pass the operator's bound params via
get_params(). Specifically, change calls like condition_->evaluate(&tuple,
&schema_) in FilterOperator, col->evaluate(&input, &input_schema) in
ProjectOperator, the evaluate calls in SortOperator, the expression evaluations
in AggregateOperator, and the evaluate calls in HashJoinOperator to include
get_params() as the third argument (matching ParameterExpr::evaluate signature)
so bound parameters propagate into expression evaluation.
In `@src/storage/heap_table.cpp`:
- Around line 288-300: The cached-page fast paths (using cached_page_,
cached_page_id_, tuple_id and reading PageHeader into header) skip the slow-path
validation and can index past the slot directory; fix by validating the page is
initialized and that tuple_id.slot_num < header.num_slots before reading/writing
the slot entry or writing xmax (i.e., after memcpy into PageHeader check a
plausible initialization marker or at minimum header.num_slots > 0 and
tuple_id.slot_num < header.num_slots), and if validation fails return false;
apply the same validation to the other cached-page branches noted (around the
regions corresponding to the diffs at 333-338 and 376-385).
- Around line 41-44: The destructor HeapTable::~HeapTable() currently forces a
flush by calling bpm_.unpin_page(filename_, cached_page_id_, true) on
cached_page_, which can persist uncommitted/failed inserts; change it to avoid
unconditional flush: when cached_page_ != nullptr call
bpm_.unpin_page(filename_, cached_page_id_, false) (i.e., do not mark
dirty/force write) or otherwise only write back when the page is known safe to
persist (e.g., after commit/rollback bookkeeping), ensuring you keep the same
cached_page_/cached_page_id_ checks and clear cached_page_ afterwards.
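The bounded statement cache suggested for `query_executor.cpp` above can be sketched as a minimal LRU keyed by SQL text. This is a simplified stand-in: the real cache would store `PreparedStatement` objects and take `cache_mutex_` around each access.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal bounded LRU cache: a list keeps recency order (front = most
// recent), and the map stores the value plus an iterator into the list.
class LruStatementCache {
public:
    explicit LruStatementCache(std::size_t max_entries) : max_(max_entries) {}

    bool get(const std::string& sql, int& out) {
        auto it = map_.find(sql);
        if (it == map_.end()) return false;
        order_.splice(order_.begin(), order_, it->second.second);  // mark as recently used
        out = it->second.first;
        return true;
    }

    void put(const std::string& sql, int value) {
        int tmp;
        if (get(sql, tmp)) { map_[sql].first = value; return; }
        if (map_.size() >= max_) {            // evict least-recently-used entry
            map_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(sql);
        map_[sql] = {value, order_.begin()};
    }

    std::size_t size() const { return map_.size(); }

private:
    std::size_t max_;
    std::list<std::string> order_;
    std::unordered_map<std::string,
                       std::pair<int, std::list<std::string>::iterator>> map_;
};
```

`std::list::splice` makes the recency bump O(1) without invalidating the iterators stored in the map, which is why the list/map pair is the usual LRU shape.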
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc6bbfab-e130-46cb-9ff1-2ca78cb22c64
📒 Files selected for processing (20)
- benchmarks/sqlite_comparison_bench.cpp
- include/common/arena_allocator.hpp
- include/executor/operator.hpp
- include/executor/query_executor.hpp
- include/executor/types.hpp
- include/parser/expression.hpp
- include/parser/parser.hpp
- include/storage/buffer_pool_manager.hpp
- include/storage/heap_table.hpp
- include/transaction/lock_manager.hpp
- include/transaction/transaction.hpp
- src/catalog/catalog.cpp
- src/executor/operator.cpp
- src/executor/query_executor.cpp
- src/parser/expression.cpp
- src/parser/lexer.cpp
- src/parser/parser.cpp
- src/storage/buffer_pool_manager.cpp
- src/storage/heap_table.cpp
- src/transaction/lock_manager.cpp
💤 Files with no reviewable changes (1)
- src/catalog/catalog.cpp
  ~ArenaAllocator() override {
    for (auto* chunk : chunks_) {
      delete[] chunk;
    }
  }
Keep oversized allocations out of the reusable chunk list.
The bytes > chunk_size_ branch appends the dedicated buffer to chunks_ but leaves current_chunk_idx_ / current_offset_ unchanged. The next small allocation can therefore be handed out from that same buffer at offset 0, aliasing the still-live large allocation.
💡 One way to isolate large allocations
~ArenaAllocator() override {
for (auto* chunk : chunks_) {
delete[] chunk;
}
+ for (auto* chunk : large_chunks_) {
+ delete[] chunk;
+ }
}
...
if (bytes > chunk_size_) {
auto* large_chunk = new uint8_t[bytes];
- chunks_.push_back(large_chunk);
+ large_chunks_.push_back(large_chunk);
// We don't make this the "current" chunk for small allocations
// to avoid wasting space. We just return it.
return large_chunk;
}
...
std::vector<uint8_t*> chunks_;
+ std::vector<uint8_t*> large_chunks_;
size_t current_chunk_idx_;
size_t current_offset_;Also applies to: 79-85, 111-114
  // Cached execution state for hot-path optimization
  const TableInfo* table_meta = nullptr;
  std::unique_ptr<Schema> schema;
  std::unique_ptr<storage::HeapTable> table;
  std::vector<std::unique_ptr<storage::BTreeIndex>> indexes;
Don't cache mutable storage handles inside a reusable prepared statement.
PreparedStatement is reused by design and even passed around as const, but these members are mutated on every execution. Sharing one prepared statement across threads will race on heap-page caching and B-tree writes, and any DDL after prepare() leaves this cached execution state stale.
  struct Hash {
    std::size_t operator()(const PageKey& key) const {
      return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
    }
  };
🧩 Analysis chain
🌐 Web query:
C++ shift by width of type undefined behavior
💡 Result:
In C++, for the built-in shift operators E1 << E2 and E1 >> E2, the behavior is undefined if the shift count E2 is:
- negative, or
- greater than or equal to the bit-width of the promoted left operand (i.e., after integral promotions are applied to
E1).
This is the key point: the limit is based on the width of the type after promotion, not necessarily the original type ofE1. [1]
This rule is stated in the C++ standard wording for shift expressions. [1] It’s also summarized the same way by cppreference. [2]
Practical implications / examples (typical 32-bit int):
1u << 32is UB (32 is not< 32).- If
std::uint8_t x = 1;, thenx << 8is often UB becausexis promoted toint, and the shift is performed inint(commonly 32-bit), so8is fine—butx << 32would still be UB for the same reason as above. [1][2]
To avoid UB, guard the shift:
- Ensure
0 <= shift && shift < std::numeric_limits<decltype(+lhs)>::digits(using+lhsto force promotion), or shift in a known wider unsigned type. [1][2]
Some CPUs mask the shift count (so “it seems to work”), but the language still says it’s UB, and compilers may optimize assuming it never happens. [3]
Sources:
[1] C++ draft (eel.is), shift operator rule (“undefined if … >= the width of the promoted left operand”)
[2] cppreference, shift operator undefined-behavior condition
[3] SEI CERT C guidance noting oversized shifts are UB and hardware may reduce the count modulo the width
🏁 Script executed:
# Check for platform-specific code or 32-bit support indicators
rg -i "32.bit|32bit|x86|arm|platform" include/storage/buffer_pool_manager.hpp
# Check if PageKey is used with unordered_map/unordered_set (where Hash is used)
rg "PageKey" --type cpp --type hpp | head -20
# Check CMakeLists or build configuration for platform targets
fd -e cmake -e txt | xargs grep -l "32\|platform\|architecture" 2>/dev/null | head -5Repository: poyrazK/cloudSQL
Length of output: 87
🏁 Script executed:
# Verify the proposed fix syntax is valid C++
# Check if std::hash is included or available in this file
rg "#include.*hash" include/storage/buffer_pool_manager.hpp
# Check what headers are already included
head -50 include/storage/buffer_pool_manager.hpp | grep "#include"Repository: poyrazK/cloudSQL
Length of output: 274
🏁 Script executed:
# Check the actual context around the Hash function to understand the complete picture
sed -n '100,125p' include/storage/buffer_pool_manager.hppRepository: poyrazK/cloudSQL
Length of output: 799
🏁 Script executed:
# Search for PageKey usage in the codebase
rg "PageKey" . --type cpp --type py | head -30Repository: poyrazK/cloudSQL
Length of output: 837
🏁 Script executed:
# Check specifically for unordered_map or unordered_set with PageKey
rg "unordered_map<PageKey|unordered_set<PageKey" . --type cpp --type py

Repository: poyrazK/cloudSQL
Length of output: 171
🏁 Script executed:
# Look for the full class to see member variables that use PageKey hash
sed -n '100,200p' include/storage/buffer_pool_manager.hpp | tail -80

Repository: poyrazK/cloudSQL
Length of output: 1469
Hash function has undefined behavior on 32-bit platforms and causes hash collisions.
Shifting size_t left by 32 bits is undefined behavior when size_t is 32 bits (per C++20 standard, any shift amount ≥ type width is UB). The page_table_ member actively uses PageKey::Hash, making this a correctness issue. On 32-bit systems, the shift would result in 0, causing all keys with the same page_id to collide and degrade performance or cause data loss.
🔧 Proposed fix using portable hash combining
struct Hash {
std::size_t operator()(const PageKey& key) const {
- return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
+ // Portable hash combining that works on both 32-bit and 64-bit systems
+ std::size_t h1 = std::hash<uint32_t>{}(key.file_id);
+ std::size_t h2 = std::hash<uint32_t>{}(key.page_id);
+ return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
}
};

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/storage/buffer_pool_manager.hpp` around lines 113 - 117, The
Hash::operator() currently does a left shift of static_cast<size_t>(key.file_id)
by 32 which is undefined on 32-bit platforms and causes collisions; update
PageKey::Hash (the operator() used by page_table_) to use a portable hash
combine instead (e.g., compute separate hashes for key.file_id and key.page_id
with std::hash and mix/combine them with XOR and a rotation or
multiply-by-constant) so it does not rely on a 64-bit shift and yields distinct
values across platforms.
// Define static members for statement cache
std::unordered_map<std::string, std::shared_ptr<parser::Statement>> QueryExecutor::statement_cache_;
std::mutex QueryExecutor::cache_mutex_;
Bound the process-wide statement cache.
statement_cache_ grows for every distinct SQL string and never evicts. Workloads with ad-hoc SQL or literal-heavy statements will turn this into unbounded memory growth.
Also applies to: 244-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executor/query_executor.cpp` around lines 46 - 48, The process-wide
statement_cache_ in QueryExecutor is unbounded and will leak memory for ad-hoc /
literal-heavy SQL; replace the raw std::unordered_map with a bounded LRU or
size-limited cache (e.g., maintain an eviction list + map or use a thread-safe
LRU helper) and enforce a configurable max_entries; update accesses in
QueryExecutor to acquire cache_mutex_ while looking up/inserting and to evict
the least-recently-used entry when the limit is reached; apply the same
bounded-cache change to the other statement cache instance referenced in the
file (the other declaration/use around the later block) so both caches obey the
size limit and remain thread-safe.
QueryResult QueryExecutor::execute(const PreparedStatement& prepared,
                                   const std::vector<common::Value>& params) {
    // Fast-path for INSERT
    if (prepared.stmt->type() == parser::StmtType::Insert && prepared.table) {
        const auto start = std::chrono::high_resolution_clock::now();
        QueryResult result;
        current_params_ = &params;

        const bool is_auto_commit = (current_txn_ == nullptr);
        transaction::Transaction* txn = current_txn_;
        if (is_auto_commit) txn = transaction_manager_.begin();

        try {
            const auto& insert_stmt = dynamic_cast<const parser::InsertStatement&>(*prepared.stmt);
            uint64_t rows_inserted = 0;
            const uint64_t xmin = (txn != nullptr) ? txn->get_id() : 0;

            for (const auto& row_exprs : insert_stmt.values()) {
                std::pmr::vector<common::Value> values(&arena_);
                values.reserve(row_exprs.size());
                for (const auto& expr : row_exprs) {
                    values.push_back(expr->evaluate(nullptr, nullptr, current_params_));
                }

                const Tuple tuple(std::move(values));
                const auto tid = prepared.table->insert(tuple, xmin);

                // Index updates using cached index objects
                std::string err;
                for (size_t i = 0; i < prepared.indexes.size(); ++i) {
                    const auto& idx_info = prepared.table_meta->indexes[i];
                    uint16_t pos = idx_info.column_positions[0];
                    if (!apply_index_write(*prepared.indexes[i], tuple.get(pos), tid,
                                           IndexOp::Insert, err)) {
                        throw std::runtime_error(err);
                    }
                }

                if (txn != nullptr) {
                    txn->add_undo_log(transaction::UndoLog::Type::INSERT, prepared.table_meta->name,
                                      tid);
                    if (!lock_manager_.acquire_exclusive(txn, tid)) {
                        throw std::runtime_error("Failed to acquire exclusive lock");
                    }
                }
                rows_inserted++;
            }

            if (is_auto_commit && txn != nullptr) transaction_manager_.commit(txn);
            result.set_rows_affected(rows_inserted);
Keep prepared INSERTs on the same routing and durability path as normal INSERTs.
This shortcut writes the local heap and indexes directly, but it never runs the sharding forward path or appends an INSERT log record the way execute_insert() does. In clustered or WAL-enabled deployments, execute(*prepared, params) will diverge from regular INSERT semantics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/executor/query_executor.cpp` around lines 174 - 223, The fast-path in
QueryExecutor::execute for PreparedStatement INSERTs bypasses the
sharding/routing and durability/WAL logic used by normal inserts; change it to
follow the same path as execute_insert by delegating to or reusing its internals
instead of writing local heap/indexes directly. Specifically, inside
QueryExecutor::execute when prepared.stmt->type() == parser::StmtType::Insert
and prepared.table is set, invoke the same sharding/forwarding and
WAL/append-log steps (or call execute_insert(prepared, params)) so routing,
cluster forwarding, and inserting an INSERT log record are preserved; also
ensure transaction handling (is_auto_commit, txn start/commit) and
index/undo-log updates remain consistent with execute_insert's behavior.
void ParameterExpr::evaluate_vectorized(const executor::VectorBatch& batch,
                                        const executor::Schema& schema,
                                        executor::ColumnVector& result) const {
    (void)batch;
    (void)schema;
    (void)result;
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for evaluate_vectorized call sites to understand usage patterns
echo "=== evaluate_vectorized call sites ==="
rg -n "evaluate_vectorized" --type cpp -C3
echo ""
echo "=== Check if VectorBatch is used in query execution paths ==="
rg -n "VectorBatch" --type cpp -l

Repository: poyrazK/cloudSQL
Length of output: 12709
🏁 Script executed:
# Check specific implementations mentioned in the review
echo "=== BinaryExpr::evaluate_vectorized (around line 71-130) ==="
sed -n '71,130p' src/parser/expression.cpp
echo ""
echo "=== UnaryExpr::evaluate_vectorized (around line 209-240) ==="
sed -n '209,240p' src/parser/expression.cpp
echo ""
echo "=== FunctionExpr::evaluate_vectorized (around line 372-420) ==="
sed -n '372,420p' src/parser/expression.cpp

Repository: poyrazK/cloudSQL
Length of output: 5959
🏁 Script executed:
# Check if any evaluate_vectorized implementations call evaluate() with params
echo "=== Search for evaluate() calls within evaluate_vectorized implementations ==="
rg -A 15 "void.*evaluate_vectorized" src/parser/expression.cpp | grep -E "(evaluate_vectorized|evaluate\()" | head -40

Repository: poyrazK/cloudSQL
Length of output: 940
🏁 Script executed:
# Check tests for ParameterExpr with vectorized execution
echo "=== Search for ParameterExpr in tests ==="
rg -n "ParameterExpr" tests/ --type cpp -B2 -A2
echo ""
echo "=== Search for params with vectorized tests ==="
rg -n "evaluate_vectorized" tests/ --type cpp -B5 -A5

Repository: poyrazK/cloudSQL
Length of output: 980
🏁 Script executed:
# Check vectorized_operator.hpp for any validation or comments about params/ParameterExpr
echo "=== vectorized_operator.hpp content (around condition/expression handling) ==="
sed -n '100,180p' include/executor/vectorized_operator.hpp
echo ""
echo "=== Search for any validation of expression types in vectorized context ==="
rg -n "ParameterExpr|params" include/executor/vectorized_operator.hpp src/executor/vectorized_operator.cpp

Repository: poyrazK/cloudSQL
Length of output: 3338
🏁 Script executed:
# Check if there are comments explaining the design choice for params in vectorized API
echo "=== Search for comments about vectorized and params in header files ==="
rg -B3 "void evaluate_vectorized" include/parser/expression.hpp | head -30

Repository: poyrazK/cloudSQL
Length of output: 1834
🏁 Script executed:
# Check how vectorized_operator creates/filters expressions to see if ParameterExpr is ever used
echo "=== Check vectorized operator implementation for expression setup ==="
rg -n "expressions_" include/executor/vectorized_operator.hpp src/executor/vectorized_operator.cpp | head -20

Repository: poyrazK/cloudSQL
Length of output: 669
ParameterExpr::evaluate_vectorized() is an unimplemented stub that silently produces incorrect results.
This method does not populate the result vector. Additionally, the evaluate_vectorized() API signature (defined in include/parser/expression.hpp:70-72) lacks a params parameter entirely, while other expression types that fall back to row-by-row evaluation call evaluate(&t, &schema) without parameters. This means:
- If a query with parameter placeholders reaches vectorized execution (via VectorizedFilterOperator or VectorizedProjectOperator), the result vector will remain unpopulated or stale.
- Even if this method were implemented, it cannot receive bound parameter values since the vectorized API does not support them.
Recommend either: (1) extending the evaluate_vectorized() signature to include a params parameter and implementing proper parameter substitution, or (2) explicitly rejecting vectorized evaluation of expressions containing parameters until full support is added.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/parser/expression.cpp` around lines 337 - 343,
ParameterExpr::evaluate_vectorized currently does nothing and the vectorized API
lacks a params argument so parameter values can't be substituted; either (A)
change the evaluate_vectorized signature (declared in
include/parser/expression.hpp) to accept a params argument (mirror whatever type
is used by row evaluation), update all callers (e.g., VectorizedFilterOperator
and VectorizedProjectOperator) to pass params, and implement
ParameterExpr::evaluate_vectorized to fill executor::ColumnVector using
parameter values, or (B) if adding params is too invasive now, make
ParameterExpr::evaluate_vectorized explicitly reject vectorized execution by
returning/setting an error or raising an assertion so queries with ParameterExpr
are forced to fallback to row-by-row evaluation via evaluate(&t, &schema); pick
one approach and apply it consistently for ParameterExpr and any other
expressions that use params.
HeapTable::~HeapTable() {
    if (cached_page_ != nullptr) {
        bpm_.unpin_page(filename_, cached_page_id_, true);
        cached_page_ = nullptr;
Don't flush cached_page_ unconditionally from the destructor.
The stack-allocated HeapTable instances in the non-prepared DML paths can unwind through this destructor after insert() dirties the page but before rollback bookkeeping is established. unpin_page(..., true) then persists a row from a failed statement.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/storage/heap_table.cpp` around lines 41 - 44, The destructor
HeapTable::~HeapTable() currently forces a flush by calling
bpm_.unpin_page(filename_, cached_page_id_, true) on cached_page_, which can
persist uncommitted/failed inserts; change it to avoid unconditional flush: when
cached_page_ != nullptr call bpm_.unpin_page(filename_, cached_page_id_, false)
(i.e., do not mark dirty/force write) or otherwise only write back when the page
is known safe to persist (e.g., after commit/rollback bookkeeping), ensuring you
keep the same cached_page_/cached_page_id_ checks and clear cached_page_
afterwards.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/executor/types.hpp`:
- Around line 133-138: Add back backward-compatible constructors to Tuple to
accept plain std::vector<common::Value> and initializer lists: implement
Tuple(std::vector<common::Value> values) that constructs the internal
std::pmr::vector<common::Value> values_ from the std::vector contents
(delegating/transforming into values_ so existing call sites that pass
std::vector compile), and implement Tuple(std::initializer_list<common::Value>
init) (and the variant taking std::pmr::memory_resource* as well) that
constructs values_ from the initializer list; update Tuple’s overloads (class
Tuple and member values_) so these new constructors forward/convert into the
existing pmr-backed storage without changing other APIs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 08294201-15ef-48e0-8dcd-0db6f334b521
📒 Files selected for processing (2)
include/executor/types.hpp
tests/transaction_coverage_tests.cpp
Actionable comments posted: 1
♻️ Duplicate comments (1)
include/storage/buffer_pool_manager.hpp (1)
113-117: ⚠️ Potential issue | 🔴 Critical

Hash function has undefined behavior on 32-bit platforms.

The left shift by 32 bits is undefined behavior when size_t is 32 bits (C++ standard: shift amount ≥ type width is UB). On such platforms, the file_id contribution becomes 0, causing all keys with the same page_id to collide.

🔧 Proposed fix using portable hash combining

struct Hash {
    std::size_t operator()(const PageKey& key) const {
-       return (static_cast<size_t>(key.file_id) << 32) | static_cast<size_t>(key.page_id);
+       // Portable hash combining that works on both 32-bit and 64-bit systems
+       std::size_t h1 = std::hash<uint32_t>{}(key.file_id);
+       std::size_t h2 = std::hash<uint32_t>{}(key.page_id);
+       return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@include/storage/buffer_pool_manager.hpp` around lines 113 - 117, The Hash::operator() currently shifts file_id left by 32 bits which is UB on 32-bit platforms; replace that logic with a portable hash-combine of PageKey.file_id and PageKey.page_id (e.g., compute std::hash for each field and mix them with a standard combine like XOR with a golden constant and shifts or use std::hash<std::pair<...>>/std::uint64_t hashes) so both file_id and page_id contribute safely to the resulting std::size_t; update the Hash struct (operator(), PageKey/file_id/page_id references) to use this portable combine approach.
🧹 Nitpick comments (1)
include/executor/types.hpp (1)
142-150: Add null guards to memory_resource constructors to prevent undefined behavior.

Lines 142 and 149-150 accept a memory_resource* without validation. The C++ standard requires pmr::polymorphic_allocator to receive non-null pointers; passing nullptr violates this precondition and causes undefined behavior. Although these constructors are currently unused in the codebase, hardening them prevents future misuse.

Proposed fix

class Tuple {
private:
    std::pmr::vector<common::Value> values_;

+   static std::pmr::memory_resource* require_resource(std::pmr::memory_resource* mr) {
+       if (mr == nullptr) {
+           throw std::invalid_argument("Tuple memory_resource must not be null");
+       }
+       return mr;
+   }

public:
-   explicit Tuple(std::pmr::memory_resource* mr) : values_(mr) {}
+   explicit Tuple(std::pmr::memory_resource* mr) : values_(require_resource(mr)) {}

-   Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr)
-       : values_(values.begin(), values.end(), mr) {}
+   Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr)
+       : values_(values.begin(), values.end(), require_resource(mr)) {}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@include/executor/types.hpp` around lines 142 - 150, The Tuple constructors that accept std::pmr::memory_resource* (explicit Tuple(std::pmr::memory_resource* mr) and Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr)) must guard against nullptr to avoid undefined behavior; update both constructors to check if mr is null and either replace it with std::pmr::get_default_resource() or throw a clear exception (e.g., std::invalid_argument) before constructing values_, ensuring the polymorphic_allocator always receives a non-null memory_resource pointer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/executor/types.hpp`:
- Around line 146-147: The code uses std::make_move_iterator in the Tuple
constructor (Tuple(std::vector<common::Value>&& values) :
values_(std::make_move_iterator(values.begin()),
std::make_move_iterator(values.end())) {}), but the <iterator> header is
missing; add an `#include` <iterator> at the top of the file so
std::make_move_iterator is declared and the code is portable across standard
library implementations.
---
Duplicate comments:
In `@include/storage/buffer_pool_manager.hpp`:
- Around line 113-117: The Hash::operator() currently shifts file_id left by 32
bits which is UB on 32-bit platforms; replace that logic with a portable
hash-combine of PageKey.file_id and PageKey.page_id (e.g., compute std::hash for
each field and mix them with a standard combine like XOR with a golden constant
and shifts or use std::hash<std::pair<...>>/std::uint64_t hashes) so both
file_id and page_id contribute safely to the resulting std::size_t; update the
Hash struct (operator(), PageKey/file_id/page_id references) to use this
portable combine approach.
---
Nitpick comments:
In `@include/executor/types.hpp`:
- Around line 142-150: The Tuple constructors that accept
std::pmr::memory_resource* (explicit Tuple(std::pmr::memory_resource* mr) and
Tuple(const std::vector<common::Value>& values, std::pmr::memory_resource* mr))
must guard against nullptr to avoid undefined behavior; update both constructors
to check if mr is null and either replace it with
std::pmr::get_default_resource() or throw a clear exception (e.g.,
std::invalid_argument) before constructing values_, ensuring the
polymorphic_allocator always receives a non-null memory_resource pointer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 37bade2a-3438-43a9-899b-8a333d33e3a2
📒 Files selected for processing (4)
include/executor/types.hpp
include/storage/buffer_pool_manager.hpp
src/storage/buffer_pool_manager.cpp
tests/lock_manager_tests.cpp
include/executor/types.hpp
Outdated
Tuple(std::vector<common::Value>&& values)
    : values_(std::make_move_iterator(values.begin()), std::make_move_iterator(values.end())) {}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify make_move_iterator usage and whether <iterator> is explicitly included
rg -n --type=cpp '\bmake_move_iterator\s*\(' include/executor/types.hpp
rg -n --type=cpp '^#include <iterator>$' include/executor/types.hpp || true

Repository: poyrazK/cloudSQL
Length of output: 164
🏁 Script executed:
head -30 include/executor/types.hpp

Repository: poyrazK/cloudSQL
Length of output: 808
Add the missing <iterator> include for std::make_move_iterator.
Line 147 uses std::make_move_iterator, which is defined in the <iterator> header. While some implementations provide it transitively through other includes, this is not guaranteed by the C++ standard and can cause portability issues on stricter toolchains or different standard library implementations.
Proposed fix
 #include <initializer_list>
+#include <iterator>
 #include <memory>

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/executor/types.hpp` around lines 146 - 147, The code uses
std::make_move_iterator in the Tuple constructor
(Tuple(std::vector<common::Value>&& values) :
values_(std::make_move_iterator(values.begin()),
std::make_move_iterator(values.end())) {}), but the <iterator> header is
missing; add an `#include` <iterator> at the top of the file so
std::make_move_iterator is declared and the code is portable across standard
library implementations.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/storage/heap_table.cpp`:
- Around line 57-58: The Iterator constructor stores mr_ which can be nullptr
causing undefined behavior when used by next_meta(); change the initialization
so mr_ is never nullptr (use the same fallback pattern as
Operator::get_memory_resource()), e.g., set mr_ to mr if non-null otherwise to
std::pmr::get_default_resource() (or call the existing
Operator::get_memory_resource()-style helper) so subsequent code in
HeapTable::Iterator::next_meta() can safely construct pmr containers with mr_.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0ba190c-10c4-4b6a-a325-92d2f2bc5434
📒 Files selected for processing (2)
src/storage/heap_table.cpp
tests/cloudSQL_tests.cpp
This PR implements a series of high-performance engine optimizations to close the throughput gap with SQLite.
Key Changes:
Performance Impact:
Point insertion throughput increased from ~10k rows/s to ~15.6k rows/s on Apple M3 Pro, with sequential scans reaching ~3.1M items/s.
Summary by CodeRabbit
New Features
Performance Improvements
Other