perf: add benchmarking infrastructure and performance baseline report #16
Conversation
📝 Walkthrough

This change adds comprehensive benchmarking infrastructure to the project: it integrates Google Benchmark v1.8.3 via CMake, introduces three benchmark suites measuring the storage, execution, and network subsystems, and provides an initial performance baseline report with profiling analysis.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🧹 Nitpick comments (3)
benchmarks/network_bench.cpp (1)
24-31: Unchecked `send()` return values in handler.

The handler ignores the return values from the `send()` calls. If either send fails (e.g., the client disconnects), the benchmark continues with corrupt state. Consider at minimum logging failures, or returning early.

♻️ Proposed fix
```diff
 server->set_handler(RpcType::AppendEntries,
     [](const RpcHeader& header, const std::vector<uint8_t>& payload, int client_fd) {
         RpcHeader resp_header = header;
         resp_header.payload_len = static_cast<uint16_t>(payload.size());
         char header_buf[RpcHeader::HEADER_SIZE];
         resp_header.encode(header_buf);
-        send(client_fd, header_buf, RpcHeader::HEADER_SIZE, 0);
-        send(client_fd, payload.data(), payload.size(), 0);
+        if (send(client_fd, header_buf, RpcHeader::HEADER_SIZE, 0) <= 0) {
+            return;
+        }
+        if (!payload.empty() && send(client_fd, payload.data(), payload.size(), 0) <= 0) {
+            return;
+        }
     });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/network_bench.cpp` around lines 24 - 31, The handler registered via server->set_handler for RpcType::AppendEntries ignores the return values from the two send() calls, which can hide socket errors and leave the benchmark in an invalid state; update the lambda (the AppendEntries handler) to check each send() return value, log failures using the project logger or stderr including client_fd and errno, and return early (or close the client_fd) on send errors to avoid continuing with corrupt state — ensure both the header send (after RpcHeader::encode) and the payload send are validated and handled.

CMakeLists.txt (1)

94-98: Consider linking `benchmark::benchmark_main` for consistency.

The macro links only `benchmark::benchmark`, requiring each benchmark source to include `BENCHMARK_MAIN()`. This works, but linking `benchmark::benchmark_main` instead would eliminate that boilerplate.

♻️ Optional: Link benchmark_main to avoid BENCHMARK_MAIN() in sources
```diff
 macro(add_cloudsql_benchmark NAME SOURCE)
     add_executable(${NAME} ${SOURCE})
-    target_link_libraries(${NAME} sqlEngineCore benchmark::benchmark)
+    target_link_libraries(${NAME} sqlEngineCore benchmark::benchmark_main)
 endmacro()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CMakeLists.txt` around lines 94 - 98, The add_cloudsql_benchmark macro currently links targets to benchmark::benchmark requiring each benchmark source to declare BENCHMARK_MAIN(); update the macro (macro name add_cloudsql_benchmark) to also link benchmark::benchmark_main in target_link_libraries so executables automatically get the main entry for Google Benchmark (i.e., add benchmark::benchmark_main alongside benchmark::benchmark when calling target_link_libraries for the created target).

docs/performance/REPORT_V1.md (1)

27-35: Add a language specifier to the fenced code block.

The static analysis tool flagged a missing language specifier. Since this is a call graph/ASCII representation, use `text` or `plaintext`.

📝 Proposed fix
````diff
-```
+```text
 BM_HeapTableInsert
 -> HeapTable::insert (1,859 samples)
 -> BufferPoolManager::unpin_page (789 samples)
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/performance/REPORT_V1.md` around lines 27 - 35, The fenced code block showing the call graph (starting with BM_HeapTableInsert and lines including HeapTable::insert and BufferPoolManager::unpin_page) lacks a language specifier; update the opening triple-backtick to include a plaintext language (e.g., ```text or ```plaintext) so the static analysis tool recognizes it as plain text and preserves formatting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/execution_bench.cpp`:
- Around line 45-55: The benchmark currently moves `table` into
`SeqScanOperator` each iteration (via std::move), destroying it and then
recreating an empty `HeapTable` without calling `SetupBenchTable()`, so only the
first iteration scans rows; fix by avoiding ownership transfer in the hot loop:
construct `table` and call `SetupBenchTable()` once outside the loop, and change
the `SeqScanOperator` invocation to accept a non-owning reference/pointer (e.g.,
pass `table.get()` or a `HeapTable&`) or use a `shared_ptr` so you can create
`scan_op` without std::move; alternatively, if you really want to re-populate
each iteration, call `SetupBenchTable()` after recreating
`HeapTable("scan_table", bpm, schema)` so the table has rows before scanning.
Ensure references to `SeqScanOperator`, `HeapTable`, `SetupBenchTable`, and the
`table` variable are updated accordingly.
- Around line 80-100: The benchmark currently std::move()s
left_table/right_table into SeqScanOperator inside the loop (see SeqScanOperator
and HashJoinOperator construction), which transfers ownership and leaves the
tables empty for subsequent iterations; fix by preserving table ownership across
iterations: either create the HeapTable instances once outside the for loop so
they are reused, or change the SeqScanOperator construction to take non-owning
pointers/references (e.g., pass left_table.get()/right_table.get() or use
shared_ptr instead of std::unique_ptr) so you don't std::move() the tables; if
you must recreate tables each iteration, repopulate them with data after
construction before creating SeqScanOperator/HashJoinOperator.
In `@benchmarks/network_bench.cpp`:
- Around line 42-46: TearDown currently calls server->stop() without checking if
server was initialized (SetUp may have failed); update TearDown in the test
class to guard the stop call by checking the server pointer (e.g., if (server)
server->stop()) before calling server.reset(), leaving client.reset() as-is;
ensure the check uses the same server symbol used in TearDown so you don't call
methods on a null pointer.
- Around line 33-39: The current code returns early when server->start() fails
which leaves client (the std::unique_ptr<RpcClient>) null and leaves the
benchmark in an inconsistent state; update the failure path in the block
containing server->start() so it aborts the benchmark cleanly (e.g., call
state.SkipWithError("server failed to start: <brief msg>") or throw) and ensure
any resources are reset/cleaned before returning so subsequent iterations won't
operate on a null client; locate the logic around server->start(), the client
unique_ptr and RpcClient construction/connection to apply this change.
In `@benchmarks/storage_bench.cpp`:
- Around line 49-76: BM_HeapTableInsert currently removes the test directory
while BufferPoolManager and HeapTable are still alive which can cause flush
failures, and uses a hardcoded "./bench_data_table" so different benchmark args
or concurrent runs could clash; fix by scoping or destroying bpm and table
before calling std::filesystem::remove_all (e.g., wrap StorageManager,
BufferPoolManager, HeapTable, and the tuple creation in an inner scope or
explicitly reset/destroy bpm/table) and make test_dir unique per run (use
benchmark state to incorporate state.thread_index(), state.range(0) or
state.iterations() into the directory name) so directories do not collide.
- Around line 43-46: The directory cleanup runs before bpm and disk_manager are
destroyed, causing flush_all_pages() to fail when destructors attempt I/O; move
the std::filesystem::remove_all(test_dir) so it runs after bpm and disk_manager
are out of scope (e.g., wrap bpm/disk_manager in an inner scope or explicitly
destroy/reset them before calling remove_all), ensuring the destructor of
BufferPoolManager (bpm) and DiskManager complete flushes to test_dir before
deletion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a31cd230-bf36-42f0-a775-8d13ef320a57
📒 Files selected for processing (5)
- CMakeLists.txt
- benchmarks/execution_bench.cpp
- benchmarks/network_bench.cpp
- benchmarks/storage_bench.cpp
- docs/performance/REPORT_V1.md
This PR introduces the performance benchmarking infrastructure for cloudSQL.
Changes: