Signed-off-by: lhy1024 <admin@liudos.us>
Walkthrough
Adds a CPU-based hot-region dimension: collect store/region CPU stats, propagate them through heartbeats and statistics, extend hot-region config/solver/metrics with CPU support and version gating, update storage/CLI/API outputs, add tests, and bump kvproto dependency versions.
Sequence Diagram(s)
sequenceDiagram
participant TiKV as TiKV (store)
participant PD as PD/statistics collector
participant StoreHandler as HeartbeatHandler
participant Scheduler as HotRegionScheduler
TiKV->>PD: send store heartbeat (peers, peer stats, cpu stats)
PD->>PD: aggregate store CPU (unified-read, grpc threads)
PD->>StoreHandler: compute per-region RegionReadCPU (unified + proportional gRPC)
StoreHandler->>Scheduler: publish region loads (bytes, keys, queries, cpu)
Scheduler->>Scheduler: evaluate hot regions (use cpuSupport, thresholds, priorities)
Scheduler->>PD: emit metrics/decisions (including CPU rates)
Codecov Report
@@ Coverage Diff @@
## master #10178 +/- ##
==========================================
+ Coverage 78.76% 78.79% +0.03%
==========================================
Files 522 523 +1
Lines 70369 70527 +158
==========================================
+ Hits 55424 55572 +148
- Misses 10943 10957 +14
+ Partials 4002 3998 -4
Please link an issue and add a description.
storeReadQuery := core.GetReadQueryNum(stats.QueryStats)
storeWriteQuery := core.GetWriteQueryNum(stats.QueryStats)
storeTotalQuery := storeReadQuery + storeWriteQuery
storeGRPCCPU := statistics.StoreGRPCCPUUsage(stats.GetCpuUsages())
Here we intentionally use gRPC CPU only. Unified-read CPU is already in peerStat.CpuStats.UnifiedRead, so using store read CPU here would double count.
pkg/statistics/cpu.go
@@ -0,0 +1,74 @@
// Copyright 2025 TiKV Project Authors.
Suggested change:
- // Copyright 2025 TiKV Project Authors.
+ // Copyright 2026 TiKV Project Authors.
return unifiedReadCPU
}
grpcCPU := float64(StoreGRPCCPUUsage(cpuUsages))
return unifiedReadCPU + grpcCPU*float64(readQuery)/float64(totalQuery)
This is an approximation: unified-read CPU is read-only, while grpc-server CPU is shared by read/write requests, so we apportion gRPC CPU by readQuery/totalQuery.
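The apportioning described above can be sketched in isolation. This is a minimal illustration, not the PR's code: the function name `regionReadCPU` and the `totalQuery == 0` guard are assumptions (the snippet under review only shows the early `return unifiedReadCPU`, and the PR description says the fallback applies when queries are missing).

```go
package main

import "fmt"

// regionReadCPU is a hypothetical helper that mirrors the formula in the
// review thread: unified-read CPU is counted in full, while the shared
// gRPC CPU is scaled by the read share of all queries.
func regionReadCPU(unifiedReadCPU, grpcCPU float64, readQuery, totalQuery uint64) float64 {
	// Assumed guard: with no query stats there is nothing to apportion by,
	// so only the read-only unified-read CPU is reported.
	if totalQuery == 0 {
		return unifiedReadCPU
	}
	return unifiedReadCPU + grpcCPU*float64(readQuery)/float64(totalQuery)
}

func main() {
	// 300 of 400 queries are reads, so 75% of the gRPC CPU is attributed to reads.
	fmt.Println(regionReadCPU(40, 20, 300, 400))
	// No query stats: fall back to unified-read CPU only.
	fmt.Println(regionReadCPU(40, 20, 0, 0))
}
```

With a write-heavy workload the read share shrinks, so most of the gRPC CPU is left out of the read dimension rather than being attributed to reads.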
rollingWindowsSize = 5
// cpuRollingWindowsSize is the window size for the moving average of CPU usage.
// It is larger than the window for other dimensions to keep CPU usage more stable.
cpuRollingWindowsSize = 9
A larger window makes the CPU value more stable.
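To see why the larger window helps, here is a small sketch of a fixed-size moving average. The `movingAverage` type is hypothetical and written only for illustration; it is not PD's own moving-average implementation. Only the window sizes 5 and 9 come from the snippet above.

```go
package main

import "fmt"

// movingAverage is a hypothetical ring-buffer average over the last `size` samples.
type movingAverage struct {
	samples []float64
	size    int
	next    int // index of the slot to overwrite once the buffer is full
	sum     float64
}

func newMovingAverage(size int) *movingAverage {
	return &movingAverage{samples: make([]float64, 0, size), size: size}
}

// Add records a sample, evicting the oldest one when the window is full.
func (m *movingAverage) Add(v float64) {
	if len(m.samples) < m.size {
		m.samples = append(m.samples, v)
	} else {
		m.sum -= m.samples[m.next]
		m.samples[m.next] = v
	}
	m.sum += v
	m.next = (m.next + 1) % m.size
}

// Get returns the average of the samples currently in the window.
func (m *movingAverage) Get() float64 {
	if len(m.samples) == 0 {
		return 0
	}
	return m.sum / float64(len(m.samples))
}

func main() {
	small, large := newMovingAverage(5), newMovingAverage(9)
	// Eight steady samples followed by a single spike.
	for _, v := range []float64{10, 10, 10, 10, 10, 10, 10, 10, 100} {
		small.Add(v)
		large.Add(v)
	}
	fmt.Println(small.Get(), large.Get())
}
```

In this example the same spike moves the 5-sample average further from the steady value of 10 than the 9-sample average, which is the stability the comment refers to. The trade-off is that the larger window also reacts more slowly to a sustained change.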
)

// IsHotScheduleWithCPUSupported returns whether TiKV reports CPU info for hot scheduling.
func IsHotScheduleWithCPUSupported(clusterVersion *semver.Version) bool {
What if we want to cherry-pick this to release 8.5?
What problem does this PR solve?
Issue Number: Close #5718
What is changed and how does it work?
Simple description
This PR introduces CPU as a new dimension for the hot-region scheduler. It applies only to hot-read scheduling.
From the store heartbeat's cpu_usages, PD sums unified-read and grpc-server thread CPU by thread-name prefix. Read CPU load is computed as unifiedReadCPU + grpcCPU * readQuery / totalQuery (or just unifiedReadCPU if queries are missing). This feeds the read CPUDim in store loads and hot-peer stats. CPU uses a longer rolling median window; hotness checks use the rolling average for CPU and the last-interval average for the other dimensions. Read priorities become cpu→byte when CPU is supported; otherwise they fall back to query→byte (or byte→key if query isn't supported).
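Two pieces of the description, summing thread CPU by prefix and choosing read priorities with fallbacks, can be sketched as follows. Everything here is illustrative: the `threadCPU` type, the function names, and the sample thread names are assumptions, not the PR's actual types or TiKV's exact thread names.

```go
package main

import (
	"fmt"
	"strings"
)

// threadCPU is a hypothetical stand-in for one entry of the heartbeat's cpu_usages.
type threadCPU struct {
	Name  string
	Usage uint64
}

// sumByPrefix adds up the usage of every thread whose name starts with prefix.
func sumByPrefix(usages []threadCPU, prefix string) uint64 {
	var total uint64
	for _, u := range usages {
		if strings.HasPrefix(u.Name, prefix) {
			total += u.Usage
		}
	}
	return total
}

// readPriorities follows the fallback order in the description:
// cpu→byte, else query→byte, else byte→key.
func readPriorities(cpuSupported, querySupported bool) []string {
	switch {
	case cpuSupported:
		return []string{"cpu", "byte"}
	case querySupported:
		return []string{"query", "byte"}
	default:
		return []string{"byte", "key"}
	}
}

func main() {
	// Thread names below are made up for the example.
	usages := []threadCPU{
		{"unified-read-0", 30}, {"unified-read-1", 10},
		{"grpc-server-0", 12}, {"grpc-server-1", 8},
		{"raftstore-0", 50},
	}
	fmt.Println(sumByPrefix(usages, "unified-read"), sumByPrefix(usages, "grpc-server"))
	fmt.Println(readPriorities(true, true), readPriorities(false, true), readPriorities(false, false))
}
```

Threads that match neither prefix (the `raftstore-0` entry here) do not contribute to the read CPU dimension.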