-
Notifications
You must be signed in to change notification settings - Fork 34
Feat: Add [account/user-partition] resource limit #529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThis change introduces comprehensive partition-level resource limit management for users, accounts, and QoS entities. It extends protobuf definitions, database schema, and in-memory structures to support hierarchical, partition-aware resource and job limit controls. The control flow and gRPC interfaces are updated to handle new resource fields, with enhanced validation, serialization, and concurrency handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant CtldGrpcServer
participant AccountManager
participant AccountMetaContainer
participant DbClient
Client->>CtldGrpcServer: ModifyUser/Account/Qos (with partition/resource fields)
CtldGrpcServer->>AccountManager: ModifyUserPartitionResource/ModifyAccountPartitionResource/ModifyQosTres
AccountManager->>DbClient: Update resource limits in DB
AccountManager->>AccountMetaContainer: Update in-memory resource maps
Client->>CtldGrpcServer: SubmitTaskToScheduler
CtldGrpcServer->>AccountMetaContainer: TryMallocMetaSubmitResource
AccountMetaContainer->>AccountManager: Validate resource/job limits (partition, user, account, qos)
AccountMetaContainer-->>CtldGrpcServer: Success/Error
CtldGrpcServer-->>Client: Submit result
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
|
merged after #528 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 23
🔭 Outside diff range comments (1)
protos/PublicDefs.proto (1)
502-528: Do NOT renumber existing enum values – this breaks backward compatibility
ModifyFieldhad stable numeric tags.
By inserting new fields (Description = 3, etc.) every subsequent constant is
shifted, altering on-the-wire numbers and corrupting any persisted data
(in DBs, logs, cached protobufs) produced by earlier binaries.Add new values only at the tail of the enum and keep historical numbers fixed.
Roll back to the original numbering or introduce a new enum if semantics
changed.Also fix the typo
FLags→Flags.
🧹 Nitpick comments (9)
src/Utilities/PublicHeader/include/crane/String.h (2)
50-51: Nit: add function comment or reference
ParseMemStringAsByteis public but undocumented here. A one-liner describing accepted formats (e.g. “10G”, “512Mi”) will save every future reader a trip to the implementation.
95-98: Document conversion helpers
ConvertStringToDeviceMap/ConvertStringToResourceVieware extremely handy but tricky. Brief examples ("gpu:2(A100:1,H100:1)") in the header comment would be valuable.src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
487-489: Return type exposes mutable internal state – intentional?
DeviceMap& GetDeviceMap()gives callers full write access to the internal map, bypassing invariants (e.g. non-negative counts). If that is required, mention it in the comment; otherwise expose a const view or dedicated mutators.src/Utilities/PublicHeader/String.cpp (1)
503-553: Whitespace sensitivity may reject perfectly valid input
absl::StrSplit(s, ",")keeps leading/trailing spaces, therefore a commonly typed string like
"cpu=8, mem=32G, gres/gpu:2"fails because" mem=32G"does not satisfyitem.starts_with("gres/")and later yieldskv[0] == " mem".Trim each
itembefore further processing:- for (const auto& item : items) { + for (auto item : items) { + absl::StripAsciiWhitespace(&item);This tiny change makes the parser much more forgiving without affecting strict inputs.
src/CraneCtld/AccountMetaContainer.h (1)
28-38: Synchronise stripe counts – avoid redundant locking
kNumStripes = 128, yet eachparallel_flat_hash_mapis already internally
striped (4shards via the 6-th template arg).
Adding an additional 128-elementstd::array<std::mutex>per container
duplicates contention control, wastes memory, and complicates lock ordering.Consider:
- Drop the manual
m_*_stripes_arrays and rely on phmap’s striped mutexes; or- Reduce
kNumStripesto the same value used in the map template and document
the guaranteed lock ordering.Either way, keep a single, consistent stratagem to minimise dead-lock risk and
runtime overhead.Also applies to: 113-127
protos/PublicDefs.proto (1)
530-537: Nestedmapindirection looks unnecessary
UserInfo.PartitionToResourceLimitjust wraps a single map. Unless a second
level of metadata is planned, replace:map<string, PartitionToResourceLimit> account_to_partition_limitwith a direct
map<string, PartitionResourceLimit>to keep the schema simple.src/CraneCtld/DbClient.cpp (1)
657-658: Consider numeric types for memory serializationSerializing memory values as strings may cause precision issues for very large values and increases storage overhead.
Consider using numeric BSON types:
-allocDoc.append(kvp("mem", std::to_string(value.MemoryBytes()))); -allocDoc.append(kvp("mem_sw", std::to_string(value.MemoryBytes()))); +allocDoc.append(kvp("mem", static_cast<int64_t>(value.MemoryBytes()))); +allocDoc.append(kvp("mem_sw", static_cast<int64_t>(value.MemoryBytes())));src/CraneCtld/CtldPublicDefs.h (1)
948-959: Document memory implications of nested map structureThe
MetaResourceStatstructure uses multiple nested maps which could consume significant memory with many users, partitions, and QoS combinations.Consider:
- Adding a comment documenting the memory complexity: O(users × partitions × QoS)
- Implementing lazy initialization for rarely used combinations
- Adding metrics to monitor memory usage of these structures
src/CraneCtld/AccountManager.cpp (1)
2593-2609: Consider documenting the CPU limit calculation rationale.The implementation correctly initializes default resource limits. The CPU count calculation
std::numeric_limits<int32_t>::max() / 256appears to be a safeguard against overflow, but consider adding a comment explaining this choice.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
src/Misc/BPF/cgroup_dev_bpf.ois excluded by!**/*.o
📒 Files selected for processing (16)
protos/Crane.proto(1 hunks)protos/PublicDefs.proto(6 hunks)src/CraneCtld/AccountManager.cpp(26 hunks)src/CraneCtld/AccountManager.h(4 hunks)src/CraneCtld/AccountMetaContainer.cpp(2 hunks)src/CraneCtld/AccountMetaContainer.h(1 hunks)src/CraneCtld/CtldGrpcServer.cpp(10 hunks)src/CraneCtld/CtldPublicDefs.h(7 hunks)src/CraneCtld/DbClient.cpp(8 hunks)src/CraneCtld/DbClient.h(2 hunks)src/CraneCtld/TaskScheduler.cpp(10 hunks)src/CraneCtld/TaskScheduler.h(1 hunks)src/Utilities/PublicHeader/PublicHeader.cpp(2 hunks)src/Utilities/PublicHeader/String.cpp(2 hunks)src/Utilities/PublicHeader/include/crane/PublicHeader.h(4 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
🔇 Additional comments (13)
protos/Crane.proto (1)
313-320: Confirm wire-compatibility of the newpartitionfieldField number 7 appears to be unused so adding
string partition = 7;is wire-compatible.
If 7 was everreservedor used by a removed field, older binaries will mis-deserialize. Please verify historical.protohistory before merging.src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
469-472:ResourceView::operator-=may produce negative resourcesUnlike the node-level overload, the view-level overload does not guard against
rhs.allocatable_res > lhs.allocatable_res, leading to underfloweduint64_tcounts.
Consider mirroring theABSL_ASSERT(rhs <= this->allocatable_res)check used elsewhere.src/Utilities/PublicHeader/PublicHeader.cpp (1)
724-734: Safety check missing inResourceView -= ResourceViewThis overload blindly subtracts both maps; if
rhs.device_mapcontains keys not inlhs.device_map, the underlyingDeviceMap -=will ASSERT and, in release, mis-behave. Replicate the containment test or convert the assertion to a hard check.src/Utilities/PublicHeader/String.cpp (1)
64-85: Possible silent overflow & loss of precision
szis parsed into adoubleand then multiplied by large powers of two before the cast touint64_t.
Values above ~9 PiB can no longer be represented exactly in an IEEE-754double, and the multiplication may silently overflowuint64_twithout any diagnostics.Consider:
- double sz = 0.0; + long double sz = 0.0L; // preserve precision ... - uint64_t bytes = 0; + __uint128_t bytes = 0; // compute in wider type ... - return bytes; + if (bytes > std::numeric_limits<uint64_t>::max()) + return std::nullopt; + return static_cast<uint64_t>(bytes);src/CraneCtld/CtldGrpcServer.cpp (1)
1748-1752: Possible task-count underflow
TryMallocMetaSubmitResourcefails →UserReduceTask()is called, but
UserAddTask()is only invoked insideTryMallocMetaSubmitResourceon
success.
If the function returns early without incrementing the counter,
UserReduceTaskwill decrement a zero count → underflow (wrap to UINT_MAX).Double-check the implementation or guard the decrement with a prior increment
flag.src/CraneCtld/AccountMetaContainer.cpp (1)
44-50: Verify the semantics of max_submit_jobs == 0The current logic treats
max_submit_jobs == 0as "no jobs allowed". In many resource management systems, 0 typically means "unlimited". Please verify if this is the intended behavior.If 0 should mean unlimited, consider updating the checks:
-if (qos->max_submit_jobs_per_user == 0) +if (qos->max_submit_jobs_per_user == 0) + qos->max_submit_jobs_per_user = std::numeric_limits<uint32_t>::max();Otherwise, document this behavior clearly in the code comments.
src/CraneCtld/DbClient.cpp (1)
79-84: Clarify the CPU count initialization logicThe initialization uses
INT32_MAX / 256formax_tres.GetAllocatableRes().cpu_count. This division by 256 seems arbitrary and differs from other fields that use the full numeric_limits::max().#!/bin/bash # Check if this pattern is used elsewhere in the codebase rg "INT32_MAX\s*/\s*256" -A 2 -B 2Consider using consistent max values or documenting why this specific value is used.
src/CraneCtld/AccountManager.cpp (6)
22-22: LGTM!The TaskScheduler include is necessary for the new UserHasTask check.
190-192: Good addition for data integrity.Checking if a user has active tasks before deletion prevents data inconsistency and orphaned tasks.
1101-1101: Good refactoring to use centralized helper.Using
CraneModifyFieldStrimproves code maintainability and reduces duplication.
1280-1288: Correct implementation of account chain population.The account hierarchy chain is properly built by traversing from the task's account up to the root, which is essential for hierarchical resource tracking.
2133-2133: Consistent method name updates.The changes from
Delete*AllocationInfotoDelete*Metamethods are consistent with the AccountMetaContainer refactoring.Also applies to: 2172-2172, 2184-2184
3238-3241: Comprehensive cleanup of partition resources.The implementation properly removes partition resource limits from both the database and in-memory structures when partitions are deleted.
Also applies to: 3245-3247, 3279-3283
| bool UserHasTasks(const std::string& username); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
UserHasTasks should be const and accept string_view
The method is a pure query, so marking it const documents thread-safety expectations and allows callers to use it on const instances.
Using absl::string_view avoids an unnecessary allocation.
-bool UserHasTasks(const std::string& username);
+bool UserHasTasks(absl::string_view username) const;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bool UserHasTasks(const std::string& username); | |
| bool UserHasTasks(absl::string_view username) const; |
🤖 Prompt for AI Agents
In src/CraneCtld/TaskScheduler.h at lines 508-509, the method UserHasTasks
should be updated to be a const member function and accept an absl::string_view
parameter instead of a const std::string&. This change documents that the method
does not modify the object and improves efficiency by avoiding string copying.
Modify the method signature accordingly to reflect these improvements.
| void operator+=(DeviceMap& lhs, const DeviceMap& rhs); | ||
| void operator-=(DeviceMap& lhs, const DeviceMap& rhs); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
DeviceMap compound operators declared here – beware of unsigned underflow
operator-= relies on run-time ABSL_ASSERTs in the implementation.
When assertions are stripped (e.g. Release builds) the code dereferences lhs_it unchecked → UB if the key is absent.
Either keep the early‐return checks or switch to CHECK/CRANE_ASSERT that stay in release.
🤖 Prompt for AI Agents
In src/Utilities/PublicHeader/include/crane/PublicHeader.h around lines 367 to
369, the operator-= for DeviceMap relies on ABSL_ASSERTs which are removed in
release builds, causing potential undefined behavior due to unchecked
dereferencing. To fix this, replace ABSL_ASSERT with a CHECK or CRANE_ASSERT
that remains active in release builds, or reintroduce explicit early-return
checks to ensure the key exists before dereferencing.
| // 65 - 69 | ||
| "The current running job exceeds the QoS limit (MaxJobPerUser)", | ||
| "User has insufficient privilege" | ||
| "User has insufficient privilege", | ||
| "The current account is not in the allowed account list for the partition", | ||
| "The current account has been explicitly added to the deny list for the partition", | ||
| "ERR_EBPF", | ||
|
|
||
| // 70 - 74 | ||
| "ERR_SUPERVISOR", | ||
| "The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)" | ||
| "ERR_USER_HAS_TASK", | ||
| "The current submitted job exceeds the QoS limit (MaxJobsPerQos)", | ||
| "ERR_CONVERT_TO_RESOURCE_VIEW", | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing comma concatenates two error strings and shifts the array indexes
"The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)" is not followed by a comma, so the next literal is merged by the compiler, producing
"...MaxSubmitJobsPerAccount)ERR_USER_HAS_TASK".
Every subsequent entry is now off-by-one w.r.t the ErrCode enum, breaking CraneErrStr().
- "The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)"
+ "The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // 65 - 69 | |
| "The current running job exceeds the QoS limit (MaxJobPerUser)", | |
| "User has insufficient privilege" | |
| "User has insufficient privilege", | |
| "The current account is not in the allowed account list for the partition", | |
| "The current account has been explicitly added to the deny list for the partition", | |
| "ERR_EBPF", | |
| // 70 - 74 | |
| "ERR_SUPERVISOR", | |
| "The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)" | |
| "ERR_USER_HAS_TASK", | |
| "The current submitted job exceeds the QoS limit (MaxJobsPerQos)", | |
| "ERR_CONVERT_TO_RESOURCE_VIEW", | |
| // 65 - 69 | |
| "The current running job exceeds the QoS limit (MaxJobPerUser)", | |
| "User has insufficient privilege", | |
| "The current account is not in the allowed account list for the partition", | |
| "The current account has been explicitly added to the deny list for the partition", | |
| "ERR_EBPF", | |
| // 70 - 74 | |
| "ERR_SUPERVISOR", | |
| "The current submitted job exceeds the QoS limit (MaxSubmitJobsPerAccount)", | |
| "ERR_USER_HAS_TASK", | |
| "The current submitted job exceeds the QoS limit (MaxJobsPerQos)", | |
| "ERR_CONVERT_TO_RESOURCE_VIEW", |
🤖 Prompt for AI Agents
In src/Utilities/PublicHeader/include/crane/PublicHeader.h between lines 204 and
217, there is a missing comma after the string "The current submitted job
exceeds the QoS limit (MaxSubmitJobsPerAccount)". Add a comma immediately after
this string to separate it from the next string literal "ERR_USER_HAS_TASK".
This will prevent the compiler from concatenating the two strings and ensure the
array indexes align correctly with the ErrCode enum, fixing the CraneErrStr()
mapping.
| void operator+=(DeviceMap& lhs, const DeviceMap& rhs) { | ||
| for (const auto& [dev_name, rhs_val] : rhs) { | ||
| const auto& [rhs_untyped, rhs_types] = rhs_val; | ||
| auto& lhs_pair = lhs[dev_name]; | ||
| lhs_pair.first += rhs_untyped; | ||
| for (const auto& [type_name, type_count] : rhs_types) { | ||
| lhs_pair.second[type_name] += type_count; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| void operator-=(DeviceMap& lhs, const DeviceMap& rhs) { | ||
| for (const auto& [dev_name, rhs_val] : rhs) { | ||
| const auto& [rhs_untyped, rhs_types] = rhs_val; | ||
| auto lhs_it = lhs.find(dev_name); | ||
| ABSL_ASSERT(lhs_it != lhs.end()); | ||
| auto& lhs_pair = lhs_it->second; | ||
| ABSL_ASSERT(lhs_pair.first >= rhs_untyped); | ||
| lhs_pair.first -= rhs_untyped; | ||
| for (const auto& [type_name, type_count] : rhs_types) { | ||
| auto type_it = lhs_pair.second.find(type_name); | ||
| ABSL_ASSERT(type_it != lhs_pair.second.end()); | ||
| ABSL_ASSERT(type_it->second >= type_count); | ||
| type_it->second -= type_count; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeviceMap::operator-= undefined behaviour in Release builds
When lhs lacks dev_name, lhs_it == lhs.end() and the subsequent dereference triggers UB once ABSL_ASSERT is removed. Same for missing type_name.
- auto lhs_it = lhs.find(dev_name);
- ABSL_ASSERT(lhs_it != lhs.end());
- auto& lhs_pair = lhs_it->second;
+ auto lhs_it = lhs.find(dev_name);
+ if (lhs_it == lhs.end()) return; // nothing to subtract / or handle error
+ auto& lhs_pair = lhs_it->second;Apply similar guarding to the inner type_it block.
Fail-fast or gracefully handle the mismatch; silent UB is deadly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| void operator+=(DeviceMap& lhs, const DeviceMap& rhs) { | |
| for (const auto& [dev_name, rhs_val] : rhs) { | |
| const auto& [rhs_untyped, rhs_types] = rhs_val; | |
| auto& lhs_pair = lhs[dev_name]; | |
| lhs_pair.first += rhs_untyped; | |
| for (const auto& [type_name, type_count] : rhs_types) { | |
| lhs_pair.second[type_name] += type_count; | |
| } | |
| } | |
| } | |
| void operator-=(DeviceMap& lhs, const DeviceMap& rhs) { | |
| for (const auto& [dev_name, rhs_val] : rhs) { | |
| const auto& [rhs_untyped, rhs_types] = rhs_val; | |
| auto lhs_it = lhs.find(dev_name); | |
| ABSL_ASSERT(lhs_it != lhs.end()); | |
| auto& lhs_pair = lhs_it->second; | |
| ABSL_ASSERT(lhs_pair.first >= rhs_untyped); | |
| lhs_pair.first -= rhs_untyped; | |
| for (const auto& [type_name, type_count] : rhs_types) { | |
| auto type_it = lhs_pair.second.find(type_name); | |
| ABSL_ASSERT(type_it != lhs_pair.second.end()); | |
| ABSL_ASSERT(type_it->second >= type_count); | |
| type_it->second -= type_count; | |
| } | |
| } | |
| } | |
| void operator-=(DeviceMap& lhs, const DeviceMap& rhs) { | |
| for (const auto& [dev_name, rhs_val] : rhs) { | |
| const auto& [rhs_untyped, rhs_types] = rhs_val; | |
| auto lhs_it = lhs.find(dev_name); | |
| if (lhs_it == lhs.end()) return; // nothing to subtract / or handle error | |
| auto& lhs_pair = lhs_it->second; | |
| ABSL_ASSERT(lhs_pair.first >= rhs_untyped); | |
| lhs_pair.first -= rhs_untyped; | |
| for (const auto& [type_name, type_count] : rhs_types) { | |
| auto type_it = lhs_pair.second.find(type_name); | |
| ABSL_ASSERT(type_it != lhs_pair.second.end()); | |
| ABSL_ASSERT(type_it->second >= type_count); | |
| type_it->second -= type_count; | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In src/Utilities/PublicHeader/PublicHeader.cpp around lines 361 to 387, the
operator-= function for DeviceMap uses ABSL_ASSERT to check for the existence of
keys in lhs, but these asserts are removed in Release builds causing undefined
behavior when keys are missing. To fix this, replace ABSL_ASSERT checks with
explicit if conditions that either fail-fast by throwing exceptions or handle
the missing keys gracefully, ensuring no dereference occurs on invalid iterators
for both dev_name and type_name lookups.
| std::optional<uint64_t> ParseMemStringAsByte(const std::string& mem) { | ||
| static const LazyRE2 mem_regex = { R"(^([0-9]+(?:\.?[0-9]*))([MmGgKkB]?)$)" } ; | ||
| std::string num_str, unit; | ||
| if (!RE2::FullMatch(mem, *mem_regex, &num_str, &unit)) { | ||
| CRANE_ERROR("invalid memory format: {}", mem); | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| double sz = 0.0; | ||
| try { | ||
| sz = std::stod(num_str); | ||
| } catch (const std::exception& e) { | ||
| CRANE_ERROR("invalid memory format: {}", mem); | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| uint64_t bytes = 0; | ||
| switch (unit[0]) { | ||
| case 'K': | ||
| case 'k': | ||
| bytes = sz * 1024; | ||
| break; | ||
| case 'M': | ||
| case 'm': | ||
| bytes = sz * 1024 * 1024; | ||
| break; | ||
| case 'G': | ||
| case 'g': | ||
| bytes = sz * 1024 * 1024 * 1024; | ||
| break; | ||
| case 'B': | ||
| case 'b': | ||
| bytes = sz; | ||
| break; | ||
| default: | ||
| bytes = sz * 1024 * 1024; | ||
| } | ||
|
|
||
| return bytes; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Undefined-behaviour when unit is omitted
unit can be an empty string when the caller passes a plain number such as "1024".
Accessing unit[0] in that case is undefined behaviour and will read past the string’s bounds, potentially crashing the ctl-daemon.
A minimal, defensive fix:
- switch (unit[0]) {
+ char u = unit.empty() ? '\0' : unit[0];
+ switch (u) {You’ll also want to decide on a deterministic default (bytes is the typical choice) instead of silently falling through to the default case that currently assumes megabytes.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::optional<uint64_t> ParseMemStringAsByte(const std::string& mem) { | |
| static const LazyRE2 mem_regex = { R"(^([0-9]+(?:\.?[0-9]*))([MmGgKkB]?)$)" } ; | |
| std::string num_str, unit; | |
| if (!RE2::FullMatch(mem, *mem_regex, &num_str, &unit)) { | |
| CRANE_ERROR("invalid memory format: {}", mem); | |
| return std::nullopt; | |
| } | |
| double sz = 0.0; | |
| try { | |
| sz = std::stod(num_str); | |
| } catch (const std::exception& e) { | |
| CRANE_ERROR("invalid memory format: {}", mem); | |
| return std::nullopt; | |
| } | |
| uint64_t bytes = 0; | |
| switch (unit[0]) { | |
| case 'K': | |
| case 'k': | |
| bytes = sz * 1024; | |
| break; | |
| case 'M': | |
| case 'm': | |
| bytes = sz * 1024 * 1024; | |
| break; | |
| case 'G': | |
| case 'g': | |
| bytes = sz * 1024 * 1024 * 1024; | |
| break; | |
| case 'B': | |
| case 'b': | |
| bytes = sz; | |
| break; | |
| default: | |
| bytes = sz * 1024 * 1024; | |
| } | |
| return bytes; | |
| } | |
| std::optional<uint64_t> ParseMemStringAsByte(const std::string& mem) { | |
| static const LazyRE2 mem_regex = { R"(^([0-9]+(?:\.?[0-9]*))([MmGgKkB]?)$)" } ; | |
| std::string num_str, unit; | |
| if (!RE2::FullMatch(mem, *mem_regex, &num_str, &unit)) { | |
| CRANE_ERROR("invalid memory format: {}", mem); | |
| return std::nullopt; | |
| } | |
| double sz = 0.0; | |
| try { | |
| sz = std::stod(num_str); | |
| } catch (const std::exception& e) { | |
| CRANE_ERROR("invalid memory format: {}", mem); | |
| return std::nullopt; | |
| } | |
| uint64_t bytes = 0; | |
| + char u = unit.empty() ? '\0' : unit[0]; | |
| - switch (unit[0]) { | |
| + switch (u) { | |
| case 'K': | |
| case 'k': | |
| bytes = sz * 1024; | |
| break; | |
| case 'M': | |
| case 'm': | |
| bytes = sz * 1024 * 1024; | |
| break; | |
| case 'G': | |
| case 'g': | |
| bytes = sz * 1024 * 1024 * 1024; | |
| break; | |
| case 'B': | |
| case 'b': | |
| bytes = sz; | |
| break; | |
| default: | |
| bytes = sz * 1024 * 1024; | |
| } | |
| return bytes; | |
| } |
🤖 Prompt for AI Agents
In src/Utilities/PublicHeader/String.cpp lines 48 to 87, the code accesses
unit[0] without checking if unit is empty, causing undefined behavior when the
unit is omitted. Fix this by first checking if unit is empty before accessing
unit[0]. If empty, assign a default unit such as 'B' for bytes to ensure
deterministic behavior instead of relying on the default case that assumes
megabytes. Update the switch statement accordingly to handle this default unit
safely.
| resource->GetAllocatableRes().memory_bytes = std::stoull(std::string(allocatable_res["mem"].get_string().value)); | ||
| resource->GetAllocatableRes().memory_sw_bytes = std::stoull(std::string(allocatable_res["mem_sw"].get_string().value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for memory value parsing
The std::stoull calls could throw exceptions if the string values are invalid.
Add try-catch blocks:
-resource->GetAllocatableRes().memory_bytes = std::stoull(std::string(allocatable_res["mem"].get_string().value));
-resource->GetAllocatableRes().memory_sw_bytes = std::stoull(std::string(allocatable_res["mem_sw"].get_string().value));
+try {
+ resource->GetAllocatableRes().memory_bytes = std::stoull(std::string(allocatable_res["mem"].get_string().value));
+ resource->GetAllocatableRes().memory_sw_bytes = std::stoull(std::string(allocatable_res["mem_sw"].get_string().value));
+} catch (const std::exception& e) {
+ CRANE_ERROR("Failed to parse memory values: {}", e.what());
+ resource->GetAllocatableRes().memory_bytes = 0;
+ resource->GetAllocatableRes().memory_sw_bytes = 0;
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resource->GetAllocatableRes().memory_bytes = std::stoull(std::string(allocatable_res["mem"].get_string().value)); | |
| resource->GetAllocatableRes().memory_sw_bytes = std::stoull(std::string(allocatable_res["mem_sw"].get_string().value)); | |
| try { | |
| resource->GetAllocatableRes().memory_bytes = | |
| std::stoull(std::string(allocatable_res["mem"].get_string().value)); | |
| resource->GetAllocatableRes().memory_sw_bytes = | |
| std::stoull(std::string(allocatable_res["mem_sw"].get_string().value)); | |
| } catch (const std::exception& e) { | |
| CRANE_ERROR("Failed to parse memory values: {}", e.what()); | |
| resource->GetAllocatableRes().memory_bytes = 0; | |
| resource->GetAllocatableRes().memory_sw_bytes = 0; | |
| } |
🤖 Prompt for AI Agents
In src/CraneCtld/DbClient.cpp around lines 1179 to 1180, the calls to
std::stoull for parsing memory values can throw exceptions if the input strings
are invalid. Wrap these calls in try-catch blocks to catch std::invalid_argument
and std::out_of_range exceptions, and handle the errors appropriately, such as
logging the error and setting default values or returning an error status.
| void AccountManager::EmpalceAccountPartitionResrouce_( | ||
| Account& account, const std::string& partition) { | ||
| if (account.partition_to_resource_limit_map.contains(partition)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in method name.
The method name has a typo: EmpalceAccountPartitionResrouce_ should be EmplaceAccountPartitionResource_.
🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 2917 to 2919, the method name
EmpalceAccountPartitionResrouce_ contains typos. Rename the method to
EmplaceAccountPartitionResource_ to correct the spelling errors in both
"Empalce" and "Resrouce".
| CraneExpected<void> AccountManager::ModifyAccountPartitioinResource( | ||
| uint32_t uid, crane::grpc::ModifyField modify_field, | ||
| const std::string& account_name, const std::string& partition_name, | ||
| const std::string& value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in method name.
The method name has a typo: ModifyAccountPartitioinResource should be ModifyAccountPartitionResource.
🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 1021 to 1024, the method name
ModifyAccountPartitioinResource contains a typo. Rename the method to
ModifyAccountPartitionResource by correcting the misspelled word "Partitioin" to
"Partition" in both the method declaration and any corresponding definitions or
calls.
| if ((modify_field == crane::grpc::ModifyField::MaxWall || | ||
| modify_field == crane::grpc::MaxWallDurationPerJob) && | ||
| !CheckIfTimeLimitSecIsValid(value_number)) | ||
| return std::unexpected(CraneErrCode::ERR_TIME_LIMIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect conditional logic for time limit validation.
The time limit validation should only apply to MaxWall and MaxWallDurationPerJob fields, but the current condition is inside a block that only executes for MaxJobs or MaxSubmitJobs.
Apply this diff to fix the logic:
- if (modify_field == crane::grpc::ModifyField::MaxJobs ||
- modify_field == crane::grpc::ModifyField::MaxSubmitJobs) {
- bool ok = util::ConvertStringToInt64(value, &value_number);
- if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER);
-
- if ((modify_field == crane::grpc::ModifyField::MaxWall ||
- modify_field == crane::grpc::MaxWallDurationPerJob) &&
- !CheckIfTimeLimitSecIsValid(value_number))
- return std::unexpected(CraneErrCode::ERR_TIME_LIMIT);
- }
+ bool ok = util::ConvertStringToInt64(value, &value_number);
+ if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER);
+
+ if ((modify_field == crane::grpc::ModifyField::MaxWall ||
+ modify_field == crane::grpc::ModifyField::MaxWallDurationPerJob) &&
+ !CheckIfTimeLimitSecIsValid(value_number))
+ return std::unexpected(CraneErrCode::ERR_TIME_LIMIT);Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 820 to 823, the time limit
validation condition is incorrectly nested inside a block that only runs for
MaxJobs or MaxSubmitJobs, causing it to not apply to MaxWall or
MaxWallDurationPerJob as intended. To fix this, move the conditional check for
MaxWall and MaxWallDurationPerJob fields with the time limit validation outside
or separate from the MaxJobs/MaxSubmitJobs block so that it executes correctly
when those fields are modified.
| if (modify_field == crane::grpc::ModifyField::MaxJobs || | ||
| modify_field == crane::grpc::ModifyField::MaxSubmitJobs) { | ||
| bool ok = util::ConvertStringToInt64(value, &value_number); | ||
| if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER); | ||
|
|
||
| if ((modify_field == crane::grpc::ModifyField::MaxWall || | ||
| modify_field == crane::grpc::MaxWallDurationPerJob) && | ||
| !CheckIfTimeLimitSecIsValid(value_number)) | ||
| return std::unexpected(CraneErrCode::ERR_TIME_LIMIT); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect conditional logic for time limit validation.
Same issue as in ModifyUserPartitionResource - the time limit check is incorrectly nested.
Apply this diff to fix the logic:
- if (modify_field == crane::grpc::ModifyField::MaxJobs ||
- modify_field == crane::grpc::ModifyField::MaxSubmitJobs) {
- bool ok = util::ConvertStringToInt64(value, &value_number);
- if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER);
-
- if ((modify_field == crane::grpc::ModifyField::MaxWall ||
- modify_field == crane::grpc::MaxWallDurationPerJob) &&
- !CheckIfTimeLimitSecIsValid(value_number))
- return std::unexpected(CraneErrCode::ERR_TIME_LIMIT);
- }
+ bool ok = util::ConvertStringToInt64(value, &value_number);
+ if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER);
+
+ if ((modify_field == crane::grpc::ModifyField::MaxWall ||
+ modify_field == crane::grpc::ModifyField::MaxWallDurationPerJob) &&
+ !CheckIfTimeLimitSecIsValid(value_number))
+ return std::unexpected(CraneErrCode::ERR_TIME_LIMIT);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (modify_field == crane::grpc::ModifyField::MaxJobs || | |
| modify_field == crane::grpc::ModifyField::MaxSubmitJobs) { | |
| bool ok = util::ConvertStringToInt64(value, &value_number); | |
| if (!ok) return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER); | |
| if ((modify_field == crane::grpc::ModifyField::MaxWall || | |
| modify_field == crane::grpc::MaxWallDurationPerJob) && | |
| !CheckIfTimeLimitSecIsValid(value_number)) | |
| return std::unexpected(CraneErrCode::ERR_TIME_LIMIT); | |
| } | |
| bool ok = util::ConvertStringToInt64(value, &value_number); | |
| if (!ok) | |
| return std::unexpected(CraneErrCode::ERR_CONVERT_TO_INTEGER); | |
| if ((modify_field == crane::grpc::ModifyField::MaxWall || | |
| modify_field == crane::grpc::ModifyField::MaxWallDurationPerJob) && | |
| !CheckIfTimeLimitSecIsValid(value_number)) | |
| return std::unexpected(CraneErrCode::ERR_TIME_LIMIT); |
🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 1043 to 1052, the conditional
logic incorrectly nests the time limit validation inside the check for MaxJobs
and MaxSubmitJobs. To fix this, separate the conditions so that the time limit
validation for MaxWall and MaxWallDurationPerJob is checked independently after
converting the value to an integer. Adjust the if statements to correctly
validate the time limit only when modify_field matches the relevant time limit
fields.
Summary by CodeRabbit