Skip to content

Conversation

@edragain2nd
Copy link
Collaborator

@edragain2nd edragain2nd commented Sep 28, 2025

Summary by CodeRabbit

  • New Features

    • Global per-task deadline support with automatic deadline-based cancellation.
    • New task status "Deadline" and corresponding exit/return code for deadline terminations.
    • APIs and commands updated to accept optional time limits and optional deadlines when setting/updating task/job constraints.
  • Documentation

    • CLI docs and output formatting updated: new --deadline option and %deadline display token across tools.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 28, 2025

Walkthrough

Adds per-task deadline fields and enums to protobufs and DB persistence, propagates deadline_time through Ctld structures and RPCs, introduces deadline timers and cancellation paths in TaskScheduler and Supervisor, renames time-limit APIs to optional time-constraint variants, and documents CLI --deadline options.

Changes

Cohort / File(s) Summary
Protobuf & RPC schema
protos/PublicDefs.proto, protos/Crane.proto, protos/Supervisor.proto
Add deadline_time (google.protobuf.Timestamp) to task/step/job messages; add Deadline TaskStatus and ERR_INVALID_DEADLINE; rename ChangeTimeLimit → ChangeTimeConstraint and add optional time_limit_seconds and optional deadline_time; add Deadline target attribute.
Ctld public defs & serialization
src/CraneCtld/CtldPublicDefs.h, src/CraneCtld/CtldPublicDefs.cpp
Add absl::Time deadline_time to StepInCtld and TaskInCtld; read/write and propagate deadline_time during recovery, initialization, and when serializing Step/Task protobufs.
Database persistence
src/CraneCtld/DbClient.cpp
Persist and read deadline_time in MongoDB documents and include it in Task-related document field lists and FetchJobRecords parsing.
TaskScheduler (Ctld)
src/CraneCtld/TaskScheduler.h, src/CraneCtld/TaskScheduler.cpp
Add deadline timer thread and uvw loop, per-job timers and queues, create/cancel timer callbacks, enqueue deadline cancellations to mark tasks as Deadline; rename/extend ChangeTaskTimeLimit → ChangeTaskTimeConstraint with optional time_limit/deadline handling.
Ctld RPC service & client stubs
src/CraneCtld/RpcService/CtldGrpcServer.cpp, src/CraneCtld/RpcService/CranedKeeper.h, src/CraneCtld/RpcService/CranedKeeper.cpp
Extract optional fields from requests, call updated Craned RPC ChangeJobTimeConstraint with conditional field setting, and adjust local APIs to accept optional time_limit_seconds and optional deadline_time.
Craned core & supervisor RPCs / stubs
src/Craned/Core/CranedServer.h, src/Craned/Core/CranedServer.cpp, src/Craned/Core/SupervisorKeeper.h, src/Craned/Core/SupervisorKeeper.cpp
Rename RPC handlers and request/reply types to ChangeJobTimeConstraint / ChangeTaskTimeConstraint; accept std::optional time_limit_seconds and deadline_time and forward only when present.
Supervisor TaskManager (Craned)
src/Craned/Supervisor/TaskManager.h, src/Craned/Supervisor/TaskManager.cpp
Replace ChangeTaskTimeLimitAsync with ChangeTaskTimeConstraintAsync taking optional time_limit/deadline; add is_deadline-aware timers, ChangeTaskTimeConstraintQueueElem with optional fields, and propagate TERMINATION_BY_DEADLINE and deadline-aware timer semantics.
Supervisor server wiring
src/Craned/Supervisor/SupervisorServer.h, src/Craned/Supervisor/SupervisorServer.cpp
Rename RPC handler to ChangeTaskTimeConstraint, extract optional fields, and call TaskManager::ChangeTaskTimeConstraintAsync with std::optional values.
Exit codes & status strings
src/Utilities/PublicHeader/include/crane/PublicHeader.h, src/Utilities/PublicHeader/include/crane/String.h
Add EC_REACHED_DEADLINE exit code; map new TaskStatus value to string "Deadline".
CLI docs
docs/en/... and docs/zh/... (cacct.md, calloc.md, cbatch.md, ccontrol.md, cqueue.md, crun.md)
Document new --deadline CLI option, add %deadline/%Deadline format specifier and Deadline display options and accepted formats.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as User/CLI
    participant Ctld as Ctld (TaskScheduler)
    participant Deadline as DeadlineTimerThread
    participant Craned as Craned (Supervisor/TaskManager)
    participant Task as Task Process

    CLI->>Ctld: ChangeTaskTimeConstraint(task_id, time_limit?, deadline?)
    Ctld->>Ctld: validate deadline > now (if present)
    Ctld->>Ctld: update TaskInCtld.deadline_time
    Ctld->>Deadline: enqueue CreateDeadlineTimer(job_id, task_id, deadline)
    Deadline->>Deadline: start uvw timer for deadline

    rect rgb(200,240,220)
      Note over Task,Craned: Normal completion
      Task->>Craned: exits normally
      Craned->>Ctld: report completion
      Ctld->>DB: persist normal outcome
    end

    rect rgb(255,220,220)
      Note over Deadline,Task: Deadline expiry path
      Deadline->>Ctld: timer fires -> enqueue cancellation
      Ctld->>Craned: request termination (deadline)
      Craned->>Task: terminate (SIGTERM/SIGKILL)
      Craned->>Ctld: report TERMINATED_BY_DEADLINE / status=Deadline
      Ctld->>DB: persist deadline outcome (EC_REACHED_DEADLINE)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Files needing extra attention:
    • src/CraneCtld/TaskScheduler.cpp — new deadline thread/uvw loop, timer lifecycle, concurrency and queue correctness.
    • src/Craned/Supervisor/TaskManager.{h,cpp} — timer semantics, EvTaskTimerCb_ changes, termination reason handling and earliest-boundary logic.
    • src/CraneCtld/DbClient.cpp — DB schema changes, field ordering, timestamp conversions and defaults.
    • Cross-module RPC and protobuf renames — ensure all call sites, stubs, and generated code use new Request/Reply and optional-field handling.
    • Serialization between absl::Time, protobuf Timestamp, and int64 seconds.

Possibly related PRs

Suggested reviewers

  • Nativu5
  • L-Xiafeng

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add cbatch --deadline' directly describes the main change in the changeset: adding a new --deadline command-line option to the cbatch command.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66fc081 and e708951.

📒 Files selected for processing (9)
  • protos/PublicDefs.proto (3 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (2 hunks)
  • src/CraneCtld/CtldPublicDefs.h (1 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (1 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
  • src/CraneCtld/TaskScheduler.h (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (3 hunks)
  • src/Craned/Supervisor/TaskManager.h (4 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/RpcService/CranedKeeper.cpp
🧬 Code graph analysis (2)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (2)
  • EvDeadlineTaskTimerCb_ (1772-1787)
  • EvDeadlineTaskTimerCb_ (1772-1772)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelDeadlineTerminationTimer_ (353-358)
  • AddDeadlineTerminationTimer_ (334-343)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

@Nativu5
Copy link
Collaborator

Nativu5 commented Sep 28, 2025

在Supervisor上是不是把deadline转换成一个timelimit比较好?

@edragain2nd
Copy link
Collaborator Author

在Supervisor上是不是把deadline转换成一个timelimit比较好?

是指1.和time_limit共用一个变量还是2.只是将deadline类型转换为和time_limit一样的类型?

@Nativu5
Copy link
Collaborator

Nativu5 commented Sep 28, 2025

在Supervisor上是不是把deadline转换成一个timelimit比较好?

是指1.和time_limit共用一个变量还是2.只是将deadline类型转换为和time_limit一样的类型?

Slurm 如果是 --deadline 原因结束一个正在运行的任务,任务状态应该是 TIMEOUT,和 timelimit 杀掉任务是一样的输出。因此我认为可以复用同一个 uvw timer。

@edragain2nd
Copy link
Collaborator Author

在Supervisor上是不是把deadline转换成一个timelimit比较好?

是指1.和time_limit共用一个变量还是2.只是将deadline类型转换为和time_limit一样的类型?

Slurm 如果是 --deadline 原因结束一个正在运行的任务,任务状态应该是 TIMEOUT,和 timelimit 杀掉任务是一样的输出。因此我认为可以复用同一个 uvw timer。

我注意到存在修改time_limit的函数,而调研文档里并没有提到提交任务后还能修改--deadline的val,如果复用可能存在 想修改time_limit但不想修改deadline导致deadline被误修改的情况,所以我认为独立开 会是一个比较好的选择

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/Craned/Supervisor/TaskManager.cpp (1)

1981-1994: Clamp negative deadline intervals before arming the timer.

If the deadline has already passed when this code executes (due to network skew, scheduling delays, or clock differences), deadline_sec becomes negative. Passing this to AddDeadlineTerminationTimer_() will create an invalid timer that never fires, as std::chrono::seconds(negative_value) passed to uvw::timer_handle::start() causes undefined behavior.

This issue was flagged in a previous review but remains unresolved.

Apply this diff to handle expired deadlines:

 if (m_step_.GetStep().has_deadline_time()) {
   int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
                          absl::ToUnixSeconds(absl::Now());
-  if (deadline_sec <= sec) {
+  if (deadline_sec <= 0) {
+    CRANE_TRACE("Task deadline already reached; firing immediately.");
+    EvDeadlineTaskTimerCb_();
+  } else if (deadline_sec <= sec) {
     AddDeadlineTerminationTimer_(deadline_sec);
     CRANE_TRACE("Add a deadline timer of {} seconds", deadline_sec);
   } else {
     AddTerminationTimer_(sec);
     CRANE_TRACE("Add a timer of {} seconds", sec);
   }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b575f33 and 3431876.

📒 Files selected for processing (3)
  • src/CraneCtld/TaskScheduler.cpp (2 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (6 hunks)
  • src/Craned/Supervisor/TaskManager.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/CraneCtld/TaskScheduler.cpp
  • src/Craned/Supervisor/TaskManager.h
🧰 Additional context used
🧬 Code graph analysis (1)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (3)
  • DelTerminationTimer_ (345-351)
  • AddDeadlineTerminationTimer_ (334-343)
  • AddTerminationTimer_ (323-332)
🔇 Additional comments (3)
src/Craned/Supervisor/TaskManager.cpp (3)

1670-1673: LGTM!

The logic correctly populates the optional deadline_sec field with the absolute deadline timestamp when deadline_time is present.


1772-1787: LGTM!

The deadline timer callback correctly mirrors the timeout callback structure and uses the proper TERMINATION_BY_DEADLINE enum value and ExitCodeReachedDeadline exit code. The past review comment about the invalid field has been addressed.


1837-1842: LGTM!

The deadline termination cases are correctly handled for both signaled and non-signaled scenarios, following the same pattern as OOM and timeout handling. The status and exit codes are appropriate.

Also applies to: 1857-1862

@github-actions github-actions bot added the test-passed Build and test success label Oct 9, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/Craned/Supervisor/TaskManager.cpp (2)

1939-1948: Guard against negative deadline intervals.

If deadline_sec is negative (deadline already passed due to network latency, clock skew, or late scheduling), passing it to AddTerminationTimer_() can cause undefined behavior or result in a very large unsigned timeout that never fires. Please check if deadline_sec <= 0 and handle it by triggering immediate termination before calling AddTerminationTimer_().

Apply this diff to guard against expired deadlines:

 } else if (elem.deadline_sec) {
   int64_t deadline_sec =
       elem.deadline_sec.value() - absl::ToUnixSeconds(absl::Now());
   int64_t new_time_limit_sec =
       ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
+  if (deadline_sec <= 0) {
+    // Deadline already passed, terminate immediately
+    TaskTerminateQueueElem ev_task_terminate{
+        .termination_reason = TerminatedBy::TERMINATION_BY_DEADLINE};
+    m_task_terminate_queue_.enqueue(ev_task_terminate);
+    m_terminate_task_async_handle_->send();
+  } else if (deadline_sec <= new_time_limit_sec) {
-  if (deadline_sec <= new_time_limit_sec) {
     AddTerminationTimer_(deadline_sec, true);
   } else {
     AddTerminationTimer_(new_time_limit_sec, false);
   }
+  }

Based on past review comments.


1970-1984: Guard against negative deadline intervals.

If deadline_sec is negative (deadline already in the past), passing it to AddTerminationTimer_() causes undefined behavior. The code computes min(deadline_sec, sec) but doesn't check if deadline_sec is negative beforehand. If negative and smaller than sec, it will still be passed to the timer, which can cause the timer to never fire or behave unexpectedly.

Apply this diff to guard against expired deadlines:

 int64_t sec = m_step_.GetStep().time_limit().seconds();
 
 if (m_step_.GetStep().has_deadline_time()) {
   int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
                          absl::ToUnixSeconds(absl::Now());
+  if (deadline_sec <= 0) {
+    CRANE_TRACE("Task deadline already reached; firing immediately.");
+    EvTaskTimerCb_(true);
+  } else {
-  bool is_deadline = deadline_sec <= sec ? true : false;
-  std::string log_timer = deadline_sec <= sec ? "deadline" : "time_limit";
-  deadline_sec = deadline_sec <= sec ? deadline_sec : sec;
-  AddTerminationTimer_(deadline_sec, is_deadline);
-  CRANE_TRACE("Add a {} timer of {} seconds", log_timer, deadline_sec);
+    bool is_deadline = deadline_sec <= sec;
+    std::string log_timer = is_deadline ? "deadline" : "time_limit";
+    int64_t timer_secs = is_deadline ? deadline_sec : sec;
+    AddTerminationTimer_(timer_secs, is_deadline);
+    CRANE_TRACE("Add a {} timer of {} seconds", log_timer, timer_secs);
+  }
 } else {
   AddTerminationTimer_(sec, false);
   CRANE_TRACE("Add a time_limit timer of {} seconds", sec);
 }

Based on past review comments.

src/Craned/Supervisor/TaskManager.h (1)

384-385: Remove dead code declaration.

EvDeadlineTaskTimerCb_() is an empty function in the implementation and should be removed along with its declaration. The deadline functionality has been properly merged into EvTaskTimerCb_(bool is_deadline).

Apply this diff:

   void EvTaskTimerCb_(bool is_deadline);
-  void EvDeadlineTaskTimerCb_();
   void EvCleanTaskStopQueueCb_();
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3431876 and 73ac349.

📒 Files selected for processing (6)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (0 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.h (0 hunks)
  • src/CraneCtld/TaskScheduler.cpp (3 hunks)
  • src/CraneCtld/TaskScheduler.h (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (6 hunks)
  • src/Craned/Supervisor/TaskManager.h (3 hunks)
💤 Files with no reviewable changes (2)
  • src/CraneCtld/RpcService/CranedKeeper.h
  • src/CraneCtld/RpcService/CranedKeeper.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/CraneCtld/TaskScheduler.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (4)
  • EvTaskTimerCb_ (1755-1777)
  • EvTaskTimerCb_ (1755-1755)
  • EvDeadlineTaskTimerCb_ (1779-1779)
  • EvDeadlineTaskTimerCb_ (1779-1779)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (334-340)
  • AddTerminationTimer_ (323-332)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format
🔇 Additional comments (7)
src/CraneCtld/TaskScheduler.h (1)

674-677: LGTM!

The addition of the reason field to CancelPendingTaskQueueElem enables tracking of cancellation causes (e.g., user-triggered vs deadline-driven), which aligns well with the broader deadline feature implementation.

src/Craned/Supervisor/TaskManager.cpp (3)

1670-1673: LGTM!

The code correctly propagates the deadline information into the queue element for later processing.


1755-1777: LGTM!

The function correctly differentiates between deadline and timeout terminations, using the appropriate status codes (TaskStatus::Deadline vs TaskStatus::ExceedTimeLimit) and termination reasons (TERMINATION_BY_DEADLINE vs TERMINATION_BY_TIMEOUT).


1829-1834: LGTM!

The code correctly handles deadline termination for both signal-based and non-signal-based exits, reporting TaskStatus::Deadline with appropriate exit codes and descriptive messages.

Also applies to: 1849-1854

src/Craned/Supervisor/TaskManager.h (3)

30-36: LGTM!

The addition of TERMINATION_BY_DEADLINE to the TerminatedBy enum is correctly placed and enables proper tracking of deadline-driven terminations.


323-332: LGTM!

The new AddTerminationTimer_ overload correctly captures the is_deadline flag and passes it to EvTaskTimerCb_, enabling proper differentiation between deadline and timeout terminations.


375-379: LGTM!

The addition of the deadline_sec field to ChangeTaskTimeLimitQueueElem correctly uses std::optional since not all tasks have deadlines, with appropriate default initialization to std::nullopt.

@github-actions github-actions bot removed the test-passed Build and test success label Oct 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/Craned/Supervisor/TaskManager.h (1)

310-321: Unify duplicated AddTerminationTimer_ variants; fix unit mismatch and typo; prevent double timers

  • Two nearly identical implementations; merge into one API to avoid drift.
  • Current versions use milliseconds (template) vs seconds (int64) → inconsistent behavior/precision.
  • Call DelTerminationTimer_() before creating a new timer to prevent double callbacks/leaks.
  • Typo: termination_handel → termination_handle.
  • Minor: avoid unused param warning for lambda arg h.

Suggested refactor:

-  template <typename Duration>
-  void AddTerminationTimer_(Duration duration) {
-    auto termination_handel = m_uvw_loop_->resource<uvw::timer_handle>();
-    termination_handel->on<uvw::timer_event>(
-        [this](const uvw::timer_event&, uvw::timer_handle& h) {
-          EvTaskTimerCb_(false);
-        });
-    termination_handel->start(
-        std::chrono::duration_cast<std::chrono::milliseconds>(duration),
-        std::chrono::seconds(0));
-    m_step_.termination_timer = termination_handel;
-  }
-
-  void AddTerminationTimer_(int64_t secs, bool is_deadline) {
-    auto termination_handel = m_uvw_loop_->resource<uvw::timer_handle>();
-    termination_handel->on<uvw::timer_event>(
-        [this, is_deadline](const uvw::timer_event&, uvw::timer_handle& h) {
-          EvTaskTimerCb_(is_deadline);
-        });
-    termination_handel->start(std::chrono::seconds(secs),
-                              std::chrono::seconds(0));
-    m_step_.termination_timer = termination_handel;
-  }
+  template <typename Duration>
+  void AddTerminationTimer_(Duration duration, bool is_deadline = false) {
+    DelTerminationTimer_();
+    auto termination_handle = m_uvw_loop_->resource<uvw::timer_handle>();
+    termination_handle->on<uvw::timer_event>(
+        [this, is_deadline](const uvw::timer_event&, uvw::timer_handle& /*h*/) {
+          EvTaskTimerCb_(is_deadline);
+        });
+    termination_handle->start(
+        std::chrono::duration_cast<std::chrono::milliseconds>(duration),
+        std::chrono::seconds(0));
+    m_step_.termination_timer = termination_handle;
+  }

Also applies to: 323-332

🧹 Nitpick comments (1)
src/Craned/Supervisor/TaskManager.h (1)

378-379: Unify time types; clarify deadline semantics

  • time_limit uses absl::Duration, but deadline uses std::optional<int64_t> seconds. Prefer consistent types and explicit semantics.
  • If deadline is absolute, use absl::Time; if relative, use absl::Duration.

Proposal:

  • Rename to deadline (not deadline_sec) and change type to std::optionalabsl::Duration (or absl::Time if absolute).
  • Document whether it’s absolute epoch or relative.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73ac349 and 06d3ea3.

📒 Files selected for processing (2)
  • src/CraneCtld/TaskScheduler.cpp (3 hunks)
  • src/Craned/Supervisor/TaskManager.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/CraneCtld/TaskScheduler.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (2)
  • EvTaskTimerCb_ (1755-1777)
  • EvTaskTimerCb_ (1755-1755)
🔇 Additional comments (2)
src/Craned/Supervisor/TaskManager.h (2)

34-36: Enum extension looks good; check downstream handling

Adding TERMINATION_BY_DEADLINE is fine. Ensure all switch statements, metrics, and any (de)serialization/IPC using TerminatedBy are updated accordingly.


384-385: Callback unification LGTM

EvTaskTimerCb_(bool is_deadline) cleanly merges deadline/timelimit paths. Matches TaskManager.cpp usage.

@L-Xiafeng L-Xiafeng added the enhancement New feature or request label Oct 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/Craned/Supervisor/TaskManager.cpp (1)

1937-1946: Clamp negative deadline intervals or terminate immediately.

If the deadline has already passed (network latency, clock skew, or late scheduling), deadline_sec becomes negative or zero at line 1938-1939. Passing this to AddTerminationTimer_() at line 1943 can cause the timer to never fire or behave unexpectedly, as negative durations in std::chrono::seconds() are problematic.

Apply this diff to guard against expired deadlines:

 } else if (elem.deadline_sec) {
   int64_t deadline_sec =
       elem.deadline_sec.value() - absl::ToUnixSeconds(absl::Now());
   int64_t new_time_limit_sec =
       ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
+  if (deadline_sec <= 0) {
+    // Deadline already passed, terminate immediately
+    TaskTerminateQueueElem ev_task_terminate{
+        .termination_reason = TerminatedBy::TERMINATION_BY_DEADLINE};
+    m_task_terminate_queue_.enqueue(ev_task_terminate);
+    m_terminate_task_async_handle_->send();
+  } else if (deadline_sec <= new_time_limit_sec) {
-  if (deadline_sec <= new_time_limit_sec) {
     AddTerminationTimer_(deadline_sec, true);
   } else {
     AddTerminationTimer_(new_time_limit_sec, false);
   }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06d3ea3 and b311907.

📒 Files selected for processing (4)
  • src/CraneCtld/TaskScheduler.cpp (4 hunks)
  • src/CraneCtld/TaskScheduler.h (2 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (6 hunks)
  • src/Craned/Supervisor/TaskManager.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (2)
  • EvTaskTimerCb_ (1755-1777)
  • EvTaskTimerCb_ (1755-1755)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (334-340)
  • AddTerminationTimer_ (323-332)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format
🔇 Additional comments (9)
src/Craned/Supervisor/TaskManager.h (6)

34-35: LGTM! Clean enum extension.

The addition of TERMINATION_BY_DEADLINE is well-placed and consistent with the existing enum structure.


311-321: LGTM! Unified timer handling.

The template method correctly unifies deadline and timeout timers by accepting an is_deadline flag and propagating it to the callback. This design addresses the previous suggestion to merge timer functions.


323-332: LGTM! Consistent overload.

The overload correctly mirrors the template version's behavior and consistently handles the is_deadline flag.


384-384: LGTM! Unified callback signature.

The callback signature correctly accepts the is_deadline flag, enabling a single unified handler for both timeout and deadline events. This addresses the previous suggestion to merge separate callbacks.


384-384: Flag AI summary inconsistency.

The AI-generated enriched summary mentions "Added new public method: EvDeadlineTaskTimerCb_() to handle deadline-based timer events," but this method does not exist in the provided code. The actual implementation uses a unified EvTaskTimerCb_(bool is_deadline) method instead of separate callbacks.


378-378: No issues with deadline_sec handling: processing logic distinguishes time_limit and deadline_sec updates correctly and applies appropriate timers.

src/Craned/Supervisor/TaskManager.cpp (3)

1670-1673: LGTM!

The deadline propagation logic correctly extracts and forwards the deadline timestamp when present, enabling downstream timer rescheduling to consider both deadline and time_limit.


1755-1777: LGTM!

The timer callback now correctly distinguishes between deadline and timeout expirations via the is_deadline parameter. Batch tasks enqueue the appropriate termination reason, while non-batch tasks immediately report the correct status (Deadline vs ExceedTimeLimit) with the proper exit code.


1827-1832: LGTM!

The deadline termination handling correctly maps TERMINATION_BY_DEADLINE to TaskStatus::Deadline in both signal and no-signal paths, with appropriate exit codes and descriptive messages. This ensures batch jobs report deadline expiration distinctly from timeouts.

Also applies to: 1847-1852

@github-actions github-actions bot added the test-passed Build and test success label Oct 13, 2025
Copy link
Collaborator

@L-Xiafeng L-Xiafeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要新增ccontrol update job deadline功能

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/Craned/Supervisor/SupervisorServer.cpp (1)

63-86: Handle one-of semantics: downstream uses .value() unguarded and will crash if only one field is set

Supervisor’s TaskManager (EvCleanChangeTaskTimeConstraintQueueCb_, see src/Craned/Supervisor/TaskManager.cpp lines 1931‑1968 in the provided snippet) dereferences both optionals with .value(). If only time_limit or only deadline is provided, this will throw/UB and kill the supervisor.

Recommended fix (in TaskManager.cpp): treat each optional independently and compute the earliest expiry only from present fields.

// Replace TaskManager::EvCleanChangeTaskTimeConstraintQueueCb_ with guarded logic
void TaskManager::EvCleanChangeTaskTimeConstraintQueueCb_() {
  absl::Time now = absl::Now();
  ChangeTaskTimeConstraintQueueElem elem;
  while (m_task_time_constraint_change_queue_.try_dequeue(elem)) {
    DelTerminationTimer_();
    CRANE_TRACE("Delete the old timer.");

    absl::Time start_time =
        absl::FromUnixSeconds(m_step_.GetStep().start_time().seconds());

    const auto& tl = elem.time_limit;            // optional<absl::Duration>
    const auto& ddl = elem.deadline_time;        // optional<int64_t> (unix sec)

    bool timed_out = false;
    if (tl && now - start_time >= *tl) timed_out = true;
    if (ddl && absl::ToUnixSeconds(now) >= *ddl) timed_out = true;

    if (timed_out) {
      TaskTerminateQueueElem ev{ .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT };
      m_task_terminate_queue_.enqueue(ev);
      m_terminate_task_async_handle_->send();
    } else {
      int64_t ttl_by_tl = tl ? ToInt64Seconds((*tl) - (now - start_time))
                             : std::numeric_limits<int64_t>::max();
      int64_t ttl_by_ddl = ddl ? (*ddl - absl::ToUnixSeconds(now))
                               : std::numeric_limits<int64_t>::max();
      int64_t secs = std::min(ttl_by_tl, ttl_by_ddl);
      bool is_deadline = ttl_by_ddl <= ttl_by_tl;

      AddTerminationTimer_(secs, is_deadline);
      CRANE_TRACE("Replace the old timer to {} timer of {} seconds",
                  is_deadline ? "deadline" : "time_limit", secs);
    }
    elem.ok_prom.set_value(CraneErrCode::SUCCESS);
  }
}

Based on relevant snippet

src/Craned/Core/JobManager.cpp (1)

1024-1052: Set promise on all paths to avoid deadlock

If supervisor stub is missing or RPC fails, ok_prom is never set. The caller waits on future.get() and can block indefinitely. Set false in all failure branches.

   while (m_task_time_constraint_change_queue_.try_dequeue(elem)) {
     if (auto job_ptr = m_job_map_.GetValueExclusivePtr(elem.job_id); job_ptr) {
       auto stub = g_supervisor_keeper->GetStub(elem.job_id);
       if (!stub) {
         CRANE_ERROR("Supervisor for task #{} not found", elem.job_id);
-        continue;
+        elem.ok_prom.set_value(false);
+        continue;
       }
 
       auto err =
           stub->ChangeTaskTimeConstraint(elem.time_limit, elem.deadline_time);
 
       if (err != CraneErrCode::SUCCESS) {
         CRANE_ERROR("Failed to change task time limit or deadline for task #{}",
                     elem.job_id);
-        continue;
+        elem.ok_prom.set_value(false);
+        continue;
       }
 
       elem.ok_prom.set_value(true);
     } else {
       CRANE_ERROR(
           "Try to update the time limit or deadline of a non-existent task "
           "#{}.",
           elem.job_id);
       elem.ok_prom.set_value(false);
     }
   }
src/Craned/Supervisor/TaskManager.cpp (1)

1932-1969: Critical issues: Unsafe optional access and incorrect termination reason.

This function has several critical issues:

  1. Unsafe optional access (lines 1943, 1946): The code calls .value() on optional fields without checking if they have values. This will crash if either is std::nullopt. While the ChangeTaskTimeConstraintAsync function (lines 1670-1686) always populates both fields, this assumption makes the code fragile.

  2. Incorrect termination reason (line 1949): When the task times out or reaches its deadline, the code always enqueues TERMINATION_BY_TIMEOUT regardless of which constraint was exceeded. If the deadline was exceeded, it should use TERMINATION_BY_DEADLINE.

  3. Negative deadline handling (lines 1955-1956): If elem.deadline_time.value() is in the past (due to network latency, clock skew, or late processing), deadline_sec becomes negative. Passing this to std::min and then to AddTerminationTimer_ will cause the timer to never fire or behave unexpectedly.

Apply this diff to fix these issues:

 void TaskManager::EvCleanChangeTaskTimeConstraintQueueCb_() {
   absl::Time now = absl::Now();
 
   ChangeTaskTimeConstraintQueueElem elem;
   while (m_task_time_constraint_change_queue_.try_dequeue(elem)) {
     // Delete the old timer.
     DelTerminationTimer_();
     CRANE_TRACE("Delete the old timer.");
 
     absl::Time start_time =
         absl::FromUnixSeconds(m_step_.GetStep().start_time().seconds());
-    absl::Duration const& new_time_limit = elem.time_limit.value();
+    
+    if (!elem.time_limit.has_value() || !elem.deadline_time.has_value()) {
+      CRANE_ERROR("time_limit or deadline_time not set in constraint change");
+      elem.ok_prom.set_value(CraneErrCode::ERR_INVALID_PARAM);
+      continue;
+    }
+    
+    absl::Duration const& new_time_limit = elem.time_limit.value();
+    int64_t deadline_time = elem.deadline_time.value();
 
-    if (now - start_time >= new_time_limit ||
-        absl::ToUnixSeconds(now) >= elem.deadline_time.value()) {
+    bool time_limit_exceeded = now - start_time >= new_time_limit;
+    bool deadline_exceeded = absl::ToUnixSeconds(now) >= deadline_time;
+    
+    if (time_limit_exceeded || deadline_exceeded) {
       // If the task times out, terminate it.
+      TerminatedBy reason = deadline_exceeded 
+          ? TerminatedBy::TERMINATION_BY_DEADLINE 
+          : TerminatedBy::TERMINATION_BY_TIMEOUT;
       TaskTerminateQueueElem ev_task_terminate{
-          .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT};
+          .termination_reason = reason};
       m_task_terminate_queue_.enqueue(ev_task_terminate);
       m_terminate_task_async_handle_->send();
 
     } else {
       // If the task haven't timed out, set up a new timer.
-      int64_t deadline_sec =
-          elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
+      int64_t deadline_sec = deadline_time - absl::ToUnixSeconds(absl::Now());
       int64_t new_time_limit_sec =
           ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
-      bool is_deadline = deadline_sec <= new_time_limit_sec ? true : false;
+      
+      if (deadline_sec <= 0) {
+        // Deadline already passed, terminate immediately
+        TaskTerminateQueueElem ev_task_terminate{
+            .termination_reason = TerminatedBy::TERMINATION_BY_DEADLINE};
+        m_task_terminate_queue_.enqueue(ev_task_terminate);
+        m_terminate_task_async_handle_->send();
+        elem.ok_prom.set_value(CraneErrCode::SUCCESS);
+        continue;
+      }
+      
+      bool is_deadline = deadline_sec <= new_time_limit_sec;
       std::string log_timer =
           deadline_sec <= new_time_limit_sec ? "deadline" : "time_limit";
       AddTerminationTimer_(std::min(deadline_sec, new_time_limit_sec),
                            is_deadline);
       CRANE_TRACE("Replace the old timer to {} timer of {} seconds", log_timer,
                   std::min(deadline_sec, new_time_limit_sec));
     }
     elem.ok_prom.set_value(CraneErrCode::SUCCESS);
   }
 }
♻️ Duplicate comments (1)
src/CraneCtld/CtldPublicDefs.cpp (1)

209-212: Reset deadline_time when field is absent to avoid stale carry-over

If the incoming proto omits deadline_time, the existing value persists on reused TaskInCtld objects. Reset to InfiniteFuture to represent “no deadline”.

-  if (val.has_deadline_time()) {
-    deadline_time = absl::FromUnixSeconds(val.deadline_time().seconds());
-  }
+  if (val.has_deadline_time()) {
+    deadline_time = absl::FromUnixSeconds(val.deadline_time().seconds());
+  } else {
+    deadline_time = absl::InfiniteFuture();
+  }
🧹 Nitpick comments (12)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

590-611: Preserve nanoseconds when converting Timestamp → absl::Time

Current code ignores nanos; keep full precision by adding nanos.

-    absl::Time deadline_time =
-        absl::FromUnixSeconds(request->deadline_time().seconds());
+    const auto& ts = request->deadline_time();
+    absl::Time deadline_time =
+        absl::FromUnixSeconds(ts.seconds()) + absl::Nanoseconds(ts.nanos());
protos/Crane.proto (2)

146-155: Clarify units and semantics for time constraints

Specify in comments that:

  • time_limit_seconds: wall-clock seconds (>=0).
  • deadline_time: Unix epoch seconds (UTC).

This avoids ambiguity across languages and clients. Optionally consider google.protobuf.Timestamp for deadline_time here too for consistency with ModifyTaskRequest.


229-230: Consistent deadline representation across APIs

ModifyTaskRequest uses Timestamp while ChangeJobTimeConstraintRequest uses int64 seconds. It works, but mixed types increase conversion points. Consider unifying on Timestamp for both or documenting the rationale.

Also applies to: 245-246

src/CraneCtld/CtldPublicDefs.cpp (2)

309-310: Avoid serializing InfiniteFuture as a concrete timestamp

Emitting a max-ish Unix timestamp can confuse clients. Only set the field when finite; otherwise leave it unset/cleared.

-  task_info->mutable_deadline_time()->set_seconds(ToUnixSeconds(deadline_time));
+  if (absl::IsFinite(deadline_time)) {
+    task_info->mutable_deadline_time()->set_seconds(
+        ToUnixSeconds(deadline_time));
+  } else {
+    task_info->clear_deadline_time();
+  }

360-362: Likewise: don’t populate TaskToD.deadline_time for InfiniteFuture

Skip setting when no deadline to avoid sending sentinel values.

-  task_to_d.mutable_deadline_time()->set_seconds(
-      ToUnixSeconds(this->deadline_time));
+  if (absl::IsFinite(this->deadline_time)) {
+    task_to_d.mutable_deadline_time()->set_seconds(
+        ToUnixSeconds(this->deadline_time));
+  } else {
+    task_to_d.clear_deadline_time();
+  }
src/CraneCtld/DbClient.cpp (2)

1090-1110: Field array comment index is off; also consider centralizing field names

The comment says “35 - 39” but there are 4 fields (indices 35–38). Cosmetic, but please fix. Also, field name lists are duplicated in two functions—consider a single constexpr array to avoid drift.

Apply this diff to fix the comment:

-  // 35 - 39
+  // 35 - 38
   "mem_alloc", "device_map", "container", "deadline_time"

1179-1199: Duplicate field name list and off-by-one comment (Ctld variant)

Same nits as the EmbeddedDb variant: fix the “35 - 39” comment and consider de-duplicating the field list.

Diff:

-  // 35 - 39
+  // 35 - 38
   "mem_alloc", "device_map", "container", "deadline_time"
src/Craned/Core/SupervisorKeeper.cpp (1)

84-103: Set a client-side RPC deadline (optional)

To avoid hangs under network partitions, set a short deadline on ClientContext (e.g., 5–10s).

Example:

ClientContext context;
context.set_deadline(absl::ToChronoTime(absl::Now() + absl::Seconds(10)));
protos/Supervisor.proto (1)

104-107: Document field units and semantics

Please specify units to prevent ambiguity across components.

Diff:

-message ChangeTaskTimeConstraintRequest{
-  optional int64 time_limit_seconds = 1;
-  optional int64 deadline_time = 2;
+message ChangeTaskTimeConstraintRequest{
+  // Time limit duration in seconds.
+  optional int64 time_limit_seconds = 1;
+  // Absolute deadline, Unix epoch seconds (UTC). If unset, no deadline.
+  optional int64 deadline_time = 2;
 }
src/CraneCtld/TaskScheduler.h (1)

488-491: API change looks good; add brief unit docs

Clarify units for secs (seconds) and deadline_time (absl::Time absolute).

src/CraneCtld/RpcService/CranedKeeper.cpp (1)

266-269: Unify log message to avoid “Task/Job” mismatch

Minor nit: the error log still says ChangeTaskTimeConstraintAsync while this client method is ChangeJobTimeConstraint. Suggest standardizing to “Job”.

-    CRANE_ERROR("ChangeTaskTimeConstraintAsync to Craned {} failed: {} ",
+    CRANE_ERROR("ChangeJobTimeConstraint to Craned {} failed: {} ",
                 m_craned_id_, status.error_message());
src/CraneCtld/TaskScheduler.cpp (1)

572-591: Guard against “no deadline” default when pre-scanning pending tasks

Ensure tasks without a deadline aren’t cancelled. If TaskInCtld::deadline_time isn’t guaranteed to be absl::InfiniteFuture(), add an explicit guard.

-      for (const auto& [task_id, task] : m_pending_task_map_) {
-        if (task->deadline_time < absl_begin) {
+      for (const auto& [task_id, task] : m_pending_task_map_) {
+        if (task->deadline_time != absl::InfiniteFuture() &&
+            task->deadline_time < absl_begin) {
           CRANE_DEBUG("Task #{} was added to cancel_tasks", task_id);
           cancel_tasks.push_back(task_id);
         }
       }

Also consider <= to cancel exactly at the deadline boundary.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b311907 and 1088ad2.

📒 Files selected for processing (20)
  • protos/Crane.proto (4 hunks)
  • protos/PublicDefs.proto (4 hunks)
  • protos/Supervisor.proto (2 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (3 hunks)
  • src/CraneCtld/DbClient.cpp (10 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (1 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.h (1 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (2 hunks)
  • src/CraneCtld/TaskScheduler.cpp (8 hunks)
  • src/CraneCtld/TaskScheduler.h (3 hunks)
  • src/Craned/Core/CranedServer.cpp (1 hunks)
  • src/Craned/Core/CranedServer.h (1 hunks)
  • src/Craned/Core/JobManager.cpp (3 hunks)
  • src/Craned/Core/JobManager.h (4 hunks)
  • src/Craned/Core/SupervisorKeeper.cpp (1 hunks)
  • src/Craned/Core/SupervisorKeeper.h (1 hunks)
  • src/Craned/Supervisor/SupervisorServer.cpp (1 hunks)
  • src/Craned/Supervisor/SupervisorServer.h (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (8 hunks)
  • src/Craned/Supervisor/TaskManager.h (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • protos/PublicDefs.proto
  • src/CraneCtld/RpcService/CranedKeeper.h
🧰 Additional context used
🧬 Code graph analysis (13)
src/Craned/Core/CranedServer.h (2)
src/CraneCtld/RpcService/CranedKeeper.cpp (2)
  • ChangeJobTimeConstraint (241-277)
  • ChangeJobTimeConstraint (241-243)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (309-335)
  • ChangeJobTimeConstraint (309-312)
src/Craned/Core/CranedServer.cpp (1)
src/CraneCtld/RpcService/CranedKeeper.cpp (2)
  • ChangeJobTimeConstraint (241-277)
  • ChangeJobTimeConstraint (241-243)
src/Craned/Core/SupervisorKeeper.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1021-1136)
  • ChangeTaskTimeConstraint (1021-1023)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-86)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/SupervisorServer.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1021-1136)
  • ChangeTaskTimeConstraint (1021-1023)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-86)
  • ChangeTaskTimeConstraint (63-66)
src/CraneCtld/RpcService/CranedKeeper.cpp (1)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (309-335)
  • ChangeJobTimeConstraint (309-312)
src/CraneCtld/TaskScheduler.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1021-1136)
  • ChangeTaskTimeConstraint (1021-1023)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-86)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Core/JobManager.h (2)
src/Craned/Core/JobManager.cpp (4)
  • ChangeJobTimeConstraintAsync (995-1009)
  • ChangeJobTimeConstraintAsync (995-997)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1052)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1024)
src/Craned/Supervisor/TaskManager.cpp (2)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1969)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1932)
src/Craned/Supervisor/TaskManager.cpp (3)
src/Craned/Core/JobManager.cpp (2)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1052)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1024)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (336-342)
  • AddTerminationTimer_ (325-334)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (10)
  • EvTaskTimerCb_ (1769-1791)
  • EvTaskTimerCb_ (1769-1769)
  • ChangeTaskTimeConstraintAsync (1662-1691)
  • ChangeTaskTimeConstraintAsync (1662-1664)
  • EvCleanTaskStopQueueCb_ (1793-1887)
  • EvCleanTaskStopQueueCb_ (1793-1793)
  • EvCleanTerminateTaskQueueCb_ (1889-1930)
  • EvCleanTerminateTaskQueueCb_ (1889-1889)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1969)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1932)
src/Craned/Core/JobManager.cpp (1)
src/Craned/Supervisor/TaskManager.cpp (2)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1969)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1932)
src/Craned/Core/SupervisorKeeper.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1021-1136)
  • ChangeTaskTimeConstraint (1021-1023)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-86)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/SupervisorServer.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1021-1136)
  • ChangeTaskTimeConstraint (1021-1023)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-86)
  • ChangeTaskTimeConstraint (63-66)
src/CraneCtld/CtldPublicDefs.h (1)
  • CheckIfTimeLimitSecIsValid (705-707)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format
🔇 Additional comments (23)
protos/Crane.proto (1)

1014-1015: RPC rename and payloads look consistent

ChangeJobTimeConstraint RPC and messages align with server/client changes.

Please ensure all dependents regenerated protos (C++, Go, Python) after adding proto3 optional fields.

src/Craned/Core/CranedServer.cpp (1)

319-335: LGTM: correctly handles optional fields

Optionals are parsed and forwarded to JobManager. No issues spotted.

src/Craned/Supervisor/SupervisorServer.h (1)

53-58: Signature update aligns with SupervisorServer.cpp

Method rename and types match the implementation and proto.

src/CraneCtld/DbClient.cpp (2)

1120-1156: Persisting zero deadline_time: confirm semantics

Zero seconds will be stored when deadline is unset. Ensure downstream treats 0 as “unset” and does not schedule/timer off epoch.

Would you prefer skipping the field entirely when unset? That would require deviating from the tuple builder; if desired, I can propose a small helper to omit optional fields.


1209-1237: Ctld variant: zero deadline_time semantics

Same as the EmbeddedDb path: confirm that 0 means “unset” across timers/scheduler and reporting. If not, consider omitting the field when unset.

src/CraneCtld/TaskScheduler.h (2)

540-543: Ensure cancel path uses finish_status

Confirm CancelTaskAsyncCb_ and downstream state transitions honor finish_status (e.g., Cancelled vs Deadline).

If not yet wired, I can sketch the minimal changes.


679-680: finish_status addition acknowledged

Carrying finish_status in the queue is a solid change for clarity of final state.

protos/Supervisor.proto (1)

131-131: Verify RPC rename and generated code
No instances of ChangeTaskTimeLimit or the new ChangeTaskTimeConstraint were found in generated code; manually confirm all call sites were updated and ensure your proto3 optional toolchain supports has_* accessors.

src/Craned/Core/CranedServer.h (1)

99-103: RPC rename and signature update look correct

Header matches the implementation in CranedServer.cpp and new proto types.

src/CraneCtld/TaskScheduler.cpp (1)

1124-1131: Good: forwards both constraints to craned with clear error path

Forwarding both optionals and logging specific failure is correct.

src/Craned/Core/JobManager.h (3)

106-108: LGTM! Function signature extended for deadline support.

The function signature correctly extends the API to support both time limit and deadline constraints using optional parameters.


145-150: LGTM! Data structure updated for constraint-based semantics.

The struct is correctly renamed and updated with optional fields to support flexible time constraint modifications.


219-219: LGTM! Consistent renaming of internal members.

All internal queue, handle, and callback names are consistently updated to reflect the constraint-based semantics.

Also applies to: 247-250

src/Craned/Supervisor/TaskManager.h (6)

34-35: LGTM! New termination reason added.

The TERMINATION_BY_DEADLINE enum value correctly extends the termination reasons to support deadline-driven task termination.


89-90: LGTM! Accessor method for mutable step data.

The GetMutableStep() method appropriately provides mutable access to the step specification for updating time constraints.


313-323: LGTM! Timer setup extended for deadline awareness.

The template method correctly passes the is_deadline flag through the timer callback, enabling proper deadline vs. timeout handling.


325-334: LGTM! Overload consistently extended.

The int64_t overload is consistently updated with the is_deadline parameter, maintaining API consistency.


357-359: LGTM! Public API and internal structures updated consistently.

The function signatures, queue element structure, and callback declarations are all consistently updated to support time constraint modifications with optional time limit and deadline.

Also applies to: 379-383, 388-388, 392-392


410-413: LGTM! Internal member variables renamed consistently.

The async handle and queue member variables are appropriately renamed to reflect the broader constraint-based semantics.

src/Craned/Supervisor/TaskManager.cpp (4)

1523-1528: LGTM! Async handle setup updated.

The async handle is correctly renamed and registered with the updated callback function.


1662-1691: Function implementation is correct, but verify deadline handling in downstream code.

The function correctly handles optional parameters and ensures both time_limit and deadline_time are populated in the queue element. However, when deadline_time is not provided, it falls back to m_step_.GetStep().deadline_time().seconds(). If the step was initialized without a deadline, this may return 0 or an unset value, which could cause issues in downstream deadline calculations (see concerns in other review comments).


1769-1791: LGTM! Timer callback correctly handles deadline vs. timeout.

The callback properly distinguishes between deadline expiration and timeout, setting appropriate termination reasons and status codes for both batch and non-batch tasks.


1841-1846: LGTM! Deadline termination handled in task stop callback.

The code correctly handles TERMINATION_BY_DEADLINE in both the signal and non-signal termination paths, setting the appropriate status and exit code.

Also applies to: 1861-1866

@github-actions github-actions bot removed the test-passed Build and test success label Oct 14, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/CraneCtld/DbClient.cpp (1)

382-383: Guard reading deadline_time to avoid exceptions on legacy docs

Old records may not have deadline_time; direct get_int64() risks exceptions and aborts the loop. Guard presence and type.

Apply this diff:

-      task->mutable_deadline_time()->set_seconds(
-          view["deadline_time"].get_int64().value);
+      if (auto e = view["deadline_time"]) {
+        int64_t secs = 0;
+        if (e.type() == bsoncxx::type::k_int64) secs = e.get_int64().value;
+        else if (e.type() == bsoncxx::type::k_int32) secs = e.get_int32().value;
+        if (secs > 0) task->mutable_deadline_time()->set_seconds(secs);
+      }
🧹 Nitpick comments (1)
src/Craned/Supervisor/TaskManager.h (1)

379-383: Verify optional field handling in dequeue logic.

The struct correctly uses std::optional for both constraints. However, in EvCleanChangeTaskTimeConstraintQueueCb_ (TaskManager.cpp:1945), the code calls elem.deadline_time.value() without checking, relying on ChangeTaskTimeConstraintAsync always populating optionals from existing step values. While this works currently, it creates a fragile dependency between the enqueue and dequeue logic.

Consider adding defensive checks or assertions in the dequeue handler:

void TaskManager::EvCleanChangeTaskTimeConstraintQueueCb_() {
  ChangeTaskTimeConstraintQueueElem elem;
  while (m_task_time_constraint_change_queue_.try_dequeue(elem)) {
    // Add defensive check
    CRANE_ASSERT(elem.time_limit.has_value() && elem.deadline_time.has_value(),
                 "Both time_limit and deadline_time must be set in queue elem");
    // ... rest of implementation
  }
}

Or explicitly document the invariant that both fields are always populated when enqueued.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1088ad2 and e4f0240.

📒 Files selected for processing (5)
  • protos/Crane.proto (4 hunks)
  • protos/Supervisor.proto (2 hunks)
  • src/CraneCtld/DbClient.cpp (10 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (2 hunks)
  • src/Craned/Supervisor/TaskManager.h (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Craned/Supervisor/TaskManager.h (2)
src/Craned/Supervisor/TaskManager.cpp (10)
  • EvTaskTimerCb_ (1769-1791)
  • EvTaskTimerCb_ (1769-1769)
  • ChangeTaskTimeConstraintAsync (1662-1691)
  • ChangeTaskTimeConstraintAsync (1662-1664)
  • EvCleanTaskStopQueueCb_ (1793-1887)
  • EvCleanTaskStopQueueCb_ (1793-1793)
  • EvCleanTerminateTaskQueueCb_ (1889-1930)
  • EvCleanTerminateTaskQueueCb_ (1889-1889)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1969)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1932-1932)
src/Craned/Core/JobManager.cpp (4)
  • EvCleanTerminateTaskQueueCb_ (919-981)
  • EvCleanTerminateTaskQueueCb_ (919-919)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1052)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1024-1024)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (15)
src/CraneCtld/DbClient.cpp (3)

314-315: LGTM!

Comment accurately reflects the addition of the deadline_time field.


1094-1156: LGTM!

Field arrays and value tuples correctly extended to include deadline_time. The implementation properly serializes deadline_time to MongoDB documents for TaskInEmbeddedDb.


1183-1238: LGTM!

Field arrays and value tuples correctly extended to include deadline_time. The implementation properly serializes deadline_time to MongoDB documents for TaskInCtld.

src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)

532-534: LGTM!

The type mismatch issue has been resolved. The code now correctly constructs std::optional<int64_t> from request->time_limit_seconds().


589-610: LGTM!

The new Deadline handling branch is well-implemented. The logic correctly:

  • Converts proto Timestamp to absl::Time
  • Calls ChangeTaskTimeConstraint with appropriate optional parameters
  • Handles all error cases consistently with existing TimeLimit handling
protos/Supervisor.proto (1)

104-111: LGTM!

Proto changes correctly rename the RPC and extend the request to support both time_limit_seconds and deadline_time as optional fields. This allows flexible constraint updates.

Also applies to: 131-131

protos/Crane.proto (3)

146-154: LGTM!

Message definitions are correctly structured with appropriate field numbering and optional qualifiers for flexible constraint updates.


229-229: LGTM!

The Deadline enum value and deadline_time field are correctly added without numbering conflicts.

Also applies to: 245-245


1014-1014: LGTM!

RPC signature correctly updated to use the new ChangeJobTimeConstraint message types.

src/Craned/Supervisor/TaskManager.h (6)

34-35: LGTM! Enum extension is consistent.

The addition of TERMINATION_BY_DEADLINE follows the existing pattern and integrates well with the termination handling logic shown in the implementation.


313-334: LGTM! Timer overloads properly handle deadline distinction.

The is_deadline parameter successfully addresses the past review comment about distinguishing deadline from time_limit timers. The lambda capture by value is appropriate for the boolean type, and the unified callback approach is cleaner than maintaining separate timer handlers.

Based on past review comments.


357-359: LGTM! API rename and optional parameters improve flexibility.

The rename from ChangeTaskTimeLimitAsync to ChangeTaskTimeConstraintAsync more accurately reflects the method's purpose. The optional parameters allow callers to update either time_limit, deadline_time, or both independently, which aligns with the PR objective of keeping deadline as an independent field.


388-388: LGTM! Timer callback signature consistent with design.

The is_deadline parameter addition properly distinguishes the termination reason, enabling correct status and exit code routing in the implementation (TaskManager.cpp:1768-1790).


392-392: LGTM! Method rename maintains consistency.

The rename from EvCleanChangeTaskTimeLimitQueueCb_ to EvCleanChangeTaskTimeConstraintQueueCb_ maintains naming consistency with the struct and async method renames.


410-413: LGTM! Member variable renames complete the refactoring.

The handle and queue renames align with the constraint-based semantics and maintain consistency across the codebase.

@github-actions github-actions bot added the test-passed Build and test success label Oct 15, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Craned/Supervisor/TaskManager.cpp (1)

1934-1964: Guard deadline/time-limit timers against missing or past deadlines.

The consumer side still assumes elem.deadline_time always carries a value and immediately dereferences it. Combined with the previous coercion, any job without a real deadline is terminated. Even after fixing the enqueue path, we still underflow timers when the deadline is in the past (negative seconds wrap into huge unsigned delays) and we continue to report deadline expirations as generic timeouts. Please treat deadline_time as optional, short‑circuit if the deadline is already exceeded, clamp negative durations to zero, and only mark a timer as deadline-driven when a real deadline exists.

-    if (now - start_time >= new_time_limit ||
-        absl::ToUnixSeconds(now) >= elem.deadline_time.value()) {
+    const bool has_deadline = elem.deadline_time.has_value();
+    if (now - start_time >= new_time_limit ||
+        (has_deadline &&
+         absl::ToUnixSeconds(now) >= elem.deadline_time.value())) {
       // If the task times out, terminate it.
-      TaskTerminateQueueElem ev_task_terminate{
-          .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT};
+      TaskTerminateQueueElem ev_task_terminate{
+          .termination_reason = has_deadline
+                                    ? TerminatedBy::TERMINATION_BY_DEADLINE
+                                    : TerminatedBy::TERMINATION_BY_TIMEOUT};
       m_task_terminate_queue_.enqueue(ev_task_terminate);
       m_terminate_task_async_handle_->send();

     } else {
-      // If the task haven't timed out, set up a new timer.
-      int64_t deadline_sec =
-          elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
-      int64_t new_time_limit_sec =
-          ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
-      bool is_deadline = deadline_sec <= new_time_limit_sec ? true : false;
-      std::string log_timer =
-          deadline_sec <= new_time_limit_sec ? "deadline" : "time_limit";
-      AddTerminationTimer_(std::min(deadline_sec, new_time_limit_sec),
-                           is_deadline);
-      CRANE_TRACE("Replace the old timer to {} timer of {} seconds", log_timer,
-                  std::min(deadline_sec, new_time_limit_sec));
+      // If the task hasn't timed out, set up a new timer.
+      int64_t new_time_limit_sec =
+          ToInt64Seconds(new_time_limit - (absl::Now() - start_time));
+      new_time_limit_sec = std::max<int64_t>(new_time_limit_sec, 0);
+
+      int64_t deadline_sec = std::numeric_limits<int64_t>::max();
+      if (has_deadline) {
+        deadline_sec =
+            elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
+        if (deadline_sec <= 0) {
+          TaskTerminateQueueElem ev_task_terminate{
+              .termination_reason = TerminatedBy::TERMINATION_BY_DEADLINE};
+          m_task_terminate_queue_.enqueue(ev_task_terminate);
+          m_terminate_task_async_handle_->send();
+          elem.ok_prom.set_value(CraneErrCode::SUCCESS);
+          continue;
+        }
+      }
+
+      bool is_deadline = has_deadline && deadline_sec <= new_time_limit_sec;
+      int64_t timer_sec = is_deadline ? deadline_sec : new_time_limit_sec;
+      AddTerminationTimer_(timer_sec, is_deadline);
+      CRANE_TRACE("Replace the old timer to {} timer of {} seconds",
+                  is_deadline ? "deadline" : "time_limit", timer_sec);
     }
♻️ Duplicate comments (2)
src/Craned/Core/SupervisorKeeper.h (1)

37-39: Add missing #include <optional> for std::optional.

This declaration uses std::optional but the header doesn't include <optional>. While it may work due to transitive includes, this is fragile and non-portable.

Apply this diff:

 #include "CranedPublicDefs.h"
 // Precompiled header comes first.
 
 #include <protos/Supervisor.grpc.pb.h>
+#include <optional>
 
 #include "JobManager.h"

Additionally, consider adding brief parameter documentation:

// time_limit_seconds: optional time limit in seconds
// deadline_time: optional deadline as Unix epoch seconds
CraneErrCode ChangeTaskTimeConstraint(
    std::optional<int64_t> time_limit_seconds,
    std::optional<int64_t> deadline_time);
src/Craned/Supervisor/TaskManager.cpp (1)

1983-1993: Clamp negative timer intervals before arming uv timers.

When the supervisor starts a task with a deadline already in the past (or no deadline at all), deadline_sec becomes negative, and AddTerminationTimer_(std::min(deadline_sec, time_limit_sec), …) feeds a negative duration into libuv. That underflows into a huge unsigned timeout, so the deadline callback never runs. Please short‑circuit expired deadlines and only arm the timer with non‑negative intervals, falling back to the time limit when no deadline is set.

-    int64_t time_limit_sec = m_step_.GetStep().time_limit().seconds();
-    int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
-                           absl::ToUnixSeconds(absl::Now());
-
-    std::string log_timer =
-        deadline_sec <= time_limit_sec ? "deadline" : "time_limit";
-    bool is_deadline = deadline_sec <= time_limit_sec ? true : false;
-
-    AddTerminationTimer_(std::min(deadline_sec, time_limit_sec), is_deadline);
-    CRANE_TRACE("Add a {} timer of {} seconds", log_timer,
-                std::min(deadline_sec, time_limit_sec));
+    int64_t time_limit_sec =
+        std::max<int64_t>(m_step_.GetStep().time_limit().seconds(), 0);
+
+    if (m_step_.GetStep().has_deadline_time()) {
+      int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
+                             absl::ToUnixSeconds(absl::Now());
+      if (deadline_sec <= 0) {
+        CRANE_TRACE("Task deadline already reached; terminating immediately.");
+        EvTaskTimerCb_(true);
+      } else {
+        bool is_deadline = deadline_sec <= time_limit_sec;
+        int64_t timer_sec = is_deadline ? deadline_sec : time_limit_sec;
+        AddTerminationTimer_(timer_sec, is_deadline);
+        CRANE_TRACE("Add a {} timer of {} seconds",
+                    is_deadline ? "deadline" : "time_limit", timer_sec);
+      }
+    } else {
+      AddTerminationTimer_(time_limit_sec, false);
+      CRANE_TRACE("Add a time_limit timer of {} seconds", time_limit_sec);
+    }
🧹 Nitpick comments (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

592-615: Consider extracting common error handling logic.

The Deadline branch duplicates error handling logic from the TimeLimit branch (lines 538-553). While functionally correct, this creates maintenance burden.

Consider extracting the common response logic into a helper lambda:

auto handle_result = [&](CraneErrCode err, task_id_t task_id, const char* constraint_name) {
  if (err == CraneErrCode::SUCCESS) {
    response->add_modified_tasks(task_id);
  } else if (err == CraneErrCode::ERR_NON_EXISTENT) {
    response->add_not_modified_tasks(task_id);
    response->add_not_modified_reasons(fmt::format(
        "Task #{} was not found in running or pending queue.", task_id));
  } else if (err == CraneErrCode::ERR_INVALID_PARAM) {
    response->add_not_modified_tasks(task_id);
    response->add_not_modified_reasons(fmt::format("Invalid {} value.", constraint_name));
  } else {
    response->add_not_modified_tasks(task_id);
    response->add_not_modified_reasons(
        fmt::format("Failed to change the {} of Task#{}: {}.",
                    constraint_name, task_id, CraneErrStr(err)));
  }
};

Then use it in both branches to reduce duplication.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4f0240 and 7b96edd.

📒 Files selected for processing (13)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (1 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.h (1 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (2 hunks)
  • src/CraneCtld/TaskScheduler.cpp (8 hunks)
  • src/CraneCtld/TaskScheduler.h (3 hunks)
  • src/Craned/Core/CranedServer.cpp (1 hunks)
  • src/Craned/Core/JobManager.cpp (3 hunks)
  • src/Craned/Core/JobManager.h (4 hunks)
  • src/Craned/Core/SupervisorKeeper.cpp (1 hunks)
  • src/Craned/Core/SupervisorKeeper.h (1 hunks)
  • src/Craned/Supervisor/SupervisorServer.cpp (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (8 hunks)
  • src/Craned/Supervisor/TaskManager.h (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Craned/Core/CranedServer.cpp
  • src/CraneCtld/RpcService/CranedKeeper.h
🧰 Additional context used
🧬 Code graph analysis (9)
src/CraneCtld/TaskScheduler.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1023-1150)
  • ChangeTaskTimeConstraint (1023-1025)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/SupervisorServer.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1023-1150)
  • ChangeTaskTimeConstraint (1023-1025)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/CraneCtld/CtldPublicDefs.h (1)
  • CheckIfTimeLimitSecIsValid (705-707)
src/CraneCtld/RpcService/CranedKeeper.cpp (1)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (309-334)
  • ChangeJobTimeConstraint (309-312)
src/Craned/Core/SupervisorKeeper.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1023-1150)
  • ChangeTaskTimeConstraint (1023-1025)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Core/JobManager.h (1)
src/Craned/Core/JobManager.cpp (4)
  • ChangeJobTimeConstraintAsync (995-1010)
  • ChangeJobTimeConstraintAsync (995-997)
  • EvCleanChangeJobTimeConstraintQueueCb_ (1025-1053)
  • EvCleanChangeJobTimeConstraintQueueCb_ (1025-1025)
src/Craned/Core/SupervisorKeeper.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1023-1150)
  • ChangeTaskTimeConstraint (1023-1025)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (10)
  • EvTaskTimerCb_ (1768-1790)
  • EvTaskTimerCb_ (1768-1768)
  • ChangeTaskTimeConstraintAsync (1662-1690)
  • ChangeTaskTimeConstraintAsync (1662-1663)
  • EvCleanTaskStopQueueCb_ (1792-1886)
  • EvCleanTaskStopQueueCb_ (1792-1792)
  • EvCleanTerminateTaskQueueCb_ (1888-1929)
  • EvCleanTerminateTaskQueueCb_ (1888-1888)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1931-1968)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1931-1931)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (336-342)
  • AddTerminationTimer_ (325-334)
🔇 Additional comments (9)
src/CraneCtld/TaskScheduler.h (2)

488-490: LGTM!

The method signature correctly accepts optional parameters for both time_limit_seconds and deadline_time, allowing callers to modify either constraint independently.


540-542: LGTM!

Adding finish_status to CancelPendingTaskQueueElem properly distinguishes between cancellation and deadline expiry, enabling correct task status reporting.

Also applies to: 677-679

src/Craned/Core/SupervisorKeeper.cpp (1)

84-103: LGTM!

The implementation correctly handles optional parameters by conditionally setting request fields only when values are present. Error handling and RPC semantics are preserved.

src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

531-537: LGTM!

The TimeLimit handling correctly constructs the optional and calls ChangeTaskTimeConstraint with the appropriate nullopt for deadline_time.

src/Craned/Supervisor/SupervisorServer.cpp (1)

63-84: LGTM!

The RPC handler correctly extracts optional time_limit_seconds and deadline_time from the request and passes them to the async task manager method. Response handling is appropriate.

src/CraneCtld/RpcService/CranedKeeper.cpp (1)

241-277: LGTM!

The method correctly handles optional parameters by conditionally setting request fields only when values are present. Error messages have been updated to reflect the new method name, and the RPC semantics are preserved.

src/Craned/Core/JobManager.cpp (3)

99-104: LGTM!

The async handle has been appropriately renamed from m_change_task_time_limit_async_handle_ to m_change_job_time_constraint_async_handle_ to reflect the updated terminology.


995-1010: LGTM!

The method signature correctly accepts optional parameters for both time_limit_seconds and deadline_time, enabling independent modification of either constraint. The queue element structure has been updated accordingly.


1025-1053: LGTM!

The callback correctly processes the updated queue element structure, passes optional parameters to the supervisor stub, and provides appropriate error messages. The terminology has been updated to consistently use "job" rather than "task".

@github-actions github-actions bot added test-passed Build and test success and removed test-passed Build and test success labels Oct 15, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Craned/Supervisor/TaskManager.cpp (1)

1934-1971: Unsafe access of optional values without presence checks.

Multiple locations access .value() on std::optional<int64_t> fields without verifying they contain a value:

  • Line 1945: elem.time_limit.value()
  • Line 1948: elem.deadline_time.value()
  • Line 1958: elem.deadline_time.value()

If either field is std::nullopt, this throws std::bad_optional_access. The logic must check has_value() before dereferencing, and handle the case where no deadline or time limit is set.

Apply this diff to safely handle optional values:

     DelTerminationTimer_();

     absl::Time start_time =
         absl::FromUnixSeconds(m_step_.GetStep().start_time().seconds());
-    absl::Duration const& new_time_limit =
-        absl::Seconds(elem.time_limit.value());
-
-    if (now - start_time >= new_time_limit ||
-        absl::ToUnixSeconds(now) >= elem.deadline_time.value()) {
+    
+    if (!elem.time_limit.has_value()) {
+      CRANE_ERROR("time_limit is not set in ChangeTaskTimeConstraint request");
+      elem.ok_prom.set_value(CraneErrCode::ERR_INVALID_PARAM);
+      continue;
+    }
+    
+    absl::Duration const& new_time_limit =
+        absl::Seconds(elem.time_limit.value());
+
+    bool timed_out = (now - start_time >= new_time_limit);
+    if (elem.deadline_time.has_value()) {
+      int64_t deadline_sec = 
+          elem.deadline_time.value() - absl::ToUnixSeconds(now);
+      timed_out |= (deadline_sec <= 0);
+    }
+    
+    if (timed_out) {
       // If the task times out, terminate it.
       TaskTerminateQueueElem ev_task_terminate{
           .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT};
       m_task_terminate_queue_.enqueue(ev_task_terminate);
       m_terminate_task_async_handle_->send();

     } else {
       // If the task haven't timed out, set up a new timer.
-      int64_t deadline_sec =
-          elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
       int64_t new_time_limit_sec =
           ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
-      bool is_deadline = deadline_sec <= new_time_limit_sec ? true : false;
-      std::string log_timer =
-          deadline_sec <= new_time_limit_sec ? "deadline" : "time_limit";
-      AddTerminationTimer_(std::min(deadline_sec, new_time_limit_sec),
-                           is_deadline);
-      CRANE_TRACE("Replace the old timer to {} timer of {} seconds", log_timer,
-                  std::min(deadline_sec, new_time_limit_sec));
+      
+      int64_t timer_sec = new_time_limit_sec;
+      bool is_deadline = false;
+      
+      if (elem.deadline_time.has_value()) {
+        int64_t deadline_sec = 
+            elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
+        if (deadline_sec > 0 && deadline_sec < new_time_limit_sec) {
+          timer_sec = deadline_sec;
+          is_deadline = true;
+        }
+      }
+      
+      AddTerminationTimer_(timer_sec, is_deadline);
+      CRANE_TRACE("Replace the old timer to {} timer of {} seconds", 
+                  is_deadline ? "deadline" : "time_limit", timer_sec);
     }
     elem.ok_prom.set_value(CraneErrCode::SUCCESS);
   }
♻️ Duplicate comments (2)
src/Craned/Supervisor/TaskManager.cpp (2)

1678-1685: Check proto field presence before reading optional values.

When the caller omits deadline_time, line 1684 reads deadline_time().seconds() which defaults to 0 if the proto field is unset. This converts a missing deadline into an epoch timestamp (unix time 0), causing the consumer to believe an expired deadline exists. Similarly, line 1675 may read an unset time_limit as 0.

Apply this diff to preserve std::nullopt when fields are unset:

   if (time_limit) {
     elem.time_limit = time_limit;
     m_step_.GetMutableStep().mutable_time_limit()->set_seconds(
         time_limit.value());
-  } else {
+  } else if (m_step_.GetStep().has_time_limit()) {
     elem.time_limit =
         std::optional<int64_t>(m_step_.GetStep().time_limit().seconds());
+  } else {
+    elem.time_limit = std::nullopt;
   }

   if (deadline_time) {
     elem.deadline_time = deadline_time;
     m_step_.GetMutableStep().mutable_deadline_time()->set_seconds(
         deadline_time.value());
-  } else {
+  } else if (m_step_.GetStep().has_deadline_time()) {
     elem.deadline_time =
         std::optional<int64_t>(m_step_.GetStep().deadline_time().seconds());
+  } else {
+    elem.deadline_time = std::nullopt;
+    m_step_.GetMutableStep().clear_deadline_time();
   }

1985-1996: Missing deadline presence check causes incorrect timer logic.

Lines 1987-1988 read deadline_time().seconds() without checking has_deadline_time() first. If no deadline is configured, seconds() returns 0, making deadline_sec deeply negative (current time minus epoch). This causes line 1991 to incorrectly treat the task as having an expired deadline, and line 1994 passes a negative value to AddTerminationTimer_.

Apply this diff to check deadline presence and handle missing/expired deadlines:

     // Add a timer to limit the execution time of a task.
     // Note: event_new and event_add in this function is not thread safe,
     //       so we move it outside the multithreading part.

     int64_t time_limit_sec = m_step_.GetStep().time_limit().seconds();
-    int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
-                           absl::ToUnixSeconds(absl::Now());
-
-    std::string log_timer =
-        deadline_sec <= time_limit_sec ? "deadline" : "time_limit";
-    bool is_deadline = deadline_sec <= time_limit_sec ? true : false;
-
-    AddTerminationTimer_(std::min(deadline_sec, time_limit_sec), is_deadline);
-    CRANE_TRACE("Add a {} timer of {} seconds", log_timer,
-                std::min(deadline_sec, time_limit_sec));
+    
+    if (m_step_.GetStep().has_deadline_time()) {
+      int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
+                             absl::ToUnixSeconds(absl::Now());
+      
+      if (deadline_sec <= 0) {
+        CRANE_TRACE("Task deadline already reached; terminating immediately.");
+        EvTaskTimerCb_(true);
+        elem.ok_prom.set_value(CraneErrCode::SUCCESS);
+        continue;
+      } else if (deadline_sec <= time_limit_sec) {
+        AddTerminationTimer_(deadline_sec, true);
+        CRANE_TRACE("Add a deadline timer of {} seconds", deadline_sec);
+      } else {
+        AddTerminationTimer_(time_limit_sec, false);
+        CRANE_TRACE("Add a time_limit timer of {} seconds", time_limit_sec);
+      }
+    } else {
+      AddTerminationTimer_(time_limit_sec, false);
+      CRANE_TRACE("Add a time_limit timer of {} seconds", time_limit_sec);
+    }
🧹 Nitpick comments (9)
docs/command/cqueue.md (4)

26-26: Default column vs option inconsistency: is Deadline always shown?

You’ve added Deadline to the default “主要输出项”. If Deadline is always in default output, the --deadline option becomes redundant. Please confirm intended behavior and make docs consistent (either remove from default list, or redefine the option).

Possible resolution A (keep as default, note redundancy):

  • Add a note near --deadline: “默认已显示,无需指定;仅用于与 --format 组合控制列顺序。”

Possible resolution B (opt-in like -S/--start):

  • Remove “Deadline” from the default list and keep --deadline as the toggle to include the column.

31-31: Clarify what --deadline does (toggle column vs filter vs range).

“显示作业的截止时间” is vague. If it toggles an extra column (like -S/--start), say so and note interaction with -o/--format. If it filters by a time or range, document accepted syntax and examples.

Apply one of these doc tweaks:

- - **--deadline** **string**:显示作业的截止时间
+ - **--deadline**:在输出中增加“Deadline”列(等同于在 --format 中添加 %Deadline,可与 -o 组合控制位置)。默认结果中已包含该列则无需指定。

or, if it’s a filter:

- - **--deadline** **string**:显示作业的截止时间
+ - **--deadline** **string**:按截止时间筛选,支持:
+   - 精确时间:2025-10-20T12:00:00
+   - 时间范围:start~end(ISO 8601),如 2025-10-01T00:00:00~2025-10-31T23:59:59
+   - 仅上/下界:~end 或 start~

36-36: Specify format/timezone and empty-display policy for %deadline/%Deadline.

Document the output format (e.g., ISO 8601, local timezone vs UTC), and what prints when a job has no deadline (e.g., “-” or empty).

Example addition:

  • 输出格式:YYYY-MM-DDTHH:MM:SS(本地时区,24 小时制)
  • 无截止时间:显示为 “-”

28-28: Fix heading level to satisfy markdownlint (MD001).

Heading jumps from H1 to H4. Use H2 and avoid emphasis for headings.

-#### **主要参数**
+## 主要参数

Based on static analysis hints.

docs/command/cbatch.md (5)

41-41: Document accepted time formats/timezone for --begin.

Specify input formats (e.g., ISO 8601 “YYYY-MM-DDTHH:MM:SS”), timezone (local vs UTC), and behavior if the time is in the past.

Recommended text:

  • -b/--begin string:指定作业的最早开始时间,格式支持:
    • 绝对时间:2025-10-20T12:00:00(本地时区,24 小时制)
    • 相对时间(若支持):now+2h、now+30m
    • 过去时间:若早于当前时间,则视为尽快开始

45-45: Clarify semantics and precedence of --deadline vs --time.

Document input format/timezone, and how it interacts with --time. Typical expectation: job is terminated at the earlier of (start_time + time_limit) and deadline. Also state whether deadline can be modified after submission.

Proposed wording:

  • --deadline string:指定作业的截止时间(绝对时间,ISO 8601,如 2025-10-20T12:00:00,本地时区)。当同时设置 --time 时,系统在以下时间点中较早者终止作业:开始时间+时间限制、截止时间。提交后是否允许修改截止时间:请在文档中明确(建议:不可修改)。

48-52: Add concise examples for new flags (--export, --extra-attr, --get-user-env, --gres).

Examples reduce ambiguity on syntax and quoting.

Example block to append under “例”:

  • --export 示例:--export VAR1=foo,VAR2=bar 或 --export ALL
  • --extra-attr 示例:--extra-attr '{"project":"A","ticket":"T-123"}'
  • --get-user-env 示例:--get-user-env
  • --gres 示例:--gres gpu:a100:1 或 --gres gpu:1

45-45: Mail and state mapping note for deadline terminations.

If deadline-triggered termination maps to TIMEOUT (same as timelimit), mention that it triggers --mail-type=TIMELIMIT and yields一致的作业状态/退出码说明。

Add a short note near --deadline or --mail-type: “达到截止时间的终止将以 TIMEOUT 处理(与 timelimit 一致),会触发 --mail-type=TIMELIMIT 通知。若系统定义了专用退出码,请在文档中列出。”


37-39: Avoid “bold as heading” (MD036).

Change bolded “主要参数:” to a proper heading to satisfy markdownlint and keep style consistent.

-**主要参数:**
+## 主要参数

Based on static analysis hints.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b96edd and 02c1c0e.

📒 Files selected for processing (5)
  • docs/command/cacct.md (1 hunks)
  • docs/command/cbatch.md (1 hunks)
  • docs/command/ccontrol.md (2 hunks)
  • docs/command/cqueue.md (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (9 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/command/cacct.md
  • docs/command/ccontrol.md
🧰 Additional context used
🧬 Code graph analysis (1)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (336-342)
  • AddTerminationTimer_ (325-334)
🪛 markdownlint-cli2 (0.18.1)
docs/command/cbatch.md

45-45: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/command/cqueue.md

28-28: Heading levels should only increment by one level at a time
Expected: h2; Actual: h4

(MD001, heading-increment)

🔇 Additional comments (5)
src/Craned/Supervisor/TaskManager.cpp (5)

1523-1528: LGTM!

The renaming from time_limit to time_constraint accurately reflects the expanded API that now handles both time limits and deadlines.


1768-1790: LGTM!

The timer callback correctly distinguishes deadline expiration from time-limit timeout, mapping each to the appropriate status and exit code for both batch and non-batch tasks.


1840-1845: LGTM!

The deadline termination case is correctly integrated into the status change logic, with appropriate status mapping and exit code for signaled tasks.


1860-1865: LGTM!

Deadline termination is correctly handled for non-signaled tasks, following the same pattern as OOM detection.


1912-1914: LGTM!

The deadline termination reason is correctly propagated to the task instance, consistent with other termination types.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02c1c0e and 27ec3f2.

📒 Files selected for processing (1)
  • src/CraneCtld/TaskScheduler.cpp (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (84-103)
  • ChangeTaskTimeConstraint (84-86)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/CraneCtld/CtldPublicDefs.h (1)
  • CheckIfTimeLimitSecIsValid (705-707)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/Craned/Common/CgroupManager.cpp (1)

569-597: Correct the line numbers and verify the UNDEFINED cgroup version case requires explicit handling.

The review identifies a real issue, but line numbers are incorrect. The problematic code is at lines 1579–1594 and 1610–1625 in DedicatedResourceAllocator::Allocate (both overloads), not 569–597.

The concern is valid: When SetDeviceAccess fails and CgroupManager::GetCgroupVersion() returns CGROUP_V2::UNDEFINED, the function returns true at line 1593 (and 1624), silently ignoring the device access failure. The UNDEFINED state is set when the system runs in hybrid cgroup mode (detected at line 52 during initialization), which is a valid and supported scenario.

Fix: Add explicit handling for the UNDEFINED case. Either:

  • Add an else if (CgroupManager::GetCgroupVersion() == CgConstant::CgroupVersion::UNDEFINED) { CRANE_WARN("..."); return false; } before the final return true;
  • Change the final return true; to return false; as a safe default for unsupported versions

Apply to both DedicatedResourceAllocator::Allocate overloads at lines 1579–1594 and 1610–1625.

src/Utilities/PublicHeader/include/crane/String.h (1)

19-39: Missing standard headers for used types

This header uses std::array, std::tuple, std::pair, std::vector but doesn’t include their headers. Add to prevent ODR/ABI surprises.

Apply:

 #include <charconv>
 #include <expected>
 #include <filesystem>
 #include <fstream>
 #include <list>
+#include <array>
+#include <tuple>
+#include <utility>
+#include <vector>
 #include <ranges>
 #include <string>
♻️ Duplicate comments (6)
src/CraneCtld/TaskScheduler.cpp (1)

1094-1105: Fix reservation end_time calculation to use new constraint values.

When both time_limit_seconds and absl_deadline_time are provided, the current logic uses a mix of old and new values:

  • Lines 1094-1099 use new time_limit_seconds with old task->deadline_time
  • Lines 1101-1105 use old task->time_limit with new absl_deadline_time

Since task fields are only updated later (lines 1117-1127), the reservation end_time bookkeeping becomes inconsistent with the actual constraints.

Compute the effective new values locally first, then use them for the std::min calculations:

+          absl::Duration effective_time_limit =
+              time_limit_seconds ? absl::Seconds(*time_limit_seconds)
+                                 : task->time_limit;
+          absl::Time effective_deadline =
+              absl_deadline_time ? *absl_deadline_time : task->deadline_time;
+
+          reservation_meta->logical_part.rn_task_res_map[task_id].end_time =
+              std::min(task->StartTime() + effective_time_limit,
+                       effective_deadline);
-          if (time_limit_seconds) {
-            reservation_meta->logical_part.rn_task_res_map[task_id].end_time =
-                std::min(task->StartTime() +
-                             absl::Seconds(time_limit_seconds.value()),
-                         task->deadline_time);
-          }
-
-          if (absl_deadline_time) {
-            reservation_meta->logical_part.rn_task_res_map[task_id].end_time =
-                std::min(task->StartTime() + task->time_limit,
-                         absl_deadline_time.value());
-          }

Based on learnings

src/Craned/Supervisor/TaskManager.cpp (4)

1802-1803: Remove dead function EvDeadlineTaskTimerCb_

Deadline handling has been merged into EvTaskTimerCb_(bool). Drop this unused stub.

-void TaskManager::EvDeadlineTaskTimerCb_() {}

1949-1992: Fix broken logic: duplicate else, unsafe .value(), wrong termination reason, negative intervals

  • There’s a stray extra else causing a syntax error.
  • Uses .value() on optional deadline without presence check.
  • Immediate-expiry always enqueues TIMEOUT; should enqueue DEADLINE when the deadline passed.
  • No clamping for negative intervals.

Replace the block with guarded, unified logic:

-    absl::Time start_time =
-        absl::FromUnixSeconds(m_step_.GetStep().start_time().seconds());
-    absl::Duration const& new_time_limit =
-        absl::Seconds(elem.time_limit.value());
-
-    if (now - start_time >= new_time_limit ||
-        absl::ToUnixSeconds(now) >= elem.deadline_time.value()) {
-      // If the task times out, terminate it.
-      TaskTerminateQueueElem ev_task_terminate{
-          .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT};
-      m_task_terminate_queue_.enqueue(ev_task_terminate);
-      m_terminate_task_async_handle_->send();
-
-    } else {
-      // If the task haven't timed out, set up a new timer.
-      int64_t deadline_sec =
-          elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
-      int64_t new_time_limit_sec =
-          ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
-      if (deadline_sec <= new_time_limit_sec) {
-        AddTerminationTimer_(deadline_sec, true);
-      } else {
-        AddTerminationTimer_(new_time_limit_sec, false);
-      }
-    } else {
-      // If the task haven't timed out, set up a new timer.
-      int64_t deadline_sec =
-          elem.deadline_time.value() - absl::ToUnixSeconds(absl::Now());
-      int64_t new_time_limit_sec =
-          ToInt64Seconds((new_time_limit - (absl::Now() - start_time)));
-      bool is_deadline = deadline_sec <= new_time_limit_sec ? true : false;
-      std::string log_timer =
-          deadline_sec <= new_time_limit_sec ? "deadline" : "time_limit";
-      AddTerminationTimer_(std::min(deadline_sec, new_time_limit_sec),
-                           is_deadline);
-      CRANE_TRACE("Replace the old timer to {} timer of {} seconds", log_timer,
-                  std::min(deadline_sec, new_time_limit_sec));
-    }
+    const absl::Time start_time =
+        absl::FromUnixSeconds(m_step_.GetStep().start_time().seconds());
+    const absl::Duration elapsed = now - start_time;
+    const int64_t cfg_time_limit_sec =
+        elem.time_limit.has_value()
+            ? elem.time_limit.value()
+            : m_step_.GetStep().time_limit().seconds();
+    int64_t tl_remaining_sec =
+        ToInt64Seconds(absl::Seconds(cfg_time_limit_sec) - elapsed);
+
+    std::optional<int64_t> dl_remaining_sec = std::nullopt;
+    if (elem.deadline_time.has_value()) {
+      dl_remaining_sec =
+          elem.deadline_time.value() - absl::ToUnixSeconds(now);
+    } else if (m_step_.GetStep().has_deadline_time()) {
+      dl_remaining_sec = m_step_.GetStep().deadline_time().seconds() -
+                         absl::ToUnixSeconds(now);
+    }
+
+    // Immediate expiry
+    if (dl_remaining_sec.has_value() && dl_remaining_sec.value() <= 0) {
+      m_task_terminate_queue_.enqueue(TaskTerminateQueueElem{
+          .termination_reason = TerminatedBy::TERMINATION_BY_DEADLINE});
+      m_terminate_task_async_handle_->send();
+    } else if (tl_remaining_sec <= 0) {
+      m_task_terminate_queue_.enqueue(TaskTerminateQueueElem{
+          .termination_reason = TerminatedBy::TERMINATION_BY_TIMEOUT});
+      m_terminate_task_async_handle_->send();
+    } else {
+      bool is_deadline = dl_remaining_sec.has_value() &&
+                         dl_remaining_sec.value() <= tl_remaining_sec;
+      int64_t timer_sec =
+          is_deadline ? dl_remaining_sec.value() : tl_remaining_sec;
+      timer_sec = std::max<int64_t>(0, timer_sec);
+      AddTerminationTimer_(timer_sec, is_deadline);
+      CRANE_TRACE("Replace the old timer to {} timer of {} seconds",
+                  is_deadline ? "deadline" : "time_limit", timer_sec);
+    }

1671-1694: Don’t coerce missing deadline to 0; preserve nullopt and clear proto

When no deadline is set, the else branch forces deadline_time to epoch (0), causing immediate “expired” behavior. Preserve absence and clear the proto field.

Apply:

   if (deadline_time) {
     elem.deadline_time = deadline_time;
     m_step_.GetMutableStep().mutable_deadline_time()->set_seconds(
         deadline_time.value());
-  } else {
-    elem.deadline_time =
-        std::optional<int64_t>(m_step_.GetStep().deadline_time().seconds());
+  } else if (m_step_.GetStep().has_deadline_time()) {
+    elem.deadline_time = std::optional<int64_t>(
+        m_step_.GetStep().deadline_time().seconds());
+  } else {
+    elem.deadline_time = std::nullopt;
+    m_step_.GetMutableStep().clear_deadline_time();
   }

2010-2033: Timer setup duplicated and broken (undefined ‘sec’, no presence checks, wrong window)

  • Two competing timer blocks; one uses undefined sec and will double-arm timers.
  • Compares deadline against configured time_limit instead of remaining time.
  • No guard for missing/past deadline.

Replace with a single, correct computation:

-    int64_t time_limit_sec = m_step_.GetStep().time_limit().seconds();
-    int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
-                           absl::ToUnixSeconds(absl::Now());
-
-    std::string log_timer =
-        deadline_sec <= time_limit_sec ? "deadline" : "time_limit";
-    bool is_deadline = deadline_sec <= time_limit_sec ? true : false;
-
-    AddTerminationTimer_(std::min(deadline_sec, time_limit_sec), is_deadline);
-    CRANE_TRACE("Add a {} timer of {} seconds", log_timer,
-                std::min(deadline_sec, time_limit_sec));
-
-    if (m_step_.GetStep().has_deadline_time()) {
-      int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
-                             absl::ToUnixSeconds(absl::Now());
-      bool is_deadline = deadline_sec <= sec ? true : false;
-      std::string log_timer = deadline_sec <= sec ? "deadline" : "time_limit";
-      deadline_sec = deadline_sec <= sec ? deadline_sec : sec;
-      AddTerminationTimer_(deadline_sec, is_deadline);
-      CRANE_TRACE("Add a {} timer of {} seconds", log_timer, deadline_sec);
-    } else {
-      AddTerminationTimer_(sec, false);
-      CRANE_TRACE("Add a time_limit timer of {} seconds", sec);
-    }
+    const int64_t now_sec = absl::ToUnixSeconds(absl::Now());
+    const int64_t start_sec = m_step_.GetStep().start_time().seconds();
+    const int64_t elapsed = now_sec - start_sec;
+
+    const int64_t cfg_time_limit_sec =
+        m_step_.GetStep().time_limit().seconds();
+    int64_t tl_remaining_sec = cfg_time_limit_sec - elapsed;
+
+    std::optional<int64_t> dl_remaining_sec = std::nullopt;
+    if (m_step_.GetStep().has_deadline_time()) {
+      dl_remaining_sec =
+          m_step_.GetStep().deadline_time().seconds() - now_sec;
+    }
+
+    if (dl_remaining_sec.has_value() && dl_remaining_sec.value() <= 0) {
+      CRANE_TRACE("Task deadline already reached; firing immediately.");
+      EvTaskTimerCb_(true);
+    } else if (tl_remaining_sec <= 0) {
+      CRANE_TRACE("Task time limit already exceeded; firing immediately.");
+      EvTaskTimerCb_(false);
+    } else {
+      bool is_deadline = dl_remaining_sec.has_value() &&
+                         dl_remaining_sec.value() <= tl_remaining_sec;
+      int64_t timer_sec =
+          is_deadline ? dl_remaining_sec.value() : tl_remaining_sec;
+      timer_sec = std::max<int64_t>(0, timer_sec);
+      AddTerminationTimer_(timer_sec, is_deadline);
+      CRANE_TRACE("Add a {} timer of {} seconds",
+                  is_deadline ? "deadline" : "time_limit", timer_sec);
+    }
src/Craned/Supervisor/TaskManager.h (1)

335-344: Remove AddDeadlineTerminationTimer_ in favor of unified AddTerminationTimer_

Deadline handling now routes through EvTaskTimerCb_(bool). This method (and its callback) is dead code.

-  void AddDeadlineTerminationTimer_(int64_t secs) {
-    auto termination_handel = m_uvw_loop_->resource<uvw::timer_handle>();
-    termination_handel->on<uvw::timer_event>(
-        [this](const uvw::timer_event&, uvw::timer_handle& h) {
-          EvDeadlineTaskTimerCb_();
-        });
-    termination_handel->start(std::chrono::seconds(secs),
-                              std::chrono::seconds(0));
-    m_step_.termination_timer = termination_handel;
-  }

Also remove the EvDeadlineTaskTimerCb_ declaration if present.

🧹 Nitpick comments (16)
docs/deployment/backend/eBPF.md (2)

79-82: Specify the language for the fenced code block.

The code block at line 79 lacks a language identifier. Add bash to improve syntax highlighting and readability.

-```
+```bash
-DCRANE_ENABLE_CGROUP_V2=ON
 -DCRANE_ENABLE_BPF=ON
-```
+```

15-135: Inconsistent code block styling throughout the document.

Markdownlint indicates that all fenced code blocks (```) should use indented formatting instead. Consider whether your project has an established markdown style guide. If a consistent style is preferred across the documentation, all code blocks in this file (lines 15–135) should be converted from fenced to indented blocks.

src/CraneCtld/CranedMetaContainer.cpp (1)

254-279: LGTM: Consistent refactoring with helpful debug logging.

The changes improve the code by:

  • Using GetValueExclusivePtr for more efficient lookup (single map access vs. Contains + operator[])
  • Updating error messages to correctly reference resv_id
  • Adding a debug log when freeing reservation resources, which aids troubleshooting

Optional: Consider removing the extra blank line at Line 272 for consistency with the rest of the codebase's formatting style.

   CRANE_DEBUG("[Job #{}] Freeing resource from reservation {}", task_id,
               resv_id);
-
   ResourceV2 const& resources = resource_iter->second.resources;
src/Craned/Core/CtldClient.cpp (1)

109-125: Error handling improved with explicit state transitions.

The restructured error paths now consistently set m_state_ = State::REQUESTING_CONFIG and call ActionRequestConfig_() for both the !request.ok() and !request.has_token() cases. The new else block for token mismatch appropriately logs and waits for the correct reply without retriggering the request.

Optional: Consider consolidating duplicate state transitions.

Lines 111-112 and lines 116-117 contain identical code. You could reduce duplication by consolidating these branches:

   } else {
-    if (!request.ok()) {
+    if (!request.ok() || !request.has_token()) {
-      CRANE_LOGGER_WARN(m_logger_, "ConfigureCranedRequest failed from Ctld.");
+      if (!request.ok())
+        CRANE_LOGGER_WARN(m_logger_, "ConfigureCranedRequest failed from Ctld.");
+      else
+        CRANE_LOGGER_DEBUG(m_logger_,
+                           "No token in ConfigureCranedRequest from Ctld.");
       m_state_ = State::REQUESTING_CONFIG;
       ActionRequestConfig_();
-    } else if (!request.has_token()) {
-      CRANE_LOGGER_DEBUG(m_logger_,
-                         "No token in ConfigureCranedRequest from Ctld.");
-      m_state_ = State::REQUESTING_CONFIG;
-      ActionRequestConfig_();
-
     } else {
docs/command/ccontrol.md (2)

213-213: Correct Chinese grammar on line 213.

The phrase "现在completion自动补全代码命令已经无法使用" contains a grammatical issue. Consider rephrasing to "现在的自动补全(completion)命令已经无法使用" or "completion自动补全功能已经无法使用" for clarity.

-由于删除了cobra框架,现在completion自动补全代码命令已经无法使用
+由于删除了cobra框架,现在completion自动补全功能已经无法使用

215-223: Document the purpose and usage of new backend files.

Lines 215–223 introduce new file structure documentation (help.go, ccontrol.go, CmdArgParser.go, parser.go) to reflect the shift away from the cobra framework. While this is a good structural addition, consider enhancing the descriptions to clarify their roles more precisely:

- help.go:由于删除了cobra框架,现在help命令的输出编写在help.go中
+ help.go:处理帮助命令输出的文件,由于删除了cobra框架,help命令的输出逻辑统一编写于此
- ccontrol.go:向后端通信的函数实现
+ ccontrol.go:实现与后端通信的业务逻辑函数
- CmdArgParser.go:命令解析文件,将使用解析到的命令行数据来调用ccontrol.go的函数
+ CmdArgParser.go:命令解析及调度文件,将解析后的命令行参数调用相应的ccontrol.go函数
- parser.go:解析器函数,定义数据结构与解析规则
+ parser.go:通用解析器实现,定义命令解析规则和数据结构

This improves clarity for maintainers and future contributors.

src/Misc/BPF/cgroup_dev_bpf.c (3)

14-16: Struct field type migration looks good; prefer unsigned for bitfields and avoid packing

  • __u64/__u32 use aligns with uapi/context types. Good.
  • access/type are bitfields; make them __u16, not signed, to avoid UB on bitwise ops.
  • Consider dropping #pragma pack: CO-RE/BTF expects natural alignment; packing can cause subtle layout issues across compilers/kernels. If you need 8‑byte alignment for keys, use attribute((aligned(8))) on the struct instead.

Apply within-range changes for bitfield signedness:

-  __s16 access;
-  __s16 type;
+  __u16 access;
+  __u16 type;

If you choose to remove packing (outside this range), a safe layout is:

struct BpfKey {
  __u64 cgroup_id;
  __u32 major;
  __u32 minor;
} __attribute__((aligned(8)));

struct BpfDeviceMeta {
  __u32 major;
  __u32 minor;
  __s32 permission;
  __u16 access;
  __u16 type;
};

Also applies to: 22-26


61-61: Use %llu for 64‑bit cgroup_id in bpf_printk

key.cgroup_id is __u64; prefer %llu for portability across archs.

-  if (enable_logging) bpf_printk("ctx cgroup ID : %lu\n", key.cgroup_id);
+  if (enable_logging) bpf_printk("ctx cgroup ID : %llu\n", key.cgroup_id);

100-105: Nit: incorrect log message for MKNOD case

Message says “Write access denied” in the MKNOD branch. Change to “Mknod access denied” to avoid confusion.

docs/reference/exit_code.md (1)

35-37: Optional: Consider varying sentence structure for readability.

Three successive sentences begin with "Values". While acceptable for technical documentation, you could improve readability slightly:

-- Values **0–255** represent the **exit return value**.  
-- Values **256–320** represent **program exit signals**.  
-- Values **above 320** represent **Crane-defined codes**.
+- **0–255**: Exit return value
+- **256–320**: Program exit signals  
+- **Above 320**: Crane-defined codes
src/Craned/Core/SupervisorKeeper.cpp (1)

32-33: Log placeholders misleading for ExecuteTask failure

Message says “ok:{}” but prints error_message(). Consider “error: {}” or include ok.ok() separately.

docs/deployment/backend/Ubuntu.md (1)

50-57: SELinux section is not applicable to Ubuntu

Ubuntu ships AppArmor by default; SELinux and setenforce usually aren’t present. Recommend removing or clarifying this as “if SELinux is installed” and otherwise reference AppArmor.

src/Utilities/PublicHeader/include/crane/String.h (2)

157-181: Status-to-string map lacks “Deadline” entry

TaskStatus::Deadline was introduced; StepStatusToString will return “Invalid” for it. Update CraneStepStatusStrArr to include “Deadline” at the correct ordinal, or switch StepStatusToString to a switch-case over enum values to avoid index drift.


126-136: FlattenMapView relies on values being ranges

Fine as a utility; consider documenting requirement (kv.second must be a range), to prevent misuse.

src/CraneCtld/TaskScheduler.cpp (1)

3450-3456: Consider removing redundant deadline check.

Tasks with deadline_time < now are already canceled and removed from m_pending_task_map_ in ScheduleThread_ (lines 572-592) before GetOrderedTaskIdList is called. This check at lines 3450-3453 should never trigger.

While this defensive code is harmless, removing it would simplify the logic since deadline-expired tasks are guaranteed not to reach this point.

src/Craned/Supervisor/TaskManager.h (1)

89-90: Consider narrowing mutable access

Exposing GetMutableStep() invites accidental unsynchronized mutation. Prefer explicit setters (e.g., SetTimeLimitSeconds/SetDeadlineTime) invoked on the UV loop thread.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27ec3f2 and dc7edbd.

⛔ Files ignored due to path filters (30)
  • docs/images/architecture-V1.png is excluded by !**/*.png
  • docs/images/cbatch.png is excluded by !**/*.png
  • docs/images/cbatch_script.png is excluded by !**/*.png
  • docs/images/ccancel.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_hold.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_hold1.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_hold2.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_hold3.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_release.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_release1.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_release2.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_show.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_showjob.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_shownode.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_showp.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_update.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatejob.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatejob1.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatejob2.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatejob3.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatejob4.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatenode.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatenode1.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatenode2.png is excluded by !**/*.png
  • docs/images/ccontrol/ccontrol_updatenode3.png is excluded by !**/*.png
  • docs/images/ccontrol_node.png is excluded by !**/*.png
  • docs/images/ccontrol_partition.png is excluded by !**/*.png
  • docs/images/cqueue.png is excluded by !**/*.png
  • docs/images/logo.png is excluded by !**/*.png
  • docs/images/sacctmgr.png is excluded by !**/*.png
📒 Files selected for processing (65)
  • .github/workflows/build.yaml (2 hunks)
  • .github/workflows/mkpages.yaml (0 hunks)
  • .github/workflows/pages.yaml (1 hunks)
  • .readthedocs.yaml (1 hunks)
  • CMakeLists.txt (3 hunks)
  • README.md (1 hunks)
  • README_EN.md (1 hunks)
  • dependencies/cmake/unqlite/CMakeLists.txt (1 hunks)
  • dependencies/cmake/uvw/CMakeLists.txt (1 hunks)
  • dependencies/cmake/uvw/uvw.diff (0 hunks)
  • dependencies/cmake/yaml-cpp/CMakeLists.txt (1 hunks)
  • docs/command/cacctmgr.md (1 hunks)
  • docs/command/ccontrol.md (2 hunks)
  • docs/command/ccontrol_yufa.md (0 hunks)
  • docs/command/cqueue.md (2 hunks)
  • docs/config/Centos7Configuration.md (0 hunks)
  • docs/deployment/backend/CentOS7.md (1 hunks)
  • docs/deployment/backend/Rocky9.md (1 hunks)
  • docs/deployment/backend/Ubuntu.md (1 hunks)
  • docs/deployment/backend/eBPF.md (1 hunks)
  • docs/deployment/backend/packaging.md (1 hunks)
  • docs/deployment/configuration/config.md (1 hunks)
  • docs/deployment/configuration/database.md (1 hunks)
  • docs/deployment/configuration/multi-node.md (1 hunks)
  • docs/deployment/configuration/pam.md (1 hunks)
  • docs/deployment/frontend/frontend.md (1 hunks)
  • docs/deployment/frontend/plugins.md (1 hunks)
  • docs/deployment/index.md (1 hunks)
  • docs/index.md (1 hunks)
  • docs/reference/exit_code.md (1 hunks)
  • docs/requirements.txt (1 hunks)
  • etc/config.yaml (1 hunks)
  • mkdocs.yaml (1 hunks)
  • protos/Crane.proto (8 hunks)
  • protos/PublicDefs.proto (7 hunks)
  • scripts/uninstall_craned.sh (1 hunks)
  • src/CraneCtld/AccountManager.cpp (1 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CranedMetaContainer.cpp (2 hunks)
  • src/CraneCtld/CranedMetaContainer.h (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (2 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (22 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.h (6 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/TaskScheduler.cpp (10 hunks)
  • src/CraneCtld/TaskScheduler.h (4 hunks)
  • src/Craned/Common/CgroupManager.cpp (3 hunks)
  • src/Craned/Common/CgroupManager.h (3 hunks)
  • src/Craned/Core/CranedPublicDefs.h (1 hunks)
  • src/Craned/Core/CtldClient.cpp (1 hunks)
  • src/Craned/Core/JobManager.cpp (5 hunks)
  • src/Craned/Core/SupervisorKeeper.cpp (3 hunks)
  • src/Craned/Supervisor/CforedClient.cpp (6 hunks)
  • src/Craned/Supervisor/CforedClient.h (4 hunks)
  • src/Craned/Supervisor/CranedClient.cpp (3 hunks)
  • src/Craned/Supervisor/CranedClient.h (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (15 hunks)
  • src/Craned/Supervisor/TaskManager.h (6 hunks)
  • src/Misc/BPF/CMakeLists.txt (1 hunks)
  • src/Misc/BPF/cgroup_dev_bpf.c (3 hunks)
  • src/Utilities/PublicHeader/OS.cpp (2 hunks)
  • src/Utilities/PublicHeader/String.cpp (1 hunks)
  • src/Utilities/PublicHeader/include/crane/OS.h (1 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (2 hunks)
  • src/Utilities/PublicHeader/include/crane/String.h (1 hunks)
💤 Files with no reviewable changes (4)
  • .github/workflows/mkpages.yaml
  • docs/command/ccontrol_yufa.md
  • dependencies/cmake/uvw/uvw.diff
  • docs/config/Centos7Configuration.md
✅ Files skipped from review due to trivial changes (2)
  • dependencies/cmake/unqlite/CMakeLists.txt
  • docs/command/cqueue.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h
  • src/CraneCtld/CtldPublicDefs.h
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
  • src/CraneCtld/RpcService/CranedKeeper.cpp
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.

Applied to files:

  • src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-05-26T11:04:30.580Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#399
File: src/Craned/Craned/SupervisorKeeper.h:32-32
Timestamp: 2025-05-26T11:04:30.580Z
Learning: The Supervisor component in the Crane system is designed to manage only one task per instance. The task specification is provided to the Supervisor during startup by Craned (likely through InitSupervisorRequest), so the ExecuteTask() method doesn't need to accept task parameters since the Supervisor already knows which task to execute.

Applied to files:

  • src/Craned/Supervisor/CranedClient.cpp
📚 Learning: 2025-04-27T06:47:51.751Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#475
File: src/Craned/CtldClient.cpp:0-0
Timestamp: 2025-04-27T06:47:51.751Z
Learning: In CtldClient, when a CranedRegister RPC fails, the state is reset to NOTIFY_SENDING. This ensures that token will be refreshed by NotifyCranedConnected_() before any subsequent CranedRegister_() call, preventing null token access issues.

Applied to files:

  • src/CraneCtld/RpcService/CranedKeeper.h
  • src/CraneCtld/RpcService/CranedKeeper.cpp
📚 Learning: 2025-05-02T07:05:26.012Z
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/RpcService/CranedKeeper.h
  • src/CraneCtld/RpcService/CranedKeeper.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/RpcService/CranedKeeper.h
  • src/Craned/Supervisor/CforedClient.h
  • src/CraneCtld/RpcService/CranedKeeper.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Applied to files:

  • src/CraneCtld/RpcService/CranedKeeper.cpp
🧬 Code graph analysis (20)
src/CraneCtld/CranedMetaContainer.h (1)
src/CraneCtld/CranedMetaContainer.cpp (2)
  • FreeResourceFromResv (254-279)
  • FreeResourceFromResv (254-255)
src/Craned/Core/CtldClient.cpp (1)
src/Utilities/PublicHeader/GrpcHelper.cpp (2)
  • ProtoTimestampToString (48-51)
  • ProtoTimestampToString (48-49)
src/Craned/Common/CgroupManager.h (2)
src/Craned/Common/CgroupManager.cpp (10)
  • Allocate (1543-1554)
  • Allocate (1543-1544)
  • Allocate (1556-1567)
  • Allocate (1556-1557)
  • Allocate (1569-1597)
  • Allocate (1569-1570)
  • Allocate (1599-1628)
  • Allocate (1599-1601)
  • AllocateAndGetCgroup (468-533)
  • AllocateAndGetCgroup (469-471)
src/Utilities/PublicHeader/PublicHeader.cpp (1)
  • ResourceInNode (484-487)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (88-109)
  • ChangeTaskTimeConstraint (88-90)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/CraneCtld/CtldPublicDefs.h (1)
  • CheckIfTimeLimitSecIsValid (705-707)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Craned/Core/CranedPublicDefs.h (1)
src/Craned/Common/CgroupManager.h (2)
  • CgroupInterface (374-405)
  • CgroupManager (514-621)
src/CraneCtld/AccountManager.cpp (1)
test/Misc/mongodb_test.cpp (1)
  • new_user (120-120)
src/Utilities/PublicHeader/include/crane/OS.h (1)
src/Utilities/PublicHeader/OS.cpp (2)
  • CreateFoldersForFileEx (109-148)
  • CreateFoldersForFileEx (109-110)
src/Utilities/PublicHeader/include/crane/String.h (2)
src/Utilities/PublicHeader/String.cpp (8)
  • StepIdsToString (475-477)
  • StepIdsToString (475-475)
  • StepIdTupleToString (479-481)
  • StepIdTupleToString (479-479)
  • StepIdPairToString (483-485)
  • StepIdPairToString (483-483)
  • StepStatusToString (487-489)
  • StepStatusToString (487-487)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (2)
  • Internal (130-238)
  • string_view (252-254)
src/Craned/Supervisor/CranedClient.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/Utilities/PublicHeader/String.cpp (2)
  • StepStatusToString (487-489)
  • StepStatusToString (487-487)
src/Craned/Core/SupervisorKeeper.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1024-1149)
  • ChangeTaskTimeConstraint (1024-1026)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Supervisor/TaskManager.h (3)
  • IsBatch (83-87)
  • DelTerminationTimer_ (346-352)
  • AddTerminationTimer_ (324-333)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (154-154)
src/CraneCtld/RpcService/CranedKeeper.h (2)
src/CraneCtld/RpcService/CranedKeeper.cpp (9)
  • CranedStub (27-33)
  • Fini (35-37)
  • Fini (35-35)
  • ChangeJobTimeConstraint (239-275)
  • ChangeJobTimeConstraint (239-241)
  • CranedChannelConnFailNoLock_ (783-789)
  • CranedChannelConnFailNoLock_ (783-783)
  • ConnectCranedNode_ (693-781)
  • ConnectCranedNode_ (693-694)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (309-334)
  • ChangeJobTimeConstraint (309-312)
src/CraneCtld/TaskScheduler.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1024-1149)
  • ChangeTaskTimeConstraint (1024-1026)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (88-109)
  • ChangeTaskTimeConstraint (88-90)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (63-85)
  • ChangeTaskTimeConstraint (63-66)
src/Craned/Supervisor/CforedClient.h (2)
src/Craned/Supervisor/TaskManager.h (1)
  • pty (142-152)
src/Craned/Supervisor/CforedClient.cpp (16)
  • InitFwdMetaAndUvStdoutFwdHandler (80-99)
  • InitFwdMetaAndUvStdoutFwdHandler (80-82)
  • InitUvX11FwdHandler (101-108)
  • InitUvX11FwdHandler (101-101)
  • TaskOutputFinish (671-675)
  • TaskOutputFinish (671-671)
  • TaskEnd (683-685)
  • TaskEnd (683-683)
  • SetupX11forwarding_ (110-207)
  • SetupX11forwarding_ (110-110)
  • WriteStringToFd_ (209-223)
  • WriteStringToFd_ (209-210)
  • CleanStdoutFwdHandlerQueueCb_ (225-352)
  • CleanStdoutFwdHandlerQueueCb_ (225-225)
  • CleanX11FwdHandlerQueueCb_ (354-360)
  • CleanX11FwdHandlerQueueCb_ (354-354)
src/Craned/Core/JobManager.cpp (1)
src/Craned/Common/CgroupManager.cpp (2)
  • CgroupStrByJobId (279-281)
  • CgroupStrByJobId (279-279)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (19)
  • ITaskInstance (200-219)
  • StepInstance (43-47)
  • StepInstance (49-56)
  • IsBatch (58-60)
  • IsBatch (58-58)
  • IsCrun (62-65)
  • IsCrun (62-62)
  • IsCalloc (67-70)
  • IsCalloc (67-67)
  • EvTaskTimerCb_ (1777-1800)
  • EvTaskTimerCb_ (1777-1777)
  • EvDeadlineTaskTimerCb_ (1802-1802)
  • EvDeadlineTaskTimerCb_ (1802-1802)
  • ChangeTaskTimeConstraintAsync (1671-1699)
  • ChangeTaskTimeConstraintAsync (1671-1672)
  • EvCleanTerminateTaskQueueCb_ (1900-1944)
  • EvCleanTerminateTaskQueueCb_ (1900-1900)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1946-1994)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1946-1946)
src/Utilities/PublicHeader/OS.cpp (1)
test/CraneCtld/EmbeddedDbClientTest.cpp (1)
  • exists (33-40)
src/Craned/Supervisor/CforedClient.cpp (2)
src/Utilities/PublicHeader/String.cpp (2)
  • SetCurrentThreadName (340-348)
  • SetCurrentThreadName (340-340)
src/Utilities/PublicHeader/OS.cpp (2)
  • SetFdNonBlocking (153-160)
  • SetFdNonBlocking (153-153)
src/CraneCtld/RpcService/CranedKeeper.cpp (4)
src/Craned/Core/CranedPublicDefs.h (2)
  • g_runtime_status (160-160)
  • g_config (154-154)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (309-334)
  • ChangeJobTimeConstraint (309-312)
src/Utilities/PublicHeader/Logger.cpp (6)
  • AddLogger (73-99)
  • AddLogger (73-75)
  • AddLogger (101-124)
  • AddLogger (101-103)
  • StrToLogLevel (23-40)
  • StrToLogLevel (23-24)
src/Utilities/PublicHeader/GrpcHelper.cpp (2)
  • ProtoTimestampToString (48-51)
  • ProtoTimestampToString (48-49)
🪛 LanguageTool
docs/reference/exit_code.md

[grammar] ~21-~21: There might be a mistake here.
Context: ... Crane-defined codes: - 64: Terminated - 65: Permission Denied - 66: Cgroup Error...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...- 64: Terminated - 65: Permission Denied - 66: Cgroup Error - 67: File Not Found - ...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...65: Permission Denied - 66: Cgroup Error - 67: File Not Found - 68: Spawn Process F...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ... - 66: Cgroup Error - 67: File Not Found - 68: Spawn Process Failed - 69: Exceeded ...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...ile Not Found - 68: Spawn Process Failed - 69: Exceeded Time Limit - 70: Crane Daem...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...Process Failed - 69: Exceeded Time Limit - 70: Crane Daemon Down - 71: Execution Er...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...eeded Time Limit - 70: Crane Daemon Down - 71: Execution Error - 72: RPC Failure -...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ... Crane Daemon Down - 71: Execution Error - 72: RPC Failure --- ## JSON Format Exp...

(QB_NEW_EN)


[style] ~37-~37: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...represent program exit signals. - Values above 320 represent **Crane-defined...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

docs/index.md

[grammar] ~102-~102: There might be a mistake here.
Context: ...ces and provides cgroup-based isolation. - AI jobs: the Container Manager uses Kube...

(QB_NEW_EN)


[grammar] ~109-~109: There might be a mistake here.
Context: ...md), calloc, cinfo - Admin commands: [cacct](command/cacct.md...

(QB_NEW_EN)


[grammar] ~110-~110: There might be a mistake here.
Context: ...ccontrol](command/ccontrol.md), ccancel - Exit codes: [reference](reference/exit_c...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ... https://hpc.pku.edu.cn/demo/cranesched - Backend: <https://github.com/PKUHPC/Cran...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...d: https://github.com/PKUHPC/CraneSched - Frontend: <https://github.com/PKUHPC/Cra...

(QB_NEW_EN)

docs/deployment/index.md

[grammar] ~15-~15: There might be a mistake here.
Context: ...ces (cfored, cplugind) | As needed | | Database | MongoDB for storing job...

(QB_NEW_EN)


[grammar] ~116-~116: There might be a mistake here.
Context: ...ure: - Cluster name and control machine - Node definitions (CPU, memory, GPUs) - P...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...e - Node definitions (CPU, memory, GPUs) - Partitions and scheduling priorities - N...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...) - Partitions and scheduling priorities - Network settings and advanced options #...

(QB_NEW_EN)

docs/deployment/configuration/pam.md

[style] ~57-~57: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 1590 characters long)
Context: ...nal pam_reauthorize.so prepare ``` !!! warning The line `session required ...

(EN_EXCESSIVE_EXCLAMATION)

docs/deployment/configuration/database.md

[style] ~145-~145: Using many exclamation marks might seem excessive (in this case: 15 exclamation marks for a text that’s 1808 characters long)
Context: ...SetName: crane_rs DbName: crane_db ``` !!! warning This file should only exist...

(EN_EXCESSIVE_EXCLAMATION)

docs/deployment/configuration/multi-node.md

[grammar] ~137-~137: There might be a mistake here.
Context: ...not found**: Install the pdsh package from EPEL repository. Permission denied...

(QB_NEW_EN)

README_EN.md

[grammar] ~27-~27: There might be a mistake here.
Context: ...>100,000 scheduling decisions per second - Scalability: Scales to million-core cl...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ility**: Scales to million-core clusters - Usability: Simple user and admin comma...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...bility**: Simple user and admin commands - Security: RBAC and encrypted communica...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ...rity**: RBAC and encrypted communication - Resilience: Automatic job recovery, no...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...b recovery, no SPOF, fast state recovery - Open Source: All code available # Arc...

(QB_NEW_EN)


[grammar] ~44-~44: There might be a mistake here.
Context: ...resources and isolates jobs via cgroups. - For AI jobs: the **Container Manager...

(QB_NEW_EN)

docs/deployment/backend/Ubuntu.md

[grammar] ~8-~8: There might be a mistake here.
Context: ...reparation ### 1.1 Update Package Lists It is recommended to change the package ...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...tional configuration](eBPF.md) for GRES, or you can switch the system to use CGro...

(QB_NEW_EN)


[style] ~171-~171: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 2585 characters long)
Context: ... . ``` 2. Install the built binaries: !!! tip We recommend deploying CraneSch...

(EN_EXCESSIVE_EXCLAMATION)

docs/deployment/backend/eBPF.md

[style] ~8-~8: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1399 characters long)
Context: ...ompute nodes. ## 1. Install Clang 19+ !!! tip If your system already installe...

(EN_EXCESSIVE_EXCLAMATION)

docs/deployment/backend/CentOS7.md

[grammar] ~99-~99: There might be a mistake here.
Context: ...wing toolchain versions: * CMake ≥ 3.24 * libstdc++ ≥ 11 * clang ≥ 19 or g++ ≥ 14 ...

(QB_NEW_EN)


[grammar] ~100-~100: There might be a mistake here.
Context: ...rsions: * CMake ≥ 3.24 * libstdc++ ≥ 11 * clang ≥ 19 or g++ ≥ 14 ### 4.1 CMake `...

(QB_NEW_EN)


[style] ~160-~160: Using many exclamation marks might seem excessive (in this case: 12 exclamation marks for a text that’s 2311 characters long)
Context: ...atabase.md) for detailed instructions. !!! info CentOS 7 supports up to MongoD...

(EN_EXCESSIVE_EXCLAMATION)

docs/command/ccontrol.md

[uncategorized] ~213-~213: 您的意思是“"不"全”?
Context: ... ### 删除的命令 由于删除了cobra框架,现在completion自动补全代码命令已经无法使用 ### 文件结构 help.go:由于删除了cobr...

(BU)

docs/deployment/backend/Rocky9.md

[grammar] ~65-~65: There might be a mistake here.
Context: ...eed additional configuration, or you can switch the system to use CGro...

(QB_NEW_EN)


[style] ~167-~167: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 2482 characters long)
Context: ... . ``` 2. Install the built binaries: !!! tip We recommend deploying CraneSch...

(EN_EXCESSIVE_EXCLAMATION)

docs/deployment/backend/packaging.md

[grammar] ~16-~16: There might be a mistake here.
Context: ...ntrol daemon package (for control nodes) - craned - Execution daemon package (for...

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ... package includes: - Binary executables - Systemd service files - Configuration fi...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...nary executables - Systemd service files - Configuration file templates - PAM secur...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...ice files - Configuration file templates - PAM security module (for craned package)...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...!!! tip For production deployments, use Release build type: ```bash cma...

(QB_NEW_EN)


[style] ~82-~82: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 2459 characters long)
Context: ...ate only DEB packages cpack -G DEB ``` !!! info The default configuration (`cp...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~83-~83: There might be a mistake here.
Context: ... DEB packages cpack -G DEB ``` !!! info The default configuration (cpack witho...

(QB_NEW_EN)


[grammar] ~171-~171: There might be a mistake here.
Context: ...ssions 3. Creates /etc/crane directory 4. Copies sample configuration files to `/e...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ...se.yaml` (if not exists, cranectld only) 6. Sets appropriate file ownership and perm...

(QB_NEW_EN)


[grammar] ~174-~174: There might be a mistake here.
Context: ...propriate file ownership and permissions

(QB_NEW_EN)

docs/deployment/configuration/config.md

[grammar] ~87-~87: There might be a mistake here.
Context: ... Hostname or range (e.g., node[01-10]) - cpu: Number of CPU cores - memory:...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ...[01-10]`) - cpu: Number of CPU cores - memory: Total memory (supports K, M, G...

(QB_NEW_EN)


[grammar] ~89-~89: There might be a mistake here.
Context: ...al memory (supports K, M, G, T suffixes) - gres: Generic resources like GPUs (opt...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...ers:** - name: Partition identifier - nodes: Node range belonging to this pa...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...: Node range belonging to this partition - priority: Higher values = higher prior...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...s = higher priority (affects scheduling) - DefaultMemPerCpu: Default memory per C...

(QB_NEW_EN)


[grammar] ~124-~124: There might be a mistake here.
Context: ...per CPU in MB (0 = let scheduler decide) - MaxMemPerCpu: Maximum memory per CPU i...

(QB_NEW_EN)


[style] ~246-~246: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 2668 characters long)
Context: ... DeniedAccounts: banned_account ``` !!! warning AllowedAccounts and `Deni...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~294-~294: There might be a mistake here.
Context: ...oubleshooting Nodes not appearing: Check ControlMachine hostname matches actual ...

(QB_NEW_EN)


[grammar] ~294-~294: There might be a mistake here.
Context: ...earing**: Check ControlMachine hostname matches actual control node hostname. **Config...

(QB_NEW_EN)

docs/deployment/frontend/frontend.md

[style] ~2-~2: Using many exclamation marks might seem excessive (in this case: 3 exclamation marks for a text that’s 1862 characters long)
Context: ...eployment Guide of Frontend Components !!! tip This tutorial has been verified...

(EN_EXCESSIVE_EXCLAMATION)


[grammar] ~24-~24: There might be a mistake here.
Context: ... (Frontend daemon for interactive jobs): - Provides support for interactive jobs (u...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...ractive jobs (used by crun, calloc). - Typically runs on login nodes where inte...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...service. - cplugind` (Plugin daemon): - Loads and manages plugins (mail, monitor...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...lugin services to CraneSched components. - Must be running on nodes that require pl...

(QB_NEW_EN)

docs/deployment/frontend/plugins.md

[grammar] ~14-~14: There might be a mistake here.
Context: ...ry** (.so file): Plugin implementation - Plugin configuration (.yaml file, op...

(QB_NEW_EN)


[style] ~25-~25: To form a complete sentence, be sure to include a subject.
Context: ...or.yaml`) : Individual plugin settings. Can be located anywhere readable, with the ...

(MISSING_IT_THERE)


[grammar] ~31-~31: There might be a mistake here.
Context: ...owing plugins: | Plugin | Description | |--------|-------------| | Mail | Se...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...| Description | |--------|-------------| | Mail | Send email notifications on...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...ail notifications on job state changes | | Monitor | Collect job resource usa...

(QB_NEW_EN)


[grammar] ~34-~34: There might be a mistake here.
Context: ...fluxDB (Grafana integration supported) | | Energy | Monitor power consumption...

(QB_NEW_EN)


[grammar] ~35-~35: There might be a mistake here.
Context: ...umption for nodes and jobs to InfluxDB | | Event | Record node state changes ...

(QB_NEW_EN)


[grammar] ~70-~70: Ensure spelling is correct
Context: ... CraneCtld/Craned uses the plugin system - PlugindDebugLevel: Log level (trace/debug/info; recomme...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~91-~91: There might be a mistake here.
Context: ...following information: - Username: your_username - Bucket: your_bucket_name - Org: ...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...ername**: your_username - Bucket: your_bucket_name - Org: your_organization - Token: ...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...Bucket**: your_bucket_name - Org: your_organization - Token: your_token - Measurement:...

(QB_NEW_EN)


[grammar] ~94-~94: There might be a mistake here.
Context: ...Org**: your_organization - Token: your_token - Measurement: ResourceUsage ### Conf...

(QB_NEW_EN)


[grammar] ~180-~180: There might be a mistake here.
Context: ...** in /etc/crane/config.yaml (similar to monitor plugin). ### Using Email Notif...

(QB_NEW_EN)


[grammar] ~214-~214: There might be a mistake here.
Context: ... ### Mail Types | Type | Description | |------|-------------| | BEGIN | Job s...

(QB_NEW_EN)


[grammar] ~215-~215: There might be a mistake here.
Context: ...e | Description | |------|-------------| | BEGIN | Job starts running (Pending ...

(QB_NEW_EN)


[grammar] ~216-~216: There might be a mistake here.
Context: ...Job starts running (Pending → Running) | | FAILED | Job execution fails | | `TI...

(QB_NEW_EN)


[grammar] ~217-~217: There might be a mistake here.
Context: ...ng) | | FAILED | Job execution fails | | TIMELIMIT | Job exceeds time limit |...

(QB_NEW_EN)


[grammar] ~218-~218: There might be a mistake here.
Context: ...| TIMELIMIT | Job exceeds time limit | | END | Job finishes (Running → Comple...

(QB_NEW_EN)


[grammar] ~219-~219: There might be a mistake here.
Context: ...(Running → Completed/Failed/Cancelled) | | ALL | All of the above events | ###...

(QB_NEW_EN)


[grammar] ~231-~231: There might be a mistake here.
Context: ...leshooting - Verify Enabled: true in /etc/crane/config.yaml - Check that cplugind is running: `syste...

(QB_NEW_EN)


[grammar] ~232-~232: There might be a mistake here.
Context: ...ml- Check thatcplugindis running:systemctl status cplugind- Ensure.so` file paths are absolute and...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.18.1)
docs/index.md

125-125: Bare URL used

(MD034, no-bare-urls)

docs/deployment/configuration/database.md

25-25: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


46-46: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


71-71: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


80-80: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


99-99: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


109-109: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


115-115: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


137-137: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

README_EN.md

9-9: Heading style
Expected: atx; Actual: atx_closed

(MD003, heading-style)


25-25: Heading style
Expected: atx; Actual: atx_closed

(MD003, heading-style)


34-34: Heading style
Expected: atx; Actual: atx_closed

(MD003, heading-style)


47-47: Heading style
Expected: atx; Actual: atx_closed

(MD003, heading-style)

docs/deployment/backend/Ubuntu.md

34-34: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/deployment/backend/eBPF.md

15-15: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


25-25: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


62-62: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


79-79: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


79-79: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


89-89: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


94-94: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


99-99: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


107-107: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


117-117: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


124-124: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


129-129: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


134-134: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

README.md

25-25: Heading style
Expected: atx; Actual: atx_closed

(MD003, heading-style)

docs/command/ccontrol.md

1-1: No space inside hashes on closed atx style heading

(MD020, no-missing-space-closed-atx)


28-28: Heading levels should only increment by one level at a time
Expected: h2; Actual: h4

(MD001, heading-increment)


78-78: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


83-83: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


88-88: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


93-93: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


98-98: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


103-103: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


108-108: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


113-113: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


118-118: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


123-123: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


128-128: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


133-133: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


142-142: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


147-147: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


152-152: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


157-157: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)


162-162: Code fence style
Expected: tilde; Actual: backtick

(MD048, code-fence-style)

docs/deployment/backend/Rocky9.md

13-13: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


20-20: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


34-34: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


41-41: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


52-52: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


72-72: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


86-86: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


106-106: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


121-121: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


127-127: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


152-152: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


170-170: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


190-190: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


197-197: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

docs/deployment/frontend/frontend.md

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


29-29: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


30-30: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


33-33: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


53-53: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


62-62: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


69-69: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


77-77: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)


95-95: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/CraneCtld/DbClient.cpp (2)

1549-1615: Fix field/value order: deadline_time and has_job_info are swapped

Current tuple places int64 deadline before bool, but fields array has has_job_info before deadline_time. This writes deadline seconds into has_job_info and true into deadline_time.

Apply:

-  std::tuple<..., std::optional<ContainerMetaInTask>, /*35-37*/
-             int64_t, bool>                                          /*38-39*/
+  std::tuple<..., std::optional<ContainerMetaInTask>, /*35-37*/
+             bool, int64_t>                                          /*38-39*/
@@
-             allocated_res_view.GetDeviceMap(), container_meta,
-             task_to_ctld.deadline_time().seconds(),
-             true /* Mark the document having complete job info */};
+             allocated_res_view.GetDeviceMap(), container_meta,
+             true /* Mark the document having complete job info */,
+             task_to_ctld.deadline_time().seconds()};

1647-1706: Fix field/value order in TaskInCtldToDocument_ as well

Same swap needed here to match fields "... meta_container, has_job_info, deadline_time".

Apply:

-  std::tuple<..., std::optional<ContainerMetaInTask>, /*35-37*/
-             int64_t, bool>                                          /*38-39*/
+  std::tuple<..., std::optional<ContainerMetaInTask>, /*35-37*/
+             bool, int64_t>                                          /*38-39*/
@@
-          task->allocated_res_view.GetDeviceMap(), container_meta,
-          task->TaskToCtld().deadline_time().seconds(),
-          true /* Mark the document having complete job info */
+          task->allocated_res_view.GetDeviceMap(), container_meta,
+          true /* Mark the document having complete job info */,
+          task->TaskToCtld().deadline_time().seconds()
       };
♻️ Duplicate comments (13)
src/Craned/Supervisor/TaskManager.h (3)

103-103: Mutable Step access remains unsafe (data race risk).

GetMutableStep() exposes direct mutable access to m_step_to_supv_ without documented thread-safety guarantees. Past reviews flagged this as critical (lines 103-103 and null-null in past comments). The implementation at TaskManager.cpp:1667-1675 shows mutations occur before enqueueing to the async queue, creating a race with the UV loop thread.

Replace with explicit, thread-safe setters or make this method private and document that it's only safe within the event loop thread.

Based on learnings


448-449: Missing input validation and semantic documentation.

ChangeTaskTimeConstraintAsync still lacks:

  1. Input validation: No rejection of negative values for time_limit or deadline_time
  2. Semantic definition: Unclear what nullopt means (no change? clear?) vs 0 (unlimited? invalid?)
  3. Coexistence rules: Can both be set? Must at least one be provided?

Past review (lines 448-450) flagged these issues as critical. Implementation at TaskManager.cpp:1657-1685 and callback at lines 1970-2022 still call .value() without null checks, risking crashes.

Add validation, document semantics (e.g., "nullopt = retain current, 0 = clear/unlimited, negative = invalid"), and ensure callback handles all cases safely.


469-473: Queue element struct updated for time constraints.

The renamed ChangeTaskTimeConstraintQueueElem with both time_limit and deadline_time optionals appropriately models the dual-constraint API. However, without validation in ChangeTaskTimeConstraintAsync, invalid combinations can be enqueued.

docs/en/command/cqueue.md (1)

33-33: Fix double emphasis formatting (MD036 violation).

Line 33 uses **--deadline** **string** which triggers MD036. Past review on lines 27-33 and static analysis both flag this.

Change to single emphasis for consistency with other options:

-- **--deadline** **string**: Display deadline
+- **--deadline string**: Display deadline
docs/zh/command/cbatch.md (1)

54-60: Fix unordered list indentation (MD007 violation).

Lines 55-60 use 4-space indentation but Markdown style requires 2-space. Past review and static analysis both flag this.

Apply this fix:

 - **--deadline string**: 作业的截止时间, 格式:
-    - `now + <count> <units>(无时默认为秒数)`
-    - `HH:MM[:SS]`
-    - `MM/DD[/YY]`
-    - `MMDD[YY]`
-    - `YYYY-MM-DD[THH:MM[:SS]]`
-    - `midnight/noon/teatime/elevenses/fika`
+  - `now + <count> <units>(无时默认为秒数)`
+  - `HH:MM[:SS]`
+  - `MM/DD[/YY]`
+  - `MMDD[YY]`
+  - `YYYY-MM-DD[THH:MM[:SS]]`
+  - `midnight/noon/teatime/elevenses/fika`
src/CraneCtld/RpcService/CranedKeeper.h (1)

92-94: Still missing validation and semantic documentation.

The method signature changed from ChangeJobTimeLimit to ChangeJobTimeConstraint with two optional parameters, but past critical issues remain unresolved:

  1. No semantic documentation: What does nullopt mean? (no change vs clear?) What about 0 or negative values?
  2. No input validation: Code path from TaskScheduler → CranedKeeper → CranedServer → JobManager has no checks. Negative values pass through unchecked.
  3. Type ambiguity: Using raw int64_t instead of std::chrono types loses compile-time unit safety.

The implementation at CranedKeeper.cpp:271-306 and CranedServer.cpp:322-355 confirms no validation occurs.

Add:

  • Header comment defining semantics (e.g., "nullopt = no change, 0 = clear/unlimited, negative = invalid")
  • Input validation rejecting invalid values at API boundary
  • Consider std::chrono::seconds for type safety
src/Craned/Core/SupervisorKeeper.h (1)

24-28: Include for std::optional; add brief units doc

Header uses std::optional but doesn’t include , which can break consumers. Also consider documenting parameter units for clarity.

Apply:

 #include "JobManager.h"
 #include "crane/AtomicHashMap.h"
+#include <optional>

Optionally add:

 class SupervisorStub {
  public:
-  CraneErrCode ChangeTaskTimeConstraint(
-      std::optional<int64_t> time_limit_seconds,
-      std::optional<int64_t> deadline_time);
+  // time_limit_seconds: seconds; deadline_time: Unix epoch seconds.
+  CraneErrCode ChangeTaskTimeConstraint(
+      std::optional<int64_t> time_limit_seconds,
+      std::optional<int64_t> deadline_time);

Also applies to: 38-40

src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

560-567: Reject TimeLimit change when value is missing; avoid no‑op “success”

If has_time_limit_seconds() is false, we still call ChangeTaskTimeConstraint with nullopt and then mark tasks modified. Fail fast and report a reason.

Apply:

-  if (request->attribute() == ModifyTaskRequest::TimeLimit) {
-    std::optional<int64_t> time_limit_seconds =
-        request->has_time_limit_seconds()
-            ? std::optional<int64_t>(request->time_limit_seconds())
-            : std::nullopt;
+  if (request->attribute() == ModifyTaskRequest::TimeLimit) {
+    if (!request->has_time_limit_seconds()) {
+      for (auto task_id : request->task_ids()) {
+        response->add_not_modified_tasks(task_id);
+        response->add_not_modified_reasons("Missing time_limit_seconds.");
+      }
+      return grpc::Status::OK;
+    }
+    std::optional<int64_t> time_limit_seconds =
+        std::optional<int64_t>(request->time_limit_seconds());
src/Craned/Supervisor/TaskManager.cpp (3)

1899-1923: Logic bug: misspelled flag and ‘==’ used instead of ‘=’

is_must_terinate is misspelled and never set true; the guards below never elevate priority. Rename and fix assignment.

-    bool is_must_terinate = false;
-    if (elem.termination_reason == TerminatedBy::TERMINATION_BY_DEADLINE) {
-      is_must_terinate == true;
-    }
+    const bool must_terminate =
+        (elem.termination_reason == TerminatedBy::TERMINATION_BY_DEADLINE);
@@
-    if (!is_must_terinate && !elem.mark_as_orphaned &&
+    if (!must_terminate && !elem.mark_as_orphaned &&
         !g_runtime_status.CanStepOperate()) {
@@
-    if (!is_must_terinate && !elem.mark_as_orphaned &&
+    if (!must_terminate && !elem.mark_as_orphaned &&
         m_step_.AllTaskFinished()) {

1658-1686: Don’t coerce a missing deadline into 0; preserve nullopt and clear proto if needed

For absent inputs, keep std::nullopt so the consumer can distinguish “no change”. Coercing to 0 makes downstream treat it as epoch and kill immediately. Update producer now; consumer fix follows.

-  if (time_limit) {
-    elem.time_limit = time_limit;
-    m_step_.GetMutableStep().mutable_time_limit()->set_seconds(
-        time_limit.value());
-  } else {
-    elem.time_limit =
-        std::optional<int64_t>(m_step_.GetStep().time_limit().seconds());
-  }
+  if (time_limit) {
+    elem.time_limit = time_limit;
+    m_step_.GetMutableStep().mutable_time_limit()->set_seconds(*time_limit);
+  } else {
+    elem.time_limit = std::nullopt;
+  }
@@
-  if (deadline_time) {
-    elem.deadline_time = deadline_time;
-    m_step_.GetMutableStep().mutable_deadline_time()->set_seconds(
-        deadline_time.value());
-  } else {
-    elem.deadline_time =
-        std::optional<int64_t>(m_step_.GetStep().deadline_time().seconds());
-  }
+  if (deadline_time) {
+    elem.deadline_time = deadline_time;
+    m_step_.GetMutableStep().mutable_deadline_time()->set_seconds(*deadline_time);
+  } else if (m_step_.GetStep().has_deadline_time()) {
+    elem.deadline_time = std::optional<int64_t>(
+        m_step_.GetStep().deadline_time().seconds());
+  } else {
+    elem.deadline_time = std::nullopt;
+    m_step_.GetMutableStep().clear_deadline_time();
+  }

2041-2052: Clamp negative/absent deadline and select correct timer

If no deadline is set or the deadline is in the past, this computes a negative value and arms an invalid timer (or wrongly prefers deadline). Check presence, clamp, and branch.

-    int64_t time_limit_sec = m_step_.GetStep().time_limit().seconds();
-    int64_t deadline_sec = m_step_.GetStep().deadline_time().seconds() -
-                           absl::ToUnixSeconds(absl::Now());
-
-    std::string log_timer =
-        deadline_sec <= time_limit_sec ? "deadline" : "time_limit";
-    bool is_deadline = deadline_sec <= time_limit_sec ? true : false;
-
-    AddTerminationTimer_(std::min(deadline_sec, time_limit_sec), is_deadline);
-    CRANE_TRACE("Add a {} timer of {} seconds", log_timer,
-                std::min(deadline_sec, time_limit_sec));
+    const int64_t time_limit_sec = m_step_.GetStep().time_limit().seconds();
+    if (m_step_.GetStep().has_deadline_time()) {
+      const int64_t ddl_sec = m_step_.GetStep().deadline_time().seconds() -
+                              absl::ToUnixSeconds(absl::Now());
+      if (ddl_sec <= 0) {
+        CRANE_TRACE("Task deadline already reached; firing immediately.");
+        EvTaskTimerCb_(true);
+      } else {
+        const bool is_deadline = ddl_sec <= time_limit_sec;
+        const int64_t secs = is_deadline ? ddl_sec : time_limit_sec;
+        AddTerminationTimer_(secs, is_deadline);
+        CRANE_TRACE("Add a {} timer of {} seconds",
+                    is_deadline ? "deadline" : "time_limit", secs);
+      }
+    } else {
+      AddTerminationTimer_(time_limit_sec, false);
+      CRANE_TRACE("Add a time_limit timer of {} seconds", time_limit_sec);
+    }
src/CraneCtld/TaskScheduler.cpp (2)

1293-1339: Don’t abort loop on first expired deadline; signal cancel async handle

Using return here drops remaining queue items and fails to wake the cancel path. Replace with continue and send m_cancel_task_async_handle_.

-    if (deadline_time <= absl::ToUnixSeconds(now)) {
+    if (deadline_time <= absl::ToUnixSeconds(now)) {
       CRANE_ERROR("Job #{}'s deadline is earlier than now", job_id);
       {
         LockGuard pending_guard(&m_pending_task_map_mtx_);
         auto pd_it = m_pending_task_map_.find(job_id);
         if (pd_it != m_pending_task_map_.end()) {
           auto& task = pd_it->second;
           m_cancel_task_queue_.enqueue(CancelPendingTaskQueueElem{
               .task = std::move(task),
               .finish_status = crane::grpc::TaskStatus::Deadline});
           m_pending_task_map_.erase(pd_it);
         }
       }
-      return;
+      m_cancel_task_async_handle_->send();
+      continue;
     }

1450-1461: Reservation end_time must use the new effective constraints (min of both)

Current code writes twice and compares against stale task->time_limit/deadline_time. Compute effective values first, then set once.

-          if (time_limit_seconds) {
-            g_meta_container->GetResvMetaPtr(task->reservation)->end_time =
-                std::min(task->StartTime() +
-                             absl::Seconds(time_limit_seconds.value()),
-                         task->deadline_time);
-          }
-
-          if (absl_deadline_time) {
-            g_meta_container->GetResvMetaPtr(task->reservation)->end_time =
-                std::min(task->StartTime() + task->time_limit,
-                         absl_deadline_time.value());
-          }
+          // Use the would-be new values if provided; otherwise fall back to current ones.
+          absl::Duration effective_time_limit =
+              time_limit_seconds ? absl::Seconds(*time_limit_seconds)
+                                 : task->time_limit;
+          absl::Time effective_deadline =
+              absl_deadline_time ? *absl_deadline_time : task->deadline_time;
+          g_meta_container->GetResvMetaPtr(task->reservation)->end_time =
+              std::min(task->StartTime() + effective_time_limit,
+                       effective_deadline);
🧹 Nitpick comments (4)
src/CraneCtld/DbClient.cpp (1)

453-455: Optional: choose a clearer sentinel for “no deadline” in queries

Using INT64_MAX as default may confuse consumers; consider omitting the field or using 0 and documenting semantics. If keeping INT64_MAX, document it in API.

protos/Supervisor.proto (1)

104-112: Document units for new fields; ensure clients regenerate

Add brief comments: time_limit_seconds in seconds; deadline_time as Unix epoch seconds. Regenerate stubs to pick up ‘optional’ presence bits.

Also applies to: 131-131

src/CraneCtld/TaskScheduler.h (1)

825-828: Confirm downstream usage of finish_status in cancel path

New finish_status in CancelPendingTaskQueueElem requires CancelTaskAsyncCb_ and status propagation to honor Deadline vs Cancelled.

src/Craned/Supervisor/TaskManager.cpp (1)

1785-1801: Map deadline-triggered Calloc termination to Deadline, not TimeLimit

The Calloc branch always reports ExceedTimeLimit. Use is_deadline to emit Deadline for consistency.

-  if (m_step_.IsCalloc()) {
-    // For calloc job, we use job id, send TaskFinish_ directly.
-    TaskFinish_(m_step_.job_id, crane::grpc::TaskStatus::ExceedTimeLimit,
-                ExitCode::EC_EXCEED_TIME_LIMIT, std::nullopt);
+  if (m_step_.IsCalloc()) {
+    const auto status = is_deadline ? crane::grpc::TaskStatus::Deadline
+                                    : crane::grpc::TaskStatus::ExceedTimeLimit;
+    TaskFinish_(m_step_.job_id, status, ExitCode::EC_EXCEED_TIME_LIMIT,
+                is_deadline ? std::optional<std::string>("Reached task's deadline")
+                            : std::nullopt);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d09c97 and 0ec11b5.

📒 Files selected for processing (33)
  • docs/en/command/cacct.md (2 hunks)
  • docs/en/command/calloc.md (1 hunks)
  • docs/en/command/cbatch.md (1 hunks)
  • docs/en/command/ccontrol.md (2 hunks)
  • docs/en/command/cqueue.md (2 hunks)
  • docs/en/command/crun.md (1 hunks)
  • docs/zh/command/cacct.md (2 hunks)
  • docs/zh/command/calloc.md (1 hunks)
  • docs/zh/command/cbatch.md (1 hunks)
  • docs/zh/command/ccontrol.md (2 hunks)
  • docs/zh/command/cqueue.md (2 hunks)
  • docs/zh/command/crun.md (1 hunks)
  • protos/Crane.proto (4 hunks)
  • protos/PublicDefs.proto (7 hunks)
  • protos/Supervisor.proto (2 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (6 hunks)
  • src/CraneCtld/CtldPublicDefs.h (2 hunks)
  • src/CraneCtld/DbClient.cpp (10 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.cpp (1 hunks)
  • src/CraneCtld/RpcService/CranedKeeper.h (1 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (2 hunks)
  • src/CraneCtld/TaskScheduler.cpp (16 hunks)
  • src/CraneCtld/TaskScheduler.h (5 hunks)
  • src/Craned/Core/CranedServer.cpp (1 hunks)
  • src/Craned/Core/CranedServer.h (1 hunks)
  • src/Craned/Core/SupervisorKeeper.cpp (1 hunks)
  • src/Craned/Core/SupervisorKeeper.h (1 hunks)
  • src/Craned/Supervisor/SupervisorServer.cpp (1 hunks)
  • src/Craned/Supervisor/SupervisorServer.h (1 hunks)
  • src/Craned/Supervisor/TaskManager.cpp (10 hunks)
  • src/Craned/Supervisor/TaskManager.h (7 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (1 hunks)
  • src/Utilities/PublicHeader/include/crane/String.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • docs/zh/command/cqueue.md
  • docs/en/command/ccontrol.md
  • docs/en/command/crun.md
  • docs/zh/command/calloc.md
  • docs/en/command/cacct.md
  • docs/zh/command/ccontrol.md
  • src/CraneCtld/CtldPublicDefs.cpp
  • docs/en/command/cbatch.md
  • src/CraneCtld/CtldPublicDefs.h
  • docs/en/command/calloc.md
🧰 Additional context used
🧠 Learnings (14)
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

Applied to files:

  • src/Craned/Core/CranedServer.h
  • src/Craned/Supervisor/SupervisorServer.h
  • protos/PublicDefs.proto
  • src/Craned/Core/CranedServer.cpp
  • src/Craned/Core/SupervisorKeeper.h
  • src/Craned/Supervisor/TaskManager.cpp
  • src/CraneCtld/RpcService/CranedKeeper.cpp
  • src/Craned/Supervisor/SupervisorServer.cpp
📚 Learning: 2025-05-26T11:04:30.580Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.h:32-32
Timestamp: 2025-05-26T11:04:30.580Z
Learning: The Supervisor component in the Crane system is designed to manage only one task per instance. The task specification is provided to the Supervisor during startup by Craned (likely through InitSupervisorRequest), so the ExecuteTask() method doesn't need to accept task parameters since the Supervisor already knows which task to execute.

Applied to files:

  • src/Craned/Supervisor/SupervisorServer.h
  • src/Craned/Core/SupervisorKeeper.cpp
  • src/Craned/Core/SupervisorKeeper.h
  • src/Craned/Supervisor/SupervisorServer.cpp
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/Craned/Core/CranedServer.cpp
  • src/CraneCtld/RpcService/CranedKeeper.h
  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
  • src/Craned/Supervisor/TaskManager.cpp
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
  • src/CraneCtld/RpcService/CranedKeeper.cpp
📚 Learning: 2025-05-09T02:15:30.422Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/TaskScheduler.cpp:1713-1721
Timestamp: 2025-05-09T02:15:30.422Z
Learning: TaskScheduler中的资源回收设计:当Craned节点离线时,通过TaskStatusChangeAsync异步处理任务状态变更和资源回收。尽管可能存在短暂的资源更新延迟,但这是设计上可接受的权衡。

Applied to files:

  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.

Applied to files:

  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Applied to files:

  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
  • src/Craned/Core/SupervisorKeeper.h
  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Applied to files:

  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
  • src/Craned/Core/SupervisorKeeper.h
  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-08-12T08:58:39.772Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 577
File: src/Craned/Supervisor/TaskManager.cpp:124-130
Timestamp: 2025-08-12T08:58:39.772Z
Learning: In the CraneSched project using C++23, Class Template Argument Deduction (CTAD) allows std::unique_ptr declarations without explicit template parameters when the type can be deduced from the initializer, such as `std::unique_ptr task = std::move(m_task_map_.at(task_id))` where the template parameter is deduced from the move operation.

Applied to files:

  • src/CraneCtld/TaskScheduler.h
  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-07-01T08:00:05.383Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-09T02:16:56.723Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1498-1506
Timestamp: 2025-05-09T02:16:56.723Z
Learning: The `QueryRunningTasksAsync()` method in TaskManager.cpp is designed to never be called from inside the event loop thread, so there's no risk of deadlock with the synchronous `res.get()` call.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-25T04:11:50.268Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Supervisor/TaskManager.cpp:220-220
Timestamp: 2025-05-25T04:11:50.268Z
Learning: In TaskManager.cpp, when step->IsCrun() is checked before dynamic_cast to CrunMetaInExecution*, the cast is guaranteed to succeed due to the program logic ensuring the correct meta type is used for Crun tasks. Null checks for these dynamic_cast operations are unnecessary in this context.

Applied to files:

  • src/Craned/Supervisor/TaskManager.cpp
  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-06-23T01:34:41.639Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 528
File: src/CraneCtld/DbClient.cpp:0-0
Timestamp: 2025-06-23T01:34:41.639Z
Learning: MemoryBytes() method returns uint64_t type in the CraneSched codebase, and when storing in MongoDB BSON documents, it should be cast to int64_t to maintain consistency with the existing schema and enable numeric database operations.

Applied to files:

  • src/Craned/Supervisor/TaskManager.cpp
📚 Learning: 2025-05-02T07:12:46.896Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/Craned/CranedServer.h:0-0
Timestamp: 2025-05-02T07:12:46.896Z
Learning: The initialization of `m_supervisor_recovered_` to `true` in the CranedServer class is intentional despite the comment saying "When supervisor ready, init with false". This is temporary until the supervisor functionality is fully implemented.

Applied to files:

  • src/Craned/Supervisor/TaskManager.cpp
🧬 Code graph analysis (12)
src/Craned/Core/CranedServer.h (2)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (323-356)
  • ChangeJobTimeConstraint (323-326)
src/CraneCtld/RpcService/CranedKeeper.cpp (2)
  • ChangeJobTimeConstraint (272-307)
  • ChangeJobTimeConstraint (272-274)
src/Craned/Supervisor/SupervisorServer.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1379-1512)
  • ChangeTaskTimeConstraint (1379-1381)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (91-112)
  • ChangeTaskTimeConstraint (91-93)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (67-89)
  • ChangeTaskTimeConstraint (67-70)
src/Craned/Core/CranedServer.cpp (1)
src/CraneCtld/RpcService/CranedKeeper.cpp (2)
  • ChangeJobTimeConstraint (272-307)
  • ChangeJobTimeConstraint (272-274)
src/CraneCtld/RpcService/CranedKeeper.h (2)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (323-356)
  • ChangeJobTimeConstraint (323-326)
src/CraneCtld/RpcService/CranedKeeper.cpp (2)
  • ChangeJobTimeConstraint (272-307)
  • ChangeJobTimeConstraint (272-274)
src/CraneCtld/TaskScheduler.h (3)
src/CraneCtld/TaskScheduler.cpp (10)
  • ChangeTaskTimeConstraint (1379-1512)
  • ChangeTaskTimeConstraint (1379-1381)
  • DeadlineTimerThread_ (731-750)
  • DeadlineTimerThread_ (731-731)
  • CancelDeadlineTaskCb_ (1350-1365)
  • CancelDeadlineTaskCb_ (1350-1350)
  • CreateDeadlineTimerCb_ (1293-1339)
  • CreateDeadlineTimerCb_ (1293-1293)
  • DelDeadlineTimer_ (1341-1348)
  • DelDeadlineTimer_ (1341-1341)
src/Craned/Core/SupervisorKeeper.cpp (3)
  • ChangeTaskTimeConstraint (91-112)
  • ChangeTaskTimeConstraint (91-93)
  • pair (132-192)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (67-89)
  • ChangeTaskTimeConstraint (67-70)
src/Craned/Core/SupervisorKeeper.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1379-1512)
  • ChangeTaskTimeConstraint (1379-1381)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (67-89)
  • ChangeTaskTimeConstraint (67-70)
src/CraneCtld/TaskScheduler.cpp (4)
src/Utilities/PublicHeader/String.cpp (2)
  • SetCurrentThreadName (354-362)
  • SetCurrentThreadName (354-354)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (91-112)
  • ChangeTaskTimeConstraint (91-93)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (67-89)
  • ChangeTaskTimeConstraint (67-70)
src/CraneCtld/CtldPublicDefs.h (1)
  • CheckIfTimeLimitSecIsValid (1062-1064)
src/Craned/Core/SupervisorKeeper.h (3)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1379-1512)
  • ChangeTaskTimeConstraint (1379-1381)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (91-112)
  • ChangeTaskTimeConstraint (91-93)
src/Craned/Supervisor/SupervisorServer.cpp (2)
  • ChangeTaskTimeConstraint (67-89)
  • ChangeTaskTimeConstraint (67-70)
src/Craned/Supervisor/TaskManager.cpp (2)
src/Craned/Core/CranedPublicDefs.h (2)
  • g_config (156-156)
  • g_runtime_status (162-162)
src/Craned/Supervisor/TaskManager.h (2)
  • DelTerminationTimer_ (421-427)
  • AddTerminationTimer_ (410-419)
src/CraneCtld/RpcService/CranedKeeper.cpp (1)
src/Craned/Core/CranedServer.cpp (2)
  • ChangeJobTimeConstraint (323-356)
  • ChangeJobTimeConstraint (323-326)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (10)
  • EvTaskTimerCb_ (1785-1802)
  • EvTaskTimerCb_ (1785-1785)
  • ChangeTaskTimeConstraintAsync (1658-1686)
  • ChangeTaskTimeConstraintAsync (1658-1659)
  • EvCleanTaskStopQueueCb_ (1804-1897)
  • EvCleanTaskStopQueueCb_ (1804-1804)
  • EvCleanTerminateTaskQueueCb_ (1899-1969)
  • EvCleanTerminateTaskQueueCb_ (1899-1899)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1971-2023)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1971-1971)
src/Craned/Supervisor/SupervisorServer.cpp (2)
src/CraneCtld/TaskScheduler.cpp (2)
  • ChangeTaskTimeConstraint (1379-1512)
  • ChangeTaskTimeConstraint (1379-1381)
src/Craned/Core/SupervisorKeeper.cpp (2)
  • ChangeTaskTimeConstraint (91-112)
  • ChangeTaskTimeConstraint (91-93)
🪛 markdownlint-cli2 (0.18.1)
docs/zh/command/cbatch.md

55-55: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


56-56: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


57-57: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


58-58: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


59-59: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


60-60: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (21)
src/Utilities/PublicHeader/include/crane/String.h (1)

183-183: LGTM! String mapping correctly updated for Deadline status.

The update properly maps TaskStatus value 10 to "Deadline", aligning with the new deadline feature.

src/Craned/Supervisor/TaskManager.h (4)

484-484: Timer callback signature extended with deadline flag.

EvTaskTimerCb_(bool is_deadline) now distinguishes termination reasons. Verify the implementation at TaskManager.cpp:1784-1801 correctly sets status based on this flag.


516-521: Member variables renamed for time constraint clarity.

The renaming from m_change_task_time_limit_* to m_change_task_time_constraint_* improves naming consistency with the new dual-constraint model.


39-40: Code change verified and complete.

The new TERMINATION_BY_DEADLINE variant is properly handled in the switch statement at TaskManager.cpp line 1852, with appropriate status mapping (crane::grpc::TaskStatus::Deadline) and message. The only switch statement handling TerminatedBy in the codebase now covers all enum variants (explicit cases for specific termination types, plus a default case for remaining values).


398-419: All timer cleanup verified correctly.

The DelTerminationTimer_() call at line 1988 properly precedes the AddTerminationTimer_() call at line 2012 in the time constraint change handler. The second AddTerminationTimer_() call at line 2049 in the task execution handler is for initial timer setup where no prior deletion is needed. The is_deadline parameter is correctly implemented to distinguish deadline from time-limit expirations. No dual-fire scenarios exist.

docs/en/command/cqueue.md (2)

27-27: Deadline field documented.

The new output field appropriately documents the deadline display feature.


67-67: Format identifier for deadline added.

The %deadline and %Deadline format tokens appropriately support deadline display in custom formats.

docs/zh/command/crun.md (1)

14-14: Deadline option documented for crun.

The addition appropriately documents the --deadline CLI option for interactive tasks.

src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)

125-125: Exit code for deadline termination added.

The new EC_REACHED_DEADLINE appropriately extends the exit code enum to signal deadline-based terminations.

docs/zh/command/cacct.md (2)

54-54: Format identifier for deadline added.

The %deadline/%Deadline format option appropriately documents deadline display in custom output formats.


84-84: Display option for deadline documented.

The --deadline string option appropriately documents the deadline display feature.

src/Craned/Supervisor/SupervisorServer.h (1)

53-57: LGTM: RPC rename and types align with proto

src/Craned/Core/CranedServer.cpp (1)

323-356: LGTM: forwards optional time_limit_seconds/deadline_time to Supervisor

Handler matches proto and system semantics; logs and response handling look consistent.

src/Craned/Core/CranedServer.h (1)

97-101: LGTM: signature matches implementation and proto

src/CraneCtld/TaskScheduler.h (2)

762-765: LGTM: public API widened to optionals; matches callers


964-1070: Verify uvw loop/handles are constructed and used on the same thread

Ensure m_job_deadline_timer_thread_ owns uvw_deadline_loop and that async/timer handles are created/used within that loop’s thread to avoid races.

src/Craned/Core/SupervisorKeeper.cpp (1)

91-110: Forwarding optional time constraints looks correct

The stub now mirrors both optional fields cleanly, so Supervisor receives exactly what Ctld intends.

src/CraneCtld/RpcService/CranedKeeper.cpp (1)

272-307: Nice alignment with the new RPC shape

Conditional population of time limit vs. deadline keeps the wire format minimal while matching Supervisor expectations.

protos/PublicDefs.proto (1)

121-122: Good to see Deadline enumerated

Adding the explicit Deadline state gives downstream consumers a first-class signal when a task stops because of deadline enforcement.

protos/Crane.proto (1)

152-159: API rename and optional deadline field look solid

The new request names and optional deadline_time keep the contract clear for both CLI and Supervisor callers.

src/Craned/Supervisor/SupervisorServer.cpp (1)

67-88: Server handler matches the new constraint semantics

Extracting the optionals and delegating to ChangeTaskTimeConstraintAsync preserves previous behavior while adding deadline support.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/CraneCtld/CtldPublicDefs.cpp (1)

1024-1026: Reset deadline_time when the proto omits it.

This issue has already been flagged in a previous review. When TaskInCtld instances are reused and the proto doesn't specify deadline_time, the field retains stale values. Add an else clause to reset to absl::InfiniteFuture().

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ec11b5 and 2fe4613.

📒 Files selected for processing (2)
  • src/CraneCtld/CtldPublicDefs.cpp (6 hunks)
  • src/Craned/Supervisor/TaskManager.h (7 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-05-25T04:11:50.268Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Supervisor/TaskManager.cpp:220-220
Timestamp: 2025-05-25T04:11:50.268Z
Learning: In TaskManager.cpp, when step->IsCrun() is checked before dynamic_cast to CrunMetaInExecution*, the cast is guaranteed to succeed due to the program logic ensuring the correct meta type is used for Crun tasks. Null checks for these dynamic_cast operations are unnecessary in this context.

Applied to files:

  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Applied to files:

  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Applied to files:

  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-05-09T02:16:56.723Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1498-1506
Timestamp: 2025-05-09T02:16:56.723Z
Learning: The `QueryRunningTasksAsync()` method in TaskManager.cpp is designed to never be called from inside the event loop thread, so there's no risk of deadlock with the synchronous `res.get()` call.

Applied to files:

  • src/Craned/Supervisor/TaskManager.h
📚 Learning: 2025-05-23T02:32:43.952Z
Learnt from: 1daidai1
Repo: PKUHPC/CraneSched PR: 458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is updated before calling SetAllocatedRes, so it doesn't need to be updated again within the method itself.

Applied to files:

  • src/CraneCtld/CtldPublicDefs.cpp
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/CtldPublicDefs.cpp
🧬 Code graph analysis (1)
src/Craned/Supervisor/TaskManager.h (1)
src/Craned/Supervisor/TaskManager.cpp (10)
  • EvTaskTimerCb_ (1785-1802)
  • EvTaskTimerCb_ (1785-1785)
  • ChangeTaskTimeConstraintAsync (1658-1686)
  • ChangeTaskTimeConstraintAsync (1658-1659)
  • EvCleanTaskStopQueueCb_ (1804-1897)
  • EvCleanTaskStopQueueCb_ (1804-1804)
  • EvCleanTerminateTaskQueueCb_ (1899-1969)
  • EvCleanTerminateTaskQueueCb_ (1899-1899)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1971-2023)
  • EvCleanChangeTaskTimeConstraintQueueCb_ (1971-1971)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (9)
src/CraneCtld/CtldPublicDefs.cpp (2)

552-552: LGTM!

Correctly copies deadline_time from the job to the primary step during initialization.


618-618: LGTM!

Serialization of deadline_time into proto messages (StepToCtld, StepToD, TaskInfo) is correctly implemented with proper time conversions.

Also applies to: 668-669, 1141-1141

src/Craned/Supervisor/TaskManager.h (7)

39-40: LGTM: Enum addition is clean.

The TERMINATION_BY_DEADLINE value is correctly added to support the new deadline termination path.


398-408: Timer logic correctly captures deadline flag.

The lambda properly captures is_deadline and forwards it to EvTaskTimerCb_, ensuring the timer callback knows which constraint triggered termination.


410-419: LGTM: Overload mirrors template version correctly.

The int64_t overload maintains consistency with the template version and properly forwards the deadline flag.


469-473: LGTM: Struct evolution matches API changes.

The renamed struct and optional fields properly support the dual time-constraint model.


484-484: LGTM: Callback signature updated for deadline awareness.

The signature change enables the timer callback to distinguish between time limit and deadline expirations.


490-490: LGTM: Callback rename maintains consistency.

The rename aligns with the broader time-constraint refactoring.


516-521: LGTM: Member variable renames are consistent.

All time-constraint-related members are renamed uniformly, maintaining code clarity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants