Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR introduces a new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/llm_qad/configs/qwen3-8b_template.conf`:
- Line 58: Add a required-variable validation for LOG_DIR in sbatch_qad.sh
similar to the existing checks for other vars: detect if LOG_DIR is empty at the
start of the script and print an error then exit (matching the pattern used for
other variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/...
usage cannot proceed with an empty path; reference the LOG_DIR variable, the
mkdir -p invocation, and the srun --output usage when placing the new check.
In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml`:
- Around line 5-6: The YAML key text_encoder_path is misindented as a sibling of
load_checkpoint rather than a child, breaking the expected
model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path
under load_checkpoint (adjust indentation) so that the configuration key becomes
model.load_checkpoint.text_encoder_path and consumers of
load_checkpoint/text_encoder_path find the value correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e867e57c-47a3-4ef0-9fcd-5b65630fdba7
📒 Files selected for processing (4)
examples/llm_qad/configs/qwen3-30b-a3b-instruct-2507-moe_template.confexamples/llm_qad/configs/qwen3-8b_template.confexamples/llm_qad/sbatch_qad.shexamples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
💤 Files with no reviewable changes (1)
- examples/llm_qad/sbatch_qad.sh
| export STUDENT_CONFIG_FILE="" # path to student model args script, e.g., ${MLM_DIR}/examples/post_training/modelopt/conf/Qwen/Qwen3-8B.sh | ||
| export QAD_CHECKPOINT_ROOT="" # path to store QAD checkpoints | ||
| export DATACACHE_DIR="" # path to data cache directory | ||
| export LOG_DIR="" # path to log directory |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if sbatch_qad.sh has any validation for LOG_DIR or other required variables
echo "=== Checking for LOG_DIR validation in sbatch_qad.sh ==="
rg -n 'LOG_DIR' examples/llm_qad/sbatch_qad.sh
echo ""
echo "=== Checking for any variable validation patterns (e.g., -z checks) ==="
rg -n '\[\[.*-z.*\]\]|\[\s+-z' examples/llm_qad/sbatch_qad.sh
echo ""
echo "=== Checking for required variable error handling ==="
rg -n 'required|REQUIRED|must be set|not set' examples/llm_qad/sbatch_qad.shRepository: NVIDIA/Model-Optimizer
Length of output: 988
Add validation for LOG_DIR in sbatch_qad.sh before use.
The script validates other required variables (lines 63-64, 102-103) but omits validation for LOG_DIR, which is marked as REQUIRED in the config. When empty, mkdir -p "" silently does nothing (line 74) and srun --output=${LOG_DIR}/%x_... (lines 146-147) becomes --output=/%x_..., attempting to write logs to the root filesystem.
Add a validation check at the start of the script, similar to the existing patterns:
[[ -z "${LOG_DIR:-}" ]] && echo "ERROR: LOG_DIR required" && exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_qad/configs/qwen3-8b_template.conf` at line 58, Add a
required-variable validation for LOG_DIR in sbatch_qad.sh similar to the
existing checks for other vars: detect if LOG_DIR is empty at the start of the
script and print an error then exit (matching the pattern used for other
variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/... usage
cannot proceed with an empty path; reference the LOG_DIR variable, the mkdir -p
invocation, and the srun --output usage when placing the new check.
| load_checkpoint: | ||
| text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma" | ||
| text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path |
There was a problem hiding this comment.
Fix YAML nesting under load_checkpoint to preserve schema.
text_encoder_path is currently aligned as a sibling of load_checkpoint instead of a child key. This changes the config structure and can break consumers expecting model.load_checkpoint.text_encoder_path.
Proposed fix
model:
model_path: "/path/to/ltx2/checkpoint.safetensors" # TODO: Set your LTX-2 checkpoint path
training_mode: "full"
load_checkpoint:
- text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path
+ text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| load_checkpoint: | |
| text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma" | |
| text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path | |
| load_checkpoint: | |
| text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines
5 - 6, The YAML key text_encoder_path is misindented as a sibling of
load_checkpoint rather than a child, breaking the expected
model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path
under load_checkpoint (adjust indentation) so that the configuration key becomes
model.load_checkpoint.text_encoder_path and consumers of
load_checkpoint/text_encoder_path find the value correctly.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1148 +/- ##
==========================================
- Coverage 70.21% 70.19% -0.03%
==========================================
Files 230 230
Lines 26073 26073
==========================================
- Hits 18308 18302 -6
- Misses 7765 7771 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| fi | ||
| fi | ||
|
|
||
| # === Default Paths (override in config) === |
There was a problem hiding this comment.
removing seems too brutal. keeping them with some sane default value is good that people know what env vars are expected.
Remove internal /lusre paths
Summary by CodeRabbit
LOG_DIRenvironment variable support to configuration templates.