Skip to content

Remove internal lustre paths#1148

Open
kevalmorabia97 wants to merge 1 commit intomainfrom
kmorabia/remove-lustre
Open

Remove internal lustre paths#1148
kevalmorabia97 wants to merge 1 commit intomainfrom
kmorabia/remove-lustre

Conversation

@kevalmorabia97
Copy link
Copy Markdown
Collaborator

@kevalmorabia97 kevalmorabia97 commented Mar 31, 2026

Remove internal /lusre paths

Summary by CodeRabbit

  • Configuration Updates
    • Added LOG_DIR environment variable support to configuration templates.
    • Removed built-in default values for environment variables; configuration parameters must now be explicitly provided via configuration files or environment variables.
    • Updated example configuration paths to use placeholders requiring user customization.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This PR introduces a new LOG_DIR environment variable to configuration files, removes built-in default path initialization from a shell script, and updates YAML configuration paths to use relative placeholders with TODO comments. The changes shift configuration management from script-embedded defaults to externally-sourced environment variables.

Changes

Cohort / File(s) Summary
Log Directory Environment Variable
examples/llm_qad/configs/qwen3-30b-a3b-instruct-2507-moe_template.conf, examples/llm_qad/configs/qwen3-8b_template.conf
Added export LOG_DIR="" environment variable to both configuration files, extending the set of externally-provided paths.
Default Initialization Removal
examples/llm_qad/sbatch_qad.sh
Removed built-in default assignments for model directories, checkpoints root, data cache, log directory, and container image/mounts/workdir. These variables must now be provided via sourced config or environment prior to use.
Path Placeholder Updates
examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
Replaced absolute filesystem paths with relative placeholders (/path/to/...) and added inline TODO comments for model_path, text_encoder_path, and preprocessed_data_root.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Remove internal lustre paths' directly aligns with the changeset, which removes absolute filesystem paths and replaces them with placeholders across configuration files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Security Anti-Patterns ✅ Passed PR modifies only Bash configuration files, shell scripts, and YAML files with no Python code changes. Security anti-patterns check applies only to Python-specific vulnerabilities.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kmorabia/remove-lustre

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qad/configs/qwen3-8b_template.conf`:
- Line 58: Add a required-variable validation for LOG_DIR in sbatch_qad.sh
similar to the existing checks for other vars: detect if LOG_DIR is empty at the
start of the script and print an error then exit (matching the pattern used for
other variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/...
usage cannot proceed with an empty path; reference the LOG_DIR variable, the
mkdir -p invocation, and the srun --output usage when placing the new check.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml`:
- Around line 5-6: The YAML key text_encoder_path is misindented as a sibling of
load_checkpoint rather than a child, breaking the expected
model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path
under load_checkpoint (adjust indentation) so that the configuration key becomes
model.load_checkpoint.text_encoder_path and consumers of
load_checkpoint/text_encoder_path find the value correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e867e57c-47a3-4ef0-9fcd-5b65630fdba7

📥 Commits

Reviewing files that changed from the base of the PR and between 74a8694 and ebc6aa8.

📒 Files selected for processing (4)
  • examples/llm_qad/configs/qwen3-30b-a3b-instruct-2507-moe_template.conf
  • examples/llm_qad/configs/qwen3-8b_template.conf
  • examples/llm_qad/sbatch_qad.sh
  • examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml
💤 Files with no reviewable changes (1)
  • examples/llm_qad/sbatch_qad.sh

export STUDENT_CONFIG_FILE="" # path to student model args script, e.g., ${MLM_DIR}/examples/post_training/modelopt/conf/Qwen/Qwen3-8B.sh
export QAD_CHECKPOINT_ROOT="" # path to store QAD checkpoints
export DATACACHE_DIR="" # path to data cache directory
export LOG_DIR="" # path to log directory
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if sbatch_qad.sh has any validation for LOG_DIR or other required variables

echo "=== Checking for LOG_DIR validation in sbatch_qad.sh ==="
rg -n 'LOG_DIR' examples/llm_qad/sbatch_qad.sh

echo ""
echo "=== Checking for any variable validation patterns (e.g., -z checks) ==="
rg -n '\[\[.*-z.*\]\]|\[\s+-z' examples/llm_qad/sbatch_qad.sh

echo ""
echo "=== Checking for required variable error handling ==="
rg -n 'required|REQUIRED|must be set|not set' examples/llm_qad/sbatch_qad.sh

Repository: NVIDIA/Model-Optimizer

Length of output: 988


Add validation for LOG_DIR in sbatch_qad.sh before use.

The script validates other required variables (lines 63-64, 102-103) but omits validation for LOG_DIR, which is marked as REQUIRED in the config. When empty, mkdir -p "" silently does nothing (line 74) and srun --output=${LOG_DIR}/%x_... (lines 146-147) becomes --output=/%x_..., attempting to write logs to the root filesystem.

Add a validation check at the start of the script, similar to the existing patterns:

[[ -z "${LOG_DIR:-}" ]] && echo "ERROR: LOG_DIR required" && exit 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qad/configs/qwen3-8b_template.conf` at line 58, Add a
required-variable validation for LOG_DIR in sbatch_qad.sh similar to the
existing checks for other vars: detect if LOG_DIR is empty at the start of the
script and print an error then exit (matching the pattern used for other
variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/... usage
cannot proceed with an empty path; reference the LOG_DIR variable, the mkdir -p
invocation, and the srun --output usage when placing the new check.

Comment on lines 5 to +6
load_checkpoint:
text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"
text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix YAML nesting under load_checkpoint to preserve schema.

text_encoder_path is currently aligned as a sibling of load_checkpoint instead of a child key. This changes the config structure and can break consumers expecting model.load_checkpoint.text_encoder_path.

Proposed fix
 model:
   model_path: "/path/to/ltx2/checkpoint.safetensors"  # TODO: Set your LTX-2 checkpoint path
   training_mode: "full"
   load_checkpoint:
-  text_encoder_path: "/path/to/gemma"  # TODO: Set your Gemma text encoder path
+    text_encoder_path: "/path/to/gemma"  # TODO: Set your Gemma text encoder path
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
load_checkpoint:
text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"
text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path
load_checkpoint:
text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines
5 - 6, The YAML key text_encoder_path is misindented as a sibling of
load_checkpoint rather than a child, breaking the expected
model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path
under load_checkpoint (adjust indentation) so that the configuration key becomes
model.load_checkpoint.text_encoder_path and consumers of
load_checkpoint/text_encoder_path find the value correctly.

@github-actions
Copy link
Copy Markdown
Contributor

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1148/

Built to branch gh-pages at 2026-03-31 16:40 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.19%. Comparing base (74a8694) to head (ebc6aa8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   70.21%   70.19%   -0.03%     
==========================================
  Files         230      230              
  Lines       26073    26073              
==========================================
- Hits        18308    18302       -6     
- Misses       7765     7771       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fi
fi

# === Default Paths (override in config) ===
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing seems too brutal. keeping them with some sane default value is good that people know what env vars are expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants