Remove internal lustre paths by kevalmorabia97 · Pull Request #1148 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-03-31T16:36:16Z

Remove internal /lusre paths

Summary by CodeRabbit

Configuration Updates
- Added LOG_DIR environment variable support to configuration templates.
- Removed built-in default values for environment variables; configuration parameters must now be explicitly provided via configuration files or environment variables.
- Updated example configuration paths to use placeholders requiring user customization.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

coderabbitai · 2026-03-31T16:36:30Z

📝 Walkthrough

Walkthrough

This PR introduces a new LOG_DIR environment variable to configuration files, removes built-in default path initialization from a shell script, and updates YAML configuration paths to use relative placeholders with TODO comments. The changes shift configuration management from script-embedded defaults to externally-sourced environment variables.

Changes

Cohort / File(s)	Summary
Log Directory Environment Variable `examples/llm_qad/configs/qwen3-30b-a3b-instruct-2507-moe_template.conf`, `examples/llm_qad/configs/qwen3-8b_template.conf`	Added `export LOG_DIR=""` environment variable to both configuration files, extending the set of externally-provided paths.
Default Initialization Removal `examples/llm_qad/sbatch_qad.sh`	Removed built-in default assignments for model directories, checkpoints root, data cache, log directory, and container image/mounts/workdir. These variables must now be provided via sourced config or environment prior to use.
Path Placeholder Updates `examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml`	Replaced absolute filesystem paths with relative placeholders (`/path/to/...`) and added inline TODO comments for `model_path`, `text_encoder_path`, and `preprocessed_data_root`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Remove internal lustre paths' directly aligns with the changeset, which removes absolute filesystem paths and replaces them with placeholders across configuration files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Security Anti-Patterns	✅ Passed	PR modifies only Bash configuration files, shell scripts, and YAML files with no Python code changes. Security anti-patterns check applies only to Python-specific vulnerabilities.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch kmorabia/remove-lustre

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qad/configs/qwen3-8b_template.conf`:
- Line 58: Add a required-variable validation for LOG_DIR in sbatch_qad.sh
similar to the existing checks for other vars: detect if LOG_DIR is empty at the
start of the script and print an error then exit (matching the pattern used for
other variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/...
usage cannot proceed with an empty path; reference the LOG_DIR variable, the
mkdir -p invocation, and the srun --output usage when placing the new check.

In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml`:
- Around line 5-6: The YAML key text_encoder_path is misindented as a sibling of
load_checkpoint rather than a child, breaking the expected
model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path
under load_checkpoint (adjust indentation) so that the configuration key becomes
model.load_checkpoint.text_encoder_path and consumers of
load_checkpoint/text_encoder_path find the value correctly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e867e57c-47a3-4ef0-9fcd-5b65630fdba7

📥 Commits

Reviewing files that changed from the base of the PR and between 74a8694 and ebc6aa8.

📒 Files selected for processing (4)

examples/llm_qad/configs/qwen3-30b-a3b-instruct-2507-moe_template.conf
examples/llm_qad/configs/qwen3-8b_template.conf
examples/llm_qad/sbatch_qad.sh
examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml

💤 Files with no reviewable changes (1)

examples/llm_qad/sbatch_qad.sh

coderabbitai · 2026-03-31T16:39:45Z

examples/llm_qad/configs/qwen3-8b_template.conf

 export STUDENT_CONFIG_FILE="" # path to student model args script, e.g., ${MLM_DIR}/examples/post_training/modelopt/conf/Qwen/Qwen3-8B.sh
 export QAD_CHECKPOINT_ROOT="" # path to store QAD checkpoints
 export DATACACHE_DIR="" # path to data cache directory
+export LOG_DIR="" # path to log directory


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check if sbatch_qad.sh has any validation for LOG_DIR or other required variables echo "=== Checking for LOG_DIR validation in sbatch_qad.sh ===" rg -n 'LOG_DIR' examples/llm_qad/sbatch_qad.sh echo "" echo "=== Checking for any variable validation patterns (e.g., -z checks) ===" rg -n '\[\[.*-z.*\]\]|\[\s+-z' examples/llm_qad/sbatch_qad.sh echo "" echo "=== Checking for required variable error handling ===" rg -n 'required|REQUIRED|must be set|not set' examples/llm_qad/sbatch_qad.sh

Repository: NVIDIA/Model-Optimizer

Length of output: 988

Add validation for LOG_DIR in sbatch_qad.sh before use.

The script validates other required variables (lines 63-64, 102-103) but omits validation for LOG_DIR, which is marked as REQUIRED in the config. When empty, mkdir -p "" silently does nothing (line 74) and srun --output=${LOG_DIR}/%x_... (lines 146-147) becomes --output=/%x_..., attempting to write logs to the root filesystem.

Add a validation check at the start of the script, similar to the existing patterns:

[[ -z "${LOG_DIR:-}" ]] && echo "ERROR: LOG_DIR required" && exit 1

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/llm_qad/configs/qwen3-8b_template.conf` at line 58, Add a required-variable validation for LOG_DIR in sbatch_qad.sh similar to the existing checks for other vars: detect if LOG_DIR is empty at the start of the script and print an error then exit (matching the pattern used for other variables), so mkdir -p "${LOG_DIR}" and the srun --output=${LOG_DIR}/... usage cannot proceed with an empty path; reference the LOG_DIR variable, the mkdir -p invocation, and the srun --output usage when placing the new check.

coderabbitai · 2026-03-31T16:39:45Z

examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml

  load_checkpoint:
-  text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"
+  text_encoder_path: "/path/to/gemma"  # TODO: Set your Gemma text encoder path


⚠️ Potential issue | 🔴 Critical

Fix YAML nesting under load_checkpoint to preserve schema.

text_encoder_path is currently aligned as a sibling of load_checkpoint instead of a child key. This changes the config structure and can break consumers expecting model.load_checkpoint.text_encoder_path.

Proposed fix

model: model_path: "/path/to/ltx2/checkpoint.safetensors" # TODO: Set your LTX-2 checkpoint path training_mode: "full" load_checkpoint: - text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path + text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

load_checkpoint:

text_encoder_path: "/lustre/fsw/portfolios/adlr/users/dhutchins/models/gemma"

text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path

load_checkpoint:

text_encoder_path: "/path/to/gemma" # TODO: Set your Gemma text encoder path

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/windows/torch_onnx/diffusers/qad_example/ltx2_qad.yaml` around lines 5 - 6, The YAML key text_encoder_path is misindented as a sibling of load_checkpoint rather than a child, breaking the expected model.load_checkpoint.text_encoder_path schema; fix by nesting text_encoder_path under load_checkpoint (adjust indentation) so that the configuration key becomes model.load_checkpoint.text_encoder_path and consumers of load_checkpoint/text_encoder_path find the value correctly.

github-actions · 2026-03-31T16:40:25Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1148/
Built to branch `gh-pages` at 2026-03-31 16:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-03-31T16:49:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.19%. Comparing base (74a8694) to head (ebc6aa8).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   70.21%   70.19%   -0.03%     
==========================================
  Files         230      230              
  Lines       26073    26073              
==========================================
- Hits        18308    18302       -6     
- Misses       7765     7771       +6

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

shengliangxu · 2026-04-01T02:15:34Z

examples/llm_qad/sbatch_qad.sh

    fi
 fi

-# === Default Paths (override in config) ===


removing seems too brutal. keeping them with some sane default value is good that people know what env vars are expected.

Remove internal lustre paths

ebc6aa8

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 requested review from ChenhanYu, cjluo-nv, meenchen, vishalpandya1990 and ynankani March 31, 2026 16:36

kevalmorabia97 requested review from a team as code owners March 31, 2026 16:36

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

meenchen approved these changes Mar 31, 2026

View reviewed changes

shengliangxu reviewed Apr 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove internal lustre paths#1148

Remove internal lustre paths#1148
kevalmorabia97 wants to merge 1 commit intomainfrom
kmorabia/remove-lustre

kevalmorabia97 commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Built to branch `gh-pages` at 2026-03-31 16:40 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov bot commented Mar 31, 2026

Uh oh!

shengliangxu Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kevalmorabia97 commented Mar 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 31, 2026

Built to branch gh-pages at 2026-03-31 16:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov bot commented Mar 31, 2026

Codecov Report

Uh oh!

shengliangxu Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kevalmorabia97 commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 31, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-03-31 16:40 UTC.
Preview will be ready when the GitHub Pages deployment is complete.