Test fixes #1718

terrykong · 2026-01-05T23:37:29Z

mcore generation config: fix: mcore generation config restored in nightly test #1720
gemma skip tokenizer fix: fix: gemma3 27b must now have skip_tokenizer_init=False in vllm #1721
seq parallel + tp no longer crashing: fix: remove seq_parallel + tp restriction in dtensor v2 #1725
cpu offload bug in v1: fix: apply offloading change from v2 to v1 #1726
rm checkpoint dir if successful (same PR as 5)
median metric change: fix: use median instead of mean for logprob error for stability in nightlies #1722
logger fix (val metrics were skipped): fix: log metrics that can be coerced to scalars #1723
increase time for some tests that failed due to model download: fix: fix several nightly tests that were flaky #1724
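
As a rough illustration of two of the items above (median instead of mean for logprob error, and logging metrics that can be coerced to scalars), here is a minimal sketch. It is not the code from #1722 or #1723; the function names `summarize_logprob_error`, `coerce_to_scalar`, and `loggable_metrics` are hypothetical and only the standard library is assumed.

```python
import statistics
from typing import Any, Optional


def summarize_logprob_error(errors: list[float]) -> dict[str, float]:
    """Return mean and median side by side so an outlier's effect is visible.

    Hypothetical helper: a single large error moves the mean a lot,
    while the median stays near the typical value.
    """
    return {
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
    }


def coerce_to_scalar(value: Any) -> Optional[float]:
    """Return value as a float if float() accepts it, otherwise None.

    Note: float() also accepts numeric strings such as "1.5"; a stricter
    check may be wanted in practice.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def loggable_metrics(metrics: dict[str, Any]) -> dict[str, float]:
    """Keep only the metrics that can be coerced to a scalar float."""
    out: dict[str, float] = {}
    for name, value in metrics.items():
        scalar = coerce_to_scalar(value)
        if scalar is not None:
            out[name] = scalar
    return out
```

With a per-step error list that contains one spike, the median stays close to the typical value while the mean is dominated by the spike, which is the kind of instability a nightly threshold check would be sensitive to. Objects that implement `__float__` (for example zero-dimensional arrays) would pass through `coerce_to_scalar` under this sketch rather than being skipped.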

Signed-off-by: Terry Kong <[email protected]>

github-actions · 2026-01-05T23:37:51Z

ℹ️ File Consistency Check

Check based on commit: 4d435d8 (PR #1718 from test-fixes)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
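
For readers unfamiliar with this kind of check, the idea can be sketched as follows. This is an assumption about how such a check might work, not the actual workflow used by the bot; `check_pair_sync` and `DTENSOR_WORKERS` are hypothetical names, and the list of changed files is assumed to come from something like the PR diff.

```python
from typing import Iterable, Sequence

# The two paths named in the bot comment above.
DTENSOR_WORKERS = (
    "nemo_rl/models/policy/workers/dtensor_policy_worker.py",
    "nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py",
)


def check_pair_sync(
    changed_files: Iterable[str],
    pair: Sequence[str] = DTENSOR_WORKERS,
) -> str:
    """Classify a change set against a pair of files meant to stay in sync.

    Returns "both" if every file in the pair was modified, "one" if only
    some were (the case that would warrant a warning), and "none" if the
    pair was not touched at all.
    """
    changed = set(changed_files)
    touched = [path for path in pair if path in changed]
    if len(touched) == len(pair):
        return "both"
    if touched:
        return "one"
    return "none"
```

In this PR both files were modified, which corresponds to the "both" case; a result of "one" is where a reviewer would want an explanation of why the files intentionally differ.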

terrykong added 4 commits January 5, 2026 15:03

lots of fixes

6ac01b5

Signed-off-by: Terry Kong <[email protected]>

rm -rf ckpt dir that succeed

c557f31

Signed-off-by: Terry Kong <[email protected]>

median

e5feff2

Signed-off-by: Terry Kong <[email protected]>

more fixes

4d435d8

Signed-off-by: Terry Kong <[email protected]>
