@Surya-Gunukula
Contributor

Megatron's load_checkpoint() already calls opt_param_scheduler.load_state_dict(), which internally increments num_steps. The extra step() call here advanced the scheduler a second time, doubling its position on every resume.
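
To illustrate the double counting, here is a minimal sketch using a toy scheduler. `ToyScheduler` and `resume` are hypothetical names invented for this example and are not Megatron's classes; the only behavior assumed is the one described above, namely that `load_state_dict()` advances `num_steps` itself.

```python
class ToyScheduler:
    """Hypothetical stand-in for a step-counting scheduler (not Megatron's class)."""

    def __init__(self):
        self.num_steps = 0

    def step(self, increment=1):
        # Advance the scheduler position by `increment` steps.
        self.num_steps += increment

    def state_dict(self):
        return {"num_steps": self.num_steps}

    def load_state_dict(self, state):
        # Assumed behavior, per the description above: restoring state
        # advances num_steps by the saved step count.
        self.num_steps = 0
        self.step(increment=state["num_steps"])


def resume(saved_state, extra_step):
    """Restore a scheduler, optionally repeating the redundant step() call."""
    sched = ToyScheduler()
    sched.load_state_dict(saved_state)
    if extra_step:
        # The redundant call: the position was already restored above.
        sched.step(increment=saved_state["num_steps"])
    return sched.num_steps


saved = {"num_steps": 100}
buggy = resume(saved, extra_step=True)   # restore plus extra step()
fixed = resume(saved, extra_step=False)  # restore only
print(buggy, fixed)
```

With the extra call the restored position ends up at twice the saved value; without it the position matches the checkpoint.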

More info: Issue #1546
