Commit d811caf

Merge branch 'main' into chtruong/publish-docs

2 parents 3982658 + 748b9ca commit d811caf

19 files changed: +2256 additions, -563 deletions

docs/fp8.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -11,6 +11,10 @@ This module provides a suite of tools to enable FP8 quantization for large langu
 - Uses **TransformerEngine** for linear layer implementation.
 - Supports both **Deepseek-style sub-channel scaling** and **per-tensor scaling**.
 
+### Recommended recipe
+- For Hopper GPUs, we recommend FP8 (Deepseek-style) precision for both generation and training for the best convergence and speedup.
+- For Blackwell GPUs, FP8 (Deepseek-style) with an FP32 scaling factor is not supported in training, so we currently recommend FP8 precision for generation and BF16 for training. We are actively exploring other recipes for better performance.
+
 ## Integration with NeMo RL
 
 NeMo RL applies monkey patches to several core `vLLM` components to enable FP8 generation for reinforcement learning.
```
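To make the recipe above concrete, here is a rough, hypothetical helper that picks precisions by compute capability; it is not part of NeMo RL, and the function name and string return convention are made up for illustration.

```python
# Hypothetical sketch of the recipe above; not NeMo RL code.
# Hopper reports compute capability 9.x (sm_90); Blackwell reports 10.x (sm_100).
import torch

def recommended_precisions() -> tuple[str, str]:
    """Return (generation_precision, training_precision) for the local GPU."""
    major, _ = torch.cuda.get_device_capability()
    if major >= 10:
        # Blackwell: Deepseek-style FP8 with an FP32 scaling factor is not
        # supported in training, so fall back to BF16 for training.
        return ("fp8", "bf16")
    # Hopper: FP8 (Deepseek-style) for both generation and training.
    return ("fp8", "fp8")
```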

docs/guides/ft-launcher-guide.md

Lines changed: 58 additions & 0 deletions (new file)

# Fault Tolerance Launcher Guide

The `ft_launcher` is provided by `nvidia-resiliency-ext` (included in the NeMo RL dependencies) and enables automatic fault tolerance and recovery for distributed training runs.

## Key Arguments

| Argument | Description | Example |
|----------|-------------|---------|
| `--ft-cfg-path` | Path to the FT YAML config file | `examples/ft_launcher/ft_config.yaml` |
| `--ft-rank-heartbeat-timeout` | Heartbeat timeout in seconds | `450` |
| `--ft-initial-rank-heartbeat-timeout` | Initial heartbeat timeout (longer, to allow for setup) | `1200` |
| `--max-restarts` | Maximum number of restart attempts | `5` |

## Basic Usage

```bash
uv run ft_launcher \
    --ft-cfg-path examples/ft_launcher/ft_config.yaml \
    --ft-rank-heartbeat-timeout 450 \
    --ft-initial-rank-heartbeat-timeout 1200 \
    --max-restarts 5 \
    examples/run_grpo_math.py \
    --config <your_config.yaml>
```

## FT Config File (`examples/ft_launcher/ft_config.yaml`)

```yaml
fault_tolerance:
  initial_rank_heartbeat_timeout: 360
  restart_policy: any-failed
```

## Important Notes

1. **Checkpointing**: Enable checkpointing so that recovery can resume from the latest checkpoint (see the resume sketch after this guide):

   ```bash
   ++checkpointing.enabled=true
   ++checkpointing.checkpoint_dir=/path/to/checkpoints
   ++checkpointing.save_period=50
   ```

2. **Timeouts**: Set `--ft-initial-rank-heartbeat-timeout` higher than `--ft-rank-heartbeat-timeout` to allow for model loading and setup time.

3. **Restart Policy**: The `any-failed` restart policy restarts the entire job if any rank fails. Look for these log messages to identify when a restart occurs:

   ```
   [ERROR] [ft_launcher...] failed (exitcode: 1) local_rank: 0 (pid: ...) of binary: ...
   [INFO] [ft_launcher...] [default] Worker group FAILED. 3/5 attempts left; will restart worker group
   [INFO] [ft_launcher...] Stopping workers... Timeout = 30 sec.
   [INFO] [ft_launcher...] The node '...' attempts to join the next round of the rendezvous '...'.
   [INFO] [ft_launcher...] The node '...' has joined round N of the rendezvous '...' as rank 0 in a world of size 1.
   ```

   Key indicators:
   - `Worker group FAILED. X/Y attempts left` - a restart is happening, with the number of attempts remaining
   - `will restart worker group` - confirms a restart is in progress
   - `has joined round N` - the round number increases with each restart
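To make the checkpointing note concrete, here is a minimal, hedged sketch of restart-safe resume logic. It assumes checkpoints land under `checkpoint_dir` as `step_<N>` directories; that layout and the helper name are illustrative, not the actual NeMo RL checkpoint format or API.

```python
# Hedged sketch: resume-from-latest logic for a restart-safe training script.
# Assumes checkpoints are saved as step_<N> directories under checkpoint_dir;
# the real NeMo RL checkpoint layout and API may differ.
import os
import re

def latest_checkpoint(checkpoint_dir: str) -> str | None:
    """Return the path of the newest step_<N> checkpoint, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    steps = []
    for name in os.listdir(checkpoint_dir):
        match = re.fullmatch(r"step_(\d+)", name)
        if match:
            steps.append((int(match.group(1)), os.path.join(checkpoint_dir, name)))
    return max(steps)[1] if steps else None

# On every (re)start by ft_launcher, resume instead of starting from scratch.
ckpt = latest_checkpoint("/path/to/checkpoints")
if ckpt is not None:
    print(f"Resuming from {ckpt}")
```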

docs/index.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -219,6 +219,7 @@ guides/deepseek.md
 model-quirks.md
 guides/async-grpo.md
 guides/dtensor-tp-accuracy.md
+guides/ft-launcher-guide.md
 ```
 
 ```{toctree}
````
examples/ft_launcher/ft_config.yaml

Lines changed: 3 additions & 0 deletions (new file)

```diff
@@ -0,0 +1,3 @@
+fault_tolerance:
+  initial_rank_heartbeat_timeout: 360
+  restart_policy: any-failed
```

nemo_rl/algorithms/distillation.py

Lines changed: 8 additions & 2 deletions

```diff
@@ -695,7 +695,9 @@ def distillation_train(
         print("▶ Computing teacher logprobs...", flush=True)
         with timer.time("teacher_logprob_inference"):
             teacher_topk = teacher_policy.get_topk_logits(
-                train_data, k=master_config["distillation"]["topk_logits_k"]
+                train_data,
+                k=master_config["distillation"]["topk_logits_k"],
+                timer=timer,
             )
         train_data["teacher_topk_logits"] = teacher_topk["topk_logits"]
         train_data["teacher_topk_indices"] = teacher_topk["topk_indices"]
@@ -708,7 +710,11 @@ def distillation_train(
 
         print("▶ Training policy...", flush=True)
         with timer.time("policy_training"):
-            train_results = student_policy.train(train_data, loss_fn)
+            train_results = student_policy.train(
+                train_data,
+                loss_fn,
+                timer=timer,
+            )
 
         is_last_step = (total_steps + 1 >= max_steps) or (
             (current_epoch + 1 == max_epochs)
```
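The recurring change in this commit (here and in dpo.py, grpo.py, rm.py, and sft.py below) threads the outer `timer` into `train`/`get_logprobs`-style calls so sub-phases are recorded on the same timer that wraps each step. Below is a minimal sketch of that pattern; the `Timer` class is hypothetical and only mirrors the `with timer.time(name)` usage visible in these hunks, not the actual NeMo RL Timer API.

```python
# Minimal sketch of the timer-threading pattern; this Timer is hypothetical
# and only mirrors the `with timer.time(name)` usage in the hunks above.
import time
from collections import defaultdict
from contextlib import contextmanager

class Timer:
    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def time(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

def train(data, loss_fn, timer: Timer):
    # Because the caller passes its own timer, inner phases land in the
    # same report as the outer "policy_training" span.
    with timer.time("forward_backward"):
        pass  # ... model update would go here

timer = Timer()
with timer.time("policy_training"):
    train(data=[], loss_fn=None, timer=timer)
print(dict(timer.totals))
```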

nemo_rl/algorithms/dpo.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -572,6 +572,7 @@ def dpo_train(
             ## examples, chosen and rejected, and the pair needs to be processed as part of the same microbatch.
             gbs=master_config["policy"]["train_global_batch_size"] * 2,
             mbs=master_config["policy"]["train_micro_batch_size"] * 2,
+            timer=timer,
         )
 
         is_last_step = total_steps + 1 >= master_config["dpo"][
```
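The `* 2` on `gbs`/`mbs` here (and in the rm.py hunks below) follows from the comment in the hunk: each preference example expands into two rows, chosen and rejected, that must stay in the same microbatch. A toy illustration with made-up data:

```python
# Toy illustration of why batch sizes are doubled for preference data;
# the data and sizes here are made up.
pairs = [("chosen_0", "rejected_0"), ("chosen_1", "rejected_1"),
         ("chosen_2", "rejected_2"), ("chosen_3", "rejected_3")]
rows = [row for pair in pairs for row in pair]  # chosen/rejected interleaved

mbs = 2                  # microbatch size counted in preference examples
effective_mbs = mbs * 2  # rows per microbatch, so no pair is ever split
microbatches = [rows[i:i + effective_mbs] for i in range(0, len(rows), effective_mbs)]
assert all(len(mb) % 2 == 0 for mb in microbatches)
print(microbatches[0])   # ['chosen_0', 'rejected_0', 'chosen_1', 'rejected_1']
```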

nemo_rl/algorithms/grpo.py

Lines changed: 23 additions & 10 deletions

```diff
@@ -1516,17 +1516,18 @@ def grpo_train(
                     **extra_multimodal_data,
                 }
             )
-            train_data["prev_logprobs"] = policy.get_logprobs(logprob_data)[
-                "logprobs"
-            ]
+            train_data["prev_logprobs"] = policy.get_logprobs(
+                logprob_data, timer=timer
+            )["logprobs"]
 
             if not master_config["grpo"].get(
                 "skip_reference_policy_logprobs_calculation"
             ):
                 train_data["reference_policy_logprobs"] = (
-                    policy.get_reference_policy_logprobs(logprob_data)[
-                        "reference_logprobs"
-                    ]
+                    policy.get_reference_policy_logprobs(
+                        logprob_data,
+                        timer=timer,
+                    )["reference_logprobs"]
                 )
 
             del logprob_data
@@ -1540,7 +1541,11 @@ def grpo_train(
 
         print("▶ Training policy...", flush=True)
         with timer.time("policy_training"):
-            train_results = policy.train(train_data, loss_fn)
+            train_results = policy.train(
+                train_data,
+                loss_fn,
+                timer=timer,
+            )
 
         # Recompute KV scales after policy training if needed
         if sync_kv_scales:
@@ -2510,9 +2515,13 @@ def async_grpo_train(
 
         print("▶ Computing logprobs...")
         with timer.time("policy_and_reference_logprobs"):
-            fprop_logprobs = policy.get_logprobs(train_data)["logprobs"]
+            fprop_logprobs = policy.get_logprobs(
+                train_data,
+                timer=timer,
+            )["logprobs"]
             reference_logprobs = policy.get_reference_policy_logprobs(
-                train_data
+                train_data,
+                timer=timer,
             )["reference_logprobs"]
         train_data["prev_logprobs"] = fprop_logprobs
         train_data["reference_policy_logprobs"] = reference_logprobs
@@ -2524,7 +2533,11 @@ def async_grpo_train(
 
         print("▶ Training policy...")
         with timer.time("policy_training"):
-            train_results = policy.train(train_data, loss_fn)
+            train_results = policy.train(
+                train_data,
+                loss_fn,
+                timer=timer,
+            )
 
         print("🔄 Synchronizing policy weights to trajectory collector…")
         generation_logger_metrics = None
```

nemo_rl/algorithms/rm.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -343,6 +343,7 @@ def validate_one_dataset(
         # NOTE: we double the batch size because each preference example corresponds to a pair of
         # examples, chosen and rejected, and the pair needs to be processed as part of the same microbatch.
         mbs=val_mbs * 2,
+        timer=timer,
     )
 
     if len(val_results["all_mb_metrics"]) == 0:
@@ -503,6 +504,7 @@ def rm_train(
             ## examples, chosen and rejected, and the pair needs to be processed as part of the same microbatch.
             gbs=master_config["policy"]["train_global_batch_size"] * 2,
             mbs=master_config["policy"]["train_micro_batch_size"] * 2,
+            timer=timer,
         )
 
         is_last_step = (
```

nemo_rl/algorithms/sft.py

Lines changed: 5 additions & 1 deletion

```diff
@@ -452,7 +452,11 @@ def sft_train(
 
         print("▶ Taking a training step...")
         with timer.time("policy_training"):
-            train_results = policy.train(train_data, loss_fn)
+            train_results = policy.train(
+                train_data,
+                loss_fn,
+                timer=timer,
+            )
 
         is_last_step = total_steps + 1 >= master_config["sft"][
             "max_num_steps"
```

nemo_rl/data/datasets/utils.py

Lines changed: 11 additions & 3 deletions

```diff
@@ -63,15 +63,23 @@ def pil_to_base64(image: Image.Image, format: str = "PNG") -> str:
 
 
 def load_dataset_from_path(data_path: str, data_split: Optional[str] = "train"):
-    """Load a dataset from a json, huggingface dataset, or Arrow dataset (saved with save_to_disk).
+    """Load a dataset from a local file, huggingface dataset, or Arrow dataset (saved with save_to_disk).
 
     Args:
         data_path: The path to the dataset.
         data_split: The split to load from the dataset.
     """
+    FILEEXT2TYPE = {
+        ".arrow": "arrow",
+        ".csv": "csv",
+        ".json": "json",
+        ".jsonl": "json",
+        ".parquet": "parquet",
+        ".txt": "text",
+    }
     suffix = os.path.splitext(data_path)[-1]
-    if suffix in [".json", ".jsonl"]:
-        raw_dataset = load_dataset("json", data_files=data_path)
+    if dataset_type := FILEEXT2TYPE.get(suffix):
+        raw_dataset = load_dataset(dataset_type, data_files=data_path)
     else:
         try:
             raw_dataset = load_dataset(data_path)
```
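For reference, here is a self-contained sketch of the extension-based dispatch this hunk introduces, using the Hugging Face `datasets` builders named in `FILEEXT2TYPE`; the standalone function name is illustrative, not the repo's.

```python
# Standalone sketch of the new extension -> builder dispatch; mirrors the
# hunk above using Hugging Face datasets. The function name is illustrative.
import os
from datasets import load_dataset

FILEEXT2TYPE = {
    ".arrow": "arrow",
    ".csv": "csv",
    ".json": "json",
    ".jsonl": "json",
    ".parquet": "parquet",
    ".txt": "text",
}

def load_local_dataset(data_path: str, split: str = "train"):
    suffix = os.path.splitext(data_path)[-1]
    if (dataset_type := FILEEXT2TYPE.get(suffix)) is not None:
        # e.g. "train.parquet" -> load_dataset("parquet", data_files="train.parquet")
        return load_dataset(dataset_type, data_files=data_path, split=split)
    # Otherwise treat the path as a Hub dataset name or a dataset directory.
    return load_dataset(data_path, split=split)

# Usage: load_local_dataset("data/train.jsonl") loads via the "json" builder.
```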
