
Commit ef06d44

fixed codderabbitai comments
1 parent aea6a66 commit ef06d44

File tree

1 file changed (+6, -7 lines)

docs/guides/sft.md

Lines changed: 6 additions & 7 deletions
@@ -191,7 +191,7 @@ policy:
     use_triton: true # Use Triton-optimized kernels (DTensor v2 path)
 ```

-### Parameter Details
+### DTensor (Automodel) Parameter Details
 - **`enabled`** (bool): Whether to enable LoRA training
 - **`target_modules`** (list): Specific module names to apply LoRA. Empty with `match_all_linear=true` applies to all linear layers
 - **`exclude_modules`** (list): Module names to exclude from LoRA
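To make the relationship between these keys concrete, here is a minimal sketch of a DTensor LoRA block. The `lora:` nesting under `policy:` is assumed for illustration only; the individual keys (`enabled`, `target_modules`, `match_all_linear`, `exclude_modules`, `use_triton`) come from the parameter list above.

```yaml
policy:
  lora:                      # nesting under policy is assumed for this sketch
    enabled: true            # turn LoRA training on
    target_modules: []       # leave empty and ...
    match_all_linear: true   # ... apply LoRA to every linear layer
    exclude_modules: []      # module names to skip
    use_triton: true         # Triton-optimized kernels (DTensor v2 path)
```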
@@ -230,13 +230,13 @@ policy:
     lora_dtype: None # Weight's dtype
 ```

-### Parameter Details
+### Megatron Parameter Details
 - **`enabled`** (bool): Whether to enable LoRA training
 - **`target_modules`** (list): Specific module names to apply LoRA. Defaults to all linear layers if the list is left empty. Example: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'].
-- 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
-- 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
-- 'linear_fc1': Apply LoRA to the first fully-connected layer in the MLP.
-- 'linear_fc2': Apply LoRA to the second fully-connected layer in the MLP.
+  - 'linear_qkv': Apply LoRA to the fused linear layer used for query, key, and value projections in self-attention.
+  - 'linear_proj': Apply LoRA to the linear layer used for projecting the output of self-attention.
+  - 'linear_fc1': Apply LoRA to the first fully-connected layer in the MLP.
+  - 'linear_fc2': Apply LoRA to the second fully-connected layer in the MLP.
 Target modules can also contain wildcards. For example, you can specify target_modules=['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv'] to add LoRA to only linear_qkv on the first two layers.
 - **`exclude_modules`** (List[str], optional): A list of module names to which LoRA is not applied. It matches all nn.Linear and nn.Linear-adjacent modules whose names do not match any string in exclude_modules. If used, target_modules must be an empty list or None.
 - **`dim`** (int): LoRA rank (r). Lower values = fewer parameters but less capacity. Typical: 4, 8, 16, 32, 64
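As a sketch of the wildcard behaviour described above, the snippet below restricts LoRA to `linear_qkv` on the first two layers. The `lora:` nesting under `policy:` is again an assumption; the keys themselves are taken from the Megatron parameter list.

```yaml
policy:
  lora:                # nesting assumed for illustration
    enabled: true
    # Wildcards limit LoRA to linear_qkv on the first two layers only:
    target_modules: ['*.layers.0.*.linear_qkv', '*.layers.1.*.linear_qkv']
    dim: 16            # LoRA rank r: lower = fewer parameters, less capacity
    lora_dtype: None   # inherit orig_linear's dtype (set explicitly for quantized weights)
```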
@@ -247,7 +247,6 @@ policy:
 - **`lora_B_init`** (str): Initialization method for the low-rank matrix B. Defaults to "zero".
 - **`a2a_experimental`** (bool): Enables the experimental All-to-All (A2A) communication strategy. Defaults to False.
 - **`lora_dtype`** (torch.dtype): Weight's dtype; by default it will use orig_linear's dtype, but quantized weights (e.g. 4-bit) need it to be specified explicitly.
-only.

 ### Megatron Example Usage
 The config uses DTensor by default, so the Megatron backend needs to be explicitly enabled.
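Since DTensor is the default, switching backends might look roughly like the following; the `dtensor_cfg` / `megatron_cfg` key names are an assumption for illustration and should be checked against the guide's full example config.

```yaml
policy:
  dtensor_cfg:
    enabled: false   # assumed key name: turn the default DTensor path off
  megatron_cfg:
    enabled: true    # assumed key name: explicitly enable the Megatron backend
```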
