Open
Labels: bug, good first issue
Description
micro_acc_steps: the documentation says this flag implements microbatching, but no such functionality appears to exist.
Expected behaviour (from README --distribute_modules example)
“It accumulates gradients over 8 minibatches, and splits each minibatch into 2 microbatches before feeding them into the SAE encoder, thus saving a lot of memory.”
torchrun … --grad_acc_steps 8 … --micro_acc_steps 2
Actual behaviour in the code
sparsify/config.py:

    micro_acc_steps: int = 1  # "Chunk the activations into this number of microbatches for training"

sparsify/trainer.py (the only place the value is used):

    acc_steps = self.cfg.grad_acc_steps * self.cfg.micro_acc_steps

I don't see an actual split into micro_acc_steps microbatches; the activations are fed to the SAE whole, regardless of the value of micro_acc_steps.
From what I can see, setting micro_acc_steps > 1 only multiplies the gradient-accumulation denominator (acc_steps). That means the effective learning rate goes down, but the memory footprint stays the same.
If that’s correct, it might be worth updating the README (and the flag’s doc-string in config.py) to avoid confusion for new users.
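For reference, here is a minimal sketch (mine, not sparsify's actual code) of what a real micro_acc_steps split could look like in PyTorch. The sae module, loss_fn, and tensor layout are placeholders I'm assuming for illustration:

```python
import torch
from torch import nn

def microbatched_backward(sae: nn.Module, acts: torch.Tensor, loss_fn,
                          grad_acc_steps: int = 8, micro_acc_steps: int = 2) -> None:
    # One minibatch of activations is split into micro_acc_steps chunks, so only
    # a chunk-sized autograd graph is alive at any time -- that is where the
    # memory saving described in the README would come from.
    acc_steps = grad_acc_steps * micro_acc_steps
    for chunk in acts.chunk(micro_acc_steps, dim=0):
        recon = sae(chunk)                        # forward pass on the smaller chunk
        loss = loss_fn(recon, chunk) / acc_steps  # same scaling trainer.py already applies
        loss.backward()                           # frees this chunk's graph before the next one
    # optimizer.step() / zero_grad() would still run once every grad_acc_steps minibatches
```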