Commit e05e764

doc: add sglang usage document
1 parent 78f325c commit e05e764

1 file changed: +44 -1 lines changed

README.md

Lines changed: 44 additions & 1 deletion
@@ -149,9 +149,52 @@ torchrun --nproc-per-node 8 tests/test_update.py
```

Other unit tests can be done with pytest.
## SGLang Integration

Checkpoint Engine provides efficient distributed checkpoint loading for SGLang inference servers, significantly reducing model loading time for large models and multi-node setups.

### Quick Start

**1. Install checkpoint-engine:**
```bash
pip install 'checkpoint-engine[p2p]'
```

**2. Launch SGLang server:**
```bash
python -m sglang.launch_server \
--model-path $MODEL_PATH \
--tp 8 \
--load-format dummy \
--wait-for-initial-weights
```

**3. Run checkpoint engine:**
```bash
python -m sglang.srt.checkpoint_engine.update \
--update-method broadcast \
--checkpoint-path $MODEL_PATH \
--inference-parallel-size 8
```
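
Steps 2 and 3 can be driven from a single script. The following is a minimal sketch, not part of the project: the `run`, `launch_server`, and `run_update` helpers and the `DRY_RUN` and `TP_SIZE` variables are hypothetical names introduced here, and only the flags shown in the Quick Start are used. With `DRY_RUN=1` (the default) the script only prints the two commands.

```bash
#!/usr/bin/env bash
# Sketch: drive Quick Start steps 2 and 3 from one script.
# Helper and variable names are illustrative, not part of SGLang or checkpoint-engine.
set -eu

MODEL_PATH="${MODEL_PATH:-/path/to/model}"  # placeholder; point at a real checkpoint
TP_SIZE="${TP_SIZE:-8}"                     # used for --tp and --inference-parallel-size
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

launch_server() {
  run python -m sglang.launch_server \
    --model-path "$MODEL_PATH" \
    --tp "$TP_SIZE" \
    --load-format dummy \
    --wait-for-initial-weights
}

run_update() {
  run python -m sglang.srt.checkpoint_engine.update \
    --update-method broadcast \
    --checkpoint-path "$MODEL_PATH" \
    --inference-parallel-size "$TP_SIZE"
}

if [ "$DRY_RUN" = "1" ]; then
  launch_server
  run_update
else
  launch_server &   # server starts in the background and waits for weights
  run_update        # push weights; if this fails, stop the server manually
  wait              # keep the script attached to the server process
fi
```

Run it with `DRY_RUN=0 MODEL_PATH=/your/model` to execute the commands. Whether the update command tolerates a server that is still starting up is not stated here, so a delay or readiness check between the two steps may be needed.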

### Multi-Node Setup

For a 2-node setup, run the same commands on both nodes, setting `--host` and the distributed parameters appropriately for each node.
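
As an illustration of the server side of a 2-node launch, the sketch below prints one command per node. It rests on assumptions: `--nnodes`, `--node-rank`, and `--dist-init-addr` are SGLang's multi-node server arguments and are not spelled out in this README; the addresses and port are placeholders; and `server_cmd_for_node` is a hypothetical helper. The distributed parameters for the checkpoint engine command are not specified here, so they are left out.

```bash
#!/usr/bin/env bash
# Sketch: print the SGLang server command for each node of a 2-node setup.
# Addresses, port, and helper names are hypothetical placeholders.
set -eu

HEAD_ADDR="${HEAD_ADDR:-192.0.2.10}"      # node 0 (documentation-range IP)
WORKER_ADDR="${WORKER_ADDR:-192.0.2.11}"  # node 1 (documentation-range IP)
DIST_PORT="${DIST_PORT:-20000}"           # assumed free port on node 0
TP_SIZE="${TP_SIZE:-8}"                   # Quick Start value; adjust to your GPU layout
NNODES=2

# Usage: server_cmd_for_node <node-rank> <host-address>
# $MODEL_PATH is printed literally so that each node expands its own value.
server_cmd_for_node() {
  echo "python -m sglang.launch_server" \
    "--model-path \$MODEL_PATH" \
    "--tp $TP_SIZE" \
    "--load-format dummy" \
    "--wait-for-initial-weights" \
    "--host $2" \
    "--nnodes $NNODES" \
    "--node-rank $1" \
    "--dist-init-addr ${HEAD_ADDR}:${DIST_PORT}"
}

echo "# node 0:"
server_cmd_for_node 0 "$HEAD_ADDR"
echo "# node 1:"
server_cmd_for_node 1 "$WORKER_ADDR"
```

Both nodes share the same `--dist-init-addr` (the address of node 0) and differ only in `--host` and `--node-rank`.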

### Key Options

**SGLang Server:**
- `--wait-for-initial-weights`: Wait for the checkpoint engine to deliver weights before the server becomes ready
- `--load-format dummy`: Start with dummy weights instead of loading the checkpoint from disk, enabling overlapping initialization tasks

**Checkpoint Engine:**
- `--update-method`: Weight update method, one of `broadcast`, `p2p`, or `all`
- `--inference-parallel-size`: Number of parallel inference processes (`8` in the Quick Start, matching `--tp 8`)
- `--checkpoint-path`: Directory containing the model checkpoint
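
A wrapper script can check these option values before invoking the checkpoint engine. This is a small sketch under stated assumptions: `validate_update_method` and `validate_parallel_size` are hypothetical helpers, the accepted methods are the three listed above, and the positive-integer check on the parallel size is an assumption rather than documented behavior.

```bash
#!/usr/bin/env bash
# Sketch: validate checkpoint engine option values before launching.
# Helper names are illustrative and not part of checkpoint-engine.
set -eu

# Succeeds only for the three update methods listed in Key Options.
validate_update_method() {
  case "$1" in
    broadcast|p2p|all) return 0 ;;
    *)
      echo "unsupported --update-method: $1 (expected broadcast, p2p, or all)" >&2
      return 1
      ;;
  esac
}

# Succeeds only for a positive integer (assumed requirement).
validate_parallel_size() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -gt 0 ]
}

UPDATE_METHOD="${UPDATE_METHOD:-broadcast}"
PARALLEL_SIZE="${PARALLEL_SIZE:-8}"

validate_update_method "$UPDATE_METHOD"
validate_parallel_size "$PARALLEL_SIZE"
echo "options OK: --update-method $UPDATE_METHOD --inference-parallel-size $PARALLEL_SIZE"
```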

## Limitations and Future Work

- - This project is currently only tested with vLLM. But it is easy to integrate with other frameworks like SGLang.
+ - This project is currently tested with vLLM and SGLang. Integration with other frameworks is planned for future releases.
  - The perfect three-stage pipeline mentioned in our paper is currently not implemented. This could be useful for architectures where H2D and broadcast do not conflict on PCIe.

## Acknowledgments
