Update distill.py to include device agnostic code for distill_mlp head and distillation_token #324
Open
vivekh2000 wants to merge 1 commit into lucidrains:main from
Conversation
…ead and `distillation_token`

Since in your code the `distillation_token` and the `distill_mlp` head are defined in the `DistillWrapper` class, sending a model instance of the `DistillableViT` class to the GPU does not send them to the GPU. While training a model with this code I got a device mismatch error, and it was hard to figure out its source. The `distillation_token` and `distill_mlp` turned out to be the culprits, since they are defined not in the model class but in the `DistillWrapper` class. I have therefore suggested the following changes: when training a model on GPU, the training code should set `device = "cuda" if torch.cuda.is_available() else "cpu"`, or the same can be incorporated into the constructor of the `DistillWrapper` class.
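For reference, a minimal sketch of the idea (not the exact diff in this PR): the wrapper-owned pieces follow the device of the incoming image tensor, so moving only the student model to the GPU no longer causes a mismatch. The class and attribute names mirror `distill.py`, but the constructor arguments and the forward body below are simplified placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillWrapper(nn.Module):
    def __init__(self, *, student, teacher, dim = 256, num_classes = 10):
        super().__init__()
        self.student = student
        self.teacher = teacher
        # defined on the wrapper, not on the DistillableViT instance,
        # so moving only the student to the GPU never moves these
        self.distillation_token = nn.Parameter(torch.randn(1, 1, dim))
        self.distill_mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))

    def forward(self, img, labels):
        device = img.device                                 # follow the input's device
        distill_token = self.distillation_token.to(device)  # no-op if already there
        self.distill_mlp.to(device)                         # nn.Module.to() moves in place
        embed = self.student(img)                           # assumed (batch, dim) embedding
        logits = self.distill_mlp(embed + distill_token.view(-1))  # broadcast over the batch
        return F.cross_entropy(logits, labels)
```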
Since in your code the `distillation_token` and `distill_mlp` head are defined in the `DistillWrapper` class, sending the model instance of the `DistillableViT` class to GPU does not send the `distillation_token` and `distill_mlp` head to GPU. Therefore, while training a model using this code, I got a device mismatch error, which made it hard to figure out the source of the error. Finally, the `distillation_token` and `distill_mlp` turned out to be the culprits, as they are not defined in the model class but in the `DistillWrapper` class, which is a wrapper around the loss function. Therefore, I have suggested the following changes when training a model on GPU: the training code should set `device = "cuda" if torch.cuda.is_available() else "cpu"`, or the same can be incorporated into the constructor of the `DistillWrapper` class.
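As a sketch of the second option mentioned above (resolving the device in the `DistillWrapper` constructor itself): the argument names and defaults below are illustrative, not the library's actual signature.

```python
import torch
import torch.nn as nn

class DistillWrapper(nn.Module):
    def __init__(self, *, student, teacher, dim = 256, num_classes = 10, device = None):
        super().__init__()
        # resolve the device once in the constructor; callers may still override it
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.student = student
        self.teacher = teacher
        self.distillation_token = nn.Parameter(torch.randn(1, 1, dim, device = self.device))
        self.distill_mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, num_classes),
        ).to(self.device)

# training-side usage: pick the device with the same expression and pass it in
device = "cuda" if torch.cuda.is_available() else "cpu"
wrapper = DistillWrapper(student = nn.Identity(), teacher = nn.Identity(), device = device)
```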