Some questions about decoder position embedding for masked tokens #173
Open
chrisway613 wants to merge 1 commit into lucidrains:main from
Conversation
lucidrains (Owner) commented:
@chrisway613 Hi Chris! While this is true, I think leaving untrained parameters in the wrapper class isn't elegant. You can always just concat the CLS token onto the decoder positional embeddings:

```python
decoder_cls_token = nn.Parameter(torch.randn(1, decoder_dim))
pos_embs_with_cls_token = torch.cat((decoder_cls_token, self.decoder_pos_emb), dim = 0)
```
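A minimal sketch of how this concatenation might fit together; the values of `num_patches` and `decoder_dim` and the free-standing parameter names are assumptions for illustration, not the repository's exact code:

```python
import torch
import torch.nn as nn

num_patches, decoder_dim = 64, 512  # illustrative sizes, not the repo's defaults

# decoder positional rows, one per patch (no slot reserved for a CLS token)
decoder_pos_emb = nn.Parameter(torch.randn(num_patches, decoder_dim))

# a separate learned row for the CLS position, concatenated only when needed,
# so no untrained parameter sits in the wrapper when CLS is unused
decoder_cls_token = nn.Parameter(torch.randn(1, decoder_dim))
pos_embs_with_cls_token = torch.cat((decoder_cls_token, decoder_pos_emb), dim=0)

assert pos_embs_with_cls_token.shape == (num_patches + 1, decoder_dim)
```

With this layout, patch i's position lives at row i + 1 of the concatenated table, which is exactly the one-row shift the question below is concerned with.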
chrisway613 force-pushed from dbb7bd1 to b983bbe
chrisway613 force-pushed from ddff7a7 to b3e90a2
chrisway613 force-pushed from 19eb6d4 to 5e808f4
chrisway613 force-pushed from cbf6723 to 5cf8384
chrisway613 (PR description): In the decoder position embedding matrix, the first dimension has size num_patches + 1, where the extra 1 is for ViT's cls_token. But when embedding the positions of the masked tokens, their indices are not shifted by 1, so a patch's position may collide with the position reserved for the cls_token. Although MAE does not use a cls_token in the decoder, this weakens extensibility if we want to use the cls_token later.
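A minimal sketch of the off-by-one collision described above, assuming a hypothetical (num_patches + 1)-row decoder table whose row 0 is reserved for the cls_token; all names here are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

num_patches, decoder_dim = 64, 512  # illustrative sizes

# hypothetical decoder pos table with one extra row (index 0) for a CLS token
decoder_pos_emb = nn.Embedding(num_patches + 1, decoder_dim)

masked_indices = torch.tensor([0, 5, 9])  # patch indices in [0, num_patches)

# without a shift, patch 0 looks up row 0 -- the row meant for the CLS token
colliding = decoder_pos_emb(masked_indices)

# shifting by 1 keeps patch positions clear of the reserved CLS row
shifted = decoder_pos_emb(masked_indices + 1)
```

Adding 1 to the patch indices keeps row 0 free, so a cls_token can be introduced later without re-indexing the patch positions.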