The forward method now looks like this:

```python
def forward(self, x, value_residual = None):
    first_values = None

    for attn, ff in self.layers:
        x, next_values = attn(x, value_residual = value_residual)
        first_values = default(first_values, next_values)
        x = ff(x)

    return self.norm(x), first_values
```

versus before:
```python
def forward(self, x):
    for attn, ff in self.layers:
        x = attn(x) + x
        x = ff(x) + x

    return self.norm(x)
```

This breaks compatibility with old weights. Was the change made intentionally? Is it needed for value residual learning and hyper connections?
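Since the question mentions value residual learning, here is a minimal sketch (not the repository's actual code) of why `attn` would start returning its values and accepting a `value_residual`: each layer blends its own attention values with the values computed by the first layer. The `Attention` class, the learned `value_residual_mix` gate, and the in-block residual add below are illustrative assumptions.

```python
# Hypothetical sketch of value residual learning, not lucidrains' implementation.
import torch
from torch import nn
import torch.nn.functional as F

def default(val, d):
    # helper used in the quoted forward: fall back to d when val is None
    return val if val is not None else d

class Attention(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias = False)
        self.to_out = nn.Linear(dim, dim, bias = False)
        # learned gate mixing this layer's values with the first layer's values
        self.value_residual_mix = nn.Parameter(torch.tensor(0.))

    def forward(self, x, value_residual = None):
        b, n, d = x.shape
        h = self.heads

        q, k, v = self.to_qkv(x).chunk(3, dim = -1)

        # value residual learning: blend in the values from the first layer
        if value_residual is not None:
            mix = self.value_residual_mix.sigmoid()
            v = v * mix + value_residual * (1. - mix)

        q, k, v_heads = (t.reshape(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v_heads)
        out = out.transpose(1, 2).reshape(b, n, d)

        # residual applied inside the block, so the outer loop no longer does `+ x`
        return self.to_out(out) + x, v

# usage mirroring the new forward loop quoted above
dim = 64
layers = nn.ModuleList([Attention(dim) for _ in range(4)])

x = torch.randn(2, 16, dim)
first_values = None

for attn in layers:
    x, values = attn(x, value_residual = first_values)
    first_values = default(first_values, values)
```

As for the missing `x = attn(x) + x`: one plausible reading (an assumption, not confirmed by the snippet) is that the residual stream is now handled inside the block or by hyper connection wrappers around each block rather than by a plain add in the loop. Either way the parameter layout and the forward signature change, which would explain why checkpoints trained on the old forward no longer load.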