Dear Mr. Weidman,
I am currently trying to understand the code in [45], specifically the function "loss_gradients".
I would like to ask whether the line
loss_gradients['B1'] = dLdB1.sum(axis=0)
should instead be written as:
loss_gradients['B1'] = dLdB1
Reason:
In my test project, dLdB1 has shape (hidden_size, 1).
weights['B1'] also has shape (hidden_size, 1).
If the expression additionally sums over axis 0, i.e. over all hidden_size entries, the result is a single value, so every entry of weights['B1'] is updated by that same value. That does not seem correct to me.
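A minimal NumPy sketch of the shape argument above. It assumes the (hidden_size, 1) layout reported from my test project, a plain gradient-descent step B1 - learning_rate * grad, and made-up gradient values; none of these come from the book's code. The second case is a hypothetical contrast showing that sum(axis=0) always reduces whichever axis comes first, so its effect depends on how dLdB1 is laid out.

```python
import numpy as np

hidden_size = 4
batch_size = 3
learning_rate = 0.1  # assumed value, for illustration only

# Case 1 (layout reported in this issue): dLdB1 and weights['B1'] both have
# shape (hidden_size, 1), so axis 0 is the hidden-unit axis.
dLdB1_col = np.array([[1.0], [2.0], [3.0], [4.0]])  # made-up gradient values
B1_col = np.zeros((hidden_size, 1))

# Summing over axis 0 collapses the hidden units into one number, shape (1,).
summed = dLdB1_col.sum(axis=0)
# Broadcasting then subtracts that single number from every unit's bias.
B1_summed = B1_col - learning_rate * summed
# Using dLdB1 directly keeps one gradient value per hidden unit.
B1_direct = B1_col - learning_rate * dLdB1_col

# Case 2 (hypothetical contrast): the gradient carries one row per example,
# shape (batch_size, hidden_size). Here axis 0 is the batch axis, so the
# same sum(axis=0) yields one value per hidden unit instead.
dLdB1_batch = np.ones((batch_size, hidden_size))
per_unit = dLdB1_batch.sum(axis=0)

print("case 1, summed:", summed.shape, B1_summed.ravel())
print("case 1, direct:", B1_direct.shape, B1_direct.ravel())
print("case 2, summed:", per_unit.shape, per_unit)
```

In case 1 every bias entry ends up identical after the summed update, which is the behavior described above; in case 2 the sum keeps a distinct value per hidden unit.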
Best Regards