In adapters.py, the docstring of the function run_transformer_block says:
- ffn.w1.weight
Weight of the first linear transformation in the FFN.
Shape is (d_model, d_ff).
- ffn.w2.weight
Weight of the second linear transformation in the FFN.
Shape is (d_ff, d_model).
- ffn.w3.weight
Weight of the third linear transformation in the FFN.
Shape is (d_model, d_ff).
When I tried to implement this function using the shapes in the description, they didn't match. I printed the weights and found that the shape the test code actually passes in is (d_ff, d_model). Whether your math is written as Wᵀx or xW, the input should match the description.
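
For anyone who wants to check this on their side, here is a minimal sketch of how the documented shapes could be compared against what the test actually passes in. The helper names (`documented_shapes`, `find_shape_mismatches`), the sizes, and the sample `observed` dict are all hypothetical and only for illustration; in particular, which key arrives as (d_ff, d_model) is an assumption here, not something taken from the test code.

```python
def documented_shapes(d_model, d_ff):
    """Shapes exactly as stated in the docstring quoted above."""
    return {
        "ffn.w1.weight": (d_model, d_ff),
        "ffn.w2.weight": (d_ff, d_model),
        "ffn.w3.weight": (d_model, d_ff),
    }


def find_shape_mismatches(actual_shapes, d_model, d_ff):
    """Return {name: (documented, actual)} for each key whose shape differs."""
    expected = documented_shapes(d_model, d_ff)
    return {
        name: (expected[name], tuple(shape))
        for name, shape in actual_shapes.items()
        if name in expected and tuple(shape) != expected[name]
    }


# Hypothetical sizes, chosen only for illustration.
d_model, d_ff = 64, 128

# Illustrative input only: w1 is assumed to arrive as (d_ff, d_model).
observed = {
    "ffn.w1.weight": (d_ff, d_model),
    "ffn.w2.weight": (d_ff, d_model),
}

mismatches = find_shape_mismatches(observed, d_model, d_ff)
print(mismatches)
```

With real tensors, the `observed` dict could be built from each weight's `.shape` (converted to a tuple) inside run_transformer_block, assuming the weights are passed in as a name-to-tensor mapping.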