-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Resize T5 Vocab #5497
Resize T5 Vocab #5497
Conversation
Thank you! @dirkgr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments. Looks fine to me, overall. Will resize_token_embeddings
be called internally somewhere? When does the user call it? Should there be a docstring for what init_fn
is supposed to be (it wasn't clear to me without reading the code)?
|
||
# resize lm head | ||
old_size = self.lm_head.out_features | ||
if old_size == new_size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing: Maybe we can do this check first thing? It'll avoid the 2 empty calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps not? We want to resize the embedding dim and the output dim to the same new_size
. Will the old embedding_dim
and the old lm_head.out_features
be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of defensive programming. It should never happen, but if the three places where the size of the token embedding matters get out of sync, this call will sync them all back up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring in 074975f, to hopefully make it clearer what init_fn
does.
|
||
# resize lm head | ||
old_size = self.lm_head.out_features | ||
if old_size == new_size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of defensive programming. It should never happen, but if the three places where the size of the token embedding matters get out of sync, this call will sync them all back up.
I also want to point out that I managed to push this and it passed all CI checks the first time 😁. |
CC @HarshTrivedi