Add necessary implicit embedding extension for transfer-modules api and vocab extension by HarshTrivedi · Pull Request #2431 · allenai/allennlp · GitHub
This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Add necessary implicit embedding extension for transfer-modules api and vocab extension #2431

Merged: 25 commits into allenai:master from HarshTrivedi:fix-embedding-extension-load on Feb 12, 2019

Conversation

HarshTrivedi
Contributor

Please don't review this yet; I still need to change some code and add tests. This is a follow-up on #2374 and #2387, but I will probably get back to it after #2395 is finalized.

@HarshTrivedi
Contributor Author

Okay, this is up for review. @joelgrus, @matt-gardner

Two main things changed here:

  1. If vocab extension is ON during training, embedding extension should also happen implicitly; otherwise it raises an error. This mirrors the implicit embedding extension that already happens when fine-tuning with vocab extension ON.
  2. If one is using the transfer-modules api and has transferred an embedding/text_field_embedder (e.g. from an old archive) with vocab extension ON, loading the new archive currently doesn't work. The new archive's state dict has the extended embedding, but from_params would still load the embedding from the old archive before copying the state dict from the new-archive model, so the parameter shapes don't match. It's therefore necessary to do a precautionary embedding extension before copying the state dict (sketched below).

Other change: we need to make sure embedding extension is a no-op unless we are sure it applies (e.g. it's incorrect to default to the "tokens" namespace). An incorrect implicit assumption here makes some tests fail.
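To make the shape problem in point 2 concrete, here is a minimal sketch of the precautionary extension in plain PyTorch; the function name and initialization scheme are illustrative assumptions, not the actual allennlp implementation:

    import torch

    def extend_embedding_weight(weight: torch.Tensor,
                                extended_num_embeddings: int) -> torch.Tensor:
        # Keep the rows for tokens the old archive knew about and append
        # freshly initialized rows for tokens added by vocab extension, so
        # that copying the new archive's (larger) weight via load_state_dict
        # no longer fails with a size mismatch.
        old_num_embeddings, embedding_dim = weight.shape
        extra_rows = torch.empty(extended_num_embeddings - old_num_embeddings,
                                 embedding_dim)
        torch.nn.init.normal_(extra_rows)  # illustrative choice of init
        return torch.cat([weight, extra_rows], dim=0)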

@HarshTrivedi changed the title from "[WIP] Fix loading of model which was vocab + embedding extended." to "Add necessary implicit embedding extension for transfer-modules api and vocab extension" on Feb 7, 2019
Contributor
@matt-gardner left a comment

A couple of minor questions; otherwise looks very good, thanks for the PR!

logging.warning("No vocab_namespace provided to Embedder.extend_vocab. Defaulting to 'tokens'.")
# It's not safe to default to 'tokens' when we aren't sure that 'tokens'
# need to be extended. (Without this, several tests fail.)
logging.warning("No vocab_namespace provided to Embedder.extend_vocab. Extension will be no-op'.")
Contributor

In what circumstances will this actually emit a warning? Will almost everyone who loads or trains a model in practice see this warning? If so, it should be at the info level (or even debug).

Contributor Author

The warning won't be seen for any models trained after #2374, because _vocab_namespace is stored. For previously trained models it would almost always be seen, because the extend_vocab call is now implicit. Will change it to the info level.

    return

    extended_num_embeddings = extended_vocab.get_vocab_size(vocab_namespace)
    if extended_num_embeddings <= self.num_embeddings:
Contributor

Shouldn't this be ==, not <=?

Contributor Author

Yes, if vocab and embedding are already in sync it should be ==. I had kept <= as a precaution against an incorrect vocab namespace. But on second thought, if the user passed an incorrect vocab namespace, it's better to raise an error than to silently no-op. Will change it.

Contributor Author

Is it worth explicitly raising an error for the < case? It's only possible if the user explicitly passed an incorrect vocab namespace whose size is smaller than the embedding itself.

Contributor Author

Okay, I have separated the == and < cases, making the first a no-op and raising a ConfigurationError in the second.
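In code, the agreed-on control flow looks roughly like the following sketch (not the exact merged diff; ConfigurationError here is allennlp's allennlp.common.checks.ConfigurationError):

    from allennlp.common.checks import ConfigurationError

    def extend_vocab(self, extended_vocab, vocab_namespace):
        extended_num_embeddings = extended_vocab.get_vocab_size(vocab_namespace)
        if extended_num_embeddings == self.num_embeddings:
            # Vocab and embedding are already in sync: extension is a no-op.
            return
        if extended_num_embeddings < self.num_embeddings:
            # Only possible if the caller passed an incorrect namespace,
            # so fail loudly instead of silently doing nothing.
            raise ConfigurationError(
                f"Namespace '{vocab_namespace}' has size "
                f"{extended_num_embeddings}, which is smaller than the "
                f"embedding ({self.num_embeddings}); the namespace is "
                f"probably incorrect."
            )
        # Otherwise, grow the weight matrix to extended_num_embeddings rows.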

Contributor Author

@matt-gardner Correcting this revealed a subtle issue: defaulting to the tokens and token_characters namespaces can be problematic when num_embeddings was used instead of a vocab namespace to decide the embedding size, as illustrated below. Fixed this in the last commit.

This is up for another look now.
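To illustrate the subtlety with a hypothetical configuration (not a case taken from the PR's tests): an Embedding can be sized by an explicit num_embeddings that was never derived from any vocab namespace, and comparing that against the size of a defaulted namespace like 'tokens' is then meaningless:

    from allennlp.data import Vocabulary
    from allennlp.modules.token_embedders import Embedding

    vocab = Vocabulary()  # the 'tokens' namespace holds only padding/OOV here
    # num_embeddings chosen by hand, unrelated to any vocab namespace:
    embedding = Embedding(num_embeddings=10, embedding_dim=5)
    # Defaulting extend_vocab to the 'tokens' namespace would compare
    # vocab.get_vocab_size('tokens') against a num_embeddings that was never
    # tied to that namespace, so extension must stay a no-op unless an
    # explicit namespace is supplied.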

Contributor Author

.. oops, didn't realize you already reviewed again!

Contributor
@matt-gardner left a comment

A few minor wording tweaks, and this is good to merge. Thanks for the PR, for this and all of the related functionality! I think it's turned out quite nicely.

logging.warning("No vocab_namespace provided to Embedder.extend_vocab. Defaulting to 'tokens'.")
# It's not safe to default to 'tokens' when we aren't sure that 'tokens'
# need to be extended. (Without this, several tests fail.)
logging.info("No vocab_namespace provided to Embedder.extend_vocab. Extension will be no-op'.")
Contributor

To make this more obvious, I'd recommend a message like "Loading a model trained before embedding extension was implemented; pass an explicit vocab namespace if you want to extend the vocabulary."

vocab_namespace = "tokens"
logging.warning("No vocab_namespace provided to Embedder.extend_vocab. Defaulting to 'tokens'.")
# It's not safe to default to 'tokens' when we aren't sure that 'tokens'
# need to be extended. (Without this, several tests fail.)
Contributor

No need to reference failing tests in comments in the code - just give the justification (that it's not safe to default to "tokens").

@HarshTrivedi
Contributor Author

@matt-gardner, Thanks for the review! This should be good to merge now.

@matt-gardner merged commit 39413f2 into allenai:master on Feb 12, 2019
@HarshTrivedi deleted the fix-embedding-extension-load branch on February 12, 2019 03:36