Conversation
Process question: Given that this is mostly raw Calypso code, should this be more on the stamp end of the code-review spectrum, or should I be more detailed?
If you see something that you think could be improved, then you should point it out. But if you have questions like "why did you implement X this way?", the answer is likely going to be "that's how it was implemented in Calypso and I just copied the code over" 🤷♀️
@@ -0,0 +1,265 @@
"""
The BidirectionalTransformerEncoder from calypso
I think we should link the annotated transformer tutorial (http://nlp.seas.harvard.edu/2018/04/03/attention.html) here, as this implementation appears to be based on it. Conveniently, it's MIT-licensed: https://github.com/harvardnlp/annotated-transformer/blob/master/LICENSE
class SublayerConnection(torch.nn.Module):
    """
    A residual connection followed by a layer norm.
    Note for code simplicity the norm is first as opposed to last.
Huh. This is different from the original transformer described in "Attention Is All You Need", which uses `self.norm(x + self.dropout(sublayer(x)))`. Not sure why that would be more complicated. Regardless, since this is how Calypso/the Annotated Transformer do it, looks good...
Ah, it's simply that they apply the norm later. Not that it's left out.
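For reference, here is a minimal sketch of the two orderings being discussed. The class names, constructor arguments, and use of `torch.nn.LayerNorm` are illustrative assumptions following the Annotated Transformer convention, not the PR's actual code.

```python
import torch


class PreNormSublayerConnection(torch.nn.Module):
    """Pre-norm residual block (Annotated Transformer / this port):
    the layer norm is applied to the sublayer's input, not to the sum."""

    def __init__(self, size: int, dropout: float) -> None:
        super().__init__()
        self.norm = torch.nn.LayerNorm(size)
        self.dropout = torch.nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # x + dropout(sublayer(norm(x)))
        return x + self.dropout(sublayer(self.norm(x)))


class PostNormSublayerConnection(torch.nn.Module):
    """The "Attention Is All You Need" ordering quoted above:
    norm(x + dropout(sublayer(x)))."""

    def __init__(self, size: int, dropout: float) -> None:
        super().__init__()
        self.norm = torch.nn.LayerNorm(size)
        self.dropout = torch.nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        return self.norm(x + self.dropout(sublayer(x)))
```

Both blocks take `sublayer` as a callable (e.g. an attention or feed-forward module); only the placement of the norm differs.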
        self.d_k = input_dim // num_heads
        self.num_heads = num_heads
        self.linears = util.clone(torch.nn.Linear(input_dim, input_dim), 4)
        self.attn = None
Appears to be dead code.
        # We assume d_v always equals d_k
        self.d_k = input_dim // num_heads
        self.num_heads = num_heads
        self.linears = util.clone(torch.nn.Linear(input_dim, input_dim), 4)
It's a bit scary to have these stored as a list because fundamentally they're heterogeneous. The first three layers are for projecting the query, key, and value respectively, and the final layer is for projecting the concatenated heads. Since you're just porting I don't want to request any big changes, but a warning comment here would be great.
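To illustrate the concern, here is a sketch of how the four cloned linears are typically consumed in an Annotated Transformer-style multi-head attention, with the kind of warning comment suggested above. The class name, signature, and use of `ModuleList` are assumptions for the example; the PR's actual forward pass may differ (e.g. masking and attention dropout are omitted here).

```python
import math

import torch


class MultiHeadedAttention(torch.nn.Module):
    """Sketch of how the four identically shaped linears are used."""

    def __init__(self, num_heads: int, input_dim: int) -> None:
        super().__init__()
        assert input_dim % num_heads == 0
        self.d_k = input_dim // num_heads
        self.num_heads = num_heads
        # WARNING: heterogeneous despite identical shapes.
        # linears[0..2] project the query, key, and value respectively;
        # linears[3] projects the concatenated heads back to input_dim.
        self.linears = torch.nn.ModuleList(
            [torch.nn.Linear(input_dim, input_dim) for _ in range(4)])

    def forward(self, query, key, value):
        batch_size = query.size(0)
        # Project the inputs and split into heads: (batch, num_heads, seq_len, d_k).
        query, key, value = [
            layer(x).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
            for layer, x in zip(self.linears[:3], (query, key, value))
        ]
        # Scaled dot-product attention (masking and dropout omitted).
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
        context = torch.matmul(torch.softmax(scores, dim=-1), value)
        # Re-concatenate the heads and apply the final output projection.
        context = context.transpose(1, 2).contiguous().view(
            batch_size, -1, self.num_heads * self.d_k)
        return self.linears[-1](context)
```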
        if input_dropout:
            self._dropout = torch.nn.Dropout(input_dropout)
        else:
            self._dropout = lambda x: x
For my edification, have we found there to be substantial overhead to `torch.nn.Dropout(0.0)`?
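For context on the question, a small sketch of the two patterns (purely illustrative; no claim is made here about measured overhead):

```python
import torch

input_dropout = 0.0  # e.g. the constructor argument

# Pattern in the PR: skip the module entirely when dropout is disabled.
if input_dropout:
    dropout = torch.nn.Dropout(input_dropout)
else:
    dropout = lambda x: x  # plain identity function

# Alternative being asked about: always construct the module.
# With p=0.0, torch.nn.Dropout leaves the values unchanged; any overhead
# is just the extra module/functional dispatch during training.
always_dropout = torch.nn.Dropout(p=0.0)

x = torch.randn(2, 3)
assert torch.equal(dropout(x), x)
assert torch.equal(always_dropout(x), x)
```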
The only critical comment is the licensing one. The rest are minor/for my understanding. Thanks for the PR, Joel!
Per Matt's suggestion on Slack, we should probably mark this as AllenNLP-internal somehow, in the hopes of consolidating to one transformer implementation soon. Also, it would be great to get this merged before the break if you have time. :)
Comments addressed, sorry about the delay.
Thanks, Joel! No worries at all.
There's still some potential duplicate code, but it's not exact duplication, so it would take some work to factor out the common parts, and I don't know that that's a great use of anyone's time.
For the most part this is just a straight port plus type annotations and minor cleanup.