Conversation
I'm not really familiar with this model, so my comments are mostly stylistic / about documentation
""" | ||
A Biaffine attention layer. | ||
|
||
This layer computes two projections of its' inputs in addition |
unnecessary apostrophe after its
Parameters
----------
input1 : ``torch.Tensor``
    An input tensor with shape (batch, timesteps, input1_size).
these are called input1_dim and input2_dim above
also, batch_size
-------
    A tensor with shape (batch, output_dim, length, length).
"""
# Shape (batch_size, num_labels, timesteps, 1)
num_labels -> output_dim?
of nodes.
current_nodes : ``List[bool]``, required.
    The nodes which are representatives in the graph.
    A representative at it's most basic represents a node,
its
An empty dictionary which will be populated with the
nodes which are connected in the minimum spanning tree.
old_input: ``numpy.ndarray``, required.
old_input: ``numpy.ndarray``, required.
repeated line
allennlp/models/dependency_parser.py
minus_mask = (1 - float_mask) * minus_inf
attended_arcs = attended_arcs + minus_mask.unsqueeze(2) + minus_mask.unsqueeze(1)

if self.training or (self.eval and not self.use_mst_decoding_for_validation):
Isn't `self.eval` a function, so it will always evaluate as truthy? Possibly you just want `if self.training or not self.use_mst_decoding_for_validation:`
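For anyone skimming the thread, a tiny standalone sketch (not the parser's code) of why the parenthesised check adds nothing: `nn.Module.eval` without parentheses is just the bound method, which is always truthy, so the clause reduces to `not self.use_mst_decoding_for_validation`.

```python
import torch

module = torch.nn.Linear(3, 3)
module.eval()                     # actually switch to evaluation mode

# `module.eval` without parentheses is the bound method object, which is
# always truthy -- it says nothing about the module's current mode.
print(bool(module.eval))          # True
print(module.training)            # False: this attribute is the real mode flag

# Because `module.eval` is always truthy,
#     training or (module.eval and not flag)
# reduces to
#     training or not flag
# which is exactly the simpler condition suggested above.
```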
Good catch, thanks 👍
allennlp/models/dependency_parser.py
child_type_representation,
attended_arcs,
mask)
elif self.use_mst_decoding_for_validation:
And then just use `else` here? I stared at it for a minute to figure out what happens if neither condition is hit before realizing that couldn't happen.
allennlp/models/dependency_parser.py
-------
An output dictionary consisting of:
loss : torch.FloatTensor, optional
    A scalar loss to be optimised.
there's other stuff in the output dict too
self.use_mst_decoding_for_validation = use_mst_decoding_for_validation

self._attachement_scores = AttachmentScores()
typo: _attachment_scores
allennlp/models/dependency_parser.py
mask)
loss = arc_nll + type_nll

# We calculate attatchment scores for the whole sentence
typo: attachment
I ran out of time for finishing this when I got to the MST algorithm, but what I looked at looked pretty good.
@@ -77,14 +82,21 @@ def text_to_instance(self, # type: ignore
indices as fields.
"""
fields: Dict[str, Field] = {}
tokens = TextField([Token(w) for w in words], self._token_indexers)

# In order to make it easy to structure a model as predicting arcs, we add a
Feels like this should be in a docstring, as it affects how you use this class.
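For readers outside the thread, a rough self-contained sketch of the idea the truncated comment above is describing (prepending an artificial root token so the model can predict a head for every real word). The sentinel symbol and helper below are assumptions for illustration, not the PR's actual reader code.

```python
from typing import List, Tuple

ROOT_TOKEN = "@@ROOT@@"   # hypothetical sentinel; the real reader may use another symbol

def add_dummy_root(words: List[str],
                   heads: List[int],
                   labels: List[str]) -> Tuple[List[str], List[int], List[str]]:
    """Prepend an artificial root so every real word has a head to point at.

    CoNLL-U heads are 1-indexed with 0 meaning "attached to root", so once a
    token occupies position 0 those indices line up with list positions as-is;
    the artificial root conventionally points at index 0 (itself).
    """
    return [ROOT_TOKEN] + words, [0] + heads, ["root"] + labels

# "dogs bark": "bark" is the sentence root, "dogs" attaches to it.
print(add_dummy_root(["dogs", "bark"], [2, 0], ["nsubj", "root"]))
# (['@@ROOT@@', 'dogs', 'bark'], [0, 2, 0], ['root', 'nsubj', 'root'])
```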
fields["words"] = tokens | ||
fields["pos_tags"] = SequenceLabelField(upos_tags, tokens, label_namespace="pos") | ||
fields["head_tags"] = SequenceLabelField([x[0] for x in dependencies], | ||
if self._use_pos_tags and upos_tags is not None: |
Isn't this redundant? Or are you looking ahead to a demo? Hmm, yeah, you probably need both of these conditions to handle a demo correctly.
allennlp/models/dependency_parser.py
@Model.register("dependency_parser")
class DependencyParser(Model):
"""
Brief model overview and pointer to the paper you're re-implementing here.
allennlp/models/dependency_parser.py
# shape (batch_size, timesteps, type_representation_dim)
head_type_representation = F.elu(self.head_type_projection(encoded_text))
child_type_representation = F.elu(self.child_type_projection(encoded_text))
head_type_representation = head_type_representation.contiguous()
Why isn't this already contiguous? Seems like it should be.
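A quick standalone check backing this up, under the assumption that `head_type_projection` is an ordinary `Linear`: the output of `Linear` + `elu` is freshly allocated and already contiguous; it's view-producing ops like `transpose` that make tensors non-contiguous.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 4)
projection = torch.nn.Linear(4, 3)

y = F.elu(projection(x))
print(y.is_contiguous())                   # True: matmul/elementwise ops allocate fresh storage

# Non-contiguity typically only appears after view-producing ops such as transpose.
print(y.transpose(1, 2).is_contiguous())   # False
```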
allennlp/models/dependency_parser.py
float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
`num_words` or `sentence_length` instead of `timesteps`? Applies throughout.
""" | ||
A Biaffine attention layer. | ||
|
||
This layer computes two projections of its' inputs in addition |
s/its'/its/
first_biaffine = torch.matmul(input1.unsqueeze(1), self._biaffine_projection)
# Shape (batch, output_dim, timesteps, timesteps)
second_biaffine = torch.matmul(first_biaffine, input2.unsqueeze(1).transpose(2, 3))
combined = second_biaffine + projected_input1 + projected_input2 + self._bias
You can do this with a single biaffine multiplication if you append a bias to each input before the multiplication. That is:
# in __init__
self._biaffine_projection = Parameter(torch.Tensor(output_dim, input_dim1 + 1, input_dim2 + 1))
# here
input1 = torch.cat([input1, ones], dim=-1)
input2 = torch.cat([input2, ones], dim=-1)
# first and second biaffine matmuls as before, and the result
# of `second_biaffine` is now what was `combined`
And if you do that, you could just add a flag for whether to include these biases to our current `BilinearMatrixAttention`, and you could just use that without having to add another class.
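To make the suggestion concrete, a self-contained sketch (shapes and names here are assumptions, not the PR's code) showing that appending a constant 1 to each input folds both linear terms and the scalar bias into one biaffine product:

```python
import torch

batch, timesteps, dim1, dim2, output_dim = 2, 7, 5, 6, 3

input1 = torch.randn(batch, timesteps, dim1)
input2 = torch.randn(batch, timesteps, dim2)

# Single (output_dim, dim1 + 1, dim2 + 1) parameter: the extra row and column
# absorb the two per-input linear terms, and the corner entry is the scalar bias.
weight = torch.randn(output_dim, dim1 + 1, dim2 + 1)

ones = input1.new_ones(batch, timesteps, 1)
x1 = torch.cat([input1, ones], dim=-1)     # (batch, timesteps, dim1 + 1)
x2 = torch.cat([input2, ones], dim=-1)     # (batch, timesteps, dim2 + 1)

# Shape (batch, output_dim, timesteps, dim2 + 1)
first = torch.matmul(x1.unsqueeze(1), weight)
# Shape (batch, output_dim, timesteps, timesteps): biaffine + linear terms + bias in one product
combined = torch.matmul(first, x2.unsqueeze(1).transpose(2, 3))
print(combined.shape)   # torch.Size([2, 3, 7, 7])
```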
@@ -0,0 +1,287 @@
I didn't really look at this file, as I need to get on to some ACL prep stuff, and I'm not familiar with the algorithm, anyway. I'll just trust your tests.
def test_dependency_parser_can_save_and_load(self):
    self.ensure_model_can_train_save_and_load(self.param_file)
Extra blank lines in a few places here.
correct_labels_and_indices = correct_indices * correct_labels
labeled_exact_match = (correct_labels_and_indices + (1 - mask)).prod(dim=-1)

self._unlabeled_count += correct_indices.sum()
`_unlabeled_count` feels like a denominator name to me, not a numerator. It'd be more obvious as `self._unlabeled_correct`. Same for the others.
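For illustration, a minimal sketch of the bookkeeping with the clearer `_correct`-style names; the class, call signature, and field names below are assumptions, not the PR's actual `AttachmentScores` implementation.

```python
import torch

class SimpleAttachmentScores:
    """Accumulates unlabeled/labeled attachment counts over masked tokens."""

    def __init__(self):
        self._unlabeled_correct = 0.0   # numerator: tokens with the right head
        self._labeled_correct = 0.0     # numerator: right head and right label
        self._total_words = 0.0         # denominator

    def __call__(self, predicted_heads, predicted_labels, gold_heads, gold_labels, mask):
        correct_heads = predicted_heads.eq(gold_heads).float() * mask
        correct_labels = predicted_labels.eq(gold_labels).float() * mask
        self._unlabeled_correct += correct_heads.sum().item()
        self._labeled_correct += (correct_heads * correct_labels).sum().item()
        self._total_words += mask.sum().item()

    def get_metric(self):
        denominator = max(self._total_words, 1.0)
        return {"UAS": self._unlabeled_correct / denominator,
                "LAS": self._labeled_correct / denominator}

# Toy usage: one three-word sentence where the second predicted head is wrong.
metric = SimpleAttachmentScores()
metric(torch.tensor([[2, 0, 2]]), torch.tensor([[1, 0, 1]]),
       torch.tensor([[2, 3, 2]]), torch.tensor([[1, 0, 1]]),
       torch.tensor([[1.0, 1.0, 1.0]]))
print(metric.get_metric())   # {'UAS': 0.666..., 'LAS': 0.666...}
```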
float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
I still have a strong preference for something with more semantic content than "timesteps" here; if you really prefer "timesteps", then leave it, I just think it's a pretty vacuous (and misleading) term. You have "num_tokens" in part of your docstring above, so whatever you pick, make sure you update the docstring to match.
* model shell
* add test fixture for dependency parser
* make batch model test ignore any key containing 'loss'
* add dummy root node so we can predict an arc for it
* mostly functional forward pass
* checkpoint, added a better split out structure for inference
* docs for loss
* add greedy decoding to parser
* add MST decoding
* add decode for dependency parser
* tweaks
* add dependency parsing metric and integrate into dependency parser
* cleanup
* add docs
* clean up biaffine attention
* add edmonds algorithm for decoding
* clean up
* refactor
* some initial tests for edmonds alg
* docs
* tidy up more
* add some tests for edmonds alg, remove symbolic labels from args
* add some more tests
* add some more docs
* fix typo
* m-m-m-more typos
* make pos tags optional so we can crush them with elmo
* add edmonds alg to docs
* cleanup
* explain complicated indexing thing, more docs
* remove from_params from dependency parser
* pylint
* superficial PR coments
* change name of model
* use range vec instead of arange
* metrics tweaks
* fix lint
* more tweaks
* fix docs
* appease sphinx
* add bias option to bilinear matrix attention
* refactor bilinear matrix attention to use matmuls
* remove biaffine attention
* more documentation about the model
* remove unecessary calls to .data
* simplify advanced indexing, replace use of timesteps
Adds a dependency parser for Universal Dependencies along the lines of Deep Biaffine Attention for Neural Dependency Parsing
Adds a `Metric` for dependency parsing metrics, including unlabeled/labeled attachment scores and exact match sentence level metrics.