Dependency parser by DeNeutoy · Pull Request #1465 · allenai/allennlp · GitHub
This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Dependency parser #1465

Merged
merged 51 commits into allenai:master on Jul 20, 2018

Conversation

Contributor
@DeNeutoy DeNeutoy commented Jul 6, 2018

@DeNeutoy DeNeutoy changed the title from [WIP] Dependency parser to Dependency parser on Jul 9, 2018
Contributor
@joelgrus joelgrus left a comment

I'm not really familiar with this model, so my comments are mostly stylistic / about documentation

"""
A Biaffine attention layer.

This layer computes two projections of its' inputs in addition
Contributor

unnecessary apostrophe after its

Parameters
----------
input1 : ``torch.Tensor``
An input tensor with shape (batch, timesteps, input1_size).
Contributor

these are called input1_dim and input2_dim above

Contributor

also, batch_size

-------
A tensor with shape (batch, output_dim, length, length).
"""
# Shape (batch_size, num_labels, timesteps, 1)
Contributor

num_labels -> output_dim?

of nodes.
current_nodes : ``List[bool]``, required.
The nodes which are representatives in the graph.
A representative at it's most basic represents a node,
Contributor

its

An empty dictionary which will be populated with the
nodes which are connected in the minimum spanning tree.
old_input: ``numpy.ndarray``, required.
old_input: ``numpy.ndarray``, required.
Contributor

repeated line

minus_mask = (1 - float_mask) * minus_inf
attended_arcs = attended_arcs + minus_mask.unsqueeze(2) + minus_mask.unsqueeze(1)

if self.training or (self.eval and not self.use_mst_decoding_for_validation):
Contributor

isn't self.eval a function, so it will always evaluate as truthy? possibly you just want

if self.training or not self.use_mst_decoding_for_validation:

Contributor Author

Good catch, thanks 👍
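A quick aside on the bug itself, for anyone reading along: nn.Module.eval is a method, so referencing it without calling it is always truthy, which is why the original condition could never reach the MST branch. A minimal standalone illustration (generic PyTorch, not code from this PR):

import torch

module = torch.nn.Linear(3, 3)
print(bool(module.eval))   # True: a bound method reference is always truthy
module.eval()              # calling it switches to evaluation mode...
print(module.training)     # ...which is recorded in the boolean training flag (False here)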

child_type_representation,
attended_arcs,
mask)
elif self.use_mst_decoding_for_validation:
Contributor

and then just use else here? I stared at it for a minute to figure out what happens if neither condition is hit before realizing that couldn't happen
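Putting the two suggestions together, the decoding dispatch could read roughly as follows. Method names here are illustrative placeholders, not necessarily what the PR uses:

def _decode_heads_and_tags(self, head_type_representation, child_type_representation,
                           attended_arcs, mask):
    # Greedy decoding during training (and when MST decoding is disabled);
    # MST decoding otherwise, i.e. during validation and prediction.
    if self.training or not self.use_mst_decoding_for_validation:
        return self._greedy_decode(head_type_representation,
                                   child_type_representation,
                                   attended_arcs, mask)
    return self._mst_decode(head_type_representation,
                            child_type_representation,
                            attended_arcs, mask)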

-------
An output dictionary consisting of:
loss : torch.FloatTensor, optional
A scalar loss to be optimised.
Contributor

there's other stuff in the output dict too


self.use_mst_decoding_for_validation = use_mst_decoding_for_validation

self._attachement_scores = AttachmentScores()
Contributor

typo: _attachment_scores

mask)
loss = arc_nll + type_nll

# We calculate attatchment scores for the whole sentence
Contributor

typo: attachment

Contributor
@matt-gardner matt-gardner left a comment

I ran out of time for finishing this when I got to the MST algorithm, but what I looked at looked pretty good.

@@ -77,14 +82,21 @@ def text_to_instance(self, # type: ignore
indices as fields.
"""
fields: Dict[str, Field] = {}
tokens = TextField([Token(w) for w in words], self._token_indexers)

# In order to make it easy to structure a model as predicting arcs, we add a
Contributor

Feels like this should be in a docstring, as it affects how you use this class.
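For readers unfamiliar with the framing mentioned in the truncated code comment above ("In order to make it easy to structure a model as predicting arcs, we add a ..."), here is a generic illustration of the idea, not this reader's actual code: a sentinel ROOT position lets every word, including the sentence's syntactic root, have a head index to predict. Because CoNLL-U head indices are 1-based (with 0 meaning "attaches to the root"), the gold indices already point at the right positions once a sentinel occupies index 0:

from typing import List, Tuple

def prepend_root(words: List[str],
                 head_indices: List[int],
                 head_tags: List[str]) -> Tuple[List[str], List[int], List[str]]:
    # Hypothetical helper. The sentinel's own head (0) and tag ("root") are
    # placeholders that a model would typically mask out of the loss.
    return ["@@ROOT@@"] + words, [0] + head_indices, ["root"] + head_tags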

fields["words"] = tokens
fields["pos_tags"] = SequenceLabelField(upos_tags, tokens, label_namespace="pos")
fields["head_tags"] = SequenceLabelField([x[0] for x in dependencies],
if self._use_pos_tags and upos_tags is not None:
Contributor

Isn't this redundant? Or are you looking ahead to a demo? Hmm, yeah, you probably need both of these conditions to handle a demo correctly.


@Model.register("dependency_parser")
class DependencyParser(Model):
"""
Contributor

Brief model overview and pointer to the paper you're re-implementing here.

# shape (batch_size, timesteps, type_representation_dim)
head_type_representation = F.elu(self.head_type_projection(encoded_text))
child_type_representation = F.elu(self.child_type_projection(encoded_text))
head_type_representation = head_type_representation.contiguous()
Contributor

Why isn't this already contiguous? Seems like it should be.
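For context on the question above, a tiny generic PyTorch illustration (not this model's code): the output of a Linear projection is already contiguous, so an extra .contiguous() there is a no-op; the call only matters after stride-changing operations such as transpose, typically right before a .view():

import torch

x = torch.randn(2, 5, 8)                                      # (batch, timesteps, dim)
projected = torch.nn.functional.elu(torch.nn.Linear(8, 8)(x))
print(projected.is_contiguous())                               # True: fresh matmul output

transposed = projected.transpose(1, 2)
print(transposed.is_contiguous())                              # False: transpose only swaps strides
flattened = transposed.contiguous().view(2, -1)                # .view() needs contiguous memory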

float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
Contributor

num_words or sentence_length instead of timesteps? Applies throughout.

"""
A Biaffine attention layer.

This layer computes two projections of its' inputs in addition
Contributor

s/its'/its/

first_biaffine = torch.matmul(input1.unsqueeze(1), self._biaffine_projection)
# Shape (batch, output_dim, timesteps, timesteps)
second_biaffine = torch.matmul(first_biaffine, input2.unsqueeze(1).transpose(2, 3))
combined = second_biaffine + projected_input1 + projected_input2 + self._bias
Contributor

You can do this with a single biaffine multiplication if you append a bias to each input before the multiplication. That is:

# in __init__
self._biaffine_projection = Parameter(torch.Tensor(output_dim, input_dim1 + 1, input_dim2 + 1))

# here
input1 = torch.cat([input1, ones], dim=-1)
input2 = torch.cat([input2, ones], dim=-1)
# first and second biaffine matmuls as before, and the result
# of `second_biaffine` is now what was `combined`

And if you do that, you could just add a flag for whether to include these biases to our current BilinearMatrixAttention, and you could just use that without having to add another class.
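To make the shape bookkeeping concrete, here is a self-contained sketch of that single-multiplication formulation. Names and dimensions are illustrative; this spells out the suggestion above rather than the code the PR eventually merged:

import torch
from torch.nn import Module, Parameter

class BiaffineWithBias(Module):
    # Biaffine scores via one bilinear product: appending a column of ones to
    # each input folds the two linear terms and the scalar bias into the
    # single weight tensor.
    def __init__(self, input1_dim: int, input2_dim: int, output_dim: int) -> None:
        super().__init__()
        self._biaffine_projection = Parameter(
            torch.randn(output_dim, input1_dim + 1, input2_dim + 1))

    def forward(self, input1: torch.Tensor, input2: torch.Tensor) -> torch.Tensor:
        # input1: (batch, timesteps, input1_dim); input2: (batch, timesteps, input2_dim)
        input1 = torch.cat([input1, input1.new_ones(input1.size()[:-1] + (1,))], dim=-1)
        input2 = torch.cat([input2, input2.new_ones(input2.size()[:-1] + (1,))], dim=-1)
        # (batch, 1, timesteps, d1+1) @ (output_dim, d1+1, d2+1)
        #   -> (batch, output_dim, timesteps, d2+1)
        first_biaffine = torch.matmul(input1.unsqueeze(1), self._biaffine_projection)
        # ... @ (batch, 1, d2+1, timesteps) -> (batch, output_dim, timesteps, timesteps)
        return torch.matmul(first_biaffine, input2.unsqueeze(1).transpose(2, 3))

This is essentially the route the PR took in the end, by adding a bias option to the existing BilinearMatrixAttention (see the "add bias option to bilinear matrix attention" and "refactor bilinear matrix attention to use matmuls" commits below).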

@@ -0,0 +1,287 @@

Contributor

I didn't really look at this file, as I need to get on to some ACL prep stuff, and I'm not familiar with the algorithm, anyway. I'll just trust your tests.


def test_dependency_parser_can_save_and_load(self):
self.ensure_model_can_train_save_and_load(self.param_file)

Contributor

Extra blank lines in a few places here.

correct_labels_and_indices = correct_indices * correct_labels
labeled_exact_match = (correct_labels_and_indices + (1 - mask)).prod(dim=-1)

self._unlabeled_count += correct_indices.sum()
Contributor

_unlabeled_count feels like a denominator name to me, not a numerator. It'd be more obvious as self._unlabeled_correct. Same for the others.
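For readers unfamiliar with these metrics, a minimal sketch of how masked counts like the ones above become unlabeled/labeled attachment scores (UAS/LAS). The function below is illustrative, not AllenNLP's AttachmentScores implementation, but it shows why the running sums are numerators, which is the point of the naming suggestion:

import torch

def attachment_scores(predicted_indices: torch.Tensor,
                      predicted_labels: torch.Tensor,
                      gold_indices: torch.Tensor,
                      gold_labels: torch.Tensor,
                      mask: torch.Tensor):
    # All tensors have shape (batch, timesteps); mask is 1 for real tokens, 0 for padding.
    mask = mask.float()
    correct_indices = (predicted_indices == gold_indices).float() * mask
    correct_labels = (predicted_labels == gold_labels).float() * mask
    correct_labels_and_indices = correct_indices * correct_labels
    total_words = mask.sum()
    unlabeled_attachment_score = correct_indices.sum() / total_words            # UAS
    labeled_attachment_score = correct_labels_and_indices.sum() / total_words   # LAS
    return unlabeled_attachment_score.item(), labeled_attachment_score.item()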

float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
Contributor

I still have a strong preference for something with more semantic content than "timesteps" here; if you really prefer "timesteps", then leave it, I just think it's a pretty vacuous (and misleading) term. You have "num_tokens" in part of your docstring above, so whatever you pick, make sure you update the docstring to match.

@DeNeutoy DeNeutoy merged commit de0d3f7 into allenai:master Jul 20, 2018
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
* model shell

* add test fixture for dependency parser

* make batch model test ignore any key containing 'loss'

* add dummy root node so we can predict an arc for it

* mostly functional forward pass

* checkpoint, added a better split out structure for inference

* docs for loss

* add greedy decoding to parser

* add MST decoding

* add decode for dependency parser

* tweaks

* add dependency parsing metric and integrate into dependency parser

* cleanup

* add docs

* clean up biaffine attention

* add edmonds algorithm for decoding

* clean up

* refactor

* some initial tests for edmonds alg

* docs

* tidy up more

* add some tests for edmonds alg, remove symbolic labels from args

* add some more tests

* add some more docs

* fix typo

* m-m-m-more typos

* make pos tags optional so we can crush them with elmo

* add edmonds alg to docs

* cleanup

* explain complicated indexing thing, more docs

* remove from_params from dependency parser

* pylint

* superficial PR coments

* change name of model

* use range vec instead of arange

* metrics tweaks

* fix lint

* more tweaks

* fix docs

* appease sphinx

* add bias option to bilinear matrix attention

* refactor bilinear matrix attention to use matmuls

* remove biaffine attention

* more documentation about the model

* remove unecessary calls to .data

* simplify advanced indexing, replace use of timesteps