Conversation
I'm not really familiar with this model, so my comments are mostly stylistic / about documentation
""" | ||
A Biaffine attention layer. | ||
|
||
This layer computes two projections of its' inputs in addition |
unnecessary apostrophe after its
Parameters
----------
input1 : ``torch.Tensor``
    An input tensor with shape (batch, timesteps, input1_size).
these are called input1_dim and input2_dim above
also, batch_size
-------
    A tensor with shape (batch, output_dim, length, length).
"""
# Shape (batch_size, num_labels, timesteps, 1)
num_labels -> output_dim?
of nodes.
current_nodes : ``List[bool]``, required.
    The nodes which are representatives in the graph.
    A representative at it's most basic represents a node,
its
An empty dictionary which will be populated with the
nodes which are connected in the minimum spanning tree.
old_input: ``numpy.ndarray``, required.
old_input: ``numpy.ndarray``, required.
repeated line
allennlp/models/dependency_parser.py
minus_mask = (1 - float_mask) * minus_inf
attended_arcs = attended_arcs + minus_mask.unsqueeze(2) + minus_mask.unsqueeze(1)

if self.training or (self.eval and not self.use_mst_decoding_for_validation):
Isn't `self.eval` a function, so it will always evaluate as truthy? Possibly you just want `if self.training or not self.use_mst_decoding_for_validation:`
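For anyone skimming the thread, a tiny standalone sketch (not the parser's code) of why the parenthesised check adds nothing: `nn.Module.eval` without parentheses is just the bound method, which is always truthy, so the clause reduces to `not self.use_mst_decoding_for_validation`.

```python
import torch

module = torch.nn.Linear(3, 3)
module.eval()                     # actually switch to evaluation mode

# `module.eval` without parentheses is the bound method object, which is
# always truthy -- it says nothing about the module's current mode.
print(bool(module.eval))          # True
print(module.training)            # False: this attribute is the real mode flag

# Because `module.eval` is always truthy,
#     training or (module.eval and not flag)
# reduces to
#     training or not flag
# which is exactly the simpler condition suggested above.
```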
Good catch, thanks 👍
allennlp/models/dependency_parser.py
child_type_representation,
attended_arcs,
mask)
elif self.use_mst_decoding_for_validation:
And then just use `else` here? I stared at it for a minute to figure out what happens if neither condition is hit before realizing that couldn't happen.
allennlp/models/dependency_parser.py
-------
An output dictionary consisting of:
loss : torch.FloatTensor, optional
    A scalar loss to be optimised.
there's other stuff in the output dict too
self.use_mst_decoding_for_validation = use_mst_decoding_for_validation

self._attachement_scores = AttachmentScores()
typo: _attachment_scores
allennlp/models/dependency_parser.py
mask)
loss = arc_nll + type_nll

# We calculate attatchment scores for the whole sentence
typo: attachment
I ran out of time for finishing this when I got to the MST algorithm, but what I looked at looked pretty good.
@@ -77,14 +82,21 @@ def text_to_instance(self, # type: ignore
indices as fields.
"""
fields: Dict[str, Field] = {}
tokens = TextField([Token(w) for w in words], self._token_indexers)

# In order to make it easy to structure a model as predicting arcs, we add a
Feels like this should be in a docstring, as it affects how you use this class.
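For readers outside the thread, a rough self-contained sketch of the idea the truncated comment above is describing (prepending an artificial root token so the model can predict a head for every real word). The sentinel symbol and helper below are assumptions for illustration, not the PR's actual reader code.

```python
from typing import List, Tuple

ROOT_TOKEN = "@@ROOT@@"   # hypothetical sentinel; the real reader may use another symbol

def add_dummy_root(words: List[str],
                   heads: List[int],
                   labels: List[str]) -> Tuple[List[str], List[int], List[str]]:
    """Prepend an artificial root so every real word has a head to point at.

    CoNLL-U heads are 1-indexed with 0 meaning "attached to root", so once a
    token occupies position 0 those indices line up with list positions as-is;
    the artificial root conventionally points at index 0 (itself).
    """
    return [ROOT_TOKEN] + words, [0] + heads, ["root"] + labels

# "dogs bark": "bark" is the sentence root, "dogs" attaches to it.
print(add_dummy_root(["dogs", "bark"], [2, 0], ["nsubj", "root"]))
# (['@@ROOT@@', 'dogs', 'bark'], [0, 2, 0], ['root', 'nsubj', 'root'])
```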
fields["words"] = tokens | ||
fields["pos_tags"] = SequenceLabelField(upos_tags, tokens, label_namespace="pos") | ||
fields["head_tags"] = SequenceLabelField([x[0] for x in dependencies], | ||
if self._use_pos_tags and upos_tags is not None: |
Isn't this redundant? Or are you looking ahead to a demo? Hmm, yeah, you probably need both of these conditions to handle a demo correctly.
allennlp/models/dependency_parser.py
@Model.register("dependency_parser")
class DependencyParser(Model):
"""
Brief model overview and pointer to the paper you're re-implementing here.
allennlp/models/dependency_parser.py
# shape (batch_size, timesteps, type_representation_dim)
head_type_representation = F.elu(self.head_type_projection(encoded_text))
child_type_representation = F.elu(self.child_type_projection(encoded_text))
head_type_representation = head_type_representation.contiguous()
Why isn't this already contiguous? Seems like it should be.
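A quick standalone check backing this up, under the assumption that `head_type_projection` is an ordinary `Linear`: the output of `Linear` + `elu` is freshly allocated and already contiguous; it's view-producing ops like `transpose` that make tensors non-contiguous.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 4)
projection = torch.nn.Linear(4, 3)

y = F.elu(projection(x))
print(y.is_contiguous())                   # True: matmul/elementwise ops allocate fresh storage

# Non-contiguity typically only appears after view-producing ops such as transpose.
print(y.transpose(1, 2).is_contiguous())   # False
```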
allennlp/models/dependency_parser.py
float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
`num_words` or `sentence_length` instead of `timesteps`? Applies throughout.
""" | ||
A Biaffine attention layer. | ||
|
||
This layer computes two projections of its' inputs in addition |
s/its'/its/
first_biaffine = torch.matmul(input1.unsqueeze(1), self._biaffine_projection)
# Shape (batch, output_dim, timesteps, timesteps)
second_biaffine = torch.matmul(first_biaffine, input2.unsqueeze(1).transpose(2, 3))
combined = second_biaffine + projected_input1 + projected_input2 + self._bias
You can do this with a single biaffine multiplication if you append a bias to each input before the multiplication. That is:
# in __init__
self._biaffine_projection = Parameter(torch.Tensor(output_dim, input_dim1 + 1, input_dim2 + 1))
# here
input1 = torch.cat([input1, ones], dim=-1)
input2 = torch.cat([input2, ones], dim=-1)
# first and second biaffine matmuls as before, and the result
# of `second_biaffine` is now what was `combined`
And if you do that, you could just add a flag for whether to include these biases to our current `BilinearMatrixAttention`, and you could just use that without having to add another class.
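To make the suggestion concrete, a self-contained sketch (shapes and names here are assumptions, not the PR's code) showing that appending a constant 1 to each input folds both linear terms and the scalar bias into one biaffine product:

```python
import torch

batch, timesteps, dim1, dim2, output_dim = 2, 7, 5, 6, 3

input1 = torch.randn(batch, timesteps, dim1)
input2 = torch.randn(batch, timesteps, dim2)

# Single (output_dim, dim1 + 1, dim2 + 1) parameter: the extra row and column
# absorb the two per-input linear terms, and the corner entry is the scalar bias.
weight = torch.randn(output_dim, dim1 + 1, dim2 + 1)

ones = input1.new_ones(batch, timesteps, 1)
x1 = torch.cat([input1, ones], dim=-1)     # (batch, timesteps, dim1 + 1)
x2 = torch.cat([input2, ones], dim=-1)     # (batch, timesteps, dim2 + 1)

# Shape (batch, output_dim, timesteps, dim2 + 1)
first = torch.matmul(x1.unsqueeze(1), weight)
# Shape (batch, output_dim, timesteps, timesteps): biaffine + linear terms + bias in one product
combined = torch.matmul(first, x2.unsqueeze(1).transpose(2, 3))
print(combined.shape)   # torch.Size([2, 3, 7, 7])
```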
@@ -0,0 +1,287 @@
I didn't really look at this file, as I need to get on to some ACL prep stuff, and I'm not familiar with the algorithm, anyway. I'll just trust your tests.
def test_dependency_parser_can_save_and_load(self):
    self.ensure_model_can_train_save_and_load(self.param_file)
Extra blank lines in a few places here.
correct_labels_and_indices = correct_indices * correct_labels
labeled_exact_match = (correct_labels_and_indices + (1 - mask)).prod(dim=-1)

self._unlabeled_count += correct_indices.sum()
`_unlabeled_count` feels like a denominator name to me, not a numerator. It'd be more obvious as `self._unlabeled_correct`. Same for the others.
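For illustration, a minimal sketch of the bookkeeping with the clearer `_correct`-style names; the class, call signature, and field names below are assumptions, not the PR's actual `AttachmentScores` implementation.

```python
import torch

class SimpleAttachmentScores:
    """Accumulates unlabeled/labeled attachment counts over masked tokens."""

    def __init__(self):
        self._unlabeled_correct = 0.0   # numerator: tokens with the right head
        self._labeled_correct = 0.0     # numerator: right head and right label
        self._total_words = 0.0         # denominator

    def __call__(self, predicted_heads, predicted_labels, gold_heads, gold_labels, mask):
        correct_heads = predicted_heads.eq(gold_heads).float() * mask
        correct_labels = predicted_labels.eq(gold_labels).float() * mask
        self._unlabeled_correct += correct_heads.sum().item()
        self._labeled_correct += (correct_heads * correct_labels).sum().item()
        self._total_words += mask.sum().item()

    def get_metric(self):
        denominator = max(self._total_words, 1.0)
        return {"UAS": self._unlabeled_correct / denominator,
                "LAS": self._labeled_correct / denominator}

# Toy usage: one three-word sentence where the second predicted head is wrong.
metric = SimpleAttachmentScores()
metric(torch.tensor([[2, 0, 2]]), torch.tensor([[1, 0, 1]]),
       torch.tensor([[2, 3, 2]]), torch.tensor([[1, 0, 1]]),
       torch.tensor([[1.0, 1.0, 1.0]]))
print(metric.get_metric())   # {'UAS': 0.666..., 'LAS': 0.666...}
```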
float_mask = mask.float()
encoded_text = self.encoder(embedded_text_input, mask)

# shape (batch_size, timesteps, arc_representation_dim)
I still have a strong preference for something with more semantic content than "timesteps" here; if you really prefer "timesteps", then leave it, I just think it's a pretty vacuous (and misleading) term. You have "num_tokens" in part of your docstring above, so whatever you pick, make sure you update the docstring to match.
* model shell
* add test fixture for dependency parser
* make batch model test ignore any key containing 'loss'
* add dummy root node so we can predict an arc for it
* mostly functional forward pass
* checkpoint, added a better split out structure for inference
* docs for loss
* add greedy decoding to parser
* add MST decoding
* add decode for dependency parser
* tweaks
* add dependency parsing metric and integrate into dependency parser
* cleanup
* add docs
* clean up biaffine attention
* add edmonds algorithm for decoding
* clean up
* refactor
* some initial tests for edmonds alg
* docs
* tidy up more
* add some tests for edmonds alg, remove symbolic labels from args
* add some more tests
* add some more docs
* fix typo
* m-m-m-more typos
* make pos tags optional so we can crush them with elmo
* add edmonds alg to docs
* cleanup
* explain complicated indexing thing, more docs
* remove from_params from dependency parser
* pylint
* superficial PR coments
* change name of model
* use range vec instead of arange
* metrics tweaks
* fix lint
* more tweaks
* fix docs
* appease sphinx
* add bias option to bilinear matrix attention
* refactor bilinear matrix attention to use matmuls
* remove biaffine attention
* more documentation about the model
* remove unecessary calls to .data
* simplify advanced indexing, replace use of timesteps
Adds a dependency parser for Universal Dependencies along the lines of Deep Biaffine Attention for Neural Dependency Parsing
Adds a `Metric` for dependency parsing metrics, including unlabeled/labeled attachment scores and exact match sentence level metrics.