New WikiTables language #2350

matt-gardner · 2019-01-12T16:17:46Z

This PR re-writes the variable-free language using the new API that does automatic grammar induction. I improved some things while porting the NLVR language, and this PR depends on those improvements, so it's not ready for review until #2319 is merged. But I'm opening this now so I can test it on CI.

This PR is a net reduction of ~600 lines of code, and the code that's left is way simpler. I'm pretty happy with how this rewrite of the grammar stuff worked out.

A few points to note on this new language:

I considered fixing the row / list of rows distinction (as mentioned here), but I thought for this initial PR I'd stick as closely as possible to the original language. We can change it later if we want to. Though, for this particular decision, I'm not sure if distinguishing the two actually helps you - you remove some spurious programs, but it comes at the expense of a deeper type system, because you have to add function calls to get from List[Row] to Row, and the type of select gets muddy.
Related a bit to the above, I removed the "multi-match" stuff entirely. Instead, I made the multi-matching happen at the terminal productions. This was a lot easier to do than implementing multi-match stuff in the new grammar induction API, and, more importantly, I think the new way makes more sense from a modeling perspective than the previous way. For instance, select has type <List[Row],Column:List[str]>. With the multi-match implementation, you would get three productions for List[str]: ['List[str] -> [<List[Row],Column:List[str]>, List[Row], DateColumn]', 'List[str] -> [<List[Row],Column:List[str]>, List[Row], NumberColumn]', 'List[str] -> [<List[Row],Column:List[str]>, List[Row], StringColumn]']. This means that the model has to decide when it's expanding List[str] what kind of column it is going to use. Except the column it uses is really a linking decision, and we have a lot more capacity to make that decision when we're predicting terminals, using our linking mechanism. So we're asking the List[str] embedding to do a whole lot, more than it should be doing. The way I implemented this, List[str] has just one production from select: ['List[str] -> [<List[Row],Column:List[str]>, List[Row], Column], and now there's a Column pre-terminal that expands to columns of all types. That is, the "multi-matching" happens at the terminals (where they are parameterized with a linking mechanism), instead of high up in the abstract syntax tree. This paragraph describes the modeling reasons for doing it this way, and what it means for how the grammar is different. To see how it's implemented, look at the changes I made to domain_language.py.
While I did not distinguish between List[Row] and Row, I did make a minor change to the language, distinguishing between List[str] (the result of doing a select or a mode) and str (the words you can use from the question). filter_in and filter_not_in using str instead of List[str].
This seemed like an obvious improvement, as question words are always strings, not lists. Though, thinking about it more, maybe I messed up - this means you can't filter based on the output of a select. @pdasigi, do we currently support that? Do we need it? It wasn't in any of the parsing tests, or it would have failed. This is easy to change back, if we need to (we can just mess with the types a bit to coerce the strings to be treated as List[str]).

matt-gardner · 2019-01-15T05:18:25Z

Ok, @pdasigi, this is rebased and should be ready for review.

pdasigi

Looks great!

allennlp/semparse/domain_languages/wikitables_language.py

allennlp/tests/semparse/domain_languages/wikitables_language_test.py

pdasigi · 2019-01-16T19:59:25Z

@matt-gardner Oh, and regarding your question about str vs List[str], I think it is better to have everything be a List[str]. More generally, the original implementation of this language involved evaluating the "expression" passed as the filter value in all filtering functions, and thus the value used for filtering could itself be a logical form with select, mode, etc. That said, I think it is okay to merge this PR now and improve the language to include those features later. The language in this repository is not the latest version of our implementation anyway (the one in allennlp-weak-supervision-research has some improvements on top of this). Actually, you may want to add a TODO somewhere saying that some features in the language are missing.

berlino · 2019-01-24T22:08:20Z

@matt-gardner Regarding differentiating between "rows" and "a list of rows", grammar indeed becomes messy after my attempts. But I think a more realistic way would be enforce some constraints during the execution phase. For instance, it would fail if the function "previous/next" gets a list row with more than one element; on the contrary, functions like "average/sum/min" should take a list of rows with more than one element (though it's possible to have only one). Consider the cases like "min (argmax ... ), average(argmin(...))", they are very likely to be spurious programs. Does it make sense to you?

matt-gardner · 2019-01-24T23:20:27Z

Yes, failing during execution is easy - we just need to raise an ExecutionError at the appropriate time, and be sure we catch it. I suppose you could use this to filter out a few spurious programs during decoding. I have no intuition for how often this actually happens, though.

matt-gardner force-pushed the wtq_language branch from 8260e39 to 1702b4c Compare January 15, 2019 05:10

matt-gardner changed the title ~~[Depends on #2319] New WikiTables language~~ New WikiTables language Jan 15, 2019

matt-gardner requested a review from pdasigi January 15, 2019 05:18

pdasigi approved these changes Jan 16, 2019

View reviewed changes

matt-gardner added 7 commits January 16, 2019 16:28

Working on moving WikiTables language to new API

f7f545a

Logical form execution passes all of the original tests

66f72ea

Got WikiTablesLanguage passing all tests

eb53edc

Remove old executor / type declaration / world

ea98432

pylint, mypy, and docs

c341c65

Switch executor tests back to returning lists

ab492be

Address PR feedback

6d9f364

matt-gardner force-pushed the wtq_language branch from 873b7f5 to 6d9f364 Compare January 17, 2019 00:29

matt-gardner merged commit ae63a2a into allenai:master Jan 17, 2019

matt-gardner deleted the wtq_language branch January 17, 2019 01:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

New WikiTables language #2350

New WikiTables language #2350

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

New WikiTables language #2350

New WikiTables language #2350

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!