-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Conversation
8260e39
to
1702b4c
Compare
Ok, @pdasigi, this is rebased and should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
@matt-gardner Oh, and regarding your question about |
873b7f5
to
6d9f364
Compare
@matt-gardner Regarding differentiating between "rows" and "a list of rows", grammar indeed becomes messy after my attempts. But I think a more realistic way would be enforce some constraints during the execution phase. For instance, it would fail if the function "previous/next" gets a list row with more than one element; on the contrary, functions like "average/sum/min" should take a list of rows with more than one element (though it's possible to have only one). Consider the cases like "min (argmax ... ), average(argmin(...))", they are very likely to be spurious programs. Does it make sense to you? |
Yes, failing during execution is easy - we just need to raise an |
This PR re-writes the variable-free language using the new API that does automatic grammar induction. I improved some things while porting the NLVR language, and this PR depends on those improvements, so it's not ready for review until #2319 is merged. But I'm opening this now so I can test it on CI.
This PR is a net reduction of ~600 lines of code, and the code that's left is way simpler. I'm pretty happy with how this rewrite of the grammar stuff worked out.
A few points to note on this new language:
List[Row]
toRow
, and the type ofselect
gets muddy.select
has type<List[Row],Column:List[str]>
. With the multi-match implementation, you would get three productions forList[str]
:['List[str] -> [<List[Row],Column:List[str]>, List[Row], DateColumn]', 'List[str] -> [<List[Row],Column:List[str]>, List[Row], NumberColumn]', 'List[str] -> [<List[Row],Column:List[str]>, List[Row], StringColumn]']
. This means that the model has to decide when it's expandingList[str]
what kind of column it is going to use. Except the column it uses is really a linking decision, and we have a lot more capacity to make that decision when we're predicting terminals, using our linking mechanism. So we're asking theList[str]
embedding to do a whole lot, more than it should be doing. The way I implemented this,List[str]
has just one production fromselect
:['List[str] -> [<List[Row],Column:List[str]>, List[Row], Column]
, and now there's aColumn
pre-terminal that expands to columns of all types. That is, the "multi-matching" happens at the terminals (where they are parameterized with a linking mechanism), instead of high up in the abstract syntax tree. This paragraph describes the modeling reasons for doing it this way, and what it means for how the grammar is different. To see how it's implemented, look at the changes I made todomain_language.py
.List[Row]
andRow
, I did make a minor change to the language, distinguishing betweenList[str]
(the result of doing aselect
or amode
) andstr
(the words you can use from the question).filter_in
andfilter_not_in
usingstr
instead ofList[str]
.This seemed like an obvious improvement, as question words are always strings, not lists. Though, thinking about it more, maybe I messed up - this means you can't filter based on the output of a select. @pdasigi, do we currently support that? Do we need it? It wasn't in any of the parsing tests, or it would have failed. This is easy to change back, if we need to (we can just mess with the types a bit to coerce the strings to be treated as
List[str]
).