Text2sql reader #1738

DeNeutoy · 2018-09-09T14:05:21Z

Moves all semantic parsing dataset readers into their own folder.
Adds a dataset reader for the text2sql baseline which can read any of the 8 datasets.

I also refactored the sql utils a bit to read from my new directory format, for which I added a script in the previous PR. This includes adding functionality to de-duplicate the questions in a given dataset, not just the SQL. This PR looks massive, but I only added template_text2sql.py and modified text2sql_utils.py - all the rest is just moving folders around and adding deprecation warnings.


matt-gardner · 2018-09-10T22:20:39Z

allennlp/data/dataset_readers/dataset_utils/text2sql_utils.py

@@ -76,32 +85,42 @@ def process_sql_data_blob(data: JsonDict,
    use_all_sql : ``bool``, optional (default = False)
        Whether to use all of the sql queries which have identical semantics,
        or whether to just use the first one.
+    use_all_queries : ``bool``, (default = False)


This doesn't match the parameter name; above it's use_unique_queries. I'd vote for keeping use_all_queries, and using `if not use_all_queries` where you have use_unique_queries below.
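
For illustration, the suggested naming could look roughly like the sketch below. The function name `filter_questions` and the `"text"` key are assumptions made up for this example; they are not taken from text2sql_utils.py, and only the flag name `use_all_queries` comes from the diff above.

```python
from typing import Dict, List


def filter_questions(sentences: List[Dict[str, str]],
                     use_all_queries: bool = False) -> List[Dict[str, str]]:
    """Return all sentences, or only the first occurrence of each question text."""
    if use_all_queries:
        return list(sentences)

    # `if not use_all_queries` branch: de-duplicate on the question text,
    # preserving the original order.
    seen = set()
    unique = []
    for sentence in sentences:
        text = sentence["text"]  # hypothetical key holding the question
        if text not in seen:
            seen.add(text)
            unique.append(sentence)
    return unique
```

Keeping the positive flag name means the default (`False`) corresponds to the de-duplicated behaviour, with no double negative at the call site.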

matt-gardner · 2018-09-10T22:26:09Z

allennlp/data/dataset_readers/semantic_parsing/template_text2sql.py

+    cross_validation_split_to_exclude : ``int``, optional (default = None)
+        Some of the text2sql datasets are very small, so you may need to do cross validation.
+        Here, you can specify a integer corresponding to a split_{int}.json file not to include
+        int the training set.


s/int/in/
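
A minimal sketch of how such an exclusion might work, assuming the reader is handed a list of file names and that split files follow the split_{int}.json pattern from the docstring. The helper name `select_split_files` is hypothetical and is not the reader's actual implementation.

```python
import os
from typing import List, Optional


def select_split_files(file_names: List[str],
                       cross_validation_split_to_exclude: Optional[int] = None) -> List[str]:
    """Return the files to read, dropping split_{int}.json when a split is excluded."""
    if cross_validation_split_to_exclude is None:
        return list(file_names)

    # Compare on the base name so the directory part of the path does not matter.
    excluded = f"split_{cross_validation_split_to_exclude}.json"
    return [name for name in file_names if os.path.basename(name) != excluded]
```

Calling this once per held-out split index gives the training files for each cross-validation fold.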

matt-gardner · 2018-09-10T22:27:56Z

allennlp/data/dataset_readers/semantic_parsing/template_text2sql.py

+        Parameters
+        ----------
+        file_path : ``str``, required.
+            For this dataset reader, file_path can either be a path to a file _or_ a


Use backticks instead of underscores for emphasis in RST. Sphinx might also complain about the underscore in file_path without code blocks...

matt-gardner · 2018-09-10T22:37:22Z

allennlp/tests/data/dataset_readers/semantic_parsing/template_text2sql_test.py

+
+        assert tokens == ['how', 'many', 'buttercup', 'kitchen', 'are', 'there', 'in', 'san', 'francisco', '?']
+        assert tags == ['O', 'O', 'name0', 'name0', 'O', 'O', 'O', 'city_name0', 'city_name0', 'O']
+        assert fields["template"].label == "SELECT COUNT ( * ) FROM LOCATION AS LOCATIONalias0 , RESTAURANT " \


So you predict what the template is and also run some kind of CRF tagger to fill in the variables in the template? Do you constrain the tagger to only use the variables in the template?

Right, exactly - no, in the text2sql paper constraints are not considered. That would be a good and easy extension.
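
To make the template-plus-tagger idea concrete, here is a small sketch of how predicted tags could be used to fill a predicted template. Both function names are hypothetical, and the template in the usage example is an invented, shortened one (the real label in the test above is truncated). As noted in the reply, nothing here constrains the tagger to the variables that appear in the template.

```python
import re
from typing import Dict, List


def collect_variables(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Group tokens by their non-'O' tag, joining multi-token values with spaces."""
    grouped: Dict[str, List[str]] = {}
    for token, tag in zip(tokens, tags):
        if tag != "O":
            grouped.setdefault(tag, []).append(token)
    return {name: " ".join(words) for name, words in grouped.items()}


def fill_template(template: str, variables: Dict[str, str]) -> str:
    """Substitute each variable name in the template with its tagged value."""
    if not variables:
        return template
    # Word boundaries keep "name0" from matching inside "city_name0".
    names = "|".join(re.escape(name) for name in variables)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda match: variables[match.group(1)], template)
```

With the tokens and tags from the test above:

```python
tokens = ['how', 'many', 'buttercup', 'kitchen', 'are', 'there', 'in', 'san', 'francisco', '?']
tags = ['O', 'O', 'name0', 'name0', 'O', 'O', 'O', 'city_name0', 'city_name0', 'O']
variables = collect_variables(tokens, tags)
# Hypothetical template, for illustration only.
template = 'SELECT COUNT ( * ) FROM RESTAURANT WHERE CITY = "city_name0" AND NAME = "name0"'
query = fill_template(template, variables)
```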

Mark Neumann and others added 9 commits September 7, 2018 14:58

make semantic_parsing dataset reader directory

83e2290

add dataset reader for baseline

2fd7145

add template text2sql reader

06adc30

update docs, fix typing

03f6be3

Merge branch 'master' into text2sql-reader

441c866

fix build

e56b1fb

Merge branch 'text2sql-reader' of https://github.com/DeNeutoy/allennlp …

6bd7200

…into text2sql-reader

ignore depreciated docs

4d870eb

fix docs

82589d5

DeNeutoy requested a review from matt-gardner September 9, 2018 20:01

matt-gardner approved these changes Sep 10, 2018


Mark Neumann and others added 3 commits September 10, 2018 16:15

PR comments

faab1dc

Merge branch 'master' into text2sql-reader

ff8677c

Merge branch 'master' into text2sql-reader

1a76776

DeNeutoy merged commit 4c99f8e into allenai:master Sep 11, 2018

DeNeutoy deleted the text2sql-reader branch September 11, 2018 01:42
