Text2sql reader by DeNeutoy · Pull Request #1738 · allenai/allennlp · GitHub
This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Text2sql reader #1738

Merged
merged 12 commits into from
Sep 11, 2018

Contributor
@DeNeutoy DeNeutoy commented Sep 9, 2018
  • Moves all semantic parsing dataset readers into their own folder.
  • Adds a dataset reader for the text2sql baseline which can read any of the 8 datasets.

I also refactored the sql utils a bit to read from my new directory format, for which I added a script in the previous PR. This includes adding functionality to de-duplicate the questions in a given dataset, not just the SQL. This PR looks massive, but I only added template_text2sql.py and modified text2sql_utils.py; everything else is just moving folders around and adding deprecation warnings.

@@ -76,32 +85,42 @@ def process_sql_data_blob(data: JsonDict,
use_all_sql : ``bool``, optional (default = False)
Whether to use all of the sql queries which have identical semantics,
or whether to just use the first one.
use_all_queries : ``bool``, (default = False)
Contributor

This doesn't match the parameter name; above it's use_unique_queries. I'd vote for keeping use_all_queries, and do if not use_all_queries where you have use_unique_queries below.
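
A minimal sketch of the suggested flag handling, assuming a hypothetical helper named select_questions that works on a plain list of question strings (neither the name nor the data shape is taken from text2sql_utils.py). With use_all_queries kept as the only flag, de-duplication runs only when it is false:

```python
from typing import List


def select_questions(questions: List[str], use_all_queries: bool = False) -> List[str]:
    """Hypothetical helper for illustration; not the function in this PR."""
    if not use_all_queries:
        # Keep only the first occurrence of each question, preserving order.
        seen = set()
        unique = []
        for question in questions:
            if question not in seen:
                seen.add(question)
                unique.append(question)
        return unique
    # Caller asked for every question, duplicates included.
    return list(questions)
```

This keeps the docstring and the signature in agreement without introducing a second, negated parameter name.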

cross_validation_split_to_exclude : ``int``, optional (default = None)
Some of the text2sql datasets are very small, so you may need to do cross validation.
Here, you can specify a integer corresponding to a split_{int}.json file not to include
int the training set.
Contributor

s/int/in/
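
For readers unfamiliar with the parameter, the behaviour the docstring describes could be sketched as follows. This assumes a directory of .json files in which each cross validation split is named split_{int}.json, as the docstring says; the helper name files_to_read is hypothetical and is not the code in this PR:

```python
import os
from typing import List, Optional


def files_to_read(directory: str,
                  cross_validation_split_to_exclude: Optional[int] = None) -> List[str]:
    """Hypothetical helper: list the .json files in a directory, skipping one split."""
    excluded = None
    if cross_validation_split_to_exclude is not None:
        # File name pattern taken from the docstring: split_{int}.json
        excluded = f"split_{cross_validation_split_to_exclude}.json"
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".json") and name != excluded
    )
```

Calling it once per held-out split index gives the training files for each fold.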

Parameters
----------
file_path : ``str``, required.
For this dataset reader, file_path can either be a path to a file _or_ a
Contributor

Use backticks instead of underscores for emphasis in RST. Sphinx might also complain about the underscore in file_path without code blocks...


assert tokens == ['how', 'many', 'buttercup', 'kitchen', 'are', 'there', 'in', 'san', 'francisco', '?']
assert tags == ['O', 'O', 'name0', 'name0', 'O', 'O', 'O', 'city_name0', 'city_name0', 'O']
assert fields["template"].label == "SELECT COUNT ( * ) FROM LOCATION AS LOCATIONalias0 , RESTAURANT " \
Contributor

So you predict what the template is and also run some kind of CRF tagger to fill in the variables in the template? Do you constrain the tagger to only use the variables in the template?

Contributor Author

Right, exactly - no, in the text2sql paper constraints are not considered. That would be a good and easy extension.
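
The extension mentioned here might look roughly like the sketch below. It rests on assumptions that do not come from this PR or the text2sql paper: that a template's variables can be recovered as whitespace-separated tokens of the template string, and that the tagger exposes per-token scores as a dict from tag to float. It also uses a greedy per-token decode rather than a CRF, purely to keep the example short. Both function names are hypothetical.

```python
from typing import Dict, List, Set


def allowed_tags(template: str, variable_names: Set[str]) -> Set[str]:
    """'O' plus every known variable that appears as a token in the template."""
    return {"O"} | {token for token in template.split() if token in variable_names}


def constrained_decode(scores: List[Dict[str, float]], allowed: Set[str]) -> List[str]:
    """Greedy decode restricted to the allowed tags (an assumption, not a CRF)."""
    tags = []
    for token_scores in scores:
        # Drop any tag whose variable does not occur in the predicted template.
        candidates = {tag: s for tag, s in token_scores.items() if tag in allowed}
        if not candidates:
            # Fall back to the outside tag when nothing allowed was scored.
            tags.append("O")
            continue
        tags.append(max(candidates, key=candidates.get))
    return tags
```

With a real CRF the same idea would be applied by masking disallowed tags before Viterbi decoding rather than per token.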

@DeNeutoy DeNeutoy merged commit 4c99f8e into allenai:master Sep 11, 2018
@DeNeutoy DeNeutoy deleted the text2sql-reader branch September 11, 2018 01:42