Conversation
@@ -76,32 +85,42 @@ def process_sql_data_blob(data: JsonDict,
use_all_sql : ``bool``, optional (default = False)
    Whether to use all of the sql queries which have identical semantics,
    or whether to just use the first one.
use_all_queries : ``bool``, (default = False)
This doesn't match the parameter name; above it's `use_unique_queries`. I'd vote for keeping `use_all_queries`, and doing `if not use_all_queries` where you have `use_unique_queries` below.
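A minimal sketch of the toggle being discussed, assuming examples are dicts with a `"question"` key (the function name and data shape here are hypothetical, not the PR's actual code):

```python
from typing import Dict, List


def deduplicate_queries(sql_data: List[Dict[str, str]],
                        use_all_queries: bool = False) -> List[Dict[str, str]]:
    # Hypothetical illustration: when use_all_queries is False, keep only
    # the first example for each distinct question text; when True, keep
    # every paraphrase (including duplicates with identical semantics).
    if use_all_queries:
        return list(sql_data)
    seen = set()
    unique = []
    for example in sql_data:
        if example["question"] not in seen:
            seen.add(example["question"])
            unique.append(example)
    return unique
```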
cross_validation_split_to_exclude : ``int``, optional (default = None)
    Some of the text2sql datasets are very small, so you may need to do cross validation.
    Here, you can specify an integer corresponding to a split_{int}.json file not to include
    int the training set.
s/int/in/
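The docstring above can be sketched as follows; the function name and directory layout are assumptions for illustration, not the reader's actual implementation:

```python
import glob
import os
from typing import List, Optional


def training_split_files(data_directory: str,
                         cross_validation_split_to_exclude: Optional[int] = None) -> List[str]:
    # Hypothetical sketch: gather every split_{N}.json file in the data
    # directory, dropping the excluded split so it can serve as the
    # held-out fold for cross validation.
    excluded = (f"split_{cross_validation_split_to_exclude}.json"
                if cross_validation_split_to_exclude is not None else None)
    files = sorted(glob.glob(os.path.join(data_directory, "split_*.json")))
    return [f for f in files if os.path.basename(f) != excluded]
```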
Parameters
----------
file_path : ``str``, required.
    For this dataset reader, file_path can either be a path to a file _or_ a
Use backticks instead of underscores for emphasis in RST. Sphinx might also complain about the underscore in `file_path` without code blocks...
assert tokens == ['how', 'many', 'buttercup', 'kitchen', 'are', 'there', 'in', 'san', 'francisco', '?']
assert tags == ['O', 'O', 'name0', 'name0', 'O', 'O', 'O', 'city_name0', 'city_name0', 'O']
assert fields["template"].label == "SELECT COUNT ( * ) FROM LOCATION AS LOCATIONalias0 , RESTAURANT " \
So you predict what the template is and also run some kind of CRF tagger to fill in the variables in the template? Do you constrain the tagger to only use the variables in the template?
Right, exactly - no, in the text2sql paper constraints are not considered. That would be a good and easy extension.
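The extension mentioned here could look something like the sketch below, assuming BIO-free slot tags like `name0` and `city_name0` as in the asserts above (the function and its crude substring check are hypothetical, not anything in the PR or the text2sql paper):

```python
from typing import List


def constrain_tags_to_template(predicted_tags: List[str],
                               template: str) -> List[str]:
    # Hypothetical constraint: any predicted variable tag (e.g. "name0",
    # "city_name0") that does not occur in the predicted template string
    # is replaced by the outside tag "O", so the tagger can only fill
    # slots the template actually contains. A substring check is crude;
    # a real version would parse the template's variable set.
    return [tag if tag == "O" or tag in template else "O"
            for tag in predicted_tags]
```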
I also refactored the sql utils a bit to read from my new directory format, for which I added a script in the previous PR. This includes adding functionality to de-duplicate the questions in a given dataset, not just the SQL. This PR looks massive, but I only added `template_text2sql.py` and modified `text2sql_utils.py` - all the rest is just moving folders around and adding deprecation warnings.