sql data updates by DeNeutoy · Pull Request #1827 · allenai/allennlp · GitHub
This repository was archived by the owner on Dec 16, 2022. It is now read-only.

sql data updates #1827

Merged
merged 14 commits into from
Sep 27, 2018

Conversation

@DeNeutoy (Contributor) commented Sep 26, 2018
  • Add some analysis of the frequency of non-trivial AS statements to the script
  • Make full stops their own tokens: TABLE.COLUMN -> TABLE . COLUMN
  • Add a function which reads the schema of any of the text2sql datasets
  • Add an option to clean unneeded aliases from SQL queries

@DeNeutoy changed the title from "[WIP] Structured sql" to "sql data updates" Sep 26, 2018

def split_table_and_column_names(table: str) -> Iterable[str]:

    partitioned = [x for x in table.partition(".") if x != '']
    if partitioned[0].isnumeric() and partitioned[-1].isnumeric():

Contributor review comment:

What case is this handling? I'm having a hard time guessing what this logic accomplishes without having looked at the data.
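
For readers without the data at hand, here is a minimal sketch of what the check appears to guard against, judging by the decimal-number assertion in this PR's tests: a token such as 2.5 also contains a full stop, but it should not be split into separate tokens. The two return statements are assumptions, since the diff excerpt above stops at the `if` line, and the return type is written as `List[str]` only to keep the sketch self-contained.

```python
from typing import List


def split_table_and_column_names(table: str) -> List[str]:
    # str.partition splits on the first "." only and always returns a
    # 3-tuple; empty pieces are dropped so "SELECT" stays a single token.
    partitioned = [x for x in table.partition(".") if x != ""]
    # Assumed behaviour: a decimal such as "2.5" has numeric text on both
    # sides of the full stop, so it is returned unsplit.
    if partitioned[0].isnumeric() and partitioned[-1].isnumeric():
        return [table]
    # Assumed behaviour: otherwise the full stop becomes its own token.
    return partitioned


print(split_table_and_column_names("RESTAURANT.NAME"))  # ['RESTAURANT', '.', 'NAME']
print(split_table_and_column_names("2.5"))              # ['2.5']
print(split_table_and_column_names("SELECT"))           # ['SELECT']
```

Note that an empty string would raise an IndexError in this sketch, because `partitioned` would be empty before it is indexed.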

previous_token = sql_tokens[0]
for (token, next_token) in zip(sql_tokens[1:-1], sql_tokens[2:]):
    if token == "AS" and previous_token is not None:
        table_name = next_token[:-6]

Contributor review comment:

This is removing the "aliasX" bit, right? Some comment about that would be nice here (as a comment or a docstring).
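
As an illustration of the slice in question, the sketch below shows why six characters are dropped: "alias0" is exactly six characters long, so `next_token[:-6]` recovers the table name from a token like RESTAURANTalias0. The helper names are hypothetical and not part of the PR, and the regex variant is only an assumed alternative in case aliases numbered 10 or higher occur in the data.

```python
import re

ALIAS_SUFFIX_LENGTH = len("alias0")  # 6 characters


def strip_alias_suffix(alias: str) -> str:
    """Fixed-width slice, mirroring next_token[:-6] in the diff."""
    return alias[:-ALIAS_SUFFIX_LENGTH]


def strip_alias_suffix_regex(alias: str) -> str:
    """Hypothetical alternative that tolerates multi-digit alias numbers."""
    return re.sub(r"alias\d+$", "", alias)


print(strip_alias_suffix("RESTAURANTalias0"))         # RESTAURANT
print(strip_alias_suffix("RESTAURANTalias10"))        # RESTAURANTa
print(strip_alias_suffix_regex("RESTAURANTalias10"))  # RESTAURANT
```

The fixed-width slice is simpler, but it silently leaves a stray character behind when the alias number has more than one digit.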

'=', 'LOCATION', '.', 'RESTAURANT_ID', 'AND', 'RESTAURANT', '.', 'NAME', '=', "'name0'", ';']

# Check we don't mangle decimal numbers:
assert text2sql_utils.clean_unneeded_aliases(["2.5"]) == ["2.5"]

Contributor review comment:

Ah, I see, that bit above makes sense now. A brief comment would be nice.

cleaned = text2sql_utils.clean_unneeded_aliases(sql)
assert cleaned == ['SELECT', 'COUNT', '(', '*', ')', 'FROM', 'LOCATION', ',', 'RESTAURANT', 'WHERE',
                   'LOCATION', '.', 'CITY_NAME', '=', "'city_name0'", 'AND', 'RESTAURANT', '.', 'ID',
                   '=', 'LOCATION', '.', 'RESTAURANT_ID', 'AND', 'RESTAURANT', '.', 'NAME', '=', "'name0'", ';']

Contributor review comment:

It's probably worth adding a test here with a case where you shouldn't remove the "AS", just to be sure we handle those correctly.
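
A rough sketch of what such a test could look like follows. Because the body of `text2sql_utils.clean_unneeded_aliases` is not shown in this conversation, the sketch uses a simplified stand-in, `clean_unneeded_aliases_sketch`, which is hypothetical and assumes tokens are already split on full stops. It treats `TABLE AS TABLEaliasN` as removable and leaves every other AS alone.

```python
from typing import Dict, List


def clean_unneeded_aliases_sketch(sql_tokens: List[str]) -> List[str]:
    """Hypothetical stand-in, not the function from the PR."""
    # First pass: an alias is unneeded when stripping its 6-character
    # "aliasN" suffix gives back the token that precedes the AS.
    unneeded: Dict[str, str] = {}
    for previous, token, following in zip(sql_tokens, sql_tokens[1:], sql_tokens[2:]):
        if token == "AS" and following[:-6] == previous:
            unneeded[following] = previous

    # Second pass: drop "AS <alias>" for unneeded aliases and rewrite
    # later uses of the alias back to the plain table name.
    cleaned: List[str] = []
    index = 0
    while index < len(sql_tokens):
        token = sql_tokens[index]
        following = sql_tokens[index + 1] if index + 1 < len(sql_tokens) else None
        if token == "AS" and following in unneeded:
            index += 2
            continue
        cleaned.append(unneeded.get(token, token))
        index += 1
    return cleaned


def test_keeps_non_trivial_as() -> None:
    # The alias names a derived expression, not a table, so it must survive.
    sql = ["SELECT", "COUNT", "(", "*", ")", "AS", "DERIVED_FIELDalias0",
           "FROM", "RESTAURANT", ";"]
    assert clean_unneeded_aliases_sketch(sql) == sql


def test_removes_trivial_as() -> None:
    sql = ["SELECT", "COUNT", "(", "*", ")", "FROM", "LOCATION", "AS",
           "LOCATIONalias0", "WHERE", "LOCATIONalias0", ".", "CITY_NAME",
           "=", "'city_name0'", ";"]
    assert clean_unneeded_aliases_sketch(sql) == [
        "SELECT", "COUNT", "(", "*", ")", "FROM", "LOCATION", "WHERE",
        "LOCATION", ".", "CITY_NAME", "=", "'city_name0'", ";"]


test_keeps_non_trivial_as()
test_removes_trivial_as()
```

Pairing the negative case with a positive one in the same test file makes it harder for a later change to pass by simply never removing anything.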

@DeNeutoy merged commit 9c7d0d0 into allenai:master Sep 27, 2018