bossben/ESMp: ETM evaluation script along with state of the art examples for the Text-to-SQL task.

ESM+: A New Evaluation Metric for Text-To-SQL

ESM+ is a new metric for the Text-to-SQL task. It measures semantic accuracy with a lower false-positive rate than execution accuracy and a lower false-negative rate than Exact Set Matching. It is released along with our baselines and the outputs of several other state-of-the-art models. This repo contains all the code necessary for evaluation.
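To illustrate why execution accuracy can produce false positives, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical singer table. The two queries mean different things, yet they return the same rows on this particular database, so a naive execution comparison accepts the wrong prediction. The comparison function (rows compared as sorted lists) is an assumption for illustration only, not the logic in ESMp.py.

```python
import sqlite3

def execution_match(conn, gold_sql, pred_sql):
    """Naive execution check: equal result rows, ignoring row order."""
    gold_rows = sorted(conn.execute(gold_sql).fetchall())
    pred_rows = sorted(conn.execute(pred_sql).fetchall())
    return gold_rows == pred_rows

# Hypothetical toy database with three singers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("Ann", 22), ("Bo", 41), ("Cy", 55)])

gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE age > 30"  # semantically different condition

# Both queries return Bo and Cy here, so execution accuracy counts pred as correct.
print(execution_match(conn, gold, pred))  # True
```

Adding a singer aged between 30 and 40 would make the results diverge, which is why a single database can hide semantic errors.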

Evaluation

ESMp.py and esmp_process_sql.py are written in Python 3.10 and are modeled after test-suite-sql-eval. As with the original evaluation scripts, running this evaluation requires a gold txt file and a predictions txt file. Examples are provided in the spider_dev, spider_test, and cosql_dev folders. Each of these folders contains:

  • gold.txt: gold file where each line is a gold SQL query and its db_id, separated by a tab (SQL \t db_id)
  • GPT4Turbo.txt: GPT4Turbo baseline predictions
  • Claude.txt: Claude3Opus baseline predictions
  • C3.txt: C3 model predictions
  • DAIL.txt: DAIL model predictions
  • DIN.txt: DIN model predictions
  • RASAT+PICARD.txt: RASAT+PICARD predictions
  • RESDSQL.txt: RESDSQL predictions
  • Graphix.txt: Graphix predictions
  • STAR.txt: STAR predictions
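The gold format above (SQL, a tab, then db_id) can be read with a short helper. The sketch below assumes that each prediction file holds one SQL query per line, aligned with gold.txt, and it skips blank lines in both files (so it ignores any interaction boundaries, as in CoSQL). The function names are hypothetical and not part of ESMp.py.

```python
from pathlib import Path

def read_gold(path):
    """Parse a gold file: each non-blank line is 'SQL<TAB>db_id'."""
    pairs = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no example
        sql, sep, db_id = line.rpartition("\t")  # db_id is the last tab-separated field
        if not sep:
            raise ValueError(f"{path}:{line_no}: expected 'SQL<TAB>db_id'")
        pairs.append((sql.strip(), db_id.strip()))
    return pairs

def read_pred(path):
    """Assumed format: one predicted SQL query per non-blank line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

def align(gold_path, pred_path):
    """Pair each prediction with its gold query and db_id, checking the counts match."""
    gold = read_gold(gold_path)
    pred = read_pred(pred_path)
    if len(gold) != len(pred):
        raise ValueError(f"{len(gold)} gold queries but {len(pred)} predictions")
    return [(p, g, db_id) for p, (g, db_id) in zip(pred, gold)]
```

A count mismatch is usually the first sign that a prediction file is misaligned with its gold file.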

For the dev sets, predictions are taken directly from the corresponding GitHub repositories, except for RASAT+PICARD, which was reproduced. For spider_test, the predictions were reproduced following the original process, so results may differ from those originally reported.

Install & Run

First, download the database folders for Spider (dev and test) and CoSQL (dev only), and save them into spider_dev, spider_test, and cosql_dev, respectively.
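Before running the evaluation, it can help to confirm that every db_id referenced in gold.txt has a database file. This sketch assumes the standard Spider/CoSQL layout, where each database is stored as `<db_id>/<db_id>.sqlite` inside the directory passed to `--db`; the exact location of that directory inside spider_dev and the other folders is an assumption, and `missing_databases` is a hypothetical helper.

```python
from pathlib import Path

def missing_databases(gold_path, db_dir):
    """Return db_ids from gold.txt that have no <db_dir>/<db_id>/<db_id>.sqlite file.

    Assumes the Spider-style layout: one subfolder per database, named after its db_id.
    """
    db_ids = set()
    for line in Path(gold_path).read_text(encoding="utf-8").splitlines():
        if "\t" in line:
            db_ids.add(line.rsplit("\t", 1)[1].strip())  # db_id is after the last tab
    root = Path(db_dir)
    return sorted(d for d in db_ids if not (root / d / f"{d}.sqlite").is_file())
```

An empty list means every database referenced by the gold file was found.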

Then, create a conda environment:

conda create -n "ESMp" python=3.10.0

conda activate ESMp

Install packages:

pip install -r requirements.txt

To run our script, use the following command:

python3 ESMp.py --gold path/to/gold.txt --pred path/to/pred.txt --db path/to/database/ --table path/to/tables.json
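To evaluate every model output in one folder, the command above can be repeated once per prediction file. The sketch below builds one command per `*.txt` file other than gold.txt and uses only the four flags shown above. The helper names and the example paths in the usage note are hypothetical; adjust them to where you saved the databases and tables.json.

```python
import subprocess
import sys
from pathlib import Path

def build_commands(folder, db_dir, table_json, script="ESMp.py"):
    """Build one ESMp.py command per prediction file (every *.txt except gold.txt)."""
    folder = Path(folder)
    commands = []
    for pred in sorted(folder.glob("*.txt")):
        if pred.name == "gold.txt":
            continue  # the gold file is passed via --gold, not evaluated as a prediction
        commands.append([
            sys.executable, script,
            "--gold", str(folder / "gold.txt"),
            "--pred", str(pred),
            "--db", str(db_dir),
            "--table", str(table_json),
        ])
    return commands

def run_all(commands):
    """Run each command in turn, stopping at the first non-zero exit code."""
    for cmd in commands:
        print("Evaluating", cmd[cmd.index("--pred") + 1], flush=True)
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands("spider_dev", "spider_dev/database", "spider_dev/tables.json"))` would evaluate each baseline in spider_dev in turn, assuming those paths exist.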

Optional flags:

--gold: gold txt file.

--pred: predictions txt file.

--db: directory of databases.

--table: tables json file.

--etype: same as in test-suite-sql-eval, except that exe has been updated as described in the paper. Default is match (ESM+).

--plug_value: same as in test-suite-sql-eval. Note that ESM+ is designed for models that predict values themselves.

--progress_bar_for_each_datapoint: same as in test-suite-sql-eval.

--disable_value: disables value checks (strongly discouraged).

--disable_distinct: disables DISTINCT checks (strongly discouraged).

--disable_rules: takes a comma-separated list of rule numbers, none, or all. Rule numbers correspond to those in Table 1 of our paper. Default is none.

--verbose: prints extra information, such as which rules are applied in each comparison.
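The accepted values of --disable_rules (a comma-separated list of rule numbers, none, or all) can be validated with a small parser. This is a hypothetical sketch, not the parser used in ESMp.py; the set of valid rule numbers is passed in as `known_rules` because the number of rules in Table 1 is not stated here.

```python
def parse_disable_rules(value, known_rules):
    """Turn a --disable_rules value ('none', 'all', or e.g. '1,3') into a set of ints.

    known_rules: the valid rule numbers (an assumption; see Table 1 of the paper).
    """
    value = value.strip().lower()
    if value == "none":
        return set()
    if value == "all":
        return set(known_rules)
    rules = set()
    for part in value.split(","):
        part = part.strip()
        if not part.isdigit():
            raise ValueError(f"invalid rule number: {part!r}")
        rule = int(part)
        if rule not in known_rules:
            raise ValueError(f"unknown rule: {rule}")
        rules.add(rule)
    return rules
```

Rejecting unknown numbers early avoids silently running with fewer disabled rules than intended.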

By default, the script runs ESM+ on the Spider test set with our GPT4Turbo baseline predictions.

Baseline

We introduced two new baselines. These are stored in the baselines folder.

To begin, save the spider and cosql datasets into baselines/.

To run them, first add your LLM API keys to llm.py.

Then install requirements:

pip install -r requirements.txt

Then, baselines can be run using:

python3 spider.py

python3 cosql.py
