Unit Interpretation Task #419

sajantanand · 2021-06-02T05:09:58Z

We submit a task on unit interpretation in the form of word problems of increasing difficulty. We expect models to struggle with this task, as these problems are non-trivial to humans.

google-cla · 2021-06-02T05:12:54Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

W10104 · 2021-06-02T05:16:27Z

@googlebot I fixed it.

google-cla · 2021-06-02T05:16:39Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

W10104 · 2021-06-02T05:19:04Z

@googlebot I fixed it.

chiafullo · 2021-06-16T19:42:39Z

@sajantanand We are now requesting that task authors please include an explicit section in the README.md file called "data source", listing exactly where you got your data from (or explicitly state that you made it up yourself). Many thanks!

chiafullo · 2021-06-17T17:57:58Z

@Alicia-Parrish you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

chiafullo · 2021-06-17T18:00:53Z

@william-r-s you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

Alicia-Parrish · 2021-06-17T22:06:10Z

Hi! I'm a reviewer on this task. The task looks like a very clever way to test numerical reasoning with units, and I found the examples, explanations, and justifications in the readme very clear. Below is my full review:

Correctness: The task.json files appear to be correctly formatted, but it looks like this is still waiting on someone to approve a run to check if it passes all the tests.

Formatting: The goals of the task are clearly stated and the task, including each sub-task, is well motivated and clearly aligns with the stated goals.

Specificity: This task is specifically targeting arithmetic and reasoning about units. The construction of the examples is justified, and the sub tasks allow for finer-grained interpretation of a model's ability to do the task given different kinds of information in the input.

Thoroughness: I do not think any of it will be solvable via memorization. The diversity of subjects and units makes this a challenging task.

Difficulty: The authors clearly show that current language models are often at or below chance on this task.

Not solvable by memorizing the Internet: This task is not solvable through memorization.

Novelty: To my knowledge, this task is novel.

Justification: The background motivation is clear.

Size: This task includes 25 unique examples in each subtask. The full size of the dataset meets the requirements, though it's not clear to me if each subtask is required to also have the minimum of 32 examples.

Compute resources: This task includes a small and finite number of predictions to be made, and I do not expect computational resources to be a barrier in this task.

@chiafullo accept (assuming the final checks all pass)

william-r-s · 2021-06-20T00:32:45Z

Hi, I'm the other reviewer on this task. I agree with @Alicia-Parrish's assessment.
It would be nice to either include the code for generating additional examples here, or publish it in another repository and link to it in the readme. It would also make sense to generate more problems with changed numbers to get a clearer evaluation signal.
It might also be nice to include a subtask that covers just the arithmetic from the other problems, without any units. Small language models in particular might fail because they get the arithmetic wrong, and it would be useful to know when that is the case and when the failure is about unit understanding.
@chiafullo accept

sajantanand · 2021-06-21T17:36:23Z

@william-r-s we actually wrote these problems by hand, so unfortunately we cannot include any code or generate more examples. As for a subtask covering only the math, we were thinking that the "simple_arithmetic" family of tasks included by the organizers could be used to judge a model's ability to perform basic arithmetic, and that our task would test the model's ability to glean what arithmetic is necessary from the word problem.

chiafullo · 2021-06-22T20:24:16Z

The number of tasks we received this round is tremendous. With so many tasks to review, we have decided to extend the review period to Tuesday, June 29.

Reviewers: if the submitter has made revisions, please be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 29 (if you haven't already done so).

this is an automated message

sajantanand · 2021-06-29T17:19:47Z

@chiafullo How do we get the approval to run the workflow to ensure that this task passes the checks?

Sohl-Dickstein · 2021-06-29T17:28:44Z

(approved workflow)

chiafullo · 2021-07-07T20:26:14Z

Your task has been accepted in the initial review stage. The next stage of this process is for us to assign your task a meta-reviewer for final review and merge. The assigned meta-reviewer will follow up by commenting on the PR should it need further revisions.

vedant · 2021-07-26T16:33:03Z

Hi @sajantanand, I'm the meta-reviewer for this task. Thank you for your contribution! This task appears to be well-designed and evaluates an important model capability. I will go ahead and merge the task.

One non-blocking suggestion is that it might be useful to add a "task_prefix" key that describes what the model needs to complete; in the zero-shot setting this isn't necessarily clear from an input.
For example:
Please select the option that best replaces "()" in each text input given the choices presented.
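As a rough illustration of the suggestion above, the sketch below adds a "task_prefix" key to a task definition loaded as a Python dictionary. The task contents shown here (the name, the single example, and its choices) are made up for illustration and are not taken from the actual task.json; the helper name `add_task_prefix` is likewise hypothetical. Only the "task_prefix" key and the prefix text come from this thread.

```python
import json

# Prefix text quoted from the meta-reviewer's suggestion.
PREFIX = (
    'Please select the option that best replaces "()" in each text input '
    "given the choices presented."
)

# Hypothetical, hand-written task definition for illustration only;
# it does not reproduce the real unit_interpretation task.json.
task = {
    "name": "unit_interpretation_example",
    "examples": [
        {
            "input": "A bag holds 3 apples. Two bags hold ().",
            "target_scores": {"6 apples": 1, "6 bags": 0},
        }
    ],
}


def add_task_prefix(task_dict, prefix):
    """Return a copy of the task definition with a 'task_prefix' key set."""
    updated = dict(task_dict)  # shallow copy; the original is left untouched
    updated["task_prefix"] = prefix
    return updated


updated_task = add_task_prefix(task, PREFIX)

# Serialize back to JSON text, as it would be written to a task file.
serialized = json.dumps(updated_task, indent=2)
```

With the prefix in place, a zero-shot prompt would begin with the instruction rather than with the bare word problem, which is the ambiguity the suggestion is meant to address.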

sajantanand · 2021-07-29T05:07:51Z

@vedant I have made a new PR #496 to add the task_prefix.

Preparing to submit.

eeb79a4

google-cla bot added the cla: yes contributor license agreement: yes label Jun 2, 2021

Trivial update

84c95ff

google-cla bot added cla: no and removed cla: yes contributor license agreement: yes labels Jun 2, 2021

google-cla bot added cla: yes contributor license agreement: yes and removed cla: no labels Jun 2, 2021

chiafullo added the task submission label Jun 2, 2021

W10104 added 2 commits June 2, 2021 22:34

Correct a typo

27544bf

Correct typos in a problem

2c7dded

guygurari added task submission and removed task submission labels Jun 3, 2021

W10104 added 2 commits June 3, 2021 22:51

Add existing language models

1421c69

Update existing language model

e5d20ac

r-barnes mentioned this pull request Jun 17, 2021

Benchmarks for ability to do math with physical units #275

Closed

sajantanand mentioned this pull request Jun 17, 2021

Unit conversion #372

Merged

sajantanand added 3 commits June 16, 2021 23:36

Data source.

e3d1b17

Merge branch 'unit_interpretation' of https://github.com/james-simon/…

22ebb24

…BIG-bench into unit_interpretation Getting Tianle's changes so that I can push updates to README.md.

Data source.

3bcd365

Sohl-Dickstein force-pushed the main branch 2 times, most recently from 3fcd8da to 0afe508 Compare June 29, 2021 23:05

vedant merged commit 115cd93 into google:main Jul 26, 2021

sajantanand mentioned this pull request Jul 29, 2021

Addressing meta-reviewer comments for unit_interpretation task #496

Merged
