Unit Interpretation Task by sajantanand · Pull Request #419 · google/BIG-bench · GitHub

Unit Interpretation Task #419

Merged
merged 9 commits into from
Jul 26, 2021

sajantanand
Contributor

We submit a task on unit interpretation in the form of word problems of increasing difficulty. We expect models to struggle with this task, as these problems are non-trivial to humans.

@google-cla google-cla bot added the cla: yes contributor license agreement: yes label Jun 2, 2021
@google-cla
google-cla bot commented Jun 2, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no and removed cla: yes contributor license agreement: yes labels Jun 2, 2021
@W10104
Contributor
W10104 commented Jun 2, 2021

@googlebot I fixed it.

@google-cla
google-cla bot commented Jun 2, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@W10104
Contributor
W10104 commented Jun 2, 2021

@googlebot I fixed it.

@google-cla google-cla bot added cla: yes contributor license agreement: yes and removed cla: no labels Jun 2, 2021
@chiafullo
Collaborator

@sajantanand We are now requesting that task authors please include an explicit section in the README.md file called "data source", listing exactly where you got your data from (or explicitly state that you made it up yourself). Many thanks!

…BIG-bench into unit_interpretation

Getting Tianle's changes so that I can push updates to README.md.
@chiafullo
Collaborator

@Alicia-Parrish you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

@chiafullo
Collaborator

@william-r-s you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

@Alicia-Parrish
Contributor

Hi! I'm a reviewer on this task. The task looks like a very clever way to test numerical reasoning with units, and I found the examples, explanations, and justifications in the readme very clear. Below is my full review:

Correctness: The task.json files appear to be correctly formatted, but it looks like this is still waiting on someone to approve a run to check if it passes all the tests.

Formatting: The goals of the task are clearly stated and the task, including each sub-task, is well motivated and clearly aligns with the stated goals.

Specificity: This task is specifically targeting arithmetic and reasoning about units. The construction of the examples is justified, and the sub tasks allow for finer-grained interpretation of a model's ability to do the task given different kinds of information in the input.

Thoroughness: I do not think any of it will be solvable via memorization. The diversity of subjects and units makes this a challenging task.

Difficulty: The authors clearly show that current language models are often at or below chance on this task.

Not solvable by memorizing the Internet: This task is not solvable through memorization.

Novelty: To my knowledge, this task is novel.

Justification: The background motivation is clear.

Size: This task includes 25 unique examples in each subtask. The full size of the dataset meets the requirements, though it's not clear to me if each subtask is required to also have the minimum of 32 examples.

Compute resources: This task includes a small and finite number of predictions to be made, and I do not expect computational resources to be a barrier in this task.

@chiafullo accept (assuming the final checks all pass)

@william-r-s
Contributor

Hi, I'm the other reviewer on this task. I agree with @Alicia-Parrish's assessment.
It would be nice to either include the code for generating additional examples here, or publish it in another repository and link to it in the readme. It would also make sense to just generate more problems with changed numbers to get a clearer evaluation signal.
It might be nice to include a subtask that covers just the arithmetic from the other problems, without any units. Small language models in particular might fail because they get the arithmetic wrong, and it would be useful to know when that is the case versus when the failure is about unit understanding.
@chiafullo accept
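
As an illustration of the two suggestions above (more problems with changed numbers, and an arithmetic-only control), here is a minimal sketch of what a templated generator could look like. The problem template, the function names `make_speed_problem` and `make_arithmetic_control`, and the distractor choices are all hypothetical and are not taken from the submitted task; only the `input` / `target_scores` layout is meant to mirror the multiple-choice example format used by BIG-bench JSON tasks.

```python
import random


def make_speed_problem(rng):
    """Build one hypothetical multiple-choice unit problem with fresh numbers."""
    hours = rng.randint(2, 9)
    speed = rng.randint(10, 60)
    distance = speed * hours  # chosen so the answer is a whole number
    text = (
        f"A car travels {distance} miles in {hours} hours. "
        f"Its average speed is ()."
    )
    target_scores = {
        f"{speed} miles per hour": 1,             # correct number and unit
        f"{speed} miles": 0,                      # right number, wrong unit
        f"{distance * hours} miles per hour": 0,  # multiplied instead of divided
        f"{speed} hours per mile": 0,             # inverted unit
    }
    return {"input": text, "target_scores": target_scores}


def make_arithmetic_control(distance, hours):
    """Unit-free control: the same division with no units attached."""
    return {
        "input": f"{distance} / {hours} = ()",
        "target_scores": {
            str(distance // hours): 1,
            str(distance * hours): 0,
        },
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the generated set is reproducible
    for _ in range(3):
        print(make_speed_problem(rng)["input"])
    print(make_arithmetic_control(120, 4)["input"])
```

Pairing each unit problem with its unit-free control would let an evaluator separate arithmetic failures from unit-understanding failures, which is the distinction raised in the comment above.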

@sajantanand
Contributor Author

@william-r-s we actually wrote these problems by hand, so unfortunately we cannot include any code or generate more examples. As for a subtask covering only the math, our thinking was that the "simple_arithmetic" family of tasks included by the organizers could be used to judge a model's ability to perform basic arithmetic, and that our task would test the model's ability to glean what arithmetic is necessary from the word problem.

@chiafullo
Collaborator

The number of tasks we received this round is tremendous. With so many tasks to review, we have decided to extend the review period to Tuesday, June 29.

Reviewers: if the submitter has made revisions, please be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 29 (if you haven't already done so).

this is an automated message

@sajantanand
Contributor Author

@chiafullo How do we get the approval to run the workflow to ensure that this task passes the checks?

@Sohl-Dickstein
Contributor

(approved workflow)

@Sohl-Dickstein Sohl-Dickstein force-pushed the main branch 2 times, most recently from 3fcd8da to 0afe508 Compare June 29, 2021 23:05
@chiafullo
Collaborator

Your task has been accepted in the initial review stage. The next stage of this process is for us to assign your task a meta-reviewer for final review and merge. The assigned meta-reviewer will follow up by commenting on the PR should it need further revisions.

@vedant
Collaborator
vedant commented Jul 26, 2021

Hi @sajantanand, I'm the meta-reviewer for this task. Thank you for your contribution! This task appears to be well-designed and evaluates an important model capability. I will go ahead and merge the task.

One non-blocking suggestion is that it might be useful to add a "task_prefix" key that describes what the model needs to complete; in the zero-shot setting this isn't necessarily clear from an input.
For example:
Please select the option that best replaces "()" in each text input given the choices presented.
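
To make the suggestion concrete, here is a minimal sketch of adding a "task_prefix" key to a task definition and seeing how it could change a zero-shot prompt. The task name, the example problem, and the helper names `add_task_prefix` and `zero_shot_prompt` are assumptions for illustration; the way the prefix is joined to the input is also only an assumption and may differ from how BIG-bench actually assembles prompts.

```python
import json

TASK_PREFIX = (
    'Please select the option that best replaces "()" in each text input '
    "given the choices presented."
)


def add_task_prefix(task, prefix=TASK_PREFIX):
    """Return a copy of a task definition with a task_prefix key set."""
    updated = dict(task)  # shallow copy so the original dict is left untouched
    updated["task_prefix"] = prefix
    return updated


def zero_shot_prompt(task, example):
    """Illustrative only: prepend the prefix (if any) to one example's input."""
    prefix = task.get("task_prefix", "")
    return f"{prefix}\n\n{example['input']}" if prefix else example["input"]


# Hypothetical task definition; the name and the example are made up here.
task = {
    "name": "unit_interpretation",
    "examples": [
        {
            "input": "A 3 m board is cut into 4 equal pieces. Each piece is ().",
            "target_scores": {"0.75 m": 1, "0.75 m^2": 0, "12 m": 0},
        }
    ],
}

if __name__ == "__main__":
    with_prefix = add_task_prefix(task)
    print(json.dumps(with_prefix, indent=2))
    print(zero_shot_prompt(with_prefix, with_prefix["examples"][0]))
```

Without the prefix, the model sees only the sentence containing "()", which is the ambiguity in the zero-shot setting that the suggestion points out.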

@sajantanand
Contributor Author

@vedant I have made a new PR #496 to add the task_prefix.

Labels
cla: yes contributor license agreement: yes task submission
8 participants