GitHub - evanthebouncy/20Q-selfplay: LLM play 20questions with itself


20Q-selfplay

An LLM plays 20 Questions with itself. Browse the dataset here: https://evanthebouncy.github.io/20Q-selfplay/

Tested on 1823 hypotheses from the THINGS dataset with llm = OpenAI(model_name="gpt-3.5-turbo-0301"): a score of 68 / 1823 (about 3.7%).


The original 1854 objects were de-duplicated, e.g. bat(animal) and bat(sport tool) were collapsed into 1 concept.
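As a rough illustration of this kind of de-duplication, here is a minimal sketch. It assumes the labels carry a trailing parenthetical sense marker, as in the bat example above. The function names `base_concept` and `dedupe` and the sample labels are hypothetical and are not taken from this repository.

```python
import re


def base_concept(label: str) -> str:
    """Strip a trailing parenthetical sense marker: 'bat(animal)' -> 'bat'."""
    return re.sub(r"\s*\(.*\)\s*$", "", label).strip().lower()


def dedupe(labels):
    """Collapse labels that share a base concept, keeping first-seen order."""
    seen = {}
    for label in labels:
        # setdefault keeps the first label seen for each base concept
        seen.setdefault(base_concept(label), label)
    return list(seen)


# Hypothetical sample labels, not the real THINGS list
print(dedupe(["bat(animal)", "bat(sport tool)", "bread"]))
```

Collapsing senses this way loses the distinction between the two kinds of bat, which is the trade-off the note above describes.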

The scoring of success / fail needs more work: it currently counts the query "Is the object smaller than a breadbox?" as successfully guessing the concept "bread". Conversely, if the guesser had used the word "bouguetteux", it would have been counted as incorrect, even though conceptually it is also "bread", just with some errors.
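The behaviour described above is what a plain substring check would produce. The sketch below assumes that kind of check; `naive_success` and `whole_word_success` are hypothetical names and this is not necessarily how the repository scores games. The whole-word variant is shown only as one possible tightening.

```python
import re


def naive_success(concept: str, utterance: str) -> bool:
    """Assumed scorer: success if the concept appears anywhere in the text."""
    return concept.lower() in utterance.lower()


def whole_word_success(concept: str, utterance: str) -> bool:
    """Stricter variant: the concept must appear as a whole word."""
    pattern = rf"\b{re.escape(concept)}\b"
    return re.search(pattern, utterance, flags=re.IGNORECASE) is not None


question = "Is the object smaller than a breadbox?"
misspelled = "Is it a bouguetteux?"

print(naive_success("bread", question))       # false positive via "breadbox"
print(whole_word_success("bread", question))  # "breadbox" no longer matches
print(naive_success("bread", misspelled))     # misspelled guess is still missed
```

The whole-word check removes the "breadbox" false positive, but neither version credits a misspelled or paraphrased guess, so the second failure mode would need some form of fuzzy or semantic matching.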

Read the blog for full details:

https://evanthebouncy.medium.com/llm-self-play-on-20-questions-dee7a8c63377

20 Questions is also explored in BIG-bench (albeit with only 40 objects):

https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/twenty_questions


2023-03-27
