20Q-selfplay

An LLM plays 20 Questions with itself. Browse the dataset here: https://evanthebouncy.github.io/20Q-selfplay/

Tested on 1823 hypotheses from the THINGS dataset with llm = OpenAI(model_name="gpt-3.5-turbo-0301"): the score is 68 / 1823 (about 3.7%).
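The self-play setup can be sketched roughly as below. This is a minimal illustration, not the code in Play20Q.py: the function name `play_20q`, the `ask` / `answer` callables, and the 20-turn limit are all assumptions, and the two toy functions stand in for the LLM calls so the sketch runs without API access.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]


def play_20q(
    concept: str,
    ask: Callable[[Transcript], str],
    answer: Callable[[str, str], str],
    max_turns: int = 20,
) -> Tuple[bool, Transcript]:
    """Run one game and return (solved, transcript). Hypothetical helper."""
    history: Transcript = []
    for _ in range(max_turns):
        question = ask(history)              # guesser proposes the next question
        reply = answer(concept, question)    # oracle knows the hidden concept
        history.append((question, reply))
        # Naive success check (an assumption): the concept is named in the question.
        if concept.lower() in question.lower():
            return True, history
    return False, history


# Toy stand-ins for the two LLM roles (hypothetical, for illustration only).
def toy_ask(history: Transcript) -> str:
    candidates = ["Is it alive?", "Is it a tool?", "Is it a hammer?"]
    return candidates[min(len(history), len(candidates) - 1)]


def toy_answer(concept: str, question: str) -> str:
    # Stub oracle: only says "yes" when the concept itself is named.
    return "yes" if concept.lower() in question.lower() else "no"


solved, transcript = play_20q("hammer", toy_ask, toy_answer)
print(solved, len(transcript))
```

In a real run, `ask` and `answer` would each wrap a prompt to the model; keeping them as plain callables makes the loop easy to test in isolation.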

The original 1854 objects were de-duplicated to 1823: e.g. bat (animal) and bat (sports tool) were collapsed into one concept.
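As a rough sketch of that de-duplication step, assuming concepts are compared as lowercased, whitespace-trimmed words (the helper name `dedupe_concepts` and the normalization rule are assumptions, not taken from the repository):

```python
def dedupe_concepts(names):
    """Collapse concepts that share the same word, keeping first-seen order."""
    seen = set()
    unique = []
    for name in names:
        key = name.strip().lower()  # assumed normalization
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Two senses of "bat" share one word, so they collapse into one concept.
names = ["bat", "apple", "bat", "Apple "]
print(dedupe_concepts(names))  # ['bat', 'apple']
```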

The scoring of success / failure needs more work. Currently it will count the query "Is the object smaller than a breadbox?" as successfully guessing the concept "bread". Conversely, if the guesser had used the word "bouguetteux", it would have been counted as incorrect, even though conceptually it is also "bread", just with some errors.
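The breadbox behaviour is what a plain substring check would produce, so the sketch below assumes the current scorer works that way (the repository may differ). Both function names are hypothetical. It contrasts the substring check with a whole-word check, which removes the "breadbox" false positive but still misses the misspelled guess.

```python
import re


def substring_hit(concept: str, text: str) -> bool:
    """Assumed current behaviour: success if the concept appears anywhere."""
    return concept.lower() in text.lower()


def whole_word_hit(concept: str, text: str) -> bool:
    """Stricter variant: the concept must appear as a standalone word."""
    pattern = r"\b" + re.escape(concept.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


query = "Is the object smaller than a breadbox?"
print(substring_hit("bread", query))    # True  (false positive)
print(whole_word_hit("bread", query))   # False
print(whole_word_hit("bread", "Is it bread?"))         # True
print(whole_word_hit("bread", "Is it bouguetteux?"))   # False (still a miss)
```

Catching near-miss or synonymous guesses would need something beyond string matching, such as a fuzzy comparison or asking a model to judge equivalence; that is outside this sketch.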

Read the blog for full details:

https://evanthebouncy.medium.com/llm-self-play-on-20-questions-dee7a8c63377

20 Questions is also explored in BIG-bench (albeit with only 40 objects):

https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/twenty_questions

2023-03-27

Folders and files:

Play20Q.py
README.md
batch_20Q.py
batch_20Q_1823.js
batch_20Q_1823_gpt4.js
index.html
index.js
summary_20Q.png
things.csv
