The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe Large Language Models (LLMs) and extrapolate their future capabilities. This is a modified version of the original BIG-bench repository. The modifications were made by Viktoria Köhnlein as part of the master's thesis "Bewertung und Erweiterung eines linguistischen Benchmark-Frameworks für große Sprachmodelle (LLMs)" ("Evaluation and Extension of a Linguistic Benchmark Framework for Large Language Models (LLMs)").
- Master Thesis: Modification of BIG-bench
- Contributions
- Use of Modified Repository
- Repository Structure
- General Information on BIG-bench
- Citation and Attribution
## Master Thesis: Modification of BIG-bench

This repository forms part of the master's thesis "Bewertung und Erweiterung eines linguistischen Benchmark-Frameworks für große Sprachmodelle (LLMs)". The practical part of the thesis focuses on the use of Google BIG-bench. Its primary goals are to analyze the capabilities of LLMs and to design benchmarks that challenge these models.
Benchmark tasks were developed according to the requirements specified for BIG-bench. Additionally, the LLM Llama was integrated alongside the LLMs already included in the initial installation of BIG-bench.
## Contributions

- **Development of Three Tasks**:
  - V1: A benchmark designed to investigate the comprehension of semantic ambiguity in LLMs. The task is to identify the ambiguous word in a sentence.
  - V2: A benchmark designed to investigate the comprehension of semantic ambiguity in LLMs. The task is to choose the sentence that uses an ambiguous word correctly in context.
  - V3: A benchmark designed to investigate the comprehension of syntactic ambiguity in LLMs. The task involves identifying the underlying structures and dependencies in garden-path sentences.

  Each task includes multiple subtasks to test specific aspects of model reasoning and language comprehension; an illustrative task.json sketch is shown after this list.
- **Integration of Llama**:
  - Various models from the meta-llama/Llama-2 family have been successfully integrated into Google BIG-bench.
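Each benchmark is defined as a standard BIG-bench JSON task. The sketch below shows roughly how a V2-style multiple-choice subtask could be laid out; the field names follow the BIG-bench task schema, but the task name, example sentences, and scores are invented for illustration and do not reproduce the actual thesis data.

```json
{
  "name": "semantic_ambiguity_v2",
  "description": "Choose the sentence that uses an ambiguous word correctly in context.",
  "keywords": ["multiple choice"],
  "metrics": ["multiple_choice_grade"],
  "preferred_score": "multiple_choice_grade",
  "append_choices_to_input": true,
  "examples": [
    {
      "input": "Which sentence uses the word 'bank' in its financial sense?",
      "target_scores": {
        "She deposited the check at the bank before noon.": 1,
        "They had a picnic on the bank of the river.": 0
      }
    }
  ]
}
```

The "append_choices_to_input" field, which the result folders described below refer to, controls whether the answer options are appended to the prompt before the model scores them.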
## Use of Modified Repository

Use the notebook Setup_ModifiedRepository.ipynb to evaluate the available LLMs (including Llama) with the benchmarks created for the master's thesis. The notebook is located in the folder "notebooks"; before using it, please follow the instructions provided in the "README.md" file in that folder.
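The notebook drives the complete evaluation, but it ultimately relies on the standard BIG-bench Python API. The following is a minimal sketch of that underlying mechanism, assuming the unmodified bigbench.api.json_task and bigbench.models.huggingface_models interfaces; the task path and name are placeholders, not the actual benchmark names.

```python
import bigbench.api.json_task as json_task
import bigbench.models.huggingface_models as huggingface_models

# Load one of the thesis benchmarks from its task.json
# (placeholder path; substitute the actual benchmark folder).
task = json_task.JsonTask("bigbench/benchmark_tasks/semantic_ambiguity_v2/task.json")

# Any model implementing the bigbench.api.model.Model interface can be scored;
# "gpt2" ships with the default BIG-bench installation.
model = huggingface_models.BIGBenchHFModel(model_name="gpt2")

score_data = task.evaluate_model(model)

# Tasks with subtasks return a list of ScoreData objects, otherwise a single one.
if not isinstance(score_data, list):
    score_data = [score_data]
for sd in score_data:
    print(sd.subtask_description, sd.score_dict)
```

Evaluating a Llama model works the same way once a wrapper implementing the Model interface is available (see the sketch in the Repository Structure section below).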
## Repository Structure

Only the parts of the original repository relevant to evaluating the created benchmarks were retained. The folders listed below are relevant to the modifications made and to the functionality of the repository:
- scripts_modification: Contains the Python scripts relevant to the practical part of the master's thesis.
- bigbench/benchmark_tasks: Includes the three designed benchmarks.
- bigbench/models: Contains the definitions of the model classes used for evaluation. This folder also includes the script llama_model.py, which handles the implementation of the LLM Llama (a sketch of such a wrapper is shown after this list).
- datasets: Contains the datasets used for task creation.
- results_excel: Contains Excel files summarizing the evaluation results. Additionally, the results for each benchmark are stored in their respective folders: the files in the results folder correspond to evaluation with the setting "append_choices_to_input": true, while the files in the [Benchmark_Name]_acti_results folder correspond to evaluation with the setting "append_choices_to_input": false.
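For orientation, integrating a new model into BIG-bench amounts to implementing the bigbench.api.model.Model interface (generate_text, cond_log_prob, model_data). The sketch below shows what a Llama wrapper along the lines of llama_model.py could look like; the class name, defaults, and simplifications are assumptions and do not reproduce the exact thesis implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

import bigbench.api.model as model_api


class LlamaModel(model_api.Model):
    """Hypothetical wrapper exposing a Llama-2 checkpoint to BIG-bench."""

    def __init__(self, model_name="meta-llama/Llama-2-7b-hf"):
        self._tokenizer = AutoTokenizer.from_pretrained(model_name)
        self._model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.float16, device_map="auto"
        )
        self._model.eval()

    def generate_text(self, inputs, max_length=256, stop_string=None, output_regex=None):
        # BIG-bench passes either a single prompt or a list of prompts.
        prompts = [inputs] if isinstance(inputs, str) else inputs
        outputs = []
        for prompt in prompts:
            enc = self._tokenizer(prompt, return_tensors="pt").to(self._model.device)
            gen = self._model.generate(**enc, max_new_tokens=max_length, do_sample=False)
            new_tokens = gen[0][enc["input_ids"].shape[1]:]
            outputs.append(self._tokenizer.decode(new_tokens, skip_special_tokens=True))
        # stop_string / output_regex post-processing is omitted in this sketch.
        return outputs[0] if isinstance(inputs, str) else outputs

    def cond_log_prob(self, inputs, targets, absolute_normalization=False):
        # Score each candidate answer by the summed log-probability of its tokens
        # (used by multiple-choice metrics such as multiple_choice_grade).
        def score(prompt, target):
            full = self._tokenizer(prompt + target, return_tensors="pt").to(self._model.device)
            prompt_len = self._tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
            with torch.no_grad():
                logits = self._model(**full).logits
            log_probs = F.log_softmax(logits[0, :-1].float(), dim=-1)
            labels = full["input_ids"][0, 1:]
            return sum(
                log_probs[i, labels[i]].item()
                for i in range(prompt_len - 1, labels.shape[0])
            )

        if isinstance(inputs, str):
            return [score(inputs, t) for t in targets]
        return [[score(p, t) for t in tgts] for p, tgts in zip(inputs, targets)]

    def model_data(self):
        # Metadata block used in BIG-bench result files; see bigbench/api/model.ModelData.
        raise NotImplementedError
```

The actual llama_model.py may differ, for example in batching, stop-string and regex handling, or the metadata it reports via model_data.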
## General Information on BIG-bench

For further information on BIG-bench, refer to the original repository: https://github.com/google/BIG-bench. General instructions regarding installation, task implementation, etc. can be found there.
## Citation and Attribution

This repository is a modification of the original BIG-bench repository.
If you use this work, please also cite the original BIG-bench analysis paper:
```bibtex
# TMLR
@article{srivastava2023beyond,
  title={Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models},
  author={BIG-bench authors},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2023},
  url={https://openreview.net/forum?id=uyTL5Bvosj},
  note={}
}
```