Modified BIG-bench Repository

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe Large Language Models (LLMs) and extrapolate their future capabilities. This is a modified version of the original BIG-bench repository. The modifications were made by Viktoria Köhnlein in the course of the master thesis "Bewertung und Erweiterung eines linguistischen Benchmark-Frameworks für große Sprachmodelle (LLMs)" ("Evaluation and Extension of a Linguistic Benchmark Framework for Large Language Models (LLMs)").

Master Thesis: Modification of BIG-bench

This repository forms part of the above-mentioned master thesis. The practical part of the thesis focuses on the use of Google BIG-bench; the primary goals are to analyze the capabilities of LLMs and to design benchmarks that challenge these models.

Benchmark tasks were developed according to the requirements specified for BIG-bench. Additionally, the LLM Llama was integrated alongside the LLMs already included in the initial installation of BIG-bench.

Contributions

  1. Development of Three Tasks:

    • V1: A benchmark designed to investigate the comprehension of semantic ambiguity in LLMs. The task is to identify the ambiguous word in a sentence.
    • V2: A benchmark designed to investigate the comprehension of semantic ambiguity in LLMs. The task is to choose the sentence that uses an ambiguous word correctly in context.
    • V3: A benchmark designed to investigate the comprehension of syntactic ambiguity in LLMs. The task involves identifying the underlying structures and dependencies in garden-path sentences.

    Each task includes multiple subtasks to test specific aspects of model reasoning and language comprehension. A sketch of how such a task is defined in BIG-bench is given after this list.

  2. Integration of Llama:

    • Various models from the meta-llama/Llama-2 family were successfully integrated into Google BIG-bench; a rough sketch of such an adapter is given after this list.
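
In BIG-bench, a simple multiple-choice task is defined declaratively in a task.json file inside its task folder. The snippet below is a hypothetical sketch of such an entry, written as a Python dict mirroring the upstream JSON schema; the question, choices, and keywords are invented for illustration and are not taken from the actual V1-V3 tasks:

# Hypothetical task entry mirroring BIG-bench's task.json schema.
# All content is illustrative, not copied from the real benchmarks.
# (In the actual JSON file, True is written as lowercase true.)
example_task = {
    "name": "semantic_ambiguity_example",
    "description": "Identify the ambiguous word in a sentence.",
    "keywords": ["multiple choice"],
    "metrics": ["multiple_choice_grade"],
    "preferred_score": "multiple_choice_grade",
    "append_choices_to_input": True,  # see Repository Structure below
    "examples": [
        {
            "input": "Which word in 'She sat by the bank' is ambiguous?",
            "target_scores": {"bank": 1, "she": 0, "sat": 0},
        },
    ],
}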
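
The fork's actual Llama integration lives in bigbench/models/llama_model.py (see Repository Structure below). As orientation only, a minimal adapter for a Hugging Face Llama-2 checkpoint might look like the following sketch. It assumes the upstream bigbench.api.model.Model interface together with the transformers library; the class name, defaults, and implementation details are illustrative and need not match the fork's actual code:

import torch
import transformers
from bigbench.api import model


class LlamaAdapterSketch(model.Model):
    """Illustrative BIG-bench wrapper around a Hugging Face Llama-2 checkpoint."""

    def __init__(self, checkpoint="meta-llama/Llama-2-7b-hf"):
        self._tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
        self._model = transformers.AutoModelForCausalLM.from_pretrained(checkpoint)

    def generate_text(self, inputs, max_length=256, stop_string=None, output_regex=None):
        # BIG-bench may pass a single prompt or a list of prompts.
        single = isinstance(inputs, str)
        outputs = []
        for prompt in [inputs] if single else inputs:
            ids = self._tokenizer(prompt, return_tensors="pt").input_ids
            # Treat max_length == 0 as "use a default generation length".
            generated = self._model.generate(ids, max_new_tokens=max_length or 128)
            text = self._tokenizer.decode(generated[0, ids.shape[1]:], skip_special_tokens=True)
            if stop_string is not None and stop_string in text:
                text = text[: text.index(stop_string) + len(stop_string)]
            outputs.append(text)
        return outputs[0] if single else outputs

    def cond_log_prob(self, inputs, targets, absolute_normalization=False):
        # Scores each candidate continuation; assumes the tokenization of the
        # prompt is a prefix of the tokenization of prompt + choice.
        single = isinstance(inputs, str)
        inputs = [inputs] if single else inputs
        targets = [targets] if single else targets
        results = []
        for prompt, choices in zip(inputs, targets):
            scores = []
            n = self._tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
            for choice in choices:
                full = self._tokenizer(prompt + choice, return_tensors="pt").input_ids
                with torch.no_grad():
                    logprobs = self._model(full).logits.log_softmax(-1)
                # Positions n-1 .. L-2 predict the choice tokens n .. L-1.
                choice_ids = full[0, n:]
                scores.append(
                    logprobs[0, n - 1 : -1].gather(-1, choice_ids[:, None]).sum().item()
                )
            if not absolute_normalization:
                # Normalize so the probabilities of the choices sum to one.
                scores = torch.tensor(scores).log_softmax(0).tolist()
            results.append(scores)
        return results[0] if single else results

    # The Model interface also requires model_data(); it is omitted here
    # for brevity but must be implemented in a working adapter.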

Use of the Modified Repository

Use the notebook Setup_ModifiedRepository.ipynb, located in the notebooks folder, to evaluate the available LLMs (including Llama) on the benchmarks created in the course of the master thesis. Before using the notebook, please follow the instructions in the README.md file in the notebooks folder.
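
Independently of the notebook, an evaluation can also be run programmatically through the upstream BIG-bench Python API, roughly as sketched below. The model and task names are placeholders; substitute one of this fork's task folders, and use the fork's Llama model class where appropriate:

import bigbench.api.json_task as json_task
import bigbench.models.huggingface_models as huggingface_models

# Placeholder names: pick a real task folder from bigbench/benchmark_tasks
# and a model actually available in this fork.
model = huggingface_models.BIGBenchHFModel(model_name="gpt2")
task = json_task.JsonTask("bigbench/benchmark_tasks/<task_folder>/task.json")
score_data = task.evaluate_model(model)
print(score_data)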

Repository Structure

Only the parts of the original repository relevant to the evaluation of the created benchmarks were retained. The folders listed below are relevant to the modifications made and to the functionality of the repository:

  • scripts_modification: Contains Python scripts relevant to the practical part of the master thesis
  • bigbench/benchmark_tasks: Includes the three designed benchmarks
  • bigbench/models: Contains the definitions of the model classes used for evaluation, including the script llama_model.py, which implements the LLM Llama
  • datasets: Contains the datasets used for task creation
  • results_excel: Contains Excel files summarizing the evaluation results. The results for each benchmark are additionally stored in their respective folders: the files in the results folder stem from evaluations with the setting "append_choices_to_input": true, while the files in the [Benchmark_Name]_acti_results folder stem from evaluations with "append_choices_to_input": false (illustrated below).
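
For context, "append_choices_to_input" is a standard setting of BIG-bench JSON tasks that controls whether the answer options are listed in the prompt before the choices are scored. Schematically (the question is invented, and the exact formatting may differ from BIG-bench's real prompt template):

# "append_choices_to_input": true  -- the options are appended to the prompt:
#
#     Which word in 'She sat by the bank' is ambiguous?
#       choice: bank
#       choice: she
#       choice: sat
#
# "append_choices_to_input": false -- the model instead scores each option as
# a continuation of the bare question, without the options being listed.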

General Information on BIG-bench

For further information on BIG-bench, please refer to the original BIG-bench repository, where general instructions regarding installation, task implementation, etc. can be found.

Citation and Attribution

This repository is a modification of the original BIG-bench repository.
If you use this work, please also cite the original BIG-bench analysis paper:

# TMLR
@article{srivastava2023beyond,
  title={Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models},
  author={BIG-bench authors},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2023},
  url={https://openreview.net/forum?id=uyTL5Bvosj},
  note={}
}
