LLM Calibration Benchmark.

This repository runs calibration benchmarks on several popular openly available language models.

Installation

    pip install -r requirements.txt

When running in the Colab environment, it is recommended to use:

    pip install -r requirements-colab.txt

Unit Tests

Running the unit tests requires the pytest module, invoked as follows:

    python -m pytest test

Running Individual Experiments

Any individual experiment can be rerun using the following command:

    python ../llm_calibration/run_experiment.py --model_name='meta-llama/Llama-2-13b-hf' --dataset='STEM'
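To rerun several experiments in sequence, the same command can be assembled programmatically. The following is a minimal sketch, not part of the repository: it assumes the script path and the `--model_name` and `--dataset` flags shown in the command above, and the helper names `build_command` and `build_commands` are hypothetical. It only prints the commands, so nothing is launched until you choose to execute them.

```python
import itertools
import shlex
import sys

# Script path and flags are copied from the command above.
# Adjust the relative path to match your working directory.
SCRIPT = "../llm_calibration/run_experiment.py"


def build_command(model_name, dataset):
    """Return the argument list for one experiment run (hypothetical helper)."""
    return [
        sys.executable,
        SCRIPT,
        f"--model_name={model_name}",
        f"--dataset={dataset}",
    ]


def build_commands(model_names, datasets):
    """Return one command per (model, dataset) pair (hypothetical helper)."""
    return [
        build_command(model, dataset)
        for model, dataset in itertools.product(model_names, datasets)
    ]


if __name__ == "__main__":
    # Only the values that appear in the command above are used here;
    # extend these lists with the models and datasets you want to rerun.
    models = ["meta-llama/Llama-2-13b-hf"]
    datasets = ["STEM"]
    for cmd in build_commands(models, datasets):
        # Printing keeps this a dry run. To launch a run instead, pass the
        # list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building each command as an argument list rather than a single string avoids shell-quoting problems with model names that contain slashes or other special characters.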

Each experiment produces JSON result files, which can be parsed offline to generate the requisite plots.
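As an illustration of offline parsing, the sketch below computes an expected calibration error (ECE) with equal-width confidence bins. The result file schema is not documented in this README, so the layout used here is an assumption: a JSON list of objects with a `confidence` field (a float in [0, 1]) and a `correct` field (a boolean). Both field names and the function names are hypothetical and should be adapted to the files the experiments actually write. ECE is used as a common calibration metric, not necessarily the one this benchmark reports.

```python
import json


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # Empty bins contribute nothing.
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


def ece_from_records(records, n_bins=10):
    """Compute ECE from records in the ASSUMED schema described above."""
    confidences = [r["confidence"] for r in records]  # hypothetical field name
    correct = [bool(r["correct"]) for r in records]   # hypothetical field name
    return expected_calibration_error(confidences, correct, n_bins)


def ece_from_file(path, n_bins=10):
    """Load a JSON result file (assumed to be a list of records) and compute ECE."""
    with open(path) as f:
        return ece_from_records(json.load(f), n_bins)


if __name__ == "__main__":
    # Inline sample in the assumed schema, standing in for a real result file.
    sample = json.loads(
        '[{"confidence": 0.9, "correct": true},'
        ' {"confidence": 0.9, "correct": true},'
        ' {"confidence": 0.1, "correct": false},'
        ' {"confidence": 0.1, "correct": false}]'
    )
    print(f"ECE: {ece_from_records(sample):.3f}")
```

The per-bin accuracy and mean confidence computed inside the loop are the same quantities a reliability diagram plots, so the function can be extended to return them for plotting.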

aakarsh/rl-llm-calibration-test

Folders and files

Latest commit

History

Repository files navigation

LLM Calibration Benchmark.

Installation

Unit Tests

Running Individual Experiments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages