mohamadmansourX/yarn

YaRN

YaRN: Efficient Context Window Extension of Large Language Models

This repo contains the code and data for the YaRN context window extension method.

Awaiting arXiv announcement, citation will go here!

Models

We publish 7B and 13B variants of LLaMA 2 fine-tuned with YaRN at context window lengths of 64K and 128K. They are available under the LLaMA 2 license on 🤗 Hugging Face.

Size  Context  Link
7B    64K      NousResearch/Yarn-Llama-2-7b-64k
7B    128K     NousResearch/Yarn-Llama-2-7b-128k
13B   64K      NousResearch/Yarn-Llama-2-13b-64k
13B   128K     NousResearch/Yarn-Llama-2-13b-128k
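
As a rough illustration of how one of the checkpoints listed above might be loaded, the sketch below maps a size and context length to its repository ID and passes it to the `transformers` auto classes. The helper names `yarn_model_id` and `load_yarn_model` are hypothetical and not part of this repo. Passing `trust_remote_code=True` is an assumption that the checkpoints ship custom modeling code; check the model card before relying on it.

```python
# Repository IDs copied from the table above.
YARN_MODELS = {
    ("7B", "64K"): "NousResearch/Yarn-Llama-2-7b-64k",
    ("7B", "128K"): "NousResearch/Yarn-Llama-2-7b-128k",
    ("13B", "64K"): "NousResearch/Yarn-Llama-2-13b-64k",
    ("13B", "128K"): "NousResearch/Yarn-Llama-2-13b-128k",
}


def yarn_model_id(size: str, context: str) -> str:
    """Return the Hugging Face repository ID for a size/context pair (hypothetical helper)."""
    key = (size.upper(), context.upper())
    if key not in YARN_MODELS:
        raise ValueError(f"no published model for size={size!r}, context={context!r}")
    return YARN_MODELS[key]


def load_yarn_model(size: str = "7B", context: str = "64K"):
    """Download and load tokenizer and model (hypothetical helper)."""
    # Imported lazily so the lookup helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = yarn_model_id(size, context)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint relies on custom modeling code hosted with the weights.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype="auto",
    )
    return tokenizer, model
```

Calling `load_yarn_model("13B", "128K")` downloads the full weights, so the ID lookup is kept separate to allow checking the name without triggering a download.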

Reproduction

We strongly believe in open science, and thus publish all code and data to reproduce the results in our paper. To reproduce, clone the repository and perform a local installation.

git clone https://github.com/jquesnelle/yarn
cd yarn
pip install -e .

Training

To train the models, run accelerate config and enable DeepSpeed acceleration. The configuration file used for training was deepspeed/zero3.json. Then run the training script.

./train.sh
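
Before launching a long training run, it can help to confirm that the DeepSpeed configuration is the one intended. The sketch below reads a DeepSpeed JSON config and reports its ZeRO stage. It assumes deepspeed/zero3.json follows the standard DeepSpeed layout with a `zero_optimization.stage` key; the file's contents are not shown in this README, and the function names are hypothetical.

```python
import json
from pathlib import Path


def zero_stage_from_config(config: dict):
    """Return the ZeRO stage from a parsed DeepSpeed config, or None if unset."""
    # Assumption: standard DeepSpeed layout, e.g. {"zero_optimization": {"stage": 3}}.
    return config.get("zero_optimization", {}).get("stage")


def zero_stage(config_path: str = "deepspeed/zero3.json"):
    """Load a DeepSpeed JSON config from disk and return its ZeRO stage."""
    config = json.loads(Path(config_path).read_text())
    return zero_stage_from_config(config)
```

If the file matches its name, `zero_stage()` run from the repository root should report stage 3, but that is an expectation to verify rather than a guarantee.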

The tokenized training data is available on Hugging Face and was derived from the pg19 dataset.

Evaluation

To reproduce the evaluations, install lm-evaluation-harness with pip install git+https://github.com/EleutherAI/lm-evaluation-harness and then run the two provided scripts.

./eval.sh
./eval-harness.sh
