Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.
Features:
- Train various Huggingface models such as llama, pythia, falcon, mpt
- Supports fullfinetune, lora, qlora, relora, and gptq
- Customize configurations using a simple yaml file or CLI overwrite
- Load different dataset formats, use custom formats, or bring your own tokenized datasets
- Integrated with xformers, flash attention, rope scaling, and multipacking
- Works with single GPU or multiple GPUs via FSDP or Deepspeed
- Easily run with Docker locally or on the cloud
- Log results and optionally checkpoints to wandb or mlflow
- And more!
| | fp16/fp32 | lora | qlora | gptq | gptq w/flash attn | flash attn | xformers attn |
---|---|---|---|---|---|---|---|
llama | β | β | β | β | β | β | β |
Mistral | β | β | β | β | β | β | β |
Mixtral-MoE | β | β | β | β | β | β | β |
Mixtral8X22 | β | β | β | β | β | β | β |
Pythia | β | β | β | β | β | β | β |
cerebras | β | β | β | β | β | β | β |
btlm | β | β | β | β | β | β | β |
mpt | β | β | β | β | β | β | β |
falcon | β | β | β | β | β | β | β |
gpt-j | β | β | β | β | β | β | β |
XGen | β | β | β | β | β | β | β |
phi | β | β | β | β | β | β | β |
RWKV | β | β | β | β | β | β | β |
Qwen | β | β | β | β | β | β | β |
Gemma | β | β | β | β | β | β | β |
β : supported β: not supported β: untested
Get started with Axolotl in just a few steps! This quickstart guide will walk you through setting up and running a basic fine-tuning task.
Requirements: Python >=3.10 and PyTorch >=2.1.1.
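Before installing, the two version requirements above can be checked from Python. The following is a minimal sketch and not part of Axolotl: `parse_version` and `check_requirements` are hypothetical helper names, and the version parsing is deliberately simple (it keeps only the leading digits of the first three components).

```python
import re
import sys
from importlib import metadata

# Minimums taken from the requirements line above.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 1, 1)


def parse_version(text):
    """Reduce a version string such as '2.1.1+cu121' to a tuple like (2, 1, 1)."""
    numbers = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    # Pad short versions such as '2.3' so tuple comparison stays meaningful.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found, need >=3.10")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} found, need >=2.1.1")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the script prints nothing when both requirements are met.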
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
pip3 install packaging ninja
pip3 install -e '.[flash-attn,deepspeed]'
# preprocess datasets - optional but recommended
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml
# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
--lora_model_dir="./lora-out"
# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
--lora_model_dir="./lora-out" --gradio
# remote yaml files - the yaml config can be hosted on a public URL
# Note: the yaml config must directly link to the **raw** yaml
accelerate launch -m axolotl.cli.train https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/examples/openllama-3b/lora.yml
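The commands above all share one shape: `accelerate launch -m axolotl.cli.<subcommand> <config>` followed by optional flags. If you drive these runs from a script, that shape could be assembled as in the sketch below. `build_command` is a hypothetical helper, not an Axolotl API, and the example uses only the flags that appear in the commands above.

```python
import shlex


def build_command(subcommand, config, **overrides):
    """Build an `accelerate launch -m axolotl.cli.<subcommand>` argument list.

    `True` values become bare flags (--gradio); other values become --key=value.
    """
    command = ["accelerate", "launch", "-m", f"axolotl.cli.{subcommand}", config]
    for key, value in overrides.items():
        if value is True:
            command.append(f"--{key}")
        else:
            command.append(f"--{key}={value}")
    return command


if __name__ == "__main__":
    args = build_command(
        "inference",
        "examples/openllama-3b/lora.yml",
        lora_model_dir="./lora-out",
        gradio=True,
    )
    # Print the command for inspection; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting concerns.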
docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest
Or run on the current files for development:
docker compose up -d
Tip
If you want to debug axolotl or prefer to use Docker as your development environment, see the debugging guide's section on Docker.
Docker advanced
A more powerful Docker command to run would be this:
docker run --privileged --gpus '"all"' --shm-size 10g --rm -it --name axolotl --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --mount type=bind,src="${PWD}",target=/workspace/axolotl -v ${HOME}/.cache/huggingface:/root/.cache/huggingface winglian/axolotl:main-latest
It additionally:
- Prevents memory issues when running e.g. deepspeed (e.g. you could hit SIGBUS/signal 7 error) through the --ipc and --ulimit args.
- Persists the downloaded HF data (models etc.) and your modifications to axolotl code through the --mount/-v args.
- The --name argument simply makes it easier to refer to the container in vscode (Dev Containers: Attach to Running Container...) or in your terminal.
- The --privileged flag gives all capabilities to the container.
- The --shm-size 10g argument increases the shared memory size. Use this if you see exitcode: -7 errors using deepspeed.
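To see how much shared memory a container actually received, you can inspect the filesystem mounted at `/dev/shm` from inside it. This is a small sketch under the assumption that the container is Linux-based and mounts shared memory at that path; `shared_memory_gib` is a hypothetical helper name.

```python
import shutil


def shared_memory_gib(path="/dev/shm"):
    """Return the total size of the filesystem mounted at `path`, in GiB."""
    return shutil.disk_usage(path).total / 1024**3


if __name__ == "__main__":
    try:
        print(f"/dev/shm size: {shared_memory_gib():.1f} GiB")
    except FileNotFoundError:
        print("/dev/shm not found (not a Linux environment?)")
```

The function only reports what is mounted at the given path; it does not tell you which Docker flag produced that size.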
- Install python >=3.10
- Install pytorch stable https://pytorch.org/get-started/locally/
- Install Axolotl along with python dependencies
    pip3 install packaging
    pip3 install -e '.[flash-attn,deepspeed]'
- (Optional) Login to Huggingface to use gated models/datasets.
    huggingface-cli login
  Get the token at huggingface.co/settings/tokens
For cloud GPU providers that support docker images, use winglian/axolotl-cloud:main-latest
- on Latitude.sh use this direct link
- on JarvisLabs.ai use this direct link
- on RunPod use this direct link
- Install python
sudo apt update
sudo apt install -y python3.10
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
sudo update-alternatives --config python # pick 3.10 if given option
python -V # should be 3.10
- Install pip
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
- Install Pytorch https://pytorch.org/get-started/locally/
- Follow instructions on quickstart.
- Run
pip3 install protobuf==3.20.3
pip3 install -U --ignore-installed requests Pillow psutil scipy
- Set path
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
Use a Deeplearning linux OS with cuda and pytorch installed. Then follow instructions on quickstart.
Make sure to run the below to uninstall xla.
pip uninstall -y torch_xla[tpu]
Please use WSL or Docker!
Use the below instead of the install method in QuickStart.
pip3 install -e '.'
More info: mac.md
Please use this example notebook.
To launch on GPU instances (both on-demand and spot instances) on 7+ clouds (GCP, AWS, Azure, OCI, and more), you can use SkyPilot:
pip install "skypilot-nightly[gcp,aws,azure,oci,lambda,kubernetes,ibm,scp]" # choose your clouds
sky check
Get the example YAMLs for fine-tuning mistralai/Mistral-7B-v0.1 with Axolotl:
git clone https://github.com/skypilot-org/skypilot.git
cd skypilot/llm/axolotl
Use one command to launch:
# On-demand
HF_TOKEN=xx sky launch axolotl.yaml --env HF_TOKEN
# Managed spot (auto-recovery on preemption)
HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET
Axolotl supports a variety of dataset formats. JSONL is the recommended one. The schema of the JSONL depends on the task and the prompt template you wish to use. Instead of JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.
See these docs for more information on how to use different dataset formats.
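As an illustration of the recommended JSONL layout, the sketch below writes a small dataset with one JSON object per line and reads it back. The `instruction`/`input`/`output` fields are an assumption based on the commonly used alpaca-style schema; confirm the exact fields for your chosen dataset type in the dataset format docs. `write_jsonl` and `read_jsonl` are hypothetical helper names.

```python
import json
from pathlib import Path

# Assumed alpaca-style fields; check the dataset format docs for your chosen type.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]


def write_jsonl(path, rows):
    """Write one JSON object per line, which is all JSONL requires."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    write_jsonl("data.jsonl", records)
    print(len(read_jsonl("data.jsonl")), "records written")
```

The resulting file can then be referenced as a local dataset path in the config.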
See examples for a quick start. We recommend duplicating an example and modifying it to suit your needs. The most important options are:
- model
    base_model: ./llama-7b-hf # local or huggingface repo
  Note: The code will load the right architecture.
- dataset
    datasets:
        # huggingface repo
      - path: vicgalle/alpaca-gpt4
        type: alpaca

        # huggingface repo with specific configuration/subset
      - path: EleutherAI/pile
        name: enron_emails
        type: completion # format from earlier
        field: text # Optional[str] default: text, field to use for completion data

        # huggingface repo with multiple named configurations/subsets
      - path: bigcode/commitpackft
        name:
          - ruby
          - python
          - typescript
        type: ... # unimplemented custom format

        # fastchat conversation
        # See 'conversation' options: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
      - path: ...
        type: sharegpt
        conversation: chatml # default: vicuna_v1.1

        # local
      - path: data.jsonl # or json
        ds_type: json # see other options below
        type: alpaca

        # dataset with splits, but no train split
      - path: knowrohit07/know_sql
        type: context_qa.load_v2
        train_on_split: validation

        # loading from s3 or gcs
        # s3 creds will be loaded from the system default and gcs only supports public access
      - path: s3://path_to_ds # Accepts folder with arrow/parquet or file path like above. Supports s3, gcs.
        ...

        # Loading Data From a Public URL
        # - The file format is `json` (which includes `jsonl`) by default. For different formats, adjust the `ds_type` option accordingly.
      - path: https://some.url.com/yourdata.jsonl # The URL should be a direct link to the file you wish to load. URLs must use HTTPS protocol, not HTTP.
        ds_type: json # this is the default, see other options below.
- loading
    load_in_4bit: true
    load_in_8bit: true

    bf16: auto # require >=ampere, auto will detect if your GPU supports this and choose automatically.
    fp16: # leave empty to use fp16 when bf16 is 'auto'. set to false if you want to fallback to fp32
    tf32: true # require >=ampere

    bfloat16: true # require >=ampere, use instead of bf16 when you don't want AMP (automatic mixed precision)
    float16: true # use instead of fp16 when you don't want AMP
  Note: Repo does not do 4-bit quantization.
- lora
    adapter: lora # 'qlora' or leave blank for full finetune
    lora_r: 8
    lora_alpha: 16
    lora_dropout: 0.05
    lora_target_modules:
      - q_proj
      - v_proj
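To get a feel for what the lora settings above imply, the sketch below estimates how many trainable parameters the adapters add. It assumes the standard LoRA formulation (each adapted linear layer gains two low-rank matrices, and the update is scaled by alpha / rank) and a hypothetical model shape of 32 layers with square 4096 x 4096 projections; substitute the real dimensions of your base model. The function names are illustrative and not part of Axolotl.

```python
def lora_trainable_params(rank, in_features, out_features):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def lora_scaling(alpha, rank):
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


if __name__ == "__main__":
    # Values from the lora options above.
    lora_r, lora_alpha = 8, 16
    target_modules = ["q_proj", "v_proj"]
    # Assumed model shape (hypothetical, for illustration only).
    hidden_size, num_layers = 4096, 32

    per_module = lora_trainable_params(lora_r, hidden_size, hidden_size)
    total = per_module * len(target_modules) * num_layers
    print(f"trainable LoRA parameters: {total:,}")
    print(f"update scaling factor: {lora_scaling(lora_alpha, lora_r)}")
```

Under these assumed shapes the adapters add 4,194,304 trainable parameters, a small fraction of a multi-billion-parameter base model, which is why LoRA fits on much smaller GPUs than a full finetune.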
See these docs for all config options.
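Several of the loading options above are marked as requiring Ampere or newer GPUs. A rough way to check this yourself is to compare the CUDA compute capability against 8.0, the capability of the first Ampere GPUs. This is only a sketch: `is_ampere_or_newer` is a hypothetical helper, and Axolotl's own `bf16: auto` detection may use a different test.

```python
AMPERE = (8, 0)  # CUDA compute capability of the first Ampere GPUs


def is_ampere_or_newer(capability):
    """Accept a (major, minor) tuple as returned by torch.cuda.get_device_capability()."""
    return tuple(capability) >= AMPERE


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability()
            print(capability, "Ampere or newer:", is_ampere_or_newer(capability))
        else:
            print("No CUDA device visible")
```

Keeping the comparison in a pure function makes it usable on machines without a GPU, for example when validating a config before submitting a cloud job.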