GitHub - outerbounds/dolly-metaflow

For context, read this blog article: Training a Large Language Model With Metaflow, Featuring Dolly

Background

This repository trains Dolly, a large language model recently announced by Databricks Labs.

Please visit the original repository to learn more about Dolly's origins and fair use.

The main contributions of this repository are:

A reproduction of the Dolly training process on the Outerbounds platform.
A Metaflow flow that runs Dolly training on multiple GPUs using Outerbounds platform, or your own Metaflow deployment.
A @gpu_profile decorator that you can reuse for any Metaflow task, to monitor GPU utilization.
A Streamlit app that lets you interact with different versions of Dolly when testing.

Dolly - and therefore this repository - is intended exclusively for research purposes and is not licensed for commercial use due to its dependency on the Alpaca Dataset. You will need to create your own instruction tuning dataset to use this repository for commercial applications.

Infrastructure & Environment ⚙️

GPU Environment

This code should be run on a GPU. We tested it in two environments:

AWS p3dn.24xlarge EC2 instance with 8 NVIDIA V100 GPUs
- Deep Learning AMI (Ubuntu 20.04) with id ami-0a39ed2b865d65970 (release notes here)
- smaller p3 instances worked too, but less reliably and efficiently
Coreweave node with 3 NVIDIA A100 GPUs
- Ubuntu 22.04
- NVIDIA driver version 515.105.01
- CUDA Version 11.7

In general, we found it best to have at least 3 A100 GPUs. You will also need a large amount of CPU memory: the training process uses DeepSpeed, which shares the model state across GPUs and requires significant RAM.
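Before starting a long training run, it can help to confirm that the machine roughly matches the hardware described above. The following is a minimal sketch, not the repository's check_resource_requirements.py. It assumes a Linux host with `nvidia-smi` on the PATH; the function names and the `min_gpus` default (taken from the "at least 3 GPUs" guidance above) are illustrative.

```python
import os
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count GPUs in the text printed by `nvidia-smi -L` (one "GPU n: ..." line per device)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def total_ram_gib() -> float:
    """Total physical memory in GiB, read through sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def preflight(min_gpus: int = 3) -> bool:
    """Print the visible GPU count and total RAM; return True if enough GPUs are visible."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
        return False
    listing = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    n_gpus = count_gpus(listing)
    print(f"GPUs visible: {n_gpus}, total RAM: {total_ram_gib():.0f} GiB")
    return n_gpus >= min_gpus
```

Calling `preflight()` on the training machine prints both numbers. The sketch only reports RAM and does not enforce a threshold, since the right amount depends on your model and DeepSpeed configuration.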

Python Environment 📦

python -m venv env 
source env/bin/activate
pip install -r requirements.txt

Option 1: Outerbounds platform users

If you have access to the Outerbounds platform, install the outerbounds package to connect to your organization's deployment.

pip install -U outerbounds

After installing outerbounds, find the configuration command in your platform onboarding documentation and run it. It looks like outerbounds configure <YOUR KEY>.

Option 2: Open-source Metaflow users

If you do not have access to the Outerbounds platform and want to run on a Metaflow deployment you manage, you can install open-source Metaflow normally (or add it to the requirements.txt file).

pip install metaflow

To get started with your own deployment, follow our guides for engineers and/or reach out in our community Slack for help.

Run the `TrainDolly` flow ▶️

python train_dolly.py run

View the GPU profiling results

python train_dolly.py card view train

To see where this information comes from, look at the my_decorators.py file, which defines the @gpu_profile decorator. This decorator currently assumes that nvidia-smi is installed on the machine where the train step runs, and therefore that you are running on NVIDIA GPUs.
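To give a feel for the kind of data a profiler built on nvidia-smi can collect, here is a minimal sketch that takes one utilization sample per GPU. It is not the code in my_decorators.py; the function names and the choice of query fields are assumptions made for illustration.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_stats(csv_text: str) -> list:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = (field.strip() for field in line.split(","))
        stats.append(
            {
                "gpu": int(index),
                "utilization_pct": float(util),
                "memory_used_mib": float(mem_used),
                "memory_total_mib": float(mem_total),
            }
        )
    return stats


def sample_gpus() -> list:
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Calling `sample_gpus()` in a loop with a short sleep while a task runs would yield a time series of readings. How the actual decorator samples and renders its results may differ.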

Generate responses with Dolly 🤖

Now you can generate predictions with the trained model by running the app.py Streamlit app, which launches a web app that lets you interact with Dolly.

streamlit run app.py

Interacting with the model on a remote instance

Although you can run the Streamlit app above locally if you have the GPUs to make inference times reasonable, you may want to run it on a remote instance.

To set up the Streamlit server on a remote instance, and interact with it from your laptop, you can:

Set up a remote instance, such as on AWS EC2. As with model training, select an instance with GPUs, such as a p3.16xlarge instance on AWS. As during training, we use the ami-0a39ed2b865d65970 Deep Learning AMI. Add a security group rule allowing your IP address to reach TCP port 8501, which is where Streamlit runs.
Make an SSH connection to your EC2 instance.
Install the GitHub CLI tool by copying and pasting this into the terminal.

type -p curl >/dev/null || (sudo apt update && sudo apt install curl -y)
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
&& sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& sudo apt update \
&& sudo apt install gh -y

Log in with gh auth login
Clone this repository with gh repo clone outerbounds/dolly-ops && cd dolly-ops
Create an environment with mamba create -n dolly python=3.9 -y && mamba init && source ~/.bashrc && mamba activate dolly
Install the requirements.txt file with pip install -r requirements.txt
Install Metaflow with pip install metaflow if you are not on the Outerbounds platform. If you are on the Outerbounds platform, install Metaflow with pip install -U outerbounds and then use your outerbounds configure <YOUR KEY> command to connect to your organization's deployment. Make sure your Metaflow config matches the one used during training.
Run the Streamlit app with streamlit run app.py.
From a web browser on your laptop, open the External URL that is printed in the terminal. Then you can interact with the models. Note that it takes a few minutes to download each model the first time you load it, since the models are tens of GBs.
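If the External URL does not load, a quick check from your laptop is whether TCP port 8501 on the instance is reachable at all. A minimal sketch using only the standard library follows; the function name is illustrative, and the host you pass is your instance's public address.

```python
import socket


def port_is_open(host: str, port: int = 8501, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A result of `False` while the Streamlit server is running usually points at the instance's inbound network rules or at a changed laptop IP address, rather than at the app itself.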
