
outerbounds/dolly-metaflow


Background

This repository trains Dolly, a large language model recently announced by Databricks Labs.

Please visit the original repository to learn more about Dolly's origins and fair use.

The main contributions of this repository are:

  • A reproduction of the Dolly training process on the Outerbounds platform.
  • A Metaflow flow that runs Dolly training on multiple GPUs, on the Outerbounds platform or your own Metaflow deployment.
  • A @gpu_profile decorator, reusable in any Metaflow task, that monitors GPU utilization.
  • A Streamlit app that lets you interact with different versions of Dolly during testing.

Dolly - and therefore this repository - is intended exclusively for research purposes and is not licensed for commercial use due to its dependency on the Alpaca Dataset. You will need to create your own instruction tuning dataset to use this repository for commercial applications.
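For context, the Alpaca dataset referenced above is distributed as JSON records with instruction, input, and output fields, so a replacement instruction-tuning dataset would need to supply records of the same shape. A minimal sketch (the field values here are invented for illustration):

```python
import json

# One Alpaca-style training record. The keys are the dataset's schema;
# the values are made-up examples.
record = {
    "instruction": "Summarize the following text in one sentence.",
    "input": "Metaflow is a framework that helps data scientists build "
             "and manage real-life projects.",
    "output": "Metaflow is a framework for building and managing "
              "data science projects.",
}

print(json.dumps(record, indent=2))
```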

Infrastructure & Environment ⚙️

GPU Environment

This code should be run on a GPU. We tested it in two environments:

  • AWS p3dn.24xlarge EC2 instance with 8 NVIDIA V100 GPUs
    • Deep Learning AMI (Ubuntu 20.04) with id ami-0a39ed2b865d65970
    • smaller p3 instances worked too, but less reliably and efficiently
  • Coreweave node with 3 NVIDIA A100 GPUs
    • Ubuntu 22.04
    • NVIDIA driver version 515.105.01
    • CUDA Version 11.7

In general, we found it best to have at least 3 A100 GPUs. You will also need a large amount of CPU memory, since the training process requires significant RAM as DeepSpeed shards the model state across GPUs.
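To see why memory matters, a back-of-envelope estimate helps. Dolly is based on the roughly 6-billion-parameter GPT-J model; under mixed-precision Adam, the standard accounting is 16 bytes of training state per parameter (fp16 weights and gradients, plus fp32 master weights and two Adam moments), which DeepSpeed ZeRO shards across devices:

```python
# Rough model-state estimate for a ~6B-parameter model under
# mixed-precision Adam. Activations and buffers are excluded, so treat
# this as a lower bound on what must fit across GPUs and host RAM.
PARAMS = 6e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # fp16 weights + fp16 grads
                                     # + fp32 master + Adam m + Adam v

total_gib = PARAMS * BYTES_PER_PARAM / 2**30
print(f"total model state: ~{total_gib:.0f} GiB")
print(f"per device, 3-way sharding: ~{total_gib / 3:.0f} GiB")
```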

Python Environment 📦

python -m venv env 
source env/bin/activate
pip install -r requirements.txt

Option 1: Outerbounds platform users

If you have access to the Outerbounds platform, install the outerbounds package to connect to your organization's deployment.

pip install -U outerbounds

After installing outerbounds, find the command of the form outerbounds configure <YOUR KEY> in your platform onboarding documentation and run it.

Option 2: Open-source Metaflow users

If you do not have access to the Outerbounds platform and want to run on a Metaflow deployment you manage, you can install open-source Metaflow normally (or add it to the requirements.txt file).

pip install metaflow

To get started with your own deployment, follow our guides for engineers and/or reach out in our community Slack for help.

Run the TrainDolly flow ▶️

python train_dolly.py run

View the GPU profiling results

python train_dolly.py card view train

To see where this information comes from, look at the my_decorators.py file, which defines the @gpu_profile decorator. The decorator currently assumes that nvidia-smi is installed on the machine where the train step runs, and therefore that you are running on NVIDIA GPUs.
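As a rough illustration of the idea (not the repository's actual implementation, which lives in my_decorators.py), a gpu_profile-style decorator can sample nvidia-smi around the wrapped step:

```python
import functools
import subprocess

def gpu_readings():
    """Return one 'utilization %, memory used' line per visible GPU,
    or an empty list when nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,memory.used",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip().splitlines()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []  # no NVIDIA driver on this machine

def gpu_profile(func):
    """Print GPU readings before and after the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("GPUs before:", gpu_readings())
        result = func(*args, **kwargs)
        print("GPUs after:", gpu_readings())
        return result
    return wrapper

@gpu_profile
def train_step():
    return "done"
```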

Generate responses with Dolly 🤖

Now you can generate predictions with the trained model by running the app.py Streamlit app, which launches a web app that lets you interact with Dolly.

streamlit run app.py
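If you are curious what such an app involves, the skeleton below sketches one. The names and prompt template are illustrative, not necessarily what app.py does, and the Streamlit calls are wrapped in a function here so the prompt helper stays importable without Streamlit; a real Streamlit script places them at module top level.

```python
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction):
    """Wrap a raw user instruction in an Alpaca-style prompt."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

def render_app(generate):
    """Draw the UI; generate is a callable mapping prompt -> completion."""
    import streamlit as st  # imported lazily; see note above

    st.title("Dolly playground")
    instruction = st.text_area("Instruction", "Explain what Metaflow does.")
    if st.button("Generate"):
        st.write(generate(build_prompt(instruction)))
```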

Interacting with the model on a remote instance

Although you can run the Streamlit app above locally if you have GPUs that keep inference times reasonable, you may want to run it on a remote instance instead.

To set up the Streamlit server on a remote instance, and interact with it from your laptop, you can:

  1. Set up a remote instance, such as on AWS EC2. As during model training, select an instance with GPUs, such as a p3.16xlarge on AWS; we again use the ami-0a39ed2b865d65970 Deep Learning AMI. Add a security group rule allowing your IP address to connect to TCP port 8501, which is where Streamlit serves.
  2. Make an SSH connection to your EC2 instance.
  3. Install the GitHub CLI by pasting the following into the terminal:
type -p curl >/dev/null || (sudo apt update && sudo apt install curl -y)
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
&& sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& sudo apt update \
&& sudo apt install gh -y
  4. Log in with gh auth login
  5. Clone this repository with gh repo clone outerbounds/dolly-ops && cd dolly-ops
  6. Create an environment with mamba create -n dolly python=3.9 -y && mamba init && source ~/.bashrc && mamba activate dolly
  7. Install the requirements with pip install -r requirements.txt
  8. Install Metaflow with pip install metaflow if you are not on the Outerbounds platform. If you are, install it with pip install -U outerbounds and then run outerbounds configure <YOUR KEY> to connect to your organization's deployment. Make sure your Metaflow config matches the one used during training.
  9. Run the Streamlit app with streamlit run app.py.
  10. From a web browser on your laptop, open the External URL printed in the terminal. Then you can interact with the models. Note that it takes a few minutes to download each model the first time you load it, since the models are tens of GBs.
