GitHub - Rakhivaish/gensyn-ai: Detailed Guide on How to Contribute to Gensyn RL-Swarm
Run RL Swarm (Testnet) Node

RL Swarm is a fully open-source framework developed by GensynAI for building reinforcement learning (RL) training swarms over the internet. This guide walks you through setting up an RL Swarm node and a web UI dashboard to monitor swarm activity.

Hardware Requirements

Several swarms are currently running on the Testnet, each training on a different dataset. The available models and swarms are:

  • Models: Qwen 2.5 0.5B, Qwen 2.5 1.5B, Qwen 2.5 7B, Qwen 2.5 32B (4 bit) & Qwen 2.5 72B (4 bit)
  • Swarms: Math (GSM8K dataset) & Math Hard (DAPO-Math 17K dataset)

Hardware requirements depend on the swarm and model you choose. Users with less powerful hardware should select a smaller model (e.g. Qwen 0.5B or 1.5B) and the smaller dataset, GSM8K (swarm A). Users with more powerful hardware can select a larger model (e.g. Qwen 7B, 32B or 72B) and the larger dataset, DAPO-Math 17K (swarm B). The requirements for each are listed below:

Small model (0.5B or 1.5B) + Math (GSM8K dataset)

  • CPU-only: arm64 or x86 CPU with at least 16 GB of RAM (note that running other applications during training may cause training to crash).

OR

  • GPU:
    • RTX 3090
    • RTX 4090
    • A100
    • H100
    • Any other GPU with more than 8 GB of VRAM may also be worth testing

Big model (7B, 32B or 72B) + Math Hard (DAPO-Math 17K dataset)

  • GPU: A100 (80GB) or H100 (80GB)
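
As a rough aid for choosing between the two tiers above, here is a minimal sketch that maps available memory to a swarm letter. The function name `suggest_swarm` and the numeric thresholds (80000 MiB for the big-model tier, 8192 MiB and 16 GiB for the small-model tier) are assumptions derived from the hardware listed here, not official limits.

```shell
# Hypothetical helper: map available memory to the swarm tiers listed above.
# Thresholds are assumptions based on this guide, not official requirements.

# suggest_swarm VRAM_MIB RAM_GIB
# Prints "B" (Math Hard), "A" (Math), or "none".
suggest_swarm() {
  vram_mib="$1"
  ram_gib="$2"
  if [ "$vram_mib" -ge 80000 ]; then
    echo "B"      # roughly an 80 GB A100 or H100
  elif [ "$vram_mib" -gt 8192 ] || [ "$ram_gib" -ge 16 ]; then
    echo "A"      # GPU with more than 8 GB VRAM, or CPU-only with 16 GB RAM
  else
    echo "none"   # below both tiers
  fi
}

# Example with hand-entered values: a 24 GB GPU and 32 GB of RAM.
suggest_swarm 24576 32
```

On a real machine the two inputs could be read with `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (MiB per GPU) and `free -g` (GiB of RAM); pass 0 as the first argument on a CPU-only system.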

Method 1 - Windows Users (Home PC):

If you are a Windows user, you may need to install Ubuntu on Windows.

  • Install Ubuntu on Windows: Guide
  • After installing Ubuntu on Windows, verify that the NVIDIA driver and CUDA Toolkit are ready:
# Install NVIDIA Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-cuda-toolkit

# Verify NVIDIA Driver
nvidia-smi

# Verify CUDA Toolkit:
nvcc --version

Method 2 - Rent Cloud GPU:

You can rent a cloud GPU instance instead of using your own home PC.

1- Rent Vast.ai GPUs

  • 1- Register on Vast.ai
  • 2- Create an SSH key on your local system (if you don't already have one) with this Guide: step 1-5
  • 3- Paste your SSH public key into Settings > SSH Keys here
  • 4- Select the Pytorch(Vast) template here
  • 5- Choose a supported GPU (I recommend >=24GB per-GPU vRAM)
  • 6- Increase the Disk Space slider to 200GB
  • 7- Top up with credits and rent the instance.
  • 8- Go to Instances, refresh the page, and click the key button
  • 9- Create an SSH key
  • 10- Copy the SSH command and add -L 3000:localhost:3000 to it (if the command already contains a -L flag, replace that flag).
  • 11- Enter the command in Windows PowerShell and run it

2- Rent Hyperbolic GPUs

  • To install the node on Hyperbolic, check this Guide: Rent & Connect to GPU
  • Add the flag -L 3000:localhost:3000 to your Hyperbolic SSH command; this lets you reach the login page from your local system.

1) Install Dependencies

1. Update System Packages

sudo apt-get update && sudo apt-get upgrade -y

2. Install General Utilities and Tools

sudo apt install screen curl iptables build-essential git wget lz4 jq make gcc nano automake autoconf tmux htop nvme-cli libgbm1 pkg-config libssl-dev libleveldb-dev tar clang bsdmainutils ncdu unzip -y

3. Install Python

sudo apt-get install python3 python3-pip python3-venv python3-dev -y

4. Install Node

sudo apt-get update
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
node -v
sudo npm install -g yarn
yarn -v

5. Install Yarn

curl -o- -L https://yarnpkg.com/install.sh | bash
export PATH="$HOME/.yarn/bin:$HOME/.config/yarn/global/node_modules/.bin:$PATH"
source ~/.bashrc
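
Before moving on, it can help to confirm that the tools installed above are actually on the PATH. The following is a small sketch; the function name `missing_tools` and the list of tools checked are my own choices for illustration.

```shell
# Hypothetical helper: report which of the named commands are not on PATH.

# missing_tools TOOL...
# Prints one line per command that cannot be found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools the later steps in this guide rely on.
missing=$(missing_tools git python3 node yarn screen)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing
else
  echo "All listed tools found."
fi
```

If anything is reported missing, rerun the matching install step before cloning the repository.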

2) Get HuggingFace Access token

1- Create an account on HuggingFace

2- Create an Access Token with Write permissions here and save it


3) Clone the Repository

git clone https://github.com/gensyn-ai/rl-swarm/
cd rl-swarm

4) Run the swarm

Open a screen to run it in background

screen -S swarm

Install swarm

python3 -m venv .venv
source .venv/bin/activate
./run_rl_swarm.sh
  • Would you like to connect to the Testnet? [Y/n] >>> Press Y to join the Testnet
  • Which swarm would you like to join (Math (A) or Math Hard (B))? [A/b] >>> There are two types of swarm:
    • A: Math (GSM8K dataset) -- Lower-end systems (>8GB) -- Use a small model (0.5B or 1.5B).
    • B: Math Hard (DAPO-Math 17K dataset) -- Higher-end systems -- Use a big model (7B, 32B or 72B).
  • How many parameters (in billions)? [0.5, 1.5, 7, 32, 72] >>> 0.5 is the smallest model and 72 is a very large one. Choose based on your system.
  • See the Hardware Requirements section for more guidance.

5) Login

1- Wait until Waiting for userData.json to be created... appears in the logs

2- Open login page in browser

  • Local PC: http://localhost:3000/
  • VPS users: Opening port 3000 on the server directly in the browser will not deliver the OTP code by email. You have to forward the port by running a command in PowerShell on your local PC (step 3 of this section).

3- ⚠️ If you can't log in or no email code is received, forward the port:

  • In the Windows Start menu, search for PowerShell and open it on your local PC
  • Enter the command below, replacing Server_IP with your VPS IP and SSH_PORT with your VPS SSH port (e.g. 22)
ssh -L 3000:localhost:3000 root@Server_IP -p SSH_PORT
  • ⚠️ Make sure you enter the command in PowerShell on your own local Windows PC and NOT in your VPS terminal.
  • You will be prompted for your VPS password; once you enter it, you are connected and the tunnel to your VPS is open
  • Now open http://localhost:3000/ in your browser and log in
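
To make the placeholder substitution in the port-forwarding command concrete, here is a minimal sketch that assembles it from its three parts. The function name `tunnel_cmd` is hypothetical and 203.0.113.7 is a documentation-only example address. The sketch only prints the command; the printed line is what you would then run in PowerShell on your local PC.

```shell
# Hypothetical helper: print the port-forwarding command with placeholders filled in.

# tunnel_cmd USER SERVER_IP SSH_PORT
tunnel_cmd() {
  user="$1"
  host="$2"
  port="$3"
  # Reject an empty or non-numeric port before building the command.
  case "$port" in
    ''|*[!0-9]*) echo "SSH_PORT must be a number" >&2; return 1 ;;
  esac
  echo "ssh -L 3000:localhost:3000 ${user}@${host} -p ${port}"
}

# Example: root user, placeholder address, default SSH port.
tunnel_cmd root 203.0.113.7 22
```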

4- Login with your preferred method

  • After you log in, installation starts in your terminal.

5- Optional: Push models to HuggingFace

  • When prompted, enter the HuggingFace access token you created
  • This needs 2GB of upload bandwidth for each model you train; you can skip it by entering N


Node Name

  • Your node is now running. Find your node name after the word Hello; mine, for example, is whistling hulking armadillo (you can use CTRL+SHIFT+F to search for Hello in the terminal)


Screen commands

  • Detach (minimize): CTRL + A, then D
  • Return: screen -r swarm
  • Stop and Kill: screen -XS swarm quit
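
If you are unsure whether a swarm session already exists, the output of `screen -ls` can be checked before choosing between `screen -r swarm` and `screen -S swarm`. Below is a minimal sketch; the function name `has_screen_session` is hypothetical, and the sample text only imitates typical `screen -ls` output so that the example runs without screen installed.

```shell
# Hypothetical helper: read `screen -ls` output on stdin and
# succeed if a session with the given name is listed.
has_screen_session() {
  # Sessions are listed as "<pid>.<name>" followed by whitespace.
  grep -Eq "[0-9]+\.$1([[:space:]]|\$)"
}

# Sample text standing in for real `screen -ls` output.
sample=$(printf 'There is a screen on:\n\t12345.swarm\t(Detached)\n1 Socket in /run/screen/S-root.\n')

if printf '%s\n' "$sample" | has_screen_session swarm; then
  echo "reattach with: screen -r swarm"
else
  echo "start with: screen -S swarm"
fi
```

On a live system, replace the sample with a pipe from the real command: `screen -ls | has_screen_session swarm`.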

Backup

You need to back up swarm.pem.

VPS:

Connect to your VPS with the Mobaxterm client so you can move files to your local system. Back up this file:

  • /root/rl-swarm/swarm.pem

WSL:

Enter \\wsl.localhost in the Windows Explorer address bar to see your Ubuntu directory. Your main directories are as follows:

  • If installed via a username: \\wsl.localhost\Ubuntu\home\<your_username>
  • If installed via root: \\wsl.localhost\Ubuntu\root
  • Look for rl-swarm/swarm.pem

GPU servers (e.g. Hyperbolic):

1- Connect to your GPU server by entering this command in a Windows PowerShell terminal

sftp -P PORT ubuntu@xxxx.hyperbolic.xyz
  • Replace ubuntu@xxxx.hyperbolic.xyz with the hostname you were given for your GPU
  • Replace PORT with your server port (shown in your server's SSH connection command)
  • ubuntu is the user on my Hyperbolic GPU; yours may be different, and on a VPS it is often root

Once connected, you’ll see the SFTP prompt:

sftp>

2- Navigate to the Directory Containing the Files

  • After connecting, you’ll start in your home directory on the server. Use the cd command to move to the directory of your files:
cd /home/ubuntu/rl-swarm

3- Download Files

  • Use the get command to download the files to your local system. They’ll save to your current local directory unless you specify otherwise:
get swarm.pem
  • The downloaded file is in the directory of the PowerShell or WSL session from which you ran the sftp command.
    • If you ran sftp in PowerShell, the swarm.pem file is probably in C:\Users\<pc-username>.
  • You can now type exit to close the connection.
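
Whichever route you use to copy the file, it may be worth keeping a dated copy and confirming it is byte-identical to the original. The following is a minimal sketch for a bash or WSL shell; the function name `backup_pem` and the `swarm-backups` directory are hypothetical. The permissions are tightened on the assumption that a .pem file holds key material.

```shell
# Hypothetical helper: keep a timestamped, verified copy of swarm.pem.

# backup_pem SRC DEST_DIR
# Prints the path of the new copy on success.
backup_pem() {
  src="$1"
  dest_dir="$2"
  [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
  mkdir -p "$dest_dir"
  dest="$dest_dir/swarm.pem.$(date +%Y%m%d-%H%M%S)"
  # Copy, restrict permissions, then confirm the copy matches the source.
  cp -p "$src" "$dest" && chmod 600 "$dest" && cmp -s "$src" "$dest" && echo "$dest"
}

# Example: run from the directory that holds the downloaded swarm.pem.
if [ -f ./swarm.pem ]; then
  backup_pem ./swarm.pem "$HOME/swarm-backups"
fi
```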

Recovering Backup file (upload)

Use this if you need to upload files from your local machine to the server.

  • WSL & VPS: Drag & Drop option.

GPU servers (e.g. Hyperbolic):

1- Connect to your GPU server using SFTP

2- Upload Files Using the put Command:

In SFTP, the put command uploads files from your local machine to the server.

put swarm.pem /home/ubuntu/rl-swarm/swarm.pem

Node Health

Official Dashboards

Telegram Bot

Look up your Node ID with the /check command here: https://t.me/gensyntrackbot

  • The Node ID is shown near your node name

  • ⚠️ If you receive EVM Wallet: 0x0000000000000000000000000000000000000000, your on-chain participation is not being tracked; you have to delete the old swarm.pem and install again with a new email
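
The all-zero address is easy to misread by eye, so here is a minimal sketch of the comparison. The function name `is_zero_address` is hypothetical, and the wallet value is assumed to be pasted in by hand from the bot's reply.

```shell
# Hypothetical helper: succeed if the address is "0x" followed by 40 zeros.
is_zero_address() {
  zero=$(printf '0x%040d' 0)   # builds the all-zero EVM address
  [ "$1" = "$zero" ]
}

# Value pasted from the bot reply (here: the problematic all-zero case).
wallet=$(printf '0x%040d' 0)

if is_zero_address "$wallet"; then
  echo "on-chain participation is not being tracked"
else
  echo "wallet address is set"
fi
```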


Update Node

1- Stop Node

# list screens
screen -ls

# kill swarm screens (replace screen-id)
screen -XS screen-id quit

# You can kill by name
screen -XS swarm quit

2- Update Node Repository

Method 1 (try this first): If you cloned the official repo and have no local changes:

cd rl-swarm
git pull

Method 2: If you cloned the official repo and have local changes:

cd rl-swarm

# Reset local changes:
git reset --hard
# Pull updates:
git pull

# Alternatively:
git fetch
git reset --hard origin/main
  • You will have to reapply your local changes.

Method 3: If you cloned an unofficial repo or want to start from scratch (Recommended):

cd rl-swarm

# backup .pem
cp ./swarm.pem ~/swarm.pem

cd ..

# delete rl-swarm dir
rm -rf rl-swarm

# clone new repo
git clone https://github.com/gensyn-ai/rl-swarm

cd rl-swarm

# Recover .pem
cp ~/swarm.pem ./swarm.pem
  • If you had any local changes, you will have to reapply them.

3- Re-run Node

Head back to 4) Run the swarm and re-run Node.


Troubleshooting:

⚠️ Error: PS1 unbound variable

sed -i '1i # ~/.bashrc: executed by bash(1) for non-login shells.\n\n# If not running interactively, don'\''t do anything\ncase $- in\n    *i*) ;;\n    *) return;;\nesac\n' ~/.bashrc

⚠️ Upgrade viem & Node version in Login Page

1- Modify: package.json

cd rl-swarm
nano modal-login/package.json
  • Set "viem" to "2.25.0"

2- Upgrade

cd rl-swarm
cd modal-login
yarn install

yarn upgrade && yarn add next@latest && yarn add viem@latest

cd ..

⚠️ CPU-only Users: Ran out of input

Navigate:

cd rl-swarm

Edit:

nano hivemind_exp/configs/mac/grpo-qwen-2.5-0.5b-deepseek-r1.yaml
  • Lower max_steps to 5
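
If you prefer not to edit the file by hand in nano, the same change can be sketched as a small function. This assumes the config contains a line of the form `max_steps: <number>`; the function name `set_max_steps` and the starting value 20 in the demo are made up for illustration. Note that the rewrite drops anything after the value on that line, such as a trailing comment.

```shell
# Hypothetical helper: rewrite the "max_steps:" line of a YAML file.

# set_max_steps FILE N
set_max_steps() {
  file="$1"
  steps="$2"
  tmp=$(mktemp) || return 1
  # Keep the key and its indentation, replace the rest of the line.
  sed "s/^\([[:space:]]*max_steps:\).*/\1 ${steps}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"   # write back in place, keeping file permissions
  status=$?
  rm -f "$tmp"
  return $status
}

# Demo on a throwaway file (20 is an invented starting value).
demo=$(mktemp)
printf 'max_steps: 20\n' > "$demo"
set_max_steps "$demo" 5
cat "$demo"
rm -f "$demo"
```

To apply it to the real config, run `set_max_steps hivemind_exp/configs/mac/grpo-qwen-2.5-0.5b-deepseek-r1.yaml 5` from inside the rl-swarm directory.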
