CraftJarvis/JarvisVLA: Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Project Website | Datasets

Updates

  • [2025.03.21] Our paper is available on arXiv.

Installation

Install dependencies.

git clone https://github.com/CraftJarvis/JarvisVLA.git
conda create -n mcvla python=3.10
conda activate mcvla
cd JarvisVLA
conda install --channel=conda-forge openjdk=8 -y
pip install -e .

After installation, run one of the following commands to check that the simulator environment works:

python -m minestudio.simulator.entry # using Xvfb
MINESTUDIO_GPU_RENDER=1 python -m minestudio.simulator.entry # using VirtualGL
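
The two commands above differ only in the MINESTUDIO_GPU_RENDER environment variable. As a minimal sketch, assuming you want to launch the same check from Python (for example inside a setup script), the helper below builds the command and environment for either mode. The function name `simulator_check` is hypothetical and is not part of MineStudio; only the module path and the variable name come from the commands above.

```python
import os
import sys


def simulator_check(gpu_render: bool = False):
    """Return (command, env) for the simulator check described above.

    Hypothetical helper: only the module path and the MINESTUDIO_GPU_RENDER
    variable are taken from the README commands.
    """
    env = dict(os.environ)
    if gpu_render:
        # Mirrors the second command (VirtualGL).
        env["MINESTUDIO_GPU_RENDER"] = "1"
    else:
        # Mirrors the first command (Xvfb), which sets no variable; drop any
        # inherited value so the mode is not switched by accident.
        env.pop("MINESTUDIO_GPU_RENDER", None)
    command = [sys.executable, "-m", "minestudio.simulator.entry"]
    return command, env


# Example use (commented out because it starts the simulator):
# import subprocess
# cmd, env = simulator_check(gpu_render=True)
# subprocess.run(cmd, env=env, check=True)
```

Using `sys.executable` keeps the check inside the currently active interpreter, which is assumed here to be the one from the mcvla conda environment.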

Inference

Serve the model with vLLM to support multi-GPU and multi-process rollouts.

CUDA_VISIBLE_DEVICES=0 vllm serve jarvis_vla_qwen2_vl_7b_sft --port 8000

Then edit the rollout script so that it uses the correct base_url and port. Finally, run the rollout script:

sh scripts/evaluate/rollout-kill.sh
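
Before starting a long rollout, it can help to confirm that the server answers at the base_url you configured. The sketch below assumes the server was started with `vllm serve ... --port 8000` on the same machine and relies on vLLM's OpenAI-compatible HTTP API, which lists served models under `/v1/models`. The helper names and the `localhost` default are illustrative assumptions, not part of this repository.

```python
import json
import urllib.request


def build_base_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the OpenAI-compatible base URL for a vLLM server.

    The default host is an assumption; the port matches the serve command above.
    """
    return f"http://{host}:{port}/v1"


def list_served_models(base_url: str, timeout: float = 5.0) -> list:
    """Query <base_url>/models and return the ids of the served models.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


# Example use (commented out because it needs a running server):
# print(list_served_models(build_base_url(port=8000)))
```

If the server is up, the returned list should normally include the name passed to `vllm serve`, here jarvis_vla_qwen2_vl_7b_sft.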

Train

Prepare the dataset and the base model, and set their paths in the training script you plan to run.

  • Single GPU
sh scripts/vla/vla_qwen2_vl_7b_sft.sh
  • Multi-GPU
sh scripts/vla/vla_qwen2_vl_7b_sft-multi-GPU.sh
  • Multi-Node
sh scripts/vla/vla_qwen2_vl_7b_sft-multi-node.sh
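
The choice among the three scripts above depends only on how many GPUs and nodes are available. As a small illustrative sketch, the hypothetical helper below maps a hardware setup to the matching script path; the paths are the ones listed above, while the function itself is not part of this repository.

```python
def pick_training_script(num_gpus: int, num_nodes: int = 1) -> str:
    """Return the README training script that matches the hardware setup.

    Hypothetical helper; num_gpus is the GPU count per node.
    """
    if num_gpus < 1 or num_nodes < 1:
        raise ValueError("num_gpus and num_nodes must both be at least 1")
    if num_nodes > 1:
        return "scripts/vla/vla_qwen2_vl_7b_sft-multi-node.sh"
    if num_gpus > 1:
        return "scripts/vla/vla_qwen2_vl_7b_sft-multi-GPU.sh"
    return "scripts/vla/vla_qwen2_vl_7b_sft.sh"


# Example use:
# print("sh", pick_training_script(num_gpus=8))
```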

Citation

If you find our code or models useful in your work, please cite our paper:

@article{li2025jarvisvla,
  title   = {JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse},
  author  = {Muyao Li and Zihao Wang and Kaichen He and Xiaojian Ma and Yitao Liang},
  journal = {arXiv preprint arXiv:2503.16365}, 
  year    = {2025}
}
