This project depends on multiple models and tool libraries. It is recommended to use Conda to create an isolated environment.
- conda create -n flightgpt python=3.10
- conda activate flightgpt
- pip install -r requirements.txt
- Download model weights to `./model_weight/`. Note: change the value of `max_pixels` in `preprocessor_config.json` to `16032016` (a small sketch of this edit follows the list).
- Download data to `./data/`.
- For SFT, download `cleaned_final.json` to `./LLaMA-Factory/data`.
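If you prefer to script the `max_pixels` change rather than edit the file by hand, a minimal sketch (assuming `preprocessor_config.json` sits directly under `./model_weight/`) is:

```python
# Set max_pixels in the downloaded preprocessor config.
# Assumes preprocessor_config.json sits directly under ./model_weight/;
# adjust the path if your weights live elsewhere.
import json
from pathlib import Path

config_path = Path("model_weight/preprocessor_config.json")
config = json.loads(config_path.read_text())
config["max_pixels"] = 16032016  # value required by this project
config_path.write_text(json.dumps(config, indent=2))
print("max_pixels =", config["max_pixels"])
```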
├── model_weight/          # Directory for model weights (download manually)
├── experiment/
├── R1PhotoData/
├── data/
│   ├── citynav/           # Data annotation directory
│   ├── rgbd-new/          # Raw image files
│   ├── training_data/     # Training data directory
│   └── ...
├── data_examples/         # Examples of some training data
├── eval.py                # Model inference and evaluation script
├── open-r1-multimodal/    # GRPO training directory
├── LLaMA-Factory/         # SFT training directory
├── requirements.txt       # Combined environment dependency file
├── README.md              # This document
└── ...
- Start the vLLM service
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve path/to/your/model \
--dtype auto \
--trust-remote-code \
--served-model-name qwen_2_5_vl_7b \
--host 0.0.0.0 \
-tp 4 \
--uvicorn-log-level debug \
--port your_port \
--limit-mm-per-prompt image=2,video=0 \
--max-model-len=32000
- Start the inference script
python eval_by_qwen.py
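For reference, the sketch below shows how a client can query the served model through vLLM's OpenAI-compatible endpoint. The port, image path, and prompt are placeholders; `eval_by_qwen.py` implements the project's actual prompting and evaluation logic.

```python
# Sketch client for the vLLM server started above (OpenAI-compatible API).
# Port, image path, and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("data/rgbd-new/example.png", "rb") as f:  # hypothetical image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen_2_5_vl_7b",  # must match --served-model-name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Locate the described target in the image."},
        ],
    }],
)
print(response.choices[0].message.content)
```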
- Result Visualization
You can use the `visualize_prediction` function to plot the predicted target coordinates and landmark bounding boxes alongside the ground-truth ones.
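Purely as an illustration of what such a plot contains, the sketch below overlays dummy predicted and ground-truth points and boxes on an image with matplotlib; it is not the project's `visualize_prediction` implementation, and the path and coordinates are made up.

```python
# Generic visualization sketch (NOT the project's visualize_prediction):
# overlay predicted vs. ground-truth target points and landmark boxes.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

image = Image.open("data/rgbd-new/example.png")            # hypothetical path
pred_xy, gt_xy = (420, 310), (402, 325)                    # dummy coordinates
pred_box, gt_box = (380, 280, 80, 60), (370, 290, 75, 65)  # dummy (x, y, w, h)

fig, ax = plt.subplots()
ax.imshow(image)
ax.scatter(*pred_xy, c="red", marker="x", label="predicted target")
ax.scatter(*gt_xy, c="lime", marker="o", label="ground-truth target")
ax.add_patch(patches.Rectangle(pred_box[:2], pred_box[2], pred_box[3],
                               fill=False, edgecolor="red", label="predicted landmark"))
ax.add_patch(patches.Rectangle(gt_box[:2], gt_box[2], gt_box[3],
                               fill=False, edgecolor="lime", label="ground-truth landmark"))
ax.legend()
plt.savefig("prediction_vis.png")
```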
- SFT
cd LLaMA-Factory
llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen2vl_lora_sft.yaml
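If `cleaned_final.json` is not already registered in LLaMA-Factory's `data/dataset_info.json`, it needs an entry there so the `dataset:` field in `qwen2vl_lora_sft.yaml` can reference it by name. The sketch below assumes a key name, formatting, and column mapping; match them to the file's actual layout.

```python
# Sketch: register cleaned_final.json with LLaMA-Factory (run from the repo root).
# The key name and column mapping below are assumptions, not the project's
# confirmed schema; align them with cleaned_final.json and the YAML's `dataset:`.
import json
from pathlib import Path

info_path = Path("LLaMA-Factory/data/dataset_info.json")
info = json.loads(info_path.read_text())
info["cleaned_final"] = {
    "file_name": "cleaned_final.json",
    "formatting": "sharegpt",  # assumption: ShareGPT-style multimodal records
    "columns": {"messages": "messages", "images": "images"},
}
info_path.write_text(json.dumps(info, indent=2, ensure_ascii=False))
```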
- GRPO
sh ./open-r1-multimodal/run_scripts/run_grpo_rec_lora.sh