Model:
- ModelScope: https://www.modelscope.cn/models/iic/mPLUG-Owl3-7B-240728
- Hugging Face: https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728
In practice, a multimodal large model is usually fine-tuned on a custom dataset. Here we walk through a runnable demo on the datasets below.
Fine-tuning datasets:
- https://www.modelscope.cn/datasets/modelscope/coco_2014_caption
- https://www.modelscope.cn/datasets/swift/VideoChatGPT
Before fine-tuning, set up the environment:
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e '.[llm]'
pip install decord icecream
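You can run a quick import check to confirm the install before moving on. The sketch below is a hypothetical helper: `check_modules` is not part of ms-swift. It assumes that `swift` is the import name of the package installed above. It also assumes that `python3`, or whatever the `PYTHON` variable points to, is the interpreter pip installed into.

```shell
# Hypothetical helper (not part of ms-swift): report Python modules that fail to import.
# Assumes PYTHON (default: python3) is the interpreter that pip installed into.
check_modules() {
  local py missing mod
  py="${PYTHON:-python3}"
  missing=""
  for mod in "$@"; do
    # A module counts as present if `import <mod>` exits with status 0.
    if ! "${py}" -c "import ${mod}" >/dev/null 2>&1; then
      missing="${missing:+${missing} }${mod}"
    fi
  done
  if [ -z "${missing}" ]; then
    echo "all modules importable"
    return 0
  fi
  echo "missing: ${missing}"
  return 1
}

# `swift` is the import name of ms-swift; decord and icecream come from the second pip install.
check_modules swift decord icecream || echo "re-run the pip install steps above"
```

A non-zero exit status means at least one package is missing, so the check can also gate a script.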
Inference
# ModelScope
CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type mplug-owl3-7b-chat \
--model_id_or_path iic/mPLUG-Owl3-7B-240728
# Hugging Face
USE_HF=1 CUDA_VISIBLE_DEVICES=0 swift infer \
--model_type mplug-owl3-7b-chat \
--model_id_or_path mPLUG/mPLUG-Owl3-7B-240728
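The ModelScope and Hugging Face commands differ only in the `USE_HF=1` prefix and the model ID. The sketch below prints the matching command line for either hub. `owl3_infer_cmd` and its optional GPU-index argument are illustrative assumptions, not part of ms-swift. The helper only echoes the command, so you can review it before running it.

```shell
# Hypothetical helper (not part of ms-swift): print the `swift infer` command
# for mPLUG-Owl3, choosing the model hub from the USE_HF environment variable.
owl3_infer_cmd() {
  local gpu prefix model_id
  gpu="${1:-0}"                                  # GPU index, default 0
  prefix="CUDA_VISIBLE_DEVICES=${gpu}"
  model_id="iic/mPLUG-Owl3-7B-240728"            # ModelScope ID
  if [ "${USE_HF:-0}" = "1" ]; then
    prefix="USE_HF=1 ${prefix}"
    model_id="mPLUG/mPLUG-Owl3-7B-240728"        # Hugging Face ID
  fi
  echo "${prefix} swift infer --model_type mplug-owl3-7b-chat --model_id_or_path ${model_id}"
}

owl3_infer_cmd                  # ModelScope variant on GPU 0
( USE_HF=1 owl3_infer_cmd 1 )   # Hugging Face variant on GPU 1; the subshell keeps USE_HF scoped
```

Once the printed command looks right, `eval "$(owl3_infer_cmd)"` starts the interactive session.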
Results
<<< who are you
I am an AI language model, designed to assist with a variety of tasks such as answering questions and providing information. I do not have a physical form, but rather exist as a program running on a computer. Is there anything specific you would like me to help you with?
--------------------------------------------------
<<< <image>describe the image
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
[INFO:swift] Setting max_num_frames: 16. You can adjust this hyperparameter through the environment variable: `MAX_NUM_FRAMES`.
This is a very cute photo of a kitten! The kitten has beautiful blue eyes and a very fluffy coat. It's adorable to see how it looks at the camera. The colors in the photo are very natural and well-balanced, which adds to the overall cuteness of the image. Great job capturing this adorable moment!
--------------------------------------------------
<<< clear
<<< <video>describe the video
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
The video captures a young child's interest in reading and learning, as they are seen sitting on a bed and flipping through the pages of a book while wearing glasses. The child appears to be engaged and curious about the content of the book.
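The log in the session above reports `max_num_frames: 16` and says the `MAX_NUM_FRAMES` environment variable adjusts it. The sketch below validates a frame count before building the command. `owl3_video_cmd` is a hypothetical wrapper, not part of ms-swift. It assumes the ModelScope model ID and GPU 0 from the inference commands and only prints the command line.

```shell
# Hypothetical helper (not part of ms-swift): build a `swift infer` command that
# sets MAX_NUM_FRAMES, the variable named in the log (default there: 16).
owl3_video_cmd() {
  local frames
  frames="${1:-16}"
  # Reject empty or non-numeric values before they reach swift.
  case "${frames}" in
    ''|*[!0-9]*)
      echo "max frames must be a positive integer, got '${frames}'" >&2
      return 2
      ;;
  esac
  # Reject zero, including strings such as 00.
  if [ "${frames}" -eq 0 ]; then
    echo "max frames must be a positive integer, got '${frames}'" >&2
    return 2
  fi
  echo "MAX_NUM_FRAMES=${frames} CUDA_VISIBLE_DEVICES=0 swift infer --model_type mplug-owl3-7b-chat --model_id_or_path iic/mPLUG-Owl3-7B-240728"
}

owl3_video_cmd 8   # presumably caps video sampling at 8 frames instead of 16
```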