This is a multimodal large language model playground, and much more than a benchmark.
MLLM-playground, short for Multimodal Large Language Model Playground, is a toolkit designed to streamline the training and evaluation processes for various vision-and-language datasets using different multimodal large language models. This project offers a unified and user-friendly interface to facilitate the experimentation and development of multimodal models.
Follow these steps to set up MLLM-playground on your local machine.
- Python >= 3.10
- Clone the repository and navigate to the project directory:

  ```bash
  git clone https://github.com/chu0802/MLLM-playground.git
  cd MLLM-playground
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To train and evaluate models using MLLM-playground, two main scripts are provided:
- `train.py`
- `eval.py`
Before running these scripts, it's essential to set up the configuration files appropriately.
- Training Configuration: Open `train_config.yaml` and adjust the parameters according to your experimental setup. Specify the dataset, model architecture, hyperparameters, and any other relevant settings. A rough sketch of such a file follows this list.
- Evaluation Configuration: Similarly, in `eval_config.yaml`, configure the parameters needed for evaluation, such as the path to the trained model, evaluation metrics, and so on.
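As a rough illustration, a `train_config.yaml` might be organized like the sketch below. Only `dataset.name` and `dataset.split.train.batch_size` are actually referenced in this README; every other key name here is an assumption made for illustration, so consult the configuration file shipped with the repository for the real schema.

```yaml
# Hypothetical sketch of train_config.yaml; apart from dataset.name and
# dataset.split.train.batch_size, all key names are illustrative assumptions.
model:
  arch: minigpt4            # which multimodal architecture to train (assumed key)
  checkpoint: null          # optional checkpoint to resume from (assumed key)

dataset:
  name: ScienceQA           # dataset to train on
  split:
    train:
      batch_size: 16        # training batch size

run:
  lr: 1e-5                  # learning rate (assumed key)
  num_epochs: 5             # number of training epochs (assumed key)
  output_dir: ./outputs     # where per-epoch checkpoints are saved (assumed key)
```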
Run the training script using the following command:
```bash
python train.py --cfg-path train_config.yaml
```
We provide basic settings in `train_config.yaml`, but you can override specific settings by adding command-line options. For example:
```bash
python train.py --cfg-path train_config.yaml --options dataset.name=ScienceQA dataset.split.train.batch_size=16
```
This command overrides the dataset name and training batch size specified in the configuration file; each dotted path after `--options` addresses a nested key in the YAML file.
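Assuming the nested layout sketched earlier, the same override could instead be baked into `train_config.yaml` directly:

```yaml
dataset:
  name: ScienceQA           # dataset.name on the command line
  split:
    train:
      batch_size: 16        # dataset.split.train.batch_size on the command line
```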
After training, you can evaluate the model by running the evaluation script:
```bash
python eval.py --cfg-path eval_config.yaml
```
Similarly, ensure that the evaluation configuration in `eval_config.yaml` is set up appropriately for your experiment; you can also override its settings by passing `--options` arguments.
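The README only states that `eval_config.yaml` holds things like the path to the trained model and the evaluation metrics; the sketch below is an assumed illustration of how such a file might look, not the actual schema.

```yaml
# Hypothetical sketch of eval_config.yaml; all key names are assumptions.
model:
  checkpoint: ./outputs/checkpoint_4.pth   # path to the trained model to evaluate
dataset:
  name: ScienceQA                          # dataset to evaluate on
  split:
    test:
      batch_size: 32                       # evaluation batch size (assumed key)
evaluation:
  metrics: [accuracy]                      # evaluation metrics (assumed key)
```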
During training, you can monitor progress on the WandB dashboard. The trained model is saved after every epoch, according to the settings in the configuration file.
By following these steps, you can efficiently train and evaluate multimodal large language models on various datasets using MLLM-playground.
This codebase is partially based on LVLM-eHub [paper, code] and LAVIS [paper, code].