This is the official PyTorch implementation for the ACM MM 2024 paper:
Edit As You Wish: Video Description Editing with Multi-grained Commands
We propose a novel Video Description Editing (VDEdit) task to automatically revise an existing video description guided by flexible user requests. To facilitate the VDEdit task, we automatically construct an open-domain dataset named VATEX-EDIT and manually collect an e-commerce benchmark dataset called EMMAD-EDIT.
The VATEX-EDIT (EN) and EMMAD-EDIT (CN) datasets will be released publicly and can be downloaded from the dataset page.
```bash
conda create -n VDEdit python=3.6
conda activate VDEdit
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.3.3
pip install lemminflect==0.2.1
pip install inflect==5.2.0
pip install nltk==3.6.2
```
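A quick way to sanity-check the environment (illustrative only, not part of the repo):

```python
# Illustrative environment check (not part of the repo).
import torch
import transformers

print(torch.__version__)          # expect 1.7.1+cu101
print(transformers.__version__)   # expect 4.3.3
print(torch.cuda.is_available())  # True if CUDA 10.1 is correctly installed
```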
We initialize the proposed OPA model with pre-trained English BART / Chinese BART when training on the VATEX-EDIT (EN) / EMMAD-EDIT (CN) dataset.
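For reference, initializing from the pre-trained backbones with transformers might look like the sketch below; the checkpoint names (facebook/bart-base, fnlp/bart-base-chinese) are assumptions here, and the training scripts may load different weights:

```python
# Hypothetical BART initialization (checkpoint names are assumptions;
# the training scripts may load different weights).
from transformers import BartForConditionalGeneration, BartTokenizer, BertTokenizer

# English BART for VATEX-EDIT (EN)
en_model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
en_tok = BartTokenizer.from_pretrained("facebook/bart-base")

# Chinese BART for EMMAD-EDIT (CN); this checkpoint ships a BERT-style tokenizer
cn_model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")
cn_tok = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
```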
Take training on the EMMAD-EDIT dataset as an example. Please make sure you have downloaded training/validation/test.json & middle_files/*.* and put them under codes/data/emmad-edit/.
- (Optional) Data Processing. Skip this step if you have downloaded the middle_files/.
```bash
# preprocess video features to get .tsv files to speed up training
cd codes/data_process/get_videoFeat_tsv
python create_video_tsv_EMMAD-EDIT.py
```
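For orientation, packing per-video features into a .tsv can look roughly like the sketch below; the file layout (base64-encoded float32 features keyed by video id) is an assumption, and create_video_tsv_EMMAD-EDIT.py may store a different format:

```python
# Hypothetical .tsv packing sketch (layout is an assumption; the repo's
# create_video_tsv_EMMAD-EDIT.py may use a different format).
import base64
import glob
import os
import numpy as np

with open("video_feats.tsv", "w") as out:
    for path in sorted(glob.glob("features/*.npy")):
        vid = os.path.splitext(os.path.basename(path))[0]
        feat = np.load(path).astype(np.float32)        # (num_frames, dim)
        enc = base64.b64encode(feat.tobytes()).decode("utf-8")
        out.write(f"{vid}\t{feat.shape[0]}\t{enc}\n")  # id, #frames, features
```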
- Training & Inference
```bash
cd codes/models_add_vision_cn
./train.sh
./infer.sh
```
The predicted file generated during inference will be put under the checkpoint folder ./checkpoints_vision_cn/.
For VDEdit evaluation, we adopt comprehensive metrics to measure three aspects of model performance: caption quality, caption-command consistency, and caption-video alignment. We also provide Chinese versions of the above metrics. The reference repositories for all metrics are as follows:
- Caption Quality (Fluency)
  - PPL (GPT-2): lm_perplexity (a minimal perplexity sketch follows this list)
  - BLEU-4 and ROUGE-L: coco_caption
- Caption-command Consistency (Controllability)
  - SARI: iterater
  - Len-Acc / Attr-Acc / Pos-Acc: proposed in this paper
- Caption-video Alignment (Vision Align)
  - EMScore: emscore. [Note] Running EMScore requires modifying the encode_text() function in CLIP/clip/model.py; please refer to the original repo.
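As a reference for the fluency metric, here is a minimal sketch of GPT-2 perplexity with transformers; the model name and normalization details are assumptions, and the repo's lm_perplexity code may differ:

```python
# Minimal GPT-2 perplexity sketch (assumed setup; the repo's
# lm_perplexity implementation may differ in model and normalization).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

print(perplexity("a man is playing the guitar on stage"))
```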
We integrate and modify the original metric code to support both EN and CN evaluation. The evaluation code differs slightly between the English and Chinese environments: to evaluate on the English data, cd metrics/eval_en; otherwise, cd metrics/eval_cn.
The overall evaluation results can be obtained by running the following script; the results will be saved in the eval_log file:
```bash
cd metrics/eval_cn
bash eval_cn_overall.sh > eval_log

# you can define the predicted file in the .sh
testfile=your_self_predicted_file
# the exp_name is to label the different predicted files
exp_name=your_exp_name
```
If you also want to print per-command evaluation results, run the following script to get the breakdown over the 7 specific command types:
```bash
bash eval_cn_breakdown.sh > eval_log_bk
```
```bibtex
@article{yao2023edit,
  title={Edit As You Wish: Video Description Editing with Multi-grained Commands},
  author={Yao, Linli and Zhang, Yuanmeng and Wang, Ziheng and Hou, Xinglin and Ge, Tiezheng and Jiang, Yuning and Jin, Qin},
  journal={arXiv preprint arXiv:2305.08389},
  year={2023}
}
```