PyTorch implementation for the paper:
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo, Gi-Cheon Kang, Joonhan Park, and Byoung-Tak Zhang
In ACL 2021
Requirements: Python 3.7, PyTorch 1.2.0
- Download the TGIF-QA dataset and refer to the paper for details.
- Download the MSVD-QA and MSRVTT-QA datasets.
- Appearance Features
- For local features, we used Faster R-CNN pre-trained on Visual Genome. Please see this link.
- After extracting object features with Faster R-CNN, you can convert them to an HDF5 file (a sketch of this conversion step appears after this list) with a simple run:
python adaptive_detection_features_converter.py
- For global features, we used ResNet152 provided by torchvision. Please see this link.
- Motion Features
- For local features, we use RoIAlign with the bounding boxes obtained from Faster R-CNN (a sketch also appears after this list). Please see this link.
- For global features, we use I3D pre-trained on Kinetics. Please see this link.
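As a rough illustration of the conversion step above, here is a minimal, self-contained sketch of packing per-frame Faster R-CNN object features into an HDF5 file with h5py. The file name, group layout, and feature shapes are assumptions for illustration; the actual format is whatever adaptive_detection_features_converter.py produces.

```python
import h5py
import numpy as np

# Assumed shapes: 36 frames per video, 10 object regions per frame,
# 2048-d Faster R-CNN features and 4-d bounding boxes per region.
num_frames, num_objects, feat_dim = 36, 10, 2048

def write_video_features(h5_path, video_ids, load_frame_features):
    """Pack per-video object features and boxes into one HDF5 file.

    `load_frame_features(vid)` is a stand-in for however you read the raw
    Faster R-CNN outputs; it should return (features, boxes) arrays of shape
    (num_frames, num_objects, feat_dim) and (num_frames, num_objects, 4).
    """
    with h5py.File(h5_path, 'w') as f:
        for vid in video_ids:
            feats, boxes = load_frame_features(vid)
            grp = f.create_group(str(vid))  # one group per video id (assumed layout)
            grp.create_dataset('features', data=feats.astype(np.float32))
            grp.create_dataset('boxes', data=boxes.astype(np.float32))

# Example with random data standing in for real Faster R-CNN outputs:
def fake_loader(vid):
    return (np.random.randn(num_frames, num_objects, feat_dim),
            np.random.rand(num_frames, num_objects, 4))

write_video_features('btup_f_obj10.hdf5', ['video0', 'video1'], fake_loader)
```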
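Similarly, a minimal sketch of the RoIAlign step for motion local features, using torchvision.ops.roi_align to pool I3D feature-map activations inside Faster R-CNN boxes. The 7x7 map size, 224x224 image coordinates, box count, and pooling settings are illustrative assumptions, not the repository's exact configuration.

```python
import torch
from torchvision.ops import roi_align

# Assumed I3D feature map for one clip: (batch=1, channels=1024, H=7, W=7),
# and 10 Faster R-CNN boxes in (x1, y1, x2, y2) image coordinates.
feature_map = torch.randn(1, 1024, 7, 7)
boxes = torch.rand(10, 4) * 224
boxes[:, 2:] += boxes[:, :2]  # ensure x2 >= x1 and y2 >= y1

# Boxes are passed as a list of per-image tensors; spatial_scale maps the
# assumed 224x224 image coordinates onto the 7x7 feature map.
local_motion = roi_align(feature_map, [boxes], output_size=(1, 1),
                         spatial_scale=7.0 / 224, sampling_ratio=2)
print(local_motion.shape)  # torch.Size([10, 1024, 1, 1]): one vector per box
```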
We uploaded our extracted features:
- TGIF-QA
  - res152_avgpool.hdf5: appearance global features (3 GB)
  - tgif_btup_f_obj10.hdf5: appearance local features (30 GB)
  - tgif_i3d_hw7_perclip_avgpool.hdf5: motion global features (3 GB)
  - tgif_i3d_roialign_hw7_perclip_avgpool.hdf5: motion local features (59 GB)
- MSRVTT-QA
  - msrvtt_res152_avgpool.hdf5: appearance global features (1.7 GB)
  - msrvtt_btup_f_obj10.hdf5: appearance local features (17 GB)
  - msrvtt_i3d_avgpool_perclip.hdf5: motion global features (1.7 GB)
  - msrvtt_i3d_roialign_perclip_obj10.hdf5: motion local features (34 GB)
- MSVD-QA
  - msvd_res152_avgpool.hdf5: appearance global features (220 MB)
  - msvd_btup_f_obj10.hdf5: appearance local features (2.2 GB)
  - msvd_i3d_avgpool_perclip.hdf5: motion global features (220 MB)
  - msvd_i3d_roialign_perclip_obj10.hdf5: motion local features (4.2 GB)
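To sanity-check a downloaded file, you can inspect it with h5py. The internal layout (flat datasets vs. one group per video) is an assumption here; this sketch just prints whatever keys and shapes the file actually contains.

```python
import h5py

# Inspect a downloaded feature file, e.g. the TGIF-QA appearance global features.
with h5py.File('res152_avgpool.hdf5', 'r') as f:
    for key in list(f.keys())[:5]:  # peek at the first few entries only
        item = f[key]
        if isinstance(item, h5py.Dataset):
            print(key, item.shape, item.dtype)
        else:  # a group: list the shapes of its member datasets
            print(key, {k: v.shape for k, v in item.items()
                        if isinstance(v, h5py.Dataset)})
```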
Simple run for TGIF-QA (the Count task is shown here):
CUDA_VISIBLE_DEVICES=0 python main.py --task Count --batch_size 32
For MSRVTT-QA, run:
CUDA_VISIBLE_DEVICES=0 python main_msrvtt.py --task MS-QA --batch_size 32
For MSVD-QA, run:
CUDA_VISIBLE_DEVICES=0 python main_msvd.py --task MS-QA --batch_size 32
By default, our model saves a checkpoint at every epoch. You can change the save path with the --save_path option.
Each checkpoint is named '[TASK]_[PERFORMANCE].pth' by default. To evaluate a saved checkpoint, run:
CUDA_VISIBLE_DEVICES=0 python main.py --test --checkpoint [NAME] --task Count --batch_size 32
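If you need to inspect a checkpoint outside of main.py, it can be opened directly with torch.load. The file name below follows the '[TASK]_[PERFORMANCE].pth' convention and is hypothetical, and whether the file holds a bare state_dict or a wrapper dict is an assumption; adjust to the actual contents.

```python
import torch

# Hypothetical checkpoint name following the '[TASK]_[PERFORMANCE].pth'
# convention; substitute the file your training run actually produced.
ckpt = torch.load('Count_3.75.pth', map_location='cpu')

# Assuming a bare state_dict (parameter name -> tensor), print a few entries.
for name, tensor in list(ckpt.items())[:5]:
    print(name, tuple(tensor.shape))
```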
Performance on the TGIF-QA dataset:
Model | Count (MSE, lower is better) | Action (Acc. %) | Trans. (Acc. %) | FrameQA (Acc. %) |
---|---|---|---|---|
MASN | 3.75 | 84.4 | 87.4 | 59.5 |
You can download our pre-trained models via these links: Count, Action, Trans., FrameQA.
Performance on the MSRVTT-QA and MSVD-QA datasets:
Model | MSRVTT-QA (Acc. %) | MSVD-QA (Acc. %) |
---|---|---|
MASN | 35.2 | 38.0 |
If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:
@article{seo2021attend,
title={Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering},
author={Seo, Ahjeong and Kang, Gi-Cheon and Park, Joonhan and Zhang, Byoung-Tak},
journal={arXiv preprint arXiv:2106.10446},
year={2021}
}
MIT License
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (2015-0-00310-SW.StarLab/25%, 2017-0-01772-VTT/25%, 2018-0-00622-RMI/25%, 2019-0-01371-BabyMind/25%) grant funded by the Korean government.