Starred repositories
It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant, will benefit from reading it; however, it is my hope that even the most experienced research…
Bag of Tricks and A Strong Baseline for Deep Person Re-identification
Reading list for research topics in multimodal machine learning
A treasure chest for visual classification and recognition powered by PaddlePaddle
SIGIR paper: Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback
This repo consists of the QA dataset collected for performing person search with natural language.
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Ins…
Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Open-source toolbox for visual fashion analysis based on PyTorch
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
[ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning
A curated list of Multimodal Related Research.
A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsens…
A curated list of deep learning resources for video-text retrieval.
Awesome Cross-modality Person Re-identification
A Simple, High-efficiency, Strong framework for person re-identification.
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
A reading list of papers about Visual Question Answering.
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI 2021]
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)