In this challenge, the model is expected to answer a question based on a given image-text pair. Diverse information sources, multi-step reasoning across media, and open-ended answers make our task more challenging than existing datasets. The aim of this challenge is to develop and benchmark models that are capable of multimedia entity alignment, multi-step reasoning, and open-ended answer generation.
Coming soon.