Issue with the data type for images in the latest version of VLMEvalKit

Hello,
I tried to run the evaluation script for InternVL-Chat-V1-5 with the latest version of VLMEvalKit (main branch) and got the following error:

Processing JSON file: imdb_multiple_mcq.json
InternVL model version: V1.5
Traceback (most recent call last):
  File "./VLMEvalKit/evaluate.py", line 86, in <module>
    main()
  File "/./VLMEvalKit/evaluate.py", line 73, in main
    ret = model.generate(question_input, dataset="MCQ")
  File "./VLMEvalKit/vlmeval/vlm/base.py", line 116, in generate
    return self.generate_inner(message, dataset)
  File "./VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 467, in generate_inner
    return self.generate_v1_5(message, dataset)
  File "./VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py", line 348, in generate_v1_5
    max_num = max(1, min(self.max_num, self.total_max_num // image_num))
ZeroDivisionError: integer division or modulo by zero

I checked the input to the function generate_v1_5 in "./VLMEvalKit/vlmeval/vlm/internvl/internvl_chat.py" (line 348) and got the following:

[{'type': 'text', 'value': 'images/imdb_multiple_mcq/question_1_1.png'}, {'type': 'text', 'value': 'images/imdb_multiple_mcq/question_1_2.png'}, {'type': 'text', 'value': 'images/imdb_multiple_mcq/question_1_3.png'}, {'type': 'text', 'value': 'images/imdb_multiple_mcq/question_1_4.png'}, {'type': 'text', 'value': "Which images contain the celebrity ..."}]

So it seems the parser treats the image paths as text, which leaves image_num at zero and triggers the division error. I got a similar issue when I tried other InternVL models (e.g. InternVL2-1B). Could you explain how FaceXBench can be used with the latest version of VLMEvalKit?
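
As a possible workaround until this is resolved, here is a minimal sketch that re-tags path-like text entries as images before the message is passed to model.generate. It assumes that generate_v1_5 counts entries with type 'image' to compute image_num, which I have not confirmed in the source. retag_image_paths and IMAGE_EXTS are hypothetical names for illustration, not part of VLMEvalKit.

```python
import os

# Assumption: these extensions cover the image files used by the benchmark.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".webp"}


def retag_image_paths(message):
    """Return a copy of `message` in which text entries whose value ends
    with a known image extension are re-tagged as type 'image'."""
    fixed = []
    for item in message:
        value = item.get("value", "")
        ext = os.path.splitext(value)[1].lower()
        if item.get("type") == "text" and ext in IMAGE_EXTS:
            fixed.append({"type": "image", "value": value})
        else:
            fixed.append(dict(item))  # leave everything else unchanged
    return fixed


# Shortened version of the message dumped above.
message = [
    {"type": "text", "value": "images/imdb_multiple_mcq/question_1_1.png"},
    {"type": "text", "value": "images/imdb_multiple_mcq/question_1_2.png"},
    {"type": "text", "value": "Which images contain the celebrity ..."},
]

fixed = retag_image_paths(message)
# Assumption: this mirrors how image_num is derived inside generate_v1_5.
image_num = sum(1 for item in fixed if item["type"] == "image")
```

With the entries re-tagged, image_num is non-zero for this message, so the division on line 348 would no longer fail, provided the assumption about how image_num is computed holds.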
