[Benchmark] Adding MVTamperBench #5

amitbcp · 2025-01-08T20:24:12Z

Paper : https://arxiv.org/pdf/2412.19794

… class (open-compass#790)

* support mmbench_c
* Fix BUG
* update mmbench_c
* update SenseChat MaxTokens
* Use Dual Eval for MMBench_C by default
* [Fix] Fix MMBench-C
* [Fix] Update Max_Token Args
* [Fix] Fix Bug
* [Fix] Fix Circular
* update
* update
* update config
* update dlist
* Update Config
* Update dataset name
* add group live
* [Minor] Set max_tokens to 2048
* [Minor] Change the max_tokens to 2048 for models that will be evaluated on LiveMMBench
* remove print
* patch error
* fix md5
* [Minor] Update mmqa_display
* Fix
* minor

srikant86panda and others added 30 commits November 11, 2024 15:43

MVBench data prep fixes

be173fc

handle missing video file with mvbench

6aaddc5

MVTamperBench base

101a33c

merge main and add exception information in generate tsv

d5918e0

add nturgb-d information to generate tsv function

9471a46

sample dataset

d0d0dd6

updated tamperbench

04ec81f

updated md5 and dataset name for complete dataset

1e127ec

Merge branch 'open-compass:main' into main

618c003

resolve conflict and update with VLM

dd3bf5e

fix on moving perception file to correct folder

f7f7a11

Merge branch 'MVTamperBench' of https://github.com/srikant86panda/VLMEvalKit into MVTamperBench

efa3824

adding correct md5 for total ~19k sample

675987e

Local changes

b513f83

Merge remote-tracking branch 'origin/MVTamperBench' into MVTamperBench

fd850ed

Merge branch 'open-compass:main' into main

b568447

Local Changes to run

297039f

resolve merge conflict

b5063ae

updated evaluation

869ed13

updated to latest change

f54ac3a

incorporated metrics and preprocessing change

1454281

sample file support

ce511b7

reporting fix

0e6fb2c

fix with md5 check

4863148

updated md5 and dataset support

8b6c816

ntu data path fix

fb36aa5

dataset support added on run.py

9f0d451

ntu files removed

a6f2c2d

md5 updated

b18808f

run file updated

46dfe35

kennymckormick and others added 30 commits February 1, 2025 19:49

[Improvement] Refine reuse strategy & implement sympy timeout (open-compass#764)

a810188

* [Fix] Better Import
* [Improvement] Also reuse auxiliary files by default
* [Fix] Fix fetch_aux_files
* update
* implement timeout for sympy usages

[Fix] Fix minor issues for llava 1.5

02ff9db

add mathverse_visiononly_cot (open-compass#765)

8b823bd

Updated the generate in LLaVA-OneVision-HF (open-compass#761)

2f1f6c8

* Fixed the cache_path in pixtral.py
* Updated the response in llava.py to return only the model's generated response instead of the whole output
* Removed the unnecessary prompt print in LLaVA-OneVision-HF

[Minor] Support cmd vlmutil scan

701f9c0

[Benchmark] Support MMMU Pro (open-compass#768)

87f2dbe

* support MMMU_Pro
* support MMMU_Pro as concat dataset
* [Fix] Fix Bug
* Fix Bug
* [Fix] Fix MMMU_Pro_V judge
* Fix vlmeval tools: EVAL
* Bug Fix
* [Fix] Fix MMMUPro COT evaluation
* fix
* Fix
* Fix

[Model] Add Ola Model (open-compass#752)

fa829b9

* add ola model
* Update ola_model.py
* Update builder.py

[Minor] Update SiliconFlowAPI

61ef196

[Minor] Fix SiliconFlowAPI

9a4136b

[Minor] Adopt CircularEval for all datasets with 'circular' in names

6e94c1e

[Improvement] Better Implementation for CircularEval (open-compass#770)

90a80d2

* update .gitignore
* update CIRCULAR implementation
* [Refine] Update CircularEval

Claude Timeout (open-compass#771)

b1d59b7

* add ClaudeWrapper Timeout, add Claude3-5V_Sonnet_20241022_tem07

[Fix] Fix build dataset from config for image dataset

6b81b5e

Add Dynamath_noprompt (open-compass#776)

aaa1420

* add ClaudeWrapper Timeout, add Claude3-5V_Sonnet_20241022_tem07
* add dynamath_noprompt

Removed debug prints

67e3a37

[Fix] Move barrier to the starting point of the for-loop (open-compass#772)

30589e3

The current implementation does not work if the user runs infer-only mode under multiple processes.

add Ovis2

7f97843

remove redundant parentheses

5d5693c

Merge pull request open-compass#782 from runninglsy/main

c577277

[Model] Add Ovis2

Update SEEDBench2_Plus MD5

889822d

[Minor] Fix bug in extracting dataset_name from eval file name (open-compass#778)

fa26094

Co-authored-by: zli <zli@mail.com>

update worldsense (open-compass#775)

bf73bae

Co-authored-by: FangXinyu-0913 <fangxinyutju202009@126.com>

[Fix] qwen2.5vl evaluation do not use custom prompt (open-compass#786)

a2a82bd

Update README and QUICKSTART (open-compass#795)

1a0335b

* change readme and quickstart
* refine readme and quickstart.
* refine readme_cn.md

New document (open-compass#796)

2d8e44e

* change readme and quickstart
* refine readme and quickstart.
* refine readme_cn.md
* modify date

Mmalignbench (open-compass#805)

4b94a54

* add mmalignbench
* add judge model gpt4o

[Improvement] Refine Prompts for ShortQA Evaluation (open-compass#808)

88bfb0c

[Improvement] Remove Unnecessary vlmeval/vlm/internvl_chat.py (open-compass#792)

e90a2c1

open-compass#657 deleted this file, while open-compass#729 added it back, which is not needed. 😢

Merge branch 'main' into MVTamperBench_updated

f5a59fc
