Stars
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
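A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization (DPO). The model has 1B parameters and supports Chinese and English.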
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…
Code for our EMNLP-2022 paper: "Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA"
Janus-Series: Unified Multimodal Understanding and Generation Models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with simple questio…
[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
GQA-OOD is a new dataset and benchmark for the evaluation of VQA models in OOD (out of distribution) settings.
PyInstaller Extractor Next Generation
Chinese NLP solutions (large models, data, models, training, inference)
EditThisCookie is the famous Google Chrome/Chromium extension for editing cookies
Counterfactual Samples Synthesizing for Robust VQA
An image dataset that shows Chinese characters stroke by stroke, in stroke order
Tutorial series on brush stroke rendering
A tool to extract embedded files from application virtualizers
A generative world for general-purpose robotics & embodied AI learning.