EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]

Python 36 2 Updated May 18, 2025

HarryHsing / EchoTraffic

EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights (CVPR 2025)

5 Updated May 8, 2025

NVIDIA / audio-flamingo

PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

Python 491 29 Updated Jun 26, 2025

huggingface / trl

Train transformer language models with reinforcement learning.

Python 14,365 1,994 Updated Jun 27, 2025

threegold116 / Awesome-Omni-MLLMs

This is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities

36 1 Updated Jun 20, 2025

bytedance / deer-flow

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

TypeScript 14,391 1,735 Updated Jun 27, 2025

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,444 112 Updated Jun 20, 2025

AudioLLMs / AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

Python 228 9 Updated Jun 17, 2025

yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

668 20 Updated Jun 21, 2025

PRIME-RL / TTRL

TTRL: Test-Time Reinforcement Learning

Python 668 49 Updated Jun 26, 2025

NVlabs / describe-anything

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1,190 68 Updated Jun 26, 2025

bytedance / vidi

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

Python 113 6 Updated Jun 19, 2025

lllyasviel / FramePack

Let's make video diffusion practical!

Python 14,716 1,323 Updated Jun 27, 2025

UCSC-VLAA / VLAA-Thinking

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Python 124 1 Updated Apr 24, 2025

fscdc / Awesome-Efficient-Reasoning-Models

[arXiv 2025] Efficient Reasoning Models: A Survey

Python 198 12 Updated Jun 24, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,213 247 Updated Jun 12, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,384 58 Updated May 11, 2025

HumanMLLM / R1-Omni

Python 901 58 Updated Mar 24, 2025

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 18,573 1,523 Updated Jun 16, 2025

Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks

A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.

64 7 Updated Mar 18, 2025

russellyq / MedHallTune

4 Updated Mar 3, 2025

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs

This repository provides a valuable reference for researchers in the field of multimodality. Please start your exploratory journey in RL-based Reasoning MLLMs!

937 43 Updated Jun 18, 2025