HarryHsing (XING, Zhenghao) / Starred · GitHub
🎾
TTWSYF

Highlights

  • Pro


Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

JavaScript 57 1 Updated Jun 26, 2025

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Python 74 1 Updated Jun 26, 2025

Unsupervised GRPO

Python 38 1 Updated Jun 11, 2025

[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"

Python 300 5 Updated Jun 27, 2025
86 5 Updated May 16, 2025

[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation

Python 740 21 Updated May 23, 2025

Open-source unified multimodal model

Python 4,360 364 Updated Jun 17, 2025

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Visual Reasoning with Qwen2.5-Omni]

Python 36 2 Updated May 18, 2025

EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights (CVPR 2025)

5 Updated May 8, 2025

PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

Python 491 29 Updated Jun 26, 2025

Train transformer language models with reinforcement learning.

Python 14,365 1,994 Updated Jun 27, 2025

ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities

36 1 Updated Jun 20, 2025

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

TypeScript 14,391 1,735 Updated Jun 27, 2025

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

2,444 112 Updated Jun 20, 2025

AudioBench: A Universal Benchmark for Audio Large Language Models

Python 228 9 Updated Jun 17, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

668 20 Updated Jun 21, 2025

TTRL: Test-Time Reinforcement Learning

Python 668 49 Updated Jun 26, 2025

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1,190 68 Updated Jun 26, 2025

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

Python 113 6 Updated Jun 19, 2025

Let's make video diffusion practical!

Python 14,716 1,323 Updated Jun 27, 2025

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Python 124 1 Updated Apr 24, 2025

[arXiv 2025] Efficient Reasoning Models: A Survey

Python 198 12 Updated Jun 24, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,213 247 Updated Jun 12, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,384 58 Updated May 11, 2025
Python 901 58 Updated Mar 24, 2025

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 18,573 1,523 Updated Jun 16, 2025

A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.

64 7 Updated Mar 18, 2025
4 Updated Mar 3, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

937 43 Updated Jun 18, 2025