lixin4ever (LI XIN) / Starred · GitHub
🍉
I may be slow to respond before the due date of ACL.

Organizations

@dmlc @textmine

🌐 WebAgent for Information Seeking built by Tongyi Lab: WebWalker & WebDancer & WebSailor https://arxiv.org/pdf/2507.02592

Python 3,569 251 Updated Jul 11, 2025

WorldVLA: Towards Autoregressive Action World Model

Python 257 10 Updated Jul 5, 2025

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.

Python 2,639 207 Updated Jul 7, 2025

open-source coding LLM for software engineering tasks

Python 731 85 Updated Jun 27, 2025

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 1,842 147 Updated Jul 2, 2025
Python 3 Updated Jun 9, 2025

🔥🔥 First-ever hour-scale video understanding models

Python 490 29 Updated Jul 9, 2025

EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios.

Python 12 1 Updated Jun 17, 2025

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Python 32 1 Updated Jul 5, 2025

The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning

Python 287 15 Updated May 31, 2025

Official code for paper "GRIT: Teaching MLLMs to Think with Images"

Python 108 2 Updated Jun 23, 2025

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 607 61 Updated Jul 26, 2024

ICCV2025

Python 103 3 Updated Jun 28, 2025

Workshop: Build with Gemini

Jupyter Notebook 316 44 Updated Jul 7, 2025
Python 8 Updated Mar 2, 2025

[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions

Python 548 26 Updated Jul 2, 2025

This repo contains evaluation code for the paper "Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency"

Python 9 Updated May 12, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,944 265 Updated Jun 21, 2025

Embodied Reasoning Question Answer (ERQA) Benchmark

Python 183 8 Updated Mar 12, 2025

Lightweight coding agent that runs in your terminal

Rust 30,756 3,533 Updated Jul 11, 2025

[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"

Python 159 36 Updated May 15, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Python 162 4 Updated Jun 9, 2025

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

Python 530 47 Updated Apr 28, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

950 42 Updated Jun 22, 2025

The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"

Python 25 1 Updated May 6, 2025
Python 488 41 Updated Jul 7, 2025

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Python 1,521 91 Updated Apr 24, 2025

[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.

Python 159 9 Updated Jun 8, 2025

Spark-TTS Inference Code

Python 10,008 1,056 Updated Apr 9, 2025