- Shatin, N.T., HKSAR
- https://lixin4ever.github.io/
- @lixin4ever
Stars
🌐 WebAgent for Information Seeking built by Tongyi Lab: WebWalker & WebDancer & WebSailor https://arxiv.org/pdf/2507.02592
WorldVLA: Towards Autoregressive Action World Model
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
Open-source coding LLM for software engineering tasks
PyTorch code and models for VJEPA2 self-supervised learning from video.
🔥🔥 First-ever hour-scale video understanding models
EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios.
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. AI foley master: adds vivid, synchronized sound effects to silent videos 😝
Workshop: Build with Gemini
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
This repo contains evaluation code for the paper "Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency"
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Embodied Reasoning Question Answer (ERQA) Benchmark
Lightweight coding agent that runs in your terminal
[ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation"
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.