UESTC | TongYi Laboratory
Sichuan ⇌ Beijing
https://zchoi.github.io/
Stars
Official implementation of the paper "Reliable Few-shot Learning under Dual Noises"
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Repository for awesome spatial/visual reasoning MLLMs. (focus more on embodied applications)
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
An example RLDS dataset builder for X-embodiment dataset conversion.
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
[ACL25] Official codebase for "OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction" 🔥
GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective (EMNLP 2024)
Official Repository of Absolute Zero Reasoner
[ICCV'23] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
The simplest, fastest repository for training/finetuning small-sized VLMs.
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
[arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents
RainBowLuoCS / GUI-R1
Forked from ritzz-ai/GUI-R1. Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[NeurIPS 2024] Agent Planning with World Knowledge Model
🚀 [Large models] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour! 🌏
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
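"A Cookbook for Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment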