RCLai1015 (Ruei-Chi Lai) / Starred · GitHub
🚀
Focusing
  • Taipei, Taiwan
  • 23:20 (UTC +08:00)

Organizations

@WiFiBoy

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Jupyter Notebook 367 31 Updated Dec 15, 2024

[CVPR 2024] DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Python 294 29 Updated Apr 2, 2025

[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Python 203 5 Updated Apr 4, 2025
Python 26 1 Updated Apr 20, 2025

Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models'

13 Updated Jun 4, 2025

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

Swift 15,858 306 Updated Jun 26, 2025

The mouse and trackpad utility for Mac.

Swift 4,648 84 Updated Jun 16, 2025

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Python 50 2 Updated Jun 4, 2025
Python 1,237 46 Updated Jun 22, 2025

Segment Anything in Medical Images

Jupyter Notebook 3,613 499 Updated May 7, 2025

Official repository for "AM-RADIO: Reduce All Domains Into One"

Python 1,224 47 Updated Jun 27, 2025

Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral]

Python 1,315 164 Updated Jun 25, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 3,567 315 Updated Jun 27, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 439 15 Updated Jan 4, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,337 75 Updated May 28, 2025

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning

Python 78 Updated May 22, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,004 449 Updated Aug 7, 2024

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 16,520 1,509 Updated Sep 5, 2024

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1,190 68 Updated Jun 26, 2025

The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"

Python 119 5 Updated Jun 5, 2025

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Jupyter Notebook 46 5 Updated Jun 23, 2025

[CVPR 2025] EgoLife: Towards Egocentric Life Assistant

Python 297 18 Updated Mar 19, 2025

A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World

Python 274 14 Updated Nov 29, 2024
Python 2 Updated Apr 28, 2025

[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".

Python 124 9 Updated Jun 4, 2025

Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".

Python 30 Updated Jun 26, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 2,810 213 Updated Jun 27, 2025

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

Python 429 17 Updated Jun 12, 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Python 316 9 Updated Apr 11, 2025