Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
[CVPR 2024] DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models'
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.
The mouse and trackpad utility for Mac.
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Segment Anything in Medical Images
Official repository for "AM-RADIO: Reduce All Domains Into One"
Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral]
The simplest, fastest repository for training/finetuning small-sized VLMs.
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
The official repo of Qwen-VL (Tongyi Qianwen-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
[CVPR 2025] The code for paper ''Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding''.
Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models