CSE @ HKUST
- Hong Kong, China
- https://seanzhuh.github.io/
Stars
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
FlashMLA: Efficient MLA decoding kernels
[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
DetToolChain: A new prompting paradigm to unleash detection ability of MLLM
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[ECCV 2024] The official repo for "Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing"
Personal Implementation of the paper: Nuvo: Neural UV Mapping for Unruly 3D Representations
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2024.
[ECCV 2024] Tokenize Anything via Prompting
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
The repository for Hyperbolic Representation Learning for Computer Vision, ECCV 2022
Curated list of awesome works on unsupervised object localization in 2D images.
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI