[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 8,219 504 Updated May 18, 2025

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Python 1,023 74 Updated Nov 18, 2024

YuanJianhao508 / RAG-Driver

A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving

Python 98 9 Updated Oct 7, 2024

lxtGH / OMG-Seg

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,300 47 Updated May 30, 2025

YingqingHe / ScaleCrafter

[ICLR 2024 Spotlight] Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.

Python 512 27 Updated Mar 7, 2024

TencentARC / MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Python 1,432 76 Updated Feb 19, 2025

xvjiarui / IMProv

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Python 57 6 Updated Sep 26, 2024

ChenyangQiQi / FateZero

[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"

Jupyter Notebook 1,150 107 Updated Aug 14, 2023

BAAI-DCAI / Training-Data-Synthesis

[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Python 79 3 Updated Dec 9, 2023

zhaoyue-zephyrus / AVION

[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"

Python 130 10 Updated Jul 31, 2024

InternLM / InternLM

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

Python 6,938 488 Updated Feb 7, 2025

runjiali-rl / Oxford_HIC

😂😂😂Official Implementation for ICCV 2023 paper: OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

Python 9 1 Updated Feb 23, 2024

OpenRobotLab / UniHSI

[ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Python 216 11 Updated Apr 13, 2025

OpenRobotLab / PointLLM

[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds

Python 824 41 Updated May 22, 2025

bytedance / fc-clip

[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Python 321 28 Updated Feb 5, 2024

bytedance / kmax-deeplab

A PyTorch re-implementation, based on Detectron2, of the ECCV 2022 paper "k-means Mask Transformer".

Python 75 10 Updated Jul 28, 2023

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 16,458 1,507 Updated Sep 5, 2024

Shuyang (Kevin) Sun kevin-ssy

Stars

hwjiang1510 / RayZer

SandAI-org / MAGI-1

facebookresearch / perception_models

deepseek-ai / DeepSeek-R1

FoundationVision / Infinity

deepseek-ai / DeepSeek-V3

Genesis-Embodied-AI / Genesis

rasbt / LLMs-from-scratch

frank-xwang / UnSAM

FoundationVision / LlamaGen

kevin-ssy / CLIP_as_RNN

lllyasviel / Omost

Yujun-Shi / DragDiffusion

FoundationVision / VAR