haihuangcode / Starred · GitHub

haihuangcode

8 followers · 5 following

Zhejiang University

Achievements

Lists (9)

clip-adapter-like

LLM

MLLM

OOD Detection

tts

vl-model

Image/Video Generation

Tools

Aesthetics

Stars

ChaofanTao / Autoregressive-Models-in-Vision-Survey

[TMLR 2025🔥] A survey for the autoregressive models in vision.

612 18 Updated May 23, 2025

JiuhaiChen / BLIP3o

Python 832 24 Updated May 23, 2025

kvablack / ddpo-pytorch

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support

Python 584 53 Updated Mar 22, 2024

yifan123 / flow_grpo

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 608 19 Updated May 20, 2025

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,239 262 Updated Jan 18, 2025

dvlab-research / VisionZip

Official repository for VisionZip (CVPR 2025)

Python 284 12 Updated Feb 27, 2025

JUNJIE99 / MLVU

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 199 1 Updated Mar 24, 2025

DAMO-NLP-SG / VideoLLaMA3

Frontier Multimodal Foundation Models for Image and Video Understanding

Jupyter Notebook 811 58 Updated May 19, 2025

lllyasviel / FramePack

Let's make video diffusion practical!

Python 13,505 1,164 Updated May 4, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 47,871 7,555 Updated May 23, 2025

vllm-project / vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Python 667 157 Updated May 23, 2025

djghosh13 / geneval

GenEval: An object-focused framework for evaluating text-to-image alignment

HTML 279 17 Updated Mar 3, 2025

Tencent-Hunyuan / HunyuanVideo-I2V

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Python 1,434 129 Updated May 20, 2025

FoundationVision / Infinity

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 1,282 63 Updated Apr 24, 2025

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 8,153 619 Updated Apr 27, 2025

OpenGVLab / VideoChat-Flash

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Python 415 11 Updated May 22, 2025

rongyaofang / GoT

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 243 9 Updated Apr 30, 2025

songrise / MLLM4Art

[arxiv 2024] MLLMs for art

14 1 Updated Jan 16, 2025

AIGText / Glyph-ByT5

[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…

Jupyter Notebook 578 30 Updated Jul 13, 2024

open-mmlab / mmcv

OpenMMLab Computer Vision Foundation

Python 6,130 1,688 Updated Apr 25, 2025

KaiyangZhou / mixstyle-release

Domain Generalization with MixStyle (ICLR'21)

Python 299 41 Updated Oct 6, 2022

lzhxmu / VTW

Code release for VTW (AAAI 2025) Oral

Python 39 1 Updated Jan 18, 2025

NVlabs / Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,164 270 Updated May 20, 2025

facebookresearch / mixup-cifar10

mixup: Beyond Empirical Risk Minimization

Python 1,179 225 Updated Oct 12, 2021

QwenLM / Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,506 1,421 Updated May 22, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,569 758 Updated May 15, 2025

apple / ml-slowfast-llava

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 220 13 Updated Sep 16, 2024

ziplab / LongVLM

Python 98 8 Updated Jul 30, 2024

facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Python 1,051 71 Updated Jun 17, 2024

Tencent-Hunyuan / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 10,083 880 Updated May 21, 2025
