Stars
[CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
[AAAI 25] Official Implementation for "E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment"
[CVPR2022] PyTorch re-implementation of Prompt Distribution Learning
[NeurIPS2023] Neural-Logic Human-Object Interaction Detection
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Disentangled Pre-training for Human-Object Interaction Detection
PyTorch implementation of Sinusoidal Representation Networks (SIREN)
Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/2102.02611
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Open source implementation of "Vision Transformers Need Registers"
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Extend BoxDiff to SDXL (SDXL-based layout-to-image generation)
Code for the ICLR 2025 paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
GLM-4 series: Open Multilingual Multimodal Chat LMs
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
A collection of resources on controllable generation with text-to-image diffusion models.
Accepted as a [NeurIPS 2024] Spotlight Presentation Paper
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding