geyuying (Yuying Ge) / Starred · GitHub
Python 10 Updated Jun 23, 2025

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

Python 83 1 Updated Jun 5, 2025
Python 33 Updated Jun 4, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 580 29 Updated May 28, 2025

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Python 52 Updated Jun 3, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,198 246 Updated Jun 12, 2025

Let's make video diffusion practical!

Python 14,646 1,315 Updated May 4, 2025

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Python 10 Updated Apr 1, 2025
Python 84 1 Updated Jun 23, 2025

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Python 318 27 Updated Apr 9, 2025

A post-training method to enhance CLIP's fine-grained visual representations with generative models.

Python 53 Updated Mar 27, 2025

[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Python 90 2 Updated Mar 20, 2025

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 348 12 Updated Feb 23, 2025

SALMONN family: A suite of advanced multi-modal LLMs

1,266 101 Updated Jun 20, 2025

VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE

Python 334 7 Updated Jan 19, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 8,405 647 Updated May 29, 2025

Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Java 1,976 177 Updated Mar 18, 2024

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Python 1,465 151 Updated Jun 10, 2024

A suite of image and video neural tokenizers

Jupyter Notebook 1,636 78 Updated Feb 11, 2025
Jupyter Notebook 25 1 Updated Apr 11, 2025
Python 93 7 Updated Nov 27, 2024

Inference script for Oasis 500M

Python 1,850 160 Updated Nov 8, 2024

Latent Motion Token as the Bridging Language for Robot Manipulation

Python 105 1 Updated May 11, 2025

Implement FVD in PyTorch

Python 9 1 Updated Apr 12, 2024
Python 2,116 157 Updated Nov 8, 2024

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

588 31 Updated Jun 21, 2025

Next-Token Prediction is All You Need

Python 2,152 81 Updated Mar 17, 2025

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 11,594 1,123 Updated Jun 17, 2025

[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python 312 16 Updated Feb 27, 2025