City University of Hong Kong
- Hong Kong
- http://yuhaoliu7456.github.io/
Starred repositories
Official code for the CVPR 2025 paper "Navigation World Models".
A general fine-tuning kit geared toward diffusion models.
[CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation
Ongoing research training transformer models at scale
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
The ultimate training toolkit for fine-tuning diffusion models
Set of auxiliary tools to use with image and video generation libraries. Mainly created to be used with diffusers.
Python tools for rendering, viewing and generating metric 3D depth videos. Tools for recovering and exporting camera pose and 3D geometry to popular formats as well as tools for projecting depthvid…
Code for MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data (CVPR 2025)
Wan: Open and Advanced Large-Scale Video Generative Models
Pannellum is a lightweight, free, and open source panorama viewer for the web.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR 2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official inference repo for FLUX.1 models
[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.
Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation. Includes a video editing benchmark codebase.
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible