Alibaba
- Hangzhou
- https://daizuozhuo.github.io
Stars
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Simple script to parallelize download and extract files for SA-1B Dataset.
VideoSys: An easy and efficient system for video generation
LAVIS - A One-stop Library for Language-Vision Intelligence
Fine-Grained Open Domain Image Animation with Motion Guidance
cvpr2024/cvpr2023/cvpr2022/cvpr2021/cvpr2020/cvpr2019/cvpr2018/cvpr2017 papers, code, explanations, and livestream collection, compiled by the 极市 team
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Finetune ModelScope's Text To Video model using Diffusers 🧨
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Aligning pretrained language models with instruction data generated by themselves.
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
Robust Speech Recognition via Large-Scale Weak Supervision
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
METER: A Multimodal End-to-end TransformER Framework
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)