Microsoft Research Asia
Wan: Open and Advanced Large-Scale Video Generative Models
A generative world for general-purpose robotics & embodied AI learning.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Open-Sora: Democratizing Efficient Video Production for All
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Generative Models by Stability AI
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
A latent text-to-image diffusion model
A unified 3D Transformer Pipeline for visual synthesis
Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
M3U8-Downloader: supports multithreaded downloads, resumable transfers, and downloading and caching of encrypted videos.
Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
My best practice of training large dataset using PyTorch.
Exploring Self-attention for Image Recognition, CVPR 2020.
A Chinese translation of the 2017 edition of The Flask Mega-Tutorial from Miguel Grinberg's blog https://blog.miguelgrinberg.com
neural module network on the GQA dataset
BLOCK (AAAI 2019), with a multimodal fusion library for deep learning models
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
A PyTorch reimplementation of "Bilinear Attention Network", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to count object", "Bottom-up top-down" for Visu…
Code for NIPS 2018 paper, "Chain of Reasoning for Visual Question Answering"