Stars
\infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
3D skeleton visualization for the NTU RGB+D dataset.
💦 Make any website your Mac desktop wallpaper
Official repository for VisionZip (CVPR 2025)
Neumann Network with Recursive Kernels for Single Image Defocus Deblurring, CVPR 2023
[CVPR 2022--Oral] Restormer: Efficient Transformer for High-Resolution Image Restoration. SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.
Reference github repository for the paper "Defocus Deblurring Using Dual-Pixel Data". We introduce a deep neural network (DNN) architecture that uses the dual-pixel (DP) sub-aperture views to reduc…
A pretrained PyTorch classifier for the Google Speech Commands dataset that is very quick to set up and use.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
[CVPR2025] Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
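🔥 Free, public-interest ChatGPT API (GPT-4 API included). Connects directly with no proxy required and is accessed using the standard OpenAI API key format. Works with projects such as ChatGPT-next-web, ChatGPT-Midjourney, Lobe-chat, Botgem, FastGPT, and Immersive Translate.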
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Google's Conceptual Captions Dataset translated into Korean
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
Custom AVA dataset: a multi-person video dataset annotation method for spatio-temporal actions
Implementation of ViViT: A Video Vision Transformer - Zipping Coding Challenge
Implementation of ViViT: A Video Vision Transformer
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Materials for the Hugging Face Diffusion Models Course
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
End-to-End Object Detection with Transformers
YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3 through YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite