Lists (22)
awesome
backbones
captions
clustering
contrastive learning
diffusion_models
ego4d
few-shot
germany
learn
LLMs
long-tail
NCD
nlp
openset
resources
tech
time_transformers
transformers
video memory efficient
videos
work-in-progress
Stars
Official implementation of "Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks." CVPR 2025
A presentation/notebook that lays out my view on what makes PyTorch efficient, aimed at researchers in AI and other domains.
A presentation explaining how Einsum can be understood and implemented (a minimal sketch follows at the end of this list).
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
ConceptAttention: A method for interpreting multi-modal diffusion transformers.
[ECCV 2024] Isomorphic Pruning for Vision Models
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
MEt3R: Measuring Multi-View Consistency in Generated Images
Official PyTorch implementation for "Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer"
Easily compute CLIP embeddings and build a CLIP retrieval system with them
Open source platform for the machine learning lifecycle
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
A method to increase the speed and lower the memory footprint of existing vision transformers.
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Easily create large video datasets from video URLs
The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
Fast and memory-efficient exact attention
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Friends don't let friends make certain types of data visualization: what they are and why they are bad.
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Code and dataset for photorealistic Codec Avatars driven from audio
An open-source NLP research library, built on PyTorch.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
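For reference, a minimal sketch of the operation the Einsum presentation above is about. This is an illustrative NumPy example, not code taken from that repository: einsum expresses tensor contractions in index notation, summing over every index that does not appear in the output.

# Minimal einsum sketch (illustration only, not from the linked presentation).
import numpy as np

A = np.random.rand(2, 3, 4)   # batch of 2 matrices, each of shape (3, 4)
B = np.random.rand(2, 4, 5)   # batch of 2 matrices, each of shape (4, 5)

# Batched matrix multiply: contract over the shared index k.
C = np.einsum("bik,bkj->bij", A, B)

# Equivalent explicit loops, to show what the index notation means.
C_loop = np.zeros((2, 3, 5))
for b in range(2):
    for i in range(3):
        for j in range(5):
            for k in range(4):
                C_loop[b, i, j] += A[b, i, k] * B[b, k, j]

assert np.allclose(C, C_loop)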