Stars
🚀 Lightning-fast computer vision models. Fine-tune SOTA models with just a few lines of code. Ready for cloud ☁️ and edge 📱 deployment.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
Code for “Pretrained Language Models as Visual Planners for Human Assistance”
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
Object Detection component developed for the DARPA AIDA program.
S3D Text-Video model trained on HowTo100M using MIL-NCE
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
[CVPR 2021 Oral] End-to-End Video Instance Segmentation with Transformers
Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)
Official PyTorch implementation for the AAAI 2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)
A new framework for open-vocabulary object detection, based on maskrcnn-benchmark
This repo covers the implementation of "Labelling unlabelled videos from scratch with multi-modal self-supervision", which learns clusters from multi-modal data in a self-supervised way.
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
A PyTorch implementation of the Transformer model in "Attention is All You Need".
PyTorch code for our CVPR 2018 paper "Neural Baby Talk"
Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
Image Captions Generation with Spatial and Channel-wise Attention
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
Information Retrieval (IR, 2015) Final Project