rainuew

Stars

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Python · 1,465 stars · 217 forks · Updated Apr 3, 2024

Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022

Python · 105 stars · 19 forks · Updated Jul 4, 2022

Official GitHub repository for the paper "Robustifying Human-Robot Collaboration through a Hierarchical and Multimodal Framework."

Python · 4 stars · Updated Nov 26, 2024

Official code for the paper "Understanding Co-speech Gestures in-the-wild"

Python · 11 stars · Updated Mar 31, 2025

Deep Learning-Based Multimodal Intention Retrieval for Human-Robot Collaboration, Accepted to ICRSA '24

Jupyter Notebook · 1 star · Updated Dec 23, 2024