Stars
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022
This is the official GitHub repo for the paper "Robustifying Human-Robot Collaboration through a Hierarchical and Multimodal Framework."
Official code for the paper "Understanding Co-speech Gestures in-the-wild"
Deep Learning-Based Multimodal Intention Retrieval for Human-Robot Collaboration, accepted to ICRSA '24