- Google DeepMind
- Grenoble, France
- e-bug.github.io
- @ebugliarello
Stars
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Code for ALBEF: a new vision-language pre-training method
Paper List for Contrastive Learning for Natural Language Processing
PyTorch implementation of Set Transformer
Code and data for ImageCoDe, a contextual vision-and-language benchmark
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
[NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Awesome Transformers (self-attention) in Computer Vision
Paper bank for Self-Supervised Learning
PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Code for the CoNLL 2019 paper "Compositional Generalization in Image Captioning" by Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte and Desmond Elliott
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".