
Robust Multimodal Contrastive Learning

Introduction

PyTorch Lightning code for the paper "RMCL: Robust Multimodal Contrastive Learning". Motivated by the potential synergy between robust optimization and multimodal contrastive learning, RMCL reinforces the joint representation of an image-text pair through robust contrastive optimization. Slides of our talk are available here.


The main figure
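
To make the robust contrastive objective concrete, the sketch below pairs a symmetric image-text InfoNCE loss with a small inner PGD loop on the text embeddings. This is our own illustration of the general recipe, not the exact RMCL objective: the epsilon/pgd_steps values and the choice to perturb only the text side are assumptions.

import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE: matched image-text pairs are positives,
    # every other pair in the batch is a negative.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def robust_info_nce(img_emb, txt_emb, epsilon=1e-2, pgd_steps=3):
    # Inner maximization: find a small perturbation of the text
    # embeddings that increases the contrastive loss.
    img_d, txt_d = img_emb.detach(), txt_emb.detach()  # values only; keeps the encoder graph intact
    delta = torch.zeros_like(txt_d, requires_grad=True)
    for _ in range(pgd_steps):
        loss = info_nce(img_d, txt_d + delta)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += epsilon * grad.sign()   # gradient-ascent step
            delta.clamp_(-epsilon, epsilon)  # stay inside the L-inf ball
    # Outer loss uses the live embeddings so gradients reach the encoders.
    return info_nce(img_emb, txt_emb + delta.detach())

In a LightningModule training_step, this worst-case loss would simply replace the clean contrastive loss.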

Install

pip install -r requirements.txt
pip install -e .

Download Pretrained Weights

We provide five pretrained weights (a loading sketch follows the list):

  1. ViLT-B/32 Pretrained with MLM+ITM for 200k steps on GCC+SBU+COCO+VG (ViLT-B/32 200k) link
  2. ViLT-B/32 200k finetuned on VQAv2 link
  3. ViLT-B/32 200k finetuned on NLVR2 link
  4. ViLT-B/32 200k finetuned on COCO IR/TR link
  5. ViLT-B/32 200k finetuned on F30K IR/TR link
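
A minimal loading sketch, assuming the checkpoints are standard PyTorch Lightning .ckpt files with the weights stored under "state_dict" (the usual Lightning layout); the file name below is a placeholder.

import torch

ckpt = torch.load("vilt_pretrained_200k.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"]  # parameter name -> tensor
print(len(state_dict), "tensors; first key:", next(iter(state_dict)))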

Download Counter-Fitting Word Embeddings

Synonym selection for the geometry-based attack is computed from cosine similarity scores between word pairs in the counter-fitting word embeddings (link).
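
As an illustration, the sketch below performs that selection step, assuming the embeddings ship as a plain-text file with one word and its vector per line; the file name, top-k, and similarity threshold are our own placeholder choices, not the repository's exact settings.

import numpy as np

words, vecs = [], []
with open("counter-fitted-vectors.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        words.append(parts[0])
        vecs.append(np.array(parts[1:], dtype=np.float32))

emb = np.stack(vecs)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit norm, so dot product = cosine
word2idx = {w: i for i, w in enumerate(words)}

def synonyms(word, k=8, threshold=0.5):
    # Nearest neighbors of `word` by cosine similarity, keeping only
    # pairs above the threshold and skipping the word itself.
    sims = emb @ emb[word2idx[word]]
    order = np.argsort(-sims)
    return [words[i] for i in order if words[i] != word and sims[i] >= threshold][:k]

Calling synonyms("happy"), for instance, returns up to k counter-fitted neighbors above the similarity threshold.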

Dataset Preparation (Pretraining/Finetuning)

See DATA.md

Train New Models (Pretraining/Finetuning)

See TRAIN.md

Evaluation

See EVAL.md

Citation

If you use any part of this code or the pretrained weights, please cite our [paper].

Contact for Issues

Stan Furrer
Zhao Meng
