Compositionality

  • Number of tasks: 7

AROCocoOrder

Compositionality evaluation of images to their captions. Each caption has four hard negatives created by order permutations.

Dataset: gowitheflow/ARO-COCO-order • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created
Citation
@inproceedings{yuksekgonul2023and,
  author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
  booktitle = {The Eleventh International Conference on Learning Representations},
  title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
  year = {2023},
}
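
The `text_acc` score for a task with hard negatives can be illustrated as follows. This is a minimal sketch, not the benchmark's actual implementation: it assumes each image comes with one true caption plus its hard negatives, that image and captions are already embedded as vectors, and that the model's choice is the caption with the highest cosine similarity. The names `cosine`, `text_accuracy`, and `toy_examples` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def text_accuracy(examples):
    """Fraction of images whose true caption outscores every hard negative.

    Each example is (image_embedding, caption_embeddings), where
    caption_embeddings[0] is the true caption (an assumption of this sketch)
    and the remaining entries are the hard negatives.
    """
    correct = 0
    for image_emb, caption_embs in examples:
        scores = [cosine(image_emb, c) for c in caption_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += best == 0
    return correct / len(examples)


# Toy data: one true caption followed by four hard negatives per image.
toy_examples = [
    # The true caption is the closest to the image.
    ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]),
    # A hard negative is closer to the image than the true caption.
    ([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0], [-1.0, 0.0]]),
]

print(text_accuracy(toy_examples))
```

With five candidates per image, a model that ignores word order entirely would be expected to score near chance, which is what makes order permutations a useful probe.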

AROFlickrOrder

Compositionality evaluation of images to their captions. Each caption has four hard negatives created by order permutations.

Dataset: gowitheflow/ARO-Flickr-Order • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created
Citation
@inproceedings{yuksekgonul2023and,
  author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
  booktitle = {The Eleventh International Conference on Learning Representations},
  title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
  year = {2023},
}

AROVisualAttribution

Compositionality evaluation of images to their captions. Tests whether a model attaches attributes to the correct objects.

Dataset: gowitheflow/ARO-Visual-Attribution • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created
Citation
@inproceedings{yuksekgonul2023and,
  author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
  booktitle = {The Eleventh International Conference on Learning Representations},
  title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
  year = {2023},
}

AROVisualRelation

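Compositionality evaluation of images to their captions. Tests whether a model distinguishes a relation between two objects from the same relation with the objects swapped.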

Dataset: gowitheflow/ARO-Visual-Relation • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created
Citation
@inproceedings{yuksekgonul2023and,
  author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
  booktitle = {The Eleventh International Conference on Learning Representations},
  title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
  year = {2023},
}

ImageCoDe

Identify the correct image from a set of similar images based on a precise caption.

Dataset: JamieSJS/imagecode-multi • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image, text to image (it2i) | image_acc | eng | Web, Written | derived | found
Citation
@article{krojer2022image,
  author = {Krojer, Benno and Adlakha, Vaibhav and Vineet, Vibhav and Goyal, Yash and Ponti, Edoardo and Reddy, Siva},
  journal = {arXiv preprint arXiv:2203.15867},
  title = {Image retrieval from contextual descriptions},
  year = {2022},
}
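
The `image_acc` score for this kind of retrieval can be sketched as below. This is an illustration under stated assumptions, not the benchmark's code: it assumes the query and the candidate images are already embedded as unit-length vectors, so a plain dot product serves as the similarity, and that each query knows the index of its target image. The names `dot`, `image_accuracy`, and `toy_queries` are hypothetical.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))


def image_accuracy(queries):
    """Fraction of queries whose top-scoring candidate is the target image.

    Each query is (query_embedding, candidate_image_embeddings, target_index).
    """
    hits = 0
    for query_emb, candidate_embs, target_index in queries:
        scores = [dot(query_emb, c) for c in candidate_embs]
        predicted = scores.index(max(scores))
        hits += predicted == target_index
    return hits / len(queries)


# Toy data: three queries, each with three similar candidate images.
toy_queries = [
    ([1.0, 0.0], [[0.6, 0.8], [0.8, 0.6], [0.0, 1.0]], 1),  # target ranked first
    ([0.0, 1.0], [[0.6, 0.8], [0.8, 0.6], [1.0, 0.0]], 0),  # target ranked first
    ([1.0, 0.0], [[0.6, 0.8], [0.8, 0.6], [1.0, 0.0]], 1),  # a distractor wins
]

print(image_accuracy(toy_queries))
```

Because the candidates are deliberately similar, small differences in similarity decide each query, so the caption has to be precise for the target to rank first.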

SugarCrepe

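Compositionality evaluation of images to their captions. Hard negative captions are built by replacing, swapping, or adding concepts in the original caption.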

Dataset: yjkimstats/SUGARCREPE_fmt • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created
Citation
@article{hsieh2024sugarcrepe,
  author = {Hsieh, Cheng-Yu and Zhang, Jieyu and Ma, Zixian and Kembhavi, Aniruddha and Krishna, Ranjay},
  journal = {Advances in Neural Information Processing Systems},
  title = {SugarCrepe: Fixing hackable benchmarks for vision-language compositionality},
  volume = {36},
  year = {2024},
}

Winoground

Compositionality evaluation of images to their captions. Each example pairs two images with two captions that contain the same words in a different order.

Dataset: facebook/winoground • License: https://huggingface.co/datasets/facebook/winoground/blob/main/license_agreement.txt • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
image to text (i2t) | accuracy | eng | Social | expert-annotated | created
Citation
@misc{thrush2022winogroundprobingvisionlanguage,
  archiveprefix = {arXiv},
  author = {Tristan Thrush and Ryan Jiang and Max Bartolo and Amanpreet Singh and Adina Williams and Douwe Kiela and Candace Ross},
  eprint = {2204.03162},
  primaryclass = {cs.CV},
  title = {Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality},
  url = {https://arxiv.org/abs/2204.03162},
  year = {2022},
}
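
The Winoground paper defines three per-example scores over the four caption-image similarities. The sketch below follows those definitions; which of them the `accuracy` column above corresponds to is not stated on this page, so treat the mapping as an open assumption. The function name `winoground_scores` and the matrix layout `s[i][j]` (similarity of caption `i` to image `j`) are hypothetical choices for illustration.

```python
def winoground_scores(s):
    """Per-example Winoground scores from a 2x2 similarity matrix.

    s[i][j] is the similarity between caption i and image j, where
    caption 0 belongs to image 0 and caption 1 belongs to image 1.
    """
    # Text score: for each image, the matching caption beats the other caption.
    text_ok = s[0][0] > s[1][0] and s[1][1] > s[0][1]
    # Image score: for each caption, the matching image beats the other image.
    image_ok = s[0][0] > s[0][1] and s[1][1] > s[1][0]
    # Group score: both conditions hold at once.
    return {"text": text_ok, "image": image_ok, "group": text_ok and image_ok}


# All four comparisons go the right way.
all_correct = winoground_scores([[0.8, 0.6], [0.7, 0.9]])

# Caption 0 scores higher with the wrong image, so the image score fails.
image_fails = winoground_scores([[0.8, 0.85], [0.7, 0.9]])

print(all_correct)
print(image_fails)
```

Averaging each boolean over all examples gives the dataset-level text, image, and group scores.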