Compositionality
- Number of tasks: 7
AROCocoOrder
Compositionality evaluation matching images to their captions. Each caption has four hard negatives created by order permutations.
Dataset: gowitheflow/ARO-COCO-order
• License: mit
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created |
Citation
@inproceedings{yuksekgonul2023and,
author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
booktitle = {The Eleventh International Conference on Learning Representations},
title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
year = {2023},
}
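As a rough illustration of what a `text_acc`-style score measures for a task like this one, the sketch below assumes each sample is an image embedding plus five caption embeddings (the true caption and its four order-permuted hard negatives), and counts a sample as correct when the true caption is the most similar to the image. The function names, the cosine scoring, and the convention that the true caption sits at index 0 are assumptions made for this example, not the benchmark's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def text_accuracy(samples):
    """Fraction of samples where the true caption scores highest.

    Each sample is (image_vec, caption_vecs). By assumption for this
    sketch, caption_vecs[0] is the true caption and the remaining
    entries are hard negatives.
    """
    correct = 0
    for image_vec, caption_vecs in samples:
        scores = [cosine(image_vec, c) for c in caption_vecs]
        # index() returns the first maximum, so an exact tie between the
        # true caption and a later negative counts in favour of index 0.
        if scores.index(max(scores)) == 0:
            correct += 1
    return correct / len(samples)


# Toy 2-d embeddings: the first sample is ranked correctly, the second is not.
samples = [
    ([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [0.2, 1.0]]),
    ([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [1.0, 0.2]]),
]
print(text_accuracy(samples))
```

With real data, the toy vectors would be replaced by embeddings from the image and text encoders of the model under evaluation.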
AROFlickrOrder
Compositionality evaluation matching images to their captions. Each caption has four hard negatives created by order permutations.
Dataset: gowitheflow/ARO-Flickr-Order
• License: mit
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created |
Citation
@inproceedings{yuksekgonul2023and,
author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
booktitle = {The Eleventh International Conference on Learning Representations},
title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
year = {2023},
}
AROVisualAttribution
Compositionality evaluation matching images to their captions.
Dataset: gowitheflow/ARO-Visual-Attribution
• License: mit
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created |
Citation
@inproceedings{yuksekgonul2023and,
author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
booktitle = {The Eleventh International Conference on Learning Representations},
title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
year = {2023},
}
AROVisualRelation
Compositionality evaluation matching images to their captions.
Dataset: gowitheflow/ARO-Visual-Relation
• License: mit
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created |
Citation
@inproceedings{yuksekgonul2023and,
author = {Yuksekgonul, Mert and Bianchi, Federico and Kalluri, Pratyusha and Jurafsky, Dan and Zou, James},
booktitle = {The Eleventh International Conference on Learning Representations},
title = {When and why vision-language models behave like bags-of-words, and what to do about it?},
year = {2023},
}
ImageCoDe
Identify the correct image from a set of similar images based on a precise caption.
Dataset: JamieSJS/imagecode-multi
• License: cc-by-sa-4.0
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image, text to image (it2i) | image_acc | eng | Web, Written | derived | found |
Citation
@article{krojer2022image,
author = {Krojer, Benno and Adlakha, Vaibhav and Vineet, Vibhav and Goyal, Yash and Ponti, Edoardo and Reddy, Siva},
journal = {arXiv preprint arXiv:2203.15867},
title = {Image retrieval from contextual descriptions},
year = {2022},
}
SugarCrepe
Compositionality evaluation matching images to their captions.
Dataset: yjkimstats/SUGARCREPE_fmt
• License: mit
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | text_acc | eng | Encyclopaedic | expert-annotated | created |
Citation
@article{hsieh2024sugarcrepe,
author = {Hsieh, Cheng-Yu and Zhang, Jieyu and Ma, Zixian and Kembhavi, Aniruddha and Krishna, Ranjay},
journal = {Advances in neural information processing systems},
title = {Sugarcrepe: Fixing hackable benchmarks for vision-language compositionality},
volume = {36},
year = {2024},
}
Winoground
Compositionality evaluation matching images to their captions.
Dataset: facebook/winoground
• License: https://huggingface.co/datasets/facebook/winoground/blob/main/license_agreement.txt
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Social | expert-annotated | created |
Citation
@misc{thrush2022winogroundprobingvisionlanguage,
archiveprefix = {arXiv},
author = {Tristan Thrush and Ryan Jiang and Max Bartolo and Amanpreet Singh and Adina Williams and Douwe Kiela and Candace Ross},
eprint = {2204.03162},
primaryclass = {cs.CV},
title = {Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality},
url = {https://arxiv.org/abs/2204.03162},
year = {2022},
}