# VisionCentricQA
- Number of tasks: 6
## BLINKIT2IMultiChoice

Retrieve the correct image given a query image and a specific retrieval instruction.

Dataset: JamieSJS/blink-it2i-multi • License: not specified

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to image (it2i) | accuracy | eng | Encyclopaedic | derived | found |
Citation
@article{fu2024blink,
author = {Fu, Xingyu and Hu, Yushi and Li, Bangzheng and Feng, Yu and Wang, Haoyu and Lin, Xudong and Roth, Dan and Smith, Noah A and Ma, Wei-Chiu and Krishna, Ranjay},
journal = {arXiv preprint arXiv:2404.12390},
title = {Blink: Multimodal large language models can see but not perceive},
year = {2024},
}
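The task names on this page double as identifiers in the `mteb` library, where the MIEB image tasks live. Below is a minimal sketch of running this task, assuming your installed `mteb` version registers `BLINKIT2IMultiChoice` and knows the example model name shown here:

```python
import mteb

# Look up the task by name; assumes "BLINKIT2IMultiChoice" is
# registered under this name in the installed mteb version.
tasks = mteb.get_tasks(tasks=["BLINKIT2IMultiChoice"])

# Example model name only -- substitute any image-text embedding
# model that mteb's model registry supports.
model = mteb.get_model("openai/clip-vit-base-patch32")

# Run the evaluation; the score reported for this task is accuracy.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
print(results)
```

The same pattern applies to every task on this page; only the task name changes.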
## BLINKIT2TMultiChoice

Retrieve the correct text answer given a query image and a specific retrieval instruction.

Dataset: JamieSJS/blink-it2t-multi • License: not specified

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to text (it2t) | accuracy | eng | Encyclopaedic | derived | found |
Citation
@article{fu2024blink,
author = {Fu, Xingyu and Hu, Yushi and Li, Bangzheng and Feng, Yu and Wang, Haoyu and Lin, Xudong and Roth, Dan and Smith, Noah A and Ma, Wei-Chiu and Krishna, Ranjay},
journal = {arXiv preprint arXiv:2404.12390},
title = {Blink: Multimodal large language models can see but not perceive},
year = {2024},
}
## CVBenchCount

Count the number of objects in the image.

Dataset: nyu-visionx/CV-Bench • License: MIT

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to text (it2t) | accuracy | eng | Academic | derived | found |
Citation
@article{tong2024cambrian,
author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
journal = {arXiv preprint arXiv:2406.16860},
title = {Cambrian-1: A fully open, vision-centric exploration of multimodal llms},
year = {2024},
}
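For a quick look at the underlying data, CV-Bench can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming the split is named `test` and that a `task` column distinguishes the four sub-tasks (the column names here are assumptions, so check the dataset card):

```python
from datasets import load_dataset

# All four CV-Bench sub-tasks ship in one dataset; filter by the
# (assumed) "task" column to isolate the counting examples.
ds = load_dataset("nyu-visionx/CV-Bench", split="test")
counting = ds.filter(lambda ex: ex["task"] == "Count")
print(len(counting))
print(counting[0]["question"])  # "question" column name is also assumed
```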
## CVBenchDepth

Judge the relative depth of objects in the image via similarity matching.

Dataset: nyu-visionx/CV-Bench • License: MIT

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to text (it2t) | accuracy | eng | Academic | derived | found |
Citation
@article{tong2024cambrian,
author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
journal = {arXiv preprint arXiv:2406.16860},
title = {Cambrian-1: A fully open, vision-centric exploration of multimodal llms},
year = {2024},
}
## CVBenchDistance

Judge the relative distance of objects in the image via similarity matching.

Dataset: nyu-visionx/CV-Bench • License: MIT

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to text (it2t) | accuracy | eng | Academic | derived | found |
Citation
@article{tong2024cambrian,
author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
journal = {arXiv preprint arXiv:2406.16860},
title = {Cambrian-1: A fully open, vision-centric exploration of multimodal llms},
year = {2024},
}
## CVBenchRelation

Determine the spatial relation between objects in the image.

Dataset: nyu-visionx/CV-Bench • License: MIT

| Task category | Score | Languages | Domains | Annotation creators | Sample creation |
|---|---|---|---|---|---|
| image, text to text (it2t) | accuracy | eng | Academic | derived | found |
Citation
@article{tong2024cambrian,
author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
journal = {arXiv preprint arXiv:2406.16860},
title = {Cambrian-1: A fully open, vision-centric exploration of multimodal llms},
year = {2024},
}