
VisionCentricQA

  • Number of tasks: 6

BLINKIT2IMultiChoice

Given a query image and a retrieval instruction, select the matching image from a set of candidate images.

Dataset: JamieSJS/blink-it2i-multi • License: not specified

  • Task category: image, text to image (it2i)
  • Score: accuracy
  • Languages: eng
  • Domains: Encyclopaedic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{fu2024blink,
  author = {Fu, Xingyu and Hu, Yushi and Li, Bangzheng and Feng, Yu and Wang, Haoyu and Lin, Xudong and Roth, Dan and Smith, Noah A. and Ma, Wei-Chiu and Krishna, Ranjay},
  journal = {arXiv preprint arXiv:2404.12390},
  title = {{BLINK}: Multimodal large language models can see but not perceive},
  year = {2024},
}

BLINKIT2TMultiChoice

Given a query image and a retrieval instruction, select the correct text answer from a set of candidates.

Dataset: JamieSJS/blink-it2t-multi • License: not specified

  • Task category: image, text to text (it2t)
  • Score: accuracy
  • Languages: eng
  • Domains: Encyclopaedic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{fu2024blink,
  author = {Fu, Xingyu and Hu, Yushi and Li, Bangzheng and Feng, Yu and Wang, Haoyu and Lin, Xudong and Roth, Dan and Smith, Noah A. and Ma, Wei-Chiu and Krishna, Ranjay},
  journal = {arXiv preprint arXiv:2404.12390},
  title = {{BLINK}: Multimodal large language models can see but not perceive},
  year = {2024},
}
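
Both BLINK tasks follow the standard mteb evaluation interface, so in principle they can be run with the mteb Python package. The sketch below is a minimal example under stated assumptions, not a confirmed recipe: it assumes the task names above are registered in mteb, and the model identifier is only a hypothetical example of a model available in mteb's registry.

import mteb

# Minimal sketch: evaluate an image-text embedding model on both BLINK
# multiple-choice tasks. Task names are taken from this page; the model
# name is a hypothetical example, not confirmed by this page.
tasks = mteb.get_tasks(tasks=["BLINKIT2IMultiChoice", "BLINKIT2TMultiChoice"])
model = mteb.get_model("openai/clip-vit-base-patch32")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")

for result in results:
    # Each result reports accuracy, the main score for these tasks.
    print(result.task_name, result.scores)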

CVBenchCount

Count the number of objects in the image.

Dataset: nyu-visionx/CV-Bench • License: MIT

  • Task category: image, text to text (it2t)
  • Score: accuracy
  • Languages: eng
  • Domains: Academic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{tong2024cambrian,
  author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
  journal = {arXiv preprint arXiv:2406.16860},
  title = {Cambrian-1: A fully open, vision-centric exploration of multimodal {LLMs}},
  year = {2024},
}

CVBenchDepth

Judge the relative depth of objects in the image via similarity matching.

Dataset: nyu-visionx/CV-Bench • License: MIT

  • Task category: image, text to text (it2t)
  • Score: accuracy
  • Languages: eng
  • Domains: Academic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{tong2024cambrian,
  author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
  journal = {arXiv preprint arXiv:2406.16860},
  title = {Cambrian-1: A fully open, vision-centric exploration of multimodal {LLMs}},
  year = {2024},
}

CVBenchDistance

Judge the relative distance of objects in the image via similarity matching.

Dataset: nyu-visionx/CV-Bench • License: MIT

  • Task category: image, text to text (it2t)
  • Score: accuracy
  • Languages: eng
  • Domains: Academic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{tong2024cambrian,
  author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
  journal = {arXiv preprint arXiv:2406.16860},
  title = {Cambrian-1: A fully open, vision-centric exploration of multimodal {LLMs}},
  year = {2024},
}

CVBenchRelation

Determine the spatial relation between objects in the image.

Dataset: nyu-visionx/CV-Bench • License: MIT

  • Task category: image, text to text (it2t)
  • Score: accuracy
  • Languages: eng
  • Domains: Academic
  • Annotations creators: derived
  • Sample creation: found
Citation
@article{tong2024cambrian,
  author = {Tong, Shengbang and Brown, Ellis and Wu, Penghao and Woo, Sanghyun and Middepogu, Manoj and Akula, Sai Charitha and Yang, Jihan and Yang, Shusheng and Iyer, Adithya and Pan, Xichen and others},
  journal = {arXiv preprint arXiv:2406.16860},
  title = {Cambrian-1: A fully open, vision-centric exploration of multimodal {LLMs}},
  year = {2024},
}
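
All four CV-Bench tasks are drawn from the same nyu-visionx/CV-Bench dataset, so the raw examples can be inspected directly with the Hugging Face datasets library. The sketch below assumes the dataset loads under its default configuration with a test split; the field names mentioned in the comments are assumptions about the schema, not confirmed by this page.

from datasets import load_dataset

# Minimal sketch: load CV-Bench and inspect one example. Assumes the
# default configuration exposes a "test" split.
ds = load_dataset("nyu-visionx/CV-Bench", split="test")

example = ds[0]
print(example.keys())  # check the real schema before relying on field names
# Assumed fields: "task" (Count / Depth / Distance / Relation), "question",
# "choices", and "answer"; adjust to whatever keys() actually reports.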