ZeroShotClassification¶

Number of tasks: 24
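
Every task listed here is an image-to-text (i2t) task scored by accuracy. As a rough illustration of what that usually means in a CLIP-style zero-shot setup, the sketch below embeds each class label as text, assigns each image the label whose embedding is most similar, and reports the fraction of correct predictions. It is a minimal sketch under stated assumptions, not the benchmark's evaluation code: the function names (`cosine`, `zero_shot_predict`, `zero_shot_accuracy`) are hypothetical, and the two-dimensional embeddings are toy values standing in for real model outputs.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb: Sequence[float],
                      label_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the label embedding most similar to the image."""
    scores = [cosine(image_emb, label_emb) for label_emb in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def zero_shot_accuracy(image_embs: Sequence[Sequence[float]],
                       label_embs: Sequence[Sequence[float]],
                       gold: Sequence[int]) -> float:
    """Fraction of images whose predicted label index matches the gold index."""
    correct = sum(
        zero_shot_predict(image_emb, label_embs) == target
        for image_emb, target in zip(image_embs, gold)
    )
    return correct / len(gold)


# Toy data (assumed values): two class-label embeddings and three image embeddings.
label_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
gold = [0, 1, 1]

accuracy = zero_shot_accuracy(image_embs, label_embs, gold)
print(f"accuracy = {accuracy:.3f}")
```

In this toy run the third image lies closer to the first label than to its gold label, so two of the three predictions are correct.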

BirdsnapZeroShot¶

Classifying bird images from 500 species.

Dataset: mteb/birdsnap • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@inproceedings{Berg_2014_CVPR,
  author = {Berg, Thomas and Liu, Jiongxin and Woo Lee, Seung and Alexander, Michelle L. and Jacobs, David W. and Belhumeur, Peter N.},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  title = {Birdsnap: Large-scale Fine-grained Visual Categorization of Birds},
  year = {2014},
}

CIFAR100ZeroShot¶

Classifying images from 100 classes.

Dataset: mteb/cifar100 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Web	derived	created

Citation

@techreport{Krizhevsky09learningmultiple,
  author = {Alex Krizhevsky},
  institution = {},
  title = {Learning multiple layers of features from tiny images},
  year = {2009},
}

CIFAR10ZeroShot¶

Classifying images from 10 classes.

Dataset: mteb/cifar10 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Web	derived	created

Citation

@techreport{Krizhevsky09learningmultiple,
  author = {Alex Krizhevsky},
  institution = {},
  title = {Learning multiple layers of features from tiny images},
  year = {2009},
}

CLEVRCountZeroShot¶

Counting the number of objects in CLEVR scenes.

Dataset: mteb/wds_vtab-clevr_count_all • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Constructed	human-annotated	created

Citation

@inproceedings{Johnson_2017_CVPR,
  author = {Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens and Fei-Fei, Li and Lawrence Zitnick, C. and Girshick, Ross},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {July},
  title = {CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
  year = {2017},
}

CLEVRZeroShot¶

Identifying the distance to the closest object in CLEVR scenes.

Dataset: mteb/wds_vtab-clevr_closest_object_distance • License: cc-by-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Constructed	human-annotated	created

Citation

@inproceedings{Johnson_2017_CVPR,
  author = {Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens and Fei-Fei, Li and Lawrence Zitnick, C. and Girshick, Ross},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {July},
  title = {CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
  year = {2017},
}

Caltech101ZeroShot¶

Classifying images of 101 widely varied objects.

Dataset: mteb/Caltech101 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@inproceedings{1384978,
  author = {Li Fei-Fei and Fergus, R. and Perona, P.},
  booktitle = {2004 Conference on Computer Vision and Pattern Recognition Workshop},
  doi = {10.1109/CVPR.2004.383},
  keywords = {Bayesian methods;Testing;Humans;Maximum likelihood estimation;Assembly;Shape;Machine vision;Image recognition;Parameter estimation;Image databases},
  pages = {178-178},
  title = {Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories},
  year = {2004},
}

Country211ZeroShot¶

Classifying images by the country in which they were taken, across 211 countries.

Dataset: mteb/wds_country211 • License: cc-by-sa-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Scene	derived	created

Citation

@article{radford2021learning,
  author = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  journal = {arXiv preprint arXiv:2103.00020},
  title = {Learning Transferable Visual Models From Natural Language Supervision},
  year = {2021},
}

DTDZeroShot¶

Describable Textures Dataset in 47 categories.

Dataset: mteb/dtd • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@inproceedings{cimpoi14describing,
  author = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and A. Vedaldi},
  booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},
  title = {Describing Textures in the Wild},
  year = {2014},
}

EuroSATZeroShot¶

Classifying satellite images.

Dataset: mteb/eurosat-rgb • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@article{8736785,
  author = {Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian},
  doi = {10.1109/JSTARS.2019.2918242},
  journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  keywords = {Satellites;Earth;Remote sensing;Machine learning;Spatial resolution;Feature extraction;Benchmark testing;Dataset;deep convolutional neural network;deep learning;earth observation;land cover classification;land use classification;machine learning;remote sensing;satellite image classification;satellite images},
  number = {7},
  pages = {2217-2226},
  title = {EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification},
  volume = {12},
  year = {2019},
}

FER2013ZeroShot¶

Classifying facial emotions.

Dataset: mteb/wds_fer2013 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@misc{goodfellow2015explainingharnessingadversarialexamples,
  archiveprefix = {arXiv},
  author = {Ian J. Goodfellow and Jonathon Shlens and Christian Szegedy},
  eprint = {1412.6572},
  primaryclass = {stat.ML},
  title = {Explaining and Harnessing Adversarial Examples},
  url = {https://arxiv.org/abs/1412.6572},
  year = {2015},
}

FGVCAircraftZeroShot¶

Classifying aircraft images from 41 manufacturers and 102 variants.

Dataset: mteb/FGVCAircraftZeroShot • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@misc{maji2013finegrainedvisualclassificationaircraft,
  archiveprefix = {arXiv},
  author = {Subhransu Maji and Esa Rahtu and Juho Kannala and Matthew Blaschko and Andrea Vedaldi},
  eprint = {1306.5151},
  primaryclass = {cs.CV},
  title = {Fine-Grained Visual Classification of Aircraft},
  url = {https://arxiv.org/abs/1306.5151},
  year = {2013},
}

Food101ZeroShot¶

Classifying food.

Dataset: mteb/food101 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Web	derived	created

Citation

@inproceedings{bossard14,
  author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},
  booktitle = {European Conference on Computer Vision},
  title = {Food-101 -- Mining Discriminative Components with Random Forests},
  year = {2014},
}

GTSRBZeroShot¶

The German Traffic Sign Recognition Benchmark (GTSRB) is a multi-class classification dataset for traffic signs. It consists of more than 50,000 traffic sign images across 43 classes with unbalanced class frequencies.

Dataset: mteb/wds_gtsrb • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Scene	derived	created

Citation

@inproceedings{6033395,
  author = {Stallkamp, Johannes and Schlipsing, Marc and Salmen, Jan and Igel, Christian},
  booktitle = {The 2011 International Joint Conference on Neural Networks},
  doi = {10.1109/IJCNN.2011.6033395},
  keywords = {Humans;Training;Image color analysis;Benchmark testing;Lead;Histograms;Image resolution},
  pages = {1453-1460},
  title = {The German Traffic Sign Recognition Benchmark: A multi-class classification competition},
  year = {2011},
}

Imagenet1kZeroShot¶

ImageNet, a large-scale ontology of images built upon the backbone of the WordNet structure.

Dataset: mteb/wds_imagenet1k • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Scene	human-annotated	created

Citation

@inproceedings{deng2009imagenet,
  author = {Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Kai Li and Li Fei-Fei},
  booktitle = {2009 IEEE Conference on Computer Vision and Pattern Recognition},
  doi = {10.1109/CVPR.2009.5206848},
  keywords = {Large-scale systems;Image databases;Explosions;Internet;Robustness;Information retrieval;Image retrieval;Multimedia databases;Ontologies;Spine},
  pages = {248-255},
  title = {ImageNet: A large-scale hierarchical image database},
  year = {2009},
}

MNISTZeroShot¶

Classifying handwritten digits.

Dataset: mteb/mnist • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@article{lecun2010mnist,
  author = {LeCun, Yann and Cortes, Corinna and Burges, CJ},
  journal = {ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
  title = {MNIST handwritten digit database},
  volume = {2},
  year = {2010},
}

OxfordPetsZeroShot¶

Classifying pet images by cat or dog breed.

Dataset: mteb/OxfordPets • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@misc{maji2013finegrainedvisualclassificationaircraft,
  archiveprefix = {arXiv},
  author = {Subhransu Maji and Esa Rahtu and Juho Kannala and Matthew Blaschko and Andrea Vedaldi},
  eprint = {1306.5151},
  primaryclass = {cs.CV},
  title = {Fine-Grained Visual Classification of Aircraft},
  url = {https://arxiv.org/abs/1306.5151},
  year = {2013},
}

PatchCamelyonZeroShot¶

Histopathology diagnosis classification dataset.

Dataset: mteb/wds_vtab-pcam • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Medical	derived	created

Citation

@inproceedings{10.1007/978-3-030-00934-2_24,
  address = {Cham},
  author = {Veeling, Bastiaan S. and Linmans, Jasper and Winkens, Jim and Cohen, Taco and Welling, Max},
  booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2018},
  editor = {Frangi, Alejandro F. and Schnabel, Julia A. and Davatzikos, Christos and Alberola-L{\'o}pez, Carlos and Fichtinger, Gabor},
  isbn = {978-3-030-00934-2},
  pages = {210--218},
  publisher = {Springer International Publishing},
  title = {Rotation Equivariant CNNs for Digital Pathology},
  year = {2018},
}

RESISC45ZeroShot¶

Remote Sensing Image Scene Classification by Northwestern Polytechnical University (NWPU).

Dataset: mteb/resisc45 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@article{7891544,
  author = {Cheng, Gong and Han, Junwei and Lu, Xiaoqiang},
  doi = {10.1109/JPROC.2017.2675998},
  journal = {Proceedings of the IEEE},
  keywords = {Remote sensing;Benchmark testing;Spatial resolution;Social network services;Satellites;Image analysis;Machine learning;Unsupervised learning;Classification;Benchmark data set;deep learning;handcrafted features;remote sensing image;scene classification;unsupervised feature learning},
  number = {10},
  pages = {1865-1883},
  title = {Remote Sensing Image Scene Classification: Benchmark and State of the Art},
  volume = {105},
  year = {2017},
}

RenderedSST2¶

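Sentiment classification (positive or negative) of Stanford Sentiment Treebank (SST-2) movie-review sentences rendered as images.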

Dataset: mteb/wds_renderedsst2 • License: mit • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Reviews	human-annotated	created

STL10ZeroShot¶

Classifying 96x96 images from 10 classes.

Dataset: mteb/stl10 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@inproceedings{pmlr-v15-coates11a,
  address = {Fort Lauderdale, FL, USA},
  author = {Coates, Adam and Ng, Andrew and Lee, Honglak},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  editor = {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  month = {11--13 Apr},
  pages = {215--223},
  pdf = {http://proceedings.mlr.press/v15/coates11a/coates11a.pdf},
  publisher = {PMLR},
  series = {Proceedings of Machine Learning Research},
  title = {An Analysis of Single-Layer Networks in Unsupervised Feature Learning},
  url = {https://proceedings.mlr.press/v15/coates11a.html},
  volume = {15},
  year = {2011},
}

SUN397ZeroShot¶

Large scale scene recognition in 397 categories.

Dataset: mteb/sun397 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Encyclopaedic	derived	created

Citation

@inproceedings{5539970,
  author = {Xiao, Jianxiong and Hays, James and Ehinger, Krista A. and Oliva, Aude and Torralba, Antonio},
  booktitle = {2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
  doi = {10.1109/CVPR.2010.5539970},
  pages = {3485-3492},
  title = {SUN database: Large-scale scene recognition from abbey to zoo},
  year = {2010},
}

SciMMIR¶

SciMMIR, a benchmark for scientific multi-modal information retrieval.

Dataset: mteb/SciMMIR • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Academic	human-annotated	created

Citation

@misc{wu2024scimmirbenchmarkingscientificmultimodal,
  archiveprefix = {arXiv},
  author = {Siwei Wu and Yizhi Li and Kang Zhu and Ge Zhang and Yiming Liang and Kaijing Ma and Chenghao Xiao and Haoran Zhang and Bohao Yang and Wenhu Chen and Wenhao Huang and Noura Al Moubayed and Jie Fu and Chenghua Lin},
  eprint = {2401.13478},
  primaryclass = {cs.IR},
  title = {SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval},
  url = {https://arxiv.org/abs/2401.13478},
  year = {2024},
}

StanfordCarsZeroShot¶

Classifying car images from 96 makes.

Dataset: mteb/StanfordCars • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Scene	derived	created

Citation

@inproceedings{Krause2013CollectingAL,
  author = {Jonathan Krause and Jia Deng and Michael Stark and Li Fei-Fei},
  title = {Collecting a Large-scale Dataset of Fine-grained Cars},
  url = {https://api.semanticscholar.org/CorpusID:16632981},
  year = {2013},
}

UCF101ZeroShot¶

UCF101 is an action recognition dataset of realistic action videos collected from YouTube, with 101 action categories. This version of the dataset does not contain videos but images saved frame by frame. Train and test splits are generated from the authors' first version of the train/test list.

Dataset: mteb/ucf101 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
image to text (i2t)	accuracy	eng	Scene	derived	created

Citation

@misc{soomro2012ucf101dataset101human,
  archiveprefix = {arXiv},
  author = {Khurram Soomro and Amir Roshan Zamir and Mubarak Shah},
  eprint = {1212.0402},
  primaryclass = {cs.CV},
  title = {UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
  url = {https://arxiv.org/abs/1212.0402},
  year = {2012},
}