ZeroShotClassification¶
- Number of tasks: 24
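Every task below is listed as image to text (i2t) and scored by accuracy. As a rough illustration only, the sketch below shows one common way such a score can be computed: embed each image and one text prompt per class, pick the class whose prompt is most similar by cosine similarity, and compare with the gold label. The function names and the toy 2-D embeddings are hypothetical and are not taken from the benchmark's implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def zero_shot_predict(image_embs, class_embs):
    """Return, for each image, the index of the most similar class prompt."""
    classes = [l2_normalize(c) for c in class_embs]
    predictions = []
    for emb in image_embs:
        img = l2_normalize(emb)
        scores = [sum(a * b for a, b in zip(img, cls)) for cls in classes]
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions

def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Toy, hand-made embeddings (hypothetical values, not produced by any real model).
class_embs = [[1.0, 0.0], [0.0, 1.0]]   # one text-prompt embedding per class
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
labels = [0, 1, 1]

preds = zero_shot_predict(image_embs, class_embs)
print(preds, accuracy(preds, labels))
```

In this toy case the third image sits closer to the class 0 prompt than to its gold class 1, so two of three predictions are correct.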
BirdsnapZeroShot¶
Classifying bird images from 500 species.
Dataset: isaacchung/birdsnap
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@inproceedings{Berg_2014_CVPR,
author = {Berg, Thomas and Liu, Jiongxin and Woo Lee, Seung and Alexander, Michelle L. and Jacobs, David W. and Belhumeur, Peter N.},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
title = {Birdsnap: Large-scale Fine-grained Visual Categorization of Birds},
year = {2014},
}
CIFAR100ZeroShot¶
Classifying images from 100 classes.
Dataset: uoft-cs/cifar100
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Web | derived | created |
Citation
@techreport{Krizhevsky09learningmultiple,
author = {Alex Krizhevsky},
institution = {},
title = {Learning multiple layers of features from tiny images},
year = {2009},
}
CIFAR10ZeroShot¶
Classifying images from 10 classes.
Dataset: uoft-cs/cifar10
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Web | derived | created |
Citation
@techreport{Krizhevsky09learningmultiple,
author = {Alex Krizhevsky},
institution = {},
title = {Learning multiple layers of features from tiny images},
year = {2009},
}
CLEVRCountZeroShot¶
CLEVR object counting task.
Dataset: clip-benchmark/wds_vtab-clevr_count_all
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Constructed | human-annotated | created |
Citation
@inproceedings{Johnson_2017_CVPR,
author = {Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens and Fei-Fei, Li and Lawrence Zitnick, C. and Girshick, Ross},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
title = {CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
year = {2017},
}
CLEVRZeroShot¶
CLEVR closest object distance identification task.
Dataset: clip-benchmark/wds_vtab-clevr_closest_object_distance
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Constructed | human-annotated | created |
Citation
@inproceedings{Johnson_2017_CVPR,
author = {Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens and Fei-Fei, Li and Lawrence Zitnick, C. and Girshick, Ross},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
title = {CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
year = {2017},
}
Caltech101ZeroShot¶
Classifying images of 101 widely varied objects.
Dataset: mteb/Caltech101
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@inproceedings{1384978,
author = {Li Fei-Fei and Fergus, R. and Perona, P.},
booktitle = {2004 Conference on Computer Vision and Pattern Recognition Workshop},
doi = {10.1109/CVPR.2004.383},
keywords = {Bayesian methods;Testing;Humans;Maximum likelihood estimation;Assembly;Shape;Machine vision;Image recognition;Parameter estimation;Image databases},
number = {},
pages = {178-178},
title = {Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories},
volume = {},
year = {2004},
}
Country211ZeroShot¶
Classifying images of 211 countries.
Dataset: clip-benchmark/wds_country211
• License: cc-by-sa-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Scene | derived | created |
Citation
@article{radford2021learning,
author = {Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
journal = {arXiv preprint arXiv:2103.00020},
title = {Learning Transferable Visual Models From Natural Language Supervision},
year = {2021},
}
DTDZeroShot¶
Describable Textures Dataset in 47 categories.
Dataset: tanganke/dtd
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@inproceedings{cimpoi14describing,
author = {M. Cimpoi and S. Maji and I. Kokkinos and S. Mohamed and A. Vedaldi},
booktitle = {Proceedings of the {IEEE} Conf. on Computer Vision and Pattern Recognition ({CVPR})},
title = {Describing Textures in the Wild},
year = {2014},
}
EuroSATZeroShot¶
Classifying satellite images.
Dataset: timm/eurosat-rgb
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@article{8736785,
author = {Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian},
doi = {10.1109/JSTARS.2019.2918242},
journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
keywords = {Satellites;Earth;Remote sensing;Machine learning;Spatial resolution;Feature extraction;Benchmark testing;Dataset;deep convolutional neural network;deep learning;earth observation;land cover classification;land use classification;machine learning;remote sensing;satellite image classification;satellite images},
number = {7},
pages = {2217-2226},
title = {EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification},
volume = {12},
year = {2019},
}
FER2013ZeroShot¶
Classifying facial emotions.
Dataset: clip-benchmark/wds_fer2013
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@misc{goodfellow2015explainingharnessingadversarialexamples,
archiveprefix = {arXiv},
author = {Ian J. Goodfellow and Jonathon Shlens and Christian Szegedy},
eprint = {1412.6572},
primaryclass = {stat.ML},
title = {Explaining and Harnessing Adversarial Examples},
url = {https://arxiv.org/abs/1412.6572},
year = {2015},
}
FGVCAircraftZeroShot¶
Classifying aircraft images from 41 manufacturers and 102 variants.
Dataset: HuggingFaceM4/FGVC-Aircraft
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@misc{maji2013finegrainedvisualclassificationaircraft,
archiveprefix = {arXiv},
author = {Subhransu Maji and Esa Rahtu and Juho Kannala and Matthew Blaschko and Andrea Vedaldi},
eprint = {1306.5151},
primaryclass = {cs.CV},
title = {Fine-Grained Visual Classification of Aircraft},
url = {https://arxiv.org/abs/1306.5151},
year = {2013},
}
Food101ZeroShot¶
Classifying food images from 101 categories.
Dataset: ethz/food101
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Web | derived | created |
Citation
@inproceedings{bossard14,
author = {Bossard, Lukas and Guillaumin, Matthieu and Van Gool, Luc},
booktitle = {European Conference on Computer Vision},
title = {Food-101 -- Mining Discriminative Components with Random Forests},
year = {2014},
}
GTSRBZeroShot¶
The German Traffic Sign Recognition Benchmark (GTSRB) is a multi-class classification dataset for traffic signs. It consists of more than 50,000 traffic sign images across 43 classes with unbalanced class frequencies.
Dataset: clip-benchmark/wds_gtsrb
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Scene | derived | created |
Citation
@inproceedings{6033395,
author = {Stallkamp, Johannes and Schlipsing, Marc and Salmen, Jan and Igel, Christian},
booktitle = {The 2011 International Joint Conference on Neural Networks},
doi = {10.1109/IJCNN.2011.6033395},
keywords = {Humans;Training;Image color analysis;Benchmark testing;Lead;Histograms;Image resolution},
number = {},
pages = {1453-1460},
title = {The German Traffic Sign Recognition Benchmark: A multi-class classification competition},
volume = {},
year = {2011},
}
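Since the GTSRB classes are unbalanced, plain accuracy is dominated by the frequent classes. The sketch below contrasts overall accuracy with a macro average of per-class accuracies on hypothetical toy labels. The macro average is shown only for comparison; the table above lists accuracy as the task's score, and the function names here are assumptions for illustration.

```python
from collections import defaultdict

def overall_accuracy(preds, labels):
    """Fraction of all samples predicted correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def macro_accuracy(preds, labels):
    """Mean of per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += int(p == y)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical toy labels: class 0 is nine times as frequent as class 1.
labels = [0] * 9 + [1]
preds = [0] * 10  # a degenerate predictor that always answers class 0

print(overall_accuracy(preds, labels))  # 0.9
print(macro_accuracy(preds, labels))    # 0.5
```

The always-class-0 predictor looks strong on overall accuracy yet never recognises the rare class, which is why per-class figures can be a useful companion when class frequencies differ this much.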
Imagenet1kZeroShot¶
ImageNet, a large-scale ontology of images built upon the backbone of the WordNet structure.
Dataset: clip-benchmark/wds_imagenet1k
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Scene | human-annotated | created |
Citation
@article{deng2009imagenet,
author = {Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
journal = {2009 IEEE Conference on Computer Vision and Pattern Recognition},
organization = {IEEE},
pages = {248--255},
title = {ImageNet: A large-scale hierarchical image database},
year = {2009},
}
MNISTZeroShot¶
Classifying handwritten digits.
Dataset: ylecun/mnist
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@article{lecun2010mnist,
author = {LeCun, Yann and Cortes, Corinna and Burges, CJ},
journal = {ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
title = {MNIST handwritten digit database},
volume = {2},
year = {2010},
}
OxfordPetsZeroShot¶
Classifying animal images.
Dataset: isaacchung/OxfordPets
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@misc{maji2013finegrainedvisualclassificationaircraft,
archiveprefix = {arXiv},
author = {Subhransu Maji and Esa Rahtu and Juho Kannala and Matthew Blaschko and Andrea Vedaldi},
eprint = {1306.5151},
primaryclass = {cs.CV},
title = {Fine-Grained Visual Classification of Aircraft},
url = {https://arxiv.org/abs/1306.5151},
year = {2013},
}
PatchCamelyonZeroShot¶
Histopathology diagnosis classification dataset.
Dataset: clip-benchmark/wds_vtab-pcam
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Medical | derived | created |
Citation
@inproceedings{10.1007/978-3-030-00934-2_24,
abstract = {We propose a new model for digital pathology segmentation, based on the observation that histopathology images are inherently symmetric under rotation and reflection. Utilizing recent findings on rotation equivariant CNNs, the proposed model leverages these symmetries in a principled manner. We present a visual analysis showing improved stability on predictions, and demonstrate that exploiting rotation equivariance significantly improves tumor detection performance on a challenging lymph node metastases dataset. We further present a novel derived dataset to enable principled comparison of machine learning models, in combination with an initial benchmark. Through this dataset, the task of histopathology diagnosis becomes accessible as a challenging benchmark for fundamental machine learning research.},
address = {Cham},
author = {Veeling, Bastiaan S.
and Linmans, Jasper
and Winkens, Jim
and Cohen, Taco
and Welling, Max},
booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2018},
editor = {Frangi, Alejandro F.
and Schnabel, Julia A.
and Davatzikos, Christos
and Alberola-L{\'o}pez, Carlos
and Fichtinger, Gabor},
isbn = {978-3-030-00934-2},
pages = {210--218},
publisher = {Springer International Publishing},
title = {Rotation Equivariant CNNs for Digital Pathology},
year = {2018},
}
RESISC45ZeroShot¶
Remote Sensing Image Scene Classification by Northwestern Polytechnical University (NWPU).
Dataset: timm/resisc45
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@article{7891544,
author = {Cheng, Gong and Han, Junwei and Lu, Xiaoqiang},
doi = {10.1109/JPROC.2017.2675998},
journal = {Proceedings of the IEEE},
keywords = {Remote sensing;Benchmark testing;Spatial resolution;Social network services;Satellites;Image analysis;Machine learning;Unsupervised learning;Classification;Benchmark data set;deep learning;handcrafted features;remote sensing image;scene classification;unsupervised feature learning},
number = {10},
pages = {1865-1883},
title = {Remote Sensing Image Scene Classification: Benchmark and State of the Art},
volume = {105},
year = {2017},
}
RenderedSST2¶
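RenderedSST2: sentences from the Stanford Sentiment Treebank (SST-2) rendered as images, to be classified as positive or negative sentiment.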
Dataset: clip-benchmark/wds_renderedsst2
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Reviews | human-annotated | created |
STL10ZeroShot¶
Classifying 96x96 images from 10 classes.
Dataset: tanganke/stl10
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@inproceedings{pmlr-v15-coates11a,
address = {Fort Lauderdale, FL, USA},
author = {Coates, Adam and Ng, Andrew and Lee, Honglak},
booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
editor = {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
month = {11--13 Apr},
pages = {215--223},
pdf = {http://proceedings.mlr.press/v15/coates11a/coates11a.pdf},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
title = {An Analysis of Single-Layer Networks in Unsupervised Feature Learning},
url = {https://proceedings.mlr.press/v15/coates11a.html},
volume = {15},
year = {2011},
}
SUN397ZeroShot¶
Large scale scene recognition in 397 categories.
Dataset: dpdl-benchmark/sun397
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Encyclopaedic | derived | created |
Citation
@inproceedings{5539970,
author = {Xiao, Jianxiong and Hays, James and Ehinger, Krista A. and Oliva, Aude and Torralba, Antonio},
booktitle = {2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
doi = {10.1109/CVPR.2010.5539970},
number = {},
pages = {3485-3492},
title = {SUN database: Large-scale scene recognition from abbey to zoo},
volume = {},
year = {2010},
}
SciMMIR¶
SciMMIR, a benchmark for scientific multi-modal information retrieval.
Dataset: m-a-p/SciMMIR
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Academic | human-annotated | created |
Citation
@misc{wu2024scimmirbenchmarkingscientificmultimodal,
archiveprefix = {arXiv},
author = {Siwei Wu and Yizhi Li and Kang Zhu and Ge Zhang and Yiming Liang and Kaijing Ma and Chenghao Xiao and Haoran Zhang and Bohao Yang and Wenhu Chen and Wenhao Huang and Noura Al Moubayed and Jie Fu and Chenghua Lin},
eprint = {2401.13478},
primaryclass = {cs.IR},
title = {SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval},
url = {https://arxiv.org/abs/2401.13478},
year = {2024},
}
StanfordCarsZeroShot¶
Classifying car images from 96 makes.
Dataset: isaacchung/StanfordCars
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Scene | derived | created |
Citation
@inproceedings{Krause2013CollectingAL,
author = {Jonathan Krause and Jia Deng and Michael Stark and Li Fei-Fei},
title = {Collecting a Large-scale Dataset of Fine-grained Cars},
url = {https://api.semanticscholar.org/CorpusID:16632981},
year = {2013},
}
UCF101ZeroShot¶
UCF101 is an action recognition dataset of realistic action videos collected from YouTube, with 101 action categories. This version of the dataset does not contain videos but images saved frame by frame. Train and test splits are generated from the authors' first version of the train/test list.
Dataset: flwrlabs/ucf101
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
image to text (i2t) | accuracy | eng | Scene | derived | created |
Citation
@misc{soomro2012ucf101dataset101human,
archiveprefix = {arXiv},
author = {Khurram Soomro and Amir Roshan Zamir and Mubarak Shah},
eprint = {1212.0402},
primaryclass = {cs.CV},
title = {UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
url = {https://arxiv.org/abs/1212.0402},
year = {2012},
}