Available Benchmarks¶
BEIR¶
BEIR is a heterogeneous benchmark containing diverse IR tasks. It also provides a common, easy-to-use framework for evaluating NLP-based retrieval models on those tasks.
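As a quick illustration of that framework, the minimal sketch below runs the BEIR tasks through the mteb package. The benchmark name string and the model id are assumptions chosen for illustration; any SentenceTransformers-compatible embedding model can be substituted.

```python
import mteb

# Look up the benchmark in mteb's registry; "BEIR" is assumed to be
# the registered name matching this page.
benchmark = mteb.get_benchmark("BEIR")

# Example model choice (an assumption); any SentenceTransformers-
# compatible model id can be used instead.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Run all 15 retrieval tasks and write the scores to disk.
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results/beir")
```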
Tasks
name | type | modalities | languages |
---|---|---|---|
TRECCOVID | Retrieval | text | eng |
NFCorpus | Retrieval | text | eng |
NQ | Retrieval | text | eng |
HotpotQA | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
Touche2020 | Retrieval | text | eng |
CQADupstackRetrieval | Retrieval | text | eng |
QuoraRetrieval | Retrieval | text | eng |
DBPedia | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
FEVER | Retrieval | text | eng |
ClimateFEVER | Retrieval | text | eng |
SciFact | Retrieval | text | eng |
MSMARCO | Retrieval | text | eng |
Citation
@article{thakur2021beir,
author = {Thakur, Nandan and Reimers, Nils and R{\"u}ckl{\'e}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
journal = {arXiv preprint arXiv:2104.08663},
title = {Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models},
year = {2021},
}
BEIR-NL¶
BEIR-NL is a Dutch adaptation of the publicly available BEIR benchmark, created through automated translation.
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna-NL | Retrieval | text | nld |
CQADupstack-NL | Retrieval | text | nld |
FEVER-NL | Retrieval | text | nld |
NQ-NL | Retrieval | text | nld |
Touche2020-NL | Retrieval | text | nld |
FiQA2018-NL | Retrieval | text | nld |
Quora-NL | Retrieval | text | nld |
HotpotQA-NL | Retrieval | text | nld |
SCIDOCS-NL | Retrieval | text | nld |
ClimateFEVER-NL | Retrieval | text | nld |
mMARCO-NL | Retrieval | text | nld |
SciFact-NL | Retrieval | text | nld |
DBPedia-NL | Retrieval | text | nld |
NFCorpus-NL | Retrieval | text | nld |
TRECCOVID-NL | Retrieval | text | nld |
Citation
@misc{banar2024beirnlzeroshotinformationretrieval,
archiveprefix = {arXiv},
author = {Nikolay Banar and Ehsan Lotfi and Walter Daelemans},
eprint = {2412.08329},
primaryclass = {cs.CL},
title = {BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language},
url = {https://arxiv.org/abs/2412.08329},
year = {2024},
}
BRIGHT¶
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.
Tasks
name | type | modalities | languages |
---|---|---|---|
BrightRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BRIGHT(long)¶
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.
This is the long version of the benchmark, which keeps only the longer documents.
Tasks
name | type | modalities | languages |
---|---|---|---|
BrightLongRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BuiltBench(eng)¶
"Built-Bench" is an ongoing effort aimed at evaluating text embedding models in the context of built asset management, spanning over various dicsiplines such as architeture, engineering, constrcution, and operations management of the built environment.
Tasks
name | type | modalities | languages |
---|---|---|---|
BuiltBenchClusteringP2P | Clustering | text | eng |
BuiltBenchClusteringS2S | Clustering | text | eng |
BuiltBenchRetrieval | Retrieval | text | eng |
BuiltBenchReranking | Reranking | text | eng |
Citation
@article{shahinmoghadam2024benchmarking,
author = {Shahinmoghadam, Mehrzad and Motamedi, Ali},
journal = {arXiv preprint arXiv:2411.12056},
title = {Benchmarking pre-trained text embedding models in aligning built asset information},
year = {2024},
}
ChemTEB¶
ChemTEB evaluates the performance of text embedding models on chemical domain data.
Tasks
Citation
@article{kasmaee2024chemteb,
author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
journal = {arXiv preprint arXiv:2412.00532},
title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
year = {2024},
}
CoIR¶
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Tasks
name | type | modalities | languages |
---|---|---|---|
AppsRetrieval | Retrieval | text | eng, python |
CodeFeedbackMT | Retrieval | text | eng |
CodeFeedbackST | Retrieval | text | eng |
CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeTransOceanContest | Retrieval | text | c++, python |
CodeTransOceanDL | Retrieval | text | python |
CosQA | Retrieval | text | eng, python |
COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
StackOverflowQA | Retrieval | text | eng |
SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@misc{li2024coircomprehensivebenchmarkcode,
archiveprefix = {arXiv},
author = {Xiangyang Li and Kuicai Dong and Yi Quan Lee and Wei Xia and Yichun Yin and Hao Zhang and Yong Liu and Yasheng Wang and Ruiming Tang},
eprint = {2407.02883},
primaryclass = {cs.IR},
title = {CoIR: A Comprehensive Benchmark for Code Information Retrieval Models},
url = {https://arxiv.org/abs/2407.02883},
year = {2024},
}
CodeRAG¶
A benchmark for evaluating code retrieval augmented generation, testing models' ability to retrieve relevant programming solutions, tutorials and documentation.
Tasks
name | type | modalities | languages |
---|---|---|---|
CodeRAGLibraryDocumentationSolutions | Reranking | text | python |
CodeRAGOnlineTutorials | Reranking | text | python |
CodeRAGProgrammingSolutions | Reranking | text | python |
CodeRAGStackoverflowPosts | Reranking | text | python |
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
archiveprefix = {arXiv},
author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
eprint = {2406.14497},
primaryclass = {cs.SE},
title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
url = {https://arxiv.org/abs/2406.14497},
year = {2024},
}
Encodechka¶
A benchmark for evaluating text embedding models on Russian data.
Tasks
name | type | modalities | languages |
---|---|---|---|
RUParaPhraserSTS | STS | text | rus |
SentiRuEval2016 | Classification | text | rus |
RuToxicOKMLCUPClassification | Classification | text | rus |
InappropriatenessClassificationv2 | Classification | text | rus |
RuNLUIntentClassification | Classification | text | rus |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{dale_encodechka,
author = {Dale, David},
editor = {habr.com},
month = {June},
note = {[Online; posted 12-June-2022]},
title = {Russian rating of sentence encoders},
url = {https://habr.com/ru/articles/669674/},
year = {2022},
}
FollowIR¶
Retrieval with instructions is the task of finding relevant documents for a query that is accompanied by detailed instructions.
Tasks
name | type | modalities | languages |
---|---|---|---|
Robust04InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Core17InstructionRetrieval | InstructionReranking | text | eng |
Citation
@misc{weller2024followir,
archiveprefix = {arXiv},
author = {Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
eprint = {2403.15246},
primaryclass = {cs.IR},
title = {FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions},
year = {2024},
}
JinaVDR¶
A multilingual, domain-diverse, and layout-rich document retrieval benchmark.
Tasks
name | type | modalities | languages |
---|---|---|---|
JinaVDRMedicalPrescriptionsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRStanfordSlideRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDonutVQAISynHMPRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTableVQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRChartQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDROpenAINewsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDREuropeanaDeNewsRetrieval | DocumentUnderstanding | text, image | deu |
JinaVDREuropeanaEsNewsRetrieval | DocumentUnderstanding | text, image | spa |
JinaVDREuropeanaItScansRetrieval | DocumentUnderstanding | text, image | ita |
JinaVDREuropeanaNlLegalRetrieval | DocumentUnderstanding | text, image | nld |
JinaVDRHindiGovVQARetrieval | DocumentUnderstanding | text, image | hin |
JinaVDRAutomobileCatelogRetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRBeveragesCatalogueRetrieval | DocumentUnderstanding | text, image | rus |
JinaVDRRamensBenchmarkRetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRJDocQARetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRHungarianDocQARetrieval | DocumentUnderstanding | text, image | hun |
JinaVDRArabicChartQARetrieval | DocumentUnderstanding | text, image | ara |
JinaVDRArabicInfographicsVQARetrieval | DocumentUnderstanding | text, image | ara |
JinaVDROWIDChartsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRMPMQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRJina2024YearlyBookRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRWikimediaCommonsMapsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRPlotQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRMMTabRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRCharXivOCRRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRStudentEnrollmentSyntheticRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRGitHubReadmeRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (17) |
JinaVDRTweetStockSyntheticsRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
JinaVDRAirbnbSyntheticRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
JinaVDRShanghaiMasterPlanRetrieval | DocumentUnderstanding | text, image | zho |
JinaVDRWikimediaCommonsDocumentsRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (20) |
JinaVDREuropeanaFrNewsRetrieval | DocumentUnderstanding | text, image | fra |
JinaVDRDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAAI | DocumentUnderstanding | text, image | eng |
JinaVDRShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTatQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRInfovqaRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocVQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAGovReportRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTabFQuadRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRArxivQARetrieval | DocumentUnderstanding | text, image | eng |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
LongEmbed¶
LongEmbed is a benchmark designed to assess models' performance on long-context retrieval. The benchmark comprises two synthetic tasks and four carefully chosen real-world tasks, featuring documents of varying length and dispersed target information.
Tasks
name | type | modalities | languages |
---|---|---|---|
LEMBNarrativeQARetrieval | Retrieval | text | eng |
LEMBNeedleRetrieval | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
LEMBQMSumRetrieval | Retrieval | text | eng |
LEMBSummScreenFDRetrieval | Retrieval | text | eng |
LEMBWikimQARetrieval | Retrieval | text | eng |
Citation
@article{zhu2024longembed,
author = {Zhu, Dawei and Wang, Liang and Yang, Nan and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
journal = {arXiv preprint arXiv:2404.12096},
title = {LongEmbed: Extending Embedding Models for Long Context Retrieval},
year = {2024},
}
MIEB(Img)¶
An image-only version of MIEB(Multilingual), consisting of 49 tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
VOC2007 | ImageClassification | image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
STS17MultilingualVisualSTS | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
STSBenchmarkMultilingualVisualSTS | VisualSTS(multi) | image | cmn, deu, eng, fra, ita, ... (10) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(Multilingual)¶
MIEB(Multilingual) is a comprehensive image embedding benchmark spanning 10 task types, covering 130 tasks and a total of 39 languages. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes compositionality evaluation, document understanding, visual STS, and CV-centric tasks. This benchmark consists of MIEB(eng), 3 multilingual retrieval datasets, and the multilingual parts of VisualSTS-b and VisualSTS-17.
Tasks
name | type | modalities | languages |
---|---|---|---|
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
VOC2007 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
CLEVRZeroShot | ZeroShotClassification | text, image | eng |
CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
DTDZeroShot | ZeroShotClassification | image, text | eng |
EuroSATZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
GTSRBZeroShot | ZeroShotClassification | image | eng |
Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
MNISTZeroShot | ZeroShotClassification | image, text | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
RenderedSST2 | ZeroShotClassification | text, image | eng |
RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
STL10ZeroShot | ZeroShotClassification | image, text | eng |
SUN397ZeroShot | ZeroShotClassification | image, text | eng |
UCF101ZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
SugarCrepe | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
XFlickr30kCoT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | deu, eng, ind, jpn, rus, ... (8) |
XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (36) |
VisualSTS17Eng | VisualSTS(eng) | image | eng |
VisualSTS-b-Eng | VisualSTS(eng) | image | eng |
VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, fra, ita, nld, ... (9) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(eng)¶
MIEB(eng) is a comprehensive image embedding benchmark spanning 8 task types and covering 125 tasks. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes compositionality evaluation, document understanding, visual STS, and CV-centric tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
VOC2007 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
CLEVRZeroShot | ZeroShotClassification | text, image | eng |
CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
DTDZeroShot | ZeroShotClassification | image, text | eng |
EuroSATZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
GTSRBZeroShot | ZeroShotClassification | image | eng |
Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
MNISTZeroShot | ZeroShotClassification | image, text | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
RenderedSST2 | ZeroShotClassification | text, image | eng |
RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
STL10ZeroShot | ZeroShotClassification | image, text | eng |
SUN397ZeroShot | ZeroShotClassification | image, text | eng |
UCF101ZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
SugarCrepe | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
VisualSTS17Eng | VisualSTS(eng) | image | eng |
VisualSTS-b-Eng | VisualSTS(eng) | image | eng |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(lite)¶
MIEB(lite) is a comprehensive image embedding benchmark spanning 10 task types and covering 51 tasks. It is a lite version of MIEB(Multilingual), designed to run at a fraction of the cost while preserving the relative ranking of models.
Tasks
name | type | modalities | languages |
---|---|---|---|
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, fra, ita, nld, ... (9) |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (36) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MINERSBitextMining¶
Bitext mining texts from the MINERS benchmark, which is designed to evaluate the ability of multilingual LMs on semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts.
Tasks
name | type | modalities | languages |
---|---|---|---|
BUCC | BitextMining | text | cmn, deu, eng, fra, rus |
LinceMTBitextMining | BitextMining | text | eng, hin |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
PhincBitextMining | BitextMining | text | eng, hin |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
Citation
@article{winata2024miners,
author = {Winata, Genta Indra and Zhang, Ruochen and Adelani, David Ifeoluwa},
journal = {arXiv preprint arXiv:2406.07424},
title = {MINERS: Multilingual Language Models as Semantic Retrievers},
year = {2024},
}
MTEB(Code, v1)¶
A massive code embedding benchmark covering retrieval tasks in a myriad of popular programming languages.
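You can also evaluate on a subset of these tasks rather than the full benchmark. A hedged sketch follows, selecting two task names from the table below; the model id is again an arbitrary example.

```python
import mteb

# Select individual code retrieval tasks by name instead of
# running the whole benchmark; names are taken from the task table.
tasks = mteb.get_tasks(tasks=["CosQA", "StackOverflowQA"])

# Example model choice (an assumption); swap in any embedding model.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/code-subset")
```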
Tasks
name | type | modalities | languages |
---|---|---|---|
AppsRetrieval | Retrieval | text | eng, python |
CodeEditSearchRetrieval | Retrieval | text | c, c++, go, java, javascript, ... (13) |
CodeFeedbackMT | Retrieval | text | eng |
CodeFeedbackST | Retrieval | text | eng |
CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeTransOceanContest | Retrieval | text | c++, python |
CodeTransOceanDL | Retrieval | text | python |
CosQA | Retrieval | text | eng, python |
COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
StackOverflowQA | Retrieval | text | eng |
SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Europe, v1)¶
A regional geopolitical text embedding benchmark targeting embedding performance on European languages.
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicChatClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
ItaCaseholdClassification | Classification | text | ita |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
CBD | Classification | text | pol |
PolEmo2.0-OUT | Classification | text | pol |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
DalajClassification | Classification | text | swe |
WikiCitiesClustering | Clustering | text | eng |
RomaniBibleClustering | Clustering | text | rom |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
LegalQuAD | Retrieval | text | deu |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
AlloprofRetrieval | Retrieval | text | fra |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PSC | PairClassification | text | pol |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FinParaSTS | STS | text | fin |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
SICK-R-PL | STS | text | pol |
STSES | STS | text | spa |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Indic, v1)¶
A regional geopolitical text embedding benchmark targeting embedding performance on Indic languages.
Tasks
name | type | modalities | languages |
---|---|---|---|
IN22ConvBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
LinceMTBitextMining | BitextMining | text | eng, hin |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
BengaliSentimentAnalysis | Classification | text | ben |
GujaratiNewsClassification | Classification | text | guj |
HindiDiscourseClassification | Classification | text | hin |
SentimentAnalysisHindi | Classification | text | hin |
MalayalamNewsClassification | Classification | text | mal |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
NepaliNewsClassification | Classification | text | nep |
PunjabiNewsClassification | Classification | text | pan |
SanskritShlokasClassification | Classification | text | san |
UrduRomanSentimentClassification | Classification | text | urd |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
XQuADRetrieval | Retrieval | text | arb, deu, ell, eng, hin, ... (12) |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Law, v1)¶
A benchmark of retrieval tasks in the legal domain.
Tasks
name | type | modalities | languages |
---|---|---|---|
AILACasedocs | Retrieval | text | eng |
AILAStatutes | Retrieval | text | eng |
LegalSummarization | Retrieval | text | eng |
GerDaLIRSmall | Retrieval | text | deu |
LeCaRDv2 | Retrieval | text | zho |
LegalBenchConsumerContractsQA | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LegalQuAD | Retrieval | text | deu |
MTEB(Medical, v1)¶
A curated set of MTEB tasks designed to evaluate systems in the context of medical information retrieval.
Tasks
name | type | modalities | languages |
---|---|---|---|
CUREv1 | Retrieval | text | eng, fra, spa |
NFCorpus | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
TRECCOVID-PL | Retrieval | text | pol |
SciFact | Retrieval | text | eng |
SciFact-PL | Retrieval | text | pol |
MedicalQARetrieval | Retrieval | text | eng |
PublicHealthQA | Retrieval | text | ara, eng, fra, kor, rus, ... (8) |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringS2S.v2 | Clustering | text | eng |
CmedqaRetrieval | Retrieval | text | cmn |
CMedQAv2-reranking | Reranking | text | cmn |
MTEB(Multilingual, v1)¶
A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages. This benchmark has been replaced by MTEB(Multilingual, v2) because one of the datasets included in v1 (SNLHierarchicalClustering) was removed from the Hugging Face Hub.
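To see which tasks a benchmark would run before committing to a full evaluation, a small sketch follows; it assumes the successor's registered name is "MTEB(Multilingual, v2)" and that the Benchmark object exposes its tasks via a tasks attribute.

```python
import mteb

# Fetch the successor benchmark; the name string is an assumption
# based on the naming pattern used on this page.
benchmark = mteb.get_benchmark("MTEB(Multilingual, v2)")

# List what would be evaluated: each task carries metadata with
# its name and task type.
for task in benchmark.tasks:
    print(task.metadata.name, task.metadata.type)
```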
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetTopicSingleClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
FilipinoShopeeReviewsClassification | Classification | text | fil |
GujaratiNewsClassification | Classification | text | guj |
SentimentAnalysisHindi | Classification | text | hin |
IndonesianIdClickbaitClassification | Classification | text | ind |
ItaCaseholdClassification | Classification | text | ita |
KorSarcasmClassification | Classification | text | kor |
KurdishSentimentClassification | Classification | text | kur |
MacedonianTweetSentimentClassification | Classification | text | mkd |
AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
CataloniaTweetClassification | Classification | text | cat, spa |
CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
NepaliNewsClassification | Classification | text | nep |
OdiaNewsClassification | Classification | text | ory |
PunjabiNewsClassification | Classification | text | pan |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
SinhalaNewsClassification | Classification | text | sin |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
SiswatiNewsClassification | Classification | text | ssw |
SlovakMovieReviewSentimentClassification | Classification | text | svk |
SwahiliNewsClassification | Classification | text | swa |
DalajClassification | Classification | text | swe |
TswanaNewsClassification | Classification | text | tsn |
IsiZuluNewsClassification | Classification | text | zul |
WikiCitiesClustering | Clustering | text | eng |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
RomaniBibleClustering | Clustering | text | rom |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
PlscClusteringP2P.v2 | Clustering | text | pol |
SwednClusteringP2P | Clustering | text | swe |
CLSClusteringP2P.v2 | Clustering | text | cmn |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
AILAStatutes | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
CovidRetrieval | Retrieval | text | cmn |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
KorHateSpeechMLClassification | MultilabelClassification | text | kor |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
CEDRClassification | MultilabelClassification | text | rus |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
ArmenianParaphrasePC | PairClassification | text | hye |
indonli | PairClassification | text | ind |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PpcPC | PairClassification | text | pol |
TERRa | PairClassification | text | rus |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
VoyageMMarcoReranking | Reranking | text | jpn |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
RuBQReranking | Reranking | text | rus |
T2Reranking | Reranking | text | cmn |
GermanSTSBenchmark | STS | text | deu |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FaroeseSTS | STS | text | fao |
FinParaSTS | STS | text | fin |
JSICK | STS | text | jpn |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
STSES | STS | text | spa |
STSB | STS | text | cmn |
MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
SNLHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Multilingual, v2)¶
A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages.
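Any benchmark on this page can be loaded by its heading name through the mteb package. A minimal sketch, assuming a recent mteb release in which mteb.get_benchmark is available and this name is registered:

import mteb

# Load the benchmark by the name shown in the heading above.
benchmark = mteb.get_benchmark("MTEB(Multilingual, v2)")

# List the tasks it contains (the table below shows them in full).
for task in benchmark.tasks:
    print(task.metadata.name)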
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetTopicSingleClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
FilipinoShopeeReviewsClassification | Classification | text | fil |
GujaratiNewsClassification | Classification | text | guj |
SentimentAnalysisHindi | Classification | text | hin |
IndonesianIdClickbaitClassification | Classification | text | ind |
ItaCaseholdClassification | Classification | text | ita |
KorSarcasmClassification | Classification | text | kor |
KurdishSentimentClassification | Classification | text | kur |
MacedonianTweetSentimentClassification | Classification | text | mkd |
AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
CataloniaTweetClassification | Classification | text | cat, spa |
CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
NepaliNewsClassification | Classification | text | nep |
OdiaNewsClassification | Classification | text | ory |
PunjabiNewsClassification | Classification | text | pan |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
SinhalaNewsClassification | Classification | text | sin |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
SiswatiNewsClassification | Classification | text | ssw |
SlovakMovieReviewSentimentClassification | Classification | text | svk |
SwahiliNewsClassification | Classification | text | swa |
DalajClassification | Classification | text | swe |
TswanaNewsClassification | Classification | text | tsn |
IsiZuluNewsClassification | Classification | text | zul |
WikiCitiesClustering | Clustering | text | eng |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
RomaniBibleClustering | Clustering | text | rom |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
PlscClusteringP2P.v2 | Clustering | text | pol |
SwednClusteringP2P | Clustering | text | swe |
CLSClusteringP2P.v2 | Clustering | text | cmn |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
AILAStatutes | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
CovidRetrieval | Retrieval | text | cmn |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
KorHateSpeechMLClassification | MultilabelClassification | text | kor |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
CEDRClassification | MultilabelClassification | text | rus |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
ArmenianParaphrasePC | PairClassification | text | hye |
indonli | PairClassification | text | ind |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PpcPC | PairClassification | text | pol |
TERRa | PairClassification | text | rus |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
VoyageMMarcoReranking | Reranking | text | jpn |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
RuBQReranking | Reranking | text | rus |
T2Reranking | Reranking | text | cmn |
GermanSTSBenchmark | STS | text | deu |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FaroeseSTS | STS | text | fao |
FinParaSTS | STS | text | fin |
JSICK | STS | text | jpn |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
STSES | STS | text | spa |
STSB | STS | text | cmn |
MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Scandinavian, v1)¶
A curated selection of tasks covering the Scandinavian languages: Danish, Swedish, and Norwegian (both Bokmål and Nynorsk).
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
AngryTweetsClassification | Classification | text | dan |
DanishPoliticalCommentsClassification | Classification | text | dan |
DalajClassification | Classification | text | swe |
DKHateClassification | Classification | text | dan |
LccSentimentClassification | Classification | text | dan |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NoRecClassification | Classification | text | nob |
NorwegianParliamentClassification | Classification | text | nob |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwedishSentimentClassification | Classification | text | swe |
SweRecClassification | Classification | text | swe |
DanFeverRetrieval | Retrieval | text | dan |
NorQuadRetrieval | Retrieval | text | nob |
SNLRetrieval | Retrieval | text | nob |
SwednRetrieval | Retrieval | text | swe |
SweFaqRetrieval | Retrieval | text | swe |
TV2Nordretrieval | Retrieval | text | dan |
TwitterHjerneRetrieval | Retrieval | text | dan |
SNLHierarchicalClusteringS2S | Clustering | text | nob |
SNLHierarchicalClusteringP2P | Clustering | text | nob |
SwednClusteringP2P | Clustering | text | swe |
SwednClusteringS2S | Clustering | text | swe |
VGHierarchicalClusteringS2S | Clustering | text | nob |
VGHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@inproceedings{enevoldsen2024scandinavian,
author = {Enevoldsen, Kenneth and Kardos, M{\'a}rton and Muennighoff, Niklas and Nielbo, Kristoffer},
booktitle = {Advances in Neural Information Processing Systems},
title = {The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding},
url = {https://nips.cc/virtual/2024/poster/97869},
year = {2024},
}
MTEB(cmn, v1)¶
The Chinese Massive Text Embedding Benchmark (C-MTEB) is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
T2Retrieval | Retrieval | text | cmn |
MMarcoRetrieval | Retrieval | text | cmn |
DuRetrieval | Retrieval | text | cmn |
CovidRetrieval | Retrieval | text | cmn |
CmedqaRetrieval | Retrieval | text | cmn |
EcomRetrieval | Retrieval | text | cmn |
MedicalRetrieval | Retrieval | text | cmn |
VideoRetrieval | Retrieval | text | cmn |
T2Reranking | Reranking | text | cmn |
MMarcoReranking | Reranking | text | cmn |
CMedQAv1-reranking | Reranking | text | cmn |
CMedQAv2-reranking | Reranking | text | cmn |
Ocnli | PairClassification | text | cmn |
Cmnli | PairClassification | text | cmn |
CLSClusteringS2S | Clustering | text | cmn |
CLSClusteringP2P | Clustering | text | cmn |
ThuNewsClusteringS2S | Clustering | text | cmn |
ThuNewsClusteringP2P | Clustering | text | cmn |
LCQMC | STS | text | cmn |
PAWSX | STS | text | cmn |
AFQMC | STS | text | cmn |
QBQTC | STS | text | cmn |
TNews | Classification | text | cmn |
IFlyTek | Classification | text | cmn |
Waimai | Classification | text | cmn |
OnlineShopping | Classification | text | cmn |
JDReview | Classification | text | cmn |
MultilingualSentiment | Classification | text | cmn |
ATEC | STS | text | cmn |
BQ | STS | text | cmn |
STSB | STS | text | cmn |
Citation
@misc{c-pack,
archiveprefix = {arXiv},
author = {Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
eprint = {2309.07597},
primaryclass = {cs.CL},
title = {C-Pack: Packaged Resources To Advance General Chinese Embedding},
year = {2023},
}
MTEB(deu, v1)¶
A benchmark for text-embedding performance in German.
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
BlurbsClusteringP2P | Clustering | text | deu |
BlurbsClusteringS2S | Clustering | text | deu |
TenKGnadClusteringP2P | Clustering | text | deu |
TenKGnadClusteringS2S | Clustering | text | deu |
FalseFriendsGermanEnglish | PairClassification | text | deu |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
GermanQuAD-Retrieval | Retrieval | text | deu |
GermanDPR | Retrieval | text | deu |
XMarket | Retrieval | text | deu, eng, spa |
GerDaLIR | Retrieval | text | deu |
GermanSTSBenchmark | STS | text | deu |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{wehrli2024germantextembeddingclustering,
archiveprefix = {arXiv},
author = {Silvan Wehrli and Bert Arnrich and Christopher Irrgang},
eprint = {2401.02709},
primaryclass = {cs.CL},
title = {German Text Embedding Clustering Benchmark},
url = {https://arxiv.org/abs/2401.02709},
year = {2024},
}
MTEB(eng, v1)¶
The original English benchmark by Muennighoff et al. (2023). This page is an adaptation of the old MTEB leaderboard. We recommend using MTEB(eng, v2) instead: it uses updated versions of the tasks, which makes it notably faster to run and resolves a known bug in existing tasks. It also removes datasets commonly used for fine-tuning, such as MSMARCO, which makes model performance scores more comparable. In general, however, both benchmarks provide similar estimates.
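To follow this recommendation programmatically, both versions can be loaded by name and compared; a hedged sketch, assuming both benchmark names are registered in your installed mteb version:

import mteb

legacy = mteb.get_benchmark("MTEB(eng, v1)")
current = mteb.get_benchmark("MTEB(eng, v2)")

# Tasks dropped in v2 (e.g. MSMARCO) show up in this set difference.
legacy_names = {t.metadata.name for t in legacy.tasks}
current_names = {t.metadata.name for t in current.tasks}
print(sorted(legacy_names - current_names))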
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonPolarityClassification | Classification | text | eng |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
ArguAna | Retrieval | text | eng |
ArxivClusteringP2P | Clustering | text | eng |
ArxivClusteringS2S | Clustering | text | eng |
AskUbuntuDupQuestions | Reranking | text | eng |
BIOSSES | STS | text | eng |
Banking77Classification | Classification | text | eng |
BiorxivClusteringP2P | Clustering | text | eng |
BiorxivClusteringS2S | Clustering | text | eng |
CQADupstackRetrieval | Retrieval | text | eng |
ClimateFEVER | Retrieval | text | eng |
DBPedia | Retrieval | text | eng |
EmotionClassification | Classification | text | eng |
FEVER | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
HotpotQA | Retrieval | text | eng |
ImdbClassification | Classification | text | eng |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MedrxivClusteringP2P | Clustering | text | eng |
MedrxivClusteringS2S | Clustering | text | eng |
MindSmallReranking | Reranking | text | eng |
NFCorpus | Retrieval | text | eng |
NQ | Retrieval | text | eng |
QuoraRetrieval | Retrieval | text | eng |
RedditClustering | Clustering | text | eng |
RedditClusteringP2P | Clustering | text | eng |
SCIDOCS | Retrieval | text | eng |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STS16 | STS | text | eng |
STSBenchmark | STS | text | eng |
SciDocsRR | Reranking | text | eng |
SciFact | Retrieval | text | eng |
SprintDuplicateQuestions | PairClassification | text | eng |
StackExchangeClustering | Clustering | text | eng |
StackExchangeClusteringP2P | Clustering | text | eng |
StackOverflowDupQuestions | Reranking | text | eng |
SummEval | Summarization | text | eng |
TRECCOVID | Retrieval | text | eng |
Touche2020 | Retrieval | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetSentimentExtractionClassification | Classification | text | eng |
TwentyNewsgroupsClustering | Clustering | text | eng |
TwitterSemEval2015 | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
MSMARCO | Retrieval | text | eng |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{muennighoff2022mteb,
author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Loïc and Reimers, Nils},
doi = {10.48550/ARXIV.2210.07316},
journal = {arXiv preprint arXiv:2210.07316},
publisher = {arXiv},
title = {MTEB: Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2210.07316},
year = {2022},
}
MTEB(eng, v2)¶
The new English Massive Text Embedding Benchmark. This benchmark was created to account for the fact that many models have now been fine-tuned on tasks in the original MTEB, and it contains tasks that are less frequently used for model training. As a result, the new benchmark and leaderboard give a more realistic picture of models' generalization performance.
The original MTEB leaderboard is available under the MTEB(eng, v1) tab.
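A complete evaluation run over this benchmark might look as follows; a sketch assuming a recent mteb version, with sentence-transformers/all-MiniLM-L6-v2 used purely as an illustrative model choice:

import mteb

benchmark = mteb.get_benchmark("MTEB(eng, v2)")
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Run all tasks; per-task JSON results are also written under output_folder.
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="results")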
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna | Retrieval | text | eng |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
AskUbuntuDupQuestions | Reranking | text | eng |
BIOSSES | STS | text | eng |
Banking77Classification | Classification | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
CQADupstackGamingRetrieval | Retrieval | text | eng |
CQADupstackUnixRetrieval | Retrieval | text | eng |
ClimateFEVERHardNegatives | Retrieval | text | eng |
FEVERHardNegatives | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
HotpotQAHardNegatives | Retrieval | text | eng |
ImdbClassification | Classification | text | eng |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringS2S.v2 | Clustering | text | eng |
MindSmallReranking | Reranking | text | eng |
SCIDOCS | Retrieval | text | eng |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
SprintDuplicateQuestions | PairClassification | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
StackExchangeClusteringP2P.v2 | Clustering | text | eng |
TRECCOVID | Retrieval | text | eng |
Touche2020Retrieval.v3 | Retrieval | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetSentimentExtractionClassification | Classification | text | eng |
TwentyNewsgroupsClustering.v2 | Clustering | text | eng |
TwitterSemEval2015 | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
SummEvalSummarization.v2 | Summarization | text | eng |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(fas, v1)¶
The Persian Massive Text Embedding Benchmark (FaMTEB) is a comprehensive benchmark for Persian text embeddings covering 7 tasks and 60+ datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
PersianFoodSentimentClassification | Classification | text | fas |
SynPerChatbotConvSAClassification | Classification | text | fas |
SynPerChatbotConvSAToneChatbotClassification | Classification | text | fas |
SynPerChatbotConvSAToneUserClassification | Classification | text | fas |
SynPerChatbotSatisfactionLevelClassification | Classification | text | fas |
SynPerChatbotRAGToneChatbotClassification | Classification | text | fas |
SynPerChatbotRAGToneUserClassification | Classification | text | fas |
SynPerChatbotToneChatbotClassification | Classification | text | fas |
SynPerChatbotToneUserClassification | Classification | text | fas |
SynPerTextToneClassification | Classification | text | fas |
SIDClassification | Classification | text | fas |
DeepSentiPers | Classification | text | fas |
PersianTextEmotion | Classification | text | fas |
SentimentDKSF | Classification | text | fas |
NLPTwitterAnalysisClassification | Classification | text | fas |
DigikalamagClassification | Classification | text | fas |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
BeytooteClustering | Clustering | text | fas |
DigikalamagClustering | Clustering | text | fas |
HamshahriClustring | Clustering | text | fas |
NLPTwitterAnalysisClustering | Clustering | text | fas |
SIDClustring | Clustering | text | fas |
FarsTail | PairClassification | text | fas |
CExaPPC | PairClassification | text | fas |
SynPerChatbotRAGFAQPC | PairClassification | text | fas |
FarsiParaphraseDetection | PairClassification | text | fas |
SynPerTextKeywordsPC | PairClassification | text | fas |
SynPerQAPC | PairClassification | text | fas |
ParsinluEntail | PairClassification | text | fas |
ParsinluQueryParaphPC | PairClassification | text | fas |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
SynPerQARetrieval | Retrieval | text | fas |
SynPerChatbotTopicsRetrieval | Retrieval | text | fas |
SynPerChatbotRAGTopicsRetrieval | Retrieval | text | fas |
SynPerChatbotRAGFAQRetrieval | Retrieval | text | fas |
PersianWebDocumentRetrieval | Retrieval | text | fas |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
ClimateFEVER-Fa | Retrieval | text | fas |
DBPedia-Fa | Retrieval | text | fas |
HotpotQA-Fa | Retrieval | text | fas |
MSMARCO-Fa | Retrieval | text | fas |
NQ-Fa | Retrieval | text | fas |
ArguAna-Fa | Retrieval | text | fas |
CQADupstackRetrieval-Fa | Retrieval | text | fas |
FiQA2018-Fa | Retrieval | text | fas |
NFCorpus-Fa | Retrieval | text | fas |
QuoraRetrieval-Fa | Retrieval | text | fas |
SCIDOCS-Fa | Retrieval | text | fas |
SciFact-Fa | Retrieval | text | fas |
TRECCOVID-Fa | Retrieval | text | fas |
Touche2020-Fa | Retrieval | text | fas |
Farsick | STS | text | fas |
SynPerSTS | STS | text | fas |
Query2Query | STS | text | fas |
SAMSumFa | BitextMining | text | fas |
SynPerChatbotSumSRetrieval | BitextMining | text | fas |
SynPerChatbotRAGSumSRetrieval | BitextMining | text | fas |
Citation
@article{zinvandi2025famteb,
author = {Zinvandi, Erfan and Alikhani, Morteza and Sarmadi, Mehran and Pourbahman, Zahra and Arvin, Sepehr and Kazemi, Reza and Amini, Arash},
journal = {arXiv preprint arXiv:2502.11571},
title = {Famteb: Massive text embedding benchmark in persian language},
year = {2025},
}
MTEB(fra, v1)¶
MTEB-French, a French expansion of the original benchmark with high-quality native French datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
AlloProfClusteringP2P | Clustering | text | fra |
AlloProfClusteringS2S | Clustering | text | fra |
HALClusteringS2S | Clustering | text | fra |
MasakhaNEWSClusteringP2P | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
MLSUMClusteringP2P | Clustering | text | deu, fra, rus, spa |
MLSUMClusteringS2S | Clustering | text | deu, fra, rus, spa |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
AlloprofReranking | Reranking | text | fra |
SyntecReranking | Reranking | text | fra |
AlloprofRetrieval | Retrieval | text | fra |
BSARDRetrieval | Retrieval | text | fra |
MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
SyntecRetrieval | Retrieval | text | fra |
XPQARetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
SICKFr | STS | text | fra |
STSBenchmarkMultilingualSTS | STS | text | cmn, deu, eng, fra, ita, ... (10) |
SummEvalFr | Summarization | text | fra |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{ciancone2024mtebfrenchresourcesfrenchsentence,
archiveprefix = {arXiv},
author = {Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
eprint = {2405.20468},
primaryclass = {cs.CL},
title = {MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis},
url = {https://arxiv.org/abs/2405.20468},
year = {2024},
}
MTEB(jpn, v1)¶
JMTEB is a benchmark for evaluating Japanese text embedding models.
Tasks
name | type | modalities | languages |
---|---|---|---|
LivedoorNewsClustering.v2 | Clustering | text | jpn |
MewsC16JaClustering | Clustering | text | jpn |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
JSTS | STS | text | jpn |
JSICK | STS | text | jpn |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
JaqketRetrieval | Retrieval | text | jpn |
MrTidyRetrieval | Retrieval | text | ara, ben, eng, fin, ind, ... (11) |
JaGovFaqsRetrieval | Retrieval | text | jpn |
NLPJournalTitleAbsRetrieval | Retrieval | text | jpn |
NLPJournalAbsIntroRetrieval | Retrieval | text | jpn |
NLPJournalTitleIntroRetrieval | Retrieval | text | jpn |
ESCIReranking | Reranking | text | eng, jpn, spa |
MTEB(kor, v1)¶
A benchmark and leaderboard for evaluating text embeddings in Korean.
Tasks
name | type | modalities | languages |
---|---|---|---|
KLUE-TC | Classification | text | kor |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
Ko-StrategyQA | Retrieval | text | kor |
KLUE-STS | STS | text | kor |
KorSTS | STS | text | kor |
MTEB(pol, v1)¶
The Polish Massive Text Embedding Benchmark (PL-MTEB) is a comprehensive benchmark for text embeddings in Polish. PL-MTEB consists of 28 diverse NLP tasks from 5 task types, adapted from datasets previously used by the Polish NLP community. In addition, a new PLSC (Polish Library of Science Corpus) dataset was created, consisting of titles and abstracts of Polish scientific publications, and used as the basis for two novel clustering tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
AllegroReviews | Classification | text | pol |
CBD | Classification | text | pol |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
PolEmo2.0-IN | Classification | text | pol |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
EightTagsClustering | Clustering | text | pol |
PlscClusteringS2S | Clustering | text | pol |
PlscClusteringP2P | Clustering | text | pol |
CDSC-E | PairClassification | text | pol |
PpcPC | PairClassification | text | pol |
PSC | PairClassification | text | pol |
SICK-E-PL | PairClassification | text | pol |
CDSC-R | STS | text | pol |
SICK-R-PL | STS | text | pol |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{poswiata2024plmteb,
author = {Rafał Poświata and Sławomir Dadas and Michał Perełkiewicz},
journal = {arXiv preprint arXiv:2405.10138},
title = {PL-MTEB: Polish Massive Text Embedding Benchmark},
year = {2024},
}
MTEB(rus, v1)¶
A Russian version of the Massive Text Embedding Benchmark with a number of novel Russian tasks in all task categories of the original MTEB.
Tasks
name | type | modalities | languages |
---|---|---|---|
GeoreviewClassification | Classification | text | rus |
HeadlineClassification | Classification | text | rus |
InappropriatenessClassification | Classification | text | rus |
KinopoiskClassification | Classification | text | rus |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
RuReviewsClassification | Classification | text | rus |
RuSciBenchGRNTIClassification | Classification | text | rus |
RuSciBenchOECDClassification | Classification | text | rus |
GeoreviewClusteringP2P | Clustering | text | rus |
RuSciBenchGRNTIClusteringP2P | Clustering | text | rus |
RuSciBenchOECDClusteringP2P | Clustering | text | rus |
CEDRClassification | MultilabelClassification | text | rus |
SensitiveTopicsClassification | MultilabelClassification | text | rus |
TERRa | PairClassification | text | rus |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
RuBQReranking | Reranking | text | rus |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
RiaNewsRetrieval | Retrieval | text | rus |
RuBQRetrieval | Retrieval | text | rus |
RUParaPhraserSTS | STS | text | rus |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{snegirev2024russianfocusedembeddersexplorationrumteb,
archiveprefix = {arXiv},
author = {Artem Snegirev and Maria Tikhonova and Anna Maksimova and Alena Fenogenova and Alexander Abramov},
eprint = {2408.12503},
primaryclass = {cs.CL},
title = {The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design},
url = {https://arxiv.org/abs/2408.12503},
year = {2024},
}
NanoBEIR¶
A benchmark that evaluates on small subsets of the BEIR datasets, substantially reducing the compute needed for evaluation.
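Because each Nano task is a small slice of its BEIR counterpart, the suite works well as a quick smoke test before a full BEIR run. A sketch under the same assumptions as the examples above ("NanoBEIR" being the registered benchmark name; the single task name is taken from the table below):

import mteb

# Run the whole Nano suite, or cherry-pick a single small task.
benchmark = mteb.get_benchmark("NanoBEIR")
tasks = mteb.get_tasks(tasks=["NanoArguAnaRetrieval"])

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
results = mteb.MTEB(tasks=tasks).run(model, output_folder="results")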
Tasks
name | type | modalities | languages |
---|---|---|---|
NanoArguAnaRetrieval | Retrieval | text | eng |
NanoClimateFeverRetrieval | Retrieval | text | eng |
NanoDBPediaRetrieval | Retrieval | text | eng |
NanoFEVERRetrieval | Retrieval | text | eng |
NanoFiQA2018Retrieval | Retrieval | text | eng |
NanoHotpotQARetrieval | Retrieval | text | eng |
NanoMSMARCORetrieval | Retrieval | text | eng |
NanoNFCorpusRetrieval | Retrieval | text | eng |
NanoNQRetrieval | Retrieval | text | eng |
NanoQuoraRetrieval | Retrieval | text | eng |
NanoSCIDOCSRetrieval | Retrieval | text | eng |
NanoSciFactRetrieval | Retrieval | text | eng |
NanoTouche2020Retrieval | Retrieval | text | eng |
R2MED¶
R2MED: the first reasoning-driven medical retrieval benchmark. R2MED is a high-quality, high-resolution information retrieval (IR) dataset designed for medical scenarios. It contains 876 queries spanning three retrieval tasks, five medical scenarios, and twelve body systems.
Tasks
name | type | modalities | languages |
---|---|---|---|
R2MEDBiologyRetrieval | Retrieval | text | eng |
R2MEDBioinformaticsRetrieval | Retrieval | text | eng |
R2MEDMedicalSciencesRetrieval | Retrieval | text | eng |
R2MEDMedXpertQAExamRetrieval | Retrieval | text | eng |
R2MEDMedQADiagRetrieval | Retrieval | text | eng |
R2MEDPMCTreatmentRetrieval | Retrieval | text | eng |
R2MEDPMCClinicalRetrieval | Retrieval | text | eng |
R2MEDIIYiClinicalRetrieval | Retrieval | text | eng |
Citation
@article{li2025r2med,
author = {Li, Lei and Zhou, Xiao and Liu, Zheng},
journal = {arXiv preprint arXiv:2505.14558},
title = {R2MED: A Benchmark for Reasoning-Driven Medical Retrieval},
year = {2025},
}
RAR-b¶
A benchmark for evaluating the reasoning capabilities of retrievers.
Tasks
name | type | modalities | languages |
---|---|---|---|
ARCChallenge | Retrieval | text | eng |
AlphaNLI | Retrieval | text | eng |
HellaSwag | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
PIQA | Retrieval | text | eng |
SIQA | Retrieval | text | eng |
Quail | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TempReasonL2Pure | Retrieval | text | eng |
TempReasonL2Fact | Retrieval | text | eng |
TempReasonL2Context | Retrieval | text | eng |
TempReasonL3Pure | Retrieval | text | eng |
TempReasonL3Fact | Retrieval | text | eng |
TempReasonL3Context | Retrieval | text | eng |
RARbCode | Retrieval | text | eng |
RARbMath | Retrieval | text | eng |
Citation
@article{xiao2024rar,
author = {Xiao, Chenghao and Hudson, G Thomas and Al Moubayed, Noura},
journal = {arXiv preprint arXiv:2404.06347},
title = {RAR-b: Reasoning as Retrieval Benchmark},
year = {2024},
}
RuSciBench¶
RuSciBench is a benchmark designed for evaluating sentence encoders and language models on scientific texts in both Russian and English. The data is sourced from eLibrary (www.elibrary.ru), Russia's largest electronic library of scientific publications. This benchmark facilitates the evaluation and comparison of models on various research-related tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
RuSciBenchBitextMining | BitextMining | text | eng, rus |
RuSciBenchCoreRiscClassification | Classification | text | eng, rus |
RuSciBenchGRNTIClassification.v2 | Classification | text | eng, rus |
RuSciBenchOECDClassification.v2 | Classification | text | eng, rus |
RuSciBenchPubTypeClassification | Classification | text | eng, rus |
RuSciBenchCiteRetrieval | Retrieval | text | eng, rus |
RuSciBenchCociteRetrieval | Retrieval | text | eng, rus |
RuSciBenchCitedCountRegression | Regression | text | eng, rus |
RuSciBenchYearPublRegression | Regression | text | eng, rus |
Citation
@article{vatolin2024ruscibench,
author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
doi = {10.1134/S1064562424602191},
issn = {1531-8362},
journal = {Doklady Mathematics},
month = {12},
number = {1},
pages = {S251--S260},
title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
url = {https://doi.org/10.1134/S1064562424602191},
volume = {110},
year = {2024},
}
VN-MTEB (vie, v1)¶
A benchmark for text-embedding performance in Vietnamese.
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna-VN | Retrieval | text | vie |
SciFact-VN | Retrieval | text | vie |
ClimateFEVER-VN | Retrieval | text | vie |
FEVER-VN | Retrieval | text | vie |
DBPedia-VN | Retrieval | text | vie |
NQ-VN | Retrieval | text | vie |
HotpotQA-VN | Retrieval | text | vie |
MSMARCO-VN | Retrieval | text | vie |
TRECCOVID-VN | Retrieval | text | vie |
FiQA2018-VN | Retrieval | text | vie |
NFCorpus-VN | Retrieval | text | vie |
SCIDOCS-VN | Retrieval | text | vie |
Touche2020-VN | Retrieval | text | vie |
Quora-VN | Retrieval | text | vie |
CQADupstackAndroid-VN | Retrieval | text | vie |
CQADupstackGis-VN | Retrieval | text | vie |
CQADupstackMathematica-VN | Retrieval | text | vie |
CQADupstackPhysics-VN | Retrieval | text | vie |
CQADupstackProgrammers-VN | Retrieval | text | vie |
CQADupstackStats-VN | Retrieval | text | vie |
CQADupstackTex-VN | Retrieval | text | vie |
CQADupstackUnix-VN | Retrieval | text | vie |
CQADupstackWebmasters-VN | Retrieval | text | vie |
CQADupstackWordpress-VN | Retrieval | text | vie |
Banking77VNClassification | Classification | text | vie |
EmotionVNClassification | Classification | text | vie |
AmazonCounterfactualVNClassification | Classification | text | vie |
MTOPDomainVNClassification | Classification | text | vie |
TweetSentimentExtractionVNClassification | Classification | text | vie |
ToxicConversationsVNClassification | Classification | text | vie |
ImdbVNClassification | Classification | text | vie |
MTOPIntentVNClassification | Classification | text | vie |
MassiveScenarioVNClassification | Classification | text | vie |
MassiveIntentVNClassification | Classification | text | vie |
AmazonReviewsVNClassification | Classification | text | vie |
AmazonPolarityVNClassification | Classification | text | vie |
SprintDuplicateQuestions-VN | PairClassification | text | vie |
TwitterSemEval2015-VN | PairClassification | text | vie |
TwitterURLCorpus-VN | PairClassification | text | vie |
TwentyNewsgroupsClustering-VN | Clustering | text | vie |
RedditClusteringP2P-VN | Clustering | text | vie |
StackExchangeClusteringP2P-VN | Clustering | text | vie |
StackExchangeClustering-VN | Clustering | text | vie |
RedditClustering-VN | Clustering | text | vie |
SciDocsRR-VN | Reranking | text | vie |
AskUbuntuDupQuestions-VN | Reranking | text | vie |
StackOverflowDupQuestions-VN | Reranking | text | vie |
BIOSSES-VN | STS | text | vie |
SICK-R-VN | STS | text | vie |
STSBenchmark-VN | STS | text | vie |
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
archiveprefix = {arXiv},
author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
eprint = {2507.21500},
primaryclass = {cs.CL},
title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2507.21500},
year = {2025},
}
ViDoRe(v1)¶
Retrieve the document pages relevant to a given question.
Tasks
name | type | modalities | languages |
---|---|---|---|
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
ViDoRe(v2)¶
Retrieve the document pages relevant to a given question.
Tasks
name | type | modalities | languages |
---|---|---|---|
Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
VisualDocumentRetrieval¶
A benchmark for evaluating visual document retrieval, combining ViDoRe v1 and v2.
Tasks
name | type | modalities | languages |
---|---|---|---|
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}