Available Benchmarks

BEIR

BEIR is a heterogeneous benchmark containing diverse IR tasks. It also provides a common, easy-to-use framework for evaluating NLP-based retrieval models on these tasks.

Learn more →
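Retrieval tasks like those listed below are typically scored with nDCG@10, which rewards placing highly relevant documents near the top of the ranking. A minimal sketch of that metric on toy data (a simplified stand-in, not the benchmark's official evaluation code):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked results:
    # each relevance is divided by log2(rank + 1), with ranks starting at 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal_dcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy run: graded relevances of the documents a model ranked 1st..4th.
score = ndcg_at_k([3, 0, 2, 1], k=10)  # < 1.0, since ranks 2-4 are out of order
```

A perfect ranking (relevances already descending) scores exactly 1.0, which is why nDCG is comparable across queries with different numbers of relevant documents.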

Tasks
name type modalities languages
TRECCOVID Retrieval text eng
NFCorpus Retrieval text eng
NQ Retrieval text eng
HotpotQA Retrieval text eng
FiQA2018 Retrieval text eng
ArguAna Retrieval text eng
Touche2020 Retrieval text eng
CQADupstackRetrieval Retrieval text eng
QuoraRetrieval Retrieval text eng
DBPedia Retrieval text eng
SCIDOCS Retrieval text eng
FEVER Retrieval text eng
ClimateFEVER Retrieval text eng
SciFact Retrieval text eng
MSMARCO Retrieval text eng
Citation
@article{thakur2021beir,
  author = {Thakur, Nandan and Reimers, Nils and R{\"u}ckl{\'e}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
  journal = {arXiv preprint arXiv:2104.08663},
  title = {Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models},
  year = {2021},
}

BEIR-NL

BEIR-NL is a Dutch adaptation of the publicly available BEIR benchmark, created through automated translation.

Learn more →

Tasks
name type modalities languages
ArguAna-NL Retrieval text nld
CQADupstack-NL Retrieval text nld
FEVER-NL Retrieval text nld
NQ-NL Retrieval text nld
Touche2020-NL Retrieval text nld
FiQA2018-NL Retrieval text nld
Quora-NL Retrieval text nld
HotpotQA-NL Retrieval text nld
SCIDOCS-NL Retrieval text nld
ClimateFEVER-NL Retrieval text nld
mMARCO-NL Retrieval text nld
SciFact-NL Retrieval text nld
DBPedia-NL Retrieval text nld
NFCorpus-NL Retrieval text nld
TRECCOVID-NL Retrieval text nld
Citation
@misc{banar2024beirnlzeroshotinformationretrieval,
  archiveprefix = {arXiv},
  author = {Nikolay Banar and Ehsan Lotfi and Walter Daelemans},
  eprint = {2412.08329},
  primaryclass = {cs.CL},
  title = {BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language},
  url = {https://arxiv.org/abs/2412.08329},
  year = {2024},
}

BRIGHT

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.

Learn more →

Tasks
name type modalities languages
BrightRetrieval Retrieval text eng
Citation
@article{su2024bright,
  author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
  journal = {arXiv preprint arXiv:2407.12883},
  title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
  year = {2024},
}

BRIGHT(long)

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.

This is the long version of the benchmark, which keeps only the longer documents.

Learn more →

Tasks
name type modalities languages
BrightLongRetrieval Retrieval text eng
Citation
@article{su2024bright,
  author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
  journal = {arXiv preprint arXiv:2407.12883},
  title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
  year = {2024},
}

BuiltBench(eng)

"BuiltBench" is an ongoing effort to evaluate text embedding models in the context of built asset management, spanning disciplines such as architecture, engineering, construction, and operations management of the built environment.

Learn more →

Tasks
name type modalities languages
BuiltBenchClusteringP2P Clustering text eng
BuiltBenchClusteringS2S Clustering text eng
BuiltBenchRetrieval Retrieval text eng
BuiltBenchReranking Reranking text eng
Citation
@article{shahinmoghadam2024benchmarking,
  author = {Shahinmoghadam, Mehrzad and Motamedi, Ali},
  journal = {arXiv preprint arXiv:2411.12056},
  title = {Benchmarking pre-trained text embedding models in aligning built asset information},
  year = {2024},
}

ChemTEB

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Learn more →

Tasks
name type modalities languages
PubChemSMILESBitextMining BitextMining text eng
SDSEyeProtectionClassification Classification text eng
SDSGlovesClassification Classification text eng
WikipediaBioMetChemClassification Classification text eng
WikipediaGreenhouseEnantiopureClassification Classification text eng
WikipediaSolidStateColloidalClassification Classification text eng
WikipediaOrganicInorganicClassification Classification text eng
WikipediaCryobiologySeparationClassification Classification text eng
WikipediaChemistryTopicsClassification Classification text eng
WikipediaTheoreticalAppliedClassification Classification text eng
WikipediaChemFieldsClassification Classification text eng
WikipediaLuminescenceClassification Classification text eng
WikipediaIsotopesFissionClassification Classification text eng
WikipediaSaltsSemiconductorsClassification Classification text eng
WikipediaBiolumNeurochemClassification Classification text eng
WikipediaCrystallographyAnalyticalClassification Classification text eng
WikipediaCompChemSpectroscopyClassification Classification text eng
WikipediaChemEngSpecialtiesClassification Classification text eng
WikipediaChemistryTopicsClustering Clustering text eng
WikipediaSpecialtiesInChemistryClustering Clustering text eng
PubChemAISentenceParaphrasePC PairClassification text eng
PubChemSMILESPC PairClassification text eng
PubChemSynonymPC PairClassification text eng
PubChemWikiParagraphsPC PairClassification text eng
PubChemWikiPairClassification PairClassification text ces, deu, eng, fra, hin, ... (13)
ChemNQRetrieval Retrieval text eng
ChemHotpotQARetrieval Retrieval text eng
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

CoIR

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

Learn more →

Tasks
name type modalities languages
AppsRetrieval Retrieval text eng, python
CodeFeedbackMT Retrieval text eng
CodeFeedbackST Retrieval text eng
CodeSearchNetCCRetrieval Retrieval text go, java, javascript, php, python, ... (6)
CodeTransOceanContest Retrieval text c++, python
CodeTransOceanDL Retrieval text python
CosQA Retrieval text eng, python
COIRCodeSearchNetRetrieval Retrieval text go, java, javascript, php, python, ... (6)
StackOverflowQA Retrieval text eng
SyntheticText2SQL Retrieval text eng, sql
Citation
@misc{li2024coircomprehensivebenchmarkcode,
  archiveprefix = {arXiv},
  author = {Xiangyang Li and Kuicai Dong and Yi Quan Lee and Wei Xia and Yichun Yin and Hao Zhang and Yong Liu and Yasheng Wang and Ruiming Tang},
  eprint = {2407.02883},
  primaryclass = {cs.IR},
  title = {CoIR: A Comprehensive Benchmark for Code Information Retrieval Models},
  url = {https://arxiv.org/abs/2407.02883},
  year = {2024},
}

CodeRAG

A benchmark for evaluating code retrieval augmented generation, testing models' ability to retrieve relevant programming solutions, tutorials and documentation.

Learn more →

Tasks
name type modalities languages
CodeRAGLibraryDocumentationSolutions Reranking text python
CodeRAGOnlineTutorials Reranking text python
CodeRAGProgrammingSolutions Reranking text python
CodeRAGStackoverflowPosts Reranking text python
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
  archiveprefix = {arXiv},
  author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
  eprint = {2406.14497},
  primaryclass = {cs.SE},
  title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
  url = {https://arxiv.org/abs/2406.14497},
  year = {2024},
}

Encodechka

A benchmark for evaluating text embedding models on Russian data.

Learn more →

Tasks
name type modalities languages
RUParaPhraserSTS STS text rus
SentiRuEval2016 Classification text rus
RuToxicOKMLCUPClassification Classification text rus
InappropriatenessClassificationv2 Classification text rus
RuNLUIntentClassification Classification text rus
XNLI PairClassification text ara, bul, deu, ell, eng, ... (14)
RuSTSBenchmarkSTS STS text rus
Citation
@misc{dale_encodechka,
  author = {Dale, David},
  editor = {habr.com},
  month = {June},
  note = {[Online; posted 12-June-2022]},
  title = {Russian rating of sentence encoders},
  url = {https://habr.com/ru/articles/669674/},
  year = {2022},
}

FollowIR

Retrieval with Instructions is the task of finding relevant documents for a query that includes detailed instructions.

Learn more →

Tasks
name type modalities languages
Robust04InstructionRetrieval InstructionReranking text eng
News21InstructionRetrieval InstructionReranking text eng
Core17InstructionRetrieval InstructionReranking text eng
Citation
@misc{weller2024followir,
  archiveprefix = {arXiv},
  author = {Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
  eprint = {2403.15246},
  primaryclass = {cs.IR},
  title = {FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions},
  year = {2024},
}

JinaVDR

A multilingual, domain-diverse, and layout-rich document retrieval benchmark.

Learn more →

Tasks
name type modalities languages
JinaVDRMedicalPrescriptionsRetrieval DocumentUnderstanding text, image eng
JinaVDRStanfordSlideRetrieval DocumentUnderstanding text, image eng
JinaVDRDonutVQAISynHMPRetrieval DocumentUnderstanding text, image eng
JinaVDRTableVQARetrieval DocumentUnderstanding text, image eng
JinaVDRChartQARetrieval DocumentUnderstanding text, image eng
JinaVDRTQARetrieval DocumentUnderstanding text, image eng
JinaVDROpenAINewsRetrieval DocumentUnderstanding text, image eng
JinaVDREuropeanaDeNewsRetrieval DocumentUnderstanding text, image deu
JinaVDREuropeanaEsNewsRetrieval DocumentUnderstanding text, image spa
JinaVDREuropeanaItScansRetrieval DocumentUnderstanding text, image ita
JinaVDREuropeanaNlLegalRetrieval DocumentUnderstanding text, image nld
JinaVDRHindiGovVQARetrieval DocumentUnderstanding text, image hin
JinaVDRAutomobileCatelogRetrieval DocumentUnderstanding text, image jpn
JinaVDRBeveragesCatalogueRetrieval DocumentUnderstanding text, image rus
JinaVDRRamensBenchmarkRetrieval DocumentUnderstanding text, image jpn
JinaVDRJDocQARetrieval DocumentUnderstanding text, image jpn
JinaVDRHungarianDocQARetrieval DocumentUnderstanding text, image hun
JinaVDRArabicChartQARetrieval DocumentUnderstanding text, image ara
JinaVDRArabicInfographicsVQARetrieval DocumentUnderstanding text, image ara
JinaVDROWIDChartsRetrieval DocumentUnderstanding text, image eng
JinaVDRMPMQARetrieval DocumentUnderstanding text, image eng
JinaVDRJina2024YearlyBookRetrieval DocumentUnderstanding text, image eng
JinaVDRWikimediaCommonsMapsRetrieval DocumentUnderstanding text, image eng
JinaVDRPlotQARetrieval DocumentUnderstanding text, image eng
JinaVDRMMTabRetrieval DocumentUnderstanding text, image eng
JinaVDRCharXivOCRRetrieval DocumentUnderstanding text, image eng
JinaVDRStudentEnrollmentSyntheticRetrieval DocumentUnderstanding text, image eng
JinaVDRGitHubReadmeRetrieval DocumentUnderstanding text, image ara, ben, deu, eng, fra, ... (17)
JinaVDRTweetStockSyntheticsRetrieval DocumentUnderstanding text, image ara, deu, eng, fra, hin, ... (10)
JinaVDRAirbnbSyntheticRetrieval DocumentUnderstanding text, image ara, deu, eng, fra, hin, ... (10)
JinaVDRShanghaiMasterPlanRetrieval DocumentUnderstanding text, image zho
JinaVDRWikimediaCommonsDocumentsRetrieval DocumentUnderstanding text, image ara, ben, deu, eng, fra, ... (20)
JinaVDREuropeanaFrNewsRetrieval DocumentUnderstanding text, image fra
JinaVDRDocQAHealthcareIndustryRetrieval DocumentUnderstanding text, image eng
JinaVDRDocQAAI DocumentUnderstanding text, image eng
JinaVDRShiftProjectRetrieval DocumentUnderstanding text, image eng
JinaVDRTatQARetrieval DocumentUnderstanding text, image eng
JinaVDRInfovqaRetrieval DocumentUnderstanding text, image eng
JinaVDRDocVQARetrieval DocumentUnderstanding text, image eng
JinaVDRDocQAGovReportRetrieval DocumentUnderstanding text, image eng
JinaVDRTabFQuadRetrieval DocumentUnderstanding text, image eng
JinaVDRDocQAEnergyRetrieval DocumentUnderstanding text, image eng
JinaVDRArxivQARetrieval DocumentUnderstanding text, image eng
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
  archiveprefix = {arXiv},
  author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
  eprint = {2506.18902},
  primaryclass = {cs.AI},
  title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
  url = {https://arxiv.org/abs/2506.18902},
  year = {2025},
}

LongEmbed

LongEmbed is a benchmark oriented at exploring models' performance on long-context retrieval. The benchmark comprises two synthetic tasks and four carefully chosen real-world tasks, featuring documents of varying length and dispersed target information.

Learn more →

Tasks
name type modalities languages
LEMBNarrativeQARetrieval Retrieval text eng
LEMBNeedleRetrieval Retrieval text eng
LEMBPasskeyRetrieval Retrieval text eng
LEMBQMSumRetrieval Retrieval text eng
LEMBSummScreenFDRetrieval Retrieval text eng
LEMBWikimQARetrieval Retrieval text eng
Citation
@article{zhu2024longembed,
  author = {Zhu, Dawei and Wang, Liang and Yang, Nan and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
  journal = {arXiv preprint arXiv:2404.12096},
  title = {LongEmbed: Extending Embedding Models for Long Context Retrieval},
  year = {2024},
}

MIEB(Img)

An image-only version of MIEB(Multilingual), consisting of 49 tasks.

Learn more →
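The VisualSTS tasks in this benchmark are scored by correlating the model's similarity scores with human similarity judgments, typically via Spearman's rank correlation. A minimal sketch of that statistic on toy data (a simplified stand-in, not the benchmark's evaluation code):

```python
import math

def rank(values):
    # Convert raw values to ranks (1-based), averaging ranks for ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation computed on the ranks of x and y.
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A model whose similarity scores preserve the human ranking perfectly scores 1.0 regardless of the absolute scale of its scores, which is why rank correlation is the standard choice for STS.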

Tasks
name type modalities languages
CUB200I2IRetrieval Any2AnyRetrieval image eng
FORBI2IRetrieval Any2AnyRetrieval image eng
GLDv2I2IRetrieval Any2AnyRetrieval image eng
METI2IRetrieval Any2AnyRetrieval image eng
NIGHTSI2IRetrieval Any2AnyRetrieval image eng
ROxfordEasyI2IRetrieval Any2AnyRetrieval image eng
ROxfordMediumI2IRetrieval Any2AnyRetrieval image eng
ROxfordHardI2IRetrieval Any2AnyRetrieval image eng
RP2kI2IRetrieval Any2AnyRetrieval image eng
RParisEasyI2IRetrieval Any2AnyRetrieval image eng
RParisMediumI2IRetrieval Any2AnyRetrieval image eng
RParisHardI2IRetrieval Any2AnyRetrieval image eng
SketchyI2IRetrieval Any2AnyRetrieval image eng
SOPI2IRetrieval Any2AnyRetrieval image eng
StanfordCarsI2IRetrieval Any2AnyRetrieval image eng
Birdsnap ImageClassification image eng
Caltech101 ImageClassification image eng
CIFAR10 ImageClassification image eng
CIFAR100 ImageClassification image eng
Country211 ImageClassification image eng
DTD ImageClassification image eng
EuroSAT ImageClassification image eng
FER2013 ImageClassification image eng
FGVCAircraft ImageClassification image eng
Food101Classification ImageClassification image eng
GTSRB ImageClassification image eng
Imagenet1k ImageClassification image eng
MNIST ImageClassification image eng
OxfordFlowersClassification ImageClassification image eng
OxfordPets ImageClassification image eng
PatchCamelyon ImageClassification image eng
RESISC45 ImageClassification image eng
StanfordCars ImageClassification image eng
STL10 ImageClassification image eng
SUN397 ImageClassification image eng
UCF101 ImageClassification image eng
CIFAR10Clustering ImageClustering image eng
CIFAR100Clustering ImageClustering image eng
ImageNetDog15Clustering ImageClustering image eng
ImageNet10Clustering ImageClustering image eng
TinyImageNetClustering ImageClustering image eng
VOC2007 ImageClassification image eng
STS12VisualSTS VisualSTS(eng) image eng
STS13VisualSTS VisualSTS(eng) image eng
STS14VisualSTS VisualSTS(eng) image eng
STS15VisualSTS VisualSTS(eng) image eng
STS16VisualSTS VisualSTS(eng) image eng
STS17MultilingualVisualSTS VisualSTS(multi) image ara, deu, eng, fra, ita, ... (9)
STSBenchmarkMultilingualVisualSTS VisualSTS(multi) image cmn, deu, eng, fra, ita, ... (10)
Citation
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  doi = {10.48550/ARXIV.2504.10471},
  journal = {arXiv preprint arXiv:2504.10471},
  publisher = {arXiv},
  title = {MIEB: Massive Image Embedding Benchmark},
  url = {https://arxiv.org/abs/2504.10471},
  year = {2025},
}

MIEB(Multilingual)

MIEB(Multilingual) is a comprehensive image embedding benchmark spanning 10 task types, 130 tasks, and a total of 39 languages. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes tasks in compositionality evaluation, document understanding, visual STS, and CV-centric tasks. The benchmark consists of MIEB(eng) plus 3 multilingual retrieval datasets and the multilingual parts of VisualSTS-b and VisualSTS-17.

Learn more →
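The zero-shot classification tasks listed below are commonly evaluated by embedding each image and each candidate label prompt with the same model, then predicting the label whose text embedding is closest to the image embedding in cosine similarity. A minimal sketch with toy vectors standing in for a real image/text encoder (hypothetical data, not the benchmark's evaluation code):

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_emb, label_embs):
    # Predict the label whose text embedding is most similar to the image.
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))

# Toy embeddings standing in for encoder outputs of "a photo of a {label}".
labels = {"cat": [0.9, 0.1, 0.0], "dog": [0.1, 0.9, 0.0]}
pred = zero_shot_classify([0.8, 0.2, 0.1], labels)  # → "cat"
```

Because no classifier head is trained, the same procedure transfers to any label set the text encoder can describe, which is what makes the evaluation "zero-shot".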

Tasks
name type modalities languages
Birdsnap ImageClassification image eng
Caltech101 ImageClassification image eng
CIFAR10 ImageClassification image eng
CIFAR100 ImageClassification image eng
Country211 ImageClassification image eng
DTD ImageClassification image eng
EuroSAT ImageClassification image eng
FER2013 ImageClassification image eng
FGVCAircraft ImageClassification image eng
Food101Classification ImageClassification image eng
GTSRB ImageClassification image eng
Imagenet1k ImageClassification image eng
MNIST ImageClassification image eng
OxfordFlowersClassification ImageClassification image eng
OxfordPets ImageClassification image eng
PatchCamelyon ImageClassification image eng
RESISC45 ImageClassification image eng
StanfordCars ImageClassification image eng
STL10 ImageClassification image eng
SUN397 ImageClassification image eng
UCF101 ImageClassification image eng
VOC2007 ImageClassification image eng
CIFAR10Clustering ImageClustering image eng
CIFAR100Clustering ImageClustering image eng
ImageNetDog15Clustering ImageClustering image eng
ImageNet10Clustering ImageClustering image eng
TinyImageNetClustering ImageClustering image eng
BirdsnapZeroShot ZeroShotClassification image, text eng
Caltech101ZeroShot ZeroShotClassification text, image eng
CIFAR10ZeroShot ZeroShotClassification text, image eng
CIFAR100ZeroShot ZeroShotClassification text, image eng
CLEVRZeroShot ZeroShotClassification text, image eng
CLEVRCountZeroShot ZeroShotClassification text, image eng
Country211ZeroShot ZeroShotClassification image, text eng
DTDZeroShot ZeroShotClassification image, text eng
EuroSATZeroShot ZeroShotClassification image, text eng
FER2013ZeroShot ZeroShotClassification image, text eng
FGVCAircraftZeroShot ZeroShotClassification text, image eng
Food101ZeroShot ZeroShotClassification text, image eng
GTSRBZeroShot ZeroShotClassification image eng
Imagenet1kZeroShot ZeroShotClassification image, text eng
MNISTZeroShot ZeroShotClassification image, text eng
OxfordPetsZeroShot ZeroShotClassification text, image eng
PatchCamelyonZeroShot ZeroShotClassification image, text eng
RenderedSST2 ZeroShotClassification text, image eng
RESISC45ZeroShot ZeroShotClassification image, text eng
StanfordCarsZeroShot ZeroShotClassification image, text eng
STL10ZeroShot ZeroShotClassification image, text eng
SUN397ZeroShot ZeroShotClassification image, text eng
UCF101ZeroShot ZeroShotClassification image, text eng
BLINKIT2IMultiChoice VisionCentricQA text, image eng
BLINKIT2TMultiChoice VisionCentricQA text, image eng
CVBenchCount VisionCentricQA image, text eng
CVBenchRelation VisionCentricQA text, image eng
CVBenchDepth VisionCentricQA text, image eng
CVBenchDistance VisionCentricQA text, image eng
AROCocoOrder Compositionality text, image eng
AROFlickrOrder Compositionality text, image eng
AROVisualAttribution Compositionality text, image eng
AROVisualRelation Compositionality text, image eng
SugarCrepe Compositionality text, image eng
Winoground Compositionality text, image eng
ImageCoDe Compositionality text, image eng
STS12VisualSTS VisualSTS(eng) image eng
STS13VisualSTS VisualSTS(eng) image eng
STS14VisualSTS VisualSTS(eng) image eng
STS15VisualSTS VisualSTS(eng) image eng
STS16VisualSTS VisualSTS(eng) image eng
BLINKIT2IRetrieval Any2AnyRetrieval text, image eng
BLINKIT2TRetrieval Any2AnyRetrieval text, image eng
CIRRIT2IRetrieval Any2AnyRetrieval text, image eng
CUB200I2IRetrieval Any2AnyRetrieval image eng
EDIST2ITRetrieval Any2AnyRetrieval text, image eng
Fashion200kI2TRetrieval Any2AnyRetrieval text, image eng
Fashion200kT2IRetrieval Any2AnyRetrieval text, image eng
FashionIQIT2IRetrieval Any2AnyRetrieval text, image eng
Flickr30kI2TRetrieval Any2AnyRetrieval text, image eng
Flickr30kT2IRetrieval Any2AnyRetrieval text, image eng
FORBI2IRetrieval Any2AnyRetrieval image eng
GLDv2I2IRetrieval Any2AnyRetrieval image eng
GLDv2I2TRetrieval Any2AnyRetrieval text, image eng
HatefulMemesI2TRetrieval Any2AnyRetrieval text, image eng
HatefulMemesT2IRetrieval Any2AnyRetrieval text, image eng
ImageCoDeT2IRetrieval Any2AnyRetrieval text, image eng
InfoSeekIT2ITRetrieval Any2AnyRetrieval text, image eng
InfoSeekIT2TRetrieval Any2AnyRetrieval text, image eng
MemotionI2TRetrieval Any2AnyRetrieval text, image eng
MemotionT2IRetrieval Any2AnyRetrieval text, image eng
METI2IRetrieval Any2AnyRetrieval image eng
MSCOCOI2TRetrieval Any2AnyRetrieval text, image eng
MSCOCOT2IRetrieval Any2AnyRetrieval text, image eng
NIGHTSI2IRetrieval Any2AnyRetrieval image eng
OVENIT2ITRetrieval Any2AnyRetrieval image, text eng
OVENIT2TRetrieval Any2AnyRetrieval text, image eng
ROxfordEasyI2IRetrieval Any2AnyRetrieval image eng
ROxfordMediumI2IRetrieval Any2AnyRetrieval image eng
ROxfordHardI2IRetrieval Any2AnyRetrieval image eng
RP2kI2IRetrieval Any2AnyRetrieval image eng
RParisEasyI2IRetrieval Any2AnyRetrieval image eng
RParisMediumI2IRetrieval Any2AnyRetrieval image eng
RParisHardI2IRetrieval Any2AnyRetrieval image eng
SciMMIRI2TRetrieval Any2AnyRetrieval text, image eng
SciMMIRT2IRetrieval Any2AnyRetrieval text, image eng
SketchyI2IRetrieval Any2AnyRetrieval image eng
SOPI2IRetrieval Any2AnyRetrieval image eng
StanfordCarsI2IRetrieval Any2AnyRetrieval image eng
TUBerlinT2IRetrieval Any2AnyRetrieval text, image eng
VidoreArxivQARetrieval DocumentUnderstanding text, image eng
VidoreDocVQARetrieval DocumentUnderstanding text, image eng
VidoreInfoVQARetrieval DocumentUnderstanding text, image eng
VidoreTabfquadRetrieval DocumentUnderstanding text, image eng
VidoreTatdqaRetrieval DocumentUnderstanding text, image eng
VidoreShiftProjectRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAAIRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAEnergyRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAGovernmentReportsRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAHealthcareIndustryRetrieval DocumentUnderstanding text, image eng
VisualNewsI2TRetrieval Any2AnyRetrieval image, text eng
VisualNewsT2IRetrieval Any2AnyRetrieval image, text eng
VizWizIT2TRetrieval Any2AnyRetrieval text, image eng
VQA2IT2TRetrieval Any2AnyRetrieval text, image eng
WebQAT2ITRetrieval Any2AnyRetrieval image, text eng
WebQAT2TRetrieval Any2AnyRetrieval text eng
WITT2IRetrieval Any2AnyMultilingualRetrieval text, image ara, bul, dan, ell, eng, ... (11)
XFlickr30kCoT2IRetrieval Any2AnyMultilingualRetrieval text, image deu, eng, ind, jpn, rus, ... (8)
XM3600T2IRetrieval Any2AnyMultilingualRetrieval text, image ara, ben, ces, dan, deu, ... (36)
VisualSTS17Eng VisualSTS(eng) image eng
VisualSTS-b-Eng VisualSTS(eng) image eng
VisualSTS17Multilingual VisualSTS(multi) image ara, deu, eng, fra, ita, ... (9)
VisualSTS-b-Multilingual VisualSTS(multi) image cmn, deu, fra, ita, nld, ... (9)
Citation
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  doi = {10.48550/ARXIV.2504.10471},
  journal = {arXiv preprint arXiv:2504.10471},
  publisher = {arXiv},
  title = {MIEB: Massive Image Embedding Benchmark},
  url = {https://arxiv.org/abs/2504.10471},
  year = {2025},
}

MIEB(eng)

MIEB(eng) is a comprehensive image embedding benchmark spanning 8 task types and 125 tasks. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes tasks in compositionality evaluation, document understanding, visual STS, and CV-centric tasks.

Learn more →

Tasks
name type modalities languages
Birdsnap ImageClassification image eng
Caltech101 ImageClassification image eng
CIFAR10 ImageClassification image eng
CIFAR100 ImageClassification image eng
Country211 ImageClassification image eng
DTD ImageClassification image eng
EuroSAT ImageClassification image eng
FER2013 ImageClassification image eng
FGVCAircraft ImageClassification image eng
Food101Classification ImageClassification image eng
GTSRB ImageClassification image eng
Imagenet1k ImageClassification image eng
MNIST ImageClassification image eng
OxfordFlowersClassification ImageClassification image eng
OxfordPets ImageClassification image eng
PatchCamelyon ImageClassification image eng
RESISC45 ImageClassification image eng
StanfordCars ImageClassification image eng
STL10 ImageClassification image eng
SUN397 ImageClassification image eng
UCF101 ImageClassification image eng
VOC2007 ImageClassification image eng
CIFAR10Clustering ImageClustering image eng
CIFAR100Clustering ImageClustering image eng
ImageNetDog15Clustering ImageClustering image eng
ImageNet10Clustering ImageClustering image eng
TinyImageNetClustering ImageClustering image eng
BirdsnapZeroShot ZeroShotClassification image, text eng
Caltech101ZeroShot ZeroShotClassification text, image eng
CIFAR10ZeroShot ZeroShotClassification text, image eng
CIFAR100ZeroShot ZeroShotClassification text, image eng
CLEVRZeroShot ZeroShotClassification text, image eng
CLEVRCountZeroShot ZeroShotClassification text, image eng
Country211ZeroShot ZeroShotClassification image, text eng
DTDZeroShot ZeroShotClassification image, text eng
EuroSATZeroShot ZeroShotClassification image, text eng
FER2013ZeroShot ZeroShotClassification image, text eng
FGVCAircraftZeroShot ZeroShotClassification text, image eng
Food101ZeroShot ZeroShotClassification text, image eng
GTSRBZeroShot ZeroShotClassification image eng
Imagenet1kZeroShot ZeroShotClassification image, text eng
MNISTZeroShot ZeroShotClassification image, text eng
OxfordPetsZeroShot ZeroShotClassification text, image eng
PatchCamelyonZeroShot ZeroShotClassification image, text eng
RenderedSST2 ZeroShotClassification text, image eng
RESISC45ZeroShot ZeroShotClassification image, text eng
StanfordCarsZeroShot ZeroShotClassification image, text eng
STL10ZeroShot ZeroShotClassification image, text eng
SUN397ZeroShot ZeroShotClassification image, text eng
UCF101ZeroShot ZeroShotClassification image, text eng
BLINKIT2IMultiChoice VisionCentricQA text, image eng
BLINKIT2TMultiChoice VisionCentricQA text, image eng
CVBenchCount VisionCentricQA image, text eng
CVBenchRelation VisionCentricQA text, image eng
CVBenchDepth VisionCentricQA text, image eng
CVBenchDistance VisionCentricQA text, image eng
AROCocoOrder Compositionality text, image eng
AROFlickrOrder Compositionality text, image eng
AROVisualAttribution Compositionality text, image eng
AROVisualRelation Compositionality text, image eng
SugarCrepe Compositionality text, image eng
Winoground Compositionality text, image eng
ImageCoDe Compositionality text, image eng
STS12VisualSTS VisualSTS(eng) image eng
STS13VisualSTS VisualSTS(eng) image eng
STS14VisualSTS VisualSTS(eng) image eng
STS15VisualSTS VisualSTS(eng) image eng
STS16VisualSTS VisualSTS(eng) image eng
BLINKIT2IRetrieval Any2AnyRetrieval text, image eng
BLINKIT2TRetrieval Any2AnyRetrieval text, image eng
CIRRIT2IRetrieval Any2AnyRetrieval text, image eng
CUB200I2IRetrieval Any2AnyRetrieval image eng
EDIST2ITRetrieval Any2AnyRetrieval text, image eng
Fashion200kI2TRetrieval Any2AnyRetrieval text, image eng
Fashion200kT2IRetrieval Any2AnyRetrieval text, image eng
FashionIQIT2IRetrieval Any2AnyRetrieval text, image eng
Flickr30kI2TRetrieval Any2AnyRetrieval text, image eng
Flickr30kT2IRetrieval Any2AnyRetrieval text, image eng
FORBI2IRetrieval Any2AnyRetrieval image eng
GLDv2I2IRetrieval Any2AnyRetrieval image eng
GLDv2I2TRetrieval Any2AnyRetrieval text, image eng
HatefulMemesI2TRetrieval Any2AnyRetrieval text, image eng
HatefulMemesT2IRetrieval Any2AnyRetrieval text, image eng
ImageCoDeT2IRetrieval Any2AnyRetrieval text, image eng
InfoSeekIT2ITRetrieval Any2AnyRetrieval text, image eng
InfoSeekIT2TRetrieval Any2AnyRetrieval text, image eng
MemotionI2TRetrieval Any2AnyRetrieval text, image eng
MemotionT2IRetrieval Any2AnyRetrieval text, image eng
METI2IRetrieval Any2AnyRetrieval image eng
MSCOCOI2TRetrieval Any2AnyRetrieval text, image eng
MSCOCOT2IRetrieval Any2AnyRetrieval text, image eng
NIGHTSI2IRetrieval Any2AnyRetrieval image eng
OVENIT2ITRetrieval Any2AnyRetrieval image, text eng
OVENIT2TRetrieval Any2AnyRetrieval text, image eng
ROxfordEasyI2IRetrieval Any2AnyRetrieval image eng
ROxfordMediumI2IRetrieval Any2AnyRetrieval image eng
ROxfordHardI2IRetrieval Any2AnyRetrieval image eng
RP2kI2IRetrieval Any2AnyRetrieval image eng
RParisEasyI2IRetrieval Any2AnyRetrieval image eng
RParisMediumI2IRetrieval Any2AnyRetrieval image eng
RParisHardI2IRetrieval Any2AnyRetrieval image eng
SciMMIRI2TRetrieval Any2AnyRetrieval text, image eng
SciMMIRT2IRetrieval Any2AnyRetrieval text, image eng
SketchyI2IRetrieval Any2AnyRetrieval image eng
SOPI2IRetrieval Any2AnyRetrieval image eng
StanfordCarsI2IRetrieval Any2AnyRetrieval image eng
TUBerlinT2IRetrieval Any2AnyRetrieval text, image eng
VidoreArxivQARetrieval DocumentUnderstanding text, image eng
VidoreDocVQARetrieval DocumentUnderstanding text, image eng
VidoreInfoVQARetrieval DocumentUnderstanding text, image eng
VidoreTabfquadRetrieval DocumentUnderstanding text, image eng
VidoreTatdqaRetrieval DocumentUnderstanding text, image eng
VidoreShiftProjectRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAAIRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAEnergyRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAGovernmentReportsRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAHealthcareIndustryRetrieval DocumentUnderstanding text, image eng
VisualNewsI2TRetrieval Any2AnyRetrieval image, text eng
VisualNewsT2IRetrieval Any2AnyRetrieval image, text eng
VizWizIT2TRetrieval Any2AnyRetrieval text, image eng
VQA2IT2TRetrieval Any2AnyRetrieval text, image eng
WebQAT2ITRetrieval Any2AnyRetrieval image, text eng
WebQAT2TRetrieval Any2AnyRetrieval text eng
VisualSTS17Eng VisualSTS(eng) image eng
VisualSTS-b-Eng VisualSTS(eng) image eng
Citation
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  doi = {10.48550/ARXIV.2504.10471},
  journal = {arXiv preprint arXiv:2504.10471},
  publisher = {arXiv},
  title = {MIEB: Massive Image Embedding Benchmark},
  url = {https://arxiv.org/abs/2504.10471},
  year = {2025},
}

MIEB(lite)

MIEB(lite) is a lightweight version of MIEB(Multilingual), spanning 10 task types and 51 tasks. It is designed to run at a fraction of the cost while preserving the relative ranking of models.

Learn more →

Tasks
name type modalities languages
Country211 ImageClassification image eng
DTD ImageClassification image eng
EuroSAT ImageClassification image eng
GTSRB ImageClassification image eng
OxfordPets ImageClassification image eng
PatchCamelyon ImageClassification image eng
RESISC45 ImageClassification image eng
SUN397 ImageClassification image eng
ImageNetDog15Clustering ImageClustering image eng
TinyImageNetClustering ImageClustering image eng
CIFAR100ZeroShot ZeroShotClassification text, image eng
Country211ZeroShot ZeroShotClassification image, text eng
FER2013ZeroShot ZeroShotClassification image, text eng
FGVCAircraftZeroShot ZeroShotClassification text, image eng
Food101ZeroShot ZeroShotClassification text, image eng
OxfordPetsZeroShot ZeroShotClassification text, image eng
StanfordCarsZeroShot ZeroShotClassification image, text eng
BLINKIT2IMultiChoice VisionCentricQA text, image eng
CVBenchCount VisionCentricQA image, text eng
CVBenchRelation VisionCentricQA text, image eng
CVBenchDepth VisionCentricQA text, image eng
CVBenchDistance VisionCentricQA text, image eng
AROCocoOrder Compositionality text, image eng
AROFlickrOrder Compositionality text, image eng
AROVisualAttribution Compositionality text, image eng
AROVisualRelation Compositionality text, image eng
Winoground Compositionality text, image eng
ImageCoDe Compositionality text, image eng
STS13VisualSTS VisualSTS(eng) image eng
STS15VisualSTS VisualSTS(eng) image eng
VisualSTS17Multilingual VisualSTS(multi) image ara, deu, eng, fra, ita, ... (9)
VisualSTS-b-Multilingual VisualSTS(multi) image cmn, deu, fra, ita, nld, ... (9)
CIRRIT2IRetrieval Any2AnyRetrieval text, image eng
CUB200I2IRetrieval Any2AnyRetrieval image eng
Fashion200kI2TRetrieval Any2AnyRetrieval text, image eng
HatefulMemesI2TRetrieval Any2AnyRetrieval text, image eng
InfoSeekIT2TRetrieval Any2AnyRetrieval text, image eng
NIGHTSI2IRetrieval Any2AnyRetrieval image eng
OVENIT2TRetrieval Any2AnyRetrieval text, image eng
RP2kI2IRetrieval Any2AnyRetrieval image eng
VidoreDocVQARetrieval DocumentUnderstanding text, image eng
VidoreInfoVQARetrieval DocumentUnderstanding text, image eng
VidoreTabfquadRetrieval DocumentUnderstanding text, image eng
VidoreTatdqaRetrieval DocumentUnderstanding text, image eng
VidoreShiftProjectRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAAIRetrieval DocumentUnderstanding text, image eng
VisualNewsI2TRetrieval Any2AnyRetrieval image, text eng
VQA2IT2TRetrieval Any2AnyRetrieval text, image eng
WebQAT2ITRetrieval Any2AnyRetrieval image, text eng
WITT2IRetrieval Any2AnyMultilingualRetrieval text, image ara, bul, dan, ell, eng, ... (11)
XM3600T2IRetrieval Any2AnyMultilingualRetrieval text, image ara, ben, ces, dan, deu, ... (36)
Citation
@article{xiao2025mieb,
  author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
  doi = {10.48550/ARXIV.2504.10471},
  journal = {arXiv preprint arXiv:2504.10471},
  publisher = {arXiv},
  title = {MIEB: Massive Image Embedding Benchmark},
  url = {https://arxiv.org/abs/2504.10471},
  year = {2025},
}
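Because MIEB(lite) is drawn from the full MIEB task pool, its task list can be treated as a subset selection. A minimal stdlib-only sketch (task names copied, abbreviated, from the tables above; this is illustrative and not the mteb API) of checking which lite tasks also appear in a full task list:

```python
# Sketch: overlap between a lite benchmark's task list and the full suite.
# Task names are an abbreviated sample copied from the tables above.
full_tasks = {
    "OVENIT2TRetrieval", "RP2kI2IRetrieval", "VidoreDocVQARetrieval",
    "SciMMIRI2TRetrieval", "VisualNewsI2TRetrieval", "VQA2IT2TRetrieval",
}
lite_tasks = {
    "OVENIT2TRetrieval", "RP2kI2IRetrieval", "VidoreDocVQARetrieval",
    "VisualNewsI2TRetrieval", "VQA2IT2TRetrieval",
}

shared = lite_tasks & full_tasks    # tasks reused from the full suite
lite_only = lite_tasks - full_tasks # empty if lite is a true subset

print(sorted(shared))
print(lite_only)  # set() for this sample
```

On this sample every lite task is reused from the full suite, which matches the "fraction of the cost" framing: fewer tasks, same pool.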

MINERSBitextMining

Bitext mining tasks from the MINERS benchmark, which is designed to evaluate the ability of multilingual LMs on semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts.

Learn more →

Tasks
name type modalities languages
BUCC BitextMining text cmn, deu, eng, fra, rus
LinceMTBitextMining BitextMining text eng, hin
NollySentiBitextMining BitextMining text eng, hau, ibo, pcm, yor
NusaXBitextMining BitextMining text ace, ban, bbc, bjn, bug, ... (12)
NusaTranslationBitextMining BitextMining text abs, bbc, bew, bhp, ind, ... (12)
PhincBitextMining BitextMining text eng, hin
Tatoeba BitextMining text afr, amh, ang, ara, arq, ... (113)
Citation
@article{winata2024miners,
  author = {Winata, Genta Indra and Zhang, Ruochen and Adelani, David Ifeoluwa},
  journal = {arXiv preprint arXiv:2406.07424},
  title = {MINERS: Multilingual Language Models as Semantic Retrievers},
  year = {2024},
}

MTEB(Code, v1)

A massive code embedding benchmark covering retrieval tasks across a myriad of popular programming languages.

Tasks
name type modalities languages
AppsRetrieval Retrieval text eng, python
CodeEditSearchRetrieval Retrieval text c, c++, go, java, javascript, ... (13)
CodeFeedbackMT Retrieval text eng
CodeFeedbackST Retrieval text eng
CodeSearchNetCCRetrieval Retrieval text go, java, javascript, php, python, ... (6)
CodeSearchNetRetrieval Retrieval text go, java, javascript, php, python, ... (6)
CodeTransOceanContest Retrieval text c++, python
CodeTransOceanDL Retrieval text python
CosQA Retrieval text eng, python
COIRCodeSearchNetRetrieval Retrieval text go, java, javascript, php, python, ... (6)
StackOverflowQA Retrieval text eng
SyntheticText2SQL Retrieval text eng, sql
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}
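The table above mixes natural and programming languages in its language column, so a common first step is to select the tasks that cover a given programming language. A stdlib-only sketch (rows are an abbreviated sample copied from the table; not the mteb API):

```python
# Sketch: filtering the MTEB(Code, v1) task table by programming language.
# Rows are (name, type, languages) tuples copied from the table above.
tasks = [
    ("AppsRetrieval", "Retrieval", ["eng", "python"]),
    ("CodeTransOceanContest", "Retrieval", ["c++", "python"]),
    ("CodeTransOceanDL", "Retrieval", ["python"]),
    ("CosQA", "Retrieval", ["eng", "python"]),
    ("SyntheticText2SQL", "Retrieval", ["eng", "sql"]),
]

def tasks_for_language(rows, lang):
    """Return task names whose language list contains `lang`."""
    return [name for name, _, langs in rows if lang in langs]

print(tasks_for_language(tasks, "python"))
# ['AppsRetrieval', 'CodeTransOceanContest', 'CodeTransOceanDL', 'CosQA']
```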

MTEB(Europe, v1)

A regional geopolitical text embedding benchmark targeting embedding performance on European languages.

Tasks
name type modalities languages
BornholmBitextMining BitextMining text dan
BibleNLPBitextMining BitextMining text aai, aak, aau, aaz, abt, ... (829)
BUCC.v2 BitextMining text cmn, deu, eng, fra, rus
DiaBlaBitextMining BitextMining text eng, fra
FloresBitextMining BitextMining text ace, acm, acq, aeb, afr, ... (196)
NorwegianCourtsBitextMining BitextMining text nno, nob
NTREXBitextMining BitextMining text afr, amh, arb, aze, bak, ... (119)
BulgarianStoreReviewSentimentClassfication Classification text bul
CzechProductReviewSentimentClassification Classification text ces
GreekLegalCodeClassification Classification text ell
DBpediaClassification Classification text eng
FinancialPhrasebankClassification Classification text eng
PoemSentimentClassification Classification text eng
ToxicChatClassification Classification text eng
ToxicConversationsClassification Classification text eng
EstonianValenceClassification Classification text est
ItaCaseholdClassification Classification text ita
AmazonCounterfactualClassification Classification text deu, eng, jpn
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
MultiHateClassification Classification text ara, cmn, deu, eng, fra, ... (11)
NordicLangClassification Classification text dan, fao, isl, nno, nob, ... (6)
ScalaClassification Classification text dan, nno, nob, swe
SwissJudgementClassification Classification text deu, fra, ita
TweetSentimentClassification Classification text ara, deu, eng, fra, hin, ... (8)
CBD Classification text pol
PolEmo2.0-OUT Classification text pol
CSFDSKMovieReviewSentimentClassification Classification text slk
DalajClassification Classification text swe
WikiCitiesClustering Clustering text eng
RomaniBibleClustering Clustering text rom
BigPatentClustering.v2 Clustering text eng
BiorxivClusteringP2P.v2 Clustering text eng
AlloProfClusteringS2S.v2 Clustering text fra
HALClusteringS2S.v2 Clustering text fra
SIB200ClusteringS2S Clustering text ace, acm, acq, aeb, afr, ... (197)
WikiClusteringP2P.v2 Clustering text bos, cat, ces, dan, eus, ... (14)
StackOverflowQA Retrieval text eng
TwitterHjerneRetrieval Retrieval text dan
LegalQuAD Retrieval text deu
ArguAna Retrieval text eng
HagridRetrieval Retrieval text eng
LegalBenchCorporateLobbying Retrieval text eng
LEMBPasskeyRetrieval Retrieval text eng
SCIDOCS Retrieval text eng
SpartQA Retrieval text eng
TempReasonL1 Retrieval text eng
WinoGrande Retrieval text eng
AlloprofRetrieval Retrieval text fra
BelebeleRetrieval Retrieval text acm, afr, als, amh, apc, ... (115)
StatcanDialogueDatasetRetrieval Retrieval text eng, fra
WikipediaRetrievalMultilingual Retrieval text ben, bul, ces, dan, deu, ... (16)
Core17InstructionRetrieval InstructionReranking text eng
News21InstructionRetrieval InstructionReranking text eng
Robust04InstructionRetrieval InstructionReranking text eng
MalteseNewsClassification MultilabelClassification text mlt
MultiEURLEXMultilabelClassification MultilabelClassification text bul, ces, dan, deu, ell, ... (23)
CTKFactsNLI PairClassification text ces
SprintDuplicateQuestions PairClassification text eng
OpusparcusPC PairClassification text deu, eng, fin, fra, rus, ... (6)
RTE3 PairClassification text deu, eng, fra, ita
XNLI PairClassification text ara, bul, deu, ell, eng, ... (14)
PSC PairClassification text pol
WebLINXCandidatesReranking Reranking text eng
AlloprofReranking Reranking text fra
WikipediaRerankingMultilingual Reranking text ben, bul, ces, dan, deu, ... (16)
SICK-R STS text eng
STS12 STS text eng
STS14 STS text eng
STS15 STS text eng
STSBenchmark STS text eng
FinParaSTS STS text fin
STS17 STS text ara, deu, eng, fra, ita, ... (9)
SICK-R-PL STS text pol
STSES STS text spa
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}
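For a regional benchmark like this one, a natural question is which language codes the selected tasks jointly cover. A stdlib-only sketch over a few rows copied from the table above (truncated "…" language lists in the table are left out rather than guessed):

```python
# Sketch: union of language codes across a sample of MTEB(Europe, v1) rows.
rows = [
    ("BornholmBitextMining", ["dan"]),
    ("NorwegianCourtsBitextMining", ["nno", "nob"]),
    ("ScalaClassification", ["dan", "nno", "nob", "swe"]),
    ("SwissJudgementClassification", ["deu", "fra", "ita"]),
]

languages = sorted({lang for _, langs in rows for lang in langs})
print(languages)
# ['dan', 'deu', 'fra', 'ita', 'nno', 'nob', 'swe']
```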

MTEB(Indic, v1)

A regional geopolitical text embedding benchmark targeting embedding performance on Indic languages.

Tasks
name type modalities languages
IN22ConvBitextMining BitextMining text asm, ben, brx, doi, eng, ... (23)
IN22GenBitextMining BitextMining text asm, ben, brx, doi, eng, ... (23)
IndicGenBenchFloresBitextMining BitextMining text asm, awa, ben, bgc, bho, ... (30)
LinceMTBitextMining BitextMining text eng, hin
SIB200ClusteringS2S Clustering text ace, acm, acq, aeb, afr, ... (197)
BengaliSentimentAnalysis Classification text ben
GujaratiNewsClassification Classification text guj
HindiDiscourseClassification Classification text hin
SentimentAnalysisHindi Classification text hin
MalayalamNewsClassification Classification text mal
IndicLangClassification Classification text asm, ben, brx, doi, gom, ... (22)
MTOPIntentClassification Classification text deu, eng, fra, hin, spa, ... (6)
MultiHateClassification Classification text ara, cmn, deu, eng, fra, ... (11)
TweetSentimentClassification Classification text ara, deu, eng, fra, hin, ... (8)
NepaliNewsClassification Classification text nep
PunjabiNewsClassification Classification text pan
SanskritShlokasClassification Classification text san
UrduRomanSentimentClassification Classification text urd
XNLI PairClassification text ara, bul, deu, ell, eng, ... (14)
BelebeleRetrieval Retrieval text acm, afr, als, amh, apc, ... (115)
XQuADRetrieval Retrieval text arb, deu, ell, eng, hin, ... (12)
WikipediaRerankingMultilingual Reranking text ben, bul, ces, dan, deu, ... (16)
IndicCrosslingualSTS STS text asm, ben, eng, guj, hin, ... (13)
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}

MTEB(Law, v1)

A benchmark of retrieval tasks in the legal domain.

Tasks
name type modalities languages
AILACasedocs Retrieval text eng
AILAStatutes Retrieval text eng
LegalSummarization Retrieval text eng
GerDaLIRSmall Retrieval text deu
LeCaRDv2 Retrieval text zho
LegalBenchConsumerContractsQA Retrieval text eng
LegalBenchCorporateLobbying Retrieval text eng
LegalQuAD Retrieval text deu

MTEB(Medical, v1)

A curated set of MTEB tasks designed to evaluate systems in the context of medical information retrieval.

Tasks
name type modalities languages
CUREv1 Retrieval text eng, fra, spa
NFCorpus Retrieval text eng
TRECCOVID Retrieval text eng
TRECCOVID-PL Retrieval text pol
SciFact Retrieval text eng
SciFact-PL Retrieval text pol
MedicalQARetrieval Retrieval text eng
PublicHealthQA Retrieval text ara, eng, fra, kor, rus, ... (8)
MedrxivClusteringP2P.v2 Clustering text eng
MedrxivClusteringS2S.v2 Clustering text eng
CmedqaRetrieval Retrieval text cmn
CMedQAv2-reranking Reranking text cmn
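The composition of a curated benchmark like this can be summarized by counting rows per task type. A stdlib-only sketch using the (name, type) pairs copied from the table above:

```python
# Sketch: summarizing the MTEB(Medical, v1) table by task type.
from collections import Counter

rows = [
    ("CUREv1", "Retrieval"), ("NFCorpus", "Retrieval"),
    ("TRECCOVID", "Retrieval"), ("TRECCOVID-PL", "Retrieval"),
    ("SciFact", "Retrieval"), ("SciFact-PL", "Retrieval"),
    ("MedicalQARetrieval", "Retrieval"), ("PublicHealthQA", "Retrieval"),
    ("MedrxivClusteringP2P.v2", "Clustering"),
    ("MedrxivClusteringS2S.v2", "Clustering"),
    ("CmedqaRetrieval", "Retrieval"), ("CMedQAv2-reranking", "Reranking"),
]

counts = Counter(task_type for _, task_type in rows)
print(counts)  # Counter({'Retrieval': 9, 'Clustering': 2, 'Reranking': 1})
```

The skew toward retrieval reflects the benchmark's stated focus on medical information retrieval.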

MTEB(Multilingual, v1)

A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages. This benchmark has been replaced by MTEB(Multilingual, v2) because one of the datasets included in v1 (SNLHierarchicalClustering) was removed from the Hugging Face Hub.

Learn more →

Tasks
name type modalities languages
BornholmBitextMining BitextMining text dan
BibleNLPBitextMining BitextMining text aai, aak, aau, aaz, abt, ... (829)
BUCC.v2 BitextMining text cmn, deu, eng, fra, rus
DiaBlaBitextMining BitextMining text eng, fra
FloresBitextMining BitextMining text ace, acm, acq, aeb, afr, ... (196)
IN22GenBitextMining BitextMining text asm, ben, brx, doi, eng, ... (23)
IndicGenBenchFloresBitextMining BitextMining text asm, awa, ben, bgc, bho, ... (30)
NollySentiBitextMining BitextMining text eng, hau, ibo, pcm, yor
NorwegianCourtsBitextMining BitextMining text nno, nob
NTREXBitextMining BitextMining text afr, amh, arb, aze, bak, ... (119)
NusaTranslationBitextMining BitextMining text abs, bbc, bew, bhp, ind, ... (12)
NusaXBitextMining BitextMining text ace, ban, bbc, bjn, bug, ... (12)
Tatoeba BitextMining text afr, amh, ang, ara, arq, ... (113)
BulgarianStoreReviewSentimentClassfication Classification text bul
CzechProductReviewSentimentClassification Classification text ces
GreekLegalCodeClassification Classification text ell
DBpediaClassification Classification text eng
FinancialPhrasebankClassification Classification text eng
PoemSentimentClassification Classification text eng
ToxicConversationsClassification Classification text eng
TweetTopicSingleClassification Classification text eng
EstonianValenceClassification Classification text est
FilipinoShopeeReviewsClassification Classification text fil
GujaratiNewsClassification Classification text guj
SentimentAnalysisHindi Classification text hin
IndonesianIdClickbaitClassification Classification text ind
ItaCaseholdClassification Classification text ita
KorSarcasmClassification Classification text kor
KurdishSentimentClassification Classification text kur
MacedonianTweetSentimentClassification Classification text mkd
AfriSentiClassification Classification text amh, arq, ary, hau, ibo, ... (12)
AmazonCounterfactualClassification Classification text deu, eng, jpn
CataloniaTweetClassification Classification text cat, spa
CyrillicTurkicLangClassification Classification text bak, chv, kaz, kir, krc, ... (9)
IndicLangClassification Classification text asm, ben, brx, doi, gom, ... (22)
MasakhaNEWSClassification Classification text amh, eng, fra, hau, ibo, ... (16)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MultiHateClassification Classification text ara, cmn, deu, eng, fra, ... (11)
NordicLangClassification Classification text dan, fao, isl, nno, nob, ... (6)
NusaParagraphEmotionClassification Classification text bbc, bew, bug, jav, mad, ... (10)
NusaX-senti Classification text ace, ban, bbc, bjn, bug, ... (12)
ScalaClassification Classification text dan, nno, nob, swe
SwissJudgementClassification Classification text deu, fra, ita
NepaliNewsClassification Classification text nep
OdiaNewsClassification Classification text ory
PunjabiNewsClassification Classification text pan
PolEmo2.0-OUT Classification text pol
PAC Classification text pol
SinhalaNewsClassification Classification text sin
CSFDSKMovieReviewSentimentClassification Classification text slk
SiswatiNewsClassification Classification text ssw
SlovakMovieReviewSentimentClassification Classification text svk
SwahiliNewsClassification Classification text swa
DalajClassification Classification text swe
TswanaNewsClassification Classification text tsn
IsiZuluNewsClassification Classification text zul
WikiCitiesClustering Clustering text eng
MasakhaNEWSClusteringS2S Clustering text amh, eng, fra, hau, ibo, ... (16)
RomaniBibleClustering Clustering text rom
ArXivHierarchicalClusteringP2P Clustering text eng
ArXivHierarchicalClusteringS2S Clustering text eng
BigPatentClustering.v2 Clustering text eng
BiorxivClusteringP2P.v2 Clustering text eng
MedrxivClusteringP2P.v2 Clustering text eng
StackExchangeClustering.v2 Clustering text eng
AlloProfClusteringS2S.v2 Clustering text fra
HALClusteringS2S.v2 Clustering text fra
SIB200ClusteringS2S Clustering text ace, acm, acq, aeb, afr, ... (197)
WikiClusteringP2P.v2 Clustering text bos, cat, ces, dan, eus, ... (14)
PlscClusteringP2P.v2 Clustering text pol
SwednClusteringP2P Clustering text swe
CLSClusteringP2P.v2 Clustering text cmn
StackOverflowQA Retrieval text eng
TwitterHjerneRetrieval Retrieval text dan
AILAStatutes Retrieval text eng
ArguAna Retrieval text eng
HagridRetrieval Retrieval text eng
LegalBenchCorporateLobbying Retrieval text eng
LEMBPasskeyRetrieval Retrieval text eng
SCIDOCS Retrieval text eng
SpartQA Retrieval text eng
TempReasonL1 Retrieval text eng
TRECCOVID Retrieval text eng
WinoGrande Retrieval text eng
BelebeleRetrieval Retrieval text acm, afr, als, amh, apc, ... (115)
MLQARetrieval Retrieval text ara, deu, eng, hin, spa, ... (7)
StatcanDialogueDatasetRetrieval Retrieval text eng, fra
WikipediaRetrievalMultilingual Retrieval text ben, bul, ces, dan, deu, ... (16)
CovidRetrieval Retrieval text cmn
Core17InstructionRetrieval InstructionReranking text eng
News21InstructionRetrieval InstructionReranking text eng
Robust04InstructionRetrieval InstructionReranking text eng
KorHateSpeechMLClassification MultilabelClassification text kor
MalteseNewsClassification MultilabelClassification text mlt
MultiEURLEXMultilabelClassification MultilabelClassification text bul, ces, dan, deu, ell, ... (23)
BrazilianToxicTweetsClassification MultilabelClassification text por
CEDRClassification MultilabelClassification text rus
CTKFactsNLI PairClassification text ces
SprintDuplicateQuestions PairClassification text eng
TwitterURLCorpus PairClassification text eng
ArmenianParaphrasePC PairClassification text hye
indonli PairClassification text ind
OpusparcusPC PairClassification text deu, eng, fin, fra, rus, ... (6)
PawsXPairClassification PairClassification text cmn, deu, eng, fra, jpn, ... (7)
RTE3 PairClassification text deu, eng, fra, ita
XNLI PairClassification text ara, bul, deu, ell, eng, ... (14)
PpcPC PairClassification text pol
TERRa PairClassification text rus
WebLINXCandidatesReranking Reranking text eng
AlloprofReranking Reranking text fra
VoyageMMarcoReranking Reranking text jpn
WikipediaRerankingMultilingual Reranking text ben, bul, ces, dan, deu, ... (16)
RuBQReranking Reranking text rus
T2Reranking Reranking text cmn
GermanSTSBenchmark STS text deu
SICK-R STS text eng
STS12 STS text eng
STS13 STS text eng
STS14 STS text eng
STS15 STS text eng
STSBenchmark STS text eng
FaroeseSTS STS text fao
FinParaSTS STS text fin
JSICK STS text jpn
IndicCrosslingualSTS STS text asm, ben, eng, guj, hin, ... (13)
SemRel24STS STS text afr, amh, arb, arq, ary, ... (12)
STS17 STS text ara, deu, eng, fra, ita, ... (9)
STS22.v2 STS text ara, cmn, deu, eng, fra, ... (10)
STSES STS text spa
STSB STS text cmn
MIRACLRetrievalHardNegatives Retrieval text ara, ben, deu, eng, fas, ... (18)
SNLHierarchicalClusteringP2P Clustering text nob
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}
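Per the description above, v2 exists because v1 included a dataset that was removed from the Hugging Face Hub. The difference between the two task lists can be computed as a set difference; a stdlib-only sketch with abbreviated name lists copied from the tables:

```python
# Sketch: diffing the v1 and v2 task lists. v1 includes
# SNLHierarchicalClusteringP2P (removed from the Hugging Face Hub);
# v2 drops it. Lists abbreviated from the tables above.
v1_tasks = {"BornholmBitextMining", "MIRACLRetrievalHardNegatives",
            "SNLHierarchicalClusteringP2P"}
v2_tasks = {"BornholmBitextMining", "MIRACLRetrievalHardNegatives"}

removed = v1_tasks - v2_tasks
print(removed)  # {'SNLHierarchicalClusteringP2P'}
```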

MTEB(Multilingual, v2)

A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages.

Learn more →

Tasks
name type modalities languages
BornholmBitextMining BitextMining text dan
BibleNLPBitextMining BitextMining text aai, aak, aau, aaz, abt, ... (829)
BUCC.v2 BitextMining text cmn, deu, eng, fra, rus
DiaBlaBitextMining BitextMining text eng, fra
FloresBitextMining BitextMining text ace, acm, acq, aeb, afr, ... (196)
IN22GenBitextMining BitextMining text asm, ben, brx, doi, eng, ... (23)
IndicGenBenchFloresBitextMining BitextMining text asm, awa, ben, bgc, bho, ... (30)
NollySentiBitextMining BitextMining text eng, hau, ibo, pcm, yor
NorwegianCourtsBitextMining BitextMining text nno, nob
NTREXBitextMining BitextMining text afr, amh, arb, aze, bak, ... (119)
NusaTranslationBitextMining BitextMining text abs, bbc, bew, bhp, ind, ... (12)
NusaXBitextMining BitextMining text ace, ban, bbc, bjn, bug, ... (12)
Tatoeba BitextMining text afr, amh, ang, ara, arq, ... (113)
BulgarianStoreReviewSentimentClassfication Classification text bul
CzechProductReviewSentimentClassification Classification text ces
GreekLegalCodeClassification Classification text ell
DBpediaClassification Classification text eng
FinancialPhrasebankClassification Classification text eng
PoemSentimentClassification Classification text eng
ToxicConversationsClassification Classification text eng
TweetTopicSingleClassification Classification text eng
EstonianValenceClassification Classification text est
FilipinoShopeeReviewsClassification Classification text fil
GujaratiNewsClassification Classification text guj
SentimentAnalysisHindi Classification text hin
IndonesianIdClickbaitClassification Classification text ind
ItaCaseholdClassification Classification text ita
KorSarcasmClassification Classification text kor
KurdishSentimentClassification Classification text kur
MacedonianTweetSentimentClassification Classification text mkd
AfriSentiClassification Classification text amh, arq, ary, hau, ibo, ... (12)
AmazonCounterfactualClassification Classification text deu, eng, jpn
CataloniaTweetClassification Classification text cat, spa
CyrillicTurkicLangClassification Classification text bak, chv, kaz, kir, krc, ... (9)
IndicLangClassification Classification text asm, ben, brx, doi, gom, ... (22)
MasakhaNEWSClassification Classification text amh, eng, fra, hau, ibo, ... (16)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MultiHateClassification Classification text ara, cmn, deu, eng, fra, ... (11)
NordicLangClassification Classification text dan, fao, isl, nno, nob, ... (6)
NusaParagraphEmotionClassification Classification text bbc, bew, bug, jav, mad, ... (10)
NusaX-senti Classification text ace, ban, bbc, bjn, bug, ... (12)
ScalaClassification Classification text dan, nno, nob, swe
SwissJudgementClassification Classification text deu, fra, ita
NepaliNewsClassification Classification text nep
OdiaNewsClassification Classification text ory
PunjabiNewsClassification Classification text pan
PolEmo2.0-OUT Classification text pol
PAC Classification text pol
SinhalaNewsClassification Classification text sin
CSFDSKMovieReviewSentimentClassification Classification text slk
SiswatiNewsClassification Classification text ssw
SlovakMovieReviewSentimentClassification Classification text svk
SwahiliNewsClassification Classification text swa
DalajClassification Classification text swe
TswanaNewsClassification Classification text tsn
IsiZuluNewsClassification Classification text zul
WikiCitiesClustering Clustering text eng
MasakhaNEWSClusteringS2S Clustering text amh, eng, fra, hau, ibo, ... (16)
RomaniBibleClustering Clustering text rom
ArXivHierarchicalClusteringP2P Clustering text eng
ArXivHierarchicalClusteringS2S Clustering text eng
BigPatentClustering.v2 Clustering text eng
BiorxivClusteringP2P.v2 Clustering text eng
MedrxivClusteringP2P.v2 Clustering text eng
StackExchangeClustering.v2 Clustering text eng
AlloProfClusteringS2S.v2 Clustering text fra
HALClusteringS2S.v2 Clustering text fra
SIB200ClusteringS2S Clustering text ace, acm, acq, aeb, afr, ... (197)
WikiClusteringP2P.v2 Clustering text bos, cat, ces, dan, eus, ... (14)
PlscClusteringP2P.v2 Clustering text pol
SwednClusteringP2P Clustering text swe
CLSClusteringP2P.v2 Clustering text cmn
StackOverflowQA Retrieval text eng
TwitterHjerneRetrieval Retrieval text dan
AILAStatutes Retrieval text eng
ArguAna Retrieval text eng
HagridRetrieval Retrieval text eng
LegalBenchCorporateLobbying Retrieval text eng
LEMBPasskeyRetrieval Retrieval text eng
SCIDOCS Retrieval text eng
SpartQA Retrieval text eng
TempReasonL1 Retrieval text eng
TRECCOVID Retrieval text eng
WinoGrande Retrieval text eng
BelebeleRetrieval Retrieval text acm, afr, als, amh, apc, ... (115)
MLQARetrieval Retrieval text ara, deu, eng, hin, spa, ... (7)
StatcanDialogueDatasetRetrieval Retrieval text eng, fra
WikipediaRetrievalMultilingual Retrieval text ben, bul, ces, dan, deu, ... (16)
CovidRetrieval Retrieval text cmn
Core17InstructionRetrieval InstructionReranking text eng
News21InstructionRetrieval InstructionReranking text eng
Robust04InstructionRetrieval InstructionReranking text eng
KorHateSpeechMLClassification MultilabelClassification text kor
MalteseNewsClassification MultilabelClassification text mlt
MultiEURLEXMultilabelClassification MultilabelClassification text bul, ces, dan, deu, ell, ... (23)
BrazilianToxicTweetsClassification MultilabelClassification text por
CEDRClassification MultilabelClassification text rus
CTKFactsNLI PairClassification text ces
SprintDuplicateQuestions PairClassification text eng
TwitterURLCorpus PairClassification text eng
ArmenianParaphrasePC PairClassification text hye
indonli PairClassification text ind
OpusparcusPC PairClassification text deu, eng, fin, fra, rus, ... (6)
PawsXPairClassification PairClassification text cmn, deu, eng, fra, jpn, ... (7)
RTE3 PairClassification text deu, eng, fra, ita
XNLI PairClassification text ara, bul, deu, ell, eng, ... (14)
PpcPC PairClassification text pol
TERRa PairClassification text rus
WebLINXCandidatesReranking Reranking text eng
AlloprofReranking Reranking text fra
VoyageMMarcoReranking Reranking text jpn
WikipediaRerankingMultilingual Reranking text ben, bul, ces, dan, deu, ... (16)
RuBQReranking Reranking text rus
T2Reranking Reranking text cmn
GermanSTSBenchmark STS text deu
SICK-R STS text eng
STS12 STS text eng
STS13 STS text eng
STS14 STS text eng
STS15 STS text eng
STSBenchmark STS text eng
FaroeseSTS STS text fao
FinParaSTS STS text fin
JSICK STS text jpn
IndicCrosslingualSTS STS text asm, ben, eng, guj, hin, ... (13)
SemRel24STS STS text afr, amh, arb, arq, ary, ... (12)
STS17 STS text ara, deu, eng, fra, ita, ... (9)
STS22.v2 STS text ara, cmn, deu, eng, fra, ... (10)
STSES STS text spa
STSB STS text cmn
MIRACLRetrievalHardNegatives Retrieval text ara, ben, deu, eng, fas, ... (18)
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}

MTEB(Scandinavian, v1)

A curated selection of tasks covering the Scandinavian languages: Danish, Swedish, and Norwegian (including Bokmål and Nynorsk).

Learn more →

Tasks
name type modalities languages
BornholmBitextMining BitextMining text dan
NorwegianCourtsBitextMining BitextMining text nno, nob
AngryTweetsClassification Classification text dan
DanishPoliticalCommentsClassification Classification text dan
DalajClassification Classification text swe
DKHateClassification Classification text dan
LccSentimentClassification Classification text dan
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
NordicLangClassification Classification text dan, fao, isl, nno, nob, ... (6)
NoRecClassification Classification text nob
NorwegianParliamentClassification Classification text nob
ScalaClassification Classification text dan, nno, nob, swe
SwedishSentimentClassification Classification text swe
SweRecClassification Classification text swe
DanFeverRetrieval Retrieval text dan
NorQuadRetrieval Retrieval text nob
SNLRetrieval Retrieval text nob
SwednRetrieval Retrieval text swe
SweFaqRetrieval Retrieval text swe
TV2Nordretrieval Retrieval text dan
TwitterHjerneRetrieval Retrieval text dan
SNLHierarchicalClusteringS2S Clustering text nob
SNLHierarchicalClusteringP2P Clustering text nob
SwednClusteringP2P Clustering text swe
SwednClusteringS2S Clustering text swe
VGHierarchicalClusteringS2S Clustering text nob
VGHierarchicalClusteringP2P Clustering text nob
Citation
@inproceedings{enevoldsen2024scandinavian,
  author = {Enevoldsen, Kenneth and Kardos, M{\'a}rton and Muennighoff, Niklas and Nielbo, Kristoffer},
  booktitle = {Advances in Neural Information Processing Systems},
  title = {The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding},
  url = {https://nips.cc/virtual/2024/poster/97869},
  year = {2024},
}

MTEB(cmn, v1)

The Chinese Massive Text Embedding Benchmark (C-MTEB) is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets.

Learn more →

Tasks
name type modalities languages
T2Retrieval Retrieval text cmn
MMarcoRetrieval Retrieval text cmn
DuRetrieval Retrieval text cmn
CovidRetrieval Retrieval text cmn
CmedqaRetrieval Retrieval text cmn
EcomRetrieval Retrieval text cmn
MedicalRetrieval Retrieval text cmn
VideoRetrieval Retrieval text cmn
T2Reranking Reranking text cmn
MMarcoReranking Reranking text cmn
CMedQAv1-reranking Reranking text cmn
CMedQAv2-reranking Reranking text cmn
Ocnli PairClassification text cmn
Cmnli PairClassification text cmn
CLSClusteringS2S Clustering text cmn
CLSClusteringP2P Clustering text cmn
ThuNewsClusteringS2S Clustering text cmn
ThuNewsClusteringP2P Clustering text cmn
LCQMC STS text cmn
PAWSX STS text cmn
AFQMC STS text cmn
QBQTC STS text cmn
TNews Classification text cmn
IFlyTek Classification text cmn
Waimai Classification text cmn
OnlineShopping Classification text cmn
JDReview Classification text cmn
MultilingualSentiment Classification text cmn
ATEC STS text cmn
BQ STS text cmn
STSB STS text cmn
Citation
@misc{c-pack,
  archiveprefix = {arXiv},
  author = {Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
  eprint = {2309.07597},
  primaryclass = {cs.CL},
  title = {C-Pack: Packaged Resources To Advance General Chinese Embedding},
  year = {2023},
}
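
The STS tasks above (LCQMC, AFQMC, ATEC, BQ, STSB) are typically scored as the Spearman correlation between the cosine similarity of the two sentence embeddings and the human similarity rating. A minimal sketch of that scoring, using toy hand-made embeddings and ratings rather than a real model:

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of ranks (no tie handling)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy sentence-pair embeddings (hypothetical) and human similarity ratings
pairs = [([1.0, 0.0], [1.0, 0.0]),
         ([1.0, 0.0], [0.0, 1.0]),
         ([1.0, 1.0], [1.0, 0.0])]
gold = [5.0, 0.5, 3.0]
predicted = [cosine(u, v) for u, v in pairs]
score = spearman(predicted, gold)
```

Because Spearman only compares rankings, a model is rewarded for ordering pairs correctly even if its raw cosine values are on a different scale than the human ratings.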

MTEB(deu, v1)

A benchmark for text-embedding performance in German.

Learn more →

Tasks
name type modalities languages
AmazonCounterfactualClassification Classification text deu, eng, jpn
AmazonReviewsClassification Classification text cmn, deu, eng, fra, jpn, ... (6)
MTOPDomainClassification Classification text deu, eng, fra, hin, spa, ... (6)
MTOPIntentClassification Classification text deu, eng, fra, hin, spa, ... (6)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
BlurbsClusteringP2P Clustering text deu
BlurbsClusteringS2S Clustering text deu
TenKGnadClusteringP2P Clustering text deu
TenKGnadClusteringS2S Clustering text deu
FalseFriendsGermanEnglish PairClassification text deu
PawsXPairClassification PairClassification text cmn, deu, eng, fra, jpn, ... (7)
MIRACLReranking Reranking text ara, ben, deu, eng, fas, ... (18)
GermanQuAD-Retrieval Retrieval text deu
GermanDPR Retrieval text deu
XMarket Retrieval text deu, eng, spa
GerDaLIR Retrieval text deu
GermanSTSBenchmark STS text deu
STS22 STS text ara, cmn, deu, eng, fra, ... (10)
Citation
@misc{wehrli2024germantextembeddingclustering,
  archiveprefix = {arXiv},
  author = {Silvan Wehrli and Bert Arnrich and Christopher Irrgang},
  eprint = {2401.02709},
  primaryclass = {cs.CL},
  title = {German Text Embedding Clustering Benchmark},
  url = {https://arxiv.org/abs/2401.02709},
  year = {2024},
}

MTEB(eng, v1)

The original English benchmark by Muennighoff et al. (2023). This page is an adaptation of the old MTEB leaderboard. We recommend using MTEB(eng, v2) instead: it uses updated versions of the tasks, which makes it notably faster to run and resolves a known bug in the existing tasks. MTEB(eng, v2) also removes datasets commonly used for fine-tuning, such as MSMARCO, which makes model scores more comparable. In general, however, both benchmarks yield similar estimates.

Tasks
name type modalities languages
AmazonPolarityClassification Classification text eng
AmazonReviewsClassification Classification text cmn, deu, eng, fra, jpn, ... (6)
ArguAna Retrieval text eng
ArxivClusteringP2P Clustering text eng
ArxivClusteringS2S Clustering text eng
AskUbuntuDupQuestions Reranking text eng
BIOSSES STS text eng
Banking77Classification Classification text eng
BiorxivClusteringP2P Clustering text eng
BiorxivClusteringS2S Clustering text eng
CQADupstackRetrieval Retrieval text eng
ClimateFEVER Retrieval text eng
DBPedia Retrieval text eng
EmotionClassification Classification text eng
FEVER Retrieval text eng
FiQA2018 Retrieval text eng
HotpotQA Retrieval text eng
ImdbClassification Classification text eng
MTOPDomainClassification Classification text deu, eng, fra, hin, spa, ... (6)
MTOPIntentClassification Classification text deu, eng, fra, hin, spa, ... (6)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
MedrxivClusteringP2P Clustering text eng
MedrxivClusteringS2S Clustering text eng
MindSmallReranking Reranking text eng
NFCorpus Retrieval text eng
NQ Retrieval text eng
QuoraRetrieval Retrieval text eng
RedditClustering Clustering text eng
RedditClusteringP2P Clustering text eng
SCIDOCS Retrieval text eng
SICK-R STS text eng
STS12 STS text eng
STS13 STS text eng
STS14 STS text eng
STS15 STS text eng
STS16 STS text eng
STSBenchmark STS text eng
SciDocsRR Reranking text eng
SciFact Retrieval text eng
SprintDuplicateQuestions PairClassification text eng
StackExchangeClustering Clustering text eng
StackExchangeClusteringP2P Clustering text eng
StackOverflowDupQuestions Reranking text eng
SummEval Summarization text eng
TRECCOVID Retrieval text eng
Touche2020 Retrieval text eng
ToxicConversationsClassification Classification text eng
TweetSentimentExtractionClassification Classification text eng
TwentyNewsgroupsClustering Clustering text eng
TwitterSemEval2015 PairClassification text eng
TwitterURLCorpus PairClassification text eng
MSMARCO Retrieval text eng
AmazonCounterfactualClassification Classification text deu, eng, jpn
STS17 STS text ara, deu, eng, fra, ita, ... (9)
STS22 STS text ara, cmn, deu, eng, fra, ... (10)
Citation
@article{muennighoff2022mteb,
  author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Loïc and Reimers, Nils},
  doi = {10.48550/ARXIV.2210.07316},
  journal = {arXiv preprint arXiv:2210.07316},
  publisher = {arXiv},
  title = {MTEB: Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2210.07316},
  year = {2022},
}

MTEB(eng, v2)

The new English Massive Text Embedding Benchmark. It was created to account for the fact that many models have now been fine-tuned on tasks in the original MTEB, and it therefore emphasizes tasks that are less frequently used for model training. This way, the benchmark and leaderboard give users a more realistic picture of models' generalization performance.

The original MTEB leaderboard is available under the MTEB(eng, v1) tab.
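
Leaderboard summary numbers are simple aggregates of the per-task main scores. As a rough sketch (with made-up scores, not real results), the two common aggregations are the plain mean over all tasks and the mean of per-task-type means; the latter keeps heavily represented task types from dominating the overall number:

```python
from statistics import mean

# Hypothetical per-task main scores: task -> (task type, score).
# Illustrative numbers only, not real leaderboard results.
scores = {
    "Banking77Classification": ("Classification", 0.82),
    "ImdbClassification": ("Classification", 0.78),
    "ArguAna": ("Retrieval", 0.55),
    "TRECCOVID": ("Retrieval", 0.61),
    "STS12": ("STS", 0.74),
}

# Straight mean over all tasks
overall = mean(s for _, s in scores.values())

# Mean of per-task-type means
by_type = {}
for task_type, score in scores.values():
    by_type.setdefault(task_type, []).append(score)
balanced = mean(mean(v) for v in by_type.values())

print(round(overall, 4), round(balanced, 4))
```

With these toy numbers the two aggregations differ slightly because Classification contributes two tasks while STS contributes one.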

Tasks
name type modalities languages
ArguAna Retrieval text eng
ArXivHierarchicalClusteringP2P Clustering text eng
ArXivHierarchicalClusteringS2S Clustering text eng
AskUbuntuDupQuestions Reranking text eng
BIOSSES STS text eng
Banking77Classification Classification text eng
BiorxivClusteringP2P.v2 Clustering text eng
CQADupstackGamingRetrieval Retrieval text eng
CQADupstackUnixRetrieval Retrieval text eng
ClimateFEVERHardNegatives Retrieval text eng
FEVERHardNegatives Retrieval text eng
FiQA2018 Retrieval text eng
HotpotQAHardNegatives Retrieval text eng
ImdbClassification Classification text eng
MTOPDomainClassification Classification text deu, eng, fra, hin, spa, ... (6)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
MedrxivClusteringP2P.v2 Clustering text eng
MedrxivClusteringS2S.v2 Clustering text eng
MindSmallReranking Reranking text eng
SCIDOCS Retrieval text eng
SICK-R STS text eng
STS12 STS text eng
STS13 STS text eng
STS14 STS text eng
STS15 STS text eng
STSBenchmark STS text eng
SprintDuplicateQuestions PairClassification text eng
StackExchangeClustering.v2 Clustering text eng
StackExchangeClusteringP2P.v2 Clustering text eng
TRECCOVID Retrieval text eng
Touche2020Retrieval.v3 Retrieval text eng
ToxicConversationsClassification Classification text eng
TweetSentimentExtractionClassification Classification text eng
TwentyNewsgroupsClustering.v2 Clustering text eng
TwitterSemEval2015 PairClassification text eng
TwitterURLCorpus PairClassification text eng
SummEvalSummarization.v2 Summarization text eng
AmazonCounterfactualClassification Classification text deu, eng, jpn
STS17 STS text ara, deu, eng, fra, ita, ... (9)
STS22.v2 STS text ara, cmn, deu, eng, fra, ... (10)
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  doi = {10.48550/arXiv.2502.13595},
  journal = {arXiv preprint arXiv:2502.13595},
  publisher = {arXiv},
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2502.13595},
  year = {2025},
}

MTEB(fas, v1)

The Persian Massive Text Embedding Benchmark (FaMTEB) is a comprehensive benchmark for Persian text embeddings covering 7 tasks and 60+ datasets.

Learn more →

Tasks
name type modalities languages
PersianFoodSentimentClassification Classification text fas
SynPerChatbotConvSAClassification Classification text fas
SynPerChatbotConvSAToneChatbotClassification Classification text fas
SynPerChatbotConvSAToneUserClassification Classification text fas
SynPerChatbotSatisfactionLevelClassification Classification text fas
SynPerChatbotRAGToneChatbotClassification Classification text fas
SynPerChatbotRAGToneUserClassification Classification text fas
SynPerChatbotToneChatbotClassification Classification text fas
SynPerChatbotToneUserClassification Classification text fas
SynPerTextToneClassification Classification text fas
SIDClassification Classification text fas
DeepSentiPers Classification text fas
PersianTextEmotion Classification text fas
SentimentDKSF Classification text fas
NLPTwitterAnalysisClassification Classification text fas
DigikalamagClassification Classification text fas
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
BeytooteClustering Clustering text fas
DigikalamagClustering Clustering text fas
HamshahriClustring Clustering text fas
NLPTwitterAnalysisClustering Clustering text fas
SIDClustring Clustering text fas
FarsTail PairClassification text fas
CExaPPC PairClassification text fas
SynPerChatbotRAGFAQPC PairClassification text fas
FarsiParaphraseDetection PairClassification text fas
SynPerTextKeywordsPC PairClassification text fas
SynPerQAPC PairClassification text fas
ParsinluEntail PairClassification text fas
ParsinluQueryParaphPC PairClassification text fas
MIRACLReranking Reranking text ara, ben, deu, eng, fas, ... (18)
WikipediaRerankingMultilingual Reranking text ben, bul, ces, dan, deu, ... (16)
SynPerQARetrieval Retrieval text fas
SynPerChatbotTopicsRetrieval Retrieval text fas
SynPerChatbotRAGTopicsRetrieval Retrieval text fas
SynPerChatbotRAGFAQRetrieval Retrieval text fas
PersianWebDocumentRetrieval Retrieval text fas
WikipediaRetrievalMultilingual Retrieval text ben, bul, ces, dan, deu, ... (16)
MIRACLRetrieval Retrieval text ara, ben, deu, eng, fas, ... (18)
ClimateFEVER-Fa Retrieval text fas
DBPedia-Fa Retrieval text fas
HotpotQA-Fa Retrieval text fas
MSMARCO-Fa Retrieval text fas
NQ-Fa Retrieval text fas
ArguAna-Fa Retrieval text fas
CQADupstackRetrieval-Fa Retrieval text fas
FiQA2018-Fa Retrieval text fas
NFCorpus-Fa Retrieval text fas
QuoraRetrieval-Fa Retrieval text fas
SCIDOCS-Fa Retrieval text fas
SciFact-Fa Retrieval text fas
TRECCOVID-Fa Retrieval text fas
Touche2020-Fa Retrieval text fas
Farsick STS text fas
SynPerSTS STS text fas
Query2Query STS text fas
SAMSumFa BitextMining text fas
SynPerChatbotSumSRetrieval BitextMining text fas
SynPerChatbotRAGSumSRetrieval BitextMining text fas
Citation
@article{zinvandi2025famteb,
  author = {Zinvandi, Erfan and Alikhani, Morteza and Sarmadi, Mehran and Pourbahman, Zahra and Arvin, Sepehr and Kazemi, Reza and Amini, Arash},
  journal = {arXiv preprint arXiv:2502.11571},
  title = {FaMTEB: Massive Text Embedding Benchmark in Persian Language},
  year = {2025},
}

MTEB(fra, v1)

MTEB-French, a French expansion of the original benchmark with high-quality native French datasets.

Learn more →

Tasks
name type modalities languages
AmazonReviewsClassification Classification text cmn, deu, eng, fra, jpn, ... (6)
MasakhaNEWSClassification Classification text amh, eng, fra, hau, ibo, ... (16)
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
MTOPDomainClassification Classification text deu, eng, fra, hin, spa, ... (6)
MTOPIntentClassification Classification text deu, eng, fra, hin, spa, ... (6)
AlloProfClusteringP2P Clustering text fra
AlloProfClusteringS2S Clustering text fra
HALClusteringS2S Clustering text fra
MasakhaNEWSClusteringP2P Clustering text amh, eng, fra, hau, ibo, ... (16)
MasakhaNEWSClusteringS2S Clustering text amh, eng, fra, hau, ibo, ... (16)
MLSUMClusteringP2P Clustering text deu, fra, rus, spa
MLSUMClusteringS2S Clustering text deu, fra, rus, spa
PawsXPairClassification PairClassification text cmn, deu, eng, fra, jpn, ... (7)
AlloprofReranking Reranking text fra
SyntecReranking Reranking text fra
AlloprofRetrieval Retrieval text fra
BSARDRetrieval Retrieval text fra
MintakaRetrieval Retrieval text ara, deu, fra, hin, ita, ... (8)
SyntecRetrieval Retrieval text fra
XPQARetrieval Retrieval text ara, cmn, deu, eng, fra, ... (13)
SICKFr STS text fra
STSBenchmarkMultilingualSTS STS text cmn, deu, eng, fra, ita, ... (10)
SummEvalFr Summarization text fra
STS22 STS text ara, cmn, deu, eng, fra, ... (10)
Citation
@misc{ciancone2024mtebfrenchresourcesfrenchsentence,
  archiveprefix = {arXiv},
  author = {Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
  eprint = {2405.20468},
  primaryclass = {cs.CL},
  title = {MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis},
  url = {https://arxiv.org/abs/2405.20468},
  year = {2024},
}

MTEB(jpn, v1)

JMTEB is a benchmark for evaluating Japanese text embedding models.

Learn more →

Tasks
name type modalities languages
LivedoorNewsClustering.v2 Clustering text jpn
MewsC16JaClustering Clustering text jpn
AmazonReviewsClassification Classification text cmn, deu, eng, fra, jpn, ... (6)
AmazonCounterfactualClassification Classification text deu, eng, jpn
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
JSTS STS text jpn
JSICK STS text jpn
PawsXPairClassification PairClassification text cmn, deu, eng, fra, jpn, ... (7)
JaqketRetrieval Retrieval text jpn
MrTidyRetrieval Retrieval text ara, ben, eng, fin, ind, ... (11)
JaGovFaqsRetrieval Retrieval text jpn
NLPJournalTitleAbsRetrieval Retrieval text jpn
NLPJournalAbsIntroRetrieval Retrieval text jpn
NLPJournalTitleIntroRetrieval Retrieval text jpn
ESCIReranking Reranking text eng, jpn, spa

MTEB(kor, v1)

A benchmark and leaderboard for evaluating text embeddings in Korean.

Tasks
name type modalities languages
KLUE-TC Classification text kor
MIRACLReranking Reranking text ara, ben, deu, eng, fas, ... (18)
MIRACLRetrieval Retrieval text ara, ben, deu, eng, fas, ... (18)
Ko-StrategyQA Retrieval text kor
KLUE-STS STS text kor
KorSTS STS text kor

MTEB(pol, v1)

The Polish Massive Text Embedding Benchmark (PL-MTEB) is a comprehensive benchmark for text embeddings in Polish, consisting of 28 diverse NLP tasks across 5 task types. The tasks were adapted from datasets previously used by the Polish NLP community. In addition, a new dataset, PLSC (Polish Library of Science Corpus), consisting of titles and abstracts of Polish scientific publications, was created and used as the basis for two novel clustering tasks.

Learn more →

Tasks
name type modalities languages
AllegroReviews Classification text pol
CBD Classification text pol
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
PolEmo2.0-IN Classification text pol
PolEmo2.0-OUT Classification text pol
PAC Classification text pol
EightTagsClustering Clustering text pol
PlscClusteringS2S Clustering text pol
PlscClusteringP2P Clustering text pol
CDSC-E PairClassification text pol
PpcPC PairClassification text pol
PSC PairClassification text pol
SICK-E-PL PairClassification text pol
CDSC-R STS text pol
SICK-R-PL STS text pol
STS22 STS text ara, cmn, deu, eng, fra, ... (10)
Citation
@article{poswiata2024plmteb,
  author = {Rafał Poświata and Sławomir Dadas and Michał Perełkiewicz},
  journal = {arXiv preprint arXiv:2405.10138},
  title = {PL-MTEB: Polish Massive Text Embedding Benchmark},
  year = {2024},
}

MTEB(rus, v1)

A Russian version of the Massive Text Embedding Benchmark with a number of novel Russian tasks in all task categories of the original MTEB.

Learn more →

Tasks
name type modalities languages
GeoreviewClassification Classification text rus
HeadlineClassification Classification text rus
InappropriatenessClassification Classification text rus
KinopoiskClassification Classification text rus
MassiveIntentClassification Classification text afr, amh, ara, aze, ben, ... (50)
MassiveScenarioClassification Classification text afr, amh, ara, aze, ben, ... (50)
RuReviewsClassification Classification text rus
RuSciBenchGRNTIClassification Classification text rus
RuSciBenchOECDClassification Classification text rus
GeoreviewClusteringP2P Clustering text rus
RuSciBenchGRNTIClusteringP2P Clustering text rus
RuSciBenchOECDClusteringP2P Clustering text rus
CEDRClassification MultilabelClassification text rus
SensitiveTopicsClassification MultilabelClassification text rus
TERRa PairClassification text rus
MIRACLReranking Reranking text ara, ben, deu, eng, fas, ... (18)
RuBQReranking Reranking text rus
MIRACLRetrieval Retrieval text ara, ben, deu, eng, fas, ... (18)
RiaNewsRetrieval Retrieval text rus
RuBQRetrieval Retrieval text rus
RUParaPhraserSTS STS text rus
STS22 STS text ara, cmn, deu, eng, fra, ... (10)
RuSTSBenchmarkSTS STS text rus
Citation
@misc{snegirev2024russianfocusedembeddersexplorationrumteb,
  archiveprefix = {arXiv},
  author = {Artem Snegirev and Maria Tikhonova and Anna Maksimova and Alena Fenogenova and Alexander Abramov},
  eprint = {2408.12503},
  primaryclass = {cs.CL},
  title = {The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design},
  url = {https://arxiv.org/abs/2408.12503},
  year = {2024},
}

NanoBEIR

A benchmark that evaluates retrieval on small subsets of the BEIR datasets, reducing the computational cost of evaluation.
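
BEIR-style retrieval tasks are typically reported as nDCG@10. A minimal sketch of the metric for a single query, using toy document ids and hypothetical relevance judgments:

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k: DCG of the system ranking divided by the ideal DCG.

    ranked_ids: document ids in the order the system returned them.
    relevance:  judged relevance grades (unjudged documents count as 0).
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: one query with two judged documents (hypothetical ids)
qrels = {"d1": 2, "d3": 1}
ranking = ["d3", "d2", "d1"]  # system output, best first
print(round(ndcg_at_k(ranking, qrels), 4))
```

Averaging this value over all queries in a dataset gives the task score; the Nano variants simply do this over far fewer queries and documents.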

Learn more →

Tasks
name type modalities languages
NanoArguAnaRetrieval Retrieval text eng
NanoClimateFeverRetrieval Retrieval text eng
NanoDBPediaRetrieval Retrieval text eng
NanoFEVERRetrieval Retrieval text eng
NanoFiQA2018Retrieval Retrieval text eng
NanoHotpotQARetrieval Retrieval text eng
NanoMSMARCORetrieval Retrieval text eng
NanoNFCorpusRetrieval Retrieval text eng
NanoNQRetrieval Retrieval text eng
NanoQuoraRetrieval Retrieval text eng
NanoSCIDOCSRetrieval Retrieval text eng
NanoSciFactRetrieval Retrieval text eng
NanoTouche2020Retrieval Retrieval text eng

R2MED

R2MED is the first reasoning-driven medical retrieval benchmark: a high-quality, high-resolution information retrieval (IR) dataset designed for medical scenarios. It contains 876 queries spanning three retrieval tasks, five medical scenarios, and twelve body systems.

Learn more →

Tasks
name type modalities languages
R2MEDBiologyRetrieval Retrieval text eng
R2MEDBioinformaticsRetrieval Retrieval text eng
R2MEDMedicalSciencesRetrieval Retrieval text eng
R2MEDMedXpertQAExamRetrieval Retrieval text eng
R2MEDMedQADiagRetrieval Retrieval text eng
R2MEDPMCTreatmentRetrieval Retrieval text eng
R2MEDPMCClinicalRetrieval Retrieval text eng
R2MEDIIYiClinicalRetrieval Retrieval text eng
Citation
@article{li2025r2med,
  author = {Li, Lei and Zhou, Xiao and Liu, Zheng},
  journal = {arXiv preprint arXiv:2505.14558},
  title = {R2MED: A Benchmark for Reasoning-Driven Medical Retrieval},
  year = {2025},
}

RAR-b

A benchmark to evaluate reasoning capabilities of retrievers.

Learn more →

Tasks
name type modalities languages
ARCChallenge Retrieval text eng
AlphaNLI Retrieval text eng
HellaSwag Retrieval text eng
WinoGrande Retrieval text eng
PIQA Retrieval text eng
SIQA Retrieval text eng
Quail Retrieval text eng
SpartQA Retrieval text eng
TempReasonL1 Retrieval text eng
TempReasonL2Pure Retrieval text eng
TempReasonL2Fact Retrieval text eng
TempReasonL2Context Retrieval text eng
TempReasonL3Pure Retrieval text eng
TempReasonL3Fact Retrieval text eng
TempReasonL3Context Retrieval text eng
RARbCode Retrieval text eng
RARbMath Retrieval text eng
Citation
@article{xiao2024rar,
  author = {Xiao, Chenghao and Hudson, G Thomas and Al Moubayed, Noura},
  journal = {arXiv preprint arXiv:2404.06347},
  title = {RAR-b: Reasoning as Retrieval Benchmark},
  year = {2024},
}

RuSciBench

RuSciBench is a benchmark designed for evaluating sentence encoders and language models on scientific texts in both Russian and English. The data is sourced from eLibrary (www.elibrary.ru), Russia's largest electronic library of scientific publications. This benchmark facilitates the evaluation and comparison of models on various research-related tasks.

Learn more →

Tasks
name type modalities languages
RuSciBenchBitextMining BitextMining text eng, rus
RuSciBenchCoreRiscClassification Classification text eng, rus
RuSciBenchGRNTIClassification.v2 Classification text eng, rus
RuSciBenchOECDClassification.v2 Classification text eng, rus
RuSciBenchPubTypeClassification Classification text eng, rus
RuSciBenchCiteRetrieval Retrieval text eng, rus
RuSciBenchCociteRetrieval Retrieval text eng, rus
RuSciBenchCitedCountRegression Regression text eng, rus
RuSciBenchYearPublRegression Regression text eng, rus
Citation
@article{vatolin2024ruscibench,
  author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
  doi = {10.1134/S1064562424602191},
  issn = {1531-8362},
  journal = {Doklady Mathematics},
  month = {12},
  number = {1},
  pages = {S251--S260},
  title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
  url = {https://doi.org/10.1134/S1064562424602191},
  volume = {110},
  year = {2024},
}

VN-MTEB (vie, v1)

A benchmark for text-embedding performance in Vietnamese.

Learn more →

Tasks
name type modalities languages
ArguAna-VN Retrieval text vie
SciFact-VN Retrieval text vie
ClimateFEVER-VN Retrieval text vie
FEVER-VN Retrieval text vie
DBPedia-VN Retrieval text vie
NQ-VN Retrieval text vie
HotpotQA-VN Retrieval text vie
MSMARCO-VN Retrieval text vie
TRECCOVID-VN Retrieval text vie
FiQA2018-VN Retrieval text vie
NFCorpus-VN Retrieval text vie
SCIDOCS-VN Retrieval text vie
Touche2020-VN Retrieval text vie
Quora-VN Retrieval text vie
CQADupstackAndroid-VN Retrieval text vie
CQADupstackGis-VN Retrieval text vie
CQADupstackMathematica-VN Retrieval text vie
CQADupstackPhysics-VN Retrieval text vie
CQADupstackProgrammers-VN Retrieval text vie
CQADupstackStats-VN Retrieval text vie
CQADupstackTex-VN Retrieval text vie
CQADupstackUnix-VN Retrieval text vie
CQADupstackWebmasters-VN Retrieval text vie
CQADupstackWordpress-VN Retrieval text vie
Banking77VNClassification Classification text vie
EmotionVNClassification Classification text vie
AmazonCounterfactualVNClassification Classification text vie
MTOPDomainVNClassification Classification text vie
TweetSentimentExtractionVNClassification Classification text vie
ToxicConversationsVNClassification Classification text vie
ImdbVNClassification Classification text vie
MTOPIntentVNClassification Classification text vie
MassiveScenarioVNClassification Classification text vie
MassiveIntentVNClassification Classification text vie
AmazonReviewsVNClassification Classification text vie
AmazonPolarityVNClassification Classification text vie
SprintDuplicateQuestions-VN PairClassification text vie
TwitterSemEval2015-VN PairClassification text vie
TwitterURLCorpus-VN PairClassification text vie
TwentyNewsgroupsClustering-VN Clustering text vie
RedditClusteringP2P-VN Clustering text vie
StackExchangeClusteringP2P-VN Clustering text vie
StackExchangeClustering-VN Clustering text vie
RedditClustering-VN Clustering text vie
SciDocsRR-VN Reranking text vie
AskUbuntuDupQuestions-VN Reranking text vie
StackOverflowDupQuestions-VN Reranking text vie
BIOSSES-VN STS text vie
SICK-R-VN STS text vie
STSBenchmark-VN STS text vie
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

ViDoRe(v1)

Retrieve the document pages relevant to a given question.

Learn more →

Tasks
name type modalities languages
VidoreArxivQARetrieval DocumentUnderstanding text, image eng
VidoreDocVQARetrieval DocumentUnderstanding text, image eng
VidoreInfoVQARetrieval DocumentUnderstanding text, image eng
VidoreTabfquadRetrieval DocumentUnderstanding text, image eng
VidoreTatdqaRetrieval DocumentUnderstanding text, image eng
VidoreShiftProjectRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAAIRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAEnergyRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAGovernmentReportsRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAHealthcareIndustryRetrieval DocumentUnderstanding text, image eng
Citation
@article{faysse2024colpali,
  author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
  journal = {arXiv preprint arXiv:2407.01449},
  title = {ColPali: Efficient Document Retrieval with Vision Language Models},
  year = {2024},
}

ViDoRe(v2)

Retrieve the document pages relevant to a given question.

Learn more →

Tasks
name type modalities languages
Vidore2ESGReportsRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2EconomicsReportsRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2BioMedicalLecturesRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2ESGReportsHLRetrieval DocumentUnderstanding text, image eng
Citation
@article{mace2025vidorev2,
  author = {Macé, Quentin and Loison, António and Faysse, Manuel},
  journal = {arXiv preprint arXiv:2505.17166},
  title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
  year = {2025},
}

VisualDocumentRetrieval

A benchmark for evaluating visual document retrieval, combining ViDoRe v1 and v2.

Learn more →

Tasks
name type modalities languages
VidoreArxivQARetrieval DocumentUnderstanding text, image eng
VidoreDocVQARetrieval DocumentUnderstanding text, image eng
VidoreInfoVQARetrieval DocumentUnderstanding text, image eng
VidoreTabfquadRetrieval DocumentUnderstanding text, image eng
VidoreTatdqaRetrieval DocumentUnderstanding text, image eng
VidoreShiftProjectRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAAIRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAEnergyRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAGovernmentReportsRetrieval DocumentUnderstanding text, image eng
VidoreSyntheticDocQAHealthcareIndustryRetrieval DocumentUnderstanding text, image eng
Vidore2ESGReportsRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2EconomicsReportsRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2BioMedicalLecturesRetrieval DocumentUnderstanding text, image deu, eng, fra, spa
Vidore2ESGReportsHLRetrieval DocumentUnderstanding text, image eng
Citation
@article{mace2025vidorev2,
  author = {Macé, Quentin and Loison, António and Faysse, Manuel},
  journal = {arXiv preprint arXiv:2505.17166},
  title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
  year = {2025},
}
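Because the combined VisualDocumentRetrieval task list is plain tabular data, it can be filtered programmatically, for example to find which tasks cover a given language. A minimal sketch in plain Python (task names and language codes are transcribed from the table above; the `VDR_TASKS` dictionary and `tasks_with_language` helper are illustrative, not part of any benchmark API):

```python
# Task metadata for the VisualDocumentRetrieval benchmark,
# transcribed from the table above: task name -> language codes.
VDR_TASKS = {
    "VidoreArxivQARetrieval": ["eng"],
    "VidoreDocVQARetrieval": ["eng"],
    "VidoreInfoVQARetrieval": ["eng"],
    "VidoreTabfquadRetrieval": ["eng"],
    "VidoreTatdqaRetrieval": ["eng"],
    "VidoreShiftProjectRetrieval": ["eng"],
    "VidoreSyntheticDocQAAIRetrieval": ["eng"],
    "VidoreSyntheticDocQAEnergyRetrieval": ["eng"],
    "VidoreSyntheticDocQAGovernmentReportsRetrieval": ["eng"],
    "VidoreSyntheticDocQAHealthcareIndustryRetrieval": ["eng"],
    "Vidore2ESGReportsRetrieval": ["deu", "eng", "fra", "spa"],
    "Vidore2EconomicsReportsRetrieval": ["deu", "eng", "fra", "spa"],
    "Vidore2BioMedicalLecturesRetrieval": ["deu", "eng", "fra", "spa"],
    "Vidore2ESGReportsHLRetrieval": ["eng"],
}

def tasks_with_language(tasks, lang):
    """Return the sorted task names whose language list includes `lang`."""
    return sorted(name for name, langs in tasks.items() if lang in langs)

# Only the multilingual ViDoRe v2 tasks cover French.
print(tasks_with_language(VDR_TASKS, "fra"))
```

Note that only three of the four ViDoRe v2 tasks are multilingual; `Vidore2ESGReportsHLRetrieval` is English-only, like all of the v1 tasks.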