Available Benchmarks¶
BEIR¶
BEIR is a heterogeneous benchmark containing diverse IR tasks. It also provides a common, easy-to-use framework for evaluating NLP-based retrieval models on those tasks.
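As a quick illustration of that framework, the minimal sketch below runs the BEIR tasks through the mteb package. The benchmark name string and the model id are assumptions chosen for illustration; any SentenceTransformers-compatible embedding model can be substituted.

```python
import mteb

# Look up the benchmark in mteb's registry; "BEIR" is assumed to be
# the registered name matching this page.
benchmark = mteb.get_benchmark("BEIR")

# Example model choice (an assumption); any SentenceTransformers-
# compatible model id can be used instead.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Run all 15 retrieval tasks and write the scores to disk.
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results/beir")
```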
Tasks
name | type | modalities | languages |
---|---|---|---|
TRECCOVID | Retrieval | text | eng |
NFCorpus | Retrieval | text | eng |
NQ | Retrieval | text | eng |
HotpotQA | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
Touche2020 | Retrieval | text | eng |
CQADupstackRetrieval | Retrieval | text | eng |
QuoraRetrieval | Retrieval | text | eng |
DBPedia | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
FEVER | Retrieval | text | eng |
ClimateFEVER | Retrieval | text | eng |
SciFact | Retrieval | text | eng |
MSMARCO | Retrieval | text | eng |
Citation
@article{thakur2021beir,
author = {Thakur, Nandan and Reimers, Nils and R{\"u}ckl{\'e}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
journal = {arXiv preprint arXiv:2104.08663},
title = {Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models},
year = {2021},
}
BEIR-NL¶
BEIR-NL is a Dutch adaptation of the publicly available BEIR benchmark, created through automated translation.
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna-NL | Retrieval | text | nld |
CQADupstack-NL | Retrieval | text | nld |
FEVER-NL | Retrieval | text | nld |
NQ-NL | Retrieval | text | nld |
Touche2020-NL | Retrieval | text | nld |
FiQA2018-NL | Retrieval | text | nld |
Quora-NL | Retrieval | text | nld |
HotpotQA-NL | Retrieval | text | nld |
SCIDOCS-NL | Retrieval | text | nld |
ClimateFEVER-NL | Retrieval | text | nld |
mMARCO-NL | Retrieval | text | nld |
SciFact-NL | Retrieval | text | nld |
DBPedia-NL | Retrieval | text | nld |
NFCorpus-NL | Retrieval | text | nld |
TRECCOVID-NL | Retrieval | text | nld |
Citation
@misc{banar2024beirnlzeroshotinformationretrieval,
archiveprefix = {arXiv},
author = {Nikolay Banar and Ehsan Lotfi and Walter Daelemans},
eprint = {2412.08329},
primaryclass = {cs.CL},
title = {BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language},
url = {https://arxiv.org/abs/2412.08329},
year = {2024},
}
BRIGHT¶
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.
Tasks
name | type | modalities | languages |
---|---|---|---|
BrightRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BRIGHT(long)¶
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Its dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding, drawn from naturally occurring and carefully curated human data.
This is the long version of the benchmark, which keeps only the longer documents.
Tasks
name | type | modalities | languages |
---|---|---|---|
BrightLongRetrieval | Retrieval | text | eng |
Citation
@article{su2024bright,
author = {Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and others},
journal = {arXiv preprint arXiv:2407.12883},
title = {Bright: A realistic and challenging benchmark for reasoning-intensive retrieval},
year = {2024},
}
BuiltBench(eng)¶
"Built-Bench" is an ongoing effort aimed at evaluating text embedding models in the context of built asset management, spanning over various dicsiplines such as architeture, engineering, constrcution, and operations management of the built environment.
Tasks
name | type | modalities | languages |
---|---|---|---|
BuiltBenchClusteringP2P | Clustering | text | eng |
BuiltBenchClusteringS2S | Clustering | text | eng |
BuiltBenchRetrieval | Retrieval | text | eng |
BuiltBenchReranking | Reranking | text | eng |
Citation
@article{shahinmoghadam2024benchmarking,
author = {Shahinmoghadam, Mehrzad and Motamedi, Ali},
journal = {arXiv preprint arXiv:2411.12056},
title = {Benchmarking pre-trained text embedding models in aligning built asset information},
year = {2024},
}
ChemTEB¶
ChemTEB evaluates the performance of text embedding models on chemical domain data.
Tasks
Citation
@article{kasmaee2024chemteb,
author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
journal = {arXiv preprint arXiv:2412.00532},
title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
year = {2024},
}
CoIR¶
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Tasks
name | type | modalities | languages |
---|---|---|---|
AppsRetrieval | Retrieval | text | eng, python |
CodeFeedbackMT | Retrieval | text | eng |
CodeFeedbackST | Retrieval | text | eng |
CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeTransOceanContest | Retrieval | text | c++, python |
CodeTransOceanDL | Retrieval | text | python |
CosQA | Retrieval | text | eng, python |
COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
StackOverflowQA | Retrieval | text | eng |
SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@misc{li2024coircomprehensivebenchmarkcode,
archiveprefix = {arXiv},
author = {Xiangyang Li and Kuicai Dong and Yi Quan Lee and Wei Xia and Yichun Yin and Hao Zhang and Yong Liu and Yasheng Wang and Ruiming Tang},
eprint = {2407.02883},
primaryclass = {cs.IR},
title = {CoIR: A Comprehensive Benchmark for Code Information Retrieval Models},
url = {https://arxiv.org/abs/2407.02883},
year = {2024},
}
CodeRAG¶
A benchmark for evaluating code retrieval augmented generation, testing models' ability to retrieve relevant programming solutions, tutorials and documentation.
Tasks
name | type | modalities | languages |
---|---|---|---|
CodeRAGLibraryDocumentationSolutions | Reranking | text | python |
CodeRAGOnlineTutorials | Reranking | text | python |
CodeRAGProgrammingSolutions | Reranking | text | python |
CodeRAGStackoverflowPosts | Reranking | text | python |
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
archiveprefix = {arXiv},
author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
eprint = {2406.14497},
primaryclass = {cs.SE},
title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
url = {https://arxiv.org/abs/2406.14497},
year = {2024},
}
Encodechka¶
A benchmark for evaluating text embedding models on Russian data.
Tasks
name | type | modalities | languages |
---|---|---|---|
RUParaPhraserSTS | STS | text | rus |
SentiRuEval2016 | Classification | text | rus |
RuToxicOKMLCUPClassification | Classification | text | rus |
InappropriatenessClassificationv2 | Classification | text | rus |
RuNLUIntentClassification | Classification | text | rus |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{dale_encodechka,
author = {Dale, David},
editor = {habr.com},
month = {June},
note = {[Online; posted 12-June-2022]},
title = {Russian rating of sentence encoders},
url = {https://habr.com/ru/articles/669674/},
year = {2022},
}
FollowIR¶
Retrieval with instructions is the task of finding relevant documents for a query that is accompanied by detailed instructions.
Tasks
name | type | modalities | languages |
---|---|---|---|
Robust04InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Core17InstructionRetrieval | InstructionReranking | text | eng |
Citation
@misc{weller2024followir,
archiveprefix = {arXiv},
author = {Orion Weller and Benjamin Chang and Sean MacAvaney and Kyle Lo and Arman Cohan and Benjamin Van Durme and Dawn Lawrie and Luca Soldaini},
eprint = {2403.15246},
primaryclass = {cs.IR},
title = {FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions},
year = {2024},
}
JinaVDR¶
A multilingual, domain-diverse, and layout-rich document retrieval benchmark.
Tasks
name | type | modalities | languages |
---|---|---|---|
JinaVDRMedicalPrescriptionsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRStanfordSlideRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDonutVQAISynHMPRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTableVQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRChartQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDROpenAINewsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDREuropeanaDeNewsRetrieval | DocumentUnderstanding | text, image | deu |
JinaVDREuropeanaEsNewsRetrieval | DocumentUnderstanding | text, image | spa |
JinaVDREuropeanaItScansRetrieval | DocumentUnderstanding | text, image | ita |
JinaVDREuropeanaNlLegalRetrieval | DocumentUnderstanding | text, image | nld |
JinaVDRHindiGovVQARetrieval | DocumentUnderstanding | text, image | hin |
JinaVDRAutomobileCatelogRetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRBeveragesCatalogueRetrieval | DocumentUnderstanding | text, image | rus |
JinaVDRRamensBenchmarkRetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRJDocQARetrieval | DocumentUnderstanding | text, image | jpn |
JinaVDRHungarianDocQARetrieval | DocumentUnderstanding | text, image | hun |
JinaVDRArabicChartQARetrieval | DocumentUnderstanding | text, image | ara |
JinaVDRArabicInfographicsVQARetrieval | DocumentUnderstanding | text, image | ara |
JinaVDROWIDChartsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRMPMQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRJina2024YearlyBookRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRWikimediaCommonsMapsRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRPlotQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRMMTabRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRCharXivOCRRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRStudentEnrollmentSyntheticRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRGitHubReadmeRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (17) |
JinaVDRTweetStockSyntheticsRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
JinaVDRAirbnbSyntheticRetrieval | DocumentUnderstanding | text, image | ara, deu, eng, fra, hin, ... (10) |
JinaVDRShanghaiMasterPlanRetrieval | DocumentUnderstanding | text, image | zho |
JinaVDRWikimediaCommonsDocumentsRetrieval | DocumentUnderstanding | text, image | ara, ben, deu, eng, fra, ... (20) |
JinaVDREuropeanaFrNewsRetrieval | DocumentUnderstanding | text, image | fra |
JinaVDRDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAAI | DocumentUnderstanding | text, image | eng |
JinaVDRShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTatQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRInfovqaRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocVQARetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAGovReportRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRTabFQuadRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
JinaVDRArxivQARetrieval | DocumentUnderstanding | text, image | eng |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
LongEmbed¶
LongEmbed is a benchmark designed to assess models' performance on long-context retrieval. The benchmark comprises two synthetic tasks and four carefully chosen real-world tasks, featuring documents of varying length and dispersed target information.
Tasks
name | type | modalities | languages |
---|---|---|---|
LEMBNarrativeQARetrieval | Retrieval | text | eng |
LEMBNeedleRetrieval | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
LEMBQMSumRetrieval | Retrieval | text | eng |
LEMBSummScreenFDRetrieval | Retrieval | text | eng |
LEMBWikimQARetrieval | Retrieval | text | eng |
Citation
@article{zhu2024longembed,
author = {Zhu, Dawei and Wang, Liang and Yang, Nan and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
journal = {arXiv preprint arXiv:2404.12096},
title = {LongEmbed: Extending Embedding Models for Long Context Retrieval},
year = {2024},
}
MIEB(Img)¶
An image-only version of MIEB(Multilingual), consisting of 49 tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
VOC2007 | ImageClassification | image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
STS17MultilingualVisualSTS | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
STSBenchmarkMultilingualVisualSTS | VisualSTS(multi) | image | cmn, deu, eng, fra, ita, ... (10) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(Multilingual)¶
MIEB(Multilingual) is a comprehensive image embedding benchmark spanning 10 task types, covering 130 tasks and a total of 39 languages. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes compositionality evaluation, document understanding, visual STS, and CV-centric tasks. This benchmark consists of MIEB(eng), 3 multilingual retrieval datasets, and the multilingual parts of VisualSTS-b and VisualSTS-17.
Tasks
name | type | modalities | languages |
---|---|---|---|
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
VOC2007 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
CLEVRZeroShot | ZeroShotClassification | text, image | eng |
CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
DTDZeroShot | ZeroShotClassification | image, text | eng |
EuroSATZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
GTSRBZeroShot | ZeroShotClassification | image | eng |
Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
MNISTZeroShot | ZeroShotClassification | image, text | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
RenderedSST2 | ZeroShotClassification | text, image | eng |
RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
STL10ZeroShot | ZeroShotClassification | image, text | eng |
SUN397ZeroShot | ZeroShotClassification | image, text | eng |
UCF101ZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
SugarCrepe | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
XFlickr30kCoT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | deu, eng, ind, jpn, rus, ... (8) |
XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (36) |
VisualSTS17Eng | VisualSTS(eng) | image | eng |
VisualSTS-b-Eng | VisualSTS(eng) | image | eng |
VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, fra, ita, nld, ... (9) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(eng)¶
MIEB(eng) is a comprehensive image embedding benchmark spanning 8 task types and covering 125 tasks. In addition to image classification (zero-shot and linear probing), clustering, and retrieval, MIEB includes compositionality evaluation, document understanding, visual STS, and CV-centric tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
Birdsnap | ImageClassification | image | eng |
Caltech101 | ImageClassification | image | eng |
CIFAR10 | ImageClassification | image | eng |
CIFAR100 | ImageClassification | image | eng |
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
FER2013 | ImageClassification | image | eng |
FGVCAircraft | ImageClassification | image | eng |
Food101Classification | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
Imagenet1k | ImageClassification | image | eng |
MNIST | ImageClassification | image | eng |
OxfordFlowersClassification | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
StanfordCars | ImageClassification | image | eng |
STL10 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
UCF101 | ImageClassification | image | eng |
VOC2007 | ImageClassification | image | eng |
CIFAR10Clustering | ImageClustering | image | eng |
CIFAR100Clustering | ImageClustering | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
ImageNet10Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
BirdsnapZeroShot | ZeroShotClassification | image, text | eng |
Caltech101ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR10ZeroShot | ZeroShotClassification | text, image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
CLEVRZeroShot | ZeroShotClassification | text, image | eng |
CLEVRCountZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
DTDZeroShot | ZeroShotClassification | image, text | eng |
EuroSATZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
GTSRBZeroShot | ZeroShotClassification | image | eng |
Imagenet1kZeroShot | ZeroShotClassification | image, text | eng |
MNISTZeroShot | ZeroShotClassification | image, text | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
PatchCamelyonZeroShot | ZeroShotClassification | image, text | eng |
RenderedSST2 | ZeroShotClassification | text, image | eng |
RESISC45ZeroShot | ZeroShotClassification | image, text | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
STL10ZeroShot | ZeroShotClassification | image, text | eng |
SUN397ZeroShot | ZeroShotClassification | image, text | eng |
UCF101ZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
BLINKIT2TMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
SugarCrepe | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS12VisualSTS | VisualSTS(eng) | image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS14VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
STS16VisualSTS | VisualSTS(eng) | image | eng |
BLINKIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
BLINKIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
EDIST2ITRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Fashion200kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FashionIQIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
Flickr30kT2IRetrieval | Any2AnyRetrieval | text, image | eng |
FORBI2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2IRetrieval | Any2AnyRetrieval | image | eng |
GLDv2I2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesT2IRetrieval | Any2AnyRetrieval | text, image | eng |
ImageCoDeT2IRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2ITRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MemotionT2IRetrieval | Any2AnyRetrieval | text, image | eng |
METI2IRetrieval | Any2AnyRetrieval | image | eng |
MSCOCOI2TRetrieval | Any2AnyRetrieval | text, image | eng |
MSCOCOT2IRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
ROxfordEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
ROxfordHardI2IRetrieval | Any2AnyRetrieval | image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisEasyI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisMediumI2IRetrieval | Any2AnyRetrieval | image | eng |
RParisHardI2IRetrieval | Any2AnyRetrieval | image | eng |
SciMMIRI2TRetrieval | Any2AnyRetrieval | text, image | eng |
SciMMIRT2IRetrieval | Any2AnyRetrieval | text, image | eng |
SketchyI2IRetrieval | Any2AnyRetrieval | image | eng |
SOPI2IRetrieval | Any2AnyRetrieval | image | eng |
StanfordCarsI2IRetrieval | Any2AnyRetrieval | image | eng |
TUBerlinT2IRetrieval | Any2AnyRetrieval | text, image | eng |
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VisualNewsT2IRetrieval | Any2AnyRetrieval | image, text | eng |
VizWizIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WebQAT2TRetrieval | Any2AnyRetrieval | text | eng |
VisualSTS17Eng | VisualSTS(eng) | image | eng |
VisualSTS-b-Eng | VisualSTS(eng) | image | eng |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MIEB(lite)¶
MIEB(lite) is a comprehensive image embedding benchmark spanning 10 task types and covering 51 tasks. It is a lite version of MIEB(Multilingual), designed to run at a fraction of the cost while preserving the relative ranking of models.
Tasks
name | type | modalities | languages |
---|---|---|---|
Country211 | ImageClassification | image | eng |
DTD | ImageClassification | image | eng |
EuroSAT | ImageClassification | image | eng |
GTSRB | ImageClassification | image | eng |
OxfordPets | ImageClassification | image | eng |
PatchCamelyon | ImageClassification | image | eng |
RESISC45 | ImageClassification | image | eng |
SUN397 | ImageClassification | image | eng |
ImageNetDog15Clustering | ImageClustering | image | eng |
TinyImageNetClustering | ImageClustering | image | eng |
CIFAR100ZeroShot | ZeroShotClassification | text, image | eng |
Country211ZeroShot | ZeroShotClassification | image, text | eng |
FER2013ZeroShot | ZeroShotClassification | image, text | eng |
FGVCAircraftZeroShot | ZeroShotClassification | text, image | eng |
Food101ZeroShot | ZeroShotClassification | text, image | eng |
OxfordPetsZeroShot | ZeroShotClassification | text, image | eng |
StanfordCarsZeroShot | ZeroShotClassification | image, text | eng |
BLINKIT2IMultiChoice | VisionCentricQA | text, image | eng |
CVBenchCount | VisionCentricQA | image, text | eng |
CVBenchRelation | VisionCentricQA | text, image | eng |
CVBenchDepth | VisionCentricQA | text, image | eng |
CVBenchDistance | VisionCentricQA | text, image | eng |
AROCocoOrder | Compositionality | text, image | eng |
AROFlickrOrder | Compositionality | text, image | eng |
AROVisualAttribution | Compositionality | text, image | eng |
AROVisualRelation | Compositionality | text, image | eng |
Winoground | Compositionality | text, image | eng |
ImageCoDe | Compositionality | text, image | eng |
STS13VisualSTS | VisualSTS(eng) | image | eng |
STS15VisualSTS | VisualSTS(eng) | image | eng |
VisualSTS17Multilingual | VisualSTS(multi) | image | ara, deu, eng, fra, ita, ... (9) |
VisualSTS-b-Multilingual | VisualSTS(multi) | image | cmn, deu, fra, ita, nld, ... (9) |
CIRRIT2IRetrieval | Any2AnyRetrieval | text, image | eng |
CUB200I2IRetrieval | Any2AnyRetrieval | image | eng |
Fashion200kI2TRetrieval | Any2AnyRetrieval | text, image | eng |
HatefulMemesI2TRetrieval | Any2AnyRetrieval | text, image | eng |
InfoSeekIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
NIGHTSI2IRetrieval | Any2AnyRetrieval | image | eng |
OVENIT2TRetrieval | Any2AnyRetrieval | text, image | eng |
RP2kI2IRetrieval | Any2AnyRetrieval | image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VisualNewsI2TRetrieval | Any2AnyRetrieval | image, text | eng |
VQA2IT2TRetrieval | Any2AnyRetrieval | text, image | eng |
WebQAT2ITRetrieval | Any2AnyRetrieval | image, text | eng |
WITT2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, bul, dan, ell, eng, ... (11) |
XM3600T2IRetrieval | Any2AnyMultilingualRetrieval | text, image | ara, ben, ces, dan, deu, ... (36) |
Citation
@article{xiao2025mieb,
author = {Chenghao Xiao and Isaac Chung and Imene Kerboua and Jamie Stirling and Xin Zhang and Márton Kardos and Roman Solomatin and Noura Al Moubayed and Kenneth Enevoldsen and Niklas Muennighoff},
doi = {10.48550/ARXIV.2504.10471},
journal = {arXiv preprint arXiv:2504.10471},
publisher = {arXiv},
title = {MIEB: Massive Image Embedding Benchmark},
url = {https://arxiv.org/abs/2504.10471},
year = {2025},
}
MINERSBitextMining¶
Bitext mining texts from the MINERS benchmark, which is designed to evaluate the ability of multilingual LMs on semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts.
Tasks
name | type | modalities | languages |
---|---|---|---|
BUCC | BitextMining | text | cmn, deu, eng, fra, rus |
LinceMTBitextMining | BitextMining | text | eng, hin |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
PhincBitextMining | BitextMining | text | eng, hin |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
Citation
@article{winata2024miners,
author = {Winata, Genta Indra and Zhang, Ruochen and Adelani, David Ifeoluwa},
journal = {arXiv preprint arXiv:2406.07424},
title = {MINERS: Multilingual Language Models as Semantic Retrievers},
year = {2024},
}
MTEB(Code, v1)¶
A massive code embedding benchmark covering retrieval tasks in a myriad of popular programming languages.
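You can also evaluate on a subset of these tasks rather than the full benchmark. A hedged sketch follows, selecting two task names from the table below; the model id is again an arbitrary example.

```python
import mteb

# Select individual code retrieval tasks by name instead of
# running the whole benchmark; names are taken from the task table.
tasks = mteb.get_tasks(tasks=["CosQA", "StackOverflowQA"])

# Example model choice (an assumption); swap in any embedding model.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/code-subset")
```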
Tasks
name | type | modalities | languages |
---|---|---|---|
AppsRetrieval | Retrieval | text | eng, python |
CodeEditSearchRetrieval | Retrieval | text | c, c++, go, java, javascript, ... (13) |
CodeFeedbackMT | Retrieval | text | eng |
CodeFeedbackST | Retrieval | text | eng |
CodeSearchNetCCRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
CodeTransOceanContest | Retrieval | text | c++, python |
CodeTransOceanDL | Retrieval | text | python |
CosQA | Retrieval | text | eng, python |
COIRCodeSearchNetRetrieval | Retrieval | text | go, java, javascript, php, python, ... (6) |
StackOverflowQA | Retrieval | text | eng |
SyntheticText2SQL | Retrieval | text | eng, sql |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Europe, v1)¶
A regional geopolitical text embedding benchmark targeting embedding performance on European languages.
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicChatClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
ItaCaseholdClassification | Classification | text | ita |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
CBD | Classification | text | pol |
PolEmo2.0-OUT | Classification | text | pol |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
DalajClassification | Classification | text | swe |
WikiCitiesClustering | Clustering | text | eng |
RomaniBibleClustering | Clustering | text | rom |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
LegalQuAD | Retrieval | text | deu |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
AlloprofRetrieval | Retrieval | text | fra |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PSC | PairClassification | text | pol |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FinParaSTS | STS | text | fin |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
SICK-R-PL | STS | text | pol |
STSES | STS | text | spa |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Indic, v1)¶
A regional geopolitical text embedding benchmark targeting embedding performance on Indic languages.
Tasks
name | type | modalities | languages |
---|---|---|---|
IN22ConvBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
LinceMTBitextMining | BitextMining | text | eng, hin |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
BengaliSentimentAnalysis | Classification | text | ben |
GujaratiNewsClassification | Classification | text | guj |
HindiDiscourseClassification | Classification | text | hin |
SentimentAnalysisHindi | Classification | text | hin |
MalayalamNewsClassification | Classification | text | mal |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
TweetSentimentClassification | Classification | text | ara, deu, eng, fra, hin, ... (8) |
NepaliNewsClassification | Classification | text | nep |
PunjabiNewsClassification | Classification | text | pan |
SanskritShlokasClassification | Classification | text | san |
UrduRomanSentimentClassification | Classification | text | urd |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
XQuADRetrieval | Retrieval | text | arb, deu, ell, eng, hin, ... (12) |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Law, v1)¶
A benchmark of retrieval tasks in the legal domain.
Tasks
name | type | modalities | languages |
---|---|---|---|
AILACasedocs | Retrieval | text | eng |
AILAStatutes | Retrieval | text | eng |
LegalSummarization | Retrieval | text | eng |
GerDaLIRSmall | Retrieval | text | deu |
LeCaRDv2 | Retrieval | text | zho |
LegalBenchConsumerContractsQA | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LegalQuAD | Retrieval | text | deu |
MTEB(Medical, v1)¶
A curated set of MTEB tasks designed to evaluate systems in the context of medical information retrieval.
Tasks
name | type | modalities | languages |
---|---|---|---|
CUREv1 | Retrieval | text | eng, fra, spa |
NFCorpus | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
TRECCOVID-PL | Retrieval | text | pol |
SciFact | Retrieval | text | eng |
SciFact-PL | Retrieval | text | pol |
MedicalQARetrieval | Retrieval | text | eng |
PublicHealthQA | Retrieval | text | ara, eng, fra, kor, rus, ... (8) |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringS2S.v2 | Clustering | text | eng |
CmedqaRetrieval | Retrieval | text | cmn |
CMedQAv2-reranking | Reranking | text | cmn |
MTEB(Multilingual, v1)¶
A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages. This benchmark has been replaced by MTEB(Multilingual, v2) because one of the datasets included in v1 (SNLHierarchicalClustering) was removed from the Hugging Face Hub.
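To see which tasks a benchmark would run before committing to a full evaluation, a small sketch follows; it assumes the successor's registered name is "MTEB(Multilingual, v2)" and that the Benchmark object exposes its tasks via a tasks attribute.

```python
import mteb

# Fetch the successor benchmark; the name string is an assumption
# based on the naming pattern used on this page.
benchmark = mteb.get_benchmark("MTEB(Multilingual, v2)")

# List what would be evaluated: each task carries metadata with
# its name and task type.
for task in benchmark.tasks:
    print(task.metadata.name, task.metadata.type)
```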
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetTopicSingleClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
FilipinoShopeeReviewsClassification | Classification | text | fil |
GujaratiNewsClassification | Classification | text | guj |
SentimentAnalysisHindi | Classification | text | hin |
IndonesianIdClickbaitClassification | Classification | text | ind |
ItaCaseholdClassification | Classification | text | ita |
KorSarcasmClassification | Classification | text | kor |
KurdishSentimentClassification | Classification | text | kur |
MacedonianTweetSentimentClassification | Classification | text | mkd |
AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
CataloniaTweetClassification | Classification | text | cat, spa |
CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
NepaliNewsClassification | Classification | text | nep |
OdiaNewsClassification | Classification | text | ory |
PunjabiNewsClassification | Classification | text | pan |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
SinhalaNewsClassification | Classification | text | sin |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
SiswatiNewsClassification | Classification | text | ssw |
SlovakMovieReviewSentimentClassification | Classification | text | svk |
SwahiliNewsClassification | Classification | text | swa |
DalajClassification | Classification | text | swe |
TswanaNewsClassification | Classification | text | tsn |
IsiZuluNewsClassification | Classification | text | zul |
WikiCitiesClustering | Clustering | text | eng |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
RomaniBibleClustering | Clustering | text | rom |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
PlscClusteringP2P.v2 | Clustering | text | pol |
SwednClusteringP2P | Clustering | text | swe |
CLSClusteringP2P.v2 | Clustering | text | cmn |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
AILAStatutes | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
CovidRetrieval | Retrieval | text | cmn |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
KorHateSpeechMLClassification | MultilabelClassification | text | kor |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
CEDRClassification | MultilabelClassification | text | rus |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
ArmenianParaphrasePC | PairClassification | text | hye |
indonli | PairClassification | text | ind |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PpcPC | PairClassification | text | pol |
TERRa | PairClassification | text | rus |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
VoyageMMarcoReranking | Reranking | text | jpn |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
RuBQReranking | Reranking | text | rus |
T2Reranking | Reranking | text | cmn |
GermanSTSBenchmark | STS | text | deu |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FaroeseSTS | STS | text | fao |
FinParaSTS | STS | text | fin |
JSICK | STS | text | jpn |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
STSES | STS | text | spa |
STSB | STS | text | cmn |
MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
SNLHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Multilingual, v2)¶
A large-scale multilingual expansion of MTEB, driven mainly by highly curated community contributions covering 250+ languages.
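Any benchmark on this page can be loaded by its heading name through the mteb package. A minimal sketch, assuming a recent mteb release in which mteb.get_benchmark is available and this name is registered:

import mteb

# Load the benchmark by the name shown in the heading above.
benchmark = mteb.get_benchmark("MTEB(Multilingual, v2)")

# List the tasks it contains (the table below shows them in full).
for task in benchmark.tasks:
    print(task.metadata.name)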
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
BibleNLPBitextMining | BitextMining | text | aai, aak, aau, aaz, abt, ... (829) |
BUCC.v2 | BitextMining | text | cmn, deu, eng, fra, rus |
DiaBlaBitextMining | BitextMining | text | eng, fra |
FloresBitextMining | BitextMining | text | ace, acm, acq, aeb, afr, ... (196) |
IN22GenBitextMining | BitextMining | text | asm, ben, brx, doi, eng, ... (23) |
IndicGenBenchFloresBitextMining | BitextMining | text | asm, awa, ben, bgc, bho, ... (30) |
NollySentiBitextMining | BitextMining | text | eng, hau, ibo, pcm, yor |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
NTREXBitextMining | BitextMining | text | afr, amh, arb, aze, bak, ... (119) |
NusaTranslationBitextMining | BitextMining | text | abs, bbc, bew, bhp, ind, ... (12) |
NusaXBitextMining | BitextMining | text | ace, ban, bbc, bjn, bug, ... (12) |
Tatoeba | BitextMining | text | afr, amh, ang, ara, arq, ... (113) |
BulgarianStoreReviewSentimentClassfication | Classification | text | bul |
CzechProductReviewSentimentClassification | Classification | text | ces |
GreekLegalCodeClassification | Classification | text | ell |
DBpediaClassification | Classification | text | eng |
FinancialPhrasebankClassification | Classification | text | eng |
PoemSentimentClassification | Classification | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetTopicSingleClassification | Classification | text | eng |
EstonianValenceClassification | Classification | text | est |
FilipinoShopeeReviewsClassification | Classification | text | fil |
GujaratiNewsClassification | Classification | text | guj |
SentimentAnalysisHindi | Classification | text | hin |
IndonesianIdClickbaitClassification | Classification | text | ind |
ItaCaseholdClassification | Classification | text | ita |
KorSarcasmClassification | Classification | text | kor |
KurdishSentimentClassification | Classification | text | kur |
MacedonianTweetSentimentClassification | Classification | text | mkd |
AfriSentiClassification | Classification | text | amh, arq, ary, hau, ibo, ... (12) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
CataloniaTweetClassification | Classification | text | cat, spa |
CyrillicTurkicLangClassification | Classification | text | bak, chv, kaz, kir, krc, ... (9) |
IndicLangClassification | Classification | text | asm, ben, brx, doi, gom, ... (22) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MultiHateClassification | Classification | text | ara, cmn, deu, eng, fra, ... (11) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NusaParagraphEmotionClassification | Classification | text | bbc, bew, bug, jav, mad, ... (10) |
NusaX-senti | Classification | text | ace, ban, bbc, bjn, bug, ... (12) |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwissJudgementClassification | Classification | text | deu, fra, ita |
NepaliNewsClassification | Classification | text | nep |
OdiaNewsClassification | Classification | text | ory |
PunjabiNewsClassification | Classification | text | pan |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
SinhalaNewsClassification | Classification | text | sin |
CSFDSKMovieReviewSentimentClassification | Classification | text | slk |
SiswatiNewsClassification | Classification | text | ssw |
SlovakMovieReviewSentimentClassification | Classification | text | svk |
SwahiliNewsClassification | Classification | text | swa |
DalajClassification | Classification | text | swe |
TswanaNewsClassification | Classification | text | tsn |
IsiZuluNewsClassification | Classification | text | zul |
WikiCitiesClustering | Clustering | text | eng |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
RomaniBibleClustering | Clustering | text | rom |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
BigPatentClustering.v2 | Clustering | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
AlloProfClusteringS2S.v2 | Clustering | text | fra |
HALClusteringS2S.v2 | Clustering | text | fra |
SIB200ClusteringS2S | Clustering | text | ace, acm, acq, aeb, afr, ... (197) |
WikiClusteringP2P.v2 | Clustering | text | bos, cat, ces, dan, eus, ... (14) |
PlscClusteringP2P.v2 | Clustering | text | pol |
SwednClusteringP2P | Clustering | text | swe |
CLSClusteringP2P.v2 | Clustering | text | cmn |
StackOverflowQA | Retrieval | text | eng |
TwitterHjerneRetrieval | Retrieval | text | dan |
AILAStatutes | Retrieval | text | eng |
ArguAna | Retrieval | text | eng |
HagridRetrieval | Retrieval | text | eng |
LegalBenchCorporateLobbying | Retrieval | text | eng |
LEMBPasskeyRetrieval | Retrieval | text | eng |
SCIDOCS | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TRECCOVID | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
BelebeleRetrieval | Retrieval | text | acm, afr, als, amh, apc, ... (115) |
MLQARetrieval | Retrieval | text | ara, deu, eng, hin, spa, ... (7) |
StatcanDialogueDatasetRetrieval | Retrieval | text | eng, fra |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
CovidRetrieval | Retrieval | text | cmn |
Core17InstructionRetrieval | InstructionReranking | text | eng |
News21InstructionRetrieval | InstructionReranking | text | eng |
Robust04InstructionRetrieval | InstructionReranking | text | eng |
KorHateSpeechMLClassification | MultilabelClassification | text | kor |
MalteseNewsClassification | MultilabelClassification | text | mlt |
MultiEURLEXMultilabelClassification | MultilabelClassification | text | bul, ces, dan, deu, ell, ... (23) |
BrazilianToxicTweetsClassification | MultilabelClassification | text | por |
CEDRClassification | MultilabelClassification | text | rus |
CTKFactsNLI | PairClassification | text | ces |
SprintDuplicateQuestions | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
ArmenianParaphrasePC | PairClassification | text | hye |
indonli | PairClassification | text | ind |
OpusparcusPC | PairClassification | text | deu, eng, fin, fra, rus, ... (6) |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
RTE3 | PairClassification | text | deu, eng, fra, ita |
XNLI | PairClassification | text | ara, bul, deu, ell, eng, ... (14) |
PpcPC | PairClassification | text | pol |
TERRa | PairClassification | text | rus |
WebLINXCandidatesReranking | Reranking | text | eng |
AlloprofReranking | Reranking | text | fra |
VoyageMMarcoReranking | Reranking | text | jpn |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
RuBQReranking | Reranking | text | rus |
T2Reranking | Reranking | text | cmn |
GermanSTSBenchmark | STS | text | deu |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
FaroeseSTS | STS | text | fao |
FinParaSTS | STS | text | fin |
JSICK | STS | text | jpn |
IndicCrosslingualSTS | STS | text | asm, ben, eng, guj, hin, ... (13) |
SemRel24STS | STS | text | afr, amh, arb, arq, ary, ... (12) |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
STSES | STS | text | spa |
STSB | STS | text | cmn |
MIRACLRetrievalHardNegatives | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(Scandinavian, v1)¶
A curated selection of tasks covering the Scandinavian languages: Danish, Swedish, and Norwegian (both Bokmål and Nynorsk).
Tasks
name | type | modalities | languages |
---|---|---|---|
BornholmBitextMining | BitextMining | text | dan |
NorwegianCourtsBitextMining | BitextMining | text | nno, nob |
AngryTweetsClassification | Classification | text | dan |
DanishPoliticalCommentsClassification | Classification | text | dan |
DalajClassification | Classification | text | swe |
DKHateClassification | Classification | text | dan |
LccSentimentClassification | Classification | text | dan |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
NordicLangClassification | Classification | text | dan, fao, isl, nno, nob, ... (6) |
NoRecClassification | Classification | text | nob |
NorwegianParliamentClassification | Classification | text | nob |
ScalaClassification | Classification | text | dan, nno, nob, swe |
SwedishSentimentClassification | Classification | text | swe |
SweRecClassification | Classification | text | swe |
DanFeverRetrieval | Retrieval | text | dan |
NorQuadRetrieval | Retrieval | text | nob |
SNLRetrieval | Retrieval | text | nob |
SwednRetrieval | Retrieval | text | swe |
SweFaqRetrieval | Retrieval | text | swe |
TV2Nordretrieval | Retrieval | text | dan |
TwitterHjerneRetrieval | Retrieval | text | dan |
SNLHierarchicalClusteringS2S | Clustering | text | nob |
SNLHierarchicalClusteringP2P | Clustering | text | nob |
SwednClusteringP2P | Clustering | text | swe |
SwednClusteringS2S | Clustering | text | swe |
VGHierarchicalClusteringS2S | Clustering | text | nob |
VGHierarchicalClusteringP2P | Clustering | text | nob |
Citation
@inproceedings{enevoldsen2024scandinavian,
author = {Enevoldsen, Kenneth and Kardos, M{\'a}rton and Muennighoff, Niklas and Nielbo, Kristoffer},
booktitle = {Advances in Neural Information Processing Systems},
title = {The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding},
url = {https://nips.cc/virtual/2024/poster/97869},
year = {2024},
}
MTEB(cmn, v1)¶
The Chinese Massive Text Embedding Benchmark (C-MTEB) is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
T2Retrieval | Retrieval | text | cmn |
MMarcoRetrieval | Retrieval | text | cmn |
DuRetrieval | Retrieval | text | cmn |
CovidRetrieval | Retrieval | text | cmn |
CmedqaRetrieval | Retrieval | text | cmn |
EcomRetrieval | Retrieval | text | cmn |
MedicalRetrieval | Retrieval | text | cmn |
VideoRetrieval | Retrieval | text | cmn |
T2Reranking | Reranking | text | cmn |
MMarcoReranking | Reranking | text | cmn |
CMedQAv1-reranking | Reranking | text | cmn |
CMedQAv2-reranking | Reranking | text | cmn |
Ocnli | PairClassification | text | cmn |
Cmnli | PairClassification | text | cmn |
CLSClusteringS2S | Clustering | text | cmn |
CLSClusteringP2P | Clustering | text | cmn |
ThuNewsClusteringS2S | Clustering | text | cmn |
ThuNewsClusteringP2P | Clustering | text | cmn |
LCQMC | STS | text | cmn |
PAWSX | STS | text | cmn |
AFQMC | STS | text | cmn |
QBQTC | STS | text | cmn |
TNews | Classification | text | cmn |
IFlyTek | Classification | text | cmn |
Waimai | Classification | text | cmn |
OnlineShopping | Classification | text | cmn |
JDReview | Classification | text | cmn |
MultilingualSentiment | Classification | text | cmn |
ATEC | STS | text | cmn |
BQ | STS | text | cmn |
STSB | STS | text | cmn |
Citation
@misc{c-pack,
archiveprefix = {arXiv},
author = {Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
eprint = {2309.07597},
primaryclass = {cs.CL},
title = {C-Pack: Packaged Resources To Advance General Chinese Embedding},
year = {2023},
}
MTEB(deu, v1)¶
A benchmark for text-embedding performance in German.
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
BlurbsClusteringP2P | Clustering | text | deu |
BlurbsClusteringS2S | Clustering | text | deu |
TenKGnadClusteringP2P | Clustering | text | deu |
TenKGnadClusteringS2S | Clustering | text | deu |
FalseFriendsGermanEnglish | PairClassification | text | deu |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
GermanQuAD-Retrieval | Retrieval | text | deu |
GermanDPR | Retrieval | text | deu |
XMarket | Retrieval | text | deu, eng, spa |
GerDaLIR | Retrieval | text | deu |
GermanSTSBenchmark | STS | text | deu |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{wehrli2024germantextembeddingclustering,
archiveprefix = {arXiv},
author = {Silvan Wehrli and Bert Arnrich and Christopher Irrgang},
eprint = {2401.02709},
primaryclass = {cs.CL},
title = {German Text Embedding Clustering Benchmark},
url = {https://arxiv.org/abs/2401.02709},
year = {2024},
}
MTEB(eng, v1)¶
The original English benchmark by Muennighoff et al. (2023). This page is an adaptation of the old MTEB leaderboard. We recommend using MTEB(eng, v2) instead: it uses updated versions of the tasks, which makes it notably faster to run and resolves a known bug in existing tasks. It also removes datasets commonly used for fine-tuning, such as MSMARCO, which makes model performance scores more comparable. In general, however, both benchmarks provide similar estimates.
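To follow this recommendation programmatically, both versions can be loaded by name and compared; a hedged sketch, assuming both benchmark names are registered in your installed mteb version:

import mteb

legacy = mteb.get_benchmark("MTEB(eng, v1)")
current = mteb.get_benchmark("MTEB(eng, v2)")

# Tasks dropped in v2 (e.g. MSMARCO) show up in this set difference.
legacy_names = {t.metadata.name for t in legacy.tasks}
current_names = {t.metadata.name for t in current.tasks}
print(sorted(legacy_names - current_names))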
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonPolarityClassification | Classification | text | eng |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
ArguAna | Retrieval | text | eng |
ArxivClusteringP2P | Clustering | text | eng |
ArxivClusteringS2S | Clustering | text | eng |
AskUbuntuDupQuestions | Reranking | text | eng |
BIOSSES | STS | text | eng |
Banking77Classification | Classification | text | eng |
BiorxivClusteringP2P | Clustering | text | eng |
BiorxivClusteringS2S | Clustering | text | eng |
CQADupstackRetrieval | Retrieval | text | eng |
ClimateFEVER | Retrieval | text | eng |
DBPedia | Retrieval | text | eng |
EmotionClassification | Classification | text | eng |
FEVER | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
HotpotQA | Retrieval | text | eng |
ImdbClassification | Classification | text | eng |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MedrxivClusteringP2P | Clustering | text | eng |
MedrxivClusteringS2S | Clustering | text | eng |
MindSmallReranking | Reranking | text | eng |
NFCorpus | Retrieval | text | eng |
NQ | Retrieval | text | eng |
QuoraRetrieval | Retrieval | text | eng |
RedditClustering | Clustering | text | eng |
RedditClusteringP2P | Clustering | text | eng |
SCIDOCS | Retrieval | text | eng |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STS16 | STS | text | eng |
STSBenchmark | STS | text | eng |
SciDocsRR | Reranking | text | eng |
SciFact | Retrieval | text | eng |
SprintDuplicateQuestions | PairClassification | text | eng |
StackExchangeClustering | Clustering | text | eng |
StackExchangeClusteringP2P | Clustering | text | eng |
StackOverflowDupQuestions | Reranking | text | eng |
SummEval | Summarization | text | eng |
TRECCOVID | Retrieval | text | eng |
Touche2020 | Retrieval | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetSentimentExtractionClassification | Classification | text | eng |
TwentyNewsgroupsClustering | Clustering | text | eng |
TwitterSemEval2015 | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
MSMARCO | Retrieval | text | eng |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{muennighoff2022mteb,
author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Loïc and Reimers, Nils},
doi = {10.48550/ARXIV.2210.07316},
journal = {arXiv preprint arXiv:2210.07316},
publisher = {arXiv},
title = {MTEB: Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2210.07316},
year = {2022},
}
MTEB(eng, v2)¶
The new English Massive Text Embedding Benchmark. This benchmark was created to account for the fact that many models have now been fine-tuned on tasks in the original MTEB, and it contains tasks that are less frequently used for model training. As a result, the new benchmark and leaderboard give a more realistic picture of models' generalization performance.
The original MTEB leaderboard is available under the MTEB(eng, v1) tab.
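A complete evaluation run over this benchmark might look as follows; a sketch assuming a recent mteb version, with sentence-transformers/all-MiniLM-L6-v2 used purely as an illustrative model choice:

import mteb

benchmark = mteb.get_benchmark("MTEB(eng, v2)")
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Run all tasks; per-task JSON results are also written under output_folder.
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="results")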
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna | Retrieval | text | eng |
ArXivHierarchicalClusteringP2P | Clustering | text | eng |
ArXivHierarchicalClusteringS2S | Clustering | text | eng |
AskUbuntuDupQuestions | Reranking | text | eng |
BIOSSES | STS | text | eng |
Banking77Classification | Classification | text | eng |
BiorxivClusteringP2P.v2 | Clustering | text | eng |
CQADupstackGamingRetrieval | Retrieval | text | eng |
CQADupstackUnixRetrieval | Retrieval | text | eng |
ClimateFEVERHardNegatives | Retrieval | text | eng |
FEVERHardNegatives | Retrieval | text | eng |
FiQA2018 | Retrieval | text | eng |
HotpotQAHardNegatives | Retrieval | text | eng |
ImdbClassification | Classification | text | eng |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MedrxivClusteringP2P.v2 | Clustering | text | eng |
MedrxivClusteringS2S.v2 | Clustering | text | eng |
MindSmallReranking | Reranking | text | eng |
SCIDOCS | Retrieval | text | eng |
SICK-R | STS | text | eng |
STS12 | STS | text | eng |
STS13 | STS | text | eng |
STS14 | STS | text | eng |
STS15 | STS | text | eng |
STSBenchmark | STS | text | eng |
SprintDuplicateQuestions | PairClassification | text | eng |
StackExchangeClustering.v2 | Clustering | text | eng |
StackExchangeClusteringP2P.v2 | Clustering | text | eng |
TRECCOVID | Retrieval | text | eng |
Touche2020Retrieval.v3 | Retrieval | text | eng |
ToxicConversationsClassification | Classification | text | eng |
TweetSentimentExtractionClassification | Classification | text | eng |
TwentyNewsgroupsClustering.v2 | Clustering | text | eng |
TwitterSemEval2015 | PairClassification | text | eng |
TwitterURLCorpus | PairClassification | text | eng |
SummEvalSummarization.v2 | Summarization | text | eng |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
STS17 | STS | text | ara, deu, eng, fra, ita, ... (9) |
STS22.v2 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{enevoldsen2025mmtebmassivemultilingualtext,
author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
doi = {10.48550/arXiv.2502.13595},
journal = {arXiv preprint arXiv:2502.13595},
publisher = {arXiv},
title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
url = {https://arxiv.org/abs/2502.13595},
year = {2025},
}
MTEB(fas, v1)¶
The Persian Massive Text Embedding Benchmark (FaMTEB) is a comprehensive benchmark for Persian text embeddings covering 7 tasks and 60+ datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
PersianFoodSentimentClassification | Classification | text | fas |
SynPerChatbotConvSAClassification | Classification | text | fas |
SynPerChatbotConvSAToneChatbotClassification | Classification | text | fas |
SynPerChatbotConvSAToneUserClassification | Classification | text | fas |
SynPerChatbotSatisfactionLevelClassification | Classification | text | fas |
SynPerChatbotRAGToneChatbotClassification | Classification | text | fas |
SynPerChatbotRAGToneUserClassification | Classification | text | fas |
SynPerChatbotToneChatbotClassification | Classification | text | fas |
SynPerChatbotToneUserClassification | Classification | text | fas |
SynPerTextToneClassification | Classification | text | fas |
SIDClassification | Classification | text | fas |
DeepSentiPers | Classification | text | fas |
PersianTextEmotion | Classification | text | fas |
SentimentDKSF | Classification | text | fas |
NLPTwitterAnalysisClassification | Classification | text | fas |
DigikalamagClassification | Classification | text | fas |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
BeytooteClustering | Clustering | text | fas |
DigikalamagClustering | Clustering | text | fas |
HamshahriClustring | Clustering | text | fas |
NLPTwitterAnalysisClustering | Clustering | text | fas |
SIDClustring | Clustering | text | fas |
FarsTail | PairClassification | text | fas |
CExaPPC | PairClassification | text | fas |
SynPerChatbotRAGFAQPC | PairClassification | text | fas |
FarsiParaphraseDetection | PairClassification | text | fas |
SynPerTextKeywordsPC | PairClassification | text | fas |
SynPerQAPC | PairClassification | text | fas |
ParsinluEntail | PairClassification | text | fas |
ParsinluQueryParaphPC | PairClassification | text | fas |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
WikipediaRerankingMultilingual | Reranking | text | ben, bul, ces, dan, deu, ... (16) |
SynPerQARetrieval | Retrieval | text | fas |
SynPerChatbotTopicsRetrieval | Retrieval | text | fas |
SynPerChatbotRAGTopicsRetrieval | Retrieval | text | fas |
SynPerChatbotRAGFAQRetrieval | Retrieval | text | fas |
PersianWebDocumentRetrieval | Retrieval | text | fas |
WikipediaRetrievalMultilingual | Retrieval | text | ben, bul, ces, dan, deu, ... (16) |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
ClimateFEVER-Fa | Retrieval | text | fas |
DBPedia-Fa | Retrieval | text | fas |
HotpotQA-Fa | Retrieval | text | fas |
MSMARCO-Fa | Retrieval | text | fas |
NQ-Fa | Retrieval | text | fas |
ArguAna-Fa | Retrieval | text | fas |
CQADupstackRetrieval-Fa | Retrieval | text | fas |
FiQA2018-Fa | Retrieval | text | fas |
NFCorpus-Fa | Retrieval | text | fas |
QuoraRetrieval-Fa | Retrieval | text | fas |
SCIDOCS-Fa | Retrieval | text | fas |
SciFact-Fa | Retrieval | text | fas |
TRECCOVID-Fa | Retrieval | text | fas |
Touche2020-Fa | Retrieval | text | fas |
Farsick | STS | text | fas |
SynPerSTS | STS | text | fas |
Query2Query | STS | text | fas |
SAMSumFa | BitextMining | text | fas |
SynPerChatbotSumSRetrieval | BitextMining | text | fas |
SynPerChatbotRAGSumSRetrieval | BitextMining | text | fas |
Citation
@article{zinvandi2025famteb,
author = {Zinvandi, Erfan and Alikhani, Morteza and Sarmadi, Mehran and Pourbahman, Zahra and Arvin, Sepehr and Kazemi, Reza and Amini, Arash},
journal = {arXiv preprint arXiv:2502.11571},
title = {Famteb: Massive text embedding benchmark in persian language},
year = {2025},
}
MTEB(fra, v1)¶
MTEB-French, a French expansion of the original benchmark with high-quality native French datasets.
Tasks
name | type | modalities | languages |
---|---|---|---|
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
MasakhaNEWSClassification | Classification | text | amh, eng, fra, hau, ibo, ... (16) |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MTOPDomainClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
MTOPIntentClassification | Classification | text | deu, eng, fra, hin, spa, ... (6) |
AlloProfClusteringP2P | Clustering | text | fra |
AlloProfClusteringS2S | Clustering | text | fra |
HALClusteringS2S | Clustering | text | fra |
MasakhaNEWSClusteringP2P | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
MasakhaNEWSClusteringS2S | Clustering | text | amh, eng, fra, hau, ibo, ... (16) |
MLSUMClusteringP2P | Clustering | text | deu, fra, rus, spa |
MLSUMClusteringS2S | Clustering | text | deu, fra, rus, spa |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
AlloprofReranking | Reranking | text | fra |
SyntecReranking | Reranking | text | fra |
AlloprofRetrieval | Retrieval | text | fra |
BSARDRetrieval | Retrieval | text | fra |
MintakaRetrieval | Retrieval | text | ara, deu, fra, hin, ita, ... (8) |
SyntecRetrieval | Retrieval | text | fra |
XPQARetrieval | Retrieval | text | ara, cmn, deu, eng, fra, ... (13) |
SICKFr | STS | text | fra |
STSBenchmarkMultilingualSTS | STS | text | cmn, deu, eng, fra, ita, ... (10) |
SummEvalFr | Summarization | text | fra |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@misc{ciancone2024mtebfrenchresourcesfrenchsentence,
archiveprefix = {arXiv},
author = {Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
eprint = {2405.20468},
primaryclass = {cs.CL},
title = {MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis},
url = {https://arxiv.org/abs/2405.20468},
year = {2024},
}
MTEB(jpn, v1)¶
JMTEB is a benchmark for evaluating Japanese text embedding models.
Tasks
name | type | modalities | languages |
---|---|---|---|
LivedoorNewsClustering.v2 | Clustering | text | jpn |
MewsC16JaClustering | Clustering | text | jpn |
AmazonReviewsClassification | Classification | text | cmn, deu, eng, fra, jpn, ... (6) |
AmazonCounterfactualClassification | Classification | text | deu, eng, jpn |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
JSTS | STS | text | jpn |
JSICK | STS | text | jpn |
PawsXPairClassification | PairClassification | text | cmn, deu, eng, fra, jpn, ... (7) |
JaqketRetrieval | Retrieval | text | jpn |
MrTidyRetrieval | Retrieval | text | ara, ben, eng, fin, ind, ... (11) |
JaGovFaqsRetrieval | Retrieval | text | jpn |
NLPJournalTitleAbsRetrieval | Retrieval | text | jpn |
NLPJournalAbsIntroRetrieval | Retrieval | text | jpn |
NLPJournalTitleIntroRetrieval | Retrieval | text | jpn |
ESCIReranking | Reranking | text | eng, jpn, spa |
MTEB(kor, v1)¶
A benchmark and leaderboard for evaluating text embeddings in Korean.
Tasks
name | type | modalities | languages |
---|---|---|---|
KLUE-TC | Classification | text | kor |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
Ko-StrategyQA | Retrieval | text | kor |
KLUE-STS | STS | text | kor |
KorSTS | STS | text | kor |
MTEB(pol, v1)¶
The Polish Massive Text Embedding Benchmark (PL-MTEB) is a comprehensive benchmark for text embeddings in Polish. PL-MTEB consists of 28 diverse NLP tasks from 5 task types, adapted from datasets previously used by the Polish NLP community. In addition, a new PLSC (Polish Library of Science Corpus) dataset was created, consisting of titles and abstracts of Polish scientific publications, and used as the basis for two novel clustering tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
AllegroReviews | Classification | text | pol |
CBD | Classification | text | pol |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
PolEmo2.0-IN | Classification | text | pol |
PolEmo2.0-OUT | Classification | text | pol |
PAC | Classification | text | pol |
EightTagsClustering | Clustering | text | pol |
PlscClusteringS2S | Clustering | text | pol |
PlscClusteringP2P | Clustering | text | pol |
CDSC-E | PairClassification | text | pol |
PpcPC | PairClassification | text | pol |
PSC | PairClassification | text | pol |
SICK-E-PL | PairClassification | text | pol |
CDSC-R | STS | text | pol |
SICK-R-PL | STS | text | pol |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
Citation
@article{poswiata2024plmteb,
author = {Rafał Poświata and Sławomir Dadas and Michał Perełkiewicz},
journal = {arXiv preprint arXiv:2405.10138},
title = {PL-MTEB: Polish Massive Text Embedding Benchmark},
year = {2024},
}
MTEB(rus, v1)¶
A Russian version of the Massive Text Embedding Benchmark with a number of novel Russian tasks in all task categories of the original MTEB.
Tasks
name | type | modalities | languages |
---|---|---|---|
GeoreviewClassification | Classification | text | rus |
HeadlineClassification | Classification | text | rus |
InappropriatenessClassification | Classification | text | rus |
KinopoiskClassification | Classification | text | rus |
MassiveIntentClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
MassiveScenarioClassification | Classification | text | afr, amh, ara, aze, ben, ... (50) |
RuReviewsClassification | Classification | text | rus |
RuSciBenchGRNTIClassification | Classification | text | rus |
RuSciBenchOECDClassification | Classification | text | rus |
GeoreviewClusteringP2P | Clustering | text | rus |
RuSciBenchGRNTIClusteringP2P | Clustering | text | rus |
RuSciBenchOECDClusteringP2P | Clustering | text | rus |
CEDRClassification | MultilabelClassification | text | rus |
SensitiveTopicsClassification | MultilabelClassification | text | rus |
TERRa | PairClassification | text | rus |
MIRACLReranking | Reranking | text | ara, ben, deu, eng, fas, ... (18) |
RuBQReranking | Reranking | text | rus |
MIRACLRetrieval | Retrieval | text | ara, ben, deu, eng, fas, ... (18) |
RiaNewsRetrieval | Retrieval | text | rus |
RuBQRetrieval | Retrieval | text | rus |
RUParaPhraserSTS | STS | text | rus |
STS22 | STS | text | ara, cmn, deu, eng, fra, ... (10) |
RuSTSBenchmarkSTS | STS | text | rus |
Citation
@misc{snegirev2024russianfocusedembeddersexplorationrumteb,
archiveprefix = {arXiv},
author = {Artem Snegirev and Maria Tikhonova and Anna Maksimova and Alena Fenogenova and Alexander Abramov},
eprint = {2408.12503},
primaryclass = {cs.CL},
title = {The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design},
url = {https://arxiv.org/abs/2408.12503},
year = {2024},
}
NanoBEIR¶
A benchmark that evaluates on small subsets of the BEIR datasets, substantially reducing the compute needed for evaluation.
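Because each Nano task is a small slice of its BEIR counterpart, the suite works well as a quick smoke test before a full BEIR run. A sketch under the same assumptions as the examples above ("NanoBEIR" being the registered benchmark name; the single task name is taken from the table below):

import mteb

# Run the whole Nano suite, or cherry-pick a single small task.
benchmark = mteb.get_benchmark("NanoBEIR")
tasks = mteb.get_tasks(tasks=["NanoArguAnaRetrieval"])

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
results = mteb.MTEB(tasks=tasks).run(model, output_folder="results")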
Tasks
name | type | modalities | languages |
---|---|---|---|
NanoArguAnaRetrieval | Retrieval | text | eng |
NanoClimateFeverRetrieval | Retrieval | text | eng |
NanoDBPediaRetrieval | Retrieval | text | eng |
NanoFEVERRetrieval | Retrieval | text | eng |
NanoFiQA2018Retrieval | Retrieval | text | eng |
NanoHotpotQARetrieval | Retrieval | text | eng |
NanoMSMARCORetrieval | Retrieval | text | eng |
NanoNFCorpusRetrieval | Retrieval | text | eng |
NanoNQRetrieval | Retrieval | text | eng |
NanoQuoraRetrieval | Retrieval | text | eng |
NanoSCIDOCSRetrieval | Retrieval | text | eng |
NanoSciFactRetrieval | Retrieval | text | eng |
NanoTouche2020Retrieval | Retrieval | text | eng |
R2MED¶
R2MED: the first reasoning-driven medical retrieval benchmark. R2MED is a high-quality, high-resolution information retrieval (IR) dataset designed for medical scenarios. It contains 876 queries spanning three retrieval tasks, five medical scenarios, and twelve body systems.
Tasks
name | type | modalities | languages |
---|---|---|---|
R2MEDBiologyRetrieval | Retrieval | text | eng |
R2MEDBioinformaticsRetrieval | Retrieval | text | eng |
R2MEDMedicalSciencesRetrieval | Retrieval | text | eng |
R2MEDMedXpertQAExamRetrieval | Retrieval | text | eng |
R2MEDMedQADiagRetrieval | Retrieval | text | eng |
R2MEDPMCTreatmentRetrieval | Retrieval | text | eng |
R2MEDPMCClinicalRetrieval | Retrieval | text | eng |
R2MEDIIYiClinicalRetrieval | Retrieval | text | eng |
Citation
@article{li2025r2med,
author = {Li, Lei and Zhou, Xiao and Liu, Zheng},
journal = {arXiv preprint arXiv:2505.14558},
title = {R2MED: A Benchmark for Reasoning-Driven Medical Retrieval},
year = {2025},
}
RAR-b¶
A benchmark for evaluating the reasoning capabilities of retrievers.
Tasks
name | type | modalities | languages |
---|---|---|---|
ARCChallenge | Retrieval | text | eng |
AlphaNLI | Retrieval | text | eng |
HellaSwag | Retrieval | text | eng |
WinoGrande | Retrieval | text | eng |
PIQA | Retrieval | text | eng |
SIQA | Retrieval | text | eng |
Quail | Retrieval | text | eng |
SpartQA | Retrieval | text | eng |
TempReasonL1 | Retrieval | text | eng |
TempReasonL2Pure | Retrieval | text | eng |
TempReasonL2Fact | Retrieval | text | eng |
TempReasonL2Context | Retrieval | text | eng |
TempReasonL3Pure | Retrieval | text | eng |
TempReasonL3Fact | Retrieval | text | eng |
TempReasonL3Context | Retrieval | text | eng |
RARbCode | Retrieval | text | eng |
RARbMath | Retrieval | text | eng |
Citation
@article{xiao2024rar,
author = {Xiao, Chenghao and Hudson, G Thomas and Al Moubayed, Noura},
journal = {arXiv preprint arXiv:2404.06347},
title = {RAR-b: Reasoning as Retrieval Benchmark},
year = {2024},
}
RuSciBench¶
RuSciBench is a benchmark designed for evaluating sentence encoders and language models on scientific texts in both Russian and English. The data is sourced from eLibrary (www.elibrary.ru), Russia's largest electronic library of scientific publications. This benchmark facilitates the evaluation and comparison of models on various research-related tasks.
Tasks
name | type | modalities | languages |
---|---|---|---|
RuSciBenchBitextMining | BitextMining | text | eng, rus |
RuSciBenchCoreRiscClassification | Classification | text | eng, rus |
RuSciBenchGRNTIClassification.v2 | Classification | text | eng, rus |
RuSciBenchOECDClassification.v2 | Classification | text | eng, rus |
RuSciBenchPubTypeClassification | Classification | text | eng, rus |
RuSciBenchCiteRetrieval | Retrieval | text | eng, rus |
RuSciBenchCociteRetrieval | Retrieval | text | eng, rus |
RuSciBenchCitedCountRegression | Regression | text | eng, rus |
RuSciBenchYearPublRegression | Regression | text | eng, rus |
Citation
@article{vatolin2024ruscibench,
author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
doi = {10.1134/S1064562424602191},
issn = {1531-8362},
journal = {Doklady Mathematics},
month = {12},
number = {1},
pages = {S251--S260},
title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
url = {https://doi.org/10.1134/S1064562424602191},
volume = {110},
year = {2024},
}
VN-MTEB (vie, v1)¶
A benchmark for text-embedding performance in Vietnamese.
Tasks
name | type | modalities | languages |
---|---|---|---|
ArguAna-VN | Retrieval | text | vie |
SciFact-VN | Retrieval | text | vie |
ClimateFEVER-VN | Retrieval | text | vie |
FEVER-VN | Retrieval | text | vie |
DBPedia-VN | Retrieval | text | vie |
NQ-VN | Retrieval | text | vie |
HotpotQA-VN | Retrieval | text | vie |
MSMARCO-VN | Retrieval | text | vie |
TRECCOVID-VN | Retrieval | text | vie |
FiQA2018-VN | Retrieval | text | vie |
NFCorpus-VN | Retrieval | text | vie |
SCIDOCS-VN | Retrieval | text | vie |
Touche2020-VN | Retrieval | text | vie |
Quora-VN | Retrieval | text | vie |
CQADupstackAndroid-VN | Retrieval | text | vie |
CQADupstackGis-VN | Retrieval | text | vie |
CQADupstackMathematica-VN | Retrieval | text | vie |
CQADupstackPhysics-VN | Retrieval | text | vie |
CQADupstackProgrammers-VN | Retrieval | text | vie |
CQADupstackStats-VN | Retrieval | text | vie |
CQADupstackTex-VN | Retrieval | text | vie |
CQADupstackUnix-VN | Retrieval | text | vie |
CQADupstackWebmasters-VN | Retrieval | text | vie |
CQADupstackWordpress-VN | Retrieval | text | vie |
Banking77VNClassification | Classification | text | vie |
EmotionVNClassification | Classification | text | vie |
AmazonCounterfactualVNClassification | Classification | text | vie |
MTOPDomainVNClassification | Classification | text | vie |
TweetSentimentExtractionVNClassification | Classification | text | vie |
ToxicConversationsVNClassification | Classification | text | vie |
ImdbVNClassification | Classification | text | vie |
MTOPIntentVNClassification | Classification | text | vie |
MassiveScenarioVNClassification | Classification | text | vie |
MassiveIntentVNClassification | Classification | text | vie |
AmazonReviewsVNClassification | Classification | text | vie |
AmazonPolarityVNClassification | Classification | text | vie |
SprintDuplicateQuestions-VN | PairClassification | text | vie |
TwitterSemEval2015-VN | PairClassification | text | vie |
TwitterURLCorpus-VN | PairClassification | text | vie |
TwentyNewsgroupsClustering-VN | Clustering | text | vie |
RedditClusteringP2P-VN | Clustering | text | vie |
StackExchangeClusteringP2P-VN | Clustering | text | vie |
StackExchangeClustering-VN | Clustering | text | vie |
RedditClustering-VN | Clustering | text | vie |
SciDocsRR-VN | Reranking | text | vie |
AskUbuntuDupQuestions-VN | Reranking | text | vie |
StackOverflowDupQuestions-VN | Reranking | text | vie |
BIOSSES-VN | STS | text | vie |
SICK-R-VN | STS | text | vie |
STSBenchmark-VN | STS | text | vie |
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
archiveprefix = {arXiv},
author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
eprint = {2507.21500},
primaryclass = {cs.CL},
title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
url = {https://arxiv.org/abs/2507.21500},
year = {2025},
}
ViDoRe(v1)¶
Retrieve the document pages relevant to a given question.
Tasks
name | type | modalities | languages |
---|---|---|---|
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
ViDoRe(v2)¶
Retrieve the document pages relevant to a given question.
Tasks
name | type | modalities | languages |
---|---|---|---|
Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
VisualDocumentRetrieval¶
A benchmark for evaluating visual document retrieval, combining ViDoRe v1 and v2.
Tasks
name | type | modalities | languages |
---|---|---|---|
VidoreArxivQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreDocVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreInfoVQARetrieval | DocumentUnderstanding | text, image | eng |
VidoreTabfquadRetrieval | DocumentUnderstanding | text, image | eng |
VidoreTatdqaRetrieval | DocumentUnderstanding | text, image | eng |
VidoreShiftProjectRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAAIRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAEnergyRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAGovernmentReportsRetrieval | DocumentUnderstanding | text, image | eng |
VidoreSyntheticDocQAHealthcareIndustryRetrieval | DocumentUnderstanding | text, image | eng |
Vidore2ESGReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2EconomicsReportsRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2BioMedicalLecturesRetrieval | DocumentUnderstanding | text, image | deu, eng, fra, spa |
Vidore2ESGReportsHLRetrieval | DocumentUnderstanding | text, image | eng |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}