Reranking

  • Number of tasks: 29
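All of the tasks listed below can be evaluated with the mteb Python package. The snippet that follows is a minimal sketch only (the exact API can differ between mteb versions), using AskUbuntuDupQuestions as the example task and a small SentenceTransformer model:

import mteb
from sentence_transformers import SentenceTransformer

# Any task name from this page can be substituted here.
tasks = mteb.get_tasks(tasks=["AskUbuntuDupQuestions"])
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")  # writes one JSON result file per task
print(results)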

AlloprofReranking

This dataset was provided by AlloProf, an organisation in Quebec, Canada, that offers students resources and a help forum curated by a large number of teachers, covering all subjects taught in primary and secondary school.

Dataset: mteb/AlloprofReranking • License: cc-by-nc-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | fra | Academic, Web, Written | expert-annotated | found
Citation
@misc{lef23,
  author = {Lefebvre-Brossard, Antoine and Gazaille, Stephane and Desmarais, Michel C.},
  copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International},
  doi = {10.48550/ARXIV.2302.07738},
  keywords = {Computation and Language (cs.CL), Information Retrieval (cs.IR), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  publisher = {arXiv},
  title = {Alloprof: a new French question-answer education dataset and its use in an information retrieval case study},
  url = {https://arxiv.org/abs/2302.07738},
  year = {2023},
}

AskUbuntuDupQuestions

AskUbuntu Question Dataset - Questions from AskUbuntu with manual annotations marking pairs of questions as similar or non-similar

Dataset: mteb/AskUbuntuDupQuestions • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | eng | Programming, Web | human-annotated | found
Citation
@article{wang-2021-TSDAE,
  author = {Wang, Kexin and Reimers, Nils and  Gurevych, Iryna},
  journal = {arXiv preprint arXiv:2104.06979},
  month = {4},
  title = {TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning},
  url = {https://arxiv.org/abs/2104.06979},
  year = {2021},
}

AskUbuntuDupQuestions-VN

A Vietnamese translation of the AskUbuntu Question Dataset: questions from AskUbuntu with manual annotations marking pairs of questions as similar or non-similar. The VN-MTEB (Vietnamese Massive Text Embedding Benchmark) samples were created from the English samples with an automated pipeline: large language models (specifically Cohere's Aya model) translate the texts, advanced embedding models filter the translations, and an LLM-as-a-judge scores sample quality against multiple criteria.

Dataset: GreenNode/askubuntudupquestions-reranking-vn • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | map | vie | Programming, Web | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}
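The embedding-based filtering step mentioned in the VN-MTEB description above could look roughly like the following. This is an illustrative sketch only: the LaBSE model and the 0.8 similarity threshold are assumptions made for demonstration, not the settings used by the VN-MTEB authors.

from sentence_transformers import SentenceTransformer, util

# Multilingual embeddings let us compare an English source with its Vietnamese translation.
model = SentenceTransformer("sentence-transformers/LaBSE")

def keep_translation(src_en: str, tgt_vi: str, threshold: float = 0.8) -> bool:
    """Keep a translated sample only if source and translation embed closely."""
    emb = model.encode([src_en, tgt_vi], normalize_embeddings=True, convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

print(keep_translation("How do I install a package?", "Làm cách nào để cài đặt một gói?"))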

BuiltBenchReranking

Reranking of built asset entity type/class descriptions given a query describing an entity as represented in well-established industry classification systems such as Uniclass, IFC, etc.

Dataset: mteb/BuiltBenchReranking • License: cc-by-nd-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map | eng | Engineering, Written | derived | created
Citation
@article{shahinmoghadam2024benchmarking,
  author = {Shahinmoghadam, Mehrzad and Motamedi, Ali},
  journal = {arXiv preprint arXiv:2411.12056},
  title = {Benchmarking pre-trained text embedding models in aligning built asset information},
  year = {2024},
}

CMedQAv1-reranking

Chinese community medical question answering

Dataset: mteb/CMedQAv1-reranking • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | cmn | Medical, Written | expert-annotated | found
Citation
@article{zhang2017chinese,
  author = {Zhang, Sheng and Zhang, Xin and Wang, Hui and Cheng, Jiajun and Li, Pei and Ding, Zhaoyun},
  journal = {Applied Sciences},
  number = {8},
  pages = {767},
  publisher = {Multidisciplinary Digital Publishing Institute},
  title = {Chinese Medical Question Answer Matching Using End-to-End Character-Level Multi-Scale CNNs},
  volume = {7},
  year = {2017},
}

CMedQAv2-reranking

Chinese community medical question answering

Dataset: mteb/CMedQAv2-reranking • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | cmn | Medical, Written | not specified | not specified
Citation
@article{8548603,
  author = {S. Zhang and X. Zhang and H. Wang and L. Guo and S. Liu},
  doi = {10.1109/ACCESS.2018.2883637},
  issn = {2169-3536},
  journal = {IEEE Access},
  keywords = {Biomedical imaging;Data mining;Semantics;Medical services;Feature extraction;Knowledge discovery;Medical question answering;interactive attention;deep learning;deep neural networks},
  month = {},
  number = {},
  pages = {74061-74071},
  title = {Multi-Scale Attentive Interaction Networks for Chinese Medical Question Answer Selection},
  volume = {6},
  year = {2018},
}

CodeRAGLibraryDocumentationSolutions

Evaluation of code library documentation retrieval using CodeRAG-Bench. Tests the ability to retrieve relevant Python library documentation sections given code-related queries.

Dataset: code-rag-bench/library-documentation • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | ndcg_at_10 | python | Programming | derived | found
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
  archiveprefix = {arXiv},
  author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
  eprint = {2406.14497},
  primaryclass = {cs.SE},
  title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
  url = {https://arxiv.org/abs/2406.14497},
  year = {2024},
}
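For reference, the ndcg_at_10 score used by the CodeRAG tasks can be computed with scikit-learn; the relevance labels and model scores below are made up for illustration.

import numpy as np
from sklearn.metrics import ndcg_score

# One query, ten candidate documentation sections: 1 = relevant, 0 = not relevant.
true_relevance = np.asarray([[1, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
# Similarity scores a retrieval model assigned to the same ten candidates.
model_scores = np.asarray([[0.9, 0.2, 0.1, 0.4, 0.3, 0.05, 0.6, 0.0, 0.1, 0.2]])

print(ndcg_score(true_relevance, model_scores, k=10))  # 1.0 only when all relevant docs rank first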

CodeRAGOnlineTutorials

Evaluation of online programming tutorial retrieval using CodeRAG-Bench. Tests the ability to retrieve relevant tutorials from online platforms given code-related queries.

Dataset: code-rag-bench/online-tutorials • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | ndcg_at_10 | python | Programming | derived | found
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
  archiveprefix = {arXiv},
  author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
  eprint = {2406.14497},
  primaryclass = {cs.SE},
  title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
  url = {https://arxiv.org/abs/2406.14497},
  year = {2024},
}

CodeRAGProgrammingSolutions

Evaluation of programming solution retrieval using CodeRAG-Bench. Tests the ability to retrieve relevant programming solutions given code-related queries.

Dataset: code-rag-bench/programming-solutions • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | ndcg_at_10 | python | Programming | derived | found
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
  archiveprefix = {arXiv},
  author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
  eprint = {2406.14497},
  primaryclass = {cs.SE},
  title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
  url = {https://arxiv.org/abs/2406.14497},
  year = {2024},
}

CodeRAGStackoverflowPosts

Evaluation of StackOverflow post retrieval using CodeRAG-Bench. Tests the ability to retrieve relevant StackOverflow posts given code-related queries.

Dataset: code-rag-bench/stackoverflow-posts • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | ndcg_at_10 | python | Programming | derived | found
Citation
@misc{wang2024coderagbenchretrievalaugmentcode,
  archiveprefix = {arXiv},
  author = {Zora Zhiruo Wang and Akari Asai and Xinyan Velocity Yu and Frank F. Xu and Yiqing Xie and Graham Neubig and Daniel Fried},
  eprint = {2406.14497},
  primaryclass = {cs.SE},
  title = {CodeRAG-Bench: Can Retrieval Augment Code Generation?},
  url = {https://arxiv.org/abs/2406.14497},
  year = {2024},
}

ESCIReranking

Dataset: mteb/ESCIReranking • License: apache-2.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | eng, jpn, spa | Written | derived | created
Citation
@article{reddy2022shopping,
  archiveprefix = {arXiv},
  author = {Chandan K. Reddy and Lluís Màrquez and Fran Valero and Nikhil Rao and Hugo Zaragoza and Sambaran Bandyopadhyay and Arnab Biswas and Anlu Xing and Karthik Subbian},
  eprint = {2206.06588},
  title = {Shopping Queries Dataset: A Large-Scale {ESCI} Benchmark for Improving Product Search},
  year = {2022},
}

JQaRAReranking

JQaRA: Japanese Question Answering with Retrieval Augmentation, a Japanese Q&A dataset for evaluating retrieval-augmented generation (RAG). JQaRA is an information retrieval task in which each question is paired with 100 candidate documents (including one or more correct answers).

Dataset: sbintuitions/JMTEB • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map | jpn | Encyclopaedic, Non-fiction, Written | derived | found
Citation
@misc{yuichi-tateno-2024-jqara,
  author = {Yuichi Tateno},
  title = {JQaRA: Japanese Question Answering with Retrieval Augmentation - 検索拡張(RAG)評価のための日本語Q&Aデータセット},
  url = {https://huggingface.co/datasets/hotchpotch/JQaRA},
}
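A rough sketch of the JQaRA setup described above: each question is scored against its candidate passages with an embedding model and the candidates are ranked by similarity. The model choice and the two toy passages are placeholders, not the benchmark's actual data loader.

from sentence_transformers import SentenceTransformer, util

# multilingual-e5 models expect "query:" / "passage:" prefixes.
model = SentenceTransformer("intfloat/multilingual-e5-small")

question = "query: 富士山の高さは？"  # "How tall is Mount Fuji?"
candidates = [
    "passage: 富士山の標高は3776メートルである。",  # "Mount Fuji is 3,776 metres high."
    "passage: 琵琶湖は日本最大の湖である。",        # "Lake Biwa is Japan's largest lake."
]  # ...in JQaRA each question has 100 candidates

q_emb = model.encode(question, normalize_embeddings=True)
c_emb = model.encode(candidates, normalize_embeddings=True)
ranking = util.cos_sim(q_emb, c_emb)[0].argsort(descending=True)
print(ranking)  # candidate indices, best match first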

JaCWIRReranking

JaCWIR is a small-scale Japanese information retrieval evaluation dataset consisting of 5,000 question texts and approximately 500k web page titles with introductions or summaries (meta descriptions, etc.). Each question text is created from one of the 500k web pages, and that page is used as the positive example for the question.

Dataset: sbintuitions/JMTEB • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map | jpn | Web, Written | derived | found
Citation
@misc{yuichi-tateno-2024-jacwir,
  author = {Yuichi Tateno},
  title = {JaCWIR: Japanese Casual Web IR - 日本語情報検索評価のための小規模でカジュアルなWebタイトルと概要のデータセット},
  url = {https://huggingface.co/datasets/hotchpotch/JaCWIR},
}

MIRACLReranking

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual retrieval dataset that focuses on search across 18 different languages.

Dataset: mteb/MIRACLReranking • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | ndcg_at_10 | ara, ben, deu, eng, fas, ... (18) | Encyclopaedic, Written | expert-annotated | created
Citation
@article{10.1162/tacl_a_00595,
  author = {Zhang, Xinyu and Thakur, Nandan and Ogundepo, Odunayo and Kamalloo, Ehsan and Alfonso-Hermelo, David and Li, Xiaoguang and Liu, Qun and Rezagholizadeh, Mehdi and Lin, Jimmy},
  doi = {10.1162/tacl_a_00595},
  issn = {2307-387X},
  journal = {Transactions of the Association for Computational Linguistics},
  month = {09},
  pages = {1114-1131},
  title = {{MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages}},
  volume = {11},
  year = {2023},
}

MMarcoReranking

mMARCO is a multilingual version of the MS MARCO passage ranking dataset

Dataset: mteb/MMarcoReranking • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | cmn | not specified | not specified | not specified
Citation
@misc{bonifacio2021mmarco,
  archiveprefix = {arXiv},
  author = {Luiz Henrique Bonifacio and Vitor Jeronymo and Hugo Queiroz Abonizio and Israel Campiotti and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira},
  eprint = {2108.13897},
  primaryclass = {cs.CL},
  title = {mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset},
  year = {2021},
}

MindSmallReranking

Microsoft News Dataset: A Large-Scale English Dataset for News Recommendation Research

Dataset: mteb/MindSmallReranking • License: https://github.com/msnews/MIND/blob/master/MSR%20License_Data.pdf • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | max_over_subqueries_map_at_1000 | eng | News, Written | expert-annotated | found
Citation
@inproceedings{wu-etal-2020-mind,
  abstract = {News recommendation is an important technique for personalized news
service. Compared with product and movie recommendations which have been comprehensively studied,
the research on news recommendation is much more limited, mainly due to the lack of a high-quality benchmark
dataset. In this paper, we present a large-scale dataset named MIND for news recommendation. Constructed from
the user click logs of Microsoft News, MIND contains 1 million users and more than 160k English news
articles, each of which has rich textual content such as title, abstract and body. We demonstrate MIND a good
testbed for news recommendation through a comparative study of several state-of-the-art news recommendation
methods which are originally developed on different proprietary datasets. Our results show the performance of
news recommendation highly relies on the quality of news content understanding and user interest modeling.
Many natural language processing techniques such as effective text representation methods and pre-trained
language models can effectively improve the performance of news recommendation. The MIND dataset will be
available at https://msnews.github.io.},
  address = {Online},
  author = {Wu, Fangzhao  and Qiao, Ying  and Chen, Jiun-Hung  and Wu, Chuhan  and Qi,
Tao  and Lian, Jianxun  and Liu, Danyang  and Xie, Xing  and Gao, Jianfeng  and Wu, Winnie  and Zhou, Ming},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/2020.acl-main.331},
  editor = {Jurafsky, Dan  and Chai, Joyce  and Schluter, Natalie  and Tetreault, Joel},
  month = jul,
  pages = {3597--3606},
  publisher = {Association for Computational Linguistics},
  title = {{MIND}: A Large-scale Dataset for News
Recommendation},
  url = {https://aclanthology.org/2020.acl-main.331},
  year = {2020},
}

NamaaMrTydiReranking

Mr. TyDi is a multilingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. It is designed for monolingual retrieval, specifically to evaluate ranking with learned dense representations. This dataset adapts the Arabic test split for reranking evaluation by adding multiple hard negatives to each query and its positive passage.

Dataset: mteb/NamaaMrTydiReranking • License: cc-by-sa-3.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map | ara | Encyclopaedic, Written | human-annotated | found
Citation
@article{muennighoff2022mteb,
  author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
  doi = {10.48550/ARXIV.2210.07316},
  journal = {arXiv preprint arXiv:2210.07316},
  publisher = {arXiv},
  title = {MTEB: Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2210.07316},
  year = {2022},
}

NevIR

Paired evaluation of real-world negation in retrieval, with questions and passages. Since models generally always prefer one passage over the other, the model must answer both questions of a pair correctly to show that it understands the negation (hence the paired_accuracy metric).

Dataset: orionweller/NevIR-mteb • License: mit • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | paired_accuracy | eng | Web | human-annotated | created
Citation
@inproceedings{Weller2023NevIRNI,
  author = {Orion Weller and Dawn J Lawrie and Benjamin Van Durme},
  booktitle = {Conference of the European Chapter of the Association for Computational Linguistics},
  title = {NevIR: Negation in Neural Information Retrieval},
  url = {https://api.semanticscholar.org/CorpusID:258676146},
  year = {2023},
}
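The paired_accuracy metric described for NevIR can be sketched in a few lines; the data structure here is illustrative, not the benchmark's actual format.

def paired_accuracy(pairs):
    """pairs: one (question_1_correct, question_2_correct) tuple of booleans per passage pair."""
    hits = sum(1 for q1_ok, q2_ok in pairs if q1_ok and q2_ok)
    return hits / len(pairs)

# Three pairs; only the first has both of its questions answered correctly.
print(paired_accuracy([(True, True), (True, False), (False, False)]))  # 0.333...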

RuBQReranking

Paragraph reranking based on RuBQ 2.0. Give paragraphs that answer the question higher scores.

Dataset: mteb/RuBQReranking • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | rus | Encyclopaedic, Written | human-annotated | created
Citation
@inproceedings{RuBQ2021,
  author = {Ivan Rybin and Vladislav Korablinov and Pavel Efimov and Pavel Braslavski},
  booktitle = {ESWC},
  pages = {532--547},
  title = {RuBQ 2.0: An Innovated Russian Question Answering Dataset},
  year = {2021},
}

SciDocsRR

Ranking of related scientific papers based on their title.

Dataset: mteb/SciDocsRR • License: cc-by-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | eng | Academic, Non-fiction, Written | not specified | found
Citation
@inproceedings{cohan-etal-2020-specter,
  abstract = {Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, accurate embeddings of documents are a necessity. We propose SPECTER, a new method to generate document-level embedding of scientific papers based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, Specter can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation. We show that Specter outperforms a variety of competitive baselines on the benchmark.},
  address = {Online},
  author = {Cohan, Arman  and
Feldman, Sergey  and
Beltagy, Iz  and
Downey, Doug  and
Weld, Daniel},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/2020.acl-main.207},
  editor = {Jurafsky, Dan  and
Chai, Joyce  and
Schluter, Natalie  and
Tetreault, Joel},
  month = jul,
  pages = {2270--2282},
  publisher = {Association for Computational Linguistics},
  title = {{SPECTER}: Document-level Representation Learning using Citation-informed Transformers},
  url = {https://aclanthology.org/2020.acl-main.207},
  year = {2020},
}

SciDocsRR-VN

A Vietnamese translation of SciDocsRR: ranking of related scientific papers based on their titles. The VN-MTEB (Vietnamese Massive Text Embedding Benchmark) samples were created from the English samples with an automated pipeline: large language models (specifically Cohere's Aya model) translate the texts, advanced embedding models filter the translations, and an LLM-as-a-judge scores sample quality against multiple criteria.

Dataset: GreenNode/scidocs-reranking-vn • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | map | vie | Academic, Non-fiction, Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

StackOverflowDupQuestions

Stack Overflow Duplicate Questions Task for questions with the tags Java, JavaScript and Python

Dataset: mteb/StackOverflowDupQuestions • License: cc-by-nc-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | eng | Blog, Programming, Written | derived | found
Citation
@article{Liu2018LinkSOAD,
  author = {Xueqing Liu and Chi Wang and Yue Leng and ChengXiang Zhai},
  journal = {Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering},
  title = {LinkSO: a dataset for learning to retrieve similar question answer pairs on software development forums},
  url = {https://api.semanticscholar.org/CorpusID:53111679},
  year = {2018},
}

StackOverflowDupQuestions-VN

A Vietnamese translation of the Stack Overflow Duplicate Questions task (questions with the tags Java, JavaScript and Python). The VN-MTEB (Vietnamese Massive Text Embedding Benchmark) samples were created from the English samples with an automated pipeline: large language models (specifically Cohere's Aya model) translate the texts, advanced embedding models filter the translations, and an LLM-as-a-judge scores sample quality against multiple criteria.

Dataset: GreenNode/stackoverflowdupquestions-reranking-vn • License: cc-by-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | map | vie | Academic, Non-fiction, Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

SyntecReranking

This dataset has been built from the Syntec Collective bargaining agreement.

Dataset: mteb/SyntecReranking • License: cc-by-nc-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | fra | Legal, Written | human-annotated | found
Citation
@misc{ciancone2024extending,
  archiveprefix = {arXiv},
  author = {Mathieu Ciancone and Imene Kerboua and Marion Schaeffer and Wissam Siblini},
  eprint = {2405.20468},
  primaryclass = {cs.CL},
  title = {Extending the Massive Text Embedding Benchmark to French},
  year = {2024},
}

T2Reranking

T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Dataset: mteb/T2Reranking • License: not specified • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | cmn | not specified | not specified | not specified
Citation
@misc{xie2023t2ranking,
  archiveprefix = {arXiv},
  author = {Xiaohui Xie and Qian Dong and Bingning Wang and Feiyang Lv and Ting Yao and Weinan Gan and Zhijing Wu and Xiangsheng Li and Haitao Li and Yiqun Liu and Jin Ma},
  eprint = {2304.03679},
  primaryclass = {cs.IR},
  title = {T2Ranking: A large-scale Chinese Benchmark for Passage Ranking},
  year = {2023},
}

VoyageMMarcoReranking

A hard-negative-augmented version of the Japanese mMARCO dataset, as used in the Voyage AI evaluation suite.

Dataset: mteb/VoyageMMarcoReranking • License: cc-by-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | jpn | Academic, Non-fiction, Written | derived | found
Citation
@misc{clavié2023jacolbert,
  archiveprefix = {arXiv},
  author = {Benjamin Clavié},
  eprint = {2312.16144},
  title = {JaColBERT and Hard Negatives, Towards Better Japanese-First Embeddings for Retrieval: Early Technical Report},
  year = {2023},
}

WebLINXCandidatesReranking

WebLINX is a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. The reranking task focuses on finding relevant elements at every given step in the trajectory.

Dataset: mteb/WebLINXCandidatesReranking • License: cc-by-nc-sa-4.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | mrr_at_10 | eng | Academic, Web, Written | expert-annotated | created
Citation
@misc{lù2024weblinx,
  archiveprefix = {arXiv},
  author = {Xing Han Lù and Zdeněk Kasner and Siva Reddy},
  eprint = {2402.05930},
  primaryclass = {cs.CL},
  title = {WebLINX: Real-World Website Navigation with Multi-Turn Dialogue},
  year = {2024},
}

WikipediaRerankingMultilingual

The dataset is derived from Cohere's wikipedia-2023-11 dataset and contains synthetically generated queries.

Dataset: mteb/WikipediaRerankingMultilingual • License: cc-by-sa-3.0 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map_at_1000 | ben, bul, ces, dan, deu, ... (16) | Encyclopaedic, Written | LM-generated and reviewed | LM-generated and verified
Citation
@online{wikidump,
  author = {Wikimedia Foundation},
  title = {Wikimedia Downloads},
  url = {https://dumps.wikimedia.org},
}

XGlueWPRReranking

XGLUE is a new benchmark dataset to evaluate the performance of cross-lingual pre-trained models with respect to cross-lingual natural language understanding and generation. XGLUE is composed of 11 tasks spanning 19 languages.

Dataset: forresty/xglue • License: http://hdl.handle.net/11234/1-3105 • Learn more →

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to text (t2t) | map | deu, eng, fra, ita, por, ... (7) | Written | human-annotated | found
Citation
@misc{11234/1-3105,
  author = {Zeman, Daniel and Nivre, Joakim and Abrams, Mitchell and Aepli, No{\"e}mi and Agi{\'c}, {\v Z}eljko and Ahrenberg, Lars and Aleksandravi{\v c}i{\=u}t{\.e}, Gabriel{\.e} and Antonsen, Lene and Aplonova, Katya and Aranzabe, Maria Jesus and Arutie, Gashaw and Asahara, Masayuki and Ateyah, Luma and Attia, Mohammed and Atutxa, Aitziber and Augustinus, Liesbeth and Badmaeva, Elena and Ballesteros, Miguel and Banerjee, Esha and Bank, Sebastian and Barbu Mititelu, Verginica and Basmov, Victoria and Batchelor, Colin and Bauer, John and Bellato, Sandra and Bengoetxea, Kepa and Berzak, Yevgeni and Bhat, Irshad Ahmad and Bhat, Riyaz Ahmad and Biagetti, Erica and Bick, Eckhard and Bielinskien{\.e}, Agn{\.e} and Blokland, Rogier and Bobicev, Victoria and Boizou, Lo{\"{\i}}c and Borges V{\"o}lker, Emanuel and B{\"o}rstell, Carl and Bosco, Cristina and Bouma, Gosse and Bowman, Sam and Boyd, Adriane and Brokait{\.e}, Kristina and Burchardt, Aljoscha and Candito, Marie and Caron, Bernard and Caron, Gauthier and Cavalcanti, Tatiana and Cebiro{\u g}lu Eryi{\u g}it, G{\"u}l{\c s}en and Cecchini, Flavio Massimiliano and Celano, Giuseppe G. A. and {\v C}{\'e}pl{\"o}, Slavom{\'{\i}}r and Cetin, Savas and Chalub, Fabricio and Choi, Jinho and Cho, Yongseok and Chun, Jayeol and Cignarella, Alessandra T. and Cinkov{\'a}, Silvie and Collomb, Aur{\'e}lie and {\c C}{\"o}ltekin, {\c C}a{\u g}r{\i} and Connor, Miriam and Courtin, Marine and Davidson, Elizabeth and de Marneffe, Marie-Catherine and de Paiva, Valeria and de Souza, Elvis and Diaz de Ilarraza, Arantza and Dickerson, Carly and Dione, Bamba and Dirix, Peter and Dobrovoljc, Kaja and Dozat, Timothy and Droganova, Kira and Dwivedi, Puneet and Eckhoff, Hanne and Eli, Marhaba and Elkahky, Ali and Ephrem, Binyam and Erina, Olga and Erjavec, Toma{\v z} and Etienne, Aline and Evelyn, Wograine and Farkas, Rich{\'a}rd and Fernandez Alcalde, Hector and Foster, Jennifer and Freitas, Cl{\'a}udia and Fujita, Kazunori and Gajdo{\v s}ov{\'a}, Katar{\'{\i}}na and Galbraith, Daniel and Garcia, Marcos and G{\"a}rdenfors, Moa and Garza, Sebastian and Gerdes, Kim and Ginter, Filip and Goenaga, Iakes and Gojenola, Koldo and G{\"o}k{\i}rmak, Memduh and Goldberg, Yoav and G{\'o}mez Guinovart, Xavier and Gonz{\'a}lez Saavedra, Berta and Grici{\=u}t{\.e}, Bernadeta and Grioni, Matias and Gr{\=u}z{\={\i}}tis, Normunds and Guillaume, Bruno and Guillot-Barbance, C{\'e}line and Habash, Nizar and Haji{\v c}, Jan and Haji{\v c} jr., Jan and H{\"a}m{\"a}l{\"a}inen, Mika and H{\`a} M{\~y}, Linh and Han, Na-Rae and Harris, Kim and Haug, Dag and Heinecke, Johannes and Hennig, Felix and Hladk{\'a}, Barbora and Hlav{\'a}{\v c}ov{\'a}, Jaroslava and Hociung, Florinel and Hohle, Petter and Hwang, Jena and Ikeda, Takumi and Ion, Radu and Irimia, Elena and Ishola, {\d O}l{\'a}j{\'{\i}}d{\'e} and Jel{\'{\i}}nek, Tom{\'a}{\v s} and Johannsen, Anders and J{\o}rgensen, Fredrik and Juutinen, Markus and Ka{\c s}{\i}kara, H{\"u}ner and Kaasen, Andre and Kabaeva, Nadezhda and Kahane, Sylvain and Kanayama, Hiroshi and Kanerva, Jenna and Katz, Boris and Kayadelen, Tolga and Kenney, Jessica and Kettnerov{\'a}, V{\'a}clava and Kirchner, Jesse and Klementieva, Elena and K{\"o}hn, Arne and Kopacewicz, Kamil and Kotsyba, Natalia and Kovalevskait{\.e}, Jolanta and Krek, Simon and Kwak, Sookyoung and Laippala, Veronika and Lambertino, Lorenzo and Lam, Lucia and Lando, Tatiana and Larasati, Septina Dian and Lavrentiev, Alexei and Lee, John and L{\^e} H{\`{\^o}}ng, Phương and Lenci, Alessandro and Lertpradit, 
Saran and Leung, Herman and Li, Cheuk Ying and Li, Josie and Li, Keying and Lim, {KyungTae} and Liovina, Maria and Li, Yuan and Ljube{\v s}i{\'c}, Nikola and Loginova, Olga and Lyashevskaya, Olga and Lynn, Teresa and Macketanz, Vivien and Makazhanov, Aibek and Mandl, Michael and Manning, Christopher and Manurung, Ruli and M{\u a}r{\u a}nduc, C{\u a}t{\u a}lina and Mare{\v c}ek, David and Marheinecke, Katrin and Mart{\'{\i}}nez Alonso, H{\'e}ctor and Martins, Andr{\'e} and Ma{\v s}ek, Jan and Matsumoto, Yuji and {McDonald}, Ryan and {McGuinness}, Sarah and Mendon{\c c}a, Gustavo and Miekka, Niko and Misirpashayeva, Margarita and Missil{\"a}, Anna and Mititelu, C{\u a}t{\u a}lin and Mitrofan, Maria and Miyao, Yusuke and Montemagni, Simonetta and More, Amir and Moreno Romero, Laura and Mori, Keiko Sophie and Morioka, Tomohiko and Mori, Shinsuke and Moro, Shigeki and Mortensen, Bjartur and Moskalevskyi, Bohdan and Muischnek, Kadri and Munro, Robert and Murawaki, Yugo and M{\"u}{\"u}risep, Kaili and Nainwani, Pinkey and Navarro Hor{\~n}iacek, Juan Ignacio and Nedoluzhko, Anna and Ne{\v s}pore-B{\=e}rzkalne, Gunta and Nguy{\~{\^e}}n Th{\d i}, Lương and Nguy{\~{\^e}}n Th{\d i} Minh, Huy{\`{\^e}}n and Nikaido, Yoshihiro and Nikolaev, Vitaly and Nitisaroj, Rattima and Nurmi, Hanna and Ojala, Stina and Ojha, Atul Kr. and Ol{\'u}{\`o}kun, Ad{\'e}day{\d o}̀ and Omura, Mai and Osenova, Petya and {\"O}stling, Robert and {\O}vrelid, Lilja and Partanen, Niko and Pascual, Elena and Passarotti, Marco and Patejuk, Agnieszka and Paulino-Passos, Guilherme and Peljak-{\L}api{\'n}ska, Angelika and Peng, Siyao and Perez, Cenel-Augusto and Perrier, Guy and Petrova, Daria and Petrov, Slav and Phelan, Jason and Piitulainen, Jussi and Pirinen, Tommi A and Pitler, Emily and Plank, Barbara and Poibeau, Thierry and Ponomareva, Larisa and Popel, Martin and Pretkalni{\c n}a, Lauma and Pr{\'e}vost, Sophie and Prokopidis, Prokopis and Przepi{\'o}rkowski, Adam and Puolakainen, Tiina and Pyysalo, Sampo and Qi, Peng and R{\"a}{\"a}bis, Andriela and Rademaker, Alexandre and Ramasamy, Loganathan and Rama, Taraka and Ramisch, Carlos and Ravishankar, Vinit and Real, Livy and Reddy, Siva and Rehm, Georg and Riabov, Ivan and Rie{\ss}ler, Michael and Rimkut{\.e}, Erika and Rinaldi, Larissa and Rituma, Laura and Rocha, Luisa and Romanenko, Mykhailo and Rosa, Rudolf and Rovati, Davide and Roșca, Valentin and Rudina, Olga and Rueter, Jack and Sadde, Shoval and Sagot, Beno{\^{\i}}t and Saleh, Shadi and Salomoni, Alessio and Samard{\v z}i{\'c}, Tanja and Samson, Stephanie and Sanguinetti, Manuela and S{\"a}rg, Dage and Saul{\={\i}}te, Baiba and Sawanakunanon, Yanin and Schneider, Nathan and Schuster, Sebastian and Seddah, Djam{\'e} and Seeker, Wolfgang and Seraji, Mojgan and Shen, Mo and Shimada, Atsuko and Shirasu, Hiroyuki and Shohibussirri, Muh and Sichinava, Dmitry and Silveira, Aline and Silveira, Natalia and Simi, Maria and Simionescu, Radu and Simk{\'o}, Katalin and {\v S}imkov{\'a}, M{\'a}ria and Simov, Kiril and Smith, Aaron and Soares-Bastos, Isabela and Spadine, Carolyn and Stella, Antonio and Straka, Milan and Strnadov{\'a}, Jana and Suhr, Alane and Sulubacak, Umut and Suzuki, Shingo and Sz{\'a}nt{\'o}, Zsolt and Taji, Dima and Takahashi, Yuta and Tamburini, Fabio and Tanaka, Takaaki and Tellier, Isabelle and Thomas, Guillaume and Torga, Liisi and Trosterud, Trond and Trukhina, Anna and Tsarfaty, Reut and Tyers, Francis and Uematsu, Sumire and Ure{\v s}ov{\'a}, Zde{\v n}ka and Uria, Larraitz and Uszkoreit, Hans and Utka, 
Andrius and Vajjala, Sowmya and van Niekerk, Daniel and van Noord, Gertjan and Varga, Viktor and Villemonte de la Clergerie, Eric and Vincze, Veronika and Wallin, Lars and Walsh, Abigail and Wang, Jing Xian and Washington, Jonathan North and Wendt, Maximilan and Williams, Seyi and Wir{\'e}n, Mats and Wittern, Christian and Woldemariam, Tsegay and Wong, Tak-sum and Wr{\'o}blewska, Alina and Yako, Mary and Yamazaki, Naoki and Yan, Chunxiao and Yasuoka, Koichi and Yavrumyan, Marat M. and Yu, Zhuoran and {\v Z}abokrtsk{\'y}, Zden{\v e}k and Zeldes, Amir and Zhang, Manying and Zhu, Hanzhi},
  copyright = {Licence Universal Dependencies v2.5},
  note = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  title = {Universal Dependencies 2.5},
  url = {http://hdl.handle.net/11234/1-3105},
  year = {2019},
}

@inproceedings{Conneau2018XNLIEC,
  author = {Alexis Conneau and Guillaume Lample and Ruty Rinott and Adina Williams and Samuel R. Bowman and Holger Schwenk and Veselin Stoyanov},
  booktitle = {EMNLP},
  title = {XNLI: Evaluating Cross-lingual Sentence Representations},
  year = {2018},
}

@article{Lewis2019MLQAEC,
  author = {Patrick Lewis and Barlas Oguz and Ruty Rinott and Sebastian Riedel and Holger Schwenk},
  journal = {ArXiv},
  title = {MLQA: Evaluating Cross-lingual Extractive Question Answering},
  volume = {abs/1910.07475},
  year = {2019},
}

@article{Liang2020XGLUEAN,
  author = {Yaobo Liang and Nan Duan and Yeyun Gong and Ning Wu and Fenfei Guo and Weizhen Qi and Ming Gong and Linjun Shou and Daxin Jiang and Guihong Cao and Xiaodong Fan and Ruofei Zhang and Rahul Agrawal and Edward Cui and Sining Wei and Taroon Bharti and Ying Qiao and Jiun-Hung Chen and Winnie Wu and Shuguang Liu and Fan Yang and Daniel Campos and Rangan Majumder and Ming Zhou},
  journal = {arXiv},
  title = {XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation},
  volume = {abs/2004.01401},
  year = {2020},
}

@article{Sang2002IntroductionTT,
  author = {Erik F. Tjong Kim Sang},
  journal = {ArXiv},
  title = {Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition},
  volume = {cs.CL/0209010},
  year = {2002},
}

@article{Sang2003IntroductionTT,
  author = {Erik F. Tjong Kim Sang and Fien De Meulder},
  journal = {ArXiv},
  title = {Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition},
  volume = {cs.CL/0306050},
  year = {2003},
}

@article{Yang2019PAWSXAC,
  author = {Yinfei Yang and Yuan Zhang and Chris Tar and Jason Baldridge},
  journal = {ArXiv},
  title = {PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification},
  volume = {abs/1908.11828},
  year = {2019},
}