Overview¶
This is the API documentation for mteb, a package for benchmarking and evaluating the quality of embeddings.
This package was initially introduced for evaluating text embeddings for English, but has since been extended to cover multiple languages and multiple modalities (image, audio).
Package Overview¶
This package generally consists of three main concepts: benchmarks, tasks, and model implementations.
Benchmarks¶
A benchmark is a tool to evaluate an embedding model for a given use case. For instance, mteb(eng) is intended to evaluate the quality of text embedding models for a broad range of English use cases such as retrieval, classification, and reranking.
A benchmark consists of a collection of tasks. When a model is run on a benchmark, it is run on each task individually.
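As a minimal sketch of this workflow, a benchmark can be fetched by name and a model run on all of its tasks (the benchmark name "MTEB(eng, v2)" and the model name are used for illustration; exact names may vary between versions):

```python
import mteb

# Load a predefined benchmark, i.e. a collection of tasks.
benchmark = mteb.get_benchmark("MTEB(eng, v2)")

# Load a model implementation by name.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")

# Running the benchmark runs the model on each task individually.
evaluation = mteb.MTEB(tasks=benchmark)
results = evaluation.run(model, output_folder="results")
```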
Task¶
A task is an implementation of a dataset for evaluation. It could, for instance, be the MIRACL dataset, consisting of queries, a corpus of documents, as well as the correct documents to retrieve for a given query. In addition to the dataset, a task includes a specification for how a model should be run on the dataset and how its output should be evaluated. We implement a variety of different tasks, e.g. for evaluating classification, retrieval, etc.; we denote these task categories. Each task also comes with extensive metadata, including the license, who annotated the data, and so on.
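A task and its metadata can be inspected directly. A minimal sketch, assuming the task name "MIRACLRetrieval" and the metadata fields shown (both may differ between versions):

```python
import mteb

# Fetch a single task by name.
task = mteb.get_tasks(tasks=["MIRACLRetrieval"])[0]

# Each task carries extensive metadata.
print(task.metadata.name)
print(task.metadata.type)  # the task category, e.g. "Retrieval"
print(task.metadata.license)
print(task.metadata.annotations_creators)  # who annotated the data
```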
Model Implementation¶
A model implementation is simply an implementation of an embedding model or API that ensures others can reproduce the exact results on a given task. For instance, when running the OpenAI embedding API on a document longer than the maximum number of tokens, a user will have to decide how to deal with this limitation (e.g. by truncating the sequence). Having a shared implementation allows us to examine these implementation assumptions and allows for a reproducible workflow. To ensure consistency, we define a standard interface/protocol that models should follow to be implemented. These implementations additionally come with metadata that, for example, includes the license, compatible frameworks, and whether the weights are public.
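A minimal sketch of what a custom model implementation might look like, assuming an `encode`-based interface; the authoritative protocol is defined in the package and may require additional arguments such as the task name:

```python
import numpy as np


class RandomEncoder:
    """A toy model implementation that maps each input to a random vector.

    Illustrative only; the actual interface required by mteb may include
    additional arguments (e.g. the task name). See the package's encoder
    protocol for the authoritative signature.
    """

    def __init__(self, dim: int = 128, seed: int = 42) -> None:
        self.dim = dim
        self.rng = np.random.default_rng(seed)

    def encode(self, sentences: list[str], **kwargs) -> np.ndarray:
        # Return one embedding per input sentence.
        return self.rng.normal(size=(len(sentences), self.dim))
```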