Models

A model in mteb covers two concepts: metadata and implementation.

- Metadata contains information about the model, such as maximum input length, valid frameworks, license, and degree of openness.
- Implementation is a reproducible workflow that allows others to run the same model again, using the same prompts, hyperparameters, aggregation strategies, etc.

Figure: an overview of the model and its metadata within mteb.

Utilities

mteb.get_model_metas(model_names=None, languages=None, open_weights=None, frameworks=None, n_parameters_range=(None, None), use_instructions=None, zero_shot_on=None)

Load all models' metadata that fit the specified criteria.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_names` | `Iterable[str] \| None` | A list of model names to filter by. If None, all models are included. | `None` |
| `languages` | `Iterable[str] \| None` | A list of languages to filter by. If None, all languages are included. | `None` |
| `open_weights` | `bool \| None` | Whether to filter by models with open weights. If None, this filter is ignored. | `None` |
| `frameworks` | `Iterable[str] \| None` | A list of frameworks to filter by. If None, all frameworks are included. | `None` |
| `n_parameters_range` | `tuple[int \| None, int \| None]` | A tuple of lower and upper bounds on the number of parameters to filter by. If (None, None), this filter is ignored. | `(None, None)` |
| `use_instructions` | `bool \| None` | Whether to filter by models that use instructions. If None, all models are included. | `None` |
| `zero_shot_on` | `list[AbsTask] \| None` | A list of tasks on which the model is zero-shot. If None, this filter is ignored. | `None` |
Source code in mteb/models/get_model_meta.py
def get_model_metas(
    model_names: Iterable[str] | None = None,
    languages: Iterable[str] | None = None,
    open_weights: bool | None = None,
    frameworks: Iterable[str] | None = None,
    n_parameters_range: tuple[int | None, int | None] = (None, None),
    use_instructions: bool | None = None,
    zero_shot_on: list[AbsTask] | None = None,
) -> list[ModelMeta]:
    """Load all models' metadata that fit the specified criteria.

    Args:
        model_names: A list of model names to filter by. If None, all models are included.
        languages: A list of languages to filter by. If None, all languages are included.
        open_weights: Whether to filter by models with open weights. If None this filter is ignored.
        frameworks: A list of frameworks to filter by. If None, all frameworks are included.
        n_parameters_range: A tuple of lower and upper bounds of the number of parameters to filter by.
            If (None, None), this filter is ignored.
        use_instructions: Whether to filter by models that use instructions. If None, all models are included.
        zero_shot_on: A list of tasks on which the model is zero-shot. If None this filter is ignored.
    """
    res = []
    model_names = set(model_names) if model_names is not None else None
    languages = set(languages) if languages is not None else None
    frameworks = set(frameworks) if frameworks is not None else None
    for model_meta in MODEL_REGISTRY.values():
        if (model_names is not None) and (model_meta.name not in model_names):
            continue
        if languages is not None:
            if (model_meta.languages is None) or not (
                languages <= set(model_meta.languages)
            ):
                continue
        if (open_weights is not None) and (model_meta.open_weights != open_weights):
            continue
        if (frameworks is not None) and not (frameworks <= set(model_meta.framework)):
            continue
        if (use_instructions is not None) and (
            model_meta.use_instructions != use_instructions
        ):
            continue

        lower, upper = n_parameters_range
        n_parameters = model_meta.n_parameters

        if (lower is not None) or (upper is not None):
            # Models with an unknown parameter count cannot satisfy a bound.
            if n_parameters is None:
                continue
            if (upper is not None) and (n_parameters > upper):
                continue
            if (lower is not None) and (n_parameters < lower):
                continue

        if zero_shot_on is not None:
            if not model_meta.is_zero_shot_on(zero_shot_on):
                continue
        res.append(model_meta)
    return res
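
For example, a minimal usage sketch that filters the registry (the filter values below are purely illustrative):

```python
import mteb

# List open-weight Sentence Transformers models with at most ~1B parameters
# that cover English. All filter values here are illustrative.
metas = mteb.get_model_metas(
    open_weights=True,
    frameworks=["Sentence Transformers"],
    n_parameters_range=(None, 1_000_000_000),
    languages=["eng-Latn"],
)
for meta in metas[:5]:
    print(meta.name, meta.n_parameters)
```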

mteb.get_model_meta(model_name, revision=None, fetch_from_hf=True)

A function to fetch a model metadata object by name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model to fetch. | required |
| `revision` | `str \| None` | Revision of the model to fetch. | `None` |
| `fetch_from_hf` | `bool` | Whether to fetch the model from the HuggingFace Hub if it is not found in the registry. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `ModelMeta` | A model metadata object. |

Source code in mteb/models/get_model_meta.py
def get_model_meta(
    model_name: str, revision: str | None = None, fetch_from_hf: bool = True
) -> ModelMeta:
    """A function to fetch a model metadata object by name.

    Args:
        model_name: Name of the model to fetch
        revision: Revision of the model to fetch
        fetch_from_hf: Whether to fetch the model from HuggingFace Hub if not found in the registry

    Returns:
        A model metadata object
    """
    if model_name in MODEL_REGISTRY:
        model_meta = MODEL_REGISTRY[model_name]

        if revision and model_meta.revision != revision:
            raise ValueError(
                f"Model revision {revision} not found for model {model_name}. Expected {model_meta.revision}."
            )
        return model_meta
    if fetch_from_hf:
        logger.info(
            f"Model not found in model registry. Attempting to extract metadata by loading the model ({model_name}) using HuggingFace."
        )
        try:
            meta = _model_meta_from_hf_hub(model_name)
            meta.revision = revision
            return meta
        except RepositoryNotFoundError:
            pass

    not_found_msg = f"Model '{model_name}' not found in MTEB registry"
    not_found_msg += " nor on the Huggingface Hub." if fetch_from_hf else "."

    close_matches = difflib.get_close_matches(model_name, MODEL_REGISTRY.keys())
    # Map bare model names (without the organization prefix) to full names.
    model_names_no_org = {mdl.split("/")[-1]: mdl for mdl in MODEL_REGISTRY.keys()}
    if model_name in model_names_no_org:
        close_matches = [model_names_no_org[model_name]] + close_matches

    suggestion = ""
    if close_matches:
        if len(close_matches) > 1:
            suggestion = f" Did you mean: '{close_matches[0]}' or {close_matches[1]}?"
        else:
            suggestion = f" Did you mean: '{close_matches[0]}'?"

    raise KeyError(not_found_msg + suggestion)
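
A minimal usage sketch (assuming the model is present in the MTEB registry):

```python
import mteb

meta = mteb.get_model_meta("sentence-transformers/all-MiniLM-L6-v2")
print(meta.revision, meta.embed_dim, meta.framework)
```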

mteb.get_model(model_name, revision=None, **kwargs)

A function to fetch and load a model object by name.

Note

This function loads the model into memory. If you only want to fetch the metadata, use get_model_meta instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model to fetch. | required |
| `revision` | `str \| None` | Revision of the model to fetch. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments to pass to the model loader. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `MTEBModels` | A model object. |

Source code in mteb/models/get_model_meta.py
def get_model(
    model_name: str, revision: str | None = None, **kwargs: Any
) -> MTEBModels:
    """A function to fetch and load model object by name.

    !!! note
        This function loads the model into memory. If you only want to fetch the metadata, use [`get_model_meta`](#mteb.get_model_meta) instead.

    Args:
        model_name: Name of the model to fetch
        revision: Revision of the model to fetch
        **kwargs: Additional keyword arguments to pass to the model loader

    Returns:
        A model object
    """
    from sentence_transformers import CrossEncoder, SentenceTransformer

    meta = get_model_meta(model_name, revision)
    model = meta.load_model(**kwargs)

    # If the revision is not available in the ModelMeta, try to extract it from sentence-transformers
    if hasattr(model, "model") and isinstance(model.model, SentenceTransformer):  # type: ignore
        _meta = _model_meta_from_sentence_transformers(model.model)  # type: ignore
        if meta.revision is None:
            meta.revision = _meta.revision if _meta.revision else meta.revision
        if not meta.similarity_fn_name:
            meta.similarity_fn_name = _meta.similarity_fn_name

    elif isinstance(model, CrossEncoder):
        _meta = _model_meta_from_cross_encoder(model.model)
        if meta.revision is None:
            meta.revision = _meta.revision if _meta.revision else meta.revision

    model.mteb_model_meta = meta  # type: ignore
    return model
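
A minimal usage sketch; note that this loads the model weights into memory:

```python
import mteb

# Use get_model_meta instead if you only need the metadata.
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
print(model.mteb_model_meta.name)
```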

Metadata

mteb.models.model_meta.ModelMeta

Bases: BaseModel

The model metadata object.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `loader` | `Callable[..., MTEBModels] \| None` | The function that loads the model. If None, the model is assumed not to be implemented. Cross-encoder-like models may implement only `SearchProtocol`. |
| `loader_kwargs` | `dict[str, Any]` | The keyword arguments to pass to the loader function. |
| `name` | `str \| None` | The name of the model, ideally the name on huggingface. It should be in the format "organization/model_name". |
| `n_parameters` | `int \| None` | The number of parameters in the model, e.g. 7_000_000 for a 7M-parameter model. Can be None if the number of parameters is not known (e.g. for proprietary models) or if the loader returns a SentenceTransformer model from which it can be derived. |
| `memory_usage_mb` | `float \| None` | The memory usage of the model in MB. Can be None if the memory usage is not known (e.g. for proprietary models). To calculate it, use the `calculate_memory_usage_mb` method. |
| `max_tokens` | `float \| None` | The maximum number of tokens the model can handle. Can be None if the maximum number of tokens is not known (e.g. for proprietary models). |
| `embed_dim` | `int \| None` | The dimension of the embeddings produced by the model. Currently all models are assumed to produce fixed-size embeddings. |
| `revision` | `str \| None` | The revision number of the model. If None, the metadata (including the loader) is assumed to be valid for all revisions of the model. |
| `release_date` | `StrDate \| None` | The date the model's revision was released. |
| `license` | `Licenses \| StrURL \| None` | The license under which the model is released. Required if `open_weights` is True. |
| `open_weights` | `bool \| None` | Whether the model is open source or proprietary. |
| `public_training_code` | `str \| None` | A link to the publicly available training code. If None, the training code is assumed not to be publicly available. |
| `public_training_data` | `str \| bool \| None` | A link to the publicly available training data. If None, the training data is assumed not to be publicly available. |
| `similarity_fn_name` | `ScoringFunction \| None` | The similarity function used by the model. |
| `framework` | `list[FRAMEWORKS]` | The framework(s) the model is implemented in, e.g. `["Sentence Transformers", "PyTorch"]`. |
| `reference` | `StrURL \| None` | A URL to the model's page on huggingface or another source. |
| `languages` | `list[ISOLanguageScript] \| None` | The languages the model is intended for, specified as a 3-letter language code followed by a script code, e.g. "eng-Latn" for English in the Latin script. |
| `use_instructions` | `bool \| None` | Whether the model uses instructions, e.g. for prompt-based models. This also includes models that require a specific format for input, such as "query: {document}" or "passage: {document}". |
| `citation` | `str \| None` | The citation for the model, as a BibTeX string. |
| `training_datasets` | `set[str] \| None` | The set of datasets the model was trained on. Names should match task names as they appear in `mteb`, e.g. {"ArguAna"} if the model was trained on the ArguAna test set. This field is used to determine whether a model generalizes zero-shot to a benchmark, as well as to mark dataset contamination. |
| `adapted_from` | `str \| None` | Name of the model from which this model is adapted, e.g. for quantizations, fine-tunes, and long-document extensions. |
| `superseded_by` | `str \| None` | Name of the model that supersedes this model, e.g. nvidia/NV-Embed-v2 supersedes v1. |
| `is_cross_encoder` | `bool \| None` | Whether the model can act as a cross-encoder or not. |
| `modalities` | `list[Modalities]` | A list of strings representing the modalities the model supports. Default is ["text"]. |

Source code in mteb/models/model_meta.py
class ModelMeta(BaseModel):
    """The model metadata object.

    Attributes:
        loader: The function that loads the model. If None, the model is assumed not to be implemented. Cross-encoder-like models may implement *only* `SearchProtocol`.
        loader_kwargs: The keyword arguments to pass to the loader function.
        name: The name of the model, ideally the name on huggingface. It should be in the format "organization/model_name".
        n_parameters: The number of parameters in the model, e.g. 7_000_000 for a 7M parameter model. Can be None if the number of parameters is not known (e.g. for proprietary models) or
            if the loader returns a SentenceTransformer model from which it can be derived.
        memory_usage_mb: The memory usage of the model in MB. Can be None if the memory usage is not known (e.g. for proprietary models). To calculate it use the `calculate_memory_usage_mb` method.
        max_tokens: The maximum number of tokens the model can handle. Can be None if the maximum number of tokens is not known (e.g. for proprietary
            models).
        embed_dim: The dimension of the embeddings produced by the model. Currently all models are assumed to produce fixed-size embeddings.
        revision: The revision number of the model. If None, it is assumed that the metadata (including the loader) is valid for all revisions of the model.
        release_date: The date the model's revision was released.
        license: The license under which the model is released. Required if open_weights is True.
        open_weights: Whether the model is open source or proprietary.
        public_training_code: A link to the publicly available training code. If None, it is assumed that the training code is not publicly available.
        public_training_data: A link to the publicly available training data. If None, it is assumed that the training data is not publicly available.
        similarity_fn_name: The similarity function used by the model.
        framework: The framework the model is implemented in, can be a list of frameworks e.g. `["Sentence Transformers", "PyTorch"]`.
        reference: A URL to the model's page on huggingface or another source.
        languages: The languages the model is intended for, specified as a 3-letter language code followed by a script code, e.g., "eng-Latn" for English
            in the Latin script.
        use_instructions: Whether the model uses instructions, e.g. for prompt-based models. This also includes models that require a specific format for
            input, such as "query: {document}" or "passage: {document}".
        citation: The citation for the model. This is a bibtex string.
        training_datasets: The set of datasets that the model was trained on. Names should match task names as they appear in `mteb`, for example
            {"ArguAna"} if the model was trained on the ArguAna test set. This field is used to determine whether a model generalizes zero-shot to
            a benchmark, as well as to mark dataset contamination.
        adapted_from: Name of the model from which this model is adapted. For quantizations, fine-tunes, long doc extensions, etc.
        superseded_by: Name of the model that supersedes this model, e.g., nvidia/NV-Embed-v2 supersedes v1.
        is_cross_encoder: Whether the model can act as a cross-encoder or not.
        modalities: A list of strings representing the modalities the model supports. Default is ["text"].
    """

    model_config = ConfigDict(extra="forbid")

    # loaders
    loader: Callable[..., MTEBModels] | None
    loader_kwargs: dict[str, Any] = field(default_factory=dict)
    name: str | None
    revision: str | None
    release_date: StrDate | None
    languages: list[ISOLanguageScript] | None
    n_parameters: int | None
    memory_usage_mb: float | None
    max_tokens: float | None
    embed_dim: int | None
    license: Licenses | StrURL | None
    open_weights: bool | None
    public_training_code: str | None
    public_training_data: str | bool | None
    framework: list[FRAMEWORKS]
    reference: StrURL | None = None
    similarity_fn_name: ScoringFunction | None
    use_instructions: bool | None
    training_datasets: set[str] | None
    adapted_from: str | None = None
    superseded_by: str | None = None
    modalities: list[Modalities] = ["text"]
    is_cross_encoder: bool | None = None
    citation: str | None = None

    @field_validator("similarity_fn_name", mode="before")
    @classmethod
    def validate_similarity_fn_name(cls, value):
        """Converts the similarity function name to the corresponding enum value.
        sentence_transformers uses Literal['cosine', 'dot', 'euclidean', 'manhattan'],
        and pylate uses Literal['MaxSim']
        """
        if type(value) is ScoringFunction or value is None:
            return value
        mapping = {
            "cosine": ScoringFunction.COSINE,
            "dot": ScoringFunction.DOT_PRODUCT,
            "MaxSim": ScoringFunction.MAX_SIM,
        }
        if value in mapping:
            return mapping[value]
        raise ValueError(f"Invalid similarity function name: {value}")

    def to_dict(self):
        dict_repr = self.model_dump()
        loader = dict_repr.pop("loader", None)
        dict_repr["training_datasets"] = (
            list(dict_repr["training_datasets"])
            if isinstance(dict_repr["training_datasets"], set)
            else dict_repr["training_datasets"]
        )
        dict_repr["loader"] = _get_loader_name(loader)
        return dict_repr

    @field_validator("languages")
    @classmethod
    def languages_are_valid(
        cls, languages: list[ISOLanguageScript] | None
    ) -> list[ISOLanguageScript] | None:
        if languages is None:
            return None

        for code in languages:
            check_language_code(code)
        return languages

    @field_validator("name")
    @classmethod
    def check_name(cls, v: str | None) -> str | None:
        if v is None or v in ("bm25s", "Human"):
            return v
        if "/" not in v:
            raise ValueError(
                "Model name must be in the format 'organization/model_name'"
            )
        return v

    def load_model(self, **kwargs: Any) -> Encoder:
        if self.loader is None:
            raise NotImplementedError(
                "No model implementation is available for this model."
            )
        if self.name is None:
            raise ValueError("name is not set for ModelMeta. Cannot load model.")

        # Allow overwrites
        _kwargs = self.loader_kwargs.copy()
        _kwargs.update(kwargs)

        model: Encoder = self.loader(self.name, revision=self.revision, **_kwargs)
        model.mteb_model_meta = self  # type: ignore
        return model

    def model_name_as_path(self) -> str:
        if self.name is None:
            raise ValueError("Model name is not set")
        return self.name.replace("/", "__").replace(" ", "_")

    def is_zero_shot_on(self, tasks: Sequence[AbsTask] | Sequence[str]) -> bool | None:
        """Indicates whether the given model can be considered
        zero-shot or not on the given tasks.
        Returns None if no training data is specified on the model.
        """
        # If no tasks were specified, we're obviously zero-shot
        if not tasks:
            return True
        training_datasets = self.get_training_datasets()
        # If no training data is specified, we cannot determine whether the model is zero-shot
        if training_datasets is None:
            return None

        if isinstance(tasks[0], str):
            benchmark_datasets = set(tasks)
        else:
            tasks = cast(Sequence[AbsTask], tasks)
            benchmark_datasets = set()
            for task in tasks:
                benchmark_datasets.add(task.metadata.name)
        intersection = training_datasets & benchmark_datasets
        return len(intersection) == 0

    def get_training_datasets(self) -> set[str] | None:
        """Returns all training datasets of the model including similar tasks."""
        import mteb

        if self.training_datasets is None:
            return None

        training_datasets = self.training_datasets.copy()
        if self.adapted_from is not None:
            try:
                adapted_from_model = mteb.get_model_meta(
                    self.adapted_from, fetch_from_hf=False
                )
                adapted_training_datasets = adapted_from_model.get_training_datasets()
                if adapted_training_datasets is not None:
                    training_datasets |= adapted_training_datasets
            except (ValueError, KeyError) as e:
                logger.warning(f"Could not get source model: {e} in MTEB")

        return_dataset = training_datasets.copy()
        visited = set()

        for dataset in training_datasets:
            similar_tasks = collect_similar_tasks(dataset, visited)
            return_dataset |= similar_tasks

        return return_dataset

    def zero_shot_percentage(
        self, tasks: Sequence[AbsTask] | Sequence[str]
    ) -> int | None:
        """Indicates how out-of-domain the selected tasks are for the given model."""
        training_datasets = self.get_training_datasets()
        if (training_datasets is None) or (not tasks):
            return None
        if isinstance(tasks[0], str):
            benchmark_datasets = set(tasks)
        else:
            tasks = cast(Sequence[AbsTask], tasks)
            benchmark_datasets = {task.metadata.name for task in tasks}
        overlap = training_datasets & benchmark_datasets
        perc_overlap = 100 * (len(overlap) / len(benchmark_datasets))
        return int(100 - perc_overlap)

    def calculate_memory_usage_mb(self) -> int | None:
        """Calculates the memory usage (in FP32) of the model in MB."""
        if "API" in self.framework:
            return None

        MB = 1024**2
        try:
            safetensors_metadata = get_safetensors_metadata(self.name)  # type: ignore
            if len(safetensors_metadata.parameter_count) > 0:
                dtype_size_map = {
                    "F64": 8,  # 64-bit float
                    "F32": 4,  # 32-bit float (FP32)
                    "F16": 2,  # 16-bit float (FP16)
                    "BF16": 2,  # BFloat16
                    "I64": 8,  # 64-bit integer
                    "I32": 4,  # 32-bit integer
                    "I16": 2,  # 16-bit integer
                    "I8": 1,  # 8-bit integer
                    "U8": 1,  # Unsigned 8-bit integer
                    "BOOL": 1,  # Boolean (assuming 1 byte per value)
                }
                total_memory_bytes = sum(
                    parameters * dtype_size_map.get(dtype, 4)
                    for dtype, parameters in safetensors_metadata.parameter_count.items()
                )
                return round(total_memory_bytes / MB)  # Convert to MB

        except (NotASafetensorsRepoError, SafetensorsParsingError, GatedRepoError):
            pass
        if self.n_parameters is None:
            return None
        # Model memory in bytes. For FP32 each parameter is 4 bytes.
        model_memory_bytes = self.n_parameters * 4

        # Convert to MB
        model_memory_mb = model_memory_bytes / MB
        return round(model_memory_mb)
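
As an illustration, a metadata entry could be declared as below. This is a sketch: every field value is hypothetical, and `loader=None` marks the model as having no registered implementation.

```python
from mteb.models.model_meta import ModelMeta

my_meta = ModelMeta(
    loader=None,  # no implementation registered for this model
    name="my-org/my-embedding-model",  # must be "organization/model_name"
    revision="main",
    release_date="2024-01-01",
    languages=["eng-Latn"],
    n_parameters=110_000_000,
    memory_usage_mb=None,
    max_tokens=512,
    embed_dim=768,
    license="apache-2.0",
    open_weights=True,
    public_training_code=None,
    public_training_data=None,
    framework=["Sentence Transformers", "PyTorch"],
    similarity_fn_name="cosine",  # validated into ScoringFunction.COSINE
    use_instructions=False,
    training_datasets=None,
)
print(my_meta.model_name_as_path())  # my-org__my-embedding-model
```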

calculate_memory_usage_mb()

Calculates the memory usage (in FP32) of the model in MB.

Source code in mteb/models/model_meta.py
def calculate_memory_usage_mb(self) -> int | None:
    """Calculates the memory usage (in FP32) of the model in MB."""
    if "API" in self.framework:
        return None

    MB = 1024**2
    try:
        safetensors_metadata = get_safetensors_metadata(self.name)  # type: ignore
        if len(safetensors_metadata.parameter_count) > 0:
            dtype_size_map = {
                "F64": 8,  # 64-bit float
                "F32": 4,  # 32-bit float (FP32)
                "F16": 2,  # 16-bit float (FP16)
                "BF16": 2,  # BFloat16
                "I64": 8,  # 64-bit integer
                "I32": 4,  # 32-bit integer
                "I16": 2,  # 16-bit integer
                "I8": 1,  # 8-bit integer
                "U8": 1,  # Unsigned 8-bit integer
                "BOOL": 1,  # Boolean (assuming 1 byte per value)
            }
            total_memory_bytes = sum(
                parameters * dtype_size_map.get(dtype, 4)
                for dtype, parameters in safetensors_metadata.parameter_count.items()
            )
            return round(total_memory_bytes / MB)  # Convert to MB

    except (NotASafetensorsRepoError, SafetensorsParsingError, GatedRepoError):
        pass
    if self.n_parameters is None:
        return None
    # Model memory in bytes. For FP32 each parameter is 4 bytes.
    model_memory_bytes = self.n_parameters * 4

    # Convert to MB
    model_memory_mb = model_memory_bytes / MB
    return round(model_memory_mb)
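
As a sanity check of the FP32 fallback path, a hypothetical 110M-parameter model works out to roughly 420 MB:

```python
n_parameters = 110_000_000  # hypothetical parameter count
model_memory_mb = n_parameters * 4 / 1024**2  # 4 bytes per FP32 parameter
print(round(model_memory_mb))  # 420
```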

get_training_datasets()

Returns all training datasets of the model including similar tasks.

Source code in mteb/models/model_meta.py
def get_training_datasets(self) -> set[str] | None:
    """Returns all training datasets of the model including similar tasks."""
    import mteb

    if self.training_datasets is None:
        return None

    training_datasets = self.training_datasets.copy()
    if self.adapted_from is not None:
        try:
            adapted_from_model = mteb.get_model_meta(
                self.adapted_from, fetch_from_hf=False
            )
            adapted_training_datasets = adapted_from_model.get_training_datasets()
            if adapted_training_datasets is not None:
                training_datasets |= adapted_training_datasets
        except (ValueError, KeyError) as e:
            logger.warning(f"Could not get source model: {e} in MTEB")

    return_dataset = training_datasets.copy()
    visited = set()

    for dataset in training_datasets:
        similar_tasks = collect_similar_tasks(dataset, visited)
        return_dataset |= similar_tasks

    return return_dataset

is_zero_shot_on(tasks)

Indicates whether the given model can be considered zero-shot or not on the given tasks. Returns None if no training data is specified on the model.

Source code in mteb/models/model_meta.py
def is_zero_shot_on(self, tasks: Sequence[AbsTask] | Sequence[str]) -> bool | None:
    """Indicates whether the given model can be considered
    zero-shot or not on the given tasks.
    Returns None if no training data is specified on the model.
    """
    # If no tasks were specified, we're obviously zero-shot
    if not tasks:
        return True
    training_datasets = self.get_training_datasets()
    # If no training data is specified, we cannot determine whether the model is zero-shot
    if training_datasets is None:
        return None

    if isinstance(tasks[0], str):
        benchmark_datasets = set(tasks)
    else:
        tasks = cast(Sequence[AbsTask], tasks)
        benchmark_datasets = set()
        for task in tasks:
            benchmark_datasets.add(task.metadata.name)
    intersection = training_datasets & benchmark_datasets
    return len(intersection) == 0
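
A small usage sketch (the model and task names are illustrative; tasks can be passed as strings or `AbsTask` objects):

```python
import mteb

meta = mteb.get_model_meta("sentence-transformers/all-MiniLM-L6-v2")
# True if none of the listed tasks appear in the model's training datasets.
print(meta.is_zero_shot_on(["ArguAna"]))
```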

validate_similarity_fn_name(value) classmethod

Converts the similarity function name to the corresponding enum value. sentence_transformers uses Literal['cosine', 'dot', 'euclidean', 'manhattan'], and pylate uses Literal['MaxSim']

Source code in mteb/models/model_meta.py
@field_validator("similarity_fn_name", mode="before")
@classmethod
def validate_similarity_fn_name(cls, value):
    """Converts the similarity function name to the corresponding enum value.
    sentence_transformers uses Literal['cosine', 'dot', 'euclidean', 'manhattan'],
    and pylate uses Literal['MaxSim']
    """
    if type(value) is ScoringFunction or value is None:
        return value
    mapping = {
        "cosine": ScoringFunction.COSINE,
        "dot": ScoringFunction.DOT_PRODUCT,
        "MaxSim": ScoringFunction.MAX_SIM,
    }
    if value in mapping:
        return mapping[value]
    raise ValueError(f"Invalid similarity function name: {value}")

zero_shot_percentage(tasks)

Indicates how out-of-domain the selected tasks are for the given model.

Source code in mteb/models/model_meta.py
def zero_shot_percentage(
    self, tasks: Sequence[AbsTask] | Sequence[str]
) -> int | None:
    """Indicates how out-of-domain the selected tasks are for the given model."""
    training_datasets = self.get_training_datasets()
    if (training_datasets is None) or (not tasks):
        return None
    if isinstance(tasks[0], str):
        benchmark_datasets = set(tasks)
    else:
        tasks = cast(Sequence[AbsTask], tasks)
        benchmark_datasets = {task.metadata.name for task in tasks}
    overlap = training_datasets & benchmark_datasets
    perc_overlap = 100 * (len(overlap) / len(benchmark_datasets))
    return int(100 - perc_overlap)
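
A small usage sketch (task names are illustrative):

```python
import mteb

meta = mteb.get_model_meta("sentence-transformers/all-MiniLM-L6-v2")
# 100 means fully zero-shot; 0 means every task overlaps the training data.
print(meta.zero_shot_percentage(["ArguAna", "NFCorpus"]))
```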

Model Protocols

mteb.models.Encoder

Bases: Protocol

The interface for an encoder in MTEB.

Besides the required functions specified below, the encoder can additionally implement the signatures shown below. In general, the interface is kept aligned with the sentence-transformers interface; where exceptions occur, they are handled within MTEB.

Source code in mteb/models/models_protocols.py
@runtime_checkable
class Encoder(Protocol):
    """The interface for an encoder in MTEB.

    Besides the required functions specified below, the encoder can additionally implement the signatures shown below.
    In general, the interface is kept aligned with the sentence-transformers interface; where exceptions occur, they are handled within MTEB.
    """

    def __init__(self, model_name: str, revision: str | None, **kwargs: Any) -> None:
        """The initialization function for the encoder. Used when calling it from the mteb run CLI.

        Args:
            model_name: Name of the model
            revision: revision of the model
            kwargs: Any additional kwargs
        """
        ...

    def encode(
        self,
        inputs: DataLoader[BatchedInput],
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        prompt_type: PromptType | None = None,
        **kwargs: Any,
    ) -> Array:
        """Encodes the given sentences using the encoder.

        Args:
            inputs: Batch of inputs to encode.
            task_metadata: The metadata of the task. Encoders (e.g. SentenceTransformers) use it to
                select the appropriate prompts, with priority given to more specific task/prompt combinations over general ones.

                The order of priority for prompt selection is:
                    1. Composed prompt of task name + prompt type (query or passage)
                    2. Specific task prompt
                    3. Composed prompt of task type + prompt type (query or passage)
                    4. Specific task type prompt
                    5. Specific prompt type (query or passage)
            hf_split: Split of the current task; provides additional information about the current split,
                e.g. the current language.
            hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
            prompt_type: The type of prompt (query or passage).
            **kwargs: Additional arguments to pass to the encoder.

        Returns:
            The encoded input in a numpy array or torch tensor of the shape (Number of sentences) x (Embedding dimension).
        """
        ...

    def similarity(
        self,
        embeddings1: Array,
        embeddings2: Array,
    ) -> Array:
        """Compute the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings
        from the first parameter and all embeddings from the second parameter. This differs from similarity_pairwise which computes the similarity
        between corresponding pairs of embeddings.

        read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity

        Args:
            embeddings1: [num_embeddings_1, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
            embeddings2: [num_embeddings_2, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.

        Returns:
            A [num_embeddings_1, num_embeddings_2]-shaped torch tensor with similarity scores.
        """
        ...

    def similarity_pairwise(
        self,
        embeddings1: Array,
        embeddings2: Array,
    ) -> Array:
        """Compute the similarity between two collections of embeddings. The output will be a vector with the similarity scores between each pair of
        embeddings.

        read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity_pairwise

        Args:
            embeddings1: [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
            embeddings2: [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.

        Returns:
            A [num_embeddings]-shaped torch tensor with pairwise similarity scores.
        """
        ...

    @property
    def mteb_model_meta(self) -> ModelMeta:
        """Metadata of the model"""
        ...
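
A minimal sketch of a class satisfying this protocol. The random embeddings, the embedding dimension, and the assumption that each batch exposes a "text" key are purely illustrative:

```python
import numpy as np


class RandomEncoder:
    """A toy encoder returning random vectors, for illustration only."""

    def __init__(self, model_name: str, revision: str | None = None, **kwargs):
        self.model_name = model_name
        self.revision = revision

    def encode(self, inputs, *, task_metadata, hf_split, hf_subset,
               prompt_type=None, **kwargs):
        # Assumes each batch is a mapping with a "text" key (illustrative).
        n = sum(len(batch["text"]) for batch in inputs)
        return np.random.rand(n, 384)  # (number of sentences, embedding dim)

    def similarity(self, embeddings1, embeddings2):
        # Dot-product similarity between all pairs: a (n1, n2) matrix.
        return np.asarray(embeddings1) @ np.asarray(embeddings2).T

    def similarity_pairwise(self, embeddings1, embeddings2):
        # Row-wise dot product between corresponding pairs: a (n,) vector.
        return np.einsum("ij,ij->i", np.asarray(embeddings1), np.asarray(embeddings2))
```

Note that mteb attaches `mteb_model_meta` to a loaded model itself (see `get_model` and `ModelMeta.load_model` above), so a custom class does not need to define it up front.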

mteb_model_meta property

Metadata of the model

__init__(model_name, revision, **kwargs)

The initialization function for the encoder. Used when calling it from the mteb run CLI.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model. | required |
| `revision` | `str \| None` | Revision of the model. | required |
| `kwargs` | `Any` | Any additional kwargs. | `{}` |
Source code in mteb/models/models_protocols.py
def __init__(self, model_name: str, revision: str | None, **kwargs: Any) -> None:
    """The initialization function for the encoder. Used when calling it from the mteb run CLI.

    Args:
        model_name: Name of the model
        revision: revision of the model
        kwargs: Any additional kwargs
    """
    ...

encode(inputs, *, task_metadata, hf_split, hf_subset, prompt_type=None, **kwargs)

Encodes the given sentences using the encoder.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inputs` | `DataLoader[BatchedInput]` | Batch of inputs to encode. | required |
| `task_metadata` | `TaskMetadata` | The metadata of the task. Encoders (e.g. SentenceTransformers) use it to select the appropriate prompts, with priority given to more specific task/prompt combinations over general ones. The order of priority for prompt selection is: 1. composed prompt of task name + prompt type (query or passage); 2. specific task prompt; 3. composed prompt of task type + prompt type (query or passage); 4. specific task type prompt; 5. specific prompt type (query or passage). | required |
| `hf_split` | `str` | Split of the current task; provides additional information about the current split, e.g. the current language. | required |
| `hf_subset` | `str` | Subset of the current task. Similar to `hf_split`, provides more information. | required |
| `prompt_type` | `PromptType \| None` | The type of prompt (query or passage). | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to the encoder. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Array` | The encoded input as a numpy array or torch tensor of shape (number of sentences) x (embedding dimension). |

Source code in mteb/models/models_protocols.py
def encode(
    self,
    inputs: DataLoader[BatchedInput],
    *,
    task_metadata: TaskMetadata,
    hf_split: str,
    hf_subset: str,
    prompt_type: PromptType | None = None,
    **kwargs: Any,
) -> Array:
    """Encodes the given sentences using the encoder.

    Args:
        inputs: Batch of inputs to encode.
        task_metadata: The metadata of the task. Encoders (e.g. SentenceTransformers) use it to
            select the appropriate prompts, with priority given to more specific task/prompt combinations over general ones.

            The order of priority for prompt selection is:
                1. Composed prompt of task name + prompt type (query or passage)
                2. Specific task prompt
                3. Composed prompt of task type + prompt type (query or passage)
                4. Specific task type prompt
                5. Specific prompt type (query or passage)
        hf_split: Split of the current task; provides additional information about the current split,
            e.g. the current language.
        hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
        prompt_type: The type of prompt (query or passage).
        **kwargs: Additional arguments to pass to the encoder.

    Returns:
        The encoded input in a numpy array or torch tensor of the shape (Number of sentences) x (Embedding dimension).
    """
    ...

similarity(embeddings1, embeddings2)

Compute the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings from the first parameter and all embeddings from the second parameter. This differs from similarity_pairwise which computes the similarity between corresponding pairs of embeddings.

read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `embeddings1` | `Array` | [num_embeddings_1, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor. | required |
| `embeddings2` | `Array` | [num_embeddings_2, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor. | required |

Returns:

| Type | Description |
| --- | --- |
| `Array` | A [num_embeddings_1, num_embeddings_2]-shaped torch tensor with similarity scores. |

Source code in mteb/models/models_protocols.py
def similarity(
    self,
    embeddings1: Array,
    embeddings2: Array,
) -> Array:
    """Compute the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings
    from the first parameter and all embeddings from the second parameter. This differs from similarity_pairwise which computes the similarity
    between corresponding pairs of embeddings.

    read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity

    Args:
        embeddings1: [num_embeddings_1, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
        embeddings2: [num_embeddings_2, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.

    Returns:
        A [num_embeddings_1, num_embeddings_2]-shaped torch tensor with similarity scores.
    """
    ...

similarity_pairwise(embeddings1, embeddings2)

Compute the similarity between two collections of embeddings. The output will be a vector with the similarity scores between each pair of embeddings.

read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity_pairwise

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `embeddings1` | `Array` | [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor. | required |
| `embeddings2` | `Array` | [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor. | required |

Returns:

| Type | Description |
| --- | --- |
| `Array` | A [num_embeddings]-shaped torch tensor with pairwise similarity scores. |

Source code in mteb/models/models_protocols.py
def similarity_pairwise(
    self,
    embeddings1: Array,
    embeddings2: Array,
) -> Array:
    """Compute the similarity between two collections of embeddings. The output will be a vector with the similarity scores between each pair of
    embeddings.

    read more at: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity_pairwise

    Args:
        embeddings1: [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
        embeddings2: [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.

    Returns:
        A [num_embeddings]-shaped torch tensor with pairwise similarity scores.
    """
    ...

mteb.models.SearchProtocol

Bases: Protocol

Interface for searching models.

Source code in mteb/models/models_protocols.py
@runtime_checkable
class SearchProtocol(Protocol):
    """Interface for searching models."""

    def index(
        self,
        corpus: CorpusDatasetType,
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        encode_kwargs: dict[str, Any],
    ) -> None:
        """Index the corpus for retrieval.

        Args:
            corpus: Corpus dataset to index.
            task_metadata: Metadata of the task, used to determine how to index the corpus.
            hf_split: Split of the current task; provides additional information about the current split.
            hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
            encode_kwargs: Additional arguments to pass to the encoder during indexing.
        """
        ...

    def search(
        self,
        queries: QueryDatasetType,
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        top_k: int,
        encode_kwargs: dict[str, Any],
        top_ranked: TopRankedDocumentsType | None = None,
    ) -> RetrievalOutputType:
        """Search the corpus using the given queries.

        Args:
            queries: Queries to search for
            task_metadata: Task metadata
            hf_split: Split of the dataset
            hf_subset: Subset of the dataset
            top_ranked: Top-ranked documents for each query, mapping query IDs to a list of document IDs.
                Passed only from Reranking tasks.
            top_k: Number of top documents to return for each query.
            encode_kwargs: Additional arguments to pass to the encoder during search.

        Returns:
            Dictionary with query IDs as keys and dicts as values, where each value maps document IDs to their relevance scores.
        """
        ...

    @property
    def mteb_model_meta(self) -> ModelMeta:
        """Metadata of the model"""
        ...
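
The shape of the return value from `search` can be illustrated with a small literal (IDs and scores are made up):

```python
# RetrievalOutputType: query ID -> {document ID -> relevance score}
results = {
    "q1": {"doc3": 0.92, "doc7": 0.85},
    "q2": {"doc1": 0.78},
}
```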

mteb_model_meta property

Metadata of the model

index(corpus, *, task_metadata, hf_split, hf_subset, encode_kwargs)

Index the corpus for retrieval.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `corpus` | `CorpusDatasetType` | Corpus dataset to index. | required |
| `task_metadata` | `TaskMetadata` | Metadata of the task, used to determine how to index the corpus. | required |
| `hf_split` | `str` | Split of the current task; provides additional information about the current split. | required |
| `hf_subset` | `str` | Subset of the current task. Similar to `hf_split`, provides more information. | required |
| `encode_kwargs` | `dict[str, Any]` | Additional arguments to pass to the encoder during indexing. | required |
Source code in mteb/models/models_protocols.py
def index(
    self,
    corpus: CorpusDatasetType,
    *,
    task_metadata: TaskMetadata,
    hf_split: str,
    hf_subset: str,
    encode_kwargs: dict[str, Any],
) -> None:
    """Index the corpus for retrieval.

    Args:
        corpus: Corpus dataset to index.
        task_metadata: Metadata of the task, used to determine how to index the corpus.
        hf_split: Split of the current task; provides additional information about the current split.
        hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
        encode_kwargs: Additional arguments to pass to the encoder during indexing.
    """
    ...

search(queries, *, task_metadata, hf_split, hf_subset, top_k, encode_kwargs, top_ranked=None)

Search the corpus using the given queries.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `queries` | `QueryDatasetType` | Queries to search for. | required |
| `task_metadata` | `TaskMetadata` | Task metadata. | required |
| `hf_split` | `str` | Split of the dataset. | required |
| `hf_subset` | `str` | Subset of the dataset. | required |
| `top_ranked` | `TopRankedDocumentsType \| None` | Top-ranked documents for each query, mapping query IDs to a list of document IDs. Passed only from Reranking tasks. | `None` |
| `top_k` | `int` | Number of top documents to return for each query. | required |
| `encode_kwargs` | `dict[str, Any]` | Additional arguments to pass to the encoder during search. | required |

Returns:

| Type | Description |
| --- | --- |
| `RetrievalOutputType` | Dictionary with query IDs as keys and dicts as values, where each value maps document IDs to their relevance scores. |

Source code in mteb/models/models_protocols.py
def search(
    self,
    queries: QueryDatasetType,
    *,
    task_metadata: TaskMetadata,
    hf_split: str,
    hf_subset: str,
    top_k: int,
    encode_kwargs: dict[str, Any],
    top_ranked: TopRankedDocumentsType | None = None,
) -> RetrievalOutputType:
    """Search the corpus using the given queries.

    Args:
        queries: Queries to search for
        task_metadata: Task metadata
        hf_split: Split of the dataset
        hf_subset: Subset of the dataset
        top_ranked: Top-ranked documents for each query, mapping query IDs to a list of document IDs.
            Passed only from Reranking tasks.
        top_k: Number of top documents to return for each query.
        encode_kwargs: Additional arguments to pass to the encoder during search.

    Returns:
        Dictionary with query IDs as keys and dicts as values, where each value maps document IDs to their relevance scores.
    """
    ...

mteb.models.CrossEncoderProtocol

Bases: Protocol

The interface for a CrossEncoder in MTEB.

Besides the required functions specified below, the cross-encoder can additionally implement the signatures shown below. In general, the interface is kept aligned with the sentence-transformers interface; where exceptions occur, they are handled within MTEB.

Source code in mteb/models/models_protocols.py
@runtime_checkable
class CrossEncoderProtocol(Protocol):
    """The interface for a CrossEncoder in MTEB.

    Besides the required functions specified below, the cross-encoder can additionally implement the signatures shown below.
    In general, the interface is kept aligned with the sentence-transformers interface; where exceptions occur, they are handled within MTEB.
    """

    def __init__(self, model_name: str, revision: str | None, **kwargs: Any) -> None:
        """The initialization function for the encoder. Used when calling it from the mteb run CLI.

        Args:
            model_name: Name of the model
            revision: revision of the model
            kwargs: Any additional kwargs
        """
        ...

    def predict(
        self,
        inputs1: DataLoader[BatchedInput],
        inputs2: DataLoader[BatchedInput],
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        prompt_type: PromptType | None = None,
        **kwargs: Any,
    ) -> Array:
        """Predicts relevance scores for pairs of inputs. Note that, unlike the encoder, the cross-encoder can compare across inputs.

        Args:
            inputs1: First Dataloader of inputs to encode. For reranking tasks, these are queries (for text only tasks `QueryDatasetType`).
            inputs2: Second Dataloader of inputs to encode. For reranking, these are documents (for text only tasks `RetrievalOutputType`).
            task_metadata: Metadata of the current task.
            hf_split: Split of the current task; provides additional information about the current split,
                e.g. the current language.
            hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
            prompt_type: The type of prompt (query or passage).
            **kwargs: Additional arguments to pass to the cross-encoder.

        Returns:
            The predicted relevance scores for each input pair.
        """
        ...

    @property
    def mteb_model_meta(self) -> ModelMeta:
        """Metadata of the model"""
        ...
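
A minimal sketch of a class satisfying this protocol. The random scores and the assumption that each batch exposes a "text" key are purely illustrative:

```python
import numpy as np


class RandomCrossEncoder:
    """A toy cross-encoder returning random scores, for illustration only."""

    def __init__(self, model_name: str, revision: str | None = None, **kwargs):
        self.model_name = model_name
        self.revision = revision

    def predict(self, inputs1, inputs2, *, task_metadata, hf_split, hf_subset,
                prompt_type=None, **kwargs):
        # Assumes each batch is a mapping with a "text" key (illustrative).
        n = sum(len(batch["text"]) for batch in inputs1)
        return np.random.rand(n)  # one relevance score per input pair
```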

mteb_model_meta property

Metadata of the model

__init__(model_name, revision, **kwargs)

The initialization function for the cross-encoder. Used when calling it from the mteb run CLI.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model. | required |
| `revision` | `str \| None` | Revision of the model. | required |
| `kwargs` | `Any` | Any additional kwargs. | `{}` |
Source code in mteb/models/models_protocols.py
def __init__(self, model_name: str, revision: str | None, **kwargs: Any) -> None:
    """The initialization function for the encoder. Used when calling it from the mteb run CLI.

    Args:
        model_name: Name of the model
        revision: revision of the model
        kwargs: Any additional kwargs
    """
    ...

predict(inputs1, inputs2, *, task_metadata, hf_split, hf_subset, prompt_type=None, **kwargs)

Predicts relevance scores for pairs of inputs. Note that, unlike the encoder, the cross-encoder can compare across inputs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inputs1` | `DataLoader[BatchedInput]` | First dataloader of inputs to encode. For reranking tasks, these are queries (for text-only tasks, `QueryDatasetType`). | required |
| `inputs2` | `DataLoader[BatchedInput]` | Second dataloader of inputs to encode. For reranking, these are documents (for text-only tasks, `RetrievalOutputType`). | required |
| `task_metadata` | `TaskMetadata` | Metadata of the current task. | required |
| `hf_split` | `str` | Split of the current task; provides additional information about the current split, e.g. the current language. | required |
| `hf_subset` | `str` | Subset of the current task. Similar to `hf_split`, provides more information. | required |
| `prompt_type` | `PromptType \| None` | The type of prompt (query or passage). | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to the cross-encoder. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Array` | The predicted relevance scores for each input pair. |

Source code in mteb/models/models_protocols.py
def predict(
    self,
    inputs1: DataLoader[BatchedInput],
    inputs2: DataLoader[BatchedInput],
    *,
    task_metadata: TaskMetadata,
    hf_split: str,
    hf_subset: str,
    prompt_type: PromptType | None = None,
    **kwargs: Any,
) -> Array:
    """Predicts relevance scores for pairs of inputs. Note that, unlike the encoder, the cross-encoder can compare across inputs.

    Args:
        inputs1: First Dataloader of inputs to encode. For reranking tasks, these are queries (for text only tasks `QueryDatasetType`).
        inputs2: Second Dataloader of inputs to encode. For reranking, these are documents (for text only tasks `RetrievalOutputType`).
        task_metadata: Metadata of the current task.
        hf_split: Split of the current task; provides additional information about the current split,
            e.g. the current language.
        hf_subset: Subset of the current task. Similar to `hf_split`, provides more information.
        prompt_type: The type of prompt (query or passage).
        **kwargs: Additional arguments to pass to the cross-encoder.

    Returns:
        The predicted relevance scores for each input pair.
    """
    ...

mteb.models.MTEBModels = Union[Encoder, CrossEncoderProtocol, SearchProtocol] module-attribute

Type alias covering all MTEB model types; many models implement multiple protocols, and many tasks can be solved by multiple model types.
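
Because the protocols are decorated with `@runtime_checkable`, you can branch on a model's capabilities with `isinstance`; a minimal sketch:

```python
import mteb
from mteb.models import CrossEncoderProtocol, Encoder, SearchProtocol

model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
for protocol in (Encoder, CrossEncoderProtocol, SearchProtocol):
    print(protocol.__name__, isinstance(model, protocol))
```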