Adding a Model

Adding a model to the Leaderboard

The MTEB Leaderboard is available here. To submit to it:

1. Add the model meta to mteb.
2. Evaluate the desired model using mteb on the benchmarks.
3. Push the results to the results repository via a PR. Once merged, they will appear on the leaderboard after a day.

Info

This section explains how to submit a model implementation. If you wish to submit model results instead, see submit results.

Adding a model implementation

Adding a model implementation to mteb is quite straightforward. Typically, it only requires that you fill in metadata about the model and add it to the model directory:

Adding a ModelMeta object

from mteb.models import ModelMeta, SentenceTransformerEncoderWrapper
from mteb.types import OutputDType  # assumed import path; OutputDType is used below

my_model = ModelMeta(
    name="model_name",
    loader=SentenceTransformerEncoderWrapper,
    languages=["eng-Latn"], # follows ISO 639-3 and BCP-47
    open_weights=True,
    revision="5617a9f61b028005a4858fdac845db406aefb181",
    release_date="2025-01-01",
    n_parameters=568_000_000,
    memory_usage_mb=2167,
    embed_dim=4096,
    license="mit",
    max_tokens=8194,
    reference="https://huggingface.co/user-or-org/model-name",
    similarity_fn_name="cosine",
    framework=["Sentence Transformers", "PyTorch"],
    use_instructions=False,
    public_training_code="https://github.com/user-or-org/my-training-code",
    public_training_data="https://huggingface.co/datasets/user-or-org/full-dataset",
    training_datasets={"MSMARCO"}, # if you trained on the MSMARCO training set
    output_dtypes=[OutputDType.INT8, OutputDType.BINARY], # Alternative output types supported by the model
)

This works for all Sentence Transformers compatible models. Once filled out, you can submit your model to mteb by submitting a PR.

You can generate the ModelMeta automatically by using one of the following:

General model from hub

from mteb.models import ModelMeta

meta = ModelMeta.from_hub("Qwen/Qwen3-Embedding-0.6B")
print(meta.to_python())

For Sentence Transformers model

from mteb.models import ModelMeta
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", device="cpu")
meta = ModelMeta.from_sentence_transformer_model(model)
print(meta.to_python())

For CrossEncoder

from mteb.models import ModelMeta
from sentence_transformers import CrossEncoder

# Load the reranker with CrossEncoder (the class imported above), not SentenceTransformer
model = CrossEncoder("Qwen/Qwen3-Reranker-0.6B", device="cpu")
meta = ModelMeta.from_cross_encoder(model)
print(meta.to_python())

Calculating the Memory Usage

To calculate memory_usage_mb, run:

import mteb

model_meta = mteb.get_model_meta("model_name")
print(model_meta.calculate_memory_usage_mb())
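As a rough cross-check, the value can be estimated from the parameter count alone. The sketch below assumes the weights are stored as float32 (4 bytes each) and ignores buffers and activations. `estimate_memory_usage_mb` is a hypothetical helper written for this example, not part of mteb, and `calculate_memory_usage_mb()` may compute the figure differently.

```python
def estimate_memory_usage_mb(n_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Estimate weight memory in MB (1 MB = 1024**2 bytes), assuming one uniform dtype."""
    return round(n_parameters * bytes_per_parameter / 1024**2)


# Values from the ModelMeta example above: 568M parameters, assumed float32.
float32_mb = estimate_memory_usage_mb(568_000_000)
# The same model stored in half precision (2 bytes per parameter).
float16_mb = estimate_memory_usage_mb(568_000_000, bytes_per_parameter=2)
print(float32_mb, float16_mb)  # 2167 1083
```

Under the float32 assumption, the estimate agrees with the `memory_usage_mb=2167` value shown alongside `n_parameters=568_000_000` in the ModelMeta example.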

Adding instruction models

Some models, such as the E5 models, use instructions or prefixes. You can directly add the prompts when saving and uploading your model to the Hub. Refer to this configuration file as an example.

However, you can also add these directly to the model configuration:

model = ModelMeta(
    loader=SentenceTransformerEncoderWrapper,
    loader_kwargs=dict(
        model_prompts={
            "query": "query: ",
            "passage": "passage: ",
        },
    ),
    # ... remaining ModelMeta fields
)
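Conceptually, a prompt mapping like this prepends a fixed prefix to each text, depending on whether it is encoded as a query or as a passage. The following is a minimal sketch of that idea in plain Python. `apply_prompt` is a hypothetical helper for illustration only, and the wrapper's actual internal handling may differ.

```python
from __future__ import annotations


def apply_prompt(
    texts: list[str],
    prompt_type: str | None,
    model_prompts: dict[str, str],
) -> list[str]:
    """Prepend the prefix registered for prompt_type; leave texts unchanged if there is none."""
    prefix = model_prompts.get(prompt_type, "") if prompt_type is not None else ""
    return [prefix + text for text in texts]


model_prompts = {"query": "query: ", "passage": "passage: "}

queries = apply_prompt(["what is mteb?"], "query", model_prompts)
passages = apply_prompt(["MTEB is an embedding benchmark."], "passage", model_prompts)
unprompted = apply_prompt(["plain text"], None, model_prompts)
print(queries, passages, unprompted)
```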

Using a custom implementation

If you need to use a custom implementation, you can specify the loader parameter in the ModelMeta class. It should implement one of the following protocols: Encoder, CrossEncoder, or Search.

Custom Model Implementation

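import numpy as np
from torch.utils.data import DataLoader  # assumed source of the DataLoader type

# TaskMetadata and BatchedInput import paths are assumptions; adjust to your mteb version.
from mteb.abstasks.task_metadata import TaskMetadata
from mteb.types import Array, BatchedInput, PromptType

class CustomModel:
    def __init__(self, model_name: str, revision: str, **kwargs):
        pass  # your initialization of model here

    def encode(
        self,
        inputs: DataLoader[BatchedInput],
        *,
        task_metadata: TaskMetadata,
        hf_split: str,
        hf_subset: str,
        prompt_type: PromptType | None = None,
        **kwargs,
    ) -> Array:
        arrays = []
        for batch in inputs:
            documents = batch["text"]
            # embed documents (placeholder: replace the zeros with real model output):
            embed_dim = 100
            embedding = np.zeros((len(documents), embed_dim))
            arrays.append(embedding)  # collect each batch so it can be concatenated

        # stack all batches into one (n_documents, embed_dim) array
        embeddings = np.concatenate(arrays)
        return embeddings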

Then you can specify the loader parameter in the ModelMeta class:

your_model = ModelMeta(
    loader=CustomModel,
    loader_kwargs={...},
    # ... remaining ModelMeta fields
)
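The batching logic inside `encode` can be checked in isolation before wiring it into mteb. The sketch below is self-contained and uses plain Python lists in place of a DataLoader and NumPy arrays. `ToyHashEncoder` and its hashing scheme are hypothetical stand-ins for a real model, and the additional keyword arguments that mteb passes to `encode` are simply absorbed by `**kwargs`.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable


class ToyHashEncoder:
    """Hypothetical stand-in for a real model: hashes tokens into a fixed-size count vector."""

    def __init__(self, embed_dim: int = 8) -> None:
        self.embed_dim = embed_dim

    def _embed(self, text: str) -> list[float]:
        vector = [0.0] * self.embed_dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            # The first digest byte picks the bucket that this token increments.
            vector[digest[0] % self.embed_dim] += 1.0
        return vector

    def encode(self, inputs: Iterable[dict[str, list[str]]], **kwargs) -> list[list[float]]:
        embeddings: list[list[float]] = []
        for batch in inputs:
            # Accumulate the embeddings of every batch, preserving input order.
            embeddings.extend(self._embed(text) for text in batch["text"])
        return embeddings


batches = [
    {"text": ["a first document", "a second document"]},
    {"text": ["a third one"]},
]
embeddings = ToyHashEncoder(embed_dim=8).encode(batches)
print(len(embeddings), len(embeddings[0]))  # 3 8
```

The property worth verifying in a real implementation is the same: one output row per input text, in input order, with a constant embedding width across batches.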

Adding model dependencies

If you are adding a model that requires additional dependencies, you can add them to the pyproject.toml file, under optional dependencies:

voyageai = ["voyageai>=1.0.0,<2.0.0"]

Pinning a version range like this ensures that the implementation does not break when the package releases an incompatible update.
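To illustrate what the range permits, the sketch below checks plain `major.minor.patch` version strings against the `>=1.0.0,<2.0.0` bounds. `parse_version` and `in_range` are hypothetical helpers written for this example. Real installers implement the full version-specifier rules, which also cover pre-releases and other forms that this simplification ignores.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.7.2' into (1, 7, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str = "1.0.0", upper: str = "2.0.0") -> bool:
    """Return True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


for candidate in ["0.9.5", "1.0.0", "1.7.2", "2.0.0"]:
    print(candidate, in_range(candidate))
```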

Because the package is an optional dependency, you can't import it at the top level of the module. Instead, import it inside the wrapper scope:

Adding optional dependencies

class VoyageAIModel:
    def __init__(self, model_name: str, revision: str, **kwargs) -> None:
        import voyageai  # imported here so mteb still imports without the optional package
        ...

# in the model meta specify the requirement group:
voyage_model = ModelMeta(
    name="...",
    extra_requirements_groups=["voyageai"],
    # ... remaining ModelMeta fields
)
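One way to make a missing optional dependency fail with a useful message is to wrap the deferred import. The sketch below is an illustration under stated assumptions: `import_optional` and `LazyDependencyModel` are hypothetical names (not mteb functions or classes), the install hint assumes the extra is installed with pip's `package[extra]` syntax, and the standard-library `json` module stands in for a real optional package so the example runs anywhere.

```python
import importlib
from types import ModuleType


def import_optional(package: str, extra: str) -> ModuleType:
    """Import a package on demand, pointing to the matching extra if it is missing."""
    try:
        return importlib.import_module(package)
    except ImportError as error:
        raise ImportError(
            f"'{package}' is required for this model. "
            f"Install it with: pip install 'mteb[{extra}]'"
        ) from error


class LazyDependencyModel:
    def __init__(self, package: str, extra: str) -> None:
        # The import runs when the model is constructed, not when this module is imported.
        self._client_module = import_optional(package, extra)


model = LazyDependencyModel("json", extra="json")
print(model._client_module.__name__)  # json
```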

Submitting your model as a PR

When submitting your model as a PR, please copy and paste the following checklist into the pull request message:

- [ ] I have filled out the ModelMeta object to the extent possible
- [ ] I have ensured that my model can be loaded using
  - [ ] `mteb.get_model(model_name, revision)` and
  - [ ] `mteb.get_model_meta(model_name, revision)`
- [ ] I have tested the implementation works on a representative set of tasks.
- [ ] The model is public, i.e., is available either as an API or the weights are publicly available to download
- [ ] I reproduced results from the original paper (if applicable) on at least one benchmark, and I am including the results in the PR description.

Matryoshka embeddings

To add support for Matryoshka embeddings, specify embed_dim as a list of the supported dimensions.

from mteb.models import ModelMeta

my_model = ModelMeta(
    name="custom/my_model",
    # ... remaining ModelMeta fields
    embed_dim=[128, 256, 512, 1024],
)
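For context, Matryoshka-style models are trained so that a prefix of the full embedding is itself a usable embedding. The sketch below shows the usual way such a prefix is obtained: truncate to the first `dim` values and re-normalize to unit length. `truncate_embedding` is a hypothetical helper, and how mteb applies the listed dimensions internally is not covered here.

```python
import math


def truncate_embedding(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values and rescale them to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be between 1 and {len(embedding)}, got {dim}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(value * value for value in prefix))
    if norm == 0.0:
        return prefix  # an all-zero prefix cannot be normalized
    return [value / norm for value in prefix]


full_embedding = [3.0, 4.0, 12.0, 84.0]
for dim in (2, 3, 4):
    print(dim, truncate_embedding(full_embedding, dim))
```

Re-normalizing after truncation keeps cosine and dot-product similarity consistent across the different dimensions.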