Submit Results¶

Overview¶

The ResultCache class manages evaluation results locally and submits them to the official results repository. Use it to cache results, avoid re-computation, and contribute results back to the community.

Loading Results¶

For a full guide on loading and working with results — including filtering, dataframe conversion, and benchmark scoring — see Loading Results.

Quick Start¶

A complete example that evaluates a model, caches the results, and prepares them for submission:

import mteb

# 1. Initialize cache
cache = mteb.ResultCache()

# 2. Evaluate model
model_meta = mteb.get_model_meta("sentence-transformers/all-MiniLM-L6-v2")
task = mteb.get_task("ArguAna")

mteb.evaluate(model_meta, task, cache=cache)

# 3. Submit results
cache.submit_results(model_meta, create_pr=False)  # manual review before pushing

Submitting Results¶

Manual Submission

Note

Git is required for this action.

Prepare results without automatically creating a PR:

submission_info = cache.submit_results(
    models=["sentence-transformers/all-MiniLM-L6-v2"],
    create_pr=False
)

# submit_results logs the manual submission instructions
print(f"Prepared submission at: {submission_info['path']}")

Automated Submission

Note

Git and the GitHub CLI (gh) are required for this action. You also need to install the mteb[github] extra dependencies and configure GitHub access, either by signing in with gh auth login or by setting up your Git credential helper.

With pip:

pip install mteb[github]

With uv:

uv pip install mteb[github]
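Before running the automated path, it can help to confirm that the command-line tools named in the note above are on your PATH. The following is a minimal sketch using only the Python standard library. `missing_tools` is a hypothetical helper, not part of mteb, and it checks only that the executables can be found, not that you are signed in.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str] = ("git", "gh"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper (not part of mteb). `which` is injectable so the
    lookup can be swapped out when testing.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print(f"Missing required tools: {', '.join(missing)}")
    else:
        print("git and gh were found on PATH.")
```

If `gh` is present, running `gh auth status` in a terminal reports whether you are currently signed in.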

Then run your code:

submission_info = cache.submit_results(
    models=["sentence-transformers/all-MiniLM-L6-v2"],
    create_pr=True
)

if submission_info.get("pr_url"):
    print(f"PR created: {submission_info['pr_url']}")
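The examples above read keys from the value returned by `submit_results`. As a sketch, assuming that value behaves like a dictionary and that `path` and `pr_url` are the only keys of interest (no other keys are assumed), a small hypothetical helper can report the outcome without raising when a key is absent. `describe_submission` is not part of mteb, and the dictionaries below are placeholders standing in for a real return value.

```python
from typing import Any, Mapping


def describe_submission(info: Mapping[str, Any]) -> str:
    """Summarise a submission result.

    Hypothetical helper (not part of mteb). Only the `pr_url` and `path`
    keys are consulted, and both are treated as optional.
    """
    pr_url = info.get("pr_url")
    if pr_url:
        return f"PR created: {pr_url}"
    path = info.get("path")
    if path:
        return f"Prepared submission at: {path}"
    return "No PR URL or local path was reported."


# Placeholder dictionaries stand in for the real return value here.
print(describe_submission({"path": "/tmp/results"}))
print(describe_submission({"pr_url": "https://example.com/pr/1"}))
print(describe_submission({}))
```

Using `.get()` for both keys keeps the same reporting code usable whether `create_pr` was `True` or `False`.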

Batch Submission¶

Submit multiple models at once:

models = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    "BAAI/bge-base-en-v1.5"
]

cache.submit_results(models=models, create_pr=False)
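When a batch is assembled from several sources, the same model name can appear more than once or carry stray whitespace. The following is a minimal sketch of cleaning the list before passing it as `models`. `unique_models` is a hypothetical helper, not part of mteb, and whether `submit_results` itself tolerates duplicates is not something this sketch assumes either way.

```python
from typing import Iterable


def unique_models(names: Iterable[str]) -> list[str]:
    """Strip whitespace, drop empty entries, and remove duplicates.

    Hypothetical helper (not part of mteb). First-seen order is kept.
    """
    cleaned = (name.strip() for name in names)
    # dict preserves insertion order, so fromkeys() deduplicates stably.
    return list(dict.fromkeys(name for name in cleaned if name))


models = unique_models([
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-base-en-v1.5",
    "sentence-transformers/all-MiniLM-L6-v2 ",  # duplicate with trailing space
    "",  # empty entry, dropped
])
print(models)
```

The cleaned list can then be passed to `cache.submit_results(models=models, create_pr=False)` as in the example above.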

After Submission¶

Once the PR is created, your results wait for review; we aim to complete reviews in less than a week. To speed up the review, please make sure to fill out the checklist. During the review, we may ask you about suspicious results or ask you to check for potential data leakage.

API Reference¶

submit_results() - Submit results
save_to_cache() - Save evaluation results
load_results() - Load cached results
download_from_remote() - Sync with remote
clear_cache() - Clear cache