Submit Results
Overview
The ResultCache class manages evaluation results locally and submits them to the official results repository. Use it to cache results, avoid re-computation, and contribute results back to the community.
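The caching idea can be pictured with a minimal, hypothetical sketch: a file-backed store that runs an evaluation only when no result exists yet for a (model, revision, task) key. The class TinyResultCache, the compute callback, and the `<org>__<name>/<revision>/<task>.json` layout are illustrative assumptions, not ResultCache's actual internals.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


class TinyResultCache:
    """Hypothetical file-backed cache for illustration; not mteb's ResultCache."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, model: str, revision: str, task: str) -> Path:
        # Assumed layout: <root>/<org>__<name>/<revision>/<task>.json
        return self.root / model.replace("/", "__") / revision / f"{task}.json"

    def get_or_compute(
        self,
        model: str,
        revision: str,
        task: str,
        compute: Callable[[], Dict[str, Any]],
    ) -> Dict[str, Any]:
        path = self._path(model, revision, task)
        if path.exists():
            # Cache hit: reuse the stored scores instead of re-running the evaluation.
            return json.loads(path.read_text(encoding="utf-8"))
        # Cache miss: run the (expensive) evaluation once and store the result.
        result = compute()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result


# Usage: the second call is served from disk, so the evaluation runs only once.
calls: List[str] = []


def fake_evaluation() -> Dict[str, Any]:
    calls.append("ran")
    return {"main_score": 0.5}


with tempfile.TemporaryDirectory() as tmp:
    cache = TinyResultCache(Path(tmp))
    first = cache.get_or_compute("org/model", "rev1", "ArguAna", fake_evaluation)
    second = cache.get_or_compute("org/model", "rev1", "ArguAna", fake_evaluation)

print(len(calls), first == second)  # prints: 1 True
```

Keying on the revision as well as the model name means a new model revision is evaluated fresh rather than silently reusing older scores.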
Loading Results
For a full guide on loading and working with results — including filtering, dataframe conversion, and benchmark scoring — see Loading Results.
Quick Start
A complete example that evaluates a model, caches the results, and prepares them for submission:
```python
import mteb

# 1. Initialize the local result cache
cache = mteb.ResultCache()

# 2. Evaluate a model on a task; results are written to the cache
model_meta = mteb.get_model_meta("sentence-transformers/all-MiniLM-L6-v2")
task = mteb.get_task("ArguAna")
mteb.evaluate(model_meta, task, cache=cache)

# 3. Prepare the submission; create_pr=False leaves it for manual review before pushing
cache.submit_results(model_meta, create_pr=False)
```
Submitting Results
Note
Git is required for this action.
Prepare results without automatically creating a PR:
```python
submission_info = cache.submit_results(
    models=["sentence-transformers/all-MiniLM-L6-v2"],
    create_pr=False,
)
# submit_results logs the manual submission instructions
print(f"Prepared submission at: {submission_info['path']}")
```
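Because create_pr=False leaves the submission for you to inspect, a quick check before pushing can catch broken files. This hypothetical sketch assumes the prepared path contains the result JSON files somewhere beneath it (verify this against the directory submit_results actually reports); it lists each file and flags any that do not parse as JSON. The helper name review_prepared_results is illustrative only.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def review_prepared_results(path: str | Path) -> tuple[list[Path], list[Path]]:
    """Hypothetical helper: split the JSON files under `path` into parseable and broken."""
    ok: list[Path] = []
    broken: list[Path] = []
    for file in sorted(Path(path).rglob("*.json")):
        try:
            json.loads(file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            broken.append(file)
        else:
            ok.append(file)
    return ok, broken


# Demo on a throwaway directory with one valid and one corrupted result file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    run_dir = root / "org__model" / "rev1"
    run_dir.mkdir(parents=True)
    (run_dir / "ArguAna.json").write_text('{"main_score": 0.5}', encoding="utf-8")
    (run_dir / "Broken.json").write_text("{not json", encoding="utf-8")
    ok, broken = review_prepared_results(root)
    ok_names = [p.name for p in ok]
    broken_names = [p.name for p in broken]

print(ok_names, broken_names)  # prints: ['ArguAna.json'] ['Broken.json']
```

With a real submission you would call review_prepared_results(submission_info["path"]), assuming that value points at a directory.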
Note
Git and the GitHub CLI (gh) are required for this action. You also need to install the mteb[github] extra dependencies and configure GitHub access, either by signing in with gh auth login or by setting up your Git credential helper.
```shell
pip install "mteb[github]"
# or, with uv:
uv pip install "mteb[github]"
```
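Since create_pr=True depends on these tools, a short preflight check can fail fast before calling submit_results. This is a hypothetical, standard-library-only sketch: missing_tools and gh_is_authenticated are illustrative names, and the login check only detects gh's own authentication, not a Git credential helper configured separately.

```python
import shutil
import subprocess
from typing import List


def missing_tools(tools: List[str]) -> List[str]:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def gh_is_authenticated(timeout: float = 30.0) -> bool:
    """True if the GitHub CLI is installed and `gh auth status` exits with code 0."""
    if shutil.which("gh") is None:
        return False
    try:
        completed = subprocess.run(
            ["gh", "auth", "status"],
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # `gh auth status` exits non-zero when no GitHub account is logged in.
    return completed.returncode == 0


missing = missing_tools(["git", "gh"])
if missing:
    print(f"Install before using create_pr=True: {', '.join(missing)}")
elif not gh_is_authenticated():
    print("gh is not logged in; run `gh auth login` or configure a Git credential helper.")
else:
    print("git and gh are available, and gh is logged in.")
```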
Then create the pull request directly:
```python
submission_info = cache.submit_results(
    models=["sentence-transformers/all-MiniLM-L6-v2"],
    create_pr=True,
)
if submission_info.get("pr_url"):
    print(f"PR created: {submission_info['pr_url']}")
```
Batch Submission
Submit multiple models at once:
```python
models = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    "BAAI/bge-base-en-v1.5",
]
cache.submit_results(models=models, create_pr=False)
```
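Before a batch submission, it can help to confirm that every listed model actually has local results. The hypothetical sketch below assumes each model's results sit in a folder named `<org>__<name>` (the model name with `/` replaced by `__`) under some results root; the real cache location and layout may differ, and split_by_local_results is an illustrative name. It also drops duplicate model names while keeping their order.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def split_by_local_results(
    results_root: Path, models: List[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate models with a local results folder from those without."""
    found: List[str] = []
    missing: List[str] = []
    for name in dict.fromkeys(models):  # drop duplicates, keep the original order
        folder = results_root / name.replace("/", "__")  # assumed folder naming
        if folder.is_dir():
            found.append(name)
        else:
            missing.append(name)
    return found, missing


models = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    "BAAI/bge-base-en-v1.5",
    "BAAI/bge-base-en-v1.5",  # accidental duplicate
]

# Demo on a throwaway directory that only has results for two of the models.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sentence-transformers__all-MiniLM-L6-v2").mkdir()
    (root / "BAAI__bge-base-en-v1.5").mkdir()
    found, missing = split_by_local_results(root, models)

print(found)    # ['sentence-transformers/all-MiniLM-L6-v2', 'BAAI/bge-base-en-v1.5']
print(missing)  # ['sentence-transformers/all-mpnet-base-v2']
```

With a real cache, you would point results_root at wherever your local results actually live and then pass only found to cache.submit_results(models=found, create_pr=False).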
After Submission
Once the PR is created, your results wait for review; we aim for this to take less than a week. To speed up the review, please make sure to fill out the checklist in the PR. During the review, we may ask you about suspicious results or ask you to check for potential data leakage.
API Reference
- submit_results(): Submit results
- save_to_cache(): Save evaluation results
- load_results(): Load cached results
- download_from_remote(): Sync with remote
- clear_cache(): Clear cache