# Benchmark

A benchmark within `mteb` is essentially a list of tasks along with some metadata about the benchmark. This metadata includes a short description of the benchmark's intention, a reference, and a citation. If you use a benchmark from `mteb`, we recommend that you cite it along with `mteb` itself.
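For example, a registered benchmark can be retrieved and its metadata inspected directly. A minimal sketch; the benchmark name `"MTEB(eng, v2)"` is assumed to be registered in your installed version:

```python
import mteb

# Retrieve a registered benchmark by name (name assumed; see mteb.get_benchmarks()).
benchmark = mteb.get_benchmark("MTEB(eng, v2)")

# A benchmark is a list of tasks plus metadata.
print(benchmark.description)
print(benchmark.citation)  # BibTeX entry to cite alongside mteb itself
print([task.metadata.name for task in benchmark.tasks])
```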
## Utilities

### mteb.get_benchmarks(names=None, display_on_leaderboard=None)

Get a list of benchmarks by name.
Parameters:

Name | Type | Description | Default
---|---|---|---
`names` | `list[str] \| None` | A list of benchmark names to retrieve. If None, all benchmarks are returned. | `None`
`display_on_leaderboard` | `bool \| None` | If specified, filters benchmarks by whether they are displayed on the leaderboard. | `None`
Source code in `mteb/benchmarks/get_benchmark.py`.
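A sketch of how these filters can be combined. Benchmark names vary across versions, so the specific name below is an assumption:

```python
import mteb

# All registered benchmarks.
all_benchmarks = mteb.get_benchmarks()

# Only the benchmarks displayed on the public leaderboard.
on_leaderboard = mteb.get_benchmarks(display_on_leaderboard=True)

# A specific subset by name (name assumed; check your installed version).
subset = mteb.get_benchmarks(names=["MTEB(eng, v2)"])

print(len(all_benchmarks), len(on_leaderboard), len(subset))
```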
### mteb.get_benchmark(benchmark_name)

Get a benchmark by name.
Parameters:

Name | Type | Description | Default
---|---|---|---
`benchmark_name` | `str` | The name of the benchmark to retrieve. | required
Source code in `mteb/benchmarks/get_benchmark.py`.
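Typical usage is a single lookup followed by iterating the benchmark's tasks. A minimal sketch; the benchmark name is an assumption:

```python
import mteb

benchmark = mteb.get_benchmark("MTEB(Multilingual)")  # name assumed
for task in benchmark.tasks:
    print(task.metadata.name)
```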
## The Benchmark Object

### mteb.Benchmark (dataclass)

A benchmark object intended to run a certain benchmark within MTEB.
Parameters:

Name | Type | Description | Default
---|---|---|---
`name` | `str` | The name of the benchmark. | required
`tasks` | `Sequence[AbsTask]` | The tasks within the benchmark. | required
`description` | `str \| None` | A description of the benchmark; should include its intended goal and potentially a description of its construction. | `None`
`reference` | `StrURL \| None` | A link to a source containing additional information, typically a paper, leaderboard, or GitHub repository. | `None`
`citation` | `str \| None` | A BibTeX citation. | `None`
`contacts` | `list[str] \| None` | The people to contact in case of a problem with the benchmark, preferably a GitHub handle. | `None`
Examples:

>>> import mteb
>>> mteb.Benchmark(
...     name="MTEB(custom)",
...     tasks=mteb.get_tasks(
...         tasks=["AmazonCounterfactualClassification", "AmazonPolarityClassification"],
...         languages=["eng"],
...     ),
...     description="A custom benchmark",
... )
Source code in `mteb/benchmarks/benchmark.py`.
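Putting it together, a custom `Benchmark` can be evaluated like any other set of tasks. The runner and model-loading calls below (`mteb.MTEB`, `mteb.get_model`) are assumed from the broader mteb documentation rather than this page, so treat this as a sketch:

```python
import mteb

benchmark = mteb.Benchmark(
    name="MTEB(custom)",
    tasks=mteb.get_tasks(
        tasks=["AmazonCounterfactualClassification", "AmazonPolarityClassification"],
        languages=["eng"],
    ),
    description="A custom benchmark",
)

# Evaluate a model on the benchmark's tasks (API assumed from mteb's usage docs).
model = mteb.get_model("sentence-transformers/all-MiniLM-L6-v2")
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model)
```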