PairClassification¶

Number of tasks: 46

ArEntail¶

A manually-curated Arabic natural language inference dataset from news headlines.

Dataset: arbml/ArEntail • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	ara	News, Written	human-annotated	found
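The `max_ap` score in the table above is an average-precision metric over pair similarity scores. The following is a minimal sketch of how such a score could be computed, not the benchmark's own implementation: it assumes that each pair has been embedded into two vectors, that label 1 marks a positive pair, and that "max" means the best average precision across several similarity functions. The function names (`cosine`, `neg_euclidean`, `average_precision`, `max_ap`) and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neg_euclidean(u, v):
    """Negated Euclidean distance, so that larger means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_precision(scores, labels):
    """Average precision for binary labels, ranking pairs by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


def max_ap(emb1, emb2, labels, similarities=(cosine, neg_euclidean)):
    """Best average precision over the given similarity functions (assumed meaning of max_ap)."""
    return max(
        average_precision([sim(u, v) for u, v in zip(emb1, emb2)], labels)
        for sim in similarities
    )


# Toy embeddings for four sentence pairs (illustrative values only).
emb1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
emb2 = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.9], [0.0, 1.0]]
labels = [1, 0, 1, 0]
toy_max_ap = max_ap(emb1, emb2, labels)
```

In this toy example both positive pairs rank above both negative pairs under either similarity function, so the ranking is perfect.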

Citation

@article{obeidat2024arentail,
  author = {Obeidat, Rasha and Al-Harahsheh, Yara and Al-Ayyoub, Mahmoud and Gharaibeh, Maram},
  journal = {Language Resources and Evaluation},
  pages = {1--27},
  publisher = {Springer},
  title = {ArEntail: manually-curated Arabic natural language inference dataset from news headlines},
  year = {2024},
}

ArmenianParaphrasePC¶

ARPA, an Armenian paraphrase detection corpus, hosted as asparius/Armenian-Paraphrase-PC.

Dataset: asparius/Armenian-Paraphrase-PC • License: apache-2.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	hye	News, Written	derived	found

Citation

@misc{malajyan2020arpa,
  archiveprefix = {arXiv},
  author = {Arthur Malajyan and Karen Avetisyan and Tsolak Ghukasyan},
  eprint = {2009.12615},
  primaryclass = {cs.CL},
  title = {ARPA: Armenian Paraphrase Detection Corpus and Models},
  year = {2020},
}

Assin2RTE¶

The Recognizing Textual Entailment part of ASSIN 2, an evaluation shared task co-located with STIL 2019.

Dataset: nilc-nlp/assin2 • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	por	Written	human-annotated	found

Citation

@inproceedings{real2020assin,
  author = {Real, Livy and Fonseca, Erick and Oliveira, Hugo Goncalo},
  booktitle = {International Conference on Computational Processing of the Portuguese Language},
  organization = {Springer},
  pages = {406--412},
  title = {The assin 2 shared task: a quick overview},
  year = {2020},
}

CDSC-E¶

Compositional Distributional Semantics Corpus for textual entailment.

Dataset: PL-MTEB/cdsce-pairclassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	pol	Written	human-annotated	found

Citation

@inproceedings{wroblewska-krasnowska-kieras-2017-polish,
  address = {Vancouver, Canada},
  author = {Wr{\'o}blewska, Alina  and
Krasnowska-Kiera{\'s}, Katarzyna},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  doi = {10.18653/v1/P17-1073},
  editor = {Barzilay, Regina  and
Kan, Min-Yen},
  month = jul,
  pages = {784--792},
  publisher = {Association for Computational Linguistics},
  title = {{P}olish evaluation dataset for compositional distributional semantics models},
  url = {https://aclanthology.org/P17-1073},
  year = {2017},
}

CExaPPC¶

ExaPPC is a large Persian paraphrase corpus of monolingual sentence-level paraphrases collected from different sources.

Dataset: PNLPhub/C-ExaPPC • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	fas	Social, Web	derived	found

Citation

@inproceedings{9786243,
  author = {Sadeghi, Reyhaneh and Karbasi, Hamed and Akbari, Ahmad},
  booktitle = {2022 8th International Conference on Web Research (ICWR)},
  doi = {10.1109/ICWR54782.2022.9786243},
  keywords = {Data mining;Task analysis;Paraphrase Identification;Semantic Similarity;Deep Learning;Paraphrasing Corpora},
  pages = {168--175},
  title = {ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus},
  year = {2022},
}

CTKFactsNLI¶

A Czech natural language inference dataset of around 3K evidence-claim pairs, each labelled SUPPORTS, REFUTES, or NOT ENOUGH INFO. The pairs were extracted from a round of fact-checking experiments.

Dataset: mteb/CTKFactsNLI • License: cc-by-sa-3.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	ces	News, Written	human-annotated	found

Citation

@article{ullrich2023csfever,
  author = {Ullrich, Herbert and Drchal, Jan and R{\`y}par, Martin and Vincourov{\'a}, Hana and Moravec, V{\'a}clav},
  journal = {Language Resources and Evaluation},
  number = {4},
  pages = {1571--1605},
  publisher = {Springer},
  title = {CsFEVER and CTKFacts: acquiring Czech data for fact verification},
  volume = {57},
  year = {2023},
}

Cmnli¶

Chinese Multi-Genre NLI

Dataset: C-MTEB/CMNLI • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_accuracy	cmn	not specified	not specified	not specified
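The `max_accuracy` score in the table above is an accuracy metric rather than a ranking metric. A minimal sketch of one way to obtain it from pair similarity scores is shown here, under the assumption that accuracy is taken at the best decision threshold (predict a positive pair when the score is at or above the threshold). The helper name `best_threshold_accuracy` and the toy data are hypothetical, and the benchmark's own implementation may differ, for example by also maximizing over several similarity functions.

```python
def best_threshold_accuracy(scores, labels):
    """Return (accuracy, threshold) for the rule: predict 1 when score >= threshold."""
    n = len(labels)
    # Baseline: a threshold above every score, so every pair is predicted 0.
    best_acc = sum(1 for y in labels if y == 0) / n
    best_thr = float("inf")
    # Try each distinct score as a candidate threshold, from highest to lowest.
    for thr in sorted(set(scores), reverse=True):
        correct = sum(1 for s, y in zip(scores, labels) if (s >= thr) == (y == 1))
        acc = correct / n
        if acc > best_acc:  # keep the first (highest) threshold on ties
            best_acc, best_thr = acc, thr
    return best_acc, best_thr


# Toy similarity scores and gold labels for five pairs (illustrative values only).
scores = [0.91, 0.85, 0.40, 0.35, 0.10]
labels = [1, 1, 0, 1, 0]
toy_acc, toy_thr = best_threshold_accuracy(scores, labels)
```

Sweeping only over observed scores is sufficient because accuracy can change only when the threshold crosses one of them.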

Citation

@inproceedings{xu-etal-2020-clue,
  address = {Barcelona, Spain (Online)},
  author = {Xu, Liang  and
Hu, Hai  and
Zhang, Xuanwei  and
Li, Lu  and
Cao, Chenjie  and
Li, Yudong  and
Xu, Yechen  and
Sun, Kai  and
Yu, Dian  and
Yu, Cong  and
Tian, Yin  and
Dong, Qianqian  and
Liu, Weitang  and
Shi, Bo  and
Cui, Yiming  and
Li, Junyi  and
Zeng, Jun  and
Wang, Rongzhao  and
Xie, Weijian  and
Li, Yanting  and
Patterson, Yina  and
Tian, Zuoyu  and
Zhang, Yiwen  and
Zhou, He  and
Liu, Shaoweihua  and
Zhao, Zhe  and
Zhao, Qipeng  and
Yue, Cong  and
Zhang, Xinrui  and
Yang, Zhengliang  and
Richardson, Kyle  and
Lan, Zhenzhong},
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  doi = {10.18653/v1/2020.coling-main.419},
  month = dec,
  pages = {4762--4772},
  publisher = {International Committee on Computational Linguistics},
  title = {{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark},
  url = {https://aclanthology.org/2020.coling-main.419},
  year = {2020},
}

DisCoTexPairClassification¶

The DisCoTex dataset aims to assess discourse coherence in Italian texts. It focuses on real-world Italian texts and provides resources for modeling coherence in natural language.

Dataset: MattiaSangermano/DisCoTex-last-sentence • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	ita	Social, Written	derived	found

Citation

@inproceedings{brunato2023discotex,
  author = {Brunato, Dominique and Colla, Davide and Dell'Orletta, Felice and Dini, Irene and Radicioni, Daniele Paolo and Ravelli, Andrea Amelio and others},
  booktitle = {CEUR WORKSHOP PROCEEDINGS},
  organization = {CEUR},
  pages = {1--8},
  title = {DisCoTex at EVALITA 2023: overview of the assessing discourse coherence in Italian texts task},
  volume = {3473},
  year = {2023},
}

FalseFriendsGermanEnglish¶

A dataset for identifying false friends (false cognates) between English and German, a generally challenging task for multilingual models.

Dataset: aari1995/false_friends_de_en_mteb • License: mit • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	deu	Written	human-annotated	created

Citation

@misc{Chibb_2022,
  author = {Chibb, Aaron},
  month = {Sep},
  title = {{German-English False Friends in Multilingual Transformer Models: An Evaluation on Robustness and Word-to-Word Fine-Tuning}},
  year = {2022},
}

FarsTail¶

FarsTail includes 10,367 samples, provided both in the Persian language and in an indexed format for non-Persian researchers. The samples are generated from 3,539 multiple-choice questions with minimal annotator intervention, in a way similar to the SciTail dataset.

Dataset: azarijafari/FarsTail • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	fas	Academic, Written	human-annotated	found

Citation

@article{amirkhani2023farstail,
  author = {Amirkhani, Hossein and AzariJafari, Mohammad and Faridan-Jahromi, Soroush and Kouhkan, Zeinab and Pourjafari, Zohreh and Amirak, Azadeh},
  doi = {10.1007/s00500-023-08959-3},
  journal = {Soft Computing},
  publisher = {Springer},
  title = {FarsTail: a Persian natural language inference dataset},
  year = {2023},
}

FarsiParaphraseDetection¶

Farsi Paraphrase Detection

Dataset: alighasemi/farsi_paraphrase_detection • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	fas	not specified	derived	found

IndicXnliPairClassification¶

INDICXNLI is similar in shape and form to the existing XNLI dataset, but focuses on the Indic language family. The train (392,702), validation (2,490), and evaluation (5,010) sets of English XNLI were translated from English into each of the eleven Indic languages. IndicTrans, the model used for translation, is a large Transformer-based sequence-to-sequence model trained on the Samanantar dataset (Ramesh et al., 2021), the largest parallel multilingual corpus over eleven Indic languages.

Dataset: mteb/IndicXnliPairClassification • License: cc-by-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	asm, ben, guj, hin, kan, ... (11)	Fiction, Government, Non-fiction, Written	derived	machine-translated

Citation

@misc{aggarwal_gupta_kunch_22,
  author = {Aggarwal, Divyanshu and Gupta, Vivek and Kunchukuttan, Anoop},
  copyright = {Creative Commons Attribution 4.0 International},
  doi = {10.48550/ARXIV.2204.08776},
  publisher = {arXiv},
  title = {IndicXNLI: Evaluating Multilingual Inference for Indian Languages},
  url = {https://arxiv.org/abs/2204.08776},
  year = {2022},
}

KLUE-NLI¶

Textual Entailment between a hypothesis sentence and a premise sentence. Part of the Korean Language Understanding Evaluation (KLUE).

Dataset: klue/klue • License: cc-by-sa-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	kor	Encyclopaedic, News, Written	human-annotated	found

Citation

@misc{park2021klue,
  archiveprefix = {arXiv},
  author = {Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
  eprint = {2105.09680},
  primaryclass = {cs.CL},
  title = {KLUE: Korean Language Understanding Evaluation},
  year = {2021},
}

LegalBenchPC¶

This LegalBench pair classification task is a combination of the following datasets:

- Citation Prediction Classification: Given a legal statement and a case citation, determine if the citation is supportive of the legal statement.
- Consumer Contracts QA: The task consists of 400 yes/no questions relating to consumer contracts (specifically, online terms of service) and is relevant to the legal skill of contract interpretation.
- Contract QA: Answer yes/no questions about whether contractual clauses discuss particular issues like confidentiality requirements, BIPA consent, PII data breaches, breach of contract etc.
- Hearsay: Classify if a particular piece of evidence qualifies as hearsay. Each sample in the dataset describes (1) an issue being litigated or an assertion a party wishes to prove, and (2) a piece of evidence a party wishes to introduce. The goal is to determine if, as it relates to the issue, the evidence would be considered hearsay under the definition provided above.
- Privacy Policy Entailment: Given a privacy policy clause and a description of the clause, determine if the description is correct. This is a binary classification task in which the LLM is provided with a clause from a privacy policy, and a description of that clause (e.g., “The policy describes collection of the user’s HTTP cookies, flash cookies, pixel tags, or similar identifiers by a party to the contract.”).
- Privacy Policy QA: Given a question and a clause from a privacy policy, determine if the clause contains enough information to answer the question. This is a binary classification task in which the LLM is provided with a question (e.g., “do you publish my data”) and a clause from a privacy policy. The LLM must determine if the clause contains an answer to the question, and classify the question-clause pair.

Dataset: mteb/LegalBenchPC • License: cc-by-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_accuracy	eng	Legal, Written	expert-annotated	found

Citation

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{kolt2022predicting,
  author = {Kolt, Noam},
  journal = {Berkeley Tech. LJ},
  pages = {71},
  publisher = {HeinOnline},
  title = {Predicting consumer contracts},
  volume = {37},
  year = {2022},
}

@article{ravichander2019question,
  author = {Ravichander, Abhilasha and Black, Alan W and Wilson, Shomir and Norton, Thomas and Sadeh, Norman},
  journal = {arXiv preprint arXiv:1911.00841},
  title = {Question answering for privacy policies: Combining computational and legal perspectives},
  year = {2019},
}

@article{zimmeck2019maps,
  author = {Zimmeck, Sebastian and Story, Peter and Smullen, Daniel and Ravichander, Abhilasha and Wang, Ziqi and Reidenberg, Joel R and Russell, N Cameron and Sadeh, Norman},
  journal = {Proc. Priv. Enhancing Tech.},
  pages = {66},
  title = {Maps: Scaling privacy compliance analysis to a million apps},
  volume = {2019},
  year = {2019},
}

Ocnli¶

Original Chinese Natural Language Inference dataset

Dataset: C-MTEB/OCNLI • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_accuracy	cmn	not specified	not specified	not specified

Citation

@misc{hu2020ocnli,
  archiveprefix = {arXiv},
  author = {Hai Hu and Kyle Richardson and Liang Xu and Lu Li and Sandra Kuebler and Lawrence S. Moss},
  eprint = {2010.05444},
  primaryclass = {cs.CL},
  title = {OCNLI: Original Chinese Natural Language Inference},
  year = {2020},
}

OpusparcusPC¶

Opusparcus is a paraphrase corpus for six European languages: German, English, Finnish, French, Russian, and Swedish. The paraphrases consist of subtitles from movies and TV shows.

Dataset: mteb/OpusparcusPC • License: cc-by-nc-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	deu, eng, fin, fra, rus, ... (6)	Spoken	human-annotated	created

Citation

@misc{creutz2018open,
  archiveprefix = {arXiv},
  author = {Mathias Creutz},
  eprint = {1809.06142},
  primaryclass = {cs.CL},
  title = {Open Subtitles Paraphrase Corpus for Six Languages},
  year = {2018},
}

PSC¶

Polish Summaries Corpus

Dataset: PL-MTEB/psc-pairclassification • License: cc-by-3.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	pol	News, Written	derived	found

Citation

@inproceedings{ogrodniczuk-kopec-2014-polish,
  address = {Reykjavik, Iceland},
  author = {Ogrodniczuk, Maciej  and
Kope{\'c}, Mateusz},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)},
  editor = {Calzolari, Nicoletta  and
Choukri, Khalid  and
Declerck, Thierry  and
Loftsson, Hrafn  and
Maegaard, Bente  and
Mariani, Joseph  and
Moreno, Asuncion  and
Odijk, Jan  and
Piperidis, Stelios},
  month = may,
  pages = {3712--3715},
  publisher = {European Language Resources Association (ELRA)},
  title = {The {P}olish Summaries Corpus},
  url = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/1211_Paper.pdf},
  year = {2014},
}

ParsinluEntail¶

A Persian textual entailment task (deciding whether sent1 entails sent2). The questions are partially translated from the SNLI dataset and partially generated by expert annotators.

Dataset: mteb/ParsinluEntail • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	fas	Reviews, Written	derived	found

Citation

@misc{khashabi2021parsinlusuitelanguageunderstanding,
  archiveprefix = {arXiv},
  author = {Daniel Khashabi and Arman Cohan and Siamak Shakeri and Pedram Hosseini and Pouya Pezeshkpour and Malihe Alikhani and Moin Aminnaseri and Marzieh Bitaab and Faeze Brahman and Sarik Ghazarian and Mozhdeh Gheini and Arman Kabiri and Rabeeh Karimi Mahabadi and Omid Memarrast and Ahmadreza Mosallanezhad and Erfan Noury and Shahab Raji and Mohammad Sadegh Rasooli and Sepideh Sadeghi and Erfan Sadeqi Azer and Niloofar Safi Samghabadi and Mahsa Shafaei and Saber Sheybani and Ali Tazarv and Yadollah Yaghoobzadeh},
  eprint = {2012.06154},
  primaryclass = {cs.CL},
  title = {ParsiNLU: A Suite of Language Understanding Challenges for Persian},
  url = {https://arxiv.org/abs/2012.06154},
  year = {2021},
}

ParsinluQueryParaphPC¶

A Persian query paraphrasing task (deciding whether two questions are paraphrases of each other). The questions are partially generated from Google auto-complete and partially translated from the Quora paraphrasing dataset.

Dataset: mteb/ParsinluQueryParaphPC • License: not specified • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	fas	Reviews, Written	derived	found

Citation

@misc{khashabi2021parsinlusuitelanguageunderstanding,
  archiveprefix = {arXiv},
  author = {Daniel Khashabi and Arman Cohan and Siamak Shakeri and Pedram Hosseini and Pouya Pezeshkpour and Malihe Alikhani and Moin Aminnaseri and Marzieh Bitaab and Faeze Brahman and Sarik Ghazarian and Mozhdeh Gheini and Arman Kabiri and Rabeeh Karimi Mahabadi and Omid Memarrast and Ahmadreza Mosallanezhad and Erfan Noury and Shahab Raji and Mohammad Sadegh Rasooli and Sepideh Sadeghi and Erfan Sadeqi Azer and Niloofar Safi Samghabadi and Mahsa Shafaei and Saber Sheybani and Ali Tazarv and Yadollah Yaghoobzadeh},
  eprint = {2012.06154},
  primaryclass = {cs.CL},
  title = {ParsiNLU: A Suite of Language Understanding Challenges for Persian},
  url = {https://arxiv.org/abs/2012.06154},
  year = {2021},
}

PawsXPairClassification¶

PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification

Dataset: mteb/PawsXPairClassification • License: https://huggingface.co/datasets/google-research-datasets/paws-x#licensing-information • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	cmn, deu, eng, fra, jpn, ... (7)	Encyclopaedic, Web, Written	human-annotated	human-translated

Citation

@misc{yang2019pawsx,
  archiveprefix = {arXiv},
  author = {Yinfei Yang and Yuan Zhang and Chris Tar and Jason Baldridge},
  eprint = {1908.11828},
  primaryclass = {cs.CL},
  title = {PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification},
  year = {2019},
}

PpcPC¶

Polish Paraphrase Corpus

Dataset: PL-MTEB/ppc-pairclassification • License: gpl-3.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
text to text (t2t)	max_ap	pol	Fiction, News, Non-fiction, Social, Spoken, ... (7)	derived	found

Citation

@misc{dadas2022training,
  archiveprefix = {arXiv},
  author = {Sławomir Dadas},
  eprint = {2207.12759},
  primaryclass = {cs.CL},
  title = {Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases},
  year = {2022},
}