
Classification

  • Number of tasks: 456
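
These classification tasks are typically run through the mteb library, which trains a lightweight classifier on a model's embeddings and reports the score listed for each task. A minimal sketch, assuming the mteb and sentence-transformers Python packages; the model and task names are example choices:

import mteb
from sentence_transformers import SentenceTransformer

# Any text embedding model can be plugged in; this small one is an example.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Pick any task listed on this page by name.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])

# Run the evaluation; per-task scores (accuracy, f1, ...) are written to disk.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")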

AJGT

Arabic Jordanian General Tweets (AJGT) Corpus consists of 1,800 tweets annotated as positive or negative, written in Modern Standard Arabic (MSA) or Jordanian dialect.

Dataset: komari6/ajgt_twitter_ar • License: afl-3.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ara | Social, Written | human-annotated | found
Citation
@inproceedings{alomari2017arabic,
  author = {Alomari, Khaled Mohammad and ElSherif, Hatem M and Shaalan, Khaled},
  booktitle = {International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems},
  organization = {Springer},
  pages = {602--610},
  title = {Arabic tweets sentimental analysis using machine learning},
  year = {2017},
}
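
The underlying data for any entry can be inspected directly from the listed dataset id. A minimal sketch, assuming the Hugging Face datasets library; the split name is an assumption and varies across datasets:

from datasets import load_dataset

# Dataset id as listed above.
ds = load_dataset("komari6/ajgt_twitter_ar")
print(ds)              # inspect the available splits and columns
print(ds["train"][0])  # "train" is an assumed split name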

AJGT.v2

Arabic Jordanian General Tweets (AJGT) Corpus consists of 1,800 tweets (900 for training and 900 for testing) annotated as positive or negative, written in Modern Standard Arabic (MSA) or Jordanian dialect. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/ajgt • License: afl-3.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ara | Social, Written | human-annotated | found
Citation
@inproceedings{alomari2017arabic,
  author = {Alomari, Khaled Mohammad and ElSherif, Hatem M and Shaalan, Khaled},
  booktitle = {International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems},
  organization = {Springer},
  pages = {602--610},
  title = {Arabic tweets sentimental analysis using machine learning},
  year = {2017},
}

AfriSentiClassification

AfriSenti is the largest sentiment analysis dataset for under-represented African languages.

Dataset: shmuhammad/AfriSenti-twitter-sentiment • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | amh, arq, ary, hau, ibo, ... (12) | Social, Written | derived | found
Citation
@inproceedings{Muhammad2023AfriSentiAT,
  author = {Shamsuddeen Hassan Muhammad and Idris Abdulmumin and Abinew Ali Ayele and Nedjma Ousidhoum and David Ifeoluwa Adelani and Seid Muhie Yimam and Ibrahim Sa'id Ahmad and Meriem Beloucif and Saif Mohammad and Sebastian Ruder and Oumaima Hourrane and Pavel Brazdil and Felermino D'ario M'ario Ant'onio Ali and Davis Davis and Salomey Osei and Bello Shehu Bello and Falalu Ibrahim and Tajuddeen Gwadabe and Samuel Rutunda and Tadesse Belay and Wendimu Baye Messelle and Hailu Beshada Balcha and Sisay Adugna Chala and Hagos Tesfahun Gebremichael and Bernard Opoku and Steven Arthur},
  title = {AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages},
  year = {2023},
}

AfriSentiLangClassification

AfriSentiLID is the largest language identification (LID) classification dataset for African languages.

Dataset: HausaNLP/afrisenti-lid-data • License: cc-by-nc-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | amh, arq, ary, hau, ibo, ... (12) | Social, Written | derived | found
Citation

AllegroReviews

A Polish dataset for sentiment classification of reviews from the e-commerce marketplace Allegro.

Dataset: PL-MTEB/allegro-reviews • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | pol | Reviews | derived | found
Citation
@inproceedings{rybak-etal-2020-klej,
  abstract = {In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language, which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based models.},
  address = {Online},
  author = {Rybak, Piotr  and
Mroczkowski, Robert  and
Tracz, Janusz  and
Gawlik, Ireneusz},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/2020.acl-main.111},
  editor = {Jurafsky, Dan  and
Chai, Joyce  and
Schluter, Natalie  and
Tetreault, Joel},
  month = jul,
  pages = {1191--1201},
  publisher = {Association for Computational Linguistics},
  title = {{KLEJ}: Comprehensive Benchmark for {P}olish Language Understanding},
  url = {https://aclanthology.org/2020.acl-main.111/},
  year = {2020},
}

AllegroReviews.v2

A Polish dataset for sentiment classification of reviews from the e-commerce marketplace Allegro. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/allegro_reviews • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | pol | Reviews | derived | found
Citation
@inproceedings{rybak-etal-2020-klej,
  abstract = {In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language, which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based models.},
  address = {Online},
  author = {Rybak, Piotr  and
Mroczkowski, Robert  and
Tracz, Janusz  and
Gawlik, Ireneusz},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/2020.acl-main.111},
  editor = {Jurafsky, Dan  and
Chai, Joyce  and
Schluter, Natalie  and
Tetreault, Joel},
  month = jul,
  pages = {1191--1201},
  publisher = {Association for Computational Linguistics},
  title = {{KLEJ}: Comprehensive Benchmark for {P}olish Language Understanding},
  url = {https://aclanthology.org/2020.acl-main.111/},
  year = {2020},
}

AmazonCounterfactualClassification

A collection of Amazon customer reviews annotated for counterfactual detection pair classification.

Dataset: mteb/amazon_counterfactual • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | deu, eng, jpn | Reviews, Written | human-annotated | found
Citation
@inproceedings{oneill-etal-2021-wish,
  abstract = {Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique as it contains counterfactuals in multiple languages, covers a new application area of e-commerce reviews, and provides high quality professional annotations. We train CFD models using different text representation methods and classifiers. We find that these models are robust against the selectional biases introduced due to cue phrase-based sentence selection. Moreover, our CFD dataset is compatible with prior datasets and can be merged to learn accurate CFD models. Applying machine translation on English counterfactual examples to create multilingual data performs poorly, demonstrating the language-specificity of this problem, which has been ignored so far.},
  address = {Online and Punta Cana, Dominican Republic},
  author = {O{'}Neill, James  and
Rozenshtein, Polina  and
Kiryo, Ryuichi  and
Kubota, Motoko  and
Bollegala, Danushka},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  doi = {10.18653/v1/2021.emnlp-main.568},
  editor = {Moens, Marie-Francine  and
Huang, Xuanjing  and
Specia, Lucia  and
Yih, Scott Wen-tau},
  month = nov,
  pages = {7092--7108},
  publisher = {Association for Computational Linguistics},
  title = {{I} Wish {I} Would Have Loved This One, But {I} Didn{'}t {--} A Multilingual Dataset for Counterfactual Detection in Product Review},
  url = {https://aclanthology.org/2021.emnlp-main.568},
  year = {2021},
}

AmazonCounterfactualVNClassification

A collection of translated Amazon customer reviews annotated for counterfactual detection pair classification. The samples were created from English originals for VN-MTEB (Vietnamese Massive Text Embedding Benchmark) by a new automated system that:

  • uses large language models (LLMs), specifically Cohere's Aya model, for translation;
  • applies advanced embedding models to filter the translations (a sketch of this step follows the list);
  • uses an LLM-as-a-judge to score the quality of the samples against multiple criteria.
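
A minimal sketch of the embedding-based filtering step, assuming a multilingual sentence-transformers encoder; the model name and similarity threshold are illustrative assumptions, not the authors' exact pipeline:

from sentence_transformers import SentenceTransformer, util

# Multilingual encoder (example choice) that embeds both English and Vietnamese.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

def keep_translation(src: str, tgt: str, threshold: float = 0.8) -> bool:
    # Keep the pair only if the source and translation embeddings are close.
    emb = model.encode([src, tgt], convert_to_tensor=True, normalize_embeddings=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold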

Dataset: GreenNode/amazon-counterfactual-vn • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | vie | Reviews, Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

AmazonPolarityClassification

Amazon Polarity Classification Dataset.

Dataset: mteb/amazon_polarity • License: apache-2.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Reviews, Written | derived | found
Citation
@article{McAuley2013HiddenFA,
  author = {Julian McAuley and Jure Leskovec},
  journal = {Proceedings of the 7th ACM conference on Recommender systems},
  title = {Hidden factors and hidden topics: understanding rating dimensions with review text},
  url = {https://api.semanticscholar.org/CorpusID:6440341},
  year = {2013},
}

AmazonPolarityClassification.v2

Amazon Polarity Classification Dataset. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/amazon_polarity • License: apache-2.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Reviews, Written | derived | found
Citation
@article{McAuley2013HiddenFA,
  author = {Julian McAuley and Jure Leskovec},
  journal = {Proceedings of the 7th ACM conference on Recommender systems},
  title = {Hidden factors and hidden topics: understanding rating dimensions with review text},
  url = {https://api.semanticscholar.org/CorpusID:6440341},
  year = {2013},
}

AmazonPolarityVNClassification

A collection of translated Amazon customer reviews annotated for polarity classification. The samples were created from English originals for VN-MTEB (Vietnamese Massive Text Embedding Benchmark) by a new automated system that:

  • uses large language models (LLMs), specifically Cohere's Aya model, for translation;
  • applies advanced embedding models to filter the translations;
  • uses an LLM-as-a-judge to score the quality of the samples against multiple criteria.

Dataset: GreenNode/amazon-polarity-vn • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | vie | Reviews, Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

AmazonReviewsClassification

A collection of Amazon reviews specifically designed to aid research in multilingual text classification.

Dataset: mteb/AmazonReviewsClassification • License: https://docs.opendata.aws/amazon-reviews-ml/license.txt

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | cmn, deu, eng, fra, jpn, ... (6) | Reviews, Written | human-annotated | found
Citation
@misc{keung2020multilingual,
  archiveprefix = {arXiv},
  author = {Phillip Keung and Yichao Lu and György Szarvas and Noah A. Smith},
  eprint = {2010.02573},
  primaryclass = {cs.CL},
  title = {The Multilingual Amazon Reviews Corpus},
  year = {2020},
}
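
For multilingual entries like this one, tasks can also be selected by language rather than by name. A minimal sketch, assuming a recent mteb version in which get_tasks accepts language and task-type filters:

import mteb

# Restrict the catalog to classification tasks available in German.
tasks = mteb.get_tasks(task_types=["Classification"], languages=["deu"])
print([t.metadata.name for t in tasks])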

AmazonReviewsVNClassification

A collection of translated Amazon reviews specifically designed to aid research in multilingual text classification. The samples were created from English originals for VN-MTEB (Vietnamese Massive Text Embedding Benchmark) by a new automated system that:

  • uses large language models (LLMs), specifically Cohere's Aya model, for translation;
  • applies advanced embedding models to filter the translations;
  • uses an LLM-as-a-judge to score the quality of the samples against multiple criteria.

Dataset: GreenNode/amazon-reviews-multi-vn • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | vie | Reviews, Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

AngryTweetsClassification

A sentiment dataset for Danish tweets with three classes (positiv, negativ, neutral).

Dataset: DDSC/angry-tweets • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | dan | Social, Written | human-annotated | found
Citation
@inproceedings{pauli2021danlp,
  author = {Pauli, Amalie Brogaard and Barrett, Maria and Lacroix, Oph{\'e}lie and Hvingelby, Rasmus},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  pages = {460--466},
  title = {DaNLP: An open-source toolkit for Danish Natural Language Processing},
  year = {2021},
}

AngryTweetsClassification.v2

A sentiment dataset for Danish tweets with three classes (positiv, negativ, neutral). This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/angry_tweets • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | dan | Social, Written | human-annotated | found
Citation
@inproceedings{pauli2021danlp,
  author = {Pauli, Amalie Brogaard and Barrett, Maria and Lacroix, Oph{\'e}lie and Hvingelby, Rasmus},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  pages = {460--466},
  title = {DaNLP: An open-source toolkit for Danish Natural Language Processing},
  year = {2021},
}

ArxivClassification

A classification dataset of arXiv papers.

Dataset: mteb/ArxivClassification • License: not specified

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Academic, Written | derived | found
Citation
@article{8675939,
  author = {He, Jun and Wang, Liqun and Liu, Liu and Feng, Jiao and Wu, Hao},
  doi = {10.1109/ACCESS.2019.2907992},
  journal = {IEEE Access},
  number = {},
  pages = {40707-40718},
  title = {Long Document Classification From Local Word Glimpses via Recurrent Attention Learning},
  volume = {7},
  year = {2019},
}

ArxivClassification.v2

A classification dataset of arXiv papers. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/arxiv • License: not specified

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Academic, Written | derived | found
Citation
@article{8675939,
  author = {He, Jun and Wang, Liqun and Liu, Liu and Feng, Jiao and Wu, Hao},
  doi = {10.1109/ACCESS.2019.2907992},
  journal = {IEEE Access},
  number = {},
  pages = {40707-40718},
  title = {Long Document Classification From Local Word Glimpses via Recurrent Attention Learning},
  volume = {7},
  year = {2019},
}

Banking77Classification

Dataset composed of online banking queries annotated with their corresponding intents.

Dataset: mteb/banking77 • License: mit

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Written | human-annotated | found
Citation
@inproceedings{casanueva-etal-2020-efficient,
  address = {Online},
  author = {Casanueva, I{\~n}igo  and
Tem{\v{c}}inas, Tadas  and
Gerz, Daniela  and
Henderson, Matthew  and
Vuli{\'c}, Ivan},
  booktitle = {Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI},
  doi = {10.18653/v1/2020.nlp4convai-1.5},
  editor = {Wen, Tsung-Hsien  and
Celikyilmaz, Asli  and
Yu, Zhou  and
Papangelis, Alexandros  and
Eric, Mihail  and
Kumar, Anuj  and
Casanueva, I{\~n}igo  and
Shah, Rushin},
  month = jul,
  pages = {38--45},
  publisher = {Association for Computational Linguistics},
  title = {Efficient Intent Detection with Dual Sentence Encoders},
  url = {https://aclanthology.org/2020.nlp4convai-1.5},
  year = {2020},
}

Banking77Classification.v2

Dataset composed of online banking queries annotated with their corresponding intents. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/banking77 • License: mit

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Written | human-annotated | found
Citation
@inproceedings{casanueva-etal-2020-efficient,
  address = {Online},
  author = {Casanueva, I{\~n}igo  and
Tem{\v{c}}inas, Tadas  and
Gerz, Daniela  and
Henderson, Matthew  and
Vuli{\'c}, Ivan},
  booktitle = {Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI},
  doi = {10.18653/v1/2020.nlp4convai-1.5},
  editor = {Wen, Tsung-Hsien  and
Celikyilmaz, Asli  and
Yu, Zhou  and
Papangelis, Alexandros  and
Eric, Mihail  and
Kumar, Anuj  and
Casanueva, I{\~n}igo  and
Shah, Rushin},
  month = jul,
  pages = {38--45},
  publisher = {Association for Computational Linguistics},
  title = {Efficient Intent Detection with Dual Sentence Encoders},
  url = {https://aclanthology.org/2020.nlp4convai-1.5},
  year = {2020},
}

Banking77VNClassification

A translated dataset composed of online banking queries annotated with their corresponding intents. The samples were created from English originals for VN-MTEB (Vietnamese Massive Text Embedding Benchmark) by a new automated system that:

  • uses large language models (LLMs), specifically Cohere's Aya model, for translation;
  • applies advanced embedding models to filter the translations;
  • uses an LLM-as-a-judge to score the quality of the samples against multiple criteria.

Dataset: GreenNode/banking77-vn • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | vie | Written | derived | machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

BengaliDocumentClassification

Dataset for news classification, with articles categorized into 13 domains.

Dataset: dialect-ai/shironaam • License: cc-by-nc-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ben | News, Written | derived | found
Citation
@inproceedings{akash-etal-2023-shironaam,
  address = {Dubrovnik, Croatia},
  author = {Akash, Abu Ubaida  and
Nayeem, Mir Tafseer  and
Shohan, Faisal Tareque  and
Islam, Tanvir},
  booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics},
  month = may,
  pages = {52--67},
  publisher = {Association for Computational Linguistics},
  title = {Shironaam: {B}engali News Headline Generation using Auxiliary Information},
  url = {https://aclanthology.org/2023.eacl-main.4},
  year = {2023},
}

BengaliDocumentClassification.v2

Dataset for news classification, with articles categorized into 13 domains. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/bengali_document • License: cc-by-nc-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ben | News, Written | derived | found
Citation
@inproceedings{akash-etal-2023-shironaam,
  address = {Dubrovnik, Croatia},
  author = {Akash, Abu Ubaida  and
Nayeem, Mir Tafseer  and
Shohan, Faisal Tareque  and
Islam, Tanvir},
  booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics},
  month = may,
  pages = {52--67},
  publisher = {Association for Computational Linguistics},
  title = {Shironaam: {B}engali News Headline Generation using Auxiliary Information},
  url = {https://aclanthology.org/2023.eacl-main.4},
  year = {2023},
}

BengaliHateSpeechClassification

The Bengali Hate Speech Dataset is a Bengali-language dataset of news articles collected from various Bengali media sources and categorized based on the type of hate in the text.

Dataset: rezacsedu/bn_hate_speech • License: mit

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | f1 | ben | News, Written | expert-annotated | found
Citation
@inproceedings{karim2020BengaliNLP,
  author = {Karim, Md. Rezaul and Chakravarti, Bharathi Raja and P. McCrae, John and Cochez, Michael},
  booktitle = {7th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA,2020)},
  publisher = {IEEE},
  title = {Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network},
  year = {2020},
}
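
Unlike most tasks on this page, this one is scored with f1 rather than accuracy. A small scikit-learn illustration to make the metric concrete; macro averaging is an assumption here, used because it weights minority classes equally:

from sklearn.metrics import f1_score

# Toy labels; the real task uses the hate-type categories described above.
y_true = ["hate", "hate", "neutral", "neutral", "neutral"]
y_pred = ["hate", "neutral", "neutral", "neutral", "hate"]

# Macro-F1 averages per-class F1 scores, so each class counts equally.
print(f1_score(y_true, y_pred, average="macro"))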

BengaliHateSpeechClassification.v2

The Bengali Hate Speech Dataset is a Bengali-language dataset of news articles collected from various Bengali media sources and categorized based on the type of hate in the text. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/bengali_hate_speech • License: mit

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | f1 | ben | News, Written | expert-annotated | found
Citation
@inproceedings{karim2020BengaliNLP,
  author = {Karim, Md. Rezaul and Chakravarti, Bharathi Raja and P. McCrae, John and Cochez, Michael},
  booktitle = {7th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA,2020)},
  publisher = {IEEE},
  title = {Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network},
  year = {2020},
}

BengaliSentimentAnalysis

The dataset contains 3,307 negative and 8,500 positive reviews collected from Bengali dramas on YouTube and manually annotated.

Dataset: Akash190104/bengali_sentiment_analysis • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | f1 | ben | Reviews, Written | human-annotated | found
Citation
@inproceedings{sazzed2020cross,
  author = {Sazzed, Salim},
  booktitle = {Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)},
  pages = {50--60},
  title = {Cross-lingual sentiment classification in low-resource Bengali language},
  year = {2020},
}

BengaliSentimentAnalysis.v2

The dataset contains 2,854 negative and 7,238 positive reviews collected from Bengali dramas on YouTube and manually annotated. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/bengali_sentiment_analysis • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | f1 | ben | Reviews, Written | human-annotated | found
Citation
@inproceedings{sazzed2020cross,
  author = {Sazzed, Salim},
  booktitle = {Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)},
  pages = {50--60},
  title = {Cross-lingual sentiment classification in low-resource Bengali language},
  year = {2020},
}

BulgarianStoreReviewSentimentClassfication

Bulgarian online store review dataset for sentiment classification.

Dataset: artist/Bulgarian-Online-Store-Feedback-Text-Analysis • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | bul | Reviews, Written | human-annotated | found
Citation
@data{DVN/TXIK9P_2018,
  author = {Georgieva-Trifonova, Tsvetanka and Stefanova, Milena and Kalchev, Stefan},
  doi = {10.7910/DVN/TXIK9P},
  publisher = {Harvard Dataverse},
  title = {{Dataset for ``Customer Feedback Text Analysis for Online Stores Reviews in Bulgarian''}},
  url = {https://doi.org/10.7910/DVN/TXIK9P},
  version = {V1},
  year = {2018},
}

CBD

Polish tweets annotated for cyberbullying detection.

Dataset: PL-MTEB/cbd • License: bsd-3-clause

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | pol | Social, Written | human-annotated | found
Citation
@proceedings{ogr:kob:19:poleval,
  address = {Warsaw, Poland},
  editor = {Maciej Ogrodniczuk and Łukasz Kobyliński},
  isbn = {978-83-63159-28-3},
  publisher = {Institute of Computer Science, Polish Academy of Sciences},
  title = {{Proceedings of the PolEval 2019 Workshop}},
  url = {http://2019.poleval.pl/files/poleval2019.pdf},
  year = {2019},
}

CBD.v2

Polish tweets annotated for cyberbullying detection. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/cbd • License: bsd-3-clause

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | pol | Social, Written | human-annotated | found
Citation
@proceedings{ogr:kob:19:poleval,
  address = {Warsaw, Poland},
  editor = {Maciej Ogrodniczuk and Łukasz Kobyliński},
  isbn = {978-83-63159-28-3},
  publisher = {Institute of Computer Science, Polish Academy of Sciences},
  title = {{Proceedings of the PolEval 2019 Workshop}},
  url = {http://2019.poleval.pl/files/poleval2019.pdf},
  year = {2019},
}

CSFDCZMovieReviewSentimentClassification

The dataset contains 30k user reviews from csfd.cz in Czech.

Dataset: fewshot-goes-multilingual/cs_csfd-movie-reviews • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ces | Reviews, Written | derived | found
Citation
@misc{štefánik2023resources,
  archiveprefix = {arXiv},
  author = {Michal Štefánik and Marek Kadlčík and Piotr Gramacki and Petr Sojka},
  eprint = {2304.01922},
  primaryclass = {cs.CL},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

CSFDCZMovieReviewSentimentClassification.v2

The dataset contains 30k user reviews from csfd.cz in Czech. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/csfdcz_movie_review_sentiment • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | ces | Reviews, Written | derived | found
Citation
@misc{štefánik2023resources,
  archiveprefix = {arXiv},
  author = {Michal Štefánik and Marek Kadlčík and Piotr Gramacki and Petr Sojka},
  eprint = {2304.01922},
  primaryclass = {cs.CL},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

CSFDSKMovieReviewSentimentClassification

The dataset contains 30k user reviews from csfd.cz in Slovak.

Dataset: fewshot-goes-multilingual/sk_csfd-movie-reviews • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | slk | Reviews, Written | derived | found
Citation
@misc{štefánik2023resources,
  archiveprefix = {arXiv},
  author = {Michal Štefánik and Marek Kadlčík and Piotr Gramacki and Petr Sojka},
  eprint = {2304.01922},
  primaryclass = {cs.CL},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

CSFDSKMovieReviewSentimentClassification.v2

The dataset contains 30k user reviews from csfd.cz in Slovak. This version corrects errors found in the original data; for details, see the pull request.

Dataset: mteb/csfdsk_movie_review_sentiment • License: cc-by-sa-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | slk | Reviews, Written | derived | found
Citation
@misc{štefánik2023resources,
  archiveprefix = {arXiv},
  author = {Michal Štefánik and Marek Kadlčík and Piotr Gramacki and Petr Sojka},
  eprint = {2304.01922},
  primaryclass = {cs.CL},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

CUADAffiliateLicenseLicenseeLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if a clause describes a license grant to a licensee (incl. sublicensor) and the affiliates of such licensee/sublicensor.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADAffiliateLicenseLicensorLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause describes a license grant by affiliates of the licensor or that includes intellectual property of affiliates of the licensor.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADAntiAssignmentLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause requires consent or notice of a party if the contract is assigned to a third party.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADAuditRightsLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause gives a party the right to audit the books, records, or physical locations of the counterparty to ensure compliance with the contract.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADCapOnLiabilityLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a cap on liability upon the breach of a party's obligation. This includes a time limitation for the counterparty to bring claims or a maximum amount for recovery.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADChangeOfControlLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause gives one party the right to terminate, or requires consent or notice from the counterparty, if such party undergoes a change of control, such as a merger, stock sale, transfer of all or substantially all of its assets or business, or assignment by operation of law.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADCompetitiveRestrictionExceptionLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause mentions exceptions or carveouts to Non-Compete, Exclusivity and No-Solicit of Customers.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADCovenantNotToSueLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that a party is restricted from contesting the validity of the counterparty's ownership of intellectual property or otherwise bringing a claim against the counterparty for matters unrelated to the contract.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADEffectiveDateLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies the date upon which the agreement becomes effective.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADExclusivityLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies an exclusive dealing commitment with the counterparty. This includes a commitment to procure all 'requirements' of certain technology, goods, or services from one party; a prohibition on licensing or selling technology, goods, or services to third parties; or a prohibition on collaborating or working with other parties, whether during the contract or after the contract ends (or both).

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADExpirationDateLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies the date upon which the initial term expires.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADGoverningLawLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies which state/country’s law governs the contract.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADIPOwnershipAssignmentLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that intellectual property created by one party becomes the property of the counterparty, either per the terms of the contract or upon the occurrence of certain events.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADInsuranceLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause creates a requirement for insurance that must be maintained by one party for the benefit of the counterparty.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADIrrevocableOrPerpetualLicenseLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a license grant that is irrevocable or perpetual.

Dataset: nguha/legalbench • License: cc-by-4.0

Task category | Score | Languages | Domains | Annotations Creators | Sample Creation
text to category (t2c) | accuracy | eng | Legal, Written | expert-annotated | found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADJointIPOwnershipLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause provides for joint or shared ownership of intellectual property between the parties to the contract.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADLicenseGrantLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause contains a license granted by one party to its counterparty.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADLiquidatedDamagesLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause awards either party liquidated damages for breach or a fee upon the termination of a contract (termination fee).

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADMinimumCommitmentLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a minimum order size, or a minimum amount or number of units per time period, that one party must buy from the counterparty.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADMostFavoredNationLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that, should a third party get better terms on the licensing or sale of technology/goods/services described in the contract, the buyer of such technology/goods/services under the contract shall be entitled to those better terms.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNoSolicitOfCustomersLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause restricts a party from contracting or soliciting customers or partners of the counterparty, whether during the contract or after the contract ends (or both).

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNoSolicitOfEmployeesLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause restricts a party from soliciting or hiring the counterparty's employees and/or contractors, whether during the contract or after the contract ends (or both).

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNonCompeteLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause restricts the ability of a party to compete with the counterparty or operate in a certain geography or business or technology sector.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNonDisparagementLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party not to disparage the counterparty.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNonTransferableLicenseLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause limits the ability of a party to transfer the license being granted to a third party.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADNoticePeriodToTerminateRenewalLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a notice period required to terminate renewal.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADPostTerminationServicesLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause subjects a party to obligations after the termination or expiration of a contract, including any post-termination transition, payment, transfer of IP, wind-down, last-buy, or similar commitments.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADPriceRestrictionsLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause places a restriction on the ability of a party to raise or reduce prices of technology, goods, or services provided.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADRenewalTermLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a renewal term.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADRevenueProfitSharingLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADRofrRofoRofnLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause grants one party a right of first refusal, right of first offer, or right of first negotiation to purchase, license, market, or distribute equity interest, technology, assets, products, or services.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADSourceCodeEscrowLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause requires one party to deposit its source code into escrow with a third party, which can be released to the counterparty upon the occurrence of certain events (bankruptcy, insolvency, etc.).

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADTerminationForConvenienceLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that one party can terminate this contract without cause (solely by giving a notice and allowing a waiting period to expire).

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADThirdPartyBeneficiaryLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that there is a non-contracting party who is a beneficiary to some or all of the clauses in the contract and can therefore enforce its rights against a contracting party.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADUncappedLiabilityLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies that a party's liability is uncapped upon the breach of its obligations in the contract. This also includes uncapped liability for a particular type of breach, such as IP infringement or breach of a confidentiality obligation.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADUnlimitedAllYouCanEatLicenseLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause grants one party an “enterprise,” “all you can eat” or unlimited usage license.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADVolumeRestrictionLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a fee increase, consent requirement, or similar obligation if one party's use of the product/services exceeds a certain threshold.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CUADWarrantyDurationLegalBenchClassification

This task was constructed from the CUAD dataset. It consists of determining if the clause specifies a duration of any warranty against defects or errors in technology, products, or services provided under the contract.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{hendrycks2021cuad,
  author = {Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal = {arXiv preprint arXiv:2103.06268},
  title = {Cuad: An expert-annotated nlp dataset for legal contract review},
  year = {2021},
}

CanadaTaxCourtOutcomesLegalBenchClassification

The input is an excerpt of text from Tax Court of Canada decisions involving appeals of tax-related matters. The task is to classify whether the excerpt includes the outcome of the appeal and, if so, to specify whether the appeal was allowed or dismissed. Partial success (e.g. an appeal granted for one tax year but dismissed for another) counts as allowed, with the exception of costs orders, which are disregarded. Where the excerpt does not clearly articulate an outcome, the system should indicate 'other' as the outcome. Categorizing case outcomes is a common task that legal researchers perform in order to build datasets on the outcomes of legal processes for quantitative empirical legal research.

Dataset: nguha/legalbench • License: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
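
Unlike the binary CUAD tasks above, this one is three-way (allowed / dismissed / other), but it is still scored with plain accuracy. A quick sketch of a majority-class baseline against the raw LegalBench data; the config name canada_tax_court_outcomes follows LegalBench's snake_case task naming and, like the column names, is an assumption here:

# Sketch: majority-class accuracy baseline for the three-way outcome task.
# The config name and the "answer" column are assumptions based on
# LegalBench's usual layout; loading details can vary across `datasets`
# versions.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("nguha/legalbench", "canada_tax_court_outcomes", split="test")
labels = list(ds["answer"])  # expected values: allowed / dismissed / other

majority_label, majority_count = Counter(labels).most_common(1)[0]
print(f"majority-class ({majority_label}) accuracy: {majority_count / len(labels):.3f}")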

CataloniaTweetClassification

This dataset contains two corpora, in Spanish and Catalan, of Twitter messages annotated for automatic stance detection. The data was collected over 12 days during February and March of 2019 from tweets posted in Barcelona, and during September of 2018 from tweets posted in the town of Terrassa, Catalonia. Each corpus is annotated with three classes, AGAINST, FAVOR and NEUTRAL, which express the stance towards the target: the independence of Catalonia.

Dataset: community-datasets/catalonia_independence • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cat, spa Government, Social, Written expert-annotated created
Citation
@inproceedings{zotova-etal-2020-multilingual,
  author = {Zotova, Elena and Agerri, Rodrigo and Nu{\~n}ez, Manuel and Rigau, German},
  booktitle = {Proceedings of the Twelfth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
  isbn = {979-10-95546-34-4},
  month = may,
  pages = {1368--1375},
  publisher = {European Language Resources Association},
  title = {Multilingual Stance Detection in Tweets: The {C}atalonia Independence Corpus},
  year = {2020},
}
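
The two corpora ship as separate configurations of the dataset. A small sketch for inspecting the label space; the config names (catalan, spanish) and the LABEL column follow the dataset card as I recall it and should be treated as assumptions:

# Sketch: peeking at both stance corpora. Config and column names are
# assumptions based on the dataset card.
from datasets import load_dataset

for config in ("catalan", "spanish"):
    ds = load_dataset("community-datasets/catalonia_independence", config, split="train")
    print(config, len(ds), ds.features["LABEL"].names)  # AGAINST / FAVOR / NEUTRAL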

ContractNLIConfidentialityOfAgreementLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party shall not disclose the fact that the Agreement was agreed or negotiated.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}
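
The ContractNLI-derived tasks that follow are all binary judgements over a single NDA clause, so their raw form is a (clause text, Yes/No answer) pair. A sketch under the assumption that LegalBench exposes this task as the snake_case config contract_nli_confidentiality_of_agreement with text/answer columns:

# Sketch: the raw (clause, answer) pairs behind a ContractNLI LegalBench
# task. Config and column names are assumptions.
from datasets import load_dataset

ds = load_dataset(
    "nguha/legalbench",
    "contract_nli_confidentiality_of_agreement",
    split="test",
)
example = ds[0]
print(example["text"][:200], "->", example["answer"])  # clause excerpt -> Yes/No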

ContractNLIExplicitIdentificationLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that all Confidential Information shall be expressly identified by the Disclosing Party.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that Confidential Information may include verbally conveyed information.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLILimitedUseLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party shall not use any Confidential Information for any purpose other than the purposes stated in the Agreement.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLINoLicensingLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Agreement shall not grant the Receiving Party any right to Confidential Information.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLINoticeOnCompelledDisclosureLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party shall notify the Disclosing Party in case the Receiving Party is required by law, regulation or judicial process to disclose any Confidential Information.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIPermissibleAcquirementOfSimilarInformationLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party may acquire information similar to Confidential Information from a third party.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIPermissibleCopyLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party may create a copy of some Confidential Information in some circumstances.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIPermissibleDevelopmentOfSimilarInformationLegalBenchClassification

This task is a subset of ContractNLI, and consists of determining whether a clause from an NDA provides that the Receiving Party may independently develop information similar to Confidential Information.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIPermissiblePostAgreementPossessionLegalBenchClassification

This task is a subset of ContractNLI and consists of determining whether an NDA clause provides that the Receiving Party may retain some Confidential Information even after the return or destruction of Confidential Information.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLIReturnOfConfidentialInformationLegalBenchClassification

This task is a subset of ContractNLI and consists of determining whether an NDA clause provides that the Receiving Party shall destroy or return some Confidential Information upon the termination of Agreement.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLISharingWithEmployeesLegalBenchClassification

This task is a subset of ContractNLI and consists of determining whether an NDA clause provides that the Receiving Party may share some Confidential Information with some of the Receiving Party's employees.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLISharingWithThirdPartiesLegalBenchClassification

This task is a subset of ContractNLI and consists of determining whether an NDA clause provides that the Receiving Party may share some Confidential Information with some third parties (including consultants, agents and professional advisors).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

ContractNLISurvivalOfObligationsLegalBenchClassification

This task is a subset of ContractNLI and consists of determining whether an NDA clause provides that some obligations of Agreement may survive termination of Agreement.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{koreeda2021contractnli,
  author = {Koreeda, Yuta and Manning, Christopher D},
  journal = {arXiv preprint arXiv:2110.01799},
  title = {ContractNLI: A dataset for document-level natural language inference for contracts},
  year = {2021},
}

CorporateLobbyingLegalBenchClassification

The Corporate Lobbying task consists of determining whether a proposed Congressional bill may be relevant to a company based on a company's self-description in its SEC 10K filing.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
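
The LegalBench classification tasks on this page (ContractNLI*, CorporateLobbying, Diversity*, and others) all run through the same MTEB evaluation harness. Below is a minimal sketch, assuming a recent version of the mteb package and the sentence-transformers library; the exact API may differ between versions, and the model name is only an example.

import mteb
from sentence_transformers import SentenceTransformer

# Any text-embedding model can stand in here; all-MiniLM-L6-v2 is just a small, fast example.
model = SentenceTransformer("all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["CorporateLobbyingLegalBenchClassification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")  # reports accuracy, per the table above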

CyrillicTurkicLangClassification

A Cyrillic-script dataset of eight Turkic languages spoken in Russia and the former USSR

Dataset: tatiana-merz/cyrillic_turkic_langsLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy bak, chv, kaz, kir, krc, ... (9) Web, Written derived found
Citation
@inproceedings{goldhahn2012building,
  author = {Goldhahn, Dirk and Eckart, Thomas and Quasthoff, Uwe},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  title = {Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages},
  year = {2012},
}

CzechProductReviewSentimentClassification

User reviews of products on the Czech e-shop Mall.cz with 3 sentiment classes (positive, neutral, negative)

Dataset: fewshot-goes-multilingual/cs_mall-product-reviewsLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ces Reviews, Written derived found
Citation
@inproceedings{habernal-etal-2013-sentiment,
  address = {Atlanta, Georgia},
  author = {Habernal, Ivan  and
Pt{\'a}{\v{c}}ek, Tom{\'a}{\v{s}}  and
Steinberger, Josef},
  booktitle = {Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  editor = {Balahur, Alexandra  and
van der Goot, Erik  and
Montoyo, Andres},
  month = jun,
  pages = {65--74},
  publisher = {Association for Computational Linguistics},
  title = {Sentiment Analysis in {C}zech Social Media Using Supervised Machine Learning},
  url = {https://aclanthology.org/W13-1609},
  year = {2013},
}

CzechProductReviewSentimentClassification.v2

User reviews of products on the Czech e-shop Mall.cz with 3 sentiment classes (positive, neutral, negative). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/czech_product_review_sentimentLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ces Reviews, Written derived found
Citation
@inproceedings{habernal-etal-2013-sentiment,
  address = {Atlanta, Georgia},
  author = {Habernal, Ivan  and
Pt{\'a}{\v{c}}ek, Tom{\'a}{\v{s}}  and
Steinberger, Josef},
  booktitle = {Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  editor = {Balahur, Alexandra  and
van der Goot, Erik  and
Montoyo, Andres},
  month = jun,
  pages = {65--74},
  publisher = {Association for Computational Linguistics},
  title = {Sentiment Analysis in {C}zech Social Media Using Supervised Machine Learning},
  url = {https://aclanthology.org/W13-1609},
  year = {2013},
}

CzechSoMeSentimentClassification

User comments on Facebook

Dataset: fewshot-goes-multilingual/cs_facebook-commentsLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ces Reviews, Written derived found
Citation
@inproceedings{habernal-etal-2013-sentiment,
  address = {Atlanta, Georgia},
  author = {Habernal, Ivan  and
Pt{\'a}{\v{c}}ek, Tom{\'a}{\v{s}}  and
Steinberger, Josef},
  booktitle = {Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  editor = {Balahur, Alexandra  and
van der Goot, Erik  and
Montoyo, Andres},
  month = jun,
  pages = {65--74},
  publisher = {Association for Computational Linguistics},
  title = {Sentiment Analysis in {C}zech Social Media Using Supervised Machine Learning},
  url = {https://aclanthology.org/W13-1609},
  year = {2013},
}

CzechSoMeSentimentClassification.v2

User comments on Facebook. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/czech_so_me_sentimentLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ces Reviews, Written derived found
Citation
@inproceedings{habernal-etal-2013-sentiment,
  address = {Atlanta, Georgia},
  author = {Habernal, Ivan  and
Pt{\'a}{\v{c}}ek, Tom{\'a}{\v{s}}  and
Steinberger, Josef},
  booktitle = {Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis},
  editor = {Balahur, Alexandra  and
van der Goot, Erik  and
Montoyo, Andres},
  month = jun,
  pages = {65--74},
  publisher = {Association for Computational Linguistics},
  title = {Sentiment Analysis in {C}zech Social Media Using Supervised Machine Learning},
  url = {https://aclanthology.org/W13-1609},
  year = {2013},
}

CzechSubjectivityClassification

A Czech dataset for subjectivity classification.

Dataset: pauli31/czech-subjectivity-datasetLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ces Reviews, Written human-annotated found
Citation
@inproceedings{priban-steinberger-2022-czech,
  address = {Marseille, France},
  author = {P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
Steinberger, Josef},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  month = jun,
  pages = {1381--1391},
  publisher = {European Language Resources Association},
  title = {{C}zech Dataset for Cross-lingual Subjectivity Classification},
  url = {https://aclanthology.org/2022.lrec-1.148},
  year = {2022},
}

DBpediaClassification

DBpedia14 is a dataset of English texts from Wikipedia articles, categorized into 14 non-overlapping classes based on their DBpedia ontology.

Dataset: fancyzhx/dbpedia_14License: cc-by-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Encyclopaedic, Written derived found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  pages = {},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

DBpediaClassification.v2

DBpedia14 is a dataset of English texts from Wikipedia articles, categorized into 14 non-overlapping classes based on their DBpedia ontology. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/d_bpediaLicense: cc-by-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Encyclopaedic, Written derived found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  pages = {},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}
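
For a quick look at the underlying data, DBpedia14 can be pulled straight from the Hugging Face Hub. A small sketch using the datasets library, with field names (title, content, label) as given on the fancyzhx/dbpedia_14 dataset card:

from datasets import load_dataset

ds = load_dataset("fancyzhx/dbpedia_14", split="test")
sample = ds[0]
print(sample["title"])
print(sample["content"][:100])
# label is an integer in [0, 13]; int2str maps it back to the ontology class name
print(ds.features["label"].int2str(sample["label"]))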

DKHateClassification

Danish tweets annotated for hate speech, labeled as either offensive or not

Dataset: DDSC/dkhateLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Social, Written expert-annotated found
Citation
@inproceedings{sigurbergsson-derczynski-2020-offensive,
  abstract = {The presence of offensive language on social media platforms and the implications this poses is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most of the research has focused on solving the problem for the English language, while the problem is multilingual. We construct a Danish dataset DKhate containing user-generated comments from various social media platforms, and to our knowledge, the first of its kind, annotated for various types and target of offensive language. We develop four automatic classification systems, each designed to work for both the English and the Danish language. In the detection of offensive language in English, the best performing system achieves a macro averaged F1-score of 0.74, and the best performing system for Danish achieves a macro averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best performing system for English achieves a macro averaged F1-score of 0.62, while the best performing system for Danish achieves a macro averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best performing system for English achieves a macro averaged F1-score of 0.56, and the best performing system for Danish achieves a macro averaged F1-score of 0.63. Our work for both the English and the Danish language captures the type and targets of offensive language, and present automatic methods for detecting different kinds of offensive language such as hate speech and cyberbullying.},
  address = {Marseille, France},
  author = {Sigurbergsson, Gudbjartur Ingi  and
Derczynski, Leon},
  booktitle = {Proceedings of the Twelfth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta  and
B{\'e}chet, Fr{\'e}d{\'e}ric  and
Blache, Philippe  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Moreno, Asuncion  and
Odijk, Jan  and
Piperidis, Stelios},
  isbn = {979-10-95546-34-4},
  language = {English},
  month = may,
  pages = {3498--3508},
  publisher = {European Language Resources Association},
  title = {Offensive Language and Hate Speech Detection for {D}anish},
  url = {https://aclanthology.org/2020.lrec-1.430},
  year = {2020},
}

DKHateClassification.v2

Danish tweets annotated for hate speech, labeled as either offensive or not. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/dk_hateLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Social, Written expert-annotated found
Citation
@inproceedings{sigurbergsson-derczynski-2020-offensive,
  abstract = {The presence of offensive language on social media platforms and the implications this poses is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most of the research has focused on solving the problem for the English language, while the problem is multilingual. We construct a Danish dataset DKhate containing user-generated comments from various social media platforms, and to our knowledge, the first of its kind, annotated for various types and target of offensive language. We develop four automatic classification systems, each designed to work for both the English and the Danish language. In the detection of offensive language in English, the best performing system achieves a macro averaged F1-score of 0.74, and the best performing system for Danish achieves a macro averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best performing system for English achieves a macro averaged F1-score of 0.62, while the best performing system for Danish achieves a macro averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best performing system for English achieves a macro averaged F1-score of 0.56, and the best performing system for Danish achieves a macro averaged F1-score of 0.63. Our work for both the English and the Danish language captures the type and targets of offensive language, and present automatic methods for detecting different kinds of offensive language such as hate speech and cyberbullying.},
  address = {Marseille, France},
  author = {Sigurbergsson, Gudbjartur Ingi  and
Derczynski, Leon},
  booktitle = {Proceedings of the Twelfth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta  and
B{\'e}chet, Fr{\'e}d{\'e}ric  and
Blache, Philippe  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Moreno, Asuncion  and
Odijk, Jan  and
Piperidis, Stelios},
  isbn = {979-10-95546-34-4},
  language = {English},
  month = may,
  pages = {3498--3508},
  publisher = {European Language Resources Association},
  title = {Offensive Language and Hate Speech Detection for {D}anish},
  url = {https://aclanthology.org/2020.lrec-1.430},
  year = {2020},
}

DadoEvalCoarseClassification

The DaDoEval dataset is a curated collection of 2,759 documents authored by Alcide De Gasperi, spanning the period from 1901 to 1954. Each document in the dataset is manually tagged with its date of issue.

Dataset: MattiaSangermano/DaDoEvalLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ita Written derived found
Citation
@inproceedings{menini2020dadoeval,
  author = {Menini, Stefano and Moretti, Giovanni and Sprugnoli, Rachele and Tonelli, Sara and others},
  booktitle = {Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)},
  organization = {Accademia University Press},
  pages = {391--397},
  title = {DaDoEval@ EVALITA 2020: Same-genre and cross-genre dating of historical documents},
  year = {2020},
}

DalajClassification

A Swedish dataset for linguistic acceptability. Available as a part of Superlim.

Dataset: AI-Sweden/SuperLimLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Non-fiction, Written expert-annotated created
Citation
@misc{2105.06681,
  author = {Elena Volodina and Yousuf Ali Mohammed and Julia Klezl},
  eprint = {arXiv:2105.06681},
  title = {DaLAJ - a dataset for linguistic acceptability judgments for Swedish: Format, baseline, sharing},
  year = {2021},
}

DalajClassification.v2

A Swedish dataset for linguistic acceptability. Available as a part of Superlim. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/dalajLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Non-fiction, Written expert-annotated created
Citation
@misc{2105.06681,
  author = {Elena Volodina and Yousuf Ali Mohammed and Julia Klezl},
  eprint = {arXiv:2105.06681},
  title = {DaLAJ - a dataset for linguistic acceptability judgments for Swedish: Format, baseline, sharing},
  year = {2021},
}

DanishPoliticalCommentsClassification

A dataset of Danish political comments rated for sentiment

Dataset: community-datasets/danish_political_commentsLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Social, Written derived found
Citation
@techreport{SAMsentiment,
  author = {Mads Guldborg Kjeldgaard Kongsbak and Steffan Eybye Christensen and Lucas Høyberg Puvis~de~Chavannes and Peter Due Jensen},
  institution = {IT University of Copenhagen},
  title = {Sentiment Analysis Multitool, SAM},
  year = {2019},
}

DanishPoliticalCommentsClassification.v2

A dataset of Danish political comments rated for sentiment. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/danish_political_commentsLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Social, Written derived found
Citation
@techreport{SAMsentiment,
  author = {Mads Guldborg Kjeldgaard Kongsbak and Steffan Eybye Christensen and Lucas Høyberg Puvis~de~Chavannes and Peter Due Jensen},
  institution = {IT University of Copenhagen},
  title = {Sentiment Analysis Multitool, SAM},
  year = {2019},
}

Ddisco

A Danish discourse dataset annotated for coherence and source (Wikipedia or Reddit)

Dataset: DDSC/ddiscoLicense: cc-by-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Non-fiction, Social, Written expert-annotated found
Citation
@inproceedings{flansmose-mikkelsen-etal-2022-ddisco,
  abstract = {To date, there has been no resource for studying discourse coherence on real-world Danish texts. Discourse coherence has mostly been approached with the assumption that incoherent texts can be represented by coherent texts in which sentences have been shuffled. However, incoherent real-world texts rarely resemble that. We thus present DDisCo, a dataset including text from the Danish Wikipedia and Reddit annotated for discourse coherence. We choose to annotate real-world texts instead of relying on artificially incoherent text for training and testing models. Then, we evaluate the performance of several methods, including neural networks, on the dataset.},
  address = {Marseille, France},
  author = {Flansmose Mikkelsen, Linea  and
Kinch, Oliver  and
Jess Pedersen, Anders  and
Lacroix, Oph{\'e}lie},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta  and
B{\'e}chet, Fr{\'e}d{\'e}ric  and
Blache, Philippe  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Odijk, Jan  and
Piperidis, Stelios},
  month = jun,
  pages = {2440--2445},
  publisher = {European Language Resources Association},
  title = {{DD}is{C}o: A Discourse Coherence Dataset for {D}anish},
  url = {https://aclanthology.org/2022.lrec-1.260},
  year = {2022},
}

Ddisco.v2

A Danish discourse dataset annotated for coherence and source (Wikipedia or Reddit). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/ddisco_cohesionLicense: cc-by-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan Non-fiction, Social, Written expert-annotated found
Citation
@inproceedings{flansmose-mikkelsen-etal-2022-ddisco,
  abstract = {To date, there has been no resource for studying discourse coherence on real-world Danish texts. Discourse coherence has mostly been approached with the assumption that incoherent texts can be represented by coherent texts in which sentences have been shuffled. However, incoherent real-world texts rarely resemble that. We thus present DDisCo, a dataset including text from the Danish Wikipedia and Reddit annotated for discourse coherence. We choose to annotate real-world texts instead of relying on artificially incoherent text for training and testing models. Then, we evaluate the performance of several methods, including neural networks, on the dataset.},
  address = {Marseille, France},
  author = {Flansmose Mikkelsen, Linea  and
Kinch, Oliver  and
Jess Pedersen, Anders  and
Lacroix, Oph{\'e}lie},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta  and
B{\'e}chet, Fr{\'e}d{\'e}ric  and
Blache, Philippe  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Odijk, Jan  and
Piperidis, Stelios},
  month = jun,
  pages = {2440--2445},
  publisher = {European Language Resources Association},
  title = {{DD}is{C}o: A Discourse Coherence Dataset for {D}anish},
  url = {https://aclanthology.org/2022.lrec-1.260},
  year = {2022},
}

DeepSentiPers

Persian Sentiment Analysis Dataset

Dataset: PartAI/DeepSentiPersLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Reviews derived found
Citation

DeepSentiPers.v2

Persian Sentiment Analysis Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/deep_senti_persLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Reviews derived found
Citation

DefinitionClassificationLegalBenchClassification

This task consists of determining whether or not a sentence from a Supreme Court opinion offers a definition of a term.

Dataset: nguha/legalbenchLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

DigikalamagClassification

A total of 8,515 articles scraped from Digikala Online Magazine. This dataset includes seven different classes.

Dataset: PNLPhub/DigiMagLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Web derived found
Citation

Diversity1LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 1).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

Diversity2LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 2).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

Diversity3LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 3).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

Diversity4LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 4).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

Diversity5LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 5).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

Diversity6LegalBenchClassification

Given a set of facts about the citizenships of plaintiffs and defendants and the amounts associated with claims, determine if the criteria for diversity jurisdiction have been met (variant 6).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
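
The six Diversity* variants above all probe the same underlying rule, 28 U.S.C. § 1332: complete diversity of citizenship (no plaintiff shares a citizenship with any defendant) plus an amount in controversy exceeding $75,000. A simplified sketch of that rule as code, ignoring corner cases such as corporate dual citizenship and claim aggregation:

def has_diversity_jurisdiction(plaintiffs, defendants, amount):
    # Complete diversity: no citizenship may appear on both sides.
    complete_diversity = not set(plaintiffs) & set(defendants)
    # The amount in controversy must exceed $75,000.
    return complete_diversity and amount > 75_000

print(has_diversity_jurisdiction({"California"}, {"Texas"}, 80_000))       # True
print(has_diversity_jurisdiction({"California"}, {"California"}, 80_000))  # False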

DutchBookReviewSentimentClassification

A Dutch book review dataset for sentiment classification.

Dataset: mteb/DutchBookReviewSentimentClassificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nld Reviews, Written derived found
Citation
@article{DBLP:journals/corr/abs-1910-00896,
  archiveprefix = {arXiv},
  author = {van der Burgh, Benjamin and
Verberne, Suzan},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/journals/corr/abs-1910-00896.bib},
  eprint = {1910.00896},
  journal = {CoRR},
  timestamp = {Fri, 04 Oct 2019 12:28:06 +0200},
  title = {The merits of Universal Language Model Fine-tuning for Small Datasets
- a case with Dutch book reviews},
  url = {http://arxiv.org/abs/1910.00896},
  volume = {abs/1910.00896},
  year = {2019},
}

DutchBookReviewSentimentClassification.v2

A Dutch book review dataset for sentiment classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/dutch_book_review_sentimentLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nld Reviews, Written derived found
Citation
@article{DBLP:journals/corr/abs-1910-00896,
  archiveprefix = {arXiv},
  author = {van der Burgh, Benjamin and
Verberne, Suzan},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/journals/corr/abs-1910-00896.bib},
  eprint = {1910.00896},
  journal = {CoRR},
  timestamp = {Fri, 04 Oct 2019 12:28:06 +0200},
  title = {The merits of Universal Language Model Fine-tuning for Small Datasets
- a case with Dutch book reviews},
  url = {http://arxiv.org/abs/1910.00896},
  volume = {abs/1910.00896},
  year = {2019},
}

EmotionClassification

Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise.

Dataset: mteb/emotionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@inproceedings{saravia-etal-2018-carer,
  abstract = {Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.},
  address = {Brussels, Belgium},
  author = {Saravia, Elvis  and
Liu, Hsien-Chi Toby  and
Huang, Yen-Hao  and
Wu, Junlin  and
Chen, Yi-Shin},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  doi = {10.18653/v1/D18-1404},
  editor = {Riloff, Ellen  and
Chiang, David  and
Hockenmaier, Julia  and
Tsujii, Jun{'}ichi},
  month = oct # {-} # nov,
  pages = {3687--3697},
  publisher = {Association for Computational Linguistics},
  title = {{CARER}: Contextualized Affect Representations for Emotion Recognition},
  url = {https://aclanthology.org/D18-1404},
  year = {2018},
}

EmotionClassification.v2

Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/emotionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@inproceedings{saravia-etal-2018-carer,
  abstract = {Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.},
  address = {Brussels, Belgium},
  author = {Saravia, Elvis  and
Liu, Hsien-Chi Toby  and
Huang, Yen-Hao  and
Wu, Junlin  and
Chen, Yi-Shin},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  doi = {10.18653/v1/D18-1404},
  editor = {Riloff, Ellen  and
Chiang, David  and
Hockenmaier, Julia  and
Tsujii, Jun{'}ichi},
  month = oct # {-} # nov,
  pages = {3687--3697},
  publisher = {Association for Computational Linguistics},
  title = {{CARER}: Contextualized Affect Representations for Emotion Recognition},
  url = {https://aclanthology.org/D18-1404},
  year = {2018},
}

EmotionVNClassification

Emotion is a Vietnamese dataset translated from English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. The VN-MTEB (Vietnamese Massive Text Embedding Benchmark) samples were created from the English originals by a new automated system that:
- uses large language models (LLMs), specifically Cohere's Aya model, for translation;
- applies advanced embedding models to filter the translations;
- uses an LLM-as-a-judge to score sample quality against multiple criteria.

Dataset: GreenNode/emotion-vnLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Social, Written derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}
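
The three-stage pipeline described above (LLM translation, embedding-based filtering, LLM-as-a-judge scoring) can be summarized schematically. Everything in the sketch below is a hypothetical stand-in: translate, embed, and judge are placeholders for the actual models, and the thresholds are illustrative rather than taken from the paper.

import numpy as np

def cosine(u, v):
    u, v = np.asarray(u), np.asarray(v)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def translate_and_filter(samples, translate, embed, judge,
                         sim_threshold=0.8, min_score=4):
    # translate, embed, and judge are caller-supplied callables (hypothetical).
    kept = []
    for src in samples:
        hyp = translate(src)                                # stage 1: LLM translation
        if cosine(embed(src), embed(hyp)) < sim_threshold:  # stage 2: embedding filter
            continue
        if judge(src, hyp) >= min_score:                    # stage 3: LLM-as-a-judge
            kept.append(hyp)
    return kept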

EstonianValenceClassification

Dataset containing annotated Estonian news data from the Postimees and Õhtuleht newspapers.

Dataset: kardosdrur/estonian-valenceLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy est News, Written human-annotated found
Citation
@article{Pajupuu2023,
  author = {Hille Pajupuu and Jaan Pajupuu and Rene Altrov and Kairi Tamuri},
  doi = {10.6084/m9.figshare.24517054.v1},
  month = {11},
  title = {{Estonian Valence Corpus  / Eesti valentsikorpus}},
  url = {https://figshare.com/articles/dataset/Estonian_Valence_Corpus_Eesti_valentsikorpus/24517054},
  year = {2023},
}

EstonianValenceClassification.v2

Dataset containing annotated Estonian news data from the Postimees and Õhtuleht newspapers. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/estonian_valenceLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy est News, Written human-annotated found
Citation
@article{Pajupuu2023,
  author = {Hille Pajupuu and Jaan Pajupuu and Rene Altrov and Kairi Tamuri},
  doi = {10.6084/m9.figshare.24517054.v1},
  month = {11},
  title = {{Estonian Valence Corpus  / Eesti valentsikorpus}},
  url = {https://figshare.com/articles/dataset/Estonian_Valence_Corpus_Eesti_valentsikorpus/24517054},
  year = {2023},
}

FilipinoHateSpeechClassification

Filipino Twitter dataset for hate speech classification.

Dataset: mteb/FilipinoHateSpeechClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fil Social, Written human-annotated found
Citation
@article{Cabasag-2019-hate-speech,
  author = {Cabasag, Neil Vicente and Chan, Vicente Raphael and Lim, Sean Christian and Gonzales, Mark Edward and Cheng, Charibeth},
  journal = {Philippine Computing Journal},
  month = {August},
  number = {1},
  title = {Hate speech in Philippine election-related tweets: Automatic detection and classification using natural language processing.},
  volume = {XIV},
  year = {2019},
}

FilipinoHateSpeechClassification.v2

Filipino Twitter dataset for hate speech classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/filipino_hate_speechLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fil Social, Written human-annotated found
Citation
@article{Cabasag-2019-hate-speech,
  author = {Cabasag, Neil Vicente and Chan, Vicente Raphael and Lim, Sean Christian and Gonzales, Mark Edward and Cheng, Charibeth},
  journal = {Philippine Computing Journal},
  month = {August},
  number = {1},
  title = {Hate speech in Philippine election-related tweets: Automatic detection and classification using natural language processing.},
  volume = {XIV},
  year = {2019},
}

FilipinoShopeeReviewsClassification

The Shopee reviews tl 15 dataset is constructed by randomly taking 2,100 training samples and 450 validation and 450 test samples for each review star from 1 to 5. In total, there are 10,500 training samples and 2,250 samples each for validation and testing.

Dataset: scaredmeow/shopee-reviews-tl-starsLicense: mpl-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fil Social, Written human-annotated found
Citation
@article{riegoenhancement,
  author = {Riego, Neil Christian R. and Villarba, Danny Bell and Sison, Ariel Antwaun Rolando C. and Pineda, Fernandez C. and Lagunzad, Herminiño C.},
  issue = {08},
  journal = {United International Journal for Research & Technology},
  pages = {72--82},
  title = {Enhancement to Low-Resource Text Classification via Sequential Transfer Learning},
  volume = {04},
}

FinToxicityClassification

This dataset is a DeepL-based machine-translated version of the Jigsaw toxicity dataset for Finnish. The original dataset comes from a Kaggle competition (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data) and poses a multi-label text classification problem with the labels identity_attack, insult, obscene, severe_toxicity, threat, and toxicity. Here it is adapted for toxicity classification, the most represented class.

Dataset: TurkuNLP/jigsaw_toxicity_pred_fiLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 fin News, Written derived machine-translated
Citation
@inproceedings{eskelinen-etal-2023-toxicity,
  author = {Eskelinen, Anni  and
Silvala, Laura  and
Ginter, Filip  and
Pyysalo, Sampo  and
Laippala, Veronika},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  title = {Toxicity Detection in {F}innish Using Machine Translation},
  year = {2023},
}

FinToxicityClassification.v2

This dataset is a DeepL-based machine-translated version of the Jigsaw toxicity dataset for Finnish. The original dataset comes from a Kaggle competition (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data) and poses a multi-label text classification problem with the labels identity_attack, insult, obscene, severe_toxicity, threat, and toxicity. Here it is adapted for toxicity classification, the most represented class.

This version corrects errors found in the original data. For details, see the pull request: https://github.com/embeddings-benchmark/mteb/pull/2900

Dataset: mteb/fin_toxicityLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 fin News, Written derived machine-translated
Citation
@inproceedings{eskelinen-etal-2023-toxicity,
  author = {Eskelinen, Anni  and
Silvala, Laura  and
Ginter, Filip  and
Pyysalo, Sampo  and
Laippala, Veronika},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  title = {Toxicity Detection in {F}innish Using Machine Translation},
  year = {2023},
}
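
Unlike most entries on this page, FinToxicity reports f1 rather than accuracy. On a class-imbalanced problem like toxicity detection the two can diverge sharply; a small illustration using scikit-learn:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 90 + [1] * 10   # 10% of comments are toxic
y_pred = [0] * 100             # a degenerate model that never predicts "toxic"
print(accuracy_score(y_true, y_pred))             # 0.9, which looks strong
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0, which exposes the failure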

FinancialPhrasebankClassification

Polar sentiment dataset of sentences from financial news, categorized by sentiment into positive, negative, or neutral.

Dataset: takala/financial_phrasebankLicense: cc-by-nc-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Financial, News, Written expert-annotated found
Citation
@article{Malo2014GoodDO,
  author = {P. Malo and A. Sinha and P. Korhonen and J. Wallenius and P. Takala},
  journal = {Journal of the Association for Information Science and Technology},
  title = {Good debt or bad debt: Detecting semantic orientations in economic texts},
  volume = {65},
  year = {2014},
}

FinancialPhrasebankClassification.v2

Polar sentiment dataset of sentences from financial news, categorized by sentiment into positive, negative, or neutral. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/financial_phrasebankLicense: cc-by-nc-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Financial, News, Written expert-annotated found
Citation
@article{Malo2014GoodDO,
  author = {P. Malo and A. Sinha and P. Korhonen and J. Wallenius and P. Takala},
  journal = {Journal of the Association for Information Science and Technology},
  title = {Good debt or bad debt: Detecting semantic orientations in economic texts},
  volume = {65},
  year = {2014},
}

FrenchBookReviews

A French book review dataset containing a large number of reader reviews of French books. Each review is paired with a rating that ranges from 0.5 to 5 (in 0.5 increments).

Dataset: Abirate/french_book_reviewsLicense: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fra Reviews, Written derived found
Citation

FrenchBookReviews.v2

A French book review dataset containing a large number of reader reviews of French books. Each review is paired with a rating that ranges from 0.5 to 5 (in 0.5 increments). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/french_book_reviewsLicense: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fra Reviews, Written derived found
Citation
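
Since the raw ratings are half-star values rather than class ids, a classification task over this data implies a discrete label mapping. A hypothetical sketch of one such bucketing (illustrative only, not the dataset's official mapping):

```python
# Hypothetical: map a half-star rating (0.5, 1.0, ..., 5.0) to a class id (0..9).
# Half-star values are exactly representable as floats, so equality checks are safe.
def rating_to_label(rating: float) -> int:
    valid = {k * 0.5 for k in range(1, 11)}
    if rating not in valid:
        raise ValueError(f"unexpected rating: {rating}")
    return int(rating * 2) - 1

assert rating_to_label(0.5) == 0
assert rating_to_label(5.0) == 9
```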

FrenkEnClassification

English subset of the FRENK dataset

Dataset: classla/FRENK-hate-enLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FrenkEnClassification.v2

English subset of the FRENK dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/frenk_enLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FrenkHrClassification

Croatian subset of the FRENK dataset

Dataset: classla/FRENK-hate-hrLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy hrv Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FrenkHrClassification.v2

Croatian subset of the FRENK dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/frenk_hrLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy hrv Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FrenkSlClassification

Slovenian subset of the FRENK dataset. Also available on HuggingFace dataset hub: English subset, Croatian subset.

Dataset: classla/FRENK-hate-slLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slv Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FrenkSlClassification.v2

Slovenian subset of the FRENK dataset. Also available on HuggingFace dataset hub: English subset, Croatian subset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/frenk_slLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slv Social, Written derived found
Citation
@misc{ljubešić2019frenk,
  archiveprefix = {arXiv},
  author = {Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
  eprint = {1906.02045},
  primaryclass = {cs.CL},
  title = {The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English},
  url = {https://arxiv.org/abs/1906.02045},
  year = {2019},
}

FunctionOfDecisionSectionLegalBenchClassification

The task is to classify a paragraph extracted from a written court decision into one of seven possible categories:

1. Facts - The paragraph describes the factual background that led up to the present lawsuit.
2. Procedural History - The paragraph describes the course of litigation that led to the current proceeding before the court.
3. Issue - The paragraph describes the legal or factual issue that must be resolved by the court.
4. Rule - The paragraph describes a rule of law relevant to resolving the issue.
5. Analysis - The paragraph analyzes the legal issue by applying the relevant legal principles to the facts of the present dispute.
6. Conclusion - The paragraph presents a conclusion of the court.
7. Decree - The paragraph constitutes a decree resolving the dispute.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
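
As an illustration of embedding-based classification over these seven section types, a paragraph can be matched to the nearest label by cosine similarity. This is a sketch only; MTEB classification tasks typically fit a logistic-regression probe on the embeddings rather than doing nearest-label matching, and the model name is an assumption:

```python
# Sketch: zero-shot assignment of a decision paragraph to one of the seven
# section types by cosine similarity to the label names.
from sentence_transformers import SentenceTransformer, util

LABELS = ["Facts", "Procedural History", "Issue", "Rule",
          "Analysis", "Conclusion", "Decree"]

model = SentenceTransformer("all-MiniLM-L6-v2")
label_emb = model.encode(LABELS, convert_to_tensor=True)

paragraph = "The plaintiff filed suit in the district court after ..."
para_emb = model.encode(paragraph, convert_to_tensor=True)

scores = util.cos_sim(para_emb, label_emb)[0]  # shape: (7,)
print(LABELS[int(scores.argmax())])
```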

GeoreviewClassification

Review classification (5-point scale) based on the Yandex Georeview dataset

Dataset: ai-forever/georeview-classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Reviews, Written derived found

GeoreviewClassification.v2

Review classification (5-point scale) based on the Yandex Georeview dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/georeviewLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Reviews, Written derived found

GeorgianSentimentClassification

Georgian Sentiment Dataset

Dataset: asparius/Georgian-SentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kat Reviews, Written derived found
Citation
@inproceedings{stefanovitch-etal-2022-resources,
  abstract = {This paper presents, to the best of our knowledge, the first ever publicly available annotated dataset for sentiment classification and semantic polarity dictionary for Georgian. The characteristics of these resources and the process of their creation are described in detail. The results of various experiments on the performance of both lexicon- and machine learning-based models for Georgian sentiment classification are also reported. Both 3-label (positive, neutral, negative) and 4-label settings (same labels + mixed) are considered. The machine learning models explored include, i.a., logistic regression, SVMs, and transformed-based models. We also explore transfer learning- and translation-based (to a well-supported language) approaches. The obtained results for Georgian are on par with the state-of-the-art results in sentiment classification for well studied languages when using training data of comparable size.},
  address = {Marseille, France},
  author = {Stefanovitch, Nicolas  and
Piskorski, Jakub  and
Kharazi, Sopho},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  editor = {Calzolari, Nicoletta  and
B{\'e}chet, Fr{\'e}d{\'e}ric  and
Blache, Philippe  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Odijk, Jan  and
Piperidis, Stelios},
  month = jun,
  pages = {1613--1621},
  publisher = {European Language Resources Association},
  title = {Resources and Experiments on Sentiment Classification for {G}eorgian},
  url = {https://aclanthology.org/2022.lrec-1.173},
  year = {2022},
}

GermanPoliticiansTwitterSentimentClassification

GermanPoliticiansTwitterSentiment is a dataset of German tweets labeled with their sentiment (3 classes).

Dataset: Alienmaster/german_politicians_twitter_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu Government, Social, Written human-annotated found
Citation
@inproceedings{schmidt-etal-2022-sentiment,
  address = {Potsdam, Germany},
  author = {Schmidt, Thomas  and
Fehle, Jakob  and
Weissenbacher, Maximilian  and
Richter, Jonathan  and
Gottschalk, Philipp  and
Wolff, Christian},
  booktitle = {Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)},
  editor = {Schaefer, Robin  and
Bai, Xiaoyu  and
Stede, Manfred  and
Zesch, Torsten},
  month = {12--15 } # sep,
  pages = {74--87},
  publisher = {KONVENS 2022 Organizers},
  title = {Sentiment Analysis on {T}witter for the Major {G}erman Parties during the 2021 {G}erman Federal Election},
  url = {https://aclanthology.org/2022.konvens-1.9},
  year = {2022},
}

GermanPoliticiansTwitterSentimentClassification.v2

GermanPoliticiansTwitterSentiment is a dataset of German tweets labeled with their sentiment (3 classes). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/german_politicians_twitter_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu Government, Social, Written human-annotated found
Citation
@inproceedings{schmidt-etal-2022-sentiment,
  address = {Potsdam, Germany},
  author = {Schmidt, Thomas  and
Fehle, Jakob  and
Weissenbacher, Maximilian  and
Richter, Jonathan  and
Gottschalk, Philipp  and
Wolff, Christian},
  booktitle = {Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)},
  editor = {Schaefer, Robin  and
Bai, Xiaoyu  and
Stede, Manfred  and
Zesch, Torsten},
  month = {12--15 } # sep,
  pages = {74--87},
  publisher = {KONVENS 2022 Organizers},
  title = {Sentiment Analysis on {T}witter for the Major {G}erman Parties during the 2021 {G}erman Federal Election},
  url = {https://aclanthology.org/2022.konvens-1.9},
  year = {2022},
}

GreekLegalCodeClassification

Greek Legal Code Dataset for Classification. (subset = chapter)

Dataset: AI-team-UoA/greek_legal_codeLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ell Legal, Written human-annotated found
Citation
@inproceedings{papaloukas-etal-2021-glc,
  address = {Punta Cana, Dominican Republic},
  author = {Papaloukas, Christos and Chalkidis, Ilias and Athinaios, Konstantinos and Pantazi, Despina-Athanasia and Koubarakis, Manolis},
  booktitle = {Proceedings of the Natural Legal Language Processing Workshop 2021},
  doi = {10.48550/arXiv.2109.15298},
  pages = {63--75},
  publisher = {Association for Computational Linguistics},
  title = {Multi-granular Legal Topic Classification on Greek Legislation},
  url = {https://arxiv.org/abs/2109.15298},
  year = {2021},
}
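
The "(subset = chapter)" note refers to a dataset configuration. A sketch of loading it with the datasets library; the config names ("volume", "chapter", "subject") and the field names follow the original GLC release and should be treated as assumptions:

```python
# Sketch: load the chapter-level subset of Greek Legal Code. Config and field
# names are assumptions from the original release; a loading script may need
# trust_remote_code=True on recent versions of `datasets`.
from datasets import load_dataset

glc = load_dataset("AI-team-UoA/greek_legal_code", "chapter", split="test")
example = glc[0]
print(example["text"][:200], example["label"])
```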

GujaratiNewsClassification

A Gujarati dataset for 3-class classification of Gujarati news articles

Dataset: mlexplorer008/gujarati_news_classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy guj News, Written derived found

GujaratiNewsClassification.v2

A Gujarati dataset for 3-class classification of Gujarati news articles. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/gujarati_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy guj News, Written derived found

HateSpeechPortugueseClassification

HateSpeechPortugueseClassification is a dataset of Portuguese tweets labeled with their sentiment (2 classes).

Dataset: hate-speech-portuguese/hate_speech_portugueseLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy por Social, Written expert-annotated found
Citation
@inproceedings{fortuna-etal-2019-hierarchically,
  address = {Florence, Italy},
  author = {Fortuna, Paula  and
Rocha da Silva, Jo{\~a}o  and
Soler-Company, Juan  and
Wanner, Leo  and
Nunes, S{\'e}rgio},
  booktitle = {Proceedings of the Third Workshop on Abusive Language Online},
  doi = {10.18653/v1/W19-3510},
  editor = {Roberts, Sarah T.  and
Tetreault, Joel  and
Prabhakaran, Vinodkumar  and
Waseem, Zeerak},
  month = aug,
  pages = {94--104},
  publisher = {Association for Computational Linguistics},
  title = {A Hierarchically-Labeled {P}ortuguese Hate Speech Dataset},
  url = {https://aclanthology.org/W19-3510},
  year = {2019},
}

HeadlineClassification

Headline rubric classification based on the ParaPhraser Plus dataset.

Dataset: ai-forever/headline-classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus News, Written derived found
Citation
@inproceedings{gudkov-etal-2020-automatically,
  abstract = {The article is focused on automatic development and ranking of a large corpus for Russian paraphrase generation which proves to be the first corpus of such type in Russian computational linguistics. Existing manually annotated paraphrase datasets for Russian are limited to small-sized ParaPhraser corpus and ParaPlag which are suitable for a set of NLP tasks, such as paraphrase and plagiarism detection, sentence similarity and relatedness estimation, etc. Due to size restrictions, these datasets can hardly be applied in end-to-end text generation solutions. Meanwhile, paraphrase generation requires a large amount of training data. In our study we propose a solution to the problem: we collect, rank and evaluate a new publicly available headline paraphrase corpus (ParaPhraser Plus), and then perform text generation experiments with manual evaluation on automatically ranked corpora using the Universal Transformer architecture.},
  address = {Online},
  author = {Gudkov, Vadim  and
Mitrofanova, Olga  and
Filippskikh, Elizaveta},
  booktitle = {Proceedings of the Fourth Workshop on Neural Generation and Translation},
  doi = {10.18653/v1/2020.ngt-1.6},
  editor = {Birch, Alexandra  and
Finch, Andrew  and
Hayashi, Hiroaki  and
Heafield, Kenneth  and
Junczys-Dowmunt, Marcin  and
Konstas, Ioannis  and
Li, Xian  and
Neubig, Graham  and
Oda, Yusuke},
  month = jul,
  pages = {54--59},
  publisher = {Association for Computational Linguistics},
  title = {Automatically Ranked {R}ussian Paraphrase Corpus for Text Generation},
  url = {https://aclanthology.org/2020.ngt-1.6},
  year = {2020},
}

HeadlineClassification.v2

Headline rubric classification based on the ParaPhraser Plus dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/headlineLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus News, Written derived found
Citation
@inproceedings{gudkov-etal-2020-automatically,
  abstract = {The article is focused on automatic development and ranking of a large corpus for Russian paraphrase generation which proves to be the first corpus of such type in Russian computational linguistics. Existing manually annotated paraphrase datasets for Russian are limited to small-sized ParaPhraser corpus and ParaPlag which are suitable for a set of NLP tasks, such as paraphrase and plagiarism detection, sentence similarity and relatedness estimation, etc. Due to size restrictions, these datasets can hardly be applied in end-to-end text generation solutions. Meanwhile, paraphrase generation requires a large amount of training data. In our study we propose a solution to the problem: we collect, rank and evaluate a new publicly available headline paraphrase corpus (ParaPhraser Plus), and then perform text generation experiments with manual evaluation on automatically ranked corpora using the Universal Transformer architecture.},
  address = {Online},
  author = {Gudkov, Vadim  and
Mitrofanova, Olga  and
Filippskikh, Elizaveta},
  booktitle = {Proceedings of the Fourth Workshop on Neural Generation and Translation},
  doi = {10.18653/v1/2020.ngt-1.6},
  editor = {Birch, Alexandra  and
Finch, Andrew  and
Hayashi, Hiroaki  and
Heafield, Kenneth  and
Junczys-Dowmunt, Marcin  and
Konstas, Ioannis  and
Li, Xian  and
Neubig, Graham  and
Oda, Yusuke},
  month = jul,
  pages = {54--59},
  publisher = {Association for Computational Linguistics},
  title = {Automatically Ranked {R}ussian Paraphrase Corpus for Text Generation},
  url = {https://aclanthology.org/2020.ngt-1.6},
  year = {2020},
}

HebrewSentimentAnalysis

HebrewSentiment is a dataset consisting of 12,804 user comments on posts from the official Facebook page of Israel’s president, Mr. Reuven Rivlin. In October 2015, the authors used the open software application Netvizz (Rieder, 2013) to scrape all comments on the president’s posts from June to August 2014, the first three months of Rivlin’s presidency. While the president’s posts aimed at reconciling tensions and called for tolerance and empathy, the sentiment expressed in the comments was polarized between citizens who warmly thanked the president and citizens who fiercely criticized his policy.

Dataset: omilab/hebrew_sentimentLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy heb Reviews, Written expert-annotated found
Citation
@inproceedings{amram-etal-2018-representations,
  address = {Santa Fe, New Mexico, USA},
  author = {Amram, Adam and Ben David, Anat and Tsarfaty, Reut},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  month = aug,
  pages = {2242--2252},
  publisher = {Association for Computational Linguistics},
  title = {Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from {M}odern {H}ebrew},
  url = {https://www.aclweb.org/anthology/C18-1190},
  year = {2018},
}

HebrewSentimentAnalysis.v2

HebrewSentiment is a dataset consisting of 12,804 user comments on posts from the official Facebook page of Israel’s president, Mr. Reuven Rivlin. In October 2015, the authors used the open software application Netvizz (Rieder, 2013) to scrape all comments on the president’s posts from June to August 2014, the first three months of Rivlin’s presidency. While the president’s posts aimed at reconciling tensions and called for tolerance and empathy, the sentiment expressed in the comments was polarized between citizens who warmly thanked the president and citizens who fiercely criticized his policy. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/hebrew_sentiment_analysisLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy heb Reviews, Written expert-annotated found
Citation
@inproceedings{amram-etal-2018-representations,
  address = {Santa Fe, New Mexico, USA},
  author = {Amram, Adam and Ben David, Anat and Tsarfaty, Reut},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  month = aug,
  pages = {2242--2252},
  publisher = {Association for Computational Linguistics},
  title = {Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from {M}odern {H}ebrew},
  url = {https://www.aclweb.org/anthology/C18-1190},
  year = {2018},
}

HinDialectClassification

HinDialect: 26 Hindi-related languages and dialects of the Indic Continuum in North India

Dataset: mlexplorer008/hin_dialect_classificationLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 anp, awa, ben, bgc, bhb, ... (21) Social, Spoken, Written expert-annotated found
Citation
@misc{11234/1-4839,
  author = {Bafna, Niyati and {\v Z}abokrtsk{\'y}, Zden{\v e}k and Espa{\~n}a-Bonet, Cristina and van Genabith, Josef and Kumar, Lalit "Samyak Lalit" and Suman, Sharda and Shivay, Rahul},
  copyright = {Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
  note = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  title = {{HinDialect} 1.1: 26 Hindi-related languages and dialects of the Indic Continuum in North India},
  url = {http://hdl.handle.net/11234/1-4839},
  year = {2022},
}
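
Note that this task is scored with f1 rather than accuracy, which matters for a 21-way label set with skewed class sizes. A small illustration with scikit-learn; whether MTEB macro- or weighted-averages here is an assumption:

```python
# Illustration: macro-averaged F1 weights all dialect classes equally,
# unlike accuracy, which is dominated by frequent classes.
from sklearn.metrics import f1_score

y_true = ["awa", "ben", "bgc", "awa"]
y_pred = ["awa", "bgc", "bgc", "awa"]
print(f1_score(y_true, y_pred, average="macro"))
```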

HindiDiscourseClassification

A Hindi discourse dataset with values for coherence.

Dataset: midas/hindi_discourseLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy hin Fiction, Social, Written expert-annotated found
Citation
@inproceedings{dhanwal-etal-2020-annotated,
  address = {Marseille, France},
  author = {Dhanwal, Swapnil  and
Dutta, Hritwik  and
Nankani, Hitesh  and
Shrivastava, Nilay  and
Kumar, Yaman  and
Li, Junyi Jessy  and
Mahata, Debanjan  and
Gosangi, Rakesh  and
Zhang, Haimin  and
Shah, Rajiv Ratn  and
Stent, Amanda},
  booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
  isbn = {979-10-95546-34-4},
  language = {English},
  month = may,
  publisher = {European Language Resources Association},
  title = {An Annotated Dataset of Discourse Modes in {H}indi Stories},
  url = {https://www.aclweb.org/anthology/2020.lrec-1.149},
  year = {2020},
}

HindiDiscourseClassification.v2

A Hindi discourse dataset with values for coherence. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/hindi_discourseLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy hin Fiction, Social, Written expert-annotated found
Citation
@inproceedings{dhanwal-etal-2020-annotated,
  address = {Marseille, France},
  author = {Dhanwal, Swapnil  and
Dutta, Hritwik  and
Nankani, Hitesh  and
Shrivastava, Nilay  and
Kumar, Yaman  and
Li, Junyi Jessy  and
Mahata, Debanjan  and
Gosangi, Rakesh  and
Zhang, Haimin  and
Shah, Rajiv Ratn  and
Stent, Amanda},
  booktitle = {Proceedings of the 12th Language Resources and Evaluation Conference},
  isbn = {979-10-95546-34-4},
  language = {English},
  month = may,
  publisher = {European Language Resources Association},
  title = {An Annotated Dataset of Discourse Modes in {H}indi Stories},
  url = {https://www.aclweb.org/anthology/2020.lrec-1.149},
  year = {2020},
}

HotelReviewSentimentClassification

HARD is a dataset of Arabic hotel reviews collected from the Booking.com website.

Dataset: mteb/HotelReviewSentimentClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found
Citation
@article{elnagar2018hotel,
  author = {Elnagar, Ashraf and Khalifa, Yasmin S and Einea, Anas},
  journal = {Intelligent natural language processing: Trends and applications},
  pages = {35--52},
  publisher = {Springer},
  title = {Hotel Arabic-reviews dataset construction for sentiment analysis applications},
  year = {2018},
}

HotelReviewSentimentClassification.v2

HARD is a dataset of Arabic hotel reviews collected from the Booking.com website. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/HotelReviewSentimentClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found
Citation
@article{elnagar2018hotel,
  author = {Elnagar, Ashraf and Khalifa, Yasmin S and Einea, Anas},
  journal = {Intelligent natural language processing: Trends and applications},
  pages = {35--52},
  publisher = {Springer},
  title = {Hotel Arabic-reviews dataset construction for sentiment analysis applications},
  year = {2018},
}

IFlyTek

Long text classification of app descriptions.

Dataset: C-MTEB/IFlyTek-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@inproceedings{xu-etal-2020-clue,
  abstract = {The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE is an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU. Our benchmark is released at https://www.cluebenchmarks.com},
  address = {Barcelona, Spain (Online)},
  author = {Xu, Liang  and
Hu, Hai and
Zhang, Xuanwei and
Li, Lu and
Cao, Chenjie and
Li, Yudong and
Xu, Yechen and
Sun, Kai and
Yu, Dian and
Yu, Cong and
Tian, Yin and
Dong, Qianqian and
Liu, Weitang and
Shi, Bo and
Cui, Yiming and
Li, Junyi and
Zeng, Jun and
Wang, Rongzhao and
Xie, Weijian and
Li, Yanting and
Patterson, Yina and
Tian, Zuoyu and
Zhang, Yiwen and
Zhou, He and
Liu, Shaoweihua and
Zhao, Zhe and
Zhao, Qipeng and
Yue, Cong and
Zhang, Xinrui and
Yang, Zhengliang and
Richardson, Kyle and
Lan, Zhenzhong },
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  doi = {10.18653/v1/2020.coling-main.419},
  month = dec,
  pages = {4762--4772},
  publisher = {International Committee on Computational Linguistics},
  title = {{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark},
  url = {https://aclanthology.org/2020.coling-main.419},
  year = {2020},
}

IFlyTek.v2

Long text classification of app descriptions. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/i_fly_tekLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@inproceedings{xu-etal-2020-clue,
  abstract = {The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE is an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU. Our benchmark is released at https://www.cluebenchmarks.com},
  address = {Barcelona, Spain (Online)},
  author = {Xu, Liang  and
Hu, Hai and
Zhang, Xuanwei and
Li, Lu and
Cao, Chenjie and
Li, Yudong and
Xu, Yechen and
Sun, Kai and
Yu, Dian and
Yu, Cong and
Tian, Yin and
Dong, Qianqian and
Liu, Weitang and
Shi, Bo and
Cui, Yiming and
Li, Junyi and
Zeng, Jun and
Wang, Rongzhao and
Xie, Weijian and
Li, Yanting and
Patterson, Yina and
Tian, Zuoyu and
Zhang, Yiwen and
Zhou, He and
Liu, Shaoweihua and
Zhao, Zhe and
Zhao, Qipeng and
Yue, Cong and
Zhang, Xinrui and
Yang, Zhengliang and
Richardson, Kyle and
Lan, Zhenzhong },
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  doi = {10.18653/v1/2020.coling-main.419},
  month = dec,
  pages = {4762--4772},
  publisher = {International Committee on Computational Linguistics},
  title = {{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark},
  url = {https://aclanthology.org/2020.coling-main.419},
  year = {2020},
}

ImdbClassification

Large Movie Review Dataset

Dataset: mteb/imdbLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written derived found
Citation
@inproceedings{maas-etal-2011-learning,
  address = {Portland, Oregon, USA},
  author = {Maas, Andrew L.  and
Daly, Raymond E.  and
Pham, Peter T.  and
Huang, Dan  and
Ng, Andrew Y.  and
Potts, Christopher},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  editor = {Lin, Dekang  and
Matsumoto, Yuji  and
Mihalcea, Rada},
  month = jun,
  pages = {142--150},
  publisher = {Association for Computational Linguistics},
  title = {Learning Word Vectors for Sentiment Analysis},
  url = {https://aclanthology.org/P11-1015},
  year = {2011},
}

ImdbClassification.v2

Large Movie Review Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/imdbLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written derived found
Citation
@inproceedings{maas-etal-2011-learning,
  address = {Portland, Oregon, USA},
  author = {Maas, Andrew L.  and
Daly, Raymond E.  and
Pham, Peter T.  and
Huang, Dan  and
Ng, Andrew Y.  and
Potts, Christopher},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  editor = {Lin, Dekang  and
Matsumoto, Yuji  and
Mihalcea, Rada},
  month = jun,
  pages = {142--150},
  publisher = {Association for Computational Linguistics},
  title = {Learning Word Vectors for Sentiment Analysis},
  url = {https://aclanthology.org/P11-1015},
  year = {2011},
}

ImdbVNClassification

A translated dataset of large movie reviews annotated for sentiment classification. The process of creating VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system: it uses large language models (LLMs), specifically Cohere's Aya model, for translation; applies advanced embedding models to filter the translations; and uses an LLM-as-a-judge to score the quality of the samples based on multiple criteria.

Dataset: GreenNode/imdb-vnLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Reviews, Written derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}
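
The embedding-based filtering step described above can be sketched as a cosine-similarity check between each English source and its Vietnamese translation. The model choice and threshold below are illustrative assumptions, not the paper's actual settings:

```python
# Sketch of translation filtering: keep a machine translation only if its
# multilingual embedding is close to that of the English source.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

def keep_translation(src_en: str, hyp_vi: str, threshold: float = 0.8) -> bool:
    emb = model.encode([src_en, hyp_vi], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1])) >= threshold
```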

InappropriatenessClassification

Inappropriateness identification in the form of binary classification

Dataset: ai-forever/inappropriateness-classificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Social, Web, Written human-annotated found
Citation
@inproceedings{babakov-etal-2021-detecting,
  abstract = {Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.},
  address = {Kiyv, Ukraine},
  author = {Babakov, Nikolay  and
Logacheva, Varvara  and
Kozlova, Olga  and
Semenov, Nikita  and
Panchenko, Alexander},
  booktitle = {Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing},
  editor = {Babych, Bogdan  and
Kanishcheva, Olga  and
Nakov, Preslav  and
Piskorski, Jakub  and
Pivovarova, Lidia  and
Starko, Vasyl  and
Steinberger, Josef  and
Yangarber, Roman  and
Marci{\'n}czuk, Micha{\l}  and
Pollak, Senja  and
P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
Robnik-{\v{S}}ikonja, Marko},
  month = apr,
  pages = {26--36},
  publisher = {Association for Computational Linguistics},
  title = {Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation},
  url = {https://aclanthology.org/2021.bsnlp-1.4},
  year = {2021},
}

InappropriatenessClassification.v2

Inappropriateness identification in the form of binary classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/inappropriatenessLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Social, Web, Written human-annotated found
Citation
@inproceedings{babakov-etal-2021-detecting,
  abstract = {Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.},
  address = {Kiyv, Ukraine},
  author = {Babakov, Nikolay  and
Logacheva, Varvara  and
Kozlova, Olga  and
Semenov, Nikita  and
Panchenko, Alexander},
  booktitle = {Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing},
  editor = {Babych, Bogdan  and
Kanishcheva, Olga  and
Nakov, Preslav  and
Piskorski, Jakub  and
Pivovarova, Lidia  and
Starko, Vasyl  and
Steinberger, Josef  and
Yangarber, Roman  and
Marci{\'n}czuk, Micha{\l}  and
Pollak, Senja  and
P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
Robnik-{\v{S}}ikonja, Marko},
  month = apr,
  pages = {26--36},
  publisher = {Association for Computational Linguistics},
  title = {Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation},
  url = {https://aclanthology.org/2021.bsnlp-1.4},
  year = {2021},
}

InappropriatenessClassificationv2

Inappropriateness identification in the form of binary classification

Dataset: mteb/InappropriatenessClassificationv2License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus Social, Web, Written human-annotated found
Citation
@inproceedings{babakov-etal-2021-detecting,
  abstract = {Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.},
  address = {Kiyv, Ukraine},
  author = {Babakov, Nikolay  and
Logacheva, Varvara  and
Kozlova, Olga  and
Semenov, Nikita  and
Panchenko, Alexander},
  booktitle = {Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing},
  editor = {Babych, Bogdan  and
Kanishcheva, Olga  and
Nakov, Preslav  and
Piskorski, Jakub  and
Pivovarova, Lidia  and
Starko, Vasyl  and
Steinberger, Josef  and
Yangarber, Roman  and
Marci{\'n}czuk, Micha{\l}  and
Pollak, Senja  and
P{\v{r}}ib{\'a}{\v{n}}, Pavel  and
Robnik-{\v{S}}ikonja, Marko},
  month = apr,
  pages = {26--36},
  publisher = {Association for Computational Linguistics},
  title = {Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation},
  url = {https://aclanthology.org/2021.bsnlp-1.4},
  year = {2021},
}

IndicLangClassification

A language identification test set for native-script as well as Romanized text, spanning 22 Indic languages.

Dataset: ai4bharat/Bhasha-AbhijnaanamLicense: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy asm, ben, brx, doi, gom, ... (22) Non-fiction, Web, Written expert-annotated created
Citation
@inproceedings{madhani-etal-2023-bhasa,
  address = {Toronto, Canada},
  author = {Madhani, Yash  and
Khapra, Mitesh M.  and
Kunchukuttan, Anoop},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  doi = {10.18653/v1/2023.acl-short.71},
  editor = {Rogers, Anna  and
Boyd-Graber, Jordan  and
Okazaki, Naoaki},
  month = jul,
  pages = {816--826},
  publisher = {Association for Computational Linguistics},
  title = {Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 {I}ndic languages},
  url = {https://aclanthology.org/2023.acl-short.71},
  year = {2023},
}
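
Tasks covering particular languages can also be discovered programmatically from mteb's registry, using the ISO 639-3 codes shown in the tables on this page. A sketch, assuming the registry exposes these filter parameters:

```python
# Sketch: list classification tasks that cover selected Indic languages.
import mteb

tasks = mteb.get_tasks(task_types=["Classification"],
                       languages=["asm", "ben", "guj"])
for task in tasks:
    print(task.metadata.name)
```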

IndicNLPNewsClassification

A news classification dataset in multiple Indian regional languages.

Dataset: Sakshamrzt/IndicNLP-MultilingualLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy guj, kan, mal, mar, ori, ... (8) News, Written expert-annotated found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

IndicSentimentClassification

A new, multilingual, and n-way parallel dataset for sentiment analysis in 13 Indic languages.

Dataset: mteb/IndicSentimentLicense: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy asm, ben, brx, guj, hin, ... (13) Reviews, Written human-annotated machine-translated and verified
Citation
@article{doddapaneni2022towards,
  author = {Sumanth Doddapaneni and Rahul Aralikatte and Gowtham Ramesh and Shreyansh Goyal and Mitesh M. Khapra and Anoop Kunchukuttan and Pratyush Kumar},
  doi = {10.18653/v1/2023.acl-long.693},
  journal = {Annual Meeting of the Association for Computational Linguistics},
  title = {Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages},
  year = {2022},
}

IndonesianIdClickbaitClassification

The CLICK-ID dataset is a collection of Indonesian news headlines gathered from 12 local online news publishers.

Dataset: manandey/id_clickbaitLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ind News, Written expert-annotated found
Citation
@article{WILLIAM2020106231,
  abstract = {News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas.},
  author = {Andika William and Yunita Sari},
  doi = {https://doi.org/10.1016/j.dib.2020.106231},
  issn = {2352-3409},
  journal = {Data in Brief},
  keywords = {Indonesian, Natural Language Processing, News articles, Clickbait, Text-classification},
  pages = {106231},
  title = {CLICK-ID: A novel dataset for Indonesian clickbait headlines},
  url = {http://www.sciencedirect.com/science/article/pii/S2352340920311252},
  volume = {32},
  year = {2020},
}

IndonesianIdClickbaitClassification.v2

The CLICK-ID dataset is a collection of Indonesian news headlines gathered from 12 local online news publishers. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/indonesian_id_clickbaitLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ind News, Written expert-annotated found
Citation
@article{WILLIAM2020106231,
  abstract = {News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas.},
  author = {Andika William and Yunita Sari},
  doi = {https://doi.org/10.1016/j.dib.2020.106231},
  issn = {2352-3409},
  journal = {Data in Brief},
  keywords = {Indonesian, Natural Language Processing, News articles, Clickbait, Text-classification},
  pages = {106231},
  title = {CLICK-ID: A novel dataset for Indonesian clickbait headlines},
  url = {http://www.sciencedirect.com/science/article/pii/S2352340920311252},
  volume = {32},
  year = {2020},
}

IndonesianMongabayConservationClassification

A conservation dataset collected from mongabay.co.id, containing a topic-classification task (multi-label format) and sentiment classification. This task covers only the sentiment analysis (positive, neutral, negative).

Dataset: Datasaur/mongabay-experimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ind Web, Written derived found
Citation
@inproceedings{fransiska-etal-2023-utilizing,
  address = {Nusa Dua, Bali, Indonesia},
  author = {Fransiska, Mega  and
Pitaloka, Diah  and
Saripudin, Saripudin  and
Putra, Satrio  and
Sutawika*, Lintang},
  booktitle = {Proceedings of the First Workshop in South East Asian Language Processing},
  doi = {10.18653/v1/2023.sealp-1.4},
  editor = {Wijaya, Derry  and
Aji, Alham Fikri  and
Vania, Clara  and
Winata, Genta Indra  and
Purwarianti, Ayu},
  month = nov,
  pages = {30--54},
  publisher = {Association for Computational Linguistics},
  title = {Utilizing Weak Supervision to Generate {I}ndonesian Conservation Datasets},
  url = {https://aclanthology.org/2023.sealp-1.4},
  year = {2023},
}

IndonesianMongabayConservationClassification.v2

A conservation dataset collected from mongabay.co.id, containing a topic-classification task (multi-label format) and sentiment classification. This task covers only the sentiment analysis (positive, neutral, negative). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/indonesian_mongabay_conservationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ind Web, Written derived found
Citation
@inproceedings{fransiska-etal-2023-utilizing,
  address = {Nusa Dua, Bali, Indonesia},
  author = {Fransiska, Mega  and
Pitaloka, Diah  and
Saripudin, Saripudin  and
Putra, Satrio  and
Sutawika*, Lintang},
  booktitle = {Proceedings of the First Workshop in South East Asian Language Processing},
  doi = {10.18653/v1/2023.sealp-1.4},
  editor = {Wijaya, Derry  and
Aji, Alham Fikri  and
Vania, Clara  and
Winata, Genta Indra  and
Purwarianti, Ayu},
  month = nov,
  pages = {30--54},
  publisher = {Association for Computational Linguistics},
  title = {Utilizing Weak Supervision to Generate {I}ndonesian Conservation Datasets},
  url = {https://aclanthology.org/2023.sealp-1.4},
  year = {2023},
}

InsurancePolicyInterpretationLegalBenchClassification

Given an insurance claim and policy, determine whether the claim is covered by the policy.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

InternationalCitizenshipQuestionsLegalBenchClassification

Answer questions about citizenship law from across the world. The dataset was constructed from the GLOBALCIT citizenship law dataset by framing questions about citizenship law as Yes or No questions.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@misc{vink2023globalcit,
  author = {Vink, Maarten and van der Baaren, Luuk and Bauböck, Rainer and Džankić, Jelena and Honohan, Iseult and Manby, Bronwen},
  howpublished = {https://hdl.handle.net/1814/73190},
  publisher = {Global Citizenship Observatory},
  title = {GLOBALCIT Citizenship Law Dataset, v2.0, Country-Year-Mode Data (Acquisition)},
  year = {2023},
}

IsiZuluNewsClassification

isiZulu News Classification Dataset

Dataset: isaacchung/isizulu-newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy zul News, Written human-annotated found
Citation
@article{Madodonga_Marivate_Adendorff_2023,
  author = {Madodonga, Andani and Marivate, Vukosi and Adendorff, Matthew},
  doi = {10.55492/dhasa.v4i01.4449},
  month = {Jan.},
  title = {Izindaba-Tindzaba: Machine learning news categorisation for Long and Short Text for isiZulu and Siswati},
  url = {https://upjournals.up.ac.za/index.php/dhasa/article/view/4449},
  volume = {4},
  year = {2023},
}

IsiZuluNewsClassification.v2

isiZulu News Classification Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/isi_zulu_newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy zul News, Written human-annotated found
Citation
@article{Madodonga_Marivate_Adendorff_2023,
  author = {Madodonga, Andani and Marivate, Vukosi and Adendorff, Matthew},
  doi = {10.55492/dhasa.v4i01.4449},
  month = {Jan.},
  title = {Izindaba-Tindzaba: Machine learning news categorisation for Long and Short Text for isiZulu and Siswati},
  url = {https://upjournals.up.ac.za/index.php/dhasa/article/view/4449},
  volume = {4},
  year = {2023},
}

ItaCaseholdClassification

An Italian dataset consisting of 1,101 pairs of judgments and their official holdings, issued between 2019 and 2022, drawn from the archives of Italian Administrative Justice and categorized into 64 subjects.

Dataset: itacasehold/itacaseholdLicense: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ita Government, Legal, Written expert-annotated found
Citation
@inproceedings{10.1145/3594536.3595177,
  abstract = {Legal holdings are used in Italy as a critical component of the legal system, serving to establish legal precedents, provide guidance for future legal decisions, and ensure consistency and predictability in the interpretation and application of the law. They are written by domain experts who describe in a clear and concise manner the principle of law applied in the judgments.We introduce a legal holding extraction method based on Italian-LEGAL-BERT to automatically extract legal holdings from Italian cases. In addition, we present ITA-CaseHold, a benchmark dataset for Italian legal summarization. We conducted several experiments using this dataset, as a valuable baseline for future research on this topic.},
  address = {New York, NY, USA},
  author = {Licari, Daniele and Bushipaka, Praveen and Marino, Gabriele and Comand\'{e}, Giovanni and Cucinotta, Tommaso},
  booktitle = {Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law},
  doi = {10.1145/3594536.3595177},
  isbn = {9798400701979},
  keywords = {Italian-LEGAL-BERT, Holding Extraction, Extractive Text Summarization, Benchmark Dataset},
  location = {Braga, Portugal},
  numpages = {9},
  pages = {148–156},
  publisher = {Association for Computing Machinery},
  series = {ICAIL '23},
  title = {Legal Holding Extraction from Italian Case Documents using Italian-LEGAL-BERT Text Summarization},
  url = {https://doi.org/10.1145/3594536.3595177},
  year = {2023},
}

Itacola

An Italian Corpus of Linguistic Acceptability taken from linguistic literature with a binary annotation made by the original authors themselves.

Dataset: gsarti/itacolaLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ita Non-fiction, Spoken, Written expert-annotated found
Citation
@inproceedings{trotta-etal-2021-monolingual-cross,
  address = {Punta Cana, Dominican Republic},
  author = {Trotta, Daniela  and
Guarasci, Raffaele  and
Leonardelli, Elisa  and
Tonelli, Sara},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  doi = {10.18653/v1/2021.findings-emnlp.250},
  month = nov,
  pages = {2929--2940},
  publisher = {Association for Computational Linguistics},
  title = {Monolingual and Cross-Lingual Acceptability Judgments with the {I}talian {C}o{LA} corpus},
  url = {https://aclanthology.org/2021.findings-emnlp.250},
  year = {2021},
}

Itacola.v2

An Italian Corpus of Linguistic Acceptability taken from linguistic literature with a binary annotation made by the original authors themselves. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/italian_linguistic_acceptabilityLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ita Non-fiction, Spoken, Written expert-annotated found
Citation
@inproceedings{trotta-etal-2021-monolingual-cross,
  address = {Punta Cana, Dominican Republic},
  author = {Trotta, Daniela  and
Guarasci, Raffaele  and
Leonardelli, Elisa  and
Tonelli, Sara},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  doi = {10.18653/v1/2021.findings-emnlp.250},
  month = nov,
  pages = {2929--2940},
  publisher = {Association for Computational Linguistics},
  title = {Monolingual and Cross-Lingual Acceptability Judgments with the {I}talian {C}o{LA} corpus},
  url = {https://aclanthology.org/2021.findings-emnlp.250},
  year = {2021},
}

JCrewBlockerLegalBenchClassification

The J.Crew Blocker, also known as the J.Crew Protection, is a provision included in leveraged loan documents to prevent companies from removing security by transferring intellectual property (IP) into new subsidiaries and raising additional debt. The task consists of determining whether the J.Crew Blocker is present in the document.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
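
LegalBench tasks such as this one are hosted as per-task configurations of a single Hugging Face dataset. A minimal loading sketch; the configuration name jcrew_blocker is an assumption based on the task name and should be checked against the dataset card:

from datasets import load_dataset

# config name assumed from the task name; verify on the dataset card
ds = load_dataset("nguha/legalbench", "jcrew_blocker")
print(ds["test"][0])  # one loan-document excerpt with its Yes/No label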

JCrewBlockerLegalBenchClassification.v2

The J.Crew Blocker, also known as the J.Crew Protection, is a provision included in leveraged loan documents to prevent companies from removing security by transferring intellectual property (IP) into new subsidiaries and raising additional debt. The task consists of determining whether the J.Crew Blocker is present in the document. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/j_crew_blocker_legal_benchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

JDReview

Sentiment classification of iPhone product reviews from the Chinese e-commerce platform JD.com.

Dataset: C-MTEB/JDReview-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@article{xiao2023c,
  author = {Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas},
  journal = {arXiv preprint arXiv:2309.07597},
  title = {C-pack: Packaged resources to advance general chinese embedding},
  year = {2023},
}

JDReview.v2

Sentiment classification of iPhone product reviews from the Chinese e-commerce platform JD.com. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/jd_reviewLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@article{xiao2023c,
  author = {Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas},
  journal = {arXiv preprint arXiv:2309.07597},
  title = {C-pack: Packaged resources to advance general chinese embedding},
  year = {2023},
}

JapaneseSentimentClassification

Japanese sentiment classification dataset with binary (positive vs negative sentiment) labels. This version reverts the morphological analysis from the original multilingual dataset to restore natural Japanese text without artificial spaces.

Dataset: sbintuitions/JMTEBLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy jpn Reviews, Written derived found
Citation
@inproceedings{mollanorozy-etal-2023-cross,
  address = {Dubrovnik, Croatia},
  author = {Mollanorozy, Sepideh  and
Tanti, Marc  and
Nissim, Malvina},
  booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
  doi = {10.18653/v1/2023.sigtyp-1.9},
  editor = {Beinborn, Lisa  and
Goswami, Koustava  and
Murado{\u{g}}lu, Saliha  and
Sorokin, Alexey  and
Shcherbakov, Andreas  and
Ponti, Edoardo M.  and
Cotterell, Ryan  and
Vylomova, Ekaterina},
  month = may,
  pages = {89--95},
  publisher = {Association for Computational Linguistics},
  title = {Cross-lingual Transfer Learning with {P}ersian},
  url = {https://aclanthology.org/2023.sigtyp-1.9},
  year = {2023},
}
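
The de-tokenization this version applies amounts to deleting the artificial spaces, since written Japanese does not separate words with whitespace. A toy sketch (the example sentence is invented for illustration):

# collapse whitespace inserted by morphological analysis
segmented = "この 映画 は 本当に 面白かった"
restored = segmented.replace(" ", "")
print(restored)  # この映画は本当に面白かった

Note that blind space removal is only safe when the original text contains no legitimate spaces, e.g. around embedded Latin words or numbers.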

JavaneseIMDBClassification

Large Movie Review Dataset translated to Javanese. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets.

Dataset: w11wo/imdb-javaneseLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy jav Reviews, Written human-annotated found
Citation
@inproceedings{wongso2021causal,
  author = {Wongso, Wilson and Setiawan, David Samuel and Suhartono, Derwin},
  booktitle = {2021 International Conference on Advanced Computer Science and Information Systems (ICACSIS)},
  organization = {IEEE},
  pages = {1--7},
  title = {Causal and Masked Language Modeling of Javanese Language using Transformer-based Architectures},
  year = {2021},
}

JavaneseIMDBClassification.v2

Large Movie Review Dataset translated to Javanese. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/javanese_imdbLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy jav Reviews, Written human-annotated found
Citation
@inproceedings{wongso2021causal,
  author = {Wongso, Wilson and Setiawan, David Samuel and Suhartono, Derwin},
  booktitle = {2021 International Conference on Advanced Computer Science and Information Systems (ICACSIS)},
  organization = {IEEE},
  pages = {1--7},
  title = {Causal and Masked Language Modeling of Javanese Language using Transformer-based Architectures},
  year = {2021},
}

KLUE-TC

Topic classification dataset of human-annotated news headlines. Part of the Korean Language Understanding Evaluation (KLUE).

Dataset: klue/klueLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor News, Written human-annotated found
Citation
@misc{park2021klue,
  archiveprefix = {arXiv},
  author = {Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
  eprint = {2105.09680},
  primaryclass = {cs.CL},
  title = {KLUE: Korean Language Understanding Evaluation},
  year = {2021},
}
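
KLUE-TC corresponds to the topic-classification (YNAT) portion of the KLUE benchmark. A minimal loading sketch; the configuration name ynat is assumed to be the one used for this task on the Hub and should be verified against the dataset card:

from datasets import load_dataset

ds = load_dataset("klue/klue", "ynat")
print(ds["train"][0])  # a news headline with an integer topic label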

KLUE-TC.v2

Topic classification dataset of human-annotated news headlines. Part of the Korean Language Understanding Evaluation (KLUE). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/klue_tcLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor News, Written human-annotated found
Citation
@misc{park2021klue,
  archiveprefix = {arXiv},
  author = {Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
  eprint = {2105.09680},
  primaryclass = {cs.CL},
  title = {KLUE: Korean Language Understanding Evaluation},
  year = {2021},
}

KannadaNewsClassification

The Kannada news dataset contains only the headlines of news articles in three categories: Entertainment, Tech, and Sports. It comprises around 6,300 news article headlines collected from Kannada news websites. The data has been cleaned and split into train and test sets, which can be used to benchmark topic classification models in Kannada.

Dataset: Akash190104/kannada_news_classificationLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kan News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

KannadaNewsClassification.v2

The Kannada news dataset contains only the headlines of news articles in three categories: Entertainment, Tech, and Sports. It comprises around 6,300 news article headlines collected from Kannada news websites. The data has been cleaned and split into train and test sets, which can be used to benchmark topic classification models in Kannada. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/kannada_newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kan News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

KinopoiskClassification

Sentiment classification of movie reviews from Kinopoisk, a Russian film review site.

Dataset: ai-forever/kinopoisk-sentiment-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Reviews, Written derived found
Citation
@article{blinov2013research,
  author = {Blinov, PD and Klekovkina, Maria and Kotelnikov, Eugeny and Pestov, Oleg},
  journal = {Computational Linguistics and Intellectual Technologies},
  number = {12},
  pages = {48--58},
  title = {Research of lexical approach and machine learning methods for sentiment analysis},
  volume = {2},
  year = {2013},
}

KorFin

KorFin-ASC is an extension of KorFin-ABSA, a financial sentiment analysis dataset of 8,818 samples annotated with (aspect, polarity) pairs. The samples were collected from KLUE-TC and analyst reports on Naver Finance.

Dataset: amphora/korfin-ascLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor Financial, News, Written expert-annotated found
Citation
@article{son2023removing,
  author = {Son, Guijin and Lee, Hanwool and Kang, Nahyeon and Hahm, Moonjeong},
  journal = {arXiv preprint arXiv:2301.03136},
  title = {Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance},
  year = {2023},
}

KorHateClassification

The dataset was created to provide the first human-labeled Korean corpus for toxic speech detection, drawn from a Korean online entertainment news aggregator. Recently, two young Korean celebrities suffered a series of tragic incidents that led two major Korean web portals to close the comment sections on their platforms. However, this only serves as a temporary solution, and the fundamental issue has not been solved yet. This dataset aims to improve Korean hate speech detection. Annotation was performed by 32 annotators: 29 from the crowdsourcing platform DeepNatural AI and three NLP researchers.

Dataset: inmoonlight/kor_hateLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor Social, Written expert-annotated found
Citation
@misc{moon2020beep,
  archiveprefix = {arXiv},
  author = {Jihyung Moon and Won Ik Cho and Junbum Lee},
  eprint = {2005.12503},
  primaryclass = {cs.CL},
  title = {BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection},
  year = {2020},
}

KorHateClassification.v2

The dataset was created to provide the first human-labeled Korean corpus for toxic speech detection, drawn from a Korean online entertainment news aggregator. Recently, two young Korean celebrities suffered a series of tragic incidents that led two major Korean web portals to close the comment sections on their platforms. However, this only serves as a temporary solution, and the fundamental issue has not been solved yet. This dataset aims to improve Korean hate speech detection. Annotation was performed by 32 annotators: 29 from the crowdsourcing platform DeepNatural AI and three NLP researchers.

This version corrects errors found in the original data. For details, see [pull request](https://github.com/embeddings-benchmark/mteb/pull/2900)

Dataset: mteb/kor_hateLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor Social, Written expert-annotated found
Citation
@misc{moon2020beep,
  archiveprefix = {arXiv},
  author = {Jihyung Moon and Won Ik Cho and Junbum Lee},
  eprint = {2005.12503},
  primaryclass = {cs.CL},
  title = {BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection},
  year = {2020},
}

KorSarcasmClassification

The Korean Sarcasm Dataset was created to detect sarcasm in text, which can significantly alter the original meaning of a sentence. 9,319 tweets were collected from Twitter and labeled as sarcasm or not_sarcasm. The tweets were gathered by querying for irony, sarcastic, and sarcasm: HTML data was scraped from Twitter using queries for hashtags that include sarcasm and variants of it. The data was preprocessed by removing the query hashtags, URLs, and user mentions to preserve anonymity.

Dataset: SpellOnYou/kor_sarcasmLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor Social, Written expert-annotated found
Citation
@misc{kim2019kocasm,
  author = {Kim, Jiwon and Cho, Won Ik},
  howpublished = {https://github.com/SpellOnYou/korean-sarcasm},
  journal = {GitHub repository},
  publisher = {GitHub},
  title = {Kocasm: Korean Automatic Sarcasm Detection},
  year = {2019},
}
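
An illustrative sketch of the preprocessing described above, stripping query hashtags, URLs, and user mentions from a tweet; the exact patterns the dataset authors used are not documented here, so these regexes are assumptions:

import re

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)  # drop user mentions to preserve anonymity
    text = re.sub(r"#\S*(sarcasm|irony)\S*", "", text, flags=re.IGNORECASE)  # drop query hashtags
    return " ".join(text.split())  # normalize leftover whitespace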

KorSarcasmClassification.v2

The Korean Sarcasm Dataset was created to detect sarcasm in text, which can significantly alter the original meaning of a sentence. 9,319 tweets were collected from Twitter and labeled as sarcasm or not_sarcasm. The tweets were gathered by querying for irony, sarcastic, and sarcasm: HTML data was scraped from Twitter using queries for hashtags that include sarcasm and variants of it. The data was preprocessed by removing the query hashtags, URLs, and user mentions to preserve anonymity.

This version corrects errors found in the original data. For details, see [pull request](https://github.com/embeddings-benchmark/mteb/pull/2900)

Dataset: mteb/kor_sarcasmLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kor Social, Written expert-annotated found
Citation
@misc{kim2019kocasm,
  author = {Kim, Jiwon and Cho, Won Ik},
  howpublished = {https://github.com/SpellOnYou/korean-sarcasm},
  journal = {GitHub repository},
  publisher = {GitHub},
  title = {Kocasm: Korean Automatic Sarcasm Detection},
  year = {2019},
}

KurdishSentimentClassification

A sentiment classification dataset for Kurdish.

Dataset: asparius/Kurdish-SentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kur Web, Written derived found
Citation
@article{article,
  author = {Badawi, Soran and Kazemi, Arefeh and Rezaie, Vali},
  doi = {10.1007/s10579-023-09716-6},
  journal = {Language Resources and Evaluation},
  month = {01},
  pages = {1-20},
  title = {KurdiSent: a corpus for kurdish sentiment analysis},
  year = {2024},
}

KurdishSentimentClassification.v2

A sentiment classification dataset for Kurdish. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/kurdish_sentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy kur Web, Written derived found
Citation
@article{article,
  author = {Badawi, Soran and Kazemi, Arefeh and Rezaie, Vali},
  doi = {10.1007/s10579-023-09716-6},
  journal = {Language Resources and Evaluation},
  month = {01},
  pages = {1-20},
  title = {KurdiSent: a corpus for kurdish sentiment analysis},
  year = {2024},
}

LanguageClassification

A language identification dataset for 20 languages.

Dataset: papluca/language-identificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara, bul, cmn, deu, ell, ... (20) Fiction, Government, Non-fiction, Reviews, Web, ... (6) derived found
Citation
@inproceedings{conneau2018xnli,
  author = {Conneau, Alexis
and Rinott, Ruty
and Lample, Guillaume
and Williams, Adina
and Bowman, Samuel R.
and Schwenk, Holger
and Stoyanov, Veselin},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods
in Natural Language Processing},
  location = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics},
  title = {XNLI: Evaluating Cross-lingual Sentence Representations},
  year = {2018},
}
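
Throughout this catalog, the "text to category (t2c)" accuracy score is computed by fitting a lightweight probe on frozen embeddings rather than fine-tuning the model; mteb's classification evaluator uses a logistic-regression probe of this kind. A hedged sketch, with toy data standing in for the real train/test splits:

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# toy stand-ins for the dataset's train/test splits
train_texts = ["Bonjour tout le monde", "Hello world", "Hallo Welt"]
train_labels = ["fr", "en", "de"]
test_texts = ["Guten Morgen", "Good morning"]
test_labels = ["de", "en"]

# fit the probe on frozen embeddings and score on the held-out texts
clf = LogisticRegression(max_iter=100)
clf.fit(model.encode(train_texts), train_labels)
pred = clf.predict(model.encode(test_texts))
print(accuracy_score(test_labels, pred))  # the "accuracy" score reported in these tables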

LccSentimentClassification

The Leipzig Corpora Collection, annotated for sentiment.

Dataset: DDSC/lccLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan News, Web, Written expert-annotated found
Citation
@inproceedings{quasthoff-etal-2006-corpus,
  abstract = {A simple and flexible schema for storing and presenting monolingual language resources is proposed. In this format, data for 18 different languages is already available in various sizes. The data is provided free of charge for online use and download. The main target is to ease the application of algorithms for monolingual and interlingual studies.},
  address = {Genoa, Italy},
  author = {Quasthoff, Uwe  and
Richter, Matthias  and
Biemann, Christian},
  booktitle = {Proceedings of the Fifth International Conference on Language Resources and Evaluation ({LREC}{'}06)},
  editor = {Calzolari, Nicoletta  and
Choukri, Khalid  and
Gangemi, Aldo  and
Maegaard, Bente  and
Mariani, Joseph  and
Odijk, Jan  and
Tapias, Daniel},
  month = may,
  publisher = {European Language Resources Association (ELRA)},
  title = {Corpus Portal for Search in Monolingual Corpora},
  url = {http://www.lrec-conf.org/proceedings/lrec2006/pdf/641_pdf.pdf},
  year = {2006},
}

LearnedHandsBenefitsLegalBenchClassification

This is a binary classification task in which the model must determine if a user's legal post discusses public benefits and social services that people can get from the government, like for food, disability, old age, housing, medical help, unemployment, child care, or other social needs.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsBusinessLegalBenchClassification

This is a binary classification task in which the model must determine if a user's legal question discusses issues faced by people who run small businesses or nonprofits, including around incorporation, licenses, taxes, regulations, and other concerns. It also covers situations involving disasters, bankruptcies, or other problems.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsConsumerLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues people face regarding money, insurance, consumer goods and contracts, taxes, and small claims about quality of service.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsCourtsLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses the logistics of how a person can interact with a lawyer or the court system. It applies to situations about procedure, rules, how to file lawsuits, how to hire lawyers, how to represent oneself, and other practical matters about dealing with these systems.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsCrimeLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues in the criminal system including when people are charged with crimes, go to a criminal trial, go to prison, or are a victim of a crime.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsDivorceLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues around filing for divorce, separation, or annulment, getting spousal support, splitting money and property, and following the court processes.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsDomesticViolenceLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses dealing with domestic violence and abuse, including getting protective orders, enforcing them, understanding abuse, reporting abuse, and getting resources and status if there is abuse.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsEducationLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues around school, including accommodations for special needs, discrimination, student debt, discipline, and other issues in education.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsEmploymentLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues related to working at a job, including discrimination and harassment, worker's compensation, workers' rights, unions, getting paid, pensions, being fired, and more.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsEstatesLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses planning for end-of-life, possible incapacitation, and other special circumstances that would prevent a person from making decisions about their own well-being, finances, and property. This includes issues around wills, powers of attorney, advance directives, trusts, guardianships, conservatorships, and other estate issues that people and families deal with.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsFamilyLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues that arise within a family, like divorce, adoption, name change, guardianship, domestic violence, child custody, and other issues.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsHealthLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues with accessing health services, paying for medical care, getting public benefits for health care, protecting one's rights in medical settings, and other issues related to health.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsHousingLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses issues with paying rent or a mortgage, landlord-tenant issues, housing subsidies and public housing, eviction, and other problems with an apartment, mobile home, or house.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsImmigrationLegalBenchClassification

This is a binary classification task in which the model must determine if a user's post discusses visas, asylum, green cards, citizenship, migrant work and benefits, and other issues faced by people who are not full citizens in the US.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsTortsLegalBenchClassification

This is a binary classification task in which the model must determine if a user's legal question discusses problems that one person has with another person (or animal), like when there is a car accident, a dog bite, bullying or possible harassment, or neighbors treating each other badly.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LearnedHandsTrafficLegalBenchClassification

This is a binary classification task in which the model must determine if a user's legal post discusses problems with traffic and parking tickets, fees, driver's licenses, and other issues experienced with the traffic system. It also covers car accidents and injuries, vehicle quality, repairs, purchases, and other contracts.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@dataset{learned_hands,
  author = {{Suffolk University Law School} and {Stanford Legal Design Lab}},
  note = {The LearnedHands dataset is licensed under CC BY-NC-SA 4.0},
  title = {LearnedHands Dataset},
  url = {https://spot.suffolklitlab.org/data/#learnedhands},
  urldate = {2022-05-21},
  year = {2022},
}

LegalReasoningCausalityLegalBenchClassification

Given an excerpt from a district court opinion, classify if it relies on statistical evidence in its reasoning.

Dataset: nguha/legalbenchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

LegalReasoningCausalityLegalBenchClassification.v2

Given an excerpt from a district court opinion, classify if it relies on statistical evidence in its reasoning. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/legal_reasoning_causality_legal_benchLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

MAUDLegalBenchClassification

This task was constructed from the MAUD dataset, which consists of over 47,000 labels across 152 merger agreements, annotated to identify 92 questions in each agreement used by the 2021 American Bar Association (ABA) Public Target Deal Points Study. Each dataset is formatted as a series of multiple-choice questions where, given a segment of the merger agreement and a Deal Point question, the model must choose the answer that best characterizes the agreement as its response.

    This is a combination of all 34 of the MAUD Legal Bench datasets:
    1. MAUD Ability To Consummate Concept Is Subject To MAE Carveouts: Given an excerpt from a merger agreement and the task is to answer: is the “ability to consummate” concept subject to Material Adverse Effect (MAE) carveouts, amongst the multiple choice options.
    2. MAUD Accuracy Of Fundamental Target RWS Bringdown Standard: Given an excerpt from a merger agreement and the task is to answer: how accurate must the fundamental representations and warranties be according to the bring down provision, amongst the multiple choice options.
    3. MAUD Accuracy Of Target Capitalization RW Outstanding Shares Bringdown Standard Answer: Given an excerpt from a merger agreement and the task is to answer: how accurate must the target capitalization representations and warranties (outstanding shares) be according to the bring down provision, amongst the multiple choice options.
    4. MAUD Accuracy Of Target General RW Bringdown Timing Answer: Given an excerpt from a merger agreement and the task is to answer: when must the target's general representations and warranties be accurate according to the bring down provision, amongst the multiple choice options.
    5. MAUD Additional Matching Rights Period For Modifications Cor: Given an excerpt from a merger agreement and the task is to answer: how long is the additional matching rights period for modifications in case the board changes its recommendation, amongst the multiple choice options.
    6. MAUD Application Of Buyer Consent Requirement Negative Interim Covenant: Given an excerpt from a merger agreement and the task is to answer: what negative covenants does the requirement of Buyer consent apply to, amongst the multiple choice options.
    7. MAUD Buyer Consent Requirement Ordinary Course: Given an excerpt from a merger agreement and the task is to answer: in case the Buyer's consent for the acquired company's ordinary business operations is required, are there any limitations on the Buyer's right to condition, withhold, or delay their consent, amongst the multiple choice options.
    8. MAUD Change In Law Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement and the task is to answer: do changes in law that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    9. MAUD Changes In GAAP Or Other Accounting Principles Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement and the task is to answer: do changes in GAAP or other accounting principles that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    10. MAUD COR Permitted In Response To Intervening Event: Given an excerpt from a merger agreement and the task is to answer: is Change of Recommendation permitted in response to an intervening event, amongst the multiple choice options.
    11. MAUD COR Permitted With Board Fiduciary Determination Only: Given an excerpt from a merger agreement and the task is to answer: is Change of Recommendation permitted as long as the board determines that such change is required to fulfill its fiduciary obligations, amongst the multiple choice options.
    12. MAUD COR Standard Intervening Event: Given an excerpt from a merger agreement and the task is to answer: what standard should the board follow when determining whether to change its recommendation in response to an intervening event, amongst the multiple choice options.
    13. MAUD COR Standard Superior Offer: Given an excerpt from a merger agreement and the task is to answer: what standard should the board follow when determining whether to change its recommendation in connection with a superior offer, amongst the multiple choice options.
    14. MAUD Definition Contains Knowledge Requirement Answer: Given an excerpt from a merger agreement and the task is to answer: what is the knowledge requirement in the definition of “Intervening Event”, amongst the multiple choice options.
    15. MAUD Definition Includes Asset Deals: Given an excerpt from a merger agreement and the task is to answer: what qualifies as a superior offer in terms of asset deals, amongst the multiple choice options.
    16. MAUD Definition Includes Stock Deals: Given an excerpt from a merger agreement and the task is to answer: what qualifies as a superior offer in terms of stock deals, amongst the multiple choice options.
    17. MAUD Fiduciary Exception Board Determination Standard: Given an excerpt from a merger agreement and the task is to answer: under what circumstances could the Board take actions on a different acquisition proposal notwithstanding the no-shop provision, amongst the multiple choice options.
    18. MAUD Fiduciary Exception Board Determination Trigger No Shop: Given an excerpt from a merger agreement and the task is to answer: what type of offer could the Board take actions on notwithstanding the no-shop provision, amongst the multiple choice options.
    19. MAUD Financial Point Of View Is The Sole Consideration: Given an excerpt from a merger agreement and the task is to answer: is “financial point of view” the sole consideration when determining whether an offer is superior, amongst the multiple choice options.
    20. MAUD FLS MAE Standard: Given an excerpt from a merger agreement and the task is to answer: what is the Forward Looking Standard (FLS) with respect to Material Adverse Effect (MAE), amongst the multiple choice options.
    21. MAUD General Economic and Financial Conditions Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement and the task is to answer: do changes caused by general economic and financial conditions that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    22. MAUD Includes Consistent With Past Practice: Given an excerpt from a merger agreement and the task is to answer: does the wording of the Efforts Covenant clause include “consistent with past practice”, amongst the multiple choice options.
    23. MAUD Initial Matching Rights Period COR: Given an excerpt from a merger agreement and the task is to answer: how long is the initial matching rights period in case the board changes its recommendation, amongst the multiple choice options.
    24. MAUD Initial Matching Rights Period FTR: Given an excerpt from a merger agreement and the task is to answer: how long is the initial matching rights period in connection with the Fiduciary Termination Right (FTR), amongst the multiple choice options.
    25. MAUDInterveningEventRequiredToOccurAfterSigningAnswer: Given an excerpt from a merger agreement and the task is to answer: is an “Intervening Event” required to occur after signing, amongst the multiple choice options.
    26. MAUD Knowledge Definition: Given an excerpt from a merger agreement and the task is to answer: what counts as Knowledge, amongst the multiple choice options.
    27. MAUDLiabilityStandardForNoShopBreachByTargetNonDORepresentatives: Given an excerpt from a merger agreement and the task is to answer:  what is the liability standard for no-shop breach by Target Non-D&O Representatives, amongst the multiple choice options.
    28. MAUD Ordinary Course Efforts Standard: Given an excerpt from a merger agreement and the task is to answer: what is the efforts standard, amongst the multiple choice options.
    29. MAUD Pandemic Or Other Public Health Event Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement and the task is to answer: do pandemics or other public health events have to have disproportionate impact to qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    30. MAUD Pandemic Or Other Public Health Event Specific Reference To Pandemic Related Governmental Responses Or Measures: Given an excerpt from a merger agreement and the task is to answer: is there specific reference to pandemic-related governmental responses or measures in the clause that qualifies pandemics or other public health events for Material Adverse Effect (MAE), amongst the multiple choice options.
    31. MAUD Relational Language MAE Applies To: Given an excerpt from a merger agreement and the task is to answer: what carveouts pertaining to Material Adverse Effect (MAE) does the relational language apply to?, amongst the multiple choice options.
    32. MAUD Specific Performance: Given an excerpt from a merger agreement and the task is to answer: what is the wording of the Specific Performance clause regarding the parties' entitlement in the event of a contractual breach, amongst the multiple choice options.
    33. MAUD Tail Period Length: Given an excerpt from a merger agreement and the task is to answer: how long is the Tail Period, amongst the multiple choice options.
    34. MAUD Type Of Consideration: Given an excerpt from a merger agreement and the task is to answer: what type of consideration is specified in this agreement, amongst the multiple choice options.

Dataset: nguha/legalbench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{wang2023maud,
  author = {Wang, Steven H and Scardigli, Antoine and Tang, Leonard and Chen, Wei and Levkin, Dimitry and Chen, Anya and Ball, Spencer and Woodside, Thomas and Zhang, Oliver and Hendrycks, Dan},
  journal = {arXiv preprint arXiv:2301.00876},
  title = {MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding},
  year = {2023},
}
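
For readers who want to run this task themselves, the snippet below is a minimal sketch using the mteb library with the task name from the heading above; the embedding checkpoint is an arbitrary placeholder, not a reference model for this benchmark.

```python
# Minimal sketch: evaluating an embedding model on this task with mteb.
import mteb
from sentence_transformers import SentenceTransformer

# Arbitrary placeholder checkpoint, not a reference model.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Select the task by the name shown in the heading above.
tasks = mteb.get_tasks(tasks=["MAUDLegalBenchClassification"])
evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as JSON under ./results.
results = evaluation.run(model, output_folder="results")
```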

MAUDLegalBenchClassification.v2

This task was constructed from the MAUD dataset, which consists of over 47,000 labels across 152 merger agreements, annotated to answer 92 questions per agreement drawn from the 2021 American Bar Association (ABA) Public Target Deal Points Study. Each subtask is formatted as a series of multiple-choice questions: given a segment of the merger agreement and a Deal Point question, the model must choose the answer that best characterizes the agreement.

    This is a combination of all 34 of the MAUD LegalBench datasets:
    1. MAUD Ability To Consummate Concept Is Subject To MAE Carveouts: Given an excerpt from a merger agreement, the task is to answer: is the “ability to consummate” concept subject to Material Adverse Effect (MAE) carveouts, amongst the multiple choice options.
    2. MAUD Accuracy Of Fundamental Target RWS Bringdown Standard: Given an excerpt from a merger agreement, the task is to answer: how accurate must the fundamental representations and warranties be according to the bring down provision, amongst the multiple choice options.
    3. MAUD Accuracy Of Target Capitalization RW Outstanding Shares Bringdown Standard Answer: Given an excerpt from a merger agreement, the task is to answer: how accurate must the fundamental representations and warranties be according to the bring down provision, amongst the multiple choice options.
    4. MAUD Accuracy Of Target General RW Bringdown Timing Answer: Given an excerpt from a merger agreement, the task is to answer: how accurate must the fundamental representations and warranties be according to the bring down provision, amongst the multiple choice options.
    5. MAUD Additional Matching Rights Period For Modifications Cor: Given an excerpt from a merger agreement, the task is to answer: how long is the additional matching rights period for modifications in case the board changes its recommendation, amongst the multiple choice options.
    6. MAUD Application Of Buyer Consent Requirement Negative Interim Covenant: Given an excerpt from a merger agreement, the task is to answer: what negative covenants does the requirement of Buyer consent apply to, amongst the multiple choice options.
    7. MAUD Buyer Consent Requirement Ordinary Course: Given an excerpt from a merger agreement, the task is to answer: in case the Buyer's consent for the acquired company's ordinary business operations is required, are there any limitations on the Buyer's right to condition, withhold, or delay their consent, amongst the multiple choice options.
    8. MAUD Change In Law Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement, the task is to answer: do changes in law that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    9. MAUD Changes In GAAP Or Other Accounting Principles Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement, the task is to answer: do changes in GAAP or other accounting principles that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    10. MAUD COR Permitted In Response To Intervening Event: Given an excerpt from a merger agreement, the task is to answer: is Change of Recommendation permitted in response to an intervening event, amongst the multiple choice options.
    11. MAUD COR Permitted With Board Fiduciary Determination Only: Given an excerpt from a merger agreement, the task is to answer: is Change of Recommendation permitted as long as the board determines that such change is required to fulfill its fiduciary obligations, amongst the multiple choice options.
    12. MAUD COR Standard Intervening Event: Given an excerpt from a merger agreement, the task is to answer: what standard should the board follow when determining whether to change its recommendation in response to an intervening event, amongst the multiple choice options.
    13. MAUD COR Standard Superior Offer: Given an excerpt from a merger agreement, the task is to answer: what standard should the board follow when determining whether to change its recommendation in connection with a superior offer, amongst the multiple choice options.
    14. MAUD Definition Contains Knowledge Requirement Answer: Given an excerpt from a merger agreement, the task is to answer: what is the knowledge requirement in the definition of “Intervening Event”, amongst the multiple choice options.
    15. MAUD Definition Includes Asset Deals: Given an excerpt from a merger agreement, the task is to answer: what qualifies as a superior offer in terms of asset deals, amongst the multiple choice options.
    16. MAUD Definition Includes Stock Deals: Given an excerpt from a merger agreement, the task is to answer: what qualifies as a superior offer in terms of stock deals, amongst the multiple choice options.
    17. MAUD Fiduciary Exception Board Determination Standard: Given an excerpt from a merger agreement, the task is to answer: under what circumstances could the Board take actions on a different acquisition proposal notwithstanding the no-shop provision, amongst the multiple choice options.
    18. MAUD Fiduciary Exception Board Determination Trigger No Shop: Given an excerpt from a merger agreement, the task is to answer: what type of offer could the Board take actions on notwithstanding the no-shop provision, amongst the multiple choice options.
    19. MAUD Financial Point Of View Is The Sole Consideration: Given an excerpt from a merger agreement, the task is to answer: is “financial point of view” the sole consideration when determining whether an offer is superior, amongst the multiple choice options.
    20. MAUD FLS MAE Standard: Given an excerpt from a merger agreement, the task is to answer: what is the Forward Looking Standard (FLS) with respect to Material Adverse Effect (MAE), amongst the multiple choice options.
    21. MAUD General Economic and Financial Conditions Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement, the task is to answer: do changes caused by general economic and financial conditions that have disproportionate impact qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    22. MAUD Includes Consistent With Past Practice: Given an excerpt from a merger agreement, the task is to answer: does the wording of the Efforts Covenant clause include “consistent with past practice”, amongst the multiple choice options.
    23. MAUD Initial Matching Rights Period COR: Given an excerpt from a merger agreement, the task is to answer: how long is the initial matching rights period in case the board changes its recommendation, amongst the multiple choice options.
    24. MAUD Initial Matching Rights Period FTR: Given an excerpt from a merger agreement, the task is to answer: how long is the initial matching rights period in connection with the Fiduciary Termination Right (FTR), amongst the multiple choice options.
    25. MAUD Intervening Event Required To Occur After Signing Answer: Given an excerpt from a merger agreement, the task is to answer: is an “Intervening Event” required to occur after signing, amongst the multiple choice options.
    26. MAUD Knowledge Definition: Given an excerpt from a merger agreement, the task is to answer: what counts as Knowledge, amongst the multiple choice options.
    27. MAUD Liability Standard For No Shop Breach By Target Non-D&O Representatives: Given an excerpt from a merger agreement, the task is to answer: what is the liability standard for no-shop breach by Target Non-D&O Representatives, amongst the multiple choice options.
    28. MAUD Ordinary Course Efforts Standard: Given an excerpt from a merger agreement, the task is to answer: what is the efforts standard, amongst the multiple choice options.
    29. MAUD Pandemic Or Other Public Health Event Subject To Disproportionate Impact Modifier: Given an excerpt from a merger agreement, the task is to answer: do pandemics or other public health events have to have disproportionate impact to qualify for Material Adverse Effect (MAE), amongst the multiple choice options.
    30. MAUD Pandemic Or Other Public Health Event Specific Reference To Pandemic Related Governmental Responses Or Measures: Given an excerpt from a merger agreement, the task is to answer: is there specific reference to pandemic-related governmental responses or measures in the clause that qualifies pandemics or other public health events for Material Adverse Effect (MAE), amongst the multiple choice options.
    31. MAUD Relational Language MAE Applies To: Given an excerpt from a merger agreement, the task is to answer: what carveouts pertaining to Material Adverse Effect (MAE) does the relational language apply to, amongst the multiple choice options.
    32. MAUD Specific Performance: Given an excerpt from a merger agreement, the task is to answer: what is the wording of the Specific Performance clause regarding the parties' entitlement in the event of a contractual breach, amongst the multiple choice options.
    33. MAUD Tail Period Length: Given an excerpt from a merger agreement, the task is to answer: how long is the Tail Period, amongst the multiple choice options.
    34. MAUD Type Of Consideration: Given an excerpt from a merger agreement, the task is to answer: what type of consideration is specified in this agreement, amongst the multiple choice options.

    This version corrects errors found in the original data. For details, see [pull request](https://github.com/embeddings-benchmark/mteb/pull/2900)

Dataset: mteb/maud_legal_bench • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{wang2023maud,
  author = {Wang, Steven H and Scardigli, Antoine and Tang, Leonard and Chen, Wei and Levkin, Dimitry and Chen, Anya and Ball, Spencer and Woodside, Thomas and Zhang, Oliver and Hendrycks, Dan},
  journal = {arXiv preprint arXiv:2301.00876},
  title = {MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding},
  year = {2023},
}

MTOPDomainClassification

MTOP: Multilingual Task-Oriented Semantic Parsing

Dataset: mteb/mtop_domain • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu, eng, fra, hin, spa, ... (6) Spoken human-annotated created
Citation
@inproceedings{li-etal-2021-mtop,
  abstract = {Scaling semantic parsing models for task-oriented dialog systems to new languages is often expensive and time-consuming due to the lack of available datasets. Available datasets suffer from several shortcomings: a) they contain few languages b) they contain small amounts of labeled examples per language c) they are based on the simple intent and slot detection paradigm for non-compositional queries. In this paper, we present a new multilingual dataset, called MTOP, comprising of 100k annotated utterances in 6 languages across 11 domains. We use this dataset and other publicly available datasets to conduct a comprehensive benchmarking study on using various state-of-the-art multilingual pre-trained models for task-oriented semantic parsing. We achieve an average improvement of +6.3 points on Slot F1 for the two existing multilingual datasets, over best results reported in their experiments. Furthermore, we demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.},
  address = {Online},
  author = {Li, Haoran  and
Arora, Abhinav  and
Chen, Shuohui  and
Gupta, Anchit  and
Gupta, Sonal  and
Mehdad, Yashar},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
  doi = {10.18653/v1/2021.eacl-main.257},
  editor = {Merlo, Paola  and
Tiedemann, Jorg  and
Tsarfaty, Reut},
  month = apr,
  pages = {2950--2962},
  publisher = {Association for Computational Linguistics},
  title = {{MTOP}: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark},
  url = {https://aclanthology.org/2021.eacl-main.257},
  year = {2021},
}
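
To inspect the underlying data directly, the dataset can be loaded from the Hugging Face Hub. The sketch below assumes a per-language configuration named "en"; the exact configuration names should be checked on the dataset page.

```python
# Sketch: loading the raw task data from the Hugging Face Hub.
from datasets import load_dataset

# "en" is an assumed config name; see the dataset page for the full list.
ds = load_dataset("mteb/mtop_domain", "en")
print(ds["test"][0])  # one labeled utterance
```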

MTOPDomainVNClassification

A translated dataset from MTOP: Multilingual Task-Oriented Semantic Parsing. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system:
- It uses large language models (LLMs), specifically Cohere's Aya model, for translation.
- It applies advanced embedding models to filter the translations.
- It uses LLM-as-a-judge to score the quality of the samples based on multiple criteria.
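
As an illustration of the embedding-based filtering step described above, the sketch below keeps only translation pairs whose multilingual embeddings are close. The encoder checkpoint and the 0.8 threshold are assumptions for illustration, not the exact VN-MTEB configuration (see the citation below).

```python
# Hypothetical sketch of an embedding-based translation filter:
# keep (source, translation) pairs whose embeddings are similar.
from sentence_transformers import SentenceTransformer, util

# Assumed multilingual encoder; not the exact VN-MTEB model.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

pairs = [
    ("play some jazz music", "phát một chút nhạc jazz"),
    ("what is the weather today", "tôi muốn đặt một chiếc bánh pizza"),  # mistranslation
]

src = model.encode([s for s, _ in pairs], convert_to_tensor=True)
tgt = model.encode([t for _, t in pairs], convert_to_tensor=True)

# Cosine similarity of each source sentence with its own translation.
sims = util.cos_sim(src, tgt).diagonal()
kept = [pair for pair, sim in zip(pairs, sims) if float(sim) >= 0.8]  # assumed threshold
print(kept)
```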

Dataset: GreenNode/mtop-domain-vn • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Spoken derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

MTOPIntentClassification

MTOP: Multilingual Task-Oriented Semantic Parsing

Dataset: mteb/mtop_intent • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu, eng, fra, hin, spa, ... (6) Spoken human-annotated created
Citation
@inproceedings{li-etal-2021-mtop,
  abstract = {Scaling semantic parsing models for task-oriented dialog systems to new languages is often expensive and time-consuming due to the lack of available datasets. Available datasets suffer from several shortcomings: a) they contain few languages b) they contain small amounts of labeled examples per language c) they are based on the simple intent and slot detection paradigm for non-compositional queries. In this paper, we present a new multilingual dataset, called MTOP, comprising of 100k annotated utterances in 6 languages across 11 domains. We use this dataset and other publicly available datasets to conduct a comprehensive benchmarking study on using various state-of-the-art multilingual pre-trained models for task-oriented semantic parsing. We achieve an average improvement of +6.3 points on Slot F1 for the two existing multilingual datasets, over best results reported in their experiments. Furthermore, we demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.},
  address = {Online},
  author = {Li, Haoran  and
Arora, Abhinav  and
Chen, Shuohui  and
Gupta, Anchit  and
Gupta, Sonal  and
Mehdad, Yashar},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
  doi = {10.18653/v1/2021.eacl-main.257},
  editor = {Merlo, Paola  and
Tiedemann, Jorg  and
Tsarfaty, Reut},
  month = apr,
  pages = {2950--2962},
  publisher = {Association for Computational Linguistics},
  title = {{MTOP}: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark},
  url = {https://aclanthology.org/2021.eacl-main.257},
  year = {2021},
}

MTOPIntentVNClassification

A translated dataset from MTOP: Multilingual Task-Oriented Semantic Parsing. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system:
- It uses large language models (LLMs), specifically Cohere's Aya model, for translation.
- It applies advanced embedding models to filter the translations.
- It uses LLM-as-a-judge to score the quality of the samples based on multiple criteria.

Dataset: GreenNode/mtop-intent-vn • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Spoken derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

MacedonianTweetSentimentClassification

A Macedonian dataset for tweet sentiment classification.

Dataset: isaacchung/macedonian-tweet-sentiment-classification • License: cc-by-nc-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mkd Social, Written human-annotated found
Citation
@inproceedings{jovanoski-etal-2015-sentiment,
  address = {Hissar, Bulgaria},
  author = {Jovanoski, Dame  and
Pachovski, Veno  and
Nakov, Preslav},
  booktitle = {Proceedings of the International Conference Recent Advances in Natural Language Processing},
  editor = {Mitkov, Ruslan  and
Angelova, Galia  and
Bontcheva, Kalina},
  month = sep,
  pages = {249--257},
  publisher = {INCOMA Ltd. Shoumen, BULGARIA},
  title = {Sentiment Analysis in {T}witter for {M}acedonian},
  url = {https://aclanthology.org/R15-1034},
  year = {2015},
}

MacedonianTweetSentimentClassification.v2

A Macedonian dataset for tweet sentiment classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/macedonian_tweet_sentiment • License: cc-by-nc-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mkd Social, Written human-annotated found
Citation
@inproceedings{jovanoski-etal-2015-sentiment,
  address = {Hissar, Bulgaria},
  author = {Jovanoski, Dame  and
Pachovski, Veno  and
Nakov, Preslav},
  booktitle = {Proceedings of the International Conference Recent Advances in Natural Language Processing},
  editor = {Mitkov, Ruslan  and
Angelova, Galia  and
Bontcheva, Kalina},
  month = sep,
  pages = {249--257},
  publisher = {INCOMA Ltd. Shoumen, BULGARIA},
  title = {Sentiment Analysis in {T}witter for {M}acedonian},
  url = {https://aclanthology.org/R15-1034},
  year = {2015},
}

MalayalamNewsClassification

A Malayalam dataset for 3-class classification of Malayalam news articles

Dataset: mlexplorer008/malayalam_news_classification • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mal News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

MalayalamNewsClassification.v2

A Malayalam dataset for 3-class classification of Malayalam news articles. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/malayalam_news • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mal News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

MarathiNewsClassification

A Marathi dataset for 3-class classification of Marathi news articles

Dataset: mlexplorer008/marathi_news_classification • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 mar News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}
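
As context for the accuracy and f1 scores reported in these tables: MTEB-style classification tasks are typically scored by embedding the train and test splits with a frozen encoder, fitting a lightweight classifier (for example, logistic regression) on the train embeddings, and computing the metric over the test predictions. The sketch below illustrates that recipe with a placeholder model and toy data.

```python
# Sketch of embedding-based classification scoring (placeholder data).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Arbitrary encoder; real tasks embed the dataset's own splits.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

train_texts, train_labels = ["good movie", "terrible plot"] * 8, [1, 0] * 8
test_texts, test_labels = ["really enjoyable", "utterly boring"], [1, 0]

# Fit a simple classifier on frozen train embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(model.encode(train_texts), train_labels)
preds = clf.predict(model.encode(test_texts))

print("accuracy:", accuracy_score(test_labels, preds))
print("macro f1:", f1_score(test_labels, preds, average="macro"))
```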

MarathiNewsClassification.v2

A Marathi dataset for 3-class classification of Marathi news articles. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/marathi_news • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 mar News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

MasakhaNEWSClassification

MasakhaNEWS is the largest publicly available dataset for news topic classification in 16 languages widely spoken in Africa. Train/validation/test splits are available for all 16 languages.

Dataset: mteb/masakhanews • License: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy amh, eng, fra, hau, ibo, ... (16) News, Written expert-annotated found
Citation
@misc{adelani2023masakhanews,
  archiveprefix = {arXiv},
  author = {David Ifeoluwa Adelani and Marek Masiak and Israel Abebe Azime and Jesujoba Alabi and Atnafu Lambebo Tonja and Christine Mwase and Odunayo Ogundepo and Bonaventure F. P. Dossou and Akintunde Oladipo and Doreen Nixdorf and Chris Chinenye Emezue and sana al-azzawi and Blessing Sibanda and Davis David and Lolwethu Ndolela and Jonathan Mukiibi and Tunde Ajayi and Tatiana Moteu and Brian Odhiambo and Abraham Owodunni and Nnaemeka Obiefuna and Muhidin Mohamed and Shamsuddeen Hassan Muhammad and Teshome Mulugeta Ababu and Saheed Abdullahi Salahudeen and Mesay Gemeda Yigezu and Tajuddeen Gwadabe and Idris Abdulmumin and Mahlet Taye and Oluwabusayo Awoyomi and Iyanuoluwa Shode and Tolulope Adelani and Habiba Abdulganiyu and Abdul-Hakeem Omotayo and Adetola Adeeko and Abeeb Afolabi and Anuoluwapo Aremu and Olanrewaju Samuel and Clemencia Siro and Wangari Kimotho and Onyekachi Ogbu and Chinedu Mbonu and Chiamaka Chukwuneke and Samuel Fanijo and Jessica Ojo and Oyinkansola Awosan and Tadesse Kebede and Toadoum Sari Sakayo and Pamela Nyatsine and Freedmore Sidume and Oreen Yousuf and Mardiyyah Oduwole and Tshinu Tshinu and Ussen Kimanuka and Thina Diko and Siyanda Nxakama and Sinodos Nigusse and Abdulmejid Johar and Shafie Mohamed and Fuad Mire Hassan and Moges Ahmed Mehamed and Evrard Ngabire and Jules Jules and Ivan Ssenkungu and Pontus Stenetorp},
  eprint = {2304.09972},
  primaryclass = {cs.CL},
  title = {MasakhaNEWS: News Topic Classification for African languages},
  year = {2023},
}

MassiveIntentClassification

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages

Dataset: mteb/amazon_massive_intent • License: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy afr, amh, ara, aze, ben, ... (50) Spoken human-annotated human-translated and localized
Citation
@misc{fitzgerald2022massive,
  archiveprefix = {arXiv},
  author = {Jack FitzGerald and Christopher Hench and Charith Peris and Scott Mackie and Kay Rottmann and Ana Sanchez and Aaron Nash and Liam Urbach and Vishesh Kakarala and Richa Singh and Swetha Ranganath and Laurie Crist and Misha Britan and Wouter Leeuwis and Gokhan Tur and Prem Natarajan},
  eprint = {2204.08582},
  primaryclass = {cs.CL},
  title = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  year = {2022},
}

MassiveIntentVNClassification

A translated dataset from MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system:
- It uses large language models (LLMs), specifically Cohere's Aya model, for translation.
- It applies advanced embedding models to filter the translations.
- It uses LLM-as-a-judge to score the quality of the samples based on multiple criteria.

Dataset: GreenNode/amazon-massive-intent-vn • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Spoken derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

MassiveScenarioClassification

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages

Dataset: mteb/amazon_massive_scenario • License: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy afr, amh, ara, aze, ben, ... (50) Spoken human-annotated human-translated and localized
Citation
@misc{fitzgerald2022massive,
  archiveprefix = {arXiv},
  author = {Jack FitzGerald and Christopher Hench and Charith Peris and Scott Mackie and Kay Rottmann and Ana Sanchez and Aaron Nash and Liam Urbach and Vishesh Kakarala and Richa Singh and Swetha Ranganath and Laurie Crist and Misha Britan and Wouter Leeuwis and Gokhan Tur and Prem Natarajan},
  eprint = {2204.08582},
  primaryclass = {cs.CL},
  title = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  year = {2022},
}

MassiveScenarioVNClassification

A translated dataset from MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system:
- It uses large language models (LLMs), specifically Cohere's Aya model, for translation.
- It applies advanced embedding models to filter the translations.
- It uses LLM-as-a-judge to score the quality of the samples based on multiple criteria.

Dataset: GreenNode/amazon-massive-scenario-vn • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Spoken derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

Moroco

The Moldavian and Romanian Dialectal Corpus. The MOROCO data set contains Moldavian and Romanian samples of text collected from the news domain. The samples belong to one of the following six topics: (0) culture, (1) finance, (2) politics, (3) science, (4) sports, (5) tech

Dataset: universityofbucharest/moroco • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron News, Written derived found
Citation
@inproceedings{Butnaru-ACL-2019,
  author = {Andrei M. Butnaru and Radu Tudor Ionescu},
  booktitle = {Proceedings of ACL},
  pages = {688--698},
  title = {{MOROCO: The Moldavian and Romanian Dialectal Corpus}},
  year = {2019},
}

Moroco.v2

The Moldavian and Romanian Dialectal Corpus. The MOROCO data set contains Moldavian and Romanian samples of text collected from the news domain. The samples belong to one of the following six topics: (0) culture, (1) finance, (2) politics, (3) science, (4) sports, (5) tech. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/moroco • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron News, Written derived found
Citation
@inproceedings{Butnaru-ACL-2019,
  author = {Andrei M. Butnaru and Radu Tudor Ionescu},
  booktitle = {Proceedings of ACL},
  pages = {688--698},
  title = {{MOROCO: The Moldavian and Romanian Dialectal Corpus}},
  year = {2019},
}

MovieReviewSentimentClassification

The Allociné dataset is a French-language dataset for sentiment analysis that contains movie reviews produced by the online community of the Allociné.fr website.

Dataset: tblard/allocine • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fra Reviews, Written derived found
Citation
@software{blard2020,
  author = {Théophile Blard},
  title = {French sentiment analysis with BERT},
  url = {https://github.com/TheophileBlard/french-sentiment-analysis-with-bert},
  year = {2020},
}

MovieReviewSentimentClassification.v2

The Allociné dataset is a French-language dataset for sentiment analysis that contains movie reviews produced by the online community of the Allociné.fr website. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/movie_review_sentiment • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fra Reviews, Written derived found
Citation
@software{blard2020,
  author = {Théophile Blard},
  title = {French sentiment analysis with BERT},
  url = {https://github.com/TheophileBlard/french-sentiment-analysis-with-bert},
  year = {2020},
}

MultiHateClassification

Hate speech detection dataset with binary (hateful vs non-hateful) labels. Includes 25+ distinct types of hate and challenging non-hate, and 11 languages.

Dataset: mteb/multi-hatecheck • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara, cmn, deu, eng, fra, ... (11) Constructed, Written expert-annotated created
Citation
@inproceedings{rottger-etal-2021-hatecheck,
  abstract = {Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck{'}s utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.},
  address = {Online},
  author = {R{\"o}ttger, Paul  and
Vidgen, Bertie  and
Nguyen, Dong  and
Waseem, Zeerak  and
Margetts, Helen  and
Pierrehumbert, Janet},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  doi = {10.18653/v1/2021.acl-long.4},
  editor = {Zong, Chengqing  and
Xia, Fei  and
Li, Wenjie  and
Navigli, Roberto},
  month = aug,
  pages = {41--58},
  publisher = {Association for Computational Linguistics},
  title = {{H}ate{C}heck: Functional Tests for Hate Speech Detection Models},
  url = {https://aclanthology.org/2021.acl-long.4},
  year = {2021},
}

@inproceedings{rottger-etal-2022-multilingual,
  abstract = {Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC{'}s utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.},
  address = {Seattle, Washington (Hybrid)},
  author = {R{\"o}ttger, Paul  and
Seelawi, Haitham  and
Nozza, Debora  and
Talat, Zeerak  and
Vidgen, Bertie},
  booktitle = {Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)},
  doi = {10.18653/v1/2022.woah-1.15},
  editor = {Narang, Kanika  and
Mostafazadeh Davani, Aida  and
Mathias, Lambert  and
Vidgen, Bertie  and
Talat, Zeerak},
  month = jul,
  pages = {154--169},
  publisher = {Association for Computational Linguistics},
  title = {Multilingual {H}ate{C}heck: Functional Tests for Multilingual Hate Speech Detection Models},
  url = {https://aclanthology.org/2022.woah-1.15},
  year = {2022},
}

MultilingualSentiment

A collection of multilingual sentiment datasets grouped into 3 classes: positive, neutral, negative.

Dataset: C-MTEB/MultilingualSentiment-classification • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified

MultilingualSentiment.v2

A collection of multilingual sentiment datasets grouped into 3 classes: positive, neutral, negative. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/multilingual_sentiment • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified

MultilingualSentimentClassification

Sentiment classification dataset with binary (positive vs negative sentiment) labels. Includes 30 languages and dialects.

Dataset: mteb/multilingual-sentiment-classification • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara, bam, bul, cmn, cym, ... (29) Reviews, Written derived found
Citation
@inproceedings{mollanorozy-etal-2023-cross,
  address = {Dubrovnik, Croatia},
  author = {Mollanorozy, Sepideh  and
Tanti, Marc  and
Nissim, Malvina},
  booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
  doi = {10.18653/v1/2023.sigtyp-1.9},
  editor = {Beinborn, Lisa  and
Goswami, Koustava  and
Murado{\u{g}}lu, Saliha  and
Sorokin, Alexey  and
Kumar, Ritesh  and
Shcherbakov, Andreas  and
Ponti, Edoardo M.  and
Cotterell, Ryan  and
Vylomova, Ekaterina},
  month = may,
  pages = {89--95},
  publisher = {Association for Computational Linguistics},
title = {Cross-lingual Transfer Learning with {P}ersian},
  url = {https://aclanthology.org/2023.sigtyp-1.9},
  year = {2023},
}

MyanmarNews

The Myanmar News dataset on Hugging Face contains news articles in Burmese. It is designed for tasks such as text classification, sentiment analysis, and language modeling. The dataset includes a variety of news topics in 4 categories, providing a rich resource for natural language processing applications involving Burmese, a low-resource language.

Dataset: mteb/MyanmarNews • License: gpl-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mya News, Written derived found
Citation
@inproceedings{Khine2017,
  author = {A. H. Khine and K. T. Nwet and K. M. Soe},
  booktitle = {15th Proceedings of International Conference on Computer Applications},
  month = {February},
  pages = {401--408},
  title = {Automatic Myanmar News Classification},
  year = {2017},
}

MyanmarNews.v2

The Myanmar News dataset on Hugging Face contains news articles in Burmese. It is designed for tasks such as text classification, sentiment analysis, and language modeling. The dataset includes a variety of news topics in 4 categories, providing a rich resource for natural language processing applications involving Burmese, a low-resource language. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/myanmar_news • License: gpl-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy mya News, Written derived found
Citation
@inproceedings{Khine2017,
  author = {A. H. Khine and K. T. Nwet and K. M. Soe},
  booktitle = {15th Proceedings of International Conference on Computer Applications},
  month = {February},
  pages = {401--408},
  title = {Automatic Myanmar News Classification},
  year = {2017},
}

NLPTwitterAnalysisClassification

Twitter Analysis Classification

Dataset: hamedhf/nlp_twitter_analysis • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Social derived found
Citation

NLPTwitterAnalysisClassification.v2

Twitter Analysis Classification This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/nlp_twitter_analysis • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Social derived found
Citation

NYSJudicialEthicsLegalBenchClassification

Answer questions on judicial ethics from the New York State Unified Court System Advisory Committee.

Dataset: nguha/legalbench • License: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

NaijaSenti

NaijaSenti is the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria — Hausa, Igbo, Nigerian-Pidgin, and Yorùbá — consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets.

Dataset: HausaNLP/NaijaSenti-Twitter • License: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy hau, ibo, pcm, yor Social, Written expert-annotated found
Citation
@inproceedings{muhammad-etal-2022-naijasenti,
  address = {Marseille, France},
  author = {Muhammad, Shamsuddeen Hassan  and
Adelani, David Ifeoluwa  and
Ruder, Sebastian  and
Ahmad, Ibrahim Sa{'}id  and
Abdulmumin, Idris  and
Bello, Bello Shehu  and
Choudhury, Monojit  and
Emezue, Chris Chinenye  and
Abdullahi, Saheed Salahudeen  and
Aremu, Anuoluwapo  and
Jorge, Al{\'\i}pio  and
Brazdil, Pavel},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  month = jun,
  pages = {590--602},
  publisher = {European Language Resources Association},
  title = {{N}aija{S}enti: A {N}igerian {T}witter Sentiment Corpus for Multilingual Sentiment Analysis},
  url = {https://aclanthology.org/2022.lrec-1.63},
  year = {2022},
}

NepaliNewsClassification

A Nepali dataset of 7,500 news articles

Dataset: bpHigh/iNLTK_Nepali_News_Dataset • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nep News, Written derived found
Citation
@inproceedings{arora-2020-inltk,
  abstract = {We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic Languages. By using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that by using pre-trained models and data augmentation from iNLTK, we can achieve more than 95{\%} of the previous best performance by using less than 10{\%} of the training data. iNLTK is already being widely used by the community and has 40,000+ downloads, 600+ stars and 100+ forks on GitHub.},
  address = {Online},
  author = {Arora, Gaurav},
  booktitle = {Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)},
  doi = {10.18653/v1/2020.nlposs-1.10},
  editor = {Park, Eunjeong L.  and
Hagiwara, Masato  and
Milajevs, Dmitrijs  and
Liu, Nelson F.  and
Chauhan, Geeticka  and
Tan, Liling},
  month = nov,
  pages = {66--71},
  publisher = {Association for Computational Linguistics},
  title = {i{NLTK}: Natural Language Toolkit for Indic Languages},
  url = {https://aclanthology.org/2020.nlposs-1.10},
  year = {2020},
}

NepaliNewsClassification.v2

A Nepali dataset of 7,500 news articles. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/nepali_news • License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nep News, Written derived found
Citation
@inproceedings{arora-2020-inltk,
  abstract = {We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic Languages. By using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that by using pre-trained models and data augmentation from iNLTK, we can achieve more than 95{\%} of the previous best performance by using less than 10{\%} of the training data. iNLTK is already being widely used by the community and has 40,000+ downloads, 600+ stars and 100+ forks on GitHub.},
  address = {Online},
  author = {Arora, Gaurav},
  booktitle = {Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)},
  doi = {10.18653/v1/2020.nlposs-1.10},
  editor = {Park, Eunjeong L.  and
Hagiwara, Masato  and
Milajevs, Dmitrijs  and
Liu, Nelson F.  and
Chauhan, Geeticka  and
Tan, Liling},
  month = nov,
  pages = {66--71},
  publisher = {Association for Computational Linguistics},
  title = {i{NLTK}: Natural Language Toolkit for Indic Languages},
  url = {https://aclanthology.org/2020.nlposs-1.10},
  year = {2020},
}

NewsClassification

Large News Classification Dataset

Dataset: fancyzhx/ag_news • License: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng News, Written expert-annotated found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  pages = {},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

NewsClassification.v2

Large News Classification Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/news • License: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng News, Written expert-annotated found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  pages = {},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

NoRecClassification

A Norwegian dataset for sentiment classification on reviews

Dataset: mteb/norec_classification • License: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nob Reviews, Written derived found
Citation
@inproceedings{velldal-etal-2018-norec,
  address = {Miyazaki, Japan},
  author = {Velldal, Erik  and
{\O}vrelid, Lilja  and
Bergem, Eivind Alexander  and
Stadsnes, Cathrine  and
Touileb, Samia  and
J{\o}rgensen, Fredrik},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)},
  editor = {Calzolari, Nicoletta  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Hasida, Koiti  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Moreno, Asuncion  and
Odijk, Jan  and
Piperidis, Stelios  and
Tokunaga, Takenobu},
  month = may,
  publisher = {European Language Resources Association (ELRA)},
  title = {{N}o{R}e{C}: The {N}orwegian Review Corpus},
  url = {https://aclanthology.org/L18-1661},
  year = {2018},
}

NoRecClassification.v2

A Norwegian dataset for sentiment classification on reviews. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/no_rec • License: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nob Reviews, Written derived found
Citation
@inproceedings{velldal-etal-2018-norec,
  address = {Miyazaki, Japan},
  author = {Velldal, Erik  and
{\O}vrelid, Lilja  and
Bergem, Eivind Alexander  and
Stadsnes, Cathrine  and
Touileb, Samia  and
J{\o}rgensen, Fredrik},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)},
  editor = {Calzolari, Nicoletta  and
Choukri, Khalid  and
Cieri, Christopher  and
Declerck, Thierry  and
Goggi, Sara  and
Hasida, Koiti  and
Isahara, Hitoshi  and
Maegaard, Bente  and
Mariani, Joseph  and
Mazo, H{\'e}l{\`e}ne  and
Moreno, Asuncion  and
Odijk, Jan  and
Piperidis, Stelios  and
Tokunaga, Takenobu},
  month = may,
  publisher = {European Language Resources Association (ELRA)},
  title = {{N}o{R}e{C}: The {N}orwegian Review Corpus},
  url = {https://aclanthology.org/L18-1661},
  year = {2018},
}

NordicLangClassification

A dataset for Nordic language identification.

Dataset: strombergnlp/nordic_langidLicense: cc-by-sa-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan, fao, isl, nno, nob, ... (6) Encyclopaedic derived found
Citation
@inproceedings{haas-derczynski-2021-discriminating,
  abstract = {Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine learning approach for automatic language identification for the Nordic languages, which often suffer miscategorisation by existing state-of-the-art tools. Concretely we will focus on discrimination between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokm{\aa}l), Faroese and Icelandic.},
  address = {Kiyv, Ukraine},
  author = {Haas, Ren{\'e}  and
Derczynski, Leon},
  booktitle = {Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects},
  editor = {Zampieri, Marcos  and
Nakov, Preslav  and
Ljube{\v{s}}i{\'c}, Nikola  and
Tiedemann, J{\"o}rg  and
Scherrer, Yves  and
Jauhiainen, Tommi},
  month = apr,
  pages = {67--75},
  publisher = {Association for Computational Linguistics},
  title = {Discriminating Between Similar {N}ordic Languages},
  url = {https://aclanthology.org/2021.vardial-1.8},
  year = {2021},
}
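
Each heading in this catalog doubles as an mteb task identifier, so any listed task can be evaluated by name. A minimal sketch, assuming a recent mteb release; the model id is only an example:

import mteb

# Look up the task by the name used as the heading above.
tasks = mteb.get_tasks(tasks=["NordicLangClassification"])

# Any embedding model wrapper works here; this id is illustrative.
model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Runs the task and writes one JSON result file per task under output_folder.
mteb.MTEB(tasks=tasks).run(model, output_folder="results")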

NorwegianParliamentClassification

Norwegian parliament speeches annotated for sentiment

Dataset: NbAiLab/norwegian_parliamentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nob Government, Spoken derived found
Citation
@inproceedings{kummervold-etal-2021-operationalizing,
  abstract = {In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library. The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks for both Norwegian Bokm{\aa}l and Norwegian Nynorsk. Our model also improves the mBERT performance for other languages present in the corpus such as English, Swedish, and Danish. For languages not included in the corpus, the weights degrade moderately while keeping strong multilingual properties. Therefore, we show that building high-quality models within a memory institution using somewhat noisy optical character recognition (OCR) content is feasible, and we hope to pave the way for other memory institutions to follow.},
  address = {Reykjavik, Iceland (Online)},
  author = {Kummervold, Per E  and
De la Rosa, Javier  and
Wetjen, Freddy  and
Brygfjeld, Svein Arne},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Dobnik, Simon  and
{\O}vrelid, Lilja},
  month = may # { 31--2 } # jun,
  pages = {20--29},
  publisher = {Link{\"o}ping University Electronic Press, Sweden},
  title = {Operationalizing a National Digital Library: The Case for a {N}orwegian Transformer Model},
  url = {https://aclanthology.org/2021.nodalida-main.3},
  year = {2021},
}

NorwegianParliamentClassification.v2

Norwegian parliament speeches annotated for sentiment This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/norwegian_parliamentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy nob Government, Spoken derived found
Citation
@inproceedings{kummervold-etal-2021-operationalizing,
  abstract = {In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library. The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks for both Norwegian Bokm{\aa}l and Norwegian Nynorsk. Our model also improves the mBERT performance for other languages present in the corpus such as English, Swedish, and Danish. For languages not included in the corpus, the weights degrade moderately while keeping strong multilingual properties. Therefore, we show that building high-quality models within a memory institution using somewhat noisy optical character recognition (OCR) content is feasible, and we hope to pave the way for other memory institutions to follow.},
  address = {Reykjavik, Iceland (Online)},
  author = {Kummervold, Per E  and
De la Rosa, Javier  and
Wetjen, Freddy  and
Brygfjeld, Svein Arne},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Dobnik, Simon  and
{\O}vrelid, Lilja},
  month = may # { 31--2 } # jun,
  pages = {20--29},
  publisher = {Link{\"o}ping University Electronic Press, Sweden},
  title = {Operationalizing a National Digital Library: The Case for a {N}orwegian Transformer Model},
  url = {https://aclanthology.org/2021.nodalida-main.3},
  year = {2021},
}

NusaParagraphEmotionClassification

NusaParagraphEmotionClassification is a multi-class emotion classification task covering 10 Indonesian languages from the NusaParagraph dataset.

Dataset: gentaiscool/nusaparagraph_emotLicense: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 bbc, bew, bug, jav, mad, ... (10) Fiction, Non-fiction, Written human-annotated found
Citation
@inproceedings{cahyawijaya-etal-2023-nusawrites,
  address = {Nusa Dua, Bali},
  author = {Cahyawijaya, Samuel  and  Lovenia, Holy  and Koto, Fajri  and  Adhista, Dea  and  Dave, Emmanuel  and  Oktavianti, Sarah  and  Akbar, Salsabil  and  Lee, Jhonson  and  Shadieq, Nuur  and  Cenggoro, Tjeng Wawan  and  Linuwih, Hanung  and  Wilie, Bryan  and  Muridan, Galih  and  Winata, Genta  and  Moeljadi, David  and  Aji, Alham Fikri  and  Purwarianti, Ayu  and  Fung, Pascale},
  booktitle = {Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)},
  editor = {Park, Jong C.  and  Arase, Yuki  and  Hu, Baotian  and  Lu, Wei  and  Wijaya, Derry  and  Purwarianti, Ayu  and  Krisnadhi, Adila Alfa},
  month = nov,
  pages = {921--945},
  publisher = {Association for Computational Linguistics},
  title = {NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages},
  url = {https://aclanthology.org/2023.ijcnlp-main.60},
  year = {2023},
}

NusaParagraphTopicClassification

NusaParagraphTopicClassification is a multi-class topic classification task covering 10 Indonesian languages.

Dataset: gentaiscool/nusaparagraph_topicLicense: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 bbc, bew, bug, jav, mad, ... (10) Fiction, Non-fiction, Written human-annotated found
Citation
@inproceedings{cahyawijaya-etal-2023-nusawrites,
  address = {Nusa Dua, Bali},
  author = {Cahyawijaya, Samuel  and  Lovenia, Holy  and Koto, Fajri  and  Adhista, Dea  and  Dave, Emmanuel  and  Oktavianti, Sarah  and  Akbar, Salsabil  and  Lee, Jhonson  and  Shadieq, Nuur  and  Cenggoro, Tjeng Wawan  and  Linuwih, Hanung  and  Wilie, Bryan  and  Muridan, Galih  and  Winata, Genta  and  Moeljadi, David  and  Aji, Alham Fikri  and  Purwarianti, Ayu  and  Fung, Pascale},
  booktitle = {Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)},
  editor = {Park, Jong C.  and  Arase, Yuki  and  Hu, Baotian  and  Lu, Wei  and  Wijaya, Derry  and  Purwarianti, Ayu  and  Krisnadhi, Adila Alfa},
  month = nov,
  pages = {921--945},
  publisher = {Association for Computational Linguistics},
  title = {NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages},
  url = {https://aclanthology.org/2023.ijcnlp-main.60},
  year = {2023},
}

NusaX-senti

NusaX is a high-quality multilingual parallel corpus covering 12 languages: Indonesian, English, and 10 Indonesian local languages, namely Acehnese, Balinese, Banjarese, Buginese, Madurese, Minangkabau, Javanese, Ngaju, Sundanese, and Toba Batak. NusaX-Senti is a 3-label (positive, neutral, negative) sentiment analysis dataset covering the 10 Indonesian local languages plus Indonesian and English.

Dataset: indonlp/NusaX-sentiLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ace, ban, bbc, bjn, bug, ... (12) Constructed, Reviews, Social, Web, Written expert-annotated found
Citation
@misc{winata2022nusax,
  archiveprefix = {arXiv},
  author = {Winata, Genta Indra and Aji, Alham Fikri and Cahyawijaya,
Samuel and Mahendra, Rahmad and Koto, Fajri and Romadhony,
Ade and Kurniawan, Kemal and Moeljadi, David and Prasojo,
Radityo Eko and Fung, Pascale and Baldwin, Timothy and Lau,
Jey Han and Sennrich, Rico and Ruder, Sebastian},
  eprint = {2205.15960},
  primaryclass = {cs.CL},
  title = {NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages},
  year = {2022},
}
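
NusaX-Senti is distributed with one configuration per language. A minimal loading sketch, assuming the configs are named by the language codes shown in the table above (e.g. ace for Acehnese):

from datasets import load_dataset

# Load the Acehnese configuration; swap the config name for any of the 12 languages.
ds = load_dataset("indonlp/NusaX-senti", "ace")

# Each example pairs a text with a positive/neutral/negative sentiment label.
print(ds["train"][0])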

OPP115DataRetentionLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes how long user information is stored.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115DataSecurityLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes how user information is protected.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115DataSecurityLegalBenchClassification.v2

Given a clause from a privacy policy, classify if the clause describes how user information is protected. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/opp115_data_security_legal_benchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115DoNotTrackLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes if and how Do Not Track signals for online tracking and advertising are honored.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115DoNotTrackLegalBenchClassification.v2

Given a clause from a privacy policy, classify if the clause describes if and how Do Not Track signals for online tracking and advertising are honored. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/opp115_do_not_track_legal_benchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115FirstPartyCollectionUseLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes how and why a service provider collects user information.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115InternationalAndSpecificAudiencesLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes practices that pertain only to a specific group of users (e.g., children, Europeans, or California residents).

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115PolicyChangeLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes if and how users will be informed about changes to the privacy policy.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115ThirdPartySharingCollectionLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes how user information may be shared with or collected by third parties.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115UserAccessEditAndDeletionLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes if and how users may access, edit, or delete their information.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115UserChoiceControlLegalBenchClassification

Given a clause from a privacy policy, classify if the clause describes the choices and control options available to users.

Dataset: nguha/legalbenchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OPP115UserChoiceControlLegalBenchClassification.v2

Given a clause from a privacy policy, classify if the clause describes the choices and control options available to users. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/opp115_user_choice_control_legal_benchLicense: cc-by-nc-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{wilson2016creation,
  author = {Wilson, Shomir and Schaub, Florian and Dara, Aswarth Abhilash and Liu, Frederick and Cherivirala, Sushain and Leon, Pedro Giovanni and Andersen, Mads Schaarup and Zimmeck, Sebastian and Sathyendra, Kanthashree Mysore and Russell, N Cameron and others},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages = {1330--1340},
  title = {The creation and analysis of a website privacy policy corpus},
  year = {2016},
}

OdiaNewsClassification

An Odia dataset for 3-class classification of Odia news articles

Dataset: mlexplorer008/odia_news_classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ory News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

OdiaNewsClassification.v2

An Odia dataset for 3-class classification of Odia news articles. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/odia_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 ory News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

OnlineShopping

Sentiment Analysis of User Reviews on Online Shopping Websites

Dataset: C-MTEB/OnlineShopping-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@article{xiao2023c,
  author = {Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas},
  journal = {arXiv preprint arXiv:2309.07597},
  title = {C-pack: Packaged resources to advance general chinese embedding},
  year = {2023},
}

OnlineStoreReviewSentimentClassification

This dataset contains Arabic reviews of products from the SHEIN online store.

Dataset: Ruqiya/Arabic_Reviews_of_SHEINLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found

OnlineStoreReviewSentimentClassification.v2

This dataset contains Arabic reviews of products from the SHEIN online store. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/online_store_review_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found

OralArgumentQuestionPurposeLegalBenchClassification

This task classifies questions asked by Supreme Court justices at oral argument into seven categories:

1. Background - questions seeking factual or procedural information that is missing or not clear in the briefing
2. Clarification - questions seeking to get an advocate to clarify her position or the scope of the rule being advocated for
3. Implications - questions about the limits of a rule or its implications for future cases
4. Support - questions offering support for the advocate’s position
5. Criticism - questions criticizing an advocate’s position
6. Communicate - questions designed primarily to communicate with other justices
7. Humor - questions designed to interject humor into the argument and relieve tension

A hypothetical label mapping for these categories is sketched after the citation below.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
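
As referenced above, a hypothetical mapping of the seven question-purpose categories to label indices; the authoritative order should be read from the dataset's label feature rather than assumed:

# Hypothetical label space for the question-purpose taxonomy above.
# The real index order is defined by the dataset; treat this as illustrative.
QUESTION_PURPOSE_LABELS = {
    0: "Background",
    1: "Clarification",
    2: "Implications",
    3: "Support",
    4: "Criticism",
    5: "Communicate",
    6: "Humor",
}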

OralArgumentQuestionPurposeLegalBenchClassification.v2

This task classifies questions asked by Supreme Court justices at oral argument into seven categories:

1. Background - questions seeking factual or procedural information that is missing or not clear in the briefing
2. Clarification - questions seeking to get an advocate to clarify her position or the scope of the rule being advocated for
3. Implications - questions about the limits of a rule or its implications for future cases
4. Support - questions offering support for the advocate’s position
5. Criticism - questions criticizing an advocate’s position
6. Communicate - questions designed primarily to communicate with other justices
7. Humor - questions designed to interject humor into the argument and relieve tension

This version corrects errors found in the original data. For details, see pull request: https://github.com/embeddings-benchmark/mteb/pull/2900

Dataset: mteb/oral_argument_question_purpose_legal_benchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

OverrulingLegalBenchClassification

This task consists of classifying whether or not a particular sentence of case law overturns the decision of a previous case.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{zheng2021does,
  author = {Zheng, Lucia and Guha, Neel and Anderson, Brandon R and Henderson, Peter and Ho, Daniel E},
  booktitle = {Proceedings of the eighteenth international conference on artificial intelligence and law},
  pages = {159--168},
  title = {When does pretraining help? assessing self-supervised learning for law and the casehold dataset of 53,000+ legal holdings},
  year = {2021},
}

OverrulingLegalBenchClassification.v2

This task consists of classifying whether or not a particular sentence of case law overturns the decision of a previous case. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/overruling_legal_benchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@inproceedings{zheng2021does,
  author = {Zheng, Lucia and Guha, Neel and Anderson, Brandon R and Henderson, Peter and Ho, Daniel E},
  booktitle = {Proceedings of the eighteenth international conference on artificial intelligence and law},
  pages = {159--168},
  title = {When does pretraining help? assessing self-supervised learning for law and the casehold dataset of 53,000+ legal holdings},
  year = {2021},
}

PAC

Polish Abusive Clauses (PAC) dataset for classifying potentially abusive clauses in Polish legal agreements

Dataset: laugustyniak/abusive-clauses-plLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Legal, Written not specified not specified
Citation
@misc{augustyniak2022waydesigningcompilinglepiszcze,
  archiveprefix = {arXiv},
  author = {Łukasz Augustyniak and Kamil Tagowski and Albert Sawczyn and Denis Janiak and Roman Bartusiak and Adrian Szymczak and Marcin Wątroba and Arkadiusz Janz and Piotr Szymański and Mikołaj Morzy and Tomasz Kajdanowicz and Maciej Piasecki},
  eprint = {2211.13112},
  primaryclass = {cs.CL},
  title = {This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish},
  url = {https://arxiv.org/abs/2211.13112},
  year = {2022},
}

PAC.v2

Polish Abusive Clauses (PAC) dataset for classifying potentially abusive clauses in Polish legal agreements. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/pacLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Legal, Written not specified not specified
Citation
@misc{augustyniak2022waydesigningcompilinglepiszcze,
  archiveprefix = {arXiv},
  author = {Łukasz Augustyniak and Kamil Tagowski and Albert Sawczyn and Denis Janiak and Roman Bartusiak and Adrian Szymczak and Marcin Wątroba and Arkadiusz Janz and Piotr Szymański and Mikołaj Morzy and Tomasz Kajdanowicz and Maciej Piasecki},
  eprint = {2211.13112},
  primaryclass = {cs.CL},
  title = {This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish},
  url = {https://arxiv.org/abs/2211.13112},
  year = {2022},
}

PROALegalBenchClassification

Given a statute, determine if the text contains an explicit private right of action. A private right of action (PROA) exists when a statute empowers an ordinary individual (i.e., a private person) to legally enforce their rights by bringing an action in court. In short, a PROA creates the ability for an individual to sue someone in order to recover damages or halt some offending conduct. PROAs are ubiquitous in antitrust law (in which individuals harmed by anti-competitive behavior can sue offending firms for compensation) and environmental law (in which individuals can sue entities which release hazardous substances for damages).

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

PatentClassification

A classification dataset of patents and their abstracts

Dataset: mteb/PatentClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written derived found
Citation
@inproceedings{sharma-etal-2019-bigpatent,
  abstract = {Most existing text summarization datasets are compiled from the news domain, where summaries have a flattened discourse structure. In such datasets, summary-worthy content often appears in the beginning of input articles. Moreover, large segments from input articles are present verbatim in their respective summaries. These issues impede the learning and evaluation of systems that can understand an article{'}s global content structure as well as produce abstractive summaries with high compression ratio. In this work, we present a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human written abstractive summaries. Compared to existing summarization datasets, BIGPATENT has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) lesser and shorter extractive fragments are present in the summaries. Finally, we train and evaluate baselines and popular learning models on BIGPATENT to shed light on new challenges and motivate future directions for summarization research.},
  address = {Florence, Italy},
  author = {Sharma, Eva  and
Li, Chen  and
Wang, Lu},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/P19-1212},
  editor = {Korhonen, Anna  and
Traum, David  and
M{\`a}rquez, Llu{\'\i}s},
  month = jul,
  pages = {2204--2213},
  publisher = {Association for Computational Linguistics},
  title = {{BIGPATENT}: A Large-Scale Dataset for Abstractive and Coherent Summarization},
  url = {https://aclanthology.org/P19-1212},
  year = {2019},
}

PatentClassification.v2

A classification dataset of patents and their abstracts. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/patentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written derived found
Citation
@inproceedings{sharma-etal-2019-bigpatent,
  abstract = {Most existing text summarization datasets are compiled from the news domain, where summaries have a flattened discourse structure. In such datasets, summary-worthy content often appears in the beginning of input articles. Moreover, large segments from input articles are present verbatim in their respective summaries. These issues impede the learning and evaluation of systems that can understand an article{'}s global content structure as well as produce abstractive summaries with high compression ratio. In this work, we present a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human written abstractive summaries. Compared to existing summarization datasets, BIGPATENT has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) lesser and shorter extractive fragments are present in the summaries. Finally, we train and evaluate baselines and popular learning models on BIGPATENT to shed light on new challenges and motivate future directions for summarization research.},
  address = {Florence, Italy},
  author = {Sharma, Eva  and
Li, Chen  and
Wang, Lu},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  doi = {10.18653/v1/P19-1212},
  editor = {Korhonen, Anna  and
Traum, David  and
M{\`a}rquez, Llu{\'\i}s},
  month = jul,
  pages = {2204--2213},
  publisher = {Association for Computational Linguistics},
  title = {{BIGPATENT}: A Large-Scale Dataset for Abstractive and Coherent Summarization},
  url = {https://aclanthology.org/P19-1212},
  year = {2019},
}

PersianFoodSentimentClassification

Persian Food Review Dataset

Dataset: asparius/Persian-Food-SentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Reviews, Written derived found
Citation
@article{ParsBERT,
  author = {Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
  journal = {ArXiv},
  title = {ParsBERT: Transformer-based Model for Persian Language Understanding},
  volume = {abs/2005.12515},
  year = {2020},
}

PersianTextEmotion

A Persian text dataset annotated with six basic emotions: anger, fear, joy, love, sadness, and surprise.

Dataset: SeyedAli/Persian-Text-EmotionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas not specified derived found

PersianTextEmotion.v2

A Persian text dataset annotated with six basic emotions: anger, fear, joy, love, sadness, and surprise. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/persian_text_emotionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas not specified derived found

PersonalJurisdictionLegalBenchClassification

Given a fact pattern describing the set of contacts between a plaintiff, defendant, and forum, determine if a court in that forum could exercise personal jurisdiction over the defendant.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

PoemSentimentClassification

Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg.

Dataset: google-research-datasets/poem_sentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written human-annotated found
Citation
@misc{sheng2020investigating,
  archiveprefix = {arXiv},
  author = {Emily Sheng and David Uthus},
  eprint = {2011.02686},
  primaryclass = {cs.CL},
  title = {Investigating Societal Biases in a Poetry Composition System},
  year = {2020},
}

PoemSentimentClassification.v2

Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/poem_sentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written human-annotated found
Citation
@misc{sheng2020investigating,
  archiveprefix = {arXiv},
  author = {Emily Sheng and David Uthus},
  eprint = {2011.02686},
  primaryclass = {cs.CL},
  title = {Investigating Societal Biases in a Poetry Composition System},
  year = {2020},
}

PolEmo2.0-IN

A collection of Polish online reviews from four domains: medicine, hotels, products and school. The PolEmo2.0-IN task is to predict the sentiment of in-domain (medicine and hotels) reviews.

Dataset: PL-MTEB/polemo2_inLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Social, Written derived found
Citation
@inproceedings{kocon-etal-2019-multi,
  abstract = {In this article we present an extended version of PolEmo {--} a corpus of consumer reviews from 4 domains: medicine, hotels, products and school. Current version (PolEmo 2.0) contains 8,216 reviews having 57,466 sentences. Each text and sentence was manually annotated with sentiment in 2+1 scheme, which gives a total of 197,046 annotations. We obtained a high value of Positive Specific Agreement, which is 0.91 for texts and 0.88 for sentences. PolEmo 2.0 is publicly available under a Creative Commons copyright license. We explored recent deep learning approaches for the recognition of sentiment, such as Bi-directional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).},
  address = {Hong Kong, China},
  author = {Koco{\'n}, Jan  and
Mi{\l}kowski, Piotr  and
Za{\'s}ko-Zieli{\'n}ska, Monika},
  booktitle = {Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)},
  doi = {10.18653/v1/K19-1092},
  month = nov,
  pages = {980--991},
  publisher = {Association for Computational Linguistics},
  title = {Multi-Level Sentiment Analysis of {P}ol{E}mo 2.0: Extended Corpus of Multi-Domain Consumer Reviews},
  url = {https://aclanthology.org/K19-1092},
  year = {2019},
}

PolEmo2.0-IN.v2

A collection of Polish online reviews from four domains: medicine, hotels, products and school. The PolEmo2.0-IN task is to predict the sentiment of in-domain (medicine and hotels) reviews.

Dataset: mteb/pol_emo2_inLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Social, Written derived found
Citation
@inproceedings{kocon-etal-2019-multi,
  abstract = {In this article we present an extended version of PolEmo {--} a corpus of consumer reviews from 4 domains: medicine, hotels, products and school. Current version (PolEmo 2.0) contains 8,216 reviews having 57,466 sentences. Each text and sentence was manually annotated with sentiment in 2+1 scheme, which gives a total of 197,046 annotations. We obtained a high value of Positive Specific Agreement, which is 0.91 for texts and 0.88 for sentences. PolEmo 2.0 is publicly available under a Creative Commons copyright license. We explored recent deep learning approaches for the recognition of sentiment, such as Bi-directional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).},
  address = {Hong Kong, China},
  author = {Koco{\'n}, Jan  and
Mi{\l}kowski, Piotr  and
Za{\'s}ko-Zieli{\'n}ska, Monika},
  booktitle = {Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)},
  doi = {10.18653/v1/K19-1092},
  month = nov,
  pages = {980--991},
  publisher = {Association for Computational Linguistics},
  title = {Multi-Level Sentiment Analysis of {P}ol{E}mo 2.0: Extended Corpus of Multi-Domain Consumer Reviews},
  url = {https://aclanthology.org/K19-1092},
  year = {2019},
}

PolEmo2.0-OUT

A collection of Polish online reviews from four domains: medicine, hotels, products and school. The PolEmo2.0-OUT task is to predict the sentiment of out-of-domain (products and school) reviews using models trained on reviews from the medicine and hotels domains.

Dataset: PL-MTEB/polemo2_outLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Social, Written not specified not specified
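
The IN/OUT split makes this a domain-transfer benchmark: a classifier is fit on in-domain reviews and scored on out-of-domain ones. Below is a minimal sketch of that protocol using embeddings and logistic regression; the split names and the "text"/"label" field names are assumptions, so check the dataset cards before relying on them.

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Split and column names ("train", "test", "text", "label") are assumed.
train = load_dataset("PL-MTEB/polemo2_in", split="train")
test = load_dataset("PL-MTEB/polemo2_out", split="test")

model = SentenceTransformer("intfloat/multilingual-e5-small")  # illustrative
X_train = model.encode(train["text"])
X_test = model.encode(test["text"])

clf = LogisticRegression(max_iter=1000).fit(X_train, train["label"])
print("out-of-domain accuracy:", clf.score(X_test, test["label"]))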

PolEmo2.0-OUT.v2

A collection of Polish online reviews from four domains: medicine, hotels, products and school. The PolEmo2.0-OUT task is to predict the sentiment of out-of-domain (products and school) reviews using models trained on reviews from the medicine and hotels domains.

Dataset: mteb/pol_emo2_outLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pol Social, Written not specified not specified

PunjabiNewsClassification

A Punjabi dataset for 2-class classification of news articles.

Dataset: mlexplorer008/punjabi_news_classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy pan News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

RestaurantReviewSentimentClassification

Dataset of 8,364 Arabic restaurant reviews from qaym.com for sentiment analysis.

Dataset: hadyelsahar/ar_res_reviewsLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found
Citation
@inproceedings{elsahar2015building,
  author = {ElSahar, Hady and El-Beltagy, Samhaa R},
  booktitle = {International conference on intelligent text processing and computational linguistics},
  organization = {Springer},
  pages = {23--34},
  title = {Building large arabic multi-domain resources for sentiment analysis},
  year = {2015},
}

RestaurantReviewSentimentClassification.v2

Dataset of 8,156 Arabic restaurant reviews from qaym.com for sentiment analysis. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/restaurant_review_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Reviews, Written derived found
Citation
@inproceedings{elsahar2015building,
  author = {ElSahar, Hady and El-Beltagy, Samhaa R},
  booktitle = {International conference on intelligent text processing and computational linguistics},
  organization = {Springer},
  pages = {23--34},
  title = {Building large arabic multi-domain resources for sentiment analysis},
  year = {2015},
}

RomanianReviewsSentiment

LaRoSeDa (A Large Romanian Sentiment Data Set) contains 15,000 reviews written in Romanian.

Dataset: universityofbucharest/larosedaLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron Reviews, Written derived found
Citation
@article{tache2101clustering,
  author = {Anca Maria Tache and Mihaela Gaman and Radu Tudor Ionescu},
  journal = {ArXiv},
  title = {Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa -- A Large Romanian Sentiment Data Set},
  year = {2021},
}

RomanianReviewsSentiment.v2

LaRoSeDa (A Large Romanian Sentiment Data Set) contains 15,000 reviews written in Romanian. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/romanian_reviews_sentimentLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron Reviews, Written derived found
Citation
@article{tache2101clustering,
  author = {Anca Maria Tache and Mihaela Gaman and Radu Tudor Ionescu},
  journal = {ArXiv},
  title = {Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa -- A Large Romanian Sentiment Data Set},
  year = {2021},
}

RomanianSentimentClassification

A Romanian dataset for sentiment classification.

Dataset: dumitrescustefan/ro_sentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron Reviews, Written human-annotated found
Citation
@article{dumitrescu2020birth,
  author = {Dumitrescu, Stefan Daniel and Avram, Andrei-Marius and Pyysalo, Sampo},
  journal = {arXiv preprint arXiv:2009.08712},
  title = {The birth of Romanian BERT},
  year = {2020},
}

RomanianSentimentClassification.v2

A Romanian dataset for sentiment classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/romanian_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ron Reviews, Written human-annotated found
Citation
@article{dumitrescu2020birth,
  author = {Dumitrescu, Stefan Daniel and Avram, Andrei-Marius and Pyysalo, Sampo},
  journal = {arXiv preprint arXiv:2009.08712},
  title = {The birth of Romanian BERT},
  year = {2020},
}

RuNLUIntentClassification

Natural language data for human-robot interaction in the home domain, collected and annotated for evaluating NLU services/platforms.

Dataset: mteb/RuNLUIntentClassificationLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified human-annotated found
Citation
@misc{liu2019benchmarkingnaturallanguageunderstanding,
  archiveprefix = {arXiv},
  author = {Xingkun Liu and Arash Eshghi and Pawel Swietojanski and Verena Rieser},
  eprint = {1903.05566},
  primaryclass = {cs.CL},
  title = {Benchmarking Natural Language Understanding Services for building Conversational Agents},
  url = {https://arxiv.org/abs/1903.05566},
  year = {2019},
}

RuReviewsClassification

Product review classification (3-point scale) based on the RuReviews dataset.

Dataset: ai-forever/ru-reviews-classificationLicense: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Reviews, Written derived found
Citation
@inproceedings{Smetanin-SA-2019,
  author = {Sergey Smetanin and Michail Komarov},
  booktitle = {2019 IEEE 21st Conference on Business Informatics (CBI)},
  doi = {10.1109/CBI.2019.00062},
  issn = {2378-1963},
  month = {July},
  number = {},
  pages = {482-486},
  title = {Sentiment Analysis of Product Reviews in Russian using Convolutional Neural Networks},
  volume = {01},
  year = {2019},
}

RuReviewsClassification.v2

Product review classification (3-point scale) based on the RuReviews dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/ru_reviewsLicense: apache-2.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Reviews, Written derived found
Citation
@inproceedings{Smetanin-SA-2019,
  author = {Sergey Smetanin and Michail Komarov},
  booktitle = {2019 IEEE 21st Conference on Business Informatics (CBI)},
  doi = {10.1109/CBI.2019.00062},
  issn = {2378-1963},
  month = {July},
  number = {},
  pages = {482-486},
  title = {Sentiment Analysis of Product Reviews in Russian using Convolutional Neural Networks},
  volume = {01},
  year = {2019},
}

RuSciBenchCoreRiscClassification

This binary classification task aims to determine whether a scientific paper (based on its title and abstract) belongs to the Core of the Russian Science Citation Index (RISC). The RISC includes a wide range of publications, but the Core RISC comprises the most cited and prestigious journals, dissertations, theses, monographs, and studies. The task is provided for both Russian and English versions of the paper's title and abstract.

Dataset: mlsa-iai-msu-lab/ru_sci_bench_mtebLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng, rus Academic, Non-fiction, Written derived found
Citation
@article{vatolin2024ruscibench,
  author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
  doi = {10.1134/S1064562424602191},
  issn = {1531-8362},
  journal = {Doklady Mathematics},
  month = {12},
  number = {1},
  pages = {S251--S260},
  title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
  url = {https://doi.org/10.1134/S1064562424602191},
  volume = {110},
  year = {2024},
}
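
Since the RuSciBench tasks classify papers from their title and abstract, the two fields are typically concatenated into a single input string before embedding. A toy sketch of that preprocessing step, with hypothetical field names:

from sentence_transformers import SentenceTransformer

# "title" and "abstract" are hypothetical field names for illustration.
paper = {"title": "RuSciBench", "abstract": "An open benchmark for scientific documents."}
text = paper["title"] + " " + paper["abstract"]

model = SentenceTransformer("intfloat/multilingual-e5-small")  # illustrative
embedding = model.encode(text)
print(embedding.shape)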

RuSciBenchGRNTIClassification

Classification of scientific papers (title+abstract) by rubric.

Dataset: ai-forever/ru-scibench-grnti-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Academic, Written derived found

RuSciBenchGRNTIClassification.v2

Classification of scientific papers based on the GRNTI (State Rubricator of Scientific and Technical Information) rubricator. GRNTI is a universal hierarchical classification of knowledge domains adopted in Russia and the CIS countries to systematize the entire flow of scientific and technical information. This task uses the first level of the GRNTI hierarchy and the top 28 classes by frequency. In this version, English language support has been added and the data partitioning has been slightly modified.

Dataset: mlsa-iai-msu-lab/ru_sci_bench_mtebLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng, rus Academic, Non-fiction, Written derived found
Citation
@article{vatolin2024ruscibench,
  author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
  doi = {10.1134/S1064562424602191},
  issn = {1531-8362},
  journal = {Doklady Mathematics},
  month = {12},
  number = {1},
  pages = {S251--S260},
  title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
  url = {https://doi.org/10.1134/S1064562424602191},
  volume = {110},
  year = {2024},
}

RuSciBenchOECDClassification

Classification of scientific papers (title+abstract) by rubric.

Dataset: ai-forever/ru-scibench-oecd-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy rus Academic, Written derived found

RuSciBenchOECDClassification.v2

Classification of scientific papers based on the OECD (Organisation for Economic Co-operation and Development) rubricator. OECD provides a hierarchical 3-level system of classes for labeling scientific articles. This task uses the first two levels of the OECD hierarchy (top 29 classes). In this version, English language support has been added and the data partitioning has been slightly modified.

Dataset: mlsa-iai-msu-lab/ru_sci_bench_mtebLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng, rus Academic, Non-fiction, Written derived found
Citation
@article{vatolin2024ruscibench,
  author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
  doi = {10.1134/S1064562424602191},
  issn = {1531-8362},
  journal = {Doklady Mathematics},
  month = {12},
  number = {1},
  pages = {S251--S260},
  title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
  url = {https://doi.org/10.1134/S1064562424602191},
  volume = {110},
  year = {2024},
}

RuSciBenchPubTypeClassification

This task involves classifying scientific papers (based on their title and abstract) into different publication types. The dataset identifies the following types: 'Article', 'Conference proceedings', 'Survey', 'Miscellanea', 'Short message', 'Review', and 'Personalia'. This task is available for both Russian and English versions of the paper's title and abstract.

Dataset: mlsa-iai-msu-lab/ru_sci_bench_mtebLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng, rus Academic, Non-fiction, Written derived found
Citation
@article{vatolin2024ruscibench,
  author = {Vatolin, A. and Gerasimenko, N. and Ianina, A. and Vorontsov, K.},
  doi = {10.1134/S1064562424602191},
  issn = {1531-8362},
  journal = {Doklady Mathematics},
  month = {12},
  number = {1},
  pages = {S251--S260},
  title = {RuSciBench: Open Benchmark for Russian and English Scientific Document Representations},
  url = {https://doi.org/10.1134/S1064562424602191},
  volume = {110},
  year = {2024},
}

RuToxicOKMLCUPClassification

Users of the Odnoklassniki social network post a huge number of comments of widely varying topic and tone every day; the task is to classify toxic comments.

Dataset: mteb/RuToxicOKMLCUPClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified derived found

RuToxicOKMLCUPClassification.v2

Users of the Odnoklassniki social network post a huge number of comments of widely varying topic and tone every day; the task is to classify toxic comments. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/ru_toxic_okmlcupLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified derived found

RuToxicOKMLCUPMultilabelClassification

Users of the Odnoklassniki social network post a huge number of comments of widely varying topic and tone every day; the task is multilabel classification of toxic comments.

Dataset: mteb/RuToxicOKMLCUPClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified derived found

SCDBPAccountabilityLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose whether the retail seller or manufacturer maintains internal compliance procedures on company standards regarding human trafficking and slavery? This includes any type of internal accountability mechanism. Merely requiring suppliers to comply with laws does not qualify, nor does asking for documentary evidence of compliance.'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}
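
The SCDB* tasks that follow all share this structure: each coding criterion is a separate yes/no task over the same pool of supply-chain disclosures. Below is a sketch of loading one criterion directly from LegalBench; the configuration name is an assumption inferred from the task name, so verify it against the dataset card.

from datasets import load_dataset

# Each LegalBench coding criterion is its own configuration; the name here
# is an assumption inferred from "SCDBPAccountability".
ds = load_dataset(
    "nguha/legalbench",
    "supply_chain_disclosure_best_practice_accountability",
    split="test",
)
print(ds[0])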

SCDBPAuditsLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose whether the retail seller or manufacturer performs any type of audit, or reserves the right to audit?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDBPCertificationLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose whether the retail seller or manufacturer requires direct suppliers to certify that they comply with laws regarding slavery and human trafficking?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDBPTrainingLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose whether the retail seller or manufacturer provides training to employees on human trafficking and slavery? Broad policies such as ongoing dialogue on mitigating risks of human trafficking and slavery or increasing managers' and purchasers' knowledge about health, safety and labor practices qualify as training. Providing training to contractors who failed to comply with human trafficking laws counts as training.'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDBPVerificationLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose whether the retail seller or manufacturer engages in verification and auditing as one practice, expresses that it may conduct an audit, or expresses that it is assessing supplier risks through a review of the US Dept. of Labor's List?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDDAccountabilityLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose to what extent, if any, that the retail seller or manufacturer maintains internal accountability standards and procedures for employees or contractors failing to meet company standards regarding slavery and trafficking?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDDAuditsLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose to what extent, if any, that the retail seller or manufacturer conducts audits of suppliers to evaluate supplier compliance with company standards for trafficking and slavery in supply chains? The disclosure shall specify if the verification was not an independent, unannounced audit.'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDDCertificationLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose to what extent, if any, that the retail seller or manufacturer requires direct suppliers to certify that materials incorporated into the product comply with the laws regarding slavery and human trafficking of the country or countries in which they are doing business?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDDTrainingLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose to what extent, if any, that the retail seller or manufacturer provides company employees and management, who have direct responsibility for supply chain management, training on human trafficking and slavery, particularly with respect to mitigating risks within the supply chains of products?'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SCDDVerificationLegalBenchClassification

This is a binary classification task in which the LLM must determine if a supply chain disclosure meets the following coding criteria: 'Does the above statement disclose to what extent, if any, that the retail seller or manufacturer engages in verification of product supply chains to evaluate and address risks of human trafficking and slavery? If the company conducts verification, the disclosure shall specify if the verification was not conducted by a third party.'

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@article{chilton2017limitations,
  author = {Chilton, Adam S and Sarfaty, Galit A},
  journal = {Stan. J. Int'l L.},
  pages = {1},
  publisher = {HeinOnline},
  title = {The limitations of supply chain disclosure regimes},
  volume = {53},
  year = {2017},
}

@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

SDSEyeProtectionClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/SDSEyeProtectionClassificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry LM-generated and reviewed created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

@inproceedings{pereira2020msds,
  author = {Pereira, Eliseu},
  booktitle = {15th Doctoral Symposium},
  pages = {42},
  title = {MSDS-OPP: Operator Procedures Prediction in Material Safety Data Sheets},
  year = {2020},
}

SDSEyeProtectionClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sds_eye_protectionLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry LM-generated and reviewed created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

@inproceedings{pereira2020msds,
  author = {Pereira, Eliseu},
  booktitle = {15th Doctoral Symposium},
  pages = {42},
  title = {MSDS-OPP: Operator Procedures Prediction in Material Safety Data Sheets},
  year = {2020},
}

SDSGlovesClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/SDSGlovesClassificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry LM-generated and reviewed created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

@inproceedings{pereira2020msds,
  author = {Pereira, Eliseu},
  booktitle = {15th Doctoral Symposium},
  pages = {42},
  title = {MSDS-OPP: Operator Procedures Prediction in Material Safety Data Sheets},
  year = {2020},
}

SDSGlovesClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sds_glovesLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry LM-generated and reviewed created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

@inproceedings{pereira2020msds,
  author = {Pereira, Eliseu},
  booktitle = {15th Doctoral Symposium},
  pages = {42},
  title = {MSDS-OPP: Operator Procedures Prediction in Material Safety Data Sheets},
  year = {2020},
}

SIB200Classification

SIB-200 is the largest publicly available topic classification dataset, based on Flores-200 and covering 205 languages and dialects. The dataset is annotated in English for the topics science/technology, travel, politics, sports, health, entertainment, and geography; the labels are then transferred to the other languages in Flores-200, which are human-translated.

Dataset: mteb/sib200License: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ace, acm, acq, aeb, afr, ... (197) News, Written expert-annotated human-translated and localized
Citation
@article{adelani2023sib,
  author = {Adelani, David Ifeoluwa and Liu, Hannah and Shen, Xiaoyu and Vassilyev, Nikita and Alabi, Jesujoba O and Mao, Yanke and Gao, Haonan and Lee, Annie En-Shiun},
  journal = {arXiv preprint arXiv:2309.07445},
  title = {SIB-200: A simple, inclusive, and big evaluation dataset for topic classification in 200+ languages and dialects},
  year = {2023},
}
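
Because the topic labels are shared across all human-translated Flores-200 languages, each language is exposed as its own configuration. A minimal loading sketch; the configuration name ("eng_Latn") and the "text"/"category" field names follow Flores-200 conventions but are assumptions here.

from datasets import load_dataset

# Config and field names are illustrative guesses; see the dataset card.
ds = load_dataset("mteb/sib200", "eng_Latn", split="test")
print(ds[0]["text"], "->", ds[0]["category"])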

SIDClassification

SID Classification

Dataset: MCINext/sid-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Academic derived found
Citation

SIDClassification.v2

SID Classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sidLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Academic derived found
Citation

SanskritShlokasClassification

This dataset contains ~500 shlokas.

Dataset: bpHigh/iNLTK_Sanskrit_Shlokas_DatasetLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy san Religious, Written derived found
Citation
@inproceedings{arora-2020-inltk,
  abstract = {We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic Languages. By using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that by using pre-trained models and data augmentation from iNLTK, we can achieve more than 95{\%} of the previous best performance by using less than 10{\%} of the training data. iNLTK is already being widely used by the community and has 40,000+ downloads, 600+ stars and 100+ forks on GitHub.},
  address = {Online},
  author = {Arora, Gaurav},
  booktitle = {Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)},
  doi = {10.18653/v1/2020.nlposs-1.10},
  editor = {Park, Eunjeong L.  and
Hagiwara, Masato  and
Milajevs, Dmitrijs  and
Liu, Nelson F.  and
Chauhan, Geeticka  and
Tan, Liling},
  month = nov,
  pages = {66--71},
  publisher = {Association for Computational Linguistics},
  title = {i{NLTK}: Natural Language Toolkit for Indic Languages},
  url = {https://aclanthology.org/2020.nlposs-1.10},
  year = {2020},
}

SardiStanceClassification

SardiStance is a unique dataset designed for the task of stance detection in Italian tweets. It consists of tweets related to the Sardines movement, providing a valuable resource for researchers and practitioners in the field of NLP.

Dataset: MattiaSangermano/SardiStanceLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ita Social derived found
Citation
@inproceedings{cignarella2020sardistance,
  author = {Cignarella, Alessandra Teresa and Lai, Mirko and Bosco, Cristina and Patti, Viviana and Rosso, Paolo and others},
  booktitle = {CEUR WORKSHOP PROCEEDINGS},
  organization = {Ceur},
  pages = {1--10},
  title = {Sardistance@ evalita2020: Overview of the task on stance detection in italian tweets},
  year = {2020},
}

ScalaClassification

ScaLa is a linguistic acceptability dataset for the mainland Scandinavian languages, automatically constructed from dependency annotations in Universal Dependencies Treebanks. Published as part of 'ScandEval: A Benchmark for Scandinavian Natural Language Processing'.

Dataset: mteb/multilingual-scala-classificationLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan, nno, nob, swe Blog, Fiction, News, Non-fiction, Spoken, ... (7) human-annotated created
Citation
@inproceedings{nielsen-2023-scandeval,
  address = {T{\'o}rshavn, Faroe Islands},
  author = {Nielsen, Dan},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Alum{\"a}e, Tanel  and
Fishel, Mark},
  month = may,
  pages = {185--201},
  publisher = {University of Tartu Library},
  title = {{S}cand{E}val: A Benchmark for {S}candinavian Natural Language Processing},
  url = {https://aclanthology.org/2023.nodalida-1.20},
  year = {2023},
}
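
"Automatically constructed" means no human wrote the unacceptable sentences: treebank sentences serve as positive examples, and negatives are produced by corrupting them. The toy sketch below illustrates the idea with a simple adjacent-word swap; the real construction is driven by the Universal Dependencies annotations, so this is only a schematic stand-in.

import random

# Schematic corruption: swap two adjacent words to produce an
# "unacceptable" variant of an otherwise grammatical sentence.
def corrupt(sentence: str, seed: int = 0) -> str:
    words = sentence.split()
    i = random.Random(seed).randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

print(corrupt("katten sover på måtten"))  # prints a word-swapped variant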

ScandiSentClassification

The corpus is crawled from se.trustpilot.com, no.trustpilot.com, dk.trustpilot.com, fi.trustpilot.com and trustpilot.com.

Dataset: mteb/scandisentLicense: openrail • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy dan, eng, fin, nob, swe Reviews, Written expert-annotated found
Citation
@inproceedings{isbister-etal-2021-stop,
  address = {Reykjavik, Iceland (Online)},
  author = {Isbister, Tim  and
Carlsson, Fredrik  and
Sahlgren, Magnus},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Dobnik, Simon  and
{\O}vrelid, Lilja},
  month = may # { 31--2 } # jun,
  pages = {385--390},
  publisher = {Link{\"o}ping University Electronic Press, Sweden},
  title = {Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?},
  url = {https://aclanthology.org/2021.nodalida-main.42/},
  year = {2021},
}

SentiRuEval2016

SentiRuEval-2016 is a Russian sentiment analysis evaluation devoted to reputation monitoring of banks and telecom companies on Twitter, covering the task, data, the data preparation procedure, and participants' results.

Dataset: mteb/SentiRuEval2016License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified derived found
Citation
@inproceedings{loukachevitch2016sentirueval,
  author = {Loukachevitch, NV and Rubtsova, Yu V},
  booktitle = {Computational Linguistics and Intellectual Technologies},
  pages = {416--426},
  title = {SentiRuEval-2016: overcoming time gap and data sparsity in tweet sentiment analysis},
  year = {2016},
}

SentiRuEval2016.v2

SentiRuEval-2016 is a Russian sentiment analysis evaluation devoted to reputation monitoring of banks and telecom companies on Twitter, covering the task, data, the data preparation procedure, and participants' results. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/senti_ru_eval2016License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to text (t2t) accuracy rus not specified derived found
Citation
@inproceedings{loukachevitch2016sentirueval,
  author = {Loukachevitch, NV and Rubtsova, Yu V},
  booktitle = {Computational Linguistics and Intellectual Technologies},
  pages = {416--426},
  title = {SentiRuEval-2016: overcoming time gap and data sparsity in tweet sentiment analysis},
  year = {2016},
}

SentimentAnalysisHindi

Hindi Sentiment Analysis Dataset

Dataset: OdiaGenAI/sentiment_analysis_hindiLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 hin Reviews, Written derived found
Citation
@misc{OdiaGenAI,
  author = {Shantipriya Parida and Sambit Sekhar and Soumendra Kumar Sahoo and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash and Guneet Singh Kohli},
  howpublished = {{https://huggingface.co/OdiaGenAI}},
  journal = {Hugging Face repository},
  publisher = {Hugging Face},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
}

SentimentAnalysisHindi.v2

Hindi Sentiment Analysis Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sentiment_analysis_hindiLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 hin Reviews, Written derived found
Citation
@misc{OdiaGenAI,
  author = {Shantipriya Parida and Sambit Sekhar and Soumendra Kumar Sahoo and Swateek Jena and Abhijeet Parida and Satya Ranjan Dash and Guneet Singh Kohli},
  howpublished = {{https://huggingface.co/OdiaGenAI}},
  journal = {Hugging Face repository},
  publisher = {Hugging Face},
  title = {OdiaGenAI: Generative AI and LLM Initiative for the Odia Language},
  year = {2023},
}

SentimentDKSF

The Sentiment DKSF (Digikala/Snappfood comments) is a dataset for sentiment analysis.

Dataset: hezarai/sentiment-dksfLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Reviews derived found
Citation

SentimentDKSF.v2

The Sentiment DKSF (Digikala/Snappfood comments) is a dataset for sentiment analysis. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sentiment_dksfLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Reviews derived found
Citation

SinhalaNewsClassification

This file contains news texts (sentences) belonging to 5 different news categories (politics, business, technology, sports, and entertainment). The original dataset was released by Nisansa de Silva (Sinhala Text Classification: Observations from the Perspective of a Resource Poor Language, 2015).

Dataset: NLPC-UOM/Sinhala-News-Category-classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy sin News, Written derived found
Citation
@article{deSilva2015,
  author = {Nisansa de Silva},
  journal = {Year of Publication},
  title = {Sinhala Text Classification: Observations from the Perspective of a Resource Poor Language},
  year = {2015},
}

@article{dhananjaya2022,
  author = {Dhananjaya et al.},
  journal = {Year of Publication},
  title = {BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification},
  year = {2022},
}

SinhalaNewsClassification.v2

This file contains news texts (sentences) belonging to 5 different news categories (politics, business, technology, sports, and entertainment). The original dataset was released by Nisansa de Silva (Sinhala Text Classification: Observations from the Perspective of a Resource Poor Language, 2015). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sinhala_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy sin News, Written derived found
Citation
@article{deSilva2015,
  author = {Nisansa de Silva},
  journal = {Year of Publication},
  title = {Sinhala Text Classification: Observations from the Perspective of a Resource Poor Language},
  year = {2015},
}

@article{dhananjaya2022,
  author = {Dhananjaya et al.},
  journal = {Year of Publication},
  title = {BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification},
  year = {2022},
}

SinhalaNewsSourceClassification

This dataset contains Sinhala news headlines extracted from 9 news sources (websites) (Sri Lanka Army, Dinamina, GossipLanka, Hiru, ITN, Lankapuwath, NewsLK, Newsfirst, World Socialist Web Site-Sinhala).

Dataset: NLPC-UOM/Sinhala-News-Source-classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy sin News, Written derived found
Citation
@article{dhananjaya2022,
  author = {Dhananjaya et al.},
  journal = {Year of Publication},
  title = {BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification},
  year = {2022},
}

SinhalaNewsSourceClassification.v2

This dataset contains Sinhala news headlines extracted from 9 news sources (websites) (Sri Lanka Army, Dinamina, GossipLanka, Hiru, ITN, Lankapuwath, NewsLK, Newsfirst, World Socialist Web Site-Sinhala). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/sinhala_news_sourceLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy sin News, Written derived found
Citation
@article{dhananjaya2022,
  author = {Dhananjaya et al.},
  journal = {Year of Publication},
  title = {BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification},
  year = {2022},
}

SiswatiNewsClassification

Siswati News Classification Dataset

Dataset: isaacchung/siswati-newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ssw News, Written human-annotated found
Citation
@article{Madodonga_Marivate_Adendorff_2023,
  author = {Madodonga, Andani and Marivate, Vukosi and Adendorff, Matthew},
  doi = {10.55492/dhasa.v4i01.4449},
  month = {Jan.},
  title = {Izindaba-Tindzaba: Machine learning news categorisation for Long and Short Text for isiZulu and Siswati},
  url = {https://upjournals.up.ac.za/index.php/dhasa/article/view/4449},
  volume = {4},
  year = {2023},
}

SiswatiNewsClassification.v2

Siswati News Classification Dataset. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/siswati_newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ssw News, Written human-annotated found
Citation
@article{Madodonga_Marivate_Adendorff_2023,
  author = {Madodonga, Andani and Marivate, Vukosi and Adendorff, Matthew},
  doi = {10.55492/dhasa.v4i01.4449},
  month = {Jan.},
  title = {Izindaba-Tindzaba: Machine learning news categorisation for Long and Short Text for isiZulu and Siswati},
  url = {https://upjournals.up.ac.za/index.php/dhasa/article/view/4449},
  volume = {4},
  year = {2023},
}

SlovakHateSpeechClassification

The dataset contains posts from a social network with human annotations for hateful or offensive language in Slovak.

Dataset: TUKE-KEMT/hate_speech_slovakLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slk Social, Written human-annotated found

SlovakHateSpeechClassification.v2

The dataset contains posts from a social network with human annotations for hateful or offensive language in Slovak. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/slovak_hate_speechLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slk Social, Written human-annotated found

SlovakMovieReviewSentimentClassification

User reviews of movies from the CSFD movie database, with 2 sentiment classes (positive, negative).

Dataset: janko/sk_csfd-movie-reviewsLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slk Reviews, Written derived found
Citation
@article{vstefanik2023resources,
  author = {{\v{S}}tef{\'a}nik, Michal and Kadl{\v{c}}{\'\i}k, Marek and Gramacki, Piotr and Sojka, Petr},
  journal = {arXiv preprint arXiv:2304.01922},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

SlovakMovieReviewSentimentClassification.v2

User reviews of movies from the CSFD movie database, with 2 sentiment classes (positive, negative). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/slovak_movie_review_sentimentLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy slk Reviews, Written derived found
Citation
@article{vstefanik2023resources,
  author = {{\v{S}}tef{\'a}nik, Michal and Kadl{\v{c}}{\'\i}k, Marek and Gramacki, Piotr and Sojka, Petr},
  journal = {arXiv preprint arXiv:2304.01922},
  title = {Resources and Few-shot Learners for In-context Learning in Slavic Languages},
  year = {2023},
}

SouthAfricanLangClassification

A language identification test set for 11 South African Languages.

Dataset: mlexplorer008/south_african_language_identificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy afr, eng, nbl, nso, sot, ... (11) Non-fiction, Web, Written expert-annotated found
Citation
@misc{south-african-language-identification,
  author = {ExploreAI Academy and Joanne M},
  publisher = {Kaggle},
  title = {South African Language Identification},
  url = {https://kaggle.com/competitions/south-african-language-identification},
  year = {2022},
}

SpanishNewsClassification

A Spanish dataset for news classification. The dataset includes articles from reputable Spanish news sources spanning 12 different categories.

Dataset: MarcOrfilaCarreras/spanish-newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy spa News, Written derived found
Citation

SpanishNewsClassification.v2

A Spanish dataset for news classification. The dataset includes articles from reputable Spanish news sources spanning 12 different categories. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/spanish_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy spa News, Written derived found
Citation

SpanishSentimentClassification

A Spanish dataset for sentiment classification.

Dataset: sepidmnorozy/Spanish_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy spa Reviews, Written derived found
Citation
@inproceedings{mollanorozy-etal-2023-cross,
  address = {Dubrovnik, Croatia},
  author = {Mollanorozy, Sepideh  and
Tanti, Marc  and
Nissim, Malvina},
  booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
  doi = {10.18653/v1/2023.sigtyp-1.9},
  editor = {Beinborn, Lisa  and
Goswami, Koustava  and
Murado{\u{g}}lu, Saliha  and
Sorokin, Alexey  and
Kumar, Ritesh  and
Shcherbakov, Andreas  and
Ponti, Edoardo M.  and
Cotterell, Ryan  and
Vylomova, Ekaterina},
  month = may,
  pages = {89--95},
  publisher = {Association for Computational Linguistics},
  title = {Cross-lingual Transfer Learning with {P}ersian},
  url = {https://aclanthology.org/2023.sigtyp-1.9},
  year = {2023},
}

SpanishSentimentClassification.v2

A Spanish dataset for sentiment classification. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/spanish_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy spa Reviews, Written derived found
Citation
@inproceedings{mollanorozy-etal-2023-cross,
  address = {Dubrovnik, Croatia},
  author = {Mollanorozy, Sepideh  and
Tanti, Marc  and
Nissim, Malvina},
  booktitle = {Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP},
  doi = {10.18653/v1/2023.sigtyp-1.9},
  editor = {Beinborn, Lisa  and
Goswami, Koustava  and
Murado{\u{g}}lu, Saliha  and
Sorokin, Alexey  and
Kumar, Ritesh  and
Shcherbakov, Andreas  and
Ponti, Edoardo M.  and
Cotterell, Ryan  and
Vylomova, Ekaterina},
  month = may,
  pages = {89--95},
  publisher = {Association for Computational Linguistics},
  title = {Cross-lingual Transfer Learning with {P}ersian},
  url = {https://aclanthology.org/2023.sigtyp-1.9},
  year = {2023},
}

SwahiliNewsClassification

Dataset for Swahili News Classification, categorized with 6 domains (Local News (Kitaifa), International News (Kimataifa), Finance News (Uchumi), Health News (Afya), Sports News (Michezo), and Entertainment News (Burudani)). Building and Optimizing Swahili Language Models: Techniques, Embeddings, and Datasets

Dataset: Mollel/SwahiliNewsClassificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swa News, Written derived found
Citation
@inproceedings{davis2020swahili,
  author = {Davis, David},
  doi = {10.5281/zenodo.5514203},
  publisher = {Zenodo},
  title = {Swahili: News Classification Dataset (0.2)},
  url = {https://doi.org/10.5281/zenodo.5514203},
  year = {2020},
}

SwahiliNewsClassification.v2

Dataset for Swahili News Classification, categorized with 6 domains (Local News (Kitaifa), International News (Kimataifa), Finance News (Uchumi), Health News (Afya), Sports News (Michezo), and Entertainment News (Burudani)). Building and Optimizing Swahili Language Models: Techniques, Embeddings, and Datasets This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/swahili_newsLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swa News, Written derived found
Citation
@inproceedings{davis2020swahili,
  author = {Davis, David},
  doi = {10.5281/zenodo.5514203},
  publisher = {Zenodo},
  title = {Swahili: News Classification Dataset (0.2)},
  url = {https://doi.org/10.5281/zenodo.5514203},
  year = {2020},
}

SweRecClassification

A Swedish dataset for sentiment classification on reviews

Dataset: mteb/swerec_classificationLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Reviews, Written derived found
Citation
@inproceedings{nielsen-2023-scandeval,
  address = {T{\'o}rshavn, Faroe Islands},
  author = {Nielsen, Dan},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Alum{\"a}e, Tanel  and
Fishel, Mark},
  month = may,
  pages = {185--201},
  publisher = {University of Tartu Library},
  title = {{S}cand{E}val: A Benchmark for {S}candinavian Natural Language Processing},
  url = {https://aclanthology.org/2023.nodalida-1.20},
  year = {2023},
}

SweRecClassification.v2

A Swedish dataset for sentiment classification on reviews. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/swe_recLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Reviews, Written derived found
Citation
@inproceedings{nielsen-2023-scandeval,
  address = {T{\'o}rshavn, Faroe Islands},
  author = {Nielsen, Dan},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  editor = {Alum{\"a}e, Tanel  and
Fishel, Mark},
  month = may,
  pages = {185--201},
  publisher = {University of Tartu Library},
  title = {{S}cand{E}val: A Benchmark for {S}candinavian Natural Language Processing},
  url = {https://aclanthology.org/2023.nodalida-1.20},
  year = {2023},
}

SwedishSentimentClassification

Dataset of Swedish reviews scraped from various publicly available websites

Dataset: mteb/SwedishSentimentClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Reviews, Written derived found

SwedishSentimentClassification.v2

Dataset of Swedish reviews scraped from various publicly available websites. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/swedish_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy swe Reviews, Written derived found

SwissJudgementClassification

Multilingual, diachronic dataset of Swiss Federal Supreme Court cases annotated with the respective binarized judgment outcome (approval/dismissal)

Dataset: rcds/swiss_judgment_predictionLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu, fra, ita Legal, Written expert-annotated found
Citation
@misc{niklaus2022empirical,
  archiveprefix = {arXiv},
  author = {Joel Niklaus and Matthias Stürmer and Ilias Chalkidis},
  eprint = {2209.12325},
  primaryclass = {cs.CL},
  title = {An Empirical Study on Cross-X Transfer for Legal Judgment Prediction},
  year = {2022},
}

SynPerChatbotConvSAAnger

Synthetic Persian Chatbot Conversational Sentiment Analysis Anger

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-angerLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAFear

Synthetic Persian Chatbot Conversational Sentiment Analysis Fear

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-fearLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAFriendship

Synthetic Persian Chatbot Conversational Sentiment Analysis Friendship

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-friendshipLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAHappiness

Synthetic Persian Chatbot Conversational Sentiment Analysis Happiness

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-happinessLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAJealousy

Synthetic Persian Chatbot Conversational Sentiment Analysis Jealousy

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-jealousyLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSALove

Synthetic Persian Chatbot Conversational Sentiment Analysis Love

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-loveLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSASadness

Synthetic Persian Chatbot Conversational Sentiment Analysis Sadness

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-sadnessLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSASatisfaction

Synthetic Persian Chatbot Conversational Sentiment Analysis Satisfaction

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-satisfactionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSASurprise

Synthetic Persian Chatbot Conversational Sentiment Analysis Surprise

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-surpriseLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAToneChatbotClassification

Synthetic Persian Chatbot Conversational Sentiment Analysis Tone Chatbot Classification

Dataset: MCINext/synthetic-persian-chatbot-conversational-sentiment-analysis-tone-chatbot-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotConvSAToneUserClassification

Synthetic Persian Chatbot Conversational Sentiment Analysis Tone User

Dataset: MCINext/chatbot-conversational-sentiment-analysis-tone-user-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotRAGToneChatbotClassification

Synthetic Persian Chatbot RAG Tone Chatbot Classification

Dataset: MCINext/synthetic-persian-chatbot-rag-tone-chatbot-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotRAGToneUserClassification

Synthetic Persian Chatbot RAG Tone User Classification

Dataset: MCINext/synthetic-persian-chatbot-rag-tone-user-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotSatisfactionLevelClassification

Synthetic Persian Chatbot Satisfaction Level Classification

Dataset: MCINext/synthetic-persian-chatbot-satisfaction-level-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotToneChatbotClassification

Synthetic Persian Chatbot Tone Chatbot Classification

Dataset: MCINext/synthetic-persian-chatbot-tone-chatbot-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerChatbotToneUserClassification

Synthetic Persian Chatbot Tone User Classification

Dataset: MCINext/synthetic-persian-chatbot-tone-user-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas Spoken LM-generated LM-generated and verified
Citation

SynPerTextToneClassification

Persian Text Tone

Dataset: MCINext/synthetic-persian-text-tone-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas not specified LM-generated LM-generated and verified
Citation

SynPerTextToneClassification.v2

Persian Text Tone This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/syn_per_text_toneLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy fas not specified LM-generated LM-generated and verified
Citation

TNews

Short Text Classification for News

Dataset: C-MTEB/TNews-classificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@inproceedings{xu-etal-2020-clue,
  address = {Barcelona, Spain (Online)},
  author = {Xu, Liang  and
Hu, Hai and
Zhang, Xuanwei and
Li, Lu and
Cao, Chenjie and
Li, Yudong and
Xu, Yechen and
Sun, Kai and
Yu, Dian and
Yu, Cong and
Tian, Yin and
Dong, Qianqian and
Liu, Weitang and
Shi, Bo and
Cui, Yiming and
Li, Junyi and
Zeng, Jun and
Wang, Rongzhao and
Xie, Weijian and
Li, Yanting and
Patterson, Yina and
Tian, Zuoyu and
Zhang, Yiwen and
Zhou, He and
Liu, Shaoweihua and
Zhao, Zhe and
Zhao, Qipeng and
Yue, Cong and
Zhang, Xinrui and
Yang, Zhengliang and
Richardson, Kyle and
Lan, Zhenzhong },
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  doi = {10.18653/v1/2020.coling-main.419},
  month = dec,
  pages = {4762--4772},
  publisher = {International Committee on Computational Linguistics},
  title = {{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark},
  url = {https://aclanthology.org/2020.coling-main.419},
  year = {2020},
}

TNews.v2

Short Text Classification for News This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/t_newsLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@inproceedings{xu-etal-2020-clue,
  address = {Barcelona, Spain (Online)},
  author = {Xu, Liang  and
Hu, Hai and
Zhang, Xuanwei and
Li, Lu and
Cao, Chenjie and
Li, Yudong and
Xu, Yechen and
Sun, Kai and
Yu, Dian and
Yu, Cong and
Tian, Yin and
Dong, Qianqian and
Liu, Weitang and
Shi, Bo and
Cui, Yiming and
Li, Junyi and
Zeng, Jun and
Wang, Rongzhao and
Xie, Weijian and
Li, Yanting and
Patterson, Yina and
Tian, Zuoyu and
Zhang, Yiwen and
Zhou, He and
Liu, Shaoweihua and
Zhao, Zhe and
Zhao, Qipeng and
Yue, Cong and
Zhang, Xinrui and
Yang, Zhengliang and
Richardson, Kyle and
Lan, Zhenzhong },
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  doi = {10.18653/v1/2020.coling-main.419},
  month = dec,
  pages = {4762--4772},
  publisher = {International Committee on Computational Linguistics},
  title = {{CLUE}: A {C}hinese Language Understanding Evaluation Benchmark},
  url = {https://aclanthology.org/2020.coling-main.419},
  year = {2020},
}

TamilNewsClassification

A Tamil dataset for 6-class classification of Tamil news articles

Dataset: mlexplorer008/tamil_news_classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tam News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

TamilNewsClassification.v2

A Tamil dataset for 6-class classification of Tamil news articles This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/tamil_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tam News, Written derived found
Citation
@article{kunchukuttan2020indicnlpcorpus,
  author = {Anoop Kunchukuttan and Divyanshu Kakwani and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
  journal = {arXiv preprint arXiv:2005.00085},
  title = {AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages},
  year = {2020},
}

TelemarketingSalesRuleLegalBenchClassification

Determine how 16 C.F.R. § 310.3(a)(1) and 16 C.F.R. § 310.3(a)(2) (governing deceptive practices) apply to different fact patterns. This dataset is designed to test a model’s ability to apply 16 C.F.R. § 310.3(a)(1) and 16 C.F.R. § 310.3(a)(2) of the Telemarketing Sales Rule to a simple fact pattern with a clear outcome. Each fact pattern ends with the question: “Is this a violation of the Telemarketing Sales Rule?” Each fact pattern is paired with the answer “Yes” or the answer “No.” Fact patterns are listed in the column “text,” and answers are listed in the column “label.”
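
Since each fact pattern sits in a "text" column with a Yes/No answer in "label", a quick way to inspect the task is to load the LegalBench subset directly. A minimal sketch with Hugging Face datasets, assuming the subset is named telemarketing_sales_rule:

from datasets import load_dataset

# Minimal sketch: inspect the fact patterns and Yes/No labels described above.
# The subset name "telemarketing_sales_rule" is an assumption based on the task name.
ds = load_dataset("nguha/legalbench", "telemarketing_sales_rule", split="test")
print(ds[0]["text"])   # a fact pattern ending in the violation question
print(ds[0]["label"])  # "Yes" or "No"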

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

TeluguAndhraJyotiNewsClassification

A Telugu dataset for 5-class classification of Telugu news articles

Dataset: mlexplorer008/telugu_news_classificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tel News, Written derived found

TeluguAndhraJyotiNewsClassification.v2

A Telugu dataset for 5-class classification of Telugu news articles This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/telugu_andhra_jyoti_newsLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tel News, Written derived found

TenKGnadClassification

The 10k German News Articles Dataset (10kGNAD) contains news articles from the website of the Austrian newspaper DER Standard, with their topic classification (9 classes).

Dataset: mteb/TenKGnadClassificationLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu News, Written expert-annotated found
Citation
@inproceedings{Schabus2017,
  address = {Tokyo, Japan},
  author = {Dietmar Schabus and Marcin Skowron and Martin Trapp},
  booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
  doi = {10.1145/3077136.3080711},
  month = aug,
  pages = {1241--1244},
  title = {One Million Posts: A Data Set of German Online Discussions},
  year = {2017},
}

TenKGnadClassification.v2

The 10k German News Articles Dataset (10kGNAD) contains news articles from the website of the Austrian newspaper DER Standard, with their topic classification (9 classes). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/ten_k_gnadLicense: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy deu News, Written expert-annotated found
Citation
@inproceedings{Schabus2017,
  address = {Tokyo, Japan},
  author = {Dietmar Schabus and Marcin Skowron and Martin Trapp},
  booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
  doi = {10.1145/3077136.3080711},
  month = aug,
  pages = {1241--1244},
  title = {One Million Posts: A Data Set of German Online Discussions},
  year = {2017},
}

TextualismToolDictionariesLegalBenchClassification

Determine if a paragraph from a judicial opinion is applying a form of textualism that relies on the dictionary meaning of terms.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

TextualismToolPlainLegalBenchClassification

Determine if a paragraph from a judicial opinion is applying a form of textualism that relies on the ordinary (“plain”) meaning of terms.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

ToxicChatClassification

This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI collaborative annotation framework to guarantee the quality of annotation while maintaining a feasible annotation workload. The details of data collection, pre-processing, and annotation can be found in our paper. We believe that ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions. Only human annotated samples are selected here.
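
The "only human annotated samples" selection amounts to a simple filter over the released data. A minimal sketch, assuming the lmsys/toxic-chat config name toxicchat0124 and its boolean human_annotation column:

from datasets import load_dataset

# Minimal sketch: keep only the human-annotated rows, as the task does.
# The config name "toxicchat0124" and the "human_annotation" column are
# assumptions based on the dataset card.
ds = load_dataset("lmsys/toxic-chat", "toxicchat0124", split="test")
human_only = ds.filter(lambda row: row["human_annotation"])
print(len(ds), "->", len(human_only))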

Dataset: lmsys/toxic-chatLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Constructed, Written expert-annotated found
Citation
@misc{lin2023toxicchat,
  archiveprefix = {arXiv},
  author = {Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
  eprint = {2310.17389},
  primaryclass = {cs.CL},
  title = {ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
  year = {2023},
}

ToxicChatClassification.v2

This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI collaborative annotation framework to guarantee the quality of annotation while maintaining a feasible annotation workload. The details of data collection, pre-processing, and annotation can be found in our paper. We believe that ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions. Only human annotated samples are selected here. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/toxic_chatLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Constructed, Written expert-annotated found
Citation
@misc{lin2023toxicchat,
  archiveprefix = {arXiv},
  author = {Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
  eprint = {2310.17389},
  primaryclass = {cs.CL},
  title = {ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
  year = {2023},
}

ToxicConversationsClassification

Collection of comments from the Civil Comments platform together with annotations if the comment is toxic or not.

Dataset: mteb/toxic_conversations_50kLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@misc{jigsaw-unintended-bias-in-toxicity-classification,
  author = {cjadams and Daniel Borkan and inversion and Jeffrey Sorensen and Lucas Dixon and Lucy Vasserman and nithum},
  publisher = {Kaggle},
  title = {Jigsaw Unintended Bias in Toxicity Classification},
  url = {https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification},
  year = {2019},
}

ToxicConversationsClassification.v2

Collection of comments from the Civil Comments platform together with annotations if the comment is toxic or not. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/toxic_conversationsLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@misc{jigsaw-unintended-bias-in-toxicity-classification,
  author = {cjadams and Daniel Borkan and inversion and Jeffrey Sorensen and Lucas Dixon and Lucy Vasserman and nithum},
  publisher = {Kaggle},
  title = {Jigsaw Unintended Bias in Toxicity Classification},
  url = {https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification},
  year = {2019},
}

ToxicConversationsVNClassification

A translated dataset derived from the collection of comments from the Civil Comments platform, together with annotations of whether each comment is toxic or not. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system: the system uses large language models (LLMs), specifically Cohere's Aya model, for translation; applies advanced embedding models to filter the translations; and uses an LLM-as-a-judge to score the quality of the samples based on multiple criteria.
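
The middle stage of that translate-filter-judge pipeline can be pictured with a small sketch; the embedding model and threshold below are illustrative stand-ins, not the authors' choices:

from sentence_transformers import SentenceTransformer, util

# Minimal sketch of the embedding-based filtering stage: a translation is kept
# only if it stays semantically close to its English source. The model name and
# threshold are illustrative assumptions.
embedder = SentenceTransformer("sentence-transformers/LaBSE")

def keep_translation(source: str, translation: str, threshold: float = 0.8) -> bool:
    emb = embedder.encode([source, translation], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

print(keep_translation("This comment is rude and hateful.", "Bình luận này thô lỗ và đầy thù hận."))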

Dataset: GreenNode/toxic-conversations-50k-vnLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Social, Written derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

TswanaNewsClassification

Tswana News Classification Dataset

Dataset: dsfsi/daily-news-dikgangLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tsn News, Written derived found
Citation
@inproceedings{marivate2023puoberta,
  author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  booktitle = {SACAIR 2023 (To Appear)},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa},
  title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  year = {2023},
}

TswanaNewsClassification.v2

Tswana News Classification Dataset This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/tswana_newsLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tsn News, Written derived found
Citation
@inproceedings{marivate2023puoberta,
  author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  booktitle = {SACAIR 2023 (To Appear)},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa},
  title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  year = {2023},
}

TurkicClassification

A dataset for news classification in three Turkic languages.

Dataset: Electrotubbie/classification_Turkic_languagesLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy bak, kaz, kir News, Written derived found
Citation

TurkishMovieSentimentClassification

Turkish Movie Review Dataset

Dataset: asparius/Turkish-Movie-ReviewLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tur Reviews, Written derived found
Citation
@inproceedings{Demirtas2013CrosslingualPD,
  author = {Erkin Demirtas and Mykola Pechenizkiy},
  booktitle = {wisdom},
  title = {Cross-lingual polarity detection with machine translation},
  url = {https://api.semanticscholar.org/CorpusID:3912960},
  year = {2013},
}

TurkishMovieSentimentClassification.v2

Turkish Movie Review Dataset This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/turkish_movie_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tur Reviews, Written derived found
Citation
@inproceedings{Demirtas2013CrosslingualPD,
  author = {Erkin Demirtas and Mykola Pechenizkiy},
  booktitle = {wisdom},
  title = {Cross-lingual polarity detection with machine translation},
  url = {https://api.semanticscholar.org/CorpusID:3912960},
  year = {2013},
}

TurkishProductSentimentClassification

Turkish Product Review Dataset

Dataset: asparius/Turkish-Product-ReviewLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tur Reviews, Written derived found
Citation
@inproceedings{Demirtas2013CrosslingualPD,
  author = {Erkin Demirtas and Mykola Pechenizkiy},
  booktitle = {wisdom},
  title = {Cross-lingual polarity detection with machine translation},
  url = {https://api.semanticscholar.org/CorpusID:3912960},
  year = {2013},
}

TurkishProductSentimentClassification.v2

Turkish Product Review Dataset This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/turkish_product_sentimentLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tur Reviews, Written derived found
Citation
@inproceedings{Demirtas2013CrosslingualPD,
  author = {Erkin Demirtas and Mykola Pechenizkiy},
  booktitle = {wisdom},
  title = {Cross-lingual polarity detection with machine translation},
  url = {https://api.semanticscholar.org/CorpusID:3912960},
  year = {2013},
}

TweetEmotionClassification

A dataset of 10,000 tweets that was created with the aim of covering the most frequently used emotion categories in Arabic tweets.

Dataset: mteb/TweetEmotionClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Social, Written human-annotated found
Citation
@inproceedings{al2018emotional,
  author = {Al-Khatib, Amr and El-Beltagy, Samhaa R},
  booktitle = {Computational Linguistics and Intelligent Text Processing: 18th International Conference, CICLing 2017, Budapest, Hungary, April 17--23, 2017, Revised Selected Papers, Part II 18},
  organization = {Springer},
  pages = {105--114},
  title = {Emotional tone detection in arabic tweets},
  year = {2018},
}

TweetEmotionClassification.v2

A dataset of 10,012 tweets that was created with the aim of covering the most frequently used emotion categories in Arabic tweets. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/TweetEmotionClassificationLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Social, Written human-annotated found
Citation
@inproceedings{al2018emotional,
  author = {Al-Khatib, Amr and El-Beltagy, Samhaa R},
  booktitle = {Computational Linguistics and Intelligent Text Processing: 18th International Conference, CICLing 2017, Budapest, Hungary, April 17--23, 2017, Revised Selected Papers, Part II 18},
  organization = {Springer},
  pages = {105--114},
  title = {Emotional tone detection in arabic tweets},
  year = {2018},
}

TweetSarcasmClassification

Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets.

Dataset: iabufarha/ar_sarcasmLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Social, Written human-annotated found
Citation
@inproceedings{abu-farha-magdy-2020-arabic,
  abstract = {Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16{\%} of which are sarcastic. In addition to sarcasm the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators{'} biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1 score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.},
  address = {Marseille, France},
  author = {Abu Farha, Ibrahim  and
Magdy, Walid},
  booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection},
  editor = {Al-Khalifa, Hend  and
Magdy, Walid  and
Darwish, Kareem  and
Elsayed, Tamer  and
Mubarak, Hamdy},
  isbn = {979-10-95546-51-1},
  language = {English},
  month = may,
  pages = {32--39},
  publisher = {European Language Resource Association},
  title = {From {A}rabic Sentiment Analysis to Sarcasm Detection: The {A}r{S}arcasm Dataset},
  url = {https://aclanthology.org/2020.osact-1.5},
  year = {2020},
}

TweetSarcasmClassification.v2

Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/tweet_sarcasmLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara Social, Written human-annotated found
Citation
@inproceedings{abu-farha-magdy-2020-arabic,
  abstract = {Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16{\%} of which are sarcastic. In addition to sarcasm the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators{'} biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1 score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.},
  address = {Marseille, France},
  author = {Abu Farha, Ibrahim  and
Magdy, Walid},
  booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection},
  editor = {Al-Khalifa, Hend  and
Magdy, Walid  and
Darwish, Kareem  and
Elsayed, Tamer  and
Mubarak, Hamdy},
  isbn = {979-10-95546-51-1},
  language = {English},
  month = may,
  pages = {32--39},
  publisher = {European Language Resource Association},
  title = {From {A}rabic Sentiment Analysis to Sarcasm Detection: The {A}r{S}arcasm Dataset},
  url = {https://aclanthology.org/2020.osact-1.5},
  year = {2020},
}

TweetSentimentClassification

A multilingual Sentiment Analysis dataset consisting of tweets in 8 different languages.

Dataset: mteb/tweet_sentiment_multilingualLicense: cc-by-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ara, deu, eng, fra, hin, ... (8) Social, Written human-annotated found
Citation
@inproceedings{barbieri-etal-2022-xlm,
  abstract = {Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. In this paper we provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages and a XLM-T model trained on this dataset.},
  address = {Marseille, France},
  author = {Barbieri, Francesco  and
Espinosa Anke, Luis  and
Camacho-Collados, Jose},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  month = jun,
  pages = {258--266},
  publisher = {European Language Resources Association},
  title = {{XLM}-{T}: Multilingual Language Models in {T}witter for Sentiment Analysis and Beyond},
  url = {https://aclanthology.org/2022.lrec-1.27},
  year = {2022},
}

TweetSentimentExtractionClassification

Dataset: mteb/tweet_sentiment_extractionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@misc{tweet-sentiment-extraction,
  author = {Maggie and Phil Culliton and Wei Chen},
  publisher = {Kaggle},
  title = {Tweet Sentiment Extraction},
  url = {https://kaggle.com/competitions/tweet-sentiment-extraction},
  year = {2020},
}

TweetSentimentExtractionClassification.v2

This version corrects errors found in the original data. For details, see pull request: https://github.com/embeddings-benchmark/mteb/pull/2900

Dataset: mteb/tweet_sentiment_extractionLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Social, Written human-annotated found
Citation
@misc{tweet-sentiment-extraction,
  author = {Maggie and Phil Culliton and Wei Chen},
  publisher = {Kaggle},
  title = {Tweet Sentiment Extraction},
  url = {https://kaggle.com/competitions/tweet-sentiment-extraction},
  year = {2020},
}

TweetSentimentExtractionVNClassification

A collection of translated tweets annotated for sentiment extraction. The process of creating the VN-MTEB (Vietnamese Massive Text Embedding Benchmark) from English samples involves a new automated system: the system uses large language models (LLMs), specifically Cohere's Aya model, for translation; applies advanced embedding models to filter the translations; and uses an LLM-as-a-judge to score the quality of the samples based on multiple criteria.

Dataset: GreenNode/tweet-sentiment-extraction-vnLicense: cc-by-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Social, Written derived machine-translated and LM verified
Citation
@misc{pham2025vnmtebvietnamesemassivetext,
  archiveprefix = {arXiv},
  author = {Loc Pham and Tung Luu and Thu Vo and Minh Nguyen and Viet Hoang},
  eprint = {2507.21500},
  primaryclass = {cs.CL},
  title = {VN-MTEB: Vietnamese Massive Text Embedding Benchmark},
  url = {https://arxiv.org/abs/2507.21500},
  year = {2025},
}

TweetTopicSingleClassification

Topic classification dataset on Twitter with 6 labels. Each instance of TweetTopic comes with a timestamp ranging from September 2019 to August 2021. Tweets were preprocessed before annotation to normalize some artifacts, converting URLs into a special token {{URL}} and non-verified usernames into {{USERNAME}}. For verified usernames, the display name (or account name) is replaced with symbols {@}.
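
That normalization is easy to approximate; a minimal sketch (the regexes, the verified-handle set, and the {@handle@} wrapping follow the description above, not the exact released preprocessing code):

import re

# Minimal sketch of the tweet normalization described above. The patterns and
# the verified set are illustrative approximations.
VERIFIED = {"BBCBreaking"}

def normalize(tweet: str) -> str:
    tweet = re.sub(r"https?://\S+", "{{URL}}", tweet)  # URLs -> {{URL}}
    def repl(match: re.Match) -> str:
        handle = match.group(1)
        # verified accounts keep a marked handle; others are anonymized
        return f"{{@{handle}@}}" if handle in VERIFIED else "{{USERNAME}}"
    return re.sub(r"@(\w+)", repl, tweet)

print(normalize("Breaking via @BBCBreaking and @someuser: https://example.com"))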

Dataset: cardiffnlp/tweet_topic_singleLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng News, Social, Written expert-annotated found
Citation
@inproceedings{dimosthenis-etal-2022-twitter,
  address = {Gyeongju, Republic of Korea},
  author = {Antypas, Dimosthenis  and
Ushio, Asahi  and
Camacho-Collados, Jose  and
Neves, Leonardo  and
Silva, Vitor  and
Barbieri, Francesco},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
  month = oct,
  publisher = {International Committee on Computational Linguistics},
  title = {{T}witter {T}opic {C}lassification},
  year = {2022},
}

TweetTopicSingleClassification.v2

Topic classification dataset on Twitter with 6 labels. Each instance of TweetTopic comes with a timestamp ranging from September 2019 to August 2021. Tweets were preprocessed before annotation to normalize some artifacts, converting URLs into a special token {{URL}} and non-verified usernames into {{USERNAME}}. For verified usernames, the display name (or account name) is replaced with symbols {@}. This version corrects errors found in the original data. For details, see pull request: https://github.com/embeddings-benchmark/mteb/pull/2900

Dataset: mteb/tweet_topic_singleLicense: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng News, Social, Written expert-annotated found
Citation
@inproceedings{dimosthenis-etal-2022-twitter,
  address = {Gyeongju, Republic of Korea},
  author = {Antypas, Dimosthenis  and
Ushio, Asahi  and
Camacho-Collados, Jose  and
Neves, Leonardo  and
Silva, Vitor  and
Barbieri, Francesco},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
  month = oct,
  publisher = {International Committee on Computational Linguistics},
  title = {{T}witter {T}opic {C}lassification},
  year = {2022},
}

UCCVCommonLawLegalBenchClassification

Determine if a contract is governed by the Uniform Commercial Code (UCC) or the common law of contracts.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

UkrFormalityClassification

    This dataset contains Ukrainian formality classification data obtained by
    translating the English GYAFC corpus.
    English data source: https://aclanthology.org/N18-1012/
    Translation into Ukrainian was done with the model: https://huggingface.co/facebook/nllb-200-distilled-600M
    Additionally, the dataset was balanced, with labels: 0 - informal, 1 - formal.
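
The NLLB translation step can be reproduced with the transformers translation pipeline; a minimal sketch, assuming the FLORES-200-style language codes eng_Latn and ukr_Cyrl:

from transformers import pipeline

# Minimal sketch: translate an English GYAFC-style sentence into Ukrainian with
# the NLLB checkpoint named above. Language codes follow NLLB's FLORES-200
# convention.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="ukr_Cyrl",
)
print(translator("Would you kindly consider my request?")[0]["translation_text"])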

Dataset: ukr-detect/ukr-formality-dataset-translated-gyafcLicense: openrail++ • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ukr News, Written derived machine-translated
Citation
@inproceedings{rao-tetreault-2018-dear,
  author = {Rao, Sudha  and
Tetreault, Joel},
  booktitle = {Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  month = jun,
  publisher = {Association for Computational Linguistics},
  title = {Dear Sir or Madam, May {I} Introduce the {GYAFC} Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer},
  url = {https://aclanthology.org/N18-1012},
  year = {2018},
}

UkrFormalityClassification.v2

    This dataset contains Ukrainian formality classification data obtained by
    translating the English GYAFC corpus.
    English data source: https://aclanthology.org/N18-1012/
    Translation into Ukrainian was done with the model: https://huggingface.co/facebook/nllb-200-distilled-600M
    Additionally, the dataset was balanced, with labels: 0 - informal, 1 - formal.

    This version corrects errors found in the original data. For details, see pull request: https://github.com/embeddings-benchmark/mteb/pull/2900

Dataset: mteb/ukr_formalityLicense: openrail++ • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy ukr News, Written derived machine-translated
Citation
@inproceedings{rao-tetreault-2018-dear,
  author = {Rao, Sudha  and
Tetreault, Joel},
  booktitle = {Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  month = jun,
  publisher = {Association for Computational Linguistics},
  title = {Dear Sir or Madam, May {I} Introduce the {GYAFC} Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer},
  url = {https://aclanthology.org/N18-1012},
  year = {2018},
}

UnfairTOSLegalBenchClassification

Given a clause from a terms-of-service contract, determine the category the clause belongs to. The purpose of this task is classifying clauses in Terms of Service agreements. Clauses have been annotated into nine categories: ['Arbitration', 'Unilateral change', 'Content removal', 'Jurisdiction', 'Choice of law', 'Limitation of liability', 'Unilateral termination', 'Contract by using', 'Other']. The first eight categories correspond to clauses that would potentially be deemed unfair. The last category (Other) corresponds to clauses in agreements which don’t fit into these categories.

Dataset: nguha/legalbenchLicense: cc-by-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Legal, Written expert-annotated found
Citation
@misc{guha2023legalbench,
  archiveprefix = {arXiv},
  author = {Neel Guha and Julian Nyarko and Daniel E. Ho and Christopher Ré and Adam Chilton and Aditya Narayana and Alex Chohlas-Wood and Austin Peters and Brandon Waldon and Daniel N. Rockmore and Diego Zambrano and Dmitry Talisman and Enam Hoque and Faiz Surani and Frank Fagan and Galit Sarfaty and Gregory M. Dickinson and Haggai Porat and Jason Hegland and Jessica Wu and Joe Nudell and Joel Niklaus and John Nay and Jonathan H. Choi and Kevin Tobia and Margaret Hagan and Megan Ma and Michael Livermore and Nikon Rasumov-Rahe and Nils Holzenberger and Noam Kolt and Peter Henderson and Sean Rehaag and Sharad Goel and Shang Gao and Spencer Williams and Sunny Gandhi and Tom Zur and Varun Iyer and Zehua Li},
  eprint = {2308.11462},
  primaryclass = {cs.CL},
  title = {LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models},
  year = {2023},
}

@article{lippi2019claudette,
  author = {Lippi, Marco and Pa{\l}ka, Przemys{\l}aw and Contissa, Giuseppe and Lagioia, Francesca and Micklitz, Hans-Wolfgang and Sartor, Giovanni and Torroni, Paolo},
  journal = {Artificial Intelligence and Law},
  pages = {117--139},
  publisher = {Springer},
  title = {CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service},
  volume = {27},
  year = {2019},
}

UrduRomanSentimentClassification

The Roman Urdu dataset is a data corpus comprising more than 20,000 records tagged for sentiment (Positive, Negative, Neutral)

Dataset: mteb/UrduRomanSentimentClassificationLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 urd Social, Written derived found
Citation
@misc{misc_roman_urdu_data_set_458,
  author = {Sharf, Zareen},
  howpublished = {UCI Machine Learning Repository},
  note = {{DOI}: https://doi.org/10.24432/C58325},
  title = {{Roman Urdu Data Set}},
  year = {2018},
}

UrduRomanSentimentClassification.v2

The Roman Urdu dataset is a data corpus comprising more than 20,000 records tagged for sentiment (Positive, Negative, Neutral). This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/urdu_roman_sentimentLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 urd Social, Written derived found
Citation
@misc{misc_roman_urdu_data_set_458,
  author = {Sharf, Zareen},
  howpublished = {UCI Machine Learning Repository},
  note = {{DOI}: https://doi.org/10.24432/C58325},
  title = {{Roman Urdu Data Set}},
  year = {2018},
}

VieStudentFeedbackClassification

A Vietnamese dataset for classification of student feedback

Dataset: uitnlp/vietnamese_students_feedbackLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Reviews, Written human-annotated created
Citation
@inproceedings{8573337,
  author = {Nguyen, Kiet Van and Nguyen, Vu Duc and Nguyen, Phu X. V. and Truong, Tham T. H. and Nguyen, Ngan Luu-Thuy},
  booktitle = {2018 10th International Conference on Knowledge and Systems Engineering (KSE)},
  doi = {10.1109/KSE.2018.8573337},
  number = {},
  pages = {19-24},
  title = {UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis},
  volume = {},
  year = {2018},
}

VieStudentFeedbackClassification.v2

A Vietnamese dataset for classification of student feedback This version corrects errors found in the original data. For details, see pull request

Dataset: mteb/vie_student_feedbackLicense: mit • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy vie Reviews, Written human-annotated created
Citation
@inproceedings{8573337,
  author = {Nguyen, Kiet Van and Nguyen, Vu Duc and Nguyen, Phu X. V. and Truong, Tham T. H. and Nguyen, Ngan Luu-Thuy},
  booktitle = {2018 10th International Conference on Knowledge and Systems Engineering (KSE)},
  doi = {10.1109/KSE.2018.8573337},
  pages = {19-24},
  title = {UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis},
  year = {2018},
}

WRIMEClassification

A dataset of Japanese social-network posts rated for sentiment.

Dataset: shunk031/wrime • License: https://huggingface.co/datasets/shunk031/wrime#licensing-information • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy jpn Social, Written human-annotated found
Citation
@inproceedings{kajiwara-etal-2021-wrime,
  abstract = {We annotate 17,000 SNS posts with both the writer{'}s subjective emotional intensity and the reader{'}s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer{'}s subjective labels than the readers{'}. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.},
  address = {Online},
  author = {Kajiwara, Tomoyuki and Chu, Chenhui and Takemura, Noriko and Nakashima, Yuta and Nagahara, Hajime},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  doi = {10.18653/v1/2021.naacl-main.169},
  editor = {Toutanova, Kristina and Rumshisky, Anna and Zettlemoyer, Luke and Hakkani-Tur, Dilek and Beltagy, Iz and Bethard, Steven and Cotterell, Ryan and Chakraborty, Tanmoy and Zhou, Yichao},
  month = jun,
  pages = {2095--2104},
  publisher = {Association for Computational Linguistics},
  title = {{WRIME}: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations},
  url = {https://aclanthology.org/2021.naacl-main.169},
  year = {2021},
}

WRIMEClassification.v2

A dataset of Japanese social-network posts rated for sentiment. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wrime • License: https://huggingface.co/datasets/shunk031/wrime#licensing-information • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy jpn Social, Written human-annotated found
Citation
@inproceedings{kajiwara-etal-2021-wrime,
  abstract = {We annotate 17,000 SNS posts with both the writer{'}s subjective emotional intensity and the reader{'}s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer{'}s subjective labels than the readers{'}. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.},
  address = {Online},
  author = {Kajiwara, Tomoyuki and Chu, Chenhui and Takemura, Noriko and Nakashima, Yuta and Nagahara, Hajime},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  doi = {10.18653/v1/2021.naacl-main.169},
  editor = {Toutanova, Kristina and Rumshisky, Anna and Zettlemoyer, Luke and Hakkani-Tur, Dilek and Beltagy, Iz and Bethard, Steven and Cotterell, Ryan and Chakraborty, Tanmoy and Zhou, Yichao},
  month = jun,
  pages = {2095--2104},
  publisher = {Association for Computational Linguistics},
  title = {{WRIME}: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations},
  url = {https://aclanthology.org/2021.naacl-main.169},
  year = {2021},
}

Waimai

Sentiment analysis of user reviews from takeaway (food-delivery) platforms.

Dataset: C-MTEB/waimai-classification • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@article{xiao2023c,
  author = {Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas},
  journal = {arXiv preprint arXiv:2309.07597},
  title = {C-Pack: Packaged resources to advance general Chinese embedding},
  year = {2023},
}

Waimai.v2

Sentiment analysis of user reviews from takeaway (food-delivery) platforms. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/waimai • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy cmn not specified not specified not specified
Citation
@article{xiao2023c,
  author = {Xiao, Shitao and Liu, Zheng and Zhang, Peitian and Muennighoff, Niklas},
  journal = {arXiv preprint arXiv:2309.07597},
  title = {C-Pack: Packaged resources to advance general Chinese embedding},
  year = {2023},
}

WikipediaBioMetChemClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy2GeneExpressionVsMetallurgyClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaBioMetChemClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wikipedia_bio_met_chem • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaBiolumNeurochemClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaMedium2BioluminescenceVsNeurochemistryClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaChemEngSpecialtiesClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaMedium5Classification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaChemFieldsClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEZ10Classification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaChemFieldsClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wikipedia_chem_fields • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaChemistryTopicsClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy10Classification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaCompChemSpectroscopyClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaMedium2ComputationalVsSpectroscopistsClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaCompChemSpectroscopyClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wikipedia_comp_chem_spectroscopy • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaCryobiologySeparationClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy5Classification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaCrystallographyAnalyticalClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaMedium2CrystallographyVsChromatographyTitrationpHClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaCrystallographyAnalyticalClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wikipedia_crystallography_analytical • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaGreenhouseEnantiopureClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy2GreenhouseVsEnantiopureClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaIsotopesFissionClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaHard2IsotopesVsFissionProductsNuclearFissionClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaLuminescenceClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaHard2BioluminescenceVsLuminescenceClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaOrganicInorganicClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy2SpecialClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaSaltsSemiconductorsClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaHard2SaltsVsSemiconductorMaterialsClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaSolidStateColloidalClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEasy2SolidStateVsColloidalClassification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaTheoreticalAppliedClassification

ChemTEB evaluates the performance of text embedding models on chemical domain data.

Dataset: BASF-AI/WikipediaEZ2Classification • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}

WikipediaTheoreticalAppliedClassification.v2

ChemTEB evaluates the performance of text embedding models on chemical domain data. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wikipedia_theoretical_applied • License: cc-by-nc-sa-4.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Chemistry derived created
Citation
@article{kasmaee2024chemteb,
  author = {Kasmaee, Ali Shiraee and Khodadad, Mohammad and Saloot, Mohammad Arshi and Sherck, Nick and Dokas, Stephen and Mahyar, Hamidreza and Samiee, Soheila},
  journal = {arXiv preprint arXiv:2412.00532},
  title = {ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance \& Efficiency on a Specific Domain},
  year = {2024},
}
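
The Wikipedia*Classification entries above all come from the ChemTEB benchmark, so they are typically evaluated as a group rather than one at a time. A minimal sketch, again assuming the mteb package; the task names are taken from this catalog and the list can be extended with any of the other entries:

import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder model

# Select several ChemTEB tasks by their catalog names and evaluate them in one run.
tasks = mteb.get_tasks(tasks=[
    "WikipediaBioMetChemClassification",
    "WikipediaChemistryTopicsClassification",
    "WikipediaIsotopesFissionClassification",
])
results = mteb.MTEB(tasks=tasks).run(model, output_folder="results")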

WisesightSentimentClassification

Wisesight Sentiment Corpus: Thai-language social media messages with sentiment labels (positive, neutral, negative, question).

Dataset: mteb/WisesightSentimentClassification • License: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tha News, Social, Written expert-annotated found
Citation
@software{bact_2019_3457447,
  author = {Suriyawongkul, Arthit and Chuangsuwanich, Ekapol and Chormai, Pattarawat and Polpanumas, Charin},
  doi = {10.5281/zenodo.3457447},
  month = sep,
  publisher = {Zenodo},
  title = {PyThaiNLP/wisesight-sentiment: First release},
  url = {https://doi.org/10.5281/zenodo.3457447},
  version = {v1.0},
  year = {2019},
}

WisesightSentimentClassification.v2

Wisesight Sentiment Corpus: Thai-language social media messages with sentiment labels (positive, neutral, negative, question). This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/wisesight_sentiment • License: cc0-1.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) f1 tha News, Social, Written expert-annotated found
Citation
@software{bact_2019_3457447,
  author = {Suriyawongkul, Arthit and Chuangsuwanich, Ekapol and Chormai, Pattarawat and Polpanumas, Charin},
  doi = {10.5281/zenodo.3457447},
  month = sep,
  publisher = {Zenodo},
  title = {PyThaiNLP/wisesight-sentiment: First release},
  url = {https://doi.org/10.5281/zenodo.3457447},
  version = {v1.0},
  year = {2019},
}

WongnaiReviewsClassification

Wongnai features over 200,000 restaurants, beauty salons, and spas across Thailand, with detailed information about each merchant and user reviews. This dataset has five classes, one for each star rating; a loading sketch follows the citation below.

Dataset: Wongnai/wongnai_reviews • License: lgpl-3.0 • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy tha Reviews, Written derived found
Citation
@software{cstorm125_2020_3852912,
  author = {cstorm125 and lukkiddd},
  doi = {10.5281/zenodo.3852912},
  month = may,
  publisher = {Zenodo},
  title = {PyThaiNLP/classification-benchmarks: v0.1-alpha},
  url = {https://doi.org/10.5281/zenodo.3852912},
  version = {v0.1-alpha},
  year = {2020},
}
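
To inspect the raw five-way labels, the corpus can be loaded directly. A minimal sketch using the Hugging Face datasets library; the split and column names follow the dataset card linked above and should be treated as assumptions:

from datasets import load_dataset

# Load the training split: review texts paired with a star-rating class label.
ds = load_dataset("Wongnai/wongnai_reviews", split="train")

sample = ds[0]
print(sample["review_body"][:80])  # Thai review text
print(sample["star_rating"])       # encoded class id, one class per star rating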

YahooAnswersTopicsClassification

Dataset composed of questions and answers from Yahoo Answers, categorized into topics.

Dataset: community-datasets/yahoo_answers_topics • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Web, Written human-annotated found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

YahooAnswersTopicsClassification.v2

Dataset composed of questions and answers from Yahoo Answers, categorized into topics. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/yahoo_answers_topics • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Web, Written human-annotated found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

YelpReviewFullClassification

Yelp Review Full is a dataset for sentiment analysis, containing 5 classes corresponding to ratings 1-5.

Dataset: Yelp/yelp_review_full • License: https://huggingface.co/datasets/Yelp/yelp_review_full#licensing-information • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written derived found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

YelpReviewFullClassification.v2

Yelp Review Full is a dataset for sentiment analysis, containing 5 classes corresponding to ratings 1-5. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/yelp_review_full • License: https://huggingface.co/datasets/Yelp/yelp_review_full#licensing-information • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy eng Reviews, Written derived found
Citation
@inproceedings{NIPS2015_250cf8b5,
  author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
  publisher = {Curran Associates, Inc.},
  title = {Character-level Convolutional Networks for Text Classification},
  url = {https://proceedings.neurips.cc/paper_files/paper/2015/file/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf},
  volume = {28},
  year = {2015},
}

YueOpenriceReviewClassification

A Cantonese dataset for review classification.

Dataset: izhx/yue-openrice-review • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy yue Reviews, Spoken human-annotated found
Citation
@inproceedings{xiang2019sentiment,
  author = {Xiang, Rong and Jiao, Ying and Lu, Qin},
  booktitle = {Proceedings of the 8th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM)},
  organization = {KDD WISDOM},
  pages = {1--9},
  title = {Sentiment Augmented Attention Network for Cantonese Restaurant Review Analysis},
  year = {2019},
}

YueOpenriceReviewClassification.v2

A Cantonese dataset for review classification. This version corrects errors found in the original data. For details, see the pull request.

Dataset: mteb/yue_openrice_review • License: not specified • Learn more →

Task category Score Languages Domains Annotations Creators Sample Creation
text to category (t2c) accuracy yue Reviews, Spoken human-annotated found
Citation
@inproceedings{xiang2019sentiment,
  author = {Xiang, Rong and Jiao, Ying and Lu, Qin},
  booktitle = {Proceedings of the 8th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM)},
  organization = {KDD WISDOM},
  pages = {1--9},
  title = {Sentiment Augmented Attention Network for Cantonese Restaurant Review Analysis},
  year = {2019},
}