AudioPairClassification¶

Number of tasks: 5

CREMADPairClassification¶

Classifying pairs as having same or different emotions in actor's voice recordings of text spoken in 6 different emotions

Dataset: mteb/CREMADPairClassification • License: http://opendatacommons.org/licenses/odbl/1.0/ • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to audio (a2a)	max_ap	eng	Spoken	human-annotated	created

Citation

@article{Cao2014-ih,
  author = {Cao, Houwei and Cooper, David G and Keutmann, Michael K and Gur,
Ruben C and Nenkova, Ani and Verma, Ragini},
  copyright = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html},
  journal = {IEEE Transactions on Affective Computing},
  keywords = {Emotional corpora; facial expression; multi-modal recognition;
voice expression},
  language = {en},
  month = oct,
  number = {4},
  pages = {377--390},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  title = {{CREMA-D}: Crowd-sourced emotional multimodal actors dataset},
  volume = {5},
  year = {2014},
}

ESC50PairClassification¶

Environmental Sound Classification Dataset.

Dataset: mteb/ESC50PairClassification • License: cc-by-nc-sa-3.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to audio (a2a)	max_ap	zxx	Encyclopaedic	human-annotated	found

Citation

@inproceedings{piczak2015dataset,
  author = {Piczak, Karol J.},
  booktitle = {Proceedings of the 23rd {Annual ACM Conference} on {Multimedia}},
  date = {2015-10-13},
  doi = {10.1145/2733373.2806390},
  isbn = {978-1-4503-3459-4},
  location = {{Brisbane, Australia}},
  pages = {1015--1018},
  publisher = {{ACM Press}},
  title = {{ESC}: {Dataset} for {Environmental Sound Classification}},
  url = {http://dl.acm.org/citation.cfm?doid=2733373.2806390},
}

NMSQAPairClassification¶

A textless Q&A dataset. Given a pair of audio question and audio answer, is the answer relevant to the question?

Dataset: mteb/NMSQAPairClassification • License: cc-by-sa-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to audio (a2a)	max_ap	eng	Spoken	human-annotated	found

Citation

@misc{lin2022dualdiscretespokenunit,
  archiveprefix = {arXiv},
  author = {Guan-Ting Lin and Yung-Sung Chuang and Ho-Lam Chung and Shu-wen Yang and Hsuan-Jui Chen and Shuyan Dong and Shang-Wen Li and Abdelrahman Mohamed and Hung-yi Lee and Lin-shan Lee},
  eprint = {2203.04911},
  primaryclass = {cs.CL},
  title = {DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering},
  url = {https://arxiv.org/abs/2203.04911},
  year = {2022},
}

VocalSoundPairClassification¶

Recognizing whether two audio clips are the same human vocal expression (laughing, sighing, etc.)

Dataset: mteb/VocalSoundPairClassification • License: cc-by-sa-4.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to audio (a2a)	max_ap	eng	Spoken	human-annotated	found

Citation

@inproceedings{Gong_2022,
  author = {Gong, Yuan and Yu, Jin and Glass, James},
  booktitle = {ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  doi = {10.1109/icassp43922.2022.9746828},
  month = may,
  publisher = {IEEE},
  title = {Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition},
  url = {http://dx.doi.org/10.1109/ICASSP43922.2022.9746828},
  year = {2022},
}

VoxPopuliAccentPairClassification¶

Classifying same or different regional accent of English (Empty Audio Samples filtered out)

Dataset: mteb/VoxPopuliAccentPairClassificationFiltered • License: cc0-1.0 • Learn more →

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to audio (a2a)	max_ap	eng	Spoken	human-annotated	created

Citation

@inproceedings{wang-etal-2021-voxpopuli,
  address = {Online},
  author = {Wang, Changhan  and
Riviere, Morgane  and
Lee, Ann  and
Wu, Anne  and
Talnikar, Chaitanya  and
Haziza, Daniel  and
Williamson, Mary  and
Pino, Juan  and
Dupoux, Emmanuel},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  doi = {10.18653/v1/2021.acl-long.80},
  editor = {Zong, Chengqing  and
Xia, Fei  and
Li, Wenjie  and
Navigli, Roberto},
  month = aug,
  pages = {993--1003},
  publisher = {Association for Computational Linguistics},
  title = {{V}ox{P}opuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation},
  url = {https://aclanthology.org/2021.acl-long.80/},
  year = {2021},
}