AudioZeroshotClassification
- Number of tasks: 5
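Every task in this category is an audio-to-text (a2t) task scored by accuracy. A minimal sketch of how such a zero-shot score could be computed is shown here, assuming each audio clip and each label text has already been embedded into a shared vector space. The helper names (`cosine`, `zero_shot_predict`, `accuracy`) and the toy 2-d embeddings are hypothetical and only illustrate the general recipe; this is not the benchmark's own implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_predict(audio_emb, label_embs):
    """Return the index of the label whose text embedding is most similar to the audio embedding."""
    scores = [cosine(audio_emb, label_emb) for label_emb in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(audio_embs, label_embs, gold):
    """Fraction of clips whose predicted label index matches the gold label index."""
    preds = [zero_shot_predict(a, label_embs) for a in audio_embs]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


# Toy, made-up embeddings: two label texts and three audio clips.
label_embs = [[1.0, 0.0], [0.0, 1.0]]
audio_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
gold = [0, 1, 1]

print(accuracy(audio_embs, label_embs, gold))
```

In this toy setup the third clip is closer to the first label than to its gold label, so two of the three predictions are correct.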
ESC50_Zeroshot
Environmental Sound Classification Dataset.
Dataset: mteb/esc50 • License: cc-by-nc-sa-3.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | zxx | Spoken | human-annotated | found |
Citation
@inproceedings{piczak2015dataset,
author = {Piczak, Karol J.},
booktitle = {Proceedings of the 23rd {Annual ACM Conference} on {Multimedia}},
date = {2015-10-13},
doi = {10.1145/2733373.2806390},
isbn = {978-1-4503-3459-4},
location = {{Brisbane, Australia}},
pages = {1015--1018},
publisher = {{ACM Press}},
title = {{ESC}: {Dataset} for {Environmental Sound Classification}},
url = {http://dl.acm.org/citation.cfm?doid=2733373.2806390},
}
RavdessZeroshot
Emotion classification dataset. RAVDESS contains 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech emotions include neutral, calm, happy, sad, angry, fearful, surprise, and disgust expressions. These 8 emotions also serve as the labels for the dataset.
Dataset: mteb/RavdessZeroshot • License: cc-by-nc-sa-3.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Spoken | human-annotated | found |
Citation
@article{10.1371/journal.pone.0196391,
author = {Livingstone, Steven R. AND Russo, Frank A.},
doi = {10.1371/journal.pone.0196391},
journal = {PLOS ONE},
month = {05},
number = {5},
pages = {1-35},
publisher = {Public Library of Science},
title = {The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English},
url = {https://doi.org/10.1371/journal.pone.0196391},
volume = {13},
year = {2018},
}
SpeechCommandsZeroshotv0.01
Sound classification/keyword spotting dataset. This is a set of one-second audio clips, each containing a single spoken English word or background noise. The words come from a small set of commands such as 'yes', 'no', and 'stop', spoken by various speakers. The dataset has 10 labels/commands for keyword spotting and 30 labels for other auxiliary tasks.
Dataset: mteb/SpeechCommandsZeroshotv0.01 • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Spoken | human-annotated | found |
Citation
@article{DBLP:journals/corr/abs-1804-03209,
author = {Pete Warden},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/journals/corr/abs-1804-03209.bib},
eprint = {1804.03209},
eprinttype = {arXiv},
journal = {CoRR},
timestamp = {Mon, 13 Aug 2018 16:48:32 +0200},
title = {Speech Commands: {A} Dataset for Limited-Vocabulary Speech Recognition},
url = {http://arxiv.org/abs/1804.03209},
volume = {abs/1804.03209},
year = {2018},
}
SpeechCommandsZeroshotv0.02
Sound classification/keyword spotting dataset. This is a set of one-second audio clips, each containing a single spoken English word or background noise. The words come from a small set of commands such as 'yes', 'no', and 'stop', spoken by various speakers. The dataset has 10 labels/commands for keyword spotting and 30 labels for other auxiliary tasks.
Dataset: mteb/SpeechCommandsZeroshotv0.02 • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Spoken | human-annotated | found |
Citation
@article{DBLP:journals/corr/abs-1804-03209,
author = {Pete Warden},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/journals/corr/abs-1804-03209.bib},
eprint = {1804.03209},
eprinttype = {arXiv},
journal = {CoRR},
timestamp = {Mon, 13 Aug 2018 16:48:32 +0200},
title = {Speech Commands: {A} Dataset for Limited-Vocabulary Speech Recognition},
url = {http://arxiv.org/abs/1804.03209},
volume = {abs/1804.03209},
year = {2018},
}
UrbanSound8kZeroshot
Environmental Sound Classification Dataset.
Dataset: mteb/urbansound8K • License: cc-by-nc-sa-3.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | zxx | AudioScene | human-annotated | found |
Citation
@inproceedings{Salamon:UrbanSound:ACMMM:14,
author = {Salamon, Justin and Jacoby, Christopher and Bello, Juan Pablo},
booktitle = {Proceedings of the 22nd ACM international conference on Multimedia},
organization = {ACM},
pages = {1041--1044},
title = {A Dataset and Taxonomy for Urban Sound Research},
year = {2014},
}