AudioMultilabelClassification
- Number of tasks: 6
AudioSet
AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.
Dataset: agkphysics/AudioSet • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | lrap | eng | Music, Scene, Speech, Web | human-annotated | found |
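The AudioSet tasks report `lrap`, label ranking average precision. For each clip, every true label receives the fraction of labels ranked at or above it that are also true. These fractions are averaged over the clip's true labels and then over all clips. The sketch below implements that textbook definition in plain Python. It assumes the benchmark uses the standard metric (scikit-learn ships it as `sklearn.metrics.label_ranking_average_precision_score`, which is useful as a cross-check), and the function name is illustrative, not part of any task API.

```python
def label_ranking_average_precision(y_true, y_score):
    """Label ranking average precision (LRAP) for multilabel predictions.

    y_true:  list of binary label vectors (1 = label present in the clip).
    y_score: list of score vectors with the same shape (higher = more likely).
    Illustrative helper; assumes the standard LRAP definition.
    """
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, y in enumerate(labels) if y == 1]
        if not relevant:
            # No true label: treated as a perfect score, following
            # scikit-learn's convention for this degenerate case.
            total += 1.0
            continue
        precision_sum = 0.0
        for j in relevant:
            # Rank of label j: how many labels score at least as high (ties count against j).
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those labels are actually true for this clip.
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            precision_sum += hits / rank
        total += precision_sum / len(relevant)
    return total / len(y_true)


y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
print(round(label_ranking_average_precision(y_true, y_score), 4))  # → 0.4167
```

Because LRAP only looks at how true labels are ranked against false ones, it works directly on continuous model scores and needs no decision threshold.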
Citation
@inproceedings{audioset,
author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
organization = {IEEE},
pages = {776--780},
title = {Audio Set: An ontology and human-labeled dataset for audio events},
year = {2017},
}
AudioSetMini
AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. This is a smaller version sampled from the original dataset.
Dataset: mteb/audioset • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | lrap | eng | Music, Scene, Speech, Web | human-annotated | found |
Citation
@inproceedings{audioset,
author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
organization = {IEEE},
pages = {776--780},
title = {Audio Set: An ontology and human-labeled dataset for audio events},
year = {2017},
}
BirdSet
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Dataset: mteb/BirdSet • License: cc-by-nc-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | zxx | Bioacoustics, Speech, Spoken | human-annotated | created |
Citation
@misc{rauch2024birdsetlargescaledatasetaudio,
archiveprefix = {arXiv},
author = {Lukas Rauch and Raphael Schwinger and Moritz Wirth and René Heinrich and Denis Huseljic and Marek Herde and Jonas Lange and Stefan Kahl and Bernhard Sick and Sven Tomforde and Christoph Scholz},
eprint = {2403.10380},
primaryclass = {cs.SD},
title = {BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics},
url = {https://arxiv.org/abs/2403.10380},
year = {2024},
}
FSD2019Kaggle
Multilabel audio classification on the FSDKaggle2019 dataset.
Dataset: mteb/fsdkaggle2019-parquet • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Web | human-annotated | found |
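This task reports `accuracy` on multilabel targets, but the page does not say which multilabel accuracy is meant. A common choice is subset (exact-match) accuracy, which is what scikit-learn's `accuracy_score` computes when given multilabel indicator arrays. The sketch below assumes that definition: a clip counts as correct only if its predicted label set equals its true label set. The helper name and the example labels are hypothetical and chosen for illustration.

```python
def subset_accuracy(true_labels, predicted_labels):
    """Fraction of clips whose predicted label set exactly matches the true set.

    Illustrative helper; assumes 'accuracy' means exact-match accuracy.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have the same length")
    # Compare as sets, so label order within a clip does not matter.
    matches = sum(
        1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p)
    )
    return matches / len(true_labels)


# Hypothetical FSDKaggle2019-style labels, for illustration only.
truth = [["Bark"], ["Male_speech_and_man_speaking", "Crowd"], ["Acoustic_guitar"]]
preds = [["Bark"], ["Male_speech_and_man_speaking"], ["Acoustic_guitar"]]
print(subset_accuracy(truth, preds))  # → 0.6666666666666666
```

Exact match is much stricter than LRAP: the second clip above gets no credit even though one of its two labels was predicted correctly.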
Citation
@dataset{eduardo_fonseca_2020_3612637,
author = {Eduardo Fonseca and
Manoj Plakal and
Frederic Font and
Daniel P. W. Ellis and
Xavier Serra},
doi = {10.5281/zenodo.3612637},
month = jan,
publisher = {Zenodo},
title = {FSDKaggle2019},
url = {https://doi.org/10.5281/zenodo.3612637},
version = {1.0},
year = {2020},
}
FSD50K
Multilabel audio classification on a subsampled version of FSD50K containing 2,048 samples.
Dataset: mteb/fsd50k_mini • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Web | human-annotated | found |
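The page states that the FSD50K subset has 2,048 samples but not how they were chosen. A minimal sketch, assuming uniform random sampling without replacement under a fixed seed, is shown below. The function name, the seed value, and the stand-in clip IDs are all hypothetical.

```python
import random


def fixed_size_subsample(items, n=2048, seed=42):
    """Draw n items uniformly without replacement, reproducibly.

    Hypothetical helper; the actual subsampling procedure is not documented here.
    """
    if n > len(items):
        raise ValueError(f"cannot draw {n} samples from {len(items)} items")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    indices = sorted(rng.sample(range(len(items)), n))  # preserve original order
    return [items[i] for i in indices]


# Hypothetical clip IDs standing in for a full FSD50K split.
clip_ids = [f"clip_{i:05d}" for i in range(10_000)]
mini = fixed_size_subsample(clip_ids)
print(len(mini))  # → 2048
```

Uniform sampling is simple and reproducible, but on multilabel data it can drop rare classes entirely; a stratified scheme trades that simplicity for better label coverage.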
Citation
@article{9645159,
author = {Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
doi = {10.1109/TASLP.2021.3133208},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
keywords = {Videos;Task analysis;Labeling;Vocabulary;Speech recognition;Ontologies;Benchmark testing;Audio dataset;sound event;recognition;classification;tagging;data collection;environmental sound},
pages = {829--852},
title = {FSD50K: An Open Dataset of Human-Labeled Sound Events},
volume = {30},
year = {2022},
}
SIBFLEURS
Topic classification on a multilingual audio dataset. This dataset is a stratified, downsampled subset of SIBFLEURS, a collection of over 1,000 hours of audio in more than 100 languages.
Dataset: mteb/sib-fleurs-multilingual-mini • License: cc-by-4.0
| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to category (a2c) | accuracy | afr, amh, arb, asm, ast, ... (101) | Encyclopaedic | derived | found |
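The description says this subset is stratified and downsampled but does not name the strata or the per-stratum size. A minimal sketch, assuming the strata are (language, topic) pairs and each is capped at a fixed number of examples, is shown below. The function name, the `per_stratum` value, the seed, and the example records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_downsample(examples, key, per_stratum, seed=0):
    """Keep at most `per_stratum` randomly chosen examples from each stratum.

    Hypothetical helper; `key` maps an example to its stratum label.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)
    sample = []
    for stratum in sorted(groups):  # fixed stratum order keeps the result reproducible
        members = groups[stratum]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records: 3 languages x 2 topics x 10 utterances each.
data = [
    {"lang": lang, "topic": topic, "id": f"{lang}-{topic}-{i}"}
    for lang in ["afr", "amh", "arb"]
    for topic in ["sports", "health"]
    for i in range(10)
]
subset = stratified_downsample(data, key=lambda ex: (ex["lang"], ex["topic"]), per_stratum=3)
print(len(subset))  # → 18
```

Capping each stratum, rather than sampling uniformly over the pool, keeps low-resource languages represented in the smaller evaluation set.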
Citation
@misc{schmidt2025fleursslumassivelymultilingualbenchmark,
archiveprefix = {arXiv},
author = {Fabian David Schmidt and Ivan Vulić and Goran Glavaš and David Ifeoluwa Adelani},
eprint = {2501.06117},
primaryclass = {cs.CL},
title = {Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding},
url = {https://arxiv.org/abs/2501.06117},
year = {2025},
}