AudioMultilabelClassification

  • Number of tasks: 6

AudioSet

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.

Dataset: agkphysics/AudioSet • License: cc-by-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | lrap | eng | Music, Scene, Speech, Web | human-annotated | found |
Citation
@inproceedings{audioset,
  author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
  booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  organization = {IEEE},
  pages = {776--780},
  title = {Audio set: An ontology and human-labeled dataset for audio events},
  year = {2017},
}
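
The `lrap` score listed for AudioSet stands for label ranking average precision. The sketch below shows one common way to compute it in plain Python. The helper name `lrap` is hypothetical, and the handling of ties and of clips with no (or all) positive labels follows the scikit-learn convention, which is an assumption here; this page does not state how the benchmark implements the metric.

```python
from typing import Sequence


def lrap(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Label ranking average precision over binary indicator labels.

    Hypothetical helper for illustration only; not the benchmark's own code.
    """
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        if not relevant or len(relevant) == len(truth):
            # Assumed convention: a clip with no positives or only positives scores 1.0
            total += 1.0
            continue
        sample = 0.0
        for j in relevant:
            # Rank of label j: how many labels are scored at least as high
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those higher-or-equal labels are themselves relevant
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample += hits / rank
        total += sample / len(relevant)
    return total / len(y_true)


# Two clips, three classes: each clip has one true label
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
print(lrap(y_true, y_score))  # (1/2 + 1/3) / 2
```

In the first clip the true label is ranked second (precision 1/2), and in the second it is ranked third (precision 1/3), so the mean is about 0.417. A model that always ranks every true label above every false one would score 1.0.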

AudioSetMini

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. This mini version is sampled from the original dataset.

Dataset: mteb/audioset • License: cc-by-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | lrap | eng | Music, Scene, Speech, Web | human-annotated | found |
Citation
@inproceedings{audioset,
  author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
  booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  organization = {IEEE},
  pages = {776--780},
  title = {Audio set: An ontology and human-labeled dataset for audio events},
  year = {2017},
}

BirdSet

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

Dataset: mteb/BirdSet • License: cc-by-nc-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | zxx | Bioacoustics, Speech, Spoken | human-annotated | created |
Citation
@misc{rauch2024birdsetlargescaledatasetaudio,
  archiveprefix = {arXiv},
  author = {Lukas Rauch and Raphael Schwinger and Moritz Wirth and René Heinrich and Denis Huseljic and Marek Herde and Jonas Lange and Stefan Kahl and Bernhard Sick and Sven Tomforde and Christoph Scholz},
  eprint = {2403.10380},
  primaryclass = {cs.SD},
  title = {BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics},
  url = {https://arxiv.org/abs/2403.10380},
  year = {2024},
}

FSD2019Kaggle

Multilabel Audio Classification.

Dataset: mteb/fsdkaggle2019-parquet • License: cc-by-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Web | human-annotated | found |
Citation
@dataset{eduardo_fonseca_2020_3612637,
  author = {Eduardo Fonseca and Manoj Plakal and Frederic Font and Daniel P. W. Ellis and Xavier Serra},
  doi = {10.5281/zenodo.3612637},
  month = jan,
  publisher = {Zenodo},
  title = {FSDKaggle2019},
  url = {https://doi.org/10.5281/zenodo.3612637},
  version = {1.0},
  year = {2020},
}
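
For a multilabel task such as FSD2019Kaggle, the `accuracy` score can be read in more than one way. The sketch below assumes the strict reading, exact-match (subset) accuracy, where a clip counts as correct only if its whole predicted label set equals the reference set. That reading and the helper name `subset_accuracy` are assumptions for illustration; this page does not say which variant the benchmark uses.

```python
from typing import Sequence


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of clips whose predicted label vector matches the reference exactly.

    Hypothetical helper for illustration only; not the benchmark's own code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of clips")
    # A clip is a match only when every label agrees
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Two clips, three classes: the second prediction adds one spurious label
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1]]
print(subset_accuracy(y_true, y_pred))  # 0.5
```

Under this reading a single wrong label makes the whole clip count as an error, so scores tend to be lower than per-label measures on clips that carry many labels.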

FSD50K

Multilabel audio classification on a subsampled version of FSD50K containing 2,048 samples.

Dataset: mteb/fsd50k_mini • License: cc-by-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to text (a2t) | accuracy | eng | Web | human-annotated | found |
Citation
@article{9645159,
  author = {Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
  doi = {10.1109/TASLP.2021.3133208},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  keywords = {Videos;Task analysis;Labeling;Vocabulary;Speech recognition;Ontologies;Benchmark testing;Audio dataset;sound event;recognition;classification;tagging;data collection;environmental sound},
  number = {},
  pages = {829-852},
  title = {FSD50K: An Open Dataset of Human-Labeled Sound Events},
  volume = {30},
  year = {2022},
}

SIBFLEURS

Topic classification on a multilingual audio dataset. The data is a stratified, downsampled subset of the SIBFLEURS dataset, a collection of 1000+ hours of audio in 100+ languages.

Dataset: mteb/sib-fleurs-multilingual-mini • License: cc-by-4.0 • Learn more →

| Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
|---|---|---|---|---|---|
| audio to category (a2c) | accuracy | afr, amh, arb, asm, ast, ... (101) | Encyclopaedic | derived | found |
Citation
@misc{schmidt2025fleursslumassivelymultilingualbenchmark,
  archiveprefix = {arXiv},
  author = {Fabian David Schmidt and Ivan Vulić and Goran Glavaš and David Ifeoluwa Adelani},
  eprint = {2501.06117},
  primaryclass = {cs.CL},
  title = {Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding},
  url = {https://arxiv.org/abs/2501.06117},
  year = {2025},
}