AudioMultilabelClassification

Number of tasks: 6

AudioSet

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.

Dataset: agkphysics/AudioSet • License: cc-by-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to text (a2t)	lrap	eng	Music, Scene, Speech, Web	human-annotated	found
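The AudioSet task's main score is lrap (label ranking average precision). For each clip and each of its true labels, the metric asks what fraction of the labels ranked at or above that label are also true, then averages over true labels and over clips. The sketch below is a plain-Python illustration of that definition as scikit-learn documents it for `sklearn.metrics.label_ranking_average_precision_score`. The function name is hypothetical, and MTEB's own evaluator may differ in details such as how clips with no true labels are treated.

```python
from typing import Sequence


def label_ranking_average_precision(
    y_true: Sequence[Sequence[int]],
    y_score: Sequence[Sequence[float]],
) -> float:
    """Hypothetical helper: LRAP for multilabel predictions.

    y_true:  multi-hot rows (1 = the label is present for that clip).
    y_score: per-label scores with the same shape; higher = more likely.
    """
    if not y_true or len(y_true) != len(y_score):
        raise ValueError("inputs must be non-empty and have the same number of rows")
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        if not relevant:
            # Assumed convention (scikit-learn's): a clip with no true labels scores 1.0.
            total += 1.0
            continue
        precisions = []
        for j in relevant:
            # Rank of label j = number of labels scored at least as high (ties count against j).
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those top-ranked labels are actually true.
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            precisions.append(hits / rank)
        total += sum(precisions) / len(precisions)
    return total / len(y_true)


# Usage: clip 1's true label is ranked 2nd (precision 1/2), clip 2's is ranked 3rd (1/3).
score = label_ranking_average_precision(
    [[1, 0, 0], [0, 0, 1]],
    [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]],
)
print(round(score, 4))  # 0.4167
```

Because the metric only looks at the ordering of scores, it does not require choosing a decision threshold, which suits a label space as large as AudioSet's.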

Citation

@inproceedings{audioset,
  author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
  booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  organization = {IEEE},
  pages = {776--780},
  title = {Audio set: An ontology and human-labeled dataset for audio events},
  year = {2017},
}

AudioSetMini

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. This task uses a smaller version sampled from the original dataset.

Dataset: mteb/audioset • License: cc-by-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to text (a2t)	lrap	eng	Music, Scene, Speech, Web	human-annotated	found

Citation

@inproceedings{audioset,
  author = {Gemmeke, Jort F and Ellis, Daniel PW and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R Channing and Plakal, Manoj and Ritter, Marvin},
  booktitle = {2017 IEEE international conference on acoustics, speech and signal processing (ICASSP)},
  organization = {IEEE},
  pages = {776--780},
  title = {Audio set: An ontology and human-labeled dataset for audio events},
  year = {2017},
}

BirdSet

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

Dataset: mteb/BirdSet • License: cc-by-nc-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to text (a2t)	accuracy	zxx	Bioacoustics, Speech, Spoken	human-annotated	created

Citation

@misc{rauch2024birdsetlargescaledatasetaudio,
  archiveprefix = {arXiv},
  author = {Lukas Rauch and Raphael Schwinger and Moritz Wirth and René Heinrich and Denis Huseljic and Marek Herde and Jonas Lange and Stefan Kahl and Bernhard Sick and Sven Tomforde and Christoph Scholz},
  eprint = {2403.10380},
  primaryclass = {cs.SD},
  title = {BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics},
  url = {https://arxiv.org/abs/2403.10380},
  year = {2024},
}

FSD2019Kaggle

Multilabel audio classification on the FSDKaggle2019 dataset.

Dataset: mteb/fsdkaggle2019-parquet • License: cc-by-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to text (a2t)	accuracy	eng	Web	human-annotated	found
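FSD2019Kaggle, like several other tasks on this page, reports accuracy. In a multilabel setting each clip carries a set of labels, usually encoded as a multi-hot vector, and one common reading of "accuracy" is exact-match (subset) accuracy: a clip counts as correct only if every label is predicted correctly. This is what scikit-learn's `accuracy_score` computes for multilabel indicator arrays. This page does not state which variant MTEB uses, so treat the sketch below as an illustration under that assumption. The helper names and the three-class label list are hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def to_multi_hot(label_sets: Iterable[Set[str]], classes: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: turn each clip's set of label names into a 0/1 vector over `classes`."""
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for name in labels:
            row[index[name]] = 1  # raises KeyError if a label is not in `classes`
        rows.append(row)
    return rows


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: fraction of clips whose predicted vector matches the true one exactly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and have the same length")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Usage with a hypothetical label vocabulary: the second clip misses one of its two labels.
classes = ["Bark", "Music", "Speech"]
y_true = to_multi_hot([{"Bark"}, {"Music", "Speech"}], classes)
y_pred = to_multi_hot([{"Bark"}, {"Music"}], classes)
print(subset_accuracy(y_true, y_pred))  # 0.5
```

Exact match is a strict criterion: a prediction that recovers one of two labels earns nothing for that clip, which is why ranking-based scores such as lrap are often reported alongside it.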

Citation

@dataset{eduardo_fonseca_2020_3612637,
  author = {Eduardo Fonseca and Manoj Plakal and Frederic Font and Daniel P. W. Ellis and Xavier Serra},
  doi = {10.5281/zenodo.3612637},
  month = jan,
  publisher = {Zenodo},
  title = {FSDKaggle2019},
  url = {https://doi.org/10.5281/zenodo.3612637},
  version = {1.0},
  year = {2020},
}

FSD50K

Multilabel audio classification on a subsampled version of FSD50K containing 2,048 samples.

Dataset: mteb/fsd50k_mini • License: cc-by-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to text (a2t)	accuracy	eng	Web	human-annotated	found

Citation

@article{9645159,
  author = {Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
  doi = {10.1109/TASLP.2021.3133208},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  keywords = {Videos;Task analysis;Labeling;Vocabulary;Speech recognition;Ontologies;Benchmark testing;Audio dataset;sound event;recognition;classification;tagging;data collection;environmental sound},
  pages = {829--852},
  title = {FSD50K: An Open Dataset of Human-Labeled Sound Events},
  volume = {30},
  year = {2022},
}

SIBFLEURS

Topic classification on a multilingual audio dataset. The data is a stratified, downsampled subset of the SIBFLEURS dataset, a collection of more than 1,000 hours of audio in more than 100 languages.

Dataset: mteb/sib-fleurs-multilingual-mini • License: cc-by-4.0

Task category	Score	Languages	Domains	Annotations Creators	Sample Creation
audio to category (a2c)	accuracy	afr, amh, arb, asm, ast, ... (101)	Encyclopaedic	derived	found
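The SIBFLEURS task is described as a stratified, downsampled subset. Stratified downsampling means sampling separately within each group (for example each language) so that every group survives and the subset roughly keeps the original group proportions. This page does not say how mteb/sib-fleurs-multilingual-mini was actually built; the sketch below only illustrates the general technique, and the function name, the choice of language as the stratum, the fraction, and the seed are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_downsample(
    items: Sequence[T],
    strata: Sequence[Hashable],
    fraction: float,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: keep roughly `fraction` of the items from every stratum.

    At least one item is kept per stratum so that no group (e.g. a language) disappears.
    """
    if len(items) != len(strata):
        raise ValueError("items and strata must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item, key in zip(items, strata):
        groups[key].append(item)
    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    subset: List[T] = []
    for key in sorted(groups, key=repr):  # deterministic group order
        members = groups[key]
        # Python's round() rounds exact halves to even; max(1, ...) guarantees a survivor.
        k = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, k))
    return subset


# Usage: 8 Afrikaans clips and 2 Amharic clips, keeping half of each language.
clips = [f"clip{i}" for i in range(10)]
langs = ["afr"] * 8 + ["amh"] * 2
subset = stratified_downsample(clips, langs, fraction=0.5)
print(len(subset))  # 5 (4 "afr" + 1 "amh")
```

Sampling per stratum rather than from the pooled data keeps low-resource languages from being dropped by chance, which matters for a benchmark spanning 100+ languages.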

Citation

@misc{schmidt2025fleursslumassivelymultilingualbenchmark,
  archiveprefix = {arXiv},
  author = {Fabian David Schmidt and Ivan Vulić and Goran Glavaš and David Ifeoluwa Adelani},
  eprint = {2501.06117},
  primaryclass = {cs.CL},
  title = {Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding},
  url = {https://arxiv.org/abs/2501.06117},
  year = {2025},
}