MVEB
Evaluates audio-visual video embedding quality across retrieval, classification, clustering, pair classification, zero-shot classification, and video-centric QA, with tasks selected to maximize coverage of joint audio-video inputs.
Languages: 16
Tasks: 23
Task Types: 6
Models: 0