DocumentUnderstanding¶
- Number of tasks: 58
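Every task listed here reports `ndcg_at_5` as its score. The sketch below shows one common way to compute nDCG@5 for a single query, assuming linear gains and a log2 rank discount. The function names, document IDs, and `qrels` layout are illustrative assumptions, not the benchmark's own evaluation code.

```python
import math


def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k relevance values."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, qrels, k=5):
    """nDCG@k for one query.

    ranked_ids: document IDs in the order the model returned them.
    qrels: dict mapping relevant document IDs to a graded relevance (> 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical query with one relevant image, retrieved at rank 2.
score = ndcg_at_k(["img_7", "img_2", "img_9"], {"img_2": 1})
print(round(score, 3))  # 0.631
```

A task-level score would typically be the mean of this value over all queries. A relevant document ranked below position 5 contributes nothing to the score.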
JinaVDRAirbnbSyntheticRetrieval¶
Retrieve rendered tables from Airbnb listings based on templated queries.
Dataset: jinaai/airbnb-synthetic-retrieval_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara, deu, eng, fra, hin, ... (10) | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRArabicChartQARetrieval¶
Retrieve Arabic charts based on queries.
Dataset: jinaai/arabic_chartqa_ar_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRArabicInfographicsVQARetrieval¶
Retrieve Arabic infographics based on queries.
Dataset: jinaai/arabic_infographicsvqa_ar_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRArxivQARetrieval¶
Retrieve figures from arXiv scientific papers based on LLM-generated queries.
Dataset: jinaai/arxivqa_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRAutomobileCatelogRetrieval¶
Retrieve automobile marketing documents based on LLM-generated queries.
Dataset: jinaai/automobile_catalogue_jp_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | jpn | Engineering, Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRBeveragesCatalogueRetrieval¶
Retrieve beverage marketing documents based on LLM-generated queries.
Dataset: jinaai/beverages_catalogue_ru_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | rus | Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRCharXivOCRRetrieval¶
Retrieve charts from scientific papers based on human-annotated queries.
Dataset: jinaai/CharXiv-en_beir
• License: cc-by-sa-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRChartQARetrieval¶
Retrieve charts based on LLM-generated queries.
Dataset: jinaai/ChartQA_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDocQAAI¶
Retrieve AI documents based on LLM-generated queries.
Dataset: jinaai/docqa_artificial_intelligence_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDocQAEnergyRetrieval¶
Retrieve energy industry documents based on LLM-generated queries.
Dataset: jinaai/docqa_energy_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDocQAGovReportRetrieval¶
Retrieve government reports based on LLM-generated queries.
Dataset: jinaai/docqa_gov_report_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Government | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDocQAHealthcareIndustryRetrieval¶
Retrieve healthcare industry documents based on LLM-generated queries.
Dataset: jinaai/docqa_healthcare_industry_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Medical | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDocVQARetrieval¶
Retrieve industry documents based on human-annotated queries.
Dataset: jinaai/docvqa_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRDonutVQAISynHMPRetrieval¶
Retrieve medical records based on templated queries.
Dataset: jinaai/donut_vqa_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Medical | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDREuropeanaDeNewsRetrieval¶
Retrieve German news articles based on LLM-generated queries.
Dataset: jinaai/europeana-de-news_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | deu | News | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDREuropeanaEsNewsRetrieval¶
Retrieve Spanish news articles based on LLM-generated queries.
Dataset: jinaai/europeana-es-news_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | spa | News | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDREuropeanaFrNewsRetrieval¶
Retrieve French news articles from Europeana based on LLM-generated queries.
Dataset: jinaai/europeana-fr-news_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | fra | News | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDREuropeanaItScansRetrieval¶
Retrieve Italian historical articles based on LLM-generated queries.
Dataset: jinaai/europeana-it-scans_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ita | News | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDREuropeanaNlLegalRetrieval¶
Retrieve Dutch historical legal documents based on LLM-generated queries.
Dataset: jinaai/europeana-nl-legal_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | nld | Legal | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRGitHubReadmeRetrieval¶
Retrieve GitHub README files based on their descriptions.
Dataset: jinaai/github-readme-retrieval-multilingual_beir
• License: multiple • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara, ben, deu, eng, fra, ... (17) | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRHindiGovVQARetrieval¶
Retrieve Hindi government documents based on LLM-generated queries.
Dataset: jinaai/hindi-gov-vqa_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | hin | Government | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRHungarianDocQARetrieval¶
Retrieve Hungarian documents in various formats based on human-annotated queries.
Dataset: jinaai/hungarian_doc_qa_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | hun | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRInfovqaRetrieval¶
Retrieve infographics based on human-annotated queries.
Dataset: jinaai/infovqa_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRJDocQARetrieval¶
Retrieve Japanese documents in various formats based on human-annotated queries.
Dataset: jinaai/jdocqa_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | jpn | Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRJina2024YearlyBookRetrieval¶
Retrieve pages from the 2024 Jina yearbook based on human-annotated questions.
Dataset: jinaai/jina_2024_yearly_book_beir
• License: apache-2.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRMMTabRetrieval¶
Retrieve tables from the MMTab dataset based on queries.
Dataset: jinaai/MMTab_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRMPMQARetrieval¶
Retrieve product manuals based on human-annotated queries.
Dataset: jinaai/mpmqa_small_beir
• License: apache-2.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | human-annotated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRMedicalPrescriptionsRetrieval¶
Retrieve medical prescriptions based on templated queries.
Dataset: jinaai/medical-prescriptions_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Medical | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDROWIDChartsRetrieval¶
Retrieve charts from the OWID dataset based on accompanying text snippets.
Dataset: jinaai/owid_charts_en_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDROpenAINewsRetrieval¶
Retrieve news articles from the OpenAI news website based on human-annotated queries.
Dataset: jinaai/openai-news_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | News, Web | human-annotated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRPlotQARetrieval¶
Retrieve plots from the PlotQA dataset based on LLM-generated queries.
Dataset: jinaai/plotqa_beir
• License: cc-by-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRRamensBenchmarkRetrieval¶
Retrieve ramen restaurant marketing documents based on LLM-generated queries.
Dataset: jinaai/ramen_benchmark_jp_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | jpn | Web | LM-generated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRShanghaiMasterPlanRetrieval¶
Retrieve pages from the Shanghai Master Plan based on human-annotated queries.
Dataset: jinaai/shanghai_master_plan_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | zho | Web | human-annotated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRShiftProjectRetrieval¶
Retrieve documents with graphs from the Shift Project based on LLM-generated queries.
Dataset: jinaai/shiftproject_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRStanfordSlideRetrieval¶
Retrieve scientific and engineering slides based on human-annotated queries.
Dataset: jinaai/stanford_slide_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | human-annotated | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRStudentEnrollmentSyntheticRetrieval¶
Retrieve student enrollment data based on templated queries.
Dataset: jinaai/student-enrollment_beir
• License: cc0-1.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRTQARetrieval¶
Retrieve textbook pages (images and text) based on LLM-generated queries from the text.
Dataset: jinaai/tqa_beir
• License: cc-by-nc-3.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRTabFQuadRetrieval¶
Retrieve tables from industry documents based on LLM-generated queries.
Dataset: jinaai/tabfquad_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRTableVQARetrieval¶
Retrieve scientific tables based on LLM-generated queries.
Dataset: jinaai/table-vqa_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRTatQARetrieval¶
Retrieve financial reports based on human-annotated queries.
Dataset: jinaai/tatqa_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRTweetStockSyntheticsRetrieval¶
Retrieve rendered tables of stock prices based on templated queries.
Dataset: jinaai/tweet-stock-synthetic-retrieval_beir
• License: not specified • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara, deu, eng, fra, hin, ... (10) | Social | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRWikimediaCommonsDocumentsRetrieval¶
Retrieve historical documents from Wikimedia Commons based on their description.
Dataset: jinaai/wikimedia-commons-documents-ml_beir
• License: multiple • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara, ben, deu, eng, fra, ... (20) | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
JinaVDRWikimediaCommonsMapsRetrieval¶
Retrieve maps from Wikimedia Commons based on their description.
Dataset: jinaai/wikimedia-commons-maps_beir
• License: multiple • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Web | derived | found |
Citation
@misc{günther2025jinaembeddingsv4universalembeddingsmultimodal,
archiveprefix = {arXiv},
author = {Michael Günther and Saba Sturua and Mohammad Kalim Akram and Isabelle Mohr and Andrei Ungureanu and Bo Wang and Sedigheh Eslami and Scott Martens and Maximilian Werk and Nan Wang and Han Xiao},
eprint = {2506.18902},
primaryclass = {cs.AI},
title = {jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval},
url = {https://arxiv.org/abs/2506.18902},
year = {2025},
}
MIRACLVisionRetrieval¶
Retrieve associated pages according to questions.
Dataset: nvidia/miracl-vision
• License: cc-by-sa-4.0 • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | ara, ben, deu, eng, fas, ... (18) | Encyclopaedic | derived | created |
Citation
@article{osmulski2025miraclvisionlargemultilingualvisual,
author = {Radek Osmulski and Gabriel de Souza P. Moreira and Ronay Ak and Mengyao Xu and Benedikt Schifferer and Even Oldridge},
eprint = {2505.11651},
journal = {arxiv},
title = {{MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark}},
url = {https://arxiv.org/abs/2505.11651},
year = {2025},
}
Vidore2BioMedicalLecturesRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/biomedical_lectures_v2
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | deu, eng, fra, spa | Academic | derived | found |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
Vidore2ESGReportsHLRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/esg_reports_human_labeled_v2
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
Vidore2ESGReportsRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/esg_reports_v2
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | deu, eng, fra, spa | Academic | derived | found |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
Vidore2EconomicsReportsRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/economics_reports_v2
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | deu, eng, fra, spa | Academic | derived | found |
Citation
@article{mace2025vidorev2,
author = {Macé, Quentin and Loison, António and Faysse, Manuel},
journal = {arXiv preprint arXiv:2505.17166},
title = {ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
year = {2025},
}
VidoreArxivQARetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/arxivqa_test_subsampled_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreDocVQARetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/docvqa_test_subsampled_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreInfoVQARetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/infovqa_test_subsampled_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreShiftProjectRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/shiftproject_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreSyntheticDocQAAIRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/syntheticDocQA_artificial_intelligence_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreSyntheticDocQAEnergyRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/syntheticDocQA_energy_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreSyntheticDocQAGovernmentReportsRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/syntheticDocQA_government_reports_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreSyntheticDocQAHealthcareIndustryRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/syntheticDocQA_healthcare_industry_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreTabfquadRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/tabfquad_test_subsampled_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
VidoreTatdqaRetrieval¶
Retrieve associated pages according to questions.
Dataset: vidore/tatdqa_test_beir
• License: mit • Learn more →
Task category | Score | Languages | Domains | Annotations Creators | Sample Creation |
---|---|---|---|---|---|
text to image (t2i) | ndcg_at_5 | eng | Academic | derived | found |
Citation
@article{faysse2024colpali,
author = {Faysse, Manuel and Sibille, Hugues and Wu, Tony and Viaud, Gautier and Hudelot, C{\'e}line and Colombo, Pierre},
journal = {arXiv preprint arXiv:2407.01449},
title = {ColPali: Efficient Document Retrieval with Vision Language Models},
year = {2024},
}
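Every task listed above reports ndcg_at_5 as its score. As a rough illustration of what that number measures, the sketch below computes nDCG@5 for a single query. It assumes the common linear-gain formulation (relevance divided by log2 of rank + 1); the benchmark's own evaluation code may differ in details such as tie handling. The function names, document ids, and relevance judgments are hypothetical and chosen only for the example.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    # enumerate() is 0-based, so the rank-1 item is discounted by log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, qrels, k=5):
    """nDCG@k for one query.

    ranked_ids: document ids in the order the model returned them.
    qrels: mapping from document id to graded relevance (0 = not relevant).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    # The ideal ranking places the most relevant documents first.
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical query with one relevant page, retrieved at rank 2.
qrels = {"page_7": 1}
ranking = ["page_3", "page_7", "page_1", "page_9", "page_4"]
score = ndcg_at_k(ranking, qrels, k=5)
```

A task-level score would then be the mean of this value over all queries in the task, under the same assumptions.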