# Datasets

###### tags: `graduated` `dataset`

ref: https://blog.csdn.net/z704630835/article/details/99844183

## VQA v2

website: https://visualqa.org/
paper: https://arxiv.org/pdf/1612.00837.pdf
image_source: COCO
size: 1.1 million (image, question) pairs (a minimal loading sketch is given at the end of this note)

![](https://i.imgur.com/HbBf49N.png)

e.g.
![](https://i.imgur.com/lBAH0HV.png)

## Visual Genome (International Journal of Computer Vision 2017)

website: https://visualgenome.org/
paper: https://visualgenome.org/static/paper/Visual_Genome.pdf

e.g.
![](https://i.imgur.com/9fzhi38.png)

---

## VQA-E

> The `VQA-E` dataset is automatically derived from the popular VQA v2 dataset.

paper: https://arxiv.org/pdf/1803.07464.pdf
references: https://blog.csdn.net/z704630835/article/details/102721997
image_source: COCO

e.g.
![](https://i.imgur.com/qW5lqWs.png)

---

## SNLI-VE: Visual Entailment Dataset

> `SNLI-VE` is built on top of `SNLI` and `Flickr30K`. The Visual Entailment (VE) task is to reason about the relationship between an image premise $P_{image}$ and a text hypothesis $H_{text}$ (see the loading sketch at the end of this note).

github: https://github.com/allenai/allennlp-models, https://github.com/necla-ml/SNLI-VE
image_source: Flickr30K

e.g.
![](https://i.imgur.com/naL3jjL.jpg)

## VQA-CP

> Based on VQA v2; the answer distributions of the training and testing splits are deliberately made different (changing priors).

paper: https://arxiv.org/pdf/1712.00377.pdf

## Visual7W

> Multiple-choice VQA: 47,300 COCO images, 327,929 QA pairs, 1,311,756 human-generated multiple-choice answers, and 561,459 object groundings.

paper: http://ai.stanford.edu/~yukez/papers/cvpr2016.pdf

e.g.
![](https://i.imgur.com/jISZVUQ.jpg)

## GQA (IEEE/CVF 2019)

> A newer VQA dataset whose images come from [COCO](https://cocodataset.org/#home), [Flickr](https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67) and [Visual Genome](https://visualgenome.org/).

paper: https://arxiv.org/pdf/1902.09506.pdf
website: https://cs.stanford.edu/people/dorarad/gqa/index.html

![](https://i.imgur.com/n8R5olc.png)

e.g.
![](https://i.imgur.com/h2cZmNh.png)

## e-ViL (ICCV 2021)

> Human-written NLEs (natural language explanations); provides a unified evaluation framework designed to be reusable by future work.

paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Kayser_E-ViL_A_Dataset_and_Benchmark_for_Natural_Language_Explanations_in_ICCV_2021_paper.pdf

![](https://i.imgur.com/Rw5eZ1O.png)

e.g.
![](https://i.imgur.com/CqmvZri.png)

## VCR (CVPR 2019)

website: https://visualcommonsense.com/
paper: https://arxiv.org/pdf/1811.10830.pdf

usage:
![](https://i.imgur.com/t6KhVgv.png)

e.g.
![](https://i.imgur.com/qC6Tg50.png)

## LVIS (CVPR 2019)

tag: `segmentation`
paper: https://arxiv.org/abs/1908.03195

e.g.
![](https://i.imgur.com/B5m8Vp0.jpg)

## MMDialog

github: https://github.com/victorsungo/MMDialog
paper: https://arxiv.org/abs/2211.05719

e.g.
![](https://i.imgur.com/A62QM6i.png)

## VQA-X

paper: https://arxiv.org/pdf/1711.07373.pdf
github:
drive:

## ACT-X (CVPR'14)

paper: https://ieeexplore.ieee.org/document/6909866

## TextVQA-X

github: https://github.com/amzn/explainable-text-vqa

## IconQA

website: https://iconqa.github.io/
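
---

## Loading sketches

The VQA v2 entry above counts the dataset in (image, question) pairs; in practice each question is matched to its annotation via `question_id`. Below is a minimal sketch (not an official loader) of assembling (image_id, question, answer) triples. The file names and JSON keys are assumptions based on the official v2 downloads and should be verified against the actual release.

```python
import json

# Minimal sketch: pair VQA v2 questions with their annotations via question_id.
# File names and keys are assumptions based on the official v2 release;
# verify them against the downloaded files.
QUESTIONS_JSON = "v2_OpenEnded_mscoco_train2014_questions.json"  # assumed name
ANNOTATIONS_JSON = "v2_mscoco_train2014_annotations.json"        # assumed name

with open(QUESTIONS_JSON) as f:
    questions = json.load(f)["questions"]      # [{image_id, question_id, question}, ...]
with open(ANNOTATIONS_JSON) as f:
    annotations = json.load(f)["annotations"]  # [{question_id, multiple_choice_answer, ...}, ...]

ann_by_qid = {a["question_id"]: a for a in annotations}

# Build (image_id, question, answer) triples.
triples = [
    (q["image_id"], q["question"], ann_by_qid[q["question_id"]]["multiple_choice_answer"])
    for q in questions
]
print(len(triples), triples[0])
```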
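
For SNLI-VE, each example reduces to a (premise image, hypothesis sentence, 3-way label) triple, since the dataset re-uses SNLI's jsonl format with an added Flickr30K image reference. The field names (`Flickr30K_ID`, `sentence2`, `gold_label`) and the file name below are assumptions; check them against the files distributed in the necla-ml/SNLI-VE repository.

```python
import json

# Minimal sketch: iterate over SNLI-VE examples as
# (premise image id, hypothesis text, entailment label) triples.
# Field names are assumptions inherited from the SNLI jsonl format.
LABELS = {"entailment", "neutral", "contradiction"}

def read_snli_ve(path):
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            if ex.get("gold_label") not in LABELS:  # SNLI marks unresolved items with "-"
                continue
            yield ex["Flickr30K_ID"], ex["sentence2"], ex["gold_label"]

# Usage (assumed file name):
# for image_id, hypothesis, label in read_snli_ve("snli_ve_train.jsonl"):
#     ...
```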