STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi


Abstract
In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct STAIR Captions, a large-scale Japanese image caption dataset based on images from MS-COCO. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In experiments, we show that a neural network trained on STAIR Captions generates more natural and higher-quality Japanese captions than a pipeline that first generates English captions and then applies English-Japanese machine translation.
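Because STAIR Captions annotates MS-COCO images, its annotation files can be read like COCO-style caption JSON. The minimal sketch below shows one way such a file might be loaded and its captions grouped per image; the filename and the "annotations"/"image_id"/"caption" field names are assumptions based on the standard MS-COCO captions schema, not details taken from the paper.

    import json
    from collections import defaultdict

    # Load a COCO-style caption annotation file. The filename and JSON
    # schema here are assumed to mirror the MS-COCO captions format.
    with open("stair_captions_v1.2_train.json", encoding="utf-8") as f:
        data = json.load(f)

    # Group captions by image id: each image carries several captions
    # (five per image in MS-COCO-style datasets).
    captions_by_image = defaultdict(list)
    for ann in data["annotations"]:
        captions_by_image[ann["image_id"]].append(ann["caption"])

    print(len(captions_by_image), "images,",
          sum(len(c) for c in captions_by_image.values()), "captions")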
Anthology ID:
P17-2066
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
417–421
URL:
https://aclanthology.org/P17-2066
DOI:
10.18653/v1/P17-2066
Cite (ACL):
Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. 2017. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 417–421, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset (Yoshikawa et al., ACL 2017)
PDF:
https://aclanthology.org/P17-2066.pdf
Data
STAIR Captions
Flickr30k
MS COCO