Quantifying the visual concreteness of words and topics in multimodal datasets

Hessel, Jack; Mimno, David; Lee, Lillian


arXiv:1804.06786 (cs)

[Submitted on 18 Apr 2018 (v1), last revised 23 May 2018 (this version, v2)]



Abstract: Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.

Comments:	NAACL HLT 2018, 14 pages, 6 figures, data available at this http URL
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:1804.06786 [cs.CL]
	(or arXiv:1804.06786v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1804.06786
Journal reference:	2018 North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT)

Submission history

From: Jack Hessel
[v1] Wed, 18 Apr 2018 15:23:04 UTC (5,075 KB)
[v2] Wed, 23 May 2018 19:15:45 UTC (5,075 KB)
