Creating a Multimodal Dataset of Images and Text to Study Abusive Language

Aprosio, Alessio Palmero; Menini, Stefano; Tonelli, Sara

arXiv:2005.02235 (cs)

[Submitted on 5 May 2020]

Abstract: In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest is of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely used, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal hate speech data. We therefore developed CREENDER, an annotation tool that has been used in school classes to create a multimodal dataset of images and abusive comments, which we make freely available under the Apache 2.0 license. The corpus, with Italian comments, has been analysed from different perspectives to investigate whether the subject of an image plays a role in triggering a comment. We find that users judge the same images in different ways, although the presence of a person in the picture increases the probability of receiving an offensive comment.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2005.02235 [cs.CL]
	(or arXiv:2005.02235v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.02235

Submission history

From: Alessio Palmero Aprosio
[v1] Tue, 5 May 2020 14:31:47 UTC (2,190 KB)