The Limitations of Stylometry for Detecting Machine-Generated Fake News

Schuster, Tal; Schuster, Roei; Shah, Darsh J; Barzilay, Regina

arXiv:1908.09805 (cs)

[Submitted on 26 Aug 2019 (v1), last revised 20 Feb 2020 (this version, v2)]

Abstract: Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. While humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks demonstrating the stylistic similarity between malicious and legitimate uses of LMs, employed in auto-completion and editing-assistance settings. Our findings highlight the need for non-stylometry approaches in detecting machine-generated misinformation, and open up the discussion on the desired evaluation benchmarks.

Comments:	Accepted for Computational Linguistics journal (squib). Previously posted with title "Are We Safe Yet? The Limitations of Distributional Features for Fake News Detection"
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:1908.09805 [cs.CL]
	(or arXiv:1908.09805v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1908.09805

Submission history

From: Tal Schuster
[v1] Mon, 26 Aug 2019 17:23:22 UTC (190 KB)
[v2] Thu, 20 Feb 2020 18:32:33 UTC (208 KB)
