Surfacing Biases in Large Language Models using Contrastive Input Decoding

Yona, Gal; Honovich, Or; Laish, Itay; Aharoni, Roee

Computer Science > Computation and Language

arXiv:2305.07378 (cs)

[Submitted on 12 May 2023]

Abstract: Ensuring that large language models (LMs) are fair, robust and useful requires an understanding of how different modifications to their inputs impact the model's behaviour. In the context of open-text generation tasks, however, such an evaluation is not trivial. For example, when presenting a model with an input text and a perturbed, "contrastive" version of it, standard decoding strategies may not reveal meaningful differences in the next-token predictions. With this motivation in mind, we propose Contrastive Input Decoding (CID): a decoding algorithm to generate text given two inputs, where the generated text is likely given one input but unlikely given the other. In this way, the contrastive generations can highlight, in a simple and interpretable manner, potentially subtle differences between the LM's outputs for the two inputs. We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies and to quantify the effect of different input perturbations.
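
The abstract describes CID only at a high level: the generated text should be likely given one input but unlikely given the other. The following is a minimal sketch of that idea under explicit assumptions, not the authors' implementation or exact objective. It assumes a per-token score of log p(t | x) minus `lam` times log p(t | x_contrast), adds a plausibility filter controlled by `alpha` (a common safeguard in contrastive decoding schemes, not taken from this abstract), and decodes greedily. The names `toy_model`, `contrastive_input_decode`, `lam`, and `alpha` are hypothetical, and `toy_model` is a hand-written stand-in for a real LM.

```python
import math
from typing import Callable, Dict, List

# Hypothetical interface: a "model" maps (prompt, tokens generated so far)
# to a next-token probability distribution.
NextTokenFn = Callable[[str, List[str]], Dict[str, float]]

EOS = "<eos>"


def greedy_decode(model: NextTokenFn, prompt: str, max_tokens: int = 10) -> List[str]:
    """Standard greedy decoding on a single input, for comparison."""
    out: List[str] = []
    for _ in range(max_tokens):
        probs = model(prompt, out)
        tok = max(probs, key=probs.get)
        if tok == EOS:
            break
        out.append(tok)
    return out


def contrastive_input_decode(
    model: NextTokenFn,
    x: str,
    x_contrast: str,
    lam: float = 1.0,
    alpha: float = 0.1,
    max_tokens: int = 10,
    eps: float = 1e-12,
) -> List[str]:
    """Greedily pick tokens that are likely given x but unlikely given x_contrast.

    Assumed score: log p(t | x, prefix) - lam * log p(t | x_contrast, prefix),
    restricted to tokens with p(t | x, prefix) >= alpha * max_t p(t | x, prefix).
    """
    out: List[str] = []
    for _ in range(max_tokens):
        p = model(x, out)
        q = model(x_contrast, out)
        # Plausibility filter (assumption): ignore tokens unlikely even given x.
        threshold = alpha * max(p.values())
        candidates = [t for t, pt in p.items() if pt >= threshold]

        def score(t: str) -> float:
            # eps guards against log(0) for tokens missing from either distribution.
            return math.log(p[t] + eps) - lam * math.log(q.get(t, 0.0) + eps)

        tok = max(candidates, key=score)
        if tok == EOS:
            break
        out.append(tok)
    return out


def toy_model(prompt: str, prefix: List[str]) -> Dict[str, float]:
    """Hand-written toy distributions (hypothetical); not a real LM."""
    if prefix:
        return {EOS: 1.0}
    if prompt == "input A":
        return {"red": 0.5, "green": 0.3, "blue": 0.2}
    return {"red": 0.5, "green": 0.05, "blue": 0.45}


print(greedy_decode(toy_model, "input A"))  # ['red']
print(greedy_decode(toy_model, "input B"))  # ['red']
print(contrastive_input_decode(toy_model, "input A", "input B"))  # ['green']
print(contrastive_input_decode(toy_model, "input B", "input A"))  # ['blue']
```

With these toy distributions, greedy decoding returns "red" for both inputs, mirroring the abstract's point that standard decoding can hide differences between two inputs. The contrastive sketch instead surfaces "green", the token whose probability shifts most toward input A, and "blue" when the roles of the two inputs are swapped.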

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2305.07378 [cs.CL]
	(or arXiv:2305.07378v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.07378

Submission history

From: Gal Yona
[v1] Fri, 12 May 2023 11:09:49 UTC (7,280 KB)
