Reconsidering the Past: Optimizing Hidden States in Language Models

Yoshida, Davis; Gimpel, Kevin

arXiv:2112.08653 (cs)

[Submitted on 16 Dec 2021]

Abstract: We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models at inference time. Similar to dynamic evaluation (Krause et al., 2018), HSO computes the gradient of the log-probability the language model assigns to an evaluation text, but uses it to update the cached hidden states rather than the model parameters. We test HSO with pretrained Transformer-XL and GPT-2 language models, finding improvement on the WikiText103 and PG-19 datasets in terms of perplexity, especially when evaluating a model outside of its training distribution. We also demonstrate downstream applicability by showing gains in the recently developed prompt-based few-shot evaluation setting, again with no extra parameters or training data.
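
The core idea in the abstract (take the gradient of the log-probability with respect to cached hidden states and leave the parameters frozen) can be illustrated with a toy sketch. This is not the authors' implementation: a single cached vector `h` and a fixed linear softmax readout `W` stand in for a transformer's cached states and weights, and the names `log_prob`, `grad_wrt_hidden`, `hso_step`, and the step size `lr` are hypothetical choices made for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def logits_from(W, h):
    # Linear readout: one logit per vocabulary row of W.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def log_prob(W, h, target):
    # Log-probability the toy model assigns to the observed token.
    return log_softmax(logits_from(W, h))[target]

def grad_wrt_hidden(W, h, target):
    # Analytic gradient: d/dh log p(target) = W[target] - sum_k p_k * W[k].
    probs = [math.exp(lp) for lp in log_softmax(logits_from(W, h))]
    return [
        W[target][j] - sum(p * row[j] for p, row in zip(probs, W))
        for j in range(len(h))
    ]

def hso_step(W, h, target, lr=0.1):
    # Gradient ascent on the cached state only; W (the parameters) is untouched.
    g = grad_wrt_hidden(W, h, target)
    return [x + lr * gx for x, gx in zip(h, g)]

# Assumed toy values: 3-token vocabulary, 2-dimensional cached state.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
h = [1.0, 0.5]
target = 2

before = log_prob(W, h, target)
h_new = hso_step(W, h, target)
after = log_prob(W, h_new, target)
print(f"log-prob before: {before:.4f}, after: {after:.4f}")
```

In this toy setting one small ascent step raises the log-probability of the observed token while `W` stays fixed, which is the contrast with dynamic evaluation that the abstract draws. In the setting the abstract describes, the updated quantities would be the cached states of a pretrained transformer, with gradients obtained by automatic differentiation rather than a hand-derived formula.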

Comments:	Findings of EMNLP version
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.08653 [cs.CL]
	(or arXiv:2112.08653v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.08653
Journal reference:	Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4099-4105

Submission history

From: Davis Yoshida
[v1] Thu, 16 Dec 2021 06:14:37 UTC (5,430 KB)
