How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild

Authors

  • Pablo Antonio Moreno Casares, Universidad Complutense de Madrid, Spain
  • Bao Sheng Loe, The Psychometrics Centre, Cambridge Judge Business School, UK
  • John Burden, Centre for the Study of Existential Risk, University of Cambridge, UK
  • Sean hEigeartaigh, Centre for the Study of Existential Risk, University of Cambridge, UK; Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK
  • José Hernández-Orallo, Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK; Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, Spain

DOI:

https://doi.org/10.1609/aaai.v36i5.20466

Keywords:

Humans And AI (HAI), Philosophy And Ethics Of AI (PEAI), Speech & Natural Language Processing (SNLP)

Abstract

The new generation of language models is reported to solve some extraordinary tasks that the models were never specifically trained for, in few-shot or zero-shot settings. However, these reports usually cherry-pick the tasks, use the best prompts, and unwrap or extract the solutions leniently, even when they are followed by nonsensical text. In sum, they are specialised results for one domain and one particular way of using the models and interpreting the results. In this paper, we present a novel theoretical evaluation framework and a distinctive experimental study assessing language models as general-purpose systems when used directly by human prompters, that is, in the wild. For useful and safe interaction under these increasingly common conditions, we need to understand when the model fails because of a lack of capability and when it fails because it misunderstands the user's intent. Our results indicate that language models such as GPT-3 have a limited understanding of human commands and are far from being general-purpose systems in the wild.

Published

2022-06-28

How to Cite

Casares, P. A. M., Loe, B. S., Burden, J., hEigeartaigh, S., & Hernández-Orallo, J. (2022). How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild. Proceedings of the AAAI Conference on Artificial Intelligence, 36(5), 5295-5303. https://doi.org/10.1609/aaai.v36i5.20466

Section

AAAI Technical Track on Humans and AI