Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

Bang, Yejin; Yu, Tiezheng; Madotto, Andrea; Lin, Zhaojiang; Diab, Mona; Fung, Pascale

arXiv:2210.07652 (cs)

[Submitted on 14 Oct 2022]

Abstract: Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on human values written explicitly in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models on the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines, including few-shot learning with OPT-175B and existing text augmentation methods, by at least 15.56% on the F1-score. We suggest that using classifiers with explicit human value input improves both inclusivity and explainability in AI.
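The two steps named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Value:/Text:/Label:` prompt template, the `Example` type, the helper names, the `[SEP]` separator, and the choice to have the LLM label existing texts (the authors' data generation procedure may differ). The `llm` argument is any callable that maps a prompt string to a completion string, so no particular model API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    value: str  # explicitly written human value (hypothetical format)
    text: str   # text to be judged
    label: str  # judgement of the text under that value


def build_few_shot_prompt(demos: List[Example], value: str, text: str) -> str:
    """Format labelled demonstrations followed by one unlabelled query."""
    blocks = [
        f"Value: {d.value}\nText: {d.text}\nLabel: {d.label}" for d in demos
    ]
    # The final block is left open so the model completes the label.
    blocks.append(f"Value: {value}\nText: {text}\nLabel:")
    return "\n\n".join(blocks)


def generate_training_data(
    llm: Callable[[str], str],
    demos: List[Example],
    value: str,
    texts: List[str],
) -> List[Example]:
    """Step 1 (assumed form): obtain value-conditioned labels from an LLM."""
    data = []
    for text in texts:
        completion = llm(build_few_shot_prompt(demos, value, text)).strip()
        # Keep only the first token of the completion as the label.
        label = completion.split()[0] if completion else ""
        data.append(Example(value, text, label))
    return data


def to_classifier_input(ex: Example, sep: str = " [SEP] ") -> str:
    """Step 2 input (assumed form): the value is part of the classifier input."""
    return ex.value + sep + ex.text
```

The resulting `(to_classifier_input(ex), ex.label)` pairs would then serve as training data for a smaller classifier. Because the value statement is part of the input, the same text can receive different labels under different values, which is the behaviour the abstract describes.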

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2210.07652 [cs.CL] (or arXiv:2210.07652v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2210.07652

Submission history

From: Yejin Bang
[v1] Fri, 14 Oct 2022 09:10:49 UTC (2,007 KB)
