Aligning Large Language Models via Fine-grained Supervision

Xu, Dehong; Qiu, Liang; Kim, Minseok; Ladhak, Faisal; Do, Jaeyoung

arXiv:2406.02756 (cs)

[Submitted on 4 Jun 2024]

Abstract: Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences over LLM outputs into a feedback signal that guides the model learning process. However, because this approach operates on sequence-level feedback, it lacks the precision to identify the exact parts of the output affecting user preferences. To address this gap, we propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining most of the original content. The refined dataset is used to train a token-level reward model, which is then used for training our fine-grained Proximal Policy Optimization (PPO) model. Our experimental results demonstrate that this approach can achieve up to an absolute improvement of $5.1\%$ in LLM performance, in terms of win rate against the reference model, compared with the traditional PPO model.
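As an illustration of how minimal edits could yield token-level supervision, the sketch below aligns a less preferred response with its edited version and marks which original tokens survived the edit. This is not the authors' implementation: the abstract does not specify the labeling scheme, so the whitespace tokenization, the binary 1/0 labels, the function name `token_labels`, and the example sentences are all assumptions made for illustration. Only the standard-library `difflib.SequenceMatcher` is used.

```python
from difflib import SequenceMatcher


def token_labels(original: str, edited: str) -> list[tuple[str, int]]:
    """Label each token of `original`: 1 if the edit kept it, 0 if it was
    deleted or replaced. Hypothetical labeling scheme, for illustration only."""
    # Assumption: whitespace tokenization; a real system would likely use
    # the model's subword tokenizer instead.
    orig_tokens = original.split()
    edit_tokens = edited.split()

    labels = [0] * len(orig_tokens)
    matcher = SequenceMatcher(a=orig_tokens, b=edit_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":  # tokens the annotator left untouched
            for i in range(i1, i2):
                labels[i] = 1
    return list(zip(orig_tokens, labels))


if __name__ == "__main__":
    # Hypothetical annotation pair: the edit changes only what is necessary.
    rejected = "You should just ignore the silly warning"
    minimally_edited = "You should not ignore the warning"
    for token, label in token_labels(rejected, minimally_edited):
        print(f"{token:10s} {label}")
```

Under these assumptions, the tokens labeled 0 mark the spans an annotator changed, which is the kind of localized signal that sequence-level preference labels cannot provide. How such labels would be turned into per-token rewards for PPO is left open here, since the abstract does not describe it.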

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2406.02756 [cs.CL]
	(or arXiv:2406.02756v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.02756

Submission history

From: Dehong Xu
[v1] Tue, 4 Jun 2024 20:21:45 UTC (4,349 KB)
