IJCAI International Joint Conferences on Artificial Intelligence Organization

#IJCAItutorial T24: Demystifying RL for Large Language Models: A training paradigm shift #IJCAI2024
🗣️ Florian STRUB, Olivier Pietquin
➡️ https://lnkd.in/eTXYwKeZ

Abstract: While Reinforcement Learning (RL) has recently become essential to aligning Large Language Models (LLMs), we have only scratched the surface of RL's potential impact on LLMs. Beyond alignment to human preferences, RL genuinely trains LLMs to generate full completions from prompts, potentially outperforming standard supervised approaches based on next-token prediction. Contrary to popular belief, the structural properties of the language domain make applying RL straightforward. This tutorial therefore offers a pedagogical dive into several RL-inspired methods for training language models efficiently. Taking an inductive approach, we use a summarization task as a running example to demystify RL-based training: we detail the hypotheses underlying online RL(HF) and DPO-like algorithms, point out good practices and pitfalls, and then explore original approaches such as language sequence modeling and self-play. We hope to democratize the use of RL in the LLM community and to foster the emergence of new language-modeling training paradigms. #LLMs #AI #Chatbot
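For readers unfamiliar with the DPO-like algorithms the abstract mentions, here is a minimal, illustrative PyTorch sketch of the Direct Preference Optimization loss on full completions (this is not code from the tutorial; all tensor names and values are hypothetical):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss on full completions.

    Each argument is the summed log-probability of an entire completion
    (e.g. a candidate summary) under the trainable policy or the frozen
    reference model; `beta` scales the implicit KL-style penalty.
    """
    # Implicit rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the preferred completion's implicit reward above the rejected one's
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Hypothetical log-probabilities for a batch of two preference pairs
policy_chosen = torch.tensor([-12.3, -9.8], requires_grad=True)
policy_rejected = torch.tensor([-11.1, -10.5], requires_grad=True)
ref_chosen = torch.tensor([-12.0, -10.0])
ref_rejected = torch.tensor([-11.0, -10.2])

loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
loss.backward()  # gradients flow only into the policy log-probabilities
```

Note how the loss scores whole completions rather than individual next tokens, which is exactly the shift from next-token prediction to full-completion training the abstract describes.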
