Grokking Deep Reinforcement Learning
About this ebook
Summary
We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn based on the responses of the environment. Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching. You'll love the perfectly paced teaching and the clever, engaging writing style as you dig into this awesome exploration of reinforcement learning fundamentals, effective deep learning techniques, and practical applications in this emerging field.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
We learn by interacting with our environment, and the rewards or punishments we experience guide our future behavior. Deep reinforcement learning brings that same natural process to artificial intelligence, analyzing results to uncover the most efficient ways forward. DRL agents can improve marketing campaigns, predict stock performance, and beat grand masters in Go and chess.
About the book
Grokking Deep Reinforcement Learning uses engaging exercises to teach you how to build deep learning systems. This book combines annotated Python code with intuitive explanations to explore DRL techniques. You’ll see how algorithms function and learn to develop your own DRL agents using evaluative feedback.
What's inside
An introduction to reinforcement learning
DRL agents with human-like behaviors
Applying DRL to complex situations
About the reader
For developers with basic deep learning experience.
About the author
Miguel Morales works on reinforcement learning at Lockheed Martin and is an instructor for the Georgia Institute of Technology’s Reinforcement Learning and Decision Making course.
Table of Contents
1 Introduction to deep reinforcement learning
2 Mathematical foundations of reinforcement learning
3 Balancing immediate and long-term goals
4 Balancing the gathering and use of information
5 Evaluating agents’ behaviors
6 Improving agents’ behaviors
7 Achieving goals more effectively and efficiently
8 Introduction to value-based deep reinforcement learning
9 More stable value-based methods
10 Sample-efficient value-based methods
11 Policy-gradient and actor-critic methods
12 Advanced actor-critic methods
13 Toward artificial general intelligence
Miguel Morales
Miguel Morales is a Staff Research Engineer at Lockheed Martin, Missiles and Fire Control, Autonomous Systems. He is also a faculty member at the Georgia Institute of Technology, where he works as an Instructional Associate for the Reinforcement Learning and Decision Making graduate course. Miguel has worked for numerous other educational and technology companies, including Udacity, AT&T, Cisco, and HPE.
Grokking Deep Reinforcement Learning
Miguel Morales
Foreword by Charles Isbell, Jr.
To comment go to liveBook
Manning
Shelter Island
For more information on this and other Manning titles go to
manning.com
Copyright
For online information and ordering of these and other Manning books, please visit manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]
©2020 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
ISBN: 9781617295454
dedication
For Danelle, Aurora, Solomon, and those to come.
Being with you is a +1 per timestep.
(You can safely assume +1 is the highest reward.)
I love you!
contents
foreword
preface
acknowledgments
about this book
about the author
1 Introduction to deep reinforcement learning
What is deep reinforcement learning?
The past, present, and future of deep reinforcement learning
The suitability of deep reinforcement learning
Setting clear two-way expectations
2 Mathematical foundations of reinforcement learning
Components of reinforcement learning
MDPs: The engine of the environment
3 Balancing immediate and long-term goals
The objective of a decision-making agent
Planning optimal sequences of actions
4 Balancing the gathering and use of information
The challenge of interpreting evaluative feedback
Strategic exploration
5 Evaluating agents’ behaviors
Learning to estimate the value of policies
Learning to estimate from multiple steps
6 Improving agents’ behaviors
The anatomy of reinforcement learning agents
Learning to improve policies of behavior
Decoupling behavior from learning
7 Achieving goals more effectively and efficiently
Learning to improve policies using robust targets
Agents that interact, learn, and plan
8 Introduction to value-based deep reinforcement learning
The kind of feedback deep reinforcement learning agents use
Introduction to function approximation for reinforcement learning
NFQ: The first attempt at value-based deep reinforcement learning
9 More stable value-based methods
DQN: Making reinforcement learning more like supervised learning
Double DQN: Mitigating the overestimation of action-value functions
10 Sample-efficient value-based methods
Dueling DDQN: A reinforcement-learning-aware neural network architecture
PER: Prioritizing the replay of meaningful experiences
11 Policy-gradient and actor-critic methods
REINFORCE: Outcome-based policy learning
VPG: Learning a value function
A3C: Parallel policy updates
GAE: Robust advantage estimation
A2C: Synchronous policy updates
12 Advanced actor-critic methods
DDPG: Approximating a deterministic policy
TD3: State-of-the-art improvements over DDPG
SAC: Maximizing the expected return and entropy
PPO: Restricting optimization steps
13 Toward artificial general intelligence
What was covered and what notably wasn’t?
More advanced concepts toward AGI
What happens next?
index
front matter
foreword
So, here’s the thing about reinforcement learning. It is difficult to learn and difficult to teach, for a number of reasons. First, it’s quite a technical topic. There is a great deal of math and theory behind it. Conveying the right amount of background without drowning in it is a challenge in and of itself.
Second, reinforcement learning encourages a conceptual error. RL is both a way of thinking about decision-making problems and a set of tools for solving those problems. By "a way of thinking," I mean that RL provides a framework for making decisions: it discusses states and reinforcement signals, among other details. When I say "a set of tools," I mean that when we discuss RL, we find ourselves using terms like Markov decision processes and Bellman updates. It is remarkably easy to confuse the way of thinking with the mathematical tools we use in response to that way of thinking.
Finally, RL is implementable in a wide variety of ways. Because RL is a way of thinking, we can discuss it by trying to realize the framework in a very abstract way, or ground it in code, or, for that matter, in neurons. The substrate one decides to use makes these two difficulties even more challenging—which brings us to deep reinforcement learning.
Focusing on deep reinforcement learning nicely compounds all these problems at once. There is background on RL, and background on deep neural networks. Both are separately worthy of study and have developed in completely different ways. Working out how to explain both in the context of developing tools is no easy task. Also, do not forget that understanding RL requires understanding not only the tools and their realization in deep networks, but also understanding the way of thinking about RL; otherwise, you cannot generalize beyond the examples you study directly. Again, teaching RL is hard, and there are so many ways for teaching deep RL to go wrong—which brings us to Miguel Morales and this book.
This book is very well put together. It explains in technical but clear language what machine learning is, what deep learning is, and what reinforcement learning is. It allows the reader to understand the larger context of where the field is and what you can do with the techniques of deep RL, but also the way of thinking that ML, RL, and deep RL present. It is clear and concise. Thus, it works as both a learning guide and as a reference, and, at least for me, as a source of some inspiration.
I am not surprised by any of this. I’ve known Miguel for quite a few years now. He went from taking machine learning courses to teaching them. He has been the lead teaching assistant on my Reinforcement Learning and Decision Making course for the Online Masters of Science at Georgia Tech for more semesters than I can count. He’s reached thousands of students during that time. I’ve watched him grow as a practitioner, a researcher, and an educator. He has helped to make the RL course at GT better than it started out, and continues even as I write this to make the experience of grokking reinforcement learning a deeper one for the students. He is a natural teacher.
This text reflects his talent. I am happy to be able to work with him, and I’m happy he’s been moved to write this book. Enjoy. I think you’ll learn a lot. I learned a few things myself.
Charles Isbell, Jr.
Professor and John P. Imlay Jr. Dean
College of Computing
Georgia Institute of Technology
preface
Reinforcement learning is an exciting field with the potential to make a profound impact on the history of humankind. Several technologies have influenced the history of our world and changed the course of humankind, from fire, to the wheel, to electricity, to the internet. Each technological discovery propels the next discovery in a compounding way. Without electricity, the personal computer wouldn’t exist; without it, the internet wouldn’t exist; without it, search engines wouldn’t exist.
To me, the most exciting aspect of RL and artificial intelligence, in general, is not so much to merely have other intelligent entities next to us, which is pretty exciting, but instead, what comes after that. I believe reinforcement learning, being a robust framework for optimizing specific tasks autonomously, has the potential to change the world. In addition to task automation, the creation of intelligent machines may drive the understanding of human intelligence to places we have never been before. Arguably, if you can know with certainty how to find optimal decisions for every problem, you likely understand the algorithm that finds those optimal decisions. I have a feeling that by creating intelligent entities, humans can become more intelligent beings.
But we are far away from this point, and to fulfill these wild dreams, we need more minds at work. Reinforcement learning is not only in its infancy, but it’s been in that state for a while, so there is much work ahead. The reason I wrote this book is to get more people grokking deep RL, and RL in general, and to help you contribute.
Even though the RL framework is intuitive, most of the resources out there are difficult to understand for newcomers. My goal was not to write a book that provides code examples only, and most definitely not to create a resource that teaches the theory of reinforcement learning. Instead, my goal was to create a resource that can bridge the gap between theory and practice. As you’ll soon see, I don’t shy away from equations; they are essential if you want to grok a research field. And, even if your goal is practical, to build quality RL solutions, you still need that theoretical foundation. However, I also don’t solely rely on equations because not everybody interested in RL is fond of math. Some people are more comfortable with code and concrete examples, so this book provides the practical side of this fantastic field.
Most of my effort during this three-year project went into bridging this gap; I don’t shy away from intuitively explaining the theory, and I don’t just plop down code examples. I do both, and in a very detail-oriented fashion. Those who have a hard time understanding the textbooks and lectures can more easily grasp the words top researchers use: why those specific words, why not other words. And those who know the words and love reading the equations but have trouble seeing those equations in code and how they connect can more easily understand the practical side of reinforcement learning.
Finally, I hope you enjoy this work, and more importantly that it does fulfill its goal for you. I hope that you emerge grokking deep reinforcement learning and can give back and contribute to this fantastic community that I’ve grown to love. As I mentioned before, you wouldn’t be reading this book if it wasn’t for a myriad of relatively recent technological innovations, but what happens after this book is up to you, so go forth and make an impact in the world.
acknowledgments
I want to thank the people at Georgia Tech for taking the risk and making available the first Online Master of Science in Computer Science for anyone in the world to get a high-quality graduate education. If it weren’t for those folks who made it possible, I probably would not have written this book.
I want to thank Professor and Dean Charles Isbell and Professor Michael Littman for putting together an excellent reinforcement-learning course. I have a special appreciation for Dean Isbell, who has given me much room to grow and learn RL. Also, the way I teach reinforcement learning—by splitting the problem into three types of feedback—I learned from Professor Littman. I’m grateful to have received instruction from them.
I want to thank the vibrant teaching staff at Georgia Tech’s CS 7642 for working together on how to help students learn more and enjoy their time with us. Special thanks go to Tim Bail, Pushkar Kolhe, Chris Serrano, Farrukh Rahman, Vahe Hagopian, Quinn Lee, Taka Hasegawa, Tianhang Zhu, and Don Jacob. You guys are such great teammates. I also want to thank the folks who previously contributed significantly to that course. I’ve gotten a lot from our interactions: Alec Feuerstein, Valkyrie Felso, Adrien Ecoffet, Kaushik Subramanian, and Ashley Edwards. I want to also thank our students for asking the questions that helped me identify the gaps in knowledge for those trying to learn RL. I wrote this book with you in mind. A very special thank you goes out to that anonymous student who recommended me to Manning for writing this book; I still don’t know who you are, but you know who you are. Thank you.
I want to thank the folks at Lockheed Martin for all their feedback and interactions during my time writing this book. Special thanks go to Chris Aasted, Julia Kwok, Taylor Lopez, and John Haddon. John was the first person to review my earliest draft, and his feedback helped me move the writing to the next level.
I want to thank the folks at Manning for providing the framework that made this book a reality. I thank Brian Sawyer for reaching out and opening the door; Bert Bates for setting the compass early on and helping me focus on teaching; Candace West for helping me go from zero to something; Susanna Kline for helping me pick up the pace when life got busy; Jennifer Stout for cheering me on through the finish line; Rebecca Rinehart for putting out fires; Al Krinker for providing me with actionable feedback and helping me separate the signal from the noise; Matko Hrvatin for keeping up with MEAP releases and putting that extra pressure on me to keep writing; Candace Gillhoolley for getting the book out there; Stjepan Jureković for getting me out there; Ivan Martinovic for getting the much-needed feedback to improve the text; Lori Weidert for aligning the book to be production-ready twice; Jennifer Houle for being gentle with the design changes; Katie Petito for patiently working through the details; Katie Tennant for the meticulous and final polishing touches; and to anyone I missed, or who worked behind the scenes to make this book a reality. There are more, I know: thank you all for your hard work.
To all the reviewers—Al Rahimi, Alain Couniot, Alberto Ciarlanti, David Finton, Doniyor Ulmasov, Edisson Reinozo, Ezra Joel Schroeder, Hank Meisse, Hao Liu, Ike Okonkwo, Jie Mei, Julien Pohie, Kim Falk Jørgensen, Marc-Philippe Huget, Michael Haller, Michel Klomp, Nacho Ormeño, Rob Pacheco, Sebastian Maier, Sebastian Zaba, Swaminathan Subramanian, Tyler Kowallis, Ursin Stauss, and Xiaohu Zhu—thank you, your suggestions helped make this a better book.
I want to thank the folks at Udacity for letting me share my passion for this field with their students and record the actor-critic lectures for their Deep Reinforcement Learning Nanodegree. Special thanks go to Alexis Cook, Mat Leonard, and Luis Serrano.
I want to thank the RL community for helping me clarify the text and improve my understanding. Special thanks go to David Silver, Sergey Levine, Hado van Hasselt, Pascal Poupart, John Schulman, Pieter Abbeel, Chelsea Finn, Vlad Mnih, for their lectures; Rich Sutton for providing the gold copy of the field in a single place (his textbook); and James MacGlashan, and Joshua Achiam for their codebases, online resources, and guidance when I didn’t know where to go to get an answer to a question. I want to thank David Ha for giving me insights as to where to go next.
Special thanks go to Silvia Mora for helping make all the figures in this book presentable and helping me in almost every side project that I undertake.
Finally, I want to thank my family, who were my foundation throughout this project. I knew writing a book was a challenge, and then I learned. But my wife and kids were there regardless, waiting for my 15-minute breaks every 2 hours or so during the weekends. Thank you, Solo, for brightening up my life midway through this book. Thank you, Rosie, for sharing your love and beauty, and thank you, Danelle, my wonderful wife, for everything you are and do. You are my perfect teammate in this interesting game called life. I'm so glad I found you.
about this book
grokking Deep Reinforcement Learning bridges the gap between the theory and practice of deep reinforcement learning. The book’s target audience is folks familiar with machine learning techniques, who want to learn reinforcement learning. The book begins with the foundations of deep reinforcement learning. It then provides an in-depth exploration of algorithms and techniques for deep reinforcement learning. Lastly, it provides a survey of advanced techniques with the potential for making an impact.
Who should read this book
Folks who are comfortable with a research field, Python code, a bit of math here and there, lots of intuitive explanations, and fun and concrete examples to drive the learning will enjoy this book. However, any person only familiar with Python can get a lot out of it, given enough interest in learning. Even though basic DL knowledge is assumed, this book provides a brief refresher on neural networks, backpropagation, and related techniques. The bottom line is that this book is self-contained, and anyone wanting to play around with AI agents and emerge grokking deep reinforcement learning can use this book to get there.
How this book is organized: a roadmap
This book has 13 chapters divided into two parts.
In part 1, chapter 1 introduces the field of deep reinforcement learning and sets expectations for the journey ahead. Chapter 2 introduces a framework for designing problems that RL agents can understand. Chapter 3 contains details of algorithms for solving RL problems when the agent knows the dynamics of the world. Chapter 4 contains details of algorithms for solving simple RL problems when the agent does not know the dynamics of the world. Chapter 5 introduces methods for solving the prediction problem, which is a foundation for advanced RL methods.
In part 2, chapter 6 introduces methods for solving the control problem, methods that optimize policies purely from trial-and-error learning. Chapter 7 teaches more advanced methods for RL, including methods that use planning for more sample efficiency. Chapter 8 introduces the use of function approximation in RL by implementing a simple RL algorithm that uses neural networks for function approximation. Chapter 9 dives into more advanced techniques for using function approximation for solving reinforcement learning problems. Chapter 10 teaches some of the best techniques for further improving the methods introduced so far. Chapter 11 introduces a slightly different technique for using DL models with RL that has proven to reach state-of-the-art performance in multiple deep RL benchmarks. Chapter 12 dives into more advanced methods for deep RL, state-of-the-art algorithms, and techniques commonly used for solving real-world problems. Chapter 13 surveys advanced research areas in RL that suggest the best path for progress toward artificial general intelligence.
About the code
This book contains many examples of source code, both in boxes titled "I speak Python" and in the text. Source code is formatted in a fixed-width font like this to separate it from ordinary text and has syntax highlighting to make it easier to read.
In many cases, the original source code has been reformatted; we've added line breaks, renamed variables, and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and the code includes the line-continuation operator in Python, the backslash (\), to indicate that a statement continues on the next line.
Additionally, comments in the source code have often been removed from the boxes, and the code is described in the text. Code annotations point out important concepts.
The code for the examples in this book is available for download from the Manning website at https://www.manning.com/books/grokking-deep-reinforcement-learning and from GitHub at https://github.com/mimoralea/gdrl.
liveBook discussion forum
Purchase of grokking Deep Reinforcement Learning includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/grokking-deep-reinforcement-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the author
Miguel Morales works on reinforcement learning at Lockheed Martin, Missiles and Fire Control, Autonomous Systems, in Denver, Colorado. He is a part-time Instructional Associate at Georgia Institute of Technology for the course in Reinforcement Learning and Decision Making. Miguel has worked for Udacity as a machine learning project reviewer, a Self-driving Car Nanodegree mentor, and a Deep Reinforcement Learning Nanodegree content developer. He graduated from Georgia Tech with a Master’s in Computer Science, specializing in interactive intelligence.
1 Introduction to deep reinforcement learning
In this chapter
You will learn what deep reinforcement learning is and how it is different from other machine learning approaches.
You will learn about the recent progress in deep reinforcement learning and what it can do for a variety of problems.
You will know what to expect from this book and how to get the most out of it.
I visualize a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.
— Claude Shannon, father of the information age and contributor to the field of artificial intelligence
Humans naturally pursue feelings of happiness. From picking out our meals to advancing our careers, every action we choose is derived from our drive to experience rewarding moments in life. Whether these moments are self-centered pleasures or the most generous of goals, whether they bring us immediate gratification or long-term success, we pursue them according to our perception of how important and valuable they are. And to some extent, these moments are the reason for our existence.
Our ability to achieve these precious moments seems to be correlated with intelligence; intelligence is defined as the ability to acquire and apply knowledge and skills. People who are deemed by society as intelligent are capable of trading not only immediate satisfaction for long-term goals, but also a good, certain future for a possibly better, yet uncertain, one. Goals that take longer to materialize and that have unknown long-term value are usually the hardest to achieve, and those who can withstand the challenges along the way are the exception, the leaders, the intellectuals of society.
In this book, you learn about deep reinforcement learning, an approach to creating computer programs that can achieve goals that require intelligence. In this chapter, I introduce deep reinforcement learning and give suggestions for getting the most out of this book.
What is deep reinforcement learning?
Deep reinforcement learning (DRL) is a machine learning approach to artificial intelligence concerned with creating computer programs that can solve problems requiring intelligence. The distinct property of DRL programs is learning through trial and error from feedback that's simultaneously sequential, evaluative, and sampled, by leveraging powerful non-linear function approximation.
I want to unpack this definition for you one bit at a time. But, don’t get too caught up with the details because it’ll take me the whole book to get you grokking deep reinforcement learning. The following is the introduction to what you learn about in this book. As such, it’s repeated and explained in detail in the chapters ahead.
If I succeed with my goal for this book, after you complete it, you should understand this definition precisely. You should be able to tell why I used the words that I used, and why I didn’t use more or fewer words. But, for this chapter, simply sit back and plow through it.
Deep reinforcement learning is a machine learning approach to artificial intelligence
Artificial intelligence (AI) is a branch of computer science involved in the creation of computer programs capable of demonstrating intelligence. Traditionally, any piece of software that displays cognitive abilities such as perception, search, planning, and learning is considered part of AI. Several examples of functionality produced by AI software are:
The pages returned by a search engine
The route produced by a GPS app
The voice recognition and the synthetic voice of smart-assistant software
The products recommended by e-commerce sites
The follow-me feature in drones
Subfields of artificial intelligence
All computer programs that display intelligence are considered AI, but not all examples of AI can learn. Machine learning (ML) is the area of AI concerned with creating computer programs that can solve problems requiring intelligence by learning from data. There are three main branches of ML: supervised, unsupervised, and reinforcement learning.
Main branches of machine learning
Supervised learning (SL) is the task of learning from labeled data. In SL, a human decides which data to collect and how to label it. The goal in SL is to generalize. A classic example of SL is a handwritten-digit-recognition application: a human gathers images with handwritten digits, labels those images, and trains a model to recognize and classify digits in images correctly. The trained model is expected to generalize and correctly classify handwritten digits in new images.
Unsupervised learning (UL) is the task of learning from unlabeled data. Even though data no longer needs labeling, the methods used by the computer to gather data still need to be designed by a human. The goal in UL is to compress. A classic example of UL is a customer segmentation application: a human collects customer data and trains a model to group customers into clusters. These clusters compress the information, uncovering underlying relationships in customers.
Reinforcement learning (RL) is the task of learning through trial and error. In this type of task, no human labels data, and no human collects or explicitly designs the collection of data. The goal in RL is to act. A classic example of RL is a Pong-playing agent: the agent repeatedly interacts with a Pong emulator and learns by taking actions and observing their effects. The trained agent is expected to act in such a way that it successfully plays Pong.
A powerful recent approach to ML, called deep learning (DL), involves using multi-layered non-linear function approximation, typically neural networks. DL isn’t a separate branch of ML, so it’s not a different task than those described previously. DL is a collection of techniques and methods for using neural networks to solve ML tasks, whether SL, UL, or RL. DRL is simply the use of DL to solve RL tasks.
Deep learning is a powerful toolbox
The bottom line is that DRL is an approach to a problem. The field of AI defines the problem: creating intelligent machines. One of the approaches to solving that problem is DRL. Throughout the book, you will find comparisons between RL and other ML approaches, but only in this chapter will you find definitions and a historical overview of AI in general. It's important to note that the field of RL includes the field of DRL, so although I make a distinction when necessary, when I refer to RL, remember that DRL is included.
Deep reinforcement learning is concerned with creating computer programs
At its core, DRL is about complex sequential decision-making problems under uncertainty. But this is a topic of interest in many fields; for instance, control theory (CT) studies ways to control complex known dynamic systems. In CT, the dynamics of the systems we try to control are usually known in advance. Operations research (OR), another instance, also studies decision-making under uncertainty, but problems in this field often have much larger action spaces than those commonly seen in DRL. Psychology studies human behavior, which is partly the same complex sequential decision-making under uncertainty problem.
The synergy between similar fields
The bottom line is that you have come to a field that's influenced by a variety of others. Although this is a good thing, it also brings inconsistencies in terminologies, notations, and so on. I take the computer science approach to this problem, so this book is about building computer programs that solve complex decision-making problems under uncertainty, and as such, you can find code examples throughout the book.
In DRL, these computer programs are called agents. An agent is a decision maker only, and nothing else. That means if you're training a robot to pick up objects, the robot arm isn't part of the agent. Only the code that makes decisions is referred to as the agent.
Deep reinforcement learning agents can solve problems that require intelligence
On the other side of the agent is the environment. The environment is everything outside the agent; everything the agent doesn't have total control over. Again, imagine you're training a robot to pick up objects. The objects to be picked up, the tray where the objects lie, the wind, and everything outside the decision maker are part of the environment. That means the robot arm is also part of the environment because it isn't part of the agent. And even though the agent can decide to move the arm, the actual arm movement is noisy, and thus the arm is part of the environment.
This strict boundary between the agent and the environment is counterintuitive at first, but the decision maker, the agent, can only have a single role: making decisions. Everything that comes after the decision gets bundled into the environment.
Boundary between agent and environment
Chapter 2 provides an in-depth survey of all the components of DRL. The following is a preview of what you’ll learn in chapter 2.
The environment is represented by a set of variables related to the problem. For instance, in the robotic arm example, the location and velocities of the arm would be part of the variables that make up the environment. This set of variables and all the possible values that they can take are referred to as the state space. A state is an instantiation of the state space, a set of values the variables take.
Interestingly, agents often don't have access to the actual full state of the environment. The part of a state that the agent can observe is called an observation. Observations depend on states but are what the agent can see. For instance, in the robotic arm example, the agent may only have access to camera images. While an exact location of each object exists, the agent doesn't have access to this specific state. Instead, the observations the agent perceives are derived from the states. You'll often see the terms state and observation used interchangeably in the literature, including in this book. I apologize in advance for the inconsistencies. Simply know the difference and be aware of the lingo; that's what matters.
States vs. observations
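To make the distinction concrete, here is a tiny sketch in the spirit of the book's "I speak Python" boxes. It is not an example from the book, and all names in it are hypothetical: the environment holds a full state, while the agent only receives a partial view derived from it.

# A toy, hypothetical illustration of states vs. observations: the full state
# exists inside the environment, but the agent only sees a derived, partial
# view of it (here, positions without velocities).
full_state = {
    "object_position": (0.4, 0.1, 0.0),
    "object_velocity": (0.0, 0.0, 0.0),
    "arm_joint_angles": (0.3, 1.2, -0.7),
}

def observe(state):
    """What the agent gets to see: a partial view derived from the state."""
    return {"object_position": state["object_position"],
            "arm_joint_angles": state["arm_joint_angles"]}

observation = observe(full_state)   # the agent never sees the full state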
At each state, the environment makes available a set of actions the agent can choose from. The agent influences the environment through these actions. The environment may change states as a response to the agent’s action. The function that’s responsible for this mapping is called the transition function. The environment may also provide a reward signal as a response. The function responsible for this mapping is called the reward function. The set of transition and reward functions is referred to as the model of the environment.
The reinforcement learning cycle
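To ground these mappings, here is a minimal sketch in the spirit of the book's "I speak Python" boxes. It is not code from the book: the three-state "walk" environment is invented for illustration, with the transition function and reward function written out explicitly so that, together, they form the model of the environment.

# A hypothetical three-state "walk" with two actions (0 = left, 1 = right).
# The transition function maps (state, action) to a list of
# (probability, next_state) pairs.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},   # from state 0
    1: {0: [(1.0, 0)], 1: [(1.0, 2)]},   # from state 1
    2: {0: [(1.0, 2)], 1: [(1.0, 2)]},   # state 2 is absorbing (terminal)
}

def reward(state, action, next_state):
    """Reward function: +1 only for transitioning into the rightmost state."""
    return 1.0 if next_state == 2 and state != 2 else 0.0

# Together, the transition function P and the reward function make up
# the model of this toy environment.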
The environment commonly has a well-defined task. The goal of this task is defined through the reward function. The reward-function signals can be simultaneously sequential, evaluative, and sampled. To achieve the goal, the agent needs to demonstrate intelligence, or at least cognitive abilities commonly associated with intelligence, such as long-term thinking, information gathering, and generalization.
The agent has a three-step process: the agent interacts with the environment, the agent evaluates its behavior, and the agent improves its responses. The agent can be designed to learn mappings from observations to actions, called policies. The agent can be designed to learn the model of the environment, that is, mappings approximating the transition and reward functions, called models. The agent can be designed to learn mappings that estimate the reward-to-go, called value functions.
Deep reinforcement learning agents improve their behavior through trial-and-error learning
The interactions between the agent and the environment go on for several cycles. Each cycle is called a time step. At each time step, the agent observes the environment, takes action, and receives a new observation and reward. The set of the state, the action, the reward, and the new state is called an experience. Every experience provides an opportunity for learning and improving performance.
Experience tuples
The task the agent is trying to solve may or may not have a natural ending. Tasks that have a natural ending, such as a game, are called episodic tasks. Conversely, tasks that don’t are called continuing tasks, such as learning forward motion. The sequence of time steps from the beginning to the end of an episodic task is called an episode. Agents may take several time steps and episodes to learn to solve a task. Agents learn through trial and error: they try something, observe, learn, try something else, and so on.
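The following sketch, in the spirit of the book's "I speak Python" boxes, puts these pieces together: time steps, experience tuples, and episodes. It is not an example from the book; the tiny random-walk environment and the random-action agent are invented for illustration only.

import random

class WalkEnv:
    """A hypothetical five-cell walk: reaching either end ends the episode."""
    def reset(self):
        self.state = 2                          # start in the middle cell
        return self.state
    def step(self, action):                     # 0 = move left, 1 = move right
        self.state += 1 if action == 1 else -1
        done = self.state in (0, 4)             # either end terminates the episode
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward, done         # new observation, reward, done flag

env = WalkEnv()
for episode in range(5):                        # agents may need several episodes
    state, done = env.reset(), False
    while not done:                             # several time steps per episode
        action = random.choice([0, 1])          # a (not yet learning) random policy
        next_state, reward, done = env.step(action)
        experience = (state, action, reward, next_state)  # one experience tuple
        state = next_state                      # every experience is a chance to learn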
You’ll start learning more about this cycle in chapter 4, which contains a type of environment with a single step per episode. Starting with chapter 5, you’ll learn to deal with environments that require more than a single interaction cycle per episode.
Deep reinforcement learning agents learn from sequential feedback
The action taken by the agent may have delayed consequences. The reward may be sparse and only manifest after several time steps. Thus the agent must be able to learn from sequential feedback. Sequential feedback gives rise to a problem referred to as the temporal credit assignment problem. The temporal credit assignment problem is the challenge of determining which state and/or action is responsible for a reward. When there’s a temporal component to a problem, and actions have delayed consequences, it’s challenging to assign credit for rewards.
The difficulty of the temporal credit assignment problem
In chapter 3, we’ll study the ins and outs of sequential feedback in isolation. That is, your programs learn from simultaneously sequential, supervised (as opposed to evaluative), and exhaustive (as opposed to sampled) feedback.
Deep reinforcement learning agents learn from evaluative feedback
The reward received by the agent may be weak, in the sense that it may provide no supervision. The reward may indicate goodness and not correctness, meaning it may contain no information about other potential rewards. Thus the agent must be able to learn from evaluative feedback. Evaluative feedback gives rise to the need for exploration. The agent must be able to balance the gathering of information with the exploitation of current information. This is also referred to as the exploration versus exploitation trade-off.
The difficulty of the exploration vs. exploitation trade-off
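As a hint of what chapter 4 covers, here is a minimal sketch, in the spirit of the book's "I speak Python" boxes, of one simple way to balance exploration and exploitation: epsilon-greedy action selection. The value estimates passed in are made up, and this is only one of several strategies discussed later in the book.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon, explore by picking a random action;
    otherwise, exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit

action = epsilon_greedy([0.2, 0.5, 0.1], epsilon=0.1)   # hypothetical value estimates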
In chapter 4, we’ll study the ins and outs of evaluative feedback in isolation. That is, your programs will learn from feedback that is simultaneously one-shot (as opposed to sequential), evaluative, and exhaustive (as opposed to sampled).
Deep reinforcement learning agents learn from sampled feedback
The reward received by the agent is merely a sample, and the agent doesn’t have access to the reward function. Also, the state and action spaces are commonly large, even infinite, so trying to learn from sparse and weak feedback becomes a harder challenge with samples. Therefore, the agent must be able to learn from sampled feedback, and it must be able to generalize.
The difficulty of learning from sampled feedback
Agents that are designed to approximate policies are called policy-based; agents that are designed to approximate value functions are called value-based; agents that are designed to approximate models are called model-based; and agents that are designed to approximate both policies and value functions are called actor-critic. Agents can be designed to approximate one or more of these components.
Deep reinforcement learning agents use powerful non-linear function approximation
The agent can approximate functions using a variety of ML methods and techniques, from decision trees to SVMs to neural networks. However, in this book, we use only neural networks; this is what the "deep" part of DRL refers to, after all. Neural networks aren't necessarily the best solution to every problem; neural networks are data hungry and challenging to interpret, and you must keep these facts in mind. However, neural networks are among the most potent function approximators available, and their performance is often the best.
A simple feed-forward neural network
Artificial neural networks (ANNs) are multi-layered non-linear function approximators loosely inspired by the biological neural networks in animal brains. An ANN isn't an algorithm, but a structure composed of multiple layers of mathematical transformations applied to input values.
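To give a feel for that structure, here is a rough sketch, in the spirit of the book's "I speak Python" boxes, of a simple feed-forward network built with PyTorch; the layer sizes and inputs below are arbitrary, chosen only for illustration: several linear transformations with non-linearities in between, mapping an input observation to output estimates.

import torch
import torch.nn as nn

# A tiny fully connected network: multiple layers of linear transformations
# with non-linear activations in between.
net = nn.Sequential(
    nn.Linear(4, 64),    # input layer: e.g., a 4-variable observation
    nn.ReLU(),           # non-linearity
    nn.Linear(64, 64),   # hidden layer
    nn.ReLU(),
    nn.Linear(64, 2),    # output layer: e.g., one estimate per action
)

observation = torch.rand(1, 4)     # a made-up input
estimates = net(observation)       # forward pass through the approximator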
From chapter 3 through chapter 7, we only deal with problems in which agents learn from exhaustive (as opposed to sampled) feedback. Starting with chapter 8, we study the full DRL problem; that is, using deep neural networks so that agents can learn from sampled feedback. Remember, DRL agents learn from feedback that’s simultaneously sequential, evaluative, and sampled.
The past, present, and future of deep reinforcement learning
History isn’t necessary to gain skills, but it can allow you to understand the context around a topic, which in turn can help you gain motivation, and therefore, skills. The history of AI and DRL should help you set expectations about the future of this powerful technology. At times, I feel the hype surrounding AI is actually productive; people get interested. But, right after that, when it’s time to put in work, hype no longer helps, and it’s a problem. Although I’d like to be excited about AI, I also need to set realistic expectations.
Recent history of artificial intelligence and deep reinforcement learning
The beginnings of DRL could be traced back many years, because humans have been intrigued by the possibility of intelligent creatures other than ourselves since antiquity. But a good beginning could be Alan Turing’s work in the 1930s, 1940s, and 1950s that paved the way for modern computer science and AI by laying down critical theoretical foundations that later scientists leveraged.
The most well-known of these is the Turing Test, which proposes a standard for measuring machine intelligence: if a human interrogator is unable to distinguish a machine from another human in a chat Q&A session, then the machine is considered intelligent. Though rudimentary, the Turing Test allowed generations to wonder about the possibilities of creating smart machines by setting a goal that researchers could pursue.
The formal beginnings of AI as an academic discipline can be attributed to John McCarthy, an influential AI researcher who made several notable contributions to the field. To name a few, McCarthy is credited with coining the term "artificial intelligence" in 1955, leading the first AI conference in 1956, inventing the Lisp programming language in 1958, cofounding the MIT AI Lab in 1959, and contributing important papers to the development of AI as a field over several decades.
Artificial intelligence winters
All the work and progress of AI early on created a great deal of excitement, but there were also significant setbacks. Prominent AI researchers suggested we would create human-like machine intelligence within years, but this never came to pass. Things got worse when a well-known researcher named James Lighthill compiled a report criticizing the state of academic research in AI. All of these developments contributed to a long period of reduced funding and interest in AI research known as the first AI winter.
The field continued this pattern throughout the years: researchers making progress, people getting overly optimistic and overestimating what was possible, leading to reduced funding from government and industry partners.
AI funding pattern through the years
The current state of artificial intelligence
We are likely in another highly optimistic time in AI history, so we must be careful. Practitioners understand that AI is a powerful tool, but certain people think of AI as a magic black box that can take any problem in, and out comes the best solution ever. Nothing could be further from the truth. Other people even worry about AI gaining consciousness, as if that were relevant; as Edsger W. Dijkstra famously said, "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim."
But, if we set aside this Hollywood-instilled vision of AI, we can allow ourselves to get excited about the recent progress in this field. Today, the most influential companies in the world make the most substantial investments in AI research. Companies such as Google, Facebook, Microsoft, Amazon, and Apple have invested in AI research and have become highly profitable thanks, in part, to AI systems. Their significant and steady investments have created the perfect environment for the current pace of AI research. Contemporary researchers have the best computing power available and tremendous amounts of data for their research, and teams of top researchers are working together on the same problems, in the same location, at the same time. Current AI research has become more stable and more productive. We have witnessed one AI success after another, and it doesn't seem likely to stop anytime soon.
Progress in deep reinforcement learning
The use of artificial neural networks for RL problems started around the 1990s. One of the classics is the backgammon-playing computer program, TD-Gammon, created by Gerald Tesauro et al. TD-Gammon learned to play backgammon by learning to evaluate table positions on its own through RL. Even though the techniques implemented aren’t precisely considered DRL, TD-Gammon was one of the first widely reported success stories using ANNs to solve complex RL problems.
TD-Gammon architecture
In 2004, Andrew Ng et al. developed an autonomous helicopter that taught itself to fly stunts by observing