
Grokking Deep Reinforcement Learning
Ebook · 1,034 pages · 8 hours



About this ebook

Grokking Deep Reinforcement Learning uses engaging exercises to teach you how to build deep learning systems. This book combines annotated Python code with intuitive explanations to explore DRL techniques. You’ll see how algorithms function and learn to develop your own DRL agents using evaluative feedback.

We all learn through trial and error. We avoid the things that cause us to experience pain and failure. We embrace and build on the things that give us reward and success. This common pattern is the foundation of deep reinforcement learning: building machine learning systems that explore and learn based on the responses of the environment. Grokking Deep Reinforcement Learning introduces this powerful machine learning approach, using examples, illustrations, exercises, and crystal-clear teaching. You'll love the perfectly paced teaching and the clever, engaging writing style as you dig into this awesome exploration of reinforcement learning fundamentals, effective deep learning techniques, and practical applications in this emerging field.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
We learn by interacting with our environment, and the rewards or punishments we experience guide our future behavior. Deep reinforcement learning brings that same natural process to artificial intelligence, analyzing results to uncover the most efficient ways forward. DRL agents can improve marketing campaigns, predict stock performance, and beat grandmasters in Go and chess.

What's inside
    An introduction to reinforcement learning
    DRL agents with human-like behaviors
    Applying DRL to complex situations

About the reader
For developers with basic deep learning experience.

About the author
Miguel Morales works on reinforcement learning at Lockheed Martin and is an instructor for the Georgia Institute of Technology’s Reinforcement Learning and Decision Making course.

Table of Contents

1 Introduction to deep reinforcement learning

2 Mathematical foundations of reinforcement learning

3 Balancing immediate and long-term goals

4 Balancing the gathering and use of information

5 Evaluating agents’ behaviors

6 Improving agents’ behaviors

7 Achieving goals more effectively and efficiently

8 Introduction to value-based deep reinforcement learning

9 More stable value-based methods

10 Sample-efficient value-based methods

11 Policy-gradient and actor-critic methods

12 Advanced actor-critic methods

13 Toward artificial general intelligence
Release date: Oct 15, 2020

Miguel Morales

Miguel Morales is a Staff Research Engineer at Lockheed Martin, Missiles and Fire Control, Autonomous Systems. He is also a faculty member at Georgia Institute of Technology, where he works as an Instructional Associate for the Reinforcement Learning and Decision Making graduate course. Miguel has worked for numerous other educational and technology companies, including Udacity, AT&T, Cisco, and HPE.



    Grokking Deep Reinforcement Learning

    Miguel Morales

    Foreword by Charles Isbell, Jr.



    Shelter Island

    For more information on this and other Manning titles go to


    For online information and ordering of these and other Manning books, please visit The publisher offers discounts on these books when ordered in quantity.

    For more information, please contact

    Special Sales Department

    Manning Publications Co.

    20 Baldwin Road

    PO Box 761

    Shelter Island, NY 11964

    Email: [email protected]

    ©2020 by Manning Publications Co. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    ♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

    ISBN: 9781617295454


    For Danelle, Aurora, Solomon, and those to come.

    Being with you is a +1 per timestep.

    (You can safely assume +1 is the highest reward.)

    I love you!





    about this book

    about the author

      1  Introduction to deep reinforcement learning

    What is deep reinforcement learning?

    The past, present, and future of deep reinforcement learning

    The suitability of deep reinforcement learning

    Setting clear two-way expectations

      2  Mathematical foundations of reinforcement learning

    Components of reinforcement learning

    MDPs: The engine of the environment

      3  Balancing immediate and long-term goals

    The objective of a decision-making agent

    Planning optimal sequences of actions

      4  Balancing the gathering and use of information

    The challenge of interpreting evaluative feedback

    Strategic exploration

      5  Evaluating agents’ behaviors

    Learning to estimate the value of policies

    Learning to estimate from multiple steps

      6  Improving agents’ behaviors

    The anatomy of reinforcement learning agents

    Learning to improve policies of behavior

    Decoupling behavior from learning

      7  Achieving goals more effectively and efficiently

    Learning to improve policies using robust targets

    Agents that interact, learn, and plan

      8  Introduction to value-based deep reinforcement learning

    The kind of feedback deep reinforcement learning agents use

    Introduction to function approximation for reinforcement learning

    NFQ: The first attempt at value-based deep reinforcement learning

      9  More stable value-based methods

    DQN: Making reinforcement learning more like supervised learning

    Double DQN: Mitigating the overestimation of action-value functions

    10  Sample-efficient value-based methods

    Dueling DDQN: A reinforcement-learning-aware neural network architecture

    PER: Prioritizing the replay of meaningful experiences

    11  Policy-gradient and actor-critic methods

    REINFORCE: Outcome-based policy learning

    VPG: Learning a value function

    A3C: Parallel policy updates

    GAE: Robust advantage estimation

    A2C: Synchronous policy updates

    12  Advanced actor-critic methods

    DDPG: Approximating a deterministic policy

    TD3: State-of-the-art improvements over DDPG

    SAC: Maximizing the expected return and entropy

    PPO: Restricting optimization steps

    13  Toward artificial general intelligence

    What was covered and what notably wasn’t?

    More advanced concepts toward AGI

    What happens next?


    foreword


    So, here’s the thing about reinforcement learning. It is difficult to learn and difficult to teach, for a number of reasons. First, it’s quite a technical topic. There is a great deal of math and theory behind it. Conveying the right amount of background without drowning in it is a challenge in and of itself.

    Second, reinforcement learning encourages a conceptual error. RL is both a way of thinking about decision-making problems and a set of tools for solving those problems. By a way of thinking, I mean that RL provides a framework for making decisions: it discusses states and reinforcement signals, among other details. When I say a set of tools, I mean that when we discuss RL, we find ourselves using terms like Markov decision processes and Bellman updates. It is remarkably easy to confuse the way of thinking with the mathematical tools we use in response to that way of thinking.

    Finally, RL is implementable in a wide variety of ways. Because RL is a way of thinking, we can discuss it by trying to realize the framework in a very abstract way, or ground it in code, or, for that matter, in neurons. The substrate one decides to use makes these two difficulties even more challenging—which brings us to deep reinforcement learning.

    Focusing on deep reinforcement learning nicely compounds all these problems at once. There is background on RL, and background on deep neural networks. Both are separately worthy of study and have developed in completely different ways. Working out how to explain both in the context of developing tools is no easy task. Also, do not forget that understanding RL requires understanding not only the tools and their realization in deep networks, but also understanding the way of thinking about RL; otherwise, you cannot generalize beyond the examples you study directly. Again, teaching RL is hard, and there are so many ways for teaching deep RL to go wrong—which brings us to Miguel Morales and this book.

    This book is very well put together. It explains in technical but clear language what machine learning is, what deep learning is, and what reinforcement learning is. It allows the reader to understand not only the larger context of where the field is and what you can do with the techniques of deep RL, but also the way of thinking that ML, RL, and deep RL present. It is clear and concise. Thus, it works as both a learning guide and a reference, and, at least for me, as a source of some inspiration.

    I am not surprised by any of this. I’ve known Miguel for quite a few years now. He went from taking machine learning courses to teaching them. He has been the lead teaching assistant on my Reinforcement Learning and Decision Making course for the Online Master of Science at Georgia Tech for more semesters than I can count. He’s reached thousands of students during that time. I’ve watched him grow as a practitioner, a researcher, and an educator. He has helped to make the RL course at GT better than it started out, and even as I write this he continues to make the experience of grokking reinforcement learning a deeper one for the students. He is a natural teacher.

    This text reflects his talent. I am happy to be able to work with him, and I’m happy he’s been moved to write this book. Enjoy. I think you’ll learn a lot. I learned a few things myself.

    Charles Isbell, Jr.

    Professor and John P. Imlay Jr. Dean

    College of Computing

    Georgia Institute of Technology


    preface

    Reinforcement learning is an exciting field with the potential to make a profound impact on the history of humankind. Several technologies have influenced the history of our world and changed the course of humankind, from fire, to the wheel, to electricity, to the internet. Each technological discovery propels the next discovery in a compounding way. Without electricity, the personal computer wouldn’t exist; without it, the internet wouldn’t exist; without it, search engines wouldn’t exist.

    To me, the most exciting aspect of RL and artificial intelligence, in general, is not so much to merely have other intelligent entities next to us, which is pretty exciting, but instead, what comes after that. I believe reinforcement learning, being a robust framework for optimizing specific tasks autonomously, has the potential to change the world. In addition to task automation, the creation of intelligent machines may drive the understanding of human intelligence to places we have never been before. Arguably, if you can know with certainty how to find optimal decisions for every problem, you likely understand the algorithm that finds those optimal decisions. I have a feeling that by creating intelligent entities, humans can become more intelligent beings.

    But we are far away from this point, and to fulfill these wild dreams, we need more minds at work. Reinforcement learning is not only in its infancy, but it’s been in that state for a while, so there is much work ahead. The reason I wrote this book is to get more people grokking deep RL, and RL in general, and to help you contribute.

    Even though the RL framework is intuitive, most of the resources out there are difficult to understand for newcomers. My goal was not to write a book that provides code examples only, and most definitely not to create a resource that teaches the theory of reinforcement learning. Instead, my goal was to create a resource that can bridge the gap between theory and practice. As you’ll soon see, I don’t shy away from equations; they are essential if you want to grok a research field. And, even if your goal is practical, to build quality RL solutions, you still need that theoretical foundation. However, I also don’t solely rely on equations because not everybody interested in RL is fond of math. Some people are more comfortable with code and concrete examples, so this book provides the practical side of this fantastic field.

    Most of my effort during this three-year project went into bridging this gap; I don’t shy away from intuitively explaining the theory, and I don’t just plop down code examples. I do both, and in a very detail-oriented fashion. Those who have a hard time understanding the textbooks and lectures can more easily grasp the words top researchers use: why those specific words, why not other words. And those who know the words and love reading the equations but have trouble seeing those equations in code and how they connect can more easily understand the practical side of reinforcement learning.

    Finally, I hope you enjoy this work, and more importantly that it does fulfill its goal for you. I hope that you emerge grokking deep reinforcement learning and can give back and contribute to this fantastic community that I’ve grown to love. As I mentioned before, you wouldn’t be reading this book if it wasn’t for a myriad of relatively recent technological innovations, but what happens after this book is up to you, so go forth and make an impact in the world.


    acknowledgments

    I want to thank the people at Georgia Tech for taking the risk and making available the first Online Master of Science in Computer Science for anyone in the world to get a high-quality graduate education. If it weren’t for those folks who made it possible, I probably would not have written this book.

    I want to thank Professor and Dean Charles Isbell and Professor Michael Littman for putting together an excellent reinforcement-learning course. I have a special appreciation for Dean Isbell, who has given me much room to grow and learn RL. Also, the way I teach reinforcement learning—by splitting the problem into three types of feedback—I learned from Professor Littman. I’m grateful to have received instruction from them.

    I want to thank the vibrant teaching staff at Georgia Tech’s CS 7642 for working together on how to help students learn more and enjoy their time with us. Special thanks go to Tim Bail, Pushkar Kolhe, Chris Serrano, Farrukh Rahman, Vahe Hagopian, Quinn Lee, Taka Hasegawa, Tianhang Zhu, and Don Jacob. You guys are such great teammates. I also want to thank the folks who previously contributed significantly to that course. I’ve gotten a lot from our interactions: Alec Feuerstein, Valkyrie Felso, Adrien Ecoffet, Kaushik Subramanian, and Ashley Edwards. I want to also thank our students for asking the questions that helped me identify the gaps in knowledge for those trying to learn RL. I wrote this book with you in mind. A very special thank you goes out to that anonymous student who recommended me to Manning for writing this book; I still don’t know who you are, but you know who you are. Thank you.

    I want to thank the folks at Lockheed Martin for all their feedback and interactions during my time writing this book. Special thanks go to Chris Aasted, Julia Kwok, Taylor Lopez, and John Haddon. John was the first person to review my earliest draft, and his feedback helped me move the writing to the next level.

    I want to thank the folks at Manning for providing the framework that made this book a reality. I thank Brian Sawyer for reaching out and opening the door; Bert Bates for setting the compass early on and helping me focus on teaching; Candace West for helping me go from zero to something; Susanna Kline for helping me pick up the pace when life got busy; Jennifer Stout for cheering me on through the finish line; Rebecca Rinehart for putting out fires; Al Krinker for providing me with actionable feedback and helping me separate the signal from the noise; Matko Hrvatin for keeping up with MEAP releases and putting that extra pressure on me to keep writing; Candace Gillhoolley for getting the book out there; Stjepan Jureković for getting me out there; Ivan Martinovic for getting the much-needed feedback to improve the text; Lori Weidert for aligning the book to be production-ready twice; Jennifer Houle for being gentle with the design changes; Katie Petito for patiently working through the details; Katie Tennant for the meticulous and final polishing touches; and anyone I missed, or who worked behind the scenes to make this book a reality. There are more, I know: thank you all for your hard work.

    To all the reviewers—Al Rahimi, Alain Couniot, Alberto Ciarlanti, David Finton, Doniyor Ulmasov, Edisson Reinozo, Ezra Joel Schroeder, Hank Meisse, Hao Liu, Ike Okonkwo, Jie Mei, Julien Pohie, Kim Falk Jørgensen, Marc-Philippe Huget, Michael Haller, Michel Klomp, Nacho Ormeño, Rob Pacheco, Sebastian Maier, Sebastian Zaba, Swaminathan Subramanian, Tyler Kowallis, Ursin Stauss, and Xiaohu Zhu—thank you, your suggestions helped make this a better book.

    I want to thank the folks at Udacity for letting me share my passion for this field with their students and record the actor-critic lectures for their Deep Reinforcement Learning Nanodegree. Special thanks go to Alexis Cook, Mat Leonard, and Luis Serrano.

    I want to thank the RL community for helping me clarify the text and improve my understanding. Special thanks go to David Silver, Sergey Levine, Hado van Hasselt, Pascal Poupart, John Schulman, Pieter Abbeel, Chelsea Finn, and Vlad Mnih for their lectures; Rich Sutton for providing the gold copy of the field in a single place (his textbook); and James MacGlashan and Joshua Achiam for their codebases, online resources, and guidance when I didn’t know where to go to get an answer to a question. I want to thank David Ha for giving me insights as to where to go next.

    Special thanks go to Silvia Mora for helping make all the figures in this book presentable and helping me in almost every side project that I undertake.

    Finally, I want to thank my family, who were my foundation throughout this project. I knew writing a book was a challenge, and then I learned. But my wife and kids were there regardless, waiting for my 15-minute breaks every 2 hours or so during the weekends. Thank you, Solo, for brightening up my life midway through this book. Thank you, Rosie, for sharing your love and beauty, and thank you Danelle, my wonderful wife, for everything you are and do. You are my perfect teammate in this interesting game called life. I’m so glad I found you.

    about this book

    grokking Deep Reinforcement Learning bridges the gap between the theory and practice of deep reinforcement learning. The book’s target audience is folks familiar with machine learning techniques who want to learn reinforcement learning. The book begins with the foundations of deep reinforcement learning. It then provides an in-depth exploration of algorithms and techniques for deep reinforcement learning. Lastly, it provides a survey of advanced techniques with the potential for making an impact.

    Who should read this book

    Folks who are comfortable with a research field, Python code, a bit of math here and there, lots of intuitive explanations, and fun and concrete examples to drive the learning will enjoy this book. However, anyone familiar only with Python can get a lot out of it, given enough interest in learning. Even though basic DL knowledge is assumed, this book provides a brief refresher on neural networks, backpropagation, and related techniques. The bottom line is that this book is self-contained, and anyone wanting to play around with AI agents and emerge grokking deep reinforcement learning can use this book to get there.

    How this book is organized: a roadmap

    This book has 13 chapters divided into two parts.

    In part 1, chapter 1 introduces the field of deep reinforcement learning and sets expectations for the journey ahead. Chapter 2 introduces a framework for designing problems that RL agents can understand. Chapter 3 contains details of algorithms for solving RL problems when the agent knows the dynamics of the world. Chapter 4 contains details of algorithms for solving simple RL problems when the agent does not know the dynamics of the world. Chapter 5 introduces methods for solving the prediction problem, which is a foundation for advanced RL methods.

    In part 2, chapter 6 introduces methods for solving the control problem, methods that optimize policies purely from trial-and-error learning. Chapter 7 teaches more advanced methods for RL, including methods that use planning for more sample efficiency. Chapter 8 introduces the use of function approximation in RL by implementing a simple RL algorithm that uses neural networks for function approximation. Chapter 9 dives into more advanced techniques for using function approximation for solving reinforcement learning problems. Chapter 10 teaches some of the best techniques for further improving the methods introduced so far. Chapter 11 introduces a slightly different technique for using DL models with RL that has proven to reach state-of-the-art performance in multiple deep RL benchmarks. Chapter 12 dives into more advanced methods for deep RL, state-of-the-art algorithms, and techniques commonly used for solving real-world problems. Chapter 13 surveys advanced research areas in RL that suggest the best path for progress toward artificial general intelligence.

    About the code

    This book contains many examples of source code both in boxes titled I speak Python and in the text. Source code is formatted in a fixed-width font like this to separate it from ordinary text and has syntax highlighting to make it easier to read.

    In many cases, the original source code has been reformatted; we’ve added line breaks, renamed variables, and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and the code includes Python’s line-continuation operator, the backslash (\), to indicate that a statement is continued on the next line.

    Additionally, comments in the source code have often been removed from the boxes, and the code is described in the text. Code annotations point out important concepts.

    The code for the examples in this book is available for download from the Manning website at and from GitHub at

    liveBook discussion forum

    Purchase of grokking Deep Reinforcement Learning includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to!/book/grokking-deep-reinforcement-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at!/discussion.

    Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

    about the author

    Miguel Morales works on reinforcement learning at Lockheed Martin, Missiles and Fire Control, Autonomous Systems, in Denver, Colorado. He is a part-time Instructional Associate at Georgia Institute of Technology for the course in Reinforcement Learning and Decision Making. Miguel has worked for Udacity as a machine learning project reviewer, a Self-driving Car Nanodegree mentor, and a Deep Reinforcement Learning Nanodegree content developer. He graduated from Georgia Tech with a Master’s in Computer Science, specializing in interactive intelligence.

    1 Introduction to deep reinforcement learning

    In this chapter

    You will learn what deep reinforcement learning is and how it is different from other machine learning approaches.

    You will learn about the recent progress in deep reinforcement learning and what it can do for a variety of problems.

    You will know what to expect from this book and how to get the most out of it.

    I visualize a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.

    — Claude Shannon, father of the information age and contributor to the field of artificial intelligence

    Humans naturally pursue feelings of happiness. From picking out our meals to advancing our careers, every action we choose is derived from our drive to experience rewarding moments in life. Whether these moments are self-centered pleasures or the more generous of goals, whether they bring us immediate gratification or long-term success, they’re still our perception of how important and valuable they are. And to some extent, these moments are the reason for our existence.

    Our ability to achieve these precious moments seems to be correlated with intelligence; intelligence is defined as the ability to acquire and apply knowledge and skills. People who are deemed by society as intelligent are capable of trading not only immediate satisfaction for long-term goals, but also a good, certain future for a possibly better, yet uncertain, one. Goals that take longer to materialize and that have unknown long-term value are usually the hardest to achieve, and those who can withstand the challenges along the way are the exception, the leaders, the intellectuals of society.

    In this book, you learn about an approach, known as deep reinforcement learning, involved with creating computer programs that can achieve goals that require intelligence. In this chapter, I introduce deep reinforcement learning and give suggestions to get the most out of this book.

    What is deep reinforcement learning?

    Deep reinforcement learning (DRL) is a machine learning approach to artificial intelligence concerned with creating computer programs that can solve problems requiring intelligence. The distinct property of DRL programs is learning through trial and error from feedback that’s simultaneously sequential, evaluative, and sampled by leveraging powerful non-linear function approximation.

    I want to unpack this definition for you one bit at a time. But, don’t get too caught up with the details because it’ll take me the whole book to get you grokking deep reinforcement learning. The following is the introduction to what you learn about in this book. As such, it’s repeated and explained in detail in the chapters ahead.

    If I succeed with my goal for this book, after you complete it, you should understand this definition precisely. You should be able to tell why I used the words that I used, and why I didn’t use more or fewer words. But, for this chapter, simply sit back and plow through it.

    Deep reinforcement learning is a machine learning approach to artificial intelligence

    Artificial intelligence (AI) is a branch of computer science involved in the creation of computer programs capable of demonstrating intelligence. Traditionally, any piece of software that displays cognitive abilities such as perception, search, planning, and learning is considered part of AI. Several examples of functionality produced by AI software include:

    The pages returned by a search engine

    The route produced by a GPS app

    The voice recognition and the synthetic voice of smart-assistant software

    The products recommended by e-commerce sites

    The follow-me feature in drones

    Subfields of artificial intelligence

    All computer programs that display intelligence are considered AI, but not all examples of AI can learn. machine learning (ML) is the area of AI concerned with creating computer programs that can solve problems requiring intelligence by learning from data. There are three main branches of ML: supervised, unsupervised, and reinforcement learning.

    Main branches of machine learning

    Supervised learning (SL) is the task of learning from labeled data. In SL, a human decides which data to collect and how to label it. The goal in SL is to generalize. A classic example of SL is a handwritten-digit-recognition application: a human gathers images with handwritten digits, labels those images, and trains a model to recognize and classify digits in images correctly. The trained model is expected to generalize and correctly classify handwritten digits in new images.
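    To make the SL workflow concrete, here is a minimal sketch in Python, the language the book uses. It assumes each digit image has already been reduced to a hypothetical two-number feature vector. It uses a nearest-centroid classifier, picked only because it is short; the book does not prescribe this method. A human supplies the labels, and the model is then asked to generalize to an example it has never seen.

```python
from math import dist

# Minimal supervised-learning sketch: a nearest-centroid classifier.
# The 2-D "features" are hypothetical stand-ins for digit images that a
# human has already labeled.


def fit_centroids(examples):
    """Training step: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Generalization step: assign an unseen example to the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Labeled training data (made up): (features, digit label)
training_data = [
    ((0.9, 0.1), 0), ((0.8, 0.2), 0),
    ((0.1, 0.9), 1), ((0.2, 0.8), 1),
]
centroids = fit_centroids(training_data)
print(predict(centroids, (0.7, 0.3)))  # → 0
```

    Replacing the centroid average with a neural network changes the model but not the task. The data is still labeled, and the goal is still to generalize.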

    Unsupervised learning (UL) is the task of learning from unlabeled data. Even though data no longer needs labeling, the methods used by the computer to gather data still need to be designed by a human. The goal in UL is to compress. A classic example of UL is a customer segmentation application: a human collects customer data and trains a model to group customers into clusters. These clusters compress the information, uncovering underlying relationships among customers.
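    The customer-segmentation example can be sketched with a tiny one-dimensional k-means. The monthly-spend figures and the choice of two clusters are assumptions made only for illustration. No labels are given. The "compression" is that two cluster centers end up summarizing eight customers.

```python
# Minimal unsupervised-learning sketch: 1-D k-means.
# The spend values and starting centers are made-up assumptions.


def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


monthly_spend = [12, 15, 11, 14, 95, 102, 98, 110]
centers, clusters = kmeans_1d(monthly_spend, centers=[0.0, 50.0])
print(centers)   # → [13.0, 101.25]
print(clusters)  # → [[12, 15, 11, 14], [95, 102, 98, 110]]
```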

    Reinforcement learning (RL) is the task of learning through trial and error. In this type of task, no human labels data, and no human collects or explicitly designs the collection of data. The goal in RL is to act. A classic example of RL is a Pong-playing agent: the agent repeatedly interacts with a Pong emulator and learns by taking actions and observing their effects. The trained agent is expected to act in such a way that it successfully plays Pong.
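    A Pong emulator is too large for a short listing, so the sketch below swaps in a much simpler trial-and-error setting. The environment is a hypothetical two-action one that pays off more often for action 1. The agent is epsilon-greedy and, like the Pong agent, is never told the right action. It sees only the reward for the action it actually took. The payoff probabilities and the `epsilon` value are assumptions. This toy drops the sequential part of full RL and keeps only learning from evaluative feedback.

```python
import random

# Minimal trial-and-error sketch (a stand-in for Pong, not Pong itself).
# The environment's payoff probabilities are hypothetical.


def environment(action, rng):
    """Hypothetical environment: action 1 wins 80% of the time, action 0 wins 20%."""
    win_probability = 0.8 if action == 1 else 0.2
    return 1.0 if rng.random() < win_probability else 0.0


def train(episodes=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # the agent's running estimate of each action's reward
    counts = [0, 0]
    for _ in range(episodes):
        # Explore with probability epsilon; otherwise exploit current estimates.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])
        reward = environment(action, rng)  # only evaluative feedback, no label
        # Incremental mean: nudge the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train()
print("estimates:", values)
print("preferred action:", max(range(2), key=lambda a: values[a]))
```

    No human labeled any data here. The agent generated its own experience by acting, which is the distinction the paragraph draws between RL and SL or UL.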

    A powerful recent approach to ML, called deep learning (DL), involves using multi-layered non-linear function approximation, typically neural networks. DL isn’t a separate branch of ML, so it’s not a different task than those described previously. DL is a collection of techniques and methods for using neural networks to solve ML tasks, whether SL, UL, or RL. DRL is simply the use of DL to solve RL tasks.
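    The phrase "multi-layered non-linear function approximation" can be made concrete with a forward pass through a two-layer network, written in plain Python so the listing has no dependencies. The layer sizes, random weights, and input vector are arbitrary assumptions, and training by backpropagation is left out. The same function could output class scores for an SL task or estimated action values for an RL task. That is why DL is a toolbox rather than a separate task.

```python
import math
import random

# Minimal sketch of multi-layered non-linear function approximation:
# a two-layer neural network forward pass. Sizes and weights are arbitrary;
# no training is performed.


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a fully connected layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, inputs, activation=None):
    """Compute activation(W @ x + b) for one layer."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


rng = random.Random(42)
hidden = make_layer(4, 8, rng)   # 4 inputs -> 8 hidden units
output = make_layer(8, 2, rng)   # 8 hidden units -> 2 outputs

x = [0.1, -0.2, 0.05, 0.3]                   # hypothetical input
h = forward(hidden, x, activation=math.tanh)  # the non-linearity
scores = forward(output, h)                   # class scores or action values
print(scores)
```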

    Deep learning is a powerful toolbox

    The bottom line is that DRL is an approach to a problem. The field of AI defines the problem: creating intelligent machines. One of the approaches to solving that problem is DRL. Throughout the book, you will find comparisons between RL and other ML approaches, but only in this chapter will you find definitions and a historical overview of AI in general. It’s important to note that the field of RL includes the field of DRL, so although I make a distinction when necessary, when I refer to RL, remember that DRL is included.

    Deep reinforcement learning is concerned with creating computer programs

    At its core, DRL is about complex sequential decision-making problems under uncertainty. But this is a topic of interest in many fields. For instance, control theory (CT) studies ways to control complex known dynamic systems; in CT, the dynamics of the systems we try to control are usually known in advance. Operations research (OR), another instance, also studies decision-making under uncertainty, but problems in this field often have much larger action spaces than those commonly seen in DRL. Psychology studies human behavior, which is partly the same complex sequential decision-making under uncertainty problem.

    The synergy between similar fields

    The bottom line is that you have come to a field that’s influenced by a variety of others. Although this is a good thing, it also brings inconsistencies in terminologies, notations, and so on. My take is the computer science approach to this problem, so this book is about building computer programs that solve complex decision-making problems under uncertainty, and as such, you can find code examples throughout the book.

    In DRL, these computer programs are called agents. An agent is the decision maker only and nothing else. That means if you're training a robot to pick up objects, the robot arm isn't part of the agent. Only the code that makes decisions is referred to as the agent.

    Deep reinforcement learning agents can solve problems that require intelligence

    On the other side of the agent is the environment. The environment is everything outside the agent: everything over which the agent has no total control. Again, imagine you're training a robot to pick up objects. The objects to be picked up, the tray where the objects lie, the wind, and everything outside the decision maker are part of the environment. That means the robot arm is also part of the environment because it isn't part of the agent. And even though the agent can decide to move the arm, the actual arm movement is noisy, and thus the arm is part of the environment.

    This strict boundary between the agent and the environment is counterintuitive at first, but the decision maker, the agent, can only have a single role: making decisions. Everything that comes after the decision gets bundled into the environment.

    Boundary between agent and environment

    Chapter 2 provides an in-depth survey of all the components of DRL. The following is a preview of what you’ll learn in chapter 2.

    The environment is represented by a set of variables related to the problem. For instance, in the robotic arm example, the location and velocities of the arm would be part of the variables that make up the environment. This set of variables and all the possible values that they can take are referred to as the state space. A state is an instantiation of the state space, a set of values the variables take.

    Interestingly, agents often don't have access to the full state of the environment. The part of a state that the agent can observe is called an observation. Observations depend on states but are what the agent can see. For instance, in the robotic arm example, the agent may only have access to camera images. While an exact location of each object exists, the agent doesn't have access to this specific state. Instead, the observations the agent perceives are derived from the states. You'll often see observations and states used interchangeably in the literature, including in this book. I apologize in advance for the inconsistencies. Simply know the differences and be aware of the lingo; that's what matters.
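To make the distinction concrete, here's a minimal Python sketch. The robot-arm variables (`ArmState`, `arm_x`, `arm_velocity`, `object_x`) and the noisy-camera `observe` function are hypothetical, made up for illustration: the environment holds the full state, but the agent only receives a noisy subset of it, with no velocity at all.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArmState:
    """Full state of a hypothetical robot-arm environment."""
    arm_x: float         # arm position
    arm_velocity: float  # arm velocity
    object_x: float      # exact object position


def observe(state: ArmState, noise_std: float = 0.05,
            rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Hypothetical camera: noisy arm and object positions; velocity is not observed."""
    rng = rng or random.Random()
    return (state.arm_x + rng.gauss(0.0, noise_std),
            state.object_x + rng.gauss(0.0, noise_std))


state = ArmState(arm_x=0.40, arm_velocity=-0.10, object_x=0.75)
observation = observe(state, rng=random.Random(0))  # what the agent actually gets
```

The agent would make its decisions from `observation`, never from `state` directly.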

    States vs. observations

    At each state, the environment makes available a set of actions the agent can choose from. The agent influences the environment through these actions. The environment may change states as a response to the agent’s action. The function that’s responsible for this mapping is called the transition function. The environment may also provide a reward signal as a response. The function responsible for this mapping is called the reward function. The set of transition and reward functions is referred to as the model of the environment.
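Here's a minimal sketch of a model, assuming a hypothetical five-cell corridor in which state 4 is the goal and each move succeeds with probability 0.8 (otherwise the agent stays put, a stand-in for noisy actuation). The names `transition`, `reward`, and `model` are mine, not an established API. The transition function returns a probability distribution over next states, and the reward function scores each transition.

```python
from typing import Dict, List, Tuple

N_STATES = 5            # hypothetical corridor: states 0..4
GOAL = N_STATES - 1     # state 4 is the goal (terminal)
ACTIONS = ("left", "right")


def transition(state: int, action: str) -> Dict[int, float]:
    """Transition function: (state, action) -> {next_state: probability}."""
    if state == GOAL:                       # terminal: the agent stays put
        return {state: 1.0}
    step = 1 if action == "right" else -1
    intended = min(max(state + step, 0), GOAL)
    if intended == state:                   # bumped into the left wall
        return {state: 1.0}
    return {intended: 0.8, state: 0.2}      # noisy actuation: 20% chance of no move


def reward(state: int, action: str, next_state: int) -> float:
    """Reward function: +1 for entering the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL and state != GOAL else 0.0


# Transition and reward functions together: the model of this environment.
# Each (state, action) maps to a list of (probability, next_state, reward).
model: Dict[Tuple[int, str], List[Tuple[float, int, float]]] = {
    (s, a): [(p, s2, reward(s, a, s2)) for s2, p in transition(s, a).items()]
    for s in range(N_STATES) for a in ACTIONS
}
```

For example, `model[(3, "right")]` holds `[(0.8, 4, 1.0), (0.2, 3, 0.0)]`: an 80% chance of entering the goal with a reward of +1, and a 20% chance of staying put with no reward.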

    The reinforcement learning cycle

    The environment commonly has a well-defined task. The goal of this task is defined through the reward function. The reward-function signals can be simultaneously sequential, evaluative, and sampled. To achieve the goal, the agent needs to demonstrate intelligence, or at least cognitive abilities commonly associated with intelligence, such as long-term thinking, information gathering, and generalization.

    The agent has a three-step process: the agent interacts with the environment, the agent evaluates its behavior, and the agent improves its responses. The agent can be designed to learn mappings from observations to actions, called policies. The agent can be designed to learn the transition and reward functions of the environment, with mappings called models. The agent can also be designed to learn to estimate the reward-to-go, with mappings called value functions.

    Deep reinforcement learning agents improve their behavior through trial-and-error learning

    The interactions between the agent and the environment go on for several cycles. Each cycle is called a time step. At each time step, the agent observes the environment, takes an action, and receives a new observation and reward. The set of the state, the action, the reward, and the new state is called an experience. Every experience is an opportunity for learning and improving performance.
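The cycle can be sketched in a few lines of Python. This assumes a hypothetical deterministic corridor (`env_step`) and a random policy. The `done` flag in each experience tuple isn't part of the definition above, but it's commonly added to mark the end of an episode.

```python
import random
from collections import namedtuple

# An experience tuple (s, a, r, s'), plus a commonly added "done" flag.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

GOAL = 4  # hypothetical corridor: states 0..4, state 4 ends the episode


def env_step(state, action):
    """Hypothetical environment: action 1 moves right, 0 moves left."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def run_episode(policy, start_state=0, max_steps=100):
    """Run the agent-environment loop and collect one experience per time step."""
    experiences = []
    state = start_state
    for _ in range(max_steps):                 # each iteration is one time step
        action = policy(state)                 # the agent decides
        next_state, reward, done = env_step(state, action)  # the environment responds
        experiences.append(Experience(state, action, reward, next_state, done))
        if done:
            break
        state = next_state
    return experiences


rng = random.Random(0)


def random_policy(state):
    return rng.choice([0, 1])  # 0 = left, 1 = right


episode = run_episode(random_policy)
```

Every element of `episode` is an experience the agent could learn from.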

    Experience tuples

    The task the agent is trying to solve may or may not have a natural ending. Tasks that have a natural ending, such as a game, are called episodic tasks. Conversely, tasks that don't, such as learning forward motion, are called continuing tasks. The sequence of time steps from the beginning to the end of an episodic task is called an episode. Agents may take several time steps and episodes to learn to solve a task. Agents learn through trial and error: they try something, observe, learn, try something else, and so on.

    You’ll start learning more about this cycle in chapter 4, which contains a type of environment with a single step per episode. Starting with chapter 5, you’ll learn to deal with environments that require more than a single interaction cycle per episode.

    Deep reinforcement learning agents learn from sequential feedback

    The action taken by the agent may have delayed consequences. The reward may be sparse and only manifest after several time steps. Thus the agent must be able to learn from sequential feedback. Sequential feedback gives rise to a problem referred to as the temporal credit assignment problem. The temporal credit assignment problem is the challenge of determining which state and/or action is responsible for a reward. When there’s a temporal component to a problem, and actions have delayed consequences, it’s challenging to assign credit for rewards.
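One common way to spread a delayed reward back over earlier time steps is to compute the discounted return, the reward-to-go from each step, where a discount factor `gamma` (an assumption here; it isn't introduced in this section) shrinks rewards the further away they are. The sketch below only shows how a single reward at the end reaches earlier steps; by itself it doesn't tell the agent which action deserved the credit.

```python
def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_(t+1), computed backward from the end."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# A sparse, delayed reward: nothing until the final time step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
```

Every earlier step now receives a share of the final reward (the first step gets 0.9 to the fourth power, about 0.66), but that share depends only on distance in time, not on whether the action actually mattered. That's why temporal credit assignment remains hard.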

    The difficulty of the temporal credit assignment problem

    In chapter 3, we'll study the ins and outs of sequential feedback in isolation. That is, your programs will learn from feedback that is simultaneously sequential, supervised (as opposed to evaluative), and exhaustive (as opposed to sampled).

    Deep reinforcement learning agents learn from evaluative feedback

    The reward received by the agent may be weak, in the sense that it may provide no supervision. The reward may indicate goodness and not correctness, meaning it may contain no information about other potential rewards. Thus the agent must be able to learn from evaluative feedback. Evaluative feedback gives rise to the need for exploration. The agent must be able to balance the gathering of information with the exploitation of current information. This is also referred to as the exploration versus exploitation trade-off.
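A classic, simple strategy for this trade-off is epsilon-greedy action selection, sketched here on a hypothetical two-armed bandit with Bernoulli rewards (success probabilities 0.2 and 0.8, unknown to the agent). With probability `epsilon`, the agent explores a random arm; otherwise, it exploits the arm with the highest estimated value. It's only one of many exploration strategies.

```python
import random


def run_epsilon_greedy(true_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; true_probs is hidden from the agent."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms          # times each arm was pulled
    values = [0.0] * n_arms        # estimated value of each arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: gather information
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit current estimates
        # Evaluative feedback: only the pulled arm's reward is revealed.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # running mean
        total_reward += reward
    return values, counts, total_reward


values, counts, total_reward = run_epsilon_greedy([0.2, 0.8])
```

Note that a reward of 1.0 from arm 0 says nothing about what arm 1 would have paid; the agent only finds out by trying it.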

    The difficulty of the exploration vs. exploitation trade-off

    In chapter 4, we’ll study the ins and outs of evaluative feedback in isolation. That is, your programs will learn from feedback that is simultaneously one-shot (as opposed to sequential), evaluative, and exhaustive (as opposed to sampled).

    Deep reinforcement learning agents learn from sampled feedback

    The reward received by the agent is merely a sample, and the agent doesn’t have access to the reward function. Also, the state and action spaces are commonly large, even infinite, so trying to learn from sparse and weak feedback becomes a harder challenge with samples. Therefore, the agent must be able to learn from sampled feedback, and it must be able to generalize.
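The sketch below illustrates both ideas under made-up assumptions: a one-dimensional continuous state in [-1, 1], a hidden expected reward of `2x + 0.5` that the agent never sees, and Gaussian noise on every sampled reward. Because the agent will almost never see the same state twice, it fits a linear function approximator with stochastic gradient descent and uses it to estimate the reward of a state it almost certainly never sampled.

```python
import random


def true_expected_reward(x: float) -> float:
    """Hidden from the agent (hypothetical): the mean reward of state x."""
    return 2.0 * x + 0.5


def sample_reward(x: float, rng: random.Random) -> float:
    """All the agent ever gets: one noisy sample of the reward."""
    return true_expected_reward(x) + rng.gauss(0.0, 0.3)


def fit_linear(n_samples: int = 5000, lr: float = 0.05, seed: int = 0):
    """Fit r ~ w * x + b by SGD on squared error, one sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)       # continuous state space
        r = sample_reward(x, rng)
        error = (w * x + b) - r
        w -= lr * error * x              # gradient of 0.5 * error**2 w.r.t. w
        b -= lr * error                  # gradient of 0.5 * error**2 w.r.t. b
    return w, b


w, b = fit_linear()
unseen_state = 0.123
estimate = w * unseen_state + b          # generalization to an unsampled state
```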

    The difficulty of learning from sampled feedback

    Agents that are designed to approximate policies are called policy-based; agents that are designed to approximate value functions are called value-based; agents that are designed to approximate models are called model-based; and agents that are designed to approximate both policies and value functions are called actor-critic. Agents can be designed to approximate one or more of these components.

    Deep reinforcement learning agents use powerful non-linear function approximation

    The agent can approximate functions using a variety of ML methods and techniques, from decision trees to SVMs to neural networks. However, in this book, we use only neural networks; this is what the deep part of DRL refers to, after all. Neural networks aren't necessarily the best solution to every problem; neural networks are data hungry and challenging to interpret, and you must keep these facts in mind. However, neural networks are among the most potent function approximators available, and their performance is often the best.

    A simple feed-forward neural network

    Artificial neural networks (ANNs) are multi-layered non-linear function approximators loosely inspired by the biological neural networks in animal brains. An ANN isn't an algorithm, but a structure composed of multiple layers of mathematical transformations applied to input values.
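To show what "layers of mathematical transformations" means in practice, here's a minimal forward pass of a feed-forward network in plain Python. The layer sizes (4 inputs, two hidden layers of 16 units, 2 outputs), ReLU activations, and uniform initialization are illustrative choices, not prescribed here; in practice you'd use a library such as PyTorch, which also handles training.

```python
import math
import random


def relu(v):
    """Element-wise non-linearity: max(0, x)."""
    return [max(0.0, x) for x in v]


def dense(inputs, weights, biases):
    """Fully connected layer: out_j = sum_i weights[j][i] * inputs[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights scaled by 1/sqrt(n_in); zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_network(sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(network, x):
    """Apply each layer in turn: ReLU on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(network):
        x = dense(x, weights, biases)
        if i < len(network) - 1:
            x = relu(x)
    return x


# Hypothetical sizes: 4 state variables in, one value per action (2 actions) out.
net = make_network([4, 16, 16, 2])
q_values = forward(net, [0.1, -0.2, 0.05, 0.3])
```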

    From chapter 3 through chapter 7, we only deal with problems in which agents learn from exhaustive (as opposed to sampled) feedback. Starting with chapter 8, we study the full DRL problem; that is, using deep neural networks so that agents can learn from sampled feedback. Remember, DRL agents learn from feedback that’s simultaneously sequential, evaluative, and sampled.

    The past, present, and future of deep reinforcement learning

    History isn’t necessary to gain skills, but it can allow you to understand the context around a topic, which in turn can help you gain motivation, and therefore, skills. The history of AI and DRL should help you set expectations about the future of this powerful technology. At times, I feel the hype surrounding AI is actually productive; people get interested. But, right after that, when it’s time to put in work, hype no longer helps, and it’s a problem. Although I’d like to be excited about AI, I also need to set realistic expectations.

    Recent history of artificial intelligence and deep reinforcement learning

    The beginnings of DRL could be traced back many years, because humans have been intrigued by the possibility of intelligent creatures other than ourselves since antiquity. But a good beginning could be Alan Turing’s work in the 1930s, 1940s, and 1950s that paved the way for modern computer science and AI by laying down critical theoretical foundations that later scientists leveraged.

    The most well-known of these is the Turing Test, which proposes a standard for measuring machine intelligence: if a human interrogator is unable to distinguish a machine from another human in a chat Q&A session, then the machine is said to count as intelligent. Though rudimentary, the Turing Test allowed generations to wonder about the possibilities of creating smart machines by setting a goal that researchers could pursue.

    The formal beginnings of AI as an academic discipline can be attributed to John McCarthy, an influential AI researcher who made several notable contributions to the field. To name a few, McCarthy is credited with coining the term artificial intelligence in 1955, leading the first AI conference in 1956, inventing the Lisp programming language in 1958, cofounding the MIT AI Lab in 1959, and contributing important papers to the development of AI as a field over several decades.

    Artificial intelligence winters

    All the work and progress of AI early on created a great deal of excitement, but there were also significant setbacks. Prominent AI researchers suggested we would create human-like machine intelligence within years, but this never came. Things got worse when a well-known researcher named James Lighthill compiled a report criticizing the state of academic research in AI. All of these developments contributed to a long period of reduced funding and interest in AI research known as the first AI winter.

    The field continued this pattern throughout the years: researchers made progress, people got overly optimistic and overestimated what was possible, and government and industry partners then reduced funding.

    AI funding pattern through the years

    The current state of artificial intelligence

    We are likely in another highly optimistic time in AI history, so we must be careful. Practitioners understand that AI is a powerful tool, but certain people think of AI as a magic black box: any problem goes in, and the best solution ever comes out. Nothing can be further from the truth. Other people even worry about AI gaining consciousness, as if that were relevant. As Edsger W. Dijkstra famously said, "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim."

    But if we set aside this Hollywood-instilled vision of AI, we can allow ourselves to get excited about the recent progress in this field. Today, the most influential companies in the world make the most substantial investments in AI research. Companies such as Google, Facebook, Microsoft, Amazon, and Apple have invested in AI research and have become highly profitable thanks, in part, to AI systems. Their significant and steady investments have created the perfect environment for the current pace of AI research. Contemporary researchers have the best computing power available and tremendous amounts of data for their research, and teams of top researchers are working together, on the same problems, in the same location, at the same time. Current AI research has become more stable and more productive. We have witnessed one AI success after another, and it doesn't seem likely to stop anytime soon.

    Progress in deep reinforcement learning

    The use of artificial neural networks for RL problems started around the 1990s. One of the classics is the backgammon-playing computer program, TD-Gammon, created by Gerald Tesauro et al. TD-Gammon learned to play backgammon by learning to evaluate board positions on its own through RL. Even though the techniques implemented aren't precisely considered DRL, TD-Gammon was one of the first widely reported success stories using ANNs to solve complex RL problems.

    TD-Gammon architecture

    In 2004, Andrew Ng et al. developed an autonomous helicopter that taught itself to fly stunts by observing
