Practical Deep Reinforcement Learning with Python: Concise Implementation of Algorithms, Simplified Maths, and Effective Use of TensorFlow and PyTorch (English Edition)
By Ivan Gridin
Reinforcement Learning
Q-Learning
Machine Learning
Neural Networks
TensorFlow
Problem-Solving
Exploration
Learning From Experience
Intelligent Machines
Deep Q-Network
PyTorch
Deep Learning
Artificial Intelligence
Stock Trading
About this ebook
This book introduces reinforcement learning from a pragmatic point of view. The book does involve mathematics, but it does not attempt to overburden a reader who is new to the field of reinforcement learning.
The book brings many practical methods to the reader's attention, including Monte Carlo, Deep Q-Learning, Policy Gradient, and Actor-Critic methods. Beyond explaining these techniques in detail, the book provides real implementations of them using the power of TensorFlow and PyTorch, and covers several enticing projects that show the power of reinforcement learning. Everything is concise, up-to-date, and visually explained.
After finishing this book, the reader will have a thorough, intuitive understanding of modern reinforcement learning and its applications, which will greatly aid them in delving deeper into this interesting field.
Book preview
Practical Deep Reinforcement Learning with Python - Ivan Gridin
Part - I
The first part of the book is devoted to classical reinforcement learning methods. It covers the theoretical foundations of reinforcement learning problems and the primary techniques for solving them. One of the central concepts of this part is the Q-Learning method. Described in Chapter 6: Escaping Maze With Q-Learning, it is the cornerstone of most reinforcement learning solutions. The first part of the book can be considered an introduction to reinforcement learning.
CHAPTER 1
Introducing Reinforcement Learning
Reinforcement learning (RL) is one of the most active research areas in machine learning, and many researchers believe that it will take us closer to artificial general intelligence. In the past few years, RL has evolved rapidly and has been applied to complex problems ranging from stock trading to self-driving cars. The main driver of this growth is deep reinforcement learning, the combination of deep learning and reinforcement learning. It is this promising area of machine learning that we will study in this book.
Structure
In this chapter, we will discuss the following topics:
What is reinforcement learning?
Reinforcement learning mechanics
Reinforcement learning vs. supervised learning
Examples of reinforcement learning
Objectives
After completing this chapter, you will have a basic understanding of reinforcement learning and its key definitions. You will also have learned how reinforcement learning works and how it differs from other machine learning approaches.
What is reinforcement learning?
Reinforcement learning is a machine learning technique concerned with how agents should take actions in a surrounding environment depending on their current state. RL helps an agent maximize the cumulative reward collected over a sequence of actions. In RL, agents act in a known or unknown environment and constantly adapt and learn based on collected experience. The feedback from the environment might be positive (known as rewards) or negative (known as punishments). At this point, these definitions may seem abstract and unclear, but we will elaborate on them throughout this chapter.
The following figure represents the key concept of RL:
Figure 1.1: Reinforcement learning
Here, the agent starts in some initial state in some environment. The agent then decides to take some action. The environment reacts to the agent's action, returns some reward for that action, and transfers the agent to another state.
The most commonly used reinforcement learning terms are as follows:
Agent is the decision-maker that defines what action to take.
Examples: Self-driving car, chess player, stock trading robot
Action is a concrete act in the surrounding environment taken by the agent.
Examples: Turn the car left, move a chess pawn one cell forward, sell all assets
Environment is the problem context that the agent interacts with.
Examples: Car track, chess board, stock market
State is the position of the agent in the environment.
Examples: Car coordinates on the track and its speed, arrangement of pieces on the chessboard, prices of assets
Reward is a numerical value returned by the environment as the reaction to the agent's action.
Examples: Reaching the finish line without any accidents, winning the chess game, earning more money
RL is learning what to do, that is, how to map situations to actions, to maximize reward. The agent is not told which actions to take but must discover which actions yield the most reward by trying them. An action may affect not only the immediate reward but also the next situation and, through it, all subsequent rewards. This means that the agent should consider not only the immediate reward but also the reward in the long-term sense.
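To make this loop concrete, the following minimal Python sketch implements the agent-environment interaction from Figure 1.1. The Environment and Agent classes are toy placeholders invented for this illustration; they do not come from any library:

import random

class Environment:
    # A toy environment: the agent should reach position 10.
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment reacts to an action by returning a new state,
        # a reward, and a flag telling whether the episode is over.
        self.state += action
        reward = 1.0 if self.state == 10 else -0.1
        return self.state, reward, self.state >= 10

class Agent:
    def act(self, state):
        # A trivial agent that explores by picking a random step size.
        return random.choice([1, 2])

env, agent = Environment(), Agent()
state, total_reward, done = 0, 0.0, False
while not done:
    action = agent.act(state)                # the agent decides on an action
    state, reward, done = env.step(action)   # the environment responds
    total_reward += reward                   # the quantity RL tries to maximize

print(f"Cumulative reward: {total_reward:.1f}")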
Reinforcement learning mechanics
In our lives, we usually try to maximize our rewards, and that does not mean we are always thinking about money or material things. For example, when we read a new book to gain new skills, we understand that it is better to read carefully, without hurrying. The way we read the book is our strategy, and the skills we gain are our reward. When we negotiate with other people, we try to be polite, and the feedback we get is our reward.
The purpose of the reward is to tell the agent how well it has behaved. The main goal of RL is to find a strategy that maximizes the reward after some number of actions. Let's look at a simple example that illustrates the reinforcement learning mechanism.
Consider the following (entirely scientifically factual) scenario. A robot has arrived on our planet. This robot is very good at designing posters but does not know how to negotiate with people. Its target is to get a job and make a lot of money in 5 years. Good plan, why not? Every day, the robot makes a particular decision about how it will act that day. At the end of the day, it checks its bank account and summarizes its standing in the company.
Let's consider the first scenario: on the first working day, the robot decides to steal a computer from the office and sell it. This may seem like a pretty good decision because it increases the robot's balance significantly. But of course, a decision like this can be made only once, and the robot's profits will stop there.
The following figure illustrates the first scenario:
Figure 1.2: First strategy
Now, let's consider the second scenario: every day, the robot works hard and learns new things. In this case, its strategy is long-term. It may be inferior to other strategies in the short term, but it will be significantly more profitable in the long term.
Figure 1.3: Second strategy
Of course, in real life, everything is much more complicated, but this example illustrates the principle that it is sometimes necessary to think several steps ahead. A solution that has a quick effect can be fatal in the long run. Reinforcement learning aims to find long-term strategies that maximize the agent's reward.
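We can check this intuition with a back-of-the-envelope calculation in Python. The daily rewards below are invented purely for illustration; the book does not attach concrete numbers to the robot's scenarios:

# Scenario 1: one big gain from the theft, then no income at all.
steal_once = [50] + [0] * 29

# Scenario 2: small daily gains that grow as the robot learns (assumed 5%/day).
work_daily = [2 * 1.05 ** day for day in range(30)]

print(sum(steal_once))           # 50
print(round(sum(work_daily)))    # ~133: the long-term strategy wins over 30 days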
Here are some essential characteristics of reinforcement learning:
There is no supervisor; the agent only receives a reward signal
Decision making is sequential
The agent's actions determine the subsequent data it receives
The term reinforcement comes from the fact that a reward received by an agent should reinforce its behavior in a positive or negative direction. A local reward indicates the success of the agent's most recent action, not the overall success achieved by the agent so far. Getting a large reward for some action does not mean that you won't face dramatic consequences later because of it. Remember our robot that decided to steal a computer: it could look like a brilliant idea until you think about the next day.
A problem can be considered an RL problem if we can define the following:
Agent: The subject that takes actions.
Environment: The system that receives the agent's actions.
Set of states: The set of states that the agent can observe. This set can be infinite.
Set of actions: The set of actions that the agent can take. This set can be infinite.
Reward: What the agent's primary goal is and how it can be achieved through some reward system.
If all of the above can be defined, you are dealing with a reinforcement learning problem. A minimal sketch of this checklist in code follows.
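As an illustration, here is the checklist written down in Python for a stock-trading robot, one of the examples used throughout this chapter. Every value below is invented for illustration:

# The five ingredients of an RL problem, filled in for stock trading.
problem = {
    "agent": "stock trading robot",                  # the decision-maker
    "environment": "stock market",                   # what reacts to its actions
    "states": "current asset prices and portfolio",  # possibly an infinite set
    "actions": ["buy", "sell", "hold"],              # possibly an infinite set
    "reward": "change in portfolio value per step",  # encodes the primary goal
}

for ingredient, definition in problem.items():
    print(f"{ingredient:12s}: {definition}")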
Reinforcement learning vs. supervised learning
Now that we have an intuitive understanding of reinforcement learning, we can examine how it differs from traditional supervised learning. A good rule of thumb is to treat reinforcement learning as a dynamic model and supervised learning as a static model. Let's elaborate on this.
We can view a supervised learning model as a statistical model that extracts correlations and patterns from data and uses them to make predictions without being explicitly programmed. Generally speaking, supervised learning performs only one action: it takes an input and returns an output. Its primary goal is to provide you with an automatically built function F that maps some input X to some output Y:
Figure 1.4: Supervised learning
Reinforcement learning, in contrast, builds an agent that interacts with an environment and produces a whole sequence of actions:
Figure 1.5: Reinforcement learning
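The difference can be captured in a few lines of Python. Both snippets are illustrative toys, not code from the book:

# Supervised learning: a single, stateless mapping F: X -> Y.
def F(x):
    return 2 * x + 1                 # one input in, one prediction out

y = F(3.0)                           # no feedback, no state, done

# Reinforcement learning: a sequence of decisions, where each action
# changes the state that the next decision will observe.
state, total_reward = 0, 0
for step in range(5):
    action = 1 if state < 3 else 0   # the decision depends on the current state
    state += action                  # the action changes the environment
    total_reward += 1 if state == 3 else 0

print(total_reward)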
Let's summarize all distinctions between reinforcement learning and supervised learning in the following table:
Table 1.1: Reinforcement learning vs. supervised learning
It is important to understand the difference between reinforcement learning and supervised learning, as this knowledge will help you use each method correctly.
Examples of reinforcement learning
In this section, we will see some popular examples of RL problems. In all these problems, we have the following: agent, environment, set of states, set of actions, and the reward.
Stock trading
This type of activity aims at making a profit by buying and selling shares of different companies. Traders tend to buy the stock of a company when it is cheap and sell it when the price is high:
Table 1.2: Stock trading as RL problem
Chess
Chess is one of the oldest games, with many different playing styles and approaches. Chess can also be framed as a reinforcement learning problem:
Table 1.3: Chess as RL problem
Neural Architecture Search (NAS)
RL has been successfully applied to the domain of Neural Architecture Search (NAS). The goal is to get the best performance on a given dataset by selecting the number of layers and their parameters, adding extra connections, or making other changes to the architecture. The reward, in this case, is the performance of the resulting neural network architecture:
Table 1.4: NAS as RL problem
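As a rough illustration of the search loop, here is a random-search baseline over a toy architecture space. A real RL-based NAS controller would learn which architectures to propose; the evaluate function below is a hypothetical stand-in for actually training a network and measuring its validation accuracy:

import random

def evaluate(num_layers, units):
    # Hypothetical reward: in real NAS, this would mean training the
    # candidate network and returning its validation performance.
    return 1.0 - abs(num_layers - 4) * 0.05 - abs(units - 64) / 1000

best_reward, best_arch = float("-inf"), None
for _ in range(50):                           # 50 candidate architectures
    arch = (random.randint(1, 8),             # action: choose number of layers
            random.choice([16, 32, 64, 128])) # action: choose layer width
    reward = evaluate(*arch)                  # reward: architecture performance
    if reward > best_reward:
        best_reward, best_arch = reward, arch

print(best_arch, round(best_reward, 3))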
As you can see, many practical problems can be solved using the reinforcement learning approach.
Conclusion
Reinforcement learning is a machine learning approach that aims to find optimal decision-making strategies. It differs from other machine learning approaches in its emphasis on the agent learning from direct interaction with its environment; it requires neither traditional supervision nor a complete computational model of the environment. Reinforcement learning aims to find a long-term strategy that allows an agent to collect the maximum reward. In the next chapter, we will study the theory of Markov decision processes, which forms the basis of the entire reinforcement learning approach.
Points to remember
A solution that has a quick effect can be fatal in the long run.
RL doesn't assume any supervisor; the agent only receives a reward signal.
RL produces a sequential decision-making strategy.
Reinforcement learning is a dynamic model, and supervised learning is a static model.
Multiple choice questions
Let's consider the popular computer game Tetris, which has relatively simple mechanics. When the player builds one or more complete rows, the completed rows disappear, and the player gains some points. The game's goal is to prevent the blocks from stacking up to the top of the screen and to collect as many points as possible.
Figure 1.6: Tetris
1. Can Tetris be considered an RL problem?
a. Yes
b. No
2. Considering Tetris as an RL problem, what is the agent?
a. Score
b. Player
c. Number of disappeared lines
3. Considering Tetris as an RL problem, what is the state?
a. Score
b. Arrangement of bricks and score
c. Arrangement of bricks, score, and the next element
Answers
1. a
2. b
3. c
Key terms
Agent: The decision-maker that defines what action to take.
Action: A concrete act in the surrounding environment taken by the agent.
Environment: The problem context that the agent interacts with.
State: The position of the agent in the environment.
Reward: A numerical value returned by the environment as the reaction to the agent's action.
CHAPTER 2
Playing Monopoly and Markov Decision Process
In the last chapter, you got a general introduction to reinforcement learning (RL). We saw examples of different problems and highlighted the main characteristics of reinforcement learning. But before we start solving practical problems, we need to describe formally how they can be solved using the RL approach. One of the cornerstones of RL is the Markov decision process (MDP). This concept is the foundation of the whole theory of reinforcement learning. We will dedicate this chapter to explaining what the Markov decision process is with the help of Monopoly game examples, and we'll discuss MDPs in greater detail as we walk through the chapter. Markov chains and Markov decision processes are used extensively in many areas of engineering and statistics, so reading this chapter will be useful for understanding not only the context of reinforcement learning but a much wider range of topics. If you're already familiar with MDPs, you can skim this chapter, focusing on the terminology definitions that will be used later in the book.
Structure
In this chapter, we will discuss the following topics:
What is the best strategy for playing Monopoly?
Markov chain
Markov reward process
Markov decision process
Policy
Monopoly as Markov decision process
Objectives
The primary goal of this chapter is to introduce fundamental concepts of reinforcement learning such as the Markov reward process, the Markov decision process, and the policy. We will look at simple and straightforward examples that reveal what lies at the heart of these concepts. This chapter will give you a clear understanding of the tasks that reinforcement learning deals with.
Choosing the best strategy for playing Monopoly
The formal mathematical explanation of the Markov decision process often confuses readers, although the concept is not as complicated as it might seem. In this chapter, we will explore what a Markov decision process is by playing the popular game of Monopoly.
Let's define a simplified version of the Monopoly game.
We will consider only simplified rules of the game here; this chapter does not require going through the complete rule set.
List of rules
Our custom simplified Monopoly game will follow the given set of rules (a short code sketch of these rules appears after the list):
Two players are playing; for the sake of simplicity, we will consider a game for two players only. We will denote the players by a square and a triangle:
Figure 2.1: Monopoly players
Each player rolls the dice and moves forward a certain number of cells:
Figure 2.2: Player 1 moves four steps forward
Each cell can be purchased for the price indicated on it. When a player lands on a free cell, they have two options:
Buy the cell
Do not buy the cell
Buying a free cell is not obligatory:
Figure 2.3: Cell prices
If a player lands on someone else's cell, then they must pay the other player 20% of the cost of the cell.
Figure 2.4: Player 1 has to pay $2 to Player 2
Each player starts the game with $100.
There are surprise cells on the board. They randomly give one of three results:
Player gets $10 from the bank
Player gives $5 to the bank
Player skips one turn
A player loses when they run out of money.
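To keep the rules unambiguous, here is a short Python sketch encoding them. The constants and function names are my own; the book defines the rules only in prose:

import random

START_CASH = 100                   # each player starts the game with $100
RENT_RATE = 0.20                   # rent is 20% of the cell's purchase price

def roll_dice():
    return random.randint(1, 6)

def rent(cell_price):
    # Landing on an owned cell: pay 20% of its cost to the owner.
    # For example, rent(10) == 2.0, matching the $2 payment in Figure 2.4.
    return RENT_RATE * cell_price

def surprise():
    # A surprise cell randomly gives one of three results.
    return random.choice([("gain", 10), ("pay", 5), ("skip_turn", 0)])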
Let's take a look at the entire board:
Figure 2.5: Monopoly playing board
Now that we have defined the rules, a more interesting question arises: what strategy should we choose for the game? It would seem that there is a reasonable and straightforward strategy: buy everything you can! Indeed, the more cells a player buys, the more rent they will receive when the other player lands on their cells. But everything is not so simple. Let's take a look at the example in Figure 2.6:
Figure 2.6: To buy or not to buy?
Suppose player 1 has only $40 left and has just landed on a cell that costs $40. Should they buy it? If player 1 buys it, the probability of losing on the next move is extremely high, because player 1 will have no money left and may land on a cell that has already been bought by player 2:
Figure 2.7: Player 1 can lose on the next turn if they buy the cell
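A quick simulation shows how risky the purchase is. The board layout below is invented: assume that 3 of the 6 cells reachable with a single die roll are owned by player 2 (the book's figure does not specify the exact layout):

import random

OWNED_STEPS = {2, 4, 6}            # die rolls that land on player 2's cells

def p_losing(trials=100_000):
    # After spending their last $40 on the cell, player 1 has $0 left,
    # so landing on any owned cell (and owing rent) means losing.
    hits = sum(random.randint(1, 6) in OWNED_STEPS for _ in range(trials))
    return hits / trials

print(f"Probability of losing on the next turn: ~{p_losing():.2f}")   # ~0.50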
As we can see, there is no trivially winning strategy in this game. A more advanced approach