Reinforcement Learning: With Open AI, TensorFlow and Keras Using Python, 1st Edition, Abhishek Nandy
Abhishek Nandy
Manisha Biswas
Reinforcement Learning
Abhishek Nandy, Kolkata, West Bengal, India
Manisha Biswas, North 24 Parganas, West Bengal, India
ISBN-13 (pbk): 978-1-4842-3284-2 ISBN-13 (electronic): 978-1-4842-3285-9
https://doi.org/10.1007/978-1-4842-3285-9
Library of Congress Control Number: 2017962867
Copyright © 2018 by Abhishek Nandy and Manisha Biswas
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical
way, and transmission or information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark
symbol with every occurrence of a trademarked name, logo, or image we use the names, logos,
and images only in an editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the
date of publication, neither the authors nor the editors nor the publisher can accept any legal
responsibility for any errors or omissions that may be made. The publisher makes no warranty,
express or implied, with respect to the material contained herein.
Cover image by Freepik (www.freepik.com)
Managing Director: Welmoed Spahr
Editorial Director: Todd Green
Acquisitions Editor: Celestin Suresh John
Development Editor: Matthew Moodie
Technical Reviewer: Avirup Basu
Coordinating Editor: Sanchita Mandal
Copy Editor: Kezia Endsley
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global
Distributed to the book trade worldwide by Springer Science+Business Media New York,
233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505,
e-mail [email protected], or visit www.springeronline.com. Apress Media,
LLC is a California LLC and the sole member (owner) is Springer Science + Business Media
Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail [email protected], or visit
http://www.apress.com/rights-permissions.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook
versions and licenses are also available for most titles. For more information, reference our
Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is
available to readers on GitHub via the book’s product page, located at www.apress.com/
978-1-4842-3284-2. For more detailed information, please visit http://www.apress.com/
source-code.
Printed on acid-free paper
Contents
Chapter 1: Reinforcement Learning Basics
What Is Reinforcement Learning?
Faces of Reinforcement Learning
The Flow of Reinforcement Learning
Different Terms in Reinforcement Learning
Gamma
Lambda
Conclusion
Chapter 2: RL Theory and Algorithms
Theoretical Basis of Reinforcement Learning
Where Reinforcement Learning Is Used
Manufacturing
Inventory Management
Delivery Management
Finance Sector
What Is MDP?
The Markov Property
The Markov Chain
MDPs
SARSA
Temporal Difference Learning
How SARSA Works
Q Learning
What Is Q?
How to Use Q
SARSA Implementation in Python
The Entire Reinforcement Logic in Python
OpenAI Universe
Conclusion
Chapter 4: Applying Python to Reinforcement Learning
Q Learning with Python
The Maze Environment Python File
The RL_Brain Python File
Updating the Function
Conclusion
Chapter 5: Reinforcement Learning with Keras, TensorFlow, and ChainerRL
What Is Keras?
Using Keras for Reinforcement Learning
Using ChainerRL
Installing ChainerRL
Pipeline for Using ChainerRL
Conclusion
Conclusion
Index
vi
About the Authors
About the Technical Reviewer
Acknowledgments
I want to dedicate this book to my mom and dad. Thank you to my teachers and my
co-author, Abhishek Nandy. Thanks also to Abhishek Sur, who mentors me at work
and helps me adapt to new technologies. I would also like to dedicate this book to my
company, InSync Tech-Fin Solutions Ltd., where I started my career and have grown
professionally.
—Manisha Biswas
Introduction
CHAPTER 1
Reinforcement Learning Basics
This chapter is a brief introduction to Reinforcement Learning (RL) and includes some
key concepts associated with it.
In this chapter, we talk about Reinforcement Learning as a core concept and then
define it further. We show a complete flow of how Reinforcement Learning works. We
discuss exactly where Reinforcement Learning fits into artificial intelligence (AI). After
that we define key terms related to Reinforcement Learning. We start with agents and
then touch on environments and then finally talk about the connection between agents
and environments.
In the maze, the central idea is to keep moving. The goal is to clear the maze
and reach the end as quickly as possible.
The following Reinforcement Learning concepts, and the working scenario, are
discussed later in this chapter.
• The agent is the intelligent program
• The environment is the maze
• The state is the place in the maze where the agent is
• The action is the move we take to move to the next state
• The reward is the points associated with reaching a particular
state. It can be positive, negative, or zero
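To make the five terms above concrete, here is a minimal sketch of a maze written as plain Python. The 3x3 layout, the reward values (-1, 0, and 10), and the names MAZE, ACTIONS, and step are all assumptions made up for illustration; they are not taken from the book's own code.

```python
# Hypothetical 3x3 maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = ["S.#",
        "..#",
        "#.G"]

# Actions are the moves available to the agent (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Environment response: return (next_state, reward) for one move."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(MAZE) and 0 <= new_col < len(MAZE[0])
    if not inside or MAZE[new_row][new_col] == "#":
        return state, -1.0                    # hit a wall or edge: negative reward, stay put
    if MAZE[new_row][new_col] == "G":
        return (new_row, new_col), 10.0       # reached the goal: positive reward
    return (new_row, new_col), 0.0            # ordinary move: zero reward


state = (0, 0)  # the state is the agent's place in the maze
for action in ["down", "right", "down", "right"]:
    state, reward = step(state, action)
    print(action, state, reward)
```

Here the environment is the MAZE grid plus the step function, the state is a (row, column) pair, and the reward can be positive, negative, or zero, as in the list above.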
We use the maze example to apply concepts of Reinforcement Learning. We will be
describing the following steps:
The reward predictions are made iteratively: we update the value of each
state in the maze based on the value of the best subsequent state and the immediate reward
obtained. This is called the update rule.
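The update rule can be sketched as follows. This is an illustrative example only: it replaces the maze with a hypothetical five-cell corridor, and the discount factor of 0.9, the reward of 1 for entering the goal, and all function names are assumptions.

```python
GAMMA = 0.9          # discount factor (assumed value)
N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1


def neighbors(s):
    """States reachable from s in one move (left or right, staying in bounds)."""
    return [n for n in (s - 1, s + 1) if 0 <= n < N_STATES]


def reward(next_state):
    """Immediate reward for entering next_state."""
    return 1.0 if next_state == GOAL else 0.0


values = [0.0] * N_STATES
for sweep in range(50):
    new_values = values[:]
    for s in range(N_STATES):
        if s == GOAL:
            continue  # the terminal state keeps a value of 0
        # Update rule: immediate reward plus discounted value of the best next state.
        new_values[s] = max(reward(n) + GAMMA * values[n] for n in neighbors(s))
    values = new_values

print([round(v, 3) for v in values])
```

After enough sweeps the values stop changing, and cells closer to the goal end up with higher values than cells farther away.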
The constant movement of the Reinforcement Learning process is based on
decision-making.
Reinforcement Learning works on a trial-and-error basis because it is very difficult to
predict which action to take when the agent is in a given state. From the maze problem itself, you can
see that in order to get the optimal path for the next move, you have to weigh a lot of factors.
The choice is always made on the basis of state, action, and reward. For the maze, we have to compute
and account for the probability of taking each step.
The maze also does not consider the reward of the previous step; it is specifically
considering the move to the next state. The concept is the same for all Reinforcement
Learning processes.
Here are the steps of this process:
1. We have a problem.
2. We have to apply Reinforcement Learning.
3. We consider applying Reinforcement Learning as a
Reinforcement Learning box.
4. The Reinforcement Learning box contains all essential
components needed for applying the Reinforcement Learning
process.
5. The Reinforcement Learning box contains agents,
environments, rewards, punishments, and actions.
Reinforcement Learning works well with intelligent program agents that receive rewards
and punishments when interacting with an environment.
The interaction happens between the agents and the environments, as shown in
Figure 1-4.
From Figure 1-4, you can see that there is a direct interaction between the agent and
its environment. This interaction is very important because through these exchanges,
the agent adapts to the environment. When a Machine Learning program, robot, or
Reinforcement Learning program starts working, the agents are exposed to known or
unknown environments and the Reinforcement Learning technique allows the agents to
interact and adapt according to the environment’s features.
Accordingly, the agents work and the Reinforcement Learning robot learns. In order
to get to a desired position, we assign rewards and punishments.
Now, the program has to work out the optimal path to get the maximum reward. If
it fails, it takes punishments (that is, it receives negative points). In order to reach a new
position, which is also known as a state, it must perform what we call an action.
To perform an action, we implement a function, also known as a policy. A policy is
therefore a function that maps the current state to the action to take.
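As a small sketch of a policy as a function, the example below picks, for a given state, the action with the highest estimated value. The table Q, its state and action names, and its numbers are hypothetical values chosen for illustration, and a greedy policy is only one possible choice.

```python
# Hypothetical action-value table: Q[state][action] = estimated future reward.
Q = {
    "start":  {"left": 0.1, "right": 0.6},
    "middle": {"left": 0.7, "right": 0.3},
}


def greedy_policy(state):
    """Policy: map a state to the action with the highest estimated value."""
    actions = Q[state]
    return max(actions, key=actions.get)


print(greedy_policy("start"))
print(greedy_policy("middle"))
```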
The interaction happens from one state to another. The exact connection starts
between an agent and the environment. Rewards are handed out on a regular basis.
We take appropriate actions to move from one state to another.
The key points of consideration after going through the details are the following:
• The Reinforcement Learning cycle works in an interconnected
manner.
• There is distinct communication between the agent and the
environment.
• The distinct communication happens with rewards in mind.
• The object or robot moves from one state to another.
• An action is taken to move from one state to another.
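The cycle summarized in these points can be sketched as a loop between an agent and an environment. Everything in the example is an assumption made for illustration: the Corridor environment, the RandomAgent that acts by trial and error, the reward values, and the run_episode helper are not the book's code.

```python
import random


class Corridor:
    """Hypothetical environment: cells 0..4, the agent starts at 0, goal at 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Apply action (-1 or +1); return (new_state, reward, done)."""
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1   # small penalty per move, bonus at the goal
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent that picks actions at random (trial and error)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    """One pass of the cycle: observe the state, act, receive a reward, repeat."""
    state, total_reward, steps = env.state, 0.0, 0
    for _ in range(max_steps):
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return state, total_reward, steps


final_state, total_reward, steps = run_episode(Corridor(), RandomAgent())
print(final_state, round(total_reward, 1), steps)
```

A learning agent would replace RandomAgent.act with a policy that improves from the rewards it receives; the loop itself stays the same.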
An agent is always learning and finally makes a decision. An agent is a learner, which
means there might be different paths. When the agent starts training, it starts to adapt and
intelligently learns from its surroundings.
The agent is also a decision maker because it tries to take an action that will get it the
maximum reward.
When the agent starts interacting with the environment, it can choose an action and
respond accordingly.
From then on, new scenes are created. When the agent changes from one place to
another in an environment, every change results in some kind of modification. These
changes are depicted as scenes. The transition that happens in each step helps the agent
solve the Reinforcement Learning problem more effectively.
Let’s look at another scenario of state transitioning, as shown in Figures 1-8 and 1-9.
At each state transition, the reward can take a different value, so we write the reward
at each step as r0, r1, r2, and so on. Gamma (γ) is called the discount
factor, and it determines how much future rewards count compared with immediate ones:
• A gamma value of 0 means the reward is associated with the
current state only
• A gamma value of 1 means that the reward is long-term
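The effect of the two extreme gamma values can be seen in a short calculation of the discounted sum r0 + γ·r1 + γ²·r2 + ... The reward values below are made up for illustration, and discounted_return is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Sum r0 + gamma*r1 + gamma^2*r2 + ... for a list of rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


rewards = [1.0, 1.0, 1.0, 1.0]          # r0, r1, r2, r3 (made-up values)
print(discounted_return(rewards, 0.0))  # 1.0: only the immediate reward counts
print(discounted_return(rewards, 0.5))  # 1.875: future rewards count, but less
print(discounted_return(rewards, 1.0))  # 4.0: all rewards count equally
```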
Gamma
Gamma is applied at each state transition and keeps the same constant value at every state change.
Gamma indicates the type of reward you are optimizing for in every state.
Generally, its value determines whether we are looking for the immediate reward in
each state only (in which case it’s 0) or for long-term reward values (in
which case it’s 1); values in between trade off the two.
Lambda
Lambda is generally used when we are dealing with temporal difference problems. It is
more involved with predictions in successive states.
Increasing values of lambda in each state shows that our algorithm is learning fast.
The faster algorithm yields better results when using Reinforcement Learning techniques.
As you’ll learn later, temporal differences can be generalized to what we call
TD(Lambda). We discuss it in greater depth later.
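As a preview, the following is a minimal sketch of tabular TD(Lambda) value prediction using accumulating eligibility traces, which is one common formulation and not necessarily the one the book develops later. The five-state random walk, the reward of 1 for leaving the chain on the right, and the values of ALPHA, GAMMA, and LAMBDA are assumptions chosen for illustration.

```python
import random

N_STATES = 5                           # non-terminal states 0..4; stepping off either end terminates
ALPHA, GAMMA, LAMBDA = 0.1, 1.0, 0.8   # step size, discount, trace decay (assumed values)


def td_lambda(episodes=2000, seed=0):
    """Estimate state values for a random walk with TD(lambda)."""
    rng = random.Random(seed)
    values = [0.0] * N_STATES
    for _ in range(episodes):
        traces = [0.0] * N_STATES          # one eligibility trace per state
        state = N_STATES // 2
        while True:
            next_state = state + rng.choice([-1, 1])
            done = next_state < 0 or next_state >= N_STATES
            reward = 1.0 if next_state >= N_STATES else 0.0
            next_value = 0.0 if done else values[next_state]
            # TD error: the difference between successive predictions.
            delta = reward + GAMMA * next_value - values[state]
            traces[state] += 1.0           # mark the current state as eligible
            for s in range(N_STATES):
                values[s] += ALPHA * delta * traces[s]
                traces[s] *= GAMMA * LAMBDA    # credit for older states fades by lambda
            if done:
                break
            state = next_state
    return values


print([round(v, 2) for v in td_lambda()])
```

In this formulation lambda controls how far back along the visited states each prediction error is spread: with LAMBDA set to 0 only the current state is updated, while values near 1 pass the error back to states visited much earlier.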
RL Characteristics
We talk about characteristics next. The characteristics are generally what the agent does
to move to the next state. The agent considers which approach works best to make the
next move.
The two characteristics are
• Trial and error search.
• Delayed reward.
As you probably have gathered, Reinforcement Learning works on three things
combined:
(S, A, R), that is, states, actions, and rewards
Agents
In terms of Reinforcement Learning, agents are the software programs that make
intelligent decisions. Agents should be able to perceive what is happening in the
environment. Here are the basic steps of the agents:
1. When the agent can perceive the environment, it can make
better decisions.
2. The decision the agents take results in an action.
3. The action that the agents perform must be the best (that is, the
optimal) one.
Software agents might be autonomous or they might work together with other agents
or with people. Figure 1-14 shows how the agent works.
RL Environments
The environments in the Reinforcement Learning space consist of certain factors
that determine their impact on the Reinforcement Learning agent. The agent must adapt
accordingly to the environment. These environments can be 2D worlds or grids or even a
3D world.
Here are some important features of environments:
• Deterministic
• Observable
• Discrete or continuous
• Single or multiagent
Deterministic
If we can infer and predict what will happen with a certain scenario in the future, we say
the scenario is deterministic.
RL problems are easier to handle when they are deterministic because we don’t rely on the
decision-making process to change state. The effect of a state transition is immediate
and predictable when we move from one state to another, which makes the Reinforcement
Learning problem easier to solve.
When we are dealing with RL, the state model we get will be either deterministic or
non-deterministic. That means we need to understand the mechanisms behind how
deterministic finite automata (DFA) and non-deterministic finite automata (NDFA) work.
We show a state transition from a start state to a final state with the help of
a diagram. It is a simple depiction in which the state transition occurs on an input value
that is assumed to be either 1 or 0. A self-loop is created when the machine gets a
value and stays in the same state.
The working principle of the state diagram in Figure 1-16 can be explained as
follows. In an NDFA, the issue is that when we transition from one state to another, there is
more than one option available, as we can see in Figure 1-16. From state S0, after getting
an input such as 0, the machine can stay in state S0 or move to state S1. There is decision-making
involved here, so it becomes difficult to know which action to take.
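The difference between the two kinds of automata can be sketched with two small transition tables. The tables are hypothetical and are not a transcription of the book's figures; only the choice at state S0 on input 0 (stay in S0 or move to S1) comes from the description above.

```python
# Deterministic: each (state, input) pair leads to exactly one next state.
DFA = {
    ("S0", 0): "S0",   # self-loop: input 0 keeps the machine in S0
    ("S0", 1): "S1",
    ("S1", 0): "S1",
    ("S1", 1): "S1",
}

# Non-deterministic: a (state, input) pair may lead to several next states.
NDFA = {
    ("S0", 0): {"S0", "S1"},   # on input 0, stay in S0 or move to S1
    ("S0", 1): {"S1"},
    ("S1", 0): {"S1"},
    ("S1", 1): {"S1"},
}


def run_dfa(start, inputs):
    """Follow the single possible path and return the final state."""
    state = start
    for symbol in inputs:
        state = DFA[(state, symbol)]
    return state


def run_ndfa(start, inputs):
    """Track every state the machine could be in after the inputs."""
    states = {start}
    for symbol in inputs:
        states = set().union(*(NDFA[(s, symbol)] for s in states))
    return states


print(run_dfa("S0", [0, 0, 1]))
print(sorted(run_ndfa("S0", [0])))
```

The deterministic run always ends in one known state, while the non-deterministic run returns a set of possible states, which is why a decision is needed at each such step.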
Observable
If we can say that the environment around us is fully observable, we have a perfect
scenario for implementing Reinforcement Learning.
An example of perfect observability is a chess game. An example of partial
observability is a poker game, where some of the cards are unknown to any one player.
Discrete or Continuous
If there is an unbounded range of choices for transitioning to the next state (for example,
a move of any real-valued size), that is a continuous
scenario. When there is a limited number of choices, that’s called a discrete scenario.
Figure 1-18 shows how multiagents work. There is an interaction between two agents
in order to make the decision.
Conclusion
This chapter touched on the basics of Reinforcement Learning and covered some key
concepts. We covered states and environments and what the structure of Reinforcement
Learning looks like.
We also touched on the different kinds of interactions and learned about single-
agent and multiagent solutions.
The next chapter covers algorithms and discusses the building blocks of
Reinforcement Learning.