The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations With Python, 1st Edition. Michael Hu
Michael Hu
© Michael Hu 2023
Apress Standard
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Source Code
You can download the source code used in this book from github.com/apress/art-of-reinforcement-learning.
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub (https://github.com/Apress). For more detailed information, please visit https://www.apress.com/gp/services/source-code.
Contents
Part I Foundation
1 Introduction
1.1 AI Breakthrough in Games
1.2 What Is Reinforcement Learning
1.3 Agent-Environment in Reinforcement Learning
1.4 Examples of Reinforcement Learning
1.5 Common Terms in Reinforcement Learning
1.6 Why Study Reinforcement Learning
1.7 The Challenges in Reinforcement Learning
1.8 Summary
References
2 Markov Decision Processes
2.1 Overview of MDP
2.2 Model Reinforcement Learning Problem Using MDP
2.3 Markov Process or Markov Chain
2.4 Markov Reward Process
2.5 Markov Decision Process
2.6 Alternative Bellman Equations for Value Functions
2.7 Optimal Policy and Optimal Value Functions
2.8 Summary
References
3 Dynamic Programming
3.1 Use DP to Solve MRP Problem
3.2 Policy Evaluation
3.3 Policy Improvement
3.4 Policy Iteration
3.5 General Policy Iteration
3.6 Value Iteration
3.7 Summary
References
4 Monte Carlo Methods
4.1 Monte Carlo Policy Evaluation
4.2 Incremental Update
4.3 Exploration vs. Exploitation
4.4 Monte Carlo Control (Policy Improvement)
4.5 Summary
References
5 Temporal Difference Learning
5.1 Temporal Difference Learning
5.2 Temporal Difference Policy Evaluation
5.3 Simplified 𝜖-Greedy Policy for Exploration
5.4 TD Control—SARSA
5.5 On-Policy vs. Off-Policy
5.6 Q-Learning
5.7 Double Q-Learning
5.8 N-Step Bootstrapping
5.9 Summary
References
Part II Value Function Approximation
6 Linear Value Function Approximation
6.1 The Challenge of Large-Scale MDPs
6.2 Value Function Approximation
6.3 Stochastic Gradient Descent
6.4 Linear Value Function Approximation
6.5 Summary
References
7 Nonlinear Value Function Approximation
7.1 Neural Networks
7.2 Training Neural Networks
7.3 Policy Evaluation with Neural Networks
7.4 Naive Deep Q-Learning
7.5 Deep Q-Learning with Experience Replay and Target Network
7.6 DQN for Atari Games
7.7 Summary
References
8 Improvements to DQN
8.1 DQN with Double Q-Learning
8.2 Prioritized Experience Replay
8.3 Advantage Function and Dueling Network Architecture
8.4 Summary
References
Part III Policy Approximation
9 Policy Gradient Methods
9.1 Policy-Based Methods
9.2 Policy Gradient
9.3 REINFORCE
9.4 REINFORCE with Baseline
9.5 Actor-Critic
9.6 Using Entropy to Encourage Exploration
9.7 Summary
References
10 Problems with Continuous Action Space
10.1 The Challenges of Problems with Continuous Action Space
10.2 MuJoCo Environments
10.3 Policy Gradient for Problems with Continuous Action Space
10.4 Summary
References
11 Advanced Policy Gradient Methods
11.1 Problems with the Standard Policy Gradient Methods
11.2 Policy Performance Bounds
11.3 Proximal Policy Optimization
11.4 Summary
References
Part IV Advanced Topics
12 Distributed Reinforcement Learning
12.1 Why Use Distributed Reinforcement Learning
12.2 General Distributed Reinforcement Learning Architecture
12.3 Data Parallelism for Distributed Reinforcement Learning
12.4 Summary
References
13 Curiosity-Driven Exploration
13.1 Hard-to-Explore Problems vs. Sparse Reward Problems
13.2 Curiosity-Driven Exploration
13.3 Random Network Distillation
13.4 Summary
References
14 Planning with a Model: AlphaZero
14.1 Why We Need to Plan in Reinforcement Learning
14.2 Monte Carlo Tree Search
14.3 AlphaZero
14.4 Training AlphaZero on a 9 × 9 Go Board
14.5 Training AlphaZero on a 13 × 13 Gomoku Board
14.6 Summary
References
Index
About the Author
Michael Hu
is a software engineer with more than a decade of experience specializing in the design and implementation of enterprise-level applications. His current focus is on applying machine learning (ML) and artificial intelligence (AI) to improve operational systems within enterprises. A dedicated coder with a deep interest in mathematics, he continuously explores cutting-edge technologies, particularly machine learning and deep learning, and his chief passion is deep reinforcement learning. He has built numerous open source projects on GitHub that closely emulate state-of-the-art reinforcement learning algorithms pioneered by DeepMind, including AlphaZero, MuZero, and Agent57, advancing the field and sharing his knowledge with fellow enthusiasts. He currently resides in Shanghai, China.
About the Technical Reviewer
Shovon Sengupta
has over 14 years of expertise and a deep understanding of advanced predictive analytics, machine learning, deep learning, and reinforcement learning. He has established himself by creating innovative, award-winning financial solutions. He currently works as the Principal Data Scientist at the AI Center of Excellence of one of the leading multinational financial services corporations in the United States, where he leads initiatives that apply artificial intelligence to challenging business problems. He holds a US patent (Sengupta et al.: Automated Predictive Call Routing Using Reinforcement Learning [US 10,356,244 B1]) and is a Ph.D. scholar at BITS Pilani. He has reviewed several popular titles from leading publishers such as Packt and Apress and has authored courses for Packt and CodeRed (EC-Council) in the realm of machine learning. He has also presented at various international conferences on machine learning, time series forecasting, and building trustworthy AI. His primary research concentrates on deep reinforcement learning, deep learning, natural language processing (NLP), knowledge graphs, causality analysis, and time series analysis. For more details about Shovon's work, please see his LinkedIn page: www.linkedin.com/in/shovon-sengupta-272aa917.
Part I
Foundation
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2023
M. Hu, The Art of Reinforcement Learning
https://doi.org/10.1007/978-1-4842-9606-6_1
1. Introduction
Michael Hu (1)
(1) Shanghai, Shanghai, China
Fig. 1.1 A DQN agent learning to play Atari's Breakout. The goal of the game is to use a paddle to bounce a ball upward and break through a wall of bricks. The agent takes in only the raw pixels from the screen, and it has to figure out the right action to take in order to maximize the score. Idea adapted from Mnih et al. [1]. Game owned by Atari Interactive, Inc.
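To make this interaction concrete, the following is a minimal sketch of the agent-environment loop in Python, assuming the gymnasium package with Atari support as the environment interface; a random policy stands in for the trained DQN, which would instead select the action with the highest predicted Q-value for each screen observation.

import gymnasium as gym

# Create the Breakout environment; observations are raw RGB screen pixels.
# Assumes: pip install "gymnasium[atari,accept-rom-license]"
env = gym.make("ALE/Breakout-v5")
observation, info = env.reset(seed=42)

episode_return = 0.0
done = False
while not done:
    # A trained DQN would map the pixel observation to the action with the
    # highest predicted Q-value; a random action stands in for it here.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"Episode return: {episode_return}")
env.close()

Replacing the random choice with a greedy readout of a learned Q-network is precisely what the DQN training methods covered in later chapters provide.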
Go
Go is an ancient Chinese strategy board game played by two players, who take turns placing stones on a 19 × 19 board with the goal of surrounding more territory than the opponent. Each player has a set of black or white stones, and the game begins with an empty board.