
Homework 4:

Reinforcement Learning
Report
Please keep the title of each section and delete the examples. Note: please keep the questions listed in Part III.

Part I. Implementation (-5 if not explained in detail):

● Please screenshot your code snippets of Part 1 ~ Part 3, and explain your implementation.
part1:
part2:
part3:
Part II. Experiment Results:

Please paste taxi.png, cartpole.png, DQN.png and compare.png here.

1. taxi.png:
2. cartpole.png

3. DQN.png
4. compare.png

Part III. Question Answering (50%):

1. Calculate the optimal Q-value of a given state in Taxi-v3, and compare with the Q-value you learned (please screenshot the result of the “check_max_Q” function to show the Q-value you learned). (10%)

taxi: (2, 2), passenger: Y, destination: R


[west, west, south, south, pick up, north, north, north, north, drop-off]
10 steps and the passenger is delivered, gamma = 0.9. Each of the first 9 steps yields a reward of -1; the final successful drop-off yields +20.
optimal Q = (–1 – 0.9*1 – 0.9^2*1 – … – 0.9^8*1) + 0.9^9 * 20 ≈ 1.62

This matches the max-Q value reported by check_max_Q in taxi.py.
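As a quick check, the discounted return above can be evaluated numerically (a minimal sketch; it assumes the standard Taxi-v3 rewards of -1 per step and +20 for a successful drop-off):

gamma = 0.9
rewards = [-1] * 9 + [20]   # 9 navigation/pick-up steps, then the successful drop-off
optimal_q = sum(gamma ** t * r for t, r in enumerate(rewards))
print(optimal_q)            # about 1.62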

2. Calculate the max Q-value of the initial state in CartPole-v0, and compare with the Q-value you learned. (Please screenshot the result of the “check_max_Q” function to show the Q-value you learned.) (10%)

The number of steps before the episode terminates is large, gamma = 0.97.


optimal Q = 1 + 0.97*1 + 0.97^2*1 + … + 0.97^n*1
= (1 – 0.97^(n+1)) / (1 – 0.97) ≈ 1 / 0.03 ≈ 33.33 for large n
cartpole.py: 29.55; DQN.py: 34.656. The result of DQN is closer to the optimal Q.
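The same geometric series can be evaluated numerically (a minimal sketch; the 200-step cap below is CartPole-v0's default episode limit):

gamma = 0.97
n = 200                                         # CartPole-v0 episodes end after 200 steps
finite_sum = sum(gamma ** t for t in range(n))  # (1 - gamma**n) / (1 - gamma)
limit = 1 / (1 - gamma)                         # value as the horizon grows without bound
print(finite_sum, limit)                        # about 33.26 and 33.33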

3.
a. Why do we need to discretize the observation in Part 2? (3%)
The original observation is continuous, so it cannot be used directly to index a finite Q-table. Discretizing it maps each observation to one of a finite set of states, which makes it easy to build the state list (a short discretization sketch follows this question).

b. How do you expect the performance will be if we increase “num_bins”? (3%)


If num_bins increases, the discretization is finer, so states are represented more precisely and the performance is expected to improve.

c. Is there any concern if we increase “num_bins”? (3%)


Yes. The Q-table grows quickly with num_bins (its size scales with num_bins raised to the number of observation dimensions), so it needs more memory and more training time for the Q-values to converge.
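A minimal sketch of the kind of observation discretization meant in 3(a), assuming numpy; the bounds and num_bins value here are illustrative, not necessarily those used in Part 2:

import numpy as np

num_bins = 7
# Illustrative bounds for CartPole's 4-dimensional observation:
# cart position, cart velocity, pole angle, pole angular velocity.
lower = np.array([-2.4, -3.0, -0.21, -3.0])
upper = np.array([ 2.4,  3.0,  0.21,  3.0])
# num_bins - 1 interior edges per dimension, so indices fall in 0 .. num_bins - 1.
edges = [np.linspace(lower[i], upper[i], num_bins + 1)[1:-1] for i in range(4)]

def discretize(observation):
    # Map a continuous observation to a tuple of bin indices (a discrete state).
    return tuple(int(np.digitize(observation[i], edges[i])) for i in range(4))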

4. Which model (DQN, discretized Q-learning) performs better in CartPole-v0, and what are the reasons? (5%)
DQN performs better.
Because DQN uses a deep neural network as a function approximator, it can handle large and continuous state spaces directly, so it performs better in high-dimensional state and action spaces.
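A minimal sketch of the kind of Q-network DQN uses for CartPole (the layer sizes and names are illustrative, not the assignment's exact architecture):

import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a continuous observation directly to one Q-value per action.
    def __init__(self, state_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)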

5.
a. What is the purpose of using the epsilon greedy algorithm while choosing an action? (3%)
To balance exploration and exploitation when choosing actions (a sketch of epsilon-greedy selection follows this question).

b. What will happen if we don’t use the epsilon greedy algorithm in the CartPole-v0 environment? (3%)
Without exploration, the agent always selects the action with the highest current Q-value estimate, so it may get stuck in a suboptimal policy and fail to balance the pole for long periods of time.

c. Is it possible to achieve the same performance without the epsilon greedy algorithm in the CartPole-v0 environment? Why or why not? (3%)
It is possible if we replace it with another exploration strategy that also balances exploration and exploitation, for example softmax (Boltzmann) exploration or optimistic initial Q-values.
d. Why don’t we need the epsilon greedy algorithm during the testing section? (3%)
Because training is finished, the Q-table is already learned and the agent's policy is fixed; during testing we only want to exploit the learned policy, so no exploration is needed.
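A minimal sketch of epsilon-greedy action selection for the tabular agent in 5(a); the names q_table and epsilon are illustrative:

import random
import numpy as np

def choose_action(q_table, state, epsilon, num_actions):
    # Explore with probability epsilon, otherwise exploit the current Q estimates.
    if random.random() < epsilon:
        return random.randrange(num_actions)   # explore: random action
    return int(np.argmax(q_table[state]))      # exploit: greedy action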

6. What does “with torch.no_grad():” do inside the “choose_action” function in DQN? (4%)
It disables gradient tracking for the forward pass used to pick an action. Action selection is pure inference, so no computation graph needs to be built, which saves memory and time and keeps this forward pass out of backpropagation.
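A minimal sketch of how choose_action typically wraps the forward pass in torch.no_grad(); the network and variable names are illustrative:

import random
import torch

def choose_action(q_network, state, epsilon, num_actions):
    if random.random() < epsilon:
        return random.randrange(num_actions)
    # Inference only: no computation graph is built, saving memory and time.
    with torch.no_grad():
        state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        q_values = q_network(state_tensor)
    return int(q_values.argmax(dim=1).item())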
