
HW4 Report

109550206 陳品劭 Self link

Part I. Experiment Results


1. taxi.png
2. cartpole.png
3. DQN.png
4. compare.png

Part II. Question Answering


1. Calculate the optimal Q-value of a given state in Taxi-v3 (the state is assigned in google sheet), and compare
with the Q-value you learned (Please screenshot the result of the “check_max_Q” function to show the Q-
value you learned). (4%)
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
Rewards:
- -1 per step unless other reward is triggered.
- +20 delivering passenger.
- -10  executing "pickup" and "drop-off" actions illegally.

The Q-value I learned (shown by check_max_Q) is very close to the optimal max Q-value.
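As a reference for how the optimal Q-value can be computed by hand from the reward structure above, here is a minimal sketch; the discount factor gamma and the optimal number of steps n_steps are placeholder assumptions, not the values for the assigned state.

gamma = 0.9        # assumed discount factor, not necessarily the one used in training
n_steps = 10       # assumed optimal number of steps until the passenger is delivered

# Along the optimal trajectory: -1 for every step except the last, +20 on the delivery step.
optimal_q = sum(-1 * gamma**t for t in range(n_steps - 1)) + 20 * gamma**(n_steps - 1)
print(optimal_q)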

2. Calculate the max Q-value of the initial state in CartPole-v0, and compare with the Q-value you learned.
(Please screenshot the result of the “check_max_Q” function to show the Q-value you learned) (4%)

Because the observations are discretized in Part 2, the Q-value I learned is not very close to the theoretical maximum Q-value, though it is reasonably close to the right range.
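For comparison, a minimal sketch of the theoretical maximum Q-value of the initial state, assuming the pole can be balanced for the full episode; the discount factor here is an assumed placeholder.

gamma = 0.97       # assumed discount factor
max_steps = 200    # CartPole-v0 caps an episode at 200 steps, each giving +1 reward
max_q = sum(gamma**t for t in range(max_steps))   # equals (1 - gamma**200) / (1 - gamma)
print(max_q)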
3. a. Why do we need to discretize the observation in Part 2? (2%)

CartPole's observations are continuous, so a Q-table cannot index them directly; discretizing lets us use a much lighter architecture (tabular Q-learning) to learn, at the cost of a less precise policy.

b. How do you expect the performance will be if we increase “num_bins”? (2%)

Performance should get better, since finer bins approximate the continuous state more accurately.

c. Is there any concern if we increase “num_bins”? (2%)

Yes: the Q-table grows with num_bins raised to the number of observation dimensions, so we need more memory and more training episodes (more resources) to cover the state space.
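A minimal sketch of the kind of discretization used in Part 2; the observation bounds, num_bins, and function names are illustrative assumptions, not the homework's exact code.

import numpy as np

num_bins = 7
# Assumed bounds for CartPole observations: cart position, cart velocity, pole angle, angular velocity.
lower = np.array([-2.4, -3.0, -0.21, -3.0])
upper = np.array([ 2.4,  3.0,  0.21,  3.0])
bins = [np.linspace(lower[i], upper[i], num_bins + 1)[1:-1] for i in range(4)]

def discretize(observation):
    # Map the continuous observation to a tuple of bin indices, usable as a Q-table key.
    # The Q-table then has num_bins**4 * n_actions entries, which is why a larger
    # num_bins needs more memory and more episodes to cover.
    return tuple(int(np.digitize(observation[i], bins[i])) for i in range(4))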

4. Which model (DQN, discretized Q learning) performs better in Cartpole-v0, and what are the reasons? (3%)
DQN performs better.

1. Discretized Q-learning loses information through the discretization itself (see Question 3 above).


2. Deep Q-learning tends to overestimate Q-values, which leads to unstable training and a lower-quality policy. Consider the update target for the Q-value, y = r + gamma * max_a' Q(s', a'): the last part of this target takes the maximum over estimated values, and this procedure results in systematic overestimation, which introduces a maximization bias. In DQN, the Q-values for the target are taken from the target network, which is meant to reflect the state of the main DQN. However, it doesn't have identical weights, because it is only updated after a certain number of episodes. The addition of the target network might slow down training, since the target network is not continuously updated, but it gives a more robust performance over time.
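A minimal sketch of the target computation described above, with assumed tensor shapes and network names; the max over the target network's estimates is the term that introduces the maximization bias.

import torch
import torch.nn as nn

target_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # assumed architecture

def compute_td_target(reward, next_state, done, gamma=0.99):
    # reward, done: (batch,) float tensors; next_state: (batch, 4) float tensor.
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values    # max_a' Q_target(s', a')
    return reward + gamma * next_q * (1.0 - done)             # y = r + gamma * max Q, zero after terminal states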
5. a. What is the purpose of using the epsilon greedy algorithm while choosing an action? (2%)
At the beginning, the agent has little or no knowledge of the environment, so we need some exploration mechanism to let it discover the environment. Here we use the epsilon greedy algorithm: it lets the agent sometimes try actions it has never taken before, so it learns about the environment, while the rest of the time it exploits what it has learned and chooses the action it currently believes is best.
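A minimal epsilon greedy sketch under assumed names (q_values is a 1-D array of action values for the current state); not the homework's exact implementation.

import random
import numpy as np

def epsilon_greedy(q_values, epsilon):
    # With probability epsilon explore a random action; otherwise exploit the current best one.
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore
    return int(np.argmax(q_values))              # exploit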

b. What will happen, if we don’t use the epsilon greedy algorithm in the CartPole-v0 environment? (3%)

The result may look like the one shown above: without exploration the agent cannot gather information about the environment, so it sticks to a few well-known actions and keeps getting the same result.

c. Is it possible to achieve the same performance without the epsilon greedy algorithm in the CartPole-v0
environment? Why or Why not? (3%)

Yes. We only need some way to explore and get to know the environment; the epsilon greedy algorithm is just a simple and efficient way to do it. Other exploration strategies can achieve the same thing; for example, softmax (Boltzmann) exploration may reach the same performance.
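A minimal sketch of softmax (Boltzmann) exploration as one such alternative; the temperature value and names are illustrative assumptions.

import numpy as np

def softmax_action(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / temperature).
    logits = np.asarray(q_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())        # subtract the max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))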

d. Why don’t we need the epsilon greedy algorithm during the testing section? (2%)

Because by the testing stage the agent already has enough information about the environment; we just want to evaluate the learned greedy policy rather than keep exploring.

6. Why is there “with torch.no_grad():“ in the “choose_action” function in DQN? (3%)

Because here we only want to use the network to choose an action, not to update it; torch.no_grad() disables gradient tracking, so no computation graph is built for this forward pass, which saves memory and time.
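A minimal sketch (with an assumed network and names) of why torch.no_grad() appears there: we only run a forward pass to pick an action, so no gradients are needed.

import torch
import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # assumed architecture

def greedy_action(state):
    with torch.no_grad():   # inference only: no gradients are tracked, saving memory and time
        q_values = policy_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())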

7. a. Is it necessary to have two networks when implementing DQN? (1%)

No.

b. What are the advantages of having two networks? (3%)

With two networks, the Q-values for the TD target can be taken from the target network, which is meant to reflect the state of the main DQN. However, its weights are not identical, because it is only updated after a certain number of episodes, so the targets change more slowly. This gives a more stable and robust performance over time.
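A minimal sketch of the periodic synchronization described above; the architecture, update interval, and names are illustrative assumptions.

import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # assumed main network
target_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net.load_state_dict(policy_net.state_dict())    # start from identical weights

TARGET_UPDATE_EVERY = 100   # assumed number of episodes between syncs

def maybe_sync_target(episode):
    # Every TARGET_UPDATE_EVERY episodes, copy the main network's weights into the
    # otherwise frozen target network.
    if episode % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(policy_net.state_dict())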

c. What are the disadvantages? (2%)

The addition of the target network might slow down the training since the target network is not continuously
updated.

8. a. What is a replay buffer(memory)? Is it necessary to implement a replay buffer? What are the advantages
of implementing a replay buffer? (5%)

A replay buffer is a memory that stores past transitions (samples) so the agent can learn from them later.

No, it is not strictly necessary.

Without it, the agent may get a relatively good result in a single step and let that one experience dominate the subsequent actions and updates; storing transitions and re-sampling them through the replay buffer breaks this up and solves the problem.
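A minimal replay-buffer sketch (capacity and names are illustrative assumptions), showing the roles of both the buffer size and the batch size discussed in this question.

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)    # oldest transitions are dropped when full

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch of (roughly) uncorrelated past transitions.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)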

b. Why do we need batch size? (3%)

The batch size determines how many stored samples we draw from the replay memory for each training update; training on a mini-batch instead of a single transition gives a more stable gradient estimate.

c. Is there any effect if we adjust the size of the replay buffer(memory) or batch size? Please list some
advantages and disadvantages. (2%)

A small buffer might force the network to only care about what it saw recently. A large buffer might take a long time to "become refreshed" with good trajectories once they finally start to be discovered. Conversely, each has the opposite advantage: a large buffer provides more diverse, less correlated samples, while a small buffer adapts quickly to recent experience.
9. a. What is the condition that you save your neural network? (1%)

I save the neural network when done is true and the episode's result is a good one among the last 5 training episodes.

b. What are the reasons? (2%)

Saving under this condition keeps training fast.

In addition, I tried using test() as the saving condition to get a better result, but that slows training down, so I don't think it is the best way.
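A minimal sketch of the kind of conditional checkpointing described above; the condition here (the episode's reward is the best among the last 5) is only my reading of the report, and every name is an illustrative assumption.

import torch

def maybe_save(policy_net, episode_reward, recent_rewards, path="dqn_checkpoint.pt"):
    # Keep only the last 5 episode rewards and save when the current episode is the best of them.
    recent_rewards.append(episode_reward)
    del recent_rewards[:-5]
    if episode_reward >= max(recent_rewards):
        torch.save(policy_net.state_dict(), path)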

10. What have you learned in the homework? (2%)

I have learned a lot about deep Q-learning, one approach to reinforcement learning, along with the implementation details of DQN.

In fact, after finishing Part 2 I felt like I knew nothing and had no idea how to approach Part 3. I then had to look up a lot of information, and during Part 3 I felt like I was really learning something. These questions also made me think about many details of DQN, some I had already considered and some I hadn't, so they taught me even more about DQN.
