DRL Compre Regular
Work Integrated Learning Programmes Division
First Semester 2023-2024
Comprehensive Test (EC-3 Regular)
Course No.: AIMLCZG512    Course Title: Deep Reinforcement Learning
Nature of Exam: Open Book    Weightage: 40%
No. of Pages = 2    No. of Questions = 5
Duration: 2:30 Hours / 150 Mins    Date of Exam: 06-06-2024 (AN)
Note to Students:
1. Answer all the questions. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
2. Write all the answers neatly on A4 paper, scan and upload them.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q1) [Answer parts and their subparts in the same sequence.]
Imagine you are an investor trying to optimize your trading strategy for four different stocks, labeled A, B, C, and D. Each stock has its own unique potential for profit, which is unknown to you. To maximize your returns over a series of 100 trades, you decide to implement an ε-greedy strategy with ε being 0.1. The actual returns from each stock follow these distributions:
Stock A: 70% chance of +1 return, 30% chance of 0.
Stock B: 50% chance of +2 return, 50% chance of 0.
Stock C: 10% chance of +5 return, 90% chance of 0.
Stock D: Guaranteed return of +0.5.
Given this, answer the following questions:
(a) Show how you model this as a Reinforcement Learning problem. [1 M]
(b) An investor intends to buy 100 times (each time buying one share of one stock). The strategies the investor may choose are: (i) follow ε-greedy for the initial 25 trades and only exploit the information for the next 75 purchases [1.5 M]; (ii) follow ε-greedy for the initial 75 trades and only exploit the information for the next 25 purchases [1.5 M]; (iii) follow ε-greedy for all 100 purchases [1.5 M]. Support the investor with your analysis. Show all the steps, tabulate answers for all the options, and write your conclusion. [1 M]
(c) What are MDP, POMDP and CMDP? Suggest one RL technique that is used to solve problems stated using each of them. It is adequate if you write just one or two lines for each. [1.5 M]
[1 + 1.5 + 1.5 + 1.5 + 1 + 1.5 = 8 M]
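For illustration only (not part of the required answer), the following is a minimal sketch of how the three strategies in part (b) could be simulated, assuming sample-average value estimates with ties broken by list order; the function names and the number of simulation runs are hypothetical choices.

    # Illustrative sketch: epsilon-greedy simulation of the stock/bandit setup in Q1.
    import random

    def draw_return(stock):
        """Sample one return according to the distributions stated in Q1."""
        if stock == "A":
            return 1.0 if random.random() < 0.70 else 0.0
        if stock == "B":
            return 2.0 if random.random() < 0.50 else 0.0
        if stock == "C":
            return 5.0 if random.random() < 0.10 else 0.0
        return 0.5  # Stock D: guaranteed return

    def run_strategy(explore_steps, total_steps=100, eps=0.1):
        """epsilon-greedy for the first explore_steps trades, pure exploitation after."""
        stocks = ["A", "B", "C", "D"]
        q = {s: 0.0 for s in stocks}   # sample-average value estimates
        n = {s: 0 for s in stocks}     # pull counts
        total = 0.0
        for t in range(total_steps):
            if t < explore_steps and random.random() < eps:
                choice = random.choice(stocks)             # explore
            else:
                choice = max(stocks, key=lambda s: q[s])   # exploit current estimate
            r = draw_return(choice)
            n[choice] += 1
            q[choice] += (r - q[choice]) / n[choice]       # incremental mean update
            total += r
        return total

    # Average total return of each strategy in part (b) over many simulated runs.
    for explore_steps in (25, 75, 100):
        avg = sum(run_strategy(explore_steps) for _ in range(2000)) / 2000
        print(f"explore for first {explore_steps:3d} trades: avg total return ~ {avg:.1f}")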
Q2) [Answer parts and their subparts in the same sequence.]
Consider the MDP given below, containing two states A and B with an action Shift that may result in A, B, or a terminal state. The rewards obtained are as indicated along the edges in the figure (X-2, X-3, X-1, X, X+2). Treat the value of X to be 6. The transition probabilities are as given along the edges. Let the discount factor be 0.4.
(a) Evaluate the given deterministic policy, in which Shift always executes the higher-probability action. Improve it up to 1 iteration. Use the Dynamic Programming solution to the MDP. [4 M]
(b) Using value iteration of dynamic programming, determine the values of states A and B. Let the values of A and B be initialized to 1. Show 1 iteration. [4 M]
[4 + 4 = 8 Marks]
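For illustration only, a minimal sketch of one value-iteration sweep for a generic two-state MDP; since the figure is not reproduced here, the transition probabilities and the assignment of the rewards X-2, X-3, X-1, X, X+2 to edges below are placeholders to be replaced with the values read from the figure.

    # Illustrative sketch: one sweep of value iteration for a two-state MDP.
    GAMMA = 0.4
    X = 6

    # transitions[state][action] = list of (probability, next_state, reward);
    # next_state None denotes the terminal state (value 0).
    # NOTE: the numbers below are PLACEHOLDERS, not the values from the figure.
    transitions = {
        "A": {"Shift": [(0.7, "B", X - 2), (0.3, None, X - 3)]},
        "B": {"Shift": [(0.6, "A", X - 1), (0.4, None, X + 2)]},
    }

    V = {"A": 1.0, "B": 1.0}          # initial values as stated in part (b)

    def backup(state, action, V):
        """Expected one-step return: sum over outcomes of p * (r + gamma * V(s'))."""
        return sum(p * (r + GAMMA * (V[s2] if s2 is not None else 0.0))
                   for p, s2, r in transitions[state][action])

    # One iteration of value iteration (max over actions; here only "Shift").
    V_new = {s: max(backup(s, a, V) for a in transitions[s]) for s in transitions}
    print(V_new)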
Q3) [Answer parts and their subparts in the same sequence.]
(a) What are the two most important issues when you have to learn the value function using first-visit Monte Carlo for a deterministic policy? [2 M] Explain. Also, provide possible solutions. [1.5 M]
(b) Explain any 3 of the most significant action selection strategies used in RL and mention how each selection method balances exploration and exploitation. Provide your answer as a table. [3 M]
(c) If we utilize a policy gradient method to address a reinforcement learning problem and find that the policy it provides is not optimal, what could be the possible explanations for this? State the 3 most relevant reasons. [1.5 M]
[2 + 1.5 + 3 + 1.5 = 8 Marks]
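For illustration only, a minimal sketch of three common action-selection strategies of the kind referred to in part (b), written over a vector of action-value estimates q; the function names and parameter values (eps, tau, c) are hypothetical.

    # Illustrative sketch: epsilon-greedy, softmax (Boltzmann) and UCB selection.
    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(q, eps=0.1):
        """With prob. eps pick a random action (explore), else the greedy one (exploit)."""
        if rng.random() < eps:
            return int(rng.integers(len(q)))
        return int(np.argmax(q))

    def softmax_boltzmann(q, tau=1.0):
        """Sample actions in proportion to exp(q/tau); tau controls exploration."""
        prefs = np.exp((q - np.max(q)) / tau)       # subtract max for numerical stability
        return int(rng.choice(len(q), p=prefs / prefs.sum()))

    def ucb(q, counts, t, c=2.0):
        """Pick the action maximising q + c*sqrt(ln t / n); optimism drives exploration."""
        bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-9))
        return int(np.argmax(q + bonus))

    q = np.array([0.7, 1.0, 0.5, 0.5])              # example value estimates
    print(epsilon_greedy(q), softmax_boltzmann(q), ucb(q, np.array([10, 5, 2, 20]), t=37))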
Q4) [Answer parts and their subparts in the same sequence.]
For each of the questions, answer in not more than 4 precise statements. Vague answers will not be accepted.
(a) Why does AlphaGo use a separate policy network and a separate value network? [1 M]
(b) How does MCTS ensure that an action with the highest value is found in real time? If the best action can be selected only by MCTS, why is any prior learning of Q(s,a) required? [2 M]
(c) We have learned that supervised learning, which learns with samples from a given distribution, does not capture the online nature of interactions required for reinforcement learning very well.
(i) Why does AlphaGo use supervised learning to learn the initial policy (and even further)? [1.5 M]
(ii) In what ways are the shortcomings of supervised learning mitigated in AlphaGo? [2 M]
(d) How does DQN handle the challenges referred to in part (c) of this question? [1.5 M]
[1 + 2 + 1.5 + 2 + 1.5 = 8 Marks]
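For illustration only, a minimal sketch of the UCT selection rule used inside MCTS (relevant to part (b)); the node statistics and the exploration constant are hypothetical placeholders.

    # Illustrative sketch: UCT child selection inside MCTS.
    import math

    def uct_select(children):
        """children: list of dicts with visit count N and total value W.
        Pick the child maximising mean value + exploration bonus."""
        total_visits = sum(ch["N"] for ch in children) + 1
        c = 1.4                                     # exploration constant (placeholder)
        def uct(ch):
            mean = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0
            bonus = c * math.sqrt(math.log(total_visits) / (ch["N"] + 1e-9))
            return mean + bonus
        return max(children, key=uct)

    # A learned prior over actions (as in AlphaGo's PUCT variant) weights this bonus
    # by the policy prior, focusing the limited real-time simulation budget on
    # promising moves instead of spreading it uniformly.
    print(uct_select([{"N": 10, "W": 6.0}, {"N": 3, "W": 2.5}, {"N": 0, "W": 0.0}]))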
Q5) [Answer parts and their subparts in the same sequence.]
(a) Consider the following ways of organizing reinforcement learning techniques: (i) Model-Based vs. Model-Free; (ii) Value-Based vs. Policy-Based; (iii) On-Policy vs. Off-Policy. Write a statement or two on each of the points (for both categories) explaining the kind of problems those RL techniques are suited to. Provide your response in a neatly organized table. [3 M]
(b) Consider the following learning scenario. A human expert is presented with two trajectories taken by two drivers on a highway stretch. The human expert marks which of the trajectories is better. The agent learns this expertise (to decide the better trajectory when given two unseen trajectories) by observing the expert's decisions from many such examples. Explain how you precisely model this as an appropriate RL problem [3 M]. Show all the elements of your modeling, making necessary assumptions [2 M]. [Note: Only the most appropriate modeling gets the credit.]
[3 + 3 + 2 = 8 Marks]
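For illustration only, one standard way to treat a scenario like part (b) is preference-based reward learning, where a score function is fit so that the expert-preferred trajectory receives the higher score (a Bradley-Terry / logistic model); the feature extraction, dataset and learning-rate values below are hypothetical placeholders.

    # Illustrative sketch: fitting a trajectory score from pairwise expert preferences.
    import numpy as np

    rng = np.random.default_rng(0)

    def score(w, features):
        """Scalar 'how good is this trajectory' score: a linear model over features."""
        return features @ w

    # Each example: (features of trajectory i, features of trajectory j, expert prefers i?)
    dim = 5
    data = [(rng.normal(size=dim), rng.normal(size=dim), bool(rng.integers(2)))
            for _ in range(200)]                    # placeholder preference dataset

    w = np.zeros(dim)
    lr = 0.1
    for _ in range(500):
        grad = np.zeros(dim)
        for fi, fj, i_preferred in data:
            # P(i preferred over j) = sigmoid(score_i - score_j)
            p_i = 1.0 / (1.0 + np.exp(-(score(w, fi) - score(w, fj))))
            label = 1.0 if i_preferred else 0.0
            grad += (p_i - label) * (fi - fj)       # gradient of the logistic loss
        w -= lr * grad / len(data)

    # The learned score can rank two unseen trajectories, and can also serve as a
    # reward signal for a downstream RL agent.
    print(w)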