Exam RL Questions
Part A: Kernel Methods
Part B: Reinforcement Learning
1. Consider the following Markov Decision Process (MDP) with discount factor γ = 0.5.
Upper case letters A, B, C represent states; arcs represent state transitions; lower case
letters ab, ba, bc, ca, cb represent actions; signed integers represent rewards; and fractions
represent transition probabilities.
[Diagram: MDP with states A, B, C; actions ab, ba, bc, ca, cb; rewards +2, −8, −2, +8, +4; and transition probabilities 1/4 and 3/4 shown on the arcs.]
• Define the state-value function V π(s) for a discounted MDP
[1 mark]
• Write down the Bellman expectation equation for state-value functions
[2 marks]
• Consider the uniform random policy π1 (s,a) that takes all actions from state s with
equal probability. Starting with an initial value function of V1 (A) = V1 (B) = V1 (C) =
2, apply one synchronous iteration of iterative policy evaluation (i.e. one backup for
each state) to compute a new value function V2 (s)
[3 marks]
• Consider a deterministic policy π(s). Prove that if a new policy π′ is greedy with
respect to V π then it must be better than or equal to π, i.e. V π′(s) ≥ V π(s) for all s;
and that if V π′(s) = V π(s) for all s then π′ must be an optimal policy.
[5 marks]
• Starting with an initial value function of V1 (A) = V1 (B) = V1 (C) = 2, apply one
synchronous iteration of value iteration (i.e. one backup for each state) to compute
a new value function V2 (s).
[3 marks]
[Total 20 marks]
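A minimal Python sketch of the two backups used in this question, one synchronous sweep of iterative policy evaluation and one of value iteration, assuming the MDP is encoded as a dictionary where mdp[s][a] is a list of (probability, next state, reward) triples and policy[s][a] is the probability of taking action a in state s; the encoding and function names are illustrative, not prescribed by the question.

def policy_evaluation_sweep(mdp, policy, V, gamma):
    # One synchronous Bellman expectation backup per state:
    # V(s) <- sum_a pi(a|s) sum_{s'} p(s'|s,a) [r + gamma V(s')]
    V_new = {}
    for s in mdp:
        total = 0.0
        for a, transitions in mdp[s].items():
            q = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)
            total += policy[s][a] * q
        V_new[s] = total
    return V_new

def value_iteration_sweep(mdp, V, gamma):
    # One synchronous Bellman optimality backup per state:
    # V(s) <- max_a sum_{s'} p(s'|s,a) [r + gamma V(s')]
    V_new = {}
    for s in mdp:
        V_new[s] = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)
            for transitions in mdp[s].values()
        )
    return V_new

Starting either sweep from V1(A) = V1(B) = V1(C) = 2 with γ = 0.5 reproduces the corresponding V2(s) once the transitions from the diagram are entered into the dictionary.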
2. Consider an undiscounted Markov Reward Process with two states A and B. The transition
matrix and reward function are unknown, but you have observed two sample episodes:
A + 3 → A + 2 → B − 4 → A + 4 → B − 3 → terminate
B − 2 → A + 3 → B − 3 → terminate
In the above episodes, sample state transitions and sample rewards are shown at each step,
e.g. A + 3 → A indicates a transition from state A to state A, with a reward of +3.
• Using first-visit Monte-Carlo evaluation, estimate the state-value function V (A),V (B)
[2 marks]
• Using every-visit Monte-Carlo evaluation, estimate the state-value function V (A),V (B)
[2 marks]
• Draw a diagram of the Markov Reward Process that best explains these two episodes
(i.e. the model that maximises the likelihood of the data - although it is not necessary
to prove this fact). Show rewards and transition probabilities on your diagram.
[4 marks]
• Solve the Bellman equation to give the true state-value function V (A),V (B). Hint:
solve the Bellman equations directly, rather than iteratively.
• What value function would batch TD(0) find, i.e. if TD(0) was applied repeatedly
to these two episodes?
[2 marks]
• What value function would batch TD(1) find, using accumulating eligibility traces?
[2 marks]
[Total 20 marks]
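A minimal Python sketch of first-visit and every-visit Monte-Carlo evaluation for an undiscounted MRP, assuming each episode is encoded as a list of (state, reward) pairs where the reward is the one received on leaving that state; the encoding and function name are illustrative, not prescribed by the question.

from collections import defaultdict

def mc_evaluate(episodes, first_visit=True):
    # Average the undiscounted return following each (first or every) visit to a state.
    returns = defaultdict(list)
    for episode in episodes:
        rewards = [r for _, r in episode]
        seen = set()
        for t, (s, _) in enumerate(episode):
            if first_visit and s in seen:
                continue
            seen.add(s)
            returns[s].append(sum(rewards[t:]))   # return = sum of remaining rewards
    return {s: sum(g) / len(g) for s, g in returns.items()}

# The two observed episodes, encoded as above.
episodes = [
    [("A", 3), ("A", 2), ("B", -4), ("A", 4), ("B", -3)],
    [("B", -2), ("A", 3), ("B", -3)],
]
first_visit_V = mc_evaluate(episodes, first_visit=True)
every_visit_V = mc_evaluate(episodes, first_visit=False)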
• Represent the rat’s state s by a vector of two binary features, bell(s) ∈ {0, 1} and
light(s) ∈ {0, 1}. Write down the sequence of feature vectors corresponding to this
episode.
[3 marks]
• Write down the sequence of λ-returns vtλ corresponding to this episode, for λ = 0.5
and b = 2, l = −2
[3 marks]
• Using the forward-view TD(λ) algorithm and your linear function approximator,
what is the sequence of updates to weight b? What is the total update to weight b?
Use λ = 0.5, γ = 1, α = 0.5 and start with b = 2, l = −2
[3 marks]
• Define the TD(λ) accumulating eligibility trace et when using linear value function
approximation
[1 mark]
• Write down the sequence of eligibility traces et corresponding to the bell, using
λ = 0.5, γ = 1
[3 marks]
• Using the backward-view TD(λ) algorithm and your linear function approximator,
what is the sequence of updates to weight b? (Use offline updates, i.e. do not
actually change your weights, just accumulate your updates.) What is the total update
to weight b? Use λ = 0.5, γ = 1, α = 0.5 and start with b = 2, l = −2
[3 marks]
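A minimal Python sketch of offline backward-view TD(λ) with accumulating eligibility traces and a linear value function v(s) = w·x(s), matching the setting of the eligibility-trace parts above; the episode format, a list of (feature vector, reward) pairs, and the function name are illustrative assumptions.

import numpy as np

def td_lambda_offline(episode, w, lam=0.5, gamma=1.0, alpha=0.5):
    # Offline backward-view TD(lambda): the weights w stay fixed for the
    # whole episode and the per-step updates are only accumulated.
    w = np.asarray(w, dtype=float)
    e = np.zeros_like(w)                 # accumulating eligibility trace
    total_update = np.zeros_like(w)
    for t, (x, r) in enumerate(episode):
        x = np.asarray(x, dtype=float)
        # Successor feature vector; the terminal state has value 0.
        x_next = (np.asarray(episode[t + 1][0], dtype=float)
                  if t + 1 < len(episode) else np.zeros_like(w))
        delta = r + gamma * w @ x_next - w @ x    # TD error
        e = gamma * lam * e + x                   # accumulate the trace
        total_update += alpha * delta * e         # offline: accumulate, do not apply
    return total_update

With feature vectors x(s) = [bell(s), light(s)] and initial weights w = [b, l] = [2, −2], the per-step terms alpha * delta * e give the sequence of updates to b and l, and their sum is the total update asked for.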
END OF PAPER