RL Paper Deepsk
Time: 2 Hours
(a) Compute vπ(s1) for the policy π that always chooses a1 (γ = 1). (10 Marks)
(b) Find the optimal value function v∗(s1). (10 Marks)
Consider the 2×2 grid world below, where B and D are terminal states:
A   B (Terminal, +10)
C   D (Terminal, −5)
Compute V(A) using the Bellman equation (γ = 0.8, non-terminal rewards = −1, random policy). (10 Marks)
Section C: Policy Iteration and Value Iteration (25 Marks)
5. Perform one iteration of policy evaluation for the given MDP. (10 Marks)
6. Apply value iteration for 2 steps on the MDP in Question 5. (15 Marks)
7. Explain the ε-greedy strategy (ε = 0.1) and compute the exploration probability for 3 arms. (5 Marks)
Complete Solutions
Section A
1. (a) The Bellman equation for vπ(s) is:
\[ v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr] \]
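This backup can be transcribed directly into code. A minimal sketch, assuming hypothetical policy and transition dictionaries (the names and the tiny example MDP are illustrative, not taken from the question paper):

def bellman_backup(s, policy, transitions, V, gamma):
    # Bellman expectation backup for v_pi at state s:
    #   v_pi(s) = sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma * v_pi(s')]
    # policy[s]           : dict action -> probability pi(a|s)
    # transitions[(s, a)] : list of (probability, next_state, reward)
    # V                   : dict of current value estimates
    return sum(
        pi_a * sum(p * (r + gamma * V[s_next])
                   for p, s_next, r in transitions[(s, a)])
        for a, pi_a in policy[s].items()
    )

# Tiny illustrative example: one state, one action, deterministic transition.
V = {"s1": 0.0, "s2": 0.0}
policy = {"s1": {"a1": 1.0}}
transitions = {("s1", "a1"): [(1.0, "s2", 5.0)]}
print(bellman_backup("s1", policy, transitions, V, gamma=1.0))  # 5.0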
Section B
3. (a) For the policy π that always chooses a1:
4. For the grid world under the random policy (γ = 0.8, step reward −1):
Terminal states: V(B) = 10, V(D) = −5. Non-terminal values are initialised to 0, so only the transition to B (chosen with probability 1/4) contributes to the first backup:
\[ V(A) = -1 + 0.8 \times \tfrac{1}{4} \times 10 = -1 + 2 = 1 \]
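A quick numerical check of this backup (a minimal sketch; the assumption that only the move to B yields a non-zero successor value, with the other three actions reaching zero-valued states, follows the solution above):

# One Bellman backup for state A under the uniform random policy (4 actions).
gamma = 0.8
step_reward = -1
successor_values = [10, 0, 0, 0]   # terminal B, plus three zero-valued successors
v_A = step_reward + gamma * sum(0.25 * v for v in successor_values)
print(v_A)  # 1.0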
Section C
5. First iteration of policy evaluation:
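The MDP from Question 5 is not reproduced in this extract, so the numbers cannot be restated here; the sketch below shows one synchronous sweep of iterative policy evaluation on a placeholder two-state MDP (all states, rewards, and transition probabilities are assumptions, not the exam's MDP):

# One synchronous sweep of iterative policy evaluation.
gamma = 0.9
states = ["s1", "s2"]
policy = {"s1": "a1", "s2": "a1"}          # deterministic placeholder policy
# transitions[(s, a)] = list of (probability, next_state, reward)
transitions = {
    ("s1", "a1"): [(1.0, "s2", 0.0)],
    ("s2", "a1"): [(1.0, "s1", 1.0)],
}
V = {s: 0.0 for s in states}               # initial value estimates

V_new = {}
for s in states:
    a = policy[s]
    V_new[s] = sum(p * (r + gamma * V[s_next])
                   for p, s_next, r in transitions[(s, a)])
V = V_new
print(V)  # {'s1': 0.0, 's2': 1.0}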
Section D
7. UCB scores (n_total = 10 + 5 + 20 = 35):
\[ \mathrm{UCB}_1 = 3 + \sqrt{\frac{2 \ln 35}{10}} \approx 3 + 0.84 = 3.84 \]
\[ \mathrm{UCB}_2 = 4 + \sqrt{\frac{2 \ln 35}{5}} \approx 4 + 1.19 = 5.19 \]
\[ \mathrm{UCB}_3 = 2 + \sqrt{\frac{2 \ln 35}{20}} \approx 2 + 0.60 = 2.60 \]
Choose Arm 2 (highest UCB).
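A short numerical check of these scores (a minimal sketch; the empirical means 3, 4, 2 and pull counts 10, 5, 20 are taken from the solution above):

import math

# UCB1 score for each arm: empirical mean + sqrt(2 ln N / n_i)
means = [3, 4, 2]       # empirical mean reward of each arm
counts = [10, 5, 20]    # number of times each arm has been pulled
n_total = sum(counts)   # 35

ucb = [m + math.sqrt(2 * math.log(n_total) / n) for m, n in zip(means, counts)]
print([round(u, 2) for u in ucb])              # [3.84, 5.19, 2.6]
print("choose arm", ucb.index(max(ucb)) + 1)   # choose arm 2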
7. ε-greedy strategy: with probability ε = 0.1 an arm is chosen uniformly at random (exploration), and with probability 1 − ε = 0.9 the arm with the highest estimated value is chosen (exploitation). With 3 arms, the total exploration probability is ε = 0.1, so each individual arm is selected through exploration with probability ε/3 ≈ 0.033.
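A minimal sketch of ε-greedy selection for the 3-arm case (the value estimates below are placeholders, not figures from the paper):

import random

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon pick a uniformly random arm (explore),
    # otherwise pick the arm with the highest estimated value (exploit).
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return q_values.index(max(q_values))

q = [3.0, 4.0, 2.0]        # placeholder value estimates for 3 arms
print(epsilon_greedy(q))   # usually arm index 1; a random arm 10% of the time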
Section E
9. (a) Model-based: uses a learned or given model of the environment (transition and reward functions) to plan and predict outcomes.
Model-free: learns value functions or policies directly from experience, without an explicit model of the environment.
(b) Expected reward with 70% accurate model: