
Dynamic Programming & Reinforcement Learning

Tutorial session 1
Exercise 1
Compute by hand the discounted (β=3/4) and long-run
average rewards for the following infinite series:
• 1,1,1,1,1,1,1,….
• 2,0,2,0,2,0,2,….
• 100 1’s, then infinitely many 0’s
• 100 0’s, then infinitely many 1’s
• Give a reason why you prefer discounting over
averaging
• Give a sufficient condition on the rewards under
which the discounted series converges
• Give an example of a diverging series
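A small Python sketch for checking the hand computations. It assumes discounting starts at t = 0, i.e. the discounted reward is Σ_{t≥0} β^t r_t; if the lecture discounts from t = 1 instead, every discounted value is simply multiplied by β. The names `SERIES`, `CLOSED_FORM`, `discounted` and `average` are illustrative only.

```python
from fractions import Fraction

BETA = Fraction(3, 4)

# Reward sequences r_t for t = 0, 1, 2, ...
SERIES = {
    "1,1,1,...": lambda t: 1,
    "2,0,2,0,...": lambda t: 2 if t % 2 == 0 else 0,
    "100 ones, then zeros": lambda t: 1 if t < 100 else 0,
    "100 zeros, then ones": lambda t: 0 if t < 100 else 1,
}

# Exact discounted values sum_{t>=0} BETA^t * r_t from geometric-series closed forms.
CLOSED_FORM = {
    "1,1,1,...": 1 / (1 - BETA),
    "2,0,2,0,...": 2 / (1 - BETA**2),          # only even t contribute
    "100 ones, then zeros": (1 - BETA**100) / (1 - BETA),
    "100 zeros, then ones": BETA**100 / (1 - BETA),
}


def discounted(r, beta=0.75, horizon=2000):
    """Truncated discounted sum; the neglected tail is at most beta**horizon * max|r| / (1 - beta)."""
    return sum(beta**t * r(t) for t in range(horizon))


def average(r, n=100_000):
    """Average of the first n rewards; the long-run average is its limit as n -> infinity."""
    return sum(r(t) for t in range(n)) / n


for name, r in SERIES.items():
    print(f"{name:22s} discounted={discounted(r):.6f} "
          f"(exact {float(CLOSED_FORM[name]):.6f})  average~{average(r):.4f}")
```

A finite average converges to the long-run average only at rate 1/n, so for n = 100,000 the last two series show 0.001 and 0.999; their long-run averages are the limits 0 and 1.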
Exercise 2
• Find the shortest path from A to D in the directed
graph below using dynamic programming (by hand)
[Figure: directed graph on nodes A, B, C, D; arc lengths: A–B 3, B–D 13, A–D 10, B–C 1, A–C 5, C–D 5]
• After how many iterations are you certain to have
found the optimal solution? Give 2 termination
conditions
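A sketch of the recursion for Exercise 2, assuming the arcs point A→B, A→C, A→D, B→C, B→D and C→D (the arrowheads of the drawing did not survive, so these directions are an assumption). It repeats the backward-recursion update V(i) = min_j [c(i, j) + V(j)] over all arcs and stops either when a full sweep changes nothing or after |nodes| − 1 sweeps, illustrating two possible termination conditions. `backward_recursion` and `extract_path` are hypothetical helper names.

```python
import math

# ASSUMPTION: arc directions are reconstructed; lengths are as drawn.
ARCS = {
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 10,
    ("B", "C"): 1, ("B", "D"): 13, ("C", "D"): 5,
}
NODES = ["A", "B", "C", "D"]


def backward_recursion(arcs, nodes, target):
    """Repeat V(i) = min_j [c(i, j) + V(j)] over all arcs until no value changes."""
    V = {i: math.inf for i in nodes}
    V[target] = 0
    succ = {}                          # best next node on the way to the target
    sweeps = 0
    for _ in range(len(nodes) - 1):    # stop rule 2: a shortest path has at most |nodes|-1 arcs
        sweeps += 1
        changed = False
        for (i, j), c in arcs.items():
            if c + V[j] < V[i]:
                V[i], succ[i] = c + V[j], j
                changed = True
        if not changed:                # stop rule 1: a full sweep changed no value
            break
    return V, succ, sweeps


def extract_path(succ, source, target):
    """Follow the stored successors from source to target."""
    path = [source]
    while path[-1] != target:
        path.append(succ[path[-1]])
    return path


V, succ, sweeps = backward_recursion(ARCS, NODES, "D")
print(V, extract_path(succ, "A", "D"), sweeps)
```

Because values are updated in place during a sweep, the number of sweeps needed can depend on the order in which arcs are visited; processing the nodes in reverse topological order (D, C, B, A) would finish in a single pass on this acyclic graph.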
Exercise 3
• Look up on the internet how Dijkstra’s
algorithm works and apply it to the
example of Exercise 2
• Construct a simple graph on which Dijkstra’s
algorithm fails but backward recursion
works (hint: you are allowed to use negative
arc lengths)
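A sketch of textbook Dijkstra (a node's label becomes permanent once it is popped from the priority queue), run on the Exercise 2 graph with assumed arc directions A→B, A→C, A→D, B→C, B→D, C→D, next to a Bellman-Ford-style label-correcting reference. The negative-arc graph `NEG` is a hypothetical example of one possible answer to the second question; the names `dijkstra`, `bellman_ford`, `EX2` and `NEG` are illustrative.

```python
import heapq
import math


def dijkstra(arcs, source):
    """Textbook Dijkstra: a node's label becomes permanent when it is popped."""
    adj = {}
    for (i, j), c in arcs.items():
        adj.setdefault(i, []).append((j, c))
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue                   # stale heap entry
        done.add(i)                    # label of i is now permanent
        for j, c in adj.get(i, []):
            if j not in done and d + c < dist.get(j, math.inf):
                dist[j] = d + c
                heapq.heappush(heap, (d + c, j))
    return dist


def bellman_ford(arcs, nodes, source):
    """Label-correcting reference: |nodes|-1 passes over all arcs (assumes no negative cycles)."""
    dist = {v: math.inf for v in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        for (i, j), c in arcs.items():
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
    return dist


# Exercise 2 graph with ASSUMED arc directions.
EX2 = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 10,
       ("B", "C"): 1, ("B", "D"): 13, ("C", "D"): 5}

# HYPOTHETICAL negative-arc graph: the arc C -> B has length -3.
NEG = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 3, ("C", "B"): -3}

print(dijkstra(EX2, "A"))
print(dijkstra(NEG, "A")["D"], bellman_ford(NEG, "ABCD", "A")["D"])
```

On `NEG`, Dijkstra makes B permanent at distance 1 before it reaches C, so it reports 2 for D, while the label-correcting pass finds the true value 1 along A→C→B→D.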
Exercise 4
• Determine by enumeration (i.e., try all
combinations) the shortest path in the first
example (the drawing of the city) of the
transparencies of lecture 1
• Check the correctness by verifying the
backward recursion
• Implement the problem in some suitable
tool or language (R/Excel/…) and solve it
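The lecture 1 city network is not reproduced in this document, so the sketch below uses the Exercise 2 graph (with assumed arc directions) as a hypothetical stand-in; `ARCS` would be replaced by the city data. It enumerates every simple path, which is exponential in general, and then checks that the enumerated optimal values satisfy the backward recursion V(i) = min_j [c(i, j) + V(j)].

```python
import math

# HYPOTHETICAL stand-in data: Exercise 2 graph with assumed arc directions.
ARCS = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 10,
        ("B", "C"): 1, ("B", "D"): 13, ("C", "D"): 5}
TARGET = "D"


def all_paths(node, target, prefix=()):
    """Enumerate every simple path from node to target."""
    prefix = prefix + (node,)
    if node == target:
        yield prefix
        return
    for (i, j) in ARCS:
        if i == node and j not in prefix:
            yield from all_paths(j, target, prefix)


def length(path):
    return sum(ARCS[(a, b)] for a, b in zip(path, path[1:]))


nodes = {n for arc in ARCS for n in arc}
# V[i] = length of the shortest i -> TARGET path, found by enumeration.
V = {i: min((length(p) for p in all_paths(i, TARGET)), default=math.inf) for i in nodes}

# Verify the backward recursion V(i) = min_j [c(i, j) + V(j)] for every i != TARGET.
for i in nodes - {TARGET}:
    rhs = min(c + V[j] for (a, j), c in ARCS.items() if a == i)
    assert V[i] == rhs, (i, V[i], rhs)

best = min(all_paths("A", TARGET), key=length)
print(best, length(best))
```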
Exercise 5
• Consider an inventory problem with 10 items and
immediate replenishments
• Let N be the maximum stock level for each item
• What is the number of states?
• You have a computer with 200 GB of memory available
for computations
• How big can N be such that you can still execute a
backward recursion algorithm?
– Hint: first argue why it suffices to store only two vectors
of length |X| (the size of the state space X) in memory
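A sketch of the memory calculation, assuming each item's stock level ranges over 0, 1, ..., N (so there are (N+1)^10 states), each value is stored as an 8-byte float, and 1 GB = 10^9 bytes. Only two vectors are counted because the recursion computes V_t from V_{t+1} alone. `max_stock_level` is a hypothetical helper.

```python
def max_stock_level(items=10, memory_bytes=200 * 10**9, bytes_per_value=8, vectors=2):
    """Largest N such that `vectors` arrays with (N+1)**items entries fit in memory."""
    n = 0
    # Increase n while the next level, n + 1, still fits.
    while vectors * bytes_per_value * (n + 2) ** items <= memory_bytes:
        n += 1
    return n


N = max_stock_level()
print(N, (N + 1) ** 10)   # maximum stock level and the corresponding number of states
```

The answer is sensitive to these assumptions: with binary gigabytes and 4-byte floats, `max_stock_level(memory_bytes=200 * 2**30, bytes_per_value=4)` allows N = 10.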
Exercise 6
• Estimate the total number of positions of
chess pieces on the chess board (feasible
& infeasible)
• Estimate the total number of feasible
positions
• Compare your estimation with information
you find on the web
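One possible way to get rough counts, not necessarily the intended one. The first number lets every square be empty or hold one of the 12 piece types (king, queen, rook, bishop, knight, pawn in two colours), ignoring all rules. The second counts only arrangements of exactly the 32 starting pieces, still ignoring legality (adjacent kings, pawns on the first or last rank, side to move) and positions with captured or promoted pieces.

```python
from math import factorial

SQUARES = 64
PIECE_TYPES = 12  # K, Q, R, B, N, P for white and for black

# Crude upper bound: each square is empty or holds one of 12 piece types.
all_placements = (PIECE_TYPES + 1) ** SQUARES

# Arrangements of exactly the 32 starting pieces (multinomial coefficient):
# 32! for the empty squares, 8! for each colour's pawns,
# 2! for each pair of rooks, knights and bishops (6 pairs in total).
full_material = factorial(64) // (factorial(32) * factorial(8) ** 2 * factorial(2) ** 6)

print(f"all placements ~ 10^{len(str(all_placements)) - 1}")
print(f"full material  ~ 10^{len(str(full_material)) - 1}")
```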
Exercise 7
• Consider a knapsack problem with capacity W=10,
T=4 items, weights w=(5,4,3,3) and values v=(3,2,2,2)
• Solve the problem by:
a) dynamic programming
b) a decision tree in which you consider all
possibilities
c) What is the complexity (≈ number of
calculations) of both methods as a function of
W and T?
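A sketch of both methods, assuming W is the knapsack capacity, item t has weight w_t and value v_t, and V_t(x) is the best value obtainable from items t, ..., T with remaining capacity x (so V_{T+1} ≡ 0; the code uses 0-based indices). The helper names are hypothetical.

```python
from itertools import product

W, T = 10, 4
w = (5, 4, 3, 3)
v = (3, 2, 2, 2)


def knapsack_dp(W, w, v):
    """Backward recursion: V[t][x] = best value from items t..T-1 with capacity x."""
    T = len(w)
    V = [[0] * (W + 1) for _ in range(T + 1)]      # V[T][x] = 0: no items left
    for t in range(T - 1, -1, -1):
        for x in range(W + 1):
            V[t][x] = V[t + 1][x]                  # do not take item t
            if w[t] <= x:                          # take item t if it fits
                V[t][x] = max(V[t][x], v[t] + V[t + 1][x - w[t]])
    return V


def chosen_items(V, W, w):
    """Recover an optimal selection (1-based item numbers) from the table."""
    x, items = W, []
    for t in range(len(w)):
        if V[t][x] != V[t + 1][x]:                 # value differs, so item t was taken
            items.append(t + 1)
            x -= w[t]
    return items


def knapsack_enumerate(W, w, v):
    """Decision tree: try all 2**T take/skip combinations."""
    best = 0
    for choice in product((0, 1), repeat=len(w)):
        if sum(c * wi for c, wi in zip(choice, w)) <= W:
            best = max(best, sum(c * vi for c, vi in zip(choice, v)))
    return best


V = knapsack_dp(W, w, v)
print(V[0][W], knapsack_enumerate(W, w, v), chosen_items(V, W, w))
```

The table has (T+1)(W+1) entries, each computed in constant time, so the dynamic program costs O(T·W) operations, whereas the decision tree visits all 2^T subsets at O(T) work each.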
Exercise 8
• Prove the following property of knapsack
problems: $V_t(x) \le V_t(y)$ for all $t$ and all $x \le y$
• Hint: use induction on t starting from T
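A numerical sanity check of the property, not a proof. It builds V_t(x) with the standard knapsack recursion V_t(x) = max{V_{t+1}(x), v_t + V_{t+1}(x − w_t)} (an assumed formulation, with V_{T+1} ≡ 0 and the second term only when w_t ≤ x) and checks V_t(x) ≤ V_t(x+1), which suffices by transitivity, on random instances. The function names are hypothetical.

```python
import random


def knapsack_values(W, w, v):
    """Table V[t][x] from the backward recursion (0-based t, V[T] = 0)."""
    T = len(w)
    V = [[0] * (W + 1) for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for x in range(W + 1):
            V[t][x] = V[t + 1][x]
            if w[t] <= x:
                V[t][x] = max(V[t][x], v[t] + V[t + 1][x - w[t]])
    return V


def is_monotone(V):
    """True if every row satisfies V[t][x] <= V[t][x + 1]."""
    return all(row[x] <= row[x + 1] for row in V for x in range(len(row) - 1))


random.seed(0)
for _ in range(1000):
    T = random.randint(1, 6)
    w = [random.randint(1, 8) for _ in range(T)]
    v = [random.randint(0, 9) for _ in range(T)]
    assert is_monotone(knapsack_values(random.randint(0, 20), w, v))
```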
Exercise 9
• A variant of the Ludo board game (Dutch:
Mens-erger-je-niet) has the following rules.
A token advances by the roll of a die;
when a player rolls a 6, he/she can roll again,
until the outcome is less than 6. What is
the expected number of squares that the
token advances in a full turn?

[Figure: German board (Wikipedia)]
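A sketch that computes the expectation exactly by conditioning on the first roll and cross-checks it by simulation. It assumes a fair die and no limit on the number of consecutive 6s; any extra rules about repeated 6s are not modelled. `simulate_turn` is a hypothetical helper.

```python
import random
from fractions import Fraction

# First-step analysis: E = (1+2+3+4+5)/6 + (1/6) * (6 + E)  =>  E = (15/6 + 1) / (5/6)
E = (Fraction(15, 6) + Fraction(1, 6) * 6) / (1 - Fraction(1, 6))


def simulate_turn(rng):
    """One full turn: keep rolling while the die shows 6."""
    total = 0
    while True:
        roll = rng.randint(1, 6)
        total += roll
        if roll < 6:
            return total


rng = random.Random(1)
n = 200_000
estimate = sum(simulate_turn(rng) for _ in range(n)) / n
print(E, float(E), estimate)
```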


Exercise 10
• Two common examples of discrete and continuous
distributions are the Poisson and exponential
distributions:
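$N \sim \mathrm{Poisson}(\lambda) \iff P(N = n) = \frac{\lambda^n}{n!} e^{-\lambda}, \quad n = 0, 1, 2, \dots$
$X \sim \exp(\mu) \iff P(X \le t) = 1 - e^{-\mu t}, \quad t \ge 0$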
• What are their expectations E[N] and E[X]?
(Hint: for the exponential distribution, first derive the density
and then use integration by parts; or use the formula for the
expectation in terms of the tail 1 − F(t), where F(t) = P(X ≤ t))
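A numerical sketch for checking the hand derivations: it sums n·P(N = n) for the Poisson distribution and integrates the tail P(X > t) = e^{−μt} with the trapezoidal rule, using the identity E[X] = ∫_0^∞ P(X > t) dt for non-negative X. The parameter values λ = 3 and μ = 2 and the truncation limits are arbitrary choices; the printed values can be compared with the closed-form answers.

```python
import math


def poisson_mean(lam, n_max=200):
    """Truncated sum of n * P(N = n), building the pmf recursively to avoid huge factorials."""
    p = math.exp(-lam)                # P(N = 0)
    total = 0.0
    for n in range(1, n_max + 1):
        p *= lam / n                  # P(N = n) = P(N = n - 1) * lam / n
        total += n * p
    return total


def exponential_mean_via_tail(mu, steps=200_000):
    """Trapezoidal approximation of the integral of P(X > t) over [0, 50/mu]."""
    t_max = 50 / mu                   # the neglected tail integral is exp(-50) / mu

    def tail(t):
        return math.exp(-mu * t)      # 1 - F(t)

    h = t_max / steps
    inner = sum(tail(k * h) for k in range(1, steps))
    return h * (0.5 * (tail(0.0) + tail(t_max)) + inner)


print(poisson_mean(3.0), exponential_mean_via_tail(2.0))
```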
