Infinite Horizon Problems

The document discusses infinite horizon Markov Decision Processes (MDPs) and the application of dynamic programming to solve them, highlighting concepts such as value iteration and policy iteration. It uses the example of a warehouse managing inventory to illustrate the principles of MDPs, including state and action spaces, transition probabilities, and reward functions. The document emphasizes the importance of optimal policies and value functions in decision-making processes over an infinite time horizon.


2022-09-15

Infinite horizon problems


tl;dr: Dynamic programming still works

Cathy Wu
6.7950 Reinforcement Learning: Foundations and Methods

References

1. With many slides adapted from Alessandro Lazaric and Matteo Pirotta.
2. Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Volume 2. 4th Edition. (2012). Chapters 1-2: Discounted Problems.
3. R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, N.J., 1957.

Outline

1. Infinite horizon Markov Decision Processes

2. Value iteration

3. Policy iteration

Outline

1. Infinite horizon Markov Decision Processes

a. Discounted, stochastic shortest path, average cost
b. Policies
c. Dynamic programming algorithm?

2. Value iteration

3. Policy iteration

Example: The Amazing Goods Company

§ Description. At each month t, a warehouse contains s_t items of a specific good, and the demand for that good is D (stochastic). At the end of each month the manager of the warehouse can order a_t more items from the supplier.

§ The cost of maintaining an inventory of s items is h(s).
§ The cost to order a items is C(a).
§ The income for selling q items is f(q).
§ If the demand d ~ D is bigger than the available inventory s, customers that cannot be served leave.
§ The value of the remaining inventory at the end of the year is g(s).
§ Constraint: the store has a maximum capacity C.
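
As a concrete illustration, here is a minimal sketch of this inventory problem written as a simulator; the capacity, demand distribution, and the cost/income functions below are placeholder assumptions for illustration, not values from the slides.

```python
import numpy as np

# Illustrative inventory MDP sketch (all numbers are placeholder assumptions).
CAPACITY = 20                                     # maximum number of items in the store
DEMAND_PROBS = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}   # stochastic demand D

def holding_cost(s):      # h(s): cost of maintaining an inventory of s items
    return 0.1 * s

def order_cost(a):        # C(a): cost of ordering a items
    return 2.0 + 0.5 * a if a > 0 else 0.0

def income(q):            # f(q): income for selling q items
    return 1.5 * q

def step(s, a, rng):
    """One month: order a items, observe demand d ~ D, return (next state, reward)."""
    a = min(a, CAPACITY - s)              # respect the maximum capacity
    stock = s + a
    d = rng.choice(list(DEMAND_PROBS), p=list(DEMAND_PROBS.values()))
    sold = min(d, stock)                  # unserved customers leave (demand is lost)
    reward = income(sold) - order_cost(a) - holding_cost(s)
    return stock - sold, reward

rng = np.random.default_rng(0)
print(step(5, 3, rng))                    # sample one transition from state s = 5
```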

Markov Decision Process

Definition (Markov decision process)
A Markov decision process (MDP) is defined as a tuple M = (S, A, P, r, γ) where
§ S is the state space (often simplified to finite),
§ A is the action space,
§ P(s'|s, a) is the transition probability, with
  P(s'|s, a) = ℙ(s_{t+1} = s' | s_t = s, a_t = a),
§ r(s, a, s') is the immediate reward at state s upon taking action a (sometimes simply r(s)),
§ γ ∈ [0, 1) is the discount factor.

Example: The Amazing Goods Company


§ Discount: 𝛾 = 0.95. A dollar today is worth more than a dollar tomorrow.

Markov Decision Process

Definition (Markov decision process)
A Markov decision process (MDP) is defined as a tuple M = (S, A, P, r, γ) where
§ S is the state space (often simplified to finite),
§ A is the action space,
§ P(s'|s, a) is the transition probability, with
  P(s'|s, a) = ℙ(s_{t+1} = s' | s_t = s, a_t = a),
§ r(s, a, s') is the immediate reward at state s upon taking action a (sometimes simply r(s)),
§ γ ∈ [0, 1) is the discount factor.

Example: The Amazing Goods Company


§ Objective: V(s_0; a_0, ...) = Σ_{t=0}^{∞} γ^t r_t. This corresponds to the cumulative reward, including the value of the remaining inventory at "the end."
§ The "horizon" of the problem is 12 (12 months in 1 year), i.e. r_12 = g(s_12) and r_t = 0 for t > 12.

Markov Decision Process

Definition (Markov decision process)
A Markov decision process (MDP) is defined as a tuple M = (S, A, P, r, γ) where
§ S is the state space,
§ A is the action space,
§ P(s'|s, a) is the transition probability, with
  P(s'|s, a) = ℙ(s_{t+1} = s' | s_t = s, a_t = a),
§ r(s, a, s') is the immediate reward at state s upon taking action a,
§ γ ∈ [0, 1) is the discount factor.

☞ Two missing ingredients:

§ How actions are selected → policy.
§ What determines which actions (and states) are good → value function.

Policy

Definition (Policy)
A decision rule d can be
§ Deterministic: d: S → A,
§ Stochastic: d: S → Δ(A),
§ History-dependent: d: H_t → A,
§ Markov: d: S → Δ(A).
A policy (strategy, plan) can be
§ Stationary: π = (d, d, d, ...),
§ (More generally) Non-stationary: π = (d_0, d_1, d_2, ...).

☞ For simplicity, we will typically write π instead of d for stationary policies, and π_t instead of d_t for non-stationary policies.

Recall: The Amazing Goods Company Example

§ Description. At each month t, a warehouse contains s_t items of a specific good, and the demand for that good is D (stochastic). At the end of each month the manager of the warehouse can order a_t more items from the supplier.
§ The cost of maintaining an inventory of s items is h(s).
§ The cost to order a items is C(a).
§ The income for selling q items is f(q).
§ If the demand d ~ D is bigger than the available inventory s, customers that cannot be served leave.
§ The value of the remaining inventory at the end of the year is g(s).
§ Constraint: the store has a maximum capacity C.

Recall: The Amazing Goods Company Example

§ Description. At each month t, a warehouse contains s_t items of a specific good, and the demand for that good is D (stochastic). At the end of each month the manager of the warehouse can order a_t more items from the supplier.
§ The cost of maintaining an inventory of s items is h(s).
§ The cost to order a items is C(a).
§ The income for selling q items is f(q).
§ If the demand d ~ D is bigger than the available inventory s, customers that cannot be served leave.
§ The value of the remaining inventory at the end of the year is g(s).
§ Constraint: the store has a maximum capacity C.

Stationary policy composed of deterministic Markov decision rules:
  π(s) = C − s   if s < M/4
  π(s) = 0       otherwise

Recall: The Amazing Goods Company Example

§ Description. At each month t, a warehouse contains s_t items of a specific good, and the demand for that good is D (stochastic). At the end of each month the manager of the warehouse can order a_t more items from the supplier.
§ The cost of maintaining an inventory of s items is h(s).
§ The cost to order a items is C(a).
§ The income for selling q items is f(q).
§ If the demand d ~ D is bigger than the available inventory s, customers that cannot be served leave.
§ The value of the remaining inventory at the end of the year is g(s).
§ Constraint: the store has a maximum capacity C.

Stationary policy composed of stochastic history-dependent decision rules:
  π(s_t) = U(C − s_{t−1}, C − s_{t−1} + 10)   if s_t < s_{t−1}/2
  π(s_t) = 0                                  otherwise

Optimization Problem
§ Our goal: solve the MDP

Definition (Optimal policy and optimal value function)
The solution to an MDP is an optimal policy π* satisfying

  π* ∈ arg max_{π ∈ Π} V^π

where Π is some policy set of interest.

The corresponding value function is the optimal value function

  V* = V^{π*}

State Value Function
§ Given a policy π = (d_0, d_1, ...) (deterministic to simplify notation)
§ Infinite time horizon with discount: the problem never terminates, but rewards that are closer in time receive a higher importance.

  V^π(s) = E[ Σ_{t=0}^{∞} γ^t r(s_t, π_t(s_t)) | s_0 = s; π ]

with discount factor 0 ≤ γ < 1:
§ Small γ emphasizes short-term rewards; γ close to 1 emphasizes long-term rewards.
§ For any γ ∈ [0, 1) the series always converges (for bounded rewards).

§ Used when: there is uncertainty about the deadline and/or an intrinsic definition of discount.
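
To make the objective concrete, here is a minimal sketch (with an illustrative reward stream and discount factors, not values from the slides) that computes a truncated discounted return Σ_t γ^t r_t:

```python
def discounted_return(rewards, gamma=0.95):
    """Compute sum_t gamma^t * r_t for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# The same reward stream is worth less the more heavily we discount.
rewards = [1.0] * 50                      # placeholder rewards
print(discounted_return(rewards, 0.95))   # ~ (1 - 0.95**50) / 0.05 ≈ 18.4
print(discounted_return(rewards, 0.5))    # ≈ 2.0
```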

State Value Function
§ Given a policy π = (d_0, d_1, ...) (deterministic to simplify notation)
§ Finite time horizon T: deadline at time T, the agent focuses on the sum of the rewards up to T.

  V^π(t, s) = E[ Σ_{τ=t}^{T−1} r(s_τ, π_τ(s_τ)) + R(s_T) | s_t = s; π = (π_t, ..., π_T) ]

where R is a value function for the final state.

§ Used when: there is an intrinsic deadline to meet.

State Value Function
§ Given a policy π = (d_0, d_1, ...) (deterministic to simplify notation)
§ Stochastic shortest path: the problem never terminates, but the agent will eventually reach a termination state.

  V^π(s) = E[ Σ_{t=0}^{T} r(s_t, π_t(s_t)) | s_0 = s; π ]

where T is the first (random) time when the termination state is reached.

§ Used when: there is a specific goal condition.


State Value Function
§ Given a policy π = (d_0, d_1, ...) (deterministic to simplify notation)
§ Infinite time horizon with average reward: the problem never terminates, but the agent only focuses on the (expected) average of the rewards.

  V^π(s) = lim_{T→∞} (1/T) E[ Σ_{t=0}^{T−1} r(s_t, π_t(s_t)) | s_0 = s; π ]

§ Used when: the system should be constantly controlled over time.

Notice
From now on we mostly work in the
discounted infinite horizon setting (except Lecture 5).

Most results (not always so smoothly) extend to other settings.

DP applies to infinite horizon problems, too!
§ Finite horizon stochastic and Markov problems (e.g. driving, robotics, games)

  V*_T(s_T) = r_T(s_T)   for all s_T   [terminal reward]

  V*_t(s_t) = max_{a_t ∈ A} [ r_t(s_t, a_t) + E_{s_{t+1} ~ P(·|s_t, a_t)} V*_{t+1}(s_{t+1}) ]
  for all s_t, and t = T − 1, ..., 0

§ From finite to (discounted) infinite horizon problems?

§ Infinite horizon stochastic problems (e.g. package delivery over months or years, long-term customer satisfaction, control of autonomous vehicles)

  V*(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V*(s') ]   for all s

Really?
§ Infinite horizon stochastic problems (e.g. package delivery over months or years, long-term customer satisfaction, control of autonomous vehicles)

  V*(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V*(s') ]   for all s

☞ This is called the optimal Bellman equation.

§ An optimal policy is such that:

  π*(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V*(s') ]   for all s

§ Discuss: Any difficulties with this new algorithm?

Outline

1. Infinite horizon Markov Decision Processes

2. Value iteration
a. Bellman operators, Optimal Bellman equation, and properties
b. Convergence
c. Numerical example

3. Policy iteration

Value iteration algorithm
1. Let V_0(s) be any function V_0: S → ℝ. [Note: not stage 0, but iteration 0.]
2. Apply the principle of optimality so that, given V_i at iteration i, we compute
   V_{i+1}(s) = 𝒯V_i(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s
3. Terminate when V_i stops improving, e.g. when max_s |V_{i+1}(s) − V_i(s)| is small.
4. Return the greedy policy: π_K(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_K(s') ]
   (A code sketch of these four steps follows below.)

☞ A key result: V_i → V* as i → ∞.

☞ Helpful properties
• Markov process
• Contraction in max-norm
• Cauchy sequences
• Fixed point

Adapted from Morales, Grokking Deep Reinforcement Learning, 2020.
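
The following is a minimal tabular sketch of the four steps above for a finite MDP given as arrays; the names and the array encoding (P of shape [S, A, S], expected rewards r of shape [S, A]) are assumptions for illustration, not notation from the slides.

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-6, max_iter=10_000):
    """Tabular value iteration.

    P: transition probabilities, shape (S, A, S), P[s, a, s'] = p(s'|s, a)
    r: expected immediate rewards, shape (S, A)
    Returns the final value estimate V_K and the greedy policy.
    """
    S, A = r.shape
    V = np.zeros(S)                            # step 1: any initial function V_0
    for _ in range(max_iter):
        Q = r + gamma * P @ V                  # Q[s, a] = r(s,a) + γ Σ_s' p(s'|s,a) V(s')
        V_next = Q.max(axis=1)                 # step 2: apply the optimal Bellman operator
        if np.max(np.abs(V_next - V)) < tol:   # step 3: stop when V stops improving
            V = V_next
            break
        V = V_next
    policy = (r + gamma * P @ V).argmax(axis=1)   # step 4: greedy policy
    return V, policy
```

With exact P and r, the iterates converge geometrically at rate γ, which is exactly the contraction property discussed in the following slides.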

Value iteration algorithm
1. Let V_0(s) be any function V_0: S → ℝ. [Note: not stage 0, but iteration 0.]
2. Apply the principle of optimality so that, given V_i at iteration i, we compute
   V_{i+1}(s) = 𝒯V_i(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s
3. Terminate when V_i stops improving, e.g. when max_s |V_{i+1}(s) − V_i(s)| is small.
4. Return the greedy policy: π_K(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_K(s') ]

Definition (Optimal Bellman operator)
For any W ∈ ℝ^S, the optimal Bellman operator is defined as

  𝒯W(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} W(s') ]   for all s

☞ Then we can write step 2 of the algorithm concisely:

  V_{i+1}(s) = 𝒯V_i(s)   for all s

Key question: Does V_i → V*?

The student dilemma
§ Model: all the transitions are Markov; states s5, s6, s7 are terminal.
§ Setting: infinite horizon with terminal states.
§ Objective: find the policy that maximizes the expected sum of rewards before reaching a terminal state.
§ Notice: Not a discounted infinite horizon setting. But the Bellman equations hold unchanged.
§ Discuss: What kind of problem setting is this? (Hint: value function.)

[Figure: the student dilemma MDP with seven states (s5, s6, s7 terminal). Each non-terminal state offers a Rest and a Work action; transition probabilities and rewards (r = 0, 1, −1, −10, 100, −1000) are shown on the graph.]

The Optimal Bellman Equation
Bellman's Principle of Optimality (Bellman, 1957):
"An optimal policy has the property that, whatever the initial state and the initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

The Optimal Bellman Equation
Theorem (Optimal Bellman Equation)
The optimal value function V* (i.e. V* = max_π V^π) is the solution to the optimal Bellman equation:

  V*(s) = max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V*(s') ]

And any optimal policy is such that:

  π*(a|s) > 0  ⟺  a ∈ arg max_{a' ∈ A} [ r(s, a') + γ Σ_{s'} p(s'|s, a') V*(s') ]

Or, for short: V* = 𝒯V*

☞ There is always a deterministic optimal policy (see: Puterman, 2005, Chapter 7)

Proof: The Optimal Bellman Equation
For any policy π = (a, π') (possibly non-stationary),

  V*(s) = max_π E[ Σ_{t=0}^{∞} γ^t r(s_t, π(s_t)) | s_0 = s; π ]               [value function]

        = max_{(a, π')} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V^{π'}(s') ]            [Markov property & change of "time"]

        = max_a [ r(s, a) + γ Σ_{s'} p(s'|s, a) max_{π'} V^{π'}(s') ]

        = max_a [ r(s, a) + γ Σ_{s'} p(s'|s, a) V*(s') ]                        [value function]

Proof: Line 2 (also, the Bellman Equation)
For simplicity, consider any stationary policy π = (π, π, ...):

  V^π(s) = E[ Σ_{t=0}^{∞} γ^t r(s_t, π(s_t)) | s_0 = s; π ]                                        [value function]

         = r(s, π(s)) + E[ Σ_{t=1}^{∞} γ^t r(s_t, π(s_t)) | s_0 = s; π ]                            [Markov property]

         = r(s, π(s)) + γ Σ_{s'} ℙ(s_1 = s' | s_0 = s; π(s_0)) E[ Σ_{t=1}^{∞} γ^{t−1} r(s_t, π(s_t)) | s_1 = s'; π ]
                                                                                                    [MDP and change of "time"]

         = r(s, π(s)) + γ Σ_{s'} p(s'|s, π(s)) E[ Σ_{t'=0}^{∞} γ^{t'} r(s_{t'}, π(s_{t'})) | s_0 = s'; π ]

         = r(s, π(s)) + γ Σ_{s'} p(s'|s, π(s)) V^π(s')                                              [value function]

Proof: Line 3
For the equality (=), we have

  max_{π'} Σ_{s'} p(s'|s, a) V^{π'}(s')  ≤  Σ_{s'} p(s'|s, a) max_{π'} V^{π'}(s')

But, let π̃(s') = arg max_{π'} V^{π'}(s'). Then

  Σ_{s'} p(s'|s, a) max_{π'} V^{π'}(s')  ≤  Σ_{s'} p(s'|s, a) V^{π̃}(s')  ≤  max_{π'} Σ_{s'} p(s'|s, a) V^{π'}(s')

The student dilemma

[Figure: the student dilemma MDP, as above.]

  V*(x) = max_{a ∈ A} [ r(x, a) + γ Σ_y p(y|x, a) V*(y) ]

System of equations:

  V1 = max{ 0 + 0.5 V1 + 0.5 V2 ;  0 + 0.5 V1 + 0.5 V3 }
  V2 = max{ 0 + 0.4 V5 + 0.6 V2 ;  0 + 0.3 V1 + 0.7 V3 }
  V3 = max{ −1 + 0.4 V2 + 0.6 V3 ;  −1 + 0.5 V4 + 0.5 V3 }
  V4 = max{ −10 + 0.9 V6 + 0.1 V4 ;  −10 + V7 }
  V5 = −10
  V6 = 100
  V7 = −1000

Discuss: How to solve this system of equations?
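
As a sketch of one possible answer, the fixed-point iteration below repeatedly applies the max equations above (coefficients exactly as printed; the terminal values are held fixed) until the values stop changing; this is value iteration specialized to this system.

```python
# Fixed-point iteration on the student-dilemma optimal Bellman equations
# (coefficients as printed above; V5, V6, V7 are terminal and held fixed).
V = [0.0] * 8                 # V[1..7]; V[0] unused
V[5], V[6], V[7] = -10.0, 100.0, -1000.0
for _ in range(10_000):
    V1 = max(0 + 0.5 * V[1] + 0.5 * V[2], 0 + 0.5 * V[1] + 0.5 * V[3])
    V2 = max(0 + 0.4 * V[5] + 0.6 * V[2], 0 + 0.3 * V[1] + 0.7 * V[3])
    V3 = max(-1 + 0.4 * V[2] + 0.6 * V[3], -1 + 0.5 * V[4] + 0.5 * V[3])
    V4 = max(-10 + 0.9 * V[6] + 0.1 * V[4], -10 + V[7])
    if max(abs(V1 - V[1]), abs(V2 - V[2]), abs(V3 - V[3]), abs(V4 - V[4])) < 1e-9:
        break
    V[1], V[2], V[3], V[4] = V1, V2, V3, V4
print(V[1:8])                 # fixed point of the max equations above
```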

System of Equations
The optimal Bellman equation

  V*(s) = max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V*(s') ]

is a non-linear system of equations with N unknowns and N non-linear constraints (i.e. the max operator).

Value iteration algorithm
1. Let V_0(s) be any function V_0: S → ℝ. [Note: not stage 0, but iteration 0.]
2. Apply the principle of optimality so that, given V_i at iteration i, we compute
   V_{i+1}(s) = 𝒯V_i(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s
3. Terminate when V_i stops improving, e.g. when max_s |V_{i+1}(s) − V_i(s)| is small.
4. Return the greedy policy: π_K(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_K(s') ]

☞ A key result: V_i → V* as i → ∞.

☞ Helpful properties
• Markov process
• Contraction in max-norm
• Cauchy sequences
• Fixed point

Adapted from Morales, Grokking Deep Reinforcement Learning, 2020.

Properties of Bellman Operators
Proposition
1. Contraction in L∞-norm: for any W1, W2 ∈ ℝ^N,
   ||𝒯W1 − 𝒯W2||∞ ≤ γ ||W1 − W2||∞
2. Fixed point: V* is the unique fixed point of 𝒯, i.e. V* = 𝒯V*.

Proof: value iteration
§ From the contraction property of 𝒯, V_k = 𝒯V_{k−1}, and the optimal value function V* = 𝒯V*:
  ||V* − V_{k+1}||∞ = ||𝒯V* − 𝒯V_k||∞       [optimal Bellman eq. and value iteration]
                    ≤ γ ||V* − V_k||∞          [contraction]
                    ≤ γ^{k+1} ||V* − V_0||∞    [recursion]
                    → 0
  Hence V_k → V*.                              [fixed point]

Properties of Bellman Operators
Proposition
1. Contraction in L∞-norm: for any W1, W2 ∈ ℝ^N,
   ||𝒯W1 − 𝒯W2||∞ ≤ γ ||W1 − W2||∞
2. Fixed point: V* is the unique fixed point of 𝒯, i.e. V* = 𝒯V*.

Proof: value iteration
§ Convergence rate. Let ε > 0 and ||r||∞ ≤ r_max. Then ||V* − V_K||∞ ≤ γ^K ||V* − V_0||∞ < ε after at most

  K > log( r_max / ((1 − γ) ε) ) / log(1/γ)

iterations.
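
For intuition, here is a quick computation of this bound with illustrative values (r_max = 1, ε = 0.01; these numbers are assumptions, not from the slides):

```python
import math

gamma, r_max, eps = 0.95, 1.0, 0.01            # illustrative values
K = math.log(r_max / ((1 - gamma) * eps)) / math.log(1 / gamma)
print(math.ceil(K))                            # ≈ 149 iterations suffice here
```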

Proof: Contraction of the Bellman Operator
For any s ∈ S,

  𝒯W1(s) − 𝒯W2(s)
  = max_a [ r(s, a) + γ Σ_{s'} p(s'|s, a) W1(s') ] − max_{a'} [ r(s, a') + γ Σ_{s'} p(s'|s, a') W2(s') ]
  ≤ max_a [ r(s, a) + γ Σ_{s'} p(s'|s, a) W1(s') − r(s, a) − γ Σ_{s'} p(s'|s, a) W2(s') ]
  = γ max_a Σ_{s'} p(s'|s, a) [ W1(s') − W2(s') ]
  ≤ γ ||W1 − W2||∞ max_a Σ_{s'} p(s'|s, a) = γ ||W1 − W2||∞

using the fact that max_x f(x) − max_{x'} g(x') ≤ max_x ( f(x) − g(x) ).

Value Iteration: the Guarantees
Corollary
Let V_K be the function computed after K iterations by value iteration; then the greedy policy

  π_K(s) ∈ arg max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V_K(s') ]

is such that

  ||V* − V^{π_K}||∞ ≤ (2γ / (1 − γ)) ||V* − V_K||∞

(performance loss on the left, approximation error on the right).

Furthermore, there exists ε > 0 such that if ||V* − V_K||∞ ≤ ε, then π_K is optimal.

Proof: Performance Loss
§ Note 1: We drop the K everywhere.
§ Note 2: π is the greedy policy corresponding to V, and V^π is the value function of π.

  ||V* − V^π||∞ ≤ ||𝒯V* − 𝒯^π V||∞ + ||𝒯^π V − 𝒯^π V^π||∞
               ≤ ||𝒯V* − 𝒯V||∞ + γ ||V − V^π||∞
               ≤ γ ||V* − V||∞ + γ ( ||V − V*||∞ + ||V* − V^π||∞ )
  ⟹ ||V* − V^π||∞ ≤ (2γ / (1 − γ)) ||V* − V||∞

Value Iteration: the Complexity
Time complexity
§ Each iteration takes on the order of S²A operations:
  V_{k+1}(s) = 𝒯V_k(s) = max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V_k(s') ]
§ The computation of the greedy policy takes on the order of S²A operations:
  π_K(s) ∈ arg max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V_K(s') ]
§ Total time complexity on the order of KS²A.

Space complexity
§ Storing the MDP: dynamics on the order of S²A and rewards on the order of SA.
§ Storing the value function and the optimal policy on the order of S.

Value Iteration: Extensions and Implementations
Asynchronous VI:
1. Let V_0 be any vector in ℝ^S
2. At each iteration k = 1, 2, ..., K
   • Choose a state s_k
   • Compute V_{k+1}(s_k) = 𝒯V_k(s_k)
3. Return the greedy policy
   π_K(s) ∈ arg max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V_K(s') ]

Comparison
§ Reduced time complexity per update to O(SA)
§ Using round-robin, the number of iterations increases by at most O(KS), but is much smaller in practice if states are properly prioritized
§ Convergence guarantees hold if no state is starved (a round-robin sketch follows below)
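
Here is a minimal round-robin sketch of asynchronous (in-place) value iteration, assuming the same illustrative array encoding of P and r as in the earlier value-iteration sketch:

```python
import numpy as np

def async_value_iteration(P, r, gamma=0.95, sweeps=200):
    """In-place (Gauss-Seidel) value iteration with a round-robin state schedule."""
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(sweeps):
        for s in range(S):                       # round-robin over states
            # Update V[s] immediately, reusing already-updated entries of V.
            V[s] = np.max(r[s] + gamma * P[s] @ V)
    policy = (r + gamma * P @ V).argmax(axis=1)  # greedy policy
    return V, policy
```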

The Grid-World Problem

Example: Winter parking (with ice and potholes)
§ Simple grid world with a goal state (green, desired parking spot) with reward (+1), a "bad state" (red, pothole) with reward (−100), and all other states neutral (+0).
§ The omnidirectional vehicle (agent) can head in any direction. Actions move in the desired direction with probability 0.8, and in one of the two perpendicular directions with probability 0.1 each.
§ Taking an action that would bump into a wall leaves the agent where it is.

[Source: adapted from Kolter, 2016]

Example: value iteration

[Figure: grid world, panel (a)]

Recall the value iteration algorithm:
  V_{i+1}(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s
Let's arbitrarily initialize V_0 as the reward function, since it can be any function.

Example update (red state):

  V_1(red) = −100 + γ max{ 0.8 V_0(green) + 0.1 V_0(red) + 0,    [up]
                           0 + 0.1 V_0(red) + 0,                  [down]
                           0 + 0.1 V_0(green) + 0,                [left]
                           0.8 V_0(red) + 0.1 V_0(green) + 1 }    [right]
           = −100 + 0.9 (0.1 · 1) = −99.91                         [best: go left]

Example: value iteration

[Figure: grid world, panel (a)]

Recall the value iteration algorithm:
  V_{i+1}(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s
Let's arbitrarily initialize V_0 as the reward function, since it can be any function.

Example update (green state):

  V_1(green) = 1 + γ max{ 0.8 V_0(green) + 0.1 V_0(green),           [up]
                          0.8 V_0(red) + 0.1 V_0(green),             [down]
                          0 + 0.1 V_0(green) + 0.1 V_0(red),         [left]
                          0.8 V_0(red) + 0.1 V_0(green) + 0 }        [right]
             = 1 + 0.9 (0.9 · 1) = 1.81                               [best: go up]
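
A quick check of these two hand updates (branch values exactly as printed above; γ = 0.9):

```python
gamma = 0.9
V0_green, V0_red = 1.0, -100.0     # V_0 initialized to the reward function

# Red state: best action is "left" in the update above.
V1_red = -100 + gamma * max(0.8 * V0_green + 0.1 * V0_red,
                            0.1 * V0_red,
                            0.1 * V0_green,
                            0.8 * V0_red + 0.1 * V0_green + 1)
print(V1_red)        # ≈ -99.91

# Green state: best action is "up".
V1_green = 1 + gamma * max(0.8 * V0_green + 0.1 * V0_green,
                           0.8 * V0_red + 0.1 * V0_green,
                           0.1 * V0_green + 0.1 * V0_red,
                           0.8 * V0_red + 0.1 * V0_green)
print(V1_green)      # ≈ 1.81
```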

Example: value iteration

[Figure: grid world, panels (a) and (b)]

Recall the value iteration algorithm:
  V_{i+1}(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_i(s') ]   for all s

Let's arbitrarily initialize V_0 as the reward function, since it can be any function.

We also need to do this update for all the "unnamed" states.

Example: value iteration

[Figure: grid world value estimates over successive iterations, panels (a)-(f)]


Outline

1. Infinite horizon Markov Decision Processes

2. Value iteration

3. Policy iteration
a. Bellman equation, and properties
b. Convergence
c. Geometric interpretations
d. Generalized policy iteration

More generally...
Value iteration:
1. V_{k+1}(s) = max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_k(s') ]   for all s
2. π_K(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_K(s') ]

Related operations:
§ Policy evaluation: V_{k+1}(s) = r(s, π_k(s)) + γ E_{s' ~ P(·|s, π_k(s))} V_k(s')   for all s
§ Policy improvement: π_k(s) = arg max_{a ∈ A} [ r(s, a) + γ E_{s' ~ P(·|s,a)} V_k(s') ]

☞ Generalized Policy Iteration:
§ Repeat:
  1. Policy evaluation for N steps
  2. Policy improvement
§ Value iteration: N = 1; Policy iteration: N = ∞


In pictures

[Figure: adapted from Morales, Grokking Deep Reinforcement Learning, 2020.]

Policy Iteration: the Idea
1. Let π_0 be any stationary policy
2. At each iteration k = 1, 2, ..., K
   • Policy evaluation: given π_k, compute V^{π_k}
   • Policy improvement: compute the greedy policy
     π_{k+1}(s) ∈ arg max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V^{π_k}(s') ]
3. Stop if V^{π_k} = V^{π_{k−1}}
4. Return the last policy π_K
   (A code sketch follows below.)
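
The following is a minimal tabular sketch of this loop under the same illustrative array encoding as before (P of shape [S, A, S], r of shape [S, A]); policy evaluation is done exactly with a linear solve.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95, max_iter=1000):
    """Tabular policy iteration with exact policy evaluation.

    P: transition probabilities, shape (S, A, S); r: rewards, shape (S, A).
    """
    S, A = r.shape
    policy = np.zeros(S, dtype=int)                 # step 1: any stationary policy
    for _ in range(max_iter):
        # Policy evaluation: V = r_pi + gamma * P_pi V  =>  (I - gamma P_pi) V = r_pi
        P_pi = P[np.arange(S), policy]              # shape (S, S)
        r_pi = r[np.arange(S), policy]              # shape (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy policy with respect to V
        new_policy = (r + gamma * P @ V).argmax(axis=1)
        if np.array_equal(new_policy, policy):      # step 3: stop when the policy is stable
            break
        policy = new_policy
    return V, policy
```

Each pass costs O(S³) for the solve plus O(S²A) for the improvement; the later complexity slides discuss when iterative or Monte-Carlo evaluation is preferable.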

Policy Iteration: the Guarantees
Proposition
The policy iteration algorithm generates a sequence of policies with non-decreasing performance,

  V^{π_{k+1}} ≥ V^{π_k},

and it converges to π* in a finite number of iterations.

The Bellman Equation
Theorem (Bellman equation)
For any stationary policy π = (π, π, ...), at any state s ∈ S, the state value function satisfies the Bellman equation:

  V^π(s) = r(s, π(s)) + γ Σ_{s' ∈ S} p(s'|s, π(s)) V^π(s')

The student dilemma

[Figure: the student dilemma MDP with the value of the evaluated policy at each state: V1 = 88.3, V2 = 88.3, V3 = 86.9, V4 = 88.9, V5 = −10, V6 = 100, V7 = −1000.]

§ Discuss: How to solve this system of equations?

  V^π(x) = r(x, π(x)) + γ Σ_y p(y|x, π(x)) V^π(y)

System of equations:

  V1 = 0 + 0.5 V1 + 0.5 V2
  V2 = 1 + 0.3 V1 + 0.7 V3
  V3 = −1 + 0.5 V4 + 0.5 V3
  V4 = −10 + 0.9 V6 + 0.1 V4
  V5 = −10
  V6 = 100
  V7 = −1000

In matrix form, with V, R ∈ ℝ^7 and P^π ∈ ℝ^{7×7}:

  V = R + P V   ⟹   V = (I − P)^{−1} R
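
This linear system can be solved directly; a sketch using the coefficients above, with only the four non-terminal states as unknowns (the rearrangement into (I − P)V = R is ours for illustration):

```python
import numpy as np

# Unknowns V1..V4; terminal values are constants (V6 = 100 enters the last equation).
V6 = 100.0
A = np.array([[ 0.5, -0.5,  0.0,  0.0],    # V1 - 0.5 V1 - 0.5 V2 = 0
              [-0.3,  1.0, -0.7,  0.0],    # V2 - 0.3 V1 - 0.7 V3 = 1
              [ 0.0,  0.0,  0.5, -0.5],    # V3 - 0.5 V3 - 0.5 V4 = -1
              [ 0.0,  0.0,  0.0,  0.9]])   # V4 - 0.1 V4 = -10 + 0.9 * V6
b = np.array([0.0, 1.0, -1.0, -10.0 + 0.9 * V6])
print(np.linalg.solve(A, b))               # ≈ [88.3, 88.3, 86.9, 88.9]
```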

Recap: The Bellman Operators
Notation. W.l.o.g. consider a discrete state space with |S| = N and V^π ∈ ℝ^N (the analysis extends to N → ∞).

Definition
For any W ∈ ℝ^N, the Bellman operator 𝒯^π: ℝ^N → ℝ^N is

  𝒯^π W(s) = r(s, π(s)) + γ Σ_{s'} p(s'|s, π(s)) W(s')

and the optimal Bellman operator (or dynamic programming operator) is

  𝒯W(s) = max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) W(s') ]
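
As a sketch, both operators can be written as short functions on value vectors (same illustrative array encoding as before), which also makes the properties on the next slides easy to check numerically on a random MDP:

```python
import numpy as np

def bellman_pi(W, P, r, policy, gamma=0.95):
    """Bellman operator T^pi applied to a value vector W."""
    idx = np.arange(len(W))
    return r[idx, policy] + gamma * P[idx, policy] @ W

def bellman_opt(W, P, r, gamma=0.95):
    """Optimal Bellman operator T applied to a value vector W."""
    return np.max(r + gamma * P @ W, axis=1)

# Numerical sanity checks on a random MDP (illustrative).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))
W1, W2 = rng.random(S), rng.random(S)

# Contraction in the max-norm: ||T W1 - T W2|| <= gamma ||W1 - W2||.
lhs = np.max(np.abs(bellman_opt(W1, P, r, gamma) - bellman_opt(W2, P, r, gamma)))
print(lhs <= gamma * np.max(np.abs(W1 - W2)))          # True

# The greedy policy w.r.t. W1 attains the max, so T^{greedy} W1 equals T W1.
greedy = (r + gamma * P @ W1).argmax(axis=1)
print(np.allclose(bellman_pi(W1, P, r, greedy, gamma), bellman_opt(W1, P, r, gamma)))  # True
```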

The Bellman Operators
Proposition
Properties of the Bellman operators
1. Monotonicity: for any W1, W2 ∈ ℝ^N, if W1 ≤ W2 component-wise, then
   𝒯^π W1 ≤ 𝒯^π W2
   𝒯W1 ≤ 𝒯W2
2. Offset: for any scalar c ∈ ℝ,
   𝒯^π (W + c I_N) = 𝒯^π W + γ c I_N
   𝒯(W + c I_N) = 𝒯W + γ c I_N

The Bellman Operators
Proposition
3. Contraction in L∞-norm: for any W1, W2 ∈ ℝ^N,
   ||𝒯^π W1 − 𝒯^π W2||∞ ≤ γ ||W1 − W2||∞
   ||𝒯W1 − 𝒯W2||∞ ≤ γ ||W1 − W2||∞
4. Fixed point: for any policy π,
   V^π is the unique fixed point of 𝒯^π,
   V* is the unique fixed point of 𝒯.

§ For any W ∈ ℝ^N and any stationary policy π,
   lim_{k→∞} (𝒯^π)^k W = V^π
   lim_{k→∞} 𝒯^k W = V*

Policy Iteration: the Idea
1. Let π_0 be any stationary policy
2. At each iteration k = 1, 2, ..., K
   • Policy evaluation: given π_k, compute V^{π_k}
   • Policy improvement: compute the greedy policy
     π_{k+1}(s) ∈ arg max_{a ∈ A} [ r(s, a) + γ Σ_{s'} p(s'|s, a) V^{π_k}(s') ]
3. Stop if V^{π_k} = V^{π_{k−1}}
4. Return the last policy π_K

Policy Iteration: the Guarantees
Proposition
The policy iteration algorithm generates a sequence of policies with non-decreasing performance,

  V^{π_{k+1}} ≥ V^{π_k},

and it converges to π* in a finite number of iterations.

Proof: Policy Iteration
§ From the definition of the Bellman operators and the greedy policy π_{k+1},
  V^{π_k} = 𝒯^{π_k} V^{π_k} ≤ 𝒯V^{π_k} = 𝒯^{π_{k+1}} V^{π_k}          (eq. 1)
§ and from the monotonicity property of 𝒯^{π_{k+1}}, it follows that
  V^{π_k} ≤ 𝒯^{π_{k+1}} V^{π_k}
  𝒯^{π_{k+1}} V^{π_k} ≤ (𝒯^{π_{k+1}})² V^{π_k}
  ...
  (𝒯^{π_{k+1}})^{n−1} V^{π_k} ≤ (𝒯^{π_{k+1}})^n V^{π_k}
§ Joining all the inequalities in the chain, we obtain
  V^{π_k} ≤ lim_{n→∞} (𝒯^{π_{k+1}})^n V^{π_k} = V^{π_{k+1}}
§ Then (V^{π_k})_k is a non-decreasing sequence.


Policy Iteration: the Guarantees

Since a finite MDP admits a finite number of policies, the termination condition is eventually met for a specific k.

Thus eq. 1 holds with equality, and we obtain

  V^{π_k} = 𝒯V^{π_k}

and V^{π_k} = V*, which implies that π_k is an optimal policy.

Policy Iteration: Complexity

Notation. For any policy π, the reward vector is r^π(x) = r(x, π(x)) and the transition matrix is [P^π]_{x,y} = p(y|x, π(x)).

§ Policy Evaluation Step
  § Direct computation: for any policy π, compute
      V^π = (I − γ P^π)^{−1} r^π
    Complexity: O(S³).
  § Iterative policy evaluation: for any policy π,
      lim_{k→∞} (𝒯^π)^k V_0 = V^π
    Complexity: an ε-approximation of V^π requires on the order of S² log(1/ε) / log(1/γ) operations.
  § Monte-Carlo simulation: in each state s, simulate n trajectories (s_t^i)_{t≥0, 1≤i≤n} following policy π and compute
      V̂^π(s) ≃ (1/n) Σ_{i=1}^{n} Σ_{t≥0} γ^t r(s_t^i, π(s_t^i))
    Complexity: in each state, the approximation error is O( r_max / ((1 − γ) √n) ).
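
Here is a sketch contrasting the first two evaluation methods under the earlier illustrative array encoding (the Monte-Carlo variant would instead sample trajectories from a simulator):

```python
import numpy as np

def evaluate_direct(P, r, policy, gamma=0.95):
    """Direct policy evaluation: V = (I - gamma P_pi)^(-1) r_pi, cost O(S^3)."""
    S = r.shape[0]
    idx = np.arange(S)
    P_pi, r_pi = P[idx, policy], r[idx, policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def evaluate_iterative(P, r, policy, gamma=0.95, eps=1e-8):
    """Iterative policy evaluation: apply T^pi repeatedly until an eps-accurate estimate."""
    S = r.shape[0]
    idx = np.arange(S)
    P_pi, r_pi = P[idx, policy], r[idx, policy]
    V = np.zeros(S)
    while True:
        V_next = r_pi + gamma * P_pi @ V
        if np.max(np.abs(V_next - V)) < eps * (1 - gamma):   # stopping rule for eps accuracy
            return V_next
        V = V_next
```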

Policy Iteration: Complexity
§ Policy Improvement Step
  • Complexity: O(S²A)

§ Number of Iterations
  • At most O( (SA / (1 − γ)) log(1 / (1 − γ)) )
  • Other results exist that do not depend on γ

Comparison between Value and Policy Iteration
§ Value Iteration
  • Pros: each iteration is computationally efficient.
  • Cons: convergence is only asymptotic.

§ Policy Iteration
  • Pros: converges in a finite number of iterations (often small in practice).
  • Cons: each iteration requires a full policy evaluation, which might be expensive.

Example: Winter parking (with ice and potholes)
§ Simple grid world with a goal state (green, desired parking spot) with reward (+1), a "bad state" (red, pothole) with reward (−100), and all other states neutral (+0).
§ The omnidirectional vehicle (agent) can head in any direction. Actions move in the desired direction with probability 0.8, and in one of the two perpendicular directions with probability 0.1 each.
§ Taking an action that would bump into a wall leaves the agent where it is.

[Source: adapted from Kolter, 2016]

Example: value iteration

[Figure: grid world value estimates over successive iterations, panels (a)-(f)]


Example: policy iteration

[Figure: grid world over successive policy iteration steps, panels (a)-(d)]

Value iteration: geometric interpretation

[Figure: geometric interpretation of value iteration (plot of 𝒯V).]

Policy iteration: geometric interpretation

[Figure: geometric interpretation of policy iteration (plot of 𝒯V).]

More variations

[Figure: adapted from Morales, Grokking Deep Reinforcement Learning, 2020.]

Summary & Takeaways
§ When specifying a sequential problem, care should be taken to
select an appropriate type of policy and value function,
depending on the use case.
§ The ideas from dynamic programming, namely the principle of
optimality, carry over to infinite horizon problems.
§ The value iteration algorithm solves discounted infinite horizon MDP
problems by leveraging results of Bellman operators, namely the
optimal Bellman equation, contractions, and fixed points.
§ Generalized policy iteration methods include policy iteration and
value iteration.
§ The policy iteration algorithm additionally leverages monotonicity and the Bellman equation.
§ The update mechanisms for VI and PI differ, and thus their convergence in practice depends on the geometric structure of the optimal value function.
