
Session 10

MARKOV

Uploaded by

Irfan Khilji
Copyright
© © All Rights Reserved
MARKOV DECISION PROCESS

AND APPLICATIONS – SESSION 10


Sumanta Basu
Professor
Operations Management Group
Indian Institute of Management Calcutta
MARKOV DECISION PROCESS (MDP)
AN EXAMPLE
• A manufacturer has one key machine used in the production process. Because of heavy use, the machine deteriorates rapidly in both quality and output. A thorough inspection is therefore done at the end of each week to classify the condition of the machine into one of four possible states:

State   Condition
0       Good as new
1       Operable – minor deterioration
2       Operable – major deterioration
3       Inoperable
EXAMPLE: MDP
• From historical data, the following matrix shows the relative frequency (probability) of each possible transition from the state in one week to the state in the following week. (The process is assumed to satisfy the Markov property.)

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1
EXAMPLE: MDP
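• In the original matrix, state 3 is absorbing (p33 = 1): an inoperable machine stays inoperable. If the machine is replaced whenever it reaches state 3, it returns to state 0 in the following week, so only the last row changes.

Original transition probability matrix:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1

Revised transition probability matrix:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           1      0      0      0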
MDP: DECISION ALTERNATIVES
• Given the scenario, what are the other decisions we can take?

Decision   Action                                 Relevant states
1          Do nothing                             0 (New), 1 (Minor), 2 (Major)
2          Overhaul (return system to state 1)    2 (Major)
3          Replace (return system to state 0)     1, 2, 3 (Inoperable)

Overhauling can take place only when the system is in state 2, and it takes 1 week.
MDP: COST STRUCTURE FOR DECISION
MAKING
Decision     State     Expected cost of producing   Maintenance /      Cost of lost   Total cost
                       defective items              replacement cost   production     per week
Do nothing   0         0                            0                  0              0
Do nothing   1         1000                         0                  0              1000
Do nothing   2         3000                         0                  0              3000
Overhaul     2         0                            2000               2000           4000
Replace      1, 2, 3   0                            4000               2000           6000

MDP: POLICIES CHOSEN
• Decision set is as follows:

Decision   Action                                 Relevant states
1          Do nothing                             0, 1, 2
2          Overhaul (return system to state 1)    2
3          Replace (return system to state 0)     1, 2, 3

• A policy is a combination of decisions, one chosen for each state:

Policy   Verbal description                       d0[R]   d1[R]   d2[R]   d3[R]
Ra       Replace in state 3                       1       1       1       3
Rb       Replace in state 3, overhaul in state 2  1       1       2       3
Rc       Replace in states 2 & 3                  1       1       3       3
Rd       Replace in states 1, 2 & 3               1       3       3       3
MDP: TRANSITION PROBABILITY MATRIX
FOR POLICY RA

• Policy Ra:

Policy   Verbal description    d0[R]   d1[R]   d2[R]   d3[R]
Ra       Replace in state 3    1       1       1       3

Initial TPM:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1

TPM under policy Ra:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           1      0      0      0
MDP: TRANSITION PROBABILITY MATRIX
FOR POLICY RB

• Policy Rb:

Policy   Verbal description                        d0[R]   d1[R]   d2[R]   d3[R]
Rb       Replace in state 3, overhaul in state 2   1       1       2       3

Initial TPM:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1

TPM under policy Rb:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      1      0      0
3           1      0      0      0
MDP: TRANSITION PROBABILITY MATRIX
FOR POLICY RC

• Policy Rc:

Policy   Verbal description         d0[R]   d1[R]   d2[R]   d3[R]
Rc       Replace in states 2 & 3    1       1       3       3

Initial TPM:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1

TPM under policy Rc:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           1      0      0      0
3           1      0      0      0
MDP: TRANSITION PROBABILITY MATRIX
FOR POLICY RD

• Policy Rd:

Policy   Verbal description            d0[R]   d1[R]   d2[R]   d3[R]
Rd       Replace in states 1, 2 & 3    1       3       3       3

Initial TPM:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           0      3/4    1/8    1/8
2           0      0      1/2    1/2
3           0      0      0      1

TPM under policy Rd:

From \ To   0      1      2      3
0           0      7/8    1/16   1/16
1           1      0      0      0
2           1      0      0      0
3           1      0      0      0
MDP: EXPECTED AVERAGE COST
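Decision         State     Expected cost of producing   Maintenance   Cost of lost   Total cost
                           defective items              cost          production     per week
Do nothing (1)   0         0                            0             0              0
Do nothing (1)   1         1000                         0             0              1000
Do nothing (1)   2         3000                         0             0              3000
Overhaul (2)     2         0                            2000          2000           4000
Replace (3)      1, 2, 3   0                            4000          2000           6000

• The expected average cost per week of each policy is calculated as $E(C) = \sum_{k=0}^{3} \pi_k C_k$, where $\pi_k$ is the steady-state probability of state k under the policy and $C_k$ is the total weekly cost of the decision taken in state k. Costs inside the brackets are in thousands of Rs.

Policy   Decisions in states (0,1,2,3)   (π0, π1, π2, π3)           E(C)
Ra       (1,1,1,3)                       (2/13, 7/13, 2/13, 2/13)   1/13 [2(0) + 7(1) + 2(3) + 2(6)] = 1.923, i.e. Rs. 1923
Rb       (1,1,2,3)                       (2/21, 5/7, 2/21, 2/21)    1/21 [2(0) + 15(1) + 2(4) + 2(6)] = 1.667, i.e. Rs. 1667
Rc       (1,1,3,3)                       (2/11, 7/11, 1/11, 1/11)   1/11 [2(0) + 7(1) + 1(6) + 1(6)] = 1.727, i.e. Rs. 1727
Rd       (1,3,3,3)                       (1/2, 7/16, 1/32, 1/32)    1/32 [16(0) + 14(6) + 1(6) + 1(6)] = 3.000, i.e. Rs. 3000

Policy Rb gives the lowest expected average cost (Rs. 1667 per week).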
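The steady-state probabilities and expected average costs in the table can be checked numerically. The following is a minimal sketch, not part of the original slides: it uses exact fractions from the Python standard library, and the helper names (`solve`, `steady_state`, `row_and_cost`, `evaluate`) are hypothetical. It assumes each policy's chain is irreducible, so that one balance equation can be replaced by the normalization condition.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def steady_state(p):
    """Solve pi = pi P with sum(pi) = 1 (assumes an irreducible chain)."""
    n = len(p)
    # Equation i: sum_j pi_j * P[j][i] - pi_i = 0.
    a = [[p[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    a[-1] = [F(1)] * n  # replace one redundant equation by normalization
    b = [F(0)] * (n - 1) + [F(1)]
    return solve(a, b)

# TPM rows produced by each decision (from the slides).
DO_NOTHING = {
    0: [F(0), F(7, 8), F(1, 16), F(1, 16)],
    1: [F(0), F(3, 4), F(1, 8), F(1, 8)],
    2: [F(0), F(0), F(1, 2), F(1, 2)],
}
OVERHAUL = [F(0), F(1), F(0), F(0)]  # decision 2: return to state 1
REPLACE = [F(1), F(0), F(0), F(0)]   # decision 3: return to state 0
DO_NOTHING_COST = {0: 0, 1: 1000, 2: 3000}  # Rs. per week

def row_and_cost(decision, state):
    """TPM row and total weekly cost for a decision taken in a state."""
    if decision == 1:
        return DO_NOTHING[state], DO_NOTHING_COST[state]
    if decision == 2:
        return OVERHAUL, 4000
    return REPLACE, 6000

POLICIES = {"Ra": (1, 1, 1, 3), "Rb": (1, 1, 2, 3),
            "Rc": (1, 1, 3, 3), "Rd": (1, 3, 3, 3)}

def evaluate(policy):
    """Return (steady-state probabilities, expected average cost)."""
    rows, costs = zip(*(row_and_cost(d, s) for s, d in enumerate(policy)))
    pi = steady_state(list(rows))
    return pi, sum(p * c for p, c in zip(pi, costs))

results = {name: evaluate(policy) for name, policy in POLICIES.items()}
best = min(results, key=lambda name: results[name][1])
for name, (pi, cost) in results.items():
    print(name, [str(x) for x in pi], round(float(cost)))
print("policy with minimum expected average cost:", best)
```

Working in exact fractions keeps the output directly comparable with the fractions on the slide; a floating-point solver would serve equally well for larger problems.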
STEPS IN MARKOV DECISION PROCESS
1. Identify the basic transition probability matrix
after state identification
2. Identify the possible decisions which can be
exercised in each state
3. Develop policy by defining the decision to be taken
in each state
4. Re-create TPM for each policy
5. Calculate steady state probability values of each
state in each policy
6. Re-create cost structure for each policy
7. Identify expected average cost of each policy
8. Choose the policy with minimum expected average
cost
ABSORBING STATES
• State k is called an absorbing state if $p_{kk} = 1$.
• $f_{ik}$: probability of absorption into state k starting from state i.
• Applications of absorbing states:
   - Gambler's ruin problem
   - Credit evaluation problem
EXAMPLE: CREDIT EVALUATION
Consider a credit card company that classifies each customer
as fully paid (state 0), 1 to 30 days due (state 1), 31 to 60
days due (state 2) or bad debt (state 3). Accounts are checked
in each billing cycle to determine the state of each customer.
In general, credit is not extended and customers are expected
to pay their bills within 30 days. Part payment is accepted.
If a customer in state 1 (1 to 30 days due) makes a part
payment, the account remains in state 1. If a customer in
state 2 (31 to 60 days due) makes a part payment, the account
moves to state 1. A customer in the bad-debt category
(state 3) cannot move to any other state.
EXAMPLE: CREDIT EVALUATION

[Figure: amount due (0, minimum amount due, total amount due) plotted against due date (0, 30 days, 60 days), with regions labelled State 0, State 1, State 2 and State 3]


EXAMPLE: CREDIT EVALUATION
• After examining several years of data on the progression of individual customers, the credit card company developed the following transition matrix:

From \ To              0: Fully paid   1: 1 to 30 days due   2: 31 to 60 days due   3: Bad debt (> 60 days due)
0: Fully paid          1               0                     0                      0
1: 1 to 30 days due    0.7             0.2                   0.1                    0
2: 31 to 60 days due   0.5             0.1                   0.2                    0.2
3: Bad debt            0               0                     0                      1

• Approximately what percentage of customers in state 1 (1 to 30 days due) will eventually end up in the bad-debt category?
EXAMPLE OF PROBABILITY OF
ABSORPTION: CREDIT EVALUATION
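• $f_{ik}$: probability of absorption into state k starting from state i, where M is the highest-numbered state (M = 3 here):

$f_{ik} = \sum_{j=0}^{M} p_{ij} f_{jk}$, for $i = 0, 1, \ldots, M$

Here $p_{ij}$ is the single-step transition probability from i to j, and $f_{jk}$ is the probability of absorption into k starting from j, subject to:

• $f_{kk} = 1$
• $f_{ik} = 0$ if state i (i ≠ k) is recurrent or another absorbing state

Transition matrix:

From \ To   0      1      2      3
0           1      0      0      0
1           0.7    0.2    0.1    0
2           0.5    0.1    0.2    0.2
3           0      0      0      1

For absorption into bad debt (k = 3):

$f_{13} = p_{10} f_{03} + p_{11} f_{13} + p_{12} f_{23} + p_{13} f_{33}$
$f_{23} = p_{20} f_{03} + p_{21} f_{13} + p_{22} f_{23} + p_{23} f_{33}$

With $f_{03} = 0$ (state 0 is another absorbing state) and $f_{33} = 1$:

$f_{13} = 0.2 f_{13} + 0.1 f_{23}$
$f_{23} = 0.1 f_{13} + 0.2 f_{23} + 0.2$

Solving:

$f_{13} = 0.032$
$f_{23} = 0.254$

So about 3.2% of customers in state 1 eventually end up in the bad-debt category.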
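The two absorption equations can also be solved mechanically. The sketch below is illustrative only and not from the slides: it treats every state with $p_{ii} = 1$ as absorbing and assumes all remaining states are transient; the function names `solve` and `absorption_probabilities` are hypothetical.

```python
from fractions import Fraction as F

# Credit evaluation transition matrix from the slides.
P = [
    [F(1), F(0), F(0), F(0)],
    [F("0.7"), F("0.2"), F("0.1"), F(0)],
    [F("0.5"), F("0.1"), F("0.2"), F("0.2")],
    [F(0), F(0), F(0), F(1)],
]

def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def absorption_probabilities(p, target):
    """Return f[i] = probability of eventual absorption into `target` from i.

    Assumption: states with p[i][i] == 1 are absorbing and all other
    states are transient.
    """
    n = len(p)
    absorbing = [i for i in range(n) if p[i][i] == 1]
    transient = [i for i in range(n) if i not in absorbing]
    # For transient i: f_i - sum over transient j of p_ij f_j = p_i,target
    a = [[(1 if i == j else 0) - p[i][j] for j in transient] for i in transient]
    b = [p[i][target] for i in transient]
    f = {i: F(1 if i == target else 0) for i in absorbing}
    f.update(zip(transient, solve(a, b)))
    return f

f = absorption_probabilities(P, target=3)
for state in sorted(f):
    print(f"f_{state}3 = {f[state]} = {float(f[state]):.3f}")
```

The exact fractions round to the slide values 0.032 and 0.254.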
FIRST PASSAGE TIMES
• First passage time: the number of transitions made by the process in going from state i to state j for the first time.
• Recurrence time for state i: the number of transitions made by the process to come back to state i for the first time.

Example sequence of states:

X0   X1   X2   X3   X4   X5
3    2    1    0    3    1

• First passage time from state 3 to state 1: 2 (X0 = 3, X2 = 1)
• Recurrence time for state 3: 4 (X0 = 3, X4 = 3)
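The definitions can be read off an observed sequence directly. The following is a small illustrative sketch; the function name `first_passage_time` is hypothetical, and a recurrence time is obtained by passing the same state as both start and target.

```python
def first_passage_time(path, start, target):
    """Transitions from the first visit to `start` until the next visit to `target`.

    Returns None if `target` is not reached within the observed path.
    """
    begin = path.index(start)
    for steps, state in enumerate(path[begin + 1:], start=1):
        if state == target:
            return steps
    return None

path = [3, 2, 1, 0, 3, 1]  # X0 .. X5 from the slide

print(first_passage_time(path, 3, 1))  # first passage time 3 -> 1
print(first_passage_time(path, 3, 3))  # recurrence time for state 3
```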
FIRST PASSAGE TIMES (FPT): INVENTORY
EXAMPLE
• $\mu_{ij}$: expected first passage time from state i to state j
• To calculate the expected FPT, consider every possible value of the FPT together with its probability:
   - Direct path i → j: the FPT is 1 with probability $p_{ij}$
   - Path i → k → j (k ≠ j): the FPT is 2 with probability $p_{ik} p_{kj}$
• Transition probability matrix for the inventory example:

From \ To   0       1       2       3
0           0.080   0.184   0.368   0.368
1           0.632   0.368   0       0
2           0.264   0.368   0.368   0
3           0.080   0.184   0.368   0.368

• What is the expected first passage time from state 3 to state 0 ($\mu_{30}$)?
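Enumerating paths by length can be automated. The sketch below is not from the slides: it uses the standard recursion in which a first passage in n steps from i means moving to some state k other than the target and then making a first passage from k in n − 1 steps. The name `first_passage_probs` and the 200-step truncation are assumptions made for illustration.

```python
# Inventory example transition matrix from the slides.
P = [
    [0.080, 0.184, 0.368, 0.368],
    [0.632, 0.368, 0.0, 0.0],
    [0.264, 0.368, 0.368, 0.0],
    [0.080, 0.184, 0.368, 0.368],
]

def first_passage_probs(p, target, max_steps):
    """Return f where f[n - 1][i] = P(first passage time from i to target is n)."""
    size = len(p)
    f = [[p[i][target] for i in range(size)]]  # n = 1: direct transition
    for _ in range(max_steps - 1):
        prev = f[-1]
        f.append([
            sum(p[i][k] * prev[k] for k in range(size) if k != target)
            for i in range(size)
        ])
    return f

f = first_passage_probs(P, target=0, max_steps=200)
print("P(FPT from 3 to 0 is 1) =", round(f[0][3], 3))
print("P(FPT from 3 to 0 is 2) =", round(f[1][3], 3))

# Truncated expectation: sum of n * P(FPT = n) over the first 200 steps.
mean_fpt_30 = sum((n + 1) * probs[3] for n, probs in enumerate(f))
print("approximate expected FPT from 3 to 0:", round(mean_fpt_30, 2))
```

Summing n times these probabilities approximates the expected first passage time; solving the linear equations for $\mu_{ij}$ avoids the truncation.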
FIRST PASSAGE TIMES (FPT)
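Understanding the FPT from state i to state j by considering the direct path and the indirect paths:

• Direct path i → j: the FPT is 1 with probability $p_{ij}$.
• Indirect path through k (k ≠ j): the first transition i → k takes 1 time period and occurs with probability $p_{ik}$; the expected remaining time from k to j is $\mu_{kj}$, so the expected FPT is $(1 + \mu_{kj})$.

Weighting each case by its probability:

$\mu_{ij} = p_{ij} \cdot 1 + \sum_{k \ne j} p_{ik} (1 + \mu_{kj}) = 1 + \sum_{k \ne j} p_{ik} \mu_{kj}$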
FIRST PASSAGE TIMES (FPT): INVENTORY
EXAMPLE
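From \ To   0       1       2       3
0           0.080   0.184   0.368   0.368
1           0.632   0.368   0       0
2           0.264   0.368   0.368   0
3           0.080   0.184   0.368   0.368

What is the expected first passage time from state 3 to state 0 ($\mu_{30}$)?

$\mu_{ij} = 1 + \sum_{k \ne j} p_{ik} \mu_{kj}$

$\mu_{30} = 1 + p_{31} \mu_{10} + p_{32} \mu_{20} + p_{33} \mu_{30}$
$\mu_{20} = 1 + p_{21} \mu_{10} + p_{22} \mu_{20} + p_{23} \mu_{30}$
$\mu_{10} = 1 + p_{11} \mu_{10} + p_{12} \mu_{20} + p_{13} \mu_{30}$

After inserting the transition probability values:

$\mu_{30} = 1 + 0.184 \mu_{10} + 0.368 \mu_{20} + 0.368 \mu_{30}$
$\mu_{20} = 1 + 0.368 \mu_{10} + 0.368 \mu_{20}$
$\mu_{10} = 1 + 0.368 \mu_{10}$

Solving:

$\mu_{10} = 1.58$ weeks
$\mu_{20} = 2.51$ weeks
$\mu_{30} = 3.50$ weeks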
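The three simultaneous equations can be solved numerically. The following is a minimal sketch under the slide's transition matrix, using a small hand-written Gaussian elimination so that it needs no third-party library; `solve` and `expected_first_passage_times` are hypothetical names. With the three-decimal probabilities shown, the results should agree with the slide values to within rounding.

```python
# Inventory example transition matrix from the slides.
P = [
    [0.080, 0.184, 0.368, 0.368],
    [0.632, 0.368, 0.0, 0.0],
    [0.264, 0.368, 0.368, 0.0],
    [0.080, 0.184, 0.368, 0.368],
]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def expected_first_passage_times(p, target):
    """Solve mu_i = 1 + sum over k != target of p_ik * mu_k for all i != target."""
    others = [i for i in range(len(p)) if i != target]
    a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others] for i in others]
    b = [1.0] * len(others)
    return dict(zip(others, solve(a, b)))

mu = expected_first_passage_times(P, target=0)
for state in sorted(mu):
    print(f"mu_{state}0 = {mu[state]:.2f} weeks")
```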
RECURRENCE TIMES: INVENTORY
EXAMPLE
• $\mu_{ii}$: expected recurrence time for state i
• What is the expected recurrence time for state 0, i.e. the value of $\mu_{00}$?
• $\mu_{00} = 1 + p_{01} \mu_{10} + p_{02} \mu_{20} + p_{03} \mu_{30} = 3.50$ weeks
• $\mu_{00} = 3.50$ weeks $= 1/\pi_0$
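The identity $\mu_{00} = 1/\pi_0$ can be cross-checked from the steady-state side. The sketch below is an illustration rather than the slides' method: it estimates the steady-state probabilities by repeatedly multiplying a starting distribution by the transition matrix (power iteration), assuming the chain is irreducible and aperiodic so that the iteration converges. The name `steady_state` and the iteration count are assumptions.

```python
# Inventory example transition matrix from the slides.
P = [
    [0.080, 0.184, 0.368, 0.368],
    [0.632, 0.368, 0.0, 0.0],
    [0.264, 0.368, 0.368, 0.0],
    [0.080, 0.184, 0.368, 0.368],
]

def steady_state(p, iterations=200):
    """Approximate the steady-state distribution by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = steady_state(P)
recurrence_time_0 = 1.0 / pi[0]
print("pi =", [round(x, 3) for x in pi])
print("1 / pi_0 =", round(recurrence_time_0, 2), "weeks")
```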
