Reinforcement Learning - Assignment 2
Reinforcement Learning - Assignment 2
Question 1
In [2]: total_steps = 100
n = 8
Pmat = np.array([[90.81,8.33,0.68,0.06,0.08,0.02,0.01,0.01],
[0.7,90.65,7.79,0.64,0.06,0.13,0.02,0.01],
[0.09,2.27,91.05,5.52,0.74,0.26,0.01,0.06],
[0.02,0.33,5.95,85.93,5.3,1.17,1.12,0.18],
[0.03,0.14,0.67,7.73,80.53,8.84,1,1.06],
[0.01,0.11,0.24,0.43,6.48,83.46,4.07,5.2],
[0.21,0,0.22,1.3,2.38,11.24,64.86,19.79],
[0,0,0,0,0,0,0,100],
],dtype=float) / 100
P = np.zeros((total_steps,n,n),dtype=np.float64)
P[0] = Pmat
for t in range(1,total_steps):
P[t] = np.matmul(P[t-1],Pmat)
Question 2
The Markov chain has two classes.
file:///Users/salmanyousaf/Downloads/HW Assignment 2 (2).html 5/10
07/11/2023, 13:05 HW Assignment 2
The first class contains the states "AAA", "AA", "A", "BBB", "BB", "B", and "CCC".
These states can all transition to each other.
The second class contains the single state "D". Once the chain enters state "D", it
cannot leave.
In other words, the first class is a group of states that the chain can move between
freely. The second class is a single state that the chain can only enter once and then
cannot leave.
Question 3
If a Markov chain has an absorbing state (a state that, once entered, cannot be left),
the periodicity of the chain is 1, making the chain aperiodic.
Question 4
In [3]: np.random.seed(1234)
states = ['AAA']
transition_lh = 1.0
summary = pd.DataFrame(columns=['Step','State_i','State_j','Transition Pr
for t in range(total_steps):
next_state = np.random.choice(range(n), p = Pmat[ratings.index(states
states.append(ratings[next_state])
transition_lh = Pmat[ratings.index(states[-2]),ratings.index(states[-
summary = pd.concat([summary, pd.DataFrame({'Step':[t+1],'State_i':[s
'Transition Probability':
In [4]: plt.figure(figsize=(10,5))
plt.step(range(total_steps+1),states)
plt.title('Bond Transition Simulation with Initial AAA Rating')
plt.xlabel('Time (t)')
plt.ylabel('State')
In [5]: display(summary)
Question 5
In [8]: Q = Pmat[:-1,:-1]
I = np.identity(Q.shape[0])
N = np.linalg.inv(I - Q)
AAA AA A BBB BB B CC
C
AAA 0.922412 0.947005 0.946123 0.902255 0.791519 0.779229 0.53810
4
AA 0.162693 0.953632 0.953058 0.906814 0.791358 0.779837 0.53853
6
A 0.098821 0.488781 0.971262 0.914662 0.796703 0.778549 0.53844
5
BBB 0.069195 0.330421 0.628312 0.954371 0.795564 0.77178 0.55083
1
BB 0.047866 0.218362 0.403173 0.594373 0.919893 0.815265 0.50701
8
B 0.029006 0.127264 0.229114 0.326122 0.495025 0.910368 0.47074
9
CCC 0.021674 0.07644 0.135751 0.192684 0.265219 0.413163 0.72305
5
The Markov chain is irreducible, suggesting that it is possible to get from any state to
any other state in the Markov chain.
Question 7
In [14]: # Define the number of states and periods
N = 8
T = 5
If a bond currently has an ""AAA" rating, there’s a 91.02% chance it will maintain that
rating within 5 periods. If a bond currently has a "AA" rating, there’s only a 2.98%
chance it will upgrade to an "AAA" rating within 5 periods and so on.
In [15]: # Print the probabilities
print("\nProbability of reaching CCC rating within 5 periods:")
for i, label in enumerate(ratings):
print(f"From {label}: {prob_CCC[i]:.6f}")
If a bond currently has an “AAA” rating, there’s only a 0.09% chance it will downgrade
to a “CCC” rating within 5 periods. However, if a bond currently has a “CCC” rating,
there’s a 66.51% chance it will maintain that rating after 5 periods and so on.
In [16]: # Print fi,i for each state
for i in range(n):
print(f'fi,{ratings[i]}: {round(P[0][i,i],4)}')
fi,AAA: 0.9081
fi,AA: 0.9065
fi,A: 0.9105
fi,BBB: 0.8593
fi,BB: 0.8053
fi,B: 0.8346
fi,CCC: 0.6486
fi,D: 1.0
Confirms intuition.
In [ ]: