CS2A Final Notes

Chapter – 1 (Stochastic Process)

1) Definition:
A stochastic process is a set/family/collection of ordered (time-indexed) random
variables {Xt : t in J}.
e.g. Xt = score in a T-20 innings after t balls,
where: State Space (S) = {0, 1, 2, ..., 720}
and, Time Domain (J) = {1, 2, 3, ..., 120}

2) State Space:
It is the set of values which the random variable Xt can take.
e.g. the state space of tossing a coin is {H, T}.

3) Time Set:
It is the set of times at which the process is defined, i.e. at which the process contains a random variable Xt.
e.g. the time set of repeated coin tosses is {0, 1, 2, ...}

4) Sample path:
 A joint realisation of the random variables Xt for all t in J is called a sample path of
the process; this is a function from J to S.
 e.g. one possible sample path of tossing a coin twice is (H, T); the set of possible sample
paths is {HH, HT, TH, TT}.

5) Mixed Random Variables:


A mixed RV is otherwise a continuous RV but it has a probability mass at one or more
points.
For example:
i) Annual revenue of a garage that sells 10-20 cars each year at either $40k or $50k, and
also earns continuous car-service income.
Cars sold: S = {40000, 50000}, J = {10, 11, 12, ..., 20}
Car service: S = (0, ∞), J = (0, ∞)
The combined revenue is then a mixed RV (discrete sale amounts plus a continuous service component).
ii) The number of contributors to a pension plan can be modelled with S = {1,2,3,...},
J = [0, ∞)
6) Counting Process:
 Unit changes only (+1)
 Non-decreasing
 Examples:
i) No. of wickets in a match, S = {0, 1, 2, ...}
ii) No. of claims reported to an insurance company by time t.
 J can be discrete or continuous.

7) Increments:
It is the change in the process over a period of time, e.g. X(t+u) – Xt for u > 0.

8) Simple Random Walk (symmetric when p = 1/2):

Xt = X(t-1) + Zt  (last observed value plus the new increment)
Where Zt = +1 with probability p
           -1 with probability (1-p)
Hence,
Xt = X0 + sum(s=1 to t): Zs
If X0 = 0,
E[Xt] = (2p-1)*t
Var[Xt] = [1-(2p-1)^2]*t
Note: Sometimes the increments of a process have simpler properties (e.g. independence
of time) than the process itself, so it is better to model the increments than the process.
State Space: Discrete (the set of all integers, Z)
Time Domain: Discrete (the non-negative integers {0, 1, 2, ...})
Stationarity: it is not weakly stationary (its mean and variance depend on t).
Markov Property: It has the Markov property.
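A minimal R sketch (p = 0.7 and t = 100 are chosen purely for illustration) that simulates the position after t steps for many paths and checks the mean and variance formulas above:

  # simulate simple random walks and compare with E[Xt] = (2p-1)t, Var[Xt] = (1-(2p-1)^2)t
  set.seed(1)
  p <- 0.7; t <- 100; nsim <- 10000
  Xt <- replicate(nsim, sum(sample(c(1, -1), t, replace = TRUE, prob = c(p, 1 - p))))
  c(mean(Xt), (2*p - 1)*t)            # simulated vs theoretical mean
  c(var(Xt), (1 - (2*p - 1)^2)*t)     # simulated vs theoretical variance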

9) General Random Walk:


If Y1, ..., Yj, ... is a sequence of independent and identically distributed variables then the
process
Xn = ∑j=1 to n Yj, with initial condition X0 = 0,
is a general random walk.
State Space: Discrete/Continuous, depending upon the variable Yj
Time Domain: Discrete (the non-negative integers {0, 1, 2, ...})
For example: modelling a share price daily, or an inflation index measured monthly.
Mean: if E[Yj] = m then E[Xn] = m*n, which depends on n, hence the process is not stationary.
Markov property: It has the Markov property.

10) White Noise Process:


{en} where the ei's are IID RVs with mean 0 and variance σ^2.
It is a stochastic process consisting of a collection of independent RVs.
e.g. ei ~ N(0, σ^2); Zt is a white noise process with mean zero and constant volatility.
i) J (time set) can be discrete or continuous.
ii) S (state space) can be discrete or continuous.
iii) White noise is weakly stationary.
iv) The Markov property holds.

11) First level of classification:


 Discrete S with Discrete J: NCD (No Claim Discount) Model
 Discrete S with Continuous J: HSD (Health – Sick – Dead) Model
 Continuous S: Claims paid to policyholders

Examples by classification:
 Discrete S, Discrete J: simple random walk, markov chain, NCD, credit rating at the end of the year
 Discrete S, Continuous J: poisson process, markov jump, counting process, status of pension scheme members
 Continuous S, Discrete J: general random walk, time series, white noise, inflation index
 Continuous S, Continuous J: brownian motion, ito process, compound poisson, share price during trading period

12) Second level of classification:


 Stationarity
 Independent increments
 Markov property

13) Stationarity:
 Strict Stationarity: A stochastic process is said to be strictly stationary
if the joint distributions of
Xt1, Xt2, ..., Xtn and X(k+t1), X(k+t2), ..., X(k+tn) are identical, i.e.
f(Xt1, Xt2, ..., Xtn) = f(X(k+t1), X(k+t2), ..., X(k+tn))
for all t1, t2, ..., tn and k+t1, k+t2, ..., k+tn in J and all integers n.
Hence, the statistical properties (mean, variance etc.) of the process remain unchanged as
time elapses.

 Weak Stationarity:
Because strict stationarity is very difficult to test in real life, we use the less stringent
condition of weak stationarity:
i) E[Xt] = constant (free of t)
ii) Cov[Xt, X(t+k)] = a function of the time lag k only, not of t
iii) Var[Xt] = Cov[Xt, Xt] = constant, not a function of t (the case of time lag k = 0)
iv) A random walk is not stationary as its mean is a function of time t.
v) A white noise process (Zt) is strictly stationary with mean 0 and variance σ^2 (both
constant).
vi) A weakly stationary multivariate normal process is strictly stationary:
A normal distribution is defined by its mean, µ, and its variance, σ2, only. So if these
are constant (as per the weakly stationary definition) then this uniquely defines
the process. Hence it will also be strictly stationary.

14) Independent Increments:


A process Xt is said to have independent increments if for all t and u > 0, the
increment X(t+u) – Xt is independent of all the past of the process {Xs : 0 <= s <= t}.

15) Stationary Increments:


A process Xt is said to have stationary increments if for all t and s (with t > s), the distribution of the
increment X(t) – X(s) depends only upon the time lag (t-s).

16) Markov Property:


A process is said to have the Markov property if the future evolution of the process can
be completely determined by knowledge of its current state alone, so that all the past
information becomes irrelevant.
Note: A process with independent increments has the Markov property.
17) Important Results:

Properties                  White Noise   Random Walk   Poisson process
(a) Weakly Stationary       Yes           No            No
(b) Independent Increments  No            Yes           Yes
(c) Markov property         Yes           Yes           Yes

Note: Moving average process is weakly stationary.

18) Poisson Process:


A Poisson process N(t) with rate λ operates in continuous time J and has discrete S (the set of
non-negative integers):
N(0) = 0
N(t) has independent increments.
N(t) – N(s) ~ Poisson(λ(t-s)) for t > s, so for n = 0, 1, 2, ...
P[ N(t) – N(s) = n ] = exp(-λ(t-s)) * (λ(t-s))^n / n!
For examples:
i) Claim arriving to insurance company through time.
ii) Car accident reported over time.
iii) Arrival of customer at a service point over time.
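A small R sketch (rate λ = 2 and interval length 0.5 are chosen purely for illustration) evaluating the increment distribution above and simulating one path of event times:

  lambda <- 2; s <- 0; t <- 0.5
  dpois(0:4, lambda * (t - s))            # P[N(t)-N(s) = n] for n = 0,...,4
  # simulate event times on [0, 10] using exponential inter-arrival times
  set.seed(1)
  arrivals <- cumsum(rexp(50, rate = lambda))
  arrivals <- arrivals[arrivals <= 10]
  length(arrivals)                        # number of events by time 10, approx Poisson(20)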

19) Compound Poisson Process:


 Let Nt, t >= 0, be a Poisson process.
 Let Y1, Y2, ..., Yj, ... be a sequence of IID random variables, independent of Nt.
 Then the compound Poisson process is defined as
Xt = ∑(j=1 to Nt): Yj
 State space: Discrete/Continuous, depending upon the variable Yj
 Time domain: [0, ∞)
 Example: Modelling total amount of claim to an insurance company over time.

20) Mixed Stochastic Process:


A stochastic process in continuous time which can also change value at pre-determined
discrete instants.
Chapter – 2 (Markov Chains)
1) Properties of Markov Chain:
a) S is discrete.
b) J is discrete.
c) It holds Markov property.

2) Transition Diagram Graph:


E.g. NCD (No Claim Discount) Model with discount states 0%, 30% and 60%:
[Transition diagram: a claim-free year (probability 0.75) moves the policyholder up one
discount level (or keeps them at 60%); a claim year (probability 0.25) moves them down to 0%
from 0% or 30%, and to 30% from 60%, matching the TPM below.]
It is a one-step Markov model.

3) Transition Probability Matrix (TPM):


          0%     30%    60%
  0%    [ 0.25   0.75   0    ]
P = 30% [ 0.25   0      0.75 ]
  60%   [ 0      0.25   0.75 ]

a) It is always a square matrix.


b) Each row of the TPM is a conditional probability distribution.
c) Since each row is a probability distribution, its entries sum to 1.
4) Notation: the probability of moving from state i at time t to state j at time t+n (i.e. in n steps):

P[ X(t+n) = j | Xt = i ] = p(t,t+n)ij if time-inhomogeneous
                         = p(n)ij     if time-homogeneous
5) Chapman – Kolmogorov Equation:
p(t,t+n)ij = Sum(k in S): ( p(t,t+s)ik * p(t+s,t+n)kj )  for 0 < s < n, if time-inhomogeneous
p(n)ij = Sum(k in S): ( p(s)ik * p(n-s)kj )  for 0 < s < n, if time-homogeneous

6) Initial Condition Method:


X0 = [ 1, 0, 0 ]  (start in state 0% with certainty)
X1 = X0*P
X2 = X1*P
X3 = X2*P ...
p(3)00 is then the first element of X3, i.e. the probability of being in state 0% again after 3 steps.
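A minimal R sketch of the initial condition method for the NCD matrix above:

  P <- matrix(c(0.25, 0.75, 0,
                0.25, 0,    0.75,
                0,    0.25, 0.75), nrow = 3, byrow = TRUE)
  x0 <- c(1, 0, 0)                 # start in the 0% state
  x3 <- x0 %*% P %*% P %*% P       # distribution after 3 steps
  x3[1]                            # p(3) from 0% to 0%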

7) Simple Random Walk when S = {……….,-3, -2, -1, 0, 1, 2, 3,…..….}


a) If the difference between the states, j - i, is odd (e.g. for p(6)27, j - i = 5)
b) and the number of steps n is even (e.g. for p(6)27, n = 6),
or vice-versa of points a) and b),
then p(n)ij = 0 (the walk cannot reach j from i in n steps).

a) Let u be the number of up-steps; then u = (n+j-i)/2

b) For u to be a whole number, (n+j-i) must be even,
c) i.e. n and (j-i) are either both odd or both even.
d) p(n)ij = nCu * p^u * (1-p)^(n-u)  if 0 <= n+j-i <= 2n and n+j-i is even
          = 0                       otherwise
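A short R sketch of the formula in d); the values p = 0.6, i = 2, j = 4, n = 6 are chosen purely as an illustration:

  pn_ij <- function(n, i, j, p) {
    u <- (n + j - i) / 2                     # number of up-steps needed
    if (u != round(u) || u < 0 || u > n) return(0)
    choose(n, u) * p^u * (1 - p)^(n - u)
  }
  pn_ij(6, 2, 4, 0.6)    # e.g. p(6) from state 2 to state 4
  pn_ij(6, 2, 7, 0.6)    # parity mismatch, returns 0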
8) Boundary Conditions:
a) Absorbing boundary: P[ X(n+1) = 0 | Xn = 0] = 1
b) Reflecting boundary: P[ X(n+1) = 1 | Xn = 0] = 1
c) Mixed boundary: i) P[ X(n+1) = 0 | Xn = 0] = α
ii) P[ X(n+1) = 1 | Xn = 0] = 1 - α

9) Simple Random Walk when S = { 0, 1, 2, 3, 4, 5, 6,……………..…….}


e.g. NCD model is an example of a bounded random walk.

10) Steps from Non- Markov to Markov Model:


a) Identify the critical state.
b) Redefine the state space by redefining the critical state which is done by incorporating
the relevant past information into the definition of critical state.
11) A model of accident proneness:
Yj = 0 : no accident in year j
Yj = 1 : accident in year j

P[ Y(n+1) = 1 | Y1 = y1, …………Yn = yn ] = f (y1+ y2+….yn) / g(n) or


P[ Y(n+1) = 1 | Y1 = y1, …………Yn = yn ] = Total Number of accident / No. of years

where f and g are increasing functions,

and f and g must satisfy the inequality 0 <= f(m) <= m.

12) Stationary Distribution: Π*P = Π


Conditions:
a) If S is finite, then a stationary distribution exists.
b) If S is finite and the chain is irreducible, then there is a unique stationary distribution.
c) If S is finite and the chain is irreducible and aperiodic, there is a unique stationary distribution and
the chain converges to it ("settles down") regardless of the starting state.
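A minimal R sketch finding the stationary distribution of the NCD chain above by solving Π*P = Π together with sum(Π) = 1 (here via the left eigenvector of P for eigenvalue 1):

  P <- matrix(c(0.25, 0.75, 0,
                0.25, 0,    0.75,
                0,    0.25, 0.75), nrow = 3, byrow = TRUE)
  e <- eigen(t(P))                       # left eigenvectors of P
  statdist <- Re(e$vectors[, 1])         # eigenvector for eigenvalue 1 (largest modulus)
  statdist <- statdist / sum(statdist)   # normalise so the probabilities sum to 1
  statdist
  statdist %*% P                         # check: this reproduces the stationary distribution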

13) Irreducibility:
If there always exists a path from any state i to any state j (in some number of steps), then the
Markov chain is irreducible.

14) Periodicity:
A state in a Markov chain is periodic with period d > 1 if a return to that state is possible
only in a number of steps that is a multiple of d.
Results:
a) If a Markov chain is irreducible, all its states have the same period, or all the states are
aperiodic.
b) A Markov chain has period d if all the states in the chain have period d.
c) A Markov chain is aperiodic if there is no such d > 1.
d) Note: if the chain may remain in its current state (a loop), that state is aperiodic.

15) Estimating transitional probabilities:


p̂ij = Nij / Ni
where Nij | Ni ~ Bin(Ni, pij)

16) Triplet test:


A chi-square test applied to triplets of successive states, used to check whether the Markov
(one-step dependence) assumption is appropriate for the data.
Chapter – 3 ( The two-state Markov
Model and the Poisson Model )
1) Properties of the two-state Markov model:
a) Discrete S
b) Continuous J
c) It satisfies the Markov property.
d) The two-state model is an example of a Markov jump process.

2) Two-state Model:

[Diagram: states A (alive) and D (dead), with the transition A -> D observed over ages x+t, x+t+h and x+t+1.]
S = {A, D}
J = (0, ∞)
a) P[ Alive at age x+t+h | Alive at age x ] = p(x, x+t+h)AA = (t+h)px
b) P[ Dead at age x+t+h | Alive at age x ] = p(x, x+t+h)AD = (t+h)qx = 1 - (t+h)px
c) P[ Dead at age x+t+h | Dead at age x+t ] = 1 (death is an absorbing state)
d) P[ Alive at age x+t+h | Dead at age x+t ] = 0

3) Assumptions of Markov Two-state Model:


a) The two-state model satisfies the Markov property.
b) hqx+t ~= h*µx+t for small h
c) µx+t is a constant µ for 0 <= t < 1.
d) ∂/∂t tpx = - tpx * µx+t, hence tpx = exp[ -int(0,t): µx+s ds ]
e) Using assumption c), tpx = exp(-µt)

Note: Transition rates are the parameters of Markov jump process.


4) Statistics:
a) Define RV Di as follows:
i) Di is an indicator of whether life i is observed to die.
ii) Di ~ Bernoulli( (bi-ai)q(x+ai) )
iii) E[Di] = (bi-ai)q(x+ai)
iv) V[Di] = (bi-ai)q(x+ai) * ( 1 – (bi-ai)q(x+ai) )

b) Define RV Ti as follows:
i) x + Ti = age at which observation of the ith life ends.
ii) Di = 0 => Ti = bi, i.e. no death was observed (a discrete point mass).
iii) Di = 1 => ai < Ti < bi, i.e. death must have occurred between ages x+ai and x+bi
(continuous).
iv) Ti is a mixed RV because it has a probability mass at bi.

5) Vi = Ti – ai (observed waiting time / duration)
i) Vi = exact waiting time of life i
ii) Di = 0 => Ti = bi => Vi = bi – ai => maximum possible waiting time
iii) Di = 1 => ai < Ti < bi => 0 < Vi < bi – ai => exact waiting time to death.
iv) Vi is a mixed RV with a probability mass at bi – ai.
v) Joint probability distribution of di and vi:
f(di, vi) = (vi)p(x+ai) * ( µ(x+ai+vi) )^di = exp(-µ*vi) * µ^di under constant µ
vi) µ̂ = sum(di)/sum(vi) = d/v = total number of deaths / total observed waiting
time. This is the maximum likelihood (point) estimate of µ.

6) Asymptotically, µ̃ ~ N( µ, d/v^2 ) (the variance is estimated by d/v^2)
95% CI for µ = µ̂ +/- 1.96*sqrt(d/v^2)
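A minimal R sketch of this estimate and interval; d = 15 deaths and v = 482 years of exposure are made-up numbers for illustration:

  d <- 15; v <- 482                    # observed deaths and total waiting time
  mu_hat <- d / v                      # MLE of the force of mortality
  se <- sqrt(d / v^2)                  # estimated standard error
  mu_hat + c(-1.96, 1.96) * se         # approximate 95% confidence interval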

7) Poisson Model:
Dx ~ Poisson( µx+0.5 * Ecx )
where Ecx = central exposed to risk (total observed waiting time)
µ̂x+0.5 = d / Ecx = number of deaths / total observed waiting time = d/v
V( µ̂x+0.5 ) = d/(Ecx)^2
Chapter – 4 & 5 (Time-inhomogeneous
Markov jump process)
1) Markov jump processes:
 A continuous-time Markov process Xt, t >= 0, with a discrete state space S
is called a Markov jump process.
 That is, a Markov jump process is a stochastic process with a continuous time
set and a discrete state space that satisfies the Markov property.

2) Properties of Valid Generator Matrix:


i) The transition rates of moving from each state to each other state must
be non-negative. That is, µij >= 0 for i ≠ j
ii) The sum of each row must be zero.
iii) Should be a finite or countable square matrix.

3) Difference between time-homogeneous vs time-inhomogeneous Markov


jump processes:
 The probability that an event occurs during the short interval between time t
and t+h is approximately λ(t)*h.
 For a time-inhomogeneous process, λ(t) depends upon the current time t.
 For a time-homogeneous process, λ is independent of time.

4) Conditions for Valid Time homogeneous Jump Markov Process:


 A Markov jump process is time homogeneous if the transition
probabilities P[Xt = j | Xs = i] depend only on the length of time interval t-
s. or
 A Markov jump process is time homogeneous if the transition rates are
constant.
5) Difficulties in Time-Inhomogeneous Markov jump processes:
 It requires more parameters and complex calculations.
 There may be no sufficient data available to estimate parameters.
 The solution to the Kolmogorov equations may not be easy (or even possible) to find
analytically. A possible procedure is to divide the time interval into subintervals, assume
that the transition rates are constant over each subinterval, and estimate the transition
rates for each subinterval using the procedure described above.
 Alternatively, we could select an appropriate functional form for µij(t) and use the
data to estimate the relevant parameters. This is only possible if we have an idea of
what kind of formula would be appropriate.

6) Features of time-inhomogeneous Markov jump processes:


a) Chapman – Kolmogorov Equation:
pij(s,t) = ∑k∈S pik(s,u)*pkj(u,t) for all s<u<t
b) Transition Rates:
pij(s,s+h) = h*µij(s) + o(h)      if i≠j
pii(s,s+h) = 1 + h*µii(s) + o(h)  if i=j

7) Kolmogorov’s forward differential equations:


a) General Form:
∂/∂t pij(s,t) = ∑k∈S pik(s,t)*µkj(t)
where µkj(t) is the force of transition from state k to j at time t.

b) Compact Form:
∂/∂t P(s,t) = P(s,t)*A(t)
where A(t) is the matrix with entries µij(t).

c) Integrated Form:
pij(s,t) = ∑k≠j int(0, t-s): [pik(s,t-w)] * [µkj(t-w)] * [exp(- int(t-w, t): λj(u)du)] dw
where the integrand combines
 the probability of going from state i at time s to state k at time t-w,
 then making a transition from state k to state j at time t-w,
 and staying in state j from time t-w to time t.
8) Time-inhomogeneous HSD model (KFDE):

[Diagram: three states H (Healthy), S (Sick), D (Dead), with transition rates σ(t): H→S,
ρ(t): S→H, µ(t): H→D and ν(t): S→D.]

a) Standard Forward Equation (e.g. for pHD):

∂/∂t pHD(s,t) = pHH(s,t)*µ(t) + pHS(s,t)*ν(t) + pHD(s,t)*µDD(t)
             = pHH(s,t)*µ(t) + pHS(s,t)*ν(t)    (since µDD(t) = 0, D being absorbing)

b) Non-standard forward equations (occupancy probabilities):

pH̅H̅(s,t) = exp[ - int(s,t): (σ(u) + µ(u))du ]
pS̅S̅(s,t) = exp[ - int(s,t): (ρ(u) + ν(u))du ]
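A rough R sketch (not from the notes) that integrates the Kolmogorov forward equation ∂/∂t P(s,t) = P(s,t)*A(t) for the HSD model with a simple Euler scheme; the constant rates σ = 0.1, ρ = 0.3, µ = 0.02, ν = 0.05 are made up purely for illustration:

  sigma <- 0.1; rho <- 0.3; mu <- 0.02; nu <- 0.05
  A <- matrix(c(-(sigma + mu), sigma,       mu,
                rho,           -(rho + nu), nu,
                0,             0,           0), nrow = 3, byrow = TRUE)   # generator, rows H, S, D
  h <- 1/1000                                  # Euler step size
  P <- diag(3)                                 # P(s,s) = identity matrix
  for (i in 1:(10/h)) P <- P + h * (P %*% A)   # step forward from t = s to t = s + 10
  round(P, 4)                                  # e.g. P[1,3] approximates pHD(s, s+10)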

9) Time-inhomogeneous Poisson Process with transition rate λ(t) at time t:


KFDE: ∂/∂t P(s,t) = P(s,t)*A(t)
KBDE: ∂/∂t P(s,t) = -A(s)*P(s,t)
Where P(s,t) is the matrix of transition probabilities
And, A(t) is generator matrix at time t.

10) Occupancy Probabilities:


pi̅i̅(s,t) = exp(- int(s,t): λi(u)*du ) or exp(- int(0,t-s): λi(s+u)*du )
where λi(u): total force of transition out of state i at time u.

11) Important Result:


µii(s) = - ∑j≠i µij(s)

12) Kolmogorov’s Backward Differential Equations:


a) General Form:
∂/∂s pij(s,t) = - ∑k∈S µik(s)*pkj(s,t)
where µik(s) is the force of transition from state i to k at time s.
b) Compact Form:
∂/∂s P(s,t) = -A(s)*P(s,t)
where A(s) is the matrix with entries µij(s).
c) Integrated Form:
pij(s,t) = ∑l≠i int(0, t-s): [exp(- int(s, s+w): λi(u)du)] * [µil(s+w)] * [plj(s+w,t)] dw
where the integrand combines
 the probability of remaining in state i from time s to time s+w,
 then making a transition to state l at time s+w,
 and finally going from state l at time s+w to state j at time t.

13) Time-inhomogeneous HSD model (KBDE):

[Diagram: as above, states H (Healthy), S (Sick), D (Dead) with rates σ(t): H→S, ρ(t): S→H,
µ(t): H→D and ν(t): S→D.]

a) Standard Backward Equations:

i) ∂/∂s pHH(s,t) = - [ µHH(s)*pHH(s,t) + µHS(s)*pSH(s,t) + µHD(s)*pDH(s,t) ]
                = - [ - ( σ(s) + µ(s) )*pHH(s,t) + σ(s)*pSH(s,t) ]
ii) ∂/∂s pHS(s,t) = - [ µHH(s)*pHS(s,t) + µHS(s)*pSS(s,t) + µHD(s)*pDS(s,t) ]
                = - [ - ( σ(s) + µ(s) )*pHS(s,t) + σ(s)*pSS(s,t) ]

b) Occupancy probabilities (as before):

pH̅H̅(s,t) = exp[ - int(s,t): (σ(u) + µ(u))du ]
pS̅S̅(s,t) = exp[ - int(s,t): (ρ(u) + ν(u))du ]

14) Residual Holding Time:

Rs: the residual holding time at time s is the further amount of time (beyond s) for which the
process stays in its current state.
For the residual holding time at s to exceed w, the process must stay in state i for all
times u between s and s+w, given that the process was in state i at time s.

In other words, the following is the probability that the process stays in state i for at least the next w time units:
P[ Rs > w | Xs = i ] = pi̅i̅(s, s+w) = exp[ - int(s,s+w): λi(t)*dt ]

15) Probability that the process goes into state j when it leaves state i:
a) Given that the process is in state i at time s,
b) and it stays in state i until time s+w,
c) the probability that it moves to state j on leaving at time s+w is: µij(s+w)/λi(s+w)
i) Force of transition of remaining in the same state i: µii(t) = -λi(t)
ii) Total force of transition out of state i: λi(t) = ∑j≠i µij(t)

16) Testing for Estimate of transition rates (MLE):


 Test whether the successive holding times are exponential random variables and
are independent.
 Any procedure which tests this is acceptable.

17) Expected waiting time in state j:


∑i (probability of reaching state j from state i) * (expected waiting time in
state i)

18) Transition rate based on Expected waiting time:


 If the expected waiting time in state i is x hours, then the total
transition rate out of state i is 1/x per hour (and λii = -1/x).
 If the probabilities of moving from state i to states j and k are 0.4 and 0.6
respectively, then
 λij = (1/x) * 0.4
 λik = (1/x) * 0.6

19) MLE of transition rates in time homogeneous Markov jump process:


µ̂ij = nij/ti and µ̂ii = -∑j≠i µ̂ij
where nij = observed number of transitions from state i to state j
ti = total observed waiting time in state i.
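A minimal R sketch of these estimators for a three-state example; the transition counts and waiting times below are invented for illustration:

  n <- matrix(c(0, 8, 2,
                5, 0, 4,
                0, 0, 0), nrow = 3, byrow = TRUE)   # nij: observed transitions from i to j
  t_i <- c(120, 45, 60)                             # total waiting time in each state
  mu_hat <- n / t_i                                 # divide row i by t_i[i] (vector recycles down columns)
  diag(mu_hat) <- -rowSums(mu_hat)                  # diagonal: minus the sum of the other rates in the row
  round(mu_hat, 4)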
Chapter – 6 (Survival Model)
1) Definitions:
a) T: Complete future lifetime of a new born.
b) F(t) = tq0
c) S(t) = tp0
d) Tx = Complete future lifetime of a person aged x.
e) FX(t) = tqx
f) SX(t) = tpx
g) FX(t) + SX(t) = 1
h) CDF of T: F(t) = P[T <= t] for all t >= 0
i) Survival function: Sx(t) = P[Tx > t] for all t >= 0
j) Hazard function / force of mortality: µx+t = lim(h→0) P[Tx <= t+h | Tx > t] / h
k) The random variable Tx takes values in the interval [0, ∞).
l) The survival function Sx(t) is a non-increasing function of t.

2) Consistency Condition result:


a) S(x+t) = S(x)*SX(t)
b) F(t+x) = F(x) + S(x)*FX(t)
c) FX(t) = ( F(x+t) – F(x) )/ (1- F(x))
d) SY(t) = SX(y+t-x) / SX(y-x)

3) Properties of o(h):
a) o(h) ± o(h) = o(h)
b) k * o(h) = o(h)
c) lim(h→0) o(h)/h = 0
d) If lim(h→0) g(h)/h = A, then g(h)/h = A + o(h)/h, i.e. g(h) = A*h + o(h).

4) Force of Mortality (µx) =


It is the instantaneous rate of mortality, i.e. the rate of change of the probability of death
with respect to time (over a short interval of length h):
hqx ~= h*µx + o(h)
5) Important result:
a) SX(t) * hX(t) = fX(t)
b) fX(t) = tpx * µx+t
c) f(t) = tp0 * µt
d) If TX ~ Exp(λ):
SX(t) = exp(-λt), fX(t) = λ*SX(t), and the hazard hX(t) = λ (constant).
e) tpx = SX(t) = exp( -int(0,t): µx+s*ds )
f) tpx = SX(t) = exp( -int(x, x+t): µs*ds )

6) Life tables results:


a) dx = lx – lx+1
b) tpx = lx+t / lx
c) tqx = (lx - lx+t) / lx; in particular qx = dx/lx

7) Constant force of mortality (constant µx+t):


a) Over one whole year of age:
px = exp(-µ)
b) If the end age is non-integer:
tpx = (exp(-µ))^t = exp(-µ*t) = (px)^t

c) If the start age and end age are both non-integer:

(t-s)px+s = (px)^(t-s)

8) Uniform distribution of deaths (increasing µx+t):


a) Over the year of age [x, x+1], the density of the time of death is constant:
fX(t) = qx for 0 <= t < 1
b) If the end age is non-integer:
tqx = t*qx

c) If the start age and end age are both non-integer:

(t-s)qx+s = (t-s)*qx / (1 - s*qx)

9) Balducci Assumption (decreasing µx+t over [x,x+1] ):


a) If the end age is non-integer:
(1-t)qx+t = (1-t)*qx
b) If the start age and end age are both non-integer:
(t-s)qx+s = (t-s)*qx / (1 - (1-t)*qx)
tqx = t*qx / (1 - (1-t)*qx)
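A small R sketch (qx = 0.1 and t = 0.5 chosen arbitrarily) comparing tqx under the three fractional-age assumptions:

  qx <- 0.1; t <- 0.5
  cfm      <- 1 - (1 - qx)^t                 # constant force of mortality: tqx = 1 - (px)^t
  udd      <- t * qx                         # uniform distribution of deaths
  balducci <- t * qx / (1 - (1 - t) * qx)    # Balducci assumption
  c(CFM = cfm, UDD = udd, Balducci = balducci)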

10) Initial and central rate of mortality:


a) Initial rate of mortality:
qx = It is the probability that a life exactly aged x will die before reaching exact age x+1,
which is called initial rate of mortality.
qx = dx/ lx
b) Central rate of mortality:
mx represents the rate of death per person-year lived between ages x and x+1 (a rate rather
than a probability).
c) Definitions of mx:
i) int(0,1): tpx * µx+t dt / int(0,1): tpx dt
ii) dx / int(0,1): lx+t dt
iii) qx / int(0,1): tpx dt
Note: int(0,1): tpx dt and exp(int(0,1): tpx dt) are different quantities.

d) Under CFM assumption, mx = µx


e) Where mx is weighted average of µx.
f) If µx is constant under CFM, mx will also be constant and equal to µx.

11) Order on the basis of:


a) qx (initial rate of mortality): Balducci < CFM < UDD
b) mx (central rate of mortality): Balducci > CFM > UDD

Note:
i) We would expect mx to be highest for that with lowest estimated exposure.
ii) For a given number of deaths over the period, the estimated exposure would be
highest if we assume an increasing mortality rate.
iii) If the actual number of survivors is less than expected, then the UDD assumption
may be appropriate (increasing mortality rate, deaths spread evenly), but only over single
years of age; UDD is not acceptable over a 10-year age span.
iv) Under UDD, the survival function decreases linearly between ages x and y, while under
CFM the survival function decreases exponentially between ages x and y.

12) Survival function of Gompertz Law of mortality:


tpx = g^(c^x * (c^t - 1))  where g = exp(-B/ln c)

13) Survival function for Makeham’s Law of mortality:


tpx = s^t * g^(c^x * (c^t - 1))  where g = exp(-B/ln c) and s = exp(-A)

14) Complete expectation [E(Tx)]:


ex0 = int(0, ∞): tpx*dt = int(0, ∞): SX(t)*dt
e00 = int(0, ∞): tp0*dt = int(0, ∞): S(t)*dt
Note: x + ex0 >= e00, because the complete expectation at age x is the average future lifetime of
those lives (or x-day-old bulbs) which have not died (failed) before age x. The average
therefore excludes those who died before age x, so the expected age at death of the survivors,
x + ex0, is at least as large as e00, which also reflects those who died young.

15) Complete expectation [E(Tx)] after Division at age x:


e00 = int(0,x): tp0 dt + xp0 * ex0

16) Curtate lifetime RV: P[K=k] = kpx*qx+k and E[Kx]:


i) Curtate expectation: ex = ∑(k=1 to ∞): kpx
ii) Approximate relationship between ex0 and ex: ex0 ~= ex + 0.5

17) Divide complete lifetime where µ changes at age x:


e00 = int(0,x): tp0 dt + xp0 ex0

18) Variance of complete and curtate future lifetime:


From book Page no. 328.

19) Formulas:
a) When PDF (fX(t)) is given:
i) FX(t) = int(0,t): fX(s)*ds
ii) SX(t) = 1 – int(0,t): fX(s)*ds
iii) µx+t = fX(t) / ( 1 - int(0,t): fX(s)*ds ) = PDF / Survival Function

b) When CDF (FX(t)) is given:


i) fX(t) = d/dt (FX(t))
ii) SX(t) = 1 – FX(t)
iii) µx+t = d/dt (FX(t)) / (1 - FX(t))
c) When SX(t) = tpx is given:
i) fX(t) = -d/dt (SX(t))
ii) FX(t) = 1 - SX(t)
iii) µx+t = -d/dt (SX(t)) / SX(t) = -d/dt ln(SX(t))

d) When µx+t is given:


i) fX(t) = exp( -int(0,t): µx+s*ds ) * µx+t
ii) FX(t) = 1 - exp( -int(0,t): µx+s*ds )
iii) SX(t) = exp( -int(0,t): µx+s*ds )

e) tpx = exp( -int(0,t): µx+s ds )

f) tqx = int(0,t): spx * µx+s ds

20) Where do we use Gompertz hazard?


Taking logarithms of the Gompertz hazard produces log µx = log B + x*log c, which indicates that
the rate of increase of the log hazard with age is constant. Empirically, this is often a reasonable
assumption for middle and older ages, which include the age range 50 - 65 years.

21) Use of Gompertz model under proportional Hazard Model:


 Putting B = exp(β0 + β1X1 + β2X2 + β3X3) into the Gompertz model produces
 λx = exp(β0 + β1X1 + β2X2 + β3X3) * c^x, defining x as the duration since the 50th birthday.
 The hazard can therefore be factorised into two parts:
 exp(β0 + β1X1 + β2X2 + β3X3), which depends only on the values of the covariates, and
 c^x, which depends only on duration.
 Therefore the ratio between the hazards for any two persons with different characteristics does
not depend on duration, and so the model is a proportional hazards model.
 In R (fitting the log hazard as a linear function of age, as in the sketch below):
 µx = B*c^x
 log µx = log B + x*log c
 Here, y = log µx, the explanatory variable is x (age/duration), and the two parameters are log B and log c.
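A rough R sketch of that regression; the crude hazards mu_crude at ages 50-65 are simulated here purely for illustration (in practice they would be dx / Ecx):

  set.seed(1)
  age <- 50:65
  mu_crude <- 0.004 * 1.09^(age - 50) * exp(rnorm(length(age), 0, 0.05))   # made-up crude hazards
  fit <- lm(log(mu_crude) ~ age)              # log mu_x = log B + x log c
  logB <- coef(fit)[1]; logc <- coef(fit)[2]
  c(B = exp(logB), c = exp(logc))             # fitted Gompertz parameters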
Chapter – 7 (Estimating the lifetime
distribution function)
Notations:
 n: lives under observation
 m: number of failures/deaths observed
 k: number of distinct times at which deaths are observed, i.e. t1 < t2 < ... < tk
 n-m: number of remaining lives (censored or still under observation)
 tj: times at which deaths are observed
 dj: number of deaths observed at time tj
 nj: number of lives at risk (available to die) just before time tj
 cj: number of lives censored between times tj and tj+1
 λj = dj/nj

Kaplan-Meier (Product Limit) Estimate of survival function:


 It is used to estimate the survival function from censored lifetime data.
 Ŝ(t) = ∏(tj<=t): (1-λ̂j) = ∏(tj<=t): (nj - dj)/nj
 = product over death times of (number of survivors / number at risk)
 Estimated hazard at time tj: λ̂j = dj/nj
 Greenwood's formula for the variance of the Kaplan-Meier estimator:
Var[F̂(t)] ~= Ŝ(t)^2 * ∑(tj<=t): dj / ((nj - dj)*nj), and Var[Ŝ(t)] = Var[F̂(t)] since Ŝ(t) = 1 - F̂(t)
 Approximate 95% CI: Ŝ(t) +/- 1.96*sqrt(Var[F̂(t)])
 Likelihood: L = ∏j=1 to k (1-λj)^(nj-dj) * λj^dj
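A minimal R sketch computing the Kaplan-Meier estimate directly from the formula above; the death times, numbers at risk and death counts are invented for illustration:

  tj <- c(2, 5, 7)          # times at which deaths are observed
  nj <- c(10, 8, 5)         # number at risk just before each death time
  dj <- c(1, 2, 1)          # deaths at each death time
  lambda <- dj / nj
  S_hat <- cumprod(1 - lambda)                          # Kaplan-Meier S(t) just after each tj
  greenwood <- S_hat^2 * cumsum(dj / (nj * (nj - dj)))  # Greenwood variance estimate
  data.frame(tj, S_hat, se = sqrt(greenwood))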

Assumption under Kaplan-Meier Method:


 Censoring is non-informative.
 The population is homogeneous.
 If an individual is censored at the same time as one of the deaths, the
convention is to treat the censoring as if it happens just afterwards.

Nelson-Aalen Model:
 It derives an estimator for the integrated (cumulative) hazard function Λt.
 Λ̂t = ∑(tj<=t): dj/nj = ∑(tj<=t): λ̂j, and Ŝ(t) = exp(-Λ̂t)
 Var[Λ̂t] ~= ∑(tj<=t): dj*(nj-dj)/nj^3
 L = ∏(censored lives) S(ti) * ∏(deaths) f(ti)
Types of Censoring:
1. Right Censoring:
Right censoring occurs when a life exits the investigation for a reason other than death.
Both random censoring and Type I censoring are examples of right censoring.
E.g. Endowment policy matures.

i) Random Censoring:
With random censoring, the censoring times are not known in advance – they are not
chosen by the investigator and are random variables.
 E.g. censoring in life insurance the event of a policyholder choosing to surrender a policy.

ii) Type I censoring:


Type I censoring occurs when the censoring times are known in advance, ie the censoring
times are chosen by the investigator.
 E.g. observation ceases for all those still alive at the end of the period of investigation.

2. Left Censoring:
It occurs when the censoring mechanism prevents us from knowing when entry into the state
that we wish to observe took place, i.e. the relevant past information is missing.
 e.g. the exact date of birth / date of falling sick is not known.

3. Interval Censoring:
It occurs when observation plan allows us to say that an event of interest fell within some
interval of time.
 E.g. a policy sold in the year 2020 was sold at some point in the interval 1/1/2020 to 31/12/2020.

4. Type – II Censoring:
If observation is continued until a predetermined number of deaths has occurred.
 e.g. a trial ends after 100 lives on a particular course of treatment have died.

5. Informative censoring:
Censoring is informative when there is reason to believe that the risk of death of the lives remaining in
the cohort differs from that of the lives which have left the cohort.
 E.g. In this investigation withdrawals might be informative, since lives that are in better health
may be more likely to surrender their policies than those in a poor state of health. Lives that are
censored are therefore likely to have lighter mortality than those that remain in the
investigation.
6. Non-informative censoring:
Censoring is non-informative if it gives no information about the future patterns of
mortality by age for the censored lives.
 E.g. In the context of this investigation, non-informative censoring occurs if at any given time,
lives are equally likely to be censored regardless of their subsequent force of mortality. This
means that we cannot tell anything about a person’s mortality after the date of the censoring
event from the fact that they have been censored.

Median:
The median time to qualify as estimated by the Kaplan-Meier estimate is the first time at which S(t) is
below 0.5.
Chapter – 8 (Proportional Hazard
Model)
Definition:
If the hazard for life i is λ(t; zi), then λ(t; zi) = λ0(t)*exp(β*ziT), where λ0(t) is the baseline hazard
and β is a vector of regression parameters.

Suitability of Cox Model:

 It ensures that hazard is always positive.


 Log hazard is linear.
 It allows the general shape of the hazard function for all individuals to be determined by the
data, giving a high degree of flexibility, while an exponential term accounts for
differences between individuals.
 If we are not primarily concerned with the precise form of hazard, we can ignore the
shape of baseline hazard and estimate the effect of covariates from the data directly.
 It is widely available in standard computer packages. It is a popular, well-established
model.

Why this model is known as Semi-Parametric?


The model is semi-parametric because it is possible to estimate the regression parameters β from the
data without estimating the baseline hazard. Therefore the baseline hazard can have any shape,
determined by the data.

Fully parametric models vs Cox regression model for assessing the impact of
covariates on survival:
 Fully parametric models are good for comparing homogenous groups, as confidence
intervals for the fitted parameters give a test of difference between the groups which
should be better than non-parametric procedures, or semiparametric procedures such
as the Cox model. But parametric methods need foreknowledge of the form of the
hazard function, which might be the object of the study
 The Cox model is semi-parametric so such knowledge is not required. The Cox model is a
standard feature of many statistical packages for estimating survival model, but many
parametric distributions are not, and numerical methods may be required, entailing
additional programming.
Proportional Hazard:
µi(t) = λ0(t) * exp(β1X1 + β2X2)

Proportional Hazard under Gompertz Law:


µx = B*c^x
 µx = exp(β0 + β1X1 + β2X2) * c^x, where B = exp(β0 + β1X1 + β2X2)
 equivalently, log µx = β0 + β1X1 + β2X2 + x*log c
where,
 exp(β0 + β1X1 + β2X2) depends only on the values of the covariates, and
 c^x depends only on the duration x.

Observation:
 tpx = g^(c^x * (c^t - 1)) where g = e^(-B/log c)
 The hazard B*c^x is an exponential function of age, which implies that the rate of
increase of log mortality with age is constant.
 The Gompertz model is appropriate at ages over about 30.
 It is appropriate for ages at which the force of mortality is increasing
exponentially.

Note: This model is suitable when hazard is monotonically increasing or


decreasing. If hazard is increasing/ decreasing in initial stages and then
decreasing/increasing at later points, Gompertz model is not appropriate.

Proportional Hazard under Makeham Law:


 µx = A + B*c^x
 tpx = s^t * g^(c^x * (c^t - 1)) where g = e^(-B/log c) and s = e^(-A)

Gompertz-Makeham Formula GM(r,s):


µx = poly1(t) + exp[poly2(t)]
where t is a linear function of x, and
poly1(t) and poly2(t) are polynomials of degree r and s respectively.

Likelihood of data:
L = ∏(i=1 to n) f(ti)^di * S(ti)^(1-di)
where f(ti) is the probability density function and S(ti) is the survivor function.
Since f(ti) = h(ti)*S(ti),
the likelihood can be rewritten as:
L = ∏(i=1 to n) h(ti)^di * S(ti)

Check the significance of β for a particular covariate:


H0: β = 0 (the covariate has no impact)
H1: β < 0 (the covariate decreases mortality), or
H1: β ≠ 0, or
H1: β > 0 (the covariate increases mortality)

Calculate 95% CI of β:
 (-∞ , β̂ + 1.645*sqrt(CRLB)) for H1: β < 0
 (β̂ - 1.96*sqrt(CRLB) , β̂ + 1.96*sqrt(CRLB)) for H1: β ≠ 0
 (β̂ - 1.645*sqrt(CRLB) , ∞) for H1: β > 0
Where CRLB = -1/E[d2l/dβ2] evaluated at β = β̂
Result: If 0 lies in the interval, there is insufficient evidence to reject the null
hypothesis (as stated above).
Chapter – 9 (Exposed to Risk)
Principle of correspondence:
 The principle of correspondence states that the death data and the exposed to risk must
be defined consistently, ie the numerator (dx) and denominator (Ecx) must correspond.
 A life alive at time t should be included in the exposure at age x at time t if and only if,
were that life to die immediately, he or she would be counted in the death data dx at
age x .

Importance of dividing the data for a mortality investigation into


homogeneous classes:
 All our models and analyses assume that we can observe groups of identical lives (with
respect to mortality characteristics). Although in practice this is never completely
possible, we can at least subdivide by characteristics which have a known impact on
mortality, reducing the heterogeneity of each class being investigated.
 A balance must be struck between obtaining more homogeneity and retaining large
enough populations to make analysis possible.
 If the groups are not homogeneous, any rates derived will be a weighted average of the
underlying rates for the different individuals in the group.
 The weightings may change with time, which will make it very difficult to establish what
patterns are emerging.
 If premiums are calculated based on mortality rates derived from heterogeneous
groups, then anti-selection may occur, with the healthier lives choosing to insure
themselves with an office where they will not be charged a premium based on others
with an inherent higher level of risk.
 Data can be sub-divided according to certain characteristics that we know to have a
significant effect on mortality. This will reduce the heterogeneity of each group, so that
we can at least observe groups with similar, but not the same, characteristics.
 The factors often used in life assurance: Sex, Age, Type of policy, Smoker/Non-smoker
status, Level of underwriting, Duration in force, Sales channel, Policy size, Occupation
(or social class) of policyholder, Known impairments, Geographical region, Educational
attainment, Housing tenure and Disability, chronic health condition, limiting long-term
illness.
Problems in collecting homogeneous data:
 Sub-dividing data using many factors can result in the numbers in each class being too
low. It is necessary to strike a balance between homogeneity of the group and retaining
a large enough group to make statistical analysis possible.
 Sufficient data may not be collected to allow sub-division. This may be because
marketing pressures mean proposal forms are kept to a minimum.

Assumptions:
i) The population varies linearly between two census dates, or birthdays are uniformly
distributed over the calendar year.
ii) We assume that deaths follow a Poisson distribution with parameter µx+1/2 * Ecx, where the age
to which µ applies (the mid-point of the rate interval) depends upon the definition of age used in the death data.

Formulas to estimate central exposed to risk:


1. If the population varies linearly between census dates:
Area of trapezium: ½ * period between censuses (in years) * sum of the parallel sides (the two census populations)
2. If the population remains constant between census dates:
Area of rectangle: length (period in years) * height (population)
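A minimal R sketch of the trapezium approximation; the census populations below (lives aged x at 1 January of each year) are invented for illustration:

  census_dates <- 0:3                   # census dates in years from the start of the investigation
  Px <- c(5200, 5350, 5100, 4950)       # population aged x at each census date
  # trapezium rule: half the sum of adjacent census counts times the gap between censuses
  Ecx <- sum(diff(census_dates) * (head(Px, -1) + tail(Px, -1)) / 2)
  Ecx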

General Formula of Central Exposed to Risk:


Ecx = int(0,T): Px(t)dt, approximated by summing census counts over the period, or
Ecx = the sum, over all lives, of the time spent under observation at age x (between entry to and
exit from the investigation, or between reaching age x and reaching age x+1).

Likelihood of hazard of death at age x:


MLE at age x: µ̂x = dx / Ecx
qx = 1 – px = 1 – exp(-µ̂x)

Census approximations to the central exposed to risk:


Sr. No.  Age definition in deaths (dx)   Age definition in census (P(x,t))   Formula for P*(x,t)
1        Nearest birthday                Last birthday                        ½ P(x-1,t) + ½ P(x,t)
2        Nearest birthday                Next birthday                        ½ P(x,t) + ½ P(x+1,t)
3        Next birthday                   Last birthday                        P(x-1,t)
4        Next birthday                   Nearest birthday                     ½ P(x-1,t) + ½ P(x,t)
5        Last birthday                   Nearest birthday                     ½ P(x,t) + ½ P(x+1,t)
6        Last birthday                   Next birthday                        P(x+1,t)
Data required for exact calculation of central exposed to risk:
For each life observed during the investigation period (eg period between 1/1/2016 to
1/1/2018) we need:
 date of birth or date of xth birthday.
 date of joining the investigation (if after 1 January 2016)
 Age nearest birthday

µ and q estimates:
Definition of x           Rate interval     µ̂ estimates    q̂ estimates
Age last birthday         [x, x+1]          µx+1/2          qx
Age nearest birthday      [x-½, x+½]        µx              qx-1/2
Age next birthday         [x-1, x]          µx-1/2          qx-1

Where qx is the initial rate of mortality.

µ̂x = dx/Ecx, where dx = dx,t + dx,t+1 + dx,t+2 + ... (deaths summed over the calendar years of the investigation)
Chapter – 10 & 11 (Graduation)
Graduation:
Graduation is a process of using statistical techniques to improve the estimate provided by the crude rates.
It results in smoothing of crude rates.

Aim/ Purpose of Graduation:


1) To produce a smooth set of rates that are suitable for a particular purpose (e.g. pricing).
2) To remove random sampling error as far as possible.
3) To use the information available from adjacent ages (to improve the reliability of estimates).

Comparison of recent experience (to check the consistency i.e., shape of mortality curve
over range of ages and level of mortality rates) with:
 Company’s own experience
 Already published life tables (standards tables like National life tables, English Life tables and
tables based on data from insurance companies).

Comparison with standard tables:


H0: the mortality rates being tested (µ̂x) are consistent with those from the standard table (µxs).

Reasons for Graduations:


 If the force of mortality is smooth and not changing too rapidly, then the true µx will be close to
µx-1 and µx+1. Thus, by smoothing, we can make use of the data at adjacent ages to improve the
estimate at each age. However, the mortality rates may show some significant features at certain
ages, e.g. the accident hump.
 To calculate smooth premium rates with age, it is more convenient to have smooth mortality
rates.
 This reduces the sampling errors at each age.
 It is desirable that financial quantities progress smoothly with age, as irregularities are hard to
justify to clients.

Limitation of Graduation:
If data is faulty or biased, resulted output will never be reliable.

Smoothness vs adherence of data:


Overgraduation: too much focus on smoothness of the curve (at the expense of adherence to the data).
Undergraduation: too much focus on adherence to the data (at the expense of smoothness).

Testing smoothness of a graduation:

1) Calculate the third differences of the graduated quantities (µx°):

a) The third differences should be small in magnitude compared with the quantities
themselves.
b) They should progress regularly.

Statistical test of a mortality experience:

1. Chi-squared Test:
i) Purpose: To test whether observed number of deaths at each age are consistent with
graduated mortality rates or a particular standard table.
ii) Rationale / Observation: A high value of test statistics indicates that the discrepancies
between observed numbers and those predicted by graduation rates and standard table are
large, hence, the fit is not very good due to overgraduation.
iii) Assumptions:
 No heterogeneity of mortality (e.g., no accidental hump) within each age group
 Lives are independent
 The expected number of deaths are high enough (at least 5 in each cell) for the chi-
square approximation to be valid.
iv) Method:
a) Step 1: Calculate zx for each age (or age group):
i) Binomial Model:
zx = (dx – Ex*qx°) / sqrt( Ex*qx°*(1 - qx°) )
ii) Poisson Model:
zx = (dx – Ecx*µ°x+0.5) / sqrt(Ecx*µ°x+0.5)
Hence calculate zx^2 for each group.
b) Step 2: Combine small groups so that the expected number of deaths is never less than 5.
c) Step 3: Calculate the test statistic for the chi-square goodness of fit test: sum((O-E)^2/E),
i.e. ∑(i=1 to m) zx^2, which is compared with a chi-square distribution.
d) Step 4: Calculate the appropriate degrees of freedom, starting from
 m: the number of age groups, and deducting
 the number of parameters fitted,
 the number of constraints,
 an allowance for any combining (clubbing) of groups;
 we also lose 2-3 degrees of freedom for every 10 ages graduated graphically.
v) Conclusion: If the value of the test statistic exceeds the critical value at the upper 5% point of the
chi-square distribution, it indicates a poor fit, possibly due to overgraduation.
vi) Strengths of chi-square test: It is a good test to check overall goodness-of-fit.
vii) Deficiencies of chi-square test:
a. Outliers: Few large deviations can be offset by lot of small deviations. So, test could
be satisfied although the data do not satisfy the distributional assumptions.
(Assumption – 1 define above).
b. Small bias: It ignores a small but consistent positive or negative bias, because the
differences are squared, (O-E)^2. (Being based on squared deviations, it tells us nothing
about the direction of any bias.)
c. Clumps/Runs: It cannot detect significant groups ("clumps", or runs) of deviations of the
same sign over consecutive ages, which indicate that the graduation does not follow the shape of the data.
d. It cannot detect whether the rates progress smoothly from age to age.
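A rough R sketch of the chi-square calculation under the Poisson model; the exposures, graduated rates and death counts below are invented for illustration:

  Ecx <- c(1200, 1350, 1100, 980, 1040)            # central exposed to risk at each age
  mu_grad <- c(0.010, 0.012, 0.015, 0.019, 0.024)  # graduated forces of mortality
  dx <- c(14, 19, 13, 22, 27)                      # observed deaths
  expected <- Ecx * mu_grad
  zx <- (dx - expected) / sqrt(expected)           # standardised deviations
  chi_sq <- sum(zx^2)
  df <- length(zx)                                 # reduce further for parameters/constraints as appropriate
  c(statistic = chi_sq, p_value = pchisq(chi_sq, df, lower.tail = FALSE))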

2. Different types of tests for different types of defects:


i) For large deviations (defect a.) :
i. Individual Standardised Deviation Test
ii) For small and consistent bias (Overgraduation/Undergraduation):
i. Sign Test
ii. Cumulative Deviation Test
iii) Shape of graduation/runs or clump/ long run of same sign graduation:
i. Grouping of Sign Test
ii. Serial Correlation Test
iv) Smoothness of Graduated Rates:
i. Third Differences Test

3. Individual Standardised deviation test:


i) Purpose: To identify excessively large deviations (first drawback of chi-square test).
ii) Rationale / Observation: the test looks at the distribution (normality) of standardised
deviation values.
iii) Assumptions: Normal approximation provides good approximation at all ages.
iv) Methods:
a) Step 1: Calculate the standardised deviation zx for each age/ age group.
b) Step 2: Divide the real number line into any convenient intervals, e.g.
(-∞,-3), (-3,-2), (-2,-1), (-1,0), (0,1), (1,2), (2,3), (3,∞),
or into intervals of equal probability under N(0,1), with break points at the
10%, 25%, 50%, 75% and 90% points: -1.2816, -0.6745, 0, 0.6745, 1.2816.

c) Step 3: Count the observed number of zx values in each interval and compare with the
expected number under N(0,1).
d) Step 4: Combine intervals so that each expected count is at least 5.
e) Step 5: Calculate the chi-square test statistic.
f) Step 6: Calculate the degrees of freedom.
g) Step 7: Compare the test statistic with the chi-square critical value at that number of degrees of freedom.
v) Conclusion: If standardised deviations do not appear to conform N(0,1), then observed
mortality rates do not conform to the model to the rates assumed in the graduation.
vi) Strengths of test:
a. It is a good all-around test.
b. This is effective against a. and b. defects of chi-square test as all positive values or all
negative values can never pass this test.
vii) Deficiencies of test:
a. It is not rigidly defined. We can have multiple answers for this result.
b. Subjective approach is used very often. We can divide the number line as per our
convenience.
c. It is not useful when number of observed values/ data/ m is small as whole number line
will be split into only two intervals (-inf,0) and (0, inf) after clubbing.

4. Informal Tests for Standardised deviation test:


i) Overall shape: the distribution of the observed zx values should be roughly symmetric and bell-shaped.
ii) Absolute deviations: at least half of the observed values should lie between (-2/3, 2/3),
i.e. within the central 50% of the N(0,1) distribution (between its 25% and 75% percentiles).
iii) Outliers: depends on how much area we allow for outliers (e.g. if 5% of the total area is
allowed, 2.5% in each tail, then roughly 1 value in 20 may fall there). In general, observed
values above 3 or below -3 are considered outliers.
iv) Symmetry: about 50% of the values should be positive and 50% negative.

5. Sign Test:
i) Purpose: To test overall bias (whether graduate rates are too high or too low – 2nd deficiency
of chi-square test).
ii) Rationale / Observation: If there are m groups, the number of positive (or negative) deviations should have a
binomial distribution Bin(m, ½). An excessively high number of positive or negative deviations
indicates that the graduated rates are biased.
iii) Assumptions: None.
iv) Method:
a) Step 1: H0: P ~ Bin(m, ½), where P = number of zx which are positive.
b) Step 2: Calculate P[P >= k] if k (the observed number of positive deviations) is large, or
P[P <= k] if k is small, and multiply by 2 (two-tailed test).
c) If m is large, P ~ N(½*m, ¼*m) approximately.
d) If the p-value exceeds 5%, there is insufficient evidence to reject the null hypothesis.
e) Using the normal approximation with continuity correction: if P[P >= k - 0.5] > 2.5%, we cannot
reject H0, where k is a large observed number of positive deviations.
f) Similarly, if P[P <= k + 0.5] > 2.5%, we cannot reject H0, where k is a small observed number.
v) Conclusion: If test shows that no. of positive or negative values are too large or too low, this
indicates that rates are too high or too low.
vi) Strengths:
a. It can detect the overall bias.
b. It is rigidly defined. (Single solution).
vii) Deficiencies of the sign test:
a. Looking at sign does not tell about the extent of the discrepancy.
b. Test is qualitative.
c. If there are equal number of deviations on both sides, then test can’t be conducted.
d. Ignored the magnitude of deviations. Hence, test can be cleared even if deviations are
large.
e. It does not look at the pattern of occurrence of deviations (Can’t detect clumping).

6. Cumulative deviations:
i) Purpose: It addresses the inability of the chi-square test to detect a large positive or
negative cumulative deviation over the whole range of ages.
ii) Rationale / Observation: the total observed deviation ∑(dx – Ecx*µ°x) should be close to zero. It is a two-tailed test.
iii) Assumptions: None.
iv) Calculate the test statistic:
1. Binomial Model:
(∑dx – ∑Ex*qx°) / sqrt( ∑Ex*qx°*(1 - qx°) ) ~ N(0,1)
2. Poisson Model:
(∑dx – ∑Ecx*µ°x+0.5) / sqrt(∑Ecx*µ°x+0.5) ~ N(0,1)
v) Result: If the test statistic lies between -1.96 and 1.96 (5% significance level, 2.5% in each tail),
then we have insufficient evidence to reject H0.

7. Grouping of Sign Test:


i) This is a one-sided (left-tailed) test: too few groups of positive deviations indicates clumping.
ii) The more groups of positive deviations there are, the more closely the graduation adheres to the data.
iii) If G denotes the number of groups (runs) of positive deviations and we observe g
groups of positive deviations, then the p-value of the test is:
P[G <= g] = ∑(t=1 to g) [ (n1-1)C(t-1) * (n2+1)C(t) ] / ( (n1+n2)C(n1) )
Where n1: number of positive deviations
n2: number of negative deviations
t: number of groups of positive deviations
iv) Note: The critical value can be read from page 189 of the Tables.
v) If g <= critical value, we reject H0 at the 5% level of significance.

8. Serial Correlation Test:

i) Calculate the lag-1 serial correlation coefficient r of the zx values (e.g. using a calculator).

ii) Calculate the test statistic = r*sqrt(n), where n is the number of age groups.
iii) It is a right-tailed test, so the critical value at the 5% level is 1.6449.
iv) If the test statistic > 1.6449, we reject H0.

9. Analysis of mortality through graduation table:


If there are many large positive z values, it shows that the graduated rates are generally lower
than the observed rates, i.e. the graduated rates display lighter mortality than the observed rates.
10. Graduating using a parametric approach:
Advantage:
i) Graduated rates will progress smoothly provided the number of parameters is
small.
ii) Good for producing standard table.
iii) Can be easily extended to more complex formula, provided optimisation can be
achieved.
iv) Can fit the same formula to different experiences and compare parameter values
to highlight differences between them.

Disadvantage:
i) Hard to find a formula to fit well at all ages without having lots of
parameters.
ii) Care is required when extrapolating: fit is bound to be best at ages where we
have lots of data, can often be poor at extreme ages.

Methods of Graduation:

1. By Parametric formula:
a. It should be used when aim is to produce a standard table.
b. it depends upon a suitable formula being found which fits the data well.
c. Provided the number of parameters is small, the resulting curve should be smooth.
2. With reference to a standard table:
a. It should be used if a standard table based on a class of lives similar to the
experience being investigated is available.
b. It must relate to a similar class of lives, eg assurances and not annuities etc.
c. It must be available for all classes of lives, eg males and females
d. It should be up-to-date, ie relate to fairly recent experience.
e. It must cover the age range for which rates are required.
f. It must be a ‘benchmark’ table, ie generally acceptable to all other actuaries.
g. It should be used when we do not have much data.
h. The standard table will be smooth, so provided a simple function is used to link the
graduated rates to the standard table rates, this smoothness will be transferred to the
graduated rates.
i. Company generally insures non-standard lives which makes it unlikely that a
suitable standard table would exist.
j. Steps:
i. Select a suitable table based on similar group of lives.
ii. Plot the crude rates against qxs from the standard table to identify a simple
relationship.
iii. Find the best-parameters, using MLE or weighted least square estimates.
iv. Test the graduation for goodness of fit. If the fit is not adequate, the
process should be repeated.
3. Graphical method:
a. If the quick check is needed and data are very scanty (small in quantity) or there
is very little prior knowledge of class of lives being analysed.
b. The graduation should be tested for smoothness by testing third differences of
graduate rates, which should be small in magnitude and should progress regularly
with age.
c. Steps:
i. Plot the crude data, preferably on a logarithmic scale.
ii. If data are scanty, group the ages together, choosing evenly spaced group
and making sure that there is suitable number of death (at least 5) in each
group.
iii. Plot approximate confidence limit or error bars around the plotted crude
rates.
iv. Draw the curve as smoothly as possible, trying to capture the overall
shape of crude rates.
v. Test goodness of fit or smoothness of graduation.
vi. If the graduation fails, re-draw the curve.
vii. If smoothness is unsatisfactory, the curve can be adjusted by “hand
polishing” and testing again.

Considerations while graduating the rates for premium calculations:

1. The graduated rates must not understate mortality, as the premiums charged would then be too low.


2. The rates will be based on current mortality. We should also consider expected future
changes.
3. Premiums charged by other insurers in the market should also be considered.

Graduation using Spline function:

1. These are polynomials of a specified degree which are defined on a piecewise basis
across the age range.
A. We will be choosing spline function which are polynomial of specified degree
and we will be defining them on piecewise basis.
B. The spline function must satisfy three conditions at the knots:
i. The function must be continuous at the knots.
ii. The first derivative must be continuous at the knots, i.e.
there should be no sharp turn at the knots.
iii. The second derivative must be continuous at the knots,
i.e. there should be no sudden change in curvature at the knots.
iv. Note: The minimum degree of spline used is cubic, so that the second
derivative of the spline function exists and can be made continuous.
v. If the knots are x1, x2, ..., xn, the function should be linear before x1 and
after xn.
vi. Formula of spline function:
α0 + α1*x + ∑(j=1,n): βj*ϕj(x)
vii. For ages x < first knot x1 , formula = α0 + α1*x

2. Steps:
A. Identify the ages at which we are choosing the knots.
i. We can choose knots by looking at the behaviour of crude mortality rates.
B. Preliminary calculations :
i. Calculate Φj(x)
C. Estimate parameter values
D. Calculate graduation rates
E. Test
Chapter – 12 (Mortality Projection)
Methods of Projecting Mortality Rate:
1. Methods based on expectation:
Equation:
R(x,t) = αx + (1-αx)*(1-fn,x)t/n
Where
R(x,t): the reduction factor, i.e. the proportion of the base-year mortality rate at age x (qx) that is
expected to remain after t future years.
αx: the ultimate reduction factor / maximum reduction level / lowest possible value of the
reduction factor
1-αx: the maximum proportionate amount by which future mortality at age x can reduce
fn,x: the proportion of the total decline that is expected to occur within n years
t: the number of years into the future being projected
n: the total number of years over which mortality at age x is expected to decline.

Parameter Estimation:
Both factors αx and fn,x are set by expert opinion (perhaps based on analysis of recent
observed mortality trends.

Calculation:
mx,0: mortality rate for base year of mortality projection
αx = minimum possible mortality rate at age x/ mortality rate for base year of mortality
projection
mx,t: projected/ central rate of mortality at age x in year t.

Advantages of this approach:


 This method is easy to implement.

Disadvantages of this approach:


 The effects of factors like lifestyle changes or the prevention of major causes of death
are difficult to predict, as they have not occurred before, and experts may fail to
judge the extent of the impact of these factors on future mortality adequately.
 The parameters are themselves forecasts (expert opinions), which are then used to construct
a model to produce forecasts.
 Setting target levels leads to underestimation of the true level of
uncertainty around the forecast (e.g. it may be assumed that the mortality of 70-year-olds can
never fall below 30% of the current mortality rate).

2. Method based on extrapolation:


It uses the past mortality experience to produce a model of future mortality experience
by using a pure extrapolation approach for indefinite time.

Example-
Lee-Carter Model:
Ln(mx,t) = ax + bx*kt + ex,t or
mx,t = exp(ax + bx*kt + ex,t)
ax: Mean value of Ln(mx,t) averaged over all period t for age x / General shape of
mortality at age x.
bx: the extent to which the time trend affects the mortality rate at age x / it measures the
change in rates at age x in response to an underlying time trend kt in the level of mortality.
kt: effect of time on mortality or factor related to mortality rates for year t / effect of
time trend in year t
ex,t: stochastic error terms which are assumed to be IID random variables for all x,t with
mean 0.

Factors:
i) An age factor
ii) A period factor

Reason of using log:


The log transformation is applied to reflect the fact that mortality rates historically
increase exponentially with age.

Constraints:
∑b̂x = 1 over all values of x
∑k̂t = 0 over all values of t

Calculation of parameters:
âx = (1/n) * ∑(t=1 to n): ln(m̂x,t)
If kt is projected as a time series (a random walk with drift µ):
k̂(t0+1) = k(t0) + µ̂
k̂(t0+l) = k(t0) + l*µ̂
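A rough R sketch of these two calculations; the small matrix of central mortality rates m (ages in rows, years in columns) and the fitted kt series are invented for illustration:

  m <- matrix(c(0.010, 0.0095, 0.0092, 0.0090,
                0.020, 0.0190, 0.0185, 0.0180), nrow = 2, byrow = TRUE)   # m_{x,t}
  a_hat <- rowMeans(log(m))               # a_x = average of ln m_{x,t} over the period
  k <- c(1.2, 0.5, -0.3, -1.4)            # fitted k_t values (they sum to zero)
  drift <- mean(diff(k))                  # random walk with drift: estimate of mu
  k_proj <- tail(k, 1) + (1:5) * drift    # k_{t0+l} = k_{t0} + l * mu_hat
  a_hat; k_proj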

Interpret b60 = 3*b70 if mortality rates are improving over time:


 Mortality rates at age 60 are assumed to be improving at three times the rate at
which they are improving at age 70.
 Since mortality rates are improving at both ages, b60 and b70 both take positive values.

Disadvantage of Lee- Carter Model:


 Future estimates of mortality at different ages depend upon the original estimates of the
parameters ax and bx, which are estimated from past data that may be distorted by
past period events, so the results might be unreliable.
 If the estimated bx values show variability from age to age, it is possible for forecast
age-specific mortality rates to "cross over" (e.g. projected rates mx,t may
increase with age at one duration, but decrease with age at the next, due to
variability in the value of bx).
 The model assumes that the underlying rates of mortality change are constant over
time, which is not consistent with empirical evidence.
 The model does not include a cohort term, whereas there is evidence from some
countries that certain cohorts exhibit higher mortality improvements than
others.
 It can produce "jump-off" effects (an implausible jump between the most recent
observed mortality rate and the forecast for the first future period).
 There is a tendency for Lee-Carter forecasts to become increasingly rough over
time.

3. Explanation based model:


It projects mortality rates separately by cause of death and combine those to produce
overall projected mortality rates.
Mx,t = ∑Mjx,t where Mjx,t is the mortality rate by cause j at age x in time t.

4. General Result:
Dx,t / Ecx,t is an unbiased estimator of mx,t, i.e. mx,t = E[ Dx,t / Ecx,t ], so
m̂x,t = Dx,t / Ecx,t and p̂x,t = exp(-m̂x,t)
qx,t = qx,0 * R(x,t)
Chapter – 13 & 14 (Time Series)
1. Meaning of Time Series:
A time series is a stochastic process with continuous state space and discrete time domain.
E.g. closing price of a stock.
Where S = (0,∞) and J = { 0,1,2…..}

2. Types of Time Series Process:


1. Auto-Regressive Process (AR(p))
2. Moving-Average Process (MA(q))
3. Auto-Regressive Moving Average Process (ARMA(p,q))
4. Auto-Regressive Integrated Moving Average Process (ARIMA(p,d,q))

3. Stationarity:
A stochastic process is called stationary if the statistical properties (mean, variance etc.) of
process remain unchanged as the time elapses.
For practical purposes, it is sufficient for a series to be “weakly stationary”, which requires its first
two moments to be constant over time. In other words, the mean and variance take constant
values, and the covariance depends only on the lag, not on the time t.
Stationarity is an issue relating only to the autoregressive AR(p) terms, and is not affected by adding
or subtracting constants (e.g. the MA(q) terms in ARMA(p,q) can be ignored, and we can simply
reduce the ARMA(p,q) process to a simple AR(p) process when checking stationarity).

4. Invertibility:
A time series process is said to be invertible if we can express et in terms of Xt’s only.

5. Backward Shift Operator (B):


BXt = Xt-1
BkXt = Xt-k
where B works as an operator.

6. Difference Operator (∇):


∇Xt = Xt – Xt-1
∇^2 Xt = ∇(∇Xt) = (1-B)^2 Xt
∇^k = (1-B)^k
7. Existence of d:
If the coefficients of the characteristic polynomial sum to zero, then z = 1 is a root, so a further
order of differencing is needed; this check can be applied at each order of differencing (1st,
2nd, 3rd, ...).

8. Integrated Process (I(d)):


An I(d) process is the process which is not originally stationary but becomes stationary after
differencing d times.

9. Markov Property:
If the future evolution of the process can be completely determined by the knowledge of
its current state only (one-step dependency) and other past information becomes irrelevant
or useless, then the process is said to have Markov property.
Note: AR(p) has the Markov property if and only if p = 1.

10.Important Points:
1. Stationarity is necessary for all models since the Yule-Walker equations do not hold
without the existence of the auto-covariance function.
2. If observed data comes from an MA(q), it means that the autocorrelation function (ρ k)
will cut off (i.e ρk = 0) for all k>q or non-zero up to lag q and PACF(φk) will decay
exponentially to zero, but it will never get there, so that the PACF will always be non-
zero.
3. If observed data comes from an AR(p), e.g. the time series is an AR(3) series. The
autocorrelation function (ρk) will decay (i.e tend to 0) and the PACF(φk) will cut off (i.e φk
= 0) for k>3.
4. Exponential smoothing might be expected to outperform Box-Jenkins forecasting when
a slowly varying trend or multiplicative seasonal variation is present.

11. Cointegrated time series:


X and Y are said to be co-integrated if:
i) X and Y are I(1) random processes.
Where I(1) shows that the process was not originally stationary but becomes stationary
after differencing once.
ii) There exists a non-zero vector (α,β) such that αX+βY is stationary. The vector (α,β) is
called the cointegrating vector.
iii) Two processes might be cointegrated if:
a. one of the processes is driving the other.
b. Both are being driven by same underlying process.
12. Turning Point (T):
If e1, e2,……,eN is a sequence of residuals, we say that ek is a turning point if either { ek-1 < ek
and ek > ek+1 } or { ek-1 > ek and ek < ek+1 }.
Purpose of Test: This test checks whether the residuals are patternless.
E[T] = 2/3 * (n-2)
V[T] = (16*n-29)/90
95% CI = { E[T] +- 1.96* S[T] } , It is two-tailed test.
T ~ N( E[T] , V[T] ) approximately, for large n.
P[ T<t] if t<E[T] or
P[ T>t] if t> E[T]
where t is set after continuity correction.
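A minimal Python sketch of this test as described above (it uses the normal approximation directly and omits the continuity correction):

```python
import numpy as np
from scipy.stats import norm

def turning_point_test(residuals):
    """Two-tailed turning point test that the residuals are patternless.
    Uses T ~ N(2(n-2)/3, (16n-29)/90) approximately for large n."""
    e = np.asarray(residuals)
    n = len(e)
    # ek is a turning point if it is a strict local maximum or minimum.
    T = sum((e[k-1] < e[k] > e[k+1]) or (e[k-1] > e[k] < e[k+1])
            for k in range(1, n - 1))
    mean_T = 2 * (n - 2) / 3
    var_T = (16 * n - 29) / 90
    z = (T - mean_T) / np.sqrt(var_T)
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return T, p_value

# White-noise residuals should not lead to rejection of the null hypothesis.
rng = np.random.default_rng(1)
print(turning_point_test(rng.normal(size=200)))
```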

13.The “portmanteau” Ljung-Box chi-square test:


Purpose: This test checks for correlation between the residuals. If the residuals form a white
noise process, then the sample autocorrelations will be small.
H0 : the residuals form a white noise process with zero mean.
T.S: Formula given on Page. No. 48 of tables.
Critical Value: from chi-square percentage table.
If T.S. < Critical Value, we have insufficient evidence to reject H0.
Effect on Test result by changing value of n:
i) The absolute value of T.S. increases as ρ2 increases.
ii) If the sample autocorrelation values (r not ρ) were not equal to the theoretical values
but instead the first few autocorrelation values happen to be small, using a small
number of lags (k) will result in a relatively small value of the test statistic under the
alternative hypothesis.
iii) A large value of n would then be required to reject the null hypothesis.
iv) This would tend to support the use of a larger number of lags (k) to maximize the power
of the test for a fixed value of n.
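A sketch of the statistic using the standard Ljung-Box form Q = n(n+2)*∑ rk^2/(n-k) (check the exact expression against page 48 of the Tables; the degrees-of-freedom adjustment for fitted parameters is an assumption for a fitted ARMA model):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(residuals, max_lag, fitted_params=0):
    """Ljung-Box 'portmanteau' statistic and p-value against
    chi-square on (max_lag - fitted_params) degrees of freedom."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom   # sample autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    p_value = 1 - chi2.cdf(q, df=max_lag - fitted_params)
    return q, p_value

rng = np.random.default_rng(0)
print(ljung_box(rng.normal(size=300), max_lag=10))
```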

14.Inspection of ACF:
Purpose: This test also checks whether the residuals are uncorrelated.
T.S.: The ACF of the residuals should be zero for all lags except k = 0.
An approximate 95% confidence interval for ρk, k ≥ 1, is (-1.96/sqrt(n) , 1.96/sqrt(n)).
Result: If all the sample autocorrelation values fall within the above confidence interval, then there
is insufficient evidence to reject the null hypothesis. Hence, the tests suggest that the residuals form
a white noise process and the fitted ARMA(p,q) model is satisfactory.
15. ARCH(p) Model:
ARCH(p) models are defined by the relation:
Xt = µ + et*sqrt( α0 + ∑(k=1 to p) αk*(Xt-k - µ)^2 )
Where et is a sequence of independent standard normal random variables.
ARCH models can be used for modelling financial time series. If Zt is the price
of an asset at the end of the tth trading day, an ARCH model can be used to model Xt =
ln(Zt/Zt-1), interpreted as the daily return on day t.
The ARCH family of models captures the feature frequently observed in the
asset price data that a significant change in price of the asset is often followed
by a period of high volatility. A significant deviation of Xt-k from the mean µ
gives rise to an increase in volatility of the asset prices.
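A small simulation sketch of the ARCH relation above with hypothetical parameter values, illustrating that a large deviation of Xt-1 from µ feeds into a higher conditional volatility for Xt:

```python
import numpy as np

def simulate_arch1(n, mu=0.0, a0=0.0001, a1=0.5, seed=0):
    """Simulate X_t = mu + e_t * sqrt(a0 + a1*(X_{t-1} - mu)^2),
    with e_t independent standard normal (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = mu
    for t in range(1, n):
        vol = np.sqrt(a0 + a1 * (x[t-1] - mu) ** 2)
        x[t] = mu + rng.standard_normal() * vol
    return x

returns = simulate_arch1(1000)
# Large deviations from mu are followed by periods of higher volatility.
print(returns[:5])
```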

16.Does ARMA(1,1) process holds Markov property?


Xt = α*Xt-1 + et + β*et-1
Xt = α*BXt + et + β*Bet
(1-αB)*Xt = (1+βB)*et
(1+βB)^-1 * (1-αB)*Xt = et
(1-αB)*(1 - βB + β^2B^2 - β^3B^3 + ……)*Xt = et
ARMA(1,1) does not hold Markov property since the above definition of
process at time t depends upon the values at times t-1, t-2, t-3, t-4 etc.

17.Tests applied to residuals (et) after fitting a model to time series data:
i) The turning point test
ii) The “portmanteau” Ljung-Box chi-square test
iii) The inspection of the values of SACF values based on their 95% CI under
white noise null hypothesis.
See more detail on these tests on page no. 1259.

18. Error terms are always stationary.

19. A process, X, is said to be I(d) (‘integrated of order d’) if the dth difference,
∇^d X, is a stationary process.
Auto-Regressive Process (AR(p))
1. General Form:
Xt = µ + α1(Xt-1 - µ) + α2(Xt-2 - µ) + ………….+ αp(Xt-p - µ) + et
where et ~ N(0,σ2)
AR(1) = Yt = α*Yt-1 + et , by putting Yt = Xt - µ
AR(2) = Yt = α1*Yt-1 + α2*Yt-2 + et
AR(p) = Yt = α1*Yt-1 + α2*Yt-2 +……+ αp*Yt-p + et

2. Auto-Covariance Function (γk):


i) AR(1) = γk = α*γk-1 for all k >= 1 and γk = α^k*γ0
ii) AR(2) = γk = α1*γk-1 + α2*γk-2 for all k >= 2
iii) AR(p) = γk = α1*γk-1 + α2*γk-2 + α3*γk-3 +………. + αp*γk-p for all k >= p

3. Auto-Correlation Function (ACF) (ρk):


i) ρXY = Cov(X,Y) / SQRT(V(X)*V(Y))  ρk = γk/γ0
ii) AR(1) = ρk = α^k for all k
iii) AR(2) = ρk = α1*ρk-1 + α2*ρk-2 for all k>=2.
iv) AR(p) = ρk = α1*ρk-1 + α2*ρk-2 +…….+ αp*ρk-p for all k>=p.

4. Partial Auto-Correlation Function (PACF) (φk):


For AR(p), φk = 0 for all k>p.

5. Conditions to check the stationary for AR process:


AR(p) = Yt = α1*Yt-1 + α2*Yt-2 +…………+ αp*Yt-p + et
Characteristic Equation: 1 - α1z - α2z^2 - ..…… - αpz^p = 0
If for all roots, |z|>1, process is stationary.

6. Conditions to check invertibility:


AR process is always invertible.

7. For AR(p) process:


I. ACF - γk decays.
II. PACF - Φk cut off.
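Following the stationarity condition in point 5 above, a minimal sketch that checks the roots of the characteristic equation numerically:

```python
import numpy as np

def ar_is_stationary(alphas):
    """Check stationarity of Y_t = a1*Y_{t-1} + ... + ap*Y_{t-p} + e_t by
    finding the roots of 1 - a1*z - ... - ap*z^p = 0.
    The process is stationary if every root has modulus > 1."""
    # np.roots wants coefficients from the highest power of z down to the constant.
    coeffs = [-a for a in reversed(alphas)] + [1.0]
    roots = np.roots(coeffs)
    return np.all(np.abs(roots) > 1), roots

# Example: Y_t = 0.6*Y_{t-1} + 0.16*Y_{t-2} + e_t (roots 1.25 and -5, so stationary)
print(ar_is_stationary([0.6, 0.16]))
```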
Moving-Average Process (MA(q))

1. General Form:

Xt = µ + β1et-1 + β2et-2 +……+ βqet-q + et

MA(1) = Yt = β1et-1 + et , by putting Yt = Xt - µ


MA(2) = Yt = β1et-1 + β2et-2 + et
MA(q) = Yt = β1et-1 + β2et-2 +…………+ βqet-q + et

1. Auto-Covariance Function (γk):


i) MA(1) = γk = 0 for all k > 1 (with γ0 = (1+β1^2)*σ^2 and γ1 = β1*σ^2)
ii) MA(2) = γk = 0 for all k > 2
iii) MA(q) = γk = 0 for all k > q

2. Auto-Correlation Function(ρk):
i) ρXY = Cov(X,Y) / SQRT(V(X)*V(Y))  ρk = γk/γ0
ii) MA(1) = ρk = 0 for all k>1
iii) MA(2) = ρk = 0 for all k>2.
iv) MA(q) = ρk = 0 for all k>q.
3. Conditions to check the stationary for MA process:
MA process is always stationary because MA process is the linear combination
of the white noise processes.

4. Conditions to check Invertibility:


MA(q) = Yt = β1et-1 + β2et-2 +…………+ βqet-q + et

Characteristic Equation:
1 + β1z + β2z^2 + ..…… + βqz^q = 0
If for all roots, |z|>1, the process is invertible.

5. Partial Auto-Correlation Function (PACF) (φk):


Refer Actuarial Tables P.N. 46
6. For MA(q) process:
I. ACF - γk cuts off.
II. PACF - Φk decays.

Auto-Regressive Moving Average Process (ARMA(p,q))

1. General Form:
Xt = µ + α1(Xt-1 -µ) + α2(Xt-2-µ) +………………+ αp(Xt-p -µ) + β1et-1
+ β2et-2 +…………………..+ βqet-q + et
φ(B)*Yt = θ(B)*et
where, φ(B) = [ 1 – α1B – α2B^2 – ……. – αpB^p ], using Yt = Xt - µ and BXt = Xt-1
and
θ(B) = [ 1 + β1B + β2B^2 + ……. + βqB^q ]
Note: It is assumed that φ(B) and θ(B) have no common factors. If there are
common factors, then the expression must be simplified.
2. Auto-Covariance Function (γk):
i) ARMA(1,1) = γk = αγk-1 for all k > 1
ii) ARMA(3,2) = γk = α1γk-1 + α2γk-2 + α3γk-3 for all k > 2
Once the lag exceeds q (the MA order), the ARMA process behaves like an AR
process, i.e. the AR recursion above holds for all k > q.
3. Auto-Correlation Function(ρk):
iii) ρXY = Cov(X,Y) / SQRT(V(X)*V(Y))  ρk = γk/γ0
iv) ARMA(1,1) = ρk = α^(k-1)*ρ1 for all k ≥ 1
Once the lag exceeds q (the MA order), the ARMA process behaves like an AR
process, i.e. the AR recursion holds for all k > q.
4. Conditions to check stationarity for the ARMA process:
Check the stationarity of φ(B) (refer AR(p) – 4th point), as the MA part is always
stationary.
5. Conditions to check Invertibility:
Check the invertibility of θ(B) (refer MA(q) – 5th point), as the AR part is always
invertible.
6. Important Result:
i. AR(p) is the special case of ARMA(p,0) process.
AR(p) = ARMA(p,0)
ii. MA(q) is the special case of ARMA(0,q) process.
MA(q) = ARMA(0,q)

Auto-Regressive Integrated Moving Average Process


(ARIMA(p,d,q))

d: It defines the number of times differencing is done to make the process


stationary.
1. General Form after deleting common factors:
Xt = µ + α1(Xt-1 -µ) + α2(Xt-2 -µ) +………………………+ αp(Xt-p -µ)
+ β1et-1 + β2et-2 +………………….…+ βqet-q + et

φ(B)*Yt = θ(B)*et
where,
φ(B) = [ 1 – α1B – α2B^2 – ……. – αpB^p ] and
θ(B) = [ 1 + β1B + β2B^2 + ……. + βqB^q ]

2. Calculation of d:
a. If φ(B) is stationary, d = 0, ARIMA(p,0,q) = ARMA(p,q)
b. If φ(B) is not stationary, make φ(B) stationary by differencing d times.
c. If the sample variance of the differenced series is given for different values of d,
choose the d with the minimum sample variance.
d. If ACF decays slowly from 1, then, we will need differencing.
Otherwise, d = 0
3. Important Result:
a. AR(p) = ARIMA(p,0,0)
b. MA(q) = ARIMA(0,0,q)
c. ARMA(p,q) = ARIMA(p,0,q)

Vector Auto Regressive Process (VAR(p))


1. General Form:
Wt = A*Wt-1 + etw

Where Wt = [Yt, Xt]' is the (column) vector of processes being modelled jointly, and
A is an m×m (square) matrix.

Here, the future value (Wt) depends only upon one past value (Wt-1), so it is a VAR(1)
process.
2. Conditions to check Stationarity:
|A-λI| = 0
If |λ| < 1 for all values of λ, the process is stationary.
3. Conditions to check Invertibility:
Just like AR(p), VAR(p) is always invertible.

4. Check Markov Property:


Just like AR(1), VAR(1) also has the Markov property.
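A minimal sketch of the stationarity check in point 2 above, using a hypothetical coefficient matrix A:

```python
import numpy as np

def var1_is_stationary(A):
    """A VAR(1) process W_t = A*W_{t-1} + e_t is stationary when every
    eigenvalue of A has modulus strictly less than 1."""
    eigvals = np.linalg.eigvals(np.asarray(A, dtype=float))
    return np.all(np.abs(eigvals) < 1), eigvals

# Hypothetical coefficient matrix for the vector (Y_t, X_t)'
A = [[0.5, 0.1],
     [0.2, 0.3]]
print(var1_is_stationary(A))   # both eigenvalues are below 1, so stationary
```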

Box – Jenkins Methodology for fitting ARIMA(p,d,q):


1. Prepare the data

2. Tentative identification of ARIMA model

3. Estimation of parameters in identified


model

4. Diagnostic checks

5. Use model to forecast future values.


Step 1: Prepare the data:
1. Remove Linear Trend:
 Differencing
 Least Square Method
Steps:
i) Estimate a and b using least square regression on xt against t.
ii) Subtract the fitted regression line (a+bt) from the observed values to give
a de-trended series.
Note: MLE and least square are equivalent when ei ~N(0,σ2)
Note: More details in notebook and page no. 1246 of study material.

2. Remove exponential trend


3. Remove Seasonal Effect:
 Seasonal differencing
 Method of seasonal means
 Method of moving averages

Note: More details in notebook and page no. 1245 of study material.

Step 2: Tentative identification of ARIMA model


1. Estimation of sample ACF and PACF
2. Identification of AR(p):
i) ACF decays (will tend to zero)
ii) PACF cut off (equals to zero)
3. Identification of MA(q):
i) ACF cut off
ii) PACF decays
4. Identification of d (differencing d times)

Step 3: Estimation of parameters in identified model


Step 4: Diagnostic Checks
1. White noise process:
a) Inspection of graph of residuals
i) Zero mean
ii) Constant variance
iii) Pattern less (Random)
2. Counting turning points
3. Ljung-Box test

Step 5: Use model to forecast future values.


1. Box-Jenkins approach to forecasting stationary time series
2. Exponential Smoothing:
x̂t(1) = α*xt + (1-α) * x̂t-1(1) where α is the smoothing parameter.
Method of Moments (Yule-Walker) estimates (α or σ2):
1. Auto-Covariance Function (γk):
i) AR(1) =
I. γ0 = α1*γ1 + σ^2 or
γ0 = σ^2/(1-α1^2) or
1 = α1*ρ1 + σ^2/γ0
II. γ1 = α1*γ0 or ρ1 = α1
III. γk = α1*γk-1 for all k >= 1 and γk = α1^k*γ0 = α1^k*σ^2/(1-α1^2)

ii) AR(2) =
I. γ0 = α1*γ1 + α2*γ2 + σ^2 or 1 = α1*ρ1 + α2*ρ2 + σ^2/γ0
II. γ1 = α1*γ0 + α2*γ1 or ρ1 = α1 + α2*ρ1
III. γ2 = α1*γ1 + α2*γ0 or ρ2 = α1*ρ1 + α2
IV. γk = α1*γk-1 + α2*γk-2 for all k >= 2

iii) AR(p) = γk = α1*γk-1 + α2*γk-2 + α3*γk-3 +………. + αp*γk-p for all k >= p

2. Auto-Correlation Function (ACF) (ρk):


i) ρXY = Cov(X,Y) / SQRT(V(X)*V(Y))  ρk = γk/γ0
ii) AR(1) = ρk = α^k for all k
iii) AR(2) =
I. ρ0 = 1
II. ρ1 = α1/(1-α2)
III. ρ2 = α1* ρ1 + α2
IV. ρk = α1*ρk-1 + α2*ρk-2 for all k>=2 where k=|k|
iv) AR(p) = ρk = α1*ρk-1 + α2*ρk-2 +…….+ αp*ρk-p for all k>=p.

3. Partial Auto-Correlation Function (PACF) (φk):


i) AR(1) =
a. φ1 = ρ1
b. φk = 0 for all k>1.

ii) For AR(2)=


a. φ1 = ρ1
b. φ2 = (ρ2 – ρ1^2)/(1- ρ1^2)
c. φk = 0 for all k>2.

4. Other Formulas:
iv) µ̂ = ∑xi/n
v) γ̂0 = ∑(xi - x̅)2/n = Cov(Yt,Yt) = V(Yt)
vi) γ̂1 = ∑(xi - x̅)*(xi-1 - x̅) /n
vii)γ̂2 = ∑(xi - x̅)*(xi-2 - x̅) /n
viii) ϕ̂1 = ρ1 = γ1/Sample variance
ix) φ2 = (ρ2 – ρ1^2)/(1- ρ1^2) where ρ2 = γ2/Sample variance
x) Corr(x,y) = Cov(x,y)/sqrt(var(x)*var(y))
xi) Let Xt = α*Xt-1 + et and et ~ N(0,σ2) then,
a. Xt - α*Xt-1 ~ N(0 , σ2) and
b. Xt|Xt-1 ~ N(α*Xt-1 , σ2)
c. L = ∏ i=1 to n P(Xi = xi| xi-1) * P(x0) where P(x0) = 1
d. Yt = 1 + 0.6Yt-1 + 0.16Yt-2 + et or Yt = a0 + a1 Yt-1 + et
From the stationarity condition then,
E(Yt ) = μ = 1 + 0.6μ + 0.16μ + 0 or μ = a0 +a1μ

xii) Yt = a + Yt-1 + et for all t, with Y0 = 0. Then:
Yt = a*t + ∑(i=1 to t) ei
E[Yt] = at
V[Yt] = t*σ2
xiii) Stats of Time Series process:
a. Mean/Expected value of rate of inflation:
Take Expectation on both side of time series process and resulted
value of µ is the expected value of rate of inflation.
b. Variance: γ0

5. Box-Jenkins method of Forecasting:


Forecasting ARMA(1,1):
Suppose the fitted equation is:
Xt = 5.67 + 0.61*Xt-1 + et - 0.23*et-1
One-step ahead forecast:
x̂t(1) = 5.67 + 0.61*xt - 0.23*et
where xt and et are known, and the future error et+1 is replaced with its expected value, zero.

Two-steps ahead forecast:


x̂t(2) = 5.67 + 0.61*x̂t(1), since the future errors et+1 and et+2 are replaced with zero.
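A minimal numeric sketch of these two forecasts, assuming hypothetical values for the latest observation xt and residual et:

```python
# Fitted ARMA(1,1): X_t = 5.67 + 0.61*X_{t-1} + e_t - 0.23*e_{t-1}
x_t, e_t = 14.0, 0.4          # hypothetical latest observation and residual

# Future errors e_{t+1}, e_{t+2} are replaced with their expected value, zero.
x_hat_1 = 5.67 + 0.61 * x_t - 0.23 * e_t   # one-step ahead forecast
x_hat_2 = 5.67 + 0.61 * x_hat_1            # two-steps ahead forecast

print(round(x_hat_1, 3), round(x_hat_2, 3))
```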

Forecasting two steps ahead ARIMA(1,2,1):


Zn = ∇^2Xn = (1-B)^2Xn = (1-2B+B^2)Xn = Xn – 2Xn-1 + Xn-2
And, Xn = Zn + 2Xn-1 - Xn-2 So, Xn+2 = Zn+2 + 2Xn+1 – Xn
Hence, x̂ n(2) = ẑn(2) + 2x̂ n(1) - x̂ n
Where Zn = µ + α(Zn-1 - µ) + en + βen-1 #ARMA(1,1)
Zn+2 = µ + α(Zn+1 - µ) + en+2 + βen+1
Hence, ẑn(2) = µ̂ + α̂(ẑn(1) - µ̂)

6. Equivalent Infinite-order MA representation (of AR process):


ϕ(B) * (Yt - µ) = Zt
Where Yt is the AR(p) process, Zt is a white noise process, and ϕ is the characteristic polynomial of
the AR(p) process.
Hence, Yt = µ + ϕ(B)^-1 * Zt, which expands into an infinite-order MA representation.
Chapter – 16 (Extreme Value Theory)
Need of Extreme Value Theory:
 By fitting a distribution across the whole data range, the single distribution chosen may be a
good overall fit of the data but could be a poor fit where there is little data, e.g. in the tails
which are of primary concern.
 EVT can be useful where we are particularly interested in the tail of a distribution and need
to model that part accurately.

Approaches to study Extreme Value Theory (EVT):


1. Generalised Extreme Value Distribution
2. Generalised Pareto Distribution

1. Generalised Extreme Value Distribution (GEV):


It considers the block maxima.
XM = max{ X1, X2,…………………..,Xn} where n is the block size.
Note: the larger the block size n, the fewer the blocks, so there are fewer block maxima but each is more extreme.
Distribution:
(XM – αn)/βn ~ EVD

CDF of (standardised) Maximum Value:


P[XM < x ] = [F(x)]^n, where F is the CDF of each (IID) observation in the block.
P[ (XM – αn)/βn < x ] = P[XM < αn + x*βn ] = [F(αn + x*βn)]^n
Where the distribution of the data, used to calculate the CDF, is given.

When n is large, the distribution of the standardised maximum converges to the
Generalised Extreme Value Distribution (GEV).
Hence, lim n → ∞ [F(αn + x*βn)]^n = H(x)

CDF of Generalised Extreme Value Distribution (GEV):


H(x) = exp( -(1+γ*(x-α)/β)^(-1/γ) ) when γ≠0
exp( -exp(-(x-α)/β) ) when γ=0
PDF of Generalised Extreme Value Distribution (GEV):
Refer P.N. 819 of CS2 study material.
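A minimal sketch that evaluates H(x) directly from the formula above (parameter values are hypothetical; note that library implementations such as scipy use a different sign convention for the shape parameter):

```python
import numpy as np

def gev_cdf(x, alpha, beta, gamma):
    """CDF H(x) of the generalised extreme value distribution as written above.
    Valid only where 1 + gamma*(x - alpha)/beta > 0 when gamma != 0."""
    if gamma == 0:
        return np.exp(-np.exp(-(x - alpha) / beta))
    return np.exp(-(1 + gamma * (x - alpha) / beta) ** (-1 / gamma))

# Hypothetical standardised block-maximum parameters:
print(gev_cdf(3.0, alpha=1.0, beta=0.5, gamma=0.2))   # Frechet-type (heavy tail)
print(gev_cdf(3.0, alpha=1.0, beta=0.5, gamma=0.0))   # Gumbel-type
```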

GEV Family of Distributions:
 Frechet Type GEV (γ > 0)
 Gumbel Type GEV (γ = 0)
 Weibull Type GEV (γ < 0)

1. Frechet Type GEV (when γ>0):


 It is heavy tailed.
 It has finite lower limit. [ x> α - β/γ ]
 It is used when we have extreme events in the right tail.
 Example: to model claim amount.
 Distribution of loss amount: Burr, F, Pareto, Log-gamma

2. Gumbel Type GEV ( when γ=0):


 It has unbounded graph.
 If α = 0 and β = 1, it becomes the standard Gumbel distribution.
 Then, H(x) = exp(-exp(-x)).
 Standard Gumbel distribution is the EVD from exponential distribution.
 Distribution of loss amount: chi-square, exponential, Gamma, Log-
normal, Normal, Weibull

3. Weibull Type GEV ( when γ<0):


 It is light tailed. It tends to zero quickly.
 It has finite upper limit. [ x< α – β/γ ]
 It is used when we have extreme events in the left tail.
 Example: to model profits.
 Distribution of loss amount: Beta, Uniform, Triangular

2. Generalised Pareto Distribution (GPD):


It sets the threshold to model claim amount above the threshold limit.
W = X-u | X>u where u is the threshold.

CDF of threshold exceedance:


Fu(x) = (F(x+u)-F(u)) / (1-F(u))
Where distribution of claim amount is given.
If X follows a Pareto(α,λ) distribution, the threshold exceedance W follows Pareto(α, λ+u).

When u is large, distribution of threshold exceedance will converge to GPD:


Lim u → ∞ Fu(x) = G(x)

CDF of Generalised Pareto Distribution:


G(x) = 1 - (1 + x/(γβ))^(-γ) when γ≠0
1 - exp(-x/β) when γ=0

PDF of Generalised Pareto Distribution:


Refer P.N. 831 of CS2 study material.

Some General Results:


The GPD with γ≠0 corresponds to a Pareto distribution with α=γ and λ=γβ.
The GPD with γ=0 corresponds to an Exponential distribution with λ=1/β.

Limitation of Generalised Pareto Distribution (based on Javelin throws):


1. Not all throws are identically distributed.
2. The sample size is not particularly large.
3. The generalized Pareto distribution is a limiting distribution and the actual distribution of
the exceedances over any finite threshold will be different.
4. An example of a source of non-independence, e.g. each thrower will make multiple throws.
5. There could be trends in the distances thrown over the years (e.g. improvements in training
techniques, improvements in javelin technology (e.g. lighter javelins)).
6. Alternative thresholds should be analysed.
7. Measures of Tail Weight:
1) Existence of moments:
If the kth moment E[X^k] exists for large k, the distribution tends to be light tailed; if higher
moments fail to exist, the distribution is heavy tailed.

2) Limiting density ratios:


As x → ∞,
 If PDF(1)/PDF(2) → ∞ (or is an increasing function), PDF(1) has the heavier tail.
 If PDF(1)/PDF(2) → 0, PDF(1) has the lighter tail.

3) Hazard rate:
h(x) = f(x)/(1-F(x))
h(x) increases, light tailed
h(x) decreases, heavy tailed
If h(x) is an increasing function of x, then X has a light tail.

Under Weibull distribution:


γ>1 h(x) increases light tailed
γ=1 h(x) constant
0<γ<1 h(x) decreases heavy tailed

4) Mean residual life:


e(x) = [ Int(x,∞): (1-F(y)) dy ] / (1-F(x))
e(x) increases or h(x) decreases heavy tailed
e(x) decreases or h(x) increases light tailed

If the mean residual life is an increasing function of x, then the


distribution has a heavy tail.

Note: Due to the memoryless property of the exponential distribution, the residual lifetime
of an Exp(λ) variable is again Exp(λ).
So, e(x) = 1/λ (constant).
Chapter – 17 (Copulas)
Type (Structure): Discrete case / Continuous case
Joint PDF fXY(x,y): P[X=x, Y=y] / f(x,y)
Joint CDF FXY(x,y): P[X<=x, Y<=y] / Int(start,x)Int(start,y): f(s,t) dt*ds
Marginal PDF fX(x): ∑y f(x,y) / Int(y): f(x,y)dy
Marginal PDF fY(y): ∑x f(x,y) / Int(x): f(x,y)dx
Marginal CDF FX(x): Int(start,x): fX(s)ds (continuous case)
Marginal CDF FY(y): Int(start,y): fY(s)ds (continuous case)

1) C(u1, u2, u3):


This gives the probability that RV1 is in the bottom u1 percentile, and RV2 is in the bottom
u2 percentile, and RV3 is in the bottom u3 percentile.

2) Marginal CDF:
a) Fx(x) = int(s: start, x): fx(s)*ds

3) Property of CDF:
For a continuous random variable X, FX(X) follows U(0,1).
Hence, we can write FX(x) = u, where u lies in [0,1], for any such distribution.

4) Measure of Association:
a) Pearson correlation coefficient:
i) It measures the strength of linear relationship.
ii) After applying a monotonic function (X, X^2, X^3, ....), there is a different ρ for each
function. So, it does not satisfy the invariance property.
iii) ρX,Y = cov(X,Y)/sqrt(V(X))* sqrt(V(Y))

b) Spearman’s Rank Correlation coefficient:


i) It satisfies invariance property because rank of X and Y will always be same after
applying monotonic function.
ii) It measures the strength of all types of relationship (not only linear) because it
doesn’t take account the original value of X and Y.
iii) it is also called measure of concordance.
iv) sρ = 1 - 6/(T(T^2-1)) * ∑(i=1 to T) di^2
where T: number of pairs of observation
di : difference in rank for ith observation
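A minimal sketch of this formula, assuming no tied ranks; the monotone-transformed example illustrates the invariance property:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation via s_rho = 1 - 6*sum(d_i^2)/(T*(T^2-1)),
    assuming no tied ranks."""
    x, y = np.asarray(x), np.asarray(y)
    T = len(x)
    # argsort of argsort gives the rank (0-based) of each observation
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    d = rank_x - rank_y
    return 1 - 6 * np.sum(d ** 2) / (T * (T ** 2 - 1))

x = [1.0, 2.5, 3.1, 4.8, 6.0]
y = [2.0, 6.3, 9.6, 23.0, 36.1]     # an increasing (monotonic) transform of x
print(spearman_rho(x, y))           # 1.0: ranks are unchanged by the transform
```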

5) Copula Function:
A copula function takes marginal CDF as input and gives joint CDF as output.
C[ FX(x) , FY(y) ] = FXY(x,y)
Or C(u,v) = FXY(x,y), where the inputs u and v are the marginal CDF values (bivariate input information).
Where FX(x) = u and FY(y) = v

6) Properties of Copula function:


a) It is an increasing function of its inputs (an increase in a marginal CDF leads to an increase in the joint CDF).
C(u, v) ≥ C(u*, v) given u > u*.
b) If all the inputs of copula function are equal to 1, except one of the marginal CDF, the
copula function is equal to that marginal CDF.
C(1, 1 ,1 ,1, 1,u ,1 ,1 ,1 ) = u = Fx(x).
c) Copula function will always give answer between 0 and 1.

7) Sklar’s theorem:
For every joint CDF, there exist a copula function.
And if there is a copula function, there exist a joint CDF.
Let F be a joint distribution function with marginal cumulative distribution functions F1,…, Fd, then
there exists a copula C such that for all x1,…, xd ϵ [-∞, ∞]: F(x1,….., xd) = C[F1(x1),…, Fd(xd)]

8) Measure of Tail dependence (Extreme values):


a) Coefficient of upper tail dependence
It tells us how the higher values of X (e.g. the top 5%) relate to the higher values of Y (e.g. the top 5%).
If u = 0.95, the upper tail contains the top 0.05 of values.
λU = lim u tends to 1- P( Y > FY^-1(u) | X > FX^-1(u) )

b) Coefficient of lower tail dependence.


λL = lim u tends to 0+ P( Y <= FY^-1(u) | X <= FX^-1(u) )
9) Coefficient of lower tail dependence using copula function
λL = lim u tends to 0+ C(u,u)/u

10) Survival Copula:


C̅[ 1-u, 1-v ] = 1 – u – v + C(u,v)

11) Coefficient of upper tail dependence:


λU = lim u tends to 1- (1 - 2u + C[u,u])/(1-u) = lim u tends to 0+ C̅[u,u]/u

Types of Copula Functions:
 Implicit copulas: Gaussian copula, Student's t copula
 Fundamental copulas: Independence copula, Co-monotonic (minimum) copula, Counter-monotonic (maximum) copula
 Explicit copulas: Clayton copula, Gumbel copula, Frank copula

12) Types of Copula Function:


a) Fundamental copulas
b) Explicit copulas
c) Implicit copulas

13) Fundamental copulas:


These copula functions describe the three fundamental dependency structures that a
set of variables can have.

a) Independent Copula (Product Copula):


C(u ,v) = u*v
λL = λU = 0
because X and Y are fully independent, there is no dependence at the tails of the
function.
b) Co-monotonic (Minimum Copula):
C(u ,v) = min(u,v)
λL = λU = 1 because there is perfect positive correlation.

c) Counter-Monotonic (Maximum Copula):


C(u,v) = max( u+v-1, 0)
λL = λU = 0 because there is perfect negative correlation.
There is dependence between lower values of X and higher values of Y (and vice-versa).
But there is no dependence between lower values of Y and lower values of X (λL), nor between
higher values of both (λU).

14) Explicit Copula:

a. Gumbel Copula:
C[u,v] = exp( -( (-ln u)^α + (-ln v)^α )^(1/α) )
λL = 0 , λU = 2 - 2^(1/α)
Notes:
1. It applies when there is an upper tail on dependence but no lower tail dependence.
2. Higher the value of α, higher the level of upper tail dependence for Gumbel copula
(for α > 1).
3. If α = 1, it will become Independent or product copula.

b. Clayton Copula:
C[u,v] = (u^-α + v^-α – 1)^(-1/α)
λU = 0 , λL = 2^(-1/α)
Notes:
1. It applies when there is lower tail dependence but no upper tail dependence.
2. Higher the value of α, higher will be the lower tail dependence.

c. Frank Copula:
C[u,v] = -1/α * ln( 1 + (e^(-αu) – 1)*(e^(-αv) – 1)/(e^(-α) – 1) )
λL = λU = 0
Notes:
1. Under this, there is no upper or lower tail dependence but have positive
dependence throughout the function.
2. If there is no dependence at all across the whole function, i.e. α → 0, then the variables are
fully independent and C[u,v] = u*v.
15) Implicit Copula:
a) Gaussian Copula:
C[u,v] = Φρ[ Φ^-1(u) , Φ^-1(v) ]
Where Φ is the CDF of the Standard Normal Distribution.
And Φρ is the distribution function of the bivariate Normal Distribution with correlation ρ.
Notes:
1. If ρ = 0  Independent copula
2. If ρ = -1  Maximum Copula
3. If ρ = +1  Minimum Copula
4. A simplified formula is given on page no. 889 of the study material.
5. Gaussian copula has zero upper tail dependence for ρ<1.

b) Student’s t Copula:
C[u,v] = tγ,ρ [ tγ-1(u) , tγ-1(v) ]
Where γ is the degree of freedom.
tγ : CDF of t distribution
tγ,ρ: CDF of bivariate t distribution with correlation ρ.
Notes:
1. This copula is better than Gaussian copula as it has additional parameter γ.
2. γ decides degree of tail dependence (no matter upper tail or lower tail).
3. Smaller the value of γ, greater the level of tail dependence.
4. As γ → ∞, it approaches the Gaussian copula.

Inverse Function:
i) Put ψ(x) = y,
ii) Calculate x in terms of y.
iii) Replace x = ψ-1(y)

Pseudo-Inverse Function:
If ψ(0) = ∞ , ψ[-1](x) = ψ-1(x)

Archimedean Copula:
It is described by specific generator function:
C[u,v] = ψ[-1] (ψ(u) + ψ(v))
Where ψ(x) is corresponding generator function.
C[u1, u2,……..,un] = ψ[-1] (ψ(u1) + ψ(u2) +………………+ ψ(un))

Properties of Generator function ψ(t):


1. At t = 0, lim t→0 ψ(t) = ∞
2. At t = 1, lim t→1 ψ(t) = 0
So, ψ maps (0,1] onto [0,∞).
3. Ψ(t) should be decreasing function.
Calculation of copula function through generator function:
1. Calculate ψ(u) and ψ(v) from ψ(x).
2. Calculate ψ[-1](x) using steps of inverse function.
3. Put the above calculated values in Archimedean Copula function.

Generator function of:


1. Gumbel Copula: ψ(t) = (-ln t)^α
2. Clayton Copula: ψ(t) = 1/α * (t^-α – 1)
3. Independent Copula: ψ(t) = (-ln t)
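A minimal sketch that builds the Clayton copula from its generator, following the three calculation steps above, and checks it against the closed form given earlier (the value of α is hypothetical):

```python
import numpy as np

alpha = 2.0   # hypothetical Clayton parameter

def psi(t):
    """Clayton generator psi(t) = (1/alpha)*(t^(-alpha) - 1)."""
    return (t ** (-alpha) - 1) / alpha

def psi_inv(y):
    """Pseudo-inverse: solve y = (t^(-alpha) - 1)/alpha for t (psi(0) = inf)."""
    return (1 + alpha * y) ** (-1 / alpha)

def clayton(u, v):
    """Archimedean construction C[u,v] = psi_inv(psi(u) + psi(v))."""
    return psi_inv(psi(u) + psi(v))

u, v = 0.3, 0.7
# Agrees with the closed form C(u,v) = (u^-alpha + v^-alpha - 1)^(-1/alpha)
print(clayton(u, v), (u ** -alpha + v ** -alpha - 1) ** (-1 / alpha))
```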
Chapter – 18 (Reinsurance)

Life Insurance vs General Insurance:

Life Insurance:
 Model the number of claims only
 Long term (e.g. 25 years)
 Consider both interest and inflation factors
 Claim amounts are not random variables

General Insurance:
 Model both the number and the size of claims
 Short term (generally renewed every year)
 Only consider the inflation factor
 Claim amounts are random variables; use positively skewed distributions for claim sizes

Insurance:

Policyholder Insurer Reinsurer


(X) (Y) (Z)

where
X: Total claim amount claimed by the policyholder
Y: Claim amount paid by insurer
Z: Claim amount paid by reinsurer
So, X = Y + Z where all are the Random Variables.
X=Y if there is no reinsurance
Conditions under which insurance is available:
 Financially valuable/ expensive
 Durable
 Probability of event is very low. (Cancer patient can’t buy insurance)

Reasons of purchasing Reinsurance:


 Mean claim amount E[Y] decreases.
 Variability of claim amount V[Y] decreases
 Insurer can make use of expertise of reinsurance to model high risk portfolios.

Types of Reinsurance:
 Proportional: Quota Share, Surplus
 Non-Proportional: Excess of Loss, Stop Loss

Proportional Reinsurance:
1. Surplus: the retained proportion differs from risk to risk.
2. Quota Share: a fixed retained proportion (α) is agreed between the insurer and the reinsurer.
α: the proportion of each risk borne by the insurer, 0<α<1
1-α: the proportion of each risk borne by the reinsurer.

When Proportional (Quota Share) Reinsurance (α) is purchased and distribution


of X is given:
E[Y] = α*E[X] E[Z] = (1-α)*E[X]
V[Y] = α^2*V[X] V[Z] = (1-α)^2*V[X]

When Non-Proportional (Excess of Loss) Reinsurance (Maximum Amount: M) is


purchased and distribution of X is given:
E[Y] = E[X] – E[Z] = E[X] - Int(M,∞):(x-M)f(x)dx
E[Z] = Int(M,∞):(x-M)f(x)dx
Distributions:

1. Pareto Distribution(α,λ):
a. Proportional Reinsurance:
E[X] = λ/(α-1)
E[Y] = α*λ/(α-1) E[Z] = (1-α)*λ/(α-1)
(note: the α multiplying E[X] here is the retained proportion, not the Pareto parameter)

b. Excess of Loss Reinsurance:


E[X] = λ/(α-1)
E[Y] = λ/(α-1) - λ^α/[(α-1)*(λ+M)^(α-1)]
E[Z] = λ^α/[(α-1)*(λ+M)^(α-1)]

2. Log-normal Distribution(µ,σ2):
a. Proportional Reinsurance:
E[X] = e^(µ+0.5σ^2)
E[Y] = α*e^(µ+0.5σ^2)
E[Z] = (1-α)*e^(µ+0.5σ^2)

b. Excess of Loss Reinsurance:


E[X] = e^(µ+0.5σ^2)
E[Y] = e^(µ+0.5σ^2) - Int(M,∞):x*f(x)dx + M*(Int(M,∞):f(x)dx)
E[Z] = Int(M,∞):x*f(x)dx - M*(Int(M,∞):f(x)dx)
Note: Use the Actuarial Tables formula to solve the 1st integral with k = 1 and the 2nd integral
with k = 0.
Key Results:
log(Inf) = Inf , Φ(-inf) = 0 , Φ(0) = 0.5 , Φ(inf) = 1,
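A numerical sketch of the lognormal excess of loss split, using hypothetical parameters; the closed form uses the truncated-moment results noted above and should agree with direct numerical integration:

```python
import numpy as np
from scipy.stats import lognorm, norm
from scipy.integrate import quad

# Hypothetical lognormal claim distribution and retention level.
mu, sigma, M = 7.0, 0.8, 2000.0
dist = lognorm(s=sigma, scale=np.exp(mu))   # scipy's lognormal parameterisation

# E[Z] = Int(M,inf): (x - M) f(x) dx, evaluated numerically.
E_X = np.exp(mu + 0.5 * sigma ** 2)
E_Z_numeric, _ = quad(lambda x: (x - M) * dist.pdf(x), M, np.inf)
E_Y = E_X - E_Z_numeric

# Closed form via truncated moments (k = 1 and k = 0 as in the note above).
U = (np.log(M) - mu) / sigma
E_Z_closed = E_X * (1 - norm.cdf(U - sigma)) - M * (1 - norm.cdf(U))

print(E_X, E_Y, E_Z_numeric, E_Z_closed)   # the two E[Z] values agree
```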

Conditional Reinsurance
Under Conditional Reinsurance,
Z = X-M|X>M, where Z>0
Some Results:
a. FZ(z) = (FX(M+z) – FX(M)) / (1 - FX(M))
b. fZ(z) = fX(M+z) / (1 - FX(M))
c. If X~Exp(λ), then Z~Exp(λ)
d. If X~Pareto(α,λ), then Z~Pareto(α, λ+M)
Inflation
If claims are inflated by a factor k while the retention M is unchanged:
E[Y] = k*E[X] – E[Z] = k*E[X] - Int(M/k,∞):(kx-M)f(x)dx
Or E[Y] = Int(0,M/k):kx*f(x)dx + Int(M/k,∞):M*f(x)dx
E[Z] = Int(M/k,∞):(kx-M)f(x)dx

Policy Excess/ Deductible


L: the (fixed) excess, i.e. the part of each claim that is borne by the policyholder when
making a claim.
Y: the net amount paid by the insurer (after deducting the excess).
Y = X-L | X>L
E[amount paid | X>L] = E[(X-L)+] / P[X>L] = E[(X-L)+] / (1 - FX(L))
Where, for X ~ Pareto(α,λ): E[(X-L)+] = λ^α/[(α-1)*(λ+L)^(α-1)]

Special Case of Excess of Loss Of Reinsurance


The reinsurer covers the layer of each claim between the retention M and an upper limit K
(0<M<K<∞), so the insurer pays Y where:
E[Y] = Int(0,M):x*f(x)dx + Int(M,K):M*f(x)dx + Int(K,∞):(x+M-K)*f(x)dx

Estimation
1. Maximum Likelihood Estimation:
L = ∏(i=1 to n-a) f(xi) * (P[X>M])^a
Where a: the number of claims in which the reinsurer is involved (i.e. claims known only to exceed the retention M).

2. Method of Percentile:
P[X<= x25] = 0.25
P[X<= x75] = 0.75
Chapter – 19 & 20 (Risk Models – I & II)
Types of Risk Models:
1. Collective Risk Models
2. Individuals Risk Models

Collective Risk Models


S = X1 + X2 + X3 +……………………..+XN

Where Xi: Amount of ith claim and N:No. of claims


And S: Aggregate Claim Amount

CDF of S:
{S<=x}: It is the union of mutually exclusive events (corresponding to the different values of N), but
these events are not exhaustive (because outcomes with aggregate claims greater than x are not
covered here).

P[S<=x] = ∑(n=0 to ∞) P[S<=x and N=n]

Calculation of S with claim cost (Y):


X: Amount of claim
Y: Cost of claim
E[X+Y] = E[X] + E[Y]
V[X+Y] = V[X] + V[Y] if claim cost and claim amount are independent.

Estimation of S using Normal Approximation for n policies:


N ~ Poi(λ) for each policy
N1 + N2 + …… + Nn ~ Poi(n*λ) for n independent policies
S ~ N[ n*E[S1] , n*V[S1] ] approximately, where S1 is the aggregate claim amount from one policy.
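A simulation sketch of the collective risk model with hypothetical parameters, comparing the simulated mean and variance of S with the standard moment formulas E[S] = E[N]*E[X] and, for Poisson N, V[S] = E[N]*(V[X] + E[X]^2):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, mean_claim, n_sims = 50, 1000.0, 20_000   # hypothetical parameters

def simulate_S():
    """One draw of S = X_1 + ... + X_N with N ~ Poi(lam), X_i ~ Exp(mean 1000)."""
    n = rng.poisson(lam)
    return rng.exponential(mean_claim, size=n).sum()

samples = np.array([simulate_S() for _ in range(n_sims)])

E_S = lam * mean_claim                              # 50,000
V_S = lam * (mean_claim ** 2 + mean_claim ** 2)     # Poisson: E[N]*(V[X]+E[X]^2)
print(samples.mean(), E_S)
print(samples.var(), V_S)
```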

Conditional Mean, Variance and MGF based on two types:


1. E(X) = E(E(X|Type)) = E( X|Type I) *P(Type I) + E(X|Type II) *P(Type II)
2. E(X2) = E(E(X2|Type)) = E( X2|Type I) *P(Type I) + E(X2|Type II) *P(Type II)
Where E( X2|Type I) = V( X|Type I) + [E(X|Type I)]2
3. V[X] = E[X2] – E[X]2
4. MX(t) = E[etX] = E[E(etX|Type)] = E(etX|Type I)*P(Type I) + E(etX|Type II) *P(Type II)

Skewness:
Skewness = E[ (X – E[X])^3 ] = C'''(0)
Where CX(t) = ln(MX(t)); differentiate CX(t) three times w.r.t. t and evaluate at t = 0.
Coefficient of Skewness:
Skewness(X) / Var[X]^(3/2)

Conditional PDF:
Ni|λi ~ Poisson (i) and λi ~ Gamma(α,β)
P[Ni=x] = Int(λi): P[Ni = n|λi] * f(λi) dλi
f(n) = Int(λi = 0,∞): f(n|λi) * f(λi) dλi

Probability of all claim below Retention Limit:


∑(j=0 to ∞) P[all claims below retention | j claims] * P[j claims]
= ∑(j=0 to ∞) ( jCj * p^j * (1-p)^0 ) * ( e^-λ * λ^j / j! ) = e^(-λ(1-p))
where p = P[an individual claim is below the retention].

If there are more than two portfolios:

Portfolio A :
E[S1] = E[X]*E[N]
VAR[S1] = E[N]* VAR[X] + E[X]^2 * VAR[N]
NOTE: If N follows Poisson distribution, VAR[S1] = E[N]* [ VAR[X] + E[X]^2 ]

Portfolio B :
E[S2] = E[X]*E[N]
VAR[S2] = E[N]* VAR[X] + E[X]^2 * VAR[N]
NOTE: If N follows Poisson distribution, VAR[S2] = E[N]* [ VAR[X] + E[X]^2 ]

JOINT PORTFOLIO:
S = S1 + S2
E[S] = E[S1] + E[S2]
V[S] = V[S1] + V[S2] (assuming S1 and S2 are independent)

Expected profit per unit of time:


(i) Without reinsurance:
E[Profit] = c - E(S) = θ* E(S) where c=(1+θ)*E(S)

(ii) With reinsurance:


E[Profit] = c(Net) - E(SI)
= (1+θ)*E(S) – (1+ξ)*E(SR) - E(SI)
where SI is the insurer's net (retained) aggregate claims and SR is the reinsurer's aggregate claims.
Chapter – 21 (Machine Learning)
1. Supervised Learning:
In supervised learning, the machine is given a specified output or aim.
This might be the prediction of a specific numerical value (e.g. a future lifetime) or the
prediction of which category an individual will fall into (e.g. default on a loan or not).
Examples:
 Generalized linear models
 Naïve Bayes classification
 Decision trees
 Prediction of future lifetime /claims on certain classes of insurance
 Neural networks
 Defaulting of loan
 Regression models
 Logistic regression
 Discriminant analysis
 Support vector machines

2. Unsupervised Learning:
In unsupervised learning, the machine is set the task without a specific target to aim at (for
example, identifying clusters within a set of data without the number or nature of the
clusters needing to be pre-specified).
Examples:
 Cluster analysis
 Principal components analysis
 Apriori algorithm
 Market basket analysis
 Text analysis
 Neural networks

3. Train-validation-test approach:
1. In machine learning, the convention is to divide the data into two parts.
i) A training dataset
ii) A validation dataset
2. A training data set which is the sample of data used to fit the model; that is, to train the
algorithm to choose the most appropriate hypothesis;
3. A validation data set which is the sample of data used to provide an unbiased evaluation
of model fit on the training dataset while adjusting the hyper-parameters.
4. These hyper-parameters are often specified in advance and then adjusted/optimized
according to the performance of the model on the validation data;
5. Finally, the test dataset is used where a test data set which is the sample of data used to
provide an unbiased evaluation of the final model fit on the training data set. Under
machine learning the results of the modelling exercise are applied to data which was
not used to develop the algorithm, so the test data should be representative of the
data on which the algorithm is to be used.
6. A typical split of data is 60% for training, 20% for validation and 20% for testing, the
principle being that enough data must be selected for the validation and testing sets,
with the remainder used for the training set.

4. Advantages and disadvantages of using a large number of parameters:


1. The advantage of having more parameters is that it can improve the accuracy of the
model and its predictions, because a model with more parameters will fit the data more
closely than one with fewer parameters. For example, the flow of traffic is likely to be
affected by a large number of factors, such as time of day, weather, and weekday vs
weekend.
2. However, if too many parameters are used there is a risk of over-fitting, where the
estimates from the model will reflect idiosyncratic characteristics of the “training” data
set rather than characteristics which apply to the whole data set. This may lead to the
analyst identifying patterns which do not exist. For example, the analyst in this case
may have used a training dataset which includes anomalies in traffic flow, perhaps due
to a vehicle breaking down near one of the sensors, which distorted the data collected
as other vehicles had to divert around it.
3. If too many parameters are used the model can become complex and computationally
expensive to run
4. Using too many parameters may lead to model stability issues.

5. Parameters:
Parameters are variable internal to a model. These values are estimated from the data and
used when calculating predictions from the model.

6. Hyper-parameters:
Hyper-parameters are variables external to the model whose values are set in advance by
the user prior to running an algorithm. They are chosen based on the user’s knowledge and
experience in order to produce a model that works well.
7. Heuristic:
‘Heuristic’ means that there are no hard and fast rules for these. They are determined using
rough guidelines and past experience of what works well, combined with experimentation.

8. Over-fitting:
It leads to the identification of patterns that are specific to the training data and do not
generalize to other data sets.

So there is a trade-off here, between bias – the lack of fit of the model to the training data –
and variance – the tendency for the estimated parameters to reflect the specific data we
use for training the model.

9. Cross-validation:
Cross-validation is a technique to evaluate predictive models by partitioning the original
sample into a training set to train the model, and a test set to evaluate it. In s-fold cross-
validation, the original sample is randomly partitioned into s equal-sized subsamples and the
model is ‘trained’ s times, using a different subsample for validation each time.

10.Gini index measure:


It is a measure of the ‘impurity’ of the nodes in the tree, ie the extent to which final nodes
contain a mixture of different data types.
Gini index of a node: Gi = 1 - (p12 + p22 + ….)
Gini index of a tree: Weighted average of Gini index of each node. ∑i ni/n * Gi
Notes:
i) In case of binary classification problem, Gini index can take any value between 0 and
0.5.
ii) A node with a score of 0 is perfectly pure and only contains one category.
iii) If there are m categories, the range of gini index can be (0,1- 1/m).
iv) If there are m categories, a node with score 1 – 1/m means that the proportion of each
category in the node is 1/m.
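A minimal sketch of these two calculations (the node counts and proportions below are hypothetical):

```python
def gini_node(proportions):
    """Gini impurity of a single node: G = 1 - sum(p_i^2)."""
    return 1 - sum(p ** 2 for p in proportions)

def gini_tree(nodes):
    """Weighted average of node Gini indices; nodes is a list of
    (count, proportions) pairs for each terminal node."""
    total = sum(count for count, _ in nodes)
    return sum(count / total * gini_node(props) for count, props in nodes)

# Binary example: a pure node (G = 0) and a 50/50 node (G = 0.5)
nodes = [(60, [1.0, 0.0]), (40, [0.5, 0.5])]
print(gini_node([0.5, 0.5]))   # 0.5
print(gini_tree(nodes))        # 0.6*0 + 0.4*0.5 = 0.2
```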

11.Regularisation or penalization:
This approach exacts a penalty for having too many parameters.

12.Branches of Machine Learning:


1. With supervised learning, the algorithm is given a set of specific targets to aim for.
2. With unsupervised learning, the algorithm aims to produce a set of suitable labels (ie
targets).
3. With semi-supervised learning, the algorithm uses a combination of supervised and
unsupervised methods.
4. With reinforcement learning, the algorithm aims to improve its performance through
trial and error, using a rewards (or penalties) approach.

13.Examples of different branches:


1. An example of a classification problem is a spam filter that classifies emails into the two
categories ‘Safe’ or ‘Suspicious’.
2. An example of a regression problem is a health awareness app that predicts the user’s
life expectancy.
3. An example of a clustering problem is a system that groups together postcode areas
that tend to have a similar experience of insurance claims.
4. An example of semi-supervised learning is a photo app that groups photos featuring
people with a similar appearance and then allows the user to name the people in order
to add their names automatically to new photos
5. An example of reinforcement learning is a voice recognition app that adapts over time
to the user’s voice.

14.Extension of decision tree:


a. Bagged Decision tree: In bagged decision trees, we create random sub-samples
of our data with replacement, train a CART model on each sample, and (given new
data) calculate the average prediction from each model.
b. Boosted Decision tree: boosting fits a sequence of trees, with each successive tree
giving more weight to the observations that the previous trees predicted poorly, and
combines their predictions.
c. Random forest: Random forests apply a method based on averaging a number of
randomly generated decision trees.

15. Formula for naïve Bayes theorem:


P[A] = Prior Probability

P[A|B] = Posterior Probability

P[A|B] = P[A and B] / P[B] or

P[Ai|B] = ( P[Ai] * P[B|Ai] )/ P[B] where P[B] = ∑(j=1,k): P(Aj)*P(B|Aj)

P[C|A] = ( P[A|C]*P[C] ) / ( P[A|C]*P[C] + P[A|D]*P[D] )

Where C & D are two different events.


Topics Left-
1. Solve integration equation (using two methods)
2. Lee-Carter model

CT4 CHAPTERS
1. Stochastic process
2. Markov chains
3. Two-state Markov model
4. Time homogeneous & Inhomogeneous Markov jump model
6. Survival model
7. Estimating the lifetime distribution
8. Proportional hazard model
9. Exposed to risk
10. Graduation

CT6 CHAPTERS
14. Time series
15. Loss distributions
18. Reinsurance
19. Risk models
