CS2A Final Notes
1) Definition:
A Stochastic Process is a set / family / collection of ordered (time-dependent) Random
Variables.
e.g. Xt = X1, X2, X3, … : score of a T-20 innings after t balls,
where: State Space (S) = {0, 1, 2, …, 720}
and Time Domain (J) = {1, 2, 3, …, 120}
2) State Space:
It is the set of values which it is possible for a random variable Xt to take.
e.g. the state space of tossing a coin is {H, T}.
3) Time Set:
It is the set of times at which the process contains a random variable Xt.
e.g. the time set of repeatedly tossing a coin is {0, 1, 2, …}.
4) Sample path:
It is a single joint realisation of the random variables Xt for all t in J, i.e. one possible
trajectory of the process over the time period; it is a function from J to S.
e.g. for two coin tosses, one possible sample path is HT (the full set of possibilities
being {HH, HT, TH, TT}).
7) Increments:
It is the change in the process over a period of time, e.g. from Xt to X(t+u), i.e. X(t+u) − Xt.
Classification of processes by state space (S) and time set (J):

Discrete S,            Discrete S,            Continuous S,          Continuous S,
discrete J             continuous J           discrete J             continuous J
-------------------    -------------------    -------------------    --------------------
simple random walk     Poisson process        general random walk    Brownian motion
Markov chain           Markov jump process    time series            Ito process
NCD                    counting process       white noise            compound Poisson
credit rating at the   status of pension      inflation index        share price during a
end of the year        scheme members                                trading period
13) Stationarity:
Strict Stationarity: A stochastic process is said to be strictly stationary
if the joint distributions of
(Xt1, Xt2, …, Xtn) and (X(k+t1), X(k+t2), …, X(k+tn)) are identical, i.e.
f(Xt1, Xt2, …, Xtn) = f(X(k+t1), X(k+t2), …, X(k+tn))
for all t1, t2, …, tn and k+t1, k+t2, …, k+tn in J and for all integers n.
Hence, the statistical properties (mean, variance etc.) of the process remain unchanged as
time elapses.
Weak- Stationarity:
Because strict stationarity is very difficult to test in real life, we use the less stringent
condition of weak stationarity:
i) E[Xt] = constant (free of t)
ii) Cov[ Xt, X(t+k) ] = function of the time lag k only, not of t
iii) Var[Xt] = Cov[ Xt, Xt ] = constant, free of t (the case where the time lag k = 0)
iv) A random walk is not stationary, as its mean is a function of time t.
v) A white noise process (Zt) of independent variables is strictly stationary, with mean 0 and
variance σ² (both constant).
vi) A weakly stationary multivariate normal process is strictly stationary:
A normal distribution is defined by its mean, µ, and its variance, σ², only. So if these
are constant (as per the weak stationarity definition) then this uniquely defines
the process. Hence it will also be strictly stationary.
[Transition diagram: NCD discount states 0%, 30%, 60%; move down (or stay at 0%) with
probability 0.25, move up (or stay at 60%) with probability 0.75.]
It is a one-step Markov model.

            to 0%   to 30%   to 60%
     0%   [ 0.25    0.75     0    ]
P =  30%  [ 0.25    0        0.75 ]
     60%  [ 0       0.25     0.75 ]
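A quick sketch (Python, numpy assumed) of how n-step probabilities for this chain follow from
matrix powers via the Chapman-Kolmogorov equations:

    import numpy as np

    # One-step transition matrix for the NCD states (0%, 30%, 60%), from above
    P = np.array([[0.25, 0.75, 0.00],
                  [0.25, 0.00, 0.75],
                  [0.00, 0.25, 0.75]])

    # n-step transition probabilities: the (i,j) entry of P^n
    P3 = np.linalg.matrix_power(P, 3)
    print(P3[0, 2])   # P[at 60% discount after 3 steps | currently at 0%]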
4) Notation:
pij(t, t+n) = P[ X(t+n) = j | X(t) = i ]: the probability of moving from state i at time t
to state j at time t+n (in n steps).
13) Irreducibility:
If there always exists a path from any state i to any state j, irrespective of the number of
steps, then the Markov chain is irreducible.
14) Periodicity:
A state in a Markov chain is periodic with period d > 1 if a return to that state is possible
only in a number of steps that is a multiple of d.
Results:
a) If a Markov chain is irreducible, all the states have the same period or all the states are
aperiodic.
b) A Markov chain has period d if all the states in the chain have period d.
c) A Markov chain is aperiodic if there is no such d > 1.
d) Note: if the chain may remain in its current state, the chain is aperiodic.
2) Two-state Model:
[Diagram: two states, A (Alive) and D (Dead), with a one-way transition from A to D.]
S = {A, D}
J = (0, ∞)
a) P[ Alive at age x+t+h | Alive at age x ] = p(x, x+t+h)^AA = t+hpx
b) P[ Dead at age x+t+h | Alive at age x ] = p(x, x+t+h)^AD = t+hqx = 1 − t+hpx
c) P[ Dead at age x+t+h | Dead at age x ] = 1
d) P[ Alive at age x+t+h | Dead at age x ] = 0
b) Define RV Ti as follows:
i) x + Ti = age at which the observation of the ith life ends.
ii) Di = 0 ⇒ Ti = bi, i.e. no death occurred (discrete part).
iii) Di = 1 ⇒ ai < Ti < bi, i.e. death occurred between ages x+ai and x+bi
(continuous part).
iv) Ti is a mixed RV because it has a probability mass at bi.
5) Vi = Ti − ai (duration)
i) Vi = exact waiting time.
ii) Di = 0 ⇒ Ti = bi ⇒ Vi = bi − ai (maximum waiting time).
iii) Di = 1 ⇒ ai < Ti < bi ⇒ 0 < Vi < bi − ai (exact waiting time).
iv) Vi is a mixed RV with a probability mass at bi − ai.
v) Joint PD of Di and Vi (under a constant force µ):
f(di, vi) = vipx+ai * (µx+ai+vi)^di = e^(−µ*vi) * µ^di
vi) µ̂ = ∑di / ∑vi = d/v = total no. of deaths / total observed waiting time.
This is the point estimate of µ.
6) Asymptotically, µ̂ ~ N( µ, d/v² )
95% CI for µ = µ̂ ± 1.96*sqrt(d/v²)
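A minimal sketch of the point estimate and CI (Python), assuming the totals d and v have already
been calculated; the numbers are illustrative:

    from math import sqrt

    def mu_hat_ci(d, v, z=1.96):
        """MLE of the force of mortality with approximate 95% CI.
        Uses mu_hat = d/v and mu_hat ~ N(mu, d/v^2) asymptotically."""
        mu = d / v
        se = sqrt(d) / v      # sqrt(d/v^2)
        return mu, (mu - z * se, mu + z * se)

    # e.g. 20 deaths over 1,850 years of observed waiting time (illustrative)
    print(mu_hat_ci(20, 1850))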
7) Poisson Model:
Dx ~ Poisson( µx+0.5 * Ecx )
where Ecx = central exposed to risk (exact waiting time)
µ̂x+0.5 = d / Ecx = no. of deaths / total observed waiting time = d/v
V( µ̂x+0.5 ) = d/(Ecx)²
Chapter – 4 & 5 (Time-inhomogeneous
Markov jump process)
1) Markov jump processes:
A continuous-time Markov process Xt, t >= 0, with a discrete state space S
is called a Markov jump process.
That is, a Markov jump process is a stochastic process with a continuous time
set and a discrete state space that satisfies the Markov property.
b) Compact Form:
∂/∂t P(s,t) = P(s,t)*A(t)
where A(t) is the matrix with entries µij(t) .
c) Integrated Form:
pij(s,t) = ∑(k≠j) int(0, t−s): pik(s, t−w) * µkj(t−w) * exp[ −int(t−w, t): λj(u)du ] dw
where the integrand is:
the probability of going from state i at time s to state k at time t−w,
times the force of transition from state k to state j at time t−w,
times the probability of staying in state j from time t−w to time t.
8) Time-inhomogeneous HSD model (KFDE):
[Diagram: H (Healthy), S (Sick), D (Dead); transitions H→S at rate σ(t), S→H at rate ρ(t),
H→D at rate µ(t) and S→D at rate ν(t).]
Rs: The residual holding time at s is the amount of time after s for which the
process stays in its current state.
For (residual holding time at s) > w, the process must stay in state i for all
times u between s and s+w, given that the process was in state i at time s.
Or:
it is the probability that the process stays in state i for at least the next w time units:
P[ Rs > w | Xs = i ] = p̄ii(s, s+w) = exp[ −int(s, s+w): λi(t)*dt ]
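A small sketch (Python) of this occupancy probability, with an illustrative intensity λi(t) and
scipy assumed for the numerical integration:

    import math
    from scipy.integrate import quad

    lam_i = lambda t: 0.02 + 0.001 * t   # illustrative total intensity out of state i

    s, w = 10.0, 5.0
    integral, _ = quad(lam_i, s, s + w)  # integral of lambda_i(t) dt over (s, s+w)
    print(math.exp(-integral))           # P[process stays in state i until s+w]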
15) Probability that the process goes into state j when it leaves state i:
a) Given that the process is in state i at time s,
b) and it stays in state i until time s+w,
c) the probability that it then moves to state j at time s+w is: µij(s+w) / λi(s+w)
i) Diagonal entry (force of staying) for state i: µii(t) = −λi(t) = −∑(j≠i): µij(t)
ii) Total force of transition out of state i: λi(t) = ∑(j≠i): µij(t)
3) Properties of o(h):
a) o(h) ± o(h) = o(h)
b) k * o(h) = o(h)
c) lim(h→0): o(h)/h = 0
d) if lim(h→0): g(h)/h = A, then g(h)/h = A + o(h)/h, i.e. g(h) = A*h + o(h)
Note:
i) We would expect mx to be highest under the assumption giving the lowest estimated exposure.
ii) For a given number of deaths over the period, the estimated exposure would be
highest if we assume an increasing mortality rate.
iii) If the actual number of survivors is less than expected, then the UDD assumption
(increasing mortality rate, with deaths evenly spaced) is appropriate, but only over single
years of age; UDD is not acceptable over a 10-year age span.
iv) Under UDD, the survival function decreases linearly between ages x and y, while under
CFM, the survival function decreases exponentially between ages x and y.
19) Formulas:
a) When the PDF fX(t) is given:
i) FX(t) = int(0,t): fX(s)*ds
ii) SX(t) = 1 − FX(t) = 1 − int(0,t): fX(s)*ds
iii) µx+t = fX(t) / ( 1 − FX(t) ) = PDF / Survival Function
Nelson-Aalen Model:
It derives an estimator of the cumulative hazard function:
Λ̂t = ∑(tj<=t): dj/nj (= ∑(j=1 to k): λ̂j), and Ŝ(t) = exp(−Λ̂t)
Var[Λ̂t] ≈ ∑(tj<=t): dj*(nj − dj)/nj³
L = ∏(censored lives) S(ti) * ∏(deaths) f(ti)
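A minimal sketch (Python) of the Nelson-Aalen calculation above, with illustrative (dj, nj)
pairs in time order:

    import math

    events = [(2, 100), (1, 95), (3, 90)]   # (deaths d_j, at risk n_j) at each t_j

    Lambda, var = 0.0, 0.0
    for d, n in events:
        Lambda += d / n                 # cumulative hazard estimate
        var += d * (n - d) / n**3       # estimated variance of Lambda-hat
    S = math.exp(-Lambda)               # S-hat(t) = exp(-Lambda_t)
    print(Lambda, var, S)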
Types of Censoring:
1. Right Censoring:
Right censoring occurs when a life exits the investigation for a reason other than death.
Both random censoring and Type I censoring are examples of right censoring.
E.g. Endowment policy matures.
i) Random Censoring:
With random censoring, the censoring times are not known in advance – they are not
chosen by the investigator and are random variables.
E.g. in life insurance, the event of a policyholder choosing to surrender their policy.
2. Left Censoring:
It occurs when the censoring mechanism prevents us from knowing when entry into the state
that we wish to observe took place, or when past information is missing.
e.g. the exact date of birth / date of falling sick is not known.
3. Interval Censoring:
It occurs when observation plan allows us to say that an event of interest fell within some
interval of time.
E.g. a policy sold in year 2020 means the event fell in the interval 1/1/2020 to 31/12/2020.
4. Type – II Censoring:
Observation is continued until a predetermined number of deaths has occurred.
e.g. a trial ends after 100 lives on a particular course of treatment have died.
5. Informative censoring:
Informative censoring arises when there is reason to believe that the risk of death of the lives
still in the cohort differs from that of those who have left it.
E.g. In this investigation withdrawals might be informative, since lives that are in better health
may be more likely to surrender their policies than those in a poor state of health. Lives that are
censored are therefore likely to have lighter mortality than those that remain in the
investigation.
6. Non-informative censoring:
Censoring is non-informative if it gives no information about the future patterns of
mortality by age for the censored lives.
E.g. In the context of this investigation, non-informative censoring occurs if at any given time,
lives are equally likely to be censored regardless of their subsequent force of mortality. This
means that we cannot tell anything about a person’s mortality after the date of the censoring
event from the fact that they have been censored.
Median:
The median time to the event (e.g. the median time to qualify) as estimated by the Kaplan-Meier
estimate is the first time at which Ŝ(t) falls below 0.5.
Chapter – 8 (Proportional Hazard
Model)
Definition:
If the hazard for life i is λ(t; zi), then λ(t; zi) = λ0(t) * exp(β^T zi), where λ0(t) is the
baseline hazard and β is a vector of regression parameters.
Fully parametric models vs Cox regression model for assessing the impact of
covariates on survival:
Fully parametric models are good for comparing homogenous groups, as confidence
intervals for the fitted parameters give a test of difference between the groups which
should be better than non-parametric procedures, or semiparametric procedures such
as the Cox model. But parametric methods need foreknowledge of the form of the
hazard function, which might itself be the object of the study.
The Cox model is semi-parametric so such knowledge is not required. The Cox model is a
standard feature of many statistical packages for estimating survival model, but many
parametric distributions are not, and numerical methods may be required, entailing
additional programming.
Proportional Hazard:
µi(t) = λ0(t) * exp(β1X1 + β2X2)
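Under this model, the ratio of the hazards for two lives is exp(β·(z1 − z2)) and does not depend
on t; a small sketch (Python) with illustrative coefficients and covariates:

    import math

    beta = [0.4, -0.2]    # illustrative regression coefficients beta1, beta2
    z1 = [1, 0]           # covariates of life 1
    z2 = [0, 0]           # covariates of life 2 (baseline)

    # hazard ratio exp(beta . (z1 - z2)); the baseline hazard lambda_0(t) cancels
    hr = math.exp(sum(b * (a - c) for b, a, c in zip(beta, z1, z2)))
    print(hr)             # constant over time under proportional hazards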
Observation:
tpx = g^(c^x * (c^t − 1)) where g = exp(−B / ln c)
The Gompertz force of mortality µx = B*c^x is an exponential function of age, which implies
that the rate of increase in mortality with age is constant.
The Gompertz model is appropriate at ages over about 30.
It is appropriate for ages at which the force of mortality is increasing
exponentially.
Likelihood of data:
L = ∏(i=1 to n): f(ti)^di * S(ti)^(1−di)
where f(ti) is the probability density function and S(ti) is the survivor function.
Since f(ti) = h(ti)*S(ti),
the likelihood can be rewritten as:
L = ∏(i=1 to n): h(ti)^di * S(ti)
Calculate 95% CI of β:
(-∞ , β̂+1.645*sqrt(CRLB)) for β<0
(β̂ - 1.96*sqrt(CRLB) , β̂ + 1.96*sqrt(CRLB)) for β≠0
(β̂ - 1.645*sqrt(CRLB) , ∞) for β<0
Where CRLB = -1/E[d2l/dβ2] at β = β̂
Result: If 0 lies in this interval, there is insufficient evidence to reject the null
hypothesis (as stated above).
Chapter – 9 (Exposed to Risk)
Principle of correspondence:
The principle of correspondence states that the death data and the exposed to risk must
be defined consistently, ie the numerator (dx) and denominator (Ecx) must correspond.
A life alive at time t should be included in the exposure at age x at time t if and only if,
were that life to die immediately, he or she would be counted in the death data dx at
age x .
Assumptions:
i) Data vary linearly between two census dates, or birthdays are uniformly
distributed over the calendar year.
ii) We assume that deaths follow a Poisson distribution with parameter (µx+1/2 * Ecx), with µ
applicable at the mid-point of the rate interval; the rate interval depends on the definition
of age used in the death data.
µ and q estimates:
Definition of x          Rate interval       µ̂ estimates    q̂ estimates
Age last birthday        [x, x+1]            µx+1/2          qx
Age nearest birthday     [x − ½, x + ½]      µx              qx−1/2
Age next birthday        [x−1, x]            µx−1/2          qx−1
Comparison of recent experience (to check the consistency i.e., shape of mortality curve
over range of ages and level of mortality rates) with:
Company’s own experience
Already published life tables (standard tables like National Life Tables, English Life Tables and
tables based on data from insurance companies).
Limitation of Graduation:
If the data are faulty or biased, the resulting graduation will never be reliable.
1. Chi-squared Test:
i) Purpose: To test whether the observed numbers of deaths at each age are consistent with the
graduated mortality rates or a particular standard table.
ii) Rationale / Observation: A high value of the test statistic indicates that the discrepancies
between the observed numbers and those predicted by the graduated rates or standard table are
large; hence the fit is not very good, suggesting overgraduation.
iii) Assumptions:
No heterogeneity of mortality (e.g., no accidental hump) within each age group
Lives are independent
The expected number of deaths are high enough (at least 5 in each cell) for the chi-
square approximation to be valid.
iv) Methods:
a) Step 1: Calculate zx for each age group:
i) Binomial Model:
zx = (dx − Ex*qx°) / sqrt( Ex*qx° * (1 − qx°) )
ii) Poisson Model:
zx = (dx − Ex*µ°x+0.5) / sqrt( Ex*µ°x+0.5 )
Hence, calculate zx².
b) Step 2: Combine small groups so that the expected number of deaths is never less than 5.
c) Step 3: Calculate the test statistic for the chi-square goodness-of-fit test:
∑(O−E)²/E, i.e. ∑(i=1 to m): zx² ~ chi-sq(m)
(a sketch follows after this test's notes)
d) Step 4: Calculate the appropriate degrees of freedom:
m: number of age groups, less
the number of parameters fitted,
the number of constraints, and
the number of groups combined in Step 2.
We lose 2-3 degrees of freedom for every 10 ages graduated graphically.
v) Conclusion: If the value of the test statistic exceeds the critical value at the upper 5%
point of the chi-square distribution, it indicates a poor fit, i.e. overgraduation.
vi) Strengths of chi-square test: It is a good test to check overall goodness-of-fit.
vii) Deficiencies of the chi-square test:
a. Outliers: a few large deviations can be offset by a lot of small deviations, so the test
could be satisfied although the data do not satisfy the distributional assumptions
(assumption 1 defined above).
b. Small bias: it ignores a consistent positive or negative bias; since it is based on the
squared differences (O−E)², it tells us nothing about the direction of any bias.
c. Clumps/Runs: there could be significant runs of deviations of the same sign over groups
of consecutive ages (clumping), which the test cannot detect.
d. Rates may not progress smoothly from age to age, and the test cannot detect this either.
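A minimal sketch of Steps 1-4 for the Poisson model (Python; illustrative data, scipy assumed
for the critical value):

    import numpy as np
    from scipy.stats import chi2

    d = np.array([12, 15, 9, 20])            # observed deaths by age group
    E = np.array([100, 120, 80, 150])        # central exposed to risk
    mu = np.array([0.11, 0.13, 0.10, 0.14])  # graduated forces of mortality

    z = (d - E * mu) / np.sqrt(E * mu)       # standardised deviations (Step 1)
    ts = np.sum(z**2)                        # test statistic (Step 3)
    dof = len(d) - 3                         # e.g. after deducting parameters etc. (Step 4)
    print(ts, chi2.ppf(0.95, dof))           # poor fit if ts exceeds the critical value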
5. Sign Test:
i) Purpose: To test for overall bias (whether the graduated rates are too high or too low – the
2nd deficiency of the chi-square test).
ii) Rationale / Observation: If there are m groups, the number of positive (or negative)
deviations should have a Binomial(m, ½) distribution. An excessively high number of positive
or negative deviations indicates that the rates are biased.
iii) Assumptions: None.
iv) Methods:
a) Step 1: H0: P ~ Bin(m, ½), where P: no. of zx which are positive.
b) Step 2: Calculate P[P >= k] if the positive deviations dominate (or P[P <= k] if the
negative deviations dominate) and multiply by 2 (two-tailed test); a sketch follows after
this test's notes.
c) If m is large, P ~ N(m/2, m/4) approximately.
d) If the p-value exceeds 5%, there is insufficient evidence to reject the null hypothesis.
e) Under the normal approximation, if P[P >= k − 0.5] > 2.5% we cannot reject H0, where k is
the number of positive deviations and 0.5 is the continuity correction.
f) Similarly, if P[P <= k + 0.5] > 2.5% we cannot reject H0, where k is the number of
negative deviations.
v) Conclusion: If the test shows that the number of positive deviations is too large, the
graduated rates are too low; if it is too small, the graduated rates are too high.
vi) Strengths:
a. It can detect the overall bias.
b. It is rigidly defined. (Single solution).
vii) Deficiencies of the sign test:
a. Looking at the sign alone does not tell us the extent of the discrepancy.
b. The test is qualitative, not quantitative.
c. If there are equal numbers of deviations on both sides, the test is inconclusive.
d. It ignores the magnitude of the deviations; hence the test can be passed even if the
deviations are large.
e. It does not look at the pattern of occurrence of the deviations (it can't detect clumping).
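A small sketch of the sign test calculation (Python; illustrative zx values, scipy assumed):

    from scipy.stats import binom

    z = [0.5, -1.2, 0.8, 1.1, -0.3, 0.9, 1.4, -0.7, 0.2, 1.0]   # illustrative zx values
    m = len(z)
    k = sum(1 for v in z if v > 0)     # number of positive deviations

    # two-tailed p-value under H0: P ~ Bin(m, 1/2)
    p = 2 * min(binom.cdf(k, m, 0.5), 1 - binom.cdf(k - 1, m, 0.5))
    print(k, p)                        # cannot reject H0 at 5% if p > 0.05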
6. Cumulative deviations:
i) Purpose: It addresses the inability of the chi-square test to detect a large positive or
negative cumulative deviation over the whole range of ages.
ii) Rationale / Observation: Compare the total actual deaths ∑dx with the total expected
deaths ∑Ecx*µ°x. It is a two-tailed test.
iii) Assumptions: None.
iv) Calculate the test statistic:
1. Binomial Model:
(∑dx − ∑Ex*qx°) / sqrt( ∑Ex*qx° * (1 − qx°) ) ~ N(0,1)
2. Poisson Model:
(∑dx − ∑Ex*µ°x+0.5) / sqrt( ∑Ex*µ°x+0.5 ) ~ N(0,1)
v) Result: If the test statistic lies between −1.96 and 1.96 (5% significance level, 2.5% in
each tail), then we have insufficient evidence to reject H0.
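A minimal sketch of the cumulative deviations statistic for the Poisson model (Python; same
illustrative data as the chi-square sketch):

    import numpy as np

    d = np.array([12, 15, 9, 20])            # observed deaths
    E = np.array([100, 120, 80, 150])        # central exposed to risk
    mu = np.array([0.11, 0.13, 0.10, 0.14])  # graduated forces of mortality

    ts = (d.sum() - (E * mu).sum()) / np.sqrt((E * mu).sum())
    print(ts)    # insufficient evidence to reject H0 if -1.96 < ts < 1.96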
Disadvantages (of graduation by parametric formula):
i) It is hard to find a formula that fits well at all ages without having lots of
parameters.
ii) Care is required when extrapolating: the fit is bound to be best at ages where we
have lots of data, and it can often be poor at extreme ages.
Methods of Graduation:
1. By Parametric formula:
a. It should be used when the aim is to produce a standard table.
b. It depends upon a suitable formula being found which fits the data well.
c. Provided the number of parameters is small, the resulting curve should be smooth.
2. With reference to a standard table:
a. It should be used if a standard table for a class of lives similar to the experience is
available.
b. The table must relate to a similar class of lives, eg assurances and not annuities.
c. It must be available for all classes of lives required, eg males and females.
d. It should be up-to-date, ie relate to fairly recent experience.
e. It must cover the age range for which rates are required.
f. It should be a ‘benchmark’ table, ie generally acceptable to other actuaries.
g. It should be used when we do not have much data.
h. The standard table will be smooth, so provided there is a simple function which
links the graduated rates to the standard table rates, the smoothness will be
transferred to the graduated rates.
i. A company that generally insures non-standard lives is unlikely to find a
suitable standard table.
j. Steps:
i. Select a suitable table based on similar group of lives.
ii. Plot the crude rates against the qx values from the standard table to identify a simple
relationship.
iii. Find the best-fitting parameters, using MLE or weighted least squares estimates.
iv. Test the graduation for goodness of fit. If the fit is not adequate, the
process should be repeated.
3. Graphical method:
a. It should be used if a quick check is needed and the data are very scanty (small in
quantity), or there is very little prior knowledge of the class of lives being analysed.
b. The graduation should be tested for smoothness by examining the third differences of the
graduated rates, which should be small in magnitude and should progress regularly
with age.
c. Steps:
i. Plot the crude data, preferably on a logarithmic scale.
ii. If data are scanty, group the ages together, choosing evenly spaced groups
and making sure that there is a suitable number of deaths (at least 5) in each
group.
iii. Plot approximate confidence limit or error bars around the plotted crude
rates.
iv. Draw the curve as smoothly as possible, trying to capture the overall
shape of crude rates.
v. Test goodness of fit or smoothness of graduation.
vi. If the graduation fails, re-draw the curve.
vii. If smoothness is unsatisfactory, the curve can be adjusted by “hand
polishing” and testing again.
1. These are polynomials of a specified degree which are defined on a piecewise basis
across the age range.
A. We choose spline functions which are polynomials of a specified degree,
defined on a piecewise basis between the knots.
B. The spline function must satisfy three conditions at the knots:
i. The function must be continuous at the knots.
ii. The function must have a continuous first derivative at the knots, i.e.
there should be no sharp turn at the knots.
iii. The function must have a continuous second derivative at the knots,
i.e. there should be no sudden change in curvature at the knots.
iv. Note: the minimum degree of the spline function is cubic, so that the
second derivative of the spline function exists and is continuous.
v. If the knots are x1, x2, ……, xn then, for a natural cubic spline, the function
should be linear before x1 and after xn.
vi. Formula of the spline function:
α0 + α1*x + ∑(j=1,n): βj*Φj(x)
vii. For ages x < first knot x1, the formula reduces to α0 + α1*x.
2. Steps:
A. Identify the ages at which we are choosing the knots.
i. We can choose knots by looking at the behaviour of crude mortality rates.
B. Preliminary calculations (see the sketch after these steps):
i. Calculate Φj(x) for each knot.
C. Estimate parameter values
D. Calculate graduation rates
E. Test
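A simplified sketch of steps A-C (Python) using plain truncated cubic terms
Φj(x) = max(x − xj, 0)³; the natural-spline adjustment that forces linearity outside the outer
knots is omitted here for brevity, and the knot positions are illustrative:

    import numpy as np

    def phi(x, knot):
        """Truncated cubic term (x - knot)^3, set to 0 below the knot."""
        return np.maximum(x - knot, 0.0) ** 3

    ages = np.arange(40, 91, dtype=float)
    knots = [50, 65, 80]                 # chosen from the behaviour of the crude rates

    # design matrix for alpha0 + alpha1*x + sum_j beta_j * phi_j(x)
    X = np.column_stack([np.ones_like(ages), ages] +
                        [phi(ages, k) for k in knots])
    print(X.shape)    # parameters can then be fitted by (weighted) least squares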
Chapter – 12 (Mortality Projection)
Methods of Projecting Mortality Rate:
1. Methods based on expectation:
Equation:
R(x,t) = αx + (1 − αx)*(1 − fn,x)^(t/n)
Where
R(x,t): the proportion by which the mortality rate at age x (qx) is expected to have been
reduced by future year t (the reduction factor)
αx: ultimate reduction factor / maximum reduction level / lowest possible value of the
reduction factor
1 − αx: maximum proportionate reduction in future mortality at age x
fn,x: proportion of the total decline that is expected to occur within n years
t: number of years ahead being projected
n: number of years over which mortality at age x is expected to decline
Parameter Estimation:
Both parameters αx and fn,x are set by expert opinion (perhaps based on an analysis of
recently observed mortality trends).
Calculation:
mx,0: mortality rate for the base year of the mortality projection
αx = (minimum possible mortality rate at age x) / (mortality rate for the base year of the
mortality projection)
mx,t: projected (central) rate of mortality at age x in year t
Example:
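A worked sketch (Python) of the reduction-factor projection, with illustrative parameter values:

    # reduction factor R(x,t) = alpha_x + (1 - alpha_x) * (1 - f_nx)**(t/n)
    alpha_x, f_nx, n = 0.4, 0.6, 20      # illustrative expert-opinion parameters
    q_x0 = 0.05                          # base-year mortality rate at age x

    for t in (5, 10, 20):
        R = alpha_x + (1 - alpha_x) * (1 - f_nx) ** (t / n)
        print(t, q_x0 * R)               # projected rate q(x,t) = q(x,0) * R(x,t)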
Lee-Carter Model:
ln(mx,t) = ax + bx*kt + εx,t, or
mx,t = exp(ax + bx*kt + εx,t)
ax: mean value of ln(mx,t) averaged over all periods t for age x / the general shape of
mortality at age x.
bx: extent to which the time period affects the mortality rate at age x / it measures the
change in rates in response to an underlying time trend in the level of mortality kt for
age x.
kt: effect of time on mortality, or the factor related to mortality rates for year t / effect of
the time trend in year t.
εx,t: stochastic error terms, assumed to be IID random variables for all x, t with
mean 0.
Factors:
i) An age factor
ii) A period factor
Constraints:
∑b̂x = 1 over all values of x
∑k̂t = 0 over all values of t
Calculation of parameters:
âx = (1/n) * ∑(t=1,n): ln(m̂x,t)
If kt is modelled as a time series (a random walk with drift µ̂):
k̂(t0+1) = k(t0) + µ̂
k̂(t0+i) = k(t0) + i*µ̂
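A minimal sketch of these calculations (Python), using a crude estimate of kt (relying on the
∑bx = 1 constraint) in place of the usual SVD fit; the data are illustrative:

    import numpy as np

    # ln m(x,t): rows = ages, columns = years (illustrative central rates)
    ln_m = np.log(np.array([[0.010, 0.009, 0.0085, 0.008],
                            [0.020, 0.019, 0.0180, 0.017]]))

    a_x = ln_m.mean(axis=1)                    # a_x-hat = average of ln m(x,t) over t
    k_t = (ln_m - a_x[:, None]).sum(axis=0)    # crude k_t-hat, summing over ages

    drift = np.diff(k_t).mean()                # mu-hat for the random walk with drift
    k_fore = k_t[-1] + drift * np.arange(1, 4) # k(t0+i) = k(t0) + i*mu-hat
    print(a_x, k_t, k_fore)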
4. General Result:
Dx,t/ Ecx,t is an unbiased estimator of mx,t. So,
mx,t = E[ Dx,t / Ecx,t ] and p̂x,t = exp(−m̂x,t)
qx,t = qx,0 * R(x,t)
Chapter – 13 & 14 (Time Series)
1. Meaning of Time Series:
A time series is a stochastic process with continuous state space and discrete time domain.
E.g. closing price of a stock.
Where S = (0,∞) and J = { 0,1,2…..}
3. Stationarity:
A stochastic process is called stationary if the statistical properties (mean, variance etc.) of
process remain unchanged as the time elapses.
For practical purposes, it is sufficient for a series to be “weakly stationary”, which requires
its first two moments to be constant over time. In other words, the mean and variance take
constant values, and the covariance depends only on the lag, not on the time t.
Stationarity is an issue relating only to the autoregressive AR(p) terms, and is not affected by
adding or subtracting constants; the MA(q) terms in an ARMA(p,q) can be ignored, so we can
reduce the ARMA(p,q) process to a simple AR(p) process when checking stationarity.
4. Invertibility:
A time series process is said to be invertible if we can express et in terms of present and past
values of Xt only.
9. Markov Property:
If the future evolution of the process can be completely determined from knowledge of
its current state only (one-step dependency), all other past information being irrelevant
or useless, then the process is said to have the Markov property.
Note: AR(p) has the Markov property if and only if p = 1.
10.Important Points:
1. Stationarity is necessary for all these models, since the Yule-Walker equations do not hold
without the existence of the autocovariance function.
2. If the observed data come from an MA(q), the autocorrelation function (ρk)
will cut off (i.e. ρk = 0) for all k > q (non-zero up to lag q), and the PACF (φk) will decay
exponentially to zero but never get there, so the PACF will always be non-zero.
3. If the observed data come from an AR(p), e.g. the time series is an AR(3) series, the
autocorrelation function (ρk) will decay (i.e. tend to 0) and the PACF (φk) will cut off
(i.e. φk = 0) for k > 3.
4. Exponential smoothing might be expected to outperform Box-Jenkins forecasting when
a slowly varying trend or multiplicative seasonal variation is present.
14.Inspection of ACF:
Purpose: This test also checks whether the residuals are uncorrelated.
T.S.: The ACF of the residuals should be zero for all lags except k = 0.
An approximate 95% confidence interval for ρk, k >= 1, is (−1.96/sqrt(n) , 1.96/sqrt(n)).
Result: If all the sample ρk values fall within the above confidence interval, there is
insufficient evidence to reject the null hypothesis. Hence the tests suggest that the residuals
form a white noise process, and the fitted ARMA(p,q) model is satisfactory.
15. ARCH(p) Model:
ARCH(p) models are defined by the relation:
Xt = µ + et*sqrt( α0 + ∑(k=1 to p): αk*(Xt−k − µ)² )
where et is a sequence of independent standard normal random variables.
ARCH models can be used for modelling financial time series. If Zt is the price
of an asset at the end of the tth trading day, an ARCH model can be used to model
Xt = ln(Zt/Zt−1), interpreted as the daily return on day t.
The ARCH family of models captures the feature, frequently observed in asset price data,
that a significant change in the price of an asset is often followed
by a period of high volatility: a significant deviation of Xt−k from the mean µ
gives rise to an increase in the volatility of the asset prices.
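A small simulation sketch of an ARCH(1) process (Python, numpy assumed; parameters
illustrative), showing how a large deviation feeds the next period's conditional variance:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, a0, a1, n = 0.0, 0.1, 0.8, 500     # illustrative ARCH(1) parameters
    x = np.empty(n)
    x[0] = mu

    for t in range(1, n):
        e = rng.standard_normal()
        x[t] = mu + e * np.sqrt(a0 + a1 * (x[t-1] - mu) ** 2)

    print(x[:5])   # large |x[t-1] - mu| inflates the volatility of x[t]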
17.Tests applied to residuals (et) after fitting a model to time series data:
i) The turning point test
ii) The “portmanteau” Ljung-Box chi-square test
iii) The inspection of the values of SACF values based on their 95% CI under
white noise null hypothesis.
See more detail on these tests on page no. 1259.
1. General Form:
2. Auto-Correlation Function(ρk):
i) ρXY = Cov(X,Y) / sqrt(V(X)*V(Y)); ρk = γk/γ0
ii) MA(1): ρk = 0 for all k > 1
iii) MA(2): ρk = 0 for all k > 2
iv) MA(q): ρk = 0 for all k > q
3. Conditions to check stationarity for an MA process:
An MA process is always stationary, because it is a linear combination
of white noise processes.
Characteristic Equation:
1 + β1z + β2z² + …… + βqz^q = 0
If all roots satisfy |z| > 1, the process is invertible.
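A quick sketch of the root check (Python, numpy assumed; coefficients illustrative):

    import numpy as np

    # MA(2): X_t = mu + e_t + b1*e_{t-1} + b2*e_{t-2}
    b1, b2 = 0.5, 0.06                        # illustrative coefficients
    roots = np.roots([b2, b1, 1])             # 1 + b1*z + b2*z^2 = 0, highest power first
    print(roots, np.all(np.abs(roots) > 1))   # invertible iff all |z| > 1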
1. General Form:
Xt = µ + α1(Xt−1 − µ) + α2(Xt−2 − µ) + ……… + αp(Xt−p − µ) + β1et−1
+ β2et−2 + ……… + βqet−q + et
φ(B)*Yt = θ(B)*et
where φ(B) = [ 1 − α1B − α2B² − …… − αpB^p ], using Yt = Xt − µ and BXt = Xt−1,
and
θ(B) = [ 1 + β1B + β2B² + …… + βqB^q ]
Note: It is assumed that φ(B) and θ(B) have no common factors. If there are
common factors, then the expression must be simplified.
2. Auto-Covariance Function (γk):
i) ARMA(1,1): γk = α*γk−1 for all k > 1
ii) ARMA(3,2): γk = α1*γk−1 + α2*γk−2 + α3*γk−3 for all k > 2
Once the lag exceeds q (the MA order), the ARMA autocovariances satisfy the same
recursion as those of an AR(p) process.
3. Auto-Correlation Function(ρk):
i) ρXY = Cov(X,Y) / sqrt(V(X)*V(Y)); ρk = γk/γ0
ii) ARMA(1,1): ρk = α^(k−1)*ρ1 for all k > 1
Likewise, once the lag exceeds q the ACF satisfies the same recursion as that of an AR(p)
process.
4. Conditions to check stationarity for an ARMA process:
Check the stationarity of φ(B) (refer to AR(p), 4th point); the MA part is always
stationary.
5. Conditions to check Invertibility:
Check the invertibility of θ(B) (refer to MA(q), 5th point); the AR part is always
invertible.
6. Important Result:
i. AR(p) is the special case ARMA(p,0):
AR(p) = ARMA(p,0)
ii. MA(q) is the special case ARMA(0,q):
MA(q) = ARMA(0,q)
1. General Form (ARIMA(p,d,q), after differencing d times):
φ(B)*Yt = θ(B)*et
where
φ(B) = [ 1 − α1B − α2B² − …… − αpB^p ] and
θ(B) = [ 1 + β1B + β2B² + …… + βqB^q ]
2. Calculation of d:
a. If φ(B) is stationary, d = 0, and ARIMA(p,0,q) = ARMA(p,q).
b. If φ(B) is not stationary, make the series stationary by differencing d times.
c. If the sample variances of the series at different orders of differencing are given, then
we choose the d with the minimum sample variance.
d. If the ACF decays slowly from 1, then differencing is needed;
otherwise, d = 0.
3. Important Result:
a. AR(p) = ARIMA(p,0,0)
b. MA(q) = ARIMA(0,0,q)
c. ARMA(p,q) = ARIMA(p,0,q)
1. General Form (VAR(1)):
Wt = µ + A*(Wt−1 − µ) + et
where Wt = [ Yt ]
           [ Xt ]   (a vector of the processes modelled jointly), and
A is an m×m (square) matrix.
Here, the future value (Wt) depends only upon 1 past value (Wt−1), so it is a VAR(1)
process.
2. Conditions to check Stationarity:
Solve |A − λI| = 0 for the eigenvalues λ.
If |λ| < 1 for all eigenvalues λ, the process is stationary.
3. Conditions to check Invertibility:
Just like AR(p), VAR(p) is always invertible.
4. Diagnostic checks
Note: More details in notebook and page no. 1245 of study material.
ii) AR(2):
I. γ0 = α1*γ1 + α2*γ2 + σ², or ρ0 = α1*ρ1 + α2*ρ2 + σ²/γ0 = 1
II. γ1 = α1*γ0 + α2*γ1, or ρ1 = α1 + α2*ρ1
III. γ2 = α1*γ1 + α2*γ0, or ρ2 = α1*ρ1 + α2
IV. γk = α1*γk−1 + α2*γk−2 for all k >= 2
iii) AR(p) = γk = α1*γk-1 + α2*γk-2 + α3*γk-3 +………. + αp*γk-p for all k >= p
4. Other Formulas:
iv) µ̂ = ∑xi/n
v) γ̂0 = ∑(xi − x̄)²/n = Cov(Yt,Yt) = V(Yt)
vi) γ̂1 = ∑(xi − x̄)*(xi−1 − x̄)/n
vii) γ̂2 = ∑(xi − x̄)*(xi−2 − x̄)/n
viii) ϕ̂1 = ρ̂1 = γ̂1/γ̂0 (lag-1 sample ACF)
ix) ϕ̂2 = (ρ̂2 − ρ̂1²)/(1 − ρ̂1²), where ρ̂2 = γ̂2/γ̂0
x) Corr(x,y) = Cov(x,y)/sqrt(var(x)*var(y))
xi) Let Xt = α*Xt−1 + et with et ~ N(0,σ²); then:
a. Xt − α*Xt−1 ~ N(0, σ²)
b. Xt | Xt−1 ~ N(α*Xt−1, σ²)
c. L = ∏(i=1 to n): P(Xi = xi | xi−1) * P(x0), where x0 is treated as fixed, i.e. P(x0) = 1
d. Yt = 1 + 0.6Yt−1 + 0.16Yt−2 + et, of the form Yt = a0 + a1Yt−1 + a2Yt−2 + et.
From the stationarity condition,
E(Yt) = µ = 1 + 0.6µ + 0.16µ, i.e. µ = a0/(1 − a1 − a2) = 1/0.24 ≈ 4.167
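A small sketch (Python, numpy assumed) verifying the stationarity condition and the mean for
this AR(2):

    import numpy as np

    # Y_t = 1 + 0.6*Y_{t-1} + 0.16*Y_{t-2} + e_t
    a0, a1, a2 = 1.0, 0.6, 0.16

    # characteristic equation 1 - a1*z - a2*z^2 = 0; stationary iff all |z| > 1
    roots = np.roots([-a2, -a1, 1])
    print(np.abs(roots))          # 1.25 and 5 here, so the process is stationary

    mu = a0 / (1 - a1 - a2)       # from E[Y] = a0 + a1*mu + a2*mu
    print(mu)                     # 1/0.24 = 4.1667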
GEV Family of Distributions
3) Hazard rate:
h(x) = f(x)/(1 − F(x))
If h(x) is an increasing function of x, then X has a light tail;
if h(x) is a decreasing function of x, then X has a heavy tail.
2) Marginal CDF:
a) FX(x) = int(lower limit of support, x): fX(s)*ds
3) Property of CDF:
For any continuous distribution, FX(X) ~ U(0, 1) (the probability integral transform).
Hence FX(x) = u can be written for any distribution.
4) Measure of Association:
a) Pearson correlation coefficient:
i) It measures the strength of the linear relationship between X and Y.
ii) It is not invariant under monotonic transformations: applying functions (X, X², X³, …)
gives a different ρ for each, so it does not satisfy the invariance property.
iii) ρX,Y = Cov(X,Y) / ( sqrt(V(X)) * sqrt(V(Y)) )
5) Copula Function:
A copula function takes the marginal CDFs as inputs and gives the joint CDF as output:
C[ FX(x), FY(y) ] = FXY(x,y)
or C(u,v) = FXY(x,y) (bivariate case),
where FX(x) = u and FY(y) = v
7) Sklar’s theorem:
For every joint CDF there exists a copula function,
and for every copula function (applied to given marginals) there exists a joint CDF.
Let F be a joint distribution function with marginal cumulative distribution functions F1,…, Fd;
then there exists a copula C such that for all x1,…, xd ϵ [−∞, ∞]: F(x1,…, xd) = C[F1(x1),…, Fd(xd)]
Types of Copula Functions
a. Gumbel Copula:
C[u,v] = exp( −( (−ln u)^α + (−ln v)^α )^(1/α) )
λL = 0 , λU = 2 − 2^(1/α)
Notes:
1. It applies when there is upper tail dependence but no lower tail dependence.
2. The higher the value of α (for α > 1), the higher the level of upper tail dependence
for the Gumbel copula.
3. If α = 1, it will become Independent or product copula.
b. Clayton Copula:
C[u,v] = (u^(−α) + v^(−α) − 1)^(−1/α)
λU = 0 , λL = 2^(−1/α)
Notes:
1. It applies when there is lower tail dependence but no upper tail dependence.
2. The higher the value of α, the higher the lower tail dependence.
c. Frank Copula:
C[u,v] = −(1/α) * ln( 1 + (e^(−αu) − 1)*(e^(−αv) − 1) / (e^(−α) − 1) )
λL = λU = 0
Notes:
1. Under this copula there is no upper or lower tail dependence, but there is positive
dependence throughout the function (for α > 0).
2. If there is no dependence at all over the whole function, i.e. α → 0, the variables are
fully independent and C[u,v] = u*v.
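A minimal sketch (Python) evaluating the three explicit copulas above (the α value is
illustrative):

    from math import exp, log

    def gumbel(u, v, a):    # upper tail dependence (a > 1)
        return exp(-(((-log(u)) ** a + (-log(v)) ** a) ** (1 / a)))

    def clayton(u, v, a):   # lower tail dependence (a > 0)
        return (u ** -a + v ** -a - 1) ** (-1 / a)

    def frank(u, v, a):     # no tail dependence
        return -1 / a * log(1 + (exp(-a * u) - 1) * (exp(-a * v) - 1) / (exp(-a) - 1))

    for c in (gumbel, clayton, frank):
        print(c.__name__, c(0.9, 0.9, 2.0))   # P[U <= 0.9, V <= 0.9] under each copula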
15) Implicit Copula:
a) Gaussian Copula:
C[u,v] = Φρ[ Φ^(−1)(u) , Φ^(−1)(v) ]
where Φ is the CDF of the standard Normal distribution,
and Φρ is the distribution function of the bivariate Normal distribution with correlation ρ.
Notes:
1. If ρ = 0 Independent copula
2. If ρ = -1 Maximum Copula
3. If ρ = +1 Minimum Copula
4. A simplified formula is given on page 889 of the study material.
5. Gaussian copula has zero upper tail dependence for ρ<1.
b) Student’s t Copula:
C[u,v] = tγ,ρ[ tγ^(−1)(u) , tγ^(−1)(v) ]
where γ is the degrees of freedom,
tγ: CDF of the t distribution,
tγ,ρ: CDF of the bivariate t distribution with correlation ρ.
Notes:
1. This copula is more flexible than the Gaussian copula as it has the additional parameter γ.
2. γ determines the degree of tail dependence (both upper and lower tail).
3. The smaller the value of γ, the greater the level of tail dependence.
4. As γ → ∞, it approaches the Gaussian copula.
Inverse Function:
i) Put ψ(x) = y,
ii) solve for x in terms of y,
iii) replace: x = ψ^(−1)(y).
Pseudo-Inverse Function:
ψ^[−1](x) = ψ^(−1)(x) for 0 <= x <= ψ(0), and 0 otherwise; in particular, if ψ(0) = ∞ then
ψ^[−1](x) = ψ^(−1)(x) everywhere.
Archimedean Copula:
It is described by a specific generator function ψ:
C[u,v] = ψ[-1] (ψ(u) + ψ(v))
Where ψ(x) is corresponding generator function.
C[u1, u2,……..,un] = ψ[-1] (ψ(u1) + ψ(u2) +………………+ ψ(un))
Insurance:
where
X: total claim amount arising under the policy
Y: claim amount paid by the insurer
Z: claim amount paid by the reinsurer
So X = Y + Z, where all are random variables;
X = Y if there is no reinsurance.
Conditions under which insurance is available:
The insured item is financially valuable / expensive.
The insured item is durable.
The probability of the insured event is very low (e.g. a person already suffering from cancer
can't buy insurance against it).
Types of Reinsurance:
Reinsurance is either Proportional or Non-Proportional.
Proportional Reinsurance:
1. Surplus: the retained proportion differs from risk to risk.
2. Quota Share: a fixed retained proportion (α) is agreed between insurer and reinsurer.
α: the proportion of the risk borne by the insurer, 0 < α < 1.
1 − α: the proportion of the risk borne by the reinsurer.
1. Pareto Distribution(α, λ):
a. Proportional Reinsurance (retained proportion k, to avoid a clash with the Pareto α):
E[X] = λ/(α−1)
E[Y] = k * λ/(α−1)
E[Z] = (1−k) * λ/(α−1)
2. Log-normal Distribution(µ, σ²):
a. Proportional Reinsurance (retained proportion α):
E[X] = e^(µ+0.5σ²)
E[Y] = α * e^(µ+0.5σ²)
E[Z] = (1−α) * e^(µ+0.5σ²)
Conditional Reinsurance (excess of loss):
Under excess of loss reinsurance with retention M, the reinsurer's conditional claim amount is
Z = X − M | X > M, where Z > 0.
Some Results:
a. FZ(z) = ( FX(M+z) − FX(M) ) / ( 1 − FX(M) )
b. fZ(z) = fX(M+z) / ( 1 − FX(M) )
c. If X ~ Exp(λ), then Z ~ Exp(λ) (memoryless property).
d. If X ~ Pareto(α, λ), then Z ~ Pareto(α, λ+M).
Inflation
With claims inflation factor k and retention M (gross claim kX):
E[Y] = E[kX] − E[Z] = E[kX] − Int(M/k,∞): (kx − M)f(x)dx
or E[Y] = Int(0,M/k): kx*f(x)dx + Int(M/k,∞): M*f(x)dx
E[Z] = Int(M/k,∞): (kx − M)f(x)dx
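A numerical sketch (Python) of these integrals for lognormal claims with inflation (scipy
assumed; parameters illustrative):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import lognorm

    mu, sigma, M, k = 5.0, 0.8, 400.0, 1.05   # illustrative claims, retention, inflation
    f = lognorm(s=sigma, scale=np.exp(mu)).pdf

    # E[Z] = integral over x > M/k of (k*x - M)*f(x) dx
    EZ, _ = quad(lambda x: (k * x - M) * f(x), M / k, np.inf)

    EkX = k * np.exp(mu + 0.5 * sigma ** 2)   # E[kX], mean of the inflated claim
    print(EZ, EkX - EZ)                       # E[Y] = E[kX] - E[Z]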
Estimation
1. Maximum Likelihood Estimation:
L = ∏(i=1 to n−a): f(xi) * [P(X > M)]^a
where a: number of claims in which the reinsurer is involved (claims recorded only as M).
2. Method of Percentile:
P[X<= x25] = 0.25
P[X<= x75] = 0.75
Chapter – 19 & 20 (Risk Models – I & II)
Types of Risk Models:
1. Collective Risk Models
2. Individual Risk Models
CDF of S:
{S<=x}: It is the union of mutually exclusive events (due to the different values of N), but
these are not exhaustive (because outcomes with total claims greater than x are not covered
here).
Skewness:
Skewness = E[ (X − E[X])³ ] = C'''(0)
where CX(t) = ln(MX(t)) is the cumulant generating function; differentiate CX(t) three times
w.r.t. t and set t = 0.
Coefficient of Skewness:
Skewness(X) / Var[X]^(3/2)
Conditional PDF:
Ni | λi ~ Poisson(λi) and λi ~ Gamma(α,β)
P[Ni = n] = Int(λi): P[Ni = n | λi] * f(λi) dλi
i.e. f(n) = Int(λi = 0,∞): f(n|λi) * f(λi) dλi
Portfolio A :
E[S1] = E[X]*E[N]
VAR[S1] = E[N]* VAR[X] + E[X]^2 * VAR[N]
NOTE: If N follows Poisson distribution, VAR[S1] = E[N]* [ VAR[X] + E[X]^2 ]
Portfolio B :
E[S2] = E[X]*E[N]
VAR[S2] = E[N]* VAR[X] + E[X]^2 * VAR[N]
NOTE: If N follows a Poisson distribution, VAR[S2] = E[N]* [ VAR[X] + E[X]^2 ]
JOINT PORTFOLIO:
S = S1 + S2
E[S] = E[S1] + E[S2]
V[S] = V[S1] + V[S2] (by independence of S1 and S2)
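A small worked sketch (Python) of the compound Poisson moments and the combined portfolio
(numbers illustrative):

    # S = X_1 + ... + X_N with N ~ Poisson(lam), claims X iid and independent of N
    lam = 100                      # expected claim count
    EX, EX2 = 500.0, 400_000.0     # E[X] and E[X^2] of individual claims

    ES1 = lam * EX                 # E[S1] = E[N]*E[X]
    VarS1 = lam * EX2              # Poisson case: Var[S1] = lam*(Var[X] + E[X]^2)

    # for the joint portfolio of two independent identical books:
    ES, VarS = 2 * ES1, 2 * VarS1  # means and variances simply add
    print(ES1, VarS1, ES, VarS)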
2. Unsupervised Learning:
In unsupervised learning the machine is set a task without a specific target to aim at; for
example, identifying clusters within a set of data without the number or nature of the
clusters needing to be pre-specified.
Examples:
Cluster analysis
Principal components analysis
Apriori algorithm
Market basket analysis
Text analysis
Neural networks
3. Train-validation-test approach:
1. In machine learning, the convention is to divide the data into parts:
i) A training dataset
ii) A validation dataset
(plus, as described below, a test dataset).
2. A training data set which is the sample of data used to fit the model; that is, to train the
algorithm to choose the most appropriate hypothesis;
3. A validation data set which is the sample of data used to provide an unbiased evaluation
of model fit on the training dataset while adjusting the hyper-parameters.
4. These hyper-parameters are often specified in advance and then adjusted/optimized
according to the performance of the model on the validation data;
5. Finally, the test dataset is used: this is the sample of data used to provide an unbiased
evaluation of the final model fit on the training data set. Under
machine learning the results of the modelling exercise are applied to data which was
not used to develop the algorithm, so the test data should be representative of the
data on which the algorithm is to be used.
6. A typical split of the data is 60% for training, 20% for validation and 20% for testing,
the principle being that enough data must be selected for the validation and testing sets,
with the remainder used for the training set.
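A minimal sketch of the 60/20/20 split (Python, numpy assumed; dataset size illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000                                   # illustrative dataset size
    idx = rng.permutation(n)                   # shuffle before splitting

    train = idx[: int(0.6 * n)]                # 60%: fit the model
    valid = idx[int(0.6 * n): int(0.8 * n)]    # 20%: tune the hyper-parameters
    test = idx[int(0.8 * n):]                  # 20%: final unbiased evaluation
    print(len(train), len(valid), len(test))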
5. Parameters:
Parameters are variables internal to a model. Their values are estimated from the data and
used when calculating predictions from the model.
6. Hyper-parameters:
Hyper-parameters are variables external to the model whose values are set in advance by
the user prior to running an algorithm. They are chosen based on the user’s knowledge and
experience in order to produce a model that works well.
7. Heuristic:
‘Heuristic’ means that there are no hard and fast rules for these. They are determined using
rough guidelines and past experience of what works well, combined with experimentation.
8. Over-fitting:
Over-fitting leads to the identification of patterns that are specific to the training data and
do not generalise to other data sets.
So there is a trade-off here, between bias – the lack of fit of the model to the training data –
and variance – the tendency for the estimated parameters to reflect the specific data we
use for training the model.
9. Cross-validation:
Cross-validation is a technique to evaluate predictive models by partitioning the original
sample into a training set to train the model and a test set to evaluate it. In s-fold cross-
validation, the original sample is randomly partitioned into s equal-size subsamples and the
model is ‘trained’ s times, using a different subsample for validation each time.
11.Regularisation or penalization:
This approach exacts a penalty for having too many parameters.
CT4 CHAPTERS
1. Stochastic process
2. Markov chains
3. Two-state Markov model
4. Time homogeneous & Inhomogeneous Markov jump model
6. Survival model
7. Estimating the lifetime distribution
8. Proportional hazard model
9. Exposed to risk
10. Graduation
CT6 CHAPTERS
14. Time series
15. Loss distributions
18. Reinsurance
19. Risk models