Survival Analysis
Survival Analysis focuses on estimating the length of time until an event occurs. It is called
‘survival analysis’ because it was largely developed by medical researchers interested in
estimating the expected lifetime of different cohorts. Today, these methods are applied to many
types of events in the business domain.
Examples:
How long will a customer remain on the books before churning?
How long until equipment needs repairs?
Survival Analysis is useful when we want to measure the risk of events occurring and our data
are censored, meaning that for some subjects the event has not yet been observed.
The time until the event can be referred to as failure time, event time, or survival time.
If our data are complete and unbiased, standard regression methods may work.
Survival Analysis allows us to consider cases with incomplete or censored data.
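As a minimal sketch of what censored data can look like in practice, each subject can be stored as a duration plus a flag saying whether the event was actually observed. The customer records, the study_end cutoff, and the helper name to_duration_and_event below are all hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical customer records: (signup date, churn date or None if still active).
customers = [
    (date(2023, 1, 1), date(2023, 4, 1)),   # churned: event observed
    (date(2023, 2, 1), None),               # still active: right-censored
    (date(2023, 3, 1), date(2023, 5, 30)),  # churned: event observed
]
study_end = date(2023, 6, 30)  # assumed end of the observation window

def to_duration_and_event(start, end, cutoff):
    """Return (days observed, event flag) for one subject."""
    observed = end is not None and end <= cutoff
    stop = end if observed else cutoff
    return (stop - start).days, observed

records = [to_duration_and_event(s, e, study_end) for s, e in customers]
for days, observed in records:
    print(days, "event observed" if observed else "censored")
```

For the censored customer we only know that the true time to churn is longer than the recorded duration. Dropping such rows, or treating the cutoff as a churn date, would bias the estimated lifetimes downward.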
The Survival Function is defined as S(t) = P(T > t), where T is the time at which the event
occurs. It measures the probability that a subject will survive past time t.
This function:
Is decreasing (non-increasing) over time.
Starts at 1 for all observations when t=0
Approaches 0 as t becomes large
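These properties can be checked on a concrete survival function. The sketch below assumes an exponential lifetime, for which S(t) = exp(-rate * t). The exponential form and the rate of 0.5 are assumptions made for illustration, not something taken from the data discussed here.

```python
import math

def exponential_survival(t, rate=0.5):
    """S(t) = exp(-rate * t) for an exponential lifetime (rate is an assumed value)."""
    return math.exp(-rate * t)

times = [0, 1, 2, 5, 10, 50]
survival = [exponential_survival(t) for t in times]

# S(0) = 1, values never increase, and S(t) gets close to 0 for large t.
starts_at_one = survival[0] == 1.0
non_increasing = all(a >= b for a, b in zip(survival, survival[1:]))

for t, s in zip(times, survival):
    print(f"S({t}) = {s:.6f}")
```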
The Hazard Rate is defined as:
h(t) = \frac{f(t)}{S(t)}
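It represents the instantaneous rate at which the event occurs at time t, given that it has not
already occurred. Here f(t) is the probability density function of the event time T.
The cumulative hazard H(t) (the integral of h(u) from u = 0 to u = t, or the corresponding sum in
discrete time) represents accumulated risk over time.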
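The link between the hazard rate, the cumulative hazard, and the survival function can be illustrated numerically. The sketch below assumes a Weibull lifetime (the SHAPE and SCALE values are arbitrary), computes h(t) = f(t) / S(t), integrates it with a simple trapezoidal rule, and compares exp(-H(t)) with S(t), using the standard identity S(t) = exp(-H(t)).

```python
import math

SHAPE, SCALE = 1.5, 2.0  # assumed Weibull parameters, for illustration only

def density(t):
    """Weibull probability density f(t)."""
    z = (t / SCALE) ** SHAPE
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * math.exp(-z)

def survival(t):
    """Weibull survival function S(t) = P(T > t)."""
    return math.exp(-((t / SCALE) ** SHAPE))

def hazard(t):
    """Hazard rate h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=10_000):
    """Trapezoidal approximation of the integral of h(u) from 0 to t."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * width) for i in range(1, steps))
    return total * width

t = 3.0
H = cumulative_hazard(t)
print(f"H({t}) = {H:.4f}, exp(-H) = {math.exp(-H):.4f}, S({t}) = {survival(t):.4f}")
```

With a shape parameter above 1 the hazard increases over time, which is one way to model equipment that becomes more failure-prone as it ages.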
The Kaplan-Meier estimator is a non-parametric estimator. It allows us to use observed data to
estimate the survival distribution. The Kaplan-Meier Curve plots the cumulative probability of
survival beyond each given time period.
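A minimal from-scratch sketch of the Kaplan-Meier estimator is shown below. At each observed event time it multiplies the running survival estimate by (1 - events / number at risk). The function name kaplan_meier and the small data set are hypothetical, and a real analysis would more likely rely on an established survival analysis library.

```python
def kaplan_meier(durations, events):
    """Return [(time, estimated S(time))] at each distinct observed event time.

    durations: time each subject was followed
    events:    True if the event was observed, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Subjects whose event was observed exactly at time t.
        occurred = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: months on books and whether churn was observed.
durations = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]
km_curve = kaplan_meier(durations, events)
for t, s in km_curve:
    print(f"t = {t}: S = {s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk set up to the time they were last observed. That is how the estimator uses incomplete records instead of discarding them.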
Plotting a separate Kaplan-Meier Curve for each category of a feature allows us to visually
inspect whether survival rates appear to differ across those categories.
To see whether survival rates differ based on number of services, we estimate Kaplan-Meier
curves for different groups.
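A group-wise comparison of this kind can be sketched as follows: the data are split by number of services and one Kaplan-Meier curve is estimated per group. The rows below are invented for illustration and are not the data set discussed here. The kaplan_meier helper is a hypothetical from-scratch implementation.

```python
from collections import defaultdict

def kaplan_meier(durations, events):
    """Return [(time, estimated S(time))] at each distinct observed event time."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        occurred = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical rows: (number of services, months on books, churn observed).
rows = [
    (1, 2, True), (1, 4, True), (1, 5, False), (1, 6, True),
    (3, 5, True), (3, 9, False), (3, 12, True), (3, 14, False),
]

# Collect durations and event flags separately for each group.
groups = defaultdict(lambda: ([], []))
for services, months, churned in rows:
    groups[services][0].append(months)
    groups[services][1].append(churned)

curves = {g: kaplan_meier(d, e) for g, (d, e) in groups.items()}
for g in sorted(curves):
    steps = ", ".join(f"S({t}) = {s:.3f}" for t, s in curves[g])
    print(f"{g} service(s): {steps}")
```

In this made-up sample the curve for customers with three services stays higher for longer. With so few rows this is only a visual impression, not evidence of a real difference.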