Sac401-Lesson 2

Uploaded by

njokikariuki72

LESSON TWO

PARAMETRIC MODELING OF SURVIVAL DATA


1.1 Introduction
In Lesson One we defined failure time and considered various distributions for it, both when it is discrete
and when it is continuous. In this lesson we assume that a specific family of distributions has
been selected, so that the distribution is known except for a vector parameter φ. In
previous statistics courses you were taught how to construct the likelihood function and
carry out inference when the data are completely observed. In this lesson we
will look at the situation in which some of the observations are incomplete (censored). Data of this type are
referred to as survival data. We will construct the likelihood function when some of the
observations are incomplete and look at different methods of making inference about the
unknown parameter φ. This is called parametric modelling of survival data.

1.2 Learning Outcomes.


By the end of this unit you will be able to:
2.2.1 Construct the likelihood function for survival data when the failure time is continuous
and when it is discrete.
2.2.2 Carry out inference using various methods.
2.2.3 Use this likelihood function to estimate the parameter φ, construct confidence intervals
for it and make inferences about it for different distributions.
1.2.1 Construction of the likelihood function.
We suppose that we have a single sample of failure times, possibly subject to censoring, which
we are to use to make inference about φ.
Often φ = (w, λ), where w is the parameter of interest and λ is a nuisance parameter.

For example, if the distribution is Normal, N(μ, σ²), then w could be μ and λ could be σ².

We look at methods of making inference about w based on the likelihood function, so it is
important to know how to derive the likelihood function.

1.2.1.1 When the failure time T is continuous

Consider a sample which consists of complete and incomplete observations. The
complete observations are the observed failure times $t_i$ and the incomplete observations are
the censoring times $c_i$. The likelihood function is built from the contribution of each type of
observation.

A unit observed to fail at t contributes a term f(t; φ) to the likelihood, the density of failure at
time t. On the other hand, a unit whose survival time is censored at c
contributes S(c; φ), the probability of survival beyond c.

The full likelihood from n independent units is given by

$$L = \prod_{U} f(t_i; \phi) \prod_{C} S(c_i; \phi),$$

where $\prod_U$ and $\prod_C$ indicate products over uncensored and censored units respectively. The log likelihood is

$$l = \log L = \sum_{U} \log f(t_i; \phi) + \sum_{C} \log S(c_i; \phi).$$

Since $x_i = \min(t_i, c_i)$,

$$l = \sum_{U} \log f(x_i; \phi) + \sum_{C} \log S(x_i; \phi).$$

Now $f(t_i; \phi) = h(t_i; \phi)\, S(t_i; \phi)$, thus

$$\begin{aligned}
l &= \sum_{U} \log\left[ h(x_i; \phi)\, S(x_i; \phi) \right] + \sum_{C} \log S(x_i; \phi) \\
  &= \sum_{U} \log h(x_i; \phi) + \sum_{U} \log S(x_i; \phi) + \sum_{C} \log S(x_i; \phi) \\
  &= \sum_{U} \log h(x_i; \phi) + \sum_{A} \log S(x_i; \phi),
\end{aligned}$$

where A indicates summation over all sample units.

Next we know that

$$S(t) = \exp(-H(t)).$$

Then

$$l = \sum_{U} \log h(x_i; \phi) - \sum_{A} H(x_i; \phi). \qquad (2.1)$$
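
Equation (2.1) translates almost directly into code. The following is a minimal sketch only: the function name `censored_loglik`, the toy sample and the choice of a constant hazard are assumptions made for illustration and are not part of the lesson.

```python
import math

def censored_loglik(times, failed, hazard, cum_hazard):
    """Equation (2.1): sum over failures of log h(x_i) minus sum over all units of H(x_i)."""
    log_h_term = sum(math.log(hazard(x)) for x, f in zip(times, failed) if f)
    cum_h_term = sum(cum_hazard(x) for x in times)
    return log_h_term - cum_h_term

# Hypothetical toy sample: x_i = min(t_i, c_i); failed[i] is True for an observed failure.
times = [2.0, 3.5, 5.0, 7.0]
failed = [True, False, True, False]

# Constant hazard beta (an exponential model), chosen only for illustration.
beta = 0.2
value = censored_loglik(times, failed, lambda t: beta, lambda t: beta * t)
print(value)
```

Passing the hazard and the cumulative hazard as functions keeps the sketch independent of any particular distribution; only those two functions change from one parametric family to another.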

1.2.2 Fitting a parametric model to a single sample of survival data by assuming different
distributions.

1.2.2.1 Exponential Distribution

The probability density function of the exponential distribution is given by

$$f(t) = \beta e^{-\beta t}, \qquad t \ge 0,\ \beta > 0.$$

The survivor function is given by

$$S(t) = e^{-\beta t}.$$

The hazard function, $h(t) = f(t)/S(t)$, is

$$h(t; \phi) = h(t; \beta) = \beta.$$

The log likelihood of the single parameter β is given by

$$l = \sum_{U} \log h(x_i; \phi) - \sum_{A} H(x_i; \phi)$$

from equation (2.1). Now

$$H(x_i; \phi) = \int_0^{x_i} h(u)\, du = \beta \int_0^{x_i} du = \beta x_i.$$

Thus

$$l = d \log \beta - \beta \sum x_i,$$

where d is the total number of failures and $\sum x_i$, taken over all units, is often called the total time at risk for both failures and survivors.

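To get the M.L.E. we differentiate the log likelihood with respect to β and equate to zero:

$$\frac{dl}{d\beta} = \frac{d}{\beta} - \sum x_i = 0,$$

so that

$$\hat{\beta} = \frac{d}{\sum x_i},$$

which is the M.L.E. of β.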

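We get the variance of $\hat{\beta}$ using the method of the information matrix (2.2.2.2).
The information is given by

$$I = -\frac{d^2 l}{d\beta^2} = \frac{d}{\beta^2}.$$

Now

$$\operatorname{Var}(\hat{\beta}) \approx I^{-1} = \frac{\beta^2}{d},$$

which is estimated by

$$\frac{\hat{\beta}^2}{d} = \frac{d}{\left(\sum x_i\right)^2}.$$

Note that censored failure times contribute to the denominator and not to the numerator of the
ratio $\hat{\beta} = d / \sum x_i$.
If there is no censoring (all the observations are complete), then d = n and the log likelihood becomes

$$l = n \log \beta - \beta \sum x_i.$$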
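
The closed-form results for the exponential model can be checked numerically. The sketch below is illustrative only; the values of d and of the total time at risk are hypothetical, and the function name `exp_loglik` is an assumption. It uses central finite differences to confirm that the first derivative of the log likelihood vanishes at d / Σx_i and that minus the second derivative there equals d / β².

```python
import math

def exp_loglik(beta, total_time, d):
    """Exponential log likelihood l = d log(beta) - beta * sum(x_i)."""
    return d * math.log(beta) - beta * total_time

# Hypothetical values: d failures and total time at risk sum(x_i).
d, total_time = 4, 50.0
beta_hat = d / total_time        # closed-form M.L.E.
var_hat = beta_hat ** 2 / d      # estimated variance, equal to d / total_time**2

# Central finite differences for the first and second derivatives at beta_hat.
eps = 1e-5
l_minus = exp_loglik(beta_hat - eps, total_time, d)
l_zero = exp_loglik(beta_hat, total_time, d)
l_plus = exp_loglik(beta_hat + eps, total_time, d)
score = (l_plus - l_minus) / (2 * eps)
information = -(l_plus - 2 * l_zero + l_minus) / eps ** 2

print(beta_hat, var_hat)
print(score, information, d / beta_hat ** 2)
```

The numerical score should be close to zero and the numerical information close to d / β̂², up to finite-difference error.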
1.2.2.2 When the failure time T is discrete

Next we consider the case when T is discrete with probability $f_j(\phi)$ at point $U_j$, such that
$U_0 \le U_1 \le U_2 \le \dots$

We take the convention that a unit censored at point C could have been observed to fail at C.
With this convention a unit that fails at $U_j$ contributes $f_j(\phi)$ and one censored at C contributes
$S(C^+; \phi)$.

Now

$$S(C^+; \phi) = P(T > C) = 1 - \sum_{U_j \le C} f_j(\phi).$$

But

$$f_j(\phi) = h_j(\phi)\,(1 - h_0(\phi))(1 - h_1(\phi)) \cdots (1 - h_{j-1}(\phi)) = h_j(\phi) \prod_{k=0}^{j-1} (1 - h_k(\phi)).$$

Then, taking $U_j$ to be the largest point not exceeding C,

$$S(C^+; \phi) = (1 - h_j(\phi)) \prod_{k=0}^{j-1} (1 - h_k(\phi)) = (1 - h_0(\phi))(1 - h_1(\phi)) \cdots (1 - h_j(\phi)),$$

so that

$$S(C^+; \phi) = \prod_{j:\, U_j \le C} (1 - h_j(\phi))$$

for units which have not failed at C. Each term is a product over the points $U_j$.

To obtain the full likelihood from a sample of n observations we first collect all the terms
corresponding to $U_j$. Suppose there are $d_j$ failures among the $r_j$ units in view at $U_j$ (i.e. $r_j$ is the number of units that
have not yet failed or been censored at $U_j$, or simply the units at risk at point $U_j$).
The contribution of $U_j$ to the total likelihood function is

$$(h_j(\phi))^{d_j} (1 - h_j(\phi))^{r_j - d_j}.$$

Therefore the total likelihood is given by

$$L = \prod_{j} (h_j(\phi))^{d_j} \left[ 1 - h_j(\phi) \right]^{r_j - d_j},$$

$$l = \log L = \sum_{j} \left( d_j \log h_j(\phi) + (r_j - d_j) \log (1 - h_j(\phi)) \right). \qquad (2.2)$$
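
Equation (2.2) can be evaluated with a few lines of code once the $d_j$, $r_j$ and the hazards $h_j(\phi)$ are available. The sketch below is illustrative only: the function name `discrete_loglik`, the summary counts and the one-parameter model with a common hazard at every point are all assumptions, not part of the lesson.

```python
import math

def discrete_loglik(hazards, deaths, at_risk):
    """Equation (2.2): sum_j d_j log h_j + (r_j - d_j) log(1 - h_j)."""
    total = 0.0
    for h, d, r in zip(hazards, deaths, at_risk):
        total += d * math.log(h) + (r - d) * math.log(1.0 - h)
    return total

# Hypothetical summary at three failure points.
deaths = [1, 2, 1]      # d_j
at_risk = [10, 8, 5]    # r_j

# Assumed model for illustration: the same hazard h at every point.
value = discrete_loglik([0.15] * 3, deaths, at_risk)

# For this assumed model, setting dl/dh = 0 in (2.2) gives h = sum(d_j) / sum(r_j).
h_hat = sum(deaths) / sum(at_risk)
value_at_mle = discrete_loglik([h_hat] * 3, deaths, at_risk)
print(value, h_hat, value_at_mle)
```

Under the common-hazard assumption the log likelihood at `h_hat` is at least as large as at any other trial value such as 0.15.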
Example 2.1
The following is a report of a clinical trial to evaluate the efficacy of maintenance
chemotherapy for acute leukemia. Patients were randomly allocated to groups A and B. Group
A received maintenance chemotherapy and group B did not. The following times
(in weeks) to relapse (lengths of remission) were observed; an asterisk denotes a censored observation.

Group A: 9, 13, 13*, 18, 23, 28*, 31, 34, 45*, 48, 116*

i) Compute $U_j$, $d_j$ and $r_j$.
ii) Find the 95% confidence interval for β for groups A and B, assuming an exponential
distribution for failure times.

Solution:
i)

j    U_j    r_j    d_j
1    9      14     1
2    13     13     2
3    13*    11     0
3    18     10     1
4    23     9      1
5    31     5      3
6    34     4      1
7    45*    3      0
8    116    1      0
ii)

For group A, d = 7, where d is the number of uncensored observations, and $\sum x_i = 378$.

Therefore

$$\hat{\beta} = \frac{d}{\sum x_i} = \frac{7}{378} = 0.0185.$$

The standard error of $\hat{\beta}$ is

$$\sqrt{\frac{\hat{\beta}^2}{d}} = \frac{0.0185}{\sqrt{7}} = 0.00699.$$

The symmetric 95% confidence interval based on a normal approximation to the distribution of
$\hat{\beta}$ is

(0.0185 - 1.96 × 0.00699, 0.0185 + 1.96 × 0.00699)

= (0.0048, 0.0322)
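
The arithmetic for group A can be reproduced with a short script. This is a sketch only: the variable names are illustrative, and the censoring indicators are an assumption (four censored times, chosen to be consistent with d = 7 in the solution).

```python
import math

# Group A times in weeks; True marks an observed relapse, False a censored time.
# The censoring pattern is an assumption consistent with d = 7 in the solution.
times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 116]
relapsed = [True, True, False, True, True, False, True, True, False, True, False]

d = sum(relapsed)                 # number of uncensored observations
total_time = sum(times)           # total time at risk
beta_hat = d / total_time         # M.L.E. of beta
se = beta_hat / math.sqrt(d)      # square root of beta_hat**2 / d
lower = beta_hat - 1.96 * se
upper = beta_hat + 1.96 * se

print(f"d = {d}, total time at risk = {total_time}")
print(f"beta_hat = {beta_hat:.4f}, standard error = {se:.5f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the script carries full precision through every step, its last digits may differ slightly from hand calculations that round intermediate values.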

1.2.2.3 Weibull distribution:

The probability density of the Weibull distribution is given by

$$f(t) = \rho k (\rho t)^{k-1} \exp\left[ -(\rho t)^k \right],$$

which depends on two parameters, ρ and k; ρ is called the scale parameter and k the shape parameter.
Since the Weibull distribution with parameters (k, ρ) is a continuous distribution, we use the log
likelihood for the continuous case,

$$l = \sum_{U} \log h(x_i; \phi) - \sum_{A} H(x_i; \phi).$$

The hazard function is

$$h(t) = k \rho (\rho t)^{k-1},$$

with cumulative hazard $H(t) = (\rho t)^k$. If there are d failures the log likelihood is

$$l = d \log k + k d \log \rho + (k - 1) \sum_{U} \log x_i - \rho^k \sum_{A} x_i^k.$$
We differentiate partially w.r.t. k and partially w.r.t. ρ and equate to zero.

w.r.t. ρ:

$$\frac{\partial l}{\partial \rho} = \frac{k d}{\rho} - k \rho^{k-1} \sum_{A} x_i^k = 0 \qquad (2.3)$$

w.r.t. k:

$$\frac{\partial l}{\partial k} = \frac{d}{k} + d \log \rho + \sum_{U} \log x_i - \rho^k \sum_{A} x_i^k \log (\rho x_i) = 0 \qquad (2.4)$$

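If k is specified then the m.l.e. of ρ can be found explicitly by solving equation (2.3):

$$\frac{k d}{\rho} - k \rho^{k-1} \sum_{A} x_i^k = 0 \;\Rightarrow\; \hat{\rho} = \left( \frac{d}{\sum_{A} x_i^k} \right)^{1/k}.$$

Substituting this into equation (2.4),

$$\frac{d}{k} + d \log \rho + \sum_{U} \log x_i - \rho^k \sum_{A} x_i^k \log (\rho x_i) = 0$$

becomes

$$\frac{d}{k} + \sum_{U} \log x_i - \frac{d \sum_{A} x_i^k \log x_i}{\sum_{A} x_i^k} = 0.$$

This is a non-linear equation in k which has no closed-form solution and can only be solved using an iterative
numerical procedure such as the Newton-Raphson algorithm. Once $\hat{k}$ is found, $\hat{\rho}$ follows from equation (2.3), so the
procedure yields the maximum likelihood estimates of k and ρ simultaneously.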
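
The iterative scheme can be sketched in a few lines. What follows is an illustration under stated assumptions rather than a definitive implementation: the function name `fit_weibull`, the starting value k = 1, the tolerance and the step-halving safeguard are all choices made here, and the data are the IUD discontinuation times of Example 2.2, assuming (as is conventional) that an asterisk marks a censored time. The derivative used in the Newton-Raphson update is obtained by differentiating the equation in k above.

```python
import math

def fit_weibull(times, failed, k=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson on the equation in k, then rho from equation (2.3)."""
    d = sum(failed)
    sum_log_fail = sum(math.log(x) for x, f in zip(times, failed) if f)
    for _ in range(max_iter):
        s0 = sum(x ** k for x in times)
        s1 = sum(x ** k * math.log(x) for x in times)
        s2 = sum(x ** k * math.log(x) ** 2 for x in times)
        g = d / k + sum_log_fail - d * s1 / s0                      # left-hand side of the equation in k
        g_prime = -d / k ** 2 - d * (s2 * s0 - s1 ** 2) / s0 ** 2   # its derivative w.r.t. k
        step = g / g_prime
        while k - step <= 0:        # safeguard: keep k positive
            step /= 2
        k -= step
        if abs(step) < tol:
            break
    rho = (d / sum(x ** k for x in times)) ** (1 / k)
    return k, rho

# IUD data (weeks); False marks a time assumed censored (asterisk in the data).
times = [10, 13, 18, 19, 23, 30, 36, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107]
failed = [True, False, False, True, False, True, True, False, False,
          False, False, True, True, True, True, False, True, False]

k_hat, rho_hat = fit_weibull(times, failed)
print(k_hat, rho_hat)
```

Solving the one-dimensional equation in k and then recovering ρ from (2.3) avoids a two-dimensional search; a statistical package may use a different parameterization of the scale, so its reported scale estimate need not equal `rho_hat` directly.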

EXAMPLE 2.2
The following data refer to the number of weeks from the commencement of the use of an
intrauterine device (IUD) for family planning to discontinuance. The device is removed (discontinued)
if the woman becomes pregnant or experiences prolonged or irregular bleeding.
Time in weeks to discontinuation of the use of IUD:
10,13*,18*,19,23*,30,36,36*,38*,54*,56*,59,75,93,97,104*,107,107*.
Fit a Weibull distribution to these data.
Solution
Using a computer package such as SAS, the Weibull distribution can be fitted. The results of the

estimated scale parameter is and the estimated shape parameter is .

The standard errors of these two estimates are and

The confidence intervals for the two estimates are =(-0.00143,0.00235) and

E-tivity 2.2.3: Fitting of parametric model to survival data

Numbering, pacing and sequencing: 2.2.3

Title: Fitting of parametric model to survival data.

Purpose: To enable you to fit a given parametric model to survival data.

Brief summary of overall task: Watch the video on how to fit a given parametric distribution.
https://www.youtube.com/watch?v=ccMcg8BRnUg

Spark:

Individual contribution: The following data refer to the number of weeks from the
commencement of the use of an intrauterine device (IUD) for family
planning to discontinuance. The device is removed (discontinued) if
the woman becomes pregnant or experiences prolonged or irregular
bleeding.
Time in weeks to discontinuation of the use of IUD:
10,13*,18*,19,23*,30,36,36*,38*,54*,56*,59,75,93,97,104*,107,107*
Fit an exponential distribution to these data.

Interaction begins:
• Post your answers on the discussion forum 2.2.3.
• Read what your colleagues have posted.
• In a sentence or two, comment on what two of your colleagues have posted, keeping netiquette in mind.

E-moderator interactions:
• Focusing group discussion
• Encouraging lurkers (quiet ones) to contribute
• Providing feedback/teaching points
• Closing the discussion

Schedule and time: The activity takes one hour.

Next: Non-parametric modelling of survival data.

2.3 Assessment
1. The following data give remission times, in weeks, of leukemia patients (an asterisk denotes a censored time):

6, 6, 6, 6*, 7, 9*, 10, 10*, 11*, 13, 16, 17*, 19*, 20*, 22, 23, 25*,
32*, 32*, 34*, 35*

Find the M.L.E. of the parameters if a

(i) Weibull distribution is fitted to the data.

(ii) Log-normal distribution is fitted to the data.

2. The following data refer to the number of weeks from the commencement of the use of an
intrauterine device (IUD) for family planning to discontinuance. The device is removed
(discontinued) if the woman becomes pregnant or experiences prolonged or irregular
bleeding. Time in weeks to discontinuation of the use of IUD:

10,13*,18*,19,23*,30,36,36*,38*,54*,56*,59,75,93,97,104*,107,107*

(i) Fit a Weibull distribution to these data using the Newton-Raphson algorithm.

(ii) Fit a log-logistic distribution to these data.