
Statistics 512 Notes 19

These notes cover Fisher information and maximum likelihood estimation for the gamma distribution, with normal-approximation, observed-information and parametric bootstrap confidence intervals for Illinois rainfall data modeled as gamma. They then present the multiparameter likelihood ratio test, illustrated by a test for the mean of a normal distribution, and a genetics example that tests whether corn-crossing data fit 9:3:3:1 ratios or a linkage model.

Uploaded by Sandeep Singh
Copyright: © Attribution Non-Commercial (BY-NC)

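Example 2: Gamma distribution

$$f(x;\alpha,\beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$

$$\log f(X;\alpha,\beta) = -\log\Gamma(\alpha) - \alpha\log\beta + (\alpha-1)\log X - \frac{X}{\beta}$$

$$\frac{\partial}{\partial\alpha}\log f(X;\alpha,\beta) = -\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log\beta + \log X, \qquad \frac{\partial}{\partial\beta}\log f(X;\alpha,\beta) = -\frac{\alpha}{\beta} + \frac{X}{\beta^2}$$

Using $E(X) = \alpha\beta$,

$$I(\alpha,\beta) = -E\begin{bmatrix} -\dfrac{\Gamma''(\alpha)\Gamma(\alpha)-\Gamma'(\alpha)^2}{\Gamma(\alpha)^2} & -\dfrac{1}{\beta} \\ -\dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} - \dfrac{2X}{\beta^3} \end{bmatrix} = \begin{bmatrix} \dfrac{\Gamma''(\alpha)\Gamma(\alpha)-\Gamma'(\alpha)^2}{\Gamma(\alpha)^2} & \dfrac{1}{\beta} \\ \dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} \end{bmatrix}$$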
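For the Illinois rainfall data, $\hat\alpha_{MLE} = .4408$ and $\hat\beta_{MLE} = .5091$. Thus,

$$I(\hat\alpha_{MLE},\hat\beta_{MLE}) = \begin{bmatrix} \dfrac{\Gamma''(.4408)\Gamma(.4408)-\Gamma'(.4408)^2}{\Gamma(.4408)^2} & \dfrac{1}{.5091} \\ \dfrac{1}{.5091} & \dfrac{.4408}{.5091^2} \end{bmatrix} = \begin{bmatrix} 6.133 & 1.964 \\ 1.964 & 1.701 \end{bmatrix}$$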
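As a numerical check of the matrix above, the sketch below evaluates $I(\alpha,\beta)$ in R. It uses the fact that $[\Gamma''(\alpha)\Gamma(\alpha)-\Gamma'(\alpha)^2]/\Gamma(\alpha)^2$ is the second derivative of $\log\Gamma(\alpha)$, which base R computes as `trigamma()`. The function name `gammainfofunc` is a hypothetical helper, not part of the course code.

```r
# Hypothetical helper: expected Fisher information for one observation
# from a gamma(shape = alpha, scale = beta) distribution.
# trigamma(alpha) equals (Gamma''(alpha)Gamma(alpha) - Gamma'(alpha)^2)/Gamma(alpha)^2
gammainfofunc=function(alpha,beta){
  matrix(c(trigamma(alpha), 1/beta,
           1/beta,          alpha/beta^2),
         nrow=2, byrow=TRUE,
         dimnames=list(c("alpha","beta"), c("alpha","beta")))
}

# Evaluate at the Illinois rainfall MLEs quoted in the notes
gammainfo=gammainfofunc(.4408,.5091)
round(gammainfo,3)
```

Up to rounding, this should reproduce the 6.133, 1.964 and 1.701 entries above.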
> infmat=matrix(c(6.133,1.964,1.964,1.704),ncol=2)
> invinfmat=solve(infmat)
> invinfmat
           [,1]       [,2]
[1,]  0.2584428 -0.2978765
[2,] -0.2978765  0.9301816
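Thus, with $n = 227$,

$$(\hat\alpha_{MLE}, \hat\beta_{MLE}) \approx N\left( (\alpha, \beta),\ \begin{bmatrix} 0.259/227 & -0.298/227 \\ -0.298/227 & 0.930/227 \end{bmatrix} \right).$$

Thus, approximate 95% confidence intervals for $\alpha$ and $\beta$ are

$\alpha$: $0.441 \pm 1.96\sqrt{0.259/227} = (0.375, 0.507)$
$\beta$: $0.509 \pm 1.96\sqrt{0.930/227} = (0.384, 0.634)$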
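The normal-approximation intervals can be reproduced in a few lines. This sketch assumes only the rounded MLEs and $n = 227$ from the notes, and recomputes the information matrix with `trigamma()` instead of typing in rounded entries, so its endpoints may differ from the ones above in the third decimal place.

```r
# Wald (normal-approximation) 95% intervals from the inverse Fisher information
alphahat=.4408; betahat=.5091; n=227
infmat2=matrix(c(trigamma(alphahat), 1/betahat,
                 1/betahat,          alphahat/betahat^2), nrow=2)
covmat=solve(infmat2)/n            # approximate covariance of (alphahat, betahat)
se=sqrt(diag(covmat))              # approximate standard errors
z=qnorm(.975)                      # about 1.96
est=c(alphahat, betahat)
waldci=cbind(lower=est-z*se, upper=est+z*se)
rownames(waldci)=c("alpha","beta")
round(waldci,3)
```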
Note: We can also use observed Fisher information or the
parametric bootstrap to form confidence intervals based on
maximum likelihood estimates.
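Observed Fisher information:

$$\sqrt{n}\, O^{1/2}(\hat\theta_{MLE} - \theta_0) \xrightarrow{D} N_p(0, \text{IdentityMatrix}),$$

where the observed information matrix $O$ equals

$$O_{ij} = -\frac{1}{n}\sum_{k=1}^{n} \left.\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f(X_k;\theta)\right|_{\theta=\hat\theta_{MLE}}.$$

Thus,

$$\hat\theta_{MLE} \approx N\left(\theta_0, \frac{O^{-1}}{n}\right).$$

For the gamma distribution,

$$O = \frac{1}{n}\sum_{k=1}^{n}\begin{bmatrix} \dfrac{\Gamma''(\hat\alpha_{MLE})\Gamma(\hat\alpha_{MLE})-\Gamma'(\hat\alpha_{MLE})^2}{\Gamma(\hat\alpha_{MLE})^2} & \dfrac{1}{\hat\beta_{MLE}} \\ \dfrac{1}{\hat\beta_{MLE}} & -\dfrac{\hat\alpha_{MLE}}{\hat\beta_{MLE}^2} + \dfrac{2X_k}{\hat\beta_{MLE}^3} \end{bmatrix} = \begin{bmatrix} 6.133 & 1.964 \\ 1.964 & 1.700 \end{bmatrix}$$
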
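O is very close to $I(\hat\alpha_{MLE}, \hat\beta_{MLE})$. In fact, since $\hat\beta_{MLE} = \bar X/\hat\alpha_{MLE}$, the lower-right entry of $O$ is $-\hat\alpha_{MLE}/\hat\beta_{MLE}^2 + 2\bar X/\hat\beta_{MLE}^3 = \hat\alpha_{MLE}/\hat\beta_{MLE}^2$, so for this model $O$ and $I(\hat\alpha_{MLE}, \hat\beta_{MLE})$ agree exactly; 1.700 versus 1.701 is rounding.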
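The relationship between $O$ and $I(\hat\alpha_{MLE},\hat\beta_{MLE})$ can be checked numerically. The Illinois rainfall data are not reproduced in these notes, so this sketch uses a simulated gamma sample of the same size as a stand-in (the seed and the true parameter values are arbitrary assumptions). It fits the MLE with the same profile score equation as the course's `alphahatfunc` and then compares the two matrices.

```r
set.seed(512)                          # arbitrary seed
x=rgamma(227, shape=.44, scale=.51)    # simulated stand-in for the rainfall data
n=length(x)
# Profile score equation for alpha (beta replaced by mean(x)/alpha)
scorefunc=function(a) -n*digamma(a)-n*log(mean(x))+n*log(a)+sum(log(x))
ahat=uniroot(scorefunc, interval=c(.01,10))$root
bhat=mean(x)/ahat
# Observed information: minus the average second-derivative matrix at the MLE
obsinfo=matrix(c(trigamma(ahat), 1/bhat,
                 1/bhat,         mean(-ahat/bhat^2+2*x/bhat^3)), nrow=2)
# Expected information I(alpha, beta) evaluated at the MLE
expinfo=matrix(c(trigamma(ahat), 1/bhat,
                 1/bhat,         ahat/bhat^2), nrow=2)
isTRUE(all.equal(obsinfo, expinfo))    # TRUE: equal up to floating-point error
```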
Parametric bootstrap:
Resample from $f(x;\hat\theta_{MLE})$. For each resampled data set $(X_1^*, \ldots, X_n^*)$, compute $\hat\theta^*_{MLE}(X_1^*, \ldots, X_n^*)$. The percentile bootstrap approximate 95% confidence interval for $\theta_j$ is

$$\left(\text{2.5\% quantile of } \hat\theta^*_{MLE,j}(X_1^*, \ldots, X_n^*),\ \ \text{97.5\% quantile of } \hat\theta^*_{MLE,j}(X_1^*, \ldots, X_n^*)\right).$$

# MLE for gamma
# alphahatfunc is the score equation for alpha after substituting
# the MLE of beta, betahat = mean(xvec)/alpha
alphahatfunc=function(alpha,xvec){
  n=length(xvec);
  eq=-n*digamma(alpha)-n*log(mean(xvec))+n*log(alpha)+sum(log(xvec));
  eq;
}
mlegammafunc=function(X,alphahatlow=.01,alphahathigh=10){
  # Need to make sure that alphahatfunc(alphahatlow)>0,
  # alphahatfunc(alphahathigh)<0
  tempoptim=uniroot(alphahatfunc,interval=c(alphahatlow,alphahathigh),xvec=X);
  mlealphahat=tempoptim$root;
  mlebetahat=mean(X)/mlealphahat;
  list(mlealphahat=mlealphahat,mlebetahat=mlebetahat);
}
# Bootstrap CI
bootcigammafunc=function(X,m,signiflevel){
  # X is a vector containing the original sample
  # m is the desired number of bootstrap replications
  # signiflevel is 1 minus the confidence level (.05 for a 95% interval)
  n=length(X);
  mlevec=mlegammafunc(X);
  mlealphahat=mlevec$mlealphahat;
  mlebetahat=mlevec$mlebetahat;
  bootmlealphahatvec=rep(0,m);
  bootmlebetahatvec=rep(0,m);
  for(i in 1:m){
    # Parametric bootstrap: simulate from the fitted gamma, then refit
    bootX=rgamma(n,shape=mlealphahat,scale=mlebetahat);
    bootmle=mlegammafunc(bootX);
    bootmlealphahatvec[i]=bootmle$mlealphahat;
    bootmlebetahatvec[i]=bootmle$mlebetahat;
  }
  # Index of the order statistic used for the lower endpoint
  cutoff=floor((signiflevel/2)*(m+1));
  bootmlealphahatsorted=sort(bootmlealphahatvec);
  bootmlebetahatsorted=sort(bootmlebetahatvec);
  # Lower CI endpoints
  lowercialpha=bootmlealphahatsorted[cutoff];
  lowercibeta=bootmlebetahatsorted[cutoff];
  # Upper CI endpoints
  uppercialpha=bootmlealphahatsorted[m+1-cutoff];
  uppercibeta=bootmlebetahatsorted[m+1-cutoff];
  list(lowercialpha=lowercialpha,uppercialpha=uppercialpha,
       lowercibeta=lowercibeta,uppercibeta=uppercibeta);
}
> bootcigammafunc(illinoisrainfall,1000,.05)
$lowercialpha
[1] 0.3787617
$uppercialpha
[1] 0.516352
$lowercibeta
[1] 0.3914287
$uppercibeta
[1] 0.6354028
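Likelihood Ratio Test for multiparameter problems:
The hypotheses of interest are

$$H_0: \theta \in \omega \quad \text{versus} \quad H_1: \theta \in \Omega \cap \omega^C,$$

where $\omega$ is defined in terms of $q$, $0 < q \le p$, independent constraints of the form $g_1(\theta) = a_1, \ldots, g_q(\theta) = a_q$.

Likelihood ratio:

$$\Lambda = \frac{\max_{\theta\in\omega} L(\theta)}{\max_{\theta\in\Omega} L(\theta)}$$

Reject $H_0$ for small values of $\Lambda$.

Theorem 6.5.1: Let $X_1, \ldots, X_n$ be iid with pdf $f(x;\theta)$, $\theta = (\theta_1, \ldots, \theta_p)$, for $\theta \in \Omega$. Assume the regularity conditions (R6-R9) hold. Under the null hypothesis $H_0: \theta \in \omega$,

$$-2\log\Lambda \xrightarrow{D} \chi^2_q.$$

Thus, we reject $H_0: \theta \in \omega$ when $-2\log\Lambda \ge \chi^2_q(\alpha)$.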
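The decision rule in Theorem 6.5.1 is mechanical once the two maximized log-likelihoods are available. A minimal sketch, where `lrtestfunc` is a hypothetical helper name and the example inputs are made-up log-likelihood values:

```r
# Hypothetical helper: large-sample likelihood ratio test of H0: theta in omega.
# loglik0 = max log-likelihood over omega, loglik1 = max over Omega,
# q = number of independent constraints that define omega.
lrtestfunc=function(loglik0, loglik1, q, level=.05){
  stat=-2*(loglik0-loglik1)                       # -2 log Lambda
  crit=qchisq(1-level, df=q)                      # chi-square_q(level) cutoff
  pvalue=pchisq(stat, df=q, lower.tail=FALSE)     # approximate p-value
  list(stat=stat, crit=crit, pvalue=pvalue, reject=(stat>=crit))
}

# Example with made-up maximized log-likelihoods
lrtestfunc(loglik0=-10, loglik1=-5, q=2)
```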

Example 1: Likelihood ratio test for the mean of a normal distribution.
Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, both unknown. Suppose we are interested in testing

$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \ne \mu_0,$$

where $\mu_0$ is specified. Let

$$\Omega = \{(\mu, \sigma^2): -\infty < \mu < \infty,\ \sigma^2 > 0\}$$

denote the full model parameter space. Here the null hypothesis parameter space $\omega$ is the subset of $\Omega$ for which the function $g_1(\mu, \sigma^2) = \mu$ satisfies the constraint $g_1(\mu, \sigma^2) = \mu_0$:

$$\omega = \{(\mu, \sigma^2): \mu = \mu_0,\ \sigma^2 > 0\}.$$
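The MLEs for the parameter space $\Omega$ are $\hat\mu = \bar X$ and $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$. It is easy to show that the MLEs for the parameter space $\omega$ are $\hat\mu_0 = \mu_0$ and $\hat\sigma_0^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2$.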
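Thus, the likelihood ratio is

$$\Lambda = \frac{\max_{\theta\in\omega} L(\theta)}{\max_{\theta\in\Omega} L(\theta)} = \frac{\left(\frac{1}{2\pi\hat\sigma_0^2}\right)^{n/2} \exp\left(-\frac{1}{2\hat\sigma_0^2}\sum_{i=1}^n (X_i - \mu_0)^2\right)}{\left(\frac{1}{2\pi\hat\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\hat\sigma^2}\sum_{i=1}^n (X_i - \bar X)^2\right)} = \frac{\left(\frac{1}{2\pi\hat\sigma_0^2}\right)^{n/2} e^{-n/2}}{\left(\frac{1}{2\pi\hat\sigma^2}\right)^{n/2} e^{-n/2}} = \left(\frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sum_{i=1}^n (X_i - \mu_0)^2}\right)^{n/2}$$

and

$$-2\log\Lambda = n\log\frac{\sum_{i=1}^n (X_i - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2}.$$

We reject $H_0$ when $-2\log\Lambda \ge \chi^2_1(\alpha)$.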
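
Using the identity

$$\sum_{i=1}^n (X_i - \mu_0)^2 = \sum_{i=1}^n (X_i - \bar X)^2 + n(\bar X - \mu_0)^2,$$

we have

$$-2\log\Lambda = n\log\left(1 + \frac{n(\bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right).$$

Thus the likelihood ratio test rejects for large values of

$$\frac{(\bar X - \mu_0)^2}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2},$$

or equivalently for large values of $|t|$, where $t = \sqrt{n}(\bar X - \mu_0)/S$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ is the one-sample t statistic.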
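The identity above makes $-2\log\Lambda = n\log\left(1 + t^2/(n-1)\right)$, a monotone function of $|t|$. A quick check on simulated data (the sample, seed and $\mu_0 = 0$ are arbitrary choices for illustration) compares the two forms, taking $t$ from base R's `t.test()`:

```r
set.seed(1)                           # arbitrary seed
x=rnorm(20, mean=.3, sd=1)            # simulated data, for illustration only
mu0=0
n=length(x)
# -2 log Lambda from the ratio of the two sums of squares
m2loglam=n*log(sum((x-mu0)^2)/sum((x-mean(x))^2))
# Same quantity via the one-sample t statistic
tstat=unname(t.test(x, mu=mu0)$statistic)
m2loglamt=n*log(1+tstat^2/(n-1))
isTRUE(all.equal(m2loglam, m2loglamt))   # TRUE
```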
Example 2: Linkage in genetics
Corn can be starchy (S) or sugary (s) and can have a green base leaf (G) or a white base leaf (g). The traits starchy and green base leaf are dominant traits. Suppose the alleles for these two factors occur on separate chromosomes and are hence independent. Then each parent with alleles SsGg produces, with equal probability, gametes of the form (S,G), (S,g), (s,G) and (s,g). If two such hybrid parents are crossed, the phenotypes of the offspring will occur in the proportions suggested by the table below. That is, the probability of an offspring of type (S,G) is 9/16; type (S,g), 3/16; type (s,G), 3/16; type (s,g), 1/16.
Offspring phenotype by alleles of first parent (columns) and alleles of second parent (rows):

            SG       Sg       sG       sg
  SG      (S,G)    (S,G)    (S,G)    (S,G)
  Sg      (S,G)    (S,g)    (S,G)    (S,g)
  sG      (S,G)    (S,G)    (s,G)    (s,G)
  sg      (S,G)    (S,g)    (s,G)    (s,g)
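The 9:3:3:1 proportions follow from counting the 16 equally likely cells of the table. This sketch enumerates them in R, encoding dominance by calling an offspring starchy if either gamete carries S and green if either gamete carries G.

```r
# Enumerate the 16 equally likely gamete pairings of two SsGg parents
gametes=c("SG","Sg","sG","sg")
pairs=expand.grid(first=gametes, second=gametes, stringsAsFactors=FALSE)
# Dominance: starchy if at least one S allele, green if at least one G allele
starchy=grepl("S", pairs$first, fixed=TRUE) | grepl("S", pairs$second, fixed=TRUE)
green=grepl("G", pairs$first, fixed=TRUE) | grepl("G", pairs$second, fixed=TRUE)
phenotype=paste0("(", ifelse(starchy,"S","s"), ",", ifelse(green,"G","g"), ")")
counts=table(phenotype)
counts/sum(counts)                    # 9/16, 3/16, 3/16, 1/16 by phenotype
```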
The table below shows the results of a set of 3839 SsGg x SsGg crossings (Carver, 1927, "A Genetic Study of Certain Chlorophyll Deficiencies in Maize," Genetics).

Phenotype         Number in sample
Starchy green     1997
Starchy white      906
Sugary green       904
Sugary white        32
Does the genetic model with 9:3:3:1 ratios fit the data?
Let $X_i$ denote the phenotype of the $i$th crossing.
Model: $X_1, \ldots, X_n$ are iid multinomial, with
$P(X_i = SG) = p_{SG}$, $P(X_i = Sg) = p_{Sg}$, $P(X_i = sG) = p_{sG}$, $P(X_i = sg) = p_{sg}$.

$H_0$: $p_{SG} = 9/16$, $p_{Sg} = 3/16$, $p_{sG} = 3/16$, $p_{sg} = 1/16$
$H_1$: At least one of $p_{SG} = 9/16$, $p_{Sg} = 3/16$, $p_{sG} = 3/16$, $p_{sg} = 1/16$ is not correct.

Maximum likelihood for multinomial distribution:
Consider a random trial which can result in one, and only one, of $k$ outcomes or categories. Let $p_1, \ldots, p_{k-1}$ denote the probabilities of outcomes $1, \ldots, k-1$ (note: $p_k = 1 - p_1 - \cdots - p_{k-1}$). Let $X$ denote the outcome of the trial. For $X_1, \ldots, X_n$ iid, let $Y_1, \ldots, Y_k$ denote the number of trials whose outcome is $1, \ldots, k$, respectively. We have

$$P(X_1, \ldots, X_n) = p_1^{Y_1} \cdots p_{k-1}^{Y_{k-1}} (1 - p_1 - \cdots - p_{k-1})^{n - Y_1 - \cdots - Y_{k-1}}.$$

$$l(p_1, \ldots, p_{k-1}) = Y_1\log p_1 + \cdots + Y_{k-1}\log p_{k-1} + (n - Y_1 - \cdots - Y_{k-1})\log(1 - p_1 - \cdots - p_{k-1})$$

$$\frac{\partial l}{\partial p_1} = \frac{Y_1}{p_1} - \frac{n - Y_1 - \cdots - Y_{k-1}}{1 - p_1 - \cdots - p_{k-1}} = 0, \quad \ldots, \quad \frac{\partial l}{\partial p_{k-1}} = \frac{Y_{k-1}}{p_{k-1}} - \frac{n - Y_1 - \cdots - Y_{k-1}}{1 - p_1 - \cdots - p_{k-1}} = 0$$

It is easily seen that $\hat p_{j,MLE} = Y_j/n$ satisfies these equations.
See (6.4.19) and (6.4.20) in book for information matrix.
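For the corn counts, a short numerical check that $\hat p_j = Y_j/n$ solves the score equations and gives a higher log-likelihood than the 9:3:3:1 values. The helper names `multscorefunc` and `multloglikfunc` are hypothetical.

```r
y=c(SG=1997, Sg=906, sG=904, sg=32)   # counts from the corn data
n=sum(y)
phat=y/n                              # closed-form MLE Y_j/n
# Score for (p_1,...,p_{k-1}), with p_k = 1 - p_1 - ... - p_{k-1}
multscorefunc=function(p, y){
  k=length(y)
  y[-k]/p - y[k]/(1-sum(p))
}
# Multinomial log-likelihood for a full probability vector p
multloglikfunc=function(p, y) sum(y*log(p))
multscorefunc(phat[-4], y)                                   # essentially zero
multloglikfunc(phat, y) > multloglikfunc(c(9,3,3,1)/16, y)   # TRUE
```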
Back to genetic model:
Likelihood ratio test:
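$$\Lambda = \frac{\max_{\theta\in\omega} L(\theta)}{\max_{\theta\in\Omega} L(\theta)} = \frac{(9/16)^{1997}(3/16)^{906}(3/16)^{904}(1/16)^{32}}{(1997/3839)^{1997}(906/3839)^{906}(904/3839)^{904}(32/3839)^{32}}$$

$$-2\log\Lambda = -2\left(1997\log\frac{9/16}{1997/3839} + 906\log\frac{3/16}{906/3839} + 904\log\frac{3/16}{904/3839} + 32\log\frac{1/16}{32/3839}\right) = 387.51$$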
Under $H_0$: $p_{SG} = 9/16$, $p_{Sg} = 3/16$, $p_{sG} = 3/16$, $p_{sg} = 1/16$, $-2\log\Lambda \sim \chi^2(3)$ [there are three extra free parameters in $H_1$].
Reject $H_0$ if $-2\log\Lambda > \chi^2_{.05}(3) = 7.81$.
Thus we reject $H_0$: $p_{SG} = 9/16$, $p_{Sg} = 3/16$, $p_{sG} = 3/16$, $p_{sg} = 1/16$.
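The 387.51 figure is easy to reproduce. This sketch assumes nothing beyond the counts and the 9:3:3:1 probabilities in the notes; the unrestricted fit uses the multinomial MLE $Y_j/n$.

```r
y=c(1997, 906, 904, 32)            # SG, Sg, sG, sg counts (Carver, 1927)
n=sum(y)
p0=c(9, 3, 3, 1)/16                # probabilities under H0
phat=y/n                           # unrestricted MLE under H1
m2loglam=-2*sum(y*log(p0/phat))    # -2 log Lambda
crit=qchisq(.95, df=3)             # about 7.81
c(m2loglam=m2loglam, crit=crit)
```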
What could be going on? Linkage. See handout.
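Model for linkage:

$$p_{SG} = \tfrac{1}{4}(2+\theta), \quad p_{Sg} = \tfrac{1}{4}(1-\theta), \quad p_{sG} = \tfrac{1}{4}(1-\theta), \quad p_{sg} = \tfrac{1}{4}\theta$$

$$L(\theta) = \left[\tfrac{1}{4}(2+\theta)\right]^{Y_1}\left[\tfrac{1}{4}(1-\theta)\right]^{Y_2}\left[\tfrac{1}{4}(1-\theta)\right]^{Y_3}\left[\tfrac{1}{4}\theta\right]^{n - Y_1 - Y_2 - Y_3}$$

The maximum likelihood estimate of $\theta$ for the corn data is $\hat\theta = 0.0357$ (see handout).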
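The handout's derivation is not reproduced here, but the estimate can be checked directly. Setting the derivative of $\log L(\theta)$ to zero and multiplying through by $\theta(2+\theta)(1-\theta)$ gives the quadratic $n\theta^2 - (Y_1 - 2(Y_2+Y_3) - Y_4)\theta - 2Y_4 = 0$, with $Y_4 = n - Y_1 - Y_2 - Y_3$. This sketch solves it and cross-checks with R's one-dimensional `optimize()`.

```r
y=c(1997, 906, 904, 32)   # Y1..Y4: SG, Sg, sG, sg
n=sum(y)
# Linkage log-likelihood (the constant -n*log(4) is dropped)
linkloglik=function(theta) y[1]*log(2+theta)+(y[2]+y[3])*log(1-theta)+y[4]*log(theta)
# Score equation rearranged: n*theta^2 + b*theta - 2*Y4 = 0
b=-(y[1]-2*(y[2]+y[3])-y[4])
thetahat=(-b+sqrt(b^2+8*n*y[4]))/(2*n)    # positive root
# Numerical cross-check on (0, 1)
thetanum=optimize(linkloglik, c(1e-6, 1-1e-6), maximum=TRUE)$maximum
c(thetahat=thetahat, thetanum=thetanum)
```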
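Test

$H_0$: $p_{SG} = \tfrac{1}{4}(2+\theta)$, $p_{Sg} = \tfrac{1}{4}(1-\theta)$, $p_{sG} = \tfrac{1}{4}(1-\theta)$, $p_{sg} = \tfrac{1}{4}\theta$ for some $\theta$, $0 \le \theta \le 1$, vs.
$H_1$: $p_{SG}, p_{Sg}, p_{sG}, p_{sg}$ do not satisfy $p_{SG} = \tfrac{1}{4}(2+\theta)$, $p_{Sg} = \tfrac{1}{4}(1-\theta)$, $p_{sG} = \tfrac{1}{4}(1-\theta)$, $p_{sg} = \tfrac{1}{4}\theta$ for any $\theta$, $0 \le \theta \le 1$.

$$\Lambda = \frac{\max_{\theta\in\omega} L(\theta)}{\max_{\theta\in\Omega} L(\theta)} = \frac{(.25(2+.0357))^{1997}(.25(1-.0357))^{906}(.25(1-.0357))^{904}(.25 \times .0357)^{32}}{(1997/3839)^{1997}(906/3839)^{906}(904/3839)^{904}(32/3839)^{32}}$$

$$-2\log\Lambda = 2.02$$

Under $H_0$, $-2\log\Lambda \sim \chi^2(2)$ [there are two extra free parameters in $H_1$].
Since $-2\log\Lambda = 2.02 < \chi^2_{.05}(2) = 5.99$, the linkage model is not rejected.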
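The same computation for the linkage model, plugging in the notes' $\hat\theta = .0357$ (the tiny difference from the exact maximizer does not change the conclusion):

```r
y=c(1997, 906, 904, 32)                   # SG, Sg, sG, sg
n=sum(y)
theta=.0357                               # MLE of theta quoted in the notes
p0=c(2+theta, 1-theta, 1-theta, theta)/4  # fitted linkage-model probabilities
phat=y/n                                  # unrestricted MLE
m2loglam=-2*sum(y*log(p0/phat))           # -2 log Lambda
crit=qchisq(.95, df=2)                    # about 5.99
c(m2loglam=m2loglam, crit=crit)
```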