IAPQR Transactions

Vol. 41, No. 2, 2017

A STUDY OF WEIBULL-MIXTURE ROC CURVE WITH CONSTANT SHAPE PARAMETER

AZHARUDDIN AND SUDESH PUNDIR


Department of Statistics, Pondicherry University, Puducherry, India

ABSTRACT: The Receiver Operating Characteristic (ROC) curve is a useful tool
for characterizing a diagnostic test. The Constant Shape Weibull Mixture ROC
(CSWMROC) curve takes into account the presence of heterogeneity in the data.
In this paper, we propose the CSWMROC curve and discuss its properties. The
parameters of the ROC curve are estimated using the Expectation-Maximization
(EM) algorithm. The Area Under the CSWMROC Curve (AUC) and its variance are
derived, and a test for the AUC is also given. The proposed theory is verified
by simulation.
Keywords: CSWMROC Curve, AUC, EM algorithm, Delta method, Monte Carlo
simulation.

1. INTRODUCTION

The Weibull mixture distribution is very useful in reliability engineering,
survival analysis and medicine. Kao (1959) introduced the Weibull mixture
distribution and derived estimates of the parameters of the two-component
Weibull mixture distribution using the method of moments. Patra and Dey
(1999) found the maximum likelihood estimator of the multivariate Weibull
mixture distribution in reliability modeling. Franco et al. (2000) discussed
the two-component generalized mixture of Weibull distributions in the context
of reliability. Arfa and Aslam (2008) compared the estimates of the parameters
of the mixed Weibull distribution obtained by the method of moments and the
graphical method. Erisoglu and Erisoglu (2014) derived estimates of the
Weibull mixture distribution for heterogeneous data using the EM algorithm,
the L-moment method and the MLE method. They compared the bias, mean absolute
error, total mean error and completion time of the algorithm under the
different methods of estimation by simulation studies.

The first article on the mixture ROC curve is due to Dass and Kim (2012).
They studied the multivariate bi-normal mixture ROC curve and also discussed
Bayesian computation, the group invariance property, mixture models and
semi-parametric inference for estimation. Gonen (2013) also discussed the
bi-normal mixture ROC curve and its AUC. He used the EM algorithm to estimate
the parameters of the bi-normal mixture ROC curve and showed that, if
heterogeneity is present in the data, the bi-normal mixture ROC curve gives
better smoothness than the univariate bi-normal ROC curve.

In Section 2, we discuss the Constant Shape Weibull Mixture ROC curve, its
AUC and its properties. The maximum likelihood estimates of the AUC of the
CSWMROC curve using the EM algorithm are given in Section 3. In Section 4,
the variance of the AUC is derived using the delta method, and a test for the
AUC is proposed. In Section 5, the estimates of the parameters, the AUC, its
variance, standard error, confidence interval and Z-values are obtained by
simulation for different sample sizes.

2. CSWMROC CURVE, AUC AND ITS PROPERTIES

Since the Weibull mixture ROC curve does not yield a closed form for the AUC,
we fix the shape parameter as $\alpha_1 = \alpha_2 = \alpha$.

Let X follow a Weibull mixture distribution for the healthy cases and Y follow
a Weibull mixture distribution for the diseased cases. Then the CSWMROC curve
is defined as

$$CSWMROC(t) = p\,[F_X(t)]^{\sigma_{1x}/\sigma_{1y}} + (1-p)\,[F_X(t)]^{\sigma_{2x}/\sigma_{2y}} \tag{2.1}$$

where

$$F_X(t) = p\exp\left(-\frac{t^{\alpha}}{\sigma_{1x}}\right) + (1-p)\exp\left(-\frac{t^{\alpha}}{\sigma_{2x}}\right).$$

The AUC of the CSWMROC curve is defined as

$$AUC = p\,\frac{\sigma_{1y}}{\sigma_{1x}+\sigma_{1y}} + (1-p)\,\frac{\sigma_{2y}}{\sigma_{2x}+\sigma_{2y}}. \tag{2.2}$$
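As a quick numerical illustration of (2.2), the short Python sketch below evaluates the AUC for the weight and scale parameters used later in the simulation study of Section 5. The function name `cswm_auc` and its argument names are chosen here for illustration only.

```python
def cswm_auc(p, s1x, s1y, s2x, s2y):
    """AUC of the CSWMROC curve as written in equation (2.2)."""
    auc1 = s1y / (s1x + s1y)   # first mixture component
    auc2 = s2y / (s2x + s2y)   # second mixture component
    return p * auc1 + (1 - p) * auc2

# Simulation settings of Section 5: p = 0.7, sigma_1x = 1, sigma_2x = 2,
# sigma_1y = 10, sigma_2y = 9.
auc = cswm_auc(p=0.7, s1x=1.0, s1y=10.0, s2x=2.0, s2y=9.0)
print(round(auc, 4))  # 0.8818
```

The result agrees with the value 0.88 that is tested as the null hypothesis in Section 5.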

The CSWMROC Curve satisfies the following properties.

(a) CSWMROC Curve is invariant with respect to monotonically increasing
transformations of the values of the biomarkers.

(b) CSWMROC Curve is a monotonically increasing function.

(c) CSWMROC Curve is concave.

(d) The slope of the ROC Curve at a single test value is used for evaluating
the likelihood ratio of the diagnostic test. The slope of the CSWMROC Curve
at the cut-off value t is given as

$$slope(t) = \frac{\sigma_{1x}\sigma_{2x}\left[p\,\sigma_{2y}\exp\left(-\frac{t^{\alpha}}{\sigma_{1y}}\right) + (1-p)\,\sigma_{1y}\exp\left(-\frac{t^{\alpha}}{\sigma_{2y}}\right)\right]}{\sigma_{1y}\sigma_{2y}\left[p\,\sigma_{2x}\exp\left(-\frac{t^{\alpha}}{\sigma_{1x}}\right) + (1-p)\,\sigma_{1x}\exp\left(-\frac{t^{\alpha}}{\sigma_{2x}}\right)\right]}. \tag{2.3}$$

(e) CSWMROC Curve is TPR asymmetric.
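Property (d) can be checked numerically. The sketch below assumes that the likelihood ratio at cut-off t is the ratio of the diseased to the healthy mixture density, with component densities $(\alpha/\sigma)\,t^{\alpha-1}\exp(-t^{\alpha}/\sigma)$, and compares it with the slope written as in (2.3). All function names are illustrative.

```python
import math

def slope_2_3(t, alpha, p, s1x, s2x, s1y, s2y):
    """Slope of the curve at cut-off t, written as in equation (2.3)."""
    u = t ** alpha
    num = s1x * s2x * (p * s2y * math.exp(-u / s1y)
                       + (1 - p) * s1y * math.exp(-u / s2y))
    den = s1y * s2y * (p * s2x * math.exp(-u / s1x)
                       + (1 - p) * s1x * math.exp(-u / s2x))
    return num / den

def mixture_pdf(t, alpha, p, s1, s2):
    """Density of a two-component Weibull mixture with common shape alpha."""
    u = t ** alpha
    common = alpha * t ** (alpha - 1)
    return common * (p / s1 * math.exp(-u / s1)
                     + (1 - p) / s2 * math.exp(-u / s2))

def likelihood_ratio(t, alpha, p, s1x, s2x, s1y, s2y):
    """Assumed likelihood ratio: diseased density over healthy density."""
    return mixture_pdf(t, alpha, p, s1y, s2y) / mixture_pdf(t, alpha, p, s1x, s2x)

# Parameter values taken from the simulation study in Section 5.
settings = (2.0, 0.7, 1.0, 2.0, 10.0, 9.0)
for t in (0.2, 1.0, 2.5):
    print(t, slope_2_3(t, *settings), likelihood_ratio(t, *settings))
```

Under these assumptions the two columns printed for each t coincide, because multiplying the numerator and denominator of the density ratio by $\sigma_{1x}\sigma_{2x}\sigma_{1y}\sigma_{2y}$ gives exactly the form in (2.3).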

3. MAXIMUM LIKELIHOOD ESTIMATES OF PARAMETERS OF CSWMROC CURVE USING EM ALGORITHM

Before estimating the parameters of the CSWMROC curve, we first check the
identifiability condition of the Constant Shape Weibull Mixture distribution.
The Constant Shape Weibull Mixture distribution can easily be shown to be
identifiable.

The likelihood function is defined as

$$L = \prod_{i=1}^{n}\left[p_x\,\frac{\alpha}{\sigma_{1x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{1x}}\right) + (1-p_x)\,\frac{\alpha}{\sigma_{2x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{2x}}\right)\right]. \tag{3.1}$$

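For healthy cases, the estimates are given as

$$\hat{p}_x = \frac{\sum_{i=1}^{n}\tau_i(x)}{n}, \quad \hat{\sigma}_{1x} = \frac{\sum_{i=1}^{n}\tau_i(x)\,x_i^{\alpha}}{\sum_{i=1}^{n}\tau_i(x)} \quad\text{and}\quad \hat{\sigma}_{2x} = \frac{\sum_{i=1}^{n}\left(1-\tau_i(x)\right)x_i^{\alpha}}{\sum_{i=1}^{n}\left(1-\tau_i(x)\right)} \tag{3.2}$$

where

$$\tau_i(x) = \frac{p_x\,\frac{\alpha}{\sigma_{1x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{1x}}\right)}{p_x\,\frac{\alpha}{\sigma_{1x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{1x}}\right) + (1-p_x)\,\frac{\alpha}{\sigma_{2x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{2x}}\right)},$$

$$1-\tau_i(x) = \frac{(1-p_x)\,\frac{\alpha}{\sigma_{2x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{2x}}\right)}{p_x\,\frac{\alpha}{\sigma_{1x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{1x}}\right) + (1-p_x)\,\frac{\alpha}{\sigma_{2x}}\,x_i^{\alpha-1}\exp\left(-\frac{x_i^{\alpha}}{\sigma_{2x}}\right)}.$$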

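Similarly, for diseased cases, the estimates are given as

$$\hat{p}_y = \frac{\sum_{j=1}^{m}\tau_j(y)}{m}, \quad \hat{\sigma}_{1y} = \frac{\sum_{j=1}^{m}\tau_j(y)\,y_j^{\alpha}}{\sum_{j=1}^{m}\tau_j(y)} \quad\text{and}\quad \hat{\sigma}_{2y} = \frac{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)y_j^{\alpha}}{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)}. \tag{3.3}$$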
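The updates in (3.2) and (3.3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the common shape $\alpha$ is known, works with $z_i = x_i^{\alpha}$ (the factor $\alpha x_i^{\alpha-1}$ cancels in the posterior weights), and uses hypothetical function names, starting values and a fixed number of iterations.

```python
import math
import random

def em_step(z, p, s1, s2):
    """One EM update for one group, where z[i] = x[i] ** alpha."""
    # E-step: posterior weight of the first component for each observation.
    tau = []
    for zi in z:
        f1 = p / s1 * math.exp(-zi / s1)
        f2 = (1.0 - p) / s2 * math.exp(-zi / s2)
        tau.append(f1 / (f1 + f2))
    # M-step: closed-form updates of the form given in (3.2) and (3.3).
    n = len(z)
    t1 = sum(tau)
    p_new = t1 / n
    s1_new = sum(t * zi for t, zi in zip(tau, z)) / t1
    s2_new = sum((1.0 - t) * zi for t, zi in zip(tau, z)) / (n - t1)
    return p_new, s1_new, s2_new

def log_likelihood(z, p, s1, s2):
    """Log of (3.1), dropping terms that do not depend on p, s1, s2."""
    return sum(math.log(p / s1 * math.exp(-zi / s1)
                        + (1.0 - p) / s2 * math.exp(-zi / s2)) for zi in z)

def fit_em(x, alpha, p=0.5, s1=0.5, s2=3.0, n_iter=200):
    """Run a fixed number of EM iterations from illustrative start values."""
    z = [xi ** alpha for xi in x]
    for _ in range(n_iter):
        p, s1, s2 = em_step(z, p, s1, s2)
    return p, s1, s2

# Illustrative healthy-case sample: alpha = 2, p = 0.7, scales 1 and 2.
rng = random.Random(0)
alpha = 2.0
x = [((1.0 if rng.random() < 0.7 else 2.0) * rng.expovariate(1.0)) ** (1.0 / alpha)
     for _ in range(300)]
p_hat, s1_hat, s2_hat = fit_em(x, alpha)
print(p_hat, s1_hat, s2_hat)
```

Each call to `em_step` cannot decrease the log-likelihood, which is a convenient check when experimenting with other starting values.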
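On substituting (3.2) and (3.3) in (2.2), the estimated AUC of the CSWMROC
curve is given as

$$\widehat{AUC} = p\,\frac{\hat{\sigma}_{1y}}{\hat{\sigma}_{1x}+\hat{\sigma}_{1y}} + (1-p)\,\frac{\hat{\sigma}_{2y}}{\hat{\sigma}_{2x}+\hat{\sigma}_{2y}}$$

$$= p\,\frac{\dfrac{\sum_{j=1}^{m}\tau_j(y)\,y_j^{\alpha}}{\sum_{j=1}^{m}\tau_j(y)}}{\dfrac{\sum_{i=1}^{n}\tau_i(x)\,x_i^{\alpha}}{\sum_{i=1}^{n}\tau_i(x)} + \dfrac{\sum_{j=1}^{m}\tau_j(y)\,y_j^{\alpha}}{\sum_{j=1}^{m}\tau_j(y)}} + (1-p)\,\frac{\dfrac{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)y_j^{\alpha}}{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)}}{\dfrac{\sum_{i=1}^{n}\left(1-\tau_i(x)\right)x_i^{\alpha}}{\sum_{i=1}^{n}\left(1-\tau_i(x)\right)} + \dfrac{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)y_j^{\alpha}}{\sum_{j=1}^{m}\left(1-\tau_j(y)\right)}}.$$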

4. VARIANCE OF AUC OF CSWMROC CURVE

The approximate variance of the AUC of the CSWMROC curve can be computed in
the following manner.

$$V(\widehat{AUC}) = p\,V(\widehat{AUC}_1) + (1-p)\,V(\widehat{AUC}_2) \tag{4.1}$$

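where

$$AUC_1 = \frac{\sigma_{1y}}{\sigma_{1x}+\sigma_{1y}} \tag{4.2}$$

$$AUC_2 = \frac{\sigma_{2y}}{\sigma_{2x}+\sigma_{2y}}. \tag{4.3}$$

Using the delta method, $V(\widehat{AUC}_1)$ and $V(\widehat{AUC}_2)$ are given as

$$V(\widehat{AUC}_1) = \left(\frac{\partial AUC_1}{\partial \sigma_{1x}}\right)^2 V(\hat{\sigma}_{1x}) + \left(\frac{\partial AUC_1}{\partial \sigma_{1y}}\right)^2 V(\hat{\sigma}_{1y}) + 2\left(\frac{\partial AUC_1}{\partial \sigma_{1x}}\right)\left(\frac{\partial AUC_1}{\partial \sigma_{1y}}\right) Cov(\hat{\sigma}_{1x}, \hat{\sigma}_{1y}) \tag{4.4}$$

$$V(\widehat{AUC}_2) = \left(\frac{\partial AUC_2}{\partial \sigma_{2x}}\right)^2 V(\hat{\sigma}_{2x}) + \left(\frac{\partial AUC_2}{\partial \sigma_{2y}}\right)^2 V(\hat{\sigma}_{2y}) + 2\left(\frac{\partial AUC_2}{\partial \sigma_{2x}}\right)\left(\frac{\partial AUC_2}{\partial \sigma_{2y}}\right) Cov(\hat{\sigma}_{2x}, \hat{\sigma}_{2y}) \tag{4.5}$$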
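As a worked step, the partial derivatives needed in (4.4) can be written out explicitly. The lines below take $AUC_1 = \sigma_{1y}/(\sigma_{1x}+\sigma_{1y})$, the first component of (2.2); the same expressions hold for $AUC_2$ with the index 1 replaced by 2.

```latex
\frac{\partial AUC_1}{\partial \sigma_{1x}} = -\frac{\sigma_{1y}}{(\sigma_{1x}+\sigma_{1y})^{2}},
\qquad
\frac{\partial AUC_1}{\partial \sigma_{1y}} = \frac{\sigma_{1x}}{(\sigma_{1x}+\sigma_{1y})^{2}},

V(\widehat{AUC}_1) = \frac{\sigma_{1y}^{2}\,V(\hat{\sigma}_{1x}) + \sigma_{1x}^{2}\,V(\hat{\sigma}_{1y})
  - 2\,\sigma_{1x}\sigma_{1y}\,Cov(\hat{\sigma}_{1x},\hat{\sigma}_{1y})}{(\sigma_{1x}+\sigma_{1y})^{4}}.
```

The covariance term enters with a negative sign because the two partial derivatives have opposite signs.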

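To get $V(\widehat{AUC}_1)$, we first have to evaluate the variances of
$\hat{\sigma}_{1y}$, $\hat{\sigma}_{1x}$, $\hat{\sigma}_{2y}$ and
$\hat{\sigma}_{2x}$. We use the inverse Fisher information matrix given as

$$I^{-1} = \frac{1}{a_{11}a_{22}a_{33} - a_{12}^{2}a_{33} - a_{22}a_{13}^{2}}
\begin{pmatrix}
a_{22}a_{33} & -a_{21}a_{33} & -a_{22}a_{31} \\
-a_{12}a_{33} & a_{11}a_{33} - a_{13}^{2} & a_{12}a_{31} \\
-a_{22}a_{13} & a_{21}a_{13} & a_{11}a_{22} - a_{12}^{2}
\end{pmatrix}
=
\begin{pmatrix}
V(\hat{\alpha}) & Cov(\hat{\alpha}, \hat{\sigma}_{1x}) & Cov(\hat{\alpha}, \hat{\sigma}_{1y}) \\
Cov(\hat{\sigma}_{1x}, \hat{\alpha}) & V(\hat{\sigma}_{1x}) & Cov(\hat{\sigma}_{1x}, \hat{\sigma}_{1y}) \\
Cov(\hat{\sigma}_{1y}, \hat{\alpha}) & Cov(\hat{\sigma}_{1y}, \hat{\sigma}_{1x}) & V(\hat{\sigma}_{1y})
\end{pmatrix}$$

where,

$$\begin{aligned}
a_{11} &= \frac{1}{\alpha^{2}}\Big\{(m_{1x}+n_{1y})\left[1+\Gamma''(2)\right] + 2\left[n_{1y}\ln\sigma_{1y} + m_{1x}\ln\sigma_{1x}\right]\Gamma'(2) + n_{1y}(\ln\sigma_{1y})^{2} + m_{1x}(\ln\sigma_{1x})^{2}\Big\}, \\
a_{22} &= \frac{m_{1x}}{\sigma_{1x}^{2}}, \quad a_{33} = \frac{n_{1y}}{\sigma_{1y}^{2}}, \quad a_{23} = 0, \quad a_{32} = 0, \\
a_{12} &= a_{21} = -\frac{m_{1x}}{\alpha\,\sigma_{1x}}\left[\Gamma'(2) + \ln\sigma_{1x}\right], \\
a_{13} &= a_{31} = -\frac{n_{1y}}{\alpha\,\sigma_{1y}}\left[\Gamma'(2) + \ln\sigma_{1y}\right],
\end{aligned} \tag{4.6}$$

$$\Gamma'(n) = (n-1)!\left(-\gamma + \sum_{k=1}^{n-1}\frac{1}{k}\right),$$

and $\gamma$ is the Euler-Mascheroni constant, approximately equal to 0.5772.
$m_{1x}$, $m_{2x}$ are the sample sizes of healthy controls and $n_{1y}$,
$n_{2y}$ are the sample sizes of diseased cases.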

Putting all the above expressions for the variances and covariance in (4.4),
$V(\widehat{AUC}_1)$ is given as

$$V(\widehat{AUC}_1) = \frac{\sigma_{1x}^{2}\sigma_{1y}^{2}}{(\sigma_{1x}+\sigma_{1y})^{4}}\left[\frac{m_{1x}+n_{1y}}{m_{1x}n_{1y}} + \frac{\left(\ln\dfrac{\sigma_{1x}}{\sigma_{1y}}\right)^{2}}{(m_{1x}+n_{1y})\left(1+\Gamma''(2)-[\Gamma'(2)]^{2}\right)}\right]. \tag{4.7}$$
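The algebra leading from (4.4) and (4.6) to (4.7) can be checked numerically. The sketch below builds the Fisher information entries, inverts the matrix with the cofactor expressions, applies the delta method and compares the result with the closed form. It relies on the standard identities $\Gamma'(2) = 1-\gamma$ and $\Gamma''(2) = [\Gamma'(2)]^{2} + \pi^{2}/6 - 1$; function names and the test values of the sample sizes are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329
DG = 1.0 - EULER_GAMMA                   # Gamma'(2)
D2G = DG ** 2 + math.pi ** 2 / 6 - 1.0   # Gamma''(2), standard identity

def fisher_entries(alpha, sx, sy, m, n):
    """Entries a11, a22, a33, a12, a13 as written in (4.6)."""
    lx, ly = math.log(sx), math.log(sy)
    a11 = ((m + n) * (1.0 + D2G) + 2.0 * (n * ly + m * lx) * DG
           + n * ly ** 2 + m * lx ** 2) / alpha ** 2
    a22 = m / sx ** 2
    a33 = n / sy ** 2
    a12 = -m * (DG + lx) / (alpha * sx)
    a13 = -n * (DG + ly) / (alpha * sy)
    return a11, a22, a33, a12, a13

def var_auc1_delta(alpha, sx, sy, m, n):
    """Delta-method variance (4.4) using the inverse Fisher matrix."""
    a11, a22, a33, a12, a13 = fisher_entries(alpha, sx, sy, m, n)
    det = a11 * a22 * a33 - a12 ** 2 * a33 - a22 * a13 ** 2
    v_x = (a11 * a33 - a13 ** 2) / det    # V(sigma_1x hat)
    v_y = (a11 * a22 - a12 ** 2) / det    # V(sigma_1y hat)
    c_xy = a12 * a13 / det                # Cov(sigma_1x hat, sigma_1y hat)
    s = sx + sy
    g_x, g_y = -sy / s ** 2, sx / s ** 2  # partial derivatives of AUC_1
    return g_x ** 2 * v_x + g_y ** 2 * v_y + 2.0 * g_x * g_y * c_xy

def var_auc1_closed(sx, sy, m, n):
    """Closed form (4.7)."""
    k = 1.0 + D2G - DG ** 2
    bracket = (m + n) / (m * n) + math.log(sx / sy) ** 2 / ((m + n) * k)
    return sx ** 2 * sy ** 2 / (sx + sy) ** 4 * bracket

print(var_auc1_delta(2.0, 1.0, 10.0, 10, 10))
print(var_auc1_closed(1.0, 10.0, 10, 10))
```

The two printed values agree up to rounding error, and the closed form does not depend on $\alpha$ even though the individual Fisher information entries do.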

Similarly, $V(\widehat{AUC}_2)$ is given as

$$V(\widehat{AUC}_2) = \frac{\sigma_{2x}^{2}\sigma_{2y}^{2}}{(\sigma_{2x}+\sigma_{2y})^{4}}\left[\frac{m_{2x}+n_{2y}}{m_{2x}n_{2y}} + \frac{\left(\ln\dfrac{\sigma_{2x}}{\sigma_{2y}}\right)^{2}}{(m_{2x}+n_{2y})\left(1+\Gamma''(2)-[\Gamma'(2)]^{2}\right)}\right]. \tag{4.8}$$
On substituting (4.7) and (4.8) in (4.1), the variance of the AUC is given as

$$V(\widehat{AUC}) = p\,\frac{\sigma_{1x}^{2}\sigma_{1y}^{2}}{(\sigma_{1x}+\sigma_{1y})^{4}}\left[\frac{m_{1x}+n_{1y}}{m_{1x}n_{1y}} + \frac{\left(\ln\dfrac{\sigma_{1x}}{\sigma_{1y}}\right)^{2}}{(m_{1x}+n_{1y})\left(1+\Gamma''(2)-[\Gamma'(2)]^{2}\right)}\right] + (1-p)\,\frac{\sigma_{2x}^{2}\sigma_{2y}^{2}}{(\sigma_{2x}+\sigma_{2y})^{4}}\left[\frac{m_{2x}+n_{2y}}{m_{2x}n_{2y}} + \frac{\left(\ln\dfrac{\sigma_{2x}}{\sigma_{2y}}\right)^{2}}{(m_{2x}+n_{2y})\left(1+\Gamma''(2)-[\Gamma'(2)]^{2}\right)}\right]. \tag{4.9}$$

Using $V(\widehat{AUC})$, one can easily find the confidence interval and the
Mean Square Error (MSE). We can also develop a test of significance for the
AUC.
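To give a feel for the size of (4.9), the sketch below evaluates it at the parameter values used in Section 5. It is an illustration under stated assumptions: the same sample sizes are plugged in for both mixture components ($m_{1x} = m_{2x} = m$, $n_{1y} = n_{2y} = n$), the true parameters are used instead of estimates, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
DG = 1.0 - EULER_GAMMA                   # Gamma'(2)
D2G = DG ** 2 + math.pi ** 2 / 6 - 1.0   # Gamma''(2), standard identity

def var_component(sx, sy, m, n):
    """Variance of one component AUC, of the form (4.7) or (4.8)."""
    k = 1.0 + D2G - DG ** 2
    bracket = (m + n) / (m * n) + math.log(sx / sy) ** 2 / ((m + n) * k)
    return (sx * sy) ** 2 / (sx + sy) ** 4 * bracket

def var_auc(p, s1x, s1y, s2x, s2y, m, n):
    """Approximate variance of the estimated AUC as combined in (4.9)."""
    return (p * var_component(s1x, s1y, m, n)
            + (1.0 - p) * var_component(s2x, s2y, m, n))

for size in (10, 100, 300):
    v = var_auc(0.7, 1.0, 10.0, 2.0, 9.0, size, size)
    print(size, v, math.sqrt(v))  # sample size, variance, standard error
```

With these inputs the variance shrinks roughly in proportion to 1/n, which is the pattern expected of the simulated variances in Section 5.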

5. SIMULATION RESULTS

Random numbers are generated from the Weibull mixture distribution with
constant shape $\alpha = 2$ and $\sigma_{1x} = 1$, $\sigma_{2x} = 2$,
$\sigma_{1y} = 10$ and $\sigma_{2y} = 9$. Sample sizes are equal for healthy
controls and diseased cases. The mixing weight is fixed at
$p_{1x} = p_{1y} = 0.7$ for both groups.
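One possible way to generate such data is sketched below. It assumes the parameterization in which each component has survival function $\exp(-t^{\alpha}/\sigma)$, so that $X^{\alpha}/\sigma$ is standard exponential; the function name `sample_cswm` and the seed are illustrative choices.

```python
import random

def sample_cswm(n, alpha, p, s1, s2, rng):
    """Draw n values from a two-component Weibull mixture with common shape.

    A component is picked with probability p (scale s1) or 1 - p (scale s2),
    then X = (sigma * E) ** (1 / alpha) with E standard exponential.
    """
    out = []
    for _ in range(n):
        sigma = s1 if rng.random() < p else s2
        out.append((sigma * rng.expovariate(1.0)) ** (1.0 / alpha))
    return out

rng = random.Random(2017)
healthy = sample_cswm(300, alpha=2.0, p=0.7, s1=1.0, s2=2.0, rng=rng)
diseased = sample_cswm(300, alpha=2.0, p=0.7, s1=10.0, s2=9.0, rng=rng)
print(sum(healthy) / len(healthy), sum(diseased) / len(diseased))
```

Diseased values are larger on average than healthy ones, as expected from the larger scale parameters.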

Table 5.1 shows the estimates and bias of the parameters of the CSWMROC
curve using MLE via the EM algorithm for healthy controls and diseased cases
for different sample sizes m = n = 10, 20, 30, 100, 200, 300.


Table 5.1 Estimates and bias (in parentheses) of parameters of CSWMROC Curve
for different sample sizes using EM algorithm

n     $\hat{p}_{1x}$        $\hat{\sigma}_{1x}$   $\hat{\sigma}_{2x}$   $\hat{p}_{1y}$        $\hat{\sigma}_{1y}$   $\hat{\sigma}_{2y}$
10    0.7497                0.8531                0.8916                0.6932                6.6332                6.5813
      (0.0497)              (-0.146)              (-1.108)              (-0.006)              (-3.366)              (-2.418)
20    0.7490                0.8456                0.9211                0.6932                6.6602                6.5655
      (0.049)               (-0.154)              (-1.078)              (-0.006)              (-3.339)              (-2.434)
30    0.7316                0.9607                1.1788                0.6959                7.9033                7.6226
      (0.0316)              (-0.039)              (-0.821)              (-0.004)              (-2.096)              (-1.377)
100   0.7208                1.0848                1.2618                0.6980                8.7689                8.5401
      (0.0208)              (0.0848)              (-0.738)              (-0.002)              (-1.231)              (-0.459)
200   0.7316                1.0022                1.1199                0.6962                7.9763                7.8212
      (0.0316)              (0.0022)              (-0.880)              (-0.003)              (-2.023)              (-1.178)
300   0.7286                1.0203                1.1704                0.6967                8.1545                8.1015
      (0.0286)              (0.0203)              (-0.829)              (-0.003)              (-1.845)              (-0.898)

From Table 5.1, we observe that the estimates for healthy controls and
diseased cases are close to the true parameter values. The estimates of the
AUC and its variance, SE, MSE and CI are given in Table 5.2, using the
estimates from Table 5.1. For testing the AUC of the CSWMROC curve, the
hypotheses are $H_0: AUC = 0.88$ vs $H_1: AUC < 0.88$. Using the variance in
(4.9), the values of the Z-statistic are given in Table 5.2.

Table 5.2 $\widehat{AUC}$, $V(\widehat{AUC})$, $SE(\widehat{AUC})$, $MSE(\widehat{AUC})$, CI(AUC) and Z-values of
CSWMROC Curve using MLE via EM algorithm (bias of $\widehat{AUC}$ in parentheses)

n     $\widehat{AUC}$ (bias)   $V(\widehat{AUC})$   $SE(\widehat{AUC})$   $MSE(\widehat{AUC})$   CI(AUC)          Z-values
10    0.8844 (0.0044)          0.0033               0.0582                0.0034                 [0.770, 0.998]   0.075
20    0.8842 (0.0042)          0.0017               0.0423                0.0018                 [0.801, 0.967]   0.099
30    0.8839 (0.0039)          0.0011               0.0336                0.0011                 [0.817, 0.949]   0.116
100   0.8843 (0.0043)          0.0003               0.0184                0.0003                 [0.848, 0.920]   0.233
200   0.8842 (0.0042)          0.0001               0.0130                0.0001                 [0.858, 0.909]   0.323
300   0.8842 (0.0042)          0.0001               0.01064               0.0001                 [0.863, 0.905]   0.394
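The confidence interval and Z-value in the first row of Table 5.2 can be approximately reproduced from the reported estimate and standard error. The form of the test statistic is not written out in the text, so the sketch below assumes the usual Wald form $Z = (\widehat{AUC} - AUC_0)/SE(\widehat{AUC})$ and a normal-theory 95% interval; both function names are illustrative.

```python
def z_statistic(auc_hat, se, auc_null):
    """Assumed Wald-type statistic for H0: AUC = auc_null."""
    return (auc_hat - auc_null) / se

def normal_ci(auc_hat, se, z_crit=1.96):
    """Normal-theory confidence interval (95% by default)."""
    return auc_hat - z_crit * se, auc_hat + z_crit * se

# Values reported for n = 10 in Table 5.2.
z = z_statistic(0.8844, 0.0582, 0.88)
lower, upper = normal_ci(0.8844, 0.0582)
print(z, lower, upper)

# One-sided test at the 5% level: reject H0 only if z < -1.645.
print("reject H0" if z < -1.645 else "do not reject H0")
```

Small differences from the tabulated values are to be expected, because the table entries are rounded.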


In Table 5.2, we note that the estimate of the AUC is close to the true AUC.
The bias of the AUC stays small, at about 0.004, for all sample sizes. The
variance, SE and MSE of the AUC decrease as the sample size increases. Since
all Z-values are greater than -1.645, the null hypothesis is not rejected in
any case, and we conclude that the AUC of the CSWMROC curve is 0.88. Figure
5.1 shows the CSWMROC curve for different sample sizes m = n = 10, 20, 30,
100, 200 and 300 for $\alpha = 2$ and $\sigma_{1x} = 1$, $\sigma_{2x} = 2$,
$\sigma_{1y} = 10$ and $\sigma_{2y} = 9$.

Fig. 5.1 : CSWMROC Curve for different sample sizes (TPR on the vertical axis, FPR on the horizontal axis)

It is observed that the CSWMROC curve remains the same for different sample
sizes when the shape and scale parameters of the healthy and diseased cases
are fixed.

ACKNOWLEDGEMENT

The authors would like to thank the University Grants Commission for
providing financial support to carry out this research work under the
UGC-MRP.


REFERENCES

[1] Arfa, M. and Aslam, M. (2008): A comparative study to estimate the
parameters of mixed-Weibull distribution, Pakistan Journal of Statistics and
Operation Research, 4(1), 1-8.
[2] Atienza, N., Garcia-Heras, J. and Munoz Pichardo, J. M. (2006): A new
condition for identifiability of finite mixture distributions, Metrika, 63(2),
215-221.
[3] Chandra, S. (1977): On the mixtures of probability distributions,
Scandinavian Journal of Statistics, 4, 105-112.
[4] Dass, S.C. and Kim, S.W. (2012): A semi-parametric approach to
estimation of ROC curves for multivariate binormal mixtures,
www.stt.msu.edu/Links/Research_Memoranda/RM/RM_683.pdf.
[5] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977): Maximum likelihood
from incomplete data via the EM algorithm, Journal of the Royal Statistical
Society, Series B (Methodological), 39(1), 1-38.
[6] Erisoglu, U. and Erisoglu, M. (2014): L-moments estimations for mixture of
Weibull distributions, Journal of Data Science, 12, 69-85.
[7] Franco, M., Balakrishnan, N., Kundu, D. and Juana Maria, V. (2000):
Generalized mixtures of Weibull components,
Home.iitk.ac.in/~Kundu/gen-mix-weib.pdf.
[8] Gonen, M. (2013): Mixtures of receiver operating characteristic curves,
Academic Radiology, 20(7), 831-837.
[9] Kao, J.H.K. (1959): A graphical estimation of mixed Weibull parameters in
life-testing of electron tubes, Technometrics, 1, 389-407.
[10] Patra, K. and Dey, D.K. (1999): A multivariate mixture of Weibull
distributions in reliability modeling, Statistics & Probability Letters, 45,
225-235.
[11] Teicher, H. (1960): On mixture of distributions, The Annals of
Mathematical Statistics, 31, 55-73.
[12] Teicher, H. (1961): Identifiability of mixtures, The Annals of
Mathematical Statistics, 32, 244-248.
[13] Teicher, H. (1963a): Identifiability of finite mixtures, The Annals of
Mathematical Statistics, 34, 1265-1269.
[14] Teicher, H. (1963b): Identifiability of mixtures of product measures, The
Annals of Mathematical Statistics, 38, 1300-1302.
[15] Yakowitz, S.J. and Spragins, J.D. (1968): On the identifiability of
finite mixtures, The Annals of Mathematical Statistics, 39, 209-214.
