Session 11 (Comparing)

Data Analysis, Statistics, Machine Learning

Leland Wilkinson

Adjunct Professor, UIC Computer Science
Chief Scientist, H2O.ai

[email protected]
Comparing
o Statistical methods exist for comparing 2 or more groups
o The classical approach is Analysis of Variance (ANOVA)
o This method was invented by Sir Ronald Fisher
o It revolutionized industrial/scientific experiments
o The researcher was able to examine more than one treatment at a time
o With only two groups, the results of Student's t-test and the F-test are equivalent
o Multivariate Analysis of Variance (MANOVA)
o This is ANOVA for more than one dependent variable (outcome)
o Hierarchical modeling is for nested data
o There are several forms of this multilevel modeling

 


Comparing
o A simple two-group comparison
o We compare Model 1 vs Model 2
o A likelihood ratio test would do for large samples
o Full model (Model 2):

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i = \mu + \tau + \epsilon_i$$

o Restricted model (Model 1):

$$y_i = \mu + \epsilon_i$$

o But for small samples, use Student's t-test
o Don't bother with all the unnecessarily complicated intro-stat textbook formulas
o They are useless
o You don't want to try this at home, folks
o Let the stat package do it
o You want the Satterthwaite formula (a sketch follows below)
o The standard pooled formula is almost never valid on real data
o The Satterthwaite formula gives the same answer if the variances are equal
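A minimal sketch of letting the package do it, using scipy; the data here are invented for illustration. Passing equal_var=False requests the Welch test with Satterthwaite degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(10.0, 2.0, size=15)   # hypothetical group 1
g2 = rng.normal(12.0, 4.0, size=20)   # hypothetical group 2, larger variance

# equal_var=False gives the Welch t-test with Satterthwaite df;
# it matches the pooled test when the variances happen to be equal
t, p = stats.ttest_ind(g1, g2, equal_var=False)
print(f"t = {t:.3f}, p = {p:.4f}")
```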



Comparing
o A simple two-group comparison
o The independent groups t-test
o Assumptions
o The variable is normally distributed
o The groups are independent

o BUT the t-test is for small n

o As David Freedman pointed out, if n is so small that you need a t-test, then the sample is too small to assess the normality assumption
o And if n is large enough to assess normality, then you might as well use a Normal z-test instead of t

o The variances are supposed to be equal.

o Some say the t-test is robust to violations of that assumption.
o Then why does the Satterthwaite modification exist?
o And no, the t-test (and F-test) are not robust against skewness
Comparing
o Another way of looking at the independent groups test
o The OLS regression model on two groups

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$


Comparing
o Another way of looking at the independent groups test
o Effects coding

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

$$
\mathbf{y} = \begin{bmatrix} y_{1,1} \\ y_{2,1} \\ \vdots \\ y_{n_1,1} \\ y_{1,2} \\ y_{2,2} \\ \vdots \\ y_{n_2,2} \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \qquad
\mathbf{e} = \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{2,1} \\ \vdots \\ \epsilon_{n_1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,2} \\ \vdots \\ \epsilon_{n_2,2} \end{bmatrix}
$$


Comparing
o Another way of looking at the independent groups test
o Means coding

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

$$
\mathbf{y} = \begin{bmatrix} y_{1,1} \\ y_{2,1} \\ \vdots \\ y_{n_1,1} \\ y_{1,2} \\ y_{2,2} \\ \vdots \\ y_{n_2,2} \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \qquad
\mathbf{e} = \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{2,1} \\ \vdots \\ \epsilon_{n_1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,2} \\ \vdots \\ \epsilon_{n_2,2} \end{bmatrix}
$$
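A small sketch of building both design matrices with patsy; the group labels and data frame are hypothetical:

```python
import pandas as pd
from patsy import dmatrix

df = pd.DataFrame({"group": ["a", "a", "a", "b", "b"]})  # hypothetical labels

# Effects coding: an intercept column plus a +1/-1 (sum-to-zero) column
X_effects = dmatrix("C(group, Sum)", df)

# Means coding: one indicator column per group, intercept suppressed
X_means = dmatrix("0 + C(group)", df)

print(X_effects)
print(X_means)
```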


Comparing
o Another way of looking at the independent groups test
o Hypothesis tests

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

o Go ahead and do all the usual things

o Confidence intervals on effects-coded estimates are confidence intervals on the difference between cell means.
o Confidence intervals on means-coded estimates are confidence intervals on cell means.
o Examine residuals
o You want to see the same variance and a Normal distribution in both groups
Comparing
o Analysis of Variance (ANOVA) sums of squares

$$SSB = \sum_{j=1}^{g} n_j(\hat{\mu}_j - \hat{\mu})^2 \qquad \text{between-groups (regression) sum of squares}$$

$$SSW = \sum_{j=1}^{g}\sum_{i=1}^{n_j} (y_{ij} - \hat{\mu}_j)^2 \qquad \text{within-groups (error) sum of squares}$$

$$SST = \sum_{i=1}^{n} (y_i - \hat{\mu})^2 \qquad \text{total sum of squares}$$

$$MSB = SSB/(g-1) \qquad \text{mean square between groups}$$

$$MSW = SSW/(n-g) \qquad \text{mean square within groups}$$

$$F_{g-1,\,n-g} = MSB/MSW \qquad \text{F test for difference between cell means}$$
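These formulas translate directly into numpy; the three groups below are invented for illustration:

```python
import numpy as np
from scipy import stats

# hypothetical data: g = 3 groups of unequal size
groups = [np.array([4.1, 5.0, 5.5, 4.7]),
          np.array([6.2, 5.9, 7.1]),
          np.array([5.0, 4.4, 4.9, 5.3, 5.1])]

y = np.concatenate(groups)
n, g = len(y), len(groups)
grand = y.mean()

ssb = sum(len(gj) * (gj.mean() - grand) ** 2 for gj in groups)  # between
ssw = sum(((gj - gj.mean()) ** 2).sum() for gj in groups)       # within

msb, msw = ssb / (g - 1), ssw / (n - g)
F = msb / msw
p = stats.f.sf(F, g - 1, n - g)  # upper-tail F probability
print(f"F({g - 1},{n - g}) = {F:.3f}, p = {p:.4f}")
```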


Comparing
o A simple two-group comparison
o The dependent groups t-test
o Suppose you have repeated measures on the same subjects (e.g., pre-post)
o Then you need a dependent t-test (a sketch follows below)
o Forget about the intro-stat textbook formulas
o They are useless
o The same cautions about assumptions apply to this situation, however
o And there's a nasty gotcha
o The dependent t-test takes advantage of the variance of dependent random variables

$$\mathrm{VAR}(X + Y) = \mathrm{VAR}(X) + \mathrm{VAR}(Y) + 2\,\mathrm{COV}(X, Y)$$

o Actually, we're working with a difference here, so

$$\mathrm{VAR}(X - Y) = \mathrm{VAR}(X) + \mathrm{VAR}(Y) - 2\,\mathrm{COV}(X, Y)$$

o So, if your measure is positively correlated across subjects, you've increased the power
o If the measures are negatively correlated, however, you've decreased the power

o Ever see a researcher test whether the within-subject correlation is positive before using the dependent t-test?
o I didn't think so
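A minimal paired-test sketch with scipy, on invented pre-post scores; checking the within-subject correlation first is exactly the step the slide says almost nobody does:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre = rng.normal(50.0, 10.0, size=25)        # hypothetical pre scores
post = pre + rng.normal(2.0, 5.0, size=25)   # post scores, positively correlated

# the power advantage of the paired test depends on this being positive
print("within-subject r:", np.corrcoef(pre, post)[0, 1])

t, p = stats.ttest_rel(pre, post)            # dependent (paired) t-test
print(f"t = {t:.3f}, p = {p:.4f}")
```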


Comparing
o A simple two-group comparison
o The dependent groups t-test
o But it gets worse
o You don't want to use change (difference) scores for a pre-post design.
o Instead, you want an analysis of covariance with Pre as a covariate and Post as the dependent variable (more on that later)
o And if you did a Pre-Post design with Experiment and Control groups?
o Hope you randomly assigned subjects to treatments
o Hope you know that the test in this case involves an interaction in a repeated measures design (we'll talk about that later)

o And you thought A/B testing is simple?

o Only market researchers and Web designers think that.


Comparing
o Three groups
o Same model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$


Comparing
o Three groups
o Means coding

$$
\mathbf{y} = \begin{bmatrix} y_{1,1} \\ y_{2,1} \\ \vdots \\ y_{n_1,1} \\ y_{1,2} \\ y_{2,2} \\ \vdots \\ y_{n_2,2} \\ y_{1,3} \\ y_{2,3} \\ \vdots \\ y_{n_3,3} \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} \qquad
\mathbf{e} = \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{2,1} \\ \vdots \\ \epsilon_{n_1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,2} \\ \vdots \\ \epsilon_{n_2,2} \\ \epsilon_{1,3} \\ \epsilon_{2,3} \\ \vdots \\ \epsilon_{n_3,3} \end{bmatrix}
$$


Comparing
o Three groups
o Effects coding

$$
\mathbf{y} = \begin{bmatrix} y_{1,1} \\ y_{2,1} \\ \vdots \\ y_{n_1,1} \\ y_{1,2} \\ y_{2,2} \\ \vdots \\ y_{n_2,2} \\ y_{1,3} \\ y_{2,3} \\ \vdots \\ y_{n_3,3} \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ \vdots & \vdots & \vdots \\ 1 & -1 & -1 \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \qquad
\mathbf{e} = \begin{bmatrix} \epsilon_{1,1} \\ \epsilon_{2,1} \\ \vdots \\ \epsilon_{n_1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,2} \\ \vdots \\ \epsilon_{n_2,2} \\ \epsilon_{1,3} \\ \epsilon_{2,3} \\ \vdots \\ \epsilon_{n_3,3} \end{bmatrix}
$$




Comparing
o The two-way factorial model (2 x 2 design)
o Effects coding

$$
\mathbf{y} = \begin{bmatrix} y_{1,1,1} \\ y_{2,1,1} \\ \vdots \\ y_{n_{1,1},1,1} \\ y_{1,2,1} \\ y_{2,2,1} \\ \vdots \\ y_{n_{2,1},2,1} \\ y_{1,1,2} \\ y_{2,1,2} \\ \vdots \\ y_{n_{1,2},1,2} \\ y_{1,2,2} \\ y_{2,2,2} \\ \vdots \\ y_{n_{2,2},2,2} \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & -1 & -1 & 1 \end{bmatrix} \qquad
\mathbf{b} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_{12} \end{bmatrix} \qquad
\mathbf{e} = \begin{bmatrix} \epsilon_{1,1,1} \\ \epsilon_{2,1,1} \\ \vdots \\ \epsilon_{n_{1,1},1,1} \\ \epsilon_{1,2,1} \\ \epsilon_{2,2,1} \\ \vdots \\ \epsilon_{n_{2,1},2,1} \\ \epsilon_{1,1,2} \\ \epsilon_{2,1,2} \\ \vdots \\ \epsilon_{n_{1,2},1,2} \\ \epsilon_{1,2,2} \\ \epsilon_{2,2,2} \\ \vdots \\ \epsilon_{n_{2,2},2,2} \end{bmatrix}
$$


Comparing
o Multiway factorials
o Don't even try to look at the design matrix
o Aren't you glad there's computer software for this?
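A sketch of what "let the software do it" looks like in statsmodels; the two factors and the response are simulated here just to show the mechanics:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "a": np.repeat(["a1", "a2"], 20),
    "b": np.tile(np.repeat(["b1", "b2"], 10), 2),
})
df["y"] = rng.normal(size=40) + (df["a"] == "a2") * 0.5  # hypothetical effect

# Sum contrasts = effects coding; the package builds X for us
fit = smf.ols("y ~ C(a, Sum) * C(b, Sum)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=3))  # tests for main effects + interaction
```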


Comparing
o Things to consider with ANOVA
o Don't even LOOK at any lower term if it is contained in a significant interaction

"Variance analysis found a significant main effect of gender on perceived duration (F(1,109)=4.29, p<.05)."
James J. Kellaris and Susan Powell Mantel (1994), "The Influence of Mood and Gender on Consumers' Time Perceptions", in NA - Advances in Consumer Research Volume 21, eds. Chris T. Allen and Deborah Roedder John, Provo, UT: Association for Consumer Research, Pages: 514-518.

BULLSHIT
The story is different for males and females


Comparing
o Things to consider with ANOVA
o Don't even LOOK at any lower term if it is contained in a significant interaction.
o If you want to say something about main effects, you will have to do simple contrasts.


Comparing
o Things to consider with ANOVA
o Don't trust p values in multiway factorials
o Use FDR on all effects (a sketch follows below)
o Probability plot the p values against a uniform distribution
o And, excuse me
o Try explaining a 4-way interaction to someone
o The example on the right is from SYSTAT
o I generated random data and got 2 significant effects
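A minimal FDR sketch with statsmodels; the p values below stand in for the effects of a multiway factorial:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# hypothetical p values, one per effect in a multiway factorial
pvals = np.array([0.003, 0.04, 0.12, 0.30, 0.55, 0.71, 0.85])

# Benjamini-Hochberg false discovery rate control at 5%
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject)   # which effects survive
print(p_adj)    # FDR-adjusted p values
```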


Comparing
o Things to consider with ANOVA
o F tests are generally robust against heterogeneity of variance
o But not against skewness
o If your data are highly skewed, you are probably using the wrong model
o Counts? (you probably want Poisson)
o Incomes? (you probably want to log the dependent variable to take care of Bill Gates)
o Check out the next example


Comparing
o Things to consider with ANOVA
o Poisson ANOVA (thanks to Jerry Dallal for this final exam question)

[Slide shows two output panels, "ANOVA" and "Poisson ANOVA", not reproduced here]
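A sketch of the Poisson version as a generalized linear model in statsmodels; the counts are simulated, since the original exam data are not reproduced here:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], 30)})
df["count"] = rng.poisson(lam=df["group"].map({"a": 2.0, "b": 2.5, "c": 4.0}))

# Poisson "ANOVA": a GLM with a log link instead of least squares on skewed counts
fit = smf.glm("count ~ C(group)", data=df, family=sm.families.Poisson()).fit()
print(fit.summary())
```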


Comparing
o Analysis of Covariance (ANCOVA)
o Just throw any continuous variables you want into X
o It's the same least squares model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

o Here's one covariate ($x$) and one treatment ($\tau$):

$$y_{ij} = \mu + \tau_j + \beta(x_{ij} - \bar{x}_j) + \epsilon_{ij} \qquad \text{(group indexed by } j \text{, case indexed by } i \text{)}$$

o We subtract the mean of the covariate ($\bar{x}_j$) out to specify deviations from cell means in the model
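A sketch of this model in statsmodels, reusing the earlier pre-post advice (Pre as the covariate, Post as the outcome); all data are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"treatment": np.repeat(["t1", "t2", "t3"], 20)})
df["pre"] = rng.normal(50.0, 10.0, size=60)                   # covariate x
df["post"] = 0.8 * df["pre"] + rng.normal(0.0, 5.0, size=60)  # outcome y

# One covariate plus one treatment factor: the same least squares model
fit = smf.ols("post ~ pre + C(treatment)", data=df).fit()
print(fit.summary())
```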


Comparing
o Analysis of Covariance (ANCOVA)
o Here's what we are modeling (3 groups, one covariate)
o If the lines are parallel, then we can impute the effect of the treatment by looking at how vertically separated the three regression lines are

$$y_{ij} = \mu + \tau_j + \beta(x_{ij} - \bar{x}_j) + \epsilon_{ij}$$


Comparing
o Things to consider with ANCOVA
o ANCOVA does not "control" for the covariate
o it is like blocking or matching
o regression doesn't "control" anything
o control requires random assignment
o The separate regressions should have parallel slopes
o if the slopes are different, add an interaction term between the covariate and the treatment
o of course, this will make your interpretation of the results more difficult
o this is a similar problem to testing simple effects in factorial ANOVA
o testing this interaction term is often called testing the "parallelism assumption"
o The other usual assumptions of ANOVA still apply


Comparing
o Multivariate Analysis of Variance (MANOVA)
o The model is the same, except Y is now a matrix
o The dimensionality of Y is q

$$\mathbf{Y}_{n \times q} = \mathbf{X}_{n \times p}\,\mathbf{B}_{p \times q} + \mathbf{E}_{n \times q}$$

o Estimation is the same (ordinary least squares)

$$\mathbf{B} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

o But our hypothesis tests require a multivariate distribution

o We normally assume a multivariate Normal distribution
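A sketch of fitting a MANOVA in statsmodels; the two outcomes and three groups are simulated for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(5)
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], 15)})
df["y1"] = rng.normal(size=45) + (df["group"] == "c") * 1.0  # hypothetical shift
df["y2"] = rng.normal(size=45)

# Two dependent variables on the left-hand side
m = MANOVA.from_formula("y1 + y2 ~ C(group)", data=df)
print(m.mv_test())  # Wilks, Pillai, Hotelling-Lawley, Roy
```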


Comparing
o Multivariate Analysis of Variance (MANOVA)
o We seek a rotation that produces a maximum ratio of between- and within-groups variance

[Scatterplot of two groups on axes Y1 and Y2 not reproduced]


Comparing
o Multivariate Analysis of Variance (MANOVA)
o Testing hypotheses

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \qquad \text{contrast matrix}$$

$$\mathbf{H} = \mathbf{B}'\mathbf{A}'\left(\mathbf{A}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{A}'\right)^{-1}\mathbf{A}\mathbf{B} \qquad \text{hypothesis sum of squares}$$

$$\mathbf{G} = \mathbf{E}'\mathbf{E} \qquad \text{error sum of squares}$$

$$(\mathbf{H} - \lambda\mathbf{G})\mathbf{v} = \mathbf{0} \qquad \text{characteristic equation}$$
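A numeric sketch of solving the characteristic equation with scipy; H and G below are made-up 2x2 SSCP matrices, and the four statistics on the next slide fall out of the eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

# hypothetical hypothesis (H) and error (G) SSCP matrices
H = np.array([[8.0, 3.0], [3.0, 5.0]])
G = np.array([[20.0, 4.0], [4.0, 12.0]])

# (H - lambda*G)v = 0 is a symmetric generalized eigenproblem
lam, V = eigh(H, G)
lam = lam[::-1]  # largest eigenvalue first

print("Roy's largest root:    ", lam[0])
print("Wilks' Lambda:         ", np.prod(1.0 / (1.0 + lam)))
print("Pillai trace:          ", np.sum(lam / (1.0 + lam)))
print("Hotelling-Lawley trace:", np.sum(lam))
```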


Comparing
o Multivariate Analysis of Variance (MANOVA)
o Testing hypotheses
• Roy's Largest Root: based on the first (largest) eigenvalue
• Wilks' Lambda: based on the product of the reciprocals $1/(1 + \lambda_i)$
• Pillai Trace: based on the sum of the ratios $\lambda_i/(1 + \lambda_i)$
• Hotelling-Lawley Trace: based on the sum of the eigenvalues

o Wilks' Lambda can be transformed to an exact or approximate F

o If you don't know what an eigenvalue is, don't worry
o Most people who use statistics packages don't know either
o But they love to use the word at cocktail parties
o It's also called a characteristic value or latent root
o Germans prefer the term eigenvalue
o Malcolm Gladwell prefers the term Igon Value (Steven Pinker, NYT)


Comparing
o Repeated Measures ANOVA
o Use the MANOVA model (it's safer)
o Testing hypotheses

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \qquad \text{treatments contrasts}$$

$$\mathbf{C} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1 \end{bmatrix} \qquad \text{measures (trials) contrasts}$$

o One can also use polynomials (linear, quadratic, cubic, ...) in the C matrix
 
Comparing
o Repeated Measures ANOVA
o Use the MANOVA model (it's safer)
o Testing hypotheses

$$\mathbf{H} = \mathbf{C}'\mathbf{B}'\mathbf{A}'\left(\mathbf{A}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{A}'\right)^{-1}\mathbf{A}\mathbf{B}\mathbf{C} \qquad \text{hypothesis sum of squares}$$

$$\mathbf{G} = \mathbf{C}'\mathbf{E}'\mathbf{E}\mathbf{C} \qquad \text{error sum of squares}$$
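A small helper for building the successive-difference trials contrasts shown above; this is an illustrative construction, not a package API (transpose as needed for the orientation your formulas expect):

```python
import numpy as np

def difference_contrasts(q: int) -> np.ndarray:
    """(q-1) x q matrix of successive trial differences, as in C above."""
    C = np.zeros((q - 1, q))
    for i in range(q - 1):
        C[i, i], C[i, i + 1] = 1.0, -1.0
    return C

print(difference_contrasts(4))
# [[ 1. -1.  0.  0.]
#  [ 0.  1. -1.  0.]
#  [ 0.  0.  1. -1.]]
```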


Comparing
o Repeated Measures ANOVA
Assume we have 4 groups and 3 trials.
In the one-way repeated measures model, we are interested in three tests:
• Are the 4 profiles parallel? (no group x trial interaction)
• Are all 4 profiles coincident? (no group effect)
• Are the profiles level? (no trial effect)

These tests are done in sequence.

1) If the 4 profiles are parallel, then we can go on to compare means across profiles to see if they are coincident. Otherwise, there is an interaction between the trials factor and the grouping factor and we have to stop there.
2) If the 4 profiles are coincident, then we can go on to test whether they are level. If not, then there is a groups effect and we have to stop there.
3) If they are level, then there is no trials effect.

