Bioinformatics: Applications Note

The program MODELTEST uses log likelihood scores to determine the best-fitting model of DNA evolution for a given dataset. It compares nested models of DNA substitution in a hierarchical framework using likelihood ratio tests and the Akaike information criterion. MODELTEST is written in C and accepts input files of likelihood scores from PAUP* or other programs to select the model that best fits the data according to these statistical tests. It outputs the results of the model selection process.

BIOINFORMATICS APPLICATIONS NOTE

MODELTEST: testing the model of DNA substitution

Abstract

Summary: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data.
Availability: The MODELTEST package, including the source code and some documentation, is available at http://bioag.byu.edu/zoology/crandall_lab/modeltest.html.
Contact: [email protected]

All phylogenetic methods make assumptions, whether explicit or implicit, about the process of DNA substitution (Felsenstein, 1988). For example, an assumption common to many phylogenetic methods is a bifurcating tree to describe the phylogeny of species (Huelsenbeck and Crandall, 1997). Consequently, all methods of phylogenetic inference depend on their underlying models. To have confidence in inferences it is necessary to have confidence in the models (Goldman, 1993). Because of this, all methods based on explicit models of evolution should explore which model fits the data best, thereby justifying its use. In traditional statistical theory, a widely accepted statistic for testing the goodness of fit of models is the likelihood ratio test statistic δ = −2 log Λ, where

$$\Lambda = \frac{\max[L_0(\text{Null Model} \mid \text{Data})]}{\max[L_1(\text{Alternative Model} \mid \text{Data})]}$$

and L0 is the likelihood under the null hypothesis (simple model) and L1 is the likelihood under the alternative hypothesis (more complex, parameter-rich model). When the models compared are nested (the null hypothesis is a special case of the alternative hypothesis), and the null hypothesis is correct, the δ statistic is asymptotically distributed as χ² with q degrees of freedom, where q is the difference in the number of free parameters between the two models; equivalently, q is the number of restrictions on the parameters of the alternative hypothesis required to derive the particular case of the null hypothesis (Kendall and Stuart, 1979). To preserve the nesting of the models, the likelihood scores are estimated using the same tree, and then, once the models have been compared, a final tree is estimated using the chosen model of evolution. When the models are not nested, an alternative means of generating the null distribution of the δ statistic is Monte Carlo simulation (parametric bootstrapping) (Goldman, 1993).

Another way of comparing different models without the nesting requirement is the Akaike information criterion (minimum theoretical information criterion, AIC) (Akaike, 1974). The AIC is a useful measure that rewards models for good fit, but imposes a penalty for unnecessary parameters (e.g. Hasegawa, 1990). If L is the maximum value of the likelihood function for a specific model using n independently adjusted parameters within the model, then AIC = −2 ln L + 2n. Smaller values of AIC indicate better models.

MODELTEST is a simple program written in ANSI C and compiled for the Power Macintosh using Metrowerks CodeWarrior. It is designed to compare different nested models of DNA substitution in a hierarchical hypothesis-testing framework (Figure 1). MODELTEST calculates the likelihood ratio test statistic δ = −2 log Λ and its associated P-value using a χ² distribution with q degrees of freedom in order to reject or fail to reject different null hypotheses about the process of DNA substitution. It also calculates the AIC estimate associated with each likelihood score.

The user communicates with the program using a standard console interface, where the input and output files as well as some options and help can be specified. By default, the program will accept two classes of input files: a file containing ordered raw log likelihood scores corresponding to the tested models (see Figure 1) or a PAUP* (Swofford, 1998) file containing a matrix of the same log likelihood scores resulting from the execution of a block of PAUP* commands. This block of PAUP* commands is available in the documentation. When specified, the program can also read a file with likelihood scores for identifying the minimum AIC estimate. The output of MODELTEST consists of the P-values corresponding to the tests performed. In these tests the null hypotheses are: equal base frequencies, transition rate equals transversion rate, equal transition rates and equal transversion rates, rates equal among sites, and no invariable sites. Finally, the program interprets these P-values and chooses the model that fits the data best among those tested following the likelihood ratio test and/or AIC criteria, using a default individual alpha value of 0.01 (to maintain an overall alpha value of 0.05 across the five tests, the standard Bonferroni correction, alpha/number of tests, gives an individual alpha value of 0.01), or another value specified by the user.
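The two criteria described above can be illustrated with a short calculation. The following is a minimal sketch in Python rather than the ANSI C of MODELTEST itself, and it is not taken from the program's source: the function names (`chi2_sf`, `likelihood_ratio_test`, `aic`) and the log likelihood values are hypothetical, and the χ² tail probability is computed with a closed form for integer degrees of freedom, which is an assumption about one convenient way to do it. The statistic is computed as δ = 2(ln L1 − ln L0), which is non-negative for nested models.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (delta, P-value) for nested models given their maximized log likelihoods."""
    delta = 2.0 * (lnl_alt - lnl_null)
    return delta, chi2_sf(delta, df)

def aic(lnl, n_params):
    """AIC = -2 ln L + 2n; smaller values indicate better models."""
    return -2.0 * lnl + 2.0 * n_params

# Hypothetical scores: JC (null) against F81 (alternative), 3 extra free parameters.
lnl_jc, lnl_f81 = -2345.67, -2330.12
delta, p_value = likelihood_ratio_test(lnl_jc, lnl_f81, df=3)
alpha = 0.05 / 5  # Bonferroni correction over five tests
print(f"delta = {delta:.2f}, P = {p_value:.2e}, reject null: {p_value < alpha}")
print(f"AIC(JC) = {aic(lnl_jc, 0):.2f}, AIC(F81) = {aic(lnl_f81, 3):.2f}")
```

The parameter counts passed to `aic` here (0 for JC, 3 for F81) count only substitution-model parameters; any parameters shared by both models, such as branch lengths, add the same constant to each AIC and do not change the ranking.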

© Oxford University Press


D.Posada and K.A.Crandall

Fig. 1. Hierarchical hypothesis testing in MODELTEST. At each level the null hypothesis (upper model) is either accepted (A) or rejected (R). The models of DNA substitution are: JC (Jukes and Cantor, 1969), K80 (Kimura, 1980), SYM (Zharkikh, 1994), F81 (Felsenstein, 1981), HKY (Hasegawa et al., 1985), and GTR (Rodríguez et al., 1990). Γ: shape parameter of the gamma distribution; I: proportion of invariable sites. df: degrees of freedom. !: equal base frequencies (0.25), πA: frequency of adenine, πC: frequency of cytosine, πG: frequency of guanine, πT: frequency of thymine. ρ: equal substitution rate, α: transition rate, β: transversion rate; µ1: A⇒C rate, µ2: A⇒G rate, µ3: A⇒T rate, µ4: C⇒G rate, µ5: C⇒T rate, µ6: G⇒T rate.
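The hierarchy in Fig. 1 can be read as a sequence of five likelihood ratio tests, one per null hypothesis. The sketch below is one possible reading of that hierarchy, written in Python for brevity; it is not MODELTEST's code. The branching order (in particular, skipping the third test when the second null hypothesis is accepted), the function names, and the model labels such as `HKY+G` and `HKY+I+G` are assumptions made for illustration. The degrees of freedom follow from the number of free parameters each model adds, and the critical values are the standard χ² values at the default individual alpha of 0.01.

```python
# Chi-square critical values at alpha = 0.01, keyed by degrees of freedom.
CHI2_CRIT_01 = {1: 6.635, 3: 11.345, 4: 13.277}

def rejects(lnl, null, alt, df):
    """True when 2(ln L1 - ln L0) exceeds the alpha = 0.01 critical value."""
    return 2.0 * (lnl[alt] - lnl[null]) > CHI2_CRIT_01[df]

def select_model(lnl):
    """Walk the hierarchy; lnl maps a (hypothetical) model label to its log likelihood."""
    # Test 1, equal base frequencies: JC against F81 (3 df).
    if rejects(lnl, "JC", "F81", 3):
        family = ["F81", "HKY", "GTR"]
    else:
        family = ["JC", "K80", "SYM"]
    model = family[0]
    # Test 2, transition rate equals transversion rate (1 df).
    if rejects(lnl, family[0], family[1], 1):
        model = family[1]
        # Test 3, equal transition rates and equal transversion rates (4 df).
        if rejects(lnl, family[1], family[2], 4):
            model = family[2]
    # Test 4, rates equal among sites: add the gamma shape parameter (1 df).
    gamma = "+G" if rejects(lnl, model, model + "+G", 1) else ""
    # Test 5, no invariable sites: add the proportion of invariable sites (1 df).
    if rejects(lnl, model + gamma, model + "+I" + gamma, 1):
        return model + "+I" + gamma
    return model + gamma

# Hypothetical log likelihood scores, all assumed to come from the same tree.
scores = {"JC": -1000.0, "F81": -990.0, "HKY": -980.0, "GTR": -978.0,
          "HKY+G": -970.0, "HKY+I+G": -969.5}
print(select_model(scores))
```

Only the scores along the path actually taken are looked up, so the dictionary needs entries for every model the walk can reach from the data at hand. Using a fixed table of critical values keeps the sketch short; supporting a user-specified alpha would require computing P-values instead.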

Acknowledgements

This project was supported by a fellowship from Caixagalicia Foundation (D.P.), the Alfred P. Sloan Foundation (K.A.C.), and the National Institutes of Health (K.A.C.). We wish to thank the anonymous reviewers for their excellent suggestions.

References

Akaike,H. (1974) A new look at the statistical model identification. IEEE Trans. Autom. Contr., 19, 716–723.
Felsenstein,J. (1988) Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet., 22, 521–565.
Goldman,N. (1993) Statistical tests of models of DNA substitution. J. Mol. Evol., 36, 182–198.
Hasegawa,M. (1990) Phylogeny and molecular evolution in primates. Jpn J. Genet., 65, 243–265.
Hasegawa,M., Kishino,H. and Yano,T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 21, 160–174.
Huelsenbeck,J.P. and Crandall,K.A. (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst., 28, 437–466.
Jukes,T.H. and Cantor,C.R. (1969) Evolution of protein molecules. In Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York, pp. 21–132.
Kendall,M. and Stuart,A. (1979) The Advanced Theory of Statistics, Vol. 2, 4th edn. Charles Griffin, London, pp. 240–252.
Kimura,M. (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.
Rodríguez,F.J., Oliver,J.L., Marín,A. and Medina,J.R. (1990) The general stochastic model of nucleotide substitution. J. Theor. Biol., 142, 485–501.
Swofford,D.L. (1998) PAUP*: phylogenetic analysis using parsimony (and other methods). Version 4.0 (prerelease test version). Sinauer, Sunderland, Massachusetts (in press).
Zharkikh,A. (1994) Estimation of evolutionary distances between nucleotide sequences. J. Mol. Evol., 9, 315–329.
