Bioinformatics: Applications Note

The program MODELTEST uses log likelihood scores to determine the best-fitting model of DNA evolution for a given dataset. It compares nested models of DNA substitution in a hierarchical framework using likelihood ratio tests and the Akaike information criterion. MODELTEST is written in C and accepts input files of likelihood scores from PAUP* or other programs to select the model that best fits the data according to these statistical tests. It outputs the results of the model selection process.


BIOINFORMATICS APPLICATIONS NOTE

MODELTEST: testing the model of DNA substitution

D. Posada and K.A. Crandall

Abstract

Summary: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data.

Availability: The MODELTEST package, including the source code and some documentation, is available at http://bioag.byu.edu/zoology/crandall_lab/modeltest.html.

Contact: [email protected]

All phylogenetic methods make assumptions, whether explicit or implicit, about the process of DNA substitution (Felsenstein, 1988). For example, an assumption common to many phylogenetic methods is a bifurcating tree to describe the phylogeny of species (Huelsenbeck and Crandall, 1997). Consequently, all methods of phylogenetic inference depend on their underlying models. To have confidence in inferences it is necessary to have confidence in the models (Goldman, 1993). Because of this, all methods based on explicit models of evolution should identify the model that best fits the data and thereby justify its use. In traditional statistical theory, a widely accepted statistic for testing the goodness of fit of models is the likelihood ratio test statistic δ = −2 log Λ, with

\[ \Lambda = \frac{\max[L_0(\text{Null Model} \mid \text{Data})]}{\max[L_1(\text{Alternative Model} \mid \text{Data})]} \]

where L0 is the likelihood under the null hypothesis (simple model) and L1 is the likelihood under the alternative hypothesis (more complex, parameter-rich model). When the models compared are nested (the null hypothesis is a special case of the alternative hypothesis) and the null hypothesis is correct, the δ statistic is asymptotically distributed as χ² with q degrees of freedom, where q is the difference in the number of free parameters between the two models; equivalently, q is the number of restrictions on the parameters of the alternative hypothesis required to derive the particular case of the null hypothesis (Kendall and Stuart, 1979). To preserve the nesting of the models, the likelihood scores are estimated using the same tree, and then, once the models have been compared, a final tree is estimated using the chosen model of evolution. When the models are not nested, an alternative means of generating the null distribution of the δ statistic is through Monte Carlo simulation (parametric bootstrapping) (Goldman, 1993).

Another way of comparing different models without the nesting requirement is the Akaike information criterion (minimum theoretical information criterion, AIC) (Akaike, 1974). The AIC is a useful measure that rewards models for good fit, but imposes a penalty for unnecessary parameters (e.g. Hasegawa, 1990). If L is the maximum value of the likelihood function for a specific model using n independently adjusted parameters within the model, then AIC = −2 ln L + 2n. Smaller values of AIC indicate better models.

MODELTEST is a simple program written in ANSI C and compiled for the Power Macintosh using Metrowerks CodeWarrior. It is designed to compare different nested models of DNA substitution in a hierarchical hypothesis-testing framework (Figure 1). MODELTEST calculates the likelihood ratio test statistic δ = −2 log Λ and its associated P-value using a χ² distribution with q degrees of freedom in order to reject or fail to reject different null hypotheses about the process of DNA substitution. It also calculates the AIC estimate associated with each likelihood score.

The user communicates with the program using a standard console interface, where the input and output files as well as some options and help can be specified. By default, the program will accept two classes of input files: a file containing ordered raw log likelihood scores corresponding to the tested models (see Figure 1) or a PAUP* (Swofford, 1998) file containing a matrix of the same log likelihood scores resulting from the execution of a block of PAUP* commands. This block of PAUP* commands is available in the documentation. When specified, the program can also read a file with likelihood scores for identifying the minimum AIC estimate. The output of MODELTEST consists of the P-values corresponding to the tests performed. In these tests the null hypotheses are equal base frequencies, transition rate equals transversion rate, equal transition rates and equal transversion rates, rates equal among sites, and no invariable sites. Finally, the program interprets these P-values and chooses the model that best fits the data among those tested following the likelihood ratio test and/or AIC criteria, using a default individual alpha value of 0.01 (to maintain an overall alpha value of 0.05, the standard Bonferroni correction, alpha divided by the number of tests, gives an individual alpha value of 0.01), or another value specified by the user.
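The calculations described in the text can be illustrated with a short sketch. It is not MODELTEST's source code (which is ANSI C); it is a minimal Python illustration that assumes integer degrees of freedom, computes the test statistic as twice the difference in log likelihood between the alternative and the null model, and uses invented log likelihood scores. The function names (chi2_sf, lrt, aic) are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lrt(lnl_null, lnl_alt, q):
    """Return (delta, P-value) for a nested comparison with q degrees of freedom."""
    delta = 2.0 * (lnl_alt - lnl_null)
    return delta, chi2_sf(delta, q)


def aic(lnl, n_params):
    """AIC = -2 ln L + 2n, where n is the number of free parameters."""
    return -2.0 * lnl + 2.0 * n_params


# Hypothetical log likelihood scores, invented for illustration only.
lnl_jc, lnl_f81 = -2560.0, -2550.0

# Null hypothesis: equal base frequencies (JC versus F81, q = 3).
delta, p_value = lrt(lnl_jc, lnl_f81, q=3)

# Bonferroni correction: overall alpha of 0.05 divided by five tests.
alpha = 0.05 / 5

print(f"delta = {delta:.2f}, P = {p_value:.6f}, reject null: {p_value < alpha}")
print(f"AIC(JC) = {aic(lnl_jc, 0):.1f}, AIC(F81) = {aic(lnl_f81, 3):.1f}")
```

The divisor of five reflects the five null hypotheses listed in the text, which is how an overall alpha of 0.05 leads to the default individual alpha of 0.01. The chi-square tail is computed with the standard library only, which restricts the sketch to integer degrees of freedom.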

Oxford University Press 817

D.Posada and K.A.Crandall

Fig. 1. Hierarchical hypothesis testing in MODELTEST. At each level the null hypothesis (upper model) is either accepted (A) or rejected (R).
The models of DNA substitution are: JC (Jukes and Cantor, 1969), K80 (Kimura, 1980), SYM (Zharkikh, 1994), F81 (Felsenstein, 1981), HKY
(Hasegawa et al., 1985), and GTR (Rodríguez et al., 1990). Γ: shape parameter of the gamma distribution; I: proportion of invariable sites. df:
degrees of freedom. !: equal base frequencies (0.25), πA: frequency of adenine, πC: frequency of cytosine, πG: frequency of guanine, πT:
frequency of thymine. ρ: equal substitution rate, α: transition rate, β: transversion rate; µ1: A⇒C rate, µ2: A⇒G rate, µ3: A⇒T rate, µ4: C⇒G
rate, µ5: C⇒T rate, µ6: G⇒T rate.
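The degrees of freedom q for each comparison follow from the number of free parameters in the two models. The sketch below is an illustration only: the parameter counts are the conventional ones for these substitution models (branch lengths excluded) and are an assumption here, not values read from the figure; the "+G" and "+I" suffixes are a hypothetical naming convention for the gamma shape parameter and the proportion of invariable sites.

```python
# Free substitution parameters per model, branch lengths excluded.
# Conventional counts, stated as an assumption rather than taken from Figure 1.
BASE_PARAMS = {
    "JC": 0,   # equal base frequencies, one substitution rate
    "K80": 1,  # equal base frequencies, transition/transversion distinction
    "F81": 3,  # unequal base frequencies (four frequencies summing to 1)
    "HKY": 4,  # F81 plus transition/transversion distinction
    "SYM": 5,  # equal base frequencies, six rates (five free)
    "GTR": 8,  # unequal base frequencies, six rates (five free)
}


def free_params(model):
    """Count free parameters for names such as 'HKY', 'GTR+G' or 'K80+I+G'."""
    base, *extras = model.split("+")
    n = BASE_PARAMS[base]
    for extra in extras:
        if extra not in ("G", "I"):
            raise ValueError(f"unknown model suffix: {extra}")
        n += 1  # gamma shape or proportion of invariable sites
    return n


def degrees_of_freedom(null_model, alt_model):
    """q = difference in free parameters; nesting itself is not checked here."""
    q = free_params(alt_model) - free_params(null_model)
    if q <= 0:
        raise ValueError("alternative model must have more free parameters")
    return q


for null_model, alt_model in [("JC", "F81"), ("F81", "HKY"), ("HKY", "GTR"),
                              ("GTR", "GTR+G"), ("GTR+G", "GTR+I+G")]:
    print(null_model, "vs", alt_model, "-> q =",
          degrees_of_freedom(null_model, alt_model))
```

The function only subtracts parameter counts; whether two models are actually nested still has to be established separately before the χ² approximation is used.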

Acknowledgements

This project was supported by a fellowship from Caixagalicia Foundation (D.P.), the Alfred P. Sloan Foundation (K.A.C.), and the National Institutes of Health (K.A.C.). We wish to thank the anonymous reviewers for their excellent suggestions.

References

Akaike,H. (1974) A new look at the statistical model identification. IEEE Trans. Autom. Contr., 19, 716–723.
Felsenstein,J. (1988) Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet., 22, 521–565.
Goldman,N. (1993) Statistical tests of models of DNA substitution. J. Mol. Evol., 36, 182–198.
Hasegawa,M. (1990) Phylogeny and molecular evolution in primates. Jpn J. Genet., 65, 243–265.
Hasegawa,M., Kishino,H. and Yano,T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 21, 160–174.
Huelsenbeck,J.P. and Crandall,K.A. (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst., 28, 437–466.
Jukes,T.H. and Cantor,C.R. (1969) Evolution of protein molecules. In Munro (ed.), Mammalian Protein Metabolism. Academic Press, New York, pp. 21–132.
Kendall,M. and Stuart,A. (1979) The Advanced Theory of Statistics, Vol. 2, 4th edn. Charles Griffin, London, pp. 240–252.
Kimura,M. (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.
Rodríguez,F.J., Oliver,J.L., Marín,A. and Medina,J.R. (1990) The general stochastic model of nucleotide substitution. J. Theor. Biol., 142, 485–501.
Swofford,D.L. (1998) PAUP*: phylogenetic analysis using parsimony (and other methods). Version 4.0 (prerelease test version). Sinauer, Sunderland, Massachusetts (in press).
Zharkikh,A. (1994) Estimation of evolutionary distances between nucleotide sequences. J. Mol. Evol., 39, 315–329.
