Molecular Systematics - David Hillis, Craig Moritz, Barbara Mable

Molecular Systematics
Second Edition
Edited by
David M. Hillis
THE UNIVERSITY OF TEXAS
Craig Moritz
UNIVERSITY OF QUEENSLAND
and
Barbara K. Mable
THE UNIVERSITY OF TEXAS
Sinauer Associates, Inc. * Publishers

Sunderland, Massachusetts U.S.A.
The Cover
A circular tree of life inferred from ribosomal RNA genes, superimposed on the
genome of a triploid parthenogenetic species of gecko (Heteronotia binoei complex).
The chromosomes were stained by fluorescent in situ hybridization for the
ribosomal DNA arrays in a study of concerted evolution of repeated genes (see
Hillis et al., 1991, Science 251:308-310). The photographs represent a sampling of
multicellular species from the tips of the tree of life. Photographs by David Hillis.
Graphic design by Janet Young.
Molecular Systematics, Second Edition
Copyright © 1996 by Sinauer Associates, Inc.

All rights reserved. This book may not be reproduced for
any purpose without permission from the publisher. For
information, address Sinauer Associates, 23 Plumtree
Road, Sunderland, MA 01375-0407 U.S.A.
FAX: 413-549-1118
Internet: publish@sinauer.com
Library of Congress Cataloging-in-Publication Data

Molecular systematics / edited by David M. Hillis, Craig Moritz,
Barbara K. Mable. -2nd ed.
p. cm.
Includes bibliographical references (p. ) and indexes.
ISBN 0-87893-282-8 (paper lay-flat bdg.)
1. Biology-Classification-Molecular aspects. I. Hillis, David
M., 1958- . II. Moritz, Craig. III. Mable, Barbara K., 1963- .
QH83.M665 1996
574.8'8--dc20 95-41159
CIP
Printed in Canada
5 4 3 2 1
Contents in Brief
1. Molecular Systematics: Context and Controversies
Part 1. Sampling
2. Project Design
3. Collection and Storage of Tissues
Part 2. Molecular Techniques
4. Proteins: Isozyme Electrophoresis
5. Chromosomes: Molecular Cytogenetics
6. Nucleic Acids I: DNA-DNA Hybridization
7. Nucleic Acids II: The Polymerase Chain Reaction
8. Nucleic Acids III: Analysis of Fragments and Restriction Sites
9. Nucleic Acids IV: Sequencing and Cloning
Part 3. Analysis
10. Intraspecific Differentiation
11. Phylogenetic Inference
12. Applications of Molecular Systematics: The State of the Field and a Look to the Future
Contents
Preface
Preface to the First Edition
Contributors
Chapter 1
Molecular Systematics: Context and Controversies
Craig Moritz and David M. Hillis
THE EVOLUTION OF MOLECULAR SYSTEMATICS
The Link Between Molecular Evolution and Systematics
The Link Between Molecular Population Genetics and Phylogenetics
CONTROVERSIES IN MOLECULAR SYSTEMATICS
Molecules versus Morphology
Types of Characters and Methods of Analysis
Homology and Similarity
Gene Trees and Organismal Phylogeny
Constancy of Evolutionary Rates
Neutrality of Molecular Variants
Data Quality and Presentation
SCOPE AND USE OF THIS BOOK
FOR FURTHER STUDY
Part 1 Sampling
Chapter 2
Project Design
Peter R. Baverstock and Craig Moritz
INTRODUCTION
STATISTICAL CONSIDERATIONS
MOLECULAR SYSTEMATICS
Studies of Population Structure
Studies of Species Boundaries and Hybridization
Phylogenetic Relationships
CONCLUDING REMARKS
Chapter 3
Collection and Storage of Tissues
Herbert C. Dessauer, Charles J. Cole, and Mark S. Hafner
INTRODUCTION
REGULATIONS GOVERNING ACQUISITION OF SPECIMENS
REMOVING AND PRESERVING TISSUES IN THE FIELD
General Procedures
Procedures Unique to Animal Tissue Collection
Procedures Unique to Plant Tissue Collection
Collecting Cell Lines
TRANSPORT OF TISSUES FROM FIELD TO LABORATORY OR BETWEEN LABORATORIES
Shipping Regulations
Sources of Liquid Nitrogen and Dry Ice
STORAGE OF TISSUES ON RETURN FROM THE FIELD
STABILITY OF MACROMOLECULES DURING LONG-TERM STORAGE
DEVELOPMENT AND SUPPORT OF SYNOPTIC TISSUE COLLECTIONS
Disposition of Tissues for Long-Term Preservation
Curatorial Problems Unique to Tissue Collections
EXISTING COLLECTIONS
Part 2 Molecular Techniques

Chapter 4
Proteins: Isozyme Electrophoresis
Robert W. Murphy, Jack W. Sites, Jr., Donald G. Buth, and Christopher H. Haufler
INTRODUCTION
PRINCIPLES AND COMPARISON OF METHODS
General Principles
Assumptions
Comparison of the Primary Methods
APPLICATIONS AND LIMITATIONS
Intraspecific Applications
Interspecific Applications
Gene Expression and Gene Duplication
Limitations
LABORATORY SETUP
PROJECT PLANNING
Protocols
Protocol 1: Tissue Homogenization
Protocol 2: Preparation of Starch Gels
Protocol 3: Gel Loading
Protocol 4: Electrophoresis
Protocol 5: Gel Slicing
Protocol 6: Histochemical Staining
Protocol 7: Drying of Agar Overlays
Protocol 8: Documentation of Results
INTERPRETATION AND TROUBLESHOOTING
ENZYME AND LOCUS NOMENCLATURE
Appendix 1: Enzyme Staining Formulas
Appendix 2: Buffers and Tracking Dye for Isozyme Electrophoresis
Chapter 5
Chromosomes: Molecular Cytogenetics
Stanley K. Sessions
PRINCIPLES AND COMPARISON OF METHODS
General Principles
Assumptions
Comparison of the Primary Methods
APPLICATIONS AND LIMITATIONS
Applications
Limitations
LABORATORY SETUP
Protocols
Protocol 1: Subbed Slides
Protocol 2: Mitotic Chromosomes from Gut Epithelium
Protocol 3: Mitotic Chromosomes from Plant Root Tips
Protocol 4: Squash Technique for Mitotic and Meiotic Chromosomes
Protocol 5: Yeast Method for Mitotic Chromosomes from Small Vertebrates
Protocol 6: Splash Technique for Slide Preparations of Mitotic Chromosomes
Protocol 7: Mitotic Chromosomes from Peripheral Blood in Vertebrates
Protocol 8: Mitotic Chromosomes from Fibroblast Cultures (Reptiles)
Protocol 9: Mitotic Chromosomes from Corneal Epithelium of Vertebrates
Protocol 10: Mitotic Chromosomes from Insect Embryos
Protocol 11: Polytene Chromosomes from Dipteran Salivary Glands
Protocol 12: Lampbrush Chromosomes
Protocol 13: C-Banding
Protocol 14: Q-Banding
Protocol 15: G-Banding
Protocol 16: Fluorochrome R-Banding with Chromomycin A3
Protocol 17: Ag-NOR Banding
Protocol 18: Differential Replication Banding with BrdU
Protocol 19: Modification of BrdU Banding for Salamander Embryos
Protocol 20: Labeling Probes for ISH via Nick Translation
Protocol 21: Radioisotopic ISH for Reiterated Sequences Using a DNA Probe
Protocol 22: Radioisotopic ISH Using an RNA Probe
Protocol 23: Radioisotopic Localization of Single-Copy Sequences
Protocol 24: Autoradiography for Detection of Radioisotopic ISH
Protocol 25: Chromosome Painting Using FISH
Protocol 26: FISH with Single-Copy Genomic Probe
INTERPRETATION AND TROUBLESHOOTING
Chromosome Bands
In Situ Hybridization
Appendix: Stock Solutions
Chapter 6
Nucleic Acids I: DNA-DNA Hybridization
Steven D. Werman, Mark S. Springer, and Roy J. Britten
INTRODUCTION
PRINCIPLES AND COMPARISON OF METHODS
General Principles
Summary of the DNA Hybridization Techniques and Data Analysis
Properties of Hybridization Data
Factors Affecting DNA Hybridization
The Criterion and Precision of Reassociation
Comparison of the Primary Methods
APPLICATIONS AND LIMITATIONS
LABORATORY SETUP
Protocols
Protocol 1: DNA Isolation and Purification
Protocol 2: Preparing Sheared Drivers from Long Native DNA
Protocol 3: Tracer Preparation with 32P or 3H
Protocol 4: Tracer Self-Reaction and Repeat Removal
Protocol 5: Fractionation of Single-Copy Tracer over Hydroxyapatite
Protocol 6: Estimation of Tracer Fragment Length
Protocol 7: Preparing Tracers by Iodination
Protocol 8: DNA Hybridization with Hydroxyapatite and Phosphate Buffer
Protocol 9: Hydroxyapatite Column Preparation
Protocol 10: Phenol Emulsion Reassociation Technique (PERT)
Protocol 11: Analysis of Hybrid Thermal Stability Using the S1 Nuclease-TEACL Assay
INTERPRETATION AND TROUBLESHOOTING
Calculation of Melting Curves from Raw Counts: An Example
Problematic Melting Curves
Characteristics of Distance Estimates Derived from Raw Melting Curve Data
Hybridization Data in Phylogenetic Reconstruction
Appendix: Stock Solutions
Chapter 7
Nucleic Acids II: The Polymerase Chain Reaction
Stephen R. Palumbi
INTRODUCTION
PRINCIPLES AND COMPARISON OF METHODS
General Principles
The Cycle
Choosing Reaction Conditions
PCR Components
The Thermal Cycler
Primers and Primer Design
ASSUMPTIONS
APPLICATIONS AND LIMITATIONS
Types of Amplifications and Types of Data
Protocol 1: DNA Isolation for PCR
Protocol 2: The Polymerase Chain Reaction
Protocol 3: PCR from RNA
TROUBLESHOOTING
Avoiding PCR Problems: PCR Hygiene
Some Common Problems with PCR
Problems with Single-Strand Amplifications
USEFUL PRIMERS
Nuclear Ribosomal Gene Primers
Animal Mitochondrial Gene Primers
Chloroplast DNA Primers
Intron Primers
More Information about PCR
Appendix: Stock Solutions
Chapter 8
Nucleic Acids III: Analysis of Fragments and Restriction Sites
Thomas E. Dowling, Craig Moritz, Jeffrey D. Palmer, and Loren H. Rieseberg
PRINCIPLES AND COMPARISON OF METHODS
General Principles
Assumptions
Comparison of the Primary Methods
APPLICATIONS AND LIMITATIONS
Choice of Sequence
Population-Level Comparisons
Species-Level Comparisons
Higher-Level Systematics
LABORATORY SETUP
Protocols
Protocol 1: Isolation of Animal mtDNA Using CsCl-PI Gradients
Protocol 2: Isolation of cpDNA Using Sucrose Step and CsCl-EB Gradients
Protocol 3: Digestion of DNA with Restriction Endonucleases
Protocol 4: Agarose and Polyacrylamide Electrophoresis
Protocol 5: Staining with Ethidium Bromide
Protocol 6: 32P 3' End-labeling of Restriction Fragments
Protocol 7: Primer Labeling for Microsatellite Analysis
Protocol 8: Transfer Hybridization
Protocol 12: Mapping Restriction Sites
INTERPRETATION AND TROUBLESHOOTING
RFLP Analysis
Troubleshooting
Microsatellites
Appendix: Stock Solutions
Chapter 9
Nucleic Acids IV: Sequencing and Cloning
David M. Hillis, Barbara K. Mable, Allan Larson, Scott K. Davis, and Elizabeth A. Zimmer
PRINCIPLES AND COMPARISON OF METHODS
Isolating Target Sequences
Nucleic Acid Sequencing
Assumptions
Comparison of the Primary Techniques
APPLICATIONS AND LIMITATIONS
Evolution of Genes
Intraspecific Diversity
Interspecific Diversity
SUMMARY
LABORATORY SETUP
Protocols
Protocol 1: DNA Isolation from Animals, Protists, and Prokaryotes
Protocol 2: DNA Isolation from Plants, Fungi
Protocol 3: Isolation of DNA from Minute Quantities of Tissue
Protocol 4: Isolation of RNA from Animals
Protocol 5: Isolation of RNA from Plants
Protocol 6: Preparation of Partial Gene Libraries in λ Bacteriophage Vectors
Protocol 7: Growing Bacteriophage
Protocol 8: Screening Bacteriophage Libraries
Protocol 9: Miniprep Isolation of λ DNA
Protocol 10: Subcloning into Plasmids or M13
Protocol 11: Preparation of Frozen Competent Cells for Transformation
Protocol 12: Transformation of E. coli with Plasmid DNA
Protocol 13: Transformation of M13 Bacteriophage DNA
Protocol 14: Isolation of Plasmid DNA
Protocol 15: Miniprep Isolation of M13 DNA
Protocol 16: Preparing Permanent Frozen Stocks of Plasmid Clones
Protocol 17: Isolation of PCR Products for Sequencing
Protocol 18: Cloning Methods for PCR Products
Protocol 19: Purification of PCR Products for Sequencing
Protocol 20: Screening Methods for Detecting Variation in DNA Sequences
Protocol 21: Preparing a Sequencing Gel
Protocol 22: DNA Sequencing Reactions
Protocol 23: RNA Sequencing Reactions
Protocol 24: Thermal Cycle Sequencing
Protocol 25: Running a Sequencing Gel
Protocol 26: Microsatellites
INTERPRETATION AND TROUBLESHOOTING
Autoradiograph Interpretation
Sequence Comparison and Alignment
Appendix: Stock Solutions
Part 3: Analysis
Chapter 10
Intraspecific Differentiation
Bruce S. Weir
BIOLOGICAL CONTEXT
Genetic and Statistical Sampling
Fixed and Random Models
STATISTICAL METHODS
Fixed Populations
Random Populations
APPLICATIONS
Conditional Genotypic Frequencies
IMPLEMENTATION
Sampling
Analysis
AN EXAMPLE
CONCLUSION
Chapter 11
Phylogenetic Inference
David L. Swofford, Gary J. Olsen, Peter J. Waddell, and David M. Hillis
INTRODUCTION
Algorithms versus Optimality Criteria
Use of Models and Assumptions in Phylogenetics
Definitions of Terms
TYPES OF DATA
Sequence Data
Restriction Endonuclease Data
Isozyme Data
Gene Order Data
OPTIMALITY CRITERIA I: PARSIMONY METHODS
Fitch and Wagner Parsimony
Other Parsimony Variants
Generalized Parsimony
Parsimony on Protein Sequences
Parsimony on Allozyme Data
OPTIMALITY CRITERIA II: METHODS BASED ON MODELS OF EVOLUTIONARY CHANGE
The Utility of Models
Maximum Likelihood Methods
Pairwise Distance Methods
Model-Based Corrections for Character Data: Hadamard Conjugation
Lake's Method of Invariants
Rooting Revisited
SEARCHING FOR OPTIMAL TREES
Exact Algorithms
Heuristic Approaches
Algorithmic and Other Methods
RELIABILITY OF INFERRED TREES
Systematic versus Random Error
Systematic Error
Random Error
Appendix: Programs and Software
Chapter 12
Applications of Molecular Systematics
David M. Hillis, Barbara K. Mable, and Craig Moritz
INTRODUCTION
DATA ANALYSIS: ISSUES AND CONTROVERSIES
Trees versus Networks
Combined versus Separate Analyses of Multiple Data Sets
Hypothesis Testing and the Parametric Bootstrap
Phylogenetic Accuracy
Predictions of Time from Molecular Data
APPLICATION OF PHYLOGENIES FOR ANALYZING MACROEVOLUTIONARY PATTERNS: COMPARATIVE METHODS
THE FUTURE OF MOLECULAR SYSTEMATICS
Acknowledgments
Measurement Symbols
Glossary and Abbreviations
Literature Cited
Index
Preface
It is gratifying to us to see how much the field of molecular systematics has
grown and matured since the first edition of Molecular Systematics appeared six
years ago. We have received a considerable amount of helpful advice and sug-
gestions about material that should be included in this book, and we have tried to
incorporate as many of these suggestions as possible in this new edition. Every
chapter has been completely updated, and most chapters have undergone major
revision and expansion. Because of this expansion and our desire to include a
new chapter on the polymerase chain reaction, we decided to drop the chapter
on immunological techniques from the new edition to keep the size of the book
within reason. Immunological techniques are no longer widely used in systematic
studies, so there was little need to update the summary from the first edition.
In this edition, we have tried to incorporate more information on the
processes of molecular evolution. This is most visible in the chapter on phylogenetic
inference (Chapter 11), which now deals extensively with models of nucleotide
substitution. Throughout the remaining chapters, there is considerably
more information about applying the techniques to studying problems in molec-
ular evolution, although the emphasis is still on intraspecific and interspecific
systematic analyses. One of the trends in the molecular evolution literature that
has appeared over the past decade is a blurring of the distinction between studies
of molecular processes and studies of historical relationships among taxa. We see
this as a positive trend: as we learn more about how genes evolve, we have more
information to apply to the study of the history of populations and taxa;
conversely, population genetic and phylogenetic analyses are providing critical
contributions to the body of information on gene evolution. This reciprocal
illumination has resulted in rapid advances in the merging fields of molecular
systematics and evolution.
As with the first edition, we have relied heavily on outside reviewers for ad-
vice and assistance. Most of the individuals who helped in the preparation of the
first edition (listed in the Preface to the First Edition, which follows) have con-
tributed to this edition as well, and we thank them for their extensive help and
continuing enthusiasm for this project. In addition, Chris Austin, Marty Badgett,
Mike Charleston, Keith Crandall, Sandie Degnan, Joseph Felsenstein, Christina
James, John Huelsenbeck, Shane Lavery, Paul Lewis, Peter Lockhart, Phillip
Tucker, David Maddison, Wayne Maddison, Jim McGuire, and David Penny have
provided reviews or other assistance. Janet Young helped design the cover
illustration. The staff of Sinauer Associates has been extremely helpful in the
production of the book; we are especially grateful to Andy Sinauer, Chris Small, and
Carol Wigg for their dedication and work on this volume.
Preface to the First Edition
The need for a book on molecular systematics has been evident for many years.
However, no one person can possibly become a practitioner and at the same time
remain current in all of the molecular techniques used in systematic biology; the
technology changes too quickly. Thus, we decided in 1987 to organize a multi-
authored book on the subject. Because we were concerned about the possibility of
uneven treatment by the various authors, we structured the chapters carefully
and enforced the structure rigidly. We organized the book into three main sec-
tions that correspond to the three parts of every molecular systematic study: Sam-
pling design and execution, collection of molecular data, and data analysis. Our
hope is that this book can guide beginners all the way through a molecular sys-
tematic study, and at the same time provide established investigators with new
ideas, techniques, and approaches.
We use the term systematics in its broad sense to include the comparative
study of biotic diversity at any level. The goals of molecular systematics are also
the goals of systematics in general; this book deals specifically with molecular
approaches because of the unique problems of collecting and analyzing molecu-
lar data. We hope the book will also be useful to non-molecular systematists by
describing the principles, applications, and limitations of molecular techniques.
A book of this type must rely heavily on cooperation from expert reviewers,
and we have been fortunate to have extraordinary cooperation from the research
community. John Avise, John Gillespie, Morris Goodman, Mark Kirkpatrick, Irv
Kornfield, Mike Miyamoto, Colin Patterson, Vincent Sarich, and Allan Wilson
sent us detailed comments on several chapters each, and we thank them for their
considerable commitment of time. We also received very useful reviews of chap-
ters from Loren Ammerman, James Archie, Robert Baker, Peter Baverstock, John
Benzie, James Bull, Paul Chippindale, Joel Cracraft, Brian Crother, Ross Crozier,
Llewellyn Densmore, Michael Dixon, Rafael de Sá, Herbert Dessauer, John Gold,
Sheldon Guttman, James Hamrick, Richard Highton, John Kirsch, Mike Johnson,
Linda Maxson, Steve Palumbi, James Patton, Craig Pease, Eric Pianka, Michael
Ryan, Barbara Schaal, Charles Sibley, Montgomery Slatkin, Jerry Slightom, Carol
Stepien, David Swofford, D. Tagle, Bruce Weir, and Gregory Whitt. We appreciate
the time and effort that these reviewers have invested in this book.
Argye Hillis and Hamish McCallum provided invaluable statistical advice,
and Michael Dixon and Loren Ammerman assisted with figure preparation.
Thomas White provided advice and prepublication information on the poly-
merase chain reaction. We thank Linda Davis, Brad Garton, Diana Hews, Beth
Reid, and Vicki Young-Lehmeier for assisting with the correction, handling, and
translating of computer files of the chapters. Andy Sinauer has contributed to
every stage of the book, from planning and organizing to production; we thank
him for his personal interest and concern for this book. The National Science
Foundation and the Australian Research Council have provided generous sup-
port for our research in molecular systematics; this support provided us with the
experience in a diversity of molecular techniques that we needed to edit this
volume. Some of the travel involved in editing was generously supported by the
University of Queensland.
Finally, our wives Ann Hillis and Fiona Hamer have assisted us and supported
us throughout this project. We may never be able to repay them for all
their help, encouragement, and extraordinary patience.
David M. Hillis and Craig Moritz

16 October 1989
Austin, Texas, USA
Brisbane, Australia
Contributors
Peter R. Baverstock University of New England, Northern Rivers, P.O. Box 157,
Lismore, New South Wales 2480 AUSTRALIA
Roy J. Britten Division of Biology, California Institute of Technology, Pasadena,
California 91125 USA
Donald G. Buth Department of Biology, University of California, Los Angeles,
Charles J. Cole Department of Herpetology and Ichthyology, American
Museum of Natural History, New York, New York 10024 USA
Scott K. Davis Faculty of Genetics, Texas A&M University, College Station,
Texas 77843 USA
Herbert C. Dessauer Department of Biochemistry and Molecular Biology,
Louisiana State University Medical Center, New Orleans, Louisiana 70112
USA
Thomas E. Dowling Department of Zoology, Arizona State University, Tempe,
Arizona, 85287 USA
Mark S. Hafner Museum of Natural Science, Louisiana State University, Baton
Rouge, Louisiana 70803 USA
Christopher H. Haufler Department of Botany, University of Kansas,
Lawrence, Kansas 66045 USA
David M. Hillis Department of Zoology, University of Texas, Austin, Texas
78712 USA
Allan Larson Department of Biology, Washington University, St. Louis, Mis-
souri 63130 USA
Barbara K. Mable Department of Zoology, University of Texas, Austin, Texas
78712 USA
Craig Moritz Department of Zoology, University of Queensland, St. Lucia,
Queensland 4067 AUSTRALIA
Robert W. Murphy Department of Ichthyology and Herpetology, Royal Ontario
Museum, 100 Queen's Park, Toronto, Ontario M5S 2C6 CANADA
Gary J. Olsen Department of Microbiology, University of Illinois, Urbana,
Illinois 61801 USA
Jeffrey D. Palmer Department of Biology, Indiana University, Bloomington,
Indiana 47405 USA
Stephen R. Palumbi Kewalo Marine Lab, 41 Ahui Street, Honolulu, Hawaii
96813 USA
Loren H. Rieseberg Department of Biology, Indiana University, Bloomington,
Indiana 47405 USA
Stanley K. Sessions Department of Biology, Hartwick College, Oneonta, New
York 13820 USA
Jack W. Sites, Jr. Department of Zoology, Brigham Young University, Provo,
Utah 84602 USA
Mark S. Springer Department of Biology, University of California, Riverside,
David L. Swofford Laboratory of Molecular Systematics, National Museum of
Natural History, Smithsonian Institution, Washington, D.C. 20560 USA
Peter J. Waddell Department of Mathematics, Massey University, Private Bag,
Palmerston North, NEW ZEALAND
Bruce S. Weir Department of Statistics, North Carolina State University,
Raleigh, North Carolina 27695 USA
Steven D. Werman Department of Biology, Mesa State College, Grand Junction,
Colorado 81502 USA
Elizabeth A. Zimmer Laboratory of Molecular Systematics, National Museum
of Natural History, Smithsonian Institution, Washington, D.C. 20560 USA
Molecular Systematics: Context and Controversies
For centuries, naturalists have tried to detect, describe, and explain diversity in
the biological world; this endeavor is known as systematics. The formalization of
a hierarchical system of nomenclature by Linnaeus (1758) established a frame-
work for describing and categorizing biological diversity. This hierarchical sys-
tem was initially independent of evolutionary theory, and in fact early evolu-
tionists (such as Buffon, 1753) opposed the Linnean system and the Aristotelian
essentialism it embodied. However, the Linnean system prevailed, and later evo-
lutionists (e.g., Lamarck, 1809; Darwin, 1859; Haeckel, 1866, reviewed by Mayr,
1983) simply co-opted the system to produce a classification based on phylogenetic
relationships (Figure 1). Initial efforts to reconstruct phylogenetic history
were based on few (if any) objective criteria, and estimates of phylogeny were little
more than plausible assertions by experts on particular taxonomic groups.
During much of the first half of the twentieth century, systematists were con-
cerned more with problems of species, speciation, and geographic variation than
with problems of phylogeny. In fact, the word phylogeny does not even appear in
the index to Julian Huxley's Evolution: The Modern Synthesis, published in 1942.
The situation began to change during the 1930s, 1940s, and 1950s, through
the efforts of individuals like the botanist Walter Zimmermann (1930; 1931; 1934;
1943) and the zoologist Willi Hennig (1950; 1966). They began to define objective
methods for reconstructing evolutionary history based on the shared attributes
2 Chapter
Figure 1 T11e phylogeny and classification of life as proposed by Haeckel(1866).
of extant and fossil organisms. in the 1960s, these mented in computer programs, which allowed the
methods (and others) were refined and developed al~alysisof large and colnpkx data sets. The past
illto explicit criteria for estimating phylogeny. Al- 30 years have continued to see major conceptual
gorithms based on these criteria were soon imple- and operational advances in the estimation of
Systematics: Contexf al~dControversies 3
phylogeny, as well as in the analysis of microevo- ally, the sophlsticatlon of phylogenetic arlalysls
lutionary change, and now studies of phylogeny has grown rapidly, wsth ~ncreasingelnphasis be-
are no longer limited to applicatlons in biological ing placed on assessments of phylogenetlc accu-
classification. Indeed, studies of phylogeny have racy (reviewed by Hillis, 1995). New methods of
permeated almost every subdiscipline in biology, analysis relate not only to the generation of phy-
and comparative biologists of all sorts appreciate logenetic hypotheses, but also to testing hypothe-
the importance of phylogenetic methods for inter- ses about biogeography, ecology, behavlor, physi-
preting all kinds of biological patterns and ology, development, epidemiology, and almost
processes. every other aspect of biology. Mareaver, ~ncreased
About tlze same time that these advances in sophisticatiolz in the analysis of character evolu-
methods for phylogenetic estimation were devel- tlon (e.g., Maddison and Maddison, 1992) has
oping in the 1960s, another sort of revolution was greatly improved our ability to invcstigato them-
happening in molecular biology, Methods for ex- tricacies of molecular data in relation to cvolu-
amining the molecular structure of proteins and tsonary models and processes.
nucleic acids were soon adopted by evolutionary Our enthusiasm for new molecular tools and
biologists, and the data available for phylogenetic methods of analysis sliould not be taken to mean
estimation began to increase exponentially, at that we advocate abandonment of old and
least for some taxa. This book is a summary of the proven techniques. Both allozyme electrophore-
lnetlzods and applications in systematics that have sis (Chapter 4) and cytogenetics (Chapter 5) have
developed out of these parallel advances in esti- made major contributions to evolutionary theory
mation procedures, co~nputationalanalyses, and (M.J.D.White, 1973; Lcwontin, 1974, Av~se,1994)
molecular bioteclznologies. and continue to be extrelnely cost-effectlvc ap-
The six years between the first and second proaches for many applications. There ex~stsa
editions of this book have see11 a quaniu~nleap in broad spectrum of methods for analyzing varla-
the number and scope of applications of rnolecu- tion In DNA, including DIVA-DNA hybridiza-
lar systematics. Central to this increase has been tion (Chapter 6), vanous methods for generating
development of new applications of the poly- and analyzing DNA fragments (e.g., rcs tr~ction
merase chain reaction, or PCR, for investigating enzyme analysis, m~crosa~ellites, RAPDs, and
variation in DNA on a large scale (Kleppe et al., multilocus DNA fingerprinting; Chapter a), and
1977; Mullis and Faloona, 1987). In conjunction DNA sequencing (Chapter 91, each wit11 charac-
with the design of broadly applicable sets of teristic strengths and limitations. A theme devel-
primers (see Chapter 71, gene amplification meth- oped throughout this book is that tlze technjque
ods have spawned increasingly large data sets on and the molecule or form of variation to be as-
DNA sequence variation within and between sayed must each be matched to a carefully de-
species. Gene amplification is also fundamental to fined problem, assessed though pilot s i u d i c s
new approaches to DNA fingerprinting, such as (Chapter 2), and the results subjected to appro-
microsatellite (Weber and May, 1989) and RAPD priate statistical analyses (Chapters 10-1 2) Only
(Williams et al., 1990) analyses. Symptomatic of then wlll the full power of molecular sysLematies
this evolution, the new edition of Molecular Sys- be realized.
tetnatics includes a separate chapter on PCR am-
plification (Chapter 7). The Link Between Molecular Evolution
Alongside the advances in bioteclznology- and Systematics
indeed, driven by them-have been improve-
ments 111 the analysis of molecular variation There is a fundamental synergy between studies
within and among species. Within species, the of molecular systematics and molecular evolut~on.
ability to obtain gene trees has encouraged the de- Molecular systematics uses genetic markers to
velopment of coalescence theory (reviewed by make inferences about population processes and
Hudson, 1990) and tlze analysis of phylogeogra- phylogeny and in doing so creates a substantial
phy (Avise et al., 1987; Avise, 1994).More gener- comparative database for specific genes or pro-
4 Cizizpfer I / Moritz Ij Hillis
teins. Studies of molecular evolution use these data to evaluate rates, processes, and constraints on molecular change through time (reviewed by Kimura, 1983b; J.H. Gillespie, 1991; Li and Graur, 1991). The results of molecular evolutionary studies can then provide for more informed use of molecular markers in population genetics and phylogenetic analyses.

This linkage of molecular systematics and evolution is most obvious in the analysis of DNA sequence variation (Miyamoto and Cracraft, 1991; Crozier, 1993; Simon et al., 1994; Chapters 11 and 12). It is clear from comparisons of closely related sequences and analyses of pseudogenes and their functional paralogs that substitutions between different bases occur at disparate rates (see Chapter 11 and Li and Graur, 1991). For instance, in animal mtDNA, the bias toward transitions can be extreme. Protein-coding genes and the non-coding control region of mtDNA may accumulate transitions 10 or more times more quickly than transversions in some species (W.M. Brown et al., 1982; Irwin et al., 1991; Kocher and Wilson, 1991), although other genes (such as the rRNA genes) may experience very different substitutional pressures (Vawter and Brown, 1993). To complicate matters further, the relative probabilities of substitutions between a particular pair of bases (e.g., C ↔ T transitions) can be asymmetric, resulting in biased base composition in the sequences compared. Correction for these inequalities using weighted parsimony, maximum likelihood, or appropriate estimators for sequence divergence increases the range of parameter space over which estimates of phylogeny from DNA sequence data are reliable (Felsenstein, 1988a; Huelsenbeck, 1995; Chapter 11). Methods to correct for effects of variation in base composition among taxa on phylogenetic analysis also have been developed (e.g., Sidow and Wilson, 1991; Steel et al., 1993b; Lockhart et al., 1994; Lake, 1994). Knowledge about various types of interactions among sequence positions (e.g., Wheeler and Honeycutt, 1988; Korber et al., 1993) and differences in probabilities of change across sites have led to objective criteria for differential character weighting. An understanding of evolutionary constraints also can guide the selection of genes to be used for a particular problem. For example, contrary to earlier suggestions (e.g., Moritz et al., 1987), it has become clear that genes with highly conserved amino acid sequences may be less useful than those with high replacement rates for inferring phylogeny of distantly related taxa if allowable substitutions in the former are rapidly saturated (Graybeal, 1993, 1994). Molecular systematics also can contribute to studies of molecular evolution beyond just providing comparative sequence data. For example, estimated molecular phylogenies can be used to detect intragenic recombination, exon shuffling, horizontal transfer, or gene conversion events (e.g., Hughes and Nei, 1989; Valdes and Piñero, 1992; Robertson et al., 1995), to test for heterogeneous rates of evolution (e.g., Easteal, 1990), and to identify sequences subject to selection (Fu and Li, 1993; Klein, 1993). These examples are far from exhaustive, but serve to illustrate the growing interplay between molecular evolution and systematics in the analysis of DNA sequences.

The synergy between molecular evolution and systematics is also developing for some forms of DNA fragment analysis. Assays of variation at microsatellite loci are becoming increasingly popular for studies of intraspecific variation, but there are concerns that the mutation rate may be so high as to overwhelm information on population history and migration rate (reviewed in Chapter 8). In particular, the effects of mutation and migration can be confounded, making it difficult to use measures of among-population differentiation (e.g., FST) to estimate migration rates. These difficulties can be overcome if the form and rate of mutation at microsatellite loci is understood; indeed, measures of population differentiation and genetic distance that incorporate specific mutation models have been developed recently (Goldstein et al., 1995; Slatkin, 1995).

The Link Between Molecular Population Genetics and Phylogenetics

In the first edition of Molecular Systematics, we asserted that the field of molecular systematics encompasses both intraspecific variation, tradition-
ally the field of population genetics, and interspecific diversity, traditionally the field of phylogenetics. This linkage is fundamental to the integration of molecular evolution and systematics discussed above and has been enhanced by the use of allelic genealogies at both levels. Indeed, population genetics is undergoing a renaissance fueled by the availability of information on the molecular differences among alleles, which is often expressed as a phylogeny (Avise, 1989). Coalescence theory (reviewed by Hudson, 1990) predicts the effects of genetic drift, mutation, migration, and selection on expected times to common ancestry of alleles. If the rate of nucleotide substitution is sufficient for the allelic genealogy to be estimated, then inferences about historical population size, gene flow, and selection events are possible (Slatkin and Maddison, 1989; Slatkin, 1991; Slatkin and Hudson, 1991; Felsenstein, 1992; Hudson et al., 1992b; Nee et al., 1995). A significant outcome of theoretical and empirical studies of allelic genealogies within species will be an improved understanding of the conditions for which gene and organismal trees are congruent in comparisons of closely related species.

The methods described in this book vary in their ability to link population genetics and phylogenetic analysis. Certainly, DNA sequencing and restriction site analysis allow a close connection. However, some of the increasingly popular methods for analyzing within-species variation, such as microsatellite analysis and RAPDs, do not lend themselves to this approach because the homology of alleles detected between species is questionable (FitzSimmons et al., 1995; J.J. Smith et al., 1995; see Chapter 8). Other methods such as allozyme electrophoresis (Chapter 4) or genomic DNA-DNA hybridization (Chapter 6) also do not reveal individual gene genealogies and accordingly are less able to take advantage of the developing interaction between molecular population genetics and phylogenetics. Nonetheless, studies that combine sequence analysis or restriction analysis with chromosomal or allozymic analysis provide an approach for linking studies of allelic phylogeny to genetic analyses of populations or species (e.g., Moritz, 1991b; Radtke et al., 1995) or processes of molecular evolution (e.g., Moritz, 1991a; Hillis et al., 1991c; Bradley et al., 1993).

CONTROVERSIES IN MOLECULAR SYSTEMATICS

The collection of molecular data and their use in systematics has led to several controversies, some of which have generated more heat than light. These controversies include arguments about the relative value of molecular versus morphological data, the types of data that should be collected, the various philosophical approaches to analyzing data, the meaning of homology in relation to molecular characters, the extent to which individual gene trees reflect relationships among populations or species, the constancy of rates of molecular evolution, and the neutrality of molecular variants. Some of these debates are specific to molecular data, whereas others are general to all types of evidence used to estimate phylogeny. Each of these debates is reviewed at length elsewhere; here we merely outline the principal arguments and their implications for molecular systematics. Another recent controversy (whether to analyze multiple data sets separately or in combination) is discussed in Chapter 12.

Molecules versus Morphology

There has been considerable debate over whether molecular or morphological features are inherently better sources of information for estimating phylogeny (Patterson et al., 1993). Some have claimed that molecular characters are relatively weak (e.g., Kluge, 1983), whereas others have claimed that morphological characters are likely to be misleading or uninformative (e.g., Frelin and Vuilleumier, 1979; Sibley and Ahlquist, 1987a; Lamboy, 1994). Closer examination shows this to be an empty argument (Hillis, 1987; Sytsma, 1990; Wiens and Hillis, 1996). The real concerns for the practicing systematist are whether the characters examined exhibit variation appropriate to the question(s) posed, whether the characters have a clear and independent genetic basis, and whether the data are collected and analyzed in such a way
that it is possible to compare and combine phylogenetic hypotheses derived from them (see Chapter 12).

We suggest that the conflicts between molecular and morphological evidence have been overemphasized. The development of molecular systematics has not resulted in widespread refutation of phylogenetic hypotheses generated by morphologists, although the molecular approach may be revealing in situations where morphological variation is limited or the homology of morphological features is unclear. Two recent controversies concerning relationships among eutherian mammals illustrate this point.

Stimulated by the observation that flying foxes and their relatives (megachiropterans) share a number of features of brain organization with primates that are not present in other bats (microchiropterans), Pettigrew (1986) proposed that the megachiropterans are more closely related to primates than they are to the microchiropterans, and thus that wings and flying have evolved separately in these lineages. This provocative hypothesis led to the generation of many molecular datasets (Bennet et al., 1988; Adkins and Honeycutt, 1991; R.J. Baker et al., 1991b; Mindell et al., 1991; Ammerman and Hillis, 1992; Bailey et al., 1992; Stanhope et al., 1992) and critical reassessments of both the morphological and molecular data (Pettigrew et al., 1989; Pettigrew, 1991a,b; R.J. Baker et al., 1991b; N.B. Simmons et al., 1991; Pettigrew, 1994; Van Den Bussche et al., 1996). Whatever the outcome for bat phylogeny, this debate has been very healthy in focusing critical attention on the validity and interpretation of characters and on potential sources of convergence of either morphological or molecular features (e.g., see the section "Recognizing Systematic Errors" in Chapter 11). Another example is the debate over cetacean relationships, where recent molecular data have suggested paraphyly of toothed whales, which in turn led to a reassessment of morphological characters (reviewed by Milinkovitch, 1995). In this case there is disagreement between molecular data sets (Arnason and Gullberg, 1994), and the conflicting hypotheses are again prompting critical assessment of both the molecular and morphological evidence.

It also needs to be recognized that morphological and molecular approaches each have distinct advantages and disadvantages. For example, most (but not all) molecular data have a clear genetic basis and the total data set is limited only by the genome size. On the other hand, morphological data can be obtained more readily from ancient fossils (e.g., Gauthier et al., 1988) and extensive preserved collections and can be interpreted in the context of ontogeny (Kluge and Strauss, 1985; cf. Mabee, 1993). In general, studies that incorporate both molecular and morphological data will provide much better descriptions and interpretations of biological diversity than those that focus on just one approach. Furthermore, it is possible to address some systematic problems only with morphological data and other problems only with molecular data (see Hillis, 1987; Fernholm et al., 1989; Sytsma, 1990). This book is concerned only with molecular variation because many issues are unique to molecular data and are inadequately covered elsewhere, not because we view molecules as inherently superior to morphological characters as markers of evolution.

Types of Characters and Methods of Analysis

The techniques of molecular systematics produce two fundamentally different types of information: distance data, where differences among molecules are measured as a single variable (e.g., DNA hybridization, Chapter 6); and character data, where differences are measured as a series of discrete variables (characters), each with multiple states. Character data can be converted to distances, but distances usually cannot be converted into character data. Character data have some advantages for data collection and analysis. It is relatively easy to add information on new taxa to the data set (see Chapter 2) and data obtained from different sources (i.e., other molecules or other types of attributes) can be combined for analysis (see Chapter 12).

Arguments about different approaches to phylogeny estimation, such as the relative efficiency, consistency, and robustness of the competing methods, continue to dominate discussions of
phylogenetic analysis (see Chapter 11). Some molecular techniques inevitably restrict the range of applicable methods of analysis. This is not a problem as long as the remaining options can reliably estimate phylogeny, which in turn depends on the frequency with which assumptions specific to the method are violated and how sensitive the method is to those deviations. Considerable effort is being given to examining the robustness of alternative methods for phylogenetic analysis (Chapter 11) and estimation of population genetic parameters (Chapter 10). Development of new methods of analysis and their implementation in computer software packages also constitute a very active field. Nonetheless, the greatest obstacle to the incorporation of the new flood of molecular data in systematics remains a lack of adequate algorithm development and implementation, especially for the alignment and analysis of very large data sets.

Homology and Similarity in Molecular Systematics

The uses and misuses of the word homology frame a complex subject. Difficulties in its use arise as a result of differences in intended meaning between some molecular biologists and morphologists (Patterson, 1988), among molecular biologists (Reeck et al., 1987; Aboitiz, 1987; Dover, 1987; Wegnez, 1987; Hillis, 1994a), and depending on context (B.K. Hall, 1994).

In general, homology means inferred common ancestry, although it is commonly misused to mean similarity (Fitch, 1966, 1970; Reeck et al., 1987). Similarity is an empirical observation and can be quantified, whereas homology usually must be inferred and is not usually a quantifiable relationship. When a molecular biologist states that two proteins are "95% homologous," he or she usually means that the two proteins are the same at 95% of the amino acid positions (= 95% isologous; Wegnez, 1987), which may be used to infer that they are homologous. The statement is confusing because it is possible that 95% of one protein is homologous to the other, and that the remaining 5% is unrelated by direct ancestry (perhaps because of exon shuffling; see Hillis, 1994a). However, in most cases, it is likely that the two proteins are homologous across their length, and have simply diverged at 5% of the positions. Furthermore, there are several reasons that the proteins may be similar, including common ancestry (homology), convergence, and gene conversion (Patterson, 1988).

There are also several types of homology that must be distinguished. If the common ancestry of two sequences can be traced back to a speciation event, then they are said to be related by orthology (Fitch, 1970). If, on the other hand, the common ancestry of the sequences can be traced back to a gene duplication event, then the relationship is one of paralogy (Fitch, 1970). Homologous sequences also can be related through lateral gene transfer (via retroviruses, for instance), in which case the sequences are related by xenology (Gray and Fitch, 1983). The distinction is necessary because only orthologous sequences can be used to infer phylogeny of species. Confusion of paralogous and orthologous sequences can result in a correctly estimated phylogeny for the molecules that differs markedly from that of the organisms from which they were sampled. Consider the example in Figure 2. A gene duplication event in the ancestor of species 1, 2, and 3 gave rise to the two paralogous sequences, A and B. Subsequently, two speciation events gave rise to the three species, such that species 1 and 2 shared a more recent common ancestor (Figure 2A). One could potentially recover the phylogeny of the three species by examining the orthologous A sequences in each species, or by examining the orthologous B sequences in each species (Figure 2B). However, examination of paralogous sequences (e.g., A in species 1 and 3 and B in species 2) would result in incorrect inferences about species phylogeny but correct inferences about gene phylogeny (Figure 2C). Thus, for problems of species phylogeny, the sequences examined must be orthologous.

However, paralogous sequences within a species do not always evolve independently. Indeed, sequences that are repeated in tandem arrays rarely undergo independent evolution. Instead, the many copies evolve in concert (hence the name concerted evolution; Zimmer et al.,
[Figure 2: tree diagrams not reproduced. Tip labels in panel A: 3A, 1A, 2A, 1B, 2B, 3B; the speciation events are marked on the tree.]

Figure 2 The consequences of using orthologous versus paralogous genes to infer phylogeny. (A) The phylogeny of a set of homologous genes in three species (1-3). A gene duplication event in the ancestor of the three species gave rise to two sets of paralogous genes (A and B), and two subsequent speciation events gave rise to orthologous genes in each of three species. (B) The phylogeny inferred from comparison of either set of orthologous genes. (Notice that this is the correct species phylogeny.) (C) The phylogeny inferred from comparison of two orthologous and one paralogous sequence. (This is the correct gene phylogeny, but not the correct species phylogeny.)
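
The scenario in Figure 2 can be traced with a short sketch. This is only an illustration under stated assumptions: the nested-tuple encoding of the trees, the leaf labels (species number followed by paralog letter), and all function names are hypothetical choices made here, not part of any published method or software. The sketch prunes the gene tree of panel A to the copies actually sampled and relabels them by species, which is what an investigator unaware of the duplication would effectively do.

```python
# Gene tree of Figure 2A as nested tuples (hypothetical encoding).
# Leaf labels are species number + paralog letter, e.g. "1A".
GENE_TREE = ((("1A", "2A"), "3A"), (("1B", "2B"), "3B"))
# True species relationships: species 1 and 2 share the more recent ancestor.
SPECIES_TREE = (("1", "2"), "3")


def prune(tree, keep):
    """Return the subtree induced by the leaves in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    # A node left with a single child is collapsed into that child.
    return kept[0] if len(kept) == 1 else tuple(kept)


def to_species(tree):
    """Relabel each gene copy by its species (the first character of the label)."""
    if isinstance(tree, str):
        return tree[0]
    return tuple(to_species(child) for child in tree)


def canonical(tree):
    """Order-independent form of a tree, so ((1,2),3) compares equal to (3,(2,1))."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def inferred_species_tree(sampled):
    """Tree obtained when the sampled gene copies are read as if they were species."""
    return to_species(prune(GENE_TREE, set(sampled)))


def matches_species_tree(sampled):
    return canonical(inferred_species_tree(sampled)) == canonical(SPECIES_TREE)


if __name__ == "__main__":
    for sample in (["1A", "2A", "3A"], ["1B", "2B", "3B"], ["1A", "2B", "3A"]):
        print(sample, inferred_species_tree(sample), matches_species_tree(sample))
```

Sampling only A copies or only B copies (panel B) recovers the species grouping of 1 with 2, whereas the mixed sample of panel C groups species 1 with 3: a correct statement about the genes, but not about the species.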
1980) because of molecular processes such as biased gene conversion and unequal crossing over. Patterson (1988) suggested the term plerology to describe the relationship among paralogous sequences homogenized within a taxon as a result of concerted evolution. If the rate of concerted evolution is high enough, then all the copies in the repeated array may appear to be evolving almost as a single sequence, and the distinction between orthology and paralogy is blurred.

Sanderson and Doyle (1992) simulated the effects of concerted evolution on phylogenetic estimation. They found that when 70% of sites underwent homogenization between speciation events, the inferred trees represented the relationships of the taxa rather than of the individual genes. To reliably infer the gene trees among the paralogs, concerted evolution had to involve fewer than 10% of the sites between speciation events in their simulations. At intermediate rates of concerted evolution, the inferred trees usually confounded speciation and gene duplication events. Obviously, these estimates are dependent on the details of their simulations, but they give an approximation of the level of concerted evolution that is likely to confuse phylogenetic estimates of taxa or genes. Their results indicate that plerologous sequences can be used to infer relationships among taxa, as long as the rate of homogenization is demonstrably faster than the rate of speciation in the group of interest. For some sequences (such as nuclear ribosomal RNA genes), this condition appears to be met for most species comparisons.

The increased use of the polymerase chain reaction (Chapter 7) for in vitro amplification of DNA has increased the likelihood of practical problems associated with paralogy. Conserved
primer sequences for nuclear or organellar genes are likely to be conserved in paralogous pseudogenes as well. This may lead not only to the amplification of pseudogenes and other paralogs (see Chapters 7 and 9), but also to the amplification of in vitro recombination products among different alleles, functional paralogs, or pseudogenes (Saiki et al., 1988; Scharf et al., 1988a,b). Thus, the presence of the amplified sequences in the original organism must be confirmed, especially for studies of gene evolution where such recombinational artifacts are likely to be highly misleading. Verification of sequence fidelity may be accomplished through genomic restriction mapping or cloning techniques (Chapters 8 and 9).

Still another level of homology must be addressed if the sequences of homologous genes are to be compared. Even if two genes are known to be homologous (i.e., they are descended from a common ancestral gene), insertion and deletion events may have confused the positional homology of the individual nucleotide sites or amino acid positions. Because most phylogenetic inference methods depend upon accurate assessment of positional homology, as much (or more) attention should be given to sequence alignment as is given to analysis of the aligned sequences (see Chapter 9).

In restriction enzyme analyses, the potentially homologous characters are the restriction sites rather than the restriction fragments (Chapter 8). Two homologous sequences may not share any restriction fragments, even though all of the restriction sites in one sequence are also found in the other. Confusion of homology from using restriction fragments rather than restriction sites as characters has been shown to result in positively misleading analyses in experimental studies of phylogenetic inference (M.E. White et al., 1991; Hillis et al., 1994a). Similar problems plague inferences of homology in interspecific applications of some other DNA fragment techniques (FitzSimmons et al., 1995; J.J. Smith et al., 1995; Chapter 8).

Homology of protein loci in isozyme studies (Chapter 4) is inferred on the basis of a number of functional, structural, and expressional criteria. Most functional paralogous proteins are easily recognized as the products of distinct loci through differences in tissue specificity or electrophoretic migration, and nonfunctional pseudogenes do not cause difficulties since they are not expressed and therefore cannot be scored. This suggests that combined studies of allozyme electrophoresis and sequence analysis of the individual alleles have great potential for resolving problems of homology assignment in studies of gene evolution.

Finally, the term homologous has taken on at least two additional meanings in molecular biology. In cytogenetics, it is standard to refer to the respective chromosomes in a chromosome pair of a diploid organism as homologs and to refer to the homologous pair of chromosomes in another species as homeologs (Chapter 5), even though this is quite different from the use of homology in classical morphology (where homonomy is used to refer to a repeated structure in a single organism). In addition, a molecular probe is said to be homologous if it is used to study the same species from which it was derived, and heterologous if it is used to study a homologous sequence in another species. We prefer the terms homospecific and heterospecific for these latter meanings.

Gene Trees and Organismal Phylogeny

As DNA sequences have become easier to obtain, increasing emphasis has been placed on estimating gene trees and, from these, making inferences about relationships among populations or species. A major concern that arises is whether the gene tree reflects the organismal phylogeny (reviewed by Doyle, 1992; Avise, 1994). Assuming that the genes compared are truly homologous (see above), gene trees and organismal phylogenies can differ because of retention of ancestral polymorphisms, or reticulation among populations (i.e., gene flow) or species (i.e., hybridization). This is of particular concern for non-recombining segments such as organelle genomes because the effects of reticulation are potentially retained through subsequent generations (Doyle, 1992; Degnan, 1993b).

The process by which ancestral alleles are sorted among recently separated populations or species has been the subject of several theoretical analyses (e.g., Neigel and Avise, 1986; Pamilo and
Nei, 1988; Takahata, 1989; Wu, 1991). The relationships among gene lineages found in two populations progress from polyphyly to paraphyly and finally reciprocal monophyly following reproductive isolation. The rate at which this occurs is affected by the pre-existing geographic structure of the variation, demographic processes within each population, and, primarily, the effective population size (the process takes approximately 4Ne generations, where Ne is the effective population size; Neigel and Avise, 1986). Variation among loci is expected because of stochastic variation (Ball et al., 1990) and differences in effective population size (e.g., organellar genomes versus nuclear genes). Thus, differences in gene trees among populations or closely related species can arise purely because of lineage sorting effects (e.g., Hey and Kliman, 1993; Slade et al., 1994). Both theory and practice suggest that these effects can be overcome by combining data across a large number of loci (Pamilo and Nei, 1988; Atchley and Fitch, 1991; Slade et al., 1994), although there are potential drawbacks from combining data from distinct gene trees in a single analysis (Bull et al., 1993b; de Queiroz, 1993).

Reticulation will disrupt hierarchical patterns that are produced by an underlying process of lineage divergence (see the section "Trees versus Networks" in Chapter 12). On one hand, this disruption of phylogeographic structure provides the basis for estimating rates of gene flow among populations (Slatkin and Maddison, 1989). On the other hand, it can frustrate attempts to use phylogenetic methods to assess relationships among the populations or taxa involved, especially if the methods used assume an underlying tree structure for the relationships. In general, phylogenetic methods that assume a tree should only be used to infer population relationships from gene trees where those populations are effectively independent (i.e., where the rate of immigration is trivial compared to the rate of lineage extinction). In practice, phylogenies estimated from mtDNA have been very useful for generating or testing hypotheses about historical biogeography at geographic scales where migration rates are low (e.g., Bermingham and Avise, 1986; Moritz et al., 1992a; Moritz and Heideman, 1993; Joseph et al., 1995). Where there is substantial migration, approaches that do not assume a hierarchy of populations should be used to examine the extent of genetic differentiation in relation to geographic separation of populations (Lessa, 1990; Slatkin and Maddison, 1990; Slatkin, 1993; Crandall and Templeton, 1996).

Hybridization among long-separated species also can lead to introgression of a non-recombining gene (e.g., mtDNA or cpDNA) and, thus, discrepancies between gene and organismal phylogenies (reviewed in Avise, 1994). However, if relationships are assessed by several means, these discrepancies can provide insights into the evolutionary history of a species, in particular the role of hybridization (e.g., M.L. Arnold et al., 1991; Whittemore and Schall, 1991; Dowling and DeMarais, 1993).

Constancy of Evolutionary Rates

Early indications of a strong correlation between estimates of sequence divergence and divergence time (Zuckerkandl and Pauling, 1962) raised the exciting possibility that molecular comparisons could provide indications of the time of divergence for taxa where no fossils exist. Although most biologists now accept a broad correlation between amount of molecular divergence (at least for proteins and DNA) and time, recent evidence (reviewed in Chapter 12; J.H. Gillespie, 1991; Avise, 1994) indicates sufficient rate heterogeneity that one should not assume that rates are equal on an a priori basis. For instance, J.H. Gillespie (1987) has calculated the ratio of the variance of the number of substitutions to the mean number of substitutions that occur along a lineage as ranging from 1 to 35 for amino acid substitutions and 1 to 19 for silent substitutions, indicating considerable fluctuation in evolutionary rate (see also Lynch and Jarrell, 1993). This has significant implications for molecular systematics. Constancy of rates is an expectation of the neutral theory of molecular evolution (discussed below), is an assumption of a few methods for estimating phylogeny (Chapter 11), and is widely assumed in estimating time since divergence (Chapter 12).

To some extent, the arguments over the molecular clock stem from different expectations. The
utility of such a clock depends on the quality of information needed to test a specific hypothesis. If a clock indicates 3:20 P.M., but the real time could be anywhere from 12:20 P.M. to 6:20 P.M., the clock is only useful if one needs to know whether it is morning or afternoon. In some cases (e.g., hominoid divergences; see Sarich and Wilson, 1967), molecular estimates of divergence time have led to a substantial re-evaluation of fossil evidence. However, most purported tests of hypotheses about divergence time have ignored problems associated with calibration, and few have calculated appropriate confidence intervals. These confidence intervals can be so large in some cases that the term "clock" (or even "sloppy clock") becomes meaningless (see Chapter 12).

Neutrality of Molecular Variants

A frequently voiced concern is that molecular characters are not neutral and that selection will bias analyses of intraspecific variation and estimates of phylogeny. This relates to a much broader argument over the evolutionary significance of molecular variation, the "neutralist-selectionist controversy" that has been a major concern of molecular population genetics since Kimura's (1968) seminal paper on the neutral theory (reviewed by Lewontin, 1974, 1985; Kimura, 1983b, 1986; Crow, 1985; J.H. Gillespie, 1991). There can be no doubt that many protein, chromosome, and DNA variants are acted on by selection; it also appears that much molecular variation is consistent with predictions of various modifications of the neutral theory (Sarich, 1977; Nei, 1987; Ohta, 1992). Thus, the debate is reduced to whether or not most molecular variants are selectively neutral (or nearly neutral), and whether neutrality or selection should be considered the null hypothesis. The current lack of a general testable theory of molecular evolution based on selection dictates that neutrality must usually serve as the null hypothesis. However, given the poor fit of many molecular data to the neutral theory (J.H. Gillespie, 1991), one should make a conscious distinction between testing for neutrality and simply assuming that it exists.

The impact of selection on systematic studies depends on the proportion of markers (loci) affected, the extent and sign of correlations among loci, and the robustness of the method of analysis to departures from neutrality (Chapters 2, 10, and 11). Also, it may be that selection is episodic rather than constant and therefore affects inferences based on interspecific versus intraspecific variation (e.g., McDonald and Kreitman, 1991) or intraspecific allele genealogies (e.g., Rand et al., 1994), but not those based on short-term dynamics of alleles (e.g., Easteal, 1985; Waples, 1989). Where deviations from neutrality are likely to bias analyses significantly, the assumption of neutrality should be made explicit, preferably in a way that can be tested. However, because most departures from neutrality are thought to be locus-specific, it is widely assumed that selection will have relatively minor effects on the overall analysis if numerous loci are examined.

Data Quality and Presentation

Population genetic or phylogenetic estimates can only be as accurate as the primary data themselves. Therefore, it is essential to confirm that apparent molecular variation has a genetic basis. This includes such obvious procedures as confirming DNA sequences using the complementary strand or overlapping primers and repeated runs (Chapter 9), using internal controls in allozyme electrophoresis (Chapter 4) and DNA hybridization procedures (Chapter 6), and verifying the repeatability of DNA fragment data (Chapter 8). However, it also includes some less obvious procedures, such as confirming that observed molecular variation exists in the organism, rather than being an artifact of in vitro recombination or polymerase errors (Chapter 7).

Inevitably, it is up to the investigator(s) to decide whether molecular data are accurate. However, the data should be presented in such a way that peer reviewers and readers can judge the technical quality and extent of the data themselves. Unfortunately, once techniques have become established in the literature, there has been a tendency on the part of editors and authors alike to dispense with the primary data, such as photographs of gels or chromosomes, raw experi-
mental data, and even alignments of sequences! for detecting and recording variation in proteins,
In practice, this can lead to ul~necessarilyacrimo- chromosomes, and nucleic acids (Chapters 4-9);
nious cicbates over data quality and interpretation and methods for analyzing the data (Chapters
(e.g, Cracraft, 1987; Sibley and Alzlquist, 1987; 10-12). We have attempted to include a balance of
Sarlch et al., 1989).T h ~ is
s a poor reflection on the viewpoints concerning different methods of data
field as a whole, and it is u p to practitioners of collection and analysis. Obvious omissions in this
~nolecularsystematics to insist on rigorous stan- edition are amino acid sequencing and immuno-
dards of data quality and presentation. Increas- logical methods, both techniques of great histori-
ingly, n~olecularjournals are insisting tlzat DNA cal importance in molecular systematics (Good-
sequences and alignments be entered into appro- man et al., 1987; Maxson and Maxson, 1990), but
priate databases, but this does not ensure the ac- which have been largely replaced by nucleic acid
curacy of the sequences themselves. Clark and sequencing for most applications in systematics.
Whittam (1992) reported a low error rate (=1/1000 Otherwise, the coverage of techniques is fairly
bp) lor sequences in GenBank, and concluded tlzat comprehensive. To facilitate comparisons, each of
such a n error rate would have little impact on the molecular techniq~lechapters is arranged into
molecular systematics of organisms or genes with sections on (1)principles and comparisons of
hig.17 scquence diversity However, this error rate methods (including a discussion of assumptions);
could adversely affect population genetic analy- (2) applications and limitations; (3) laboratory set-
ses of species with low nucleotide diversity (e.g., up; (4) protocols; and (5) interpretation and trou-
humans; Li and Sadler, 19911, so even greater cau- bleshooting. A glossary of terms and common ab-
tion may be required for such applications. breviations is given after Chapter 12. Words and
phrases included ill the glossary appear in bold-
face type at their first appearance in the text.
SCOPE AND USE OF THIS BOOK For the most part, protocols are basic and well
proven. Emphasis also has been placed on high-
M o ; c c ~ , i n l Syslernatccs aims to provide an lighting recent developments that appear partic-
overview of ~nolecularmethods currently used to ularly promising. However, for each approach,
anajyLe diversity witl-trn and among species. The there is a wide range of alternative protocols not
prrmary goal is to provide new workers in this described here. This volume is designed to com-
rap+ expanding field nrith sufficient technical plement existing manuals that focus on a single
and iiieoretical inforination to enable them to se- approach (see works cited at the end of this chap-
lect one or more appropriate methods, to dcsign ter, as well ds the manuals for phylogenetic analy-
and ~lnplementa study, and to analyze the resis packages listed in Chapter 11).These more de-
suiilng data, all with maximurn efficiency. In se- tailed sources should be referred to for additional
lecc~ngan appropriate technique for obtaining background and alternative methods once a par-
data, the basic questions to be considered are (1) ticular approach has been adopted.
will i r produce information compatible with the
dcs~rodinetlzod of analysis; (2) it; the signal-to-
nolsc ratio likely to be sufficiently high to address FOR FURTHER STUDY
the question(s) posed; and (3) is it cost-effective
and tcaslble, given the available facilities and ex- General References
perLlse7 For practicing molecular systematists, Avise, J. C . 1994. Molec~llauMarkers, Nafural History,
these chapters may suggest alternative strategies and Evolutio~i.Chapman and Hall, New York.
for collcct~ngand analyzing data and new per- Koelzal, A. R. (ed.).1992. Molecular Genetic Analysis of
spectxves on limitations and assumptions of fa- Populations. IRL Press, Oxford.
Iloelzal, A. R. and G. A. Dover. 1991. Molecular Genet~c
mi11ar iecl~n~ques.
Ecology. Oxford University Press, Oxford.
The book has three major sections, each rep- Soltis, P. S., D. B. Soltis and J. J. Doyle (eds.).1992.
rcsenimg an important phase of a study: sampli~zg Molecular Systematzcs of Plants. Chapman and
dcs3gn and methods (Chapters 2 and 3);methods Hall, New York.
Zimmer, E. A., T. J. White, R. L. Cann and A. C. Wilson (eds.). 1993. Molecular Evolution:
Producing the Biochemical Data. Academic Press, San Diego.

Proteins
Harris, H. and D. A. Hopkinson. 1976 et seq. Handbook of Enzyme Electrophoresis in Human
Genetics. North-Holland, Amsterdam.
Manchenko, G. P. 1994. Detection of Enzymes on Electrophoretic Gels. CRC Press, Boca
Raton, Florida.
Richardson, B. J., P. R. Baverstock and M. Adams. 1986. Allozyme Electrophoresis. Academic
Press Australia, Sydney.

Chromosomes
Darlington, C. D. and L. F. La Cour. 1969. The Handling of Chromosomes, 5th ed. Allen and
Unwin, London.
Green, D. M. and S. K. Sessions (eds.). 1991. Amphibian Cytogenetics and Evolution.
Academic Press, San Diego.
MacGregor, H. and J. Varley. 1983. Working with Animal Chromosomes. John Wiley and Sons,
New York.
Sharma, A. K. and A. Sharma. 1972. Chromosome Techniques: Theory and Practice.
Butterworth, London.

Nucleic Acids
Adams, M. D., C. Fields and J. C. Venter (eds.). 1994. Automated DNA Sequencing and
Analysis. Academic Press, San Diego.
Ausubel, F. M. (ed.). 1989. Current Protocols in Molecular Biology. John Wiley and Sons,
New York.
Hames, B. D. and S. J. Higgins (eds.). 1985. Nucleic Acid Hybridization: A Practical
Approach. IRL Press, Oxford.
Miyamoto, M. M. and J. Cracraft (eds.). 1991. Phylogenetic Analysis of DNA Sequences.
Oxford University Press, New York.
Sambrook, J., E. F. Fritsch and T. Maniatis. 1989. Molecular Cloning. Cold Spring Harbor
Press, Cold Spring Harbor, New York.
Simon, C., F. Frati, A. Beckenbach, B. Crespi, H. Liu and P. Flook. 1994. Evolution,
weighting and phylogenetic utility of mitochondrial gene sequences and a compilation of
conserved polymerase chain reaction primers. Ann. Entomol. Soc. Am. 87:651-701.
White, T. J., T. Bruns, S. Lee, and J. Taylor. 1989. PCR Protocols: A Guide to Methods and
Applications. Academic Press, New York.
Part 1
Chapter 2
Project Design
Peter R. Baverstock and Craig Moritz
INTRODUCTION
Molecular systematic studies require particularly careful planning because they
are usually relatively expensive and may involve destructive sampling of the or-
ganisms (Chapter 3). The aim, therefore, should be to maximize the information
obtained per specimen; too few specimens may lead to an inconclusive or incor-
rect result; too many is sheer waste. Despite this requirement, molecular system-
atic studies seem especially prone to poor planning. Too often projects are well
advanced before it is realized that the sampling strategy is inappropriate, the
wrong tissues have been collected, the tissues have been stored inappropriately,
the wrong technique has been chosen, or far too many or far too few specimens
have been included.
Molecular systematic studies typically involve the following stages:
1. Define the problem
2. Conduct a pilot study
3. Determine the appropriate sampling strategy
4. Collect samples
5. Analyze the samples
6. Analyze the data
This chapter deals mainly with step 3, establishing the most efficient sampling
design. However, steps 1 and 2 will have a profound influence on step 3 and are
therefore considered. The remaining chapters of this book concern steps 4-6.
The cost of a project should take into account not only the cost of chemicals and other
consumables, but also the cost of time, which includes both the collecting and the
screening phases of the project. Thus, the sampling design should aim to minimize both the
number of specimens and the handling in a way that remains compatible with the biological
questions being asked. Indeed, the first and most important step is to define clearly the
biological questions being asked. The questions should be stated in as specific and
detailed a way as possible. It will be particularly useful to contrast formal hypotheses
as a guide to further steps in the analysis.

STATISTICAL CONSIDERATIONS

At the outset, one needs to decide on the level of error that is acceptable. There are two
types of errors: type I if the null hypothesis is rejected when it should be accepted, and
type II if the null hypothesis is accepted when it should be rejected. Type II errors are
difficult to define for most biological systems because the expected differences between
two populations are usually unknown. They are usually expressed as the power of the test,
i.e., 1 minus the probability of a type II error. The level of type I error one is
prepared to accept depends on the consequences of being wrong. In biological studies it is
usually set at 5%, but this limit should not be accepted blindly.

For example, a researcher may be testing the hypothesis that a particular species of
commercial fish has a population structure characterized by isolated demes. The
corresponding null hypothesis is that the entire species is a single panmictic unit.
Before launching into a full-scale study, the researcher should conduct a pilot study to
see if there is any suggestion at all that the fish population shows evidence of genetic
substructuring. Here the type I error might be set at 20%, since the consequence of being
wrong (i.e., rejecting the null hypothesis when it is true) is that a fuller study is
carried out. However, in the full-scale study, a type I error might be set at 1%, since
here the consequence of being wrong is that inappropriate management procedures are
adopted for the fish species. By contrast, if the fish species concerned was endangered,
the type I error might be set at 20%, since the consequences of (incorrectly) concluding
that the species is a single panmictic unit may be disastrous for the recovery program.

The sample sizes required for a given level of type I error and a given power depend on
the sampling variance. Many biological data follow a normal distribution, for which the
mean and the variance are independent. However, genetic data such as allele frequencies
determined by allozyme electrophoresis (Chapter 4) or DNA fragment studies (Chapter 8) may
follow a binomial distribution where the variance (s²) can be estimated from the mean:

where p is the allele frequency and n is the sample size. For nuclear loci in a diploid
population this distribution is appropriate if the genotype frequencies conform to
Hardy-Weinberg equilibrium (HWE); otherwise resampling procedures should be used to
estimate variances of allele frequencies (Chapter 10).

MOLECULAR SYSTEMATICS

Three main applications of molecular systematics will be considered here: studies of
population structure (e.g., geographic variation, mating systems, heterozygosity, and
individual relatedness), identification of species boundaries (including hybridization),
and estimation of phylogenies. Each of these requires different approaches to virtually
every phase of the study, from project planning through pilot studies to sample sizes,
sampling strategies, methods of data collection, and data analysis. Therefore, it is
necessary to have a clear idea of the aims of the study very early in the planning stage.

Determining the relationships of specific individuals (e.g., testing parentage) requires
direct comparison of alleles at allozyme loci (Chapter 4) or, preferably, hypervariable
loci (Chapter 8) between putative relatives. Sampling and statistical considerations are
reviewed by Sensabaugh (1982) for allozymes and by others (e.g., Lynch, 1991a;
Chakraborty, 1992; Chakraborty and Jin, 1993) for hypervariable loci. An important
consideration is the need to have information on the frequencies of different alleles for
the population
in question in order to calculate exclusion/inclusion probabilities (Lewontin and Hartl,
1991; cf. Chakraborty and Kidd, 1991; Jin and Chakraborty, 1995). Statistical
considerations relevant to the analysis of mating systems using RAPDs were discussed by
Milligan and McMurray (1993). The remaining applications are discussed below.

Studies of Population Structure

Background
The genetic structure of a population is perhaps the most fundamental piece of information
for a species that requires management. For some species, the entire population may
consist of a single random mating unit; others may consist of a series of small
subpopulations, each largely isolated from other subpopulations (the stepping-stone or
islands model); still others may consist of a continuous population, but individuals
within it exchange genes only with geographically proximate individuals (the
isolation-by-distance model). Deciding which model best approximates the population
structure of a particular species is usually the first step in understanding population
biology. The three different models of population structure result in different patterns
of genetic differentiation within and between geographic localities. Therefore, an
analysis of the genetic structure of a species can give the investigator important clues
about the population structure (reviewed by Richardson et al., 1986; Slatkin, 1987; see
also Chapters 4 and 8).

An important consideration is whether information is required on current population
structure, historical population structure, or both. Analyses of historical population
structure are enhanced by information on the relationships among alleles as well as their
frequency and distribution (Avise, 1989; Hudson, 1990; Slatkin and Hudson, 1991; Slatkin,
1993). However, in some circumstances, inclusion of information about allele phylogeny or
divergences can be quite misleading about current population processes (e.g., Avise et
al., 1992a). For threatened species, where population sizes and/or migration rates may be
changing rapidly, Moritz (1994) advocated the use of information on allele phylogeny to
define Evolutionarily Significant Units, but allele frequencies to define Management Units
for monitoring of current populations. This has major implications for the choice of
techniques and sampling design.

To date, allozyme electrophoresis (Chapter 4) has been the genetic technique most widely
used to study the genetic structure of populations. However, various DNA fragment methods
are being used with increasing frequency (Chapter 8). The distribution of variation within
and among populations revealed by these methods may differ. Uniparentally inherited loci
(e.g., mitochondrial DNA; Y-linked loci; most chloroplast DNA) are generally expected to
show less variation within populations and more between populations than are biparentally
inherited loci (e.g., autosomal nuclear loci). Similarly, repeated sequences subject to
strong concerted evolution (Chapter 8) also may have reduced levels of variation within
populations. Any such alteration in the distribution of variation has important
implications for sampling design. For hypervariable loci (e.g., microsatellites), the
large number of alleles and potential genotypes complicate even the simplest statistical
analyses (e.g., tests of fit to Hardy-Weinberg equilibrium, Chapter 10). Although the
theory to deal with these data is still being developed (e.g., Chakraborty, 1992;
Chakraborty and Jin, 1993; Slatkin, 1995), it is clear that large sample sizes are needed
in combination with greater emphasis on randomization procedures for analysis.

Pilot Studies
The pilot study has three major aims: (1) to find genetic markers (i.e., polymorphic
loci); (2) to determine whether the polymorphic markers are suitable in a practical sense;
and (3) to establish the feasibility of a large-scale sampling program.

ESTABLISHING MARKERS  Samples should be obtained from multiple populations representing a
hierarchy from closely spaced to geographically distant sites to identify locally
polymorphic markers as well as those with widespread variation. It is at this point that
the distribution of variation within versus among populations should be assessed.

A suitable approach for the pilot study may
be to collect relatively large samples (e.g., n > 20) from two localities at the extremes
of the range and smaller samples (n = 5) from several other localities. This represents a
trade-off between the need to assess within-population variation (particularly for diploid
nuclear loci) and among-population variation (particularly for diploid nuclear loci), and
samples should be assayed for as many loci as possible, preferably including different
genetic systems (e.g., allozymes or microsatellites as well as mtDNA or cpDNA). At this
point, it may be appropriate to approach other laboratories with experience in specific
approaches rather than spending resources establishing methods that turn out to be
uninformative (i.e., monomorphic).

The actual number of genetic markers required will depend to some extent on the subtlety
of the population substructuring encountered and the variance among loci. The need to
examine a large number of loci is evident from the observations that the reliability of
estimates of summary statistics such as heterozygosity, genetic distance, and FST depends
more on the number of loci than on the number of individuals (Nei, 1978; Nei and Chesser,
1983; Chakraborty and Leimar, 1987; Slatkin and Barton, 1989; Leberg, 1992). One marker is
clearly insufficient because the effects of selection and substructuring cannot be
distinguished. Even two loci are insufficient because selection or linkage may give the
same pattern for each. At the very least, three loci with multiple alleles at reasonably
high frequency should be used. If fewer are found, some other approach to the problem
should be explored. Weir (Chapter 10) suggests that a minimum of five polymorphic loci are
needed to test the significance of population structure via resampling procedures.

SUITABILITY OF MARKERS  Because a large number of samples may need to be screened in the
main study, markers should be inexpensive and easily scored. Moreover, for diploid loci,
it is highly preferable that heterozygotes can be clearly and consistently distinguished
from both homozygotes. This can be a problem with multilocus minisatellite fingerprints,
RAPDs, and some allozyme markers (e.g., Richardson et al., 1986), but not with RFLPs or
microsatellites if a small number of loci is examined per gel (see Chapter 8). At this
stage it is prudent to experiment with different tissues, different tissue treatment,
different storage regimes, PCR conditions, etc. (see Chapters 3, 4, and 7) to improve the
resolution of loci that are polymorphic but are proving difficult to score. These things
should be sorted out before the main sampling program begins.

FEASIBILITY OF POPULATION SAMPLING PROGRAM  The pilot study also gives the opportunity to
test the feasibility of the main sampling program: Have all the logistic problems been
sorted out? Is the spatial scale of sampling appropriate?

Sample Sizes and Strategies
Sampled localities may have different allele frequencies for a polymorphic locus, but the
difference may go undetected because of small sample size (a type II error). The smaller
the actual difference in allele frequencies, the larger the sample sizes needed to
reliably detect it. Thus, the first consideration in setting sample sizes is the magnitude
of the allele frequency differences one expects to encounter, which may be determined from
the pilot study. The only other considerations are the level of type I and type II errors
one is prepared to accept. Again, both can be reduced by increasing the sample size. Table
1 gives the minimum sample sizes required to detect given levels of allele frequency
differences (assuming Hardy-Weinberg equilibrium) for various levels of type I error and
various powers of test.

For example, let us assume that we have two diploid populations, both polymorphic at a
locus with allele frequencies of 0.5/0.5 and 0.55/0.45, respectively. Our null hypothesis
is that the two populations do not differ in allele frequency. Let us assume also that we
have decided on a type I error of 5% (i.e., we will reject the null hypothesis only when
the data have less than a 5% probability of occurring if the null hypothesis were true),
and a power of 90% (i.e., we want a 90% chance of rejecting the null hypothesis if the
populations really do differ by this amount). This would mean that the two samples would
each need to consist of at least 2081 individuals!
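The scale of this requirement follows from the sampling error of an allele frequency estimate. As a rough sketch only (it is not the calculation behind Table 1), the Python snippet below uses the standard binomial standard error, sqrt(p(1 - p)/m), and assumes that m is the number of gene copies sampled, i.e., two per diploid individual, which is reasonable only under Hardy-Weinberg equilibrium. The function name and the `ploidy` argument are illustrative and do not come from the chapter.

```python
import math


def allele_freq_se(p, n_individuals, ploidy=2):
    """Binomial standard error of an estimated allele frequency.

    Assumption: each individual contributes `ploidy` independent gene
    copies, which holds only under Hardy-Weinberg equilibrium.
    """
    copies = ploidy * n_individuals
    return math.sqrt(p * (1.0 - p) / copies)


if __name__ == "__main__":
    # Standard error at p = 0.5 for a range of sample sizes.
    for n in (20, 100, 1000):
        print(n, round(allele_freq_se(0.5, n), 4))
```

Under these assumptions, a sample of 100 individuals gives a standard error of about 0.035 at p = 0.5, which is already comparable to a true difference of 0.05 between populations.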
Table 1.
The number of diploid individuals in each of two samples required to
detect given differences in allele frequency (ΔP)a

                                  p
Power   ΔP      0.55    0.70    0.80    0.90    0.95
50%     0.05     760     645     492     276     146
        0.10     190     162     123      69     50b
        0.20      48      40      31      25     50b
        0.50      6b       9      13      25     50b
80%     0.05    1554    1319    1006     564     299
        0.10     389     332     252     141      76
        0.20      99      82      64      27     50b
        0.50      16      14      13      25     50b
90%     0.05    2081    1766    1345     756     400
        0.10     520     444     337     189     102
        0.20     132     110      85      50     50b
        0.50      22      20      14     25b     50b

From Richardson et al., 1986.

a Based on a χ² test for homogeneity with the probability of a type I error set at 5%, and with
powers of 50%, 80%, and 90%. Because the sample sizes required depend on the actual allele
frequency (p), sample sizes required are given for various values of p.
b The χ² homogeneity test requires a minimum expected frequency of 5 in each cell; values
marked have been set to meet this requirement.
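The entries for small ΔP can be approximated with a textbook normal-approximation formula for comparing two proportions, n ≈ (z(1 - α/2) + z(power))² × 2p(1 - p)/ΔP². The Python sketch below is an illustration under stated assumptions, not the χ² procedure of Richardson et al. (1986): it treats n as the number of independent observations per sample, applies the same p to both samples, and ignores the minimum-expected-frequency rule of footnote b. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_sample_size(p, delta_p, alpha=0.05, power=0.90):
    """Approximate per-sample size needed to detect a difference delta_p.

    Assumption: two-sided normal approximation for two proportions with
    a common variance term 2 * p * (1 - p); returns an unrounded value.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for type I error
    z_power = z.inv_cdf(power)              # quantile matching the power
    return (z_alpha + z_power) ** 2 * 2.0 * p * (1.0 - p) / delta_p ** 2


if __name__ == "__main__":
    # Compare with the p = 0.55, ΔP = 0.05 column entries of the table.
    for power in (0.50, 0.80, 0.90):
        print(power, round(approx_sample_size(0.55, 0.05, power=power), 1))
```

For p = 0.55 and ΔP = 0.05 the sketch returns values within about one individual of the tabulated 760, 1554, and 2081. Agreement is poorer for large ΔP, where the tabulated values were adjusted or computed differently.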
An alternative way of viewing the problem of sample size is to look at what level of
discrimination will be achieved for a given sample size. For example, with sample sizes of
100 in each of two diploid populations, only differences in allele frequency of at least
0.1 to 0.2 (depending on the actual allele frequencies) will be detected with a
probability of a type I error set at 5% and the power set at 80%. Smaller differences in
allele frequency between populations will be indistinguishable from sampling error even
with such large sample sizes. Similarly, Chakraborty (1992) concluded that sample sizes of
50 are required to detect alleles with frequencies of p > 0.05 at hypervariable loci with
>95% probability.

The situation with respect to estimation of variance in allele frequencies appears
somewhat less demanding. For small numbers of populations, the sample size (n) needed to
detect a given level of differentiation at a diploid locus among populations (using GST;
see Nei, 1973) at least 50% of the time (i.e., a power of 0.5) with a type I error of 5%
is approximately

(Chakraborty and Leimar, 1987; Slatkin and Barton, 1989). For example, to detect a GST
value of 0.05, samples of just 10 diploids per locality may be adequate.

The above examples apply to diploid loci where the only information extracted is on allele
frequencies. The power of tests for population subdivision may be greater where they
incorporate information on the molecular differences between alleles as well as their
frequency (e.g., Excoffier et al., 1992). Based on simulation studies, Hudson et al.
(1992a) concluded that such sequence-based statistics (e.g., NST; Lynch and Crease, 1990)
are more powerful than statistics that consider only allele frequency when mutation rates
are high and sample sizes small. Conversely, χ² tests based on frequency-based statistics
(Chapter 10) usually will have greater power for detecting population subdivision when
mutation rate and thus allelic diversity is low, that is, when

where HT is the estimate of allelic diversity for the total population and n1 and n2 are
the sample sizes for the populations compared (Hudson et al., 1992a). These results have
implications not only for choosing the most appropriate statistics, but also for
determining the most cost-effective way of measuring variation in the first place (see
Chapter 8 for further details). In a similar vein, Lynch and Milligan (1994) found that
the use of dominant RAPD markers to estimate population genetic parameters requires 2-10
times more individuals and more loci compared to codominant genetic markers (e.g., RFLPs,
microsatellites).

The number and geographic pattern of localities that ultimately need to be sampled will
depend to a large extent on the actual scale of substructuring, which may not be apparent
until after the first round of sampling. For example, if following the first round of
sampling the entire population conforms to a panmictic unit, it may be decided to conduct
no further sampling. Alternatively, there may be no obvious geographic structuring, but a
deficiency of heterozygotes (under HWE expectations) may be observed, prompting sampling
at a finer geographic scale (e.g., Richardson, 1981; Johnson and Black, 1984). Therefore,
the budget for the program should foreshadow the possibility of additional rounds of
sampling. Weir (Chapter 10) recommends that at least five localities with n > 20 be
sampled to provide for statistical testing via resampling methods.

It may prove useful to use a spatially hierarchical sampling strategy in both the pilot
and subsequent studies, especially where the geographic scale of gene flow in the species
is not apparent a priori. For example, Lavery et al. (1995a) sampled coconut crabs from
different islands within an archipelago, islands from different archipelagos within an
ocean, and islands in different oceans and found qualitatively distinct population
structures at each scale. Other aspects of the species' biology may dictate further
sampling requirements. For example, the strict two-year breeding cycle of some salmonids
requires odd- and even-year breeders to be sampled and analyzed separately.

Repeat sampling of at least some localities is highly desirable to permit a direct
assessment of sampling variance in allele frequencies or to detect artifacts arising from
non-random sampling of a gene pool. The latter could arise if there are spatially or
temporally separate groups, such as schools of siblings or genetically distinct cohorts
(Richardson et al., 1986). Changes in allele frequency between generations, other than
through sampling error, also permit an estimate of the effective population size, Ne,
assuming that the alleles are selectively neutral (Waples, 1989). Also, the overall mean
sampling variance can be used to estimate the size of a genetically uniform neighborhood
in a continuously distributed species (Richardson et al., 1986).

Studies of Species Boundaries and Hybridization

Background
There has been considerable debate in the literature concerning the most appropriate
definition of a species (see Endler, 1989; O'Hara, 1993). A commonly held view is the
evolutionary species concept, according to which a species is "a single lineage of
ancestral-descendant populations which maintains its identity from other such lineages and
which has its own evolutionary tendencies and historical fate" (Wiley, 1978). For
sympatric, sexually reproducing species, this reduces to the biological species concept
(Mayr, 1969), according to which a species consists of a group of individuals capable of
exchanging genetic material with each other but which are reproductively isolated from all
other such groups.

There are at least five situations in which morphological data alone will be inadequate
for defining species boundaries. First, two species may be sympatric (overlapping) or
parapatric (abutting), but be so similar in morphology that their specific status goes
undetected (e.g., Donnellan and Aplin, 1989). Second, two allopatric
(geographically separate) populations may be of 10 individuals is collected. The null hypothesis
morphologically different, but their status as (bi- under test is that all specimens belong to a smgle
ological) species is questionable. Third, two para- random-mating population. At one locus, 6 speci-
patric populations may be ~norphologically mens are homozygous for one allele and 4 speci-
distinct, but slzow clinal variation or broad hy- mens are homozygous lor a different allele, tl~ere
bridization (e.g., Jackman and Wake, 1994). are no heterozygotes. The best estimates of the al-
Fourth, two ~norphologicallydistinct forms may lele frequencies at this var~ablelocus are p = 0.6
represent polymsrphisms within a single inter- and q = 0.4. The expected proportioil of heterozy-
breeding population (e.g., Titus et al., 1989; Hillis gotes assumiilg Hardy-Weinberg equilibrium for
et al., 199110).Fifth, an asexual species complex a random-breeding population is 2pq, which in
may have morphologically similar forms that this case is 0.48. The probability of an individual
arose independently from sexual species (e.g., not being a heterozygote is therefore 1 - 2pq =
Moritz and Heideman, 1993). 0.52. The probability of all 10 individuals not be-
Of the various molecular genetic approaches ing heterozygous is (0.52)1°= 0.0014. Clearly, the
that may be brought to bear on tl-ie problem, al- null hypothesis is under serious challenge. If an
lozyme electrophoresis (Chapter 4) appears to re- additional locus is found that shows the samc pat-
main the most generally applicable and efficient, tern of fixed differences involving the same indi-
although cytogenetic (Chapter 5 ) and DNA frag- viduals, then the hypothesis is that two species
ment (Chapter 8) analysis often can be useful as are involved. Other hypotheses that deserve con-
well (see the Summary in Chapter 12). With re- sideration are that the species is actually asexual,
gard to defining species under tl-ie phylogenetic that it is haploid, or that it has a very high level of
species concept, Davis and Nixon (1992) sug- self-fertilization,
gested that non-recombining loci such as mtDNA In practice, investigators should a m for a
have advantages, although Moritz et al. (1992a) minirnum of two locl show~ngpatterns of iixcd
argued against tl-ie use of mtDNA phylogenies to differences that are consistent between mdividu-
define species boundaries because of the potential als. This is necessary because an apparent lack of
for differing patterns of geographic variation in heterozygotes at a locus can result from other ef-
nuclear versus organellar genes. fects. For example, variation may not be under
Different species usually have a fixed allelic simple genetic control (c.g.,Ldlz-R in Mus doi7zesti-
difference at some of the loci screened in protein cus; Shows and Iiuddle, 1968), there may be llull
electrophoretic studies. Thus, for predominantly alleles, there may be ontogenetic variation (c.g.,
outcrossing species, the presence of sympatric hemoglobin in vertebrates), or, at least in theory,
cryptic species can be tested by loolung for variable there may be very strong selection against het-
loci that lack l-ieterozygotes, while the status of erozygotes.
sympatric morphotypes can be evaluated by test- The above argument rests on the assumption
ing for significant differences in genotype or allele that fixed differences will be found. Clearly, the
frequencies (Chapter 11). For allopatric populations more loci that are screened, the greater the chance
and asexual populations, the aim is to assess the of finding such loci if two genetically distinct
extent of genetic divergence between the popula- species really are represented. Consequently, part
tions being tested in relation to geographic varia- of the aim of the pilot study should be to screen
tion within species. In all cases it is more important as many loci as possible. For allozyme analysis, it
to maximize the number of loci screened than to may be necessary to try different tissues and dif-
maximize the number of individuals examined. ferent treatments on a limited number of speci-
mens to detect additional loci (see Chapter 4).
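The link between the number of loci screened and the chance of detecting fixed differences can be illustrated with a simple binomial sketch. It assumes, purely for illustration, that loci behave independently and that some hypothetical fraction `f` of screenable loci is fixed for different alleles in the two species; neither the function name nor the value of `f` comes from the text.

```python
from math import comb


def prob_at_least_k_fixed(n_loci: int, f: float, k: int = 2) -> float:
    """Probability that at least k of n_loci screened loci show a fixed difference.

    Assumption (not from the text): each locus independently shows a fixed
    difference with probability f, so the count is binomially distributed.
    """
    if n_loci < 0 or k < 0:
        raise ValueError("n_loci and k must be non-negative")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    # Sum the probabilities of observing fewer than k fixed differences.
    p_fewer = sum(
        comb(n_loci, i) * f**i * (1.0 - f) ** (n_loci - i)
        for i in range(min(k, n_loci + 1))
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical value: one locus in ten carries a fixed difference.
    for n in (5, 10, 20, 40):
        print(n, round(prob_at_least_k_fixed(n, f=0.1, k=2), 3))
```

Under these assumptions the probability of meeting the two-locus minimum rises steadily with the number of loci screened, which is the motive for trying additional tissues and treatments in the pilot study.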
Pilot Studies, Sample Sizes, and If the aim is to test whether two previously
Sampling Strategies identified sympatric groups (e.g., distinctive mor-
For sympatric outcrossing species, very small photypes or chromosome races) are reproduc-
samples are adequate so long as they include both tively isolated, loci with shared polymorphisms
species. For example, let us assume that a sample also may be useful. These can be examined one lo-
cus at a time, testing for significant differences in be incorrectly scored as fixed differences, but,
allele frequencies (Table 1) or significant deficien- from the point of view of assessing genetic diver-
cies of heterozygotes (Chapter 10). Alternatively, gence between allopatric populations, very differ-
several loci can be examined simultaneously us- ent allele frequencies indicate high genetic diver-
ing disequilibrium statistics (e.g., Ryman et al., gence and are operationally equivalent to fixed
1979, see Chapter 10). Such analyses usually will differences. Once again, however, because the
require much larger sample sizes than is the case variance of the estimate of between-population
if loci with fixed differences are used (A.D.H. variation depends mainly upon the number of
Brown, 1975). Moreover, disequilibrium statistics loci (Nei, 1978), every attempt should be made to
should be interpreted with caution because sig- maximize the number of loci screened.
nificant disequilibrium can result from many fac- The pilot study should consist of screening
tors other than the presence of two reproductively about five individuals of each of the two geo-
isolated groups (Hartl and Clark, 1989). graphic forms. If no fixed differences are found,
Once the presence of two species has been in- there is no point in screening additional individ-
dicated, a follow-up study usually is required in uals or additional populations since increasing the
order to find morphological features diagnostic population sample size can only reduce the esti-
for the species. Such studies often require multi- mate of fixed differences. Any additional effort
variate analyses. It is unlikely that the original should focus on examining additional loci. Only
sample will be sufficiently large for a full multi- where potentially diagnostic differences are found
variate morphometric analysis, especially when should additional sampling be contemplated.
the possible effects of age and sex are taken into Here, small samples (about five individuals for
account. However, additional specimens need to diploid organisms) should be screened, including
be "typed" only for the diagnostic loci. The sam- samples from each of several geographically
ple sizes required for this part of the study will widespread populations of each of the morpho-
depend on the subtlety of any morphological dif- logical forms.
ferences between the species.
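The proportion of fixed differences between two samples, and the reason additional individuals can only lower it, can be sketched as follows. This is a minimal illustration, not a method taken from the text: the locus names, allele labels, and function names are hypothetical, and a locus is treated as a fixed difference when the two samples share no allele.

```python
def is_fixed_difference(alleles_a, alleles_b):
    """True if the two samples share no allele at a locus."""
    if not alleles_a or not alleles_b:
        raise ValueError("each sample needs at least one scored allele")
    return set(alleles_a).isdisjoint(alleles_b)


def proportion_fixed(sample_a, sample_b):
    """Return (proportion, fixed_loci) over loci scored in both samples.

    Each sample maps a locus name to the list of alleles observed
    among the individuals screened.
    """
    loci = sorted(set(sample_a) & set(sample_b))
    if not loci:
        raise ValueError("no loci scored in both samples")
    fixed = [
        locus
        for locus in loci
        if is_fixed_difference(sample_a[locus], sample_b[locus])
    ]
    return len(fixed) / len(loci), fixed


if __name__ == "__main__":
    # Hypothetical pilot data for two geographic forms.
    pilot_a = {"locus1": ["a", "a"], "locus2": ["a", "b"],
               "locus3": ["c"], "locus4": ["a"]}
    pilot_b = {"locus1": ["b"], "locus2": ["a"],
               "locus3": ["d"], "locus4": ["a"]}
    print(proportion_fixed(pilot_a, pilot_b))

    # Screening more individuals of form A reveals allele "d" at locus3,
    # so that locus is no longer a fixed difference.
    larger_a = dict(pilot_a, locus3=["c", "c", "d"])
    print(proportion_fixed(larger_a, pilot_b))
```

Because a larger sample can only add alleles to the set observed at a locus, a locus scored as fixed in the pilot sample can lose that status but a shared locus can never gain it, which is why further effort is better spent on additional loci.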
Methods for assessing whether allopatric Hybrid Zones
populations represent distinct species are contro- The population genetics of hybrid zones is most
versial (e.g., Frost and Hillis, 1990; Davis and readily investigated if fixed genetic differences are
Nixon, 1992). For example, some have suggested found between the parental taxa involved in hy-
that a certain level of genetic divergence is re- bridization, although some information can be ob-
quired for populations to be considered as sepa- tained from polymorphic markers (see Chapters 4
rate species (e.g., Baverstock et al., 1977; Highton and 8). Consequently, every effort should be made
et al., 1989), although this approach has been in the pilot study to discover such fixed differences
strongly criticized (Avise and Aquadro, 1982; rather than rely on allele frequency differences.
Frost and Hillis, 1990). Another approach is to Three additional features of hybrid zones are
compare genetic divergence between two al- salient to the project planning stage. First, genetic
lopatric populations suspected of representing markers frequently show introgression over much
distinct species with that between similarly sepa- broader geographic areas than might be predicted
rated populations within each form (e.g., Jackson from morphology alone. Second, different genetic
and Pounds, 1979;
Moritz et al., 1993). It has been markers frequently show different levels of intro-
argued elsewhere that for studies of species gression. Third, uniparentally inherited non-re-
boundaries and relationships, the proportion of combining markers (such as mtDNA and cpDNA)
fixed differences between two samples is the most provide information of a different kind from
appropriate measure of genetic divergence diploid nuclear markers (Chapter 8).
(Richardson et al., 1986; see also Davis and Nixon, As a consequence of these considerations, it is
1992). Using this approach, shared polymor- useful to screen both nuclear diploid and uni-
phisms with very different allele frequencies may parental haploid loci for fixed differences. More-
over, the pilot samples should be taken from lo- levels and others work at lower levels. The num-
calities well away from the hybrid zone itself, and ber of specimens examined per group can be quite
should involve several populations of each of the small (even one), unless shared polymorphisms
parental taxa. among species are a likely possibility (e.g., closely
related species). If shared polymorphism is a rea-
Phylogenetic Relationships sonable possibility, then at least two larger sam-
ples (n = 10) should be included. Multiple popu-
Background lations should be examined for closely related
The single most important component of the pro- pairs (Smouse et al., 1991).
ject planning stage of a phylogenetic analysis is In principle, these specimens could be sub-
the decision as to which method(s) or sequence(s) jected in the first instance to any one of the treat-
are appropriate to the phylogenetic question at ments discussed in Chapters 4-9 to obtain some
hand. The method chosen must yield sufficient idea of the most appropriate method/gene to be
variation to be phylogenetically informative, but used in the main study. However, it would be
not so much variation that convergences and par- most efficient to start with a method already
allelisms overwhelm informative changes (see available in the laboratory or to begin with a rela-
Chapter 12). tively cheap and fast method. If no technique is
There is a considerable body of evidence sug- available locally, it would be wise to see if another
gesting that the rate of evolution at the molecular laboratory with one of the techniques already es-
level is at least similar (i.e., within an order of tablished will run the pilot specimens rather than
magnitude) across most groups for a particular establish a method de novo that ultimately turns
gene or set of genes. As a consequence, the out to be inappropriate for the major study.
method chosen will depend to a large extent on The pilot study will determine which tech-
the time frame over which divergence has oc- nique(s) or gene(s) are most appropriate for the
curred for the study group (see Chapter 12). group, and hence how many specimens are
When the phylogenetic study begins, the time needed for the main study, which tissues should
scale for the group in question probably will be be collected, and how they should be stored. The
largely unknown. Guesses based exclusively on preliminary data can be used to project the size of
morphology of extant forms are likely to be quite the final data set that will likely be needed to
misleading because rates of morphological evolu- achieve a well-supported estimate of the phy-
tion vary enormously between groups (Cherry et logeny (see the section "Hypothesis Testing and
al., 1982; Baverstock and Adams, 1987). Fossil the Parametric Bootstrap," Chapter 12). It also
data also must be interpreted with caution (A.C. should be possible at this stage to estimate the
Wilson et al., 1977). Therefore, the prime purpose cost of the study in terms of both consumables
of the pilot study should be to determine which and time.
molecular technique or techniques are appropri-
ate to the study group. Sample Sizes and Sampling Strategy
The number of specimens and populations
Pilot Study needed per group to resolve relationships among
For the pilot study, it is desirable to sample indi- groups depends critically on the amount of poly-
viduals from taxa representing the two extremes morphism relative to the extent of divergence. If,
of differentiation (i.e., two pairs of closely related for a given method or sequence, the study
taxa and two pairs of distantly related taxa). indicates that virtually all of the variation occurs
Again, it is desirable to evaluate the distribution among groups, then it is appropriate to use small
and nature of variation within and between samples per group (e.g., Gorman and Renzi,
groups for different types of loci (e.g., allozymes; 1979). However, even here it is necessary to in-
slowly versus rapidly evolving genes). It may be clude multiple populations of closely related
that some approaches work at higher taxonomic species, particularly if non-recombining se-
quences such as mtDNA or cpDNA are being one must be certain that the outgroup taxa are in-
used (see Neigel and Avise, 1986; Smouse et al., deed outside of the group under study. Including
1991). Thus, it may be most efficient to conduct multiple members of the sister group is useful for
the sampling and analysis in two steps-the first reducing long-branch attraction problems (A.B.
to identify clades of closely related taxa and the Smith, 1994), and multiple successive outgroups
second to add geographically remote populations may provide a minimal test of ingroup monophyly.
for each of the members of such clades. If the cho- Another important issue, particularly relevant
sen method and sequence reveal appreciable to assessment of phylogenies by DNA sequencing
polymorphism (relative to divergence) in some or or RFLP analysis, is how genes should be sampled.
all taxa, then larger sample sizes will be needed to One consideration is the number and distribution
estimate phylogeny (Archie et al., 1989). In this of nucleotides that should be sampled within a sin-
case, correct choice of method of analysis (see gle linkage group. Comparisons of phylogenies
Chapter 12) becomes even more important. For produced from various subsamples of whole ver-
example, different methods of coding polymor- tebrate mtDNA genomes have indicated that
phisms as character states are subject to very dif- blocks of contiguous sites are less likely to repro-
ferent levels of sampling error (Swofford and duce the whole-genome tree than samples of
Berlocher, 1987; Chapter 11). equivalent size drawn from nucleotide sites dis-
The number of species that must be included persed throughout the genome, apparently be-
to obtain an accurate phylogeny represents a cause of heterogeneity among regions in variabil-
trade-off between sampling enough so that char- ity and base composition (Cummings et al., 1995).
acter changes can be accurately reconstructed (e.g., Thus, restriction site analysis or sequencing of mul-
splitting long branches; Chapter 11), but not so tiple short stretches from sequence-tagged sites can
many that phylogenetic analysis becomes unwieldy provide more power than sequencing of longer
(e.g., M.W. Chase et al., 1993). It has long been rec- contiguous segments. Consistent with this, Hillis et
ognized that phylogenetic inference is sensitive to al. (1994a) found that restriction sites performed
the number and phylogenetic distribution of better than a similar number of variable sites
species included (e.g., Lanyon, 1985; Lecointre et within a continuous sequence at estimating a known
al., 1993). Sampling of species is likely to be an it- phylogeny of T7 viruses. In recombining genes, se-
erative procedure, adding new taxa within groups quencing of long continuous stretches will also in-
as discussed above in relation to populations (e.g., crease the likelihood of spanning sites of recombi-
Moritz et al., 1992a). At the minimum, we suggest nation and thus obtaining reticulate gene trees.
that there should be replication of samples one A second consideration is the number of
level below the level of inference. For example, if genes that should be analyzed. Particularly for
the relationship of families is being examined, at closely related species, any one gene tree can dif-
least two (non-sister) genera should be examined fer from the species tree because of retained an-
per family where this is possible. cestral polymorphisms (Pamilo and Nei, 1988;
At least one outgroup taxon should be in- Wu, 1991; Doyle, 1992; Hey and Kliman, 1993). In
cluded in the analysis to root the tree (W.P. Maddi- a study of pinniped seals, Slade et al. (1994) found
son et al., 1984). In the absence of a suitable out- that trees for individual nuclear (intron) genes dif-
group, the data for the ingroup can be used to fered, but that a tree based on concatenated nu-
produce an unrooted tree, which provides valuable clear sequences was congruent with both the
information, but it is usual to aim for a rooted tree mtDNA tree and the traditional phylogeny. One
(Chapter 11). The use of more than one outgroup conclusion drawn from this study was that, for a
will be useful for jackknifing the final data set given amount of effort, it may be preferable to
(Chapter 11). The outgroups should be as closely combine several short sequences from unlinked
related as possible to the ingroup (preferably in- nuclear genes than to maximize the information
cluding at least one member of its sister group), but obtained from a single gene or linkage group.
Finally, the methods discussed in Chapters
4-9 fall into two broad categories: those that, by CONCLUDING REMARKS
their very nature, yield distance data (e.g., DNA
hybridization), and those that can yield character- We have attempted to highlight the necessity of
state data (e.g., allozymes, chromosomes, frag- proper project planning in the use of molecular
ment methods, and nucleic acid sequences). methods in systematics. Some of the common pit-
Methods that yield distance data alone require a falls can be avoided by careful project planning
different project strategy from those that yield and the judicious use of pilot studies. A particu-
character-state data. For character-state data, the larly common pitfall is to attempt to include all
cost (of both time and chemicals) goes up linearly three applications of molecular methods in sys-
with the number of taxa, whereas for distance tematics-population structuring, species bound-
data (where a matrix is required), the cost goes up aries, and phylogenetic reconstruction-into a sin-
with the square of the number of taxa. Therefore, gle study. Yet these three applications, although
for methods that yield distance data a sensible using similar techniques, have such different
strategy might be to divide the project into several strategies that attempts to combine them are al-
matrices, one providing the major branches for most certain to be inefficient, or at worst fail on all
the group and others dealing with lower-order re- three.
lationships.
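The difference in how effort scales for the two kinds of data, and the saving from dividing a distance study into several matrices, can be made concrete with a small sketch. It rests on assumptions that are not in the text: one unit of effort per taxon for character-state data, one unit per unordered pair of taxa for distance data, and a hypothetical split design in which a backbone matrix holds one representative of each group and a separate matrix covers the taxa within each group.

```python
def character_state_units(n_taxa: int) -> int:
    """Effort for character-state data: one unit per taxon (linear)."""
    return n_taxa


def distance_units(n_taxa: int) -> int:
    """Effort for a full distance matrix: one unit per unordered pair."""
    return n_taxa * (n_taxa - 1) // 2


def split_design_units(group_sizes) -> int:
    """Effort when the project is divided into several matrices.

    One backbone matrix compares a single representative of each group;
    one further matrix per group covers the taxa within that group.
    """
    backbone = distance_units(len(group_sizes))
    within = sum(distance_units(size) for size in group_sizes)
    return backbone + within


if __name__ == "__main__":
    n = 40                 # hypothetical number of taxa
    groups = [5] * 8       # hypothetical split: 8 groups of 5 taxa
    print("character-state:", character_state_units(n))
    print("single distance matrix:", distance_units(n))
    print("backbone plus group matrices:", split_design_units(groups))
```

The split design gives up direct comparisons between non-representative members of different groups, so it suits projects in which the groups themselves are already reasonably well established.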
Chapter 3
Collection and Storage of Tissues
Herbert C. Dessauer, Charles J. Cole, and Mark S. Hafner
INTRODUCTION
Research in molecular systematics requires plasmid, cell, or tissue samples in
which proteins and nucleic acids are maintained in the structurally intact physi-
ologically active state. In field work to obtain such material, the collector is con-
fronted with unique challenges. Field equipment often must include unusual
items such as liquid nitrogen tanks or dry ice, because freezing is the most effec-
tive method for preserving the widest variety of tissue constituents. Even after
obtaining the samples, collectors may encounter difficulties transporting them.
For example, airlines often are reluctant to accept liquid nitrogen as baggage.
Also, collectors working in foreign countries may find that customs officials re-
quire special permits for the importation of biologically active materials. In this
chapter we offer advice on meeting the challenges of collecting, packaging, and
preserving tissues, and we include a listing of collections of material for molecu-
lar studies and emphasize the need to develop such synoptic collections of these
materials.
REGULATIONS GOVERNING loci was obtained from individual lice weighing

ACQUISITION OF SPECIMENS less than one milligram (Hafner and Nadler,
1988). As little as one microliter of blood or a few
Collectors should become familiar with local, nanograms of muscle were sufficient to obtain im-
state, national, and international laws and regula- munological evidence on the evolution of taxa for
tions, and they should allow adequate lead time which antisera existed (Maxson and Szymura,
to obtain the necessary permits. Regulations con- 1984). Nanogram quantities of DNA amplified
cerning the collection and transport of biological with the polymerase chain reaction (Mullis and
materials are designed to safeguard the general Faloona, 1987) yielded sufficient material to ob-
public health, protect domestic crops and live- tain sequence data useful in systematic studies
stock, and control illicit traffic in endangered and (Paabo et al., 1989). The kinds and quantities of
threatened species. Scientists intending to import tissues preserved depend on the needs of the in-
frozen tissues and other specimens must adhere dividual investigator (see Chapter 2); however,
to all applicable wildlife regulations for the coun- we recommend that collectors maximize diversity
tries involved. and quantity of tissue types to help develop syn-
Scientific collecting permits usually are neces- optic collections.
sary for sampling natural populations; these are
obtained from state and national fish and wildlife, Packaging
forestry, or conservation offices (for examples of Collectors should be aware of the importance of
forms used by United States institutions see keeping instruments, containers, and reagents
Dessauer and Hafner, 1984). Six months may be clean, and of placing tissue samples in a cold en-
required to obtain permits for collecting; this may vironment and away from light as quickly as pos-
be extended to a year or more if international sible. Tissues should be packaged in plastic cry-
travel or endangered species are involved. Differ- otubes (Figure 1A), plastic bags (Figure 1B), or
ent documents may be required to collect, to wrapped tightly in aluminum foil. Packages
travel in certain areas, to export from the country should have space for tissue expansion during
of origin, and to import into the researcher's freezing but otherwise have as little unused space
country. To verify the animal's good health, some (air pockets) as possible to minimize drying of tis-
countries require a quarantine period following sues and denaturation of macromolecules. Plastic
importation of certain living organisms. bags are suitable for storage in electric ultracold
freezers, but generally are not suitable for immer-
sion in liquid nitrogen. If tissues are placed in liq-
uid nitrogen, care must be taken to exclude nitro-
REMOVING AND PRESERVING gen from the package. Otherwise, when removed
TISSUES IN THE FIELD from the dewar flask, aluminum foil packages ex-
pand and may burst; similarly, plastic tubes may
shatter.
General Procedures To store tissue samples that do not fit conve-
The collection and handling of tissues for use in niently into manufactured plastic tubes, we place
future molecular studies can be carried out by in- them in subdivided packets of aluminum foil
dividuals with minimal training (Dessauer and (Figure 1C). A rectangular piece of heavy-duty foil
Hafner, 1984; N.K. Johnson et al., 1984). Although is folded in half lengthwise, and open subdivi-
different groups of organisms present different sions are then created with two or three small
problems, many procedures are common to all folds made back on each other at regular inter-
groups. Tissues should be sampled while the or- vals; we make as many attached packets as neces-
ganism is alive or as soon after its death as possi- sary for the
different tissues being sampled from a
ble. Even a tiny amount of tissue is valuable. For specimen. After adding samples, we press out the
example, electrophoretic evidence on 14 protein air, seal by folding tightly, and wrap the package
Figure 1. Materials and supplies that are useful for preparing and storing frozen tissues.
(A) A plastic "French Straw" (right), assorted plastic containers, and a waterproof ink
marker (top). The marker works on paper, plastic, and aluminum foil; the ink withstands
freezing even in liquid nitrogen, so it can be used for both external and internal (backup)
labels. The plastic straw and tubes can be used in liquid nitrogen if the lids are sealed
well, but the small tubes with pop-off lids (bottom row) should be packaged in tightly
wrapped foil or a larger tube for maximum security. (B) Plastic bags; these are acceptable
for storage in electric ultracold freezers but are not recommended for use in liquid
nitrogen. (C) Packets made by folding aluminum foil (see text). After tissues are sealed
and folded within these packets (H, heart; L, liver; In, intestine and stomach; K, kidney;
Sm, skeletal muscle), the package is wrapped in extra-heavy-duty aluminum foil, labeled
by pressing the foil lightly with a ballpoint pen, and dropped into liquid nitrogen.
(D) Glass tubes for collecting blood, sealant, and an example (bottom) of capped tubes
being slid into a labeled piece of corrugated cardboard (see text for freezing
instructions). (E) Hand-foot centrifuge for field work in areas without electricity
(Dessauer et al., 1983). (F) Plastic box (with gasket) for long-term storage of samples in
an electric ultracold freezer.
within a sheet of extra-heavy-duty aluminum foil. clear impression in aluminum foil. Experiment by
Such packets are readily customized to individual exposing your materials to liquid nitrogen or
needs, they can be folded by assistants in advance other ultracold conditions followed by thawing.
of field work, and they transport efficiently in the In addition, we recommend use of a backup sys-
flat (unopened) state. tem (e.g., number written in ink and also etched
on the tube with a diamond-tipped scribe; labels
Documentation of Samples both inside and outside the package).
Careful documentation of samples is critical in all On occasion, it is not possible to cross-refer-
phases of work. A sample in a cryotube or other ence a tissue sample with a permanently pre-
package is essentially useless if it has lost its label. served voucher specimen, such as when a blood
It is important to: (1) label samples and specimens sample is taken from an individual in a zoo or
so that no information is lost in wrapping, trans- from one that will be released after temporary re-
port, storage, and entering of data into permanent straint in the field. Under such circumstances,
records; (2) cross-reference the tissue sample with photograph the individual to document its iden-
field collection data for the original specimen; (3) tification and record its tag, band, or other identi-
label containers, laboratory notebooks, and ex- fying number, if known.
perimental samples during study; and (4) list
specimens examined in research publications. Ide- Preservation
ally this citation will include the museum cata- As soon as possible after collection and packag-
logue number for the voucher specimen (e.g., ing, most tissues should be dropped directly into
study skin, skull, preserved or dried body, liquid nitrogen or covered with dry ice. Field
pressed leaves) housed in a permanent repository. workers should be aware that liquid nitrogen is
Although the museum or herbarium number may potentially hazardous. Quick-freezing in liquid ni-
be assigned long after the tissue sample was col- trogen generally shatters fragile glass hematocrit
lected, all records pertaining to the sample (e.g., and microtubes filled with tissue fluids; such
field catalogue data, notes, and photographs) tubes must be frozen slowly before being sub-
should be cross-referenced with the permanent jected to ultracold temperature. In emergencies, a
voucher number. salt-ice mixture will substitute as a temporary re-
We recommend that individuals collecting tis- frigerant. Fragile capillary tubes and microtubes
sue specimens in the field continue to use tradi- can be inserted into the slots of corrugated card-
tional collector's catalogues (e.g., Remsen, 1977; board (Figure 1D) or into a plastic straw such as
Herman, 1980). These usually are organized so those used for sperm storage (Figure 1A) for pro-
that each specimen receives a unique number pre- tection during long-term storage.
ceded by the collector's initials. The catalogue en- Fresh, unfrozen tissue gives the highest yield
try should indicate the type(s) of tissues sampled. of animal mtDNA (Chapter 8). Tissues have been
The package or tube containing the tissues should maintained successfully for 7-10 days unfrozen,
be marked clearly with the collector's initials and immersed in a mannitol-sucrose buffer contain-
field number. The name of the specialist who pro- ing 100 mM EDTA. Soft tissues and especially
vided identification of a specimen is an important oocytes, which contain 100 times more mitochon-
part of the documentation. dria per cell than somatic cells, are the best
Great care should be taken in labeling tubes sources of mtDNA (Lansman et al., 1981; J. C.
and packages containing tissue samples. The fol- Avise, personal communication).
lowing items have been reliable: (1) high-quality Cryopreservation is not required for tissues
bond paper and a drafting pen with permanent, collected for certain purposes. Although not rec-
waterproof, non-smearing ink; (2) felt-tip pen ommended for long-term preservation, immersion
with permanent ink that adheres to plastic tubes of tissues in an aqueous solution containing 2% 2-
or packages; and (3) a ballpoint pen that leaves a phenoxyethanol preserves the physicochemical
and catalytic properties of many enzymes for at Procedures Unique to

least 3 weeks (Nakanishi et al., 1969). Plasma al- Animal Tissue Collection
bumin, which is stable in alcohol (Schwert, 1957),
can be isolated from samples stored in phe- Animals collected should be handled with care
noxyethanol at room temperature for at least a consistent with appropriate guidelines for their
year. Lenses of vertebrate eyes, collected for se- welfare. Quick-freezing the entire organism is the
quence studies of a-crystallin, can be preserved in fastest method. Unfortunately, this often dimin-
a saturated solution of guanidine hydrochloride. ishes the general usefulness of the tissues and
Hair and feathers of dry museum study skins rep- their stability during long-term cryopreservation.
resent valuable sources of keratins. For example, subsequent dissection of individual
Cryopreservation of tissues is not absolutely organs from the frozen specimen is difficult,
required if the collection is solely for studies of blood plasma cannot be retrieved free of red-cell
DNA. Recent studies even show that some frag- proteins, and proteolytic enzymes and bacteria in
ments of highly degraded DNA are recoverable digestive organs may cause destructive changes
from dried or traditionally preserved museum in biopolymers. We recommend that tissues be
specimens and directly from fossils (see section on collected and packaged individually when larger
"Stability of Macromolecules during Long-term animals are sacrificed. With vertebrates, blood,
Storage"; Houde and Braun, 1988; Paabo et al., heart, liver, kidney, stomach, intestine, a sample
1989; Golenberg et al., 1990; DeSalle et al., 1992). of skeletal muscle, and perhaps other organs
Preservation of nucleic acids depends primarily should be collected. Red blood cells of all verte-
on the denaturation or inhibition of tissue nucle- brates except mammals and some amphibians are
ases with ethanol or EDTA, respectively. excellent sources of DNA; the buffy coat of white
The following procedure is recommended for blood cells is a good source of mammalian DNA.
the non-cryogenic preservation of tissues for DNA Avoid contaminating the tissues with bile salts,
analysis (Sibley and Ahlquist, 1981b; C.G. Sibley, which are surface tension-reducing agents that
personal communication). Place the tissue in a could adversely influence tissue stability. How-
shallow dish and cut it into pieces of about 2-4 ever, bile may be saved as it has been useful for
mm in diameter (the smaller the better). Cover the systematic work (Haslewood, 1967; Tammar,
minced tissue with twice its volume of ethanol 1974).
(preferably 95%, but 75% can be used). After al- Valuable material for molecular studies-
lowing the alcohol to diffuse through the tissue for such as blood, hemolymph (M.G. Boyden, 1967),
one or two hours, replace the original alcohol, eggs (Sibley et al., 1974), snake venom (Russell,
which has been diluted with tissue water. Do not 1980), feather pulp (Marsden and May, 1984), and
use a blender or tissue homogenizer, as such treat- muscle biopsies-can be obtained without sacri-
ment causes excessive degradation of DNA. After ficing the animal. Methods for collecting tissues
soaking in strong alcohol for at least two days, from most types of invertebrates are described in
place the moist tissue, covered with about twice its C.A. Wright (1978) and in papers listed in a bibli-
volume of ethanol, in a plastic bag or other con- ography of immunotaxonomic literature (Leone,
tainer for shipment and storage. Isopropanol, 1968).
which usually is available as rubbing alcohol, and
propanol can substitute for ethanol. The addition Anesthesia
of EDTA (about 100 µmol per liter) to the alcohol The collector often requires anesthetic drugs to
will further stabilize the nucleic acids. In the labo- immobilize or euthanize animals. Background in-
ratory, alcohol-preserved tissues may be stored at formation on anesthetics, dose levels for a wide
room temperature, but they are stable for longer variety of both homeotherms and heterotherms,
periods if kept cool. Techniques for preserving and ways to assess depth of anesthesia are given
DNA have been summarized by Arctander (1988). in H.S. McDonald (1976) and Lumb and Jones
(1984). Inhalation anesthetics such as halothane Blood and Hemolymph Collection

and ether are valuable for use under laboratory To prevent clotting, we recommend use of he-
conditions, but injectable drugs are more conve- parin or EDTA. One milligram of heparin or
nient in the field and, for most purposes, in the about one millimole of EDTA is sufficient to pre-
laboratory. Of these, ketamine, pentobarbital vent clotting of 100 ml of blood. Blood diluted
(NembutalTM), and tricaine are most widely used. with STE prior to freezing preserves DNA and
Ketamine is the drug of choice for many proce- many proteins. Small samples can be obtained
dures involving a wide variety of species; it is with only minor discomfort to the animal by
easy to administer, induction of anesthesia is "milking" blood from the tail or toe into a tube
smooth, and dosage has a wide margin of safety. coated with anticoagulant (Figure ID). Microliter
Dosage levels of these drugs vary widely de- to milliliter samples also have been collected from
pending on the species, size, sex, metabolic rate, the retroorbital sinus of mammals (V. Riley, 1960).
and body temperature of the individual organism, Larger samples can be collected directly from the
and with the mode of injection of the drug. In heart, or from large, easily located vessels such as
general, small animals require higher doses rela- the femoral or jugular veins. Tmmunologists com-
tive to body size than large ones. Induction and monly collect rabbit blood from vessels in the ear.
recovery from anesthesia are slower in het- Before the needle is inserted, the vessels are
erotherms than in homeotherms. Recovery of het- caused to swell by wiping the ear with a mild ir-
erotherms from anesthesia is facilitated by raising ritant such as xylene. This approach probably will
their body temperature to increase metabolic rate. work with other mammals with large, highly vas-
In selecting an anesthetic, remember that all are cularized ears. Blood of small birds is easily sam-
poisons that may influence subsequent ~nolecular pled from a wing vein (Arctander, 1988); blood of
experiments. For example, rotenone, which turtles and crocodilians may be sampled from a
would appear to be an excellent drug for collect- cervical sinus. Blood of large reptiles can be taken
ing fish for molecular investigations, kills by in- from caudal vessels (Gorzula et al., 1976).With
hibiting cellular oxidation involving NAD; this in- the animal on its back, the needle is inserted
activates such enzymes as lactate dehydrogenase through the skin at a point slightly distal to the
and malate dehydrogenase (Stecher et al., 1968). vent at the midline. When the needle contacts a
Fortunately, ketamine, pentobarbital, and tricane vertebra, caudal vessels are entered and blood is
seem to have no detrimental effects on the pro- drawn into the syringe. We have obtained as
teins and nucleic acids commonly studied by mol- much as 20 ml of blood from crocodilians with
ecular biologists. this method.
We recommend the following procedures Heart puncture, particularly for heterotherms
for use of anesthetic drugs in the field. Anes- such as snakes and crocodilians, is easily accom-
thetic doses of pentobarbital for a wide selection plished without causing injury to the animal. The
of homeotherms and heterotherms fall between position of the heart may be located by pulsations
20 and 30 mg/kg body weight when given invisible on the anteroventral body wall. A doppler
traperitoneally; doses of tricane for tetrapods ultrasonic device is also useful for locating the po-
fall between 100 and 200 mg/kg body weight. sition of the heart in difficult cases (Brazaitis and
TO anesthetize fish and other aquatic species, tri- Watanabe, 1982).Cardiac puncture is more com-
cane may be added to water in a ratio of 1:2000 plex in turtles, but one can approach the heart lat-
to 1:5000 parts drug-to-water. To lessen the risk erally, directing the needle through the soft tissues
of death from any of these drugs, we recom- between the plastron and carapace. A more com-
mend treating the specimen with half the antici- mon practice is to tap the heart through a hole in
pated dose, followed by additional drug if the anteromedial corner of the right abdominal
needed. To euthanize animals with these drugs, scute of the plastron.
usually double or triple the dosage required for Blood cells should be separated from plasma
anesthesia. prior to freezing. Commercial hand centrifuges or
a lightweight plastic centrifuge (Figure 1E; Dessauer et al., 1983) are useful for
separating blood cells from plasma under field conditions. Methods for collecting hemolymph
are described in bulletins 2, 5, and 15 (crustaceans), 2 (molluscs), and 37 (Limulus) of the
Serological Museum (A. Boyden, 1948-1978).

Venom Collection
Collecting venom from snakes and other organisms is a dangerous task that should not be
taken lightly. The fangs of the snake are inserted through a rubber or plastic membrane
stretched across a collecting container. Many snakes discharge venom upon piercing the film.
Additional venom may be obtained by gently massaging the region over the venom glands,
avoiding undue compression which may injure the glands and cause bleeding. If carefully
handled, snakes can be "milked" repeatedly at 3-week intervals (Russell, 1980).

Procedures Unique to Plant Tissue Collection
Leaves, pollen, seeds, fern spores, and tubers of vascular plants have been preserved
successfully for subsequent use in studies at the molecular level (Jensen and Fairbrothers,
1983). Many papers in the Serological Museum Bulletin (A. Boyden, 1948-1978) include
valuable information on collecting and handling plant material. As with animal tissues,
plant tissue samples must be packaged properly and labeled with the collector's field
number.

Seeds, pollen, and fern spores should be harvested only when mature. Viability of mistletoe
seeds was greatest when collected during their period of dehiscence (Nickrent et al., 1984).
Fairbrothers (in Dessauer et al., 1984) recommended the following protocol for preserving
seeds from most plants: (1) remove fleshy portion; (2) dry; (3) place in vacuum-sealed
container; and (4) store in the dark at or below freezing. Pollen and fern spores may be
treated in the same manner after screening to remove debris.

Young, actively growing vegetative tissues are more valuable for molecular study than are
mature leaves (see Werth et al., 1985a). Leaves of certain taxa show a "senescence"
phenomenon wherein their proteins disappear during seasonal aging. Immediately upon
collection, leaf cuttings should be rinsed in distilled water, packaged, and frozen rapidly.
Rapid freezing is particularly important when collecting leathery leaves, which tend to rot
rather than dry. Proteins in leaves ranging from ferns to oaks have survived freezing for at
least 3-4 years.

For maximum yield of DNA, leaves should be pressed, dried overnight at 42°C, then stored at
room temperature (see also Chapter 9). The most important role of drying plant tissues may
be the prevention of rotting, rather than the preservation of DNA. Although this method
appears to preserve the integrity of DNA for several months, dried samples should be frozen
at -70°C for long-term storage (Doyle and Dickson, 1987).

Chemical treatment to remove lipids and phenolic substances from freshly collected,
vegetative tissue prior to long-term storage probably should be avoided. Doyle and Dickson
(1987) found that treatment with preservatives used in anatomical studies, or with solvents
such as ethanol and chloroform-ethanol, tended to cause degradation of DNA. Similarly,
Coradin and Giannasi (1980) found that chemical treatment interfered with subsequent
analyses of flavonoids.

Collecting Cell Lines
Cryopreservation of living cells requires special collecting, freezing, and storage
procedures if cells are to survive intact (Stowell, 1965; Watson, 1978; Hay, 1979; Hay and
Gee, 1984). Cell damage is most likely to occur during the freezing and thawing process.
Some cells will survive freezing and thawing, if pretreated with a cryoprotectant such as
glycerol or DMSO. For every species, tissue, and freezing system there is an optimum
cryoprotectant concentration and freezing rate (Mazur, 1970). The cryoprotectant must be
concentrated enough to protect the cells from freeze damage, yet dilute enough to avoid
chemical injury to cells. Ideally, the rate of freezing should be controlled precisely, with
rate depending on variables such as species, tissue type, size of the sample, cryoprotectant
used, and the system used to
freeze the sample. Generally, a cooling rate that is too fast results in death due to
formation of ice crystals within the cells; too slow a rate causes death from the chemical
consequences of solute concentration. Nevertheless, it is possible to store and recover low
yields of viable cells without a highly specialized freezing procedure.

Semen samples and tissue biopsies are easily obtained under field conditions without
permanent injury to the donor animal. The equipment and supplies needed to establish proper
freezing conditions in the field are not elaborate: alcohol, freezing medium, and liquid
nitrogen in an appropriate tank (Maure, 1978). Plastic "French Straws" have been useful as
containers for storage of semen. If the freeze rates used are less than optimal,
manipulation during the thawing process may increase the yield of viable cells (Mazur,
1970).

The following tissue biopsy protocol is recommended by Hay and Gee (1984) for use in the
field. The tissue is collected aseptically, minced into fragments of about 1 mm diameter,
and placed in a culture medium containing 10% serum, antibiotics, and 10 to 12% DMSO. The
tissue is allowed to equilibrate with this "freeze medium" for 2-3 hours in an ice bath or
refrigerator, if available. The temperature is then lowered slowly to -50°C (approximately 1
degree per minute), after which time the tissue container is dropped into liquid nitrogen.
The gradual cooling procedure may be carried out in the nitrogen dewar by suspending the
sample in the cold space above the liquid nitrogen. Successful cultures of some tissues have
been established using samples equilibrated in the freeze medium and frozen immediately in
liquid nitrogen (H.A. Taylor et al., 1978; R.J. Baker, personal communication).

TRANSPORT OF TISSUES FROM FIELD TO LABORATORY OR BETWEEN LABORATORIES

Shipping Regulations
Tissues are usually transported either in styrofoam boxes packed with dry ice or in liquid
nitrogen containers. Small samples are conveniently carried in personal baggage in thermos
bottles containing refrigeration packs. Dry ice (referred to as "carbon dioxide, solid") and
liquid nitrogen are classified as Restricted Articles by the International Air Transport
Association, and the shipper must be aware of all pertinent regulations (see Dangerous Goods
Regulations, 30th Edition, effective 1 January 1989).

Dry ice containers are usually accepted as baggage by the airline agent if the "Shipper's
Certification for Restricted Articles" is attached to the package. Dry ice is designated as
Hazard Class ORM-A, and packages containing dry ice must be so marked. No more than 200 kg
of dry ice may be shipped in a single package.

Liquid nitrogen in nonpressurized, metal dewar flasks is also authorized for shipment by
air. No more than 50 kg per flask can be shipped on an aircraft carrying passengers. The
flask must be marked "Nitrogen (Liquid, Nonpressurized)" and must be further labeled to
discourage loading or handling in any position other than upright. The upright position
should be indicated prominently by arrows and the wording KEEP UPRIGHT placed at 120-degree
intervals around the container. It must also be prominently marked DO NOT DROP - HANDLE WITH
CARE. For shorter trips, the liquid nitrogen can be poured out and the tank checked as
baggage. Most standard dewars are so well insulated that they will maintain a large mass of
tissues frozen for many hours, sometimes days, after the liquid nitrogen has been poured
off. If a tank contains few specimens, add plastic tubes filled with water to the tank to
provide supercooled ice before pouring off excess nitrogen. A "Dry Shipper," which contains
an absorbent that keeps nitrogen from spilling, alleviates problems of air transport of
nitrogen in dewars; storage in some models is effective for up to 3 weeks. For maximum
duration it is necessary to saturate the "Dry Shipper" initially with liquid nitrogen and
keep it in the upright position. Although there is no danger of spills in the horizontal
position, the boil-off rate is greater so that holding time is reduced.

Sources of Liquid Nitrogen and Dry Ice
Scientists conducting field work often have difficulty locating sources of liquid nitrogen
and dry
ice. Universities, hospitals, welding supply companies, industrial gas suppliers, and mining
operations are useful contacts in seeking sources of nitrogen and dry ice. A partial listing
of such sources in different areas of the world is given in Dessauer and Hafner (1984).

STORAGE OF TISSUES ON RETURN FROM THE FIELD

The majority of tissue samples are stored in an electric ultracold freezer (-70 to -150°C),
on dry ice, or in liquid nitrogen. Ease of access of samples within the freezer or nitrogen
tank is of special importance. Samples can be stored in numbered moisture-proof boxes
(Figure 1F). For easy retrieval, all specimens of a given taxon should be stored together;
color codes on the outside of boxes facilitate the identification of contents. Within boxes
each sample is identified by the collector's field number. A listing of the holdings in each
freezer, complete with box number, contents of the box, and location in the freezer is
maintained and routinely updated as samples are moved, used, granted, or discarded.

Ultracold storage space is very expensive to purchase and maintain, and it is important that
materials be stored in a space-efficient manner. Thus, access and inventory procedures for
frozen tissue collections should be extremely well organized. Freezers should be opened as
rarely as possible; ultracold freezers are sensitive to even brief periods of temperature
warm-up, and every second that a freezer door is open while one searches for a particular
sample is energy-consuming and could eventually contribute to freezer failure. One must know
exactly where each sample is located before opening the freezer. A map on the door of the
freezer facilitates location of individual boxes. A "working freezer" should be used for
storage of tissues that are currently being analyzed.

On a cost-per-sample basis, an ultracold freezer provides the most convenient and efficient
method for long-term storage of large numbers of samples. Of the two designs (chest and
upright models), chest models maintain more constant temperatures during use and are less
prone to mechanical failure; upright models occupy less floor space and freezer boxes are
more easily retrieved. Freezers should be monitored at least once each day and should be
equipped with both local and remote systems that will sound alarms in the event of
electrical or mechanical failure. During holiday periods, special arrangements must be made
to ensure that freezers are monitored daily. Readily visible notices should be posted in
freezer areas, indicating persons (and telephone numbers) to be contacted in case of freezer
malfunction. Some form of backup storage system (other freezers, liquid nitrogen, or dry
ice) should be available in the event of freezer failure, and a generator should be
available in the event of a prolonged blackout.

Ideally, liquid nitrogen is better than ultracold freezers for long-term storage because of
its much colder temperature (-196°C). However, if large numbers of samples are stored in the
collection, it may be difficult to retrieve individual samples. Also, continual
replenishment of evaporated liquid nitrogen may become costly. A small number of samples
stored in liquid nitrogen tanks is relatively easy to organize for efficient retrieval.
Liquid nitrogen freezers are available that are large enough to organize up to 15,000 2-ml
cryotubes with easy access to any tube.

Household refrigerators and freezers may be used to store freeze-dried blood fractions,
acetone powders, seeds and pollen, and enzymes in strong salt solutions or glycerol.
Isolated samples of DNA and bacterial cultures containing DNA for cloning purposes also may
be stored in a household freezer (Sambrook et al., 1989). However, frost-free appliances
should be avoided, because of the danger that biomolecules may degrade during warming
cycles.

STABILITY OF MACROMOLECULES DURING LONG-TERM STORAGE

All biological macromolecules spontaneously decompose (Lindahl, 1993). Many proteins,
however, are far more stable than is generally assumed (Sensabaugh et al., 1971a,b; Dessauer
and Menzies, 1984). For example, remnants of blood samples used in Nuttall's (1904)
classical immunological study of mammalian relationships
served as experimental material for an important test of protein stability. Keilin and Wang
(1947) found that hemoglobin, carbonic anhydrase, catalase, glyoxalase, and choline esterase
remained 75-85% active in blood samples that had been kept in the dark under aseptic
conditions at room temperature for approximately 42 years.

Certain proteins retain activity for surprisingly long periods, even in tissues exposed to
adverse conditions. Proteins in tissues with high proteolytic activity, such as those in
viperid snake venoms, retained their activities for extended periods (Russell et al., 1960;
Russell, 1980). Of 17 proteins commonly examined in electrophoretic studies, only alcohol
dehydrogenase was undetectable in mammalian tissues stored at room temperature for 12 hours
after death (Moore and Yates, 1983). Sensabaugh and colleagues (1971a,b) found that 8 of 11
proteins tested in an 8-year-old sample of dried blood retained at least some activity.
Plasma albumin and esterase activity were detectable in mammal skins stored up to 16 years
as standard museum preparations (J.L. Patton, personal communication). Albumin in muscle
tissue from a mammoth frozen in northern Siberia for approximately 40,000 years (tissue
which had probably undergone numerous freeze-thaw cycles) maintained sufficient
immunological specificity to demonstrate that the species is properly classified with the
elephants (Lowenstein et al., 1981). Certain proteins in acetone powders and freeze-dried
tissues retain their structure and activity at room temperatures for short periods of time
and at freezer temperatures for much longer periods. Endocrinologists and botanists find
that acetone powders retain many activities of the parent tissue. Lyophilization is commonly
used to preserve immunoglobulins. Mellor (1978) suggests that vacuum drying of frozen
tissues over a desiccant such as silica gel is less likely to damage sensitive proteins than
is the usual method of freeze drying. Dried seeds of many taxa have remained viable for up
to 10 years when stored in the dark, at or below 0°C. Macromolecules in dried pollen and
fern spores are stable for at least four years when stored at or below -30°C (J.O. Anderson
et al., 1978).

Decreases in activity traceable to changes in chemical composition and molecular
conformation may arise during storage. These are often detected as modifications in
electrophoretic banding patterns (McWright et al., 1975). Common modifications include:
oxidation of sulfhydryl residues (Chilson et al., 1965; Hopkinson, 1975; Harris and
Hopkinson, 1976), oxidation of ferrous iron of heme proteins (Jiménez-Marín and Dessauer,
1973), deamination of asparagine residues (Wulf and Cutler, 1975), rearrangements of
subunits of multimeric enzymes (Chilson et al., 1965), and formation of conformation isomers
(Kitto et al., 1966; Dawson et al., 1967).

The addition of sulfhydryl reagents (e.g., dithiothreitol or mercaptoethanol), coenzymes,
and activating ions to homogenates of previously frozen tissues stabilizes certain enzymes
and may reverse some of the adverse effects of long-term storage (Chilson et al., 1965;
Harris and Hopkinson, 1976). Sucrose, mannitol, or glycerol are often added to homogenizing
fluids and diluents to stabilize proteins; BSA may be added to minimize surface denaturation
of proteins in the highly dilute solutions of antigens and antibodies used in
microcomplement fixation studies of non-mammalian albumins (Champion et al., 1974).

Nucleic acids, although less reactive chemically than proteins, are vulnerable to oxidative
and hydrolytic changes. Lindahl (1993) reviews the evidence on their instability and
structural decay over time. Even though reaction rates are relatively slow, structural
changes may be extensive after long-term storage under unfavorable conditions. RNA is very
susceptible to hydrolysis because of the presence of the 2'-hydroxyl group. The absence of
the 2'-hydroxyl group stabilizes the 3'-5' diester bond of DNA, but increases the
susceptibility of the N-glycosyl bond of its purine residues to hydrolytic cleavage. After
purine bases are lost the DNA chain is weakened and may undergo cleavage. Cytosine and
5-methylcytosine are susceptible to deamination. Exposure of DNA to oxygen may cause the
oxidation of guanine to 8-hydroxyguanine, a nitrogen base that pairs preferentially with
adenine rather than cytosine. Additional oxidative changes include the formation of purine
dimers and/or cyclopurine structures between the base and sugar phosphate.
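As a rough illustration of how slowly these hydrolytic changes accumulate, the sketch below works through the cytosine deamination figures quoted in this section: a half-life of about 200 years per residue in single-stranded DNA at 37°C and pH 7.4, and a rate in double-stranded DNA of only 0.5-0.7% of that. The function names are hypothetical, and simple first-order decay is an assumption made here for the arithmetic; the numbers describe DNA in solution, not frozen or dried tissue.

```python
# Figures quoted in this section (single-stranded DNA in solution, 37°C, pH 7.4).
SS_CYTOSINE_HALF_LIFE_YEARS = 200.0
# Double-stranded DNA is said to deaminate at 0.5-0.7% of the single-stranded rate.
DS_RELATIVE_RATES = (0.005, 0.007)


def scaled_half_life(half_life_years: float, relative_rate: float) -> float:
    """Half-life when the reaction runs at `relative_rate` times the reference rate."""
    if relative_rate <= 0:
        raise ValueError("relative_rate must be positive")
    return half_life_years / relative_rate


def fraction_unmodified(years: float, half_life_years: float) -> float:
    """Assumed first-order decay: fraction of residues still intact after `years`."""
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    for rate in DS_RELATIVE_RATES:
        half_life = scaled_half_life(SS_CYTOSINE_HALF_LIFE_YEARS, rate)
        print(f"relative rate {rate:.1%}: half-life of about {half_life:,.0f} years")
    for years in (20, 100):
        remaining = fraction_unmodified(years, SS_CYTOSINE_HALF_LIFE_YEARS)
        print(f"single-stranded, {years} years: {remaining:.1%} of cytosines unmodified")
```

Under these assumptions the double-stranded half-life falls between roughly 29,000 and 40,000 years, which is consistent with the figure of about 30,000 years given in the text for the faster end of the range.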
Reaction rates are very slow for the few processes measured in single-stranded DNA stored at
37°C in a solution with pH 7.4. For example, the half-life of an individual cytosine residue
is about 200 years. Loss of a purine residue occurs at a similar or slightly slower rate.
The double helix structure affords protection against cytosine deamination; for example, in
double-stranded DNA, this same reaction occurs at only 0.5-0.7% of the rate of
single-stranded DNA (i.e., a half-life of about 30,000 years for each cytosine residue in
double-stranded DNA).

Such structural changes are minimized by maintaining DNA or its sources at very low
temperature in the absence of oxygen. Dehydration, high salt concentrations, presence of
hydroxyapatite (e.g., as in bone), and EDTA to inhibit nucleases are additional factors that
stabilize DNA. Ordinary handling of fresh tissue, such as freeze-thaw processing or alcohol
preservation, usually causes few chemical changes and only minimal shearing of DNA. DNA
isolated from such tissue samples is valuable for DNA-DNA hybridization studies (Chapter 6),
analysis of restriction fragments (Chapter 8), and gene cloning and sequencing (Chapter 9).
For example, Sibley and Ahlquist (1990) used DNA isolated from bird tissue preserved in
ethanol in an extensive series of DNA-DNA hybridization studies. From reptilian red cells
stored frozen for over 20 years, we have isolated high-molecular-weight DNA that yielded
restriction fragment fingerprints comparable to those obtained with DNA from fresh blood.
McBee and her colleagues (1987) isolated DNA from mammalian brain, skeletal muscle, and
liver tissues maintained at 24°C from death through 72 hours after death. They found that
DNA was most stable in brain and least stable in liver tissue.

DNA fragments have been isolated from tissues of preserved museum specimens and fossils and
compared to DNA from contemporary species. Work on such fragments is difficult, however,
because trace amounts of contaminating DNA can yield false-positive results. Highly degraded
segments of DNA have been isolated from mummies, museum specimens preserved dry and in
ethanol, and even rarely from formaldehyde-fixed and paraffin-embedded tissues. For example,
fragments yielding useful systematic data have been retrieved from the dried brain of a
7000-year-old human (Pääbo et al., 1988), from skin of the extinct quagga (Higuchi et al.,
1984), and from 1000-year-old kernels of maize (Rollo et al., 1988). The oldest DNA
sequences to date, retrieved from a 25 to 30 million-year-old termite fossilized in amber,
were used to analyze insect phylogeny (DeSalle et al., 1992). Sequences have been reported
from even older fossil material, but have not yet been confirmed as belonging to the
targeted species.

Unfortunately, DNA is not recoverable from all museum specimens. Tissues treated with picric
acid or mercuric chloride or fixed for long periods in unbuffered or excess formaldehyde are
unfavorable candidates for DNA extraction (Goelz et al., 1985). Similarly, specimens
preserved in alcohol are poor sources of DNA if the nucleases in the tissues were not
inactivated prior to storage. The recovery of DNA from long-dead specimens is about 5-20% of
that extractable from fresh tissue. The majority of the DNA is single-stranded and degraded
to between 200 and 300 bp, which is too small for restriction fragment analysis; however,
some fragments of kilobase size may be present (Houde and Braun, 1988). A relatively high
proportion of fragments appears to be derived from DNA that exists in multiple copies, such
as mtDNA (Pääbo et al., 1988) and Alu-repeats (Pääbo, 1985).

In summary, to minimize detrimental influences of freezing on macromolecules (as contrasted
to viable cells), tissues should be frozen quickly and thawed rapidly. Thawing tissues prior
to use should be avoided when possible. Finally, proteins in homogenates are stabilized by
the presence of coenzymes, sulfhydryl reagents, glycerol, DMSO, and the presence of other
proteins.

DEVELOPMENT AND SUPPORT OF SYNOPTIC TISSUE COLLECTIONS

Collections of tissues in which macromolecules are structurally intact and/or
physiologically active are important resources for investigations in many areas of biology;
several natural history
museums and herbaria worldwide are accumulating such collections (see listings at the end of
this chapter). The goal is to develop collections that are as synoptic and diverse as
traditional skin, skeletal, dried, and spirit collections.

Disposition of Tissues for Long-Term Preservation
Tissues collected in the field may be handled in one of three ways, depending on whether or
not the researcher maintains a formal frozen tissue collection and how soon the samples are
to be used. If the collector does not maintain a tissue collection and does not plan to use
the material immediately, he or she should send the tissues (with complete documentation) to
a recognized tissue repository. If the collector does not maintain a tissue collection but
intends to use the samples immediately, he or she should plan to send all unused tissues,
antisera, DNA clones, and other tissue isolates (with complete documentation) to a
recognized tissue repository upon completion of the research. If the collector or
institution maintains a formal tissue collection, incoming tissue samples should be
incorporated into the permanent collection.

Curatorial Problems Unique to Tissue Collections
Curators of tissue collections face several unique curatorial problems; important among
these are the acquisition and deacquisition of specimens and database management.

Acquisition Policies
Curators of frozen tissue collections should seize every opportunity to acquire tissues from
rare, unusual, and exotic species. Zoos, botanical gardens, and biomedical institutions
often are sources of valuable specimens. Samples from rare species are valuable, even if
they are not in optimal condition or lack complete data. Positive identification should be
the primary concern for acceptance of materials into the collection; but details concerning
the source of the specimen, including the location of the traditional voucher specimen, are
desirable (see Leviton et al., 1985 for listing of museum symbolic codes). Examples of
organisms from which tissues may be accepted without vouchers include large ungulates,
proboscideans, and marine mammals such as whales; photographs can substitute as vouchers in
such cases.

Deacquisition Policies
Unlike materials in conventional systematic collections, those stored in tissue collections
are usually consumed as they are analyzed. Consequently, when discussing transfer of these
materials between collections and researchers, the word "loan" should be replaced with
"grant" in most instances. Because tissues are only quasipermanent in the collection, a
premium is placed on their judicious dissemination and use. An efficient inventory system is
required to keep track of their use, distribution, and status. It is the joint
responsibility of the donor and the recipient to be sure that transfer of tissue samples is
legal (i.e., the recipient may require special permits to handle the material). The
repository and person(s) who originally collected the tissues should be properly
acknowledged in any publication resulting from their use.

Database Management
Many of the unique curatorial problems posed by frozen tissue collections can be minimized
by a computerized inventory system. Programs designed for traditional museum collections may
be adapted for use with frozen collections. The computer database should include all field
data and cross-references to the field catalogue number and voucher specimen number. When a
sample is depleted, prompt deletion from or emendation of the collection records avoids
costly searches.

Guiding Principles
Museums and herbaria have maintained synoptic scientific collections for use in systematic
research for decades, and in some cases, for centuries. In contrast, the development of
collections of frozen tissues and other materials useful for molecular research is
relatively recent. Because individuals collecting tissues in the field generally save more
material than is necessary for their immediate research, small collections of tissues are
common worldwide. Specimens in these collections, available to the general community of
systematists,
have helped answer many research questions. Currently, individual scientists distribute
samples of frozen tissues throughout the research community on an informal basis. It is
hoped that, in time, a network of institutions will accept the responsibility for long-term
maintenance, development, and distribution of such material. Unique curatorial problems
posed by frozen tissue collections have been addressed at workshops organized by the
Association of Systematics Collections in 1983 (Dessauer and Hafner, 1984) and in 1988, and
a set of guidelines has been proposed for their curation (Dessauer et al., 1988).

The principle of obtaining large quantities of tissues when opportunity permits (rather than
a small sample for a few experiments by a single scientist) greatly enhances the value of
collecting efforts. The following principles (Dessauer et al., 1988) are intended to guide
the development of synoptic collections of tissues representing the world's biota:

1. When collecting, do not limit efforts only to the specific material needed. Within the
   limits of permits, make general collections when opportunities arise, with emphasis on
   the unusual, the difficult to obtain, and on filling gaps in existing collections.
   Individuals who seek grants of tissue from large collections should also help to develop
   such collections.
2. When preparing specimens, discard as little material as possible. Obtain tissue samples
   for the research, select appropriate anatomical material to document the specimen by
   traditional means, then freeze as much of the specimen as possible for a synoptic frozen
   tissue repository.
3. In the laboratory, use only as much of any sample as is necessary to complete the
   experiments. Conserve as much of the sample as possible for future use.
4. After completing the experiments, place the remaining samples in a formal synoptic tissue
   repository.
5. Personal collections should be discouraged, as they are often lost to science. If a
   personal research collection is maintained, arrange for its conservation and timely
   transfer to an appropriate institutional collection in the event of disability or death.

EXISTING COLLECTIONS

The following is a list of collections of materials useful for research in molecular
biology, updated from the list of Dessauer and Hafner (1984). We welcome any additions and
comments for improving this list in future editions of this book.

Collections are listed alphabetically by country and (for the United States) by state. Each
collection cited includes collection location, person(s) in charge, specific information on
collection size, materials, taxa, and geographic regions represented. Investigators who may
wish grants from any of these collections should first read the preceding section,
"Development and Support of Synoptic Tissue Collections," and, in particular, be prepared to
assist in future collection growth or to reciprocate in other ways.
42 Chapter 3 / Dessauer, Cole & Hafner
APPENDIX: SYNOPTIC TISSUE COLLECTIONS

KEY

Size
1 = <100 specimens
2 = 100-1000 specimens
3 = 1000-5000 specimens
4 = 5000-10,000 specimens
5 = 10,000-30,000 specimens
6 = >30,000 specimens

Regions
A = Nearctic
B = Palearctic
C = Neotropical
D = Ethiopian
E = Oriental
F = Australian

Material
1 = nucleic acids
2 = plasmids
3 = probes, primers, libraries
4 = isolated proteins
5 = cell lines
6 = frozen tissues
7 = blood
8 = antisera
9 = seeds and/or spores

Taxa
1 = fishes
2 = amphibians
3 = reptiles
4 = birds
5 = mammals
6 = arthropods
7 = other invertebrates
8 = bacteria, fungi, algae
9 = green plants

AUSTRALIA

UNIVERSITY OF NEW ENGLAND
Northern Rivers, P.O. Box 157, Lismore, N.S.W. 2480
Size: 3; material: 6; taxa: 3-5; regions: F
Contact: Peter Baverstock

SOUTH AUSTRALIA MUSEUM
Evolutionary Biology Unit, Adelaide, S.A. 5000
Size: 5; material: 1, 3, 6-8; taxa: 1-7; regions: E, F
Strengths: fauna of Australia and New Guinea
Contact: Mark Adams or Stephen Donnellan

CANADA

ROYAL ONTARIO MUSEUM
Department of Ichthyology and Herpetology, 100 Queen's Park,
Toronto, Ontario M5S 2C6
Size: 6; material: 6, 7; taxa: 1-3; regions: A-F
Strengths: Caudata, Squamata, Crotalus
Contact: Robert W. Murphy

ROYAL ONTARIO MUSEUM
Department of Mammalogy, 100 Queen's Park, Toronto, Ontario M5S 2C6
Size: 4; material: 6; taxa: 5; regions: A, C, E
Contact: Mark D. Engstrom or J. Eger

REDPATH MUSEUM
McGill University, 859 Sherbrook St. West, Montreal, Quebec H3A 2K6
Size: 3; material: 1, 6, 7; taxa: 2; regions: A<, F
Strengths: tissues for chromosomes
Contact: David M. Green

FINLAND

FINNISH GAME AND FISH RESEARCH INSTITUTE
Ahvenjarvi Game Research Station, NN-82900, Ilomantsi
Size: 3; material: 6; taxa: 5; regions: B
Strengths: Alces
Contact: Tuire Nygren

FRANCE

INSTITUT CURIE
Section de Biologie, URA 620, CNRS, 26 rue d'Ulm, 75231, Paris Cedex 05
Size: 3; material: 5, 6; taxa: 3, 5; regions: A-E
Strengths: cryopreserved cells and tissues
Contact: Vitaly Volobouev

INSTITUT SCIENCES EVOLUTION
URA 327, CNRS, Universite Montpellier II, Case 064, F-34095, Montpellier
Size: 3; material: 1, 6; taxa: 5; regions: A-E
Strengths: Soricidae, Arvicolidae, Muridae
Contact: Francois Catzeflis

GERMANY

HESSISCHES LANDESMUSEUM
Department of Zoology, Friedensplatz 1, 64283 Darmstadt
Size: 3; material: 3, 4, 6-8; taxa: 2, 3; regions: B, D, E
Strengths: Viperidae, Agamidae, Gekkonidae
Contact: Ulrich Joger
ISRAEL

INSTITUTE OF EVOLUTION
University of Haifa, Haifa 31905
Size: 3; material: 6, 9; taxa: 2, 3, 5, 6, 9; regions: B
Contact: Eviatar Nevo

MALAYSIA

UNIVERSITY OF MALAYA
Department of Zoology, 59100 Kuala Lumpur
Size: 3; material: 6, 7; taxa: 1, 2, 5, 6; regions: E
Contact: H. S. Yong

THE NETHERLANDS

UNIVERSITY OF NIJMEGEN
Department of Biochemistry, P.O. Box 9101, 6500 HB, Nijmegen
Size: 1; material: 1, 3, 4, 6, 8; taxa: 4, 5; regions: A-F
Strengths: mammalian and bird lenses, crystallins, and corresponding
libraries; cDNA
Contact: Wilfried W. de Jong

REPUBLIC OF CHINA

NATIONAL DEFENSE MEDICAL CENTER
Department of Anatomy and Biology, P.O. Box 90048-502, Taipei 100, Taiwan
Size: 2; material: 5, 8; taxa: 3, 5; regions: A, E
Strengths: antisera of turtles and snakes; smooth muscle cells of mammals
Contact: Shou-Hsian Mao or Mei-Hua Lu

RUSSIA

INSTITUTE OF MOLECULAR BIOLOGY
Laboratory of Isotopic Investigations, 32 Vavilov Street, Moscow 117984
Size: 1; material: 1, 7; taxa: 3; regions: B
Strengths: parthenogenetic reptiles
Contact: Veznata V. Grechko

SWEDEN

UPPSALA UNIVERSITY
Department of Genetics, Box 7003, Uppsala S-750 07
Size: 4; material: 1, 3, 5-7; taxa: 1-5, 7; regions: B
Strengths: social insects, salmonids, Swedish herps, boas, lemmings, shrews
Contact: Hakan Tegelstrom or Hans P. Gelter

UNITED STATES OF AMERICA

Alaska

ALASKA DEPARTMENT OF FISH AND GAME
Genetics Laboratory, 333 Raspberry Road, Anchorage, AK 99518
Size: 5; material: 6; taxa: 1; regions: A
Strengths: salmonids
Contact: Lisa Seeb

NOAA-NMFS, AUKE BAY FISH LAB
11305 Glacier Hwy., Juneau, AK 99803
Size: 5; material: 6; taxa: 1; regions: A
Strengths: salmonids
Contact: James Olsen

Arizona

UNIVERSITY OF ARIZONA
College of Pharmacy, Tucson, AZ 85721
Size: 2; material: 4, 6, 8; taxa: 1, 3, 6; regions: A-F
Strengths: venom, antivenins
Contact: Findlay E. Russell

California

MUSEUM OF VERTEBRATE ZOOLOGY
1120 Life Sciences Bldg., University of California, Berkeley, CA 94720
Size: 6; material: 1, 3, 6, 7; taxa: 2-5; regions: A, B, C
Contact: James L. Patton, N. K. Johnson, or David B. Wake

UNIVERSITY OF CALIFORNIA AT DAVIS
Section of Evolution and Ecology, Davis, CA 95616
Size: 4; material: 3, 6, 7; taxa: 2, 3; regions: A, C
Strengths: Ambystomatidae, Emydidae
Contact: H. Bradley Schaffer

BECKMAN RESEARCH INSTITUTE
Department of Molecular Biochemistry, City of Hope Medical Center,
1450 E. Duarte Road, Duarte, CA 91010-0269
Size: 3; material: 1-3; taxa: 4; regions: A
Strengths: probes of MHC alleles; bird alloantisera
Contact: Marcia M. Miller
CENTER FOR REPRODUCTION OF ENDANGERED SPECIES
Zoological Society of San Diego, P.O. Box 551, San Diego, CA 92112
Size: 6; material: 1-3, 5-7; taxa: 5; regions: A-F
Strengths: artiodactyls
Contact: Oliver A. Ryder

LABORATORY OF MOLECULAR SYSTEMATICS
California Academy of Sciences, San Francisco, CA 94118
Size: 3; material: 1, 3, 6, 8; taxa: 1 4 6; regions: A-D
Contact: Robin Lawson

Florida

UNIVERSITY OF FLORIDA
Department of Zoology, 223 Bartram, Gainesville, FL 32611
Size: 2; material: 1-3, 6, 7; taxa: 1-5; regions: A-E
Strengths: Green turtles, artiodactyls
Contact: Michael M. Miyamoto

UNIVERSITY OF FLORIDA
Department of Small Animal Clinical Sciences, P.O. Box 100126,
Gainesville, FL 32610-0126
Size: 3; material: 5-8; taxa: 3; regions: A-C, E
Strengths: viruses; monoclonal antibodies of reptile proteins
Contact: Elliott Jacobson

Illinois

FIELD MUSEUM OF NATURAL HISTORY
Division of Amphibians and Reptiles, Roosevelt Road at Lake Shore Dr.,
Chicago, IL 60605-2496
Size: 1; material: 6; taxa: 2, 3; regions: E
Contact: Harold K. Voris

UNIVERSITY OF ILLINOIS
Department of Ecology, Ethology and Evolution, 515 Morrill Hall,
505 S. Goodwin Ave., Urbana, IL 61801
Size: 2; material: 1, 3, 6, 8; taxa: 1; regions: A, C, D, F
Contact: Gregory S. Whitt

Indiana

INDIANA UNIVERSITY
Department of Biology, Bloomington, IN 47405
Size: 2; material: 1-3; taxa: 9; regions: A-F
Strengths: clones of chloroplast genomes; angiosperms
Contact: Jeffery D. Palmer

Kansas

MUSEUM OF NATURAL HISTORY
University of Kansas, Division of Herpetology, Lawrence, KS 66045-2454
Size: 3; material: 6; taxa: 2-3; regions: A, C
Contact: William E. Duellman

MUSEUM OF NATURAL HISTORY
University of Kansas, Division of Ichthyology, Lawrence, KS 66045-2454
Size: 3; material: 1, 6; taxa: 1; regions: A, F
Strengths: Caribbean reef fishes, catfishes, Tongo fish
Contact: Edward O. Wiley or W. W. Dimmick

Louisiana

MUSEUM OF NATURAL SCIENCE
Louisiana State University, Baton Rouge, LA 70803
Size: 6; material: 1-8; taxa: 1-5; regions: A-F
Strengths: squamates, crocodiles, New Guinea herps, neotropical birds
and mammals
Contact: Fredrick H. Sheldon

Maryland

AMERICAN TYPE CULTURE COLLECTION
Department of Microbiology, General Service Collection,
12301 Parkland Drive, Rockville, MD 20852
Size: 6; material: 5; taxa: 8; regions: C, E
Strengths: microbes
Contact: Raymond H. Cypress

AMERICAN TYPE CULTURE COLLECTION
Department of Cell Culture, 12301 Parkland Drive, Rockville, MD 20852
Size: 3; material: 5; taxa: 1-7; regions: B-F
Contact: Robert J. Hay
AMERICAN TYPE CULTURE COLLECTION
Department of Molecular Biology, 12301 Parkland Drive, Rockville, MD 20852
Size: 5; material: 1-3; taxa: 5; regions: A-F
Strengths: national repository for human and mouse DNA, probes, and
libraries
Contact: Bill Nierman

UNIVERSITY OF MARYLAND
Department of Zoology, College Park, MD 20742
Size: 5; material: 6, 7; taxa: 2; regions: A
Strengths: Plethodon
Contact: Richard Highton

CAPTIVE PROPAGATION RESEARCH GROUP
Patuxent Wildlife Research Center, U.S. Department of the Interior,
Laurel, MD 20708-4019
Size: 3; material: 6, 7, 8; taxa: 4; regions: A
Strengths: Semen and blood of cranes
Contact: George F. Gee

NATIONAL CANCER INSTITUTE
National Institutes of Health, P.O. Box B, Frederick, MD 21701
Size: 3; material: 1, 5-7; taxa: 5; regions: A-F
Strengths: Felidae; other carnivores; primates; marsupials
Contact: Stephen J. O'Brien

Michigan

WAYNE STATE UNIVERSITY
Department of Anatomy and Cell Biology, School of Medicine,
Detroit, MI 48201
Size: 2; material: 1, 3, 6, 8; taxa: 5; regions: B-D
Contact: Morris Goodman

New Jersey

RUTGERS UNIVERSITY
Center of Theoretical and Applied Genetics, New Brunswick, NJ 08903-0231
Size: 5; material: 6; taxa: 1, 7; regions: A-F
Strengths: Deep-sea hydrothermal vent invertebrates
Contact: Robert C. Vrijenhoek

New Mexico

MUSEUM OF SOUTHWESTERN BIOLOGY
University of New Mexico, Albuquerque, NM 87131
Size: 6; material: 3-7; taxa: 1, 3-6; regions: A-C, E
Strengths: Mammals of southwestern USA and Bolivia
Contact: Terry L. Yates or William L. Cannon

New York

AMERICAN MUSEUM OF NATURAL HISTORY
Molecular Laboratory, 79th Street at Central Park West,
New York, NY 10024-5192
Size: 3; material: 1, 6, 7; taxa: 2, 3, 5, 7; regions: A
Contact: Ward Wheeler or Rob DeSalle

UNIVERSITY OF ROCHESTER
Department of Biology, Rochester, NY
Size: 2; material: 5; taxa: 8; regions: A-C
Strengths: E. coli strains characterized phenotypically and genetically
Contact: Howard Ochman

BRONX ZOO
Department of Wildlife Management Services, Bronx, NY 10460-1099
Size: 4; material: 1, 3, 6, 7; taxa: 2-5; regions: A-F
Strengths: zoo and aquarium specimens, Tibet and Patagonia
Contact: Dan Wharton or George Amato

Ohio

CINCINNATI MUSEUM OF NATURAL HISTORY
University of Cincinnati, Cincinnati, OH 45202-1401
Size: 3; material: 6, 7; taxa: 2-5, 9; regions: E (Philippines only)
Contact: Robert S. Kennedy

Oregon

U.S. NATIONAL FISH AND WILDLIFE SERVICE
Forensic Laboratory, 1490 E. Main St., Ashland, OR 97520
Size: 5; material: 6, 7; taxa: 2, 3, 5; regions: A-D
Strengths: herps from Morocco and Spain; Eumeces; cervids; ursids,
manatees
Contact: Stephen D. Busack (herps), Wayne F. Ferguson (mammals)
Pennsylvania

ACADEMY OF NATURAL SCIENCES
Department of Ornithology, 1900 Benjamin Franklin Park,
Philadelphia, PA 19103
Size: 4; material: 1, 6, 7; taxa: 4, 5, 7; regions: A-C, E
Strengths: neotropical birds; mollusks
Contact: Frank B. Gill

UNIVERSITY OF PITTSBURGH MEDICAL CENTER
Department of Psychiatry, 3811 O'Hara St., Pittsburgh, PA 15213-2593
Size: 2; material: 1, 5; taxa: 5; regions: C-E
Strengths: DNA of major primate taxa; fibroblasts
Contact: Kathy Neiswanger

CARNEGIE MUSEUM OF NATURAL HISTORY
Section of Mammals, Edward O'Neil Research Center, 5800 Baum Blvd.,
Pittsburgh, PA 15206-3706
Size: 5; material: 1, 6, 7; taxa: 3, 5; regions: A-D
Strengths: African rodents, bats, and insectivores
Contact: Duane A. Schlitter

South Carolina

RIVERBANKS ZOOLOGICAL PARK
Veterinary Department, P.O. Box 1060, Columbia, SC 29202
Size: 1; material: 6-8; taxa: 3-5; regions: A-F
Strengths: chiefly mammals
Contact: Terry Norton

Texas

ANGELO STATE UNIVERSITY
Department of Biology, Natural History Collection, San Angelo, TX 76909
Size: 3; material: 6, 7; taxa: 5; regions: A, C
Strengths: Mexican mammals, especially from the Yucatan
Contact: Robert C. Dowler

UNIVERSITY OF NORTH TEXAS
Department of Biological Sciences, Denton, TX 76203
Size: 2; material: 1, 3, 4; taxa: 1, 5; regions: A
Contact: Earl G. Zimmerman

UNIVERSITY OF TEXAS
Texas Memorial Museum and Department of Zoology, Austin, TX 78712
Size: 5; material: 1-3, 6, 7; taxa: 1-8; regions: A-F
Strengths: amphibians and reptiles from North, Central, and South
America; parrots; crayfishes of Texas; laboratory-evolved viruses
Contact: David C. Cannatella or David M. Hillis

TEXAS TECH UNIVERSITY
The Museum, P.O. Box 4499, Lubbock, TX 79409
Size: 5; material: 6; taxa: 2-5; regions: A, C-E
Strengths: bats and rodents of the Caribbean and Central America
Contact: Robert J. Baker

Utah

BRIGHAM YOUNG UNIVERSITY
Department of Biology, Provo, UT 84602
Size: 4; material: 1, 3, 6; taxa: 1, 3, 5; regions: A, C
Contact: Jack W. Sites, Jr., Duke S. Rogers, or Dennis K. Shiozawa

Vermont

UNIVERSITY OF VERMONT
Department of Zoology, Burlington, VT 05405-0086
Size: 4; material: 1, 6, 7; taxa: 5; regions: A-C, E
Strengths: muroid rodents; mongooses; saliva
Contact: C. William Kilpatrick

Washington

WASHINGTON DEPARTMENT OF FISHERIES
P.O. Box 43135, Olympia, WA 98504-3135
Size: 3; material: 6; taxa: 1; regions: A, F
Strengths: salmon and trout of northwest USA
Contact: James B. Shaklee

Washington, D.C.

LABORATORY OF MOLECULAR SYSTEMATICS
National Museum of Natural History, Smithsonian Institution,
Washington, DC 20560
Size: 4; material: 1-3, 6, 7, 9; taxa: 4, 5, 9; regions: A, C, E
Strengths: New World birds, especially Brazil, Ecuador, and Panama;
Philippine birds; maize
Contact: Michael Braun (animals) or Elizabeth A. Zimmer (plants)
West Virginia

MARSHALL UNIVERSITY
Department of Biological Sciences, Huntington, WV 25701
Size: 2; material: 6, 7; taxa: 3; regions: A
Strengths: turtles
Contact: Michael E. Seidel

Wisconsin

INTERNATIONAL CRANE FOUNDATION
E11376 Shady Lane Road, Baraboo, WI 53913-9778
Size: 2; material: 7; taxa: 4; regions: A, B, D-F
Strengths: blood from all species of cranes
Contact: Claire M. Mirande

UNIVERSITY OF WISCONSIN
Molecular Systematics Lab, Zoological Museum, 250 Mills Street,
Madison, WI 53706
Size: 3; material: 1, 3, 6-8; taxa: 3-5; regions: A-F
Strengths: marsupials, fruit bats, birds
Contact: John A. W. Kirsch
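The coded "Size / material / taxa / regions" lines in this appendix can be expanded mechanically from the KEY. The following Python sketch is an illustration only and is not part of the original appendix: the names expand_codes and decode_entry are hypothetical, and the lookup tables simply transcribe the KEY as printed above.

```python
import re

# Lookup tables transcribed from the appendix KEY.
SIZE = {1: "<100 specimens", 2: "100-1000 specimens",
        3: "1000-5000 specimens", 4: "5000-10,000 specimens",
        5: "10,000-30,000 specimens", 6: ">30,000 specimens"}
MATERIAL = {1: "nucleic acids", 2: "plasmids",
            3: "probes, primers, libraries", 4: "isolated proteins",
            5: "cell lines", 6: "frozen tissues", 7: "blood",
            8: "antisera", 9: "seeds and/or spores"}
TAXA = {1: "fishes", 2: "amphibians", 3: "reptiles", 4: "birds",
        5: "mammals", 6: "arthropods", 7: "other invertebrates",
        8: "bacteria, fungi, algae", 9: "green plants"}
REGIONS = {"A": "Nearctic", "B": "Palearctic", "C": "Neotropical",
           "D": "Ethiopian", "E": "Oriental", "F": "Australian"}


def expand_codes(field):
    """Expand a code list such as '1,3, 6-8' or 'A-C, E' into single codes."""
    codes = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (s.strip() for s in part.split("-", 1))
            if lo.isdigit():  # numeric range, e.g. 6-8
                codes.extend(range(int(lo), int(hi) + 1))
            else:             # letter range, e.g. A-C
                codes.extend(chr(c) for c in range(ord(lo), ord(hi) + 1))
        else:
            codes.append(int(part) if part.isdigit() else part)
    return codes


def decode_entry(line):
    """Translate one 'Size: ...; material: ...; taxa: ...; regions: ...' line."""
    fields = {k.lower(): v.strip()
              for k, v in re.findall(r"(\w+):\s*([^;]+)", line)}
    return {
        "size": SIZE[int(fields["size"])],
        "material": [MATERIAL[c] for c in expand_codes(fields["material"])],
        "taxa": [TAXA[c] for c in expand_codes(fields["taxa"])],
        "regions": [REGIONS[c] for c in expand_codes(fields["regions"])],
    }


entry = decode_entry("Size: 5; material: 1,3, 6-8; taxa: 1-7; regions: E, F")
```

A code that does not appear in the KEY (for example, a character damaged in reproduction) raises a KeyError rather than being silently guessed, which seems the safer behavior for a sketch of this kind.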
Molecular Techniques
Proteins: Isozyme Electrophoresis
Robert W. Murphy, Jack W. Sites, Jr., Donald G. Buth, and
Christopher H. Haufler
INTRODUCTION
Protein electrophoresis, the migration of proteins under the influence of an elec-
tric field, is among the most cost-effective methods of investigating genetic phe-
nomena at the molecular level. Since the origin of starch gel electrophoresis
(Smithies, 1955) and the histochemical visualization of enzymes on gels (Hunter
and Markert, 1957), and the classic studies of H. Harris (1966), Hubby and Lewon-
tin (1966), and Lewontin and Hubby (1966), a major revolution in understanding
micro- and macroevolutionary processes has occurred. Using enzymatic and non-
enzymatic proteins, numerous investigations have focused on enzyme efficiency,
estimating and understanding genetic variability in natural populations, gene
flow, hybridization, recognition of species boundaries, and phylogenetic rela-
tionships, among other problems. The frequency of such investigations has not
waned in recent years, but rather has increased as refinements and new methods
have been developed.
Two general forms of protein data can be gathered simultaneously using elec-
trophoretic methods. One is derived from isozymes, which are all functionally
similar forms of enzymes, including all polymers of subunits produced by dif-
ferent gene loci or by different alleles at the same locus (Markert and Moller,
1959). The other data set consists of allozymes, a subset of isozymes, which are
variants of polypeptides representing different allelic alternatives of the same
gene locus. Both forms of data are important in molecular systematics, and both
involve proteins that can be separated on the basis of net charge and size.
52 Chapter 4 / Murphy, Sites, Buth & Haufler
Here we provide a review of applications, step-by-step instructions on
how to establish a horizontal starch gel electrophoresis laboratory,
perform protein electrophoresis, stain for specific enzymatic and
non-enzymatic proteins, and interpret the resultant gels. Although other
methods of protein electrophoresis exist, using media such as cellulose
acetate gels (Richardson et al., 1986), we have chosen to detail
horizontal starch gel methods because of their widespread use and
efficiency. Ways of avoiding or recovering from common pitfalls are
described. The electrophoretic principles and methods described are
applicable to all organisms.
Where possible, we provide inexpensive alternatives to costly equipment
and methods, but not at the expense of increased health risk. As with
most molecular methods used in systematics, some aspects of the data
gathering pose extreme health risks, both acute and chronic. Therefore,
the appropriate level of caution, as known to us, is always given the
highest priority.

PRINCIPLES AND COMPARISON OF METHODS

General Principles
Proteins are composed of amino acids joined by covalent peptide bonds to
form polypeptides. These sequences, or "primary structures," are
genetically determined. Each of the 20 amino acids has a unique side
chain, characterized by its shape, size, and charge. The side chains on
five of these amino acids are either basic and thus positively charged
(NH3+; lysine, arginine, and histidine), or acidic and negatively
charged (COO-; aspartic acid and glutamic acid). Charged side chains are
responsible for the movement of the proteins through a matrix during
electrophoresis. The net charge of each protein varies with pH; at a low
pH the amino groups become positively charged, and at high pH the
carboxyl groups become negatively charged. Most proteins have a point at
which the effects of positive and negative charges are equal, the
isoelectric point. Isoelectric proteins do not move in an electric field
because they are attracted to neither the (positive) anode nor the
(negative) cathode.
Uncharged amino acids are either non-polar and hydrophobic or polar.
These amino acids can become hydrogen-bonded to one another, resulting
in folding (β-structure) or helical (α-helix) configurations, termed
secondary structure. Depending on the primary and secondary structure,
the molecule usually undergoes additional folding to form its tertiary
structure. The shape and size of a protein also may have an effect on
protein migration, depending on the pore size of the electrophoresis
matrix. To some extent the shape of a particular protein is determined
by the relative charges of adjacent amino acids because of the effect of
like charges repelling and different charges attracting. Finally, many
proteins contain more than one polypeptide chain (subunit) bound
together by hydrogen bonds, van der Waals forces, ionic bonds, disulfide
bridges, and/or hydrophobic interactions. Proteins having more than one
polypeptide (multimeric) have a quaternary structure (Darnell et al.,
1986).
Some forms of electrophoresis separate proteins on the basis of net
protein charge Q, shape as measured by radius r, strength of the
electric field d, and viscosity of the suspension medium η, as given by
the following equation:

u = Qd / (6πrη)

Under appropriate conditions, the rate of movement, u, increases with
net charge and strength of electric field and decreases with the size of
the molecule. The actual situation is usually more complex than this
simple equation indicates (e.g., some proteins occur as relatively
simple strands, whereas others have a globular form) and indeed much
remains to be learned about the physics of electrophoresis itself.
All electrophoretic techniques consist of an electric power supply, a
support matrix (cellulose acetate gel or strips, starch gel, etc.), and
ionic buffers. Electric current is applied to opposite ends of the
suspension medium via the ionic buffers. Molecules (e.g., proteins)
having a net positive charge (cations) migrate to the cathode,
and negatively charged proteins (anions) migrate to the anode. Following
electrophoresis, the proteins may be visualized by a number of different
methods, the most frequently used being specific histochemical staining
(Appendix 1 and references cited therein). After electrophoresing a
protein sample in the gel matrix, the individual proteins are
selectively stained. Most of the stains provide a specific substrate for
the enzyme, allow it to catalyze the particular reaction involved, and
then develop a dye that can be visualized in normal light or by
fluorescence under UV light. Thus, from the hundreds or thousands of
enzymes in the crude extract, proteins with the same substrate
utilization can be identified.
As detailed later, numerous ionic buffers are available for
electrophoresis. The buffers serve several functions. The primary
function is to buffer against pH change that occurs during
electrophoresis: acid is produced at the anode and base at the cathode.
The extent of pH change is directly related to the duration of
electrophoresis, the voltage, and the current generated. Buffers also
form an ionic connection between the electric supply (electrodes) and
suspension medium (gel) and reduce the interaction of charged groups on
the protein with any charged groups in the matrix, and may modify the
net charges of proteins, carry enzyme stabilizers (e.g., disodium EDTA),
or provide enzyme catalysts (e.g., Mg2+).
The amino acid sequences of proteins are changed by mutations in the
encoding DNA locus. Such mutations may alter shape and net charge, as
well as catalytic efficiency and stability (Shaw, 1965). Protein
electrophoresis aims to reveal as many of these changes as possible.
However, considering the principles of electrophoresis and the
properties of protein side chains, it is unlikely that all allelic
variants will be identified using a single buffer pH, buffer system, or
concentration of gel, or even by using a single method of protein
electrophoresis. Most laboratories concentrate on manipulating the first
three of these four variables because of the difficulty in mastering the
alternative technologies and the added expense of additional equipment
and expendables.

Assumptions
The correct application of isozyme data requires that banding patterns
observed on gels are correctly interpreted. The most basic assumption
that evolutionary biologists make in using isozyme data is that changes
in the mobility of enzymes in an electric field reflect changes in the
encoding DNA sequence. Thus, if the banding patterns of two individuals
differ, it is assumed that these differences are genetically based and
heritable (see Matson, 1984). Also, it is assumed that enzyme expression
is codominant, i.e., that all alleles at a locus are expressed. To
interpret these banding patterns, one must know something about the
number of subunits in the enzyme. Interpretation and non-heritable
aspects of gel isozyme patterns are detailed below and elsewhere (e.g.,
Richardson et al., 1986; Buth, 1990; Hernandez-Juviel et al., 1992). In
addition to biochemical components of gel interpretation, one must be
aware of compartmentalization of enzymes, or enzyme activity, in
particular organs or organelles. For example, livers may have different
enzymes than spleens or brains (see Murphy and Matson, 1986). In plants
and animals, some enzymes are restricted to the cytosol whereas others
are found only in mitochondria or chloroplasts (Weeden and Wendel, 1989;
D.J. Crawford, 1990).
For most enzymes, the genetic controls are well enough known to allow
genetic inferences to be made from gel isozyme patterns. The
distribution of isozymes per cell or tissue can be reliably predicted,
homozygous and heterozygous individuals can be identified, and
conclusions about genetic polymorphism, the breeding system of
individuals, and population structuring can be drawn. It is necessary to
conduct breeding studies only when banding patterns depart from
expectations. Cell fractionation studies can demonstrate whether the
enzymes are housed in the cytoplasm or one of several separate
organelles (Weeden, 1983; Weeden and Wendel, 1989).
Population geneticists have developed statistical models for
interpreting genetic population structure (Chapter 10). The most
relevant of these to gel interpretation is the Hardy-Weinberg
equilibrium principle. Oversimplified, this states that
in the absence of selection, drift, and migration, the frequencies of
alleles in a randomly mating population will maintain a stable
equilibrium with genotype frequencies of AA = p², Aa = 2pq, and aa = q²,
where p is the frequency of allele A, and q is the frequency of the
alternative allele a. Nonconformity to the prediction of Hardy-Weinberg
equilibrium indicates that the phenotypic variation has a non-genetic
basis or that one or more of the Hardy-Weinberg assumptions is not met
in the population. Thus, for example, the individuals may not be
randomly mating, or some natural selective force may be acting on the
species, or genes from neighboring populations may be migrating into the
study site. If these principles of biochemistry, genetics, and gel
interpretation are followed, electrophoresis can yield many valuable
insights for the evolutionary biologist.
A major assumption in the use of allele frequency data to infer
population structure is that alternative alleles at a given locus are
selectively equivalent or neutral (Kimura, 1983a,b), or nearly neutral
(Ohta, 1992). Exceptions to this assumption are known (see below), and
accepting neutrality for most protein polymorphisms also requires
accepting largely untested or poorly tested null hypotheses. However, in
the absence of evidence for selection at a particular locus, it has been
suggested that studies begin with neutrality as a working assumption
(Allendorf and Phelps, 1981).

Comparison of the Primary Methods
The four primary methods of electrophoresis differ by the nature of the
support medium: starch gel (including both horizontal and vertical
systems), polyacrylamide gel, agarose gel, and cellulose acetate gel.
Each method will be briefly described and discussed in terms of specific
advantages and limitations. Less frequently used methods of resolving
protein variants are not discussed herein; these include paper
electrophoresis (Freifelder, 1982), isoelectric focusing (Whitmore,
1990), immunoelectrophoresis, and two-dimensional electrophoresis
(Harris and Hopkinson, 1976; Hames and Rickwood, 1981). Advantages of
the four basic methods are compared in Table 1, although choice of
method will often be determined by availability of equipment and
expertise.

Starch Gel Electrophoresis (SGE)
Hydrolyzed starch is heated in an ionic buffer solution and allowed to
cool, thereby forming a gel. The ratio of starch to buffer can be varied
to alter the size of the gel pores. Pore size allows for a sieving
effect in the gel. Thus, these gels can separate on the basis of both
size (shape) and charge.
Two forms of SGE exist: horizontal and vertical. In horizontal SGE, a
poured gel is allowed to cool in a gel mold without further
preparations. Vertical starch gels are poured into double-sided molds
having a "gel comb" or "well former" that makes the "gel wells" for
holding tissue extracts (Brewer, 1970; Morizot and Schmidt, 1990; see
also Chapter 8). In general, the vertical system requires a greater
amount of starch and larger quantities of tissue extract, allows for
fewer samples to be run per gel, and is thus more costly. The advantages
of vertical SGE include the avoidance of the phenomenon known as
electrodecantation: as proteins migrate on horizontal gels, enzymes of
high molecular weight tend to drop toward the bottom of the gel. This
may make slices from the upper regions of the horizontal gel inferior or
inadequate for resolving these proteins. Nevertheless, the method of
horizontal SGE is used almost exclusively in our laboratories and in the
vast majority of other laboratories; the vertical system will not be
discussed further (for more information, see Siciliano and Shaw, 1976;
Morizot and Schmidt, 1990).

Polyacrylamide Gel Electrophoresis (PAGE)
Polyacrylamide gels are formed by the catalytic polymerization of
monomeric forms of acrylamide and bisacrylamide. It allows the
separation of proteins on the basis of both size and charge (Chrambach
and Rodbard, 1971). The pore size of acrylamide gels can be controlled
by altering concentrations of acrylamide and/or bisacrylamide. This
sieving attribute has made PAGE one of the methods of choice in
molecular biology laboratories examining nucleic acid sequences because,
unlike most other forms of gel electrophoresis, it allows for the
accurate, controlled separation of charged particles on the basis of
molecular weight (Chapters 8 and 9). General references to this system
are found in Hames and Rickwood (1981).

Table 1
Comparison of the attributes of the four primary methods of protein
electrophoresis on gel support media(a)

Attribute                                   SGE         PAGE        CAGE         AGE
Separates by charge                         yes         yes         yes          yes
Separates by size                           yes(b)      yes(b)      no           no
Number of slices per gel                    >fjb        1           1            1
Toxic                                       no(b)       yes         no(b)        no(b)
Running time                                4-24 hr     4-6 hr      0.3-3 hr(b)  3-4 hr
Minimum amount of sample required per gel
Maximum amount of sample possible per gel   >50 µl(b)   >50 µl(b)   5 µl         >50 µl(b)
Amount of stain required                    5-50 ml     10-50 ml    1-3 ml(b)    10-50 ml
Electroendosmosis                           yes         no(b)       yes          yes
Voltage required (V/cm)                     1-10        5-10        <3(b)        20
Cooling required                            yes         at times    no(b)        yes
Gel easily handled                          usually(b)  no          yes(b)       yes(b)
Simultaneously resolves cationic
  and anionic proteins                      yes(b)      no          yes(b)       yes(b)
Allows counterstaining of adjacent slices   yes(b)      no          no           no

(a) SGE = starch gel electrophoresis; PAGE = polyacrylamide gel
electrophoresis; CAGE = cellulose acetate gel electrophoresis;
AGE = agarose gel electrophoresis.
(b) Perceived advantage.

Cellulose Acetate Gel Electrophoresis (CAGE)
Electrophoresis can be carried out on preformed cellulose acetate gels
or strips. The gel form of cellulose acetate is preferred because of
repeatability of experiments (Harris and Hopkinson, 1976). A major
advantage is that electrophoresis can be carried out with very small
quantities of tissue homogenate. Although the gel itself is premade, it
must be soaked in the appropriate buffer prior to electrophoresis. Due
to the large pore size, CAGE has no sieving effect; proteins are
separated on the basis of net charge only and this may reduce the number
of variants identified (M.A. Riley et al., 1992). In addition, the large
pores also cause electroendosmosis, a "backwash" of buffer solution
caused by gel charge groups that accelerates the mobility of cationic
isozymes but retards or reverses the anionic isozymes. Although this
problem occurs with SGE and AGE, it is more pronounced with CAGE (Harris
and Hopkinson, 1976). CAGE has been discussed in detail by Richardson et
al. (1986).

Agarose Gel Electrophoresis (AGE)
Agar and agarose gels are prepared much in the same way as starch gels.
Pure "agar" gels have a relatively high concentration of acidic groups
(carboxyls and sulfates), resulting in considerable electroendosmosis
and occasional adsorption of proteins, although adsorption problems may
be overcome by use of highly purified agarose (Harris and Hopkinson,
1976).
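The Hardy-Weinberg expectations described under "Assumptions" (AA = p², Aa = 2pq, aa = q²) lend themselves to a short worked example. The Python sketch below is illustrative only: the function names are hypothetical, and the chi-square goodness-of-fit comparison is a common choice that we assume here for demonstration; the text itself does not prescribe a particular test.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies (AA, Aa, aa) for allele A at frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p  # frequency of the alternative allele a
    return p * p, 2.0 * p * q, q * q


def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Return (p, chi-square) for observed genotype counts at a two-allele locus.

    p is estimated by counting alleles: each AA individual carries two
    copies of A and each Aa individual carries one.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals scored")
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [freq * n for freq in hardy_weinberg_expected(p)]
    observed = [n_AA, n_Aa, n_aa]
    # Assumed test statistic: sum of (observed - expected)^2 / expected.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# A sample scored from a gel with no heterozygotes at all.
p_hat, chi2 = hardy_weinberg_chi_square(50, 0, 50)
```

With one degree of freedom (three genotype classes, one estimated allele frequency), a chi-square value above 3.84 corresponds to rejection at the 0.05 level. As the text notes, such a departure may reflect a non-genetic basis for the banding pattern or a violated Hardy-Weinberg assumption, so the statistic alone does not identify the cause.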
APPLICATIONS AND LIMITATIONS

Population Structure
Protein polymorphisms in natural populations have been used to describe allele frequency changes in both time and space. For example, Patarnello et al. (1989) showed GPI alleles to be stable over a 15-year period in the marine amphipod Gammarus insensibilis (see below). A temporal study carried out by McClenaghan et al. (1985) on mosquitofish in the Savannah River drainage basin showed that allele frequencies were generally stable through time.

On a spatial scale, population genetic structure has been inferred from the geographic distribution of allele frequencies. For example, among prairie dog populations (Cynomys ludovicianus), genetic variability was partitioned at several hierarchical levels that correlated with life history parameters (Chesser, 1983). Rogers and Engstrom (1992) found a high degree of genetic structuring in spiny pocket mice of the Liomys pictus species group, indicating that some species were paraphyletic. In contrast, Shaklee (1984) surveyed damselfish populations (Stegastes fasciolatus) collected from Midway Island to Hawaii, a linear distance of about 2500 km. Allele frequencies at all eight polymorphic loci were remarkably uniform (FST = 0.003; see Chapter 10) and, with the exception of a few very rare alleles, all populations possessed the same alleles at all loci.

Allozyme variability has been investigated from the perspective of both space and time. For example, Hoey and Parks (1991) examined allozyme variability and divergence of North American and Asian plants of the genus Liquidambar (Hamamelidaceae) and Haglund et al. (1993) evaluated Asian, North American and European ninespine sticklebacks (Gasterosteidae).

Assessments of allozyme variability may be used to infer historical events that have significantly influenced the genetic structure of populations, such as bottlenecks (Bonnell and Selander, 1974). Studies of several species of introduced birds (Parkin and Cole, 1985; St. Louis and Barlow, 1988; Baker and Moeed, 1987; A.J. Baker, 1992) have revealed that founder populations generally have fewer alleles and lowered heterozygosity than source populations. Easteal (1985) conducted a similar study and estimated effective population size (Ne) from both allozyme frequency and ecological data for founder populations of the giant toad (Bufo marinus) with known introduction histories in Hawaii and Australia, and concluded that the former gave more accurate estimates.

Extremely low levels of allozyme heterozygosity in broad geographic surveys may imply the occurrence of one or more recent severe bottlenecks, especially when genetically similar sister species possess much higher levels of variability (e.g., Menken, 1987). However, for a number of theoretical reasons, heterozygosity (H) estimates may not always be good indicators of past population bottlenecks (Nei et al., 1975; Chakraborty and Nei, 1977; Motro and Thomson, 1982; see B.J. Turner, 1984, for a possible empirical example; and Leberg, 1992, for a controlled study of bottleneck effects). From a conservation perspective, estimates of allozyme variability may be used to make genetically based management recommendations for the recovery of endangered species (Vrijenhoek et al., 1985; A.F. Echelle et al., 1989; Briscoe et al., 1992). Some plant studies (e.g., Muona et al., 1987) have demonstrated that the frequency of inbred individuals is significantly reduced as populations progress from seed to seedling to adult stages.

Inbreeding, Outcrossing, and Dispersal
Allozyme studies can mesh genetic and ecological information to strengthen inferences about specific aspects of population structure, especially breeding structure and effective gene flow (e.g., Hamrick and Godt, 1984; Slatkin, 1985, 1987; Chepko-Sade and Halpin, 1987; Ryman and Utter, 1987; Slatkin and Barton, 1989). Pope (1992) compared genetic estimates of gene flow and breeding structure with expectations based on 10 years of census data for two Venezuelan populations of the red howler monkey (Alouatta seniculus). The two populations differed significantly in among-troop genetic structure, with FST = 0.225 for population W
and FST = -0.142 for population G; the latter population was subject to a higher rate of troop failure and recolonization. This pattern also correlates with significant population differences in within-troop heterozygosity (FIS = -0.136 for W, FIS = -0.064 for G), although troops in both populations displayed excess heterozygotes, a pattern predicted from ecological and behavioral observations of mating and dispersion. The importance of obtaining independent estimates of dispersal can be illustrated by K.L. Brown's (1985) study of the demographic and genetic characteristics of dispersal in mosquitofish (Gambusia affinis) in a thermally heated pond on the Savannah River Reservation, South Carolina. Genotype frequencies of the dispersers were non-randomly distributed throughout the pond, and associations of genetic distance values and geographical distances between collection sites indicated that the dispersers did not constitute a random intermixing of refuge groups. Counter to intuitive expectations, dispersal in these populations resulted in an increase in allelic differentiation between sites and an increase in mean levels of intrapopulation heterozygosity.

Very few plant or animal species have adequate demographic data for estimating effective genetic dispersal (Endler, 1979) and Ne. In the house sparrow (Passer domesticus), Fleischer (1983) used demographic data to predict FST (Wright, 1943), and then tested the prediction with allozyme data and concluded that these birds approximated a stepping stone model (Kimura and Weiss, 1964) of genetic structure. However, a number of indirect approaches are now available for inferring gene flow patterns entirely from the geographic distribution of allozyme frequencies, and these appear to be robust for some classes of dispersal patterns (reviewed by Slatkin, 1985, 1987, 1993; see also Slatkin and Barton, 1989; Lessa, 1990; A.H. Porter, 1990; and Chapter 10).

Some population genetic surveys have revealed striking examples of heterozygote deficiency, which could result from (1) strong selection against heterozygous genotypes, (2) inbreeding, or (3) a Wahlund (1928) effect (the inclusion of two or more genetically distinct units into a single population sample). In a number of studies, the inbreeding explanation is favored (see O'Brien et al., 1983, 1985b, 1987; Knight and Waller, 1987; Crouau-Roy, 1988; Van Treuren et al., 1991).

Paternity Studies
Allozyme studies combining ecological data on dispersal with genotypic data that establish paternity of offspring have allowed assessment of the relative importance of several factors affecting genetic structure in some mammals. An experimental removal and colonization study of pocket gophers (Thomomys bottae; Patton and Feder, 1981) revealed that migration (i.e., recolonization) was nearly random with respect to the available source populations. This movement depressed between-field genetic heterogeneity, but this was restored within a single generation due to a high variance in male reproductive success. Juvenile dispersal apparently was responsible for the maintenance of intrapopulation variability in highly socially structured breeding groups of a Neotropical cave-dwelling bat (McCracken and Bradbury, 1977) and colonies of yellow-bellied marmots (Schwartz and Armitage, 1980). Genetic markers revealed that inbreeding was avoided by the near-total dispersal of male offspring from their natal colonies in both of these species.

Paternity of specific groups of offspring has been studied in several groups using detectable allozyme markers. For example, Tilley and Hansman (1976) collected female dusky salamanders (Desmognathus ochrophaeus) and their broods, and showed that at least 7% of all individual clutches were sired by more than one male. Insemination and fertilization are therefore effectively uncoupled, allowing the opportunity for sperm competition, which has been shown in controlled laboratory matings in D. ochrophaeus (Houck et al., 1985). Other studies of colony breeding structure, based on either relatedness or parentage of specific broods as inferred from allozyme markers, include Evarts and Williams (1987), Harry and Briscoe (1988), Quellar et al. (1988), and Price et al. (1989). Among studies of plants, Stanton (1986) summarized the ongoing problems in trying to assess the contribution of different fathers to successive generations, and Murawski and Hamrick (1990) investigated the effect of density of flowering individuals on the mating systems of nine
species of trees. The latter authors discovered that there was (not unexpectedly) great variation among the density of flowering individuals, which in itself varied annually, and in years of lower flowering-tree density there was greater heterogeneity and/or more selfing.

Species Boundaries
Because isozyme electrophoresis is a cost-effective method for screening a large number of single-copy nuclear gene loci (see Appendix I), it will continue to be especially helpful in multiple-population sampling efforts designed to determine species boundaries. Isozyme data readily can be used as diagnostic markers in the sense described by Davis and Nixon (1992; fixed differences between samples) for a priori identification of the basic units (species) of phylogenetic analyses; this is especially critical in view of the potential for oversplitting taxa defined exclusively by rapidly evolving portions of the animal mitochondrial genome (Moritz et al., 1992a). Isozyme studies often reveal discordant geographic patterns between levels of genetic divergence and taxonomic boundaries inferred from morphological data, especially for geologically old and morphologically conservative radiations (see Wake et al., 1983; Wake and Larson, 1987). For example, Larson (1989) has summarized 19 examples of pairs of cryptic or morphologically very similar species of plethodontid salamanders differing by from 1 to 14 fixed allozyme markers, and similar cases are known in other salamanders (e.g., D.A. Good et al., 1987; D.A. Good, 1989). Ranker and Schnabel (1986) used isozyme evidence to demonstrate the genetic differentiation of two lily species whose only clear separation was a difference in flowering time. J. Shaw et al. (1987) demonstrated the clear genetic differentiation of two moss varieties, as did Odrzykoski and Szweykowski (1991) for the thallose liverwort. However, the reverse pattern is also well documented: morphologically distinct taxa sometimes show little or no genetic divergence (B.J. Turner, 1974; Echelle and Dowling, 1992). Nevertheless, the many examples given above highlight the power of this approach for diagnosis of the basic units of analysis, and these sometimes have profound conservation implications (Daugherty et al., 1990; see also Crother, 1992; T.E. Dowling et al., 1992).

Ecological Genetics
The neutral theory of molecular polymorphism (Kimura, 1968; King and Jukes, 1969) has questioned the primacy of natural selection as an agent of molecular evolution (Lewontin, 1974; Nei and Koehn, 1983). Several statistical studies have concluded that most alternative alleles are selectively equivalent and may represent transient stages of replacement, with fixation probability being a function of mutation rate and effective population size (see Kimura, 1983a,b). However, these studies are based upon several largely untested assumptions, and may not distinguish among various processes of neutrality and selection (W.B. Watt, 1985). Alternatively, several elegant multidisciplinary studies have investigated the selective basis for specific enzyme polymorphisms.

W.B. Watt (1985, 1986) outlined a bioenergetic approach for investigating possible functional and ecological differences between alternate allozymes. Documentation of adaptive differences in allozymes requires demonstration of (1) differences in a catalytic function, (2) allozyme-based catalytic differences having physiological effects, and (3) fitness differences in natural environments between physiological effects (see also Koehn, 1978; Powers, 1987). Watt (1977) identified four common GPI alleles in several natural butterfly (Colias eurytheme) populations, and demonstrated genotypic differences in survivorship, flight activity, and mating success (Watt, 1983; Watt et al., 1985, 1986). Similar studies have been carried out on bivalves (Koehn et al., 1980, 1988; Koehn and Immermann, 1981; Koehn and Siebenaller, 1981; Hilbish et al., 1982; Hilbish and Koehn, 1985a,b; McDonald and Siebenaller, 1989), fishes (Powers et al., 1979; DiMichele and Powers, 1982a,b; Allendorf et al., 1983; Leary et al., 1984; Crawford and Powers, 1989; Ropson and Powers, 1989; Van Beneden and Powers, 1989; Ropson et al., 1990), Drosophila (Heinstra et al., 1986; Barnes and Laurie-Ahlberg, 1986; M.W. White et al., 1988), and marine amphipod crustaceans (J.H. McDonald, 1989; Patarnello and Battaglia, 1992). However, Eanes (1987) points out that many of these studies were not designed to separate the effects of a single locus upon individual fitness relative to the contribution of linked loci; future efforts will require strongly integrated approaches combining in vivo and in vitro analyses and biochemical properties of segregating allozymes (Eanes et al., 1990).

Interspecific Applications

Phylogenetic Systematics
Allozyme data (and to a lesser extent, isozyme data) have been used extensively to investigate phylogenetic relationships. Some recent reviews of phylogenetic applications include M.W. Smith et al. (1982, 1994) for vertebrates, Matson (1984) and N.K. Johnson et al. (1984) for birds, D.J. Crawford (1989, 1990) for plants, and Kilias (1987) for lichens. Buth (1984a) reviewed the application of isozyme and allozyme data to systematic problems in general, and Mabee and Humphries (1993) and Murphy (1993) provide recent evaluations of methods and suggestions for phylogenetic analysis (see also Chapter 11). The methods of data analysis used for phylogenetic analysis of these data vary widely and are highly controversial. Undoubtedly, the methods are still in a relatively early stage of refinement and much remains to be developed. What seems critical is that the assumptions associated with each method of analysis not be violated. Because each method of analysis has its limitations, and some commonly used methods are simply invalid (Buth, 1984a; Murphy, 1993; Chapter 11), the citations in the general reviews (and below) should not necessarily be considered exemplary. Here, we restrict our comments to the use of allozymes; isozyme characters such as the presence of duplicate loci, patterns of gene expression, and the ability to form heteropolymers are considered later, in the section "Gene Expression and Gene Duplication."

Allozyme characters are subject to many of the same limitations as other forms of systematic data. For example, morphologically distinct species may show very low levels of divergence, and so differ by few phylogenetically informative characters even when many loci are screened (e.g., Murphy et al., 1983a). At the other extreme, allozyme divergence may have proceeded to the point where too few electromorphs are shared, and many of those that are shared are convergent (e.g., Sites et al., 1984; Derr et al., 1987; Dimmick, 1987).

There are many groups, however, for which allozyme divergence provides information appropriate for analysis of intra- or intergeneric relationships. Allozyme electrophoresis is appropriate for analyzing intergeneric phylogeny in birds (Gutierrez et al., 1983; N.K. Johnson et al., 1988), snakes (Murphy, 1988) and, occasionally, other groups as well (e.g., Hafner and Nadler, 1988). Most studies have focused on intrageneric relationships (see reviews cited above), and these are only informative when the individual loci are analysed as discrete characters (Murphy, 1993; Chapter 11). Among other things, this has the advantage that the evidence for each node is made explicit and therefore can be related back to the primary data (e.g., Crabtree, 1987; Sites et al., 1990; Wiens and Titus, 1991).

Modes of Speciation
Allozymes can be used to explicate the patterns and mechanisms by which new species are formed. Changes in allozyme frequency have been used to identify incipient species (Aradhya et al., 1991; Gottlieb, 1973; McPheron et al., 1988), study sibling species (Anderson and Oakeshott, 1984), analyze how adaptation has contributed to the process of speciation (Allegrucci et al., 1967), differentiate between competing hypotheses for the origin of new species (Mayden, 1986; Small et al., 1992), and explore the role that speciation plays in evolution (Mindell et al., 1989, 1990). Even though DNA analyses can be applied to these kinds of questions, the ease with which electrophoresis can be used to survey large numbers of individuals for genetically informative features makes allozyme analysis remain the tool of choice for many studies.

Paleobiogeography
Phylogenetic hypotheses formed from allozyme data can be applied to answering questions of paleobiogeography. A primary method is Brooks
parsimony analysis, or BPA (Wiley, 1988a,b), also known as co-speciation analysis. In the first step of this method, cladograms are constructed for the taxa in question (Brooks, 1981, 1990). Next, geographic areas in which the species occur are designated as if they were taxa. Using geological evidence, an area cladogram is constructed showing the historical connections among the study areas. Next, the taxa are treated as if they comprised a completely polarized multistate transformation series in which each taxon and each internal branch of the tree are numbered. The taxa are then coded using non-redundant linear coding (O'Grady and Deets, 1987) and the species names are replaced by their area names. A new area cladogram is constructed based on the phylogenetic relationships of the species, and this new area cladogram is presumed to represent the historical involvement of areas in the evolution of the species. Although this is the preferred method of paleobiogeographic analysis, it has rarely been applied to allozyme data (see also Kluge, 1988). A less preferable alternative is described below.

Rates of Evolution
Questions of evolutionary tempo are important considerations, especially if applying the molecular clock (see Chapter 12) or examining relative rates among different kinds of data from the same taxa, for example, allozymes versus morphology. Rosen and Buth (1980) provided a protocol for examining evolutionary tempo using allozyme data that included the calculation of ancestral genetic distance between all examined taxa and their hypothetical common ancestor. Murphy and Crabtree (1985a) applied this method to rattlesnakes and found that rates of divergence had been equal, although they were unable to confidently calibrate the clock. However, comparisons of relationships revealed by methods that assume equal rates (e.g., UPGMA) and distance (similarity) methods that do not (e.g., distance-Wagner trees; Chapter 11) most frequently reveal marked variation among lineages in rates of change (e.g., Baverstock et al., 1979; Hillis, 1985). In general, the evidence for an allozyme clock is weak (Avise and Aquadro, 1982; Chapter 12). Finally, a phylogenetic approach has been developed recently where evolutionary tempo can be examined from the perspective of cladograms (Mindell et al., 1989, 1990; Murphy and Lovejoy, 1995).

Tempo questions may also be applied to (1) estimated dates of dispersal or vicariance events, (2) the relative arrival times of taxa on oceanic islands, (3) relative roles of colonization, extinction and historical factors in island biogeography, and (4) prediction of the time of origin of populations or geographic areas, such as islands, in the absence of supporting geological data or radioisotope dating, such as 14C. For example, Murphy (1983a) used genetic distance data from presumed sister taxa of a number of reptile taxa presumably isolated by the same geological vicariant events in Baja California, Mexico and found a good correlation between geological dates and genetic similarity. Genetic distance data were then used to predict the age of one island in the Gulf of California, Isla Santa Catalina. Unfortunately, most of the sister taxa were presumed rather than tested using more preferable cladistic methods, such as that of Mindell et al. (1989, 1990). Nevertheless, genetic similarity data were extended to test the applicability of the MacArthur and Wilson (1963, 1967) theory of island biogeography as it relates to reptiles on islands in the Gulf of California (Murphy, 1983b), and to the colonization of islands by some rattlesnakes (Murphy and Crabtree, 1985a).

Hybridization
Ideally, studies of interspecific hybridization should incorporate three features, including: (1) phylogenetic analysis of the taxa involved to allow inferences to be drawn about the origin of the hybrid zone (primary versus secondary; e.g., Hillis, 1985), (2) identification of autapomorphic electromorphs in each of the hybridizing species, which provides the most unambiguous genetic markers for gene flow inferences (Murphy et al., 1984), and (3) identification of at least three unlinked markers (fixed or nearly fixed electromorph differences) between hybridizing taxa (see Chapter 2). With three or more single-copy markers, F1 individuals (heterozygous for parental electromorphs at all markers) can clearly be distinguished from most F2 or backcross classes, which will be heterozygous for some but not all
markers (R.J. Baker et al., 1989; D.A. Good, 1989; Wake et al., 1989; Arevalo et al., 1993; Sites et al., 1993). Few of the studies conducted to date satisfy all of these criteria, but most have contributed to a better understanding of the structure and dynamics of hybrid zones. Least informative are studies in which hybridizing populations are not characterized by any fixed allozyme differences (Greenbaum, 1981; Frykman and Bengtsson, 1984; Halkka et al., 1987). Hybridizing taxa distinguished by several fixed differences (e.g., Patton et al., 1984; D.A. Good, 1989; Szymura and Barton, 1986, 1991; Wake et al., 1989; Dessauer and Cole, 1991) offer greater potential for inferring the extent and symmetry of introgressed nuclear genes.

Several recent studies have used allozyme markers to infer the extent of introgression of other classes of genetic markers (M.L. Arnold et al., 198%; Harrison et al., 1987; Nelson et al., 1987; Marchant et al., 1988; R.J. Baker et al., 1989; Klier et al., 1991; Arévalo et al., 1993), and occasionally hybridization is assessed in a historical context (Dowling and DeMarais, 1993). Isozymes have also been used to study developmental stability as manifested by morphological asymmetry (Graham and Felley, 1985), and the origin and distribution of rare or unique alleles (called hybrizymes by D.S. Woodruff, 1989; see also Hunt and Selander, 1973; Sage and Selander, 1979; Greenbaum, 1981; Barton et al., 1983; Case and Williams, 1984; Murphy et al., 1984; Kocher and Sage, 1986; Gollmann et al., 1988; Bradley et al., 1993), the genetic status of threatened and endangered taxa (Echelle and Conner, 1989; Dowling and Childs, 1992), and to address issues of hybrid speciation (M.L. Arnold et al., 1990; Meagher and Dowling, 1991; DeMarais et al., 1992). Future studies of hybridization that merge ecological and molecular genetic approaches in appropriate phylogenetic and biogeographic contexts offer great potential for understanding processes involved in genome divergence, gene flow, and adaptation to alternative stable equilibria (Hewitt, 1988; Barton and Hewitt, 1989; Harrison, 1990).

Parentage of Unisexual Biotypes
Allozyme electrophoresis is a powerful method for identifying the bisexual parent taxa involved in the hybrid origins of various unisexual taxa. Most carefully examined unisexual vertebrates appear to be of hybrid origin (reviewed in Dawley and Bogart, 1989). Typically, unisexual taxa are characterized by higher levels of multilocus heterozygosity than either parental form because of their hybridity and the absence of segregation. Laboratory studies have confirmed patterns of clonal inheritance of fixed heterozygosity in some unisexual lizards (Dessauer and Cole, 1986). In cases of multiple ploidy levels among different unisexuals of hybrid lineages, allozymes frequently show different staining intensities due to alleles that are represented unequally in the genome (Dessauer and Cole, 1984, 1989; Dawley et al., 1985; Kraus, 1991). Ideally, an analysis of suspected hybridization events should be carried out in a phylogenetic context that will permit the identification of uniquely derived (autapomorphic) markers in the parental species; this will eliminate ambiguities arising from the use of shared ancestral (symplesiomorphic) alleles to define bisexual taxa involved in hybridization events (W.H. Wagner, 1983; Murphy et al., 1984; Funk, 1985; Moritz, 1987; Sites et al., 1990).

Allozyme data are useful in the estimation of clonal diversity within gynogenetic or parthenogenetic populations which arise through recurrent hybridization (Moritz et al., 1989c; Vrijenhoek, 1989), mutation (Parker and Selander, 1976; Spinella and Vrijenhoek, 1982), limited recombination (Asher, 1970; Bogart et al., 1987), or some combination of these factors. The matrilineal clones are frequently not a random representation of the possible genotypic diversity (B.J. Turner et al., 1983); interclonal selection may produce habitat or trophic specialists, or hybridogens with different life history characteristics, and these kinds of differences may be "frozen" during the origin of new clones (Vrijenhoek, 1989). Others have addressed questions of genome replacement in some of the unisexual salamanders (Spolsky et al., 1992) and have used isozymes in combination with other genetic markers to provide evidence for semi-independent segregation of unisexual allochthonous genomes during hybridogenetic meiosis in some populations of the salamander genus Ambystoma (Kraus and Miyamoto, 1990).
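
The three-marker criterion given in the Hybridization section, which also underlies the use of multilocus heterozygosity to recognize hybrid-derived lineages, can be made concrete with a short calculation. The Python sketch below assumes simple Mendelian segregation at unlinked diagnostic loci with no selection, so that a first-generation backcross or F2 individual is heterozygous at any one diagnostic locus with probability 1/2. The function names and the 'A'/'B' allele labels (one fixed allele per parental species) are hypothetical and used only for illustration.

```python
from fractions import Fraction


def prob_all_heterozygous(n_markers, per_locus=Fraction(1, 2)):
    """Chance that a backcross or F2 individual is heterozygous at every one of
    n unlinked diagnostic markers, and so has the same multilocus genotype as an F1.
    Assumes independent Mendelian segregation (see lead-in)."""
    return per_locus ** n_markers


def looks_like_f1(genotypes):
    """True if every diagnostic locus carries one allele from each parental species.
    genotypes is a list of two-allele tuples, e.g. [("A", "B"), ("B", "A")]."""
    return all(set(locus) == {"A", "B"} for locus in genotypes)


for k in (1, 3, 5):
    p = prob_all_heterozygous(k)
    print(f"{k} marker(s): {p} of backcross/F2 individuals match the F1 pattern")

# An individual homozygous at the second locus cannot be an F1.
candidate = [("A", "B"), ("A", "A"), ("B", "A")]
print("consistent with F1:", looks_like_f1(candidate))
```

Under these assumptions, three markers leave one backcross or F2 individual in eight with an F1-like genotype, which is consistent with the statement that F1 individuals can be separated from most, but not all, later-generation classes. Adding markers shrinks that fraction by half each time.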
Origin of Polyploid Plants (Ohno, 1970; MacIntyre, 1976; B.J. Turner et al.,
As in studies of hybridization, isozymes have 1980) can produce isozymc loci that often diverge
been valuable in identifying the parents of poly- markedly in their developmental expression
ploid plants. Isozymes have supported hypothe- (Whitt, 1981).Differences in gene number may ei-
ses based on other lines of evidence (Roose and ther serve as characters useful in systematic stud-
Gottlieb, 1976; Werth et al., 1985a) and differenti- ies (Gottlieb, 1982b; Whitt, 1983, 1987; Buth,
ated among alternative hypotheses (Holsinger 1984a), or have little value because of extensive
and Gottlieb, 1988; Gastony, 1986). Allozymes homoplasy (Sites and Murphy, 1991). These dif-
have shown that some polyploids have a single ferences can arise through gains of new genes
origin (Werth et al., 1985b) and that some are au- (duplication) or losses (gene silencing); both con-
topolyploids (Soltis and Rieseberg, 1986). In the ditions may be considered as derived relative to
process of exploring the origin of allopolyploids, an ancestral state. For example, many groups of
allozymes have been used to predict the existence fishes (Buth, 1983) and plants (Gottlieb and Wee-
of, and ultimately to discover, new diploid species den, 1979; Gottlieb, 1982a) have extra loci encod-
(Pryer and Haufler, 1993). Diversification and spe- ing enzyme systems, suggesting that gene dupli-
ciation in polyploid lineages have occurred cation events have played an important role in
through gene silencing (Werth and Windham, their evolution, perhaps in the acquisition of
1987; Gastony, 1991). However, if gene silencing novel gene functions (Ohno, 1970; Markert et al.,
regularly leads to diploidization of entire poly- 1975; Fisher et al., 1980). In fishes, tetraploidiza-
ploid genomes (Haufler, 1987), then the value of tion is followed by a shift from tetrasomic (pair-
allozymes for assessing ploidy must be ques- ing of homologous chromosomes in tetrads) back
tioned, especially in phylogenetically ancient to disomic (pairing in dyads) patterns of inheri-
groups. tance. During this "rediploidization," some 50%
of the duplicated loci are silenced either by fixa-
tion of new mutations or the deletion of some
Gene Expression and Gene Duplication codons (Allendorf and Utter, 1973; Ferris and
The expression of gene products is subject to both Whitt, 1977a,b; W.H. Li, 1980).Patterns of malate
temporal (ontogenetic) and spatial (cells/tissues) dehydrogenase (MDH) meiotic segregation dur-
variation in organisms. The predominance of ing rediploidization in the recently evolved
products of different L-lactate dehydrogenase loci tetraproid frog Hyla versicolav suggest polyrnor-
in different tissues of vertebrates (e.g., Ldh-A in phic (tetrasomic, disonic, and tetrasomic-dis-
skeletal muscle, Ldk-B in heart) is a classic exam- omic) inheritance tl~oughtto be a transitory phase
ple of this phenomenon (reviewed by Markert, between complete tetrasomy and complete dis-
1983). An example of the evolutionary conse- omy (Danzmann and Bogart, 1982a).A phyloge-
quences of regulatory divergence in gene expres- netic evaluation of the catostomid fish Moxastoma
sion is provided by the third L-lactate dehydroge- lachneri suggests that "retetraploidization" of the
nase locus (Ldk-C) in the bony fishes. Fishes of second glucose-6-phosphate isomerase locus, Gpi-
several morphologically primitive orders express B, is due to reactivation, or postpolyploidization
Ldk-C in many tissues, whereas in most teleosts regional duplication (Buth, 1982a).
Ldh-C expression is limited to eye or liver tissue Isozyme staining intensities may be used to
(Shaklee et al,, 1973; W11itt et al., 1975; Shaklee investigate ploidy levels. Danamann and Bogart
and Whitt, 1981). To ensure relevant comparisons (1982b) and Dessauer and Cole (1984) found that
of homologous gene products, extracts from ho- gene dosages, and thus ploidy levels (212, 3n, or
mologous tissues/organs must be prepared and 4n), could be inferred accurately from staining in-
specimens at similar developmental stages com- tensities because the subunit interactions were
pared. additive.
The duplication of genes via aneuploidy, polyploidization, and regional gene duplications

As discussed earlier, many enzymes are multimeric, composed of subunits that must be assembled in order for the enzyme to function. Multiple isozymes of multimers can be produced by combining different kinds of subunits in heterozygotes (heteromers) and by the interactions among multimers of duplicated genes in a multilocus isozyme system producing interlocus heteropolymers. Heteropolymer formation may be non-random because regulatory differences may suppress the formation of some or all of the possible heteromers, e.g., the heterotetramers of L-lactate dehydrogenase of some lizards (Gorman, 1971; Sites et al., 1986), fishes (Buth et al., 1980), and snakes (Murphy, 1988).

The isozyme characters (sensu Whitt, 1983, 1987; Buth, 1984b; Murphy and Crabtree, 1985b) of gene number, tissue specificity of expression (gene regulation) and posttranslational modification, and heteropolymer assembly can be of systematic value only if they vary at a taxonomic level useful to the investigator. These characters may be useful for intraspecific, intrageneric, or intrafamilial comparisons depending upon the group (Buth, 1984b). However, the few studies of enzyme systems reveal certain limited group trends. Studies of creatine kinase (CK) expression in fishes by Ferris and Whitt (1978b), Fisher and Whitt (1978, 1979), and others permit the generalizations for CK isozyme characters listed in Table 2. Three of the four evolutionary patterns in Table 2 appear to hold for amphibians and reptiles (Buth et al., 1985). In contrast, LDH expression in sea snakes and cobras (Murphy, 1988), and the number of loci encoding glycerol-3-phosphate dehydrogenase (G3PDH) in squamate reptiles (Sites and Murphy, 1991) and glucose-6-phosphate isomerase in the Leguminosae (Weeden et al., 1989) does not correlate strongly with phylogenetic relationships.

Table 2
Evolutionary patterns of creatine kinase gene expression in fishes

                                    Character state
Character                           Ancestral     Derived
Number of loci                      2 (A, C)      4 (A, B, C, D)
Tissue specificity                  Widespread    Restricted
Interlocus heterodimer formation    Present       Absent
Intralocus heterodimer formation    Present       Absent

Summarized from Ferris and Whitt (1978b) and Fisher and Whitt (1978, 1979).

Limitations

Taxonomic Limits

Studies of population structure, breeding biology, and other intraspecific applications require sufficient levels of intraspecific variability. Allozymes are not sufficiently variable in some organisms, making other molecular methods, such as RFLP studies (Chapter 8), more appropriate. For example, DeSalle et al. (1987b) examined the distribution of mtDNA haplotypes in populations of Drosophila mercatorum distributed along a short altitudinal transect near Kamuela, Hawaii, and found statistically significant spatial and temporal heterogeneity in the absence of isozyme divergence. Intraspecific studies of birds are often hampered by very low levels of isozyme polymorphism (Barrowclough et al., 1985), yet Quinn and White (1987b) demonstrated extensive genomic DNA RFLP variability in the snow goose (Anser c. caerulescens; see also Haig et al., 1993, for another example). Similarly, Sites and Davis (1989) found many more variable markers using restriction sites in both mtDNA and nuclear ribosomal DNA than they found using allozymes among central Mexican chromosome races of the lizard Sceloporus grammicus. These and other studies (Wetton et al., 1987; Karl et al., 1992; Karl and Avise, 1993) show a definite lower taxonomic limit to the resolving
power of protein electrophoresis (which may vary among groups; Kessler and Avise, 1985b).

At the opposite extreme, some taxa have diverged to the extent that they share virtually no alleles. For example, Sites et al. (1984) surveyed 17 genera of batagurine turtles and found the taxa to be so divergent and homoplasy so extensive that they could not recover well-corroborated branches for most basal stems of the cladogram. High levels of divergence found among congeneric fern species average D = 1.1 (I = 0.33) (Soltis and Soltis, 1989), which approach or exceed the limits of resolution of isozyme electrophoresis. Nei (1987: 251-252) offered as a general rule that if genetic distance D (Nei, 1972, 1978) is greater than 1.0, then the frequencies of back/parallel mutations will be high, and the variance of D large, even if numerous loci are assayed. The hierarchical taxonomic level at which phylogenetic utility is lost (D ≥ 1.0) will vary with taxonomic assignments and taxon-specific rates of molecular evolution (birds appear to be decelerated; Avise and Aquadro, 1982), but generally the greatest phylogenetic utility for isozymes will be at the level of species or closely related genera (Nei, 1987).

Sampling Limitations

Several kinds of limitations of isozyme techniques are recognized, including limits to the number of (1) loci resolved, (2) alleles per locus, and (3) individuals required for population or phylogenetic studies. The total number of loci that can now be visualized with histochemical staining techniques is in excess of 300 (D.A. Wright et al., 1983; Morizot and Siciliano, 1984; Manchenko, 1994), but this is still only a very small sample of the total genome. However, given the size of most eukaryotic genomes (summarized in Cavalier-Smith, 1985b; Loomis, 1988), this is a constraint common to most molecular techniques and will not be elaborated further here. What is apparent is that, in general, one needs to resolve about three times as many loci as there are taxa in order to have a reasonable chance of resolving most nodes of a cladogram in a character-state evaluation.

LIMITS TO DETECTION OF SEGREGATING ALLELES Hubby and Lewontin (1966) recognized that gel bands represented enzyme phenotypes, and not necessarily all underlying allelic variation. King and Ohta (1975) introduced the term electromorph to label allozymes of the same mobility as different classes of alleles. Allendorf (1977) stressed that electromorph identity did not mean identity in DNA base sequence; homology is a conditional concept for isozyme phenotypes.

Because accurate estimation of allelic variation has important implications for many evolutionary questions (Coyne, 1982), the problem of hidden heterogeneity (G.B. Johnson, 1977) fostered several studies to determine how accurately conventional electrophoretic techniques estimate genetic variability. R.S. Singh et al. (1976) used a sequential assay of four different electrophoretic conditions, termed sequential electrophoresis, and heat stability tests to examine Xdh-A variation in Drosophila pseudoobscura. They resolved 37 alleles where only 6 had previously been identified by conventional protocols. Other approaches to detecting cryptic alleles include thermostability analysis (e.g., Chambers et al., 1981), peptide mapping (Ayala, 1982), and the use of polyacrylamide gels of varying pore sizes to produce a sieving effect for separation by size or molecular weight (G.B. Johnson, 1976, 1979).

Although these methods show that conventional isozyme electrophoresis may underestimate variability, they do not reveal what proportion of alleles may remain undetected. Ramshaw et al. (1979) examined several human hemoglobin variants of known amino acid sequence using both standard and sequential acrylamide electrophoresis (varying conditions of pH and pore size). Three experiments determined what types and proportions of substitutions could be resolved by these methods. First, 8 and 17 hemoglobin variants out of 20 were detected by the two procedures, respectively. Second, groups of variants with the same amino acid substitutions in different parts of the molecule were screened by two approaches and revealed 77% and 90% of the known variants, respectively. Third, 4 of 5 pairs of hemoglobins differing by charge-equivalent substitutions in the same positions were separated by both procedures. There was no class of commonly indistinguishable substitutions, and Ramshaw et al. (1979) concluded that the standard protocol of electrophoresis was a powerful method for identifying most variants.

McLellan (1984) examined 14 whale myoglobins of known sequence by sequential polyacrylamide electrophoresis (five pH values) and was able to separate 13 of the 14 variants. No further resolution was obtained by altering concentration or composition of the gels, or by screening with other techniques such as urea denaturation or isoelectric focusing (McLellan and Inouye, 1986).

Aquadro and Avise (1982a) used several starch and acrylamide conditions, gel-sieving, isoelectric focusing, and thermal stability tests to screen for cryptic alleles at three loci (sAat-A, sMdh-A, and Est-1) in five populations of Peromyscus maniculatus. sAat-A (their Got-1) was previously known to segregate for two alleles across most of the range, sMdh-A (their Mdh-1) was essentially monomorphic throughout the range, and Est-1 was highly polymorphic, with eight alleles resolved in earlier studies. None of the techniques uncovered any further variation in either sAat-A or sMdh-A. In contrast, sequential electrophoresis (five additional starch gel conditions) resolved 23 variants in Est-1, which were further resolved into 35 variants by heat denaturation, although the allelic nature of the latter group was not determined. Aquadro and Avise (1982b) also uncovered additional sMDH isozymes among ten orders of birds using multiple buffers.

Bradley et al. (1993) sequenced the entire coding regions from multiple individuals of gophers (Geomys) that expressed several combinations of three Adh-1 electromorphs. They found that the three electromorphs were encoded by a total of six alleles at the nucleotide level and five alleles at the amino acid level. However, each electromorph class contained only alleles that were phylogenetically closely related. Thus, the electromorphs represented natural groups of alleles that would be expected to be informative about phylogenetic relationships, even though all of the nucleotide variation was not apparent in the allozymic differences.

Clearly, hidden heterogeneity is pervasive, and one cannot always rely on any single method to resolve all alleles. Equally important, however, are findings that (1) some loci are much more likely than others to harbor cryptic alleles, especially systems originally resolved as highly polymorphic by conventional methods, and (2) conventional methods will resolve most or all variation at the more conservative loci. Further, a number of classes of studies will be largely unaffected by this phenomenon (Coyne, 1982). Fixed differences between populations or species detected by conventional methods are real and the differences can only increase by resolution of additional alleles. Similarly, between-population allele frequency heterogeneity is also real, regardless of any underlying heterogeneity in electromorphs, because such differences in electromorph classes should also reflect the same deviations of cryptic alleles. Other kinds of studies (e.g., absolute estimates of heterozygosity) may be more affected by cryptic allelic variability, but to an unknown degree. Obviously, any problem addressed with isozyme techniques will be better understood by more accurate descriptions of allelic variation. Where time and resources permit, we suggest that at least loci showing extensive variation under standard conditions be screened sequentially with additional buffers to maximize separation. Notwithstanding, at least one study (C.D. Chase et al., 1991) suggests allozymes correlate well with DNA RFLP data (Chapter 8). For phylogenetic studies involving extensive radiations likely including multiple monophyletic groups, conservative loci also should be screened with multiple buffers for resolution of additional electromorphs likely to define basal splits (S.B. Hedges, 1989; Burnell and Hedges, 1990). To this we would add that all hypothesized synapomorphic allozymes, identified as such through a preliminary phylogenetic analysis, should be subjected to sequential electrophoresis. However, the sequencing study by Bradley et al. (1993; discussed above) lent support to the hypothesis that even if electromorphs are encoded by different alleles at the nucleotide level, allozymes nevertheless are likely to represent related groups of
alleles, which are thus informative about evolutionary relationships.

NULL ALLELES AND ISOLOCI Other phenomena cause deviation from codominant expression of allozymes. Null alleles (those with reduced or no expression of a protein product) are detected by reduced staining intensity of some single isozymes on the same gel; complete absence of activity may indicate null homozygotes (see Utter et al., 1987). These interpretations are often ambiguous and require confirmation by breeding studies (e.g., Stoneking et al., 1981). In the absence of breeding studies, quantification of a null allele cannot be made reliably. Apparent heterozygote deficiencies may be due to null heterozygotes being scored as active homozygotes (Foltz, 1986). Heterozygotes for null alleles are more readily detected if they either form partial heteropolymer isozymes in polymorphic single-locus systems (Burkhart et al., 1984) or are expressed in multilocus, multimeric proteins (e.g., Engel et al., 1973; Allendorf et al., 1984; Utter et al., 1987; Gastony, 1991). In both cases, reduced intensities of one or more multiple bands provide additional visual clues to the presence of null alleles.

Another difficulty may occur when isozymes with identical electrophoretic mobilities represent the products of two different loci of the same multilocus enzyme system (Utter et al., 1987). These isoloci may present rather complicated isozyme patterns, and, if allelic variation is present, determination of which locus is polymorphic (or whether both are) may not be possible. Isoloci may be individually identifiable if their respective encoded loci are synthesized at different levels in different tissues, but this appears to be uncommon (Allendorf and Thorgaard, 1984). Under some circumstances, different staining intensities are expected (see Utter et al., 1987), but often such distinctions are difficult or impossible to make. Changing electrophoresis buffers often results in the separation of isoloci.

Other Sources of Phenotypic Variation of Isozymes

The phenomena described above may either limit the resolving ability of isozyme electrophoresis, or complicate isozyme interpretations. Several non-Mendelian factors also may complicate isozyme phenotypes via in vivo or in vitro environmental conditions, or through the action of modifier loci.

POSTTRANSLATIONAL MODIFICATIONS OF ENZYMES Polypeptide synthesis involves (1) translation, (2) polymerization, (3) termination, and (4) processing of the final protein product. Only the first step involves the direct coding of nucleotide sequences into primary protein structure, while the others are posttranslational processes that give a final structure to the product. These latter processes change the 20 primary amino acids specified by the genetic code as monomeric building blocks in polypeptide assembly into about 140 amino acids and derivatives in completed proteins (Uy and Wold, 1977). On gels, a number of these epigenetic events may produce conformational isozymes, or multiple forms of a single gene product that differ in secondary or tertiary structure (also called secondary isozymes or subbands; Richardson et al., 1986) and/or variants that differ in thermal stability (Lebherz, 1983). In some cases, modifying genes have been shown to be polymorphic for alleles that differ in their influence on electrophoretic mobilities of the protein products (Cochrane and Richmond, 1979; Womack, 1983; Dykhuizen et al., 1985). In other cases, altered mobilities appear to be restricted to specific tissues (Murphy and Crabtree, 1985b), or to be a function of environmental conditions and/or the physiological state of the organism (McGovern and Tracy, 1981; van Tets and Cowan, 1966; Fields et al., 1989). For example, in a cryptic species of the freshwater clam genus Corbicula, the synthesis of an enzyme seems to be a seasonal event in an entire population (Hillis and Patton, 1982). Such alterations of mobility may lead to incorrect hypotheses about the number of loci encoding an enzyme system (Hickey et al., 1989).

Mobilities of some proteins are also susceptible to protease degradation associated with repeated freezing and thawing (Harris and Hopkinson, 1976; Richardson et al., 1986), or long- and short-term aging of the sample (Walter et al., 1965; Kobayashi et al., 1984). Moore and Yates (1983) showed that many of the loci frequently screened
in population and systematic studies were resistant to mobility modification when kept at room temperature up to 12 hours after death. Posttranslational effects can frequently be determined by evaluating relative intensity of isozyme staining; alternate segregating alleles usually give constant patterns of expression, while breakdown effects are likely to give a full range of expression of relative strengths (Richardson et al., 1986).

INTRACISTRONIC RECOMBINATION Recombination between alternative nucleotide sequences within a single gene locus has been investigated theoretically by a number of workers with respect to its potential evolutionary importance (W.B. Watt, 1972; Strobeck and Morgan, 1978; Morgan and Strobeck, 1979; Golding and Strobeck, 1983). The mechanism can generate new alleles at rates several orders of magnitude above standard gametic mutation rates if some minimum level of variability is already present at a given locus. This hypothesis, namely that "polymorphism generates polymorphism," was empirically demonstrated by Ohno et al. (1969) in test crosses of 26 pairs of Japanese quail (Coturnix c. japonica) heterozygous for different combinations of four alleles for phosphogluconate dehydrogenase. They recovered 11 mutation-like events in enzyme phenotypes scored from 1011 test cross progeny, including new electromorphs, novel-combination genotypes, and one case of the inheritance of three alleles. If these results are due to intracistronic recombination, and if this is a general phenomenon, then it may form part of the explanation for the well-known "rare allele" observations in many hybrid zones (Woodruff and Thompson, 1980; Murphy et al., 1984; but see Bradley et al., 1993). This type of recombination differs qualitatively from that between separate loci by producing new gene products, which may then be transmitted in a Mendelian fashion, rather than new genotypic combinations that will be disrupted in later generations.

NON-GENETIC VARIATION Frieden (1963) and Hernandez-Juviel et al. (1993) have reported relative mobility changes for glutamate dehydrogenase from bovine and rattlesnake liver extracts, respectively, as a function of varying enzyme dilution. This dilution effect, which occurs because GTDH molecules associate with one another in the presence of coenzymes and purine nucleotides, is known from GTDH only. Although the phenomenon of greater mobility with increasing dilution occurs neither in all taxa nor on all buffer systems, it remains a variable to be considered.

LABORATORY SETUP

In order to carry out starch gel electrophoresis, a number of pieces of equipment are essential and others are highly desirable. In many cases, initially more expensive alternatives are more cost-efficient in the long run because of time saved. Table 3 lists both the necessary and desirable equipment for SGE but does not include a detailed list of many supplies (see Werth, 1985). Table 4 provides a list of chemicals required for the stain and buffer recipes (Appendices 1 and 2, respectively).

Table 3
Basic equipment and non-chemical supplies necessary and desirable for starch gel electrophoresis(a)

                                                       Quantity   Quantity   Protocol
Equipment                   Description                needed     desirable  number

Major equipment
Freezer                     Manual defrost             1          1          6
Refrigerator                >12 cu ft                  1          1          6,8
Analytical balance          0.1 mg to 100 g            1          1          1,2,6
pH meter                    0.01 pH, with Tris probe   1          1          2,6
Fume hood                                              1          1          2,4,6
Water deionizer and filter                             1          1          1,2,4,6
Power supplies              0-500 V; 0-100 mA          1          10+        4
Refrigerated, high-speed    >10,000 g                  0          1          1
  centrifuge
Centrifuge rotor            Fixed angle; 24-36 place
Refrigerated chamber or
  walk-in refrigerator
Ultracold freezer, -70°C
CO2 (or LN2) backup for
  ultracold freezer
Incubator
Tissue homogenizer          High speed
Sonicator/cell disrupter
Water bath
Microwave oven
Pipetters, set              Adjustable: 1 µl-5 ml
Single lens reflex camera   With macro lens and
                              yellow filter
Ice machine                 In lieu of blue ice

Minor equipment
Gel molds                                              1          >20
Buffer wells (trays)                                   1 pair     >20 pairs
Desiccators                                            2          4          6
Spatula (stainless steel)   Large and small            12 of each >12        6
Magnetic stirrer            Preferably with hot plate  1          1          6
Magnetic stirring bars      Various sizes              1 pkg      1 pkg      6
Aspirator/vacuum line                                  1          1          2
Aspiration safety shield                               1          1          2
Bunsen burner               1000 Cal                   1          1          2
Heat gloves                                            1 pair     2 pairs    2
Gel slicer                                             1          1          5
Polystyrene stain boxes                                10         >200       6
Hazardous chemical                                     1          2          6
Forceps                     Small, straight tips       2          24         3
Forceps                     Small, curved tips         2          24         3
Dissecting kit              Scalpel, scissors, etc.    1          2          1
Copy stand                  Adjustable height          1          1          8
Light box                                              1          1          6,8
Ultraviolet lamp            Long wave (340 nm)         1          1          8
UV light face shield                                   1          1          8
Pipetter                    Ranging 1-10 ml; fixed     0          5          6
                              volume; bottle top
Liquid dispenser            Adjustable; 10-60 ml       0          1+         6
Erlenmeyer flasks           125-1000 ml                Numerous              2,6
Glass bottles, narrow       200-4000 ml                Numerous              6
  mouth, amber color
Graduated cylinders         10-1000 ml                 5          10         2,6
Beakers                     3-3000 ml                  1 of each  12         6
Funnels                     Large and small            2          >2         2,6
Wash bottles                250 ml                     1          6+         1,5,6
Pasteur pipettes                                       As needed             2,6
Disposable rubber or                                   As needed             2,6
  vinyl gloves
Disposable dust mask(s)                                As needed             2,6

(a) See text for discussion and alternatives.

The room in which electrophoresis is to be carried out should be equipped with sufficient counter space and electric outlets, a large sink, gas line (propane or natural gas), and a certified, working fume hood. If a fume hood is not available, then all procedures involving the use of 2-mercaptoethanol and some alcohols must be avoided; the former liquid is highly volatile and the fumes may be lethal. Either the water pressure must be sufficiently high to allow for a faucet aspirator or a vacuum pump or vacuum line must be available. There should be an abundant supply of distilled water and a water filter/deionizer.

The starch gels must be cooled during electrophoresis. Ideally this is accomplished by performing electrophoresis in a walk-in refrigerator, chromatography chamber (dairy case), or standard refrigerator; horizontal gels are cooled further using ice-filled aluminum trays. If a refrigerated chamber is not available, the ice levels must be checked at least every 2 hours, making overnight electrophoresis runs difficult. In addition, tissue homogenates are kept cool on ice while gel loading takes place. Thus, an ice machine is highly desirable. If crushed ice is not readily available, then Blue Ice™ packs can be used during electrophoresis without having deleterious effects.

Most staining gels are placed in an incubator set at 37°C. Alternatively, gel staining can be carried out in dark cabinets or drawers, the only effect being a longer stain reaction time.

It may be necessary or desirable to construct some of the equipment, especially gel molds, buffer wells, gel origin guide, gel slicer, slicing tray, and aspiration shield. Plans and examples of equipment are provided in Figures 1-5 and detailed assembly instructions will be provided upon request to R. W. Murphy. Buffer well plans are designed to prevent accidental electrocution (see E.W. Spencer et al., 1966). Gel molds, buffer wells, and gel origin guides are constructed from high-quality acrylic plastic (transparent polymethyl methacrylate) sheets, such as Plexiglas™ G. The pieces of plastic are glued using either methylene chloride, chloroform, or a commercially available solvent (methylene chloride containing dissolved plastic).

Figure 1 Plans for two types of gel molds used in horizontal starch gel electrophoresis. (A) Simple gel mold that requires the use of a sponge wick. (B) A wickless gel mold. The construction material is %-inch acrylic plastic. All measurements are in millimeters.

Figure 2 Design for an electrophoresis buffer tray that prevents accidental electrocution. (A) Base. (B) Cover. Construction material is %-inch acrylic plastic. All measurements are in millimeters. (Labeled parts: male banana plug, female banana plug, wire.)

Figure 3 Gel slicing apparatus. (A) Bow slicer (constructed from 'h-inch aluminum bar). (B) Gel slicing tray (constructed from %-inch acrylic plastic). All measurements are in millimeters. (Labeled parts: 6 mm steel rod, music wire, groove to hold wire, 10/32 nuts and bolts.)

Table 4 lists the chemicals necessary to establish an allozyme electrophoresis (specifically, SGE) laboratory having a capacity to use many different buffer combinations for running and staining of most enzyme systems that have been adapted to eukaryotes.

PROJECT PLANNING

The problems to be solved in preliminary studies are (1) what is the optimal buffer system? and (2) how does expression vary among tissues and which tissues are best for analysis? This technical development phase can be combined with a pilot study (see Chapter 2) to determine the efficiency of the approach. We have found that most frequently the optimal gel buffer systems for particular proteins vary among taxa and are not transferable. Also, impurities in water can affect differences in electrophoretic conditions, making interlaboratory protocols vary. Unless multiple gel buffer systems are initially tried for each enzyme or general protein system, much of the variation may be unresolved (see above). Before the isozyme data are gathered, it is highly desirable, if not essential, to determine independently which of the various gel buffer systems are useful. Therefore, we have avoided suggesting buffer and stain combinations, although Kephart (1990) has summarized commonly used combinations for plants.

With five gel setups, in a few days it is possible to determine optimal electrophoretic conditions by surveying a few specimens for a wide array of enzymes on virtually all commonly used gel buffer systems. Each of the five gels is made from a different buffer and can be cut into 5 x 5 minislices, allowing the rapid survey of 30 or more enzyme systems. Five minigels representing five different buffers are simultaneously stained in the same stain box, making the protocol cost- and time-efficient. The specimens examined can represent the taxonomic diversity to be studied (see Chapter 2), a range of different tissue types, or both.

It may be important to have a mix of relatively rapidly and slowly evolving loci, especially if one study is to be compared to another, or if different hierarchical taxonomic levels are being examined. Some enzymes, such as those involved with glycolysis, tend to be relatively conservative in both number of loci and amount of allelic variability within loci (J.H. Gillespie and Kojima, 1968; Gottlieb, 1982a).

The final stage of planning involves the electrophoresis of allozymes from numerous individuals on established buffer systems and from known tissues to generate data on allozyme variation. Gel runs must be well planned in advance in order to avoid unnecessary reruns. Richardson et al. (1986) have detailed many variables that should be taken into consideration in the planning stages. Some of the more important considerations are as follows:

1. Enzyme systems sensitive to freezing and thawing (e.g., HBDH, G3PDH, IDDH, etc.) should be resolved first, preferably before freezing the tissues or extracts.
Figure 4 Gel origin guide (constructed from %-inch acrylic plastic). All measurements are in millimeters.
Figure 5 Gel aspiration setup. Plastic implosion shield is made from %-inch or thicker acrylic sheet. (Panels: detail of shield; setup.)
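As a planning aid, the pilot-survey arithmetic and the first consideration in the list can be sketched in a few lines. This is only an illustration under stated assumptions: one stain box per enzyme system holding one minislice from each buffer, and a freeze/thaw-sensitive set limited to the three systems the text names as examples. The names `pilot_survey_size`, `run_order`, and `FREEZE_THAW_SENSITIVE` are hypothetical and do not come from the chapter.

```python
# Enzyme systems the text names as examples of freeze/thaw sensitivity.
# A real project would extend this set for its own organisms (assumption).
FREEZE_THAW_SENSITIVE = {"HBDH", "G3PDH", "IDDH"}


def pilot_survey_size(n_enzyme_systems, n_buffers=5):
    """Count stain boxes and minigel slices for the buffer survey,
    assuming one stain box per enzyme system and one slice per buffer."""
    return {
        "stain_boxes": n_enzyme_systems,
        "minislices": n_enzyme_systems * n_buffers,
    }


def run_order(enzyme_systems):
    """Put freeze/thaw-sensitive systems first; sorted() is stable, so
    the caller's original order is kept within each group."""
    return sorted(enzyme_systems,
                  key=lambda name: name not in FREEZE_THAW_SENSITIVE)


if __name__ == "__main__":
    print(pilot_survey_size(30))
    print(run_order(["PGM", "IDDH", "LDH", "HBDH"]))
```

Keeping the sensitive set as an explicit constant makes the rule easy to audit and to adjust once a pilot study shows which systems actually lose activity in a given taxon.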
2. Tracking dye (Appendix 2) or a blank space should be used about every 10 specimens, but without separating populations or taxa.

3. Initial specimen alignment on the gels should allow the first sample(s) to be repeated occasionally, or at least on the end of the gel, especially if more than two taxa or populations are being surveyed. In order to differentiate among different alleles, side-by-side comparisons are necessary.

4. Some enzyme systems require, or are better resolved with, the addition of known activators or cofactors such as EDTA and Mg2+ (e.g., PGM), or coenzymes such as NAD (e.g., ADH), or NADP (e.g., G6PDH, PGDH) (Harris and Hopkinson, 1976). These are added to specific gels either before cooking (EDTA, Mg2+) or following aspiration (coenzymes).

PROTOCOLS

1. Tissue homogenization
2. Preparation of starch gels
3. Gel loading
4. Electrophoresis
5. Gel slicing
6. Histochemical staining
7. Drying agar overlays
8. Documentation of the results

Many of the chemicals used in the various protocols are extremely hazardous; Table 4 briefly summarizes the known acute and chronic health hazards. For more information, "material safety data sheets" may be available from suppliers free of charge upon request when ordering chemicals. Contact with these organic and inorganic substances should be avoided, as many are absorbed readily through the skin. Protocols should not be performed without using protective laboratory coats, rubber gloves, dust masks, and eye goggles whenever appropriate. Food, drink, and tobacco should not be allowed in the laboratory. Other safety precautions, such as eye wash stations, showers, chemical spill clean-up kits, and first aid kits, should be available. Operator safety must be accorded priority over all other considerations.

Protocol 1: Tissue Homogenization
(Time: 2 min/specimen)

Tissue extract preparation may precede or follow preparation of the starch gels depending on (1) the necessity for extremely fresh extracts, or (2) desirability of preparing gels a day in advance. The homogenization of tissue samples far in advance of their use may result in significantly reduced levels of enzyme activity. Many extraction
Table 4
Chemicals required for electrophoresis, use, location of storage, and health hazard information

Chemical (use)(a)                                       Location(b)  Reference number(c)  Health and safety(d)
Acetic acid, glacial (buffers)                          s
Acetone (RDH)                                           s
Cis-Aconitic acid (ACOH)                                f
Adenosine (ADA)                                         r
Adenosine 5'-diphosphate (AK, ARK, CK, ENO, PK, TAT)    f
Adenosine 5'-monophosphate (AK)                         f
Adenosine 5'-triphosphate (AK, GLAL, GUK, HK, PFK,      f
  PGAM, PGK, SUDH, UK)
Agar (general)                                          s
DL-Alanine (ALAT, ALPDH)                                s
DL-Alanyl-DL-methionine (PEP)                           f
Aldolase (GAPDH, PFK)                                   r
Amaranth (tracking dye)                                 s
3-Amino-9-ethylcarbazole (PER)                          s
N-(3-Aminopropyl)-diethanolamine (buffers)              s
N-(3-Aminopropyl)-morpholine (buffers)                  s
Ammonium hydroxide (general)                            s
Arsenic acid (Na salt) (ADA, GAPDH, TPI)                s            A-6756
Ascorbic acid (GLT)                                     s            A-1417
L-Ascorbic acid (GLAL, NTP)                             s            A-0278
L-Aspartic acid (AAT)                                   s            A-9256
1,3-Bis(dimethylamino)-2-propanol (buffers)             s            B-4298-5
Boric acid (buffers)                                    s            B-0252
Brilliant blue G (CBP, GP)                              s            B-1131
Calcium chloride (GUK, PER)                             s            C-3881
Citric acid (anhydrous, free acid) (buffers)            s            C-0759
Citric acid (dihydrate) (buffers)                       s            C-7254
Citric acid (monohydrate) (buffers)                     s            C-7129
4,6-Diamidino-2-phenylindole (DAPI)                     f            D-1388
o-Dianisidine dihydrochloride (PEP)                     r            D-3252
Dichlorophenol-indophenol (CBR, DDH, Gli, LGL)          s            D-1878
Dihydroxyacetone phosphate (TPI)                        r            D-7878
Dowex (ion exchange resin) (TPI)                        s            50 X 4-200R
N,N-Dimethylformamide (PER)                             s            D-4254
Dimethyl sulfoxide (p GA, PGALA)                        s            D-5879
Ethanol (ADH, ODH)                                      s            xxxx(e)
Ethylenediaminetetraacetate (EDTA) free acid            s            EDS
  (buffers, AAT)
EDTA-dihydrate (PGAM, SUDH, buffers)                    s
Fast blue BB (salt) (ALAT, AAT, EST)                    f
Fast blue RR (salt) (ALP)                               f
Fast garnet GBC (salt) (PGLUR, CAP)                     f
Flavine adenine dinucleotide (GR)                       f
Fluorescein diacetate (CA) f F-7378 E, S, R, I
Formaldehyde (FDH)
D-Fructose-1,6-diphosphate (ENO, FBA, FBP, ALD, GAPDH, PK)
D-Fructose-6-phosphate (GPI, PFK, buffers)
Fumaric acid (FUMH)
D-Gluconic acid lactone (HADH)
D(+)-Glucose (AK, ARK, CK, GCDH, HK, PK)
α-D-Glucose-1,6-diphosphate (PGM, UGUT)
α-D-Glucose-1-phosphate (PGM)
D-Glucose-6-phosphate (G6PDH)
Glucose-6-phosphate dehydrogenase (AK, CK, GPI, HK, MPI, PGM, PK, UGUT)
L-Glutamic acid (GLAL, GTDH)
L-Glutamic dehydrogenase (ALAT, TAT)
Glutathione, oxidized (disodium salt) (GR)
Glutathione, reduced (FDH, HAGH, LGL)
Glyceraldehyde-3-phosphate dehydrogenase (ALD, FBA, PF, PGAM, PGK, TPI)
DL-Glyceric acid (GLYDH)
Glycerol (fixative)
DL-α-Glycerophosphate (G3PDH)
α-Glycerophosphate dehydrogenase (PGK)
Glycine (buffer)
Glycolic acid (HAOX)
Glycyl-L-leucine (PEP)
Glyoxalase I (HAGH)
Guanine (GDA)
Guanosine-5'-monophosphate (GUK)
1-Hexanol (ADH)
Hexokinase (AK, ARK, CK, PK)
L-Histidine HCl monohydrate (gel buffer)
Hydrochloric acid
Hydrogen peroxide (CAT, PER)
DL-β-Hydroxybutyric acid (Na salt) (HBDH)
Hypoxanthine (XDH)
Inosine (PNP)
Inosine triphosphate (NTP)
DL-Isocitric acid (IDH)
Isocitric dehydrogenase (ACOH)
α-Ketoglutaric acid (ALAT, AAT, TAT)
DL-Lactic acid (buffer, LDH)
L-Lactic dehydrogenase (AK, ALAT, CK, ENO, GUK, HAGH, PK, UK) r
L-Leucyl-L-alanine (general PEP) r
L-Leucylglycylglycine (PEP) f
L-Leucine β-naphthylamide HCl (CAP) f
L-Leucyl-L-leucyl-L-leucine (PEP) f
Lithium hydroxide (buffer) s
Magnesium acetate (CK) s
Magnesium chloride (general) s
Magnesium sulfate (ALP, PK, buffers) s
Maleic acid (buffers) s
DL-Malic acid (buffer, MDH, MDHP, ME) s
Malic dehydrogenase (FUMH) r
D-Mannose-6-phosphate (MPI) f
2-Mercaptoethanol (PBP, NTP, PFK) r
Methylglyoxal (HAGH, LGL) r
Methyl alcohol s
4-Methylumbelliferyl acetate (EST) f
4-Methylumbelliferyl-N-acetyl-β-D-galactosaminide (βGALA) f
4-Methylumbelliferyl-N-acetyl-β-D-glucosaminide (βGA) f
4-Methylumbelliferyl-α-L-arabinoside (αARAB) f
4-Methylumbelliferyl-α-D-galactoside (αGAL) f
4-Methylumbelliferyl-β-D-galactoside (βGAL) f
4-Methylumbelliferyl-α-D-glucoside (αGLUS) f
4-Methylumbelliferyl-β-D-glucoside (βGLUS) f
4-Methylumbelliferyl-β-D-glucuronide (βGLUR) f
4-Methylumbelliferyl-α-D-mannopyranoside (αMAN) f
Molybdic acid ammonium tetrahydrate (GLAL) s
MTT (tetrazolium salt) (general) r
β-NAD (nicotinamide adenine dinucleotide) (general) f
β-NADH (general) r
β-NADP (general) f
β-NADPH (GSR) f
Naphthol AS-BI β-D-glucuronic acid (βGLUR) f
Naphthol blue black (amido black) (GP) s
α-Naphthyl acetate (EST) f
β-Naphthyl acetate (EST) f
β-Naphthyl acid phosphate (ALP) f
Nitro blue tetrazolium (NBT) (general) r
Nucleoside phosphorylase (ADA) r
1-Octanol (ADH, ODH) s
D-Octopine (OPDH) f
1-Pentanol (ADH) s
Peroxidase (PEP) f P-8125 A, E, S, R, I
Phenazine methosulfate (PMS) (general) f P-9625 M, E, S, R, I
Phenolphthalein diphosphate (ACP) f P-9875 E, S, R, O, I
L-Phenylalanyl-L-proline (PEP) f P-6258 N
Phosphocreatine (CK) f P-6502 N
Phospho(enol)pyruvate (AK, GUK, PK, UK) f P-7127 E, S, I
Phosphoglucomutase (UGUT) r P-6156 A, I
6-Phosphogluconic acid (Ba salt) (PGDH) f P-7627 P, E, S, I
6-Phosphogluconic acid (trisodium salt) (PGDH) f P-7877 N
Phosphoglucose isomerase (= Glucose-6-phosphate isomerase) (FBP, MPI) r P-5381 A, I
3-Phosphoglyceric phosphokinase (PGAM) r P-7634 A, I
Phosphoglycerate mutase (ENO) r P-8252 A, I
D(+)-2-Phosphoglyceric acid (ENO, PGAM) f P-0257 N
D(-)-3-Phosphoglyceric acid (ENO, PGK) f P-0769 N
Phospho-L-arginine (ARK) f P-5139 N
Polyvinylpyrrolidone (ALP, AAT) s PVP40 E, S, C
Potassium acetate (CK) s P-1147 E, S, I
Potassium bicarbonate (buffer) s P-9144 E, S, I
Potassium chloride (AK, ENO, GUK, PK, buffers) s P-4504 E, S, R, I
Potassium cyanide (RDS) s F, E, S, 8, G,0
Potassium hydroxide (XDH) s P-1767 D, V, H
Potassium iodide (CAT) s P-8256 A, E, S, R, B
Potassium phosphate (dibasic-anhydrous) (buffer) s P-8281 E, S, R
Potassium phosphate (dibasic trihydrate) (buffer) s P-5504 E, S, R
Potassium phosphate (monobasic) (buffer) s P-5379 E, S, R
Potassium sulfate (UK) s P-0772 E, S, R, I
Pyrazole (GLYDH, HADH, general) s P-2646 E, S, R
Pyridoxal-5'-phosphate (TAT) f P-9255 E, S, I
L-Pyroglutamic acid (PCDH) s P-3634 E, S, I
Pyrophosphate (UGUT) s P-8135 E, S, R, I
Pyruvate kinase (salt free) (AK, CK, ENO, GUK, UK) f P-9136 A, I
Pyruvic acid (ALPDH, HAGH, GLYDH) r P-2256 E, S, I
Retinol (RDH) f R-7632 E, H, S, G, 2;1
Shikimic acid (SKDH) s S-5375 E, S, I
Sodium acetate (ACP) s S-8625 E, S, R, I
Sodium chloride (ALP, HBDH) s S-9625 E, R, S
Sodium hydroxide (buffer) s S-5881 F, D, I, V
Sodium phosphate (Na2HPO4) (buffers, AAT) s S-0876 E, S, I, R
Sodium thiosulfate (buffers, CAT, TST) s S-8503 E, S, I
D-Sorbitol (buffers, IDDH) s S-1876 E, S, R, I
Starch (potato), hydrolyzed r S-4501 A
Succinic acid (free acid) (buffer) s S-7501 E, S, R, I
Succinic acid (disodium salt) (SUDH) s S-2378 E, S, I
Sucrose (buffers) s S-9378 E, S, R
5-Sulfosalicylic acid (CBP, GP) s S-2130 E, S, R, I
Sulfuric acid (GLT) s D, V
Trichloroacetic acid (CBP, GP) s T-4885 F, D, G, I
Triethanolamine (buffers) s E, S, R, '2, 0
Triosephosphate isomerase (PFK) r T-2507 A, I
Trizma base (buffers) s T-1503 E, S, R
L-Tyrosine HCl (TAT) s T-2006 E, S, R, I
Uridine 5'-diphosphoglucose (UGUT) f U-4625 E, S, R, G, O, I
Uridine 5'-monophosphate (UK) f U-1752 N
Venom, Crotalus atrox (PEP) f V-7000 F, A, E
Xanthine (XDH) s X-7375 E, S, R, I
Xanthine oxidase (ADA, GDA, PNP) r X-1875 E, I
Zinc chloride (HAGH) s Z-4875 D, V
a Enzyme system(s) and/or buffer(s).
b s = room temperature, shelf; f = freezer (-20°C); r = refrigerator (0-4°C).
c Sigma Chemical Company (St. Louis, Missouri, U.S.A.) catalog numbers are provided to distinguish among multiple forms of some chemicals; however, reagent-grade chemicals may be obtained from most major suppliers. Sigma catalog numbers are provided as a reference and do not necessarily represent endorsement.
d Safety information as of 1993 from information available from Sigma. All of these chemicals may be harmful if inhaled, swallowed, and/or absorbed through skin. A = allergen (especially respiratory tract and skin); B = effects on fertility (e.g., spermatogenesis, testes, epididymis, sperm ducts, male fertility index and/or post-implantation mortality); C = carcinogen; D = high concentrations are extremely destructive to tissues (mucous membranes, respiratory tract, eyes and skin); E = eye irritant; F = potentially fatal (if inhaled, swallowed, or absorbed through skin), at acute level; G = CNS depression, narcotic effect, convulsions, etc.; H = headache, nausea and/or vomiting; I = not thoroughly investigated; M = mutagen; N = no hazards known; O = other harmful effects known; P = poisonous; R = respiratory irritant; S = skin irritant; T = teratogen; V = corrosive.
e Denatured alcohol should not be used. We have achieved greatest ADH enzyme activity by using "gold" tequila as a source of ethanol; other liquors may work equally well.
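
The storage and hazard codes in the footnotes to Table 4 lend themselves to a small lookup table. The following is a minimal Python sketch, not part of the original chapter: the dictionary contents are transcribed from footnotes b and d, while the names `STORAGE_CODES`, `HAZARD_CODES`, and `decode_hazards` are hypothetical and chosen only for illustration.

```python
# Storage codes from footnote b of Table 4.
STORAGE_CODES = {
    "s": "room temperature, shelf",
    "f": "freezer (-20 C)",
    "r": "refrigerator (0-4 C)",
}

# Hazard codes from footnote d of Table 4 (wording shortened).
HAZARD_CODES = {
    "A": "allergen",
    "B": "effects on fertility",
    "C": "carcinogen",
    "D": "extremely destructive to tissues at high concentrations",
    "E": "eye irritant",
    "F": "potentially fatal at acute level",
    "G": "CNS depression, narcotic effect, convulsions",
    "H": "headache, nausea and/or vomiting",
    "I": "not thoroughly investigated",
    "M": "mutagen",
    "N": "no hazards known",
    "O": "other harmful effects known",
    "P": "poisonous",
    "R": "respiratory irritant",
    "S": "skin irritant",
    "T": "teratogen",
    "V": "corrosive",
}


def decode_hazards(codes):
    """Expand a comma-separated code string such as 'D, V' into descriptions."""
    descriptions = []
    for token in codes.split(","):
        code = token.strip().upper()
        if not code:
            continue
        # Codes that are not in the footnote are flagged, not silently dropped.
        descriptions.append(HAZARD_CODES.get(code, f"unrecognized code {code!r}"))
    return descriptions


if __name__ == "__main__":
    # Zinc chloride is listed with storage 's' and hazards 'D, V'.
    print(STORAGE_CODES["s"], "|", "; ".join(decode_hazards("D, V")))
```

Flagging unrecognized codes is a deliberate choice in this sketch, since a misread code in a safety column should be checked against the supplier's data rather than guessed.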
buffer recipes use 2-mercaptoethanol, a sulfhydryl reducing agent, to reduce subbands. However, at least in reptiles, this ingredient significantly reduces the activity levels of many enzyme systems.

Phenolic compounds in many plant tissues form complexes with proteins upon homogenization. The addition of polyvinylpyrrolidone to the extraction solution usually reduces this problem; some plants also require other ingredients (see Werth, 1985; Kephart, 1990).

There are several ways of extracting enzymatic proteins from cells including (1) simple maceration of tissue(s) with scissors followed by freezing, (2) use of a hand-held ground-glass homogenizer, (3) hand grinding with a glass test tube or rod sanded on its base and a porcelain spot plate (Werth, 1985; Kephart, 1990), (4) motorized plastic (e.g., Teflon®) pestle and plastic (centrifuge tube) mortar, or (5) a high-speed tissue homogenizer with a generator blade (Figure 6). Homogenization using devices designed not to disrupt cell membranes (Figure 6) may require that the samples be subjected to sonication or refreezing for 10 min at -20°C. All of the methods work very well, even without sonication; the latter, and initially most expensive method, is the fastest. If samples are not to be used immediately, refreeze, preferably in an ultracold freezer.
Figure 6 Homogenization of tissue extracts using a high-speed homogenizer. See text for other methods.

1. Dissect out desired tissues or retrieve previously dissected tissue samples from the freezer and place them in a clean grinding tube.
2. Dilute the samples 3-5 fold with grinding solution. The ice-cold grinding solution may be either distilled, deionized water or one of many solutions described in the literature (e.g., Selander et al., 1971; Harris and Hopkinson, 1976; Werth, 1985; Kephart, 1990). If enzyme activity levels are to be surveyed, the tissue samples must be weighed precisely and diluted (Klebe, 1975; Kettler and Whitt, 1986; Kettler et al., 1986).
3. Mechanically homogenize the mixture of tissue and grinding solution. The mixture should be kept ice-cold during the homogenization process.
4. Just prior to use, centrifuge the homogenate, preferably at >10,000 g for 15-30 min, to separate extracted proteins from cellular debris. Although highly desirable, centrifugation is not always necessary for some tissues and some taxa.

Protocol 2: Preparation of Starch Gels
(Time: 2-3 hr/gel)

Gel cooking involves either the boiling of hydrolyzed starch in gel buffer (below) or the addition of starch to hot gel buffer (e.g., Micales et al., 1986). Hydrolyzed potato starch may be made following the method of Smithies (1955) or purchased. Although relatively expensive, Connaught Medical Laboratories (Toronto, Ontario, Canada) starch has a longstanding reputation for consistently producing very high-quality gels. Electrostarch Co. (Madison, Wisconsin) starch is relatively inexpensive, but variable in quality, and sometimes requires the addition of Connaught starch to make it usable. Starch from Starch Art Corp. (P.O. Box 268, Smithville, Texas 75957 U.S.A.) produces highly satisfactory gels and is moderately priced. As with Electrostarch, a free sample is available on request. Other sources of hydrolyzed potato starch include various chemical (e.g., Sigma) and biological supply companies; these are invariably the most expensive and usually obtain their stock from the sources above.

Typically, starch gels are made in concentrations of 9-18% (w/v) starch in gel buffer, depending on the quality of starch, preferred texture of the gel, and desired sieving effect obtained during electrophoresis. The appropriate concentrations are determined by trial (and error).

Three problems may occur during gel preparation: undercooking, overcooking, and burning. Undercooking can be recognized by soft, wet gels that are difficult to handle following slicing; undercooking is rare. Overcooking is easily recognized during four stages: aspiration, cooling, loading, and slicing. During aspiration, overcooked gels may boil out of the flask. Vigorous shaking during aspiration may be required if the gel is to be saved, although this is sometimes ineffective. During cooling, deep crevasses or circular or octagonal patterns may form in the surface.
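
The 9-18% (w/v) range quoted for starch gels translates directly into a weighing calculation. The sketch below is illustrative only and is not from the original chapter: the function name `starch_mass_g` is hypothetical, and it assumes that the buffer volume can stand in for the final gel volume, which is an approximation.

```python
def starch_mass_g(buffer_volume_ml, percent_w_v, low=9.0, high=18.0):
    """Grams of starch for a buffer volume at a given % (w/v) concentration.

    Assumption: the buffer volume approximates the final gel volume.
    The default 9-18% bounds are the range quoted in the text.
    """
    if not low <= percent_w_v <= high:
        raise ValueError("concentration outside the 9-18% (w/v) range")
    # % (w/v) means grams of solute per 100 ml.
    return buffer_volume_ml * percent_w_v / 100.0


if __name__ == "__main__":
    # A 10% gel in 400 ml of buffer corresponds to the 40 g of starch
    # weighed out in the gel-cooking protocol.
    print(starch_mass_g(400, 10))
```

Because the text says the appropriate concentration is found by trial and error, the range check is a guard against slips at the balance and does not prescribe a value.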
Overcooked gel mixtures tend to stick to the gel molds, often splitting during loading or removal for slicing following electrophoresis; gel slices are tacky and sometimes impossible to handle. Burning can occur without overcooking. It results from not swirling the mixture vigorously enough during cooking and can be recognized by brown-black, burned starch on the bottom of the flask and/or dark flecks in the gel. Burning frequently results in tacky gels. Improperly cooked gels should be discarded.

Most types of gels can be cooked, poured, left overnight, and run the following day. However, Tris-citrate/borate, Tris-citrate III, lithium-borate/Tris-citrate and Tris-HCl gels tend to crack during electrophoresis if used after this period of storage.

Finally, there are a number of peculiarities associated with some gel buffers. Tris-borate-EDTA II gels tend to stick to the flask after cooking. The problem can be overcome by lowering the percentage of starch by 0.5-1 percent and/or preparing an extra 20 ml of gel. Tris-citrate/borate and lithium-borate/Tris-citrate gels tend to split apart at the origin during running (see Protocol 4). Borate gels tend to be difficult to aspirate; slight undercooking and/or vigorous shaking during aspiration reduce these problems (see also Protocol 4).

1. Locate a stable, horizontal surface to hold gel molds until gels are cool enough to move (≈1 hr). The surface should be near the aspirator.
2. Prepare gel molds for receiving hot starch: unglued wick molds (e.g., Micales et al., 1986) must have the edges clamped; use masking tape and seal the open portions of the legs of wickless molds. Place molds on the table or bench on top of a paper towel. Label the paper towel (or masking tape) noting the type of gel buffer to be poured and the date.
3. Weigh out 40 g (or appropriate weight) starch, place into a 1000-ml glass Erlenmeyer flask (narrow mouth, heavy duty rim), and add 400 ml gel buffer. Swirl contents until starch is well emulsified.
4. Cook gel using one of the following methods:
a. While wearing eye protection and insulated glove(s), continuously swirl flask above a 1,000-Cal Bunsen burner (Figure 7A). The mixture will become viscous and then quite rapidly much less viscous. As boiling begins (after about 3-4 min) stop heating.
b. Use a magnetic stirring hot plate and large magnetic stirring bar to heat the starch-buffer mixture until the mixture becomes too viscous for the stirring bar to swirl. Remove the flask and occasionally swirl by hand until the mixture becomes less viscous once again, in about 1 min. Return the flask to the stirrer and continue heating until boiling as above. This procedure takes about 20 min.
c. Cut the bottom out of a microwave oven having a stainless-steel interior in order to accommodate a magnetic stirring plate. While stirring, heat the starch-buffer mixture until it becomes less viscous. Stop heating. (We have not used this method.)
5. Using an insulated glove, quickly transfer molten gel to the aspiration shield, set flask on a heat pad and cover the open hole of the T-connector to apply vacuum for about 15 sec (Figures 5 and 7B). The mixture will resume boiling. Swirling of the flask may be required during the first few seconds to avoid aspirating the gel out of the flask. Slowly release the vacuum.
6. Rapidly pour the hot mixture into gel mold filling evenly and almost overflowing (Figure 7C); avoid dribbles.
7. Immediately (within 1 min) remove any remaining air bubbles from the molten gel using a Pasteur pipette and pipette bulb.
8. Rinse used flask in hot running water before remaining mixture solidifies.
9. After cooking all gels, and while they are cooling, fill buffer wells (trays).
10. Allow the gel to cool to ambient temperature, about 45-60 min, and gently cover with plastic food wrap. With both hands, hold the wrap at one end. Allow the opposite free end to contact one edge of the gel. Slowly lower the wrap allowing it to drop on the gel. If air bubbles begin to form, lift the wrap and lower it again; air bubbles induce malformations in the gel surface. Pulling/tugging of the wrap should be avoided as this can cause a split in the forming gel matrix. Gently write the name of the gel buffer on the wrap using a felt-tip marker.
11. Place gel in refrigerator for 1 hr, or allow to continue to cool at ambient temperature for 2 hr.

Figure 7 (A) Cooking, (B) aspirating, and (C) pouring a starch gel.

Protocol 3: Gel Loading
(Time: 10-20 min per gel)

The inoculation of protein extracts into horizontal gels is generally accomplished by the use of sample wicks: rectangular pieces of filter paper (Whatman No. 3) measuring 2-4 mm in width and 1 mm taller than the gel mold. Wicks can be hand-cut or purchased. The following protocol is used for loading multiple gels, and for right-handed operators.

1. Before loading, make sure that the buffer wells have been filled and labeled.
2. If applicable, remove frozen homogenized samples from freezer and initiate thawing, and recentrifuge if desirable; keep thawed samples chilled.
3. Number a piece of filter paper from 1 to the number of samples being applied to a gel, including tracking dye. Tape the paper to the table to the right of the operator.
4. Make stacks of wicks on the numbered filter paper. Each stack should contain as many wicks as there are gels to be loaded.
5. Remove gels from refrigerator and fold the plastic wrap back onto itself parallel to the sample origin exposing half of the gel (or more).
6. Cut the edges of the gel free from the mold using a microspatula.
7. Slowly cut gel origin vertically using a thin, stainless steel microspatula and a gel origin guide (Figures 4 and 8A). The guide must be firmly held against the gel mold in order to avoid slipping. Trial buffer gels should be cut near the middle of the gel, others nearer to one edge.
8. Thoroughly wet the first five stacks of wicks with the first five tissue extracts, respectively, using Pasteur pipettes or 1-200 µl pipetters. Avoid cross-sample contamination by disposing of used pipette tips between samples.
9. Place the narrow side of the gel nearest to operator. Gently open the origin of the well about 5 mm by pushing the wide side of the gel away. Using narrow-tip forceps, pick up a damp wick from the first stack and place it vertically into the gel origin against the narrow side, 1 cm from the left side of the mold, and in contact with the bottom of the gel mold (Figure 8). Load the remaining four samples, spacing the wicks about 1.5-2 mm apart. Sequentially load any other gel(s).
10. Repeat steps 8 and 9 using the next series of samples. Using tracking dye or a blank space about every 10 specimens facilitates later gel interpretation; allow 3 mm of space on either side of a tracking dye wick. The last wick should be soaked in tracking dye and located about 5 mm from the edge of the gel mold.
11. Once all samples have been loaded, examine the wick placement from the bottom side of the gel mold to be sure that all wicks are completely inserted into the well. DO NOT shift wicks laterally. Using a rolling action of the index finger remove any bubbles from the bottom of the gel by pushing them to the sample origin.
12. Re-cover gel with plastic wrap and perform electrophoresis as described below.

Figure 8 (A) Cutting the gel origin using a gel origin guide and (B) loading a starch gel.

Protocol 4: Electrophoresis
(Time: 4-24 hrs)

Two primary types of buffer systems are used: continuous and discontinuous. In continuous systems, the gel buffer is usually a 10% or less dilution of the tray (electrode) buffer. In discontinuous systems (e.g., Tris-citrate/borate, and Tris-HCl) the tray (borate tray buffer) and gel buffers are made of different electrolytes (Appendix 2); this system has the effect of tightening isozyme bands during electrophoresis (see Richardson et al., 1986). The tray buffer electrolytes can be observed to migrate through the gel.

Certain kinds of gels have peculiarities, especially the discontinuous buffer systems. In many buffer systems, such as Tris-HCl, the amperage (electric current) drops as electrophoresis proceeds. Consequently, if running time is to be minimized the voltage should be progressively raised to the maximum level (Table 5) about every half hour but without exceeding 75 mA.

Tris-citrate/borate and lithium-borate/Tris-citrate gels tend to split apart at the origin during electrophoresis, especially if the gels were cooked a day in advance or run overnight (i.e., run slowly without supervision). Splitting typically occurs when the tray buffer electrolytes pass through the origin. There are three remedies to this problem. First, slightly overcook the gels during preparation. Second, push the gel halves together after the borate line has passed through the origin. Third, following 1-2 hr of electrophoresis, wedge plastic drinking straws or thin glass rods between the gel and inside edge of gel mold thereby forcing the gel halves together. These gels should be checked at the midpoint of electrophoresis to assure that splitting has not occurred. Splits can be repaired by pushing the two halves of the gel back together.
Table 5
Recommended electrophoretic conditions for the wickless system described herein,
including electric potential in V/cm and average duration
Buffer combination             V/cm   Duration
Amine-citrate (morpholine)      4.2   Overnight 14 hr
Amine-citrate (propanol)        4.2   Overnight 14 hr
Borate (continuous)             3.9   Overnight 18 hr
                                8.3   6-7 hr
Borate (discontinuous)          3.3   Overnight 14 hr
                                5.5   7-8 hr
Histidine-citrate              12.0   6-7 hr
Lithium-borate/Tris-citrate     3.8   Overnight 20 hr
                               11.0   7-8 hr
Phosphate-citrate               2.2   Overnight 20 hr
                                4.4   10-12 hr
Tris-borate-EDTA I              2.7   Overnight >18 hr
Tris-borate-EDTA II             3.3   Overnight 18 hr
                               11.0   6-7 hr
Tris-borate-EDTA-lithium        5.8   12 hr
Tris-citrate II                 3.3   Overnight >14 hr
                                6.1   7 hr
Tris-citrate III                3.8   Overnight 22-24 hr
Tris-citrate-borate             1.6   Overnight 18 hr
                               11.0   5-6 hr
Tris-citrate-EDTA               4.4   Overnight 12 hr
                                8.3   6 hr
Tris-EDTA                       8.3   12 hr
Tris-HCl                        9.7   5 hr
                                2.2   20 hr
Tris-maleate-EDTA               4.8   18 hr
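
Because Table 5 gives electric potential in V/cm, the power-supply setting depends on the length of gel across which the voltage is applied. The sketch below is a hedged illustration, not the authors' procedure: only three rows of the table are transcribed, the `slow`/`fast` labels and the function names are hypothetical, and the gel length is an assumed input that must be measured on the actual apparatus. The 75 mA ceiling is the value quoted in the text above.

```python
# Field strengths (V/cm) transcribed from three rows of Table 5.
# The "slow"/"fast" keys are labels invented for this sketch.
TABLE5_V_PER_CM = {
    "Histidine-citrate": {"fast": 12.0},
    "Tris-citrate II": {"slow": 3.3, "fast": 6.1},
    "Tris-HCl": {"slow": 2.2, "fast": 9.7},
}

PREFERRED_MAX_MA = 75.0  # current ceiling quoted in the text


def supply_voltage(buffer_name, mode, gel_length_cm):
    """Voltage setting that gives the tabulated V/cm over the gel length."""
    field_v_per_cm = TABLE5_V_PER_CM[buffer_name][mode]
    return field_v_per_cm * gel_length_cm


def current_ok(measured_ma, limit_ma=PREFERRED_MAX_MA):
    """True if the measured current is at or below the chosen ceiling."""
    return measured_ma <= limit_ma


if __name__ == "__main__":
    # Assumed 20-cm gel; substitute the length of the gel actually used.
    volts = supply_voltage("Tris-HCl", "slow", 20.0)
    print(round(volts, 1), current_ok(60.0))
```

When the voltage is raised in stages for buffers whose current drops during a run, the same current check can be repeated after each adjustment.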
It is frequently possible to run gels much more rapidly than recommended, down to as little as four hours. Because of the sieving effect of starch gels, however, rapid running usually results in less well-defined protein bands following staining. Moreover, as gels begin to heat up, resistance increases and further heating will likely occur, to the extent of melting gels!

1. If wickless gel molds are used, remove the masking tape from the legs.
2. Place the gel mold in the buffer well box orienting the narrow end towards the cathode (negative, black terminal). If wick molds are used, a sponge cloth must be used to complete the electric circuit between the gel and buffer wells. While wearing rubber gloves, dip the sponge cloth into the well buffer and place it so that one end is in the buffer, and one on the gel surface 1 cm onto the gel and under the plastic food wrap.
3. Place either an aluminum tray filled with crushed ice, or a frozen package of Blue Ice™ on the gel ensuring that the plastic wrap completely covers the gel and separates it from the ice pack. If Blue Ice™ is used, it is advisable to place a paper towel and glass plate between the gel and ice pack in order to prevent freezing of the gel surface.
4. Plug the well box top into the bottom, i.e., connect the buffer well electrodes to the power supplies (Figure 9).
5. Turn the power supply on, allow it to warm up for a few minutes, and adjust to desired voltage/amperage levels (Table 5). Amperage should not be allowed to exceed 100 mA, and preferably 75 mA, as overheating of the gel will likely occur.
6. After 25 min of electrophoresis, check tracking dye by examining edge of gel mold to assure that the gel was properly oriented in the buffer well. If not, reverse polarity of the electrodes at the power supply.
7. Check ice levels every 2 hr if not running gels in refrigeration.
8. When tracking dye has reached the end of the gel, turn power supply off and remove gel (and gel mold) from buffer well box.

Figure 9 Horizontal starch gel apparatus during electrophoresis. The electrophoresis buffer tray is a slightly more complex version of that shown in Figure 3. The gel is being cooled by using Blue Ice™.

Protocol 5: Gel Slicing
(Time: 5-10 min/gel)

Once electrophoresis has been completed, the gels need to be sliced and the slices placed in stain boxes. A number of methods have been developed including, among others, the use of bow slicers and slicing trays (Figures 3 and 10A), multiple slicers (B.J. Turner, 1980; Figure 10B and C), and nylon string (thread) (Micales et al., 1986). Gel slicing and handling should be carried out while wearing protective gloves, even though this increases the difficulty of the operations.

Several problems can occur during slicing, the most common of which is that of splitting or tearing the slices. Once a split has formed in a gel, it can be extremely difficult to transfer slices from the tray to the stain box; splits usually result from bending the gel too much while transferring it from one slicing tray to another. The easiest solution is to completely separate the split slice and transfer the two parts separately. Improperly cooked gels are difficult to handle by hand. Transfer these slices by using plastic food wrap as a carrying medium.

Figure 10 Gel slicing. (A) Use of a simple bow slicer (see Figure 4). (B) A multiple slicer (plans available on request). (C) A gel sliced with a multiple slicer. (D) Gel slice handling. The multiple slicer does not require the use of a slicing tray. Note that when handling a gel slice, the fingers of both hands are touching to prevent stretching of the gel. Top glass (or plastic) plate in (A) has been removed for clarity.

1. Using masking tape, label stain boxes with the gel number, enzyme system or locus to be stained, gel buffer, and date. (This step usually is completed during electrophoresis.)
2. Using a microspatula and gel origin guide, cut away the anodal and cathodal 1 cm of the gel (or legs of the wickless gel), 3-5 mm of the edges of the gel, and notch the left anodal and cathodal corners of the gel. Remove these edges and notched pieces from the mold, leaving the greater portion of the gel in the mold.
3. Separate halves at the origin and remove wicks. The gel may be more difficult to move for some buffers (e.g., lithium-borate/Tris-citrate). Using a paper towel, gently dry the top of the gel, arrange the two pieces so that they form a V separated by about 1 cm at one end, and cover with a piece of plate glass or a slicing tray.
4. Invert the sandwiched gel and gently dry the bottom of the gel. Choose the appropriate thickness of slicing tray (if applicable), center it upside down on the bottom gel surface with the tray ridges aligned with the origin, and turn the gel right side up again.
5. Remove air bubbles between the gel and slicing surface. Failure to remove bubbles may result in holes in the gel slice and/or render the remaining gel incapable of being sliced. Re-cover top of the gel with second slicing tray (or glass plate).
6. Clean slicer wire with damp towel or steel wool.
7. Orient the gel so that the apex of the V is furthest away from the operator. Brace the (bottom) slicing tray to prevent it from moving toward the operator during slicing. Place the wire of the slicer on the raised ridges of the slicing tray, press downward on the slicer, and in one continuous operation slowly (about 3 cm per second) pull the wire through the gel. Gels usually move toward the operator slightly during slicing; do not stop pulling if this is observed, and DO NOT PRESS DOWN ON THE TOP TRAY/PLATE.
8. Clean slicer wire with damp towel, or steel wool. Do not immerse wire slicers in water.
9. Remove top tray/plate, carefully separate the gel from the bottom slice, and transfer the gel to the second slicing tray allowing for the V to have the opposite orientation (apex near operator). Similarly, move the anodal top slice but use both hands to support opposite sides of the gel. Lift anodal top gel slice to second tray, forming a V.
10. Open a plastic staining box and carefully transfer the anodal and cathodal bottom slices (Figure 10D) to the stain box. If an agar overlay, UV fluorescing, or limited volume stain is to be applied, then it is important that no bubbles occur underneath the slice. Relatively expensive or critical stains should be made on slices cut from the bottom of the gel.
11. Repeat slicing. Always initiate subsequent slices from opposite ends of the gel to prevent uneven thinning. It may be necessary to repeat steps 4-5 if the remaining gel slides easily on the slicing tray. The top slice can be inverted and used, although it is preferable to stain with an agar overlay. Remaining portions of gels can be temporarily saved (24+ hr) by wrapping.

Protocol 6: Histochemical Staining
(Time: 2 min to 6 hr/stain)

The distance of migration of specific proteins through a starch gel is visualized by histochemical staining. These stains (Appendix 1) consist of a substrate on which a specific enzyme reacts, and a detection mechanism such as a dye or substance that fluoresces under long-wave (340 nm) UV. The common mechanisms for detection include (1) the formation of a purple precipitate (formazan) by the reduction of NBT or MTT using PMS or DCIP as the intermediate electron carrier or reducer, respectively; (2) the non-fluorescence of NAD, which is formed from fluorescent NADH; (3) fluorescence of methylumbelliferone; (4) fast diazo dye (e.g., esterases); and (5) the oxidized form of o-dianisidine diHCl producing an insoluble brown precipitate. Many stains also contain cofactors, coupling enzymes, and other requisite molecules. Details of how each of these systems work are provided by Harris and Hopkinson (1976), Richardson et al. (1986), and Manchenko (1994). A complete understanding of the concepts is desirable but not absolutely necessary, although such understanding greatly facilitates the resolution of staining problems when they occur.

Some stains (e.g., for PGM) are best applied to the gels in the form of an agar overlay, or an agar-based gel containing stain components; agarose may be preferred over agar because the
latter inhibits the activity of some proteins through binding (Harris and Hopkinson, 1976). Most laboratories use agar because it is much less expensive. The overlays serve the function of containing the precipitating dye, which prevents it from either diffusing over a broad area of the gel or becoming too diffuse to be observed.

Several UV-fluorescing stains (e.g., β-GLU) may be applied to the gel slices as filter paper overlays, the overlays being cut from Whatman 1MM or other thin filter paper (Harris and Hopkinson, 1976). However, we have not noticed an advantage over simply applying these stains directly to the gel.

The quantity of specific chemicals in some recipes in Appendix 1 varies from amounts specified in other sources (e.g., Selander et al., 1971). These amounts are the minimum required to resolve these protein systems from the maximum diversity of taxa. Often these quantities can be reduced by applying less stain to a gel, especially once the region of activity has been identified. Most agar overlay stains can be easily accomplished using as little as 10 ml of stain solution.

Of the two dyes used in formazan-based stains, MTT is cheaper, more toxic, and precipitates more rapidly than NBT but tends to diffuse and is less stable. The two dyes can be used in concert. If NBT is yielding only faint bands initially, the addition of MTT during staining may help to intensify the isozymes.

For formazan stains, three components are particularly sensitive to light: PMS, MTT, and NBT. Therefore, the stock liquid solutions and staining gel slices must be kept out of light. Stock solutions should be stored in amber glass bottles and/or bottles wrapped in aluminum foil.

All stains can be safely and conveniently prepared in Erlenmeyer flasks. Because some stains contain liquid components only (e.g., LDH), these may be mixed directly in the stain box so long as the stain buffer is applied first.

1. Dry chemicals should be weighed and placed in a 125-ml Erlenmeyer flask.

2. Add the liquid components. Liquid components can be handled safely using pipetting devices. In some cases adjustment of pH will be necessary (e.g., HADH). When mixing formazan-based stains, all powdered ingredients should be dissolved in the stain buffer and pH adjustments should be made before adding cofactors, PMS, and NBT (or MTT). Once completely mixed, pour the stain onto the gel and gently shake the box, freeing the gel from the bottom. Agar overlays are prepared by bringing a 0.7% (w/v) mixture of agar/stain buffer to a boil, allowing it to set until all agar grains have disappeared, cooling to just below 50°C, adding remaining staining components, and pouring onto the gel slice. For the typical 50-ml stain, 35-40 ml of stain buffer is mixed with 0.35 g agar in a 125-ml Erlenmeyer flask; the remaining 10-15 ml of stain components are added to the warm agar just prior to covering the gel. Under ideal conditions, the agar is prepared in advance of slicing and staining by bringing the mixture to a boil in a microwave oven. The flask is corked or covered with aluminum foil and kept in a 50°C water bath until used. Cooling of hot, molten agar can be made rapid by the use of ice and an accurate thermometer. The molten agar forms a gel at around 42°C. Some fluorescent stains are prepared as small agar overlays. Do not view the UV light or fluorescing gel without the use of a UV light shield or protective glasses. Short wave lights are not necessary and should not be used because of the additional health hazard.

3. Most stains should be incubated at 37°C following staining.

4. Staining gel slices must be continuously monitored to prevent overstaining, which results in unresolvable, diffused, or smeared bands. Some stains must be scored and documented as soon as they are ready, sometimes within 5 min of staining. Stains using insoluble precipitates can be preserved (see below) and scored following the completion of all staining, even on the following day.

5. If the stain has been applied as a liquid, and not an agar overlay, siphon off the stain solution and save for appropriate hazardous
waste disposal. Completely cover the gel slice with fixing solution (about 50 ml; Appendix 2) and refrigerate. If MTT is used as the dye, do not flood the gel slice with fixative or the formazan dye will wash out of the gel; apply only enough fixative to wet the gel slice (about 20 ml).

Troubleshooting

A number of problems may be encountered following application of the stain, the most common of which is the absence of enzyme activity on a gel. This may have several causes. (1) If the duration of electrophoresis is too long or short, the enzymes may have migrated off the gel or remained in wicks in the origin, respectively. (2) It is possible that one (or more) of the stain components were omitted from the stain recipe. Successful staining may be possible by adding the missing component to the stain. (3) Very weak expression typically results from too little of a given component, or the use of a partially degraded solution of coenzymes. Under these conditions, it will be necessary to add additional stain components, or reorder the coenzyme. If more than one stain is resolving inadequately, check for common stain components, such as G6PDH. Coenzyme activity can be checked by electrophoresis and staining a small amount of the coenzyme along with tissue extracts where activity has been previously resolved. (4) A change in starch lot can result in the necessity to change the conditions of electrophoresis. (5) Shifts to a high pH can result in the conversion of NAD(P) to NAD(P)H. Check the pH of the final stain solution. (6) Finally, the addition of too much substrate or coenzyme can suppress enzyme activity.

Smeared isozymes may result from use of the wrong electrophoresis buffer conditions, too high a current (overheating), high concentrations of lipids in the tissue extracts, or (rarely) improper formation of the gel matrix.

Diffuse isozymes can indicate overstaining, less than ideal electrophoresis conditions and/or that an agar overlay stain should have been applied. In the latter case, if light shaking of the gel results in disturbance of the formazan precipitate on top of the gel slice, then the stain should be applied as an agar overlay. Overstaining results in very dense isozyme banding patterns. Occasionally, background "ghost bands" may be observed. These bands result from the ability of an enzyme to act on an alternative substrate (e.g., LDH acting on DL-glyceric acid, the substrate of GLYDH), presence of sufficient substrate in the tissue extract, or contamination by bacteria, molds, and yeasts (e.g., ethanol and ADH). LDH, ADH and other isozymes can be identified either by counterstaining, or by inclusion of the end product of the reaction, a procedure termed end-product suppression. For example, pyruvic acid suppresses (but does not stop) LDH, and pyrazole inhibits ADH. For some enzyme systems (e.g., GLYDH), use of one or more suppressors is required.

Protocol 7: Drying of Agar Overlays
(Time: 6 hr)

Agar overlays can be dried on filter paper and saved as documentation as follows:

1. Cut filter paper (e.g., Whatman No. 1) to dimensions allowing it to fit into a stain box (12 x 17 cm) and label it with the enzyme system, gel number, and buffer conditions.

2. Decant or vacuum excess fixative from the stain box.

3. Cut the agar free from the edges of the gel slice using a microspatula.

4. Carefully overlay the filter paper on the agar overlay and then slowly lift the filter paper while separating the agar overlay from the gel slice using a microspatula (Figure 11).

5. Place the filter paper on a few paper towels agar-side up and allow to dry (several hours).

6. Once dry, curled overlays can be pressed flat and wrapped in plastic for safe handling. They should be stored in the dark and with light pressure to avoid recurling. BECAUSE THE AGAR WILL RETAIN DANGEROUS CHEMICALS FOR YEARS, OVERLAYS SHOULD NEVER BE HANDLED WITHOUT WEARING PROTECTIVE GLOVES AND/OR UNLESS THEY ARE WRAPPED.
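The agar overlay proportions given in the staining protocol (a 0.7% w/v agar mixture; for a 50-ml stain, 35-40 ml of buffer with 0.35 g agar plus 10-15 ml of remaining stain components) can be scaled to other stain volumes. The following is a minimal sketch only. It assumes that the 0.7% figure refers to the final stain volume, which reproduces the 0.35 g quoted for 50 ml; the function name and the default split of 37.5 ml of buffer are hypothetical choices made for illustration.

```python
def agar_overlay_plan(total_ml=50.0, agar_percent_wv=0.7, buffer_with_agar_ml=37.5):
    """Split a stain volume into an agar/buffer portion and a remainder.

    Assumption (not stated explicitly in the protocol): the w/v percentage
    is taken relative to the final stain volume.
    """
    if not 0 < buffer_with_agar_ml < total_ml:
        raise ValueError("buffer_with_agar_ml must lie between 0 and total_ml")
    # grams of agar = (g per 100 ml) * final volume in ml / 100
    agar_g = agar_percent_wv / 100.0 * total_ml
    return {
        "agar_g": round(agar_g, 3),
        "buffer_with_agar_ml": buffer_with_agar_ml,
        "remaining_components_ml": total_ml - buffer_with_agar_ml,
    }


plan = agar_overlay_plan()           # the typical 50-ml stain
small = agar_overlay_plan(10.0, 0.7, 7.5)  # a reduced 10-ml overlay
print(plan)
print(small)
```

For the 50-ml default this gives 0.35 g of agar, 37.5 ml of buffer boiled with the agar, and 12.5 ml of remaining components, all within the ranges quoted in the protocol. Temperatures and the order of addition still follow the protocol text.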
Figure 11 Lifting an agar overlay from a gel slice.

Protocol 8: Documentation of Results
(Time: 1-5 min/slice)

At the completion of staining, the isozyme patterns should be documented by photography or by drawing observed patterns on paper. Photography can be accomplished with a standard 35-mm camera, or more expensively with a Polaroid™ system. If 35-mm photography is used, a Y48 yellow filter should be fitted to the camera lens in order to increase the contrast between the stained isozyme patterns and the gel background; this filter is necessary for documenting UV stains. Photography is best carried out on a copy stand fitted with a light box. A polarizing filter helps to cut glare if the gel slice is illuminated from above but it should not be used when photographing UV stains. If using a UV lamp rather than a transilluminator, locate it close to the gel. When photographing, we place the stain box label on the gel to document each photograph.

INTERPRETATION AND TROUBLESHOOTING

The interpretation of the band patterns comprising the zymogram requires the knowledge of the subunit structure and the genetic control of the enzyme system. As discussed in the gene expression section, the tissue examined for enzyme activity may limit the number of gene products or subunits expressed. These variables may be manipulated by choosing a tissue that will express the desired gene products in the most scorable fashion.

The variables of subunit structure and genetic control are discussed below, followed by a brief introduction to some common problems faced in the interpretation of zymograms. For additional discussion see Harris and Hopkinson (1976), Rider and Taylor (1980), Moss (1982), Richardson et al. (1986), Weeden and Wendel (1989), and Buth (1990).

While many enzymes are monomeric proteins (i.e., made up of one polypeptide chain) the majority are multimeric (made of two or more polypeptide chains), most often dimers and tetramers. Harris and Hopkinson (1976), in a survey of the subunit structure of enzymes, found 28% monomers, 43% dimers, 4% trimers, 24% tetramers, and 1% octamers.

Figure 12 Photograph exhibiting triallelic variation at the phosphoglucomutase locus (Pgm-A) in muscle extracts from the cyprinid fish Luxilus cardinalis. Specimens 1, 2, and 4 are homozygous expressing only the 82 homomer; specimen 3 is heterozygous expressing both 82 and 100 homomers; specimens 5, 6, and 7 are also heterozygous expressing both 68 and 82 homomers.

The simplest patterns of expression are single-locus enzyme systems in diploid organisms. In a homozygous individual, only a single allelic product is formed. Even if the enzyme in question is a multimer, only one kind of homogeneous product will be assembled; this product is a homomeric isozyme seen as a single zone of activity on a gel. If the individual is heterozygous at this single locus, two kinds of allelic products are formed. In the case of monomeric enzymes, the two allelic products are produced in equal quantity, do not interact structurally, and are expressed equally in a given tissue of an individual (1:1 ratio). A zymogram illustrating triallelic variation involving a monomeric enzyme is shown in Figure 12. In the
case of multimeric enzymes, the two allelic products are also produced in equal quantity in a given tissue but the products will usually randomly assemble to form all expected heteromers, in addition to homomers. It is usually the case that the subunits of multimeric enzymes form homomers and heteromers at random, yielding banding patterns in predictable ratios. Because heteromers of similar composition can be assembled in several ways, the ratio of expected intensity of enzyme activity differs among isozymes according to the subunit structure of the enzyme (Figure 13). This variation is detailed in Table 6.

Figure 13 Diagram of isozyme patterns expected in homozygotes and heterozygotes for enzymes of common subunit composition. Modified from Harris and Hopkinson (1976). Ratios of intensity of isozyme activity in heterozygotes are indicated. See Table 6.
[Diagram not reproduced. Recoverable labels: anode (+) at top; columns Homozygote, Heterozygote, Homozygote; heterozygote intensity ratios 1:3:3:1 (trimer) and 1:4:6:4:1 (tetramer).]

The situation is more complex for multilocus enzyme systems. Multimeric gene products of multiple loci in an enzyme system often retain their ability to form heteromers, and the number of isozymes formed can be considerable where heterozygosity occurs. Harris and Hopkinson (1976) provided the following equation for the computation of the expected number of isozymes (i) under these circumstances:

$$ i = \frac{(L + h + n - 1)!}{n!\,(L + h - 1)!} $$

where L = the total number of loci, h = the number of heterozygous loci, and n = the number of subunits for this enzyme. The multilocus situation differs from its single-locus counterpart in that, whereas allelic products of a single locus can account for equal quantities of both products, the products of two different loci would rarely contribute equal quantities of both products in the same tissue. The predictable symmetrical ratios of isozymes in heterozygotes cannot be extended to the multiple loci unless, by chance, the two gene products are produced in equivalent proportions.

Examination of enzyme expression in multiple tissues may aid in distinguishing single-locus heterozygosity from similar isozyme patterns resulting from interactive multilocus products. However, the question of a two-allele, single-locus model versus that of interactive products of two homozygous loci can also be addressed by comparing the frequency of heterozygotes to the predictions of Hardy-Weinberg equilibrium (e.g., Ferris and Whitt, 1978a). The assumption of Hardy-Weinberg expectations for the distribution of allozyme products of a given locus is usually a safe one. Violation of this assumption suggests that additional study is necessary, beginning with a reassessment of the scoring of that enzyme system. Scoring only clear bands and omitting smeared zones may overestimate the frequency of homozygotes. Frequently the report of 50% allele 1 and 50% allele 2 for a given locus in a table of allele frequencies is the result of incorrect scoring of an entire sample (n > 5) as heterozygotes.

Difficulty in scoring gels can occur when any of the other assumptions discussed previously are violated. Exceptions to expected subunit interactions and genetic control are often encountered. The random association of subunits of multimeric enzymes sometimes is restricted, yielding fewer zones of activity than expected. For example, creatine kinase is a dimer in all vertebrates but the
Table 6
Subunit structures of homomeric and heteromeric isozymes in heterozygotes^a

              Monomer   Dimer   Trimer      Tetramer
Homomer       1         11      111         1111
Heteromers    (none)    12      112, 122    1112, 1122, 1222
Homomer       2         22      222         2222

^a Modified from Harris and Hopkinson (1976). Two alleles at this single locus determine polypeptide units 1 and 2 respectively. Random combination of subunits of multimeric enzymes is assumed.
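The subunit combinations summarized in Table 6 and the intensity ratios indicated in Figure 13 follow from random association of two polypeptide units, so they can be enumerated directly. The sketch below is illustrative only: the function names are hypothetical, the ratios are taken to be binomial coefficients (which is what random assembly with equal quantities of both allelic products implies), and the multilocus count assumes that the expected number of isozymes equals the number of ways of choosing n subunits, with repetition, from L + h distinct polypeptide types.

```python
from math import comb


def heterozygote_isozymes(n_subunits):
    """List (composition, relative intensity) for a single-locus heterozygote.

    Units "1" and "2" are the two allelic polypeptides; random assembly and
    equal production of both products are assumed.
    """
    patterns = []
    for k in range(n_subunits + 1):
        composition = "1" * (n_subunits - k) + "2" * k
        patterns.append((composition, comb(n_subunits, k)))
    return patterns


def expected_isozyme_count(loci, heterozygous_loci, n_subunits):
    """Count isozymes for L loci, h heterozygous loci, and n subunits.

    Assumed form: combinations with repetition of n subunits drawn from
    (L + h) polypeptide types.
    """
    kinds = loci + heterozygous_loci
    return comb(kinds + n_subunits - 1, n_subunits)


for name, n in [("monomer", 1), ("dimer", 2), ("trimer", 3), ("tetramer", 4)]:
    print(name, heterozygote_isozymes(n))

ldh_homozygous = expected_isozyme_count(2, 0, 4)   # two loci, tetramer
ldh_one_het = expected_isozyme_count(2, 1, 4)      # one locus heterozygous
print(ldh_homozygous, ldh_one_het)
```

Under these assumptions a tetramer encoded by two homozygous loci yields five isozymes, and the same system with one heterozygous locus yields 15, which agrees with the five expected LDH isozymes and the 15 LDH isozymes mentioned elsewhere in this chapter. Restricted subunit association, as described in the text, produces fewer zones than this count.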
heterodimer is not formed in heterozygotes at the Ck-A locus in teleost fishes (Ferris and Whitt, 1978b). The subunit structure of enzymes often is quite conservative across taxa; however, some enzymes have been reported to have a variety of structures in different groups of organisms (Manchenko, 1988). These reports may reflect either real structural differences among taxa or the restriction of heteromer formation misinterpreted as structural differences. Rigorous testing (e.g., Ferris and Whitt, 1978b) should be applied in these cases. On rare occasions, allelic products have different catalytic properties and expected ratios of isozyme expression are not realized. Examination of a large series of individuals that resolve all heterozygous and homozygous categories should allow the correct interpretation of such variation. Epigenetic effects yield isozymes of different electrophoretic mobilities in different tissues and can suggest the action of more structural loci than are actually present. If only a single locus is active in this case, apparent heterozygosity or homozygosity should be correlated among tissues of an individual (e.g., Murphy and Crabtree, 1983). The probability for unlinked multiple loci to covary in such a way can be addressed statistically (see Hartl and Clark, 1989).

In all studies that deal with questions of whether mobilities of electromorphs are equivalent or whether an individual is heterozygous at a locus, the resolution of discrete zones of enzyme activity on a gel is essential. If multiple buffer systems are not used or if tissue extracts no longer provide sufficient enzyme activity, the resolution may be inadequate. Interpretation of these suboptimal gels results in dubious data sets. For example, overstaining will obscure the subtle differences in relative activity of isozymes. In spite of the resolution of discrete zones of enzyme activity and efforts to limit enzyme expression to primary isozymes, some non-genetic subbanding may confound the interpretation of gels. The production of these secondary isozymes, or subbands, may vary by tissue location and age, enzyme system, and electrophoresis buffer used. Resolution of this problem often comes by changing to another buffer or resolving variation at the locus. The subbanding problem is illustrated for dimeric glucose-6-phosphate isomerase (GPI) in Figure 14. This enzyme system is controlled by two loci, now known as Gpi-A and Gpi-B, in teleost fishes (Avise and Kitto, 1973); products of the latter locus predominate in muscle tissue and an interlocus heterodimer is usually formed. Isozymes of [illegible] often yield two subbands beyond each homomer or heteromer. This pattern might be confused with that of heterozygotes. However, as Figure 14 illustrates, heterozygotes serve to clarify the situation even if their homomers and heteromers are superimposed on some subbands and obscure the expected ratios of isozymes. The presence of two interlocus heterodimers in relevant individuals can provide an additional hint of heterozygosity in this example (e.g., specimens 6 and 9 in Figure 14 in which the variation at the Gpi-A locus is subtle).

Figure 14 Photograph exhibiting variation at glucose-6-phosphate isomerase loci Gpi-A and Gpi-B in muscle extracts from the cyprinid fishes Luxilus cardinalis (lanes 1-6) and Luxilus zonatus (lanes 7-10). Genotypes are listed for each locus. Notice the subbanding.
[Gel photograph not reproduced. Recoverable labels: Gpi-A products; Gpi-B products plus subbands; per-lane genotypes for each locus.]

Having noted the basics of gel interpretation we provide a few additional examples. The effect of using different buffers to resolve enzyme variability is shown in Figure 15. Alternative buffers can differentially affect both relative mobility and activity of isozymes. Although not shown, some buffers would result in smeared isozyme patterns. Figure 16 demonstrates that multiple buffers may be required to resolve allelic variants at different loci of the same enzyme system. Diagnosis of a posttranslational modification is shown in Figure 17, although the exact nature of this modification is unknown.

Figure 15 Photograph demonstrating a gel buffer screen from rattlesnakes for the enzyme system L-lactate dehydrogenase. (A) Tris-citrate III pH 7.0. (B) Tris-citrate/borate pH 8.2. (C) Tris-citrate II pH 8.0. (D) Tris-citrate-EDTA pH 7.0. (E) Phosphate-citrate pH 7.0. Increasing the pH also increases the net charge and relative mobility of the isozymes. Three or four isozymes are observed, depending on the buffer, but in no case are all of the anticipated five isozymes resolved (see Figures 13 and 16). Buffer (A) suppresses the activity of the more anionic system, products of the heart-predominating Ldh-B locus. Isozymes of the slower skeletal-muscle-predominating system Ldh-A cannot be resolved adequately on systems (D) and (E). Note that the minislices are uniquely notched.

Figure 16 Photograph demonstrating intra- and interlocus variability of tetrameric L-lactate dehydrogenase (LDH) isozymes in spring peeper frogs, Hyla (Pseudacris) crucifer. The more anodal system is the heart-predominating locus, Ldh-B, and the cationic locus (on this buffer system) is muscle-predominating Ldh-A. Here resolution is inadequate for interpreting variation at the Ldh-A locus but is excellent for Ldh-B; the buffer revealing Ldh-A variation masks that of Ldh-B. Specimens 1, 2, and 5 are homozygous at both loci although 1 has a different allele expressed at Ldh-A; these specimens show the five expected isozymes (Figure 13). Lanes 3, 4, and 8 are heterozygous at Ldh-A but homozygous for two different alleles at Ldh-B. Lanes 6 and 7 are heterozygous at both loci.
[Gel photograph not reproduced. Recoverable labels: lanes 1-8; anode, origin, and cathode marked.]

Figure 17 Gel of malate dehydrogenase (MDH) from salamanders (Ambystoma maculatum) showing variation in anionic isozymes (the supernatant locus sMdh-A) and cationic isozymes (the mitochondrial locus mMdh-A), and a posttranslation modification. sMdh-A is dimeric and heterozygotes appear similar to those of GPI (Figure 14). mMdh-A products are tetrameric and heterozygotes near the gel origin (arrow) appear smeared because the intralocus heteropolymers are too close together to be resolved. A posttranslation modification (PTM) of mMdh-A products results in highly [illegible] isozymes which in this case lack the intralocus heteropolymers.
[Gel photograph not reproduced. Recoverable labels: sMDH, origin, mMDH, PTM.]

Some enzyme systems may appear as background upon staining for others. These may be
desirable, as in the case of observing superoxide dismutase (SOD) following staining for glycerol-3-phosphate dehydrogenase (G3PDH; Figure 18), or undesirable (Figure 19).

Figure 20 documents the necessity of choosing the correct array of tissues to be surveyed. Finally, Figure 21 shows unacceptable resolution of an isozyme system. Optimal resolution of enzyme activity will facilitate a correct genetic interpretation of the zymograms. Documentation of results through the publication of either gel photographs or zymograms is recommended strongly.

Figure 18 Photograph showing the resolution of superoxide dismutase (SOD) isozymes (light bands) as background on a gel stained for glycerol-3-phosphate dehydrogenase (G3PDH) in spring peeper frogs, Hyla (Pseudacris) crucifer.
[Gel photograph not reproduced. Recoverable labels: anode, SOD, origin.]

Figure 19 Photograph showing extensive variability in mannose-6-phosphate isomerase (MPI) isozymes among some hylid frogs along with background resolution of L-lactate dehydrogenase products (LDH), which may be misinterpreted as a second MPI locus. All individuals except those indicated by arrows are Hyla (Pseudacris) crucifer. The more anodal isozymes in species 2, H. cadaverina, are Ldh-B products.

ENZYME AND LOCUS NOMENCLATURE

The effective communication of data derived from protein electrophoresis is critical to a clear understanding of any study. Consequently, there is a need for a reasonably standard system of enzyme
and locus nomenclature. Shaklee et al. (1992) recognized this need and recently proposed a system for fish based on human gene nomenclature. However, we believe this system can be simplified by eliminating asterisks and comma notations. We also prefer not to use italics to designate loci, although many publications (such as this book) may require their use to distinguish them from other text.

Figure 20 Creatine kinase (CK) isozymes from the marine toad, Bufo marinus, showing differences in tissue expression and the presence of interlocus heteromers. In stomach, only products of the Ck-C locus are expressed whereas in skeletal muscle only Ck-A is seen. In heart tissue, both locus products are expressed, albeit weakly, and the interlocus heteropolymer (heterodimer) is present; the pure locus products (homodimers) are not expressed in equal intensity, eliminating the possibility of a heterozygotic state. CK is not expressed in liver. Adenylate kinase (Ak-A locus) activity is also resolved and limited to skeletal muscle tissue.
[Gel photograph not reproduced. Recoverable labels: Ck-C products; heterodimers; lanes 1-3 for each of Muscle, Heart, Liver, and Stomach.]

Figure 21 Unacceptable resolution of pyruvate kinase (PK) in spring peeper frogs, Hyla (Pseudacris) crucifer. Smeared bands could result from using an incorrect buffer system, overstaining, a bad gel, or bad samples (too-high lipid content, denatured proteins, old samples, etc.). In addition, the sample wicks have not been spaced adequately.

Our nomenclature generally follows Buth (1983, 1984b), Murphy and Crabtree (1983), and Shaklee et al. (1992) with some modifications. Enzyme systems are referred to by upper case letters; for example, L-lactate dehydrogenase is referred to as LDH. (Other recommended abbreviations are given in Appendix 1). In some enzyme systems, the specific locus homologies (orthologies) have been identified by tissue-specific distributions, immunological affinities, physiological properties, and relative mobilities. Specific loci are noted using system identification but with lower case letters except for the first letter which is upper case. For example, the heart-predominating locus of vertebrates is referred to as Ldh-B, and the locus predominating in skeletal muscle Ldh-A (Fisher et al., 1980). Where the locus homologies have not been specifically analyzed by immunological affinity, or other means, we refer to them by number (e.g., Est-1, Gp-1). Duplicated paralogous loci, for example the duplicated G3pdh-A locus of some squamate reptiles (Sites and Murphy, 1991), are noted by following the locus designation with a number: G3pdh-A1, and G3pdh-A2. Finally, many loci are expressed in subcellular organelles and specific designations have been proposed for them in the form of a lower case prefix: among animals, m = mitochondrial loci (e.g., mAat-A); s = cytosolic (soluble or supernatant; when enzyme systems are expressed by duplicate loci and specialized to function at the subcellular level); l = lysosomal; p = peroxisomal; and among plants, s = cytosol; mt = mitochondrial; p = plastid; mb = microbody; c = cell wall. The suffix designations of Shaklee et al. (1992), which denote regulatory loci (r; e.g., Ldh-Ar) and pseudogenes (p = documented; l = ambiguous orthology), also can be incorporated into our system.

Specific alleles at each locus are denoted with parentheses, referred to by lower case letters, and follow the locus abbreviation; thus the a allele of Ldh-A is accordingly designated as Ldh-A(a). The genotype of a specific homozygous individual having allele a is referred to as Ldh-A(a/a). Similarly, a heterozygous individual having alleles a and b would be referred to as Ldh-A(a/b). Similarly, numbered alleles should be separated by a forward slash, as in Ldh-A(200/125).

Particular interlocus isozymes resulting from subunit interactions in multilocus systems are designated using enzyme system notation (capital letters) followed by subscripts that designate subunits, e.g., LDH-A3B1, or simply abbreviated A3B1. In heterozygous individuals, polymeric enzymes (enzymes composed of multiple subunits) yield multiple isozymes. Intralocus polymeric isozymes, formed from the interactions of different alleles at a particular locus in a heterozygote, are denoted by locus designation with allelic subscript notation showing the number of constituent allelic subunits, e.g., Ldh-A(a3b1), or sSod-A2(a1b1). [Note that in all allozyme studies, a heterozygous condition is simply denoted as Ldh-A(a/b) or sSod-A2(a/b).] Finally, in polymeric, multilocus systems, it is possible for subunits produced by different loci (e.g., A or B) and from alternative alleles at a particular locus [e.g., A(a) or A(b)] to combine within a single tissue yielding a multitude of distinguishable isozymes [see Gorman and Shochat (1972) for the resolution of 15 LDH isozymes]. For this we suggest combining system and allelic notation as follows: LDH-A3(a2b1)B, where two subunits of Ldh-A(a), and one subunit each of A(b) and B(a), combine to form a single LDH isozyme.

(Compiled by Donald G. Buth and Robert W. Murphy)

Formulas for enzyme stains frequently are modified and republished, often as compilations for specific groups of organisms or even for single species. Textbook treatments often provide a limited introduction to the vast array of stains available, whereas listings for specific groups of organisms often are limited to those systems well known or expressed only in those groups. Our listing is not meant to be all-inclusive; our selection is biased toward economical systems in use by botanists and zoologists. Our reference to stain sources usually extends only to the secondary literature wherein modifications are already noted.

With few exceptions, the enzyme names and Enzyme Commission (EC) numbers used in this compilation are those recommended by the International Union of Biochemistry (IUBNC, 1984). Abbreviations of enzyme names are placed in capital letters; abbreviations are developed from the IUBNC (1984) recommended names and sometimes differ from common usage by the addition of letters for clarity. The listing of named loci controlling each enzyme system is beyond the scope of this appendix and other abbreviations are defined in the glossary.

The quaternary structures for enzymes listed herein were taken from those reported by Harris and Hopkinson (1976), D.E. Soltis et al. (1983), Richardson et al. (1986), Aebersold et al. (1987), Manchenko (1988), and personal communications from a number of researchers. For some enzymes, these structures are well documented and conservative across taxa. For others (e.g., catalase and glucose-6-phosphate dehydrogenase), several quaternary structures have been reported (Manchenko, 1988). Whether these and other enzymes actually exist in multiple structural forms or have a conserved single multimeric structure that is expressed as restricted subunit combinations remains to be investigated.

Many of the biochemicals used in enzyme stains are marketed in a number of forms. In some cases, ultrapurity is not required and considerable savings can be achieved through the purchase of a lesser grade. In some cases, the choice of a particular salt may be critical. We have listed (Table 4) the product number of many of these stain components keyed to the catalog of the Sigma Chemical Company (P.O. Box 14508, St. Louis, Missouri 63178 U.S.A.) to allow the reader to evaluate the kind and cost of these biochemicals. This choice does not necessarily represent our endorsement of these products.

Most of the stains below are based on a standard volume of 50 ml suitable for gel slices from most horizontal starch gel apparatus (scaled down from stain formulas for 100-ml volume used commonly for the macroscale vertical apparatus of earlier studies). Some investigators have reduced the
staining solution volume even further to conserve expensive stain components; many of these reductions are noted and others may be warranted.

The agar overlay method (see text) is recommended for most stains by some (e.g., Shaklee and Keenan, 1986). For 50-ml stains, we recommend 35-40 ml of buffer with the agar, and 10-15 ml buffer with the stain components. Specific recommendations are made for stains using reduced volumes. See text for specific instructions and suggestions. Our staining methods reflect our own biases in choice of biochemical reagents and their means of storage and application. For example, we see no need to use the more expensive NADP in stains requiring glucose-6-phosphate dehydrogenase when an NAD-dependent form of G6PDH can be used (Buth and Murphy, 1980). NADP is used in solid form herein [see malate dehydrogenase NADP (MDHP)] but may be suitable in liquid stock for many other stains. We prefer NBT to MTT (see text). Other individual preferences abound; for instance, Ayala et al. (1972) recommended adding phenazine methosulfate to stains after an hour or more of incubation with the other components. We urge researchers to experiment with the options listed among these recipes and to try other modifications of their own invention.

Unless otherwise indicated, the stained gel slices should be incubated in the dark at 37°C. Recipes for stain fixing solutions follow the stain recipes.

The costs of most stains listed below are provided in Rainboth and Buth (1992) and are based on the 1992 Sigma Chemical Company catalog. The relative costs in U.S. dollars are provided with each stain in the following notation: $, <$1.00; $$, $1.00-$2.00; $$$, $2.01-$5.00; $$$$, >$5.00.

β-N-Acetylgalactosaminidase (βGALA) $$
(EC 3.2.1.53)

Dimer Prepare this stain as an agar overlay (0.17 g agar in 10 ml water and combine with substrate mixture).

Dissolve the substrate:
4-methylumbelliferyl-N-acetyl-D-galactosaminide    5 mg
dimethyl sulfoxide                                  0.25 ml

Add the dissolved substrate to:
0.1 M phosphate-citrate buffer, pH 4.5              15 ml

Incubate, then view under UV light (long wavelength). Zones of activity will appear as bright areas. To enhance fluorescence, spray the gel with a concentrated solution of ammonium hydroxide. This stain was described by Aebersold et al. (1987).

N-Acetyl-β-glucosaminidase (βGA) $$
(EC 3.2.1.30)

Dimer This enzyme formerly was called hexosaminidase (HA). This stain may be prepared as an agar overlay (0.17 g agar, 10 ml H2O) which is added to the substrate below, or simply pour the following onto the gel surface:

4-methylumbelliferyl-N-acetyl-β-D-glucosaminide    5 mg
dimethyl sulfoxide                                  0.25 ml

Add the dissolved substrate to:
0.1 M phosphate-citrate buffer, pH 4.5              15 ml

Incubate, then view under UV light (long wavelength). Zones of activity will appear as bright areas. To enhance fluorescence, spray the gel slice with a concentrated solution of ammonium hydroxide. This stain was described by Aebersold et al. (1987).

Acid Phosphatase (ACP) $
(EC 3.1.3.2)

Monomer or Dimer Two stains are required in many vertebrates in order to resolve both the red-cell and tissue ACP isozymes. For dimeric tissue isozymes incubate the gel slice at ambient temperature for 30 min in a 0.05 M sodium acetate buffer, pH 6.0 (= 0.33 g sodium acetate in 50 ml H2O with a minor pH adjustment). Drain, then add the following solution to the gel slice:
98 Chapter 4 / Murphy, Sites, Buth & Haufler
sodium acetate (NaC2H3O2·3H2O) 0.33 g
β-naphthyl acid phosphate 0.15 g
fast garnet GBC 0.03 g
H2O 50 ml
The pH of the staining solution is about 5.5 so further adjustment is usually unnecessary; the pH should be 5.0-6.0. The stain was modified from Shaw and Prasad (1970). Werth (1985) recommended the use of a stock solution of the substrate β-naphthyl acid phosphate in 70% acetone (use 1 ml of a 1% solution). D.E. Soltis et al. (1983) and Werth (1985) recommended the use of fast garnet GBC salt as a substitute for black K salt. Sigma fast black K salt has not provided satisfactory results in studies of reptiles. This stain was modified from Harris and Hopkinson (1976) and does not resolve red-cell ACP in many vertebrates, including many reptiles. Monomeric red-cell ACP isozymes, also known as erythrocytic acid phosphatase (EAP; $$), may be stained as follows:
0.05 M citrate buffer, pH 6.0 50 ml
phenolphthalein diphosphate 0.2 g
Incubate for 1 hr, decant the staining solution and spray the gel surface with a concentrated solution of ammonium hydroxide. Zones of activity will appear as pink bands. This same stain was described by Harris and Hopkinson (1976) who recommended 4 hr of incubation at 37°C. In some vertebrates tissue ACP isozymes can also be resolved.

Aconitate Hydratase (ACOH) $$$
(EC 4.2.1.3)
Monomer This enzyme was known formerly as aconitase (ACO or ACON). Mitochondrial and supernatant/cytosolic forms are known (Harris and Hopkinson, 1976). This stain may be prepared as an agar overlay.
0.2 M Tris-HCl, pH 8.0 50 ml
1.0 M MgCl2 (see note below) 1.5 ml
0.1 M cis-aconitic acid, pH 8.0 5 ml
isocitric dehydrogenase 3 U
NADP 0.01 g
5 mg/ml MTT 1 ml
5 mg/ml PMS 1 ml
This stain was modified from Harris and Hopkinson (1976) and Siciliano and Shaw (1976). Note that the magnesium chloride used is 1.0 M, not 0.1 M as is used in many other enzyme stains. Cis-aconitic acid stock solutions should be made in small quantities as it seems to decompose in 1-2 months.

Adenosine Deaminase (ADA) $$
(EC 3.5.4.4)
Monomer This stain may be prepared as an agar overlay.
0.2 M Tris-HCl, pH 8.0 15 ml
H2O 35 ml
adenosine 0.04 g
arsenic acid 0.08 g
xanthine oxidase 0.4 U
nucleoside phosphorylase 1.8 U
5 mg/ml MTT 1 ml
This stain was modified from Spencer et al. (1968).

Adenylate Kinase (AK) $$$
(EC 2.7.4.3)
Monomer This stain may be prepared as an agar overlay.
0.2 M Tris-HCl, pH 8.0 50 ml
0.1 M MgCl2 6 ml
adenosine 5'-diphosphate 0.03 g
D(+)glucose 0.1 g
hexokinase 200 U
G6PDH 40 NAD U
10 mg/ml NAD 2 ml
5 mg/ml NBT 1 ml
5 mg/ml PMS 1 ml
This stain was described by Buth and Murphy (1980) as modified from Fildes and Harris (1966). A more sensitive, but more expensive, fluorescent stain modified from Harris and Hopkinson (1976) may also be used.
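The ACOH note about 1.0 M versus 0.1 M magnesium chloride can be checked with a simple dilution calculation (C1 x V1 = C2 x V2). The following is only an illustrative sketch: the function name final_molarity_mM is hypothetical, and the sketch assumes that the final stain volume is the sum of the liquid components listed in the ACOH and AK recipes, with solids and enzymes contributing negligible volume.

```python
def final_molarity_mM(stock_molar, stock_ml, total_ml):
    """Final concentration in mM after diluting stock_ml of stock into total_ml."""
    return stock_molar * stock_ml / total_ml * 1000.0

# Liquid components (ml) as listed in the ACOH and AK recipes.
# Assumption: solids and enzyme units add no volume.
acoh_total_ml = 50 + 1.5 + 5 + 1 + 1   # Tris, MgCl2, cis-aconitic acid, MTT, PMS
ak_total_ml = 50 + 6 + 2 + 1 + 1       # Tris, MgCl2, NAD, NBT, PMS

acoh_mg_mM = final_molarity_mM(1.0, 1.5, acoh_total_ml)
ak_mg_mM = final_molarity_mM(0.1, 6, ak_total_ml)

print(f"ACOH stain: {acoh_mg_mM:.1f} mM MgCl2")
print(f"AK stain:   {ak_mg_mM:.1f} mM MgCl2")
```

Under these assumptions the ACOH stain ends up with roughly two and a half times the magnesium concentration of the AK stain, which is why the molarity of the stock matters.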
0.2 M Tris-HCl, pH 8.0

95 or 100% ethanol
10 mg/ml NAD
Dimer This enzyme was known formerly as glu- 5 mg/ml NBT
tamic-pyruvic transaminase (GPT). Prepare as an 5 mg/ml PMS
agar overlay (0.07 g agar in 5 ml buffer and then
combine with the following): Products of some ADH loci may be resolved bet-
ter using other alcohols (e.g., 0.2 ml 98% 1-hexa-
0.2 M Tris-HCl, pH 7.0 5 ml
nol, or 0.3 ml 1-amyl alcohol). Some related en-
DL-alanine 0.04 g zymes are named specifically (e.g., octanol
α-ketoglutaric acid 0.02 g dehydrogenase: EC 1.1.1.73) although the sub-
NADH 0.02 g strate, octanol, also may resolve other alcohols
L-lactic dehydrogenase 25 U (e.g., ADH). This stain was modified from Brewer
Monitor the development of expression under UV (1970). Werth (1985) warned of fume contamina-
light (long wavelength). Enzyme activity is indi- tion of nearby incubating gels and recommended
cated by zones of defluorescence. This stain was sealing the ADH stain box with plastic food wrap.
modified from Harris and Hopkinson (1976) and
Siciliano and Shaw (1976). See also Casillas et al.
(1982). An alternative stain was described by Ae-
bersold et al. (1987).
Monomer or Dimer First combine the following:
Monomer Prepare as an agar overlay (0.07 g agar

in 5 ml buffer). Then add:
0.2 M Tris-HCl, pH 8.0 10 ml α-naphthyl acid phosphate 0.15 g
pyruvic acid 0.01 g polyvinylpyrrolidone (PVP) 0.15 g
L-alanine 0.13 g fast blue RR salt 0.05 g
NADH 0.01 g This stain was modified from Boyer (1961) and
Incubate and view under UV light (long wave- Ayala et al. (1972).
length). Enzyme activity is indicated by zones of
defluorescence. This stain was modified from
Shaklee and Keenan (1986). See also Dando et al.
(1981); this enzyme may be limited to certain in-
vertebrate animals (Manchenko, 1988). Subunit structure unknown.
0.1 M phosphate-citrate buffer, pH 4.0 5 ml
Alcohol Dehydrogenase (ADH) $ 4-methylumbelliferyl-L-arabinoside 0.01 g
(EC 1.1.1.1) Incubate for ≈30 min and then view under UV
Dimer Mix the substrate alcohol and the buffer light (long wavelength). Zones of activity will ap-
thoroughly prior to adding the other stain com- pear as bright areas. To enhance fluorescence,
ponents. An agar overlay may be appropriate in spray the gel slice with a concentrated ammo-
some circumstances. nium hydroxide solution. This stain was modified
from Harris and Hopkinson (1976).
This stain was provided by J.P. Bogart. An alter-

native stain modified from M.K. Schwartz et al.
(1963) was given in the first edition of Molecular
Dimer The following stain may also yield adeny- Systematics. A final pH of 8.0 is critical to the suc-
late kinase (AK) gene products. A control slice cess of this stain (D.E. Soltis et al., 1983). A more
from the same gel must be stained specifically for sensitive, but more expensive, fluorescent stain is
AK to ascertain, by a process of elimination, found in Harris and Hopkinson (1976).
which zones of activity are ARK. Prepare as an
agar overlay (0.09 g agar in 8 ml buffer).
0.2 M Tris-HCl, pH 7.0 12 ml
0.1 M MgCl2 1 ml
adenosine 5'-diphosphate 0.02 g Monomer CBPs migrate rapidly toward the anode
D(+)glucose 0.04 g and are somewhat diffuse in appearance (Buth,
hexokinase 200 U 1979, 1982b). The creatine kinase gene products
phospho-L-arginine 0.02 g (Ck-A) and other proteins that predominate in the
G6PDH 40 NAD U tissue examined (e.g., those often scored as gen-
eral proteins) will also stain via this procedure.
10 mg/ml NAD 1 ml
5 mg/ml NBT 1 ml Dissolve the dye in water:
5 mg/ml PMS 1 ml
H2O 50 ml
This stain was modified from Shaklee and Keenan brilliant blue G dye 0.05 g
(1986). This enzyme is present only in some in- Then add acids:
vertebrate animals (Shaklee and Keenan, 1986).
trichloroacetic acid 7.5 g
5-sulfosalicylic acid 2.5 g
This stain was modified from Massaro and Mark-
ert (1968).
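For readers who prefer to think of the CBP staining solution in percent weight/volume, the recipe quantities can be converted as sketched below. This is an illustration only: the helper name percent_w_v is hypothetical, and the sketch assumes that the final volume stays at the 50 ml of water given in the recipe (the volume change from dissolving the solids is ignored).

```python
def percent_w_v(grams, volume_ml):
    """Percent weight/volume: grams of solute per 100 ml of solution."""
    return grams / volume_ml * 100.0

# Quantities from the CBP recipe; the 50 ml final volume is an assumption.
water_ml = 50
cbp_recipe_g = {
    "brilliant blue G dye": 0.05,
    "trichloroacetic acid": 7.5,
    "5-sulfosalicylic acid": 2.5,
}

cbp_percent = {name: percent_w_v(g, water_ml) for name, g in cbp_recipe_g.items()}
for name, pct in cbp_percent.items():
    print(f"{name}: {pct:.2f}% (w/v)")
```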
Dimer This enzyme was known formerly as glu-
tamate-oxaloacetate transaminase (GOT). Mito-
chondrial and supernatant/cytosolic forms are
known (Harris and Hopkinson, 1976). This en-
zyme is best resolved from extracts of relatively Tetramer? Allow the gel slice to warm to ambient
fresh tissues. temperature, then add:
Substrate solution: 0.06 M sodium thiosulfate 15 ml
ddH2O 400 ml 3% hydrogen peroxide 35 ml
L-aspartic acid 1.33 g
α-ketoglutaric acid 0.365 g Incubate at ambient temperature for 1 min, drain,
polyvinylpyrrolidone 2.5 g and add:
Na2EDTA 0.5 g 0.09 M potassium iodide (KI) 50 ml
Na2HPO4 14.2 g
Flush the gel slice quickly with water to remove
Adjust final solution to 500 ml. Minor pH adjust- KI as soon as the white zones of catalase activity
ment to 8.0 with 4.0 M NaOH might be required. are visible against the blue background. DO NOT
Keep refrigerated. For one stain add: place stained gel slice in gel fixative! Photograph
the developed gel slice immediately if possible.
0.2 M Tris/HCl, pH 8.0 25 ml
The gel slice may be stored in water in a dark re-
substrate solution 25 ml
frigerator for a few days but the blue back-
fast blue BB salt 0.05 g
ground stain will be lost eventually. This stain would be to consider this enzyme under the cate-
was modified from Brewer (1970). Siciliano and gory of generic aminopeptidases EC 3.4.-.-.
Shaw (1976) recommended a longer incubation
period (15 min) for the initial solution. D.E. Soltis Incubate the gel slice in the following solution
for 30-60 min:
et al. (1983) and Werth (1985) noted that up to 1
ml of glacial acetic acid may have to be added to 0.1 M KH2PO4 buffer, pH 7.0 50 ml
the KI solution to induce or improve staining. An 0.1 M MgC12 1 ml
alternative CAT stain was described by Harris 10 mg/ml L-leucine-
and Hopkinson (1976) and Aebersold et al. naphthylamide HC1 0.1 ml
(1987). This enzyme cannot be resolved on high
pH gels. Then add:
fast garnet GBC (dissolved in a
small quantity of water) 0.03 g
Continue incubation.
Dimer The following stain may also yield adeny- This stain was modified from those of Brewer
late kinase (AK) gene products. A control slice (1970), Shaw and Prasad (1970) and Ayala et al.
from the same gel must be stained specifically for (1972). Some of these staining methods involve a
AK to ascertain, by a process of elimination, preincubation step in a boric acid solution which
which zones of activity are CK. This stain may be may not be necessary.
prepared as an agar overlay.
0.1 M MgC12 1 ml
adenosine 5'-diphosphate 0.03 g
glucose 0.05 g
hexokinase 200 U Monomer or Dimer This enzyme was known for-
merly as diaphorase (DIA) and lipoamide dehy-
phosphocreatine 0.05 g
drogenase (EC 1.6.4.3; see Muramatsu et al., 1978).
G6PDH 40 NAD U
10 mg/ml NAD 1 ml 0.2 M Tris-HCl, pH 8.0 50 ml
5 mg/ml NBT 1 ml 2 mg/ml 2,6-dichlorophenol-
5 mg/ml PMS 1 ml indophenol 1 ml
NADH 0.01 g
This stain was described by Buth and Murphy 5 mg/ml MTT 1 ml
(1980) as modified from Shaw and Prasad (1970).
A more sensitive, but more expensive, fluorescent Zones of enzyme activity will appear pink/pur-
stain was described by Harris and Hopkinson ple against the blue background of the gel. The
(1976). blue DCIP color will clear overnight if the devel-
oped gel is kept refrigerated (dark) yielding a
white gel with purple isozymes. This stain was
modified from those of Kaplan and Beutler
(1967) and Brewer (1970).Harris and Hopkinson
(1976) used this stain to resolve NADH di-
Monomer This enzyme was known formerly as
aphorase (a synonym of cytochrome-b5 reduc-
leucine aminopeptidase (LAP); the current
tase; EC 1.6.2.2). Aebersold et al. (1987) noted
IUBNC (1984) name and EC number may be
that this stain may also resolve xanthine oxidase
changed as more is learned about peptidases.
(XO) gene products as well as those of a variety
Richardson et al. (1986) refer to this enzyme as
of other enzymes.
Pep-E (see Peptidase). A conservative approach
D.E. Soltis et al., 1983; Shaklee and Keenan, 1986;

Aebersold et al., 1987):
Dimer Prepare this stain as an agar overlay. acetic acid, pH 5.5 10 ml

4-methylumbelliferyl acetate 0.01 g
0.2 M Tris-HC1, pH 8.0 50 ml
0.1 M MgCl2 5 ml Incubate for approximately 10-30 min at 37°C and
adenosine 5'-diphosphate 0.05 g then view under UV light (long wavelength).
Zones of activity will appear as bright areas. Ad-
D(-)3-phosphoglyceric acid 0.05 g
just the pH of the acetic acid with sodium acetate.
NADH 0.04 g
pyruvate kinase 20 U
L-lactic dehydrogenase 125 U Formaldehyde Dehydrogenase (FDH) $
phosphoglycerate mutase 125 U (EC 1.2.1.1)
Monitor the development of expression under UV Dimer
light (long wavelength). Enzyme activity is indi- 0.2 M Tris-HCl, pH 8.0 50 ml
cated by zones of defluorescence. This stain was
37% formaldehyde (reagent grade) 3 drops
modified from Harris and Hopkinson (1976). Two
glutathione, reduced 0.05 g
alternative agar-overlay, fluorescent stains have
been described for ENO that call for slightly dif- 10 mg/ml NAD 3 ml
ferent sets of components that may be less expen- 5 mg/ml NBT 1 ml
sive (Siciliano and Shaw, 1976; Shaklee and 5 mg/ml PMS 1 ml
Keenan, 1986; Aebersold et al., 1987). Prepare the stain and incubate the gel slice at 37°C
in the dark in a fume hood. This stain was modi-
Esterase (EST) $ fied from an unpublished formula of G.P.
Manchenko (personal communication). See also
[non-specific] Manchenko (1988).
Monomer or Dimer The following general esterase
stain can resolve a number of gene products with
broad substrate specificities;these enzymes might
be identified specifically using other methods.
The products resolved non-specifically might be Dimer or Tetramer This enzyme was known for-
considered as generic carboxylic ester hydrolases merly as hexosediphosphatase (HDP) or fructose-
(EC 3.1.1.-; IUBNC, 1984). 1,6-diphosphatase.
0.2 M Tris-HCl, pH 7.0 50 ml 0.2 M Tris-HCl, pH 8.0 50 ml
α-naphthyl acetate solution 3 ml MgSO4·7H2O 0.25 g
fast blue BB salt 0.05 g D-fructose-1,6-diphosphate 0.02 g
Incubate at ambient temperature; dark not re- glucose-6-phosphate isomerase 50 U
quired. To prepare the stock substrate solution G6PDH 40 NAD U
(2% solution in 50% acetone), dissolve the α-naph- 2-mercaptoethanol (1 drop in
thyl acetate in the acetone, then add the water. 10 ml H2O) 1 drop
Clearer resolution of some esterase products can NADP 0.02 g
be achieved by using β-naphthyl acetate as the 5 mg/ml NBT 1 ml
substrate. This stain was modified from Brewer 5 mg/ml PMS 1 ml
(1970). A fluorescent stain for dimeric esterase-D
(EC 3.1.1.-; $) has been described by several in- Prepare stain and incubate the gel slice at 37°C in
vestigators (e.g., Harris and Hopkinson, 1976; the dark in a fume hood. This stain is modified
from Shaw and Prasad (1970). This enzyme seems

particularly sensitive to freezing and thawing of
tissue samples.
Dimer
0.1 M phosphate-citrate buffer, pH 4.0 5 ml
4-methylumbelliferyl-α-D-galactoside 0.01 g
Incubate for approximately 30 min and then view
under UV light (long wavelength). Zones of activ-
Tetramer This enzyme was known formerly as al- ity will appear as bright areas. To enhance fluo-
dolase (ALD) and is currently abbreviated as rescence, spray the gel slice with a concentrated
FBALD by some investigators. ammonium hydroxide solution. This stain was
0.2 M Tris-HCl, pH 8.0 50 ml modified from Harris and Hopkinson (1976).
glyceraldehyde-3-phosphate
dehydrogenase 200 U
D-fructose-1,6-diphosphate 0.08 g
10 mg/ml NAD 2 ml
5 mg/ml NBT 1 ml Monomer
5 mg/ml PMS 1 ml 0.1 M phosphate-citrate buffer, pH 4.0 5 ml
This stain was modified from Ayala et al. (1972). 4-methylumbelliferyl-β-D-galactoside 0.01 g
Aebersold et al. (1987) recommended the addition Incubate at 37°C for approximately 30 min and
of 0.01 g arsenic acid to an FBA stain of 50 ml. then view under UV light (long wavelength).
Zones of activity will appear as bright areas. To
Fumarate Hydratase (FUMH) $$ enhance fluorescence, spray the gel slice with a
concentrated ammonium hydroxide solution.
(EC 4.2.1.2) This stain was modified from Harris and Hopkin-
Tetramer This enzyme was known formerly as fu- son (1976).
marase (FUM). The following stain may also yield
malate dehydrogenase (MDH) gene products. A
control slice from the same gel must be stained General Proteins (GP) $
specifically for MDH to ascertain, by a process of [non-specific]
elimination, which zones of activity are FUMH. Various quaternary structures Creatine kinase gene
0.2 M Tris-HCl, pH 8.0 50 ml products (Ck-A) and other proteins that predomi-
fumaric acid 0.05 g nate will also stain via this procedure.
malic dehydrogenase 150 U Prepare stock solution:
10 mg/ml NAD 1 ml
naphthol blue black (amido black) 1 g
5 mg/ml NBT 1 ml
stain-fixing solution (1:5:5 glacial
5 mg/ml PMS 1 ml acetic acid:methanol:water) 500 ml
This stain was modified from Brewer (1970). The
Filter stain to remove undissolved dye.
FUMH stain described by Siciliano and Shaw
(1976) called for the use of the disodium salt of fu- Stain gel slices in 50 ml of stock solution for 20
maric acid and MDH in greater concentration. Ae- min at 20°C. Wash slices in fixing solution several
bersold et al. (1987) recommended the addition of times until background is pale. The stain may be
0.01 g pyruvic acid to a FUMH stain of 50 ml to reused. This stain was modified from Selander et
suppress LDH. al. (1971).
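The 1:5:5 stain-fixing solution used for the General Proteins stock is specified only as a ratio. A minimal sketch of how the 500 ml might be divided among its components is given below; the function name split_by_ratio is hypothetical, and the sketch assumes that the ratio is by volume and that volumes are additive on mixing.

```python
def split_by_ratio(total_ml, parts):
    """Divide total_ml among components in proportion to their ratio parts."""
    whole = sum(parts.values())
    return {name: total_ml * share / whole for name, share in parts.items()}

# 1:5:5 glacial acetic acid:methanol:water, 500 ml total (from the recipe).
fixing_ml = split_by_ratio(500, {
    "glacial acetic acid": 1,
    "methanol": 5,
    "water": 5,
})

for name, ml in fixing_ml.items():
    print(f"{name}: {ml:.1f} ml")
```

Under these assumptions the mixture is about 45 ml of glacial acetic acid with about 227 ml each of methanol and water.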
D-fructose-6-phosphate 0.04 g
G6PDH 40 NAD U
10 mg/ml NAD 2 ml
Dimer? This enzyme was known formerly as hex- 5 mg/ml NBT 1 ml
ose-6-phosphate dehydrogenase (H6PDH). The
5 mg/ml PMS 1 ml
following stain may also yield LDH. Either a con-
trol slice from the same gel or the addition of 0.05 This stain was described by Buth and Murphy
g pyruvic acid may be necessary. (1980) as modified from DeLorenzo and Ruddle
(1969).
0.05 M potassium phosphate buffer,
pH 7.0 50 ml
D(+)glucose 9 g α-Glucosidase (αGLUS) $$$
10 mg/ml NAD 2 ml (EC 3.2.1.20)
5 mg/ml NBT 1 ml
5 mg/ml PMS 1 ml Tetramer
This stain was modified by Berg and Buth (1984) 0.1 M phosphate-citrate buffer, pH 4.0 5 ml
from that described by Harris and Hopkinson 4-methylumbelliferyl-α-D-glucoside 0.01 g
(1976).
Monitor the development of expression under UV
light (long wavelength). Zones of activity will ap-
pear as bright areas. To enhance fluorescence,
spray the gel slice with a concentrated solution of
ammonium hydroxide. This stain was modified
from Harris and Hopkinson (1976). Aebersold et
Dimer? NADP (0.02 g in 400 ml) should be added al. (1987) recommended a stain buffer pH of 8.0.
to the gel before electrophoresis.
0.2 M Tris-HCl, pH 8.0 50 ml β-Glucosidase (βGLUS) $
0.1 M MgCl2 3 ml (EC 3.2.1.21)
D-glucose-6-phosphate 0.3 g
NADP 0.03 g Subunit structure uncertain
5 mg/ml NBT 1 ml 0.1 M phosphate-citrate buffer, pH 4.0 5 ml
5 mg/ml PMS 1 ml 4-methylumbelliferyl-β-D-glucoside 0.01 g
This stain was modified from Brewer (1970). At Incubate for approximately 30 min and then
least for amphibians, the quantities of NADP and view under UV light (long wavelength). Zones
D-glucose-6-phosphate may be reduced by 60% or of activity will appear as bright areas. To en-
more. hance fluorescence, spray the gel slice with a
concentrated ammonium hydroxide solution.
This stain was modified from Harris and Hop-
kinson (1976).
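The amphibian note in the glucose-6-phosphate dehydrogenase stain (quantities of NADP and D-glucose-6-phosphate reduced by 60% or more) amounts to a simple proportional reduction. The sketch below is illustrative only: the function name reduce_quantities is hypothetical, and it assumes that the 0.03 g entry in that recipe is NADP and that exactly 60% is removed.

```python
def reduce_quantities(recipe_g, reduction_fraction):
    """Return the recipe with every mass reduced by the given fraction."""
    keep = 1.0 - reduction_fraction
    return {name: round(grams * keep, 4) for name, grams in recipe_g.items()}

# Standard quantities from the stain recipe (NADP identity is an assumption).
standard_g = {"NADP": 0.03, "D-glucose-6-phosphate": 0.3}

# A 60% reduction leaves 40% of each quantity.
amphibian_g = reduce_quantities(standard_g, 0.60)
for name, grams in amphibian_g.items():
    print(f"{name}: {grams} g")
```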
β-Glucuronidase (βGLUR) $$$

Dimer This enzyme was known formerly as phos- (EC 3.2.1.31)
phohexose isomerase (PHI) or phosphoglucoiso-
merase (PGI). This stain may be prepared as an Tetramer Some investigators abbreviate this en-
agar overlay. zyme as GUS.
0.1 M phosphate-citrate buffer, pH 4.0 5 ml
4-methylumbelliferyl-β-D-glucuronide 0.01 g
Incubate for approximately 30 min and then view 1.0 M L-glutamic acid 15 ml

under UV light (long wavelength). Zones of activ- 10 mg/ml NAD 3 ml
ity will appear as bright areas. To enhance fluo- 5 mg/ml NBT 1 ml
rescence, spray the gel slice with a concentrated 5 mg/ml PMS 1 ml
ammonium hydroxide solution. This stain was
modified from Harris and Hopkinson (1976). A This stain was modified from Shaw and Prasad
more expensive, non-fluorescent stain was de- (1970). Brewer (1970) recommended a staining
scribed by Aebersold et al. (1987). buffer of pH 9.0. Aebersold et al. (1987) recom-
mended the addition of 0.014 g adenosine 5'-
diphosphate and 0.001 g pyridoxal 5-phosphate
Glutamate-Ammonia Ligase (GLAL) $$ to a GTDH stain of 50 ml. A stain buffer of 0.2 M
(EC 6.3.1.2) Tris/HCl pH 7.0 may give equally good results.
Subunit structure uncertain This enzyme was
known formerly as glutamine synthetase (GS). Glutamate Dehydrogenase (NADP+)
Preparation of a substrate solution and a visual- (GTDHP) $$$
ization solution is required. For the substrate so- (EC 1.4.1.4)
lution, dissolve 0.10 g MgCl2·6H2O in 2.0 ml H2O,
then add: Subunit structure uncertain
0.2 M Tris-HCl buffer, pH 8.0 3 ml 0.1 M potassium phosphate buffer,
L-glutamic acid 0.20 g pH 7.0 35 ml
adenosine 5'-triphosphate 0.08 g 1.0 M L-glutamic acid 15 ml
NH4OH (concentrated) 0.2 ml NADP 0.02 g
5 mg/ml NBT 1 ml
Raise the pH to 9.3 with 1.0 N NaOH. Incubate
the gel slice in the substrate solution at 37°C for at
5 mg/ml PMS 1 ml
least 1 hr, then remove the substrate solution from Counterstain for GTDH. This stain was modified
the gel slice but do not rinse. Cover the gel slice from Shaw and Prasad (1970). Brewer (1970) rec-
with a 50% acetone solution and incubate it at am- ommended a staining buffer of pH 9.0. Aebersold
bient temperature for 15 min. Flush the gel slice et al. (1987) recommended the addition of 14 mg
with water and add the following visualization adenosine 5'-diphosphate and 1 mg pyridoxal 5-
solution: phosphate to a GTDHP stain of 50 ml volume.
L-ascorbic acid 0.8 g
ammonium molybdate solution
(see below) 6 ml
Incubate at 37°C. The ammonium molybdate so-
lution can be prepared as a stock (2.5 g molybdic
Dimer Prepare as an agar overlay (0.17 g agar in
acid ammonium tetrahydrate, 8 ml concentrated
10 ml water, mix the stain components in the 13
H2S04,92 ml water). This stain was described by
ml buffer):
Morizot et al. (1983).
0.2 M Tris-HC1, pH 8.0 13 ml

2 mg/ml2,6-dichlorophenol-
indophenol 1 ml
Tetramer? glutathione, oxidized 0.02 g
flavine adenine dinucleotide 0.002 g
0.1 M potassium phosphate buffer, NADH 0.01 g
pH 7.0 35 ml
5 mg/ml MTT 1 ml
This stain was modified from Aebersold et al. gene products. A control slice from the same gel
(1987). Alternative stains are discussed by Brewer may be necessary.
(1970). P.T. Chippindale (personal communica- 0.2 M Tris-HCl, pH 8.0 50 ml
tion) obtains better results by applying the stain DL-glyceric acid 0.2 g
without agar and in only 13 ml of Tris/HCl. Di-
pyruvic acid 0.05 g
hydrolipoamide dehydrogenase (DDH; EC
1.8.1.4) may also be resolved as a second, rela- pyrazole 0.05 g
tively slower system (see Harris and Hopkinson, 10 mg/ml NAD 2 ml
1976) because it appears if glutathione is omitted 5 mg/ml NBT 1 ml
from the stain. In addition, because Aebersold et 5 mg/ml PMS 1 ml
al. (1987) noted that the DDH stain may also re- This stain was modified from Siciliano and Shaw
solve xanthine oxidase (XO) gene products as well (1976).
as those of a variety of other enzymes, this may
also be true for GR. FAD may not be necessary if
NADPH is used instead of NADH. However, the
use of NADPH may result in the resolution of
NADPH dehydrogenase (EC 1.6.99.1).
Dimer This enzyme was known formerly as

α-glycerophosphate dehydrogenase (αGPD or
αGPDH).
0.2 M Tris-HCl, pH 8.0 50 ml
Tetramer Prepare the substrate solution: DL-glycerophosphate, pH 8.0 1 g
0.1 M MgCl2 1 ml
0.2 M Tris-HCl, pH 7.0 10 ml
aldolase 100 U 10 mg/ml NAD 1 ml
D-fructose-1,6-diphosphate 0.25 g 5 mg/ml NBT 1 ml
5 mg/ml PMS 1 ml
Incubate the substrate solution at 37°C for 30
min, then add the following: This stain may require a higher pH of stain buffer
(pH 9.5), 2 ml of NAD, and incubation for 1hr be-
0.2 M Tris-HC1, pH 7.0 40 ml fore adding PMS. This stain was modified from
arsenic acid 0.08 g Shaw and Prasad (1970).
10 mg/ml NAD 2 ml
5 mg/ml NBT 1 ml
5 mg/ml PMS 1 ml
Guanine Deaminase (GDA) $$
(EC 3.5.4.3)
This stain was modified from those of Ayala et al.
(1972) and Siciliano and Shaw (1976). In order to Dimer This stain may be applied as an agar over-
minimize L-lactate dehydrogenase (LDH) stain- lay (use 0.07 g agar in the 5 ml water and mix the
ing, 0.1 g of pyruvic acid may be added to the stain components in the 5 ml buffer). Otherwise
staining solution (Harris and Hopkinson, 1976). simply mix the following:
0.2 M Tris-HCl, pH 8.0 5 ml
H2O 5 ml
guanine 0.25 g
xanthine oxidase 2.6 U
5 mg/ml MTT 2 ml
Subunit structure uncertain The following stain 5 mg/ml PMS 1ml
may also yield L-lactate dehydrogenase (LDH)
If the enzyme activity is low, the amount of XO 10 mg/ml NAD

may have to be increased. However, Harris and 5 mg/ml NBT
Hopkinson (1976) recommended that less XO be 5 mg/ml PMS
used (0.1 U). This stain was modified from
Richardson (1983). Incubate the gel slice in the dark at ambient tem-
perature or at 37°C. This stain was modified from
Shaw and Prasad (1970). D.E. Soltis et al. (1983)
and Werth (1985) suggested that 0.02 g tetra-
sodium EDTA be added to an HK stain of 50 ml.
Monomer This stain may be prepared as an agar
overlay (0.07 g agar in 5 ml of buffer and mix the
stain components in the remaining 8 ml of buffer).
0.2 M Tris-HCl, pH 8.0
0.1 M MgCl2 Dimer This enzyme has multiple substrate affini-
adenosine 5'-triphosphate ties, and may be confused with 3-hydroxybu-
guanosine 5'-monophosphate tyrate dehydrogenase (HBDH). Both LDH and
phospho(enol)pyruvate ADH gene products may appear with prolonged
NADH staining. Either a control slice from the same gel
potassium chloride or end-product suppression staining may be nec-
calcium chloride essary.
pyruvate kinase Dissolve 2.0 g D-gluconic acid lactone in 25 ml
lactic dehydrogenase water. Adjust the pH to 12.5 with sodium
Monitor the development of expression under UV hydroxide pellets (≈0.5 g). Incubate this solu-
light (long wavelength). Enzyme activity is indi- tion at ambient temperature for 30 min with
cated by zones of defluorescence. This stain was occasional stirring. Then readjust the pH to just
modified from Harris and Hopkinson (1976). See below 8.0 by adding, dropwise, 12 N HCl. Add
also Morizot and Siciliano (1982). 25 ml of 0.2 M Tris-HCl, pH 8.0 (at this point,
the pH of the substrate solution should be 8.0
but a minor adjustment with 4 N HCl may be
Hexokinase (HK) $$$$ necessary). Add the following to the substrate
(EC 2.7.1.2) solution and apply to the gel slice:
Monomer The following stain may also yield glu- pyrazole
cose dehydrogenase (GCDH) gene products. Ei- pyruvic acid
ther a control slice from the same gel must be 10 mg/ml NAD
stained specifically for GCDH to ascertain, by a 5 mg/ml NBT
process of elimination, which zones of activity are 5 mg/ml PMS
HK, or 0.05 g of gluconic acid must be added to
the stain to suppress GCDH. HK and GCDH of- This stain was described by Buth (1980) as modi-
ten have very different tissue-specific expression fied from Tobler and Grell (1978).
so the choice of tissue may make continued con-
trol testing unnecessary.
0.2 M Tris-HC1, pH 8.0 50 ml
0.1 M MgCl2 1 ml Tetramer This enzyme was known formerly as
adenosine 5'-triphosphate 0.25 g glycolate oxidase (GOX) EC 1.1.3.1. Prepare as an
D(+)-glucose 5 g agar overlay.
G6PDH 80 NAD U
0.2 M Tris-HC1, pH 8.0 50 ml 0.1 M MgC12 2 ml

glycolic acid 0.05 g NaCl 0.30 g
5 mg/ml MTT 1 ml DL-hydroxybutyric acid 0.63 g
5 mg/ml PMS 1 ml 10 mg/ml NAD 3 ml
This stain was modified from R.L. Garthwaite 5 mg/ml NBT 1 ml
(personal communication). 5 mg/ml PMS 1 ml
This stain was modified from Shaw and Prasad
Hydroxyacylglutathione Hydrolase (1970).
(HAGH) $$$$
(EC 3.1.2.6)
Dimer This enzyme was known formerly as gly-
oxalase II (GLO-II). This stain should be prepared Tetramer This enzyme was known formerly as sor-
as an agar overlay (0.17 g agar in 10 ml water; mix bitol dehydrogenase (SDH or SORD). LDH gene
the stain components in the 15 ml buffer): products may appear with prolonged staining.
0.2 M Tris-HCl, pH 8.0 15 ml Either a control slice from the same gel or end-
methylglyoxal 0.24 ml product suppression staining may be necessary.
glutathione, reduced 0.03 g 0.2 M Tris-HCl, pH 8.0 50 ml
glyoxalase I 150 U 2.0 M D-sorbitol 5 ml
10 mg/ml NAD 1 ml
Incubate at 37°C for at least 30 min, then add:
5 mg/ml NBT 1 ml
1.0 M zinc chloride 80 µl 5 mg/ml PMS 1 ml
pyruvic acid 5 mg
This stain was modified from Lin et al. (1969). En-
L-lactic dehydrogenase 500 U zyme products often migrate cathodally.
10 mg/ml NAD 3 ml
5 mg/ml MTT 1 ml
5 mg/ml PMS 1 ml
Incubate the gel slice at 37°C in the dark. This
stain was modified from Aebersold et al. (1987). Dimer The following stain is for NADP-depen-
dent isocitrate dehydrogenase. Mitochondrial and
supernatant/cytosolic forms are known (Harris
3-Hydroxybutyrate Dehydrogenase and Hopkinson, 1976).
(HBDH) $$
0.2 M Tris-HCl, pH 8.0 50 ml
(EC 1.1.1.30)
0.1 M MgCl2 3 ml
Dimer This enzyme may be confused with 2- DL-isocitric acid 0.08 g
hydroxy-acid dehydrogenase (HADH) due to the NADP 0.01 g
multiple substrate affinities of the latter. A control 5 mg/ml NBT 1 ml
slice from the same gel should be stained for 5 mg/ml PMS 1 ml
HADH. Electromorphs in common between slices
stained for HBDH and HADH are probably This stain was modified from those of N.S. Hen-
HADH. ADH products may appear with pro- derson (1965), Brewer (1970), Shaw and Prasad
longed staining. (1970), and Ayala et al. (1972). Note that Brewer
(1970) erred in listing manganese chloride
0.5 M potassium phosphate buffer, (MnCl2) instead of magnesium chloride in this
pH 7.0 25 ml stain. For amphibians, the Tris-HCl buffer should
H2O 20 ml be adjusted to pH 7.0.
5 mg/ml NBT
5 mg/ml PMS
Tetramer This stain was modified from Brewer (1970).
1.0 M lithium lactate, pH 8.0
(see below) 8 ml
20 mg/ml NAD 1 ml
5 mg/ml NBT 1 ml
5 mg/ml PMS 1ml Tetramer The following stain is for NADP-depen-
dent malate dehydrogenase. The convention for
The stock substrate solution may be prepared us- name abbreviation of this kind of enzyme (+P to
ing either DL-lactic acid or lactic acid solution; the MDH) follows Aebersold et al. (1987). This en-
pH should be adjusted to 8.0 with the addition of zyme was known formerly as malic enzyme (ME).
LiOH. This stain was modified from Shaw and Mitochondrial and supernatant/cytosolic forms
Prasad (1970). are known (Harris and Hopkinson, 1976). NADP
(0.02 g in 400 ml) should be added to the gel be-
fore electrophoresis.
0.2 M Tris-HC1, pH 8.0 50 ml
0.1 M MgC12 1 ml
Dimer This enzyme was known formerly as gly- 2.0 M DL-malic acid, pH 8.0 5 ml
oxalase I (GLO). NADP (see note below) 0.02 g
0.2 M potassium phosphate buffer, 5 mg/ml NBT 1ml
pH 6.8 50 ml 5 mg/ml PMS 1 ml
methylglyoxal (40% solution) 0.9 ml
This stain was modified from those of Ayala et al.
glutathione, reduced 0.25 g
(1972) and Cross et al. (1979). It is important that
5 mg/ml MTT 1 ml NADP be used in solid form in this stain. There is
Incubate the gel slice in this solution at 37°C for often sufficient breakdown of NADP to NAD in
40 min and then add: liquid stocks in prolonged storage that NAD-de-
pendent MDH activity will be resolved in addition
2 mg/ml 2,6-dichlorophenol- to MDHP. If there is any doubt as to the identity of
indophenol 1 ml MDHP, a control slice from the same gel should be
Areas of activity will be seen as white zones on a stained specifically for MDH to ascertain, by a
blue background. This stain was modified from process of elimination, which zones of activity are
Harris and Hopkinson (1976). MDHP. Aebersold et al. (1987) recommended
adding 0.02 g oxaloacetic acid to an MDHP stain
of 50 ml,Use caution when preparing the DL-malic
acid substrate as the solution becomes extremely
hot while adjusting the pH with NaOH.
Dirner The following stain is for NAD-depen-

dent malate dehydrogenase. Mitochondrial and
supernatant/cytosolic forms are known (Harris
and Hopkinson, 1976).
0.2 M Tris-HC1, pH 8.0 Monomer This stain may be prepared as an agar
2,O M DL-malicacid overlay.
10 mg/ml NAD
0.2 M Tris-HC1, pH 8.0 50 ml keep the gel slice moist (Harris and Hopkinson,
0.1 M MgC12 1 ml 1976; Aebersold et al., 1987). After incubation,
w-mannose-6-phospha te 0.05 g remove the substrate solution from the slice
glucose-6-phosphate isomerase 50 U but do not rinse. Add the following visualiza-
G6PDH 40 NAD U tion solution:
10 mg/ml NAD 2 ml L-ascorbicacid 0.5 g
5 mg/ml MTT 1 ml ammonium molybdate solution
5 mg/ml PMS 1ml (see below) 2 ml
Products of LDH may appear as faint bands fol- Incubate at 37OC in a fume hood in the dark. The
lowing staining. LDH activity can be suppressed ammonium molybdate solution can be prepared
by adding 0.05 g of pyruvic acid. This stain was as a stock (2.5 g ammonium molybdate, 8 ml con-
described by Buth and Murphy (1980) as modi- centrated H2SO4,92 ml H20).This stain was mod-
fied from E. Nichols et al. (1973). ified from Aebersold et al. (1987).
Monomer or dimer Dimer

0.1 M phosphate-citrate buffer, pH 4.0 5 ml 0.2 M Tris-HC1, pH 8.0 50 ml
4-methylumbelliferyl-a- 1-octanol 3 ml
D-mannopyranoside 0.01 g 95% ethanol 1 ml
Incubate for 30 min and then view under UV light 10 mg/ml NAD . I ml.
(long wavelength). Zones of activity will appear 5 mg/ml NBT I ml
as bright areas. To enhance fluorescence, spray the 5 mg/ml PMS 1 ml
gel slice wit11 a concentrated ammonium hydrox- This stain, modified from those of Ayala et al.
ide solution. This stain was modified from Harris (1972) and Shaklee and Keenan (1986), includes
and Hopkinson (1976). ethanol, which may also serve as a substrate for
alcohol dehydrogenase (ADH). A control slice
from the same gel should be stained for ADH us-
ing only ethanol as substrate to ascertain which
zones of activity are ODH.
Dimer This enzyme was known formerly as ino- 11-Octcppine Dehydrogcr~ase(OBaDE3-Fj$$

sine triphosphatase (ITP). This stain requires the
preparation of a substra.te solution and a visual- (EC "8.5.1.11)
ization solution. Monomer Prepare this stain as an agar overlay
Substrate solution: (0.17 g agar in 15 ml buffer, the stain components
in the remaining 10 ml buffer):
MgC1.y 6H20 0.1 g 0.2 M Tris-HC1, pH 8.0 25 ml
inosine triphosphate 0.03 g 0.1 M MgC12 1ml
2-mercaptoethanol 120 pl D-octopine 0.01 g
10 mg/inl NAD 1 ml
Mix these components and pour on the gel slice 5 mg/rnl NBT 1 ml
in a fume hood. Incubate at 37'C for at least one 5 mg/ml PMS 1ml
hour. A filter paper overlay may be desired to
This stain was modified from ShaMee and Keenan Snake venom is used in this stain as a source of 1.-
(1986). This enzyme is known only from inverte- amino acid oxidase. The substitution of a less pu-
brates. rified but adequate source of this enzpine (vla
snake venom) was advantageous final~ciallyat
one time but may no longer be so. Several stain
formulas list specifically the venom of the eastern
diamondback rattlesi~ake(Crotalus adamat~teus)for
Subunit structure vaviable The terms dipeptidase use in peptidase stains (e.g., Siciliano and Shaw,
(EC 3.4.13.11) and tripeptide aminopeptidase 1976).We have tried the less expensive venom of
(EC 3.4.11.4) are recommended over the more a closely related rattlesnake, the western dra-
generic term peptidase (IUBNC, 1984).However, mondback (C, atrox, recommended herein), and
the multiple substrate affinities of these enzymes found that it yielded equivalent results. These
and problematic assignment of their homology stains may disappear quickly and thus should be
makes the exact assignment of EC numbers diffi- scored and photographed promptly. For rapidly
cult. Exceptions are those of proline dipeptidase developing peptidases (e.g., Pep-A), it may be ad-
(Pep-D; EC 3.4.13.9) and perhaps cyEoso1 vantageous to incubate these gels at room tem-
aininopeptidase (Pep-E; EC 3.4.11.1). Recom- perature to slow staining.
mended substrates for the resolution of products
of seven peptidase loci described from verte-
brates follow Frick (1981,1983, personal commu-
i~ication),Richardson et al. (19861, and/or Mat-
son (1989). The tissue distribution of these gene Subunit structure lrnccrtain
products is often restricted (e.g. Frick, 1983; Mat-
3-amino-9-ethyl carbazole 0 04 g
son, 1989). Pep-B frequently appears upon stain-
N,N-dimetl~ylformamide 2.5 ml
ing for Pep-F, as does Pep-C on Pep-A making
counterstaining necessary. Then add:
0.05 M sodium acetate buffer, pH 5.0 5 1111
Pep-A: glycyl-L-leucine[dimer] $$$
0.1 M calcium chloride 1 ml
Pep-B: L-leucylglycylglycine[monomer or
dimer] $$$ 3% hydrogen peroxide 1 ml
Pep-C: glpcyl-L-leucine or DL-alanyl-DL- Incubate the gel slice in a refrigerator, usually for
methionine [monomer] $$$ 30-60 min. This stain was modified from Shaw
Pep-D: L-phenylalanyl-L-proline [dimerl $$$ and Prasad (1970) by D.E. Soltis et al. (1983). See
Pep-E: see cytosol aminopeptidase [monomer] Brewer (1970) and Siliciano and Shaw (1976) for
Pep-F: L-leucyl-L-leucyl-L-leucine
[subunit additional PER stains.
structure unknown] $$$$
Pep-S: glycyl-L-leucine[tetramer?] $$$
The following general stain for peptidases is mod-

ified from Merritt et al. (1978) and is best used as Diiner This stain may be prepared as an agar
an agar overlay: overlay (0.14 g agar in 10 ml of water and stain
components in the remaining 10 ml of buffer).
di/tripeptide (see above) 0.04 g 0.2 M Tris-HC1, pH 8.0 10 an1
snake venom (from Crotalus atrox) 0.01 g H20 10 mi
peroxidase 0.02 g D-fructose-6-phosphate 0.012 g.
0-dianisidine dihydrochloride 0.01 g adenosine 5'-triphosphate 0.012 g
MgClZo6H20 0.04 g
arsenic acid (Na salt) 0.20 g

2-mercaptoethanol 20 fl
alclolase 36 U
tnosphosphate isomerase 500 U
a-glycerophosphate dehydrogenase 40 U Dimer This enzyme was known formerly as 6-
10 rng/ml NAD 1 ml phosphogluconate dehydrogenase (6PGD or
GPGDH). B.J. Turner (1974) identified a second
l'repare stain and incubate the gel slice at 37OC in gene product that sometimes appears with this
af~~m hood.
e Incubate for 30 min and then view stain as glucose-6-phosphate dehydrogenase.
under UV hght (long wavelength). Zones of activ- Limit the following staining solution to the irnme-
ily ~ ~ 1 appear
11 as bright areas. To enhance fluo- diate area of anticipated activity. NADP (0.02 g in
resccnce, spray the gel slice with a concentrated 400 ml) should be added to the gel before elec-
ainlnonium hydroxide solution. This stain was trophoresis.
modified froin Harris and Hopkinson (1976)' who 5 ml
0.2 M Tris-HC1, pH 8.0
recommend the addition of 2-mercaptoethanol (fi-
nal concentration of 10 rnM) and ATP (final con- 0.1 M MgC12 5 ml
centration of 0.2 mM) to gel before degassing, but 6-phosphogluconic acid 0.01 g
this may not be necessary. NADP 0.01 g
5 ing/ml NBT 0.5 ml
5 mg/ml PMS 0.5 ml
Phos phoglrr con~utrsseWZM) $$
(EC 5.4.2.2) This stain was modified from Shaw and Prasad
(1970). Harris and Hopkinson (1976) recom-
Monomer This stain may be prepared as an agar mended an agar overlay for PGDH. The use of
overlay. high amperage may result in significant streaking.
0.2 M Tris-KC1, pH 8.0 50 ml
0.1 h.l MgC12 5 ml
a-D-glucose-1-phosphate 0.1 g
G6PDH 40 NAD U
10 mg/ml NAD 2 ml Monomer This stain should be prepared as an
5 mg/ml NBT 1rnl agar overlay (0.14 g agar in 10 ml buffer and stain
components in the remaining 10 ml buffer).
5 mg/ml PMS 1ml
0.2 M Tris-HC1, pH 8.0 20 ml
This stain was described by Buth and Murphy
0.1 M MgC12 1ml
(1980) as modified from N. Spencer et al. (1964).
The reaction requires glucose-1,6-diphosphate in
u(-)3-phosphoglyceric acid 0.03 g
addltion to glucose-1-phosphate listed above. adenosine 5'-triphosphate 0.05 g
I-lowever, the former biochemical is expensive if NADH 0.02 g
purchased in purified form and only a trace is glyceraldehyde-3-phosphate
necessary. The Sigma G-7000 product is said to dehydrogenase 20 U
carry a trace of G-1,6-P (0.01-0.2%)which is usu- a-glycerophosphate dehydrogenase 5U
ally enough to stain for PGM but may not be so in Monitor the development of expression under UV
some cases. D.E. Soltis et al. (1983) and Werth light (long wavelength). Staining may occur
(1985)recommended the Sigma G-1259 product rapidly; check after 5-10 min. Enzyme activity is
tvhlcl~,while Inore expensive than G-7000, has a indicated by zones of defluorescence. This stain
sufhcient amount of G-1,6-P (1%)as a contami- was modified from those of Beutler (1969) and
nant to guarantee I2GMactivity.
Harris and Hopkinson (1976). Aebersold et al.
(1987) recommended using 10 times this amount
of magnesium chloride and adding 300 U of 0.05 M potassium phosphate buffer,
triose-phosphate isomerase. We have found the pH 7.0 50 ml
use of a-glycerophosphate dehydrogenase to be inosine 0.01 g
unnecessary in amphibians, reptiles and mam- xanthinc oxidase 0.4 U
mals, A stock partial substrate buffer solution may 5 mg/ml MTT 1mi
be prepared as follows: 5 mg/ml PMS 1ml
MgCI2*6H20 l g lncubate the gel slice at 37OC in the dark. This
EDTA 0.20 g stain was modified from those of Harris and Hop-
0.2 A4 Tris-IIC1, pH 8.0 20 ml kinson (1976) and Ward et al. (1979).
Hz0 80 ml
Store partial substrate solution under refrigera- Pyrolinc-5-Crnrbaxylakc Dehydrcsgennse
tion. Add 3-phosphoglyceric acid, NADH, ATP, (%CDHf$
and GAPDH to the partial substrate solution in
the quantities described above, incubate and view (EC a,5.1.ra
under UV light (long wavelength). Subunit structure uncertain
0.2 M Tris-HC1, p H 8.0 50 ml
Pilaclisphagltycerate Mutase L-pyroglutamic acid 0.05 g
maw $$$$ 10 mg/ml NAD
5 mg/ml NBT
1 ml
1ml
(EC 5.4.2.1)
5 mg/ml PMS 1 ml
Dimer This stain was modified from Mulley and Latter
0.2 M Tris-FIC1, pH 8.0 10 ml (1980).
MgC12*6H20 0.02 g
~(+)2-phosphoglycericacid 0.03 g
adenosine 5'-triphosphate 0.02 g Pyruvate Kinase 6PK) $$$$
disodium EDTA 0.01 g (EC 2.7.1.40)
NADH 0.01 g Tetralner The following stain may also yield
3-phosphoglyceric phosphokinase 400 U adenylate kinase (AK) gene products. A control
glyceraldehyde-3-phosphate slice from the same gel must be stained specifi-
dehydrogenase 100 U cally for AK. This stain may be prepared as an
agar overlay.
Monitor the development of expression under
UV light (long wavelength). Enzyme activity is 0.2 M Tris-HC1, pH 8.0 50 ml
indicated by zones of defluorescence. This stain 0.1 M MgC12 6 ml
was modified from those of Harris and Hopkin- adenosine 5'-diphosphate 0.03 g
son (1976) and Siciliano and Shaw (1976). An al- D(+)-glucose 0.09 g
ternative stain is suggested by Richardson et al. hexokinase 200 U
(1986). phospho(eno1)pyruvate 0.02 g
G6PDH 60 NAD U
Purine-Mzaeleoside Phosptaorcyl ase 10 mg/ml NAD 1 ml
QPNP)$ 5 mg/ml NBT 1 ml
(EC 2,4,2.2) 5 mg/ml PMS 1 ml
Trirner This enzyme was known formerly as nu- This stain was described by Buth and Murphy
cleoside phosphorylase (NP). Tlus stain should be (1980) as modified from Brewer (1970).This stain
prepared as an agar overlay may also resolve adenylate kinase gene products.
A fluorescent stain that develops very rapidly was

described by Harris and IHopkinson (1976).
Dimer and tetramer This enzyme was known for-
merly as indophenol oxidase (IPO) or tetrazoliurn
oxidase (TO).Mitochondria1and supernatant/cy-
tosolic forms are known (Harris and Hopkinson,
Subunit structure uncertain 1976). The mitochondria1 form is a tetrameric
retinol 0.05 g manganoprotein, whereas the cytosolic form is a
acetone 3.5 ml dimeric cuprozinc protein (Healy and Mulcahy,
1979).
Then add to:
0.2 M Tris-HC1, pH 9.0 50 ml
0.1 M phosphate buffer, pH 7.0 50 ml 0.1 M MgC12 1 ml
10 mg/ml NAD 1ml
10 mg/ml NAD 10. ml
5 mg/ml NBT 1ml
5 mg/ml NBT 0.5 ml
5 mg/ml PMS 1ml
5 mg/ml MTT 0.5 ml
This stain recipe was supplied by R.L. Garthwaite 5 mg/ml PMS 1 ml
(personal communication).
Incubate the gel slice exposed to light at ambient
temperature or at 37'C. The enzyme appears as
light bands on a dark background and is fre-
quently resolved on ather enzyme systems (e.g.,
G3PDH, PNP, ACOH). This stain is modified from
Subunit structure uncertain A.G. Johnson et al. (1970) and Siciliano and Shaw
0.2 M Tris-HC1, pH 8.0 50 in1 (1976).In some cases, resolution may be improved
shikimic acid 0.05 g if this stain is applied as an agar overlay.
NADP 0.01 g
5 mg/ml NBT 1 ml
5 mg/ml PMS 1 ml
This stain was modified from D.E. Soltis et al. Subunit structure uncertain This enzyme was pre-
(1983). viously called rhodanase (RDS).
0.2 M Tris-HC1, pH 8.0 50 ml
Succinsate Delmydrage~aase(SUDHf $$ potassium cyanide 0.012 g
(EC 1,3.963.1) sodium thiosulfate 0.5 g
Monomer 5 mg/ml MTT 1 ml
5 mg/ml PMS 1ml
0.1 M phosphate buffer, pH 7.0 50 ml
adenosine 5'-triphosphate 0.04 g This recipe was given to us by R.D. Sage (personal
succinic acid 0.2 g communicatian).
disodium EDTA 0.2 g
10 mg/ml NAD 3 ml
5 mg/ml NBT 1 ml
5 m g / d PMS I ml
Dimer This stain may be prepared as an agar
This stain was modified from Brewer (1970). overlay (0.12 g agar in 10 ml buffer, stain compo-
nents in remaining 2 ml buffer).

dihydroxyacetone phosphate solution
  (see below)                            1 ml
glyceraldehyde-3-phosphate
  dehydrogenase                          800 U
arsenic acid                             0.1 g
10 mg/ml NAD                             3 ml
5 mg/ml NBT                              0.5 ml
5 mg/ml PMS                              0.5 ml

This stain was modified from Brewer (1970). The required amount of glyceraldehyde-3-phosphate can often be reduced by 50%. Preparation of the dihydroxyacetone phosphate (DHAP) solution follows Morizot and Schmidt (1990). Rinse 5 g Dowex-50 resin (supplied with DHAP) four times in distilled water. Add 0.25 g DHAP and 5 g washed Dowex-50 resin to 10 ml water; swirl for 30 sec. Filter into graduated cylinder and add water to 25 ml. Incubate at 37°C for 4 hr. Adjust pH to 4.5 with KHCO3. Freeze in 1 ml aliquots. An alternative procedure that does not involve the very expensive DHAP is described by Ayala et al. (1972). The latter procedure requires two hours of substrate preincubation and two pH adjustments for each stain preparation.

Tyrosine Aminotransferase (TAT) $$$
(EC 2.6.1.5)
Subunit structure uncertain The stain should be prepared as an agar overlay (0.14 g agar in 10 ml H2O, mix the stain components in the 15 ml buffer).

0.2 M Tris-HCl, pH 8.0                   15 ml
H2O                                      10 ml
L-tyrosine HCl (1% solution in
  0.2 N HCl)                             1 ml
α-ketoglutaric acid                      0.01 g
adenosine 5'-diphosphate                 0.01 g
pyridoxal 5-phosphate                    0.01 g
L-glutamic dehydrogenase                 100 U
10 mg/ml NAD                             1 ml
5 mg/ml NBT                              1 ml
5 mg/ml PMS                              1 ml

This stain was modified from Aebersold et al. (1987) and D.E. Campton (personal communication).

Monomer or octamer This enzyme, formerly known as uridine diphosphoglucose pyrophosphorylase (UDP), is commonly resolved as two isozymes in plants. In humans, this enzyme appears to be octameric, whereas in plants it is monomeric. UTP is an abbreviation for uridine triphosphate and should not be used as an enzyme system designation. Prepare as an agar overlay (0.07 g agar in 6 ml buffer and mix the stain components in remaining 3 ml buffer and water).

0.2 M Tris-HCl, pH 8.0                   1.5 ml
H2O                                      7.5 ml
uridine 5'-diphosphoglucose              0.04 g
pyrophosphate                            3 mg
α-D-glucose-1,6-diphosphate              0.5 mg
phosphoglucomutase                       75 U
G6PDH                                    10 NAD U
10 mg/ml NAD                             1 ml
5 mg/ml NBT                              1 ml
5 mg/ml PMS                              1 ml

Stains rapidly with good resolution. Excessive pyrophosphate will suppress enzyme activity. This stain was modified from Harris and Hopkinson (1977) and Jech and Wheeler (1984).

Monomer This enzyme was known formerly as uridine monophosphate kinase (UMPK). This stain should be prepared as an agar overlay (0.14 g agar in 10 ml buffer and mix the stain components in remaining 10 ml buffer).

0.2 M Tris-HCl, pH 8.0
MgCl2·6H2O
uridine 5'-monophosphate
adenosine 5'-triphosphate
phospho(enol)pyruvate
NADH
potassium sulfate
pyruvate kinase
L-lactic dehydrogenase
Monitor the development of expression under UV light (long wavelength). Enzyme activity is indicated by zones of defluorescence. This stain was modified from Harris and Hopkinson (1976).

Monomer or dimer XO, LDH, ALDH, and AO may be resolved upon staining for XDH (see Maldonado, 1992), which has led to speculations about the existence of XDH in vertebrates (Richardson et al., 1986). Hypoxanthine is quite insoluble; Brewer (1970) recommended heating the substrate in the buffer and then adding the other components when the solution has cooled. Richardson et al. (1986) recommended suspending the hypoxanthine in acetone (0.04 g hypoxanthine per ml of acetone). Another method to increase the amount of substrate in solution requires the preparation of the following:

0.2 M Tris-HCl, pH 8.0                   5 ml
0.1 M KOH                                5 ml
hypoxanthine (see note below)            0.2 g

Stir this solution for at least 10 min, then add:

0.2 M Tris-HCl, pH 8.0                   40 ml
NADI1:                                   0.02 g
10 mg/ml NAD                             1 ml
5 mg/ml NBT                              1 ml
5 mg/ml PMS                              1 ml

Adjust the pH to 8.0, if necessary. Cover the gel slice with the stain solution; make sure that any undissolved hypoxanthine covers the gel slice. Incubate the gel slice at 37°C in the dark. Note that for some species, as much as 1 g of hypoxanthine may be required for optimal resolution. For some organisms, xanthine may be a more suitable substrate. This stain was modified from those of Brewer (1970), Shaw and Prasad (1970), and Maldonado (1992).

Many stain fixing solutions have been used. These include the following:

1. 1:5:5 glacial acetic acid:methanol:water

Caution must be exercised when handling fixed gel slices; do not breathe vapors. Do not soak (completely immerse) gels stained with MTT in this fixative or significant fading will result. This fixative is from Selander et al. (1971).

2. 50% ethanol (in water)

This should not be used for the fixation of AAT and PER stains; see D.E. Soltis et al. (1973).

3. 50% glycerol (in water)

This fixative may be preferred for gels stained using MTT as a dye to reduce fading. Soak gels for several hours before wrapping if the gels are to be saved. This method of fixation may result in the resolution of faint LDH isozymes (see also D.E. Soltis et al., 1983; Werth, 1985).

APPENDIX 2: BUFFERS AND TRACKING DYE FOR ISOZYME ELECTROPHORESIS
(Compiled by Donald G. Buth and Robert W. Murphy)

The importance of buffers has reemerged as a critical issue for the resolution of enzymes in starch gels. However, the search for optimal buffers often leads to a trade-off of cost versus quality. The more buffers needed for optimal resolution of a large array of enzymes, the higher the cost in both effort and funds. To illustrate both extremes, Harris and Hopkinson (1976) recommended a different buffer, or slight modification thereof, for almost every enzyme system listed, whereas Siciliano and Shaw (1976) "worked to develop just a few buffer systems which would give clear resolution of many proteins" and reported clear resolution of enzymes using only two buffers: Tris-citrate, pH 7.0, and Tris-borate-EDTA II, pH 8.0. Many laboratories have geared their operations toward the latter extreme, which has resulted in less than "clear resolution" for many enzymes of many taxa. We encourage the initial screening of enzyme systems using several of the
buffer systems listed herein followed by comparisons using similar buffers, if necessary, for optimal resolution of enzymes. Additional buffers are listed by Brewer (1970), Selander et al. (1971), Clayton and Tretiak (1972), Harris and Hopkinson (1976), Steiner and Joslyn (1979), Shaklee and Tamaru (1981), Conkle et al. (1982), D.E. Soltis et al. (1983), Cheliak and Pitel (1984), Werth (1985), Micales et al. (1986), Selander et al. (1986), Shaklee and Keenan (1986), Aebersold et al. (1987), and Morizot and Schmidt (1990). Several of those listed for use in cellulose acetate electrophoresis by Richardson et al. (1986:153-154) may be applicable to starch gel work. The reader must remain aware that buffer formulas are usually derived empirically and additional modification should be encouraged. Sambrook et al. (1989) presented a useful appendix for the preparation of phosphate buffers.

Our buffer accounts include a descriptive name of the system, molarities of components in solution, exact gram measures of components in one liter equivalents, formulas for stock solutions as well as dilutions for electrode chambers and the gels, and references. We have resisted listing the electrical potential for each of these buffer systems, although many other compilations of buffer formulas provide such information. Among these, only Brewer (1970) identified correctly the fact that such potentials are related to the length of the gel mold and should be expressed as volts per linear centimeter of gel. We find most published voltages to be at or beyond the high end of applicability and improved resolution (as well as lab planning) can be gained with electrophoretic runs for longer duration at lower voltages.

Electrophoresis Tracking Dye

amaranth
brilliant blue G
ethanol
H2O

This stain was modified from a recipe formerly prepared by Gelman Sciences, Inc., Ann Arbor, MI, U.S.A. It gives both blue and red markers. It is not necessary to use non-denatured ethanol although this may be preferable to keep methyl and isopropyl alcohol out of the gel.

Stock solution:
(0.04 M) citric acid monohydrate         8.4 g/liter
Adjust to desired pH by adding ≈10-15 ml/liter N-(3-aminopropyl)-morpholine
Electrode: Undiluted stock solution
Gel: 1:19 dilution of stock solution

These gels are hazardous and should be handled only with protective gloves. This buffer was described by Clayton and Tretiak (1972). Werth (1985), Shaklee and Keenan (1986), and Aebersold et al. (1987) recommended its use at pH 6.1, 6.0, and 7.0, respectively. Its range of use may be pH 6.0-8.0 (D.E. Campton, personal communication). Aebersold et al. (1987) suggested the inclusion of 0.01 M EDTA in the stock solution.

Amine-Citrate (Propanol)

Stock solution:
(0.04 M) citric acid monohydrate         8.4 g/liter
Adjust to the desired pH by adding ≈10-15 ml/liter bis(dimethylamino)-2-propanol
Electrode: Undiluted stock solution
Gel: 1:19 dilution of stock solution

This buffer was described by Clayton and Tretiak (1972). It may be optimal at pH 57.5.

Stock solution:
(0.25 M) boric acid                      15.5 g/liter
Adjust to pH 8.6 with NaOH (pellets)
Electrode: Undiluted stock solution
Gel: 1:9 dilution of stock solution

This buffer was modified from Sackler (1966) who used it at pH 8.8.
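The gram measures given with each buffer follow from molarity multiplied by molecular weight. The sketch below is illustrative only: the molecular weights are standard reference values supplied here as assumptions (the appendix does not list them), and the function name is hypothetical.

```python
# Molecular weights in g/mol. These are standard reference values added for
# illustration; they are assumptions, not figures taken from the appendix.
MOLECULAR_WEIGHT = {
    "boric acid": 61.83,
    "citric acid monohydrate": 210.14,
    "Tris": 121.14,
}


def grams_per_liter(molarity, compound):
    """Return grams of `compound` per liter for a solution of `molarity` (mol/liter)."""
    return molarity * MOLECULAR_WEIGHT[compound]


if __name__ == "__main__":
    # Compare with the stock solutions listed above:
    # (0.25 M) boric acid and (0.04 M) citric acid monohydrate.
    for molarity, compound in [(0.25, "boric acid"),
                               (0.04, "citric acid monohydrate")]:
        grams = grams_per_liter(molarity, compound)
        print(f"({molarity} M) {compound}: {grams:.1f} g/liter")
```

Running the same calculation against any other entry in the appendix is a quick way to catch a transcription error in either the molarity or the gram measure.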
Borate (discontinuous)

Electrode:
(0.30 M) boric acid                      18.6 g/liter
(0.03 M) sodium chloride                 1.75 g/liter
Adjust to pH 8.0 with 2 N NaOH
Gel:
(0.02 M) boric acid                      1.24 g/liter
Adjust to pH 8.6 with 2 N NaOH

This buffer was described by Brewer (1970).

Electrode:
(0.41 M) citric acid dihydrate
  (trisodium salt)                       120.6 g/liter
Adjust to pH 7.0 or 8.0 with HCl.
Gel:
(5 mM) L-histidine HCl monohydrate       1.05 g/liter
Adjust to pH 7.0 or 8.0 with NaOH.

This buffer was described by Fildes and Harris (1966), Brewer (1970), and Harris and Hopkinson (1976). The electrical current may drastically increase after running for a few hours. These gels should be monitored while running.

Stock solution A:
(0.19 M) boric acid                      11.8 g/liter
(0.03 M) lithium hydroxide
  (LiOH·H2O)                             1.26 g/liter
Adjust to pH 8.1
Stock solution B:
(0.05 M) Tris                            6.06 g/liter
(0.008 M) citric acid monohydrate        1.68 g/liter
Adjust to pH 8.4
Electrode: Undiluted stock solution A
Gel: 1:9 mixture of stock solutions A:B, final pH 8.3

This discontinuous buffer is the lithium hydroxide buffer described by Selander et al. (1971), which is a slight modification of that described by Ridgway et al. (1970). This buffer will often cause gels to separate as the citric acid buffer front passes through the origin.

Phosphate-Citrate

Stock solution:
(0.214 M) potassium phosphate
  dibasic (K2HPO4)                       37.3 g/liter
(0.027 M) citric acid monohydrate        5.67 g/liter
Adjust to pH 7.0
Electrode: Undiluted stock solution
Gel: 1:25 dilution of stock solution

This buffer was modified from Selander et al. (1971), who called for 0.214 M potassium phosphate dibasic but provided the measurement (g/liter) for this molarity of potassium phosphate monobasic (KH2PO4). If the latter is used, use 29.1 g/liter. Harris and Hopkinson (1976) described a similar buffer using 0.245 M sodium phosphate monobasic and 0.15 M citric acid monohydrate adjusted within the range of pH 5.9-7.5; dilute the stock 1:39 for the gel.

Stock solution:
(0.90 M) Tris                            109.0 g/liter
(0.50 M) boric acid                      30.9 g/liter
(0.02 M) disodium EDTA                   6.7 g/liter
Adjust to pH 8.6 with NaOH (pellets)
Electrode:
Anode: 35 ml stock solution + 215 ml H2O (1:6 dilution)
Cathode: 50 ml stock solution + 200 ml H2O (1:4 dilution)
Gel: 1:19 dilution of stock solution

This is a modification of the buffer described by Boyer et al. (1963) referred to as EBT by F.R. Wilson et al. (1973) and Shaklee and Keenan (1986), and as TBE by Aebersold et al. (1987). Shaklee and Keenan (1986) recommended the use of 7.4 g/liter tetrasodium EDTA in this buffer.
Stock solution:
(0.50 M) Tris                            60.6 g/liter
(0.65 M) boric acid                      40.2 g/liter
(0.02 M) disodium EDTA                   6.7 g/liter
Adjust to pH 8.0
Electrode: Undiluted stock solution
Gel: 1:9 dilution of stock solution

These gels tend to be thick and are particularly difficult to aspirate, pour, and slice. Stock solutions are not suitable for long-term storage and better results may be found using fresh solutions. This is the TVB (Tris-versene-borate) buffer of Selander et al. (1971) and Siciliano and Shaw (1976). Another version of this buffer is described by Brewer (1970): 0.21 M Tris, 0.15 M boric acid, and 6 mM disodium EDTA adjusted to pH 8.0 for the electrode and 21 mM Tris, 20 mM boric acid, and 0.68 mM disodium EDTA adjusted to pH 8.6 for the gel. Werth (1985) described another system, termed "salamander 0" and attributed to S.I. Guttman, which uses the same stock solution and concentrations for the electrode and gel buffers: 84 mM Tris, 7.9 mM boric acid, and 0.86 mM disodium EDTA adjusted to pH 5.1 with HCl.

Tris-Borate-EDTA-Lithium

Stock solution:
(0.9 M) Tris                             109 g/liter
(0.4 M) lithium hydroxide
  (LiOH·H2O)                             16.8 g/liter
(0.5 M) boric acid                       30.9 g/liter
(0.1 M) EDTA free acid                   29.2 g
H2O                                      to 1000 ml
Adjust to pH 9.1 with NaOH
Electrode:
stock solution                           225 ml
sucrose                                  100 g
H2O                                      to 2000 ml
Gel:
stock solution                           40 ml
sucrose
H2O

This buffer system was described by B.J. Turner (1973).

Tris-Citrate II

Stock solution:
(0.687 M) Tris                           83.2 g/liter
(0.157 M) citric acid monohydrate        33.0 g/liter
Adjust to pH 8.0
Electrode: Undiluted stock solution
Gel: 1:29 dilution of stock solution

This is the continuous Tris-citrate II buffer of Selander et al. (1971). D.E. Soltis et al. (1983) listed other modifications of the Tris-citrate buffers of Shaw and Prasad (1970) including (1) 0.135 M Tris, 0.032 M citric acid, pH 8.0, diluted 1:14 for the gel, (2) 0.135 M Tris, 0.017 M citric acid, pH 8.5, diluted 1:14 for the gel, (3) 0.223 M Tris, 0.086 M citric acid, pH 7.5, diluted 1:27.5 for the gel, and (4) 0.223 M Tris, 0.065 M citric acid, pH 7.2, diluted 1:27.5 for the gel. Other variations include 0.13 M Tris, 0.043 M citric acid, pH 7.0, diluted 1:14 for the gel (Siciliano and Shaw, 1976), 0.094 M Tris, 0.0235 M citric acid, pH 8.6, diluted 1:5 for the gel (Harris and Hopkinson, 1976), and 0.22 M Tris, 0.086 M citric acid, pH 5.8, diluted 1:27.5 for the gel (Shaklee and Keenan, 1986).

Stock solution:
(0.75 M) Tris                            90.8 g/liter
(0.25 M) citric acid monohydrate         52.5 g/liter
Adjust to pH 7.0 with NaOH (pellets).
Electrode:
Anode: 35 ml stock solution + 215 ml H2O (1:6 dilution)
Cathode: 50 ml stock solution + 200 ml H2O (1:4 dilution)
Gel: 1:19 dilution of stock solution

This buffer was described by Whitt (1970) and Rainboth and Whitt (1974).

Electrode:
(0.30 M) boric acid                      18.6 g/liter
(0.06 M) sodium hydroxide                2.4 g/liter
Adjust to pH 8.2
Gel:
(0.076 M) Tris                           9.21 g/liter
(0.005 M) citric acid monohydrate        1.05 g/liter
Adjust to pH 8.7

This is the discontinuous Tris-citrate or Poulik buffer described by Selander et al. (1971). This buffer will often cause gels to separate as the borate buffer front passes through the origin. Chippindale (1989) modified this system by using 26.2 g/liter Tris to bring the pH of the gel buffer to 9.5; this modification is excellent for resolving otherwise fuzzy systems.

Stock solution:
(0.135 M) Tris                           16.4 g/liter
(0.045 M) citric acid monohydrate        9.46 g/liter
(1.3 mM) disodium EDTA                   0.44 g/liter
Adjust to pH 7.0
Gel: 1:14 dilution of stock solution

This buffer is modified from Avise et al. (1975). A modification of this system using different molarities of the components (pH 5.7-8.6) and limiting the disodium EDTA to the gel buffer was described by Harris and Hopkinson (1976).

Stock solution:
(0.1 M) Tris                             12.1 g/liter
(0.0045 M) disodium EDTA                 1.5 g/liter
Adjust to pH 9.6
Gel: 1:9 dilution of stock buffer

This buffer was described by Harris and Hopkinson (1976).

Electrode:
(0.30 M) boric acid                      18.6 g/liter
(0.06 M) sodium hydroxide                2.4 g/liter
Adjust to pH 8.2
Gel:
(0.01 M) Tris                            1.21 g/liter
Adjust to pH 8.5 using concentrated HCl

This discontinuous buffer was described by Selander et al. (1971). Harris and Hopkinson (1974) recommended a continuous buffer modification using 0.1 M Tris-HCl with a 1:4 dilution for the gel or 0.3 M Tris-HCl with a 1:14 dilution for the gel. They recommended the use of this buffer in the range of pH 8.6-9.6.

Stock solution:
(0.10 M) Tris                            12.1 g/liter
(0.10 M) maleic acid                     11.6 g/liter
(0.01 M) disodium EDTA                   3.36 g/liter
(0.01 M) MgCl2·6H2O                      2.03 g/liter
Adjust to pH 7.4 with NaOH
Electrode: Undiluted stock solution
Gel: 1:9 dilution of stock solution

This buffer was described by Brewer (1970) and Selander et al. (1971) at pH 7.6 and 7.4, respectively. Brewer (1970) noted that some of the reagents (e.g., EDTA) will not go into solution until the NaOH is added. Harris and Hopkinson (1976) recommended the use of this buffer in the range pH 6.5-7.4.
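The introduction to this appendix notes that electrical potentials depend on the length of the gel mold and are best expressed as volts per linear centimeter of gel. A minimal sketch of that conversion follows; the function names and the example voltage and gel lengths are hypothetical illustrations, not values recommended in the text.

```python
def volts_per_cm(applied_volts, gel_length_cm):
    """Return the potential gradient (V/cm) along a gel of the given length."""
    if gel_length_cm <= 0:
        raise ValueError("gel length must be positive")
    return applied_volts / gel_length_cm


def volts_for_gradient(target_volts_per_cm, gel_length_cm):
    """Return the applied voltage that gives the target gradient on this gel."""
    if gel_length_cm <= 0:
        raise ValueError("gel length must be positive")
    return target_volts_per_cm * gel_length_cm


if __name__ == "__main__":
    # Hypothetical example: a published 200 V setting on a 20 cm gel mold,
    # transferred to a 25 cm mold at the same gradient.
    gradient = volts_per_cm(200, 20)
    print(gradient)                          # 10.0
    print(volts_for_gradient(gradient, 25))  # 250.0
```

Carrying the gradient rather than the raw voltage from one laboratory's gel mold to another keeps the running conditions comparable.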
Chapter 5
Chromosomes: Molecular Cytogenetics
Stanley K. Sessions
PRINCIPLES AND COMPARISON OF METHODS
General Principles
This chapter concerns the analysis of microscopically visible aspects of the molecular structure of chromosomes. The term "chromosome" was introduced in 1888 by Wilhelm Waldeyer, and the chromosomal theory of inheritance was put forward and elaborated by Theodore Boveri, Walter S. Sutton, and Thomas H. Morgan in the first part of this century. Ever since that time the study of chromosomes (known as either karyology or cytogenetics) has occupied a prominent place in genetics in both clinical and academic applications, as well as in comparative biology and phylogenetic studies. Comparative cytogenetics is thus an old field with diverse schools of interpretation concerning chromosome structure, function, and evolution.
The history of cytogenetic research can be divided into several eras, each coupled with major technological innovations that triggered revolutions in analytical approaches (Hsu, 1979). The modern era in cytogenetics, including the incorporation of molecular methods, was initiated by the development of four main technological breakthroughs: (1) the discovery that hypotonic treatment spreads metaphase chromosomes, allowing accurate assessments of chromosome numbers and morphology; (2) the development of chromosome banding techniques, which allows the identification of homologs (within karyotypes of the same
Figure 1 Radioisotopic in situ hybridization of ribosomal probe to chromosomes of salamanders of the genus Plethodon. (A) P. dunni showing labeling over the short arm of a medium-sized submetacentric chromosome. (B) P. vehiculum showing four distinctly labeled sites. (C) P. glutinosus showing four sites of hybridization, including one homologous pair (chromosome no. 2) and two single chromosomes. (D) P. larselli showing heavy labeling on the smallest pair of chromosomes (no. 14) and scattered labeling over all the chromosomes. (From Macgregor and Sherwood, 1979, with permission.)
species) and homoeologs (between karyotypes of different species); (3) the development of techniques for in situ hybridization (ISH) of nucleic acid probes to cytological preparations of chromosomes, by which specific DNA sequences can be localized to particular chromosomes and parts of chromosomes (Gall and Pardue, 1969; John et al., 1969; Hsu, 1979; Macgregor, 1993); and (4) the use of immunochemistry, in conjunction with ISH, to allow the non-radioisotopic detection of hybridized probes with various fluorochromes in a process known as chromosome painting (Plate 1), used not only for mapping sequences on chromosomes, but for identifying chromosomal homologies (synteny) between species (Lichter and Ward, 1990; Trask, 1991; Sasavage, 1992; Wienberg et al., 1992; Luke and Verma, 1993; Therman and Susman, 1993).
The field of molecular cytogenetics is centered on the technique of in situ hybridization of nucleic acids using radioactive or non-radioactive probes, and this methodology will be discussed in detail in this chapter. The basis of in situ nucleic acid hybridization is the annealing of labeled, mobile probe molecules and stationary target molecules to form base-paired duplexes. Comparative studies using in situ hybridization have traditionally involved locating repetitive sequences, such as satellite DNA, ribosomal gene clusters, or the extensively reduplicated genes of polytene chromosomes using highly radioactive molecular probes detected via autoradiography (Plate 1). In situ hybridization can also be used to locate specific RNA transcripts on lampbrush chromosomes (Figure 1; Diaz et al., 1981; Varley et al., 1980), and techniques have been developed for reliably locating single-copy DNA sequences on mitotic chromosomes (Figure 2; Harper and Saunders, 1984; Steinmuller et al., 1993). Almost all molecular cytogenetics done these days utilizes non-isotopic in situ hybridization (NISH) techniques. The most prominent of these is fluorescence in situ hybridization (FISH, Plate 1), in which non-radioactive, biotinylated hybridized probes are visualized via fluorescently labeled monoclonal and/or polyclonal antibodies which, when amplified with avidin-biotin and subjected to computer-assisted image processing, results in a degree of specificity, resolution, and versatility that exceeds even the best autoradiographic ISH (Langer et al., 1981; Manuelidis et al., 1982; Pinkel et al., 1986; Frommer et al., 1988; C.A. Porter et al., 1991, 1994; Therman and Susman, 1993). High-resolution detection of biotinylated probes can be achieved by using FISH in conjunction with confocal laser scanning microscopy (CLSM), or by using gold-conjugated antibodies and transmission electron microscopy (TEMISH; Fetni et al., 1992;
[Figure 2 plot. x-axis: Chromosome number. Legend: Expected; Observed.]
Figure 2 Localization of human insulin gene. Composite label from 35 cells hybridized with tritiated probe (0.2 pg/ml for 11 hr and exposed for 11 days). Label is concentrated at telomere of short arm of chromosome 11. Expected number of silver grains calculated as silver grains per unit length, assuming random distribution. (After Harper et al., 1981.)
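The "expected" values in Figure 2 rest on a simple null model: if silver grains fell at random, each chromosome would collect grains in proportion to its length. A minimal sketch of that calculation follows. The function names, chromosome labels, lengths, and grain counts are all hypothetical and chosen only to illustrate the arithmetic; they are not data from Harper et al. (1981).

```python
def expected_grains(lengths, total_grains):
    """Distribute total_grains over chromosomes in proportion to their length."""
    total_length = sum(lengths.values())
    return {
        name: total_grains * length / total_length
        for name, length in lengths.items()
    }


def enrichment(observed, expected):
    """Observed/expected ratio per chromosome; a ratio well above 1 flags a candidate site."""
    return {name: observed[name] / expected[name] for name in expected}


if __name__ == "__main__":
    # Hypothetical relative lengths and grain counts, for illustration only.
    lengths = {"A": 2.0, "B": 1.0, "C": 1.0}
    observed = {"A": 10, "B": 25, "C": 5}
    expected = expected_grains(lengths, sum(observed.values()))
    for name, ratio in enrichment(observed, expected).items():
        print(name, expected[name], round(ratio, 2))
```

In practice the same comparison is usually made per chromosome segment rather than per whole chromosome, and judged with a statistical test rather than a raw ratio; the sketch shows only the proportional-to-length expectation.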
[Figure 3 panel labels: Metacentric; Submetacentric; Subtelocentric; Telocentric]
Figure 3 Chromosome morphologies. Two different sizes of each morphology are shown. Telocentric and subtelocentric chromosomes are sometimes referred to as acrocentrics.
Steinmuller et al., 1993). Another novel and potentially powerful approach to the study of the comparative molecular structure of chromosomes uses monoclonal antibodies directed against nuclear proteins on polytene chromosomes of insects and lampbrush chromosomes of salamander oocytes (Ragghianti et al., 1988).
These advances, along with the availability of a wide range of probes and the increasing accessibility of techniques for constructing new probes, as well as the commercial availability of probe labeling kits and antigen-specific monoclonal and polyclonal antibodies, make molecular cytogenetics an increasingly powerful approach for the study of chromosomal evolution. ISH, and particularly chromosome painting using FISH and related techniques, has become the simplest and most direct way to physically map genes or any DNA sequences to chromosomes (A.S. Henderson, 1982; McKusick, 1988; Therman and Susman, 1993).
The usefulness of ISH for physical mapping of specific genes and other kinds of sequences to particular chromosomes and regions of chromosomes depends not only on the quality of cytological preparations but also on the unambiguous identification of homologous and homeologous chromosomes. For this reason, I have included some reliable methods for preparing various kinds of chromosomes, including polytene and lampbrush chromosomes as well as mitotic metaphase spreads, and have also included methods for obtaining different kinds of banding patterns for the identification of homologs. The banding patterns themselves reveal information about the molecular structure of chromosomes, especially when sequence-specific dyes are used.
In many cases, chromosomes can be identified on the basis of morphological characteristics such as relative size, centromere position, and secondary constrictions (Figures 3 and 4; Table 1). Some chromosomes, such as insect polytene and oocyte lampbrush chromosomes, have intrinsic, complex patterns of bands or other markers that facilitate the identification of homologs (M.J.D. White, 1973; Macgregor, 1993). Chromosome identification in most species, however, depends
Table 1 Terminology for chromosome morphology (centromere position)
Centromere index^a    Terminology
0.00-0.12             Telocentric = t
0.13-0.25             Subtelocentric = st
0.26-0.37             Submetacentric = sm
0.38-0.50             Metacentric = m
From Levan et al. (1964) and Green et al. (1980).
^a Centromere index = length of short arm/length of whole chromosome.
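The classes in Table 1 translate directly into a small lookup. The sketch below is illustrative only: the function names are hypothetical, and rounding the index to two decimals before classifying is an assumption made here so that values falling between the tabulated ranges (for example 0.124) are assigned to a class. It also computes relative lengths as a percentage of the summed chromosome lengths, which is one common convention and not necessarily the one used in this chapter.

```python
def centromere_index(short_arm: float, long_arm: float) -> float:
    """Length of the short arm divided by the length of the whole chromosome."""
    return short_arm / (short_arm + long_arm)


def classify(index: float) -> str:
    """Map a centromere index to the Table 1 abbreviation (t, st, sm, or m)."""
    index = round(index, 2)  # assumption: classify on the two-decimal value
    if not 0.0 <= index <= 0.5:
        raise ValueError("centromere index must lie between 0 and 0.5")
    if index <= 0.12:
        return "t"   # telocentric
    if index <= 0.25:
        return "st"  # subtelocentric
    if index <= 0.37:
        return "sm"  # submetacentric
    return "m"       # metacentric


def relative_lengths(lengths):
    """Express each chromosome length as a percentage of the summed lengths."""
    total = sum(lengths)
    return [100.0 * length / total for length in lengths]


if __name__ == "__main__":
    # Hypothetical arm measurements (short arm, long arm) in arbitrary units.
    for short_arm, long_arm in [(1.0, 9.0), (1.0, 3.0), (3.0, 7.0), (1.0, 1.0)]:
        ci = centromere_index(short_arm, long_arm)
        print(short_arm, long_arm, round(ci, 2), classify(ci))
```

Because the index is defined with the short arm in the numerator, the caller is expected to pass the shorter arm first; an index above 0.5 raises an error rather than being silently classified.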
Figure 4 Diagrammatic representation of C-band positions relative to chromosome morphology. c, centromeric; pc, pericentric; i, interstitial; t, telomeric; st, subtelomeric.
on an analysis of induced banding patterns. Usually it is necessary to band the same chromosome preparations that are used for in situ hybridization, although this can be simultaneous in the case of chromosome painting.
Once homologs are identified, the chromosomes can be arranged as a karyotype (Figure 5) by cutting out photographic prints of chromosomes and pasting the homologs in pairs on white cardboard. The chromosomes also can be measured (e.g., with a ruler, map measuring wheel, or digitizer pad) to obtain relative lengths and centromere indices (Table 1). These quantitative data
can be used to classify each chromosome's morphology (Figure 4; Table 1), and to construct idiograms (Figure 6).
Figure 5 Karyotype of a salamander, Ambystoma laterale female; secondary constrictions (NORs) are seen at a subtelocentric position on the short arm of chromosome 6. (From Sessions, 1982.)
Figure 6 Composite idiogram of two species of Ambystoma. Cold-induced secondary constrictions are indicated by white lines, and C-bands are indicated by dark areas.
Assumptions
The primary assumption in the use of in situ hybridization is that chromosomal DNA can be denatured in such a way that it will anneal with reasonably high efficiency to complementary single-stranded nucleic acid probes to form hybrid duplexes. This is not a trivial matter, since chromosomal DNA is complexed with various chromosomal proteins and RNA. In fact, the efficiency of in situ hybridization is constrained by the difficulty of obtaining complete denaturation of chromosomal DNA, the loss of DNA during fixation and slide pretreatment, and the presence of chromosomal proteins (A.S. Henderson, 1982).
It is commonly assumed that banding patterns reflect differences in sequence organization (e.g., GC-rich or AT-rich regions) or repetitiveness. For example, G-bands are thought to represent AT-rich regions, and C-bands correspond to constitutive heterochromatin, generally assumed to be rich in highly repetitive sequences (Comings, 1978; Hsu, 1979; Schmid and Guttenbach, 1988).
Chromosome painting involves immunochemistry, which is based on specific binding interactions between antigens and antibodies. Commercially available kits make it possible to perform the immunochemistry necessary for FISH with little or no knowledge of immunology. Nevertheless, it is important to have a basic knowledge of the building blocks of immunology to more fully understand these methods and to be able to modify them when things go wrong. Two basic assumptions about antigens are immunogenicity, which is the ability to induce antibody formation, and specific reactivity with the antibody it caused to be produced. The reaction between antigen and antibody results in the formation of immune complexes comprising several antigen and antibody molecules, which may become very large and form measurable precipitates. FISH utilizes probes that have been labeled with biotin or BrdU, which can serve as the antigens.
Antibodies, often called immunoglobulins (Ig), are composed of four protein chains: two heavy chains and two light chains. Antibody solutions used for immunochemical staining contain mostly secreted IgG-type antibodies. These form a Y-shaped structure consisting of a crystallizing fragment (Fc), comprised of two heavy chains only (the stem of the Y), and an antigen-binding fragment (Fab), comprised of heavy and light chains (the bifurcating part of the Y). An IgG molecule thus has the ability to bind two antigen molecules, one at each Fab site. The two main types of antibody preparations used for immunochemistry are polyclonal antibodies (prepared from whole serum, containing a variety of specific and non-specific antibodies), and monoclonal antibodies (produced by a clone grown from a single B-cell/myeloma somatic cell hybrid, or hybridoma).
The success of FISH, and of chromosome painting in general, is based largely on the biochemical properties of avidin and biotin (McInnes et al., 1987). Avidin is a glycoprotein found in high
concentrations in egg white (streptavidin, a similar protein made by the bacterium Streptomyces avidinii, is sometimes used instead) that non-immunologically binds four molecules of the vitamin biotin. This property of avidin-biotin binding allows the amplification of hybridization signal when used in conjunction with biotin-labeled (= biotinylated) probes, and anti-avidin and/or anti-biotin antibodies which are themselves conjugated with biotin or fluorochromes.
Comparison of the Primary Methods
Preparation of Mitotic Metaphase Chromosomes
The preparation of useful spreads of mitotic metaphase chromosomes involves five steps: (1) selection of tissues with a high mitotic activity (or stimulation of such activity), (2) in vivo or in vitro treatment with a mitotic arresting agent (with or without cell cycle synchronization), (3) hypotonic treatment of tissues or cells, (4) fixing (and storing) tissues or cells, and (5) making permanent chromosome preparations on slides. Specific protocols for the production of mitotic chromosome spreads are given later in this chapter.
The simplest method for obtaining mitotic chromosomes is to select a tissue that has an intrinsically high rate of mitotic cell division in vivo. Root tips in plants, and developing embryos, larvae, or regenerating blastemas in animals are good in this regard. In adult vertebrates, high mitotic rates may be found in bone marrow, intestinal epithelium, corneal epithelium, kidneys, spleen, and gonads, depending on the species. Mitotic proliferation can sometimes be stimulated in vivo by subcutaneous or intraperitoneal injection of a mitogen such as phytohemagglutinin (PHA) or pokeweed mitogen (PWM), or even activated yeast suspension. Use of these tissues avoids the need for tissue culture, which can be expensive and unpredictably time consuming when many species are studied. The disadvantage of the in vivo technique is that specimens generally must be killed to harvest the tissue.
In vitro methods involve culturing peripheral blood or cells from explants of various other kinds of tissues (see Freshney, 1994, for general tissue culture methods). Many different media and culture conditions have been used for various species, and often the tissue culture requirements must be worked out for particular species of interest. One great advantage of in vitro culture is that it may be possible to synchronize cell cycles to increase the yield of chromosomes, decrease the variation in chromosome condensation between spreads, and decrease the required dose of the mitotic arrest agent (Watt and Stephen, 1986).
In general, both in vivo and in vitro chromosome cultures require a mitotic spindle inhibitor to block cells in mitotic metaphase. The most commonly used spindle inhibitors are colchicine, its synthetic analog colcemid (deacetylmethylcolchicine), and vinblastine (Tjio and Levan, 1956; Macgregor and Varley, 1983; Watt and Stephen, 1986). For in vivo culture, relatively high concentrations of colchicine are injected directly into the body cavity or under the skin (colcemid is substantially more potent than colchicine and is used at lower concentrations). Some organisms, such as aquatic amphibian larvae, or tissues, such as plant root tips, can simply be immersed in a solution of colchicine. The optimal treatment time depends on the cell cycle of the species used and on the desired level of chromosome contraction. Cell cycle time is proportional to genome size (C-value, the haploid amount of nuclear DNA), at least in poikilotherms. The treatment time is short (1-2 hr) for organisms with small C-values (most vertebrates and invertebrates), but is much longer (24-72 hr) for species with very large C-values (e.g., lungfish, salamanders, and certain species of flowering plants). Much smaller amounts of colchicine (or colcemid) are needed for cells in tissue culture.
Squash and Splash Techniques
There are two main techniques for the production of permanent chromosome preparations suitable for molecular hybridization studies: the "squash" and "splash" techniques. Both are designed to achieve optimal flattening and spreading of cytological material on microscope slides. In the squash technique, small pieces of tissue (thick cell suspensions can also be used) are macerated and/or finely minced on a slide and then firmly squashed beneath a siliconized coverslip. In the splash technique, a thick suspension of cells is
placed onto a slide from a pipette, usually from a distance, and spreading of cells and chromosomes occurs via surface tension. For both techniques it is critical that the tissues be exposed to hypotonic solution (either distilled water or dilute KCl) before fixation, and then fixed in freshly prepared, ice-cold 3:1 fixative made with 3 parts absolute ethanol (for squashing) or methanol (for splashing) and 1 part glacial acetic acid.
A useful feature of the 3:1 fixative is that tissues (or cell pellets) can be stored in it for years so long as they are kept in tightly sealed vials at -20°C. If such long-term storage is necessary (or desirable) it is important to make sure that the tissues have been well fixed in at least two changes of 3:1 for at least 15 min (to remove all water) and then stored in fresh 3:1. For preparations that will eventually be used for in situ hybridization, it is advisable to keep the 3:1 fixative ice-cold to minimize hydrolysis. Storage at -20°C is necessary because this fixative decomposes rapidly at room temperature. Cytological tissues fixed and stored in this manner have been used successfully not only for in situ hybridization but also for the extraction of high-molecular-weight DNA sequences (P.E. Barker et al., 1986).
To make slide preparations using the squash technique, small pieces of tissue are removed from the 3:1 fixative and briefly soaked in 45% acetic acid (this treatment hydrolyzes cytoplasmic components and can eventually reduce the tissue to a nuclear suspension). Plant cells usually have to be hydrolyzed in warm HCl before the 45% acetic acid step, to soften the cell wall. The softened tissue is then removed in a small drop of acetic acid to a very clean, subbed (gelatinized) microscope slide, minced as finely as possible, covered with a siliconized coverslip, and squashed firmly with the thumb on a cushion of absorbent paper. The preparation is now either examined directly, or made permanent by freezing on dry ice. The slides can then be stored indefinitely if kept desiccated at 4°C. The main advantages of the squash technique are that it is a very quick and efficient way to analyze a particular specimen without building up a large number of unnecessary slides. Also, excellent photomicrographs of selected chromosome spreads can be obtained during the process of screening using phase contrast optics. A disadvantage of the squash technique is that the chromosomes can be damaged or lost during the process of making the slides permanent. Another common problem with the squash technique is that the preparation can be ruined by the slightest sideways movement of the coverslip during squashing or by the inclusion of large bits of tissue, lint, or air bubbles beneath the coverslip. The squash technique works best for organisms with very large chromosomes, especially salamanders, plants, and insect polytene chromosomes, and is not recommended for species with very small chromosomes such as mammals, birds, and reptiles because of the difficulty in obtaining sufficiently flattened chromosomes.
The splash technique involves preparing a suspension of cells fixed in 3:1 methanol:acetic acid. Methanol is used because it evaporates faster than ethanol. Cells are collected from a culture medium, dispersed, and then incubated in a hypotonic solution (0.075 M KCl). The cells are then centrifuged out of the hypotonic solution and re-suspended in fixative (preferably ice-cold), washed several times in fresh fixative, and finally re-suspended in a small volume of fixative. The resulting concentrated cell suspension can either be stored at -20°C, or used immediately to make slides. The concentrated cell suspension is pipetted (splashed) onto slides using various techniques designed to maximize spreading of cells and chromosomes. One commonly used method is to splash several drops of the cell suspension onto ice-cold slides wet with distilled water from a height of 0.5 m or more and drying them on a slide warmer (40°C). Another technique is to flame-dry the slides by holding the slide over a bunsen burner and letting the methanol ignite and burn off. Yet another technique is to use a 1-ml pipetter and a plastic pipette tip to pipette a cell suspension up and down at several different spots on a slide that has been warmed to 60°C on a slide warmer. Each time the cell suspension is drawn back into the pipette, cells are left sticking to the slide in concentric rings. The suspension is drawn back into the pipette before moving to the next spot. An advantage of this latter technique is
Plate 1 Examples of the use of FISH and confocal laser scanning microscopy (CLSM). (A-E) Marsupial (PtK1) chromosomes observed with CLSM. (From Robert-Fortel, 1993.) (A) Propidium iodide staining of the X chromosome shows secondary constriction (arrow). (B) Fluorescence in situ hybridization (FISH) with rDNA probe with labeling superimposed on phase contrast. (C) Silver staining visualized by reflection imaging. (D) Higher magnification of the X chromosome with FISH. (E) Silver staining of the X chromosome. (F) 5S rDNA hybridized to chromosomes of the tetraploid frog, Odontophrynus americanus, observed with standard epifluorescent microscope. (Compliments of Maria Jose Martinez Exposito and Martina Guttenbach, Institute of Human Genetics, University of Würzburg.)
that several spots can be placed in controlled positions on a single slide, which facilitates screening the slides for good chromosome spreads (M. Schmid, personal communication). Splash slide preparations can be stored indefinitely if they are kept desiccated at 4°C.
A disadvantage of the splash technique is that it is sometimes difficult to see or photograph good examples of chromosome morphology until after the preparations have been stained and/or coverslipped (although chromosome spreads can be located using phase contrast optics or using a defocused condenser under bright field optics). Whichever technique is used to obtain chromosome preparations for in situ hybridization, it is important that the actual preparations are made near one end of the slide for ease of handling later, especially during autoradiography.
Cytological preparations for in situ hybridization should be made on slides that have been coated with a thin layer of gelatin (subbed) to minimize loss of material during processing. Subbed slides are particularly important for autoradiography to prevent the nuclear track emulsion from slipping during developing, fixing, washing, and staining.
Preparation of Polytene Chromosomes
Polytene chromosomes are somatic chromosomes that have undergone many rounds of endoreplication (DNA replication without division of the cell or nucleus) such that each chromosomal element consists of hundreds to thousands of unseparated chromatids. Polytene chromosomes are found in the cells of dipteran insect larvae, in collembolans (springtails), and in certain other invertebrates (M.J.D. White, 1973). The familiar bands of polytene chromosomes are formed by chromomeres (densely packed chromatid fibers) that are found along the length of each chromatid. Polytene chromosomes are particularly useful for gene mapping and comparative studies because of their large size and banding patterns.
Polytene chromosomes are very easy to prepare from dipteran larvae. Good examples are midges of the genus Chironomus and fruitflies of the genus Drosophila. In both of these organisms, larval growth occurs by increase in cell size rather than cell number, and involves endoreplication of their chromosomes. Polytene chromosomes are easiest to prepare from salivary glands, which are found near the anterior end of the larvae. The glands can be exposed quickly by removing the larva's head with watchmaker's forceps or needles and removing the adhering fat bodies, the glands can then be squashed on a subbed slide under a siliconized coverslip.
Preparation of Meiotic Chromosomes from Spermatocytes
Meiotic chromosome preparations can be relatively easily made from testes of most animals and from pollen mother cells (PMCs) in plants. The squash technique is used for amphibians and plants with large chromosomes, while the splash technique usually is used for amniotes, which have much smaller chromosomes. For amphibians, birds, fish, and reptiles, testes are simply removed, cut or sliced, and fixed directly in 3:1 fixative. In salamanders, analysis of meiosis is facilitated by the fact that meiosis occurs in a caudocephalic wave in the testes (Kezer et al., 1989). A brief hypotonic treatment (e.g., 10 min in distilled water) appears to strip away enough of the chromatin matrix to visualize all four individual chromatids in diplotene bivalents (Kezer et al., 1989). In mammals, the gametes are produced in seminiferous tubules that contain all stages of meiosis. These tubules can be teased out of dissected testes into a dish of hypotonic solution, fixed in 3:1 methanol:acetic acid, and hydrolyzed to a cell suspension which is used to make splash preparations.
The chromosomes of certain species of insects (e.g., grasshoppers), amphibians (e.g., salamanders), fish (e.g., lungfish), and plants (e.g., lilies and orchids) are large enough to provide easily visible meiotic configurations, such as bivalents at pachytene and diplotene of prophase I and metaphase I. In organisms with much smaller chromosomes the analysis of meiotic configurations is facilitated by preparing silver-stained synaptonemal complexes (SCs) of pachytene bivalents, which are examined with electron microscopy (Jones and de Azkue, 1993; Macgregor, 1993; Peterson et al., 1994). This technique involves lysing cells in a mild detergent to cause surface spreading of pachytene bivalents at the air-water interface. These are then dried down onto a plastic film on a glass slide, fixed in paraformaldehyde, and stained with a concentrated AgNO3 solution. Selected regions of the plastic film are floated onto a water bath and picked up onto copper grids for electron microscopy. The advantage of this technique is that it allows very high resolution of synaptic configurations, including the XY bivalent, and has been particularly useful in analyzing meiosis in translocation and inversion heterozygotes (Johannisson and Winking, 1994).
Preparation of Lampbrush Chromosomes
Lampbrush chromosomes represent bivalents at diplotene stage in female meiotic cells; they are found in the oocytes of most animals (see Callan, 1986, for a recent review of lampbrush chromosome structure and function). Lampbrush chromosomes consist of two duplicated homologous chromosomes held together at regions of crossing over, or chiasmata (Figure 7). Each homologous chromosome of the bivalent consists of an axis formed by the two closely associated sister chromatids that connects a series of ellipsoid chromomeres. Lateral loops, each consisting of a single thread of DNA double helix, extend from many of the chromomeres and are sites of active RNA synthesis (Callan, 1986).
The largest and most easily studied lampbrush chromosomes occur in the oocytes of salamanders. A generalized method for obtaining lampbrush chromosomes from amphibian oocytes is included in the protocol section of this chapter (Protocol 12; see also Callan et al., 1987; Macgregor, 1993). The procedure involves manually isolating and opening the nucleus ("germinal vesicle") of immature oocytes with needles and/or very sharp watchmaker's forceps in an unbuffered salt solution ("isolation medium," IM) (Figure 8). The nuclear contents (including the lampbrush chromosomes) are then transferred to and dispersed in "dispersal medium" (DM) in an observation chamber on a specially designed slide. The optimal salt concentration of the IM and DM varies among species and must be determined empirically. IM consisting of a 5:1 mixture of 0.1 M KCl and 0.1 M NaCl and DM consisting of IM + 0.5% paraformaldehyde works fine for most amphibians (Macgregor, 1993).
The lampbrush chromosomes gradually sink to the bottom of the observation chamber and can then be observed and photographed with phase contrast optics. The traditional observation chamber consists of a regular glass microscope slide with a hole bored in it and a coverslip sealed to
Figure 7 Two lampbrush bivalents of Ambystoma macrodactylum. (From Kezer et al., 1980.)
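The isolation medium described for amphibians (a 5:1 mixture of 0.1 M KCl and 0.1 M NaCl) can be restated as final concentrations of each salt. The sketch below assumes that "5:1" means five volumes of the KCl solution to one volume of the NaCl solution, in the order the salts are named, and that volumes are additive; both are assumptions, and the function name `mixture_molarity` is hypothetical.

```python
def mixture_molarity(components):
    """Final molarity of each solute after mixing stock solutions by volume parts.

    components: list of (solute, stock_molarity, volume_parts) tuples.
    Assumes the volumes simply add on mixing.
    """
    total_parts = sum(parts for _, _, parts in components)
    final = {}
    for solute, molarity, parts in components:
        final[solute] = final.get(solute, 0.0) + molarity * parts / total_parts
    return final


if __name__ == "__main__":
    # Assumed reading of the 5:1 mixture: 5 parts 0.1 M KCl, 1 part 0.1 M NaCl.
    isolation_medium = mixture_molarity([("KCl", 0.1, 5), ("NaCl", 0.1, 1)])
    for solute, molarity in isolation_medium.items():
        print(f"{solute}: {molarity * 1000:.1f} mM")
```

Under these assumptions the total salt concentration stays at 0.1 M, and only the KCl:NaCl ratio changes. Since the text notes that the optimal salt concentration varies among species, the same function can be reused with other stock molarities or ratios.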
Figure 8 Steps in the preparation of lampbrush chromosomes. Oocytes are isolated in isolation medium in a dish (A). An oocyte is opened with fine forceps (B) and the yolk is extruded (C), revealing the oocyte nucleus. The nucleus is sucked in and out of a pipette to remove yolk (D), and then transferred to a dispersion chamber (E, F). The nuclear envelope is peeled off the nucleus in the dispersion chamber (G), releasing the nuclear contents (H). Finally, a coverslip is added to the preparation (I).
the bottom with paraffin wax. The disadvantage of this kind of chamber is that once settled, the chromosomes can only be observed with an inverted phase contrast microscope. A simple alternative, allowing the use of a regular phase contrast microscope, is to punch a hole in a plastic coverslip (using a regular paper hole punch) and seal it to the top of a slide with paraffin (Figure 8). For in situ hybridization and/or immunocytochemistry, the preparation must be centrifuged to firmly attach the lampbrush chromosomes to the coverslip at the bottom of the chamber. One way to do this is to use a plexiglass disc observation chamber designed so that it will fit on an epoxy
[Figure 9 panels: (A), (B), (C). Labels: Coverslip; Epoxy plug.]
Figure 9 Centrifuge tube fitted with an epoxy plug for centrifuging lampbrush preparations (Protocol 12). (A) Dispersion chamber consisting of a plastic disk with a 7-mm hole bored in the center and a coverslip attached to the bottom with paraffin. (B) Polymerized epoxy plug in 30-ml centrifuge tube. (C) Dispersion chamber is positioned on the epoxy plug by raising the plug with a probe inserted through a hole in the bottom of the centrifuge tube.
plug inside a large centrifuge tube (Figure 9). After centrifugation, the dispersal medium can be removed and the lampbrush chromosomes fixed and dehydrated.
Chromosome Banding
The four most common methods for banding chromosomes are Q-banding, G-banding, R-banding, and C-banding (Bickmore and Sumner, 1989; Sumner, 1990; Therman and Susman, 1993). The simplest of these is Q-banding, which involves soaking the slides in a buffer and then staining with quinacrine mustard or quinacrine dihydrochloride. This produces fluorescent bands (Q-bands, or QFQ bands), which are brightest in AT-rich regions of the chromosomes (Hsu, 1979), but also are influenced by variation in protein composition of the chromosomes (Bern and Perle, 1986). The disadvantage of Q-banding is that the bands are visible only with UV optics and they fade quite rapidly. G-banding is also simple and involves brief treatment with trypsin or NaOH and staining with Giemsa (or similar dyes) in a phosphate buffer. The result is alternating light and dark bands (G-bands, or GTG bands), the latter representing primarily AT-rich regions and thus corresponding to most Q-bands.
Whereas Q- and G-banding require little or no pretreatment of the chromosomes, R-banding and C-banding require a stringent extraction step resulting in significant loss of chromosomal DNA (at least 60%; Pathak and Arrighi, 1973). R-banding, or "reverse banding", involves pretreatment with hot (80-90°C) alkali and subsequent staining with Giemsa (RHG bands) or with a fluorochrome such as acridine orange (RFA bands). This results in a banding pattern that is the reverse of G-banding (RHG bands) or of Q-banding (RFA bands). A much less stringent method for obtaining fluorescent R-bands is described below. For C-bands, chromosomes are treated with a strong base at an elevated temperature, incubated in a sodium citrate solution at high temperature, and stained in a concentrated Giemsa solution. This results in the extraction of almost all of the non-C-band chromatin, leaving only constitutive heterochromatin (Figure 10), which usually contains rapidly reassociating repeated sequences (Comings et al., 1973). Methods for Q-, G-, R-, and C-banding are given in the protocol section.
Various specialized banding procedures have also been developed (see Rooney and Czepulkowski, 1986; Verma and Babu, 1989; Therman and Susman, 1993). Some of the most useful methods are fluorescence banding using various fluorochromes (e.g., chromomycin A3, Hoechst 33258, and DAPI), differential-replication banding using bromodeoxyuridine (BrdU), and staining for nucleolar organizer region (NOR) using silver nitrate. Chromomycin A3 can generate "R-bands" in mammalian chromosomes, and BrdU banding is particularly useful for species that are difficult to
G-band, such as salamanders (Figure 11). An example of NOR staining is shown in Figure 12. Various restriction endonucleases have also been used to induce banding patterns (e.g., Ferrucci et al., 1987; Figure 13). Vast differences often are observed in the response of chromosomes of different organisms to these banding procedures, reflecting differences in chromosome organization. For example, banding with chromomycin A3 (counterstained with methyl green) or with distamycin A (counterstained with mithramycin) specifically stains small stretches of GC-rich chromosomal DNA and yields multiple chromosomal bands (R-bands) in mammals but stains almost nothing but nucleolar organizer regions in fishes, amphibians, reptiles, and plants (Figure 12; Sessions and Kezer, 1987; Schmid and Guttenbach, 1988).

Figure 10 C-banded mitotic (A) and meiotic (B) chromosomes of the plethodontid salamander Aneides aeneus. (From Sessions and Kezer, 1987.)

Figure 11 Differential replication banding in an embryo of the Japanese salamander, Hynobius tokyoensis. (Reproduced with permission from Kuro-o et al., 1987.)

One potentially very useful (but little used) approach to the identification of homologs is the induction of cold-induced constrictions, or CICs (Figure 14; Callan, 1966; Sessions, 1982). These constrictions are chromosome specific, and can be induced in organisms with large chromosomes, such as certain plants and [...] prolonged treatment of the specimens at 0.5-2.5°C in the presence of colchicine. The CICs apparently represent regions of incompletely condensed intercalary heterochromatin, and may be analogous to late-replicating fragile sites in human chromosomes (Laird, 1987).

Chromosome preparations can be banded either before or after in situ hybridization. For banding after autoradiography it is important to
use a banding procedure that requires little or no pretreatment, such as G-banding or Q-banding. Prehybridization banding procedures requiring stringent pretreatments may have an adverse effect on the hybridization reaction due to loss of chromosomal DNA.

Figure 12 AgNOR banding and chromomycin A3 fluorochrome banding (R-banding) in species of the plethodontid salamander genus Aneides. (A, B) AgNOR bands. (C-F) Different positions of R-bands.

In Situ Hybridization
In situ hybridization of nucleic acid probes to chromosome preparations involves four general steps: (1) preparation and labeling of probe; (2) the hybridization reaction between probe and target; (3) removal of unbound or non-specifically hybridized probe; and (4) visual detection of the sites of hybridization. Any DNA (cloned or uncloned) can be used as a probe for in situ hybridization. DNA or RNA probes of any desired sequence can be obtained through recombinant DNA technology (see Chapter 9) and/or by in vitro DNA amplification (PCR; see Chapter 7). Both RNA and DNA double-stranded or single-stranded probes can be used. RNA probes have the advantage that they produce less background than DNA probes, and unbound or non-specifically hybridized RNA is efficiently removed by RNase digestion after the hybridization reaction. It is somewhat more difficult to reliably remove non-specifically bound DNA probes. In general, a better signal-to-noise ratio is obtained when vector sequences are removed from the purified DNA, although in some cases the extra DNA provided by the vector may facilitate the formation of DNA networks that enhance the signal.
Figure 13 Restriction endonuclease banding in apes and human. Top row: human chromosome 1 and ape homologs. Middle row: human chromosome 2 and ape homologs. Bottom row: human Y chromosome and ape homologs. (Columns: human, chimpanzee, gorilla, orangutan.) (Reproduced with permission from Ferrucci et al., 1987.)

There are four main sources of labeled nucleic acid probes used for in situ hybridization: (1) in vivo labeling of ribosomal RNA; (2) in vitro production of RNA from DNA templates using either Escherichia coli or SP6 polymerase; (3) nick translation of double-stranded DNA using DNase I and DNA polymerase I; and (4) in vitro DNA amplification via the polymerase chain reaction (for other labeling techniques see Arrand, 1985; Schleif and Wensink, 1981; Berger and Kimmel, 1987; Ausubel, 1989; Ausubel et al., 1992; and Chapters 8 and 9). Probes may be labeled either with radioisotopes, detected via autoradiography, or with non-radioisotopic molecules such as biotin, BrdU, and various fluorochromes for FISH.

Figure 14 Chromosomes of Ambystoma jeffersonianum showing cold-induced secondary constrictions (CICs).

The radioisotopes most commonly used to label probes for nucleic acid hybridization are 32P, 125I, and 3H; choosing one for in situ hybridization involves a trade-off between sensitivity and resolution. 32P yields the highest specific activity and is extensively used in transfer-hybridization experiments (Chapter 8) but is not used in in situ hybridization because its high energy disintegrations result in poor resolution. Tritium is usually considered the best radioisotope for in situ hybridization because of the extremely low energy of β particles emitted (0.018 MeV; Pardue, 1985). The low-energy β particles emitted by 3H travel only about 1 µm through autoradiographic emulsion, which results in close spatial correspondence between silver grains and hybridized target (Macgregor and Varley, 1983; Pardue, 1986). The disadvantage of using tritium is that it often necessitates long autoradiographic exposure times, depending on the specific radioactivity of the probe. Shorter exposure times are achieved with 125I, but the radiation emitted is significantly more energetic than that of tritium with the danger of less precise resolution and higher background (Pardue, 1986).

High specific radioactivities are achieved by in vitro labeling (Macgregor and Varley, 1983; see protocol section). In vitro transcription of RNA by E. coli polymerase for in situ hybridization involves the random transcription of double-stranded or single-stranded DNA to yield labeled RNA (a protocol is given below). In vitro transcription with SP6 RNA polymerase involves cloning a known DNA sequence adjacent to a promoter of the phage SP6 to produce a single-stranded tritium-labeled RNA probe. SP6 polymerase and cloning vectors are commercially available (see Pardue, 1985 and Chapter 9 for cloning protocols).

Figure 15 Nick translation. A nick is introduced into double-stranded DNA (A) by DNase I (B). As E. coli DNA polymerase I repairs the nick, it excises the bases ahead of it. If labeled nucleoside triphosphates are present, they will be used to repair these holes (C). The result is a strand of DNA that is very heavily labeled (D). (After D.A. Gilbert, 1991.) Panel labels: DNase I nicks DNA, allowing DNA polymerase I to bind; 5' exonuclease activity of the polymerase degrades original DNA as new strand is synthesized; labeled dNTP; ligase seals nick.

Nick translation of DNA is usually considered the most successful and generally useful method for obtaining labeled probes with high specific activity (Macgregor and Varley, 1983). The advantages of nick translation are that it is a relatively rapid, simple, and inexpensive reaction, it yields uniformly labeled DNA, and it can be used to produce probes with high specific activities (Ausubel, 1989; Ausubel et al., 1992). Furthermore, nick translation can be used to prepare either radioactive or non-radioactive (biotin, fluorochrome, or BrdU) probes. Nick translation involves the use of DNase I to create single-strand nicks in double-stranded DNA, followed by exposure of the nicked DNA to E. coli DNA polymerase I in the presence of labeled deoxyribonucleotides (Figure 15). The polymerase binds to a nick and removes nucleotides from the 5' side while adding labeled nucleotides to the 3' side as it moves down the DNA strand (Arrand, 1985; Gilbert, 1991). This action results in the translation (i.e., translocation) of the nicks along the DNA strand in a 5' → 3' direction and the replacement synthesis of a labeled DNA strand. Denaturation of nick-translated DNA thus yields uniformly labeled DNA fragments that can be used as probes. The length of the fragments produced as well as the extent of incorporation can be controlled by the amount of DNase I that is used. For radioisotopic labeling, the specific activity of the probes is controlled by the number and specific activities of tritiated deoxyribonucleotides. The best results are obtained if all of the deoxyribonucleotides are tritiated, but at minimum the reaction should include [3H]dCTP and [3H]dTTP, since these are available at the highest specific activity. The detection of single copy sequences requires a specific activity of 2-4 × 10^7 cpm/µg, which can be achieved by the use of precursors with high specific activity (40-110 Ci/mmol; Harper and Saunders, 1984).

Protocols for non-isotopic labeling are quite similar to standard radioisotopic nick translation except for alterations to optimize the incorporation of modified deoxynucleotides (Ausubel et al., 1992). The two most frequently used non-isotopic labels are biotin and digoxigenin, but BrdU can also be used for nick translation. For biotin-avidin labeling methods, biotin-labeled nucleotides (e.g., biotin-11-dUTP instead of dTTP) are substituted in a standard nick translation reaction mixture and the DNase I concentration is adjusted to ensure a size range of 100 to 500 nucleotides (Ausubel et al., 1992). The resulting probe is said to be biotinylated, and will bind with high efficiency to avidin or to anti-biotin antibodies.
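
As a worked illustration of the specific-activity figure quoted above for radioisotopic nick translation, the short Python sketch below divides incorporated counts by template mass and compares the result with the 2-4 × 10^7 cpm/µg range given in the text for single-copy detection. The function names and the example numbers are hypothetical and are not taken from any protocol in this chapter. The sketch assumes that incorporated counts were measured by scintillation counting after unincorporated nucleotides were removed.

```python
# Sketch only: function names and example numbers are illustrative assumptions.

# Range quoted in the text for detecting single-copy sequences (cpm per microgram).
SINGLE_COPY_RANGE_CPM_PER_UG = (2e7, 4e7)


def specific_activity(incorporated_cpm: float, dna_mass_ug: float) -> float:
    """Return probe specific activity in cpm per microgram of DNA."""
    if dna_mass_ug <= 0:
        raise ValueError("DNA mass must be positive")
    if incorporated_cpm < 0:
        raise ValueError("counts cannot be negative")
    return incorporated_cpm / dna_mass_ug


def meets_single_copy_minimum(cpm_per_ug: float) -> bool:
    """True if the probe reaches the lower bound quoted for single-copy targets."""
    return cpm_per_ug >= SINGLE_COPY_RANGE_CPM_PER_UG[0]


if __name__ == "__main__":
    # Hypothetical reaction: 0.5 ug of template with 1.5e7 cpm incorporated.
    activity = specific_activity(1.5e7, 0.5)
    print(f"{activity:.1e} cpm/ug")
    print(meets_single_copy_minimum(activity))
```

A probe that falls below the lower bound might still be adequate for reiterated targets, which the text notes are far easier to detect than single-copy sequences.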
Labeled probes may also be obtained by in vitro DNA amplification: the polymerase chain reaction (Saiki et al., 1988 and Chapter 7). This technique involves the exponential amplification of a DNA segment by repeating cycles of polymerase-mediated oligonucleotide primer extension. The basic PCR cycle consists of (1) heating to denature the template DNA, (2) cooling to permit precise primer annealing, and (3) warming to facilitate polymerase-mediated extension. For labeled probes, radioactive or biotinylated deoxyribonucleotides are included in the reaction mixture. Since the copy number of the DNA segment doubles after each cycle, it can be amplified several million fold (2^n, where n = the number of cycles).

The Hybridization Reaction
Although almost every laboratory has its own minor modifications, the hybridization reaction of nucleic acid probes to chromosomal preparations involves four general steps (see protocols): (1) removal of endogenous RNA from the cytological preparation (for DNA targets); (2) denaturation of the DNA of the target chromosomes (for DNA targets); (3) incubation of the cytological preparation with probe dissolved in hybridization buffer; and (4) removal of non-specifically bound probe.

For hybridizing to DNA targets it is important to remove endogenous RNA because it can compete with target sequences for hybridization. This is done by treating the cytological preparations with a high concentration of pancreatic ribonuclease, or with a mixture of ribonuclease A and ribonuclease T1. This step is, of course, omitted if the target is RNA.

Chromosomal target DNA must be denatured for in situ hybridization. This is accomplished by briefly treating the slide preparations with 0.07 M NaOH at room temperature. Alternatively, the slide preparation can be exposed to 70% formamide at a higher temperature (formamide lowers the melting temperature of double-stranded DNA). Unfortunately, both procedures result in some degradation of chromosomal morphology. Alternatively, denaturation with 0.1 M HCl results in better chromosomal morphology but reduced hybridization due to depurination of the target DNA (Macgregor and Varley, 1983; Pardue, 1986).

The actual hybridization reaction involves first dissolving the probe in hybridization buffer with 50% formamide (low stringency) or 65% formamide (high stringency). Optionally, 10% dextran sulfate and sheared, denatured, non-competitive carrier DNA at 1000-fold greater concentration than the probe may also be added to the hybridization reaction mixture (see below). Formamide lowers the melting temperature of double-stranded DNA, and thus lowers the temperature required for the hybridization reaction. Formamide is especially important for RNA-DNA hybridization because it retards the rate of RNA degradation (Schleif and Wensink, 1981). The formamide should be of the highest quality, deionized, and stored at -20°C. The addition of dextran sulfate to the hybridization mixture accelerates the hybridization reaction approximately tenfold by increasing the effective concentration of the probe, and favors maximum binding by enhancing the formation of networks of probe molecules on the target. The addition of dextran sulfate to the hybridization mixture is essential for the detection of single-copy sequences. Adding sheared, denatured, non-competitive, unlabeled DNA to the hybridization mixture reduces non-specific binding of the probe. For DNA probes, such "blocking" DNA can be extracted from E. coli (Pardue, 1986) or from salmon sperm (Malcolm et al., 1986). Unlabeled ribosomal RNA from E. coli can be used for RNA probes (Pardue, 1986).

Probes for in situ hybridization should be used at a concentration that will nearly saturate the target DNA without contributing to background signal. Large targets, such as localized clusters of repetitive sequences and the reiterated sequences of polytene chromosomes, require lower concentrations. The optimal probe concentration depends on the probe, its specific activity, and the nature of the target. Therefore, it generally is best to use a range of concentrations (e.g., from 1-20 ng per slide, Malcolm et al., 1986). In general, a radioactively labeled probe should be dissolved to provide a total of 3 × 10^6 cpm as determined by a liquid scintillation counter (Macgregor and Varley, 1983).

The hybridization reaction is initiated by pipetting the probe mixture onto the center of the
cytological slide preparation, covering with a coverslip, and incubating in a moist chamber at 37°C for approximately 12 hr. A suitable moist chamber is a large petri dish, plastic freezer box, or similar container, lined with paper towels that have been soaked in buffer. Microscope slides are supported on broken pipettes or glass rods laid side-by-side on the bottom of the chamber.

After an appropriate incubation time, the hybridized cytological preparations must be washed to remove probe molecules and their degradation products that are not bound to complementary sequences on nuclei or chromosomes. This procedure is necessary to remove both weakly hybridized molecules and unbound or non-specifically bound molecules and is essential to reduce background signal. This step can involve different levels of stringency depending on the nature of the hybrid. Washing usually involves incubation in 2 × SSC at a temperature that is slightly lower than that used for the hybridization reaction, in addition to treatment with 50% formamide (or 5% cold TCA, in the case of radioactive probes). If an RNA probe is used, washing includes a mild digestion with ribonuclease. Following the washing step, the slides are usually dehydrated in ethanol and air-dried. The slides are now ready for autoradiography or immunochemistry.

Autoradiography
Visualization of sites of hybridization between a radioactive probe and its cytological target is achieved by autoradiography. This procedure involves coating the slides with a thin layer of nuclear track photographic emulsion consisting of silver halide crystals suspended in a gelatin matrix. Radiation from light or from radioactivity sensitizes the crystals to form a "latent image," which is visualized when the crystals are reduced to metallic silver by photographic developer. The resulting grain density is highest immediately over the source of radiation and decreases symmetrically on each side of the source with increasing distance. The rate of decrease of grain density from the source determines the resolution, and is dependent on the radioisotope that is used. The vast majority of silver grains produced by the β particles emitted by tritium will be located within 0.5 µm from the source, although 1-2% of the particles may travel up to 3 µm. 125I produces a greater scatter of grains, up to 16 µm from the source, although approximately 90% of the grains will fall within a 3.5-µm radius and at least half of the grains will be at the same distance as those produced by tritium (A.S. Henderson, 1982).

Several different nuclear track emulsions are available with different sensitivities. The single most important property of the emulsion is the intrinsic background of silver grains formed in the absence of exposure. Therefore, it is necessary to test each batch of new emulsion as it arrives from the supplier by developing coated blank slides and examining them under a microscope. A background of less than 50 grains per field of view under a 100× oil objective is considered very good, but a grain count of over 100 is unacceptable (Macgregor and Varley, 1983). Unacceptable emulsion should be returned to the supplier for a replacement. Background can also be controlled by careful handling and storage of the emulsion.

After the slides are coated with emulsion, the preparations are exposed for a length of time that must be determined empirically. The objective is to obtain a sufficient number of silver grains to detect hybridization unambiguously but not so many that cytological detail is obscured. The best exposure time for a particular in situ hybridization experiment can be determined by including several replicates or cytologically suboptimal preparations that can be used as test slides. One test slide is developed at a given interval to determine whether exposure has been adequate. Exposure times can vary from days to months, depending on the concentration and specific activity of the hybridized probe molecules.

Chromosome Painting Using FISH
Standard ISH utilizing radioisotopes has been largely supplanted by fluorescent in situ hybridization (FISH), which is now widely acknowledged as the method of choice for localizing specific chromosomal sequences in clinical as well as comparative cytogenetics. One great advantage of FISH is that it is possible to map multiple probes simultaneously by detection with different fluorochromes. Up to seven different
probes have been visualized simultaneously on a single preparation (Freshney, 1994). Another advantage is that non-radioisotopic probes eliminate the need for autoradiography and long exposure times, so the procedure is relatively rapid. Also, the use of digital imaging systems, such as confocal laser scanning microscopy, makes FISH particularly good for data manipulation and storage, with a degree of sensitivity and localization that cannot be achieved with standard isotopic ISH (Freshney, 1994; Plate 1). Finally, non-isotopic probes are stable for long periods of time and large quantities may be produced at one time and stored at -20°C for many months or years.

There are a large number of different techniques that come under the category of FISH, and new methods are being developed at a rapid rate. The most commonly used technique involves the use of biotin-conjugated nucleotides to label nucleic acid probes (Langer et al., 1981; Ausubel et al., 1992). The probe is detected immunochemically with biotin-specific antibodies, avidin, or streptavidin, which are conjugated either to a fluorochrome (e.g., fluorescein isothiocyanate, FITC) that is visualized with a UV microscope, or to enzymatic reagents (e.g., alkaline phosphatase or horseradish peroxidase) that can be reacted with a substrate to form a cytologically visible stain. Several other techniques for FISH have been developed, including CISS (chromosome in situ suppression hybridization) and PRINS (primed in situ labeling). CISS utilizes probes from DNA libraries of flow-sorted chromosomes to search for DNA sequence homology of the entire length of the target chromosomes while suppressing the repetitive DNA sequences of the other chromosomes by allowing the repeated sequences of the probe itself to reanneal in the hybridization mixture prior to the actual hybridization reaction (Luke and Verma, 1993). CISS results in the labeling of whole chromosomes or parts of chromosomes, and can be used to identify homeologous chromosomes in different species or in hybrids (Wienberg et al., 1990, 1992; Luke and Verma, 1993). PRINS involves in situ hybridization of unlabeled oligonucleotide probes (oligos) to complementary sequences on fixed chromosomes. The oligos serve as primers for in situ DNA polymerase-driven extension in the presence of labeled nucleotides (e.g., FITC-12-dUTP), using the chromosome target as the template (Koch et al., 1991; Volpi and Baldini, 1993). This technique results in specific banding patterns, and can be used to map both dispersed and localized repeated sequences. The advantages of PRINS over traditional ISH are that it is very fast, allowing good signals to be obtained from repeated DNA sequences in less than one hour and from unique DNA sequences in less than three hours. A useful modification of the PRINS technique is the use of multiple probes detected with different fluorochromes, called MULTIPRINS (Volpi and Baldini, 1993).

Immunochemistry
The use of immunochemistry has been vastly simplified by the commercial availability of numerous polyclonal and monoclonal antibodies and detection kits. Specific antibodies (e.g., anti-biotin or anti-avidin) are usually purchased already conjugated with biotin, fluorochrome, or peroxidase. Five immunochemical systems are most commonly used to visualize non-radioisotopically labeled hybridized probes (Figures 16 and 17). In each case the probe usually has been either biotinylated or BrdU-substituted via nick translation. The direct fluorescence method involves the use of a single, fluorochrome-conjugated anti-biotin monoclonal antibody (mab) which binds directly to the biotin side-groups on the probe. The advantage of this system is that it is relatively fast, simple, and inexpensive. The main disadvantage is that, unless the target is a reiterated sequence, the signal may be too weak to detect. The indirect fluorescence method involves an anti-biotin mab as a primary antibody, followed by reaction with a fluorochrome-conjugated secondary antibody which recognizes the primary mab as antigen (since mab's are made in mouse cells, the secondary antibody should be a polyclonal anti-mouse antibody, usually made in rabbit or goat). The main advantage of the indirect method is that each secondary antibody can bind with two primary antibodies, thus amplifying the signal. The main disadvantage is that it is both somewhat more expensive and time-consuming than the direct method.
A third approach is the peroxidase-antiperoxidase (PAP) method involving at least three reagents: primary and secondary antibodies, and a PAP complex comprised of the enzyme peroxidase and an antibody against peroxidase (Figure 16). The primary antibody binds to the biotin on the probe; the secondary, or bridging, antibody binds to both the primary and to the PAP complex (since both are produced in the same animal species). The main advantage of the PAP method is greater sensitivity than either the direct or indirect method; the main disadvantage is that it is much more time-consuming. The PAP method has been used to detect BrdU-labeled probes (Frommer et al., 1988).

Figure 16 The primary systems for detecting biotinylated probes using immunochemistry. (A) Indirect fluorescence: a primary anti-biotin antibody (ab) is recognized by a fluorochrome-conjugated secondary ab. (B) PAP system: a primary anti-biotin ab is recognized by a bridging (or linker) ab which also binds to a peroxidase-anti-peroxidase complex. (C) Direct fluorescence: a fluorochrome-conjugated anti-biotin ab is used alone. (D) Avidin-biotin conjugated fluorochrome: a biotinylated secondary ab binds to an anti-biotin primary ab, and fluorochrome-conjugated avidin then binds to the secondary ab.

The last two methods utilize fluorochrome-conjugated avidin (or streptavidin)-biotin conjugates to greatly amplify the signal from hybridized biotinylated probes. The first of these involves an anti-biotin mab followed by reaction
with a biotinylated anti-mouse polyclonal secondary antibody, and amplified by adding fluorochrome-conjugated avidin, which binds to the biotin on the secondary antibody (Figure 17). The most commonly used procedure for chromosome painting via FISH treats the hybridized biotinylated probes first with FITC-conjugated avidin which binds to the biotin side groups of the probe, followed by reaction with a biotinylated primary anti-avidin mab, and then amplification by treatment with additional fluorochrome-conjugated avidin (Figure 17). The main advantage of this last method is that the signal can be greatly amplified by additional treatments of anti-avidin antibody followed by more fluorochrome-conjugated avidin, if needed. This last method is important because it usually yields a signal that is strong enough to be detected via standard epifluorescence microscopy.

Figure 17 Amplification with fluorochrome-conjugated avidin. (A) A biotinylated probe is hybridized to a target. (B) Fluorochrome-conjugated avidin is then added, which binds to the biotin on the probe. (C) A biotinylated anti-avidin antibody is added, then (D) an additional layer of fluorochrome-conjugated avidin is added to amplify the signal.

Biotin-labeled probes can also be detected by treating the hybridized preparations with an anti-biotin primary antibody followed by a secondary antibody that is conjugated with horseradish peroxidase. The peroxidase is then visualized by reacting with diaminobenzidine tetrahydrochloride (DAB) and hydrogen peroxide, which results in a reddish-brown to black signal. Alternatively, the preparations are treated with alkaline phosphatase- or peroxidase-conjugated avidin (or streptavidin), proteins that bind to biotin very tightly. Alkaline phosphatase is visualized by reacting with 5-bromo-4-chloro-3-indolylphosphate (BCIP) and nitro blue tetrazolium (NBT), resulting in the deposition of a purple precipitate at sites of hybridization. Complexing the avidin with electron-dense colloidal gold allows the hybridized probes to be visualized with electron microscopy (Hamkalo and Hutchison, 1984; Fetni et al., 1992).

Whereas the products of both alkaline phosphatase and peroxidase staining are stable for long periods of time, fluorescein (FITC) has the disadvantage that it fades rather quickly. A solution of p-phenylenediamine is sometimes added to retard fading of fluorescence (Pinkel et al., 1986). Preparations labeled with fluorescein can be counterstained with DNA-specific fluorescent dyes, e.g., 4',6-diamidino-2-phenylindole (DAPI) and propidium iodide (Pinkel et al., 1986). The propidium iodide fluoresces red and allows simultaneous observation of the yellowish-green FITC-labeled hybridized probe and total DNA. DAPI fluoresces blue and is used so that biotin-labeled and total DNA can be observed separately. Fluorescein and propidium iodide are excited at 450-490 nm and propidium iodide can be viewed separately at 546 nm. DAPI is excited in UV at 360 nm.

APPLICATIONS AND LIMITATIONS

Applications
Cytogenetic studies can contribute an array of information independent from morphological, biochemical, behavioral, and other characters that are used for phylogenetic analysis. As with biochemical data, cytogenetic information can reveal differences and similarities that may not be obvious at the morphological level. The inherent attractiveness of cytogenetics is that it encompasses several levels of biological organization ranging from the morphological to the molecular, depending on the applicable technology. At one extreme, the overall amount of DNA in a genome can have substantial phenotypic consequences at the whole organismal level in terms of cell size and cell cycle time and the effects these cellular parameters can have on organismal development rate (Cavalier-Smith, 1985b; Sessions and Larson, 1987). These phenotypic correlates of genome size have been termed "nucleotypic" by Bennett (1972). Chromosomes can be studied as a morphological manifestation of the genome in terms of their microscopically visible size, shape, number, and behavior during meiosis and mitosis. The analysis of ploidy levels or meiosis can provide unique insights into changes in breeding systems or modes of inheritance (e.g., apomixis or translocation heterozygosity). At a lower level, banding studies and chromosome painting reveal aspects of the general structural organization of chromatin along the lengths of individual chromosomes. At a still lower level, chromosomes may be probed with known DNA or RNA sequences using fluorescence or radioisotopic in situ hybridization to reveal finer details of chromosomal anatomy in terms of the spatial arrangements, or presence/absence, of particular kinds of sequences. It is now possible to go one step further and directly microdissect selected regions of chromosomes for sequence analysis (Pirrotta, 1986). Another important technological innovation is fluorescence activated cell sorting (FACS), which allows the large scale isolation of particular chromosomes from a given karyotype. DNA extracted from these purified chromosomes can be used to prepare
whole-chromosome-specific probes with which to study chromosome homology between species via chromosome painting (Breneman et al., 1993). FACS is facilitated in karyotypes with large variation in chromosome size (e.g., Homo sapiens), but is more difficult if the karyotype consists of a large number of similar-sized chromosomes.

Figure 18 Some of the most common chromosome rearrangements and their effects on banding patterns and chromosome morphology. (A) Robertsonian translocations, in which two telocentrics undergo fusion to form a biarmed chromosome, or a biarmed chromosome undergoes fission to produce two telocentrics. (B) A paracentric inversion, in which the centromere is not involved so no change in chromosome morphology occurs, but banding patterns may be altered. (C) A pericentric inversion, which does involve the centromere, and can result in changes in both chromosome morphology and banding patterns.

Most phylogenetic studies using cytogenetics have used mainly chromosome number and morphology and, less commonly, various kinds of chromosome bands (e.g., Sessions and Kezer, 1987; Macgregor et al., 1990). Comparative studies using chromosome painting techniques are becoming more frequent, thanks to widely available and relatively inexpensive probes and kits for labeling and detection. Much of the phylogenetically useful variation in chromosome number and morphology may be attributable to Robertsonian translocations (fusions and fissions of chromosomes at their centromeres) and two different kinds of inversions: pericentric inversions, which involve the centromere, and paracentric inversions, which occur outside of the centromeric region (Figure 18). Al-
though pericentric inversions often can be documented by chromosomal morphology alone (i.e., shifts in centromere position), confirmation of Robertsonian translocation and paracentric inversions usually requires some kind of chromosome markers such as bands, NORs, or hybridized probes. Inversions and translocations can also be detected in lampbrush chromosomes and other meiotic preparations.

Although differences in chromosome structure are often correlated with taxonomic differentiation, the role of cytogenetic change in actual processes of speciation is controversial (Patton and Sherwood, 1983; Sites and Moritz, 1987; M. King, 1993). The fixation of structural rearrangements may be the most important, and easily comprehended, cytogenetic correlate of speciation (cf. Patton and Sherwood, 1982). Some organisms, such as many groups of salamanders, show little or no intra- or interspecific variation in cytologically visible aspects of chromosome structure despite extensive changes in organismal morphology, protein biochemistry, and even DNA sequence structure, suggesting that chromosomal morphology has been strongly constrained (Sessions and Kezer, 1987). The reasons for this cytogenetic stasis are unknown.

Even more controversial is the possible existence of major trends in karyological evolution. In salamanders, for example, it has been argued that primitive karyotypes are asymmetrical (i.e., contain both bi-armed and telocentric chromosomes) and bimodal (i.e., contain both macrochromosomes and much smaller microchromosomes) with high chromosome numbers, whereas derived karyotypes are symmetrical (all bi-armed) and unimodal (no microchromosomes), with lower chromosome numbers (Morescalchi, 1973; 1975). This argument is based on correlations of karyotypic patterns with morphology and reproductive biology. A similar correlation has been noted in frogs (Morescalchi, 1973). A hypothesized mechanism for such a trend involves pericentric inversions to produce telocentrics, followed by Robertsonian translocations (Morescalchi, 1975; Green, 1983). Chromosome painting and other current methods employing in situ hybridization should eventually generate the kind of high resolution data to test this kind of hypothesis within a phylogenetic context.

The easiest and most successful application of in situ hybridization concerns the localization of any moderately long sequence that is repeated more than 100 times at one place in the genome (Macgregor and Varley, 1983). Consequently, most comparative studies have focused on repetitive sequences such as those coding for ribosomal RNA, tRNA, and histones, as well as highly repeated satellite DNA sequences. Single-copy genes have always been easily detected in dipteran polytene chromosomes because all gene sequences are multiplied several hundred times and are localized and concentrated.

Although there have been many comparative studies of sequence localization using ISH, these data rarely have been used for estimating phylogeny. Phylogenetic analyses are possible using such characters as the location(s) of sequences (e.g., various repeat families) among and within chromosomes, sequence structure and copy number, spatial relationships among identified genes and other sequences and to specific bands or other markers, and the localization of functional versus non-functional NORs. These kinds of studies have been particularly important for the identification of homologies among chromosomes or parts of chromosomes among distantly related species for phylogenetic analysis (e.g., Drosophila, Steinemann et al., 1984; Wienberg et al., 1992).

The use of such characters is predicated on our understanding of their evolution. Two different (but not mutually exclusive) views concerning the mode of evolutionary change in the molecular structure of chromosomes are the chromosome repatterning hypothesis (Mancino et al., 1977; Cremisi et al., 1988) and the homosequentiality hypothesis (Figure 19; Macgregor and Sherwood, 1979). According to the repatterning hypothesis, interspecific differences in the chromosomal location of certain repetitive DNA sequences (e.g., ribosomal RNA genes) reflect the redistribution of chromosomal elements within karyotypes. A prediction based on this view is that evolutionary changes in sequence location should be relatively conservative (i.e., slow, unique, and irreversible). The homosequentiality hypothesis, on the other hand, postulates that differences in the apparent location of various sequences reflect localized amplifications or diminution of sequences with fairly
stable chromosomal locations. This view predicts relatively rapid and reversible changes in the cytologically visible chromosomal location of certain kinds of sequences. A likely cytogenetic mechanism for this kind of rapid change is unequal crossing over during meiosis (or unequal sister chromatid exchange in mitosis or meiosis) within repetitive sequences. These mechanisms can result in a sequence becoming too small to be visualized by ISH or, alternatively, a diminutive sequence could rapidly expand in size and become detectable (Figure 19).

Figure 19 Some possible modes of chromosomal evolution. (A) The homosequentiality hypothesis, in which sequences change mainly by tandem duplications or diminution in situ. This mechanism could be rapid and reversible. (B) The repatterning hypothesis, in which chromosomes undergo large changes in sequence composition, a process that would probably be much slower and irreversible.

A testable model of evolutionary changes in chromosome size and molecular structure involving major groups of repetitive sequences was presented by Macgregor and Sessions (1986). According to this model, highly repetitive DNA sequences originate and grow at particular chromosomal locations such as centromeres, or wherever they are tolerated. Non-homologous recombination at such sites would result in the rapid spread of the new sequence throughout the karyotype and subsequent concerted evolution among non-homologous chromosomes. Subsequent small rearrangements would result in the gradual stochastic breakup, dispersal, and degradation of these sequences as they are moved away from the centromere regions along the chromosome arms. A prediction based on this model is that recently evolved sequences should be homogeneous in structure, localized in large clusters (especially at or near the centromeres or telomeres), functionally inert, and taxonomically restricted in occurrence. Ancient sequences, on the other hand, should show more sequence complexity, should be found as small clusters in intercalary positions along the chromosome arms, may be transcribed, and are likely to have a wider taxonomic occurrence. Information that is consistent with this model is available from both salamanders (Macgregor and Sessions, 1986; Cremisi et al., 1988) and plants (Flavell, 1986).

Generating global models of karyological evolution is complicated by the fact that there appear to be large differences in the molecular organization of genomes of different groups of eukaryotic organisms (Bernardi et al., 1985; O'Brien et al., 1985a; Li and Graur, 1991). These differences may explain why G-banding has not been successful in amphibians (Birstein, 1982). Much of the repetitive DNA in mammalian genomes may have originated from functional genes (e.g., tRNA genes) through a retroposition (reverse transcription) mechanism (Deininger and Daniels, 1986). This mechanism involves the copying of RNA molecules back into DNA with subsequent integration and amplification of these copies at new genomic sites. Retroposition may have been the dominant source of the major repetitive DNA families in mammals, but there are no known examples of high-copy number retroposon families in non-mammalian genomes (Deininger and Daniels, 1986).

The picture that has emerged from comparative studies on diverse organisms is that genomes are incredibly dynamic in terms of position and/
or structure of identified sequences, especially in terms of the number, kinds, and locations of various repetitive sequences. Cytogenetic mechanisms such as unequal crossing over, inversions, and translocations have clearly played a dominant role, and we are just beginning to understand the role of transposons, retroposons, and the phenomenon of gene conversion in chromosomal evolution (W.F. Doolittle, 1985; Deininger and Daniels, 1986; Baker and Wichman, 1990; Hillis et al., 1991). It is clear that we have very little understanding of the relationship between the molecular structure and function of chromosomes. For example, clusters of ribosomal sequences have been found on almost every single chromosome (in addition to a stable nucleolus organizer region) in the European newt, Triturus vulgaris (Andronico et al., 1985), and certain simple-sequence satellite DNA sequences are transcribed by lampbrush chromosomes in salamander oocytes (Varley et al., 1980). These results make it difficult to make testable predictions concerning rates, constraints, and directions of evolutionary change, and indicate that full and proper use of molecular cytogenetic information for phylogenetic analyses will require a better understanding of the molecular basis of chromosome structure and function.

Limitations

One of the most serious limitations of molecular cytogenetics concerns the reliability of chromosome identification. Ideally, this identification should be based on banding patterns or some other chromosome-specific markers, independent of the localization of particular sequences. The chromosomes of most mammals and various other organisms are readily G-banded and show complex, chromosome-specific banding patterns. Other organisms, such as amphibians, have seemingly G-band-resistant chromosomes, and unambiguous chromosome identification is more difficult and requires a variety of specialized banding techniques. The application of FISH should help to solve this problem, and is limited only by the availability of suitable probes.

Limitations of in situ hybridization mainly concern the sensitivity and efficiency of the hybridization reaction. The sensitivity of in situ hybridization using radioactive probes depends on three main parameters: (1) the specific radioactivity of the probe; (2) the efficiency of the hybridization reaction; and (3) the autoradiographic procedure. For many years these parameters limited most in situ hybridization studies to repetitive sequences that can be localized with poorly defined probes of low specific activity and suboptimal hybridization conditions (A.S. Henderson, 1982). The specific radioactivity of a probe is limited only by the specific activity of the nucleotide precursors used in the synthesis of the probe. For clusters of repeated sequences, including polytene chromosomes, RNA probes labeled with [3H]UTP at 50 Ci/mmol are sufficiently radioactive (Pardue, 1985). For smaller targets, the specific radioactivity of the probe can be increased by using additional 3H-labeled nucleotides.

The efficiency of hybridization depends on numerous factors, including the concentration of the probe, the ionic strength of the hybridization mixture, the incubation temperature for the hybridization reaction, the type of chromosomes, and the complexity of the site, as well as the method used to prepare the slides and the age of the slides. Ideally, all available complementary target sites will hybridize with the probe at saturation concentrations. This is precluded, however, by the nature of cytological preparations, including the difficulty in obtaining complete denaturation of chromosomal DNA, loss of chromosomal DNA during denaturation, and the possibility of steric hindrance by chromosomal proteins (A.S. Henderson, 1982). Overall, the efficiency of hybridization has been estimated to be 6-10% (Macgregor and Varley, 1983).

Some of these limitations of the hybridization reaction have been counteracted by using dextran sulfate in the hybridization reaction. Dextran sulfate is essential for the detection of single-copy chromosomal sequences (Harper and Saunders, 1984). It is possible that the signal can also be enhanced by vector sequences that are attached to cloned probes. These sequences are radiolabeled and free to participate in network formation, thus contributing to the overall signal at the hybridization sites.

The main limitation with non-isotopic ISH,
Table 2 Equipment needed to set up a molecular cytogenetics laboratory

Equipment (a)                                             Use (b)

Autoclave*                                                Sterilization
Balance, analytical*                                      Weighing small samples
Balance, top-loading*                                     Weighing large samples
Camera attachments for microscopes*                       Chromosomal photography
CCD video camera                                          Digitizing microscopic images for computer-assisted
                                                            image processing
Centrifuge, benchtop, with swinging rotors                Preparing mitotic chromosomes (4,5,6,7,8,9,18) and
  (preferably refrigerated)*                                lampbrush chromosomes (11)
Centrifuge, microtube*                                    Centrifuging small samples
Centrifuge, vacuum                                        Probe labeling (19,20)
Computer-assisted analysis hardware and software          Image storage, analysis, and manipulation of
                                                            microscopic images
Darkroom*                                                 Autoradiography (24), general photography
Electron microscope (TEM/SEM)                             Colloidal gold labeling, synaptonemal complexes,
                                                            chromosome ultrastructure
Flow cytometer (and associated computer hardware          Measuring genome size, fluorescence activated cell
  and software)                                             sorting (FACS), analyzing cell cycles
Fluorescence attachments for epifluorescence microscopy*  Fluorochrome banding (16), FISH (25)
Freezers (-20°C and -70°C)*                               Storing chromosome tissues, probes, labels,
                                                            antibodies
Incubator (20-37°C)                                       Tissue culture (7,8)
Microscope, compound, with phase contrast lenses          Examining chromosomes
  (10x, 20x, 40x, 100x)*
Microscope, confocal laser scanning                       High-resolution optical sectioning and digitized
                                                            analysis of fluorochrome-stained cytogenetic
                                                            preparations
Microscope, inverted compound, with phase                 Checking tissue cultures, microdissection, lampbrush
  contrast lenses                                           chromosomes
Ovens, 30-90°C*                                           C-banding (12), AgNOR-banding (16), labeling probes
                                                            (20), in situ hybridization (21-23, 25)
pH meter*                                                 Making buffers, etc.
Refrigerator*                                             Storing samples and reagents
Scintillation counter and geiger counter                  Labeling probes (20)
Tissue culture hood                                       Tissue culture (8,19)
UV light source, 15W                                      R-banding (18,19)
Water baths, 45-100°C*                                    Autoradiography (24), ISH (21-23, 25)
Water purification system*                                Preparing reagents

(a) Items that are essential for most protocols are marked with an asterisk.
(b) Protocols in which the items are used are in parentheses.
such as FISH, is that problems are often encountered in the accessibility of chromosomal target DNA to the reagents, and halos of signal are often seen around chromosomal targets that represent diffuse strands of DNA that are more accessible than the compact DNA in the interior of the chromosomes (Pinkel et al., 1986). These problems can be minimized by careful preparation and storage of the prehybridized slides, and (in the case of biotinylated probes) amplifying the
148 Chapter 5/ Sessions
signal by using multiple layers of avidin (Pinkel et al., 1986). Other problems concern the immunochemical procedures. Care must be taken in choosing appropriate primary and secondary antibodies, in performing staining steps in the correct order, and in preventing cross-contamination. Background staining from non-specific binding of one or more of the antibodies is often a problem, and appropriate controls must be performed. Many fluorochromes (e.g., FITC) fade quickly under UV, making photographic documentation difficult or impossible. Confocal laser scanning microscopy (CLSM) in conjunction with computer-assisted image processing has greatly enhanced the resolution of FISH preparations. However, it is expensive and thus not always available.

LABORATORY SETUP

The most essential piece of equipment for cytogenetic studies is a compound microscope equipped with high-quality phase-contrast objective lenses. Other equipment needed for banding procedures, molecular techniques, and even tissue culture is commonly found in most laboratories. Additional specialized (and expensive) equipment that allows the highest quality cytogenetic work includes a confocal scanning microscope, CCD video camera, and computer hardware and software for digitized image analysis. Access to an electron microscope facility is also an advantage. Table 2 lists the major equipment necessary to set up a molecular cytogenetics laboratory. Some useful references are: Nierman and Maglott, 1993 (ATCC/NIH repository catalogue of human and mouse DNA probes and libraries), Haugland, 1992-1994 (a catalogue of fluorescent probes and research chemicals), Ausubel et al., 1992 (short protocols in molecular biology), Rost, 1992 (a description and review of fluorescence microscopy), Freshney, 1994 (a manual of animal tissue culture techniques), Macgregor, 1993 (an introduction to animal cytogenetics), and Therman and Susman, 1993 (a description and review of human chromosome technology). For a complete listing of worldwide suppliers of equipment and supplies, see the most recent issue of The Biotechnology Directory, Stockton Press, New York.

1. Subbed slides
2. Mitotic chromosomes from gut epithelium
3. Mitotic chromosomes from plant root tips
4. Squash technique for mitotic and meiotic chromosomes
5. Yeast method for mitotic chromosomes from small vertebrates
6. Splash technique for slide preparations of mitotic chromosomes
7. Mitotic chromosomes from peripheral blood in vertebrates
8. Mitotic chromosomes from fibroblast cultures (reptiles)
9. Mitotic chromosomes from corneal epithelium
10. Mitotic chromosomes from insect embryos
11. Polytene chromosomes from dipteran salivary glands
12. Lampbrush chromosomes
13. C-banding
14. Q-banding
15. G-banding
16. Fluorochrome R-banding with chromomycin A3
17. AgNOR banding
18. Differential replication banding with BrdU
19. Modification of BrdU banding for salamander embryos
20. Labeling probes for ISH via nick translation
21. Radioisotopic ISH for reiterated sequences using a DNA probe
22. Radioisotopic ISH using an RNA probe
23. Radioisotopic localization of single copy sequences
24. Autoradiography
25. Chromosome painting using FISH
26. FISH with single-copy genomic probe

Protocol 1: Subbed Slides
(Time: ≈1 hr handling plus 24 hr incubation)

Microscope slides should be very clean; washing in hot water and detergent is recommended, but at a minimum slides can be cleaned by soaking them in 95% ethanol to which several drops of glacial acetic acid have been added, and then air-dried. Subbing coats slides with a thin gelatin film. Reference: Macgregor and Varley (1983).

1. Wash slides in hot water and detergent, and rinse copiously in hot water and then distilled water.
2. After a final rinse in distilled water, dip the slides into the subbing solution (Appendix).
3. Drain the slides and dry in a rack overnight at 60°C. Subbed slides can be stored indefinitely in a slide box.

Protocol 2: Mitotic Chromosomes from Gut Epithelium
(Time: incubation from 2-48 hr or more, depending on species, plus ≈1 hr handling)

This technique (from Kezer and Sessions, 1979) works best for amphibians with large chromosomes, but will work for just about any vertebrate (and could easily be modified for invertebrates as well).

1. Give healthy, well-fed animals an intraperitoneal injection of 1.0-5.0% aqueous colchicine, approximately 0.1 ml/g body weight.
2. Let animal incubate at a physiologically comfortable temperature for about 1 hr (mammals), 4 hr (reptiles), 12-24 hr (frogs), or 24-48 hr (salamanders).
3. Kill the animal by overanesthesia (or by preferred method).
4. Remove the stomach and intestines, squeeze out any contents, and open lengthwise using fine pointed scissors. Also remove the spleen, kidneys, and (if male) gonads and make small cuts with scissors.
5. Submerge tissues (separately from each specimen) in a large volume (e.g., 50 ml) of distilled water in a flask or beaker for 10-15 min with agitation. The water should be changed if it becomes cloudy or full of debris.
6. Remove tissues from water, blot briefly on paper towels, and submerge in 50-100 ml of freshly prepared, ice-cold 3:1 fixative (3 parts ethanol, 1 part glacial acetic acid) for at least 15 min on ice (this first volume of 3:1 fixative can be reused for all specimens during a particular fixing session if kept cold).
7. Transfer fixed tissues to a vial with fresh, cold 3:1 fixative. Glass, 20-ml scintillation vials with plastic cone inserts in the caps are ideal for storing tissues fixed in 3:1 (do not use foil liners, as they will decompose into the fixative). The tissues can now be stored indefinitely (10 years or more) at -20°C, or used immediately to make slides using the squash or splash techniques (see Protocols 4 and 6). These tissues even remain suitable for DNA extraction.

Protocol 3: Mitotic Chromosomes from Plant Root Tips
(Time: incubation 12-48 hr or more, depending on species, plus ≈1 hr handling)

Root tips may be obtained either from germinated seeds or from the cleaned roots of adult plants. For potted plants it is best to water liberally 1 or 2 days before taking root tips. Seeds may be germinated on moist filter paper in a petri dish, and roots can be obtained from bulbs by suspending them with toothpicks over dishes of water so that they are partially submerged. Healthy growing root tips are brittle, translucent white, with opaque, tapered tips. The most rapidly dividing
cells are located in the embryonic tissue (meristem) just proximal to the tip. If roots are not available, it is possible to use young leaves or the mitotically active ovary or ovule wall of developing flowers or fruits, or premeiotic cells of pollen mother cells (Dyer, 1979).

1. Immerse roots in a solution of 1.0-5.0% colchicine at room temperature for 12-48 hr (germinated seeds can be left intact). The volume of colchicine can be minimized by using ziplock bags.
2. Cut off distal ends (0.5-1.0 mm) of the root tips and fix the tips in freshly prepared 3:1 ethanol:acetic acid for at least 15 min.
3. Macerate the tissues in 1 N HCl at 60°C for 5 min.
4. Soak the root tips in 45% acetic acid for 1-5 min.
5. Transfer a root tip to a drop of 45% acetic acid on a clean slide, cut off the terminal 1 mm of the root tip (containing the meristem) and discard the rest, and crush and mince very thoroughly with a scalpel or razor blade (do not let the preparation dry out).
6. Make squash preparations (Protocol 4).

Protocol 4: Squash Technique for Mitotic and Meiotic Chromosomes
(Time: <5 min per preparation)

There is a certain amount of art in making good chromosome preparations, and this is particularly true of squashes; practice and perseverance usually are necessary. Once the technique is mastered it is very fast, and it is convenient to set up the slide making station adjacent to the microscope so that each preparation can be examined immediately. This technique is recommended for organisms with large chromosomes. Reference: Kezer and Sessions (1979).

1. Remove a small piece of tissue from 3:1 fixative and submerge it in 45% acetic acid in a small glass dish for at least 2 min (tissue will disintegrate after prolonged exposure).
2. Put a small bit of tissue (e.g., 1 mm²) in a single drop of 45% acetic acid toward one end of a clean slide (subbed slides are recommended, especially if the preparations are to be made permanent).
3. Mince tissue as finely as possible using sharp forceps, scalpel, or razor blade. The result should be a cloudy suspension of cells and small clumps of cells. Remove any clumps, chunks, lint, or other solid bits of debris.
4. Cover the cell suspension with a clean, siliconized, 22-mm² coverslip (coverslips can be siliconized by rubbing with commercially available siliconized paper wipes). To avoid bubbles, the coverslip should be lowered gradually by placing one edge down first, in contact with the suspension on the slide, and then slowly lowering the coverslip with forceps.
5. To squash the cells, put the slide between layers of absorbent paper (e.g., paper towels, or bibulous paper pads), stabilize the coverslip by pushing down firmly with thumb and index finger on the top layer of paper near two edges of the coverslip, and push down very hard with the thumb of the other hand in the center of the coverslip. Slipping of the coverslip, which may ruin the preparation, can sometimes be avoided by tapping gently on the coverslip with a pencil eraser before squashing.
6. The slide can now be examined directly with phase-contrast optics to check for suitable chromosome spreads. A gross phase contrast effect can be obtained with regular bright field optics by defocusing the condenser lens. It is useful at this stage to record the coordinates of particularly good spreads, and, if working with large chromosomes (e.g., salamanders), to photograph selected examples. Photography is particularly useful at this stage if it is a rare specimen and good chromosome spreads are difficult to find, since subsequent treatment of the slides may destroy or degrade chromosome morphology.
7. The slide can be made permanent with the dry ice technique (Conger and Fairchild, 1953) by placing the slide on a block of dry ice (or into a -70°C freezer) for at least 5 min,
then quickly prying off the coverslip with the point of a razor blade and plunging the slide into 95% ethanol for at least 2 min. The slides are then air-dried, and can be stored indefinitely if kept in a sealed slide box with a cotton-stoppered vial of desiccant at 4°C.

Protocol 5: Yeast Method for Mitotic Chromosomes from Small Vertebrates
(Time: >24 hr incubation time, plus ≈3 hr handling)

This technique is based on the stimulation of white blood cell proliferation in bone marrow. For mammals, sufficient bone marrow can be obtained from the long bones of the limbs. For small lizards, bone marrow may also be obtained by removing and crushing the spine (C. Moritz, personal communication). The volumes given are based on tissues obtained from an adult laboratory mouse; they may be reduced or increased for substantially smaller or larger amounts of tissue. This technique may also work without the yeast treatment (especially in lab-raised animals). Reference: Lee and Elder (1980).

1. Inject animals with active yeast suspension (subcutaneously in dorsal region, or directly into body cavity), 0.5 ml/25 g body weight. One injection followed by a 24-hr incubation period is adequate for subadults and newly caught animals, but two or three consecutive injections at 24-hr intervals may be required for others.
2. After the yeast incubation period, inject the animal with 1 mg/ml colchicine, 0.1 ml/10 g body weight, and incubate for 1 hr (shorter incubation times of 20-40 min will yield less condensed mitotic chromosomes as well as fewer spreads).
3. Kill the animal (e.g., anesthetize with halothane or CO2 followed by cervical dislocation) and dissect the upper leg bones (femur) and upper arm bones (humerus) and remove as much soft tissue as possible to expose the ends of the bones.
4. Cut off both ends of each long bone and use a syringe full of hypotonic KCl (0.075 M) inserted in one end to flush out the marrow into a small volume (approximately 3 ml) of hypotonic solution in a 15-ml centrifuge tube. Flick the tube to disperse the cells and, if necessary, add more hypotonic solution to bring the volume up to 6 ml (the solution should be cloudy).
5. Let the cell suspension incubate in the hypotonic solution for 15-20 min at room temperature.
6. Add an equal volume of freshly prepared ice-cold 3:1 fixative (3 parts methanol plus 1 part glacial acetic acid), mixing constantly, and centrifuge at 100 g for 2 min.
7. Discard the supernatant and flick the tube vigorously to loosen the pellet (or use a vortex mixer), then slowly re-suspend the pellet in 4-6 ml of fresh 3:1 fixative with constant mixing, and let fix for at least 10 min on ice.
8. Centrifuge at 100 g for 2 min.
9. Repeat step 7, but re-suspend in <0.5 ml of 3:1 methanol:acetic acid to give a finely dispersed cell suspension.
10. Check the cell density by making a slide (via the splash technique, Protocol 6) and examining under the microscope.
11. Cells can now be stored in fixative in the freezer, or can be used immediately to make slide preparations.

Protocol 6: Splash Technique for Slide Preparations of Mitotic Chromosomes
(Time: <1 min per slide)

Nearly every lab has a slightly different method for obtaining splash chromosome preparations, indicating that many of the parameters are matters of preference rather than necessity. The following is a generalized protocol that usually works.

1. Take a clean, ice-cold, wet slide (slides can be stored in a Coplin jar of distilled water on ice), shake it, hold it with one hand at an approximately 30° angle over a trash can or towels, and splash several drops of a fixed cell sus-
pension from a height of 0.5 m or more onto the slide.
2. Gently blow on the slide surface, and dry the slide on a slide warmer or hot plate at 40°C. Alternatively, slides may be flame-dried by holding over a bunsen burner or alcohol lamp to ignite the alcohol.
3. Check cell density on one or two test slides and adjust the cell concentration if necessary by diluting, or spinning down and re-suspending the cells in a smaller volume of fixative. Cells should be evenly spread and not touching.

Protocol 7: Mitotic Chromosomes from Peripheral Blood in Vertebrates
(Time: 72 hr incubation plus 2-3 hr handling)

This protocol can be used for a variety of mammalian, avian, and reptilian species. Exact volumes depend on the amount of blood used. The culture medium will depend on the species. Standard DMEM or RPMI with 10% fetal bovine serum (FBS) plus antibiotics works well for mammals and birds. L-15 medium is often used for amphibian tissue culture, with the advantage that the cells can be cultured in the absence of artificial CO2. Serum enriched for leukocytes may be obtained by allowing the blood to coagulate at 4°C for 2 hr and then collecting the supernatant (serum plus leukocytes). Lymphocytes in whole blood may be stimulated with a mitogen, usually phytohemagglutinin (PHA) at a concentration of 50 µg/ml in the medium. Reference: Rooney and Czepulkowski (1984).

Part A. Setting Up Cultures
1. Using sterile techniques, dispense 5 ml of culture medium into sterile culture tubes.
2. Add 1-10 drops of blood into each tube.
3. Incubate tubes on their sides, capped ends slightly elevated with caps loosened, at 36-37°C for 72 hr, mixing the contents at least daily.
4. Add additional antibiotic if cultures become cloudy (contaminated).
5. Thirty to 60 min before harvesting, add one drop of 0.025% colchicine to each tube.

Part B. Harvesting and Fixation
1. Centrifuge tubes for 5 min at 200 g.
2. Carefully remove (and discard) supernatant with pipette, down to just above the pellet (do not disturb the pellet).
3. Loosen the cell pellet by flicking the tube (or buzzing it with a vortex mixer).
4. Add warm (37°C) hypotonic solution (0.075 M KCl) to produce a dilute cell suspension:
   a. Add just enough so that it is just possible to see through the suspension (usually 6-8 ml, but check after 3 ml).
   b. Add the hypotonic solution vigorously to suspend pellet.
   c. Let cells incubate 5 min at room temperature.
5. Centrifuge at 200 g for 5 min.
6. Discard supernatant with pipette, and flick tube to loosen pellet.

The next four steps should be done rather quickly.

7. Add more hypotonic solution (approximately 2 ml), followed immediately by 4 drops of freshly made fixative (3:1 methanol:glacial acetic acid).
8. Now mix the cells by bubbling air gently into the solution with a glass pipette (do not suck the cells into the pipette).
9. Centrifuge at 200 g for 5 min.
10. Discard the supernatant with a pipette, and flick very vigorously to loosen pellet.
11. Add a small amount of fresh fixative vigorously, and flick hard to re-suspend pellet.
12. Wash down the inside walls of the tube with two more pipettefuls of fixative.
13. Let cells sit for 10 min in the fixative at room temperature.

Part C. Washing
1. Centrifuge at 200 g for 5 min.
Chromosomes: Molecular Cytogenetics 153
2. Discard the supernatant, and flick gently to loosen pellet (but be careful to keep cells from flying up and sticking to the sides of the tube).
3. Add fresh fixative, rinsing down the walls of the tube to keep cells down.
4. Let sit for 10 min at room temperature.
5. Centrifuge at 200 g for 5 min.
6. Repeat washing steps 1-5 once.
7. Now re-suspend cells in a small amount of new fixative (less than 1/2 pipetteful).
8. The fixed cell suspension may now be stored in fixative in a freezer, or slides can be prepared following the splash technique (Protocol 6). [Note: If slides are to be G-banded (Protocol 15), they should be dried for 24 hr in an oven at 90°C.]

Part D. Staining Slides
1. Place air-dried slides into a clean coplin jar with 50 ml of phosphate buffer (pH 6.8).
2. Add 2.5 ml of Giemsa stain, and squirt up and down to mix well.
3. Let stain for 10 min.
4. Quickly flood out stain with tap water, then rinse once quickly with either distilled water or phosphate buffer.
5. Shake off excess fluid, and allow slides to air-dry.

Protocol 8: Mitotic Chromosomes from Fibroblast Cultures (Reptiles)
(Time: >2 days incubation, plus ≈3 hr handling)

This technique (from Yonenaga-Yasuda et al., 1988) could probably be used for any vertebrate, with appropriate modifications in culture media, incubation times, etc.

1. Sterilize hind legs by successive treatment with 70% ethanol, ether, and merfene.
2. Remove muscles aseptically and place in a small sterile bottle with 5 ml of L15 medium with 5% fetal bovine serum (FBS) and 50 mg/ml gentamycin, for 24 hr at room temperature.
3. Transfer muscles to a petri dish, cut into small fragments, and culture in 2 ml of Dulbecco's medium with 20% FBS and 50 mg/ml neomycin in a culture flask at 30°C (cultures should be gassed with air plus 5% CO2 when the phenol red in the medium indicates a rise in pH).
4. When confluent sheets of cells are seen (>24 hr), add 0.02 ml of 0.16% colchicine, and incubate for 1 hr.
5. Harvest cells by detaching them with 0.125% trypsin in 0.02% EDTA (withdraw medium and discard; rinse cells once with trypsin solution, and discard rinse; add fresh trypsin and incubate 15-30 sec, then withdraw and discard; incubate until cells round up [5-15 min], then disperse in fresh medium; Freshney, 1994).
6. Harvest the cells as in Protocol 7, steps 6-10.

Protocol 9: Mitotic Chromosomes from Corneal Epithelium of Vertebrates
(Time: 8-18 hr incubation plus several hours handling)

This is a reliable technique for obtaining good mitotic chromosome preparations from anurans, and probably works on fish, reptiles, birds, and mammals as well (it does not work very well on salamanders, however, because of their longer cell cycle times). The specific protocol described here is from David M. Green and is designed for frogs and toads (references: Bogart, 1972; Iizuka et al., 1991).

1. Inject animal with 0.1% colchicine in distilled water, using a long 22-gauge needle. Insert the needle under the skin of the upper thigh and work it up the back under the skin into the dorsal lymphatic sac. The needle passes through the membrane dividing the dorsal sac from the sac on the upper thigh. Fill the dorsal lymphatic sac until the skin between the eyes bulges (amount depends upon the size of the frog). Incubate 8 hr (Eleutherodactylus, Hyla), 10 hr (Bufo), 14 hr (Rana), or up to 18 hr (Leiopelma) depending upon temperature and metabolic rate of the frog.
2. Clean a spot plate with ethanol, and fill one well for each eye with distilled water.
3. Kill the frog by immersion in anesthetic solution (1% tricaine methosulfonate) or by applying a glob of benzocaine ointment (Anbesol™) on the top of the head. Alternatively the frog can be pithed.
4. Dissect out the eyes using a fine No. 11 scalpel, being careful not to puncture the eye. To begin, insert the blade under the eye between the eye and the lower eyelid using the blunt, back edge of the blade. Then carefully cut around the eye's connections at the front and back and to the upper eyelid. When free from the eyelids and peripheral muscles to the side, maneuver the scalpel around the back of the eyeball to sever the optic nerve and musculature. When the eye is almost completely free, remove it from any remaining connections and lift it from its socket using fine forceps.
5. Place each eye in distilled water in the spot plate wells, and let sit for one hour.
6. To fix the tissue, pick up the eye with fine forceps and position it so that it can be held with cornea facing down (do this by holding onto the stubs of the muscles at the back of the eye). Suspend the eye for 1 min, 1-2 mm above the surface of a watch glass filled with glacial acetic acid. The fumes will fix the cornea. Do not allow the eye to touch the acetic acid. Place the eye back into its well in the spot plate to check for proper fixation. The eye surface should be cloudy. If it isn't, re-suspend it over the fumes until it is.
7. Arrange at least three or four clean, siliconized coverglasses on the spot plate under a dissecting microscope. Place the eye, cornea up, in a central well. Under low power, use forceps to hold the eye and use a blunt scalpel to carefully scrape off the fixed cornea. Transfer the tissue to a coverglass. Divide the tissue into equal portions and distribute to coverglasses using the scalpel and forceps. Place a drop (or two) of 70% acetic acid onto the tissue on each coverglass.
8. Apply a clean slide to a coverglass in order to pick it up. Turn it right-side up and place it on a black surface (so you can see the tissue). Use the wooden end of a match stick to lightly tap the top of the coverslip to remove bubbles and distribute the tissue over the slide. Place the slide on some absorbent bibulous paper, cover it with another strip of bibulous paper and hold it all in place with the thumb and index finger of your left hand (if you are right-handed). Put your right thumb firmly on top between your left thumb and finger and press with considerable force to squash the whole thing. Remove thumb and paper and seal the coverglass around the edges with rubber cement.
9. Examine with phase-contrast.
10. To make the slide permanent, peel off the rubber cement and freeze the slide by immersing it in liquid nitrogen. Scrape off the remaining coverglass cement, pop the coverslip off with a razor blade or scalpel, and fix the slide in 95% ethanol in a coplin jar. This will not work if the coverglass is not siliconized. Transfer to fresh 95% ethanol after 5 min and then air-dry. The chromosomes can now be banded, hybridized, or stained.

Protocol 10: Mitotic Chromosomes from Insect Embryos
(Time: ≈8 hr)

This technique is modified from Zhan et al. (1984) for orthopterans.

1. Place eggs separately in a petri dish containing filter paper soaked with Mark's M-20 insect culture medium (Gibco) with 7.5% FBS and 5.0 mg/ml actinomycin D, and incubate at 37°C for 4 hr.
2. Transfer eggs to another petri dish containing fresh M-20 medium with 0.16 mg/ml colcemid and incubate for 1 hr at 37°C.
3. Dissect embryos out of eggs in plain M-20 medium.
4. Transfer intact embryos to a centrifuge tube containing 0.075 M KCl hypotonic (≈0.5 ml/embryo) for 30 min at 37°C.
5. Dissociate embryos by gentle pipetting, then centrifuge 2 min at 100 g.
6. Discard supernatant and re-suspend cells in a large volume of fresh 3:1 methanol:acetic acid, and fix for 20 min at room temperature.
7. Centrifuge 2 min at 100 g, decant supernatant, and re-suspend in fresh fixative. Repeat once, re-suspending the final cell pellet in approximately 1 ml of fixative.
8. Use final cell suspension to make splash and/or squash preparations.

Protocol 11: Polytene Chromosomes from Dipteran Salivary Glands
(Time: 5-10 min/preparation)

This technique works best with large, healthy, well-fed larvae of many species of dipteran flies. Third instar Drosophila larvae are usually found crawling up the sides of the culture jar. The paired salivary glands are clear or slightly opaque, somewhat zucchini-shaped, and have pieces of glistening fat body attached. The glands can be seen clearly by using understage lighting on a dissecting microscope. Good polytene chromosomes for in situ hybridization should be flat and gray with no refractivity, and the banding pattern should be clearly recognizable (Macgregor and Varley, 1983; Pardue, 1986).

1. Remove a large third instar larva and place in 45% acetic acid (or in isotonic saline) in the middle of a clean slide.
2. Use needles and/or watchmaker's forceps to pinch off the anterior end of the larva just behind the head segment. It is best to hold the head steady and pull the rest of the body away until paired salivary glands emerge from the anterior opening (if they don't appear immediately, discard and select a fresh larva).
3. Tease off as much fat body as possible without damaging the glands.
4. Transfer (by sliding) the glands to a small drop of 45% acetic acid near one end of the slide, and fix for 1-2 min.
5. Cover with a clean siliconized coverslip and tap lightly on the coverslip directly over the glands with a pencil eraser, to help spread the chromosomes. Monitor the spreading with a phase microscope.
6. When the chromosomes appear well spread, make slides permanent using the squash and dry ice techniques (Protocol 4).

Protocol 12: Lampbrush Chromosomes
(Time: Part A, 15 min; Part B, >1 hr; Part C, several hr; Part D, ≈1 hr)

This technique works for salamanders, and easily can be modified for frogs, reptiles, fishes, and birds. Generally, medium-sized yolky oocytes (i.e., neither the largest nor the smallest) yield the best lampbrush chromosome (LBC) preparations; large, more mature oocytes usually have condensed, featureless LBCs. The best dispersion medium (DM) varies among taxa (J. Kezer, personal communication; Macgregor and Varley, 1983; Callan, 1986). The DM and IM given here are general "all-purpose," and should be tried first.

Part A. Preparing Ovaries
1. Anesthetize or kill animal (e.g., in 0.1-0.2% MS222).
2. Remove one or both ovaries through an incision in the ventral body wall.
3. Transfer ovaries immediately to a dry, clean embryological watch glass or small dish, and keep covered. Ovaries can be stored "dry" at 4°C for 2-3 days if dish is sealed with parafilm.

Part B. Isolation of Nucleus
1. Submerge a small piece of ovary into 5:1 isolation medium (5 parts 0.1 M KCl: 1 part 0.1 M NaCl) in a clean dish.
2. Using watchmaker's forceps, tear open ovary and remove an oocyte. Grasp the oocyte with two forceps and pull laterally to break open the oocyte. The yolky contents will spill out (take care to keep the preparation completely submerged).
3. Locate the translucent nucleus (≈0.3-0.4 mm in diameter in salamanders; 0.1 mm in lizards), and suck in and out of a small-bore, flame-polished Pasteur pipette several times to remove the adherent yolk (the nucleus is sturdy and can be bounced off the bottom of the dish to dislodge adherent bits of yolk).

Part C. Dispersal of Chromosomes
1. Transfer the cleaned nucleus to a dispersion chamber (if permanent preparations are desired, e.g., for ISH, use a bored circular plastic disc with a paraffin-attached coverslip on the bottom) completely filled with dispersion medium (5:1 plus 0.5% paraformaldehyde) so that there is a convex meniscus on top of the chamber.
2. Using a black background under a dissecting microscope, grasp the nuclear envelope at the top of the nucleus with one pair of forceps, then take hold with a second pair very near the first, and pull the two forceps apart with a slightly downward motion (nuclear contents should emerge as a gelatinous mass completely separated from the envelope, which should remain attached to one or both forceps). [Note: Abandon the preparation immediately if the nuclear contents begin to extrude spontaneously through a small hole in the envelope; such preparations will yield only fragmented chromosomes.]
3. Cover the preparation with a coverslip. To avoid the disruptive effects of surface tension, the coverslip must be dropped so that the surface of the coverslip is parallel to the surface of the slide.
4. It takes several hours for the chromosomes to settle onto the floor of the dispersion chamber. Keep the slides refrigerated in a humid chamber during this time.

Part D. To Make Permanent Preparations
1. Place the dispersion chamber into a centrifuge tube fitted with an epoxy plug.
2. Centrifuge, using a swinging bucket rotor, in a prechilled table-top centrifuge that allows speed to be gradually increased over a 3-min period to 2000-3000 g. If the centrifuge is not refrigerated, it can be prechilled by placing dry ice in the chamber for approximately 30 min before use. Centrifuge at 2000-3000 g for at least 15 min.
3. Remove the chambers from the centrifuge tubes, immerse in dispersion medium, and use a razor blade to remove the coverglass on which the chromosomes now rest. Gently swish the coverslip around in the medium to wash away any remaining nucleoplasm.
4. Fix the preparation in 70% ethanol for 5 min.
5. Remove the preparation to fresh 70% ethanol for at least 15 min, then dehydrate in 95% ethanol (2 x 10 min) and air-dry. The preparations are now ready for ISH, but may be stored desiccated at 4°C until needed.

Protocol 13: C-Banding
(Time: 1 day pretreatment, plus ≈2.0 hr)

This method works for mitotic and meiotic chromosomes of most organisms, including plants, insects, urodeles, anurans, birds, reptiles, fish, and mammals (Schmid et al., 1979).

1. Bake permanent, unstained, air-dried chromosome preparation slides for 1 day in a 60°C oven.
2. Place slides in coplin jar with prewarmed (30°C), saturated barium hydroxide for 5 min at 30°C.
3. Rinse very briefly in 0.1 N HCl, followed by a thorough rinse in distilled water (e.g., fill and empty the coplin jar six times).
4. Place slides in coplin jar with prewarmed 2x SSC (Appendix) for 1 hr at 60°C.
5. Rinse in distilled water (2 min).
6. Stain slides in 8.0% Giemsa in phosphate buffer (Appendix), pH 6.8, for 5 min. Load slides into a coplin jar and add 50 ml buffer, then add 4 ml Giemsa and quickly pipette up and down until thoroughly mixed.
7. Rinse out Giemsa stain by flooding coplin jar
with distilled water or fresh buffer to avoid contamination of slides with film that forms on the surface of Giemsa staining solution.
8. Air-dry the slides and mount with a coverslip in a xylene-based mounting medium (e.g., DePeX™ or Permount™).

Protocol 14: Q-Banding
(Time: ≈15 min/preparation)

This protocol is from Benn and Perle (1986).

1. Place slides in 0.5 mg/ml quinacrine dihydrochloride stain for 10 min at room temperature.
2. Rinse briefly in distilled water to remove excess stain.
3. Soak in a coplin jar of McIlvaine's buffer (Appendix) for 1 min.
4. Mount in a few drops of buffer, aquamount, or 100% glycerol using a thin glass coverslip.
5. Examine and photograph immediately with fluorescent optics using a filter combination appropriate for fluorescein (FITC; e.g., Zeiss filter set No. 9, BP 450-490 nm, FT 510 nm, and LP 520 nm); the fluorescent image fades quickly.

Protocol 15: G-Banding
(Time: ≈7 min/slide plus 1 hr drying time)

This protocol is from Benn and Perle (1986).

1. Age slides by placing them in a hot oven (60°C) overnight.
2. Agitate slides for a few seconds in 0.005% trypsin in PBS (Appendix); optimal time varies widely for different preparations.
3. Rinse in three changes of ice-cold PBS (dip consecutively in each coplin jar).
4. Stain for 5 min in 5% Giemsa solution in phosphate buffer, pH 6.8.
5. Remove Giemsa by flooding under a gentle stream of water.
6. Air-dry slides and cover with a xylene-based mounting medium (DePeX™ or Permount™).

Protocol 16: Fluorescent R-Banding with Chromomycin A3
(Time: ≈30-45 min)

This stain produces reverse banding (relative to G-banding) in mammals, and stains NORs in salamanders, fishes, and some plants (Hack and Lawce, 1980; Sessions and Kezer, 1987).

1. Place air-dried slides in a humid chamber and flood each preparation with at least 50 µl of 5 µg/ml chromomycin A3 in chromomycin buffer (see Appendix), cover with a coverslip, and stain for 20 min in the dark at room temperature.
2. Rinse off the chromomycin with distilled water, and place slides (no more than three at a time) in coplin jar with methyl green counterstaining solution (2 ml stock solution in 50 ml phosphate buffer, pH 6.8) for 6 min.
3. Rinse in distilled water.
4. Air-dry, then mount in 100% glycerine or aquamount and examine under UV epifluorescence optics using an appropriate filter combination (see Protocol 14, "Q-Banding").

Protocol 17: AgNOR Banding
(Time: ≈1 min/slide)

This fast, easy, and reliable technique was published by Hsu (1981), and seems to work for all organisms. Use aged (at least 1 day), air-dried slides.

1. Mix 2 parts of 50% (w/v in distilled water) silver nitrate solution and 1 part developer (Appendix) in a glass vial (allowing at least 150 µl for each preparation), and mix thoroughly.
2. Add 3 drops to each preparation and quickly add a coverslip.
3. Incubate at 90°C for 30-60 sec (or until staining solution has turned muddy yellowish brown).
4. Rinse off coverslip with distilled water (using a squirt bottle, or rinse in a beaker of water).
5. Air-dry slides, and mount in oil or permanent mounting medium.

Protocol 18: Differential Replication Staining with BrdU
(Time: several days incubation, plus 1 full working day)

This technique can be used to obtain complex chromosomal banding patterns in organisms in which more conventional banding methods do not work (Dutrillaux, 1975; Benn and Perle, 1986).

1. Set up tissue culture cells.
2. Five to seven hours before addition of colcemid, add 0.01 M bromodeoxyuridine (BrdU) and 0.01 M deoxycytidine to make final concentration of 10⁻⁴ M each.
3. One hour before harvest add colcemid to final concentration of 0.1%.
4. Harvest and make slides via the splash or squash technique (see above).
5. Soak air-dried slides in PBS for 5 min at room temperature.
6. Stain in 0.5 µg/ml Hoechst 33258 for 10 min at room temperature.
7. Mount in McIlvaine's buffer (Appendix).
8. Irradiate slides at approximately 5 cm from a 15-W UV light source at 50°C for 15 min or under a 75-W growlamp for 24 hr.
9. Rinse coverslip away with distilled water, and incubate slides for 15 min in 2x SSC at 65°C.
10. Stain slides in 8% Giemsa in phosphate buffer pH 6.8 for 5-10 min.
11. Air-dry slides and mount in a xylene-based medium.

Protocol 19: Modification of BrdU-Banding for Salamander Embryos
(Time: 32 hr incubation, 3-5 days slide aging, plus ≈3 hr)

This technique yields complex banding patterns comparable to G-bands in salamanders (Kuro-o et al., 1986; Kohno et al., 1991).

1. Wash dejellied embryos in several changes of sterile amphibian saline.
2. Transfer embryos to culture dish (35-mm diameter) containing 1.5 ml of 60% Eagle's minimum essential medium (MEM) with 20% FBS, 20% sterile-filtered water, and 400 µg/ml BrdU.
3. Disrupt embryo with a sterile Pasteur pipette, and incubate cells in a darkened, humidified incubator under a constant flow of air with 5% CO2 for 24 hr at 20°C.
4. Add another 1.5 ml of medium containing 1.0% colchicine and incubate 8 hr at 20°C.
5. Centrifuge cells and medium at 120 g for 7 min.
6. Discard supernatant, re-suspend cells in 10 ml hypotonic solution (amphibian saline diluted 3:7 with distilled water), and incubate for 1 hr at room temperature.
7. Add 0.5 ml fresh 3:1 methanol:acetic acid fixative and fix for 10 min.
8. Centrifuge at 420 g for 5 min, replace supernatant with fresh fixative, and fix for 5 min.
9. Repeat step 8 twice, but re-suspend final cell pellet in approximately 1 ml fixative, and use to make splash and/or squash preparations.
10. Age slides for 3-5 days at room temperature.
11. Stain with 50 µg/ml Hoechst 33258 in calcium- and magnesium-free PBS for 15 min.
12. Rinse briefly in distilled water, mount in PBS, and expose to UV light at a distance of 10 cm for 30 min.
13. Remove coverslips, rinse briefly in distilled water, then incubate in 2x SSC for 30 min at 60°C.
14. Rinse slides in running water, then stain in 3% Giemsa in PBS at pH 6.5 for 4 min.
15. Air-dry slides, and mount in a xylene-based mounting medium.
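
Several steps in Protocol 19 specify a mixture by parts rather than by volume: the hypotonic solution is amphibian saline diluted 3:7 with distilled water (10 ml needed), and the fixative is 3:1 methanol:acetic acid (0.5 ml needed). The sketch below converts such ratios into component volumes. It assumes that "3:7" means 3 parts saline to 7 parts water; the function name `split_by_parts` is hypothetical and is not part of the protocol.

```python
def split_by_parts(total_volume_ml, parts):
    """Split a total volume into component volumes given a ratio in parts.

    Assumption: a ratio such as 3:7 is read as 3 parts of the first
    component to 7 parts of the second (10 parts in total).
    """
    if total_volume_ml <= 0:
        raise ValueError("total volume must be positive")
    if not parts or any(p <= 0 for p in parts):
        raise ValueError("every part must be positive")
    total_parts = sum(parts)
    return [total_volume_ml * p / total_parts for p in parts]


# Hypotonic solution: 10 ml of amphibian saline diluted 3:7 with water.
saline_ml, water_ml = split_by_parts(10, (3, 7))

# Fixative: 0.5 ml of 3:1 methanol:acetic acid.
methanol_ml, acetic_acid_ml = split_by_parts(0.5, (3, 1))

print(f"saline {saline_ml} ml, water {water_ml} ml")
print(f"methanol {methanol_ml} ml, acetic acid {acetic_acid_ml} ml")
```

The same helper applies to any other by-parts recipe in the chapter, such as the 5:1 isolation medium of Protocol 12.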
Protocol 20: Labeling Probes for ISH via Nick Translation
(Time: 3-4 hr)

Nick translation is the most commonly used method for both radioisotopic and non-isotopic labeling of probes for in situ hybridization. The radioisotopic method is designed to label probes with tritium, which are then detected with autoradiography (Macgregor and Varley, 1983; Pardue, 1985, 1986; Malcolm et al., 1986). The protocol for non-isotopic nick translation is designed for biotin-avidin labeling, used in most FISH (fluorescence in situ hybridization) and chromosome-painting protocols (Ausubel et al., 1992). HPLC-purified nucleoside triphosphates have limited shelf life in solution, but are stable for up to 1 year when stored as aliquots at -20°C. Deoxyribonucleoside triphosphates (dNTPs) can be purchased as ready-made 100 mM solutions, or they can be purchased in lyophilized form (Ausubel et al., 1992). Nick translation kits are commercially available.

Nick Translation for Tritium-Labeled Probes
This method utilizes 1 µg of DNA and produces enough probe for at least 10 slides, with a specific activity of 2-6 x 10⁶ cpm/µg.

1. Mix 2 x 10" pmol each of tritiated precursors (dNTPs) in a microcentrifuge tube.
2. Aliquot 18 µl of the mixture into ethanol-washed microcentrifuge tubes; quickly freeze-dry under vacuum to prevent radiolysis.
3. Add to a tube containing the dried, tritiated dNTPs: 10 µl of nick translation buffer (Appendix), 5 µl of DNA (1 µg), and glass-distilled water to make a total of 94 µl.
4. Incubate the mixture at 15°C for 10 min, then chill the tube in ice water.
5. Add 5 µl (12.5 U) of DNA polymerase I to make a total volume of 99 µl.
6. Add 1 µl of diluted (1 µg/ml) DNase I (1 mg/ml) stock (dilute stock immediately before use).
7. Incubate at 15°C for 1 hr.
8. Slow the reaction by placing the tube on ice, and determine the percentage incorporation of radioactive nucleotides with the following procedure:
a. Mix 5 µl of the reaction mixture with 995 µl of TCA/BSA in a microcentrifuge tube, and keep on ice for 15 min.
b. Pass 5 ml of ice-cold 5% TCA through a 2.5-cm-diameter Whatman GF/C glass fiber filter followed by the TCA/BSA reaction mixture.
c. Wash the filter three times with 5 ml of cold 5% TCA, and dry the filter at 65°C for 20 min.
d. Measure the radioactivity of the filter in a scintillation counter using a toluene-based scintillation fluid.
e. To measure total incorporated and unincorporated radioactivity in the reaction mixture, take another 5-µl sample of the reaction mix and put it directly on a clean filter without TCA, dry it, and count it.
f. The percentage incorporation of radioisotopes into the probe is determined by comparing counts between the two filters. The TCA-treated filter should have 20-60% of the counts obtained from the untreated filter. The DNA should not be used if it shows less than 10% incorporation.
9. Stop the nick translation reaction by adding 100 µl of water-saturated phenol and mixing well with a Pasteur pipette.
10. Centrifuge at 5000 g for 5 min.
11. Unincorporated nucleotides can be removed by loading the aqueous supernatant directly onto a Sephadex G-50 column (see Chapter 8) that has been prewashed with distilled water.
12. Elute with distilled water and collect consecutive fractions of 30 drops each. Count 5 µl of each fraction in a scintillation counter (using a tergitol scintillator) and combine the fractions containing the first peak of radioactivity to come off the column.
13. Freeze-dry these combined fractions and redissolve them in 50 µl of distilled water.
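
The incorporation check in step 8 of the tritium nick translation comes down to a ratio of two filter counts, since both filters receive a 5-µl sample of the same reaction. A minimal sketch of that arithmetic follows. The function names are hypothetical, background counts are ignored for simplicity (an assumption), and the label used for values between 10% and 20% is also an assumption, because the protocol gives only the expected range (20-60%) and the rejection threshold (below 10%).

```python
def percent_incorporation(tca_filter_cpm, untreated_filter_cpm):
    """Percentage of label incorporated into the probe.

    tca_filter_cpm: counts on the TCA-washed filter (incorporated label only).
    untreated_filter_cpm: counts on the untreated filter (incorporated plus
    unincorporated label).
    """
    if untreated_filter_cpm <= 0:
        raise ValueError("untreated filter counts must be positive")
    if tca_filter_cpm < 0:
        raise ValueError("TCA filter counts cannot be negative")
    return 100.0 * tca_filter_cpm / untreated_filter_cpm


def assess_probe(percent):
    """Classify a probe using the thresholds stated in step 8f."""
    if percent < 10:
        return "do not use"
    if percent < 20:
        # Not covered explicitly by the protocol; flagged here as marginal.
        return "below expected range"
    if percent <= 60:
        return "expected range"
    return "above expected range"


# Hypothetical counts, for illustration only.
pct = percent_incorporation(tca_filter_cpm=30_000, untreated_filter_cpm=100_000)
print(f"{pct:.1f}% incorporation: {assess_probe(pct)}")
```

With real data, subtracting a background reading from both counts before taking the ratio would be a reasonable refinement.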
Non-Isotopic Nick Translation for Biotin-Labeled Probes
In this method, biotin-11-dUTP is substituted for dTTP in a standard nick translation reaction mixture. Other biotinylated nucleotides can be used, e.g., biotin-14-dATP, with appropriate adjustments of the unsubstituted dNTPs. The concentration of DNase I must be determined empirically and should be adjusted to ensure a size range of 100-500 nucleotides. Incorporation is 50 biotin molecules/kb DNA. Since non-isotopically labeled probes have a long shelf life (>2 years), many micrograms of DNA can be labeled in one reaction to provide probes for multiple experiments.

1. Prepare 100-µl reaction mix containing 10 µl each of 10x E. coli DNA polymerase I buffer (Appendix), 0.5 mM 3dNTP mix (minus dTTP), 0.5 mM biotin-11-dUTP stock, and 0.1 M 2-mercaptoethanol, then add 2 µg of DNA, 20 U of E. coli DNA polymerase I, DNase I stock (1 mg/ml in 0.15 M NaCl/50% glycerol) diluted 1:1000 in cold water immediately before use, and make up to a total volume of 100 µl with glass-distilled water.
2. Incubate reaction mix 2-2.5 hr at 15°C.
3. Remove 6 µl and place remainder on ice. Boil the 6-µl aliquot 3 min and place on ice 2 min. Load aliquot on agarose minigel, along with suitable sized markers (0.1-10 kb; see Chapter 8). Run gel quickly (15 V/cm) in case additional incubation is necessary.
4. If digested DNA is between 100 and 500 nucleotides, proceed to step 5. If probe size is larger than 500 nucleotides, add second aliquot of DNase I (added in a more concentrated form to minimize volume changes) and incubate further.
5. Add 2 µl of 0.5 M EDTA (10 mM final) and 1 µl of 10% SDS (0.1% final) to reaction.
6. Heat 10 min at 68°C.
7. Prepare a spin column using Sephadex G-50 or Bio-Gel P-60 (see Chapter 8), washing the column 3 to 4 times with 100 µl SDS column buffer before loading the sample. An additional wash after loading the sample is unnecessary.
8. Separate biotinylated probe from unincorporated nucleotides using the prepared column. Eluted probe should be 20 ng/µl (for 2 µg nick-translated DNA); it can be stored at -20°C for years.

Protocol 21: Radioisotopic ISH for Reiterated Sequences Using a DNA Probe
(Time: Part A, 3.5 hr; Part B, 15 min; Part C, 6-12 hr; Part D, 1.5-2 hr)

Cytological preparations to be hybridized should be air-dried on subbed slides. This method provides enough hybridization reaction mixture for approximately 10 slides, assuming 30 µl with 10⁵ counts per slide (Macgregor and Varley, 1983; Pardue, 1986).

Part A. Slide Pretreatment
1. Place slides horizontally in a humid chamber (with black filter paper on the bottom to make it easier to see the cytological material on the slides). For hybridization to a DNA target (but not for an RNA target), put 200 µl of RNase mixture on each preparation, cover with a coverslip (22 x 40 mm), and incubate in the humid chamber at 37°C for 2 hr.
2. Remove coverslips by dipping slides in a large beaker (500 ml) of 2x SSC (so that the coverslips float off), and wash in 2x SSC in a coplin jar, 3 x 10 min.
3. Denature the target DNA by placing the slides in 0.07 M NaOH at 20°C for 3 min.
4. Wash and dehydrate the slides in three changes of 70% ethanol and two changes of 95% ethanol, 10 min each, and air-dry.

Part B. Preparation of the Hybridization Mixture
1. Dissolve labeled DNA probe (freeze-dried or ethanol-precipitated) in 30 µl of 0.1 M NaOH in a microcentrifuge tube by flicking the tube with a finger; total radioactivity should be approximately 3 x 10⁶ cpm.
2. Add 150 µl of formamide stock (Appendix) to make a final concentration of 50% and mix well.
3. Add 60 µl of 20x SSC (to make final concentration of 4x SSC) and mix well.
4. Add 30 µl of distilled water, mix well, and cool on ice for 5 min.
5. Add 30 µl of 0.1 M HCl (i.e., enough to exactly titrate the 0.1 M NaOH) and mix well.
6. Keep the hybridization reaction mixture on ice and use within 10-15 min.

Part C. The Hybridization Reaction
1. Place pretreated, air-dried slides horizontally in humid chambers (again using black paper).
2. Place 30 µl of the hybridization reaction mixture (which has been kept on ice) in the middle of the preparation of each slide.
3. Place a 22-mm² glass coverslip over each preparation, avoiding bubbles and making sure that the entire preparation is covered with reaction mixture.
4. Cover the humid chambers and incubate at 37°C for 6-12 hr.

Part D. Washing the Slides
1. Lift each slide from the humid chamber and remove the coverslip by dipping into a large volume of 2x SSC.
2. Place the slides in a coplin jar of fresh 2x SSC at 65°C for 15 min.
3. Wash in 2x SSC, 2 x 10 min at room temperature.
4. Place the slides in a coplin jar of 5% TCA at 5°C for 5 min.
5. Wash in 2x SSC, 2 x 10 min at room temperature.
6. Wash in 70% and 95% ethanol, 2 x 10 min each.
7. Air-dry. The slides are now ready for autoradiography (Protocol 24).

Protocol 22: Radioisotopic ISH Using an RNA Probe
(Time: 6-12 hr incubation plus 6-8 hr)

This protocol is based on Macgregor and Varley (1983) and Pardue (1986).

1. Preparation of RNA probe: lyophilized or ethanol-precipitated RNA should be dissolved in 2x SSC or in 4x SSC/50% formamide to provide a total of 2-3 x 10" cpm/ml and 30 µl per slide.
2. Slide pretreatment and hybridization reaction: same as for DNA-DNA hybridization (Protocol 21).
3. Remove coverslips by dipping in large volume of 2x SSC.
4. Place slides in fresh 2x SSC for 15 min at room temperature.
5. Treat each slide with ribonuclease mixture (Appendix) at 37°C for 1 hr.
6. Wash slides in 2x SSC, 2 x 15 min.
7. Place slides in 5% TCA at 5°C for 5 min.
8. Wash slides in 2x SSC, 2 x 10 min.
9. Wash slides in 70% and 95% ethanol, 2 x 10 min each.
10. Air-dry the slides; they are now ready for autoradiography (Protocol 24).

Protocol 23: Radioisotopic Localization of Single-Copy Sequences
(Time: 8-16 hr incubation plus 5-6 hr)

This protocol is from Harper and Saunders (1984). Use recombinant bacteriophage or plasmid DNA containing single-copy sequences of interest. Probes should be labeled with tritiated dNTPs by nick translation to 20-40 x 10⁶ cpm/µg (see Protocol 20).

1. Pretreat slides with RNase (as in Protocol 21), rinse in 2x SSC, and dehydrate in ethanol.
2. Dissolve probe in 50% formamide, 2x SSC, 10% dextran sulfate, pH 7.0, along with 500-fold excess sonicated salmon sperm DNA carrier.
3. Denature probe mixture (85°C for 3-15 min, then chill quickly on ice).
4. Apply chilled, denatured probe mixture to slide preparations and cover with a coverslip.
5. Incubate in a humid chamber at 37°C for 8-16 hr.
6. Rinse thoroughly (e.g., 3 x 5 min each) in 2x SSC/50% formamide, pH 7.0, and then 2x SSC, pH 7.0, at 39°C, followed by dehydration in ethanol (e.g., 70 and 95%, 2 x 10 min each).
7. The slides are now ready for autoradiography (Protocol 24). Slides require an exposure time of 5-22 days at 4°C; keep the slides in a light-proof slide box (e.g., taped with black electric tape) along with desiccant in a cotton-stoppered vial.
8. G-band chromosomes with Wright stain (see Protocol 24).

Protocol 24: Autoradiography for Detection of Radioisotopic ISH
(Time: Part A, 30 min; Part B, 2-4 hr; Part C, ≈30 min; Part D, ≈30 min)
Autoradiography requires three main tasks: diluting and aliquoting a new batch of emulsion, coating the slides, and developing the exposed autoradiographs (Macgregor and Varley, 1983).

Part A. Diluting, Aliquoting, and Storing Emulsion
1. Open package of emulsion (e.g., Kodak NTB2) in the darkroom either in complete darkness or under a safelight (e.g., Kodak 8152-2525) and warm the bottle for 30 min in a 45°C water bath along with a 200-ml flask of distilled water and an empty 500-ml beaker.
2. After the emulsion has melted, pour the entire contents of the bottle very slowly down the side of the prewarmed 500-ml beaker and return beaker to the water bath.
3. Fill the empty plastic emulsion bottle with prewarmed distilled water from the Erlenmeyer flask, mix gently, and pour the contents slowly down the side of the 500-ml beaker.
4. Thoroughly mix the contents of the beaker by swirling gently so as to prevent the formation of bubbles.
5. Dispense the diluted emulsion into scintillation vials, approximately 10 ml per vial. This is enough emulsion to coat approximately 30 slides.
6. Wrap each vial in aluminum foil, place them in a light-proof box, and store at 3-5°C in a refrigerator that is never used for radioisotopes or organic solvents. Stored in this way, the emulsion may be good for at least 5 years.

Part B. Coating and Exposing the Slides
1. Working in complete darkness or under a safelight, place the sealed vial of emulsion and a slide dipping chamber into the 45°C water bath for 15-20 min (the dipping chamber can be stood in a beaker or diagonally in a coplin jar filled with water, and should be immersed to within 0.5 cm of its top edge).
2. Fill the dipping chamber by slowly pouring the emulsion down its side to avoid bubbles (a small funnel is useful).
3. Dip the slides slowly and smoothly into the chamber, one at a time (taking care not to touch the emulsion with fingers), withdraw and drain briefly against the edge of the chamber, and place in a slide rack to dry.
4. Air-dry the slides for at least 2 hr in complete darkness.
5. Store the slides for exposure in light-proof slide boxes sealed with black electric tape. Moisture during the exposure time can cause the latent image to fade, so it is important to place a vial of desiccant into each slide box. The vial of desiccant should be loosely plugged with cotton and can be held in place with a blank microscope slide. The slide boxes should be stored at 4°C for the appropriate exposure time (since the exposure time is determined empirically, it is important to include some expendable test slides).
6. After the required exposure time, the slides should be warmed to room temperature and developed according to the following procedure (all solutions must be at the same temperature, 15-20°C, to avoid cracking or wrinkling the emulsion).

Part C. Developing the Autoradiographs
1. In the dark: gently rock the preparation in freshly mixed developer (e.g., Kodak D-19), 2.5 min at 20°C (a single coplin jar can be used if developing 10 or fewer slides).
2. Pour out developer and replace with fixer; fix for 5 min at 20°C (lights can come back on after 2 min in fixer).
3. Pour out fixer and rinse slides in distilled water at least five times, 2 min each at 20°C.
4. Air-dry the slides.

Part D. Post-Autoradiography Staining with Wright's Stain
This procedure sometimes results in G-banding in mammalian (especially human) chromosomes (Chandler and Yunis, 1978; cited in Pardue, 1985).
1. Stain for 5 min in 5% Giemsa solution in phosphate buffer, pH 6.8.
2. Remove Giemsa by flooding under a gentle stream of water.
3. Air-dry slides and cover with a xylene-based mounting medium (DePex™ or Permount™).

For G-bands (mammalian chromosomes):
1. Place the slides in a solution of Wright's stain (15 ml in 45 ml of phosphate buffer, pH 6.8) for 8-10 min.
2. Rinse briefly in distilled water.
3. Enhance staining contrast by destaining slides in 95% ethanol (2 min), chloroform (15 sec), 95% ethanol plus 1% HCl (30 sec), 100% methanol (2 min), and then restaining in Wright's stain (6-8 min); repeat at least once.
4. Rinse in distilled water, air-dry, and mount in Permount™ or other xylene-based mounting medium.

Protocol 25: Chromosome Painting using FISH
(Time: Part A, 2-3 hr; Part B, 12-18 hr; Part C, 6 5 hr)
This technique has been used to locate ribosomal DNA on chromosomes of vertebrates, and can be used for chromosome painting using chromosome-specific probes (many are commercially available). Probes are labeled with biotin or digoxigenin (Protocol 20), and the signal is amplified and detected using avidin-biotin and immunochemistry (C.A. Porter et al., 1991).

Part A. Preparation and Denaturation
1. Use air-dried or flame-dried permanent slide preparations.
2. Treat with RNase (100 µg/ml in 2x SSC, pH 7.0) for 1 hr at 37°C.
3. Rinse 3 x 3 min in 2x SSC.
4. Dehydrate in 70, 80, and 95% ethanol.
5. Denature for 2-4 min (determine empirically, starting with 2 min) at 70°C with prewarmed 70% formamide (Kodak ACS) in 2x SSC (pH 7.0).
6. Wash at least 3 times in ice-cold 70% ethanol.
7. Dehydrate in 80 and 95% ethanol, and air-dry.

Part B. Hybridization
1. Make hybridization reaction mix:
a. biotinylated probe DNA, 1-3 µg/ml in 2x SSC
b. 500 µg/ml E. coli carrier DNA
c. 30% formamide
2. Denature the probe by heating hybridization mix to 70°C for 5 min, then immediately cool by placing on ice.
3. Add 30 µl of hybridization reaction mixture to the preparation on each slide, cover with coverslips (22 mm²).
4. Seal coverslips with rubber cement.
5. Incubate at 37°C for 12-18 hr in a humid chamber.
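
The hybridization reaction mix in Protocol 25, Part B, is specified by final concentrations, so the volume of each stock follows from C1 x V1 = C2 x V2. The Python sketch below illustrates that arithmetic only. The stock concentrations, the number of slides, and the helper name `mix_volumes` are assumptions made for the example and are not part of the protocol.

```python
# Sketch: stock volumes for a hybridization mix, using C1 * V1 = C2 * V2.
# Every stock concentration below is an assumed, illustrative value.

def mix_volumes(total_ul, components):
    """Return {name: volume in µl} for each stock, plus the water needed.

    components maps a name to (final_conc, stock_conc), both in the same units.
    """
    volumes = {}
    for name, (final_conc, stock_conc) in components.items():
        if final_conc > stock_conc:
            raise ValueError(f"{name}: final concentration exceeds stock")
        volumes[name] = total_ul * final_conc / stock_conc
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    slides = 10                      # assumed batch size
    total = slides * 30.0            # 30 µl of mix per slide
    recipe = {
        "probe (µg/ml)": (2.0, 50.0),             # final within 1-3 µg/ml; stock assumed
        "carrier DNA (µg/ml)": (500.0, 10000.0),  # stock assumed
        "formamide (%)": (30.0, 100.0),
        "SSC (x)": (2.0, 20.0),                   # 20x stock assumed
    }
    for name, vol in mix_volumes(total, recipe).items():
        print(f"{name}: {vol:.1f} µl")
```

In practice a small overage is usually prepared to allow for pipetting losses; the sketch omits this for simplicity.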
Part C.
1. Remove the coverslips by immersion in a beaker of 2x SSC at room temperature, and peel off the adhesive.
2. Wash slides 5 x 2 min in 2x SSC, pH 7.0, at 40-42°C.
3. Add 30 µl avidin-FITC (3 µg/ml in 5% bovine albumin in 1 ml PBST), add a coverslip, and incubate at 37°C for 1 hr in a humid chamber.
4. Wash 3 x 3 min in PBS.
5. Add primary biotinylated goat anti-avidin antibody: 7.5 µg/ml in PBST with 5% goat serum, add a coverslip, and incubate in a humid chamber at 37°C for 1 hr.
6. Wash slides 3 x 3 min in PBS.
7. Repeat steps 3-4 at least once.
8. Counterstain slides with propidium iodide (PI, 0.4 µg/ml in PBS) and/or 4,6-diamidino-2-phenylindole (DAPI, 0.8 µg/ml in PBS), for 10 min.
9. Mount in anti-fade mounting medium.
10. View with UV epifluorescence at 436 nm for simultaneous observation of fluorescein (yellow) and total DNA (orange). View at wavelength of 365 nm for DAPI-stained total DNA (blue).
11. Photography: For color slides, use Kodachrome 64, ASA 160, 20-30 sec darkfield exposure settings; for color prints, use Kodak Ektar 1000, ASA 200, 20-30 sec exposure with darkfield settings (K. Anderson, personal communication). Some adjustments of exposure times may be necessary depending on equipment.

Protocol 26: FISH with Single-Copy Cosmid Probe
(Time: Part A, 2-3 hr; Part B, 12-18 hr; Part C, 4-5 hr)
Large cosmid clones (a cosmid is a lambda-derived plasmid vector containing a selectable marker, a plasmid origin of replication, a site into which DNA can be inserted, and a cos site from phage λ) are most suitable for probes to detect single-copy genes, although cDNA probes are also used. Large cosmid probes often contain repetitive sequences that must be suppressed prior to hybridization to avoid non-specific hybridization. Suppression can be achieved via competition with unlabeled human Cot DNA sequences enriched for repeat sequences. Cot DNA can be included at the precipitation step of the nick translation reaction (Freshney, 1994).

Part A. Chromosome Preparation and Denaturation
(Procedure is same as in Protocol 25.)

Part B. Hybridization
1. Re-suspend biotin- or digoxigenin-labeled probes in hybridization buffer: 2-10 ng/µl probe DNA in 50% formamide, 5% dextran sulfate, and 500 µg/ml salmon sperm DNA in 2x SSC.
2. For probes containing repetitive sequences, denature probe mix by heating it to 70°C for 10 min, then incubate at 37°C for 1 hr before application to slide; for probes without repetitive sequences, heat probe mix to 70°C for 10 min, then chill on ice for 10 min.
3. Apply 10-30 µl of denatured probe mix over area of spreads and cover with a 22-mm² coverslip.
4. Seal coverslip with rubber cement.
5. Incubate slides overnight in a moist chamber at 37°C.

Part C. Probe Detection
1. Remove coverslip by immersion in 2x SSC at room temperature and peel off rubber cement.
2. Soak slides 2 x 10 min in 50% formamide in 1x SSC at 42°C in a coplin jar.
3. Wash slides 2 x 10 min in 2x SSC at 42°C.
4. Rinse slides in PBST.
5. Add blocking serum (3% bovine serum albumin, BSA, in PBST), 30-60 min at 37°C.
6. Add primary antibody (polyclonal anti-biotin or anti-digoxigenin, in blocking serum), 100 µl per slide (dilute the antibody according to supplier's directions, or determine empirically), cover with coverslip, and incubate in moist chamber for 1 hr at 37°C.
7. Remove coverslip (as above), and wash 3 x 3 min in PBST at room temperature.
8. Add secondary, FITC-conjugated IgG in blocking serum (diluted according to supplier's directions, or determined empirically), and incubate 30 min at 37°C in a moist chamber. [Note: Be sure to use correct antibody combinations; for example, rabbit polyclonal antibody to biotin as primary, followed by FITC-conjugated goat anti-rabbit IgG.]
9. Soak off coverslip as before, and wash 3 x 3 min in PBS at room temperature.
10. Counterstain with PI or DAPI as in Protocol 25.
11. Mount in anti-fade mount for fluorochromes (e.g., Aquamount™).
12. View slides and photograph using epifluorescence microscopy and appropriate filters (see Protocol 25).

INTERPRETATION AND TROUBLESHOOTING

Chromosome Bands
Chromosome bands can be scored in terms of their position within and between chromosomes, as well as their relative sizes (Figure 4). The terminology used for the position of bands and other markers differs for different organisms, but has been standardized for dipterans (especially Drosophila; Sturtevant and Novitski, 1941) and various species of mammals (Paris Conference, 1971; CSKRN, 1973; ISCN, 1981; Rooney and Czepulkowski, 1986). Banding data usually are presented as a karyotype constructed of chromosomes cut from a photomicrograph. It is helpful to include an idiogram, which indicates relative lengths of chromosomes and positions of bands, especially if the banding pattern is complex.
Chromosome preparation and banding can be capricious, depending on the particular organism and the kind of banding. Usually it is easiest to obtain good chromosome preparations from freshly caught, healthy, well-fed individuals (although there are always exceptions). Colchicine may be light-sensitive when in solution, so it is advisable to make it up fresh just before use, and to keep it refrigerated. Colchicine powder should be kept refrigerated and desiccated.
Among available banding techniques, C-banding is perhaps the most foolproof, reliable method for most organisms, although G-banding works reliably for most species of mammals. Failure to band using the C-banding protocol may be due to poorly aged slides, inferior Giemsa, or the absence of stainable heterochromatin in the chromosomes. It is best, therefore, to make sure that the procedure works on an organism known to have good C-bands before trying it on untested species. Also, if banding is not produced the first time, it sometimes can be induced by treating the same slides a second time; this is especially important for rare or small organisms from which few preparations are available. Sometimes the same slides may be used for several different banding procedures, using the less stringent methods first (e.g., fluorochrome banding then G-banding then C-banding and/or AgNOR banding). If fluorochrome banding does not work, then either the dye is no good (e.g., it is too old or incorrectly prepared), the wrong excitation filter was used, or the chromosomes are devoid of the kinds of sequences for which the fluorochrome is specific.

In Situ Hybridization
There are many reasons why ISH may fail to work. The most common problem is that the hybridization signal is too weak or the background signal is too high. Ideally, there should be sufficient signal (silver grains or fluorescence) to locate sites of hybridization unambiguously, but not so much that details of chromosome structure are obscured. For radioisotopically labeled repetitive sequences, hybridized sites are often visibly obvious (Plate 1), but demonstration of single-
copy sites usually involves an analysis of silver grain distribution in at least 10 different preparations (Figure 2). Counting the number of grains can yield information on target size or number that can be used to detect amplification or diminution of particular sequences, or duplication events during stages of the cell cycle (Sessions, 1982). For FISH, the most common problem is a signal that is too weak, or fades too quickly to photograph. This is compounded by the fact that different filters often have to be exchanged to detect multiple probes labeled with different fluorochromes.
Possible reasons for difficulties with radioisotopic ISH include: (1) probe was not labeled, or was weakly labeled, or was used at too low a concentration; (2) presence of contaminating nucleases; (3) target material was not denatured, or was overtreated (loss of DNA); (4) incubation time was too short (or too long), or at the wrong temperature; (5) the washes were too stringent or not stringent enough; (6) the autoradiographic exposure was too long or too short (the former is obviously easier to conclude than the latter); (7) slippage or loss of the autoradiographic emulsion during developing, fixing or washing; and (8) absence of complementary sequences on the target.
Similar problems can be encountered with non-isotopic ISH, but with additional problems associated with immunochemistry. These problems include: (1) absence of staining due to incorrect sequence of antibody staining steps, or omission of an antibody or antigen; (2) weak staining due to incubation times that are too short or antibody solutions that are too dilute or that have deteriorated from improper storage, or the signal was not sufficiently amplified; (3) excess background staining due to antibody concentration being too high, or insufficient rinsing of slides between staining steps.
Strong FISH signals can be analyzed using standard epifluorescence microscopy and photography, but data analysis and storage can be vastly improved by the use of computer-assisted digital imaging systems such as confocal laser scanning microscopy.

APPENDIX: STOCK SOLUTIONS

Mix gelatin powder in 50 ml distilled water and heat to dissolve. Cool and add formic acid.

Amphibian Ringer's Solution
123 mM NaCl
2 mM KCl
0.7 mM CaCl2

Biotinylated Goat Anti-Avidin Antibody
5 µg/ml in BN buffer
5.0% goat serum
0.02% NaN3

Blocking Solution for Fluorescein-Avidin
5.0% nonfat dry milk
0.02% NaN3 in BN buffer

BN Buffer
0.1 M sodium bicarbonate
0.05% Nonidet P-40
Adjust to pH 8.0.

0.15 M NaCl
2.5 mM MgCl2
0.03 M KCl
0.01 M Na2HPO4
Adjust to pH 7.0.

Culture Medium for Peripheral Blood Cells (Vertebrates)
Appropriate culture medium (e.g., DMEM, Eagle's, L-15, etc.)
1.25 g/ml NaHCO3
10% fetal calf serum
2 mM L-glutamine
100 U/ml penicillin
100 mg/ml streptomycin
1.5% (v/v) phytohemagglutinin

DAB Staining Solution
0.5 mg/ml diaminobenzidine in PBS
1/100 vol of 1% hydrogen peroxide

Denhardt's Solution (1x)
0.02% BSA
0.02% Ficoll
0.02% polyvinylpyrrolidone

Dextran sulfate stock (50%)
20 g dextran sulfate
Add distilled water to final volume of 40 ml. Filter through two Whatman No. 1 filter papers (takes several hours under vacuum).

Formamide Stock, Deionized (>95%)
500 ml formamide
25 g mixed resin, 20-50 mesh [AG 501-X8 (D), BIO-RAD]
Stir for 30 min at room temperature. Filter through a Whatman No. 1 filter. Store in aliquots at -20°C.

Hybridization Buffer for Biotin-Labeled Probes
50% deionized formamide
0.6 M NaCl
10 mM Tris-HCl, pH 7.5
1 mM EDTA
1x Denhardt's solution
0.5 mg/ml carrier RNA (Sigma, Type IV)
10% dextran sulfate (from 50% stock)
Filter-sterilize through 45 µm nitrocellulose filter and store at -20°C.

Lampbrush Chromosome Isolation Medium (5:1)
5 parts 0.1 M KCl
1 part 0.1 M NaCl

McIlvaine's Buffer
Solution A: 0.1 M anhydrous citric acid
Solution B: 0.4 M anhydrous sodium phosphate dibasic
pH 5.6: 92 ml solution A + 50 ml solution B (adjust pH)
pH 7.5: 80 ml solution A + 920 ml solution B (adjust pH)

Methyl Green Stock Solution
0.11 g methyl green
25 ml phosphate buffer, pH 6.8

Nick Translation Buffer (10X)
0.5 M Tris-HCl (pH 7.5)
0.1 M MgSO4
1 mM dithiothreitol
500 µg/ml bovine serum albumin

PBS
0.15 M NaCl
0.05 M NaHPO4
Adjust to pH 7.4.

Phosphate Buffer, pH 6.8
0.025 M KH2PO4
Titrate to pH 6.8 with 50% NaOH.

Paraformaldehyde (4%)
20 g paraformaldehyde
500 ml PBS
Heat to 65°C, stirring rapidly until completely dissolved.
SCP (1x)
0.12 M NaCl
0.015 M sodium citrate
0.02 M NaPO4
Adjust to pH 6.0, if necessary.

SSC (20x)
3.0 M NaCl
0.30 M sodium citrate
Adjust pH to 7.0 with 10 N NaOH.

Subbing Solution for Microscope Slides
0.1% gelatin
0.01% chromium potassium sulfate
Dissolve gelatin in hot distilled water, cool and add chromium potassium sulfate. Store at 4°C.

0.15 M NaCl
20 mM Tris-HCl
Adjust to pH 7.4.

TCA/BSA
50 µl stock BSA (1 mg/ml)
845 µl distilled water
100 µl 100% TCA

10 mM Tris
10 mM MgCl2
Adjust to pH 7.5.

'FPBS
0.15 M NaCl
4 mM NaHPO4
4 mM Tris-HCl
Adjust to pH 7.6.

0.1 M NaCl
10 mM Tris-HCl
Adjust to pH 7.4.

Yeast Suspension
2-3 g fresh dry yeast (e.g., Fleischmann's™ "active dry")
5-6 g dextrose
25 ml warm water
Incubate at 40°C until it begins to foam vigorously (≈30 min).
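
Many of the stock solutions above are specified as molarities, which have to be converted to weighed amounts of solute (mass = molarity x volume x molar mass). The Python sketch below shows that conversion for the 3.0 M NaCl / 0.30 M sodium citrate stock. The molar masses are standard values, but the choice of trisodium citrate dihydrate as the form of "sodium citrate" and the function name `grams_needed` are assumptions made for illustration.

```python
# Sketch: grams of solute for a molar recipe (mass = M * volume * molar mass).
# The salt form used for "sodium citrate" is an assumption, not stated in the recipe.

MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "trisodium citrate dihydrate": 294.10,  # assumed form of sodium citrate
}


def grams_needed(molarity, volume_l, solute):
    """Mass in grams of `solute` for `volume_l` liters at `molarity` mol/l."""
    return molarity * volume_l * MOLAR_MASS_G_PER_MOL[solute]


if __name__ == "__main__":
    # The 3.0 M NaCl / 0.30 M sodium citrate stock, made up to 1 liter.
    print(round(grams_needed(3.0, 1.0, "NaCl"), 1), "g NaCl")
    print(round(grams_needed(0.30, 1.0, "trisodium citrate dihydrate"), 1),
          "g sodium citrate")
```

If a different hydrate or salt form is on the shelf, only the molar mass entry needs to change; the final volume is reached by adding water after the solutes have dissolved.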
Chapter 6
Nucleic Acids I:
DNA-DNA Hybridization
Steven D. Werman, Mark S. Springer, and Roy J. Britten
INTRODUCTION
This chapter focuses on "in solution" hybridization for the quantitative assessment of relatedness of biological species using nuclear DNA. We therefore ignore filter hybridization, which has not yet been shown to be useful for the quantitative evaluation of relationships. In addition to comparing and reviewing different DNA hybridization protocols, we also consider laboratory practice, DNA reassociation kinetics, the significance of genome organization to DNA hybridization data, the interpretation of melting curves, and the application of DNA hybridization data to systematics. Recently, the latter topics have come under close methodological and analytical scrutiny. We believe it is important to give due attention to these issues in a volume on molecular systematics.
Early DNA hybridization/DNA kinetics studies include those of Wetmur and Davidson (1968), Kohne (1970), Kohne and Britten (1971), Bonner et al. (1973), and Britten et al. (1974). These authors provided a sound description and theoretical underpinning for the DNA hybridization technique as well as the underlying reassociation kinetics of DNA. They also introduced several metrics for DNA hybridization data. Although these studies were not primarily concerned with systematics, the distance metrics that they suggested are still in use today.
Large-scale application of the DNA hybridization technique to problems in systematics was pioneered by Charles Sibley and Jon Ahlquist. Their rapid progress was made possible by the construction of an automated thermal elution
device, appropriately named the "DNAnalyzer".
Although the quantity of data generated by Sibley and Ahlquist is truly impressive, their studies have often been criticized because the individual characters (i.e., nucleotides) remain unidentified. Proponents of DNA hybridization have countered that the sheer number of nucleotides under comparison compensates for this lack of individual identification. The mammalian genome, for example, is 3 x 10^9 nucleotide pairs per haploid set and thus there are about 10,000,000 essentially different fragments when it is sheared to lengths of about 300 nucleotides. The large number of fragments always involved in interspecies comparisons effectively suppresses distance fluctuation due to sampling and the accuracy is determined entirely by the measurement technique. However, it appears to us that the issues of (1) distance data versus discrete character data in phylogenetic reconstruction (see Chapter 11) and (2) the number of nucleotide positions under comparison are largely separate issues. It is therefore appropriate to ask if distance data in general and DNA hybridization data in particular have systematic value. We will attempt to place these issues in perspective.
As mentioned above, there are other points of contention surrounding DNA hybridization. One of these is the choice of an appropriate distance metric (i.e., ΔTm versus ΔT50H versus NPH versus ΔTmode). We will also comment on these issues.
In spite of these current controversies, DNA hybridization is now a practical technique that is used broadly to assess evolutionary relationships. It primarily is applied to single-copy DNA, and selection among sequences is considered to be unimportant to the evolution of many sites within this class of DNA (Britten, 1986). Thus, hybridization primarily is a measure of the neutral drift of the DNA and is not a measure of the events affecting phenotypic change. This decoupling of DNA sequence evolution and phenotypic evolution may have profound consequences for evolutionary morphologists studying processes and rates of phenotypic change. In particular, sequence data and hybridization data may allow the construction of phylogenetic hypotheses that are largely independent of the evolution of morphological characters and/or life-history traits. Such phylogenies are sorely needed if we are to understand how phenotypic characters and life history traits evolve.

PRINCIPLES AND COMPARISON OF METHODS

General Principles
DNA hybridization takes advantage of the double-stranded nature of the DNA molecule in which nucleotides on opposing strands are held together by hydrogen bonds. In the case of adenine and thymine, there are two hydrogen bonds. Guanine and cytosine, in turn, are held together by three hydrogen bonds. When double-stranded DNA is heated to 100°C, the hydrogen bonds between complementary base pairs are broken and the opposing strands separate. Subsequent cooling of the solution facilitates reannealing of the complementary strands. Reassociation conditions (e.g., salt concentration, temperature, viscosity, fragment size) determine the amount of base pair mismatch in the hybrid molecules that are permitted to form. At high-stringency reassociation conditions, which generally are achieved by decreasing the salt concentration and/or increasing the temperature, interspecies base pairing will only occur between well-matched sequences. At progressively lower stringency conditions, increased mismatch is tolerated up to a point at which random reassociation occurs.
DNA from two different species can be combined, denatured, and then allowed to reassociate. The double-stranded molecules that form between complementary strands from the two species will contain base pair mismatch because of their evolutionary divergence from a common ancestor. The extent of mismatch determines the temperature at which these hybrid molecules melt when they are placed in a thermal gradient. The depression of melting temperature in a heteroduplex hybrid relative to a homoduplex hybrid then serves as an index of divergence between the DNAs under comparison. The extent of reassociation in homoduplex versus heteroduplex reactions can also be measured, although the factors that influence percentage reassociation are not as easy to disentangle.

Summary of the DNA Hybridization Techniques and Data Analysis
Briefly, double-stranded DNA is isolated and then purified to remove RNA and protein. Long-stranded DNA is then fragmented to short pieces to permit separation of repetitive and single-copy DNA and to reduce viscosity and gel formation. Fractionation of single-copy DNA from repetitive sequences is accomplished most easily using reassociation kinetic techniques developed by Britten et al. (1974). These methods facilitate the construction of Cot plots (Figure 1), which present the percentage of single-stranded DNA versus the log of Cot (Cot = initial concentration of DNA in moles of nucleotides per liter multiplied by time of incubation in seconds). Cot plots, in turn, allow one to determine a Cot value (under specific incubation conditions) at which repetitive sequences have reassociated and can be separated from single-stranded, single-copy DNA by hydroxyapatite column chromatography (Kohne and Britten, 1971). Fractionated single-copy DNA from one species is then radioactively labeled (tracer) and hybridized with unlabeled DNA (driver) from the same species (homoduplex reaction) and from different species (heteroduplex reactions). When the hybridization is complete, melting profiles and the extent of reaction are then determined. Melting profiles, in turn, permit the quantification of median and/or modal melting temperatures. Differences in these parameters between homoduplex and heteroduplex curves are then used as the estimates of genetic distance, ΔTm and ΔTmode. The extent of hybridization for an interspecies heteroduplex measurement may be divided by that for the homoduplex control and multiplied
[Figure 1. Cot plots, panels (A) and (B): vertical axis scaled 0 to 100; horizontal axis Cot on a log scale.]
Figure 1 (A) Ideal reassociation curve (Cot curve) for a single class of DNA (i.e., single-copy or a single frequency class of repetitive DNA). The curve tracks the loss of single-stranded DNA (determined by the formula 1/(1 + kCot); see text) and the formation of duplex DNA over log intervals of Cot as expressed in [moles of nucleotide/liter] x sec. The "half-Cot" is the Cot value (here = 1) at which 50% of the DNA has reassociated. In an ideal reaction, 80% of the DNA reassociates over 2 log intervals; thus the Cot value at which 90% of the DNA has reassociated is 10 times the half-Cot. (B) Hypothetical reassociation curve for genomic DNA that includes a mixture of highly repetitive, moderately repetitive, and single-copy components. The individual reassociation curve for the slowest (higher Cot) component is shown, with the approximate half-Cot identified by (a). Horizontal dashed lines approximate the percentage of the genome that each class comprises (20% highly repetitive, 20% moderately repetitive, and 60% single-copy). Since single-copy DNA is the last component to reassociate (half-Cot = 1,000), it can be fractionated from the repetitive DNA (over hydroxyapatite) by reassociating the total DNA to a Cot value of 100. At this point, 90% of the single-copy DNA is single-stranded.
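
The numbers quoted in the caption to Figure 1 can be checked directly from the ideal reassociation formula, fraction single-stranded = 1/(1 + k x Cot). The Python sketch below is an illustration only: it assumes k = 1/half-Cot (which follows from requiring 50% reassociation at the half-Cot), and the function names are hypothetical.

```python
# Sketch of ideal second-order reassociation kinetics for one class of DNA.
# Assumes k = 1 / half_cot, so that half the DNA has reassociated at the half-Cot.

def fraction_single_stranded(cot, half_cot):
    """Fraction of DNA still single-stranded at a given Cot: 1 / (1 + k * Cot)."""
    k = 1.0 / half_cot
    return 1.0 / (1.0 + k * cot)


def fraction_reassociated(cot, half_cot):
    """Fraction of DNA that has formed duplexes at a given Cot."""
    return 1.0 - fraction_single_stranded(cot, half_cot)


if __name__ == "__main__":
    # Panel (A): a single class of DNA with half-Cot = 1.
    for cot in (0.1, 1.0, 10.0):
        pct = 100.0 * fraction_reassociated(cot, half_cot=1.0)
        print(f"Cot = {cot:g}: {pct:.1f}% reassociated")

    # Panel (B): single-copy component (half-Cot = 1,000) at Cot = 100.
    pct_ss = 100.0 * fraction_single_stranded(100.0, half_cot=1000.0)
    print(f"single-copy DNA still single-stranded at Cot 100: {pct_ss:.1f}%")
```

Under these assumptions, roughly 9% of the DNA has reassociated one log interval below the half-Cot and roughly 91% one log interval above it, which corresponds to the caption's statements that about 80% reassociates over 2 log intervals and that about 90% of the single-copy DNA is still single-stranded at a Cot of 100.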
by 100 to obtain a normalized percentage of hy- When distance data are non-additive in expecta-
bridization (NPH).NPH values are generally con- tion, a simple evolutionary path length interpre-
sidered to be a measure of the fraction of the DNA tation of distances on best-fit trees is confounded.
that has diverged to the point where it will no How well do DNA hybridization data fare
longer form stable interspecies dupIexes under under the assumption of additivity? Unfortu-
criterion conditions (see below). However, NPH nately, several factors may compromise the addi-
may also be influenced by (I) sequences in the tivity of DNA hybridization data (Springer and
single-copy genome of the tracer species that are Krajewski, 1989). These factors include homo-
deleted in the single-copy genome of the driver plasy (i.e., parallelisms, reversals, multiple hits),
species, and (2) kinetic effects (i.e., rates of reasso- sequences that are too divergent to form het-
ciation decrease as interspecies sequence diver- er6duplexes, pairing between paralogous se-
gence increases; Bonner et al., 1973). quences, horizontal gene transfer, measurement
Finally, AT, and NPH are sometimes incor- error, the distributioi; of rates of sequence change
porated into yet another distance measure, AT50H. for different sequences, and the history of genetic
These different measures of genetic distance can variation in different lineages. Some of these fac-
then be used in phylogenetic analysis. Typically, tors also affect sequence data. On the other hand,
complete matrices in which each taxon has been violations of additivity do not necessarily pre-
labeled and compared to all other taxa are used clude the accurate recovery of branching order,
for this purpose. Algorithms for phylogenetic even if they cast doubt on the validity of branch
analysis with distance data (see Chapter 11)in- lengths (chapter 11).In some instances (e.g., ho-
clude phenetic methods (Sneath and Sokal, 1973), moplasy), there are even remedies for sources of
best-fit methods (Fitch and Margoliash, 1967; non-additivity; these remedies make it possible to
Cavalli-Sforza and Edwards, 19671, minimum- approach truly additive data. In other instances,
length tree methods (Farris, 1972, 1981; Faith, further work will be required to improve the in-
1985; Saitou and Nei, 19871, and maximum likeli- terpretation of DNA hybridization distances. For
hood methods (Felsenstein, 1987). example, we can use the Jukes and Cantor (1969)
model to correct DNA hybridization distances for
homoplasy, but this requires that we know the
Properties of Hybridization Data conversion between melting point depression and
Genome structure, sequence evolution, popula- percent sequence mismatch (as well as requiring
tion history, and the DNA hybridization tech- that the assumptions of the model are met). Evi-
nique all affect the content of distance matrices. dence for a linear relationship between the ther-
The task at hand is to understand how these fac- mal stability of imperfect hybrids and the extent
tors affect DNA hybridization data and then to se- of sequence divergence shows that about one de-
lect appropriate tree-construction algorithms. gree change in melting temperature corresponds
When rates of sequence evolution are not the to 1% sequence divergence (Bonner et al., 1973).
same in all lineages, for example, UPGMA clus- Other estimates of this relationship are discussed
tering is an inappropriate tree-construction algo- in a following section.
rithm if we desire trees that accurately reflect
phylogeny. Many tree-construction methods,
however, make no assumption about equal rates
Factors Affecting DNA Hybridization
of change (see Chapter 11). Partly for this reason, The kinetics of DNA hybridization are affected by
methods that do not assume a molecular clock several factors, including genome size, copy num-
have become increasingly popular with distance ber, DNA fragment size, and base composition.
data. On the other hand, most of these methods Genome size is significant because the rate of re-
do assume that distance data are additive (at least association or hybridization is inversely propor-
in expectation). Indeed, an evolutionary interpre- tional to the number of different sequences in the
tation of branch lengths on a best-fit tree requires genome. Since most of the DNA is single-copy
an underlying additive matrix (Farris, 1981). (that is, most sequences are different from each
other), the rate of hybridization is determined primarily by the genome size or DNA content per
haploid set of chromosomes. Genome size varies widely among taxa, ranging from 10^6 to about 10^11
nucleotide pairs among eukaryotes (Britten and Davidson, 1969; Cavalier-Smith, 1985). For bacteria
the DNA content is much smaller and for viruses it can be less than 10^4 bp. This represents a
10 million-fold range in size, and thus rate of hybridization cannot be ignored and is a central
part of experimental design.

Repetitive DNA makes up a significant minority of the genome of all eukaryotes (e.g., Figure 1B)
and some prokaryotes and greatly influences the dynamics of hybridization. Repetitive DNA typically
shows a large amount of divergence within the genome of an individual and does not usually evolve
at the same rate as single-copy DNA. Thus, it is practical and necessary to remove the repeats.
Very few data exist for interspecies hybridization of repetitive DNA, and it has not been used for
effective resolution of systematic issues, although the evolution of the repeats themselves is of
some interest.

There are two essential problems in the separation of repetitive and single-copy DNA. First,
repeats are interspersed throughout the single-copy DNA, requiring that the DNA be sheared to
small fragments of a few hundred nucleotide pairs (bp). Second, it is not practical to separate
low-frequency repeats from single-copy DNA. The more rapid rate of reassociation of repeats,
compared to single-copy sequences, has been the only means for separating these two classes (see
hydroxyapatite procedure described below). However, separation based on reassociation rate is
never absolute, and small numbers of copies of each repeat family will remain in the "single-copy"
DNA. This problem has never caused significant uncertainty in the interpretation of hybridization
data because the quantity of DNA involved is small. The only situation in which this source of
error is likely to be significant is at great evolutionary distances when very small amounts of
hybridization are observed.

To fractionate the repetitive and single-copy components, it is practical to reassociate
short-fragment DNA to the Cot required for 10% reassociation of single-copy DNA and remove all
reassociated fragments (which include most repetitive elements) on hydroxyapatite. Precision
among multiple hybridizations, with prepared single-copy DNA, will be improved by knowing the
required Cot and accurately controlling the DNA concentrations, purities, and fragment sizes. For
fragmented DNA in solution (500 nucleotide-long fragments at 60°C in 0.12 M neutral phosphate
buffer), the fraction that remains single-stranded (i.e., has no duplexed regions and does not
bind to hydroxyapatite) can be simply expressed as follows:

$$S = \frac{1}{1 + k\,C_0t}$$

where k is the practical reaction rate constant (10^6 divided by the genome size in bp). The Cot
is most easily calculated as 10 × U × H × A or 1/2 × OD × H × A, where A is an acceleration factor
that depends on incubation conditions (= 1.0 at 0.12 M PB at 60°C), U is micrograms per
microliter, H is hours, and OD is optical density at 260 nm for a 1-cm path.

For mammalian DNA with 3 × 10^9 bp a Cot of about 3000 is required for half reassociation. With
1 µg/µl of DNA under these conditions the Cot is only 240 per day. To accelerate the process, the
DNA concentration and the phosphate buffer concentration can be raised (10 µg/µl is about the
practical limit for the former; Britten et al., 1974). These modifications can be used to obtain a
Cot of about 30,000 (or 10 times the half-Cot) in a day and a half. At this Cot, the
single-stranded fraction is only about 10%, which is sufficient for most applications. Near the
end of the reaction an increase of a factor of 10 in Cot reduces the amount of unreacted DNA by a
factor of 10. In practice, DNA degradation may occur and it may not be profitable to use more than
a few days of incubation, although chaotropic agents may help by reducing the temperature of
incubation.

The Criterion and Precision of Reassociation

The precision of the partially matching duplexes that can form during reassociation is determined
by the temperature and ionic strength of the incubation buffer. Together they establish the criterion
(stringency of reassociation) that is usually described as the difference between the Tm of
perfect duplexes (about 85°C in most cases for 0.18 M Na+ or 0.12 M phosphate buffer) in the
incubation buffer and the temperature of incubation (Britten et al., 1974: 366). The optimum rate
of reassociation occurs at about 25°C below the Tm of the duplexes being formed (Bonner et al.,
1973). If the conditions are too stringent (i.e., if the temperature is too high and the salt
concentration too low), the NPH is reduced and all of the duplexes have high melting temperatures.
The maximum ΔTm under such conditions is quite small, reducing the resolving power of the method.
On the other hand, if the conditions are too relaxed, the rate of reassociation is reduced and
dissimilar sequences may form unstable duplexes. Under these circumstances, even very distant
sequences form duplexes, possibly to the exclusion of more closely related sequences. It is not
known what fraction of duplexes are between non-orthologous sequences, and this potentially may
provide misleading information of evolutionary history. However, under more suitable conditions
(25-40°C below the Tm) practically all of the duplexes that form are thought to involve
orthologous sequences.

Hybridizations in phosphate buffer (PB) are carried out at 60°C in about 0.48 M PB to accelerate
the reaction (see above) and thereafter diluted to 0.12 M PB for thermal denaturation. Upon
dilution, the effective temperature of incubation is reduced to about 53°C. Consequently, any
tracer that elutes below this latter temperature is not due to denaturation of duplexes formed
during the hybridization reaction and therefore can be considered an artifact.

Finally, the base composition of duplexes can affect their individual melting temperatures. Since
G-C pairs share three hydrogen bonds and A-T pairs share two, DNA double strands that are G-C rich
will melt at a slightly higher temperature than an A-T-rich fragment in standard phosphate buffer.
This factor tends to increase the width of the melting curve for mixed fragments (i.e., when the
tracer is not from a source of cloned fragments) in intra- or interspecies comparisons where the
total single-copy fraction is used. The effect of base composition can be eliminated by the use of
chaotropic solvents (e.g., TEACL, described below) for duplex denaturation.

Comparison of the Primary Methods

Hydroxyapatite

In 0.12 M PB at temperatures from 45 to 60°C, double-stranded DNA binds efficiently to
hydroxyapatite (HAP) whereas single strands do not. Further, HAP continues to bind double-stranded
DNA until the melting temperature is reached. Thus, the separation of single- and double-stranded
DNA is a simple procedure. The practical capacity of HAP for DNA, including divergent sequence
duplexes, is about 100 µg/400 mg HAP in buffer, although native DNA may be bound at much higher
levels. However, small amounts of duplex DNA are slightly eluted near the melting temperature in
0.12 M PB (G.M. Fox et al., 1980b; Martinson, 1973). Thus, it pays to use consistent flow rates
and elution volumes for high accuracy.

Many substances, such as 0.3 M NaCl or NaAc, 7 M NaClO4, or 8 M urea, can be present and do not
interfere with the binding to HAP. These have little influence on the separation of single- from
double-stranded DNA, although they do influence the Tm. However, other substances, such as small
amounts of CsCl, protein, and some metallic ions, interfere with duplex binding. For a detailed
account of the HAP procedure see Britten et al. (1974).

The major advantage for the use of HAP for DNA hybridization studies is that it is the most
explored technique. It has been investigated extensively at the physicochemical level and applied
the most widely to systematic problems. Disadvantages, however, include (1) melting curves that
are broader than those obtained with the TEACL method, and (2) the time involved in running
individual columns and large numbers of taxa that may require automated procedures.

S1 Nuclease and Precipitation to Assay Melting

Many hybridization experiments (Benveniste and Todaro, 1976; Benveniste, 1985; O'Brien et al.,
1985a) have used a straightforward procedure in which, after hybridization, the DNA is treated
with the single-strand-specific S1 nuclease and precipitated. S1 nuclease degrades
single-stranded, but not double-stranded, DNA. The effective criterion is established by the rigor
of the enzyme treatment and these authors have shown that different degrees of S1 digestion can
change the criterion significantly. With this procedure the extent of hybridization is never as
large as with HAP because the effective kinetics of reassociation are different since all
unduplexed regions are digested, and because the unpaired tails of regions containing duplexes
bind to HAP but are digested by S1. The result is a kinetic curve such as
S = (1 + kCot)^-0.44. Ultimately, there is likely to be steric hindrance and further reduction in
rate of reassociation so that the practical extent of completion for high quality tracer and
driver from the same species is likely to be only about 70% S1 resistant. For hybridizations
between moderately distant species, the NPH is reduced compared to that observed with HAP. There
is apparently a proportionality between NPH and ΔTm, with a slope that depends on the degree of S1
digestion (Benveniste, 1985). Thus, the activity of particular S1 nuclease preparations must be
assayed and applied in a consistent fashion.

The Use of Tetraethylammonium Chloride

Tetraethylammonium chloride (TEACL) is a chaotropic solvent that essentially eliminates the effect
of base composition on hybrid melting temperature at a concentration of 2.4 M (Melchior and Von
Hippel, 1973; Hutton and Wetmur, 1973). With the use of TEACL the observed width of a precise
duplex melting curve will decrease by a factor of 10, from about 14°C (in 0.12 M PB) to about
1.5°C. The relative technical advantages of TEACL for DNA hybridization are discussed in Powell
and Caccone (1990). Additionally, since TEACL interferes with hydroxyapatite chromatography,
single-stranded DNA cannot be removed except by digestion with S1 nuclease (see below).

The TEACL procedure has been used effectively to detect intraspecific polymorphism in single-copy
sequence divergence (Britten et al., 1978) and evolutionary relationships of closely related
Drosophila (Caccone et al., 1988a; Powell and Caccone, 1989). This procedure, however, is somewhat
more complicated and time consuming than the standard PB/HAP system. At present, the method is
most useful for closely related species.

TEACL is usually combined with S1 nuclease methods for systematic problems. The advantages of this
system include (1) elimination of the need for HAP columns, (2) many samples can be run at a
single sitting (depending on the size of the heating block), (3) melting curves are narrow as
compared to HAP, and (4) TEACL compensates for A-T, G-C differences. The disadvantages, as
compared to HAP procedures, are that (1) S1 nuclease activity must be carefully standardized
between assays, (2) the criterion depends on enzyme treatment, and (3) the NPH is more difficult
to control.

Although we include a representative TEACL protocol below (Protocol 11), other variations are in
use. For details of similar methods see Powell and Caccone (1990) and Caccone and Powell (1992).

Melting Curves Combining the Advantages of TEACL and Hydroxyapatite

In 2.4 M TEACL, precisely paired DNA melts at about 62°C regardless of base composition. The width
of the melting curve is about 1.5°C. However, this procedure has two disadvantages. First, the
criterion of reassociation cannot be set for widely divergent duplexes since such duplexes are
digested by S1 almost as fast as single strands. Second, the combination of the HAP technique and
TEACL melting characteristics has been restricted by the fact that TEACL and phosphate buffers
tend to form precipitates or two-phase solutions under a variety of conditions. However, we have
observed recently that in the presence of high concentrations of TEACL, the phosphate
concentration that permits duplexes to bind but allows single strands to pass through HAP is much
reduced.

Although we have not exhaustively varied the concentrations and conditions, a good compromise is
2.0 M TEACL and 0.013 M PB (PT). This solvent is stable from 4°C to at least 75°C. Precise
duplexes bind HAP very well from room temperature upward and are eluted from HAP only as they are
melted by increasing temperature. However, single strands bind below 50°C, so this method is
restricted to comparisons of relatively closely related species.

Native long DNA melts in PT at 68°C with a width of about 3°C, and sonicated fragments of DNA
(500 bp average) melt at 65°C (as determined by elution from HAP in PT). As an example of
fractionation, we have used the method to isolate precisely paired repeat duplexes by incubation
to Cot 100 in PT at 60°C and passing the solution over HAP at this temperature. The temperature is
raised in steps to 64°C with washes of PT; then the temperature is dropped and the HAP washed with
low concentrations of PB to remove the TEACL. The duplexes are eluted with 0.4 M PB for analysis.

PT is a good solvent for DNA reassociation because it has a rate acceleration factor of more than
10 (as compared to 0.12 M PB at 60°C) at its optimum temperature of 40°C. The acceleration factor
drops to about 1.0 at 65°C or about 3°C below the Tm. It gives narrow accurate melting curves
where long native duplexes melt at about 68°C.

A method is now possible in which DNA is melted in 2.4 M TEACL and then the samples are diluted
40-fold into 0.12 M PB and passed over HAP to separate duplex from single strands (C. Hsiao,
personal communication). Those who test this promising method further will have to ascertain that
the small concentration of TEACL does not elute some divergent DNA duplexes and perhaps test for
the best HAP temperature and the PB concentration for ideal separation of double from single
strands.

Automated Melting Assay

In the earliest eukaryotic interspecies DNA hybridizations (Kohne, 1970), HAP elution was carried
out with a pump and the temperature was raised with a control on a single column. The accuracy was
increased by using two isotopes and an internal reference DNA. Sibley and Ahlquist (1987b and
references therein) used an automatic machine to process 25 HAP columns simultaneously. A good
compromise might be made between accuracy and efficiency of measurements by avoiding iodination,
incorporating the use of an internal standard, and using a large number of columns in an automatic
machine. Of course, tracer and driver fragment sizes and concentrations (as well as other critical
variables) should be controlled carefully in automated procedures as in manual methods. There are
no automated machines as yet available commercially. Most of those in use are based on individual
requirements and design.

APPLICATIONS AND LIMITATIONS

Investigations into the kinetics of DNA reassociation form the foundations of DNA hybridization as
a tool for questions of systematic and evolutionary relationships. Studies regarding the rate of
reassociation of sheared, total genomic single-stranded DNA have provided quantitative estimates
of the degree of sequence repetition, length of repeated sequences, and the interspersion pattern
of these sequences throughout the genome (Britten and Kohne, 1967, 1968; Britten et al., 1974).
Highly repeated sequences reassociate far more rapidly than single-copy sequences, and by varying
the fragment length of the reassociating DNA, it is possible to estimate repeat length and
interspersion patterns. Thus, kinetic studies have yielded a wealth of information on genome
organization and structure in prokaryotes and eukaryotes, as well as providing a method useful for
the separation of specific sequence classes. Britten and Kohne (1968), as well as Hood et al.
(1974: 56), provided explanations of reassociation kinetics and Cot analysis.

There are at least two other observations derived from kinetic studies that are important to
systematic applications of this technique. First, the observed reduction in the thermal stability
of reassociated hybrid DNA (ΔTm) is directly proportional to the sequence difference (in
percentage base pair mismatch) between reannealed single strands. Second, this sequence divergence
can reduce the rate at which sequences reassociate (Bonner et al., 1973). In other words, as
divergence increases between sequences, their reassociation rate slows. This issue is discussed
in a later section of this chapter.
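
The first observation, that the reduction in thermal stability is proportional to percent mismatch (roughly 1°C per 1% according to Bonner et al., 1973), can be combined with the Jukes and Cantor (1969) correction mentioned earlier in the chapter. The following is a minimal sketch only: the function names and the default conversion of 1.0% per degree are assumptions made for illustration, other estimates of the conversion exist, and the correction is meaningful only when the assumptions of the Jukes-Cantor model are met.

```python
import math


def percent_mismatch(delta_tm_c, pct_per_degree=1.0):
    """Convert a melting-point depression (degrees C) to percent mismatch.

    pct_per_degree is an assumed conversion factor; the default of 1.0
    follows the approximate value attributed to Bonner et al. (1973).
    """
    if delta_tm_c < 0:
        raise ValueError("delta Tm must be non-negative")
    return delta_tm_c * pct_per_degree


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for a mismatch proportion p.

    d = -(3/4) * ln(1 - (4/3) * p), defined only for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def corrected_distance(delta_tm_c, pct_per_degree=1.0):
    """Substitutions per site implied by a delta Tm, under the stated assumptions."""
    p = percent_mismatch(delta_tm_c, pct_per_degree) / 100.0
    return jukes_cantor_distance(p)


if __name__ == "__main__":
    # Illustrative delta Tm values only; these are not data from the chapter.
    for dtm in (1.0, 5.0, 10.0, 20.0):
        d = corrected_distance(dtm)
        print(f"delta Tm = {dtm:5.1f} C -> corrected distance = {d:.4f}")
```

The correction is small for closely related taxa and grows with divergence, which is one reason the choice of conversion factor matters most at large ΔTm values.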
Several studies at the intraspecific level have utilized hybridization techniques to assess
interindividual and interpopulational sequence divergence and variation. Britten et al. (1978)
determined the magnitude of single-copy sequence polymorphism among individuals of the sea urchin
Strongylocentrotus purpuratus and, surprisingly, found it to be about 5%. Similarly, divergence
estimates also have been made for isogenic (parthenogenetic) strains of Drosophila mercatorum
(Caccone et al., 1987). Both of these studies employed the use of TEACL to decrease the effective
width of the melting curves for more accurate determinations of ΔTm over standard phosphate buffer
conditions. Others have used the latter system to determine intraspecific variation in sea stars
(M.J. Smith et al., 1982), herons (Sheldon, 1987), diprotodont marsupials (Springer, 1988), cave
crickets (Caccone and Powell, 1987), and Drosophila (Powell and Caccone, 1989). The magnitude of
intraspecific variation may be important to consider in determining the relationships among
closely related species (Chapter 11).

The majority of hybridization studies, as applied to systematics, have involved species and higher
taxon relationships, up to family and ordinal level comparisons. The effective limits of
resolution depend primarily on the degree of divergence (as related to rate) among taxa under
investigation (see below). Powell and Caccone (1989) noted that the smallest interspecific
difference accurately resolved in their studies with TEACL was a Tm reduction of 0.27°C. Recent
studies on invertebrates include Drosophila (Caccone et al., 1992), sea urchins (M.J. Smith et
al., 1990), and sand dollars (Marshall, 1992; Marshall and Swift, 1992). Interspecific comparisons
with a primary focus on phylogenetic relationships among birds include Sheldon (1987), Sheldon et
al. (1992), Sibley and Ahlquist (1987a,b and references therein), and Krajewski (1989). Mammalian
studies include Bledsoe (1987), Kirsch et al. (1990a, 1990b, 1991), Springer (1988), Springer et
al. (1990, 1992b), and Springer and Kirsch (1989, 1991).

As with other molecular techniques used to obtain information useful to phylogenetic
reconstruction and systematic relationships, DNA hybridization has limitations. Since many of
these details are discussed throughout this chapter, we provide a brief list (arbitrarily ordered)
of major points.

1. Direct sequence data are not uncovered, and the data are in the form of distance information.

2. Comparisons are restricted to the single-copy fraction of the genome.

3. Dramatic differences in the size of the single-copy fraction between species pairs could
   produce errors in reciprocal measurements of NPH, although this has not yet been documented.

4. Large amounts of intraspecific polymorphism can be problematic in the estimation of
   phylogenetic relationship of closely related species.

5. The upper limit of divergence between species where relationships can be determined by this
   method is set by the conditions of DNA reassociation. For example, with HAP procedures at
   standard conditions of incubation it is generally difficult to estimate relationships with
   reasonable certainty if the NPH falls below 50% and the ΔTm is greater than 20°C. However,
   Kirsch et al. (1991) have used NPH values of less than 50% to look at interordinal
   relationships among marsupials. Marshall and Swift (1992), using a NaCl-S1-nuclease assay,
   provided evidence that phylogenetic comparisons can be made where reactions fall below these
   NPH and ΔTm values.

6. DNA hybridization is relatively expensive as compared to other techniques and involves the use
   of radioisotopes.

7. In many cases, milligram quantities of DNA are required to permit reasonable comparisons with
   replication. Thus, comparisons among individuals are restricted to organisms from which the
   required amount of DNA can be extracted. However, the PERT procedure (Protocol 10) requires
   less DNA, allowing studies to be accomplished with several hundred micrograms.
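
Because the protocols that follow depend on choosing DNA concentrations and incubation times, the Cot arithmetic described earlier under "Factors Affecting DNA Hybridization" can be sketched in a few lines. This is a hedged illustration, not a validated planning tool: it assumes ideal second-order kinetics, S = 1/(1 + k × Cot), with k taken as 10^6 divided by the genome size in bp and Cot computed as 10 × U × H × A. All function names are hypothetical, and real reactions will deviate (for example, through DNA degradation over long incubations).

```python
def rate_constant(genome_size_bp):
    """Practical rate constant k: 10^6 divided by the genome size in bp."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return 1.0e6 / genome_size_bp


def cot(conc_ug_per_ul, hours, acceleration=1.0):
    """Cot = 10 x U x H x A, with U in ug/ul, H in hours, A the acceleration factor."""
    return 10.0 * conc_ug_per_ul * hours * acceleration


def fraction_single_stranded(cot_value, genome_size_bp):
    """Fraction still single-stranded, assuming ideal second-order kinetics."""
    return 1.0 / (1.0 + rate_constant(genome_size_bp) * cot_value)


def half_cot(genome_size_bp):
    """Cot at which half of the single-copy DNA has reassociated (k x Cot = 1)."""
    return 1.0 / rate_constant(genome_size_bp)


def hours_to_reach(target_cot, conc_ug_per_ul, acceleration=1.0):
    """Incubation time in hours needed to reach a target Cot."""
    return target_cot / (10.0 * conc_ug_per_ul * acceleration)


if __name__ == "__main__":
    genome = 3.0e9  # mammalian genome size used as the example in the text
    print("half-Cot:", half_cot(genome))
    print("Cot per day at 1 ug/ul, A = 1:", cot(1.0, 24.0))
    print("fraction single-stranded at Cot 30,000:",
          fraction_single_stranded(30000.0, genome))
    print("hours to Cot 30,000 at 10 ug/ul, A = 1:",
          hours_to_reach(30000.0, 10.0))
```

Under these assumptions, reaching a Cot of 30,000 in a day and a half at 10 µg/µl would require an acceleration factor of roughly 8; that figure is an inference from the formula rather than a value stated in the text.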
PROTOCOLS

1. DNA isolation and purification
2. Preparing sheared drivers from long native DNA
3. Tracer preparation with 32P and 3H
4. Tracer self reaction and repeat removal
5. Fractionation of single-copy tracer over hydroxyapatite
6. Estimation of tracer fragment length
7. Preparing tracers by iodination
8. DNA hybridization with hydroxyapatite and phosphate buffer
9. Hydroxyapatite column preparation
10. Phenol emulsion reassociation technique (PERT)
11. [...] hybrid stability using the S1 nuclease-TEACL assay

LABORATORY SETUP

Most of the supplies required for DNA hybridization are those commonly used in other DNA
isolation, manipulation, and characterization techniques, including cloning and sequencing
(Chapter 9). General supplies include centrifuge tubes, culture tubes, assorted glassware, ceramic
mortar and pestle, pipettes (with microliter to milliliter delivery), filters, razor blades,
liquid scintillation vials, etc. Supplies that may be unique to DNA hybridization include
capillary tubes (10-100 µl volumes) in which hybridization reactions are carried out and
polystyrene disposable chromatography columns with filter discs, 6.5 ml capacity (Figure 2).
Equipment needed is shown in Table 1.

Figure 2 Diagrammatic frontal view of a multicolumn apparatus for fractionating tracers that have
been self-reacted to remove repeats and/or foldback DNA. The system consists of an acrylic plastic
box with legs, through which heated water can be circulated. Holes drilled in the top and bottom
allow for the insertion of disposable plastic columns, with filter discs, sealed by rubber
grommets. A thermometer placed in the last tube (with water) is used to measure column
temperature. The apparatus is hooked up to an externally circulating heated water bath as shown in
Figure 3.
Table 1
Primary equipment used in DNA hybridization

Equipment                                             Use
Refractometer (precise to 0.0001 refractive           Determine solution molarity^a
  index units)
Balance, analytical                                   Weighing small samples
Balance, top loading                                  General weighing
Centrifuge, microtube                                 Centrifuging small samples
Centrifuge, high speed refrigerated, with rotors      DNA isolation
  for 250- or 50-ml bottles
Electrophoresis apparatus (agarose), large and        DNA and tracer sizing
  mini submerged horizontal
Gamma counter                                         Detection of 125I^a
Gradient heating block                                Thermal denaturation of DNA^a
Liquid scintillation counter                          Detection of 32P and 3H
Lyophilization apparatus                              DNA purification for iodination
Adjustable micropipettes: 20 µl, 100 µl, 1000 µl      General solution manipulation
Spectrophotometer                                     Determining DNA concentration of samples
Ultrasonic cell disruptor (sonicator, probe type)     Shearing DNA^a
UV transilluminator                                   DNA detection in gels
Water baths
  Heated (non-circulating)                            Hybridization incubation
  Heated (circulating)                                Thermal denaturation of DNA
  Refrigerated                                        Nick-translation labeling
Water jacket for multiple columns (Figure 2)          Tracer fractionation^a
Water jacketed glass column with stopcock and         Thermal denaturation of DNA on
  sintered glass filter disc (Figure 3)                 hydroxyapatite^a
Vacuum chamber                                        Drying DNA samples
Vacuum pump                                           Drying DNA samples

^a Specific to DNA hybridization protocols.
Protocol 1: DNA Isolation and Purification
(Time: 1.5 days)

Genomic DNA used for hybridization studies must be pure and free from contaminants such as
glycogen, protein, metallic ions, and other impurities. These contaminants may interfere with DNA
reassociation. The following procedures apply to the isolation of DNA from dissected tissues and
can include frozen blood samples. This protocol has been adapted from D.E. Graham (1978); a rapid
method for DNA extraction from blood is given by Jeanpierre (1987). Protocols for small organisms
(e.g., fruitflies, etc.) or for small amounts of tissue are described in Chapters 7, 8 and 9.

1. Prechill a small mortar and pestle in dry ice (chill the grinding surface of the pestle only).
   Grind a small mass of frozen tissue (5-10 g) mixed with an equal quantity of dry ice to a very
   fine powder. Set aside and let most of the dry ice sublime, but keep the ground tissue frozen.
   Steps 2-5 should be done as quickly as possible.
2. Rapidly dissolve powder in ice-cold SEDTA (Appendix). Use 10-100 volumes SEDTA to tissue
   depending on DNA content, e.g., sperm requires a larger volume than blood. The resulting
   solution should be viscous.

3. Rapidly dissolve all lumps of tissue, immediately add 20% SDS to a final concentration of 1%,
   and stir gently, to avoid shearing the DNA.

4. Add 1/5 volume of 5 M sodium perchlorate solution and mix.

5. Immediately add an equal volume of equilibrated phenol and mix for 30-60 min. Mix with just
   enough force to keep the emulsion from separating. Centrifuge at 5000 g for 15 min at 4°C.

6. The phenol phase should be on top because of the density of the perchlorate solution. If the
   phases do not separate, add a small volume of SEDTA and recentrifuge. If you add too much
   SEDTA, the phenol layer may end up on the bottom. Remove the phenol phase.

7. To the aqueous phase, add an equal volume of 24:1 chloroform:isoamyl alcohol, and mix at room
   temperature for 30 min. Centrifuge as above and save the aqueous phase (should be on the top).
   Leave the milky interface. Repeat steps 5-7.

8. The DNA is now ready for spooling. Place the aqueous phase in an acid-washed beaker large
   enough to hold four times the sample volume. Carefully layer 2 volumes of ice-cold 95% ethanol
   onto the DNA solution by pouring it down the side. Keep the two solutions from mixing.

9. Take a long acid-washed glass rod and rotate it slowly with a slow mixing action just below the
   interface of the two solutions. The DNA should wind onto the rod and form a mass large enough
   so that no more DNA will cling. Remove the glass rod and gently squeeze excess ethanol out of
   the wound DNA against the side of the beaker. Slice the DNA with a razor blade along the axis
   of the rod and remove it to a 15-ml sterile tube and repeat the winding until no more DNA
   sticks to the rod and the two layers are nearly completely mixed.

10. Add 5 ml of TE (or about 0.5 ml/mg DNA) to the DNA and let it swell overnight at 4°C. If the
    DNA does not dissolve add more TE as appropriate.

11. (Optional, if there is RNA and protein contamination) To the re-suspended DNA add 20 µg/ml
    DNase-free RNase and incubate in a water bath at 37°C for 1 hr. Remove and add 100 µg/ml of
    proteinase-K solution, 1/10 volume 3.0 M sodium acetate, and 1/100 volume of 25% SDS. Mix and
    incubate at 60°C for 1 hr. Extract once with phenol, then twice with 24:1 chloroform:isoamyl
    alcohol. Add 2 volumes of 95% ethanol to precipitate the DNA. Wash once with 70% ethanol and
    partially dry the DNA pellet under vacuum. Re-suspend at 2-3 mg/ml in 0.1 mM EDTA over a few
    drops of chloroform.

12. In a spectrophotometer, check the optical density (OD) of a dilution of the DNA preparation at
    230, 260 and 280 nm. At 260 nm, 50 µg of DNA in 1 ml solution will give an OD of 1.0. The
    ratio of ODs at 260/280 provides an indication of RNA contamination. The ratio should be close
    to 1.8. The more RNA present the higher the value. A low ratio of ODs at 260/230 indicates
    protein contamination; this value should be greater than or equal to 2.3.

13. Electrophorese 500 ng of the DNA solution on a 0.6% neutral agarose gel with DNA markers and
    stain with ethidium bromide (Chapter 8). The majority of the DNA should migrate as a large
    band close to the origin.

Protocol 2: Preparing Sheared Drivers from Long Native DNA
(Time: 6 hr)

For interspecies hybridizations both driver and tracer DNA fragments must be approximately 500
bases in length (denatured) to provide for the separation of repetitive and single-copy fractions,
since repeats are dispersed throughout the genome at some frequency.

Shearing small samples of DNA is best accomplished using an ultrasonic cell disruptor or soni-
cator. DNA samples (>20 ml) can be sheared in a alize the DNA. If the sheared DNA is much
motorized tissue homogenizer following Britten et longer than 500 bases in length, then it must
al. (1974) and J.A. Hunt et al. (1981). A protocol for be sonicated again. If the DNA is much
sonicating DNA (in solution) is outlined below. shorter than 500 bases, then it cannot be used
for driver. Thus, it is best not to overshear the
1. To 400 µl sterile water add 50 µl 3.0 M sodium DNA at first.
acetate and 100 µg of DNA (= 50 µl at a con-
centration of 2 mg/ml) in a 2-ml sterile glass
screwcap vial. Mix gently and cool on ice for Protocol 3: Tracer Preparation with
15 min. 32P or 3H
2. Sonicate for 30 sec at 80-90% maximum (Time: 3 hr)
power with the tip of the sonicator probe just
below the surface of the solution. Put on ice Radioactively labeled tracer DNA can be prepared
for 30 sec and repeat four more times. Place by standard nick translation procedures (see also
on ice and set up a small chelating resin col- Chapter 8), although iodination has been used ex-
umn (see step 3) to filter out any metallic ions tensively in previous systematic applications of
or particles introduced during sonication. Be- DNA hybridization (Sibley and Ahlquist 1987a,
fore step 3, it may be desirable to go to step 4 and references therein). Iadination procedures
(below) and check the size of the DNA in case (Protocol 7) are somewhat difficult to establish and
additional sonications are required. may require practice to achieve good tracers on a
3. Clamp a 1-ml pipette tip to a ring stand and routine basis. Ail advantage of an iodinated tracer
push a small piece of sterile glass wool into is a long half-life (about 60 days). However, there
the tip. Add 0.5 ml of chelating resin (equili- are also advantages in the use of 32P-or 3H-labeled
brated to pH 7.0 with 0.3 M sodium acetate) tracers, which can be counted in a beta counter. "I?
and rinse several times with 1 rnl of 0.3 M can be counted Cherenkov without the use of sun-
sodium acetate. As the final rinse of sodium tillation fluid, can be detected by a hand-held
acetate passes through the chelating resin, Geiger counter, and has a lugh counting efficiency
add the DNA sample and collect it after it (95%).Tritium does not share these advantages
passes through to the column. Add 250 p1 but has a very long half-life and is a lower energy
sodium acetate to the column to wash out any emitter; consequently, tracers made wit11 W have
DNA and combine with the previously col- an extremely long shelf life. 32Phas a half-life of
lected sample. about 14 days; consequently tracers lose their ac-
4. Divide sample into 500 pl aliquots in 1.5-ml tivity and detectability rather quickly. Also, "P
microcentrifuge tubes and to each add 1 ml of tracers degrade within a few days to a week if la-
cold 95% ethanol to precipitate the DNA. Spin beled to very high specific activity (>I x 106
in a microcentrifuge at high speed for 10 min cpm/pg). Below is a protocol for synthesizing a
at 4°C. Wash pellets in 1 ml of 70% ethanol 32Pgenomic tracer. 3H can be substituted or used
and repeat the spin. Decant the ethanol and in combination with "P,so that a 3EI tracer can be
partially dry the pellets under vacuum. Re- tracked easily.
suspend the pellets in 2030 plO.1 rnM EDTA Nick translation of long native genomic DNA
or in an appropriate volume to obtain a con- is preferred since one has more control in the re-
centration of 5 pglpI(5 ng/ml). sulting fragment size by varying the quantlty of
DNase added to the reaction. If starting wjth
5. Electrophorese 500 ng of the sonicated DNA
sheared DNA (500 bp), the resulting fragment slze
on a 2% alkaline agarose mini-gel (Protocol 6)
will, on average, be considerably smaller, poss~bly
at 40 V for 2-4 hr with PBRIHinfI marker (or
too small to be used as tracer. Additional details
some other suitable marker for 500-bp frag-
regarding nick translation procedures are pre-
ments). Neutralize gel in 500 mM Tris (pH
sented in Sambrook et al. (1989).
7.5) and stain with ethidium bromide to visu-
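The half-lives quoted above for 32P and for iodinated tracers determine how quickly a stored tracer loses activity. The short Python sketch below illustrates the arithmetic only; the function name is hypothetical, and the half-lives are the approximate figures given in the text (about 14 days for 32P, about 60 days for 125I), not precise physical constants.

```python
# Approximate half-lives in days, as quoted in the text (assumed values).
HALF_LIFE_DAYS = {"32P": 14.0, "125I": 60.0}


def fraction_remaining(isotope: str, days: float) -> float:
    """Fraction of the initial activity left after `days` of storage."""
    return 0.5 ** (days / HALF_LIFE_DAYS[isotope])


if __name__ == "__main__":
    for isotope in HALF_LIFE_DAYS:
        for days in (7, 14, 28):
            left = fraction_remaining(isotope, days)
            print(f"{isotope}: {left:.2f} of initial activity after {days} days")
```

With these figures a 32P tracer retains half of its activity after two weeks and a quarter after four, which is why such tracers are best used soon after labeling.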
1. In a 1.5-ml microcentrifuge tube add 5 µg of long native DNA to be nick translated. Then add
   5 µl 10x nick translation buffer (Appendix)
   2 µl each 1 mM dATP, dGTP, dTTP
   1 µl 1 mM dCTP (or less, for higher specific activity)
   5 µl [32P]dCTP (50 µCi at 800 Ci/mmol)
   1 µl DNA polymerase
   0.5 µl DNase (10^6 dilution of a 10 mg/ml stock)
   Sterile water to 50 µl total
   For 3H, use several labeled triphosphates without dilution. Incubate at 12-14°C for 2 hr. Add 1/20 volume 0.5 M EDTA and place on ice. Remove 1 µl and dilute to 500 µl with water in another 1.5-ml tube to check 32P incorporation.
2. To check the amount of radioactive nucleotide incorporated, take 10 µl of the 500 µl sample and dot it onto a Whatman GF/C glass filter disc and set aside. Take another 10 µl and mix it with 5 ml 10% ice-cold trichloroacetic acid (TCA) and 50 µg of sheared salmon sperm DNA. Put on ice for 15 min.
3. Filter the 5 ml of sample plus DNA through a 2.4 cm GF/C glass filter disc and wash 5 times with 5 ml 10% TCA followed by two washes with 95% ethanol. Set the filter aside in a fume hood behind a shield and let it dry; remember both filters are radioactive, as are the wash solutions. Place both filters into separate scintillation vials and add 10 ml of scintillation cocktail (for 3H) or count Cherenkov (no fluid added) for 32P. The unwashed filter represents the total radioactivity added; the washed filter represents the amount of 32P (or 3H) incorporated into the DNA. Incorporation should reach 30-40%.
4. The unincorporated nucleotides must be removed from the nick translation reaction. This can be accomplished by the "spun column technique" outlined in Chapter 8. However, we prefer to use the glassmilk elution procedure described in Davis et al. (1986:123), which is available commercially as the Geneclean® kit (BIO 101 Inc., P.O. Box 2284, La Jolla, CA 92038-2284). The general procedure is outlined below.
5. To the nick translation reaction add 10 µl of the glass beads solution (50% slurry in water, = glassmilk of the Geneclean® kit) and gently mix. Add 150 µl sodium iodide solution (Appendix), mix gently and set aside at room temperature for 5 min.
6. Spin at high speed in a microcentrifuge for 5-10 sec. Discard the supernatant (radioactive) and wash the glass pellet three times with 500 µl of the ethanol-Tris wash solution (see ethanol wash in Appendix). Spin 5-10 sec between washes and discard supernatant at each step. Be sure to resuspend the glass pellet with each new wash. On the last wash remove as much of the ethanol solution as possible.
7. Elute the nick translated DNA from the glassmilk by adding 25 µl of TE or 0.48 M PB and placing it into a 50°C water bath for 15 min. Spin down the glass for 30 sec and remove the supernatant to a new tube. The tracer DNA should be free of unincorporated nucleotides and other impurities. It can now be self-reacted to remove repetitive DNA and any "snapback" DNA formed during the nick translation procedure (Protocol 4).

Protocol 4: Tracer Self-Reaction and Repeat Removal
(Time: 1-2 days)

A preliminary hybridization reaction and fractionation must be carried out on the newly labeled tracer. An appropriate Cot must be chosen to reassociate the repeat DNA while leaving the single-copy component single-stranded so as to separate these components using hydroxyapatite chromatography. The Cot necessary can be calculated by the following formula:

Cot = (µg DNA/µl sample volume) x 10 x AF x time (hr)
AF is the acceleration factor due to an increase in PB concentration over 0.12 M. For 0.48 M PB the AF is 5.6. Usually, adjustments to achieve a particular Cot are made by varying the time required for incubation. For the tracer prepared above (assuming it is primate DNA), the Cot is obtained by substituting the tracer concentration and the incubation time into the formula above. The appropriate Cot should be determined empirically for each taxon; estimates of Cot values suitable for repeat removal have been made for Drosophila (Cot 10-50), sea urchin (Cot 100), and primates (Cot 300).

1. Aspirate the 25-µl tracer sample into a sterile 50-µl glass capillary tube leaving at least 1 cm air space at each end of the tube. Seal the ends by melting them closed over a gas flame. Label with a piece of waterproof tape and immerse in a boiling water bath for 2 min to denature the DNA.
2. Remove the capillary tube and place it in a large screwcap vial filled with water at 60°C. Submerge this vial into a water bath at 60°C. The vial will protect the capillary tube from damage. Incubate for the required amount of time.
3. Remove the capillary tube from the vial and quickly place on ice. To a sterile 1.5-ml microcentrifuge tube add 75 µl sterile water (which will dilute the reaction mix to 0.12 M PB) and 400 µl of 0.12 M PB (NEVER add the hybridization reaction mix directly to pure water, as some PB must be present to prevent premature denaturation of duplexed fragments). Mix and set aside at room temperature.
4. File a small groove near each end of the capillary tube and break off the sealed tips. Carefully force out the hybridization reaction into the 1.5-ml tube and mix gently by rocking the tube back and forth for 30-60 sec. Fractionate over HAP at 50°C (Protocol 5) and collect the fraction not bound to the HAP; this is the single-copy tracer.

Protocol 5: Fractionation of Single-Copy Tracer over Hydroxyapatite
(Time: 3 hr)

Removal of repeated DNA from the tracer is best accomplished with the use of disposable plastic columns which fit through a clear plastic box that can be filled with circulating water adjusted to 50°C (Figure 2). The high radioactivity of the total tracer may contaminate the columns if used later for interspecies melts.

1. Add 300 mg of dry HAP to a plastic column and rinse three times with 3 ml of sterile water. Blow the water through with a small air pressure hose fitted with a rubber stopper on the end to fit snugly into the top of the column. Wash the HAP twice with 3 ml of 0.12 M PB. Load 1 ml of 0.12 M PB and raise the circulating water temperature to 50°C and blow the buffer through the column.
2. Add the tracer preparation (about 500 µl) and mix the HAP bed with a 5-µl sealed capillary tube to remove trapped air bubbles. Let the column equilibrate to 50°C, blow the tracer through the column and collect into a 1.5-ml tube. Wash the HAP three times with 1 ml of 0.12 M PB, collecting each wash. Then wash the HAP twice with 0.48 M PB to elute the adsorbed double-stranded DNA (mainly repeated sequences) and collect each into a separate tube.
3. Remove 1 µl of each sample (blank, load fraction, 0.12 wash #1, #2, #3, 0.48 wash #1, #2) and place into scintillation vials containing 1 ml of water. Add 10 ml of scintillation fluid and count each vial for 2 min (for 3H, all scintillation solutions must be identical). The load fraction should have the highest cpm/µl and is to be used as the tracer. Remember the tracer is in 0.12 M PB which should be considered in the calculation of the final hybridization PB concentration (or the tracer may be concentrated with the glass elution method, Protocol 3).
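The Cot arithmetic of Protocol 4, and the dilution to 0.12 M PB in its step 3, can be sketched in a few lines of Python. This is an illustration rather than part of the original protocol: the function names are hypothetical, and the 5 µg of tracer recovered in 25 µl is an assumed figure chosen for the example. The acceleration factor of 5.6 for 0.48 M PB and the Cot of about 300 for primates are taken from the text.

```python
# Acceleration factor for 0.48 M PB relative to 0.12 M PB (value from the text).
AF_048_M_PB = 5.6


def cot(ug_dna: float, volume_ul: float, af: float, hours: float) -> float:
    """Cot = (ug DNA / ul sample volume) x 10 x AF x time (hr)."""
    return (ug_dna / volume_ul) * 10 * af * hours


def hours_for_cot(target_cot: float, ug_dna: float, volume_ul: float, af: float) -> float:
    """Incubation time (hr) needed to reach a target Cot."""
    return target_cot / ((ug_dna / volume_ul) * 10 * af)


def water_to_dilute(volume_ul: float, pb_from: float, pb_to: float) -> float:
    """Volume of water (ul) that brings a PB solution from pb_from to pb_to (molar)."""
    return volume_ul * (pb_from / pb_to - 1)


if __name__ == "__main__":
    # Assumed example: 5 ug of primate tracer in 25 ul of 0.48 M PB, target Cot 300.
    print(f"{hours_for_cot(300, 5, 25, AF_048_M_PB):.1f} hr to reach Cot 300")
    # Step 3 of Protocol 4: dilute 25 ul of 0.48 M PB to 0.12 M.
    print(f"{water_to_dilute(25, 0.48, 0.12):.0f} ul water")
```

Under these assumptions roughly 27 hr of incubation would be needed, which is consistent with the 1-2 days allotted to Protocol 4, and the dilution step reproduces the 75 µl of water called for in step 3.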
Protocol 6: Estimation of Tracer Fragment Length
(Time: 6 hr)

Tracers below 500 bases in length have the effect of lowering the Tm by (500/tracer length) in degrees C. Thus, a tracer of length 250 will lower the Tm by 2°C. Tracer size must be estimated in the denatured state and this can be accomplished with alkaline agarose gel electrophoresis as described below.

1. Prepare a 2% agarose gel by adding 3 g agarose to 150 ml of gel buffer (50 mM NaCl, 1 mM EDTA) and microwave or boil to dissolve. Pour into an appropriately sized gel mold and let cool.
2. Place the gel into a submerged electrophoresis chamber and add enough alkaline running buffer to cover the gel with 0.5 cm of buffer. Let the gel equilibrate for 1 hr. Remove enough running buffer so that only about 2 or 3 mm of buffer lies on top of the gel.
3. Add 1/20 vol 1 M NaOH to 1000-2000 cpm of tracer and incubate at 37°C for 10 min. Add 1/5 vol loading dye (Chapter 8, Appendix), mix, and load onto the gel in the second or third lane. Treat 500 ng of appropriate marker (pBR/HinfI) in the same fashion and load onto the first lane. Run gel at 35 V for 6-10 hr.
4. Following electrophoresis, cut the lanes into strips separating the marker lane from the tracer lane(s). Neutralize marker lane in 0.5 M Tris-HCl, pH 7.5 and stain with ethidium bromide. Visualize and photograph (include a ruler) on a UV transilluminator.
5. Cut the tracer lane into 0.5-cm segments, starting from the origin (loading well) and place each piece into a separate scintillation vial in order; add 10 ml of scintillation fluid and count for 5 min each. Graphically compare the position of the modal cpm with the measured marker fragments to estimate the average tracer size. Use this size to adjust the Tm if required. The primary reason for sizing is to avoid degraded tracers or those that are exceptionally long.

Protocol 7: Preparing Tracers by Iodination
(Time: Part A, 4 days; Part B, 24 hr)

Unlike the nick translation procedure, iodination of DNA for hybridization experiments in systematics generally is carried out on sheared, single-copy DNA. Therefore, we will describe a procedure in which single-copy DNA is first isolated and then radiolabeled. This procedure is derived from the general protocols given in Commerford (1971), Davis (1973), Tereba and McCarthy (1973), Orosz and Wetmur (1974), Scherberg and Refetoff (1975), Chan et al. (1976), Anderson and Folk (1976), Prensky (1976), and Sibley and Ahlquist (1981a). The primary result of the iodination reaction is the replacement of a hydrogen atom at the C-5 position of cytidine by an iodine atom. Iodination is much more efficient when DNA is single-stranded.

A few words of caution should be mentioned for investigators contemplating the use of radioiodine. The temperature, acidic pH, and presence of an oxidizing agent in the iodination reaction all contribute to the volatilization of a fraction of the radioiodine (Prensky, 1976). This danger requires rigid measures of monitoring and protection. The review paper of Prensky (1976) is particularly relevant.

Much larger quantities of DNA are generally labeled with radioiodine. When DNA samples are in short supply, this is a significant consideration. Also, radioiodine must be assayed in a gamma counter.

Part A. Preparation of Samples for Iodination with 125I

1. Boil 1.0-1.5 mg of sheared, native DNA in 0.48 M phosphate buffer for 10 min and incubate at 60°C to a Cot value at which repeated sequences have reannealed. Dilute the sample to 0.12 M phosphate buffer and apply to a hydroxyapatite column at 55°C. Elute the single-stranded, single-copy DNA with 20 ml of 0.12 M phosphate buffer.
2. Dialyze the single-copy fraction of DNA against deionized water for 48 hr to remove phosphate buffer. Change water frequently.
3. Transfer dialyzed sample to a serum bottle and freeze at -20°C. Lyophilize sample for 24 hr until DNA sample appears like cotton.
4. Refrigerate sample until subsequent iodination (not more than 24 hr).
5. Rehydrate lyophilized sample in a small volume (50-100 µl) of 0.2 M NaAc adjusted to pH 7.5 with glacial acetic acid. It is convenient to carry out the rehydration on a piece of parafilm.
6. Transfer the sample to a 1.5-ml microfuge tube. Vortex for 15 sec and centrifuge in a microfuge for 30 sec to remove insoluble debris. Determine the concentration of DNA using a spectrophotometer. One to 2 µl of the sample diluted in 2 ml of water generally is sufficient for this purpose.
7. Combine an aliquot of the sample containing 100 µg of DNA with 0.2 M NaAc (pH 5.7) to bring the total volume to 130 µl in a 1-ml stoppered serum vial. Add 6 µl of 2 mM KI and 11 µl of bromcresol green dye (BGD is a pH indicator). Adjust the pH of the reaction mixture to 4.7-4.8 with 0.2 M NaAc (pH 4.0), using pH color standards. Place sample on ice.

Part B. Preparation of Iodine

It is convenient to carry out iodinations for several samples at once. For sample reactions prepared as above, eight DNA samples can be radiolabeled with 5 mCi of 125I. The following protocol is thus designed for the simultaneous iodination of eight samples with 5 mCi of 125I, although the basic protocol can be adapted to any number of samples by using more or less iodine.

All of the manipulations described below should be carried out under a hood while wearing two pairs of latex gloves.

1. Start with 5 mCi of 125I in a volume of 10 µl. Vent the rubber seal of the iodine container with a 23-gauge needle. Using a 23-gauge needle and a 1-ml syringe, dilute the iodine solution with 340 µl of 0.2 M NaAc and 10 µl of 1 mM KI and allow to equilibrate for 1 hr. Do not remove the rubber top on the iodine container; iodine is highly volatile at this stage.
2. Using a 23-gauge needle and a 1-ml syringe, carefully draw out all of the iodine solution and add 40 µl of isotope (0.625 mCi) to each of the eight samples. Do not remove the rubber stoppers from the serum vials.
3. Add 60 µl of 18 mM thallium chloride (TlCl) to each sample. Again, use a 23-gauge needle and 1-ml syringe to deliver the TlCl through the rubber stopper that caps each sample reaction.
4. Incubate samples at 60°C for 15 min in a temperature block.
5. Place samples on ice for 5 min.
6. Use a 23-gauge needle and 1-ml syringe to add 30 µl of 1.0 M Tris (base) to each sample.
7. Heat samples for 10 min at 60°C.
8. Place samples on ice for 5 min.
9. Transfer samples to dialysis bags and dialyze overnight against a 4-liter solution of 0.4 M NaCl, 0.01 M phosphate buffer, and 0.2 mM EDTA.
10. Transfer samples from dialysis bags to screw-top vials using Pasteur pipettes.
11. Determine concentration of DNA using a spectrophotometer (see Protocol 1, step 12). Cuvettes committed for this purpose will remain radioactive and should not be used for other laboratory work.
12. Count 1 µl of each sample in a gamma counter.
13. Store iodinated DNA tracers at -20°C until needed for hybridization.

Protocol 8: DNA Hybridization with Hydroxyapatite and Phosphate Buffer
(Time: 2 hr setup; up to 2 weeks incubation)

The reassociation of DNA hybrids and their melting properties in neutral phosphate buffer (PB) has been used extensively to study genome structure and systematic relationships. The description given here is for the simplest manual procedure,
which is adequate but less accurate than some automated methods (e.g., Kohne et al., 1972; Britten et al., 1974).

In setting up hybridizations in the PB system with properly prepared tracer, careful attention must be given to the following: (1) The concentrations of the PB stock solutions, the hybridization reaction mix, and the PB used to elute single-stranded DNA from the hydroxyapatite column are critical and must be known with accuracy. (2) The reaction volume must be as small as possible (10-100 µl) with a DNA mass:volume ratio of at least one or greater. Long periods of time are required to achieve high Cot if the volume is large and driver concentration is small (see below). (3) The driver must be from 1,000 to 10,000-fold in excess over the tracer mass for total single-copy reactions to prevent significant self-reassociation of the tracer. (4) For a set of interspecific measurements using a particular tracer, three reactions must be set up for thermal fractionation: (a) tracer x driver DNA of the same individual or species; (b) tracer x driver DNA from different species; and (c) tracer x greatly divergent DNA (to control for self-reaction of tracer). Hybridizations using 32P or 3H are set up as follows (125I hybridizations require approximately 250,000 cpm of tracer):

1. In a 1.5-ml microcentrifuge tube combine 500-1,000 cpm of tracer (mass of tracer can be calculated from its specific activity) with a 1000-fold excess of sheared (500 bp) driver (e.g., for 5 ng tracer add 5 µg or more of driver). Remember the tracer will be in 0.12 M PB and the driver will be in water or 0.1 mM EDTA.
2. Adjust the final PB concentration to 0.60 M with 2.4 M stock (pH 6.8), mix, and draw the solution into a sterile glass capillary tube leaving at least 1 cm of air space at each end. Flame seal the ends and inspect under a dissection microscope to ensure the integrity of the seal.
3. Repeat for homoduplex and control reactions.
4. Mark tubes with water-resistant tape and boil at 100°C for 2 min. Place immediately into a 50-ml screw cap glass tube filled with water at 60°C and submerge in a 60°C water bath. Incubate for the appropriate length of time to achieve the desired Cot. This can be calculated as follows:

   Cot = (µg driver/µl reaction volume) x 10 x AF x hr of incubation

   PB concentrations over 0.12 M accelerate the reaction: e.g., 0.48 M = 5.6 times faster and 0.60 M = 6.5 times faster (see Britten et al., 1974).
5. Following the incubation, remove the capillary tube and break off each end by first filing a small groove and snapping it manually. Immediately dilute to 0.12 M PB. This is accomplished by calculating the volume of water necessary to dilute the 0.60 M reaction mix to 0.12 M and adding this volume of water to 500 µl 0.12 M PB. The hybridization solution can be dissolved directly into this solution. Do not dilute the reaction mix into pure water, as denaturation may take place prematurely. PB must be present.
6. If the hybridizations are not to be fractionated immediately, they should be taken out of the 60°C bath and placed directly into a dry ice-ethanol bath and quick frozen. They then can be stored at -20°C for a few days. Slow freezing is disastrous as everything binds to HAP.
7. Once the hybridizations have been diluted to 0.12 M PB they can be loaded on a column for thermal denaturation (Protocols 9-10).

Protocol 9: Hydroxyapatite Column Preparation
(Time: 3 hr)

1. Rinse the column (Figure 3) twice with distilled water (all solutions must be blown through the column under air pressure) and load 400 mg dry hydroxyapatite (HAP). Rinse the HAP three times with 3 ml water followed by two rinses with 3 ml 0.12 M PB. Load each column with 3 ml 0.12 M PB and increase the circulating water temperature to 50°C. Blow this solution through the column into individual scintillation vials and use them as the "blanks" for background counts.
2. Load the hybridization solution (now in about 600 µl of 0.12 M PB) onto the HAP bed and add 2.4 ml of 0.12 M PB. Gently stir the HAP with a glass rod to remove trapped air and record the temperature of the column. Remove the thermometer and blow the PB through the column into a new vial. Add 3 ml of 0.12 M PB to the column, let it equilibrate to 50°C, and blow through and collect. Repeat the 0.12 M PB wash twice more at this temperature.
3. Raise the column temperature at 3-5°C intervals to 100°C, collecting a 3 ml 0.12 M PB wash at each interval. Be sure to gently mix the HAP at each interval and allow the temperature to remain constant for 3-4 min. Check and record the temperature before blowing the wash through.
4. When the fractions are collected, add scintillation fluid and count each vial twice for 5 min, or longer (10 min) if counts are low (<40 cpm).
5. The first fraction collected is the blank. The load fraction and the three 50°C washes are used to determine the proportion of unreacted single-stranded tracer. The remaining fractions at the temperatures above 50°C are used to determine the Tm (see "Interpretation and Troubleshooting").

Figure 3 Standard water-jacketed glass column for DNA hybrid melts, attached to a circulating, controlled heating water bath. These columns are not available commercially but can be made with the aid of a glassblower by modifying standard chromatography glassware. Plastic syringes can also be modified to this configuration, with glass wool replacing the sintered glass disc. The glass column (shown) can be attached in series of up to six, depending on the flow rate of the water bath. Also shown is a rubber stopper/needle setup required to pressurize and blow the samples through the column. (Figure labels: hypodermic needle; Tygon tubing; circulating water bath with controlled heating; heated water.)

Protocol 10: Phenol Emulsion Reassociation Technique (PERT)
(Time: 2 hr, plus a 1-5 day incubation)

The PERT system (Kohne et al., 1977) is a method that achieves high Cot values at room temperature by using a phenol emulsion phase to accelerate the reassociation reaction as much as 10,000-fold. We have found this to be a high-stringency reassociation system and it is essentially unexplored for studies of relationships. Advantages of this system include high Cot in a short period of time (half Cot of Drosophila scnDNA is about 10 min), the elimination of a low temperature foot on remelting curves (due to degraded tracer, etc.), and the reduction in the amount of driver necessary (optimal acceleration at about 5 µg). However, the system is limited in that the stringency criteria probably cannot be manipulated and the NPH falls more quickly with increasing evolutionary distance than with the standard PB system. The systematic value of this technique has yet to be determined but the protocol is included here to promote further study.

1. Combine 5 µg of sheared driver and 500-1000 cpm tracer in a 1.5-ml microcentrifuge tube and adjust the PB concentration to 0.48 M using 2.4 M stock. Add 0.48 M PB to a final volume of 1 ml, mix, and place in 100°C water bath for 3-5 min to denature the DNA.
2. Cool the mixture to room temperature and add 100 µl equilibrated phenol and vortex. Shake continuously to keep the phases from separating for one to several days. Mixing can be accomplished by attaching the tubes to a "wrist action" flask shaker, which shakes the tubes in a vigorous up-and-down motion.
3. Remove from shaking and add 400 µl ether and vortex to extract the phenol. Remove the ether and discard.
4. In a sterile 15-ml tube combine 3 ml water, 1 ml 0.12 M PB, and add exactly 1 ml of the hybridization solution. This will yield 5 ml of a 0.12 M PB solution which can be fractionated over HAP as in Protocols 8 and 9, except for the following: instead of adding 0.12 M PB to the loaded sample to bring it to 3 ml, simply add the 5 ml to the column, mix, and blow it through the column, dividing the sample equally into two scintillation vials. Wash three times with 0.12 M PB at 50°C and continue as in Protocol 9, step 3.

Protocol 11: Analysis of Hybrid Thermal Stability Using the S1 Nuclease-TEACL Assay
(Time: 3 hr setup, incubation up to two weeks, 6 hr fractionation)

This procedure requires the heating of a number of samples simultaneously in a temperature gradient from 45 to 65°C. This is accomplished with the use of an aluminum block with heat exchangers at each end that can be connected to two separate temperature-controlled circulating water baths. A series of 11-mm diameter holes drilled into the block in a staggered pattern can accommodate 1.5-ml microcentrifuge tubes after the holes have been partially filled with water. The block must be well insulated with styrofoam (six sides) and the resulting temperature gradient along the block is linear and should be accurate within 0.1°C (Britten et al., 1978). This should be checked with a precision thermometer moved from hole to hole.

1. Tracer and driver DNA must be in 0.1 mM EDTA. Tracer should be at least 200 cpm/µl and sheared driver must be at a concentration of 5 µg/µl. Add about 8,000-10,000 cpm tracer to 300-400 µg driver (e.g., 30 µl tracer to 70 µl driver). To this mixture add an equal volume of 3.0 M TEACL to reach a final concentration of 1.5 M TEACL. Seal in a 200-µl capillary tube as described above (in the PB system), and incubate in a boiling water bath for 2 min. Due to the large bore of the capillary tube, it is useful to break a smaller diameter tube (10-20 µl) and insert a small 0.5-cm piece into the ends of the 200-µl tube before flaming the ends. This will ensure a good seal.
2. After denaturation, place the tube in a 45°C water bath for a sufficient length of time to achieve an appropriate Cot. The acceleration factor for incubations in 3.5 M TEACL is four times that of 0.12 M PB. Thus, Cot can be calculated using the formula: Cot = [(µg/µl) x 10 x 4 x hr incubation].
3. Following the reassociation incubation, cool the sample to room temperature, add an equal volume of 2x S1 nuclease buffer to decrease the pH to 4.4 and the TEACL concentration to 0.75 M. Add S1 nuclease (1-2 µl of a 5 U/µl stock) for 95% single-strand digestion (appropriate incubation times may have to be determined empirically). Vortex and incubate at 37°C for 10 min. Remove, vortex again to remove lumps, and incubate at 37°C for 50 min.
4. Chill the sample on ice and add 1/10 volume of 0.30 M EDTA to stop the reaction. Remove
10-20 pl of the mixture and purify with the is the portion of DNA remaining as undi-
glassmilk elution proccdure (Protocol 3) or gested duplex. The melting curve can be con-
the spun column technique (Chapter 8) and structed by plotting the incrcasing fraction of
determine the duplex fragment size by elec- the sample digested by the S1 nuclease versus
trophoresis through alkaline agarose gel elec- increasing temperature. The cpin of duplex m
trophoresis, as outlined in Protocol 7. the unheated sample divided by the total cpm
5. To remove digested single-stranded DNA of the predigested 100-p1 sample ( ~ 1 0 0pro-
)
from the remainder of the sample and bring it vldes the extent of reassociation in percent.
to 2.4 M TEACL, load the sample on to a 3-ml 9. As an alternative to the final Sephadex G-100
Sephadex G-100 column previously equili- fractionation, the DNA in duplex can be pre-
brated with 2.4 M TEACL and wash through cipitated with cetyltrimethylammonium bro-
with 10-15 ml of 2.4 M TEACL. Collect 20- to mide (CETAB) and counted separately from
30-drop fractions and count Cherenkov in a the digested single-stranded DNA remaining
scintillation counter to identify the exclusion in the supernatant (Hereford and Rosbash,
peak (maximum concentration of duplex 1977; T.J. Hall et al., 1980).Add 150 pg of calf
DNA which may be in more than one of the thymus DNA and 1 / 2 volume of 9% CETAB
fractions). Bring the peak fraction to 1.8 ml and NaAc to a final concentration of 0.1 M.
with 2.4 M TEACL, and determine the precise Spin in a microcentrifuge for 10 min and re-
concentration by means of a refractometer. suspend the pellet in a small volume of 1.0
6. Remove 100-p1 aliquots from the 1.8-ml sam- mM EDTA; count this sample separately from
ple and place into 16 microcentrifuge tubes. Place 14 of these in the heating block (with a little water in each hole). Keep one of the remaining samples unheated and heat the other to 70°C for 30 min, prior to S1 digestion. Heat the samples in the heating block for 30 min in the thermal gradient described above. These samples will determine the melting curve. To determine the total amount of radioactivity in each sample, count 100 µl of the 1.8-ml sample in liquid scintillation fluid.

7. After heating, place all the samples on ice and add 100 µl of sterile water, 200 µl of 2x S1 nuclease buffer, and 10 µl of S1 enzyme solution (Appendix); use enough S1 nuclease to digest 99% of single strands. Vortex and incubate at 37°C for 1 hr. Put on ice and add 1/10 volume 0.5 M EDTA (= 40.5 µl) to each tube to stop the reaction. At this point the remaining duplex DNA must again be separated from the digestion products (both of which are radioactive).

8. Prepare 16 individual (3-ml) Sephadex G-100 columns that have been equilibrated with 1.0 mM EDTA. Load the 16 samples and elute with 3 ml 1.0 mM EDTA and collect the excluded fraction. Mix with an appropriate volume of scintillation fluid (10 ml) and count for 5-10 min per vial. This excluded fraction

the supernatant.

INTERPRETATION AND TROUBLESHOOTING

Calculation of Melting Curves from Raw Counts: An Example

Several different measures are possible for estimating evolutionary distance from interspecific DNA hybridizations and melting curves: ΔTmode, ΔTm, percentage hybridization (ΔNPH), and ΔT50H (which combines ΔTm and ΔNPH into a single number).

The Tm is the interpolated temperature at which 50% of the hybrids that were formed remain in duplexes. Tm can be determined easily by inspection of an integral melting curve, or nonlinear least-squares regression methods can be used to increase accuracy. The T50H is an estimate of the temperature at which 50% of the DNA remains in duplexes; this measure differs from Tm if not all of the DNA fragments form duplexes. The Tmode depends on the determination of the interpolated temperature at which the maximum number of hybrids melt (the peak in a differential plot of a melting curve).
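As a rough illustration of the Tmode definition, the sketch below locates the peak of a differential melting curve and refines it with a three-point parabolic interpolation. The counts are the blank-corrected homoduplex values for the 55-100°C washes listed in Table 2; the function name `t_mode`, the choice of a three-point parabola, and the convention of assigning each wash's counts to its wash temperature are assumptions made for this example, not a procedure prescribed by the chapter.

```python
def t_mode(temps, counts):
    """Estimate Tmode as the peak of a differential melting curve.

    temps  -- evenly spaced wash temperatures (degrees C)
    counts -- blank-corrected cpm eluted at each temperature
    The peak is refined with a parabola through the highest point
    and its two neighbours (an illustrative choice of method).
    """
    i = max(range(len(counts)), key=lambda k: counts[k])
    if i == 0 or i == len(counts) - 1:
        return float(temps[i])          # peak at the edge: no refinement
    y0, y1, y2 = counts[i - 1], counts[i], counts[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0:
        return float(temps[i])          # flat top: keep the grid value
    step = temps[i + 1] - temps[i]
    return temps[i] + 0.5 * step * (y0 - y2) / curvature


# Blank-corrected cpm for the bound fraction (vials 6-15 of Table 2)
temps = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
counts = [1.8, 5.2, 4.4, 13.2, 32.0, 83.4, 203.8, 187.8, 21.8, 0.2]

tmode = t_mode(temps, counts)
print(f"Tmode is approximately {tmode:.1f} C")
```

With washes only every 5°C the refined peak is a coarse estimate; narrower temperature intervals would constrain it better.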
190 Chapter 6 / Werman, Springer & Britten
The following example (Tables 2-4) is based on a PERT, interspecies tracer-driver reassociation, with a homoduplex control. The curves are derived from actual data and are of good quality. Poor curves, influenced by a variety of factors, are illustrated and described at the end of this section.

The tracer has reacted to the extent of 87% with driver from which the tracer was originally made (homoduplex control); this degree of hybridization is not uncommon for PERT reactions. Thermal denaturation of the duplex DNA is as described in Protocol 9, where 15 fractions (vials) are collected and counted, with washes at 5°C intervals (narrower intervals would be preferable). The fractions and the cpm tracer in each fraction based on 5 min counting of each in a liquid scintillation counter (beta) are presented in Table 2 (a gamma counter is needed for 125I).

Adding the blank-corrected cpm from #2 to #15 gives the total cpm (= 638.4). The proportion of tracer which did not bind to the HAP on loading is the sum of fractions 2 through 5 divided by the total cpm x 100 (84.6/638.4 x 100 = 13.25% unbound). The bound fraction, or the extent of reaction, is the percentage tracer that formed stable hybrids during the reassociation and is represented by the sum of the counts from vials 6 through 15 divided by the total, and expressed as a percentage [(553.8/638.4) x 100 = 86.74%].

To obtain the data to draw an integral melting curve normalized to 100% reactivity, the cpm from vials 6-15 are added sequentially and each total is divided by the total cpm bound and multiplied by 100 to give the total percentage eluted at each temperature (Table 3). The curve (Figure 4) is generated by plotting percentage eluted (y axis) versus temperature (x axis). The Tm of this curve is the point on the temperature scale where the curve intersects 50% eluted.

In a heteroduplex DNA hybridization reaction involving the tracer (in the first example) and a driver from a different species, the curve (normalized to 100%) can be determined and the Tm identified as in the example above. However, for T50H estimates the homoduplex curve is normalized to 100%, as above, but the heteroduplex
Table 2
Raw counts data for homoduplex hybridization for the calculation of
NPR and an integral melting curve

Temperature (°C)   Vial                        cpm eluted   cpm - blank
 50                 1. Blank (3-ml PB wash)       12.4          0.0
 50                 2. Load fraction              41.8         29.4   (unbound fraction: vials 2-5)
 50                 3. 3-ml PB wash               59.2         46.8
 50                 4. 3-ml PB wash               20.8          8.4
 50                 5. 3-ml PB wash               12.0          0.0
 55                 6. 3-ml PB wash               14.2          1.8   (bound fraction: vials 6-15)
 60                 7. 3-ml PB wash               17.6          5.2
 65                 8. 3-ml PB wash               16.8          4.4
 70                 9. 3-ml PB wash               25.6         13.2
 75                10. 3-ml PB wash               44.4         32.0
 80                11. 3-ml PB wash               95.4         83.4
 85                12. 3-ml PB wash              216.2        203.8
 90                13. 3-ml PB wash              200.2        187.8
 95                14. 3-ml PB wash               34.2         21.8
100                15. 3-ml PB wash               12.6          0.2
Total                                                         638.4
Nucleic Acids I: DNA-DNA Hybridization 191
Table 3
Counting data for calculating an integral melting curve normalized to
100% reactivity

Temperature (°C)   Cumulative cpm - blank   cpm eluted (%)
 55                  1.8 (vial 6)             1.8/553.8 =   0.33
 60                  7.0 (vials 6-7)          7.0/553.8 =   1.27
 65                 11.4 (vials 6-8)         11.4/553.8 =   2.06
 70                 24.6 (vials 6-9)         24.6/553.8 =   4.45
 75                 56.6 (vials 6-10)        56.6/553.8 =  10.20
 80                140.0 (vials 6-11)       140.0/553.8 =  25.30
 85                343.8 (vials 6-12)       343.8/553.8 =  62.10
 90                531.6 (vials 6-13)       531.6/553.8 =  96.00
 95                553.4 (vials 6-14)       553.4/553.8 =  99.96
100                553.8 (vials 6-15)       553.8/553.8 = 100.00
curve is normalized to the homoduplex melt. For example, if the homoduplex melt reacted 86.7% and the heteroduplex comparison reacted 75.6%, then the homoduplex curve would be normalized to 100% and the heteroduplex curve would be normalized to the former, resulting in an NPH (normalized percentage hybridization) of 87.2%. The procedure for normalizing is illustrated in Table 4. For the heteroduplex melt, unbound cpm = 154.9, bound cpm = 480.9, and total cpm = 635.8 (480.9/635.8 x 100 = 75.6% bound).
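The percentages quoted above follow directly from the unbound and bound totals. The short sketch below reproduces them; the function name `percent_bound` and the variable names are illustrative choices, and the cpm totals are the blank-corrected figures given in the text.

```python
def percent_bound(unbound_cpm, bound_cpm):
    """Percentage of tracer that formed stable hybrids (bound to HAP)."""
    total_cpm = unbound_cpm + bound_cpm
    return 100.0 * bound_cpm / total_cpm


# Blank-corrected totals quoted in the text
homo_reacted = percent_bound(84.6, 553.8)      # homoduplex control
hetero_reacted = percent_bound(154.9, 480.9)   # heteroduplex comparison

# NPH: heteroduplex reaction expressed relative to the homoduplex control
nph = 100.0 * hetero_reacted / homo_reacted

print(f"homoduplex reacted:   {homo_reacted:.1f}%")
print(f"heteroduplex reacted: {hetero_reacted:.1f}%")
print(f"NPH:                  {nph:.1f}%")
```

Run as written, this prints 86.7%, 75.6%, and 87.2%, matching the values used in the example.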
The Tm (Figure 4) can be derived from column D (Table 4). To estimate T50H, these data and the resulting curve must be normalized to the homoduplex melt. This can be accomplished by dividing the heteroduplex percentage reaction (percent bound) by the homoduplex percentage reaction: 75.6/86.7 = 0.8719, which gives a heteroduplex NPH of 87.19%. The data from column D are transformed by multiplying the heteroduplex values by NPH/100 (0.8719) and adding the y intercept of the curve (100 - 87.19 = 12.81).
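The transformation just described can be sketched as follows. Starting from the blank-corrected heteroduplex counts (column B of Table 4, vials 5-15), the code builds the cumulative percentage eluted (column D) and then rescales it to the homoduplex control (column E). The variable names are assumptions made for this example, and NPH is fixed at the 87.19% quoted in the text rather than recomputed.

```python
from itertools import accumulate

NPH = 87.19  # heteroduplex NPH from the text (75.6/86.7 x 100)

# Elution temperatures and blank-corrected cpm for vials 5-15 of Table 4
temps = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
bound_counts = [0.0, 1.3, 4.7, 9.7, 37.3, 82.1, 134.7, 139.3, 67.1, 4.7, 0.0]

cumulative = list(accumulate(bound_counts))        # column C
total_bound = cumulative[-1]                       # 480.9 cpm

# Column D: cumulative counts as a percentage of total bound counts
col_d = [100.0 * c / total_bound for c in cumulative]

# Column E: rescale by NPH/100 and add the y intercept (100 - NPH)
y_intercept = 100.0 - NPH
col_e = [d * NPH / 100.0 + y_intercept for d in col_d]

for t, d, e in zip(temps, col_d, col_e):
    print(f"{t:>3} C   D = {d:6.2f}   E = {e:6.2f}")
```

The rescaled curve starts at 12.81% rather than 0% because that share of the tracer, relative to the homoduplex control, never formed hybrids. Values computed this way can differ from the printed table in the last decimal place because of rounding.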
Figure 4 Homoduplex (○) and heteroduplex (●) melting curves illustrating Tm and ΔTm. Both curves are normalized to 100% (see text for explanation). [x axis: Temperature (°C), with both Tm values marked.]

Utilizing the information in the far right column (E) of Table 4, a curve can be drawn that is normalized to the homoduplex melt (Figure 5). The value of T50H can be obtained by inspection where both curves intersect the 50% eluted line. For the homoduplex curve in this example, T50H = Tm. The difference between the homoduplex Tm and the heteroduplex Tm, or the difference between the T50H values, is the numerical value used in distance calculations for phylogenetic analysis (Chapter 11; see also Felsenstein 1981a, 1987).

Table 4
Calculation of interspecies melting curve data(a)

Vial      A        B        C        D        E     Elution temperature (°C)
  1     15.7      0.0
  2     46.4     30.7
  3    129.6    113.9
  4     26.0     10.3
  5     13.4      0.0              0.00    12.81     50
  6     17.0      1.3      1.3     0.27    13.05     55
  7     20.4      4.7      6.0     1.25    13.89     60
  8     25.4      9.7     15.7     3.26    15.65     65
  9     53.0     37.3     53.0    11.02    22.41     70
 10     97.8     82.1    135.1    28.09    37.30     75
 11    150.4    134.7    269.8    56.10    61.71     80
 12    155.0    139.3    409.1    85.07    86.98     85
 13     82.8     67.1    476.2    99.02    99.14     90
 14     20.4      4.7    480.9   100.00   100.00     95
 15     11.8      0.0    480.9   100.00   100.00    100

(a) Data are from raw counts normalized to 100% reaction (column D) and normalized with respect to the homoduplex control (column E). Column A, raw counts; B, raw counts minus blank; C, cumulative sum of counts; D, % of total bound (= 480.9) versus temperatures; and E, counts normalized to homoduplex control.
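The distance values themselves can be obtained by interpolating each curve at 50% eluted instead of reading them off a plot. The sketch below uses simple linear interpolation between adjacent 5°C washes, which is an assumption of this example (the chapter reads the values by inspection and notes that nonlinear regression can improve accuracy). The cumulative counts are those of Table 3 and column C of Table 4; the helper name `temp_at_percent` is hypothetical.

```python
def temp_at_percent(temps, percent_eluted, target=50.0):
    """Linearly interpolate the temperature at which a cumulative
    melting curve first reaches `target` percent eluted."""
    for i in range(1, len(temps)):
        lo, hi = percent_eluted[i - 1], percent_eluted[i]
        if lo < target <= hi:
            frac = (target - lo) / (hi - lo)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("curve does not cross the target percentage")


temps = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Cumulative blank-corrected cpm (0 at the 50 C criterion temperature)
homo_cum = [0.0, 1.8, 7.0, 11.4, 24.6, 56.6, 140.0, 343.8, 531.6, 553.4, 553.8]
hetero_cum = [0.0, 1.3, 6.0, 15.7, 53.0, 135.1, 269.8, 409.1, 476.2, 480.9, 480.9]

NPH = 87.19  # heteroduplex NPH from the text

homo_pct = [100.0 * c / homo_cum[-1] for c in homo_cum]
hetero_pct = [100.0 * c / hetero_cum[-1] for c in hetero_cum]          # column D
hetero_norm = [p * NPH / 100.0 + (100.0 - NPH) for p in hetero_pct]    # column E

tm_homo = temp_at_percent(temps, homo_pct)
tm_hetero = temp_at_percent(temps, hetero_pct)
t50h_hetero = temp_at_percent(temps, hetero_norm)

delta_tm = tm_homo - tm_hetero
delta_t50h = tm_homo - t50h_hetero   # for the homoduplex curve, T50H = Tm

print(f"Tm homoduplex   = {tm_homo:.2f} C")
print(f"Tm heteroduplex = {tm_hetero:.2f} C   delta Tm   = {delta_tm:.2f} C")
print(f"T50H heterodup. = {t50h_hetero:.2f} C   delta T50H = {delta_t50h:.2f} C")
```

Under these assumptions ΔT50H comes out larger than ΔTm, as expected, because T50H also accounts for the tracer that failed to hybridize.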
Figure 5 Homoduplex (○) and heteroduplex (●) curves illustrating T50H and ΔT50H. The homoduplex curve is normalized to 100% whereas the heteroduplex curve is normalized relative to the homoduplex curve (see text for explanation). [x axis: Temperature (°C), with Tm and T50H marked; y axis: scale from 0 to 100.]

Problematic Melting Curves

Precision and accuracy in the execution of hybridization methods are paramount for the consistent generation of high-quality melting curves and the related distance information. Precision in replication can be enhanced with the use of automated methods that can run many columns simultaneously. Accuracy can be improved by close attention to technical details.

There are several sources of error that can result in poor melting curves and lead to difficulties
in interpretation. Consequently, poor melting curves, and the resulting ambiguous distance information, ultimately result in weak hypotheses of phylogenetic relationship. Many of these potential errors are manifested by (1) the presence of a low temperature melting component, or "foot," near or at the criterion temperature; and (2) the broadening of a melting curve over a wider temperature range (Figure 6). In certain instances, these observations are real and not due to experimental error; the significance of these phenomena is currently under debate (Britten, 1989). Below is a list of possible sources of contamination in the hybridization reaction that may give rise to poor curves, alone or in combination.

Figure 6 Idealized integral melting curve profiles that may indicate potential problems of tracers used in hybridization reactions based on melting characteristics of homoduplex controls. Curve A represents the expected curve for a homoduplex reaction, with good tracer and driver. It is characterized by a steep slope near the median duplex melting temperature (Tm here = 85°C), with very little DNA eluted below 70°C. Curve B may result if the tracer includes a significant amount of repeated DNA. This usually results in a slightly depressed Tm and broad lower temperature "foot." The profile of curve C indicates contamination with short, degraded tracer fragments, which usually leads to a steady elution of fragments at low temperatures with only a moderate reduction in the Tm. Curve D illustrates that the tracer is dominated by short, degraded fragments, resulting in an excessive low temperature component and a dramatic depression of the expected Tm. [x axis: Temperature (°C).]

Repeated Elements in the Single-Copy DNA Used to Make Tracers

Repeated elements often have divergent members that may lead to extensive paralogous rather than orthologous reassociation. This can contribute to unstable, low-temperature melting duplexes. Although it is unlikely that all repeated elements can be removed (i.e., low-frequency repeats), it is important to fractionate and remove repeated elements by obtaining a Cot value where at least 10% of the single-copy DNA has reassociated.

Short or Degraded Tracer Fragments

Short tracer fragments (<25 bases) usually melt at low temperatures, or simply broaden the resulting curve if there exists a distribution of small lengths. Also, if the entire tracer sample is composed of short fragments the Tm will be greatly reduced. A solution would be to use fresh tracer that has been properly sized and is free of possible DNase contamination and unincorporated nucleotides.

Interindividual Sequence Variation

Individual heterozygosity, coupled with the use of different sources of tracer and driver in intraspecies melts, can lead to broad melting curves. This can also be the case if the driver or tracer is derived from a mixed source (Britten et al., 1978). Good quality control reactions may provide insight into the contribution of these factors to questionable melts.

Incomplete Reactions

If hybridization reactions are not carried out to the appropriate Cot to permit reassociation of at least 90% of the single-copy DNA, low-temperature melting, hybrid duplexes may result. Depression of reassociation rates with sequence divergence must be considered in allowing for reactions to approach termination, without compromising tracer stability due to autoradiochemical degradation.

Inadequate DNA Purification

DNA used for driver and tracer preparations must be free of protein, carbohydrate, metallic ion, and RNA contamination. The presence of any
of these substances can interfere with reassociation or HAP binding. Occasionally proteins bound to DNA or contaminating hemoglobin (from samples derived from whole blood) are difficult to remove and repeated phenol extractions and proteinase treatments are required. Glycogen and RNA are usually eliminated by spooling DNA during isolation, whereas metallic ions are excluded by passage of the DNA through a chelating resin column (see Protocols 1 and 2).

Characteristics of Distance Estimates Derived from Raw Melting Curve Data

The precise shape of melting curves depends on how the single-copy DNA evolves; this is not a simple issue because there are differences between different systematic groups. Among insects, it has been clearly shown that there is a wide range of rates of substitution among different regions of the genome of an individual lineage (Caccone et al., 1987). This also has been shown to be true for sea urchins (Grula et al., 1982), although they have less heterogeneity. High rate variability may also be typical of mammals (based on the fraction of interspecies duplexes digested by S1 nuclease; Benveniste, 1985), but further investigation is needed. Figure 7 shows freehand curves for closely and more distantly related species, with the Tmode, Tm, and T50H indicated. Figure 7A shows how these three measures change with distance between the species, and Figure 7B shows an example (T.J. Hall et al., 1980) of greater distances for sea urchin DNA.

Figure 7 (A) Melting curves of heteroduplexes at different distances, showing Tm, T50H, and Tmode. Curve A is for intraspecies hybrids and curve B is for fairly closely related species. The lower curves are smoothed representations of the amount of DNA dissociated at each temperature (differential melting curve). (B) Curves C and D show the results to be expected at moderate and great distances. For curve C the T50H is clearly the preferred measure. The Tmode cannot be determined due to the flat melting curve. For very great distance, as in curve D, no established measure is in use and future work is required. The Tm is meaningless; the Tmode depends on details of the distribution of conserved DNA sequences and, in the case shown, is almost the same as for native DNA. The best measure for this comparison is perhaps the extent of hybridization. [x axis: Temperature (°C).]

The range of rates of DNA change, distributed among different sequences in the genome, determines the shape of melting curves at different interspecies distances. The slowly changing DNA shows little Tm reduction and maintains the higher temperature part of the melting curve as distance increases. The rapidly changing fraction makes the foot of the curves (at lower temperature) change rapidly and leads to incomplete hybridization at small distances, depending on the conditions of hybridization. The Tmode is not appreciably affected by the rapidly changing DNA sequences. In contrast, the Tm is affected by both the rapidly and slowly changing sequences for closely related species. At moderate distances the most divergent sequences fail to hybridize and the rapidly changing sequences then principally affect the extent of hybridization rather than the Tm. If the distribution of rates was well known, the shape of the melting curves would be evident. In that circumstance, the relationships between
Tm, Tmode, NPH, and T50H would also be apparent and each of the others could be calculated if one were measured. However, it will be a while before that much information is available for other taxonomic groups, although Sheldon and Bledsoe (1989) have investigated the relationship between these distance measures for bird data and Kirsch et al. (1989) have presented data for marsupials. Below, we list some characteristics of each of these measures.

Tmode

ΔTmode has been advocated as the distance measure of choice for phylogenetic applications by Sarich et al. (1989). Like ΔTm, ΔTmode is moderately precise and does not suffer from the high standard error that NPH and ΔT50H do (Bledsoe, 1987; Kirsch et al., 1989). This is clearly an advantage of the former two measures for the range over which they can be measured. Another putative attribute of ΔTmode is that the criterion conditions may not form a boundary at which differences become compressed; the mode may simply shift below criterion (with increasing interspecies divergence) and cease to be a property of melting curves. Thus, ΔTmode may index comparable suites of sequences over its range rather than the increasingly smaller subsets indexed by ΔTm; theoretically, this would render ΔTmode values more additive than ΔTm values. An exception to this pattern is noted in T.J. Hall et al. (1980), where slowly evolving components of the genome in distant interspecies comparisons may retain a Tmode.

Tmode measures a peak in the distribution of diverging DNA. If the distribution of rates were Gaussian, the variance were small, and if the technique were good enough to determine the peak accurately, then Tmode would be the measure of choice. However, melting curves spread with increasing divergence and it becomes increasingly difficult to determine Tmode accurately (Figure 7B); false modes may result from the scatter of individual measurements if the curve is broad and flat. Also, different algorithms for Tmode determination (e.g., modified Fermi-Dirac curve fitting, parabolic curve fitting, graphic methods; see Sheldon and Bledsoe, 1989) do not always give the same answer.

It is worth noting that much of the width of HAP melting curves is due to the range of base composition present in the DNA and the failure of complete washing at each temperature. It has never been possible to draw conclusions about the range of rates of divergence from HAP measurements for this reason. If the distribution of rates of divergence were broad and uniform, and if there were no variance in base composition, no mode could exist. In reality, however, spectrophotometric and hydroxyapatite melting curves for native DNA are sigmoidal owing to the effects of base composition, and in the case of hydroxyapatite, incomplete washing. Thus, one always observes a mode and it is not yet possible to know how such artifacts affect the apparent mode in a particular measurement. It may be the case, however, that the mode indexes that component of the genome that has the modal base composition as well as the modal rate of divergence.

NPH

The normalized percent hybridization (NPH) falls for heteroduplex comparisons compared to homoduplex controls, even if the heteroduplex comparison involves closely related species. If there is a rapidly evolving class of DNA, NPH is a measure of that class. However, it is not yet known how much of the reduction in NPH is due to kinetic effects; rates of reassociation differ in homoduplex versus heteroduplex reactions and the amount of driver DNA available for hybridization may be used up before the divergent tracer-driver sequences can hybridize. Clearly, there is a need for measurements of this kinetic effect.

The rate of reassociation for interspecies hybrid formation is retarded by a factor of two per 10°C divergence in Tm (Bonner et al., 1973). Late in the incubation there is less driver DNA available to complete the slower formation of more divergent tracer-driver duplexes. One published measurement (Galau et al., 1976) suggests a large kinetic effect for a Xenopus interspecies hybridization but it has not been confirmed. Recent measurements with Drosophila DNA (Werman et al., 1990) and primate DNA (Bonner et al., 1980) suggest that the kinetic effect is small for modest (6%) divergence. The underlying process is complex
since the duplexes of driver DNA form between randomly terminated fragments and single-stranded regions remain available at their ends. Thus, the concentration of the single-stranded part of the driver falls at a rate of (1 + kCot)^-0.44, so that there is some single-stranded driver available very late in the incubation (Britten and Davidson, 1985). The kinetic effect therefore can be reduced by incubation to high Cot, as is usually done. Since there are no satisfactory measurements, anyone planning to use NPH for major phylogenetic work is advised to make some determinations of the kinetic effect under the conditions of incubation. The obvious method is to rehybridize the non-hybridizing fraction in a standard incubation.

In many measurements (Sibley and Ahlquist, 1981a, 1983; Kirsch et al., 1989; Powell and Caccone, 1990) NPH is not accurately determined. However, in other work it apparently has been determined more reproducibly (Hall et al., 1980; Benveniste, 1985). The technical problems that are due to limited Cot and to variations in length and concentration of tracer and driver preparations are undoubtedly solvable, so we may look forward to more precise determinations of NPH. At large evolutionary distances where a majority of the DNA can no longer form interspecies duplexes at the criterion temperature, NPH is the primary available measure of interspecies relationships. Tmode and Tm, in turn, are not useful at such high levels of divergence. Recently, Marshall and Swift (1992) have used 1/NPH as a distance metric for sand dollars, where observed NPHs are less than 50%. While their resulting phylogenies derived from ΔTm and 1/NPH show identical branching patterns, they point out that the utility of this approach is based on highly reproducible NPH values.

Tm

For fairly closely related species the Tm is a good measure of the amount of the DNA that hybridizes. The Tm falls steadily with increasing divergence until it reaches about halfway between the criterion temperature and the Tm of precise duplexes. At greater divergences, the amount of DNA that hybridizes continues to fall while the Tm changes very little. There may be some additional decrease in Tm if there is a well-defined peak in the distribution of rates of DNA sequence change since the peak may move down toward the criterion. Thus, there is a linear relationship between Tm reduction and divergence over short evolutionary time. At increasing divergence the curve begins to level off. This causes compression of the estimates of greater divergence.

T50H

The T50H measure was devised by Kohne et al. (1972) to correct for the reduction in normalized percentage hybridization (NPH) that occurs even for closely related species and to remedy the compression of ΔTm values that is forced by criterion conditions. The method of calculation is shown in Figure 5 and Table 4. As discussed, T50H is a measure of the median sequence divergence between species (T.J. Hall et al., 1980). Obviously, if median sequence divergence could be accurately estimated, the result would be independent of the criterion used in a particular measurement. It was shown by M.J. Smith et al. (1982) that a good compensation for different temperatures of incubation can be achieved by using T50H. The compensation for different criteria even extends to S1 methods. The S1 nuclease digests the more divergent duplexes and the undigested DNA is the better paired fraction, so the observed reduction in Tm with S1 is less than with HAP. However, the NPH is also less and as a result the T50H observed with S1 is about the same as with hydroxyapatite. Also, T50H gives a more linear relationship with sequence divergence than do the other measures of distance (Britten, 1986).

ΔT50H, however, has its own limitations. Some of the initial reduction is due to kinetic effects (see the discussion of NPH) and this could exaggerate the actual amount of divergence. The T50H continues to fall more or less linearly with increasing divergence until the NPH falls below 50%, at which point it becomes difficult to determine and requires extrapolation beyond the observed melting curve. Estimates of T50H obtained using extrapolation must be regarded as more unreliable than those that are obtained directly from melting curves. Another problem with T50H results from the error in determining NPH. The effect of this error in measuring NPH has much less of an impact on T50H when the slope of the melting curve
is very steep (as in close relationships measured with 2.4 M TEACl).

One approach to remedy the kinetic problems of ΔT50H is to calculate the expected decrease in NPH (for a heteroduplex reaction) based solely on kinetics and add this to the observed NPH value before calculating ΔT50H. However, data are not yet available to make such a correction and such measurements would be valuable. A practical suggestion is to carry out hybridization reactions to high Cot values in an attempt to minimize the NPH differences caused by kinetic effects.

T25H

An alternative for very distant species would be to use the T25H, or the temperature at which 25% of the hybridizable tracer remains in duplexes, but this has not yet been tested. In the case of melting curves for two distant species of sea urchin studied by T.J. Hall et al. (1980) and Angerer et al. (1976), only about 20% of the DNA hybridized. The reduction in Tmode and Tm, in turn, was only a few degrees since the DNA that hybridized was dominated by a high melting temperature component. The 20% NPH is a good measure for this case but a T25H estimate for the distant species would be easier to combine with other data for more closely related species.

Hybridization Data in Phylogenetic Reconstruction

Phylogenetic reconstruction is discussed in Chapter 11, so the discussion here is restricted to factors that specifically apply to DNA hybridization. If DNA evolutionary changes were additive, so that Buneman's (1971) four-point metric was satisfied and all base pair changes were accurately indexed over all pairwise comparisons, then reconstruction of phylogeny would be a trivial operation. Optimality criteria such as those developed by Fitch and Margoliash (1967) and Cavalli-Sforza and Edwards (1967) provide unambiguous criteria for choosing among competing topologies when distances are additive: the correct topology will exhibit a perfect fit to the matrix of distances. Springer and Krajewski (1989) have proved a Perfect-Fit Theorem to substantiate this argument. An unambiguous outgroup taxon then allows one to root the topology and specify net amounts of shared derived and uniquely derived change on that topology. Furthermore, such an analysis produces a topology that is equivalent to the topology that one would obtain using the individual characters and a parsimony algorithm (see Chapter 11). In reality, however, several factors may compromise the additivity of DNA hybridization data and destroy the precise correspondence between trees derived from distance data and trees derived from parsimony analysis of individual characters. Some of these factors result from processes of DNA evolution and also influence sequence data; others are peculiar to different measures of genetic distance derived from DNA hybridization data. At close distances, for example, NPH is somewhat inaccurate but is preferable to the use of ΔTmode or ΔTm. At larger distances, ΔTmode is subject to error in its determination and ΔTm exhibits a saturation, making ΔT50H the method of choice. At even larger distances ΔT50H cannot be determined and NPH remains perhaps the only meaningful measure.

Resampling techniques (jackknifing and bootstrapping) have been utilized in DNA-DNA hybridization studies to determine the confidence levels associated with particular topological arrangements in tree construction (see Krajewski and Dickerman, 1990 and Chapter 11). The jackknifing method of Lanyon (1985) is particularly sensitive to between-cell internal inconsistency in a distance matrix whereas the bootstrapping method of Krajewski and Dickerman (1990) is sensitive to within-cell imprecision. Trees assessed by these two techniques are largely robust with ΔTm and ΔT50H (Springer et al., 1990; Kirsch et al., 1990a; Springer and Kirsch, 1991; Caccone et al., 1992; Sheldon et al., 1992), but not (at least in one case) with ΔTmode (Kirsch et al., 1990a).

Sources of Non-Additivity and Error

Below we consider sources of error relevant to DNA-DNA hybridization data: homoplasy, uneven distribution of rates of change, measurement error, paralogous sequences, differences in genome size, and intraspecific variation.

HOMOPLASY Homoplasy (i.e., reversals and parallelisms) causes observed sequence differences
to underrepresent actual amounts of sequence divergence. As a consequence both DNA sequence data and DNA hybridization data are non-additive in expectation, because they do not index all base pair changes that have taken place. Furthermore, the accumulation of homoplastic changes is non-linear and becomes progressively more important for increasingly divergent sequences. Indeed, the accumulation of homoplasy in DNA sequences has been studied extensively and several mathematical models have been developed to describe the effect of accumulated homoplasy on sequence divergence. One of the simplest models is based on a Poisson process and is given as follows:

$$T = -\frac{3}{4}\,\ln\left[1 - \frac{4}{3}D\right]$$

where D is the observed fraction of sequences that are different for any pairwise comparison and T is the expected sum (expressed as a fraction) of homoplastic changes plus observed differences (Jukes and Cantor, 1969). Since this model makes unrealistic assumptions about DNA sequence evolution, more sophisticated models have been developed to account for biased codon usage, synonymous versus non-synonymous substitutions, position-dependent differences in substitution probabilities, and base-dependent differences in substitution probabilities (Fitch, 1971a, 1976a, 1986; Kimura, 1980, 1981; Golding, 1983; Tajima and Nei, 1984; W.H. Li et al., 1985b; J.H. Gillespie, 1986b; Nei and Gojobori, 1986; Nei, 1987). [Shoemaker and Fitch (1989), however, argue that all of these models are too conservative since not all nucleotide positions are replaceable.] Unfortunately, these models require actual DNA sequences and cannot be used in conjunction with DNA hybridization data. Even so, for observed sequence differences up to 50%, all of these models provide estimates of T that are in excellent agreement with the Jukes and Cantor model; discrepancies become important only at larger distances. Since DNA hybridization distances are generally much less than 50% divergence, the Jukes and Cantor model is therefore appropriate, albeit slightly conservative.

It should also be noted that the expected amount of homoplasy is deterministic but that stochastic influences lead to variance around this expectation. For DNA sequence data, the stochastic component is much more important than for DNA hybridization data. Indeed, the variance component associated with the expected amount of homoplasy is trivial when the entire single-copy genome is under comparison (Nei, 1987). This is an advantage of DNA hybridization data over DNA sequence data. Mitigating against this putative advantage is the increased measurement error associated with DNA hybridization.

There is also a need to investigate the consequences of homoplasy for phylogenetic reconstruction if a correction for homoplasy is not employed. Most importantly, branch lengths on resulting topologies will be too short and the relative proportionality of branch lengths will be distorted. Thus, homoplasy cannot be ignored if the relative timing of branching events is of interest. The sequence of branching events on a topology is much less affected by homoplasy, however, and in most instances, corrections for homoplasy do not affect the sequence of branching events (Springer and Kirsch, 1989; Springer and Krajewski, 1989). This results from the deterministic fashion in which homoplasy accumulates (i.e., homoplasy is a function of divergence) when a large number of nucleotides are under comparison, as is the case for DNA hybridization data. In contrast, no one has ever proposed (or documented) that homoplasy among morphological characters is such a predictable, deterministic function of divergence. When homoplasy is not a function of divergence, or when the variance associated with this function is extremely large, homoplasy is much more of an obstacle to phylogenetic reconstruction.

Application of the Jukes and Cantor correction requires that we know the conversion between delta values and percent mismatch. Empirical estimates from the literature range from 0.7% to 2.0% base pair mismatch per 1°C depression in ΔTm (Bautz and Bautz, 1964; Laird et al., 1969; Kohne, 1970; Hutton and Wetmur, 1973; Britten et al., 1974; Caccone et al., 1988b). The conversion most often used is that 1% sequence mismatch
corresponds to 1°C of Tm depression, which is went off scale. If kinetic effects were accounted
partly a matter of convenience and standardiza- for, and if deletions were an unimportant source
tion. The recent estimate of 1.7% sequence diver- of NPH reduction (as they very well may be; see
gence per degree of Tm depression (Caccone et al., Meyerowitz and Martin, 1984), then ΔT50H should
1988b) may well be correct for the ribosomal se- also converge on the same value as ΔTmode. In re-
quences studied since these sequences have con- ality, we do not know the distribution of rates of
served regions and clustered substitutions. It may change for the suite of sequences in the single-
not apply to typical single-copy DNA since most copy genome, but it is most likely to differ among
of this DNA is non-coding and might be expected taxonomic groups. Thus, it is unclear if modal or
to exhibit a more random distribution of substitu- median values of sequence divergence provide
tions. To test this, Springer et al. (1992b) compared better estimates of mean sequence divergence. We
the known sequence divergence of a 7.1-kb seg- hope that this issue can be evaluated quantita-
ment of the primate €-globin pseudogene to the tively in the future.
thermal stability of heteroduplexes. They found a
1.18%sequence divergence per degree centigrade. MEASUREMENT ERROR Measurement error is po-
Because the €-globin pseudogene region is non- tentially the single biggest problem with DNA
coding, it presumably evolves in a similar fashion hybridization distances. Springer and Krajewski
io the majority of the single-copy DNA. Thus, this (1989) discussed such error in the context of
value is significant in the conversion of percent- imprecision and inaccuracy, where precision
age sequence divergence to AT, in the majority of refers to the repeatability of replicate measure-
DNA-DNA hybridization studies. Further work ments and accuracy refers to the reliability of a
will be required, however, to determine the mean measurement as an estimate of some quantity.
value of the conversion between percent sequence Reciprocity is also a useful concept for dealing
divergence and melting temperature depression with matrices of DNA hybridization distances.
for a population of sequences (i.e., the single-copy Sarich and Cronin (1976) defined the percentage
genome) that undoubtedly exhibits great varia- non-reciprocity for a pairwise comparison as
tion in the clustering of substitutions. [(distanceAB - distance BA)/(distance AB + dis-
tance BA)] x 100. The average percentage of non-
THE DISTRIBUTION OF RATES OF SEQUENCE CHANGE reciprocity for a distance matrix is then the mean
A desirable property of any DNA hybridization value of this parameter over all pairwise com-
distance measure is that it represents the mean parisons.
amount of sequence divergence between equiva- Precision sf DNA hybridization measure-
lent portions of all genomes under comparison. ments is generally indexed as the standard error
However, once a rapidly evolving fraction of the or standard deviation of replicate measurements.
DNA has diverged such that its Tm is less than the Sibley et al. (1987) reported an average standard
temperature of its reassociation, the average or deviation of 0.35 degrees for AT, measurements.
mean divergence can no longer be measured. The Krajewski (1989), in turn, reported an average
median can be measured out to about 50% NPH standard deviation of 0.48 degrees for AT, mea-
and perhaps estimated further as mentioned surements of cranes. Furthermore, Sibley et al.
above. It seems likely that median distances (1987) found that the standard deviation for AT,
would then need to be corrected only for homo- values increases as a function of sample size up to
plasy to generate reliable estimates of additive ge- n = 5 and then remains stable. Also, standard de-
netic distance, but this has not yet been shown. viation does not depend on the magnitude of AT,
The shape of the distribution of rates of DNA se- values (Sibley et al., 1987; Springer et al., 1990;
quence change (see Springer and Krajewski, 1989) Krajewski, 1989).]Finally, average percent non-rec-
is also important. If this distribution were Gauss- iprocities for matrices of AT, values generally fall
ian, for example, ATmod,would provide a reliable between 3 and 10%(Sheldon, 1987; Springer and
estimate of mean sequence divergence until it Kirsch, 1989; Springer et al., 1990).
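
Two calculations described above, the conversion of a ΔTm value to percent sequence
divergence and the percentage non-reciprocity of Sarich and Cronin (1976), can be sketched
in a few lines of Python. This is an illustrative sketch only. The function names and the
example distances are hypothetical; the conversion factor is left as a parameter because the
text cites values from about 1.0% to 1.7% per degree; the Jukes and Cantor correction is
written in its standard textbook form, d = -(3/4) ln(1 - 4p/3), rather than quoted from this
chapter; and taking the absolute value of the non-reciprocity is an assumption, made so that
the matrix average reflects magnitude rather than sign.

```python
import math
from itertools import combinations


def percent_divergence(delta_tm, percent_per_degree=1.0):
    """Convert a delta-Tm (degrees C) to percent mismatch.

    percent_per_degree is an assumed conversion factor, e.g. 1.0 (the
    conventional value) or 1.18 (Springer et al., 1992b).
    """
    return delta_tm * percent_per_degree


def jukes_cantor(p):
    """Standard Jukes-Cantor correction for a mismatch proportion p (0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def percent_nonreciprocity(d_ab, d_ba):
    """[(AB - BA) / (AB + BA)] x 100, reported as a magnitude (an assumption)."""
    return abs(d_ab - d_ba) / (d_ab + d_ba) * 100.0


def mean_percent_nonreciprocity(matrix):
    """Average over all taxon pairs; matrix[tracer][driver] holds one distance."""
    pairs = list(combinations(sorted(matrix), 2))
    total = sum(percent_nonreciprocity(matrix[a][b], matrix[b][a]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical delta-Tm values (degrees C) for three taxa, for illustration only.
example = {
    "A": {"B": 4.0, "C": 10.0},
    "B": {"A": 6.0, "C": 9.0},
    "C": {"A": 10.0, "B": 11.0},
}

if __name__ == "__main__":
    p = percent_divergence(10.0, 1.18) / 100.0  # proportion of mismatched bases
    print(round(jukes_cantor(p), 4))
    print(round(mean_percent_nonreciprocity(example), 2))
```

Keeping the conversion factor as an argument makes it easy to see how strongly a corrected
distance depends on which empirical estimate is adopted.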
Similarly, ΔTmode values are moderately precise (Kirsch et al., 1989; Bledsoe, 1987). NPH
and ΔT50H values, on the other hand, exhibit more scatter for replicate measurements
(Sheldon, 1987; Krajewski, 1989; Kirsch et al., 1989). This measurement error may obscure
branching patterns revealed by ΔTm and ΔTmode matrices (see Kirsch et al., 1989). An
alternative strategy is to use a regression equation to convert ΔTm values into ΔT50H
values. This approach is much less sensitive to the effects of measurement error, yet it
allows one to reduce the effects of compression that plague ΔTm values and obtain better
estimates of branch lengths on output topologies. If our intent is to use DNA hybridization
distances to estimate the timing of branching events, this issue cannot be overlooked.
Catzeflis et al. (1987) and Springer et al. (1990) have developed exponential regressions of
ΔT50H on ΔTm for DNA hybridization data on rodents and marsupials, respectively; additional
equations would have to be developed for other groups. A major disadvantage of this approach
is that it may prove intractable for some taxonomic groups, e.g., a consistent relationship
between ΔTm and ΔT50H may not hold for all taxa under study because of differences in genome
size or variation in the amount of rapidly evolving DNA. A second disadvantage is that NPH
and ΔT50H are most useful when ΔTm can no longer provide resolution, but a regression
equation should only be used over a range where ΔTm and ΔT50H are both monotonically
increasing functions of sequence divergence.

In contrast to imprecision, inaccuracy is often caused by systematic biases that affect a
whole suite of measurements. One such bias deserves mention. Most workers who have used ¹²⁵I
tracers are familiar with a compression of ΔTm values associated with specific tracers
(Springer and Kirsch, 1989). Short tracer fragments are probably the culprit. Compression
can increase the average percent non-reciprocity in a distance matrix. The effects of
compression, however, can be reduced through the use of an algorithm developed by Springer
and Kirsch (1989), which, in turn, is a modification of an earlier algorithm developed by
Sarich and Cronin (1976) for immunological distances. For an uncorrected matrix of ΔTm
values given in Springer and Kirsch (1989), the average percent non-reciprocity was 3.12%.
After several iterations of the correction algorithm, this value was reduced to 1.05%.

Both imprecision and inaccuracy affect the internal inconsistency of distance data and
reduce the fit between observed distances and distances on an output topology. This internal
inconsistency casts doubt on the validity of branching arrangements when clades are united
by short branch lengths (see Chapter 12).

DIFFERENCES IN PARALOGOUS SEQUENCES AND GENOME SIZE Sequences whose differences are a
consequence of independent evolutionary change arising after speciation are referred to as
orthologous sequences (Fitch, 1976a). In contrast, paralogous sequences evolve in parallel
in a single line of descent subsequent to their origin through gene duplication (see
Chapter 1). A salient point is that cross-matched paralogous sequences from two different
species may contain differences that predate speciation.

Fox and Schmid (1980) and Sarich et al. (1989) have argued that such cross-matched hybrids
may be present in significant numbers when the conditions of reassociation are too relaxed,
and that these paralogous hybrids form a low melting temperature component characteristic of
many melting curves. Furthermore, they argue that this low melting temperature component
seriously compromises the phylogenetic value of certain hybridization distances, such as ΔTm
and ΔT50H. However, there are no measurements in any species that precisely quantify the
number of such low-copy number elements in the single-copy fraction of the genome.
Significant quantities of hybridizing paralogous sequences may be present only under relaxed
reassociation, although the situation may be different for polyploid genomes.

In addition, for iodinated tracers, which constitute most of the melting curves to which
Sarich et al. (1989) refer, short fragment size is often a contributing cause (if not the
most important cause) of the low melting temperature component. Furthermore, if paralogous
sequences are shown to exist, it is easy to calculate a Tm for a higher temperature
component and discard the
lower melting temperature component. Finally, the effect of paralogous hybrids on ΔTm or
ΔT50H values is probably to decrease their absolute magnitude without having much impact on
their relative magnitude, or on the branching patterns that are derived from matrices of
such values (Springer and Krajewski, 1989).

Differences in genome size may also affect certain DNA hybridization metrics, at least in
theory. Consider a case in which differences in genome size reside entirely in the
single-copy fraction of the genome: one species has all of the genes that are found in the
single-copy genome of the second species as well as its own unique set of genes. When the
DNA from the species with the larger single-copy genome is labeled, part of that DNA will be
incapable of reacting with driver DNA from the second species simply because the homologous
sequences do not exist. This will result in a decreased NPH value for the heteroduplex
reaction, although ΔTm and ΔTmode may or may not be affected. When DNA from the second
species is labeled, however, an NPH reduction will not occur because the driver species has
all of the genes that are present in the single-copy genome of the tracer species. Thus, we
expect non-reciprocity in NPH values. This, in turn, will affect ΔT50H values.

As a second example of the effect of genome size, consider a case in which the size and
constituency of the single-copy genome is the same in each of two species, but for which the
repeated DNA fractions differ considerably. If repeats constitute 80% of the genome in one
species and only 10% of the genome in the second species, and if this remains unassessed,
then effective concentrations of single-copy driver DNA (which react with single-copy tracer
DNA) will differ when the total concentration of DNA (repeats plus single-copy) is the same.
Theoretically, these differences could have a profound impact on NPH values and lead to
non-reciprocity, although there is not yet any empirical evidence that bears on this
question. In any event, some knowledge of genome size is probably important for studies that
seek to understand non-reciprocity among NPH and ΔT50H values.

Intraspecific Variation

Intraspecific polymorphisms can obscure relationships, particularly among closely related
species (Chapters 2, 11, and 12). Differences in the melting profiles of conspecific
individuals are generally small in most vertebrate species that have been investigated.
Sheldon (1987), for example, provides evidence that mean intraspecific ΔTm values were only
0.28°C in herons. Similarly, Springer (1988) reported a mean intraspecific ΔTm value of
0.36°C for diprotodontian marsupials. In contrast, high levels of intraspecific variation
have been discovered in several invertebrate taxa, including sea urchins, cave crickets, and
fruit flies (Britten et al., 1978; Caccone et al., 1987; Caccone and Powell, 1987). In some
instances, intraspecific ΔTm values are as high as 5°C for individuals from the same
population. If population bottlenecks are also important in the evolutionary history of such
lineages, then phylogenies derived from distance matrices could prove positively misleading
(Roberts et al., 1985), especially with respect to branch lengths and the timing of
divergence events. While it generally is not possible to know when and where bottlenecks
occurred, it is possible (and advisable) to assess the magnitude of intraspecific variation
versus the magnitude of interspecific distances.

APPENDIX: STOCK SOLUTIONS

Alkaline Electrophoresis Tray Buffer (10x)
300 mM NaOH
10 mM EDTA

500 mM NaCl
10 mM EDTA

DNase I
10 mg/ml in 150 mM NaCl, 50% glycerol
EDTA 0.5 M
18.6 g EDTA dihydrate
60 ml water
Add 10 N NaOH to pH 8.0. Add water to 100 ml.

Ethanol Wash
50 ml 95% ethanol
50 ml buffer:
20 mM Tris, pH 7.5
1 mM EDTA
100 mM NaCl
Store at -20°C.

Ethidium Bromide Solution
Add 200 mg to 20 ml water. Store in a light-proof container at 4°C.

Nick Translation Buffer (10x)
500 mM Tris, pH 7.5
100 mM magnesium chloride
1 mM DTT
500 µg/ml BSA

Re-suspend in water to 10 mM concentration. Adjust pH to 7.0 with 50 mM Tris by spotting
small samples on pH paper; aliquot into small volumes and store at -20°C.

Phenol, equilibrated to pH 7.4
500 g bottle phenol (solid)
Add 100 ml 2 M Tris, pH 7.4, 100 ml water, heat slowly to 37°C, mix layers, let stand.
Remove aqueous layer, add equal volume 1 M Tris, pH 7.4, mix and let stand; remove aqueous
layer. Repeat until Tris remains at pH 7.5. Add 500 mg 8-hydroxyquinoline. Store at 4°C
under 1 M Tris pH 7.5.

Phosphate Buffer (PB) 2.4 M
500 ml 2.4 M sodium phosphate monobasic
500 ml 2.4 M sodium phosphate dibasic
pH should be 6.8.

Phosphate Buffer (PB) 0.48 M
100 ml 2.4 M stock
400 ml water
Check refractive index of solution against that of water. The difference should be exactly
0.0098 (e.g., if water = 1.3320, PB at 0.48 M = 1.3418). Adjust up or down with 2.4 M PB and
water, respectively.

Phosphate Buffer (PB) 0.12 M
125 ml 0.48 M PB (checked with refractometer)
375 ml water

RNase (DNase-free)
Re-suspend RNase in TE at 10 mg/ml. Heat to 70°C for 15 min, cool and store at -20°C.

Salmon Sperm DNA
Re-suspend at 5 mg/ml in 0.1 mM EDTA, and force through a 23-gauge needle several times to
shear or sonicate 4x at 80% maximum power. Store over chloroform.

SEDTA
0.1 M NaCl
50 mM EDTA
Adjust pH to 8.0.

1.2 ml 5 M NaCl
664 µl 100 mM zinc sulfate
400 µl 3 M sodium acetate, pH 4.5
Add water to 4 ml.

S1 Nuclease Solution
Re-suspended S1 nuclease in 1x S1 buffer at a concentration of 5 U/µl

Sodium Acetate (3.0 M)
20.4 g sodium acetate, trihydrate
Add water to 50 ml; adjust pH to 7.5 with acetic acid.
Sodium Hydroxide 10 N
200 g NaOH
400 ml water
Dissolve, add water to 500 ml.

Sodium Iodide Solution
(Or use NaI of Geneclean™ kit)
91.0 g sodium iodide
1.5 g sodium sulfite
Add water to 100 ml, filter through Whatman #1 filter paper. Add 0.5 g sodium sulfite.
Store at 4°C in a light-proof bottle.

TE
10 mM Tris, pH 7.5
0.1 mM EDTA

Dissolve at a concentration of about 300 g/liter. Vacuum distill to about 3.0 to 3.2 M. Pass
twice over activated charcoal and filter through a 0.45 µm Millipore™ filter; adjust pH to
7.0 with tetraethylammonium hydroxide. Dilute to about 80% and check refractive index (R.I.):
R.I. (difference versus water at 1.3320) of
2.5 M = 1.4065 (0.0745)
2.4 M = 1.4032 (0.0712)
Note: The molarity of TEACL changes with prolonged storage, so it must be checked carefully
before each use.

Trichloroacetic Acid (TCA) 10%
500 g solid TCA
Add 227 ml water to make a 100% solution. Dilute to 10% for working stock solution and
refrigerate.

Tris-HCl, 2.0 M, pH 7.5 and 8.5
850 ml water
212.2 g Tris base
20 ml HCl (conc.)
Adjust pH to 7.5 or 8.5; add water to 1 liter.

Tris-Acetate, Neutral Gel and Tray Buffer (10x)
850 ml water
48.4 g Tris
27.22 g sodium acetate trihydrate
3.8 g EDTA
Adjust pH to 7.0 with acetic acid; add water to 1 liter.
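
The phosphate buffer dilutions listed in this appendix can be checked with simple
arithmetic. The Python sketch below is illustrative only: the function names are
hypothetical, and the calculation assumes that volumes are additive on mixing, which is an
idealization (the appendix itself relies on a refractometer reading rather than on nominal
volumes).

```python
def diluted_molarity(stock_molarity, stock_ml, water_ml):
    """Nominal molarity after mixing stock with water (assumes additive volumes)."""
    return stock_molarity * stock_ml / (stock_ml + water_ml)


def refractive_index_offset(solution_ri, water_ri):
    """Difference in refractive index versus water, to four decimal places."""
    return round(solution_ri - water_ri, 4)


if __name__ == "__main__":
    # 100 ml of 2.4 M stock plus 400 ml water
    print(round(diluted_molarity(2.4, 100, 400), 4))
    # 125 ml of 0.48 M PB plus 375 ml water
    print(round(diluted_molarity(0.48, 125, 375), 4))
    # Example readings quoted in the appendix for 0.48 M PB
    print(refractive_index_offset(1.3418, 1.3320))
```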
Nucleic Acids II:
The Polymerase Chain Reaction
Stephen R. Palumbi
INTRODUCTION
The polymerase chain reaction (PCR) has become one of the standard colors on
the systematist's palette. It is a tool of unrivaled power, but as is so often the case,
this power is linked to unrivaled complexity. The source of this complexity is the
PCR reaction itself: a myriad of ionic interactions, kinetic constants, and enzymatic
activities, all taking place repeatedly and, hopefully, perfectly, in a few hours' time.
The fact that it works so well and for so many people is one of the most astonishing
things about it.
This chapter covers some of the basic events of PCR and describes how these
events are critical to the success of a particular amplification. The goal is to
familiarize the reader with the process of PCR, so that PCR can be used to its fullest
extent. Even to veteran molecular biologists, PCR amplification often remains a
mystery. Why does one set of cycles work while others do not? Why does one set
of primers work while others do not? To help solve these problems, this chapter
emphasizes an important aspect of PCR that is frequently overlooked: a PCR machine
is an experimental tool, not just a troublesome gadget designed to produce
a DNA product. Using this tool allows an investigator to master PCR, instead of
the other way around.
PRINCIPLES AND COMPARISON OF METHODS

General Principles

Every cellular organism replicates its own DNA. The study of this process (e.g., Kornberg,
1980) has been instrumental in developing the field of molecular biology. In particular, the
behavior and manipulation of DNA polymerases has been the focus of a concerted effort to
understand the way that genetic material is copied, repaired, and inherited.

Most polymerases recognize single-stranded DNA as an appropriate template and bind
temporarily to this strand at a point adjacent to a double-stranded stretch. The polymerase
also binds to deoxynucleotide triphosphates (dNTPs) available in the medium, and, by using
the energy stored in the triple phosphate bond, it catalyzes a reaction that attaches the
nucleotide to the second DNA strand. The enzyme then moves to the new end of the lengthened
double-stranded section, and the process is repeated hundreds or thousands of times a second.

DNA polymerases are unidirectional. They start synthesis next to the 3' end of the
double-stranded section, and they synthesize new DNA in the 5' → 3' direction. However, the
bipolar nature of the two complementary strands of the standard double helix makes it
possible for a polymerase to synthesize either strand: it moves in one direction on the top
strand, but in the other direction on the bottom (Figure 1).

In laboratory experiments, it has long been possible to direct the synthesis of a complement
to a single DNA strand by using the strand as a template and by giving the enzyme an
appropriate starting signal. This signal can be any short stretch of single-stranded DNA
annealed to the template strand. Anywhere there is a double-strand/single-strand junction, a
polymerase can begin work (Figure 1). To direct the specific synthesis of a particular
segment of DNA merely requires the binding of an oligonucleotide (a small number of
nucleotides joined together in a short stretch of single-stranded DNA) just upstream (to the
5' side) of the desired segment.

Oligonucleotides cannot anneal to double-stranded (ds) DNA, but quickly bind to
single-stranded (ss) DNA. By heating dsDNA, the ionic bonds that hold complementary strands
together weaken, and the DNA "melts" into a single-stranded state. Upon cooling, the ssDNA
will reassociate; the kinetics of this reassociation will depend on the copy number of the
strands that can reanneal (see Chapter 6 for more details). Segments of DNA that are not
very common in an experiment will take the longest to reanneal because the complementary
ssDNAs require time for Brownian motion to bring them together. If, in the same experiment,
a vast overdose of short complementary oligonucleotides is available, these will tend to
anneal first, mainly because they are so numerous that the chance of a random encounter is
higher and because their small size makes them more mobile.

As a result, heat denaturation followed by slow cooling in the presence of large quantities
of synthetic, complementary oligonucleotides causes an oligonucleotide to anneal to the
template DNA wherever it shows adequate similarity. DNA synthesis can begin at that point,
copying the template by primer extension in the 3' direction (Figure 1).

Figure 1 Synthesis of both strands of a DNA molecule proceeds by action of polymerases
(shown as ovals), which add bases complementary to the template strand. Synthesis is always
in the 5' → 3' direction. Polymerases will begin synthesis wherever they find a stretch of
double-stranded DNA just "upstream" (meaning in the 5' direction) of a stretch of
single-stranded DNA.

The polymerase chain reaction uses this synthetic process to copy a specific target sequence
over and over again. Mixtures of oligonucleotides, usually called primers because they prime
DNA synthesis, are used in the reaction to initiate DNA synthesis at specific places on the
template. The two primers are designed to anneal close to one another (within several
thousand base pairs) but on different strands, and they are oriented to copy the DNA strand
lying between them. For each cycle of heat denaturation/annealing/synthesis, the region
between the primers is copied and its abundance in the reaction mixture doubles. During
successive cycles, this doubling proceeds until the DNA bridging the primers comes to
dominate the mixture. Moreover, most of the copies produced in later cycles are of exactly
the same length, the length of the DNA between primers.

Although this chain reaction (the exponential increase of DNA through successive cycles) was
first described in the early 1970s (Kleppe et al., 1971), it was not until purification and
use of a heat-stable polymerase (Mullis and Faloona, 1987) that the chain reaction became
practical. The treatments that denature dsDNA to ssDNA (heat, high pH) also destroy most
enzyme activity. Thus, with typical heat-sensitive polymerases, new polymerase had to be
added after every denaturation step to maintain a chain reaction. Moreover, the temperatures
at which most DNA polymerases are active (<45°C) also tended to allow too much non-specific
annealing of primers with template DNA.

The heat-stable polymerase was isolated from a hot springs bacterium, Thermus aquaticus,
which normally grows at high temperatures. Evolution had led to the adaptation of this DNA
polymerase to be active at high temperatures, and just as importantly, it is stable at even
higher temperatures. Heating a reaction to 94°C can denature DNA, but the DNA polymerase of
T. aquaticus is not destroyed (or at least, not immediately; see later).

The combination of the idea of temperature cycles to denature/renature DNA and the use of
the stable activity of Taq polymerase (i.e., T. aquaticus polymerase, hereafter referred to
as Taq) throughout this cycle, led quickly to the development of simple thermal cyclers to
guide the polymerase chain reaction to completion. Since the first use of Taq in PCR,
several additional heat-stable polymerases have been isolated and used in the reaction.
Some, such as Vent polymerase, have a 3' → 5' exonuclease activity (Erlich et al., 1991;
Erlich and Arnheim, 1992; or see the brochures published by biotechnology companies). This
allows the enzyme to "back up" over the last bases that it synthesized and replace them if
they were incorrect. As a result, these enzymes work more slowly and have a much lower error
incorporation rate than does Taq polymerase. Polymerase errors do not play a strong role in
analysis of PCR products by direct sequencing, but sequencing of cloned PCR products may
lead to the incorporation of such errors in the results.

A typical PCR reaction, then, has all the components required for an in vitro synthesis of
DNA: enzyme, appropriate buffers, ample dNTPs, template DNA, primers, and cofactors such as
magnesium. The mix is allowed to work repeatedly, copying the DNA strand between the
primers, with a reaction speed and specificity determined largely by temperature. Successful
amplification within this reaction mix depends on efficient interaction of all these
components, many of which can be optimized for a given target DNA.

The Cycle

The PCR cycle consists of three major phases: denaturation, annealing, and extension. The
molecular events occurring at each of these stages, and why they are important to PCR, were
discussed briefly above. To control the events at each phase, the experimenter needs to make
decisions about the temperature of the phase, its duration, and how quickly this temperature
is approached.

Denaturation

In this phase, heat is used to stop all enzymatic reactions (for example, the synthesis that
was occurring during a previous extension phase) and denature the DNA from double to single
strands. Usually 94°C is the temperature used, although some recent protocols have suggested
92°C. Too low a temperature, or too short a denaturation phase, may fail to completely
disassociate high-molecular-weight, genomic DNA. However, although Taq polymerase is
resistant to heat denaturation, it is not immune to it, and excessive denaturation will
reduce enzyme activity. For example, after 30 incubations at 94°C for 60 seconds each, Taq
loses about half of its activity. In general, 30-second denaturations at 94°C seem to strike
a good balance between complete denaturation and destruction of enzyme. Some protocols
recommend a longer first denaturation step (i.e., of 60-120 seconds), because this is the
cycle in which full genomic disassociation is critical.

Annealing

In this phase, the temperature is lowered so that oligonucleotide primers can bind to the
appropriate sites in the template DNA. This is the most critical phase, because if primers
bind correctly to only the target positions in the template, then there is a good
probability that the expected synthesis product will result. However, there are many factors
that interfere with this perfect union of primers and targets.

Consider first that the primers do not know what it is the experimenter wants to happen.
Annealing is a random process that depends critically on the concentration of primer, the
availability of annealing sites, and the presence of competing, non-ideal annealing
positions. As the temperature is lowered from the denaturation phase, primers are jiggling
around the PCR mixture, driven by Brownian motion. Ionic bonds between the single-stranded
primers and the single-stranded template are constantly formed and broken. The most stable
ionic bonds last a little longer, and as the temperature drops they last for greater and
greater periods of time. Simultaneously, every other single-stranded piece of DNA is forming
transient bonds with every other piece, with the exception of sections which cannot bind
because they are too close together. However, larger pieces of DNA do not move as quickly by
Brownian motion (because different sections of the molecule are trying to move in different
directions). Thus, small molecules like primers have the best chance of jiggling randomly
into exactly the right position to form ionic bonds with the targeted annealing site. Of
course, they are also jiggling next to every other possible priming site, and they will bind
to these sites as well if the ionic attraction of the site is greater than the forces
breaking these attractions. Note that if the template DNA is degraded, then the small pieces
of template can act as a suite of random primers, all of which have a perfect match
somewhere in the target genome! This is one of the reasons why high-molecular-weight DNA and
low-molecular-weight DNA do not mix well as templates.

Although every primer is different, there are some simple rules to approximate the
temperature at which the ionic attraction of a primer to its binding site is balanced by the
forces of Brownian motion pushing it away. The relevant measure is called the Tm, or the
temperature at which half of the potential binding sites are thought to have primer bound to
them. A long primer, or one with greater GC content, has a higher Tm (because of the greater
number of hydrogen bonds). A common rule of thumb is that the Tm (in degrees centigrade) of a
perfect primer (that is, one that has a perfect sequence match to the template) is four
times the number of G's and C's plus two times the number of A's and T's in the primer
sequence.

Above the Tm, few primers are bound (although if primer concentration is very high, there
can still be a lot of coming and going at annealing sites). Below the Tm, most of the
perfect annealing sites are occupied, but the primer is also binding to a greater number of
non-perfect sites (e.g., those that do not have the exact sequence of the primer). Here lies
one of the great tradeoffs of PCR. If annealing temperature is too high, not enough primer
is bound, but if it is too low, then multiple sites are used and many PCR artifacts are
generated. If annealing temperature is very low, the genomic DNA will reanneal to itself and
"self-prime" (that is, it will form its own double-strand/single-strand steps to start
synthesis; see Figure 1).
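
Two pieces of arithmetic from this section lend themselves to a short sketch: the ideal
doubling of the target region with each cycle, and the rule of thumb for primer Tm (four
degrees for each G or C plus two degrees for each A or T). The Python below is illustrative
only. The function names and the example primer are hypothetical, and the doubling
calculation assumes perfect efficiency in every cycle, so it gives an upper bound rather
than a prediction for a real reaction.

```python
def ideal_copies(start_copies, cycles):
    """Upper bound on target copies if every cycle doubles the target exactly."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return start_copies * 2 ** cycles


def rule_of_thumb_tm(primer):
    """Approximate Tm (degrees C) of a perfectly matched primer: 4*(G+C) + 2*(A+T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer must contain only A, C, G, and T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer with 10 G/C and 10 A/T
    print(rule_of_thumb_tm(primer))  # 4*10 + 2*10 = 60
    print(ideal_copies(1, 30))       # 2**30 = 1073741824
```

Because the rule of thumb ignores salt concentration and mismatches, its result is best
treated as a starting point for choosing an annealing temperature rather than a final value.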
Other problems can occur as well. Genomic DNA often has vast stretches of similar sequences
(satellite DNAs), and these will quickly reanneal. In addition, in some cases there are
stretches of DNA that are the inverse of an adjoining stretch and can bind to this upstream
or downstream stretch to form a hairpin structure. This is especially true of genes for
ribosomal RNA, which are designed to fold into a series of loops and stems. Often, the DNA
in the loop of this structure cannot efficiently bind a primer, or if it can, a polymerase
cannot synthesize past the stem of the hairpin. Thus these sites, even though they exist in
the template DNA, are invisible to the primers. This is thought to be why some sets of
perfect primers work better than others on the same gene.

How long should the annealing reaction continue? Again, there are important tradeoffs. The
chief advantages the primers have in annealing before the template DNA reassociates are
concentration and speed. Both are eroded by long annealing times, which give other, bulkier
DNA molecules a chance to find one another and anneal. Long annealing times also are thought
to give the primers time to "find" imperfect matches in the genome, although a little
reflection shows that such matches are found as fast as perfect matches. The biggest
difference is that the residence time of the primer bound to such imperfect sites is lower
because the ionic bonds are weaker, and this lowered ionic attraction has little to do with
time.

Nevertheless, shorter annealing times seem to provide greater specificity in the PCR
reaction than longer ones. Generally, annealing times of 30-60 seconds are most common,
although times as short as 15 seconds often work well at high annealing temperatures with
perfect primers.

Extension

This phase allows the enzyme to work, synthesizing the target DNA segment. Taq polymerase
works well at about 72°C, and this is the temperature usually chosen for the extension
reaction. The enzyme is active at lower temperatures, however, and this is important for the
success of most amplifications. Most primers have a Tm well below 72°C, and so most will not
be bound very tightly to their target sequences once the extension temperature is reached.
How do they stay attached? As the temperature rises slowly from the annealing temperature to
72°C, polymerization begins, albeit slowly. However, this slow polymerization is enough to
add a few extra bases to the primer, increasing the stability of the primer-template
complex. Thus, by the time the extension temperature is reached (typically in about 30
seconds or so), the primer is already part of a growing DNA daughter strand.

Extension time is another important variable. Under ideal conditions, Taq polymerase will
synthesize thousands of bases a minute. As a result, PCR products under 500 bp do not
require much time for complete synthesis. For such short products, 30 seconds is ample. For
longer products, however, longer periods of time are best. Typically, a 30-second extension
is adequate for products under 500 bp, 60 seconds is needed for products between 500 and
1500 bp, and 90 seconds is required for longer products. However, optimization of these
times may be required because unnecessarily long extension times appear to increase the
likelihood of PCR artifacts.

Choosing Reaction Conditions

Because each template and each primer pair is different, and because molecular systematists
tend to use a wide variety of taxa or primers in a single research program, PCR reactions
need to be carefully optimized. This means that a certain amount of trial and error is an
integral part of the PCR experience.

For molecular systematics, most attention has been focused on the primers and how strongly
they anneal to the template DNA. If primer annealing is inefficient, or if the primers
anneal at unexpected sites, then the amplification might not proceed, or alternative
products could be produced. To guard against this occurrence, the simplest procedure is to
anneal the primers at high stringency. In this way, only well-matched primer/template
couplings will occur and the amplification will be highly specific. Unfortunately, these
high temperatures often preclude use of universal primers (those primers designed to
be effective in a wide variety of taxa; e.g., Kocher et al., 1989). Such primers are seldom perfect matches in a target sequence. They are usually designed in highly conserved regions that vary only slightly among taxa, but variation of a few bases in 20 is common. As a result, use of high annealing temperatures may prevent efficient primer-template binding.

Choices of conditions are even more complex because most PCR machines allow different cycles to be joined together to produce a complex amplification profile. For example, in order to amplify a product with universal primers, which are not likely to be perfect when used on a particular target, it may be practical to use 5 cycles with a low annealing temperature (45°C or so), followed by 30-35 cycles at a high annealing temperature (55°C). In this case, the amplification starts at low stringency, allowing imperfect primers to anneal and start synthesis. During the first 5 cycles, however, PCR products are produced with perfect ends (the DNA sequence of the ends of the PCR fragment are identical to the reaction primers because the primers have been physically incorporated into the DNA). As a result, subsequent amplifications can occur at a higher annealing temperature. In this case, the products made in the first 5 cycles are the only templates for amplification in the last 30-35 cycles. Although use of 40 cycles at a 45°C annealing temperature would also help mismatched primers function, repeated use of this low annealing temperature often causes many more PCR artifacts.

Paradoxically, exactly the opposite strategy is sometimes used to promote amplification with imperfect primers. The first cycles are performed at high annealing temperature, which assures that only the correct products are made (although there are few of them, since this is an inefficient set of cycles). Subsequent cycles are performed at low annealing temperature because this increases reaction efficiency and by this time, the correct PCR amplified segment makes up most of the reaction template. This strategy tends to minimize alternative products without sacrificing reaction efficiency.

As these two examples illustrate, amplification protocols are not rigidly set but instead have become enormously creative: they are basic outlines that can be adapted to a large number of slightly different purposes. By understanding the basic kinetics of the complex PCR reaction, the most effective set of cycle parameters can usually be achieved. For any given set of reactions, this set of "best" parameters must be discovered by experimentation.

PCR Components

The chemical environment of the PCR reaction is very important to the specificity and efficiency of the amplification. This does not mean, however, that only a single reaction mix will work: often a variety of reaction mixes will give satisfactory results. A short, handy guide to some of the common PCR variations was presented by Carbonari (1993). Note that it is seldom possible to predict precisely the effect of changing a particular reaction component. In general, a good starting place is to use the buffer recipe suggested by the supplier of the thermostable enzyme you will use. Because these enzymes have different origins, they have slightly different requirements. Nevertheless, experience has shown that many buffer recipes can substitute for one another. In addition to the buffer, reactions must include the raw material for synthesis (the dNTPs), the enzyme, primers, templates, and Mg2+ (a cofactor required by the enzyme to function). Only the primers and dNTPs are consumed during the reaction, and these are added in enormous excess, so synthesis is rarely limited by these components (Table 1).

In the case of the dNTPs, high concentrations enhance reaction speed because the enzyme acts most quickly if the substrates are in concentrations so high that the time until the correct nucleotide triphosphate diffuses into the catalytic site of the enzyme is short. In the case of the primer, excess is required to ensure that all possible annealing sites have "access" to a primer molecule during the annealing step. In fact, the amount of nucleic acid added as primer is often very similar to the amount added as template (about 1 µg per 100 µl). This means that the length of the template DNA and the total length of all the primers combined is nearly the same.
Table 1
Amounts of a 1000-bp DNA product that could be produced by complete synthesis in a typical PCR reaction (100 µl volume)^a

                                                If the component is used completely
            Initial          Initial number     Number of           Weight of
Component   concentration    of molecules       product strands     product
Primer      0.5 µM           3 x 10^13          3 x 10^13           30 µg
dNTPs       800 µM           5 x 10^16          2.5 x 10^13         25 µg

^a The typical PCR reaction produces only 1-10 µg of product, so the actual yields are far less than the possible yields. This means that only a fraction of the primers or dNTPs are actually used in a reaction.
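
The entries in Table 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration and rests on several assumptions that are not stated in the table: that 0.5 µM refers to each of the two primers, that 800 µM is the total for all four dNTPs, that a "product strand" is counted as one double-stranded 1000-bp molecule, that primers are 20 nucleotides long, and that the usual rule-of-thumb masses (660 g/mol per base pair, 330 g/mol per nucleotide) apply. All function and variable names are made up for the example.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 660.0       # g/mol per base pair of double-stranded DNA (rule of thumb, an assumption)
NT_MASS = 330.0       # g/mol per nucleotide of single-stranded DNA (rule of thumb, an assumption)


def molecules(conc_molar, volume_liters):
    """Number of molecules in a solution of the given molarity and volume."""
    return conc_molar * volume_liters * AVOGADRO


def ds_product_mass_ug(n_products, length_bp):
    """Mass in micrograms of n double-stranded products, each length_bp long."""
    return n_products * length_bp * BP_MASS / AVOGADRO * 1e6


VOLUME = 100e-6     # 100 µl reaction, expressed in liters
PRODUCT_BP = 1000   # product length used in Table 1

# Primers: assume 0.5 µM of each primer; every double-stranded product
# uses up one molecule of each primer.
primer_molecules = molecules(0.5e-6, VOLUME)
products_from_primers = primer_molecules

# dNTPs: assume 800 µM is the total of all four; every double-stranded
# 1000-bp product contains 2 x 1000 nucleotides.
dntp_molecules = molecules(800e-6, VOLUME)
products_from_dntps = dntp_molecules / (2 * PRODUCT_BP)

# Mass of primer added to the tube, assuming two 20-nucleotide primers.
primer_mass_ug = 2 * primer_molecules * 20 * NT_MASS / AVOGADRO * 1e6

if __name__ == "__main__":
    print(f"primer molecules (each primer): {primer_molecules:.1e}")
    print(f"dNTP molecules:                 {dntp_molecules:.1e}")
    print(f"products if primers run out:    {products_from_primers:.1e}, "
          f"{ds_product_mass_ug(products_from_primers, PRODUCT_BP):.0f} µg")
    print(f"products if dNTPs run out:      {products_from_dntps:.1e}, "
          f"{ds_product_mass_ug(products_from_dntps, PRODUCT_BP):.0f} µg")
    print(f"mass of primer added:           {primer_mass_ug:.2f} µg")
```

Under these assumptions the maximum yields come out at roughly 33 µg (primer-limited) and 26 µg (dNTP-limited), in line with the rounded 30 µg and 25 µg in the table, and the primer added weighs somewhat under 1 µg, which is comparable to the amount of template mentioned in the text.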
The cofactor Mg2+ is not consumed in the synthesis reaction and is impervious to the heat extremes of the amplification cycle, so initial and final concentrations are the same. The ion is an important cofactor in enzymatic catalysis of the synthesis reaction, so adequate concentrations speed up the reaction considerably. However, Mg2+ also interacts with the negative phosphate groups of the dNTPs strongly enough that Mg2+ ionically attracted to PO4- groups is less available to act as an enzymatic cofactor. For this reason, Mg2+ concentrations need to be higher than dNTP concentrations. Note that the template DNA also probably interacts with Mg2+, but there is usually not enough of this to play any important role in sequestering Mg2+.

Varying Mg2+ concentration has been a popular method of tinkering with PCR reaction conditions. In general, 1.5 mM MgCl2 is added to most reactions. However, for a particular reaction, titrating MgCl2 concentration (that is, performing a controlled experiment in which MgCl2 is varied) can often increase yield, reduce unwanted products, and increase reaction efficiency. Maximum concentrations seem to be about 6 mM. Above this level, Taq activity tends to decline.

Enzyme concentration can also be altered in PCR reactions. The recommended amounts of polymerase are often far in excess of the amounts required for amplification, and increasing enzyme concentration does not automatically increase PCR quality. Adding extra enzyme can cause a previously recalcitrant reaction to work, but there are usually reasons for poor PCR yields other than low enzyme concentrations.

Many other additives have been proposed to enhance PCR reactions. Some of the most common are BSA (bovine serum albumin), gelatin, NP-40, Tween-20, Triton X-100, glycerol, and DMSO. These additives are thought to stabilize the enzyme (BSA and gelatin), reduce secondary structure problems (the detergents), or favor precise annealing. In general, providing moderate amounts of these additives may make some reactions proceed more easily. Again, trial and error seems to work best. Note that in almost every case, too much additive will kill a reaction. Generally, concentrations range from about 0.1% to 1% in the final PCR cocktail. Also, all three detergents are rarely used together, and when any additives are used, reactions tend to have either BSA or gelatin (not both) and DMSO or glycerol (not both).

The Thermal Cycler

All too often, the thermal cycler (PCR machine) is viewed simply as a way to produce a product needed to conduct research. One fills the tubes with complex reagents and hopes the desired product is present at the end.

But frequently, the product is not there, especially when first embarking on a PCR experiment, or when first using primers obtained through the mail from someone who designed them for other organisms. In this circumstance, it is important to treat the PCR process as an experimental opportunity. This requires two elements: a good experimental design with proper controls, and feedback about the results of each experimental manipulation.

Negative controls are fairly common in PCR; these generally are reaction tubes made without template DNA. Presence of a PCR product in such a case usually means contamination in one or (usually) more reagents. However, positive controls are also important. These are reactions that are guaranteed to work as long as the basic PCR cocktail is functional. Usually they involve using a sample of genomic DNA known to give good consistent results. Sometimes a previous PCR product is used. Sometimes a cloned segment of DNA that contains both primers is used. Failure of the positive control means failure of the basic cocktail (e.g., perhaps Taq was inadvertently left out of the reaction). In this case, failure of the remaining reactions is expected. If, however, the positive control works, then the basic cocktail is functional and failure of other reactions must be considered to be a sign of other problems associated with the template DNA and its relationship to the primers used.

Discerning the nature of these problems often requires additional experiments. First, however, a hypothesis about the nature of the problem must be formulated using the results of the previous experiment. The troubleshooting section at the end of this chapter explains some of the causes of some of the common problems seen in PCR reactions. The most important point is that even if the products are not what you expected, you should examine the results carefully. An agarose gel is not "blank" if there are obvious primer-dimers, or smears running from the wells, or evidence of degraded DNA.

Once a hypothesis is formed about the nature of the problem, a remedy can be devised and tested. Often the remedy involves trying different reaction conditions (annealing temperature, Mg2+ concentration, amount of template), and often a range of conditions should be tested at the same time (obviously annealing temperature cannot be treated this way). In cases where two variables need to be varied, a traditional block design experiment (e.g., three different template concentrations run with each of four different Mg2+ concentrations) should be performed, with appropriate positive and negative controls for each set of conditions. By varying conditions in this careful way, optimal reactions often can be obtained that greatly enhance the product of the reaction. Even a modest 10% increase in efficiency at every cycle will lead, over 30 cycles, to a 15-fold increase in product.

Primers and Primer Design

Primers that amplify a given section of DNA in a wide range of taxa (so-called universal primers) have been extremely useful in molecular systematics. The primary reason is that universal primers allow amplification of a DNA segment in a species that has never before been the subject of molecular genetic study.

The most common universal primers are for animal mtDNA, plant cpDNA (chloroplast DNA), and nuclear ribosomal RNA genes, although universal primers for conserved exons of nuclear genes are becoming more commonplace. What are the attributes of good universal primers? How are they designed?

The most straightforward design method is to align homologous sequences from as many different taxa as possible. For protein-coding genes, identical amino acid sequences over a 7-9 amino acid stretch are convenient locations for universal primers. In such regions, nucleotides in third-base positions can vary widely, and it is variation at such positions that creates most of the primer mismatch during PCR. This problem can be reduced in several ways. First, some amino acids are encoded by two codons. In these two-fold codons, the third position usually can be either (T or C) or it can be (A or G). Other, four-fold codons can have any of the four bases at the third position. Designing a primer to include as many one- and two-fold codons as possible greatly reduces the potential variation. Second, not all codons are used with equal frequency. In some genomes, like insect mtDNA, there are strong nucleotide biases which result in non-random distribution of third-base positions. In insects and crustaceans, 95% of these bases are A or T. This means that two-fold
codons are very conserved evolutionarily (for example, a cysteine, TGT or TGC, will most frequently be a TGT), and that even four-fold codons do not vary wildly.

Besides nucleotide bias, many genes, especially nuclear genes, show codon bias. In these cases, all four versions of a four-fold codon might theoretically be used, but in most cases only one or two of the possibilities are commonly used. Alignment of many homologous sequences usually can reveal the nature and extent of codon bias. For an example of protein alignment and design of PCR primers, consult the section on cytochrome c primers at the end of this chapter.

Primer Length

Primers can be as short as 13 bp (even shorter for RAPDs; see below) and as long as 80 bp. In most cases 18-24 bp primers are sufficient. The longer the primer, the higher the annealing temperature can be and the greater the specificity. However, unpurified primers and long primers have greater amounts of non-specific primer products present in the primer mixture. In a typical oligonucleotide synthesizer, bases are added to a synthetic oligonucleotide with about a 98% efficiency. This means that for every nucleotide added to the primer, 2% of the product is not synthesized properly. If, for instance, the primer is 20 bp long, then only 68% of the oligonucleotides in solution are the desired primer product. The remaining 32% are non-specific primers, approximately 2% of which are 1 bp shorter, 2% are 2 bp shorter, etc. In general, these non-specific oligonucleotides do not interfere with amplification. However, in many cases, primer artifacts (dimers) and non-specific amplification occur. If you need to make long primers, product purification is recommended. Otherwise this does not seem to be necessary.

Nucleotide Composition

Primers can be any sequence. The ideal primer has roughly equal numbers of each nucleotide without internal repeats or self-similarity. For instance, a primer with the sequence AAATTTAAATTT may lead to self-priming and primer-dimer products. GC-rich primers can withstand higher annealing temperatures, but are also subject to greater amounts of self-annealing.

Primer-Template Match

Specificity is obtained through maximizing sequence similarity between the primer and template. However, amplification products are obtained even when the primer and template do not have perfect similarity. Single internal mismatches have little effect on PCR product yield when the primers are long and there are 6-10 matched bases on either side of the mismatch. By contrast, single mismatches at or near the 3' end of the primer can significantly decrease amplification. Of the mismatches at the 3' end, A:G, G:A, and C:C reduce yields about 100-fold, whereas A:A mismatches reduce yields 20-fold. T's appear to be able to base pair with all three other bases fairly well, and this suggests some tips for design of universal primers.

Another suggestion for primer design is to examine closely the 3' end of the primer, where the polymerase binds initially. The primer in this region needs to be firmly annealed to the template for efficient polymerase binding. If the 3' end of the primer is a third codon position, then this last, critical base will frequently be mismatched and poor amplification will result. For this reason, primers are usually ended at second codon positions (the positions which are least likely to vary). In addition, the primer is enhanced if the third codon position closest to the 3' end (usually within 3 bases of the 3' end) is a two-fold position or is a degenerate base in the primer (that is, the base at this position is one of two or four different nucleotides). This will produce a primer likely to be a good match at the five bases nearest the primer's 3' end.

Common Modifications to Primers

Restriction sites (see Chapter 8) can be incorporated into the primers, thus allowing easier cloning of PCR products. Some researchers suggest adding an additional 3-5 bases to the 5' end of the primer (after the restriction site) because this greatly increases the digestion efficiency. Incorporation of a biotinylated nucleotide to the 5' end during primer synthesis allows solid-phase
sequencing and non-isotopic detection of amplified products. Recently, fluorescent tags have been added to nucleotides as well; these form the basis of detection systems in many automatic DNA sequencers.

Design of Universal Primers

There are several ways primers can be designed to make them more useful with unknown template sequences. Oligonucleotide synthesizers allow equal molar ratios of two, three, or four different bases to be added at a particular position. This type of synthesis creates so-called degenerate primers, which are in reality complex mixes of oligonucleotides of different sequences. The advantage of this type of design is that, theoretically, there will be an exact match of a target sequence to something in the primer mix. A disadvantage is that, depending on the degree of degeneracy, the concentration of this perfect primer is low. Also, because a complex mixture of primers will show a spectrum of affinities for the template, it is difficult to judge the best concentration of primers to use without careful titration experiments.

A way to reduce primer degeneracy is to take advantage of the ability of some "mismatched" base pairs to form a partial bond. Although G binds to C best, G-T bonds can also form. This unusual bond suggests a strategy for designing primers, especially for animal mtDNA in which most substitutions are transitions.

Suppose we need to design a primer in a stretch of DNA that includes a two-fold codon, such as for glutamic acid (GAG, GAA). If we include the sequence GAA in the primer, it will anneal well only with the perfect complement, CTT. However, if we use the sequence GAG instead, this will anneal perfectly with the perfect complement, CTC, but it will also anneal well with the imperfect complement, CTT (Figure 2). Thus, for primers on the coding strand, we can use G in every position in which there is a potential for either an A or G in the template. Similar logic suggests using T in each position that has a T or a C in the template.

Figure 2 Design of primers to reduce redundancy by taking advantage of the ability of G-T bonds to form in primer-template interactions. The G-T bond is not as stable as an A-T bond, but is much more stable than an A-C bond. [Diagram: double-stranded templates (GAG/CTC or GAA/CTT) paired with coding-strand primers. Upper panel: the primer GAA does not anneal well with the sequence CTC. Lower panel: the primer GAG can anneal well with the sequence CTT because a G:T bond can form.]

After a universal primer is designed and used successfully, often it is a good idea to design a new set of primers that work well only on the particular taxon under study. These new primers typically are located just inside the universal set, and tend to provide cleaner, more consistent amplifications than even the best universal primers.

ASSUMPTIONS

The biggest assumption made about PCR is that the product produced is the product desired. It is easy to be skeptical about this assumption: within the billions and billions of base pairs of the typical genome there is likely to be a region that will anneal (although maybe only at low stringency) with virtually any short primer. In fact, this is the basis for the development of random-primed PCR analysis (see the section on RAPDs). However, random priming also can lead to PCR artifacts that must be identified.

Random priming is reduced by using long primers with reasonably high annealing temperatures, and by using pairs of primers known to be in a given orientation and a given distance apart. In this case, the most direct indication that the PCR product is the correct one is its size: if the product is the predicted size, it is probably the one for which the primers were designed.

Given that the product is not a random one, there are still several assumptions that typically are made about it. The most important one is that the gene segment amplified is from the orthologous locus (see Chapter 1). As an example, the
globin genes occur in a small, multigene family. If primers were designed to a conserved region of globin, they may well amplify a gene segment from several loci, both functional and non-functional (i.e., pseudogenes). In amplifications from a number of species, PCR products of the predicted size might be derived from several of these loci. In order to use such gene segments in phylogenetic studies of species, some evidence that they are orthologous is required.

It generally is assumed that this problem is less severe when using animal mitochondrial DNA because this genome does not have multiple loci (although, of course, multiple copies are the hallmark of this genome). Even with mtDNA, however, care must be taken that nuclear pseudogenes are not amplified. There are mitochondrial gene segments that have been transferred into the nuclear genome. These nuclear copies have been characterized (some minimally) in several taxa, including locusts (Gellisen et al., 1983), sea urchins (Jacobs et al., 1983), birds (Quinn, 1992), rodents (M.F. Smith et al., 1992), crabs, and corals (S. Romano, personal communication).

Quinn (1992) found that mtDNA sequences from snow geese were variable from population to population, but that this analysis was complicated by the amplification of a nuclear pseudogene similar to the control region mtDNA sequences that were the true target of PCR. Moreover, these pseudogenes tended to dominate amplifications of some samples but not others, probably because some samples were from blood (rich in nuclear DNA) whereas others were from liver (with more mitochondrial DNA).

Quinn solved this problem by comparing amplifications from pure mtDNA and blood samples from the same individuals. Mitochondrial-specific primers allowed unambiguous amplification of the correct locus. Other questions may be asked to help distinguish nuclear pseudogenes from mitochondrial targets: Is the transition:transversion ratio appropriate for mitochondrial versus nuclear genes? Are the sequences monophyletic among closely related species? Are there odd insertions or deletions or stop codons that would indicate that the gene is non-functional? Do Southern hybridizations using the PCR product as a probe yield the expected results?

A separate assumption made when using PCR for population analyses is that all alleles are being faithfully and equally amplified. This may be untrue if alleles differ in the sequences that anneal to the primers, or if some alleles do not copy well due to secondary structure. (Note that if the PCR products are to be cloned, then unbiased cloning is another important assumption; see Chapter 9). Another problem is the production of recombinant PCR products by template "jumping" (Saiki et al., 1988; Scharf et al., 1988a,b; Paabo et al., 1989; Scharf, 1990). These recombinant products are produced when partially extended DNA from one site of primer attachment (e.g., one allele at a heterozygous locus, or one gene in a repeated family) attaches at a second site (e.g., the alternate allele at the heterozygous locus, or another gene in the family) during a subsequent extension cycle. The resulting products are likely to contain some stretches of DNA from one allele or locus and other stretches of DNA from the other allele or locus. Some polymerases (especially Taq) appear to have "pause sites" at particular DNA sequences that are likely to promote recombinant PCR products in heterozygotes (R. D. Bradley and D. M. Hillis, personal communication). In these cases, switching to a different thermostable polymerase may reduce the production of recombinants. The problem also can be reduced by amplifying short stretches of DNA or by using long extension times. In some cases, it may be possible to design allele-specific primers. However, if allelic-specific sequences need to be determined, the likelihood of recombinant products in amplifications from heterozygous individuals should always be considered.

APPLICATIONS AND LIMITATIONS

Types of Amplifications and Types of Data

The most common use of PCR in molecular analysis has been for amplification and sequencing of homologous genes in related organisms. However, several other types of data can be derived from PCR products, and each of these types of data can be gathered from several different types
of amplified DNA. Methods for obtaining these data (RFLPs, length variants for microsatellites, mobility variants for denaturation gels) are covered in Chapter 8 and will only be mentioned here. Instead, this chapter concentrates on some of the common PCR targets.

Animal Mitochondrial DNA

The existence of full sequences of mtDNAs from several phyla has encouraged the development of a suite of so-called universal primers for this genome (e.g., Kocher et al., 1989). These primers allow access to the mitochondrial genomes of species otherwise unknown to molecular biology, and encourage the sequencing and comparison of homologous genes of closely related species and of populations within species.

In animal mtDNA, two sets of universal primers have been widely used for ribosomal genes and two sets have been used for protein-coding genes. The ribosomal primers are highly conserved (see Appendix), yet span a region that includes enough variation to be phylogenetically useful at the species level and below. Overall, the 12S rRNA gene is shorter than the 16S rRNA gene, but the former has been subjected to more careful analysis of secondary structure (e.g., Simon et al., 1990). The 12S gene evolves at about the same rate as the average for the entire mitochondrial genome (Simon et al., 1990).

Among protein-coding genes in mtDNA, there is a wide range of levels of conservation. Some proteins are so variable that it is difficult to align homologous amino acids (e.g., ATPase 6, 8). Others are so highly conserved that it may be difficult to detect any amino acid change among genera (e.g., cytochrome oxidase I). As expected from the neutral theory, when four-fold degenerate sites are examined in these proteins, there is no relationship between rate of silent substitution and degree of amino acid conservation (Kessing, 1991). Thus, even the most highly conserved gene (at the amino acid level) has as great a rate of silent change as does the most variable gene. This makes the design of universal primers easier for highly conserved genes. Of course, amino acid evolution is faster in less conserved genes, and for phylogenetic reconstructions this type of data might often be the most informative (see Chapter 12). Other approaches are to target conserved sections of genes for primers, and to use these primers to bracket more variable sections. For example, cytochrome b has a wide variety of conserved and variable domains that are associated with the function of this gene in the mitochondrial membrane (see references in Irwin et al., 1991; Martin and Palumbi, 1993a). Kocher et al. (1989) designed primers that anneal to sections of the gene coding for conserved regions, yet also span a less conserved section.

Recently, a different approach to mtDNA amplification has been developed by M.J. Smith and co-workers (1993). In their strategy, primers span gene junctions in mtDNA, and PCR products contain the 5' and 3' ends of adjacent genes, along with intervening tRNA genes. This approach has been used to quickly estimate mtDNA gene order for novel taxa, as gene order includes important phylogenetic information at higher taxonomic levels (M.J. Smith et al., 1993). Another use for this approach has been to span highly variable segments like the vertebrate control region by using conserved primers in the flanking cytochrome b and 12S RNA genes (Martin et al., 1992a). This stretch of DNA generally is too long to be sequenced with most double-stranded or asymmetric PCR methods, but it can be restriction digested or cloned and sequenced.

Amplifications of mtDNA profit from the multiple copies of this compact genome in animal cells. Most somatic cells have thousands of copies of mtDNA. Large oocytes have hundreds of thousands of copies. From a practical standpoint, this provides a large number of starting copies for PCR, an advantage shared only with chloroplast DNA or multicopy nuclear loci like the ribosomal RNA genes. Moreover, DNA extractions can be adjusted to yield a greater ratio of mtDNA to chromosomal DNA. Differential centrifugation has long been used to isolate mitochondria (Lansmann et al., 1981; see also Chapter 8). Such mitochondrial preparations have up to 50% mtDNA; ultracentrifugation can increase this to near 100%, greatly easing amplifications, and easing the problems of identifying true mitochondrial genes.
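
One of the checks suggested earlier for telling a true mitochondrial gene from a nuclear pseudogene, namely looking for stop codons inside a protein-coding PCR product, is easy to automate. The following is a minimal sketch rather than an established tool: it assumes the target is a vertebrate (so that the vertebrate mitochondrial genetic code applies, in which TAA, TAG, AGA, and AGG are stops and TGA codes for tryptophan), that the sequence is given on the coding strand, and that the reading frame is already known. The function name and the example sequences are invented for illustration.

```python
# Stop codons in the vertebrate mitochondrial genetic code. This is an
# assumption about the target organism; other animal groups use other codes.
VERT_MT_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stops(seq, frame=0, stops=VERT_MT_STOPS):
    """Return the 0-based nucleotide positions of in-frame stop codons.

    seq   -- coding-strand DNA sequence of the PCR product
    frame -- offset (0, 1, or 2) of the first complete codon
    """
    seq = seq.upper()
    hits = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            hits.append(i)
    return hits


# Made-up sequences, for illustration only.
clean = "ATGACCAACATTCGAAAATCA"      # no in-frame stop codons
suspect = "ATGACCAACTAGCGAAAATCA"    # TAG beginning at position 9

if __name__ == "__main__":
    print(internal_stops(clean))
    print(internal_stops(suspect))
```

An in-frame stop in the middle of a supposedly functional gene segment is only a warning sign, not proof of a pseudogene; sequencing error or a wrong reading frame can produce the same result, so the other questions listed earlier remain worth asking.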
Chloroplast DNA

The chloroplast genome, derived from an ancient bacterial symbiosis, is far larger and more complex than the animal mitochondrial genome (see Palmer, 1985a,b for reviews). As a result, characterization of the whole genome in a variety of taxa has been more challenging than these types of analyses for animal mtDNA. However, the complete chloroplast genome is now known for a variety of taxa, including tobacco, rice, a parasitic plant, and Euglena (Shinozaki et al., 1986; Hiratsuka et al., 1989; Wolfe et al., 1992; Hallick et al., 1993), and sequence-level comparisons have become common (e.g., D.E. Soltis et al., 1990; M.L. Arnold et al., 1991; Jouannic et al., 1992; N. Ohta et al., 1992; M.W. Chase et al., 1993).

PCR amplification of chloroplast DNA has tended to center on several well-described genes, especially ribulose-1,5-bisphosphate carboxylase (rbcL; D.E. Soltis et al., 1990; Olmstead et al., 1992). An issue of the Annals of the Missouri Botanical Garden (vol. 80, no. 3) includes a series of papers on the phylogenetics of this gene region; for instance, M.W. Chase et al. (1993) reported the availability of over 500 rbcL sequences from seed plants. Other genes have been amplified and studied by restriction fragment analysis (Rieseberg et al., 1992) or DNA sequence comparisons, including rpoC1 and rpoC2 (Liston, 1992), trnK (N. Ohta et al., 1992), atp (Leu et al., 1992; Jouannic et al., 1992), and trnL (Mubumbila et al., 1993).

Nuclear DNA Amplifications

RIBOSOMAL RNA GENES In molecular systematics, there has been a focus on repeated nuclear genes that are highly conserved. The roots of this work lie in the detailed investigations of Pace, Woese, Lake and co-workers on the structure and function of large ribosomal RNA genes. At first, these were sequenced directly using reverse transcriptase (e.g., Field et al., 1988) or were cloned. However, PCR amplifications now are used routinely to provide sequencing templates (e.g., Wainwright et al., 1993). The cluster of tandemly repeated rRNA genes includes some highly conserved regions (typically within the genes themselves), and several variable regions (including the spacers between genes). Practical advantages of PCR include the existence of universal primers that can be used to amplify rRNA genes in a wide array of taxa, from bacteria to sponges to mammals (e.g., T.J. White et al., 1990; Wainwright et al., 1993), and the existence of hundreds of copies of these genes in tandem arrays in the (eukaryotic) nuclear genome. Thus, these genes are nearly as abundant (in number of copies) as is mtDNA. In addition, concerted evolution (sensu Dover, 1982) can homogenize the tandem copies (Hillis et al., 1991~). As a result, the hundreds of tandem copies are almost always treated as a single locus that can be compared among species. Hillis and Dixon (1991) present a useful review of ribosomal genes and their use in molecular systematics.

Ribosomal RNA genes also evolve slowly, and have been used to compare very distant taxa (e.g., Wainwright et al., 1993). They are less useful, however, at finer taxonomic levels. For instance, the sea urchins Heliocidaris tuberculata and H. erythrogramma are 10% different in mtDNA sequence, but there are few substitutions that distinguish them in their 18S rRNA sequences. Hillis and Dixon (1991) show the variable levels of conservation along the length of all the major nuclear ribosomal genes. This analysis shows that the 18S gene tends to be the most conserved across taxonomic boundaries, but that even within this gene there are some regions that evolve relatively quickly.

SINGLE-COPY NUCLEAR GENES At the other end of the spectrum are a large number of single-copy nuclear genes that have been the focus of numerous studies across the various fields of molecular genetics. Single-copy amplifications are more demanding than are amplifications from repetitive DNA because there are far fewer targets for the primers. In addition, most eukaryotic genes are interrupted by one or more introns. As a result, amplification of specific genes from genomic DNA is often difficult.

One solution to both these problems has been the development of techniques to amplify from cDNA preparations or directly from mRNA by using reverse transcriptase during the first round of
amplification. Because mRNAs are usually found in many copies in a cell, and because introns have been edited out, cDNA amplifications of particular coding regions are sometimes simpler than the corresponding genomic amplifications. It is even possible to amplify from a cDNA preparation using only one specific primer if the downstream primer used is an oligo dT. (This technique has been modified into the RACE protocol of Frohman et al., 1988.)

However, drawbacks of cDNA amplifications include the extreme care with which samples must be treated before use. The mRNAs upon which this method depends are finicky templates prone to fast decay if not preserved carefully. In addition, some genes are only poorly expressed in some tissues, and are thus largely unavailable as mRNAs.

It is important to note that many genes are not truly single-copy. Instead they exist as part of small gene families that have 2-10 expressed loci and might have additional copies as pseudogenes. Use of highly conserved primers may well amplify a suite of products from these independent loci, and care should be exercised in their analysis.

ANONYMOUS SINGLE-COPY SEQUENCES Recently, Karl and Avise (1993) have developed an approach to single-copy amplifications that they call "anonymous single-copy RFLPs." A genomic library is screened for single-copy sequences (see Quinn and White, 1987b). These regions are sequenced and specific primers that anneal to the ends of the region are constructed. Amplifications from genomic DNA produce a homolog of the cloned fragment, which can be assayed by restriction digestion or sequencing. Advantages of this technique are that it provides a large number of independent loci, and it can be applied to any species. A serious disadvantage is the large effort involved in screening and confirming single-copy clones, an effort that must be repeated with every new species. In addition, nothing is known about the sequences produced, which makes identification of homologous sections in related species difficult and complicates analysis of the results.

MICROSATELLITE DNA A similar approach has led to development of an entirely different strategy for obtaining population genetic data. Genomic clones are screened for homology to probes constructed from dinucleotide repeats (e.g., CACACA, etc.). These clones are sequenced, and PCR primers are constructed that flank the repeated segments. The number of tandem dinucleotide repeats tends to vary from individual to individual due to unequal crossing-over, slip mismatch replication, and other genetic mechanisms (Queller et al., 1993). Thus, the PCR products will be of slightly different sizes (differing by 2, 4, 6 bp, etc.). By electrophoresing these products on an acrylamide gel, the number of repeats that an individual possesses (at both alleles) can be estimated. Population characterization consists of documenting the frequency of different length variants in each population. Advantages of this technique are its speed and accuracy once the appropriate PCR primers are known, and that a great deal of polymorphism tends to be visible with this technique. This is especially important for comparison of large numbers of individuals in a population (Queller et al., 1993).

Disadvantages include uncertainty about the functional role of nucleotide repeat variation (one such trinucleotide-repeat polymorphism gives rise to the fragile X syndrome in humans; Verkerk et al., 1990), the work required to develop primers for each new species examined (but see Schlotterer et al., 1991), and the fact that only a few allelic states are possible, enhancing the chance of parallel evolution of a particular length variant (see FitzSimmons et al., 1994). This latter problem means that phylogenetic analysis of the alleles discovered is difficult or misleading. The strengths and limitations of microsatellites are discussed further in Chapter 8.

RANDOM PRIMER AMPLIFICATIONS Because one of the most time-consuming aspects of the above methods is primer design, and because this design process must often be repeated for each new taxon studied, efforts have been made to develop amplification systems that sidestep primer design. RAPDs (see Hadrys et al., 1992)
do this by using a large set of short, random oligonucleotides. Even random primers anneal with some probability in any given genome, and by screening a large number of primer pairs it is possible, by chance, to find some that produce useful products.

These products may not occur on every chromosome or in every individual. If the primer sites are polymorphic, then the existence of a particular product may be a good Mendelian character which is typically (but not always) dominant. In this case the frequency of this product in amplifications from a large number of individuals can be used as a population marker.

A great advantage of RAPDs is that they require no foreknowledge about any particular gene in a target taxon. Given a large bank of random primers, some useful products are likely to be amplified from virtually any species. A second advantage of the method is that it is random with respect to the genome. In a large number of different products, there are likely to be some that amplify a section of every chromosome. As a result, RAPDs can generate molecular markers that can then be correlated with other phenotypic traits (such as pesticide resistance in insects).

Another powerful use of RAPDs is in maternity and paternity exclusion analysis. Once a particular product is known to be inherited as a Mendelian character, it can be used to screen a set of adults to test to see which are the parents of a particular offspring. Use of a large number of RAPD products in this way can be a precise way of documenting offspring-parent relationships, dispersal distances, or multiple paternity of a brood (Levitan and Grosberg, 1993).

Disadvantages of RAPDs are numerous, however. First and foremost is that it is difficult to distinguish many of the polymorphisms apparent using this technique from PCR artifacts. PCR is not always a precise process that gives exactly the same results every time. Sometimes minor differences in template quality or abundance cause artifactual differences in whether a particular product is seen from individual to individual. Because some of the RAPD results come from negative evidence (e.g., lack of a band on a gel), precise control over the amplification condition is critical to confident interpretation of the results.

Another problem is that absence of a product in a particular reaction could be caused by many genomic differences, such as nucleotide substitutions in one or both primer sites, unequal recombination or replication slippage between sites creating a long insertion that is not amplified well, or inversion of a primer site.

In addition, controlled matings have sometimes shown the appearance of novel bands in PCR experiments, making inference about the parents of offspring problematical. This type of problem needs to be solved by very careful experiments that show the Mendelian inheritance of the bands used in an analysis (e.g., Levitan and Grosberg, 1993).

Lastly, homologous loci are very difficult to identify, making RAPDs difficult to use in interpopulational or interspecific comparisons (Hillis, 1994a; J.J. Smith et al., 1994). This problem can be alleviated by cloning and sequencing the RAPD PCR product, and using redesigned, locus-specific primers to amplify homologous loci from other individuals or species.

EXON-PRIMED INTRON-CROSSING (EPIC) AMPLIFICATIONS Many nuclear genes play important metabolic roles, and their products have amino acid sequences that are highly conserved among taxa. For example, the amino acid sequence of actin, a protein involved in muscle action and the cellular skeleton, is up to 95% identical among different animal phyla. In addition to such structural proteins, many catalytic proteins exhibit highly conserved active sites.

These conserved gene segments often are useful for designing PCR primers, but the resulting gene segment is likely to be too conserved to be phylogenetically informative except in distant taxonomic comparisons. A solution to this problem is to design these conserved primers so that the gene segment between them crosses an intron. This strategy works well only if the intron positions are known and intron sizes are small enough to be amplified efficiently, but genomic DNA sequences have provided this information for a wide array of genes. Intron positions can evolve,
but in some cases the placement of an intron in a gene has remained constant over a long period of time. For example, in virtually all actin genes, there is an intron at amino acid position 41 (Kowbel and Smith, 1989). Within smaller taxonomic groups (such as mammals), many genes are conserved at the amino acid level and are known to include introns in conserved positions. The basic approach of intron amplification has been described by Lessa (1992), Lessa and Applebaum (1993), Slade et al. (1993), and Palumbi and Baker (1994).

Primer design and initial amplifications are similar to procedures used for other types of amplifications. However, intron sizes can be highly variable among species, and it is difficult to predict what size PCR product to expect. In addition, many conserved genes occur in small gene families with several loci or pseudogenes. Thus, initial amplifications at low stringency are expected to produce multiple products. These products, plus the inevitable artifacts produced by PCR, usually result in a confused initial picture of the amplification results. Typically, a few strong bands appear, along with several minor bands or smears.

In many cases, the smallest, strongest bands in an amplification represent processed pseudogenes, from which the introns have been removed (by mRNA processing before insertion into the genome). To distinguish true introns from PCR artifacts, and to identify pseudogenes or separate loci, it is a good idea to amplify genomic DNA from several closely related species, and clone the entire original PCR products (Marchuk et al., 1991; see Chapter 9). A variety of inserts are then selected and sequenced, including representatives of the strongest bands from the initial amplification.

Ideally, primers are designed so that the PCR product includes a short stretch of the amino acids that flank the intron. True introns are identified by sequencing the intron/exon junctions of cloned inserts. Confirmed introns include the predicted amino acid flanking regions as well as the conserved intron splice signals. Different loci are identified in several ways. For a given locus, sequences between closely related species should be more similar than sequences between loci (unless the loci are undergoing concerted evolution). Second, no more than two different sequences (representing two alleles possible at this locus) should occur in a diploid organism. Presence of more than two different sequences indicates presence of more than a single locus or recombinational artifacts (note, however, that PCR errors can create minor differences in cloned sequences but that the frequency of these transition errors is only about 1 in 500 bp).

Once the loci are identified, locus-specific and species-specific primers can be designed that give single PCR products at high stringency. Amplifications from multiple individuals or multiple species can then be cloned (to separate alleles) and sequenced. Alternatively, RFLP analysis on amplified introns can be used to quickly screen population patterns (Slade et al., 1993; Palumbi and Baker, 1994).

Ancient DNA

The polymerase chain reaction makes it possible to analyze the tiny amounts of DNA that are preserved in some fossil and subfossil material. In general, the DNA extracted from such samples is very small (100-400 bp), in low abundance, and shows extensive oxidative damage. Paabo (1988) estimated yields of 1-200 pg DNA per gram of starting material. Cooper et al. (1992) reported that samples extracted from soft tissues tended to be smaller than those recoverable from bone.

A major advantage of PCR analysis of fossil DNA is the direct recovery of ancestral DNA sequences which can be used to clarify phylogenetic relationships (Higuchi et al., 1989; W.K. Thomas et al., 1990; Cooper et al., 1992; DeSalle et al., 1992; Janczewski et al., 1992). A second valuable result is the demonstration of previous levels of genetic variation or population subdivision in modern species that have undergone contemporary bottlenecks or range shifts (W.K. Thomas et al., 1990). The small size of DNA fragments recovered from ancient samples makes it difficult to form strong conclusions about close phylogenetic relationships on the basis of a single genetic locus. As a result, multiple loci need to be examined for high resolution (e.g., Janczewski et al., 1992).
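The two-allele criterion for recognizing separate loci, given in the discussion of intron-crossing amplifications earlier on this page, can be turned into a simple screen of cloned sequences. The Python sketch below is illustrative only and is not part of the published method. It assumes that the clones from one diploid individual have already been aligned to equal length, and it treats clones that differ at only a few sites as PCR-error copies of the same sequence. The function names and the tolerance rule (twice the roughly 1-in-500-bp error frequency quoted above, rounded up) are assumptions made for this example.

```python
import math


def mismatches(a, b):
    """Count differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distinct_sequences(clones, error_per_bp=1 / 500):
    """Group clones from one individual into putative distinct sequences.

    A clone within `tolerance` mismatches of an earlier representative is
    treated as a PCR-error copy of it. The grouping is greedy and therefore
    depends on clone order; it is a rough screen, not a clustering method.
    """
    representatives = []
    for seq in clones:
        # Assumed tolerance: errors expected across two independent clones.
        tolerance = math.ceil(2 * error_per_bp * len(seq))
        if not any(mismatches(seq, rep) <= tolerance for rep in representatives):
            representatives.append(seq)
    return representatives


def exceeds_two_alleles(clones):
    """True if a diploid individual shows more than two distinct sequences."""
    return len(distinct_sequences(clones)) > 2


if __name__ == "__main__":
    # Toy 20-bp "clones"; real inserts would be hundreds of bp long.
    allele_a = "ACGTACGTACGTACGTACGT"
    allele_a_err = "ACGTACGTACGTACGTACGA"   # one PCR-like error
    allele_b = "TTTTACGTACGTACGTACGT"
    extra = "ACGTACGTGGGGACGTACGT"
    print(exceeds_two_alleles([allele_a, allele_a_err, allele_b]))
    print(exceeds_two_alleles([allele_a, allele_a_err, allele_b, extra]))
```

A result of True would only flag an individual for closer inspection; as the text notes, extra sequences may reflect either additional loci or recombinational artifacts.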
Forensic Identification of Small Tissue Samples

In some cases, it is impossible to identify a species from the morphological information available. Sometimes the available material is only bits and pieces from museum collections (Higuchi et al., 1989), fossil digs (Cooper et al., 1992; Janczewski et al., 1992), fish markets (Baker and Palumbi, 1994), or crime scenes. Sometimes individuals are so small (e.g., bacteria) that individual identification is impossible (DeLong, 1990). Sometimes different species (e.g., marine larvae, insect larvae) are so similar morphologically at some stage of their life history that they cannot be distinguished (R.R. Olsen et al., 1991). In these cases, positive identification often can be obtained by RFLP or sequence analysis of PCR products (Silberman and Walsh, 1992). These analyses differ only slightly from standard analyses; the biggest changes are in sample preparation. Very small samples (e.g., individual copepods, sperm suspensions, single larvae) often can be amplified directly, that is, without prior DNA extraction. This limits the chance of contamination during extraction, and speeds analysis of large numbers of samples. Larger samples (e.g., bone, dried skin, blubber, canned meat) can be extracted first, and then amplified.

Long PCR

A number of recent reports laid the groundwork for amplifications of large sections of DNA (see Cheng et al., 1994a,b and references therein). The methods they describe do not differ dramatically from the protocols used to amplify short sections of DNA. However, the addition of two types of enzymes to the PCR reaction, one of which includes a 3' → 5' exonuclease activity, seems to enhance the yield of long PCR reactions, possibly by correcting mistakes made by Taq polymerase (which does not proofread) (see W.M. Barnes, 1994 and Chapter 9, Protocol 17 for a more complete discussion).

Such long PCR amplifications may be particularly useful for mtDNA because the two widely used 16S rDNA primers (16Sa and 16Sb) could be used to amplify the entire mtDNA of a given species. In addition, long amplification allows protein-coding sequences to be obtained from genomic DNA without recourse to cDNA libraries. Such amplifications would cross large introns, and may provide a convenient source of material for RFLP analysis of nuclear DNA variation.

LABORATORY SETUP

Few pieces of specialized equipment are required for PCR. Compared to the expensive commitment to ultracentrifuges that characterizes some molecular work (e.g., for isolation of whole mtDNA), the basic PCR systematist's lab is far simpler. The laboratory setup described in Chapter 9 is more than sufficient for all PCR needs. The minimum needs for PCR include a thermal cycler, a variety of automatic pipetters (e.g., at least one in each of the following size ranges: 1-20 µl, 10-200 µl, and 50-1000 µl), agarose gel apparatus and power supplies (see Chapter 8), a UV light source to view the PCR products, and a Polaroid™ camera for photodocumentation of gels. To help solve contamination problems, a powerful UV source (a UV cross-linker) can be used to irradiate PCR solutions and hardware. Other equipment listed in Chapter 9, such as microcentrifuges, is required for some of the following protocols.

1. DNA isolation for PCR
2. Polymerase chain reaction
3. PCR from RNA

Protocols used to amplify DNA in molecular systematics may require considerable improvisation. The exploration of different taxa, many of which have never before been the subject of molecular study, requires a flexibility not normally needed in laboratories specializing in well-understood model organisms. Solution of the special problems of phlox phenolics, molluscan mucus, or simian pseudogenes requires both imagination and common sense. Remember that imaginative
modifications may be required to use the following protocols with specialized taxa.

Many PCR laboratories may not have access to high-speed centrifuges with variable rotors, so the following protocols assume the use of a microcentrifuge. Most table-top microcentrifuges have similar sized rotors and a meter to indicate speed in revolutions per minute (rpm). Therefore, the following protocols indicate centrifuge speed (in rpm) rather than centripetal force (in g), unlike the standard in the rest of this book. Realize, however, that the relative centripetal force is dependent not only on rpm but on the radius of the centrifuge rotor. The speeds given are for an average rotor diameter of ≈10 cm; other rotor sizes will require adjustments up or down in speed. The conversion between rpm (in thousands) and g force (RCF) is RCF = 11.2 × (rpm)² × radius (in cm).

Protocol 1: DNA Isolation for PCR
(Time: ≈4 hr)

A good DNA extraction, clean of contamination, is vastly easier to amplify and remains stable far longer than a quickly done preparation. There are certainly times when quick extractions are appropriate, and even times when quick extractions are optimal, but these times usually come later in a project, when methods and primers and data collection procedures have become routine.

Different tissues require different approaches to release the DNA from the cells without exposing it to too many exogenous chemicals. But once the high-molecular-weight DNA is released into solution, along with proteins, carbohydrates, RNA, and secondary compounds, chemical isolation of the nucleic acids is straightforward and standard. Other chapters in this volume cover some of the standard DNA extraction techniques such as phenol/chloroform extractions and DNA precipitation. The following section is limited to discussing some of the types of extractions that are more specific to PCR. These sections include organelle extraction, difficult-to-extract tissues, and clean-up procedures for PCR contaminants.

Part A. Organelle Separation

In this step, cells are broken and their contents released into a buffer designed to limit nucleic acid degradation. By maintaining the integrity of these organelles, they can be cleaned and isolated from one another, increasing the quality of DNA extracted, and increasing the relative proportion of organelle over nuclear DNA. Physical disruption is the most common form of tissue homogenization, generally using a teflon pestle that fits loosely in a cylindrical tube. A few passes of the pestle from the top of the buffer to the bottom of the tube, often with rotation, is enough to break up soft tissues like mammalian liver, fish gonad, or insect larvae. Gentle homogenization breaks up cells but not nuclei or mitochondria, allowing these organelles to be separated from the rest of the homogenate by centrifugation. This is a strong advantage of this old technique over newer, enzymatic methods.

1. Procure/dissect roughly 0.1-0.5 g of the desired tissue type and place in a 1.5-ml microcentrifuge tube. [Note: It is more common for people to use too much tissue than too little. If you are working with a tissue that is especially DNA dense, such as sperm, do not start with "generous" quantities of gonads or cells. Obtain sparing samples and wash them in a water/seawater/Ringer's solution before putting them in the homogenization buffer.]
2. Add an approximately equal volume of homogenization buffer (100 mM EDTA; 10 mM Tris, pH 7.5; 100 mM NaCl) to the tissue swatch.
3. Macerate or grind the tissue. We use a micropestle made by pouring plastic casting resin or hot melt glue into a microcentrifuge tube and letting it harden with a stiff rod inserted into it for a handle. Insert the pestle gently into the microfuge tube and rotate. Make about three passes at the tissue (or until it appears milky). [Note: When working with more than one preparation it is a good idea to keep the samples on ice between steps of the extraction procedure. This precaution may slow down any enzymatic digestion of the DNA that may occur before it is isolated.]
4. The solution prepared in step 3 contains only Dilute such samples in TE and mix until the
about 0.1% DNA; the rest is cellular debris. To viscoelastic behavior of the solution is greatly
enhance the relative fraction of DNA, cen- reduced.
trifugation can be used to separate nuclei and 8. Proceed with phenol/chloroform extraction
mitochondria from the cellular fragments and detailed in Chapter 9 (Protocol I).
from unbroken tissues. Centrifuge the ho-
mogenate at about 1,000 rpm in a microcen-
trifuge to pellet debris. One minute is usually Part. B.Variations an thc Basic Extraction
sufficient, but longer spins may be appropri- PRESERVED TISSUES In many circumstances, tissues
ate for viscous solutions. The supernatant are not fresh when obtained. For tissues that are
should be cloudy but should not contain ob- frozen, or stored in preserving chemicals like al-
vious particulate matter. cohol, DMSO/EDTA, or lugh salt solutions, initial
5, To separate the organelles (containing the DNA isolation procedures might differ from those
DNA) from the soluble proteins, carbohy- listed above.
drates, etc., microcentrifuge this supernatant For frozen tissues, the thawing process often
for =3 min at 14,000rpm. Longer spins may be releases active DNases that degrade nucleic acids.
important for some homogenates. The mito- To avoid this problem, grind the tissue with a
chondrial pellet should be a velvety gray-tan. mortar and pestle while it is still frozen in liquid
[Note: It is possible to improve the isolation of nitrogen or dry ice. A small coffee bean grinder
mitochondria1 DNA at this point by first per- does a good job of powdering tissue when the
forming a low-speed spin (2-3000 rpm for 2 frozen tissue is mixed with equal parts particulate
min) to remove nuclei, and then performing dry ice.
the high-speed spin to pellet mitochondria. After the tissue has been ground, proceed
Different tissues require slightly different con- with DNA isolation, either using the basic proce-
ditions, so check a few test pellets and super- dure outlined in Part A, or more generally, the
natants by viewing them in a microscope. At proteinase K isolation described in Chapter 9.
400x, nuclei usually are visible as round, clear For tissues immersed in preservative chemi-
bubbles with a definite diameter. Mitochon- cals, first remove the chemicals. This usually can
dria can barely be seen as dark dots moving be done by blot-drying a specimen with paper
around by Brownian motion.] toweling, or if alcohol was the preservative, the
sample can be dried in a vacuum. Most of the
6. Pour off the supernatant gently. Resuspend
preservatives used in storing samples for DNA
the pellet in the original volume with TE, If
work will not interfere too much with DNA ex-
the velvet part of the pellet is in a ring around
traction, so exhaustive removal of these chemicals
a darker looking center, try to re-suspend the
is not required. The major exception to this state-
velvet ring and transfer it to a new tube, leav-
ment is formalin. Formalin cross-links DNA and
ing the dark center behind, The re-suspended
proteins, and prevents efficient extraction of high-
pellet should again be cloudy.
molecular-weight DNA. Some formalin-preserved
7. Add enough 20% SDS to bring the re-sus- specimens remain useful for PCR, however, espe-
pended pellet to 1%SDS. The solution should cially those that have been "hardened" in forma-
immediately clear as the membranes dissolve. lin for only a short time before being placed in
If too much tissue was used, the solution will ethanol. For formalin-preserved specimens, high
be extremely viscous. Ideally, it should be dithiothreitol (DTT) concentrations (up to 100
slightly viscous, but not stringy with DNA. mM) in the proteinase K extraction buffer seems
Samples that are too dense with DNA at this to help break the protein-DNA cross links.
point have an extremely high viscosity and
exhibit impressive viscoelasticities, which BONE Bone contains substantial amounts of
hinders subsequent separation procedures. DNA trapped in the cells that formed the calci-
um carbonate matrix, and in the central marrow. DNA from plants for PCR. If the DNA is still not
Altl~ougl~there are several published reports of pure, the following protocol (contributed by S.
DNA extraction from bone (even fossil bone), Miller, U.S. Fish and Wildlife Service) allows pre-
contaminants that interfere with PCR have cipitation of DNA away from muc~polysacclza-
plagued this field (Hoss and Paabo, 1993). The rides that inhibit PCR.
following protocol was suggested by Hoss and 1. Heat genomic DNA extraction at 60°C for 1
Paabo (1993) to alleviate this problem. hr.
1. Remove a 1-mm layer from the outside of the 2. Add 1/2 volume of 8 M LiC1.
bone with a drilling machine to reduce conta- 3. Let stand for an hour, then centrifuge at high
mination. speed in a microcentrifuge to pellet the pre-
2. Grind the sample to a fine powder in a freezer mill (see Hoss and Paabo, 1993) with liquid nitrogen, or use a coffee grinder and dry ice.
3. Add 0.5 g bone powder to 1 ml of 10 M guanidinium thiocyanate (GuSCN), 0.1 M Tris-HCl pH 6.4, 0.02 M EDTA pH 8.0, and 1.3% Triton X-100.
4. Incubate at 60°C for 1-3 hr with occasional agitation.
5. Centrifuge for 5 min at about 5000 rpm in a microcentrifuge.
6. Extract the supernatant with glass milk (see Chapter 9).

EXTRACTION CONTROLS When using small amounts of precious tissues like fossils, hair, or larvae, it is important to include extraction controls. These are extractions of nothing, using the same chemicals and conditions used to extract the small samples. The products of these shadow-extractions should be included in subsequent PCR amplifications to certify that no exogenous DNA was in the extraction chemicals.

PROBLEMS CAUSED BY PHENOLICS OR MUCUS In some cases, huge excesses of carbohydrate, pigment, or other chemicals are released from a tissue. Worse, these offending molecules often co-purify with the DNA, preventing subsequent amplification. To help limit these problems, a number of exotic extractions have been developed. The most popular uses a detergent called CTAB that complexes with carbohydrates and can be phenol extracted (see Chapter 9). Fangan et al. (1994) give a good review of extracting

cipitated DNA. Mucopolysaccharides remain in solution, so remove supernatant and resuspend the pellet in water or TE.

RESCUING SAMPLES THAT WILL NOT AMPLIFY On occasion, some precious sample will not give good amplifications, no matter how many permutations of the amplification protocol are tried (see "Troubleshooting"). Sometimes cleaning up the DNA solution can solve this problem.

Filtration:
1. Use an ultrafiltration tube (e.g., Centricon 30 or 100) to wash the DNA. Add 1/3 volume of 7.5 M ammonium acetate (pH 7.5) to the DNA sample and add enough 2.5 M ammonium acetate to fill the filtration tube.
2. Spin using the recommendations of the vendor to reduce volume to about 10 µl.
3. Add sterile distilled water to fill the tube and repeat the centrifugation.
4. Repeat step 3.
5. Dilute sample to original volume.

Re-precipitate:
6. Precipitate the DNA a second time to rid it of unwanted salts. Add 1/3 volume of 7.5 M ammonium acetate (pH 7.5) to the DNA sample and add enough 2.5 M ammonium acetate to increase the volume to ten times the original amount.
7. Add one volume of isopropanol or two volumes of EtOH, and collect pellet as described in Chapter 9 (Protocol 1).
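
The volumes called for in the re-precipitation steps (6 and 7) are simple arithmetic, and a short Python sketch may help when scaling them to a given sample. This is an illustration only. The function name `reprecipitation_volumes` is hypothetical, and the sketch assumes that "one volume" of isopropanol (or two of EtOH) is measured against the ten-fold diluted sample, which the protocol does not state explicitly. The computed acetate concentration ignores any salt already present in the sample.

```python
def reprecipitation_volumes(sample_ul, alcohol="isopropanol"):
    """Return the volumes (in µl) used to re-precipitate a DNA sample.

    ASSUMPTION: the alcohol volume is taken relative to the diluted
    (ten-fold) sample, not the original sample.
    """
    first_add = sample_ul / 3.0              # 1/3 volume of 7.5 M ammonium acetate
    target_total = 10.0 * sample_ul          # ten times the original amount
    second_add = target_total - sample_ul - first_add  # 2.5 M ammonium acetate
    volumes_of_alcohol = {"isopropanol": 1.0, "ethanol": 2.0}[alcohol]
    alcohol_ul = volumes_of_alcohol * target_total
    # Approximate ammonium acetate molarity before the alcohol is added.
    acetate_molar = (7.5 * first_add + 2.5 * second_add) / target_total
    return {
        "7.5 M ammonium acetate": first_add,
        "2.5 M ammonium acetate": second_add,
        alcohol: alcohol_ul,
        "acetate molarity": acetate_molar,
    }


if __name__ == "__main__":
    for name, value in reprecipitation_volumes(30.0).items():
        print(f"{name:25s} {value:8.2f}")
```

For a 30 µl sample this gives 10 µl of the 7.5 M solution and 260 µl of the 2.5 M solution, for a total of 300 µl before the alcohol is added.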
Part C. Amplifying from Tissues
For very small amounts of tissue, cellular contents released into the amplification cocktail do not seem to prevent the polymerase chain reaction. In these cases, the tissues are added directly to the PCR cocktail, and subjected to thermal cycling, avoiding the DNA extraction step entirely. It is a good idea to use a PCR buffer that includes some detergent (Triton X-100 or NP-40) when attempting amplifications directly from tissues. Some people have had more success when the tissues are placed in the PCR cocktail (minus enzyme) and are incubated in the refrigerator overnight. Others incubate tissues at 94°C for a few minutes, centrifuge the mixture, and use an aliquot for PCR. Whether a particular tissue will allow this type of amplification will require some experimentation. Below is a protocol for the direct amplification from bacterial colonies that is very useful in facilitating the screening of plasmid or phagemid clones:

1. Label colonies growing on agar in petri dishes by drawing circles around them on the back of the plates.
2. Prepare 25 µl PCR cocktails for each labeled colony using primers that are upstream and downstream of the point in the vector into which the inserts have been cloned (e.g., use M13 and M13-reverse sequencing primers).
3. Using a sterile loop, pick a colony and swirl the loop into a PCR reaction tube. Do not use too much of the colony or the amplifications will not work. A light touch is all that is needed.
4. Amplify at 55°C annealing, using an extension time appropriate for the size of the insert (see Protocol 2).

Protocol 2: The Polymerase Chain Reaction
(Time: 1 hr to set up, 2-4 hr for a 40-cycle amplification)

PCR is a means to an end: the genetic characterization of a species or a population or an individual. Given a particular target DNA, the goal is to produce a large amount of that product and only that product.
The following PCR protocols work well with a variety of different DNAs. There are a number of different buffers being used in other labs and in the literature that may work as well as or better than ours. In particular, the amount of MgCl2 used in the PCR reactions should be adjusted for every different DNA template being used. Some recent reports suggest adding other organic solvents such as DMSO, formamide, or PEG up to about 1% concentration to the reaction buffer. If you are having problems, explore other buffers using the experimental approach outlined above. Table 2 lists some inhibitors of the PCR reaction. Above the levels listed, PCR reactions tend to be inhibited. But below the levels shown, such additives often can help adjust yields and reaction specificity. Note that some of these levels are surprisingly high (e.g., for urea). Finally, careful adjustment of reaction conditions appears to favor PCR of long fragments (W.M. Barnes, 1994; Cheng et al., 1994a). W.M. Barnes (1994) reported the first amplification of very long DNA fragments through use of several different polymerases in the same mix.

Table 2
Levels above which various solvents inhibit PCR

Solvent      Inhibitory level
Ethanol      10%
Urea         1.5 M
DMSO         1%
Formamide    10%
SDS          0.01%

From Gelfand and White, 1990

Part A. Double-Stranded DNA Amplifications
This is the most common type of amplification; it can produce DNA products for many uses. The goal is to produce a large amount of double-stranded DNA copied from a particular gene region. Start by preparing or assembling the following solutions:
1. DNA template (genomic DNA, cpDNA, mtDNA, etc.).
2. 10x polymerase buffer (Appendix).
3. 10x dNTPs (Appendix). These are the raw materials for DNA synthesis, and are usually used at 200 µM. The 10x solution is thus 2 mM for each dNTP.
4. Oligonucleotide primers, diluted to 10 µM.
5. MgCl2 in water (usually 150 mM).
6. Taq polymerase (kept cold).
7. Distilled and sterilized water.

BASIC REACTION The following protocol is for 25 µl reactions. It can be scaled up as needed.

1. Mix the following in a microcentrifuge tube: 2.5 µl 10x Taq buffer, 2.5 µl 10x dNTPs (i.e., 2 mM each of dATP, dGTP, dCTP, dTTP), 1.2 µl each of two primers (10 µM stock solutions), 0.5-1 U Taq polymerase (this is a lot less than recommended by many suppliers but it is sufficient). Add ddH2O to make 24 µl per reaction (≈17 µl). The order in which these are added is not very important as long as the enzyme comes last. Make certain the enzyme solution, which typically is in a heavy glycerol stabilizing buffer, is well mixed, but do not create any froth in the tube. Keep the enzyme on ice or in the freezer until needed; remove it only to take the small amount of fluid needed for the reaction. Using a larger cocktail (see below) solves problems with measuring small amounts of solutions.
2. Add about 1 µl template DNA (in ddH2O or 0.1x TE). This should be about 1-2 ng of mtDNA or 1-2 µg genomic DNA. If in doubt, use less.
3. Add any additives, such as extra Mg2+, DMSO, etc.
4. Add 1 drop of mineral oil (common pure drug store variety) to prevent evaporation of the sample. If condensation forms on the top of the PCR tubes during cycles the reactions may not work. Spin in microcentrifuge for ≈5 sec. [Note that larger reactions need more oil because the surface of the fluid is larger for high volume solutions in conical tubes.]
5. Place tube in thermal cycler and start run (see "Thermal Cycling," below).

MULTIPLE REACTIONS Very seldom are PCR reactions run singly. If nothing else, this violates the maxim of always using positive and negative controls. In fact, several reactions, identical except for the template, typically are run side by side. In this case, it is convenient to make a PCR cocktail that includes everything but the template by multiplying all the volumes in step 1 above by the number of reactions to be run. This limits measurement errors, especially of small sample volumes, and it ensures that all the reagent concentrations in each tube are exactly the same.

1. Multiply volumes in step 1 under "Basic Reaction."
2. Add reagents together in any order, except enzyme is added last. (See note about the care of enzyme above.)
3. Mix enzyme gently and aliquot the reaction to separate sterile 500-µl tubes.
4. Add template (see note above about amounts).
5. Add oil. Cap and label tubes.
6. Spin tubes briefly in a microfuge, and place in thermal cycler (see "Thermal Cycling").

THERMAL CYCLING The PCR cycle is relatively simple and is composed of three major steps reviewed below (a more complete description is presented earlier in this chapter).

1. Denaturation. 94°C for 30 sec seems to work well, but shorter times have also been recommended. If the melting temperature is too low, or the time too short, the double-stranded DNA may not denature, thereby reducing the efficiency of the reaction. This is especially true for the first cycle, in which the goal is to denature high-molecular-weight DNA. Some protocols suggest a long initial denaturation (usually 60 sec). However, Taq polymerase loses activity with each denaturation cycle and eventually becomes less active. So, there needs to be a balance between denaturation of the DNA and that of the enzyme.
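
The cocktail arithmetic for the 25 µl basic reaction, and for several reactions run side by side, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the protocol: the Taq volume of 0.1 µl per reaction and the 10% pipetting overage are hypothetical values (the text specifies the enzyme in units, not volume, and mentions no overage), and the function names are invented for this example.

```python
# Per-reaction volumes (µl) taken from step 1 of the basic 25 µl reaction.
PER_REACTION_UL = {
    "10x Taq buffer": 2.5,
    "10x dNTPs (2 mM each)": 2.5,
    "primer 1 (10 µM stock)": 1.2,
    "primer 2 (10 µM stock)": 1.2,
}
TAQ_UL = 0.1        # ASSUMPTION: the text gives 0.5-1 U, not a volume
COCKTAIL_UL = 24.0  # water brings each reaction to 24 µl; 1 µl template follows


def master_mix(n_reactions, overage=0.1):
    """Total volume (µl) of each cocktail component for n_reactions.

    overage is a hypothetical pipetting allowance (10% extra by default).
    The enzyme is listed last because it is added last.
    """
    scale = n_reactions * (1.0 + overage)
    water = COCKTAIL_UL - sum(PER_REACTION_UL.values()) - TAQ_UL
    mix = {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}
    mix["ddH2O"] = round(water * scale, 2)
    mix["Taq polymerase"] = round(TAQ_UL * scale, 2)
    return mix


def final_concentration(stock, stock_ul, total_ul=25.0):
    """Concentration after dilution (C1 * V1 = C2 * V2), in the units of stock."""
    return stock * stock_ul / total_ul


if __name__ == "__main__":
    for name, vol in master_mix(10).items():
        print(f"{name:25s} {vol:7.2f} µl")
    # 2.5 µl of a 2 mM (2000 µM) dNTP stock in 25 µl
    print(final_concentration(2000.0, 2.5), "µM of each dNTP")
```

With these numbers the water comes to 16.5 µl per reaction, in line with the roughly 17 µl quoted in step 1, and each dNTP ends up at the 200 µM stated in the list of solutions.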
2. Annealing. Standard temperatures are about 55°C for 30-60 sec for good primers. If you are having problems with getting any product at this annealing temperature, lower the temperature in stages (of about 2°) to 45-48°C (although temperatures as low as 37°C sometimes work). For long, perfect primers, annealing temperatures of 60-65°C or higher may be used.
3. Extension. The Taq polymerase works best at temperatures of about 72-75°C. The enzyme synthesizes thousands of bases a minute, so long extensions generally are not needed. Thirty sec is adequate for products up to about 500 bp; 60 and 90 sec are needed for products of 500-1500 and >1500 bp, respectively.

Part B. Variations in the Cycle
If a high annealing temperature is not practical (usually because of an imperfect match between primers and template), the simplest way to enhance PCR success is to lower the annealing temperature. The following types of temperature cycles generally are appropriate when dealing with various DNAs, primers, and annealing temperatures (Table 3).

Table 3
Combinations of cycle temperatures that can be used for different matches of primer to template (a)

Cycle temperatures   Used for                                                     Comments
94-60-72             Perfect, long primers                                        Higher temperatures can be used; see primer section to calculate maximum annealing temperature
94-55-72             Good or perfectly matched primers between 19 and 24 bases   Standard conditions, useful for most amplifications
94-50-72             Adequate primers                                             When there are about 1-3 mismatches out of 20
94-48-68             Poorly matched primers                                       When there are about 4-5 mismatches out of 20
94-25-65             Unknown match, likely very poor                              Where the primer is of very questionable quality; often used as a starting point for long-shot PCR. Note that the extension temperature has been lowered
94-37-65             Last effort before giving up                                 These conditions often give uncontrolled results

(a) These are starting points only; separate experiments need to be conducted for each new template/primer combination to optimize PCR.

HIGH-STRINGENCY BOUNCES When the primers are very good, the annealing temperature is high, and the product is under about 500-750 bp, it is possible to "bounce" from the denaturation to the annealing to the extension very quickly, with only a 15-sec pause at each step. These amplifications proceed very quickly, and often produce very clean, single products.

LOW-STRINGENCY SHUFFLES Lowering the annealing temperature is a commonly used method of encouraging a reluctant PCR reaction. There are few rules to go by, but there are two general approaches. The first is to lower the annealing temperature by a few degrees in subsequent PCR experiments until a product is produced. The other is to drop to very low temperatures (40-45°) and observe how many products (e.g., DNA strands of different sizes) are produced. Then the annealing temperature is increased by stages to eliminate the extra bands. These extra bands also can be manipulated by increasing MgCl2 concentrations: often fewer bands are produced at 3 mM MgCl2.
Note that when the annealing temperature is very low, it is a good idea to drop the extension temperature as well. This is so that the primer does not fall off the template before the enzyme has had a chance to begin synthesis. Alternatively, this is one instance when it is sometimes appropriate to change from one temperature to another slowly. This type of temperature change is called a "ramp" and is a basic part of the programming options on some thermal cyclers (but not all). In these cases, after annealing, the temperature is increased slowly toward the extension temperature, allowing the enzyme to work (albeit slowly) to extend the primer. This extension solidifies the primer's "grip" on the template, allowing the extension temperature to be raised even further. Typically, ramp times of about a minute are used to "lock" a bad primer in place.
Some investigators believe that it is useful to extend the annealing time in addition to (or as opposed to) extensively lowering the annealing and extension temperatures in steps 2 and 3. This gives a primer more of a chance of finding its complement and allows the Taq polymerase (even though not at optimal temperature) to extend the primer sequence a little, thus "locking" it to its complement on the template. The trade-off is that this also greatly reduces specificity of the reaction, and non-specific products can result.

ADDITION OF A THIRD PRIMER Adding a third primer to the cocktail can greatly enhance the yield of product from the other two. This third primer should anneal outside of the other two, and be added in about one-fifth the normal amount. This procedure often enhances the subsequent production of single-stranded template with asymmetric amplifications.

STEP-UP PROCEDURE A "step-up" involves running the first few cycles (3-5) at a low annealing temperature, and then switching to high stringency to finish the amplification. This procedure is meant to favor annealing of poor primers in the early cycles. In later cycles, only the products made in the first few cycles will amplify. This is because PCR products have incorporated the synthetic primers, so the free primers are a perfect match for the new products. A typical step-up is: 5 cycles of 94-45-72 for 30 sec each, followed by 35 cycles of 94-55-72 for the same times.
Obviously, longer extension times might be needed for long products. Possible variations include altering the number of low stringency cycles (3-10), altering the annealing temperatures (based on the Tm of the primers), and decreasing the extension temperature of the low-stringency cycles.

TOUCH DOWN PROCEDURES Another approach, called a "touch down," works best if the primer is a good match to the template, but has alternative binding sites as well. High-stringency annealing steps favor binding only to the correct sites. Later, when the mixture is dominated by PCR products from the initial few cycles, lower stringency annealing temperatures are less likely to result in binding at the alternative sites. A typical touch down sequence is:

2 cycles at 94-60-72 for 30 sec at each step
2 cycles at 94-58-72
2 cycles at 94-56-72
2 cycles at 94-52-72
32 cycles at 94-50-72

Part C. Single-Stranded Amplifications
In single-stranded or asymmetric PCR amplifications, ssDNA template for sequencing is produced by limiting one of the two primers. During the initial cycles of a PCR run, dsDNA is produced as in a normal amplification. However, during the final cycles the limiting primer runs out. The other primer continues to initiate amplification, but only single-stranded products are produced with each cycle. The trick to this method is in adjusting the concentration of the limiting primer so that it runs out after enough double-stranded product has been produced and before the PCR run is complete.
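
The cycling schedules described in this part of the protocol lend themselves to being written down as data. The Python sketch below encodes the three rows of Table 3 that give explicit mismatch counts, the extension times given under "Thermal Cycling," and the typical step-up and touch-down schedules quoted in the text. It is an illustration only: the names `starting_profile`, `extension_seconds`, `total_cycles`, and `expand` are hypothetical, and a real thermal cycler would be programmed through its own interface.

```python
# Each block is (number of cycles, denaturation, annealing, extension) in °C.
STEP_UP = [(5, 94, 45, 72), (35, 94, 55, 72)]
TOUCH_DOWN = [
    (2, 94, 60, 72),
    (2, 94, 58, 72),
    (2, 94, 56, 72),
    (2, 94, 52, 72),
    (32, 94, 50, 72),
]


def starting_profile(mismatches_per_20):
    """Starting (denature, anneal, extend) temperatures from Table 3.

    Only the rows that state a mismatch count are encoded here.
    """
    if mismatches_per_20 == 0:
        return (94, 55, 72)   # good or perfectly matched primers
    if 1 <= mismatches_per_20 <= 3:
        return (94, 50, 72)   # adequate primers
    if 4 <= mismatches_per_20 <= 5:
        return (94, 48, 68)   # poorly matched primers
    raise ValueError("Table 3 gives no mismatch count for this case")


def extension_seconds(product_bp):
    """Extension time suggested in the text for a product of this length."""
    if product_bp <= 500:
        return 30
    if product_bp <= 1500:
        return 60
    return 90


def total_cycles(program):
    """Number of cycles in a program made of (n, denature, anneal, extend) blocks."""
    return sum(block[0] for block in program)


def expand(program, hold_sec=30):
    """List every (cycle, step, temperature, seconds) in running order."""
    steps = []
    cycle = 0
    for n_cycles, denature, anneal, extend in program:
        for _ in range(n_cycles):
            cycle += 1
            steps.append((cycle, "denature", denature, hold_sec))
            steps.append((cycle, "anneal", anneal, hold_sec))
            steps.append((cycle, "extend", extend, hold_sec))
    return steps


if __name__ == "__main__":
    print(starting_profile(2), extension_seconds(1200))
    print(total_cycles(STEP_UP), total_cycles(TOUCH_DOWN))
```

Both quoted schedules add up to 40 cycles, which matches the 40-cycle amplification assumed in the time estimate for Protocol 2.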
Although asymmetric amplifications used to be the primary means for generating DNA sequencing template from PCR products, there are now several other methods, such as double-stranded sequencing, cloning, solid-phase sequencing, and cycle sequencing (see Chapter 9). These other methods tend to be more reliable and less prone to the failures so common in asymmetric PCR. These failures can be resolved, however, for given templates, and in these cases, the method can work very well to provide sequencing template.
The use of a third primer, which anneals inside those used in the original double-stranded amplification, often gives superior single-stranded amplifications. This is probably because use of the third primer adds a new layer of specificity to the single-stranded reaction that is not added by using one of the previous primers. This appears to be important for ribosomal RNA gene primers, or others that tend to give non-specific double-stranded products.
Single-stranded DNA can be produced from pure mtDNA, or from a previous double-stranded PCR amplification. Typical reaction mixtures are in 100 µl instead of 25 µl.

SINGLE-STRANDED AMPLIFICATIONS FROM PURE mtDNA For pure mtDNA samples with good primers, an asymmetric amplification can be used. Make the PCR cocktail as above, with the following differences. One of the primers (the limiting primer) is one hundredth the concentration used in double-stranded amplification. Thus, use a 0.1 µM primer solution (instead of the normal 10 µM solution). Add 1-5 ng of pure mtDNA as template.

SINGLE-STRANDED AMPLIFICATIONS FROM DOUBLE-STRANDED AMPLIFICATIONS Use one of the three methods outlined below to generate single-stranded product from initial double-stranded amplifications. In all three methods, primer annealing temperatures are stringent (55-60°C), regardless of the conditions originally needed to generate the double-stranded product. In general, 100 µl reactions are used, increasing the production of template.
Method A: Use 1-5 µl of the double-stranded amplification in a new reaction with only one primer (use 5 µl of 10 µM solution of primer, as usual). The amount of template added is determined by the amount of double-stranded DNA in the first amplification.
Method B: Take a one-hundredth dilution of the double-stranded PCR product and use that in an asymmetric single-stranded amplification as described above.
Method C: Gel purify the double-stranded band. Run the double-strand amplification on a regular 2% agarose gel. Cut out the appropriate band with a sterile blade. (This method is ideal if the double-stranded amplification has multiple products.) Some investigators recommend using TAE instead of TBE in these agarose gels. Soak the gel slice in 1 ml sterile water for 1-3 hr, then replace with 50 µl sterile water and freeze the gel slice. Thaw it immediately, and repeat the freeze/thaw cycle two more times. Use 1 µl of this solution in the single-stranded amplifications. Alternatively, take a tiny slice out of the middle of the gel band that contains the DNA and use that directly in a PCR solution as template. Low-melt agarose can also be used in the original gel; in this case, cut out the band and melt it at ≈80°C. The agarose can be removed by phenol extraction, followed by a chloroform extraction and EtOH precipitation (see Chapter 9, Protocol 1).

Protocol 3: PCR from RNA
(Time: 3 hr for first strand synthesis; 3 hr for standard PCR)

In some cases, the most appropriate starting material for PCR is not DNA, but is instead an RNA transcript. To amplify from RNA, an additional step is required. This is because Taq polymerase can only use DNA as a template. The enzyme reverse transcriptase (RT) is used to reverse transcribe RNA into DNA.
PCR amplifications from cDNAs can be done with no modification of the basic protocol. First-strand synthesis of the cDNA is accomplished using RT primed with an oligo dT. This primer (a string of T's) anneals to the poly-A tail that is added (by the cell) to most messenger RNA before it is translated. The result is an RNA/DNA hybrid
that can also be used directly in PCR reactions. To clone these products into a cDNA library, second-strand synthesis is required. This is the replacement of the first RNA strand with a DNA strand. Protocols for extracting RNA from animal and plant tissue are given in Chapter 9.

1. Dry about 1 µg of mRNA in a lyophilizer. Do not overdry it because then it will re-suspend poorly.
2. To 30 pmol of primer add RNase-free water up to a total volume of 11 µl.
3. Heat to 70°C in a heating block for 10 min and freeze on dry ice. Let the sample warm to room temperature slowly (about 10 min).
4. Add 4 µl of reaction buffer (usually supplied with RT enzyme), 2 µl 0.1 M DTT, and 1.5 µl dNTPs (10 mM each).
5. Incubate at 37°C for 10 min.
6. Add 1.5 µl of reverse transcriptase and incubate at 37°C for 60-90 minutes.
7. Amplify the target sequence from cDNA using the standard PCR protocol (Protocol 1).

TROUBLESHOOTING

Avoiding PCR Problems: PCR Hygiene
Because PCR products are so concentrated and easily volatilized (by opening a microfuge tube or pipetting, for instance), cross-contamination of samples is potentially a serious problem. Certain simple precautions can be taken to avoid contamination or at least minimize it if it occurs.

1. Aliquoting solutions makes it possible to contain and help resolve contamination problems that do arise. Each person working in the lab should have his or her own set of solutions. PCR reagents prepared in large amounts should be distributed in 1.5-ml microfuge tubes and stored at -20°C.
2. Water used for PCR reagents, DNA, and primers should be double-distilled, sterilized, and then distributed in 1.5-ml microfuge tubes and stored at -20°C.
3. When primers are made, the stock solution usually is highly concentrated. From this highly concentrated stock solution, it is desirable to make a 100 µM stock solution which can then be used in making 10 µM solutions for individual use. The different stock solutions are stored separately. In this way massive, laboratory-wide contamination problems can be avoided and any contamination problems that do arise can be contained.
4. Different sets of pipetters should be designated for different procedures. One set of pipetters should be designated for preparing PCR reactions. These pipetters should never come in contact with any amplified DNA. Another set of pipetters can be designated for post-PCR use. One pipetter should be designated to be used only in loading samples in agarose gels. Another set of pipetters should be designated for use with radiation only.

Some Common Problems With PCR

Problem: No PCR product, not even in positive controls.
Possible remedies:
1. Repeat the experiment.
2. Check buffer, dNTPs, and primer recipes and concentrations. Remake any questionable solutions.
3. Try a different set of primers or a different positive control.
4. Try a new batch of enzyme (this is seldom the problem unless the enzyme is very old).
5. Was oil added to the reactions?
6. Check the thermal cycler by watching it go through 2-3 cycles.

Problem: Positive control works, but otherwise there is no product.
Possible remedies:
1. Run 5 µl of the stock DNA solution on a 1% agarose gel. If there is a large amount of high-molecular-weight DNA, try diluting the starting template DNA (try dilutions of 1:10 or
1:100). If there is no high-molecular-weight DNA, increase the amount of starting material or switch to better samples of genomic DNA.
2. Try lowering the annealing temperature in the PCR cycle.
3. Try a step-up cycle (see Protocol 2, Part B).
4. Try using more cycles on the PCR machine (increase from 40 cycles to 50 cycles). This is effective only when the product is present but in small quantity.
5. It is possible that something in the DNA template is interfering with the PCR reaction. This can be determined by setting up a single reaction with two templates (the added template should be known to work well with the primers being used). If the problem template prevents the added template from amplifying, then there is something in the problem template solution that is inhibiting the reaction. The mucopolysaccharides present in the mucus of corals and snails, for instance, are known to co-precipitate with DNA and prevent PCR. To solve this problem, try diluting the problem template, or try one of the rescue procedures outlined above.
6. Switch primers and try again.

Problem: Bright bands in well of agarose gel following electrophoresis.
Possible remedy: Such bands usually result from overamplification of the PCR product or from insufficient dilution of the product prior to electrophoresis. This is also a common result of amplifications from too much genomic DNA. Try diluting the template 100- to 1000-fold.

Problem: "Smearing" of double-stranded PCR products or multiple bands following electrophoresis.
Possible remedies:
1. Try less template. The most common cause seems to be too much template.
2. Try annealing temperatures 2-5°C higher. A lot of smearing, or multiple bands, may indicate that the primer is annealing to other parts of the template DNA.
3. Try varying MgCl2 concentrations. A dilution series will determine which concentration results in the best bands.
4. Try fewer cycles. This is often recommended but is probably not the best solution. While there may be less evidence of non-specific amplification, subsequent amplification from this PCR reaction will amplify even minute quantities of non-target DNA to visible levels (unless gel slices are used). A better solution is to optimize conditions to reduce mis-priming (e.g., temperature and salt concentration in buffer).
5. Try gel purifying the double strands (only take the brightest part of the band) and then reamplifying (with stringent conditions) the purified double-stranded product.

Problem: Bands in the negative controls.
Possible remedies:
1. Often, in spite of all precautions, contamination problems occur. Once contamination becomes a visible problem, the contaminant is often in more than one solution, so altering one solution may not be informative. Remaking all stock solutions is desirable.
2. Wash the pipetters well and expose the tips to 10 min of UV light.
3. Treat the solutions, including the primers, with UV light. Place the solutions in plastic tubes on a UV light source and illuminate them for 10 min (less if the UV source is a short-wavelength source). This tends to break up contaminating DNA, making it less attractive as a PCR template.
4. If all else fails, the contaminant can be destroyed with restriction enzymes or DNase (see Furrer et al., 1990).

Problems with Single-Strand Amplifications

If initial attempts to sequence from asymmetric amplifications fail, it might be useful to adjust the concentration of the limiting primer. Too much limiting primer will not allow the production of any single-stranded template. Too little limiting primer will not generate enough double-stranded
product from which to produce single-stranded templates. Try the following remedies to produce high-quality single-stranded product.

1. Try a dilution series with the double-stranded product to determine the template concentration that works the best.
2. Re-do the double-stranded amplification at higher stringency. It may be that the original double-stranded amplification is tainted with non-specific products. This is especially helpful when the single-stranded products are "smeary" on the gels, indicating a non-specific product.
3. Adjust the concentration of the limiting primer in the initial double-stranded amplification.
4. Try gel-purifying the double-stranded templates. Gel-purification, in principle, should eliminate all excess primers. Therefore, both primers must be added to the single-strand reactions. The concentration of the limiting primer should be about one-hundredth the concentration of the second primer. However, run a dilution series with the limiting primer to determine the concentration that works best.
5. Try using an internal primer to generate single-stranded template (see "Method C").
6. Try a different sequencing technique. Although asymmetric PCR was once the most common way to sequence PCR products, other (often more reliable) methods are available. See Chapter 9.

USEFUL PRIMERS

Included herein are primers that have been useful across a broad range of taxa for targets in nuclear, chloroplast, and mitochondrial genomes. Several other sources provide access to primers that may work on specific taxa, including Palumbi et al. (1991), Simon et al. (1994), T.J. White et al. (1990), and Kocher et al. (1989). These primers are all written in the 5' → 3' direction. This means that the "downstream" primers are the reverse complements of the coding sequences to which they anneal.
For most gene regions, primer sequences are given and aligned with published sequences from a variety of taxa. A reference map is also included that shows the location of the primers relative to each other. The labels 5' and 3' indicate "upstream" and "downstream" primers, respectively. The following standard degeneracy symbols are used: Y = C or T; R = G or A; Z = C or G; S = C or A; Q = A or T; M = A, T, or C; D = G or T.

Nuclear Ribosomal Gene Primers
Three of the eukaryotic nuclear ribosomal RNA genes are organized in a cluster that includes a small subunit gene (16S to 18S, where S stands for Svedberg units, a measure of sedimentation rate), a large subunit gene (26S to 28S), and the 5.8S gene. In addition, two internal transcribed spacers (ITS-1 and ITS-2) lie between these genes and there is an external transcribed spacer (ETS) at the 5' end of the transcribed RNA. These six components make up the basic cluster; they are repeated in a tandem array in the eukaryotic genome up to hundreds or thousands of times. Between each cluster in the array is a non-transcribed spacer (NTS) that serves to separate individual repeats from one another on the chromosome.
In general, the genes are more highly conserved than the transcribed spacers, which are more highly conserved than the non-transcribed region. Within even the conserved genes, however, there are regions of very high sequence similarity among taxa, and regions of low conservation (Hillis and Dixon, 1991). Thus, the entire array is a patchwork of evolutionary rates.
PCR primers have been designed to span each of the ribosomal genes. Generally, the 5.8S gene is too short for most purposes, but primers designed for this small gene can be used to amplify the adjacent ITS regions. The primers shown in Figure 3 are a sample of the primers listed by Hillis and Dixon (1991). Maps of sequence conservation based on comparisons of mammalian sequences to those of other phyla can be found in Figures 3-8 of Hillis and Dixon (1991). The map in Figure 3 is very approximate; the reader should consult Hillis and Dixon (1991) for details.
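
The convention that every primer is written 5' → 3', so that a "downstream" primer is the reverse complement of the coding sequence it anneals to, can be illustrated with a short Python sketch. The function name `reverse_complement` is hypothetical, and the sketch handles only the four unambiguous bases. The degeneracy symbols listed in the text are deliberately left out, so a degenerate primer would need an extended lookup table.

```python
# Complement table for the four unambiguous bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' -> 3'.

    Raises ValueError for any symbol other than A, C, G, or T, because
    degeneracy codes are not handled in this sketch.
    """
    bad = set(seq) - set("ACGTacgt")
    if bad:
        raise ValueError(f"unsupported symbols: {sorted(bad)}")
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # The 3' primer in the 5.8S gene (map position VIII in Figure 3),
    # as printed in the 5' -> 3' direction.
    primer = "GTGCGTTCGAAGTGTCGATGATCAA"
    # The coding-strand sequence to which this primer anneals.
    print(reverse_complement(primer))
```

Applying the function twice returns the original primer, which is a quick check that a sequence has been entered in the intended orientation.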
I    II   III  IV   V    VI
→    →    →    →    →    →
| NTS | ETS | 18S | ITS-1 | 5.8S | ITS-2 | 28S |
←    ←    ←    ←    ←
VII  VIII IX   X    XI

Map position   Primer/Taxa   Sequence position
I. 18e-5' CTGGTTGATCCTGCCAGT 5
Mammal . . . . . . . . . . . . . . . . .
Frog
Urchin
Fruit fly
Rice
Yeast
Protist
II. 18j-5' GCCTGCGGCTTAATTTGACTCAACACGGG 1231

Mammal
Frog
Fruit fly
Nematode
Rice
Yeast . . . . . . . . . . . . . . . . . . . . . .
Protist . .G . . . . . T . . . . . . . . .. . .
III. 18d-5' CACACCGCCCGTCGCTACTACCGATTG 1640
Mammal
Frog
Urchin
Fruit fly . . . . . . . . . . . . . . . . . . . . . . .
Nematode ...... GGAC
Rice . . . . . . . . . . . . . . . .C. . . . . . . .
Yeast . . . . . . . . . . . . . . . . . .G . . . . . .
Protist
IV. 28y-5' CTAACCAGGATTCCCTCAGTAACGGCGAGT
Mammal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . .C . . . . . . . . . . . . . .
Urchin . . . . . . . . . . . . . . .C . . . . . . . . . . . . . .
Fruit fly . . . . . A . . . . . . TT . . T . . . . G . . . . . . . .
Nematode . . , . AA . . . . . . . . . . T . . . . . . . . . . . .
Rice . . .T . .G . . . . . . . . C T . . . . . . . . . . . . .
Yeast .C G...T.............
V. 28ee-5' ATCCGCTAAGGAGTGTGTAACAACTCACC 1795
Mammal . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Urchin . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Nematode . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Rice . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Yeast . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Figure 3 Map of the nuclear rDNA array showing approximate positions of primers used to amplify regions of the 18S, 5.8S, and 28S genes, as well as the intervening ITS regions (see Hillis and Dixon, 1991 for details). Individual primer sequences are listed with an alignment to other taxonomic groups. Sequence position refers to the starting position of the primer in the relevant Mus musculus gene.
horn previous
(co~rh171ied page)
VI. 28~-5' AAGGTAGCCAAATGCCTCATC 3429
Mammal
Frog
Fruit fly . . . . . . . . . . . . . . . . . . . . .
Nematode . . . . . . . . . . . . . . . . . . . .T
Rice . . . . . . . . . . . . . . . . . . . . .
Yeast . . . . . . . . . . . . . . . . . . . . .
Protist . . . . . . . . . . . . . . . . .T . .
VII. 1811-3' AGGGTTCGATTCCGGAGAGGGAGCCTGAGAGAAA 420
Mammal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Nematode . . . . . . . . . .C . . . . . . . . . . . . . . . . . . . . .
Rice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Yeast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Protist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
VIII. 5.8~-3' GTGCGTTCGAAGTGTCGATGATCAA 85
Mammal . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . . . . .
Loach . . . . . . . . . . . . . . . . . . . . . . . . .
Urchin . . . . . . . . . . .A . . . . . . . . . . . . .
Silkworm . . . . . . . . . . .A . . . . . . . . T . . . .
Rice T . . . . . . . A . . . A C . . . . . . G . TC .
IX. 282-3' AGACTCCTTGGTCCGTGTTTCAAGAC

Mammal . . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . . . . . .
Urchin . . . . . . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . . . . . . . . . . . . . . . . . . . . . . .
Nematode GC . . . . . G C A A . . . . . . . . . . . . . . .
Rice . . . . . . . . . . . . . . . . . . . . . . . . . .
Yeast T . . . . . . . . . . . . . . . . . . . . . . . . .
X. 28x-3' GTGAATTCTGCTTCATCAATGTAGGAAGAGCC 4106
Mammal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . . . .T . . . . . . . . . . . . . . . . . . . . . . . .
Nematode . C. . . . . . . . . . . . G . . . . . . . . . . . . . . . . .
Rice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Yeast . . . . . . . . . . . . . .G G T . . . . . . . . . . . . . . .
Protist .C.TC . . . . . .G G . G . T . G . T . . . . . . . . . . .
XI. 28jj-3' AGTAGGGTAAAACTAACCT 4200
Mammal . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . .
Rice . . . . . . . . . . . . . . . . . . .
Yeast
Protist
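The alignments in these figures use the usual dot convention: a dot means the taxon has the same base as the primer at that position, and a letter gives the base that differs. The following is a minimal sketch of how such a row can be expanded back into a full sequence. The function names are illustrative only, and the nematode row for primer VI is rebuilt here as 20 dots plus a final T, which assumes the scanned row was read correctly.

```python
def expand_alignment_row(primer: str, row: str) -> str:
    """Expand a dot-notation alignment row into the taxon's full sequence.

    A '.' means "same base as the primer here"; a letter is the base
    actually present in that taxon.
    """
    row = row.replace(" ", "")  # scanned rows often carry stray spaces
    if len(row) != len(primer):
        raise ValueError("row and primer must have the same length")
    return "".join(p if r == "." else r for p, r in zip(primer, row))


def mismatch_positions(row: str) -> list[int]:
    """Return 1-based positions (5' to 3') where the taxon differs from the primer."""
    row = row.replace(" ", "")
    return [i for i, r in enumerate(row, start=1) if r != "."]


# Primer VI of Figure 3 and an assumed clean copy of its nematode row.
PRIMER_VI = "AAGGTAGCCAAATGCCTCATC"
NEMATODE_ROW = "." * 20 + "T"

print(expand_alignment_row(PRIMER_VI, NEMATODE_ROW))
print(mismatch_positions(NEMATODE_ROW))
```

A mismatch at the last position listed is at the 3' end of the primer, which is where mismatches are most likely to interfere with extension.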
Animal Mitochondrial Gene Primers

12S Ribosomal RNA Primers
These primers (Figure 4) amplify the gene for the small subunit ribosomal RNA in mitochondria. Like most ribosomal genes, it is fairly conserved among taxa, but there are regions of high sequence substitution. Overall, it seems to evolve at about the same rate as the average for the rest of the mitochondrial genome (Simon et al., 1990). Secondary structure is highly conserved, and molecular evolution of nucleotides in stems versus loops has been the subject of increasing attention (e.g., Dixon and Hillis, 1993; Kjer et al., 1994; Simon et al., 1994). The region amplified by the primers below is short, about 410 bp, but it has been useful in phylogenetic studies of families, genera, and even species within a genus (e.g., R.G. Gillespie et al., 1994). This gene region seldom is useful for population genetic surveys. Note that the ribosomal genes of insects are often reported in opposite orientation compared to those of mammals. That is, the coding strand is the published strand for mammals but the published strand for Drosophila is the non-coding (RNA synonymous) strand.

16S Ribosomal RNA Primers
This is the large subunit ribosomal RNA gene in mtDNA. Like the 12S rRNA gene, this gene is fairly conserved in sequence and secondary structure, and seems to evolve more slowly than the
I: forward primer; II: reverse primer
[ 12S rRNA ]
Map Sequence
position Primer/Taxa position
I. 12SA-5' AAACTGGGATTAGATACCCCACTAT
Human . . . . . 1067
Frog . . . . . . . 2486
Urchin . CA . TGT.. 491
Fruit fly . A. .T T . 14612
12Sai-5' AAACTAGGATTAGATACCCTATTAT
Human . . G. . . . . C C . 1067
Urchin . C . . . . . . . G . 491
Fruit fly . . 14588
II. 12SB-3' GAGGGTGACGGGCGGTGTGT
Human . . . . . . . . . 1478
Frog . . . . . . . . . 2898
Urchin A . . . . . .A. 853
Fruit fly A .A C . . . . .A . 14211
12Sbi-3' AAGAGCGACGGGCGATGTGT
Human G G I G 1478
Urchin G . T 855
Fruit fly 14214
Figure 4 12S rRNA primers. Many of these primers were first published by Kocher et al. (1989). They tend to work for most animal phyla, although there are exceptions. For vertebrates it is useful to use 12SA-5' with 16SA-3' (Figure 5). This is a large fragment (≈1425 bp) that can be subjected to restriction digestion or sequenced from both directions. New primers then can be designed to walk through most of the 16S gene and the 3' part of the 12S gene. The 12Sai and 12Sbi primers were made for insects by Chris Simon. They have a higher ratio of A's and T's than the other primers and require a slightly lower initial annealing temperature. These latter primers also work for most crustaceans (except the barnacles) and many mollusks.
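The link between A+T content and annealing temperature noted in the caption can be illustrated with a quick calculation. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C), which is only a rough rule of thumb intended for short oligonucleotides; it is an assumption of this example and not the method used to choose the conditions in this chapter. The primer sequences are those listed in Figure 4.

```python
def at_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are A or T."""
    primer = primer.upper()
    return sum(base in "AT" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


# Sequences as listed in Figure 4.
PRIMERS = {
    "12SA-5'": "AAACTGGGATTAGATACCCCACTAT",
    "12Sai-5'": "AAACTAGGATTAGATACCCTATTAT",
    "12SB-3'": "GAGGGTGACGGGCGGTGTGT",
    "12Sbi-3'": "AAGAGCGACGGGCGATGTGT",
}

for name, seq in PRIMERS.items():
    print(f"{name:9s} AT fraction = {at_fraction(seq):.2f}, "
          f"Wallace Tm ~ {wallace_tm(seq)} C")
```

Under this rule both insect primers come out a few degrees lower than their general counterparts, which is consistent with the advice to start with a slightly lower annealing temperature.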
Map Sequence
position Primer /Taxa position
I. 16Sar-5' CGCCTGTTTATCAAAAACAT
Human . . . . . . . . . .C . . . . . . . . . 2510
Frog . . . . .C . .GC.T . . . . . . . . 3999
Urchin . . . . . . . . . .C . . . . . . . . . 5092
Fruit fly . . . . . . . . . .A . . . . . . . . . 13398
II. 16Sb-5' ACGTGATCTGAGTTCAGACCGG

Human . . . . . . . . . . . . . . . . . . . . . . .
Frog . . . . . . . . . . . . . . . . . . . . . .
Urchin . . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . A . . . . . . . . . . . . .A . . . . .
III. 16Sa-3' ATGTTTTTGATAAACAGGCG
Human . . . . . . . . .G . . . . . . . . . .
Frog . . . . . . . .A . G C . . G . . . . .
Urchin . . . . . . . . .G . . . . . . . . . .
Fruit fly . . . . . . . . .T . . . . . . . . . .
IV. 16Sbr-3' CCGGTCTGAACTCAGATCACGT
Human . . . . . . . . . . . . . . . . . . . . . . 3080
Frog . . . . . . . . . . . . . . . . . . . . . . 4572
Urchin . . . . . . . . . . . . . . . . . . . . . . 5682
Fruit fly . . . . . T . . . . . . . . . . . . .T . . 12887
Figure 5 16S rRNA primers. These 16S primers were some of the first universal primers made, having been designed by S. Palumbi (16Sb) and T. Kocher (16Sa). The reverse primers 16Sar and 16Sbr work well for most animal phyla including sea urchins, vertebrates, insects, crustacea, molluscs and cnidarians. They amplify a 500-650-bp fragment. Note that in Xenopus there is a tandem repeat on the 3' side of the 16Sa primer. The insertions in the repeat should make it difficult for the primer to anneal there, however.
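The "r" primers in Figure 5 are the reverse complements of their partners, so each site can be primed in either direction. A short check of that relationship, using the sequences listed in the figure, might look like the sketch below (the helper name is illustrative).

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sequences as listed in Figure 5.
SIXTEEN_S = {
    "16Sar": "CGCCTGTTTATCAAAAACAT",
    "16Sa": "ATGTTTTTGATAAACAGGCG",
    "16Sb": "ACGTGATCTGAGTTCAGACCGG",
    "16Sbr": "CCGGTCTGAACTCAGATCACGT",
}

for forward, reverse in (("16Sar", "16Sa"), ("16Sb", "16Sbr")):
    matches = reverse_complement(SIXTEEN_S[forward]) == SIXTEEN_S[reverse]
    print(f"{forward} / {reverse}: reverse complements = {matches}")
```

The same function is handy when ordering a primer for the opposite strand of any site shown in these figures.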
mitochondrial genome as a whole. Because the amplified fragment of the 16Sar and 16Sbr primers (Figure 5) is larger than the 12S fragment (about 550 bp compared to 400 bp), it is slightly more useful in phylogenetic reconstruction. Also, there is enough variation in some species to be useful in population level studies.
Most investigators who have used the primers shown in Figure 5 successfully on organisms distantly related to the species on which the primers were based (e.g., corals, hydroids, gastropods) have used preliminary sequences from initial amplifications done at low stringency to design taxon-specific primers.

Cytochrome Oxidase I
This gene encodes a subunit of the cytochrome oxidase complex that is part of the electron transport chain. Its amino acid sequence is highly conserved across phyla, making it easy to align sequences to one another, and making it possible to design useful universal primers (Figure 6). Because it is so highly conserved, amino acid substitutions are rare within species, but silent changes are just as common as they are in other mtDNA genes with lower constraints on amino acid sequence (Kessing, 1991). Amino acid sequences are useful in phylogenetic reconstruction of deep evolutionary branches (Palumbi and Benzie, 1991).

Cytochrome b Primers
Cytochrome b is a protein in the electron transport chain. It is the only protein product of the mitochondrial genome that is a fully functional monomer; that is, it is not a subunit of a large enzyme complex, as are all the other mitochondrial-
II
[ Cytochrome oxidase I: Vertebrates ]
VII
[ AS | Cytochrome oxidase I: Urchins | R ]
Map Sequence
position Primer/Taxa position
I. COIe-3' CCA GAG ATT AGA GGG AAT CAG TG
Human . .T . . . .A. . .G . . A . . . . . . . . 7110
Frog . . . . TA . A . . AC . . . . . . . . . . . 8602
Urchin . . . . . . .AG . . G . . A . . C . . . . . 6992
Fruit fly . . . . T A . .A. . AT . . . T. . . . . . . 2672
II. COIf-5' CCT GCA GGA GGA GGA GAY CC

Human . .C . .C . . . . . . . . . . . . . .
Frog . . . . .C . . . . . . . . T . . . . .
Urchin . . . . . . . . . . .G . . . . . . .
Fruit fly . .A . .T . . . . . . . . . . . . . .
III. COIa-3' AGT ATA AGC GTC TGG GTA GTC

Human G. . G . . T . . A. G. . . . . . . .
Frog T . . . . . . . . . . . . . . . . . . .
Urchin T . . . . . G . . . . . . . . A . . . . .
Fruit fly ... G . . . . . A. . A. . . . . A .
VII. 16SB-5' ACG TGA TCT GAG TTC AGA CCG G
Human . . . . . . . . . , . . , , , . , . , . , 3080
Frog . . . . . . . . . . . . . . . . . . . . .
Urchin . . . . . . . . . . . . . . . . . . . . .
Fruit fly . . . . .T . . .
Figure 6 Mitochondrial cytochrome oxidase I primers. COIf is one of the most general of primers for protein coding regions, and is useful in amplifications of every phylum attempted except Cnidaria. The initial companion to COIf is usually COIa. COIe and COIf make superb dsDNA amplifications for vertebrates. Single-stranded DNA amplifications using either primer often appear smeared on gels. Sequencing is difficult with the COIf primer, perhaps because of its small size and degeneracy. Sequencing with COIe is easier. Successful COIa-f amplifications have been obtained for dinoflagellates, sharks, lamprey fish, sea urchins, spiders, and shrimp. Note that COIf is a degenerate primer. Typically, COIa and COIf are used in initial low-stringency amplifications, and the subsequent sequence data are used to design taxon-specific primers.
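A degenerate primer such as COIf is really a mixture of oligonucleotides, one for each combination of bases allowed at the ambiguous positions. The sketch below expands a primer written with the standard IUPAC ambiguity codes into its component sequences. The function names are illustrative, and the GLUDG sequence is taken from Figure 7.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer mix."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(primer: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


COIF = "CCTGCAGGAGGAGGAGAYCC"      # COIf-5' from Figure 6 (Y = C or T)
GLUDG = "TGACTTGAARAACCAYCGTTG"    # GLUDG-5' from Figure 7 (R = A or G)

print(degeneracy(COIF), expand_degenerate(COIF))
print(degeneracy(GLUDG))
```

Each added ambiguous position multiplies the number of oligonucleotides in the mix, so the effective concentration of any single matching primer falls as degeneracy rises.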
encoded proteins. The chemistry and action of cytochrome b are well known, and a secondary structure of the protein has been proposed. Irwin et al. (1990) examined the molecular evolution of cytochrome b in mammals, and Martin and Palumbi (1993a) compared these results to the evolution of this protein in sharks. Both studies noted that the level of amino acid conservation varies significantly in different parts of the cytochrome b gene. The most variable part is at the 3' end of the sense strand. There are several sections of the gene that are highly conserved among taxa and are thought to be important in the function of the protein.
Figure 7 shows primers for cytochrome b, primarily for use in vertebrates. Kocher et al. (1989) listed two universal cytochrome b primers that amplify a short section from a wide variety of taxa. This short section shows some variation within some populations (e.g., fish) and between species. It is so short that robust phylogenies are sometimes difficult to produce. To obtain a longer
I, II, VI: forward primers; III, V, VII: reverse primers
[ E | Cytochrome b | T ]
Map Sequence
position Primer/Taxa position
I. GLU-5' TGA TAT GAA AAA CCA TCG TTG
Human . C . . . T . . 14724
Frog C C
Carp CT G C
Chicken C G CT G T
Shrimp C AT T G T AT
I. GLUDG-5' TGA CTT GAA RAA CCA YCG TTG
II. CB1-5' CCA TCC AAC ATC TCA GCA TGA TGA AA

Human C 14817
Stingray C A T
Frog A T T T 16321
Urchin C C T C ATT G 14581
III. CB2-3' CCC TCA GAA TGA TAT TTG TCC TCA

Human 15175
Frog . . . . . . A . . A . . . . . . . . . . . . . .
Urchin AG. . . . A . . G . . C . . . . . C . . C . .
V. CB3-3' GGC AAA TAG GAA RTA TCA TTC
Human . . . . . . G . . A . . . . . . . . . . . 15560
Frog . . . G. . . . . . . . . . . . . . . . . 17065
Stingray . . . . . . . . . . . . . . . . . . . . .
Sturgeon . . . . . . G . . A. . . . . . . . . . .
Urchin . . . G.. ..A . . . . . . C.. . . .
Fruit fly A . . . . . . .A A . . . . . . . . . . . 11325
VI. CB3R-5' CAT ATT AAA CCC GAA TGA TAY TT

Rat C C A 15560
Frog A 17065
Stingray . . . . . . . . . . . . . . . . . . . . . . . .
Urchin . . C . . . C . G . . A . . . . . G . . . . .
Fruit fly . .C . . . C. . . .A . . . . . . . . . . .
VII. CB6THR-3' CTC CAG TCT TCG RCT TAC AAG
Human T T T C T 15930
Frog T
Sturgeon T GA T
Shark
Urchin C TCT C T C CTG T .
Fruit fly A T T A T T T
Figure 7 Mitochondrial cytochrome b primers. Most of the primers listed are known to work in a wide diversity of vertebrates.

sequence several other primers have been developed (see Martin et al., 1992b). Most work well for most vertebrates. This gene has not been studied extensively in invertebrates.

The Control Region
The D-loop, or displacement loop, is a region of the mitochondrial genome of mammals that contains the control regions for mtDNA replication and transcription. It is called the displacement loop because during replication, the two strands in this region are displaced from one another by
V, I: forward primers; II, IV: reverse primers
[ T | P | Control region | F ]
Map Sequence
position Primer/Taxa position
I. PRO-5' CTA CCT CCA ACT CCC AAA GC
Human C A T T G A 15980
Frog C A TTG C
Sturgeon TC C TT. . . . .
Urchin TAC A T . G . . .
II. PHE-3' TCT TCT AGG CAT TTT CAG TC
Human C G AA . . . . . . . 625
Frog .A CA
Urchin C TG A
V. CB3R-5' CAT ATT AAA CCC GAA TGA TAT TT

Human C C .A 15560
Stingray A
Frog
Urchin C C G A . . .G ..
Fruit fly C . C A . ..
IV. 12SAR-3' ATA GTG GGG TAT CTA ATC CCA GTT
Human . . . . . . 1067
Frog
Fruit fly . A.A . T
Figure 8 Mitochondrial control region primers. The latter two primers (CB3R-5' and 12SAR-3') amplify the entire control region plus the flanking tRNAs and portions of the cytochrome b and 12S rRNA genes in fish and mammals. See Martin et al. (1993).
a third strand. In taxa other than mammals, the control region is organized very differently, often without an obvious D-loop. In sea urchins, this region is under 200 bp long. In fish, it tends to be very long and is often full of repeated sequences. In insects, it is called the AT-rich region and can also be long and full of repeated sequences.
In the control region there is usually a set of conserved sequence blocks that presumably are important in controlling mtDNA replication and transcription. See Attardi (1985) for a review. Interspersed in this region, often flanking the conserved areas, are sections of non-coding DNA that seem to be free to vary. These regions contain many polymorphic sites, and have been the focus of concerted efforts to understand the population biology of mammals through mtDNA sequencing. Attention to the control regions of other groups for similar purposes has lagged behind work on mammals.
The map shown in Figure 8 is for mammals. It is a useful guide for fishes as well, in which rates of control region substitution can be up to 40 times higher than in the cytochrome b gene (W. O. McMillan and S. R. Palumbi, unpublished data). However, this region appears to be rearranged in birds (T. Quinn, personal communication).

Chloroplast DNA Primers

Among chloroplast genes that have been sequenced extensively, rbcL clearly leads in terms of the number of species examined (see R.G. Olmstead et al., 1992; also, volume 80 of the Annals of the Missouri Botanical Garden).

rbcL
The large subunit of ribulose 1,5-bisphosphate carboxylase is encoded in the chloroplast genome, and is part of a large enzyme that cat-
Primer/Taxa
rbcL1a-5' GGCCGTCGACATGTCACCACAAACAGARACTAAAGC
Barley . . . . . . . . . . . . . . . . .A . . . . . . . .
Rice . . . . . . . . . . . . . . . . .A . . . . . . . .
Vicia . . . . . . . . . . . . . . . . .A . . . . . . . .
Galium . . . . . . . . . . . . . . . . .G . . . . . . . .
rbcL1b-5' ATGTCACCACAAACAGAAACTAAAGCAAGT
rbcL12-3' CTCGSAGCTCCTTTTAGTAAAAGATTGGGCCGAG
Spinach . . . . . . . . . . . . . . . . . . . . . . . .
Tobacco . . . . . . . . . . . . . . . . . . . . . . . .
Pea . . . . . . . . . . . . . . . . . . . . . . . . .
Cotton . . . . . . . . . . . . . . . . . . . . .c . .
Alfalfa . . . . . . T . . . . . . . . . . . . . .T . .
OW-3' ACTACAGATCTCATACTACCCC
Rice . . . . . . . . . . . . . . . . . . . . . .
Figure 9 Chloroplast rbcL primers. The underlined sections of rbcL1a and rbcL12 correspond to restriction sites that have been built into these primers (Sal I and Sac I). The rbcL1a primer is identical to 25 sequences in GenBank (note that the matches listed here are from a BLAST search, and represent the taxa with the highest similarity). The 3' primers actually anneal to a conserved section of the chloroplast DNA downstream of the rbcL gene (see R.G. Olmstead et al., 1992). These primers have been used extensively on angiosperms, but work on some non-angiosperms as well.
alyzes the combination of CO2 and ribulose 1,5-bisphosphate into two molecules of 3-phosphoglycerate. The enzyme is very abundant in chloroplasts (some say it is the most abundant enzyme on Earth) and the reaction it catalyzes is basic to all of carbon fixation. More importantly for molecular systematics, the large subunit gene is relatively large (≈1400 bases), is flanked by conserved sections that make amplifications easy, and has a rate of substitution that has made it attractive in studies of plant systematics at the family level (D.E. Soltis et al., 1990; R.G. Olmstead et al., 1992). Riesenberg et al. (1992) analyzed RFLPs of PCR-amplified gene segments. Figure 9 shows two primers (rbcL1a and rbcL12; courtesy of J. Palmer and R. Olmstead) that have been used often in rbcL amplifications from land plants. Also listed are the primers used by Riesenberg et al. (1992).

rpoC1 and rpoC2
RNA polymerases C1 and C2 are found in the single-copy region of the chloroplast genome. rpoC1 contains an intron in many taxa, and is separated from rpoC2 by an intergenic spacer. Liston (1992) used the primers shown in Figure 10 for RFLP studies of phylogeny.

Primer
rpoC1-5' AAGCGGAATTTGTGCTTGTG
rpoC2-3' TAGACATCGGTACTCCAGTGC
Figure 10 Chloroplast rpoC1 and rpoC2 primers. These primers anneal to positions 195-214 and 1364-1384 of the tobacco sequence (Shinozaki et al., 1986). According to Liston (1992), the amplified fragment includes 91% of the rpoC1 gene with its intron, the intergenic spacer, and 33% of the rpoC2 gene.

Intron primers

Recently, universal PCR primers have been described for PCR amplification of introns in conserved nuclear genes (Lessa, 1992; Lessa and Applebaum, 1993; Slade et al., 1993; Palumbi and Baker, 1994). These primers anneal to adjacent exons of highly conserved nuclear genes and can be designed to be of broad taxonomic utility. Sequence analysis of the introns provides high resolution allelic phylogenies comparable to those from mtDNA studies. Because the PCR products span non-coding introns (which often occur in conserved places within genes), the amplified DNA segment has potential for high rates of sequence evolution. The broad applica-
Amino acid
Primer/Taxa position
CK6-5' GAC CAC CTC CGA GTC ATC TCZ ATG
Mouse . . . . . . . . . . .C . .G . . . . . . . . .
Chicken . . . . . T A CC A . . . . G . . . . . C . . . 248
Fish . . . . . . . . . . . . . . . . . . ..G . . . 233
Urchin . . T . . . ACT . . G . . T . . T . . C . . . 273
Lobster . .T . . . . . . ..C A.. . . . .T . . 168
CK7-3' CAG GTG CTC GTT CCA CAT GAA

Mouse . . . . . . . . . . . . . . . . . . . . .
Chicken G. . .C. T. . A. . . . . . . . . . .
Fish . . . . . . . . . . . . . . . . . . . . .
Urchin A , . A , . T . T . . . . . . . . . . . .
ARK7-3' GT GCC AAG GTT GGT DGG GCA
Lobster . . . . . . . . . . . . . . G . . . . .
Urchin CK . . A . . . . . . . . . .A T . . A . .
Human CK . . . . . C . . . . . . .A T . . . . .
Figure 11 Creatine kinase intron primers. The first pair of primers, CK6-5' and CK7-3', amplifies the sixth intron of the creatine kinase genes. There are three loci (muscle-specific, brain-specific, and mitochondrial). In general, there are two or three (and rarely four) amplification products corresponding to the different genes. The amplified fragments differ in size and can be separated by gel electrophoresis, isolated, and re-amplified separately. These primers work with elasmobranchs, teleosts, whales, birds, sea urchins, and (surprisingly) tephritid flies. The third primer, ARK7-3', was designed to work with arthropods when paired with CK6-5'. These primers will amplify a stretch of the arginine kinase gene, which appears to be present in a single copy in Drosophila (Collier, 1990). See Wothe et al. (1990), Wirz et al. (1990), and GenBank for sequences.
bility, potential high rates of variability, and well-understood genetic background of this nuclear allelic system make it attractive for molecular studies of population structure and genetic diversity. Below are a few exon-priming, intron-crossing (EPIC) primer pairs that seem to have broad applicability. Use of these primers requires a bit of patience and willingness to troubleshoot.

Creatine Kinase
Creatine kinase (CK) supplies muscles with energy by transferring a phosphate group from creatine phosphate to ADP. This enzyme is only known in vertebrates and echinoderms, but a very similar enzyme occurs in the protostome phyla, where it is called arginine kinase (ARK). Creatine kinase has three unlinked loci in most vertebrates known, and occurs in a tandem triple repeat in sea urchins. There are a variety of loci of arginine kinase in insects and crustaceans.
Many electrophoretic surveys have included CK or ARK, and so there may be protein data for alleles at these loci for a particular set of species. It would be interesting to compare electromorphs and intron differences in these taxa. Primers are shown in Figure 11.

Actin
Genes in this small family are known to be highly conserved across animal phyla (Foran et al., 1985), and codon bias limits heterogeneity at four-fold and two-fold sites. Most importantly, some intron positions are also conserved. For example, the first intron occurs near amino acid position 41 in most species examined (Kowbel and Smith, 1989), which allows exon primers to be developed both upstream and downstream from the intron (see Palumbi and Baker, 1994). Both cytoplasmic and muscle types of actin appear to be amplified with these primers (Figure 12).

Cytochrome c
Cytochrome c has long been of interest to molecular evolutionists. It was studied extensively using protein sequencing before large-scale DNA sequencing became practical (see Dicker-
Amino acid
Primer/Taxa position
ACTI-5' GCT GTT TTC CCG TCG ATT GT
Starfish C C C . 29
Starfish M . G A T 29
Urchin A
Mouse . .. ,
Fruit fly C A G T
Yeast A T
Frog G T T
Chicken A T . T . ..
Brine shrimp . C .T C . T
Nematode C G . A . T
ACTII-3' GTC CTT CTG CCC CAT ACC SAC CAG
Starfish C . . . A 51
Starfish M . . G 51
Urchin T T . C .
Mouse . T .. C .
Chicken . C ..
Carp T ... A .
Nematode .. . . . T . G . ..
Figure 12 Actin intron primers. These primers generally produce multiple bands between 200 and 2000 bp in length. Successful amplifications have been obtained for some insects, spiders, all mammals tested, and sea urchins. See Baker and Palumbi (1994) for details.
son, 1971). Comparatively little work has been done on the evolution of this gene since the advent of DNA sequencing, but see Kemmerer et al. (1991).
Cytochrome c is a small protein vital to oxidative phosphorylation during electron transport. The amino and carboxy terminals of the protein are very highly conserved, making it possible to develop primers that are appropriate for taxa as diverse as plants, animals, and yeast (Kemmerer et al., 1991). Usually there are one or two introns in cytochrome c. Vertebrates studied to date have a single intron at amino acid 56. Rice has one intron at position 37 and one at position 82, although other plants have slightly different intron positions (Kemmerer et al., 1991). The two insects studied (Drosophila and Manduca) have no introns.
Figure 13 shows amino acid similarities from a number of taxa at the beginning (amino acid positions 13-19) and the end (amino acid positions 72-80) of the gene. Most introns occur between these points (although one of the rice introns occurs after the double lysines [K] at positions 72 and 73). Primers cytC-C-5' and cytC-B-3' anneal to the coding and non-coding strands, respectively.

Proto-Oncogene int

The proto-oncogene int was first identified as a gene in mice in which a tumor virus inserted itself. The gene subsequently was found to be homologous to the product of the wingless locus in Drosophila. Amino acid conservation is high in this gene, and studies of genomic clones have revealed several introns in conserved positions. The primers shown in Figure 14 amplify the third intron in this gene.

Elongation Factor 1α
This protein is involved in the translation of mRNA to protein. Elongation factor complexes 1 and 2 both consist of several proteins with specific roles in translation. EF1α functions in the transport of an aminoacyl-tRNA to the ribosome, and its amino acid sequence is highly conserved among taxa as diverse as plants and animals. Unfortunately, the intron positions tend to vary, making it difficult to predict the intron positions in new taxa. Cho et al. (1995) used primers to a 1240-bp section of EF1α for phylogenetic analysis of lepidopterans, although their focus is on coding sequences. Of the primers shown in Figure 15, EF1 and EF2 span an intron that occurs in vertebrates and arthropods, although surveys have not revealed in-
Amino acid
Primer/Taxa posit~on
R/K C A/L Q C H T
cytC-C-5' AAG TGT GCY CAR TGC CAC AC
Human C . .G . 19
Insect CGC , C . C G 18
Yeast GA . CTA A 18
Rice C C . G . , , 22
M K T G P I Y K K
cytC-B-3' CAT CTT GGT GCC GGG GAT GTA TTT CTT
Insect ... . C 77
Rat . . . . . . . C. 72
Yeast ... A A . A A .. 77
Rice . . T . A
Figure 13 Cytochrome c intron primers. Note that cytC-C is a degenerate primer but that degeneracy in cytC-C has been reduced by using the mismatch rules discussed in the text. The nucleotide at position 10 of cytC-B (listed as a G here to match the insect sequence) might be changed for plants or fungi. Also note that a second intron occurs near the 3' end of cytC-B in some plants, and the last two codons (both corresponding to lysines) should be omitted. Although the plant introns seem to be large and might be useful in population or systematic studies, the vertebrate introns are small, and have not provided enough sequence data to be useful. Some of these latter amplifications might be processed pseudogenes, which are known for mammals. However, initial amplifications from fish have shown larger intron sizes.
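Because cytC-B-3' anneals to the coding strand, the primer itself is written as the reverse complement of the coding sequence, and the amino acids printed above it in Figure 13 run in the opposite order to the protein. The sketch below reverse-complements the primer and translates it with the standard genetic code (an assumption that is reasonable for this nuclear gene) to recover the residues in coding order. The helper names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


CYTC_B = "CATCTTGGTGCCGGGGATGTATTTCTT"   # cytC-B-3' from Figure 13

coding_strand = reverse_complement(CYTC_B)
print(coding_strand)
print(translate(coding_strand))
```

Read in coding order, the translation begins with the two lysines, which correspond to the last two codons at the 3' end of the primer. Those are the codons the caption suggests omitting for plants in which an intron falls just after the double lysine.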
trons in all taxa. EF0 was designed to anneal upstream of several other potential intron positions. Note that in humans there are many processed pseudogenes of EF1α (Uetsuki et al., 1989).

DQα
The primers shown in Figure 16 amplify a section of the hypervariable protein-coding domain of the MHC locus DQA, and were developed by Slade et al. (1993). Note that these primers are not designed to target introns, but instead provide a section of a very polymorphic nuclear coding gene. The amplified product is 800-1000 bp in most mammals tested. The primers do not appear to work on birds, reptiles, or invertebrates, and so are probably only useful within the mammals.

Aldolase
Lessa and Applebaum (1993) demonstrated the usefulness of combining EPIC amplifications and denaturing gradient gel electrophoresis. They de-
Amino acid
INTA-5' AAC CTT CAC AAC AAY GAG GC
Human . . . . . . . . . . . 197
Mouse . . . . . .
Frog . .T .G . . . . C C
Fruit fly .T G . . . . . . . C .. 21
INTB-3' TT GCA CTC TTG ICG CAT YTC

Human . . C . T . . . 213
Mouse . . . . G . . . . . .
Frog . T . . C . A . .
Fruit fly . . . . . . C , . T . . . . . 220
Figure 14 Proto-oncogene int intron primers. These primers amplify the third intron in int. In mice this intron is about 600 bp long. In whales it is only about 300 bp. These primers work well in most vertebrates and have been used successfully with sea urchins and nemerteans. In some taxa, int is a small multigene family, and gives a number of amplification products. Note that at position 12 there are many different bases in different taxa, so inosine is used at this position. Inosines do not bind well with anything, but do not disrupt the annealing of adjacent bases (see van Ooyen et al., 1985; Rijsewijk et al., 1987; and Noordermeer et al., 1989).
Amino
Map acid
I. EF0-5' TCC GGA TGG CAY GGC GAG AAY ATG
Human . . T . . . AAT . . T . . . . C . . .
Fruit fly . . . . . . ..C . . .
Brine shrimp . .T . . . . .C . . .
II. EF1-5' AAC GTT GGC TTC AAC GTG AAG AAC G
Human . .T . .G . . . . . . . . T ..C . . . . .T .
Fruit fly
Fish (Zebra)
Corn
Rice
III. EF2-3' AT GTG AGC AGT GTG GCA ATC CAA
Human . . . . . . . . G... . . . . . . . . . . . 360
Fruit fly . . . . . . . . G.. . . . . . . . . . . . . 360
Frog . . . . . . . . . . . . . . . . . . . . . . .
Tetrahymena . . . . . . . . . . . . . . . . . . . . . . .
Arabidopsis . . . . . . .A G: . . . . . . . . . . . . .
Figure 15 EF1α intron primers. These primers work well for insects, spiders, crustaceans, and gastropods. Success has also been obtained with vertebrates, although processed pseudogenes (or loci with very small introns) are often amplified. Amplification with EF0 and EF2 generally gives the best results. Note that the above map starts at about amino acid position 150 rather than at the beginning of the gene (see Walldorf and Havermann, 1990; Uetsuki et al., 1989). These primers were designed in collaboration with George Roderick, University of Hawaii. Vertical lines denote intron positions in human (h), Drosophila (f), and shrimp (s).
scribed the aldolase primers shown in Figure 17, which were also used by Slade et al. (1993). Aldolase (in mammals) occurs as a three gene family, and the primers have been designed in regions that allow them potentially to amplify an intron (intron G, Lessa and Applebaum, 1993) in all three loci. Although all three loci are present in mammals, aldolase A and C usually are amplified (Slade et al., 1993).

Histone H2AF

Histones are highly conserved in amino acid sequence and usually occur as a set of small multigene families. The family H2A was studied by Slade et al. (1993), and several primers (Figure 18) were constructed that seemed to give good amplification for vertebrates. Sequencing of some of these products, however, revealed the existence of many processed pseudogenes. These pseudo-
Nucleotide
DQA1-5' CCGGATCCCAGTACACCCATGAATTTGATGG 492
Human . . . . . . . . . . . .
DQA2-3' CCGGATCCCCAGTGCTCCACCTTGCAGTC
Human 1336
Rat
Sheep
Figure 16 DQα intron primers. Both primers have an eight-base Bam HI linker at the 5' ends. They amplify a section of the DQA gene, as well as the intron between exons two and three.
Nucleotide
Primer/Taxa position
Aldl-5' TGTGCCCAGTATAAGAAGGATGG 5323
Rat A
Rat B T C
Ald2-3' CCCATCAGGGAGAATTTCAGGCTCCACAA 5743

Human A . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Rat A . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Mouse C . . . . . . . .C . . . . . C . . . . . . . . . . . . .
Figure 17 Aldolase intron primers. Primers work well 1993).Nucleotide positions correspond to those 111 the
for loci A and C in mammals, reptiles, and birds. Some mouse sequence MUSALDAA, as grven m Lessa and
rodents are !mown to have a pseudogene that also am- Applebaum (1993).
plifies (see Lessa and Applebaum, 1993; Slade et al.,
genes had a size consistent with a lack of an intron and showed stop codons in every reading
frame. Introns between exons 2 and 4 are similar in position in humans, chickens, and
Drosophila; they range in size from 180 to 1370 bp. The intron between exons 4 and 5 can be
bigger (>2500 bp in chickens), or not occur at all (Drosophila). See Slade et al. (1993) for
details.

Beta Tubulin
Alpha and beta tubulin are related proteins that form heterodimers to make up the bulk of
microtubules. Like actin, the proteins are highly conserved, and occur in a small gene family:
Drosophila has four loci, Chlamydomonas has two. Yeast and chicken proteins are about 70%
similar at the amino acid level. The primers shown in Figure 19 were designed by Tom Duda,
University of Hawaii.

More Information about PCR
Many different guides and lists of protocols are available that describe the PCR process. Some
of the most useful include Innis et al. (1990), Erlich et al. (1989), Sambrook et al. (1989),
and Zimmer et al. (1993). Recent reviews include Erlich et al. (1991) and Erlich and Arnheim
(1992). Trade jour-
Map Nucleotide
position Primer/Taxa position
I. H2A6-5' GCTGGGCCGGTAAGGCTGGNAAGG 19
Cow G . G
II. H2A2-5' GAAGAGT TGGATTCCCTCATCAA
Chicken
Dog A
III. H2A5-3' TGTGGATGTGTGGAATGACACCT 320
Human
Cow
Figure 18 Histone H2AF intron primers. Designed by Slade et al. (1993), these primers seem to
work only on vertebrates, and they produce many pseudogene amplifications. According to Slade et
al., use of their H2A2 and H2A5 primers gave the fewest pseudogene amplifications, although the
primers may not work well outside of placental mammals.
Amino
Map acid
position Primer/Taxa position
I. Tub1-5' C A G G C T GGT C A A T G T GGY A A Y C A
Human C C
Chicken A G G C C C
Fruit fly C C G C C C
Nematode A A C A C T T
Nematode B A C C
II. Tub2-3' CC RTG Y T C A T C A C T TIGAT YAC C T C CCA

Human A T G C 20
Chicken A C G G C 20
Fruit fly G C C T T A A G T 20
Nematode A G T . GGA T T 20
Nematode B A T CGA T T T 20
III. Tub3-5' G A T T T G G A G C C N GGN A C C A T G GA

Human C A A T G 66
Chicken .G C G 66
Fruit fly C C A 72
Nematode A A T 66
Nematode B A A T 66
IV. Tub4-3' AT ACG G T C TGG G T A C T C Y T C NCG

Human . . T . . A . . C . . . . . T , . T . . T . . 153
Chicken . . G . . . . . G . . . . . . . . C . . T . . 153
Fruit fly . . G . . , . . G . . . . . . . . C . . T . . 159
Nematode A . . C. . . . . A. . C. . T . . T. . G. . 153
Nematode B . . T. . A. . C . . A , . . . . T . . A . . 153
Figure 19 Beta-tubulin intron primers. Tub1 and Tub2 do not work well together, but Tub1 and
Tub4 work on all of the taxa tested, including vertebrates, sea urchins, mollusks, and flies.
Tub3 and Tub4 also work well together on the same set of taxa, and in this region vertebrates
tend to have an intron near amino acid position 92, whereas urchins, insects, and nematodes have
an intron near amino acid 130. Introns are marked by vertical lines. Taxa possessing a given
intron are listed to the right: c = chicken; f = fly; h = human; n = nematode; u = urchin. All
sequences are from GenBank.
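Several of the primers in Figure 19 contain degenerate positions (Y, R, N), so each printed primer stands for a pool of sequences. The following is a minimal Python sketch of how the size of that pool can be counted and the pool enumerated. It assumes the standard IUPAC nucleotide codes; the function names are hypothetical, and the example primer is Tub1 as it is printed in Figure 19 with the spaces removed.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every non-degenerate sequence in the primer pool."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]

# Tub1 as printed in Figure 19 (two Y positions).
TUB1 = "CAGGCTGGTCAATGTGGYAAYCA"
print(degeneracy(TUB1))
for seq in expand(TUB1):
    print(seq)
```

A primer with two N positions represents 16 sequences, so degeneracy rises quickly as ambiguous positions are added.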
nals such as BioTechniques and Amplifications are dedicated to providing detailed summaries of
the latest findings about PCR technology; virtually every issue has some new tip.

These recipes are for 1x solutions, but they usually are made at the concentrations shown and
diluted for use.

TE Buffer (pH 7.6, usually made as 100x)
Tris-HCl, pH 7.6       10 mM
EDTA                   1 mM

TBE Buffer (final pH 8.3, usually made as 15x)
Tris-HCl               89 mM
boric acid             89 mM
EDTA, pH 8.0           2 mM

TAE Buffer (pH 723, usually made as 50x)
Tris-HCl               40 mM
glacial acetic acid    0.114%
EDTA, pH 8.0           1 mM

Taq Polymerase Buffer (pH 8.3, usually made as 10x)
Tris-HCl, pH 8.3
MgCl2
Once the solution is made, test it immediately; if it passes, it can be aliquoted into 1.5-ml
tubes and frozen, where it will last up to one year.

Dense-Dye (full strength)
EDTA                   50 mM
glycerol               30%
bromophenol blue       0.25%
xylene cyanol          0.25%

dNTP Mix (usually made as 10x)
dATP                   2.0 mM
dGTP                   2.0 mM
dCTP                   2.0 mM
dTTP                   2.0 mM
Aliquot in 0.5-ml lots in sterile tubes.
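Because the recipes above give 1x concentrations but the solutions are made as concentrated stocks, two small calculations recur: the concentration of each component in the stock, and the volume of stock needed for a given working volume. The sketch below is illustrative only; the function names are hypothetical, and it uses the TE buffer values listed above (10 mM Tris-HCl, 1 mM EDTA, made as 100x).

```python
def stock_concentration(working_conc, fold):
    """Concentration of a component in a stock made 'fold' times concentrated."""
    return working_conc * fold

def stock_volume_needed(final_volume_ml, fold):
    """Volume of stock (ml) to dilute to final_volume_ml at 1x (C1*V1 = C2*V2)."""
    return final_volume_ml / fold

# 1x TE buffer, concentrations in mM, as listed in the recipe.
TE_1X = {"Tris-HCl, pH 7.6": 10.0, "EDTA": 1.0}

# Component concentrations in a 100x stock.
te_100x = {name: stock_concentration(conc, 100) for name, conc in TE_1X.items()}
print(te_100x)

# Stock and water volumes for 500 ml of 1x TE.
stock_ml = stock_volume_needed(500, 100)
print(stock_ml, 500 - stock_ml)
```

For TE this gives a 100x stock of 1 M Tris-HCl and 100 mM EDTA, of which 5 ml is diluted to 500 ml.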
Chapter 8
Nucleic Acids III: Analysis of Fragments
and Restriction Sites
Thomas E. Dowling, Craig Moritz, Jeffrey D. Palmer,
and Loren H. Rieseberg

General Principles
The analysis of DNA is an increasingly important tool in studies of evolutionary
ecology, population genetics, and systematics (e.g., Avise, 1994). DNA has several
significant advantages over alternatives such as proteins for molecular systematics:
(1) the genotype rather than the phenotype is assayed; (2) one or more
sequences appropriate to a problem can be selected on the basis of evolutionary
rate or mode of inheritance (Chapter 1); (3) the methods are, for the most part,
general to any type of DNA; and (4) DNA can be prepared from small amounts of
tissue and is relatively stable, even in non-cryogenically stored tissues (Chapter
3). The last attribute means that genetic information on rare or endangered
species can be obtained without destructive sampling (e.g., Hoss et al., 1992;
Morin et al., 1992; Taberlet and Bouvet, 1994) and it is possible to analyze DNA
from extinct populations or taxa (e.g., Higuchi et al., 1987; Paabo, 1989; Golenberg,
1991; DeSalle et al., 1992; W.K. Thomas et al., 1990; Cano et al., 1993; Brown
and Brown, 1994; DeSalle, 1994; A.C. Taylor et al., 1994).
There are several approaches to assaying DNA variation. DNA-DNA hy-
bridization estimates the amount of sequence divergence between genomes
(Chapter 6), but cannot provide discrete character data and does not resolve the
nature of the sequence variation. Comparison of the DNA sequences themselves
Types of application                  Assays (levels of application run from broad to intraspecific)
Mating systems                        +     ++    -/++   +++    +++
Diversity                             +++   ++    +      +      +++
Parentage                             ++    ++    +      +++    +++
Relatedness                           ++    ++    -      +/++   +++
Hybrid zones & species boundaries     +++   ++    +++    -      -
Phylogeny                             +++   -     -      -      -
Figure 1 Schematic of techniques for analyzing fragments and their applicability to different
problems in molecular systematics. Time and expense required for implementation are not
considered. RAPDs are more useful for estimating selfing rates directly from family data but not
indirectly from genotype frequencies in natural populations.
(Chapter 9) offers extremely high resolution and yields character data that also can be
converted to estimates of sequence divergence if so desired (Chapter 11). The development of
reliable methods for direct sequencing via PCR (Chapters 7 and 9) has simplified the generation
of sequence data.

Sequence variation also can be examined indirectly by electrophoretically comparing DNA
fragments to look for variation in their number, size, or conformation (Figure 1; reviewed by
Lessa and Applebaum, 1993; Grompe, 1993). Although fragment analysis offers less resolution than
nucleotide sequencing in some respects, it is a powerful and cost-effective alternative where
large numbers of individuals or loci or large segments of a genome are being screened. A theme
developed in this chapter (see also Chapter 12) is that fragment methods are a powerful
complement to sequencing studies. The nucleotide sequence provides a detailed restriction site
map that allows for precise interpretation of fragment changes (e.g., Cann et al., 1984; Liston,
1992; Hugall et al., 1994) and fragment methods can be used with great efficiency to screen
populations or species for specific changes in sequence (Slade et al., 1993).

Forms of Fragment Variation
Differences among individuals in the number and/or pattern of DNA fragments can arise through a
number of distinct processes, including changes in the amount of DNA, the structure of the DNA,
or the number or distribution of restriction sites. Also, where assays use gene amplification,
there may be variation in the ability to amplify a specific DNA segment. The types of
polymorphisms related to these processes are reviewed below.

VARIATION IN FRAGMENT SIZE Fragments can vary in size because of unique insertions or deletions
or because of changes in the number of copies of tandemly repeated sequences. The tandem repeats
with the highest rate of change in copy number are very short: 2 to 5 bp in the case of
microsatellites (Tautz, 1989; Weber and May, 1989; Goodfellow, 1993; Morgante and Oliveri, 1993;
Queller et al., 1993; Figure 2A), or some-
(A) Locus B29, Locus B123: lanes (sm), A to T
Figure 2 Examples of variation at (A) single microsatellite loci and (B) multiple minisatellite
loci. In (A), the two microsatellite loci (B29, B123) were amplified from each of 20 bridled
nailtail wallabies (lanes A to T) and run adjacent to a known sequence (sm) to identify alleles
by their size. Both loci show the shadowing typical of dinucleotide microsatellite loci, but
this is very consistent from sample to sample, enabling accurate scoring of genotypes. In (B),
the multilocus minisatellite autoradiogram (provided by D. Lambert; see Lambert et al., 1994)
illustrates variation within and between families of pukeko detected using probes pV47-2 and
3'HVR. The pukeko families have a complex mating structure with α, β, and γ individuals ranked
according to their dominance. The autoradiograph illustrates both the complexity of the profiles
and the power for determining parentage.
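Microsatellite alleles such as those in panel (A) differ in the number of copies of a short repeat unit, so a change of one copy of a dinucleotide repeat shifts the amplified fragment by 2 bp. The Python sketch below shows one way to count repeat copies in a sequence and to predict fragment length. The sequences, flanking regions, and function names are hypothetical and serve only as illustration.

```python
import re

def longest_repeat_run(seq, unit):
    """Largest number of uninterrupted tandem copies of 'unit' found in 'seq'."""
    pattern = re.compile("(?:" + re.escape(unit.upper()) + ")+")
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best

def amplified_length(flank5, unit, copies, flank3):
    """Length (bp) of a PCR product spanning both flanks and the repeat array."""
    return len(flank5) + len(unit) * copies + len(flank3)

# Two hypothetical alleles of a (CA)n locus with identical flanking sequence.
FLANK5, FLANK3 = "GGTACGTT", "TTGCAAGC"
allele_short = FLANK5 + "CA" * 12 + FLANK3
allele_long = FLANK5 + "CA" * 15 + FLANK3

print(longest_repeat_run(allele_short, "CA"), len(allele_short))
print(longest_repeat_run(allele_long, "CA"), len(allele_long))
```

The two alleles differ by three repeat copies and therefore by 6 bp in fragment length, which is the kind of difference scored against the size marker in panel (A).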
what longer (~20 bp) for minisatellites (Armour and Jeffreys, 1992; Bruford et al., 1992; Figure
2B). Together these are often referred to as variable number of tandem repeat, or VNTR, loci.
The rate of mutation at these loci can be very high (up to 10^-2 per gamete per generation;
Jeffreys et al., 1988), as can the number of alleles per locus (Table 1). Although variation in
both systems is the result of changes in copy number, the underlying mechanisms of mutation and
the chromosomal locations differ (Shriver et al., 1993). Microsatellite loci are randomly
distributed and are subject to replication slippage (Weber and May, 1989; Schlötterer and Tautz,
1992), whereas minisatellite loci tend to be concentrated near telomeres and vary because of
intramolecular or interallelic recombination and gene conversion (Armour and Jeffreys, 1992;
Jeffreys et al., 1991, 1994). Minisatellite loci typically are examined in multilocus profiles
(e.g., Figure 2B) via hybridization methods. In contrast, microsatellite loci are usually
examined one at a time via PCR (Chapters 7 and 12), although several loci can be examined
together by simultaneous amplification ("multiplexing").

VARIATION IN CONFORMATION Changes in nucleotide sequence also can affect fragment conformation
or stability, which can be detected with appropriate electrophoretic techniques (Figure 1). One
approach is based on the observation that changes in sequence alter the folding of
single-stranded DNA which, in turn, can affect electrophoretic mobility, resulting in
single-strand conformation polymorphisms (SSCPs; Orita et al., 1989; Hayashi, 1991a,b). Another
approach relies on subtle variations in the stability of DNA duplexes according to their
nucleotide sequence (Chapter 6) and detects these differences by denaturation of double-stranded
DNA into single strands within a gel (denaturing gradient gel electrophoresis, or DGGE; Myers et
al., 1986, 1989; Lessa, 1993). Heteroduplexes (DNA molecules composed of strands from two
different alleles) result in mobility shifts during electrophoresis through polyacrylamide (M.B.
White
Table 1
Properties of selected microsatellite and minisatellite loci in humans

Locus (a)             N     Heterozygosity   Number of alleles (b)   Mutation rate (c)
Minisatellite loci
D5S43                 125   0.890            NR                      0 (0-0.003)
D12S11                125   0.970            NR                      0 (0-0.003)
D7S22                 125   0.980            NR                      0.003 (0.0006-0.009)
D7S21                 125   0.980            NR                      0.007 (0.003-0.015)
D1S7                  125   1.000            NR                      0.052 (0.038-0.072)
Microsatellite loci
HUMHPRTB [AGAT]n      417   0.753            9                       0.000054
HUMTH01 [AATG]n       370   0.720            5                       0.000035
HUMRENA4 [ACAC]n      374   0.442            6                       0.000023
HUMFABP [AAT]n        152   0.574            7                       0.000045
HUMARA [AGC]n         97    0.892            14                      0.000159

(a) Data for minisatellite loci are from Jeffreys et al., 1988; those for microsatellite loci
are from Edwards et al., 1992.
(b) NR = not reported.
(c) For minisatellites, observed mutation rates among 344 offspring are reported with 95%
confidence intervals in parentheses. For the microsatellites, no mutations were observed and the
estimates are derived from diversity levels using a maximum likelihood method that
simultaneously estimates Ne and mutation rate (μ). A more recent study reports observed mutation
rates for 28 human microsatellite loci at between 0 and 0.008 with a mean of 0.0012 mutations
per gamete per generation (Weber and Wong, 1993).
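The heterozygosity and allele-number columns of Table 1 are linked: a locus with k alleles cannot have an expected heterozygosity above 1 - 1/k, the value reached when all alleles are equally frequent. The Python sketch below computes expected heterozygosity as 1 minus the sum of squared allele frequencies. It assumes random mating, and the table does not state whether its values are observed or expected heterozygosities, so the comparison is illustrative only. The function names and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for allele frequencies summing to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

def max_heterozygosity(n_alleles):
    """Upper bound on expected heterozygosity for a locus with n_alleles alleles."""
    return 1.0 - 1.0 / n_alleles

# Hypothetical frequencies for a five-allele locus.
freqs = [0.40, 0.25, 0.20, 0.10, 0.05]
print(expected_heterozygosity(freqs))

# Heterozygosity and allele number for the microsatellite loci of Table 1.
table1 = {"HUMHPRTB": (0.753, 9), "HUMTH01": (0.720, 5), "HUMRENA4": (0.442, 6),
          "HUMFABP": (0.574, 7), "HUMARA": (0.892, 14)}
for locus, (het, k) in table1.items():
    print(locus, het, round(max_heterozygosity(k), 3))
```

Each reported value falls below the bound for its allele number, as expected when allele frequencies are uneven.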
Table 2
Examples of recognition sequences and types of end produced

Enzyme   Recognition sequence   End       Type
EcoRI    5'-G↓AATTC-3'          -G        5' overhang
         3'-CTTAA↑G-5'          -CTTAA
RsaI     5'-GT↓AC-3'            -GT       Blunt
         3'-CA↑TG-5'            -CA
SacII    5'-CCGC↓GG-3'          -CCGC     3' overhang
         3'-GG↑CGCC-5'          -GG
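The recognition sequences and cut positions in Table 2 are enough to predict the fragments an enzyme produces from a known sequence. The Python sketch below digests a linear molecule in silico, tracking only the top-strand cut position, and shows how a single base substitution within a site changes the fragment pattern. The test sequence and function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset (bases from the start of the site),
# taken from Table 2.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "RsaI": ("GTAC", 2),      # GT^AC
    "SacII": ("CCGCGG", 4),   # CCGC^GG
}

def cut_positions(seq, enzyme):
    """Top-strand cut positions for one enzyme on a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # A lookahead is used so that overlapping sites are not skipped.
    return [m.start() + offset for m in re.finditer("(?=" + site + ")", seq.upper())]

def digest(seq, enzyme):
    """Fragments produced by complete digestion of a linear sequence."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

reference = "AAA" + "GAATTC" + "TTTT" + "GAATTC" + "CC"
mutant = "AAA" + "GAGTTC" + "TTTT" + "GAATTC" + "CC"   # A->G inside the first site

print([len(f) for f in digest(reference, "EcoRI")])
print([len(f) for f in digest(mutant, "EcoRI")])
```

Loss of the first site merges two reference fragments into one, which is the fragment-pattern change scored as a restriction site difference.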
et al., 1992; Ruano et al., 1994) and provide a simple means of identifying distinct sequences.
Heteroduplex analysis also increases the sensitivity of DGGE (Myers et al., 1989). These
techniques have the potential to detect single base-pair changes and have been developed to
assay variation in PCR products, although SSCPs also can be detected using genomic DNA (e.g.,
Orita et al., 1989).

RESTRICTION SITE VARIATION Base substitutions or insertion/deletion (indel) events have often
been detected using restriction endonucleases (REs): enzymes isolated from bacteria that cut DNA
at a constant position within a specific recognition sequence, typically 4-6 bp long (e.g.,
Table 2). Thousands of REs have been isolated and characterized, with all of the information
stored in a database (REBASE; Roberts and Macelis, 1993). Each cleaves DNA at a characteristic
recognition sequence that usually is symmetrical and, when cleaved, leaves ends with a 5'
overhang, a 3' overhang, or no overhang (Table 2). In some cases, enzymes isolated from
different bacteria have the same recognition sequence (= isoschizomers) and some recognition
sequences overlap (see below). The specificity of cleavage by REs means that complete digestion
of a particular DNA allele will yield a reproducible array of fragments.

The variations in fragment pattern revealed following digestion with restriction enzymes are
referred to as restriction fragment length polymorphisms (RFLPs). Base substitutions (or small
indels) can create or eliminate cleavage sites for a particular enzyme, thereby altering the
number and size of fragments detected by that enzyme alone (e.g., Figure 3). Larger indels or
rearrangements typically alter fragment patterns for several restriction enzymes simultaneously
(see the section on "Interpretation and Troubleshooting"), resulting in correlated change in
restriction fragments and, thus, non-independence of fragment characters (Figure 3).

RAPDS The approaches discussed above all refer to changes within a specific, deliberately
targeted segment of DNA. An alternative method of detecting variation, specific to PCR, is to
detect the presence or absence of randomly amplified polymorphic DNAs (RAPDs; J.G.K. Williams et
al., 1990; see Chapter 7). This technique is designed to detect sequence changes within PCR
priming sites; base substitutions within either priming site will affect the efficiency of
amplification, changing the profile of fragments produced by a given primer (Caetano-Anollés and
Bassam, 1993). However, the ability to amplify a specific segment also will be affected by large
insertions or deletions between priming sites and by template DNA quality and other factors
affecting the PCR reaction (Hadrys et al., 1992; Black, 1993; Ellsworth et al., 1993). The
method was originally developed using very short (e.g., 10 bp) primers and low annealing
temperatures, but longer "semi-random primers" also can be
(A) Reference; Base substitution; Deletion; Duplication (direct, tandem); Inversion
(B) Electrophoresis (lanes from origin, including size standard S)
(C) Calibration (horizontal axis: distance from origin)
Figure 3 The effect of different kinds of sequence change on RFLPs. (A) DNA fragments (a-h) are
generated by RE digestion and (B) electrophoretically separated by size. (C) Fragment sizes are
determined using a calibration curve based on a sample with fragments of known size run on each
gel (lane S = size standard). Vertical arrows indicate cleavage sites and asterisks indicate the
boundaries of rearrangements.
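Panel (C) describes sizing unknown fragments from a calibration curve built from the size standard. The Python sketch below shows one way this can be done: it interpolates linearly in log10(size) between the two standard bands that bracket the unknown band. The standard values are hypothetical, the function name is an assumption, and linear interpolation on a log scale is only one of several curve choices in practice.

```python
import math

def estimate_size(distance, standards):
    """Estimate fragment size (bp) from migration distance.

    standards: list of (distance, size_bp) pairs for the size standard.
    Interpolates linearly in log10(size) between the two bracketing bands.
    Assumes every standard band has a distinct migration distance.
    """
    points = sorted(standards)  # order by migration distance
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            t = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + t * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the range of the size standard")

# Hypothetical size standard: (distance from origin in mm, fragment size in bp).
STANDARD = [(10.0, 10000), (30.0, 1000), (50.0, 100)]

for d in (20.0, 30.0, 45.0):
    print(d, round(estimate_size(d, STANDARD)))
```

Bands that migrate beyond the largest or smallest standard are rejected rather than extrapolated, since the relationship between distance and log size is only approximately linear.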
used (e.g., to target intron-exon boundaries; Weining and Langridge, 1991).

Principles of Electrophoresis
The fragments produced by PCR amplification or by digestion of DNA with REs are sorted according
to their size and/or conformation by gel electrophoresis. At neutral pH, the sugar-phosphate
backbone of the DNA is negatively charged, causing the molecule to migrate through an electric
field. The gel media used, agarose or polyacrylamide, form a dense matrix through which smaller
fragments can move more easily than large fragments. For double-stranded DNA, the distance
migrated typically decreases linearly with the logarithm of the molecular weight. Fragments of
known size are run on each gel to act as an internal standard against which the size of other
fragments is estimated by interpolation from a calibration curve (Figure 3).

The migration of fragments through neutral polyacrylamide gels is also affected by their
conformation (G. Singh et al., 1987). Single-stranded, or even partially denatured, DNA migrates
more slowly than does double-stranded DNA. The mobility of single-stranded DNA also depends on
patterns of local folding or renaturation. These properties provide the basis for detecting
heteroduplexes, SSCPs, and, using gradients of denaturing chemicals (e.g., urea) or temperature,
the
DGGE method. These methods are discussed in more detail below.

Assumptions

Heritability, Repeatability, and Independence
The most basic assumptions of fragment and RE analyses are that the characters in question are
heritable, repeatable, and independent. These assumptions are considered in turn below.
Violation of any one could have significant effects on phylogenetic analyses and population
genetic studies.

The assumption of heritability has two elements: fidelity of transmission, and mode of
inheritance. Fidelity of transmission is most likely to be violated when using rapidly evolving
characters such as VNTR loci. Because of their high mutation rates (e.g., Jeffreys et al.,
1988), such loci may be inappropriate for comparisons among divergent populations or distinct
species. RE sites recognized by methylation-sensitive enzymes could also violate the assumption
of heritability, as variation in the state of methylation would mimic the gain or loss of
cleavage sites. Methylation-induced artifacts can be detected by comparing the fragment patterns
produced by isoschizomers that differ in sensitivity to methylation (e.g., MspI and HpaII; Groot
and Kroon, 1979), but the frequency of these artifacts is unknown. Methylation does not appear
to be a problem for RFLP analysis of mtDNA and cpDNA (Palmer, 1985a), but can result in
hypervariation and apparent homoplasy of specific sites in nuclear sequences (G.N. Wilson et
al., 1984; Jorgensen and Cluster, 1988). It does not affect RFLP analysis of PCR products
because these are produced without methylation.

Knowledge of the mode of inheritance is also critical, especially for RAPDs, which are often
dominantly expressed (Hadrys et al., 1992). Without this information, it may be impossible to
distinguish alleles of a single codominant locus from independent products of non-homologous
loci (Riedy et al., 1992; J.J. Smith et al., 1995), thereby affecting estimates of population
genetic parameters (Clark and Lanigan, 1993; Lynch and Milligan, 1994). In these instances, it
is essential to examine heritability using appropriately designed crossing experiments (e.g.,
Carlson et al., 1991; Rieseberg et al., 1993). For microsatellite analysis, null alleles may
fail to amplify because of large increases in the size of the product or mutations in the
flanking primer sites. This is a potential problem because heterozygotes for null alleles will
appear as homozygotes for the amplified allele, biasing estimates of genotype and allele
frequencies. Such null alleles have been reported for humans, using primers designed from human
sequences (Callen et al., 1993) and may be more frequent when using primers designed from
different species (C. Moritz and A. Heideman, unpublished data). Null alleles also have been
reported for minisatellite loci (Bruford et al., 1992) and could occur where the probed
minisatellite sequence contains a site for the restriction enzyme used.

It generally has been assumed that animal mtDNA and (in many cases) cpDNA are strictly
maternally inherited. Recent studies have identified paternal leakage of mtDNA in some animals;
however, the effect of such paternal contributions is unclear. Birky et al. (1989) demonstrated
that low-frequency leakage (such as that seen in Mus; U.B. Gyllensten et al., 1991) would have
little impact, but more frequent paternal transmission (e.g., Mytilus; Hoeh et al., 1991; Zouros
et al., 1992) could have a substantial impact on the evolutionary dynamics of the system.
Biparental inheritance of cpDNA is suspected in close to 20% of flowering plant species
(Corriveau and Coleman, 1988; Harris and Ingram, 1991), and predominately paternal inheritance
has been reported for conifer cpDNA (Szmidt et al., 1987; D.B. Wagner et al., 1987). Thus,
deviations from strict maternal inheritance appear to be relatively frequent in plants, making
it necessary to verify mode of inheritance of plastid DNA using appropriately designed breeding
experiments or cytological tests (Milligan, 1992).

Repeatability is more of a problem for DGGE, SSCP, and RAPDs. The subtle nature of changes
detected by DGGE and SSCP makes strict adherence to specific parameters (e.g., gel conditions,
temperatures, run times) essential for consistent resolution (Lessa, 1993). RAPDs are sensitive
to reaction conditions and often produce spurious and unrepeatable products if parameters are
not carefully standardized (e.g., Ellsworth et al., 1993; Muralidharan and Wakeland, 1993).
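The bias caused by null alleles, discussed above for microsatellites, can be made concrete with a short calculation. The Python sketch below assumes Hardy-Weinberg genotype proportions, which the text does not state and which is adopted here purely for illustration. Individuals carrying one null allele are scored as homozygotes for the amplified allele, and null homozygotes give no product and go unscored. The allele frequencies and function names are hypothetical.

```python
def apparent_genotypes(freqs, null_freq):
    """Scored genotype frequencies when a non-amplifying (null) allele is present.

    freqs: dict of amplifiable allele -> frequency; with null_freq they sum to 1.
    Assumes Hardy-Weinberg proportions (an assumption made for illustration).
    """
    if abs(sum(freqs.values()) + null_freq - 1.0) > 1e-9:
        raise ValueError("allele frequencies (including the null) must sum to 1")
    scored = 1.0 - null_freq ** 2        # null homozygotes yield no product
    alleles = sorted(freqs)
    result = {}
    for i, a in enumerate(alleles):
        # True homozygotes plus allele/null heterozygotes both appear as a/a.
        result[(a, a)] = (freqs[a] ** 2 + 2 * freqs[a] * null_freq) / scored
        for b in alleles[i + 1:]:
            result[(a, b)] = 2 * freqs[a] * freqs[b] / scored
    return result

def apparent_heterozygosity(freqs, null_freq):
    """Proportion of scored individuals that appear heterozygous."""
    genotypes = apparent_genotypes(freqs, null_freq)
    return sum(f for (a, b), f in genotypes.items() if a != b)

freqs = {"A": 0.5, "B": 0.3}
true_het = 1.0 - (0.5 ** 2 + 0.3 ** 2 + 0.2 ** 2)   # counting the null as an allele
print(true_het, apparent_heterozygosity(freqs, 0.2))
```

With a null allele at a frequency of 0.2, the scored heterozygosity is about half the true value, which shows how strongly an undetected null allele can distort genotype frequencies.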
The assumption of independence of characters has both technical and biological (e.g., linkage)
dimensions. The assumption of independence is potentially violated at the technical level for
characters generated by several of these methods. Where the genetic basis of specific fragments
is unknown (e.g., multilocus DNA fingerprints, RAPDs), it is impossible to assign fragments of
specific mobility to a particular locus without setting up laboratory crosses or
cloning/hybridization experiments. Non-independence can affect RFLPs if there is overlap in the
recognition sequences of the REs used. This effect may be obvious, as in the case of MboI (GATC)
and BamHI (GGATCC), in which the cleavage sites of the latter are a subset of cleavage sites of
the former. Or, the non-independence may be subtle, as exemplified by a C-T transition that
eliminates a BamHI site (GGATCC) but produces a HinfI site (GANTC). Such non-independence can
cause significant errors in estimating sequence divergence or phylogeny (e.g., Hillis et al.,
1994a; Hugall et al., 1994).

Homology of DNA Segments and Alleles
If two individuals exhibit fragments with identical mobilities, it is assumed that these
fragments identify homologous stretches of DNA. For anonymous or rapidly mutating segments of
DNA, this assumption may not hold. In RAPD analysis, comigrating products may not be homologous,
particularly in comparisons between species (Black, 1993; J.J. Smith et al., 1995) and should be
verified by transfer hybridization using the specific RAPD product in question as a
hybridization probe (Hadrys et al., 1992) or by cleaving gel-isolated products with restriction
enzymes and observing congruent band profiles (e.g., Fritsch and Rieseberg, 1992). Similarly,
bands of identical mobility in the complex profiles typically produced by multilocus
minisatellites are not necessarily homologous (Lynch, 1988; Burke et al., 1991).

Under some circumstances, the fragments compared may be paralogous rather than orthologous (see
Chapter 1). This is especially a problem for RE analysis of PCR products, where multiple copies
and/or pseudogenes, or even non-homologous products, may be amplified by a single set of primers
(e.g., Slade et al., 1993; see Chapter 7). Given the increasing number of instances of
mitochondrial pseudogenes discovered in the nuclear genome (e.g., Gellisen et al., 1983; Jacobs
et al., 1983; Quinn, 1992; M.F. Smith et al., 1992; Lopez et al., 1994), the orthology of
putative mtDNA amplimers should be tested rather than assumed! Given the assumption that copies
of mtDNA residing in the nucleus probably are incomplete, one way to ensure that the amplified
products are derived from the mitochondrion is to amplify the whole genome from a subset of
samples using "long PCR" (Cheng et al., 1994a,b), then use this product as a template for
subsequent amplifications. Another is to use highly purified mtDNA (see below) as template,
although trace contamination with nuclear sequences can make even this ineffective (M.F. Smith,
personal communication).

For RE analysis, fragments of identical mobility tend to be homologous for sequences from
closely related individuals and perhaps even from most intraspecific comparisons (depending upon
the rate of evolution of the DNA sequence in question). However, the likelihood of convergence
(that is, two samples having fragments of the same size but produced by different cleavage
sites) increases as sequences become more different (Upholt, 1977). Convergent fragments are
readily identified by mapping cleavage sites.

In comparing restriction sites, the investigator assumes that it is possible to identify
positional homologs; however, the accuracy of restriction maps is limited by error in the
measurement of fragments. Since the magnitude of measurement error is proportional to the size
of the fragment in question, such errors are minimized by using small fragments when
constructing cleavage maps. In addition, all sites of questionable homology should be checked
using side-by-side comparisons of each sample, with the site in question isolated on as small a
fragment as possible. Accuracy of cleavage maps and checks of positional homology are improved
considerably by the use of polyacrylamide gels to visualize small fragments.

Comigrating homologous fragments in different individuals are assumed to represent products that
are identical by descent. In practice, this assumption may not hold because of technical
limitations or convergence of allelic state. For exam-
pie, DGGE and SSCP can distinguish sequences gree of bending of DNA rather than changes In
differingby a single base change (e.g., Lessa, 1992; the flanking restriction sites being assayed. Sucln
T.A. Smith et ai., 1992; reviewed in Lessa and Ap- changes will affect the fragment patterns pro-
$ebaum, 1993); however, under a given set of duced by all REs that cleave on either side of the
conditions these methods may not identify all conformation mutation and can be misintcr-
(Sarkar et al., 1992; Fan et al., 1993; Nor- preted. Other changes, such as sequence re-
man et al., 1994).For minisatellite loci, the identi- arrangements (e.g., Palmer et al., 1985; Jansen and
fication of separate alleles is limited by the accu- Palmer, 1987a), duplications (e.g., Moritz and
racy of estimating fragment size and variation in Brown, 1987; Moritz, 199la; Zeverii~get al., 1991)
among lanes on a gel and alleles of sim- and minor length variants (e.g.,Cann and Wilson,
ilar mobility are usually pooled for analysis (Bu- 1983; Densmore et al., 1985; Palmer et al., 1985)
dowle et al., 1991; resolution can be improved us- can also cause change in mobility of fragments
ing internal size markers in each lane, see Burke defined by homologous restriction sites. Misinter-
et al., 1991; Taggart and Ferguson, 1994).Conver- preting such changes as gain/loss of cleavage
gence of alleles at VNTR loci could also occur if sites leads to gross errors in subsequent analyses
the mutation rate is high and the number of pos- (e.g., Hugall et al., 1994). Therefore, it is essent~al
sible character states is finite (ValdCs et al., 1993). to establish fragment homology by mapylng
However, these parameters are largely undeter- cleavage sites.
mined. FitzSimmons et al. (1995) found several in-
sertion-deletion events in the sequences flanking
Comparison of the Primary Methods
homologous microsatellite loci from different
species, so that alleles of the same length from The choice of technique should be based upon thr
these species would not be identical by descent. type of variation and gene(s) appropriate to the
Precisely mapped (and therefore homolo- problcm (see the section on "Application and
gous) RE cleavage sites also may be convergent Limitations" and Chapter 12). The choiccs to be
because of multiple base substitutions within the made among techniques relate to (1)method for
recognition sequence (Upholt, 1977). However, isolation of DNA, (2) selection of restriction en-
there is some debate about the level of sequence zymes (where appropriate), (3) medium for elec-
divergence at which this occurs (Templeton, trophoresis, and (4) method used to detect Irag-
1983a; Nei and Tajima, 1985). This level may vary ments (Figure 4). Again, one approach is not
among taxonomic groups and is likely to depend exclusive of another; different approaches can be
on factors such as mutation rates and base com- combined as needed.
position. The probability of convergent site loss is
far greater than that of convergent site gain be- Netlzod of DNA Preparation
cause a site loss is caused by any point mutation The optimal method of DNA preparation de-
within a cleavage site whereas a site gain requires pends on the type of tissue available, sequences to
a specific base substitution at a particular base be assayed, and whether or not PCR wiIl be uscd.
pair (Templeton, 1983a; DeBry and Slade, 1985; Howcver, even for PCR, there is no substitute for
W.-H. Li, 1986).These inequalities should be con- highly purified DNA of known quality and can-
sidered when restriction site data are used for centration for initial experiments (see Chapter 7).
phylogenetic analysis (see Chapter 11). For multilocus minisa tellite analyses, DNA witlt
Conversely, fragments of different mobility consistently high quality and uniform concentra-
may actually be homologous. For example, SSCP tion is critical (Bruford et al., 1992). Preparations
may represent one sequence as two or more mo- of total cellular DNA can be used for analysis of
bility variants, suggesting alternate stable confor- any sequences, and in many cases (e.g., in practi-
mations for the same sequence (Hayashi, 1991a). cally all current studies of plant molecular sys-
G. Singh et al. (1987) demonstrated that changes tematics) are preferred for reasons relating to
in mobility of some human mtDNA fragments yield, flexibility, and adaptability (Palmer et al.,
were due to base substitutions affecting the de- 1988b). For hybridization methods or PCR ap-
[Figure 4 flow chart. Branches: RFLP analysis of whole organelle genomes; RFLP analysis via transfer hybridization; analysis of PCR products from specific organellar or nuclear loci. Chart rows: Treatment; Electrophoretic conditions. The summary rows below have seven columns, listed left to right.]
Type of data                       RFLPs, RE site maps (columns 1-3) | PCR RFLPs | DGGE | Microsatellites | SSCPs
Sensitivity                        High | Low-moderate | Moderate-high | Moderate | High | High | Moderate-high
Cost                               High | Low | Moderate-high | Low | Moderate | Moderate | Low
Efficiency                         Moderate | High | Moderate | High | Moderate | High | High
Range of fragment sizes assayed    >10 bp | >100 bp | >200 bp | 50-ca. 3000 bp | 50-600 bp | 100-600 bp | 50-400 bp
Figure 4 Flow chart of methods for assessing variation in DNA fragments or PCR amplimers and qualitative assessments of the cost effectiveness of different strategies.
facilitated by enriching the organellar fraction by
a round of differential centrifugation (Lansman et
al., 1981; Palmer et al., 1985).
Fragment analysis of whole mtDNA or
proaches sensitive to DNA quality (e.g., RAPDs), cpDNA has the highest sensitivity when the or-
high-molecular-weight DNA should be prepared ganelle genome has been purified and separated
by phenol-chloroform extraction, salt extraction, from other DNA. Enriched organellar DNA can
or ultracentrifugation (see Chapters 6 and 9). be extracted from organelles isolated via sucrose
Where PCR products robust to template quality gradients of varying densities when large
are analyzed (e.g., most mtDNA, rDNA, or amounts of tissue (e.g., >1.0 g vertebrate organ tis-
cpDNA amplifications of ~500 bp), a variety of sue, >10 g wet weight of plant tissue) are available
rapid lysis protocols suited to minute amounts of (Lansman et al., 1981; Palmer, 19863). This is the
tissue also can be used (Chapter 7). If only or- approach most often used to purify cpDNA.
ganellar and highly repeated nuclear sequences
However, for animal mtDNA, the level of nuclear
are to be assayed, detection of the former may be contamination in such preparations may some-
times be too high to visualize DNA via end-label- ping), the types of ends produced (5' overhang or
ing. Maximum purity can be achieved using neu- blunt for end-labeling), and cost (Table 3). In gen-
tral CsCl equilibrium gradients with intercalating eral, REs that cleave at 4-bp sites will cleave more
dyes such as ethidium bromide or propidium io- often than those that cleave at 6-bp sequences.
dide. Invertebrate and fungal mtDNAs and some The base content of the recognition sequences is
algal cpDNAs (but not vertebrate mtDNAs or also important: REs with GC-rich recognition se-
plant cpDNAs) typically have a strong bias to- quences will cleave in fewer places in sequences
ward A and T (G.M. Brown, 1985; Palmer, 1985b; that have low GC content (e.g., Table 3). Sensitiv-
Wolstenholme, 1992) and can be separated from ity to methylation also may be relevant where a
unbiased or GC-rich DNA in neutral CsCl gradi- mixed sample of nDNA and organellar DNA is
ents, especially with the addition of dyes such as analyzed. Palmer (1986a) lists methylation-sensi-
bisbenzimide (Hoechst 33258), which preferen- tive REs that cut plant nDNA rarely, but cpDNA
tially bind AT-rich regions (Hudspeth et al., 1980; sufficiently, to permit RFLP analysis. Analysis of
Gargouri, 1989). Care also must be taken to avoid large sequences such as entire cpDNAs (~150 kb)
copurification with nuclear satellite sequences typically uses 6-bp-recognizing REs that produce
(e.g., Arnason and Widegren, 1984). fragments that mostly range from 1-5 kb, al-
The time and expense involved in obtaining though REs that produce more, smaller fragments
highly purified organellar DNAs by ultracentrifu- can be used to compare closely related sequences.
gation is considerable, but justified for whole Closely related animal mtDNAs can be compared
genome RFLP analysis in which fragments are to using 4-bp-recognizing REs (e.g., W.M. Brown,
be detected by staining or end-labeling (e.g., ani- 1980; Moritz, 1991b; Dowling and Childs, 1992;
mal mtDNA). Numerous alternatives to purifica- Dowling and Brown, 1993), but beyond ~2% se-
tion via CsCl gradients have been proposed (e.g., quence divergence it becomes too difficult to
Chapman and Powers, 1984; Powell and Zuniga, identify individual gains or losses of cleavage
1983; Palva and Palva, 1985; C.S. Jones et al., 1988; sites (but see Hillis et al., 1992 for a different ap-
DeSalle et al., 1993). These generally are cheaper proach) and mtDNAs are best analyzed by map-
and faster and seem adequate for particular tis- ping cleavage sites for 5- or 6-bp-recognizing REs.
sues or organisms and for detection methods of Even for large sequences or entire genomes, it
low sensitivity. However, they are not as generally has been recognized that restriction enzymes vary
applicable as CsCl banding, and any nuclear con- in their efficiency for generating RFLPs (Whitkus
tamination can lead to misinterpretation of frag- et al., 1994). For example, RFLP studies of low-
ment patterns, particularly where highly repeated copy number anonymous nuclear sequences in
nuclear sequences are common (see "Interpreta- plants suggest that larger restriction fragments are
tion and Troubleshooting"). If such alternatives more likely to be polymorphic than smaller
are to be used, we recommend that mtDNA be ex- fragments due to the high frequency of inser-
tracted via CsCl gradients from a subset of sam- tions/deletions observed at these loci (McCouch
ples to act as nDNA-free controls. A promising et al., 1988; Miller and Tanksley, 1990a). Thus, en-
new development, long PCR, can efficiently am- zymes that cut less frequently, and therefore
plify segments of DNA >20 kb (W.M. Barnes, generate larger fragments, tend to detect more
1994) and allows the purification of whole fragment length polymorphism. For analysis of
mtDNA genomes by gene amplification (Cheng et multilocus minisatellites, several REs, usually 4-bp-
al., 1994b). This could prove to be a practical and recognizing enzymes (e.g., HinfI, HaeIII, MboI), are
efficient alternative to the use of CsCl gradients. tested to find those that reveal polymorphism in
fragments of appropriate size (Bruford et al., 1992).
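The statements above about 4-bp versus 6-bp sites, and about GC-rich recognition sequences in DNA of low GC content, can be checked with a rough calculation. The sketch below assumes that bases occur independently with a fixed GC fraction, which real genomes only approximate; the function names, the 16-kb length, and the GC values are illustrative assumptions and are not taken from Table 3.

```python
def site_probability(site: str, gc: float) -> float:
    """Chance that a given position starts a match to `site`, assuming
    independent bases with G = C = gc / 2 and A = T = (1 - gc) / 2."""
    freq = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site.upper():
        prob *= freq[base]
    return prob


def expected_sites(site: str, length_bp: int, gc: float) -> float:
    """Expected number of matches to `site` in a linear sequence."""
    positions = max(length_bp - len(site) + 1, 0)
    return positions * site_probability(site, gc)


if __name__ == "__main__":
    # Hypothetical 16-kb sequence at two base compositions.
    for gc in (0.5, 0.2):
        for site in ("GGCC", "GGGCCC", "TTTAAA"):
            n = expected_sites(site, 16_000, gc)
            print(f"GC={gc:.1f}  {site:<6}  {n:8.2f}")
```

Under these assumptions a 4-bp site is expected about once every 256 bp and a 6-bp site about once every 4096 bp at 50% GC, and a GC-rich 6-bp site becomes far rarer than an AT-rich one as GC content falls.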
Choice of Restriction Enzymes The above discussion assumes that REs are
Depending on the application, REs are selected on selected at random. This is efficient for the analy-
the basis of how many sites they are likely to rec- sis of large sequences with approximately even
ognize, the range of salt conditions under which base content, in which case most enzymes are
they are active (broad for double-digest map- likely to cleave one or more times. However, di-
Table 3
Properties of several commonly used restriction endonucleases (a)

                                Number of sites
Enzyme     Site                 Tobacco cpDNA (b)   Human mtDNA (c)   Bee mtDNA (d)
AluI       AG↓CT                341                 64                19
ApaI       GGGCC↓C              13                  5                 0
ApaLI      G↓TGCAC              8                   1                 0
AseI       AT↓TAAT              114                 10                147
AvaI       C↓YCGRG              70                  3                 1
BamHI      G↓GATCC              40                  1                 0
BanI       G↓GYRCC              32                  8                 0
BanII      GRGCY↓C              75                  15                0
BclI       T↓GATCA              54                  4                 6
BglI       GCCNNNN↓NGGC         9                   2                 0
BglII      A↓GATCT              60                  0                 2
BstBI      TT↓CGAA              94                  7                 0
BstEII     G↓GTNACC             12                  2                 0
BstNI      CC↓(A/T)GG           128                 4                 5
BstUI      CG↓CG                89                  6                 0
BstXI      CCANNNNN↓NTGG        26                  4                 0
Bsu36I     CC↓TNAGG             15                  1                 2
ClaI       AT↓CGAT              59                  1                 4
DdeI       C↓TNAG               309                 72                4
DraI       TTT↓AAA              64                  4                 103
EcoO109    RG↓GNCCY             67                  14                1
EcoNI      CCTNN↓NNNAGG         18                  5                 1
EcoRI      G↓AATTC              97                  3                 5
EcoRV      GAT↓ATC              36                  3                 1
HaeII      RGCGC↓Y              25                  7                 0
HaeIII     GG↓CC                196                 50                0
HincII     GTY↓RAC              57                  12                1
HindIII    A↓AGCTT              33                  3                 2
HinfI      G↓ANTC               718                 36                12
HinPI      G↓CGC                89                  17                1
HphI       GGTGA(N)8↓           128                 55                8
KpnI       GGTAC↓C              15                  3                 0
MboI       ↓GATC                623                 22                35
MluI       A↓CGCGT              9                   0                 0
MspI       C↓CGG                214                 23                1
NciI       CC↓(C/G)GG           113                 8                 0
NcoI       C↓CATGG              NA                  4                 0
NdeI       CA↓TATG              NA                  3                 4
NheI       G↓CTAGC              7                   1                 0
NsiI       ATGCA↓T              43                  3                 2
PstI       CTGCA↓G              14                  2                 1
PvuII      CAG↓CTG              12                  1                 1
RsaI       GT↓AC                286                 35                9
SacI       GAGCT↓C              21                  2                 0
SacII      CCGC↓GG              7                   2                 0
SalI       G↓TCGAC              11                  0                 0
ScrFI      CC↓NGG               239                 22                5
SmaI       CCC↓GGG              16                  0                 0
SpeI       A↓CTAGT              28                  9                 1
SspI       AAT↓ATT              137                 11                95
StuI       AGG↓CCT              16                  13                0
StyI       C↓C(A/T)(A/T)GG      100                 22                0
TaqI       T↓CGA                639                 29                30
XbaI       T↓CTAGA              49                  5                 3
XhoI       C↓TCGAG              24                  1                 1

(a) Best digestion is achieved using buffers and conditions supplied by the manufacturer. For the recognition sequence, the location of cuts is indicated by an arrow (↓) and the bases filled in by end-labeling are underlined. "NA" indicates that the information was not available.
(b) Number of tobacco cpDNA fragments produced (= number of cleavage sites minus one inverted repeat segment, plus one). Data from Shinozaki et al., 1986.
(c) Number of human mtDNA fragments produced (= number of cleavage sites). Data from S. Anderson et al., 1981.
(d) Number of honeybee mtDNA fragments produced (= number of cleavage sites). Data from Crozier and Crozier, 1993.
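Counts such as those in Table 3 come from scanning a known sequence for each recognition site. The following is a minimal sketch of such a scan, assuming a linear molecule and palindromic sites (so a single-strand scan is enough; an asymmetric site such as that of HphI would also need a reverse-complement pass). The helper names and the short demonstration sequence are invented for illustration and have nothing to do with the genomes in the table.

```python
import re

# IUPAC codes used in Table 3, written as regular-expression classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every match to `site` in `seq`."""
    body = "".join(IUPAC[base] for base in site.upper())
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment sizes after complete digestion of a linear molecule.
    `cut_offset` is the number of bases of the site left of the cut
    (e.g., 1 for G↓AATTC)."""
    cuts = [start + cut_offset for start in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


DEMO = "AAGAATTCTTTTGGCCAAGAATTCAA"  # made-up 26-bp sequence

if __name__ == "__main__":
    print(find_sites(DEMO, "GAATTC"))           # EcoRI sites
    print(fragment_lengths(DEMO, "GAATTC", 1))  # EcoRI fragments
    print(fragment_lengths(DEMO, "GGCC", 2))    # HaeIII fragments
```

Running the same scan on preliminary sequence data is one way to pick enzymes for the "targeted" digestion of short PCR products, since it shows directly which enzymes cut and what fragment sizes to expect.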
gestion of PCR-amplified segments with ran- tion for each allele to be tested, but greatly ex-
domly selected REs is usually inefficient because tends the use of REs to screen large population
the sequence is relatively short (i.e., <2 kb) and samples for alleles revealed by sequencing.
most enzymes either do not cleave or have uni-
form fragment patterns (e.g., Karl and Avise, Electrophoresis of RE Fragments and VNTR Loci
1993). In this case, Slade et al. (1993) recom- Fragments may be sorted according to size by elec-
mended "targeted" digestion (i.e., using restric- trophoresis through agarose gels, polyacrylamide
tion enzymes shown from preliminary sequence gels, or both. The range over which the size of
data to have multiple or variable cleavage sites). fragments can be estimated accurately varies with
It is also possible to create restriction sites to gel concentration and buffer. Agarose gels are
assay specific nucleotide variants that diagnose commonly used at a concentration of between
two alleles or phylogenetic lineages previously 0.6% and 2.0% agarose w/v, using TAE or TBE
identified by DNA sequencing. This makes use of buffers, and these gels allow accurate estimation
specifically designed mismatch primers to alter of fragment sizes over the range 300 bp to 20 kb.
the sequence of the PCR product just 5' of the TAE provides better buffering capacity and allows
variable nucleotide site such that one of the two better separation of large fragments, but poorer
alleles now has a restriction site (C. Strobeck and resolution of small fragments. Low (e.g., 0.8%)
J. Sved, personal communication; e.g., Figure 5A). concentration agarose gels and long run times are
This approach requires an additional amplifica- used for analysis of multilocus minisatellites to
[Figure 5 panels: (A) mismatch primer aligned to the polymorphic sequence (T/C); (B) and (C) acrylamide gels with bands labeled 190, 147, 131, 110, 65, and 57.]
Figure 5 Use of a mismatch primer (A) to develop RFLP assays to diagnose variable nucleotide sites (B). The primer, which extends from the ATT to the left, creates a new AseI site in the Ca allele but not the Cb allele, thereby providing the basis for the rapid diagnostic test shown on the acrylamide gel. (C) An example of high resolution separation of small fragments produced by digesting an ~340-bp fragment of marine turtle control region with MseI (see Norman et al., 1994).
both types of gel. Use of both agarose and poly-
acrylamide gels produces extremely accurate
cleavage site maps and ensures that all fragments
are visualized; it also can improve the resolution
of fragments in the region of overlap (~1.5-0.4 kb
maximize resolution (Bruford et al., 1992). Poly- for 1.2% agarose and 3.5% polyacrylamide) and
acrylamide gels typically are composed of be- may reveal conformation-induced changes in
tween 3.5% and 6.0% acrylamide, and provide for fragment mobility, as these are restricted to the
analysis of fragments ranging from 10 bp to 1 kb. polyacrylamide gels (G. Singh et al., 1987).
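The mismatch-primer assay of Figure 5A can be illustrated in silico. The sketch below assumes a hypothetical template in which the bases just 5' of the variable site differ from an AseI site (ATTAAT) at a single position, so that a primer carrying one deliberate mismatch completes the site in one allele only. All sequences, the primer, and the function names are invented for illustration and are not those used in the figure.

```python
ASEI_SITE = "ATTAAT"


def amplify(template: str, primer: str) -> str:
    """Product of a forward primer annealing at the 5' end of `template`.
    The primer sequence replaces the template bases it covers, so any
    mismatch in the primer is copied into the product."""
    return primer + template[len(primer):]


def cut_by_asei(product: str) -> bool:
    """True if the product contains an AseI recognition site."""
    return ASEI_SITE in product


# Hypothetical alleles differing only at position 11 (T versus C).
ALLELE_A = "GGCTAGATCAATGCAGTTCA"
ALLELE_B = "GGCTAGATCAACGCAGTTCA"

# Primer ends just 5' of the variable base; position 8 is changed C -> T.
MISMATCH_PRIMER = "GGCTAGATTAA"

if __name__ == "__main__":
    for name, allele in (("a", ALLELE_A), ("b", ALLELE_B)):
        product = amplify(allele, MISMATCH_PRIMER)
        print(name, product, "cut" if cut_by_asei(product) else "uncut")
```

Neither unmodified allele contains the site; only the product amplified from allele a does, which is what makes the digest diagnostic.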
Agarose gels are simpler to prepare than For the analysis of microsatellites, amplifica-
polyacrylamide gels and may be run horizontally tion products typically are separated on high per-
or vertically. Horizontal gels are easier to prepare centage (e.g., 6%) denaturing acrylamide gels, as
but are thicker than vertical gels. This is an ad- are used for sequencing (Chapter 4). This enables
vantage where large amounts of DNA are to be reliable separation of alleles that differ by as little
loaded per lane (e.g., when staining with interca- as 2 bp in fragments of 100-400 bp and eliminates
lating dyes or using hybridization methods), be- artifacts that are due to differences in conforma-
cause the DNA concentration at the gel interface tion. Typically, products of sequencing reactions
should not exceed 1 µg/mm2. However, the thin- from some DNA (e.g., M13) are included on each
ner (1 mm) vertical gels are easier to dry onto fil- gel to provide precise measures of allele length
ter paper for autoradiography of end-labeled (Figure ?A). Thus, microsatellite alleles can be
fragments. Only polyacrylamide gels allow high scored by their absolute length, thereby avoiding
resolution of very small fragments (<0.2 kb) and the practical limitations of methods based on rela-
are used in conjunction with a variety of detec- tive migration (e.g., DGGE, SSCPs, minisatellites;
tion techniques, such as end-labeling (W.M. see Chapter 2). A further advantage of microsatel-
Brown, 1980) and some hybridization methods lites for large-scale studies is that their elec-
(Kreitman and Aguade, 1986). The difference be- trophoresis and detection can be greatly simpli-
tween the resolving power of agarose and poly- fied using multiplexing and automated DNA
acrylamide gels is particularly evident where sequencing technology. For this purpose, primers
small products generated by digestion of PCR are resynthesized with a fluorescent dye attached;
products are visualized by staining (Figure 5). this allows different microsatellite loci to be ana-
However, polyacrylamide gels must be handled lyzed simultaneously using different dyes
with caution, as unpolymerized acrylamide is a (Schwengel et al., 1994).
potent neurotoxin.
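Because microsatellite alleles are scored by absolute length, converting a measured fragment length into a repeat number is simple arithmetic. The sketch below assumes a dinucleotide locus whose total flanking length (primers included) is known; the locus parameters, fragment lengths, and function names are hypothetical.

```python
def repeat_count(fragment_bp: int, flank_bp: int, motif_bp: int = 2) -> int:
    """Number of repeat units in an allele of `fragment_bp` base pairs,
    given the combined length of the flanking sequence."""
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % motif_bp != 0:
        raise ValueError(f"{fragment_bp} bp is not flank plus whole repeats")
    return repeat_bp // motif_bp


def genotype(fragments_bp, flank_bp: int, motif_bp: int = 2) -> tuple:
    """Sorted repeat numbers for the alleles seen in one individual."""
    return tuple(sorted(repeat_count(f, flank_bp, motif_bp)
                        for f in fragments_bp))


if __name__ == "__main__":
    # Two bands at 146 and 142 bp with 100 bp of flanking sequence.
    print(genotype([146, 142], flank_bp=100))
```

A length that does not correspond to a whole number of repeats raises an error here; in practice such a band might reflect an indel in the flanking sequence or a sizing error and would need to be checked by sequencing.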
For some methods that produce a wide range Heteroduplexes, SSCP, and DGGE
of fragment sizes (e.g., end-labeling of animal These methods, which are most appropriate to
mtDNA RFLPs), it is useful to run each sample on small (<600 bp) PCR products, vary in complex-
Table 4
Comparative resolving power of heteroduplex analysis, SSCP, and DGGE methods for detecting different DNA sequences

Method                  Fragment size   Medium                Sensitivity (a)
Heteroduplex analysis   <300 bp         Native acrylamide     80-90%
SSCP                    <200 bp         Native acrylamide     70-95%
                        >400 bp         Native acrylamide     <50%
DGGE                    <600 bp         Denaturing gradient   ≈95%

Data from Grompe, 1993.
(a) Sensitivity indicates the proportion of different variants that are likely to be detected and is likely to vary depending on the conditions employed.
ity and sensitivity (Table 4; see also Chapter 9, technique is most efficient for fragments of 4 0 0 bp
Protocol 20). Heteroduplex and SSCP analyses and is enhanced by the use of a primer with a 40-
are technically straightforward, involving elec- bp GC-clamp attached (Myers et al., 1989; Sheffield
trophoresis of PCR products through native et al., 1992) and by heteroduplex formation. The
polyacrylamide gels to test for variation in con- GC-clamp converts the sequence into a single do-
formation. Heteroduplexes can be formed by co- main with respect to denaturing properties and
amplification of different alleles (e.g., a standard holds the denatured strands together for staining.
vs. variant mtDNA) or by denaturing and rena- Myers et al. (1989) also suggested that longer PCR
turing mixed PCR products. According to products be digested prior to DGGE, although in
Hayashi (1991a,b), the separation of alleles by this case it would not be possible to attach the GC-
SSCP is sensitive to variations in temperature and clamp to all fragments. Considerable care is needed
buffer conditions and is enhanced by the addition to ensure that the denaturation gradient (whether
of glycerol and the use of gels with low crosslink- temperature or urea) is optimized for the alleles
ing (i.e., low bisacrylamide/acrylamide ratio). concerned. The appropriate conditions can be de-
Fan et al. (1993) found that resolution was also re- termined empirically using a perpendicular gradient
duced when conditions were less than optimal, prior to loading multiple samples on a parallel
with transitions more difficult to detect than gradient (Figure 6). More details on the nu-
transversions. To optimize resolution, gels should ances of DGGE are available in reviews by Myers
be cooled and run at low power (10 W), with a et al. (1989) and Lessa (1993).
minimum migration of the smallest fragments of The power of these methods lies in screening
16-18 cm. large numbers of individuals for known or novel
DGGE is a specialized form of electrophoresis sequence variants. Although they d o not reveal
that uses polyacrylamide gels with gradients of de- the location of the differences in the PCR product,
naturants (e.g., urea) to detect differences in the sta- use of these techniques in combination with lim-
bility of PCR products (reviewed by Myers et al., ited sequencing provides efficient and sensitive
1989; Lessa, 1993). The double-stranded product assays of genetic polymorphism in natural popu-
moves through the gel until it reaches a urea con- lations (e.g., Hayashi, 1991b; Lessa, 1992; Norman
centration where it begins to become single- et al., 1994). Lessa (1992) was able to reamplify
stranded, at which point migration is retarded. Se- and sequence fragments extracted from a DGGE
quences that differ by as little as a single base polyacrylamide gel; thus, DGGE can be used to
substitution begin to denature at a different point purify an allele for direct sequencing-an impor-
within the gradient, resulting in different mobilities tant application for rare alleles that are not ho-
at the completion of electrophoresis (Figure 6). This mozygous among the samples analyzed.
[Figure 6 panels: (A) Perpendicular gradient (0% end marked); (B) Parallel gradient. Band labels: Heteroduplexes; Single-stranded DNA.]
Figure 6 Examples of perpendicular (A) and parallel (B) DGGE gels for a segment of the mitochondrial control region from Australian hawksbill turtles heteroduplexed with divergent alleles and visualized by silver staining. For the perpendicular gel, the Australian sample was heteroduplexed to a divergent allele from the Caribbean and run on a 0-80% urea gradient. The left curve illustrates the denaturation of the heteroduplexes over the 40-60% range and the right curve the denaturation of the homoduplexes in the 60-80% range. In the parallel gel, the Australian samples (1 and 3 are the same allele, 2 differs by a single G↔T transversion) were then heteroduplexed against either a divergent Australian allele (lanes 1 to 3) or the same Caribbean samples as used above (lanes 4 to 6) and run on a 40-80% gradient. Both gels were run at a constant temperature of 45°C. (Photographs courtesy of Damien Broderick.)
the DNA, staining intensity is proportional to
DNA concentration, and thus, fragment size. The
minimum amount of DNA in a band detectable
by this method is about 2 ng. Thus, small frag-
ments can only be detected if a large amount of
DNA is loaded in the gel: for example, 100 ng of
a 10-kb sequence must be loaded to detect frag-
ments of 200 bp. The concentration of DNA can be
increased by running a thinner gel, e.g., using 0.4-
mm-thick vertical acrylamide gels to separate di-
gests of PCR products (Figure 5).
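The loading example above follows from the fact that, in a complete digest, each fragment carries a share of the total mass equal to its share of the sequence length. A small sketch of that arithmetic, taking the detection limit of about 2 ng from the text (the function name and default are ours):

```python
def min_load_ng(total_bp: int, fragment_bp: int,
                detection_limit_ng: float = 2.0) -> float:
    """Mass of digested DNA (ng) needed so that a fragment of
    `fragment_bp` contains at least `detection_limit_ng` of DNA."""
    return detection_limit_ng * total_bp / fragment_bp


if __name__ == "__main__":
    print(min_load_ng(10_000, 200))   # the 10-kb example from the text
    print(min_load_ng(16_000, 200))   # a hypothetical 16-kb genome
```

The first call reproduces the 100 ng quoted in the text; the required load scales linearly with the size of the digested sequence and inversely with the size of the smallest fragment to be seen.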
Silver staining of DNA is more sensitive, al-
lowing detection of 1-100 pg amounts of DNA
(Guillemette and Lewis, 1983; Bassam et al., 1991).
This method has been applied to analysis of
mtDNA RFLPs (Tegelstrom, 1986) and the detec-
tion of SSCPs (Hayashi, 1991b) and PCR products
(Bassam and Caetano-Anolles, 1993; Santos et al.,
1993). Although the intensity of staining is still
proportional to fragment size, this method pro-
Methods of Detection vides a substantial improvement over staining
DNA fragments can be detected by direct stain- with ethidium bromide. However, the cost is
ing, incorporation of radioactive nucleotides, or higher, and we have found the increased sensitiv-
hybridization. In choosing an approach, consider- ity unnecessary for most applications.
ations include cost, effort, the amount and type of PCR-generated fragments can be detected
DNA that can be prepared, and the sensitivity de- with maximum sensitivity by incorporation of ra-
sired (Figure 4). Direct staining, usually with dioactive nucleotides during the amplification
ethidium bromide (EB), is the simplest and cheap- process or by labeling the 5' end of the primer
est (but least sensitive) method, and can only be with γ32P-dATP using polynucleotide kinase.
applied to purified sequences (e.g., cpDNA, This approach is used to detect microsatellites and
mtDNA, PCR products). Since the dye binds to SSCPs (Hayashi, 1991a). In general, the use of la-
beled primers (rather than incorporation of la- random priming (Feinberg and Vogelstein, 1983),
beled dNTPs) gives better results because of the although non-radioactive methods also have been
higher specific activity and the absence of base developed (e.g., biotin-streptavidin, J.J. Leary et
effects (Hayashi, 1991a). al., 1983; AMPPD and alkaline phosphatase, Cate
End-labeling of fragments produced by re- et al., 1991; Bronstein et al., 1991). The labeled
striction enzymes involves adding α-labeled 32P strands will hybridize to complementary mem-
(or 35S) nucleotides (dNTPs) to the ends produced brane-bound single-stranded sequences, allowing
by cleavage with REs and, again, can only be ap- them to be detected by exposure to film (radioac-
plied effectively to highly purified sequences. Be- tive probes, chemoluminescence) or staining (bi-
cause each fragment has the same number of otin probes). The amount of base pair mismatch
ends, this technique has the advantage that frag- permitted (stringency) can be controlled by vary-
ment intensity is independent of size (i.e., a 10-bp ing temperature, salt, and formamide concentra-
fragment should be as intense as a 10-kb frag- tion (Sambrook et al., 1989).
ment). This method also is highly sensitive: end- This approach has many useful properties.
labeled fragments of any size can be visualized Any sequence for which there is a probe can be an-
from 1-5 ng of digested DNA (Figure 4). How- alyzed from heterogeneous (total cellular) DNA.
ever, end-labeling of RE-digested PCR products Thus, hybridization is the only practical approach
(with all four α-32P-dNTPs) tends to reveal a large where the sequence cannot be readily purified, ei-
number of spurious amplification products not ther directly or by amplification. The method is
visible in EB-stained gels, making interpretation highly sensitive, allowing detection of picogram
difficult. We suspect, but have not demonstrated, (pg) quantities of a fragment. It also is an efficient
that these represent incomplete amplification approach to assaying multiple sequences (e.g.,
products with long single-stranded stretches that Quinn and White, 1987a; Sites and Davis, 1989)
make ideal templates for polymerization. making it an especially valuable tool for gene
With end-labeling, it is preferable to use REs mapping (e.g., Palmer et al., 1988b) or sequential
that produce 5' overhangs or blunt ends (Table 2). analysis of single-locus minisatellites (Bruford et
The large (Klenow) fragment of E. coli polymerase al., 1992). Multiple (>10) probes can be applied se-
has both polymerase and 3' exonuclease func- quentially to membrane-bound DNA by dissociat-
tions. The polymerase will add radioactive nu- ing the probe and target strands under conditions
cleotides using the 5' overhang as a template. The in which the latter remain attached to the filter.
3' exonuclease can convert blunt ends or 3' over- Transfer hybridization does have some pit-
hangs to 5' overhangs but is relatively inefficient. falls. Using standard methods it is difficult to de-
Ends can be labeled with any α-32P- or 35S-dNTP tect fragments smaller than 250 bp, making the
so long as the nucleotide occurs among the bases technique less able to detect the gain or loss of
to be inserted (Table 3). However, if several differ- closely spaced cleavage sites and limiting the de-
ent REs are being used, it is most efficient to use tection of variants at VNTR loci. However, Kreit-
all four 32P- or 35S-dNTPs. This makes end-label- man and Aguade (1986) used electrophoretic
ing a relatively expensive approach unless there transfer of digested DNA from denaturing poly-
is economy of scale. acrylamide gels, combined with high-sensitivity
Transfer hybridization (Southern, 1975) has hybridization conditions (Church and Gilbert,
two basic steps. First, a membrane-bound replica 1984), to detect fragments smaller than 100 bp.
of the gel is made by transferring electrophoreti- A second potential pitfall of hybridization is
cally separated fragments to a nylon or nitrocellu- the danger of detecting fragments other than
lose membrane. Second, labeled single-stranded those from the sequence to be assayed (e.g., par-
DNA probe is allowed to hybridize with comple- alogous genes). In most cases, this problem can be
mentary membrane-bound sequences. The probe minimized by hybridizing at the highest possible
usually is labeled with radioactive (32P or 35S) stringency and using cloned probes. For organelle
dNTPs by nick-translation (Rigby et al., 1977) or sequences, this problem can arise because of the
not-infrequent movement of sequences to other beling of a highly purified sequence, followed by
organelles or to the nucleus (e.g., Timmis and broader scale surveys using hybridization meth-
Scott, 1984; Palmer 1985a). For example, Quinn ods (Tibbets and Dowling, 1995). For the analy-
and White (1987a) observed that a cloned mtDNA sis of PCR products, the simplicity of RFLP
segment hybridized to nuclear sequences when analysis can be combined with the higher resolv-
the sample was derived from tissues rich in nu- ing power of DGGE to screen for variants previ-
clear DNA (DNA derived from avian blood) but ously detected by sequencing (e.g., Norman et
not when the template had a higher proportion of al., 1994). Conversely, methods that identify a
mtDNA (e.g., DNA from liver). In practice, how- high proportion of different alleles (e.g., appro-
ever, such artifacts due to paralogous sequences priately calibrated SSCP or DGGE) can be used
rarely interfere with hybridization assays of to screen populations and identify variants for
cpDNA and animal mtDNA because of the high sequencing (Lessa and Applebaum 1993; Aguade
copy number of the organellar sequences relative et al., 1994).
to nDNA copies (but see Lopez et al., 1994) and,
in plant leaves, the abundance of cpDNA relative
to mtDNA. APPLICATIONS AND LIMITATIONS
Applications of transfer hybridization to as-
saying sequence variation are restricted by the
availability of suitable probes. These must have
Choice of Sequence
sufficient sequence similarity to the target DNA Selecting the sequence(s) to be analyzed is the first
to form a stable hybrid at moderate to high strin- decision to be made in designing any molecular
gency. Ideally, the probes should come from the systematic study, including one based on frag-
same species. Depending on the scale of the ment variation (see Chapter 12). In broad terms,
study, sufficient quantities of probe DNA can be choices are made according to evolutionary rate
generated by cloning (Quinn and White, 1987b; and mode of inheritance (Table 5). Another fun-
Degnan, 1993b; Chapter 9) or by PCR (e.g., with damental choice is whether information is re-
universal primers; see Chapter 7). Alternatively, quired on allele phylogeny as well as distribution
if the sequence contains some highly conserved because some techniques (e.g., RAPDs, VNTRs)
regions, probes prepared from other species can only provide information on the latter. Recent de-
be used. The use of such heterologous probes is velopments in population genetic theory (e.g.,
exemplified by analyses of cpDNA and rRNA se- phylogeography, Avise, 1989; allele coalescence,
quences, discussed below. Although the use of Hudson, 1990; Harvey et al., 1994) use information
heterologous probes is technically simpler, cau- on allele relationships as well as their distribution
tion is needed in interpreting results as fragments to make inferences about population structure
wholly included within a rapidly evolving region and history. However, for many applications (e.g.,
may not be detected (Hillis and Davis, 1988; Jor- analyses of mating systems, population structure,
gensen and Cluster, 1988; S.M. Williams et al., use of genetic tags), information on allele distrib-
1988). utions will suffice.
Characters should display sufficient variation
The Use of Combined Techniques to enable population genetic or phylogenetic
It should be noted that the methods described analysis but not so much that there is substantial
above are not exclusive: techniques with differ- homoplasy of fragment lengths or cleavage sites.
ing resolving power can be combined to identify For animals, the highest rates of nucleotide sub-
sequence variants and then to screen populations stitution generally occur in mtDNA, certain areas
or species for these variants. An accurate and ef- of the non-coding control region in particular, fol-
ficient approach for the analysis of animal lowed by non-coding regions of nuclear DNA
mtDNA is to examine RFLP variation in detail and then silent substitutions within nuclear cod-
for a representative subset of taxa using end-la- ing sequences (W.-H. Li et al., 1985a). For plants,
Table 5
Evolutionary properties of different genomes and lineages (a)

Genome     Lineage       Inheritance   Point mutations   Size range           Rearrangements
mtDNA      Animals       Maternal      High              14- >30 kb           Very rare
mtDNA      Land plants   Maternal      Low               200-2,500 kb         Very frequent
mtDNA      Fungi         All (b)       Low               20-200 kb            Frequent
cpDNA      Land plants   All (b)       Low               120-217 kb           Rare
nDNA (c)   Animals       Biparental    Moderate          1-1000 × 10^5 kb     Frequent
nDNA (c)   Land plants   Biparental    Moderate          1-1000 × 10^5 kb     Frequent
nDNA (c)   Fungi         Biparental    Not known         0.1-10 × 10^5 kb     Frequent

Data summarized from Cavalier-Smith, 1985; Palmer, 1985a,b; Moritz et al., 1987; Wolfe et al., 1987, 1989a,b; Palmer and Herbon, 1988; and Neale and Sederoff, 1988.
(a) For properties of hypervariable minisatellite and microsatellite loci, see text.
(b) The term "all" refers to maternal, paternal, and biparental modes of inheritance.
(c) nDNA = single-copy nuclear loci.
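The inheritance column of Table 5 has a direct consequence for the effective number of genes. The sketch below uses the standard idealized-population formulas (an assumption made here for illustration, not something given in the table): a biparental diploid locus has an effective size of 4NfNm/(Nf + Nm) individuals and therefore twice that many gene copies, whereas a maternally inherited haploid genome has Nf copies. Function names and census numbers are hypothetical.

```python
def nuclear_gene_copies(n_females: float, n_males: float) -> float:
    """Effective number of gene copies at a diploid, biparental locus:
    2 * Ne, with Ne = 4 * Nf * Nm / (Nf + Nm)."""
    return 8 * n_females * n_males / (n_females + n_males)


def maternal_gene_copies(n_females: float) -> float:
    """Effective number of gene copies for a haploid genome that is
    transmitted only through females."""
    return float(n_females)


def nuclear_to_maternal_ratio(n_females: float, n_males: float) -> float:
    """How many times larger the nuclear effective gene number is."""
    return nuclear_gene_copies(n_females, n_males) / maternal_gene_copies(n_females)


if __name__ == "__main__":
    print(nuclear_to_maternal_ratio(500, 500))  # equal sex ratio
    print(nuclear_to_maternal_ratio(900, 100))  # strongly female-biased
```

With equal numbers of males and females the ratio is 4, the four-fold difference usually quoted for organelle genomes; the second call shows that the figure depends on the sex ratio and can even fall below 1 when breeding males are scarce.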
both cpDNA and mtDNA have relatively low choice of sequences. Sequences with biparental
substitution rates (Palmer, 1985a). In contrast, the codominant inheritance (e.g., low-copy-number
very high mutation rate of VNTRs (Table 1) nuclear RFLPs, VNTRs) often display multiple al-
makes these ideal for studies of mating systems leles per locus and heterozygotes are easily iden-
and structure of closely related populations tified. In contrast, RAPD loci are usually biallelic,
(Queller et al., 1993),but is expected to confound with typically dominant inheritance patterns
phylogenetic information as evolutionary dis- (J.G.K. Williams et al., 1990). Thus, RAPDs pro-
tance increases. vide less genetic information on a per locus basis
The mode of inheritance has a strong effect than codominant loci when applied to questions
on the dynamics of genetic variation within of population genetic structure (Lynch and Milli-
species and on the inferences that can be drawn. gan, 1994), paternity (Lewis and Snow, 19921,
The inheritance of mtDNA and cpDNA is usually outcrossing rates (Fritsch and Rieseberg, 1992), or
uniparental and effectively haploid. This results hybridization (Rieseberg and Ellstrand, 1993).
in a four-fold reduction in the effective number of Another mode of inheritance is shown by
genes when males and females are equally fre- some repeated sequences: this is concerted evolu-
quent (Birky et al., 19891, which increases the ef- tion, the tendency for copies of such sequences to
fect of drift and the rate of turnover within popu- become homogeneous, first among gene copies
lations (Avise el al., 1984,1988). This property is within genomes and then among individuals
valuable for analyses of population structure in within populations (Zimmer et al., 1988; Hancock
that it tends to increase the proportion of varia- and Dover, 1990; reviewed by Arnheim, 1983). If
tion distributed among populations and, together sufficiently rapid, concerted evolution could en-
with a high mutation rate, provides for more hance among-population variation (by reducing
rapid sorting of ancestral alleles within and be- within-population variation), making these genes
tween species (e.g., Slade et al., 1994). Compar- especially useful for defining population bound-
isons of genetic markers with uniparental versus aries and potentially simplifying sampling for
biparental inheritance also can be used to detect phylogenetic studies (Hillis and Davis, 1988).
differences in behavior between the two sexes However, the same process would make these se-
(e.g., M.L. Arnold et al., 1991; Karl et al., 1992a). quences inappropriate for quantifying long-term
Dominant versus codominant inheritance also is gene flow, effective population size, and devia-
an important consideration with regard to the tions from Hardy-Weinberg equilibrium.
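
The four-fold reduction in the effective number of genes noted above for uniparentally inherited
organelle genomes follows from simple counting. The sketch below is only an illustration under
textbook assumptions that the chapter does not spell out: an idealized population, strictly
maternal transmission with one effective organelle gene copy per breeding female, and the standard
sex-ratio formula for nuclear effective size. The function names are hypothetical.

```python
def nuclear_gene_copies(n_males: int, n_females: int) -> float:
    """Effective number of autosomal gene copies in a diploid population.

    Assumes the standard sex-ratio formula Ne = 4*Nm*Nf / (Nm + Nf);
    each effective individual carries two copies of a nuclear gene.
    """
    ne = 4.0 * n_males * n_females / (n_males + n_females)
    return 2.0 * ne


def maternal_organelle_gene_copies(n_females: int) -> float:
    """Effective number of organelle gene copies under strict maternal,
    effectively haploid inheritance (assumed: one copy per breeding female)."""
    return float(n_females)


def reduction_factor(n_males: int, n_females: int) -> float:
    """Ratio of nuclear to organelle effective gene copies."""
    return nuclear_gene_copies(n_males, n_females) / maternal_organelle_gene_copies(n_females)


if __name__ == "__main__":
    print(reduction_factor(500, 500))  # equal numbers of males and females
    print(reduction_factor(100, 900))  # strongly female-biased population
```

With equal numbers of males and females the ratio is 4; it departs from 4 when the sex ratio is
skewed, which is why the statement is qualified by "when males and females are equally frequent".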
268 Chapter 8 / Dowling, Moritz, Palmer & Rieseberg
The rate of sequence rearrangement also is a consideration in selecting a sequence for fragment
analysis (Table 5). Rearrangements, unless carefully characterized, complicate RE analysis and can
lead to gross overestimates of sequence diversity. However, once identified, the rearrangements are
themselves potential sources of phylogenetic information. Most cpDNA and animal mtDNA sequences are
stable in this regard, although there are exceptions (e.g., Moritz and Brown, 1987; Palmer et al.,
1987, 1988a; Palmer, 1991). In contrast, plant mtDNAs (Palmer and Herbon, 1988) and, to a lesser
extent, fungal mtDNAs (Bruns and Palmer, 1989) are in general unsuitable for whole-genome fragment
analysis because of their high rate of structural change. Transposable elements are common in
nuclear genomes, and their insertion or excision will modify fragment patterns revealed by
hybridization assays (e.g., Aquadro et al., 1986), potentially complicating fragment analyses.
J.G. Lawrence et al. (1989) used the location of transposons to estimate relationships among
closely related strains of E. coli, but noted that transposons moved too frequently for such
characters to be of use among more divergent isolates.

In the following sections, we illustrate by example some of the strengths and weaknesses of
fragment analysis of the more commonly used types of sequences for studies of intraspecific
variation, hybridization, and interspecific and higher level phylogeny. This is not intended to be
exhaustive: the application of molecular information to problems in evolutionary biology and
ecology has expanded far too rapidly to contemplate such a review.

Population-Level Comparisons
Fragment variation in cpDNA, animal mtDNA, and unique nuclear sequences has provided useful genetic
markers for the analysis of variation within species. For animals, the most commonly used systems
have been mtDNA and nuclear VNTR loci, especially in the form of multilocus fingerprints. For
plants, cpDNA and anonymous low-copy-number nuclear sequences have been most commonly used. As the
technology becomes more accessible, more researchers are examining fragment variation at individual
nuclear loci, either by hybridization or through analysis of PCR products (e.g., RAPDs or
microsatellites). Applications are broad and include estimating the extent of variation within and
among populations, levels of gene flow and effective population size, patterns of historical
demography and biogeography, and analyses of parentage and relatedness.

Animal Mitochondrial DNA
A great deal of attention has been given to RFLP analysis of animal mtDNA within and among
populations (for recent reviews, see Harrison, 1989; Avise, 1994; Moritz, 1994). The popularity of
mtDNA for studies of animal populations is due to a combination of its maternal, clonal
inheritance, its relatively rapid rate of base substitution, and the relative ease with which it is
isolated and analyzed. Much of the variation in animal mtDNA is due to base substitution, with
transitions greatly outnumbering transversions (W.M. Brown et al., 1982). However, in many groups
of animals length variation also is common, with differences occurring within (i.e., heteroplasmy)
as well as between individuals (reviewed in Moritz et al., 1987; Rand, 1993). Significantly, the
rates and patterns of mtDNA evolution vary considerably among taxa (Martin and Palumbi, 1993; Rand,
1994) as well as among genes within taxa (W.M. Brown, 1985; Adachi et al., 1993; Kondo et al.,
1993; Zhu et al., 1994).

The amount and distribution of variation within versus among populations depends on population
sizes and rates of gene flow, both historical and contemporary. The most common pattern, at least
in terrestrial vertebrates (for marine counter examples, see Palumbi, 1992), is to have less
variation within than among geographic populations, indicative of population structuring (Smouse et
al., 1991; reviewed by Avise et al., 1987). Variants detected within populations typically have low
levels (<1%) of sequence divergence and are best detected by digestion with 4-bp REs that sample a
larger fraction of the genome. However, it is often possible to distinguish numerous mtDNA
haplotypes within populations if sufficient enzymes are used to sample
the genome (e.g., Cann et al., 1984; Dowling and Childs, 1992; Dowling and Brown, 1993). In several
studies (e.g., Avise et al., 1992a; Moritz and Heideman, 1993; Tibbets and Dowling, 1995), highly
divergent mtDNA types have been found at the same location, suggesting either large long-term
effective population size or immigration from a previously isolated population.

An interesting recent development is the use of the distribution of pairwise differences (e.g., the
number of different restriction sites) between mtDNA alleles to make inferences about historical
demography. For example, a Poisson distribution (i.e., a star-like phylogeny) may indicate a period
of rapid population expansion (Slatkin and Hudson, 1991; Rogers and Harpending, 1992). Applied to
humans, this method suggests rapid population growth (DiRenzo and Wilson, 1991). Applied to Pacific
Ocean populations of coconut crabs, it also indicates exponential population growth from about
130,000 years ago, a remarkable result given that the species has declined to the point where it is
now threatened with extinction (Lavery et al., 1996a). The coconut crab and other examples serve to
emphasize that the genetic signature recorded in mtDNA phylogeny reflects historical more than
contemporary population processes, particularly where population size or connectivity fluctuates
(Avise, 1994; Moritz, 1995).

The tendency for variation among populations to be significantly higher than that within
populations allows mtDNA to be used to estimate phylogenies of populations, and thus to investigate
patterns of historical biogeography (Bermingham and Avise, 1986; Bowen et al., 1989; Moritz and
Heideman, 1993; Moritz et al., 1993; Joseph and Moritz, 1994). The phylogeographic approach (Avise
et al., 1987) provides a qualitative assessment of genetic population structure within species. A
more quantitative approach, using fixation indices (e.g., Takahata and Palumbi, 1985; Excoffier et
al., 1992) or alternative statistical methods (e.g., Slatkin and Maddison, 1989; Neigel and Avise,
1993), is needed where the variation within populations is substantial relative to that among
populations. Excoffier et al. (1992) found that variation among human populations was more marked
when information on divergence among alleles was included as well as their frequency and
distribution; however, the converse was true for coconut crabs (Lavery et al., 1996a), suggesting
that the most sensitive form of analysis will vary (see Hudson et al., 1992a).

The unique characteristics of mtDNA also result in some disadvantages. The lack of recombination
makes mtDNA comparable to a single allozyme locus with many alleles. Consequently, estimates of
gene diversity obtained from mtDNA are expected to exhibit a larger variance than comparable
estimates made using a large number of nuclear loci. The extremely rapid evolution of some mtDNA
genes can result in convergence, confounding phylogenetic relationships even within some species
(e.g., Lansman et al., 1983; Dowling and Brown, 1989). This problem seems to be particularly severe
for length variants where the number of character states is discrete and finite. In general, length
variants should only be used as markers for the analysis of population subdivision where there is
evidence for their stable inheritance, and even then with caution.

Chloroplast DNA
The chloroplast genome is conservative in most respects (Table 5; reviewed by Palmer 1985a,b; Wolfe
et al., 1987, 1989b; Palmer, 1991). Although the range of cpDNA sizes among photosynthetic land
plants is large (120-217 kb), most of this variation is due to a few exceptional genomes and length
mutations typically are short (<1 kb) and are of restricted occurrence. The order and arrangement
of chloroplast genes is nearly invariant. Most land plant cpDNAs have identical organization, and
most variants that do occur stem from one or a few simple inversions. The rate of nucleotide
substitutions in cpDNA appears to be less than that of animal and plant nDNAs, and much less than
for animal mtDNAs (Table 5). Perhaps the most variable feature of cpDNA is its mode of inheritance,
which may be strictly maternal (most angiosperms), biparental (≈20% of angiosperm species;
Corriveau and Coleman, 1988; Harris and Ingram, 1991), or paternal (conifers; reviewed in Sears,
1980; Neale and Sederoff, 1988). However, even with biparental inheritance, trans-
mission is essentially clonal as recombination has not been observed in land plants.

The slow rate of change in cpDNA sequence and structure (Table 5) is reflected in the low levels of
within- and among-population variation apparent from most of the early studies (Banks and Birky,
1985; D.B. Wagner et al., 1987; Neale et al., 1988). However, a number of more recent studies of
flowering plant species have found substantial levels of within-population variation (reviewed in
D.E. Soltis et al., 1992). In many of these cases, however, the intraspecific cpDNA variation does
not appear to have arisen in situ, but instead appears to result from cpDNA introgression (reviewed
in Rieseberg and Soltis, 1991). In many studies principally aimed at clarifying interspecific
relationships, cpDNA polymorphisms were either absent or rare among conspecific populations
(reviewed in Palmer, 1987; D.J. Crawford, 1989; D.E. Soltis et al., 1992). The ultimate utility of
cpDNA as a population marker remains unclear, and is likely to vary from species to species,
depending on extrinsic and intrinsic factors (i.e., the number of restriction sites and average
length of restriction fragments surveyed, the age of the species and its populations and their rate
of cpDNA evolution). Point mutation rates in cpDNA may vary severalfold among closely related taxa
(Palmer et al., 1988b), whereas rates of rearrangements and length mutations have been shown to be
highly variable, principally due to differences in the amount of short dispersed and tandem
repeats, respectively (Palmer et al., 1985, 1987).

Nuclear Sequences
Sequences in the nuclear genome provide an effectively inexhaustible supply of genetic markers, if
that variation can be accessed efficiently. For analyses of intraspecific variation, attention has
so far focused on single-copy sequences, particularly those that are hypervariable (VNTRs), and on
repeated sequences such as rDNA cistrons. These vary widely in the form and rate of mutation, which
has important implications for how they are used. An important distinction is that analyses of VNTR
loci provide information on allele distribution, whereas analyses of sequence variation at
single-copy or repeated loci also can provide information on allele phylogeny. A significant
advantage of working with identified, rather than anonymous, nuclear loci is the greater potential
for examining variation at the same loci in other species and thus benefitting from the synergy
between studies of molecular evolution and systematics (Chapter 1).

MINISATELLITE SEQUENCES The discovery of hypervariable minisatellite sequences and their use in DNA
fingerprinting revolutionized the analysis of population-level variation, particularly the
assessment of parentage, contributing to studies of sexual selection, mating behavior, and
population ecology (e.g., Jeffreys et al., 1985a,b; Burke, 1989; Burke et al., 1991; Gibbs et al.,
1990; Rogstad et al., 1991a; Haig et al., 1993; Wolff et al., 1994).

Minisatellites typically are analyzed by digestion with REs (which do not cleave within the tandem
repeats) and transfer hybridization, using minisatellite sequences, an entire hypervariable
sequence, or synthetic oligonucleotides as probes. Two distinct strategies have been applied. Core
minisatellite probes have been used to reveal variation at a large number of hypervariable loci
(e.g., Jeffreys et al., 1985b). The result is complex multifragment patterns that are usually
unique to an individual and are extremely powerful for testing parentage when the putative parents
can be lined up next to the individual in question. A major advantage of the method is that many of
the probes can be applied across a wide spectrum of plants and animals. The alternative approach is
to assay hypervariable loci one at a time using synthetic oligonucleotides or cloned sequences as
probes (e.g., Nakamura et al., 1987; Bentzen et al., 1991; Prodohl et al., 1994; Scribner et al.,
1994; Verheyen et al., 1994). However, cloning of minisatellite loci for probes is a relatively
complex process (Bruford et al., 1992). This approach requires more effort than multilocus
fingerprinting, and the variation at individual loci tends to be taxonomically restricted (Gray and
Jeffreys, 1991; but see Hannotte et al., 1992), but it has the major advantage that alleles can be
assigned to specific loci and genotypes can be identified. In some cases it is possi-
ble to amplify individual minisatellite loci by PCR (e.g., Boerwinkle et al., 1989). Indeed,
Jeffreys et al. (1991) combined PCR with RE digestion to examine the fine structure of variation at
a human minisatellite locus, increasing the resolution of the technique still further.

Although comparison of the multifragment patterns generated by the minisatellite probes remains a
popular method for testing parentage, there are several technical and statistical difficulties with
using this method (see Lynch, 1988; Burke et al., 1991; Bruford et al., 1992). These include: (1)
assigning specific fragments to a particular locus and thus identifying alleles and determining
genotypes; (2) potential comigration of non-homologous fragments (convergence), inflating the
variance of the estimate of similarity; and (3) correlations among loci due to linkage.
Nonetheless, measures of similarity and population structure have been devised (Lynch, 1990, 1991a;
Reeve et al., 1992), and several studies have revealed variation in band-sharing coefficients
consistent with known relationships, current population sizes, or inferred population history
(e.g., Wayne et al., 1991b; D.A. Gilbert et al., 1990, 1991; Degnan, 1993a). The use of
single-locus minisatellites for the study of natural populations is at an early stage but overcomes
most of the above difficulties and appears very promising (e.g., May et al., 1993; Scribner et al.,
1994).
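
The band-sharing coefficients mentioned above can be illustrated with a short sketch. It assumes
the commonly used similarity index S = 2 n_xy / (n_x + n_y), where n_x and n_y are the numbers of
fragments scored in two individuals and n_xy is the number they share. The function name, the use
of integer size bins to decide whether two fragments match, and the example lanes are all
hypothetical; the sketch does nothing to correct for comigration of non-homologous fragments or
for linkage among loci.

```python
def band_sharing(bands_x, bands_y):
    """Similarity index S = 2 * n_xy / (n_x + n_y) for two fingerprint lanes.

    Each lane is given as an iterable of fragment size bins; two fragments
    are treated as shared when they fall in the same bin (an assumption of
    this sketch, standing in for a laboratory matching tolerance).
    """
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        raise ValueError("no fragments scored in either lane")
    shared = len(x & y)
    return 2.0 * shared / (len(x) + len(y))


if __name__ == "__main__":
    # Hypothetical binned fragment positions for three individuals.
    parent = [1, 3, 4, 7, 9, 12]
    offspring = [1, 3, 5, 7, 12, 14]
    unrelated = [2, 5, 6, 8, 11, 13]

    print(band_sharing(parent, offspring))     # four of six bands shared
    print(band_sharing(offspring, unrelated))  # one of six bands shared
```

In this made-up example the parent and offspring lanes share four fragments and the unrelated
lanes share one, so the first coefficient is the larger of the two.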
[Figure 7: flow diagram with two panels, "Basic protocol" (left) and "Enrichment method" (right)]
Figure 7 Schematic of an approach for isolating microsatellite loci and developing primers for
amplification using a basic protocol (left; see, e.g., Rassman et al., 1991) or a non-radioactive
procedure that enriches cloned fragments containing repeat arrays (right; after Armour et al.,
1994).

MICROSATELLITE SEQUENCES Microsatellite assays have rapidly become established as a powerful tool
for the analysis of mating systems and population structure (reviews in Queller et al., 1993;
Bruford and Wayne, 1993; Schlotterer and Pemberton, 1994). Positive features include: (1) high
variability (e.g., Table 1), even in species lacking polymorphism at allozyme loci (Hughes and
Queller, 1993); (2) the ability to score codominant genotypes with exact allele sizes; and (3)
access via PCR, making it possible to work from extinct as well as extant populations (e.g., A.C.
Taylor et al., 1994; Roy et al., 1994). One drawback is the need to develop new sets of primers for
each group of species, although this may be less of a problem than for minisatellites (see Chapter
9, protocol 26). Conservation of polymorphism at microsatellite loci has been reported across
cetaceans, which diverged 10 million years ago (Schlotterer et al., 1991); marine turtles, which
separated up to 150 million years ago (FitzSimmons et al., 1995); and divergent taxa of birds
(Hanotte et al., 1994). In any case, the methods for developing microsatellite primers from random,
size-selected clones are reasonably straightforward (Figure 7; Rassman et al., 1991; Armour et al.,
1994; see also Chapter 9). The
most frequent microsatellites in mammals are AC-TG repeats, followed closely in number by AG-TC
repeats (Morgante and Oliveri, 1993). In contrast, repeats of the AT type are by far the most
common in plants, with repeats of the AC-TG type occurring very infrequently (Morgante and Oliveri,
1993).

Most applications to date have concerned analysis of mating systems (e.g., Amos et al., 1993; Morin
et al., 1994), where microsatellites are particularly powerful because of the large number of
genotypes observed. Analyses of microsatellite variation among populations are just beginning to
appear (e.g., Edwards et al., 1992; Roy et al., 1994; Bowcock et al., 1994; Paetkau and Strobeck,
1994) and, despite concerns over the impact of convergence and selection, results seem promising.
One interesting application is to examine differences among populations for a
Y-chromosome-specific microsatellite, permitting direct analysis of male-mediated gene flow (Santos
et al., 1993). Much more needs to be learned about the dynamics of microsatellite variation within
and between species before their potential can be fully realized (Bruford and Wayne, 1993; Valdés
et al., 1993). Nonetheless, microsatellites provide an alternative to allozymes for some, but not
all, applications where information on allele frequencies at nuclear loci is needed (see also
Chapter 12).

One limitation peculiar to microsatellite analysis is that the use of primers developed in one
species on other species may bias estimates of allelic diversity, with the species from which the
primers were designed having higher values than "non-source" species (Bruford and Wayne, 1993).
Clones are selected on the basis of having long uninterrupted arrays of repeats (typically >12
uninterrupted repeats; Figure 7) because of the expectation that these are the most likely to be
polymorphic (Weber, 1990). In other species, arrays at some of these loci are likely to be
shortened or interrupted, presumably reducing the mutation rate and thus the diversity,
irrespective of population size. The extent of this bias remains to be determined. The effect was
evident in cross-species amplifications of marine turtles (FitzSimmons et al., 1995), but not in
wombats (A.C. Taylor et al., 1994).

IDENTIFIED SINGLE-LOCUS SEQUENCES RFLP analysis has been used to examine variation within and among
populations for genes of known function [e.g., alcohol dehydrogenase (Aquadro et al., 1986;
Kreitman and Aguade, 1986; G.M. Simmons et al., 1989) and several other genes (Begun and Aquadro,
1993) in Drosophila]. These studies typically have revealed a wealth of polymorphism and have
provided information concerning the evolutionary history and diversity of populations, the action
of selection and drift on the sequences concerned, or both. Similarly, RFLP analyses of major
histocompatibility (MHC) loci have provided important insights into population processes and
molecular evolution (reviewed by Slade, 1992). Other fragment separation methods, such as
heteroduplex analysis, SSCP, and DGGE are being widely used to screen for diagnostic polymorphisms
in several genes associated with human diseases (reviewed by Grompe, 1993) and to a more limited
extent in surveys of natural populations (Lessa, 1992; Aguade et al., 1994).

Because they are expected to be more variable than coding regions, there have been several studies
of variation in introns using PCR primers located in conserved regions of flanking exons (Lessa,
1992; Slade et al., 1993; Palumbi and Baker, 1994). This is a very promising approach because the
design of primers that can be used across taxonomically divergent species (see Chapter 7) will
promote use of the same loci in different species. Not surprisingly, nuclear introns tend to be
less variable than mtDNA, so that digestion of PCR-amplified introns with randomly selected REs is
relatively inefficient. Thus, studies to date have combined sequencing with DGGE (Lessa, 1992) or
targeted digestion (Slade et al., 1993; Palumbi and Baker, 1994).

ANONYMOUS SINGLE-COPY SEQUENCES Randomly cloned low-copy-number sequences have been examined for
RFLPs via transfer hybridization (e.g., Quinn and White, 1987b; Degnan, 1993b) and have been shown
to provide useful information on patterns of intraspecific polymorphism. Probes for RFLP analysis
of anonymous nuclear loci typically are obtained from either cDNA or genomic libraries. From these
libraries, clones are selected that hybridize to single or low-copy-
number sequences. In practice, this requires selecting clones that hybridize to only one or two
restriction fragments. There appears to be no correlation between clone size and polymorphism
(McCouch et al., 1988; Miller and Tanksley, 1990a), but cDNA clones typically reveal considerably
more polymorphism than genomic clones, regardless of the enzyme used to generate the library
(Landry et al., 1987; McCouch et al., 1988; Miller and Tanksley, 1990a).

Perhaps the most common application of RFLP analyses of single-copy nuclear loci has been to assess
patterns of relationships among populations or accessions of cultivated plants and their wild
relatives (Kochert et al., 1991; Aldrich and Doebley, 1992; Neuhaussen, 1992; Liu and Furnier,
1993; reviewed in Whitkus et al., 1994). Typically, populations or accessions are examined for the
presence/absence or frequency of RFLP "alleles" and then subjected to phylogenetic analysis.
Although this approach generally provides more resolution than isozyme analysis, these RFLP studies
often suffer from inadequate population sampling. Moreover, in many instances the genetic basis of
the RFLP variation is poorly understood. Thus, genetic alleles represented by more than one
fragment may be scored more than once, thereby biasing estimates of relationships. Nonetheless, if
correctly used, RFLP analysis of low-copy-number anonymous nuclear loci can be a powerful tool for
intraspecific systematics. This is aptly demonstrated by Aldrich and Doebley (1992), who use
nuclear RFLP data to support the origin of domesticated sorghum from wild sorghum of
central-northeastern Africa. Other applications include cultivar and individual identification
(Smith and Smith, 1991; Vaccino et al., 1993), estimation of levels and partitioning of genetic
diversity (Aldrich and Doebley, 1992), and parentage determination (Quinn and White, 1987b).

Karl and Avise (1993) modified this approach for PCR, developing primers for random single-copy
clones and screening populations for RFLPs in the amplification products. This technique has
provided significant insights into marine turtle population structure (Karl et al., 1992) and
patterns of genetic differentiation in oysters (Karl and Avise, 1992), but even proponents regard
it as inefficient. Of 15 primer pairs developed for turtles, only seven reliably amplified products
of the expected size and only five were polymorphic.

Genetic studies of many species are often limited by the quantity and quality of tissue available
for analysis, and by the number of variable loci that can be assayed in a cost-effective manner.
PCR-generated RAPDs have proven effective for efficiently surveying numerous polymorphic loci from
small amounts of tissue. However, as discussed above, RAPDs have significant limitations as well.
Although some of these difficulties can be overcome with appropriate experimental design (such as
that shown in Figure 8), the intrinsic technical and conceptual limitations of RAPDs have caused
many to have substantial reservations about their use: in some instances, this information might be
obtained more reliably using other sets of markers (e.g., microsatellites or allozymes).

Nonetheless, with these caveats and at the appropriate taxonomic level, RAPDs can be a powerful
tool for studies of the genetics, systematics, and ecology of populations. By far the most common
use of RAPDs has been to identify and discriminate among individuals, cultivars, varieties, and
species (M.L. Smith et al., 1992; Fani et al., 1993; Mailer et al., 1994; Hsiao and Rieseberg,
1994). For example, Smith et al. used RAPDs to identify the spatial distribution and size of one of
the largest and oldest living organisms, an individual of Armillaria bulbosa. This represents an
ideal use of RAPDs, because reproducible differences in RAPD phenotypes are all that is required to
identify and differentiate clonal genotypes; knowledge of the genetic basis of the RAPD phenotypes
is not essential unless relationships among clones must be ascertained. Another common use of RAPD
data has been to describe patterns of relationships among populations or accessions of cultivated
plants and their wild relatives (Liu and Furnier, 1993; Adams et al., 1993; Yu and Nguyen, 1994).
To date, too many of these studies have been limited by inadequate sample sizes and inadequate
knowledge of fragment heritability and homology.

RAPDs also have been used successfully for estimation of parentage, contributing to studies of
reproductive biology in both plants and animals
[Figure 8: flow chart with the steps Extract DNA; Search for polymorphism; Determine mode of
inheritance (if possible); Conduct survey; Verify homology; Analyze data]
Figure 8 Flow chart indicating the steps required for rigorous use of RAPDs in surveys of genetic
variation among natural populations.
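
The discussion of RAPD outcrossing and paternity studies around Figure 8 turns on the fact that
most RAPD markers are dominant, so band-present individuals may be either homozygous or
heterozygous. A minimal sketch of the usual workaround follows, assuming Hardy-Weinberg proportions
at a biallelic locus (an assumption of this example, not a step shown in the figure): the frequency
of the recessive "band-absent" allele is estimated as the square root of the proportion of
band-absent individuals. The function names and the survey counts are hypothetical.

```python
import math


def null_allele_frequency(n_band_absent: int, n_total: int) -> float:
    """Estimate q, the frequency of the recessive (band-absent) allele.

    Assumes Hardy-Weinberg proportions, so that the expected proportion of
    band-absent individuals is q**2.
    """
    if n_total <= 0 or not 0 <= n_band_absent <= n_total:
        raise ValueError("counts must satisfy 0 <= n_band_absent <= n_total, n_total > 0")
    return math.sqrt(n_band_absent / n_total)


def expected_hidden_heterozygotes(n_band_absent: int, n_total: int) -> float:
    """Expected number of heterozygotes (2pq * N) that a dominant marker
    cannot distinguish from band-present homozygotes."""
    q = null_allele_frequency(n_band_absent, n_total)
    p = 1.0 - q
    return 2.0 * p * q * n_total


if __name__ == "__main__":
    # Hypothetical survey: 16 of 100 individuals lack the band.
    print(null_allele_frequency(16, 100))
    print(expected_hidden_heterozygotes(16, 100))
```

The square-root estimator is only a starting point: it depends entirely on the Hardy-Weinberg
assumption and can be biased in small samples, which is part of the reason dominant markers carry
less information per locus than codominant ones.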
(Fritsch and Rieseberg, 1992; Philbrick, 1993; Hadrys et al., 1993). For example, Fritsch and
Rieseberg (1992) used RAPD markers to estimate outcrossing rates in populations of the flowering
plant Datisca glomerata. Because the majority (84% of 31) of markers were dominant, the number of
loci required to obtain robust estimates of outcrossing was three to four times greater than for
comparable studies using codominant markers (e.g., Ritland and Ganders, 1987; Holtsford and
Ellstrand, 1990). The inability to detect heterozygotes also increases the number of loci required
for paternity analysis (Lewis and Snow, 1992; Hadrys et al., 1993; Milligan and McMurry, 1993). In
fact, Lewis and Snow suggest that researchers
should expect to survey more than 50 RAPD loci for each offspring for most applications of
paternity exclusion analysis. Nonetheless, by choosing a subset of markers with high recessive
allele frequencies, RAPD loci provide nearly as much power as biallelic codominant loci (Lewis and
Snow, 1992). Furthermore, several loci can be assayed per primer, and considerable automation of
the technique is possible (R. Sederoff, personal communication). Thus, RAPDs may be useful for
estimation of parentage in systems that are genetically uncharacterized, and where the availability
of variable codominant markers is limited.

REPEATED SEQUENCES The repeated nuclear gene most commonly assayed is the ribosomal RNA cistron.
This spans both variable and conserved regions, and a few studies have demonstrated intraspecific
RFLP variation, often due to length variation in the non-transcribed spacer region (e.g., Learn and
Schall, 1987; Sites and Davis, 1989; Hillis et al., 1991~). In a few cases, useful intraspecific
variation in the internal transcribed spacer has been revealed by PCR sequencing, which could be
the basis for screening via a fragment method.

Studies of Interspecific Hybridization
Studies of hybridization play an important role in our understanding of evolutionary processes
(reviewed by Hewitt, 1988; Harrison, 1990). Where distinct taxa currently hybridize, it is possible
to examine speciation and the evolution of reproductive isolation. Typical studies of hybridization
involve a population genetic approach to quantify patterns of gene exchange among extant taxa
(reviewed by Barton and Hewitt, 1989; Harrison, 1990). The very nature of these studies requires
examination of large numbers of individuals for several independent markers (e.g., different
nuclear genes, mtDNA, cpDNA), making direct sequencing impractical and inappropriate. Many
individuals can be screened relatively quickly and cheaply for fragment length or site polymor-

studies, RE variation has been important for examining the significance of hybridization. Studies
of RE variation have been used to identify shifts in the position of contact zones (e.g., M.L.
Arnold et al., 1987a; Dowling and Hoeh, 1993) and the role of hybridization in the production of
new plant and animal taxa (reviewed by M.L. Arnold, 1992; Dowling and DeMarais, 1993). The
detection of hybridization in past evolutionary events is best achieved by rigorous phylogenetic
analysis, which involves using several independent data sets to identify mosaics of characteristics
contributed by different taxa (Rieseberg and Brunsfeld, 1992). This can be accomplished readily by
combining direct sequencing and restriction site analyses (for generating a large number of
characters and assessing intra- and interpopulational variation, respectively).

At this time, mtDNA and cpDNA, often in combination with allozyme variants, have been the markers
most frequently used in such studies, due mainly to their ease of application. The typically
maternal mode of inheritance for these characters makes them particularly useful for studies of
hybridization, because they provide a means for identifying the maternal form involved in the
production of hybrids and the assessment of directionality of introgression.
Y-chromosome-specific sequences can provide a similar haploid marker for tracing the male
contribution (Vanlerberghe et al., 1986; Tucker et al., 1992; Lundrigan and Tucker, 1994), although
their use has been limited. Autosomal nuclear DNA markers, while used less often than organelle
systems, also have proven to be informative (e.g., rDNA: R.J. Baker et al., 1989; RAPDs: M.L.
Arnold et al. 1991; and anonymous nDNA loci: Degnan, 1993b; Parsons et al., 1993). We expect that
nuclear DNA markers will be applied more regularly in the future, especially for organisms in which
limited allozyme variation exists. With the above caveats, RAPD markers potentially will be of
great use for future studies of hybridization because of the efficiency with which species-specific
markers can be devel-
phism~,making these methods ideally suited for oped (Rieseberg and Ellstrand, 1993). This is pri-
studies of hybridization (reviewed by Harrison, marily due to the "universality" of RAPD primers
1990; Rieseberg and Ellstrand, 1993). across taxonomic groups, combined with the
In addition to its use as a tool in evolutionary many loci typically amplified by each primer. In
addition, bulked segregant analysis (Michelmore et al., 1991) can increase the efficiency of marker detection. This method involves pooling the DNA from individuals of each parental species and screening the bulked samples for polymorphisms.

DNA-based characters are not without their limitations. Although haploid markers allow for direct examination of the maternal/paternal component of hybridization, they are useful only when used in conjunction with other character sets, such as allozymes (Chapter 4) or other DNA markers. When using diploid markers, it is essential that their mode of inheritance is understood. For example, when applying rDNA characters, it is important to consider the effect concerted evolution could have on the distribution of variants within individuals and populations and the estimation of deviations of observed numbers of hybrids relative to those expected (M.L. Arnold et al., 1987a; Hillis et al., 1991~). The phenotypic nature of RAPDs (multiple bands and typically dominant expression of alleles) makes it imperative that breeding studies be conducted in order to understand patterns of heritability.

Species-Level Comparisons

As with other levels of comparison, the ideal is to find characters that vary among, but not within, the groups being studied. Further, differences among groups should not be so large that convergence, parallelism, or non-homology obscure the true phylogeny. The choice of characters or sequences for analysis is critical in achieving this balance. Due to the rapid rate of evolution and the finite number of character states, homoplasy is likely to be common for length-associated characters such as those assayed by microsatellites and minisatellites. Because of the problems of repeatability and homology discussed previously, RAPDs are unlikely to be useful for phylogenetic studies (see also Clark and Lanigan, 1993; Hillis, 1994; J.J. Smith et al., 1995). Therefore, the discussion below will focus on the use of RE characters for phylogenetic inference.

Sequences with rapid evolutionary rates yet moderate to low intraspecies polymorphism are most appropriate for analyzing relationships among closely related species, whereas those with slower evolutionary rates may provide useful characters for studying relatively ancient divergences. Levels of variation should be assessed in a pilot study (see Chapter 2), with final design including adequate samples below the level of interest to assess the impact of geographic variation (Smouse et al., 1991). Population sampling should take into account other characters as well as geographic information, with special effort made to include samples from morphologically/genetically distinctive as well as geographically dispersed populations.

An ideal approach to such a study would combine direct sequencing (see Chapter 9) and RE analysis. Nucleotide sequences for each major lineage can be used as a guide for surveys of RE site variation, allowing for fast and efficient quantification of levels of variation within lineages. It should be restated that the phylogenies produced are of the molecules and may differ from the organismal phylogeny for various reasons, including introgression, gene conversion, and sorting of polymorphism (reviewed by Avise, 1994).

Animal Mitochondrial DNA

The application of mtDNA RFLPs to phylogenetic analysis of congeneric species has been reviewed extensively (A.C. Wilson et al., 1985; Birley and Croft, 1986; Avise, 1986; Moritz et al., 1987; Avise, 1994). In general, the approach has proven useful for resolving relationships of closely related species. Phylogenetic analysis of mtDNA restriction sites also has identified the bisexual species that acted as the maternal parent of hybrid-parthenogenetic species (W.M. Brown and Wright, 1979; reviewed in Avise et al., 1992b; Moritz et al., 1992b) and the existence of past introgression (Dowling and DeMarais, 1993).

The main problems encountered in such studies stem from sorting of polymorphism, where recently separated species are being compared, and from high levels of noise (homoplasy), where distantly related species are examined. Using simulation studies, Neigel and Avise (1986) showed that sequences from recently separated monophyletic sister taxa appear polyphyletic initially,
then paraphyletic, and then monophyletic, as the original polymorphic lineages are terminated and replaced by variants unique (i.e., apomorphic) to each taxon. The simulations indicated that, for a haploid marker such as mtDNA, this process may take 4Ne generations, where Ne is the effective population size. However, the time frame is also likely to be affected by the amount and distribution of polymorphism within each species, the geographic mode of speciation, and the demographic history of the two species (see also Avise et al., 1984, 1988). This problem is not restricted to mtDNA or recently separated taxa. W.S. Moore (1995) and Slade et al. (1994) found that ancestral polymorphisms in mtDNA are eliminated more rapidly than those at nDNA loci, presumably because of the reduced effective population size and higher mutation rate of the former. Theoretical studies (Pamilo and Nei, 1988) indicate that if an ancestral taxon was highly polymorphic and multiple speciation events occurred over a short time relative to effective population size, then the probability of obtaining the correct topology from a single sequence is low. This has undoubtedly contributed to the debate over the phylogeny of higher hominids as deduced from mtDNA and other sequences (reviewed by Holmquist et al., 1988a,b) and created difficulties in resolving relationships among African rift lake cichlids (Moran and Kornfield, 1993). There seems no obvious solution to the problem of polymorphism. However, it does stress the need for adequate sampling of geographic populations and gene loci for phylogenetic analyses. If there is a strong geographic component to intraspecific polymorphism, inadequate sampling may lead to erroneous phylogenies (Smouse et al., 1991).

Homoplasy can be a substantial problem where distantly related taxa are compared. This is true particularly where comparisons are restricted to fragment sizes rather than mapped cleavage sites. The upper limit to useful RFLP comparisons of mtDNA presumably is set by constraints on sequence evolution. Sequence comparisons indicate that primate mtDNAs reach a plateau of sequence divergence at about 25% (W.M. Brown et al., 1982). It is important to note, however, that the position of this plateau may vary among taxa. For example, Drosophila (DeSalle et al., 1987a) and minnow (Dowling et al., 1992b) mtDNA sequences seem to plateau at only 8% and 10% sequence divergence, respectively. Once these levels are reached, further base substitutions are concentrated at positions that have already changed, which is likely to increase homoplasy among RFLPs. Since the plateau point varies among taxonomic groups and among genes (Zhu et al., 1994), so will the ability to resolve phylogenetic relationships. Therefore, it is essential to complete a carefully designed pilot study (Chapter 2) before embarking on a large-scale study. Where homoplasy does appear to obscure relationships, it may be possible to improve the signal-to-noise ratio by restricting comparisons to a slowly evolving region (Dowling and Brown, 1989).

Chloroplast DNA

Nucleotide sequence divergence values for cpDNAs of congeneric species typically range up to 2.0% (see references in Palmer, 1987; Palmer et al., 1988b; D.J. Crawford, 1989). Given a typical genome size of 150 kb, sampling with 10-20 REs that cleave from 20-100 times each will allow coverage of 1-5 kb of sequence, which usually is adequate to produce a highly resolved phylogeny. Thus far, such phylogenies have been relatively untroubled by problems of homoplasy (Givnish and Sytsma, 1995) and have contributed to a better understanding of a host of phylogenetic problems, including the identification of crop plant origins from wild species, identification of the maternal and paternal ancestry of a number of hybrid and polyploid species, detection of unsuspected cases of introgression, and identification of the progenitor genus of a putatively monotypic, morphologically isolated genus (reviewed in Palmer, 1987; Palmer et al., 1988b; D.J. Crawford, 1989; Olmstead and Palmer, 1994; Soltis and Soltis, 1994; Sytsma and Hahn, 1994).

In situations where the quantity of DNA is limiting, where extensive population sampling is required, or where rearrangements make mapping difficult, digestion of PCR-amplified cpDNA fragments with frequent-cutting enzymes may be the method of choice (e.g., Rieseberg et al., 1992; Liston, 1992). Because the entire chloroplast
genome has been sequenced for several plant species, it is now possible to generate universal primers for almost any portion of the genome. Thus, rapidly evolving non-coding sequences can be chosen for comparison of closely related species, whereas more slowly evolving sequences can be amplified for more divergent taxa. Nonetheless, this method is limited by several factors including: (1) the difficulty of amplifying large cpDNA regions (2-5 kb) in sufficient quantity for digestion (although this may be overcome using "long-PCR"); (2) the laborious nature of double-digest mapping of complex fragment profiles generated by four-cutter enzymes; and (3) the limited phylogenetic resolution typically obtainable from a single region. Mapping efficiency and choice of restriction endonucleases can be greatly enhanced by the availability of sequence data for at least one of the taxa under study. In addition, several regions can be amplified and digested to increase phylogenetic resolution.

Early studies of cpDNA restriction site variation within a genus were accomplished by direct inspection of restriction fragment patterns of purified cpDNA. However, most current efforts use a transfer hybridization approach in which cloned cpDNA fragments are hybridized sequentially to filter blots containing digests of genomic DNA (see protocols). Although more laborious, this approach has two main advantages. The use of total DNA as compared to cpDNA has major advantages with respect to yield (therefore much less starting material is required) as well as extraction flexibility and adaptability (see Palmer et al., 1988b for a fuller discussion). By probing with cloned portions of the chloroplast genome, the complexity of the fragment patterns is greatly reduced, allowing a more critical analysis of fragment differences in terms of discrete mutations and often permitting the direct mapping of restriction fragments and sites. Fortunately, many complete clone banks are readily available for a wide range of land plant cpDNAs (reviewed in Palmer, 1986a; Palmer et al., 1988b; Olmstead and Palmer, 1994), with a well-characterized bank from the completely sequenced genome of Nicotiana generally being the most useful (Olmstead and Palmer, 1992).

Studies using restriction site variation of cpDNA for interspecific phylogenetic analysis have encountered several problems. As with animal mtDNA, the sorting of ancestral polymorphisms can lead to discordance between cpDNA trees and organismal phylogenies (reviewed in D.E. Soltis et al., 1992; Doyle, 1992). This problem appears to be less severe for cpDNA than for animal mtDNA, however, because of the low rate of cpDNA evolution, low effective population sizes of many plant groups, and resulting low levels of intrapopulational and intraspecific cpDNA polymorphism. Conversely, due to the high potential for interspecific gene flow in plants, hybridization and introgression may be a greater problem for phylogenetic analysis of cpDNA variation than animal mtDNA (Rieseberg and Soltis, 1991). Fortunately, this problem is readily solved by adequate geographic sampling for phylogenetic analysis and by comparing cpDNA trees with phylogenetic hypotheses based on other data sets. Another common problem for interspecific phylogeny is the conservative nature of cpDNA evolution, which often limits resolution among closely related species (e.g., Schilling and Jansen, 1989; Rieseberg et al., 1991; D.E. Soltis et al., 1991; Mummenhoff and Koch, 1994). In some instances, resolution can be increased by sampling the genome with additional restriction endonucleases, particularly endonucleases with four-base recognition sites (four-cutters), which typically cut cpDNA much more frequently (Olmstead and Palmer, 1994).

Nuclear Genes

Some single-copy nDNA sequences have been compared among species by RFLP analysis (e.g., ADH among Drosophila; Langley et al., 1981; Bishop and Hunt, 1988; Y-chromosome markers in Mus; Tucker et al., 1989), but the data are too few for particular advantages and limitations to be identified. RFLP analysis of multigene families is exemplified by studies of globin variation among primates (Zimmer et al., 1980; Barrie et al., 1981). Analysis of multigene families requires particular care when using heterologous probes because low stringency hybridization is likely to detect variation in duplicate copies as well as the target se-
quence. It then becomes important to distinguish between variation in orthologous (shared by descent) and paralogous (duplicate) copies for phylogenetic analysis. Even if this distinction can be made (e.g., by relative intensities of hybridization; see Barrie et al., 1981), gene conversion among members of a multigene family (e.g., Slightom et al., 1987) could still cause the gene tree to differ from the species tree.

There are also a number of studies that have used RFLP variation at numerous anonymous nuclear loci for phylogenetic inference among species (Song et al., 1988; Miller and Tanksley, 1990b; Kesseli et al., 1991; Jena and Kochert, 1991). Because each locus can be considered a potentially independent estimator of phylogenetic relationship, this approach greatly reduces the problems of phylogenetic sorting and hybridization associated with gene trees. However, much nuclear RFLP variation appears to result from insertions, deletions, and rearrangements. Thus, fragment profiles generated by different endonucleases with the same probe are often correlated, suggesting that it may be best to use many probes with a single enzyme each. Anonymous nuclear loci also will be subject to the problem of orthologous versus paralogous variation discussed in the preceding paragraph.

The most comprehensive studies of nuclear RFLP variation are on crop plants and their relatives and involve Brassica (Song et al., 1988, 1990), tomato (Miller and Tanksley, 1990b), and lettuce (Kesseli et al., 1991). For example, phylogenetic analysis of eight species of tomato with 40 RFLP loci generated two clusters, corresponding to self-incompatible and self-compatible species. In addition, red-fruited tomato species formed a cluster within the self-compatible species group. It is noteworthy that an earlier cpDNA-based phylogeny for the group did not resolve species into self-incompatible and self-compatible clades (Palmer and Zamir, 1982), but did support a clade of red-fruited species. Which tree represents the "true" phylogeny for tomatoes remains unclear.

Of repeated genes, the rDNA cistrons have been used most widely in interspecific comparisons (e.g., Coen et al., 1982; G.N. Wilson et al., 1984; M.L. Arnold et al., 1987a; Mindell and Honeycutt, 1990; Rieseberg, 1991). The variation revealed in these studies was typically, although not exclusively, in the transcribed or non-transcribed spacers and was due to length mutations or to the gain or loss of cleavage sites. The phylogenetic information obtained from these studies typically has been consistent with previous studies. However, M.L. Arnold et al. (198%) found that divergence of a highly repeated sequence was inconsistent with other evidence on the relationships among subspecies of Caledia captiva, and attributed the discrepancy to historical introgression.

Higher-Level Systematics

Investigations at this level have used both changes in cleavage sites and gross structural rearrangements as characters for phylogenetic analysis. However, in contrast to sequence data (see Chapter 9), there have been relatively few applications of RFLPs to higher-level systematics.

Animal Mitochondrial DNA

Although sequence evolution of animal mtDNA typically is rapid, certain aspects are highly conserved. These include mtDNA structure (Bridge et al., 1992), gene order, genetic code, and the secondary structure of tRNA and rRNA sequences (reviewed by Wolstenholme, 1992). The order of mtDNA genes varies considerably among phyla, with the position of tRNA genes more variable than other coding sequences. There are some indications of minor variations within classes or phyla (e.g., in vertebrates, Pääbo et al., 1991; Desjardins and Morais, 1991; Seutin et al., 1994), making it imperative to further investigate within-group diversity before applying gene order as a tool for estimating relationships among phyla (W.M. Brown, 1985; e.g., Sankoff et al., 1992; M.J. Smith et al., 1993).

Aside from structural changes, some coding sequences (reviewed in Brown, 1985) may be conservative enough to provide characters useful for phylogenetic analysis among genera. However, sequencing rather than fragment analysis is clearly the method of choice here.
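To make the idea of gene order as a phylogenetic character more concrete, the following is a minimal sketch of one simple way to compare two circular gene orders: counting the gene adjacencies present in one genome but absent from the other (often called breakpoints). It is an illustration only and is not taken from the studies cited above. The gene labels and orders are hypothetical, and the comparison ignores strand orientation.

```python
def adjacencies(order):
    """Return the set of unordered neighbouring gene pairs in a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoint_count(order_a, order_b):
    """Count adjacencies present in order_a that are absent from order_b."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("both orders must contain the same genes, each exactly once")
    return len(adjacencies(order_a) - adjacencies(order_b))


# Hypothetical gene labels and orders, for illustration only.
reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
inverted = ["g1", "g2", "g5", "g4", "g3", "g6"]    # segment g3..g5 inverted
transposed = ["g1", "g3", "g4", "g2", "g5", "g6"]  # g2 moved to a new position

print(breakpoint_count(reference, reference))   # 0
print(breakpoint_count(reference, inverted))    # 2
print(breakpoint_count(reference, transposed))  # 3
```

Under this simple measure a single inversion disrupts two adjacencies and a single transposition disrupts three, which is one reason within-group variation in gene order needs to be surveyed before such counts are read as evidence of relationship.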
Chloroplast DNA

Comparative restriction site mapping of cpDNA can be used successfully at the highest levels within many families of flowering plants (e.g., Jansen et al., 1991; Olmstead and Palmer, 1992). In general, whole-genome analysis cannot be used above the family level, or across such diverse families as the Fabaceae and Onagraceae. This restriction is due to excessive DNA divergence, both in sequence and in structure.

Recently, however, it has been shown that the chloroplast inverted repeat region, which is typically 25 kb in size, can be mapped reliably for entire orders and even subclasses of angiosperms (Downie and Palmer, 1992a; Manos et al., 1993; Downie and Palmer, 1994). This is because (for reasons that are still unclear) the rate of synonymous substitutions is nearly four times lower in the inverted repeat than in single-copy regions of the chloroplast genome (Wolfe et al., 1987). This approach combines the use of smaller hybridization probes and more frequently-cutting restriction enzymes than typically are used in whole-genome studies.

Two other approaches for extracting phylogenetic information from cpDNA at higher levels are DNA sequencing and rearrangement analysis (reviewed in Downie and Palmer, 1992b). The general practice of DNA sequencing is discussed in detail in Chapter 9 and its application to cpDNA, in particular the chloroplast rbcL gene, is discussed by Clegg (1993) and Baum (1994) (also see all 14 papers in volume 80, issue 3, of the Annals of the Missouri Botanical Garden, which is dedicated to rbcL and seed plant phylogeny). Rearrangements that are useful as higher level characters are major events such as: (1) inversions (e.g., Jansen and Palmer, 1987b; Doyle et al., 1992), (2) deletions/insertions of introns (e.g., Downie et al., 1991), (3) partial or complete deletions of genes (e.g., Gantt et al., 1991), and (4) deletion of one entire segment of the large inverted repeat found in most chloroplast genomes (e.g., Lavin et al., 1990). Major rearrangements are quite rare relative to nucleotide substitutions and therefore cannot be expected to produce by themselves a fully resolved phylogenetic tree. However, their very rarity and lack of homoplasy compared to substitutions makes each rearrangement a relatively powerful character.

Gene and intron losses and gains are detected by a simple presence/absence test based on hybridization experiments (Figure 9) and as supplemented by PCR and sequencing analysis. Inversions and losses of the large inverted repeat are detected by hybridization and PCR-based assays that analyze linkage relationships between two or more probe fragments from the ends of the inversion or deleted repeat segment (Palmer et al., 1988b; for examples, see Jansen and Palmer, 1987a,b; Lavin et al., 1990; Doyle et al., 1992).

Figure 9 Detection of intron and gene losses during angiosperm cpDNA evolution. (A) Electrophoresis in a 0.9% agarose gel of cpDNA fragments produced by digestion with EcoRI (lanes 1 and 11), SacI-PvuII (lanes 2-6, 8, 9, 13, and 14), SacI-PstI (lanes 7, 12, and 15), and HindIII (lanes 10, 16, and 17). Filter replicas of the gel were made by bidirectional blotting and were then hybridized sequentially with the gene probes indicated in panels B-E. (B) Hybridization with a 772-bp fragment internal to and containing 90% of the coding region of the rpl2 gene from spinach. (C) Hybridization with a 545-bp fragment internal to and containing 82% of the intron of the rpl2 gene from tobacco. (D) Hybridization with a 1040-bp fragment containing 96% of the coding region of the rplA gene from spinach and 78 bp of 5' non-coding sequence. (E) Hybridization with a 209-bp fragment internal to and containing 45% of the coding region of the rpl22 gene from tobacco. (Reprinted from Palmer et al., 1988b.)

(A) Chloroplast DNAs

Lane
number  Species                    Family            Subclass
1       Zea mays                   Poaceae           Commelinidae
2       Narcissus tazetta          Amaryllidaceae    Liliidae
3       Aristolochia durior        Aristolochiaceae  Magnoliidae
4       Delphinium grandiflorum    Ranunculaceae     Magnoliidae
5       Eschscholzia californica   Papaveraceae      Magnoliidae
6       Pilea microphylla          Urticaceae        Hamamelidae
7       Spinacia oleracea          Chenopodiaceae    Caryophyllidae
8       Rumex obtusifolius         Polygonaceae      Caryophyllidae
9       Glycine max                Fabaceae          Rosidae
10      Medicago sativa            Fabaceae          Rosidae
11      Trifolium subterraneum     Fabaceae          Rosidae
12      Pisum sativum              Fabaceae          Rosidae
13      Aesculus californica       Hippocastanaceae  Rosidae
14      Pelargonium x hortorum     Geraniaceae       Rosidae
15      Brassica campestris        Brassicaceae      Dilleniidae
16      Nicotiana tabacum          Solanaceae        Asteridae
17      Lactuca sativa             Asteraceae        Asteridae

Panel labels: (B) rpl2 exon; (C) rpl2 intron

Nuclear DNA

Given the vast complexity of the nuclear genome, many nuclear genes should be useful for inferring higher-order phylogenetic relationships (see Friedlander et al., 1992). The best example of the broad utility of nuclear genes is the rDNA repeat unit, which has intervening sequences that vary within and between populations (see above) and coding sequences that are so highly conserved as to be useful for comparisons among widely divergent taxa (Mindell and Honeycutt, 1990; Hillis and Dixon, 1991), even among kingdoms (Pace et al., 1986; reviewed in Chapter 9). Once again, however, sequencing often is more efficient than fragment analysis and there have been comparatively few studies using RFLPs. (Hillis and Dixon, 1991, cited a few dozen restriction fragment
analyses of higher phylogeny using rDNA, as opposed to hundreds of sequencing studies.)

LABORATORY SETUP

Major equipment items needed for analysis of DNA fragments are included in Chapters 7 and 9. The most expensive of these is an ultraspeed centrifuge with appropriate rotors needed to prepare mtDNA of high purity. A programmable thermal cycler is needed for gene amplification reactions (Chapter 7) and is now routine and affordable equipment for a molecular systematics laboratory. DGGE requires a specialized electrophoresis chamber with recirculating temperature-controlled water. Chambers for routine gel electrophoresis can be made in-house at low cost (see below). Other essential items include an autoclave (or access to one), a fume hood, and a source of high-purity water. Single-distilled or deionized water can be used for rinsing glassware and making up electrophoresis buffers, but solutions used for preparing or manipulating DNA require even greater purity (i.e., sterile double-distilled or sterile distilled-deionized water).

Standard laboratory items that are used include glassware, including various sizes of beakers, graduated cylinders and pipettes, Erlenmeyer flasks, side-arm flasks, and bottles. High-strength, acid/solvent-resistant centrifuge tubes are needed for many applications. Disposable supplies include gloves, pipette tips, Pasteur pipettes, and microcentrifuge tubes.

Reagents generally should be of analytical reagent grade or better, although there are some exceptions (see below). In particular, chemicals used in the preparation and manipulation of DNA must be of high quality, as must the media used for electrophoresis. Commonly used reagents include: Tris base, sodium chloride, ethylenediaminetetraacetic acid (EDTA, disodium, dihydrate), sucrose, sodium dodecyl sulfate (SDS, ultrapure), cesium chloride (technical grade), propidium iodide, light mineral oil, hydrochloric acid, sodium hydroxide, isopropyl or isobutyl alcohol, proteinase (e.g., proteinase K or pronase), phenol (ultrapure), chloroform, isoamyl alcohol, ethidium bromide, RNase A, DNase I, DNA polymerases (e.g., Klenow fragment, Kornberg enzyme, Taq polymerase), restriction enzymes (see Table 3), bovine serum albumin (crude for addition to hybridization solutions, ultrapure for other applications), ethanol, sodium acetate, β-mercaptoethanol (BME), sorbitol, hexadecyltrimethylammonium bromide (CTAB), ammonium acetate, potassium chloride, dithiothreitol (DTT), agarose (ultrapure), acrylamide (ultrapure), bisacrylamide (ultrapure), ammonium persulfate (ultrapure), N,N,N',N'-tetramethylethylenediamine (TEMED), boric acid, urea, and sodium citrate.

PROTOCOLS

We describe protocols for the basic operations in fragment analysis. The more complex procedures for optimization and application of DGGE are described clearly and in detail by Myers et al. (1989) and Lessa (1993). Methods for SSCPs and the critical variables are described by Hayashi et al. (1991a,b), and the basic method is described in Chapter 9, Protocol 20. The experimental approach to, and methods for, PCR are elucidated in Chapter 7. The protocols given here conclude with a detailed exposition of restriction site mapping by double digestion or sequential hybridization.

1. Isolation of animal mtDNA using CsCl-PI gradients
2. Isolation of cpDNA using sucrose step and CsCl-EB gradients
3. Digestion of DNA with restriction enzymes
4. Agarose and polyacrylamide gel electrophoresis
5. Staining with ethidium bromide
6. α-32P 3' end-labeling of restriction fragments
7. Primer for microsatellite analysis
8. Transfer hybridization
9. Mapping restriction sites
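As a rough illustration of the logic behind restriction site mapping by double digestion, the sketch below computes the fragment sizes expected from single and double digests of a circular genome, given the cleavage positions of two enzymes. It is a toy model and not part of the protocols listed above. The genome size, enzyme names, and site positions are all hypothetical.

```python
def fragments(sites, genome_size):
    """Fragment lengths produced by cutting a circular genome at the given positions."""
    cuts = sorted(set(p % genome_size for p in sites))
    if not cuts:
        return [genome_size]  # an uncut circle stays as one molecule
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    lengths.append(genome_size - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(lengths, reverse=True)


# Hypothetical circular genome and cleavage positions (bp), for illustration only.
GENOME = 16500
sites_a = [1000, 7000, 12000]  # enzyme A
sites_b = [3500, 10000]        # enzyme B

single_a = fragments(sites_a, GENOME)
single_b = fragments(sites_b, GENOME)
double_ab = fragments(sites_a + sites_b, GENOME)

print(single_a)   # [6000, 5500, 5000]
print(single_b)   # [10000, 6500]
print(double_ab)  # [5500, 3500, 3000, 2500, 2000]
```

In this toy case the 6000-bp fragment from enzyme A is replaced in the double digest by 3500-bp and 2500-bp fragments, so enzyme B must cut within it, while the 5500-bp fragment persists unchanged and therefore contains no B site. Each digest also sums to the same genome size, which is a useful check that no small fragments have been missed.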
The optimal method of DNA isolation depends on the type of sequence to be assayed, the level of resolution desired, and the type and condition of tissues (Figure 4). Protocols for isolating total cellular DNA from plants and animals are given in Chapter 9, a method for large-scale DNA isolation is given in Chapter 6, and rapid protocols appropriate to some types of PCR are in Chapter 7. Here we concentrate on methods for purifying organellar DNAs by centrifugation. These protocols use CsCl gradients in conjunction with the intercalating dyes propidium iodide (PI) or ethidium bromide (EB). Mitochondrial DNA is obtained by lysing an enriched mitochondrial preparation and is separated from contaminating nDNA on the basis of conformation. Supercoiled mtDNA molecules bind less of the lower density dye than does linear DNA. The supercoiled molecules therefore have higher density in the presence of dye and band below the linear nuclear (and damaged mitochondrial) DNA in the appropriate CsCl gradient (C.A. Smith et al., 1971). Unlike mtDNA, cpDNA is large enough that effectively all the molecules are damaged such that they band with linear (nuclear) DNA; hence, CsCl gradients are used here only to purify DNA relative to other kinds of molecules (i.e., proteins). Techniques for isolation of fungal and plant mtDNA are given in W.W. Hauswirth et al. (1987). We remind readers that when it is difficult to obtain adequate amounts of appropriate tissue (see below), amplification via long-PCR (e.g., Cheng et al., 1994a,b) may provide an alternative, at least for the relatively small animal mtDNA.

Protocol 1: Isolation of Animal mtDNA Using CsCl-PI Gradients
(Time: Part A: 2-3 hr; Part B: 20-36 hr; Part C: 15-30 min; Part D: 4 hr; Part E: 24 hr; Part F: 24 hr; Total: 3-4 days)

The steps in this protocol reduce to two basic operations: (1) the preparation of a mitochondrially enriched fraction from a cell or tissue homogenate by differential centrifugation; and (2) the further purification of mtDNA from nDNA using CsCl-PI gradients. When using tissues unusually rich in mitochondria (e.g., amphibian or fish oocytes, avian cardiac muscle) the mtDNA obtained from repeated differential centrifugation (repeat steps 3-6, part A) may be adequately pure for most applications. However, in most cases it will be necessary to proceed to the CsCl-PI gradient. A single gradient (part B) usually will suffice, although to obtain maximum purity (e.g., for use as a probe), a velocitization step (part D) and a second CsCl-PI gradient (part E) are added. The velocitization differs from equilibrium gradients in that molecules are pelleted at a rate proportional to their size and largely independent of conformation. Thus intact mtDNA (and large nDNA fragments) can be purified away from smaller DNA fragments, RNA, and proteins. Velocitization also can be used to clean up partially degraded samples and to remove any contaminating DNases.

The yield and purity of mtDNA are highly dependent on tissue type and condition. Fresh tissues generally provide good yields (≈1 µg/g of tissue) so that amounts adequate for >50 digests (using end-labeling) can be obtained from relatively small amounts of tissue (~250 mg). The best source for animal mtDNA is unfertilized eggs; heart, liver, kidneys, gonads, and brain also provide good yields. For crustaceans, heart and pleopod muscle provide the best yields, but should be gently homogenized. Likewise, adductor muscle of bivalves and flight muscle of insects are adequate; all striated muscle should be homogenized and centrifuged in high volumes. Mitochondrial DNA has been isolated successfully from the white blood cells obtained from 200-250 ml of whole mammalian blood (W.M. Brown, personal communication) and 5-10 ml of whole reptile blood (L.D. Densmore, personal communication). Frozen tissues usually yield about half the amount of mtDNA relative to fresh material; this may be due to rupturing of mitochondrial membranes, which exposes the mtDNA to cytosolic DNases and reduces the efficiency of enrichment. Tissues should be removed from the freshly killed specimen and snap-frozen (below -70°C). Alternatively, if mtDNA is to be prepared within a few days, it may be preferable to store tissues at 4°C
284 Chapter 8 / Dowling, Moritz, Palmer & Rieseberg
in STES buffer (Appendix). However, storage of tissue in this buffer softens tissue considerably, making membranes more susceptible to breakage. This may be a function of the high concentration of EDTA, since Avise and coworkers (Lansman et al., 1981; Ball et al., 1988) report good results using their buffer, which contains less EDTA. Therefore, this strategy should be tested for the different combinations of tissue, species, and buffers. Yields from ethanol-preserved tissues are poor, possibly because of damage to the mitochondrial membranes (S. Palumbi, personal communication).

Part A. Preparation of Crude mtDNA

1. Sacrifice or, if frozen, partially thaw animals and remove tissues. If using only cells (e.g., blood), pellet and begin at step 7.

2. Homogenize thoroughly in cold STES buffer (see Appendix: 12 ml buffer/g tissue; 12 ml minimum). The concentration of EDTA may be adjusted, depending upon levels of DNase activity. EDTA inhibits DNases by chelating divalent cations required for their function. A good starting concentration is 100 mM EDTA. Isolations of mtDNA from organisms with high levels of DNase activity (e.g., mollusks) have been more successful using 200 mM EDTA in their grinding buffer, while initial studies of teiid lizards and terrestrial mammals worked well with 1 mM EDTA. It is important to note that increasing EDTA concentration decreases the stability of membranes. High EDTA concentrations limit the loss of mtDNA due to degradation, but mtDNA is also lost through membrane breakage, which makes it impossible to recover the molecules from the supernatant. Therefore, it may be necessary (particularly when working with small amounts of tissue) to determine empirically which EDTA concentration provides the best yields.

3. Centrifuge homogenate for 5 min at 1200 g, 4°C, to pellet nuclei and large cellular debris. This pellet may be saved for nuclear DNA extraction. When using large samples (>1 g), repeat this step until the pellet is the same size in two consecutive spins. For small-scale preparations (<5 g), go to step 6.

4. Transfer the supernatant to a 50-ml polypropylene or polyallomer screw cap centrifuge tube. Centrifuge at 23,000 g, 4°C, for 20 min to pellet mitochondria and other remaining cellular debris. Decant supernatant and drain pellet.

5. (OPTIONAL; for large amounts of tissue) Purify mitochondrial fraction on a 1.0 M/1.5 M sucrose step gradient as follows:

a. Re-suspend pellet in 20 ml 0.25 M sucrose (in ThE, Appendix).
b. Make the sucrose gradient by underlayering 10 ml of 1 M sucrose (in ThE) with 8 ml of 1.5 M sucrose.
c. Carefully overlayer the sample onto the gradient.
d. Centrifuge at 25,000 rpm (81,000 g), 4°C, for 1 hr (no brake) in a Beckman SW28 rotor (or equivalent).
e. After centrifugation, aspirate off the top of the gradient and carefully remove the mitochondrial fraction (appears as a band at the 1.0-1.5 M interface; see Figure 10A).
f. Re-suspend mitochondrial fraction in three volumes of ThE and centrifuge at 23,000 g to pellet.

6. Re-suspend the pellet (from step 4 or 5f) in 1.0 ml ThE at room temperature and mix vigorously. If the pellet volume is greater than 0.3 ml, re-suspend in 4 volumes of ThE.

7. Add 0.125 ml (1/8 re-suspended volume) 20% SDS (w/v in H2O) to lyse membranes, mix gently, and leave at room temperature for at least 10 min.

8. Add 0.188 ml (1/6 volume) CsCl-saturated water to precipitate nuclear DNA-SDS-CsCl, mix gently, and place on ice for at least 15 min. Larger samples (>1 g tissue) may require longer incubation times (i.e., overnight) to complete precipitation. (This mixture can be stored at 4°C overnight or longer at this point.)
Nucleic Acids III: Analysis of Fragments and Restriction Sites 285
9. Centrifuge at 17,000 g, 4°C, for 10 min. Transfer supernatant to an ultracentrifuge tube (if continuing) or a 5 ml culture tube (for storage). If the liquid is extremely viscous, force the solution through a 30-gauge needle six times to shear any nuclear DNA. (Do not shear the DNA if you plan on saving nuclear DNA.) This solution can be stored at -20°C after adding CsCl (step 1, Part B).

[Figure 10, panel (A): sucrose step gradient, from top to bottom: 10 ml of load (low-speed chloroplast pellet); diffuse band of lipids and protein; 8 ml of 30% sucrose; broad smear of chloroplasts; 17 ml of 52% sucrose; cell wall debris, nuclei, starch grains. Panel (B): CsCl gradient, from top to bottom: mineral oil; nuclear DNA; mtDNA.]

Figure 10 Purification of chloroplasts and mtDNA by gradient centrifugation. (A) Sucrose step gradient purification of chloroplasts. The sucrose step gradient purification of mitochondria appears the same, with the mitochondria banding in the same position as depicted for the chloroplasts (1.0-1.5 M sucrose interface). (B) CsCl-propidium iodide density gradient purification of mtDNA, prepared as a step gradient (1.40-1.70 g/ml). For the alternative method (1.55 g/ml), the RNA will pellet at the bottom of the tube.

Part B. Ultracentrifugation of CsCl-PI Gradient

If the sample volume is less than 1.5 ml, mtDNA can be isolated in a preparative ultracentrifuge (e.g., Beckman SW60Ti rotor or equivalent) using the rapid step-gradient method (steps 1-4). If smaller gradients are run (e.g., in a Beckman TLS-55 rotor), or if the sample volume is greater than 1.5 ml, follow steps 5-6.

Note: In this protocol, the intercalating dye is propidium iodide (PI) instead of ethidium bromide (EB). More PI can intercalate into a closed-circular molecule, allowing for visualization of smaller amounts of mtDNA. EB also may be used for isolation of mtDNA; however, higher dye concentrations are required for the same result. These dyes are mutagenic, so wear gloves whenever using them.

Table 6
Approximate amounts of PI and CsCl to adjust sample densities to 1.40 g/ml
V(initial) is the volume of sample prior to the addition of PI and CsCl.

1. Measure the sample volume and add the appropriate amount of solid CsCl to adjust the sample density to 1.40 g/ml (Table 6), i.e., 0.53 g of CsCl per ml of sample plus 0.12 g to account for the dye volume to be added (0.23 ml of 2 mg/ml PI in TE). Samples may be stored for months at -20°C by adding only the CsCl. The PI is added just prior to ultracentrifugation.

2. Add 0.23 ml of 2 mg/ml PI stock (in TE). Check the density of each sample by: (a) repeatedly weighing 1 ml of the solution, (b) accurately measuring the sample volume and weighing the sample, or (c) using a refractometer. Adjust to 1.40 g/ml by addition of water (if too heavy) or solid CsCl (if too light).

3. Place samples in ultracentrifuge tubes and check the volume of each. To form a step gradient, carefully underlayer the sample with 1.33 ml of 1.70 g/ml solution (Appendix) per ml of sample. Overlayer the step gradient with mineral oil to within 1-3 mm of the top (there should be at least 2 mm oil).

4. Put tubes into rotor buckets, carefully hook buckets onto rotor, and place rotor on drive shaft. For a Beckman SW60Ti rotor or equivalent, set run parameters to: temperature = 21°C; maximum temperature = 35°C (if adjustable); speed = 36,000 rpm (140,000 g); running time = 20-24 hr. (The running time is dependent upon the amount of DNA in the sample; the more DNA, the longer it takes for the sample to attain equilibrium. Larger samples may require more than 24 hr to reach equilibrium.) Now go to step 8 (see below).

5. For small volume gradients (total volume <2.5 ml; to be run in a Beckman TLS-55 rotor or equivalent) or for large initial volumes (>1.5 ml of sample; to be run in a Beckman SW60Ti rotor or equivalent), measure volume of supernatant from part A and adjust density to 1.52-1.57 g/ml by adding the amount of CsCl indicated in Table 7 (this includes the volume of PI to be added later).

6. Just prior to centrifugation, add the amount of 2 mg/ml PI needed to bring final concentration to 350 µg/ml (Table 7) and mix. Measure density of the solution. Final density should be 1.52-1.57 g/ml. If necessary, adjust by adding water or solid CsCl.
Table 7
Approximate amounts of PI and CsCl to adjust sample densities to 1.55 g/ml

V(initial) (ml)    PI (ml)    CsCl (g)
1.0                0.21       0.93
1.1                0.23       1.01
1.2                0.25       1.11
1.3                0.27       1.20
1.4                0.29       1.29
1.5                0.31       1.39
1.6                0.33       1.48
1.7                0.35       1.57
1.8                0.37       1.66
1.9                0.39       1.76
2.0                0.41       1.85
2.1                0.43       1.94
2.2                0.46       2.04
2.3                0.48       2.13
2.4                0.50       2.22
2.5                0.52       2.32

V(initial) is the volume of sample prior to the addition of PI and CsCl.
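The CsCl and PI amounts used in part B are simple functions of the starting sample volume. As a minimal sketch (not part of the original protocol), the Python below encodes the rule quoted in part B, step 1, for the 1.40 g/ml step gradient and looks up Table 7 for the 1.55 g/ml method. The function names are hypothetical, and the linear interpolation between tabulated rows is an assumption; the measured density should still be checked and adjusted as described in steps 2 and 6.

```python
# Rows of Table 7: (initial sample volume in ml, 2 mg/ml PI stock in ml, solid CsCl in g)
TABLE_7 = [
    (1.0, 0.21, 0.93), (1.1, 0.23, 1.01), (1.2, 0.25, 1.11), (1.3, 0.27, 1.20),
    (1.4, 0.29, 1.29), (1.5, 0.31, 1.39), (1.6, 0.33, 1.48), (1.7, 0.35, 1.57),
    (1.8, 0.37, 1.66), (1.9, 0.39, 1.76), (2.0, 0.41, 1.85), (2.1, 0.43, 1.94),
    (2.2, 0.46, 2.04), (2.3, 0.48, 2.13), (2.4, 0.50, 2.22), (2.5, 0.52, 2.32),
]


def step_gradient_cscl_g(sample_ml):
    """Solid CsCl (g) for the 1.40 g/ml method: 0.53 g per ml of sample plus 0.12 g."""
    return 0.53 * sample_ml + 0.12


def table7_amounts(sample_ml):
    """Return (PI stock in ml, CsCl in g) for the 1.55 g/ml method.

    Volumes that fall between tabulated rows are linearly interpolated,
    which is an assumption of this sketch rather than part of the protocol.
    """
    lowest, highest = TABLE_7[0][0], TABLE_7[-1][0]
    if not lowest <= sample_ml <= highest:
        raise ValueError("sample volume is outside the range covered by Table 7")
    for (v0, pi0, cs0), (v1, pi1, cs1) in zip(TABLE_7, TABLE_7[1:]):
        if v0 <= sample_ml <= v1:
            fraction = (sample_ml - v0) / (v1 - v0)
            pi_ml = pi0 + fraction * (pi1 - pi0)
            cscl_g = cs0 + fraction * (cs1 - cs0)
            return round(pi_ml, 2), round(cscl_g, 2)
    raise ValueError("sample volume is outside the range covered by Table 7")


if __name__ == "__main__":
    print(step_gradient_cscl_g(1.0))
    print(table7_amounts(1.5))
```

Keeping the table as data rather than fitting a formula means the sketch reproduces the tabulated rows exactly and refuses volumes the table does not cover.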
7. Place samples in tubes and fill to within 1-3 mm of the top with light mineral oil. Balance tubes to within ±0.02 g of each other. Run parameters for a Beckman TLS-55 rotor (or equivalent) are: 50,000 rpm (140,000 g), 21°C, and >20 hr. For the larger Beckman SW60Ti rotor (or equivalent), parameters are as in step 4, except that minimum run time is 36 hr.

8. To end the run, push "stop" with brake on and remove tubes from buckets.

Part C. Recovery of DNA

In room light, the nuclear DNA (actually all linear and relaxed circular DNA, i.e., including damaged mtDNA) should be visible as an intense red band. The band containing undamaged mtDNA, which is from 2-6 mm below the nuclear DNA band, probably will not be visible (Figure 10B). Bands of carbohydrate are white to light pink in room light and may be present below the mtDNA band. RNA is found at or near the bottom of the gradient.

1. Wear safety glasses or face shield and gloves. Using a long-wave (305 nm) UV light source, locate the mtDNA band. The mtDNA band is often not visible. In such cases, collect the area 2-6 mm below the main band.

2. Puncture the tube bottom with an 18-21-gauge syringe needle with a wire inserted in it (apparatus in Figure 11). Use the wire to dislodge (by pushing up) the small plastic plug that may clog the needle, then remove the wire from the needle. The flow can be regulated by placing a gloved finger over top of the tube.

3. Collect the mtDNA fraction in a 1.5-ml microcentrifuge tube. If the mtDNA is to be further purified (parts D and E) and the mtDNA bands are faint (or invisible), include the first drop of nuclear DNA as a reference point for further gradients. Otherwise, avoid contaminating the mtDNA fraction with any DNA from the top band. The top band DNA also can be collected and usually is adequate for a variety of uses (e.g., transfer-hybridization analysis of nuclear and mtDNA sequences, template for PCR amplification). Proceed to extraction and dialysis (part F), unless further purification (parts D and E) is desired.

[Figure 11: side view of the tube-puncturing apparatus: an interchangeable 21-gauge needle (1-5/8 in.) inserted through a threaded brass fitting (7/16 in.) that screws into the base.]

Part D. Velocity Centrifugation on a Step Gradient

1. Measure the volume of the sample collected from the equilibrium gradient and add an equal volume of TE (at least 2/3 of the sample volume) and mix. Addition of TE reduces the density of the sample below 1.40 g/ml, allowing it to be layered over the step gradient. Failure to add a sufficient volume of TE reduces efficiency of velocitization by preventing discrete overlayering of sample on the gradient. The combined volume of sample and TE should be less than 1 ml.

2. The sample is overlayered onto a step gradient consisting of two layers, 0.7 ml of 1.70 g/ml solution (Appendix) and a quantity of 1.40 g/ml solution (Appendix) determined by the volume of the diluted sample. The amount of 1.40 g/ml solution is calculated using the following formula:

Volume (in ml) of 1.40 g/ml solution = 3.8 ml - volume of diluted sample - 0.7 ml of 1.70 g/ml solution

3. Add the correct amount of 1.40 g/ml solution to an ultracentrifuge tube. Using a Pasteur pipette, underlayer this with 0.7 ml of 1.70 g/ml solution.

4. Carefully layer the diluted sample on top of the gradient, add light mineral oil to within 1-3 mm of the top and balance tubes to within 0.02 g.

5. Put the tubes into rotor buckets and place rotor (Beckman SW60Ti or equivalent) into ultracentrifuge. Centrifuge at 45,000 rpm, 21°C for 3.5 hr, with no brake.

Part E. Sample Recovery and Final Equilibrium Gradient

1. Puncture tubes as in step 2, part C. Collect the bottom 1.4 ml of the step gradient into a 1.5-ml microcentrifuge tube.

2. Put 1 ml of 1.55 g/ml solution (Appendix) into an ultracentrifuge tube, add the sample, and mix. Add light mineral oil and balance tubes as above.

3. Use the same centrifugation conditions as in step 7, part B, with run time reduced to 18-20 hr.

4. Recover sample as described in part C.

Part F. Extraction of Dye and Dialysis

1. To remove PI from a sample, extract with isopropyl alcohol (saturated with CsCl; top layer is the isopropyl alcohol) and spin briefly in a microcentrifuge. The saturated alcohol forms the top layer (pinkish from the dye) and is discarded after each extraction. Repeat this process until the sample (lower layer) is clear.

2. Place samples into 8-mm dialysis tubing (for preparation, see Appendix) and tie or clip tightly.

3. Dialyze against two changes of 2 L of 0.5× TE, for 24 hr.

4. Remove and store purified mtDNA (should be in 0.2-0.5 ml) at -20°C.
Protocol 2: Isolation of cpDNA Using Sucrose Step and CsCl-EB Gradients
(Time: Part A: 3 hr; Part B: 6-18 hr)

This method involves two steps: (1) purification of intact and broken chloroplasts using a sucrose step gradient, and (2) purification of the cpDNA released from the organelles, together with any contaminating nDNA and mtDNA, using a CsCl gradient with the intercalating dye ethidium bromide (EB). Although the sucrose gradient procedure does not give as pure cpDNA as the DNase I procedure of Kolodner and Tewari (1975), it is much more applicable to a wide range of plants for which it is difficult or impossible to prepare intact, DNase I-resistant chloroplasts, or for which tissue quantities are limiting. For details and modifications of this procedure, and for discussion of alternative procedures for purifying cpDNA, see Palmer (1986a) and Palmer et al. (1988b).

Part A. Isolation of Chloroplasts and Lysis

1. Use young, unexpanded green leaves if at all possible since they will have smaller cells than older fully expanded leaves, and hence will yield more DNA. If practical, prior to extraction, place plants in the dark for 1-4 days to reduce chloroplast starch levels. This usually is not essential.

2. Cut leaves into small pieces, 2-10 cm² in surface area. Wash cut leaves in tap water (if visibly dirty).

3. Place 10-100 g of cut leaves in 50-400 ml of ice-cold cpDNA isolation buffer (Appendix).

4. Homogenize in a blender for 3-5 5-sec bursts at high speed.

5. Filter through four layers of cheesecloth (with squeezing).

6. Centrifuge filtrate at 1000 g for 15 min at 4°C.

7. Re-suspend the pellet from 10-50 g of starting material in 5-8 ml of ice-cold wash buffer (Appendix) using a soft paint brush and vigorous swirling.

8. Load the re-suspended pellet onto a step gradient consisting of 17 ml of 52% sucrose overlaid with 8 ml of 30% sucrose, both in 50 mM Tris-HCl, pH 8.0, 25 mM EDTA. The overlay should be added with sufficient mixing to create a diffuse interface and thereby prevent trapping of nuclear material in the band of chloroplasts that forms at the 30%-52% interface.

9. Centrifuge the step gradients at 25,000 rpm (81,000 g) for 30-60 min at 4°C in a SW-27 (Beckman) or AH-627 (Sorvall) rotor.

10. Remove the chloroplast band from the 30%-52% interface (Figure 10) using a wide bore pipette, dilute with 3-10 volumes wash buffer, and spin at 1,500 g for 15 min at 4°C.

11. Re-suspend the chloroplast pellet in 1-2 ml wash buffer (or 15 ml if to be further purified).

12. Add 1/20 volume of a 20 mg/ml solution of self-digested (2 hr at 37°C) proteinase K and incubate for 2 min at room temperature.

13. Gently add 1/5 volume of lysis buffer (Appendix). Slowly invert tube several times over a period of 10-15 min at room temperature, then make the CsCl gradient (part B, below).

14. A cpDNA-enriched "total" DNA preparation can be prepared by re-suspending the pellet of the sucrose gradient in 1.5 ml of wash buffer, lysing (steps 12 and 13), conducting a clearing spin (10 min, 1,750 g), and CsCl banding (see below).

Part B. CsCl-EB Purification of cpDNA

This method is described for cpDNA, but is applicable to any crude DNA preparation. A smaller volume, more rapid protocol is described by Weeks et al. (1986).

1. Bring the DNA sample (e.g., chloroplast lysate, or re-suspended isopropanol pellet from a total DNA CTAB extraction, Chapter 9) to a volume of roughly 3 ml. Add 3.35 g of freshly powdered CsCl and dissolve by gentle mixing. Add EB to a final concentration of 200 µg/ml and enough distilled H2O to bring sample to a final volume of 4.45 ml and a final density of 1.55 g/ml.

2. Centrifuge for 4-16 hr at 220,000-290,000 g at 20°C in a vertical rotor (e.g., Sorvall TV-865, Beckman 65Vti).
3. Remove any scum (this will be considerable in the case of a directly banded chloroplast lysate) from the top of the gradient using a 1-ml pipette tip with the end cut off. Use a second 1-ml pipette tip with the end cut off obliquely to remove the visible band of DNA. This should be removed in as small a volume as possible (i.e., 0.5-1.0 ml).

4. If the DNA fraction is visibly dirty after the first gradient (as is often the case with direct banding of chloroplast lysates), it can be banded a second time. Simply bring the DNA/CsCl fraction to a volume of 4.45 ml by adding a premixed 1.55 g/ml solution of CsCl with 100 µg/ml EB and TE, and repeat steps 2 and 3.

5. Remove EB by three extractions with isopropanol (uppermost layer) as described in Protocol 1, part F.

6. There are two ways to remove the CsCl. Either dialyze (Protocol 1, part F) or ethanol-precipitate as described below.

a. Remove the aqueous layer from the third isopropanol extraction and add two volumes of H2O to dilute the CsCl. Mix gently and add 6 volumes of ice-cold ethanol to precipitate DNA. Place at -20°C for 30 min to overnight. Do not place at -80°C or the CsCl will precipitate.
b. Centrifuge at >1,750 g for 10 min to collect the DNA precipitate.
c. Wash pellet with 70% ethanol. Spin at >1,750 g for 2 min to collect the DNA.
d. Re-suspend pellet in 0.1-0.5 ml of TE.

7. Store the DNA at 4°C for short-term use and at -20°C for long-term use.

Protocol 3: Digestion of DNA with Restriction Endonucleases
(Time: Part A: 2-6 hr; Part B: 2-6 hr)

The activity of REs varies with temperature, pH, and salt (Na+, K+, Mg2+) concentration. However, it is usually possible to achieve acceptable levels of activity using a small range of buffers (supplied by the manufacturer) that differ in the final concentration of Na+. Digestion conditions for a particular enzyme may vary depending upon its source; therefore, the manufacturer's conditions should be consulted prior to use. REs vary widely in stability: those that denature rapidly are best used at relatively high concentration, whereas stable REs can be used at lower concentrations (1-2 U/sample) for extended periods (e.g., Crouse and Amorese, 1986). Digestion also can be improved by addition of bovine serum albumin (BSA), and the addition of spermidine has been found to assist in digestion of DNA samples containing impurities (e.g., Jeffreys, 1982). Because many REs are heat sensitive, they should be stored at -20°C, preferably in a frost-free freezer, and removed for as short a period as possible. The enzymes are stored in 50% glycerol to prevent denaturation by freezing. The glycerol can affect RE activity if present at greater than 5% of the final reaction mixture. Thus, the volume of RE added to a reaction should always be less than 10% of the total.

Part A. Digestion of Single Samples

1. For each sample, the final reaction volume should be 5-30 µl. For a single digest, add the following to a sterile microcentrifuge tube:

a. BSA (100 mg/ml solution) and appropriate buffer stock (typically provided by supplier as 10× stock) are added at 1/10 final volume.
b. Water (sterile, deionized, distilled) is added to dilute the reaction mixture to the calculated final volume (see below).
c. DNA according to amount required: 1-5 ng for end-labeling, 0.1-10 µg for staining or transfer-hybridization, depending on the sequence assayed and the size of fragment to be detected. The volume depends on concentration (e.g., mtDNA purified according to Protocol 1 can usually be used at 1-10 µl per digest for end-labeling).
d. 1-2 U of the appropriate RE. More may be needed for large amounts of DNA (>1 µg) or for heat-labile REs.

Example:
1 µl 10× buffer stock (1/10 final volume)
1 µl 1 mg/ml BSA stock (1/10 final volume)
5 µl DNA sample (depends on DNA concentration)
1-2 U of RE (volume varies with RE concentration)
H2O to final volume of 10 µl

2. Mix well, and incubate at 37°C (or higher temperature as recommended by suppliers). Digestion of purified mtDNA or cpDNA, or of PCR products, is usually complete in 1-3 hr, although some samples take longer or may require adding a second aliquot of enzyme after a few hours for complete digestion. Digestions of large amounts of total cellular DNA with expensive but long-lived REs are typically left overnight.

3. Remove from the incubator, spin briefly in microcentrifuge (not necessary for large reaction volumes), and place on ice or store in the freezer (indefinitely if desired) until needed.

Part B. Multiple Samples and Double Digests

If multiple samples will be digested with the same RE, prepare a "digest mix" and then add an aliquot (e.g., 3 µl) to each sample. For double digests involving REs with compatible salt requirements, an aliquot of each RE is added to the DNA sample, although for REs which are inhibited by high salt concentrations, the sample volume should first be increased by adding an equal volume of TE.

Example: To digest 14 DNA samples of mtDNA with volumes of DNA sample per digest varying from 3 to 7 µl.

1. Bring all samples to the same volume by adding TE (e.g., up to 7 µl).

2. Prepare a digestion mix sufficient for 14 samples with some allowance for pipetting error (e.g., [14 x 3] + 3 = 45 µl total). The amount of 10× buffer stock and BSA must include the volume of DNA as well as the other ingredients of the digestion mix. In this example, each tube will contain 7 µl of DNA and 3 µl of digest mix. Allowing for error, there is a total of 15 x 10 µl = 150 µl. Thus, the mix should contain:

15 µl 10× buffer stock
15 µl BSA stock (1 mg/ml)
15-30 units RE (add last)
H2O up to 45 µl

3. Mix thoroughly and aliquot 3 µl of the digest mix to each sample, mix, and incubate as above.

Protocol 4: Agarose and Polyacrylamide Electrophoresis
(Time: Part A: 1 hr preparation, 2-18 hr electrophoresis; Part B: 2 hr plus exposure time)

The fragments produced by RE digestion are separated according to size by electrophoresis through agarose or polyacrylamide gels. For analysis of double digests (e.g., for restriction site mapping) or analysis of single digests that produce small fragments (e.g., 4-bp REs) by end-labeling, each sample should be run on both types of gel to resolve fragments accurately over a wide size range (e.g., 10 kb-20 bp). Most other applications just use agarose gels (see "Electrophoresis of Fragments," above), although short (12 cm) 6-8% vertical acrylamide gels are useful for visualizing digested PCR products (e.g., Figure 5). For high-resolution separation of large (>10 kb) fragments, as required for multilocus minisatellite fingerprinting, use low concentration (e.g., 0.8%) agarose gels and low voltages (e.g., 2 V/cm; see Bruford et al., 1992 for details). For some applications, the electrophoresis buffers can be used at half strength (e.g., 0.5× TBE), resulting in considerable savings.

Part A. Gel Preparation and Electrophoresis

AGAROSE GELS Agarose gels can be run horizontally or vertically. Horizontal gels are used for most applications, e.g., staining and transfer hybridization; vertical gels are easier to dry and offer better resolution for autoradiography of end-labeled fragments. Since agarose of the high purity grade (i.e., most expensive) is necessary for electrophoresis of DNA, the gel usually is kept as small and thin as the application allows. Minigels (e.g., 50 mm x 100 mm) are often used for checking DNA samples, clones, or PCR products (Chapters 7 and 9). Larger gels (see molds in Figures 12, 13, and 14) are used for RFLP analysis.
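The digest-mix arithmetic of Protocol 3, part B (14 samples, 7 µl of DNA plus 3 µl of mix per tube, one extra reaction as pipetting allowance) can be reproduced with a short Python sketch. The function name, the dictionary keys, and the enzyme stock concentration of 10 U/µl are assumptions made for illustration; only the proportions (buffer and BSA at 1/10 of the total reaction volume, 1-2 U of RE per sample, water to the mix volume) come from the text.

```python
def digest_mix(n_samples, dna_ul, aliquot_ul=3.0, extra_reactions=1,
               units_per_sample=2.0, re_units_per_ul=10.0):
    """Volumes (µl) for a shared restriction-digest mix.

    re_units_per_ul is a hypothetical enzyme stock concentration; check the
    supplier's label for the real value.
    """
    n = n_samples + extra_reactions          # allowance for pipetting error
    reaction_ul = dna_ul + aliquot_ul        # final volume in each tube
    total_reaction_ul = n * reaction_ul      # buffer and BSA are scaled to this
    buffer_ul = total_reaction_ul / 10.0     # 10x buffer stock at 1/10 final volume
    bsa_ul = total_reaction_ul / 10.0        # 1 mg/ml BSA stock at 1/10 final volume
    re_units = n * units_per_sample
    re_ul = re_units / re_units_per_ul
    mix_ul = n * aliquot_ul                  # total volume of digest mix
    water_ul = mix_ul - buffer_ul - bsa_ul - re_ul
    if water_ul < 0:
        raise ValueError("aliquot too small to hold buffer, BSA, and enzyme")
    return {"buffer_ul": buffer_ul, "bsa_ul": bsa_ul, "re_units": re_units,
            "re_ul": re_ul, "water_ul": water_ul, "mix_ul": mix_ul}


if __name__ == "__main__":
    for name, value in digest_mix(14, 7.0).items():
        print(name, value)
```

With the worked example's numbers, the sketch returns 15 µl of buffer, 15 µl of BSA, and a 45-µl mix, matching the quantities listed in the protocol; the water volume depends on the assumed enzyme concentration.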
[Figure 12: dimensioned plan drawings (side, top, and end views, with parts lists) of the gel molder, buffer tanks, internal pieces, and combs. 1 gel rig = 1 gel holder + 2 tanks.]

Figure 12 Plans for a non-submarine type, horizontal agarose gel electrophoresis unit with agarose wicks (1 unit = 1 gel holder plus 2 tanks). The design of two types of gel combs is also shown. Gel rig plans are based on the published plan of McDonell et al. (1977), with modification by M. Murray, W. Thompson, R. Jorgensen, and J. Palmer. (Figure courtesy of Nanette Mussy and Jim Manhart.)
Nucleic Acids In: Analysis of Fragments and Restrictioiz Sites 293
/ APPARATUS GEL MOLD

OVERHEAD VIEW OVERI-TEAD VIEW
Notch for comb
9-3/4 Ill
+8 in. -----4
H
3 / 4 in.
SIDE VIEW
3/16 in. SIDE VIEW
a d -
14-3/4 in. jc
5/8 in. 1
1-3-1 /4 in. -1
depth
2-7 j8 in. l/i in.
notch
width
I 1/4 in. 1/E in.

M -Ii-
I 3 / 4 in.
END VIEW
t
13/16 inJJjU[-~9/16 in. Combs'have teeth
L 1 mm or 2 rnm thick
1. p
5/Btrn. 1/2 in. 20 teeth I
F
k--- -
8-3/8 in.
i k--- 8-1/ 4 in. 4
Figure 13 Plans for a submarine horizontal gel rig. Gel

mold is made of ultraviolet transparent plcxiglass for The steps involved are preparation o l a mold
use in EB staining.
and the agarose, pourlng the gel, inserting the
well-forming comb, and, once the agarose has set,
removal of tlte comb.
The range of fragment sizes that can be accu- 1. Preparation of the gel mold depends on the
rately measured varies with the concentration type of unit. Horizontal units have preformcd
(weight/volume) of the agarose. Fragments as molds wlzich are taped on opposite sides to
small as 100 bp can be visualized using 2.5% gels, contain the agarose solution and have combs
while fragments as large as 30 kb can be resolved inserted to form the wells (Figures 12 and 13).
in O.G% gels. Agarose cannot be poured easily at For large melds, tight taping across the top
concentrations of greater than 2.5%, although prevents warping. For vertical units (Figure
some special preparations (e.g., NU-Sieve'", FMC 14),two polished glass plates (one notched)
Corp,) can be used at much higher concentrations are separated by spacers and clamped to-
(at least 4%) for detecting smaller fragments.
Nucleic Acids III: A of Fragments and Restriction Sites 295
gether. The bottom is sealed using tape or by pouring an agarose plug while the mold unit is held vertically in a stand with a central well.

Figure 14 Plans for an adjustable vertical gel rig. Glass plates are 3.2-mm double-strength glass, 16.5 cm x 19 cm and 16.5 cm x 44.5 cm, in sets of two, where one has a notch in the top that is 1.9 cm deep x 14 cm wide (centered). Spacers for the agarose gel (small gel) are 2.0 mm thick. Spacers for the polyacrylamide gel (large gel) are 0.75 mm thick. Combs routinely have 16 wells for both gels.

2. Mix agarose, 10x stock of gel buffer (usually TBE or TAE, see Appendix), and distilled water. For example, to make 200 ml of a 1% gel, combine 2 g of agarose, 20 ml of 10x buffer, and make up to 200 ml with H2O. Mix the ingredients thoroughly in a flask and boil vigorously with intermittent swirling. If using a microwave, add a teflon-coated stir bar to avoid superheating. The preparation is ready when all of the particles have gone into solution. When cooking agarose (especially in a microwave oven), loss of water due to evaporation can be significant. Check the final volume, add water to replace that which has boiled away, and reheat briefly to ensure that the agarose is well mixed and dissolved. Molten agarose may be stored for several days at 70°C (or allowed to set at room temperature), or after sufficient cooling (when the flask is no longer too hot to handle; ≈50-55°C), can be poured into the vertical or horizontal mold. Pouring agarose that is too hot will crack the plates or warp the plexiglass mold.
3. Pour the slightly cooled agarose into the level mold. For horizontal gels, the comb should be in place prior to pouring. For vertical gels, insert the comb immediately after pouring and fix it in place by clamping the comb to the back plate. Let the gel set until it is cool to the touch and opaque.
4. Carefully remove the comb to prevent tearing of the wells or the teeth separating them. For vertical gels, squirting a small amount of buffer between the gel and each tooth of the comb often helps. Remove the tape from the mold, place the gel in the rig and submerge in buffer to prevent the gel from desiccating. For vertical gels, squirt molten agarose between the plate and rig before clamping together to provide a good seal against buffer leakage. The gel is now ready to use, or it may be kept as is for at least 1 day, as long as the wells remain immersed in buffer.
5. Prior to electrophoresis, wells should be tested by preloading with dilute (1x) running dye (Appendix) and electrophoresed for several minutes. In the case of vertical gels, thin layers of agarose need to be removed from the wells manually (Hamilton syringes work well for this) and by gentle rinsing.
6. Connect the electrical leads to the gel apparatus. DNA migrates to the anodal (positive) pole, therefore the wells should be closest to the cathodal pole (for vertical gels, anode at the bottom, cathode at the top).
7. Add 1/5 volume of loading solution (Appendix) to the sample (which should already be end-labeled if necessary). A size standard (e.g., HindIII- or AvaI/BglII-digested λ bacteriophage DNA) must be included on each gel. For analysis of minisatellites, an internal size marker revealed by hybridization can be included in each lane (Burke et al., 1991).
8. Using a Hamilton syringe or adjustable micropipettor, load each sample into the well, splitting samples between the agarose and acrylamide gels if both are used. Fragments are best resolved using low voltages (1.0-1.5 V/cm), although much higher voltages (≈10 V/cm) sometimes are used for rapid running of minigels. Full-length (20 cm) agarose gels are usually run overnight. Electrophoresis typically is stopped when the dye front (equivalent to ≈500 bp in a 1% gel) has reached the end of the gel. The gel mold is removed from the apparatus and the gel is treated to visualize the fragments (see "Staining" and "Gel Drying," below).
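
The worked example in step 2 (2 g of agarose and 20 ml of 10x buffer made up to 200 ml) is a simple proportion, and the same is true of converting a field strength in V/cm into a power-supply setting. The sketch below is illustrative only: the function names and the 30-cm electrode separation in the usage note are assumptions introduced here, not part of the protocol.

```python
def gel_recipe(final_ml, percent_agarose, buffer_stock_x=10.0):
    """Amounts needed for an agarose gel of a given volume and concentration.

    percent_agarose is weight/volume (1% = 1 g per 100 ml).
    buffer_stock_x is the concentration factor of the buffer stock (10 for 10x).
    """
    agarose_g = final_ml * percent_agarose / 100.0
    buffer_stock_ml = final_ml / buffer_stock_x  # dilutes the stock to 1x
    return {
        "agarose_g": agarose_g,
        "buffer_stock_ml": buffer_stock_ml,
        "final_volume_ml": final_ml,  # make up to this volume with water
    }


def supply_voltage(field_v_per_cm, electrode_distance_cm):
    """Voltage to set for a desired field strength.

    electrode_distance_cm is the distance between the electrodes of the rig,
    which is a property of the apparatus and an assumption of this sketch.
    """
    return field_v_per_cm * electrode_distance_cm


if __name__ == "__main__":
    print(gel_recipe(200, 1))        # the 200-ml, 1% example from step 2
    print(supply_voltage(1.5, 30))   # hypothetical rig with electrodes 30 cm apart
```

For the 200-ml, 1% example the function returns 2 g of agarose and 20 ml of 10x buffer, matching the text. Water lost to evaporation during boiling still has to be replaced by checking the final volume, as described in step 2.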
POLYACRYLAMIDE GELS Polyacrylamide gels are prepared at varying concentrations (typically 3.5-6.0%) and are used for visualizing small fragments (<1000 bp). Unlike agarose, polyacrylamide gels are run only vertically (Figure 14), and transfer from polyacrylamide gels to a hybridization filter must be done electrophoretically (Church and Gilbert, 1984; Kreitman and Aguadé, 1986). The following instructions are for the long (44.5 cm) 4% gels used for electrophoresis of end-labeled RE products; the procedures are similar for the short 6-8% gels used for analysis of digested PCR products.

CAUTION: Unpolymerized acrylamide is a cumulative neurotoxin and must be handled with extreme care. Always use gloves, and handle the powder in a fume hood while wearing a face mask.

1. Wash plates (one notched as for agarose gels) with ethanol. If the gel consistently sticks to both plates, spread silane (Appendix) on the top (notched) plate. An appropriate substitute is water-repellent for windshields. This product is much easier to use and can be purchased at many auto parts stores for a fraction of the cost of silane.
2. Place spacers between the plates and clamp tightly in place. Tape the bottom of the plate to complete the mold.
3. Wearing gloves, mix the appropriate amounts of bis:acrylamide, buffer, and distilled water (Appendix) in a flask.
4. Add 10% ammonium persulfate and TEMED to the mixture, mix by swirling, and immediately pour between glass plates. While pouring, make sure that no large bubbles form, as these will interfere with migration of fragments. When the mold is full, lay flat on a raised surface. Insert comb approximately 1-2 cm into the gel (depending upon sample volume to be loaded) and fix by clamping the two plates over the comb with a large binder clip. This minimizes the amount of polymerized acrylamide in the wells.
5. After 40-60 min, carefully remove the comb from the gel. If the walls of the wells break, they usually may be put back in place and still provide a good barrier between lanes.
6. Place the gel in the apparatus (Figure 14) as for vertical agarose gels. Fill the buffer tanks to prevent desiccation. The gel may be stored as is overnight or used immediately.
7. The remaining steps are as described for agarose gels, using a standard with fragments of appropriate size (e.g., HaeIII-digested φX174 RF DNA). Polyacrylamide gels can be run in a minimum of 4 hr or overnight. However, application of strong current (typically >300 V for a long gel) can severely distort the migration front due to differential heating. When the dye front has migrated the appropriate distance (28 cm for a 40 cm 3.5% gel), the gel mold is removed from the apparatus. The gel is then treated accordingly (see below).

Part B. Gel Drying and Autoradiography
When using gels to separate end-labeled fragments, it is best to dry the gels to a piece of chromatography paper (Whatman 3MM) before autoradiography. Dried gels are easier to handle and the fragment patterns are much sharper.

CAUTION: For 32P or 35S end-labeled fragments, the solution in the bottom tank of the gel rig contains the unincorporated nucleotides. Therefore, this solution is highly radioactive, requiring caution in handling and proper disposal.

1. Remove the gel mold from the apparatus. Remove a side spacer and carefully split the top (notched) plate away from the bottom with a spatula. For polyacrylamide gels, the gel will sometimes stick to both plates. The gel can be removed from either plate by gently squirting with water as the plates are separated.
2. For agarose gels, gently rinse the exposed side of the gel with water to remove excess nucleotide and reduce background contamination. Drain by tilting, allowing the water to run off. Excess water should be removed by gentle blotting with an absorbent wipe. This procedure is not typically necessary for polyacrylamide gels, but if performed, do not blot the gel dry as the gel will stick to the wipe.
Nucleic Acids III: Analysis of Fragments and Restriction Sites 297
3. Remove the gel from the glass plate by adhesion to the filter paper.
4. Rinse and blot the opposite side of the gel as previously described (step 2) and place a second piece of filter paper the same size as the first beneath the gel and filter paper. Cover the gel with plastic wrap, trim the plastic wrap and filter paper to the size of the gel, and place in the gel dryer. Apply vacuum and turn on heat. Vertical gels 1.5 mm thick usually dry in 30-45 min; thicker, horizontal gels take considerably longer. The gel is now fixed to the top piece of filter paper.
5. Remove plastic wrap and extra filter paper and dispose of as radioactive waste. Load the dried gels and film into an autoradiograph cassette. The number of intensifying screens to be used is determined by monitoring the gel with a Geiger counter (for 32P labeling) and past experience. At -70°C, intensifying screens enhance the intensity of the image (including the background contamination) by a factor of four (one screen) to ten (two screens). However, the use of two screens reduces the crispness of the image. If one intensifying screen is used, the orientation is intensifying screen (shiny side up), film, and the dried gel (gel side towards film). If two screens are used, the orientation is intensifying screen (shiny side up), film, intensifying screen (shiny side down), and the gels (gel side facing the film).
6. After exposure for the appropriate length of time (dependent upon the amount of DNA labeled, the efficiency of the labeling reaction, age and type of nucleotide used [32P or 35S], and the number of intensifying screens), the autoradiograph is developed, fixed, and allowed to dry.

Visualization of Fragments

Fragments usually are visualized by staining with EB (Protocol 5) or by end-labeling (Protocols 6 and 7), or transfer hybridization (Protocol 8). The relative merits and limitations of these alternatives were discussed above. A sensitive and rapid method for silver staining is provided by Bassam and Caetano-Anollés (1993).

Protocol 5: Staining with Ethidium Bromide
(Time: 30-45 min)

Fragments may be visualized using UV-fluorescing dyes such as ethidium bromide (EB, a powerful carcinogen), which bind to the DNA molecule. This is used to observe RFLPs where large amounts of purified or amplified sequence are available (e.g., Figure 5). Staining is also an important step in the transfer-hybridization method (Protocol 8). The method below is used to stain gels after electrophoresis. Alternatively, EB can be included in the gel mix or added to the electrophoresis buffer. The EB solution needs to be disposed of according to regulations for carcinogenic compounds and should be replenished every 1-3 days depending on usage.

1. Trim the gel (e.g., at slots and 4 cm below the bromophenol blue) and place on an acrylic plastic sheet.
2. Stain gel in 500 ml distilled H2O with 0.5 µg/ml EB for 10-20 min. Shake gently. Pour off EB solution and rinse for 1 min in distilled H2O.
3. Shake gel in second rinse of distilled H2O for 5-30 min to remove excess EB from gel.
4. Photograph gel (using a Polaroid™ camera or other instant visualization system) with a plastic ruler next to the size marker. If the photograph is to be enlarged or published, save a negative (for Polaroid™ negatives: wash with water, then sodium sulfite, then rinse with water).

Protocol 6: 32P 3' End-Labeling of Restriction Fragments
(Time: 30 min)

End-labeling of the DNA fragments produced by RE cleavage with radioactive nucleotides (32P or 35S dNTPs) can detect minute amounts of DNA,
enabling more digests per sample. α32P-labeled dNTPs are used most frequently because their high-energy emission results in relatively short exposure times for autoradiography. The alternative, 35S, has a longer half-life (about 87 days compared to about 14 days for 32P) and produces crisper images, but requires much longer exposure times, and contamination is more difficult to detect in the laboratory, requiring swipes and scintillation counts. Where several different REs, each with its own type of end, are used, it is simplest to use all four radiolabeled dNTPs for end-labeling.
The reaction uses the large (Klenow) fragment of DNA polymerase I, which has 5' -> 3' polymerase and 3' exonuclease functions (see "Methods of Detection"). The polymerase function is far more active than the 3' exonuclease. Labeling generally is carried out at room temperature or at 4°C. However, fragments with blunt ends or 3' overhangs (Table 2) are best labeled at 37°C to maximize the exonuclease activity. Under these conditions, randomly sheared fragments also may be labeled, thereby increasing background. This can be reduced by adding only the first nucleotide to be inserted (e.g., for RsaI digests, just add 32P-dTTP).

1. Prepare a labeling mix to be added to each sample. This consists of 10x label buffer (Appendix), radioactive dNTPs, the large (Klenow) fragment of DNA polymerase I, and distilled water. The amount of 10x label buffer added must take into account the volume of the digests as well as the labeling mix itself. For example, if 5 µl of label mix is to be added to each of 16 tubes (e.g., 15 digested DNA samples and a size standard) which already contain 10 µl of digest, the total volume, including an aliquot for pipetting error, is 17 x 15 = 255 µl. For this example, this mix would include:

25.5 µl 10x label buffer
5 U (≈0.25 U/sample) Klenow polymerase
2 µl of 800 Ci/mmol α32P-dNTPs (@ 0.5 µl = 5 µCi each)
ddH2O to 85 µl (i.e., 17 aliquots @ 5 µl each)

2. Add 5 µl of label mix to each sample and leave at appropriate temperature (see above) for 20-30 min.
3. Add 1/5 volume of loading dye to each sample. This can be mixed by vortexing or by gentle aspiration in the Hamilton syringe during loading.
4. Load samples into wells, splitting each between agarose and acrylamide gels.

Protocol 7: Primer Labeling for Microsatellite Analysis
(Time: 40 min)

Microsatellite loci typically are analyzed via PCR, and new primers should be designed and optimized as described in Chapter 7. A protocol for cloning microsatellite loci to determine appropriate primer sequences is given in Chapter 9. The PCR products are electrophoresed through denaturing polyacrylamide gels (as used for sequencing; see Chapter 9) and are best visualized by radioactive labeling, or, with an automated sequencing apparatus, using fluorescence. The protocol below is for preparing radiolabeled primers that can be used in combination with cold primers in the PCR reaction. The same method is used to prepare primers for cycle sequencing (Chapter 9).
The protocol requires γ-labeled ATP. Either γ-33P-dATP (1000-3000 Ci/mmol) or γ-32P-dATP (3000 Ci/mmol) nucleotides are used; γ-35S-dATP is not recommended because of the reduced efficiency of polynucleotide kinase (PNK) with this isotope.

1. Calculate volumes for an end-labeling reaction as follows: for each PCR reaction, select one primer to label and use a mix of 3:1 unlabeled to labeled primer (the second primer is unlabeled). Note that the labeling reaction will result in a primer stock at 0.1x the original concentration. Calculate the amount of labeled primer required for the number of PCR reactions and proceed with the labeling reaction.
2. For example, to prepare 10 µl of labeled primer, combine the following ingredients in a 0.5-ml microcentrifuge tube:
1.0 µl primer (10 µM; i.e., 10 pmoles)
1.5 µl γ-32P-dATP (i.e., 10 pmoles)
1.0 µl 10x T4 polynucleotide kinase buffer
0.625 µl T4 polynucleotide kinase (i.e., 5 U)
5.875 µl ddH2O

Mix, then incubate at 37°C [...], then denature the [...] at 65°C [...] min. Note: if using γ-35S-dATP, use 20 U PNK and incubate for much longer (e.g., 4 hr) at 37°C. Spin briefly to [...] any [...]. End-labeled primers can be stored at -20°C for as long as one month and can be used directly without any further preparation.
3. To use the labeled primer in a set of PCR reactions, for example 20 reactions each of 6.25 µl, prepare the following master mix, using precautions against cross contamination (Chapter 7):

64.6 µl ddH2O
2.5 µl 10 mM dNTPs
2.5 µl 10x Taq polymerase buffer
7.5 µl 25 mM MgCl2 (adjust as necessary)
4.0 µl 10 µM unlabeled primer 1
3.0 µl 10 µM unlabeled primer 2
10.0 µl 1 µM labeled primer 2 (from step 2)
20 µl DNA template (i.e., 1 µl per reaction)

Aliquot, add DNA template (including negative control) and proceed with thermal cycling. It is efficient to use multiwell trays for the reactions. PCR products can be stored at -20°C prior to electrophoresis on denaturing acrylamide gels.

Protocol 8: Transfer Hybridization
(Time: Part A: 3 hr to overnight; Part B: 1 hr to overnight; Part C: 6-24 hr; Part D: 2 hr plus exposure time)

This method consists of five basic steps that follow digestion and electrophoresis on agarose gels: (1) transfer of the DNA from the gel onto a filter; (2) prehybridization of the filter; (3) labeling the probe DNA, making it single-stranded, and hybridization; (4) washing the filter; and (5) autoradiography. Significant variations include transfer by vacuum instead of capillary action ("vacuum blotting"), transfer under alkaline conditions (Reed and Mann, 1985), production of radioactive probes by random priming instead of nick translation (Feinberg and Vogelstein, 1983), and modifications of the stringency of hybridization and washing (see Hames and Higgins, 1986). For some lower-sensitivity applications (e.g., excluding dextran sulfate from hybridization mix) or some types of membranes (e.g., non-charged membranes), the prehybridization step can be omitted without a substantial increase in background.
The most important aspect of hybridization is determining the appropriate conditions (stringency). When using heterologous probes, some base-pair mismatches must be permitted, with the amount of mismatch required dependent upon similarity of probe and target DNAs. Stringency can be reduced by lowering temperatures, by increasing salt concentrations, and by decreasing formamide concentrations. For more precise description of manipulation of these parameters, see Sambrook et al. (1989) and Hames and Higgins (1986).

Part A. Transfer of DNA to the Membrane
The electrophoresed fragments are made single-stranded by alkaline treatment and are then transferred in the same orientation from the gel to a binding membrane to which they are bound.

1. After agarose electrophoresis, stain the gel with EB and photograph with a ruler next to the size marker to allow fragment sizes to be determined from the final autoradiograph. Trim the gel to minimum size, slicing at the origin and 1-2 mm from the outside DNA lanes. For RE assays of mtDNA, cut the bottom at the 150-200 bp position. For minisatellites, gels are run until fragments of ≈2 kb are at the bottom of the gel. For large-scale survey work, the sizes of gels are calculated so that two (or sometimes three or four) fit precisely onto a single piece of film: e.g., two 20 cm x 12.5 cm gels will result in membranes that can be exposed together on a standard size (20 cm x 25 cm) piece of film.
2. If the membrane is to be cut into smaller pieces after transfer, mark between lanes with permanent ink to indicate positions of cuts. Mark a corner to orient gel and membrane.
3. If large fragments (e.g., >5 kb) are to be transferred, proceed with acidic depurination treatment (steps 3-4); for small fragments proceed directly to step 5. Shake gel gently in 0.25 M HCl until the bromophenol blue barely turns yellow. The exact time varies depending upon the thickness and percentage of the gel: 5-10 min usually gives the best results. The acid depurinates the DNA, breaking large fragments into smaller pieces for more efficient transfer. Prolonged exposure will result in excessive depurination, thereby producing small fragments that pass through the filter or hybridize poorly with the probe DNA. Insufficient exposure may result in incomplete transfer.
4. Pour off acid solution and rinse gel with distilled H2O for 1 min.
5. Shake gel in 0.4 M NaOH until the dye becomes blue again (≈10-20 min; can be much longer).
6. Pour off the NaOH and rinse gel with distilled H2O for 1 min.
7. Neutralize the gel by shaking in 3 M NaCl/0.5 M Tris-HCl, pH 7.5 for 30 min.
8. For double-sided transfer, cut out two pieces of hybridization membrane to the exact size of the gel or, at the most, 5 mm larger in each dimension. Nylon membranes are preferred over nitrocellulose as they are tougher and can be rehybridized many times. Membrane can be purchased in long (30 m) rolls (the width of two gels) to minimize wastage. Mark filters carefully on the edge with appropriate data (e.g., date, experiment number, etc.).
9. Cut out four pieces of robust filter paper (e.g., Whatman 3MM) to the same size as the gel.
10. Soak membranes in distilled H2O for >20 min; then soak membranes and filter paper in 20x SSC for 1-10 min prior to setting up the transfer.
11. Put together a symmetrical "gel-blot sandwich" (Figure 15) in the following order:

4-cm paper towels, trimmed to size of membrane and flat
2 pieces of Whatman paper
1 piece of nylon membrane
gel (do not slide it around)
1 piece of nylon membrane
2 pieces of filter paper
4-cm paper towel trimmed to fit
plexiglass or glass plate
weight

Any bubbles or creases will cause uneven transfer and should be removed by rolling with a Pasteur pipette or other suitable cylindrical object. Allow transfer to proceed for 3 hr to overnight.

Figure 15 Setup of transfer according to the two-sided, dry-blot method.

12. Disassemble the gel blot, taking care to mark the filters at any spots where they are to be cut. Also mark the filter to define orientation relative to the gel.
13. Shake membrane in 2x SSC for 10 min.
14. Air-dry for 30-120 min on filter paper. Some types of membrane may require further drying in a vacuum oven for 30 min at 80°C to irreversibly bind the single-stranded DNA fragments to the membrane. Alternatively, DNA can be cross-linked to the membrane by exposure to ultraviolet light, with a necessary energy of approximately 1.200 J. Note, however, that overexposure can result in reduced hybridization efficiency (Reed and Mann, 1985).
15. Trim off any extra filter outside of the desired image. Remember, filters must be an appropriate size for autoradiography. Slice filters into smaller strips as necessary. Store new filters at room temperature or 4°C until needed for hybridization.

Part B. Prehybridization of the Filter

1. Wet filters in 500 ml 2x SSC for 5 min.
2. If appropriate, remove probe from the previous hybridization by shaking the filter in 500 ml of boiling 0.1x SSC for 3-5 min. New membranes should be washed in 500 ml of 0.1x SSC, 0.5% SDS at 65°C for 1 hr to minimize background on subsequent hybridizations.
3. Remove excess solution from membranes and place in a heat-sealed plastic bag.
4. Prepare the hybridization solution (4x SSC, 1% SDS, 0.5% nonfat dry milk; or alternative, e.g., Church and Gilbert, 1984). First add the dry milk powder to distilled H2O to nearly full volume and mix gently, then add SDS (from 20% stock solution, millipore-filtered), and finally add the SSC (20x stock solution, also millipore-filtered). Cover flask and heat for 2 hr in a 65°C H2O bath. One or two filters in a plastic bag can be prehybridized (and hybridized) in 10 ml of solution. For large-scale experiments, 100 ml of solution will suffice for approximately 20 filters (each 12.5 x 20 cm) hybridized together in a plastic tub or in plastic bags, either individually or in small groups.
5. For tubs, add the hybridization solution and cover with a lid. For bags, add the hybridization solution, carefully remove any bubbles, and heat-seal 2-3 cm from the edge of the filter. This will leave room for additional sealing after adding the probe. Place all bags together in a single plastic tub with lid.
6. Shake gently for 2 hr to overnight at 65°C.

Part C. Labeling of Probe and Hybridization
Here we describe nick-translation (Rigby et al., 1977), one of several methods available for labeling DNA for use as probe in transfer hybridization. Random priming of DNA (Feinberg and Vogelstein, 1983) also is commonly used (this reaction is best performed using commercially available kits). It also should be possible to generate large quantities of labeled probe by incorporating α32P-dNTPs into PCR reactions. In any case, unincorporated nucleotides may be removed using steps described below (4-8) or any commercially available spin columns.

1. Prepare 10 µl of nick-translation buffer cocktail per reaction:

3.0 µl 10x nick-translation (NT) buffer (Appendix)
0.5 µl DNA Polymerase I (10 U/µl)
1.0 µl each of 5 mM dTTP, dATP, dGTP
1.0 µl α32P-dCTP (i.e., 20 µCi of 3000 Ci/mmol stock)
1.0 µl DNase I (0.1 µg/ml stock)
3.5 µl ddH2O
302 Chapter 8 / Dowling, Moritz, Palmer & Rieseberg
More of the 32P-dCTP can be added to achieve greater incorporation if necessary. Keep mixture on ice. Do not mix the cocktail vigorously as this may denature the enzymes.
2. Add 10 µl of nick-translation cocktail to each tube containing template DNA in a volume of 20 µl (>50 ng, typically 2 µl of plasmid DNA + 18 µl distilled H2O).
3. Incubate reactions for 2 hr in a 15°C water bath.
4. Prepare spun columns for removal of unincorporated nucleotide (for alternative method see Chapter 7, Protocol 3):
a. Heat G50-Sephadex™ (hydrated in STE + SDS [Appendix] and stored at 4°C) in 65°C water bath for 15 min.
b. Pack a small wad of glass wool into the bottom of a 1-ml syringe.
c. Fill each syringe with G50-Sephadex™ and let drip dry in a used 15-ml polypropylene tube.
d. Add additional G50-Sephadex™ and spin down briefly at 1750 g in a benchtop centrifuge. Repeat, if necessary, until the packed volume is 0.9 ml.
e. Fill columns with STE + SDS and centrifuge at 1750 g for 2 min. Repeat once. Do not allow the columns to run dry.
5. Stop reactions by adding 75 µl of STE + SDS and mix well.
6. Place the spun columns in new, labeled 15-ml polypropylene tubes.
7. Load each column with reaction mixture.
8. Centrifuge at 1750 g for 4 min and discard columns as radioactive waste.
9. Add 600 µl TE to effluent and denature by boiling for 10 min.
10. For hybridization in bags, inject probe into bag using a 1-ml syringe. Heat-seal the bag twice just above the filter to minimize total bag volume. Mix bag contents well. For tubs, dump probe directly in tub with filters and mix well.
11. Shake gently overnight at 65°C.

Part D

1. Prepare 4 L of filter wash buffer (2x SSC, 0.5% SDS). Add water first (3.5 L), then 20x SSC (400 ml and mix), then 20% SDS (100 ml).
2. If using a bag, cut off top of bag and remove the hybridization solution using a disposable 10-ml pipette. This can be stored for further use or discarded as radioactive waste. Slide the filter out and place it in a plastic tub with 500 ml of wash buffer at room temperature. Discard the bag into the dry radioactive waste container. If using a tub, remove each filter, one by one, into tub with wash buffer and store or discard the radioactive solution as above.
3. Shake filters for 5 min and discard the wash solution as radioactive waste. Wash once more.
4. Wash filters 2-3 times for 30-40 min at 65°C.
5. Remove the filters from the final wash and remove as much excess liquid as possible, but do not let the filters dry out as this makes it difficult or impossible to strip hybridized probe for subsequent rehybridization. Wrap filters in plastic wrap.
6. In a darkroom, load each filter into a cassette with film (see Protocol 4). The film can be marked to identify each filter. As for end-labeled gels, the exposure times and number of screens used are determined by the strength of the emission signal detected upon manual scanning of the wrapped filter with a Geiger counter.
7. Remove the cassettes from the freezer. Allow >15 min for them to defrost before developing. There is a danger of cracking and damaging the cassettes if they are opened for development when still partially frozen.
8. Develop and fix the film.

Protocol 9: Mapping Restriction Sites
(Time: variable)

A cleavage map can be constructed by determining the order of restriction sites for each RE and
their location relative to sites produced by other REs. This information greatly extends the phylogenetic applications of RFLP analysis and is essential for localization of length mutations. The particular approach used for restriction mapping and identification of mutations will depend on the size of the genome or genome region being studied, the amount of variation expected, and the number and size of restriction fragments generated. Three methods are commonly used. Two of these, partial digests and double digests, are appropriate particularly for small genomes or sequences (e.g., animal mtDNA). The third approach, sequential hybridization, is useful for much larger genomes such as cpDNA.

Figure 16 Illustration of methods for mapping cleavage sites. (A) Partial digestion. (B) Double digestion. Lanes 1-4: increasing digestion. Lane S: size standard, all fragments end-labeled.

Incomplete digestion of DNA results in a set of larger fragments, each one equal to the sum of two or more adjacent fragments. Therefore, the order of fragments can be determined by comparing the sizes of partially versus completely digested fragments (Figure 16). This method is particularly useful where there are several closely spaced cleavage sites for an RE not separated by sites for any other REs. The general approach is to label the
DNA at one end only and then to generate a series of partial digests either by varying digestion time or by serial dilution (Danna, 1980; Ausubel, 1987). Hillis et al. (1992) used a modification of this method for generating restriction maps of the T7 phage lineages used in their experimental phylogenies. In their analysis, partial digestion products were visualized by hybridization with synthesized oligonucleotide probes specific to each end of two conserved fragments (four in all).
Multiple digestion experiments compare the fragment patterns of REs used alone and in combination. This enables the cleavage sites of the different REs to be located relative to one another (Figure 16). This approach works well for sequences of up to 30 kb (e.g., animal mtDNA and nuclear rDNA) and for REs that make relatively few cuts.
The third approach, sequential hybridizations (e.g., Figure 9), is particularly useful for large (>30-kb) sequences such as entire cpDNAs (Palmer, 1982, 1986a), although it also can be used to map cleavage sites in smaller sequences (e.g., Sites and Davis, 1989). The basic strategy is to sequentially hybridize a series of radioactive fragments ("probes"), which together make up the entire sequence, to fragments produced by single and double digests after the latter have been transferred to nylon membranes. Adjacent fragments will hybridize to the same probe whereas physically separated fragments will not. By hybridizing to both single and double digests, it is possible to deduce the order of fragments produced by each RE and the relative location of cleavage sites for different REs (see Palmer, 1982). Methods for double digesting and sequential hybridization are given below.

Part A. Double Digestion Experiments

1. Determine which samples and REs need to be mapped. REs typically are selected by cost, expected number of cleavage sites, and compatibility of buffers in double digests. To start, the fragment pattern for each RE is compared across representative samples (e.g., one per locality). Note that not all samples will need to be mapped for all sites.
2. Perform double digests (Protocol 3) for the reference sample to map all sites relative to each other. Typically, the best sample to use as a reference is the one that exhibits the most cleavage sites, as this maximizes the ability to infer losses in other samples. One strategy is to begin with all pairwise combinations of three to five REs that cleave only a few sites (i.e., up to three) and which cut in at least two buffers. Fragments are end-labeled (Protocol 6), separated by electrophoresis through agarose and polyacrylamide gels (Protocol 4), and visualized by autoradiography (Protocol 4).

As an example, consider a hypothetical circular molecule of 10 kb. Assume that this molecule was digested with enzymes A, B, C, and D (see Table below) in all possible single- and double-digest combinations, and that fragments were separated by electrophoresis along with appropriate size standards. Migration distance for each fragment was measured and the number of fragments found in each lane was noted. Fragment sizes were estimated from calibration curves generated by plotting migration distances of the size-standard fragments against their known fragment lengths (Figure 3).

Gel lane             1     2     3     4     5     6     7     8     9     10
Enzymes              A     A-B   A-C   A-D   D     B     B-D   B-C   C     C-D
Number of sites      2     3     4     3     1     1     2     3     2     3
Sizes (kb)           6.0   4.0   3.5   6.0   10    10    5.5   5.0   6.5   6.0
                     4.0   4.0   3.0   2.5               4.5   3.5   3.5   3.5
                           2.0   2.5   1.5                     1.5         0.5
                                 1.0
Estimated size (kb)  10    10    10    10    10    10    10    10    10    10

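
The internal consistency of such a table can be checked mechanically. The following sketch is offered as an illustration rather than as part of the published protocol: it confirms that the fragments in every lane sum to 10 kb and tests whether one candidate site map reproduces all ten lanes. The site coordinates in candidate_map (with the B site placed arbitrarily at 0 kb) and all function names are assumptions introduced here.

```python
GENOME_KB = 10.0

# Fragment sizes (kb) transcribed from the table, keyed by the enzymes used.
observed = {
    ("A",): [6.0, 4.0],
    ("A", "B"): [4.0, 4.0, 2.0],
    ("A", "C"): [3.5, 3.0, 2.5, 1.0],
    ("A", "D"): [6.0, 2.5, 1.5],
    ("D",): [10.0],
    ("B",): [10.0],
    ("B", "D"): [5.5, 4.5],
    ("B", "C"): [5.0, 3.5, 1.5],
    ("C",): [6.5, 3.5],
    ("C", "D"): [6.0, 3.5, 0.5],
}

# Hypothetical candidate map: cleavage-site positions in kb on the circle.
candidate_map = {"A": [4.0, 8.0], "B": [0.0], "C": [1.5, 5.0], "D": [5.5]}


def digest_circular(length, positions):
    """Fragment sizes (largest first) from cutting a circle at the given sites."""
    cuts = sorted(set(p % length for p in positions))
    if not cuts:
        return [length]  # an uncut circle stays in one piece
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(length - cuts[-1] + cuts[0])  # fragment spanning the zero point
    return sorted(sizes, reverse=True)


def predicted(enzymes):
    """Fragments expected from the candidate map for a single or double digest."""
    positions = [p for enzyme in enzymes for p in candidate_map[enzyme]]
    return digest_circular(GENOME_KB, positions)


def lanes_sum_to_genome():
    """True if every lane's fragments add up to the molecule size."""
    return all(abs(sum(frags) - GENOME_KB) < 1e-9 for frags in observed.values())


def map_explains_all_lanes():
    """True if the candidate map predicts the observed fragments in every lane."""
    return all(
        predicted(enzymes) == sorted(frags, reverse=True)
        for enzymes, frags in observed.items()
    )


if __name__ == "__main__":
    print("lane sums consistent:", lanes_sum_to_genome())
    print("candidate map fits all lanes:", map_explains_all_lanes())
```

Exact comparison of sizes is adequate here only because the example values are multiples of 0.5 kb; with real gel estimates a tolerance would be needed, since fragment sizes read from calibration curves carry measurement error.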
Molecule size and number of fragments should be consistent across each lane and predictable from single digests. For example, enzymes A and B cut twice and once, respectively; therefore, the A-B double digest (lane 2 in the table) should contain three fragments that sum to 10 kb. Fewer fragments than expected may be observed in two situations: (1) when fragments comigrate, or (2) when sites are very close together. Comigration would be indicated if the estimated size of the molecule is smaller than expected (with the size difference the same as the comigrating fragment). In addition, the comigrating fragment will be more intense than other fragments in the same lane. When sites are close together, the fragment produced may be so small as to migrate off the gel. In this situation, the molecule size will be approximately the same as expected, and double digests with other enzymes will place the two sites proximate to each other.

To construct the map from the measured fragment sizes, select enzymes that cleave the fewest times and progressively add enzymes that exhibit more complex fragment patterns. For our example gel from above, we would start with enzymes B and D, thereby producing the following map (sizes are not to scale):

[Diagram: circular map with sites B and D]

Note that location of the zero point (in this case, the cleavage site for enzyme B) and orientation are not important initially.

Next, infer the map positions for sites produced by enzyme A. The A-B double digest indicates that the B site is contained in the 6-kb A fragment, 2 kb from one end, with two possible orientations:

Orientation 1:  B -- 4 kb -- A -- 4 kb -- A -- 2 kb -- (B)
Orientation 2:  B -- 2 kb -- A -- 4 kb -- A -- 4 kb -- (B)

The correct orientation is determined by examination of the A-D double digest relative to the map produced for B and D. Upon incorporation of enzyme D, the two possible alternatives are:

[Diagram: orientations 1 and 2 of the A sites relative to B and D]

Based on fragment sizes obtained from the A-D double digest, the correct alignment of sites for enzymes A, B, and D is provided by orientation 1.

The sites for enzyme C then can be mapped. Enzyme B cleaves the 6.5-kb C fragment into two pieces, 1.5 kb and 5.0 kb. When mapping C relative to B, the two possible alternatives are:

[Diagram: orientations 1 and 2 of the C sites relative to B]

Again, the correct orientation is determined by incorporating enzyme D in the two alternatives:

Orientation 1:  B -- 1.5 kb -- C -- 3.5 kb -- C -- 0.5 kb -- D -- 4.5 kb -- (B)
Orientation 2:  B -- 5.0 kb -- C -- 0.5 kb -- D -- 3.0 kb -- C -- 1.5 kb -- (B)

Based on fragment sizes obtained from the C-D double digest, the correct alignment of sites for enzymes B, C, and D is provided by orientation 1.

Finally, combine the maps for A-B-D and B-C-D generated above.

[Diagram: combined map of sites for enzymes A, B, C, and D]
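The orientation tests in this example amount to predicting the double-digest pattern for each candidate map and comparing it with the observed lane. The sketch below illustrates that logic in Python under stated assumptions: site B is placed at position 0, site D at 5.5 kb (fixing the direction of the map), and the candidate positions for A and C are those implied by the two orientations described in the text. The function names `digest` and `matches` and the 0.05-kb tolerance are hypothetical.

```python
def digest(length, *site_groups):
    """Fragment sizes (largest first) for a circular molecule cut at every listed site."""
    sites = sorted({pos % length for group in site_groups for pos in group})
    if not sites:
        return [length]
    frags = []
    for i, start in enumerate(sites):
        end = sites[(i + 1) % len(sites)]
        size = (end - start) % length
        frags.append(size if size > 0 else length)  # one site: a single full-length fragment
    return sorted(frags, reverse=True)


def matches(predicted, observed, tol=0.05):
    """True if two fragment lists agree in number and, pairwise, in size."""
    observed = sorted(observed, reverse=True)
    return len(predicted) == len(observed) and all(
        abs(p - o) <= tol for p, o in zip(predicted, observed))


GENOME = 10.0
B, D = [0.0], [5.5]                          # assumed zero point and direction
A_OPTIONS = {1: [4.0, 8.0], 2: [2.0, 6.0]}   # orientations 1 and 2 for enzyme A
C_OPTIONS = {1: [1.5, 5.0], 2: [5.0, 8.5]}   # orientations 1 and 2 for enzyme C
OBSERVED = {
    "A-D": [6.0, 2.5, 1.5],
    "C-D": [6.0, 3.5, 0.5],
    "A-C": [3.5, 3.0, 2.5, 1.0],
}

a_ok = [k for k, a in A_OPTIONS.items() if matches(digest(GENOME, a, D), OBSERVED["A-D"])]
c_ok = [k for k, c in C_OPTIONS.items() if matches(digest(GENOME, c, D), OBSERVED["C-D"])]
print(a_ok, c_ok)

# The A-C double digest was not used to build the map, so it serves as an independent check.
print(matches(digest(GENOME, A_OPTIONS[1], C_OPTIONS[1]), OBSERVED["A-C"]))
```

Under these assumptions only orientation 1 survives for both A and C, and the combined map reproduces the A-C lane of the table.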
Orientation of sites and distances among them is verified by looking at the fragment sizes produced by the A-C double digest.

3. The map for this reference sample is completed by adding more enzymes, using the three previously mapped enzymes as a backbone for construction of additional maps. Choice of backbone REs is determined by spacing of sites (evenly spaced is best) and buffer conditions. To continue the example from above, such a gel might look like:

Lane
(1) B      (7) A-G              (12) C
(2) B-E    (8) A-E              (13) C-F
(3) B-F    (9) size standard    (14) C-G
(4) B-G    (10) A-F             (15) C-E
(5) G      (11) F               (16) E
(6) A

The three new enzymes (E, F, and G) would be mapped independently relative to the backbone enzymes (A, B, and C), essentially following the logic applied above.

4. This procedure is performed until all enzymes are mapped. Where REs produce complex fragment patterns (i.e., exhibit 5 or more cleavage sites), several mapping gels may be necessary for accurate map construction.

5. Once the reference map is completed for all enzymes, relative positions of closely spaced sites (typically less than 100 bp) are tested by selected double digests. In the example above, one could test the order of sites A-C-D using side-by-side comparison of A-C and A-D double digests. This order is verified by the smaller size of the A-C fragment of interest relative to the homologous piece produced by A-D (1.0 versus 1.5 kb).

6. Strategy for mapping additional samples depends upon the extent of divergence.

a. Where samples differ by only a few site changes, maps are readily generated with few additional digests. Single-site gains usually can be placed with only one double digest. Where fragment patterns are identical or differ by one or two site losses from the reference sample, sites can be fully mapped with no further digests. When a pair of fragments unique to the pattern in one sample sum in size to a third, larger fragment unique to the pattern found in a second sample, those two fragments must be adjacent in the first genome. There are two weaknesses to inferring site changes by this approach. First, it is not possible to discriminate between small length mutations and site mutations near the end of a fragment, although this problem can be alleviated to a large extent by using polyacrylamide gels to visualize small fragments (Figure 17). Second, for circular molecules such as animal mtDNA, it is difficult to differentiate samples with no sites for a particular enzyme (but some molecules linearized by damage) from those with one site. In addition, different samples cleaved once but in different positions will be indistinguishable. This is not a problem for linear genomes because the number of sites is one less than the number of fragments, and samples which are uncut will exhibit only a single fragment. The only way to alleviate these problems for circular DNAs is to directly map the sites in question. In practice, all samples suspected of possessing zero or one sites for a particular RE must be critically reexamined using an appropriate double digest.

b. Each sample that shows extreme differences (fragment patterns not interpretable in terms of single site gains or losses, typically >3-5% sequence divergence) must be mapped separately.

c. For intermediate haplotypes, some sites may be inferred relative to the reference map (as described above) while others (e.g., REs that exhibit single sites or inferred site gains relative to the reference sample) will require placement by double digestion.

7. After site maps have been generated for all samples, it is essential to test for site homology across samples. In situations where sites appear fairly close (within 100 bp), they often are assumed to be homologous, especially if they have been mapped with small fragments resolved on polyacrylamide gels. When sites in different samples appear to be within 100-500 bp of each other or enzymes produce small fragments, site homology across samples must be tested in side-by-side comparisons.
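The fragment-sum rule of step 6a (two fragments unique to one sample summing to a larger fragment unique to the other) lends itself to a simple screen. The following Python sketch is illustrative only; the function name `infer_site_change`, the 0.1-kb tolerance, and the example fragment lists are hypothetical and not taken from any real data set.

```python
from itertools import combinations


def unique_fragments(own, other, tol):
    """Fragments in `own` that have no size match (within tol) left in `other`."""
    remaining = list(other)
    unmatched = []
    for frag in own:
        hit = next((g for g in remaining if abs(frag - g) <= tol), None)
        if hit is None:
            unmatched.append(frag)
        else:
            remaining.remove(hit)  # each fragment can be matched only once
    return unmatched


def infer_site_change(frags_1, frags_2, tol=0.1):
    """List candidate single-site differences between two fragment patterns.

    Each result is (sample holding the extra site, pair of small fragments, large fragment).
    """
    only_1 = unique_fragments(frags_1, frags_2, tol)
    only_2 = unique_fragments(frags_2, frags_1, tol)
    results = []
    for label, small, large in (("sample 1", only_1, only_2),
                                ("sample 2", only_2, only_1)):
        for x, y in combinations(small, 2):
            for big in large:
                if abs(x + y - big) <= tol:
                    results.append((label, (x, y), big))
    return results


reference = [6.0, 2.5, 1.5]        # hypothetical reference pattern (kb)
sample = [3.5, 2.5, 2.5, 1.5]      # hypothetical second sample (kb)
print(infer_site_change(reference, sample))
```

As the text cautions, a match of this kind cannot by itself exclude a small length mutation, and it gives no information for enzymes that cut a circular molecule zero or one times; those cases still require double digests.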
Figure 17 RE fragment patterns and maps of HindIII mtDNA haplotypes from Luxilus cornutus (types A and B) and L. chrysocephalus (type C). Panels (A) and (B): agarose and polyacrylamide gels, respectively, with haplotypes designated by letters A-C; the size standard (S) is a combination of λ bacteriophage DNA digested with HindIII and φX174 RF DNA digested with HaeIII. Panel (C): network of haplotypes showing RE sites and changes (maps from Dowling et al., 1992). Whereas the site loss differentiating pattern B from A (at position 2.2) could be inferred from a complete map of A and knowledge of the fragment changes (panel A), the converse (i.e., inferring the site gain in A relative to a complete map of B) is not possible. Small fragments (such as the 450-bp fragment in haplotypes A and B produced by RE sites at 7.5 and 8.0) are best examined and mapped using polyacrylamide gels (e.g., panel B).
As an example, assume the following maps had been inferred for four taxa (1-4) with the conserved enzymes X and Z (each cleaving once) and the polymorphic enzyme Y (producing sites Y1 and Y2; arrows indicate sites, + and - indicate presence and absence, respectively).

Fragment size (kb):      2.0      0.3      1.0
Site:                X        Y1       Y2       Z
Taxon 1              +        +        +        +
Taxon 2              +        +        -        +
Taxon 3              +        -        +        +
Taxon 4              +        -        +        +

In this instance, it would be necessary to verify homology of sites Y1 and Y2 across samples. This would be achieved by double digestion of each sample for REs X-Y and Z-Y and comparison of fragment lengths across samples. If the maps above are correct, then the following fragments should be observed:

           Fragment size (in kb)
Taxon      X-Y1      Z-Y2
1          2.0       1.0
2          2.0       1.3
3          2.3       1.0
4          2.3       1.0

Sizes of X-Y1 fragments would indicate that site Y1 is found in taxa 1 and 2 but not 3 and 4. Likewise, comparison of Z-Y2 would indicate that only taxa 1, 3, and 4 share site Y2.
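The expected fragment sizes in this homology test follow directly from the presence or absence of each Y site. A minimal Python sketch of that bookkeeping is given below; the names `SPACING`, `TAXA`, and `expected_fragments` are hypothetical, and the spacings are those of the example map (2.0, 0.3, and 1.0 kb).

```python
# Distances (kb) between adjacent sites on the example map: X to Y1, Y1 to Y2, Y2 to Z.
SPACING = {"X-Y1": 2.0, "Y1-Y2": 0.3, "Y2-Z": 1.0}

# Presence (True) or absence (False) of each polymorphic Y site in the four taxa.
TAXA = {
    1: {"Y1": True, "Y2": True},
    2: {"Y1": True, "Y2": False},
    3: {"Y1": False, "Y2": True},
    4: {"Y1": False, "Y2": True},
}


def expected_fragments(has_y1, has_y2):
    """Sizes of the X-to-nearest-Y and Z-to-nearest-Y fragments, or None if no Y site exists."""
    if has_y1:
        x_y = SPACING["X-Y1"]
    elif has_y2:
        x_y = round(SPACING["X-Y1"] + SPACING["Y1-Y2"], 1)
    else:
        x_y = None
    if has_y2:
        z_y = SPACING["Y2-Z"]
    elif has_y1:
        z_y = round(SPACING["Y2-Z"] + SPACING["Y1-Y2"], 1)
    else:
        z_y = None
    return x_y, z_y


for taxon, sites in TAXA.items():
    print(taxon, expected_fragments(sites["Y1"], sites["Y2"]))
```

Observed double-digest fragments that depart from these predictions would indicate that the sites scored as Y1 or Y2 are not at the same position in all samples.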
Part D. Sequential Hybridization

This is the approach used to survey and map RE variation for large, complicated target sequences (e.g., cpDNA).

1. Digest each DNA (total cellular or purified cpDNA) with 10-20 REs that each cleave the genome 20-100 times (Protocol 3).

2. Separate digests of different DNA samples, each produced with the same RE, by electrophoresis through 1.0-2.0% agarose gels (Protocol 4). The loading dye should move 10 cm, which allows the gel to fit onto a 12.5-cm membrane.

3. Make two filter replicas of each gel by bidirectional blotting onto durable nylon membranes (Protocol 8).

4. Hybridize the two identical sets of filters with two different probes in plastic tubs. After autoradiography, strip the probes and rehybridize with other clones until the entire target sequence has been covered. For cpDNA, a total of 6-40 probes, requiring 3-20 rounds of hybridization, is usually used. The lower the taxonomic level and expected level of variation, the fewer and larger the probes that can be used, although one can compensate for these factors by using enzymes that cut at a high frequency. Where one wishes to cover the entire genome with only a few hybridizations one can pool several clones in a single nick-translation and hybridization reaction.

5. Interpret autoradiograms and make restriction maps (see "Interpretation and Troubleshooting").

A typical survey for rearrangements in a genome such as cpDNA (~150 kb) can be performed as follows:

1. Digest each DNA with between one and four REs that cut the genome 30-70 times.

2. Place all digests for a given DNA in adjacent lanes of a 1.0-1.5% agarose gel and electrophorese until tracking dye has moved 5-10 cm.

3. Make two filter replicas of each gel by bidirectional blotting onto durable nylon membranes.

4. Hybridize the two identical sets of filters with two different probes in plastic buckets. After autoradiography, strip the probes and rehybridize with other clones. Clones may range in size from fairly large (5-15 kb, as illustrated in Jansen and Palmer, 1987b) to small, gene-specific clones of a few hundred base pairs (see Palmer et al., 1988b).

5. If the initial survey was designed principally to find rearrangements, a secondary survey may have to be performed to sample more intensively among taxa related to those that have rearrangements.

6. In some cases, more detailed molecular characterization of a particular rearrangement, such as fine-structure mapping or sequencing of its endpoints, may be needed before the event can be used with confidence as a phylogenetic character.

INTERPRETATION AND TROUBLESHOOTING

This section offers some guidance based on our experience, particularly with RFLP analyses of whole mtDNA and cpDNA and the use of microsatellites. We have more limited experience with multilocus minisatellite analysis, but most potential difficulties are covered in the sections on digestion and electrophoresis and on transfer hybridization. For further guidance on this technique, consult Bruford et al. (1992, and references therein). For information on DGGE or SSCP, see Lessa (1993) and Lessa and Applebaum (1993).

RFLP Analysis

Fragment versus Site Approaches

There are two fundamentally different approaches for analyzing RFLP variation: comparing fragments or comparing sites. Fragment comparisons suffer from several drawbacks. Many enzymes produce comigrating fragments, which may be detected by a difference in intensity compared to other fragments produced by the same digest. Under the best conditions (end-labeling), comigrating fragments complicate the determination of fragment homology among genotypes. In the
worst case (EB staining, transfer hybridization), it can be difficult to identify comigrating fragments, let alone assign homology. Because of these problems and difficulties presented by length variation and rearrangements, we argue that the fragment method should be restricted to very closely related sequences, and then used with caution. Comparison of mapped restriction sites allows for interpretation of fragment pattern differences as "individual" mutations that affect the presence and position of restriction sites. For animal mtDNA and cpDNA, the great majority of mutations identified by mapping are restriction site changes assumed to be due to single nucleotide substitutions within the 4- or 6-bp site surveyed. Fragments, on the other hand, do not vary independently of one another. Given the increased confidence in homology involved in mapping sites in site data, the extra effort over fragment data is well worth the investment.

Levels of variation also determine the choice of REs (4-bp versus 6-bp recognition sequence) to be used, each with its own particular set of difficulties. When sequence divergence in animal mtDNA is relatively high (>2-4%), it usually is possible to obtain enough characters from 6-bp REs. These typically produce fewer, larger fragments with sites that are easily mapped. The difficulty is related to fragment size: large fragments have larger errors of measurement than smaller fragments. Thus, maps and genome size estimates generated from larger fragments (especially >10 kb) are less accurate than those from small fragments. The use of polyacrylamide gels for mapping significantly increases accuracy because small fragments produced in double digests can be visualized. When sequence divergence in animal mtDNA is low (<2%), it may not be possible to generate enough characters with 6-bp REs to address the question at hand. In such instances, 4-bp REs (which tend to produce more fragments) are likely to provide more sensitivity, based on the larger number of characters generated. Unfortunately, 4-bp REs tend to produce complicated fragment patterns (i.e., many fragments with comigration), making it difficult to infer site changes. The combination of 4- and 6-bp REs provides a powerful approach to analyzing variation across a wide range of divergence levels.

Length Variation

Length variation can be identified by the absence of fragments predicted by a site gain or site loss hypothesis and by a strong correlation of effects among fragments produced by different REs. If the aim is to examine site gains and losses, suspected length variation should be verified and characterized by mapping to remove confounding effects from the analysis.

Several different types of length variation are observed commonly (e.g., in animal mtDNA, reviewed by Moritz et al., 1987; Rand, 1993; Broughton and Dowling, 1994). Minor length variation can result from changes in copy number of small (e.g., <100 bp) tandemly repeated sequences or even in number of nucleotides in a string of the same base (e.g., poly-C tracts). These changes are most obvious where the variable region is contained within a relatively small fragment and under these conditions the variation should be correlated across REs. However, for some REs it may not be obvious if the variation resides in a large (e.g., ≥10 kb) fragment. Also, if the RE has a site within the repeated sequence, the variation will be reflected by differences in intensity of a fragment of the repeat size, rather than differences in length (e.g., Densmore et al., 1985). This type of variation occurs at high frequency and in animal mtDNA is often heteroplasmic.

Larger scale length variation may be due to insertions or deletions; such variation will produce correlated changes among the fragments produced by all REs. This type of variation is illustrated by a direct tandem duplication (Figure 18) and a deletion that occurs in approximately half of the molecules (i.e., heteroplasmy; Figure 19). In each case, the nature of the modification is highly predictable, given a cleavage map for a closely related sequence lacking the length mutation. More complicated changes that require analysis by hybridization as well as mapping (see below) include duplicative transpositions and insertions of exogenous material.

Apparent length variation also may result from copurification of contaminating DNA from other organisms (e.g., parasites) with mtDNA. Such contaminants can be identified by transfer-hybridization, using a typical mtDNA sample as probe (Figure 20).
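The statement that length variation produces correlated changes across REs can be turned into a quick screen: a true insertion or deletion shifts the summed fragment size by the same amount for every enzyme, whereas a site gain or loss leaves the sum unchanged. The Python sketch below illustrates this under stated assumptions; the enzyme labels, fragment sizes, function names, and the 0.3-kb tolerance are all hypothetical.

```python
def genome_size_shift(digests_standard, digests_variant):
    """Per-enzyme difference in summed fragment size (variant minus standard), in kb."""
    return {
        enzyme: round(sum(digests_variant[enzyme]) - sum(digests_standard[enzyme]), 2)
        for enzyme in digests_standard
    }


def looks_like_length_mutation(shifts, tol=0.3):
    """True if every enzyme shows about the same nonzero size shift."""
    values = list(shifts.values())
    mean = sum(values) / len(values)
    return abs(mean) > tol and all(abs(v - mean) <= tol for v in values)


# Hypothetical 16-kb genome and a variant carrying a 4.8-kb duplication.
standard = {"P": [9.0, 7.0], "Q": [8.0, 5.0, 3.0]}
variant = {
    "P": [13.8, 7.0],             # no site in the duplication: one fragment grows
    "Q": [8.0, 5.0, 4.8, 3.0],    # one site in the duplication: one extra fragment
}
shifts = genome_size_shift(standard, variant)
print(shifts, looks_like_length_mutation(shifts))

# A site gain, by contrast, splits a fragment without changing the total.
site_gain = genome_size_shift({"R": [10.0, 6.0]}, {"R": [10.0, 4.0, 2.0]})
print(site_gain, looks_like_length_mutation(site_gain))
```

Such a screen only flags candidates: comigrating fragments, heteroplasmy, and measurement error on large fragments can all distort the sums, so suspected length variants should still be mapped.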
Figure 18 Effects of a 4.8-kb direct tandem duplication on fragment patterns. Three types of changes are apparent in comparing the standard length (S) and long (L) genomes: (A) one fragment becomes larger if there is no cleavage site for the RE within the duplication (e.g., BclI); (B) one additional fragment of the same size as the duplication is present if the duplication includes a cleavage site (e.g., PvuII and BamHI); and (C) there are multiple additional fragments which sum to the size of the duplication and all but one of which comigrate with previously existing fragments. (D) The location of cleavage sites for BclI (C), PvuII (p), BamHI (B), and SacII (s) in relation to the 4.8-kb duplication (see Moritz and Brown, 1986 for further details). The M lane is the size marker, λ bacteriophage digested with HindIII.

Partial Digestion and Heteroplasmy

The presence of substoichiometric bands and a sum of fragments that exceed the sequence size (beyond acceptable error) suggests either incomplete digestion or length heteroplasmy. Such fragments should not be ignored or confused with RFLPs that are due to changes in restriction sites. Length heteroplasmy should be strongly correlated across REs (see above) although this may be complicated if the length variation is dispersed
Figure 19 Effects of a heteroplasmic 3.9-kb deletion in Cnemidophorus mtDNA. (A) Comparison of end-labeled fragments of the standard 17.6-kb mtDNA (S) and the heteroplasmic 17.6/13.2-kb sample (D). The two types of genome in the heteroplasmic sample are present in approximately equal quantities. The first pair of lanes are partial digests showing two size classes of relaxed circles and linear molecules in the D sample. The other eight pairs of lanes are digests with: (2) BamHI, (3) SacII, (4) BclI, (5) EcoRV, (6) NciI, (7) XbaI, (8) EcoRI, and (9) AvaI. The bars indicate fragments from which 3.9 kb was deleted in the 13.2-kb genome. The size marker is λ digested with HindIII. (B) Map of the location of the deletions (3.9 and 0.5 kb) in relation to the S genome cleavage map. For REs that have no site within the deletion, one fragment simply gets smaller (e.g., lanes 2, 3, 7). For REs that cut within the deleted region, two fragments are missing and have been replaced by a larger fragment equal to their sum minus the deletion (e.g., lane 5, 10.7 and 6.7 replaced by 13 kb; lane 9, 10.0 and 3.0 replaced by 8.6 kb). Abbreviations: a = AvaI, b = BamHI, c = BclI, e = EcoRI, h = HindIII, l = SalI, n = NciI, p = PvuII, s = SacII, v = EcoRV, and x = XbaI. The sawtooth region indicates a set of small tandem repeats that have been reduced in copy number in the deletion genome.
Figure 20 Contamination of CsCl-isolated mtDNA. Samples are mtDNAs isolated from Rutilus alburnoides and digested with HindIII and BamHI, with fragments visualized by (A) end-labeling and (B) transfer hybridization. Lanes: 1 = mtDNA preparation contaminated with nuclear DNA (background smear in panel A); 2 = mtDNA contaminated with exogenous DNA (extra fragments in both digests of panel A); 3 = standard mtDNA preparation; S = combination of λ DNA digested with HindIII and φX174 RF DNA digested with HaeIII. (Figure courtesy of M. J. Alves.)
(e.g., in Hyla [Pseudacris] crucifer mtDNA; Moritz et al., 1987). Incomplete digestion can be caused by technical artifacts (see below) such that not all molecules are cleaved at all sites, or by true heteroplasmy for a restriction site that results in large fragments equal to the sum of two or more smaller fragments present in complete digests (e.g., Hauswirth and Laipis, 1984; Bentzen et al., 1988; Gold and Richardson, 1990). A crude but simple test is to repeat the digest with a new batch of the RE or to digest with an excess of enzyme for extended time periods. Typically partial digestion produces a variety of larger fragments, whereas site heteroplasmy results in only one larger fragment. However, there also may be considerable heterogeneity among sites in their cleavage rate. A more sophisticated way to discriminate between partial digestion and site heteroplasmy is to clone from the sample (see Chapter 9) and test for the presence of both restriction types among the clones. Bands suspected of representing partial digests should be measured to ensure that they do indeed represent the sum of two or more smaller fragments. All too often, interesting phenomena such as duplications or deletions are missed because additional bands were assumed to be due to partial digestion. Also, the combinations of fragments present in partial digests are useful information for mapping cleavage sites (Protocol 9).
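Checking whether a suspect band really is the sum of adjacent smaller fragments is easily automated once a cleavage map is available. The Python sketch below is an illustration rather than an established procedure: the function name `partial_digest_candidates`, the 0.1-kb tolerance, and the example map (fragments listed in their order around a circular molecule) are assumptions.

```python
def partial_digest_candidates(ordered_frags, band, tol=0.1, circular=True):
    """Runs of two or more adjacent fragments (in map order) whose sizes sum to `band`."""
    n = len(ordered_frags)
    hits = []
    for start in range(n):
        total = ordered_frags[start]
        run = [ordered_frags[start]]
        # Extend the run one neighbour at a time; a run of all n fragments would
        # simply be the whole molecule, so stop at n - 1 fragments.
        for step in range(1, n - 1):
            idx = start + step
            if idx >= n and not circular:
                break
            frag = ordered_frags[idx % n]
            total += frag
            run.append(frag)
            if abs(total - band) <= tol:
                hits.append(tuple(run))
    return hits


# Hypothetical complete digest of a 10-kb circle, fragments in map order (kb).
map_order = [2.5, 1.0, 3.0, 3.5]
print(partial_digest_candidates(map_order, 4.0))  # explained by two adjacent fragments
print(partial_digest_candidates(map_order, 5.0))  # no adjacent run sums to 5.0 kb
```

An extra band with no matching run of adjacent fragments is a candidate for a duplication, deletion, or contaminant rather than a partial digest, and deserves the closer examination urged above.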
Mapping Large Sequences

Larger sequences, such as cpDNA (typically 150 kb), are more difficult to map using the methods described above because of the large number of fragments produced (20-100 per enzyme; Table 3). However, the large number of fragments provides the potential for analyzing many more sites. The most common approach to restriction mapping of cpDNA variation uses sequential hybridization of filters with a battery of cloned cpDNA fragments. By adjusting the average size of the cloned fragment relative to the average size of the fragments being scored and the amount of variation expected, one generates a series of autoradiograms of sufficient simplicity to allow the critical interpretation of fragment pattern differences in terms of site or length mutations, and, in many cases, the construction of more or less complete maps.

As an illustration of this approach, Figure 21 shows the hybridization of a cloned 10.6-kb fragment from the 151-kb chloroplast genome of Lactuca sativa to a hybridization filter containing either total DNA or purified cpDNA from each of eight genera in the family (Asteraceae) to which Lactuca belongs. Note that even though the enzyme used here, BstXI, cuts cpDNA relatively infrequently (22-30 times) compared to many 6-bp REs, the complexity of fragment pattern differences apparent in the purified cpDNAs precludes analysis by direct inspection of whole genome patterns (Figure 21, left panel). However, by probing with a fragment representing 7% of the genome, a simple and readily interpretable pattern is produced (Figure 21, right panel).

Figure 21 Hybridization analysis of cpDNA restriction site variation. (A) Photograph of ethidium bromide-stained 1.0% agarose gel containing BstXI digests of total DNAs (lanes 1 and 8) and cpDNAs of varying purity (lanes 2-7) of species from eight genera in the Asteraceae. (B) Autoradiogram showing hybridization to the gel at left with a 32P-labeled, nick-translated plasmid containing a 10.6-kb SacI fragment cloned from the cpDNA of Lactuca sativa. Numbers at right indicate fragment sizes in kb. (Figure courtesy of Robert Jansen.)

As a starting point, consider the simple two-banded pattern of DNA 4 in Figure 21: relative to this pattern, DNAs 1, 3, 5, and 7 have lost a 14.2-kb fragment and gained fragments of 9.0 and 5.2 kb. The simplest explanation of these differences is that these four DNAs have gained an extra BstXI site located 5.2 kb from one end of the 14.2-kb fragment in DNA 4. Relative to DNAs 1, 3, 5, and 7, DNAs 6 and 8 have lost the 9.0-kb fragment and gained fragments of 6.3 kb and 2.7 kb. Again, the simple inference is the gain of an extra BstXI site in the latter two DNAs.

These observations establish the existence of two restriction-site mutations of potential phylogenetic significance. The actual mapping of the sites involved is not a requirement for their use in phylogenetics, but can be accomplished within the framework of the overall hybridization analysis (see below) and can be important in cases of potential confusion between site and length mutations (see next paragraph). In the example shown in Figure 21, the order of the fragments could be established in two ways. First, in cases where the probes used are identical or nearly so to the DNAs on the filter, one can read the intensity of hybridization signals in terms of the approximate amount of overlap of probe and filter-bound fragments. Since the 10.6-kb probe hybridizes, in the simplest pattern, to two fragments whose sizes significantly exceed the probe length, the largest of these two, the 14.2-kb fragment, must extend past the end of the 10.6-kb probe by at least 5 kb. So too must one of the two smaller fragments (9.0, 5.2 kb) produced by site gain within the 14.2-kb fragment. Given their sizes and relative hybridization intensities, the only possible interpretation is that the 5.2-kb fragment is internal to the 10.6-kb probe and that the 9.0-kb one extends across its end. Similarly, in DNAs 6 and 8, the 6.3-kb fragment, which hybridizes only weakly with the 10.6-kb probe, must overlap the probe only slightly and the 2.7-kb fragment must lie internal to it. This logic established BstXI fragment orders of 1.5-5.2-9.0 in DNAs 1, 3, 5, and 7 and 1.5-5.2-2.7-6.3 in DNAs 6 and 8. The second way to establish these fragment orders is by using as hybridization probes fragments adjacent to the 10.6-kb fragment. One of these two fragments should hybridize to the mutated fragments, of 9.0 and 6.3 kb, that span the junction between it and the 10.6-kb probe.

DNA 2 differs from all other DNAs in Figure 21 in lacking the 1.5-kb BstXI fragment and featuring instead a fragment of 1.3 kb. In the absence of any other information one cannot distinguish between two alternative explanations for this fragment difference: (1) a deletion of 0.2 kb in this region in DNA 2 relative to all others, and (2) a site mutation occurring 0.2 kb from one end of the 1.5-kb fragment, with the additional 0.2-kb fragment in DNA 2 having gone undetected, probably being run off the end of the gel. Length mutations can be recognized by aligning the fragment maps constructed for each enzyme and observing correlated size changes overlapping the variable fragment in question (see above).

The general form of the site mapping analysis used in cpDNA studies consists of three steps:

1. Construct a reasonably complete map of the cpDNA of one of the taxa under study (the reference genome) for each enzyme used. Logical choices for the reference genome could be the one used as the source of cloned hybridization probes or one which has been completely sequenced (e.g., tobacco; Shinozaki et al., 1986) and for which computer-generated maps can be made easily. If such genomes do not fall within the study group then the choice of reference genome should be limited to taxa for which purified cpDNA is available, as this will facilitate the mapping effort. Alignment of each enzyme map for the reference genome is greatly aided by including on each enzyme gel or set of gels a double digest of the reference genome with the enzyme specific to that gel and an enzyme used in common in all the double digests. Draw the maps on two sets of sheets. On one sheet draw all of the aligned maps for the reference genome, one atop the other. Use this for step 3. Draw the reference map for a single enzyme on a second sheet. Include below this one-line map the aligned map of the clones used as hybridization probes. Use this set of maps for step 2.

2. Group the autoradiograms by enzyme and by order of probe. Using the single enzyme map sheets, draw the map for each taxon on a separate line above the reference map. The completeness of the mapping information needed will depend on the amount of variation detected. For divergent genomes, it may be necessary to write down each site and fragment size for the whole genome; for similar genomes simply writing the variable sites and fragments may suffice; for genomes of intermediate variability writing all the sites and just the variable sizes may be enough. This step will identify all clear-cut site mutations and all cases of ambiguity regarding length mutation/site mutations near ends of fragments.

3. Regroup the autoradiograms according to probe fragment. Analyze these together with the unified reference genome map to resolve length mutation/site mutation ambiguities as described in the preceding paragraph.

Troubleshooting

Problems encountered in RE analyses typically occur at any of three stages: (1) during DNA isolation and storage, (2) during digestion and electrophoresis, and (3) during transfer hybridization. Alternatively, inherent properties of the sequence studied could present problems. A brief list of problems that may be encountered and some solutions follow.
DNA Isolation and Storage

Problem: DNA degradation due to old tissue; senescence (in vivo)
Remedy: Use fresh tissue

Problem: Improper storage of tissues (slow freezing, freeze-thaw)
Remedy: Store properly (see Chapter 3)

Problem: Breakage during isolation
Remedy: Extract gently

Problem: Breakage after isolation (freeze-thaw, nuclease or bacterial contamination, particularly in dialysis)
Remedy: Be certain dialysis tubing has been treated (see Appendix); keep DNA clean and frozen. Test storage solution (TE) for nuclease activity. When the sample is degraded, use frequent-cutting enzymes to reduce the average size of fragments, thereby minimizing the effects of degradation. If the sample has been contaminated, clean by: (1) velocity gradient centrifugation (Protocol 1, Part D), dye and salt removal (Protocol 1, Part F), and concentration by isobutanol extraction; (2) phenol:chloroform extraction (Chapter 9), or (3) commercially available wash solutions.

Digestion

Problem: Endonuclease or exonuclease contamination (Figure 22)
Remedy: Titrate enzymes properly; switch to cleaner enzymes; change suppliers; clean sample as described above

Problem: Partial digestion (Figure 23)
Remedy: Use more enzyme; two-step digestion; switch to better enzyme; change supplier; clean up DNA (as above); redialyze to remove excess salt

Electrophoretic Artifacts

Problem: Retardation due to excess DNA
Remedy: Use less DNA; use purified or semipurified organellar DNA

Problem: Smiling bands
Remedy: Remove any bubbles from wells, increase run times to decrease effects of differential heating

Problem: Fuzzy bands
Remedy: Reduce buffering capacity of running buffer (make new buffer)

Problem: Missing small bands
Remedy: Reduce electrophoresis time or use a combination of agarose and polyacrylamide gels

Problem: Missing large bands
Remedy: Too much BSA in digests; could also be due to DNA degradation (see above)

Problem: Non-specific background (i.e., "flecking") in gels loaded with end-labeled samples (particularly agarose gels)
Remedy: Use higher grade agarose, making certain it is completely dissolved; make sure plates and apparatus are clean; rinse gels before drying down

Figure 22 ... contamination on ... a RE and end-la... ...aminated Lacerta mtDNA. (B) BclI digests of nuclease-contaminated Lacerta mtDNA. Note that degradation is most severe for the larger fragments and produces a smear of randomly degraded DNA. The sample on the right is also contaminated with nuclear DNA, which is obvious as relatively high-molecular-weight DNA. (C) The same mtDNAs digested with BclI after contaminating nucleases have been removed by velocitization (Protocol 1, Part D). The largest fragment is now clear, indicating negligible degradation due to nucleases. The poor resolution of fragments in some lanes is due to insufficient care in clearing out the wells prior to loading the samples.
Figure 23 Minnow mtDNAs digested with AvaI and end-labeled with 32P, demonstrating various states of digestion. The sample in lane 1 is uncut, showing relaxed circular and linear molecules (upper and lower fragments, respectively). Samples in lanes 2, 3, and 5 show varying degrees of partial digestion, whereas the sample in lane 4 is digested to completion. S = λ bacteriophage DNA digested with HindIII.

Transfer Hybridization: Poor Transfers

Problem: Bubbles and spots of no transfer
Remedy: Treat filter carefully; do not touch with bare hands; roll out bubbles in setting up blot

Problem: Bottom filter weaker (and blurrier) than top
Remedy: Avoid excess weight in blotting; use bottom filters (which may be weaker due to less DNA transferred) with high-copy-number probes; hybridize with total mtDNA or cpDNA to assess bad spots

Problem: Double images
Remedy: Don't slide gel or filters around while setting up the blot

Problem: Large fragments weak and poorly transferred
Remedy: Use acid depurination in transfer protocol

Transfer Hybridization: Hybridization Problems

Problem: Non-specific background
Remedy: Don't let filters dry in wrapping, washing, and exposing; improper prehybridization (especially if large, bubble-shaped blotches), strip and repeat; use larger volume of prehybridization solution

Problem: Hybridization to contaminating vector DNA
Remedy: Use isolated inserts; avoid vector contamination of DNAs

Problem: Inability to strip completely previously hybridized probe
Remedy: Let filters decay; use low-copy-number and more divergent probes first, high-copy-number and conserved probes last; don't let filters dry after probing

Problem: Weak bands on autoradiogram (new filter)
Remedy: Be sure transfer was complete (stain gel with EB following transfer); be sure probe is labeled to high specific activity and in sufficient concentration
Problem: Weaker signals with time and reuse
Remedy: Don't wash filters excessively; use low-copy-number and more divergent probes first, high-copy-number and conserved probes last

Problem: Unequal hybridization efficiency to different fragments of the same digest
Remedy: A probe produced by random-priming (Feinberg and Vogelstein, 1983) may not be labeling randomly, so try a nick-translated probe; or a heterologous probe may not be binding equally well to all fragments, so use a probe more similar to the target DNA or reduce the stringency of wash conditions

Intrinsic Biological Problems

Problem: Contamination of sample with exogenous DNA (Figure 20)
Remedy: Characterize and exclude exogenous fragments when analyzing data

Problem: Heteroplasmy
Remedy: Characterize and account for variable-length fragments when analyzing data

Problem: Cross-hybridization to another genome or to repeated sequences within the genome (Figure 24)
Remedy: Switch probes; use subportions of the probe lacking the cross-hybridizing region; make sure probe is free of contaminating DNA

Problem: Distinguishing point mutations from small-length mutations
Remedy: Use both agarose and polyacrylamide gels to visualize all possible fragments

Problem: Comigrating, non-identical bands
Remedy: Establish homology by constructing restriction maps

Microsatellites

For the most part, interpretation of microsatellites as codominant alleles is straightforward. Samples are run adjacent to known sequences so that the precise size of each product can be determined. For most loci, there is considerable shadowing (e.g., Figure 2A) thought to be due to replication slippage during the PCR reaction, although somatic mutation is another possibility. Whatever the cause, the shadowing usually is very consistent across samples and between amplifications so that alleles can be reliably and repeatably scored. Phenotypes should be inspected carefully to identify heterozygotes for closely spaced alleles: e.g., if the typical pattern for a homozygote is a dominant band and two sub-bands with 2-bp spacing, a heterozygote will have two similar intensity bands followed by two sub-bands.

In some cases, we have observed more pronounced "stuttering" and/or weaker amplification in larger than in smaller alleles, especially where allele sizes are bimodal and widely separated. In such circumstances, tests for Hardy-Weinberg equilibrium (Chapter 10) or direct tests of inheritance are useful to verify that the different fragments are allelic. These tests also should be performed to test for high-frequency null alleles: variants that fail to amplify because of large-scale expansion or mutations in the priming sites (see "Assumptions").

Alleles are scored by their absolute size relative to size markers, which usually are known sequence ladders that are loaded on the same gel. These should be run on either side (3-4 spaces from the edge of the gel) and in the middle as well. To check for consistency of scoring, a small number of samples should be run on different gels as internal controls.

Troubleshooting

Most technical difficulties encountered with microsatellites are the same as for general PCR (see Chapter 7) or for running of sequencing gels (Chapter 9).

Problem: Excessive shadowing, making reliable scoring of microsatellite alleles impossible
Remedy: Optimize PCR conditions, particularly primer concentration, Mg2+ concentration, and annealing temperature. In general, minimize annealing and extension times and maximize annealing temperature
Figure 24 The effect of contaminating nuclear DNA in hybridization probes. Contamination can be obvious, as portrayed in panels (A) and (B), or subtle, as demonstrated in panels (C) and (D). (A) Tursiops truncatus total DNA samples digested with HinPI and hybridized with Tursiops mtDNA contaminated with nuclear DNA. S = λ bacteriophage DNA digested with HindIII. (B) Same filter as (A), hybridized with a Tursiops mtDNA sample lacking nuclear DNA contamination. (C) Tursiops truncatus mtDNAs digested with MboI and end-labeled with 32P. Note the faint band (indicated by the arrow) of nuclear origin (identified as nuclear by EB-staining of total DNA digests; gel not shown). (D) Tursiops truncatus total DNA samples digested with MboI and hybridized with Tursiops mtDNA, contaminated with nuclear DNA. The arrow identifies a fragment of nuclear origin (see panel C). M, φX174 RF DNA digested with HaeIII.
Problem: Poor resolution of closely spaced alleles
Remedy: Run fragments into bottom 1/4 of gel to increase separation
Remedy: Redesign primers to generate products of <200 bp (this can reduce shadowing as well because of increased efficiency of amplification)
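
The Hardy-Weinberg check recommended above for spotting high-frequency null alleles can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of Chapter 10: the function name `hwe_chi_square`, the genotype counts, and the choice of a simple chi-square statistic are all hypothetical. A strong deficit of heterozygotes relative to expectation is the pattern one would look for when null alleles are suspected.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Return (chi_square, degrees_of_freedom, observed_het, expected_het).

    genotype_counts maps a pair of allele sizes in bp, e.g. (120, 124),
    to the number of individuals scored with that genotype.
    """
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    observed = Counter()
    for (a, b), count in genotype_counts.items():
        # Each individual contributes two allele copies.
        allele_counts[a] += count
        allele_counts[b] += count
        observed[tuple(sorted((a, b)))] += count

    freqs = {allele: c / (2 * n) for allele, c in allele_counts.items()}
    alleles = sorted(freqs)

    chi_square = 0.0
    expected_het = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        if a == b:
            expected = n * freqs[a] ** 2
        else:
            expected = 2 * n * freqs[a] * freqs[b]
            expected_het += expected
        chi_square += (observed[(a, b)] - expected) ** 2 / expected

    k = len(alleles)
    degrees_of_freedom = k * (k - 1) // 2
    observed_het = sum(c for (a, b), c in observed.items() if a != b)
    return chi_square, degrees_of_freedom, observed_het, expected_het


# Hypothetical scores for one locus with two alleles (sizes in bp).
counts = {(120, 120): 40, (120, 124): 20, (124, 124): 40}
chi2, df, obs_het, exp_het = hwe_chi_square(counts)
print(f"chi-square = {chi2:.1f} on {df} df; "
      f"heterozygotes: {obs_het} observed vs {exp_het:.0f} expected")
```

With many alleles and small samples, expected counts become small and the chi-square approximation is unreliable, so an exact or permutation test would be the safer choice in practice.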
APPENDIX: STOCK SOLUTIONS
Protocol 1: Isolation of Animal mtDNA
CsCl stock solutions

                            Density (g/ml)
                            1.40    1.55    1.70
CsCl (g)                    53.3    73.3    93.3
10x TE (ml)                 10.0    10.0    10.0
H2O (ml)                    71.7    66.7    61.7
PI (2 mg/ml in TE; ml)       5.0     5.0     5.0
Total (ml)                   100     100     100
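
As a quick consistency check on the CsCl recipes, the sketch below recomputes the density implied by each column. It assumes that the liquid components (10x TE, water, and the PI solution) weigh about 1.00 g/ml and that the stated total of 100 refers to the final volume in ml; both are assumptions of this example, not statements from the protocol. Under those assumptions the three columns come out at 1.40, 1.55, and 1.70 g/ml.

```python
# Assumption: dilute aqueous components weigh roughly 1.00 g per ml.
LIQUID_DENSITY_G_PER_ML = 1.00
# Assumption: the "Total" row is the final volume in ml.
FINAL_VOLUME_ML = 100.0

# nominal density (g/ml): (g CsCl, ml 10x TE, ml H2O, ml PI solution)
recipes = {
    1.40: (53.3, 10.0, 71.7, 5.0),
    1.55: (73.3, 10.0, 66.7, 5.0),
    1.70: (93.3, 10.0, 61.7, 5.0),
}


def implied_density(cscl_g, *liquid_ml):
    """Total mass of salt plus liquids divided by the final volume."""
    total_mass_g = cscl_g + sum(liquid_ml) * LIQUID_DENSITY_G_PER_ML
    return total_mass_g / FINAL_VOLUME_ML


for nominal, (cscl, te, water, pi) in recipes.items():
    print(f"nominal {nominal:.2f} g/ml -> implied "
          f"{implied_density(cscl, te, water, pi):.2f} g/ml")
```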
Dialysis Tubing
(Modified from Sambrook et al., 1989)
1. Cut the tubing (MWCO = 12,000-14,000, diameter = 6.4 mm) to a convenient length.
2. Boil (hard, rolling boil) for 10 min in 1 L of 2% sodium bicarbonate, 10 mM EDTA, pH 8.0.
3. Rinse thoroughly with distilled water.
4. Boil (as above in step 2) for 10 min in 10 mM EDTA, pH 8.0.
5. Allow to cool and store at 4°C, being certain that there is enough liquid to keep tubing submerged at all times.

Sodium Chloride-Tris-EDTA-Sucrose (STES)
1 part:
1.5 M sucrose in ThE
5 parts:
10 mM Tris
100 mM EDTA
10 mM sodium chloride, pH 7.5

ThE
10 mM Tris
100 mM EDTA, pH 7.5

Protocol 2: Isolation of cpDNA

Isolation Buffer
0.35 M sorbitol
50 mM Tris-HCl, pH 8.0
5 mM EDTA
1.0% BSA
0.1% β-mercaptoethanol

Lysis Buffer
5% sodium sarcosinate (w/v)
50 mM Tris-HCl, pH 8.0
25 mM EDTA

Wash Buffer
0.35 M sorbitol
50 mM Tris-HCl, pH 8.0
25 mM EDTA

Protocol 4: Electrophoresis

Ammonium Persulfate (APS)
10% solution in water
Store at 4°C.
Label Buffer
40% sucrose
0.25% bromophenol blue in 5x TBE
Store at 4°C.

5% solution in chloroform (or some suitable replacement)

TEMED
Use fresh from the bottle

Recipe is for 1 ml total.
10x stock                          Final concentrations used
60 µl 1 M potassium chloride       6 mM KCl
50 µl 2 M Tris                     10 mM Tris
100 µl 1 M magnesium chloride      10 mM MgCl2
5 µl 14 M β-mercaptoethanol        7 mM β-ME
785 µl sterile distilled water

Nick Translation Buffer
50 mM Tris-HCl, pH 7.2
10 mM magnesium sulfate
1 mM dithiothreitol

3 M sodium chloride
1 M dithiothreitol (DTT)
2 M Tris, pH 7.4
1 M magnesium chloride
1 M potassium chloride
14 M β-mercaptoethanol
deionized, distilled water (ddH2O)

10 mM Tris-HCl, pH 8.0
1 mM EDTA
100 mM sodium chloride
0.2% SDS
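
The 10x stock recipe listed above (volumes of concentrated stocks made up to 1 ml) can be checked with simple dilution arithmetic: the concentration in the 10x stock is the stock molarity times the fraction of the final volume, and the working concentration is one tenth of that. The sketch below is illustrative only; the variable and function names are ours, and the component values are those printed in the recipe.

```python
TOTAL_UL = 1000   # "Recipe is for 1 ml total"
WATER_UL = 785    # sterile distilled water

# (component, volume added in µl, stock molarity in M, stated final mM)
components = [
    ("potassium chloride", 60, 1.0, 6),
    ("Tris", 50, 2.0, 10),
    ("magnesium chloride", 100, 1.0, 10),
    ("beta-mercaptoethanol", 5, 14.0, 7),
]


def conc_in_10x_mM(volume_ul, stock_molar, total_ul=TOTAL_UL):
    """Concentration (mM) of one component in the 10x stock."""
    return stock_molar * 1000.0 * volume_ul / total_ul


# The listed volumes should add up to the stated total.
assert sum(vol for _, vol, _, _ in components) + WATER_UL == TOTAL_UL

for name, vol, stock, stated_final in components:
    ten_x = conc_in_10x_mM(vol, stock)
    print(f"{name}: {ten_x:.0f} mM in 10x stock, {ten_x / 10:.0f} mM at 1x "
          f"(recipe states {stated_final} mM)")
```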
Nucleic Acids IV:
Sequencing and Cloning
David M. Hillis, Barbara K. Mable, Allan Larson,
Scott K. Davis, and Elizabeth A. Zimmer

Sequencing and cloning technology have developed rapidly over the past two decades. Although nucleic acid sequencing is a comparatively new approach for systematics (as it is for all of biology), the power of the technique has ensured that DNA sequencing has become one of the most utilized of the molecular approaches for inferring phylogenetic history. During the past few years, sequencing studies have accounted for about half of all molecular investigations of systematics, and about one-quarter of phylogenetic studies in general (Sanderson et al., 1993). This widespread use of the method is a direct result of the high information content relative to the effort needed to collect nucleotide sequence data, as compared to some other techniques. Nonetheless, sequencing is labor-intensive per individual and per locus examined, and other approaches are often more appropriate when many loci or individuals need to be sampled (see Chapter 12).

The primary attractions of nucleic acid sequencing include the facts that: (1) the characters (nucleotides) are the basic units of information encoded in organisms; (2) it is relatively easy to extract and incorporate information about molecular evolutionary processes into analyses; (3) sequence evolution is comparatively easy to model (and tests of the models are relatively straightforward; Goldman, 1993); and (4) the potential sizes of informative data sets are immense. Some species contain more than 10¹¹ nucleotide pairs per haploid genome, although the number of independent characters that could be used in phylogenetic
322 Chapter 9 / Hillis, Mable, Larson, Davis & Zimmer
analysis is considerably lower. Of course, in order to use nucleotide sequence positions in phylogenetic studies, homologous sequences must be aligned. The number and size of homologous sequences that can be aligned will differ depending on the level of comparison, but for most studies potentially useful variation is essentially inexhaustible in the near future.

Nucleic acid sequence data are compiled continuously in several data bases; GenBank (maintained by the U.S. National Center for Biotechnology Information at the National Library of Medicine) is perhaps the best known and most widely used. Another well-known data base is compiled by the European Molecular Biology Laboratory. These compilations represent, in effect, the largest comparative data sets ever collected. When the first edition of this book was published in 1990, there were fewer than 10⁸ nucleotides in GenBank. As of 1995, there are more than three times as many nucleotides (approximately 3 × 10⁸) in the database, from about 3.5 × 10⁵ sequences. The genomes of many viruses have been sequenced in their entirety, and the complete sequence of the genome of a free-living organism (Haemophilus influenzae) has been reported (Fleischmann et al., 1995). With an increased number of laboratories collecting nucleotide sequence data, and with continued advances in sequencing technology, the rapid increase is likely to continue. Although only a tiny fraction of these data are collected for systematic studies, many of the data can be used for this purpose. Unfortunately, most of the available data have been collected from a very few species. Figure 1 shows the taxonomic distribution of the sequences currently in GenBank, and it is clear that chordates are heavily overrepresented and most other eukaryotes are seriously underrepresented compared to extant species diversity. In fact, a large fraction of the sequences are from about ten species of commercial or medical importance to humans (Hillis, 1987). Thus, although sequence data collected for other purposes are a useful starting point for some systematic studies, systematists usually must collect comparative data for relevant species of interest.

Figure 1 Taxonomic distribution of nucleotide sequences in GenBank as of 1994. Labeled groups include viruses, chordates, arthropods, and nematodes. (From Hillis, 1994b.)

Although there are a number of strategies for obtaining sequence data for use in systematics, all methods have four basic steps. First, a particular target sequence must be identified that contains an appropriate amount of variation across species or individuals for the problem that is to be addressed (this step is discussed below, under "Applications and Limitations"; also see Chapter 8). Second, many copies of the target sequence must be isolated and purified from each individual to be examined. Third, the purified DNA (or, rarely, RNA) must be sequenced. Finally, homologous sequences must be aligned (alignment is discussed in the section on "Interpretation and Troubleshooting"). The various methods differ primarily in how the nucleic acid is isolated: "direct" methods involve either directly amplifying the target DNA or isolating abundant RNA transcripts; cloning methods involve the preparation and isolation of viral and/or bacterial vectors that contain copies of the sequence of interest. Each of the methods has distinct advantages and disadvantages and is particularly appropriate under certain conditions. In the next few sections, we outline the differences of the most widely used methods and discuss the relative merits of each approach.

Figure 2 Sequencing and cloning flow chart. Numbers refer to protocols.

Isolating Target Sequences

In molecular systematics, three methods are commonly used to isolate nucleic acids for sequencing (Figure 2). Cloning using recombinant DNA technology is the most widely used approach in molecular biology, although systematists often are reluctant to use this strategy because of the perceived time and effort involved. Alternatively, target DNA sequences can be amplified from whole genomic DNA or cDNA (Chapter 7) and sequenced directly, or RNA transcripts of the genes of interest can be isolated and sequenced. Amplification of target DNA sequences by the polymerase chain reaction has become the most widely used approach for comparative studies. Direct sequencing of RNA can be used to sequence nuclear-encoded ribosomal RNAs or (occasionally) messenger RNAs that are particularly abundant in a particular tissue (Weisman et al., 1986). For most applications, however, direct sequencing of RNA has been replaced by amplification (of genomic or cDNA) and sequencing; the exceptions are cases where a target sequence is difficult to amplify in a particular taxon.

Cloning

Whole genomic DNA can be used to construct genomic libraries, which contain virtually all of an organism's DNA cloned in pieces into a viral host, usually one of several derivatives of the lambda (λ) bacteriophage. The library often contains 10⁸ or more copies of packaged viral DNA, each with a fragment of the original organism's genome. In order to use the library, an investigator must find the viruses that contain the gene or region of interest and grow additional copies of these viruses (Figure 2). Although DNA can be
isolated and sequenced at this point, it is usually desirable to subclone the target sequence into a sequencing vector first (usually a bacterial plasmid or the virus M13 and its derivatives). Plasmid clones can be stored indefinitely in a freezer and grown in quantity whenever desired, and the target DNA can be easily isolated and sequenced.

Bacteriophage lambda is one of the most extensively used cloning vectors for initial cloning steps, such as construction of genomic and subgenomic libraries (for reviews, see Frischauf, 1987 and Sambrook et al., 1989). Numerous lambda derivatives have been constructed, each of which has advantages for certain cloning applications. This bacteriophage is a double-stranded DNA virus of approximately 50 kilobases (kb) in length, with single-stranded complementary ends that allow the lambda DNA to circularize after entering a bacterial host. In the host bacterium, the lambda DNA is replicated by one of two pathways (lytic and lysogenic cycles). In lytic growth, many copies of the bacteriophage DNA are produced via rolling circle replication, and are then packaged into a protein coat that consists of a head (that contains the DNA) and a tail. The host bacterium that contains these mature viruses is then lysed and the progeny phage are released. In a petri dish, this lysis can be visualized against a background of a bacterial lawn as a plaque (a clear spot in which the bacteria have been lysed).

Modifications of lambda bacteriophage for cloning have deletions of a central, non-essential (for lytic growth) portion of the genome, into which foreign DNA may be inserted. They also have been selected to have only one or two restriction sites for a given restriction enzyme, which are the target sites for cloning. In vectors with two cloning sites (known as replacement vectors), a fragment of the bacteriophage DNA is replaced with the foreign DNA; in vectors with a single site (known as insertion vectors), the foreign DNA is inserted into the bacteriophage. Numerous vectors have been constructed that contain a diversity of cloning sites and accommodate a relatively large range of foreign fragments. Many of these are commercially available as predigested phage arms, so that one only needs to digest the target DNA with an appropriate restriction enzyme, ligate the DNA into the cloning vector, and package the resulting recombinant lambda DNA with commercial packaging extracts to produce subgenomic libraries. (They are subgenomic libraries because restriction fragments of inappropriate sizes will not be represented.) The choice of cloning vector will depend upon the desired target site and upon the size range of inserts to be cloned. Lambda vectors generally are limited to inserted fragments of less than 23 kb, and many vectors can incorporate only much smaller fragments (typically in the range of 2-15 kb). If larger fragments must be cloned, then the libraries should be constructed in cosmids, which are specialized cloning vectors designed to accommodate fragments of up to 45 kb in length (see DiLella and Woo, 1987, or Sambrook et al., 1989 for more information).

In order to use the gene library, one must screen the recombinant lambda (or other) clones to find the particular gene or DNA region of interest. Numerous methods have been developed for screening gene libraries (see Berger and Kimmel, 1987; Ausubel, 1989). Most of these methods involve either hybridizing with a nucleic acid probe (see Protocol 8) or immunoscreening for expressed proteins of interest. Hybridization is the more general procedure, although it is not as efficient for screening for protein-coding genes that can be expressed in vivo.

If the library is composed of random fragments of average size I from a genome of size G, the number of independent clones (N) that must be screened to isolate a single-copy fragment of interest with a probability of P can be calculated by the formula

N = ln(1 - P) / ln(1 - I/G)

As a rough approximation, one must screen approximately five times the number of base pairs of DNA that are in the genome [e.g., (I × N)/G = 5] to have a 99% chance of locating a specific single-copy gene (Seed et al., 1982). However, the total number of clones screened can be much smaller for sequences that are present in high copy number (e.g., rRNA genes, mtDNA, cpDNA,
and highly repeated heterochromatic sequences). Screening efficiency can be increased greatly by cleaving the genomic DNA with appropriate restriction enzymes (rather than random shearing) and selecting lambda vectors that only accept restriction fragments in the size range of the desired target sequence. Of course, for this approach, a restriction map must be obtained before the library can be constructed (see Chapter 8).

Once the appropriate DNA fragment has been cloned and isolated in a lambda vector, it can be subcloned into a plasmid or the virus M13. Subcloning is accomplished by cleaving the target sequence from the lambda with the same restriction enzyme with which it was originally cloned, and then ligating the ends of the DNA with plasmid or M13 DNA that has been cleaved with a restriction enzyme to produce compatible ends (see Protocol 10 and Chapter 8). The subcloning vector is then introduced into a bacterial host in a process known as transformation (see Protocols 11-13). This allows the cloned fragment to be grown in quantity, easily isolated, and sequenced. DNA amplified in vitro (i.e., PCR; see Chapter 7 and below) also can be cloned in this manner in order to purify heterogeneous amplification products, produce a stable clone (so that the isolated sequence can be used for other purposes or by other individuals), and facilitate sequencing. Until recently, M13 was the most widely used sequencing vector because it allowed single-stranded sequencing, which provided superior autoradiographs compared to double-stranded sequencing. However, new double-stranded sequencing protocols have greatly improved sequencing of plasmid clones, and the greater ease with which plasmids are grown, manipulated, and stored is a strong point in their favor. Also, some recently developed plasmids have single-stranded forms that can be used for single-stranded sequencing. Improved methods for insertion of PCR products into plasmid vectors (TA cloning, incorporation of restriction sites in PCR primers) also have made direct cloning more accessible (see Protocol 18).

In Vitro Amplification

Direct sequencing from complex genomic DNA has been made possible with the development of the polymerase chain reaction (PCR) technique (Kleppe et al., 1971; Mullis and Faloona, 1987; Ochman et al., 1988; Saiki et al., 1988; see also Chapter 7). Starting with virtually any amount of DNA, it is possible to amplify a target sequence up to microgram quantities. Under some circumstances (see some of the limitations discussed in Chapter 7), this amplified DNA is sufficiently homogeneous and of sufficiently high quality for sequencing (Figures 2 and 3). Double-stranded DNA produced by PCR amplification can be sequenced directly or by generating single-stranded DNA from the amplification product (see review by Bevan et al., 1992). Single-stranded DNA is generated by asymmetric reamplification using an excess of one of the primers (Gyllensten and Erlich, 1988; Chapter 7), by treatment with exonuclease (Higuchi and Ochman, 1989), or by use of biotinylated primers (Mitchell and Merrill, 1989). The basic requirement for PCR is that the sequences of the regions flanking the target sequence are known so that primers to these regions can be constructed (however, a method called inverted PCR can be used to amplify outside of a known region; Ochman et al., 1988). Using this methodology, it is possible to sequence DNA isolated from a wide variety of sources, including preserved museum specimens and (under special circumstances) fossils or subfossils (Pääbo et al., 1988; T. J. White et al., 1989; Golenberg et al., 1990; DeSalle et al., 1992; Cano et al., 1993; DeSalle, 1994; Cano and Borucki, 1995). However, at least some "fossil" sequences that have been reported (e.g., Woodward et al., 1994) are clearly contaminants from recent humans (Hedges and Schweitzer, 1995; Allard et al., 1995; Zischler et al., 1995) or other sources.

RNA Isolation

The third approach to isolating target sequences is to isolate the transcribed RNA and sequence it using reverse transcriptase (Hamlyn et al., 1978; Protocol 23). Before PCR, this method was used extensively in sequencing the ribosomal RNA genes, and the method is still used to sequence RNAs that prove difficult to amplify. Ribosomal RNA is relatively easy to sequence directly because it accounts for a large fraction of the total cellular
RNA. Some regions of rRNA sequences are conserved throughout most living organisms, and several universal primers have been constructed that are complementary to these regions (Lane et al., 1985; Hillis and Dixon, 1991). These primers can be used to sequence several regions of rRNA from virtually any organism. This technique has had a major impact on systematic studies of prokaryotes (e.g., G. E. Fox et al., 1980) and has been applied throughout metazoan and plant groups as well (e.g., Field et al., 1988). However, in many studies, the approach has been replaced by PCR amplification of the rRNA genes (Chapter 7).

Nucleic Acid Sequencing

Although protein sequencing became a routine (albeit costly and labor-intensive) method for the study of protein molecular evolution by the late 1950s, nucleic acid sequencing did not become commonplace in studies of molecular systematics until the 1980s. In fact, until the mid-1970s, only stretches of DNA 15-20 base pairs in length had been sequenced. Breakthroughs in nucleic acid sequencing were published almost simultaneously by Maxam and Gilbert (1977) and Sanger et al. (1977). These procedures are outlined in Figures 4 and 5, respectively.

Figure 3 Amplification of a conserved region of mtDNA via the polymerase chain reaction. The primers shown amplify a region of the mitochondrial cytochrome b gene in vertebrates and some invertebrates (Kocher et al., 1989).
[Figure 3 diagram: a double-stranded template with an unknown region of about 300 bp between the two primer sites. Steps shown: heat at 95°C to separate strands; cool to 50°C to anneal primers; warm to 70°C for DNA replication (primer extension via Taq polymerase); repeat cycle about 30 times.]
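
To give a feel for why about 30 cycles of the kind shown in Figure 3 can yield a quantity of product in the microgram range, the sketch below computes the idealized copy number and mass of a PCR product. It is a back-of-the-envelope model, not a description of any protocol in this chapter: perfect doubling per cycle, the starting copy number, and the average mass of about 650 g/mol per base pair are assumptions, and real reactions reach a plateau well before unlimited doubling.

```python
AVOGADRO = 6.022e23
# Assumption: approximate average molar mass of one DNA base pair.
BP_MASS_G_PER_MOL = 650.0

# Temperatures as labeled in Figure 3; the step names are ours.
CYCLE = [("denature", 95), ("anneal", 50), ("extend", 70)]


def copies_after(cycles, start_copies, efficiency=1.0):
    """Idealized copy number; efficiency 1.0 means perfect doubling per cycle."""
    return start_copies * (1.0 + efficiency) ** cycles


def mass_micrograms(copies, length_bp):
    """Mass of `copies` double-stranded molecules of `length_bp` base pairs."""
    grams = copies * length_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e6


for step, temperature_c in CYCLE:
    print(f"{step}: {temperature_c} C")

# Hypothetical reaction: 1000 template copies, 300-bp product, 30 cycles.
copies = copies_after(cycles=30, start_copies=1000)
print(f"{copies:.2e} copies, roughly {mass_micrograms(copies, 300):.2f} micrograms")
```

Lowering `efficiency` below 1.0 shows how quickly the yield falls when each cycle copies only a fraction of the templates.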

Maxam-Gilbert (Chemical) Sequencing separated into its two complementary strands,

Maxam-Gilbert, or chemical, DNA sequencing re- each of which was then end-labeled with a2P.The
lies on the use of base-specific modification and two complementary strands were each se-
cleavage reactions (Figure 4). In the original ver- quenced, and the results compared to check for
sions of this method (Maxarn and Gilbert;, 1977, errors. Development of sequencing vectors espe-
1980), a DNA fragment was electrophoretically cially designed for use in Maxam-Gilbert se-
SGATCAGGCTTAAGCA-3'
3'-CTAGTCCGAATTCGT-5'
'P-GATCAGGCT TAAGCA
(Cleaves at G ) (Cleaves at A I- G) (Cleaves at T + C) (Cleaves at C)

*P-GATCA *P-G *P-GA *P-GAT
'P-GATCAG *P-GATC *P-GAT *P-GATCAGG
*P-GATCAGGCTTAA *P-GATCA *P-GATCAGG 'P-GATCAGGCTTAAG
*P-GATCAG "P-GATCAGGC
*P-GATCAGGCTT "P-GATCAGGCT
*P-GATCAGGCTTA *P-GATCAGGCTAAG
*P-GATCAGGCTTAA
*P-GATCAGGCTTAAGC
I
G A+G T+C C
A
C
G
A
A
T
T
C
G
G
A
Figure 4 Maxam-Cilbert (chemical C
cleavage) sequencing. See text for ex- T
planation. A
328 Chapter 9 1 Hillis, Mable, Larson, Davis & Zimmer
quencing (Eckert, 1987) greatly facilitated the pro- Next, a short segment of DNA (typically 15-25
cedure; these vectors allow selective end-labeling, bp) known to be complementary to a segment on
SO that either strand can be sequenced without the target DNA (or in the adjacent sequencing
separation into single-stranded fragments. Se- vector) is annealed to the target sample; this short
quencing is accomplished by dividing the target fragment is known as a primer. The sample is
DNA into four subsamples and then treating the then divided into four subsamples, to each of
subsamples with a series of base-specific chemical which is added the four deoxynucleotides (i.e.,
reagents that partially cleave the DNA. The first dATP, dCTP, dGTP, and dTTP, at least one of
sample is treated with dimethyl sulfate, which which is radioactively labeled) and DNA poly-
mekhylates a few percent of the guanines in the merase. In addition, one of four dideoxynu-
sequence, and piperidine, whicl~displaces the cleotides (ddNTP) is added to each of the tubes
methylated guanines and thereby cleaves the (ddATP, ddCTP, ddGTP, or ddTTP, respectively).
DNA at these sites. The second sample is treated The primer has a free 3' OH group on its deoxyri-
with formic acid, which protonates a few percent of purine-ring nitrogens, and piperidine, which then displaces the affected purines (adenine and guanine). The third sample is treated with hydrazine, which removes cytosine and thymine from the DNA and leaves ribosylurea. The DNA is then cleaved at these sites with piperidine. The fourth sample is treated like the third, except that the hydrazine treatment is conducted in the presence of NaCl, which suppresses the reaction of thymine with hydrazine (so the DNA is cleaved only at cytosines). In all of these subsamples, chemical cleavage is carried out under conditions in which only a few of the respective nucleotides in any given fragment are affected. However, because the cleavage is random and the population of DNA fragments examined is large, some fragments will be cleaved at each of the nucleotide positions (Figure 4). The radioactively labeled fragments from the four subsamples are then electrophoretically separated by size on a denaturing polyacrylamide gel and visualized by exposing the dried gel to X-ray film to produce an autoradiograph. The sequence of the DNA sample can then be read directly from the autoradiograph (Figure 4).

Sanger Dideoxy Sequencing
Sanger sequencing, or controlled interruption of enzymatic DNA replication, uses dideoxynucleotide analogs in primer-directed DNA extension to produce discrete DNA fragments (Figure 5). The double-stranded DNA is first denatured to produce single-stranded DNA (or single-stranded DNA is isolated from single-stranded vectors).

bose, to which additional nucleotides can be attached. The DNA sequence is extended by the DNA polymerase using the target DNA as a template (Figure 5). On some strands in the sequencing reaction, a given ddNTP will be incorporated into the growing strand, at which point the polymerization is terminated because the ddNTP lacks a 3' OH group. The radioactive fragments in the four subsamples are then separated by denaturing polyacrylamide gel electrophoresis and visualized by autoradiography, as with Maxam-Gilbert sequencing. The fragments in each subsample will terminate with the corresponding ddNTP (which is complementary to the dNTP on the template sequence), and the sequence of the target DNA can be read directly from the autoradiograph (Figure 5).

Cycle Sequencing
Cycle sequencing is based on the dideoxynucleotide chain-termination method of Sanger et al. (1977) but utilizes a linear polymerase reaction to amplify labeled DNA that is complementary to the target DNA (Murray, 1989; Craxton, 1991). As discussed for the Sanger method, an appropriate primer molecule is annealed to a complementary single-stranded segment of DNA in the presence of deoxynucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs). Enzymatically controlled DNA synthesis is initiated at the 3' OH terminus of the annealed primer and continues until chain growth is terminated by incorporation of one of the four dideoxynucleotides. In thermal cycle sequencing, DNA synthesis is catalyzed by a thermostable DNA polymerase
(e.g., Taq or Vent DNA polymerase). Heat denaturation of double-stranded template allows labeled primers access to a single strand and subsequent extension by the polymerase. Successive cycles of denaturation, annealing, and synthesis result in a linear amplification of labeled product (thus it is not a chain reaction). As in the Sanger method, radioactive fragments in the four subsamples are separated by denaturing polyacrylamide electrophoresis and visualized by autoradiography.

Figure 5 Sanger (enzymatic) sequencing. See text for explanation.
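As a rough illustration of how a sequence is read from the four lanes of a Sanger or cycle sequencing gel, the following Python sketch lists the fragment lengths expected in each ddNTP lane and then reads the bands from shortest to longest. The function names `sanger_lanes` and `read_gel` are hypothetical. The sketch assumes that the template is written 3' to 5' starting immediately after the primer-binding site, and it ignores primer length, incomplete termination, and band compressions.

```python
# Watson-Crick complements used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_lanes(template):
    """Return the fragment lengths expected in each ddNTP lane.

    `template` is assumed to be written 3'->5', so the new strand
    (read 5'->3') is the position-by-position complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends in the ddNTP matching `base`.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from shortest to longest band across lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("TACG")
    print(lanes)
    print(read_gel(lanes))
```

For the template `TACG`, each lane holds a single band and the gel reads `ATGC`, the complement of the template.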
Although labeling of template can be achieved through incorporation of an alpha-labeled deoxynucleotide triphosphate into the nascent chain (α32P or α35S), the efficiency of the reactions can be increased greatly by using end-labeled primers. In this type of reaction, T4 polynucleotide kinase (PNK) is used to add a γ32P (or γ3") from rATP to the 5' end of a primer molecule. Primers also can be biotinylated for chemiluminescent DNA sequencing (Creasey et al., 1991) or end-labeled with fluorescent dyes for automated DNA sequencing (Tracey and Mulcahy, 1991; see below).

Automated Sequencing
There are a number of types of automated DNA sequencing, but most use Sanger sequencing with fluorescently labeled (rather than radioactively labeled) DNA fragments. These fragments are detected during electrophoresis with the use of a tunable laser. The laser is stationary with respect to the electrophoresis apparatus, and fragments are recorded as they pass a single point. The process is "automatic" in that one does not visually inspect an autoradiograph and manually record the results; instead, the sequence is recorded directly into a computer or onto paper in the form of a chromatograph (Plate 2), which may be interpreted (by computer software or visually) into a DNA sequence. Because of their expense and maintenance requirements, automated sequencing machines often are maintained as part of a shared institutional facility with supervising technicians hired to operate them. However, ownership and operation of automated sequencers by individual laboratories is becoming more common as costs decrease. It is likely that most large-scale DNA sequencing will be done by automated systems at some point in the future (Hunkapiller et al., 1991), although the automated sequencing machines of tomorrow will likely be based on radically different technologies from those of today (e.g., Jett et al., 1989; Soper et al., 1991; Harding and Keller, 1992).

Imaging systems that read information from autoradiographs of manually produced sequencing gels constitute an alternative to automated sequencing. They also permit sequence information to be read from a gel directly into a computer without the need for visual inspection of an autoradiogram. Imaging systems also are expensive and usually are acquired as shared institutional facilities, but they have the advantage of utility for analyzing diverse kinds of electrophoretic data in addition to sequencing sets. Detailed protocols are available from the manufacturers of both automated sequencers and imaging systems so we will not duplicate them here.

Assumptions
In all broad-scale comparative studies, certain implicit assumptions are made by the investigators. In the case of DNA sequencing, these include assumptions about biochemical methodology and about genome and organismal evolution. It is important to realize the limitations placed upon interpretation of sequence data that arise from such assumptions.

At the biochemical level, the homogeneity of input DNA and the fidelity of DNA replication are issues of primary importance. Contamination of template DNA (prepared either in vivo via cloning or in vitro via PCR) or polymorphism in the sequence based on interallelic variation or pooling of individual samples may lead to uninterpretable or incorrect sequences. Contamination is a potentially serious issue with PCR because even a single strand of foreign DNA can be amplified and thereby confound results. Contamination problems in PCR are most likely to occur for DNA samples that are difficult to amplify because the sample contains chemical impurities, the target DNA is degraded, or the primers do not produce a good match to the template. In these cases, small amounts of contaminating DNA that otherwise would have been overwhelmed by the target DNA may be amplified in place of the target DNA. High fidelity of DNA replication is important both in the amplification of the input DNA (done either in vivo or in vitro) and in the various methods that constitute controlled interruption of enzymatic replication. When sequencing amplified DNA, one assumes that the sequence determined is the original one isolated from the genome and not a variant produced during DNA replication. Some polymerases (e.g., Taq) have a high error rate (incorporation of the incorrect nucleotide during DNA replication), so sequencing of multiple isolates is needed to confirm a sequence.

Another implicit assumption when using DNA sequence information for phylogenetic studies is that homologous genes can be reliably identified for comparison. If primers designed for PCR recognize and anneal at more than one locus in a particular genome, this assumption can be violated. Recent studies have indicated that tandem duplications sometimes occur in mitochondrial DNAs (e.g., Moritz and Brown, 1987; Zevering et al., 1991) and that some mtDNA genes have putative non-functional nuclear copies (e.g., M.F. Smith et al., 1992). Single-copy nuclear genes also have been used recently for systematic studies but the potential for amplification of paralogous loci and pseudogenes is an even higher risk. PCR amplification can also produce recombinant sequences (i.e., combinations of nucleotides not found on a single strand of DNA in the individual amplified) if individuals are heterozygous for a particular gene locus or primers anneal to multiple sites (Saiki et al., 1985; Scharf et al., 1988a,b; Scharf, 1990; Bradley et al., 1993). The amplification of multiple products is a common result of amplifications involving nuclear genes and may be due to the presence of processed pseudogenes or other duplicated gene copies. In addition, formation of heteroduplexes between the target gene and other fragments (Zorn and Krieg, 1991; Valentine et al., 1992) can make isolation of uniform sequences difficult. When target genes are part of a gene family, identifying orthologous sequences can be a problem if different gene copies cannot be distinguished by size, and amplification of multigene families is subject to both selection and drift during the PCR process (A. Wagner et al., 1994). Most of these problems can be overcome by (1) cloning, (2) using rapid screening techniques to detect sequence heterogeneity (reviewed by Lessa and Applebaum, 1993), (3) using procedures that discourage amplification of recombinant sequences (see Chapter 7), and/or (4) examining gene families in which known copies exist as distinctly sized products (Lessa, 1992; Palumbi and Baker, 1994; see "Interpretation and Troubleshooting").

When the target sequences are part of a gene family in which copies within an individual are homogenized (i.e., undergo concerted evolution), it is important to use a typical or consensus repeat sequence. It also is desirable to determine in advance that the rate of concerted evolution (gene family homogenization) of the repeats in that family significantly exceeds that of speciation for the groups of organisms compared (Hillis and Davis, 1988; Sanderson and Doyle, 1992). A preliminary restriction endonuclease cleavage analysis (Chapter 8) can produce an estimate of the homogeneity of the gene family to be compared and this may dictate the appropriate sampling strategy (Chapter 2). Assumptions of level of homology are important to comparative sequence studies (see Hillis, 1994a); for most phylogenetic studies except study of gene phylogenies, it is critical that orthologous (rather than paralogous) genes are being compared (see Chapter 1). However, families of genes undergoing concerted evolution (and thereby exhibiting plerology; Patterson, 1988) also can be used to reconstruct relationships among lineages, as long as the branches in the tree are separated by enough time to allow homogenization to occur (Sanderson and Doyle, 1992).

With respect to analyses based on nucleic acid sequences, both alignment and phylogenetic inference steps involve making either implicit or explicit assumptions about evolutionary processes. Alignment algorithms are designed to maximize percent sequence similarity while minimizing the number of insertion/deletion events (see "Interpretation and Troubleshooting"). Thus, base substitutions are assumed to be more frequent during evolution and are penalized less severely by the alignment algorithms. Furthermore, alternative alignments may be equally good, and the extent of reciprocal illumination between alignment and phylogeny reconstruction procedures has not been standardized. Choice of methods for phylogenetic inference and character weighting strategies also relies to varying degrees on assumptions about evolutionary processes (see Chapter 11).
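The trade-off between base substitutions and insertion/deletion events in alignment, noted above, can be sketched with a minimal global alignment score in the style of Needleman and Wunsch. This is an illustrative sketch only. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen so that a gap costs more than a substitution; they are not taken from this chapter or from any particular alignment program.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b (dynamic programming).

    Gaps are penalized more heavily than mismatches by default, which
    encodes the assumption that substitutions are more frequent than
    insertion/deletion events.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per position.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two bases
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGGT"))  # one substitution
    print(global_alignment_score("ACGT", "AGT"))   # one deletion
```

With these assumed values, a single substitution (`ACGT` against `AGGT`) scores 2, whereas a single deletion (`ACGT` against `AGT`) scores 1. Changing the relative size of `mismatch` and `gap` can change which of several alternative alignments is preferred, which is one reason the choice of penalties is itself an evolutionary assumption.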
Comparison of the Primary Techniques

Cloning versus Direct Sequencing
Development of the PCR method has had a major impact on systematics, because it is faster and sometimes easier to amplify DNA using PCR than to clone (see Chapter 7). The disadvantages are that (1) the sequences of the flanking regions usually must be known; (2) the individual should be homozygous or otherwise homogeneous for the sequence of interest (or else steps must be taken to control in vitro recombination); (3) only relatively short fragments can be sequenced directly; (4) no clone is produced for verification or use in further work; (5) linear amplified DNA can be harder to sequence than is circular cloned DNA; and (6) the cost of Taq polymerase can be quite high. Because of the need for known flanking regions, the technique has been used mainly for regions for which complete and closely related sequences are available (e.g., Wrischnik et al., 1987) or regions that are flanked by highly conserved sequences (see the section on "Useful Primers" in Chapter 7). However, methods are available for sequencing outside (rather than inside) of known flanking regions (Ochman et al., 1988; Loh et al., 1989), although technical difficulties with implementation prevent the widespread use of inverse PCR. Most of the disadvantages of PCR (except for the requirement of known flanking regions and the high cost) can be overcome by combining DNA amplification and cloning strategies; cloning from the amplified DNA products is considerably faster than cloning from whole genomic DNA (see Protocol 18), especially since cloning methods have been designed specifically for PCR products [e.g., TA cloning (Marchuk et al., 1991), blunt-end ligations (Liu and Schwartz, 1992), or incorporation of restriction fragments into PCR primers]. However, sequencing directly from PCR amplification products reduces problems associated with errors made by Taq polymerase, because errors in all but the earliest rounds of amplification will appear as ambiguities. Length polymorphisms of any size in the target DNA cause the greatest difficulties for PCR, because two offset sequences produce unreadable sequence gels. However, these problems can be reduced by gel-isolation of the target bands using low-melting-point agarose and then extracting the DNA using one of several purification methods (see Protocol 19 and "Troubleshooting").

Another method for purifying amplification products involves using two internested sets of primers. After the initial amplification round has been completed using the two external primers, the DNA is reamplified with a set of primers internal to the first. This helps to purify the amplification product and reduces ambiguity (Mullis and Faloona, 1987). It also is possible to reduce the cost of Taq polymerase by isolating and purifying recombinant Taq enzyme following overproduction in E. coli (Engelke et al., 1990), although some applications of Taq polymerase are covered by patents that require purchase of the enzyme from a licensed source. Moreover, because of the time and effort required to isolate and purify Taq polymerase, many laboratories probably will prefer to continue to use commercially prepared DNA polymerases.

One major advantage of PCR is that the method can be used to obtain sequences from alcohol-preserved tissues (Kocher et al., 1989) or, rarely, from fossil or subfossil specimens (Pääbo et al., 1988; T.J. White et al., 1989; Golenberg et al., 1990; DeSalle et al., 1992; Cano et al., 1993; Thomas and Pääbo, 1993; DeSalle, 1994; but see Zischler et al., 1995). Because nucleic acids from such sources tend to be degraded, PCR is the only approach to sequencing that typically is applicable under these conditions. It is thought that PCR can amplify fragmented DNA by "reassembling" the target from the fragments through successive cycles of annealing and extension, with each partially extended sequence acting as a primer for the next fragment. Under these conditions, it is highly likely that the reassembled DNA is composed of fragments of sequence from many different copies of the target locus, but this is not usually a problem for high-copy, homogeneous targets such as most animal mtDNA. Application of methods for extracting DNA from minute tissue samples that have been developed for forensic purposes (e.g., Chelex extractions; see Protocol 3) has greatly improved the potential for obtaining usable sequence information from ancient or fragmented DNA.
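The point made earlier in this section, that direct sequencing of PCR products reduces the impact of Taq polymerase errors, can be illustrated with simple arithmetic. The sketch below assumes idealized amplification in which every strand is copied in every cycle and a single misincorporation arises on one newly made strand during a given cycle. The function name and its parameters are hypothetical, and real reactions (which plateau and amplify unevenly) will deviate from these numbers.

```python
from fractions import Fraction


def error_fraction(start_strands, error_cycle):
    """Share of the final product descended from one erroneous strand.

    Assumes perfect doubling: after `error_cycle` cycles there are
    start_strands * 2**error_cycle strands, exactly one of which carries
    the new error. Because all strands double equally afterwards, that
    share stays constant through the remaining cycles.
    """
    if start_strands < 1 or error_cycle < 1:
        raise ValueError("start_strands and error_cycle must be >= 1")
    strands_after_cycle = start_strands * 2 ** error_cycle
    return Fraction(1, strands_after_cycle)


if __name__ == "__main__":
    for cycle in (1, 2, 5, 10, 20):
        share = error_fraction(start_strands=100, error_cycle=cycle)
        print(f"error in cycle {cycle:2d}: {float(share):.2e} of product")
```

Under these assumptions, an error made in the first cycle on a single starting strand is carried by half of the product, whereas an error made in cycle 10 from 100 starting strands is carried by 1 in 102,400 product strands. Because a cloned insert derives from a single product molecule, it would display such an error as an unambiguous but incorrect base, whereas the pooled product read in direct sequencing is dominated by the correct sequence.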
The other direct sequencing technique, RNA sequencing, also is faster and easier than cloning and sequencing. Direct RNA sequencing is used to obtain sequence information for nuclear-encoded ribosomal RNA for phylogeny reconstruction and rapid identification of microorganisms. Direct RNA sequencing has the disadvantages of (1) only having a single strand available (so verification requires use of overlapping primers on a single strand rather than two-strand verification); (2) requiring fresh (or in some cases frozen) tissues; (3) providing access only to transcribed regions; (4) having greater difficulties in regions of strong secondary structure; (5) having difficulties in polymorphic regions; and (6) not producing a stable and verifiable clone. The regions that are accessible to direct RNA sequencing (primarily the nuclear ribosomal genes) are also accessible to PCR; thus, sequencing the amplified products of PCR is the preferred direct sequencing method because of the greater accuracy and the possibility of using poorer quality tissue samples.

Maxam-Gilbert versus Sanger Sequencing
Although both Maxam-Gilbert and Sanger sequencing have been used extensively for determining DNA sequences, we emphasize Sanger sequencing in this chapter for several reasons. Modifications of Sanger sequencing can be used to sequence both DNA and RNA, whereas Maxam-Gilbert sequencing is applicable only to DNA. Furthermore, Maxam-Gilbert sequencing requires a priori knowledge of the restriction map of the target sequence, because it is necessary to cleave the DNA into manageable size pieces for sequencing. Sanger sequencing requires no prior knowledge of the sequence, because primers can be complementary to the cloning vectors or, in the case of PCR, amplification primers can also be used as sequencing primers. Modifications of Sanger sequencing are used in cycle sequencing and in automated sequencers. Finally, it is possible to read more sequence information per gel with Sanger than with Maxam-Gilbert sequencing, because of better band resolution. However, Sanger sequencing can present problems in sequencing regions with strong DNA secondary structure, which is not a problem with Maxam-Gilbert sequencing. Occasionally, a section of DNA will be highly resistant to Sanger sequencing and can be sequenced only using chemical degradation. In addition, Maxam-Gilbert sequencing is easily adaptable to some massive sequencing efforts (see below).

Cycle Sequencing versus Other Methods
One of the main advantages of cycle sequencing is that it reduces the amount, and, to some extent, the quality of template necessary for sequencing. Enough template can be generated from a single 25-µl PCR reaction to result in clear resolution of DNA sequences. This is due to both the increased efficiency of end-labeled primers and to a modest linear amplification of the initial template DNA. Although most protocols recommend purification of PCR templates prior to sequencing, direct sequencing of unpurified PCR products can be achieved when clean amplification without evidence of length (or other) heterogeneity is apparent. Sequences also may be obtained directly from phage plaques and bacterial colonies without purification (Krishnan et al., 1991; Young and Blakesley, 1991).

Another advantage of cycle sequencing is that direct labeling of double-stranded template circumvents problems usually associated with direct sequencing of double-stranded DNA or with generation of single-stranded templates from double-stranded DNA segments. In cycle sequencing, more of the labeled primer is extended by the DNA polymerase than in standard double-stranded DNA sequencing protocols, which results in less wastage of primer and shorter autoradiograph times (Murray, 1989). Many sequencing artifacts are removed by the thermal cycling procedure and therefore are not as apparent. Single-stranded DNA sequencing may result in less ambiguous sequences (i.e., fewer stops and more readable sequences close to the primer) than double-stranded sequencing, but the generation of single-stranded DNA can be a time-consuming and labor-intensive procedure (Tripathi, 1991). Single-stranded DNA may be produced from double-stranded templates by cloning into a vector such as M13 or through asymmetric
PCR. However, cloning is time-consuming, and generation of sufficient asymmetric product for sequencing can be problematic. Cycle sequencing utilizes double-stranded template and therefore reduces the time of preparation of sequencing samples and can result in cleaner sequences compared with other double-stranded sequencing methods.

Cycle sequencing reactions are quick and efficient and leave less room for experimental error because most of the procedure is performed in a thermal cycling machine. The efficiency of the reactions can be increased further by using such time-saving measures as automatic pipetters and microtiter plates (V. Smith et al., 1993). The reaction is also very versatile. It can be used for single-stranded or double-stranded DNA, can utilize cloned products or PCR products, and can generate templates for direct or automated sequencing.

The major limitation of the procedure is that it incorporates some of the problems associated with the polymerase chain reaction. These include specificity of primer design, standardization of reaction conditions across thermocycler machines, establishment of optimal reaction conditions for new taxa or new sets of primers, potential incorporation of Taq polymerase errors, and high cost of thermally stable polymerases.

Other Methods of DNA Sequencing
Several other modifications to DNA sequencing have been developed, each of which has specialized applications. Genomic sequencing (Church and Gilbert, 1984) can be used to assess sequence information directly from genomic DNA. In this procedure, fragments from completely restricted whole genomic DNA are partially cleaved using chemical degradation. These fragments are then separated on an acrylamide gel and electrophoretically transferred to a nylon membrane. The fragments on the membrane are then hybridized to a series of specific probes so that DNA sequences can be visualized. The technique thereby eliminates cloning or in vitro amplification steps. Another advantage to this approach is that information on DNA methylation is preserved, whereas this information is lost during cloning and amplification steps. However, this procedure is not generally applicable to most systematic studies, because each visualized sequence requires a separate probe and an appropriately located restriction site. In addition, the low-copy number of each target sequence results in a weak signal on the hybridized blot (Church and Kieffer-Higgins, 1988). A related technique known as multiplex sequencing was described by Church and Kieffer-Higgins (1988) that has been used for massive sequencing of entire prokaryotic genomes (and similarly large sequencing efforts). Although several variations are possible, the key to multiplex sequencing is the combined sequencing of numerous clones on a single gel, each of which is incorporated in a distinct vector. The DNA is then transferred to a nylon membrane as in genomic sequencing, and the sequences of the individual clones are visualized by successive hybridization to vector-specific probes. The advantage of the technique is the reduction of repetition of many of the sequencing steps by a factor of twenty or more. However, the technique is unlikely to be incorporated in most smaller scale systematic studies because of the need for cloning into a large number of distinct vectors.

Sequencing technology is continuing to advance rapidly, driven in part by the current interest in sequencing whole genomes. One especially promising technology involves cleaving nucleotides (using exonucleases) from a single strand of DNA in a flow cytometer, then detecting and identifying these nucleotides in order in a constant stream (Jett et al., 1989; Shera et al., 1990; Soper et al., 1991; Harding and Keller, 1992). Detection and identification can be accomplished by laser spectroscopy of the fluorescently labeled nucleotides (Shera et al., 1990). The technique has considerable promise for very rapid sequencing of large DNA molecules, but it still has numerous technical limitations that need to be overcome before the method can be widely used (L.M. Davis et al., 1992). Whether or not this particular method becomes generally applicable, radically new approaches such as this are needed to undertake massive sequencing projects such as the human genome initiative.
APPLICATIONS AND LIMITATIONS

There are three major areas of application of comparative nucleic acid sequencing in systematic studies: (1) evolution of genes, including studies of the processes that produce sequence-level variation, studies of the origin of new alleles or new loci, and investigations of convergence and selection; (2) intraspecific or populational studies, including the tracing of organismal and allelic genealogies within species and studies of geographic variation, gene flow, hybridization, and conservation genetics; and (3) interspecific studies, such as the construction of species phylogenies to evaluate macroevolutionary patterns and processes. Within these three categories lie an immense number and diversity of specific applications, from ecological and behavioral analyses, through developmental studies, to investigations of population genetics, to taxonomic and systematic applications, to studies of the epidemiology of diseases. Many of these applications were reviewed thoroughly by Avise (1994); we provide only a few examples here.

All applications must begin by matching the level of variability of the molecule to be studied with a set of systematic comparisons representing the appropriate evolutionary timescale. However, it must be cautioned that ephemeral lineages that existed relatively far into the past are unlikely to be recovered unambiguously by any molecular (or other) method. The evolutionary variability of any molecule is a balance between mutational input and the constraints of structure and function. The relative constancy of these interactions across taxonomic groups for some molecular sequences produces the useful result that the expected rate of evolution of these sequences can be predicted within broad confidence intervals (see Chapter 12). Therefore, specific sequences can be recommended for various applications with reasonable expectation that they will provide the desired level of variability. Molecular sequences will be readily accessible for systematic study, however, only if sequence information is already available for homologous sequences from related organisms.

The use of percent sequence divergence between homologous sequences to ask whether they will be useful for inferring phylogenetic relationships for particular taxa can be misleading. If a small proportion of sites are free to vary and others are highly constrained, the comparison may be an uninformative combination of invariant sites and sites that are saturated by change, although the percent sequence divergence will appear reasonable for systematic comparisons. Pilot tests should be conducted concerning the ability of the potential target sequences to provide the resolution needed for the question of interest. For instance, do the target sequences discriminate selected internal branches of the phylogeny of interest (including relatively recent and old ones)? Or, in the context of population studies, do the target sequences provide a reliable means for distinguishing among individuals or indicating their relatedness? Analysis of the substitutional properties of sequences relative to the requirements for phylogenetic reconstruction (following Larson, 1991a; Larson et al., 1992) and testing for systematic structure in the data (Hillis, 1991; Hillis and Huelsenbeck, 1992) may be useful components of a pilot study. The database for comparative sequences is continually increasing and comparison with results from other laboratories can greatly reduce the time involved in selecting appropriate target sequences for the level of comparison desired (e.g., Friedlander et al., 1992, 1994; Graybeal, 1994).

Evolution of Genes
For the investigation of gene evolution, the sequence to be studied is chosen for its own intrinsic interest, and the appropriate methods for analysis depend on (1) whether the DNA sequence is located in a nuclear or organellar genome; (2) the number of copies of the DNA sequence present per cell; (3) whether the DNA sequence is transcribed and, if so, (4) which tissues contain the largest relative abundance of the transcript. If the sequence is present in an organellar genome, fractionation of the cellular components is a useful first step for sequence analysis, although this can be avoided if sequence information is available for homologous regions from a
sufficiently closely related organism. In the latter case, cloned or synthetic DNA can be used to select the sequence of interest from a genomic library made from a sample of total cellular DNA (Protocol 8); alternatively, synthetic DNA can be used to amplify the sequence of interest from a preparation of total cellular DNA using the polymerase chain reaction (Chapter 7). Sequences located in organellar genomes are present in sufficiently high frequency in cellular DNA preparations that cloning and amplification are relatively routine.

For nuclear DNA, cloning and amplification are routine for sequences that are extensively repeated relative to the total size of the nuclear genome. The genes encoding ribosomal RNA, for example, are in this category. If the DNA sequence is transcribed, direct sequencing of the RNA transcript is possible where the frequency of the transcript in total cellular RNA (or in the isolated polyadenylated or non-polyadenylated RNA fractions) is sufficient to permit direct sequencing. Weisman et al. (1986) showed that this latter approach is feasible even for a non-repetitive gene whose transcript is abundant only in a certain tissue at a certain stage of the life cycle, and where prior knowledge of homologous sequences is fragmentary. Previously, it was necessary to study the sequence evolution of alleles at most loci by first cloning the gene and its flanking regions (e.g., Ponath et al., 1989a,b). However, improvements in PCR and sequencing technology have made it possible to compare sequences from cDNA generated using reverse transcriptase amplifications and to directly amplify even single-copy nuclear DNA (e.g., Bradley et al., 1993).

Sequence information also can provide heretofore unavailable details about the molecular processes that are responsible for many evolutionary phenomena. For instance, several distinct hypotheses have been proposed for the origin of rare alleles of enzymes in hybrid zones, including intracistronic crossing over, hybrid dysgenesis (the release of previously controlled transposable elements), and differential selection in hybrid zones (reviewed by D.S. Woodruff, 1989; see Chapter 4). Sequencing of the rare and parental alleles has helped to distinguish among these hypotheses (Bradley et al., 1993). Other studies of gene evolution at the DNA sequence level are beginning to reveal considerable information about the processes of substitutional mutations (e.g., Gojobori et al., 1982; W.-H. Li et al., 1984; Moriyama et al., 1991; Bull et al., 1993a), the origin of new loci (Prager and Wilson, 1988), the evolution of gene families (e.g., Goodman et al., 1979), and convergence at the molecular level due to selection (Stewart and Wilson, 1987). In addition, phylogenetic analyses of sequences are being used to reconstruct the sequences of ancestral genes, promoters, and proteins, which can then be synthesized in vitro and tested in vivo (Adey et al., 1994; Jermann et al., 1995; Stewart, 1995). Moreover, study of the evolution of genes is also beginning to have an important impact on our understanding of related fields in molecular, cellular, and developmental biology (e.g., Schubert et al., 1993; Doyle, 1994).

Intraspecific Diversity
With improvements in PCR and sequencing techniques, intraspecific sequence variation at the species level now can be used in studies of epidemiology of diseases (e.g., Ou et al., 1992; Hillis and Huelsenbeck, 1994a), gene flow (e.g., Slatkin and Maddison, 1989, 1990; Hudson et al., 1992b), geographic variation (e.g., Vigilant et al., 1991; Hayes and Harrison, 1992; D.R. Maddison et al., 1992), and hybridization (e.g., Moritz et al., 1992b; Moritz and Heideman, 1993; Arévalo et al., 1994). However, there is a trade-off between the detailed information at one or a few loci in a sequencing study and the less detailed information at many more loci in studies of allozymes (Chapter 4), RAPDs (Chapter 7), or restriction sites (Chapter 8).

Animal mitochondrial DNA offers one of the best opportunities for applying DNA sequencing to the study of population genetic processes. Amplification and sequencing can be used to characterize the mtDNA haplotypes present in a population or species and to reconstruct the gene phylogeny that relates them. Because animal mtDNA is maternally transmitted (at least most of the time, in most species) and non-recombining, all parts of the molecule share the same historical
Nucleic Acids IV: Sequencing and Cloning
pattern of common descent (A.C. Wilson et al., 1985). The use of these gene phylogenies of mtDNA together with geographic information on the populations sampled provides a means for studying the genetic structure of populations (termed "intraspecific phylogeography" by Avise et al., 1987). Use of gene phylogenies forms the basis for the approach to population genetics known as coalescent theory. Coalescent theory provides a means for measuring gene flow among populations (Slatkin and Maddison, 1989, 1990; Hudson et al., 1992b; Templeton, 1993, 1995) and for evaluating the phenotypic effects of allelic substitutions (Templeton et al., 1988, 1992). Use of DNA sequence data for population genetic studies of plants tends to be more difficult than for animals because their organellar genomes (especially cpDNA) are less variable, thereby providing fewer markers for studying intraspecific variation; however, some instances of intraspecific variation have been reported for cpDNA (D.E. Soltis et al., 1992). Sequence surveys of plant mtDNA have been much less comprehensive than for animals, so it is possible that this molecule will prove useful in plant population genetics as well (Palmer, 1992).

Sequencing studies have found a place within the field of conservation genetics as well (reviewed by Avise, 1994). Studies of sequence variation (primarily within the mtDNA and cpDNA genomes) have focused on issues such as inbreeding depression, reduced heterozygosity in small populations, introgression, and identification of commercial products made from endangered species.

Use of nuclear loci for intraspecific studies has received far less attention than has use of the mtDNA genome. However, studies of nuclear intron sequences (e.g., Lessa, 1992; Lessa and Applebaum, 1993; Slade et al., 1993) and spacer regions of rRNA genes (Kambhampati and Rai, 1991) appear promising for such applications. The search for a paternally inherited counterpart to the maternally inherited mtDNA of animals has focused on introns in coding sequences of sex chromosomes (in species in which the male is the heterogametic sex), such as the Y chromosome of mammals. It remains to be seen if the variation in these loci is sufficient to justify their regular use in population genetics applications. In humans, no variation was found in a sample of 38 males in an intron of the ZFY gene on the Y chromosome, although variation was detected in other primate species (Dorit et al., 1995).

Interspecific Diversity

Interspecific studies include investigations across an immense span of time, from a geological instant through about 4 billion years of the history of life. Naturally, a target sequence that provides resolution at one end of this continuum is unlikely to be useful at the other end. Some of the genes or regions that have proved especially versatile include the ribosomal DNA arrays and the mitochondrial and chloroplast genomes. Various nuclear targets (other than rDNA) have been useful within particular groups, although gene duplication and the presence of pseudogenes have caused considerable difficulties for some applications.

There are many and varied applications of interspecific phylogenies. In many cases, the phylogeny is of intrinsic interest in its own right, as it informs us about the course of evolution and provides the basis for taxonomy and classification. However, interspecific phylogenies also are estimated to directly study such topics as speciation (e.g., Moritz and Heideman, 1993; Patton and Smith, 1994), biogeography (Page, 1993a, 1994), and co-speciation (Page, 1991; Brooks and McLennan, 1993; Chapela et al., 1994; Hinkle et al., 1994). Interspecific phylogenies are also needed to control for historical effects in studies of behavior, ecology, physiology, and other comparative organismal fields (Felsenstein, 1985c; Baum and Larson, 1991; Brooks and McLennan, 1991; Harvey and Pagel, 1991; Garland et al., 1992, 1993).

The nuclear and mitochondrial genes encoding ribosomal RNA have been particularly important for inferring species phylogenies because they are easily accessible, collectively demonstrate a wide range of evolutionary rates, and therefore have the potential to provide resolution across a large time scale. Until recently, phylogenetic sequence comparisons have concentrated on the coding portions of the ribosomal genes and their
Chapter 9 / Hillis, Mable, Larson, Davis & Zimmer
RNA products (see review by Hillis and Dixon, 1991). The nuclear-encoded ribosomal RNA cluster demonstrates an unusual pattern of evolution, featuring the interspersion of relatively rapidly evolving sequences with some of the most highly conserved macromolecular sequences known (Gerbi, 1985). The most highly conserved sequences have been useful for investigating the oldest divergences in the history of life (G.E. Fox et al., 1980; Kunzel and Kochel, 1981; D.E. Spencer et al., 1984; Hasegawa et al., 1985a; Lane et al., 1985; G.J. Olsen, 1987; Field et al., 1988; Cedergren et al., 1988; Lake, 1988; Ghiselin, 1988). Comparisons of mitochondrial and chloroplast ribosomal genes with prokaryotic ribosomal genes have helped resolve relationships among these eukaryotic organelles and their prokaryotic relatives (D. Yang et al., 1985; S. Turner et al., 1989).

Ribosomal RNA genes (especially the small subunit) have provided considerable insight on the relationships of prokaryotes; in fact, much of what is known of prokaryote phylogeny has been derived from rDNA analyses. Ribosomal RNA genes have also been used both to support (e.g., Pace et al., 1986; Woese and Olsen, 1986; Gouy and Li, 1989a) and refute (e.g., Lake, 1987b) the monophyly of archeans (formerly called archaebacteria). Within the Archea, studies such as those of Gupta et al. (1983), Woese et al. (1984a), Lechner et al. (1985), and Leffers et al. (1987) initiated a new area of understanding of the broad outlines of diversity in this group, and many Archea that are resistant to culturing are known only from their rDNA sequences. Scores of ribosomal DNA sequencing studies have also indicated the major evolutionary relationships among the Eubacteria (e.g., Woese et al., 1984b,c, 1985; Weisburg et al., 1989a,b).

Sequence studies of the rRNA genes, from the conserved regions to the rapidly evolving regions known as divergent domains or expansion segments, have proven useful for investigating evolutionary divergences that occurred throughout the history of the metazoans (Sogin et al., 1986; Patterson, 1989; Lake, 1990a; Larson, 1991a). The more conserved regions have been useful for looking at relationships among the major phyla (e.g., Elwood et al., 1985; Field et al., 1988; Sogin, 1989; Gouy and Li, 1989b; Wainwright et al., 1993; Halanych et al., 1995). The relatively variable regions within the rRNA genes make them useful for examining relationships within more closely related groups, such as various groups of algae (Jupe et al., 1980; Perasso et al., 1989; Eschbach et al., 1991; Larson et al., 1992), angiosperms (Hamby and Zimmer, 1988, 1992; Wolfe et al., 1989b; Zimmer et al., 1989), fungi (Guadet et al., 1989; Watanabe et al., 1989; Gargas et al., 1995), mollusks (Ghiselin, 1988), arthropods (Hancock et al., 1988; Abele et al., 1989; W.C. Wheeler, 1989; Kim and Abele, 1990; Spears et al., 1992), echinoderms (Raff et al., 1988; A. Smith, 1989), and vertebrates (Hillis and Dixon, 1989; Larson and Wilson, 1989; S.B. Hedges et al., 1990; Larson, 1991b; Larson and Dimmick, 1993; Hillis et al., 1991a, 1993b). The internal transcribed spacer (ITS) regions are useful for examining relationships among closely related species (Gonzalez et al., 1990; Phillips and Pleyte, 1991; Lee and Taylor, 1992; Pleyte et al., 1992; C.E. Ritland et al., 1993; Schlötterer et al., 1994; Vogler and DeSalle, 1994; Vilgalys and Sun, 1994). The nuclear-encoded ribosomal RNA sequences can be studied by sequencing cloned copies of the ribosomal genes (Ware et al., 1983; Hassouna et al., 1984; Elwood et al., 1985), by sequencing PCR-amplified regions (Sogin, 1990; Weisburg et al., 1991; see primer compilation by Hillis and Dixon, 1991), or by sequencing the ribosomal RNA directly (Hamlyn et al., 1978; Youvan and Hearst, 1979; Qu et al., 1983; Hamby et al., 1988; Larson and Wilson, 1989; Bachellerie and Qu, 1993). The latter approach is facilitated by the fact that the majority of the RNA in any cell is nuclear-encoded ribosomal RNA.

Sequence comparisons of selected regions of animal mtDNA (including the mtDNA ribosomal genes) are useful for inferring phylogenetic relationships of species whose divergences are more recent than those accessible using nuclear-encoded ribosomal RNA (Moritz et al., 1987). Phylogenetic analyses have been conducted based on mtDNA sequences of many of the major phyla of animals, although the greatest number of studies have been conducted on arthropods (e.g., Alexander, 1991; Simon, 1991; Ballard et al., 1992; Cunningham et al., 1992; DeSalle, 1992; Cameron,
1993; Simon et al., 1994) and vertebrates (e.g., W.M. Brown et al., 1982; Higuchi et al., 1984; Hixson and Brown, 1986; Hayasaka et al., 1988; Holmquist et al., 1988a; Miyamoto et al., 1990; Ruvolo et al., 1991; S.B. Hedges et al., 1991, 1992; Allard et al., 1992; Ammerman and Hillis, 1992; Block et al., 1993; Titus and Larson, 1995). The mtDNA sequences that have received the most attention are the genes for ribosomal RNA (12S and 16S), cytochrome oxidase I and II, and cytochrome b, as well as the control region, but other regions are proving to be useful as well.

Most comparative plant DNA sequencing studies have utilized the chloroplast genome (Clegg et al., 1990; Clegg and Zurawski, 1992; Jansen et al., 1992). Complete chloroplast DNA (cpDNA) sequences have been obtained for several species of plants (the first two were sequenced to completion by Shinozaki et al., 1986 and Ohyama et al., 1986). However, a large fraction of studies of cpDNA have focused on sequence variation of rbcL, the gene that encodes the large subunit of ribulose-1,5-bisphosphate carboxylase (e.g., J. Aldrich et al., 1986a,b; Ritland and Clegg, 1987; Zurawski and Clegg, 1987; S. Turner et al., 1989; Morden and Golden, 1989; D.E. Soltis et al., 1990; Donoghue et al., 1992; R.G. Olmstead et al., 1992; M.W. Chase et al., 1993). Taken together, these studies represent one of the largest sequence data bases (in terms of number of species) available for phylogenetic analysis.

Many other loci have been cloned and sequenced from several species and used for phylogeny reconstruction. Loci for which considerable comparative sequence information is available include (in addition to the rRNA genes and mtDNA genome discussed above) the globin loci in primates (e.g., Koop et al., 1986; Goodman et al., 1987; Miyamoto et al., 1987; Holmquist et al., 1988a), the immunoglobulin genes in rodents (e.g., Ponath et al., 1989a,b), the alcohol dehydrogenase loci in fruit flies (e.g., Bodmer and Ashburner, 1984; Schaeffer and Aquadro, 1987; Bishop and Hunt, 1988a), and the aldolase and DQα genes in both mammals (Lessa, 1992) and skinks (Slade et al., 1994). Intron-containing regions of single-copy nuclear genes (see Chapter 7) have been used to compare closely related groups of taxa (Lessa, 1992; Lessa and Applebaum, 1993; Slade et al., 1993, 1994; Palumbi and Baker, 1994). The number of loci examined in a comparative manner and the number of taxa for which such information is available should increase dramatically over the next several years.

SUMMARY

From the preceding discussion, it should be clear that nucleic acid sequencing can be used to study virtually any systematic problem, from studies of evolutionary processes to the phylogeny of life. However, this does not mean that sequencing is necessarily the best approach to any problem. Since it is not always a cost- or time-effective method for obtaining relevant data (see Chapters 2 and 12), other techniques are best used for many systematic applications. This is especially true of studies that require examination of many individuals or loci, such as many studies of intraspecific variation (e.g., geographic variation, reproductive modes, and heterozygosity estimates) and many studies of closely related species (hybridization, cryptic species, and recent phylogeny). However, for problems of phylogeny reconstruction over relatively ancient spans of time (greater than 50 million years), no other molecular technique is as likely to be informative as appropriate nucleotide sequence data.

LABORATORY SETUP

Although all nucleic acid sequencing work requires a relatively sophisticated laboratory, needs will vary depending on the method(s) chosen for isolating the target sequence. In general, cloning requires somewhat more equipment (and experience) than PCR amplification or direct RNA sequencing. Table 1 provides a rough idea of the requirements of a typical laboratory for cloning and sequencing.

Beyond the equipment listed in Table 1, the supply needs for a sequencing laboratory are similar to those described in Chapter 8 for restriction enzyme analysis. Cloning work requires a few
Table 1
Primary equipment for sequencing and cloningᵃ

Equipment                                         Use (protocols in parentheses)

Agarose gel apparatus, small*                     Quick check of DNA fragments
Agarose gel apparatus, large*                     Separation of DNA fragments
Autoclave*                                        Sterilization
Balance, analytical*                              Weighing small samples
Balance, top-loading*                             General weighing
Camera, Polaroid™*                                Gel photography
Centrifuge, microtube*                            Centrifuging small samples
Centrifuge, high-speed, refrigerated              Centrifuging large samples
  Rotor for 250-ml bottles                        Spinning down cells (7,14,15)
  Rotor for 50-ml tubes                           DNA, cell isolation (1-5,7-15)
Centrifuge, ultraspeed                            Isolation of cellular components
  Swinging bucket rotor, ≈36-ml tubes             mtDNA, cpDNA isolation (Chapter 8)
  Vertical or fixed angle rotor, ≈2-5-ml tubes    Plasmid, mtDNA, and cpDNA isolation (14; Chapter 8)
Centrifuge, vacuum                                Drying DNA/RNA samples (NE)
Computer*                                         Sequence storage and analysis
Darkroom, with developing tanks                   Developing autoradiographs (8,25,26)
Distilled water source*                           Purified water
Film cassettes, autoradiography                   Autoradiography (8,25,26)
Freezer, -80°C*                                   Tissue and sample storage
Freezer, -20°C, not frost-free*                   Sample and enzyme storage
Fume hood*                                        Use of caustic reagents
Geiger counter                                    Radiation detection and safety (8,22-26)
Gel dryer                                         Drying gels for autoradiography (21-26)
Gel reader                                        Reading sequence gels (NE)
Goggles, UV protective*                           Safety
Heating blocks, ambient-100°C (≈3)*               Heating reactions
Ice machine*                                      Cooling samples or reactions
Incubator, 37°C                                   Bacterial incubation (7-16,18,26)
Incubator, 55-65°C, with rocker                   DNA hybridization (8,26)
Laminar flow hood                                 Sterile work area (7-16,18,26; NE)
Lucite screens                                    Radiation protection (8,21-26)
Micropipetters (set from 1 µl to 1 ml)*           Pipetting small volumes
Microwave oven*                                   Heating liquids
pH meter*                                         Adjusting pH of solutions
Plastic bag sealer                                Filter hybridization (8,26)
Power supply, ≥2000 V                             Acrylamide gel electrophoresis (20,25,26)
Power supply, ≥250 V*                             Agarose gel electrophoresis
Refrigerators, 4°C*                               Cold storage
Sequencing gel apparatus                          DNA or RNA sequencing (20-26)
Shaker, orbital                                   Gel fixation (25)
Shaker, heated water bath                         Bacterial growth, hybridization (7-16,18,26)
Spectrophotometer*                                DNA/RNA quantification, purity assessment, cell concentration
Stirring hot plate*                               Mixing solutions
Thermal cycler                                    PCR amplification (17), cycle sequencing (24)
Timer*                                            Timing protocol steps
Tissue homogenizer                                Tissue grinding (1-5)
UV light box, long wave*                          Visualizing and photographing gels
Vacuum oven (or UV oven)                          Cross-linking DNA to membranes (8,26)
Vacuum pump                                       For vacuum oven and centrifuge
Vortexer*                                         Mixing samples
Water baths, ambient to 65°C (≈3)*                Constant temperature of samples
Water bath, cooling                               Preparing radioactive probes (8,26,NE)

ᵃNot all items listed are needed for some of the applications described in this chapter. Therefore, the protocols for which each item is needed are listed in parentheses; items that are essential for many protocols are marked with an asterisk. Items that are non-essential but facilitate a procedure are marked "NE."
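The footnote convention of Table 1 (protocol numbers and ranges in parentheses, "NE" for non-essential items) lends itself to a simple lookup when planning which equipment a given protocol needs. The following is only an illustrative sketch: the function names and the small `EQUIPMENT` dictionary are hypothetical, and only a handful of rows from Table 1 are transcribed.

```python
def parse_protocols(spec):
    """Expand a Table 1 style string such as '7-16,18,26; NE' into a set of
    protocol numbers. Non-numeric entries ('NE', 'Chapter 8') are ignored."""
    protocols = set()
    for part in spec.replace(";", ",").split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue  # skip 'NE', 'Chapter 8', and empty pieces
        if "-" in part:
            first, last = part.split("-")
            protocols.update(range(int(first), int(last) + 1))
        else:
            protocols.add(int(part))
    return protocols


# A few rows transcribed from Table 1 (hypothetical subset for illustration).
EQUIPMENT = {
    "Geiger counter": "8,22-26",
    "Gel dryer": "21-26",
    "Thermal cycler": "17,24",
    "Tissue homogenizer": "1-5",
    "Laminar flow hood": "7-16,18,26; NE",
}


def equipment_for(protocol):
    """Return the listed equipment whose protocol list includes `protocol`."""
    return sorted(
        name for name, spec in EQUIPMENT.items()
        if protocol in parse_protocols(spec)
    )


print(equipment_for(24))
```

For Protocol 24 (cycle sequencing), this subset returns the Geiger counter, gel dryer, and thermal cycler; items marked with an asterisk in Table 1 have no protocol list and would need to be added separately as general requirements.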
additional supplies, such as disposable petri dishes, culture tubes, and wire loops for spreading bacterial colonies. Oligonucleotide primers for sequencing are commercially available for the various cloning vectors, or specific primers can be made to order by many companies and centralized institutional facilities. If an oligonucleotide synthesizer is available in the laboratory, primers can be designed and constructed with little time or effort (see W.M. Barnes, 1987).

In addition to the materials described for restriction site analysis (Chapter 8), cloning and sequencing require a few specialized enzymes, antibiotics, and other chemicals. Sanger dideoxy sequencing requires one of several polymerases, or reverse transcriptase for RNA sequencing (see Protocols 22 and 23). The thermostable polymerases (e.g., Taq or Vent) used for PCR can also be used as effective polymerases for DNA sequencing. DNA ligase is needed in cloning to ligate compatible fragments (see Protocols 10 and 18). Plasmid subcloning requires the use of various antibiotics and substrates (to ensure the presence of plasmids and for screening recombinant from non-recombinant plasmids; see Protocols 12 and 18). Vector DNA and host bacterial strains are available commercially or through exchange with other laboratories. The remaining reagents are the same as described for restriction site analysis (Chapter 8).

PROTOCOLS

1. DNA isolation from animals, protists, and prokaryotes
2. DNA isolation from plants, fungi, and algae
3. DNA isolation from minute quantities of tissue
4. RNA isolation from animals
5. RNA isolation from plants
6. Construction of gene banks in lambda bacteriophage vectors
7. Growing bacteriophage
8. Screening
9. Miniprep isolation of lambda bacteriophage DNA
10. Subcloning into plasmids or M13
11. Preparation of frozen competent cells for transformation
12. Transformation of plasmid DNA
13. Transformation of M13 bacteriophage DNA
14. Isolation of plasmid DNA
15. Miniprep isolation of M13 DNA
16. Preparing permanent frozen stocks of plasmid clones
17. Preparation of PCR products for sequencing
18. Cloning of PCR products
19. Purification of PCR products
20. Screening methods for detecting variation in sequencing templates
21. Preparing a sequencing gel
22. DNA sequencing reactions
23. RNA sequencing reactions
24. Cycle sequencing
25. Running a sequencing gel
26. Microsatellites

Protocol 1: DNA Isolation from Animals, Protists, and Prokaryotes
(Time: day 1: ≈3 hr; day 2: ≈30 min)

There are many protocols for isolation of high-molecular-weight DNA; the following protocol is useful for isolating DNA from small tissue samples and will produce more than enough DNA for all applications in this chapter. This procedure works well for many multicellular animals, but it may not work for crustaceans and some fish, among other organisms; unicellular organisms may be processed by starting at step 2. For any organism, if the final DNA is degraded, it may be necessary to skip step 6. This will greatly reduce yields, but will almost always produce high-molecular-weight DNA. Protocol 2 will also work well on some animal tissues. For vertebrates, muscle tissue produces the highest quality DNA, although liver tissue usually produces the greatest yields. Vertebrate blood is also a good source of high quality DNA; for all vertebrates except mammals, a few drops of blood can be diluted directly into STE in step 2 (much greater quantities of mammalian blood are needed, because mammalian erythrocytes lack nuclei and the DNA must be isolated from the leukocytes). Extraction of DNA from many species of mollusks (especially gastropods) has proven difficult; it is usually necessary to experiment with several different tissues (the gonads work well in many species), or see the variations of Protocol 1 in Chapter 7. Most insects can be processed whole after removal of the digestive tract. If isolating DNA from ethanol-preserved samples, the tissue should be lyophilized first. See Chapter 8 for protocols for isolating mtDNA and cpDNA and Chapter 7 for alternative methods for isolating nuclear DNA.

For many tissues, particularly most vertebrate tissues, high-quality DNA can be isolated routinely using abbreviated procedures. The basic extraction method (part A) can be used for most tissues, including those that contain pigments that must be removed from the DNA sample. The alternate method (part B) is a more condensed procedure that works best with muscle tissue or other tissues that are not heavily pigmented.

Part A. Basic Extraction Method

1. If the tissue sample cannot be disrupted easily, grind the sample to a fine powder in liquid nitrogen with a mortar and pestle. (Note: Be very careful while powdering tissue as the mortar and pestle can shatter due to the extreme cold.) Do not allow the powder to thaw at any time.
2. Place 100 mg of the powdered tissue in a microcentrifuge tube (or fill to 1/3 full). The remaining powder can be stored at -80°C until needed. For "clean" tissues, such as muscle, steps 1 and 2 may be unnecessary.
3. Add to the tube:
     500 µl of STE buffer
     25 µl of 10 mg/ml proteinase K
     75 µl of 10% SDS
4. Mix well and incubate for 2 hr at 55°C in the shaking water bath.
5. Mix occasionally during the incubation to keep the tissue suspended.
6. Add an equal volume of PCI (Appendix), mix gently but thoroughly and incubate at room temperature for 5 min. If the phases separate, gently mix again.
7. Centrifuge for 5 min at ≈7000 g (or at high speed in a microcentrifuge).
8. Carefully remove the aqueous layer with a micropipette and a wide bore tip and transfer to a clean tube. The aqueous layer is usually the top layer, although high salt concentrations can cause inversion of the phases. Be careful not to disturb the cellular debris on the interface.
9. Re-extract the aqueous phase with PCI (repeat steps 6-8).
10. Add an equal volume of CI (Appendix), mix gently, and incubate at room temperature for 2 min. Re-mix once a minute to prevent the phases from separating.
11. Centrifuge for 3 min at ≈7000 g.
12. Carefully remove the upper (aqueous) layer with a micropipette and a wide bore tip and transfer to a clean tube. Be careful not to disturb the interface.
13. Re-extract the aqueous phase with CI (repeat steps 10-12).
14. Add 1/10 the sample volume (about 45 µl) of 2 M NaCl (or 3 M NaAc or 5 M NH4Ac) and 2.5 times the sample volume of ice-cold 95% ethanol.
15. Precipitate the DNA at -20°C for at least two hours (overnight is preferable).
16. Centrifuge the precipitate for 10 min at ≈7000 g. Wash the pellet twice with 70% ethanol and dry in a vacuum centrifuge.
17. Re-suspend the pellet in 250 µl (the volume will depend on the pellet size and desired concentration) of 1x TE or in diethylpyrocarbonate-treated distilled H2O. Incubating the sample at 4540°C can facilitate dissolution of the pellet.
18. Check the concentration and purity of the sample in a spectrophotometer by taking readings at 260 nm and 280 nm. An optical density of 1 at 260 nm corresponds to a double-stranded DNA concentration of approximately 50 µg/ml. The ratio of the readings at 260 nm/280 nm should be approximately 1.8; lower readings indicate contamination with protein and/or phenol. Relative concentration can also be determined by electrophoresing the sample on a denaturing polyacrylamide gel or an agarose gel and comparing to a standard.
19. Dilute the sample to the desired working concentration with TE.
20. Electrophorese 2 µl of the sample on a minigel to check for degradation and determine if it will be necessary to treat the sample to remove RNA.

Part B. Alternative DNA Extraction Method

Steps 1-4: Perform as in part A.
5. Add 1/10 the sample volume of 5 M NaCl and place on ice for 1 hr.
6. Centrifuge at ≈7000 g for 10 min.
7. Transfer the supernatant to a clean tube and shake well.
8. Centrifuge at ≈7000 g for 5 min.
9. Add 2-3 times the sample volume of 95% ethanol. The DNA should precipitate immediately.

Protocol 2: DNA Isolation from Plants, Fungi, and Algae
(Time: day 1: ≈2 hr; day 2: ≈30 min)

Many methods have been developed for isolation of high-molecular-weight DNA from plants (Zimmer et al., 1981; Saghai-Maroof et al., 1984; Rogers and Bendich, 1985; Doyle and Dickson, 1987). These methods differ primarily with respect to their requirements for input material (fresh, frozen, or lyophilized; gram or hundreds of gram quantities) and the use or non-use of ultracentrifugation steps. The protocol given below is relatively simple and is useful for the preparation of the small samples of DNA needed for applications in this chapter. See Chapter 8 for protocols for isolating cpDNA and mtDNA.
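The arithmetic behind two steps of Protocol 1, part A (the precipitation volumes in step 14 and the spectrophotometer check in step 18, which Protocol 2 also refers back to) can be sketched in a few lines. This is a minimal illustration, not part of the protocol: the function names, the `dilution_factor` parameter, and the 1.7 cutoff used to flag a low ratio are assumptions made for the example; only the 50 µg/ml per OD unit rule, the target ratio of about 1.8, and the 1/10 and 2.5x volumes come from the text.

```python
def precipitation_volumes(sample_ul):
    """Step 14: return (salt_ul, ethanol_ul) for a given sample volume.

    Salt solution is 1/10 of the sample volume; ice-cold 95% ethanol is
    2.5 times the sample volume.
    """
    return sample_ul / 10.0, sample_ul * 2.5


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Step 18: an optical density of 1 at 260 nm is about 50 ug/ml dsDNA.

    dilution_factor (an assumed convenience parameter) scales the result
    back to the undiluted sample when the reading was taken on a dilution.
    """
    return a260 * 50.0 * dilution_factor


def purity_ratio(a260, a280):
    """Step 18: ratio of the readings at 260 nm and 280 nm (target about 1.8)."""
    if a280 <= 0:
        raise ValueError("A280 reading must be positive")
    return a260 / a280


def looks_contaminated(a260, a280, cutoff=1.7):
    """Flag a ratio clearly below 1.8; the 1.7 cutoff is an arbitrary
    illustration, not a value given in the protocol."""
    return purity_ratio(a260, a280) < cutoff


salt, ethanol = precipitation_volumes(450)            # 45 ul salt, 1125 ul ethanol
conc = dsdna_concentration_ug_per_ml(0.25, dilution_factor=20)
print(salt, ethanol, conc, round(purity_ratio(0.25, 0.14), 2))
```

With a 450 µl sample, the salt volume works out to the "about 45 µl" quoted in step 14. A reading of 0.25 at 260 nm on a 1:20 dilution corresponds to roughly 250 µg/ml in the undiluted sample.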
1. Grind leaf or flower tissue to a fine powder in liquid nitrogen with a mortar and pestle. (Note: Be very careful while powdering tissue as the mortar and pestle can shatter due to the extreme cold.)
2. Add β-mercaptoethanol (βME) to 2x CTAB extraction buffer (Appendix) to a final concentration of 0.2%. Heat the CTAB plus βME solution in a 60°C water bath for 5 min.
3. Aliquot 500 µl of the buffer plus βME into a 1.5-ml microcentrifuge tube. Add ≈100 mg fine nitrogen-powdered tissue and place in a 60°C water bath for 45 min.
4. Add 500 µl of CI. Close the tube and extract by gently inverting the tube. Extract for 10 min.
5. Centrifuge for 5 min at ≈7000 g (high speed) in a microcentrifuge.
6. Transfer the upper (aqueous) phase to a fresh tube using a wide bore pipette tip and re-extract with CI. Centrifuge as above and transfer the aqueous phase to a fresh tube.
7. Add 1 ml of absolute ethanol and allow the DNA to precipitate at -20°C for at least 30 min. (Note: Precipitating overnight substantially increases the yield.)
8. Centrifuge for 1-5 min at ≈7000 g to pellet the DNA.
9. Decant the ethanol and briefly dry the pellet in a vacuum centrifuge.
10. Redissolve the pellet in 100 µl of 1x TE. Add 10 µl of 3 M sodium acetate and 2.5 volumes of 95% ethanol. Precipitate at -20°C for 30 min.
11. Centrifuge for 5 min at ≈7000 g. Decant the ethanol and add 1 ml of 70% ethanol to wash the pellet. Recentrifuge for 2 min at 7000 g. Two 70% ethanol washes may be necessary to remove traces of CTAB or chloroform.
12. Dry the pellet in a vacuum centrifuge until all visible traces of ethanol are gone. Do not overdry the pellet. Redissolve in 200 µl of 1x TE. Determine concentration and purity of the sample as in Protocol 1 (part A), step 18.

Protocol 3: Isolation of DNA from Minute Quantities of Tissue
(Time: ≈1.5 hr)

With continued improvements in PCR techniques, even very minute quantities of tissue are sufficient to allow reliable amplification of DNA segments. The previously described methods for DNA extraction usually require at least several hundred micrograms of tissue. Recent advances in forensic techniques have made it possible to extract DNA from single hairs or tissue scrapings. Application of these techniques to molecular systematic problems has allowed extraction of DNA from some fossil tissues and preserved specimens. These methods also have been used to allow non-destructive sampling of populations for survey purposes.

The following protocol is based on Singer-Sam et al. (1989). This method uses BioRad Chelex 100™, which was designed for ion resin exchange. It is a very simple procedure but usually results in template usable for PCR. Only a very small amount of tissue is required and exceeding the recommended quantity will result in decreased extraction efficiency. Optimization may be required in terms of quantity of tissue used and incubation times. However, the basic procedure has been found to work for single drops of blood from vertebrates (especially other than mammals); single hairs from mammals; fossil tissue scrapings; and alcohol and formalin-preserved specimens (see Walsh et al., 1991). Throughout this procedure it is essential that all solutions, instruments, and tubes are sterile because even trace quantities of contaminants can result in serious contamination problems. Instruments should be flamed in alcohol between use and it is a good idea to do control reactions to check for contamination.

1. Scrape a sliver of tissue from a frozen sample (less than 1 mg) with alcohol-flamed forceps and place in a sterile microcentrifuge tube containing 500 µl of 5% Chelex. Wash forceps with water and then alcohol, followed by flaming, in between samples. If using blood, drop several microliters into a tube containing Chelex.
Plate 2 Chromatographs from an automated DNA sequencer. The height of each of the four colored lines indicates the relative intensity of fluorescence that corresponds to each of the four labeled dideoxynucleotides. Hence, the peaks may be read directly as DNA sequences (indicated above the chromatograph). The sequence near the primer (top panel) is clear and easy to read, with well-defined peaks that are easy to distinguish from background noise. Further along the sequence (second panel), lower signal-to-noise results in occasional ambiguities (e.g., base position 186). Within another 100 nucleotides, the peaks become less well defined and ambiguities increase (third panel). Eventually, the peaks are poorly defined and the sequence is unreliable (bottom panel). The length of reliable reads will depend on many factors, including the model of the sequencer, the quality of the template, and the details of the sequencing reaction.
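The idea in the Plate 2 caption, that a base is read from whichever of the four fluorescence traces peaks highest and that a falling signal-to-noise ratio produces ambiguities, can be illustrated with a toy base caller. This is a deliberately simplified sketch and not the algorithm used by any sequencer's software: the function names, the dictionary layout of the peak heights, and the `min_ratio` threshold are all assumptions made for the example.

```python
BASES = "ACGT"


def call_base(intensities, min_ratio=2.0):
    """Call one position from four peak heights (dict: base -> height).

    Returns the base with the tallest peak, or 'N' when the tallest peak
    is absent or is not at least `min_ratio` times the runner-up
    (a crude stand-in for poor signal-to-noise).
    """
    ranked = sorted(BASES, key=lambda b: intensities[b], reverse=True)
    top = intensities[ranked[0]]
    second = intensities[ranked[1]]
    if top <= 0:
        return "N"
    if second > 0 and top / second < min_ratio:
        return "N"
    return ranked[0]


def call_sequence(trace, min_ratio=2.0):
    """Call every position in a list of per-position peak heights."""
    return "".join(call_base(peaks, min_ratio) for peaks in trace)


# Two clean peaks near the primer, then one ambiguous position.
trace = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 50, "C": 700, "G": 60, "T": 40},
    {"A": 300, "C": 280, "G": 50, "T": 30},
]
print(call_sequence(trace))  # ACN
```

Raising `min_ratio` makes the caller more conservative, trading read length for fewer miscalls, which mirrors the caption's point that the usable read ends where peaks stop being well defined.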
2. Incubate tubes at 56°C for 45 min to overnight, until most of the tissue has disintegrated (times will vary with type of tissue used).
3. Vortex at maximum for 10-15 sec.
4. Heat at 95-100°C for 15 min.
5. Vortex at maximum for 10-15 sec.
6. Store at 4°C.
7. Prior to using for PCR, centrifuge the extracts in a microcentrifuge to pellet the Chelex beads.

Protocol 4: Isolation of RNA from Animals
(Time: day 1: ≈2 hr; day 2: ≈3 hr)

RNA is much less stable than DNA, and the tissues must be as fresh as possible. All glassware used for RNA work must be baked at 250°C for at least 4 hr to remove RNase. Water should be treated with 0.1% diethylpyrocarbonate (DEP; Appendix). All plasticware should be sterile. It is a good idea to use a separate set of glassware and other reusable supplies exclusively for RNA work to avoid contamination with RNase.

The following protocol is useful for isolating total cellular RNA from vertebrates and many other multicellular animals for direct sequencing of rRNA. Protocol 5 is preferred for isolation of RNA from plants and algae, as well as most insects.

First Day
1. Place frozen phenol in a 60°C water bath to melt.

5. Pour the guanidinium isothiocyanate/β-mercaptoethanol solution into the mortar and stir until the mixture freezes.
6. Place the mortar in a 60°C water bath until the mixture melts. Stir, then pour the mixture into the centrifuge tube and place it in a beaker of water in the 60°C water bath.
7. Draw the mixture into a syringe (10-ml volume, fitted with an 18-gauge needle) and forcibly eject it into the centrifuge tube. Repeat until the viscosity of the mixture is reduced.
8. Add phenol (5 ml per gram of tissue), preheated to 60°C, and continue to pass the emulsion through the syringe.
9. Add 5 ml of ATE (Appendix) per gram of tissue.
10. Add 5 ml PCI per gram of tissue and shake vigorously for 10-15 min while maintaining the temperature at 60°C.
11. Cool on ice and centrifuge for 10 min at 4°C using a swinging bucket rotor at moderate speed (≈3000 g).
12. Recover aqueous (top) phase (use siliconized Pasteur pipette) into a new 50-ml centrifuge tube.
13. Re-extract with an equal volume of PCI at 60°C.
14. Repeat steps 11 and 12.
15. Re-extract twice with CI, centrifuge, and recover (at room temperature).
16. Add 2-2.5 volumes absolute ethanol and store at -20°C overnight.
Second Day
2. Weigh the frozen tissue (mass in grams) and
place in a mortar. Cover with liquid nitrogen. 17. Centrifuge for 20 min at 4°C using a swinging
bucket rotor at moderate speed (=3000g),
3. Measure 5 ml of guanidinium isothiocyanate
solution (Appendix; with 1/100 volume P- 18. Pour liquid into a waste container; dry the
mercaptoethanol added) per gram of tissue pellet in a vacuum centrifuge.
(from step 2). Add to a 50-ml centrifuge tube 19. Dissolve the pellet in the original starting vol-
and set aside. ume (step 3) of STE plus 0.2% SDS.
4. Grind the tissue to a fine powder with a pes- 20. Add 20 p1 proteinase K (10 mg/ml in STE)
tle. (Liquid nitrogen may have to be added per ml starting volume. Incubate for 1-2 hr at
several times, as it evaporates.) 37°C.
346 Chapter 9 / Hillis, Mable, Larson, Davis & Zimmer
21. Heat to 60°C. Add 1/2 volume (from step 19) of phenol heated to 60°C and mix. Add 1/2 volume (from step 19) of CI; mix for 10 min at 60°C.
22. Cool on ice and centrifuge at 4°C for 10 min (4000 g).
23. Recover aqueous (top) phase (with siliconized Pasteur pipette) into a new 50-ml centrifuge tube.
24. Repeat steps 21-23.
25. Extract twice with CI at room temperature (as in steps 21-23, except for temperature).
26. Add 2-2.5 volumes of absolute ethanol and store at -20°C overnight.
27. Repeat steps 17 and 18.
28. Re-suspend the pellet in DEP-treated, distilled water (≈1 ml).
29. Take optical density readings on a 1/50 dilution (10 µl sample in 490 µl DEP ddH2O) at 260 nm and 280 nm in a spectrophotometer. An optical density of 1 at 260 nm corresponds to ≈40 µg/ml for RNA. Pure samples of RNA have a ratio of optical density readings at 260 nm/280 nm of ≈2.0; lower readings indicate contamination by protein and/or phenol.
30. Separate the sample equally into two microcentrifuge tubes, precipitate one of them with 2-2.5 volumes absolute ethanol, store at -80°C (for long-term storage).
31. To the other tube of sample add dithiothreitol (DTT) and RNasin as follows: 1 µl of 2.5 M DTT per 500 µl of sample (or 10 µl of 0.25 M DTT per 500 µl). Centrifuge, vortex, and centrifuge again. Add 12.5 µl RNasin per 500 µl of sample. Centrifuge, vortex, and centrifuge. Store at -80°C.

Protocol 5: Isolation of RNA from Plants
(Time: day 1: 8-10 hr; day 2: ≈4 hr)

This technique is the most effective for isolation of RNA from plants and algae, as well as from many insects. It is a modification of the procedure of T.C. Hall et al. (1978; for further information, see Hamby et al., 1988).

1. Place 5-10 g of liquid-nitrogen-powdered tissue in a 50-ml polypropylene tube. Add 25 ml hot (90-95°C) borate buffer (Appendix) and homogenize the sample in three 10-sec bursts.
2. Filter the extract through sterile cheesecloth into a fresh tube. Add 0.3 ml of 10 mg/ml proteinase K solution. Incubate for one hr at 37°C. Add 1 ml of 2 M KCl to the tube and chill on ice for 5-10 min.
3. Centrifuge at 16000 g in a swinging bucket rotor for 10 min at 4°C. Filter the supernatant through a double layer of laboratory wipes into a 30-ml glass centrifuge tube. Add 1/4 volume of 10 M LiCl. Freeze the sample on dry ice for 30 min and then keep at 4°C for 2-4 hr.
4. Centrifuge at 13000 g in a swinging bucket rotor for 15 min at 4°C. Pour off the supernatant immediately as the RNA pellet will be loose.
5. Wash and re-suspend the pellet with 5 ml of cold 2 M LiCl. Centrifuge as in step 4.
6. Re-suspend the pellet in at least 2 ml of 2 M potassium acetate, pH 5.5. This pellet often requires extensive vortexing and some warming to re-dissolve. Add 2.5 volumes of ice-cold ethanol and store at least 4 hr at -20°C.
7. Centrifuge at 12000 g in a swinging bucket rotor for 15 min at 4°C.
8. Air-dry the pellet and dissolve in 5 ml STE. (Remove aliquots of 20-50 µl here and at steps 9 and 10 to assay RNA integrity on agarose minigels). Add 5 ml of PCI and extract the sample with thorough mixing for 2-5 min. Let stand on ice for 10 min.
9. Repeat step 7. Remove the top layer and put in a fresh glass centrifuge tube (remove minigel aliquot) and then add 1 ml of 4 M ammonium acetate and 10 ml ice-cold absolute ethanol. Mix well and store 4 hr to overnight at -20°C.
10. Repeat step 7. Dry the pellet in a vacuum centrifuge. Dissolve in 1-2 ml of 1× TE (use more TE with larger pellets). Determine RNA concentration and purity as in Protocol 4, step 29.
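
The spectrophotometric check in Protocol 4, step 29 (also used at the end of Protocol 5) is simple arithmetic, and the following Python sketch shows one way to carry it out. The function names and the example readings are hypothetical; the conversion factor (an optical density of 1 at 260 nm corresponds to about 40 µg/ml of RNA), the 1/50 dilution, and the expected 260/280 ratio of about 2.0 are taken from step 29.

```python
# Minimal sketch of the RNA quantification arithmetic in Protocol 4, step 29.
# Function names and example readings are illustrative assumptions.

def rna_concentration_ug_per_ml(a260, dilution_factor=50.0, ug_per_ml_per_od=40.0):
    """Concentration of the undiluted RNA sample in µg/ml.

    a260: optical density of the diluted sample at 260 nm.
    dilution_factor: 50 for 10 µl of sample in 490 µl of water.
    ug_per_ml_per_od: approximately 40 µg/ml of RNA per OD260 unit.
    """
    return a260 * dilution_factor * ug_per_ml_per_od


def purity_ratio(a260, a280):
    """OD260/OD280 ratio; values well below about 2.0 suggest
    contamination by protein and/or phenol."""
    if a280 <= 0:
        raise ValueError("a280 must be positive")
    return a260 / a280


# Hypothetical readings from a 1/50 dilution.
example_a260 = 0.25
example_a280 = 0.125
example_conc = rna_concentration_ug_per_ml(example_a260)
example_ratio = purity_ratio(example_a260, example_a280)
print(f"RNA concentration: {example_conc:.0f} µg/ml, 260/280 ratio: {example_ratio:.2f}")
```

If a different dilution is read, only `dilution_factor` needs to change; the 40 µg/ml factor applies to RNA and should not be reused for DNA.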
Protocol 6: Preparation of Partial Gene Libraries in λ Bacteriophage Vectors
(Time: day 1 [steps 1-6]: ≈6-10 hr; day 2 [steps 7-9]: ≈2-3 hr)

The following protocol presents an example of lambda cloning that is typical for many commercially available lambda cloning vectors. One can also grow and purify lambda DNA and make extracts rather than using commercial preparations (see Berger and Kimmel, 1987, for details). In step 1, the DNA may be digested to completion with a particular endonuclease that is known to flank the region of interest. If this information is not known, it is usually preferable to digest the DNA partially with an endonuclease that has a short recognition sequence (e.g., MboI) to generate fragments of the desired size.

1. Digest 1 µg of target DNA with the desired cloning enzyme (e.g., EcoRI).
2. Ethanol-precipitate the restriction digest by the addition of 1/10 volume of 2 M NaCl and 2.5 volumes of absolute ethanol. (Note: 3 M NaAc can be used in place of 2 M NaCl, but small traces of NaAc seem to be more detrimental to ligation efficiency).
3. Incubate two or more hours at -20°C, then centrifuge at ≈7000 g for 20 min. Decant the ethanol and dry the pellet in a vacuum centrifuge.
4. Re-suspend the pellet in 10 µl of water.
5. Assay 5 µl of the digested DNA on a minigel with standard lanes containing 0.1 and 0.5 µg of DNA. This will verify that the restriction digest and subsequent ethanol precipitation were successful.
6. Add an equal molar ratio of target DNA to lambda phage arms. For instance, if the approximate average target cloning size is 8 kb and the lambda arms total 40 kb, then add 0.2 µg of the digested DNA (≈2 µl) to 1 µg of lambda phage arms and bring the total volume to 3.5 µl. Next add 0.5 µl of 10× ligation buffer, 0.5 µl of T4 DNA ligase (2 Weiss units), and 0.5 µl of 10 mM ATP, pH 7.5. Mix the ligation reaction thoroughly and incubate for one hour at room temperature, then overnight at 4°C.
7. Package the ligation using a commercial packaging extract, following the manufacturer's instructions. (This step varies slightly with various packaging extracts, but usually involves simply adding the ligation mixture directly to a freeze-thaw lysate and a sonic extract and incubating at room temperature for a few hours).
8. Dilute the packaged phage with 0.5-1 ml of PDB (Appendix). The resulting gene library should contain 10⁶-10⁹ recombinant phage (depending on the efficiency of the packaging extract used and the quality of the DNA ligation) with inserts from 1 to 23 kb (depending on the cloning vector used).
9. Plate serial dilutions (1 µl, 0.1 µl, 0.01 µl) of the gene bank to determine the titer and recombination efficiency (see Protocol 7).

Protocol 7: Growing Bacteriophage
(Time: day 1: ≈10 min; day 2: ≈1 hr [plus incubation time])

Recombinant lambda bacteriophage are grown by adding aliquots or serial dilutions of the phage library to appropriate host bacteria, then plating the bacteria and selecting the resulting plaques. For titering libraries, it is usually desirable to plate several 10-fold serial dilutions of the stock to determine the concentration. If relatively few recombinant phage are obtained, or if larger quantities of the library are desired, the library can be amplified (see Berger and Kimmel, 1987). However, it should be cautioned that some recombinant bacteriophage will replicate much faster than others (because of the size of the insert), and that the amplified library will therefore overrepresent some clones and underrepresent others. Therefore, usually it is best not to amplify the library unless absolutely necessary.
For growing lambda bacteriophage, strains of bacteria are selected that do not allow recombination among the phage (recA- strains); these strains are typically supplied with the phage arms and the recA- phenotype can usually be maintained by antibiotic selection. Systems for detection of recombinant versus reconstituted lambda bacteriophage also vary with different host strains; some systems use color selection by IPTG/X-Gal (see Berger and Kimmel, 1987) and others use bacteria that only allow recombinant lambda growth. The basic protocol for growing bacteriophage is given below; variations may be required for particular bacterial host strains.

1. Pick a single colony of the host strain from a plate that contains the antibiotic that allows selection for the recA- phenotype, and add to L-broth (Appendix) plus 0.2% maltose plus 10 mM MgSO4 using sterile technique (250 ml of L-broth is enough for most applications). Grow overnight with vigorous shaking (≈300 rpm) at 37°C.
2. Centrifuge in sterile tubes at 1000 g for 10 min to pellet the cells.
3. Re-suspend the cells in one half of the original volume of sterile 10 mM MgSO4.
4. Remove L-broth plates from 4°C refrigerator and warm in incubator (37°C).
5. Mix 200 µl of cells for a 100-mm plate or 450 µl of cells for a 150-mm plate with the phage stock in a sterile culture tube. Incubate at 37°C for 15 min with gentle (≈100 rpm) shaking.
6. While the cells plus phage are incubating, melt L-broth top agarose in a microwave oven and allow it to cool to 48°C. Hold at 48°C in water bath. (Top agarose is preferable to top agar, because the former will not stick to filter lifts as readily).
7. After the infection is complete, add 3 ml (100-mm plate) or 7 ml (150-mm plate) of 48°C top agarose to the culture tube, vortex gently, and pour over the surface of the plate. Tilt the plate to spread the agarose evenly. Grow 6 hr to overnight in a 37°C incubator, until plaques are approximately 1 mm in diameter.

Protocol 8: Screening Bacteriophage Libraries
(Time: step 1: see Protocol 7; step 2: ≈2 hr; steps 3-7: ≈2.5 hr; step 8: ≈2 days; steps 9-10: see Protocol 7; steps 11-12: ≈4 days; total time: ≈1 week)

To find the particular gene or DNA region of interest, one must screen the gene library by plating the phage at an appropriate density (typically 2,000-50,000 plaques/plate), transferring the phage DNA to a binding membrane (filter lift), and hybridizing the filter lift with an homologous probe. This procedure is relatively easy if the gene is present in high copy number (e.g., the rRNA genes, heterochromatic repeats, or mtDNA fragments) and is flanked by appropriate restriction sites for the library that has been constructed. Single-copy genes require screening of many more plaques (often as many as 10⁶); this may require plating on larger plates than in the protocol below or use of a lambda strain that accepts larger fragments.
This protocol is among the simplest for identifying clones of interest, although numerous other techniques are more applicable in particular situations. For a review of the various methods, see Berger and Kimmel (1987) or Ausubel (1989).

1. Plate out the phage at a density where the plaques cover the majority of the plate, but do not overlap significantly. Square plates are preferable to round plates, as square filter lifts save film during autoradiography. For a 100-mm square plate, approximately 2,000-10,000 plaques can be screened efficiently. Incubate plates for ≈8 hours at 37°C.
2. Cool the plates for several hours at 4°C to harden the top agarose.
3. Carefully lay a nylon (or nitrocellulose) filter onto the surface of the plate and wait about 2 min for it to absorb moisture (and phage DNA) from the plate. No bubbles should be trapped under the nylon or areas of the plate will not transfer well. While waiting, stick a hypodermic needle containing waterproof ink through the filter into the plate in three to five places. This should mark both the filter and the plate with ink dots so that they can be realigned later.
4. Carefully peel the nylon filter off and place into denaturing solution (Appendix) for ≈2 min. Meanwhile, lay a second filter on the plate and repeat the process, this time waiting ≈4 min before removing the filter. Mark the second filter in the same spots as the first with

the waterproof ink. Place the second filter into denaturing solution for 2 min.
5. Transfer the nylon filters to neutralizing solution (Appendix) for 5 min.
6. Transfer the nylon filters to 2× SSC (Appendix) for 30 sec.
7. Air-dry the filters and bake for 2 hr at 80°C in a vacuum oven (or 30 sec in a UV oven).
8. Hybridize to the desired probe sequence (see Chapter 8 for details of this procedure). See Figure 6 for results of filter-lift hybridization. Positive plaques should appear on autoradiographs from both filter lifts; dark marks on the autoradiograph of only one filter are false positives.
9. For primary screening, align the plate and the resulting autoradiograph using the ink marks. Use the wide end of a sterile Pasteur pipette to "plug" the plate at the region containing a positive plaque. Place the agar plug into 0.5 ml of PDB plus 50 µl chloroform. This 0.5 ml phage stock is the working stock and will contain the desired clone plus several adjacent clones.
10. Titer the working stock (Protocol 7) and plate ≈100 plaques.
11. Repeat the screening process described above (steps 2-9). For secondary screening the phage should be plated at much lower density so that each plaque is clearly separate.
12. Plug "secondary" isolated positives with the small end of a sterile Pasteur pipette and again put into 0.5 ml of PDB plus 50 µl chloroform. This is the stock from which you will isolate DNA in Protocol 9. Check clones by agarose gel electrophoresis, restriction analysis, and autoradiography (Figure 7; see also Chapter 8).

Figure 6 Autoradiograph of a filter lift. The dark circles correspond to positive plaques and the small light marks correspond to negative plaques.

Figure 7 (A) Gel for checking a series of lambda bacteriophage clones (even lanes) and their plasmid subclones (odd lanes) digested with EcoRI. Lane 9 is lambda DNA digested with HindIII. The two larger fragments in the even lanes correspond to the two arms of the lambda bacteriophage; the smaller fragment in each of the odd lanes is the linearized plasmid vector. (B) Autoradiograph of Southern blot from check gel shown in Figure 7A, hybridized with an homologous probe to verify clones.

Protocol 9: Miniprep Isolation of λ Bacteriophage DNA
(Time: day 1 [steps 1-2]: 30 min plus incubation time [6-8 hr]; day 2: ≈4 hr)

Once the lambda clone of interest has been isolated, large quantities of the cloned DNA can be grown and purified. Although it is possible to sequence the cloned DNA in the lambda vector directly, it is usually desirable to subclone the DNA into a plasmid or M13 vector, because of greater ease of sequencing and DNA preparation. For most lambda vectors, the cloned DNA must be isolated prior to the subcloning steps. Some lambda vectors (such as the lambda ZAP vectors of Stratagene Cloning Systems) contain a plasmid within the lambda vector, and allow an in vivo excision of the plasmid using a helper phage, thus bypassing the subcloning steps. For other lambda vectors, the following simple protocol can be used to isolate the cloned DNA for subcloning. For other protocols or for large scale isolation of phage DNA, see H. Miller (1987).

1. Add 450 µl of host bacterial cells (prepared as described in Protocol 7, steps 1-3) to enough lambda stock to contain approximately 50,000-100,000 plaque-forming units (pfu) in a sterile culture tube. Incubate at 37°C for 15 min with gentle shaking (≈100 rpm). Then add 7 ml top agar (or top agarose) at 48°C (as in Protocol 7, steps 6-7) and plate on a 150-mm L-broth + MgSO4 + maltose plate (Appendix). Grow 6-8 hr at 37°C. The plaques should be confluent or nearly so.
2. Add 5 ml of PDB (Appendix) to the plate and shake gently at 4°C overnight.
3. Remove the PDB with a Pasteur pipette and transfer it to a glass or polypropylene centrifuge tube. Add 200 µl of chloroform and mix.
4. Spin down the debris at 7500 g for 10 min at 4°C.
5. Collect the supernatant, transfer it to a clean glass or polypropylene centrifuge tube, and add 1 µg/ml of DNase I and RNase A (normally kept as 1 mg/ml stocks).
6. Incubate 30 min at 37°C.
7. Add an equal volume of PEG stock (Appendix) and mix gently.
8. Incubate 1 hr on ice.
9. Pellet the precipitated phage by centrifugation at 12,000 g for 20 min at 4°C.
10. Decant the supernatant and allow the inverted tube to drain thoroughly. (Note: A white precipitate should be clearly visible).
11. Re-suspend the pellet in 0.5 ml of PDB in a 1.5-ml microcentrifuge tube. Add 5 µl of 0.5 M EDTA.
12. Incubate at 65°C for 15 min.
13. Extract twice with an equal volume of PCI as described in Protocol 1 (steps 6-9). A large amount of PEG will collect at the interface during these extractions.
14. Extract twice with an equal volume of CI as described in Protocol 1 (steps 10-13).
15. Add 50 µl of 2 M NaCl and 1 ml of ethanol to precipitate the DNA.
16. Centrifuge at 7500 g for 10 min to pellet the DNA.
17. Decant the ethanol, dry the pellet in a vacuum centrifuge and re-suspend the DNA in 250 µl of 1× TE. Check concentration and purity spectrophotometrically as described in Protocol 1, step 18. 10 µl of this stock should be ample for a test restriction or a subcloning experiment.
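
Protocol 9, step 1 calls for enough lambda stock to contain roughly 50,000-100,000 pfu, which presupposes a titer obtained from plated dilutions (Protocol 6, step 9, and Protocol 7). The Python sketch below illustrates that bookkeeping under stated assumptions: the function names and the example plaque count are hypothetical, and the titer is taken simply as the number of plaques divided by the volume of undiluted stock represented on the plate.

```python
# Minimal sketch of titer arithmetic for a lambda phage stock.
# Function names and example numbers are illustrative assumptions.

def titer_pfu_per_ul(plaque_count, stock_volume_ul):
    """Titer in pfu/µl, given the plaques counted on a plate and the
    volume of undiluted stock that the plated aliquot represents
    (e.g., 0.01 µl for a 100-fold dilution plated at 1 µl)."""
    if stock_volume_ul <= 0:
        raise ValueError("stock_volume_ul must be positive")
    return plaque_count / stock_volume_ul


def stock_volume_for_pfu(target_pfu, titer):
    """Volume of stock (µl) that contains the requested number of pfu."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return target_pfu / titer


# Hypothetical count: 120 plaques on the plate representing 0.01 µl of stock.
example_titer = titer_pfu_per_ul(120, 0.01)
low_volume = stock_volume_for_pfu(50_000, example_titer)
high_volume = stock_volume_for_pfu(100_000, example_titer)
print(f"Titer: {example_titer:.0f} pfu/µl")
print(f"Use {low_volume:.1f}-{high_volume:.1f} µl of stock for 50,000-100,000 pfu")
```

Counts from plates with very few or nearly confluent plaques are unreliable, so in practice the titer is usually taken from the dilution that gives well-separated, countable plaques.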
Protocol 10: Subcloning into Plasmids or M13
(Time: steps 2-10: ≈6 hr)

The following protocol assumes that the target DNA sequence is flanked by appropriate restriction sites for the vector of choice. If not, then linkers need to be added to the target sequence (see Helfman et al., 1987, for protocol and additional information). If the target DNA has been amplified by PCR, then TA-cloning (see Protocol 18) can be used instead.

1. Isolate DNA from the lambda clone of interest (Protocol 9) or from PCR amplification (Chapter 7).
2. Digest 1 µg of the DNA with the appropriate restriction enzyme(s) to cut out the sequence of interest.
3. Digest 0.5 µg of plasmid or M13 vector DNA with a restriction enzyme that produces compatible ends (e.g., BamHI and BglII produce compatible ends).
4. Add 1/10 volume of 2 M NaCl and 2.5 volumes of cold absolute ethanol to precipitate the DNA. Incubate for 2 hr at -20°C.
5. Centrifuge for 15 min at 12,000 g to pellet the DNA.
6. Decant the ethanol, and dry the pellet in a vacuum centrifuge.
7. Re-suspend the pellets in 10 µl of water and assay 2-3 µl on a minigel. This will verify that the restriction digest and subsequent ethanol precipitation were successful.
8. Mix the target and vector DNA in a 2:1 molar ratio of ligatable ends. Use the size of the cloning vector and of the targeted insert to determine the molar ratio.
9. Bring the volume of the DNA solution to 39 µl with water. Add 5 µl of 10× ligation buffer (Appendix), 5 µl of 5 mM ATP, and 1 µl (4 Weiss units) of T4 DNA ligase.
10. Mix the ligation reaction and incubate overnight at 4°C.
11. Transform the ligation (Protocols 11-12) and screen for the desired clone.

Protocol 11: Preparation of Frozen Competent Cells for Transformation
(Time: day 1: 10 min; day 2: ≈3-4 hr; day 3: ≈30 min)

In order for the plasmid clones produced in Protocol 10 to be grown in quantity, they must be introduced into bacterial host strains. This is accomplished at high efficiency by making the host bacteria competent for transformation. Production of competent cells requires careful attention to detail (especially with regard to maintaining low temperature and the density of cells at harvest), and the use of sterile tubes, glassware, and solutions. There likely will be considerable variation in transformation efficiency from batch to batch of competent cells, so if one preparation produces poor results, try the procedure again. There are also other methods for preparation of competent cells that may work better for the particular strain of cells being used (see H. Miller, 1987; Sambrook et al., 1989). Commercial preparations of competent cells are also available and usually are of reliable quality.

1. Inoculate 10 ml of L-broth with a loopful of an appropriate strain of E. coli cells from a single colony. Grow overnight at 37°C.
2. Subculture 5 ml of the overnight culture into 500 ml of L-broth in a 2-L flask.
3. Grow to OD600 = 0.4-0.5, as measured with a spectrophotometer (usually 2-3 hr).
4. Pour the culture into sterile 250-ml plastic bottles. Centrifuge at 2500 g for 5 min. Decant the supernatant.
5. Re-suspend the pellets in 100 ml of cold (0-4°C) 0.1 M MgCl2 (total volume). Transfer the cell suspension to two 50-ml Oak Ridge tubes. Note: From this point on in the protocol, the cells must be kept at 0-4°C.
6. Incubate the cells on ice for 5 min.
7. Centrifuge the cells at 2500 g for 5 min at 4°C. Decant the supernatant.
8. Wash the cell pellets with cold (0-4°C) 0.1 M CaCl2. Do not vortex. Centrifuge at 2500 g for 5 min at 4°C. Decant the supernatant.
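
For the ligations set up in Protocol 10, step 8 (2:1 molar ratio of target to vector) and in Protocol 6, step 6 (equal molar ratio of target to lambda arms), the mass of insert DNA follows from the fragment lengths, because for double-stranded DNA the number of molecules per microgram is inversely proportional to length. The Python sketch below works through that arithmetic. The function name is hypothetical, as are the 3-kb vector and 1.5-kb insert in the second example; the first example reproduces the 8-kb target and 40-kb arms quoted in Protocol 6.

```python
# Minimal sketch of ligation molar-ratio arithmetic.
# The function name and the second example's fragment sizes are
# illustrative assumptions, not values from the protocols.

def insert_mass_ug(vector_mass_ug, vector_kb, insert_kb, insert_to_vector_ratio=1.0):
    """Mass of insert DNA (µg) giving the requested insert:vector molar ratio.

    Moles are proportional to mass / length, so
    insert_mass = ratio * vector_mass * (insert_kb / vector_kb).
    """
    if vector_kb <= 0 or insert_kb <= 0:
        raise ValueError("fragment lengths must be positive")
    return insert_to_vector_ratio * vector_mass_ug * insert_kb / vector_kb


# Protocol 6, step 6: equal molar ratio, 8-kb target, 1 µg of 40-kb lambda arms.
lambda_example = insert_mass_ug(1.0, 40.0, 8.0, insert_to_vector_ratio=1.0)

# Protocol 10, step 8: 2:1 ratio, with a hypothetical 3-kb plasmid (0.5 µg)
# and a hypothetical 1.5-kb insert.
plasmid_example = insert_mass_ug(0.5, 3.0, 1.5, insert_to_vector_ratio=2.0)

print(f"Lambda library ligation: {lambda_example:.2f} µg of target DNA")
print(f"Plasmid subcloning ligation: {plasmid_example:.2f} µg of insert DNA")
```

The sketch treats the ratio of ligatable ends as equal to the ratio of molecules, which holds when both insert and vector are linear fragments with two ligatable ends each.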
9. Re-suspend each pellet in 7 ml of cold 0.1 M CaCl2.
10. Incubate the cells on ice overnight.
11. Add 3 ml of ice-cold 50% glycerol/50 mM CaCl2 to each tube. Mix gently.
12. Aliquot 0.5 ml of cells/tube into pre-chilled tubes and quick-freeze in liquid nitrogen. Store the frozen cells at -80°C. Cells prepared in this manner retain ≥90% of their original competency for up to one year.

Protocol 12: Transformation of E. coli with Plasmid DNA
(Time: ≈2 hr to step 9)

The following protocol is used to isolate and screen plasmid clones created in Protocol 10. The plasmids are introduced into competent E. coli cells (produced in Protocol 11). Because the plasmid carries a gene for antibiotic resistance (typically ampicillin or tetracycline), the transformed bacteria can be isolated by growing the cells with the appropriate antibiotic. However, cells carrying recombinant as well as non-recombinant plasmids will grow under these conditions, so a second screening condition is usually imposed. For some plasmids, this involves a second gene for a different antibiotic resistance that is disrupted by cloning into the target site. Recombinant plasmids are then separated from non-recombinant plasmids by replicate plating on plates with one and with both antibiotics. Most plasmid vectors, however, use color screening for recombinant plasmids. The most common system involves a β-galactosidase gene that bridges the cloning site. By adding appropriate substrates to the plates (X-Gal and IPTG), bacterial colonies that contain non-recombinant plasmids will produce blue colonies, whereas colonies with recombinant plasmids (which have non-functional β-galactosidase genes) will produce white colonies. The following protocol assumes that a plasmid with blue/white screening is used. (For information on alternative screening methods, see Berger and Kimmel, 1987.)

1. Thaw frozen competent cells on ice.
2. Aliquot 200 µl of cells into a sterile tube on ice. This should be enough cells to allow efficient transformation with the DNA from a subcloning experiment.
3. Add ligated DNA in up to 50 µl total volume and sufficient 1.0 M CaCl2 to keep the Ca2+ concentration at 0.1 M.
4. Mix thoroughly and incubate on ice for 30 min.
5. Heat shock the cells at exactly 42°C for 2-3 min.
6. Allow the cells to cool to room temperature, then add 1 ml of L-broth.
7. Incubate the cells at 37°C for 30 min to allow the expression of drug resistance.
8. Spread the cells on L-broth + 1% agar plates. The plates should contain the appropriate antibiotic for the plasmid (e.g., 100 mg ampicillin/l), as well as 50 mg IPTG and 40 mg X-Gal per liter of broth.
9. Grow overnight at 37°C. Colonies that contain recombinant plasmids will be white; colonies that contain non-recombinant plasmids will be blue. DNA can be isolated from white colonies for screening by using a scaled-down version of Protocol 14, part A. After the correct clone is identified (Figure 7), it should be streaked onto a new plate (with the appropriate antibiotic), grown in volume for DNA isolation (Protocol 14), and frozen for permanent storage (Protocol 16).

Protocol 13: Transformation of M13 Bacteriophage DNA
(Time: ≈1 hr to step 9)

This protocol should be followed for transformation of E. coli with M13 clones. Blue/white screening for recombinant DNA (as described in Protocol 12) is used for M13 phage. One-tenth of a subcloning reaction involving 1 µg of M13 DNA will yield sufficient recombinant phage for analysis. Ten 100-mm plates will be sufficient for a transformation involving up to 0.1 µg of vector DNA.

1. Thaw frozen competent cells on ice.
2. Aliquot 200 µl of cells into a sterile tube on ice.
3. Add the ligated DNA and sufficient 1.0 M CaCl2 to keep the Ca2+ concentration at 0.1 M.
4. Mix thoroughly and incubate on ice for 30 min.
5. Heat shock the cells at exactly 42°C for 2-3 min.
6. Allow the cells to cool to room temperature, then aliquot the transformation into a number of sterile culture tubes equal to the number of plates desired.
7. Add 100 µl of a fresh overnight culture of an appropriate strain of E. coli to each tube.
8. Add 4 ml of warm (48°C) top agar and immediately spread on a L-broth + 1% agar plate containing 50 mg IPTG and 40 mg X-Gal per liter of medium.
9. Grow overnight at 37°C. See comments under Protocol 12, step 9.

Protocol 14: Isolation of Plasmid DNA
(Time: Part A: day 1: ≈10 min; day 2: ≈2 hr. Part B: day 1: ≈10 min; day 2: ≈3 hr; day 3: ≈6 hr)

The following protocol contains two parts. Clean preparations of plasmid DNA suitable for most purposes (including sequencing) can be obtained by following part A of the protocol. If further purification is necessary, the CsCl protocol (part B) can be used, but part B requires an ultracentrifuge. Either part can be scaled up or down as needed. For alternative protocols and modifications, see H. Miller (1987).

Part A. Basic Method

1. Grow the desired cells containing plasmids overnight in 300 ml of L-broth plus the appropriate antibiotic at 37°C with vigorous shaking (≈300 rpm).
2. Centrifuge the cell culture in 250-ml bottles at 2500 g for 10 min at 4°C.
3. Re-suspend the cells in 4 ml of GTE (Appendix) plus 1 mg/ml lysozyme. Transfer the suspension to 50-ml polypropylene centrifuge tubes. Incubate at room temperature for 5 min.
4. Add 8 ml of freshly made 0.2 M NaOH plus 1% SDS. Mix by hand and incubate at room temperature for 5-15 min. The solution should become less viscous during this time.
5. Add 6 ml of 5 M KAc, pH 4.8. Vortex thoroughly, then incubate on ice for 5 min.
6. Centrifuge at 7500 g for 10 min at 4°C. Carefully transfer the supernatant to a new tube.
7. Add an equal volume of PCI and vortex. Centrifuge for 1 min at 7500 g. Transfer the aqueous (top) phase to a new tube.
8. Add an equal volume of ether and vortex. Centrifuge for 10-20 sec. Remove ether (top layer), and save lower layer in tube.
9. Add 2.5 volumes of cold absolute ethanol, mix thoroughly, and incubate 10 min or longer at -20°C.
10. Centrifuge at 8000 g for 5 min at 4°C to pellet the plasmid DNA. Decant the ethanol. Add 1 ml of 70% ethanol and transfer the DNA pellet plus 70% ethanol to a microcentrifuge tube. Spin in microcentrifuge for 1 min, decant ethanol, and dry the DNA in a vacuum centrifuge until the ethanol has just evaporated.
11. Dissolve DNA in 1 ml of 1× TE. Add 10 µg/ml of RNase A and incubate at 37°C for 30 min.
12. Add 0.1 ml 5 M KAc. Repeat steps 7-10.
13. Dissolve DNA in up to 1 ml of 1× TE, and check concentration and purity.

Part B. CsCl Gradient Purification

1. Follow steps 1-10 of part A.
2. Re-suspend the pellet in 5 ml of 1.7 g/ml CsCl in 1× TE.
3. Add 250 µl of 4 mg/ml ethidium bromide. Check the density of this solution and adjust it to 1.60 g/ml by adding more CsCl or TE. Note: Ethidium bromide is a mutagen and suspected carcinogen.
4. Centrifuge for 18 hr at ≈150,000 g at 20°C in an ultracentrifuge.
5. Collect the plasmid band (the lower of the two bands) by side puncture with a syringe. Add an equal volume of 1× TE to dilute the CsCl.
6. Extract repeatedly with CsCl-saturated isoamyl alcohol until the aqueous phase is colorless. (The alcohol is the upper phase; keep the lower phase).
7. Dialyze the DNA solution against 4 L of 10× TE for ≈4 hr at 4°C. See Appendix for preparation of dialysis tubing.
8. Ethanol-precipitate the DNA, re-suspend it in 200 µl of 1× TE, and check concentration and purity.

Protocol 15: Miniprep Isolation of M13 DNA
(Time: day 1: 10 min; day 2: ≈2-3 hr)

M13 DNA can be isolated using Protocol 14, part A (without the antibiotic in step 1 or lysozyme treatment in step 3), or small amounts of M13 DNA can be isolated using the following protocol. This protocol is useful as a first step if large numbers of M13 clones are to be screened; larger quantities of the desired DNA can then be prepared using Protocol 14.

1. Inoculate 2 ml of L-broth with 1 drop of an overnight host bacterial culture and a single white plaque.
2. Incubate 12-16 hr at 37°C with shaking (≈300 rpm).
3. Remove 1.5 ml of the culture and centrifuge for 5 min at 7000 g.
4. Remove 1 ml of the supernatant and place it in a clean microcentrifuge tube. Be careful to avoid the bacterial pellet.
5. Add 150 µl of PEG stock (Appendix) and mix thoroughly.
6. Incubate for 30 min on ice.
7. Centrifuge for 5 min at 7000 g. There should be a clearly visible pellet.
8. Remove the supernatant, recentrifuge for 30 sec (7000 g), and remove any residual supernatant.
9. Re-suspend the phage pellet in 100 µl of 10× TE.
10. Extract with 200 µl of PCI as described in Protocol 1, part A (steps 6-8). Note: Sacrifice yield to avoid the interface.
11. Extract with 200 µl of chloroform as described in Protocol 1, part A (steps 10-12).
12. Extract with 500 µl of ether. Note: The ether will be the upper phase and the DNA will be in the lower phase.
13. Add 10 µl of 2 M NaCl and 250 µl of absolute ethanol to precipitate the DNA.
14. Centrifuge at 7000 g for 10 min to pellet the DNA. Rinse the pellet once with 70% ethanol and recentrifuge briefly at 7000 g.
15. Decant the supernatant, dry the pellet in a vacuum centrifuge, and re-suspend it in 20 µl of 1× TE.

Protocol 16: Preparing Permanent Frozen Stocks of Plasmid Clones
(Time: day 1: ≈10 min; day 2: ≈10 min)

Stocks of bacteriophage clones are best stored at 4°C in PDB (Appendix) with 0.4% chloroform added. The chloroform will prevent bacterial growth and preparations are stable for years. However, it may be necessary to amplify the stocks after prolonged storage. Plasmid clones can be stored indefinitely at -80°C using the following protocol.

1. Grow a fresh overnight culture of the desired clone in liquid media plus antibiotics.
2. Combine 0.85 ml of the overnight culture with 0.15 ml of sterile glycerol and mix well by vortexing.
3. Flash-freeze in liquid nitrogen and store at -80°C. Cell stocks prepared in this manner will last for years if they are not allowed to thaw. To access the frozen cells, simply scrape the top of the frozen culture with a sterile
Nucleic Acids IV: Sequencing and Cloning 355
loop and spread on a plate with appropriate antibiotics.

Protocol 17: Isolation of PCR Products for Sequencing
(Time: ≈5 hr for double-stranded template, ≈10 hr for single-stranded
template)

A complete discussion of PCR techniques is given in Chapter 7; see
Protocol 2 of that chapter for the basic amplification procedure.
Among parameters that can be varied to optimize amplification are the
concentrations of DNA template, primers, dNTPs, Mg2+, KCl, and Taq
polymerase, as well as the length and temperature of the annealing and
extension cycles (see Gyllensten and Erlich, 1988; Lawyer et al.,
1989; T.J. White et al., 1989; Kocher and White, 1989; Chapter 7).
There are a number of DNA polymerases that may be used for PCR
amplifications (e.g., Taq, Vent, DeepVent, Pfu).

be cloned into a sequencing vector (often required for fragments
larger than ≈600 bp), it is helpful to incorporate a restriction
enzyme recognition site on the 5' end of the primers. A primer so
constructed should have an additional 2-4 bases 5' to the restriction
site. Mismatches at the 5' end of the primer will not usually impede
the amplification process, although an absolute match of the primer to
the target DNA is preferable. Although primers as short as 17 bp have
been used effectively, it is usually desirable to use primers in the
range of 25-35 bp. Care should be taken to match the melting
temperatures (Tm) of the two primers:
Tm = [4 × (#G's + C's)] + [2 × (#A's + T's)]. Primer mixtures with up
to 256-fold degeneracy have been used successfully in PCR
amplification, although more than 32-fold primer degeneracy often
results in highly heterogeneous amplification products, which may
result in ambiguities in sequences produced. Degeneracy should be no
more than two-fold at any one site.
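The two primer rules of thumb given for Protocol 17 (the melting-temperature formula and the limits on degeneracy) are easy to check before ordering primers. The following is only a minimal sketch: the function names and the example sequences are hypothetical, and the degeneracy count assumes that degenerate positions are written with the standard IUPAC ambiguity codes, which the text itself does not specify.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def melting_temperature(primer):
    """Tm = 4 x (number of G's + C's) + 2 x (number of A's + T's).

    Only defined here for non-degenerate primers (A, C, G, T only).
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    if gc + at != len(primer):
        raise ValueError("primer must contain only A, C, G, and T")
    return 4 * gc + 2 * at


def degeneracy(primer):
    """Number of distinct sequences in a degenerate primer mixture."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total


def sites_over_twofold(primer):
    """0-based positions whose degeneracy exceeds two-fold."""
    return [i for i, code in enumerate(primer.upper())
            if len(IUPAC[code]) > 2]


# Hypothetical sequences, for illustration only.
print(melting_temperature("ATGCATGCATGCATGCAT"))  # 5 A, 5 T, 4 G, 4 C
print(degeneracy("ATGRAYTGGN"))                   # R x Y x N
print(sites_over_twofold("ATGRAYTGGN"))           # the N position
```

Comparing `melting_temperature` for the two primers of a pair shows at a glance whether they are matched, and `degeneracy` can be compared with the 32-fold and 256-fold figures quoted above.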
Choice of a polymerase may vary depending on the size of the target
fragment, the denaturing temperature to be used, and the necessity to
have or not to have proofreading activity associated with the enzyme.
For example, Vent polymerase is stable at higher temperatures than Taq
polymerase and is recommended for target sequences that may have
strong secondary structure, because the denaturing temperature can be
raised to 99°C. "Long-ranging PCR" (i.e., amplification of large
fragments) has varied performance with different polymerases and works
best using a combination of enzymes, one of which has proofreading
activity (see W.M. Barnes, 1994). Remember that each enzyme has its
own requirements for buffer composition (e.g., Vent uses MgSO4,
whereas Taq uses MgCl2) and when combining enzymes, one buffer may
work better than the other. For example, Taq works well in Vent buffer
(in fact, it sometimes works better in Vent buffer) but not vice
versa.

Careful primer design is critical to success of PCR amplification. The
two primers should be complementary to opposite strands, and should
flank the target sequence at a distance of up to 4 kb (larger
fragments can be amplified with diminishing success). If the amplified
fragment is to

Several methods have been developed for sequencing DNA from PCR
reactions. Parts A and B describe two methods for sequencing the
amplified product. Sequencing single-stranded template through
asymmetric reamplification (part A; Gyllensten and Erlich, 1988)
usually is limited to short (≤600 bp) amplified fragments; longer
fragments can be sequenced directly using part B. Both direct
sequencing options (parts A and B) require homogeneous amplification
product. If the PCR product is heterogeneous, or a clone is desired,
then the DNA should be inserted into a sequencing vector for analysis
(see Protocol 18).

Part A. Asymmetric Reamplification

1. Purify amplified DNA samples using low-melting-point agarose gels
   (Protocol 19) or Centricon 30™ cartridges (Amicon Corp., Danvers,
   MA, USA). Wash the Centricon 30™ cartridges by applying 2 ml of TE
   and centrifuging at 4800 g for 10 min at 4°C. Then add the 2 ml of
   amplified DNA solution to the cartridge and centrifuge as above for
   15 min. Discard the solution in the reservoir. Collect the purified
   DNA sample by inverting the cartridge and centrifuging at 200 g for
   2 min. The final volume of DNA sample should be
   approximately 100 µl. The yield of DNA should be approximately
   7-10 µg.
2. Repeat the amplification (Chapter 7), but with a 1:100 ratio of the
   two primers (some experimentation in primer ratio may be
   necessary). After asymmetric amplification, the low-concentration
   primer can be used to sequence the fragment (see Protocols 21, 22,
   and 25).

Part B. Isolation of Double-Stranded DNA for Sequencing

1. Concentrate the PCR product to ≈25 µl total volume in a vacuum
   centrifuge/concentrator.
2. Prepare a Sepharose™ CL-6B column (Boehringer Mannheim) by mixing
   thoroughly. Remove the top cap, then the bottom cap, and drain
   excess buffer from the column. Spin the column 2.5 min at 1100 g.
   Discard the buffer and repeat spin.
3. Using a new collection tube, add the sample from step 1 (≈25 µl) to
   the middle of the column. Spin for 10.5 min at 1100 g to recover
   the purified DNA.
4. To prepare DNA template for sequencing, use ≈1-2 µg of the purified
   DNA (≈10 µl) in Protocol 22, part A. Sequence the product using
   modified T7 DNA polymerase (Tabor and Richardson, 1987) as
   described in Protocol 22.

Protocol 18: Cloning Methods for PCR Products
(Time: Part A, sections I-III: ≈5-6 hr; section IV: 2 hr, plus
overnight; sections V-VI: 2-5 hr, plus overnight. Part B: 2 days)

For templates that are difficult to amplify or when heterogeneity of
sequences within a size class is suspected, cloning of PCR products
can lead to better quality and less ambiguous sequences. Several
cloning methods have been developed specifically for PCR products and
have increased the efficiency of cloning. One of the difficulties with
cloning PCR products has to do with the terminal transferase property
of Taq polymerase, which results in the addition of a single
nucleotide (usually adenosine) to the 3' end of the sequence (see
Marchuk et al., 1991). Blunt-ended cloning may be used but it is first
necessary to enzymatically remove the 3' overhang using an enzyme with
3' → 5' exonuclease activity (see Scharf, 1990; Marchuk et al., 1991).
T4 DNA polymerase can be used to blunt both 5' and 3' overhanging ends
in the same reaction. The exonuclease activity of the enzyme digests
the 3' overhangs to create a blunt end and the polymerase activity
end-fills the 5' overhangs. If there is a restriction site overhang,
Klenow fragment can be used because it will cleave the single
nucleotide overhang without digesting further. The blunt-ended product
then needs to be phosphorylated (cold kinasing reaction) prior to
ligation to the vector. Blunt-end cloning tends to be less efficient
than sticky-end cloning, so screening with a color selection vector is
recommended. Vectors such as Bluescript™ (Stratagene, La Jolla, CA,
USA) work on the principle that vectors without an insert will have a
functional β-galactosidase gene and transformed bacterial colonies
will turn blue, whereas vectors with an insert will not have the
correct enzyme and colonies will remain white (see Protocol 12). Other
methods are available that claim to increase the efficiency of
blunt-end ligations (see Liu and Schwarz, 1992) but will not be
described here.

A more efficient method of cloning PCR products exploits the
A-overhangs created by the action of Taq polymerase without the
necessity for further modification of the template prior to ligation
(Marchuk et al., 1991). Although several commercially prepared
TA-cloning kits are available (e.g., Invitrogen™, Novagen™), the
procedure is relatively straightforward. A T-vector is created by
digesting an appropriate plasmid (e.g., Bluescript™) with a
restriction enzyme that has only a single restriction site in the
vector (e.g., EcoRV). The digested vector is then incubated with Taq
polymerase and dTTP. The absence of any other nucleotides in the
mixture results in the addition of a single thymidine at the 3' end of
each fragment. The vector and PCR product then have complementary
single-base 3' overhangs. The 3' T-overhang inhibits self-ligation of
the vector, and the unphosphorylated 5' end prevents ligation of PCR
products to each other. One important piece of information that is not
specified in most of the
TA-cloning kit protocols is that ligations using T-vectors require
higher concentrations of ligase than is normally required (≈4×).
Transformation, color selection, and screening can be performed as for
blunt-end reactions. The efficiency of this method is thought to be
100-fold that of blunt-end cloning when using unmodified PCR products.
It also requires fewer steps than blunt-end reactions and is quicker
to perform.

Whichever method is used, once the PCR products have been cloned,
screening for inserts of the appropriate size may be accomplished in
several ways. A quick method (e.g., part A, section V) may be used as
a rapid screen to search for positive clones. PCR can also be utilized
to amplify the region of the vector that contains the insert. These
methods both rely on the ability to distinguish between clones with
and without inserts on the basis of differential migration through
minigels. Digesting the vector DNA with restriction enzymes greatly
improves the ability to detect differences in migration of positive
versus self-ligated clones. However, preparation of templates for
sequencing is best performed using a more thorough preparative method
(i.e., minipreps). There are many miniprep methods available. Two
methods were described in Protocols 14 and 15 and we have included a
rapid method here (part A, section VI). Although the STET method
(part A, section VI) is faster, alkaline lysis/SDS methods (such as in
Protocol 14, part A) appear to produce plasmid preparations of more
consistent quality.

Part A. Blunt-End Cloning

SECTION I. BLUNT-ENDING REACTION

1. Combine:
   7.5 µl ddH2O + DNA
   1.0 µl 10× polymerase buffer
   1.0 µl 5 mM dNTP
   0.5 µl T4 DNA polymerase
2. Incubate mixture for 30 min at 37°C.
3. Bring volume to 50 µl with ddH2O.
4. Add an equal volume of 25:24:1 phenol:chloroform:isoamyl alcohol.
5. Vortex. Centrifuge for 15 min (≈7000 g).
6. Extract top layer.
7. Add 5 µl (1/10 volume) 3 M NaAc (do not use NH4Ac because it
   interferes with kinase activity).
8. Add 150 µl (3 volumes) of ethanol.
9. Precipitate for 20 min at -20°C.
10. Vortex. Centrifuge for 5 min at 4°C.
11. Decant the ethanol.
12. Dry, re-suspend in 15 µl ddH2O, and place at 37°C to dissolve the
    DNA.

SECTION II. COLD KINASING OF PCR PRODUCTS PRIOR TO CLONING

1. Combine:
   15.0 µl ddH2O with DNA (blunt-ended)
   2.0 µl 10× kinase buffer
   1.0 µl DTT (reducing agent to prevent enzyme from oxidizing)
   2.0 µl rATP (phosphate donor)
   0.5 µl T4 DNA kinase
2. Incubate mixture at 37°C for 30-45 min.
3. Add 30 µl ddH2O (to 50 µl) and phenol extract as above. Centrifuge
   for 2 min (≈7000 g).
4. Precipitate with 5 µl of 3 M NaAc, 150 µl ethanol, and 0.5 µl of
   20 µg/µl tRNA.
5. Re-suspend in 12 µl ddH2O.
6. Vortex and centrifuge briefly. Place in heating block at 37°C for
   5 min to dry pellet.
7. Add 2 µl to a new tube. Add 10 µg (1 µl of stock solution) of RNase
   to remove tRNA.
8. Heat at 37°C for 5 min.
9. Electrophorese a sample of the reaction on a minigel to check
   quality and relative quantity of DNA.
10. Use 5 µl of the kinased DNA in ligation reactions.

1. Combine:
   11.5 µl insert DNA plus ddH2O
   2.0 µl blunt-ended vector DNA (approximately 20 ng)
   2.0 µl 10× ligation buffer
   2.0 µl 10 mM rATP
   2.0 µl 100 mM DTT
   0.5 µl BSA (20 mg/ml)
   1.0 µl ligase (1 U/µl)
2. Incubate at room temperature overnight.

SECTION IV. TRANSFORMATION

1. Thaw DH5α competent cells (see Protocol 11) on ice. Cells must be
   in log phase.
2. Combine:
   5 µl ligation mix (use 500 pg of plasmid as control)
   100 µl cells
3. Incubate on ice for 30 min.
4. Heat shock for 1 min at 37°C (make sure that there is water in the
   wells of the heating block to ensure efficient heat transfer).
5. Incubate on ice for 5-60 min.
6. Pre-warm LB/ampicillin plates at 37°C.
7. Combine:
   40 µl 4% X-Gal (in DMF)
   105 µl competent cells
   Mix by tapping gently. Do not mix cells by pipetting up and down.
8. Mix solutions with spreader (flamed in alcohol) and spread them
   evenly over the plate.
9. Incubate plates (upside down) overnight at 37°C.

SECTION V. SCREENING CLONES: QUICK PREP

1. Pick part of white colonies using a sterile pipette tip or platinum
   wire loop. Inoculate 4-5 ml LB broth containing 8 µl ampicillin
   (25 µg/ml) (or other antibiotic appropriate for the combination of
   vector and cells used). Use a blue colony as a control (i.e., one
   without an insert).
2. Grow overnight at 37°C with shaking or rotation.
3. Transfer 25 µl of the cell culture to a 1.5-ml microcentrifuge
   tube.
4. Add 25 µl PCI (Appendix).
5. Centrifuge (≈7000 g) for 5 min.
6. Electrophorese the supernatant on a minigel.
7. Compare migration of blue and white colony preparations.
8. Save clones with appropriate inserts and prepare for sequencing
   using a preparative miniprep. To store cells, freeze with glycerol
   or DMSO (150-300 µl autoclaved glycerol, or 30 µl DMSO, per ml cell
   suspension).

SECTION VI. MINIPREP (STET METHOD)

1-3. Performed as in section V above.
4. Centrifuge (≈7000 g) for 10 sec and decant LB broth.
5. Re-suspend pelleted cells in 300 µl STET buffer (Appendix).
6. Add 15 µl of 20 mg/ml lysozyme.
7. Vortex lightly.
8. Boil for 45-60 sec.
9. Centrifuge (≈7000 g) for 15 min.
10. Pour supernatant into new tubes.
11. Add 400 µl of isopropyl alcohol.
12. Place at -20°C for 5-10 min.
13. Centrifuge (≈7000 g) for 15-20 min.
14. Pour off isopropyl alcohol. Rinse in 70% ethanol.
15. Place sample in a heating block to dry and re-suspend in 30 µl
    ddH2O.
16. Digest with appropriate restriction enzymes to screen for positive
    clones, use immediately in sequencing reactions, or store at
    -20°C.

Part B. TA Cloning

1. Digest vector by combining:
   10 µg plasmid DNA in 87 µl ddH2O
   10 µl EcoRV 10× buffer
   1 µl BSA (10 mg/ml)
   2 µl EcoRV enzyme (10 U/µl)
2. Incubate overnight at 37°C.
3. Precipitate vector with 10 µl 2 M NaCl and 250 µl ethanol at -20°C
   for 30 min.
4. Re-suspend in 90 µl ddH2O. Electrophorese 5
   µl of the sample on a minigel with uncut vector as a standard to
   check digestion.
5. To add T-overhang, combine:
   85 µl vector digestion
   10 µl Taq 10× buffer
   2 µl dTTP (100 mM)
   3 µl Taq polymerase (2 U/µl)
   Add 75 µl mineral oil overlay.
6. Incubate at 70°C for 2 hr in a thermal cycler or a heating block.
7. Extract with PCI (Appendix) and chloroform (see Protocol 1, steps
   6-8 and 10-12).
8. Ethanol-precipitate (see Protocol 1, steps 14-16) 15-20 min at
   -80°C.
9. Re-suspend pellet in 100 µl 0.1× TE. This should be enough for 50
   ligations. Store at 4°C.

SECTION II. LIGATION REACTION

1. Combine:
   4 µl ddH2O
   1 µl 10× ligation buffer (Appendix)
   1 µl 5 mM ATP
   2 µl T-vector (prepared in section I)
   1 µl PCR product (fresh is better; purification may be unnecessary)
   1 µl T4 DNA ligase (3-4 U/µl)
2. Incubate overnight at 12°C.
3. Transform with DH5α competent cells (Protocol 12) as for blunt-end
   cloning (see part A, section IV).
4. Screen for inserts using X-Gal color selection followed by
   screening methods (see part A, sections V-VI).

Protocol 19: Purification of PCR Products for Sequencing
(Time: ≈1-2 hr)

The efficiency of sequencing reactions by any method can be improved
by purifying PCR-generated templates prior to sequencing. If
unambiguous PCR products are generated with no evidence of size
polymorphisms, centricon filtration (e.g., Millipore™ MC40) is the
simplest and most efficient method. The method is very straightforward
and will not be described here. However, it can result in sequence
ambiguities close to the primer if primer-dimer bands are present.
Direct purification of PCR products using a silica binding matrix
(glassmilk) can reduce these problems because it tends to remove small
segments of DNA. However, the most reliable results often are obtained
by gel-purifying target bands using low-melting-point agarose (LMP).
The agarose can be eliminated by phenol extraction but cleaner
templates and better yield can be achieved by using one of the other
purification methods described in this protocol. Gel purification is
preferred when PCR primers result in amplification of multiple
products of different sizes.

LMP gels use the same buffers and procedures as do normal agarose gels
but normally are run at a relatively high percentage of agarose
(1.2-2%) to allow maximum separation of bands and an easier handling
consistency of gels. TAE buffer is recommended over TBE because borate
is thought to interfere with sequencing reactions. Although there are
many methods that have been used to extract DNA from agarose gels, we
will describe two here. The first is based on isolation of DNA on a
silica binding matrix as described by L.G. Davis et al. (1986). Kits
based on this procedure (or slight modifications thereof) are
available from several companies. The method recovers up to 90% of the
initial DNA template and results in the elimination of excess
proteins, salts, unincorporated nucleotides, primers, and other
residual impurities (e.g., small RNAs, ethidium bromide, and phenol).
The second procedure allows direct recovery of DNA from agarose gels
by migration into a well that contains a high-salt buffer (Zhen and
Swank, 1993). This method has been found to recover up to 98% of the
initial DNA. It has been used to isolate fragments ranging in size
from 200 to 5000 bp. Although the original protocol recommended
subsequent phenol extraction prior to sequencing, the high salt does
not seem to interfere with sequencing reactions using either direct
sequencing or cycle sequencing procedures. Both of these methods work
well and choice of method
may depend on personal preference. The glassmilk procedure (part A)
requires a little more time but the well method (part B) requires more
attention while the gel is running.

If multiple products are produced in a PCR reaction, the smallest
products usually will be the most concentrated. If the target products
are large, yield may be improved by gel-isolating the larger products
and then using the purified template in a reamplification reaction. If
the band excised from a gel actually contains two products of similar
size, a second gel-purification step may be necessary following
reamplification.

Part A. Glassmilk Purification Method

1. Electrophorese the PCR product on a 1.5% LMP gel in 0.5× TAE
   buffer. The volume of product loaded onto the gel will depend on
   the concentration desired and the thickness of the gel.
   Large-volume samples either can be precipitated and concentrated
   into a smaller volume or products can be divided into smaller
   volumes and loaded in several lanes of the gel.
2. Stain the gel with ethidium bromide and excise the target band with
   a scalpel or razor blade. Place excised bands into 1.5-ml
   microcentrifuge tubes. If the sample was divided among lanes, the
   excised bands from the appropriate lanes can be combined into the
   same tube.
3. Add 3 volumes of NaI binding buffer (Appendix) to tubes containing
   the gel bands.
4. Incubate at 40°C for 5-10 min to melt gel.
5. Vortex matrix (i.e., glassmilk) thoroughly and add 10 µl to gel
   plus binding buffer.
6. Mix well by inverting and incubate at room temperature for
   5-10 min. The efficiency of binding can be increased by frequent
   inversion of tubes.
7. Mix and spin in a microcentrifuge for 30 sec.
8. Draw off the supernatant with a pipette or aspirator and discard.
9. Add 500 µl NaI binding buffer. Repeat steps 6-8.
10. Add 500 µl Tris-ethanol wash buffer (Appendix).
11. Mix and spin in a microcentrifuge for 30 sec.
12. Remove the supernatant with a pipette and discard it.
13. Repeat steps 10-12 one or two times.
14. On the last wash, it is important to ensure that no liquid
    remains.
15. Add 10-15 µl 1× TE. The volume used will depend on the final
    concentration of product desired. Higher yield is obtained by
    performing two elutions than by a single elution of larger volume.
16. Mix well and incubate at 40°C for 5 min.
17. Mix and spin in a microcentrifuge for 30 sec.
18. Remove the liquid (DNA plus TE) with a pipette and save to a new
    tube.
19. Repeat steps 15-18 to increase yield.
20. Spin the tube containing the final sample in a microcentrifuge for
    10 sec to ensure that all matrix is out of solution. Before using
    for sequencing, it is best to centrifuge again because the matrix
    can inhibit DNA accessibility.

1. Electrophorese samples at a low voltage (e.g., 5 V/cm) on a 1.5%
   LMP gel in 0.5× TAE containing 0.2-0.5 µg/ml ethidium bromide. The
   buffer should just reach the edges of the gel but not cover it.
2. After samples have run halfway, place the gel-rig on a UV-light box
   (long wave). (Remember to wear appropriate eye and skin
   protection.) Excise wells directly in front of target bands. Wells
   should be slightly wider than the bands.
3. Add 250-400 µl 15% PEG/TAE to the well. The purpose of the PEG is
   to retard migration of DNA through the trough buffer so that the
   target migrates as a discrete band.
4. Electrophorese the target bands into the excised wells. It is
   important to watch carefully at this step. Run at a low current.
   When the band has moved into the well, transfer the liquid from the
   well into a 1.5-ml microcentrifuge tube with a pipette. The current
   may be reversed if target bands have migrated too far, but the DNA
   yield may be reduced.
5. Use directly in sequencing reactions or precipitate to remove salt.

Protocol 20: Screening Methods for Detecting Variation in DNA
Sequences

Although screening of cloned products by sequencing multiple clones
can be used to detect polymorphisms in sequences under comparison,
there are a number of other methods that can detect single base-pair
changes within or among samples prior to sequencing. Lessa and
Applebaum (1993) recently reviewed these methods and their
applicability to population biology, so only a brief summary will be
given here (see also Chapter 8). A detailed protocol is given here
only for SSCP because it is the simplest procedure that results in
good resolution of sample heterogeneity.

The simplest of these methods, heteroduplex analysis, is designed to
test whether a sample contains one or two types of DNA by the
differences in mobility expected between heteroduplex and homoduplex
molecules on acrylamide gels. It is very simple but relatively limited
in its sensitivity and applicability.

SSCP (single-strand conformational polymorphism; protocol below) is a
technique designed to identify allelic variation at a given locus
(Orita et al., 1989). The technique is useful for detecting variation
in short fragments of DNA. It utilizes the differences in migration in
a gel matrix caused by conformational changes of single-stranded DNA
that result from point substitutions, insertions, and deletions. It is
relatively simple and conditions may be optimized to maximize
variation detectability, but optimization may vary among samples and
may require some trial and error (see Lessa and Applebaum, 1993). It
is estimated to detect 99% of single base-pair changes for fragments
of 100-300 bp and 89% of changes for fragments of 300-450 bp (Hayashi,
1991a,b), but longer sequences may be used when combined with
endonuclease digestion (Iwahana et al., 1992). The method described
here is based on Wilson and Schulles (1992) and was designed for
screening M13 plaques. See Lessa and Applebaum (1993) for
recommendations about optimization.

DGGE (denaturing gradient gel electrophoresis) utilizes
double-stranded DNA and is able to separate homoduplex molecules based
on differences in migration through a gel containing linear gradients
of denaturants (i.e., urea) at constant temperature (see Chapter 8).
The basic principle is that DNA segments will partially denature at a
point along the gradient determined by its melting point. Mutations
will alter the melting point and result in changes in the rate of
migration. This is the most sensitive method and can detect nearly
every mutation in DNA fragments up to 500 bp. It also can be combined
with heteroduplex analysis (Lessa and Applebaum, 1993) and efficiency
can be improved by adding a GC clamp to PCR products (Myers et al.,
1989; Sheffield et al., 1992). Other advantages are that optimization
of gradients and conditions can be standardized across templates, it
is more versatile than the other methods, and it can be preparative
(i.e., bands can be cut from the gel and used for sequencing) as well
as analytical (see Lessa, 1993). However, it also requires the most
specialized equipment and most complicated procedures. Methods for
pouring gradient gels and maintaining constant temperature
electrophoresis will differ depending on the apparatus used.
Therefore, the reader is best directed elsewhere for detailed
protocols. Detailed descriptions of methods and equipment required are
given by Myers et al. (1986, 1989).

SSCP Protocol

1. Add 1 µl of [α-32P]dATP to the last ten cycles of PCR reaction.
2. Combine 10 µl of PCR product with 5 µl of formamide sample buffer
   (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, and 0.05%
   xylene cyanol).
3. Heat-denature at 95°C (or place in a boiling water bath) for 5 min.
4. Immediately cool on ice.
5. Load 4 µl of each denatured sample onto a 0.4 × 160 mm 6% TBE
   polyacrylamide gel containing 5% glycerol.
6. Submerge the gel in 1× TBE buffer maintained at 20°C by connection
   to a circulating water bath.
7. Electrophorese at 1000 V until the xylene cyanol reaches the bottom
   of the gel.
8. Dry gel and autoradiograph or stain with ethidium bromide.
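As a quick arithmetic check on step 2 of the SSCP protocol, the concentrations actually present in the denatured sample can be worked out from the volumes given (10 µl of PCR product plus 5 µl of sample buffer). The sketch below is only an illustration: the function and variable names are hypothetical, and it assumes that the PCR product contributes none of the listed components and that volumes are additive.

```python
def final_concentrations(stock, stock_volume_ul, other_volume_ul):
    """Concentration of each stock component after mixing two volumes.

    Assumes the second solution contains none of the components
    (simple dilution: C_final = C_stock x V_stock / V_total).
    """
    total = stock_volume_ul + other_volume_ul
    factor = stock_volume_ul / total
    return {name: conc * factor for name, conc in stock.items()}


# Formamide sample buffer as listed in step 2 of the SSCP protocol.
sample_buffer = {
    "formamide_percent": 95.0,
    "EDTA_mM": 20.0,
    "bromophenol_blue_percent": 0.05,
    "xylene_cyanol_percent": 0.05,
}

# 5 ul of buffer combined with 10 ul of PCR product.
loaded = final_concentrations(sample_buffer, 5.0, 10.0)
for name, value in loaded.items():
    print(f"{name}: {value:.3f}")
```

With these volumes every buffer component is diluted three-fold, so the sample loaded in step 5 contains roughly one-third of the stock formamide and EDTA concentrations.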
Protocol 21: Preparing a Sequencing Gel
(Time: ≈1-2 hr)

The details of the following protocol will vary depending on the style
of sequencing apparatus used; a simple sequencing apparatus is shown
in Figure 8. The gel spacers can vary in thickness from 0.2-0.8 mm. If
spacers of uniform thickness are used, the bands at the bottom of the
gel will be widely spaced, whereas those at the top will be very close
together. Much longer sequences can be read from gels that take
advantage of field gradients produced with wedge-shaped spacers
(Ansorge and Labeit, 1984). With wedge-shaped spacers, bands will be
much more evenly spaced along the length of the gel. Wedge-shaped
spacers can be obtained commercially, but are expensive and often not
uniform. An effective alternative is to combine two layers of spacers
at the bottom of a gel, with only a single layer of spacers at the
top. Experimentation will be required to find the optimal gradient for
a particular sequencing system, but a gradient of 0.2 to 0.8 mm
usually is quite effective.

Figure 8 A basic sequencing gel unit, shown in side view and front
view. The gel is poured between the two glass plates (E1 and E2),
which are separated by the teflon spacers (H). Note that the front
plate (E2) is slightly longer than the back plate (E1) to allow
contact between the gel and buffer in the lower tray. A sharkstooth
comb (I) is inserted at the top of the gel (see Protocol 21). The two
plates are held together by clamps (heavy-duty paper clamps work well)
and the gel is inserted into the lower well (C) where it is held in
place by a plexiglass bar (B). The top of the gel is clamped to the
side ears of the upper tank (A); note that the front of the upper tank
is open to allow contact between the buffer and the gel. A rubber
gasket (G) forms a seal between the upper tank and the eared glass
plate (E1). An aluminium plate (F) is clamped to the glass plates to
ensure even heating. The electrodes are constructed from platinum wire
to prevent corrosion. The stand (D) can be modified to permit height
adjustment of the upper buffer tray so that gels of many different
lengths can be accommodated. Sequencing gels are typically 40-100 cm
long and 20-40 cm wide.

Reading long sequences (>600 bp) requires long sequencing gels
(>80 cm), wedge-shaped spacers, and use of 35S-labeled or 33P-labeled
(rather than 32P-labeled) nucleotides. (Some proprietary acrylamide
solutions [e.g., Long Ranger™ of AT Biochem] also increase the length
of readable sequence.) Pouring and handling very long gels presents
some difficulties. An alternative pouring strategy to the one given
below is to slide the plates together, pouring the acrylamide gel
mixture ahead of the leading edge of the top
plate. With practice, gels without any bubbles can be prepared from
very long plates with this technique. Another technique for pouring
long gels involves injecting the acrylamide through a small hole in
the bottom of one of the two glass plates (Slightom et al., 1987). For
handling long gels, it may be preferable to bind the gel to one of the
plates (using bind-silane [γ-methacryloxypropyltrimethoxysilane])
rather than to transfer the gel to filter paper for vacuum drying.
Gels attached to the glass can be dried with a hot-air blower or in a
drying cabinet (see Protocol 25).

1. Prepare the inner surfaces of the gel plates (after cleaning) using
   2% dimethyldichlorosilane solution in 1,1,1-trichloroethane (add
   ≈1 full Pasteur pipette of silane per surface and spread with a lab
   tissue; polish surface until smooth). CAUTION: Wear gloves and
   prepare the plates in a fume hood, as the silane solution is highly
   toxic.
2. Clamp the plates together with spacers between them. Be sure that
   the spacer covers the complete length of the gel plate. For this
   method, do not use a spacer across the bottom of the plates.
3. Tape all sides of the gel plates except the top, making sure that
   all edges are tightly sealed. Re-clamp the sides of the taped
   plates. An alternative method involves using a spacer at the bottom
   and clamping the sides and bottom of the gel rather than taping.
4. Mix the following gel solution in a 500-ml flask (for a 4% gel):
   60 ml urea mix (Appendix)
   20 ml 20% acrylamide (Appendix)
   20 ml 1× TBE buffer (Appendix)
   400 µl 10% ammonium persulfate
   50 µl TEMED
   [Note: The concentration of acrylamide for
   of the gel apparatus and plates used. TEMED should always be added
   last.]
5. Pour the gel solution between the plates using a 25-ml pipette and
   a regulating pipette bulb or pour slowly and constantly from a
   beaker. Allow the solution to run down one edge and fill from the
   bottom. Avoid forming bubbles between the plates.
6. Insert a sharkstooth comb backwards between the plates at the top,
   aligning the holes in the comb with the edge of the back plate
   (shorter one). Allow gel solution to cover the outer surface of the
   comb. Clamp into place and allow to rest for one hour at an
   incline. Remove the clamps after the gel sets.
7. Pour diluted electrode buffer on the comb with a Pasteur pipette.
   Remove the comb and rinse it clean in distilled water. Re-insert
   the comb with the teeth pointing inward so that the tips of the
   teeth barely touch the surface of the gel.
8. Cut the tape from the bottom edge of the gel with a razor blade.
9. Clamp the gel onto the gel-running apparatus.
10. Fill the upper and lower reservoirs with 1× TBE buffer.
11. Use a syringe or micropipetter to clear the wells formed by the
    sharkstooth comb. The wells are the spaces between the teeth.
12. Fill the wells with 4.5 µl of stop buffer. Pre-run the gel,
    setting the current not to exceed 25 mA and the voltage not to
    exceed 2,000 V. Use a micropipetter with microthin tips or
    capillary tubes drawn to a fine tip to load the gel. (Length of
    pre-run = 15-30 min.)

Protocol 22: DNA Sequencing
DNA gels should be 4-6%. For RNA gels, use Reactioxas
8% acrylaniide. Gels can also be poured using (Time: Part A: =1,5hr; Part B:=30 min; Part C:
a stock solution of the desired percentage of llr)
acrylamide (e.g., 6% working solution, see
Appendix) and adding 10% ammoniuln per- The conditions of DNA sequencing reactions can
sulfate and TEMED just prior to pouring the be varied according to (1)the length of sequence
gel. The volume used will depend on the size to be determined; (2) whether single- or double-
364 Clznpfer 9 / Hillis, Mable, Larson, Davis b Zimmer
stranded DNA is to be sequenced; (3) the base For amplified DNA from Protocol 17, part B,
camposition of the primer sequence; (4) the base add 4 pl of 1% acrylamide
coriipos~tronof the target sequence; and (5) the se-
iluenclng enzyme to be ~ l s e dIn . general, these 4. Add 150 pl absolute ethanol and mix.
vnri~lilonsare noted in the following protocol, ex- 5. Precipitate DNA at -80°C for 45 min or more.
cept that particular conditions for the various 6. Pellet DNA for 20 inin in refrigerated micro-
sequcnclng enzymes should follow the manu- centrifuge.
facturer's recommendations. The common se- 7. Wash pellet with 70% ethanol (approximately
quel~cmgellzymes are Klenow fragment, modlfied 150 pl), centrifuge for 10 min in a refrigerated
bacter.iop11age T7 DNA polymerase (Sequenase'") microcentrifuge.
(Tabor and Ibchardson, 1987),and Taq polymerase.
Although good results can be obtained wit11 any 8. Wash pellet with absolute ethanol (approxi-
of time enzymes, modified T7 DNA polymerase mately 150 pl), spin 10 min in refrigerated mi-
usually provides superior results, especially in re- crocentrifuge.
glans of strong secondary structure. If probleins 9. Dry the pellet in a vacuum centrifuge. [Note:
arise 111 sequencing regions of high GC content, This DNA may be stored at -20°C dry before
such as colnpressions or "stop bands" (i.e., strong proceeding to the next step.]
bands In more than one lane at a given nucleotide
poslliun), it may also be desirabvle to substitute I%ar.kR . Prcpar;~tionof Sol~ation~ and
dlTP ax 7-deaza dGTP for dGTP in step 1of part B Terminakion Ttrbes
(W.hl Barnes et al., 1983, Gough and Murray,
1983, Mlzusawa et al., 1986) These problems also 1. Label four tubes per reaction with GI A, T,
may be reduced by using both Taq polymerase and and C. To each tube add 2.5 p1 of the respec-
inodliied T7 DNA polymerase in the sequencing tive ddNTP mixture. All four mixtures con-
reactions (Austin, 1995; see section on "Interpreta- tain 80 p.M dGTP, 80 pM dATP, 80 pA4 dCTP,
tion ~ n Troubleshooting").
d For sequencing dou- 80 pM dTTP, and 50 plvl NaC1. In addition,
ble-stral-tded DNA, the DNA should be denatured each contains 8 pM of the respective ddNTP.
(part A; Haltiner et al., 1985) before starting part B. For dITP sequencing, substitute 160 dITP
Single-stranded DNA (c.g., MI3 or asymmetric for 80 pi4 dGTP in each mixture. dNTPs also
PCR products) can be used directly in part C. can be reduced to sequence close to the
primer (i.e.,the ratio of ddNTPs to dNTPs can
be altered to adjust readability length).
Par i ~ " xIDenaturatioxz and Neutrafi~atiur:of
Doublc-Stranded DNA Template 2. Prepare labeling mix depending on sequenc-
ing distance from primer. Stock: 7.5 ,dvI dGTP
1. Bring 1-3 pg of liNased plasmid DNA (or (or 15 ph4 dITP), 7.5 pM dCTP, 7.5 ,uMdTTP. -
other double-stranded template) to a volume Dilute stock 1:10 for sequencing close to
of 20 pl with deionized, distilled water. Add 2 primer, 1:5 for sequencing 25-300 bp from
111of 2 N NaOH. The exact amount of DNA primer, and use undiluted for greater than
will vary depending on the size of the tem- 300 bp from primer.
plate. A 1:1 molar ratio should be maintained
between primer and template. 3. Prepare DNA polymerase according to man-
ufacturer's directions.
2 Incubate at 65OC for 5 min, then place on ice
and allow to cool.
3. Add ~~eutralizing salt mix of: Part C. Pritner Anwaling and Sequencing
Kcaction
2 ,d8 8 NNH4Ac
3 PI 3 3 NaOAc 1. For DNA produced in part A, re-suspend
20 pl ddH@ template in 8 ,dof primer (2.5 ng/pl) in a 0.5-
ml microcentrifuge tube. For single-stranded
template from asymmetric PCR, add 7 µl DNA + 1 µl 10 µM primer.
2. Add 2 µl of 5x sequencing buffer (e.g., 200 mM Tris-HCl, pH 7.5, 100 mM MgCl2, 250 mM NaCl; this may vary with the DNA polymerase used).
3. If the (G+C)/(A+T) ratio of the primer is approximately 0.5 or more, heat the tube to 65°C for 2 min, then allow the tube to cool down at a rate of approximately 1°C/min to 35°C. If the (G+C)/(A+T) ratio is less than 0.5, hold the tube at 37°C for 15 min. Some experimentation will be required for specific primers. Samples can be frozen at this step if desired.
4. Prepare the sequencing cocktail in a microcentrifuge tube, with the following amounts of reagents for each template to be sequenced (add enzyme just before required):
   2.0 µl dGTP labeling mix (1:20 dilution*)
   1.0 µl 0.1 M dithiothreitol (DTT)
   0.5 µl [α-32P]dATP, [α-33P]dATP, or [α-35S]thio-dATP
   2.0 µl DNA polymerase in buffer (1:8 dilution with enzyme dilution buffer)
   *or dilute as recommended in part B above
   Centrifuge briefly, vortex, and centrifuge briefly again. The sequencing cocktail should be kept on ice.
5. For each reaction, add 5.5 µl of sequencing cocktail, and briefly centrifuge. Allow the extension reaction to proceed at room temperature for 30 sec to 10 min, depending on how close or far from the primer you wish to sequence (short extension times allow sequences to be read close to the primer, whereas longer extension times accentuate bands farther downstream; a happy medium is 2 min). Place tubes on ice.
6. Add 3.5 µl of this reaction mixture to each of the termination tubes prepared in part B, step 1 (pre-warmed to 37°C). Centrifuge briefly and incubate at 37°C for 2 min.
7. Add 4 µl stop buffer (Appendix), centrifuge briefly, and place on ice or freeze up to 7 days.
8. Prior to loading on gel, heat samples at 80°C for 2 min.

Part D. Modifications for Microtiter Plate

1-3. As in part C.
4. Preheat two heating blocks, one at 37°C and the other at 95°C (the 95°C block will heat faster if it is covered). Use blocks that have been drilled to hold a microtiter plate surrounded by a raised edge.
5. Cover two columns of wells on a microtiter plate with lab tape. Label each row according to the template to be sequenced. Label each column G, A, T, C from left to right.
6. Add 2.5 µl of termination mix (ddG, ddA, ddT, ddC) to each well in the appropriate row. Cover plate and place on ice.
7. Prepare the sequencing cocktail as in part C, step 4.
8. Add 5.8 µl of sequencing cocktail to each of the template/primer solutions, carefully placing the sequencing cocktail on the side of the tube about 1/3 down from the lip.
9. Centrifuge the tubes briefly, vortex, and spin briefly again. Allow the extension reaction to proceed for 30 sec to 10 min (see part C, step 5). Place tubes on ice.
10. Add water to the 37°C block so that a thin layer of water covers the depression in the block. Place the microtiter plate with the dideoxynucleotides on the block so that water contacts the bottom of the wells in the plate. Incubate for 30-60 sec.
11. Start a timer. For each template, pipette 3.4 µl from the tubes prepared in step 9 into the four wells (G, A, T, C) from left to right in one row, beginning with the top row. Mix the contents in each well by pumping the solution once with the pipettor.
12. After pipetting the termination reactions, wait until at least 5 min have elapsed on the timer (it takes about 5-6 min to pipette 10 templates if no problems arise). Begin placing 4 µl of formamide dye in the wells in the same order (e.g., left to right, row by row, top to bottom) as the termination reactions were pipetted. Work methodically and you should come to the end of the plate in about the same time that it took to pipette the termination reactions. Place the plate on ice or freeze.
13. Just before loading the sequencing gel, denature samples on the 95°C block. Adjust the water level to form a thin layer covering the depression on the block. This may lower the temperature of the block somewhat; templates will denature at 85°C. Water should contact the bottom of the wells on the microtiter plate. Denature for 3 min and immediately place the plate on ice.

Protocol 23: RNA Sequencing Reactions
(Time: ≈3 hr)

A common problem in sequencing rRNA with reverse transcriptase is sequencing through regions of strong secondary structure. One method that may help to resolve such sequences involves the addition of terminal deoxynucleotidyl transferase (TdT) following the completion of the reverse transcriptase extension reactions (DeBorde et al., 1986). This procedure is indicated below as an optional step 14.

1. Add 6 µl of solution of the RNA to be sequenced to a microcentrifuge tube.
2. Heat the RNA to >90°C for 5 min in a heating block. Cool in ice water, and then spin for several seconds in a microcentrifuge.
3. Add 1 µl of 20x reverse transcription buffer (Appendix) and 2 µl of labeled primer (working stock = 0.5 pmol/µl) and 1.5 µl of RNasin (2000 U/ml) in this order, vortex, and then spin for several seconds in a microcentrifuge.
4. Incubate at 42°C for 30 min. Spin for several seconds in a microcentrifuge following incubation. (Note: To save time, the next three steps are performed during this incubation.)
5. For each RNA sample to be sequenced, prepare four microcentrifuge tubes marked G, A, T, and C.
6. To the G tube add 1 µl of 1.5 mM ddGTP; to the A tube add 1 µl of 8 mM ddATP; to the T tube add 1 µl of 5 mM ddTTP; and to the C tube add 1 µl of 2 mM ddCTP.
7. Prepare "Reaction Mixture 1" in the following manner for each RNA sample to be sequenced:
   3 µl dNTP mix (5 mM each dATP, dCTP, dGTP, dTTP)
   3 µl ddH2O
   3 µl reverse transcriptase
   Vortex, spin in a microcentrifuge for several seconds, and store on ice until needed.
8. Add 2.1 µl of the solution from step 4 to each of the four tubes.
9. Add 2 µl of "Reaction Mixture 1" to each tube (G, A, T, C), vortex, and spin for several seconds in a microcentrifuge.
10. Incubate at 48°C for 40 min.
11. Prepare "Reaction Mixture 2" (during step 10):
   3.0 µl dNTP mix
   3.0 µl reverse transcriptase
   Vortex, spin in a microcentrifuge for several seconds, and store on ice until needed.
12. Add 1 µl of "Reaction Mixture 2" to each tube. Vortex and then spin for several seconds in a microcentrifuge.
13. Incubate at 48°C for 40 min. Spin for several seconds in a microcentrifuge following incubation.
14. (Optional; see comments above) Add 1 µl of a mixture of dATP, dCTP, dTTP, and dGTP (each at 1 mM) and 10 U of terminal deoxynucleotidyl transferase to each tube. Incubate at 37°C for 30 min.
15. Add 4 µl of stop buffer (Appendix) to each tube.
16. Heat for 5 min to >90°C. Cool on ice, vortex, and then spin for several seconds in a microcentrifuge. Store on ice until use.
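The volumes in the RNA sequencing protocol can be checked with simple bookkeeping. The sketch below is only an illustration, not part of the published protocol: it assumes that every volume in the annealing step is in microliters (6 µl RNA, 1 µl buffer, 2 µl primer, 1.5 µl RNasin) and that 2.1 µl is later dispensed into each of the four G, A, T, and C tubes. The names `ANNEALING_MIX_UL` and `annealing_budget` are hypothetical.

```python
# Volumes in microliters, as read from the RNA sequencing protocol
# (assumption: every volume in the annealing step is in µl).
ANNEALING_MIX_UL = {
    "RNA solution": 6.0,
    "reverse transcription buffer": 1.0,
    "labeled primer": 2.0,
    "RNasin": 1.5,
}
PRIMER_STOCK_PMOL_PER_UL = 0.5   # working stock of the labeled primer
ALIQUOT_PER_TUBE_UL = 2.1        # volume moved into each termination tube
TUBES = ("G", "A", "T", "C")


def annealing_budget():
    """Summarize how much annealing mix is made, used, and left over."""
    total = sum(ANNEALING_MIX_UL.values())
    consumed = ALIQUOT_PER_TUBE_UL * len(TUBES)
    primer_pmol = ANNEALING_MIX_UL["labeled primer"] * PRIMER_STOCK_PMOL_PER_UL
    return {
        "total_ul": round(total, 2),
        "consumed_ul": round(consumed, 2),
        "spare_ul": round(total - consumed, 2),
        "primer_pmol": round(primer_pmol, 2),
    }


if __name__ == "__main__":
    for key, value in annealing_budget().items():
        print(f"{key}: {value}")
```

Under these assumptions the annealing mix slightly exceeds what the four aliquots consume, which leaves a small margin for pipetting loss.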
Protocol 24: Thermal Cycle Sequencing
(Time: ≈4 hours)

As for polymerase chain reaction procedures, optimization of reaction conditions and concentrations of reagents may depend on the quality, quantity, and nature of the template DNA, the particular model of thermal cycling machine, and the specific primers used in the sequencing reaction. The same rules for primer design that are outlined for PCR and Sanger sequencing also apply to cycle sequencing. As for PCR, when beginning cycle sequencing of a new taxon or when using new primers, some trial and error may be necessary to result in optimal sequence production (see Chapter 7).

Although unambiguous sequences may be obtained directly from unpurified PCR products or from bacterial colonies or plaques, further purification may improve resolution of sequences obtained. For example, quality of sequences obtained from plaques may be improved by performing a short asymmetric PCR reaction following the labeling reaction (Mason, 1992). For PCR products, gel isolation of target fragments (see Protocol 19) may be used to reduce ambiguities in sequences caused by primer-dimers or length heterogeneity. Centricon filtration of PCR products may also be used but tends to result in more ambiguities in sequences close to the primer. Efficiency of sample recovery may also be increased by using methods to remove oil from the reactions (Whitehouse and Spears, 1991) or by changing the type of oil used (Ross and Leavitt, 1991). Choice of a purification procedure may require experimentation on the system being used. When heterozygosity of sequences within a particular size fragment is suspected, cloning into plasmid vectors and screening for heterogeneity may be necessary (see Protocols 18 and 20). Less template is required than in standard chain termination methods, but quality is very important.

Many methods exist for cycle sequencing and a number of commercially available kits are on the market. The concentration of radioisotope used for end-labeling should be optimized to achieve an optimal balance between ATP concentration and specific activity, depending on the type of radioisotope used (i.e., γ-33P or γ-32P) and how old it is. Although it is possible to use γ-35S for end-labeling, MJ Research (a manufacturer of thermal cyclers) recently warned that radioactive H2S may be formed when 35S is used in thermal cyclers and therefore it should not be used for cycle sequencing. γ-33P is nice to work with because it is relatively stable and can be used for end-labeling for several months, it does not result in significant degradation of end-labeled primers, sequencing samples can be run for up to three months, and it tends to result in better resolution than γ-32P (Evans and Read, 1992). However, it is about four times more expensive. γ-32P results in degradation of end-labeled primers and sequencing reactions within several weeks. However, it is much cheaper and sequences of almost the same quality as γ-33P can be generated by reducing the amount of γ-32P in the end-labeling reaction, by reducing the amount of end-labeled primer in the sequencing reaction, or by using old γ-32P (i.e., about one half-life after manufacture). Quantity of labeled template loaded onto sequencing gels and times for autoradiograph exposure may also be varied with the reaction conditions and isotope used. Specific cycling conditions may vary among thermal cycler models and may need to be adjusted accordingly. Efficiency of reactions may be increased by using microtiter plates for sequencing.

Part A. End-Labeling Reactions

1. For each primer, combine:
   2.5 µl 10x kinase buffer
   1 µl 10 µM primer (0.4 µM final concentration)
   1 µl T4 polynucleotide kinase (0.4 U/µl final concentration)
   3 µl γ-33P ATP (1.2 µCi/µl; 1.2 µM ATP final concentration)
   Add sterile ddH2O to a final volume of 25 µl.
2. Centrifuge briefly and incubate at 37°C for 30 min.
3. Incubate at 95°C for 5 min to denature the kinase.
4. Stop reaction by placing on ice or at -20°C.
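The final concentrations quoted for the end-labeling reaction follow from the dilution relation C1 × V1 = C2 × V2. The sketch below is a hedged illustration of that arithmetic, not part of the protocol. It assumes a 25 µl reaction containing 1 µl of 10 µM primer, and it uses the identity 1 µM = 1 pmol/µl to estimate the primer carried in a 3.5 µl portion of the labeled mix. The function names are hypothetical, and the stock concentrations it back-calculates for the kinase and ATP are inferences, not values stated in the text.

```python
def final_concentration(stock_conc, volume_added_ul, final_volume_ul):
    """C2 = C1 * V1 / V2; the result has the same units as stock_conc."""
    return stock_conc * volume_added_ul / final_volume_ul


def stock_concentration(final_conc, volume_added_ul, final_volume_ul):
    """Invert the dilution: C1 = C2 * V2 / V1."""
    return final_conc * final_volume_ul / volume_added_ul


def amount_pmol(conc_uM, volume_ul):
    """1 µM equals 1 pmol/µl, so µM times µl gives pmol."""
    return conc_uM * volume_ul


if __name__ == "__main__":
    primer_uM = final_concentration(10.0, 1.0, 25.0)
    print(f"primer final concentration: {primer_uM:.2f} µM")
    # Inferred (not stated in the protocol): stocks implied by the final values.
    print(f"implied kinase stock: {stock_concentration(0.4, 1.0, 25.0):.1f} U/µl")
    print(f"implied ATP stock: {stock_concentration(1.2, 3.0, 25.0):.1f} µM")
    print(f"primer in 3.5 µl of labeled mix: {amount_pmol(primer_uM, 3.5):.2f} pmol")
```

The same three functions can be reused when the reaction is scaled, for example when less 32P-labeled ATP is used to slow primer degradation.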
[Note: T4 PNK is a very sensitive enzyme. If the reactions do not work, check the efficiency of labeling. If using γ-32P, 2 µl can be used in the labeling reaction.]

1. For each sample, combine:
   25 µl ddH2O
   4.5 µl 10x cycle sequencing buffer
   3 µl template DNA (≈50 fmol)
   0.2 µl Taq polymerase (≈1 U)
   3.5 µl end-labeled primer (≈1.4 pmol)
2. Add 2 µl of the appropriate nucleotide termination mix (ddG, ddA, ddT, or ddC, each with a 10 µM final concentration of each of the four deoxynucleotides; Appendix) to G, A, T, and C tubes or to spot wells in a microtiter plate. Substitution of a nucleotide analog for dGTP (7-deaza-dGTP or dITP) may be used if compression of samples is a problem (due to incomplete denaturation of GC-rich segments).
3. Add 8 µl of the mix from step 1 to each G, A, T, and C tube.
4. Overlay reactions with a drop of mineral oil. This is a very important step to prevent evaporation from small volumes. Some thermal cyclers (especially those with microtiter plate attachments) supposedly do not require the use of oil, but this claim should be verified through comparative tests.
5. Place reactions in a thermal cycler and set the cycles according to the primer length and sequence as well as the quantity and purity of the template DNA. Generally, for PCR templates, an annealing temperature 5-10°C higher than that for the initial PCR is the most efficient. Cycle repetitions may be varied from 20 to 40, depending on the quantity of DNA and the sensitivity of the autoradiograph detection procedure. For large DNA fragments, an initial long denaturing step (2-3 min) and longer extension times can improve labeling efficiency, but unnecessarily long annealing or extension times can reduce labeling efficiency (especially at a long distance from the primer).
   Sample cycles
   Denaturation: 20-30 sec at 94°C
   Annealing: 10-30 sec at 57°C
   Extension: 45-60 sec at 72°C
   Efficiency of the chain termination may be increased by adding 5-10 cycles at the end of the reaction that do not include the annealing step (e.g., 20 sec at 94°C; 1 min at 72°C). These "chase" cycles serve to ensure that all extension products are complete and that all labeled fragments terminate at a dideoxynucleotide.
6. Add 5 µl of stop buffer (Appendix) to each reaction tube.
7. Freeze at -20°C or run directly on sequencing gel. Samples may be stored for up to three months when using γ-33P-labeled primers, or two weeks when using γ-32P.
8. Heat samples at 80°C for 2-5 min prior to loading on gel.

Protocol 25: Running a Sequencing Gel
(Time: 3-6 hr to step 18)

Sequencing is best accomplished at approximately 50°C, so that the DNA or RNA remains denatured and relatively few secondary structures form. The temperature must be constant across the width of the gel, or the lanes will migrate at different rates and "smiling" of the bands will occur (Figure 9C). Uniform gel temperature is usually maintained in one of two ways: (1) by placement of an aluminum plate against the glass surface of the sequencing gel for even dispersal of heat, and regulating the current to maintain the desired gel temperature; or (2) by placement of a thermostatic plate with circulating temperature-controlled water against the sequencing gel. The second option is preferable, because of greater control of gel temperature, but requires more equipment (the thermostatic plate and a circulating heated water bath).

1. Turn off the pre-run (from Protocol 21, step 12).
2. Clear each of the wells by flushing with buffer.
Figure 9 Sequencing gel autoradiographs and troubleshooting. (A) Readable control reaction. (B) Same sequence as in (A), but with the lanes overloaded and the gel run at a high voltage (resulting in poorly defined bands). (C) Same sequence as in (A), but with smiling effect produced by uneven gel temperature (the left lanes are closer to the outside of the gel than are the right lanes). (D) Same sequence as in (A), but with RNA contamination. Note the darker background and poorly defined bands. Lanes in each panel, from left to right: G, A, T, C.

3. Label the reactions to be loaded on the front plate of the gel. Use a consistent order (e.g., G, A, T, C) for each set of reactions. Note that this order allows the sequence of the opposite strand to be read correctly if the autoradiograph is inadvertently viewed backwards.
4. Load 4.5-5 µl of each reaction (from Protocols 22, 23, or 24) in the wells as marked, using a micropipetter with microthin tips. Store unused portion of the reactions at -20°C.
5. Wipe off lane markings for first load to prevent accidentally loading them a second time.
6. If using a thermostatic plate, set voltage at 2000 V (the constant temperature of the gel will also hold the amperage relatively constant). If an aluminum plate is used to maintain even gel temperature, run the gel at the amperage required to maintain desired gel temperature (measure using a surface thermometer). As the run progresses, the amperage will drop as the voltage rises and the gel heats. Some experimentation will be required for particular gel systems. In general, a voltage of approximately 2000 V and a temperature of approximately 50°C are desirable. Gels can also be run at constant power (e.g., 35 W).
7. Allow the gel to run until the bromophenol blue marker reaches the bottom of the gel. If sequence far from the primer is to be read, a second set of reactions can be loaded at this point and run until the bromophenol blue marker reaches the bottom of the gel. The distant sequence can then be read from the lanes loaded first, and the closer sequence from the lanes loaded last.
8. Turn off the power and remove electrode cables.
9. Unclamp the gel plate from the gel apparatus. CAUTION: The bottom tank now contains radioactive buffer and must be handled with appropriate care.
10. Dry the gel plates and cut the tape on both sides using a razor blade.
11. Pull the spacers out from between the plates.
12. Use a spatula to pry the plates apart. The gel should adhere to only one of the plates.
13. If the gel was bound to the glass plate using bind-silane or if 35S was used as the labeling radioisotope, wash the gel by immersing the entire plate in 10% acetic acid and gently shaking for 15 min. Remove plate from fix and proceed to step 14 (for 35S-labeled gels) or step 18 (for silane-bound gels).
14. Cut a sheet of #1 filter paper to fit over the gel plate.
15. Place the filter paper over the gel and press it onto the gel. Place the other gel plate over the filter paper and press firmly again.
16. Remove the top plate and pull the filter paper (with gel attached) from the lower plate.
17. Cover the gel with plastic wrap. Place several sheets of filter paper beneath the gel and put on a gel dryer at 80°C for 1 hr. Proceed to step 19.
18. If the gel was bound to the glass plates using bind-silane, dry with a hot-air blower or in a drying cabinet.
19. Place the gel in a film cassette with autoradiography film for ≈24 hr (the exact length of exposure will vary). Exposure time can be estimated after scanning the gel with a Geiger counter (the relationship between how radioactive the gel appears and exposure time will vary among laboratories). If 35S was used, do not place plastic wrap between the gel and the film.
20. Develop the film for 5 min in a developer tank. Rinse in stop bath and place in fixer for 5 min. Wash well in running tap water and hang to dry.

Protocol 26: Microsatellites
(Time: ≈1 week)

Microsatellites are tandemly repeated DNA sequences of one to six bases in length. Variation in the number of repeats generates length polymorphisms that can be visualized and scored following PCR and polyacrylamide gel electrophoresis. Microsatellites appear to be uniformly distributed throughout human (A.E. Hughes, 1993), mouse (Stallings et al., 1991), chicken (Haberfeld et al., 1991), whale (Tautz, 1989), and insect (Choudhary et al., 1993) genomes. Abundance (every 50 kb in mammals), uniform distribution, and high polymorphism make these markers useful in population genetic and gene mapping studies. Microsatellites have been developed for vespid wasps (Choudhary et al., 1993), chickens (Haberfeld et al., 1991), cattle (Barendse et al., 1994; M.D. Bishop et al., 1994), pigs (Ellegren et al., 1993; Rohrer et al., 1994; Wintero et al., 1992), horses (Ellegren et al., 1992; Marklund et al., 1994), dogs (Holmes et al., 1993; Ostrander et al., 1993), cats (O'Brien, 1993), mice (Cornall et al., 1991; Dietrich et al., 1992, 1993), rats (Serikawa et al., 1992), and, of course, humans (Litt and Luty, 1989; Tautz, 1989; Weber and May, 1989). Canine microsatellites already have been applied in genetic differentiation and hybridization studies of wolves and coyotes, and have been amplified successfully in golden jackals (Roy et al., 1994). Bovine microsatellites have been used successfully in sheep and goats (S.S. Moore et al., 1991). The following protocol has been used to isolate (GT)n microsatellites from cattle, horses, turtles, and lizards.

The average insert size in libraries constructed using the following protocol is approximately 600 bp. The (GT)15 probe hybridizes to 2.9% of insert-bearing colonies in a bovine library and 66% of these contain (GT)n microsatellites. A weaker signal on the grid indicates a reduced likelihood that an insert contains a useful repeat. Longer repeats are more likely to be polymorphic (Weber and May, 1989).

In 5% of the microsatellites that have been isolated by this protocol, the repeating unit is too close to the Sau3AI site to design primers. Another 20% of the microsatellites fail to amplify, and about 20% of the amplified loci are monomorphic. Thus, the yield of useful microsatellite markers typically is about 1 per 100 white colonies. This ratio has been observed in mammals and reptiles, but in birds the ratio appears to be much lower: approximately one useful microsatellite marker per 1000 white colonies screened.

1. Digest 10 µg of total genomic DNA to completion with Sau3AI in a 100-µl reaction.
2. Load the entire digest onto a 0.8% agarose minigel with a low-molecular-weight size standard.
3. After electrophoresis, stain the gel with ethidium bromide and excise fragments between 300 and 800 bp with a sterile razor blade.
4. Recover the DNA from these gel fragments by crushing, freezing, and thawing them several times and then pelleting the agarose by centrifugation.
5. Precipitate the DNA from the supernatant for 1 hr at -20°C by the addition of 1/10 volume of 2 M NaCl and 2.5 volumes of ethanol.
6. Re-suspend the pellet in 90 µl of 10 mM Tris-HCl (pH 8.3), and 10 µl calf intestinal phosphatase (CIP) buffer containing 4 U of CIP.
7. Incubate at 37°C for 1 hr. This treatment removes the 5' phosphate groups from genomic
Nucleic Acids IV: Seqtlerzcing a~zdClolzing 371
DNA so that the fragments cannot self-ligate 18. Perlianently blnd the DNA to the membranes
and create chimeric inserts. w ~ t han overnight incubatiol-t in a 65°C dry
8. Reinove the CIP by incubating the mixture oven.
with 0.5% SDS, 5 mM EDTA (pH 8.01, and 100 19. Hybridize membranes with an end-labeled
mg/ml proteinase K for 30 min at 55°C. (GTII5 oligonucieotide (see Chapter 8 for de-
9. Extract the sample with an equal volume of tails).
PC1 (Appendix) and again with chloroform. 20. Isolate DNA from the colonies that liy-
10. Combine the size-selected phosphatized ge- bridized to the probe and those from the mas-
nomic DNA with an equaI molar ratio of plas- ter plates (Protocol 14). Sequence the inserts
mid DNA that has been cut by BnnzHI to pro- using the appropriate plasmid primers.
duce compatible ends. 21. Design PCR primer pairs in the regions flank-
11. Co-precipitate the mixture by the addition of mg the microsatellite based on the sequence
1/10 volume of 2 M NaC1 and 2.5 volumes of (see Chapter 7).
ethanol.
12. Dry the pellet from the co-precipitation and
re-suspend it into a 25 pl ligation reaction mix INTERPRETATION AND
containing 7.5 U of T4 DNA ligase (in ligation TROUBLESHOOTING
buffer plus 0.5 mM ATP) and incubate over-
night at 12°C. Autoradiograph Interpretation
13. Transform the ligations by aliquots into DH5a
competent E. coli cells (2 of ligation mix/50 Although reading autoradiograplils of sequence
yl cells). Ligations may be stored frozen at data is relatively straightforward (see Figures 4,5,
-20°C for many montl-ts and will still effec- and 9A), some practice is required to record the
tively transform cells. data accurately and to identify and solve prob-
lems. When sequencing DNA, it is strongly advls-
14. Plate transformed cells on LB plates with able to sequence both strands, as this provides a
ampicillin (200 mg/ml), X-gal (40 mg/ml), check against reading errors. When sequencing
and IPTG (100 mg/ml) for blue-white color RNA, only one strand can be sequenced, so ~t 1s
selection. A low ratio of white:blue colonies necessary to sequence broadly overlapping re-
usually indicates insufficient genomic DNA in gions in order to verify the sequence.
the ligation. A low number of transformants Reading sequences from autoradiographs 1s
indicates either an excess of genomic DNA in greatly simplified by use of one of various gel-
the ligation or poor quality competent cells. readers-digitizers coupled directly to a coin-
15. Pick white colonies onto a fresh plate covered puter. Use of a gel reader reduces human error
with a gridded nylon membrane and onto a compared to recording a sequence and then in-
master plate numbered to correspond to the putting the sequence via a keyboard. Most gel
grid. readers and software packages allow previousIy
16. Allow the cells to grow on the surface of the input sequences to be verified, thus further re-
nylon membrane for 12-18 hr. ducing error. Various automated gel readers have
17. Lift the membrane off the plates and place se- been and continue to be developed, and evcntu-
quentially onto satura-ted blot paper in the ally may replacc manual reading of autoradi-
bottom of PyrexTM dishes soaked in: 10% SDS ographs altogether. However, experience in gel
for 3 min; denaturing solution (Appendix) for reading usually allows higher accuracy of man-
ual sequence interpretat~oncompared to the prc-
5 min; neutralizing solution (Appendix) for 5
min; and 2x SSC (Appendix) for 5 min. This sent automated autoradiograph reading tech-
will denature the plasmid DNA and bind it to nology Software for automated reading of chro-
the membrane. mato-graphs produced by automated sequel~cers
372 Cl~npter9 / Hillis, Mable, Larson, Davis
has become quite accurate, although manual checking of sequences against chromatographs is still necessary. As with manual reading, discrepancies between the two strands should be checked and the sequence confirmed against the chromatograph.
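
The check of one strand against the other can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the two reads cover exactly the same region and have the same length, and the function names (`reverse_complement`, `strand_discrepancies`) are hypothetical, not part of any gel-reading package.

```python
# Minimal sketch: flag positions where two complementary reads disagree.
# Assumes both reads span the same region with no length difference.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_discrepancies(forward, reverse_read):
    """List (position, forward_base, base_implied_by_reverse_read).

    Positions are 1-based and refer to the forward strand.
    """
    implied = reverse_complement(reverse_read)
    if len(implied) != len(forward):
        raise ValueError("reads must cover the same region")
    return [
        (i + 1, f, r)
        for i, (f, r) in enumerate(zip(forward.upper(), implied))
        if f != r
    ]
```

Each position returned would then be rechecked against the autoradiograph or chromatograph rather than resolved automatically.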
The length of readable sequence depends on a number of factors. With use of wedge-shaped spacers (Ansorge and Labeit, 1984), long gels (50-60 cm), 35S or 33P rather than 32P, bacteriophage T7 DNA polymerase (Tabor and Richardson, 1987), and/or modified acrylamide solutions (see Protocol 21), it is possible to obtain greater than 600 bp from a single sequencing reaction on a single autoradiograph. Even longer sequences can be obtained on occasion from single reactions on some automated sequencers. Wedge-shaped spacers produce a gel that is thicker on the bottom than at the top; thus, the smaller DNA fragments slow down as they approach the bottom, and allow resolution of the larger fragments. Automated sequencers solve this problem by scanning a single point on the gel, recording the terminated fragments as they pass. The use of 35S or 33P in manual sequencing produces sharper bands than does 32P, so it is possible to deduce sequence from bands that are quite close together. Improved DNA polymerases with higher temperature optima and fidelity (e.g., Tabor and Richardson, 1987) also allow accurate sequencing of long DNA sequences. The ratio of dNTPs to ddNTPs also can alter the length of readable sequence using both direct sequencing and cycle sequencing.
One of the most common problems encountered with autoradiographs of sequencing gels is a dark background in the lanes (Figure 9D). This usually is caused by impure template DNA; RNA is the most common contaminant. Often this can be corrected by further purification of the DNA sample, including re-treatment with RNase. Background is not as much of a problem when using sequencing techniques that utilize end-labeled primers (e.g., cycle sequencing).
If bands are present in the same position in more than one lane (Figure 10), there are several possible problems. If the resolution of the gel is generally poor and there are no bands in the upper portion of the gel, it is likely that there has been a loss of activity of the sequencing enzyme. If ghost bands are apparent in adjacent lanes, then too large a sample may have been loaded into the gel, the loading syringe or pipette tip was not rinsed between loading samples, or there was a poor fit of the sharkstooth comb in the gel. Often the problem is caused by secondary structure or other sequence-specific problems (see below). It is also possible that sequence ambiguities reflect template impurities or heteromorphic PCR products. If gel purification of target bands does not improve resolution, it may be necessary to screen for heterogeneity by cloning and screening (Protocol 11) or by using a rapid screening method (Protocol 20; see also Chapter 8).

Figure 10 Example of a method used to reduce or eliminate stop bands in DNA sequencing gels (see Austin, 1995). The sequencing reactions on the left were produced with modified T7 DNA polymerase (Sequenase); those on the right were produced with a combination of modified T7 DNA polymerase and Taq polymerase. Note the resolution of the stop band (arrow).

Another problem sometimes observed is that the bands are too faint on the autoradiograph. This can be caused by old radionucleotides, insufficient exposure time, or salt contamination in the DNA template. With cycle sequencing, faint bands may be the result of inefficient labeling of the primers, or of thermal cycler reaction conditions that are not optimal for the primers being used. If the bands are too dark, this can be corrected simply by exposing the autoradiograph for a shorter interval. Diffuse bands (Figure 9B) usually are caused by poor contact between the film and gel during exposure, but also can be caused by loading too much sample, running gels at too high a voltage, and by poor washing and/or fixation of the gel.
"Smiling" is the phenomenon of samples in
the outside lanes of a gel running slower than
samples in the middle of a gel (Figure 9C). This is
caused by uneven gel temperature, and can be
corrected by good contact between the glass
plates of the gel and an aluminum plate, or even
better, by use of a thermostatic glass plate through
which heated water (approximately 50°C) is cir-
culated. If the samples are horizontal with respect
to one another but the individual bands are not
straight (Figure 11), the problem is likely that urea
was present in the sample wells when they were
loaded. This can be corrected by thoroughly rins-
ing the wells with buffer prior to loading.
If the current (with either voltage limiting or
watts limiting) is exceptionally high or low, this is
likely an indication of a problem with the buffer
or polymerization of the gel mix. Common prob-
lems include inaccurate preparation of the TBE
buffer, different concentrations of TBE used in
tray buffers and gel mix, exclusion (or replace-
ment) of urea from the gel mix, or incomplete
polymerization of the gel.
Figure 11 Example of a sequencing problem that occurs when undissolved urea or another substance is present in the sequencing wells. The problem may be solved by thorough flushing of the wells prior to loading.

If bands appear on the autoradiograph only on the areas that correspond to the lower portions of the gel, there are at least three possible causes. One, the template DNA may not have been added; two, one or more dNTPs may have been omitted from the reaction; and three, there may be no recognition sequence in the template DNA for the primer being used. In the first two cases, the reaction must be carried out again. If the primer site is absent in the template DNA, the only remedy is to use a different primer. With cycle sequencing, absence of bands also may be an indication that the primers did not label properly. End-labeling problems often can be traced to decreased activity of polynucleotide kinase (polynucleotide kinase is a very sensitive enzyme and should never be left out of the freezer).
Secondary structure of DNA or RNA can cause difficulties in sequencing, especially in GC-rich regions. This problem is prevalent in sequencing rRNA. The secondary structure commonly produces a stop in the sequencing reactions, so that bands are present in all four lanes (Figure 10). There are several ways to correct this problem. Higher gel temperatures help prevent formation of secondary structures, so running the gel at a higher temperature will help combat the problem. In sequencing RNA, the addition of a terminal deoxynucleotidyl transferase (TdT) chase following the completion of the reverse transcrip-
tase extension reactions is helpful (see Protocol 23 and DeBorde et al., 1986). In sequencing DNA, use of bacteriophage T7 DNA polymerase or Taq polymerase (or both; see Figure 10) rather than Klenow fragment resolves many problems with secondary structure; extreme cases can be resolved by using dITP (or 7-deaza dGTP) rather than dGTP in the sequencing reactions (see Protocol 22; also W.M. Barnes et al., 1983; Gough and Murray, 1983).

Sequence Comparison and Alignment
Once the sequence has been obtained, it must be related to other sequences to be of use in systematics. Sequences either can be aligned with known orthologs (or paralogs and xenologs if the evolution of gene families is the object of study), or similarity searches (often incorrectly termed homology searches) can be performed by matching the sequence to all other sequences in a databank such as GenBank. In the latter case, the best matches (or the most interesting ones) are then extracted and aligned for phylogenetic analysis. Alignments may be simple for closely related protein genes, but may be extremely difficult or ambiguous if the sequences are distantly related or come from non-protein-coding regions.

Database Searches
Local alignment algorithms find all subsequence matches above a certain defined threshold. Search of data banks makes use of these algorithms, such as the BLAST algorithm of Altschul et al. (1990), the FASTP algorithm of Lipman and Pearson (1985), or the FASTA algorithm of Pearson and Lipman (1988). An implementation of the BLAST algorithm is distributed as part of the Entrez CD-ROM package (National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, USA). It also can be accessed across the Internet (to register and obtain client software, contact [email protected]); a version is also available on the World Wide Web (http://www.ncbi.nlm.nih.gov/). Versions of other local alignment algorithms are distributed as part of most commercial sequence analysis software packages. It has become commonplace to search the nucleotide databases for the best match of virtually any sequenced segment of DNA, especially if the identity of the fragment is unknown or uncertain (R.F. Doolittle, 1990b). Because some sequence will always represent a best match, it is desirable to know if the match is significantly different from a random match. Lipman and Pearson (1985) described a z statistic for this purpose, which is derived from the particular similarity score used in the search procedure. Briefly, the z statistic equals the difference between the similarity score and the mean similarity score from the database scan, divided by the standard deviation of the similarity scores from the database scan. They suggested the following guidelines: z > 3, possibly significant; z > 6, probably significant; and z > 10, significant. Other approaches to similarity significance testing have been described by Kanehisa (1984), Lipman et al. (1984), T.F. Smith et al. (1985), and Pearson (1990).
The program Entrez is a very useful tool for exploring the nucleotide and protein databases, as well as the associated literature. It is available on CD-ROM (address above), or a network version is available across the Internet (contact netinfo@ncbi.nlm.nih.gov to register and obtain client software). A World Wide Web version is also available (http://www.ncbi.nlm.nih.gov/). After genes of interest have been identified, Entrez may be used to locate many more similar sequences very rapidly. This speed is possible because Entrez contains precompiled lookup tables of connections among similar sequences from prior BLAST searches. It also has direct connections among nucleotide entries, protein entries, and literature references from the various databases, which are distributed with the program on CD-ROM or are accessible across the Internet. Searches of the databases are possible through Boolean queries of almost any of the information associated with the databases. For instance, searching may be done by taxonomic group, text terms in titles and abstracts, key words, author names, accession numbers, Enzyme Commission numbers, sequence ID numbers, medical subject headings, gene names, chemical substances, or MEDLINE ID numbers.
Producers of nucleotide sequences should make their findings available to the public by de-
Nucleic Acids IV: Sequencing and Cloning 375
positing the sequences in a major database. Many editors of journals are reluctant to permit the original sequences to be published in print, so many journals now require submission of the sequences to GenBank or EMBL as a prerequisite to publication. The alignments of the sequences also should be deposited (this is possible in EMBL and soon will be standardized for GenBank), since they are the basis of any phylogenetic conclusions. Software for sequence submission (Authorin) is distributed on the Entrez CD-ROM discussed above, or can be requested across the Internet from [email protected]~.gov. The sequences may then be submitted to gbsub@ncbi.nlm.nih.gov. A World Wide Web tool for sequence submission, BankIt, is also available (http://www.ncbi.nlm.nih.gov/).

Matrix Plots
As a first step in pairwise alignment, matrix comparisons (Figures 12 and 13) are useful for quick determination of major regions of similarity (not necessarily homology; see Chapter 1) and for visual portrayal of these similarities (C.B. Lawrence, 1990). In the simplest form of this procedure, two sequences are portrayed along the x and y axes of a graph, and every nucleotide in one sequence is compared to every nucleotide in the other sequence. If the nucleotides are the same, then a dot (or some other symbol) is shown in the corresponding row and column of the match; otherwise, the space is left blank. Because there are only four possible nucleotides in DNA sequences, approximately 25% of the comparisons will be matches in random sequences if the four bases are present in equal frequencies. Therefore, usually comparisons are made between several adjacent nucleotides simultaneously, rather than on a base-by-base basis. Usually, some kind of weighting scheme is employed, so that the percent of identical bases within a given window of adjacent sequence can be taken into account by use of different symbols (Figures 12 and 13). These kinds of comparisons often are helpful as a first step in comparing sequences, and are useful especially for identifying insertion/deletion events (indels) that may have occurred between homologs.

Pairwise Alignments
Pairwise alignments seek to align two entire homologous regions, using a balance between matches and gaps. The introduction of gaps is necessary to account for insertion/deletion events (see Figure 14), but because any two sequences could be aligned perfectly if enough gaps were introduced, gaps must be penalized. A dynamic programming algorithm for global alignments was developed by Needleman and Wunsch (1970); many variations on the basic idea have been developed since then (reviewed in R.F. Doolittle, 1990a). The basic idea is to find the least costly path (in terms of substitutions and gaps) through a matrix plot of the two sequences (see above).
Gap penalties can be a combination of the number of gaps and the size of the gaps. In general, the number of gaps should be penalized more heavily than the size of the gaps, because there is no a priori reason to think that insertion/deletion events are more likely to involve short sequences. In protein-coding sequences, it makes sense to penalize gaps that produce frameshifts (i.e., gaps that are not in multiples of three nucleotides) more heavily than those that do not. For sequences of unequal length, costs may also be set for leading and trailing gaps. These penalties are usually lower than the penalties for internal gaps, since the leading and trailing gaps usually have more to do with the length of sequence examined than actual evolutionary changes.
All substitutions may be assigned the same penalty in an alignment, or a matrix of change costs may be specified (Sankoff and Rousseau, 1975; Sankoff and Cedergren, 1983). This latter strategy allows the investigator to assign different costs to transitions and transversions, or to all possible changes between nucleotide pairs.
Visual inspection usually is necessary to ensure that the most reasonable alignment has been generated. Confidence in alignments produced can be improved by comparing alignments with amino acid substitution algorithms and with information on secondary structure of the genes under study (e.g., Kjer et al., 1994). However, changes in alignment should be discussed and justified, and a clear set of alignment criteria
Figure 12 Matrix comparison of a portion of the 28S rRNA genes of a frog (Xenopus laevis, vertical axis; Ware et al., 1983) and a mouse (Mus musculus, horizontal axis; Hassouna et al., 1984). The deflections along the diagonal represent insertion/deletion events (see Hillis and Davis, 1987). The letters represent percent similarity over blocks of 30 bp; A: 100%, B: 98-99%, C: 96-97%, etc. All matches of 65% or higher similarity are shown. Note the regions of similarity between GC-rich regions at positions 500-1000 and 2500-3200.
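
The windowed comparison behind a matrix plot can be sketched in Python as follows. This is a minimal illustration, not the program used to draw Figures 12 and 13: the function name `dot_matrix` and the parameters `window` and `min_identity` are assumptions introduced here, and the sketch simply returns the window start positions that would receive a symbol.

```python
# Minimal sketch of a filtered dot-matrix comparison.
# Every window of seq_x is compared with every window of seq_y, and a
# hit is recorded when the fraction of identical bases reaches the
# chosen threshold (1.0 means identical windows only).
def dot_matrix(seq_x, seq_y, window=1, min_identity=1.0):
    """Return (i, j) pairs of 0-based window starts that pass the filter."""
    hits = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_x[i:i + window], seq_y[j:j + window])
            )
            if matches / window >= min_identity:
                hits.append((i, j))
    return hits
```

With `window=30` and `min_identity=0.65` the filter corresponds to the threshold described for Figure 12; raising `min_identity` to 0.85 corresponds to the stricter filter of Figure 13. The quadratic loop is adequate for short sequences but would be slow for whole genes.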
should be identified. For more information on pairwise alignment, see Waterman et al. (1991).

Multiple Alignments
For most phylogenetic studies, sequences must be aligned among multiple taxa, individuals, or genes. In principle, the method of Needleman and Wunsch (1970) could be extended to multiple dimensions, but this approach would be computationally impractical. Many of the recent advances in alignment have been concerned with ways of solving the problem of multiple alignment (e.g., Feng and Doolittle, 1987, 1990; Hein, 1989a,b; S.C. Chan et al., 1992; Higgins et al., 1992; Wheeler and Gladstein, 1992, 1994).
One approach to multiple alignment is to make pairwise alignments, and then add the sequences together by inserting additional gaps as needed. However, the final alignment will be order-dependent, meaning that different alignments will be achieved depending on the order of the pairwise alignments. Feng and Doolittle (1987, 1990) proposed to obtain the order of the pairwise alignments from clusters in an initial tree produced from a matrix of distances across all pairwise alignments. This strategy is implemented in
the program Clustal (Higgins and Sharp, 1988, 1989; Higgins et al., 1992). Clustal is available by anonymous ftp from ftp.bio.indiana.edu (in the directory molbio/align). Hein (1989a, 1990b) suggested that the tree derived from the first set of alignments should be used to repeat the process, and this cycle can be continued until a stable alignment has been reached. Hein's program TreeAlign may be obtained from the same source.
An alternative strategy recognizes that alignment of sequences and phylogenetic analysis are two sides of the same coin. Sankoff et al. (1973) suggested that alignment of sequences should be part of phylogeny inference, rather than prior to it. Wheeler and Gladstein (1992, 1994) implemented this strategy in the program MALIGN, which optimizes a multiple alignment by searching for the alignment that globally minimizes differences among the sequences (as specified by defined gap penalties and change costs; see above). This very useful and versatile program implements matrix change costs and versatile gap weighting, allows user-specified or phylogenetic tree-based alignment order, permits user decisions about the complexity of the multiple alignment optimization search, and outputs aligned sequences in formats that are compatible with all of the major phylogenetic analysis computer packages. Precompiled versions of the program are available from Ward Wheeler (Department of Invertebrates, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024-5192, USA), or the source code (for standard or parallel versions of the program) can be obtained by anonymous ftp across the Internet (ftp.amnh.org).

Figure 13 Another matrix comparison of the two sequences compared in Figure 12, but filtered so that only matches of 85% or higher are shown.
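
The least-cost path idea that underlies both pairwise alignment and the pairwise steps of the multiple-alignment strategies above can be sketched with a small dynamic programming routine in the spirit of Needleman and Wunsch (1970). This is a simplified illustration, not the code of Clustal, TreeAlign, or MALIGN: the function name `global_align` and the default costs are assumptions, every gap position is charged the same amount (so the number of gaps is not penalized separately from their size), and leading and trailing gaps cost as much as internal ones.

```python
# Minimal sketch of global alignment by dynamic programming.
# cost[i][j] holds the cheapest way to align a[:i] with b[:j].
def global_align(a, b, mismatch=1, gap=2):
    """Return (total_cost, aligned_a, aligned_b) for one least-cost path."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            cost[i][j] = min(sub, cost[i - 1][j] + gap, cost[i][j - 1] + gap)

    # Trace one optimal path back from the lower right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
            0 if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Several paths may share the minimum cost, and the traceback reports only one of them, which is one reason visual inspection of the result remains necessary. A matrix of change costs (for example, different costs for transitions and transversions) could replace the single `mismatch` value.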
(A)
              22 100 22 120 22 140
Mus           GTCAGCCAGGACTCTCTACCCGCTCACGGCAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Rattus        GTCAGCCAGGACTCTCTACCCGCTCACGGCAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Homo          GTCAGCCAGGACTCTCTACCCGCTCGCGGCAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Rhineura      GTCAGCCAGGATTCTCTATCCGCTCGCGGCAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Cacatua       GTCAGCCAGGATTCGCTATCCGCTCGCGGCAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Xenopus       GTCAGCCAGGATTCTCTACCCGCTCGCGGCAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Rhyacotriton  GTCAGCCAGGATTCTCTATCCGCTCGCGGCAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Typhlonectes  GTCAGCCAGGATTCTCTATCCGCTCGCGGCAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Latimeria     GTCAGCCAGGATTCTCTACCCGCTTGCGGCAAGGCTTCCCTGCCCGCTACCGGAGGCAGC
Cyprinella    GTCAGTCCAGGATTCCTACCCGCTGGCGGTCAAGCCTTCCCTCCGGCTACCGGAGGCAGC
* * * * *
* ( I ** ** ** * * * * *
(B)
              22 100 22 120 22 140
Mus           GTCAG-CCAGGACTCTCTACCCGCTCACGG-CAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Rattus        GTCAG-CCAGGACTCTCTACCCGCTCACGG-CAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Homo          GTCAG-CCAGGACTCTCTACCCGCTCGCGG-CAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Rhineura      GTCAG-CCAGGATTCTCTATCCGCTCGCGG-CAAGGCTTCCCTGCCCGCTACCGGAGGCAAC
Cacatua       GTCAG-CCAGGATTCGCTATCCGCTCGCGG-CAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Xenopus       GTCAG-CCAGGATTCTCTACCCGCTCGCGG-CAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Rhyacotriton  GTCAG-CCAGGATTCTCTATCCGCTCGCGG-CAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Typhlonectes  GTCAG-CCAGGATTCTCTATCCGCTCGCGG-CAAGCCTTCCCTGCCCGCTACCGGAGGCAAC
Latimeria     GTCAG-CCAGGATTCTCTACCCGCTTGCGG-CAAGGCTTCCCTGCCCGCTACCGGAGGCAGC
Cyprinella    GTCAGTCCAGGATTC-CTACCCGCTGGCGGTCAAGCCTTCCCT-CCGGCTACCGGAGGCAGC
* * * * *
Figure 14 Alignment of a portion of the 28S rRNA genes of various species of vertebrates sequenced by Hadjiolov et al. (1984), Gonzalez et al. (1985), Hassouna et al. (1984), Hillis and Dixon (1989), Larson and Wilson (1989), and Ware et al. (1983). The numbers refer to the nucleotide positions of the Mus 28S rRNA gene; sites that are variable among species are marked with an asterisk. Insertions are indicated by dashes. (A) Alignment with no insertions added. Note that this alignment requires 20 variable sites. (B) Alignment with four insertion/deletion events. There are only 10 variable sites in this alignment, including the insertions.
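
Counting variable sites, as in the comparison of panels (A) and (B), is straightforward once the sequences are aligned. The sketch below is an illustration only: the function name `variable_sites` is hypothetical, a gap character is treated as one more character state (matching the caption's count "including the insertions"), and the example sequences are invented rather than taken from the figure.

```python
# Minimal sketch: find alignment columns that vary among sequences.
# A dash is treated as an ordinary state, so a column in which only
# some sequences carry a gap counts as variable.
def variable_sites(aligned):
    """Return 0-based column indices at which the sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [
        col
        for col in range(length)
        if len({seq[col] for seq in aligned}) > 1
    ]


# Invented example: one substitution column and one indel column.
example = ["GTCAG-CCAGGAC", "GTCAG-CCAGGAT", "GTCAGTCCAGGAT"]
print(len(variable_sites(example)), "variable sites")
```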
APPENDIX: STOCK SOLUTIONS

Working Gel Mix
6% Acrylamide
150 ml 40% acrylamide stock
100 ml 10x TBE
420 g urea
ddH2O to 1 L
Filter before using.

20% Acrylamide Stock Solution
96.5 g acrylamide
3.35 g bis-acrylamide
233.5 g urea (ultrapure)
50 ml 10x TBE
ddH2O (150 ml start) to 500 ml
Filter before storage; store in a brown bottle.

335 g ammonium acetate
ddH2O to 1 L

40% Acrylamide Stock Solution
280 g acrylamide
20 g bis-acrylamide
ddH2O to 1 L
Store in a brown bottle.

ATE (Acetate-Tris-EDTA)
100 mM sodium acetate, pH 5.2
10 mM Tris-HCl, pH 7.4
1 mM EDTA

Borate Buffer
0.2 M sodium borate, pH 9.0
30 mM [ethylenebis(oxyethylenenitrilo)]tetraacetic acid
5 mM dithiothreitol
1% SDS
ddH2O to 1 L

A solution of chloroform and isoamyl alcohol, in the ratio 24:1.

0.5 M EDTA (Ethylenediaminetetraacetic Acid), pH 8.0
268.1 g disodium EDTA
ddH2O to 1 L
Dissolve and pH with sodium hydroxide. Autoclave.

2x CTAB Extraction Buffer
10 g CTAB [hexadecyltrimethylammonium bromide]
140 ml 5 M sodium chloride
25 ml 2 M Tris-HCl, pH 8.0
20 ml 0.5 M EDTA

0.05 M glucose
0.025 M Tris-HCl, pH 8.0
0.01 M EDTA

10x Cycle Sequencing Buffer
300 mM Tris-HCl, pH 9.0
50 mM magnesium chloride
300 mM potassium chloride
0.25% (w/v) NP40
0.25% (w/v) Tween 20

60 ml 5 M potassium acetate
11.5 ml glacial acetic acid
28.5 ml ddH2O

Denaturing Solution
1.5 M sodium chloride
0.5 M sodium hydroxide

10 g tryptone
10 g sodium chloride
5 g yeast extract
ddH2O to 1 L
Adjust pH to 7.2 with sodium hydroxide, then autoclave. For L-broth plates, add 15 g agar/L before autoclaving. For L-broth + MgSO4 + maltose plates, add sterile MgSO4 to 10 mM final concentration and sterile-filtered maltose to 0.2% after autoclaving. Sterile-filtered antibiotics, IPTG, and X-Gal should also be added after autoclaving, after agar has cooled down to below 50°C. For L-broth top agarose, add 7 g/L agarose to L-broth before autoclaving.

Add 0.1% diethylpyrocarbonate to water, wait 12 hr, and autoclave. Used to inhibit RNase.

Dialysis Tubing
To prepare 2 L of solution to boil tubing:
40 g Na2CO3
4 ml 0.5 M disodium EDTA
ddH2O to 2 L
Cut tubing to desired size. Place tubing in a large beaker with a stirring bar. Submerge and keep submerged by placing a plastic container partially filled with water on top of surface. Place on a hot plate and bring to a boil. Boil tubing for 20 min. If tubing is very dirty, boil again in 2 L of 1 mM EDTA. Remove tubing with blunt instrument. Store in 1 mM EDTA in the refrigerator. Rinse with distilled water before use.
10x Ligation Buffer
500 mM Tris-HCl, pH 8.0
70 mM magnesium chloride
10 mM dithiothreitol

SDS (Sodium Dodecyl Sulfate)
Use as a 20% solution. Do not refrigerate or autoclave.

NaI Binding Buffer
91.0 g sodium iodide
1.5 g sodium sulfide
Add water to 100 ml, filter through Whatman #1 filter paper. Add 0.5 g sodium sulfite. Store at 4°C in a light-proof bottle.

20x SSC (Sodium Chloride-Sodium Citrate)
3 M sodium chloride
0.3 M trisodium citrate
Adjust pH to 7.0 with hydrochloric acid.

STE (Sodium Chloride-Tris-EDTA)
0.1 M sodium chloride
0.05 M Tris-HCl, pH 7.5
0.001 M EDTA

Neutralizing Solution
1.5 M sodium chloride
0.5 M Tris-HCl, pH 8.0
Autoclave.

This is a solution of phenol, chloroform, and isoamyl alcohol, in a ratio of 25:24:1. A layer of water will form on the surface; the PCI is the lower layer.

0.1 M sodium chloride
10 mM Tris-HCl, pH 8.0
1 mM EDTA
5% Triton X-100

0.01 M Tris-HCl, pH 7.4
0.1 M sodium chloride
0.01 M magnesium chloride

95% formamide
20 mM EDTA
0.05% bromophenol blue
0.05% xylene cyanol FF

PEG (Polyethylene Glycol)
Use as a 20% solution.

242 g Tris base
57.1 ml glacial acetic acid
100 ml 0.5 M EDTA, pH 8.0

400 mM Tris-HCl, pH 8.3
150 mM magnesium chloride
150 mM potassium chloride
40 mM dithiothreitol

10x Taq Polymerase Buffer
125 µl 4 M potassium chloride
100 µl 1 M Tris-HCl, pH 8.4
25 µl 1 M magnesium chloride
0.001 g gelatin
750 µl DEP-H2O

10x TBE (Tris-Borate-EDTA)
108.0 g Tris base
55.0 g boric acid
9.5 g EDTA, disodium salt
750 ml ddH2O
Adjust pH to 8.3 with sodium hydroxide or acetic acid. Filter, adjust final volume to 1 L.

0.01 M Tris-HCl, pH 7.5
0.001 M EDTA
Autoclave.

Termination Mixes for Cycle Sequencing
50 µM each of dATP, dCTP, dGTP (or 7-deaza-dGTP), and dTTP
Add one of the following to each of four tubes:
G: 0.2 mM ddGTP
A: 2 mM ddATP
T: 2 mM ddTTP
C: 1 mM ddCTP

Tris-Ethanol Wash Buffer
50 ml 95% ethanol
50 ml buffer:
20 mM Tris, pH 7.5
1 mM EDTA
100 mM sodium chloride
Store at -20°C.

Urea Mix
233.5 g urea (ultrapure)
50.0 ml 10x TBE
ddH2O to 500 ml
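
Several of the recipes above are stated as final molar concentrations or are used as dilutions of a concentrated stock (for example, 2x SSC prepared from the 20x stock). The arithmetic can be sketched as below. The function names are hypothetical, and the molecular weight of sodium chloride (58.44 g/mol) is a standard value supplied here as an assumption rather than taken from the text.

```python
# Minimal sketch of two routine bench calculations.
def grams_needed(molarity, mol_weight, liters):
    """Mass of solute (g) = mol/L x g/mol x L."""
    return molarity * mol_weight * liters


def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock to dilute, from C1 x V1 = C2 x V2.

    Concentrations may be in any unit (M, mM, or "x") as long as
    both use the same one.
    """
    return final_conc * final_volume_ml / stock_conc


# 1 L of 1.5 M sodium chloride, assuming 58.44 g/mol for NaCl
nacl_grams = grams_needed(1.5, 58.44, 1.0)

# 1 L of 2x SSC from a 20x SSC stock
ssc_stock_ml = stock_volume_ml(20, 2, 1000)
```

The remaining volume is made up with water, and any pH adjustment called for in a recipe is still carried out on the final solution.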
Par
Intraspecific Differentiation
Bruce S. Weir
BIOLOGICAL CONTEXT
This chapter considers genetic variation within species. The general goals of pop-
ulation genetic studies are to account for and characterize the extent of genetic
variation within species. Variation provides the raw material for future evolu-
tionary change, and different levels of variation in different populations may pro-
vide evidence for different evolutionary events in the past. Much of the chapter is
directed towards estimation and interpretation of F statistics. For human popu-
lations, these measures are to be used in the proposed human gene diversity pro-
ject (Cavalli-Sforza et al., 1991), and they already have been used to adjust allele
frequencies in forensic calculations (Nichols and Balding, 1991). There has been
confusion in the literature over the use of these statistics to estimate migration
rates (Slatkin and Barton, 1989) and over the use of fixed or random statistical
models (Chakraborty and Danker-Hopfe, 1991).
The characterization of variation rests on phenotypic observations. It will be
assumed here that there is a direct relation between phenotype and genotype,
meaning that only discrete data are being considered. Further, it will be assumed
that the crosses necessary to demonstrate that a band on an electrophoretic gel
(for example) does indeed correspond to an allelic form of a single gene have
been carried out. Different genetic entities, whether allozymes, restriction site pat-
terns, repeat copy numbers, or nucleotide subsequences, will be taken to be sub-
stantially independent. Although means for accommodating associations be-
tween Mendelizing units are available (e.g., Weir and Cockerham, 1989a), they
are beyond the scope of this chapter.
The first analyses of genetic data are those that rest simply on the genotypic
state. Avise (1994) has been able to make statements about the phylogeny of
386 Chapter 10 / Weir
species on the basis of mitochondrial genotypes observed in different populations. In this field of "phylogeography," individuals are genotyped and assigned to maternal lineages, and the resulting phylogeny is related to patterns of geographic distribution. In a study of pocket gophers, Geomys pinetis, Avise et al. (1979) found that most mitochondrial haplotypes in their sample were localized geographically, and that related genotypes tended to be geographically contiguous or overlapping.
At the next level, counts of genotypes lead to simple measures of variation such as the number of alleles per locus or the allelic frequencies. The numbers of alleles may be sufficient for the purpose of establishing that there is variation, but it is difficult to use them to compare levels of variation between different populations. For such purposes, allelic frequencies are better suited and appropriate statistics will be discussed. More complex functions of frequencies, such as gene diversity (e.g., Weir, 1989), defined as one minus the sum of squared allelic frequencies, can be used to address the mechanisms for the maintenance of variation.
The main theme of this chapter is that statistical analyses must be based on biological models. The classic model is that of an ideal population, infinite in size and mating at random for a locus at which there are no disturbing forces such as mutation, migration, or selection. Such a model leads to predictions of equality, for example, the relationship between gene diversity and heterozygosity (defined as the frequency of heterozygotes). A significant difference between two quantities may indicate the violation of one or more assumptions of the model.
One of the first steps to introduce realism into the classic model is to suppose that the population is finite, although still mating at random. This model allows quantities such as the change in frequency of heterozygotes to be related to population size and may suggest a means for estimating population size (e.g., Laurie-Ahlberg and Weir, 1979). Unfortunately, the large sampling variance of heterozygosity makes such attempts of limited use in samples that are measured only in the hundreds. Statistics constructed from the joint frequencies of pairs of loci have even more severe problems. Linkage disequilibrium refers to the departure of such joint frequencies from the products of single frequencies, and theory relates the expected value of squared linkage disequilibrium to population size and recombination rates between the loci. Although it would be of great advantage to be able to estimate either of these quantities from frequency data, once again the large sampling variances of linkage disequilibria make this unlikely (Hill and Weir, 1994), unless ancestral combinations of alleles can be inferred (Kaplan et al., 1995). It is the variance caused by the stochastic nature of the evolutionary forces acting on the population, rather than that caused by sampling of individuals for observation, that causes difficulties. The variance introduced by the evolutionary forces cannot be reduced simply by taking larger sample sizes. Another problem with basing inferences on linkage disequilibrium is that most applications have assumed relations that hold only in populations in equilibrium for the joint effects of drift and recombination. Most natural populations are unlikely to be in such equilibrium.
Further refinements of the classic model allow for specified forces of selection, migration, or mutation. Each of these leads to changes in allelic frequencies over time. When data are available from several generations, it may be possible to make inferences about the evolutionary events acting within the species. Good discussions have appeared previously in the literature. Prout (1965) detailed the difficulties in estimating the strengths of selection acting at different stages of the life cycle, while Christiansen and Frydenberg (1973) showed the power of having data collected for mother-offspring pairs. Estimating migration or mutation rates in a population generally proceeds under the assumption that the population is at equilibrium (e.g., Chakraborty and Leimar, 1987), so that the various evolutionary forces are no longer changing the quantities being measured. Although theory relates allelic frequencies and functions of allelic frequencies to mutation or migration rates, care must be taken in the analyses. Estimation of migration rates among local populations, for example, should use models that rec-
Intraspecific Differentiation! 387
ognize that migration prevents allele frequcncies pling scheme of regions and populations within
in different populations from being independent. regions for the same species, These authors were
Eastel (1986) was able to estimate migration rates all concerned either with aspects of genetic het-
without an assumption of equilibrium. He took erogeneity among populations within a single
advantage of the known history of different pop- species or with variation in mating structure.
ulations of giant toads, Bufo marinus. Compar- The growing availability of rnalecular infor-
isons among allelic frequencies between intro- mation has allowed more detailed studies of
gressed populations and the original populations within-species variation. For example, measures
allowed rates of admixture to be estin~ated. of population structure for human VNTR loci
This chapter is concerned with survey data have been estimated by Weir (1994). Gaur and
from one or several natural populations. It is be- Clegg (1993a,b) studied nucleotide variation at the
coming increasingly easy to collect molecular data Adh-1 locus in the genus Zen (maize and teosinte)
on many loci for many individuals in many pop- and in pearl millet, Pennisteum glaucum. They
ulations. Temporal information is not generally were interested in the evolution of that locus, and
available, so direct evidence for selection, for ex- estimated divergence times between maize alleles
ample, is not obtained. What the data do allow, to be as long as 2 million years. Schaeffer and
however, is a characterization of the relationships Miller (1991) looked at sequence variation at the
between genes at various levels in a hierarchy. Adh locus in Drosophila pscudoobscura and used
The degrees of relatedness of genes within that information to infer the time of divergence
individuals, between individuals within subpop- between populations in Colombia and California.
ulations, between subpopulations within popula- Statistical procedures have been developed to al-
tions, and so on, can all be estimated. Com- low sequence data to provide information about
parisons between estimates allow inferences to be factors such as geographic subdivision (Hudson
made about the forces acting within a species. If et al., 1992a; Slatkin and Maddison, 1990) and se-
genes within individuals appear to be related in a lection (McDonald and Kreitman, 1991).
population, there may be departures from ran-
dom mating. If genes within individuals appear Genetic and Statistical Sampling
related to a different extent in different popula-
tions, the possibility of different mating systems Unless a population is absolutely uniform for klie
or population sizes in those populations needs to loci being studied, different samples from the
be investigated. If different loci show different de- population will show different levels of gcnctlc
grees of relatedness for genes within individuals variation. This is simply a consequence of the
within the same population, the possibility of se- "statistical sampling" that results in each sample
lective forces acting on those loci needs to be con- having a different set of ~ndividuals.Statistical
sidered. Note that although the degrees of rela- sampling variation can be accominodated in
tionship may be expressed in terms of measures analyses set up to allow statements to be made
of inbreeding, the most general interpretation re- about the population, based on the sample a t
gards them as correlations (Cockerham, 19731, as hand. Consider, for example, estimating the mean,
detailed below. p, for some variable. If obscrvations of tlne van-
Estimation of levels of relatedness of genes in able are denoted by X, and the sample mean oi 17
a hierarclucal sampling scheme has been consid- values by X,then ,U is estimated by X .Variation
ered by many authors for a wide variety of between samples is anticipated by assigning a
species. Schoen (1982) looked at populations of variance of 02/nto this estimate, where o2 is the
the annual plant Gilia achilleifolia, whereas Guries variance of the original variable. As the sample
and Ledig (1982) considered the longer living tree size gets larger, the population is better reprc-
Pinus rigida. Foltz and Hoogland (1983) looked at sented by the sample and different samples be-
populations of the prairie dog Cynomys ludovi- come more similar (i.e., the among-sample vari-
cianus, and Chesser (1983) looked at a nested Sam- ance of the sample mean decreases).
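
The $\sigma^2/n$ behavior of the sample mean can be checked numerically. The following is a minimal Python sketch, not taken from the chapter: it assumes normally distributed observations, and the function name `variance_of_sample_means` and its parameters are illustrative choices only.

```python
import random
import statistics


def variance_of_sample_means(population_sd, n, replicates=4000, seed=1):
    """Empirical among-sample variance of the mean of n normal observations.

    Assumption for illustration: observations are drawn from a normal
    distribution with mean 0 and standard deviation population_sd.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, population_sd) for _ in range(n))
        for _ in range(replicates)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma = 2.0
    for n in (5, 20, 80):
        observed = variance_of_sample_means(sigma, n)
        # Columns: sample size, simulated variance of the mean, sigma^2 / n
        print(n, round(observed, 4), round(sigma ** 2 / n, 4))
```

The simulated variances should track $\sigma^2/n$ closely and shrink as $n$ grows, which is the statistical sampling effect only; genetic sampling is not modeled here.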
In population genetics there is another level of sampling to be considered. Each generation of a
population is formed by the union of gametes chosen from among those produced by the previous
generation. This "genetic sampling" process would cause the population to look different if the
formation of a new generation was replicated. Genes can also be altered by mutation, and pairs of
genotypes make differential contributions to succeeding generations because of selection. These
forces are stochastic because they include a random element: the particular genes affected cannot
be specified in advance. Population genetic theory depends on the concept of replicate populations
which are maintained under the same conditions, but which will differ because of genetic sampling.
It is possible to derive variances for statistics of interest that include both types of variation.

One use for such total variances is in predicting future values. For a particular population, it is
possible to predict the expected value of a statistic, such as allelic frequency, heterozygosity,
or linkage disequilibrium, in a future generation but not to specify the actual value of the
statistic. For a neutral gene (i.e., a gene unaffected by selection) in a finite population, for
example, it is known that the allelic frequencies have constant expected values over time, although
in any particular population the frequency may have drifted to any value between 0 and 1.
Statements about the statistic in some future sample therefore must take into account the variation
between replicate populations, as well as that between replicate samples from any one population.

A difficulty arises in that the magnitude of between-population variation cannot be estimated with
a sample from a single population. One way around this problem is sometimes afforded by the
availability of several unlinked loci in the data set. Although genes at different loci are never
completely independent (since they are carried on the same gametes between generations), they may
have frequencies that are nearly independent and therefore are functionally equivalent to separate
populations. Genes at these loci have each been exposed to the same genetic forces between
generations, but can have different pedigrees in the same way as do genes in replicate populations.

Fixed and Random Models

The previous discussion also can be phrased in terms of fixed and random effects, to show how the
intended scope of inference affects the sampling properties of genetic statistics. If there is
interest only in the particular population sampled, then prior genetic sampling is not of
consequence. It is necessary only to take account of the statistical sampling for repeated samples
from this one "fixed" population. Future samples would be taken from the same population.
Comparisons between different fixed populations can be phrased in terms of means, and it will be
shown that numerical procedures of permutation and resampling are of use in comparing populations.

A different situation arises when the sample is to be used to make inferences about the species as
a whole. In this case there is less interest in the particular population sampled, which can now be
regarded as being "random." Future samples may very well be drawn from a different population, so
both statistical and genetic sampling variation need to be considered. The distinction between
fixed and random effects arises in statistics. In the analysis of variance context, it is easier to
detect differences between means in a fixed-effects situation because a smaller variance is used in
the denominator for the test statistic. It is only one specific set of means (fixed effects) that
are being compared, and not some population of means (random effects) for which the means at hand
are just a sample.

The distinction between fixed and random models in the genetic context has been made previously by
Cockerham and Weir (1986). They stress that the random model considers each population to be a
replicate sample of the evolutionary process. Chakraborty and Danker-Hopfe (1991) restrict
attention to a fixed model, and point out that "in such formulations no assumption is needed
regarding the evolutionary mechanism that determines the process of genetic differentiation within
and between subpopulations." This seems unlikely to be useful for evolutionary studies.

As an example of the differences in ranges of inference that follow from the choice of fixed or
random models, consider the study of M.C. Baker et al. (1982). These authors sampled sparrows,
Zonotrichia leucophrys nuttalli, from areas known to have different song dialects. They were
seeking genetic evidence for a lack of mating between different dialects. A fixed-model analysis
would have restricted them to making statements about their particular set of four dialects in
California. By adopting a random-model analysis, however, they were able to make statements about
dialect groups for the species as a whole.

STATISTICAL METHODS

Most population genetic data sets are based on the genotypes of diploid individuals. Exceptions
include cases where isogenic lines of Drosophila are used so that the data are essentially on
gametes rather than on genotypes. Haploid data also are obtained for non-nuclear genes (i.e.,
mitochondrial and chloroplast genes). In human studies, with data collected for family pedigrees,
it is often possible to infer phase and so obtain haplotypic data. Even here, though, a proper
analysis should take account of the fact that the basic sampling unit is the individual and it is
the genotype that is recorded or inferred from a phenotype.

One-locus genotypic frequencies will be written as $P$'s with subscripts indicating the alleles, as
in $P_{AA}$ for $AA$ or $P_{ij}$ for $A_iA_j$. Allelic frequencies will be written with lower-case
$p$'s. For allele $A_i$, then,

$$p_i = P_{ii} + \frac{1}{2}\sum_{j \ne i} P_{ij}$$

The sum over heterozygous classes, indicated by $\sum_{j \ne i}$, involves every heterozygote
$A_iA_j$ with allele $A_i$, and by convention the sum includes each $A_iA_j$ just once.

Sample genotypic values will be distinguished from the population values they estimate by tildes,
$\tilde{P}$. The population value refers to the particular population in the fixed model, or to all
replicate populations in the random case. Expected values will be indicated by the symbol
$\mathcal{E}$.

Fixed Populations

Allelic Frequencies

With data collected for genotypes, the first descriptors of a population are simply the genotypic
frequencies. When the population is sampled in such a way that every member of the population has
an equal chance of being sampled, and individuals are sampled independently, the genotypic counts
are multinomially distributed. This distribution gives the expected frequencies with which the
counts take their possible values in repeated samples from the same population. As a consequence,
the count for any particular genotype has a binomial distribution. Defining counts as $n_{ij}$ for
genotypes $A_iA_j$, and the sample size as $n$, the binomial property is expressed as

$$n_{ij} \sim B(n, P_{ij})$$

The average or expected counts, $\mathcal{E}(n_{ij})$, over all samples from a population in which
genotypic frequencies are $P_{ij}$, are therefore

$$\mathcal{E}(n_{ij}) = nP_{ij}$$

and the variances are

$$\mathrm{Var}(n_{ij}) = nP_{ij}(1 - P_{ij})$$

Sample genotypic frequencies are obtained by dividing by sample size $n$ (i.e.,
$\tilde{P}_{ij} = n_{ij}/n$), so that

$$\mathcal{E}(\tilde{P}_{ij}) = P_{ij}, \qquad \mathrm{Var}(\tilde{P}_{ij}) = \frac{1}{n}P_{ij}(1 - P_{ij})$$

The sample variances (obtained by using observed genotypic frequencies in this last equation) or
their square roots (i.e., the standard deviations) can be presented along with the sample
frequencies.

If the locus has several alleles, the number of genotypic classes is large and the data may be
summarized better with allelic frequencies. For codominant alleles such as those found for allozyme
markers or VNTR loci, allelic numbers can be found directly from the genotypic numbers:

$$n_i = 2n_{ii} + \sum_{j \ne i} n_{ij}$$

and allelic frequencies found by dividing by the number of genes, $2n$:

$$\tilde{p}_i = \frac{n_i}{2n} = \tilde{P}_{ii} + \frac{1}{2}\sum_{j \ne i}\tilde{P}_{ij}$$

Taking averages over all samples from the population does give

$$\mathcal{E}(\tilde{p}_i) = p_i$$

but the variances depend on genotypic frequencies as well as allelic frequencies:

$$\mathrm{Var}(\tilde{p}_i) = \frac{1}{2n}\left(p_i + P_{ii} - 2p_i^2\right)$$

as noted by Kempthorne (1957). Allelic frequencies do not have binomial distributions. It was
because they ignored this fact that Chakraborty and Danker-Hopfe (1991) had an error in their
Equation 3.4, and subsequent equations, that makes their analysis approximate.

When Hardy-Weinberg equilibrium (HWE) holds, so that $P_{ii} = p_i^2$, $P_{ij} = 2p_ip_j$, the
variance of an allelic frequency reduces to the form for the binomial distribution

$$\mathrm{Var}(\tilde{p}_i) = \frac{1}{2n}p_i(1 - p_i)$$

In the HWE case, allelic counts are themselves binomially distributed, and the population can be
completely characterized by allelic frequencies. Otherwise, allelic frequencies should be presented
along with standard deviations calculated according to the genotypic frequencies. Evidently, then,
a test for HWE should be one of the first steps in studying intraspecific differentiation.

Note that HWE testing is being suggested here to see if populations can be characterized by allelic
instead of genotypic frequencies. Although the finding of HWE frequencies therefore simplifies
further analyses, there are also biological ramifications. A demonstration that a population does
not have HWE genotypic frequencies means that at least one of the assumptions that leads to such
frequencies does not hold for that population. The population may not be large and mating at
random, or it may have substructure. There may be forces such as selection or migration acting.
Non-compliance with HWE does not, by itself, indicate the reason for non-compliance. Unfortunately,
compliance with HWE does not mean that all of the assumptions have been met, since it is not
necessary that the assumptions hold for the population to have HWE frequencies. Lewontin and
Cockerham (1959) showed that certain patterns of selection can lead to HWE frequencies, and C.C. Li
(1988) presented a similar argument for nonrandom mating.

Testing for Hardy-Weinberg Equilibrium

A recent review of the many procedures for testing for Hardy-Weinberg genotypic proportions in a
population was given by Maiste (1993), prompted by concerns in the forensic community about HWE
tests for loci with many alleles. The VNTR loci used for human identification have many alleles,
and as many as half of the possible genotypes at a locus may not be seen in moderate sized samples.
If the HWE relation for genotypic frequencies can be demonstrated, then the frequencies of every
genotype can be estimated as the appropriate product of allele frequencies (Weir, 1992a).

Maiste was able to show the satisfactory behavior of the Pearson goodness-of-fit $X^2$ statistic,
even when some of the expected genotypic counts were small. Similar conclusions were reached by
Lewontin and Felsenstein (1965). With $v$ alleles at a locus, the quantity

$$X^2 = \sum_{i}\frac{(n_{ii} - n\tilde{p}_i^2)^2}{n\tilde{p}_i^2} + \sum_{i}\sum_{j > i}\frac{(n_{ij} - 2n\tilde{p}_i\tilde{p}_j)^2}{2n\tilde{p}_i\tilde{p}_j}$$

is expected to have a $\chi^2$ distribution with $[v(v - 1)]/2$ degrees of freedom when there is
HWE. It gives a global test that all of the genotypes meet HWE frequencies simultaneously. Maiste
(1993) showed by simulations that the test has good power, meaning that it is likely to detect real
departures from HWE. By contrast, he found
that the likelihood ratio test (Sokal and Rohlf, 1969; Weir, 1992b) did not always have good power.
If there are suspicions in any particular situation that the goodness-of-fit test statistic is
giving spuriously large values because expected values are very small, it is a good idea to list
the observed and expected numbers and look at the contributions from each class to the test
statistic. If rejection of the HWE is due in large part to one genotype with an expected count of
less than one and an observed count of one, for example, it may be appropriate to combine genotype
classes to increase the expected count.

A more powerful testing strategy is Fisher's exact test, used in conjunction with an insight
presented by Guo and Thompson (1992). Under HWE, the probability of obtaining the sample set of
genotypic counts, $\{n_{ij}\}$, is

$$\Pr(\{n_{ij}\}) = \frac{n!}{\prod_{i \le j} n_{ij}!}\prod_i (p_i^2)^{n_{ii}}\prod_{i < j}(2p_ip_j)^{n_{ij}}$$

and the probability of obtaining the sample set of allelic counts, $\{n_i\}$, is

$$\Pr(\{n_i\}) = \frac{(2n)!}{\prod_i n_i!}\prod_i p_i^{n_i}$$

The ratio of these two quantities is the probability of the genotypic array conditional on the
allelic array, and it does not involve the unknown true allele frequencies, $p_i$:

$$\Pr(\{n_{ij}\} \mid \{n_i\}) = \frac{n!\,2^H\prod_i n_i!}{(2n)!\prod_{i \le j} n_{ij}!}$$

The quantity $H$ is the number of heterozygotes in the sample. For large sample sizes and many
alleles, this conditional probability will be very small, but it is necessary to find out if it is
unusually small. In other words, what is needed is the aggregate probability of all the possible
sample genotypic arrays, with the same allele frequencies, that are as probable or less probable
than the sample genotypic array (i.e., that are at least as unusual as the sample). This aggregate
probability, $\alpha$, measures the support from the data for the HWE hypothesis, and is the
significance level with which the hypothesis would be rejected. Applying this conditional test
presents a considerable computing challenge, since the number of genotypic arrays can be
astronomical. Guo and Thompson (1992) applied a randomization procedure to the problem, and this
can be described as follows.

Suppose a deck of $2n$ cards is constructed, with the first $n_1$ cards marked to represent allele
$A_1$, the next $n_2$ marked for allele $A_2$, and so on. The deck is then shuffled (permuted), and
successive pairs of cards taken to represent $n$ genotypes. Under the HWE hypothesis, genes are
distributed independently into genotypes, so the genotypic array found by permutation corresponds
to one of the arrays possible under HWE. It has the same allelic frequencies as the original
sample, and its conditional probability can be calculated with the above equation. The process is
repeated, and if $m$ of $N$ shuffled arrays are as probable or less probable than that of the
original sample, the significance level is estimated as $\hat{\alpha} = m/N$. If it is desired to
be 95% certain of being within 0.01 of the true value of $\alpha$, the binomial theory provides
$N = (196)^2\alpha(1 - \alpha)$, which has an approximate upper bound of 10,000. This test is an
example of a permutation test, treated at length by P. Good (1994).

Differentiation between Populations

Under the fixed-population approach, different populations for the same species are compared simply
by comparing frequencies. When HWE cannot be assumed, this requires the comparison of genotypic
frequencies.

CONTINGENCY TABLES  The most straightforward procedure is to use contingency table $\chi^2$ tests.
With $v$ alleles at a locus, the genotypic counts in each of $r$ samples are arranged in a
$[v(v + 1)]/2 \times r$ contingency table and a $\chi^2$ statistic with
$([v(v + 1)]/2 - 1) \times (r - 1)$ degrees of freedom is calculated. In practice, the method has
problems when some cells have small expected counts, since this can give test statistics that are
spuriously large, and it may be necessary to collapse
the least frequent classes. This problem increases with the number of alleles, but even for two
alleles a sample of size 100 is expected to have only four individuals in the $aa$ class when
allelic frequencies are $p_A = 0.8$, $p_a = 0.2$. Conventional wisdom says that goodness-of-fit
$X^2$ tests should not be performed on classes with expected counts less than five, although this
is probably too conservative. Koehler and Larntz (1980) gave rules relating sample size to numbers
of cells. As mentioned in the HWE testing section, doubts about the effect of small expected counts
are best resolved by looking at the contribution of each class to the test statistic. When HWE is
assumed, it is sufficient to compare the allele arrays in each of the populations. The contingency
table is then only $v \times r$, and problems of small expected counts are less likely.

NUMERICAL RESAMPLING  An alternative to the contingency table approach is provided by numerical
resampling (Efron, 1982). This is a means of making inferences about population allele frequencies
from the sample frequencies. Variances can be estimated and confidence intervals can be
constructed, both referring to repeated sampling from the same populations. Briefly, numerical
resampling mimics the drawing of new samples from the original sample for each population. Two
methods are commonly used: jackknifing and bootstrapping. They were described in a genetic context
by Dodds (1986) but only bootstrapping will be considered here since it provides much more
information about the parameter being estimated than does jackknifing. It leads to an estimate of
the sampling distribution of the estimate.

For bootstrapping, from the original set of $n$ observations a new sample of the same size is
constructed by random sampling with replacement. In other words, each of the original observations
is equally likely to be selected to constitute any one of the members of this new sample. The
bootstrap sample therefore is likely to have some of the original observations represented many
times, and some of them not represented at all. Drawing the sample requires the use of
(pseudo)random numbers (see Weir, 1990 for an example of a random number generator program). The
parameter $\phi$ is estimated from the new sample, and the process is repeated many times, perhaps
1000 or more. In place of the single estimate from the original sample, bootstrapping provides as
many new estimates as desired. While this collection of new estimates could be used to provide an
estimated variance for $\hat{\phi}$, it is more informative to work with the whole distribution of
estimates. For example, a 95% confidence interval for $\phi$ can be constructed as the limits
between which the middle 95% of the bootstrap estimates lie.

Bootstrapping within each of the $r$ samples, therefore, will provide confidence intervals for the
population allele frequencies, without having to invoke the binomial theorem and therefore without
having to assume Hardy-Weinberg equilibrium. Two populations can be judged to have different allele
frequencies if the estimated frequencies have non-overlapping confidence intervals. How wide should
these confidence intervals be in order to have a specified confidence in the statement that the
populations have different frequencies? An approximate answer can be based on normal distribution
theory (Weir, 1992b).

An answer that does not invoke normality can be based on Chebyshev's inequality, which states that
the probability of a random variable being more than $k$ standard deviations from its mean is less
than $1/k^2$. For population I, if the sample frequency $\tilde{p}_{iI}$ for allele $A_i$ has
variance $\sigma^2$, this means that

$$\Pr\left(|\tilde{p}_{iI} - p_{iI}| \ge k\sigma\right) \le \frac{1}{k^2}$$

with a similar equation for the sample frequency $\tilde{p}_{iII}$ in population II. Under the
hypothesis that the two populations have the same allele frequencies, $p_{iI} = p_{iII}$, and the
same variances of sample frequencies, Chebyshev's inequality is

$$\Pr\left(|\tilde{p}_{iI} - \tilde{p}_{iII}| \ge K\sqrt{2}\,\sigma\right) \le \frac{1}{K^2}$$

For this last probability to be 0.05, corresponding to a 95% confidence interval on the difference
of frequencies, it is necessary that $K^2 = 20$. However, if this corresponds to non-overlapping
confidence intervals for the two separate samples, $2k\sigma = K\sqrt{2}\,\sigma$, so that
$k^2 = 10$ and these are 90% confidence intervals.

PERMUTATION TESTS  A distribution-free approach can be based on permutation. Extending the previous
analogy of a deck of cards to two populations, the deck now has a card for each allele in the two
populations. The hypothesis of equality of frequencies in the two populations is equivalent to
alleles being distributed independently into two samples. After the permutation is completed, the
first $2n_I$ cards are taken to represent sample 1, and the remaining $2n_{II}$ cards to represent
sample 2, where $n_I$ and $n_{II}$ were the original sample sizes. The proportion of permutations
in which the difference in allele $A_i$ counts between the two samples is as great or greater than
the original observed value provides the significance level for rejecting the hypothesis of
equality. This test is local to allele $A_i$. If HWE is assumed, a global test over all alleles
would be based on the conditional probability of the two arrays:

$$\Pr(\{n_{iI}\}, \{n_{iII}\} \mid \{n_i\}) = \frac{(2n_I)!\,(2n_{II})!\prod_i n_i!}{(2n_I + 2n_{II})!\prod_i n_{iI}!\,n_{iII}!}$$

where $n_{iI}$ and $n_{iII}$ are the counts of allele $A_i$ in the two samples and
$n_i = n_{iI} + n_{iII}$.

Permutation tests for comparing populations were discussed by Roff and Bentzen (1989), although
Manly (1991) gave more efficient randomization techniques.

F STATISTICS  It is often the case that a single statistic, written as $F_{ST}$, is calculated to
compare populations. In the words of Bowcock and Cavalli-Sforza (1991), "The variation in gene
frequencies between populations can be measured by calculating their $F_{ST}$. This corresponds to
the variance among gene frequencies, standardized by the mean gene frequency among all
populations." For a set of $r$ populations with sample allele frequencies, $\tilde{p}_i$ (where
$i = 1, 2, \ldots, r$) for some allele $A$, these authors would define the statistic as

$$F_{ST} = \frac{\sum_i (\tilde{p}_i - \bar{p})^2/(r - 1)}{\bar{p}(1 - \bar{p})}$$

where $\bar{p} = \sum_i \tilde{p}_i/r$ is the average sample frequency of the allele over the
samples. If the samples were of unequal sizes, $n_i$, weighted means and variances should be used
and then

$$F_{ST} = \frac{\sum_i n_i(\tilde{p}_i - \bar{p})^2/[(r - 1)\bar{n}]}{\bar{p}(1 - \bar{p})}$$

with

$$\bar{p} = \frac{\sum_i n_i\tilde{p}_i}{\sum_i n_i}, \qquad \bar{n} = \frac{\sum_i n_i}{r}$$

It is true that this quantity increases as the sample allele frequencies diverge, but it is
difficult to assess the significance of divergence. Using the $F_{ST}$ statistic to test for gene
frequency differences would require knowledge of its sampling properties. In the fixed-population
framework, it is possible to relate $F_{ST}$ to the contingency-table $X^2$ test statistic. Suppose
attention is focused on only one allele, $A$, and its alternative allele(s), $\bar{A}$. If the
sample frequency of allele $A$ in the $i$th sample is $\tilde{p}_i$, and the sample size is $n_i$
individuals ($2n_i$ genes), the $X^2$ statistic for comparing allele counts over populations is

$$X^2 = \sum_i \frac{2n_i(\tilde{p}_i - \bar{p})^2}{\bar{p}(1 - \bar{p})} = 2(r - 1)\bar{n}F_{ST}$$

as has been given previously (e.g., Hedrick, 1983, although he used $r$ in place of $r - 1$ in
defining the variance of allele frequencies over populations). There seems little point, however,
in calculating $F_{ST}$ in order to test for population differentiation as that can be done
directly with the $X^2$ test or alternatives such as the permutation test. Under the hypothesis
that the allele frequencies are the same over populations, the $X^2$ statistic has an expected
value of $r - 1$, so that the expected value of $F_{ST}$ is
$1/(2\bar{n})$. This dependence on sample size for an expected value is not desirable, although for
very large sample sizes $F_{ST}$ is expected to be close to zero when populations have the same
frequencies.

In the fixed model, the only variation being considered is that between samples from the same
population. This is the variation giving rise to the distribution of the test statistic, and is to
be contrasted to the random model case considered below and by Bowcock et al. (1991). A quite
different approach was adopted by Chakraborty and Danker-Hopfe (1991). They defined $F_{ST}$ in
terms of the population frequencies, $p_i$. No provision is made for statistical or genetic
sampling, and so there is no basis for making statements about the size of the parameter.

Random Populations

Under the random model, the populations sampled may be considered to represent the species and
therefore to have a common evolutionary ancestry. Even though the populations may have been
distinct for some time, the analysis is built on the assumption that there is a single ancestral
population. The expectations referred to by means and variances now refer to repeated samples from
the populations and to replicate populations. In the absence of disturbing forces, such as
selection, all populations are expected to have the same allelic frequencies.

Underlying the analysis of differentiation in the random model is the notion that genetic sampling
causes different genes in a population to be dependent, or related. Even though individuals or
genes may be sampled randomly, the process of taking expectations must recognize that they are
dependent through their shared ancestry. Another essential concept for the analysis is that the
relationships between various genes are relative to the least-related genes in the data. It is
generally assumed that these least-related genes are independent: the data do not allow measures of
relationship to be estimated otherwise.

Interest can be centered on the extent to which different populations within the species have
differentiated over the time since the ancestral population. The action of evolutionary forces, or
genetic sampling, will result in intraspecific differentiation, and this differentiation is
conveniently quantified with the F statistics of Wright (1951), or the analogous measures of
Cockerham (1969, 1973). These quantities measure the degree of relatedness of various pairs of
genes. Cockerham (1969) described the three basic quantities in the situation when diploid
individuals are sampled from a series of populations as follows: (1) the overall inbreeding
coefficient $F$ (Wright's $F_{IT}$) is the correlation of genes within individuals over all
populations; (2) the coancestry $\theta$ (Wright's $F_{ST}$) is the correlation of genes of
different individuals in the same population; and (3) $f$ (Wright's $F_{IS}$) is the correlation of
genes within individuals within populations. Because the populations are assumed to have been
isolated since the ancestral population, genes in one population are independent of those in
another.

Haploid Data

If data are available on genes directly, then the analysis is in terms of allelic frequencies and
is phrased conveniently in terms of a set of indicator variables. For gene $j$ in a sample from
population $i$, a variable $x_{ij}$ can be defined by $x_{ij} = 1$ if the gene is allele $A$, or
$x_{ij} = 0$ if the gene is not allele $A$.

The expected value of $x_{ij}$ over samples and replicate populations is therefore the allelic
frequency $p_A$ common to these populations. Under a model assuming no forces such as mutation or
selection, this frequency is also that in the ancestral population. The sample allelic frequency
$\tilde{p}_{Ai}$ in a sample of $n_i$ genes from the $i$th population can be written as

$$\tilde{p}_{Ai} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$$

For haploid data, there is only one F statistic. It measures the relationship between different
genes in the same population, relative to the zero relationship between genes of different
populations. The quantity is written here as $\theta$ and was termed $F_{ST}$ by Wright (1951). It
allows the expected value of a sample allelic frequency to be written as

[equation missing in source]
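
The indicator-variable construction for haploid data can be illustrated with a short Python sketch. This is an illustrative example rather than anything given in the chapter: the function names `indicator` and `sample_allelic_frequency`, and the toy samples, are hypothetical.

```python
def indicator(gene, allele="A"):
    """x_ij: 1 if the sampled gene is the allele of interest, otherwise 0."""
    return 1 if gene == allele else 0


def sample_allelic_frequency(genes, allele="A"):
    """Mean of the indicator variables over the n_i genes from one population."""
    if not genes:
        raise ValueError("at least one sampled gene is required")
    return sum(indicator(g, allele) for g in genes) / len(genes)


if __name__ == "__main__":
    # Hypothetical haploid samples from two populations (one character per gene).
    samples = {"pop1": list("AAaAaAAA"), "pop2": list("aaAaaA")}
    for name, genes in samples.items():
        print(name, len(genes), round(sample_allelic_frequency(genes), 3))
```

Writing the frequency as an average of 0/1 indicators is what lets expectations, variances, and covariances of sample frequencies be built up gene by gene.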
Table I
Analysis of variance layout for haploid data
X
df Sum of squares* Expected mean square
Between populations r-l
Within populations l7A(I - P A ) -- $1

= 72. - 1'
= x,7'p~z(I- ~ A I )
Although the allelic frequency $p_A$ is assumed to remain constant, for finite
populations the coancestry will increase over time as inbreeding accrues in
each population. In other words, $\theta$ measures the extent of
differentiation between populations. It is worth stressing that this measure
of between-population differentiation is a consequence of the relatedness of
genes within populations. As individuals become more related within
populations, the independent populations become more differentiated.

Estimation of $\theta$ can proceed by the method of moments, with the various
statistics being conveniently organized in an analysis of variance format
(Table 1). [Other methods of estimation were reviewed by Spielman et al.
(1977).] Data are considered to be available from $r$ populations, with
different numbers of genes, $n_i$, from each being allowed. The weighted
average frequency of allele A over all the samples is written as

For the method of moments, the parameters are estimated by equating the
observed and the expected mean squares. The observed mean square for between
populations is written as MSP and for genes within populations as MSG:

The value of $n_c$ is given in Tables 1 and 2.

It may be convenient to identify two terms as the components of variance for
genes within populations,

and between populations,

Note that the variance component between populations is the same as the
covariance between frequencies of genes in different individuals within the
same population (Cockerham, 1969). The two variance components, within and
between popu-
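Table 2
Analysis of variance layout for diploid data

Source                       df                                 Sum of squares*                                                                  Expected mean square*
Between populations          $r - 1$                            $2\sum_i n_i(\tilde{p}_{Ai} - \tilde{p}_{A\cdot})^2$                             $p_A(1 - p_A)[(1 - F) + 2(F - \theta) + 2n_c\theta]$
                                                                $= 2(r - 1)\bar{n}s_A^2$
Individuals in populations   $\sum_i (n_i - 1) = n_\cdot - r$   $\sum_i n_i(\tilde{p}_{Ai} + \tilde{P}_{AAi} - 2\tilde{p}_{Ai}^2)$               $p_A(1 - p_A)[(1 - F) + 2(F - \theta)]$
                                                                $= 2r\bar{n}\tilde{p}_{A\cdot}(1 - \tilde{p}_{A\cdot}) - 2(r - 1)\bar{n}s_A^2 - \frac{1}{2}r\bar{n}\tilde{H}_{A\cdot}$
Genes in individuals         $\sum_i n_i = n_\cdot$             $\sum_i n_i(\tilde{p}_{Ai} - \tilde{P}_{AAi}) = \frac{1}{2}r\bar{n}\tilde{H}_{A\cdot}$   $p_A(1 - p_A)(1 - F)$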
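The layout of Table 2 can be turned into a small calculation. The sketch below is an illustration under stated assumptions rather than the chapter's program: it builds the three mean squares (MSP, MSI, MSG) from genotype counts for one allele and equates them to their expectations, which gives $\theta$ as (MSP - MSI)/D and $F$ as 1 - 2 n_c MSG/D with D = MSP + (n_c - 1)MSI + n_c MSG, and then $f = (F - \theta)/(1 - \theta)$. The function name, argument names, and the formula for $n_c$ are assumptions introduced for this example.

```python
import math
from typing import Sequence, Tuple


def diploid_f_statistics(
    n_AA: Sequence[int], n_het: Sequence[int], n_ind: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (F, theta, f) for allele A from r diploid samples.

    n_AA[i]  : number of AA homozygotes in sample i
    n_het[i] : number of heterozygotes carrying A in sample i
    n_ind[i] : number of individuals in sample i
    """
    r = len(n_ind)
    n_total = sum(n_ind)
    p = [(2 * a + h) / (2 * n) for a, h, n in zip(n_AA, n_het, n_ind)]
    hom = [a / n for a, n in zip(n_AA, n_ind)]  # AA homozygote frequencies
    p_bar = sum(n * pi for n, pi in zip(n_ind, p)) / n_total

    # Assumed definition: n_c = (sum n_i - sum n_i^2 / sum n_i) / (r - 1).
    n_c = (n_total - sum(n * n for n in n_ind) / n_total) / (r - 1)

    # Sums of squares from Table 2, each divided by its degrees of freedom.
    msp = 2 * sum(n * (pi - p_bar) ** 2 for n, pi in zip(n_ind, p)) / (r - 1)
    msi = sum(
        n * (pi + hi - 2 * pi ** 2) for n, pi, hi in zip(n_ind, p, hom)
    ) / (n_total - r)
    msg = sum(n * (pi - hi) for n, pi, hi in zip(n_ind, p, hom)) / n_total

    denom = msp + (n_c - 1) * msi + n_c * msg
    theta = (msp - msi) / denom
    big_f = 1 - 2 * n_c * msg / denom
    # f = (F - theta) / (1 - theta) is undefined when theta equals 1.
    small_f = (big_f - theta) / (1 - theta) if theta != 1 else math.nan
    return big_f, theta, small_f
```

For two samples of 10 individuals consisting entirely of heterozygotes, `diploid_f_statistics([0, 0], [10, 10], [10, 10])` returns F = -1, theta = 0, and f = -1, the pattern expected when genes are less alike within individuals than between them.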
lations, reflect all of the factors that lead to variation in gene
frequencies. The sum of these two components involves only the allelic
frequency $p_A$, and this allows the unknown quantity with $p_A(1 - p_A)$ to
be eliminated. An estimate of $\theta$ then can be found as

$$\hat{\theta} = \frac{\mathrm{MSP} - \mathrm{MSG}}{\mathrm{MSP} + (n_c - 1)\mathrm{MSG}}$$

This can also be expressed in terms of the variance of allelic frequencies
over populations

Using either of these two expressions, the estimator is a ratio of two
functions of the data, whose expectations differ by the factor $\theta$. The
ratio may be denoted

$$\hat{\theta} = \frac{T_1}{T_2}, \qquad T_1 = s_A^2 - \frac{1}{\bar{n} - 1}\left[\tilde{p}_{A\cdot}(1 - \tilde{p}_{A\cdot}) - \frac{r - 1}{r}s_A^2\right]$$

For a large number of large samples, when both $1/\bar{n}$ and $1/r$ can be
ignored, this estimate reduces to

which is just the estimator discussed above in the fixed-model section. Note,
however, that the estimators have been derived under very different models.
Nothing was implied about evolutionary forces causing population
differentiation in the
fixed model. There is a difference between arriving at this equation as an
approximate expression for the estimator of a parameter, and taking the
equation as the definition of a quantity of interest.

Estimation of $\theta$ has been presented in terms of one of the alleles, A,
at a locus. If there are only two alleles at the locus, then the same estimate
would be obtained if the other allele is used. For more than two alleles,
however, a different estimate will result with every allele. Since the
parameter $\theta$ is the same for every allele under a model of no disturbing
forces, all of these estimates refer to the same quantity and an appropriate
average is needed to give the best single estimate. For the $u$th allele, the
estimate could be written as

$$\hat{\theta}_u = \frac{T_{1u}}{T_{2u}}$$

and then an overall estimate with the desirable properties of low bias and
small variance is given by combining the information from all $v$ alleles

$$\hat{\theta} = \frac{\sum_u T_{1u}}{\sum_u T_{2u}}$$

There is an additional extension to cover the case where several loci are
scored. Once again, under a model with no disturbing forces, every allele at
every locus provides an estimate of the same quantity. Indexing the loci by
$l$ and alleles within loci by $u$, then the individual estimates are

$$\hat{\theta}_{lu} = \frac{T_{1lu}}{T_{2lu}}$$

and the overall estimate from all $m$ loci is

$$\hat{\theta} = \frac{\sum_l \sum_u T_{1lu}}{\sum_l \sum_u T_{2lu}}$$

With $\theta$ estimated, there is a quantification of the degree of divergence
among a set of $r$ populations. Equal allele frequencies in all populations
will cause $\theta$ to be zero and, for a pair of populations, $\theta$ may
serve as a measure of genetic distance. Under the drift model,
$-\ln(1 - \theta)$ will increase linearly with the time since divergence of
the two populations from the ancestral population. When all populations are
fixed at all loci scored, the estimator is undefined, since the equation
becomes the ratio of zeros. Indeed, unless there is additional information on
the molecular structure of the fixed alleles, there is no information in this
case on how long the populations have been fixed or on the time of divergence
of the populations.

INFERENCES ABOUT $\theta$  Making inferences about $\theta$ beyond simply
estimating it may be accomplished by numerical resampling. Confidence
intervals follow from bootstrapping. Because $\theta$ is a parameter
appropriate for the random model, numerical resampling cannot be performed by
resampling genes at a locus within populations, as was done in the fixed
model. The resampling must mimic both the genetic sampling that causes
replicate populations to differ, and the sampling of genes for observation
from each population. Two possibilities are suggested.

In the first place, resampling may be done over loci. For a study in which $m$
loci are scored, bootstrapping over loci provides confidence intervals for
$\theta$. Each bootstrap sample consists of a set of $m$ pairs of values,

drawn with replacement from the $m$ calculated values, and the combined
estimate formed from this new collection of $T$'s. As before, the middle 95%
of these new estimates will provide a 95% confidence interval. The hypothesis
that $\theta$ has some specified value can be rejected, at the 5% significance
level, if the confidence interval does not include that value. Data sets with
overlapping (90%) confidence intervals provide no evidence that the
corresponding $\theta$ values differ (at the 5% level).

It is also possible to bootstrap over populations; this may be done for each
locus separately. In this way the estimates of $\theta$ for each locus can be
compared. Loci that do not give overlapping confidence intervals for $\theta$
may be affected by differential disturbing forces such as selection.
Bootstrapping over populations requires that there are several populations,
just as resampling over loci
supposes that several loci have been scored. In practice, at least five
populations or loci appear to be necessary.

A promising method for making inferences about $\theta$ may be suggested by
recent work of Brownie and Boos (1994). These authors showed that analysis of
variance methodology may be quite robust to non-normality. If this suggestion,
also hinted at by D'Agostino et al. (1988), holds for zero-one variables like
the indicator variables $x_{ij}$ used here, then the ratio of mean squares in
Table 1 could be used to test the hypothesis that $\theta$ is zero. The mean
squares themselves may be taken to be $\chi^2$ variables, and this used to
construct confidence intervals without the need for numerical resampling. The
current availability of computing power, however, means that numerical
resampling does not impose an undue burden on the data analyst.

It is necessary to comment on the possibility that the estimate of $\theta$
may be negative. Two situations are likely to give this outcome. It may be
that the true value of $\theta$ is positive but small. Since the estimate has
low bias, the estimate is about as likely to be below as above the true value.
In this case, estimates less than the true value will often be negative. The
second situation is that the parameter may be negative. In statistical
language, this corresponds to a negative intraclass correlation, and indicates
the advantage of adopting the perspective of correlation coefficients. In
genetic language, it means that genes are more related between than within
populations. This could result from some forms of migration that violate the
assumption of the populations having remained distinct since an ancestral
population. A more complete discussion of biological causes for a negative
component of variance between populations, $\theta p_A(1 - p_A) < 0$, was
given by Cockerham (1973) for the analysis of diploid data. Any mating system,
such as the avoidance of self-mating, that causes genes to be more alike
between individuals than within individuals, can cause this phenomenon.

Diploid Data

When observations are made on diploid individuals, the analysis should be
performed at the diploid level. The same general approach is followed, but it
is now possible to estimate the degree of relationship between genes within
individuals, $F$, as well as that between genes of different individuals,
$\theta$. The preceding haploid analysis essentially dropped the distinction
between $F$ and $\theta$. Differentiation between independent populations is
still measured in terms of $\theta$, reflecting the relatedness of individuals
within populations, but the analysis also provides an estimate of the overall
inbreeding coefficient, $F$. Remember that, in Wright's notation,
$F = F_{IT}$ and $\theta = F_{ST}$. The degree of inbreeding within
populations, $f$, or $F_{IS}$, can be expressed as

$$f = \frac{F - \theta}{1 - \theta}$$

Only $f$ can be estimated from data from a single population, by means of a
moment-estimator such as

(Weir, 1990). The estimate of $f$ provided by the program given in Weir (1990)
is an average value applying to all populations sampled.

The expected value of the squared sample allelic frequencies now must reflect
the two levels of relatedness of different genes within populations; both are
a consequence of prior genetic sampling:

Once again, the estimation procedure follows naturally from an analysis of
variance layout. There are now three sources of variation: populations,
individuals within populations, and genes within individuals. The sums of
squares are constructed with gene and genotypic frequencies, as shown in
Table 2, and follow from the usual nested analysis of variance sums of squares
for the indicator variable $x_{ijk}$ for the $k$th gene ($k = 1, 2$) in the
$j$th in-
dividual ($j = 1, 2, \ldots, n_i$) in the $i$th population
($i = 1, 2, \ldots, r$). The expected mean squares could have been written in
terms of variance components for populations as
$\sigma_P^2 = p_A(1 - p_A)\theta$, for individuals within populations as
$\sigma_I^2 = p_A(1 - p_A)(F - \theta)$, and for genes within individuals as
$\sigma_G^2 = p_A(1 - p_A)(1 - F)$.

In Table 2, the sample sizes $n_i$ are for the numbers of individuals in each
sample, whereas in Table 1 they were the numbers of genes. Also in Table 2,
$\tilde{P}_{AAi}$ is the frequency of AA homozygotes in the $i$th sample. In
the whole data set, $\tilde{H}_{A\cdot}$ is the frequency of heterozygous
individuals that have allele A. In other words

If the observed mean squares are written as MSP, MSI, and MSG for populations,
individuals, and genes, respectively, then the measures of differentiation can
be estimated as:

$$\hat{F} = 1 - \frac{2n_c\,\mathrm{MSG}}{\mathrm{MSP} + (n_c - 1)\mathrm{MSI} + n_c\mathrm{MSG}} = 1 - \frac{S_3}{S_2}$$

$$\hat{\theta} = \frac{\mathrm{MSP} - \mathrm{MSI}}{\mathrm{MSP} + (n_c - 1)\mathrm{MSI} + n_c\mathrm{MSG}} = \frac{S_1}{S_2}$$

These are probably the most convenient computing formulas, but it is possible
to give explicit expressions for the terms $S_1$, $S_2$ and $S_3$:

$$S_1 = s_A^2 - \frac{1}{\bar{n} - 1}\left[\tilde{p}_{A\cdot}(1 - \tilde{p}_{A\cdot}) - \frac{r - 1}{r}s_A^2 - \frac{1}{4}\tilde{H}_{A\cdot}\right]$$

$$S_2 = \tilde{p}_{A\cdot}(1 - \tilde{p}_{A\cdot}) - \frac{\bar{n}}{r(\bar{n} - 1)}\left\{\frac{r(\bar{n} - n_c)}{\bar{n}}\tilde{p}_{A\cdot}(1 - \tilde{p}_{A\cdot}) - \frac{1}{\bar{n}}\left[(\bar{n} - 1) + (r - 1)(\bar{n} - n_c)\right]s_A^2 - \frac{r(\bar{n} - n_c)}{4\bar{n}^2}\tilde{H}_{A\cdot}\right\}$$

$$S_3 = \frac{n_c}{2\bar{n}}\tilde{H}_{A\cdot}$$

These estimates have all been presented in terms of one particular allele, A.
In practice, several alleles at several loci will be available and each will
provide an estimate of the same parameters under the basic model of
neutrality. To combine estimates over all alleles, numerators and denominators
are combined separately. For the $u$th allele at the $l$th locus, the
estimates may be expressed as

$$\hat{F}_{lu} = 1 - \frac{S_{3lu}}{S_{2lu}}, \qquad \hat{\theta}_{lu} = \frac{S_{1lu}}{S_{2lu}}$$

and the combined estimates are

$$\hat{F} = 1 - \frac{\sum_l \sum_u S_{3lu}}{\sum_l \sum_u S_{2lu}}, \qquad \hat{\theta} = \frac{\sum_l \sum_u S_{1lu}}{\sum_l \sum_u S_{2lu}}$$

Also, as in the haploid case, numerical resampling over loci provides the
means for making inferences about the parameters $F$ and $\theta$, whereas
resampling over populations allows comparisons between loci.

Note that large numbers of large samples allow approximate expressions to be
found for the estimates from each allele.

but the use of computers for data analysis makes these common levels of
approximation unnecessary.

For (monoecious) populations mating at random, genes are equally related
whether they are within or between individuals. In this case $F = \theta$ or
$f = 0$. Therefore, estimates of $F$ and $\theta$ that differ significantly
may indicate departures from random mating. Any avoidance of mating of
relatives will cause $\theta$ to exceed $F$ and $f$ to be negative. In the
language of variance components, the component for individuals within
populations will then be negative. Recall that this component is actually
the difference of two positive quantities. It is not being claimed that there
is a variance that is negative. Different patterns of differences for the two
estimates (i.e., $F$ and $\theta$) at different loci indicate that there are
forces other than non-random mating affecting these loci.

The effects of selection on the F statistics were detailed by Cockerham
(1973). Different selective forces in different populations, tending to
increase their differences, will increase the value of $\theta$. Within a
population, Lewontin and Cockerham (1959) showed that selection at a locus
gives negative $f$ values unless the viability of a heterozygote is less than
or equal to the geometric mean of the viabilities of the two homozygotes.

INFERENCES FROM $\theta$  A random model allows statements to be made about
evolutionary forces from estimated values of $\theta$. Once an evolutionary
model is specified, in principle it is possible to determine the distribution
of $\theta$ values and then to compare this distribution to empirical
distributions. In practice, it is difficult to derive algebraic results.
Cockerham and Weir (1983) presented results for the expected variance of
$\theta$ under a model of pure drift or specified mating system, but this is a
long way from a complete distribution. It appears that simulation methods are
necessary. Bowcock et al. (1991) simulated five human populations under a
model of a phylogenetic tree of known topology. They compared the distribution
of resulting $\theta$ values with the distribution of estimated values for a
set of 100 DNA polymorphisms. Those loci giving $\theta$ values in the lower
tail of the simulated distribution ("low variation") were considered to be
exhibiting the effects of stabilizing or balanced selection and loci in the
upper tail ("high variation") to be exhibiting the effects of disruptive
selection whereby some alleles are favored in some environments and are at a
disadvantage in others. It is not clear that statistical sampling effects were
considered, and the conclusions are limited to the model simulated, but the
approach provides an interesting example of a random model analysis.

MUTATION AND MIGRATION  If other forces are involved, such as mutation, then
the allelic frequencies no longer remain constant over time, and it is not
appropriate to separate $p_A$ and $\theta$ in the expectation of
$\tilde{p}_{Ai}^2$. In this case, it is sufficient to work with measures of
allelic similarity within and between populations. Migration is allowed
between the populations. Using notation from Cockerham and Weir (1987), $Q_2$
is the probability that two genes within a population have the same allelic
state, whereas $Q_3$ is the chance that two genes, one in each of two separate
populations, are in the same state. The expected mean squares in the analysis
of variance layout, after summing over alleles, become

These similarity measures, in turn, can be related to descent measures,
$\theta_2$ for genes within populations and $\theta_3$ for genes between
populations. (Until now populations have been assumed to be independent, so
that there was no need for a $\theta_3$, and $\theta_2$ was written as
$\theta$.) The only estimable quantity is

where $\beta$ is analogous to $f$. The analogy should not be carried too far,
since $f$ was defined in terms of the relationship of pairs of genes within
and between individuals of the same population, whereas $\beta$ refers to
pairs of genes within and between different populations. Cockerham and Weir
(1987) discussed how $\beta$ depends on the population size, number of
populations, mutation rate, and migration rate.

Suppose first that there is no migration. Under the infinite-alleles mutation
model, every mutation is to a new allelic type. In a finite population of size
$N$, an equilibrium will be established between the loss of variation by drift
and the introduction of variation by mutation. At such an equilibrium, if the
mutation rate is $u$, the value of $\theta$ is given by
(Kimura and Crow, 1964). Note that different populations are considered here
to be independent.

Often it has been suggested that migration rates could be estimated from
$\theta$ values (e.g., Slatkin, 1985). There is an infinite-island model,
corresponding to the infinite-alleles mutation model. Each generation, any
gene sampled from a population has probability $m$ of having migrated from any
one of an infinite number of other populations. When these various "islands"
are of finite size $N$, an equilibrium is again established between loss of
variation due to drift within islands, and gain of variation by migration from
other islands. The equilibrium value of $\theta$ is simply

$$\theta = \frac{1}{1 + 4Nm}$$

Although this suggests a means for estimating $Nm$, there are complications in
practice because the infinite-island model is unrealistic. As soon as a finite
number $n$ of islands are postulated, there is a non-zero probability that two
islands receive migrant genes from the same island. The island populations are
not independent and there is a need to distinguish between $\theta_2$ and
$\theta_3$. The quantity $\beta$ can be estimated (although it should not be
called $\theta$ or $F_{ST}$), and was shown by Cockerham and Weir (1987) to
have expectation in an equilibrium population of

where, for mutation rate $\mu$, $p = (1 - \mu)^2$, $d = (1 - m)^2$, and
$a = n/(n - 1)$. For a large number of islands

which reduces to the $1/(1 + 4Nm)$ value given above when the mutation rate is
small. Cockerham and Weir (1993) showed by theory and simulation that
$(1/\hat{\beta}) - 1$ performs satisfactorily as an estimator of $4Nm$, in
spite of some doubts raised by Slatkin and Barton (1989). A related
discussion, of using $F_{ST}$ estimated for DNA sequence data to estimate gene
flow, was given by Hudson et al. (1992b).

The drawback of using $\theta$ or related quantities to estimate migration
rates is that it is assumed that population differentiation is due to gene
flow. In the absence of direct observations on gene flow, this may be the only
alternative but it does mean that alternative evolutionary scenarios leading
to the same pattern of gene frequencies cannot be eliminated.

Population Subdivision

It is often the case that individuals are sampled in a nested sampling scheme.
J.S.F. Barker et al. (1986) sampled Drosophila buzzatii from transects and
sites within transects; Ferrari and Taylor (1981) sampled Drosophila
subobscura from subdivisions, regions within subdivisions, and demes within
regions. Each recognizable level in such hierarchies adds another level to the
degrees of relatedness of genes. The analysis of intraspecific differentiation
can be carried out by looking at the hierarchy of relationships between pairs
of genes, from the most closely related pairs (within individuals) to the most
distantly related (between largest sampling units). This hierarchy is
conveniently recognized and organized in nested analyses of variance layouts.
These layouts then provide the means for estimating the various measures of
differentiation. Details for both three- and four-level hierarchies were given
by Weir (1990).

APPLICATIONS

Conditional Genotypic Frequencies

The uses to which F statistics may be put are many, and an interesting
application concerns conditional genotypic frequencies for forensic
calculations. If a suspect has been found to have the same genotype $A_iA_j$
as evidentiary material (such as a blood stain) left by the perpetrator at a
crime scene, there is interest in determining how likely it is that this could
have occurred by chance. In
words, what is the frequency that the perpetrator has genotype $A_iA_j$, given
that the suspect in the crime has that type? The best way of presenting the
evidence of a match between suspect and evidentiary material is to compare
probabilities of the match under two hypotheses, $C$: suspect S and
perpetrator P are the same person, and $\bar{C}$: suspect and perpetrator are
different people. For heterozygotes, the comparison is given by the likelihood
ratio

when the numerator in the first equation is assumed to be 1.

The frequencies with which two individuals have the same genotypes were given
by Cockerham (1971). These frequencies can be expressed in terms of a suite of
measures of identity by descent for genes considered two, three, or four at a
time. The two-gene versions of these measures are the F statistics defined in
terms of identity of descent rather than correlations. For random mating,
$\theta$ is the probability that any two genes at a locus are identical by
descent, $\gamma$ and $\delta$ are the probabilities that any three or four
allelic genes are identical by descent, and $\Delta$ is the probability that
two pairs of allelic genes are identical by descent.

In the case of both individuals being the same homozygote or the same
heterozygote the joint frequencies are

whereas the single-individual frequencies are

There is almost no empirical information on the higher-order measures
$\gamma$, $\delta$, and $\Delta$. However, for low levels of relatedness,
numerical work has shown that the following approximations given by Balding
and Nichols (1994) for the conditional frequencies are satisfactory (they are
somewhat too small for heterozygotes and too large for homozygotes):

Additional details were given by Weir (1994).

IMPLEMENTATION

The theory presented above assumes random sampling of individuals from each
population, at least with respect to the genetic units being scored. Such
randomness generally is guaranteed by the methods used to capture animals or
select plants from a natural stand.

Sampling of populations, however, may not be as clearly random. Because the
genetic sampling involved in the transmission of genes between generations is
random, any extant population is a random representative of all the replicate
populations that may arise under the same set of conditions. The extent to
which it can be regarded as representative of the species will depend on the
extent to which the environmental conditions for that population are
representative of those faced by the species. If a population clearly has
evolved under unique conditions, then it may be appropriate to restrict
conclusions to that one population. A fixed analysis is necessary then, and
the F statis-
tics need not be calculated. In general, though, natural populations would
seem better regarded as random samples in both space and time.

In the absence of knowledge of the sampling distributions of parameter
estimates, it is not easy to give expressions for the sample sizes necessary
for desired levels of precision. As always, pilot studies are invaluable, and
variances of estimates obtained by numerical resampling in such pilot studies
will indicate the precision to be expected in the full-scale study.

In a series of simulation experiments for populations undergoing drift,
Reynolds et al. (1983) and Weir and Cockerham (1984) showed that the method of
moments gives estimates of low bias, whereas the method suggested here for
combining estimates over alleles and loci has sufficiently low variance that
the mean square error compares favorably with alternative methods.

Analysis

As in all population genetics studies, it is important here to record the
genotypes at all loci scored on an individual basis. Although summary tables
may present data on a per-locus basis, data should be kept as multilocus
genotypes.

There are two principal ways of performing analyses once the data are in hand.
The first is to use the equations and tables presented in this chapter. Tables
1 and 2 can be constructed with allelic and genotypic frequencies from the
populations sampled. For moderate-sized samples, this can be done by hand and
the various F statistics calculated from the equations given. This work
becomes tedious when there are several loci, or several alleles per locus, and
then the analysis should be performed by computer. A computer approach is
necessary when numerical resampling is to be used to estimate variances or
confidence intervals for the estimates. Programs were given by Weir (1990).

The second method of analysis takes advantage of general-purpose statistical
packages. As has been stressed, analyses of population structure can be
phrased in terms of nested analyses of variance of indicator variables.
Accordingly, the genotypic data can be transformed to values for the indicator
variables, and these values then subjected to a nested analysis of variance.
For haploid data, analysis in terms of allele A proceeds by coding every
occurrence of that allele by 1 and every occurrence of alternative alleles by
0. These values are subjected to a nested analysis of variance with as many
levels as there are sampling levels, and the components of variance for each
sampling level are estimated. In the SAS package (SAS Institute, 1985), for
example, the NESTED procedure is appropriate. If X is the indicator variable,
then for a simple hierarchy with populations and genes within populations, the
SAS program would include the statements

    PROC NESTED;
    CLASS POPN;
    VAR X;

with the ERROR mean square providing the variance component for genes within
populations. Variance components need to be estimated for every allele for
loci with more than two alleles. The estimate of $\theta$ follows from this
and the POPN variance component as described in this chapter.

A similar procedure holds for diploid data, with the variable X given a value
for each of the two genes within an individual. For an analysis of allele A,
homozygotes are replaced by the two numbers 1,1, heterozygotes by 1,0, and all
other genotypes by 0,0. For a three-level hierarchy an SAS program would
contain

    PROC NESTED;
    CLASS POPN INDIV;
    VAR X;

and the ERROR mean square now provides the variance component for genes within
individuals. Note that these analyses should be limited to estimation of
variance components. Any F or t test statistics that are calculated by the
statistics package may have little meaning, since the indicator variables are
not normally distributed.

AN EXAMPLE

One of the classical analyses of human population structure is that of ten
political districts of the Papago Indians in Arizona (Workman and Niswander,
1970). The study is also one of the few in
Table 3
Multilocus genotype frequencies for the MNS blood group system in samples from
ten political districts of the Papago Indians of Arizona

District  MMSS  MMSs  MMss  MNSS  MNSs  MNss  NNSS  NNSs  NNss  Total
 1           9    14    18     7    12     8     2     1     0     71
 2           3     7     8     1     3     4     1     1     1     29
 3          12    31    12     3    14     7     0     0     1     80
 4           3     9     8     3    12     9     1     2     1     48
 5           4    14    18     0     6    10     1     0     2     55
 6           1     8     7     4     2     2     0     2     2     28
 7           8    31     4     3    14     4     0     3     1     68
 8          21    52    55     7    24    21     0     4     3    187
 9           4    16     7     5    10     6     0     2     1     51
10          23    14     4     3    10     3     1     1     1     60

From Workman and Niswander, 1970.
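The counts in Table 3 can be collapsed to single-locus genotype and allele counts and then passed to an ordinary contingency-table test. The sketch below is one way to do that; the data are transcribed from Table 3, while the function names and the plain Pearson statistic are choices made for this illustration and not the program used in the chapter. Applied to the 10 x 2 allele tables, it should return values close to the 18.49 (M) and 50.82 (S) quoted in the text.

```python
# Two-locus genotype counts from Table 3, one row per district (1 to 10).
# Column order: MMSS, MMSs, MMss, MNSS, MNSs, MNss, NNSS, NNSs, NNss.
TABLE3 = [
    (9, 14, 18, 7, 12, 8, 2, 1, 0),
    (3, 7, 8, 1, 3, 4, 1, 1, 1),
    (12, 31, 12, 3, 14, 7, 0, 0, 1),
    (3, 9, 8, 3, 12, 9, 1, 2, 1),
    (4, 14, 18, 0, 6, 10, 1, 0, 2),
    (1, 8, 7, 4, 2, 2, 0, 2, 2),
    (8, 31, 4, 3, 14, 4, 0, 3, 1),
    (21, 52, 55, 7, 24, 21, 0, 4, 3),
    (4, 16, 7, 5, 10, 6, 0, 2, 1),
    (23, 14, 4, 3, 10, 3, 1, 1, 1),
]


def single_locus_counts(row):
    """Collapse nine two-locus counts to (MM, MN, NN) and (SS, Ss, ss)."""
    m_locus = tuple(sum(row[3 * i:3 * i + 3]) for i in range(3))
    s_locus = tuple(sum(row[i::3]) for i in range(3))
    return m_locus, s_locus


def allele_counts(genotypes):
    """Turn (hom1, het, hom2) genotype counts into counts of the two alleles."""
    hom1, het, hom2 = genotypes
    return (2 * hom1 + het, het + 2 * hom2)


def contingency_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# 10 x 2 tables of allele counts (districts by alleles) for each locus.
m_alleles = [allele_counts(single_locus_counts(row)[0]) for row in TABLE3]
s_alleles = [allele_counts(single_locus_counts(row)[1]) for row in TABLE3]
chi2_m = contingency_chi_square(m_alleles)
chi2_s = contingency_chi_square(s_alleles)
```

The same `contingency_chi_square` function can be given the 10 x 3 tables of genotype counts returned by `single_locus_counts` to examine heterogeneity in genotype frequencies.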
which detailed tables of genotypic frequencies are presented. Table 3, which
is part of Table 2 in Workman and Niswander's paper, shows the two-locus
counts for the MNS blood group system in samples from the ten districts. The
data will be treated here as though there are two codominant loci M and S,
each with two alleles: M, N and S, s.

Hardy-Weinberg testing may be conducted for each locus in each district. The
usual $\chi^2$ goodness-of-fit tests are probably adequate and the test
statistics are shown in Table 4, but with samples as small as 29, it is
preferable to conduct exact tests. The permutation method described above was
used to find the significance levels (also shown in Table 4). Workman and
Niswander had found only the S locus in district 7 to be out of Hardy-Weinberg
equilibrium. They commented that since there was a single significant value in
20 tests, "it may be concluded that the observed pro-
Table 4
Significance levels of tests for Hardy-Weinberg equilibrium

                     M                          S
District   $\chi^2$   Permutation     $\chi^2$   Permutation
portions do not differ from those expected." Certainly one significant result
in 20 replicate tests is to be expected when there is true equilibrium and a
5% significance level is used. However, in this instance it is not clear that
there are 20 replicate tests. If there is equal interest in each locus and
each district, then no test is replicated. It would be of interest to explain
why the S locus has such an excess of heterozygotes in District 7. (This
excess results in two-locus genotypic frequencies not being consistent with
the appropriate products of allele frequencies in District 7, but there is no
other evidence of two-locus associations.)

To judge the interdistrict heterogeneity in genotype frequencies, a 10 × 3
contingency table (10 populations by 3 genotypes) test can be conducted for
each locus. The test statistics are 25.80 for M and 73.35 for S. The second
value is highly significant, with the largest differences being in districts 5
(ss), 7 (Ss and ss), and 10 (SS and ss). Both loci show significant
heterogeneity in allelic frequencies, with the 10 × 2 contingency tables (10
populations by 2 alleles) having $\chi^2$ values of 18.49 for M and 50.82 for
S. These analyses are from a fixed population viewpoint, and make no appeal to
any genetic mechanism.

With a genetic random model supposing a common origin for the ten districts,
it is appropriate to estimate F statistics. Using the DIPLOID.FOR program
(Weir, 1990), the bootstrap confidence intervals for $\theta$ for two loci
(bootstrapping over districts) are:

The low value for M is consistent with the homogeneity of genotypic
frequencies, but not with the heterogeneity of allele frequencies. There is a
higher $\theta$ value for S, and it is on the borderline of being
significantly different from zero. Mention was made earlier of it being easier
to detect differences in a fixed population situation than in a random
situation where inferences are being made in a wider context and therefore
there is a need to take into account larger variances.

CONCLUSION

As with all population genetic analyses, quantifying the effects of variation
among populations within species requires methods tailored to the intended
scope of inference. For detecting differences between specific populations,
standard statistical techniques involving contingency table $\chi^2$ tests or
permutation tests are appropriate. It is possible to measure the extent of
differentiation with the F statistic, $F_{ST}$, but difficult to ascribe a
genetic meaning to this quantity in the fixed-model analysis. If the sampled
populations are regarded as having been sampled from a set of populations
subjected to evolutionary events, then the full set of F statistics provides
an appropriate parameterization. Moment estimators are convenient for
supplying numerical values, and resampling over loci may allow inferences to
be made about differences among populations.
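The resampling over loci referred to here can be sketched in a few lines. The example assumes that each locus has already been reduced to a pair of values, a numerator and a denominator (for instance the $T_1$ and $T_2$ terms of the haploid analysis, or $S_1$ and $S_2$ for diploid data, summed over alleles), and that the combined estimate is the ratio of the summed numerators to the summed denominators. The function names, the default of 1000 replicates, and the simple percentile interval are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (numerator, denominator) for one locus


def combined_theta(pairs: Sequence[Pair]) -> float:
    """Ratio of summed numerators to summed denominators over loci."""
    return sum(t1 for t1, _ in pairs) / sum(t2 for _, t2 in pairs)


def bootstrap_over_loci(
    pairs: Sequence[Pair], n_boot: int = 1000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile confidence interval for theta from resampling loci."""
    rng = random.Random(seed)
    m = len(pairs)
    # Each replicate draws m loci with replacement and recombines their pairs.
    estimates: List[float] = sorted(
        combined_theta([pairs[rng.randrange(m)] for _ in range(m)])
        for _ in range(n_boot)
    )
    # Keep the middle `level` fraction of the sorted estimates.
    cut = round((1 - level) / 2 * n_boot)
    return estimates[cut], estimates[n_boot - cut - 1]
```

A hypothesized value of $\theta$ lying outside the returned interval would be rejected at the corresponding significance level. As the chapter notes, at least five loci appear to be necessary in practice, so intervals from fewer loci should be read with caution.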
Phylogenetic Inference
David L. Swofford, Gary J. Olsen,
Peter J. Waddell, and David M. Hillis
INTRODUCTION
Inferring phylogenetic relationships from molecular data requires the selection of
an appropriate method from the many techniques that have been described.
Unfortunately, phylogenetic analysis is frequently treated as a black box into which
data are fed and out of which "The Tree" springs. Our goal in this chapter is to
provide more than a cursory description of the available analytical methods;
rather, we hope to develop a conceptual framework for understanding the
theoretical and practical distinctions among alternative methodologies. Phylogenetic
analysis of molecular data is in the midst of a remarkable transformation. The
most striking theme in this shift is an increased emphasis on the use of methods
that are based on models of evolutionary change. Moreover, users of methods that
do not require explicit models are now much more likely to incorporate
modifications based on reasonable assumptions about the evolutionary process than
when the first edition of Molecular Systematics appeared only six years ago. We
view this trend as a positive one and have reorganized our chapter accordingly.
Regrettably, we cannot accomplish all of the above objectives and at the same
time provide an exhaustive review of the voluminous literature on phylogenetic
reconstruction; however, Felsenstein (1982, 1988a, 1993) and Hillis et al. (1993a)
have presented general reviews of methods for inferring phylogenies. Instead,
we will focus on methods that are currently in widespread use or that are likely
to be used in the foreseeable future. We will also avoid the temptation to cite
every relevant paper, limiting our citations to papers that are either of
fundamental importance to the development of a method or that provide the clearest
explanations of that method.
As any reader even moderately familiar with the current state of affairs in
phylogenetics already knows, debates among proponents of rival methodologies are often
intense and sometimes unnecessarily acrimonious. Consequently, we will offer
recommendations where we deem them appropriate, but will deliberately avoid taking strong
positions on or making controversial assertions about issues where there is room for
legitimate disagreement. Instead, we hope to provide sufficient background so that readers
will be able to make informed decisions regarding the techniques most appropriate for
their own data. Our treatment in this chapter will be limited to the inference of the
phylogenetic history of the genes under study. For a variety of reasons, these "gene
trees" may fail to reflect the relationships of the organisms from which the genes were
sampled. A discussion of these and related issues is presented in Chapter 1; we will not
address them further here.

Algorithms versus Optimality Criteria

Inferring a phylogeny is an estimation procedure; we are making a "best estimate" of an
evolutionary history based on the incomplete information contained in the data. In the
context of molecular systematics, we generally do not have direct information about the
past; we only have access to contemporary species and molecules. Because we can postulate
evolutionary scenarios by which any chosen phylogeny could have produced the observed
data, we must have some basis for selecting one or more preferred trees from among the
set of possible phylogenies. Phylogenetic inference methods seek to accomplish this goal
in one of two ways: (1) by defining a specific sequence of steps (an algorithm) that leads
to the determination of a tree; or (2) by defining a criterion for comparing alternative
phylogenies to one another and deciding which is better (or that they are equally good).

Purely algorithmic methods combine tree inference and the definition of the preferred tree
into a single statement. These methods include all forms of pair-group cluster analysis
(e.g., UPGMA) and some other distance methods such as neighbor joining (discussed later in
this chapter). The methods tend to be computationally fast because they proceed directly
toward the final solution without requiring evaluation of large numbers of competing trees.

The second class of methods has two logical steps. The first step is to define an
optimality criterion (formally described by an objective function) for evaluating a given
tree; i.e., a score is assigned and subsequently used for comparing one tree to another.
The second is to use specific algorithms for computing the value of the objective function
and for finding the trees that have the best values according to this criterion (a maximum
or minimum value, as appropriate). Thus, the evolutionary assumptions made in the first
step are decoupled from the computer science of the second step. The price of this logical
clarity is that the methods tend to be much slower than those of the first class, a
consequence of having to search for the tree(s) with the best score. For data sets
containing more than about 8 to 20 taxa, the search for the best tree is usually not exact
(because of the large number of possible solutions), and thus we must add caveats regarding
the thoroughness of the search for the optimal tree. These issues are covered in detail
below.

It is important to distinguish between the uses of algorithms in the two approaches. In a
purely algorithmic method, the algorithm defines the tree selection criterion and takes on
fundamental importance. In a criterion-based method, however, the algorithms are merely
tools used in evaluating the objective function and searching for trees that optimize it.*
Because criterion-based methods can assign scores to every tree examined, phylogenies can
be ranked in order of preference according to the chosen criterion. This is an enormous
advantage over purely algorithmic methods. If a criterion-based method finds that there
are thousands or millions of trees that explain the data about equally well, the user of
the method

*Actually, the same algorithm may be used in both approaches, albeit for very different
goals. For instance, an algorithm used to specify a final tree in a purely algorithmic
method may be used to find an initial tree for a criterion-based method (e.g., as a
starting point for branch-swapping rearrangement algorithms).
will not be misled into believing that any particular tree is well-specified. On the other
hand, when a purely algorithmic method determines a single tree, the user will have no
immediate knowledge about the strength of support for that tree. Some workers (e.g.,
Hedges et al., 1992b) have argued that algorithmic methods can be rescued by using
statistical methods such as nonparametric bootstrapping (see the section "Reliability of
Inferred Trees," later in this chapter) to assess the confidence in a tree found using an
algorithmic method. This position fails to address the criticism that algorithmic methods
generally do not address the operational evolutionary assumptions. As an extreme example,
consider an algorithm that chooses trees independently of the data, for example by
labeling the tips of a maximally asymmetric tree in alphabetic order of the species names.
Repeated analyses using different re-samplings of the data will always generate the same
tree, leading to the obviously absurd conclusion that the tree is extraordinarily reliable.

Use of Models and Assumptions in Phylogenetics

Although we will deal extensively with specific models of the evolutionary change of
molecules, a preliminary discussion of the relevance of models in general is in order at
the outset. Phylogenetic inferences are premised on the inheritance of ancestral
characteristics, and on the existence of an evolutionary history defined by changes in
these characteristics. The stable inheritance of characteristics is mediated by the
genome. Differences due to epigenetic or environmental factors do not provide useful
phylogenetic information and must be specifically avoided; all characteristics of interest
are genetically mediated. Therefore, the data for phylogenetic inference reflect, more or
less directly, genomic information. From this reductionistic perspective, a complete
evolutionary history is synonymous with an event-by-event accounting of fixed mutations in
every genomic lineage of interest. This view of the problem provides a common framework,
albeit a purely conceptual one, for analyzing and comparing types of molecular data and
analysis techniques.

If a phylogenetic inference method could be based upon a complete knowledge of the
evolutionary process, it would be free of systematic error (i.e., if enough data were
obtained, the method would consistently obtain the true phylogeny). Even in the absence of
such complete knowledge, hypothetical models of the evolutionary process could be used to
derive (or otherwise justify) tree inference methods that would be free of systematic
error, if the assumed model were correct. A variety of inference techniques have been
formulated on the basis of explicit evolutionary assumptions. These model-based methods
are not necessarily invalidated when one or more of their assumptions is violated; a model
does not have to be perfect in order to be useful. That is, although the assumptions may
be sufficient to ensure the validity of a technique, under special circumstances they
might not all be necessary, and the method may be robust to violation of its assumptions.
Furthermore, model assumptions need not be accepted in a vacuum; data can and should be
allowed to reject the model if the model is inadequate.

Although almost all methods accept the appropriateness of a tree-like model of evolution
(a strong assumption in itself), many commonly used methods of phylogenetic inference are
not explicitly based on a set of evolutionary assumptions. However, the lack of stated
assumptions does not mean that a method is assumption-free; the assumptions are simply
implicit rather than explicit. For example, the widely used method of maximum parsimony
does not depend on a precise model, but believing its results does require one to believe
that plausible evolutionary scenarios that could cause it to fail have not taken place.
It is often argued that it is circular to model character change for the purpose of
estimating a phylogeny because we cannot begin to understand the processes of character
change without first knowing the tree. We prefer, instead, to think of the problem as one
of "reciprocal illumination" (Hennig, 1966): having some idea of the phylogeny is relevant
to the development of good models, but ever-improving models can also lead to better
phylogenetic inferences. Thus, both classes of methods are useful and important. We
see it as unfortunate that some workers, in their zeal to avoid circularity, limit
themselves to "model-free" methods that may be more likely to violate their (implicit)
assumptions than the methods they reject, for which the assumptions are more explicit.

One assumption implicit in this general view concerns the uniqueness of the genomic
lineage. The potential confusion due to lateral gene transfer has received much recent
attention. When transfer is common among the lineages of interest, a population genetic
analysis (Chapter 10) is most appropriate. Our presentation is appropriate for cases in
which interspecies differences are large compared to intraspecific variation.

Definitions of Terms

Most of the analytical techniques that we will discuss result in the inference of an
unrooted tree or unrooted phylogeny, that is, a phylogeny in which the earliest point in
time (the location of the common ancestor) is not identified. (We generally use tree and
phylogeny interchangeably.) Also, biologists often refer to an unrooted tree as a network;
however, this usage conflicts with the definition applied to that term by mathematicians
and should be avoided (the section "Split Decomposition" uses network in the correct
sense). When we find it necessary to distinguish between rooted and unrooted phylogenies
or trees, we will do so explicitly.

The components of a phylogenetic tree go by a variety of names. The contemporary taxa
correspond to terminal nodes or tips, also called leaves or external nodes. The branch
points within a tree are called internal nodes. Nodes are called vertices or points by
some authors. The branches connecting (incident to) pairs of nodes are also called edges,
links, or segments. We will use the terms peripheral branches to refer to branches that
end at a tip and interior branches (or, in the case of a tree with four terminal nodes,
central branch) to refer to branches that are not incident to a tip.

If just three branches connect to an internal node, then the node represents a
bifurcation, or dichotomy. If there are more than three branches connected to an internal
node, then the node represents a multifurcation, or polytomy. A tree in which all internal
nodes represent bifurcations is said to be binary, fully resolved, or strictly
bifurcating. A tree that contains a single internal node is called a star tree.

An unrooted, fully resolved tree has T terminal nodes (corresponding to the taxa) and
T - 2 internal nodes. The tree has 2T - 3 branches, of which T - 3 are interior and T are
peripheral. The total number of distinct unrooted, strictly bifurcating, trees for T taxa
is

\[ B(T) = \prod_{i=3}^{T} (2i - 5) \]

(Felsenstein, 1978b). Adding a root adds one more internal node and one more interior
branch. Since the root can be placed along any of the 2T - 3 branches, the number of
possible rooted trees is increased by a factor of 2T - 3.

TYPES OF DATA

All of the experimental data gathered by the techniques in this volume fall into one of
two broad categories: discrete characters, and similarities or distances. A discrete
character provides data about an individual species or sequence. Character data are often
transformed into similarity or distance values representing quantitative comparisons of
two species or sequences; each such measure describes a pairwise relationship. Of the
methods discussed in this book, only DNA-DNA hybridization data are collected directly in
the form of pairwise distance comparisons. Appropriate distance measures and
transformations for DNA-DNA hybridization data are discussed in Chapter 6. Our discussion
here focuses on character data.

Discrete character data are those for which a data matrix X assigns a character state
x_ij to each taxon i for each character j. Although systematists sometimes disagree about
the terminological distinction between character and character state, we prefer to think
of characters as independent variables whose possible values are collections of mutually
exclusive character states.
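
The node, branch, and tree counts given under "Definitions of Terms" are easy to check
numerically. The sketch below assumes the standard product form of the tree count,
(2i - 5) multiplied over i = 3, ..., T, which is consistent with the factor of 2T - 3 for
rooting stated in the text; the function names are hypothetical and chosen only for this
illustration.

```python
def num_unrooted_trees(T: int) -> int:
    """Number of distinct unrooted, strictly bifurcating trees for T taxa:
    the product of (2i - 5) for i = 3, ..., T."""
    if T < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    for i in range(3, T + 1):
        count *= 2 * i - 5
    return count


def num_rooted_trees(T: int) -> int:
    """A root can be placed on any of the 2T - 3 branches of an unrooted tree."""
    return num_unrooted_trees(T) * (2 * T - 3)


def tree_shape(T: int) -> dict:
    """Node and branch counts for an unrooted, fully resolved tree with T tips."""
    return {
        "internal_nodes": T - 2,
        "branches": 2 * T - 3,
        "interior_branches": T - 3,
        "peripheral_branches": T,
    }


for taxa in (4, 5, 10, 20):
    print(taxa, num_unrooted_trees(taxa), num_rooted_trees(taxa))
```

For 10 taxa the functions give 2,027,025 unrooted and 34,459,425 rooted trees, which shows
how quickly the number of candidate trees grows with the number of taxa.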
The assumption of independence among characters is common to most character-based methods
of analysis. When we cannot assume independence, we are forced to take covariances among
characters into account, and the computational methods become considerably more
complicated. Furthermore, the assumption of independence enables us to treat each position
separately in certain time-consuming stages of computational algorithms, thereby allowing
problems to be subdivided into a number of much simpler subproblems. (For example, numbers
of substitutions can be minimized separately position-by-position and then summed over
positions in a parsimony algorithm, or probabilities can be multiplied over positions in a
maximum likelihood approach.)

Figure 1. Ordered and unordered characters. (A) Ordered multistate character
(transformation between any two states that are not directly connected implies passage
through one or more intermediate states). (B) Unordered multistate character (any state
can transform directly into any other state). (C) Ordered multistate characters in which
the polarity is indicated (the ordering relation is the same in all three cases but the
ancestral state differs).

A second assumption required of character data is that the characters be homologous. As
articulated in Chapter 1, the concept of homology is complicated by the variety of
meanings that have been applied to the term. In general, by homology we mean that a
character must be defined in such a way that all of the states observed over taxa for that
particular character must have been derived, perhaps with modification, from a
corresponding state observed in the common ancestor of those taxa. When we are interested
in relationships among species rather than among genes, we further restrict this
definition to include only orthologous, as opposed to paralogous or xenologous, genes.

In general, character data are either qualitative, in which case the possible states are
two or more discrete values; or quantitative, in which case the characters vary
continuously and are measured on an interval scale. Qualitative characters may be further
subdivided into binary (two possible states) and multistate (three or more possible
states). Binary characters typically represent the presence or absence of some item, such
as the recognition sequence for a restriction endonuclease at a certain map location
(restriction site) or a particular allele at an isozyme locus.

Multistate characters may be ordered or unordered, depending on whether an ordering
relationship is imposed upon the possible states (Figure 1). For example, nucleotide
sequence data are generally treated as unordered multistate characters, since there is no
a priori reason to assume, for instance, that state C is intermediate between states A and
G. In the context of phylogenetic analysis, we say that any state is allowed to transform
directly into any other state. If, on the other hand, we are willing to make assumptions
involving the relationships among the states of a character, we can rank the character
states into an ordered series (i.e., a linearly ordered character) or a branching diagram
(partially ordered character or character-state tree). Multistate ordered characters are
not commonly encountered in molecular data sets, but they are sometimes used in the
analysis of allozyme data.

The concepts of character order and character polarity should not be confused. The former
defines the allowed character-state transformations, whereas the latter refers to the
direction of character evolution. Estimation of character polarity
generally involves an assessment of the observed character state most likely to represent
the ancestral condition (i.e., the state found in the most recent common ancestor of the
taxa under study). An excellent discussion of character ordering and polarity (in a
non-molecular context) can be found in Mabee (1989). We will return to the subject of
character polarity in the discussion of parsimony methods.

Quantitative characters are less commonly used as character data in molecular systematics.
The prominent exception occurs when polymorphic characters such as allelic isozymes or
mtDNA haplotypes are coded as frequencies.

Sequence Data

In principle, the use of sequence data as characters for phylogenetic analysis is
straightforward. Given a set of sequences, the characters are represented by corresponding
positions (offsets) in the sequences, and the character states are the nucleotide or amino
acid residues observed at those positions. For example, if nucleotide A is observed to
occur at position 139 in a sequence, "position 139" is the character and "A" is the state
assigned to that character. To simplify our exposition, we will usually confine our
descriptions to nucleotide sequences unless the distinction is important.

Unfortunately, this simplicity is deceiving. In addition to requiring the use of
homologous molecules (see Chapter 1), phylogenetic analysis of sequence data requires
positional homology. That is, the nucleotides observed at a given position in the taxa
under study should all trace their ancestry to a single position that occurred in a common
ancestor of those taxa. Except for highly conserved sequences, insertion and deletion
events must nearly always be postulated in order to make believable the assumption that
nucleotides at corresponding positions in the various sequences are in fact homologs. An
alignment of the sequences is obtained by inserting gaps, which correspond to insertions
or deletions, into one or more of the sequences in order to place positions inferred to be
homologous into the same column of the data matrix. Alignment is often the most difficult
and least understood component of a phylogenetic analysis from sequence data. Methods for
alignment are discussed in Chapter 9.

Restriction Endonuclease Data

Restriction endonuclease analysis provides character data in one of two forms, both of
which lead to a set of binary characters for each taxon. Ideally, the characters are map
locations and character states are presences or absences of the recognition sequences for
particular endonucleases at those locations (restriction-site data). However, because the
construction of restriction maps is time-consuming (see Chapter 8), some workers simply
treat the presence or absence of restriction fragments of a given length as character
states (restriction-fragment data).

We do not recommend the use of restriction-fragment data for input to phylogenetic
analysis, primarily because these data violate the assumption of independence among
characters. If a new site evolves between two preexisting sites, one (longer) fragment
disappears and two new (shorter) ones appear. Thus, even though two species may share two
of the three restriction sites, they have no fragments in common, a potentially serious
source of error. Some authors (e.g., B. Bremer, 1991) have recognized this difficulty and
argue that it can be overcome by looking at "enough" fragment data so that each occurrence
of this kind of error will be swamped by other data. We are unconvinced by this argument,
however, because there is no guarantee that if something is done inappropriately enough
times, all will work out in the end (and the amount of systematic error introduced by this
shortcut will increase substantially with increasing divergence among the taxa in the
analysis). A second and related problem with fragment data is that insertions or deletions
are difficult to handle. For example, the insertion of a length of DNA long enough to
alter the mobility of the fragment (but not containing a restriction site) requires the
worker to assert that a species lacks a fragment found in one or more other species, even
though the restriction sites responsible for the fragment are at homologous points on the
map (see Chapter 8).

Even when sites are mapped, restriction en-
donuclease data are problematic for phylogenetic analysis due to the asymmetry in the
probabilities of gaining and losing sites. If a particular sequence of six base pairs is
only one substitution away from equalling the recognition sequence of a particular
endonuclease (a "one-off" site), then given that a substitution occurs within the six-base
sequence, only one of the 18 possible substitutions of one base for another will convert
the sequence to a restriction site. On the other hand, if the six-base sequence is already
a restriction site, then a substitution at any of the six positions will cause the site to
be lost. Thus, losing an existing restriction site is much more likely than gaining a site
at a particular location. (For more complete discussions, see Templeton, 1983a, 1983b and
DeBry and Slade, 1985.) Note that this argument applies only to particular sites in the
genome; it does not imply a net loss of restriction sites during evolution. Because of
these gain-loss asymmetries, special handling may be required for restriction-site data.

Isozyme Data

Allozyme (allelic isozyme) data represent the only type of isozyme data routinely used in
phylogenetic analysis (but see Buth, 1984a, and Chapter 4 for a discussion of other data
types). These data are usually presented as a three-dimensional array that specifies the
frequency of each allele at each locus in each population or taxon.* Two controversial
issues confront the researcher attempting to estimate phylogenies from allozyme data. The
first concerns whether or not to transform the data to genetic similarities or distances.
Probably due more to inertia than anything else, the predominant mode of analysis
throughout the 1970s and into the 1980s was to compute a matrix of pairwise similarities
or distances between taxa that served as the input to cluster analysis or additive-tree
methods. The stereotypical way in which these data were treated tended to retard the
development of approaches that made direct use of the character information.

*It is customary to refer to loci as putative or presumptive and to use the term
electromorphs rather than alleles because of the indirect nature of the data and the usual
absence of crossing experiments to confirm the mode of inheritance. For our purposes here,
the simpler terms suffice.

With the development of character-based methods, however, came a second controversy, this
one involving the importance of allele frequency information. Some authors (e.g.,
Mickevich and Johnson, 1976) argued that the presence or absence of an allele was of more
fundamental evolutionary importance than was its frequency (which was subject to
modification by drift and/or selection), and that frequency information should therefore
be discarded. These authors preferred to recast the data into presence/absence form.
However, other authors (e.g., Swofford and Berlocher, 1987) have argued that there is no
reason to ignore frequency information in analyzing allozyme data.

The earliest attempts to use allozyme characters directly in a phylogenetic analysis
generally treated the allele as the character and either its presence/absence (e.g.,
Mickevich and Johnson, 1976) or its frequency (e.g., Buth, 1979b; Simon, 1979) as the
character state. This procedure, however, is open to the same criticism leveled at the use
of restriction fragment data: the assumption of independence of characters is violated.
Specifically, since the frequencies of the alleles at a locus in a given taxon are
constrained to sum to one, if the frequency of one allele increases, the frequency of at
least one other allele must decrease. This property leads to problems, for example, when
allele-as-character data are subjected to maximum parsimony analysis, where ancestors are
often inferred to contain no alleles at all (presence/absence coding) or frequencies that
do not sum to one (frequency coding) for some loci.

Because of these difficulties, Buth (1984a) and others have advocated an approach that
recognizes the locus as the character and the allelic composition at the locus in each
taxon (i.e., allele or combination of alleles present) as the character state. For
example, if some taxa are fixed either for allele a or for allele b, whereas others are
polymorphic for both alleles, then three states would be recognized: "only a," "only b,"
and "a plus b."
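
The locus-as-character coding just described can be illustrated with a minimal Python
sketch. The function name, taxon labels, and allele frequencies are hypothetical; the
sketch simply recodes the array of allele frequencies at one locus into a state label
naming the alleles present in each taxon, ignoring their frequencies.

```python
from typing import Dict


def code_locus_as_character(
    freqs_by_taxon: Dict[str, Dict[str, float]]
) -> Dict[str, str]:
    """Recode allele frequencies at one locus as discrete character states.

    Each taxon receives a label naming the allele or combination of alleles
    present, regardless of how frequent each allele is.
    """
    states = {}
    for taxon, freqs in freqs_by_taxon.items():
        present = sorted(allele for allele, f in freqs.items() if f > 0)
        if not present:
            raise ValueError(f"no alleles recorded for {taxon}")
        if len(present) == 1:
            states[taxon] = f"only {present[0]}"
        else:
            states[taxon] = " plus ".join(present)
    return states


# Hypothetical allele frequencies at a single locus in three taxa
frequencies = {
    "taxon1": {"a": 1.0, "b": 0.0},   # fixed for allele a
    "taxon2": {"a": 0.0, "b": 1.0},   # fixed for allele b
    "taxon3": {"a": 0.6, "b": 0.4},   # polymorphic for both alleles
}
states = code_locus_as_character(frequencies)
print(states)
```

With these inputs the three taxa receive the states "only a", "only b", and "a plus b".
Because the labels ignore frequencies, a taxon with frequencies of 0.99 and 0.01 would be
coded identically to one with 0.01 and 0.99.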
The resulting discrete character states ("particulate data") are either left unordered or
ordered into some logical progression (see Buth, 1984a, for details) for subsequent
analysis.

Despite its intuitive appeal, several factors limit the utility of the particulate data,
locus-as-character approach. When many different alleles occur in various combinations
across taxa, the number of unique combinations may approach or even equal the number of
taxa. Such characters will contain little or no information if the character states are
left unordered. Ordering the character states helps somewhat, but the ordering criteria
often seem subjective and arbitrary.

Buth (1984a) distinguished qualitative coding, in which observed combinations of alleles
are used regardless of frequencies, and quantitative coding, in which estimated allele
frequencies are used to assess "whether the states expressed by two taxa are statistically
identical." Obviously, qualitative coding is extremely susceptible to sampling error.
Consider the example in the above paragraph. Taxa that were in reality polymorphic for
alleles a and b would often be incorrectly scored as "fixed" if one allele were rare,
unless sample sizes were large. (Swofford and Berlocher, 1987, give a table showing the
probability of failing to detect low-frequency alleles in samples of various sizes; see
also Chapter 2.) Even if allele frequencies could somehow be determined without error, it
would be unreasonable to argue that allele frequencies are so irrelevant that the
distinction between allele frequency arrays of, say, [0.01, 0.99] and [0.99, 0.01] is
unimportant.

Quantitative coding presumably makes use of contingency-table analysis to test whether two
or more samples could have come from a single homogeneous population. In most cases
involving interspecific comparisons, however, we know beforehand or from the analysis of
other loci that such is not the case, even if the difference between the allele frequency
arrays of two taxa at a particular locus is not deemed significant. Furthermore, the power
of these tests to detect heterogeneity is weak unless sample sizes are large. Therefore,
failure to reject the null hypothesis of homogeneity should not usually be taken as
evidence that the taxa are "statistically identical." Because of these considerations,
methods that require re-coding of allele frequency arrays into discrete states should be
used only when levels of polymorphism are low, with problematic loci excluded from the
data set.

J. S. Rogers (1984, 1986) and Swofford and Berlocher (1987) have developed methods of
analysis that use the observed allele frequencies directly in character-based analyses
rather than requiring their recoding as discrete states (see the section on "Parsimony on
Allozyme Data"). Felsenstein's (1981b) maximum likelihood method for continuous characters
evolving under a Brownian motion process can also be applied to gene frequency data (after
an appropriate transformation).

Gene Order Data

Phylogenetic inference based on the structural arrangement of genes, particularly in
organellar genomes, provides a useful alternative to the more traditional comparison of
the sequences of one or more genes (or indirect measures thereof). Although we will not
discuss the use of gene-order data in detail, there is growing evidence that such data
will provide important information on relationships, particularly when trying to resolve
ancient divergences. Sankoff et al. (1992) used gene-order comparisons to estimate a
phylogeny for 16 taxa, including fungi and other eukaryotes, and obtained a tree highly
compatible with our current understanding of metazoan and fungal relationships. More
recently, Boore et al. (1995) have used gene-order data to address longstanding questions
regarding arthropod relationships. They were able to draw strong conclusions about
relationships that previously had been highly ambiguous. Boore et al. (1995), Downie and
Palmer (1992b), and others have argued that gene rearrangements are potentially more
informative because they occur less frequently (and hence are less subject to parallelism
and convergence) than sequence data, and because the large number of possible character
states makes it unlikely that the same gene order will evolve independently in different
lineages. Thus, while gene-order characters typically are insufficient to obtain a fully
resolved
tree, one can generally have high confidence in the groups that are supported.

Phylogenetic analysis of gene-order data is in its infancy (although the problems are similar to those encountered in the analysis of chromosomal inversions and other rearrangements). A serious complication is that the characters can no longer be assumed to evolve independently, because it is the relationships of the genes to each other that define the characters. Sankoff et al. (1992) have developed and implemented a method for minimizing the number of evolutionary events (inversions, transpositions, insertions, and deletions) required to convert one circular genome into another. This quantity then serves as the basis for a distance metric. Others (e.g., Boore et al., 1995) have performed parsimony analysis on special codings of the gene-order data, despite the nonindependence of the data. It is likely that methods of analysis for gene-rearrangement comparisons will be an active area of research for the next few years.

OPTIMALITY CRITERIA I: PARSIMONY METHODS

Of the existing numerical approaches to inferring phylogenies directly from character data, methods based on the principle of maximum parsimony have been the most widely used by far. Most biologists are familiar with the usual notion of parsimony in science, which essentially maintains that simpler hypotheses are preferable to more complicated ones and that ad hoc hypotheses should be avoided whenever possible. Methods for estimating trees under the criterion of parsimony equate "simplicity" with the explanation of attributes shared among taxa as due to their inheritance from a common ancestor (e.g., Sober, 1989). When character conflicts occur, however, ad hoc hypotheses cannot be avoided if the observed character distributions are to be explained, and assumptions of homoplasy (convergence, parallelism, or reversal) must be invoked.

In general, parsimony methods for inferring phylogenies operate by selecting trees that minimize the total tree length: the number of evolutionary steps (transformations from one character state to another) required to explain a given set of data. For example, the steps might be base substitutions for nucleotide sequence data, or gain and loss events for restriction-site data. Obviously, a tree that minimizes the total number of steps also minimizes the number of extra steps (homoplasies) needed to explain the data.

In more mathematical terminology, we can define the general maximum parsimony problem as the following. From the set of all possible trees, find all trees τ such that

$$L(\tau) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \cdot \mathrm{diff}(x_{k'j}, x_{k''j})$$

is minimal, where L(τ) is the length of tree τ, B is the number of branches, N is the number of characters, k' and k'' are the two nodes incident to each branch k, x_{k'j} and x_{k''j} represent either elements of the input data matrix or optimal character-state assignments made to internal nodes, and diff(y,z) is a function specifying the cost of a transformation from state y to state z along any branch. The coefficient w_j assigns a weight to each character; it is often set to 1, but this need not be the case. Note also that diff(y,z) need not equal diff(z,y), although for methods that yield unrooted trees, diff(y,z) = diff(z,y). As discussed below, the definition of optimal character-state assignments may include restrictions on the nature of permissible character-state changes.

Any discussion of parsimony methods must distinguish between the optimality criterion (minimal tree length under a specified set of restrictions on permissible character-state changes) and the actual algorithm used to search for optimal trees. Early descriptions of parsimony methods (e.g., Farris, 1970) were presented in a way that tended to obscure the boundaries between criteria and algorithms. Biologists attempting to understand a method should not become so mired in algorithmic details that they lose track of the underlying biological principles and assumptions (Felsenstein, 1982). Algorithms tend to have short life spans, because better ones are constantly being invented. For example, Farris's (1970) algorithm for estimating minimum-length trees under the Wagner parsimony criterion is not, to our knowledge, used in any modern, widely used parsimony computer program (e.g., Farris's Hennig86, Felsenstein's PHYLIP-MIX, or Swofford's PAUP), but his criterion forms the basis for all of them. For these reasons, the conceptual framework in which we will discuss parsimony (and other) criteria assumes that the problem of finding optimal trees is not at issue. We assume, for the moment, that every possible tree can be evaluated, optimizing each one according to the chosen criterion and ranking them according to that criterion. We will take up the matter of searching for optimal trees in a subsequent section.

A common misconception regarding the use of parsimony methods is that they require a priori determination of character polarities (see above). In morphologically based studies, character polarity is often inferred using the method of outgroup comparison, and the resulting "polarized" characters form the basis of the analysis. Furthermore, since a "hypothetical ancestor" is implied by the polarity assignments, the output of an analysis of polarized characters is a rooted tree. Whereas specification of polarities provides a sufficient basis for obtaining rooted (rather than unrooted) trees, it is by no means prerequisite to the use of parsimony methods. This circumstance is fortunate, since the estimation of character polarity is both more difficult and less meaningful for most kinds of molecular data. All that is required to obtain rooted trees from parsimony analysis is to include in the data set one or more assumed outgroup taxa. The location at which the outgroup joins the unrooted tree implies a root with respect to the ingroup taxa. We emphasize, however, that the assignment of taxa to the outgroup constitutes an assumption that the remaining taxa (the ingroup) are monophyletic (an assumption that hopefully is justified by evidence extrinsic to the data at hand). If this assumption is wrong, the tree will be rooted incorrectly.

Parsimony analysis actually comprises a group of related methods, united by the goal of minimizing some evolutionarily significant quantity but differing in their underlying evolutionary assumptions. We will now address each of these methods in turn. The methods are presented in a logical progression rather than in chronological order of their introduction into the literature. In describing the procedures used to compute the minimum length required by a tree under a particular optimality criterion, we will consider a single character (position) in isolation from the rest. Because of the assumption of independence among characters, we can compute the overall tree length by summing, over all characters, the lengths required by each individual character. For the simplest procedures (Fitch and Wagner parsimony), we provide pencil-and-paper algorithms for computing tree lengths and determining optimal character-state assignments. Again, we are concerned only with calculating the length of a single tree, which is taken as a given; this tree may not be a most-parsimonious arrangement for our example character (or even over all characters); it is simply a tree that we wish to evaluate.

Fitch and Wagner Parsimony

These are the simplest parsimony methods, imposing no (Fitch) or minimal (Wagner) constraints on permissible character-state changes. The Wagner method, formalized by Kluge and Farris (1969) and Farris (1970), assumes that characters are measured on an interval scale; thus it is appropriate for binary, ordered multistate, and continuous characters. Fitch (1971b) generalized the method to allow unordered multistate characters (e.g., nucleotide and protein sequences). Wagner parsimony assumes that any transformation from one character state to another also implies a transformation through any intervening states, as defined by the ordering relationship. Fitch parsimony allows any state to transform directly to any other state. Both methods permit free reversibility; that is, change of character states in either direction is assumed to be equally probable, and character states may transform from one state to another and back again. A consequence of reversibility is that the tree may be rooted at any point with no change in the tree length.
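As a minimal sketch of the tree-length definition given earlier in this section, the following Python fragment sums w_j times diff over the branches of one small tree whose node states are all assumed to be known. The tree, node names, states, and function names are hypothetical illustrations and are not taken from any published program; the two diff functions correspond to the ordered (Wagner) and unordered (Fitch) assumptions just described. The fragment evaluates one fixed assignment of states to internal nodes; it does not search for the optimal assignment.

```python
def diff_wagner(y, z):
    """Ordered (Wagner) cost: a change passes through every intervening state."""
    return abs(y - z)


def diff_fitch(y, z):
    """Unordered (Fitch) cost: any state may change directly to any other."""
    return 0 if y == z else 1


def tree_length(branches, states, diff, weights=None):
    """Sum w_j * diff(x_k'j, x_k''j) over all branches k and characters j."""
    n_chars = len(next(iter(states.values())))
    if weights is None:
        weights = [1] * n_chars          # equal weighting unless told otherwise
    total = 0
    for u, v in branches:                # u and v are the two nodes on a branch
        for j in range(n_chars):
            total += weights[j] * diff(states[u][j], states[v][j])
    return total


# Hypothetical tree: terminals A-E, internal nodes X, Y, Z, one character.
branches = [("A", "Z"), ("Z", "X"), ("Z", "Y"),
            ("X", "B"), ("X", "C"), ("Y", "D"), ("Y", "E")]
states = {"A": [0], "B": [0], "C": [2], "D": [1], "E": [3],
          "X": [1], "Y": [1], "Z": [1]}

wagner_len = tree_length(branches, states, diff_wagner)   # 5
fitch_len = tree_length(branches, states, diff_fitch)     # 4
```

Because both cost functions here are symmetric, listing each branch in the opposite direction leaves the result unchanged, which is the sense in which such a tree may be rooted anywhere without affecting its length.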
To determine the minimum length required by a given character j under either the Wagner or Fitch criteria, only a single pass over the tree is required, proceeding from the tips toward the arbitrary root. Computer scientists call this pass a postorder traversal. Although the computation can be performed in other ways, we recommend rooting the tree at one of the terminal taxa, denoted r, as shown in Figure 2. The algorithm for computing the length of a strictly bifurcating tree under the Wagner parsimony criterion then proceeds as follows (see Swofford and Maddison, 1987, for a more rigorous presentation).

1. To each terminal node i (including the one at the root), assign a state set S_i containing the character state assigned to the corresponding taxon in the input data matrix (= x_ij). Initialize the tree length to zero.

2. Visit an internal node k for which a state set S_k has not been defined but for which the state sets of k's two immediate descendants have been defined. Let i and j represent k's two immediate descendants. Assign to k a state set S_k according to the following rules:

2a. If the intersection of the state sets assigned to nodes i and j is non-empty (S_i ∩ S_j ≠ ∅), let k's state set equal this intersection (i.e., S_k = S_i ∩ S_j). The intersection can be represented as a closed interval [a_k, b_k].

2b. Otherwise (S_i ∩ S_j = ∅), let k's state set equal the smallest closed interval [a_k, b_k] containing an element from each of the state sets assigned to i and j. Increase the tree length by b_k - a_k.

3. If node k is located at the basal fork of the tree (i.e., the immediate descendant of the terminal node placed at the root), the traversal has been completed; proceed to step 4. Otherwise, return to step 2.

4. If the state assigned to the terminal node at the root of the tree (x_r) is not contained in the state set just assigned to the node at the basal fork of the tree (S_k), increase the tree length by the distance from x_r to S_k. (This distance equals a_k - x_r if x_r < a_k or x_r - b_k if x_r > b_k.)

Figure 2 Steps in the algorithm for computing the length of an ordered character under Wagner parsimony. (A) The unrooted tree and character states. (B) Tree obtained by rooting at terminal node A and initial state sets assigned to terminal nodes. (C) State sets computed for interior nodes (bold). (D) Reconstruction obtained according to the algorithm described in the text. (E) An alternative, equally parsimonious reconstruction.

An application of the above algorithm is presented in Figure 2. We wish to compute the length of the unrooted tree of Figure 2A. (Although the more usual situation for molecular data would involve binary rather than multistate characters, we treat the multistate case to demonstrate the generality of the algorithm. Binary characters are simply a special case.) We first re-root the tree at node
A (although we could have chosen any node), yielding the rooted tree shown in Figure 2B. Also shown in Figure 2B are the state sets assigned to the terminal nodes according to step 1 of the algorithm. Visiting internal node X in the first invocation of step 2, we observe that S_B ∩ S_C = {0} ∩ {2} = ∅, and hence assign the interval [0,2] to S_X, adding 2 - 0 = 2 to the tree length. Similarly, we let S_Y = [1,3] in the second invocation of step 2, and add 3 - 1 = 2 to the length, which is now 4. In the third and final invocation of step 2, we observe that the intersection S_X ∩ S_Y = [0,2] ∩ [1,3] is not empty, and therefore assign the interval [1,2] to S_Z. The situation as we arrive at step 4 is shown in Figure 2C. Since x_r = 0 is not an element of S_Z = [1,2], we add an additional 1 - 0 = 1 to the length. Thus, evolution of this character requires a minimum of five steps on our given tree.

The procedure outlined above is sufficient to obtain the minimal length required by any character on a given tree. However, it does not actually assign optimal character states to the hypothetical ancestors (internal nodes) of the tree to yield a most-parsimonious reconstruction (MPR). To obtain such a reconstruction we can make a second pass over the tree, this time proceeding from the root toward the tips (a preorder traversal):

5. Visit an internal node k for which an optimal state assignment x_k has not yet been made but for which such an assignment has been made to k's immediate ancestor, denoted m. (Note that the first time this step is invoked, k corresponds to the node at the basal fork of the tree and m = r, the terminal taxon at the root of the tree.)

6. Assign to k the state from the state set computed in the first pass, S_k (= [a_k, b_k]), that is closest to x_m. Specifically, if x_m is contained in S_k, we let x_k = x_m. Otherwise, we let x_k = a_k if x_m < a_k or x_k = b_k if x_m > b_k.

7. If all internal nodes have been visited, stop. Otherwise return to step 5.

Applying steps 5-7 to the example of Figure 2, we first assign state 1 (the closest state in [1,2] to 0) to node Z. We then assign state 1 (the closest state in [0,2] to 1) to node X; likewise we assign state 1 (the closest state in [1,3] to 1) to node Y. The resulting reconstruction is shown in Figure 2D, and confirms the value of 5 as the minimum length for this character.

It is important to remember that this method finds only a single MPR, although others may exist. For instance, the reconstruction in Figure 2E also requires 5 steps. Swofford and Maddison (1987) described an exact algorithm for obtaining all MPRs for discrete character data under the Wagner parsimony criterion.

Simple modifications of the above algorithm provide for the treatment of multistate unordered characters (e.g., nucleotide sequence positions) under the Fitch (1971b) parsimony criterion. In the initial pass (computation of state sets and tree lengths), modify steps 2 and 4 as follows:

2a'. If the intersection of the state sets assigned to nodes i and j is non-empty (S_i ∩ S_j ≠ ∅), let k's state set equal this intersection (i.e., S_k = S_i ∩ S_j).

2b'. Otherwise (S_i ∩ S_j = ∅), let k's state set equal the union of the state sets assigned to nodes i and j (S_i ∪ S_j), and increase the tree length by 1.

4'. If the state assigned to the terminal node at the root of the tree (x_r) is not contained in the state set just assigned to the node at the basal fork of the tree (S_k), increase the tree length by 1.

In order to obtain an MPR, modify step 6 above as follows:

6'. If x_m is contained in the state set assigned to k in the first pass (S_k), assign this state to k as well. Otherwise, arbitrarily assign any state from S_k to k.

An example of the application of the above algorithm is shown in Figure 3. We are interested in computing the length required by a single character on the unrooted tree of Figure 3A. As before, we re-root the tree arbitrarily at node A, yielding the tree shown in Figure 3B. The state sets assigned to the terminal nodes are indicated on the figure. Visiting node X in the first invocation of step 2', we see that {A} ∩ {C} = ∅, and hence assign the union {A,C} as the state set S_X and set the
tree length for this character to 1. Moving to node Y, we assign {A,C} ∩ {A} = {A} to S_Y. Finally, since {A} ∩ {G} = ∅, we assign the state set {A,G} to node Z, again adding 1 to the tree length. Thus, at the beginning of step 4', the state sets are as indicated in Figure 3C. Since x_r = C is not an element of S_Z = {A,G}, we add an additional step to the length, so that a total of 3 steps (nucleotide substitutions) are required on this tree.

Figure 3 Steps in the algorithm for computing the length of an unordered character under Fitch parsimony. (A) The unrooted tree and character states. (B) Tree obtained by rooting at terminal node A and initial state sets assigned to terminal nodes. (C) State sets computed for interior nodes (bold). (D) Reconstruction obtained according to the algorithm described in the text. Branches on which character-state changes occur are indicated in bold. (E) An alternative, equally parsimonious reconstruction.

If we wish to obtain one of the MPRs, we observe that the state C taken by the terminal taxon at the root of the tree is not contained in the set {A,G} assigned to the node at the first fork, and we may arbitrarily choose to assign state A to this node. We then assign state A to node Y as well (since the state set was a singleton no decision need be made). Finally, since state A is contained in node X's state set {A,C}, we assign it to the node, yielding the reconstruction shown in Figure 3D.

As was the case for the ordered character example, more than one MPR exists. For example, if we had chosen to assign state G rather than state A to node Z, we would have obtained the reconstruction shown in Figure 3E. Still another MPR exists, however, in which state C is assigned to all three internal nodes. That C was a possible state for node Z was not readily apparent from the state set {A,G} originally assigned to that node. In fact, a second pass over the tree is necessary in order to obtain all of the possible state assignments to each interior node. Fitch (1971b) described one such method and gave an algorithm for enumerating all of the possible MPRs.

Although all the algorithms described above are restricted to strictly bifurcating trees, they can easily be modified to handle multifurcations (polytomies). W.P. Maddison (1989) reviewed algorithms for obtaining MPRs on polytomous trees under a variety of evolutionary models, including the introduction of some novel approaches.

Other Parsimony Variants

Dollo Parsimony

The Wagner and Fitch parsimony criteria are appropriate under the assumption that probabilities of character change are symmetrical (i.e., the probability of a transformation from state 0 to state 1 in some small unit of evolutionary time is equivalent to that of a change from state 1 to state 0). As discussed above, this assumption is probably unreasonable for restriction-site characters, since the loss of an existing restriction site is more probable than a parallel gain of the same site at any particular location.

Because of this asymmetry, DeBry and Slade (1985) and others have suggested that the Dollo parsimony model (Farris, 1977) is more appropriate for restriction-site data. The Dollo parsimony criterion can be applied to binary or linearly ordered multistate characters for which we can reasonably hypothesize an ancestral condition (polarity). As for Wagner and Fitch parsimony, the preferred tree is the one requiring the fewest steps, but the character-state reconstruction (and hence the tree length assigned) must be consistent with the constraint that every derived character state be uniquely derived. If a hypothetical ancestor (a hypothetical taxon to which the assumed ancestral states for each character have been assigned) is included in the analysis, this definition corresponds to the traditional Dollo model (Farris, 1977): each character state is allowed to originate only once on the tree, and any required homoplasy takes the form of reversals to a more ancestral condition (i.e., parallel or convergent gains of the derived condition are not allowed). In the context of restriction-site data, each site may be gained once, with as many parallel losses of the site being assumed as are necessary to explain the data. For example, for the tree and character states shown in Figure 4 and with state 0 (site absent) assumed to be ancestral, the reconstruction of Figure 4A, requiring only two steps, is not acceptable under the Dollo model because two gains are indicated. Consequently, three steps would be required under the Dollo criterion (Figure 4B): a single gain followed by two losses.

Figure 4 Character-state reconstructions demonstrating Dollo parsimony criterion. Branches on which character-state changes occur are indicated in bold. (A) Most parsimonious reconstruction if multiple originations of state 1 are allowed. (B) Most parsimonious reconstruction under Dollo parsimony, in which only a single origination of state 1 is permitted. (C,D) Reconstructions obtained under unrooted Dollo model. Either rooting of the tree implies a minimum of two character-state changes and only a single origination of state 1.

Use of the Dollo parsimony criterion does not require inclusion of a hypothetical ancestor; it can be applied to unrooted trees as well. Stated another way, although the Dollo criterion requires specification of character polarity in a universal sense, it does not require us to know the state occurring in the most recent ancestor of the ingroup taxa. Specifically, the unrooted Dollo model forces us to assign character states to the interior nodes of the tree such that if a path is traced from any terminal taxon to any other, a backward change (from a more derived state to a more ancestral state) is never followed by a forward change (from a more ancestral state to a more derived state). Under this definition, the position of the root affects neither the assignment of character states to interior nodes nor the length of the tree. For example, both of the trees shown in Figures 4C and 4D, which differ only in the placement of the root, require two steps under the unrooted Dollo model (assuming that state 1 is the derived state). Neither tree requires more than a single origination of state 1. (Note that in the tree of Figure 4D, the derived state 1 is assumed to be ancestral with respect to the
group ABCD, but derived relative to some more inclusive group.)

The unrooted Dollo approach is particularly convenient for restriction-site characters since it does not require the construction of a hypothetical ancestor, only the inclusion of one or more outgroup taxa. If a site is present in some of the ingroup taxa and in one or more of the outgroup taxa as well, then the most recent common ancestor of the ingroup is assumed to have had the site. The analysis will then seek to minimize the number of losses of the site over the full tree (ingroup and outgroup). If, on the other hand, the site is found only in some of the ingroup taxa but not in the outgroup, then the site is assumed to be ancestrally absent with respect to the ingroup, and a single gain will be postulated at an optimal location within the ingroup. Remember that the specification of "site absent" as the ancestral condition does not imply that the site was absent in the most recent common ancestor of the ingroup taxa, only that the site was absent in some, perhaps quite distant, ancestor.

The drawback to use of Dollo parsimony for restriction-site characters is demonstrated in Figure 5. If, despite its unlikelihood, a particular restriction site does originate independently in two lineages (Figure 5A), then the actual number of evolutionary changes can be drastically overestimated (Figure 5B) due to the strict enforcement of the requirement for unique originations. This pathological behavior may occur more often than the reader might suspect. Suppose one particular position within the restriction site were less constrained than the others, and further suppose that transition substitutions at this position were much more likely to occur than transversions. Then it is easy to imagine that the nucleotide at this position would, on an evolutionary time scale, toggle between the two purines (or pyrimidines). The site would then "blink" on and off, depending on which base was present at any particular point on a lineage. If we permitted only a single origination of the site, the number of losses we would be forced to postulate could become large.

Figure 5 Demonstration of problems affecting Dollo parsimony if multiple originations of the derived state actually occur. (A) "True" tree has two steps due to independent derivations of state 1. (B) Reconstruction obtained under Dollo parsimony requires 11 steps (one derivation of state 1 and ten reversals to the ancestral state 0).

One way to avoid this problem is to adopt a "relaxed" Dollo criterion. For example, we might prefer one gain and two losses to two independent gains, but we might prefer two independent gains to one gain and ten losses. The generalized parsimony method, discussed later, provides a mechanism for implementing a relaxed Dollo model.
Camin-Sokal Parsimony

The method of Camin and Sokal (1965) was actually the first discrete-character parsimony approach to be described. It makes the strongest assumption of any of the methods discussed so far, namely, that evolution is irreversible.* We mention it here only for the sake of completeness, since it is highly unlikely that the assumption of irreversibility could be justified for any type of molecular data.

*Some readers, familiar with "Dollo's Law of Irreversibility," may be confused at this point. The Dollo parsimony model does not assume complete irreversibility, only that a derived character state cannot be lost and then regained. The Camin-Sokal model does not permit a derived character state to return to the ancestral condition.

Transversion Parsimony

A common observation (e.g., W.M. Brown et al., 1982) is that transition substitutions occur more frequently than transversions in a given gene. For some molecules, it might even be argued that transitions occur so frequently that they quickly degenerate into noise and should therefore be ignored altogether. A simple method for ignoring transitions is to re-code the four nucleotides as either R (purine; A or G) or Y (pyrimidine; C or T). Standard Wagner parsimony may then be applied to the resulting binary-coded matrix.

A disadvantage to the complete rejection of information on transitions is that, while transitions may become saturated over long evolutionary distances, they may nonetheless be highly informative with respect to relationships among closely related taxa. One way around the dilemma is to assign greater weight to transversions than transitions, without going so far as to give transitions zero weight, as does transversion parsimony. Generalized parsimony can also be used for this purpose, as outlined below.

Note that some authors (e.g., Lake, 1987a) use the term transversion parsimony in a different sense than we describe here.

Generalized Parsimony

All of the above parsimony variants can be subsumed into a generalized method that assigns a cost for the transformation of each character state to the other possible states (Sankoff, 1975; Sankoff and Rousseau, 1975; see Sankoff and Cedergren, 1983, for a somewhat more consumer-oriented treatment and note that generalized parsimony is our term, not theirs). The costs can be represented as an m-by-m matrix S, where S_ij represents the increase in tree length (weight) associated with a transformation from state i to state j, and m is the total number of possible states. Three such weighting matrices, corresponding to the Wagner, Fitch, and Dollo parsimony criteria, are shown in Figure 6A-C. An exact, dynamic programming algorithm can be used to determine the minimum length required on a given tree for any particular choice of costs and to obtain one or all of the MPRs that yield this length (Sankoff and Cedergren, 1983); because of the complexity of this algorithm, we will not attempt to describe it here (but see Swofford and Maddison, 1992, for an introductory presentation).

Unfortunately, the generalized parsimony approach is much more computationally expensive than the algorithms described above for certain special cases (although a new procedure described by Wheeler and Nixon, 1995, may provide a faster approximation). Its advantage lies in its generality. For instance, S is not required to be symmetric. Relaxation of this requirement provides a means of implementing a relaxed Dollo criterion: by making the cost of a forward transformation greater than that of a backward transformation, we can prefer single-gain, multiple-loss scenarios until the number of losses becomes great enough that we are willing to allow independent gains. For example, the step matrix shown in Figure 6D would prefer one gain and two losses over two gains, but would prefer two gains over one gain and four losses. Generalized parsimony can also be used to attach greater importance to transversions than to transitions by assigning costs such that changes between two purines or between two pyrimidines receive lower weight than changes from a purine to a pyrimidine or vice versa (e.g., Figure 6E).

Perhaps the most troublesome aspect of generalized parsimony is determining how to choose the costs for different kinds of transformations.
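For readers who want a concrete picture of how a cost matrix determines tree length, the following Python fragment is a minimal sketch of one common formulation of a dynamic programming computation of this kind. It is offered under stated assumptions rather than as the published algorithm: the function name, the nested-tuple tree representation, and the 2:1 transversion:transition costs are hypothetical, and the two example trees are our reading of the worked Wagner and Fitch examples in the text, with terminal labels other than those named there invented for illustration. For each node the sketch stores, for every possible state, the minimum cost of the subtree below it; the tree length is the smallest entry at the root. Taking the minimum over root states is appropriate for the symmetric matrices used here; with an asymmetric matrix one would instead fix the root to the assumed ancestral state.

```python
import math


def sankoff_length(tree, states, cost):
    """Minimum total cost of one character on a rooted tree under matrix `cost`.

    cost[s][d] is the cost of a change from state s (ancestor) to d (descendant).
    """
    n = len(cost)

    def visit(node):
        # Vector of minimum subtree costs, one entry per state at this node.
        if isinstance(node, str):                      # terminal taxon
            return [0 if s == states[node] else math.inf for s in range(n)]
        child_vecs = [visit(child) for child in node]
        return [sum(min(cost[s][d] + vec[d] for d in range(n))
                    for vec in child_vecs)
                for s in range(n)]

    return min(visit(tree))


# Cost matrices for four states (compare the descriptions of Figure 6A, B, E).
wagner = [[abs(i - j) for j in range(4)] for i in range(4)]         # ordered
fitch = [[0 if i == j else 1 for j in range(4)] for i in range(4)]  # unordered
A, C, G, T = 0, 1, 2, 3
purines = {A, G}
tv_weighted = [[0 if i == j else (1 if (i in purines) == (j in purines) else 2)
                for j in range(4)] for i in range(4)]               # assumed 2:1

# Ordered example: root terminal in state 0, pairs (0, 2) and (1, 3).
ordered_tree = ("A", (("B", "C"), ("D", "E")))
ordered_states = {"A": 0, "B": 0, "C": 2, "D": 1, "E": 3}

# Unordered example: root terminal in state C; hypothetical labels p, q, s, t.
unordered_tree = ("r", ((("p", "q"), "s"), "t"))
unordered_states = {"r": C, "p": A, "q": C, "s": A, "t": G}

wagner_len = sankoff_length(ordered_tree, ordered_states, wagner)        # 5
fitch_len = sankoff_length(unordered_tree, unordered_states, fitch)      # 3
weighted_len = sankoff_length(unordered_tree, unordered_states, tv_weighted)  # 5
```

With the ordered and unordered matrices the sketch returns 5 and 3, the lengths obtained in the pencil-and-paper examples; the work per node grows with the square of the number of states, which gives some feeling for why the general approach is more expensive than the special-case algorithms.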
Figure 6 Cost matrices for generalized parsimony. (A) Cost matrix equivalent to Wagner parsimony (ordered characters). (B) Cost matrix equivalent to Fitch parsimony (unordered characters). (C) Cost matrix equivalent to Dollo parsimony; M is an arbitrarily large number, guaranteeing that only one transformation to each derived state will be permitted. (D) Cost matrix that assigns greater weight to gains (0 → 1 changes) than to losses (1 → 0 changes). (E) Cost matrix that assigns greater weight to transversions than to transitions.

One approach is to assign weights consistent with the researcher's assumptions about the relative frequency of different kinds of events. As a matter of general principle, we disagree with those who argue that a priori weighting of different kinds of changes introduces an unacceptable level of subjectivity into the analysis; an assumption of equal weights is itself a strong assumption. If, for example, we examined an alignment and observed that of 200 variable positions (columns), 80 contained only A and G, 80 contained only C and T, and only 40 contained a mixture of purines and pyrimidines, the conclusion that transitions occur much more frequently than transversions would not be controversial. In this case, a transversion:transition weighting of 1:1 would certainly represent a stronger assumption than a 2:1 weighting. Even if we have no idea how much more frequently transitions occur than transversions, a transversion:transition weight such as a 1.1:1 weighting may be desirable. Suppose that under equal weighting one tree required 5 homoplastic transversions and 3 homoplastic transitions, while another tree required 1 homoplastic transversion and 7 homoplastic transitions. Whether the "optimal" transversion:transition weighting is 2:1, 3:1, or 20:1, the tree requiring only 1 "extra" transversion would be preferable and would be chosen as superior under the 1.1:1 weighting scheme. Similar arguments can also be advanced for the use of gain:loss weights other than 1:1 for restriction sites.

An alternative to assuming a particular set of costs based on extrinsic criteria is to estimate the appropriate weights from the data themselves. Williams and Fitch (1989) discussed methods for choosing initial weights and for refining them by iterative improvement. Unfortunately, these methods may be sensitive to the starting point, a frequent drawback to successive approximation methods. Iterative approximation of optimal weights remains an area of active research, and further developments may be expected in the near future (for more on this subject see the section on "Reliability of Inferred Trees").

The methods developed by Sankoff and his colleagues were also designed to construct optimal alignments on a given tree by incorporating insertion/deletion weights (with insertions of gaps as appropriate) in addition to the substitution weights. This strategy is very appealing in that it effectively merges the problems of alignment and tree selection into a single problem. Insertions and deletions are treated as events localized to particular branches on the tree in order to maximize the overall parsimony. The alternative method, construction of a multiple alignment prior to the phylogenetic analysis, is vastly inferior, since the topology of the tree cannot be ignored when deciding where to place gaps.

Unfortunately, rigorous application of Sankoff's method is computationally difficult for more than three sequences and one interior node. Sankoff et al. (1976) described an iterative procedure that rigorously aligns within local regions of a tree (three sequences adjacent to a single interior
node), sacrificing the guarantee of global optimality but providing greater tractability. Nanney et
al. (1989) described and programmed a more approximate, but much faster, procedure that operates by
assuming that lengths of insertions and deletions are sufficiently small to allow alignment within a
local "window" rather than obtaining a global alignment for any triplet of sequences. Hein (1990a,b)
and Wheeler and Gladstein (1994) have developed useful programs for simultaneous alignment and tree
optimization (see Chapter 9 for details).

Parsimony on Protein Sequences

Because this book does not specifically deal with amino acid sequencing, our discussion of parsimony
methods for treating these sequences will be brief. Three general procedures have been used. The
first, and simplest, is to minimize the number of amino acid replacements by using Fitch parsimony
as described above (i.e., each position in the aligned sequences is a multistate unordered
character, of which the possible states are the 20 possible amino acid residues). This approach,
apparently used first by Eck and Dayhoff (1966), ignores the genetic code by failing to consider the
minimal number of nucleotide substitutions required for the replacement of one amino acid by another
(i.e., some replacements require a single nucleotide substitution, whereas others require two or
even three substitutions).

Goodman, Moore, and their colleagues developed a more sophisticated approach (reviewed by Goodman,
1981) that seeks trees requiring the fewest nucleotide substitutions at the mRNA level. They
produced an algorithm that generalizes the Fitch parsimony approach to codons, taking into account
the degeneracy of the genetic code and guaranteeing that one obtains the minimum number of
nucleotide substitutions required by any given tree. (A highly readable presentation of the
algorithm, including a worked example, appears in G.W. Moore, 1976; see also Goodman et al., 1979.)
A more recent modification to their algorithm, by J. Czelusniak, permits the mixture of amino acid
and nucleotide sequences (when available) in the same analysis (Goodman, 1981). Despite its
elegance, the Moore-Goodman-Czelusniak algorithm may be overkill in the sense that it pays too much
attention to silent substitutions (e.g., substitutions at third positions that do not change the
corresponding amino acid). If silent substitutions occur so frequently that information from third
positions quickly reaches saturation, then these positions would contribute mainly noise (or worse,
systematic error) and should therefore be ignored. Weighting methods presumably could be used to
minimize the contribution of third positions without ignoring them entirely. To our knowledge,
however, such methods have not been used.

A third approach, intermediate between the first two, has been implemented by Felsenstein (1993) in
his PROTPARS program from the PHYLIP package but has yet to be formally described in the literature.
Unlike the Eck-Dayhoff approach, it does consider the genetic code, but it also deviates from the
Moore-Goodman-Czelusniak method by ignoring silent substitutions. Although ignoring silent
substitutions sounds like extra work, the required bookkeeping is in fact simplified considerably
because the program does not need to consider all the potential mRNA codons responsible for a
particular amino acid residue or all of the potential synonymous codon assignments to the interior
nodes. For example, PROTPARS would assign one step to a change from lysine to arginine (e.g.,
AAA → AGA), but two steps to a change from lysine to proline (e.g., AAA → CAA (glutamine) → CCA).
Changes such as phenylalanine to glutamine require three nucleotide substitutions (e.g.,
UUU → CUU (leucine) → CUA (leucine) → CAA) but are counted as only two steps, since the middle
substitution is silent.

One could take Felsenstein's argument a step further. Because of the biochemical properties of the
various amino acids, there is often little selection against changes between amino acids having
similar properties (e.g., between aspartic and glutamic acids). If changes between similar residues
occur very frequently, perhaps we should ignore them as well (or at least give them less weight).
The generalized parsimony method can be used to implement this strategy (Marsh et al., 1994), with
the weights derived from the matrices presented by Dayhoff (1978) or Henikoff and Henikoff (1992).
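
The difference between counting nucleotide substitutions and counting only amino acid replacements
can be sketched in a few lines of Python. The sketch below is an illustration under stated
assumptions, not the PROTPARS algorithm itself: it uses the standard genetic code, lets a codon
change by one nucleotide at a time through sense codons only, and charges one step for a change that
alters the amino acid and nothing for a silent change. The function names are hypothetical.

```python
from collections import deque
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = {c: aa for c, aa in CODE.items() if aa != "*"}  # the 61 sense codons


def codons_for(aa):
    """All sense codons that encode the one-letter amino acid aa."""
    return [c for c, a in SENSE.items() if a == aa]


def neighbors(codon):
    """Sense codons that differ from codon at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in SENSE:
                    yield mutant


def min_substitutions(aa1, aa2):
    """Fewest nucleotide differences between any codon of aa1 and any codon of aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons_for(aa1) for c2 in codons_for(aa2))


def replacement_steps(aa1, aa2):
    """Fewest amino-acid-changing steps from aa1 to aa2 (silent changes are free).

    Implemented as a 0-1 breadth-first search over the sense codons.
    """
    dist = {c: 0 for c in codons_for(aa1)}
    queue = deque(dist)
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            cost = 0 if SENSE[nxt] == SENSE[cur] else 1
            new = dist[cur] + cost
            if new < dist.get(nxt, float("inf")):
                dist[nxt] = new
                if cost == 0:
                    queue.appendleft(nxt)
                else:
                    queue.append(nxt)
    return min(dist[c] for c in codons_for(aa2))


for a, b in [("K", "R"), ("K", "P"), ("F", "Q")]:
    print(a, b, min_substitutions(a, b), replacement_steps(a, b))
```

For the three examples in the text (lysine to arginine, lysine to proline, phenylalanine to
glutamine), the first count corresponds to the nucleotide-level view and the second to the
replacement-only view. Real programs differ in details, such as the treatment of serine, whose
codons fall into two groups that are not adjacent.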

Parsimony on Allozyme Data

The problems with treating allele frequencies or presence/absence as characters in a phylogenetic
analysis were discussed above (see "Isozyme Data" in the section "Types of Data"). To clarify these
issues within the context of parsimony analysis, consider the example in Figure 7. If the alleles
indicated in Figure 7A are scored as present or absent and then treated as independent characters,
the most parsimonious reconstruction under Wagner parsimony assigns no alleles to ancestor Y
(Figure 7B), an outcome that most biologists would find unacceptable. A similar example could have
been constructed using allele frequencies rather than presence/absence, in which the most
parsimonious reconstruction assigned ancestral frequencies that summed to a value less than 1.

(A)        Allele frequencies            Presence/Absence
Taxon    a     b     c     d     e       a   b   c   d   e
A        0.8   0     0     0     0.2     1   0   0   0   1
B        1.0   0     0     0     0       1   0   0   0   0
C        0     0.5   0     0.5   0       0   1   0   1   0
D        0     0     1.0   0     0       0   0   1   0   0
E        0     0     1.0   0     0       0   0   1   0   0

Figure 7 Demonstration of one problem with "independent alleles" coding. (A) Allele frequencies and
data matrix resulting from presence/absence coding. (B) The most parsimonious reconstruction for the
tree indicated assigns no alleles to ancestor Y.

J.S. Rogers (1984, 1986) and Swofford and Berlocher (1987) developed methods for minimizing the
total amount of frequency change on a given tree, subject to the constraint that the array of allele
frequencies (for a particular locus) assigned to each interior node of the tree must exist in
"allele frequency space" (a hyperplane in which the sum of the frequencies for all alleles is 1).
These methods differ only in the choice of distances used for measuring branch lengths. Rogers'
original method (1984) was derived for his earlier (1972) distance measure; he later extended it to
a variety of other (mostly Euclidean) distance measures. His procedure uses the optimization
technique of "hyperboloid approximation," which requires that the distance measure be representable
as a differentiable function. Swofford and Berlocher (1987) argued for the superiority of the
Manhattan metric and were forced to solve the problem via linear programming.

Methods that use allele frequencies rather than presence/absence are often criticized on the grounds
that the allele frequencies are too easily modified by random drift and/or selection, and therefore
do not provide reliable information for phylogenetic analysis (e.g., Mickevich and Johnson, 1976).
In some cases, allele frequencies are known to vary temporally over the span of a few years, and
this observation also has been used to question their relevance to phylogeny (Crother, 1990). We
would argue, however, that even if the
information contained in allele frequencies is somewhat unreliable, the frequencies at least provide
a way to weight the presence or absence of particular alleles. For example, if an allele were
detected sporadically in the taxa being analyzed, but never at frequencies higher than 0.04, we
would be hesitant to attach much importance to the shared presence of that allele in some of the
taxa; it could easily be present in other taxa at similar frequencies, but missed due to sampling
error. On the other hand, an allele that is either fixed or nearly fixed whenever it occurs is
probably more indicative of relationship. It should be emphasized that adopting a cutoff frequency
(typically 0.05) does not solve the problem unless a researcher is willing to assert that an allele
known to occur in a sample at an estimated frequency of, say, 0.04 is "not present."

Although the Rogers and the Swofford-Berlocher methods are conceptually simple, the computer
algorithms used to implement them are quite complex; the interested reader should refer to the
original papers for details. These methods are also much slower than other parsimony methods.
However, Berlocher and Swofford (1996; see Swofford, 1996) have developed a fast approximation using
generalized parsimony on single-locus Manhattan distance matrices (for a given tree, this algorithm
obtains an exact solution to Swofford and Berlocher's 1987 MANOB criterion).

OPTIMALITY CRITERIA II: METHODS BASED ON MODELS OF EVOLUTIONARY CHANGE

The Utility of Models

Although the parsimony methods described above are based on specific optimality criteria, they do
not require explicit models of evolutionary change. Considerable disagreement exists as to whether
the "model-free" nature of parsimony is an advantage or a disadvantage. Regardless of where one
stands on this issue, however, one thing is clear: parsimony does make assumptions, and violation of
these assumptions can lead to problems. The difficulty lies in stating precisely what the
assumptions are. At a minimum, acceptance of an optimal tree under the parsimony criterion requires
one to assume that conditions that can cause parsimony to estimate an incorrect tree are unlikely to
have occurred.* The ability of an estimation method to converge to a true value (in this case the
correct tree) as more data are accumulated is known as consistency. Felsenstein (1978a) showed that
parsimony methods can make inconsistent estimates of the true phylogeny under one simple
evolutionary model.

*An alternative position is that parsimony is required as a method of scientific inquiry regardless
of any considerations about whether it is more or less likely to recover the true phylogeny than
other methods. Some proponents of this view hold that since the truth is essentially unknowable, we
should abandon the search for it and simply choose the most parsimonious solution for its own sake.
We do not subscribe to this position. Although the true phylogeny may be "unknowable," it can
nonetheless be estimated, and we view phylogenetic methods as means to that end rather than an end
in themselves.

†Parsimony in the traditional sense, i.e., "uncorrected parsimony"; see the end of this section.

Parsimony and Inconsistency

Examination of the conditions under which parsimony† is an inconsistent estimator will be helpful in
understanding the usefulness of explicit evolutionary models. We will first present a non-technical
examination of the problem; in a later section ("Model-Based Corrections for Character Data:
Hadamard Conjugation") we will look at the issue more rigorously. Suppose that the true phylogeny
for a group of four taxa is as shown in Figure 8A, where the lengths of the branches indicate the
relative expected amount of evolutionary change along each branch under some model of evolution
(e.g., the model of Jukes and Cantor, 1969). For whatever reason, the rate of evolution has been
accelerated in the peripheral branches leading to taxa 1 and 4. Each nucleotide position will have
some ancestral nucleotide (e.g., A in Figure 8A). Suppose that the short branches are so short that
there are essentially no changes along the lineages leading to taxa 2 and 3. One of four possible
classes of nucleotide patterns will then
result from changes along the lineages leading to taxa 1 and 4. In patterns of type I, taxa 1 and 4
both retain the ancestral nucleotide, so the position is constant and therefore uninformative under
the parsimony criterion. (Note that the observation of pattern I does not imply that no
substitutions have occurred, only that the nucleotides observed in the terminal taxa are identical
at the tips of the tree, regardless of the number of changes that have actually occurred.) Patterns
of types II (a change along only one of the two long branches) and III (a change to a different
nucleotide along each long branch) are likewise uninformative, as one can explain each pattern with
a single change (pattern II) or two changes (pattern III) along the peripheral branches of any of
the three possible unrooted trees. Only patterns of type IV are informative under parsimony.
Unfortunately, this pattern supports an incorrect tree (Figure 8C); it is actually misinformative
about evolutionary relationships. Patterns that support the true tree (Figure 8B) will occur only
rarely, because they require an unlikely change along the central branch (or, even less likely, two
parallel changes along the long and short branches on the same side of the tree).

Pattern type   Taxon 1   Taxon 4   Status under parsimony
I              A         A         Uninformative (constant)
II             A         G         Uninformative
III            C         G         Uninformative

Figure 8 Demonstration of the potential inconsistency of parsimony methods. (A) Hypothetical
four-taxon tree containing two long peripheral branches, with all other branches being very short.
(B) Unrooted equivalent for the tree shown in (A). (C) Incorrect tree selected by maximum parsimony.
See text.

Felsenstein (1978a) called the behavior of parsimony in situations like that shown in Figure 8
"positively misleading" because as the number of characters (e.g., sequence length) increases, we
become more and more certain to infer an incorrect tree. Stated another way, when we are in the
Felsenstein zone (of inconsistency), the only hope of getting the correct tree is by sampling few
enough characters that we may be lucky enough to obtain more of the character patterns favoring the
true tree than of the more probable character patterns favoring the wrong tree. As we will see
below, methods that attempt to account for unobserved as well as observed substitutions (maximum
likelihood and distance methods using "multiple-hit" corrections) will not be inconsistent under
this model and will estimate the correct tree as long as enough data are available to overcome
sampling error.* Although in this case the inconsistency is due to strongly unequal rates of change
along different branches, Hendy and Penny (1989) demonstrated other scenarios that lead to
inconsistency even with equal rates of change throughout the tree (i.e., a molecular clock) and
suggested the term "long-branch attraction" for this general phenomenon.

*Felsenstein's results have often been criticized (e.g., Farris, 1983, 1986b) because they are based
on unrealistic and restrictive models of the evolutionary process. This criticism is unjustified,
however, as the point could equally well be made with much more general and believable models, but
requiring more complex mathematics. Farris's (1983, 1986b) point that a maximum likelihood method
will guarantee consistency only if evolution proceeds according to the assumed model is of course
true, a point to which we will return later.

Steel et al. (1993a) have emphasized that parsimony is a criterion for choosing an optimal tree for
a data set, whether the data are the original data or some transformation of those data. They show
that, for conditions such as those shown above, parsimony can still make a consistent estimate of
the phylogeny if the data are first corrected for unobserved substitutions using a Hadamard
conjugation (see below). The correction formally involves transformation of the original data matrix
to a new data matrix containing 2^(T-1) characters for T taxa (in the case of two states), each with
an associated weight. Weighted parsimony analysis of this new data set will not be inconsistent as
long as the other assumptions of the model are satisfied (e.g., equal rates of change at each site).
In extreme cases, this new data set may contain highly weighted character patterns that were
completely absent from the original data set, so the method is quite different from the conventional
usage of the term parsimony. Consequently, we will use parsimony to mean uncorrected parsimony
unless otherwise indicated.
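
The four-taxon argument for inconsistency can be checked numerically. The following Python sketch
computes, for a true tree ((1,2),(3,4)), the exact probability of each parsimony-informative site
pattern. It assumes a Jukes-Cantor-like process in which a branch changes state with total
probability p and each of the three alternative nucleotides is equally likely; the values 0.5 (long
branches) and 0.05 (short branches) are illustrative choices, not figures from the text, and the
function names are hypothetical.

```python
from itertools import product

STATES = "ACGT"


def p_branch(x, y, p):
    """Probability of state y at the end of a branch that starts in state x.

    p is the total probability that the two ends differ; the three
    alternative states are assumed equally likely (Jukes-Cantor-like).
    """
    return 1.0 - p if x == y else p / 3.0


def pattern_probs(p1, p2, p3, p4, p_mid):
    """Site-pattern class probabilities on the true tree ((1,2),(3,4)).

    p1..p4 are the change probabilities on the branches to taxa 1..4 and
    p_mid is the change probability on the central branch.
    """
    totals = {"12|34": 0.0, "13|24": 0.0, "14|23": 0.0, "other": 0.0}
    for a, b, c, d in product(STATES, repeat=4):       # tip states of taxa 1..4
        prob = 0.0
        for u, v in product(STATES, repeat=2):         # states at the two interior nodes
            prob += (0.25 * p_branch(u, a, p1) * p_branch(u, b, p2)
                     * p_branch(u, v, p_mid)
                     * p_branch(v, c, p3) * p_branch(v, d, p4))
        if a == b and c == d and a != c:
            key = "12|34"      # supports the true tree
        elif a == c and b == d and a != b:
            key = "13|24"
        elif a == d and b == c and a != b:
            key = "14|23"      # groups the two long branches
        else:
            key = "other"      # uninformative under parsimony
        totals[key] += prob
    return totals


# Long branches to taxa 1 and 4, all other branches short.
zone = pattern_probs(p1=0.5, p2=0.05, p3=0.05, p4=0.5, p_mid=0.05)
# All five branches short, for comparison.
equal = pattern_probs(p1=0.05, p2=0.05, p3=0.05, p4=0.05, p_mid=0.05)
print(zone)
print(equal)
```

Under the first setting the pattern that groups taxa 1 and 4 is expected more often than the pattern
that supports the true tree, so adding sites only strengthens support for the wrong tree; under the
second setting the true tree's pattern dominates.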

Differences in Perspective between Parsimony and Likelihood

Even under conditions where parsimony is consistent, alternative methods that incorporate models of
evolutionary change can make more effective use of the data, as demonstrated in the example of
Figure 9. The tips of the tree in Figure 9A are labeled by the nucleotides observed at one sequence
position. (Although the tree is shown as a rooted tree, it is formally unrooted, with the path
between ancestor 1 and the outgroup treated as a single branch.) As a preliminary step to our formal
introduction to maximum likelihood, it will be instructive to examine (qualitatively) the
perspectives of parsimony and maximum likelihood with respect to the identity of the corresponding
nucleotide in ancestor 1 of this tree. Using the methods of Fitch parsimony described above, we find
that the most parsimonious state assignment for ancestor 2 is an A (an obvious choice, as all of
ancestor 2's descendants possess A as well). Thus, ancestor 1 has given rise to a lineage with an A
and a lineage with a C. It is also related to a lineage (leading to the outgroup) with a G.
Assignment of any one of these three nucleotides to ancestor 1 would be equally parsimonious, with
each reconstruction explaining all of the tip nucleotides at this position with exactly two changes.
(If ancestor 1 had a T at this position, three character-state changes would be required.)
Consequently, a new sequence with a C could be inserted equally parsimoniously (with respect to this
position) into branches α, β, or γ of the tree (Figure 9B-D). More generally, because adding a
sequence with an A to any branch of the tree would require no additional steps and adding a sequence
with a T would require a single additional step for every branch, this sequence position would be
uninformative regarding the placement of a sequence with an A or a T. This position would predispose
a lineage containing only a C or a G to originate from branches α, β, or γ, because connecting such
a lineage anywhere in the subtree descending from ancestor 2 would entail an extra nucleotide
substitution.

Figure 9 Example used to show difference in perspective between parsimony and likelihood methods.
(A) Hypothetical tree with branch lengths drawn proportionally to expected number of substitutions,
labeled by base observed at a particular site. (B,C,D) Insertion of a new taxon containing a C at
the site of interest to branches α, β, and γ, respectively, of the tree shown in (A).

Now consider the maximum likelihood perspective. In maximum likelihood estimation, we choose the
hypothesis that maximizes the probability of observing the data we have obtained (i.e., the tip
sequences). To calculate this probability, we need a model of evolutionary change. For now, suppose
that the rate of substitution from any nucleotide to any other nucleotide is the same for all
nucleotide pairs, and that the expected number of such substitutions along any one branch is a
function of this substitution rate and the length of the branch in evolutionary time. (This is an
oversimplified version of the Jukes-Cantor model of nucleotide sequence change, discussed in more
detail below.) For the moment, also assume that the substitution rate is the same throughout the
tree (we will see later that this assumption is not necessary; it merely allows us to think of
branch lengths as amounts of evolutionary time). The observation that all eight descendants of
ancestor 2 have nucleotide A is most consistent with change being rare, so postulated histories with
fewer changes are more plausible than histories with more changes. Thus, from a maximum likelihood
perspective, ancestor 2 would have an A in those histories (ancestral state reconstructions) having
the highest probability of giving rise to the observed nucleotides. Although histories in which
ancestor 2 had a C, G, or T would also contribute to the overall probability of the specified tree
having generated the observed data, if all branches in the subtree were very short, histories with
an A at ancestor 2 would contribute the vast majority of the total probability. This is as close as
maximum likelihood gets to saying "ancestor 2 had an A."

We now move to ancestor 1. The branches connected to this ancestor lead to ancestor 2 (probably an
A) and to sequences known to possess a C and a G (the outgroup), respectively. Ignoring the G for
the moment, let us consider whether ancestor 1 is more likely to have possessed an A or a C, given
the topology of the tree and the nucleotides found in the tip sequences. If ancestor 2 indeed
possessed an A as expected, at least one change must have occurred along the path between ancestor 2
and the tip having a C (i.e., branches α and β). Because branch lengths represent the expected
number of character-state changes along a branch, when a branch is short, there is a relatively low
probability of a single change occurring along that branch, and an almost negligible probability of
more than one change. Thus, given that a character change (probably) occurred somewhere along
branches α or β, it is far more likely to have occurred along the long branch β than the short
branch α. Thus, ancestor 1 is much more likely to have possessed an A than a C. Remember, however,
that the estimate of A at ancestor 1 is a probabilistic statement. When the same configuration of
tip states arises at different sites, the nucleotide found in the actual ancestor will usually be an
A, but it would sometimes be a C, and occasionally it would even be a G or a T.

Returning our attention to the full tree, we know that at least two changes must have occurred, and
since change is rare in this example, histories with three or more changes are less likely than
those with only two changes. But on which two of the three branches (α, β, or γ) are the changes
most likely to have occurred? Because branch α is so short, it is much more likely that the two
changes have occurred on branches β and γ than on any pair of branches involving branch α.
Therefore, histories with an A in ancestor 1 are more likely than others to have generated the
observed data, and due to the greater length of branch γ, histories with a C in ancestor 1 are more
likely than are those with a G. It seems extremely unlikely under our model that ancestor 1 would
have possessed a T. Thus, we obtain a clear ordering of preferences for all residues. An important
practical consequence is that, unlike parsimony, this sequence position would be informative with
respect to the placement of a new sequence containing a C at the site, biasing the decision toward
connecting this new sequence to branch β (Figure 9C).

It is important to remember that the only reason for the appropriate predisposition toward tree 9C
is the short length of branch α and the low overall rate of change. In this case, an improbable
substitution along branch α is avoided by placing the change along the branch leading to the tips
with nucleotide C in Figure 9C. For either of the trees of Figure 9B and 9D, avoiding a substitution
along the original branch α would require parallel A → C changes along the lineages terminating at
taxa possessing nucleotide C. These parallelisms would be improbable events if the rate of change is
low, but they become more probable as rates increase. Thus, as branch α becomes longer and the rates
of change grow faster, the preference for tree 9C will decrease.

In summary, whereas parsimony ignores information on branch lengths when evaluating a tree, maximum
likelihood considers that changes are more likely along long branches than short ones, and
estimation of branch lengths is an important component of the method. This difference explains the
consistency of maximum likelihood
under many situations in which parsimony is in- posed changes and af sampling variance is that
consistent. In the example of Figure 8, maximum even with very short sequences, maximum IikeIi-
likelihood will not be fooled by the "misinforma- hood tree inference tends to outperform alterna-
tive" pattern JV,because this pattern is very likely tive methods (e.g. parsimony or additive dis-
to occur even on the true tree. Distance methods tances) when evaluated under many models of
that adequately account for unobserved substitu- sequence evolution (see, e.g., Hasegawa and Fuji-
tions wdl also succeed in this case, altl~oughthey wara, 1993; Kuhner and Felsenstein, 1994;
tend to be less efficient, requiring more data to Huelsenbeck, 1995a).
achieve the same level of accuracy (e.g., see Hillis Several areas of biological research, notably ge-
et al., 1994b; Kul~nerand Felsenstein, 1994; netic mapping and clinical testing, routinely use
Huelsenbeck, 1995a,b). maximum likelihaod metlxods for testing hypothe-
ses, However, the perceived and a c l a l complexi-
ties of obtaining maximum likelihood solutions to
Maximum Likelihood Methods problems that involve numerous alternative hy-
Maximum likelihood methods of phylogenetic in- potheses has inhibited the more general use of
ference evaluate a hypothesis about evolutionary these techniques. The following discussion at-
history in terms of the probability that a proposed tempts to outline the elements of a maximum like-
model of the evolutionary process and the hy- lihood formation of phylogenetic inference. For ad-
pothesized history would give rise to the ob- ditional perspective, Goldman (1990) provides a
served data. It is conjectured that a history with a very accessible introduction to these concepts.
higher probabjlity of giving rise to the current
state of affairs is a preferable hypothesis to one Objective
with a lower probability of reaching the observed Phylogenetic analysis seeks to infer the history (or
state. Maximum likelihood estimation was first set of histories) that are most consistent with a set
used in phylogenetic inference by Cavalli-Sforza of observed data. In the present case, the data are
and Edwards (1967). However, because they did observed nucleotide (or protein) sequences; the
not use sequence data, this work remained rela- unknowns are the branching order and branch.
tively obscure. Felsenstein (1981a, 1993) brought lengths of the tree. To apply a maximum likeli-
the maximum likelihood framework to nu- hood approach, a concrete model of the evolu-
cleotide-based phylogenetic inference, Later, max- tionary process that accounts for the conversion
imum likelihood was applied to amino acid se- of one sequence into another must be specified.
quence data as well (Kishino et al., 1990; Adachi This model may be fully defined; alternatively, it
and Hasegawa, 1992). may contain many parameters that are to be esti-
In addition to its consistency properties, max- mated froin the data. A maximum likelihood ap-.
imum likelihood is useful because it often yields proach to phylogenetic inference evaluates the
estimates that have lower variance than other probability that the chosen evolutionary model
methods (i.e., it is frequently the estimation will have generated the observed sequences (the
method Ieast affected by sampling error). It also probability of the data under the model); phylo-
tends to be robust to many violations of the as- genies are then inferred by finding those trees that
sumptions used in its models. Part of its power in yield the highest likelihoods.
this respect is that many models of sequence evo- The basic principles involved in calculating
lution that assume identical distributions across the likelihood of a tree are introduced in Figure
sites can safeIy assume that the actual substitution 10. Figure 10A shows a set of aligned nucleotide
processes taking place at different sites have sequences for four taxa. Suppose we want to
much in common, even if they are not exactly evaluate the likelihood of the unrooted tree
identical. Consequently, the major components shown in Figure IOB; that is, what is the proba-
determining the evolution of sequences can be de- bility that this tree could have generated the data
scribed by just a few parameters. The overall re- of Figure 10A under our cl~osenmodel? Because
sult of both improved~compensationfor superim- most of the models currently used are time-re-
Phy logenetic Inference 432.
Figure 10 Overvlew of thc calculation of t11c ilkell-

hood of a tree. (A) Hypotl~etlcal. sequence alignmc.nt
(B) Ail unrooted tree for the four taxa whose sequcnies
appear m (A). (C) Trce aftcr rooting at an arbitrary m-
ternal node (D) The llkel~hoodfor a particular slte 1s
the sum of the probablllties of cvcry possible recon-
structlon of ancestral states glve~isome modcl of base
substltut~ori.(E) The likelihood of the full. tree 1s thc
product of the llkellhoods a t each slte. (F) Thc 11kcl1-
hood 1s usually evaluated by sumtnlng the log or Lhc
l~kcl~hosds a t each slte, and reported as the log Iikc11-
hood of the full trce.
versible, the likelihood of the tree is generally independent of the location of the
root. It is convenient to root the tree at an arbitrary internal node (e.g., Figure 10C).

Under the assumption that nucleotide sites evolve independently, we can calculate the
likelihood for each site separately, and combine the likelihoods into a total value at
the end. To calculate the likelihood for some site j, we must consider all of the
possible scenarios by which the tip sequences could have evolved. Obviously, some of
these scenarios are much more plausible than others, but every scenario has at least
some probability of generating any pattern of observed nucleotides. More specifically,
for any given site, the node at the root of Figure 10C might have possessed an A, a C,
a T, or a G. For each of these possibilities, the other internal node might also have
possessed any of the four nucleotides. Thus, there are 4 x 4 = 16 possibilities to
consider. Since any one of these scenarios could have led to the nucleotide
configuration at the tips of the tree, we must calculate the probability of each and
sum them to obtain the total probability for each site. This calculation is illustrated
schematically in Figure 10D. Because we assume a Markov model (see below), we assume
that changes along different branches are independent. Thus, the probability of any
single scenario is equal to the product of the probabilities of the changes required by
that scenario. For instance, the probability of the scenario represented by the first
term of Figure 10D is equal to the prior probability that the nucleotide at node 6 is
an A (typically 1/4, or the average frequency of A in the original sequences, depending
on the specifics of the model) times the probability of retaining an A along the branch
leading from node 6 to node 5, times the probability of an A -> C change along the
peripheral branch leading to tip 1, and so on.

Having calculated the likelihoods at each site, the joint probability that the tree and
model confer upon all sites is computed as the product of the individual-site
likelihoods (Figure 10E). Because the probability of any single observation is an
extremely small number (much too small to represent using standard floating-point
representations on a computer), we almost always evaluate the log of the likelihood
instead, so the probabilities are accumulated as the sum of the logs of the single-site
likelihoods (Figure 10F).
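The scenario-by-scenario calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the topology (node 5 joining tips 1 and 2; root node 6 joining node 5 and tips 3 and 4), the branch lengths, the alignment columns, and the function names are all hypothetical, and the Jukes-Cantor change probabilities are used with branch lengths measured in expected substitutions per site.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(i, j, v):
    """Jukes-Cantor probability of base j at the end of a branch that starts with base i.
    v is the branch length in expected substitutions per site (assumption of this sketch)."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, v):
    """Sum the probabilities of all 4 x 4 = 16 state assignments at internal nodes 6 and 5.
    Hypothetical topology: node 5 joins tips 1 and 2; root node 6 joins node 5, tip 3, tip 4."""
    t1, t2, t3, t4 = tips
    total = 0.0
    for s6, s5 in itertools.product(BASES, repeat=2):
        total += (0.25                          # prior probability of the state at node 6
                  * jc_prob(s6, s5, v["6-5"])   # change (or retention) along branch 6 -> 5
                  * jc_prob(s5, t1, v["5-1"])
                  * jc_prob(s5, t2, v["5-2"])
                  * jc_prob(s6, t3, v["6-3"])
                  * jc_prob(s6, t4, v["6-4"]))
    return total

def log_likelihood(columns, v):
    """Sum of the logs of the single-site likelihoods."""
    return sum(math.log(site_likelihood(col, v)) for col in columns)

# Hypothetical branch lengths and alignment columns (tips 1-4, one string per site).
branch_lengths = {"6-5": 0.1, "5-1": 0.2, "5-2": 0.05, "6-3": 0.3, "6-4": 0.15}
alignment_columns = ["AAAA", "ACGT", "CCTT", "GGGA"]
print(log_likelihood(alignment_columns, branch_lengths))
```

A useful sanity check on any such function is that the site likelihoods of all 256 possible tip patterns sum to 1, since exactly one pattern must be observed at each site.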
432 Chapter 11 / Swofford, Olsen, Waddell & Hillis
Models of Sequence Evolution

The critical element missing from the above overview is how the probabilities of
the various changes are calculated. These probabilities depend on several assumptions
about the process of nucleotide substitution, which define a substitution model. We
will restrict our attention here to Markov models, in which the probability of a change
from state i to state j at a given site does not depend on the history of the site prior
to its possession of state i. For example, if a sequence position has base A at some
time $t_0$, the probability that it will have base T at a later time $t_1$ depends only
on the fact that it has base A at $t_0$; knowing which state it had at some time prior
to $t_0$ would be irrelevant to the probability. We will also assume that the
substitution probabilities do not change in different parts of the tree (i.e., that the
evolutionary mechanisms responsible for sequence change constitute a homogeneous Markov
process). The use of Markov processes to model nucleotide substitution has been
discussed by Felsenstein (1981a), Lanave et al. (1984), Tavaré (1986), Barry and
Hartigan (1987a,b), Kishino and Hasegawa (1990), Rodriguez et al. (1990), and Zharkikh
(1994), among others.
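The mathematical expression of a substitution model is a table of rates (substitutions
per site per unit evolutionary distance) at which each nucleotide is replaced by each
alternative nucleotide. For DNA sequences, these rates can be expressed as a 4 x 4
instantaneous rate matrix, Q, in which each element $Q_{ij}$ represents the rate of
change from base i to base j during some infinitesimal time period dt. The most general
form of this matrix is

$$
Q = \begin{bmatrix}
-\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\
\mu g\pi_A & -\mu(g\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\
\mu h\pi_A & \mu j\pi_C & -\mu(h\pi_A + j\pi_C + f\pi_T) & \mu f\pi_T \\
\mu i\pi_A & \mu k\pi_C & \mu l\pi_G & -\mu(i\pi_A + k\pi_C + l\pi_G)
\end{bmatrix}
\tag{3}
$$
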
where the rows (and columns) correspond to the bases A, C, G, and T, respectively. The
factor $\mu$ represents the mean instantaneous substitution rate. This mean rate is
modified by the relative rate parameters a, b, c, ..., l, which correspond to each
possible transformation from one base to a different base. The product of a relative
rate parameter and the mean instantaneous substitution rate constitutes a rate
parameter. The remaining parameters, $\pi_A$, $\pi_C$, $\pi_G$, and $\pi_T$, are
frequency parameters that correspond to the frequencies of the bases A, C, G, and T,
respectively (Z. Yang, 1994a). We assume that these frequencies remain constant over
time (i.e., they are always at equilibrium), and that the rate of change to each base
is proportional to the equilibrium frequency but independent of the identity of the
starting base. The diagonal elements of Q are always chosen so that the elements in the
corresponding row sum to zero. It is sometimes convenient to decompose Q into two
matrices R and $\Pi$, where

$$
R = \begin{bmatrix}
\text{--} & \mu a & \mu b & \mu c \\
\mu g & \text{--} & \mu d & \mu e \\
\mu h & \mu j & \text{--} & \mu f \\
\mu i & \mu k & \mu l & \text{--}
\end{bmatrix}
$$

and

$$
\Pi = \begin{bmatrix}
\pi_A & 0 & 0 & 0 \\
0 & \pi_C & 0 & 0 \\
0 & 0 & \pi_G & 0 \\
0 & 0 & 0 & \pi_T
\end{bmatrix}
$$

The off-diagonal elements of Q are then equal to the off-diagonal elements of the
matrix product $R\Pi$, and the diagonal elements of Q are once again set to the
negative of the sum of the off-diagonal elements for the corresponding row. Analogous
matrices can be defined for protein sequence data, except that there are 20 states
rather than 4.
Almost all of the DNA substitution models proposed to date are special cases of
matrix (3). It is usually assumed that the overall rate of change from base i to base j
in a given length of time is the same as the rate of change from base j to base i. Such
models are said to be time-reversible. This corresponds to the rate parameter
restrictions g = a, h = b, i = c, j = d, k = e, and l = f. One byproduct of time
reversibility is that the likelihood of a tree generally does not depend on how the
tree is rooted. Consequently, as for most of the parsimony methods discussed above,
maximum likelihood estimation is usually limited to the inference of unrooted trees,
and other assumptions must be invoked to convert an unrooted tree into a rooted one.
Although it is possible to relax the time-reversibility assumption, this relaxation
introduces additional computational complications, including the need to consider
rooted trees. Thus, we will only consider symmetric R matrices of the form

$$
R = \begin{bmatrix}
\text{--} & \mu a & \mu b & \mu c \\
\mu a & \text{--} & \mu d & \mu e \\
\mu b & \mu d & \text{--} & \mu f \\
\mu c & \mu e & \mu f & \text{--}
\end{bmatrix}
$$

The most general time-reversible model (GTR) is then represented by

$$
Q = \begin{bmatrix}
-\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\
\mu a\pi_A & -\mu(a\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\
\mu b\pi_A & \mu d\pi_C & -\mu(b\pi_A + d\pi_C + f\pi_T) & \mu f\pi_T \\
\mu c\pi_A & \mu e\pi_C & \mu f\pi_G & -\mu(c\pi_A + e\pi_C + f\pi_G)
\end{bmatrix}
\tag{4}
$$

[Figure 11: diagram of nested substitution models, with one box per model abbreviation
listed in the caption. Labels recoverable from the diagram: "3 substitution types
(transversions, 2 transition classes)"; "3 substitution types (transitions, 2
transversion classes)"; "2 substitution types (transitions vs. transversions)"; "single
substitution type"; "equal base frequencies".]

Figure 11 Relationship between special cases of the general time-reversible family of
substitution models. Arrow labels indicate restrictions that convert a more general
model to a more specific one. Model abbreviations: F81, model of Felsenstein, 1981a
(equivalent to the "equal input" model of Tajima and Nei, 1982); F84, model used in
versions 2.6 and later of PHYLIP (Felsenstein, 1993; Kishino and Hasegawa, 1989); GTR,
general time-reversible (Lanave et al., 1984; Tavaré, 1986; Rodriguez et al., 1990);
HKY85, Hasegawa-Kishino-Yano model (Hasegawa et al., 1985b); JC, Jukes and Cantor
(1969) model; K2P, Kimura (1980) two-parameter model; K3ST, Kimura (1981)
three-substitution-type model; SYM, model described by Zharkikh (1994); TrN, Tamura and
Nei (1993) model.
(Lanave et al., 1984; Tavaré, 1986; Barry and Hartigan, 1987a; Rodriguez et al., 1990).
Most of the remaining models commonly used either for maximum likelihood tree inference
or estimation of pairwise evolutionary distances can be obtained by restricting the
parameters in matrix (4), as shown in Figure 11. For instance, if the substitution
types are divided into transversions, transitions between purines, and transitions
between pyrimidines, we obtain the model of Tamura and Nei (1993; TrN) by requiring
that a = c = d = f. Similarly, we can obtain Kimura's (1981) three-substitution-type
(K3ST) model by requiring that all bases occur in equal frequency
($\pi_A = \pi_C = \pi_G = \pi_T = 0.25$) and dividing the substitution types into
transitions (b = e), A <-> T or C <-> G transversions (c = d), and A <-> C or G <-> T
transversions (a = f). Zharkikh (1994) described a model (SYM) that is equivalent to
GTR except that it assumes equal base frequencies. Any other restriction of the
relative rates from the general time-reversible model (e.g., a = c, e = f) is possible;
all such models are also time-reversible.
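As an illustration of how matrix (4) and its restrictions fit together, the following Python sketch builds a GTR rate matrix from six relative rates and four base frequencies and checks the two properties stated in the text: rows sum to zero, and the model is time-reversible. The function names and the numerical values are hypothetical choices made for this sketch; the pairing of a through f with base pairs (a: A-C, b: A-G, c: A-T, d: C-G, e: C-T, f: G-T) follows the restrictions given above for the TrN and K3ST models.

```python
def gtr_rate_matrix(rates, freqs, mu=1.0):
    """Build the GTR instantaneous rate matrix, bases ordered A, C, G, T.
    rates = (a, b, c, d, e, f) for the pairs A-C, A-G, A-T, C-G, C-T, G-T;
    freqs = (pi_A, pi_C, pi_G, pi_T)."""
    a, b, c, d, e, f = rates
    rel = [[0.0, a, b, c],
           [a, 0.0, d, e],
           [b, d, 0.0, f],
           [c, e, f, 0.0]]
    # Off-diagonal element: mean rate x relative rate x frequency of the target base.
    q = [[mu * rel[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    # Diagonal element: negative of the sum of the other elements in the row.
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

def is_time_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_i * Q_ij == pi_j * Q_ji for every pair of bases."""
    return all(abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) < tol
               for i in range(4) for j in range(4))

# Hypothetical parameter values, for illustration only.
freqs = (0.1, 0.2, 0.3, 0.4)
q_gtr = gtr_rate_matrix((1.0, 2.0, 0.5, 0.7, 3.0, 1.2), freqs)
q_trn = gtr_rate_matrix((1.0, 2.0, 1.0, 1.0, 3.0, 1.0), freqs)               # a = c = d = f
q_k3st = gtr_rate_matrix((0.5, 2.0, 1.0, 1.0, 2.0, 0.5), (0.25, 0.25, 0.25, 0.25))
print(is_time_reversible(q_gtr, freqs))
```

Because each special case is just a constraint on the same six rates and four frequencies, one builder function is enough; a fitting program would differ only in which parameters it lets vary.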
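Further restrictions on the parameters in matrix (4) lead to more familiar models. If
we assume that the equilibrium frequencies of all bases are the same
($\pi_A = \pi_C = \pi_G = \pi_T = 0.25$) and that all substitutions occur at the same
rate (a = b = c = d = e = f = 1), the model reduces to that of Jukes and Cantor (JC)
(1969):

$$
Q = \begin{bmatrix}
-\tfrac{3}{4}\mu & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu \\
\tfrac{1}{4}\mu & -\tfrac{3}{4}\mu & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu \\
\tfrac{1}{4}\mu & \tfrac{1}{4}\mu & -\tfrac{3}{4}\mu & \tfrac{1}{4}\mu \\
\tfrac{1}{4}\mu & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu & -\tfrac{3}{4}\mu
\end{bmatrix}
$$

The base frequency and substitution rate are typically combined into a single
parameter $\alpha = \mu/4$, leading to the simpler form:

$$
Q = \begin{bmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{bmatrix}
$$

Kimura's (1980) two-parameter model (K2P) takes into account the common observation
that transitions and transversions occur at different rates, but still assumes equal
base frequencies. Thus we set a = c = d = f = 1 and b = e = $\kappa$ and obtain

$$
Q = \begin{bmatrix}
-\tfrac{1}{4}\mu(\kappa + 2) & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu\kappa & \tfrac{1}{4}\mu \\
\tfrac{1}{4}\mu & -\tfrac{1}{4}\mu(\kappa + 2) & \tfrac{1}{4}\mu & \tfrac{1}{4}\mu\kappa \\
\tfrac{1}{4}\mu\kappa & \tfrac{1}{4}\mu & -\tfrac{1}{4}\mu(\kappa + 2) & \tfrac{1}{4}\mu \\
\tfrac{1}{4}\mu & \tfrac{1}{4}\mu\kappa & \tfrac{1}{4}\mu & -\tfrac{1}{4}\mu(\kappa + 2)
\end{bmatrix}
$$

Letting the transition rate $\alpha = \mu\kappa/4$ and the transversion rate
$\beta = \mu/4$, the above can be rewritten as

$$
Q = \begin{bmatrix}
-(\alpha + 2\beta) & \beta & \alpha & \beta \\
\beta & -(\alpha + 2\beta) & \beta & \alpha \\
\alpha & \beta & -(\alpha + 2\beta) & \beta \\
\beta & \alpha & \beta & -(\alpha + 2\beta)
\end{bmatrix}
$$

Note that $\kappa = \alpha/\beta$ represents the transition bias. When $\kappa = 1$,
there is no preference for transitions and the model reduces to the JC model. However,
because there are twice as many kinds of transversions as transitions, the expected
transition:transversion ratio is 1:2. Similarly, if $\kappa = 4$, we would then expect
twice as many transitions as transversions.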
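The relationship between the transition bias and the expected transition:transversion ratio can be checked numerically. The sketch below is illustrative only (the function names are hypothetical): it builds the K2P rate matrix with bases ordered A, C, G, T and compares the total rate of transitions with the total rate of transversions at equilibrium, where every base has frequency 1/4.

```python
# Index pairs for transitions with bases ordered A, C, G, T: A<->G and C<->T.
TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}

def k2p_rate_matrix(kappa, mu=1.0):
    """K2P rate matrix: transitions at rate mu*kappa/4, transversions at rate mu/4."""
    alpha = mu * kappa / 4.0
    beta = mu / 4.0
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = alpha if (i, j) in TRANSITIONS else beta
        q[i][i] = -sum(q[i])      # diagonal is still 0.0 here, so this is minus the row sum
    return q

def expected_ts_tv_ratio(q):
    """Ratio of equilibrium transition flux to transversion flux (base frequencies 1/4)."""
    ts = sum(0.25 * q[i][j] for i in range(4) for j in range(4) if (i, j) in TRANSITIONS)
    tv = sum(0.25 * q[i][j] for i in range(4) for j in range(4)
             if i != j and (i, j) not in TRANSITIONS)
    return ts / tv

for kappa in (1.0, 2.0, 4.0):
    print(kappa, expected_ts_tv_ratio(k2p_rate_matrix(kappa)))
```

In general the ratio equals kappa/2 under this model, which is why a bias of 1 corresponds to a 1:2 ratio rather than 1:1.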
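The K2P model can easily be generalized to allow unequal equilibrium base frequencies
(Hasegawa et al., 1985b). The instantaneous rate matrix for this model (HKY85) is then
given by

$$
Q = \begin{bmatrix}
-\mu(\kappa\pi_G + \pi_Y) & \mu\pi_C & \mu\kappa\pi_G & \mu\pi_T \\
\mu\pi_A & -\mu(\kappa\pi_T + \pi_R) & \mu\pi_G & \mu\kappa\pi_T \\
\mu\kappa\pi_A & \mu\pi_C & -\mu(\kappa\pi_A + \pi_Y) & \mu\pi_T \\
\mu\pi_A & \mu\kappa\pi_C & \mu\pi_G & -\mu(\kappa\pi_C + \pi_R)
\end{bmatrix}
\tag{5}
$$

where $\pi_R = \pi_A + \pi_G$ and $\pi_Y = \pi_C + \pi_T$. This corresponds to the GTR
model with the constraints a = c = d = f = 1 and b = e = $\kappa$. The JC model can
likewise be generalized to allow for unequal base frequencies (Felsenstein, 1981a; the
F81 model) by setting $\kappa = 1$ in matrix (5) or, equivalently, requiring that
a = b = c = d = e = f = 1 in matrix (4):

$$
Q = \begin{bmatrix}
-\mu(\pi_Y + \pi_G) & \mu\pi_C & \mu\pi_G & \mu\pi_T \\
\mu\pi_A & -\mu(\pi_R + \pi_T) & \mu\pi_G & \mu\pi_T \\
\mu\pi_A & \mu\pi_C & -\mu(\pi_Y + \pi_A) & \mu\pi_T \\
\mu\pi_A & \mu\pi_C & \mu\pi_G & -\mu(\pi_R + \pi_C)
\end{bmatrix}
\tag{6}
$$

This model was also described as the "equal input" model by Tajima and Nei (1982).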
Felsenstein (1984) used a different method to accommodate unequal base frequencies in a
two-parameter model (the F84 model, formally described in Kishino and Hasegawa, 1989).
The F84 model divides the substitution process into two components: a general
substitution rate capable of producing all types of substitutions, and a within-group
substitution rate that produces only transitions. The instantaneous rate matrix for the
F84 model can be obtained from matrix (4) by setting a = c = d = f = 1,
b = (1 + K/$\pi_R$), and e = (1 + K/$\pi_Y$):

$$
Q = \begin{bmatrix}
\text{--} & \mu\pi_C & \mu\pi_G(1 + K/\pi_R) & \mu\pi_T \\
\mu\pi_A & \text{--} & \mu\pi_G & \mu\pi_T(1 + K/\pi_Y) \\
\mu\pi_A(1 + K/\pi_R) & \mu\pi_C & \text{--} & \mu\pi_T \\
\mu\pi_A & \mu\pi_C(1 + K/\pi_Y) & \mu\pi_G & \text{--}
\end{bmatrix}
$$

where K is the parameter determining the transition:transversion ratio,
$\pi_R = \pi_A + \pi_G$, $\pi_Y = \pi_C + \pi_T$, and the diagonal elements are set to
the negative of the sum of the off-diagonal elements in the corresponding row. The
elements of the above matrix corresponding to transitions each have two components,
because transitions can occur due to either the general substitution rate or the
within-group rate. When K = 0, this model collapses to the F81 model. As K increases
above zero, transitions occur more and more frequently relative to transversions.
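The two-component structure of the F84 transition rates, and the collapse to F81 at K = 0, can be seen directly by constructing the matrix from the restrictions on matrix (4) given above. This is a minimal sketch; the function name and the base frequencies are hypothetical and chosen only for illustration.

```python
def f84_rate_matrix(K, freqs, mu=1.0):
    """F84 rate matrix from the GTR restrictions a = c = d = f = 1,
    b = 1 + K/pi_R, e = 1 + K/pi_Y (bases ordered A, C, G, T)."""
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r = pi_a + pi_g            # total purine frequency
    pi_y = pi_c + pi_t            # total pyrimidine frequency
    b = 1.0 + K / pi_r            # relative rate for A <-> G
    e = 1.0 + K / pi_y            # relative rate for C <-> T
    rel = [[0.0, 1.0, b, 1.0],
           [1.0, 0.0, 1.0, e],
           [b, 1.0, 0.0, 1.0],
           [1.0, e, 1.0, 0.0]]
    q = [[mu * rel[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

freqs = (0.1, 0.2, 0.3, 0.4)      # hypothetical frequencies of A, C, G, T
q_f81 = f84_rate_matrix(0.0, freqs)   # K = 0: every off-diagonal rate is mu * pi_j
q_f84 = f84_rate_matrix(2.0, freqs)
# The A -> G rate is the general component (mu * pi_G) plus the
# within-group component (mu * K * pi_G / pi_R).
print(q_f81[0][2], q_f84[0][2])
```

Only the transition entries change as K grows; the transversion entries stay at their F81 values.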
Calculating Change Probabilities

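The instantaneous rate matrix Q specifies the rates of change between pairs of
nucleotides per instant of time dt, but in order to calculate likelihoods we need the
probabilities of change from any state to any other along a branch of length t. The
substitution probability matrix* is calculated as

$$
P(t) = e^{Qt}
$$

(e.g., Cox and Miller, 1977; Hasegawa et al., 1985b; Z. Yang, 1994a). The exponential
can be evaluated by decomposing the instantaneous rate matrix Q into its eigenvalues
and eigenvectors (we omit the details of how this is done, but see Lewis et al., 1996,
for an introductory explanation of the techniques used). For several models, simple
expressions exist for the eigenvalues, allowing direct analytic calculation of the
elements of the substitution probability matrix. For example, in the K2P model of DNA
substitution, there are only three probabilities to consider: the probability of a
transversion-type substitution; the probability of a transition-type substitution; and
the probability of no substitution. These probabilities are:

$$
P_{ij}(t) = \begin{cases}
\tfrac{1}{4} + \tfrac{1}{4}e^{-\mu t} + \tfrac{1}{2}e^{-\mu t(\kappa + 1)/2} & (i = j) \\
\tfrac{1}{4} + \tfrac{1}{4}e^{-\mu t} - \tfrac{1}{2}e^{-\mu t(\kappa + 1)/2} & (i \neq j,\ \text{transition}) \\
\tfrac{1}{4} - \tfrac{1}{4}e^{-\mu t} & (i \neq j,\ \text{transversion})
\end{cases}
$$

Writing $p_0$, $p_s$, and $p_v$ as shorthand for the probabilities of no substitution,
a transition, and a transversion, respectively, the full substitution probability
matrix is then given by:

$$
P(t) = \begin{bmatrix}
p_0 & p_v & p_s & p_v \\
p_v & p_0 & p_v & p_s \\
p_s & p_v & p_0 & p_v \\
p_v & p_s & p_v & p_0
\end{bmatrix}
$$

Substitution probabilities for some other DNA models are as follows (see Lewis et al.,
1996):

$$
\text{JC}: \quad P_{ij}(t) = \begin{cases}
\tfrac{1}{4} + \tfrac{3}{4}e^{-\mu t} & (i = j) \\
\tfrac{1}{4} - \tfrac{1}{4}e^{-\mu t} & (i \neq j)
\end{cases}
$$

$$
\text{F81}: \quad P_{ij}(t) = \begin{cases}
\pi_j + (1 - \pi_j)e^{-\mu t} & (i = j) \\
\pi_j(1 - e^{-\mu t}) & (i \neq j)
\end{cases}
\tag{7}
$$
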
*We refer to this matrix as the substitution probability matrix rather than the more
traditional transition probability matrix to avoid confusion with "transition" in the
sense of a change between two purines or between two pyrimidines.
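$$
\text{HKY85, F84}: \quad P_{ij}(t) = \begin{cases}
\pi_j + \pi_j\left(\dfrac{1}{\Pi_j} - 1\right)e^{-\mu t} + \left(\dfrac{\Pi_j - \pi_j}{\Pi_j}\right)e^{-\mu t A} & (i = j) \\
\pi_j + \pi_j\left(\dfrac{1}{\Pi_j} - 1\right)e^{-\mu t} - \left(\dfrac{\pi_j}{\Pi_j}\right)e^{-\mu t A} & (i \neq j,\ \text{transition}) \\
\pi_j\left(1 - e^{-\mu t}\right) & (i \neq j,\ \text{transversion})
\end{cases}
$$

where $A = 1 + \Pi_j(\kappa - 1)$ for the HKY85 model and $A = K + 1$ for the F84
model, with $\Pi_j = \pi_A + \pi_G$ if base j is a purine (A or G) and
$\Pi_j = \pi_C + \pi_T$ if base j is a pyrimidine (C or T). Substitution probabilities
for the remaining models can be calculated by numerical evaluation of the eigenvalues
and eigenvectors of Q using standard algorithms (Z. Yang, 1994a; Lewis et al., 1996).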
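To make the link between the rate matrix Q and the substitution probabilities concrete, the sketch below evaluates the matrix exponential numerically and compares it with the standard closed-form K2P probabilities. Note the swapped-in technique: rather than the eigenvalue decomposition mentioned in the text, this example uses a scaled Taylor series followed by repeated squaring, chosen only because it needs nothing beyond the standard library. The function names and the parameter values are hypothetical.

```python
import math

def mat_mul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(Q t) by a Taylor series on Q t / 2**squarings, then square back up."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]   # a**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def k2p_rate_matrix(kappa, mu=1.0):
    """K2P rate matrix, bases ordered A, C, G, T."""
    alpha, beta = mu * kappa / 4.0, mu / 4.0
    d = -(alpha + 2.0 * beta)
    return [[d, beta, alpha, beta],
            [beta, d, beta, alpha],
            [alpha, beta, d, beta],
            [beta, alpha, beta, d]]

def k2p_closed_form(kappa, mu, t):
    """Closed-form K2P probabilities: (no substitution, transition, transversion)."""
    e1 = math.exp(-mu * t)
    e2 = math.exp(-mu * t * (kappa + 1.0) / 2.0)
    return (0.25 + 0.25 * e1 + 0.5 * e2,
            0.25 + 0.25 * e1 - 0.5 * e2,
            0.25 - 0.25 * e1)

kappa, mu, t = 4.0, 1.0, 0.5          # hypothetical values
p = mat_exp(k2p_rate_matrix(kappa, mu), t)
p_same, p_ts, p_tv = k2p_closed_form(kappa, mu, t)
print(p[0][0] - p_same, p[0][2] - p_ts, p[0][1] - p_tv)
```

For models without a simple closed form, the same numerical route applies unchanged; only the function that builds Q differs.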
CHANGE PROBABILITIES FOR PROTEIN SEQUENCE DATA The techniques described above for DNA
sequence data can be applied to protein sequences as well; the difficulty lies in
specifying an appropriate model of amino acid replacement. The simplest model is a
Poisson model, analogous to the JC model for DNA sequences but extended to 20 states
(e.g., Kishino et al., 1990), which assumes that all changes between amino acids occur
at the same rate and that the equilibrium frequencies of all amino acids are equal. The
change probabilities for this model are given by:

$$
\text{Poisson}: \quad P_{ij}(t) = \begin{cases}
\tfrac{1}{20} + \tfrac{19}{20}e^{-\mu t} & (i = j) \\
\tfrac{1}{20} - \tfrac{1}{20}e^{-\mu t} & (i \neq j)
\end{cases}
$$

The assumption of equal amino acid frequencies is clearly unreasonable for protein
sequence data. If substitution rates are still assumed to be equal, an analog to the
Felsenstein (1981a) model would have the same basic form as the instantaneous rate
matrix of (6), but with 20 states instead of 4. This model has been called the
proportional model by Hasegawa and Fujiwara (1993). The corresponding change
probabilities are the same as (7):

$$
P_{ij}(t) = \begin{cases}
\pi_j + (1 - \pi_j)e^{-\mu t} & (i = j) \\
\pi_j(1 - e^{-\mu t}) & (i \neq j)
\end{cases}
$$

where $\pi_j$ now represents amino acid frequencies rather than base frequencies.
Although this model is preferable to the Poisson model, it still assumes that the
relative frequencies of the amino acids are constant across sites. This assumption is
clearly violated as well (e.g., hydrophobic amino acids predominate in some regions of
a protein, while hydrophilic amino acids predominate in others).
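The Poisson and proportional models differ only in the frequencies they use, which a short sketch makes explicit. The code below is illustrative: the function names are hypothetical, the amino acid frequencies are made-up numbers that merely sum to 1, and states are indexed 0 to 19 rather than by amino acid name.

```python
import math

def proportional_prob(i, j, freqs, mu_t):
    """Change probability under the proportional model; mu_t is the product of rate and time."""
    e = math.exp(-mu_t)
    if i == j:
        return freqs[j] + (1.0 - freqs[j]) * e
    return freqs[j] * (1.0 - e)

def poisson_prob(i, j, mu_t):
    """Poisson model: the proportional model with all 20 frequencies equal to 1/20."""
    return proportional_prob(i, j, [1.0 / 20.0] * 20, mu_t)

# Hypothetical, deliberately unequal amino acid frequencies (weights 1..20, normalized).
aa_freqs = [w / 210.0 for w in range(1, 21)]

row_poisson = [poisson_prob(0, j, 0.3) for j in range(20)]
row_proportional = [proportional_prob(0, j, aa_freqs, 0.3) for j in range(20)]
print(sum(row_poisson), sum(row_proportional))
```

As the product of rate and time grows, each row of either matrix approaches the equilibrium frequencies, regardless of the starting state.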
A large body of empirical evidence demonstrates that an amino acid is more likely to be
replaced by a physicochemically similar amino acid than would be predicted by an
equal-change-probability model (Dayhoff et al., 1978). Kishino et al. (1990) were able
to derive a maximum likelihood method analogous to the general time-reversible model
for DNA sequences by using an instantaneous rate matrix derived from Dayhoff et al.'s
(1978) empirical substitution matrix. This model has been implemented as the Dayhoff
model in the PROTML program of the MOLPHY package (Adachi and Hasegawa, 1992). More
recently, a model (JTT) based on the updated empirical substitution matrix of D.T.
Jones et al. (1992) has been added to PROTML; preliminary evidence indicates that this
modification provides a better model for the evolution of diverse proteins than the
Dayhoff model (Cao et al., 1994).
Protein-coding DNA sequences can be analyzed using either the original DNA sequences or
the translated proteins (with some complications). Some information is lost in the
translation to protein sequences. On the other hand, an obvious limitation to use of
the original DNA sequences is that the assumption of equal rates of change for all
sites is violated due to the degeneracy of the genetic code; a greater proportion of
synonymous changes allows third positions to evolve at a much more rapid rate than
first and second positions. This problem is easily corrected by allowing relative rates
to be specified on a site-specific basis (see below). However, selection at the amino
acid or codon level will cause the assumption of independence among sites to be
violated as well. Consequently, maximum likelihood analyses of protein-coding DNA
sequences probably should be conducted at the protein level unless the sequences are
not very divergent (see Reeves, 1992, for a discussion of these and related issues). An
alternative is to use a model of codon evolution with 61 states (Muse and Gaut, 1994;
Goldman and Yang, 1994), retaining the full information content of the DNA sequences.
Unfortunately, codon-based models are still in their infancy and are much more
computationally intensive than 4-state (or even 20-state) models.
THE RELATIONSHIP BETWEEN SUBSTITUTION RATE AND TIME For all of these models, the
probability of a change from state i to state j depends on the interaction of the
duration of time t and the substitution rate $\mu$ only through their product $\mu t$
(Felsenstein, 1981a). Thus, a branch could be "long" either because it represents a
long period of evolutionary time or because the rate of substitution has been high. In
general, it is impossible to tease these two components apart unless one is willing to
assume a perfect molecular clock. Consequently, the mean substitution rate $\mu$ is
usually set to 1 and the relative rate parameters a, b, ..., l are scaled so that the
average rate of substitution at equilibrium is 1 (e.g., Z. Yang, 1994a). The length of
a branch then represents the expected number of substitutions per site along that
branch, with no implication as to the actual amount of evolutionary time it represents.

These models allow the expected number of substitutions to be different for each branch
of the tree. As noted above, one consequence of this freedom is that the likelihood of
a tree can be calculated independently of the location of the root. If one is willing
to assume that the substitution rate is approximately homogeneous across lineages, then
the likelihood can be estimated under a molecular clock model by estimating branching
times rather than the lengths of each branch (Bishop and Friday, 1985; Felsenstein,
1993). (Note that this model then requires evaluation of rooted rather than unrooted
trees.) Because the clock model requires estimation of only about half as many
parameters as the unconstrained model [(T - 1)/(2T - 3)], it will be more efficient (in
the sense of requiring less data to achieve the same level of accuracy) if the clock
assumptions are valid. Felsenstein (1993) outlined a likelihood ratio test of the
molecular clock that compares the likelihoods of the more constrained clock model to
the unconstrained-branch-length model.

CHOOSING AN APPROPRIATE MODEL In a phylogenetic analysis, model selection and
evaluation are interrelated. There are two main criteria for evaluating a phylogenetic
model: how well it fits the data at hand, and how well it fits with other reliable data
(sometimes called congruence in the case of comparing trees). In selecting a model
based on fit of data at hand, there are tradeoffs to consider. We can always improve
the apparent fit of a model by adding additional parameters, but estimating these
additional parameters also leads to higher sampling variances. Measures of fit are
useful in deciding whether it is worth adding an extra parameter (see A.J. Miller,
1990). The general approach is to choose an overall goodness-of-fit statistic and then
search for a model that maximizes this statistic without adding unnecessary parameters
that do little more than explain random fluctuations in the data. If we can assume that
sites in the sequence evolve independently, then the data represent a multinomial
sample, so goodness-of-fit statistics such as a $\chi^2$ or the log likelihood ratio
test (e.g., G of Sokal and Rohlf, 1981) can be used to measure the fit of the observed
data to the predictions of the model (see Navidi et al., 1991 for a general discussion,
and Ritland and Clegg, 1987 for examples). In phylogenetics it is more common to use
the likelihood ratio statistic, which (unlike the $\chi^2$ statistic) does not require
the expected probability of all distinct nucleotide patterns to be calculated. As with
a contingency table analysis, we expect that with a large amount of data, the G
statistic will behave like a $\chi^2$-distributed random variable, assuming the model
is correct. (Likelihood-ratio tests of model fit are further described in the section
on "Reliability of Inferred Trees.") A related measure, the Akaike information
criterion (Akaike, 1974), can also be used to choose the most appropriate model (e.g.,
Kishino and Hasegawa, 1990), although in practice this measure is similar to a variety
of other model selection criteria (see A.J. Miller, 1990). It is also important to
avoid overconfidence when one model fits the data much better than another if the
overall fit is not good, since both models could be quite inadequate.

Calculating the Likelihood of a Tree

To calculate the likelihood of a full tree, it is necessary to consider the likelihoods
of the occurrence of each state at each node in the tree as a function of the tree
topology and branch lengths. As with other methods that define the optimal tree in
terms of an optimality criterion (e.g., least-squares and parsimony), we will assume
that the tree is given, and that the present task is to determine how good it is. The
method for evaluating the likelihood of a given tree proceeds from a hypothetical root
node at any convenient location in the tree, and combines the likelihoods of each of
its daughter trees (i.e., descendant lineages). (For time-reversible models, the choice
of root location
will not change the likelihood of the tree.) If A is an ancestor that gave rise to
sequences B and C, then the conditional likelihood of state i at sequence position j in
A is

$$
L(x_{Aj} = i) = \left[\sum_k P_{ik}(v_{AB})\, L(x_{Bj} = k)\right]
\times \left[\sum_l P_{il}(v_{AC})\, L(x_{Cj} = l)\right]
\tag{8}
$$

where $v_{xy}$ is the length of the branch joining sequence x to sequence y. We say
"conditional" likelihood because this value actually represents the likelihood of the
subtree descending from node A given that $x_{Aj} = i$. In words, the conditional
likelihood that A has state i is the product of the likelihoods that state i could have
given rise to the outcomes in B and C. The first term on the right-hand side is the
probability of state i changing to state k in the interval $v_{AB}$,
$P_{ik}(v_{AB})$, times the likelihood that sequence B has state k at the corresponding
position, summed over all possible values of k. If B is a known sequence, then the
likelihood that position j has state k is 1 if k is equal to the observed state in the
sequence, or zero otherwise. On the other hand, if B is an ancestor, then the
likelihoods of it having state k are derived recursively, by inserting another copy of
the right-hand side of (8) into the equation. The second term in equation (8) is
analogous to the first, but refers to the lineage leading to C. Calculating the
likelihood of the entire evolutionary tree at sequence position j requires multiplying
the conditional likelihood of each possible state at the root node,
$L(x_{Aj} = i)$, by its prior probability, $\pi_i$, and summing over all ancestral
states i:

$$
L_j = \sum_i \pi_i\, L(x_{Aj} = i)
$$

Usually the root node will be made coincident with one of the other nodes in the tree,
eliminating one branch and one summation, as shown in Figure 10C. The product of the
position-specific likelihoods is the overall likelihood of the tree. Again, this is
usually expressed as a sum of the log-likelihoods for each position.

Figure 12 illustrates a tree of five sequences. The corresponding likelihood for a
position j is:

$$
L_j = \sum_m \pi_m
\left[\sum_k P_{mk}(v_{GF})\, P_{k x_{Aj}}(v_{FA})\, P_{k x_{Bj}}(v_{FB})\right]
\left[\sum_l P_{ml}(v_{GH})\, P_{l x_{Dj}}(v_{HD})\, P_{l x_{Ej}}(v_{HE})\right]
P_{m x_{Cj}}(v_{GC})
\tag{9}
$$

where A, B, C, D, and E are the original sequences; F, G, and H are the labels of the
internal nodes; and the hypothetical root has been placed at node G. The overall
likelihood would be the product over positions. The four factors of the outer summation
are: (1) the prior probability of a state with identity m; (2) the conditional
likelihood of state m at node G giving rise to state k at node F, and k giving rise to
$x_{Aj}$ at node A, and k giving rise to $x_{Bj}$ at node B; (3) the conditional
likelihood of m giving rise to l at node H, and l giving rise to $x_{Dj}$ at node D,
and l giving rise to $x_{Ej}$ at node E; and (4) the conditional likelihood of m giving
rise to $x_{Cj}$ at node C. This basic pattern can be expanded to trees of any size.

Figure 12 An evolutionary tree of five sequences. The known sequences are at the
terminal nodes and are labeled A, B, C, D, and E. The nodes F, G, and H represent
ancestral sequences. The likelihood of this tree for a particular site is calculated
using equation (9).

In the above description, we implicitly assumed that the branch lengths were known, but
of course these are in general unknown and must be estimated as part of the process of
computing a likelihood. The methods for finding the branch lengths that maximize the
value of the likelihood function are beyond the scope of this chapter, but
typically involve an iterative approach in which each branch is optimized separately by
Newton's method (e.g., Kishino et al., 1990; G.J. Olsen et al., 1992; Tillier, 1994;
Lewis et al., 1996). This method is guaranteed to find globally optimal branch lengths
for a given tree topology only if there is at most one maximum on the likelihood
surface. Although Fukami and Tateno (1989) claimed to have proved this to be the case,
Steel (1994b) presented a simple counterexample demonstrating that multiple optimality
peaks could occur and found the error in Fukami and Tateno's proof. Steel's example was
artificial, but preliminary results (J.S. Rogers and D.L. Swofford, unpublished data)
indicate that the problem can occur with real data sets as well. So far, local optima
seem to occur only on trees that provide extremely poor explanations of the data (e.g.,
random trees).

It is important to emphasize that the method for calculating likelihoods described in
this section does not require calculation of the probabilities of each possible
reconstruction of ancestral states as was shown in the conceptual example of Figure 10.
The two methods are in fact equivalent, but if we were indeed required to consider all
possible reconstructions, the problem would become essentially intractable, as there
are $4^{T-2}$ possible reconstructions for DNA sequence data and $20^{T-2}$ possible
reconstructions for protein sequence data. For example, a data set of 20 taxa and DNA
sequences of length 2000 would require calculation of the probabilities of
$1.4 \times 10^{14}$ reconstructions for a given topology and set of branch lengths,
and adjustment of even one branch length would require recalculation of all of them. It
is extremely fortuitous that the probability summations can be rearranged into forms
like equation (9) (corresponding to the "pruning" algorithm of Felsenstein, 1981a).

Evaluation of the likelihood of a tree and counting the number of changes of a tree
under the general parsimony criterion are similar in several respects. The cost of a
given change under parsimony is analogous to the likelihood of the given change from
the substitution matrix, P(t). In parsimony, the cost of placing a given state at an
internal node is the sum of the costs of deriving both of the daughter trees from that
state, whereas the likelihood of an ancestral state is the product of the likelihoods
of the state giving rise to the daughter trees. In parsimony, the total cost of the
tree is the sum of the costs at each position, whereas the net log-likelihood of a tree
is the sum of the log-likelihoods of the evolution at each sequence position. Essential
differences between the general parsimony approach and the maximum likelihood approach
include: the cost of a change in parsimony is not a function of branch length, unlike
maximum likelihood; and maximum parsimony looks only at the single, lowest cost
solution, whereas maximum likelihood looks at the combined likelihood for all solutions
(ancestral states) consistent with the tree and branch lengths (see the discussion of
integrated likelihood in Goldman, 1990). Felsenstein has used the relationship between
likelihood and parsimony to gain several insights into the parsimony criterion,
including the discovery of the potential for inconsistency due to unequal rates
(Felsenstein, 1978a) and the inference of a character-weighting rationale (Felsenstein,
1981c).

Accommodating Rate Heterogeneity across Sites

The maximum likelihood models described above all assume that every site evolves at the
same rate. Violation of this assumption can have devastating consequences. For
instance, Gaut and Lewis (1995) showed that maximum likelihood inference under the
assumption of rate homogeneity can become inconsistent when the true evolutionary
process exhibits site-to-site rate variation, even when all other aspects of the
process are modeled perfectly. If there is strong variation in rates across sites,
sites that are resistant to change (e.g., due to strong selective constraints) can hide
the actual amount of change that has occurred at more rapidly evolving sites. This
causes maximum likelihood to underestimate the number of multiple changes; the longer
the branch the greater the underestimation. Thus, maximum likelihood can become
"positively misleading" (Felsenstein, 1978a) for exactly the same reasons as parsimony
(Figure 8): highly divergent sequences will appear to be more closely related than they
actually are (see Lockhart et al., 1995a, for a probable example of this problem with
real data).
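As a concrete companion to the likelihood calculation for the five-sequence tree of Figure 12 discussed earlier in this section, the sketch below computes a site likelihood in two ways: by the recursive conditional-likelihood calculation of equation (8), and by explicit enumeration of every assignment of states to the internal nodes F, G, and H. It is written in the spirit of the pruning approach, not as a reproduction of any particular program. The branch lengths, the observed states, the Jukes-Cantor change probabilities, and all function names are assumptions made for illustration.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(i, j, v):
    """Jukes-Cantor change probability; v is in expected substitutions per site."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node, tip_states):
    """Conditional likelihoods of the subtree below `node`, one value per state at the node.
    A tip is a name (str); an internal node is a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return [1.0 if b == tip_states[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, v in node:
        below = conditional(child, tip_states)
        for s in range(4):
            result[s] *= sum(jc_prob(BASES[s], BASES[k], v) * below[k] for k in range(4))
    return result

def site_likelihood(root, tip_states):
    """Weight each root state by its prior probability (1/4 here) and sum."""
    return sum(0.25 * value for value in conditional(root, tip_states))

# Hypothetical branch lengths for the tree rooted at G, with F = (A, B) and H = (D, E).
F = [("A", 0.10), ("B", 0.20)]
H = [("D", 0.15), ("E", 0.05)]
G = [(F, 0.30), (H, 0.25), ("C", 0.40)]

def brute_force(x):
    """Enumerate all 4**3 = 64 reconstructions of the states at G, F, and H."""
    total = 0.0
    for g, f, h in itertools.product(BASES, repeat=3):
        total += (0.25
                  * jc_prob(g, f, 0.30) * jc_prob(f, x["A"], 0.10) * jc_prob(f, x["B"], 0.20)
                  * jc_prob(g, h, 0.25) * jc_prob(h, x["D"], 0.15) * jc_prob(h, x["E"], 0.05)
                  * jc_prob(g, x["C"], 0.40))
    return total

observed = {"A": "A", "B": "A", "C": "G", "D": "C", "E": "C"}   # hypothetical site pattern
print(site_likelihood(G, observed), brute_force(observed))

# Number of reconstructions for 20 taxa and 2000 sites if every one had to be enumerated.
reconstructions = 4 ** (20 - 2) * 2000
print(reconstructions)
```

The recursive version visits each branch once per site, whereas the enumeration grows by a factor of 4 with every additional internal node, which is the practical reason the rearranged summation matters.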
Rate heterogeneity can be incorporatcd into substitutions (perhaps due to strong functional.
likelil~oodanalyses by including an additional rel- constraint), but that the remaining sites all vary at.
ative ratc component, r , into the substitution the same rate (Hasegawa et al., 1985b;Churchill
probability expressions. In the JC model, for ex- et al., 1992; Reeves, 1992; Sidou7et al., 1992). I n
ample, we let this case, when r = 0,Pi,(t,y) = I and Pii(t,r) = O for
all i st j. The proportion of invariable sites s111.1stei-
ther be estimated separately (see below) or treated
as a parameter that is optimized for each tree.
Therc is no reason in principle to restrict the rato
of one of the categories to O (no change), or to
limit the number of categories to 2, but estimation
If the relative rates u are scaled so that the mean of the proportion of sites within each category
substitution rate remains 1, branch lengths will still reflect the number of substitutions per site. In the simplest case, we simply assign a rate $u_j$ to each site j. Typically, the basis for this assignment would be some a priori classification of sites into functional categories and assignment of relative rates to the categories. Categorizations might be first, second, and third positions of a protein-coding gene, or paired versus unpaired sites for a ribosomal RNA gene. It is also possible to assign sites to rate categories based on the observed pattern of residue change. Van de Peer et al. (1993) proposed a way to do this by observing the frequency with which sequence pairs differ at each site as a function of the distance between the sequence pair. G.J. Olsen has written a program (DNArates; see Appendix) that performs a maximum likelihood estimate of the rate at each site for a given phylogenetic tree.

Several stochastic models that explicitly incorporate site-to-site rate variation are available. In these models, each site has a certain probability of evolving at any rate contained in some probability distribution, which may be either discrete or continuous. For a discrete rate distribution, the full likelihood for a given site is obtained by summing over rate categories the likelihoods of the site given each rate, weighted by the probability that the site is drawn from each category (Felsenstein, 1981a). Site likelihoods are calculated analogously for a continuous rate distribution, except that the likelihoods must be integrated over the entire distribution.

The simplest model based on a discrete distribution is an invariable-sites model that assumes some fraction of the sites is incapable of accepting [...] and the relative rates among categories becomes much more complicated otherwise.

The most commonly used continuous distribution for modeling rate heterogeneity is the gamma (Γ) distribution (e.g., Z. Yang, 1993; Steel et al., 1993). The Γ distribution has two parameters, a shape parameter α and a scale parameter β. By setting β to 1/α, a distribution with a mean rate of 1 is obtained, and a wide variety of rate distributions can be obtained by varying α (Figure 13).

The shape parameter α is equal to the inverse of the squared coefficient of variation of the substitution rate (with mean 1 and β = 1/α, the variance of the rate is 1/α), so that as α increases, the distribution converges to an equal-rates model. Obtaining likelihoods by integrating over the Γ distribution (or any other continuous distribution) is usually extremely computationally intensive (Z. Yang, 1993; see the section on Hadamard conjugation for a fast method under some models). Z. Yang (1994b) evaluated an alternative procedure in which the Γ distribution is divided into several rate categories by finding boundaries in the distribution such that each category has equal probability. The mean (or median) of each category is then used to represent all of the rates within that category. Z. Yang (1994b) found that this "discrete gamma" model can provide a good approximation with as few as four rate categories. The advantage of using a discrete model is that it requires only a tiny fraction of the computer time needed for the continuous model. The discrete Γ distribution, like the continuous case, adds only one extra parameter to the model (the shape parameter), no matter how many rate categories are considered.

In some situations, mixtures of rate hetero-
444 Chapter 11 / Swofford, Olsen, Waddell & Hillis
Figure 13 The gamma distribution for four different values of the shape parameter (α); the horizontal axis is the rate. When α is small, most of the sites evolve very slowly, but a few sites have moderate-to-high rates. As α increases, the distribution becomes more peaked and symmetrical about a mean rate of 1.0. When α is infinity, all sites have relative rate 1.0, so that an equal-rates model can be obtained as a special case of the gamma model.
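The equal-probability discretization described for the "discrete gamma" model (Z. Yang, 1994b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in any particular program: the function names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`) are hypothetical, the incomplete gamma function is evaluated with a simple series, category boundaries are found by bisection, and each category is represented by its mean rate (the median could be used instead).

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by series expansion."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def gamma_cdf(rate, alpha):
    """CDF of a gamma distribution with shape alpha and scale 1/alpha (mean 1)."""
    return reg_lower_gamma(alpha, alpha * rate)

def gamma_quantile(p, alpha):
    """Rate r such that gamma_cdf(r, alpha) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, alpha) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories (assumed helper)."""
    bounds = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # Because the overall mean is 1, the partial mean up to rate r
    # equals P(alpha + 1, alpha * r).
    cum = [reg_lower_gamma(alpha + 1.0, alpha * b) for b in bounds] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]

rates = discrete_gamma_rates(0.5, 4)
```

With α = 0.5 and four categories, the sketch gives mean rates of roughly 0.03, 0.25, 0.82, and 2.89, which average to 1. Only the shape parameter is free, whatever the number of categories.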
geneity models may be appropriate. For example, Gu et al. (1995) and Waddell and Penny (1996a) have proposed an "invariant + gamma" model, in which some fraction of the sites, θ, are invariable, with the remaining rates distributed according to a Γ distribution with shape parameter α.

Estimating Model Parameters
The models described above contain a variety of parameters that must be estimated from the data or supplied on the basis of extrinsic evidence. These parameters include: the tree topology; the branch-length estimates (which are specific to each topology); the relative rate parameters of the substitution models (a, b, ..., l in matrix (4) or related parameters such as κ and K); the base-frequency parameters ($\pi_A$, $\pi_C$, $\pi_G$, and $\pi_T$); and the parameters used in modeling rate heterogeneity (gamma shape parameter, proportion of invariable sites, etc.). Ideally, we would search for globally optimal values of these parameters in the n-dimensional parameter space. That is, we would consider every possible tree and optimize (jointly) all parameters of the model for each tree, choosing the resulting tree(s) of highest likelihood. For a given tree, one could perform a multidimensional optimization using Newton's method (e.g., A.W.F. Edwards, 1972). Unfortunately, this approach is difficult to implement because it requires knowledge of the first and second partial derivatives (and second cross-derivatives) of the likelihood function with respect to each of the parameters. Even when these derivatives are available, their computation can be quite slow.

In the section "Calculating the Likelihood of a Tree," we described a procedure that finds branch lengths that are at least locally optimal, given the values of any other parameters in the model. For any model more complex than the
JC/Poisson models, the values of additional parameters should be simultaneously optimized. When the model contains only one additional parameter (e.g., κ in the K2P model or the shape parameter in the JC+Γ model), it is relatively easy to plot the likelihood function evaluated at various values of the parameter of interest and thereby find a value that approximately maximizes the likelihood (e.g., Felsenstein, 1993). Obviously, this procedure can be quite tedious.

A method that has worked well for one of us (DLS) is the use of derivative-free methods for function minimization developed by Brent (1973) for a single variable and M.J.D. Powell (1964; as modified by Brent, 1973) for two or more variables. The procedure implemented in PAUP* (Swofford, 1996) is to use the Brent-Powell methods to find optimal values for all parameters other than branch lengths. When these algorithms need to evaluate the likelihood function, optimal branch lengths (conditional on the current values of the other parameters) are obtained using Newton's method as described above. Thus, optimal values of all parameters are obtained when the algorithm converges. (As for all heuristic methods, however, there is no guarantee that the resulting solution is globally optimal.) For small data sets (4-8 taxa), this strategy can be used for every tree evaluated due to the small size of the trees and the modest number of topologies tested. However, optimization of all model parameters on every tree tested dramatically slows the search using larger data sets. Z. Yang and coworkers (Yang, 1994a,b,c; Yang et al., 1994) have suggested that parameter estimates are fairly stable across tree topologies as long as the trees are not "too wrong" (Yang, 1995). Estimates of the shape parameter for the Γ model of site-to-site rate variation appear to be somewhat more sensitive to the tree topology than substitution-rate parameters (Yang, 1995; Sullivan et al., 1995b), although these conclusions are largely based on comparison of trees that probably fall into the "too wrong" category (e.g., random trees or star trees).

As long as parameter estimates are not wildly unstable across tree topologies, a potentially useful method would be to estimate the model parameters on some reasonably good tree for the data (e.g., a parsimony tree, or a maximum likelihood tree inferred under the model of Jukes and Cantor, 1969) and then "fix" the resulting estimates in a search for better trees under the desired model. A successive approximations approach might work very well in this case. That is, if a tree of higher likelihood is found, the parameters could be re-optimized on this new tree and fixed for yet another search, alternating between estimation and tree-searching until the same tree is found in successive iterations. Although this strategy seems quite promising, its effectiveness needs to be confirmed in empirical studies. Note that one of the limitations ascribed to the use of successive approximations in parsimony character weighting is not relevant in this case, because the likelihood function provides an objective function that is comparable across parameter values and trees.

An alternative to the methods presented above is to estimate the model parameters using methods other than likelihood. For example, the Γ shape parameter can be approximated by fitting a negative binomial distribution to a frequency distribution of the number of changes required at each site under the parsimony criterion (e.g., Uzzell and Corbin, 1971; Kocher and Wilson, 1991; Wakeley, 1993; Sullivan et al., 1995a). A similar approach can be used to estimate the proportion of invariable sites using the Poisson distribution (Fitch and Markowitz, 1970; Markowitz, 1970). Sidow et al. (1992) described another interesting method for estimating the proportion of invariable sites based on a mark-recapture model (Seber, 1982). These estimates require different assumptions than maximum likelihood tree models and can be calculated quickly, so they may be useful as a first approximation for selecting a model, obtaining starting parameter values for maximum likelihood estimation, or examining the effect of tree topology on parameter estimates (e.g., Sullivan et al., 1995a).

Maximum Likelihood Methods for Other Data Types
Maximum likelihood methods also can be applied to other data types, such as gene frequencies (Felsenstein, 1981b) or restriction sites (Felsen-
stein, 1992b). The basic approach is the same as that described above for sequence data: one formulates a model of evolutionary change and calculates the probability that the observed data (in this case, restriction site presences/absences or arrays of gene frequencies) would have been generated by a particular tree topology under the model. The mechanics of estimating branch lengths and other model parameters are essentially equivalent; the differences lie in the form of the models and how change probabilities are calculated.

Pairwise Distance Methods
A critical point made in the comparison of parsimony and likelihood methods above was that parsimony methods seek solutions that minimize the amount of evolutionary change required to explain the data, whereas likelihood methods attempt to estimate the actual amount of change according to an evolutionary model. This distinction is relevant because as mutations are fixed in the genome, there is an ever-increasing chance of superimposed changes occurring at a single sequence position: changes at a particular site along a lineage of the phylogeny may mask earlier changes at that site, and parallel or convergent changes may occur at the same site in different lineages. Thus, estimates of the amount of evolutionary change implied by parsimony will be underestimates of the true amount of change, unless the actual rate of change is extremely small.

An alternative to the use of likelihood for minimizing the impact of the underestimation problem is the use of corrected distances that account for superimposed changes by estimating the number of unseen events using the same sorts of models employed in maximum likelihood analysis. The corrected distances are then estimates of the true evolutionary distance, which reflects the actual mean number of changes per site that have occurred between a pair of sequences since their divergence from a common ancestor. Thus, following Cavalli-Sforza and Edwards (1967), we view distance methods as less desirable approximations to a full maximum likelihood approach. In recent simulation studies, maximum likelihood methods have consistently outperformed distance methods in choosing the correct tree (e.g., Kuhner and Felsenstein, 1994; Z. Yang, 1994c; Huelsenbeck, 1995a). Although some other studies have reported better performance of some distance methods (Saitou, 1988; Saitou and Imanishi, 1989; Tateno et al., 1994), these results have subsequently been shown to be based on inadequate computer programs and/or inappropriate comparisons (Hasegawa et al., 1991; Z. Yang, 1994c; Huelsenbeck, 1995b).

For some sources of data, including immunology and nucleic acid hybridization, there is no alternative to the use of distance methods. For other types of data, including macromolecular sequence, restriction site, and allozyme data, distances can provide a way to take advantage of models of evolutionary change when likelihood methods are either unavailable or intractable. Until recently, computers have been too slow and algorithms too inefficient to exploit fully the advantages of maximum likelihood techniques, and distance methods played a more important role. Even with the availability of faster maximum likelihood computer programs (see Appendix), distance methods remain useful, particularly for the analysis of large data sets, where their increased speed allows more thorough testing of alternative tree topologies.

The negative side of reducing character data to pairwise distances is that information is lost in the transformation. For instance, Penny (1982) has shown examples in which several different sets of sequences yield the same distance matrix, but given only the distances it is impossible to go back to the original sequences. Although this loss of information probably explains the better performance of character-based maximum likelihood inference, it clearly is not devastating. In fact, many sequence data sets yield identical conclusions with character-based and distance-based analyses (e.g., G.J. Olsen, 1987). Another drawback to distance analysis is that it does not lend itself to the combination of different kinds of data into the same analysis, as is possible for character-based analyses (e.g., Miyamoto, 1985). Finally, only through character-based analysis can a researcher identify particularly informative charac-
Phylogenetic Inference 447
ters (or regions) in order to limit subsequent studies to those characters that are most useful (e.g., the detection of so-called "signature" events; Woese et al., 1980).
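The idea of a corrected distance can be illustrated with a short sketch. As one concrete instance, it assumes the Jukes-Cantor (1969) model, under which the expected number of substitutions per site is d = -b ln(1 - D/b), where D is the observed proportion of differing sites and b = 3/4 for nucleotides. The function names `p_distance` and `jc_distance` are hypothetical and are not taken from any program mentioned in the text.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)

def jc_distance(p, b=0.75):
    """Jukes-Cantor corrected distance: d = -b * ln(1 - p / b)."""
    if p >= b:
        raise ValueError("observed difference is saturated; correction undefined")
    return -b * math.log(1.0 - p / b)

observed = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
corrected = jc_distance(observed)                   # larger than the observed value
```

The corrected value always exceeds the observed proportion of differences (here about 0.233 versus 0.2), and the gap widens as sequences diverge, reflecting the growing number of superimposed changes.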
Additive Distances
If we could determine exactly the true evolutionary distance implied by a given amount of observed sequence difference between each pair of taxa under study, these distances would have the very useful property of tree additivity (Figure 14): the evolutionary distance between each pair of taxa would be equal to the sum of the lengths of each branch lying on the path between the members of each pair. (The branch lengths also represent evolutionary distances between pairs of sequences, but at least one member of the pair is a hypothetical ancestral taxon.) Additive distances satisfy the four-point metric condition (Buneman, 1971): for any four taxa A, B, C and D,

$$ d_{AB} + d_{CD} \le \max(d_{AC} + d_{BD},\; d_{AD} + d_{BC}) \tag{10} $$

where $d_{ij}$ is the distance between taxa i and j, and "max" is the maximum value function. Conceptually, this simply means that of the three sums of distances $d_{ij} + d_{kl}$ where i ≠ j ≠ k ≠ l, one of these must be as small or smaller than the other two, and these other two must be equal. For example, in Figure 14A:

$$ d_{AB} + d_{CD} = v_1 + v_2 + v_4 + v_5 $$
$$ d_{AC} + d_{BD} = (v_1 + v_3 + v_4) + (v_2 + v_3 + v_5) = v_1 + v_2 + v_4 + v_5 + 2v_3 $$
$$ d_{AD} + d_{BC} = (v_1 + v_3 + v_5) + (v_2 + v_3 + v_4) = v_1 + v_2 + v_4 + v_5 + 2v_3 $$

Figure 14 Additive and ultrametric trees. (A) An additive tree relating four taxa: A, B, C, and D. It also lists the relationships between the six taxon-to-taxon distances ($d_{AB}$ through $d_{CD}$) and the five branch lengths ($v_1$ through $v_5$). Additive distances and trees do not make any assumption about the rooting; hence the relationships are displayed in an unrooted format. All sets of pairwise distances that satisfy the four-point condition (see text) can be represented as a unique additive tree. (B) An ultrametric tree relating three taxa: A, B, and C. In addition to having additive properties (all taxon-to-taxon distances are the total of the branch lengths joining them), every common ancestor is equidistant from all its descendants. Thus, the most recent common ancestor of B and C is $v_3$ from B and $v_4$ from C, therefore $v_3 = v_4$. Likewise, the common ancestor of A and B is $v_1$ from A and $v_2 + v_3$ from B, therefore $v_1 = v_2 + v_3$.
(A) Additive properties: $d_{AB} = v_1 + v_2$; $d_{AC} = v_1 + v_3 + v_4$; $d_{AD} = v_1 + v_3 + v_5$; $d_{BC} = v_2 + v_3 + v_4$; $d_{BD} = v_2 + v_3 + v_5$; $d_{CD} = v_4 + v_5$.
(B) Additive properties: $d_{AB} = v_1 + v_2 + v_3$; $d_{AC} = v_1 + v_2 + v_4$; $d_{BC} = v_3 + v_4$. Ultrametric properties: $v_3 = v_4$; $v_1 = v_2 + v_3 = v_2 + v_4$.

Tree-additive distances can be fitted to an unrooted tree such that all pairwise distances are equal to the sum of the lengths of the branches along the path connecting the corresponding taxa (Figure 14A). Unfortunately, due to the finite amount of available data, stochastic (random) errors will cause deviation of the estimated evolutionary distances from perfect tree additivity even
when evolution proceeds exactly according to the model used for distance correction. Many methods have been described that derive a tree and an associated set of branch lengths that comes closest (in some sense) to being additive for a matrix of pairwise distances. These methods typically, but not always, attempt to optimize an objective function that quantifies the degree of "distortion" between the path length and observed distances. The original descriptions of these methods often confound the choice of an optimality criterion with the algorithms used to select an optimal tree, but we will separate these two components, deferring the latter to the "Searching for Optimal Trees" section.

Additive-Tree Methods
A complete record of all genetic events would constitute a set of perfectly additive distances. We will treat the experimentally derived distances, which estimate the (unknown) number of genetic events that have actually occurred from the number of differences actually observed between each pair of taxa, as approximations of this ideal. To emphasize the uncertainty in the values, we will call them distance estimates. We can now address the problem of choosing a tree from the following conceptual perspective: We have uncertain data that we want to fit to a particular mathematical model (an additive tree) and find the optimal value for the adjustable parameters (the branching pattern and the branch lengths).

FITCH-MARGOLIASH AND RELATED METHODS
Several methods depend on a definition of the disagreement between a tree and the data based on the following family of objective functions:

$$ E = \sum_{i=1}^{T-1} \sum_{j=i+1}^{T} w_{ij}\, \lvert d_{ij} - p_{ij} \rvert^{\alpha} \tag{11} $$

where E defines the error of fitting the distance estimates to the tree, T is the number of taxa, $w_{ij}$ is the weight applied to the separation of taxa i and j, $d_{ij}$ is the pairwise distance estimate, $p_{ij}$ is the length of the path connecting i and j in the given tree, the vertical bars represent the absolute value, and α = 1 or 2. A value of α and a weighting scheme must be chosen.

Setting α to 2 represents a weighted least-squares criterion; the weighted squared deviation of the path-length distances from the distance estimates will be minimized. If α = 1, then the weighted absolute differences will be minimized. If the errors in the distance estimates are distributed uniformly across the data, then the least-squares criterion is preferred. If some estimates are apt to be particularly bad, there are two considerations. First, if the identities of the least certain estimates are known, this knowledge can be accommodated in the least-squares method by assigning particularly low weights to these uncertain values. If, however, it is not known a priori which estimates are apt to be erroneous, then using the minimum absolute deviations will reduce the overall perturbation caused by spurious data values. This last condition might pertain to direct experimental determinations of the distance data, a situation in which unrecognized experimental artifacts could substantially flaw some values.

The four most commonly used weighting schemes are:

$$ w_{ij} = 1 \tag{12a} $$
$$ w_{ij} = 1/d_{ij} \tag{12b} $$
$$ w_{ij} = 1/d_{ij}^{2} \tag{12c} $$
$$ w_{ij} = 1/\sigma_{ij}^{2} \tag{12d} $$

where $\sigma_{ij}^{2}$ is the expected variance of measurements of $d_{ij}$. The first three equations amount to implicit assumptions about the uncertainty of the measurements: equation (12a) (Cavalli-Sforza and Edwards, 1967) assumes that all distance estimates are subject to the same magnitude of error; equation (12c) (Fitch and Margoliash, 1967) assumes that the estimates are uncertain by the same percentage; and equation (12b) could be
viewed as a compromise that assumes the uncertainties are proportional to the square roots of the values (Felsenstein, 1993). Note that missing data can correctly be handled by setting the corresponding weight to zero; that is, if $d_{ij}$ is unknown, setting $w_{ij} = 0$ will cause this observation to be ignored (although most currently available software does not allow specification of individual pairwise weights).

If there is a rational method for estimating $\sigma_{ij}^{2}$, then use of equation (12d) is preferable. Theoretical variance formulas are available for most of the model-based distances described below (although space limitations preclude their inclusion here, they are available in the original references). These theoretical variances can be used for DNA and protein sequence data, restriction site data, and gene frequency data. An important property of these formulas is that they explicitly state the dependence of uncertainty on the amount of data; e.g., for sequence-based distances, the variance is inversely proportional to the sequence length N. A problem, however, is that if two sequences are identical, the estimated uncertainty will be zero, which causes equation (12d) to be undefined and would be a questionable conclusion in any case. A practical treatment is to assume that the minimum measurable dissimilarity is one-half of a substitution, yielding (approximately) $1/(2N^{2})$ as a minimum value to be imposed on the estimated variance.

For other kinds of data, including indirect methods such as DNA hybridization or immunological distances, random errors can be estimated by comparing replicate experiments or using reciprocal comparisons (where appropriate; see Chapter 6). These concepts are discussed in the corresponding experimental chapters.

For an unrooted tree of T taxa, there are 2T - 3 independent branches that define the $p_{ij}$ values, and there are T(T - 1)/2 distinct pairwise distances. To represent mathematically the relationships between the branch lengths, $v_k$, and the path lengths between pairs of taxa, we need an appropriate representation of the tree topology. Let A be a matrix of T(T - 1)/2 rows and 2T - 3 columns such that the element $A_{(ij)k}$ is equal to 1 if branch k is part of the path connecting taxon i to taxon j, and is equal to 0 otherwise. With this definition it follows that

$$ p_{ij} = \sum_{k=1}^{2T-3} A_{(ij)k}\, v_k $$

Thus, a system of equations such as that of Figure 14A can be represented in matrix notation as

$$ \begin{pmatrix} p_{AB} \\ p_{AC} \\ p_{AD} \\ p_{BC} \\ p_{BD} \\ p_{CD} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{pmatrix}, \quad \text{or} \quad \mathbf{p} = \mathbf{A}\mathbf{v} \tag{13} $$

If the distances were additive, then $p_{ij} = d_{ij}$ for all (i, j) pairs, and we could solve (13) directly. In general, however, due to the imperfect additivity of the distances, we must use (13) to eliminate $p_{ij}$ from (11) and seek a solution to the $v_k$ that minimizes E. This minimization can be accomplished using special-purpose linear or quadratic programming algorithms (e.g., Barrodale and Roberts, 1973), by iterative successive refinement techniques ("alternating least-squares"; Felsenstein, 1993), or (when α = 2 and $w_{ij} = 1$) by using ordinary linear algebra (e.g., Cavalli-Sforza and Edwards, 1967; Kidd and Sgaramella-Zonta, 1971; G.J. Olsen, 1988) using the equation:

$$ \mathbf{v} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{d} \tag{14} $$

For weighted least-squares criteria like that of Fitch and Margoliash (1967), the linear algebraic solution is

$$ \mathbf{v} = (\mathbf{A}^{T}\mathbf{W}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{W}\mathbf{d} \tag{15} $$
where W is a T(T - 1)/2 × T(T - 1)/2 matrix with diagonal elements equal to the weights associated with each pairwise comparison and all off-diagonal elements equal to 0.

The methods in the previous paragraph fit the data to a specific tree topology, and thus assume that an appropriate search strategy will be used to find the best topology. In an alternative approach described by De Soete (1983a,b), the values of $p_{ij}$ are initially set to the observed distances ($d_{ij}$), and then they are gradually adjusted by an optimization regimen that keeps them at a local minimum of equation (11), while improving their fit to inequality (10) for all sets of four taxa. At the end of the process, all sets of $p_{ij}$ satisfy inequality (10), so they will perfectly fit some additive tree, and they are at a minimum of equation (11).

A problem that sometimes arises with the above methods is that full minimization of equation (11) requires that some of the $v_k$ be negative. A negative branch length does not correspond to any meaningful biological process and should probably be avoided (e.g., Kidd and Sgaramella-Zonta, 1971). Allowing branches to have negative values when E is evaluated is probably inappropriate because some highly suboptimal trees can use negative values to produce a low apparent error. Several methods for dealing with negative branch lengths have been proposed. Some authors (e.g., Cavalli-Sforza and Edwards, 1967; Kidd and Sgaramella-Zonta, 1971) have favored outright rejection of any tree that requires a negative optimal value for any branch. This extreme approach runs the risk of rejecting the correct tree in certain realistic situations. An alternative strategy (Felsenstein, 1993) is to constrain the optimization process so that negative branch lengths are disallowed; a solution that optimizes E under the constraint that all branch lengths be non-negative is obtained. If (14) or (15) is used to determine least-squares branch lengths, the only alternative is simply to set any negative branch lengths to zero and then calculate E without readjusting the other branches. This method gives exact values of E for trees that have no negative branch lengths and overestimates the value of E otherwise. The amount of overestimation is small as long as there are no large negative branch lengths.

Table 1 summarizes the results of a least-
Table 1
Optimal 5S rRNA tree by weighted least-squares criterion

Sequence   Estimated     Expected      Distance        Expected        Error
pair(a)    distance(b)   distance(c)   difference(d)   uncertainty(e)  contribution(f)
Bsu-Bst    0.1717        0.1655         0.0062         0.0522          0.00133
Bsu-Lvi    0.2147        0.2269        -0.0122         0.0600          0.00415
Bsu-Amo    0.3091        0.2895         0.0196         0.0758          0.00667
Bsu-Mlu    0.2326        0.2414        -0.0088         0.0630          0.00194
Bst-Lvi    0.2991        0.2958         0.0033         0.0743          0.00020
Bst-Amo    0.3399        0.3584        -0.0185         0.0809          0.00521
Bst-Mlu    0.2058        0.2058         0.0000         0.0584          0.00000
Lvi-Amo    0.2795        0.2795         0.0000         0.0708          0.00000
Lvi-Mlu    0.3943        0.3716         0.0227         0.0902          0.00633
Amo-Mlu    0.4289        0.4343        -0.0054         0.0906          0.00031

Data from G.J. Olsen, 1988. The corresponding tree is illustrated in Figure 15A.
(a) Abbreviations are as in Figure 15.
(b) Distance estimate from sequence comparisons, using equations (4) and (5), with b = 3/4.
(c) Sum of appropriate branch lengths along the path joining the taxa in the inferred tree.
(d) Difference of the two previous columns.
(e) Square root of the variance estimate from equation (16).
(f) The individual terms of the summation in equation 14, with α = 2 and $w_{ij} = \sigma_{ij}^{-2}$.
squares calculation for a tree of five rRNA sequences. The table presents the pairwise distance estimates with their expected uncertainties, the corresponding path lengths through the inferred tree, and the error contributed by each distance to the overall value of E. As expected for a least-squares methodology, the paths through the best fitting tree will sometimes exceed the corresponding distance estimates (e.g., Bsu to Lvi) and sometimes they will be shorter (e.g., Bsu to Bst). It might be noticed that two distances are fitted exactly. Tree branch lengths assigned by most methods will exactly reproduce the distances between sister taxa in a tree (as long as negative numbers are not involved). The inferred tree is shown in Figure 15A.
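The linear-algebra route to least-squares branch lengths on a fixed topology can be sketched for the four-taxon tree of Figure 14A. The sketch below solves the standard normal equations (A^T W A) v = A^T W d by Gaussian elimination, where the rows of A mark which branches lie on each taxon-to-taxon path. It is an illustration only: the helper names (`solve_linear`, `least_squares_branch_lengths`) and the example branch lengths are assumptions, and it handles neither negative branch lengths nor the search over topologies.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares_branch_lengths(a, d, w=None):
    """Branch lengths v minimizing sum of w * (d - p)^2, where p = A v."""
    rows, cols = len(a), len(a[0])
    w = w or [1.0] * rows  # unit weights give the unweighted criterion
    ata = [[sum(w[r] * a[r][i] * a[r][j] for r in range(rows))
            for j in range(cols)] for i in range(cols)]
    atd = [sum(w[r] * a[r][i] * d[r] for r in range(rows)) for i in range(cols)]
    return solve_linear(ata, atd)

# Rows: AB, AC, AD, BC, BD, CD; columns: v1..v5 (paths as listed in Figure 14A).
A_FIG14A = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
]

true_v = [0.10, 0.20, 0.05, 0.30, 0.15]  # hypothetical branch lengths
d = [sum(a * v for a, v in zip(row, true_v)) for row in A_FIG14A]
fitted_v = least_squares_branch_lengths(A_FIG14A, d)
ls_length = sum(abs(v) for v in fitted_v)  # sum of absolute branch lengths
```

Because the example distances are exactly additive, the fitted branch lengths reproduce `true_v`; with real distance estimates the fit is only approximate and some fitted values may be negative.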
The least-squares and minimum-absolute-deviation approaches implicitly assume that each pairwise distance measurement is independent. Because of the common evolutionary history of the molecules in question, this assumption is not generally true. The primary consequence of violating this assumption is purely statistical; trees will be less well resolved than they would be if the samples were in fact independent. However, a second consequence is that any systematic errors in the distance estimates can also be multiply sampled, and thus the pairwise methods are potentially more sensitive to undercompensation for homoplasy in the data (see the section on "Systematic Errors" later in this chapter). Felsenstein (1986, 1988a) discussed methods for dealing with interdependencies in pairwise distance data. In practice, none of these methods are used regularly because of their computational complexity and other limitations. Note that neither parsimony nor maximum likelihood suffers from this difficulty.

Figure 15 Comparison of 5S rRNA phylogenies inferred by different pairwise distance methods (data from Olsen, 1988). (A) Trees obtained using neighbor joining and weighted least-squares. The upper branch lengths (expected substitutions per sequence position) are from the neighbor-joining analysis in Figure 30, and the parenthetical values are from the weighted least-squares analysis in Table 1. Although the tree is unrooted, the M. luteus sequence is considered the outgroup. (B) Tree obtained by cluster analysis (UPGMA) from the analysis in Figure 29. It can be seen that the neighbor-joining and least-squares procedures produced very similar trees, but the cluster analysis tree is very different. Two of the sequences, those of L. viridescens and A. modicum, are very much more diverged than are the others, an effect to which cluster analysis is particularly sensitive. Abbreviations used to identify the taxa: Bsu, Bacillus subtilis; Bst, Bacillus stearothermophilus; Lvi, Lactobacillus viridescens; Amo, Acholeplasma modicum; and Mlu, Micrococcus luteus.

THE MINIMUM EVOLUTION METHOD Kidd and Sgaramella-Zonta (1971) suggested using the unweighted least-squares criterion (equation 11, with $w_{ij} = 1$ and α = 2; Cavalli-Sforza and Edwards, 1967) to fit the branch lengths, but a different criterion to evaluate and compare trees:

$$ \text{LS length} = \sum_{k=1}^{2T-3} \lvert v_k \rvert \tag{16} $$

That is, the optimality criterion is simply the sum of the absolute values of the branch lengths that minimize the sum of squared deviations between observed (estimated) and path-length distances. Subsequent simulations indicated that the LS
length criterion consistently outperformed least-squares criteria based on (11) (Kidd and Cavalli-Sforza, 1971). Apparently unaware of this work, Rzhetsky and Nei (1992a) described a method based on essentially the same criterion, calling it the minimum evolution method:

$$ \text{ME} = \sum_{k=1}^{2T-3} v_k \tag{17} $$

The only difference between the two methods is that Rzhetsky and Nei drop the absolute values in equation (16), which has the seemingly undesirable property of allowing negative branch lengths to improve the apparent goodness-of-fit of the tree. In practice, however, the two methods are little different, because the branch lengths are usually non-negative (or very close to zero if negative) (Swofford, unpublished observations) on trees scoring well according to equation (16). The choice of the name "minimum evolution" is unfortunate, as the same name had been used earlier for a quite different method (Cavalli-Sforza and Edwards, 1967; Thompson, 1973). Because the earlier method was never widely used and the Rzhetsky-Nei method is becoming very popular, it seems best to refer to the methods defined by equations (16) and (17) as the minimum evolution (ME) method.

Rzhetsky and Nei (1992b) have provided a theoretical argument for the superiority of the ME method over the Fitch-Margoliash (FM) and related methods due to a bias in the latter methods when the variance of the estimated distances is high (e.g., due to large differences between short sequences). Although their computer simulations appeared to reinforce this conclusion, the actual reason for the better performance of ME is unclear, as the bias quickly becomes inconsequential as sequence length increases. It seems more plausible that the enhanced ME performance is due to a reduced impact of negative branch lengths in the ME method. Kidd et al. (1974) reported that if trees containing negative branch lengths are automatically rejected, the ME and FM methods give essentially identical results. Kuhner and Felsenstein's (1994) simulations also demonstrated a striking improvement in the performance of the FM method when branch lengths were constrained to be non-negative; in their study the performance of the FM method slightly surpassed an approximate method closely related to ME (the neighbor-joining method; see below), but only if negative branch lengths were disallowed.

Ultrametric Distances
Ultrametric distances are more constrained than tree-additive distances. Mathematically, ultrametric distances are defined by satisfaction of the three-point condition, which requires that for any three taxa A, B, and C,

$$ d_{AC} \le \max(d_{AB},\; d_{BC}) \tag{18} $$

This inequality simply states that two of the three pairwise distances between three taxa are equal and at least as large as the third. Phylogenetically, ultrametric distances will precisely fit a tree so that the distance between any two taxa is equal to the sum of the branches joining them, and the tree can be rooted so that all of the taxa are equidistant from the root (Figure 14B). The first half of this description defines an additive tree (and implies that ultrametric distances are additive). The second half of the description corresponds to the concept of a molecular clock that runs at the same rate in all lineages at any given moment. Two potential surprises may emerge, however. First, even with ultrametric data, there is no guarantee that the amount of divergence is linear in time. In particular, superimposed sequence changes, which decrease the observed molecular divergence, do not destroy the ultrametric property. Second, obtaining ultrametric data is extremely unlikely; even if the underlying substitution rate is perfectly constant, any finite sample will yield statistical fluctuations in the measured divergences. Consequently, even a universal substitution rate would not give ultrametric data without an infinitely large sample. The closest experimental approximations of infinite samples are genome hybridization measurements (Chapter 6), although measurement errors limit the effective amount of data (Felsenstein, 1987).

If data are nearly ultrametric by equation (18),
which is rarely the case, methods that assume a molecular clock can be more efficient (require less data to achieve the same probability of inferring the correct tree). Felsenstein's (1993) KITSCH program uses the same criterion as equation (11) (with α = 2), but constrains the lengths of the branches so that the total length from the root of the tree to each terminal taxon is the same. Cluster analysis methods (described below) are also appropriate under the assumption of a molecular clock, and are very fast. Colless (1970) provided a precise definition of how much deviation from ultrametricity can be tolerated without causing the estimation of the tree to become inconsistent. However, there is little practical reason to use cluster analysis because related methods such as neighbor joining are applicable to more general additive distances, require very little additional computation, and are often more efficient in simulation studies under a molecular clock model (Sourdis and Krimbas, 1987; Charleston, 1994) unless rates of substitution are high.

Distance Transformations for Sequence Data

MEASUREMENT OF SEQUENCE DISSIMILARITY By far the most common method of summarizing the relationship between two sequences is by their fractional (or percentage) similarity or dissimilarity. In its simplest form, the sequence dissimilarity is equal to the number of aligned sequence positions containing non-identical residues (bases or amino acids) divided by the number of sequence positions compared (in mathematics this distance is called the Hamming distance). However, we must explicitly address several subtleties and potential ambiguities: alternatives to limiting the comparison to identical residues; terminal length variation of molecules; alignment gaps; and treatment of ambiguities. The following sections assume that the sequence alignment has already been defined (see "Sequence Data" in the section "Types of Data" above, and Chapter 9).

It is frequently of interest to define the similarity of two molecules in terms of a more relaxed criterion than the fraction of identical residues, thereby changing our definition of sequence dissimilarity to the number of aligned sequence positions containing "non-synonymous" residues divided by the number of sequence positions compared. For example, "conservative substitutions" are commonly ignored when comparing proteins by pooling the amino acids into six groups: acidic (D, E), aromatic (F, W, Y), basic (H, K, R), cysteine, non-polar (A, G, I, L, P, V), and polar (M, N, Q, S, T). Residues within each group are considered synonymous; residues in different groups are considered non-synonymous.

As discussed above, if the evolution of a gene includes insertions and/or deletions, then gaps must be inserted to adjust for the internal length changes when aligning the contemporary sequences. Although the character state "gap" is sometimes treated as a fifth base or twenty-first amino acid, the processes responsible for base substitution and for insertion and deletion are evolutionarily and mechanistically distinct. Because a proper treatment is not obvious, sequence positions with gaps are usually omitted from analyses in one of two ways (e.g., Kumar et al., 1993; Swofford, 1996). The first (pairwise deletion) omits sites in which one or both sequences have a gap for each affected comparison. This option is appropriate when gaps are short and distributed approximately at random (Kumar et al., 1993). A second treatment (complete deletion) deletes a site from all pairwise comparisons if any of the sequences in the data set have a gap at that site. Although the complete deletion method discards more information, it may be more appropriate when some regions of a sequence (e.g., more rapidly changing regions) are more prone to insertion/deletion events than others, in which case pairwise deletion could introduce a bias. Alignment gaps are usually positioned to maximize the alignment of identical residues in sequences. Thus, additional insertion/deletion events could systematically raise the apparent similarity. Once again we emphasize that regions of the sequence alignment that contain substantial numbers of alignment gaps should be omitted from the analysis; positional homology is too uncertain for reliable estimates to be made from these regions.
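The two gap treatments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sequences are equal-length strings, gaps are written as "-", and the function names and the toy alignment are hypothetical rather than taken from any of the cited programs.

```python
GAP = "-"  # assumed gap symbol

def dissimilarity_pairwise_deletion(x, y):
    """Fraction of differing sites, skipping sites where either sequence has a gap."""
    compared = differences = 0
    for a, b in zip(x, y):
        if a == GAP or b == GAP:
            continue  # pairwise deletion: drop the site for this comparison only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared

def complete_deletion(alignment):
    """Remove every alignment column in which any sequence has a gap."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] != GAP for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]

# Toy alignment (hypothetical data)
alignment = ["ACGTACGT", "AC-TACGA", "ACGTTC-T"]
trimmed = complete_deletion(alignment)
```

With pairwise deletion the first two sequences are compared over seven sites; after complete deletion every comparison uses the same six columns, which is why complete deletion discards more information but treats all pairs alike.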
454 Chapter 11 / Swofford, Olsen, Waddell & Hillis
Terminal length variation refers to the observation that corresponding molecules from different species (and even within an individual organism) can start and end at different distances from homologous features within the molecules.
In addition to insertions and deletions, other genetic and factors
(e.g., substitution mutations or alteration of a processing enzyme) could be responsible for these variations. Because of the diversity of mechanisms, omitting the corresponding alignment columns, as in the second treatment above, seems most appropriate.
ACCOUNTING FOR SUPERIMPOSED EVENTS The raw dissimilarity (or similarity) is an appropriate value for summarizing the relationship between sequences. However, it is an inescapable fact that as genes accumulate mutations, there is an ever increasing likelihood that some of the changes will be at the same sequence location. Because pairwise comparisons of sequences are based entirely on the identity or non-identity of residues at corresponding sequence positions, the first substitution at a site will convert identical residues to non-identical residues. Subsequent changes at the same sequence position cannot further decrease the similarity, but they can raise the similarity by converting the compared residues to similar identities (parallelism or reversion). The net effect of this superimposition of substitutions is that dissimilarity does not increase uniformly with the number of events; instead, it increases rapidly at first and more slowly thereafter. Thus, correction of the distance to account for the unobserved substitutions is necessary for the distances to conform to an additive-tree model, unless all sequences are extremely similar. We show some of the more common distance corrections below, but see Kumar et al. (1993) and Swofford (1996) for more complete compilations.
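The saturation effect can be illustrated numerically. The sketch below assumes the Jukes-Cantor model (introduced later in this section), under which the expected dissimilarity after d substitutions per site is 3/4 (1 - e^(-4d/3)); the function name is hypothetical and the model choice is an assumption made only for illustration.

```python
import math

def expected_dissimilarity_jc(d):
    """Expected proportion of differing sites after d substitutions per site,
    assuming the Jukes-Cantor model (equal rates, equal base frequencies)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Dissimilarity rises quickly at first, then levels off below 0.75.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"{d:4.1f} substitutions/site -> expected dissimilarity "
          f"{expected_dissimilarity_jc(d):.3f}")
```

Each additional substitution per site adds less observed dissimilarity than the one before, which is why uncorrected distances increasingly understate the true amount of change.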
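A general framework for describing distance measures under a variety of models uses a divergence matrix $F_{xy}$ to represent the relative frequencies of each nucleotide (or amino acid) pair in a given pairwise comparison of two sequences X and Y, e.g.:

$$F_{xy} = \frac{1}{N}\begin{bmatrix} n_{AA} & n_{AC} & n_{AG} & n_{AT} \\ n_{CA} & n_{CC} & n_{CG} & n_{CT} \\ n_{GA} & n_{GC} & n_{GG} & n_{GT} \\ n_{TA} & n_{TC} & n_{TG} & n_{TT} \end{bmatrix} \qquad (19)$$

where $n_{ij}$ is the number of times sequence X has state i aligned next to state j in sequence Y, and $N = \sum n_{ij}$. Let us represent this matrix as

$$F_{xy} = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix}$$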
A frequently overlooked issue in pairwise sequence comparison is the treatment of ambiguities (i.e., nucleotide bases or amino acid residues of uncertain
identity) in the sequences being compared. For example, counting a purine (R) as synonymous with A and G and non-synonymous with C and T will tend to overestimate the similarity between the affected sequence comparisons. One approach (Swofford, 1996) is to distribute differences between sites with ambiguities based on the frequencies of differences at unambiguous sites. For instance, suppose that a site has an A in sequence X and an R in sequence Y. If for this comparison there are 450 sites that have an A in both sequences, and 50 sites that have an A in one sequence and a G in the other, then the site would contribute 450/500 = 0.9 to the value of $(F_{xy})_{AA}$, and 0.1 to $(F_{xy})_{AG}$. Maximum likelihood distances (see below) can deal with the ambiguities directly (e.g., Felsenstein, 1993) by considering the likelihood of each possible resolution of the ambiguity.
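The proportional treatment of an ambiguous site can be sketched as follows. This is a minimal illustration of the worked example above, not the implementation used in any particular program: the function name, the dictionary keyed by unordered pairs, and the equal-weight fallback when no unambiguous sites are available are all assumptions.

```python
def ambiguity_weights(base, candidates, pair_counts):
    """Split one ambiguous site among its possible resolutions.

    base        -- the unambiguous state in the other sequence (e.g. "A")
    candidates  -- states allowed by the ambiguity code (e.g. "AG" for R)
    pair_counts -- counts of unambiguous sites, keyed by frozenset of the
                   two aligned states (hypothetical data structure)
    """
    counts = {c: pair_counts.get(frozenset((base, c)), 0) for c in candidates}
    total = sum(counts.values())
    if total == 0:
        # Assumption: with no unambiguous evidence, weight resolutions equally.
        return {c: 1.0 / len(counts) for c in counts}
    return {c: n / total for c, n in counts.items()}

# Example from the text: 450 A/A sites and 50 A/G sites.
observed = {frozenset("A"): 450, frozenset("AG"): 50}
weights = ambiguity_weights("A", "AG", observed)
```

Here `weights["A"]` is the share of the site credited to the A/A cell of the divergence matrix and `weights["G"]` the share credited to the A/G cell.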
The uncorrected distance, often referred to as the dissimilarity (D) or p-distance (e.g., Kumar et al., 1993), is simply the total number of differences divided by the total number of available sites:

p-distance: $$d_{xy} = b + c + d + e + g + h + i + j + l + m + n + o = 1 - (a + f + k + p)$$
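As a concrete illustration, the divergence matrix and the p-distance can be computed directly from two aligned sequences. The sketch below assumes DNA sequences of equal length with states in the order A, C, G, T, and simply skips any site containing another symbol (gaps or ambiguity codes); the function names are hypothetical.

```python
STATES = "ACGT"  # assumed row/column order of the divergence matrix

def divergence_matrix(x, y):
    """Relative frequency of each aligned state pair (rows: X, columns: Y)."""
    index = {s: k for k, s in enumerate(STATES)}
    counts = [[0] * len(STATES) for _ in STATES]
    n = 0
    for a, b in zip(x, y):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]

def p_distance(F):
    """One minus the sum of the diagonal (a + f + k + p)."""
    return 1.0 - sum(F[k][k] for k in range(len(F)))

F = divergence_matrix("ACGTACGTAC", "ACGTACGTTT")
p = p_distance(F)  # two of ten sites differ
```

Summing the diagonal and subtracting from one is equivalent to summing the twelve off-diagonal entries, as in the formula above.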
A pairwise distance estimate is essentially the branch length in an optimal phylogenetic tree of two taxa. Thus, most of what was said about models for maximum likelihood tree inference (above) also applies here.
The corrected distance for the Jukes-Cantor model, which assumes equal rates of substitution between all pairs of bases, is calculated as

JC: $$d_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right) \qquad (20)$$

Note that the maximum expected dissimilarity is 0.75; if D equals or exceeds this value, the distance becomes undefined because the argument of the logarithm becomes zero or negative. A distance for the model of Felsenstein (1981a), which relaxes the assumption of equal base frequencies, is given by

F81: $$d_{xy} = -B\ln\left(1 - \frac{D}{B}\right) \qquad (21)$$

where D is the same as for JC above and $B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$ (Tajima and Nei, 1982). The base frequencies can either be estimated for each pair of sequences compared or for the full set of sequences; we favor the latter due to lower sampling variance.
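Both corrections share the form -B ln(1 - D/B), so a single function covers them. The following is a minimal sketch; the function names are hypothetical, and raising an exception for saturated comparisons is one possible choice rather than the behavior of any cited program.

```python
import math

def corrected_distance(D, B=0.75):
    """Distance of the form -B ln(1 - D/B); B = 3/4 gives Jukes-Cantor."""
    arg = 1.0 - D / B
    if arg <= 0.0:
        raise ValueError("dissimilarity too large: distance is undefined")
    return -B * math.log(arg)

def b_from_frequencies(freqs):
    """B = 1 - sum of squared state frequencies (bases or amino acids)."""
    return 1.0 - sum(f * f for f in freqs)

D = 0.10
jc = corrected_distance(D)                                      # equal base frequencies
f81 = corrected_distance(D, b_from_frequencies([0.4, 0.1, 0.1, 0.4]))
```

With unequal base frequencies B falls below 3/4, so the same observed dissimilarity is corrected upward more strongly than under Jukes-Cantor.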
Comparison of equations (20) and (21) reveals that (21) can be used to calculate a distance for the JC model if we set B = 3/4. In fact, (21) is a very general formula that can also be used for calculating distances from protein sequences. A distance for the Poisson model is obtained if we set B = 19/20, and a distance for the Proportional (unequal amino acid frequency) model is obtained by setting
$$B = 1 - \sum_{i=1}^{20} \pi_i^2$$

where the $\pi_i$'s now represent the frequencies of each amino acid, and D represents the proportion of amino acid differences between the two sequences.

The distance for Kimura's (1980) two-parameter model is calculated from the proportions of transition-type differences (P) and transversion-type differences (Q).

K2P: $$d_{xy} = \frac{1}{2}\ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1 - 2Q}\right)$$

Note that the proportion of transitions and transversions is estimated separately for each pair of taxa, in spite of the fact that different pairs of taxa share common lineages on the tree. For many models more complex (and general) than the K2P model, no simple distance formula exists (e.g., Z. Yang, 1994a; Zharkikh, 1994). For example, the HKY model (the unequal base-frequency generalization of the K2P model) does not have a simple distance formula (see Z. Yang, 1994a for an explanation). However, the closely related F84 model does have a simple distance formula (Tateno et al., 1994)

F84: $$d_{xy} = -2A\ln\left[1 - \frac{P}{2A} - \frac{(A - B)Q}{2AC}\right] + 2(A - B - C)\ln\left[1 - \frac{Q}{2C}\right]$$

where $\pi_Y = \pi_C + \pi_T$, $\pi_R = \pi_A + \pi_G$, $A = \pi_C\pi_T/\pi_Y + \pi_A\pi_G/\pi_R$, $B = \pi_C\pi_T + \pi_A\pi_G$, and $C = \pi_R\pi_Y$, and P and Q are as defined for the K2P model. The most general model for which a simple distance formula exists (Z. Yang, 1994a) is that of Tamura and Nei (1993) (not shown), which generalizes the HKY model to allow different rates for transitions between purines versus those between pyrimidines.

Lanave et al. (1984) and Rodriguez et al. (1990) have formulated a distance for the most general time-reversible model (GTR). [Although the algorithms described in these two papers are quite different, the methods are actually equivalent (Lewis and Swofford, unpublished; see Swofford, 1996).] The Rodriguez et al. version of this distance is

GTR: $$d_{xy} = -\mathrm{trace}\left[\Pi \ln\left(\Pi^{-1}F_{xy}\right)\right]$$

where $\Pi$ is a diagonal matrix of the average base frequencies in sequences X and Y. (Interpretation of this formula requires some familiarity with matrix algebra. Note in particular that evaluating the log of a matrix requires, among other things, determination of its eigenvalues and eigenvectors.) Lewis and Swofford (unpublished; see Swofford, 1996) have developed an extension of the Lanave et al.-Rodriguez et al. method that allows estimation of distances under any special case of the GTR family of models. When simple formulas such as the ones shown above exist, the Lewis-Swofford method gives identical results, but it allows calculation of distances for many models for which distances were unavailable previously.

ESTIMATING TRANSITION AND TRANSVERSION SUBSTITUTIONS SEPARATELY If transversions occur much less frequently than transitions and the amount of divergence is high, transition differences are likely to approach or reach saturation. When this happens, transitions will contribute little phylogenetic information and will cause inflation of the variance of the evolutionary distance estimates. In such situations, it may be preferable to estimate the phylogeny using transversion data alone, minimizing the impact of the noisy transitions (Goldstein and Pollock, 1994). All of the distance formulas described above can be modified to estimate the number of transitions and transversions per site separately (see Kumar et al., 1993, and Swofford, 1996, for compilations of these methods). Alternatively, one could recode the nucleotide states into R (A or G) and Y (C or T) and apply a two-state distance correction (analogously to the transversion parsimony method). Alternative distances have been proposed for the K2P model that make separate estimates of the number of transition versus transversion substitutions and use a weighted combination of these as the estimate of the evolutionary distance (Schoniger and von Haeseler, 1993; Goldstein and Pollock, 1994; Tajima and Takezaki, 1994). These methods appear to be much more reliable for tree inference than the usual K2P distance (Pollock and Goldstein, 1994).

PROTEIN-CODING DNA SEQUENCES In principle, knowledge of the gene sequence should be more informative than the corresponding protein sequence. In practice, at least two factors call this assertion into question. First, silent substitutions in protein-coding genes are much more frequent than replacement substitutions; thus the third codon positions tend to become randomized quickly and convey very little information about distant phylogenetic relationships. Second, the base composition of the third codon position appears to vary systematically between some species, thereby indicating that it can be subject to at least a moderately strong selective force that is different in different lineages. The presence of directional selection can lead to profound sequence convergences and consequent errors in inferred relationships. With these considerations in mind, three relatively simple strategies can be used to analyze protein-coding sequences, and a host of moderately to extremely complex alternatives exists.

The simplest method of calculating distances between sequences for protein-coding genes is to apply the distance formulas above directly to the gene sequence without special treatment. This method is reasonable, or even preferred, when the total amount of divergence is very small, in which case the resulting trees are based primarily on silent substitutions in the genes. The main drawback is that a systematic undercorrection for superimposed substitutions will result, since the assumption that all positions are equally subject to change will clearly be violated. If the amount of sequence divergence is truly small, then superimposed changes will be rare and the undercorrection will be negligible.

The second approach is to restrict the analysis to the first two nucleotides of each codon. This strategy is appropriate when a substantial sequence divergence is apparent. The rationale is that the third codon position will be largely randomized and hence phylogenetically uninformative. This approach, by definition, also circumvents the problem of the third codon position changing more rapidly than the first two and reduces the degree of violation of the assumption that all sites are changing at the same rate.

The third basic method is to infer the protein sequence from the gene sequence and perform the phylogenetic analysis at the protein level. This approach has two merits: (1) the protein is the most biologically relevant aspect of the gene (taken as a whole); and (2) the sequence can be compared with homologous molecules that were sequenced at the protein level, for which nucleotide sequences are therefore unknown. In addition to the distances for the Poisson and Proportional models described above, PHYLIP (Felsenstein, 1993) provides a distance under the Dayhoff model.

The more complex methods involve estimating the numbers of synonymous (silent) and non-synonymous (replacement) substitutions separately. When the maximum divergence between taxa is low, distances based on synonymous changes may reduce the effect of among-site rate variation, as synonymous substitutions are largely neutral (Kumar et al., 1993). For more distantly related taxa, restriction to non-synonymous changes tends to minimize the impact of noise contributed by a large number of silent changes. Many methods have been proposed for estimating synonymous versus non-synonymous substitutions (W.-H. Li et al., 1985b; Nei and Gojobori, 1986; W.-H. Li, 1993b; and references cited therein). These methods differ in the details of how they deal with multiple substitution pathways when two codons are more than one substitution apart and how they account for different levels of degeneracy (e.g., a site in a sequence is twofold degenerate if one of the three possible changes is synonymous and fourfold degenerate if all possible changes at the site are synonymous).

MAXIMUM LIKELIHOOD DISTANCES The most straightforward (and computationally intensive) method for estimating evolutionary distances is
to apply maximum likelihood according to the models described under "Models of Sequence Evolution." As noted above, the "tree" in this case is a single branch, and we estimate the branch length (expected number of substitutions per site) that maximizes the probability of one sequence evolving from the other. (Because of the time-reversibility of the models, it makes no difference which sequence is considered ancestral.) Felsenstein's (1993) DNADIST program obtains maximum likelihood estimates of distance under the JC, K2P (with or without a gamma-correction for among-site rate variation), and F84 models, but the same approach easily could be adapted to accommodate other models. Many (but not all) of the distance formulas presented above are maximum likelihood estimators (e.g., see Zharkikh, 1994). However, direct use of maximum likelihood to calculate the distance has a number of advantages. Most importantly, it allows model parameters, such as the transition:transversion ratio, to be maintained at a consistent value across all pairwise comparisons (e.g., although the standard K2P distance formula is a maximum likelihood estimate when estimating the transition:transversion ratio independently for every pair, the distance must be numerically evaluated using maximum likelihood in order to use a fixed ratio as a means of reducing sampling variance). Maximum likelihood estimation also provides a very clean way of handling missing or ambiguous data, as the probability of observing each of the bases allowed by the ambiguity can be explicitly evaluated.

Although maintenance of substitution-model parameters at a consistent value is an advantage of maximum likelihood distances, it adds the burden of specifying their values. One possible way of estimating these parameters is to perform phylogenetic analyses using a range of parameter values, then choose the parameter settings that maximize the additivity of the distances on the best tree(s) found (e.g., that minimize the value of E in equation 11). Alternatively, parameters may be estimated using maximum likelihood on a few "reasonable" trees obtained using simpler distances. If the parameter estimates are reasonably similar across these trees, it is probably safe to use their mean value (or the value from the tree that had the highest likelihood) as the parameter value for calculating distances as input to a tree search using a distance criterion. This hybrid approach can be an effective compromise between a full search under the maximum likelihood criterion (which may be computationally infeasible) and an arbitrary choice of parameter values using a distance criterion.

TREATMENT OF UNDEFINED VALUES Distance values become undefined if the apparent sequence divergence exceeds the maximum possible (true) distance under the assumed model of evolution. For example, in the JC model, complete randomization of sequences would lead to D = 0.75 (i.e., even for two random sequences, one-fourth of the nucleotides are expected to be identical by chance). If the observed dissimilarity equals or exceeds 0.75 due to sampling error or violation of the model, the logarithm in equation (20) cannot be taken. In this situation, it is probably wise not to proceed without taking steps to avoid problems due to excessive saturation. If only one or two sequences are causing the problem, they can be eliminated from the analysis. If the problem is mostly due to high rates of transition-type differences, transversion-only distances (or maximum likelihood distances with a high transition:transversion ratio) can be employed. As a last resort, any undefined distances can be replaced by an arbitrarily large distance value, such as twice the maximum observed distance.

ACCOMMODATING AMONG-SITE RATE VARIATION IN DISTANCE CORRECTIONS Distance corrections that assume equal rates of change across sites will be affected by the same problem that complicates maximum likelihood analysis when among-site rate heterogeneity exists: distances will underestimate the actual number of substitutions (Golding, 1983). Fortunately, this rate heterogeneity can be accommodated without too much difficulty. For maximum likelihood distances, any of the model variations described under "Accommodating Rate Heterogeneity Across Sites" in the "Maximum Likelihood Methods" section can be applied directly. If rates are assumed to follow a gamma distribution, special modifications of the distances described
above are available for the JC and K2P models (Jin and Nei, 1990) and TrN model (Tamura and Nei, 1993). Although not noted by these authors, these "gamma" distances can be obtained from the usual distances simply by replacing the function $\ln(x)$ with $\alpha(1 - x^{-1/\alpha})$ in the original formulas, where $\alpha$ is the shape parameter of the gamma distribution (this function is the inverse of the moment generating function for the distribution). In fact, this method also works for most (if not all) of the other time-reversible distances (Waddell and Steel, 1995; Lewis and Swofford, unpublished; see Swofford, 1996). For example, the general time-reversible distance with a distribution of rates across sites can be written as $d_{xy} = -\mathrm{trace}\{\Pi M^{-1}[\Pi^{-1}F_{xy}]\}$, where $M^{-1}$ is the same function used for the Jin-Nei and Tamura-Nei distances in the case of the gamma distribution, but can be the inverse of the moment-generating function for other distributions as well (Waddell and Steel, 1995). The value of $\alpha$ must be determined independently using one of the methods outlined above. Choice of an $\alpha$ value based on results from previous studies is also an option (e.g., Kumar et al., 1993), although evidence is accumulating that levels of rate heterogeneity vary widely among different genes, regions of genes, and organisms.

The invariable sites model (see above) can also be applied to distance estimation by removing a certain fraction of the constant sites from the data matrix. The easiest way to accomplish this is to subtract the constant $\phi N/4$ from the diagonal entries $n_{ii}$ in matrix (19) (and adjusting N accordingly) before calculating the distance, where $\phi$ is the desired proportion of invariable sites and N is the total number of sites (Waddell, 1995). If base frequencies are unequal, it is preferable to subtract $\pi_k \phi N$ from the kth diagonal element of the divergence matrix, where $\pi_k$ is the frequency of base k. When base composition is not homogeneous throughout the tree, or in other situations where constant sites have a different composition than the variable sites, the base frequencies used in this correction should be estimated from the constant sites alone.

LOG-DETERMINANT DISTANCES The models described above for maximum likelihood and distance estimation assume that the substitution probability matrices remain constant throughout the tree (i.e., they are stationary) and that they have the property of time reversibility (which jointly imply that base frequencies remain at a constant, equilibrium value). The LogDet (Steel, 1994a; Lockhart et al., 1994) or paralinear distance (Lake, 1994) is a transformation that yields additive distances under a much wider set of models. Perhaps most importantly, this transformation is robust to changing base composition (e.g., GC bias) among the taxa being studied, a potential source of systematic error if stationary models are assumed. The LogDet transformation will yield an additive distance (in expectation) under any Markov model of evolution (see above) as long as sites evolve identically and independently and rates of substitution are equal across sites. This general Markov model is described by a rooted tree, where the root can have any base composition (as long as all states have a non-zero frequency). There are no constraints on the parameters in each substitution probability matrix P(t) (all 12 substitutions are free to occur at different rates), and P(t) can be different for each branch or at different points along the same branch. Each P(t) matrix implies its own set of stationary base composition values, so these are also allowed to vary throughout the tree. These assumptions correspond to those of the maximum likelihood model proposed by Barry and Hartigan (1987a).

The basic form of the log-determinant distances is

$$d_{xy} = -\ln(\det F_{xy}) \qquad (22)$$

(Steel, 1994a), where "det" refers to the determinant* of a matrix and $F_{xy}$ is an r × r divergence matrix for sequences X and Y (e.g., equation 19)

*The definition of the determinant of a matrix is beyond the scope of this chapter. Introductions to matrix algebra can be found in many statistics texts or any linear algebra text. An excellent introduction for biologists is Bulmer (1994, p. 298 ff.).
with r equal to the number of character states (e.g., r = 4 for DNA sequences). For identical sequences, $d_{xy}$ should be set to zero, although in practice, equation (23) is used instead, in which case no explicit treatment is needed for this case. If evolution proceeds according to the model described in the above paragraph, distances calculated using equation (22) will have the property of tree additivity (apart from sampling error), but in general, this expression cannot be used to estimate the number of nucleotide substitutions per site (evolutionary distance). However, for stationary models, the value obtained from (22) can be scaled to a distance that is proportional to the evolutionary distance using the formula

LogDet: $$d_{xy} = \frac{-\ln(\det F_{xy}) + \frac{1}{2}\ln(\det \Pi_x\Pi_y)}{r} = -\frac{1}{r}\ln\left[\frac{\det F_{xy}}{\sqrt{\det \Pi_x\Pi_y}}\right] \qquad (23)$$

where $\Pi_x$ and $\Pi_y$ are diagonal matrices of the character-state frequencies in sequences X and Y, respectively (Lockhart et al., 1994). The expected value of this distance will be equal to the mean number of substitutions per site if base frequencies are all equal, in which case

$$\frac{1}{2}\ln(\det \Pi_x\Pi_y) = -r\ln r$$

Otherwise, it will overestimate the evolutionary distance by a constant factor that becomes larger as base composition becomes more unequal (Waddell, 1995). Note that equation (23) is equivalent to Lake's (1994) paralinear distance except for the scaling by 1/r. (Lake did observe, however, that his paralinear distance was approximately equal to r times the mean number of substitutions per site.) For non-stationary models, (23) tends to overestimate the mean number of substitutions, but it can also be an underestimate, depending on the base composition at internal points of the tree. Even under non-stationary models, however, the LogDet distance often provides better estimates of the number of substitutions per site than any of the standard distance transformations, because non-stationary base composition can lead the standard formulas to over- or underestimate the true distance by large amounts (Waddell, 1995). Thus, as a general rule, the branch lengths of a tree estimated using the LogDet distance should be considered just as useful as any other distance when base frequencies are not homogeneous.

A concern with using any distance transformation derived from a very general model is that it will suffer from inflated sampling errors, making it less reliable for tree selection unless sequences are very long. This concern appears to be unjustified, however, as the sampling variance of LogDet distances can approximate that of even the most simple (but least general) distance transformations described above (Waddell, 1995). For example, when applied to simple, stationary models with equal base frequencies, the variance of the LogDet distance (Lockhart et al., 1994) becomes equal to that calculated by the usual variance formulas. Furthermore, four-taxon computer simulations (D.L. Swofford, P.O. Lewis, and P.J. Waddell, unpublished) show that when data are simulated according to any of the models in the GTR family (Figure 11), the minimum evolution method using LogDet distances leads to recovery of the correct tree about as often as using other distance measures, including the distance specific to the simulation model, for all but very short sequences (<200 bases).

The LogDet can be applied to amino acid sequences (Lake, 1994 gave a four-taxon example), or even using each of the 61 non-stop codons as character states. The variance of the LogDet may become more of a problem in these situations, so it may be useful to group some states together (e.g., into the six main amino acid classes). Another problem is that a state may be entirely absent in one or more of the sequences. In this case, the determinants of $F_{xy}$ and of $\Pi_x$ and/or $\Pi_y$ will be zero (yielding an undefined distance when the log is taken). The best way to deal with this situation remains to be determined; possible solutions include removing the state from the $F_{xy}$ matrix altogether (if the state is absent from all of the sequences), pooling this state with another, or setting the corresponding elements of $F_{xy}$ to some small value such as 1/(2N).

WHICH SEQUENCE DISTANCE TRANSFORMATION IS BEST? As the above discussion indicates, dis-
Lockhart et al. (1994) found that use of tance analysis of sequence data requires choos-
LogDet distances yielded more believable trees in ing a distance transformation from a rather over-
three examples for which nucleotide composition whelming number of possibilities. Ideally, we
was variable over taxa. However, a weakness of would always choose the most general distance
the standard LogDet transform in real applica- available, as this distance has the smallest chance
tions is that it is no more robust to unequal sub- that assumptions corresponding to particular
stitution rates at different sites than are other dis- restrictions of the underlying model will be vio-
tance measures (Barry and Hartigan, 198713; lated. Currently, this criterion would lead to a
Lockhart et al., 1994; Lake, 1994). Lockhart et al. tradeoff between the LogDet/paralinear distance
(1994) reported that for some data sets, reasonable (which requires special treatment if there is sub-
trees could be obtained only after eliminating sites stantial among-site rate variation) or the GTR
that were uninformative according to the parsi- (general time-reversible) distance with an appro-
mony criterion, and suggested that inclusion of priate correction for rate heterogeneity (Waddell
sites that were highly unlikely to change might be and Steel, 1995). However, generality often
the cause of the problem. Unfortunately, unlike comes at the price of increased variance, and
the less general distance transformations, LogDet many simulation studies have indicated that
distances cannot be directly modified to take ac- simpler distances based on models that are
count of a specific distribution of rates such as the known to be vioIated may nonetheless perform
gamma distribution. better for phylogenetic inference than distances
Waddell (1995) has shown that by subtracting based on the same model being used to generate
an appropriate proportion of invariant (constant) the data (e.g., see Nei, 1991 and references cited
sites from the diagonal elements of I?, (see "Ac- therein). For example, when sequences are rela-
commodating Among-Site Rate Variation in Dis- tively short, use of simple dissimilarity (p-dis-
tance Corrections," above), LsgDet distances can tance) or the JC distance can lead to correct
become nearly additive even if the true distribu- recovery of the true tree more often than the K2P
tion of rates across sites follows a continuous dis- distance, even when there is a fairly strong tran-
tribution sucl~as the gamma. Methods of estimat- sition/transversion bias.
ing the proportion of invariable sites for It is difficult to provide simple prescriptions
maximum likelihood and other distance transfor- for the choice of a distance measure (but see Ku-
mations perform well, whereas simple removal of mar et al., 1993, for one such set of recommenda-
parsimony-uninformative sites tends to be too se- tions). In general, we believe that additional stud-
vere. However, as base composition becomes ies will confirm preliminary simulations that
more heterogeneous over taxa, sites with different indicate little variance-inflation problem with
rates of change also change base composition LogDet/paralinear distances when all sites evolve
with respect to each other. Thus, it may be impor- at the same rate (see "Log-Determinant Dis-
tant to estimate base frequencies using only the tances," above). Because of their generality (in-
constant sites, rather than the full data set, when cluding their robustness to base composition bi-
calculating the proportion of sites to remove from ases), log-determinant distances are probably
the diagonal elements of F,. Removing constant preferable to other, more restricted, distances that
sites is helpful and may adequately correct for the do not incorporate corrections for among-site rate
problem of rate heterogeneity plus shifting base variation. Beyond that, we offer Kumar et al.'s
composition (Waddell, 1995),but a better strategy (1993, p. 29) rule of thumb: "As a general rule, if
may be to classify sites into a few distinct rate two distance measures give similar distance val-
classes, apply the LogDet transform to each, and ues for a set of data, use the simpler one because it
sum these separate estimates to obtain the final has a smaller variance." Of course, the longer the
distance. sequence length, the less variance considerations
dominate the choice of a distance. With long sequences (e.g., >2000 bases), it may be more profitable to emphasize closer modeling of the substitution process than to worry too much about variance.

Transformation of Allozyme and Restriction Endonuclease Data to Distances

A large number of measures have been proposed for transforming allelic and genotypic frequency data to genetic distances (S. Wright, 1978); we will treat only a few of the more commonly used ones here. Historically, the most frequently used genetic distance has been that of Nei (1972, 1978). Let $x_i$ and $y_i$ be the frequencies of the ith allele at a particular locus in taxa X and Y, respectively. Nei's (1972) standard genetic distance can then be defined as

$$D_N = -\ln\left(\frac{J_{XY}}{\sqrt{J_X J_Y}}\right) \qquad (24)$$

where $J_X$, $J_Y$, and $J_{XY}$ are the arithmetic means across loci of $\sum x_i^2$, $\sum y_i^2$, and $\sum x_i y_i$, respectively, with summations over alleles at each locus. Equation (24) gives a biased estimate when sample sizes are small; an unbiased estimate of the standard distance is obtained by replacing $\sum x_i^2$ and $\sum y_i^2$ with $(2n_X \sum x_i^2 - 1)/(2n_X - 1)$ and $(2n_Y \sum y_i^2 - 1)/(2n_Y - 1)$, respectively (Nei, 1978). $D_N$ is intended to measure the number of codon substitutions per locus that have occurred after divergence between a pair of populations (taxa). However, this interpretation is valid only if the rate of gene substitution per locus is uniform across both loci and lineages, an assumption that is almost certainly unrealistic (Hillis, 1984) for any systematically informative data set. Hillis (1984) demonstrated that violation of the assumption of rate uniformity leads to a peculiar property of $D_N$ when it is applied in systematic studies involving interspecific comparisons. He showed three hypothetical two-locus cases in which, for each case, two taxa had identical allele frequencies at one locus and shared no alleles at the second locus. However, due to different levels of polymorphism within the two taxa, $D_N$ varied from 0.41 to 1.10. Hillis (1984) consequently recommended the following modification to $D_N$ to alleviate the problems created by non-uniform rates of change:

$$D_N^* = -\ln\left[\frac{1}{L}\sum_{k=1}^{L}\frac{\sum_i x_{ik} y_{ik}}{\sqrt{\sum_i x_{ik}^2 \sum_i y_{ik}^2}}\right]$$

where L is the total number of loci; that is, the distance is computed from the arithmetic mean of the single-locus identities. (Although Hillis, 1984, did not specifically recommend it, an unbiased version of $D_N^*$ could be obtained by a substitution equivalent to that for Nei's original distance.)

Nei's distances (in either their original form or as modified by Hillis, 1984) are non-metric in that they frequently violate the triangle inequality. Farris (1981) has heavily criticized them for this reason, arguing that when a distance measure is non-metric, it is meaningless to fit branch lengths under an additive-tree model in which branch lengths are interpreted as amounts of evolutionary change. Felsenstein (1984) countered that if branch lengths were interpreted as expected, rather than actual, amounts of change, Farris's objections were moot. While we do not wish to become entangled in this controversy (see also Farris, 1985, 1986a; Felsenstein, 1986), we basically agree with Felsenstein, without going so far as to recommend routine usage of Nei's distance. If Nei's model of evolution is appropriate (which is obviously open to question), then the non-metricity of his distance is not in itself a reason to shun it.

Another widely used distance measure is that of J.S. Rogers (1972):

$$D_R = \frac{1}{L}\sum_{k=1}^{L}\sqrt{\frac{1}{2}\sum_i (x_{ik} - y_{ik})^2}$$

Rogers' measure has the virtues of simplicity and an easily interpretable geometric basis. Except for a scaling factor, it is simply the Euclidean distance between the allele frequency vectors for each locus of the two taxa being compared. However, Rogers' coefficient shares with Nei's the undesirable property of being too heavily influenced by
within-taxon heterozygosity (S. Wright, 1978; Hillis, 1984); the distance between two taxa that are fixed for alternate alleles exceeds that between two taxa in which one or both are heteroallelic but have no alleles in common.

An alternative Euclidean measure that overcomes this limitation is the arc distance of Cavalli-Sforza and Edwards (1967), which is given by

$$D_{arc} = \sqrt{\frac{1}{L}\sum_{k=1}^{L}\left(\frac{2\theta_k}{\pi}\right)^2}$$

where $\theta = \cos^{-1}\sum_i \sqrt{x_i y_i}$. Thus, if no alleles are shared between a pair of taxa, the distance takes its limiting value of one regardless of the variability within either population. Perhaps more importantly, this distance incorporates an angular transformation of gene frequencies in an attempt to make the variances of the transformed frequencies independent of the ranges in which they fall. This transformation has the effect of standardizing the distance with respect to random drift, so that the rate of increase in genetic distance under drift is nearly independent of the initial gene frequencies. The Cavalli-Sforza and Edwards (1967) arc distance and its relative, the chord distance, thus incorporate some realistic assumptions about the nature of evolutionary change in gene frequencies without the undesirable properties of the Nei (1972, 1978) and the Rogers (1972) measures.

The simplest distance of all is the Manhattan distance (attributed to Prevosti by S. Wright, 1978), which for a single locus equals

$$D_M = \frac{1}{2}\sum_i |x_i - y_i|$$

An arithmetic mean is used to combine distances across loci. Unlike the Cavalli-Sforza and Edwards (1967) distances, this method gives equal weight to a given frequency difference, regardless of where it occurs on the scale from zero to one. It is not sensitive to intrataxon variability, however.

To transform restriction-site data to distances, Nei and Li's (1979) method for estimating the number of nucleotide substitutions that have occurred since divergence of a pair of taxa X and Y from a common ancestor is typically used. An estimate of the proportion of ancestral restriction sites that have remained unchanged until the present is given by

$$\hat{S} = \frac{2n_{XY}}{n_X + n_Y}$$

where $n_{XY}$ is the number of identical sites shared by the two taxa, and $n_X$ and $n_Y$ are the total number of restriction sites in taxa X and Y, respectively. From this quantity we can estimate the mean number of substitutions per nucleotide site using either of the following:

$$d = -\frac{\ln \hat{S}}{r} \qquad (25a)$$

$$d = -\frac{3}{2}\ln\left[\frac{4\hat{S}^{1/(2r)} - 1}{3}\right] \qquad (25b)$$

where r is the length of the endonuclease recognition sequence (usually 4 or 6). The first formula (25a) treats original restriction sites restored by back-mutations as new sites, and was first proposed by Upholt (1977). The second formula (more correctly) considers the reverted sites as identical to the original sites.

Li and Graur (1991) suggested estimating the proportion of nucleotide differences as

$$\hat{p} = 1 - \hat{S}^{1/r}$$

and then using the standard Jukes-Cantor distance transformation to estimate the number of nucleotide substitutions (i.e., substitute for D in equation 20). A related method of estimating the number of nucleotide substitutions per site from restriction site data via maximum likelihood has been developed by J. Felsenstein (available as a test program "Restdist" from the same location as PHYLIP; see Appendix). His method assumes a Kimura two-parameter model of evolution (i.e., equal base frequencies with a potentially different rate of transitions relative to transversions) and can include a correction for among-site rate variation according to a gamma distribution (see "Accommodating Among-Site Rate Variation in Distance Corrections," above). $\hat{S}$ is used to estimate the proportion of restriction sites that have been preserved by a pair of species, and $\hat{S}^{1/r}$ then represents the corresponding fraction of similarity at each of the r sites in the recognition sequence. The distance value that predicts this fraction of similar sites under the chosen model and parameter settings is then estimated by maximum likelihood.

The methods described above are appropriate when all restriction endonuclease recognition sites are the same length. For studies involving enzymes with different sizes of recognition sequences, more complicated methods developed by Nei and Tajima (1983) can be used, although we will not describe them here.

Nei and Li (1979) also addressed the problem of estimating nucleotide substitutions from restriction fragment data. However, these estimates are reliable only if the actual number of substitutions has been low (e.g., the samples are restricted to conspecific populations). Consequently, we will not describe their procedures for dealing with fragment data; the interested reader can consult their paper directly.

Immunological and Nucleic Acid Hybridization Data

When analyzing immunological measurements, it is usually assumed that, within certain limits, the measured immunological distance (ID) increases linearly with the number of amino acid differences in the proteins being compared. The constant of proportionality depends on the number of independent binding domains and on the fraction of amino acid changes that alter a domain sufficiently to inhibit antibody binding. Thus, there is significant uncertainty in the exact scaling. If we knew the scaling, we would apply a correction for superimposed amino acid replacements. This is of little practical importance, however, since the amount of divergence being measured is usually small, so any correction would also be small. We suggest equating evolutionary distance to the immunological distance; that is, assume that d = ID for each pair of proteins.

Hybridization data and their transformation to amount of difference in the DNAs are discussed extensively in Chapter 6. These data can be corrected for superimposed base changes by the methods discussed above.

Model-Based Corrections for Character Data: Hadamard Conjugation

The Hadamard conjugation, or spectral analysis (Hendy and Penny, 1993), offers another framework for taking superimposed changes into account. It will not be possible to provide a complete description and justification of this family of methods in the space available, so we will instead try to provide a clear explanation of the basic methodology. We begin by describing another model of character change introduced formally by Cavender and Felsenstein (1987). The Cavender-Felsenstein model is essentially a two-state equivalent of the Jukes-Cantor (1969) model. Each of the two states (0 and 1) is assumed to occur at equal frequency, and the probability of change from state 0 to state 1 is equal to the probability of change in the opposite direction. For example, this model might apply if we pool the purines (A and G) into one character state (0) and the pyrimidines (C and T) into another character state (1).

Revisiting the Felsenstein Zone

Consider the problem of calculating the probabilities of obtaining the various character patterns on a tree such as that shown in Figure 16A, which corresponds to one of the examples used by Felsenstein (1978a) to demonstrate the potential inconsistency of parsimony. Let $P_{ijkl}$ represent the probability of each possible pattern, where i, j, k, and l are the states (0 or 1) found in taxa 1, 2, 3, and 4, respectively. These pattern probabilities can be determined using the same system described under "Calculating the Likelihood of a Tree." As an example, let us evaluate the probability that the pattern of Figure 16B (0011) will evolve under the conditions of the Cavender-Felsenstein model. We first note that because of the time-reversibility assumption, we can re-root the tree at
(E) $P_{0011} + P_{1100} = 2P_{0011} = y(1-x)(1-y)^2(x+1) + x^2y^3$

Figure 16 Calculation of the probability of observing a given pattern of character states on a tree. (A) An unrooted tree for four taxa with probabilities of character differences x or y along each branch. (B) Tips of tree labeled by character states in the pattern of interest. (C) Tree re-rooted at an arbitrary internal node. (D) Calculation of the probability of the pattern shown in (B). (E) Calculation of expected proportion of characters that favor tree (A). (F) Calculation of expected proportion of characters that favor the tree grouping taxa 1 and 3.
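The calculation summarized in panels (D) to (F) of Figure 16 can be checked by brute force. The following is a minimal Python sketch, not the authors' procedure: it assumes that the long branches (change probability x) lead to taxa 1 and 3, that the remaining three branches have change probability y, and it uses the values x = 0.3 and y = 0.06 from the worked example in this section. The function name `pattern_probability` is hypothetical.

```python
from itertools import product

def pattern_probability(pattern, x, y):
    """Probability of a 0/1 pattern for taxa (1, 2, 3, 4) on the tree ((1,2),(3,4)).

    Assumed layout: taxa 1 and 3 are on "long" branches (change probability x);
    taxa 2 and 4 and the internal branch are "short" (change probability y).
    """
    t1, t2, t3, t4 = pattern

    def step(a, b, p):
        # Probability of ending in state b after starting in state a on a branch
        return p if a != b else 1.0 - p

    total = 0.0
    # Sum over the four state configurations at the two internal nodes
    for a, b in product((0, 1), repeat=2):
        total += (0.5                              # prior probability of the basal state
                  * step(a, t1, x) * step(a, t2, y)
                  * step(a, b, y)                  # internal branch
                  * step(b, t3, x) * step(b, t4, y))
    return total

x, y = 0.3, 0.06
true_tree = (pattern_probability((0, 0, 1, 1), x, y)
             + pattern_probability((1, 1, 0, 0), x, y))
wrong_tree = (pattern_probability((0, 1, 0, 1), x, y)
              + pattern_probability((1, 0, 1, 0), x, y))
# Closed form shown in panel (E) of Figure 16
closed_form = y * (1 - x) * (1 - y) ** 2 * (1 + x) + x ** 2 * y ** 3

print(round(true_tree, 6), round(wrong_tree, 6))
```

Under these assumptions the enumeration agrees with the closed form of panel (E), and the patterns favoring the tree that groups taxa 1 and 3 are the more probable ones.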
an arbitrary internal node (Figure 16C), and then sum the probabilities of each of the four configurations of states at the two internal nodes (Figure 16D). That is, for each scenario, we multiply the prior probability of the basal state (= 1/2 in this case) times the product of the probabilities of the various changes (or non-changes) implied by each reconstruction. Because of the symmetry of the branch lengths used here, the probability ($P_{1100}$) of the other pattern that supports the tree of Figure 16A is equal to $P_{0011}$. Thus, the probability of a character pattern evolving that supports the true tree is

$$P_{0011} + P_{1100} = y(1-x)(1-y)^2(1+x) + x^2y^3 \qquad (26)$$

where x is the probability of a character-state change along the "long" branches and y is the corresponding probability for the "short" branches. Equation (26) is equivalent to one given by Felsenstein (1978a). A similar derivation reveals that the probability of a pattern evolving that supports the tree grouping taxa 1+3 and 2+4 is

$$P_{0101} + P_{1010} = y^2(1-x)(1-y)(1+x) + x^2(1-y)^3$$

Felsenstein (1978a) used these results to show that for many values of x and y (x > y), the probability of evolving character patterns that favor an incorrect tree exceeds that of patterns supporting the
true tree. For example, if x = 0.3 and y = 0.06,

$$P_{0011} + P_{1100} \approx 0.048 \qquad P_{0101} + P_{1010} \approx 0.078$$

Thus, a sample of 1000 characters will, on average, contain 30 more characters favoring an incorrect tree than the true tree (78 versus 48).

The tedious strategy outlined in the above paragraph could be used to calculate the probabilities of any of the $2^4 = 16$ possible character patterns for the four terminal taxa. Furthermore, it could in principle be generalized to trees of any size. But as there are $2^T$ distinct character patterns and $2^{T-2}$ ways of generating each, this algebraic approach quickly becomes unmanageable.

Figure 17 Definition of Hadamard matrices. A Hadamard matrix H is a square matrix whose entries are all 1 or -1, and with every row (and column) orthogonal to every other row (and column). (A) Basic form of a Hadamard matrix, and recursive formula for generating the next larger matrix. (B) Example calculation of a matrix with four rows and columns from the previous matrix with two rows and columns.

Calculating Character-Pattern Probabilities via the Hadamard Conjugation

Hadamard conjugation (Hendy and Penny, 1993) provides an alternative mechanism for obtaining the above pattern probabilities.* A Hadamard matrix (described by the nineteenth century mathematician of that name) is a matrix of 1's and -1's in a simple repeating pattern (Figure 17). For T taxa and two character states, we will use a Hadamard matrix containing $m = 2^{T-1}$ rows and columns. For the example discussed in the section above, m = 8, and the corresponding Hadamard matrix is

$$H = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{pmatrix}$$
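The recursive definition described in the caption of Figure 17 translates directly into code. The sketch below is an illustration only; it assumes the usual doubling rule, in which each matrix H generates the next larger one as [[H, H], [H, -H]] starting from the 1 x 1 matrix [1]. The function name `hadamard` is hypothetical.

```python
def hadamard(n_doublings):
    """Return the Hadamard matrix with 2**n_doublings rows, as a list of lists."""
    h = [[1]]
    for _ in range(n_doublings):
        # The next larger matrix is [[H, H], [H, -H]]
        top = [row + row for row in h]
        bottom = [row + [-value for value in row] for row in h]
        h = top + bottom
    return h

T = 4                  # number of taxa in the four-taxon example
H = hadamard(T - 1)    # m = 2**(T-1) = 8 rows and columns
for row in H:
    print(" ".join(f"{value:2d}" for value in row))
```

Because the rows are mutually orthogonal and each has m entries of magnitude 1, multiplying H by itself gives m times the identity matrix, so the inverse is simply H divided by m.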
*This section assumes some familiarity with matrix algebra; see many statistics texts or any linear algebra text for introductions. Bulmer (1994, p. 293 ff.) provides an accessible overview for biologists. For now, note that the product of a matrix A and a vector b, denoted Ab, can be obtained as in the following example:

The inverse of matrix A, denoted $A^{-1}$, is a matrix such that $AA^{-1} = I$, where I is an identity matrix that has 1's on the diagonal and 0 everywhere else. For example, if

Finally, multiplication of a matrix by a scalar (ordinary number) implies multiplication of every element of the matrix by the scalar:

(see Figure 17 for an explanation of how these matrices are defined). The branch lengths x and y represent the probabilities that the character states at either end of a branch will be different ("observed differences") at a given site. We will store these values in an m-element vector p at a position determined by the indexing scheme shown in Figure 18. For our example, p is defined as

$$p = (0,\ 0.3,\ 0.06,\ 0.06,\ 0.3,\ 0,\ 0,\ 0.06)^{\top}$$

The first element of this vector, $p_0$, is always set to 0. If the branch corresponding to a given index k does not exist in the tree, $p_k$ is also set to 0. Thus, we have $p_5 = p_6 = 0$, because branch 5 (representing a partition separating taxa 1 and 3 from taxa 2 and 4) and branch 6 (representing a partition separating taxa 2 and 3 from taxa 1 and 4) do not exist on the tree.

Figure 18 (A) Indexing of partitions in the Hadamard conjugation. To label branches, root the tree arbitrarily at the highest numbered taxon. Label as $2^{i-1}$ the branch leading to each tip i. Label the remaining branches by the sum of the labels of the branches immediately above it. (B) Each branch defines a partition or split that is indexed by the branch's label. Note that the partition corresponding to any index k can be determined by decomposing it into its binary components. For example, with T = 8 the index $90 = 64 + 16 + 8 + 2 = 2^6 + 2^4 + 2^3 + 2^1$, corresponding to the partition {2,4,5,7}, {1,3,6,8}.

(B) Index   Partition
    0       { }, {1,2,3,4}
    1       {1}, {2,3,4}
    2       {2}, {1,3,4}
    3       {1,2}, {3,4}
    4       {3}, {1,2,4}
    5       {1,3}, {2,4}
    6       {2,3}, {1,4}
    7       {1,2,3}, {4}

Under the conditions of the model, the observed differences p can be converted to "expected total changes per site" q using the formula

$$q_i = -\frac{1}{2}\ln(1 - 2p_i) \qquad (27)$$

where $q_i$ is the expected number of changes per site along branch i. Note that this is just a special case of the general Poisson-correction formula (21) with b = 1/2.

Define the branch-length spectrum $\gamma(T)$ as

$$\gamma(T) = (\gamma_0,\ q_1,\ q_2,\ \ldots,\ q_7)^{\top}, \qquad \gamma_0 = -(q_1 + q_2 + \cdots + q_7) = -1.10804$$

with $q_1$ through $q_7$ defined using equation (27). In some cases (e.g., simulation studies), it may be more convenient to start with the q vector directly, in which case

$$p_i = \frac{1 - e^{-2q_i}}{2} \qquad (28)$$
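Equations (27) and (28) are inverses of one another, which is easy to confirm numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the vector `p` assumes the Figure 18 indexing with long branches (x = 0.3) leading to taxa 1 and 3 and short branches (y = 0.06) elsewhere.

```python
import math

def observed_to_expected(p):
    """Equation (27): q = -(1/2) ln(1 - 2p), defined for 0 <= p < 0.5."""
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * p)

def expected_to_observed(q):
    """Equation (28): p = (1 - exp(-2q)) / 2."""
    return (1.0 - math.exp(-2.0 * q)) / 2.0

# Assumed layout of p for the four-taxon example (index 0 is always 0)
p = [0.0, 0.3, 0.06, 0.06, 0.3, 0.0, 0.0, 0.06]
q = [observed_to_expected(value) for value in p]

# Branch-length spectrum: first element is minus the sum of the others
gamma = [-sum(q)] + q[1:]
print([round(value, 6) for value in gamma])
```

The guard on p reflects the fact that under a two-state symmetric model the probability of an observed difference saturates at 0.5, where the logarithm in (27) is undefined.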
Now let s(T) be the expected sequence spectrum, a vector where each element $s_k$ is the predicted proportion of the characters supporting each possible bipartition of the taxa (division into two subsets; see Figure 18 for how bipartitions are indexed). For example, $s_3$ is equivalent to Felsenstein's (1978a) $P_{0011} + P_{1100}$, and $s_5$ is equivalent to $P_{0101} + P_{1010}$. The values of s(T) can be obtained using the following Hadamard conjugation:

$$s(T) = H^{-1}\exp(H\gamma(T)) \qquad (29)$$

where the exponential function is applied separately to each element of $H\gamma$. Let us apply formula (29) to our example. First, the generalized distance vectors $\rho$ and r are calculated as follows:

$$\rho = H\gamma, \qquad r = \exp(\rho)$$

Each entry in $\rho$ represents $-2\delta_i^*$, where $\delta_i^*$ is a corrected generalized distance. The exponential transformation then converts each $\rho_i$ to an observed generalized distance, $r_i = 1 - 2\delta_i$. (They are called "generalized distances" because they represent the lengths of path sets that correspond not only to distances between pairs of taxa, but also to groups of non-intersecting paths involving even numbers of taxa.) The expected sequence spectrum s(T) for tree T is then obtained as follows:

$$s(T) = H^{-1}r = \left(\frac{1}{m}H\right)r \qquad (32)$$
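The steps leading to equation (32) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the Hadamard matrix is taken to have entries $(-1)^{|i \wedge j|}$ (the number of shared binary digits of the row and column indices), which is equivalent to the doubling rule of Figure 17, and the vector `p` assumes long branches (x = 0.3) on taxa 1 and 3. The function names are hypothetical.

```python
import math

def hadamard_transform(v):
    """Multiply v by the Hadamard matrix with entries (-1)**popcount(i & j)."""
    m = len(v)
    return [sum(v[j] if bin(i & j).count("1") % 2 == 0 else -v[j]
                for j in range(m))
            for i in range(m)]

def expected_spectrum(gamma):
    """Equation (29): s = H^-1 exp(H gamma), using H^-1 = H / m."""
    rho = hadamard_transform(gamma)              # rho = H gamma
    r = [math.exp(value) for value in rho]       # elementwise exponential
    m = len(gamma)
    return [value / m for value in hadamard_transform(r)]

x, y = 0.3, 0.06
p = [0.0, x, y, y, x, 0.0, 0.0, y]               # assumed Figure 18 indexing
q = [-0.5 * math.log(1.0 - 2.0 * value) for value in p]   # equation (27)
gamma = [-sum(q)] + q[1:]

s = expected_spectrum(gamma)
print([round(value, 6) for value in s])
```

Elements 3 and 5 of the result are the two quantities compared in the Felsenstein-zone example, and the whole vector sums to 1 because it is a probability distribution over the eight bipartitions (element 0 being the constant patterns).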
(The simple form of the inverse of a Hadamard matrix, shown above, is an important advantage of the method.) For our example,

$$s(T) = (0.408712,\ 0.177096,\ 0.041928,\ 0.048264,\ 0.177096,\ 0.077832,\ 0.027144,\ 0.041928)^{\top} \qquad (33)$$

The probabilities corresponding to $P_{1100} + P_{0011}$ and $P_{0101} + P_{1010}$ ($s_3$ and $s_5$, respectively) correspond exactly to those calculated algebraically in the preceding section. Hadamard conjugation has strong advantages over such algebraic calculations, however. First, we have not only calculated the probabilities of characters supporting these two bipartitions, but all of the other bipartitions as well. Second, the method is general, and extends automatically to the calculation of expected character-state distributions even for more realistic evolutionary models and large trees. (Note, however, that the exponential growth of the size of the vectors, e.g., 524,288 elements for 20 taxa, puts a practical limit on tree size.) For instance, Hendy and Penny (1989) used this feature to show that parsimony could be inconsistent under a molecular clock; Bull et al. (1993a) used it to examine the consequences of combining different sources of data; and Charleston et al. (1994) used it for tree-selection simulations.

Invertibility of the Hadamard Conjugation

Although the prediction of pattern frequencies as outlined above can be useful, the power of the Hadamard conjugation in phylogenetic applications lies in its invertibility. Specifically, all of the above operations can be performed in the opposite direction: starting from an observed sequence spectrum $\hat{s}$ (pattern frequencies observed in the data), we can work back to a conjugate spectrum $\hat{\gamma}$, which is an estimate of the underlying branch-length spectrum $\gamma(T)$. To demonstrate the inverse operations, suppose that the observed sequence spectrum $\hat{s}$ corresponds exactly to s(T) as calculated in equation (33). Solving for r in equation (32) yields

$$r = H\hat{s}$$

and since the log function is the inverse of the exponential,

$$\rho = \ln r$$

Thus, the full Hadamard conjugation in this direction is

$$\hat{\gamma} = H^{-1}\ln(H\hat{s})$$

The conjugate spectrum is evaluated as

$$\hat{\gamma} = (-1.10804,\ 0.458145,\ 0.063917,\ 0.063917,\ 0.458145,\ 0,\ 0,\ 0.063917)^{\top}$$

Finally, use of formula (28) to convert our estimate of the expected number of changes ($\hat{\gamma}_i$) to the number of observed differences predicted for each branch ($\hat{p}_i$) leads to exact recovery of the original branch lengths (x = 0.3, y = 0.06).

Application to Real Data

Even if the assumptions of our evolutionary model were perfectly satisfied, we cannot expect the observed sequence spectrum $\hat{s}$ to correspond exactly to the true spectrum s, because the sequences obtained in an actual study represent a finite sample and therefore are subject to sampling error. To illustrate the use of Hadamard conjugation in practice, we will draw a sample of characters that have evolved according to our model; this sample will be used to represent a set of observed sequence data that have evolved according to the model.
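Both the exact inversion and the effect of drawing a finite sample can be illustrated with a short Python sketch. It rests on the same assumptions as before (Hadamard entries $(-1)^{|i \wedge j|}$, long branches on taxa 1 and 3), uses Python's standard `random` module rather than the generator used by the authors, and picks an arbitrary seed; the sample it draws is therefore not the sample reported in the text. All function names are hypothetical.

```python
import math
import random

def hadamard_transform(v):
    """Multiply v by the Hadamard matrix with entries (-1)**popcount(i & j)."""
    m = len(v)
    return [sum(v[j] if bin(i & j).count("1") % 2 == 0 else -v[j]
                for j in range(m))
            for i in range(m)]

def expected_spectrum(gamma):
    """Forward direction: s = H^-1 exp(H gamma)."""
    m = len(gamma)
    r = [math.exp(value) for value in hadamard_transform(gamma)]
    return [value / m for value in hadamard_transform(r)]

def conjugate_spectrum(s_hat):
    """Inverse direction: gamma_hat = H^-1 ln(H s_hat)."""
    r_hat = hadamard_transform(s_hat)
    if min(r_hat) <= 0.0:
        raise ValueError("log undefined: an element of H s_hat is not positive")
    m = len(s_hat)
    rho_hat = [math.log(value) for value in r_hat]
    return [value / m for value in hadamard_transform(rho_hat)]

# Branch-length spectrum of the four-taxon example (x = 0.3, y = 0.06)
x, y = 0.3, 0.06
q = [-0.5 * math.log(1.0 - 2.0 * v) for v in [0.0, x, y, y, x, 0.0, 0.0, y]]
gamma = [-sum(q)] + q[1:]

# Without sampling error the round trip recovers gamma, and hence x and y
s = expected_spectrum(gamma)
gamma_back = conjugate_spectrum(s)
p_back = [(1.0 - math.exp(-2.0 * g)) / 2.0 for g in gamma_back[1:]]

# A finite sample of 1000 characters (seed chosen arbitrarily)
rng = random.Random(1)
draws = rng.choices(range(len(s)), weights=s, k=1000)
counts = [draws.count(k) for k in range(len(s))]
s_hat = [count / 1000 for count in counts]

# The sampled spectrum can be inverted only if every element of H s_hat is positive
gamma_hat = conjugate_spectrum(s_hat) if min(hadamard_transform(s_hat)) > 0 else None
print(counts)
```

With sampling error the elements of the estimated spectrum that correspond to branches absent from the tree are no longer exactly zero, which is why a tree-selection criterion is needed.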
The $\hat{s}$ vector below shows the results of a random sample of 1000 characters (using the pseudorandom number generator in Mathematica) according to the expected sequence spectrum in equation (33).

As expected, parsimony analysis of this data set will choose an incorrect tree, as the 48 characters supporting the true tree ($\hat{s}_3$) are contradicted by 76 characters ($\hat{s}_5$) supporting the tree that groups taxa 1 and 3. Solution of equation (36) yields

If the transformed data conform to a treelike pattern, all but 2T - 3 of the elements in $\hat{\gamma}$ will be close to zero (or negative, in the case of $\hat{\gamma}_0$), and the bipartitions corresponding to significantly positive elements will be compatible with a single tree. In this example, $\hat{\gamma}_5$ and $\hat{\gamma}_6$ are both close to zero. The remaining bipartitions are compatible (all but $\hat{\gamma}_3$ define an "uninformative" partition splitting a single terminal taxon from the remainder). Thus, the tree of Figure 16A is clearly specified by the corrected data.

CHOOSING A TREE. With real data sets, the picture is seldom as clear as the above section suggests, and we must use one of several methods to choose an optimal tree based on the transformed data represented by the $\hat{\gamma}$ vector. The closest tree procedure (Hendy, 1991) is one commonly recommended method. For a given tree T containing K branches, it is straightforward to find a vector $\gamma(T)$ that minimizes the Euclidean distance from $\gamma(T)$ to $\hat{\gamma}$. The squared distance can be obtained [without the need to form $\gamma(T)$ explicitly] using the formula

where the expressions $e_i \in e(T)$ and $e_i \notin e(T)$ limit the summations to those branches (= edges) that are included in or absent from, respectively, the tree being tested (Hendy and Penny, 1993).

The closest tree is the one that minimizes the value of formula (38) over all possible trees, and can be found using (for example) a modification of the branch-and-bound algorithm of Penny and Hendy (1987). Note that some of the $\hat{\gamma}$ values calculated using formula (37) (other than $\hat{\gamma}_0$) may be negative, although this did not happen in our example. Any tree that would include one of these branches is automatically rejected.

For the example in the above section, the squared distances of the three trees to $\hat{\gamma}$ are

Thus, the first tree is the closest tree.

Another method for choosing a tree is corrected parsimony. The conjugate spectrum can be thought of as a transformation of the original data matrix to a new data matrix containing $2^{T-1}$ characters (in the case of two states), each corresponding to the partitions associated with a row
of ? The elements of ? are used as character Note that equation (39) is equal to the first term
weights, and a minimum-length tree under the on the right-hand side of (38).The second term in
weigliied parsimony criterjon is sought. As noted (38),although different for each tree, appears not
above, some elements of Y may be negative due to contribute greatly to the discrimination among
to lack of model flt or sampling error; these val- trees (Waddeil, 1995), and dropping it from the
ues are typically set to 0 before proceeding. Cor- optimality criterion allows US to use character
rccied parsimony 1s always consistent under the compatibility methods to ~ninimjze(39). Specifi-
Cavcr-tder-Felsenstein model (Steel et al., 1993a), cally, after setking any negative values in Y to 0,
unlike siandard parslrnorly. Corrected parsimony wc square each element and find a maximum
chooses the correct tree in our four-taxon exam- weighted clique; solution of this problem is then
ple, because the weight of character patterns supporting the true tree ($\gamma_3$) is greater than that of character patterns favoring alternative trees ($\gamma_5$ and $\gamma_6$). The simulation studies of Charleston (1994) suggest that corrected parsimony can be highly effective in some situations, and in general tends to outperform the closest tree and other methods described below.

An analogous method of corrected character compatibility also can be employed. This method searches for the largest weighted clique for the same data matrix and weights used for corrected parsimony. A clique is simply a set of mutually compatible characters that can all fit on the same evolutionary tree without homoplasy (e.g., Le Quesne, 1982; Estabrook, 1983). Standard graph theory algorithms exist for exact solution of the weighted clique problem (e.g., Bron and Kerbosch, 1973).

A final method is actually a hybrid of the closest tree and character compatibility approaches. Remember that when evolution proceeds exactly according to the model and there is no sampling error, $2T - 3$ of the elements in $\gamma$ will be positive; the remainder (except for $\gamma_0$) will equal 0. Thus, for any particular tree, the sum of the squared deviations from 0 of the elements of $\hat{\gamma}$ that correspond to bipartitions not found on the tree is a least-squares measure of the lack of fit:

$$\sum_{i \,\notin\, \text{tree}} \hat{\gamma}_i^{\,2}$$

equivalent to minimizing the sum of squared deviations for the excluded partitions from their expected value of 0. This method seems especially promising when each $\hat{\gamma}_i$ is divided by its estimated sampling error before proceeding (yielding a vector of standardized values), which gives a form of weighted least-squares tree selection (Waddell, 1995).

DATA EXPLORATION Apart from their use in estimating trees, spectral analysis methods are useful as aids in understanding the peculiarities of particular data sets. Strong contradictory signals in the $\hat{\gamma}$ vector allow the data to reject the model, and we should explore the reasons that the corrected data are not treelike if this occurs. Lack of fit to a tree may indicate that our model is too simple (e.g., we are not accounting adequately for rate heterogeneity across sites, or the substitution model is too restrictive). Alternatively, there may be multiple signals due to recombination or to non-independence among sites.

It is helpful to plot the inferred branch lengths ($\hat{\gamma}$ values) divided by their estimated standard errors to see how much statistical support the "signals" really have (Waddell et al., 1994; Waddell, 1995). Another useful way of viewing the corrected sequences is to plot the magnitude of each signal in the conjugate spectrum against the sum of its pairwise incompatibilities with all other sequence patterns (a support/conflict spectrum; see Lento et al., 1995). These graphical representations of noise in the data set allow exploration of the factors responsible for conflicts in different regions of the tree and suggest which hypotheses of relationship should be subjected to further scrutiny. The paper by Lento et al. (1995) provides good examples of this approach.
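The least-squares tree selection described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout (each bipartition stored as a frozenset of the taxa on the side that excludes a reference taxon), and the numerical values are all hypothetical, and the $\gamma_0$ entry is assumed to have been left out of the spectrum.

```python
def lack_of_fit(spectrum, tree_splits, std_errors=None):
    """Sum of squared spectrum entries for bipartitions absent from the tree.

    spectrum    -- dict mapping each bipartition (a frozenset of taxa) to its
                   corrected value; the gamma_0 entry is assumed to be omitted
    tree_splits -- set of bipartitions present on the candidate tree
    std_errors  -- optional dict of estimated sampling errors; when given,
                   each entry is divided by its error first (weighted form)
    """
    total = 0.0
    for split, value in spectrum.items():
        if split in tree_splits:
            continue  # entries on the tree are allowed to be non-zero
        if std_errors is not None:
            value = value / std_errors[split]
        total += value ** 2  # squared deviation from the expected value of 0
    return total


def select_tree(spectrum, candidate_trees, std_errors=None):
    """Return the name of the candidate tree with the smallest lack of fit."""
    return min(
        candidate_trees,
        key=lambda name: lack_of_fit(spectrum, candidate_trees[name], std_errors),
    )


# Made-up spectrum for four taxa A-D; each split is named by the side without D.
pendant = {frozenset("A"), frozenset("B"), frozenset("C"), frozenset("ABC")}
example_spectrum = {
    frozenset("A"): 0.10, frozenset("B"): 0.12, frozenset("C"): 0.09,
    frozenset("ABC"): 0.11,
    frozenset("AB"): 0.05, frozenset("AC"): 0.01, frozenset("BC"): -0.005,
}
example_trees = {
    "AB|CD": pendant | {frozenset("AB")},
    "AC|BD": pendant | {frozenset("AC")},
    "AD|BC": pendant | {frozenset("BC")},
}
print(select_tree(example_spectrum, example_trees))
```

Passing a dictionary of estimated sampling errors as `std_errors` gives the weighted variant; with these invented values the tree AB|CD has the smallest lack of fit either way.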
Extension to Four Character States

Hadamard conjugations can be extended to handle all four bases as character states under a version of Kimura's (1981) K3ST model, which classifies substitutions into three types: type I = transitions; type II = transversions between A and C or G and T; and type III = transversions between A and T or C and G (see "Models of Evolution," above). The model is generalized to allow the probabilities of these events to be different for each branch of the tree. Under this model there are $4^T/4 = 4^{T-1}$ distinct sequence patterns (i.e., patterns such as AAGG, CCTT, GGAA, and TTCC are equivalent). These patterns are indexed using a modification of the binary coding for the two-state case (see Hendy et al., 1994; original derivation due to Szekely et al., 1993) that defines quadripartitions of the taxa (partitions into four or fewer subsets). Application of the Hadamard conjugation to this observed sequence spectrum using formula (36) corresponds to a correction for superimposed substitutions according to the generalized K3ST model. Within the corrected data (conjugate spectrum $\hat{\gamma}$), there are three sets of $2^{T-1}$ entries, as for the two-state case. These elements correspond to the number of transitions, type II transversions, and type III transversions, respectively. The remaining elements are expected to be 0 under the model, again as in the two-state case. We can use closest tree, corrected parsimony or compatibility, or least-squares methods to select a tree from this spectrum.

Another promising way of treating four-state nucleotide data using three separate $2^{T-1}$ Hadamard conjugations has also been developed (Waddell and Hendy, 1995). These calculations give essentially the same results as the much more computationally expensive (order $4^{T-1}$) approach of Hendy et al. (1994).

Subcases of the K3ST model can be handled by averaging the patterns in the observed data that are equivalent under the more restricted models (Waddell, 1995). For example, if we average the type II and type III transversions, we force the corrections to be made according to a generalized K2P model, and if we average all substitutions we obtain a generalized JC correction. This pooling of substitution types reduces stochastic errors if the simpler models are adequate.

Among-Site Rate Variation and Maximum Likelihood

The Hadamard conjugation can be modified to allow for unequal substitution rates across sites (Steel et al., 1993c; Waddell, 1995; Waddell and Penny, 1996b) in much the same way as the corrections are made for distances (e.g., G.J. Olsen, 1987; Jin and Nei, 1990; see above). To estimate pattern probabilities assuming a gamma distribution, we need only replace the exponential function in the Hadamard conjugation (formula 29) with $[(\alpha - \rho)/\alpha]^{-\alpha}$, where $\alpha$ is the shape parameter. If going from observed sequence data to the corrected sequence spectrum using formula (36), we replace the logarithm function with $\alpha(1 - x^{-1/\alpha})$. For practically any distribution (e.g., the log-normal) the appropriate path-length correction can be estimated numerically (as in G.J. Olsen, 1987) if an analytic form does not exist as for the gamma distribution.

Recall that for any tree T and branch-length spectrum $\gamma$, we can obtain the associated vector of expected pattern frequencies s using formula (29). Since the log likelihood of the tree is given by

$$\ln L = \sum_i f_i \ln s_i$$

where $f_i$ is the frequency of sites with pattern i in the data, and $s_i$ is the probability of this pattern under the model, Hadamard conjugation provides an alternative algorithm for maximum likelihood estimation. It is especially useful for maximum likelihood tree inference with among-site rate heterogeneity using continuous distributions such as gamma (Waddell, 1995; Waddell and Penny, 1996b). Although limited to the generalized K3ST model and its submodels, this approach can be much faster than that of Z. Yang (1993).

Statistics on the Corrected Sequences

It is straightforward to obtain the variance-covariance matrix of the corrected sequence data via
the delta method approximation (Waddell et al., 1994). The simulations by Waddell et al. (1994) showed that the covariance matrix derived in this way gives nearly unbiased results, whereas bootstrap resampling tends to yield overestimates. As long as a pattern occurs five or more times in the observed data, it is reasonable to treat the corresponding corrected pattern (or branch length) as normally distributed, resulting in straightforward confidence intervals, or tests of the hypothesis that its true value is zero. The covariances of corrected patterns can also be thought of as covariances of tree branch length estimates. Generally, the more changes per site there are on the tree, the more strongly branch lengths become either positively or negatively correlated (Waddell et al., 1994). (These interdependencies tend to make the iterative search for a maximum likelihood solution slower.) Another conclusion from this study is that long branches, even when not biasing the topology of the tree, nonetheless cause a large increase in the variance of internal branch length estimates, reducing the reliability of tree selection. It is possible to estimate a confidence interval on transition:transversion ratios or the shape parameter of distributions used to model among-site rate variation (Waddell, 1995).

The Distance Hadamard

The last part of the Hadamard conjugation (from $\rho$ to $\hat{\gamma}$) can also begin from a matrix of pairwise distances (either corrected or uncorrected) (Hendy and Penny, 1993). We would like to estimate a branch-length spectrum (now called $\hat{\gamma}_d$) and choose an optimal tree from this spectrum, analogously to the procedure used for sequence data. We input the distances at the level of the generalized distance vectors $\rho$ (formula 35). However, a complication arises because these vectors include elements corresponding to path sets involving more than two taxa; see Hendy and Penny (1993) for a method of estimating these path-set lengths. The $\hat{\gamma}_d$ vector resulting from formula (37) then serves as the basis for choosing a tree as described above.

Simulations and analytic calculations have shown that the variances of entries in the $\hat{\gamma}_d$ vector resulting from this approach are lower than the $\hat{\gamma}$ values from the usual Hadamard conjugation under the same model, because the distance method of estimating path-set lengths involving more than two taxa has lower variance (Waddell, 1995). Consequently, tree selection using this vector tends to be more reliable (Charleston, 1994). However, the distance Hadamard does not seem to be as sensitive as the Hadamard conjugation at detecting violations of the model's expectations. The studies of Lento et al. (1995) and Lockhart et al. (1995b) suggest that this method is a useful exploratory tool when trying different distance transformations, although more study is needed on how directly a pattern from the distance Hadamard can be treated as evidence for specific sequence patterns.

Lake's Method of Invariants

Rationale

As discussed earlier in this chapter, the presence of more than one long, unbranched lineage in an analysis can lead to systematic error in the absence of perfect compensation for superimposed substitutions. In the context of parsimony, the homoplasies along the long branches can overwhelm the informative character changes along the internal branch(es) of the tree (see Figure 8 and the section "Parsimony and Inconsistency").

Ideally, we would like to distinguish informative changes from homoplasies. In parsimony and maximum likelihood analyses, the addition of new sequences whose branch points subdivide the longest lineages (i.e., representation of taxa that are specifically related to the most divergent taxa already in the tree) will tend to accomplish this goal. The effect is illustrated in Figure 19, where adding sequences A' and B' to the tree would reduce the effects of homoplasies along the branches leading to A and B. Of course, the practical utility of this approach requires that appropriate taxa exist, that their identities are known, and that the corresponding sequence data exist or can be generated. A second method of reducing the effects of homoplasy is to confine the analysis to the most conserved sequences (both on the basis of the overall conservation of the molecule and
by selecting the most conserved portions of the molecule). In distance-based analyses, estimates of the superimposed substitutions (which include the homoplasies) can also be included.

Figure 19 Adding new taxa to a parsimony or maximum likelihood tree to reduce the effects of homoplasy. Given the unrooted tree shown in heavy lines, the long lineages leading to A and B would have the greatest tendency to artifactually group due to parallel or convergent changes in sequence. Adding taxa A' and B' would reduce this effect by subdividing the long lines.

Lake (1987a) suggested an alternative method, which he called evolutionary parsimony, for analyzing the branching pattern linking four nucleotide sequences. The analysis can be derived from the following assumptions: (1) substitutions at a given sequence position are independent; (2) a balance exists among specific classes of transversions (a sufficient condition for this balance is that transversions are equally likely to yield each of the two possible substitution products, so that C is equally likely to change to A or G, etc.); and (3) insertions or deletions can be safely ignored. An advantage of the method is that it does not assume anything about rate equality over sites; each site is free to evolve at a different rate than all other sites.

If the assumptions are satisfied, then parallel transversions in the two branches of a tree produce equal numbers of similar (type 1 in Figure 20) and dissimilar (type 2 in Figure 20) nucleotides. Thus, the net effect of peripheral branch transversions could be cancelled if the type 2 events were subtracted from the type 1 events. A complete accounting of possible transversions and transitions yields the scoring system in Table 2.

Methodology

Lake's method can be described by the following sequence of steps:

1. Choose a quartet of aligned sequences; call them A, B, C, and D.

2. Find the alignment positions in which two sequences have purines and two have pyrimidines.

3. Consider the three possible groupings of sequences (see Figure 21): AB/CD (A with B, C with D), AC/BD, and AD/BC. Call these branching patterns X, Y, and Z, respectively.

4. Using the sequence positions at which sequences A and B are both purines or both pyrimidines (and sequences C and D are both of the opposite class of base), use the rules in Table 2 to count the number of positions that support and the number that counter branching order X. Call these totals $X^+$ and $X^-$, respectively. Similarly, find the support ($Y^+$) and countersupport ($Y^-$) for branching order Y, using the sequence positions at which sequences A and C have the same class of base, and B and D have the opposite class. Finally, find the support ($Z^+$) and countersupport ($Z^-$) for branching pattern Z. If the counting has been done correctly, the total of $X^+$, $X^-$, $Y^+$, $Y^-$, $Z^+$, and $Z^-$ will be equal to the total number of positions with two purines and two pyrimidines, as found in the second step.

5. The net supports for branching patterns X, Y, and Z are

   $$X = X^+ - X^-, \qquad Y = Y^+ - Y^-, \qquad Z = Z^+ - Z^- \qquad (40)$$

   The support for two of the branching patterns should be near zero, while the remaining branching pattern may or may not be supported by a significantly non-zero score.
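The counting in steps 2 through 5 can be sketched in Python as follows. Table 2 is not reproduced in this section, so the scoring rule in the code is an assumption based on the usual statement of Lake's invariants: a position counts as support when the two sequences in each pair are either both identical or both different, and as countersupport when exactly one pair differs. That rule should be checked against Table 2 before use. The function name, the dictionary layout, and the example sequences are hypothetical.

```python
PURINES = frozenset("AG")
VALID_BASES = frozenset("ACGT")


def lake_net_supports(seq_a, seq_b, seq_c, seq_d):
    """Count support and countersupport for branching patterns X, Y, and Z.

    X = AB/CD, Y = AC/BD, Z = AD/BC. Returns a dict of the form
    {"X": {"plus": ..., "minus": ..., "net": ...}, "Y": ..., "Z": ...}.
    """
    counts = {name: {"plus": 0, "minus": 0} for name in "XYZ"}
    for bases in zip(seq_a.upper(), seq_b.upper(), seq_c.upper(), seq_d.upper()):
        if any(base not in VALID_BASES for base in bases):
            continue  # skip gaps and ambiguity codes
        is_purine = [base in PURINES for base in bases]
        if sum(is_purine) != 2:
            continue  # step 2: need two purines and two pyrimidines
        a, b, c, d = bases
        # Steps 3-4: the sequence sharing A's class of base fixes the pattern.
        if is_purine[0] == is_purine[1]:
            name, pair_one, pair_two = "X", (a, b), (c, d)
        elif is_purine[0] == is_purine[2]:
            name, pair_one, pair_two = "Y", (a, c), (b, d)
        else:
            name, pair_one, pair_two = "Z", (a, d), (b, c)
        # ASSUMED scoring rule standing in for Table 2: support when both
        # pairs are identical or both differ, countersupport otherwise.
        identical_one = pair_one[0] == pair_one[1]
        identical_two = pair_two[0] == pair_two[1]
        if identical_one == identical_two:
            counts[name]["plus"] += 1
        else:
            counts[name]["minus"] += 1
    for tally in counts.values():
        tally["net"] = tally["plus"] - tally["minus"]  # step 5
    return counts


# Invented eight-site alignment; the sixth site (AAAA) is skipped in step 2.
example_supports = lake_net_supports("AAAAAAAG", "AGGCTACC", "CCCATAGA", "CTCCAATC")
print(example_supports)
```

As a consistency check, the six plus and minus counts should sum to the number of positions with two purines and two pyrimidines, as stated in step 4.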
[Figure 20: tree diagrams with the observed nucleotides for three substitution patterns, and the influence of each pattern on three methods]

                                       Informative     Parallel           Parallel
                                       transversion    transversions 1    transversions 2
Influence on parsimony                 Supports        Supports           No effect
Influence on transversion parsimony    Supports        Supports           Supports
Influence on evolutionary parsimony    Supports        Supports           Counters

Figure 20 Nucleotide substitution patterns and their effects on different methods of phylogenetic tree inference. The first pattern, informative transversion, represents the effect of a single nucleotide substitution that is in the internal (central) branch of the tree. It is an example of the informative characters upon which parsimony depends. Transversion parsimony and Lake's method of invariants rely entirely upon transversions for informative events. The second pattern portrays a possible outcome of two peripheral branch transversions. Because the results are indistinguishable from the first pattern (two A's and two C's), all methods will mistake this as support for an incorrect phylogeny. The third pattern illustrates the possibility that independent transversions in two peripheral branches will yield different nucleotides. The pattern is uninformative to traditional parsimony (two substitutions would be required regardless of the assumed branching order). Transversion parsimony will consider this pattern to be support for the incorrect tree since the outcome looks like a central branch transversion (in an incorrect tree) combined with a peripheral branch transition (which is ignored). Lake's method treats this third pattern as an estimator of multiple substitutions in peripheral branches and subtracts it from the support for the incorrect tree.
6. Lake (1987a) suggested that statistical significance be evaluated by a one degree of freedom $\chi^2$ test:

   $$\chi^2_X = \frac{(X^+ - X^-)^2}{X^+ + X^-}$$

   and similarly for Y and Z. Therefore the outcome of interest is two values of $\chi^2$ that do not differ significantly from zero and one value that does. Holmquist et al. (1988a) correctly pointed out that the $\chi^2$ approximation is inadequate when counts are low and recommended the use of the exact binomial test instead.

Figure 21 The three unrooted branching patterns with four sequences. Panels: branching pattern X, branching pattern Y, branching pattern Z.

NEGATIVE VALUES The net support of a tree can be negative and yet significant (e.g., X is negative and $\chi^2$ is significantly large). Lake (1987a) suggested that this result could be interpreted as positive evidence for the corresponding branching pattern, if no other pattern has significant support. However, significantly negative values should be viewed with extreme caution, because
such outcomes are most likely to be the result of selective pressure or some other non-random process.

TRANSITIONS AND TRANSVERSIONS The phylogenetic information provided by Lake's method is based entirely on transversion substitutions, so positions with two purines and two pyrimidines are required. If there are no transversions, there will be no signal. On the other hand, transition substitutions decrease the signal. In particular, peripheral branch transitions convert informative (supportive) positions into countersupport, suggesting that the method might be particularly sensitive to the ratio of transitions to transversions. If transitions are indeed substantially more frequent than transversions, then it is difficult to accumulate a sufficient number of transversions to infer the branching pattern without having the signal randomized by transitions (see W.-H. Li et al., 1987b). As noted above, generalized parsimony (character-state weighting), transversion parsimony, and transversion-based distance methods provide alternative methods of coping with a high transition:transversion ratio. Under many conditions, these methods are much more efficient than Lake's method at finding the correct tree (Hillis et al., 1994b).

Interestingly, transversion parsimony (as defined in this chapter, which differs from Lake's use of the term) applied to four sequences seeks the tree, X, Y, or Z, with the largest value of $X^+ + X^-$, $Y^+ + Y^-$, and $Z^+ + Z^-$. By examining the equations in (40), it can be seen that transversion parsimony uses the same data but adds the terms that look like a peripheral branch transition (and a central branch transversion) rather than subtracting them as does Lake's method.

Performance

Despite its intuitive appeal, the drawback of Lake's method is inefficiency. Especially when rates of change are high, simulation studies suggest that it requires vastly more data to achieve the same probability of inferring the correct phylogeny as other methods. For example, in four-taxon simulations using the K2P model under long-branch-attraction conditions, Hillis et al. (1994b) found that Lake's method required about $10^8$ nucleotides before its probability of selecting the correct tree exceeded 1/3 (= the probability of a randomly chosen tree). Maximum likelihood analysis, on the other hand, achieved 95% success at only 5000 nucleotides under the same conditions. Lake's method can be consistent under conditions in which maximum likelihood (as currently implemented) is inconsistent, so given enough data, it remains a potentially useful method. Unfortunately, "enough data" may be vastly more than the amount available.

Rooting Revisited

Most of the methods discussed above do not specify the location of the root. If, as is generally the case, a rooted tree is desired, the root must be located using extrinsic information. As mentioned above, the most commonly used method is to include one or more taxa that are assumed to lie cladistically outside of a presumed monophyletic group. We recommend including more than one outgroup taxon as a means of testing the assumption of ingroup monophyly. If there is a single branch on the unrooted tree that partitions the ingroup taxa from the outgroup taxa (e.g., Figure 22A), then the tree is consistent with the assumption of ingroup monophyly. If, on the other hand, there is no such branch (Figure 22B), then we have rejected the monophyletic ingroup hypothesis (at least in a non-statistical sense). Of course, this test is one-sided: the existence of a branch that partitions the assumed ingroup versus outgroup taxa is no guarantee that the root does not lie somewhere within the ingroup. But at least the attempt to reject the hypothesis of ingroup monophyly failed, and one can feel somewhat more confident about the assumption for that reason.

Figure 22 Use of multiple outgroup taxa to infer the location of the root of a tree. (A) The branch indicated in bold partitions the ingroup taxa from the outgroup taxa, yielding an unambiguous root for the ingroup portion of the tree. (B) No single branch partitions the ingroup taxa from the outgroup taxa. The data do not support the assumption of ingroup monophyly.

Rooting is frequently the most precarious step in any phylogenetic analysis. In particular, connecting a distant outgroup to a tree can be very problematical, as there may be so many changes along the branch connecting the ingroup to the outgroup that the sequences have become effectively randomized. In the worst case, this can lead to spurious "long branch attraction" effects (see the section on "Parsimony and Inconsistency"), with artifactual rooting along longer ingroup branches (Hendy and Penny, 1989; Miyamoto and Boyle, 1989; W.C. Wheeler, 1990b; D.R. Maddison et al., 1992). For this reason, it is often preferable to be satisfied with an unrooted tree than to include a highly divergent outgroup taxon in the analysis. An alternative strategy (Nixon and Carpenter, 1993) is to perform an analysis of only the ingroup taxa first, and connect an outgroup taxon to the resulting unrooted tree secondarily (Lundberg, 1972). Although the location of the root may still be suspicious, at least the distant outgroup will not confound the estimation of the (unrooted) relationships of the ingroup.

The choice of outgroup taxa can exert a strong effect on the analysis, so the outgroup(s) must be chosen carefully. It is especially important to choose outgroups that minimize the impact of long branches (i.e., it is much more important to break up long sister-group lineages than to increase the sampling density of more distant clades). A.B. Smith (1992) provides an excellent discussion of these and related issues.

SEARCHING FOR OPTIMAL TREES

As emphasized above, methods that have explicit optimality criteria (e.g., maximum parsimony, additive-tree distance methods, and maximum likelihood) separate the problem of evaluating a particular tree under the selected criterion from that of finding the optimal tree(s). Most of our presentation to this point has dealt with the former problem; in this section, we address the latter. For data sets of small to moderate size (8-20 taxa, depending on the criterion), exact methods that guarantee the discovery of all optimal trees may be used. For larger data sets, exact solutions require a prohibitive amount of computing time; consequently, approximate methods that do not guarantee optimality must be used.

Exact Algorithms

Exhaustive Search

The conceptually simplest approach to the search for optimal trees is to evaluate every possible tree. Assuming that exact methods exist for evaluating a particular tree, we need only a method for enumerating all possible (strictly bifurcating) trees in
order to find a globally optimal solution. A simple algorithm, outlined in Figure 23, can be used to perform this enumeration. Initially, we connect the first three taxa in the data set to form the only possible unrooted tree for these taxa (Figure 23, row 1). In the next step, we add the fourth taxon to each of the three branches of the three-taxon tree, thereby generating all three possible unrooted trees for the first four taxa (Figure 23, row 2). We continue in a similar fashion, adding the ith taxon to each branch of every tree (containing i - 1 taxa) generated during a previous step. Thus, for example, row 3 of Figure 23 contains all 15 possible trees for the first five taxa, obtained by adding the fifth taxon to each of the five possible branches for the three trees obtained at the four-taxon stage. This makes clear the rationale for expression (1) for counting the number of possible unrooted bifurcating trees for T taxa: for each of the possible trees for i - 1 taxa, there are 2(i - 1) - 3 = 2i - 5 branches to which the ith taxon can be connected. Note that the order of addition is immaterial; we could have just as easily chosen taxa at random for next addition at each step.

Figure 23 Enumeration of all 15 possible unrooted trees for five taxa (see text).

Evaluation of expression (1) for several possible values of T quickly reveals why exhaustive search procedures are useful only for small numbers of taxa. There are 945 possible unrooted trees for only 7 taxa, over $2 \times 10^6$ trees for 10 taxa, and over $2 \times 10^{20}$ possible trees for 20 taxa (Felsenstein, 1978b; see Table 2 in Chapter 12). Thus, exhaustive enumeration of all possible trees typically is feasible only for 11 or fewer taxa (34,459,425 trees).

Branch-and-Bound Methods

Fortunately, an exact algorithm for identifying all optimal trees that does not require exhaustive enumeration is available for any criterion whose value is known to be non-decreasing as additional taxa are connected to a tree. The branch-and-bound method, frequently used to solve problems in combinatorial optimization, was first applied to evolutionary trees by Hendy and Penny (1982). The branch-and-bound method closely resembles the exhaustive search algorithm described above. In this procedure, we traverse a search tree in a depth-first sequence, as illustrated in Figure 24. The root of the search tree (A) contains the only possible tree for the first three taxa. We first construct one of the three possible trees obtained by connecting taxon 4 to tree A, yielding tree B1. Then, to this tree, we connect taxon 5, yielding tree C1.1. (If there were more than five terminal taxa, we would continue to join additional taxa in this manner until a tree containing all T taxa had been completed.) Now, we backtrack one node on the search tree (i.e., back to tree B1) and generate the second tree resulting from the addition of taxon 5 to tree B1 (= tree C1.2). When all five of the trees derivable from tree B1 (C1.1-C1.5) have been constructed, we backtrack all the way to tree A of the search tree and take the second path away from this node, leading to tree B2. As before, all five trees derivable from tree B2 (C2.1-C2.5) are constructed in turn. Then we backtrack once again to tree A and proceed down the third path, toward trees C3.1-C3.5. Eventually we will have constructed all of the possible trees, culminating with tree C3.5. If the score of each tree containing all five taxa were evaluated at the time of its construction, then the search would be an exhaustive one equivalent to that described in the above section. However, a branch-and-bound search differs by eliminating parts of the search tree that only contain suboptimal solutions.

Figure 24 Search tree for branch-and-bound algorithm (see text).

Let L represent an upper bound on the optimal value of the chosen optimality criterion. (We assume that we want to minimize this criterion, just as we minimize the tree length under a parsimony criterion or minimize the sum of squared deviations in an additive-tree distance method.) For the present, we can obtain L by evaluating a random tree; if we know that a tree of score L exists, then the score of the optimal tree(s) cannot exceed this value. As we are moving along a path of the search tree toward its tips (containing all T taxa), if we encounter a tree whose score exceeds L, then there is no need to proceed further along this path; connecting additional taxa cannot possibly decrease the score. Thus, we can dispense with the evaluation of all (phylogenetic) trees that descend from this node in the search tree and immediately backtrack and proceed down a different path. By cutting off portions of the search tree in this manner, we can greatly reduce the number of trees that must actually be evaluated.

If we reach the end of a path on the search tree and obtain a tree whose score is equal to the upper bound L, then this tree is a candidate for optimality. If this score is less than L, then this is the best tree found so far, and we have improved the upper bound on the score of the optimal tree(s). This improvement is important, as it may enable other search paths to be terminated more quickly. When the entire search tree has been traversed, all optimal trees will have been identified.

The branch-and-bound method is extremely effective for many criteria, permitting exact solutions for 20 or more taxa, depending on the efficiency of the implementation, the speed of the available computer, and the "messiness" of the data. The method can be used to search for optimal trees under parsimony, maximum likelihood, and additive distance criteria in programs such as PAUP* (see Appendix).

The above presentation of the branch-and-bound method, although correct, is an oversimplification of the algorithms actually used in state-of-the-art computer programs. Refinements in the algorithm that greatly speed the computations usually are implemented. These refinements, designed to promote earlier cut-offs in the traversal of the search tree, include: (1) using heuristic methods (discussed below) to obtain a near-optimal tree whose score is used as the initial upper bound; (2) designing the search tree so that divergent taxa are added early, thereby increasing the length of the initial trees in the search path; and (3) using pairwise incompatibility to improve the lower bound on the length that will ultimately be required by trees descending from a tree at a given node of the search tree. These methods are discussed in more detail in Hendy and Penny (1982) and Swofford (1996).

An obvious question may have occurred to the reader at this point. Since the branch-and-bound method requires evaluation of all trees as its worst possible case, why would we ever want to perform an exhaustive search? In fact, if we were interested only in the optimal trees, the branch-and-bound algorithm would indeed be the preferred means of finding them. However, exhaustive searches permit the researcher to examine the frequency distribution of tree lengths.
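The enumeration by taxon addition and the branch-and-bound cut-off described above can be sketched together in Python. This is a minimal illustration, not the algorithm of any particular program: it assumes Fitch (unordered) parsimony as the criterion, represents an unrooted tree as a nested tuple hung from the first taxon, and starts with an infinite upper bound so that the first complete tree reached sets L (rather than a random or heuristic tree). All names and the five-taxon data set are hypothetical.

```python
import math


def insertions(subtree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `subtree`."""
    yield (subtree, taxon)              # the branch leading into this subtree
    if isinstance(subtree, tuple):      # internal node: try branches further down
        left, right = subtree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def fitch_length(tree, seqs):
    """Tree length under Fitch (unordered) parsimony, summed over sites."""
    length = 0

    def state_set(node, site):
        nonlocal length
        if not isinstance(node, tuple):
            return {seqs[node][site]}
        left, right = state_set(node[0], site), state_set(node[1], site)
        if left & right:
            return left & right
        length += 1                     # disjoint state sets cost one change
        return left | right

    for site in range(len(next(iter(seqs.values())))):
        state_set(tree, site)
    return length


def search(seqs, use_bound=True):
    """Depth-first traversal of the taxon-addition search tree.

    With use_bound=False no path is cut off, so every complete tree is
    scored (exhaustive search). Requires at least three taxa.
    """
    taxa = list(seqs)
    result = {"score": math.inf, "trees": [], "complete_trees_scored": 0}

    def extend(subtree, next_index):
        tree = (taxa[0], subtree)       # the first taxon closes the unrooted tree
        score = fitch_length(tree, seqs)
        complete = next_index == len(taxa)
        if complete:
            result["complete_trees_scored"] += 1
        if use_bound and score > result["score"]:
            return                      # adding taxa cannot lower the score
        if complete:
            if score < result["score"]:
                result["score"], result["trees"] = score, [tree]
            elif score == result["score"]:
                result["trees"].append(tree)
            return
        for bigger in insertions(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend((taxa[1], taxa[2]), 3)
    return result


# Invented data: sites 1-2 group A with B, sites 3-4 group D with E.
example_seqs = {"A": "TTAAC", "B": "TTAAA", "C": "AAAAA", "D": "AAGGC", "E": "AAGGA"}
exhaustive = search(example_seqs, use_bound=False)
bounded = search(example_seqs, use_bound=True)
print(exhaustive["complete_trees_scored"], bounded["complete_trees_scored"])
print(bounded["score"], bounded["trees"])
```

Because a path is abandoned only when its score strictly exceeds the current bound, trees that tie the best score are retained, so both searches return the same set of optimal trees. The exhaustive run scores all 15 five-taxon trees, which also gives the frequency distribution of tree lengths if the scores are recorded.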
It is often useful to know, for example, whether there are few or many near-optimal trees, or where some tree of prior interest lies in the distribution of tree lengths. In addition, with very noisy data, the time spent evaluating bounds can exceed the time spent evaluating the extra trees.

Heuristic Approaches

When a data set is too large to permit the use of exact methods, optimal trees must be sought via heuristic approaches that sacrifice the guarantee of optimality in favor of reduced computing time. The task of searching for an optimal tree by approximate methods is somewhat analogous to the plight of a myopic pilot who loses his glasses when forced to parachute from his airplane into a mountainous region. He suspects that there is a manned outpost at the top of the highest peak in the area, and he must somehow grope his way there to have any hope of rescue. Simply walking uphill from the point of his landing will not necessarily lead to his goal, since he may not have started on a slope of the highest peak. Suppose that he reaches a summit and finds no outpost. Two possibilities remain: (1) he is, in fact, at the top of the highest peak, but was wrong about the existence of the outpost; or (2) he has climbed the wrong hill. Although rather absurd, the analogy is quite appropriate.

Heuristic tree searches generally operate by hill climbing methods. An initial tree is used to start the process; we then seek to improve the tree by rearranging it in a way that improves its score under our chosen optimality criterion (e.g., minimum length). When we can find no way to further improve the tree, we stop. Like the downed pilot, however, we generally have no way of knowing whether we ended up at the top of the highest hill; we do not know whether we have arrived at a global or merely a local optimum.

Stepwise Addition

The most commonly used method for obtaining a starting point for further rearrangement is by stepwise addition of taxa to a growing tree. First, three taxa are chosen for the initial tree. Next, one of the unplaced taxa is selected for next addition. Each of the three trees that would result from joining the unplaced taxon to the tree along one of its (three) branches is evaluated, and the one whose score is optimal is saved for the next round. In this next round, yet another unplaced taxon is connected to the tree, this time to one of the five possible branches on the tree saved from the previous round. The process terminates when all taxa have been joined to the tree.

Of course, the above description is oversimplified in that several decisions are required, none of which has a straightforward answer. Which three taxa should be used initially? How do we decide which unplaced taxon to connect to the tree next? One approach is to simply add the taxa in the same order in which they are presented in the data matrix, starting with the first three and sequentially adding the rest. This strategy, for example, is the one used in Felsenstein's (1993) PHYLIP package. Another approach, optionally available in Swofford's PAUP*, is to check all triplets of taxa and start with the one that yields the shortest tree. At each successive step, all remaining unplaced taxa are considered for connection to every branch of the tree, and the taxon-branch combination that requires the smallest increase in tree length is chosen. Still another approach, suggested by Farris (1970), is to pre-specify an addition sequence based on each taxon's distance to a reference taxon (called a hypothetical ancestor by Farris, but it could just as well be any taxon in the data matrix). Unfortunately, there seems to be no strategy that works best for all data sets; the best approach is to try as many alternatives as possible, each of which may
The details of heuristic search procedures potentially provide a different starting point for
vary considerably from one implementation to branch swapping (see below).
the next. In addition, better methods are often in- Algoritlms like this are referred to as "greedy
vented. consequently, we prefer to leave the algorithms" by computer scientists. Like the near-
specifics to the documentation of the computer sighted pilot who is unable to scan the horizon
program used to perform the search, and will con- and must simply proceed up the nearest hill, these
centrate on more general concepts. methods choose the solution that looks best given
the current situation rather than attempting to see more broadly into the future. Thus, one placement of a taxon may be best given the taxa currently on the tree, but that placement may become suboptimal upon the addition of subsequent taxa. Once a decision has been made to connect a taxon to a certain point, however, we must usually accept the consequences of that decision for the remainder of the stepwise addition process, perhaps ending up in a local optimum as a result.

Star Decomposition Methods
An alternative to stepwise addition is the star decomposition method, a divisive pairwise clustering method (see "Cluster Analysis," below). The algorithm can be used with any criterion that can be evaluated on a non-binary (polytomous) tree. To begin, we connect all of the terminal taxa in a "star tree" containing a single internal node (Figure 25, step 1). Next, we evaluate the optimality criterion for all possible trees that can be constructed by joining two of the terminal nodes into a new group (Figure 25, step 2). The tree from this stage that scores best according to the criterion is saved for the next step. Each time we form a new group, we reduce by one the number of branches connected to the central node. The process continues until the step in which all generated trees are binary (Figure 25, step 3), and we choose the best of these (again according to the chosen optimality criterion).

[Figure 25 panels: Step 1, Step 2, Step 3]
Figure 25 Heuristic tree selection using star decomposition method. At each step, the optimality criterion is evaluated for each possible joining of a pair of lineages leading away from the central node. The best tree found during each step becomes the starting point for the next step.

The most commonly used star decomposition method is the neighbor-joining algorithm of Saitou and Nei (1987; see below). Saitou (1990), Adachi and Hasegawa (1992), and Z. Yang (1995) have also implemented the method for both DNA and protein maximum likelihood. Star decomposition, like stepwise addition, is a greedy algorithm that is prone to entrapment in local optima.

Branch Swapping
Because of the excessive greediness and susceptibility to local optima problems, stepwise addition and star decomposition algorithms generally do not find optimal trees unless the number of taxa is small or the data are very clean. However, it may be possible to improve the initial estimate by performing sets of predefined rearrangements, a technique commonly referred to as branch swapping. In general, any one of these rearrangements amounts to a "stab in the dark," but the hope is that if a better tree exists, one of the rearrangements will find it. Examples of three kinds of rearrangements used in current branch-swapping algorithms are shown in Figures 26 through 28.

Figure 26 Branch swapping by nearest-neighbor interchanges (NNIs). Each interior branch of the tree defines a local region of four subtrees connected by the interior branch. Interchanging a subtree on one side of the branch with one from the other constitutes an NNI. Two such rearrangements are possible for each interior branch.

Figure 27 Branch swapping by subtree pruning and regrafting. A subtree is pruned from the tree (e.g., the subtree containing terminal nodes A and B as indicated). The subtree is then regrafted to a different location on the tree. All possible subtree removals and reattachment points are evaluated.

Of course, the globally optimal tree(s) may be several rearrangements away from the starting tree. If a rearrangement is successful in finding a better tree, a round of rearrangements is initiated on this new tree. As long as each round of rearrangements is successful in finding an improved tree (according to its score under the optimality criterion), then we will eventually arrive at the global optimum. However, if the intermediate trees would require us to pass through trees that are inferior to the one(s) already obtained, we will once again find ourselves trapped in a local optimum unless an option is provided for branch swapping on suboptimal trees (e.g., the "KEEP" option in PAUP*; Swofford, 1993, 1996). A related problem concerns plateaus on the optimality surface. It may be the case, for example, that an optimal tree lies several rearrangements away from the current tree, and that these rearrangements all correspond to trees having equal scores under the optimality criterion. If the intermediate trees are discarded because they are "not better," then the optimal tree will not be found. A few programs do not retain equally good trees because they have no protection against cycling (alternation between two trees, each of which can be rearranged to yield the other); these programs will not be effective if plateaus are encountered, since they are unable to traverse the plateau.
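The hill-climbing behavior described above, including the plateau and cycling problems, can be illustrated with a minimal Python sketch. It is not taken from PAUP* or any other program: the function name `hill_climb`, the `neighbors` and `score` callables, and the one-dimensional toy "landscape" (standing in for tree space, with higher scores treated as better) are all assumptions made for illustration. Keeping a set of visited states is one simple way to guard against cycling while still allowing moves across a plateau.

```python
def hill_climb(start, neighbors, score, accept_ties=True):
    """Greedy hill climbing on an abstract search space (higher score = better).

    neighbors(state) returns the states reachable by one rearrangement and
    score(state) evaluates the optimality criterion. Both are hypothetical
    stand-ins for branch swapping and tree scoring.
    """
    current = start
    visited = {start}  # remembering visited states prevents cycling on plateaus
    while True:
        current_score = score(current)
        best_move = None
        for candidate in neighbors(current):
            if candidate in visited:
                continue
            s = score(candidate)
            improves = s > current_score
            ties = accept_ties and s == current_score
            if (improves or ties) and (best_move is None or s > score(best_move)):
                best_move = candidate
        if best_move is None:
            return current  # nothing acceptable left: a (possibly local) optimum
        visited.add(best_move)
        current = best_move


# Toy landscape: positions 0..6 with a plateau at height 2 and two peaks.
HEIGHTS = [1, 2, 2, 2, 3, 1, 5]


def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(HEIGHTS)]


def toy_score(i):
    return HEIGHTS[i]


print(hill_climb(0, toy_neighbors, toy_score))                     # crosses the plateau
print(hill_climb(0, toy_neighbors, toy_score, accept_ties=False))  # stalls at its edge
print(hill_climb(6, toy_neighbors, toy_score))                     # starts on the highest peak
```

Starting from position 0, the search that accepts ties walks across the plateau to the local peak at position 4 but never reaches the higher peak at position 6; the strict version stops at position 1, at the edge of the plateau.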
Figure 28 Branch swapping by tree bisection and reconnection. The tree is bisected along a branch, yielding two disjoint subtrees. The subtrees are then reconnected by joining a pair of branches, one from each subtree. All possible bisections and pairwise reconnections are evaluated.

Testing for Convergence
Because of the limitations of heuristic approaches, some way of evaluating the success of the chosen method in obtaining a globally optimal solution is needed. The obvious strategy in this regard is to begin from different starting points and ask whether the same result is always obtained. For example, a set of random sequences for the addition of taxa can be used to generate initial trees for input to branch swapping. Since, for reasonably noisy data at least, the starting trees will vary depending on the addition sequence, convergence to a common optimal tree (or set of trees) is encouraging. (A more extreme approach, using random trees rather than random addition sequences, could be adopted; however, the starting trees are, on average, so far from the optimal trees that this strategy seems to be less effective.) Even if rearrangements of different starting trees do not converge to the same end point, the use of several starting trees is a good idea; if multiple peaks on the optimality surface exist, we will be more likely to find them.

Alternatives to Hill Climbing
Even when greedy algorithms like stepwise addition or star decomposition are followed by branch swapping, entrapment in local optima can still occur. Fundamentally, any search heuristic consists of pseudorandomly perturbing (rearranging) the current solution until either the resulting solution is acceptable, or a stopping criterion is satisfied. The criteria of acceptability are what separate the heuristic search methods from each other: the nature of the perturbations used is problem-dependent.
We can think of the "goodness" of a solution $t_i$ as some function $z(t_i)$, for each step $i$. Thus, in hill climbing, $t_{i+1}$ is acceptable if $z(t_{i+1}) \geq z(t_i)$: our myopic pilot will never go anywhere that takes him downhill, only uphill or across. In simulated annealing (Van Laarhoven and Aarts, 1987) a new solution is accepted if $z(t_{i+1}) \geq z(t_i)$, as in hill climbing, but even if $z(t_{i+1}) < z(t_i)$, then the procedure will accept the new solution with a certain probability, as follows:

Prob[accepting solution $t_{i+1}$] = [right-hand side not recoverable from the source]

where k is a parameter that can vary over time.
In the Great Deluge method (Dueck, 1990; Dueck and Scheuer, 1990), the probability of accepting a new solution $t_{i+1}$ is 1 if $z(t_{i+1}) > w_i$, where $w_i$ is a bound that increases slowly with time, so that if $t_{i+1}$ is accepted, then $w_{i+1} = w_i + c[z(t_{i+1}) - z(t_i)]$. The constant c is usually about 0.01 to 0.05. These methods of determining the acceptability of a new solution offer an efficient means of improving the performance of heuristic searches (M. Charleston, personal communication), and there are many other variants, including the use of a "tabu list" (Glover, 1989) that prevents the search from revisiting any solutions it has just tried (the list usually contains about 5 to 10 solutions).
Algorithmic and Other Methods
The methods for tree searching described in the above sections are appropriate when an optimality criterion that can be evaluated for any given tree is chosen. The problem is then reduced to finding an optimal tree given the chosen criterion. The methods described below do not cleanly fit into this framework, either because they are defined solely on the basis of an algorithm or because the task of finding an optimal tree cannot be cleanly separated from that of evaluating a specific tree.

Cluster Analysis
Cluster analysis is a family of related techniques for representing similarity or distance data (we will use distances) in the form of an ultrametric tree (Sneath and Sokal, 1973). If the data themselves are ultrametric, then the representation on the tree will be exact. It should be obvious that if the distance data themselves are not ultrametric, then they cannot be fit exactly to such a tree, and therefore errors might be introduced.
The method of cluster analysis is conceptually simple. The raw data are provided as a table of distances between all pairs of taxa. Call $d_{ij}$ the distance between taxa i and j. The tree is constructed by linking the least distant pairs of taxa, followed by successively more distant taxa, or groups of taxa. When two taxa are linked, they lose their individual identities and are subsequently referred to as a single cluster. Initially, each taxon constitutes its own cluster. At each stage in the process, as two clusters are merged into one, the number of clusters declines by one. The process is complete when the last two clusters are merged into a single cluster containing all of the original taxa.
The steps of the method are as follows:

1. Given a matrix of pairwise distances, find the clusters (taxa) i and j such that $d_{ij}$ is the minimum value in the table.
2. Define the depth of the branching between i and j ($l_{ij}$) to be $d_{ij}/2$.
3. If i and j were the last two clusters, the tree is complete. Otherwise, create a new cluster called u.
4. Define the distance from u to each other cluster (k, with k ≠ i or j) to be an average of the distances $d_{ki}$ and $d_{kj}$.
5. Go back to step 1 with one less cluster; clusters i and j have been eliminated, and cluster u has been added.

The variants are primarily in the details of step 4. The most commonly used clustering method is UPGMA (unweighted pair group method using arithmetic averages), in which the averaging of the distances in step 4 is based on the total number of taxa in the clusters. That is, if cluster i contains $T_i$ taxa, and cluster j contains $T_j$ taxa, then $d_{ku} = (T_i d_{ki} + T_j d_{kj})/(T_i + T_j)$. If the simple average [$d_{ku} = (d_{ki} + d_{kj})/2$] is used instead, the technique is called WPGMA (weighted PGMA). Other variants include using the maximum distance [$d_{ku} = \max(d_{ki}, d_{kj})$, called complete linkage], or the minimum distance [$d_{ku} = \min(d_{ki}, d_{kj})$, called single linkage]. These alternatives all give the same results when the data are ultrametric, but they can differ in their inferences when the data are not ideal.
An example of using UPGMA to infer a tree of five taxa (5S rRNA sequences) is given in Figure 29. The figure presents the upper right half of the pairwise distance matrix at each stage of the cluster analysis. Starting with the first table, the smallest distance, the 0.1715 substitutions per sequence position separating Bsu and Bst, is indicated in bold face. Thus, the first inferred branching unites these taxa at a depth of 0.1715/2 = 0.0858. These two taxa are merged into a cluster in the next table, and their distances to all other taxa are averaged. For example, the distance from the Bsu-Bst group to Lvi is (0.2147 + 0.2991)/2 = 0.2569. The smallest distance in the second table joins the Bsu-Bst cluster with Mlu at a depth of 0.1096 (= 0.2192/2). The distances of the Bsu-Bst-Mlu cluster to the other taxa are then computed by the unweighted method. For example, the distance to Lvi is (2 x 0.2569 + 0.3943)/3 = 0.3027. Notice that this value is identical to (Bsu:Lvi + Bst:Lvi + Mlu:Lvi)/3, where A:B is the distance from taxon A to taxon B. Each taxon in the original data table contributes equally to the averages, which is why the method is called unweighted.
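The five steps above, with the UPGMA averaging rule for step 4, can be sketched in a few lines of Python. This is a minimal illustration rather than any published implementation: the function name `upgma`, the tuple-based cluster representation, and the dictionary layout are assumptions. The distances are the ten pairwise 5S rRNA values used in the worked example (Figures 29 and 30).

```python
from itertools import combinations

# Pairwise distances for the five taxa of the worked example.
NAMES = ["Bsu", "Bst", "Lvi", "Amo", "Mlu"]
D = {
    ("Bsu", "Bst"): 0.1715, ("Bsu", "Lvi"): 0.2147, ("Bsu", "Amo"): 0.3091,
    ("Bsu", "Mlu"): 0.2326, ("Bst", "Lvi"): 0.2991, ("Bst", "Amo"): 0.3399,
    ("Bst", "Mlu"): 0.2058, ("Lvi", "Amo"): 0.2795, ("Lvi", "Mlu"): 0.3943,
    ("Amo", "Mlu"): 0.4289,
}


def upgma(names, pairwise):
    """Return the merges as (cluster_i, cluster_j, depth), in the order made.

    Clusters are tuples of taxon names (a hypothetical representation).
    """
    clusters = [(n,) for n in names]
    dist = {frozenset(((a,), (b,))): v for (a, b), v in pairwise.items()}
    merges = []
    while len(clusters) > 1:
        # Step 1: find the least distant pair of clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d_ij = dist[frozenset((i, j))]
        # Step 2: the branching depth is half the distance.
        merges.append((i, j, d_ij / 2))
        # Steps 3-4: create cluster u; average distances weighted by cluster size.
        u = i + j
        for k in clusters:
            if k != i and k != j:
                dist[frozenset((k, u))] = (
                    len(i) * dist[frozenset((k, i))]
                    + len(j) * dist[frozenset((k, j))]
                ) / (len(i) + len(j))
        # Step 5: continue with one less cluster.
        clusters = [k for k in clusters if k != i and k != j] + [u]
    return merges


for left, right, depth in upgma(NAMES, D):
    print("+".join(left), "|", "+".join(right), f"depth={depth:.5f}")
```

Replacing the size-weighted average with a simple mean, a maximum, or a minimum would give the WPGMA, complete linkage, and single linkage variants, respectively.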
             Bsu     Bst     Lvi     Amo     Mlu
Bsu           -    0.1715  0.2147  0.3091  0.2326
Bst                   -    0.2991  0.3399  0.2058
Lvi                           -    0.2795  0.3943
Amo                                   -    0.4289
Mlu                                           -

           Bsu-Bst   Lvi     Amo     Mlu
Bsu-Bst       -    0.2569  0.3245  0.2192
Lvi                   -    0.2795  0.3943
Amo                           -    0.4289
Mlu                                   -

             Bsu-Bst-Mlu   Lvi     Amo
Bsu-Bst-Mlu       -      0.3027  0.3593
Lvi                         -    0.2795
Amo                                 -

             Bsu-Bst-Mlu  Lvi-Amo
Bsu-Bst-Mlu       -       0.3310
Lvi-Amo                      -

Figure 29 Cluster analysis (UPGMA) of 5S rRNA evolutionary distance estimates. Abbreviations correspond to Figure 15. Each table represents the pairwise distances (estimated nucleotide substitutions per sequence position) for one round of clustering (only the upper right half of the symmetrical matrix is shown). The minimum distance value in each table is in bold. The corresponding pair of taxa (or clusters) are merged into a single cluster in the next table. The bold distance value is twice the depth of the branch point separating the clusters merged. A diagram of the inferred tree is in Figure 15B.

The smallest distance in the third table unites Lvi and Amo at a depth of 0.1398. The distance between the Bsu-Bst-Mlu and Lvi-Amo clusters is then (3 x 0.3027 + 3 x 0.3593)/6 = 0.3310. Thus the implied root of the tree joins these two clusters at a depth of 0.1655. The complete tree is shown in Figure 15B.
Note that cluster analysis cannot join two taxa (sometimes called operational taxonomic units or OTUs) unless at least one pairwise distance links them. Thus, missing data within a group can force one or more members out of the group in the inferred tree, a problem discussed in greater detail under "Similarity and Distance Data."
Cluster analysis has historically been very popular for several reasons. First, the principal assumption is that the data are approximately ultrametric. This assumption is of course a very strong one, but it is seductive to believe that a simple stringent assumption can be satisfied more easily than a long list of (what might be) less restrictive assumptions. Second, the idea of grouping the taxa that are least different, regardless of any finer points of consideration, has a strong intuitive appeal. The extreme of this view is the phenetic perspective in which it is asserted that nothing but the extent of similarity matters biologically and that consideration of the historical branching order is of purely secondary interest. A third reason is the availability of programs to do cluster analysis and the relative speed of the calculations, thereby enabling large numbers of taxa to be analyzed.
As emphasized above, simple cluster analysis has drawbacks. First, it is just an algorithm (or family of algorithms) with no objective definition of what constitutes an optimal tree when the data are not ideal. In particular, because genes do not diverge uniformly in all organisms or organelles (Chapters 8, 9, and 12), systematic errors are likely to be introduced into cluster analysis reconstructions. Finally, alternative, rapid methods are available that will work for all additive trees, not just those that are ultrametric.

Algorithmic Methods for Additive Trees
A variety of algorithmic methods related to cluster analysis have been proposed that will correctly reconstruct additive trees, whether the data are ultrametric or not. These methods fall into three primary categories. Those in the first category transform any additive distance matrix into an ultrametric matrix and then use cluster analysis to infer the tree. They include the transformed distances method of W.-H. Li (1981), the present-day ancestor method of Klotz and Blanken (1981), and, in a less obvious sense, the neighbor-joining method of Saitou and Nei (1987). The second category comprises methods that form the clusters consistent with the largest fraction of taxon-quartets, using a relaxed definition of additivity for a four-taxon tree. These methods include those of Sattath and Tversky (1977) and Fitch (1981). Methods of the third class, which includes the distance Wagner method (Farris, 1972), build an additive
representation of the tree by sequential addition of taxa. The transformed distance approaches all have a computational complexity that is proportional to $T^3$; therefore, any problem that is tractable with standard cluster analysis can also be solved with these methods. We present a version of the neighbor-joining method below.
Unlike cluster analysis, additive-tree methods yield unrooted trees, which are adequate for some purposes. If a root is to be placed, however, it must be based on an ancillary criterion. Usually, one or more taxa that are assumed to lie outside a monophyletic group of interest are included in the analysis. The location at which these taxa join the tree defines the root with respect to the ingroup. Another method, midpoint rooting, depends on an assumption of rate uniformity that is somewhat weaker than assuming a molecular clock across the entire tree. If the two most divergent lineages have evolved at the same rate, then the appropriate root is at the midpoint of the path connecting these taxa.

THE NEIGHBOR-JOINING METHOD Neighbor joining (Saitou and Nei, 1987) is conceptually related to traditional cluster analysis, but removes the assumption that the data are ultrametric. In practical terms, it does not assume that all lineages have diverged equal amounts. However, it does assume that the data come close to fitting an additive tree, so correction for superimposed substitutions is important for data that might include lineage-to-lineage differences in average rate.
The neighbor-joining algorithm is a special case of the star decomposition method described earlier. In contrast to cluster analysis, neighbor joining keeps track of nodes on a tree rather than taxa or clusters of taxa. The raw data are provided as a distance matrix, and the initial tree is a star tree. A modified distance matrix is constructed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes (conceptually, this adjustment has the effect of normalizing the divergence of each taxon for its average clock rate). The tree is constructed by linking the least-distant pair of nodes as defined by this modified matrix. When two nodes are linked, their common ancestral node is added to the tree and the terminal nodes with their respective branches are removed from the tree. This pruning process converts the newly added common ancestor into a terminal node on a tree of reduced size. At each stage in the process, two terminal nodes are replaced by one new node (corresponding to an internal node on the final tree). The process is complete when two nodes remain, separated by a single branch.

The steps of the method (modified from Studier and Keppler, 1988) are as follows:

1. Given a matrix of pairwise distances ($d_{ij}$), for each terminal node i calculate its net divergence ($r_i$) from all other taxa using the formula

$$r_i = \sum_{k=1}^{N} d_{ik} \qquad (41)$$

where N is the number of terminal nodes in the current matrix. Note the assumption that $d_{ii} = 0$, otherwise the summation would need to skip over k = i.

2. Create a rate-corrected distance matrix (M) in which the elements are defined by

$$M_{ij} = d_{ij} - \frac{r_i + r_j}{N - 2} \qquad (42)$$

for all i and with j > i (the matrix is symmetrical, and the case of i = j is not interesting). Only the values i and j for which $M_{ij}$ is minimum need be recorded; saving the entire matrix is unnecessary.

3. Define a new node u whose three branches join nodes i, j, and the rest of the tree. Define the lengths of the tree branches from u to i and j:

$$v_{iu} = \frac{d_{ij}}{2} + \frac{r_i - r_j}{2(N - 2)}, \qquad v_{ju} = d_{ij} - v_{iu}$$

4. Define the distance from u to each other terminal node (for all k ≠ i or j)

$$d_{ku} = \frac{d_{ik} + d_{jk} - d_{ij}}{2} \qquad (43)$$
5. Remove distances to nodes i and j from the data matrix, and decrease N by 1.

6. If more than two nodes remain, go back to step 1. Otherwise, the tree is fully defined except for the length of the branch joining the two remaining nodes (i and j). Let this remaining branch be

$$v_{ij} = d_{ij}$$

Each step has generated one internal node and has estimated the lengths of two of the branches connected to that node. The tree can be drawn from these data.
An example of using neighbor joining to infer a tree of five taxa is given in Figure 30. The data are the same as in the cluster analysis example in Figure 29. The pairwise distance estimates are in the upper right triangle of each matrix (ignoring the last two columns). The distance matrix row totals [r from equation (41)] and r/(N-2) are given in the last two columns. The rate-corrected distances are in the lower left triangle of the table. For example, the corrected Bsu to Bst distance is 0.1715 - (0.3093 + 0.3388) = -0.4766.

          Bsu      Bst      Lvi      Amo      Mlu       R       R/3
Bsu        -      0.1715   0.2147   0.3091   0.2326   0.9279   0.3093
Bst     -0.4766     -      0.2991   0.3399   0.2058   1.0163   0.3388
Lvi     -0.4905  -0.4356     -      0.2795   0.3943   1.1876   0.3959
Amo     -0.4527  -0.4514  -0.5689     -      0.4289   1.3574   0.4525
Mlu     -0.4972  -0.5535  -0.4221  -0.4441     -      1.2616   0.4205
Lvi to node 1 distance = 0.2795/2 + (0.3959 - 0.4525)/2 = 0.1114
Amo to node 1 distance = 0.2795 - 0.1114 = 0.1681

          Bsu      Bst      Mlu     Node 1     R       R/2
Bsu        -      0.1715   0.2326   0.1222   0.5263   0.2631
Bst     -0.3701     -      0.2058   0.1798   0.5571   0.2785
Mlu     -0.3856  -0.4278     -      0.2714   0.7103   0.3551
Node 1  -0.4278  -0.3856  -0.3701     -      0.5739   0.2869
Bsu to node 2 distance = 0.1222/2 + (0.2631 - 0.2869)/2 = 0.0492
node 1 to node 2 distance = 0.1222 - 0.0492 = 0.0730

          Bst      Mlu     Node 2     R       R/1
Bst        -      0.2058   0.1146   0.3204   0.3204
Mlu     -0.5116     -      0.1912   0.3970   0.3970
Node 2  -0.5116  -0.5116     -      0.3058   0.3058
Bst to node 3 distance = 0.1146/2 + (0.3204 - 0.3058)/2 = 0.0646
node 2 to node 3 distance = 0.1146 - 0.0646 = 0.0500

          Mlu     Node 3
Mlu        -      0.1412
Node 3              -
Mlu to node 3 distance = 0.1412

Figure 30 Neighbor joining of 5S rRNA evolutionary distance estimates. The data and abbreviations are as in Figure 29. Each table presents the pairwise distance values input to the round of analysis (upper right half of the matrix). The rightmost two columns present the row totals for the uncorrected distances (the row being defined based on the full symmetrical matrix; see equation 41) and the total divided by the number of terminal nodes minus two. The rate-corrected pairwise distances as defined by equation (42) are given in the lower left half of the matrix. The minimum corrected distance value in each table and the corresponding uncorrected pairwise distance are shown in bold. The corresponding pair of taxa (or clusters) are removed from the matrix and replaced by their common ancestral node in the next table and distances based on equation (43). The inferred tree is diagrammed in Figure 15A.
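The six steps of the neighbor-joining method, and the arithmetic shown in Figure 30, can be reproduced with a short Python sketch. It is offered as an illustration under stated assumptions, not as the authors' or any package's implementation: the function name `neighbor_joining`, the edge-list output, and the labels `node1`, `node2`, ... for internal nodes are all hypothetical. When two pairs tie for the minimum rate-corrected distance, the code simply takes the first one it meets, so internal node labels may differ from the figure even though the branch lengths agree.

```python
from itertools import combinations

# Pairwise distances from the first table of Figure 30 (upper right triangle).
NAMES = ["Bsu", "Bst", "Lvi", "Amo", "Mlu"]
D = {
    ("Bsu", "Bst"): 0.1715, ("Bsu", "Lvi"): 0.2147, ("Bsu", "Amo"): 0.3091,
    ("Bsu", "Mlu"): 0.2326, ("Bst", "Lvi"): 0.2991, ("Bst", "Amo"): 0.3399,
    ("Bst", "Mlu"): 0.2058, ("Lvi", "Amo"): 0.2795, ("Lvi", "Mlu"): 0.3943,
    ("Amo", "Mlu"): 0.4289,
}


def neighbor_joining(names, pairwise):
    """Return the unrooted tree as a list of edges (node_a, node_b, length)."""
    nodes = list(names)
    d = {frozenset(pair): v for pair, v in pairwise.items()}
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: net divergence r_i of each terminal node (row totals).
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Step 2: pair with the minimum rate-corrected distance M_ij.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2),
        )
        d_ij = d[frozenset((i, j))]
        # Step 3: new node u and the two branch lengths leading to it.
        u = f"node{next_id}"
        next_id += 1
        v_iu = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, v_iu))
        edges.append((j, u, d_ij - v_iu))
        # Step 4: distances from u to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((k, u))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
                ) / 2
        # Step 5: drop i and j, add u.
        nodes = [k for k in nodes if k != i and k != j] + [u]
    # Step 6: the last two nodes are joined by their pairwise distance.
    i, j = nodes
    edges.append((i, j, d[frozenset((i, j))]))
    return edges


for a, b, length in neighbor_joining(NAMES, D):
    print(f"{a} - {b}: {length:.5f}")
```

The code carries unrounded intermediate values, so its branch lengths can differ from the four-decimal figures in the tables in the last printed digit.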
A general property of these corrected distances is that they are negative; therefore, finding the minimum distance means finding the most negative value. In the first table, the minimum value is the -0.5689 relating Amo and Lvi. Both this value and the corresponding uncorrected distance, 0.2795, are in boldface. Thus, Amo and Lvi are joined to one another and to the rest of the taxa through a new node, called node 1 in this example. The two lines below the table illustrate the calculation of the branch lengths from the two taxa to the node. Amo and Lvi are then removed from the distance table, and the distances from node 1 to the remaining taxa are calculated using equation (43). For example, the Bsu to node 1 distance is (0.2147 + 0.3091 - 0.2795)/2 = 0.1222. The second table, which now relates only four terminal nodes, is treated just as the first table. Looking at the corrected distances, we find two pairs with the lowest value, -0.4278. This is not a coincidence: if Bsu and node 1 are sister nodes, then Bst and Mlu must also be sister groups. (If this observation is unclear, try drawing the unrooted tree of four taxa.) The remaining arithmetic will yield identical trees regardless of which of these two pairs are joined at this step. In this example, node 2 is added to the tree, joining Bsu, node 1, and the rest of the tree. The branch lengths from Bsu and node 1 to node 2 are calculated below the table. The third table eliminates Bsu and node 1, and adds node 2. In this table, which relates three peripheral nodes, all three rate-corrected distances are identical. As in the previous step, this result is not a coincidence: only one possible unrooted tree can link three taxa. The choice of the pair to be joined is arbitrary; the ultimate outcome will be the same. Adding node 3 to the tree so that it links Bst and node 2 to the rest of the tree (which is only Mlu at this point) gives one more pair of branch lengths and a "tree" containing node 3 and Mlu. Their pairwise distance is used directly as the length of the segment joining them. The tree is completed. The results are shown in Figure 15A.
As the neighbor-joining algorithm seeks to represent the data by an additive tree, it can assign a negative length to a branch. Kuhner and Felsenstein (1994) modified the algorithm so that when a negative branch length occurred, it was set to zero, and the difference was transferred to the adjacent branch length so that the total distance between an adjacent pair of terminal nodes was unaffected. This change does not alter the topology of the tree found by the algorithm; it just guarantees non-negativity of branch lengths (e.g., for interpreting branch lengths as estimated numbers of substitutions).
Neighbor joining is classified as an algorithmic method because it constructs only one tree and does not explicitly optimize any objective function (the branch-length estimates from neighbor joining are not, in general, optimal for the minimum evolution criterion). We believe that it should be thought of as a means of getting a starting tree for more thorough searches using branch swapping under the minimum evolution or other additive-tree criteria, not as a method for choosing a final tree.

SPLIT DECOMPOSITION All of the methods described above will select a tree regardless of how non-treelike the data appear. When the data do not conform to a treelike model, criterion-based methods may provide some indication of a problem, for example, by discovering some nearly optimal trees that are quite different in topology. Algorithmic methods such as neighbor joining provide little or no indication that the data do not conform to the model. Split decomposition (Bandelt and Dress, 1992) is a method for graphically representing trends in distance data. The method detects well-supported groupings when they occur, but also identifies conflicting (incompatible) groups that may also have strong support in the data. These conflicts can arise from sources such as inadequate correction for superimposed changes in the distance transformation, convergence driven by natural selection, or reticulate evolution. We will not give a complete description of this method, but will outline the basic ideas using a simple example.
The method is based on the four-point metric (formula 10) (Buneman, 1971), which states that if taxa i, j, k, and l (a quartet) are related by a tree ((i, j), (k, l)) and the distances are tree-additive, then the minimum sum will be $d_{ij} + d_{kl}$, while the larger sums $d_{ik} + d_{jl}$ and $d_{il} + d_{jk}$ will be equal. With real data (i.e., imperfectly additive distances), the relationship $d_{ik} + d_{jl} = d_{il} + d_{jk}$ will not hold. Although we could hope that $d_{ij} + d_{kl} < d_{ik} + d_{jl}$ and $d_{ij} + d_{kl} < d_{il} + d_{jk}$, which forms the basis of the Sattath-Tversky (1977) and Fitch (1981) "neighborliness" methods, even this relationship will usually be violated by some quartets. Split decomposition adopts the working assumption that at the very least, $d_{ij} + d_{kl}$ will not be the largest of the three sums. Usually, phylogenetic methods assume that if $d_{il} + d_{jk}$ exceeded both other sums, then there is no support in the data for the tree ((i, l), (j, k)). However, we can also ask whether there is relatively unambiguous support for one of the other two trees. For example, if $d_{ij} + d_{kl}$ and $d_{ik} + d_{jl}$ are nearly equal, but both are distinctly smaller than $d_{il} + d_{jk}$, conflicting support is evident. The closer one of these two sums approaches $d_{il} + d_{jk}$, the more consistent is the support for the tree corresponding to the other sum.

Figure 31 (A) Distance matrix for split decomposition example. (B) Graphical representation (network) of splits implied by matrix (A). (C) Poisson-corrected distance matrix. (D) Correct tree inferred from matrix (C).

We illustrate this procedure using the hypothetical example of Figure 31. The distances in Figure 31A are the observed or uncorrected distances that would be expected from the example used to illustrate the Hadamard conjugation (i.e., calculated using the relationship d = (1 - r)/2; see equation 31). The three relevant distance sums are:

[numerical values of $d_{12} + d_{34}$, $d_{13} + d_{24}$, and $d_{14} + d_{23}$ not recoverable from the source]

Thus, we reject the tree ((1,4),(2,3)) and calculate an isolation index representing support for each of the other partitions (splits) as

[isolation index values not recoverable from the source]

The observation that support for the ((1,2),(3,4)) split is nearly half that of the support for the ((1,3),(2,4)) split suggests that there is conflicting support for two different groupings in the data set. This conflict is represented by drawing the tree as a network showing the amount of support for each of the two supported groupings (Figure 31B). A standard tree-building method such as neighbor joining would, in contrast, select the tree ((1,3),(2,4)) but give no indication of the support in the data set for the alternative tree ((1,2),(3,4)).
In this example, we can identify the cause of the conflict as failure to account for superimposed changes, which in this case would cause selection of an incorrect tree using neighbor joining or other additive-tree methods. However, using the standard Poisson correction (equation 27), we can obtain the corrected distance matrix shown in Figure 31C. (Note that the elements of this matrix are equal to one-half of the appropriate elements in the corrected generalized distance vector p of equation 30.) For this corrected matrix, we have

[corrected distance sums not recoverable from the source]

When split decomposition is performed using the corrected distances, the box in Figure 31B indicating conflicting support disappears because $|(d_{14} + d_{23}) - (d_{13} + d_{24})|/2 = 0$, and the correct tree is inferred (Figure 31D).
For a tree of more than four taxa, the deviation from the additive four-point metric condition is measured for all possible subsets of four taxa.

low one to detect when conflicting splits are due to events such as horizontal transfer of DNA or recombination. Such claims should be evaluated with more sensitive character- and sequence-based methods (e.g., Stephens, 1985; Hein, 1990a, 1993). A more straightforward use of the method is in the choice of a distance transformation (e.g., allowing more substitution parameters, unequal rates across sites, and/or unequal base compositions). Split decomposition can give some idea of whether these transformations are improving the "treelikeness" of the graph or making it worse (visualized as a more "boxy" network; e.g., see Lockhart et al., 1995b).
Split decomposition analysis will not necessarily detect some kinds of departures from predictions of a model, again because we are starting from distances rather than characters. For
Bandelt and Dress (1992)showed that only a cer- example, unlike the Hadarnard conjugation,
tain nulnbcr of the implied splits can be portrayed split decomposition will not recognize an excess
011 d planar graph (the spl/t dcco~nposableportion); of patterns supporting all three four-taxon trees,
the 13K1POI t ~ o nwhich cannot is referred to as tlie as would happen if there were more superim-
sp!zt-pt v ~ i ~desidue.
e Bandelt and Dress (1992) sug- posed changes than the model predicts. Like the
gested l l ~ nthet majority of the random noise con- Hadamard conjugation, we need a means of de-
tarned in a data set is iransfcrrcd to the split- termining whether a conflicting "signal" is re-
primc residue (which also contains some ally present or is simply due to sampling error
systematic biases that arc only locally uniform in causing inequality of d,k + 4[ = djf + dlkby chance.
t h e ~ dircclron).
r Remaining random noise and sys- Unfortunately, this question has received little
tcrnalic error is retained in the split-decomposable attention, but with small data sets it is possible
component and is observed on the resulting net- to determine analytically how many standard
work as incompatibilities between splits (or unre- deviations separate the three sums of distances.
solved nodes; see below). A bootstrap approach (see below) to assessing
Wlxen using split decompositlon on substan- the reliability of features in the split decomposi-
tial n ~ ~ m b eofr staxa, tlze resulting graph often ap- tion graph is also feasible, but will probably be
pears more llke an unresolved tree than a network conservative. The relationship between split de-
with inany boxes. Distant outgroups, for example, composition and the distance Kadamard is not
can show large random fluctuations and also dif- well understood; both methods should be con-
ferent syslematic biases, tending to hide the infor- sidered useful because they give different in-
rnatlon on ingroup systematic bias (as all three sights.
quarlel- rclations may be optimal depending on
which taxa are used). When this happens, local- METHODS BASED ON A RELAXED FOUR-POINTMETRIC
ized (but possibly strong) systematic error is lost The methods of Sattath and Tversky (1977) and
m tlz~spllt-prime residue and the graph loses both Fitch (1.981) are also based on a relaxation of the
"boxmess" and resolution. One solution to this four-point metric condition of Buneman (1971).
problem is to look for systematic errors by restrict- However, they are based on a somewhat stricter
mg ihc ;~nalysisto smaller subsets of taxa (4-10) criterion than split decomposition. These methods
Because it is based on distances and not clnar- operate by creating a similarity matrix sij that
actcrs, spllt decompositlon by itself does not al- counts the number of times each pair of taxa i and
j satisfy the conditions $d_{ij} + d_{kl} < d_{ik} + d_{jl}$ and $d_{ij} + d_{kl} < d_{il} + d_{jk}$ over all pairs (k, l). This matrix forms the basis for a cluster analysis. We begin by choosing the pair (i, j) for which $s_{ij}$ is maximal, and form the corresponding cluster. These two taxa are merged into a single object and distances are recalculated as in UPGMA. The quartet-based scoring of pairs of taxa is then repeated, and the cycle continues until all taxa have been clustered. (The Sattath-Tversky and Fitch methods differ slightly in the details of how averaging is performed in preparation for the next clustering cycle.)

The Sattath and Tversky (1977) and Fitch (1981) methods have not been widely used. Furthermore, simulations by Charleston (1994) indicate that these methods (and other transformed distance methods, such as that of W.-H. Li, 1981) are less effective in identifying the correct tree than methods such as neighbor joining or closest tree (applied to the distance Hadamard). They are also more computationally intensive (requiring time proportional to $T^5$, as opposed to $T^3$ for neighbor joining).

DISTANCE WAGNER AND RELATED METHODS The conceptual perspective of Fitch-Margoliash methods and neighbor joining is that the estimated pairwise distances are to be fit to an additive tree, with some of the estimates (observations) being greater than the true values and some of them being smaller than the true values. An alternative view is one in which the sequence (or other) differences are not corrected for superimposed changes and thus provide lower bounds for the actual evolutionary distance. In this framework, the length of the path connecting any pair of taxa must equal or exceed the corresponding observed distance. In analogy to character-based parsimony, the desired tree is the one that minimizes the total of all branch lengths in the tree, while using the pairwise distances as lower bounds on the path-length distances. Beyer et al. (1974) and Waterman et al. (1977) have described exact methods for accomplishing the desired minimization on a given tree. Farris's (1972) distance Wagner algorithm can be thought of as a heuristic approach to the same problem. Modifications to the distance Wagner procedure have subsequently been proposed by Swofford (1981) and Tateno et al. (1982). As with neighbor joining, if the experimentally determined distances are additive, then the optimal solution will always be found. However, when the fit is not exact, the behavior is not intuitively obvious.

RELIABILITY OF INFERRED TREES

Systematic Versus Random Error

In any statistical analysis, two kinds of error (systematic and random) need to be distinguished. We define random error as deviation between a parameter of a population and an estimate of that parameter, due strictly to a limited sample size used to make the estimate. By definition, random error disappears in infinite samples. In contrast, systematic error is deviation between a parameter of a population and an estimate of that parameter, due to incorrect assumptions in the estimation method. Systematic error persists (and may intensify) as sample sizes increase and become infinite.

Throughout this chapter, we have discussed various conditions under which systematic error arises in phylogenetic analyses. In general, systematic error occurs when the evolutionary process violates the assumptions of a phylogenetic method in a critical way. Under these conditions, a bias may be introduced into the evaluation of alternative phylogenies, favoring some branching patterns and decreasing the support for others. If the bias becomes sufficiently great, it may overcome the legitimate support for the correct tree and lead the researcher to an incorrect conclusion. Because the effect is systematic, the addition of more data will tend to solidify the incorrect conclusion (and the method is said to be inconsistent or positively misleading under these conditions; Felsenstein, 1978a). For a mistake to occur in phylogenetic estimation of the branching order as a result of systematic error, the magnitude of the bias must exceed the valid support for the correct tree. Furthermore, the bias must be in
the direction of an erroneous tree, as it is possible for systematic bias to increase apparent support for the historically correct tree. Thus, the presence of a bias does not necessarily lead to wrong answers, but it does cast doubt upon the validity of the inference process.

Even if evolution occurred exactly as assumed by a particular analytical method, an incorrect tree may be inferred with finite data due to chance events (which introduce random error). For example, convergent substitutions might be expected to occur (in a given situation) only once per 100 nucleotide sites, but because of sampling error we might observe three convergent substitutions in a single sample of 100 nucleotide sites. This type of error occurs even when the presumed model is correct. By analogy, the observation of 20 consecutive "heads" in a coin-tossing experiment might lead us to conclude that the coin is two-headed, but of course this outcome has a finite probability of occurring (approximately $10^{-6}$) even if the coin is fair. In inferential statistics, we generally choose a certain probability (typically 0.05) below which an outcome is improbable enough (assuming that random error accounts for the deviation) to warrant rejection of a null hypothesis.

Random error does not necessarily produce a random effect on the outcome of an analysis, however. For instance, for many methods of calculating pairwise distances, small distances and large distances are affected differently by sampling error. Under some conditions, this leads to a sample-size-dependent bias in methods that are nonetheless consistent for the model under consideration (see Hillis et al., 1994b for an example). In other words, even if a method is consistent and will lead to the correct tree if given an infinite amount of data, it nonetheless may be biased with finite data, even if its assumptions are met perfectly.

Realistically, both random and systematic error are expected in any given study. Random error occurs in any finite data set (since the expected proportions of different character patterns are real numbers), so the sensitivity of the results to the presence of random error needs to be assessed. Because systematic error is expected when the assumptions of a method are violated, the assumptions should be tested, the effects of potential sources of bias should be explored, and methods should be used to reduce the effects of systematic error in the analysis.

Systematic Error

Conditions That Lead to Systematic Error

Fortunately, the situations likely to lead to systematic error under most of the methods we have described are relatively well understood. We have discussed some of these conditions in the sections describing each of the methods; here we present a brief review for the major classes of analysis.

GENERAL ASSUMPTIONS Almost all methods assume that the characters analyzed are vertically inherited (rather than horizontally acquired). This assumption is usually met for molecular data, and so probably only rarely introduces systematic error into molecular systematic studies. The other general assumption of most methods is that characters are independent with respect to probability of change. If, for example, a change in one nucleotide position makes a change in a second position more likely, then this assumption is violated (see Wheeler and Honeycutt, 1988, and Korber et al., 1993 for examples). If methods do not explicitly account for this nonindependence, it may lead to systematic error.

PARSIMONY If the number of actual sequence changes per sequence position in a macromolecule is always small (zero or one), then parsimony will correctly reconstruct the phylogeny given enough data (Felsenstein, 1978a). As the number of changes increases, the proportion of those changes that are homoplastic (parallel, convergent, or reversed) increases. If the tree is relatively dense (i.e., branch lengths are short enough so that the expected number of changes on any one branch is small), these homoplastic changes usually will be detected as such. However, parsimony analyses do not detect multiple changes on long unbranched lineages, thereby creating the potential for bias if a mixture of long and short branches is present in an analysis (Felsenstein, 1978a).
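The long-branch problem for parsimony described above can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions, not a method given in the text: it assumes a symmetric two-state character model, a true tree ((A,B),(C,D)) in which A and C sit on long branches and every other branch is short, and hypothetical names (`site_pattern_probs`, `parsimony_favours`, `p_long`, `q_short`). It computes the exact expected frequency of each parsimony-informative site pattern; with unlimited data, four-taxon parsimony selects the tree whose supporting pattern is most frequent.

```python
from itertools import product


def site_pattern_probs(p_long, q_short):
    """Expected frequencies of the three informative pattern classes.

    Assumed setup (illustrative only): true tree ((A,B),(C,D)), two-state
    symmetric model, change probability p_long on the branches leading to
    A and C, and q_short on the branches to B, D and the internal branch.
    """
    def trans(parent, child, prob):
        # Probability of the child's state given the parent's state.
        return prob if parent != child else 1.0 - prob

    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        prob = 0.0
        # Sum over the unobserved states x (ancestor of A, B) and y (of C, D).
        for x, y in product((0, 1), repeat=2):
            prob += (0.5
                     * trans(x, a, p_long) * trans(x, b, q_short)
                     * trans(x, y, q_short)
                     * trans(y, c, p_long) * trans(y, d, q_short))
        if a == b and c == d and a != c:
            support["AB|CD"] += prob   # pattern favouring the true tree
        elif a == c and b == d and a != b:
            support["AC|BD"] += prob   # pattern joining the two long branches
        elif a == d and b == c and a != b:
            support["AD|BC"] += prob
    return support


def parsimony_favours(p_long, q_short):
    """Tree that parsimony would select given unlimited data."""
    support = site_pattern_probs(p_long, q_short)
    return max(support, key=support.get)


if __name__ == "__main__":
    for p_long in (0.1, 0.4):
        print(p_long, parsimony_favours(p_long, 0.05))
```

Under these assumptions, `parsimony_favours(0.1, 0.05)` returns the true grouping `"AB|CD"`, whereas `parsimony_favours(0.4, 0.05)` returns `"AC|BD"`, the tree that joins the two long branches. Because these are expected frequencies rather than sampled data, adding more sites would only strengthen the incorrect result in the second case.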
ADDITIVE-TREE TECHNIQUES The additive-tree techniques discussed in this chapter are free of systematic error if the distance data are additive (satisfy the four-point condition) and no distance values between sister taxa are missing from the data matrix. This internal consistency of the technique places the burden of accuracy on the estimation and transformation of the distance data as opposed to the actual tree inference procedure. Specifically, the model used to correct for superimposed changes must reflect the underlying evolutionary processes. To the extent that it does not, additive-tree methods are susceptible to systematic error.

MAXIMUM LIKELIHOOD If the model of evolution used to evaluate the likelihood of given trees does not reflect the actual evolutionary processes, then maximum likelihood analyses will be subject to systematic error. In general, maximum likelihood appears to be more robust to violations of its assumptions than are additive-tree methods (Huelsenbeck, 1995b). In principle, maximum likelihood models can be made arbitrarily complex to account for particular evolutionary processes, but the cost in terms of computational time may be severe. Moreover, complex models may be more sensitive to random error than are simple models (because more parameters need to be estimated from the same amount of data).

CLUSTER ANALYSIS If the assumption of ultrametricity is satisfied and no distance values between sister taxa are missing from the data matrix, cluster analysis will be free of systematic error. However, if two lineages are not equally distant from a third, more diverged lineage (i.e., if the pairwise distances are not ultrametric), then systematic error will be introduced. As pointed out above, satisfaction of the three-point condition establishes that the distances are ultrametric. In practice, this condition is rarely satisfied by real data.

Recognizing Systematic Error

There is no foolproof method for identifying artifacts in phylogenetic trees that result from systematic error. There are, however, a few techniques that can help in evaluating the extent of systematic error, and for assessing the expected effects of identified systematic error.

TESTS OF MODEL FIT Often there are tradeoffs between model complexity (which provides consistency under a wide range of conditions) and both computational complexity and sensitivity to random error. Therefore, in using a method that assumes an explicit model of evolution, it is important to choose a model that is complex enough to explain the observed data, but not so complex as to be subject to impractically long computation times or require impractically large data sets. Choosing a model, therefore, requires a test to compare the fit of one model of evolution against another for a particular data set. Furthermore, we need to know if the best model provides an adequate explanation of the observed data. Reeves (1992) and Goldman (1993a,b) have described tests for this purpose.

To compare two models of evolution, Goldman (1993a) suggested using the likelihood ratio test statistic, δ, where $\ln L_1$ is the log likelihood under the more complex (parameter-rich) model and $\ln L_0$ is the log likelihood under the simpler model. This statistic will always take on a value greater than or equal to zero because the likelihood under the complex model will always be equal to or higher than the likelihood under the simple model. To test whether the more complex model provides a significantly better explanation of the observed data, Goldman (1993a) suggested that the null distribution of the statistic δ be determined using simulation. The tree and the parameters of the model are estimated under the null hypothesis that the simpler model of evolution is correct, and this estimated tree and parameterized model are then used to simulate many replicate data sets of the same size as the original. Maximum likelihood scores are then calculated under both the simple and complex models to produce a null distribution for the test statistic δ. If δ (from the original data) is
greater than 95% of scores from the simulated data, then the simpler model of evolution is rejected. Note, however, that rejection of the null hypothesis only indicates that the simpler model is inadequate to explain the observations; it does not necessarily indicate that the more complex model is adequate. The more complex model is now the null model and is subject to further testing.

Typically, one can conduct tests to see if a given parameter that can be added to a model provides a significant improvement in the optimality score. For instance, many models assume a difference in the probabilities of transitions and transversions. To test if this parameter (transition/transversion ratio) is necessary, one could test the Kimura two-parameter model against the Jukes-Cantor one-parameter model of DNA substitution (see the section on "Maximum Likelihood"). In this example, the log likelihood under the Kimura model would be $\ln L_1$ and the log likelihood under the Jukes-Cantor model would be $\ln L_0$.

To test the adequacy of a given model of evolution, Goldman (1993a) suggested that the log likelihood under the multinomial distribution ($\ln L_1$) be tested against the model of interest ($\ln L_0$). This test is very stringent, however, and under a wide variety of circumstances the model of interest will be rejected as an "adequate" explanation of the observed data. This does not mean that the model is inadequate to provide a reasonable estimate of phylogeny, but it does mean that the model fails to provide a perfect description of the underlying evolutionary processes. Since we never expect models of evolution to be correct in every detail, the test is perhaps best used to estimate how far the assumed model deviates from the underlying processes. The greater the deviation, the more attention one should pay to discovering those aspects of the evolutionary process that have not been adequately incorporated into the model.

In applying the likelihood ratio test, the number of tests being conducted needs to be considered. For example, in comparing the likelihoods of ultrametric trees (i.e., assuming a "constant clock") to trees in which a given lineage is allowed to change at a different rate, it is tempting to perform the test on the most deviant lineages (those with the greatest and/or least total length in a rooted additive tree). Alternatively, some authors have simply varied the assumed rate for each branch or subtree, one after the other. In either case, the approach amounts to multiple hypothesis testing, and lowers the significance below that of a single likelihood-ratio test with the same value of δ.

Another approach for testing model fit has been proposed by Rzhetsky and Nei (1995). They derive linear invariants that are independent of evolutionary time and phylogeny and reflect the constraints on a restricted model relative to more general time-reversible models. By testing whether the deviations of these invariants from their expected values are greater than would be expected by chance if a particular model were true, a test of whether that model is applicable to a particular data set is obtained. Goldman's (1993a) method has some theoretical advantages, but Rzhetsky and Nei's (1995) method is much more computationally feasible.

ASSESSING THE EFFECT OF A POTENTIAL BIAS In some cases, a model of evolution may be adequate for the majority of taxa, but not applicable to all taxa. For instance, if a model incorrectly assumes that the same equilibrium base frequencies exist in all lineages, then systematic error will be introduced into the analysis. The problem may be particularly severe if the differences in base composition do not follow phylogenetic lines. If base composition is affected by ecological or physiological factors, then the potential for convergence in base composition exists. For instance, Pettigrew (1994) argued that the metabolic constraints of flying bias the base composition of microchiropterans (the mostly small, echolocating bats) and the megachiropterans (flying foxes and their relatives) toward a higher AT content, and that this bias misleads phylogenetic analyses of many different mitochondrial and nuclear genes (an effect he called the "flying DNA hypothesis"). He argued that the numerous studies that support the monophyly of these two bat groups (e.g., Bennet et al., 1988; Adkins and Honeycutt, 1991; Mindell et al., 1991; Ammerman and Hillis, 1992; Bailey et al., 1992; Stanhope et al., 1992) can all be explained by this base compositional bias. Instead of bat monophyly, Pettigrew (1986, 1991a,b, 1994) has argued (primarily on the basis of neuroanatomy) that megachiropterans are more closely related to primates than to microchiropterans. Therefore, two different explanations have been presented for the apparent support from DNA sequences for bat monophyly: either the two "bat" groups are phylogenetically related, or the results are accounted for by systematic bias. Van Den Bussche et al. (1996) have tested Pettigrew's flying DNA hypothesis for the relevant data sets through simulation, and have shown that the support for bat monophyly cannot be explained on the basis of base composition bias alone. Even if Pettigrew's phylogenetic hypothesis is correct, and every substitution in the two bat lineages went to an A or a T, then the bias would still not be sufficient to explain the observed support for bat monophyly. Furthermore, analyses that are better at taking different base composition among lineages into account (such as LogDet analyses) still support bat monophyly. Therefore, the analyses show that the particular bias is not a sufficient explanation for the data. This does not mean that the data have no systematic bias, but it does mean that the hypothesized bias is not an explanation for the results in this case.

In other cases, base composition does have a demonstrable effect on phylogenetic analyses (see Rzhetsky and Nei, 1995, for a test to detect significant base compositional differences). For instance, Leipe et al. (1993) and Hasegawa and Hashimoto (1993) have suggested that early eukaryote evolution is especially difficult to analyze because of unequal base composition (e.g., the Giardia genome is about 70% G+C, whereas the average microsporidian genome is 35% G+C). Observed distances, transformed distances, standard parsimony, and current maximum likelihood models all support Giardia as the sister lineage to other eukaryotes with high bootstrap support [based on Gouy and Li's (1989) small-subunit rRNA data set]. However, phylogenetic analysis of LogDet distances shows equal support for either Giardia or microsporidians as the sister group to the remaining eukaryotes. Furthermore, if invariant sites are taken into account, then the support shifts strongly in favor of microsporidians as the most basal eukaryotic lineage (Waddell, 1995).

Similar tests can be conducted to examine the potential effects of any hypothesized systematic bias. For example, both Gouy and Li (1989a) and Olsen and Woese (1989) have argued that if the tree of life proposed by Lake (1988), in which Archaea is paraphyletic or polyphyletic, were correct, a systematic bias due to "attraction" of long branches would not be sufficient to yield the trees observed by the former groups (in which Archaea is monophyletic). Gouy and Li (1989a) and Olsen and Woese (1989) interpreted these results as grounds to reject the proposal of Lake as being inconsistent with their observations, a conclusion that is contested by Lake (1990b).

SENSITIVITY TO SPECIFIC TAXA IN THE TREE If the data and tree inference technique were ideal, analyzing any two subsets of taxa would yield congruent trees (i.e., the trees would be identical after pruning taxa absent from one or both trees). In practice this is not the case. (Otherwise, finding optimal trees would be almost trivial, since constructing a tree by sequential addition of taxa would always lead directly to the globally optimal tree, regardless of the order of addition.) Both systematic and random error can distort the tree so that the inferred branching order is dependent on the taxa included. Because the total error contains both systematic and random components, variation with the sampling of taxa does not necessarily indicate an effect of systematic error, but it is suggestive. Most sources of systematic error are expected to increase with branch length; therefore, if the changes in tree topology are specific to the most diverged taxa, then there is again reason to suspect that systematic error is having a significant effect on the analysis.

Lanyon (1985) described a jackknife method that evaluates taxon stability by computing T trees, each time leaving out one taxon. By computing a strict consensus of these trees using a method that allows different subsets of taxa to be contained on each of the rival trees, the investigator can determine which relationships are consistent. Felsenstein (1988a) suggested that this method may not have the properties of a statistically valid jackknifing procedure, but it nonetheless provides a useful index of which groups are most stable to taxon selection.

CONTRIBUTION OF INDIVIDUAL TAXA TO THE OPTIMALITY CRITERION If the placement of a particular taxon is problematic (due to systematic error), removal of that taxon from the analysis will frequently make a disproportionate change in a measure of tree quality, such as the least-squares criterion in a distance tree, the estimated homoplasy of a tree derived by parsimony, or the overall likelihood ratio statistic δ. However, such measures are correlated with the number of taxa in an analysis, so one must confirm that the change in a given statistic is significantly greater than would be predicted by the removal of an average taxon (in many cases this will require a simulation study).

INFERENCES BASED ON DIFFERENT MOLECULES Phylogenetic relationships inferred from two or more different molecules should, in theory, be congruent if the molecules had the same overall history. If the inferred relationships are different, the reasons for the differences should be investigated (Bull et al., 1993b). It is important to avoid confusing differences between the optimal trees with the conclusion that the results are significantly incongruent: the former might simply be due to random errors in one or both trees, whereas the latter asserts the existence of a significant conflict. One method for deciding between these two possibilities is to fit each data set to the tree(s) derived from the other data set(s). Most modern programs allow the input of user-defined trees for evaluation under a particular optimality criterion. For example, suppose tree 1 is optimal for data set A and tree 2 is optimal for data set B. If tree 2 is nearly as good as tree 1 for data set A, and if tree 1 is nearly as good as tree 2 for data set B, then there is no real conflict, just inadequate information. This result can sometimes occur even though the two trees differ substantially in their topologies!

If a conflict cannot be explained by random error associated with finite sampling, then one of the following possible explanations should be considered: the inadvertent use of non-orthologous genes (e.g., a tree with mouse and rabbit α-globin and rat β-globin; paralogy); reticulation of lineages due to hybridization or lateral gene transfer (xenology); or the presence of significant levels of systematic error (leading to inconsistent conditions) in one or both trees.

NONPARAMETRIC APPROACHES Nonparametric tests may provide an additional source of guidance in evaluating a tree inferred from distance data (or for which pairwise distance estimates can be generated from the character data). In practice, the usefulness of these tests is dependent on the details of the tree inferred, and in many circumstances the tests may not be able to distinguish alternatives. An illustration of a case in which they might be useful is provided by the trees in Figure 32. A comparison of the paths from A and B to D yields the expectation that $d_{AD} > d_{BD}$ for all three trees. Let us assume that this trend is significantly supported by the data (for example, the trend is verified by bootstrap samplings of sequence positions). If we now consider the relationships of C to A and B, we expect that $d_{AC} > d_{BC}$ in trees 1 and 3 (an expectation that could also be true of a minor variant of tree 2), whereas $d_{BC} > d_{AC}$ is only consistent with tree 2. Again, we can examine the data directly to see if one of these inequalities is significantly supported. In particular, if we observe that $d_{BC} > d_{AC}$, then we must conclude that trees 1 and 3 are incorrect, leaving tree 2 by elimination. Yet, if tree 2 were historically correct, systematic error could have biased the tree inference procedure to group the long branches leading to C and D, leading to the incorrect choice of tree 1. The reason that it is possible to infer tree 2 from the data and yet to find certain distances significantly inconsistent with that tree lies in the particular ratios of branches and in the fact that the latter test does not need to examine the most underestimated distance (i.e., that separating C and D). In contrast, the tree inference procedures discussed would include the distance from C to D
Tree 1 Tree 2 Tree 3
Figure 32 Three alternative trees relating four taxa
that can be distinguished by a non-parametric test on
the distance data. See text. substitution of different outgroup taxa, one or
two at a time, can still be used to evaluate the
reliability in the position of the root.
(directly or indirectly) and potentially be misled Ironically, the effect of multiple outgroups in
by this value. parsimony is almost exactly the opposite. The use
of multiple species of an outgroup taxon will tend
Reducing Systematic Error to divide the longest branch in the tree, thereby
Several strategies are available to minimize sys- decreasing its tendency to attract other long
tematic error and its effects on a phylogenetic branches (Felsenstein, 1978a; Hendy and Penny,
analysis. 1989; A.B. Smith, 1994). To be most effective, how-
ever, additional outgroups should be chosen so as
CHANGING THE ASSUMPTIONS One obvious way to divide long branches reasonably evenly;
to reduce the chance of having systematic error adding an extremely close relative of a very dis-
lead to inconsistency is to change the assump- tant outgroup will gain little. Of course, the bene-
tions of the analysis to better match the observed fits of adding additional taxa are not limited to the
data (e.g., see "Tests of Model Fit," above). One outgroup. Long branches (sparse regions) within
example has already been given: if base composi- the ingroup can also contribute to systematic er-
tion is thought to vary significantly among taxa, ror, and multiple substitutions are more easily de-
then pairwise distances can be corrected using tected in dense regions. A somewhat paradoxical
the LogDet transformation. However, the source phenomenon results. With large numbers of taxa,
of the systematic error may not always be so correctly inferring every aspect of the true topol-
obvious, or a method may not have been devised ogy is extremely difficult, but if we were inter-
for dealing with an identified bias. The following ested in the relationships of only, say, four taxa,
techniques may be useful in these cases. we would be much better off to compute a tree for
20 taxa (interspersed among the four of interest)
REMOVING LONG BRANCHES A practical consid- and prune 16 of them from the tree than to com-
eration in the inference of trees from pairwise pute the tree for only the four taxa.
distance data is that the effects of systematic
error are expected to be worse with larger than ELIMINATING UNRELIABLE DATA Another practi-
with smaller distances. As noted in the discus- cal consideration concerns the fact that a branch
sion of the Fitch-Margoliash technique, pairwise is long because a large number of substitutions
distance methods include all measurements in have occurred in the sequences being compared.
the calculations as though they were indepen- Limiting an analysis to those sequence regions in
dent. Therefore, having many long distances in a which positional homology is most certain tends
tree will tend to compound errors. In order to t o exclude the most varlablc p o s r t ~ o l ~117s
work around this problem, the use of outgroup sequences, thereby shortening branches and
sequences should be kept to a minimum when decreasing the sens~hvltyof the analysis to mul-
using a painvise distance method. However, the tiple substitutions Tlus concept can be pushed
500 Chapter 11 / Swofford, Olsen, Waddell & Hillis
further. If hypervariable regions can be identified in a set of sequences, then they might be eliminated from the analysis, even if their positional homology is not in doubt. This phenomenon provides one motivation for character weighting.

Subjective elimination of data is sometimes criticized as being too arbitrary (e.g., Gatesy et al., 1993). Although we share the concerns of these authors, we take the position that data are excluded from the moment one chooses a particular gene, set of genes, or gene region to use in a systematic study. Most researchers would agree that certain genes are evolving at an inappropriate rate for the level of a study, and would avoid those genes in an attempt to minimize saturation effects and other problems (see, e.g., Simon et al., 1994). It seems unreasonable to argue that just because sequence data have been obtained (perhaps even accidentally) for a region that is evolving too rapidly to be reliable in a study, we are forced to retain them at all costs. It is unrealistic to think that subjectivity in a molecular systematic study can be entirely avoided; for example, one could almost always sequence additional taxa relevant to a question, and it is a subjective decision when to stop. We believe that the benefits of excluding clearly unreliable regions, however subjectively determined, outweigh the dangers.

The above paragraph notwithstanding, we look forward to the development of methods that allow a more objective assessment of which positions in a sequence are worth retaining. One promising approach is the elision method of W.C. Wheeler et al. (1995), which attempts to identify stable versus unstable alignment regions by asking which positions align consistently over a wide range of alignment parameters.

CHARACTER WEIGHTING Obviously, all characters are not equally informative with respect to the evolutionary history of the taxa under study. Some characters are both informative and reliable; they are telling us the truth about their past. Other characters may be reliable but uninformative: although they are not actively misleading us, they are not telling us anything very useful either. The reason that phylogenetic analysis is so difficult lies in a third category of characters: those that are misinformative. These observations lead us to the rationale for character weighting. If we could somehow deduce which characters were in fact the unreliable ones, the task of reconstructing evolutionary trees would be greatly simplified, because we could minimize their influence in the analysis by giving them less weight.

Identification of unreliable characters is also an effective way to avoid systematic error. By assigning lower weight to the characters that either violate the assumptions of a method or are known to predispose the method to inconsistency, we can minimize the likelihood that systematic error will occur. For instance, parsimony methods are much more likely to be consistent if character change is low, and consequently work best if the events being minimized (i.e., homoplastic changes) are in fact the rare events. If the rapidly evolving characters are recognized as such and given little weight in the analysis, the problem of attraction of long branches due to chance convergences will be minimized. Unfortunately, beyond the use of alignment difficulty as a criterion for macromolecular sequences, methods for assessment of character reliability have received little attention.

One extreme form of weighting is the elimination of characters, as discussed above. By assigning one set of characters the maximum weight (unity) and another set of characters the minimum weight (zero), we essentially assert that there are two classes of characters, one comprising characters that, at least on an a priori basis, are all equally reliable, the other containing characters that are worthless for the analysis in question. If we believed that characters actually behaved in this way, we would use a method of analysis known as character compatibility (Felsenstein, 1981b), which searches for the largest "clique," a set of mutually compatible characters that can all fit on the same evolutionary tree without homoplasy (e.g., Le Quesne, 1982; Estabrook, 1983). Compatibility methods are no longer in widespread use, probably because of their implicit adherence to an unrealistic model that asserts that once a character has been excluded from the largest clique, it no longer conveys any useful information whatsoever.

An approach that uses compatibility as an objective weighting criterion (rather than to infer phylogeny directly) was developed by Penny and Hendy (1985, 1986). Sharkey (1989), apparently unaware of the work of Penny and Hendy, described a related approach, but limited to binary characters. The strategy of these workers is to count the observed number of incompatibilities ($O_j$) between each character ($j$) and each other character. (For methods to test the pairwise compatibility of unordered multistate characters, see Estabrook and Landrum, 1975; Fitch, 1975, 1977; Sneath et al., 1975.) To convert this number to a weight, Penny and Hendy recommended computing the number of incompatibilities expected by chance ($E_j$) if the distribution of states for each character were independent of that for other characters (i.e., free of any non-independence imposed by their evolution on a common phylogeny). Penny and Hendy (1985) tested several weighting functions, but seem to have settled on the simple relationship

$$w_j = 1 - \frac{O_j}{E_j}$$

Thus, a character that is compatible with all other characters is assigned the maximum weight (unity), whereas a character that is incompatible with as many characters as would be expected by chance alone is assigned zero weight. More importantly, characters that fall between these two extremes are assigned intermediate weights. (Note that if the observed number of incompatibilities actually exceeded the expected number, a negative weight would be assigned unless the weights are constrained to be non-negative.) This method of weighting thus uses hierarchical structure in the data to assign weights, but does not base weights on any specific tree. Unfortunately, these methods remain relatively untested.

Another approach to character weighting is to estimate optimal weights by successive approximation (Farris, 1969). An initial set of weights (perhaps uniform weights) is used to obtain an initial estimate of the tree. From some measure of the fit of the characters to this tree, a new set of weights is derived, which are then used to estimate a second tree. The iterative rederivation of weights and recomputation of trees continues until the solution stabilizes (i.e., the tree derived from a new set of weights is identical to the tree that was used to derive those weights). Farris (1969) used reweighting functions based on the consistency index (Kluge and Farris, 1969), defined as $r_j / l_j$, where $r_j$ is the range of character $j$ (defined as the minimum number of steps that the character would require on any possible tree) and $l_j$ is the length required by the character on the tree at hand. Thus, characters that change the minimum possible number of times have perfect consistency (1.0), whereas characters that change more often have lower consistencies (approaching zero in the limit). Farris also noted that more extreme forms of weighting might be more effective than the use of the consistency index in successive weighting procedures.

One danger inherent in any successive approximations (a posteriori) approach is the likelihood of the search becoming trapped in a local optimum that depends on the starting tree (see also Neff, 1986). It is easy to see that a character that is inconsistent with the initial tree and downweighted as a result will have less influence in the second iteration than it did in the first. But there are some trees on which the character would have been perfectly consistent, and would therefore have been given maximum weight. Farris (1969) tested the effectiveness of his successive approximations method by adding random noise to a data set containing otherwise perfectly compatible characters and testing whether the noisy characters were in fact the ones assigned little weight in successive iterations (they were). We suggest that one not become overconfident upon seeing this kind of result, however, as characters in real data sets do not fall cleanly into "completely reliable" versus "random noise" categories. Nonetheless, recent model-based simulation studies and studies of well-supported phylogenies (J. McGuire and J. Huelsenbeck, personal communication) indicate that successive approximation approaches can be effective, although not as they are usually implemented. McGuire and Huelsenbeck observed little or no improvement in accuracy over the initial parsimony estimate when successive weighting was performed using a character's average consistency index across all of the most-parsimonious trees as the reweighting criterion. However, they found that successive weighting
did increase the accuracy of the estimated phylogeny when used with more extreme forms of weighting (such as the inverse of the total number of character-state changes raised to the tenth power) and when the best observed index value for a character across all of the most-parsimonious trees is used (as suggested by Campbell and Frost, 1993).

Another problem with successive weighting approaches similar to Farris's (1969) method is that there is no objective criterion for comparing any two trees (D.R. Maddison, 1990). That is, if a tree is found to be optimal by the successive approximations algorithm, one cannot say how much worse (if at all) an alternative tree is. Goloboff (1993) has developed a method for weighting characters based on their implied homoplasy that avoids this limitation by defining a weighting function and optimality criterion that can be evaluated for any tree and compared across trees. The idea is promising, although the method needs to be more thoroughly evaluated.

Simon et al. (1994) have written an excellent review of character-weighting strategies that is both more data-oriented and more comprehensive than the discussion here; readers are urged to consult their paper for additional insights into issues concerning weighting in distance and character-based contexts.

CHARACTER-STATE WEIGHTING In character weighting, entire characters (e.g., nucleotide positions in a gene) are weighted differentially. In contrast, character-state weighting provides different weights for different character-state transformations within a character (see the section on "Generalized Parsimony"). Differential character-state weighting provides a mechanism for increasing both the consistency and the efficiency of parsimony analyses when the relative probabilities of character-state transformations differ, especially at high rates of evolution (Huelsenbeck and Hillis, 1993; Hillis et al., 1994a,b). The method works by giving greater weight to rare changes, which are less likely to be homoplastic (especially at overall high rates of character change) and hence more likely to be reflective of phylogenetic history (Williams and Fitch, 1989).

Character-state weights can be implemented by use of the step matrices described in the section on "Generalized Parsimony." Several methods have been proposed for determining appropriate weights. If we knew the actual probabilities for each type of transformation (e.g., for DNA sequence data, A → C, A → G, A → T, etc.), then an appropriate transformation would be

$$C_{ij} = -\ln P_{ij}$$

where $C_{ij}$ is the cost of a state change from state $i$ to state $j$ and $P_{ij}$ is the relative probability that state $i$ will change to state $j$ across a given branch or tree (Felsenstein, 1981c; W.C. Wheeler, 1990a). If the entire probability matrix of state changes is converted in this way into a step matrix of change costs (including the diagonals, which represent the probability that a state will not change), then the most-parsimonious reconstructions of ancestral states represent maximum Bayesian probability estimates for these states (D.R. Maddison, 1990; Maddison and Maddison, 1992).

How can the relative probability matrix of state changes be estimated? If we can assume a constancy of processes across characters, then it is possible to estimate the probability matrix from the observed data. For instance, with DNA sequences, we might assume that the relative probabilities of substitutions are affected in the same way across sites by exposure to the same mutagens and repair mechanisms. Given this assumption, one way to estimate relative probabilities of change is to base the calculation on the ratio of expected to observed changes in all pairwise comparisons of the sequences, taking the relative base frequencies of each base into account (Thomas and Beckenbach, 1989; Knight and Mindell, 1993). However, the various pairwise comparisons are not evolutionarily independent, so the calculations will be biased by the underlying phylogeny. One way to account for this non-independence is to reconstruct all most-parsimonious ancestral states in an initial estimate of the tree, and then use this information to produce a change-and-stasis matrix (Maddison and Maddison, 1992). Of course, the reconstruction requires an initial tree, which (if estimated by parsimony) requires an initial matrix of change costs, so the estimate may be biased by the initial assumptions. In practice, the relative frequencies of the various changes usually are not biased greatly by the initial tree, and subsequent rounds of tree estimation can also involve reweighting of the character-state changes until a stable solution is reached (a procedure called dynamic weighting by Williams and Fitch, 1989). Alternatively, matrices of change costs can be estimated for several alternative hypotheses to examine directly the extent to which the starting tree biases the estimates of change costs.

One problem with the above approach is that the most-parsimonious reconstructions are not the only changes possible. Ideally the relative probability matrix should be based on summed probabilities across all possible character-state histories. This can be accomplished using maximum likelihood (e.g., Z. Yang, 1994a; Z. Yang et al., 1994). But if the relative probability matrix is estimated using likelihood methods, then what is the advantage of using weighted parsimony methods over an explicit likelihood estimation procedure to estimate the tree? One of the principal advantages is one of computation time: complex maximum likelihood models typically constrain an investigator's ability to search tree space thoroughly, so only a very small portion of the potential solution space can be explored. Weighted parsimony procedures often provide a close approximation to the likelihood solutions, and the calculations are much faster. Thus, one strategy is to estimate the relative probability of change matrix using likelihood, and then use weighted parsimony to explore the solution space as thoroughly as computational limits permit. Once optimal or near-optimal solutions have been found (under the weighted parsimony criterion), they can be used as input trees and evaluated under the likelihood model. Given a fixed and finite amount of computation time, this procedure often finds better solutions under the likelihood criterion than does a direct search of tree space under the likelihood criterion, at least for moderately large data sets.

Step matrices used for character-state weighting can be either symmetric (e.g., the cost of a change from A to G will equal the cost of a change from G to A) or asymmetric (in which case the reciprocal costs will differ, with Dollo parsimony being the most extreme form). Under most circumstances, the reciprocal costs should be symmetric, so that any part of the tree can be rooted (by inclusion of an outgroup, for instance) without changing the length of the tree. If asymmetrical step matrices are used, then the various rootings of a tree will differ in tree length, so rooted trees must be examined to determine the tree length of the potential solutions. Since small asymmetries in the estimated matrix are expected from random error associated with finite sample sizes, one would not want to root the tree on the basis of this random error alone. However, if the asymmetries of change among states are strong and obvious (as with some RNA viruses; Moriyama et al., 1991), then the use of an asymmetrical step matrix may be justified (e.g., see Hillis et al., 1994a).

The assumption of constant substitutional processes operating across sites can be violated for any number of reasons, including dependence on the state of neighboring bases (Randall et al., 1987; Schaaper and Dunn, 1987), codon usage in protein-coding genes (W.-H. Li et al., 1985b), strand bias (Wu and Maeda, 1987; Thomas and Beckenbach, 1989), mutation bias (Loeb and Preston, 1986), secondary structural constraints (Gerbi, 1985; Dixon and Hillis, 1993; Tillier and Collins, 1995), and other non-phylogenetic sources of covariation among sites (Fitch and Markowitz, 1970; Korber et al., 1993). Therefore, in some situations, it may be necessary to divide the data set (e.g., into first, second, and third positions of codons) for the purpose of computing separate step matrices to provide differential weighting of state changes among the various sites of the sequence.

Random Error
The only way to avoid random error is to obtain an infinite amount of data; this practice will guarantee the correct result as long as the method is consistent. This option is unrealistic, however, so it is important to maximize the extraction of phylogenetic information by using the most efficient
methods [hat are applicable to the available data. length atid measures of character consistency or
In any case, methods must be used to estimate the homoplasy (Archie, 1989a,b; Faith, 1990, 1991;
sensrtivr[y of the results to finite sampling. Penny Faith and Cranston, 1991).
and 1-Icndy (19861, Felsenstein (1988a), Li and An alternative approach to permutatioll is the
Couy (19911, Hillis et al. (1993a), and Li and examination of shape of the distribution of tree
Zharktkh (1995) have presented reviews of the lengths for either all possible trees or a random
Inany methods available. Here we present and sample of them (Hillis, 1991; Hillis and Hueken-
dlsc~rssa few of the more common rnetliods. beck, 1992).Fitch (1979, 1984) observed that data
sets with little or no hierarchical structure tended
Testins for Hierarclziccrl Structure to produce relatively symmetric tree-length fre-
Even ~ r a' data set were co~istructedby randomly quency distributions. Hillis and Huelsenbeck
asslglilng character states to taxa, some random (1992) showed that as the amount of hierarchical
covariatlon would be expected due to the sto- structure was increased, these distributions be-
chastic nature of the sarnpllng process. This ran- came more Ieft-skewed. The degree of skewness
dom covariation would lead phylogenetic recon- can be quantified using the standard gl statistic.
structlon methods to prefer some trees over others For n trees of length T, gl is calculated as
even tilough true hierarchical structure in the data
was absent. Thus, it is worthwhile to ask whether
a data set contains more hierarchical structure
than would be expected purely by chance.
0i1c way to assess the non-randomness of hi-
eral-clilcalstructure is through permutation tests, where s is the standard deviation of the tree
c.irl-uc11provide a means for approxlinating the dis- lengths (Sokal and Rolzlf, 1981). Strong skewness
iributron of a test statistic under a given null hy- can be misleading, however, as very localized
poihesls by perinutlng (randomizing) the ob- structure can lead to highly asymmetric tree-
served data. In a phylogenetic context, permuted length frequency distributions. (For example, a
data s c t s are created by randomizing character purely random data set can produce a highly
states among taxa, while holding the total num- skewed tree-length distribution if one taxon is du-
ber of occurrences of any state constant. Thus, any plicated, as trees consisient with the monophyly
correlation among character states that results of the duplicated pair will be much shorter than
fro111 actual pl~ylogenetlcstructure is destroyed. the remaining trees.) Hillis (1991) suggested a
By comparing the null distribution of a test slatis- procedure for detecting those groupings most re-
tic g~neratedfrom a series of permuted data sets sponsible for the observed structure by calculat-
with Lhc observed value of the statistic from the ing the g1 statistic after successive restrictions of
origlual data, one can determine whether the null the sample space of trees. He used random char-
hypothesis of no phylogenetic structure can be re- acter states (rather than permuted stales from the
jectvd If the test statistic does not lie in the ex- observed matrix) to estimate the null distribution.
trclnc (say 5%) tail(s) of the null distribution, then The latter approximation is computationally
il~cre1s a reasonably good chance that it could much laster than permutation (in fact, it need
have artsen by chance in the absence of meaning- only be calculated once for a given number of taxa
ful i~ierarchicalstructure, and further analysis of and characters), but it is sensitive to deviations in
the ii'lta would seem ill-advised. It is important to the frequencies of the observed character states.
i erncrnber, however, tliat although significant hi-
erarchical structure may be due to phylogenetlc Tests for Conzpauing Two Trees
signal, other sources of structure (such as basta Many tests have been described to compare rive
composttional bias, or convergence) may also lead Iiypothesized trees: Is tree A significantly better
to rejection of the null hypothesis. Statistics that (under a given optimality criterion) than tree G, or
have been used in pertnutation tests include tree are the differences within the expectations of ran-
dom error? Such tests have been devised for each tive site. The expectation for D (under the null hy-
of the major optimality criteria. pothesis that the two trees are not significantly
different) is zero, and the sample variance of D is
PARSIMONY The first analytical tests for parsi-
mony were devised by Cavender (1978, 1981),
who studied the case of a four-taxon tree.
Felsenstein (1985b) extended these results to
include an assumption of a constant molecular
clock, and Steel et al. (1993b) extended Felsen- where n is the number of informative sites. The
stein's test to take into account unequal null hypothesis that D = 0 can be tested with a
nucleotide frequencies among the taxa. Li and paired t-test with n - 1 degrees of freedom, where
Zharkikh (1995) noted that these tests could, in
principle, be extended to more than four taxa,
but that the tests are expected to have very low
f -
= D ln
,s /&'
power, Therefore, we concentrate here on related
heuristic tests that can be used with any number If there is no a priori reason to suspect that tree 1
of taxa. is better than tree 2, the test should be two-tailed.
Templeton (1983b) devised a nonparametric If there is an a priori reason to suspect that one tree
test for comparing two trees. The test utilizes a is better than the other (for instance, if one tree is
Wilcoxan ranked sums test of the relative number the optimal tree found in a search, and it is being
of steps required by each character on each of the compared against nearby suboptimal trees), then
respective trees. If the characters are uniformly the expectation for D is no longer zero. For this
weighted and require no more than one addi- reason, the test is strictly valid only when the two
tional change on either of the trees, then the test trees being compared are selected on an a priori
can be simplified into the "winning sites" test of basis.
Prager and Wilson (1988). This simple test com-
pares the number of characters that favor each of DISTANCE TESTS Rzhetsky and Nei (1992a,
the two trees and tests the results against a bino- 1993) proposed a test for comparing two trees
mial distribution. Under the assumption that ran- under the minimum evolution criterion. In this
dom noise will be equally likely to favor either of test, D is the difference in the sum of the branch
the two trees, the test asks whether the support lengths for the two trees as estimated by the
for one hypothesis is significantly better than least-squares method, and the variance of D is
would be expected from random variation among either estimated by bootstrapping (Nei, 1991) or
the characters. Although the assumption is usu- computed analytically (Rzhetsky and Nei,
ally not met exactly (because the size of the rele- 1992a). Rzhetsky and Nei suggested a search
vant subgroups in the two trees is expected to dif- strategy for solutions under the minimum evo-
fer), the effect of the violation is likely to be small lution criterion by comparing the neighbor-join-
and the test gives an easy approximation of the ing approximation to all trees that differ from
probability that the observed difference is due to the neighbor-joining tree by u p to four symmet-
random error. ric-difference distance units (dsu), and accepting
Kishino and Hasegawa (1989) devised a para- all trees that are not significantly worse than the
metric test for comparing two trees, under the as- neighbor-joining tree. They restricted the com-
sumption that all nucleotide sites are indepen- parisons to trees within 4 dS, because studies
dently and identically distributed. This test uses based on six taxa showed that it is unlikely for
the difference in lengths of the two trees (Dl as a the optimal solution to be any more distant
test statistic, where D = CDg ) and Qi) is the dif- from the neighbor-joining tree under these con-
ference in the minimum number of nucleotide ditions, at least if the number of characters
substitutions on the two trees at the ith informa- examined is large. However, this search strate-
506 Chapter I 1 / Swafford, Olserr, Waddell G. Hillis
gy is likely to miss many solutions that are rearrangement, we could test one topology
equal to or better than thc neighbor-joining tree against the other by pretending that there was
if there are greater numbers of taxa. For one degree of frcedoi~~ and using the hkelihood
instance, one of us (DLS) has found more than ratio test.
27,000 trees that are equal to or better than the Other approaches have been used to estimate
neighbor-joining tree (under the minimum evo- the significance of a difference in log likelihoods.
lution criterion) for the distance matrix exam- One is the application of the Kishino and
ined by S.B. Hedges et al. (1992b; based on the Hasegawa (1989) test (discussed above, under the
data of Vigilant et al., 1991). All but 345 of these equal or better solutions are more than 4 dSD from the neighbor-joining tree, and better solutions are as much as 30 dSD from the neighbor-joining tree. Therefore, a neighbor-joining estimate (with a search of nearby trees) is a poor substitute for a thorough search of tree space for near-optimal solutions. If the number of taxa is very small (the conditions under which this search strategy is likely to be successful), an exact search (exhaustive or branch-and-bound) is computationally simple and will always find the optimal solutions.

An alternative approach to testing the difference between two trees is to use a measure called the generalized least-squares sum of squares, which is similar to a weighted least-squares measure but takes covariances between distances (e.g., shared branches in the tree) into account. This statistic can be compared against a χ² distribution (see Bulmer, 1991 for examples).

LIKELIHOOD If one tree is a subset of a second, more fully resolved tree, then the two hypotheses can be compared with a standard likelihood ratio test, using twice the difference of the log likelihoods of the two trees as a test statistic (δ). This statistic is compared against the χ² distribution, with the degrees of freedom equal to the difference in the number of parameters of the two hypotheses (in this case, the number of additional branches in the more fully resolved tree). Unfortunately, we would usually like to compare two trees that are not subsets of one another. In a strict sense, the likelihood ratio test is invalid under these conditions, because the number of parameters in the two hypotheses is equal, so we have zero degrees of freedom. Felsenstein (1988a) has suggested that in cases where two tree topologies differ by a single branch [...] parsimony criterion). An alternative is to generate the expected distribution of δ (rather than assuming a χ² distribution) through simulation of the null hypothesis (i.e., the tree with the lower likelihood). The likelihood analysis already provides the expected branch lengths given the topology of the null hypothesis, under an explicit model of character evolution. Thus, this parameterized tree can be simulated under the assumed model of evolution, and the simulated data sets can be analyzed under the maximum likelihood criterion. The expected distribution of differences in log likelihood scores (or twice the differences, if the standard test statistic is maintained) between the optimal tree and null tree can then be generated under the assumption that the null hypothesis is true. If the difference in the test statistic for the trees being compared exceeds 95% of the simulated differences, then the two trees are significantly different at p < 0.05, and the null hypothesis can be rejected. An example of this approach (which could be used with any optimality criterion) is presented in Chapter 12. The primary limitation to its implementation is the computation time involved, which can be considerable when the data sets are large and the optimality criterion is maximum likelihood.

Assessing the Reliability of Individual Branches
In many situations, it is desirable to assess the reliability of the individual internal branches of an estimated tree. Many methods have been suggested for this purpose. For instance, several methods have been proposed for testing whether a particular internal branch length is significantly greater than zero in an additive-distance tree (see Li and Gouy, 1991). Here we describe two nonparametric approaches that have been widely used for testing the degree of support for particular branches.
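The two likelihood-based comparisons described earlier in this section can be illustrated with a minimal sketch. It assumes the log likelihoods (and, for the simulation approach, the simulated values of δ) have already been obtained from some likelihood program; the function names `chi2_sf`, `likelihood_ratio_test`, and `simulated_p_value` are hypothetical and chosen only for this illustration. The χ² tail probability is computed with the standard recurrence over degrees of freedom, so only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        # Zero degrees of freedom is the non-nested case in which the
        # standard test is invalid.
        raise ValueError("the chi-square test needs at least 1 degree of freedom")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, nu = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        tail, nu = math.exp(-half), 2              # closed form for df = 2
    while nu < df:
        # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
        tail += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


def likelihood_ratio_test(lnl_less_resolved, lnl_more_resolved, extra_branches):
    """Nested trees: delta is twice the log-likelihood difference, and the
    degrees of freedom equal the number of additional branches."""
    delta = 2.0 * (lnl_more_resolved - lnl_less_resolved)
    return delta, chi2_sf(delta, extra_branches)


def simulated_p_value(observed_delta, simulated_deltas):
    """Non-nested trees: fraction of deltas simulated under the null tree
    that are at least as large as the observed delta."""
    exceed = sum(1 for d in simulated_deltas if d >= observed_delta)
    return exceed / len(simulated_deltas)


# Illustrative numbers only (not taken from any real data set).
delta, p_chi2 = likelihood_ratio_test(-1005.0, -1000.0, extra_branches=2)
p_sim = simulated_p_value(10.0, [1.0, 2.0, 3.0, 12.0])
```

In the simulation approach, the null hypothesis would be rejected at the 0.05 level when `simulated_p_value` returns a value below 0.05, which corresponds to the observed difference exceeding 95% of the simulated differences.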
DECAY/SUPPORT INDICES AND T-PTP TESTS In parsimony, a useful index of support for a monophyletic group may be obtained by calculating the difference in tree lengths between the shortest trees that contain versus lack that group (K. Bremer, 1988). This statistic has been referred to as the decay index (Donoghue et al., 1992) or the support index (K. Bremer, 1994). A difficulty with this index is that it is not clear how large a value must be for the group to be considered well supported. Faith (1991) extended permutation approaches to test for the monophyly of a given group of taxa. His a priori T-PTP (topology-dependent permutation tail probability) test uses as a test statistic the difference in the lengths of the shortest trees in which a particular group is non-monophyletic and monophyletic, respectively. This statistic is equivalent to the support/decay indices described above, suggesting that it might provide a useful means of assessing their significance. The null distribution of the test statistic is determined by evaluating the corresponding length differences of trees calculated from permuted data sets. Faith's a posteriori T-PTP test uses the same test statistic as the a priori T-PTP test but uses a different method for generating the null distribution. After permutation of the data matrix, one calculates the length difference for all groups of the same size as the group of interest and picks the greatest length difference between the shortest tree in which the group of interest is non-monophyletic and the shortest tree in which the group is monophyletic. Unfortunately, these tests are sensitive to structure in the data set that is unrelated to the specific hypothesis of monophyly being evaluated (Thorne et al., 1996). Simulations of Faith's topology-dependent cladistic permutation tail probability (T-PTP) tests (Huelsenbeck et al., 1995; Thorne et al., 1996) demonstrate that they do not accurately test for monophyly of the specified group, so the question of how to assess the significance of a support/decay index remains unanswered.

NONPARAMETRIC RESAMPLING METHODS The bootstrap and the jackknife (Efron, 1982; Efron and Gong, 1983; Efron and Tibshirani, 1993) can be used to estimate the variance associated with a statistic for which the underlying sampling distribution is either unknown or difficult to derive analytically. These methods are called resampling techniques because they operate by estimating the variance of the sampling distribution by repeatedly resampling data from the original data set. Under certain reasonable assumptions (Efron, 1982), the variance of the statistic of interest can be approximated from the distribution of the sample estimate over replications of the resampling process. These resampling methods were first used in a phylogenetic context by Mueller and Ayala (1982), Felsenstein (1985a), and Penny and Hendy (1985).

The bootstrap and the jackknife differ in the way in which resampling is performed. In the bootstrap, data points are sampled randomly, with replacement, from the original data set until a new data set containing the original number of observations is obtained. Thus, some data points will not be included at all in a given bootstrap replication, others will be included once, and still others twice or more. For each replication, the statistic of interest is computed. The jackknife, on the other hand, resamples the original data set by dropping k data points at a time and recomputing the estimate from the remaining n - k observations (see R.G. Miller, 1974). We describe bootstrapping here because it is much more commonly used in phylogenetic applications, but much of the discussion applies to jackknifing as well.

Figure 33 illustrates the bootstrapping procedure in a phylogenetic context. History (the true phylogeny) gives us one actual distribution of characters among taxa for any given data set of interest. The ideal way to examine the effects of random error would be to replay the evolutionary tape many times; this would allow us to examine sampling variance in our data directly (see Figure 33A). However, this is not possible due to the singularity of evolutionary history. Instead, bootstrapping allows us to generate a series of pseudosamples (by resampling the unique data set with replacement; Figure 33B), which we can use in place of the actual samples to estimate sampling variance. Typically, the pseudosamples are
508 Chapter 11 / Swofford, Olsen, Waddell & Hillis
[Figure 33, panels A and B; each panel is labeled "Estimate of variance about true phylogeny"]
Figure 33 (A) If phylogenies were repeatable experiments, it would be possible to generate many independent samples of characters for a given gene and taxa of interest. In this case, the sampling variance about the true phylogeny could be calculated directly from estimates based on these independent samples. (B) Because phylogenies are not usually repeatable, it is not possible to draw more than one sample of characters for a given gene and taxa of interest. Therefore, bootstrapping is used to generate pseudosamples from the unique sample, and sampling variance is calculated from estimates based on these pseudosamples.
analyzed individually, and the proportion (P) of the pseudosamples that support a given internal branch on a tree is recorded.

How many pseudosamples must be generated to obtain a precise estimate of P? The sampling variance of P follows the binomial distribution, such that σ² = P(1 - P)/n, where n is the number of pseudosamples (S.B. Hedges, 1992). For instance, if we draw 100 pseudosamples, the sample variance of P ranges from a maximum of 0.0025 (when P is 50%) to a minimum of 0 (when P is 0 or 100%). However, this just tells us how similar the estimate of P is likely to be to what we would obtain if we could analyze an infinite number of pseudosamples. It does not tell us anything about the interpretation of P.

Felsenstein (1985a) originally suggested that P could be used as a measure of repeatability, or the probability that a specified internal branch would be found in an analysis of a new, independent sample of characters (assuming we could replay the evolutionary tape). More recently, Felsenstein and Kishino (1993) have suggested that P can be interpreted as a measure of accuracy, or the probability that the specified branch is contained in the true tree (assuming that the phylogenetic method is consistent).

Hillis and Bull (1993) examined these two interpretations of bootstrap proportions, using both simulated and known experimental phylogenies. They found that bootstrap proportions provide relatively unbiased, but highly imprecise, estimates of repeatability. They also found that bootstrap proportions provide biased estimates of accuracy (a result that was also found analytically by Zharkikh and Li, 1992a,b, for four-taxon trees both with and without a molecular clock). When the phylogenetic method is consistent, bootstrapping gives underestimates of accuracy at high bootstrap values, and overestimates of accuracy at low bootstrap values. The extent of the bias depends (at least) on the number of taxa, the number of characters, and the location of the internal branch in the tree (Hillis and Bull, 1993; Zharkikh and Li, 1995; Li and Zharkikh, 1995).

Two corrections have been proposed to recalibrate bootstrap proportions to account for this bias. Rodrigo (1993) proposed using an iterated bootstrap (Hall and Martin, 1988). This involves bootstrapping each of the pseudosamples obtained in the first round of bootstrapping, and thus is computationally very intensive. Zharkikh and Li (1995) showed that a simpler correction can be obtained with just two rounds of bootstrapping on the original sample (with one set of pseudoreplicates the same size as the original data matrix, and the other set of pseudoreplicates with reduced character matrices). The estimates from the two sets of pseudoreplicates can be combined along with a correction for sample size to produce a corrected estimate of phylogenetic accuracy. The simulations of Zharkikh and Li (1995) indicate that this complete-and-partial bootstrap technique can be effective for reducing the bias of bootstrap proportions, at least if the number of informative characters in the original data set is large (≥100).

As with other methods, for a valid test using bootstrapping the null hypothesis should be specified in advance. Otherwise, we run into a multiple-tests problem similar to the one arising in a posteriori comparison of means following an analysis of variance: inflation of the type I error rate above the nominal level. (The problem may be circumvented to some degree if the researcher interprets the frequency in which a group appears in replicate trees as an index of support rather than as a statistical statement, but this interpretation is far from satisfactory.) If we are interested in testing more than one internal branch or if we are unable to pre-specify the branch(es) of interest, we can adjust the significance level to allow for the fact that we are conducting more than one test (e.g., by dividing the significance level by the number of tests implied). However, if the branches of interest cannot be pre-specified, the number of potential branches is often so large that an almost hopelessly low alpha level would be required in order to maintain an overall type I error rate of, say, 0.05.

Another concern is the assumption that the sequence positions are changing independently of one another. To the extent that this is not true, the pseudosamples will be too large, and the bootstrap values will be higher than they would be otherwise. It is also important to note that the bootstrap can only assume that the data at hand are representative of the underlying distribution and thereby estimate the variation that would be obtained by sampling additional data from that distribution. If the data are not representative or if the reconstruction method makes an inconsistent estimate of the phylogeny, then bootstrapping will not remove this bias.

Bootstrapping and jackknifing can be used either with methods that operate on characters directly or with methods in which character data are first transformed into distances. In character-based methods, weighting vectors corresponding to the number of times each character is sampled can be constructed and input to the analysis. For distance methods, the resampling is conducted prior to calculation of the distance matrix; each replication is then performed using a different input matrix corresponding to the replicate sample. However, an additional source of bias exists with methods that make non-linear transformations of sequence data (including distance corrections). Under these conditions, the bootstrap will (in expectation) overestimate the variance of the corrected data (e.g., Waddell et al., 1994), which leads to conservative tests of significance.

Finally, the bootstrap replicates should be evaluated under an optimality criterion rather than just a tree-building algorithm. Otherwise, any bias of the algorithm will artificially inflate the bootstrap proportions. Imagine, for example, an algorithm that clustered taxa solely on the basis of their input order in the data matrix. Even with no data, such an algorithm would find the same tree for every pseudoreplicate. However, the resulting 100% bootstrap proportions would bear no relation to any measure of phylogenetic accuracy.
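The resampling steps discussed in this section can be sketched in a few lines. This is only an illustration under stated assumptions: tree estimation itself is left out, the jackknife is shown in a random-deletion form (one of several possible ways to drop k characters), and all function names (`bootstrap_weights`, `jackknife_indices`, `bootstrap_proportion`, `proportion_variance`, `adjusted_alpha`) are hypothetical.

```python
import random
from collections import Counter


def bootstrap_weights(n_chars, rng):
    """Draw n_chars character indices with replacement and return, for each
    original character, how many times it was drawn (a weighting vector)."""
    draws = Counter(rng.randrange(n_chars) for _ in range(n_chars))
    return [draws[i] for i in range(n_chars)]


def jackknife_indices(n_chars, k, rng):
    """Drop k randomly chosen characters; return the n_chars - k retained indices."""
    dropped = set(rng.sample(range(n_chars), k))
    return [i for i in range(n_chars) if i not in dropped]


def bootstrap_proportion(branch_found):
    """P: fraction of pseudosamples whose tree contains the branch of interest.
    branch_found holds one True/False value per pseudosample."""
    return sum(1 for found in branch_found if found) / len(branch_found)


def proportion_variance(p, n_pseudosamples):
    """Binomial sampling variance of P: P(1 - P) / n."""
    return p * (1.0 - p) / n_pseudosamples


def adjusted_alpha(alpha, n_tests):
    """Significance level divided by the number of tests implied."""
    return alpha / n_tests


rng = random.Random(1)                   # fixed seed so the sketch is repeatable
weights = bootstrap_weights(10, rng)     # some entries 0, some 1, some 2 or more
kept = jackknife_indices(10, 3, rng)     # 7 of the 10 character indices
max_variance = proportion_variance(0.5, 100)
```

With 100 pseudosamples and P = 0.5, `proportion_variance` reproduces the maximum of 0.0025 quoted in the text. The weighting vector returned by `bootstrap_weights` is the kind of input a character-based method could use in place of an explicitly resampled matrix.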
APPENDIX: PROGRAMS AND SOFTWARE PACKAGES AVAILABLE FOR CONDUCTING PHYLOGENETIC AND POPULATION GENETIC ANALYSES

Some of this information was extracted from a file compiled by J. Felsenstein and distributed as part of the PHYLIP documentation in the file main.doc. That file should be consulted for recent updates on availability and information about new programs.

ABLE (J. Dopazo)
Operating system or source code: DOS
Applications: To implement a form of parametric bootstrapping in conjunction with PHYLIP
Availability: By anonymous ftp from ftp.cnb.uam.es (in directory software/molevol)

CAIC (A. Purvis and A. Rambaut)
Operating system or source code: Macintosh OS
Applications: For comparative analysis of independent contrasts, with partially or fully resolved trees
Availability: By anonymous ftp from evolve.zps.ox.ac.uk (in directory packages/CAIC)

CLADOS (K. Nixon)
Operating system or source code: DOS
Applications: Mapping characters and manipulation of trees
Availability: Contact K. Nixon, L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853 USA

CLINCH (K. Fiala and G. Estabrook)
Operating system or source code: DOS and FORTRAN source code
Applications: Compatibility analysis
Availability: By anonymous ftp from muse.bio.cornell.edu (in directory pub/software/clinch)

ClustalW (D. Higgins, J. Thompson, and T. Gibson)
Operating system or source code: Macintosh OS, DOS, C source code
Applications: Primarily for sequence alignment, but includes the neighbor-joining algorithm and bootstrapping
Availability: By anonymous ftp from ftp.embl-Heidelberg.de (in directory pub/software) or ftp.bio.indiana.edu (in directory molbio/align)

Component (R. Page)
Operating system or source code: Windows
Applications: Tree comparison and consensus methods for coevolutionary and biogeographic analyses
Availability: Contact L. Timpson at emt@nhm.ic.ac.uk or an order form is available on the World Wide Web at http://evolve.zps.ox.ac.uk/Rod/cpw.html

COMPROB (C. Meacham)
Operating system or source code: Pascal source code
Applications: To compute the probability that characters would be compatible in random data
Availability: Contact C. Meacham at [email protected]

DNArates (G. J. Olsen)
Operating system or source code: C source code
Applications: Site-by-site maximum likelihood estimation of the rate of nucleotide substitution from a sequence alignment and a tree
Availability: From the World Wide Web at http://rdpwww.life.uiuc.edu, or by anonymous ftp from rdp.life.uiuc.edu (in directory pub/RDP/programs/fastDNAml)

Evomony (J. A. Lake)
Operating system or source code: DOS
Applications: For Lake's method of invariants (Lake, 1987a)
Availability: Contact J. A. Lake at lake@uclaue.mbl.ucla.edu

FastDNAml (G. J. Olsen)
Operating system or source code: C source code (can be compiled for parallel processing)
Applications: A faster adaptation of DNAml from PHYLIP (version 3.3) for use on workstations, mainframes, or supercomputers (including parallel machines)
Availability: From the World Wide Web at http://rdpwww.life.uiuc.edu, or by anonymous ftp from rdp.life.uiuc.edu (in directory pub/RDP/programs/fastDNAml)
Phylogenetic Inference 511
FREQPARS (D. L. Swofford and S. H. Berlocher)
Operating system or source code: DOS and C source code
Applications: Phylogenetic analysis of frequency data
Availability: By anonymous ftp from onyx.si.edu

GDA (P. O. Lewis and D. Zaykin)
Operating system or source code: Windows
Applications: Hierarchical F statistics, computation of genetic distances, UPGMA, neighbor joining, computation of disequilibrium coefficients and their relevant statistics, and exact tests of joint independence among several loci
Availability: Sinauer Associates, Sunderland, Massachusetts 01375 USA (orders@sinauer.com)

GENEPOP (M. Raymond and F. Rousset)
Operating system or source code: DOS
Applications: Population genetic analyses including tests for Hardy-Weinberg equilibrium, population differentiation, and linkage disequilibrium
Availability: By anonymous ftp from ftp.cefe.cnrs-mop.fr

GENESTRUT (C. C. Constantine, R. P. Hobbs, and A. J. Lymbery)
Operating system or source code: Macintosh OS
Applications: Analyzing population structure from multilocus genotypic data
Availability: By anonymous ftp from csuvax1.murdoch.edu.au (in directory pub/vet)

Hadtree, Prepare, and Trees (D. Penny)
Operating system or source code: DOS
Applications: Hadamard transformations (conjugations and the distance Hadamard), character weighting, distance transformations (including LogDet), base composition tests, resampling schemes, and tree selection
Availability: Contact V. Spagnolo at v.spagnolo@massey.ac.nz

Hennig86 (J. S. Farris)
Operating system or source code: DOS
Applications: Fast searching for trees under the parsimony criterion, using heuristic and exact methods. Three programs by the same author extend the capabilities of Hennig86 to include measures of congruence between data sets (ARN), bootstrapping, T-PTP tests, and support tests (RNA), and jackknifing of large data sets to find strongly versus poorly supported parts of trees (JAC)
Availability: Contact A. Kluge, Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1079 USA (arnold.g.kluge@um.cc.umich.edu) or D. Lipscomb, Department of Biological Sciences, George Washington University, Washington, D.C. 20052 (biodl@gwuvm.gwu.edu)

MacClade (W. P. Maddison and D. R. Maddison)
Operating system or source code: Macintosh OS
Applications: For interactive manipulation of trees and studying character evolution under the parsimony criterion (including generalized parsimony). Numerous features for evaluating, summarizing, and simple simulation of characters and trees. Also useful for high quality graphical output of trees
Availability: Sinauer Associates, Sunderland, Massachusetts 01375 USA ([email protected])

MacT (A. Luettke and R. Fuchs)
Operating system or source code: Macintosh OS, QuickBasic source code
Applications: For computing pairwise distances and calculating neighbor-joining trees
Availability: By anonymous ftp from ftp.bio.indiana.edu (in directory molbio/mac)
MALIGN (W. C. Wheeler and D. Gladstein)
Operating system or source code: Macintosh OS, DOS, Unix, and C source code
Applications: Simultaneous alignment of multiple sequences and construction of parsimony trees. Code for implementation on parallel architectures is available
Availability: Contact W. C. Wheeler (Department of Invertebrates, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024-5192, USA) or source code is available by anonymous ftp from ftp.amnh.org

MARKOV (G. Pesole and C. Saccone)
Operating system or source code: FORTRAN source code
Applications: To compute distance measures and substitution matrices under a stationary Markov model of DNA substitution. Bootstrapping is included to assess the reliability of the results
Availability: Contact C. Lanave at lar;[email protected],it

MEGA (S. Kumar, K. Tamura, and M. Nei)
Operating system or source code: DOS
Applications: Calculation of nucleotide and protein pairwise distances, and calculation of trees using the neighbor-joining and UPGMA algorithms. Also searching capabilities under the parsimony criterion using stepwise addition, local branch-swapping, or branch-and-bound algorithms. Includes bootstrapping and tests for comparing the length of two additive trees
Availability: Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park, Pennsylvania 16802 USA ([email protected])

METREE
Operating system or source code: DOS
Applications: Search for trees under minimum evolution criterion; with standard errors and significance tests
Availability: Contact M. Nei (same address as MEGA)

Molevol (W. Fitch)
Operating system or source code: DOS, FORTRAN source code
Applications: A package of about 20 programs for estimating parsimony and distance trees, dynamic weighting, alignment, searching for secondary structure, and other analyses of molecular data
Availability: Contact W. Fitch at wfitch@daedalus.bio.uci.edu

MOLPHY (J. Adachi and M. Hasegawa)
Operating system or source code: C source code
Applications: A package of programs for maximum likelihood analyses with either nucleotide (NUCML) or protein (PROTML) sequences, basic statistics of nucleotide (NUCST) and protein (PROTST) sequences, and neighbor-joining analysis (NJDIST)
Availability: By anonymous ftp from sunmh.ism.ac.jp

MUST and 3s (H. Philippe)
Operating system or source code: DOS
Applications: Sequence management, analysis of taxon sampling effects, and estimation of appropriate sequence lengths for a given analysis
Availability: Contact H. Philippe at hp@bio4.bc4.u-psud.fr

NONA (P. Goloboff)
Operating system or source code: DOS
Applications: For parsimony analyses using Hennig86 data file format but with no limit on the number of taxa and characters
Availability: Contact P. Goloboff at Department of Entomology, American Museum of Natural History, Central Park West at 79th St., New York, New York 10024
ODEN (Y. Ina)
Operating system or source code: C source code
Applications: For distance matrix analyses on nucleotide or protein sequences
Availability: By anonymous ftp from bioslave.uio.no (in directory pub/oden)

PAML (Z. Yang)
Operating system or source code: C source code
Applications: A package mostly for maximum likelihood analyses with either nucleotide or protein sequences. Includes programs for reconstruction of ancestral sequences and conducting analyses of multiple genes (baseml, codonml) and simulating trees (mcml) under maximum likelihood. Also includes a parsimony program (pamp) for estimating substitution matrices, intersite variability of rates of evolution, and ancestral states
Availability: By anonymous ftp from ftp.bio.indiana.edu (in directory molbio/evolve)

PARBOOT (P. Roux and T. Littlejohn)
Operating system or source code: C source code
Applications: For parallel processing of bootstrapped data sets in conjunction with PHYLIP
Availability: By anonymous ftp from megasun.bch.umontreal.ca

PAUP* (D. L. Swofford)
Operating system or source code: Macintosh OS, DOS, Unix, VAX/VMS
Applications: For finding and evaluating trees under the minimum evolution, DNA maximum likelihood, and parsimony (including generalized parsimony) criteria. Includes branch swapping, branch-and-bound, and exhaustive searches. Reliability of trees may be assessed with permutation tests, decay/support indices, bootstrapping, invariant tests, or maximum likelihood scores. Includes extensive pairwise distance calculations, consensus techniques, and reconstruction of ancestral states using parsimony and likelihood methods
Availability: Sinauer Associates, Sunderland, Massachusetts 01375 USA (orders@sinauer.com)

Pee-Wee (P. Goloboff)
Operating system or source code: DOS
Applications: For parsimony analyses using character weights determined by their homoplasy during tree search
Availability: Contact P. Goloboff at Department of Entomology, American Museum of Natural History, Central Park West at 79th St., New York, New York 10024

PHYLIP (J. Felsenstein)
Operating system or source code: DOS, Windows, Macintosh OS, C source code
Applications: A package of 30 programs, including parsimony, methods of invariants, maximum likelihood (for nucleotide, protein, and restriction site data), distance methods, and compatibility analysis. Searching by stepwise addition, branch swapping, and the branch-and-bound algorithm for some methods. Includes bootstrapping, tree drawing, assessment of independent contrasts, various statistical tests of trees, and consensus analysis
Availability: By anonymous ftp from evolution.genetics.washington.edu (in directory pub/phylip) or from the World Wide Web site (http://evolution.genetics.washington.edu/phylip.html)
Random Cladistics (Mark Siddall)
Operating system or source code: DOS
Applications: For conducting permutation tests, bootstrapping, or jackknifing in conjunction with Hennig86
Availability: By anonymous ftp from zoo.utoronto.ca/pub (random.doc and random.exe)

RAPDistance (J. S. Armstrong, A. J. Gibbs, R. Peakall, and G. Weiller)
Operating system or source code: DOS, Windows
Applications: For computing distance matrices in RAPD analyses
Availability: By anonymous ftp from life.anu.edu.au (in directory pub/RAPDistance)

REAP (D. McElroy, P. Moran, E. Bermingham, and I. Kornfield)
Operating system or source code: DOS
Applications: Estimation of sequence divergences, nucleotide and restriction site diversity; tests for heterogeneity of allele frequencies using randomization methods
Availability: Contact D. McElroy ([email protected])

Relatedness (K. F. Goodnight)
Operating system or source code: Macintosh OS
Applications: For calculation of relatedness statistics from allele frequencies
Availability: Contact K. F. Goodnight, Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas 77252 (keithg@whittaker.rice.edu)

RESTSITE (J. C. Miller)
Operating system or source code: DOS
Applications: Manipulation of restriction site data and estimation of sequence divergences; neighbor joining
Availability: Contact J. C. Miller, Whitehead Institute, 9 Cambridge Center, Cambridge, Massachusetts 02142

RSVP (K. Rice)
Operating system or source code: C source code
Applications: For calculating distance matrices and measures of variability from restriction map data
Availability: By anonymous ftp from oeb.harvard.edu (in directory rice)

The Siminator (J. Huelsenbeck)
Operating system or source code: C source code
Applications: For simulation of data under several models of nucleotide substitution for use in parametric bootstrap analyses
Availability: By anonymous ftp from onyx.si.edu or from the World Wide Web at http://mws7.biol.berkeley.edu/john/john.html

Splits (R. Wetzel and D. Huson)
Operating system or source code: Macintosh OS
Applications: For conducting split decomposition analyses
Availability: Contact D. Huson at huson@mathematik.uni-bielefeld.de

TreeAlign (J. Hein)
Operating system or source code: C source code
Applications: Simultaneous construction of trees (with approximate parsimony or distance methods) and alignment of multiple sequences
Availability: By anonymous ftp from ftp.bio.indiana.edu (in directory molbio/align)

TREECON (Y. van de Peer)
Operating system or source code: DOS, Windows
Applications: For distance methods with molecular data sets. Includes bootstrapping and tree drawing capabilities
Availability: By anonymous ftp from uiam3.uia.ac.be

VOSTORG (A. Zharkikh and A. Rzhetsky)
Operating system or source code: DOS
Applications: Alignment of sequences and calculation of parsimony and distance trees. Other programs available at this address (by A. Zharkikh, in the directory zharkikh/bootstrap/double-bootstrap) conduct full-and-partial bootstrap analyses
Availability: By anonymous ftp from lxgc6.sph.uth.tmc.edu

WINAMOVA (L. Excoffier)
Operating system or source code: DOS, Windows
Applications: Analysis of genetic structure of populations using an analysis of variance approach
Availability: By anonymous ftp from acasun1.unige.ch (in directory pub/amova)
Applications of Molecular Systematics:
The State of the Field and a Look to the Future
David M. Hillis, Barbara K. Mable, and Craig Moritz
INTRODUCTION
From the preceding chapters, it should be clear that the diversity of molecular techniques available to systematists is considerable and that the problems that can be addressed with these techniques span an enormous range, from relationships among genes within populations to the phylogeny of life. The rapid development and power of these techniques has produced a euphoria in evolutionary biology; because so many new problems can be addressed, it is a commonly held misconception that all evolutionary problems are solvable with molecular data. This is clearly not the case. Worse, inappropriate techniques are often applied (at a considerable waste of time and expense) to particular problems that could be effectively addressed with alternative techniques. In other cases, the technique chosen may not be the most cost-effective choice. Therefore, we provide some guidelines in this chapter to aid in matching techniques to problems.

In the first edition of this book, we concluded that the field of molecular systematics was in its early stages, with much unexplored potential. The vast increase in volume of the bibliography (which provides only the briefest overview of the available literature) demonstrates that the use of molecular techniques in evolutionary biology has increased dramatically in the last five years. Sanderson et al. (1993) conducted an extensive survey of phylogenetic analyses and concluded that the years from 1989-1991 were characterized by a rapid accumulation of phylogenetic data, encompassing a broad scope of topics and applications. Although almost half of the studies they assessed were based on morphological data, the use of molecular data (sequence data, followed by restriction site and allozyme data, and then DNA-DNA hybridization data) saw a proportionately higher increase. Among the journals publishing phylogenetic information most often, Journal of Molecular Evolution and Molecular Biology and Evolution were the top two, emphasizing the critical role that phylogenetic inference is playing in studies of molecular evolution (Sanderson et al., 1993; see also Chapter 1). The years since 1991 have seen an even more pronounced increase in the diversity of studies using molecular systematic techniques. In discussing the issue of choosing an appropriate molecular technique for addressing a given problem, we mention some of the many applications of molecular systematics. This is by no means an exhaustive summary and should be regarded as a composite illustration rather than a complete picture of recent advances (for additional examples, see Chapters 2, 4-9). For a more thorough review of recent applications, a good starting place is Molecular Markers, Natural History and Evolution (Avise, 1994).

In addition to choosing an appropriate molecular technique, selection of method of analysis is an equally important decision. Recent advances have resulted in improved methods for analyzing molecular data, but there are many areas that remain controversial and probably will be subject to further attention in the near future. Therefore, we include a brief overview of areas of current development in methods of analysis, and discuss where we expect additional advances will occur.

CHOOSING A TECHNIQUE FOR A PARTICULAR PROBLEM

Advances in technology often promote various kinds of data chauvinism. When isozyme electrophoresis and microcomplement fixation began to be applied widely to systematic problems in the 1960s and early 1970s, the new biochemical data were immediately promoted by some as "better" than traditional morphological data. The development of DNA hybridization techniques and restriction fragment analysis was accompanied by new assertions of superiority, and individuals who worked with isozyme electrophoresis were chastised (in review of grants and publications, for instance) for being "old-fashioned." Most recently, some proponents of sequencing have suggested that all other techniques are superfluous and outdated (e.g., Wilson et al., 1989). We disagree strongly with these assertions; certain techniques are better than others for answering particular problems, but no technique is best under all circumstances. As Avise (1994:xii-xiii) pointed out, sometimes timing is everything:

Imagine for the sake of argument that DNA sequencing methods had been widely employed for the past 30 years and that only recently had protein-electrophoretic approaches been introduced. No doubt a headlong rush into allozyme techniques would ensue, on justifiable rationales that (a) the methods are cost-effective and technically simple, (b) the variants revealed reflect independent Mendelian polymorphisms at several loci scattered around the genome (rather than as linked polymorphisms in a single stretch of DNA), and (c) the amino acid replacement substitutions uncovered by protein electrophoresis (as opposed to the silent base changes often revealed in DNA assays) might bring molecular evolutionists closer to the real "stuff" of adaptive evolution. To carry the argument farther, suppose that molecular genetic methods had been employed throughout the last century but that an entrepreneurial scientist finally ventured into the world of nature and discovered organismal phenotypes and behaviors. Finally, the interface of gene products with the environment would have been revealed! Imagine the sense of excitement and research prospects!

The point is that all approaches provide interesting and important insights into biodiversity and evolution, and it makes little sense to think of one technique as being inherently superior to another. Rather than promoting the latest technique as a panacea, it is worthwhile to consider which technique(s) are best suited for a particular problem. Morphological data are clearly superior to molec-
Applications of Molecular Systematics 517
ular data under certain conditions (e.g., for stud- should be used in combination. Table 1 lists some
ies of long-extinct species, or for looking at some of the common applications of molecular tech-
interactions with and adaptations to the environ- niques in systematics. We roughly classify each
ment), just as the reverse is true under other con- technique into one of four categories for each of
ditions (Chapter 1; Hillis, 1987).Only by combin- the problems listed: the technique is (1) inappro-
ing data from various morphological, behavioral, priate for the problem; (2) appropriate under lim-
physiological, and molecular techniques is it pos- ited conditions; (3) appropriate but not usually
sible to obtain a comprehensive view of evolution. cost-effective;or (4) appropriate under most con-
We think that the recent trend toward technique ditions. By inappropriate, we mean that consider-
overspecialization in graduate studies is harmful able time, money, and effort can be wasted by at-
to the field of evolutionary biology (not to men- tempting to answer the given problem using a
tion the rest of biology). Of course students particular technique, with little likely fruition. As-
should know how to sequence a gene if that is rel- suming that there are no technical barriers, the
evant to their research-but that shouldn't pre- most common reason for such a failure is that
elude them from examining proteins, chromo- there is too little or too much variation to address
somes, behavior, or morphology. Any lab that is the question of interest. A technique is listed in the
limited to only one technique is going to be re- second category (appropriate under limited con-
stricted to a relatively narrow set of evolutionary ditions) for a particular problem if success has
questions and problems. been obtained under some conditions (when lev-
Given the above caveats, we will now address els of variability are appropriate), but alternative
the issue of choosing a molecular technique to ad- techniques are more likely to yield more robust
dress a particular problem. In doing so, we will results for equal or less effort. The third category
try to emphasize that, in many cases, techniques (appropriate but not cost-effective) indicates that
Table 1. Applications of various molecular techniques to problems in systematics

                                                      DNA            Restriction  Fragment     DNA/RNA
Problem                    Isozymes  Cytogenetics     hybridization  analysis     analysis(a)  sequencing
Gene evolution             M         M                -              M            -, M, -      +
Population subdivision     +         M                -              +            M, +, -      +
Mating systems             +         M                -              M            M, +, -      $
Clonal detection           +         M                -              +            +, +, +      $
Heterozygosity             +         -                -              +            -, +, M      M
Paternity testing          M         -                -              M            M, +, +      $
Individual relatedness     M         -                -              M            M, +, M      $
Geographic variation       +         M                -              +            M, +, M      +
Hybridization              +         +                -              +            +, M, -      $
Species boundaries         +         +                -              +            +, M, -      +
Phylogeny (0-5 mya)        +         M                M              +            -, M, -      +
Phylogeny (5-50 mya)       +         M                +              +            -, -, -      +
Phylogeny (50-500 mya)     M         M                M              M            -, -, -      +
Phylogeny (500-3500 mya)   -         -                -              -            -, -, -      +

Key: -, inappropriate use of technique; M, marginally appropriate or appropriate under limited circumstances; $, appropriate use of technique, but unlikely to be cost-effective; +, appropriate and effective method.
(a) Fragment analysis includes recommendations for RAPDs, single locus mini- or microsatellites, and multilocus DNA fingerprinting, in that order.
the given technique may be used to address the problem, but that other techniques will probably be just as effective for much less effort and/or money. In other words, except under unusual circumstances, a technique is only recommended for a particular problem if it falls in the fourth category (appropriate under most conditions). One final caveat: the times of divergence given in Table 1 are very rough. Because rates of molecular divergence can be very different among lineages and among molecules (see below), and because the limitations of some methods are still to be explored, the times should be used only as a first approximation.

Many studies of gene evolution require DNA sequencing, because no other technique provides the necessary information to infer relationships among individual alleles. However, studies of functional gene duplications and linkage studies can be conducted very efficiently with isozyme techniques (e.g., Buth, 1983; Morizot and Siciliano, 1984; Morizot, 1990). Restriction site and fragment studies can be very useful to screen many individuals or tandemly repeated loci to study processes such as biased gene conversion or unequal crossing over (Seperack et al., 1988). Molecular cytogenetic analyses also can be used in combination with these techniques to study the distribution of genes across the nuclear genome (e.g., Wichman et al., 1985, 1991; Baker and Wichman, 1990; Hillis et al., 1991).

Isozyme electrophoresis, restriction site analysis, and fragment analysis (e.g., DNA fingerprinting, microsatellites, RAPDs) are applicable to a large number of population-level problems (see Chapters 4 and 7-8). DNA sequencing also is applicable at this level, but most studies of population genetics require examination of large numbers of individuals over large numbers of loci (see Chapters 2 and 10). Although it has become easier to obtain sequences from many individuals for certain loci (particularly the mitochondrial genome) by amplifying the DNA (see Chapter 7), it is still inordinately expensive, time-consuming, and difficult to obtain sequence information from multiple Mendelian nuclear loci among numerous individuals. However, sequencing and fragment analyses can be combined with excellent results: useful markers can be identified through DNA sequencing and then appropriate rapid screening techniques (fragment analyses or isozyme electrophoresis) can be designed to examine this variation across many individuals and loci (see Chapters 8 and 9). Therefore, studies that combine approaches such as DNA sequencing and isozyme or fragment analyses (e.g., R.J. Baker et al., 1989; Bradley et al., 1993; R.S. Thorpe et al., 1994) can maximize effectiveness by combining high resolution with broad coverage of individuals and/or loci.

For studies of mating systems, population structure, and heterozygosity, isozyme electrophoresis remains one of the best techniques available. These studies usually require information from many individuals at many loci, and are suited perfectly to the kind of data provided by isozyme electrophoresis, although microsatellites are being used increasingly for this purpose. Cytogenetic analysis, particularly of meiotic configurations, can reveal significant changes in the genetic system (e.g., clonal inheritance, polyploidy, interchange heterozygosity) that are important in their own right and also affect interpretation of other types of genetic markers. Studies of individual relatedness require analysis of variation at large numbers of loci as well, and under certain conditions, isozymes may provide this information. The various methods that access variation in mini- and microsatellite loci (see Chapter 8) are perhaps the most powerful for inferring individual relatedness, but in most cases such studies should be restricted to inferences about close relatives (Lynch, 1988). DNA fingerprinting techniques that employ gene amplification (as well as DNA sequencing studies) can use non-destructive sampling of tiny tissue samples, an especially important point in the field of conservation genetics, where collecting whole specimens may endanger the study populations (e.g., Garza and Woodruff, 1992; Morin et al., 1994; A.C. Taylor et al., 1994). Theoretically, DNA sequencing can be used with high precision to examine individual relatedness, but only if many loci are examined from each individual.

Geographic variation within species, detection of clonal diversity, the origin of unisexual
species, hybridization, and discovery of cryptic species are all effectively studied with isozyme electrophoresis, cytogenetics, restriction site analysis, and some form of DNA fragment studies, or with a combination of these approaches (e.g., D.D. Shaw et al., 1990; Scribner et al., 1994). Analyses of cpDNA and mtDNA, which are maternally inherited in most species, can be combined with studies of nuclear loci (e.g., allozymes) to provide information on both the degree and biases in direction of hybridization. The two kinds of data also can be combined (often with cytogenetic data as well) to determine not only the species involved in initial hybridization events that gave rise to unisexual species, but also the sexes of each species involved in the hybridization event(s) (e.g., J.W. Wright et al., 1983; Avise et al., 1991; Moritz and Heideman, 1993; Radtke et al., 1995).

The detection of morphologically cryptic species is often accidental; with any molecular technique, one should be open to the possibility that previous perceptions of species boundaries may have been wrong. In many cases, systematists choose to investigate a suspicious "polymorphic" taxon; allozyme electrophoresis is used commonly in these cases. However, many other cryptic species have been discovered accidentally. In addition to examples from studies of isozymes, cryptic species also have been discovered by cytogenetic techniques (e.g., Moritz, 1983) and by immunological techniques (e.g., Scanlan et al., 1980). In the former case, a morphologically variable nominal species of gecko, Heteronotia binoei, was shown to consist of several cryptic bisexual species and numerous parthenogenetic lineages of hybrid origin. These conclusions were supported subsequently by analysis of isozymes (Moritz et al., 1989a,b). In the immunological example, Scanlan et al. (1980) showed that some individuals of the nominal frog species Gastrotheca riobambae appeared to be more closely related to other species than to other individuals of G. riobambae. This led Duellman and Hillis (1987) to examine this group with allozyme electrophoresis, which extended the findings from immunology to suggest that six species in two different species groups had been confused under the name G. riobambae. After the species boundaries became clearer, diagnostic morphological traits were found for each of the species. As in this case, information on species boundaries from molecular data is often invaluable for separating intraspecific morphological polymorphisms from diagnostic characters.

Perhaps the most common application of molecular techniques in systematics is to estimate phylogeny. All of the techniques discussed in this book have been applied successfully to questions of phylogeny, although the appropriate techniques will vary from study to study. In order for a technique to be useful for reconstructing phylogeny, enough variation must exist among the species examined for application of phylogenetic reasoning, but not so much that the characters under study are saturated by change. To a first-order approximation, useful ranges of divergence can be predicted for each major technique except for cytogenetics, where change is not strongly correlated with time (Table 1). However, these ranges are very rough; some groups show much less variation for certain characters, and application of a given technique may be extended further back into time for such groups (see the section "Predictions of Time from Molecular Data," below). For example, mtDNA and many commonly examined isozyme loci can be used to study relationships among higher taxonomic levels of birds than is possible within most other groups of vertebrates (Kessler and Avise, 1985b). In groups that have never been studied, some experimentation may be required to find a technique suitable for a particular phylogenetic problem (see Chapter 2). For most studies, however, Table 1 will provide a guide to selection of an appropriate technique, at least for a pilot study.

In addition to a need for rapidly evolving sequences (see Table 1), tracking relationships of individuals within populations often requires methods that allow for reticulation of lineages (see "Trees versus Networks," below). However, there are several cases in which methods that assume a bifurcating tree are appropriate for looking at intraspecific phylogenies. For instance, phylogenies of organellar genomes usually can be assumed to be largely free of reticulations. Studies of intraspecific maternal phylogenies of mitochondrial
DNA have become commonplace for some species, such as human populations (e.g., Vigilant et al., 1991; D.R. Maddison et al., 1992). Also, relationships within asexual species (or species in which recombination is rare) can be examined using methods that build bifurcating trees. Phylogenetic studies also have become increasingly important in epidemiological and evolutionary studies of viruses. These studies are possible because viruses often evolve rapidly enough to produce sufficient variation for phylogenetic analysis over the course of just a few years or decades (e.g., R.F. Doolittle et al., 1990; Fitch et al., 1991; R.A. Olmstead et al., 1992; Nichol et al., 1993a; Eickbush, 1994; Korber et al., 1994; Crandall, 1995a,b). In one well-publicized case (Ou et al., 1992), phylogenetic analyses were used to identify a series of patients infected with HIV in a dental practice. Phylogenetic analyses also have been used to identify viruses associated with outbreaks of previously unidentified diseases, in some cases even before the viruses have actually been isolated (Nichol et al., 1993b).

Closely related species (diverged within the past 5 million or so years) are best studied by examining relatively fast-evolving isozyme loci (see Chapter 4), nuclear spacer regions and introns or, in animals, the mitochondrial genome (see Chapters 7-9). Other techniques have, on occasion, proved useful in this range, but in the majority of cases are not sensitive enough to detect sufficient changes over such a short time scale. There has been some recent effort to find intron regions of protein-coding nuclear genes that could be useful to study divergence of closely related species and of populations within species (Lessa, 1992; Slade et al., 1993; Palumbi and Baker, 1994). In some cases, substantial variation and geographic structure have been observed (e.g., Palumbi and Baker, 1994; Burton and Lee, 1994). However, a 729-bp intron of the human Y chromosome showed no variation among 38 human males (Dorit et al., 1995). The few studies that have compared intron sequences between species also have found surprisingly little variation between closely related taxa (e.g., Slade et al., 1994); in myobatrachine frogs there tends to be little variation within genera but major differences between genera (B. Mable, personal observation). Problems associated with small numbers of nucleotide changes can be overcome by using multiple nuclear loci (see Slade et al., 1994), but at present this is a labor-intensive undertaking because each gene and each taxon may require specific pilot studies to resolve problems with amplification of pseudogenes and multiple gene copies (Chapters 7-8). Intron size can vary widely between taxonomic groups, and primers that work well in one group may not be useful in another (e.g., Slade et al., 1994). The internal transcribed spacer (ITS) regions of the ribosomal RNA gene array may prove more useful as a rapidly evolving nuclear gene target for fine-scale comparisons, although complications can arise if there is extensive variation among copies within individuals. Some studies (e.g., Gonzales et al., 1990; Pleyte et al., 1992) have used the ITS regions for comparisons of closely related species, but rates of nucleotide change vary widely across the region, the size of the region varies widely among taxa (e.g., Gonzales et al., 1990), and there is virtually no obvious sequence homology between more distantly related taxonomic groups (e.g., Pleyte et al., 1992). The non-transcribed spacer regions of ribosomal DNA have not been examined as thoroughly, but tend to be too variable to provide meaningful phylogenetic signal. Even with these rapidly evolving sequences, it remains impractical to resolve very recent (e.g., post-Pleistocene) divergences because it is difficult to sort uniquely derived character states from random fixation of ancestral polymorphisms (E.N. Arnold, 1981; Neigel and Avise, 1986).

The most common timeframe of divergence studied by systematists (roughly 5-50 million years) is within the range of study of most of the techniques discussed in this book. Further back into time (50-500 million years: Table 1) most of the techniques are relatively ineffective, except for sequencing of relatively conserved genes (Chapter 9) and perhaps comparing changes in organization of organelle genomes (Chapter 8). Beyond 500 million years divergence, only sequencing the most conserved genes has been effective for phylogeny reconstruction. In this range, adequate resolution of closely spaced divergence points becomes highly unlikely using any technique.

If several techniques are appropriate for addressing a particular problem, cost and the availability of technology often are paramount. Costs
for laboratory set-up and operating expenses vary considerably, but in general isozyme electrophoresis and cytogenetics are the least expensive techniques per specimen examined, whereas DNA hybridization and restriction analysis are several times as expensive, and nucleic acid sequencing is the most costly approach. However, this doesn't mean that a given problem will always be answered with less money by the less expensive techniques, because considerable money (and time) can be wasted by trying to apply an inappropriate technique to a particular question. All heritable information is potentially accessible to DNA sequencing, whereas only subsets of this information are accessible to the other techniques. Often, the choice of technique will depend upon the resolution required to address the question of interest.
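
As a rough illustration of the reasoning in this section, the sketch below first keeps only the techniques that Table 1 rates "+" for a problem and then orders them by relative cost. The ratings are transcribed from two rows of Table 1. The numeric cost tiers are an assumption that encodes only the ordering described above (isozymes and cytogenetics cheapest, DNA hybridization and restriction analysis more expensive, sequencing most costly), and fragment analysis is left out because its cost is not ranked in that passage. The names RATINGS, COST_TIER, and candidate_techniques are hypothetical.

```python
# Ratings transcribed from two rows of Table 1 (fragment analysis omitted).
RATINGS = {
    "Population subdivision": {
        "isozymes": "+", "cytogenetics": "M", "DNA hybridization": "-",
        "restriction analysis": "+", "DNA/RNA sequencing": "+",
    },
    "Hybridization": {
        "isozymes": "+", "cytogenetics": "+", "DNA hybridization": "-",
        "restriction analysis": "+", "DNA/RNA sequencing": "$",
    },
}

# Assumed ordinal tiers (1 = least expensive per specimen).
# Only the ordering is taken from the text; the numbers themselves mean nothing.
COST_TIER = {
    "isozymes": 1,
    "cytogenetics": 1,
    "DNA hybridization": 2,
    "restriction analysis": 2,
    "DNA/RNA sequencing": 3,
}


def candidate_techniques(problem):
    """Return techniques rated '+' for the problem, cheapest tier first."""
    ratings = RATINGS[problem]
    suitable = [tech for tech, rating in ratings.items() if rating == "+"]
    # Ties within a cost tier are broken alphabetically for a stable order.
    return sorted(suitable, key=lambda tech: (COST_TIER[tech], tech))


for problem in RATINGS:
    print(problem, "->", candidate_techniques(problem))
```

Filtering on appropriateness before sorting on cost mirrors the warning above: a cheap technique rated "-" or "M" never reaches the cost comparison, so it cannot win on price alone.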
For many problems, it will be useful to use more than one approach. For example, simultaneous examination of chromosomes, allozymes or microsatellites, and mtDNA to investigate population structure, clonal diversity, or hybridization phenomena can provide qualitatively more information than would be obtained from the use of any one approach. For phylogenetic studies, it is useful to study several sequences that evolve at different rates to resolve different parts of the phylogeny. An investigator may compare allozymes to identify groups and to obtain some phylogenetic information within and between groups; rapidly evolving sequences (e.g., animal mtDNA) to resolve relationships within groups; and slowly evolving sequences (e.g., rDNA) to resolve among-group relationships or to root the tree by comparison to outgroup taxa.

DATA ANALYSIS: ISSUES AND CONTROVERSIES

Trees versus Networks

Most phylogenetic methods produce trees, which in the language of graph theory are restricted to graphs without cycles (cycles are commonly called reticulations by systematists). Trees can be either rooted or unrooted (the latter are undirected with respect to time). Unrooted trees are sometimes mistakenly called networks by systematists, but the term network actually refers to graphs with cycles (see Figure 1). Of course, some biological phenomena can only be represented by networks. Examples include recombination events between genes, hybridization events between lineages, and processes of horizontal gene transfer such as retrotransposition. In situations where such phenomena are likely to occur (e.g., in many intraspecific studies, or in groups where hybridization is common), methods that build networks rather than trees should be used.

Figure 1 (A) Unrooted tree (not directed with respect to time, but contains no cycles). (B) Rooted tree showing same branching relationships as in (A), but rooted along the branch leading to F. This implies a direction of time, from the root toward the tips. The branch lengths in this example are arbitrary. (C) An unrooted network with one cycle. If this network were rooted along lineage F, the cycle might be interpreted as a recombination or hybridization event between the lineages leading to A and C, which gave rise to lineage B. The graph could also be interpreted as ambiguous placement of A, B, C.

The principal problem with building networks is to detect recombination events (or other reticulations). One of the simplest (yet often effective) procedures for detecting reticulations that result from hybridization events involves producing a tree, then searching for branches with excessive homoplasy (Buth, 1984a; Funk, 1985). More recently, Hein (1990, 1993) developed an explicit algorithm for detecting recombination. Other methods (e.g., Bandelt and Dress, 1992; see Chapter 11) examine support for alternative solutions and present the results as a network that represents the potential ambiguity. Templeton et al. (1992) developed a method that uses Hein's algorithm to detect recombination events, and then represents parsimonious and near-parsimonious solutions in a network. This latter method is most effective when the average number of changes among haplotypes is small, a situation in which most other methods are least effective (Crandall, 1994).

To date, methods that assume a tree are used much more commonly than methods that assume a network. However, under certain circumstances, an assumption of a network is much more realistic (Crandall et al., 1994; Crandall, 1995a,b; Crandall and Templeton, 1996). Given the recent interest in intraspecific applications of gene evolution, further development and more widespread use of network methods is expected.

Combined versus Separate Analyses of Multiple Data Sets

In any field of science, a question arises whenever multiple studies have been conducted to address the same problem: If the results of the individual studies differ, what is the best way to reach a general conclusion? The general term for such a combined study is a meta-analysis, which was originally defined as a "statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" (Glass, 1976). However, the term is often used in a more restrictive sense to describe a particular method of analyzing multiple studies (Hedges and Olkin, 1985; Olkin, 1990; Mann, 1990; Dickersin and Berlin, 1992). The typical meta-analysis (restrictive sense) consists of a weighted, combined analysis of all the data from across studies; the result of this combined analysis is then compared (statistically) to the results of the separate analyses of the individual studies. It is not unusual to find that the mean estimate from the combined studies falls within the confidence limits of the estimates of each of the individual studies, even though the point estimates of these individual studies differ. In this case, the differences among the studies may be ascribed to stochastic variation, and the grand mean can be accepted as the best estimate of the parameter in question.

In a phylogenetic context, the multiple data sets may represent different genes, different kinds of data (e.g., sequence data and morphological data), or even different process classes within a single gene (e.g., first, second, and third positions of codons). Debates about how (or whether) data from these multiple data sets should be combined in phylogenetic analyses have paralleled the debates about meta-analyses in general (see Hillis, 1995). Hillis (1987) suggested that the best estimate of phylogeny is derived from a combined analysis of all relevant data, but that congruence among independent data sets provides convincing evidence that the underlying phylogeny is being correctly estimated. This position is consistent with the basic idea of a meta-analysis. Kluge (1989) argued that the relevant data sets should always be combined for analysis, but that the combined analysis makes the individual results irrelevant, an approach he called "total evidence." Kluge argued that a combined analysis maximizes the explanatory power of all the data, whether or not the individual results are consistent with the combined result. This is equivalent to the first part of a standard meta-analysis. Miyamoto and Fitch (1995) took the opposite position, and argued that the individual data sets (or process partitions) always should be analyzed separately, because the separate analyses are likely to provide insights into the individual data sets, and different results could indicate violation of underlying assumptions in one or more of the analyses. This is equivalent to the second part of a standard meta-analysis. In practice, most systematists do both separate and combined analyses (Hillis, 1987; Olmstead and Sweere, 1994), although the final step (asking whether the combined result is within the confidence limits of the individual studies) is rarely attempted. The procedures for
establishing whether or not a given result is within the confidence set of trees for a given analysis are still under development (see Sanderson, 1989; Swofford, 1991; Rodrigo et al., 1993; de Queiroz, 1993; Lanyon, 1993; Hillis, 1995). However, examples of this approach are beginning to appear (e.g., Omland, 1994), and phylogenetic meta-analyses of multiple data sets will likely become more common in the near future.

A combined phylogenetic analysis makes two important assumptions: first, that the same underlying tree is being reconstructed in each of the studies; and second, that the chosen method of analysis is appropriate for each of the individual data sets. If a test for homogeneity among data sets fails, this is an indication that one or both of these assumptions has been violated (Bull et al., 1993b; de Queiroz, 1993). A violation of the first assumption can occur because individual gene trees may differ from the species tree that contains them (because of lineage sorting or non-orthology). A violation of the second assumption may occur because a given method may be inconsistent (or otherwise biased) for one or more of the data sets, or because the different data sets are evolving in response to different processes (e.g., rates of substitution may differ dramatically). Under these conditions, the combined analysis may be less informative or even misleading compared to one or more of the individual analyses (see Bull et al., 1993b for some examples). Modifying the method of combined analysis (e.g., by differential weighting; Hillis, 1987; Chippindale and Wiens, 1994; or by making the model underlying the analysis more generally applicable) will solve the problem in some cases. However, in other cases it is possible to show that a given data set is uninformative at best and misleading at worst, no matter how it is analyzed (see Huelsenbeck et al., 1995 for an example). In such cases, the combined analysis clearly should exclude the problem data set.

Hypothesis Testing and the Parametric Bootstrap

Most methods for testing the reliability of phylogenetic results concern testing the reliability of the data as a whole (is there information content in the data set, or just random noise?) or attempt to assign some measure of reliability to each of the internal branches in a tree (see Chapter 11). Such approaches are designed with hypothesis-generating (rather than hypothesis-testing) studies in mind (see also Chapter 1). A tree is often reconstructed with no a priori hypothesis of phylogeny to be tested: the investigator simply wants a reliable estimate of phylogeny for the group. Under these conditions, we need some measure of the reliability of the various reconstructed branches. Measures such as bootstrap proportions and support indices are designed to provide this information. However, as detailed in Chapter 11, the interpretations of these measures are not always straightforward in this context. Furthermore, in many cases, it is the overall tree structure (rather than a particular branch) that suggests that the null hypothesis is incorrect. We can imagine a situation in which no single branch is particularly well supported, and yet the cumulative effects of many branches contain enough phylogenetic signal to reject a particular null hypothesis. Under these conditions, the method of parametric bootstrapping (Efron, 1982; Bull et al., 1993a; Huelsenbeck et al., 1995) can be extremely useful.

Bootstrapping methods are a general set of methods for creating pseudoreplicate data sets in situations where true resampling is impractical or impossible. (The name "bootstrapping" refers to pulling one's self up by the bootstraps in this statistically difficult situation.) In the case of phylogenetics, we only have a single instance of each taxon. Yet, we know that the distribution of characters we observe is influenced by stochastic effects. The pseudoreplicate data sets generated by bootstrapping allow an investigator to assess whether or not these stochastic effects are likely to have influenced the results (in the phylogenetic context, the branching order of the tree). In phylogenetic analyses, nonparametric bootstrapping (usually simply called "bootstrapping" in systematics) is the most commonly used method: the pseudoreplicate data sets are generated by randomly sampling the original character matrix with replacement to create new character matrices of the same size as the original (Efron, 1979, 1982; Felsenstein, 1985a; see Chapter 11). The frequency with which a given branch is found upon analysis of these pseudoreplicate data sets is recorded
as the bootstrap proportion. These proportions can be used (within limitations; see Chapter 11) to assess the reliability of individual branches in the optimal tree.
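
The resampling scheme just described can be sketched in a few lines. This is a minimal illustration, not the procedure used in any of the cited studies: characters (columns) are drawn with replacement to build pseudoreplicates of the original size, and the bootstrap proportion is the fraction of pseudoreplicates that recover a given grouping. To keep the example self-contained, the tree-building step is an assumed stand-in, four-taxon parsimony, in which the preferred split is simply the one supported by the most informative characters. The function names and the toy matrix are hypothetical.

```python
import random

# Hypothetical toy matrix: four taxa scored for ten binary characters.
TOY = {
    "A": "1111111110",
    "B": "1111100000",
    "C": "0000011000",
    "D": "0000000100",
}


def resample_characters(matrix, rng):
    """Draw columns with replacement to build a pseudoreplicate of equal size."""
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(states[c] for c in columns)
            for taxon, states in matrix.items()}


def best_split(matrix):
    """Four-taxon parsimony stand-in: return the pair grouped with the first
    taxon under the best-supported split, or None if the top two splits tie."""
    if len(matrix) != 4:
        raise ValueError("this sketch handles exactly four taxa")
    first, *others = sorted(matrix)
    support = {}
    for partner in others:
        rest = [t for t in others if t != partner]
        # A character supports first+partner if it unites them against the rest.
        support[partner] = sum(
            1 for x, y, u, v in zip(matrix[first], matrix[partner],
                                    matrix[rest[0]], matrix[rest[1]])
            if x == y and u == v and x != u
        )
    ranked = sorted(support, key=support.get, reverse=True)
    if support[ranked[0]] == support[ranked[1]]:
        return None
    return frozenset((first, ranked[0]))


def bootstrap_proportion(matrix, pair, replicates=1000, seed=0):
    """Fraction of pseudoreplicates in which the given pair is recovered."""
    taxa = sorted(matrix)
    target = frozenset(pair)
    if taxa[0] not in target:  # AB|CD and CD|AB describe the same split
        target = frozenset(taxa) - target
    rng = random.Random(seed)
    hits = sum(best_split(resample_characters(matrix, rng)) == target
               for _ in range(replicates))
    return hits / replicates


print("Bootstrap proportion for (A, B):", bootstrap_proportion(TOY, ("A", "B")))
```

In a real analysis the call to best_split would be replaced by whatever tree-estimation method the study uses; the resampling and counting steps stay the same.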
The analysis shown in Figure 2A highlights some uses as well as limitations of the nonparametric bootstrapping approach. In this study, a dentist who was infected with the human immunodeficiency virus (HIV) was suspected of infecting some of his patients in the course of treatment (see Ou et al., 1992). HIV evolves very quickly (on the order of 10^-3 substitutions/site/year), so it is possible to trace the history of infections among individuals by conducting a phylogenetic analysis of HIV sequences. Partial HIV sequences were obtained from a series of the dentist's HIV-positive patients, as well as other HIV-positive individuals from the local community, for the purpose of conducting a phylogenetic analysis (only a subset of the sequences is shown here for the sake of a simple example; see Ou et al., 1992 for a more complete set of sequences). The primary question concerned which of the patients (if any) had been infected in the course of dental treatment. The phylogeny shown in Figure 2A is consistent with one of the patients (patient A) being epidemiologically related to the dentist (in the full study, it appears that four additional patients also fall into this category; Ou et al., 1992; Hillis et al., 1994a,b). This result is supported by a relatively high bootstrap proportion (90% in this analysis). However, the analysis reveals another point of interest from the standpoint of epidemiology: two different HIV sequences from patient J do not cluster together in the tree. Patient J was a patient with multiple risk factors for HIV, and the analysis suggests that patient J may have been infected from more than one source. If supported, this multiple-infection hypothesis would be of interest to epidemiologists.

It is not obvious how the nonparametric bootstrap proportions can be used to test the hypothesis that the patient J sequences are not related. Several branches, none with high bootstrap proportions, separate the patient J sequences in the tree, but the scores cannot be summed because bootstrap proportions clearly are not additive. It is really a larger feature of the tree structure (rather than an individual branch) that needs to be tested.

[Figure 2: tree diagrams not reproduced. (A) Parsimony score: 225 steps; ML score: -1484.10793. (B) Parsimony score: 232 steps; ML score: -1504.35273.]

Figure 2 (A) Optimal tree for a set of HIV sequences collected from a dentist, a series of his patients, and a series of local controls (LC; individuals from the local population who were not patients of the dentist). The data set is a subset of the sequences reported by Ou et al. (1992) and Hillis and Huelsenbeck (1994a). The tree is consistent with an hypothesis of HIV transmission from the dentist to patient A (supported by a bootstrap proportion of 90%, shown above the branch). Of greater interest for this example, it appears from the tree that the two sequences collected from patient J are not sister taxa, which is consistent with an hypothesis of multiple infection of this individual. The bootstrap proportions (shown for each internal branch) are not sufficient to test this hypothesis (see text and Figure 3). (B) The best tree that is consistent with a sister-group relationship between the two patient J sequences. This tree requires an additional seven substitutions under the parsimony criterion, and has a maximum likelihood score that is lower by >20 log likelihood units. This latter tree served as the model for the parametric bootstrap test shown in Figure 3.
This problem (and any phylogenetic problem with clear alternative hypotheses) is much better suited for parametric than nonparametric bootstrapping. In parametric bootstrapping, the pseudoreplicate samples are created using numerical simulation rather than resampling. This permits exploration of alternative hypotheses. A parametric bootstrap analysis consists of (1) assuming a model of evolution; (2) estimating parameters of the model from the data; (3) simulating new data matrices under this parameterized model; and (4) analyzing the replicate data matrices. In the case outlined above, we could assume that the null hypothesis is true (that the two patient J sequences are related), and ask if we could expect to see the patient J sequences separated in the estimated tree due to stochastic variation or some systematic error that results from the details of the tree topology (see Chapter 11).

The first step in this analysis is to find the best tree that is consistent with the null hypothesis (the two patient J sequences together in the tree, as would be required if they descended from a common ancestor). This tree is shown in Figure 2B. This tree requires seven additional substitutions compared to the optimal tree (Figure 2A), and also has a lower log likelihood score (-1504.4 compared to -1484.1). Can differences this great result from random errors or systematic errors related to the shape of the tree? To test this possibility, the tree in Figure 2B can be simulated, using branch lengths estimated from a maximum likelihood analysis. Each of these simulated data sets is then analyzed, and the difference between the optimal tree and the best tree that is consistent with the null hypothesis (which, in these simulations, is the model tree) is recorded. This procedure produces an expected distribution of difference scores, which is shown in Figure 3. If the tree shown in Figure 2B (the best tree under the null hypothesis) were correct, then the probability of observing a difference as great as that observed (>20 log likelihood units) is p << 0.01. In fact, the greatest value observed in 100 simulations is <6 log likelihood units. (The same conclusion can be reached for parsimony scores in considerably less computational time, although the level of discrimination is slightly lower.) Therefore, we can reject the null hypothesis of common descent and accept the alternative of multiple infections. The traditional approach (collapsing all branches with low nonparametric bootstrap proportions) would have suggested that the data set provides little resolution to this question, whereas the parametric bootstrapping approach shows that the data set is highly informative about the non-monophyly of the patient J sequences.

[Figure 3: histogram not reproduced. Horizontal axis: difference in log likelihood (0 to 20). The observed difference for the actual data is marked.]

Figure 3 Results from a parametric bootstrap analysis. The tree shown in Figure 2B was simulated 100 times under the same model of evolution used in the maximum likelihood analysis (see Huelsenbeck et al., 1995). The differences in scores between the best tree and the best tree that supported the null hypothesis (monophyly of the two patient J sequences) were recorded and graphed to obtain the expected distribution under the null model. All 100 sampled differences fall below six log likelihood units, whereas for the observed data the difference in scores is >20 log likelihood units. Therefore, we would expect to see a difference this great (if the null hypothesis were true) much less than 1% of the time, so the null hypothesis is rejected at p << 0.01. The null hypothesis can also be rejected by using the difference in parsimony scores, which is computationally faster.

Parametric bootstrapping requires specifying a particular model of evolution, and it might be argued that the results are dependent on the details of the model. However, the test may be repeated using different models to test the sensitivity of the results to any particular assumption. In limited studies, the test shown here appears to be robust to changes in the model of evolution.
Table 2
The number of distinct, unrooted, bifurcating trees as a function of the number of taxa

Number of taxa    Number of trees
10                2 × 10^6
22                3 × 10^23
50                3 × 10^74
100               2 × 10^182
1,000             2 × 10^2,860
10,000            8 × 10^38,658
100,000           1 × 10^486,663
1,000,000         1 × 10^5,866,723
10,000,000        5 × 10^68,667,340
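The entries in Table 2 can be checked with the standard counting result that n taxa have (2n - 5)!! = 3 × 5 × ... × (2n - 5) distinct unrooted bifurcating trees. The formula is not stated in the text above and is supplied here as background; the function names are illustrative. The exact product is practical for small n, and a log-gamma form gives the order of magnitude for very large n.

```python
import math

def n_unrooted_trees(n_taxa):
    """Exact number of unrooted bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

def log10_unrooted_trees(n_taxa):
    """log10 of the same count, using (2n-5)!! = (2n-4)! / (2^(n-2) * (n-2)!)."""
    ln_count = (math.lgamma(2 * n_taxa - 3)          # ln (2n-4)!
                - (n_taxa - 2) * math.log(2)
                - math.lgamma(n_taxa - 1))           # ln (n-2)!
    return ln_count / math.log(10)

for n in (10, 22, 50, 100):
    print(n, f"about 10^{log10_unrooted_trees(n):.2f}")
print(n_unrooted_trees(10))
```

The logarithmic form avoids building integers with millions of digits, which is what makes the last rows of the table cheap to evaluate.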
Parametric bootstrapping is best suited to problems in which a clear a priori hypothesis exists. However, this may not be as limiting as it first seems. If an initial estimate suggests a potential source of systematic bias (such as long branch attraction or skewed base composition), parametric bootstrapping can be used to assess whether the observed conditions are sufficient to affect the results. Another good use for parametric bootstrapping is to predict the amount of data that will be needed to obtain reliable phylogenetic resolution, given a preliminary data set and preliminary tree estimation (Huelsenbeck et al., 1995). Thus, parametric bootstrapping can be used not only for testing alternative hypotheses, but also as a tool to guide in study design (Chapter 2) and troubleshooting.

Phylogenetic Accuracy

The number of possible solutions in any phylogenetic analysis increases remarkably quickly as a function of the number of taxa. Even if only bifurcating solutions are considered, there are more possible branching orders for 50 taxa than there are atoms in the universe (Table 2). If we consider the number of possible phylogenies for all living species, the size of the potential solution set is beyond normal comprehension. For 10 million taxa (well below most estimates of the number of extant species), there are 5 × 10^68,667,340 possible unrooted bifurcating trees. Although it is unlikely that anyone would attempt an analysis of this size, analyses have already appeared that consider the relationships among hundreds of species (Chase et al., 1993), and analyses of hundreds or even thousands of human mitochondrial sequences are likely to be attempted. Obviously, the number of possible solutions in these cases makes exhaustive examination of the solution space impractical. Given the number of possible solutions, it seems unreasonable to expect that we could find the one correct history out of all the possible phylogenies. In fact, how do we know that any of our estimated phylogenies are accurate? Accuracy of phylogenetic methods can be assessed in several ways, including simulations, experimental phylogenies, statistical tests, and congruence studies. Statistical tests and congruence studies are discussed in Chapter 11 and elsewhere in this chapter. Below, we present a discussion of the use of simulations and experimental studies for assessing accuracy in phylogenetic analyses (for additional discussion of this topic, see Hillis, 1995).

Simulations and Performance Criteria for Phylogenetic Methods

The most widely used (and abused) approach for assessing phylogenetic performance is numerical simulation under an explicitly stated evolutionary
model (see review by Huelsenbeck, 1995a, and Figures 4 and 5 for an example). The challenges of evaluating method performance using simulations are many, but perhaps the thorniest issues are model realism and overcoming simulation bias. Obviously, numerical models of evolution are simplifications of the processes that govern the evolution of real organisms. Although the models are undoubtedly simpler than any real organisms, biologists attempt to model the evolutionary generalities that apply under a wide variety of conditions. By systematically varying a particular part of the model (rate variability across nucleotide sites, for instance), an investigator can examine the effects of a particular parameter on performance of various phylogenetic methods. The results of such a study do not show how a method performs on real data sets, but they do show some conditions under which a method might be expected to perform well or poorly. If a given result appears under a wide variety of models and conditions, then we can conclude that the result likely applies to real world organisms. An example is the finding that most methods are biased when some branches in the true tree are much longer (in terms of evolutionary change) than others (see Felsenstein, 1978a; Hendy and Penny, 1989; Huelsenbeck et al., 1995).

There is a tendency for authors of simulation studies to overgeneralize from the results of their studies. However, it should be clear that the results of a simulation study apply only to the conditions tested. Every method has conditions under which it performs well and other conditions under which it performs poorly. It is possible to find simulation studies that claim to show the "general" superiority of almost any method of building trees; such studies are often biased by simulating only the conditions for which a given method performs best. Some types of bias can be addressed by examining the complete parameter space for a given well-defined problem (see Figures 4 and 5). However, no simulation study is completely unbiased, because an investigator must at least choose which parameters to vary (Huelsenbeck, 1995a). Moreover, there is no such thing as a method that is ideal for all criteria of performance (Penny et al., 1992), so which method is the "best" will depend upon the application and the specific conditions of the study.

[Figure 4: graph not reproduced. Horizontal axis: branch lengths a, b, c.]

Figure 4 The complete parameter space for the four-taxa, two-rates problem originally outlined by Felsenstein (1978a), extended to four character states (e.g., DNA sequence data). In this problem, two opposing peripheral branches (d and e) co-vary; their length (measured in terms of proportion of differences) is plotted along the vertical axis. The remaining three branches (the other two peripheral branches and the central branch) also co-vary; their length is plotted along the horizontal axis. Because there are only four states, the maximum expected divergence for two sequences separated by an infinitely long time is 0.75. The dotted line along the diagonal represents equal rates of change along all lineages. The four trees shown represent relative branch lengths for trees near the respective corners of the graph. Extensive simulations have evaluated the performance of most major methods of phylogenetic analysis throughout this parameter space (see Huelsenbeck and Hillis, 1993; Huelsenbeck, 1995a; and Figure 5).

Some of the important criteria for evaluating performance of a method include consistency, efficiency, robustness, computational speed, discriminating ability, and versatility (Penny et al., 1992; Hillis and Huelsenbeck, 1994b). A phylogenetic method is consistent (under a given model) if it converges on the correct tree as the data available to the method become infinite. All methods are consistent when their assumptions are met, and all methods are inconsistent if their assumptions are sufficiently violated. Therefore, it makes no sense to say that one method is consistent and another is not without specifying the conditions under which this statement is true. Although identifying a method's explicit and implicit assumptions is important, consistency has received a surprising amount of attention in contrast to efficiency. Efficiency is a measure of how quickly a method converges on the correct solution as more data become available to the method. For many methods, there is a tradeoff between consistency and efficiency.
For instance, Lake's (1987) method of invariants is consistent under a wide variety of conditions, and has little bias even under extreme conditions of branch-length heterogeneity. However, the method is also extremely inefficient under many circumstances. Hillis et al. (1994b) presented an illustrative simulation of this point, in which the most efficient methods found the correct tree 100% of the time with as few as 200 nucleotides, whereas Lake's method of invariants required >10^9 nucleotides to achieve the same level of performance with the same data sets. It is probably more important for most investigators to know that a chosen method can find the correct solution with a limited data set than to know that it would find the correct solution if they had an infinite data set.

There also are tradeoffs in some cases between robustness and efficiency. A method is robust if it is relatively insensitive to violations of its assumptions. A method may be both consistent and efficient under a given model of evolution, and yet if the assumptions of the method are violated, the method may quickly become inefficient and/or inconsistent. This suggests an excellent use for simulations, namely to explore the sensitivity of a method to its various assumptions by systematically violating them. A related point is that complex models may be needed to achieve efficiency under some conditions, and yet model complexity also comes at a cost (see Chapter 11).

One of the most obvious tradeoffs is between computational speed and discriminating ability. Single-tree algorithms (e.g., neighbor joining, UPGMA, various stepwise addition algorithms; see Chapter 11) are very fast for finding a point estimate of a tree, but they do not guarantee an optimal solution, and they do not permit the comparison of alternative solutions. Many of these algorithms are good for finding a starting point for a more thorough search of tree space under a given optimality criterion, but they should not be viewed as a final solution. Strangely, some investigators (e.g., S.B. Hedges et al., 1992b; Stoneking et al., 1992) consider the single-tree output an advantage of using these methods. They may give a single "answer," but they clearly do not guarantee that this answer is even among the best solutions. For instance, using the same data set for which S.B. Hedges et al. (1992b) calculated a single neighbor joining tree, David Swofford (personal communication) found more than 27,000 better trees under the minimum evolution criterion (which is the appropriate criterion for neighbor-joining according to both authors of the algorithm; Nei, 1991; Saitou, 1991). The accepted standards in the field (as of this writing) appear to be very different depending on the criterion chosen. Point estimates are almost never accepted by investigators who choose parsimony or maximum likelihood criteria, at least without some search of tree space for better or equally good solutions. (It was not always so; old parsimony programs such as Wagner78 output point estimates only, and these results were widely reported in the 1970s.) Unfortunately, this rigor often does not extend to investigators who use a distance criterion such as minimum evolution; many papers still appear each year with only neighbor-joining trees, without any attempt to optimize the solution. Whether this is a result of lax standards or ignorance on the part of investigators, reviewers, and editors is unclear, but the evidence suggests that many investigators do not realize that neighbor-joining trees are only approximate solutions.

Among the various optimality criteria, there is also a tradeoff in many cases between efficiency and computational speed. For instance, when its assumptions are met, maximum likelihood methods are often more efficient than other methods (Hillis et al., 1994b; Huelsenbeck, 1995a,b). However, the much greater computational complexity of maximum likelihood limits the thoroughness of tree searches for large data sets compared to parsimony or distance criteria. Thus, an investigator with a large data set needs to decide if choice of criterion is more important than a thorough search of tree space for optimal or near-optimal solutions.

Methods also differ greatly in their versatility; in other words, what kind of information can be incorporated into the analysis? The popularity of parsimony methods (Sanderson et al., 1993) stems in part from their great versatility. Not only can all kinds of character data (whether morphological, behavioral, ecological, or molecular) be analyzed using parsimony, but almost any information on evolutionary processes can be incorporated into the analyses as well. For instance, each site in a gene could be weighted differently based on a priori or a posteriori information on the respective probabilities of change or levels of independence. Similarly, different weights can be applied to all possible character-state changes, although this can also be done by using different estimators of sequence divergence (e.g., Tamura, 1992; reviewed in Hillis et al., 1993a). Parsimony analyses also include estimating ancestral character states as well as estimating branch lengths and branching order, whereas many other methods of analysis exclude at least one of these procedures.

In summary, the choice of method will depend on which of the various performance criteria are of greatest importance for a given application. Simulations can be useful for evaluating some of these criteria under particular circumstances, but the results of a given simulation study should not be overgeneralized. It is important to realize that no method is best for all performance criteria, and that there are tradeoffs among many of the criteria. In choosing a method, an investigator should identify the goals of his or her study, and then evaluate which of the methods is best suited to meet those goals. Beware of anyone who recommends a method without asking about the goals and details of the study!
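As a small illustration of the versatility of parsimony discussed above, the following Python sketch scores a fixed tree with a different weight for each site. It uses the Fitch algorithm for unordered characters, which is one common way to count steps and is an assumption here rather than a method named in the text; the tree encoding (nested tuples), the alignment, and the weights are all invented for the example.

```python
def fitch_site(tree, states):
    """Fitch pass for one site on a rooted binary tree.
    Returns (possible ancestral states, number of steps)."""
    if isinstance(tree, str):                 # a tip: its observed state, no steps
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    shared = left_set & right_set
    if shared:                                # children agree: no extra step
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def weighted_parsimony(tree, alignment, weights):
    """Sum over sites of (site weight x Fitch steps at that site)."""
    total = 0.0
    for i, weight in enumerate(weights):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        _, steps = fitch_site(tree, states)
        total += weight * steps
    return total

# Toy data (hypothetical): four taxa, three sites, the third site upweighted.
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
weights = [1.0, 0.5, 2.0]
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(weighted_parsimony(tree_1, alignment, weights))
print(weighted_parsimony(tree_2, alignment, weights))
```

Setting every weight to 1.0 recovers the ordinary unweighted step count, so the same function covers both cases.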
Experimental Phylogenies: The Connection between Models and the Real World

Critics of simulation studies (e.g., Miyamoto and Cracraft, 1991) argue that such studies face a credibility gap because of the simple evolutionary models they must assume. No simulation will approach in complexity the evolutionary constraints and processes that are experienced by real organisms (Hillis et al., 1993c). Therefore, we need some method to test the predictions of simulations to see if results are applicable to at least part of the real world. If the simulation results do not apply to a specific group of organisms, then we know that the predictions were not very general, or that we need to add additional complexity to the models.

Viruses are excellent model organisms for experimental studies of phylogeny. They reproduce very rapidly (with generation times often measured in minutes rather than days, months, or years), and they can rapidly accumulate changes to their genomes. Hillis et al. (1992) showed that viruses can be used to produce experimental phylogenies, in which the details of the true tree are controlled by the investigator, whereas the details of evolution are controlled by the organisms (rather than a numerical model of organisms). Figure 6 shows the design for an experimental phylogeny and compares it with the observed changes along the tree and an inferred phylogeny (constructed blindly with respect to knowledge of the true tree).

Figure 6 (A) Design of an experimental phylogeny. The lineages were started at the base of the tree from a single clone of bacteriophage T7. Each lineage was propagated in cultures of E. coli in the presence of a mutagen (see Hillis et al., 1992 for details). The lineages were further divided at predetermined points, so that each of the lineages marked I-XIV were cultured for the same number of cycles in identical (within the bounds of experimental error) environments. (B) The observed number of nucleotide substitutions along each of the branches in the phylogeny for the four genes sequenced (from Hillis et al., 1994a). See Hillis et al. (1992) for similar information on restriction sites from throughout the entire genome. (C) One of two equally parsimonious trees inferred from the sequence data, and the inferred number of substitutions along each branch.

Experimental studies of phylogeny have just begun to appear, so there have been only limited tests of the predictions from simulations. Already, however, it appears that experimental studies will be useful for identifying assumptions of models that are likely to be violated when applied to real organisms and for testing assertions of method superiority. For instance, a maximum likelihood model for restriction site variation has been shown to perform rather poorly compared to other methods in an experimental phylogeny (Hillis et al., 1994a). Also, coding of restriction fragments (rather than restriction sites) has been shown to be positively misleading in some experimental cases (M.E. White et al., 1991; Hillis et al., 1994a), contrary to the recommendations of some recent authors (e.g., B. Bremer, 1991).

Predictions of Time from Molecular Data

A common application (and an area of considerable controversy) of molecular systematics is the prediction of time from molecular divergence data. It is clear that molecular divergence is roughly correlated with divergence time; however, there is considerable debate about constancy of rates of divergence and how much error is associated with predictions of divergence times from measures of molecular similarity (Chapter 1). The "molecular clock hypothesis" holds that the rate of molecular change is constant enough (within the bounds of particular genes and taxa) to be useful in predicting times of divergence. However, despite the many applications of the molecular clock hypothesis, there have been few serious attempts to determine confidence limits of the estimates of time derived from molecular divergences. In this section, we suggest that much greater rigor and caution is needed in estimating divergence times from molecular data than is commonly exercised.

Zuckerkandl and Pauling (1962, 1965) were the first to suggest that genes and their protein products might evolve at rates constant enough that measures of molecular divergence could be used to calibrate a "molecular clock." Recent advocates of this hypothesis view molecular differentiation not as a metronome, but as a Poisson process with regularity of the same order of magnitude as radioactive decay (Fitch, 1976b; A.C. Wilson et al., 1977, 1987a). This has promoted the use of molecular divergence measures to provide a timeframe for phylogenies, particularly where there are insufficient data on fossils (e.g., Sarich and Wilson, 1967; A.C. Wilson et al., 1974, 1975; Beverley and Wilson, 1985; A.C. Wilson et al., 1987b; Bowen et al., 1989; Vigilant et al., 1991; Wayne et al., 1991b).

The "molecular clock controversy" is really several different controversies. Therefore, we break the controversy down into its various components in the following sections.

Is There a Universal Molecular Clock?

Heterogeneity of rates across different nucleotide positions, different genes, different genomic regions, or different genomes within an organismal lineage (for instance, nuclear versus organellar genomes) is undeniable (for a review, see Li and Graur, 1991). However, "universal" molecular clocks have been proposed for many individual genes or genomic regions across a wide spectrum of taxa. For instance, a clock is often claimed for animal mtDNA, which is supposed to evolve at about 2% sequence divergence per million years between pairs of taxa (W.M. Brown et al., 1979). A.C. Wilson et al. (1985) stated that "no major departures from this rate are known for the molecule as a whole." However, since then, many studies have shown considerable rate heterogeneity of mtDNA within and between various animal groups (DeSalle and Templeton, 1988; Hasegawa and Kishino, 1989; Martin et al., 1992b; Avise et al., 1992c). Molecular clocks have been proposed for numerous nuclear genes as well, but in most cases the evidence suggests that rates of substitution vary among taxonomic groups (reviewed by W.-H. Li, 1993; Avise, 1994). There is, however, an important caveat: all estimates of rates of divergence must ultimately trace back to dates derived from the fossil record (or, more dubiously, to vicariance events estimated from biogeography) and these are often open to different interpretations (Marshall, 1990; Easteal and Collett, 1994).

The suggested reasons for the lack of generality of molecular clocks include differences in
eage in the tree can be reconstructed without
metabolic late (Thomas and Beckenbacl~,1989; error.
Avise rt al., 1992~;Martin et al., 199233); differ- 5s calibration dates for all times of divergence
ences in DNA repair efficiency (Britten, 1986); dif- used to calculate the rate of the molecular
lerences in exposure to mutagens (Adelman et al., clock are known without error.
1988); differences in nucleotide generation times
(Martin and Palumbi, 199313);differences in num- 6. A regression of time on number of substitu-
ber of DNA replications in germ line cells (Wu tions can be conducted without error.
and Li,1985); and differences in organismal gen- Under these conditions (none of which is realistic,
eration times (Laird et al,, 1969; Kohne, 1970; of course), we would be able to construct a molec-
Catzefiies et a]., 1987; MT-H. Li et al., 1987; Sibley ular clock like the one shown in Figure 7 .The con-
et a1 , 1988; Gaut et al., 1992).Martin and Palumbi fidence limits for individual data points can be
(l993b) usefuily combined the effects of metabolic easily calculated under this model based on the
rate and generation time into a single concept of Poisson expectations:
"llucleotlde generation time."
This apparent heterogeneity leaves many ad-
vocales of molecular clocks to argue for "local" P = e+pY
Y!
molecular clocks (see W.-H. Li, 1993). The argu-
~nei-ii1s that among closely related species with where P is the expected frequency of Y substitu-
s ~ r n i l a rlife histories, metabolic rates, generation tions and pis the mean i~umberof expected substi-
titnes, etc., rates of evoluiion for a particular gene tutions. Thus, on our perfect clock (with a mean
are hkely lo be stable. Therefore, according to this substitution rate of one substitution per million
argument, predictions of time can be made if we years), for lineages of 15 million years we would
calibr~tc the rate of evoliltlon separately for each expect on average to observe 15 substitutions, and
gene 1x1 each taxonomic group. Putting issues of we would expect 95% of such lineages to have be-
practicality aside for the moment, it is a useful ex-
erclse !o accept that such local clocks exist and
turn to the issue of what we should expect from 95% confidence hmits:
the pcriect lnolecular clock. 15 +
15 7 substitutions
What Are the Expectations of n Perfect -

&Iolrcular Clock?
Let's assume that a perfect molecular clock, for
2 10
G
whicl~w e have a perfcct calibration, exists for a 2
gene in 3. closely related group of organisms. There-
fore, we will assume that the following are all We:
-2
a Confidence linuts based
1 Molecular change IS a linear function of time, E on Poisson distribution
w ~ i hsubstitutions accun~ulatingfollowing a M
l-'olsson distribution (A.C. Wilson et al., 0 5 10 15 20 25
Number of substitutions
1987a). Therefore, the only variation we will
observe is stochastic. For the sake of discus- Figure 7 The sampling expectations of a perfect molec-
s o n , we will set tho rate at one substituiion ular clock, arbitrarily set at one substitution (throughout
(within the gene of illterest) per million years. the sequence examined) per million years. The model
assumes that change is linear with time, substitutions
2 Rate of change Is equal across all positions followa Poisson process, there is 110 error in callbration
compared and across all lineages. times or collection of data, all substitutions are ob-
3. ~h~ phylogenetic tree can be reconstructed served, and all lineages are evolutionarilyindependent
Under these cond~tions,the 95% confidence limits for in-
.i\-ichouterror, and eaclt branch in the tree can dividual data points would be as shown, l;or instance,
be analyzed independently. 95% of lineages isolated for 15 million years would be
4 -rhe nulnber of substitutions along each 1in- expected to exhibit 8-22 substibtions, inclusive.
between 8 and 22 substitutions, inclusive (see Figure 7). In other words, we could not reject a lineage as being 15 million years old if we observed either 8 or 22 substitutions, even with our "perfect" clock. Furthermore, we also could not reject a lineage with 8 substitutions as being only 5 million years old. The point is that the expected stochastic variation in accumulation of the relatively small number of substitutions observed within most genes is great enough to introduce considerable imprecision into molecular clock estimates. Almost all molecular clock estimates in the literature, however, ignore the expected variance of the estimates. Confidence limits, if they are calculated at all, are often based on the variance of the molecular estimate, but ignore the major source of error in the estimate (stochastic variation in the clock itself). In our idealized example, we have assumed that there are no sources of error other than stochastic variation. Thus, these confidence limits are the smallest we could expect under the most idealized of conditions.

Figure 8 Data from the experimental phylogeny described in Figure 6, plotted on the clock model shown in Figure 7. Each point represents one of the lineages I-XIV shown in Figure 6A. Data are shown for restriction sites and nucleotide substitutions, each plotted at the mean predicted time for the lineage (in arbitrary units, such that one change corresponds to one unit of time). The nucleotide data are slightly more variable than predicted from the model (p < 0.05); the restriction site data fit the model (p > 0.05).
[Figure 8 plot: two panels, "Restriction site changes" and "Nucleotide substitutions"; x-axis, number of changes; confidence limits based on the Poisson distribution.]
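The 8-22 range quoted for a 15-million-year lineage can be checked directly from the Poisson distribution. The following is a minimal sketch, assuming only the clock rate stated in the text (one substitution per million years); the function names are ours, introduced for illustration. It computes how much probability the range 8-22 covers and applies a simple two-tailed check to a single observed count.

```python
import math

RATE = 1.0  # substitutions per million years, as set in the text


def poisson_pmf(k, lam):
    """Probability of exactly k substitutions when lam are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k, lam):
    """Probability of observing k or fewer substitutions."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def coverage(lam, low, high):
    """Probability that the count falls in [low, high], inclusive."""
    return poisson_cdf(high, lam) - poisson_cdf(low - 1, lam)


def is_rejected(observed, lam, alpha=0.05):
    """Two-tailed check: does `observed` fall in either alpha/2 tail?"""
    lower_tail = poisson_cdf(observed, lam)
    upper_tail = 1.0 - poisson_cdf(observed - 1, lam)
    return min(lower_tail, upper_tail) < alpha / 2


# Coverage of the 8-22 range for a lineage isolated for 15 million years.
print(coverage(15 * RATE, 8, 22))
# A count of 8 is compatible with both 15 and 5 million years of isolation.
print(is_rejected(8, 15 * RATE), is_rejected(8, 5 * RATE))
```

Under this sketch the range 8-22 covers close to 95% of the distribution, and an observation of 8 substitutions is rejected neither for an age of 15 million years nor for an age of 5 million years, which is the imprecision the text describes.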
Experimental Data: A Test of the Perfect Clock

How can we test the predictions of the model presented in the last section? We can approach many of the otherwise unrealistic assumptions only under controlled conditions in the laboratory. The experimental phylogeny shown in Figure 6 provides a system for testing our perfect molecular clock model. The phylogeny was observed (actually created) rather than estimated, the changes along the branches were observed directly, the length of time each branch existed is known, and the organisms (in this case, viruses) all evolved from a single clone of bacteriophage T7 and were grown under essentially identical environments (within the constraints of laboratory error). This produces as close a fit to the idealized conditions of the model as could be expected in a real evolving system.

Two kinds of molecular character data have been collected for the T7 experimental phylogeny. Restriction sites have been mapped (Hillis et al., 1992) from across the entire genome of the viruses (approximately 40,000 bp long), and sequence data have been collected for four genes (Bull et al., 1993a; Hillis et al., 1994a). There are 14 independent lineages (labeled I-XIV in Figure 6A) of equal length in terms of time. A cursory look at the branch lengths (in terms of change) in Figure 6B (observed) or 6C (inferred from the terminal lineages) indicates that the 14 independent lineages appear to be evolving at very different rates. However, if we plot out these values on our model molecular clock (Figure 8), with one change per one arbitrary unit of time, the variation is not much greater than we would expect under the model. In fact, we can't reject the model for the restriction site data, and there is just barely more variation than expected for the sequence data. Therefore, under these ideal conditions, the viruses are evolving much as would be predicted from a perfect molecular clock. But is this good news or bad news for clock proponents? If all we had from the viral study was the inferred tree (Figure 6C), we would see a tree with very different branch lengths (each of which, in fact, represents the same length of time). Although the inferred branch lengths (Figure 6C) are excellent estimates of the actual branch lengths (Figure 6B), they are not very good estimates of actual time (Figure 6A). Thus, the predictions based on character change give very imprecise estimates of time, even under these ideal conditions.
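One common way to ask whether counts from lineages of equal duration vary more than a Poisson clock predicts is the index of dispersion. The sketch below illustrates that idea only: the text does not state which test produced the reported p-values, and the counts used here are invented for illustration, not the T7 data. The critical value is the standard tabulated upper 5% point of chi-square with 13 degrees of freedom (14 lineages minus 1).

```python
from statistics import mean

# Upper 5% critical value of chi-square with 13 degrees of freedom,
# taken from standard tables (14 lineages - 1).
CHI2_CRIT_DF13 = 22.362


def dispersion_statistic(counts):
    """Index-of-dispersion statistic: sum((x - mean)^2) / mean.

    For Poisson counts gathered over equal time spans this is
    approximately chi-square with len(counts) - 1 degrees of freedom.
    """
    m = mean(counts)
    return sum((x - m) ** 2 for x in counts) / m


# Hypothetical counts for 14 equal-length lineages (NOT the T7 data).
clocklike = [3, 5, 4, 6, 2, 5, 4, 3, 7, 4, 5, 4, 3, 5]
erratic = [1, 9, 2, 12, 0, 8, 3, 10, 1, 11, 2, 9, 0, 7]

for label, counts in (("clocklike", clocklike), ("erratic", erratic)):
    stat = dispersion_statistic(counts)
    verdict = "more variable than Poisson" if stat > CHI2_CRIT_DF13 else "consistent with Poisson"
    print(label, round(stat, 2), verdict)
```

Note that the first set of counts ranges from 2 to 7 and would look like a threefold difference in rate on a tree, yet it is entirely consistent with a single Poisson clock.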
Chapter 12 / Hillis, Mable & Moritz
Figure 9 The problem of non-independence of pairwise comparisons in a phylogeny. (A) Assumed tree, with branch lengths drawn proportional to molecular change. Since the divergence from a common ancestor (at the root of the tree), the lineage A-B-C has changed little, whereas the lineage D-E-F has changed considerably. All taxa indicated by letters are extant, so the distance in time to the root is equal for each taxon. (B) Molecular clock calibration based on perfect knowledge of the last common ancestor of the two major lineages, and using all nine pairwise comparisons between the two groups. Because most of the phylogenetic history is held in common among these comparisons, the nine values are all very similar. If the nine values are treated as independent data points in the calibration, the estimated confidence limits (solid lines) of the regression (dashed line) appear to be very narrow, despite the vastly different rates of evolution in the tree. (C) If the two independent lineages leading to the two groups are plotted separately, the two very different rates of evolution are revealed.
[Figure 9 plots: panels B and C share the axis label "Sequence divergence".]
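The effect described in the Figure 9 caption can be reproduced with a few lines of arithmetic. The following is a minimal sketch using hypothetical branch lengths (they are not read off Figure 9): a slowly evolving clade A-B-C and a rapidly evolving clade D-E-F, joined at the root. Every cross-clade comparison shares the same two stem branches, so the nine pairwise distances are nearly identical even though the two lineages differ severalfold in rate.

```python
from itertools import product
from statistics import mean

# Hypothetical branch lengths in units of molecular change (not from Figure 9).
SLOW_STEM, FAST_STEM = 2.0, 20.0
SLOW_TIPS = {"A": 1.0, "B": 1.2, "C": 0.8}
FAST_TIPS = {"D": 5.0, "E": 5.5, "F": 4.5}


def cross_clade_distances():
    """All pairwise distances between a slow-clade and a fast-clade taxon."""
    shared = SLOW_STEM + FAST_STEM  # path common to every comparison
    return {
        (s, f): SLOW_TIPS[s] + shared + FAST_TIPS[f]
        for s, f in product(SLOW_TIPS, FAST_TIPS)
    }


def root_to_tip(stem, tips):
    """Total change accumulated along each lineage since the root."""
    return {name: stem + length for name, length in tips.items()}


distances = cross_clade_distances()
print(len(distances), min(distances.values()), max(distances.values()))

# The same elapsed time, but very different amounts of change per lineage.
print(mean(root_to_tip(SLOW_STEM, SLOW_TIPS).values()),
      mean(root_to_tip(FAST_STEM, FAST_TIPS).values()))
```

Treating the nine tightly clustered distances as nine independent calibration points would suggest a precise clock, whereas the two root-to-tip totals show that only two independent observations exist and that they disagree strongly.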
Limitations of Molecular Clock Calibrations

In the real world, it is never possible to satisfy all the conditions of the perfect molecular clock model. In particular, there are a number of problems that occur in calibrating molecular clocks. For instance, rather than using independent lineages in a phylogenetic tree to calibrate a clock, most investigators use all pairwise divergences among taxa within a given group. These values are not independent of one another because many are based on shared portions of the phylogenetic tree (see also Lynch and Jarrell, 1993). This lack of independence can only inflate the perceived correlation between divergence and time (Figure 9). There are also numerous problems associated with determining dates from fossil evidence. Ideally, we would need accurately dated fossils from just above and below the splitting event we wish to date (Figure 10A). However, it is more likely that the fossils are not direct ancestors, but simply branched off the tree before and after the splitting event (Figure 10B; Marshall, 1990). These latter dates will tend to underestimate the age of the splitting event. Even with an outstanding fossil record for a group, it is exceedingly difficult to pinpoint the age of the last common ancestor of a group of living species (S.S. Carlson et al., 1978).

Biogeographic data also have been used to calibrate molecular clocks, but there are difficul-
Applications of Molecular Systematics 535
ties with this approach as well. To use dates derived from biogeographic data, it must be assumed that the relevant speciation event was caused directly by the correlated geological event. For instance, the separation of pipid frogs in Africa and South America has been correlated with the separation of those two continents (e.g., Savage, 1973), and clocks have been calibrated based on the molecular distance between the genera Xenopus (in Africa) and Pipa (in South America). However, Cannatella and de Sá (1993) have shown that the splitting event between these two genera is not correlated with the breakup of the two continents, because Pipa is much more closely related to Hymenochirus (another African pipid) than it is to any species of Xenopus, and fossil species clearly assignable to Xenopus are found in South America. Similar calibration errors are likely responsible for the molecular clock estimate of 30-40 million years ago (mya) for the last common ancestor of the genera Xenopus and Silurana (Bisbee et al., 1977). This estimate is directly contradicted by the presence of Xenopus in both South America and Africa, which requires either Xenopus to be established prior to the breakup of Africa and South America (90-100 mya), or transoceanic dispersal of frogs (Figure 11; Cannatella and de Sá, 1993).

Figure 10 A potential problem of inferring dates of branching events from fossil evidence. (A) In an ideal case, well-dated fossils from just before and just after a branching event will be available, and character data will exist to place the fossils at these points in the tree. (B) In a realistic case, fossil lineages will be identified that connect to the tree before and after the branching event, but they will not represent direct ancestors on either side of the event. In this case, even accurate dates of the fossils will give an underestimate of the age of the branching event.
[Figure 10 labels: "Timing of branching event" and "Inferred timing of branching event".]

Figure 11 An example of error in biogeographic dating, from Cannatella and de Sá (1993). The tree shows the inferred phylogenetic relationships of various pipid frogs. Before the phylogeny was available, the split between pipid frogs in Africa (AF) and South America (SA) was attributed to the breakup of the two continents. Thus, point A (the split between Xenopus and Pipa) was associated with a date of approximately 90-100 mya. Based on a molecular clock, Bisbee et al. (1977) calculated a date of approximately 30-40 mya for point B (split between Xenopus and Silurana). However, the split between Africa and South America could have been no earlier than point C (the split between African and now-extinct South American species of Xenopus) without transoceanic dispersal of the frogs.
[Figure 11 tip labels recoverable from the tree: Xenopus pascuali (SA, extinct); extant Xenopus (AF). Points A, B, and C are marked along the time axis.]

Calculating Confidence Limits from Real Data

Ideally, molecular clocks should be calibrated based on independent lineages of estimated phylogenetic trees. However, such calibrations are very rare in the literature. Instead, there are numerous clock calibrations proposed that are based on pairwise differences among taxa. In general, these molecular clocks are calibrated by dividing the average estimate of the age of the last common ancestor by the average measure of molecular divergence. The only error that is generally taken into account is that associated with the estimate of molecular divergence. However, this error usually pales into insignificance in comparison to other sources of error.
Although we believe this basic approach is flawed for the reasons outlined above, it is useful to calculate the confidence limits of these clocks based on the regression model that has been used to establish the calibrations. In other words, we will ignore the problems of non-independence of the pairwise divergence estimates for a moment to estimate the confidence limits. Even giving these calibrations the benefit of the doubt, it can be shown that the confidence limits for new estimates of time based on the models are so large as to make the clocks highly imprecise.

The regression model is usually a simple weighted linear regression of time on molecular divergence, with the constraint that the intercept of the regression line is the origin. The common calibration technique of dividing the average time of divergence by the average molecular divergence will produce the correct slope of this regression under the assumption that the residual error of the regression is proportional to molecular divergence (Snedecor and Cochran, 1989). In other words, this method is acceptable if there is proportionally greater deviation about the regression line at high levels of molecular divergence than at low levels of molecular divergence. This seems to be a reasonable assumption, and plots of time versus molecular divergence generally follow this pattern (Figures 12-14).

Figure 12 Regression (A; see text for model) of estimated time since separation on divergence of silent substitutions (corrected for expected multiple substitutions) in coding regions of DNA in mammals (data from Britten, 1986). B1 and B2 are the bounds of the 95% confidence limits of the regression line. C1 and C2 are the bounds of the 95% confidence limits for predicted values of time based on new measurements of sequence divergence, except that regions of negative time are collapsed to zero.
[Figure 12 plot: x-axis, sequence divergence (C%), 0-100; y-axis ticks at 25, 50, 100, and 125.]

Calculating confidence limits for molecular clocks based on the expected error in the measure of molecular divergence is inadequate, because this source of error is trivial compared with the residual error of the regression. It is more rigorous to assume that the error associated with the molecular measure is trivial and calculate confidence limits for predictions of time based on the regression (although this will result in a slight underestimate of the confidence limits). Figure 12 shows two 95% confidence limits for a regression of time on molecular divergence. The data are based on percent divergence (corrected for expected multiple substitutions; C%) of silent substitutions in coding regions of several genes compared among various pairs of mammals (from Britten, 1986). A weighted linear regression of time (Y) on divergence (X) gives Y = 1.39X (as represented by line A in Figure 12). The variance of the residuals under this model of regression is given by

\[ s^2_{y \cdot x} = \frac{\sum (Y^2 / X) - (\sum Y)^2 / \sum X}{n - 1} \]

and the estimated standard error of the slope (b = 1.39) is given by

\[ s_b = \frac{s_{y \cdot x}}{\sqrt{\sum X}} \]
(Snedecor and Cochran, 1989). For the data of Britten (1986) shown in Figure 12, $s_b$ = 0.137; since Student's t at α = 0.05 (with n - 1 = 19 degrees of freedom) is 2.093, the 95% confidence interval of b is 1.39 ± 0.287. This interval is bounded by B1 and B2 in Figure 12. Although this provides confidence limits for the slope of the regression, it does not provide confidence limits for a new predicted value of time given a known sequence divergence. The standard deviation of a new predicted value of time ($\hat{Y}$) is given by

\[ s_{\hat{Y}} = s_{y \cdot x} \sqrt{X + \frac{X^2}{\sum X}} \]

(Snedecor and Cochran, 1989).* The 95% confidence limits of new estimates of time from the data presented in Figure 12 are represented by lines C1 and C2. These limits are quite large; for instance, at C% = 50, the 95% confidence interval is 69.5 ± 65.34 million years.

*It is important in calculating these confidence limits to recall that the regression model assumes the residual error of the regression is proportional to the molecular divergence and that the regression line runs through the origin. Confidence limit calculations that assume the residual error is the same for all values of molecular divergence (e.g., S.S. Carlson et al., 1978) seriously underestimate the actual confidence limits.

The above approach can be used to calculate confidence limits for all estimates of time based on molecular clock calibrations, under the assumptions that the data points are independent and the molecular divergence data are without error. In most cases these assumptions will be violated and so the confidence limits will be underestimated. However, even these underestimated confidence limits are so great as to render the clock estimates minimally useful. Nonetheless, if one is interested in applying molecular clock models to questions of time-since-divergence, then the error associated with the estimates of time cannot be ignored.

The data that have been used to calibrate two other molecular clocks are plotted in Figures 13 and 14. In Figure 13, data on mtDNA sequence divergence are plotted against time-since-divergence information derived from the fossil record of primates (from W.M. Brown et al., 1979). This calibration is widely used as a standard mtDNA clock (A.C. Wilson et al., 1985), although Moritz et al. (1987) stressed likely errors associated with paleontological calibrations and from variation among lineages. Although the confidence interval for this calibration is smaller than that in Figure 12, it is still large enough to be quite limiting for most applications of a molecular clock.

Figure 13 Regression (A) of estimated time since separation on sequence divergence of mitochondrial DNA in primates (data from W.M. Brown et al., 1979). Key to confidence limits same as for Figure 12.

The calibration of albumin divergence based on immunological comparisons among birds (Figure 14; Prager et al., 1974) shows that the confidence limits of new predicted values of time may be so large as to not exclude any reasonable possibility. Note, however, that the confidence limits for the slope of this calibration for birds (B1 and B2 in Figure 14) do not include the rate reported for mammals (D in Figure 14), as Prager et al. (1974) correctly concluded. This highlights the necessity of calibrating molecular clocks within the group of interest.

Figure 14 Regression of estimated time since separation on immunological distance. The data points are the same that were used by Prager et al. (1974) to calculate the rate of albumin evolution in birds (A); hence, some of the points are averages for comparisons of several species. Key to confidence limits is the same as for Figure 12. D indicates the reported relationship between time since divergence and albumin immunological distance for mammals (Sarich and Wilson, 1967). Confidence limits of D cannot be calculated because of an insufficient number of data points.
[Figure 14 plot: x-axis, immunological distance.]

The values of time-since-divergence used in Figures 12-14 are by no means universally accepted; indeed, the extreme difficulty with which such data may be garnered from the fossil record is a major obstacle to calibrating molecular clocks. We have used the original data upon which these calibrations were based in order to provide confidence limits for estimates derived from the calibrations. New calibrations based on new values of time are possible, but these calibrations should be accompanied by newly calculated confidence limits.

It is difficult to find relevant data to calculate confidence limits for an allozyme (Nei's genetic distance, D) clock. As noted by Avise and Aquadro (1982), "...the major obstacle to critical tests of the electrophoretic protein clock is the almost total lack of reliable independent information about times of speciation." Nonetheless, there has been an enormous range of estimated rates for divergence of Nei's D. For instance, the time for accumulation of a Nei's D of 1.0 in various groups of vertebrates has been given as anywhere from 0.7 to 18 million years (Figure 15; Avise and Aquadro, 1982). With such a range of estimated rates, "...it is hard to imagine a genetic distance estimate that would not be 'compatible' with almost any fossil or geologic data" (Avise and Aquadro, 1982). The study by Beerli et al. (1996) is a rare example of a study that has attempted to calculate confidence limits on an allozyme clock. They showed that although the confidence limits on the clock are fairly broad, it is possible to use the predictions to exclude some biogeographic scenarios within closely related groups of organisms.
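The regression-based confidence limits used for Figures 12-14 can be sketched in a few lines. This is a minimal sketch, not the authors' code. It assumes the model stated in the text (a line through the origin with residual variance proportional to divergence) and the standard results for that model: slope b = ΣY/ΣX, residual variance [Σ(Y²/X) − (ΣY)²/ΣX]/(n − 1), slope standard error s/√ΣX, and prediction variance s²(X + X²/ΣX). The function names and the small data set are hypothetical. The critical value of Student's t is passed in by the caller (for example, the 2.093 quoted in the text for 19 degrees of freedom).

```python
import math


def fit_through_origin(x, y):
    """Weighted fit of Y = bX through the origin, residual variance ~ X.

    Returns (b, s2_yx, s_b, sum_x).
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    b = sum_y / sum_x  # identical to mean(Y) / mean(X)
    weighted_ss = sum(yi * yi / xi for xi, yi in zip(x, y)) - sum_y ** 2 / sum_x
    s2_yx = weighted_ss / (n - 1)
    s_b = math.sqrt(s2_yx / sum_x)
    return b, s2_yx, s_b, sum_x


def prediction_half_width(x_new, s2_yx, sum_x, t_crit):
    """Half-width of the confidence interval for a new time estimate at x_new."""
    return t_crit * math.sqrt(s2_yx * (x_new + x_new ** 2 / sum_x))


# Hypothetical calibration points: divergence (X) and time (Y).
divergence = [1.0, 2.0, 3.0]
time = [2.0, 3.0, 7.0]

b, s2_yx, s_b, sum_x = fit_through_origin(divergence, time)
print(b, s_b)
print(prediction_half_width(2.0, s2_yx, sum_x, t_crit=1.0))

# Arithmetic check on the slope interval quoted in the text: t * s_b.
print(round(2.093 * 0.137, 3))
```

The prediction interval is always wider than the interval on the slope alone, because it adds the scatter of a single new observation (the s²X term) to the uncertainty in the fitted line (the s²X²/ΣX term). This is why lines C1 and C2 in Figure 12 lie so far outside B1 and B2.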
In summary, the following guidelines should be considered when estimating time from values of molecular divergence. (1) For any estimate of time, reference should be made to an explicit calibration of the clock for the particular type of molecular data analyzed. This calibration should be based on independently derived estimates of time-since-divergence (e.g., the fossil record, but not other molecular data), and the calibration points should be evolutionarily independent (e.g., independent branches in a phylogenetic tree). Many calibrations are based on single points or on no data at all; such calibrations are obviously insufficient. (2) The confidence limits of the calibration must be considered and these limits should be calculated for the new estimate of time. (3) Calibrations should only be used within the group of organisms for which the calibration applies. Extreme caution should be exercised before applying a calibration derived from one group of organisms to another group of organisms, as rates of molecular divergence often differ markedly among groups. At the minimum, a relative rate test should be conducted to test for rate heterogeneity between the group studied and that for which the calibration was obtained.

Figure 15 Proposed relationships between time since divergence and Nei's D in vertebrates (summarized in Avise and Aquadro, 1982). No confidence limits can be calculated for any of these regression lines because of a paucity of data. Rates 1, 3, and 6 have been suggested for various teleost fishes, rate 2 for plethodontid salamanders, rates 1 and 3 for squamate reptiles, rate 3 for birds, and rates 4, 5, and 7 for various mammals.
[Figure 15 plot: x-axis, Nei's genetic distance D.]

Other Estimates of Time

Given the problems of molecular clocks, how can molecular systematic analyses estimate times of divergence? There are several possible ways. First, the broad confidence limits of molecular clock estimates apply to single estimates from single genes. Trying to estimate time of divergence from one such estimate is like trying to estimate the average height of humans by measuring one human: the estimate is likely to fall within broad bounds. An obvious solution to the clock problem is to make many separate estimates of the date of a single splitting event from many independent genes or from many taxa expected to be affected by a particular vicariance event (e.g., Knowlton et al., 1993).

In other cases, it may not be necessary to use a molecular clock at all, and yet still make predictions of time from molecular data. Buonagurio et
al. (1986) used a molecular clock based on nucleotide substitutions among isolates of influenza A viruses collected across 50 years to predict the origin of a Russian influenza strain that resulted from accidental release in an inoculation program. However, the influenza isolates collected through the 50-year sampling period clearly are not evolutionarily independent, and essentially the same result can be obtained by simply assuming that the isolates are part of a single lineage sampled through time. The rate of evolution doesn't even need to remain constant through time for this approach to work.

Yet another approach is to use coalescent models to estimate times of divergence. These models require a number of limiting assumptions, but under certain circumstances, it may be possible to obtain reasonable estimates of divergence time and calculate reasonable confidence limits for the estimate. For instance, Dorit et al. (1995) reported a complete lack of variation among 38 human males for a region of the Y chromosome. Assuming random mating, equilibrium population size, exponentially distributed bifurcation times, and a mutation rate estimated from other great apes, they used a coalescent model to estimate that the last common male ancestor for these individuals existed between 0 and 800,000 years ago (95% confidence limits). However, the distribution of expected coalescence times for sequences with no detected base substitutions is exponential (Tajima, 1983), so the mean estimate of coalescence time derived from the uniform Y chromosome sequences is a poor description of the likely outcomes. Despite this uncertainty, the mean estimate of 270,000 years ago presented by Dorit et al. has been widely reported as "fitting" the estimate of the last common female ancestor for human mtDNA (e.g., Pääbo, 1995). Vigilant et al. (1991) reported the latter date at 166,000-249,000 years ago, but this range of dates takes only the experimental error (and not the sampling error of the molecular clock calibration) into account. Based on other primate mtDNA clocks (e.g., Figure 13), we could approximate confidence limits of about 0-1,000,000 years ago for the date of the last common ancestor of human mtDNA. Although the confidence limits for the mtDNA and Y chromosome data clearly overlap, it is difficult to imagine any data that would not have "fit" such broad confidence limits. Furthermore, both estimates appear to be consistent with most current hypotheses of modern human origins.

The above discussion suggests that considerable caution is needed in predicting absolute times of divergence from molecular data. However, this in no way impedes the calculation of relative times of divergence, because many methods of phylogenetic inference are relatively insensitive to differences in rates of divergence (Chapter 11). As an example, although we may have limited confidence in the absolute time that the orangutan lineage diverged from the common ancestor of humans, chimps, and gorillas, molecular data leave little doubt that this event occurred before the latter three lineages diverged (Slightom et al., 1987). Fortunately, the vast majority of applications of molecular systematics do not depend upon calibrations (or even the existence) of molecular clocks. Differences in rates of divergence among lineages detract only from methods of analysis that require clocklike behavior of molecules, and alternative methods of analysis exist for all applications of molecular systematics except for the absolute estimation of time.

APPLICATION OF PHYLOGENIES FOR ANALYZING MACROEVOLUTIONARY PATTERNS: COMPARATIVE METHODS

The ultimate goal of most systematic studies is to provide insight into the historical structure of groups of organisms and the evolutionary processes that underlie diversity. Historically, these goals tended to be separated into the work of taxonomists and the work of population biologists. However, improvements in the confidence with which phylogenies can be estimated have been essential to the development of comparative studies that use statistical tests to account for the non-independence of taxa caused by common ancestry. Although previous tests have been proposed to compensate for relatedness of organisms
under study, it is only relatively recently that lack of independence has been recognized by non-systematists as a major concern in comparative studies. Felsenstein (1985c) was among the first to propose a statistical test for comparisons of continuous traits between organisms that used a phylogenetic hypothesis as a structural framework. He proposed using a series of independent contrasts to search for correlations in traits among terminal taxa and their ancestors.

The major limitation of this approach is the necessity for a reliable phylogenetic hypothesis that includes an estimate of branch lengths (expressed in units of expected variance of change) with which to standardize contrasts (Grafen, 1989). Felsenstein (1985c; 1988b) proposed methods to account for incompletely resolved phylogenies and to estimate branch lengths, but this has been the subject of considerable controversy (see Grafen, 1989, 1992; Harvey and Pagel, 1991; Pagel, 1992; Pagel and Harvey, 1992; Garland et al., 1991, 1992; Losos, 1994). Grafen (1989; 1992) proposed the "phylogenetic regression," which is based on Felsenstein's method but uses a likelihood approach to simultaneously estimate relationships between standardized independent contrasts and to transform branch lengths. This method has been contrasted (Grafen, 1992; Pagel and Harvey, 1992) with an alternative generalization of Felsenstein's approach for analysis of incompletely resolved phylogenies that was discussed by Harvey and Pagel (1991; fully described in Pagel, 1992). A more complete review and additional suggestions for the treatment of "hard" versus "soft" polytomies were given by Purvis and Garland (1993). The concern in these studies is that contrasts have been adequately standardized to account for changes along branches of differing lengths and that errors associated with interpreting polytomies have been reduced.

Another assumption of Felsenstein's method that has been the subject of discussion is that evolution proceeds through Brownian motion, so that expected variance of change in a trait is proportional to time. Martins and Garland (1991) performed simulation studies using null distributions to evaluate the reliability of Felsenstein's method (compared to alternative methods) using weighted squared-change parsimony to encompass varying models of evolution, and Garland et al. (1992) developed a method for assessing whether contrasts have been adequately standardized (i.e., that underlying models of evolution have been adequately accounted for). Losos (1994) proposed that the sensitivity of statistical comparisons to phylogenetic history could be evaluated by comparing results based on a large number of conceivable phylogenies (varying dichotomous branchings and branch lengths) for a given group to determine the importance of violation of these assumptions.

Many other methods exist for the analysis of correlated continuous characters based on phylogenetic history (e.g., autocorrelation: Cheverud et al., 1985; minimum evolution: Huey and Bennett, 1987, Martins and Garland, 1991; Garland et al., 1991; nested analysis of covariance: Bell, 1989). The reader is directed elsewhere for detailed comparisons of these methods (e.g., Harvey and Pagel, 1991) and for evaluations of their relative performance in simulation studies (e.g., Martins and Garland, 1992; Garland et al., 1992, 1993; Gittleman and Luh, 1992). Methods for analysis of correlations in discrete characters also have been developed (Ridley, 1983; Maddison, 1990; see also Harvey and Pagel, 1991). The choice of method may depend on whether the emphasis is on identifying patterns of correlation in traits among closely related taxa or in more specific hypothesis testing, for which adequate statistical power and at least some knowledge of the underlying distribution of character change is more critical.

Some questions require reconstruction of ancestral states for prediction of direction of changes (e.g., Donoghue, 1989; W.P. Maddison, 1990, 1991). Ryan and Rand (1995) reconstructed ancestral calls of frogs of the genus Physalaemus based on a phylogenetic analysis of mitochondrial DNA sequences (Figure 16). They then synthesized these calls electronically, and tested preferences for extant and ancestral calls among the various species. Ancestral gene and promoter sequences have also been reconstructed (literally) from inferred ancestral states from phylogenetic analyses, and then tested in living systems (Adey et al., 1994; Jermann et al., 1995; Stewart, 1995). Such
studies have been highly successful, and results from experimental phylogenies suggest that these reconstructions are likely to be highly accurate (Hillis et al., 1992). The ability to reconstruct ancestral genes, behaviors, and phenotypes provides a powerful tool for investigating functional changes through time.

Figure 16 Reconstruction of ancestral advertisement calls in frogs of the Physalaemus pustulosus and P. ephippifer groups (adapted from Ryan and Rand, 1995). The branch lengths are estimates of the numbers of changes in the mitochondrial 12S rDNA gene (averaged across all most-parsimonious reconstructions). The graph in the lower right indicates the scale of the axes for the sonagrams. See Ryan and Rand (1995) for details of the analysis.
[Figure 16 tip labels: P. enesefae, P. ephippifer, P. species A, P. pustulosus, P. petersi, P. species B, P. coloradorum, P. pustulatus. Sonagram time axis in msec.]

Other studies are concerned with comparisons among terminal taxa only. Lynch (1991b) developed a phylogeny-based likelihood method for partitioning mean phenotypes of taxa into heritable phylogenetic effects and non-heritable residual components to be used to infer constraints on macroevolutionary and microevolutionary processes, respectively. McLennan (1991) suggested that phylogenetic systematics could be used to uncover macroevolutionary patterns, which could be used to guide experimental investigation of microevolutionary processes. Brooks and McLennan (1991) provide a review of the types of questions that have been addressed using these methods.

Whichever method is used, the important point is that combining traditional comparative studies with phylogenetics has greatly increased the potential to predict and interpret patterns of evolution among organisms at various taxonomic levels. This approach has increased communication among biologists in a wide variety of fields, and recent applications are too diverse to cover in this brief overview. However, applications have included such widely divergent topics as genome size evolution (Sessions and Larson, 1987), coadaptation of physiological constraints and behavioral preferences (Huey and Bennett, 1987; Martins and Garland, 1991; Garland et al., 1991), allometry (e.g., Pagel and Harvey, 1989), experimental ethology of mating systems (Brooks and McLennan, 1991; McLennan, 1991), sexual selection on secondary sexual characters (Ryan and Rand, 1995), developmental genetics of homeobox
genes (Doyle, 1994), relationships between phylogenetic pattern and developmental processes (DeSalle and Grimaldi, 1993), use of well-characterized model systems for comparison of gene and gene system evolution (Kellogg and Birchler, 1993), evolution of coordinated regulation of genes encoding intermediate metabolic enzymes (Clark and Wang, 1994), and many others. The number and diversity of these kinds of studies should see a substantial increase with improvements in methods of phylogenetic reconstruction. However, it must be re-emphasized that these inferences will only be as sound as the phylogenies on which they are based. Highly sophisticated analyses of character evolution will not make up for non-rigorous estimation of the phylogeny of the taxa compared.

THE FUTURE OF MOLECULAR SYSTEMATICS

Molecular systematics has undergone a number of remarkable changes during the past three decades. These changes include not only technological developments and refinements (e.g., discovery and isolation of Type II restriction enzymes, development of DNA sequencing, discovery of a heat-stable DNA polymerase and its use in DNA amplification) but also major advances in issues of analysis. We expect these advances to continue, and we hope that discussions of the current limitations of data collection and analysis presented throughout this book will stimulate consideration of these issues. Improvements can come from technological developments per se, as well as from increased sophistication in the use of current methods. The interplay between our increased understanding of the evolutionary dynamics of molecules and their use as markers in systematic studies is fundamental to developing more efficient approaches to sampling, assaying variation, and interpreting results.

The next few decades will continue to be an exciting time for molecular systematics. The common ground between molecular population genetics and phylogenetics will continue to be explored as large databases for intraspecific variation in multiple genes are developed for model taxa and coalescence theory matures. We will almost certainly see many additional entire prokaryote genomes sequenced and compared, which will provide us with opportunities to examine questions of whole genome evolution (see Fleischmann et al., 1995). The push to sequence the human genome will continue to produce advancements in sequencing technology and foster comparative sequencing projects within eukaryotes. As more data are accumulated on the processes of genome evolution, this information can be incorporated into better and more reliable methods for estimating phylogenies. Molecular biology will continue to provide new information on the molecular basis of development, so that a true synthesis of molecular and morphological data can occur.

All levels of systematics are enjoying a renascence, as the importance of understanding historical relationships in interpreting patterns throughout biology is beginning to be widely appreciated. In addition, the recent emphasis and concern for biodiversity and conservation has placed more national and international attention on systematics (A. C. Wilson, 1986; Q. D. Wheeler, 1995). This emphasis and attention can have either a positive or a negative effect on systematics. The effect will be negative if, in the rush to obtain systematic information, scientific rigor is abandoned. The effect will be positive if, in the need for accurate information, a premium is placed on rigorous data collection and analysis. We hope that this book will help to stimulate the latter course of action.
Acknowledgments
Chapter 1 / Craig Moritz and David M. Hillis
We thank John Avise, John Gillespie, Mark Kirkpatrick, Barbara Mable, and the
late Allan Wilson for comments on various drafts of this chapter.
Chapter 2 / Peter R. Baverstock and Craig Moritz
We are grateful to S. Lavery and C. James for their comments and suggestions.
Chapter 3 / Herbert C. Dessauer, Charles J. Cole, and Mark S. Hafner
We thank Sheldon I. Guttman, Mia Molvray, and Elizabeth A. Zimmer for
assistance with the botanical section. Carol R. Townsend developed the folded
aluminum foil packets and cardboard sleeves for packaging tissues from small
organisms. Robert M. Zink and K. Elaine Hoagland read and gave valuable
advice on the manuscript. This chapter is an outgrowth of the Workshop on
Frozen Tissue Collections and Management supported by the National Science
Foundation (Dessauer and Hafner, 1984).
Chapter 4 / Robert W. Murphy, Jack W. Sites, Jr., Donald C. Buth and
Christopher H. Haufler
The refinement of electrophoresis methods was supported by grants to: RWM
from the Natural Sciences and Engineering Research Council (NSERC A3148),
the Department of Zoology, University of Toronto, and the National Institutes
of Health (Minority Biomedical Research Support program NIH RR08156-10, D.
J. Morafka [P.I.] and RWM); JWS, Jr. from the National Science Foundation
(U.S.A.) (BSR 85-09092, 88-2275, and DEB 91-19091), the National Geographic
Society (2803-84 and 3088-85), and the College of Biology and Agriculture and
the M. L. Bean Life Science Museum, BYU. P. T. Chippindale, K. A. Coates, L.
A. Lowcock, R. D. MacCulloch, G. S. Allen, and J. L. Sites provided assistance in
the laboratory in the preparation of this manuscript.
We thank the following persons for supplying information and helpful
advice on staining recipes: Donald E. Campton, Paul T. Chippindale, L. W.
Frick, Ronald G. Garthwaite, Carla Hass, Gennady P. Manchenko, Ronald H.
Matson, Donald C. Morizot, and James B. Shaklee. We are indebted to Donald
E. Campton and Ronald H. Matson who provided us with their own lists of
buffer formulas which we have consulted liberally. We also thank Herb
Dessauer, Stephen D. Ferris, James B. Shaklee, and Gregory S. Whitt for many
helpful comments over the years. Ross MacCulloch, Lisa Gilhooley, Marty
Rouse, Cathy Rutland, Cynthia Horkey, Scotty Allen, and Karen Ditz greatly
assisted with the preparation of the manuscript. Paul Chippindale, David
Hillis, Maurice Ringuette, Gregory S. Whitt and Ronald H. Matson provided
valuable editorial comments. Brian Thompson prepared the equipment
diagrams.
Chapter 5 / Stanley K. Sessions
I thank Maria Exposito, L. Ferrucci, David Green, Martina Guttenbach, D.
Hernandez-Verdum, S. Kohno, and Herbert Macgregor for permission to use
their photomicrographs. David Green kindly provided Protocol 9. I also thank
Nancy Brandow for her expert word processing skills.
Chapter 6 / Steven D. Werman, Mark S. Springer, and Roy J. Britten
We thank Charles G. Sibley for a helpful review of an early draft of the
manuscript. We also thank Adalgisa Caccone for suggestions and information
regarding TEACl methods.
Chapter 7 / Steve Palumbi
I thank contributors to the Simple Fool's Guide to PCR, including G. Grabowski,
Bailey Kessing, A. Martin, W. O. McMillan, E. Metz, S. Romano, and L. Stice.
Many people contributed suggestions, especially J. Palmer and R. Olmstead for
the cpDNA section, C. Moritz for nuclear introns, and D. Hillis throughout the
chapter. The work in Palumbi's laboratory was funded by grants from the NSF
and the University of Hawaii Foundation.
Chapter 8 / Thomas E. Dowling, Craig Moritz, Jeffrey D. Palmer, and Loren
H. Rieseberg
TED and CM are grateful to W. M. Brown for his inspiration and guidance. We
thank R. Broughton, S. Degnan, N. FitzSimmons, L. Joseph, S. Lavery, R. Slade,
C. A. Tibbets, and R. Wood for comments on the manuscript, C. Armstrong for
providing methodological information on detection of SSCPs, D. Lambert for
the minisatellite gel in Figure 2A, Damien Broderick for providing Figure 6, and
S. Clegg, S. Degnan, A. Heideman, and N. FitzSimmons for help with illustra-
tions. Supported by the National Science Foundation (USA), the U.S. National
Institutes of Health, and the Australian Research Council.
Chapter 9 / David M. Hillis, Barbara K. Mable, Allan Larson, Scott K. Davis,
and Elizabeth Zimmer
We thank M. Goodman, M. M. Miyamoto, J. Slightom, and D. Tagle for
comments and suggestions on the manuscript. Loren Ammerman, Chris Austin,
Marty Badgett, and Mike Dixon assisted with figure preparation. Bernie
Degnan and Andrew Hugall provided helpful advice on cycle sequencing and
Marty Badgett provided advice on several of the cloning and sequencing proto-
cols. Research support to the authors from the National Science Foundation is
gratefully acknowledged.
Chapter 10 / Bruce S. Weir
This investigation was supported in part by NIH Grant GM 45344. Helpful
comments were made by Wyatt Anderson, Janet Chaseling, Clark Cockerham,
and the editors.
Chapter 11 / David L. Swofford, Gary J. Olsen, Peter J. Waddell, and David M.
Hillis
This chapter has benefitted from the suggestions and input of numerous
individuals, including Jim Bull, Mike Charleston, Keith Crandall, Mike Hendy,
Peter Lockhart, Barbara Mable, David Maddison, Wayne Maddison, Jim
McGuire, David Penny, and Mike Steel. Joe Felsenstein, John Huelsenbeck, and
Paul Lewis have been especially helpful with issues related to maximum likeli-
hood inference and statistical testing. In addition, we thank the students in the
molecular evolution courses at the Woods Hole Oceanographic Laboratory and
the University of Texas for useful comments on material in this chapter.
Chapter 12 / David M. Hillis, Barbara K. Mable, and Craig C. Moritz
We thank John Avise, Jim Bull, John Gillespie, Mark Kirkpatrick, and the late
Allan Wilson for comments on various drafts of this chapter. Mike Ryan provid-
ed the file for the preparation of Figure 16. Support for research discussed in
this chapter was provided by the National Science Foundation and the Centers
for Disease Control and Prevention.
A ampere. Unit of electrical current, defined as the current which, if maintained in two straight parallel conductors of infinite length, of negligible circular cross-section, and placed 1 m apart in a vacuum, would produce between these two conductors a force equal to 2 × 10⁻⁷ N/m. Equivalent to the current that passes in a resistance of 1 Ω when a potential difference of 1 V is applied.
°C degrees Celsius (centigrade scale). Unit of temperature based on scale in which 0°C = ice point of water, 100°C = steam point of water (at atmospheric pressure).
C coulomb; unit of charge; defined as that quantity of charge that flows across any cross-section of wire in 1 sec when there is a steady current of 1 A
cal calorie; unit of heat; the amount of heat necessary to raise the temperature of 1 g of water from 14.5°C to 15.5°C when the water is at atmospheric pressure
Cal kilocalorie; 1 Cal = 10³ cal
Ci curie; unit of radioactivity; 1 Ci = 3.70 × 10¹⁰ disintegrations/sec
cm centimeter; 1 cm = 10⁻² m
cpm counts per minute; detected (typically by a Geiger counter or a scintillation counter) disintegrations per minute (see Ci)
g gram; unit of mass; originally the mass of 1 cm³ of water at 4°C; now defined by reference to a standard kilogram, a 4-cm-high, 4-cm-wide cylinder of platinum-iridium kept at the Bureau International des Poids et Mesures in France
g unit of gravitational force; 1 g = the gravitational force of Earth
hr hour; 1 hr = 60 min
J joule; unit of work; 1 J = 1 N·m
kg kilogram; 1 kg = 10³ g
km kilometer; 1 km = 10³ m
L liter; unit of liquid volume; 1 L = 1,000 cm³
M molar; unit of concentration; moles of solute per liter of solution
m meter; unit of length; defined as 1,650,763.73 wavelengths of the orange-red radiation of ⁸⁶Kr
mA milliampere; 1 mA = 10⁻³ A
mCi millicurie; 1 mCi = 10⁻³ Ci
µg microgram; 1 µg = 10⁻⁶ g
mg milligram; 1 mg = 10⁻³ g
min minute; 1 min = 60 sec
µl microliter; 1 µl = 10⁻⁶ L
ml milliliter; 1 ml = 10⁻³ L
µM micromolar; 1 µM = 10⁻⁶ M
mM millimolar; 1 mM = 10⁻³ M
µm micrometer; 1 µm = 10⁻⁶ m
µmol micromole; 1 µmol = 10⁻⁶ mol
mmol millimole; 1 mmol = 10⁻³ mol
mol mole; the amount of substance that contains the same number of formula units as there are ¹²C atoms in 12.00000 g ¹²C (6.0225 × 10²³ or Avogadro's number)
N Newton; unit of force; 1 N = 1 kg·m/sec²
N normal; a solution containing one equivalent weight of the constituent in question in 1 L of solution
ng nanogram; 1 ng = 10⁻⁹ g
nm nanometer; 1 nm = 10⁻⁹ m
nmol nanomole; 1 nmol = 10⁻⁹ moles
Ω ohm; unit of resistance; 1 Ω = 1 V/A
pg picogram; 1 pg = 10⁻¹² g
pmol picomole; 1 pmol = 10⁻¹² moles
rpm revolutions per minute
S Svedberg unit; unit of sedimentation rate
sec second; unit of time; originally 1/86,400 of a mean solar day; now defined as 9,192,631,770 vibrations of radiation from ¹³³Cs
U enzyme unit
V volt; unit of electric potential difference; 1 V = 1 J/C
W watt; unit of power; 1 W = 1 J/sec
Glossary and Abbreviations
A 1. In DNA or RNA sequences: adenine. 2. In protein sequences: alanine.
AAR Amino acid replacement.
Acetone powders Preparations obtained by grinding tissues in ice-cold acetone and allowing the acetone to evaporate from the resultant solids.
Additive distances A set of distances between pairs of sequences or taxa that will precisely fit a unique, additive phylogenetic tree. Defined mathematically by satisfying the four-point condition (see Chapter 11).
Additive tree A phylogenetic tree in which the distance between any two points is the sum of the lengths of the branches along the path connecting two points.
AGE Agarose gel electrophoresis.
Alignment The juxtaposition of amino acids or nucleotides in homologous molecules to maximize similarity or minimize the number of inferred changes among the sequences. Alignment is used to infer positional homology (qv) prior to or concurrent with phylogenetic analysis (see Chapters 9 and 11).
Allele A particular form of a gene at a particular locus.
Allele genealogy See gene tree.
Allopatric Occurring in geographically separate areas. See sympatric, parapatric.
Allozyme An allele of an enzyme.
Alu repeat The most abundant interspersed repeated DNA family of primates.
AMPPD Disodium 3-(4-methoxyspiro-[1,2-dioxetane-3-2'-tricyclo[3.3.1.1^3,7]decan]-4-yl) phenyl phosphate.
Anion A negatively charged molecule.
Anode The positive electrode in an electrolytic cell (such as an electrophoresis chamber) toward which anions migrate.
ANS Anilino naphthalene sulphonate.
Antibody A large protein made in response to a foreign antigen (generally a protein).
Antigen Any molecule that elicits an antibody response.
Antigenic site A region of 5 to 10 amino acids on an antigen to which antibodies can be elicited.
Apomorphy A derived character state.
Area cladogram A tree that depicts historical relationships among geographic areas.
ATE Acetate-Tris-EDTA buffer (see Chapter 9, Appendix).
Autapomorphy A derived character state unique to a particular taxon.
Autoradiograph An image produced on X-ray film by placing a radioactive object (such as a gel containing labeled DNA fragments) next to film in a light-proof container.
Autosome A chromosome other than a sex chromosome.
Avidin-biotin Glycoprotein-vitamin complex used for histochemical staining with non-radioactively labeled probes. Avidin non-immunologically binds four molecules of biotin, which allows amplification of hybridization signal when used in conjunction with biotin-labeled probes, and anti-avidin and/or anti-biotin antibodies which are themselves conjugated with biotin or fluorochromes (see Chapter 5).
B In DNA or RNA sequences: any nucleotide except adenine.
Base composition The relative proportions of the four respective nucleotides in a given sequence of DNA or RNA.
BCIP 5-Bromo-4-chloro-3-indolylphosphate.
BGD Bromcresol green dye.
Bifurcation A node in a tree that connects exactly three branches. If the tree is directed (rooted), then one of the branches represents an ancestral
lineage and the other two branches represent descendent lineages. Synonym: dichotomy.
Biotinylated Labeled with biotin. Biotinylated probes are used for non-radioactive histochemical staining or filter hybridization visualization (see Chapter 5). The technique also may be used to label oligonucleotide primers used for PCR and sequencing (see Chapter 7).
BN buffer Bicarbonate-nonidet buffer (see Chapter 6, Appendix).
Bootstrapping See nonparametric bootstrapping and parametric bootstrapping.
bp Base pair.
BPA Brooks parsimony analysis (cospeciation analysis). A method for encoding and combining information from several independent phylogenetic trees for the purpose of inferring coevolutionary patterns.
BrdU Bromodeoxyuridine.
BSA Bovine serum albumin.
Buffy coat A thin layer of white blood cells that lies above the erythrocytes after vertebrate blood is centrifuged.
Bulked segregate analysis Pooling DNA from each parental species and screening bulked sample for polymorphisms using RAPD markers (see Chapter 8).
C 1. In DNA or RNA sequences: cytosine. 2. In protein sequences: cysteine.
Cathode The negative electrode in an electrolytic cell (such as an electrophoresis chamber) toward which cations migrate.
Cation A positively charged molecule.
C-bands Dark bands on chromosomes produced by strong alkaline treatment at high temperature, followed by incubation in sodium citrate solution, followed by Giemsa staining. C-bands generally correspond to regions of constitutive heterochromatin.
cDNA See complementary DNA.
Central branch The interior branch connecting the two internal nodes of an unrooted phylogenetic tree of four taxa.
Chaotropic agent In DNA-DNA hybridization, an agent that reduces thermal stability of base pairing in DNA.
Character A variable feature that in any given taxon or sequence takes one out of a set of two or more different states (e.g., eye color or amino acid position 12 of a particular protein).
Character compatibility A method of phylogenetic analysis that seeks the largest clique of characters that can be fitted to a common tree so that each character state arises only once (see Chapter 11).
Character polarity The inferred direction of change of a character state in a phylogenetic tree; usually determined by reference to the character state in an outgroup.
Character state The specific value taken by a character in a specific taxon or sequence (e.g., green eyes or glycine at position 12 of a particular protein). See character.
Character-state tree A description of the transitions among the states of a multistate character, especially when the transitions do not define a linear series of states.
Chiasmata Sites of the mutual switching of non-sister chromatids of homologous chromosome segments observed during prophase and metaphase of meiosis I.
Chromatid The eukaryotic chromosome prior to replication, or one of the two longitudinal subunits of a chromosome after replication.
Chromomere A region on a chromosome of densely packed chromatid fibers that produces a dark band (as on a polytene chromosome).
Chromosome painting A method for the non-radioisotopic detection of hybridized chromosomal probes using various fluorochromes. The method combines immunochemistry with in situ hybridization. It is used for mapping sequences on chromosomes and for identifying chromosomal homologies between species (see Chapter 5).
Chromosome repatterning hypothesis One of two hypotheses concerning the mode of evolutionary change in the molecular structure of chromosomes. According to this hypothesis, interspecific differences in the chromosomal location of certain repetitive DNA sequences reflect the redistribution of chromosomal elements within karyotypes. This hypothesis predicts that evolutionary changes in sequence location should be relatively conservative. See homosequentiality hypothesis and Chapter 5.
CI Chloroform-isoamyl alcohol (see Chapter 9, Appendix).
CIC See cold-induced constriction.
CISS Chromosome in situ suppression hybridization. A method for fluorescence in situ hybridization (see FISH) that utilizes probes from DNA libraries of flow-sorted chromosomes to search for DNA sequence homologies of whole or partial chromosomes. CISS is used to identify homeologous chromosomes in different species or in hybrids (see Chapter 5).
Cladogram A tree that depicts inferred historical branching relationships among entities. Unless otherwise stated, the depicted branch lengths in a
cladogram are arbitrary; only the branching order is significant. See phylogram.
Cluster analysis A rapid method of hierarchically grouping taxa or sequences on the basis of similarity or distance.
Coalescence The evolutionary process viewed backward through time, so that allelic diversity is traced back through mutations to ancestral alleles. Coalescent theory can be used to make predictions about effective population sizes, ages and frequencies of alleles, selection, rates of mutation, or time to common ancestry of a set of alleles.
Coancestry coefficient The correlation of genes of different individuals in the same population; a measure of the relatedness of individuals within populations (symbolized by θ or FST); see Chapter 10.
Cold-induced constriction A chromosome-specific constriction induced in certain species with large chromosomes by prolonged treatment of the organism at 0.5-2.5°C in the presence of colchicine.
Complementary DNA DNA reverse transcribed from an RNA template.
Concerted evolution The generation and maintenance of homogeneity among members of a family of DNA repeats within a species or population.
Conformational isozymes Multiple forms of a single gene product that differ in secondary or tertiary structure.
Congruence Agreement among data or data sets.
Consistency In the context of statistical inference: convergence on the correct answer using a particular method as the sample size becomes infinite. Lack of consistency indicates a critical departure from the assumptions (explicit or implicit) of the method of analysis. All methods are consistent when their assumptions are met; all methods become inconsistent when their assumptions are violated sufficiently. Contrast with efficiency.
Consistency index A measure of the amount of homoplasy exhibited by a character or set of characters on a tree, defined as the sum of the minimum individual character ranges divided by the observed number of changes. If there is no homoplasy, these quantities will be equal, so that the consistency index achieves its maximum value of one.
Constitutive heterochromatin Regions on chromosomes consisting mostly of highly repeated, noncoding sequences.
C₀t Initial concentration of DNA in a DNA reassociation experiment (in mol/L) multiplied by time of incubation in seconds.
C₀t plot A plot of percentage of single-stranded DNA versus log of C₀t.
cpDNA Chloroplast DNA.
Criterion In DNA-DNA hybridization, the stringency of reassociation of single-stranded DNA measured by the difference between the Tm of perfect duplexes in the incubation buffer and the temperature of incubation (see Chapter 6).
Cryptic allele An undetected (by a particular technique) variant at a gene locus.
CTAB Hexadecyltrimethylammonium bromide.
C-value A measure of haploid DNA content per cell.
D 1. In DNA or RNA sequences: any nucleotide except cytosine. 2. In protein sequences: aspartic acid.
DAB Diaminobenzidine tetrahydrochloride.
DAPI Diamidino-2-phenylindole.
Degenerate primers Oligonucleotides designed to include a mixture of different sequences to allow for variation at particular nucleotide positions in a target sequence.
Dendrogram Any branching, treelike diagram.
DEP Diethyl pyrocarbonate.
DGGE Denaturing gradient gel electrophoresis. A specialized form of electrophoresis using polyacrylamide gels with gradients of denaturants, used to detect differences in the stability of PCR products. The double-stranded product moves through the gel until a point in the denaturing gradient at which it becomes single-stranded. DNA segments differing by a single nucleotide often can be distinguished using this technique.
Dichotomy See bifurcation.
Diplotene bivalents Pairs of homologous chromosomes associated via chiasmata during the diplotene substage (when chiasmata are first formed) of prophase I of meiosis.
Direct fluorescence method A method for non-radioactive immunochemical visualization of hybridized probes. See Indirect fluorescence method and Chapter 5.
Disequilibrium coefficient A term that describes the difference between the joint frequency of two or more alleles and the product of the frequencies of the separate alleles.
Dissimilarity A generic measure of the difference between two objects, usually measured on a scale of 0 to 1.
Distance A measure of the difference between two objects, usually measured on a scale of 0 to infinity.
Distance estimates A phrase used to emphasize the potentially imperfect reflection of evolutionary history in distance values inferred from experimental or sequence data.
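As a rough illustration of the consistency index entry in this part of the glossary, the Python sketch below divides the summed minimum number of changes by the summed observed number of changes. The function name and the step counts are hypothetical values chosen only for illustration; they are not taken from any data set in this book.

```python
def consistency_index(min_changes, observed_changes):
    """Sum of per-character minimum changes divided by the
    sum of changes actually observed on the tree."""
    if len(min_changes) != len(observed_changes):
        raise ValueError("one minimum and one observed count per character")
    for minimum, observed in zip(min_changes, observed_changes):
        # A character cannot change fewer times than its minimum range.
        if observed < minimum:
            raise ValueError("observed changes cannot be below the minimum")
    total_observed = sum(observed_changes)
    if total_observed == 0:
        raise ValueError("no changes observed; the index is undefined")
    return sum(min_changes) / total_observed


# Hypothetical step counts for three characters on one tree.
no_homoplasy = consistency_index([1, 1, 2], [1, 1, 2])
some_homoplasy = consistency_index([1, 1, 2], [1, 2, 2])
```

With no extra changes the index is one; the second call, where one character changes once more than its minimum, gives a value below one.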
DMSO Dimethylsulfoxide.
DNA fingerprinting In the broad sense, any fine-scale DNA analysis that allows identification of samples at the level of the individual. The term has been specifically applied to analyses such as VNTRs, RAPDs, microsatellites, and minisatellites.
DNA polymerase An enzyme that catalyzes synthesis of DNA under direction of single-stranded DNA template.
dNTP Deoxyribonucleotide.
Downstream 3' of the target sequence.
Driver Unlabeled, fractionated single-copy DNA used in DNA-DNA hybridization experiments. See tracer.
dsDNA Double-stranded DNA.
E In protein sequences: glutamic acid.
EB Ethidium bromide.
EDTA Ethylene diamine tetra-acetate.
Efficiency In the context of statistical inference: A measure of how quickly a particular method converges on the correct solution as more data are applied to the problem.
Electrodecantation The settling of proteins of high molecular weight toward the bottom of a horizontal gel during electrophoresis.
Electroendosmosis Movement of ionized buffer solution through a gel caused by gel charge groups.
Electromorph An electrophoretically indistinguishable class of isozymes. Electromorphs represent alleles if all differences between variants result in changes in electrophoretic migration rate.
Electrophoresis The separation of molecules in an electric field.
EPIC Exon priming, intron crossing. Refers to primers, designed to amplify intron regions, that are based on conserved exon sequences flanking the target introns.
Epigenetic All processes relating to the expression and interaction of genes.
Evolutionarily significant unit Historical groups of populations recognized for the purpose of prioritizing conservation actions and determining long-term strategies. Equivalent to species under most lineage concepts of species. See species.
Evolutionary distance An idealized measure of the evolutionary separation of sequences or taxa, such as the total number of substitutional events. They are defined so that the values are additive and hence will precisely fit an additive evolutionary tree.
Exon A segment of an interrupted gene that is represented in the mature mRNA.
F In protein sequences: phenylalanine.
FACS Fluorescence activated cell sorting. A technique that allows large scale isolation of particular chromosomes of a given karyotype (see Chapter 5).
FCS Fetal calf serum.
Felsenstein zone A region (in parameter space) of inconsistency for a given phylogenetic method under a given evolutionary model.
FIS See inbreeding coefficient.
FIT See inbreeding coefficient.
FISH Fluorescence in situ hybridization. Refers to a number of methods for non-isotopic in situ hybridization (i.e., methods using non-radioactively labeled probes). See ISH and Chapter 5.
FITC Fluorescein isothiocyanate.
Fixation index See inbreeding coefficient.
Fourfold degenerate codons Codons for which the third base position can be occupied by any of the four nucleotides without altering the encoded amino acid.
FST See coancestry coefficient.
F statistics A set of coefficients that describes how genetic variation is partitioned within and among populations and individuals. See coancestry coefficient, inbreeding coefficient, and Chapter 10.
G 1. In DNA or RNA sequences: guanine. 2. In protein sequences: glycine.
Gaps Editing symbols that are inserted into sequences in the process of alignment in order to compensate for presumptive insertion and deletion events.
G-bands Dark bands on chromosomes produced by Giemsa staining. G-bands occur primarily in AT-rich regions.
Gene conversion A genetic process by which one sequence replaces another at an orthologous or paralogous locus. May result from mismatch repair in heteroduplexes.
Gene tree A branching diagram that depicts the known or (usually) inferred relationships among an historically related group of genes or other nucleotide or amino acid sequences.
Genomic library A mixture of cloned DNA fragments (usually in viral or cosmid vectors) that together represent virtually all of an organism's DNA. Partial or subgenomic libraries contain only restriction fragments of a certain size range.
GTE Glucose-Tris-EDTA buffer (see Chapter 9, Appendix).
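The entry for fourfold degenerate codons can be made concrete with a short Python sketch. It assumes the standard nuclear genetic code (mitochondrial and other variant codes have different fourfold families), and the function names and the example sequence are hypothetical, introduced only for illustration.

```python
# First two bases of the eight codon families that are fourfold
# degenerate under the standard genetic code (an assumption of this sketch).
FOURFOLD_PREFIXES = {
    "CT": "Leu", "GT": "Val", "TC": "Ser", "CC": "Pro",
    "AC": "Thr", "GC": "Ala", "CG": "Arg", "GG": "Gly",
}


def is_fourfold_degenerate(codon):
    """True if any base at the third position encodes the same amino acid."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA spelling
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError("codon must be three unambiguous nucleotides")
    return codon[:2] in FOURFOLD_PREFIXES


def fourfold_sites(sequence):
    """0-based indices of fourfold degenerate third positions, reading
    the sequence in frame from its first base; a trailing partial
    codon is ignored."""
    sites = []
    for start in range(0, len(sequence) - 2, 3):
        if is_fourfold_degenerate(sequence[start:start + 3]):
            sites.append(start + 2)
    return sites


# Hypothetical in-frame fragment: ATG (Met), GGA (Gly), CTT (Leu).
example_sites = fourfold_sites("ATGGGACTT")
```

Restricting an analysis to such sites is one way to sample positions where substitutions do not change the encoded protein.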
Glossary and Abbreviations 553
H 1. In DNA or RNA sequences: any nucleotide except guanine. 2. In protein sequences: histidine.
H Symbol for average heterozygosity.
HAP Hydroxyapatite. HAP is used in columns to separate single-stranded DNA from double-stranded DNA (see Chapter 6).
Hardy-Weinberg equilibrium An equilibrium of genotypes achieved in populations of infinite size (in which there is no immigration, emigration, selection, or mutation) after one generation of panmictic mating. With two alleles A and B of frequency p and q, respectively, the Hardy-Weinberg equilibrium frequencies of the genotypes AA, AB, and BB are p^2, 2pq, and q^2, respectively.
Heterochromatin Chromosomal segments or whole chromosomes that generally exhibit a condensed state throughout interphase and late replication. See constitutive heterochromatin.
Heteroduplex A hybrid DNA-DNA molecule formed between (presumably homologous) sequences.
Heterologous Homologous molecule from a species other than that which is being examined.
Heteroplasmy The containment by one cell or individual of more than one type of a particular organellar DNA (e.g., mtDNA or cpDNA).
Heteropolymer A multimeric protein formed from products of multiple alleles.
Heterospecific Adjective used to describe a probe derived from a species other than the species under study, or to refer to any reaction between homologous molecules from different species.
Heuristic method Any analysis procedure that does not guarantee finding the optimal solution to a problem (usually used to obtain a large increase in speed over exact methods).
HKY model The DNA substitution model of M. Hasegawa, H. Kishino, and T. Yano (1985; see Chapter 11).
Homeologs Chromosomes that are homologous among species.
Homoduplex A hybrid DNA-DNA molecule formed between sequences from the same individual (or sometimes, species).
Homogeneous Markov process A process that follows a Markov model (qv) and does not vary through time (e.g., in different parts of a phylogenetic tree).
Homology Common ancestry of two or more genes or gene products.
Homomeric isozyme An enzyme composed of multiple identical polypeptide chains.
Homoplasy A collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor. These include convergence, parallelism, and reversal.
Homosequentiality hypothesis One of two hypotheses concerning the mode of evolutionary change in the molecular structure of chromosomes. This hypothesis holds that differences in the apparent chromosomal locations of various sequences reflect localized amplification or diminution of sequences with fairly stable chromosomal locations and predicts rapid and reversible changes in the cytologically visible chromosomal location of certain kinds of sequences. See chromosome repatterning hypothesis and Chapter 5.
Homospecific Adjective used to describe a probe derived from the same species that is under study, or to refer to any reaction between homologous molecules from the same individual or species.
HWE Hardy-Weinberg equilibrium.
Hybrizymes Alleles found in hybrid zones which are rare or absent in populations of the parental (non-hybrid) species.
Hybridoma Cell line formed by fusing a B lymphocyte with a myeloma (a tumor cell line derived from a lymphocyte). Used in the production of monoclonal antibodies.

I In protein sequences: isoleucine.
IgG A type of antibody commonly used for immunochemical staining.
Immunoglobulin (Ig) An antibody composed of two identical light chains and two identical heavy chains of amino acids.
Inbreeding coefficient The correlation of genes within individuals (symbolized by F or F_IT; this is the overall inbreeding coefficient), or the correlation of genes within individuals within populations (symbolized by f or F_IS; this is the within-population inbreeding coefficient; see Chapter 10). F_IS is also known as the fixation index. Both F_IS and F_IT are measures of deviation from Hardy-Weinberg proportions; positive values indicate a deficiency of heterozygotes whereas negative values indicate an excess of heterozygotes.
Indel Insertion/deletion event.
Indirect fluorescence method A method for non-radioactive immunochemical visualization of hybridized probes (see Chapter 5).
Ingroup An assumed monophyletic group, usually comprising the taxa of primary interest.
In situ hybridization The annealing of a mobile, labeled nucleic acid probe to a stationary nucleic acid target (often whole chromosomes) to form base-paired duplexes.
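As an illustration of the Hardy-Weinberg equilibrium and inbreeding coefficient entries, the expected genotype frequencies for two alleles can be computed directly from the allele frequency p. The sketch below is not part of the original glossary: the function names are hypothetical, and the inbreeding coefficient uses the common estimator F = 1 - H_obs / H_exp, which is assumed here rather than taken from the text.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, AB, BB) for allele A at frequency p.

    Assumes exactly two alleles, so the frequency of B is q = 1 - p.
    """
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def inbreeding_coefficient(observed_het, p):
    """Estimate F as 1 - H_obs / H_exp (an assumed, commonly used estimator).

    Positive values indicate a deficiency of heterozygotes relative to
    Hardy-Weinberg proportions; negative values indicate an excess.
    """
    _, expected_het, _ = hardy_weinberg(p)
    return 1.0 - observed_het / expected_het


if __name__ == "__main__":
    aa, ab, bb = hardy_weinberg(0.7)
    print(f"AA={aa:.2f}  AB={ab:.2f}  BB={bb:.2f}")
    print(f"F={inbreeding_coefficient(0.21, 0.7):.2f}")
```

With p = 0.7 the expected heterozygote frequency is 2pq = 0.42, so an observed heterozygote frequency of 0.21 gives a positive F, consistent with the sign convention stated in the inbreeding coefficient entry.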
Interior branches Branches in a phylogenetic tree that do not connect to a tip of the tree.
Interior nodes The branch points in a phylogenetic tree. If the tree is rooted, the root node is also an interior node.
Intron Non-coding region of an interrupted gene that is transcribed into RNA but is excised during processing of the primary transcript into a mature mRNA.
Isoelectric point The pH at which the positive and negative charges of a protein are equal.
Isoloci Two or more loci of a multilocus enzyme system that produce products of the same electrophoretic mobility.
Isology Sequence similarity of aligned nucleic acids or polypeptides; the similarity may be due to homology or convergence.
Isomers Molecules with the same chemical formula but different molecular structures.
Isoschizomer Restriction endonuclease with the same recognition sequence as another restriction endonuclease.
Isozyme Enzyme with the same chemical function as another enzyme, but differing in primary, secondary, tertiary, and/or quaternary structure.

Jackknifing A statistical method of numerical resampling based on deleting a portion of the original observations in subsequent samples (see Chapters 10 and 11).
Jukes-Cantor model The DNA substitution model of T. H. Jukes and C. R. Cantor (1969) that assumes all possible nucleotide substitutions are equally likely (see Chapter 11). The Jukes-Cantor model is a special case of the Kimura model (qv).

K 1. In DNA or RNA sequences: guanine or thymine (uracil in RNA). 2. In protein sequences: lysine.
KAc Potassium acetate.
Karyotype A pictorial or diagrammatic representation of the metaphase chromosomes of the complement of an individual or a species.
kb Kilobase pairs, or 1000 base pairs of DNA.
Kimura model The DNA substitution model of M. Kimura (1980) that assumes all transitions are equally likely and all transversions are equally likely (see Chapter 11).

L In protein sequences: leucine.
Lambda (λ) bacteriophage A virus of the bacterium E. coli that is widely used as a cloning vector in molecular biology. It is a double-stranded DNA virus approximately 50 kb in length, with single-stranded complementary ends that allow the virus to circularize after entering its bacterial host. The DNA is packaged into a protein coat in the mature virus.
Lampbrush chromosome A bivalent at diplotene stage in a female meiotic cell; found in the oocytes of most animals.
L-broth Luria broth (see Chapter 9, Appendix).
LINE Acronym for long interspersed element. An interspersed repetitive DNA sequence usually >5,000 bp.
Linearly ordered character A multistate character in which the allowed transitions between states form a linear chain.
Linkage disequilibrium Departure from the predicted frequencies of multiple locus gamete types assuming alleles are randomly associated.
Lyophilization Drying from the frozen state.
Lysogenic cycle A cycle of phage growth in which the phage become a stable prophage component of the bacterial genome.
Lytic cycle A cycle of phage growth in which the phage are replicated many times, resulting in eventual destruction of the host bacterial cell and release of the progeny phage.

M 1. In DNA or RNA sequences: adenine or cytosine. 2. In protein sequences: methionine.
M13 A filamentous bacteriophage of the bacterium E. coli that is widely used for cloning and sequencing. The genome of M13 is circular and approximately 6,500 bp in length. M13 occurs in both a double-stranded replicative form (used for cloning small fragments) and a single-stranded form (used for Sanger dideoxy-sequencing); see Chapter 9.
MAB Monoclonal antibody.
Management unit Demographically independent sets of populations recognized for the purpose of management of exploited or endangered species, e.g., for population monitoring and manipulation. Broadly equivalent to "stocks" and recognized as sets of populations showing significant divergence in allele frequencies from other conspecific sets.
Markov model A model in which the probability of a change from one state to another does not depend on the previous history of the state.
Maximum likelihood A criterion for estimating a parameter from observed data under an explicit model. In phylogenetic analysis, the optimal tree under the maximum likelihood criterion is the tree that is the most likely to have occurred given the observed data and the assumed model of evolution.
Maximum parsimony A criterion for estimating a parameter from observed data based on the principle of minimizing the number of events needed to explain the data. In phylogenetic analysis, the optimal tree under the maximum parsimony criterion is the tree that requires the fewest number of character-state changes (which may be differentially weighted across characters and/or character states). Often simply called parsimony.
MEM Eagle's minimum essential medium.
Methylation The chemical process of adding a methyl group to a molecule.
Microsatellites A subset of VNTRs characterized by very short (2-5 bp) tandem repeats with a high rate of variation in copy number among individuals. These loci tend to be randomly distributed throughout the genome and are subject to replication slippage that leads to length variation (see Chapter 8).
Minimum evolution 1. Originally, a name applied to a phylogenetic optimality criterion developed by Cavalli-Sforza and Edwards (1967). 2. The name applied by Rzhetsky and Nei (1992a) to a phylogenetic optimality criterion that was originally described by Kidd and Sgaramella-Zonta (1971). The optimal tree under this criterion is the tree with the smallest sum of branch lengths as estimated under the least-squares criterion, with negative branch lengths disallowed.
Minisatellites A subset of VNTRs characterized by tandem repeat units of approximately 20 bp. These loci tend to be concentrated close to telomeres and vary in length and sequence because of intramolecular or interallelic recombination and gene conversion (see Chapter 8).
Mitogen A substance that stimulates mitosis.
Molecular clock hypothesis The hypothesis that molecules evolve in direct proportion to time, so that differences between homologous DNA sequences or proteins can be used to estimate the time elapsed since the two molecules (or the taxa that contain them) last shared a common ancestor.
Molecular systematics The detection, description, and explanation of molecular biological diversity, both within and among species.
Monoclonal antibody A single antibody produced in quantity by cultured lines of hybridoma cells.
Monomeric protein A protein that contains a single polypeptide chain.
Monophyletic A group of taxa that contains an ancestor and all of its descendants.
Most-parsimonious reconstruction (MPR) Any assignment of ancestral states to characters on a tree so that the change of each character is minimized (subject to any constraints being enforced).
MPR Most-parsimonious reconstruction.
mRNA Messenger RNA.
mtDNA Mitochondrial DNA.
MTT 3-(4,5-Dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide.
Multifurcation A node in a tree that connects more than three branches. If the tree is directed (rooted), then one of the branches represents an ancestral lineage and the remaining branches represent descendent lineages. A multifurcation may represent a lack of resolution because of too few data available for inferring the phylogeny (in which case it is said to be a soft multifurcation) or it may represent the hypothesized simultaneous splitting of several lineages (in which case it is said to be a hard multifurcation). Synonym: polytomy.
Multimeric protein A protein that contains multiple polypeptide chains.
Multiplexing Any process that conducts repetitive tasks simultaneously on many objects. In the context of DNA sequencing, multiplexing refers to combined sequencing of numerous clones on a single gel, each of which is incorporated into a distinct vector. Multiplexing is also used to refer to methods of analysis such as simultaneous amplification of several microsatellite loci via PCR.
MULTIPRINS Multiple primed in situ hybridization. Modification of the PRINS technique (qv) for use with multiple probes detected with different fluorochromes (see Chapter 5).

N 1. In DNA or RNA sequences: an unknown nucleotide. 2. In protein sequences: asparagine.
NaAc Sodium acetate.
NAD β-nicotinamide adenine dinucleotide.
NADH β-nicotinamide adenine dinucleotide, reduced form.
NADP β-nicotinamide adenine dinucleotide phosphate.
NBT Nitro blue tetrazolium.
nDNA Nuclear DNA.
N_e Effective population size.
Neighbor joining A heuristic method for obtaining a point estimate of a minimum evolution tree (see Chapter 11).
Network A graph that depicts relationships among entities and contains cycles (reticulations).
Nonidet Non-ionic detergent (e.g., NP-40).
Nonparametric bootstrapping A statistical method based on repeated random sampling with replacement from an original sample to provide a collection of new pseudoreplicate samples, from which sampling variance can be estimated (see Chapters 10 and 11).
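The resampling step described in the nonparametric bootstrapping entry can be sketched in a few lines. This example is an illustration only and is not drawn from the book: it bootstraps the mean of a small numeric sample rather than a phylogenetic data set, and the function name, the default number of replicates, and the fixed seed are all assumptions.

```python
import random
import statistics


def bootstrap_variance(sample, n_reps=1000, stat=statistics.mean, seed=0):
    """Estimate the sampling variance of `stat` by nonparametric bootstrapping.

    Each pseudoreplicate is drawn with replacement from `sample` and has the
    same size as the original sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_reps):
        pseudoreplicate = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(pseudoreplicate))
    # Variance of the statistic across pseudoreplicates.
    return statistics.variance(estimates)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(bootstrap_variance(data, n_reps=2000))
```

In a phylogenetic setting the resampled units would typically be characters (alignment columns) and the statistic would be a tree or a clade's presence, but the sampling-with-replacement logic is the same.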
Non-synonymous substitution A nucleotide substitution that results in an amino acid replacement.
NOR Nucleolar organizer region.
NP-40 Nonidet P-40, a nonionic detergent.
NPH Normalized percentage of hybridization (see Chapter 6). Defined as the extent of hybridization in a heteroduplex comparison divided by that for the homoduplex control, expressed as a percentage.
Nuclear genome The portion of the genome contained in the nucleus of eukaryotes, i.e., the chromosomes.
Nucleolar organizer region A region on a chromosome that contains the ribosomal RNA genes and associated spacers.
Null allele An allele that produces either no protein product or a non-functional protein product (under the conditions analyzed).
Neutrality The state of being free from the effects of selection.

Objective function A function that defines how well data fit a particular hypothesis (as, for instance, a particular phylogenetic tree).
OD Optical density, as measured in a spectrophotometer. Used to estimate concentration and purity of DNA solutions (see Chapters 7-9).
Oligonucleotide A short chain of nucleotides, often produced in the laboratory.
Optimality criterion Same as objective function.
Ordered character A multistate character for which the changes between states are constrained; not all states can be reached directly from any other.
Organellar genome The DNA contained in cytoplasmic organelles (i.e., mtDNA and cpDNA).
Orthology Homology that arises via speciation.
OTU Operational taxonomic unit. Synonymous with terminal taxon in this book.
Outgroup One or more taxa assumed to be phylogenetically outside the ingroup.
Outgroup comparison A method that can be used for assigning the direction of change to character-state transformations and for determining the root of a phylogenetic tree (see Chapter 11).

P In protein sequences: proline.
Pachytene A substage of prophase of meiosis I in which the homologous chromosomes are paired from end to end.
PAGE Polyacrylamide gel electrophoresis.
PAP Peroxidase-antiperoxidase complex. Used in a method for visualizing non-radioactively labeled hybridized probes (see Chapter 5).
Paracentric inversion An inversion of a region of a chromosome that does not include the centromere.
Paralogy Homology that arises via gene duplication.
Parametric bootstrapping A method for producing independent pseudoreplicates of a data set by estimating parameters from the observed data, using the estimates to produce a model, and using the model to simulate replicate data sets. See Chapter 12.
Parapatric Adjacent but non-overlapping distributions. See allopatric, sympatric.
Parsimony See maximum parsimony.
Partially ordered character A multistate character that is ordered, but for which the permitted state transitions do not form a linear series.
PB Phosphate buffer, used in DNA-DNA hybridization experiments (see Chapter 6, Appendix).
PBS Phosphate-buffered saline (see Chapter 6, Appendix).
PCI Mixture of phenol, chloroform, and isoamyl alcohol, used in DNA extraction protocols (see Chapter 9, Appendix).
PCR Polymerase chain reaction.
PDB Phage dilution buffer (see Chapter 9, Appendix).
PEG Polyethylene glycol.
Perfect primer An oligonucleotide designed to be exactly complementary to a target DNA sequence.
Pericentric inversion An inversion of a region of a chromosome that includes the centromere.
Peripheral branches The branches on a phylogenetic tree that connect to a terminal taxon or sequence.
PERT Phenol emulsion reassociation technique (see Chapter 6).
PHA Phytohaemagglutinin, a mitogen.
Phenogram A branching diagram that links entities by estimates of overall similarity. Usually constructed using UPGMA cluster analysis.
Phylogeny The historical relationships among lineages of organisms or their parts (e.g., genes).
Phylogeography The study of biogeography as revealed by a comparison of estimated phylogenies of populations or species with their geographic distributions.
PI Propidium iodide.
Phylogram A tree that depicts inferred historical relationships among entities. Differs from a cladogram in that the branches are drawn proportional to the amount of inferred character change.
Plaque A clear spot on a bacterial lawn (in a petri plate) that results from lysis of the resident bacteria by bacteriophage.
Plasmid A self-replicating extrachromosomal circular DNA.
Plerology Homology of repeated sequences that are subject to concerted evolution.
Plesiomorphy An ancestral character state.
PMS Phenazine methosulfate.
PNK Polynucleotide kinase. Used in end-labeling of primers.
Polymerase chain reaction A process for amplifying a target DNA sequence manyfold, in which a series of thermal cycles each result in denaturation of a double-stranded target, annealing of oligonucleotide primers to the resulting single strands, and primer extension catalyzed by a thermostable DNA polymerase.
Polytene chromosome A somatic chromosome that has undergone many rounds of endoreplication such that each chromosomal element consists of hundreds to thousands of unseparated chromatids.
Polytomy See multifurcation.
Positional homology The relationship among the columns of nucleotides or amino acids in correctly aligned DNA or protein sequences. The nucleotides or amino acids in a single column of the alignment are inferred to have been derived from a single ancestral nucleotide or amino acid, with or without intervening substitutions or replacements.
Posttranslational modification Any process that modifies a polypeptide after its translation from RNA.
PPS A solution of phenoxyethanol-phosphate-sucrose, used to preserve proteins.
Primary antibody An antibody produced directly in response to a particular antigen.
Primers Oligonucleotides used to initiate synthesis of DNA by a DNA polymerase or reverse transcriptase. A primer anneals to a complementary sequence in a single-stranded DNA or RNA template, and the polymerase then extends the complementary sequence from the primer.
PRINS Primed in situ hybridization. A method for fluorescence in situ hybridization (see FISH) which utilizes unlabeled oligonucleotide probes to complementary sequences on fixed chromosomes, with subsequent extension by DNA polymerase and labeled nucleotides (see Chapter 5).
Pseudogene A usually non-functional copy of a protein-coding gene inserted at another location in the genome. Most pseudogenes result from retroposition of processed mRNAs, and therefore typically lack introns and the regulatory sequences necessary for expression.
PWM Pokeweed mitogen.

Q In protein sequences: glutamine.
Q-bands Fluorescent (under UV light) bands on chromosomes produced by quinacrine staining. Q-bands are brightest in AT-rich regions.

R 1. In DNA or RNA sequences: adenine or guanine. 2. In protein sequences: arginine.
Random error Deviation between a parameter of a population and an estimate of that parameter, due strictly to a limited sample size used to make the estimate. By definition, random error disappears in infinite samples.
RAPDs Random amplified polymorphic DNAs (see Chapter 8).
R-bands Bands on chromosomes that exhibit the reverse pattern of Q- or G-bands.
rDNA Ribosomal DNA, which contains the genes for ribosomal RNA and the associated spacer regions.
RE Restriction enzyme.
Reciprocity The degree to which reciprocal measures of divergence (e.g., A to B versus B to A) agree.
Recombination Exchange of gene segments between non-sister chromatids through the physical process of exchange of (usually) homologous strands of DNA.
Restriction endonuclease An enzyme that cleaves double-stranded DNA. Type I restriction endonucleases are not sequence-specific; Type II restriction endonucleases cleave DNA at particular recognition sequences (typically 4-6 bp palindromes).
Restriction fragment length polymorphism (RFLP) A polymorphism in an individual, population, or species defined by restriction fragments of a distinctive length. Usually caused by gain or loss of a restriction site, but may result from an insertion or deletion of a fragment of DNA between two conserved restriction sites.
Retroposition Reverse transcription of RNA to DNA with subsequent integration of the DNA at a new genomic site.
Retroposon A transposable retroelement that neither constructs virion particles nor is flanked by terminally redundant sequences.
Reverse transcriptase An enzyme that transcribes RNA into DNA.
RFLP See restriction fragment length polymorphism.
Robertsonian translocation Fission or fusion of chromosomes at their centromeres.
rRNA Ribosomal RNA, the nucleic acid component of ribosomes, which functions in translation of proteins from mRNA.
RT Reverse transcriptase.

S 1. In DNA or RNA sequences: guanine or cytosine. 2. In protein sequences: serine.
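The restriction endonuclease entry describes typical Type II recognition sequences as 4-6 bp palindromes. In this context a palindrome is a sequence that equals its own reverse complement. The following sketch, which is not part of the original glossary and uses hypothetical function names, checks that property for unambiguous DNA sequences (A, C, G, T only); the EcoRI site GAATTC is used as a well-known example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic_site(seq):
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("GAATTC", "GGATCC", "GAATTG"):
        print(site, is_palindromic_site(site))
```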
S1 nuclease An enzyme that digests single-stranded DNA.
Satellite DNA Highly repeated DNA sequences that band apart from most nuclear DNA in CsCl ultracentrifugation.
SC Synaptonemal complex.
scnDNA Single-copy nuclear DNA.
SCP Saline citrate-phosphate (see Chapter 5, Appendix).
SDS Sodium dodecyl sulfate (= sodium lauryl sulfate).
Secondary isozyme A conformational isozyme.
SEDTA Saline EDTA (see Chapter 6, Appendix).
Sequential electrophoresis The use of a series of different electrophoretic conditions to uncover hidden heterogeneity in isozyme electrophoresis.
SGE Starch gel electrophoresis.
Similarity A generic measure of the resemblance between two objects, usually on a scale from 1 to 0.
SINE Acronym for short interspersed element. An interspersed repetitive DNA sequence of <500 bp.
Single-strand conformational polymorphism A polymorphism detected by differential migration of DNA fragments in a gel matrix caused by conformational changes of single-stranded DNA resulting from point substitutions, insertions, and deletions (see Chapters 8 and 9).
Southern blot A membrane onto which DNA has been transferred directly from an electrophoretic gel. Named after the blotting technique devised by E. M. Southern (1975).
Species A cohesive historical lineage of ancestral-descendent populations of organisms that maintains its identity from other such lineages. A species comes into being at a branching event (one lineage becomes two or more lineages) and ceases to exist either at a branching event (when it gives rise to new species) or when the lineage is terminated through extinction.
Specific reactivity See specificity.
Specificity The degree to which antibodies react with multiple antigenic sites. Initially antibodies are monospecific, but with longer periods of immunization they become more cross reactive (react with more antigenic sites).
SSC Saline sodium citrate (see Chapter 5, Appendix).
SSCP Single-strand conformational polymorphism.
ssDNA Single-stranded DNA.
Star tree A tree that contains a single internal node.
State set A mathematical set of character states, as used during a parsimony analysis to keep track of the states that are consistent with the potential most-parsimonious reconstructions.
STE Sodium chloride-Tris-EDTA buffer (see Chapter 9, Appendix).
STES Sodium chloride-Tris-EDTA-sucrose buffer (see Chapter 8, Appendix).
Streptavidin Protein made by the bacterium Streptomyces avidinii. Streptavidin binds biotin and is often used in place of avidin in histochemical staining procedures. See avidin-biotin.
Stringency In DNA-DNA or DNA-RNA hybridization, the conditions of the hybridization (such as temperature and concentration of chemical additives) that determine the degree of similarity that will result in formation of hybrid molecules.
Subbands Non-allelic bands in isozyme electrophoresis that represent the electrophoretic location of conformational isozymes.
Superimposed changes Changes at a particular site along a lineage of the phylogeny that mask earlier changes at that site, as well as parallel or convergent changes that occur at the same site in different lineages.
Sympatric Occurring in the same place. See allopatric, parapatric.
Symplesiomorphy A shared ancestral character state.
Synapomorphy A shared derived character state that is indicative of a phylogenetic relationship among two or more OTUs.
Synaptonemal complex A set of proteinaceous parallel strands that occur coaxial to paired chromosomes during prophase I of meiosis, which function to hold the paired chromosomes together and facilitate recombination.
Synonymous substitution A nucleotide substitution that does not result in an amino acid replacement.
Synteny Genetic linkage of loci to the same chromosome.
Systematic error Deviation between a parameter of a population and an estimate of that parameter, due to incorrect assumptions in the estimation method. Systematic error persists (and may intensify) as sample sizes increase and become infinite.

T 1. In DNA sequences: thymine. 2. In protein sequences: threonine.
TAE Tris-acetic acid-EDTA buffer (see Chapter 8, Appendix).
Taq polymerase A thermostable DNA polymerase from Thermus aquaticus, a thermophilic bacterium. Used for amplification of DNA via the polymerase chain reaction.
TCA Trichloroacetic acid.
TE Tris-EDTA buffer (see Chapter 9, Appendix).
TEACL Tetraethylammonium chloride, a chaotropic agent used to reduce the effects of differential base composition on hybrid melting temperature in DNA-DNA hybridization (see Chapter 7).
TEMED N,N,N',N'-Tetramethylethylenediamine.
Terminal nodes Tips of a phylogenetic tree at which OTUs (terminal taxa) are placed.
Thermal cycler Machine used to produce the controlled temperature cycles required for PCR.
Time-reversible model A model in which the probability of change from state A to state B is the same as the probability of change from B to A. Thus, in the context of phylogenetic analysis, evaluation of a given tree under a time-reversible model is independent of the root of the tree.
Titer The concentration of a substance as determined by the amount of a known reagent required to bring about a given effect in a test solution.
T50H The interpolated temperature along a DNA melting curve at which 50% of the DNA is double-stranded. T50H differs from Tm (below) when all DNA in a DNA-DNA hybridization reaction does not form duplexes. The difference in T50H between homoduplex and heteroduplex curves is ΔT50H.
Tm The interpolated temperature along a DNA melting curve at which 50% of the duplex DNA formed in a DNA-DNA hybridization reaction is double-stranded. The difference in Tm between homoduplex and heteroduplex curves is ΔTm.
Tmode The interpolated temperature of the peak of a differential plot of a DNA melting curve. The difference in Tmode between homoduplex and heteroduplex curves is ΔTmode.
TPBS Tris-phosphate buffered saline.
Tracer Radioactively labeled, fractionated single-copy DNA used in DNA-DNA hybridization experiments. See driver.
Transition A nucleotide substitution from one purine to another purine (e.g., A → G), or from one pyrimidine to another pyrimidine (e.g., T → C).
Transposable element A genomic element that can move from site to site in the genome of an organism, either through direct DNA copying (at least in prokaryotes) or reverse transcription from an RNA intermediate (probably the usual mechanism in eukaryotes).
Transposon A segment of DNA flanked by transposable elements that is capable of moving its location in the genome.
Transversion A nucleotide substitution from a purine to a pyrimidine (e.g., A → C), or vice versa (e.g., T → G).
Tree length The sum of the estimated or actual branch lengths in a tree.
Two-fold degenerate codons Codons for which the third base pair can be occupied by either purine or by either pyrimidine (i.e., it can be degenerate for either T/C or A/G) without altering the encoded amino acid.

U In RNA sequences: uracil.
Ultrametric distances Pairwise distance values that precisely fit a rooted tree with a constant molecular clock. Defined mathematically by satisfying the three-point condition (see Chapter 11).
Unequal crossing over Physical crossover between imperfectly aligned repeats of a multigene family, which results in one smaller and one larger DNA molecule.
Unequal sister chromatid exchange See unequal crossing over.
Unordered character A character for which any state can change directly to any other state.
Universal primer An oligonucleotide designed to be complementary to target sequences that are conserved over a wide range of taxa.
Unrooted tree A phylogenetic tree that is not directed with respect to time.
UPGMA Unweighted pair-group method of arithmetic averages. A cluster analysis technique.
Upstream 5' of the target sequence.

V 1. In DNA or RNA sequences: adenine, cytosine, or guanine (not thymine or uracil). 2. In protein sequences: valine.
Variable number tandem repeat loci Genomic locations that contain variable numbers of short tandemly repeated sequences (see Chapter 8).
VNTR Variable number tandem repeat.

W 1. In DNA sequences: adenine or thymine. 2. In RNA sequences: adenine or uracil. 3. In protein sequences: tryptophan.
WPGMA Weighted pair-group method of arithmetic averages. A cluster analysis technique.

Xenology Homology that arises via lateral gene transfer between unrelated species (e.g., by retroviruses).

Y 1. In DNA or RNA sequences: cytosine or thymine (uracil in RNA). 2. In protein sequences: tyrosine.

Zymogram The pattern on an allozyme electrophoresis gel visualized by histochemical staining.
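The transition and transversion entries can be made concrete by classifying the differences between two aligned DNA sequences. The sketch below is an illustration added here, not material from the book; the function names are hypothetical, and it assumes gap-free sequences of equal length containing only A, C, G, and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Classify a difference between two nucleotides (A, C, G, T only)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "none"
    pair = {a, b}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    # Any purine <-> pyrimidine change is a transversion.
    return "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind in counts:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(count_substitutions("AACGT", "GACTA"))
```

Counts of this kind are the observed differences only; they do not correct for superimposed changes at the same site.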
Literature Cited
ALelc, L.C , W. Kiln and B. E. Felgenhaucr. 1989. Akalke, H. 1974. A new look at the statistical model
"vloleculal evidence for the inclusion of the phy- ident~ficatlon.IEEE Trans. Autom. Contr. AC-
luln Pentaston~ida111 the Crustacea. Mol. Biol. 19:716-723.
bvol 6:685-691. Aldrich, J., B. Cherney, E. Merlin and J. D. Palmer.
A b o ~ t iF~ ,1987. Letter to the editor. Cell 51:515-516. 1986a. Sequence of the rbcL gene for the large sub-
Adachl, J. and M. Itasegawa 1992. MOLPNY: unit of ribulose bisphosphate carboxylase-oxyge-
Programs for moiecular phylogenetics I-PROTML: nase from petunia. Nucl. Acids lies. 14:9534.
Maximurn lzkelihood inference of protein phylogeny. Aldrich, J., B. Cherney, E. Merlin and J. D. Palmer.
Co~nputerScience Monographs, No. 27. Institute 198623. Sequence of the rbcL gene for the large sub-
of Statistical Mathematlcs, Tokyo unit of ribulose bisphosphate carboxylase-oxyge-
Adachi, J ,Y. Cao and IvI Hasegawa. 1993. Tempo and nase from alfalfa. Nucl. Acids Res. 14:9535.
mode of mitocliondnal DNA evolution in verte- Aldrich, P. R.and 1. Doebley. 1992. Restriction frag-
b ~ a t e ai
s the amino a c ~ dsequence level: rapid ment variation in the nuclear and chloroplast
evolulion in warm-blooded vertebrates. J. Mol. genomes of cultivated and wild Sorglnrm bicolor.
Evol. 36.270-281. Theor. Appl. Genet. 85:293-302.
Adams, R. P.,T. Derneke and H. H.Abufatih. 1993. Alexander, B. A. 1991. Phylogenetic analysis of the
IiAPD DNA fingerprints and terpenoids: Clues to genus Apis (Hymenoptera: Aptdae). Ann.
past migrations of Juniper~rsin Arabia and East Entomol. Soc. Am. 84:137-149.
Africa. Theor. Appl. Genet. 87:22-26. Allard, M. W., M. M. Miyamoto, L, Jarecki, R Kraus
Adelman, R., R. L. Saul and B. N. Ames. 1988. and M. R. Tennant. 1992. DNA systematics and
Oxidatlve damage to DNA: Relation to species evolution of the artiodactyl family Bovidae. Proc.
metabolic rate and life span. Proc. Natl. Acad. Sci. Natl. Acad. Sci. USA 89:3972-3976.
USA 85:2706-2708. Allard, M. W., D. Young and Y. Huyen. 1995. Detecting
Adey, N. B., T. O. Tollefsbol, A. B. Sparks, M. H. Edgell dinosaur DNA. Science 268:1192.
and C. A. Hutchison III. 1994. Molecular resurrec- Allegrucci, G., D. Cesaroni and V. Sbordoni. 1987.
tion of an extinct ancestral promoter for mouse Adaptation and speciation of cave crickets
L1. Proc. Natl. Acad. Sci. USA 91:1569-1573. (Orthoptera, Rhaphidophoridae): Geographic
Adkins, R. M. and R. L. Honeycutt. 1991. Molecular variation of morphometric indices and allozyme
phylogeny of the superorder Archonta. Proc. frequencies. Biol. J. Linnean Soc. 31:151-160.
Natl. Acad. Sci. USA 88:10317-10321. Allendorf, F. W. 1977. Electromorphs or alleles.
Aebersold, P. B., G. A. Winans, D. J. Teel, G. B. Milner Genetics 87:821-822.
and F. M. Utter. 1987. Manual for starch gel elec- Allendorf, F. W. and S. R. Phelps. 1981. Use of allelic
trophoresis: A method for the detection of genetic frequencies to describe population structure. Can.
variation. NOAA Tech. Report NMFS No. 61. J. Fish. Aquat. Sci. 38:1507-1514.
Aguadé, M., W. Meyers, A. D. Long and C. H. Langley. Allendorf, F. W. and G. H. Thorgaard. 1984.
1994. Single-strand conformation polymorphism Tetraploidy and the evolution of salmonid fishes,
analysis coupled with stratified DNA sequencing pp. 1-53. In B. J. Turner (ed.), Evolutionary Genetics
reveals reduced sequence variation in the su(s) of Fishes. Plenum, New York.
and su(wa) regions of the Drosophila melanogaster Allendorf, F. W. and F. M. Utter. 1973. Gene duplica-
X chromosome. Proc. Natl. Acad. Sci. USA tion within the family Salmonidae: Disomic inher-
91:4658-4662.
Literature Cited 561
itance of two loci reported to be tetrasomic in Dutta (ed.), DNA Systematics. CRC Press, Boca
rainbow trout. Genetics 74:647-654. Raton, FL.
Allendorf, F. W., K. L. Knudsen and R. F. Leary. 1983. Aquadro, C. F. and J. C. Avise. 1982a. An assessment
Adaptive significance of differences in the tissue- of "hidden" heterogeneity within electromorphs
specific expression of a phosphoglucomutase at three loci in deer mice. Genetics 102:269-284.
gene in rainbow trout. Proc. Natl. Acad. Sci. USA Aquadro, C. F. and J. C. Avise. 1982b. Evolutionary
80:1397-1400. genetics of birds. VI. A reexamination of protein
Allendorf, F. W., G. Stahl and N. Ryman. 1984. divergence using varied electrophoretic condi-
Silencing of duplicate genes: A null polymor- tions. Evolution 36:1003-1019.
phism for lactate dehydrogenase in rainbow Aquadro, C. F. and B. D. Greenberg. 1983. Human
trout. Mol. Biol. Evol. 1:238-248. mitochondrial DNA variation and evolution:
Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. Analysis of nucleotide sequences from seven indi-
J. Lipman. 1990. Basic local alignment search tool. viduals. Genetics 103:287-312.
J. Mol. Biol. 215:403-410. Aquadro, C. F., S. F. Deese, M. M. Bland, C. H. Langley
Ammerman, L. K. and D. M. Hillis. 1992. A molecular and C. C. Laurie-Ahlberg. 1986. Molecular popu-
test of bat relationships: Monophyly or diphyly? lation genetics of the alcohol dehydrogenase gene
Syst. Biol. 41:222-232. region of Drosophila melanogaster. Genetics
Amos, B., C. Schlotterer and D. Tautz. 1993. Social 114:1165-1190.
structure of pilot whales revealed by analytical Aradhya, K. M., D. Mueller-Dombois and T. A.
DNA profiling. Science 260:670-672. Ranker. 1991. Genetic evidence for recent and
Anderson, D. M. and W. R. Folk. 1976. Iodination of incipient speciation in the evolution of Hawaiian
DNA. Studies of the reaction and iodination of Metrosideros (Myrtaceae). Heredity 67:129-138.
papovavirus DNA. Biochemistry 15:1022-1030. Archie, J. W. 1989a. A randomization test for phyloge-
Anderson, J. O., J. Nath and E. J. Harner. 1978. Effect netic information in systematic data. Syst. Zool.
of freeze-preservation on some pollen enzymes. 38:239-252.
Cryobiology 15:469-477. Archie, J. W. 1989b. Phylogenies of plant families: A
Anderson, P. R. and J. G. Oakeshott. 1984. Parallel geo- demonstration of phylogenetic randomness in
graphical patterns of allozyme variation in two DNA sequence data derived from proteins.
sibling Drosophila species. Nature 308:729-731. Evolution 43:1796-1800.
Anderson, S., A. T. Banker, B. G. Barrell, M. H. L. Archie, J. W., C. Simon and A. Martin. 1989. Small
DeBruijn, A. R. Coulson, J. Drouin, I. C. Eperon, sample size does decrease the stability of dendro-
D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, grams calculated from allozyme-frequency data.
A. J. H. Smith, R. Staden and I. G. Young. 1981. Evolution 43:678-683.
Sequence and organization of the human mito- Arctander, P. 1988. Comparative studies of avian DNA
chondrial genome. Nature 290:457-465. by restriction fragment length polymorphism
Andronico, F., S. De Lucchini, F. Graziani, I. Nardi, R. analysis: Convenient procedures based on blood
Batistoni and G. Barsacchi-Pilone. 1985. Molecular samples from live birds. J. Ornithologie
organization of ribosomal RNA genes clustered at 129:205-216.
variable chromosomal sites in Triturus vulgaris Arévalo, E., S. K. Davis, G. Casas, G. Lara and J. W.
meridionalis (Amphibia, Urodela). J. Mol. Biol. Sites, Jr. 1993. Parapatric hybridization between
186:219-229. chromosome races of the Sceloporus grammicus
Angerer, R. C., E. H. Davidson and R. J. Britten. 1976. complex (Phrynosomatidae): Structure of the
Single copy DNA and structural gene sequence Ajusco transect. Copeia 1993:320-340.
relationships among four sea urchin species. Arévalo, E., S. K. Davis and J. W. Sites, Jr. 1994.
Chromosoma 56:213-226. Mitochondrial DNA sequence divergence and
Ansorge, W. and S. Labeit. 1984. Field gradients phylogenetic relationships among eight chromo-
improve resolution on DNA sequencing gels. J. some races of the Sceloporus grammicus complex
Biochem. Biophys. Meth. 10:237-243. (Phrynosomatidae) in central Mexico. Syst. Biol.
Appels, R. and J. Dvorak. 1982. The wheat ribosomal 43:387-418.
DNA spacer region: Its structure and variation in Armour, J. A. L. and A. J. Jeffreys. 1992. Biology and
populations and among species. Theor. Appl. applications of human minisatellite loci. Curr.
Genet. 63:337-348. Opin. Genet. Dev. 2:850-856.
Appels, R. and R. L. Honeycutt. 1987. rDNA: Armour, J. A. L., R. Neumann, S. Gobert and A. J.
Evolution over a billion years, pp. 81-135. In S. K. Jeffreys. 1994. Isolation of human simple repeat
loci by hybridisation selection. Human Mol. Attardi, G. 1985. Animal mitochondrial DNA: An
Genet. 3:599-605. extreme example of genetic economy. Int. Rev.
Arnason, U. and A. Gullberg. 1994. Relationships of Cytol. 93:93-145.
baleen whales established by cytochrome b gene Austin, C. C. 1995. A new method of bi-polymerase
sequence comparison. Nature 367:726-728. sequencing prevents "stop-bands." Mol.
Arnason, U. and B. Widegren. 1984. Different rates of Biotechnol. 4500-101.
divergence in highly repetitive DNA of cetaceans. Ausubel, F. M. (ed.). 1989. Current Protocols in
Hereditas 101:171-177. Molecular Biology. John Wiley and Sons, New
Arnheim, N. 1983. Concerted evolution of multigene York.
families, pp. 38-61. In M. Nei and R. K. Koehn Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore,
(eds.), Evolution of Genes and Proteins. Sinauer, J. G. Seidman, J. A. Smith and K. Struhl. 1992.
Sunderland, Massachusetts. Short Protocols in Molecular Biology. 2nd ed. John
Arnheim, N., E. M. Prager and A. C. Wilson. 1969. Wiley and Sons, New York.
Immunological prediction of sequence differences Avise, J. C. 1974. Systematic value of electrophoretic
among proteins. Chemical comparisons of chick- data. Syst. Zool. 23:465-481.
en, quail, and pheasant lysozymes. J. Biol. Chem. Avise, J. C. 1976. Genetic differentiation during specia-
244:2085-2094. tion, pp. 106-122. In F. J. Ayala (ed.), Molecular
Arnheim, N., D. Treco, B. Taylor and E. M. Eicher. Evolution. Sinauer, Sunderland, Massachusetts.
1982. Distribution of ribosomal DNA length vari- Avise, J. C. 1986. Mitochondrial DNA and the evolu-
ants among mouse chromosomes. Proc. Natl. tionary genetics of higher animals. Phil. Trans.
Acad. Sci. USA 79:4677-4680. Roy. Soc. London B 312:325-342.
Arnold, E. N. 1981. Estimating phylogenies at low tax- Avise, J. C. 1989. Gene trees and organismal histories:
onomic levels. Z. Zool. Syst. Evolut.-forsch. A phylogenetic approach to population biology.
19:1-35. Evolution 43:1192-1208.
Arnold, M. L. 1992. Natural hybridization as an evolu- Avise, J. C. 1994. Molecular Markers, Natural History,
tionary process. Annu. Rev. Ecol. Syst. 23:237-261. and Evolution. Chapman and Hall, New York.
Arnold, M. L., D. D. Shaw and N. Contreras. 1987a. Avise, J. C. and C. F. Aquadro. 1982. A comparative
Ribosomal RNA-encoding DNA introgression summary of genetic distances in the vertebrates.
across a narrow hybrid zone between two sub- Evol. Biol. 15:151-158.
species of grasshopper. Proc. Natl. Acad. Sci. USA Avise, J. C. and G. B. Kitto. 1973. Phosphoglucose iso-
84:3946-3950. merase gene duplication in the bony fishes: An
Arnold, M. L., P. Wilkinson, D. D. Shaw, A. D. evolutionary history. Biochem. Genet. 8:113-132.
Marchant and N. Contreras. 1987b. Highly repeat- Avise, J. C. and R. A. Lansman. 1983. Polymorphism of
ed DNA and allozyme variation between sibling mitochondrial DNA in populations of higher ani-
species: Evidence for introgression. Genome mals, pp. 165-190. In M. Nei and R. K. Koehn
29:272-279. (eds.), Evolution of Genes and Proteins. Sinauer,
Arnold, M. L., J. L. Hamrick and B. D. Bennett. 1990. Sunderland, Massachusetts.
Allozyme variation in Louisiana irises: A test for Avise, J. C., J. J. Smith and F. J. Ayala. 1975. Adaptive
introgression and hybrid speciation. Heredity differentiation with little genic change between
65:297-306. two native California minnows. Evolution
Arnold, M. L., C. M. Buckner and J. L. Robinson. 1991. 29:411-426.
Pollen mediated introgression and hybrid specia- Avise, J. C., C. Giblin-Davidson, J. Laerm, J. C. Patton
tion in Louisiana irises. Proc. Natl. Acad. Sci. USA and R. A. Lansman. 1979. Mitochondrial DNA
88:1398-1402. clones and matriarchal phylogeny within and
Arrand, J. E. 1985. Preparation of nucleic acid probes, among geographic populations of the pocket
pp. 17-45. In B. D. Hames and S. J. Higgins (eds.), gopher, Geomys pinetis. Proc. Natl. Acad. Sci. USA
Nucleic Acid Hybridisation: A Practical Approach. 76:6694-6698.
IRL Press, Oxford. Avise, J. C., J. E. Neigel and J. Arnold. 1984.
Asher, J. H. 1970. Parthenogenesis and genetic vari- Demographic influences of mitochondrial DNA
ability. II. One locus model for various diploid lineage survivorship in animal populations. J.
populations. Genetics 66:369-391. Mol. Evol. 20:99-105.
Atchley, W. R. and W. M. Fitch. 1991. Gene trees and Avise, J. C., J. Arnold, R. M. Ball, E. Bermingham, T.
the origins of inbred strains of mice. Science Lamb, J. E. Neigel, C. A. Reeb and N. C. Saunders.
254:554-558. 1987. Intraspecific phylogeography: The mito-
chondrial bridge between population genetics Zealand populations of chaffinches (Fringilla
and systematics. Annu. Rev. Ecol. Syst. coelebs). Evolution 46:1784-1800.
18:489-522. Baker, A. J. and A. Moeed. 1987. Rapid genetic differ-
Avise, J. C., R. M. Ball and J. Arnold. 1988. Current entiation and founder effect in colonizing popula-
versus historical population sizes in vertebrate tions of Common Mynas (Acridotheres tristis).
species with high gene flow: A comparison based Evolution 41:523-538.
on mitochondrial DNA lineages and inbreeding Baker, C. S. and S. R. Palumbi. 1994. Which whales are
theory for neutral mutations. Mol. Biol. Evol. hunted? A molecular genetic approach to moni-
5:331-344. toring whaling. Science 265:1538-1539.
Avise, J. C., B. W. Bowen and T. Lamb. 1989. DNA fin- Baker, M. C., D. B. Thompson, G. L. Sherman, M. A.
gerprints from hypervariable mitochondrial geno- Cunningham and D. F. Tomback. 1982. Allozyme
types. Mol. Biol. Evol. 6:258-269. frequencies in a linear series of song dialect popu-
Avise, J. C., J. C. Trexler, J. Travis and W. S. Nelson. lations. Evolution 36:1020-1029.
1991. Poecilia mexicana is the recent female parent Baker, R. J. and H. A. Wichman. 1990. Retrotransposon
of the unisexual fish P. formosa. Evolution Mys is concentrated on the sex chromosomes:
45:1530-1533. Implications for copy number containment.
Avise, J. C., R. T. Alisauskas, W. S. Nelson and C. D. Evolution 44:2083-2088.
Ankney. 1992a. Matriarchal population genetic Baker, R. J., S. K. Davis, R. D. Bradley, M. J. Hamilton
structure in an avian species with female natal and R. A. Van Den Bussche. 1989. Ribosomal-
philopatry. Evolution 46:1084-1096. DNA, mitochondrial-DNA, chromosomal, and
Avise, J. C., J. M. Quattro and R. C. Vrijenhoek. 1992b. allozymic studies on a contact zone in the pocket
Molecular clones within organismal clones: gopher, Geomys. Evolution 43:63-75.
Mitochondrial DNA phylogenies and evolution- Baker, R. J., R. L. Honeycutt and R. A. Van Den
ary history of unisexual vertebrates. Evol. Biol. Bussche. 1991a. Examination of the monophyly of
26:225-246. bats: Restriction map of the ribosomal DNA
Avise, J. C., B. W. Bowen, T. Lamb, A. B. Meylan and E. cistron, pp. 42-53. In T. A. Griffiths and D.
Bermingham. 1992c. Mitochondrial DNA evolu- Klingener (eds.), Contributions to Mammalogy in
tion at a turtle's pace: Evidence for low genetic Honor of Karl F. Koopman. American Museum of
variability and reduced microevolutionary rate in Natural History, New York.
the Testudines. Mol. Biol. Evol. 9:457-473. Baker, R. J., M. J. Novacek and N. B. Simmons. 1991b.
Ayala, F. J. 1982. Genetic variation in natural popula- On the monophyly of bats. Syst. Zool. 40:216-231.
tions: Problem of electrophoretically cryptic alle- Balding, D. J. and R. A. Nichols. 1994. DNA profile
les. Proc. Natl. Acad. Sci. USA 79:550-554. match probability calculations: How to allow for
Ayala, F. J. 1986. On the virtues and pitfalls of the population stratification, relatedness, database
molecular evolutionary clock. J. Hered. selection and single bands. Forensic Sci. Int.
77:226-235. 64:125-140.
Ayala, F. J., J. R. Powell, M. L. Tracey, C. A. Mourao Ball, R. M., Jr., S. Freeman, F. C. James, E. Bermingham
and S. Perez-Salas. 1972. Enzyme variability in the and J. C. Avise. 1988. Phylogeographic population
Drosophila willistoni group. IV. Genic variation in structure of red-winged blackbirds assessed by
natural populations of Drosophila willistoni. mitochondrial DNA. Proc. Natl. Acad. Sci. USA
Genetics 70:113-139. 85:1558-1562.
Ball, R. M., Jr., J. E. Neigel and J. C. Avise. 1990. Gene
Baba, M. L., M. Goodman, H. Dene and G. W. Moore. genealogies within the organismal pedigree of
1975. Origins of the Ceboidea viewed from an random-mating populations. Evolution
immunological perspective. J. Human Evol. 44:360-370.
4:89-102. Ballard, J. W. O., G. J. Olsen, D. P. Faith, W. A. Odgers,
Bachellerie, J.-P. and L.-H. Qu. 1993. Direct ribosomal D. M. Rowell and P. W. Atkinson. 1992. Evidence
RNA sequencing for phylogenetic studies. Meth. from 12S ribosomal RNA sequences that ony-
Enzymol. 224:349-357. chophorans are modified arthropods. Science
Bailey, W. J., J. L. Slightom and M. Goodman. 1992. 258:1345-1348.
Rejection of the "flying primate hypothesis" by Bandelt, H.-J. and A. W. M. Dress. 1992. Split decom-
phylogenetic evidence from the ε-globin gene. position: A new and useful approach to phyloge-
Science 256:86-89. netic analysis of distance data. Mol. Phylogenet.
Baker, A. J. 1992. Genetic and morphometric diver- Evol. 1:242-252.
gence in ancestral European and descendent New
Banks, J. A. and C. W. Birky, Jr. 1985. Chloroplast DNA Bassam, B. J. and G. Caetano-Anollés. 1993. Silver
diversity is low in a wild plant, Lupinus texensis. staining of DNA in polyacrylamide gels. Appl.
Proc. Natl. Acad. Sci. USA 82:6950-6954. Biochem. Biotechnol. 42:181-188.
Barendse, W., et al. 1994. A genetic linkage map of the Bassam, B. J., G. Caetano-Anollés and P. M. Gresshoff.
bovine genome. Nature Genetics 6:227-235. 1991. Fast and sensitive silver staining of DNA in
Barker, J. S. F., P. D. East and B. S. Weir. 1986. Temporal polyacrylamide gels. Analyt. Biochem. 196:80-83.
and microgeographic variation in allozyme fre- Baum, D. 1994. rbcL and seed-plant phylogeny. Trends
quencies in a natural population of Drosophila Ecol. Evol. 9:39-41.
buzzatii. Genetics 112:577-611. Baum, D. and A. Larson. 1991. Adaptation reviewed.
Barker, P. E., J. R. Testa, N. Z. Parsa and R. Snyder. A phylogenetic methodology for studying charac-
1986. High molecular weight DNA from fixed ter macroevolution. Syst. Zool. 40:1-18.
cytogenetic preparations. Am. J. Human Genet. Bautz, E. K. and E. A. Bautz. 1964. The influence of
39:661-668. noncomplementary bases on the stability of
Barnes, P. T. and C. C. Laurie-Ahlberg. 1986. Genetic ordered polynucleotides. Proc. Natl. Acad. Sci.
variability of flight metabolism in Drosophila USA 52:1476-1481.
melanogaster. III. Effects of GPDH allozymes and Baverstock, P. R. and M. Adams. 1987. Comparative
environmental temperature on power output. rates of molecular, chromosomal and morphologi-
Genetics 112:267-294. cal evolution in some Australian vertebrates, pp.
Barnes, W. M. 1987. Sequencing DNA with dideoxyri- 175-188. In K. S. W. Campbell and M. F. Day (eds.),
bonucleotides as chain terminators: Hints and Rates of Evolution. Allen and Unwin, London.
strategies for big projects. Meth. Enzymol. Baverstock, P. R., C. H. S. Watts and S. R. Cole. 1977.
152:538-556. Electrophoretic comparisons between the
Barnes, W. M. 1994. PCR amplification of up to 35-kb allopatric populations of five Australian
DNA with high fidelity and high yield from λ pseudomyine rodents (Muridae). Australian J.
bacteriophage templates. Proc. Natl. Acad. Sci. Biol. Sci. 30:471-485.
USA 91:2216-2220. Baverstock, P. R., S. R. Cole, B. J. Richardson and C. H.
Barnes, W. M., M. Bevan and P. H. Son. 1983. Kilo- Watts. 1979. Electrophoresis and cladistics. Syst.
sequencing: Creation of an ordered nest of asym- Zool. 28:214-219.
metric deletions across a large target sequence Baverstock, P. R., M. Adams and C. H. S. Watts. 1986.
carried on phage M13. Meth. Enzymol. Biochemical differentiation among karyotypic
101:98-122. forms of Australian Rattus. Genetica 71:11-22.
Barrodale, I. and F. D. K. Roberts. 1973. An improved Beckman, J. S. and J. L. Weber. 1993. Survey of human
algorithm for discrete l1 linear approximation. and rat microsatellites. Genomics 12:627-631.
SIAM J. Numer. Anal. 10:839-848. Beerli, P., H. Hotz and T. Uzzell. 1996. Geologically
Barrett, M., M. J. Donoghue and E. Sober. 1991. dated sea barriers calibrate a protein clock for
Against consensus. Syst. Zool. 40:486-493. Aegean water frogs. Evolution (in press).
Barrie, P. A., A. J. Jeffreys and A. F. Scott. 1981. Begun, D. J. and C. F. Aquadro. 1993. African and
Evolution of the β-globin gene cluster in man and North American populations of Drosophila
the primates. J. Mol. Biol. 149:319-336. melanogaster are very different at the DNA level.
Barrowclough, G. R., N. K. Johnson and R. M. Zink. Nature 365:548-550.
1985. On the nature of genic variation in birds, I.
Benjamin, D. C., J. A. Berzofsky, I. East, F. R. N.
pp. 135-154. In R. F. Johnston (ed.), Current Gurd, C. Hannum, S. J. Leach, E. Margoliash, J. G.
Ornithology. Vol. 2. Plenum, New York. Michael, A. Miller, E. M. Prager, M. Reichlin, B. B.
Barton, N. H. and G. M. Hewitt. 1989. Adaptation, Sercarz, S. J. Smith-Gill, P. E. Todd and A. C.
speciation and hybrid zones. Nature 341:497-503. Wilson. 1984. The antigenic structure of proteins:
Barton, N. H., R. R. Halliday and G. M. Hewitt. 1983. A reappraisal. Annu. Rev. Immunol. 2:67-101.
Rare electrophoretic variants in a hybrid zone. Benn, P. A. and M. A. Perle. 1986. Chromosome stain-
Heredity 50:139-146. ing and banding techniques, pp. 57-84. In D. E.
Barry, D. and J. A. Hartigan. 1987a. Statistical analysis Rooney and B. H. Czepulkowski (eds.), Human
of hominoid molecular evolution. Stat. Sci. Cytogenetics. IRL Press, Oxford.
2:191-210. Bennet, S., L. J. Alexander, R. H. Crozier and A. G.
Barry, D. and J. A. Hartigan. 1987b. Asynchronous dis- Mackinlay. 1988. Are megabats flying primates?
tance between homologous DNA sequences. Contrary evidence from a mitochondrial DNA
Biometrics 43:261-276. sequence. Aust. J. Biol. Sci. 41:327-332.
Bennett, M. D. 1972. Nuclear DNA content and mini- Beyer, W. A., M. L. Stein, T. F. Smith and S. M. Ulam.
mum mitotic time in herbaceous plants. Proc. 1974. A molecular-sequence metric and evolution-
Roy. Soc. London B 181:109-135. ary trees. Math. Biosci. 19:9-25.
Bentzen, P., W. C. Leggett and G. G. Brown. 1988. Bickmore, W. A. and A. T. Sumner. 1989. Mammalian
Length and restriction site heteroplasmy in the chromosome banding-an expression of genome
mitochondrial DNA of American shad (Alosa organization. Trends Genet. 5:144-148.
sapidissima). Genetics 118:509-518. Birky, C. W., Jr. 1983. The partitioning of cytoplasmic
Bentzen, P., A. S. Harris and J. M. Wright. 1992. organelles at cell division. Int. Rev. Cytol.
Cloning of hypervariable minisatellite and simple 15:49-89.
sequence microsatellite repeats for DNA finger- Birky, C. W., Jr., T. Maruyama and P. Fuerst. 1983. An
printing of important aquacultural species of approach to population and evolutionary genetic
salmonids and tilapia, pp. 242-262. In T. Burke, G. theory for genes in mitochondria and chloroplas-
Dolf, A. J. Jeffreys and R. Wolff (eds.), DNA ts, and some results. Genetics 103:513-527.
Fingerprinting: Approaches and Applications. Birky, C. W., Jr., P. Fuerst and T. Maruyama. 1989.
Birkhauser Verlag, Basel, Switzerland. Organelle gene diversity under migration, muta-
Benveniste, R. E. 1985. The contribution of retroviruses tion, and drift: Equilibrium expectations,
to the study of mammalian evolution, pp. approach to equilibrium, effects of heteroplasmic
359-417. In R. J. MacIntyre (ed.), Molecular cells, and comparison to nuclear genes. Genetics
Evolutionary Genetics. Plenum, New York. 121:613-627.
Benveniste, R. E. and G. J. Todaro. 1976. Evolution of Birley, A. J. and J. H. Croft. 1986. Mitochondrial DNAs
type C viral genes: Evidence for an Asian origin and phylogenetic relationships, pp. 107-137. In S.
of man. Nature 261:101-108. K. Dutta (ed.), DNA Systematics. CRC Press, Boca
Berg, W. J. and D. G. Buth. 1984. Glucose dehydroge- Raton, FL.
nase in teleosts: Tissue distribution and proposed Birstein, V. J. 1982. Structural characteristics of genome
function. Comp. Biochem. Physiol. 77B:285-288. organization in amphibians: Differential staining
Berger, S. L. and A. R. Kimmel (eds.). 1987. Guide to of chromosomes and DNA structure. J. Mol. Evol.
Molecular Cloning Techniques. Meth. Enzymol. 18:73-91.
152:1-812. Bisbee, C. A., M. A. Baker, A. C. Wilson, I. Hadji-Azimi
Berlocher, S. H. and G. L. Bush. 1982. An electrophoret- and M. Fischberg. 1977. Albumin phylogeny for
ic analysis of Rhagoletis (Diptera: Tephritidae) clawed frogs (Xenopus). Science 195:785-787.
phylogeny. Syst. Zool. 31:136-155. Bishop, J. G. and J. A. Hunt. 1988. DNA divergence in
Berlocher, S. H. and D. L. Swofford. 1996. Searching and around the alcohol dehydrogenase locus in
for phylogenetic trees under the frequency parsi- five closely related species of Hawaiian
mony criterion: An approximation using general- Drosophila. Mol. Biol. Evol. 5:415-432.
ized parsimony. (unpublished manuscript) Bishop, M. D., S. M. Kappes, J. W.Keele, R. T. Stone, S.
Bermingham, E. and J. C. Avise. 1986. Molecular zoo- L. E Sunden, G. A. Hawkins, S. S. Toldo, R. Fries,
geography of freshwater fishes in the southeast- M. D. Grosz, J. Yoo and C. W. Beattie. 1994. A
ern United States, Genetics 113:939-965. genetic linkage map for cattle. Genetics
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. 136:619-639.
Salina, G. Cuny, M. Meunier-Rotival and E Bishop, M. J. and A. E. Friday. 1985. Evolutionary trees
Rodier. 1985. The mosaic genome of warm-blood- from nucleic acid and protein sequences. Proc.
ed vertebrates. Science 228:953-958. Roy Soc. London B 226:271-302.
Beutler, E. 1969. Electrophoresis of phosphogiycerate Black, I. W. C. 1993. PCR with arbitrary primers:
kinase. Biochem. Genet. 3:189-195. Approach with care. Insect Mol Biol. 2:1-6.
Bevan, I. S., R. Rapley and M. R. Walker. 1992. Bledsoe, A EI. 1987. DNA evolutionary rates in nine-
Sequencing of PCR-amplrfied DNA. PCP. Meth. primaried passerine birds. Mol. Biol. Evol.
Applica, 1:222-228. 4:559-571.
Beverley, S. M. and A. C. Wilson. 1982,Molecular eve- Block, B. k.,J. R. Finnerty, A. E R. Stewart and J. Kidd.
lutlon in Drosophila and higher diptera. I. Micro- 1993. Evolution of endothermy in fish: Mapping
complement fixation studies of a larval physiological traits on a molecular phylogeny,
hemolymph protein. J. Mol. Evol. 18:251-264. Science 260:210-214.
Beverley, S. M, and A. C. Wilson. 1985. Ancient origin Bodmer, M. and M. Ashburner. 1984. Conservation
for Hawaiian Drosophibae inferred from prote~n and change in the DNA sequences coding for
comparisons. Proc. Natl. Acad. Sci. USA aicohoI dehydrogenase in sibling species of
82475311757. Drosophila. Nature 309:421-430.
Boerwinkle, E., W. Xiong, E. Fourest and L. Chan. Boyden, M. G. 1967. It's about time. Ser. Mus. Bull.
1989. Rapid typing of tandemly repeated hyper- 37:7-10.
variable loci by the polymerase chain reaction: Boyer, S. H. 1961. Alkaline phosphatase in human sera
Application to the apolipoprotein B 3' hypervari- and placentae. Science 134:1002-1004.
able region. Proc. Natl. Acad. Sci. USA 86:212-216. Boyer, S. H., D. C. Fainer and E. J. Watson-Williams.
Bogart, J. P. 1972. Karyotypes, pp. 171-195. In W. E. 1963. Lactate dehydrogenase variant from human
Blair (ed.), Evolution in the Genus Bufo. University blood: Evidence for molecular subunits. Science
of Texas Press, Austin. 141:642-643.
Bogart, J. P., L. A. Lowcock, C. W. Zeyl and B. K. Bradley, R. D., J. J. Bull, A. D. Johnson and D. M.
Mable. 1987. Genome constitution and reproduc- Hillis. 1993. Origin of a novel allele in a mam-
tive biology of hybrid salamanders, genus malian hybrid zone. Proc. Natl. Acad. Sci. USA
Ambystoma, on Kelleys Island in Lake Erie. Can. J. 90:8939-8941.
Zool. 65:2188-2201. Brazaitis, P. and M. Watanabe. 1982. The doppler, a
Bonnell, M. T. and R. K. Selander. 1974. Elephant seals: new tool for reptile and amphibian hematological
Genetic variation and near extinction. Science studies. J. Herpetol. 16:1-6.
184:908-909. Bremer, K. 1988. The limits of amino acid sequence
Bonner, T. I., D. J. Brenner, B. R. Neufeld and R. J. data in angiosperm phylogenetic reconstruction.
Britten. 1973. Reduction in rate of DNA reassocia- Evolution 42:795-803.
tion by sequence divergence. J. Mol. Biol. Bremer, B. 1991. Restriction data from chloroplast
81:123-135. DNA for phylogenetic reconstruction: Is there
Bonner, T. I., R. Heinemann and G. J. Todaro. 1980. only one accurate way of scoring? Plant Syst.
Evolution of DNA sequences has been retarded in Evol. 175:39-54.
Malagasy primates. Nature 286:420-423. Bremer, K. 1994. Branch support and tree stability.
Boore, J. L., T. M. Collins, D. Stanton, L. L. Daehler Cladistics 10:295-304.
and W. M. Brown. 1995. Deducing the pattern of Breneman, J. W., M. J. Ramsey, D. H. Lee, G. G.
arthropod phylogeny from mitochondrial DNA Eveleth, J. L. Minkler and J. D. Tucker. 1993. The
rearrangements. Nature 376:163-165. development of chromosome-specific composite
Bowcock, A. M. and L. Cavalli-Sforza. 1991. The study DNA probes for the mouse and their application
of variation in the human genome. Genomics to chromosome painting. Chromosoma
11:491-498. 102:591-598.
Bowcock, A. M., J. R. Kidd, J. L. Mountain, J. M. Brent, R. P. 1973. Algorithms for Minimization Without
Herbert, L. Carotenuto, K. K. Kidd and L. L. Derivatives. Prentice-Hall, Englewood Cliffs, New
Cavalli-Sforza. 1991. Drift, admixture, and selec- Jersey.
tion in human evolution: A study with DNA poly- Brewer, G. J. 1970. An Introduction to Isozyme
morphisms. Proc. Natl. Acad. Sci. USA Techniques. Academic Press, New York.
88:839-843. Bridge, D., C. W. Cunningham, B. Schierwater, R.
Bowcock, A. M., A. Ruiz-Linares, J. Tomfohrde, E. DeSalle and L. W. Buss. 1992. Class-level relation-
Minch, J. R. Kidd and L. L. Cavalli-Sforza. 1994. ships in the phylum Cnidaria: Evidence from
High resolution of human evolutionary trees with mitochondrial genome structure. Proc. Natl.
polymorphic microsatellites. Nature 368:455-457. Acad. Sci. USA 89:8750-8753.
Bowen, B. W., A. B. Meylan and J. C. Avise. 1989. An Briscoe, D. A., J. M. Malpica, A. Robertson, G. J. Smith,
odyssey of the green sea turtle: Ascension Island R. Frankham, R. G. Banks and J. S. F. Barker. 1992.
revisited. Proc. Natl. Acad. Sci. USA 86:573-576. Rapid loss of genetic variation in large captive
Bowen, B. W., W. S. Nelson and J. C. Avise. 1993. A populations of Drosophila flies: Implications for
molecular phylogeny for marine turtles: Trait the genetic management of captive populations.
mapping, rate assessment, and conservation rele- Conserv. Biol. 6:416-425.
vance. Proc. Natl. Acad. Sci. USA 90:5574-5577. Britten, R. J. 1986. Rates of DNA sequence evolution
Boyden, A. (ed.). 1948-1978. Serol. Mus. Bull. Vols. differ between taxonomic groups. Science
1-51. 231:1393-1398.
Boyden, A. 1942. Systematic serology: A critical Britten, R. J. 1989. Comment on DNA hybridization
appraisal. Physiol. Zool. 15:109-145. issues raised at Lake Arrowhead. J. Mol. Evol.
Boyden, A. 1964. Perspectives in systematic serology, 18:163-164.
pp. 75-99. In C. A. Leone (ed.), Taxonomic Britten, R. J. and E. H. Davidson. 1969. Gene regulation
Biochemistry and Serology. Ronald Press, New York. for higher cells: A theory. Science 165:349-357.
Britten, R. J. and E. H. Davidson. 1985. Hybridisation Brown, W. M. 1983. Evolution of animal mitochondrial
strategy, pp. 3-15. In B. D. Hames and S. J. DNA, pp. 62-88. In M. Nei and R. K. Koehn
Higgins (eds.), Nucleic Acid Hybridisation: A (eds.), Evolution of Genes and Proteins. Sinauer,
Practical Approach. IRL Press, Oxford. Sunderland, Massachusetts.
Britten, R. J. and D. E. Kohne. 1967. Nucleotide Brown, W. M. 1985. The mitochondrial genome of ani-
sequence repetition in DNA. Carnegie Inst. Wash. mals, pp. 95-130. In R. J. MacIntyre (ed.), Molecular
Yearbook 65:78-106. Evolutionary Genetics. Plenum, New York.
Britten, R. J. and D. E. Kohne. 1968. Repeated Brown, W. M. and J. Wright. 1979. Mitochondrial DNA
sequences in DNA. Science 161:529-540. analyses and the origin and relative age of
Britten, R. J., A. Cetta and E. H. Davidson. 1978. The parthenogenetic lizards (genus Cnemidophorus).
single-copy sequence polymorphism of the sea Science 203:1247-1249.
urchin Strongylocentrotus purpuratus. Cell Brown, W. M., M. George, Jr. and A. C. Wilson. 1979.
15:1175-1186. Rapid evolution of animal mitochondrial DNA.
Britten, R. J., D. E. Graham and B. R. Neufeld. 1974. Proc. Natl. Acad. Sci. USA 76:1967-1971.
Analysis of repeating DNA sequences by reassoci- Brown, W. M., E. M. Prager, A. Wang and A. C.
ation. Meth. Enzymol. 29:365-418. Wilson. 1982. Mitochondrial DNA sequences of
Bron, C. and J. Kerbosch. 1973. Algorithm 457: Finding primates: Tempo and mode of evolution. J. Mol.
all cliques of an undirected graph. Comm. ACM Evol. 18:225-239.
16:575-577. Brownie, C. and D. D. Boos. 1994. Type I error robust-
Bronstein, I., J. C. Voyta, K. G. Lazzari, O. Murphy, B. ness of ANOVA and ANOVA on ranks when the
Edwards and L. J. Kricka. 1990. Rapid and sensi- number of treatments is large. Biometrics
tive detection of DNA in Southern blots with 50:542-549.
chemiluminescence. BioTechniques 8:310-312. Bruford, M. W. and R. K. Wayne. 1993. Microsatellites
Brooks, D. R. 1981. Hennig's parasitological method: and their application to population genetic stud-
A proposed solution. Syst. Zool. 30:229-249. ies. Curr. Opin. Genet. Dev. 3:939-943.
Brooks, D. R. 1990. Parsimony analysis in historical Bruford, M. W., O. Hanotte, J. F. Y. Brookfield and T.
biogeography and coevolution: Methodological Burke. 1992. Single-locus and multilocus DNA
and theoretical update. Syst. Zool. 39:14-30. fingerprinting, pp. 225-269. In A. R. Hoelzel (ed.),
Brooks, D. R. and D. A. McLennan. 1991. Phylogeny, Molecular Genetic Analysis of Populations: A
Ecology, and Behavior: A Research Program in Practical Approach. IRL Press, Oxford.
Comparative Biology. University of Chicago Press, Bruns, T. D. and J. D. Palmer. 1989. Evolution of mush-
Chicago. room mitochondrial DNA: Suillus and related
Brooks, D. R. and D. A. McLennan. 1993. Parascript: genera. J. Mol. Evol. 28:348-362.
Parasites and the Language of Evolution. Budowle, B.,A. M. Giusti, J. S Waye, F. S. Baechtcl, I<.
Smithsonian Institution Press, Washington, D.C. M. Fourney, D. E. Adams, L A. Presley, H. A
Broughton, R. E. and T. E. Dowling. 1994. Length vari- Dcadman and K.L. Monson. 1991. Rxed-bin
ation in mitochondrial DNA of the minnow, analysis for statistical evaluation of continuous
Cypriizella spiloptera. Genetics 138:179-190. distributions of allelic data from VNTR loci, for
Brown, A. D. H. 1975. Sample sizes needed to detect use 111 forensic comparisons. Am. J. .Human
linkage disequilibrium between two or tlvee loci. Genet. 48:841-855.
Theor. Pop. Biol. 8:184-201. Buffon, G.-L. de L., Comlc de. 1753. Histatre Nalureiic
Brown, J. K. M. 1994. Probabilities of evolutionary Ginirale et Particuladre. Val. 4 Imprimerie Royale,
trees. Syst. Biol. 43:78-91. Paris.
Brown, K. L. 1985. Demographic and genetic charac- Bull, J. J., C. W. Cunningham, I. J. Molineux, M. R.
teristics of dispersal in the mosquitofish, Badgett and D. M. Hillis. 1993a. Experimental
Gambusia affinis (Pisces: Poeciliidae). Copeia molecular evolution of bacteriophage T7.
1985:597-612. Evolution 47:993-1007.
Brown, T. A. and K. A. Brown. 1994. Ancient DNA: Bull, J. J., J. P. Huelsenbeck, C. W. Cunningham, D. L.
Using molecular biology to explore the past. Swofford and P. J. Waddell. 1993b. Partitioning
BioEssays 16:719-726. and combining data in phylogenetic analysis.
Brown, W. M. 1980. Polymorphism in mitochondrial Syst. Biol. 42:384-397.
DNA of humans as revealed by restriction Bulmer, M. 1991. Use of the method of generalized
endonuclease analysis. Proc. Natl. Acad. Sci. USA least squares in reconstructing phylogenies from
77:3605-3609. sequence data. Mol. Biol. Evol. 8:868-883.
Bulmer, M. 1994. Theoretical Evolutionary Ecology. Buth, D. G. 1982a. Glucosephosphate-isomerase
Sinauer Associates, Sunderland, Massachusetts. expression in a tetraploid fish, Moxostoma lachneri
Buneman, P. 1971. The recovery of trees from mea- (Cypriniformes, Catostomidae): Evidence for
sures of dissimilarity, pp. 387-395. In F. R. "retetraploidization"? Genetica 57:171-175.
Hodson, D. G. Kendall and P. Tautu (eds.), Buth, D. G. 1982b. Locus assignments for general mus-
Mathematics in the Archaeological and Historical cle proteins of darters (Etheostomatini). Copeia
Sciences. Edinburgh University Press, Edinburgh. 1982:217-219.
Buonagurio, D. A., S. Nakada, J. D. Parvin, M. Krystal, Buth, D. G. 1983. Duplicate isozyme loci in fishes:
P. Palese and W. M. Fitch. 1986. Evolution of Origins, distribution, phyletic consequences, and
human influenza A viruses over 50 years: Rapid, locus nomenclature, pp. 381-400. In M. C. Rattazzi,
uniform rate of change in NS gene. Science J. G. Scandalios and G. S. Whitt (eds.), Isozymes.
232:980-982. Current Topics in Biological and Medical Research, Vol.
Burke, T. 1989. DNA fingerprinting and other methods 10. Genetics and Evolution. A. R. Liss, New York.
for the study of mating success. Trends Ecol. Evol. Buth, D. G. 1984a. The application of electrophoretic
4:139-144. data in systematic studies. Annu. Rev. Ecol. Syst.
Burke, T. and M. W. Bruford. 1987. DNA fingerprint- 15:501-522.
ing in birds. Nature 327:149-152. Buth, D. G. 1984b. Allozymes of the cyprinid fishes:
Burke, T., N. B. Davies, M. W. Bruford and B. J. Variation and application, pp. 561-590. In B. J.
Hatchwell. 1989. Parental care and mating behav- Turner (ed.), Evolutionary Genetics of Fishes.
iour of polyandrous dunnocks Prunella modularis Plenum, New York.
related to paternity by DNA fingerprinting. Buth, D. G. 1990. Genetic principles and the interpre-
Nature 338:249-251. tation of electrophoretic data, pp. 1-21. In D. H.
Burke, T., O. Hanotte and M. W. Bruford. 1991. Whitmore (ed.), Electrophoretic and Isoelectric
Multilocus and single locus minisatellite analysis Focusing Techniques in Fisheries Management. CRC
in population biological studies, pp. 155-168. In T. Press, Boca Raton, Florida.
Burke, G. Dolf, A. J. Jeffreys and R. Wolff (eds.), Buth, D. G. and R. W. Murphy. 1980. Use of nicoti-
DNA Fingerprinting: Approaches and Applications. namide adenine dinucleotide (NAD)-dependent
Birkhäuser Verlag, Basel, Switzerland. glucose-6-phosphate dehydrogenase in enzyme
Burkhart, B. D., E. Montgomery, C. H. Langley and R. staining procedures. Stain Technol. 55:173-176.
A. Voelker. 1984. Characterization of allozyme Buth, D. G., B. M. Burr and J. R. Schenck. 1980.
null and low activity alleles from two natural Electrophoretic evidence for relationships and dif-
populations of Drosophila melanogaster. Genetics ferentiation among members of the percid sub-
107:295-306. genus Microperca. Biochem. Syst. Ecol. 8:297-304.
Burnell, K. L. and S. B. Hedges. 1990. Relationships Buth, D. G., R. W. Murphy, M. M. Miyamoto and C. S.
and biogeography of West Indian Anolis (Sauria: Lieb. 1985. Creatine kinases of amphibians and
Iguanidae): An approach using slow-evolving reptiles: Evolutionary and systematic aspects of
protein loci. Carib. J. Sci. 26:7-30. gene expression. Copeia 1985:279-284.
Burton, R. S. and B.-N. Lee. 1994. Nuclear and mito-
chondrial gene genealogies and allozyme poly- Caccone, A. and J. R. Powell. 1987. Molecular evolu-
morphism across a major phylogeographic break tionary divergence among North American cave
in the copepod Tigriopus californicus. Proc. Natl. crickets. II. DNA-DNA hybridization. Evolution
Acad. Sci. USA 91:5197-5201. 41:1215-1238.
Busack, S. D., B. G. Jericho, L. R. Maxson and T. Caccone, A. and J. R. Powell. 1992. A protocol for the
Uzzell. 1988. Evolutionary relationships of sala- TEACL method of DNA-DNA hybridization, pp.
manders in the genus Triturus: The view from 385-407. In G. M. Hewitt, A. W. B. Johnston and J.
immunology. Herpetologica 44:307-316. P. W. Young (eds.), Molecular Techniques in
Buth, D. G. 1979a. Creatine kinase variability in Taxonomy. Springer-Verlag, New York.
Moxostoma macrolepidotum (Cypriniformes: Caccone, A., G. D. Amato and J. R. Powell. 1987.
Catostomidae). Copeia 1979:152-154. Intraspecific DNA divergence in Drosophila: A
Buth, D. G. 1979b. Genetic relationships among the study on parthenogenetic D. mercatorum. Mol.
torrent suckers, genus Thoburnia. Biochem. Syst. Biol. Evol. 4:343-350.
Ecol. 7:311-316. Caccone, A., G. D. Amato and J. R. Powell. 1988a.
Buth, D. G. 1980. Staining procedures for D-2-hydrox- Rates and patterns of scnDNA and mtDNA diver-
yacid dehydrogenase as applied to studies of gence within the Drosophila melanogaster sub-
lower vertebrates. Isozyme Bull. 13:115. group. Genetics 118:671-683.
Caccone, A., R. DeSalle and J. R. Powell. 1988b. Cano, R. J., H. N. Poinar, N. J. Pieniazek, A. Acra and
Calibration of the change in thermal stability of G. O. Poinar, Jr. 1993. Amplification and sequenc-
DNA duplexes and degree of base pair mismatch. ing of DNA from a 120-135-million-year-old wee-
J. Mol. Evol. 27:212-216. vil. Nature 363:536-538.
Caccone, A., J. M. Gleason and J. R. Powell. 1992. Cantatore, P., M. N. Gadaleta, M. Roberti, C. Saccone
Complementary DNA-DNA hybridization in and A. C. Wilson. 1987. Duplication and remould-
Drosophila. J. Mol. Evol. 34:130-140. ing of tRNA genes during the evolutionary
Cadle, J. E. 1988. Phylogenetic relationships among rearrangement of mitochondrial genomes. Nature
advanced snakes: A molecular perspective. Univ. 329:853-854.
Calif. Pub. Zool. 119:1-77. Cao, Y., J. Adachi, T. Yano and M. Hasegawa. 1994.
Caetano-Anollés, G. and B. J. Bassam. 1993. DNA Phylogenetic placement of guinea pigs: No sup-
amplification fingerprinting using arbitrary port of the rodent polyphyly hypothesis from
oligonucleotide primers. Appl. Biochem. maximum likelihood analysis of multiple protein
Biotechnol. 42:189-200. sequences. Mol. Biol. Evol. 11:593-604.
Caetano-Anollés, G., B. J. Bassam and P. M. Gresshoff. Carbonari, M. 1993. Optimization of PCR perfor-
1992. Primer-template interactions during DNA mance. Trends Genet. 9:42-43.
amplification fingerprinting with single arbitrary Carlson, J. E., L. K. Tulsieram, J. C. Glaubitz, V. W. K.
oligonucleotides. Mol. Gen. Genet. 235:157-165. Luk, C. Kauffeldt and R. Rutledge. 1991. Segregation
Callan, H. G. 1966. Chromosomes and nucleoli of the of amplified DNA markers in F1 progeny of
axolotl, Ambystoma mexicanum. J. Cell Sci. 1:85-108. conifers. Theor. Appl. Genet. 83:194-200.
Callan, H. G. 1986. Lampbrush Chromosomes. Springer- Carlson, S. S., A. C. Wilson and R. D. Maxson. 1978.
Verlag, Berlin. Do albumin clocks run on time? Science
Callan, H. G., J. G. Gall and C. A. Berg. 1987. The 200:1183-1185.
lampbrush chromosomes of Xenopus laevis: Carpenter, J. M. 1988. Choosing among multiple
Preparation, identification, and distribution of 5S
DNA sequences. Chromosoma 95:236-250.
Callen, D. F., A. D. Thompson, Y. Shen, H. A. Phillips, Carr, S. M., A. J. Brothers and A. C. Wilson. 1987.
R. I. Richards, J. C. Mulley and G. R. Sutherland. Evolutionary inferences from restriction maps of
1993. Incidence and origin of "null" alleles in the mitochondrial DNA from nine taxa of Xenopus
(AC)n microsatellite markers. Am. J. Human frogs. Evolution 41:176-190.
Genet. 52:922-927. Case, S. M. and M. H. Wake. 1977. Immunological
Cameron, S. A. 1993. Multiple origins of advanced comparisons of Caecilian albumins (Amphibia:
eusociality in bees inferred from mitochondrial Gymnophiona). Herpetologica 33:94-98.
DNA sequences. Proc. Natl. Acad. Sci. USA Case, S. M. and E. E. Williams. 1984. Study of a contact
90:8687-8691. zone in the Anolis distichus complex in the central
Camin, J. H. and R. R. Sokal. 1965. A method for Dominican Republic. Herpetologica 40:118-137.
deducing branching sequences in phylogeny. Casillas, E., J. Sundquist and W. E. Ames. 1982.
Evolution 19:311-326. Optimization of assay conditions for, and selected
Campbell, J. A. and D. R. Frost. 1993. Anguid lizards tissue distribution of alanine aminotransferase
of the genus Abronia: Revisionary notes, descrip- and aspartate aminotransferase of English sole,
tions of four new species, a phylogenetic analysis, Parophrys vetulus Girard. J. Fish. Biol. 21:197-204.
and key. Bull. Amer. Mus. Nat. Hist. 216:1-121. Castora, F. J., N. Arnheim and M. V. Simpson. 1980.
Cann, R. L. and A. C. Wilson. 1983. Length mutations Mitochondrial DNA polymorphism: Evidence
in human mitochondrial DNA. Genetics that variants detected by restriction enzymes dif-
104:699-711. fer in nucleotide sequence rather than in methyla-
Cann, R. L., W. M. Brown and A. C. Wilson. 1984. tion. Proc. Natl. Acad. Sci. USA 77:6415-6419.
Polymorphic sites and the mechanism of evolu- Cate, R. C., C. W. Ehrenfels, M. Wysk, R. Tizard, J. C.
tion in human mitochondrial DNA. Genetics Voyta, O. Murphy III and I. Bronstein. 1991.
106:479-499. Genomic southern analysis with alkaline-phos-
Cannatella, D. C. and R. O. de Sá. 1993. Xenopus laevis phatase conjugated oligonucleotide probes and
as a model organism. Syst. Biol. 42:476-507. the chemiluminescent substrate AMPPD. Genet.
Cano, R. J. and M. K. Borucki. 1995. Revival and iden- Anal. Tech. Appl. 8:102-106.
tification of bacterial spores in 25- to 40-million- Catzeflis, F. M., F. H. Sheldon, J. E. Ahlquist and C. G.
year-old amber. Science 268:1060-1064. Sibley. 1987. DNA-DNA hybridization evidence
of the rapid rate of rodent DNA evolution. Mol. Chambers, G. K., W. G. Laver, S. Campbell and J. B.
Biol. Evol. 4:242-253. Gibson. 1981. Structural analysis of an elec-
Cavalier-Smith, T. (ed.). 1985a. The Evolution of Genome trophoretically cryptic alcohol dehydrogenase
Size. John Wiley & Sons, New York. variant from an Australian population of
Cavalier-Smith, T. 1985b. Eukaryotic gene numbers, Drosophila melanogaster. Proc. Natl. Acad. Sci. USA
non-coding DNA, and genome size, pp. 69-103. 78:3103-3107.
In T. Cavalier-Smith (ed.), The Evolution of Genome Champion, A. B., E. M. Prager, D. Wachter and A. C.
Size. Wiley, New York. Wilson. 1974. Microcomplement fixation, pp.
Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. 397-416. In C. A. Wright (ed.), Biochemical and
Phylogenetic analysis: Models and estimation Immunological Taxonomy of Animals. Academic
procedures. Evolution 21:550-570 and Am. J. Press, London.
Hum. Genet. 19:233-257. Champion, A. B., E. L. Barrett, N. J. Palleroni, K. L.
Cavalli-Sforza, L. L., A. C. Wilson, C. R. Cantor, R. M. Soderberg, R. Kunisawa, R. Contopoulou, A. C.
Cook-Deegan and M. C. King. 1991. Call for a Wilson and M. Doudoroff. 1980. Evolution in
worldwide survey of human genetic diversity: A Pseudomonas fluorescens. J. Gen. Microbiol.
vanishing opportunity for the Human Genome 120:485-511.
Project. Genomics 11:490-491. Chan, H.-C., W. T. Ruyechan and J. G. Wetmur. 1976.
Cavender, J. A. 1978. Taxonomy with confidence. In vitro iodination of low complexity nucleic
Math. Biosci. 40:271-280. acids without chain scission. Biochemistry
Cavender, J. A. 1981. Tests of phylogenetic hypotheses 15:5487-5490.
under generalized models. Math. Biosci. Chan, S. C., A. K. C. Wong and D. K. Y. Chiu. 1992. A
54:217-229. survey of multiple sequence comparison meth-
Cavender, J. A. and J. Felsenstein. 1987. Invariants of ods. Bull. Math. Biol. 54:563-598.
phylogenies in a simple case with discrete states. Chapela, I. H., S. A. Rehner, T. R. Schultz and U. G.
J. Classif. 4:57-71. Mueller. 1994. Evolutionary history of the sym-
Cedergren, R., M. W. Gray, Y. Abel and D. Sankoff. biosis between fungus-growing ants and their
1988. The evolutionary relationships among fungi. Science 266:1691-1694.
known life forms. J. Mol. Evol. 28:98-112. Chapman, R. W. and D. A. Powers. 1984. A method for
Cei, J. M. 1972. Archaeobatrachia versus Neobatrachia: rapid isolation of mtDNA from fishes. Maryland
A first serological approach. Serol. Mus. Bull. Sea Grant Tech. Rep. MD-SG-TS-84-05. 11 pp.
48:1-4. Charleston, M. A. 1994. Factors affecting the perfor-
Chakraborty, R. 1992. Sample size requirements for mance of phylogenetic methods. Ph.D. disserta-
addressing the population genetic issues of foren- tion, Massey University.
sic use of DNA typing. Human Biol. 64:141-159. Charleston, M. A., M. D. Hendy and D. Penny. 1994.
Chakraborty, R. and H. Danker-Hopfe. 1991. Analysis The effects of sequence length, tree topology, and
of population structure: A comparative study of number of taxa on the performance of phyloge-
different estimators of Wright's fixation indices, netic methods. J. Computation. Biol. 1:133-151.
pp. 203-254. In C. R. Rao and R. Chakraborty Chase, C. D., M. Ortega and C. E. Vallejos. 1991.
(eds.), Handbook of Statistics, Volume 8. North- DNA restriction fragment length polymorphisms
Holland, Amsterdam. correlate with isozyme diversity in Phaseolus vul-
Chakraborty, R. and L. Jin. 1993. Determination of garis L. Theor. Appl. Genet. 81:806-811.
relatedness between individuals using DNA fin- Chase, M. W., D. E. Soltis, R. G. Olmstead, D. Morgan,
gerprinting. Human Biol. 65:875-895. D. H. Les, B. D. Mishler, M. R. Duvall, R. A. Price,
Chakraborty, R. and K. K. Kidd. 1991. The utility of H. G. Hills, Y.-L. Qiu, K. A. Kron, J. H. Rettig, E.
DNA typing in forensic work. Science Conti, J. D. Palmer, J. R. Manhart, K. J. Sytsma, H.
254:1735-1739. J. Michaels, W. J. Kress, K. G. Karol, W. D. Clark,
Chakraborty, R. and O. Leimar. 1987. Genetic variation M. Hedren, B. S. Gaut, R. K. Jansen, K. J. Kim, C.
within a subdivided population, pp. 90-120. In N. F. Wimpee, J. F. Smith, G. R. Furnier, S. H. Strauss,
Ryman and F. Utter (eds.), Population Genetics and Q.-Y. Xiang, G. M. Plunkett, P. S. Soltis, S. M.
Fisheries Management. University of Washington Swensen, S. E. Williams, P. A. Gadek, C. J. Quinn,
Press, Seattle. L. E. Eguiarte, E. Golenberg, G. H. J. Learn, S. W.
Chakraborty, R. and M. Nei. 1977. Bottleneck effects Graham, S. C. Barrett, S. Dayanandan and A.
on average heterozygosity and genetic distance Albert. 1993. Phylogenetics of seed plants: An
with the stepwise mutation model. Evolution analysis of nucleotide sequences from the plastid
31:347-356. gene rbcL. Ann. Missouri Bot. Gard. 80:528-580.
Cheliak, W. M. and J. A. Pitel. 1984. Techniques for Church, G. M. and W. Gilbert. 1984. Genomic sequenc-
starch gel electrophoresis of enzymes from forest ing. Proc. Natl. Acad. Sci. USA 81:1991-1995.
trees. Information Report PI-X-42. Petawawa Nat. Church, G. M. and S. Kieffer-Higgins. 1988. Multiplex
Forestry Inst., Canadian Forestry Service. DNA sequencing. Science 240:185-188.
Chen, B.-Y., S.-H. Mao and Y.-H. Ling. 1980. Churchill, G. A., A. von Haeseler and W. C. Navidi.
Evolutionary relationships of turtles suggested by 1992. Sample size for a phylogenetic inference.
immunological cross-reactivity of albumins. Mol. Biol. Evol. 9:753-769.
Comp. Biochem. Physiol. 66B:421-425. Clark, A. G. and C. M. S. Lanigan. 1993. Prospects for
Cheng, S., C. Fockler, W. M. Barnes and R. Higuchi. estimating nucleotide divergence with RAPDs.
1994a. Effective amplification of long targets from Mol. Biol. Evol. 10:1096-1111.
cloned inserts and human genomic DNA. Proc. Clark, A. G. and L. Wang. 1994. Comparative evolu-
Natl. Acad. Sci. USA 91:5695-5699. tionary analysis of metabolism in nine Drosophila
Cheng, S., R. Higuchi and M. Stoneking. 1994b. species. Evolution 48:1230-1243.
Complete mitochondrial genome amplification. Clark, A. G. and T. S. Whittam. 1992. Sequencing
Nature Genetics:350-351. errors and molecular evolutionary analysis. Mol.
Chepko-Sade, B. D. and Z. T. Halpin (eds.). 1987. Biol. Evol. 9:744-752.
Mammalian Dispersal Patterns: The Effects of Social Clegg, M. T. 1993. Chloroplast gene sequences and the
Structure on Population Genetics. University of study of plant evolution. Proc. Natl. Acad. Sci.
Chicago Press, Chicago. USA 90:363-367.
Cherry, L. M., S. M. Case, J. G. Kunkel, J. S. Wyles and Clegg, M. T. and G. Zurawski. 1992. Chloroplast DNA
A. C. Wilson. 1982. Body shape metrics and and the study of plant phylogeny: Present status
organismal evolution. Evolution 36:914-933. and future prospects, pp. 1-13. In P. S. Soltis, D. E.
Chesser, R. K. 1983. Genetic variability within and Soltis and J. J. Doyle (eds.), Molecular Systematics
among populations of the black-tailed prairie of Plants. Chapman and Hall, New York.
dog. Evolution 37:320-331. Clegg, M. T., G. H. Learn and E. M. Golenberg. 1990.
Cheverud, J. M., M. M. Dow and W. Leutenegger. Molecular evolution of chloroplast DNA. In R. K.
1984. The quantitative assessment of phylogenetic Selander, A. G. Clark and T. S. Whittam (eds.),
constraints in comparative analyses: Sexual Evolution at the Molecular Level. Sinauer,
dimorphism in body weight among primates. Sunderland, Massachusetts.
Evolution 39:1335-1351. Clayton, J. W. and D. N. Tretiak. 1972. Amine-citrate
Chilson, O. P., L. A. Costello and N. O. Kaplan. 1965. buffers for pH control in starch gel electrophore-
Effects of freezing on enzymes. Fed. Proc. 24 sis. J. Fish. Res. Board Canada 29:1169-1172.
(s15):555-565. Cochrane, B. J. and R. C. Richmond. 1979. Studies of
Chippindale, P. T. 1989. A high-pH discontinuous esterase 6 in Drosophila melanogaster. I. The genet-
buffer system for resolution of isozymes in starch- ics of posttranslational modification. Biochem.
gel electrophoresis. Stain Technol. 64:61-64. Genet. 17:167-183.
Chippindale, P. T. and J. J. Wiens. 1994. Weighting, Cockerham, C. C. 1969. Variance of gene frequencies.
partitioning, and combining characters in phylo- Evolution 23:72-84.
genetic analysis. Syst. Biol. 43:278-287. Cockerham, C. C. 1971. Higher-order probabilities of
Cho, S., A. Mitchell, J. Regier, C. Mitter, R. Poole, identity by descent. Genetics 69:235-246.
T. Friedlander, and S. Zhao. 1995. A highly con- Cockerham, C. C. 1973. Analyses of gene frequencies.
served nuclear gene for low-level phylogenetics: Genetics 74:679-700.
Elongation factor-1α recovers morphology-based Cockerham, C. C. 1984. Drift and mutation with a
tree for heliothine moths. Mol. Biol. Evol. 12: finite number of allelic states. Proc. Natl. Acad.
650-656. Sci. USA 81:530-534.
Choudhary, M., J. E. Strassmann, C. R. Solis and D. C. Cockerham, C. C. and B. S. Weir. 1983. Variance of
Queller. 1993. Microsatellite variation in social actual inbreeding. Theor. Pop. Biol. 23:85-109.
insects. Biochem. Genet. 31:87-95. Cockerham, C. C. and B. S. Weir. 1986. Estimation of
Chrambach, A. and D. Rodbard. 1971. Polyacrylamide inbreeding parameters in stratified populations.
gel electrophoresis. Science 172:440-451. Ann. Human Genet. 50:271-281.
Christiansen, F. B. and O. Frydenberg. 1973. Selection Cockerham, C. C. and B. S. Weir. 1987. Correlations,
component analysis of natural polymorphisms descent measures: Drift with migration and muta-
using population samples including mother-off- tion. Proc. Natl. Acad. Sci. USA 84:8512-8514.
spring combinations. Theor. Pop. Biol. 4:425-445. Cockerham, C. C. and B. S. Weir. 1993. Estimation of
gene flow from F-statistics. Evolution 47:855-863.
Cocks, G. T. and A. C. Wilson. 1972. Enzyme evolution Biological and Medical Research, Vol. 6. A. R. Liss,
in the Enterobacteriaceae. J. Bacteriol. 110:793-802. New York.
Coen, E., T. Strachan and G. Dover. 1982. Dynamics of Crabtree, C. B. 1987. Allozyme evidence for the phylo-
concerted evolution in regions of ribosomal DNA genetic relationships within the silverside sub-
and histone gene families in the melanogaster family Atherinopsinae. Copeia 1987:860-867.
group of Drosophila. J. Mol. Biol. 158:17-35. Cracraft, J. 1987. DNA hybridization and avian phylo-
Colless, D. H. 1970. The phenogram as an estimate of genetics. Evol. Biol. 21:47-96.
phylogeny. Syst. Zool. 19:352-362. Cracraft, J. 1989. Speciation and its ontology: The
Collier, G. E. 1990. Evolution of arginine kinase within empirical consequences of alternative species con-
the genus Drosophila. J. Hered. 81:177-182. cepts for understanding patterns and processes of
Collier, G. E. and R. J. MacIntyre. 1977. differentiation, pp. 28-59. In D. Otte and J. A.
Microcomplement fixation studies on the evolu- Endler (eds.), Speciation and its Consequences.
tion of a-glycerophosphate dehydrogenase within Sinauer Associates, Sunderland, MA.
the genus Drosophila. Proc. Natl. Acad. Sci. USA Crandall, K. A. 1994. Intraspecific cladogram estima-
74:684-688. tion: Accuracy at higher levels of divergence.
Collins, T. M., F. Kraus and G. Estabrook. 1994a. Syst. Biol. 43:222-235.
Compositional effects and weighting of Crandall, K. A. 1995a. Intraspecific phylogenetics:
nucleotide sequences for phylogenetic analysis. Support for dental transmission of human
Syst. Biol. 43:449-459. immunodeficiency virus. J. Virol. 69:2351-2356.
Collins, T. M., P. H. Wimberger and G. J. P. Naylor. Crandall, K. A. 1995b. Multiple interspecies transmis-
1994b. Compositional bias, character-state bias, sions of human and simian T-cell leukemia/lym-
and character-state reconstruction using parsimo- phoma virus type I sequences. Mol. Biol. Evol. (in
ny. Syst. Biol. 43:482-496. press)
Comings, D. E. 1978. Mechanisms of chromosome Crandall, K. A., A. R. Templeton and C. F. Sing. 1994.
banding and implications for chromosome struc- Intraspecific phylogenetics: Problems and solu-
ture. Annu. Rev. Genet. 12:25-46. tions, pp. 273-297. In R. W. Scotland, D. J. Siebert
Commerford, S. L. 1971. Iodination of nucleic acids in and D. M. Williams (eds.), Models in Phylogeny
vitro. Biochemistry 10:1993-2000. Reconstruction. Clarendon Press, Oxford.
Conkle, M. T., P. D. Hodgskiss, L. B. Nunnally and S. Crandall, K. A. and A. R. Templeton. 1996.
C. Hunter. 1982. Starch Gel Electrophoresis of Conifer Applications of intraspecific phylogenetics. In P.
Seeds: A Laboratory Manual. Gen. Tech. Report H. Harvey, A. J. Leigh Brown and J. Maynard
PSW-64. Pacific Southwest Forest and Range Smith (eds.), New Uses for New Phylogenies. Oxford
Experimental Station, Forest Service, U.S. Dept. University Press, Oxford.
Agriculture, Berkeley, California. Crawford, D. J. 1989. Enzyme electrophoresis and
Cooper, A., C. Mourer-Chauviré, G. K. Chambers, A. plant systematics, pp. 146-164. In Soltis, D. E. and
von Haeseler, A. C. Wilson and S. Pääbo. 1992. P. S. Soltis (eds.), Isozymes in Plant Biology.
Independent origins of New Zealand moas and Dioscorides Press, Portland, Oregon.
kiwis. Proc. Natl. Acad. Sci. USA 89:8741-8744. Crawford, D. J. 1990. Plant Molecular Systematics. John
Coradin, L. and D. E. Giannasi. 1980. The effects of Wiley and Sons, New York.
chemical preservatives on plant collections to be Crawford, D. L. and D. A. Powers. 1989. Molecular
used in chemotaxonomic surveys. Taxon 29:33-40. basis of evolutionary adaptation at the lactate
Cornall, R. J., T. J. Aitman, C. M. Hearne and J. A. dehydrogenase-B locus in the fish Fundulus hetero-
Todd. 1991. The generation of a library of PCR- clitus. Proc. Natl. Acad. Sci. USA 86:9365-9369.
analyzed microsatellite variants for genetic map- Crawford, T. J. 1984. What is a population?, pp.
ping of the mouse genome. Genomics 10:874-881. 135-174. In B. Shorrocks (ed.), Evolutionary
Corriveau, J. L. and A. W. Coleman. 1988. Rapid Ecology. The 23rd Symposium of the British Ecological
screening method to detect potential biparental Society. Blackwell, Oxford.
inheritance of plastid DNA and results from over Craxton, M. 1991. Linear amplification sequencing: A
200 angiosperm species. Am. J. Bot. 75:1443-1458. powerful method for sequencing DNA. Methods
Cox, D. R. and H. D. Miller. 1977. The Theory of 3:20-24.
Stochastic Processes. Chapman and Hall, London. Creasey, A., L. D'Angio, T. S. Dunne, C. Kissinger, T.
Coyne, J. 1982. Gel electrophoresis and cryptic protein O'Keeffe, H. Perry-O'Keeffe, L. S. Moran, M.
variation, pp. 1-32. In M. Rattazzi, J. Scandalios Roskey, I. Schildkraut, L. E. Sears and B. Slatko.
and G. Whitt (eds.), Isozymes: Current Topics in 1991. Application of a novel chemiluminescence-
based DNA detection method to single-vector Daly, J. C. 1981. Effects of social organization and envi-
and multiplex DNA sequencing. BioTechnology ronmental diversity on determining the genetic
11:102-109. structure of a population of the wild rabbit,
Cremisi, F., R. Vignali, R. Batistoni and G. Barsacchi. Oryctolagus cuniculus. Evolution 35:689-706.
1988. Heterochromatic DNA in Triturus Dando, P. R., K. B. Storey, P. W. Hochachka and J. M.
(Amphibia, Urodela) II. A centromeric satellite Storey. 1981. Multiple dehydrogenases in marine
DNA. Chromosoma 97:204-211. molluscs: Electrophoretic analysis of alanopine
dehydrogenase, strombine dehydrogenase,
Cronin, J. E. and V. M. Sarich. 1975. Molecular system- octopine dehydrogenase, and lactate dehydroge-
atics of the New World monkeys. J. Human Evol. nase. Marine Biol. Letters 2:249-257.
4:357-375. Danna, K. J. 1980. Determination of fragment order
Cross, T. F., R. D. Ward and A. Abreu-Grobois. 1979. through partial digests and multiple enzyme
Duplicate loci and allelic variation for mitochon- digests. Meth. Enzymol. 65:449-467.
drial malic enzyme in the Atlantic salmon, Salmo Danzmann, R. G. and J. P. Bogart. 1982a. Evidence for
salar L. Comp. Biochem. Physiol. 62B:403-406. a polymorphism in gametic segregation using a
Crother, B. I. 1990. Is "some better than none" or do malate dehydrogenase locus in the tetraploid
allele frequencies contain phylogenetically useful treefrog Hyla versicolor. Genetics 100:287-306.
information? Cladistics 6:277-281. Danzmann, R. G. and J. P. Bogart. 1982b. Gene dosage
Crother, B. I. 1992. Genetic characters, species con- effects on MDH isozyme expression in diploid,
cepts, and conservation biology. Conserv. Biol. triploid, and tetraploid treefrogs of the genus
6:314. Hyla. J. Hered. 73:277-280.
Crouau-Roy, B. 1988. Genetic structure of cave- Darnell, R., H. Lodish and D. Baltimore. 1986.
dwelling beetles populations: Significant deficien- Molecular Cell Biology. Scientific American Books,
cies of heterozygotes. Heredity 60:321-327. New York.
Crouse, J. and D. Amorese. 1986. Stability of restriction Darwin, C. 1859. On the Origin of Species by Means of
endonucleases during extended digestion. Focus Natural Selection. J. Murray, London.
(BRL) 8:1-2. Daugherty, C. H., A. Cree, J. M. Hay and M. B.
Crow, J. F. 1985. The neutrality-selection controversy Thompson. 1990. Neglected taxonomy and con-
in the history of evolution and population genet- tinuing extinctions of tuatara (Sphenodon). Nature
ics, pp. 1-18. In T. Ohta and K. Aoki (eds.), 347:177-179.
Population Genetics and Molecular Evolution. Japan Davies, D. H., R. Lawson, S. J. Burch and J. E. Hanson.
Science Press and Springer-Verlag, Tokyo. 1987. Evolutionary relationships of a "primitive"
Crozier, R. H. and Y. C. Crozier. 1993. The mitochondr- shark (Heterodontus) assessed by micro-comple-
ial genome of the honeybee Apis mellifera: ment fixation of serum transferrin. J. Mol. Evol.
Complete sequence and genome organization. 25:74-80.
Genetics 133:97-117. Davis, J. I. and K. C. Nixon. 1992. Populations, genetic
CSKRN. 1973. Committee for a standardized kary- variation, and the delimitation of phylogenetic
otype of the Norway rat, Rattus norvegicus. species. Syst. Biol. 41:421-435.
Cytogenet. Cell Genet. 12:199-205. Davis, L. G., M. D. Dibner and J. F. Battey. 1986. Basic
Cunningham, C. W., N. W. Blackstone and L. W. Buss. Methods in Molecular Biology. Elsevier Science
1992. Evolution of king crabs from hermit crab Publ., New York.
ancestors. Nature 355:539-542. Davis, L. M., F. R. Fairfield, M. L. Hammond, C. A.
Cummings, M. P., S. P. Otto and J. Wakeley. 1995. Harger, J. H. Jett, R. A. Keller, J. H. Hahn, L. A.
Sampling properties of DNA sequence data in Krakowski, B. Marrone, J. C. Martin, H. L. Nutter,
phylogenetic analysis. Mol. Biol. Evol. 12:814-823. R. R. Ratliff, E. B. Shera, D. J. Simpson, S. A. Soper
and C. W. Wilkerson. 1992. Rapid DNA sequenc-
D'Agostino, R. B., W. Chase and A. Belanger. 1988. The ing based on single-molecule detection. Los
appropriateness of some common procedures for Alamos Sci. 20:280-285.
testing the equality of two independent binomial Davis, M. B. 1973. Labeling of DNA with 125I. Carnegie
populations. Am. Statist. 42:198-202. Inst. Wash. Yearbook 72:217-221.
Dallas, J. F. 1988. Detection of DNA "fingerprints" of Davison, D. 1985. Sequence similarity ("homology")
cultivated rice by hybridization with a human searching for molecular biologists. Bull. Math.
minisatellite DNA probe. Proc. Natl. Acad. Sci. Biol. 47:437-474.
USA 85:6831-6835.
Dawley, R. M. and J. P. Bogart (eds.). 1989. Evolution gressive hybridization: Implications for evolution
and Ecology of Unisexual Vertebrates. Bull. New and conservation. Proc. Natl. Acad. Sci. USA
York State Museum, Albany. 89:2747-2751.
Dawley, R. M., J. H. Graham and R. J. Schultz. 1985. Dene, H., M. Goodman and W. S. Prychodko. 1978. An
Triploid progeny of pumpkinseed x green sunfish immunological examination of the systematics of
hybrids. J. Hered. 76:251-257. the Tupaioidea. J. Mammal. 59:697-706.
Dawson, D. M., H. M. Eppenberger and N. O. Kaplan. Densmore, L. D. 1983. Biochemical and immunologi-
1967. The comparative enzymology of creatine cal systematics of the order Crocodilia. Evol. Biol.
kinases. II. Physical and chemical properties. J. 16:397-465.
Biol. Chem. 242:210-217. Densmore, L. D., J. W. Wright and W. M. Brown. 1985.
Dayhoff, M. O. 1978. Atlas of Protein Sequence and Length variation and heteroplasmy are frequent
Structure, vol. 5, suppl. 3. Natl. Biomed. Res. in mitochondrial DNA from parthenogenetic and
Found., Silver Spring, Maryland. bisexual lizards (genus Cnemidophorus). Genetics
Dayhoff, M. O., R. M. Schwartz and B. C. Orcutt. 1978. 110:698-707.
A model of evolutionary change in proteins, pp. de Queiroz, A. 1993. For consensus (sometimes). Syst.
345-352. In Dayhoff, M. O. 1978. Atlas of Protein Biol. 42:368-372.
Sequence and Structure, vol. 5, suppl. 3. Natl. Derr, J. N., J. W. Bickham, I. F. Greenbaum, A. G. J.
Biomed. Res. Found., Silver Spring, Maryland. Rhodin and R. A. Mittermeier. 1987. Biochemical
Debeau, L., L. A. Chandler, J. R. Gralow, P. W. Nichols systematics and evolution in the South American
and P. A. Jones. 1986. Southern blot analysis of turtle genus Platemys (Pleurodira: Chelidae).
DNA extracted from formalin-fixed pathology Copeia 1987:370-375.
specimens. Cancer Res. 46:2964-2969. de Sá, R. O. and D. M. Hillis. 1990. Phylogenetic rela-
DeBorde, D. C., C. W. Naeve, M. L. Herlocher and H. tionships of the pipid frogs Xenopus and Silurana:
F. Maassab. 1986. Resolution of a common RNA An integration of ribosomal DNA and morpholo-
sequencing ambiguity by terminal deoxynu- gy. Mol. Biol. Evol. 7:365-376.
cleotidyl transferase. Analyt. Biochem. DeSalle, R. 1992. The phylogenetic relationships of
157:275-282. flies in the family Drosophilidae deduced from
DeBry, R. W. 1992. The consistency of several phyloge- mtDNA sequences. Mol. Phylogenet. Evol.
ny-inference methods under varying evolutionary 1:31-40.
rates. Mol. Biol. Evol. 9:537-551. DeSalle, R. 1994. Implications of ancient DNA for phy-
DeBry, R. W. and N. A. Slade. 1985. Cladistic analysis logenetic studies. Experientia 50:542-550.
of restriction endonuclease cleavage maps within DeSalle, R. and D. Grimaldi. 1993. Phylogenetic pat-
a maximum-likelihood framework. Syst. Zool. tern and developmental process in Drosophila.
34:21-34. Syst. Biol. 42:458-475.
Degnan, S. D. 1993a. Genetic variability and popula- DeSalle, R. and A. R. Templeton. 1988. Founder effects
tion differentiation inferred from DNA finger- accelerate the rate of mitochondrial DNA evolu-
printing in silvereyes (Aves: Zosteropidae). tion in Hawaiian Drosophila. Evolution
Evolution 47:1105-1127. 42:1076-1084.
Degnan, S. D. 1993b. The perils of single gene trees- DeSalle, R., L. V. Giddings and A. R. Templeton. 1986.
mitochondrial versus single-copy nuclear DNA Mitochondrial DNA variability in natural popula-
variation in white-eyes (Aves: Zosteropidae). Mol. tions of Hawaiian Drosophila. I. Methods and lev-
Ecol. 2:219-225. els of variability in D. silvestris and D. heteroneura
Deininger, L. and G. R. Daniels. 1986. The recent populations. Heredity 56:75-85.
evolution of mammalian repetitive DNA ele- DeSalle, R., T. Freedman, E. M. Prager and A. C.
ments. Trends Genet. 2:76-80. Wilson. 1987a. Tempo and mode of sequence evo-
DeLong, E. F. 1990. Archaea in coastal marine environ- lution in mitochondrial DNA of Hawaiian
ments. Proc. Natl. Acad. Sci. USA 89:5685-5689. Drosophila. J. Mol. Evol. 26:157-164.
DeLorenzo, R. J. and F. H. Ruddle. 1969. Genetic con- DeSalle, R., A. R. Templeton, I. Mori, S. Pietscher and
trol of two electrophoretic variants of glu- J. S. Johnson. 1987b. Temporal and spatial hetero-
cosephosphate isomerase in the mouse. Biochem. geneity of mtDNA polymorphisms in natural
Genet. 3:151-162. populations of Drosophila mercatorum. Genetics
DeMarais, B. D., T. E. Dowling, M. E. Douglas, W. L. 116:215-233.
Minckley and P. C. Marsh. 1992. Origin of Gila DeSalle, R., J. Gatesy, W. Wheeler and D. Grimaldi.
seminuda (Teleostei: Cyprinidae) through intro- 1992. DNA sequences from a fossil termite in
Oligo-Miocene amber and their phylogenetic Dessauer, H. C., J. B. Cadle and R. Lawson. 1987.
implications. Science 257:1933-1936. Patterns of snake evolution suggested by their
DeSalle, R., A. K. Williams and M. George. 1993. proteins. Fieldiana Zool. N.S. 34:1-34.
Isolation and characterization of animal mito- Dessauer, H. C., M. S. Hafner, R. M. Zink and C. J.
chondrial DNA. Meth. Enzymol. 224:176-204. Cole. 1988. A national program to develop, main-
Desjardins, P. and R. Morais. 1991. Nucleotide tain, and utilize frozen tissue collections for scien-
sequence and evolution of coding and noncoding tific research. Assoc. Syst. Collections Newsletter
regions of a quail mitochondrial genome. J. Mol. 16:3, 9-10.
Evol. 32:153-161. Diaz, M. O., G. Barsacchi-Pilone, K. A. Mahon and J.
De Soete, G. 1983a. A least squares algorithm for fit- G. Gall. 1981. Transcripts from both strands of a
ting additive trees to proximity data. satellite DNA occur on lampbrush chromosome
Psychometrika 48:621-626. loops of the newt Notophthalmus. Cell 24:649-659.
De Soete, G. 1983b. On the construction of Dickersin, K. and J. A. Berlin. 1992. Meta-analysis:
"optimal" phylogenetic trees. Z. Naturforsch. State-of-the-science. Epidemiol. Rev. 14:154-176.
38:156-158. Dickerson, R. E. 1971. The structure of cytochrome c
Dessauer, H. C. and C. J. Cole. 1984. Influence of gene and the rates of molecular evolution. J. Mol. Evol.
dosage on electrophoretic phenotypes of proteins 1:26-45.
from lizards of the genus Cnemidophorus. Comp. Dietrich, W., H. Katz, S. E. Lincoln, H.-S. Shin, J.
Biochem. Physiol. 77B:181-189. Friedman, N. C. Dracopoli and E. S. Lander. 1992.
Dessauer, H. C. and C. J. Cole. 1986. Clonal inheri- A genetic map of the mouse suitable for typing
tance in parthenogenetic whiptail lizard: intraspecific crosses. Genetics 131:423-447.
Biochemical evidence. J. Hered. 77:8-12. Dietrich, W., J. Miller, H. Katz, D. Joyce, R. Steen, S.
Dessauer, H. C. and C. J. Cole. 1989. Diversity between Lincoln, M. Daly, M. P. Reeve, A. Weaver, P.
and within nominal forms of unisexual teiid Anagnostopoulis, N. Goodman, N. Dracopoli and
lizards, pp. 49-71. In R. Dawley and J. P. Bogart E. S. Lander. 1993. SSLP genetic mapping of the
(eds.), Evolution and Ecology of Unisexual mouse (Mus musculus) SN-40, pp. 110-142. In S. J.
Vertebrates. New York State Museum, Albany. O'Brien (ed.), Genetic Maps: Locus Maps of Complex
Dessauer, H. C. and C. J. Cole. 1991. Genetics of whip- Genomes. Book 4. Nonhuman Vertebrates. 6th ed.
tail lizards (Reptilia: Teiidae: Cnemidophorus) in a Cold Spring Harbor Laboratory Press, Cold
hybrid zone in southwestern New Mexico. Spring Harbor, New York.
Copeia 1991:622-637. DiLella, A. G. and S. L. C. Woo. 1987. Cloning large
Dessauer, H. C. and M. S. Hafner (eds.). 1984. segments of genomic DNA using cosmid vectors.
Collections of Frozen Tissues: Value, Management, Meth. Enzymol. 152:199-212.
Field and Laboratory Procedures, and Directory of DiMichele, L. and D. A. Powers. 1982a. LDH-B geno-
Existing Collections. Assoc. Systematics type-specific hatching times of Fundulus heterocli-
Collections, Washington, D.C. tus embryos. Nature 296:563-564.
Dessauer, H. C. and R. A. Menzies. 1984. Stability of DiMichele, L. and D. A. Powers. 1982b. Physiological
macromolecules during long-term storage, pp. basis for swimming endurance differences
17-20. In H. C. Dessauer and M. S. Hafner (eds.), between LDH-B genotypes of Fundulus heterocli-
Collections of Frozen Tissues: Value, Management, tus. Science 216:1014-1016.
Field and Laboratory Procedures, and Directory of Dimmick, W. W. 1987. Phylogenetic relationships of
Existing Collections. Assoc. Syst. Collections, Notropis hubbsi, N. welaka and N. emiliae
University of Kansas Press, Lawrence. (Cypriniformes: Cyprinidae). Copeia
Dessauer, H. C., M. J. Braun and S. Neville. 1983. A 1987:316-325.
simple hand centrifuge for field use. Isozyme Dixon, M. T. and D. M. Hillis. 1993. Ribosomal RNA
Bull. 16:91. secondary structure: Compensatory mutations
Dessauer, H. C., R. A. Menzies and D. E. Fairbrothers. and implications for phylogenetic analysis. Mol.
1984. Procedures for collecting and preserving tis- Biol. Evol. 10:256-267.
sues for molecular studies, pp. 21-24. In H. C. Dobzhansky, T. 1937. Genetics and the Origin of Species.
Dessauer and M. S. Hafner (eds.), Collections of Reprinted 1982, Columbia University Press, New
Frozen Tissues: Value, Management, Field and York.
Laboratory Procedures, and Directory of Existing Dodds, K. G. 1986. Resampling methods in genetics
Collections. Assoc. Syst. Collections, University of and the effect of family structure in genetic data.
Kansas Press, Lawrence. Inst. Statist. Mimeo Series
1684T, North Carolina
State University, Raleigh.
Domingo, E. and J. J. Holland. 1994. Mutation rates Dowling, T. E. and M. R. Childs. 1992. Impact of
and rapid evolution of RNA viruses, pp. 161-184. hybridization on a threatened trout of the south-
In S. S. Morse (ed.), The Evolutionary Biology of western United States. Conserv. Biol. 6:355-364.
Viruses. Raven Press, New York. Dowling, T. E. and B. D. DeMarais. 1993. Evolutionary
Donoghue, M. J. 1989. Phylogenies and the analysis of significance of introgressive hybridization in
evolutionary sequences, with examples from seed cyprinid fishes. Nature 362:444-446.
plants. Evolution 43:1127-1156. Dowling, T. E. and W. R. Hoeh. 1991. The extent of
Donoghue, M. J., R. G. Olmstead, J. F. Smith and J. D. introgression outside the hybrid zone between
Palmer. 1992. Phylogenetic relationships of Notropis cornutus and Notropis chrysocephalus
Dipsacales based on rbcL sequences. Ann. Missouri (Teleostei: Cyprinidae). Evolution 45:944-956.
Bot. Garden 79:249-265. Dowling, T. E., G. R. Smith and W. M. Brown. 1989.
Donnellan, S. C. and K. P. Aplin. 1989. Resolution of Reproductive isolation and introgression between
cryptic species in the New Guinean lizard, Notropis cornutus and Notropis chrysocephalus (fam-
Sphenomorphus jobiensis (Scincidae) by elec- ily Cyprinidae): Comparison of morphology,
trophoresis. Copeia 1989:81-88. allozymes, and mitochondrial DNA. Evolution
Doolittle, R. F. (ed.). 1990a. Molecular Evolution: 43:620-634.
Computer Analysis of Protein and Nucleic Acid Dowling, T. E., B. D. DeMarais, W. L. Minckley, M. E.
Sequences. Methods in Enzymology, 183. Douglas and P. C. Marsh. 1992a. Use of genetic
Academic Press, San Diego. characters in conservation biology. Conserv. Biol.
Doolittle, R. F. 1990b. Searching through sequence 6:7-8.
databases. Meth. Enzymol. 183:99-110. Dowling, T. E., G. R. Smith, W. R. Hoeh and W. M.
Doolittle, R. F., D.-F. Feng, M. A. McClure and M. S. Brown. 1992b. Evolutionary relationships of shin-
Johnson. 1990. Retrovirus phylogeny and evolu- ers in the genus Luxilus (Cyprinidae) as deter-
tion. Curr. Top. Micro. Immunol. 157:1-18. mined by analysis of mitochondrial DNA. Copeia
Doolittle, W. F. 1985. Middle repetitive DNAs, pp. 1992:306-322.
443-487. In T. Cavalier-Smith (ed.), The Evolution Downie, S. R. and J. D. Palmer. 1992a. Restriction site
of Genome Size. John Wiley and Sons, New York. mapping of the chloroplast DNA inverted repeat:
Dorit, R. L., H. Akashi and W. Gilbert. 1995. Absence A molecular phylogeny of the Asteridae. Ann.
of polymorphism at the ZFY locus on the human Missouri Bot. Garden 79:266-283.
Y chromosome. Science 268:1183-1185. Downie, S. R. and J. D. Palmer. 1992b. Use of chloro-
Dover, G. A. 1982. Molecular drive, a cohesive mode plast DNA rearrangements in reconstructing
of species evolution. Nature 299:111-117. plant phylogeny, pp. 14-35. In P. S. Soltis, D. E.
Dover, G. A. 1987. Letter to the editor. Cell 51:515. Soltis and J. Doyle (eds.), Molecular Systematics of
Dover, G. A. and D. Tautz. 1986. Conservation and Plants. Chapman and Hall, New York.
divergence in multigene families: Alternatives to Downie, S. R. and J. D. Palmer. 1994. A chloroplast
selection and drift. Phil. Trans. Roy. Soc. London DNA phylogeny of the Caryophyllales based on
B312:275-289. structural and inverted repeat restriction site vari-
Dover, G. A., S. Brown, E. Coen, J. Dallas, T. Strachan ation. Syst. Bot. 19:236-252.
and M. Trick. 1982. The dynamics of genome evo- Downie, S. R., R. G. Olmstead, G. Zurawski, D. E.
lution and species differentiation, pp. 343-372. In Soltis, P. S. Soltis, J. C. Watson and J. D. Palmer. 1991.
G. A. Dover and R. B. Flavell (eds.), Genome Six independent losses of the chloroplast DNA
Evolution. Academic Press, New York. rpl2 intron in dicotyledons: Molecular and phylo-
Dowling, H. G., R. Highton, G. C. Maha and L. R. genetic implications. Evolution 45:1245-1259.
Maxson. 1983. Biochemical evaluation of colubrid Doyle, J. J. 1992. Gene trees and species trees:
snake phylogeny. J. Zool. (London) 201:309-329. Molecular systematics as one-character taxonomy.
Dowling, T. E. and W. M. Brown. 1989. Allozymes, Syst. Bot. 17:144-163.
mitochondrial DNA, and levels of phylogenetic Doyle, J. J. 1994. Evolution of a plant homeotic multi-
resolution among four species of minnows gene family: Toward connecting molecular sys-
(Notropis: Cyprinidae). Syst. Zool. 38:126-143. tematics and molecular developmental genetics.
Dowling, T. E. and W. M. Brown. 1993. Population Syst. Biol. 43:307-328.
structure of the bottlenose dolphin (Tursiops trun- Doyle, J. J. and E. E. Dickson. 1987. Preservation of
catus) as determined by restriction endonuclease plant samples for DNA restriction endonuclease
analysis of mitochondrial DNA. Marine Mammal analysis. Taxon 36:715-722.
Sci. 9:138-155. Doyle, J. J., J. I. Davis, R. J. Soreng, D. Garvin and M. J.
Anderson. 1992. Chloroplast DNA inversions and ty of mutation rate: Protein evolution in mam-
the origin of the grass family (Poaceae). Proc. mals is not neutral. Mol. Biol. Evol. 11:643-648.
Natl. Acad. Sci. USA 89:7722-7726. Echelle, A. A. and P. J. Connor. 1989. Rapid, geograph-
Dubin, D. T., C. C. HsuChen and L. E. Tillotson. 1986. ically extensive genetic introgression after sec-
Mosquito mitochondrial transfer RNAs for valine, ondary contact between two pupfish species
glycine and glutamate: RNA and gene sequences (Cyprinodon, Cyprinodontidae). Evolution
and vicinal genome organization. Curr. Genet. 43:717-727.
10:701-707. Echelle, A. A. and T. E. Dowling. 1992. Mitochondrial
Dueck, G. 1990. New optimization heuristics: The DNA evolution of the Death Valley pupfishes
Great Deluge algorithm and the record-to-record (Cyprinodon, Cyprinodontidae). Evolution
travel. Scientific Centre Technical Report, IBM 46:193-206.
Germany. Echelle, A. A., T. E. Dowling, C. Moritz and W. M.
Dueck, G. and T. Scheuer. 1990. Threshold accepting: Brown. 1989. Mitochondrial DNA diversity and
A general purpose optimisation algorithm the origin of the Menidia clarkhubbsi complex of
appearing superior to simulated annealing. J. unisexual fishes (Atherinidae). Evolution
Comp. Physics 90:161-175. 43:984-993.
Duellman, W. E. and D. M. Hillis. 1987. Marsupial Echelle, A. F., A. A. Echelle and D. R. Edds. 1989.
frogs (Anura: Hylidae: Gastrotheca) of the Conservation genetics of a spring-dwelling desert
Ecuadorian Andes: Resolution of taxonomic prob- fish, the Pecos gambusia (Gambusia nobilis,
lems and phylogenetic relationships. Poeciliidae). Conserv. Biol. 3:159-169.
Herpetologica 43:135-167. Eck, R. V. and M. O. Dayhoff (eds.). 1966. Atlas of
Duellman, W. E., L. R. Maxson and C. A. Jesiolowski. Protein Sequence and Structure 1966. Natl. Biomed.
1988. Evolution of marsupial frogs (Hylidae: Res. Found., Silver Springs, Maryland.
Hemiphractinae): Immunological evidence. Eckert, R. 1987. New vectors for rapid sequencing of
Copeia 1988:527-543. DNA fragments by chemical degradation. Gene
Dutrillaux, B. 1975. Discontinued treatment with 51:242-252.
BudR and staining with acridine orange: Edwards, A., H. A. Hammond, J. Li, C. K. Caskey and
Observation of R- or Q- or intermediary banding. R. Chakraborty. 1992. Genetic variation at five
Chromosoma 52:261-273. trimeric and tetrameric tandem repeat loci in four
Dyer, A. F. 1979. Investigating Chromosomes. John Wiley human population groups. Genomics 12:241-253.
and Sons, New York. Edwards, A. W. F. 1972. Likelihood. Cambridge
Dykhuizen, D. E., C. Mudd, A. Honeycutt and D. L. University Press, Cambridge.
Hartl. 1985. Polymorphic posttranslational modi- Efron, B. 1979. Bootstrapping methods: Another look
fication of alkaline phosphatase in Escherichia coli. at the jackknife. Ann. Stat. 7:1-26.
Evolution 39:1-7. Efron, B. 1982. The Jackknife, the Bootstrap, and Other
Resampling Plans. CBMS-NSF Regional
Eanes, W. F. 1987. Allozymes and fitness: Evolution of Conference Series in Applied Mathematics,
a problem. Trends Ecol. Evol. 2:44-48. Monograph 38. Soc. Indust. Appl. Math.,
Eanes, W. F., L. Katona and M. Longtine. 1990. Philadelphia.
Comparison of in vitro and in vivo activities asso- Efron, B. and G. Gong. 1983. A leisurely look at the
ciated with the G6PD allozyme polymorphism in bootstrap, the jackknife, and cross-validation.
Drosophila melanogaster. Genetics 125:845-853. Am. Statist. 37:36-48.
Easteal, S. 1985. The ecological genetics of introduced Efron, B. and R. J. Tibshirani. 1993. An Introduction to
populations of the giant toad Bufo marinus. II. the Bootstrap. Chapman and Hall, New York.
Effective population size. Genetics 110:107-122. Eickbush, T. 1994. Evolution of retroelements, pp.
Easteal, S. 1986. The ecological genetics of introduced 121-157. In S. S. Morse (ed.), The Evolutionary
populations of the giant toad, Bufo marinus. IV. Biology of Viruses. Raven Press, New York.
Gene flow estimated from admixture in Ellegren, H., M. Johansson, K. Sandberg and L.
Australian populations. Heredity 56:145-156. Andersson. 1992. Cloning of highly polymorphic
Easteal, S. 1990. The pattern of mammalian evolution microsatellites in the horse. Anim. Genet.
and the relative rate of molecular evolution. 23:133-142.
Genetics 124:165-173. Ellegren, H., M. Johansson, B. P. Chowdhary, S.
Easteal, S. and C. C. Collett. 1994. Consistent variation Marklund, D. Ruyter, L. Marklund, Bräuner-
in amino-acid substitution rate, despite uniformi- Nielsen, I. Edfors-Lilja, I. Gustavsson, R. K. Juneja
and L. Andersson. 1993. Assignment of 20 Evans, M. R. and C. A. Read. 1992. 32P, 33P and 35S:
microsatellite markers to the porcine linkage map. Selecting a label for nucleic acid analysis. Nature
Genomics 16:431-439. 358:520-521.
Ellsworth, D. L., K. D. Rittenhouse and R. L. Evarts, S. and C. J. Williams. 1987. Multiple paternity
Honeycutt. 1993. Artifactual variation in random- in a wild population of mallards. Auk
ly amplified polymorphic DNA banding patterns. 104:597-602.
BioTechniques 14:214-217. Excoffier, L., P. E. Smouse and J. M. Quattro. 1992.
Elwood, H. J., G. J. Olsen and M. L. Sogin. 1985. The Analysis of molecular variance inferred from met-
small-subunit ribosomal RNA gene sequences ric distances among DNA haplotypes:
from the hypotrichous ciliates Oxytricha nova and Application to human mitochondrial DNA
Stylonychia pustulata. Mol. Biol. Evol. 2:399-410. restriction data. Genetics 131:479-491.
Endler, J. A. 1979. Gene flow and life history patterns.
Genetics 93:263-284. Fairbrothers, D. E. and M. A. Johnson. 1964.
Endler, J. A. 1989. Conceptual and other problems in Comparative serological studies within the fami-
speciation, pp. 625-661. In D. Otte and J. A. lies Cornaceae (dogwood) and Nyssaceae (sour
Endler (eds.), Speciation and its consequences. gum), pp. 305-318. In C. A. Leone (ed.), Taxonomic
Sinauer Associates, Sunderland, MA. Biochemistry and Serology. Ronald Press, New
Engel, W., J. Schmidtke, W. Vogel and V. Wolf. 1973. York.
Genetic polymorphism of lactate dehydrogenase Faith, D. P. 1985. Distance methods and the approxi-
isoenzymes in the carp (Cyprinus carpio) apparent- mation of most-parsimonious trees. Syst. Zool.
ly due to "null alleles." Genetica 8:281-289. 34:312-325.
Engelke, D. R., A. Krikos, M. E. Bruck and D. Ginsberg. Faith, D. P. 1990. Chance marsupial relationships.
1990. Purification of Thermus aquaticus DNA poly- Nature 345:393-394.
merase expressed in Escherichia coli. Analyt. Faith, D. P. 1991. Cladistic permutation tests for mono-
Biochem. 191:396-400. phyly and nonmonophyly. Syst. Zool. 40:366-375.
Epplen, J. T. 1988. On simple repeated CAC/TA Faith, D. P. and P. S. Cranston. 1991. Could a clado-
sequences in animal genomes: A critical reap- gram this short have arisen by chance alone? On
praisal. J. Hered. 79:409-417. permutation tests for cladistic structure.
Erlich, H. A. 1989. PCR Technology. Stockton Press, Cladistics 7:1-28.
New York. Fan, E., D. B. Levin, B. W. Glickman and D. M. Logan.
Erlich, H. A. and N. Arnheim. 1992. Genetic analysis 1993. Limitations in the use of SSCP analysis.
using the polymerase chain reaction. Annu. Rev. Mutation Res. 288:85-92.
Genet. 26:479-506. Fangan, B. M., B. Stedje, O. E. Stabbetorp, E. S. Jensen
Erlich, H. A., D. Gelfand and J. J. Sninsky. 1991. Recent and K. S. Jakobsen. 1994. A general approach for
advances in the polymerase chain reaction. PCR amplification and sequencing of chloroplast
Science 252:1643-1651. DNA from crude vascular plant and algal tissue.
Eschbach, S., J. Wolters and P. Sitte. 1991. Primary and BioTechniques 16:484-494.
secondary structure of the nuclear small subunit Fani, R., G. Damiani, C. DiSerio, E. Gallori, A. Grifoni
ribosomal RNA of the cryptomonad Pyrenomonas and M. Bazzicalupo. 1993. Use of random ampli-
salina as inferred from the gene sequence: fied polymorphic DNA (RAPD) for generating
Evolutionary implications. J. Mol. Evol. specific DNA probes for microorganisms. Mol.
32:247-252. Ecol. 2:243-250.
Estabrook, G. F. 1983. The causes of character incom- Farris, J. S. 1969. A successive approximations
patibility, pp. 279-295. In J. Felsenstein (ed.), approach to character weighting. Syst. Zool.
Numerical Taxonomy. NATO ASI Series, Vol. G1, 18:374-385.
Springer-Verlag, Berlin. Farris, J. S. 1970. Methods for computing Wagner
Estabrook, G. F. 1992. Evaluating undirected position- trees. Syst. Zool. 34:21-34.
al congruence of individual taxa between two Farris, J. S. 1972. Estimating phylogenetic trees from
estimates of the phylogenetic tree for a group of distance matrices. Am. Nat. 106:645-668.
taxa. Syst. Biol. 41:172-177. Farris, J. S. 1977. Phylogenetic analysis under Dollo's
Estabrook, G. F. and L. Landrum. 1975. A simple test Law. Syst. Zool. 26:77-88.
for the possible simultaneous evolutionary diver- Farris, J. S. 1981. Distance data in phylogenetic analy-
gence of two amino acid positions. J. Math. Biol. sis, pp. 3-23. In V. A. Funk and D. R. Brooks
4:195-200. (eds.), Advances in Cladistics: Proceedings of the First
Meeting of the Willi Hennig Society. New York size from samples of sequences: Inefficiency of
Botanical Garden, Bronx. pairwise and segregating sites as compared to
Farris, J. S. 1983. The logical basis of phylogenetic sys- phylogenetic estimates. Genet. Res. Camb.
tematics, pp. 7-36. In N. I. Platnick and V. A. Funk 59:139-147.
(eds.), Advances in Cladistics. Columbia University Felsenstein, J. 1992b. Phylogenies from restriction
Press, New York. sites, a maximum likelihood approach. Evolution
Farris, J. S. 1985. Distance data revisited. Cladistics 46:159-173.
1:67-85. Felsenstein, J. 1993. PHYLIP (Phylogeny Inference
Farris, J. S. 1986a. Distances and cladistics. Cladistics Package), version 3.5c. Department of Genetics,
2:144-157. University of Washington, Seattle.
Farris, J. S. 1986b. On the boundaries of phylogenetic Felsenstein, J. and H. Kishino. 1993. Is there something
systematics. Cladistics 2:14-27. wrong with the bootstrap on phylogenies? A
Feinberg, A. P. and B. Vogelstein. 1983. A technique for reply to Hillis and Bull. Syst. Biol. 42:193-200.
radiolabelling DNA restriction endonuclease frag- Feng, D.-F. and R. F. Doolittle. 1987. Progressive
ments to high specific activity. Analyt. Biochem. sequence alignment as a prerequisite to correct
132:6-13. phylogenetic trees. J. Mol. Evol. 25:351-360.
Felsenstein, J. 1978a. Cases in which parsimony and Feng, D.-F. and R. F. Doolittle. 1990. Progressive align-
compatibility methods will be positively mislead- ment and phylogenetic tree construction of pro-
ing. Syst. Zool. 27:401-410. tein sequences. Meth. Enzymol. 183:375-387.
Felsenstein, J. 1978b. The number of evolutionary Fernholm, B., K. Bremer and H. Jornvall (eds.). 1989.
trees. Syst. Zool. 27:27-33. The Hierarchy of Life. Elsevier Science Publishers,
Felsenstein, J. 1981a. Evolutionary trees from DNA Amsterdam.
sequences: A maximum likelihood approach. J. Ferrari, J. A. and C. E. Taylor. 1981. Hierarchical pat-
Mol. Evol. 17:368-376. terns of chromosome variation in Drosophila sub-
Felsenstein, J. 1981b. Evolutionary trees from gene fre- obscura. Evolution 35:391-394.
quencies and quantitative characters: Finding Ferris, S. D. and G. S. Whitt. 1977a. Duplicate gene
maximum likelihood estimates. Evolution expression in diploid and tetraploid loaches
35:1229-1242. (Cypriniformes, Cobitidae). Biochem. Genet.
Felsenstein, J. 1981c. A likelihood approach to charac- 15:1097-1112.
ter weighting and what it tells us about parsimo- Ferris, S. D. and G. S. Whitt. 1977b. Loss of duplicate
ny and compatibility. Biol. J. Linnean Soc. gene expression after polyploidization. Nature
16:183-196. 265:258-260.
Felsenstein, J. 1982. Numerical methods for inferring Ferris, S. D. and G. S. Whitt. 1978a. Phylogeny of
evolutionary trees. Quart. Rev. Biol. 57:379-404. tetraploid catostomid fishes based on the loss of
Felsenstein, J. 1984. Distance methods for inferring duplicate gene expression. Syst. Zool. 27:189-203.
phylogenies: A justification. Evolution 38:16-24. Ferris, S. D. and G. S. Whitt. 1978b. Genetic and mole-
Felsenstein, J. 1985a. Confidence limits on phyloge- cular analysis of non-random dimer assembly of
nies: An approach using the bootstrap. Evolution the creatine kinase isozymes of fishes. Biochem.
39:783-791. Genet. 16:811-829.
Felsenstein, J. 1985b. Confidence limits on phylogenies Ferrucci, L., E. Romano and G. F. De Stefano. 1987.
with a molecular clock. Syst. Zool. 34:152-161. The AluI-induced bands in great apes and man:
Felsenstein, J. 1985c. Phylogenies and the comparative Implication for heterochromatin characterization
method. Am. Nat. 125:1-15. and satellite DNA distribution. Cytogenet. Cell
Felsenstein, J. 1986. Distance methods: A reply to Genet. 44:53-57.
Farris. Cladistics 2:130-143. Fetni, R., R. Drouin, N. Lemieux, B. Malfoy, B.
Felsenstein, J. 1987. Estimation of hominoid phyloge- Dutrillaux, P. Messier and C. L. Richer. 1992.
ny from a DNA hybridization data set. J. Mol. Detection of small, single-copy genes on protein-
Evol. 26:123-131. G-banded chromosomes by electron microscopy.
Felsenstein, J. 1988a. Phylogenies from molecular Cytogenet. Cell Genet. 60:187-189.
sequences: Inference and reliability. Annu. Rev. Field, K. G., G. J. Olsen, D. J. Lane, S. J. Giovannoni,
Genet. 22:521-565. M. T. Ghiselin, E. C. Raff, N. R. Pace and R. A.
Felsenstein, J. 1988b. Phylogenies and quantitative Raff. 1988. Molecular phylogeny of the animal
characters. Annu. Rev. Ecol. Syst. 19:445-471. kingdom. Science 239:748-753.
Felsenstein, J. 1992a. Estimating effective population
Fieldes, M. A., P. R. Gaudreault and H. Tyson. 1989. Fitch, W. M. 1984. Cladistic and other methods:
Heritable changes in electrophoretic properties of Problems, pitfalls, and potentials, pp. 221-252. In
flax peroxidases resulting from variation in N T. Duncan and T. F. Stuessey (eds.), Cladistics:
nutrient level. Genetica 78:81-90. Perspectives on the Reconstruction of Evolutionary
Figueroa, F., M. Kasahara, H. Tichy, E. Neufeld, U. History. Columbia University Press, New York.
Ritte and J. Klein. 1987. Polymorphism of unique Fitch, W. M. 1986. A hidden bias in the estimate of
noncoding DNA sequences in wild and laborato- total nucleotide substitutions from pairwise dif-
ry mice. Genetics 117:101-108. ferences, pp. 315-328. In S. Karlin and E. Nevo
Fildes, R. A. and H. Harris. 1966. Genetically deter- (eds.), Evolutionary Processes and Theory. Academic
mined variation of adenylate kinase in man. Press, Orlando, Florida.
Nature 209:261-263. Fitch, W. M. and W. R. Atchley. 1985. Evolution in
Fink, S. C. and R. W. Brosemer. 1973. Immunochemical inbred strains of mice appears rapid. Science
studies with glycerol 3-phosphate dehydrogenase 228:1169-1175.
in bees and wasps. Arch. Biochem. Biophys. Fitch, W. M. and W. R. Atchley. 1987. Divergence in
158:30-35. inbred strains of mice: A comparison of three dif-
Fisher, S. E. and G. S. Whitt. 1978. Evolution of ferent types of data, pp. 203-216. In C. Patterson
isozyme loci and their differential tissue expres- (ed.), Molecules and Morphology in Evolution:
sion. Creatine kinase as a model system. J. Mol. Conflict or Compromise? Cambridge University
Evol. 12:25-55. Press, Cambridge, England.
Fisher, S. E. and G. S. Whitt. 1979. Evolution of the cre- Fitch, W. M. and E. Margoliash. 1967. Construction of
atine kinase isozyme system in the primitive ver- phylogenetic trees. Science 155:279-284.
tebrates. Occ. Pap. California Acad. Sci. Fitch, W. M. and E. Markowitz. 1970. An improved
134:142-159. method for determining codon variability in a
Fisher, S. E., J. B. Shaklee, S. D. Ferris and G. S. Whitt. gene and its application to the rate of fixation of
1980. Evolution of five multilocus isozyme sys- mutations in evolution. Biochem. Genet.
tems in the chordates. Genetica 52/53:73-85. 4:579-593.
Fitch, W. M. 1966. An improved method of testing for Fitch, W. M., J. M. E. Leiter, X. Li and P. Palese. 1991.
evolutionary homology. J. Mol. Biol. 16:9-16. Positive Darwinian evolution in human influenza
Fitch, W. M. 1970. Distinguishing homologous from A viruses. Proc. Natl. Acad. Sci. USA 88:4270-4274.
analogous proteins. Syst. Zool. 19:99-113. FitzSimmons, N. N., C. Moritz and S. S. Moore. 1995.
Fitch, W. M. 1971a. The non-identity of invariant posi- Conservation and dynamics of microsatellite loci
tions in the cytochrome c of different species. over 300 million years of marine turtle evolution.
Biochem. Genet. 5:231-241. Mol. Biol. Evol. 12:432-440.
Fitch, W. M. 1971b. Toward defining the course of evo- Flavell, R. B. 1986. Repetitive DNA and chromosome
lution: Minimal change for a specific tree topolo- evolution in plants. Phil. Trans. Roy. Soc. London
gy. Syst. Zool. 20:406-416. B312:227-242.
Fitch, W. M. 1975. Toward finding the tree of maxi- Flavell, R. B., M. O'Dell, P. Sharp, E. Nevo and A.
mum parsimony, pp. 189-230. In G. F. Estabrook Beiles. 1986. Variation in the intergenic spacer of
(ed.), Proceedings of the Eighth International ribosomal DNA of wild wheat, Triticum
Conference on Numerical Taxonomy. W. H. Freeman, dicoccoides, in Israel. Mol. Biol. Evol. 3:547-558.
San Francisco. Fleischer, R. C. 1983. A comparison of theoretical and
Fitch, W. M. 1976a. The molecular evolution of electrophoretic assessments of genetic structure in
cytochrome c in eukaryotes. J. Mol. Evol. 8:13-40. populations of the house sparrow (Passer domesti-
Fitch, W. M. 1976b. Molecular evolutionary clocks, pp. cus). Evolution 37:1007-1009.
160-178. In F. J. Ayala (ed.), Molecular Evolution. Flint, J., A. V. S. Hill, D. K. Bowden, S. J. Oppenheimer,
Sinauer, Sunderland, Massachusetts. P. R. Sill, S. W. Serjeantson, J. Bana-Koiri, K.
Fitch, W. M. 1977. On the problem of discovering the Bhatia, M. P. Alpers, A. J. Boyce, D. J. Weatherall
most parsimonious tree. Am. Nat. 111:223-257. and J. B. Clegg. 1986. High frequencies of α-tha-
Fitch, W. M. 1979. Cautionary remarks on using gene lassaemia are the result of natural selection by
expression events in parsimony procedures. Syst. malaria. Nature 321:744-750.
Zool. 28:375-379. Foltz, D. W. 1986. Null alleles as a possible cause of
Fitch, W. M. 1981. A non-sequential method for con- heterozygote deficiencies in the oyster Crassostrea
structing trees and hierarchical classifications. J. virginica and other bivalves. Evolution 40:869-870.
Mol. Evol. 18:30-37. Foltz, D. W. and J. L. Hoogland. 1983. Genetic evi-
dence of outbreeding in the black-tailed prairie gene sequences in animals: Initial assessment of
dog (Cynomys ludovicianus). Evolution 37:273-281. character sets from concordance and divergence
Fonatsch, C., C. Gradl, J. Ragoussis and A. Ziegler. studies. Syst. Biol. 43:511-525.
1987. Assignment of the TCP1 locus to the long Frischauf, A.-M. 1987. Construction and characteriza-
arm of human chromosome 6 by in situ tion of a genomic library in lambda. Meth.
hybridization. Cytogenet. Cell Genet. 45:109-112. Enzymol. 152:190-199.
Foran, D. R., P. J. Johnson and G. P. Moore. 1985. Fritsch, P. E and L. H. Rieseberg. 1992. High outcross-
Evolution of two actin genes in the sea urchin ing rates maintain male hermaphrodite individu-
Strongylocentrotus franciscanus. J. Mol. Evol. als in populations of the flowering plant Datisca
22:108-116. glomerata. Nature 359:633-636.
Fox, G. E., E. Stackebrandt, R. B. Hespell, J. Gibson, J. Frohman, M. A., M. K. Dush and G. R. Martin. 1988.
Maniloff, T. A. Dyer, R. S. Wolfe, W. E. Balch, R. S. Rapid production of full-length cDNAs from rare
Tanner, L. J. Magrum, L. B. Zablen, R. Blakemore, transcripts: Amplification using a gene-specific
R. Gupta, L. Bonen, B. J. Lewis, D. A. Stahl, K. R. oligo-nucleotide primer. Proc. Natl. Acad. Sci.
Luehrsen, K. N. Chen and C. R. Woese. 1980. The USA 85:8998-9002.
phylogeny of prokaryotes. Science 209:457-463. Frommer, M., C. Paul and P. C. Vincent. 1988.
Fox, G. M. and C. W. Schrnid. 1980. Related single Localization of satellite DNA sequences on human
copy sequences in the human genome. Biochim metaphase chromosomes using bromodeoxyuri-
Biophys. Acta 609349-363. dine-labelled probes. Chromosoma 97:ll-18.
Fox, G. M., J. Umeda, R. K.-Y. Lee and C. W. Schmid. Frost, D. R. and D. M. Hillis. 1990. Species in concept
1980. A phase diagram of the binding of mis- and practice: Herpetological applications.
matched duplex DNAs to hydroxyapatite. Herpetologica 46:87-104.
Biochem. Biophys. Acta. 609:364-371. Frykman, I, and 8 . 0 . Bengtsson. 1984. Genetic differ-
Frair, W. 1964. Turtle family relationships as deter- entiation in Sorex. 111. Electrophoretic analysis of a
mined by serological tests, pp. 535-544. In C. A. hybrid zone between two karyotypic races in
Leone (ed.), Taxonomic Biochemistry and Serology. Sorex araneus. Hereditas 70:259-270.
Ronald Press, New York. Fu, Y -X. and W.-H. Li. 1993. Statistical tests of neutral-
Freifelder, D. 1982. Physical Biochemistry: Applications to ity of mutations. Genetics 133:693-709.
Biochemistry and Molecular Biology. 2nd ed. W. H. Fukami, K. and Y. Tateno. 1989. On the maximum like-
Freeman and Co., New York. lihood method for estimating molecular trees:
Frelin, C. and F, Vuilleumier. 1979. Biochemical meth- Uniqueness of the likelihood point. J. Mol. Evol.
ods and reasoning in systematics. Z. Zool. Syst. 28:460-464.
Evo1ut.-forsch. 17:l-10. Funk, V. A. 1985. Phylogenetic patterns and hybridiza-
Freshney, R. I. 1987. Culture ofAnimal Cells. Alan R. tion. Ann. Missouri Bot. Card 72:681-715.
Liss, New York. Furrer, B., U. Candrian, P. Wieland, J. Luthy 1990.
Freshney, R. I. 1994. Culture of Animal Cells. 3rd ed. Improving PCR efficiency. Nature 346:324.
Wiley-Liss, New York. Futuyma, D. J. 1986. Evolutionary Biology. 2nd ed.
Frick, L. W. 1981.A biochemical, phylogenetic and Sinauer, Sunderland, Massachusetts.
immunological investigation of the cytosolic di-
and tripeptidases of fishes. Ph.D. dissertation, Galau, G. A,, M. E. Chamberlin, B. R. Hough, R. J.
University of Hawaii. Britten and E. H. Davidson. 1976. Evolution of
Frick, L. W. 1983. An electrophoretic investigation of repetitive and nonrepetitive DNA in two species
the cytosolic di- and tripeptidases of fish: of Xenopus, pp. 200-224. In F. J. Ayala (ed.),
Molecular weights, substrate specificities and tis- Molecr~larEvolution. Sinauer, Sunderland,
sue and phylogenetic distributions. Biochem. Massachusetts.
Gene:. 21:309-322. Gall, J. G. and M. L. Parduc. 1969. Formation and
Frieden, C. 1963. Glutarnate dehydrogenase. V. The detection of RNA-DNA hybrid molecules in cyto-
relation of enzyme structure to the catalytic func- logical preparations. Proc. Natl. Acad. Sci. USA
tion. J. Biol. Chem. 238:3286-3299. 63:378-383.
Friedlander, T.P., J. C. Regier and C. Mitter. 1992. Gantt, J. S., S. L Baldauf, I? J. Calie, N. E Weeden and
Nuclear gene sequences for higher level phyloge- J. D. Palmer, 1991. Transfer of rp122 to the nucleus
netic analysis: 14 promising candidates. Syst. Biol. greatly preceded its loss from the chloroplast and
41:483-490. involved the gain of an intrcn. EM&OJ.
Friedlander, T.,.'i J. C. Regier and C. Mitter. 1994. 10:3073-3078.
Phylogenetic information content of five nuclear
Gargas, A., P. T. DePriest, M. Grube and A. Tehler. 1995. Multiple origins of lichen symbioses in fungi suggested by SSU rDNA phylogeny. Science 268:1492-1494.
Gargouri, A. 1989. A rapid and simple method for extracting yeast mitochondrial DNA. Curr. Genet. 15:235-237.
Garland, T., Jr., R. B. Huey and A. F. Bennett. 1991. Phylogeny and coadaptation of thermal physiology in lizards: A reanalysis. Evolution 45:1969-1974.
Garland, T., Jr., P. H. Harvey and A. R. Ives. 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41:18-32.
Garland, T., Jr., A. W. Dickerman, C. M. Janis and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 42:265-292.
Garza, J. C. and D. S. Woodruff. 1992. A phylogenetic study of the gibbons (Hylobates) using DNA obtained nondestructively from hair. Mol. Phylogenet. Evol. 1:202-210.
Gastony, G. J. 1986. Electrophoretic evidence for the origin of fern species by unreduced spores. Am. J. Bot. 73:1563-1569.
Gastony, G. J. 1991. Gene silencing in a polyploid homosporous fern: Paleopolyploidy revisited. Proc. Natl. Acad. Sci. USA 88:1602-1605.
Gatesy, J., R. DeSalle and W. Wheeler. 1993. Alignment-ambiguous nucleotide sites and the exclusion of systematic data. Mol. Phylogenet. Evol. 2:152-157.
Gaut, B. S. and M. T. Clegg. 1993a. Molecular evolution of the Adh1 locus in the genus Zea. Proc. Natl. Acad. Sci. USA 90:5095-5099.
Gaut, B. S. and M. T. Clegg. 1993b. Nucleotide polymorphism in the Adh1 locus of pearl millet (Pennisetum glaucum) (Poaceae). Genetics 135:1091-1097.
Gaut, B. S. and P. O. Lewis. 1995. Success of maximum likelihood in the four-taxon case. Mol. Biol. Evol. 12:152-162.
Gaut, B. S., S. V. Muse, W. D. Clark and M. T. Clegg. 1992. Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J. Mol. Evol. 35:292-303.
Gauthier, J., A. G. Kluge and T. Rowe. 1988. Amniote phylogeny and the importance of fossils. Cladistics 4:105-205.
Gelfand, D. H. and T. J. White. 1990. Thermostable DNA polymerases, pp. 129-141. In M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White (eds.), PCR Protocols. Academic Press, New York.
Gellisen, G., J. Y. Bradfield, B. N. White and G. R. Wyatt. 1983. Mitochondrial DNA sequences in the nuclear genome of a locust. Nature 301:631-634.
Georges, M., A.-S. Lequarre, M. Castelli, R. Hanset and G. Vassart. 1988. DNA fingerprinting in domestic animals using four different minisatellite probes. Cytogenet. Cell Genet. 47:127-131.
Gerbi, S. A. 1985. Evolution of ribosomal DNA, pp. 419-517. In R. J. MacIntyre (ed.), Molecular Evolutionary Genetics. Plenum, New York.
Ghiselin, M. T. 1988. The origin of molluscs in light of molecular evidence. Oxford Surv. Evol. Biol. 5:66-95.
Gibbs, H. L., P. J. Weatherhead, P. T. Boag, B. N. White, L. M. Tabak and D. J. Hoysak. 1990. Realized reproductive success of polygynous red-winged blackbirds revealed by DNA markers. Science 250:1394-1397.
Gilbert, D. A., N. Lehrman, S. J. O'Brien and R. K. Wayne. 1990. Genetic fingerprinting reflects population differentiation in the California channel island fox. Nature 344:764-767.
Gilbert, D. A., C. Packer, A. B. Pusey, J. C. Stephens and S. J. O'Brien. 1991. Analytical DNA fingerprinting in lions: Parentage, genetic diversity and kinship. J. Hered. 82:378-386.
Gilbert, S. F. 1991. Developmental Biology. 3rd ed. Sinauer, Sunderland, Massachusetts.
Gillespie, J. H. 1984. The molecular clock may be an episodic clock. Proc. Natl. Acad. Sci. USA 81:8009-8013.
Gillespie, J. H. 1986a. Natural selection and the molecular clock. Mol. Biol. Evol. 3:138-155.
Gillespie, J. H. 1986b. Variability of evolutionary rates of DNA. Genetics 113:1077-1091.
Gillespie, J. H. 1986c. Rates of molecular evolution. Annu. Rev. Ecol. Syst. 17:637-665.
Gillespie, J. H. 1987. Molecular evolution and the neutral allele theory. Oxford Surv. Evol. Biol. 4:10-37.
Gillespie, J. H. 1991. The Causes of Molecular Evolution. Oxford University Press, Oxford.
Gillespie, J. H. and K. Kojima. 1968. The degree of polymorphism in enzymes involved in energy production compared to that in nonspecific enzymes in two Drosophila ananassae populations. Genetics 61:582-585.
Gillespie, R. G., H. B. Croom and S. R. Palumbi. 1994. Multiple origins of a spider radiation in Hawaii. Proc. Natl. Acad. Sci. USA 91:2290-2294.
Gittleman, J. L. and H.-K. Luh. 1992. On comparing comparative methods. Annu. Rev. Ecol. Syst. 23:383-404.
Givnish, T. J. and K. J. Sytsma. 1995. Homoplasy in molecular vs. morphological data: The likelihood of correct phylogenetic inference. Evolution (in press).
Glass, G. V. 1976. Primary, secondary and meta-analysis of research. Educ. Res. 5:3-8.
Glover, F. 1989. Tabu search-part I. ORSA J. Comp. 1:190-206.
Goelz, S. E., S. R. Hamilton and B. Vogelstein. 1985. Purification of DNA from formaldehyde fixed and paraffin embedded human tissue. Biochem. Biophys. Res. Comm. 130:118-126.
Gojobori, T., W.-H. Li and D. Graur. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18:360-369.
Gold, J. R. and L. R. Richardson. 1990. Restriction site heteroplasmy in the mitochondrial DNA of the marine fish Sciaenops ocellatus (L.). Anim. Genet. 21:313-316.
Golding, G. B. 1983. Estimates of DNA and protein sequence divergence: An examination of some assumptions. Mol. Biol. Evol. 1:125-142.
Golding, G. B. and C. Strobeck. 1983. Increased number of alleles found in hybrid populations due to intragenic recombination. Evolution 37:17-29.
Goldman, N. 1990. Maximum likelihood of phylogenetic trees, with special reference to Poisson process models of DNA substitution and to parsimony analysis. Syst. Zool. 39:345-361.
Goldman, N. 1993a. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182-198.
Goldman, N. 1993b. Simple diagnostic tests of models of DNA substitution. J. Mol. Evol. 37:650-661.
Goldman, N. and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.
Goldstein, D. B. and D. D. Pollock. 1994. Least squares estimation of molecular distance-noise abatement in phylogenetic reconstruction. Theor. Pop. Biol. 45:219-226.
Goldstein, D. B., A. R. Linares, M. W. Feldman and L. L. Cavalli-Sforza. 1995. An evaluation of genetic distance for use with microsatellite data. Genetics 139:463-471.
Golenberg, E. M., D. G. Giannasi, M. T. Clegg, C. J. Smiley, M. Durbin, D. Henderson and G. Zurawski. 1990. Chloroplast DNA sequence from a Miocene Magnolia species. Nature 344:656-658.
Gollmann, G., P. Roth and W. Hodl. 1988. Hybridization between fire-bellied toads Bombina bombina and Bombina variegata in the Karst regions of Slovakia and Hungary: Morphological and allozyme evidence. J. Evol. Biol. 1:3-14.
Goloboff, P. A. 1993. Estimating character weights during tree search. Cladistics 9:83-91.
Gonzales, I. L., J. L. Gorski, T. J. Campden, D. J. Dorney, J. M. Erickson, J. E. Sylvester and R. D. Schmickel. 1985. Variation among human 28S ribosomal RNA genes. Proc. Natl. Acad. Sci. USA 82:7666-7670.
Gonzalez, I. L., J. E. Sylvester, T. R. Smith, D. Stambolian and R. D. Schmickel. 1990. Ribosomal RNA gene sequences and hominoid phylogeny. Mol. Biol. Evol. 7:203-219.
Good, D. A. 1989. Hybridization and cryptic species in Dicamptodon (Caudata: Dicamptodontidae). Evolution 43:728-744.
Good, D. A., G. Z. Wurst and D. B. Wake. 1987. Patterns of geographic variation in allozymes of the Olympic salamander, Rhyacotriton olympicus (Caudata: Dicamptodontidae). Fieldiana Zool. N. S. 1374:1-15.
Good, P. 1994. Permutation Tests: A Practical Guide to Resampling for Testing Hypotheses. Springer-Verlag, New York.
Goodfellow, P. N. 1993. Microsatellites and the new genetic maps. Curr. Biol. 3:149-151.
Goodman, M. 1961. The role of immunochemical differences in the phyletic development of human behavior. Human Biol. 33:131-162.
Goodman, M. 1963. Serological analysis of the systematics of recent hominoids. Human Biol. 35:377-424.
Goodman, M. 1981. Decoding the pattern of protein evolution. Progr. Biophys. Mol. Biol. 37:105-164.
Goodman, M. 1985. Rates of molecular evolution: The hominoid slowdown. BioEssays 3:9-14.
Goodman, M. and G. W. Moore. 1971. Immunodiffusion systematics of the primates. I. The Catarrhini. Syst. Zool. 20:19-62.
Goodman, M., J. Barnabas, G. Matsuda and G. W. Moore. 1971. Molecular evolution in the descent of man. Nature 233:604-613.
Goodman, M., J. Czelusniak, G. W. Moore, A. E. Romero-Herrera and G. Matsuda. 1979. Fitting the gene lineage into the species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. 28:132-163.
Goodman, M., M. M. Miyamoto and J. Czelusniak. 1987. Pattern and process in vertebrate phylogeny revealed by coevolution of molecules and morphologies, pp. 141-176. In C. Patterson (ed.), Molecules and Morphology in Evolution: Conflict or Compromise? Cambridge University Press, Cambridge.
Gorman, G. C. 1971. Evolutionary genetics of island lizard populations. Yearbook Am. Philo. Soc. 1971:318-319.
Gorman, G. C. and J. Renzi, Jr. 1979. Genetic distance and heterozygosity estimates in electrophoretic studies: Effects of sample size. Copeia 1979:242-249.
Gorman, G. C. and D. Shochat. 1972. Multiple lactate dehydrogenase alleles in the lizard Agama stellio. Experientia 28:351-353.
Gorman, G. C., A. C. Wilson and M. Nakanishi. 1971. A biochemical approach towards the study of reptilian phylogeny: Evolution of serum albumin and lactic dehydrogenase. Syst. Zool. 20:167-185.
Gorman, G. C., D. G. Buth and J. S. Wyles. 1980. Anolis lizards of the eastern Caribbean: A case study in evolution. III. A cladistic analysis of albumin immunological data, and the definition of species groups. Syst. Zool. 29:143-158.
Gorzula, S., C. L. Arocha-Pinango and C. Salazar. 1976. A method of obtaining blood by caudal vein from large reptiles. Copeia 1976:838-839.
Gottlieb, L. D. 1973. Genetic differentiation, sympatric speciation, and the origin of a diploid species of Stephanomeria. Am. J. Bot. 60:545-553.
Gottlieb, L. D. 1982a. Conservation and duplication of isozymes in plants. Science 216:373-380.
Gottlieb, L. D. 1982b. Isozyme number and phylogeny, pp. 209-221. In U. Jensen and D. E. Fairbrothers (eds.), Proteins and Nucleic Acids in Plant Systematics. Springer-Verlag, Berlin.
Gottlieb, L. D. and N. F. Weeden. 1979. Gene duplication and phylogeny in Clarkia. Evolution 33:1024-1039.
Gough, J. A. and N. E. Murray. 1983. Sequence diversity among related genes for recognition of specific targets in DNA molecules. J. Mol. Biol. 166:1-19.
Gouy, M. and W.-H. Li. 1989a. Phylogenetic analysis based on rRNA sequences supports the archaebacterial rather than the eocyte tree. Nature 339:145-147.
Gouy, M. and W.-H. Li. 1989b. Molecular phylogeny of the kingdoms Animalia, Plantae and Fungi. Mol. Biol. Evol. 6:109-122.
Grafen, A. 1989. The phylogenetic regression. Phil. Trans. Roy. Soc. London 326:119-157.
Grafen, A. 1992. The uniqueness of the phylogenetic regression. J. Theor. Biol. 156:405-423.
Graham, D. E. 1978. The isolation of high molecular weight DNA from whole organisms or large tissue masses. Analyt. Biochem. 85:609-613.
Graham, J. H. and J. D. Felley. 1985. Genomic coadaptation and developmental stability within introgressed populations of Enneacanthus gloriosus and E. obesus (Pisces, Centrarchidae). Evolution 39:104-114.
Gray, G. S. and W. M. Fitch. 1983. Evolution of antibiotic resistance genes: The DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol. Biol. Evol. 1:57-66.
Gray, I. C. and A. J. Jeffreys. 1991. Evolutionary transience of hypervariable minisatellites in man and the primates. Proc. Roy. Soc. London B 243:241-253.
Graybeal, A. 1993. The phylogenetic utility of cytochrome b: Lessons from bufonid frogs. Mol. Phylogenet. Evol. 2:256-269.
Graybeal, A. 1994. Evaluating the phylogenetic utility of genes: A search for genes informative about deep divergences among vertebrates. Syst. Biol. 43:174-193.
Green, D. M. 1983. Evidence for chromosome number reduction and chromosome homosequentiality in the 24-chromosome Korean frog, Rana dybowskii and related species. Chromosoma 88:222-226.
Green, D. M. and S. K. Sessions. 1991. Amphibian Cytogenetics and Evolution. Academic Press, San Diego.
Green, D. M., J. P. Bogart and E. H. Anthony. 1980. An interactive, microcomputer-based karyotype analysis system for phylogenetic cytotaxonomy. Comput. Biol. Med. 10:219-227.
Greenbaum, I. F. 1981. Genetic interactions between hybridizing cytotypes of the tent-making bat (Uroderma bilobatum). Evolution 35:305-320.
Greenberg, B. D., J. E. Newbold and A. Sugino. 1983. Intraspecific nucleotide sequence variability surrounding the origin of replication in human mitochondrial DNA. Gene 21:33-49.
Grompe, M. 1993. The rapid detection of unknown mutations in nucleic acids. Nature Genetics 5:111-117.
Groot, G. S. P. and A. M. Kroon. 1979. Mitochondrial DNA from various organisms does not contain internally methylated cytosine in -CCGG- sequences. Biochim. Biophys. Acta 564:355-357.
Guadet, J., J. Julien, J. Lafay and Y. Brygoo. 1989. Phylogeny of some Fusarium species, as determined by large-subunit rRNA sequence comparison. Mol. Biol. Evol. 6:227-242.
Gruenbaum, H., T. Naveh-Many, H. Cedar and A. Razin. 1981. Sequence specificity of methylation in higher plant DNA. Nature 292:860-862.
Grula, J. W., T. J. Hall, T. D. Giugni, G. J. Graham, E. H. Davidson and R. J. Britten. 1982. Sea urchin DNA sequence variation and reduced interspecies differences of the less variable DNA sequences. Evolution 36:665-676.
Gu, X., Y.-X. Fu and W.-H. Li. 1995. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol. Biol. Evol. 12:546-557.
Guillemette, J. G. and P. N. Lewis. 1983. Detection of subnanogram quantities of DNA and RNA on native and denaturing polyacrylamide and agarose gels by silver staining. Electrophoresis 4:92-94.
Guo, S.-W. and E. A. Thompson. 1992. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361-372.
Gupta, R., J. M. Lanter and C. R. Woese. 1983. Sequence of the 16S ribosomal RNA from Halobacterium volcanii, an archaebacterium. Science 221:656-659.
Guries, R. P. and F. T. Ledig. 1982. Genetic diversity and population structure in pitch pine (Pinus rigida Mill.). Evolution 36:387-402.
Gutierrez, R. J., R. M. Zink and S. Y. Yang. 1983. Genetic variation, systematic and biogeographic relationships of some Galliform birds. Auk 100:33-40.
Gyllensten, U. B., D. Wharton, A. Josefsson and A. C. Wilson. 1991. Paternal inheritance of mitochondrial DNA in mice. Nature 352:255-257.
Gyllensten, U. and H. Erlich. 1988. Generation of single-stranded DNA by the polymerase chain reaction and its applications to direct sequencing of the HLA-DQa locus. Proc. Natl. Acad. Sci. USA 85:7652-7656.

Haberfeld, A., A. Cahaner, O. Yoffe, Y. Plotsky and J. Hillel. 1991. DNA fingerprints of farm animals generated by microsatellite and minisatellite DNA probes. Anim. Genet. 22:299-305.
Hack, M. S. and H. T. Lawce. 1980. The Association of Cytogenetic Technologists Cytogenetics Laboratory Manual. University of California Press, San Francisco.
Hadjiolov, A. A., O. I. Georgiev, V. V. Nosikov and L. P. Yavachev. 1984. Primary and secondary structure of rat 28S ribosomal RNA. Nucl. Acids Res. 12:3677-3693.
Hadrys, H., M. Balick and B. Schierwater. 1992. Application of random amplified polymorphic DNA (RAPD) in molecular ecology. Mol. Ecol. 1:55-63.
Hadrys, H., B. Schierwater, S. L. Dellaporta, R. DeSalle and L. W. Buss. 1993. Determination of paternity in dragonflies by random amplified polymorphic DNA fingerprinting. Mol. Ecol. 2:79-87.
Haeckel, E. 1866. Generelle Morphologie der Organismen: Allgemeine Grundzüge der organischen Formen-Wissenschaft, mechanisch begründet durch die von Charles Darwin reformirte Descendenz-Theorie. Georg Riemer, Berlin.
Hafner, M. S. and S. A. Nadler. 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332:258-259.
Haglund, T. R., D. G. Buth and R. Lawson. 1993. Allozyme variation and phylogenetic relationships of Asian, North American, and European populations of the ninespine stickleback, Pungitius pungitius, pp. 438-452. In R. L. Mayden (ed.), Systematics, Historical Ecology, and North American Freshwater Fishes. Stanford University Press, Stanford.
Haig, S. M., J. R. Belthoff and D. H. Allen. 1993. Examination of population structure in red-cockaded woodpeckers using DNA profiles. Evolution 47:185-194.
Halanych, K. M., J. D. Bacheller, A. M. A. Aguinaldo, S. M. Liva, D. M. Hillis and J. A. Lake. 1995. Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science 267:1641-1643.
Halkka, L., Soderlund, U. Skaren and J. Heikkila. 1987. Chromosomal polymorphism and racial evolution of Sorex araneus L. in Finland. Hereditas 106:257-275.
Hall, B. K. (ed.). 1994. Homology: The Hierarchical Basis of Comparative Biology. Academic Press, New York.
Hall, P. and M. A. Martin. 1988. On bootstrap resampling and iteration. Biometrika 75:661-671.
Hall, T. C., Y. Ma, B. V. Buchbinder, J. W. Pyne, S. M. Sun and F. A. Bliss. 1978. Messenger RNA for G1 protein of French bean seed: Cell-free translation and product characterization. Proc. Natl. Acad. Sci. USA 75:3196-3200.
Hall, T. J., J. W. Grula, E. H. Davidson and R. J. Britten. 1980. Evolution of sea urchin non-repetitive DNA. J. Mol. Evol. 16:95-110.
Hallick, R. B., L. Hong, R. G. Drager, M. R. Favreau, A. Monfort, B. Orsat, A. Spielman and E. Stutz. 1993. Complete sequence of Euglena gracilis chloroplast DNA. Nucl. Acids Res. 21:3537-3544.
Haltiner, M., T. Kempe and R. Tijian. 1985. A novel strategy for constructing clustered point mutations. Nucl. Acids Res. 13:1015-1026.
Hamby, R. K. and E. A. Zimmer. 1988. Ribosomal RNA sequences for inferring phylogeny within the grass family (Poaceae). Plant Syst. Evol. 160:29-37.
Hamby, R. K. and E. A. Zimmer. 1992. Ribosomal RNA as a phylogenetic tool in plant systematics, pp. 50-91. In P. S. Soltis, D. E. Soltis and J. J. Doyle (eds.), Molecular Systematics of Plants. Chapman and Hall, New York.
Hamby, R. K., L. Sims, L. Issel and E. Zimmer. 1988. Direct ribosomal RNA sequencing: Optimization of extraction and sequencing methods for work with higher plants. Plant Mol. Biol. Rep. 6:175-192.
Hames, B. D. and D. Rickwood (eds.). 1981. Gel Electrophoresis of Proteins: A Practical Approach. IRL Press, Oxford.
Hames, B. D. and S. J. Higgins. 1985. Nucleic Acid Hybridization: A Practical Approach. IRL Press, Oxford.
Hamkalo, B. A. and N. J. Hutchison. 1984. In situ hybridization at the electron microscope level, pp. 97-115. In R. S. Sparkes and F. F. de la Cruz (eds.), Research Perspectives in Cytogenetics. University Park Press, Baltimore.
Hamlyn, P. H., G. G. Brownlee, C.-C. Cheng, M. J. Gait and C. Milstein. 1978. Complete sequence of constant and 3' noncoding regions of an immunoglobulin mRNA using the dideoxynucleotide method of RNA sequencing. Cell 15:1067-1075.
Hamrick, J. L. and M. J. W. Godt. 1989. Allozyme diversity in plant species, pp. 43-63. In A. D. H. Brown, M. T. Clegg, A. L. Kahler and B. S. Weir (eds.), Plant Population Genetics, Breeding and Genetic Resources. Sinauer, Sunderland, Massachusetts.
Hancock, J. M. and G. A. Dover. 1990. "Compensatory slippage" in the evolution of ribosomal genes. Nucl. Acids Res. 18:5949-5954.
Hanotte, O., E. Cairns, T. Robson, M. C. Double and T. Burke. 1992. Cross-species hybridization of a single locus minisatellite probe in passerine birds. Mol. Ecol. 1:127-130.
Hanotte, O., C. Zanon, A. Pugh, C. Greig, A. Dixon and T. Burke. 1994. Isolation and characterization of microsatellite loci in a passerine bird: The reed bunting. Mol. Ecol. 3:529-531.
Harding, J. D. and R. A. Keller. 1992. Single-molecule detection as an approach to rapid DNA sequencing. Trends Biotech. 10:55.
Harper, M. E., A. Ullrich and G. F. Saunders. 1981. Localization of the human insulin gene to the distal end of the short arm of chromosome 11. Proc. Natl. Acad. Sci. USA 78:4458-4460.
Harper, M. E. and G. F. Saunders. 1984. Localization of single-copy genes on human chromosomes by in situ hybridization of 3H-probes and autoradiography, pp. 117-133. In R. S. Sparkes and F. F. de la Cruz (eds.), Research Perspectives in Cytogenetics. University Park Press, Baltimore.
Harris, H. 1966. Enzyme polymorphism in man. Proc. Roy. Soc. London B 164:298-310.
Harris, H. and D. A. Hopkinson. 1976 et seq. Handbook of Enzyme Electrophoresis in Human Genetics. North-Holland, Amsterdam.
Harris, S. A. and R. Ingram. 1991. Chloroplast DNA and biosystematics: The effects of intraspecific diversity and plastid transmission. Taxon 40:393-4
Harrison, R. G. 1989. Animal mitochondrial DNA as a genetic marker in population and evolutionary biology. Trends Ecol. Evol. 4:6-11.
Harrison, R. G. 1990. Hybrid zones: Windows on evolutionary process, pp. 69-128. In D. J. Futuyma and J. Antonovics (eds.), Oxford Surveys in Evolutionary Biology. Vol. 7. Oxford University Press, London.
Harrison, R. G., D. M. Rand and W. C. Wheeler. 1987. Mitochondrial DNA variation in field crickets across a narrow hybrid zone. Mol. Biol. Evol. 4:144-158.
Harry, J. L. and D. A. Briscoe. 1988. Multiple paternity in the loggerhead turtle (Caretta caretta). J. Hered. 79:91-99.
Hartl, D. L. and A. G. Clark. 1989. Principles of Population Genetics. 2nd ed. Sinauer, Sunderland, Massachusetts.
Hartman, B. K. and S. Udenfried. 1969. A method for immediate visualization of proteins in acrylamide gels and its use for preparation of antibodies to enzymes. Analyt. Biochem. 30:391-394.
Harvey, P. H. and M. D. Pagel. 1991. The Comparative Method in Evolutionary Biology. Oxford University Press, Oxford.
Harvey, P. H., E. C. Holmes, A. O. Mooers and S. Nee. 1994. Inferring evolutionary processes from molecular phylogenies, pp. 313-333. In R. W. Scotland, D. J. Siebert and D. J. Williams (eds.), Models in Phylogeny Reconstruction. Systematics Association Special Volume 52, Oxford.
Hasegawa, M. and M. Fujiwara. 1993. Relative efficiencies of the maximum likelihood, maximum parsimony, and neighbor-joining methods for estimating protein phylogeny. Mol. Phylogen. Evol. 2:1-5.
Hasegawa, M. and T. Hashimoto. 1993. Ribosomal RNA trees misleading? Nature 361:23.
Hasegawa, M. and H. Kishino. 1989. Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Japan J. Genet. 61:243-258.
Hasegawa, M., Y. Iida, T. Yano, F. Takaiwa and M. Iwabuchi. 1985a. Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J. Mol. Evol. 22:32-38.
Hasegawa, M., H. Kishino and T. Yano. 1985b. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21:160-174.
Hasegawa, M., H. Kishino and N. Saitou. 1991. On the maximum likelihood method in molecular phylogenetics. J. Mol. Evol. 32:443-445.
Haslewood, G. A. D. 1967. Bile Salts. Methuen, London.
Hassouna, N., B. Michot and J.-P. Bachellerie. 1984. The complete nucleotide sequence of mouse 28S rRNA gene. Implications for the process of size increase of the large subunit rRNA in higher eukaryotes. Nucl. Acids Res. 12:3563-3583.
Haucke, H.-R. and G. Gellissen. 1988. Different mitochondrial gene orders among insects: Exchanged tRNA gene positions in the COII/COIII region between an orthopteran and a dipteran species. Curr. Genet. 14:471-476.
Haufler, C. H. 1987. Electrophoresis is modifying our concepts of evolution in homosporous pteridophytes. Am. J. Bot. 74:953-966.
Haugland, R. P. 1992-1994. Handbook of Fluorescent Probes and Research Chemicals. 5th ed. Molecular Probes, Eugene, Oregon.
Hauswirth, W. L. and P. J. Laipis. 1985. Transmission genetics of mammalian mitochondria: A molecular model and experimental evidence, pp. 49-59. In E. Quagliariello, E. C. Slater, F. Palmieri, C. Saccone and A. M. Kroon (eds.), Achievements and Perspectives of Mitochondrial Research. Elsevier, Amsterdam.
Hauswirth, W. W., L. O. Lim, B. Dujon and G. Turner. 1987. Methods for studying the genetics of mitochondria, pp. 171-282. In V. M. Darley-Usmar, D. Rickwood and M. T. Wilson (eds.), Mitochondria: A Practical Approach. IRL Press, Oxford.
Hay, R. J. 1979. Identification, separation and culture of mammalian tissue cells, pp. 143-318. In E. Reid (ed.), Cell Populations, Methodology Surveys (B): Biochemistry. Vol. 8. Wiley and Sons, New York.
Hay, R. J. and G. F. Gee. 1984. Procedures for collecting cell lines under field conditions, pp. 25-26. In H. C. Dessauer and M. S. Hafner (eds.), Collections of Frozen Tissues: Value, Management, Field and Laboratory Procedures, and Directory of Existing Collections. Assoc. Syst. Collections, University of Kansas Press, Lawrence.
Hayasaka, K., T. Gojobori and S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol. Biol. Evol. 5:626-644.
Hayashi, K. 1991a. PCR-SSCP: A simple and sensitive method for detection of mutations in the genomic DNA. PCR Meth. Applica. 1:34-38.
Hayashi, K. 1991b. PCR-SSCP: A method for detection of mutations. GATA 9:73-79.
Hayes, J. P. and R. G. Harrison. 1992. Variation in mitochondrial DNA and the biogeographic history of woodrats (Neotoma) of the eastern United States. Syst. Biol. 42:331-344.
Healy, J. A. and M. F. Mulcahy. 1979. Polymorphic tetrameric superoxide dismutase in the pike Esox lucius L. (Pisces: Esocidae). Comp. Biochem. Physiol. 62B:563-565.
Hedges, S. B. 1989. Evolution and biogeography of West Indian frogs of the genus Eleutherodactylus: Slow-evolving loci and the major groups, pp. 305-370. In C. A. Woods (ed.), Biogeography of the West Indies: Past, Present, and Future. Sandhill Crane Press, Gainesville, Florida.
Hedges, S. B. 1992. The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies. Mol. Biol. Evol. 9:366-369.
Hedges, S. B. and M. H. Schweitzer. 1995. Detecting dinosaur DNA. Science 268:1191-1192.
Hedges, S. B., K. D. Moberg and L. R. Maxson. 1990. Tetrapod phylogeny inferred from 18S and 28S ribosomal RNA sequences and a review of the evidence for amniote relationships. Mol. Biol. Evol. 7:607-633.
Hedges, S. B., R. L. Bezy and L. R. Maxson. 1991. Phylogenetic relationships and biogeography of xantusiid lizards, inferred from mitochondrial DNA sequences. Mol. Biol. Evol. 8:767-780.
Hedges, S. B., J. Bogart and L. R. Maxson. 1992a. Ancestry of unisexual salamanders. Nature 356:708-710.
Hedges, S. B., S. Kumar, K. Tamura and M. Stoneking. 1992b. Human origins and analysis of mitochondrial DNA sequences. Science 255:737-739.
Hedges, L. V. and I. Olkin. 1985. Statistical Methods for Meta-analysis. Academic Press, Orlando, Florida.
Hedrick, P. W. 1983. Genetics of Populations. Science Books International, Boston.
Hein, J. 1989a. A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when phylogeny is given. Mol. Biol. Evol. 6:649-668.
Hein, J. 1989b. A tree reconstruction method that is economical in the number of pairwise comparisons used. Mol. Biol. Evol. 6:669-684.
Hein, J. 1990a. Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci. 98:185-200.
Hein, J. 1990b. Unified approach to alignment and phylogenies. Meth. Enzymol. 183:626-644.
Hein, J. 1993. A heuristic method to reconstruct the history of sequences subject to recombination. J. Mol. Evol. 36:396-405.
Heinstra, P. W. H., W. J. M. Aben, W. Scharloo and G. E. W. Thorig. 1986. Alcohol dehydrogenase of Drosophila melanogaster: Metabolic differences mediated through cryptic allozymes. Heredity 57:23-29.
Helfman, D. M., J. C. Fiddes and D. Hanahan. 1987. Directional cDNA cloning in plasmid vectors by sequential addition of oligonucleotide linkers. Meth. Enzymol. 152:349-359.
Hendc~son,A. S. 1982. Cytological hybndization to triosephosphate isomerase (TPI)in Isotes
mammalian chromosomes. Int. Rev. Cytol. (Isotaceae). Am. J. Bot. 76:215-221.
76 1 4 6 I-Xiggins, D. G. and I?. M. Sharp. 1988. CLUSTAL: A
Henderson, N. S. 1965. Isozymes of isocitrate dehy- package for performing multiple sequence align-
drogenase: Subunit structure and intracellular ment on a microcomputer. Gene 73:237-244.
location. J. Exp. Zool. 158:263-274. Higgins, D. G. and P. M. Sharp. 1989. Fast and sensi-
Hendy, M. D. 1989. The relationship between simple tive multiple sequence alignments on a micro-
evolutionary tree models and observable computer. CABIOS 5:151-153.
sequence data. Syst. Zool. 38:310-321. Higgins, D. G., A. J. Bleasby and R. Fuchs. 1992.
Hendy, M. D. 1991. A combinatorial description of the CLUSTAL V: Improved software for multiple
closest tree algorithm for finding evolutionary sequence alignment. Comput. Appl. Biosci.
trees. Discrete Math. 96:51-58. 8:189-192.
Hendy, M. D. and D. Penny. 1982. Branch and bound Highton, R., G. C. Maha and L. R. Maxson. 1989.
algorithms to determine minimal evolutionary Biochemical evolution in the slimy salamanders
trees. Math. Biosci. 59:277-290. of the Plethodon glutinosus complex in the eastern
Hendy, M. D. and D. Penny. 1989. A framework for the United States. Illinois Biol. Monogr. 57:1-153.
quantitative study of evolutionary trees. Syst. Higuchi, R. G. and H. Ochman. 1989. Production of
Zool. 38:297-309. single-stranded DNA templates by exonuclease
Hendy, M. D. and D. Penny. 1993. Spectral analysis of digestion following the polymerase chain reac-
phylogenetic data. J. Class. 10:5-24. tion. Nucl. Acids Res. 17:5865.
Hendy, M. D., D. Penny and M. A. Steel. 1994. A dis- Higuchi, R. G., B. Bowman, M. Freiberger, O. A. Ryder
crete Fourier analysis for evolutionary trees. Proc. and A. C. Wilson. 1984. DNA sequences from the
Natl. Acad. Sci. USA 91:3339-3343. quagga, an extinct member of the horse family.
Henikoff, S. and J. G. Henikoff. 1992. Amino acid sub- Nature 312:282-284.
stitution matrices from protein blocks. Proc. Natl. Higuchi, R. G., L. A. Wrischnik, E. Oakes, M. George,
Acad. Sci. USA 89:10915-10919. B. Tong and A. C. Wilson. 1987. Mitochondrial
Hennig, W. 1950. Grundzüge einer Theorie der phylo- DNA of the extinct quagga: Relatedness and post-
genetischen Systematik. Deutscher Zentralverlag, mortem change. J. Mol. Evol. 25:283-287.
Berlin. Hilbish, T. J. and R. K. Koehn. 1985a. The physiologi-
Hennig, W. 1966. Phylogenetic Systematics. University cal basis of natural selection at the LAP locus.
of Illinois Press, Urbana. Evolution 39:1302-1317.
Hereford, L. M. and M. Rosbash. 1977. Number and dis- Hilbish, T. J. and R. K. Koehn. 1985b. Dominance in
tribution of polyadenylated RNA sequences in physiological phenotypes and fitness at an
yeast. Cell 10:453-462. enzyme locus. Science 229:52-54.
Herman, S. G. 1980. The Naturalist's Field Journal. Buteo Hilbish, T. J., L. E. Deaton and R. K. Koehn. 1982.
Books, Vermillion, South Dakota. Effect of an allozyme polymorphism on regula-
Hernandez, J. L. and B. S. Weir. 1989. A disequilibrium tion of cell volume. Nature 298:688-689.
coefficient approach to Hardy-Weinberg testing. Hill, W. G. and B. S. Weir. 1988. Variances and covari-
Biometrics 45:53-70. ances of squared linkage disequilibria. Theor.
Hernandez-Juviel, J. M., D. J. Morafka, I. Delgado, G. Pop. Biol. 33:54-78.
D. Scott and R. W. Murphy. 1992. Effect of enzyme Hill, W. G. and B. S. Weir. 1994. Maximum likelihood
dilution on the relative mobility of glutamate estimation of gene location with linkage disequi-
dehydrogenase isozymes in the prairie rat- librium. Am. J. Human Genet. 54:705-714.
tlesnake, Crotalus viridis viridis. Copeia Hillis, D. M. 1984. Misuse and modification of Nei's
1992:1117-1119. genetic distance. Syst. Zool. 33:238-240.
Hewitt, G. M. 1988. Hybrid zones-natural laborato- Hillis, D. M. 1985. Evolutionary genetics of the
ries for evolutionary studies. Trends Ecol. Evol. Andean lizard genus Pholidobolus (Sauria:
3:158-167. Gymnophthalmidae): Phylogeny, biogeography,
Hey, J. and R. M. Kliman. 1993. Population genetics and a comparison of tree construction techniques.
and phylogenetics of DNA sequence variation of Syst. Zool. 34:109-126.
multiple loci within the Drosophila melanogaster Hillis, D. M. 1987. Molecular versus morphological
complex. Mol. Biol. Evol. 10:804-822. approaches to systematics. Annu. Rev. Ecol. Syst.
Hickey, R. J., S. I. Guttman and W. H. Eshbaugh. 1989. 18:23-42.
Evidence for post-translational modification of
Hillis, D. M. 1989. Genetic consequences of partial Molecular Evolution of Physiological Processes.
self-fertilization on populations of the Florida tree Rockefeller University Press, New York.
snail (Liguus fasciatus). Am. Malacol. Bull. 6:7-12. Hillis, D. M. and J. P. Huelsenbeck. 1995. Assessing
Hillis, D. M. 1990. The phylogeny of amphibians: molecular phylogenies. Science 267:255-256.
Current knowledge and the role of cytogenetics, Hillis, D. M. and J. C. Patton. 1982. Morphological and
pp. 7-31. In D. M. Green and S. K. Sessions (eds.), electrophoretic evidence for two species of
Amphibian Cytogenetics and Evolution. Academic Corbicula (Bivalvia: Corbiculidae) in North
Press, San Diego. America. Am. Midl. Nat. 108:74-80.
Hillis, D. M. 1991. Discriminating between phyloge- Hillis, D. M., D. S. Rosenfield and M. Sanchez. 1987.
netic signal and random noise in DNA sequences, Allozymic variability and heterozygote deficiency
pp. 278-294. In M. M. Miyamoto and J. Cracraft within and among morphologically polymorphic
(eds.), Phylogenetic Analysis of DNA Sequences. populations of Liguus fasciatus (Mollusca:
Oxford University Press, New York. Pulmonata: Bulimulidae). Am. Malacol. Bull.
Hillis, D. M. 1994a. Homology in molecular biology, 5:155-159.
pp. 339-367. In B. K. Hall (ed.), Homology: The Hillis, D. M., M. T. Dixon and L. K. Ammerman.
Hierarchical Basis of Comparative Biology. Academic 1991a. The relationships of the coelacanth
Press, New York. Latimeria chalumnae: Evidence from sequences of
Hillis, D. M. 1994b. Phylogenetic searching of molecu- vertebrate 28S ribosomal RNA genes. Environ.
lar data bases. Syst. Biol. 43:461-463. Biol. Fishes 32:119-130.
Hillis, D. M. 1995. Approaches for assessing phyloge- Hillis, D. M., M. T. Dixon and A. L. Jones. 1991b.
netic accuracy. Syst. Biol. 44:3-16. Minimal genetic variation in a morphologically
Hillis, D. M. and J. J. Bull. 1991. Of genes and diverse species (Florida tree snail, Liguus fascia-
genomes. Science 254:528. tus). J. Hered. 82:282-286.
Hillis, D. M. and J. J. Bull. 1993. An empirical test of Hillis, D. M., C. Moritz, C. A. Porter and R. J. Baker.
bootstrapping as a method for assessing confi- 1991c. Evidence for biased gene conversion in
dence in phylogenetic analysis. Syst. Biol. concerted evolution of ribosomal DNA. Science
42:182-192. 251:308-310.
Hillis, D. M. and S. K. Davis. 1986. Evolution of ribo- Hillis, D. M., J. J. Bull, M. E. White, M. R. Badgett and
somal DNA: Fifty million years of recorded histo- I. J. Molineux. 1992. Experimental phylogenetics:
ry in the frog genus Rana. Evolution 40:1275-1288. Generation of a known phylogeny. Science
Hillis, D. M. and S. K. Davis. 1987. Evolution of the 255:589-592.
28S ribosomal RNA gene in anurans: Hillis, D. M., M. W. Allard and M. M. Miyamoto.
Phylogenetic implications of length and restric- 1993a. Analysis of DNA sequence data:
tion site variation. Mol. Biol. Evol. 4:117-125. Phylogenetic inference. Meth. Enzymol.
Hillis, D. M. and S. K. Davis. 1988. Ribosomal DNA: 242:456-487.
Intraspecific polymorphism, concerted evolution, Hillis, D. M., L. K. Ammerman, M. T. Dixon and R. O.
and phylogeny reconstruction. Syst. Zool. de Sá. 1993b. Ribosomal DNA and the phylogeny
32:63-66. of frogs. Herpetol. Monog. 7:118-131.
Hillis, D. M. and M. T. Dixon. 1989. Vertebrate phy- Hillis, D. M., J. J. Bull, M. E. White, M. R. Badgett and
logeny: Evidence from 28S ribosomal DNA I. J. Molineux. 1993c. Experimental approaches to
sequences, pp. 355-367. In B. Fernholm, K. phylogenetic analysis. Syst. Biol. 42:90-92.
Bremer and H. Jörnvall (eds.), The Hierarchy of Hillis, D. M., J. P. Huelsenbeck and C. W.
Life. Proc. Nobel Symp. 70. Elsevier, Amsterdam. Cunningham. 1994a. Application and accuracy of
Hillis, D. M. and M. T. Dixon. 1991. Ribosomal DNA: molecular phylogenies. Science 264:671-677.
Molecular evolution and phylogenetic inference. Hillis, D. M., J. P. Huelsenbeck and D. L. Swofford.
Quart. Rev. Biol. 66:411-453. 1994b. Hobgoblin of phylogenetics? Nature
Hillis, D. M. and J. P. Huelsenbeck. 1992. Signal, noise, 369:363-364.
and reliability in molecular phylogenetic analy- Hinkle, G., J. K. Wetterer, T. R. Schultz and M. L.
ses. J. Hered. 83:189-195. Sogin. 1994. Phylogeny of the attine ant fungi
Hillis, D. M. and J. Huelsenbeck. 1994a. Support for based on analysis of small subunit ribosomal
dental HIV transmission. Nature 369:24-25. RNA gene sequences. Science 266:1695-1697.
Hillis, D. M. and J. P. Huelsenbeck. 1994b. To tree the Hiratsuka, J., H. Shimada, R. Whittier, T. Ishibashi, M.
truth: Biological and numerical simulations of Sakamoto, M. Mori, C. Kondo, Y. Honji, C.-R.
phylogeny, pp. 55-67. In D. M. Fambrough (ed.), Sun, B.-Y. Meng, Y.-Q. Li, A. Kanno, Y.
Nishizawa, A. Hirai, K. Shinozaki and M. Sugiura. analysis of isozyme patterns, pp. 489-508. In C. L.
1989. The complete sequence of the rice (Oryza Markert (ed.), Isozymes. Vol. 1. Academic Press,
sativa) chloroplast genome: Intermolecular recom- New York.
bination between distinct tRNA genes accounts Höss, M. and S. Pääbo. 1993. DNA extraction from
for a major plastid DNA inversion during the Pleistocene bones by a silica-based purification
evolution of the cereals. Mol. Gen. Genet. method. Nucl. Acids Res. 21:3913-3914.
217:185-194. Höss, M., M. Kohn, S. Pääbo, F. Knauer and W.
Hixson, J. E. and W. M. Brown. 1986. A comparison of Schröder. 1992. Excrement analysis by PCR.
the small ribosomal RNA genes from the mito- Nature 359:199.
chondrial DNA of the great apes and humans: Houck, L. D., S. G. Tilley and S. J. Arnold. 1985. Sperm
Sequence, structure, evolution, and phylogenetic competition in a plethodontid salamander:
implications. Mol. Biol. Evol. 3:1-18. Preliminary results. J. Herpetol. 19:420-423.
Hoeh, W. R., K. H. Blakley and W. M. Brown. 1992. Houde, P. and M. J. Braun. 1988. Museum collections
Heteroplasmy suggests limited biparental inheri- as a source of DNA for studies of avian phyloge-
tance of Mytilus mitochondrial DNA. Science ny. Auk 105:773-776.
251:1488-1490. Hsiao, J.-Y. and L. H. Rieseberg. 1994. Population
Hoelzel, R. and G. A. Dover. 1987. Molecular tech- genetic structure of Yushania niitakayamensis
niques for examining genetic variation and stock (Bambusoideae, Poaceae) in Taiwan. Mol. Ecol.
identity in cetacean species. Report of the 3:201-209.
International Whale Commission. Hsu, T. C. 1979. Human and Mammalian Cytogenetics.
Hoey, M. T. and C. R. Parks. 1991. Isozyme divergence Springer-Verlag, Berlin.
between Eastern Asian, North American and Hsu, T. C. 1981. Polymorphism in human acrocentric
Turkish populations of Liquidambar chromosomes and the silver staining method for
(Hamamelidaceae). Am. J. Bot. 78:938-947. nucleolus organizer regions. Karyogram 245.
Holmes, N. G., C. S. Mellersh, S. J. Humphreys, M. M. Hubby, J. L. and R. C. Lewontin. 1966. A molecular
Binns, A. Holliman, R. Curtis and J. Sampson. approach to the study of genic heterozygosity in
1993. Isolation and characterization of microsatel- natural populations. I. The number of alleles at
lites from the canine genome. Anim. Genet. different loci in Drosophila pseudoobscura. Genetics
24:289-292. 54:577-594.
Holmquist, R., M. M. Miyamoto and M. Goodman. Hubby, J. L. and L. H. Throckmorton. 1965. Protein
1988a. Analysis of higher-primate phylogeny differences in Drosophila. II. Comparative species
from transversion differences in nuclear and genetics and evolutionary problems. Genetics
mitochondrial DNA by Lake's methods of evolu- 52:203-215.
tionary parsimony and operator metrics. Mol. Hudson, R. R. 1990. Gene genealogies and the coales-
Biol. Evol. 5:217-236. cent process. Oxford Surv. Evol. Biol. 7:1-44.
Holmquist, R., M. M. Miyamoto and M. Goodman. Hudson, R. R., D. D. Boos and N. L. Kaplan. 1992a. A
1988b. Higher-primate phylogeny-Why can't we statistical test for detecting geographic subdivi-
decide? Mol. Biol. Evol. 5:201-216. sion. Mol. Biol. Evol. 9:138-151.
Holsinger, K. E. and L. D. Gottlieb. 1988. Isozyme vari- Hudson, R. R., M. Slatkin and W. P. Maddison. 1992b.
ability in the tetraploid Clarkia gracilis Estimation of levels of gene flow from DNA
(Onagraceae) and its diploid relatives. Syst. Bot. sequence data. Genetics 132:583-589.
13:1-6. Hudspeth, M. E. S., D. S. Shumard, K. M. Tatti and L.
Holtsford, T. P. and N. C. Ellstrand. 1990. Inbreeding I. Grossman. 1980. Rapid purification of yeast
effects in Clarkia tembloriensis (Onagraceae) popu- mitochondrial DNA in high yield. Biochim.
lations with different natural outcrossing rates. Biophys. Acta 610:221-228.
Evolution 44:2031-2046. Huelsenbeck, J. P. 1995a. Performance of phylogenetic
Honeycutt, R. L., S. W. Edwards, K. Nelson and E. methods in simulation. Syst. Biol. 44:17-48.
Nevo. 1987. Mitochondrial DNA variation and Huelsenbeck, J. P. 1995b. The robustness of two phylo-
the phylogeny of African mole rats (Rodentia: genetic methods: Four-taxon simulations reveal a
Bathyergidae). Syst. Zool. 36:280-293. slight superiority of maximum likelihood over
Hood, L. E., J. H. Wilson and W. B. Wood. 1974. neighbor joining. Mol. Biol. Evol. 12:843-849.
Molecular Biology of Eucaryotic Cells. Vol. 1. W. A. Huelsenbeck, J. P. and D. M. Hillis. 1993. Success of
Benjamin, Menlo Park, California. phylogenetic methods in the four-taxon case.
Hopkinson, D. A. 1975. The use of thiol reagents in the Syst. Biol. 42:247-264.
Huelsenbeck, J. P., D. L. Swofford, C. W. Cunningham, alated deoxyribonucleic acid. Biochemistry
J. J. Bull and P. W. Waddell. 1994. Is character 12:558-563.
weighting a panacea for the problem of data het- Huxley, J. 1942. Evolution: The Modern Synthesis. Allen
erogeneity in phylogenetic analysis? Syst. Biol. and Unwin, London.
43:288-291.
Huelsenbeck, J. P., D. M. Hillis and R. Jones. 1995. Innis, M. A., D. H. Gelfand, J. J. Sninsky and T. J. White.
Parametric bootstrapping in molecular phyloge- 1990. PCR Protocols. Academic Press, New York.
netics: Applications and performance. In J. International Union of Biochemistry: Nomenclature
Ferraris and S. Palumbi (eds.), Molecular Zoology: Committee. 1984. Enzyme Nomenclature, 1984.
Strategies and Protocols. Wiley, New York. Academic Press, Orlando, Florida.
Huey, R. B. and A. F. Bennett. 1987. Phylogenetic stud- Irwin, D. M., T. D. Kocher and A. C. Wilson. 1991.
ies of coadaptation: Preferred temperatures ver- Evolution of the cytochrome b gene of mammals.
sus optimal performance temperatures of lizards. J. Mol. Evol. 32:128-144.
Evolution 41:1098-1115. ISCN. 1981. An international system for human cyto-
Hugall, A., C. Moritz, J. Stanton and D. R. genetic nomenclature-high resolution banding.
Wolstenholme. 1994. Low, but strongly structured Cytogenet. Cell Genet. 31:1-23.
mitochondrial DNA diversity in root knot nema- Iwahana, H., D. Yoshimoto and M. Itakura. 1992.
todes (Meloidogyne). Genetics 136:903-912. Detection of point mutations by SSCP of PCR-
Hughes, A. E. 1993. Optimization of microsatellite amplified DNA after endonuclease digestion.
analysis for genetic mapping. Genomics BioTechniques 12:64-66.
15:433-434.
Hughes, A. L. and M. Nei. 1989. Ancient interlocus Jackman, T. R. and D. B. Wake. 1994. Evolutionary and
exon exchange in the history of the HLA-A locus. historical analysis of protein variation in the
Genetics 122:681-686. blotched forms of salamanders of the Ensatina
Hughes, C. R. and D. C. Queller. 1993. Detection of complex (Amphibia, Plethodontidae). Evolution
highly polymorphic microsatellite loci in a species 48:876-897.
with little allozyme polymorphism. Mol. Ecol. Jackson, J. F. and J. A. Pounds. 1979. Comments on
2:131-138. assessing the dedifferentiating effects of gene
Hunkapiller, T., R. J. Kaiser, B. F. Koop and L. Hood. flow. Syst. Zool. 28:78-85.
1991. Large-scale and automated DNA sequence Jacobs, H. T., J. W. Posakony, J. W. Grula, J. W. Roberts,
determination. Science 254:59-68. J.-H. Xin, R. J. Britten and E. H. Davidson. 1983.
Hunt, J. A., T. J. Hall and R. J. Britten. 1981. Mitochondrial DNA sequences in the nuclear
Evolutionary distances in Hawaiian Drosophila genome of Strongylocentrotus purpuratus. J. Mol.
measured by DNA reassociation. J. Mol. Evol. Biol. 165:609-632.
17:361-367. Janczewski, D. N., N. Yuhki, D. A. Gilbert, G. T.
Hunt, W. G. and R. K. Selander. 1973. Biochemical Jefferson and S. J. O'Brien. 1992. Molecular phylo-
genetics of hybridization in European house mice. genetic inference from saber-toothed cat fossils of
Heredity 31:11-33. Rancho La Brea. Proc. Natl. Acad. Sci. USA
Hunter, R. L. and C. L. Markert. 1957. Histochemical 89:9769-9773.
demonstration of enzymes separated by zone Jansen, R. K. and J. D. Palmer. 1987a. Chloroplast
electrophoresis in starch gels. Science DNA from lettuce and Barnadesia (Asteraceae):
125:1294-1295. Structure, gene localization and characterization
Hutchinson, M. N. and L. R. Maxson. 1987a. of a large inversion. Curr. Genet. 11:553-564.
Biochemical studies on the relationships of the Jansen, R. K. and J. D. Palmer. 1987b. A chloroplast
gastric-brooding frogs, genus Rheobatrachus. DNA inversion marks an ancient evolutionary
Amphibia/Reptilia 8:1-11. split in the sunflower family (Asteraceae). Proc.
Hutchinson, M. N. and L. R. Maxson. 1987b. Natl. Acad. Sci. USA 84:5818-5822.
Phylogenetic resolution among Australian tree Jansen, R. K. and J. D. Palmer. 1988. Phylogenetic
frogs (Anura: Hylidae: Pelodryadinae): An implications of chloroplast DNA restriction site
immunological approach. Australian J. Zool. variation in the Mutisieae (Asteraceae). Am. J.
35:61-74. Bot. 75:751-764.
Hutton, J. R. and J. G. Wetmur. 1973. Effect of chemical Jansen, R. K., H. J. Michaels and J. D. Palmer. 1991.
modification on the rate of renaturation of Phylogeny and character evolution in the
deoxyribonucleic acid: Deamination and glyox- Asteraceae based on chloroplast DNA restriction
site mapping. Syst. Bot. 16:98-115.
Jansen, R. K., H. J. Michaels, R. S. Wallace, K.-J. Kim, S. and D. J. Simpson. 1989. High-speed DNA
C. Keeley, L. E. Watson and J. D. Palmer. 1992. sequencing: An approach based upon fluores-
Chloroplast DNA variation in the Asteraceae: cence detection of single molecules. J. Biomol.
Phylogenetic and evolutionary implications, pp. Struct. Dynam. 7:301-309.
252-279. In P. S. Soltis, D. E. Soltis and J. J. Doyle Jiminez-Marin, D. and H. C. Dessauer. 1973. Protein
(eds.), Molecular Systematics of Plants. Chapman phenotype variation in laboratory populations of
and Hall, New York. Rattus norvegicus. Comp. Biochem. Physiol.
Jeanpierre, M. 1987. A rapid method for the purifica- 46B:487-492.
tion of DNA from blood. Nucl. Acids Res. 15:9611. Jin, L. and R. Chakraborty. 1995. Population structure,
Jech, M. S. and N. C. Wheeler. 1984. Laboratory Manual stepwise mutations, heterozygote deficiency and
for Horizontal Starch Gel Electrophoresis. their implications for DNA forensics. Heredity
Weyerhaeuser Research and Development Report 74:274-285.
#050-3210/6. Jin, L. and M. Nei. 1990. Limitations of the evolution-
Jeffreys, A. J. 1982. Spermidine and the digestion of ary parsimony method of phylogenetic analysis.
impure DNA. Focus (BRL) 4(3):12. Mol. Biol. Evol. 7:82-102.
Jeffreys, A. J. and D. B. Morton. 1987. DNA finger- Johannisson, R. and H. Winking. 1994. Synaptonemal
prints of dogs and cats. Anim. Genet. 18:1-15. complexes of chains and rings in mice heterozy-
Jeffreys, A. J., V. Wilson and S. L. Thein. 1985a. gous for multiple Robertsonian translocations.
Hypervariable "minisatellite" regions in human Chromosome Res. 2:137-145.
DNA. Nature 314:67-73. John, H., M. L. Birnstiel and K. W. Jones. 1969.
Jeffreys, A. J., V. Wilson and S. L. Thein. 1985b. RNA-DNA hybrids at cytological levels. Nature
Individual-specific "fingerprints" of human 223:582-587.
DNA. Nature 316:76-79. Johnson, A. G., F. M. Utter and H. O. Hodgins. 1970.
Jeffreys, A. J., V. Wilson, R. Kelly, B. A. Taylor and G. Interspecific variation of tetrazolium oxidase in
Bulfield. 1987. Mouse DNA "fingerprints": Sebastodes (rockfish). Comp. Biochem. Physiol.
Analysis of chromosome localization and germ- 37:281-285.
line stability of hypervariable loci in recombinant Johnson, G. B. 1976. Hidden alleles at the α-glyc-
inbred strains. Nucl. Acids Res. 15:2823-2836. erophosphate locus in Colias butterflies. Genetics
Jeffreys, A. J., N. J. Royle, V. Wilson and Z. Wong. 83:149-167.
1988. Spontaneous mutation rates to new length Johnson, G. B. 1977. Assessing electrophoretic similari-
alleles at tandem-repetitive hypervariable loci in ty: The problem of hidden heterogeneity. Annu.
human DNA. Nature 332:278-281. Rev. Ecol. Syst. 8:309-328.
Jeffreys, A. J., A. MacLeod, K. Tamaki, D. L. Neil and Johnson, G. B. 1979. Increasing the resolution of poly-
D. G. Monckton. 1991. Minisatellite repeat coding acrylamide gel electrophoresis by varying the
as a digital approach to DNA typing. Nature degree of crosslinking. Biochem. Genet.
354:204-209. 17:499-516.
Jeffreys, A. J., K. Tamaki, A. MacLeod, D. G. Johnson, M. S. and R. Black. 1984. The Wahlund effect
Monckton, D. L. Neil and J. A. L. Armour. 1994. and the geographical scale of variation in the
Complex gene conversion events in germline intertidal limpet Siphonaria sp. Marine Biol.
mutation at human minisatellites. Nature 79:295-302.
Genetics 6:136-145. Johnson, M. S. and R. F. Doolittle. 1986. A method for
Jena, K. K. and G. Kochert. 1991. Restriction fragment the simultaneous alignment of three or more
length polymorphism analysis of CCDD genome amino acid sequences. J. Mol. Evol. 23:267-278.
species of the genus Oryza L. Plant Mol. Biol. Johnson, N. K., R. M. Zink, G. F. Barrowclough and J.
16:831-839. A. Marten. 1984. Suggested techniques for mod-
Jensen, U. and D. E. Fairbrothers (eds.). 1983. Proteins ern avian systematics. Wilson Bull. 96:543-560.
and Nucleic Acids in Plant Systematics. Springer- Johnson, N. K., R. M. Zink and J. A. Marten. 1988.
Verlag, New York. Genetic evidence for relationships in the avian
Jermann, T. M., J. G. Opitz, J. Stackhouse and S. A. family Vireonidae. Condor 90:428-445.
Benner. 1995. Reconstructing the evolutionary his- Jones, C. S., H. Tegelstrom, D. S. Latchman and R. J.
tory of the artiodactyl ribonuclease superfamily. Berry. 1988. An improved rapid method for mito-
Nature 374:57-59. chondrial DNA isolation suitable for use in the
Jett, J. H., R. A. Keller, J. C. Martin, B. L. Marrone, R. K. study of closely related populations. Biochem.
Moyzis, R. L. Ratliff, N. K. Seitzinger, B. B. Shera Genet. 26:83-88.
Jones, D. T., W. R. Taylor and J. M. Thornton. 1992. The Karl, S. A. and J. C. Avise. 1992. Balancing selection at
rapid generation of mutation data matrices from allozyme loci in oysters: Implications from
protein sequences. Comput. Appl. Biosci. 8:275-282. nuclear RFLPs. Science 256:100-102.
Jones, G. H. and D. de Azkue. 1993. Synaptonemal Karl, S. A. and J. C. Avise. 1993. PCR-based assays of
complex karyotyping: An appraisal based on a Mendelian polymorphisms from anonymous sin-
study of Crepis capillaris. Chromosome Res. gle-copy nuclear DNA: Techniques and applica-
1:197-203. tions for population genetics. Mol. Biol. Evol.
Jones, T. R., A. G. Kluge and A. J. Wolf. 1993. When 10:342-361.
theories and methodologies clash: A phylogenetic Karl, S. A., B. W. Bowen and J. C. Avise. 1992. Global
reanalysis of the North American ambystomatid population genetic structure and male-mediated
salamanders (Caudata: Ambystomatidae). Syst. gene flow in the green turtle (Chelonia mydas):
Biol. 42:92-102. RFLP analysis of anonymous nuclear loci.
Jorgensen, R. A. and P. D. Cluster. 1988. Modes and Genetics 131:163-173.
tempos in the evolution of nuclear ribosomal Keilin, D. and Y. L. Wang. 1947. Stability of hemoglo-
DNA: New characters for evolutionary studies bin and certain non-erythrocytic enzymes in vitro.
and new markers for genetic and population Biochem. J. 41:491-499.
studies. Ann. Missouri Bot. Gard. 75:1238-1247. Kellogg, E. A. and J. A. Birchler. 1993. Linking phyloge-
Joseph, L. and C. Moritz. 1994. Mitochondrial DNA ny and genetics: Zea mays as a tool for phyloge-
phylogeography of birds in eastern Australian netic studies. Syst. Biol. 42:409-414.
rainforests: First fragments. Australian J. Zool. Kemmerer, E. C., M. Lei and R. Wu. 1991. Isolation
42:385-403. and molecular evolutionary analysis of a
Joseph, L., C. Moritz and A. Hugall. 1995. Molecular cytochrome c gene from Oryza sativa (rice). Mol.
support for vicariance as a source of diversity in Biol. Evol. 8:212-226.
rainforest. Proc. Roy. Soc. Lond. (in press). Kempthorne, O. 1957. An Introduction to Genetic
Jouannic, S., C. Kerbourc'h, B. Kloareg and S. Statistics. Wiley, New York.
Loiseaux-de Goër. 1992. Nucleotide sequences of Kephart, S. R. 1990. Starch gel electrophoresis of plant
the atpB and the atpE genes of the brown alga isozymes: A comparative analysis of techniques.
Pylaiella littoralis (L.) Kjellm. Plant Mol. Biol. Am. J. Bot. 77:693-712.
18:819-822. Kesseli, R., O. Ochoa and R. Michelmore. 1991.
Jukes, T. H. and C. R. Cantor. 1969. Evolution of pro- Variation at RFLP loci in Lactuca ssp. and origin of
tein molecules, pp. 21-132. In H. N. Munro (ed.), cultivated lettuce (L. sativa). Genome 34:430-436.
Mammalian Protein Metabolism. Academic Press, Kessing, B. D. 1991. Strongylocentrotid sea urchin
New York. mitochondrial DNA: Phylogenetic relationships
Jupe, E. R., R. L. Chapman and E. A. Zimmer. 1988. and patterns of molecular evolution. Masters the-
Nuclear ribosomal RNA genes and algal phyloge- sis, Department of Zoology, University of Hawaii,
ny-the Chlamydomonas example. BioSystems Honolulu, HI.
21:223-230. Kessler, L. G. and J. C. Avise. 1985a. Microgeographic
lineage analysis by mitochondrial genotype:
Kambhampati, S. and K. S. Rai. 1991. Temporal varia- Variation in the cotton rat (Sigmodon hispidus).
tion in the ribosomal DNA nontranscribed spacer Evolution 39:831-838.
of Aedes albopictus (Diptera: Culicidae). Genome Kessler, L. G. and J. C. Avise. 1985b. A comparative
34:293-297. description of mitochondrial differentiation in
Kanehisa, M. 1984. Use of criteria for screening poten- selected avian and other vertebrate genera. Mol.
tial homologies in nucleic acid sequences. Nucl. Biol. Evol. 2:109-126.
Acids Res. 12:203-213. Kettler, M. K. and G. S. Whitt. 1986. An apparent pro-
Kaplan, J.-C. and E. Beutler. 1967. Electrophoresis of gressive and recurrent evolutionary restriction in
red cell NADH- and NADPH-diaphorases in nor- tissue expression of a gene, the lactate dehydroge-
mal subjects and patients with congenital methe- nase-C gene, within a family of bony fish
moglobinemia. Biochem. Biophys. Res. Comm. (Salmoniformes: Umbridae). J. Mol. Evol.
29:605-610. 23:95-107.
Kaplan, N. L., W. G. Hill and B. S. Weir. 1995. Kettler, M. K., A. W. Ghent and G. S. Whitt. 1986. A
Likelihood methods for locating disease genes in comparison of phylogenies based on structural
non-equilibrium populations. Am. J. Human and tissue-expressional differences of enzymes in
Genet. 56:18-32. a family of teleost fishes (Salmoniformes:
Umbridae). Mol. Biol. Evol. 3:485-498.
Kezer, J. and S. K. Sessions. 1979. Chromosome varia- Kimura, M. and T. Ohta. 1972. On the stochastic model
tion in the plethodontid salamander, Aneides fer- for estimation of mutational distance between
reus. Chromosoma 71:65-80. homologous proteins. J. Mol. Evol. 2:87-90.
Kezer, J., P. León and S. K. Sessions. 1980. Structural Kimura, M. and G. H. Weiss. 1964. The stepping stone
differentiation of the meiotic and mitotic chromo- model of population and the decrease of genetic
somes of the salamander Ambystoma macrodacty- correlation with distance. Genetics 49:561-576.
lum. Chromosoma 81:177-197. King, J. L. and T. H. Jukes. 1969. Non-Darwinian evo-
Kezer, J., S. K. Sessions and P. León. 1989. The meiotic lution. Science 164:788-798.
structure and behavior of the strongly heteromor- King, J. L. and T. Ohta. 1975. Polyallelic mutational
phic X/Y sex chromosomes of neotropical pletho- equilibria. Genetics 79:681-691.
dontid salamanders of the genus Oedipina. King, M. 1993. Species Evolution: The Role of
Chromosoma 98:433-442. Chromosome Change. Cambridge University Press,
Kidd, K. K., P. Astolfi and L. L. Cavalli-Sforza. 1974. Cambridge.
Error in the reconstruction of evolutionary trees, Kirsch, J. A. W., Springer, M. A., Krajewski, C., Archer,
pp. 121-136. In J. F. Crow and C. Denniston (eds.), M., Aplin, K. and A. W. Dickerman. 1990a.
Genetic Distance. Plenum, New York. DNA/DNA hybridization studies of the carnivo-
Kidd, K. K. and L. L. Cavalli-Sforza. 1971. Number of rous marsupials. I: The intergeneric relationships
characters examined and error in reconstruction of bandicoots (Marsupialia: Perameloidea). J. Mol.
of evolutionary trees, pp. 335-346. In F. R. Hodson Evol. 30:434-448.
and P. Tautu (eds.), Mathematics in the Kirsch, J. A. W., Krajewski, C., Springer, M. S. and M.
Archaeological and Historical Sciences. Edinburgh Archer. 1990b. DNA-DNA hybridization studies
University Press, Edinburgh. of carnivorous marsupials. II. Relationships
Kidd, K. K. and L. A. Sgaramella-Zonta. 1971. among dasyurids (Marsupialia: Dasyuridae).
Phylogenetic analysis: Concepts and methods. Australian J. Zool. 38:673-696.
Am. J. Human Genet. 23:235-252. Kirsch, J. A. W., Dickerman, A. W., Reig, O. A. and M.
Kilias, J. 1987. Protein characters as a taxonomic tool S. Springer. 1991. DNA hybridization evidence for
in lichen systematics. Bibl. Lichenol. 25:445-455. the Australian affinity of the American marsupial
Kim, J. 1993. Improving the accuracy of phylogenetic Dromiciops australis. Proc. Natl. Acad. Sci. USA
estimation by combining different methods. Syst. 88:10465-10469.
Biol. 42:331-340. Kishino, H. and M. Hasegawa. 1989. Evaluation of the
Kim, W. and L. G. Abele. 1990. Molecular phylogeny maximum likelihood estimate of the evolutionary
of selected decapod crustaceans based on 18S tree topologies from DNA sequence data, and the
rRNA nucleotide sequences. J. Crustacean Biol. branching order in Hominoidea. J. Mol. Evol.
10:1-13. 29:170-179.
Kimura, M. 1968. Evolutionary rate at the molecular Kishino, H. and M. Hasegawa. 1990. Converting dis-
level. Nature 217:624-626. tance to time: Application to human evolution.
Kimura, M. 1980. A simple method for estimating evo- Meth. Enzymol. 183:550-570.
lutionary rate of base substitutions through com- Kishino, H., T. Miyata and M. Hasegawa. 1990.
parative studies of nucleotide sequences. J. Mol. Maximum likelihood inference of protein phy-
Evol. 16:211-229. logeny and the origin of chloroplasts. J. Mol. Evol.
Kimura, M. 1981. Estimation of evolutionary distances 31:151-160.
between homologous nucleotide sequences. Proc. Kitto, G. B., P. M. Wasserman and N. O. Kaplan. 1966.
Natl. Acad. Sci. USA 78:454-458. Enzymatically active conformers of mitochondrial
Kimura, M. 1983a. The neutral theory of molecular malate dehydrogenase. Proc. Natl. Acad. Sci. USA
evolution, pp. 208-233. In M. Nei and R. K. Koehn 56:578-585.
(eds.), Evolution of Genes and Proteins. Sinauer, Kjer, K. M., G. D. Baldridge and A. M. Fallon. 1994.
Sunderland, Massachusetts. Mosquito large subunit ribosomal RNA:
Kimura, M. 1983b. The Neutral Theory of Molecular Simultaneous alignment of primary and sec-
Evolution. Cambridge University Press, ondary structure. Biochim. Biophys. Acta:147-155.
Cambridge. Klebe, R. J. 1975. A simple method for the quantifica-
Kimura, M. 1986. DNA and the neutral theory. Phil. tion of isozymes patterns. Biochem. Genet.
Trans. Roy. Soc. London B312:343-354. 13:805-812.
Kimura, M. and J. F. Crow. 1964. The number of alleles Klein, J. 1982. Immunology: The Science of Self-Nonself
that can be maintained in a finite population. Discrimination. John Wiley & Sons, New York.
Genetics 49:725-738.
Klein, J., Y. Satta and C. O'Huigin. 1993. The molecular descent of the major histocompatibility complex. Annu. Rev. Immunol. 11:269-295.
Kleppe, K., E. Ohtsuka, R. Kleppe, I. Molineux and H. G. Khorana. 1971. Studies on polynucleotides XCVI. Repair replication of short synthetic DNA's as catalyzed by DNA polymerases. J. Mol. Biol. 56:341-361.
Klier, K., M. J. Leoschke and J. F. Wendel. 1991. Hybridization and introgression in white and yellow ladyslipper orchids (Cypripedium candidum and C. pubescens). J. Hered. 82:305-318.
Klotz, L. C. and R. L. Blanken. 1981. A practical method for calculating evolutionary trees from sequence data. J. Theor. Biol. 91:261-272.
Kluge, A. G. 1983. Cladistics and the classification of the great apes, pp. 151-177. In R. L. Ciochon and R. S. Corruccini (eds.), New Interpretations of Ape and Human Ancestry. Plenum, New York.
Kluge, A. G. 1984. The relevance of parsimony to phylogenetic inference, pp. 24-38. In T. Duncan and T. Stuessy (eds.), Cladistics: Perspectives on the Reconstruction of Evolutionary History. Columbia University Press, New York.
Kluge, A. G. 1988. Parsimony in vicariance biogeography: A quantitative method and a Greater Antillean example. Syst. Zool. 37:315-328.
Kluge, A. G. 1989. A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. 38:7-25.
Kluge, A. G. and J. S. Farris. 1969. Quantitative phyletics and the evolution of anurans. Syst. Zool. 18:1-32.
Kluge, A. G. and R. E. Strauss. 1985. Ontogeny and systematics. Annu. Rev. Ecol. Syst. 16:247-268.
Knight, A. and D. P. Mindell. 1993. Substitution bias, weighting of DNA sequence evolution, and the phylogenetic position of Fea's viper. Syst. Biol. 42:18-31.
Knight, A. and D. P. Mindell. 1995. Weighting of nucleotide sequences: A reply. Syst. Biol. 44:112-116.
Knight, A., D. Styer, S. Pelikan, J. A. Campbell, L. D. Densmore III and D. P. Mindell. 1993. Choosing among hypotheses of rattlesnake phylogeny: A best-fit rate test for DNA sequence data. Syst. Biol. 42:356-367.
Knight, S. E. and D. M. Waller. 1987. Genetic consequences of outcrossing in the cleistogamous annual, Impatiens capensis. I. Population-genetic structure. Evolution 41:969-978.
Knowlton, N., L. A. Weigt, L. A. Solorzano, D. K. Mills and E. Bermingham. 1993. Divergence in proteins, mitochondrial DNA, and reproductive compatibility across the isthmus of Panama. Science 260:1629-1632.
Kobayashi, T., G. B. Milner, D. Teel and F. M. Utter. 1984. Genetic basis for electrophoretic variation of adenosine deaminase in chinook salmon. Trans. Am. Fish. Soc. 113:86-89.
Koch, J., J. Hindkjaer, J. Mogensen, S. Kalvraa and L. Bolund. 1991. An improved method for chromosome-specific labeling of alpha-satellite DNA in situ by using denatured double-stranded DNA probes as primers in a primed in situ labeling (PRINS) procedure. GATA 8:171-178.
Kocher, T. D. 1991. Sequence evolution of mitochondrial DNA in human and chimpanzees: Control region and protein coding region, pp. 391-413. In S. Osawa and T. Honjo (eds.), Evolution of Life: Fossils, Molecules, and Culture. Springer, Tokyo.
Kocher, T. D. and R. D. Sage. 1986. Further genetic analyses of a hybrid zone between leopard frogs (Rana pipiens complex) in central Texas. Evolution 40:21-33.
Kocher, T. D. and T. J. White. 1989. Evolutionary analysis via PCR. In H. A. Erlich (ed.), PCR Technology: Principles and Applications for DNA Amplification. Stockton Press, New York.
Kocher, T. D. and A. C. Wilson. 1991. Sequence evolution of mitochondrial DNA in humans and chimpanzees: Control region and a protein-coding region, pp. 391-413. In S. Osawa and T. Honjo (eds.), Evolution of Life. Springer-Verlag, Tokyo.
Kocher, T. D., W. K. Thomas, A. Meyer, S. V. Edwards, S. Paabo, F. X. Villablanca and A. C. Wilson. 1989. Dynamics of mitochondrial DNA evolution in animals: Amplification and sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86:6196-6200.
Kochert, G., T. Halward, W. D. Branch and C. E. Simpson. 1991. RFLP variability in peanut (Arachis hypogaea L.) cultivars and wild species. Theor. Appl. Genet. 81:565-570.
Koehler, K. and K. Larntz. 1980. An empirical investigation of goodness-of-fit statistics for sparse multinomials. J. Am. Statis. Assoc. 75:336-344.
Koehn, R. K. 1978. Physiology and biochemistry of enzyme variation. The interface of ecology and population genetics, pp. 51-72. In P. Brussard (ed.), Ecological Genetics: The Interface. Springer, New York.
Koehn, R. K. and F. W. Immermann. 1981. Biochemical studies of aminopeptidase polymorphism in Mytilus edulis. I. Dependence of enzyme activity on season, tissue, and genotype. Biochem. Genet. 19:1115-1142.
Koehn, R. K. and J. F. Siebenaller. 1981. Biochemical studies of aminopeptidase polymorphism in Mytilus edulis. II. Dependence of reaction rate on physical factors and enzyme concentration. Biochem. Genet. 19:1143-1162.
Koehn, R. K., R. I. E. Newell and F. Immermann. 1980. Maintenance of an aminopeptidase allele frequency cline by natural selection. Proc. Natl. Acad. Sci. USA 77:5385-5389.
Koehn, R. K., W. J. Diehl and T. M. Scott. 1988. The differential contribution by individual enzymes of glycolysis and protein catabolism to the relationship between heterozygosity and growth rate in the coot clam, Mulinia lateralis. Genetics 118:121-130.
Kohne, D. E. 1970. Evolution of higher-organism DNA. Quart. Rev. Biophys. 33:327-375.
Kohne, D. E. and R. J. Britten. 1971. Hydroxyapatite techniques for nucleic acid reassociation, pp. 500-512. In G. L. Cantoni and D. R. Davies (eds.), Procedures in Nucleic Acid Research. Harper and Row, New York.
Kohne, D. E., J. A. Chiscon and B. H. Hoyer. 1972. Evolution of primate DNA sequences. J. Human Evol. 1:627-644.
Kohne, D. E., S. A. Levison and M. J. Byers. 1977. Room temperature method for increasing the rate of DNA reassociation by many thousandfold: The phenol emulsion reassociation technique. Biochemistry 16:5329-5341.
Kohno, S. I., M. Kuro-o and C. Ikebe. 1991. Cytogenetics and evolution of hynobiid salamanders. In D. M. Green and S. K. Sessions (eds.), Amphibian Cytogenetics and Evolution. Academic Press, San Diego.
Kolodner, R. and K. K. Tewari. 1987. The molecular size and conformation of the chloroplast DNA from higher plants. Biochim. Biophys. Acta 402:372-390.
Kondo, R., S. Horai, Y. Satta and N. Takahata. 1993. Evolution of hominoid mitochondrial DNA with special reference to the silent substitution rate over the genome. J. Mol. Evol. 36:517-531.
Koop, B. F., M. Goodman, P. Xu, K. Chan and J. L. Slightom. 1986. Primate η-globin DNA sequences and man's place among the great apes. Nature 319:234-238.
Koop, B. F., D. A. Tagle, M. Goodman and J. L. Slightom. 1989. A molecular view of primate phylogeny and important systematic and evolutionary questions. Mol. Biol. Evol. 6:580-612.
Korber, B. T. M., R. M. Farber, D. H. Wolpert and A. S. Lapedes. 1993. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein. Proc. Natl. Acad. Sci. USA 90:7176-7180.
Korber, B. T. M., R. F. Smith, K. MacInnes and G. Myers. 1994. Mutational trends in V3 loop protein sequences observed in different genetic lineages of human immunodeficiency virus type 1. J. Virol. 68:6730-6744.
Kornberg, A. 1980. DNA Replication. Freeman, San Francisco.
Kowbel, D. J. and M. J. Smith. 1989. The genomic nucleotide sequences of two differentially expressed actin-coding genes from the sea star Pisaster ochraceus. Gene 72:297-308.
Krajewski, C. 1989. Phylogenetic relationships among cranes (Aves: Gruidae) based on DNA hybridization. Auk 106:603-618.
Krajewski, C. and A. W. Dickerman. 1990. Bootstrap analysis of phylogenetic trees derived from DNA hybridization distances. Syst. Zool. 39:383-390.
Kraus, F. 1991. Intra-individual ploidy consistency among unisexual Ambystoma. Copeia 1991:38-43.
Kraus, F. and M. M. Miyamoto. 1990. Mitochondrial genotype of a unisexual salamander of hybrid origin is unrelated to either of its nuclear haplotypes. Proc. Natl. Acad. Sci. USA 87:2235-2238.
Kreitman, M. 1987. Molecular population genetics. Oxford Surv. Evol. Biol. 4:38-60.
Kreitman, M. and M. Aguade. 1986. Genetic uniformity in two populations of Drosophila melanogaster as revealed by filter hybridization of four-nucleotide-recognizing restriction enzyme digests. Proc. Natl. Acad. Sci. USA 83:3562-3566.
Krishnan, B. R., R. W. Blakesley and D. E. Berg. 1991. Linear amplification DNA sequencing directly from single phage plaques and bacterial colonies. Nucl. Acids Res. 19:1153.
Kruskal, J. B. 1983. An overview of sequence comparison, pp. 1-40. In D. Sankoff and J. B. Kruskal (eds.), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, London.
Kuhner, M. K. and J. Felsenstein. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459-468.
Kumar, S., K. Tamura and M. Nei. 1993. MEGA: Molecular Evolutionary Genetics Analysis. Version 1.0. Pennsylvania State University, University Park, Pennsylvania.
Küntzel, H. and H. G. Küchel. 1981. Evolution of rRNA and origin of mitochondria. Nature 293:751-755.
Kuro-o, M., C. Ikebe and S. Kohno. 1986. Cytogenetic studies of Hynobiidae (Urodela) IV. DNA replication bands (R-banding) in the genus Hynobius and the banding karyotype of Hynobius nigrescens Stejneger. Cytogenet. Cell Genet. 43:14-18.
Kuro-o, M., C. Ikebe and S. Kohno. 1987. Cytogenetic studies of Hynobiidae (Urodela) VI. R-banding patterns in five pond-type Hynobius from Korea and Japan. Cytogenet. Cell Genet. 44:69-75.

Lacroix, J. C., R. Azzouz, D. Boucher, C. Abbadie, C. K. Pyne and J. Charlemagne. 1985. Monoclonal antibodies to lampbrush chromosome antigens of Pleurodeles waltlii. Chromosoma 92:69-80.
Laird, C. D. 1987. Proposed mechanism of inheritance and expression of human fragile-X syndrome of mental retardation. Genetics 117:587-599.
Laird, C. D., B. L. McConaughy and B. J. McCarthy. 1969. Rate of fixation of nucleotide substitutions in evolution. Nature 224:149-154.
Laird, C. D., E. Jaffe, G. Karpen, M. Lamb and R. Nelson. 1987. Fragile sites in human chromosomes as regions of late-replicating DNA. Trends Genet. 3:274-281.
Lake, J. A. 1987a. Rate-independent technique for analysis of nucleic acid sequences: Evolutionary parsimony. Mol. Biol. Evol. 4:167-191.
Lake, J. A. 1987b. Prokaryotes and archaebacteria are not monophyletic: Rate invariant analysis of rRNA genes indicates that eukaryotes and eocytes form a monophyletic taxon. Cold Spring Harbor Symp. Quant. Biol. 52:839-846.
Lake, J. A. 1988. Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184-186.
Lake, J. A. 1990a. Origin of the Metazoa. Proc. Natl. Acad. Sci. USA 82:763-766.
Lake, J. A. 1990b. Archaebacterial or eocyte tree? Nature 343:418-419.
Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein sequences: Paralinear distances. Proc. Natl. Acad. Sci. USA 91:1455-1459.
Lamarck, J.-B.-P.-A. de M. de. 1809. Philosophie Zoologique, ou Exposition des Considérations Relatives à l'histoire Naturelle des Animaux. Dentu, Paris.
Lamb, T., C. Lydeard, R. B. Walker and J. W. Gibbons. 1994. Molecular systematics of map turtles (Graptemys): A comparison of mitochondrial restriction sites versus sequence data. Syst. Biol. 43:543-559.
Lambert, D. M., C. D. Millar, K. Jack, S. Anderson and J. L. Craig. 1994. Single- and multilocus DNA fingerprinting of communally breeding pukeko: Do copulations or dominance ensure reproductive success? Proc. Natl. Acad. Sci. USA 91:9641-9645.
Lamboy, W. F. 1994. The accuracy of the maximum parsimony method for phylogeny reconstruction with morphological characters. Syst. Bot. 19:489-505.
Lanave, C., G. Preparata, C. Saccone and G. Serio. 1984. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20:86-93.
Landry, B. S., R. Kesseli, H. Leung and R. W. Michelmore. 1987. Comparison of restriction endonucleases and sources of probes for their efficiency in detecting restriction fragment length polymorphisms in lettuce (Lactuca sativa L.). Theor. Appl. Genet. 74:646-653.
Lane, D. J., B. Pace, G. J. Olsen, D. A. Stahl, M. L. Sogin and N. R. Pace. 1985. Rapid determination of 16S ribosomal sequences for phylogenetic analyses. Proc. Natl. Acad. Sci. USA 82:6955-6959.
Langer, P. R., A. A. Waldrop and D. C. Ward. 1981. Enzymatic synthesis of biotin-labeled polynucleotides: Novel nucleic acid affinity probes. Proc. Natl. Acad. Sci. USA 78:6633-6637.
Langley, C. H., E. Montgomery and W. Quattlebaum. 1981. Restriction map variation in the ADH region of Drosophila. Proc. Natl. Acad. Sci. USA 79:5631-5635.
Lansman, R. A., R. O. Shade, J. F. Shapira and J. C. Avise. 1981. The use of restriction endonucleases to measure mitochondrial DNA sequence relatedness in natural populations. J. Mol. Evol. 17:214-226.
Lansman, R. A., J. C. Avise, C. F. Aquadro, J. F. Shapira and S. W. Daniel. 1983. Extensive genetic variation in mitochondrial DNAs among geographic populations of the deer mouse, Peromyscus maniculatus. Evolution 37:1-16.
Lanyon, S. 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34:397-403.
Lanyon, S. 1993. Phylogenetic frameworks: Towards a firmer foundation for the comparative approach. Biol. J. Linnean Soc. 49:45-61.
Lapoint, F.-J. and P. Legendre. 1992. A statistical framework to test the consensus among additive trees (cladograms). Syst. Biol. 41:158-171.
Larson, A. 1989. The relationship between speciation and morphological evolution, pp. 579-598. In D. Otte and J. A. Endler (eds.), Speciation and Its Consequences. Sinauer, Sunderland, Massachusetts.
Larson, A. 1991a. Evolutionary analysis of length variable sequences: Divergent domains of ribosomal RNA. Pp. 221-248. In M. M. Miyamoto and J. Cracraft (eds.), Phylogenetic Analysis of DNA Sequence Data. Oxford University Press, New York.
Larson, A. 1991b. A molecular perspective on the evolutionary relationships of the salamander families. Evol. Biol. 25:211-277.
Larson, A. and W. W. Dimmick. 1993. Phylogenetic relationships of the salamander families: An analysis of congruence among morphological and molecular characters. Herpetol. Monog. 7:77-93.
Larson, A. and A. C. Wilson. 1989. Patterns of ribosomal RNA evolution in salamanders. Mol. Biol. Evol. 6:131-154.
Larson, A., D. B. Wake and K. Yanev. 1984. Measuring gene flow among populations having high levels of genetic fragmentation. Genetics 106:293-308.
Larson, A., M. M. Kirk and D. L. Kirk. 1992. Molecular phylogeny of the volvocine flagellates. Mol. Biol. Evol. 9:85-105.
Laurie-Ahlberg, C. C. and B. S. Weir. 1979. Allozymic variation and linkage disequilibrium in some laboratory populations of Drosophila melanogaster. Genetics 92:1295-1314.
Lavery, S., C. Moritz and D. R. Fielder. 1995. Changing patterns of population structure and gene flow at different spatial scales in the coconut crab (Birgus latro). Heredity 74:531-541.
Lavery, S., C. Moritz and D. R. Fielder. 1996a. The effects of scale on the population structure of the coconut crab (Birgus latro). (unpublished manuscript)
Lavery, S., C. Moritz and D. R. Fielder. 1996b. Genetic patterns suggest exponential population growth in a declining species. (unpublished manuscript)
Lavin, M., J. J. Doyle and J. D. Palmer. 1990. Evolutionary significance of the chloroplast DNA inverted repeat in the Leguminosae subfamily Papilionidae. Evolution 44:390-402.
Lawrence, C. B. 1990. Use of homology domains in sequence similarity detection. Meth. Enzymol. 183:133-145.
Lawrence, J. G., D. E. Dykhuizen, R. F. DuBose and D. L. Hartl. 1989. Phylogenetic analysis using insertion sequence fingerprinting in Escherichia coli. Mol. Biol. Evol. 6:1-24.
Lawyer, F. C., S. Stoffel, R. K. Saiki, K. Myambo, R. Drummond and D. H. Gelfand. 1989. Isolation, characterization, and expression in Escherichia coli of the DNA polymerase gene from Thermus aquaticus. J. Biol. Chem. 264:6427-6437.
Learn, G. W., Jr. and B. A. Schaal. 1987. Population subdivision for ribosomal DNA repeat variants in Clematis fremontii. Evolution 41:433-438.
Leary, J. J., D. J. Brigati and D. C. Ward. 1983. Rapid and sensitive colorimetric method for visualizing biotin-labeled DNA probes hybridized to DNA or RNA immobilized on nitrocellulose: Bioblots. Proc. Natl. Acad. Sci. USA 80:4045-4049.
Leary, R. F., F. W. Allendorf and K. L. Knudsen. 1984. Major morphological effects of a regulatory gene Pgm1-t in rainbow trout. Mol. Biol. Evol. 1:183-194.
Leberg, P. L. 1992. Effects of population bottlenecks on genetic diversity as measured by allozyme electrophoresis. Evolution 46:477-494.
Lebherz, H. G. 1983. On epigenetically generated isozymes ("pseudo isozymes") and their possible biological relevance, pp. 203-218. In M. C. Rattazzi, J. G. Scandalios and G. S. Whitt (eds.), Isozymes: Current Topics in Biological and Medical Research, Vol. 7. Molecular Structure and Regulation. A. R. Liss, New York.
Lechner, K., G. Wich and A. Bock. 1985. The nucleotide sequence of the 16S rRNA gene and flanking regions from Methanobacterium formicicum: The phylogenetic relationship between methanogenic and halophilic Archaebacteria. Syst. Appl. Microbiol. 6:157-163.
Lecointre, G., H. Philippe, H. L. V. Lê and H. L. Guyader. 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phylogenet. Evol. 2:205-224.
Lee, M. R. and F. F. B. Elder. 1980. Yeast stimulation of bone marrow mitosis for cytogenetic investigations. Cytogenet. Cell Genet. 26:36-40.
Lee, S. B. and J. W. Taylor. 1992. Phylogeny of five fungus-like protoctistan Phytophthora species, inferred from the internal transcribed spacers of ribosomal DNA. Mol. Biol. Evol. 9:636-653.
Leffers, H., J. Kjems, L. Ostergaard, N. Larsen and R. A. Garrett. 1987. Evolutionary relationships amongst Archaebacteria. A comparative study of 23S ribosomal RNAs of a sulphur-dependent extreme thermophile, an extreme halophile and a thermophilic methanogen. J. Mol. Biol. 195:43-61.
Leipe, D. D., J. M. Gunderson, T. A. Nerad and M. L. Sogin. 1993. Small subunit RNA+ of Hexamita inflata and the quest for the first branch in the eukaryotic tree. Mol. Biochem. Parasitol. 59:41-48.
Lento, G. M., R. E. Hickson, G. K. Chambers and D. Penny. 1995. Use of spectral analysis to test hypotheses on the origin of pinnipeds. Mol. Biol. Evol. 12:28-52.
Leone, C. A. 1964. Taxonomic Biochemistry and Serology. Ronald Press, New York.
Leone, C. A. 1968. The immunotaxonomic literature: The animal kingdom. Serol. Mus. Bull. 39:1-28.
LeQuesne, W. J. 1982. Compatibility analysis and its applications. Zool. J. Linnean Soc. 74:267-275.
Lessa, E. P. 1990. Multidimensional analysis of geographic genetic structure. Syst. Zool. 39:242-252.
Lessa, E. P. 1992. Rapid surveying of DNA sequence variation in natural populations. Mol. Biol. Evol. 9:323-330.
Lessa, E. P. 1993. Analysis of DNA sequence variation at population level by polymerase chain reaction and denaturing gradient gel electrophoresis. Meth. Enzymol. 224:419-428.
Lessa, E. P. and C. Applebaum. 1993. Screening techniques for detecting allelic variation in DNA sequences. Mol. Ecol. 2:119-129.
Leu, S., J. Schlesinger, A. Michaels and N. Shavit. 1992. Complete DNA sequence of the Chlamydomonas reinhardtii chloroplast atpA gene. Plant Mol. Biol. 18:613-616.
Levan, A., D. Fredga and A. A. Sandberg. 1964. Nomenclature for centromeric position on chromosomes. Hereditas 52:201-220.
Levin, D. A. 1981. Dispersal versus gene flow in plants. Ann. Missouri Bot. Gard. 68:233-253.
Levitan, D. R. and R. K. Grosberg. 1993. The analysis of paternity and maternity in the marine hydrozoan Hydractinia symbiolongicarpus using randomly amplified polymorphic DNA (RAPD) markers. Mol. Ecol. 2:315-328.
Leviton, A. E., R. H. Gibbs, Jr., E. H. Heal and C. E. Dawson. 1985. Standards in herpetology and ichthyology. Part I: Standard symbolic codes for institutional resource collections in herpetology and ichthyology. Copeia 1985:802-832.
Lewin, B. M. 1987. Genes III. Wiley, New York.
Lewin, R. 1988. Conflict over DNA clock results. Science 241:1598-1600.
Lewis, P., J. P. Huelsenbeck and D. L. Swofford. 1996. Maximum likelihood. In D. L. Swofford, PAUP: version 4.0. Sinauer Associates, Sunderland, Massachusetts.
Lewis, P. O. and A. A. Snow. 1992. Deterministic paternity exclusion using RAPD markers. Mol. Ecol. 1:155-160.
Lewontin, R. C. 1974. The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
Lewontin, R. C. 1986. Population genetics. Annu. Rev. Genet. 19:81-102.
Lewontin, R. C. and C. C. Cockerham. 1959. The goodness-of-fit test for detecting natural selection in random mating populations. Evolution 13:561-564.
Lewontin, R. C. and D. L. Hartl. 1991. Population genetics in forensic DNA typing. Science 254:1745-1750.
Lewontin, R. C. and J. Hubby. 1966. A molecular approach to the study of genic heterozygosity in natural populations. II. Amounts of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54:595-609.
Li, C. C. 1988. Pseudo-random mating. In celebration of the 80th anniversary of the Hardy-Weinberg law. Genetics 119:731-737.
Li, W.-H. 1980. Rate of gene silencing at duplicate loci. A theoretical study and interpretation of data from tetraploid fishes. Genetics 95:237-258.
Li, W.-H. 1981. A simple method for constructing phylogenetic trees from distance matrices. Proc. Natl. Acad. Sci. USA 78:1085-1089.
Li, W.-H. 1986. Evolutionary change of restriction cleavage sites and phylogenetic inference. Genetics 113:187-213.
Li, W.-H. 1993a. So, what about the molecular clock hypothesis? Curr. Opin. Genet. Dev. 3:896-901.
Li, W.-H. 1993b. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.
Li, W.-H. and M. Gouy. 1991. Statistical methods for testing phylogenies, pp. 249-277. In M. M. Miyamoto and J. Cracraft (eds.), Phylogenetic Analysis of DNA Sequences. Oxford University Press, New York.
Li, W.-H. and D. Graur. 1991. Fundamentals of Molecular Evolution. Sinauer, Sunderland, Massachusetts.
Li, W.-H. and L. A. Sadler. 1991. Low nucleotide diversity in man. Genetics 129:513-523.
Li, W.-H. and M. Tanimura. 1987. The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93-96.
Li, W.-H. and A. Zharkikh. 1995. Statistical tests of DNA phylogenies. Syst. Biol. 44:49-63.
Li, W.-H., C.-I. Wu and C.-C. Luo. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21:58-71.
Li, W.-H., C.-C. Luo and C.-I. Wu. 1985a. Evolution of DNA sequences, pp. 1-130. In R. MacIntyre (ed.), Molecular Evolutionary Genetics. Plenum, New York.
Li, W.-H., C.-I. Wu and C.-C. Luo. 1985b. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-171.
Li, W.-H., M. Tanimura and P. M. Sharp. 1987a. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J. Mol. Evol. 25:330-342.
Li, W.-H., K. H. Wolfe, J. Sourdis and P. M. Sharp. 1987b. Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harbor Symp. Quant. Biol. 52:847-856.
Libby, R. L. 1938. The photronreflectometer - an instrument for the measurement of turbid systems. J. Immunol. 34:71-73.
Lichter, P. and D. C. Ward. 1990. Is non-isotopic in-situ hybridization finally coming of age? Nature 345:93-94.
Lin, C. C., G. Shipmann, W. A. Kittrell and S. Ohno. 1969. The predominance of heterozygotes found in wild goldfish of Lake Erie at the gene locus for sorbitol dehydrogenase. Biochem. Genet. 3:603-607.
Lindahl, T. 1993. Instability and decay of the primary structure of DNA. Nature 362:709-715.
Linnaeus, C. 1758. Systema Naturae. 10th ed. Stockholm.
Lint, D., J. Clayton, L. Postma and R. Lillie. 1988. Evolution of cetaceans. A serum albumin immunological and biochemical perspective. (Abst 1133.21.36). XVI International Congress of Genetics, Toronto.
Lipman, D. J. and W. R. Pearson. 1985. Rapid and sensitive protein similarity searches. Science 227:1435-1441.
Lipman, D. J., W. J. Wilbur, T. F. Smith and M. S. Waterman. 1984. On the statistical significance of nucleic acid similarities. Nucl. Acids Res. 12:215-226.
Liston, A. 1992. Variation in the chloroplast genes rpoC1 and rpoC2 of the genus Astragalus (Fabaceae); evidence from restriction site mapping of a PCR amplified product. Am. J. Bot. 79:953-961.
Litt, M. and J. A. Luty. 1989. A hypervariable microsatellite revealed by in vitro amplification of dinucleotide repeat within the cardiac muscle actin gene. Am. J. Human Genet. 44:397-401.
Liu, Z.-G. and G. R. Furnier. 1993. Comparison of allozyme, RFLP, and RAPD markers for revealing genetic variation within and between trembling aspen and bigtooth aspen. Theor. Appl. Genet. 87:97-105.
Liu, Z.-G. and L. M. Schwartz. 1992. An efficient method for blunt-end ligation of PCR products. BioTechniques 12:28-30.
Lockhart, P. J., M. A. Steel, M. D. Hendy and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.
Lockhart, P. J., A. W. Larkum, M. A. Steel, P. J. Waddell and D. Penny. 1995a. Evolution of chlorophyll and bacteriochlorophyll: The problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA (in press).
Lockhart, P. J., D. Penny and A. Meyer. 1995b. Testing the phylogeny of swordtail fishes using split decomposition and spectral analysis. J. Mol. Evol. 41:666-674.
Loeb, L. A. and B. D. Preston. 1986. Mutagenesis by apurinic/apyrimidinic sites. Annu. Rev. Genet. 20:201-230.
Loh, E. Y., J. F. Elliott, S. Cwirla, L. L. Lanier and M. M. Davis. 1989. Polymerase chain reaction with single-stranded specificity: Analysis of T cell receptor a chain. Science 243:217-220.
Long, E. H. and I. B. Dawid. 1980. Repeated genes in eukaryotes. Annu. Rev. Biochem. 49:727-764.
Loomis, W. F. 1988. Four Billion Years: An Essay on the Evolution of Genes and Organisms. Sinauer, Sunderland, Massachusetts.
Losos, J. 1994. An approach to the analysis of comparative data when a phylogeny is unavailable or incomplete. Syst. Biol. 43:117-123.
Lopez, J. V., N. Yuhki, R. Masuda, W. Modi and S. J. O'Brien. 1994. Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat. J. Mol. Evol. 39:174-191.
Lowenstein, J. M. 1985a. Molecular approaches to the identification of species. Amer. Sci. 73:541-547.
Lowenstein, J. M. 1985b. Radioimmune assay of mammoth tissue. Acta Zool. Fennica 170:233-235.
Lowenstein, J. M. and O. A. Ryder. 1985. Immunological systematics of the extinct quagga (Equidae). Experientia 41:1192-1193.
Lowenstein, J. M., V. M. Sarich and B. J. Richardson. 1981. Albumin systematics of the extinct mammoth and Tasmanian wolf. Nature 291:409-411.
Luke, S. and R. S. Verma. 1993. The genomic synteny at DNA level between human and chimpanzee chromosomes. Chromosome Res. 1:215-219.
Lumb, W. V. and E. W. Jones. 1984. Veterinary Anesthesia. 2nd ed. Lea and Febiger, Philadelphia.
Lundberg, J. G. 1972. Wagner networks and ancestors. Syst. Zool. 21:398-413.
Lundrigan, B. L. and P. K. Tucker. 1994. Tracing paternal ancestry in mice, using the Y-linked, sex-determining locus, Sry. Mol. Biol. Evol. 11:483-492.
Lynch, M. 1988. Estimation of relatedness by DNA fingerprinting. Mol. Biol. Evol. 5:584-599.
Lynch, M. 1990. The similarity index and DNA fingerprinting. Mol. Biol. Evol. 7:478-484.
Lynch, M. 1991a. Analysis of population genetic structure by DNA fingerprinting, pp. 113-126. In T. Burke, G. Dolf, A. J. Jeffreys and R. Wolff (eds.), DNA Fingerprinting: Approaches and Applications. Birkhauser, Boston.
Lynch, M. 1991b. Methods for the analysis of comparative data in evolutionary biology. Evolution 45:1065-1080.
Lynch, M. and T. J. Crease. 2990. The analysis of popu- Maddison, D. R. 1990. Phylogenetic inference of hls-
lation survey data on DNA sequence variation. torical pathways and models of evolutionary
Mol. Biol. Evol. 7:377-394. change. Ph.D. dissertation, Llarvard University.
Lynch, M. and P. E.Jarrell. 1993. A method for calibrat- Maddison, D. R., M. Ruvolo and D. L. Swofford. 1992.
ing molecular clocks and its application to animal Geographic origins of human mitochondria1
mitochondria1 DNA. Genetics 135:1197-1208. DNA: Phylogenetic evidence from control region
Lynch, M. and B. G. Milligan. 1994. Analysis of popu- sequences. Syst. Biol. 41:lll-124.
lation genetic structure with RAPD markers. Mol. Maddison, W. P. 1989. Reconstructing character evolu-
Ecol. 3:91-100. tion on polyto~nousrladograms. Cladistics
Mabee, I? M. 1989.Assumptions underlying the use of 5:365-377.
ontogenetic sequences for determining character- Maddison, W. 1990. A method for testing the corre-
state order. Trans. Am. Fish. Soc. 118:151-158. lated evolution of two binary characters: Are
Mabee, P. M. 2993. Phylogenetic interpretation of gains or Ioses concentrated on certain branches of
ontogenetic change: Sorting out the actual and a phylogenekic tree? Evolution 44:539-557.
artefactual in an empirical case study of centrar- Maddison, W. P. 1991. Squared-change parsimony
chid fishes. Zool. J. Linnean Soc. 107:175-291. reconstructions of ancestral states for continuous
Mabee, P. M, and J. Humphries. 1993. Coding poly- valued characters on a phylogenetic tree. Syst.
morphic data: Examples from allozymes and Zool.40:304-314.
ontogeny. Syst. Biol. 42166-181. Maddison, W. P. and D. R. Maddison. 1992. MacClade,
MacArthur, R, H. and E. 0.Wilson. 1963. An equilibri- version 3.0. Sinauer, Sunderland, Massachusetts.
um theory of insular zoogeography. Evolution Maddison, W. P., M. J. Donoghue and D. R. Maddison.
17:373-387. 1984. Outgroup analysis and parsimony. Syst.
MacArthur, R. H. and E. 0.Wilson. 1967. The Theoy of Zool.33:83-103.
Island Biogeography. Princeton University Press, Maeda, N., C. -I. Wu, J. Bliska and J. Reneke. 1988.
Princeton. Molecular evolution of intergenic DNA in higher
Macgregor, H. C. 1993. An Introduction to Animal pirmates: Pattern of DNA changes, molecular
Cytogenetjcs. Chapman and Hall, London. clock, and evolution of repetitive sequences. Mol.
Macgregor, H. C. and S. K. Sessions. 1986. The biologi- Biol. Evol. 5:l-20.
cal significance of variation in satellite DNA and Mailer, R. J., R. Scarth and B. Fritensky. 1994.
heterochromatin in newts of the genus Triturus: Discrimination among cultivars rapeseed
An evolutionary perspective. Phil. Trans. Roy. (Brassica napus L.)using DNA polymorphisms
Soc. London B312:243-259. amplified from arbitrary primers. Theor. Appl.
Macgregor, H. C. and S. Sherwood. 1979. The nucleo- Genet. 87:697-704.
lus organizers of Plethodon and Aneides located by Maiste, P. J. 1993. Comparison of statistical tests for
in situ nucleic acid hybridization with Xenopus independence at genetic loci with many alleles.
3H-ribosomalW A . Chromosoma 72271-250. Ph.D. dissertation, North Carolina State
Macgregor H. C. and J. Varley. 1983. Working with University, Raleigh.
Animal Chromosomes. John Wiley and Sons, New Malcolm, S., J. K. Cowell and B. D. Young. 1986.
York. Specialist techniques in research and diagnostic
Macgregor, H. C., S. K. Sessions and J. W. Arntzen. clinical cytogenetics, pp. 197-226. In D. E. Rooney
1990. An integrative analysis of phylogenetic rela- and B. H. Czepulkowski (eds.), Human
tionships among newts of the genus Triturus Cytogenetics. IRL Press, Oxford.
(family Salamandridae), using comparative bio- Maldonado, I. E. 1992. Problems in the identification
chemistry, cytogenetics, and reproductive interac- of XDH in vertebrates. Isozyme Bull. 25:72.
tions. J. Evol. Biol. 3:329-373. Manchenko, G. P. 1988. Subunit structure of enzymes:
MacIntyre, R. J. 1976. Evolution and ecological value Allozymic data. Isozyme Bull. 21:144-158.
of duplicate genes. Annu. Rev. Ecol. Syst. Manchenko, G. P. 1994. Handbook of Detection of
7:421-468. Enzymes on Electrophoretic Gels. C.R.C. Press, Ann
MacIntyre, R. J. (ed.) 1985. Molecular Evolutionary Arbor.
Genetics. Plenum, New York. Mancino, G., M. Ragghianti and S. Bucci-Innocenti.
MacIntyre, R. J., M. R. Dean and G. Batt. 1978. 1977. Cytotaxonomy and cytogenetics in
Evolution of acid phosphatase-1 in the genus European newt species, pp. 411-447. In D. H.
Drosophila. Immunological studies. J. Mol. Evol. Taylor and S. I. Guttman (eds.), The Reproductive
12:121-142. Biology of Amphibians. Plenum, New York.
Maniatis, T., E. F. Fritsch and J. Sambrook. 1982. Marsden, J. E. and B. May. 1984. Feather pulp: A non-
Molecular Cloning: A Laboratory Manual. Cold destructive sampling technique for electrophoret-
Spring Harbor Publications, Cold Spring Harbor, ic studies of birds. Auk 101:173-175.
New York. Marsh, T. L., C. I. Reich, R. B. Whitelock and G. J.
Manly, B. F. J. 1991. Randomization and Monte Carlo Olsen. 1994. Transcription factor IID in the
Methods in Biology. Chapman and Hall, New York. Archaea: Sequences in the Thermococcus celer
Mann, C. 1990. Meta-analysis in the breech. Science genome would encode a product closely related
249:476-480. to the TATA-binding protein of eukaryotes. Proc.
Manos, P. S., K. C. Nixon and J. J. Doyle. 1993. Natl. Acad. Sci. USA 91:4180-4184.
Cladistic analysis of restriction site variation Marshall, C. R. 1990. The fossil record and estimating
within the chloroplast DNA inverted repeat divergence times between lineages: Maximum
region of selected Hamamelididae. Syst. Bot. divergence times and the importance of reliable
18:551-562. phylogenies. J. Mol. Evol. 30:400-408.
Manuelidis, L., P. R. Langer-Safer and D. C. Ward. Marshall, C. R. 1992. Character analysis and the inte-
1982. High-resolution mapping of satellite DNA gration of molecular and morphological data in
using biotin-labeled DNA probes. J. Cell Biol. an understanding of sand dollar phylogeny. Mol.
95:619-625. Biol. Evol. 9:309-322.
Mao, S.-H. and B.-Y. Chen. 1982. Serological relation- Marshall, C. R. and H. Swift. 1992. DNA-DNA
ships of turtles and evolutionary implications. hybridization phylogeny of sand dollars and
Comp. Biochem. Physiol. 71B:173-179. highly reproducible extent of hybridization val-
Mao, S.-H., B.-Y. Chen, F.-Y. Yin and Y.-W. Guo. 1983. ues. J. Mol. Evol. 34:31-44.
Immunotaxonomic relationships of sea snakes to Martin, A. P. and S. R. Palumbi. 1993a. Protein evolu-
terrestrial snakes. Comp. Biochem. Physiol. tion in different cellular environments:
74A:869-872. Cytochrome b in sharks and mammals. Mol. Biol.
Mao, S.-H., W. Frair, F.-Y. Yin and Y.-W. Guo. 1987. Evol. 10:873-891.
Relationships of some Cryptodiran turtles as sug- Martin, A. P. and S. R. Palumbi. 1993b. Body size,
gested by immunological cross-reactivity of metabolic rate, generation time and the molecular
serum albumins. Biochem. Syst. Ecol. 15:621-624. clock. Proc. Natl. Acad. Sci. USA 90:4087-4091.
Marchant, A. D., M. L. Arnold and P. Wilkinson. 1988. Martin, A. P., R. Humphreys and S. R. Palumbi. 1992a.
Gene flow across a chromosomal tension zone. I. Population genetic structure of the armorhead,
Relicts of ancient hybridization. Heredity Pseudopentaceros wheeleri, in the North Pacific
61:321-328. ocean: Application of the polymerase chain reac-
Marchuk, D., M. Drumm, A. Saulino and F. S. Collins. ocean: Application of the polymerase chain reac-
1991. Construction of T-vectors, a rapid and gen- 49:2386-2391.
eral system for direct cloning of unmodified PCR Martin, A. P., G. J. P. Naylor and S. R. Palumbi. 1992b.
products. Nucl. Acids Res. 19:1154. Rates of mitochondrial DNA evolution in sharks
Markert, C. L. 1983. Isozymes: Conceptual history and are slow compared to mammals. Nature
biological significance, pp. 1-17. In M. C. Rattazzi, 357:153-155.
J. G. Scandalios and G. S. Whitt (eds.), Isozymes: Martins, E. P. and T. Garland, Jr. 1991. Phylogenetic
Current Topics in Biological and Medical Research, analyses of the correlated evolution of continuous
Vol. 7. Molecular Structure and Regulation. A. R. characters: A simulation study. Evolution
Liss, New York. 45:534-557.
Markert, C. L. and F. Møller. 1959. Multiple forms of Martinson, H. G. 1973. The nucleic acid-hydroxyap-
enzymes: Tissue, ontogenetic, and species-specific atite interaction. II. Phase transitions in the
patterns. Proc. Natl. Acad. Sci. USA 45:753-763. deoxyribonucleic acid-hydroxyapatite system.
Markert, C. L., J. B. Shaklee and G. S. Whitt. 1975. Biochemistry 12:145-150.
Evolution of a gene. Science 189:102-114. Mason, I. J. 1992. Rapid and direct sequencing of DNA
Marklund, S., H. Ellegren, S. Eriksson, K. Sandberg from bacteriophage plaques using sequential lin-
and L. Andersson. 1994. Parentage testing and ear and asymmetric PCR. BioTechniques 12:60-61.
linkage analysis in the horse using a set of highly Massaro, E. J. and C. L. Markert. 1968. Protein staining
polymorphic microsatellites. Anim. Genet. on starch gels. J. Histochem. Cytochem.
25:19-23. 16:380-382.
Markowitz, E. 1970. Estimation and testing goodness- Matson, R. H. 1984. Applications of electrophoretic
of-fit for some models of codon fixation variabili- data in avian systematics. Auk 101:717-729.
ty. Biochem. Genet. 4:595-601.
Matson, R. H. 1989. Avian peptidase isozymes: Tissue Mayden, R. L. (ed.). 1992. Systematics, Historical
distributions, substrate affinities, and assignment Ecology, and North American Freshwater Fishes.
of homology. Biochem. Genet. 27:137-151. Stanford University Press, Stanford, California.
Maurer, R. R. 1978. Freezing mammalian embryos: A Mayr, E. 1942. Systematics and the Origin of Species.
review of techniques. Theriogenology 9:45-68. Reprinted 1982, Columbia University Press, New
Maxam, A. M. and W. Gilbert. 1977. A new method for York.
sequencing DNA. Proc. Natl. Acad. Sci. USA Mayr, E. 1983. The Growth of Biological Thought:
74:560-564. Diversity, Evolution, and Inheritance. Harvard
Maxam, A. M. and W. Gilbert. 1980. Sequencing end- University Press, Cambridge, Massachusetts.
labeled DNA with base-specific chemical cleav- Mazur, P. 1970. Cryobiology: The freezing of biological
ages. Meth. Enzymol. 65:499-559. systems. Science 168:939-949.
Maxson, L. R. 1981. Albumin evolution and its phylo- McBee, K., R. J. Baker and R. L. Honeycutt. 1987.
genetic implications in toads of the genus Bufo. II. Observations on rates of DNA degradation. Abstr.
Relationships among Eurasian Bufo. Copeia 87, Ann. Meet., Amer. Soc. Mammalogists,
1981:579-583. Albuquerque, New Mexico.
Maxson, L. R. 1984. Molecular probes of phylogeny McClenaghan, L. R., Jr., M. H. Smith and M. W. Smith.
and biogeography in toads of the widespread 1985. Biochemical genetics of mosquitofish. IV.
genus Bufo. Mol. Biol. Evol. 1:345-356. Changes of allele frequencies through time and
Maxson, L. R. and C. H. Daugherty. 1980. space. Evolution 39:451-460.
Evolutionary relationships of the monotypic toad McCouch, S. R., G. Kochert, Z. H. Yu, Z. Y. Wang, G. S.
family Rhinophrynidae: A biochemical perspec- Khush, W. R. Coffman and S. D. Tanksley. 1988.
tive. Herpetologica 36:275-280. Molecular mapping of rice chromosomes. Theor.
Maxson, L. R. and R. D. Maxson. 1990. Proteins II: Appl. Genet. 76:815-829.
Immunological techniques, pp. 127-155. In D. M. McCracken, G. F. and J. W. Bradbury. 1977. Paternity
Hillis and C. Moritz (eds.), Molecular Systematics. and genetic heterogeneity in the polygynous bat,
Sinauer, Sunderland, Massachusetts. Phyllostomus hastatus. Science 198:303-306.
Maxson, L. R. and J. D. Roberts. 1985. An immunologi- McDonald, H. S. 1976. Methods for the physiological
cal analysis of the phylogenetic relationships study of reptiles, pp. 19-126. In C. Gans and W. R.
between two enigmatic frogs, Myobatrachus and Dawson (eds.), Biology of the Reptilia, Vol. 5.
Arenophryne. J. Zool. (London) 207:289-300. Academic Press, New York.
Maxson, L. R. and J. M. Szymura. 1984. Relationships McDonald, J. H. 1989. Selection component analysis of
among discoglossid frogs: An albumin perspec- the Mpi locus in the amphipod Platorchestia platen-
tive. Amphibia-Reptilia 5:245-252. sis. Heredity 62:243-249.
Maxson, L. R. and A. C. Wilson. 1975. Albumin evolu- McDonald, J. H. and M. Kreitman. 1991. Adaptive pro-
tion and organismal evolution in tree frogs tein evolution at the Adh locus in Drosophila.
(Hylidae). Syst. Zool. 24:1-15. Nature 351:652-654.
Maxson, L. R., R. Highton and D. B. Wake. 1979. McDonald, J. H. and J. F. Siebenaller. 1989. Similar geo-
Albumin evolution and its phylogenetic implica- graphic variation at the LAP locus in the mussels
tions in the plethodontid salamander genera Mytilus trossulus and M. edulis. Evolution
Plethodon and Ensatina. Copeia 1979:502-508. 43:228-231.
Maxson, L. R., L. S. Ellis and A.-R. Song. 1981. McDonell, M. W., M. N. Simon and F. W. Studier. 1977.
Quantitative immunological studies of the albu- Analysis of restriction fragments of T7 DNA and
mins of North American squirrels, family determination of molecular weights by elec-
Sciuridae. Comp. Biochem. Physiol. 68B:397-400. trophoresis in neutral and alkaline gels. J. Mol.
Maxson, R. D. and L. R. Maxson. 1986. Micro-comple- Biol. 110:119-146.
ment fixation: A quantitative estimator of protein McGovern, M. and C. R. Tracy. 1981. Phenotypic varia-
evolution. Mol. Biol. Evol. 3:375-388. tion in electromorphs previously considered to be
May, C. A., J. H. Wetton, P. E. Davis, J. F. Brookfield genetic markers in Microtus ochrogaster. Oecologia
and D. T. Parkin. 1993. Single-locus profiling 51:276-280.
reveals loss of variation in inbred populations of McInnes, J. L., P. D. Vize, N. Habili and R. H. Symons.
the red kite (Milvus milvus). Proc. Roy. Soc. Lond. 1987. Chemical biotinylation of nucleic acids with
B 251:165-170. photobiotin and their use as hybridization probes.
Mayden, R. L. 1986. Speciose and depauperate phy- Focus (BRL) 9:l-4.
lads and tests of punctuated and gradual evolu-
tion: Fact or artifact? Syst. Zool. 35:591-602.
McKusick, V. A. 1988. The Morbid Anatomy of the Michelmore, R. W., I. Paran and R. V. Kesseli. 1991.
Human Genome. Howard Hughes Medical Identification of markers linked to disease-resis-
Institute. tance genes by bulked segregant analysis: A rapid
McLellan, T. 1984. Molecular charge and elec- method to detect markers in specific genomic
trophoretic mobility in cetacean myoglobins of regions by using segregating populations. Proc.
known sequence. Biochem. Genet. 22:181-200. Natl. Acad. Sci. USA 88:9828-9832.
McLellan, T. and L. S. Inouye. 1986. The sensitivity of Mickevich, M. F. and M. S. Johnson. 1976. Congruence
isoelectric focusing and electrophoresis in the between morphological and allozyme data in
detection of sequence differences in proteins. evolutionary inference and character evolution.
Biochem. Genet. 24:571-577. Syst. Zool. 25:260-270.
McLennan, D. A. 1991. Integrating phylogeny and Milinkovitch, M. C. 1995. Molecular phylogeny of
experimental ethology: From pattern to process. cetaceans prompts revision of morphological
Evolution 45:1773-1789. transformations. Trends Ecol. Evol. 10:328-334.
McLennan, D. A., D. R. Brooks and J. D. McPhail. Miller, A. J. 1990. Subset Selection in Regression.
1988. The benefits of communication between Chapman and Hall, London.
comparative ethology and phylogenetic systemat- Miller, H. 1987. Practical aspects of preparing phage
ics: A case study using gasterosteid fishes. Can. J. and plasmid DNA: Growth, maintenance, and
Zool. 66:2177-2190. storage of bacteria and bacteriophage. Meth.
McPheron, B. A., D. C. Smith and S. H. Berlocher. Enzymol. 152:145-170.
1988. Genetic differences between host races of Miller, J. C. and S. D. Tanksley. 1990a. Effect of differ-
Rhagoletis pomonella. Nature 336:64-66. ent restriction enzymes, probe source, and probe
McWright, C. G., J. J. Kearney and J. L. Mudd. 1975. length on detecting restriction fragment length
Effect of environmental factors on starch gel elec- polymorphism in tomato. Theor. Appl. Genet.
trophoretic patterns of human erythrocyte acid 80:385-389.
phosphatase, pp. 151-161. In G. Davis (ed.), Miller, J. C. and S. D. Tanksley. 1990b. RFLP analysis of
Forensic Science. Amer. Chem. Soc. Symp. Ser. 13, phylogenetic relationships and genetic variation
ACS. Washington, D. C. in the genus Lycopersicon. Theor. Appl. Genet.
Meagher, S. and T. E. Dowling. 1991. Hybridization 80:437-448.
between the cyprinid fishes Luxilus albeolus, L. cor- Miller, R. G. 1974. The jackknife: A review. Biometrika
nutus, and L. cerasinus with comments on the pro- 61:1-15.
posed hybrid origin of L. albeolus. Copeia Milligan, B. G. 1992. Is organelle DNA strictly mater-
1991:979-991. nally inherited? Power analysis of a binomial dis-
Melchior, W. B. and P. H. Von Hippel. 1973. Alteration tribution. Am. J. Bot. 79:1325-1328.
of the relative stability of dA-dT and dG-dC base Milligan, B. G. and C. K. McMurray. 1993. Dominant
pairs in DNA. Proc. Natl. Acad. Sci. USA versus codominant markers in the estimation of
70:298-302. male mating success. Mol. Ecol. 2:275-284.
Mellor, J. D. 1978. Fundamentals of Freeze-Drying. Mindell, D. P. and R. L. Honeycutt. 1990. Ribosomal
Academic Press, New York. RNA in vertebrates: Evolution and phylogenetic
Menken, S. B. J. 1987. Is the extremely low heterozy- implications. Annu. Rev. Ecol. Syst. 21:541-566.
gosity level in Yponomeuta rorellus caused by bot- Mindell, D. P., J. W. Sites, Jr. and D. Graur. 1989.
tlenecks? Evolution 41:630-637. Speciational evolution: A phylogenetic test with
Merritt, R. B., J. F. Rogers and B. J. Kurz. 1978. Genetic allozymes in Sceloporus (Reptilia). Cladistics
variability in the longnose dace, Rhinichthys 5:49-61.
cataractae. Evolution 32:116-124. Mindell, D. P., J. W. Sites, Jr. and D. Graur. 1990.
Meyer, A. and A. C. Wilson. 1990. Origin of tetrapods Assessing the relationship between speciation
inferred from their mitochondrial DNA affiliation and evolutionary change. Cladistics 6:393-398.
to lungfish. J. Mol. Evol. 31:359-364. Mindell, D. P., C. W. Dick and R. J. Baker. 1991.
Meyerowitz, E. M. and C. H. Martin. 1984. Adjacent Phylogenetic relationships among megabats,
chromosomal regions can evolve at very different microbats, and primates. Proc. Natl. Acad. Sci.
rates: Evolution of the Drosophila 68C glue gene USA 88:10322-10326.
region. J. Mol. Evol. 20:251-264. Minton, S. A. and S. K. Salanitro. 1972. Serological
Micales, J. A., M. R. Bonde and G. L. Peterson. 1986. relationships among some colubrid snakes.
The use of isozyme analysis in fungal taxonomy Copeia 1972:246-252.
and genetics. Mycotaxon 27:405-449. Mitchell, L. G. and C. R. Merril. 1989. Affinity genera-
tion of single-stranded DNA for dideoxy sequenc- PCR primer pairs in closely related species.
ing following the polymerase chain reaction. Genomics 10:654-660.
Analyt. Biochem. 178:239-242. Moore, W. S. 1995. Inferring phylogenies from mtDNA
Miyamoto, M. M. 1981. Congruence among character variation: Mitochondrial-gene trees versus
sets in phylogenetic studies of the frog genus nuclear-gene trees. Evolution 49:718-726.
Leptodactylus. Syst. Zool. 30:281-290. Moran, P. and I. Kornfield. 1993. Retention of ancestral
Miyamoto, M. M. 1983. Biochemical variation in polymorphism in the mbuna species flock
Eleutherodactylus bransfordii: Geographic patterns (Teleostei: Cichlidae) of Lake Malawi. Mol. Biol.
and cryptic species. Syst. Zool. 32:43-51. Evol. 10:1015-1029.
Miyamoto, M. M. 1985. Consensus cladograms and Morden, C. W. and S. S. Golden. 1989. psbA genes indi-
general classifications. Cladistics 1:186-189. cate common ancestry of prochlorophytes and
Miyamoto, M. M. and S. M. Boyle. 1989. The potential chloroplasts. Nature 337:382-385.
importance of mitochondrial DNA sequence data Morescalchi, A. 1973. Amphibia, pp. 233-348. In A. B.
to eutherian mammal phylogeny, pp. 437-450. In Chiarelli and E. Capanna (eds.), Cytotaxonomy and
B. Fernholm, K. Bremer and H. Jornvall (eds.), The Vertebrate Evolution. Academic Press, New York.
Hierarchy of Life. Elsevier, Amsterdam. Morescalchi, A. 1975. Chromosome evolution in the
Miyamoto, M. M. and J. Cracraft. 1991. Phylogenetic caudate amphibia. Evol. Biol. 8:339-387.
inference, DNA sequence analysis, and the future Morgan, K. and C. Strobeck. 1979. Is intragenic recom-
of molecular systematics, pp. 3-17. In M. M. bination a factor in the maintenance of genetic
Miyamoto and J. Cracraft (eds.), Phylogenetic variation in natural populations? Nature
Analysis of DNA Sequences. Oxford University 277:383-384.
Press, New York. Morgante, M. and A. M. Oliveri. 1993. PCR-amplified
Miyamoto, M. M. and W. M. Fitch. 1995. Testing microsatellites as markers in plant genetics. Plant
species phylogenies and phylogenetic methods J. 3:175-182.
with congruence. Syst. Biol. 44:64-76. Morin, P. A., J. J. Moore and D. S. Woodruff. 1992.
Miyamoto, M. M., J. L. Slightom and M. Goodman. Identification of chimpanzee subspecies with
1987. Phylogenetic relationships of humans and DNA from hair and allele-specific probes. Proc.
African apes as ascertained from DNA sequences Roy. Soc. London B 249:293-297.
(7.1 kilobase pairs) of the ψη-globin region. Morin, P. A., J. J. Moore, R. Chakraborty, J. Li, J.
Science 238:369-373. Goodall and D. S. Woodruff. 1994. Kin selection,
Miyamoto, M. M., F. Kraus and O. A. Ryder. 1990. social structure, gene flow, and the evolution of
Phylogeny and evolution of antlered deer deter- chimpanzees. Science 265:1193-1201.
mined from mitochondrial DNA sequences. Proc. Moritz, C. 1983. Parthenogenesis in the endemic
Natl. Acad. Sci. USA 87:6127-6131. Australian lizard Heteronotia binoei (Gekkonidae).
Miyamoto, M. M., M. W. Allard, R. M. Adkins, L. L. Science 220:735-737.
Janecek and R. L. Honeycutt. 1994. A congruence Moritz, C. 1987. Parthenogenesis in the tropical
test of reliability using linked mitochondrial DNA gekkonid lizard, Nactus arnouxii (Sauria:
sequences. Syst. Biol. 43:236-249. Gekkonidae). Evolution 41:1252-1266.
Mizusawa, S., S. Nishimura and F. Seela. 1986. Moritz, C. 1991a. Evolutionary dynamics of mitochon-
Improvement of the dideoxy chain termination drial DNA duplications in parthenogenetic geck-
method of DNA sequencing by use of deoxy-7- os, Heteronotia binoei. Genetics 129:221-230.
deazaguanosine triphosphate in place of dGTP. Moritz, C. 1991b. The origin and evolution of
Nucl. Acids Res. 14:1319-1324. parthenogenesis in Heteronotia binoei (Gek-
Moore, D. W. and T. L. Yates. 1983. Rate of protein konidae): Evidence for recent and localized ori-
inactivation in selected animals following death. gins of widespread clones. Genetics 129:211-219.
J. Wildl. Manag. 47:1166-1169. Moritz, C. 1994. Applications of mitochondrial DNA
Moore, G. W. 1976. Proof for the maximum parsimony analysis in conservation: A critical review. Mol.
("Red King") algorithm, pp. 117-137. In M. Ecol. 3:401-411.
Goodman and R. E. Tashian (eds.), Molecular Moritz, C. 1995. Uses of molecular phylogenies for con-
Anthropology. Plenum, New York. servation. Phil. Trans. Roy. Soc. London (in press).
Moore, S. S., L. L. Sargeant, T. J. King, J. S. Mattick, M. Moritz, C. and W. M. Brown. 1986. Tandem duplica-
Georges and D. J. S. Hetzel. 1991. The conserva- tion of D-loop and ribosomal RNA sequences in
tion of dinucleotide microsatellites among mam- lizard mitochondrial DNA. Science
malian genomes allows the use of heterologous 233:1425-1427.
Moritz, C. and W. M. Brown. 1987. Tandem duplica- 207-234. In Z.-I. Ogita and C. L. Markert (eds.),
tions in animal mitochondrial DNAs: Variation in Isozymes: Structure, Function, and Use in Biology
incidence and gene content among lizards. Proc. and Medicine. Wiley-Liss, New York.
Natl. Acad. Sci. USA 84:7183-7187. Morizot, D. C. and M. E. Schmidt. 1990. Starch gel
Moritz, C. and A. Heideman. 1993. The origin and electrophoresis and histochemical visualization of
evolution of parthenogenesis in Heteronotia binoei proteins, pp. 23-80. In D. H. Whitmore (ed.),
(Gekkonidae): Reciprocal origins and diverse Electrophoretic and Isoelectric Focusing Techniques in
mitochondrial DNA in western populations. Syst. Fisheries Management. CRC Press, Boca Raton, FL.
Biol. 42:293-306. Morizot, D. C. and M. J. Siciliano. 1982. Linkage of
Moritz, C., T. E. Dowling and W. M. Brown. 1987. two enzyme loci in fishes of the genus
Evolution of animal mitochondrial DNA: Xiphophorus (Poeciliidae). J. Hered. 73:163-167.
Relevance for population biology and systemat- Morizot, D. C. and M. J. Siciliano. 1984. Gene mapping
ics. Annu. Rev. Ecol. Syst. 18:269-292. in fishes and other vertebrates, pp. 173-234. In B.
Moritz, C., M. Adams, S. Donnellan and P. Baverstock. J. Turner (ed.), Evolutionary Genetics of Fishes.
1989a. The origins and evolution of parthenogen- Plenum, New York.
esis in Heteronotia binoei: Genetic diversity among Morizot, D. C., J. A. Greenspan and M. J. Siciliano.
bisexual populations. Copeia 1990:333-348. 1983. Linkage group VI of fishes of the genus
Moritz, C., W. M. Brown, L. D. Densmore, J. W. Xiphophorus (Poeciliidae): Assignment of genes
Wright, D. Vyas, S. Donnellan, M. Adams and P. coding for glutamine synthetase, uridine
Baverstock. 1989b. Genetic diversity and the monophosphate kinase, and transferrin. Biochem.
dynamics of hybrid parthenogenesis in Genet. 21:1042-1049.
Cnemidophorus (Teiidae) and Heteronotia Moss, D. W. 1982. Isoenzymes. Chapman and Hall,
(Gekkonidae), pp. 87-112. In R. M. Dawley and J. New York.
P. Bogart (eds.), The Biology of Unisexual Motro, U. and G. Thomson. 1982. On heterozygosity
Vertebrates. New York State Museum, Albany. and the effective size of populations subject to
Moritz, C., S. Donnellan, M. Adams and P. R. size changes. Evolution 36:1059-1066.
Baverstock. 1989c. The origin and evolution of Mubumbila, M. V., O. Carelse and J. Kempf. 1993.
parthenogenesis in Heteronotia binoei Isolation by asymmetric polymerase chain reac-
(Gekkonidae): Extensive genotypic diversity tion and partial sequencing of the common bean
among parthenogens. Evolution 43:994-1003. chloroplast trnL (UAA) gene and pseudogene.
Moritz, C., C. J. Schneider and D. B. Wake. 1992a. Phytochem. Anal. 4:145
Evolutionary relationships within the Ensatina Mueller, L. D. and F. J. Ayala. 1982. Estimation and
eschscholtzii complex confirm the ring species interpretation of genetic distance in empirical
interpretation. Syst. Biol. 41:273-291. studies. Genet. Res. 40:127-137.
Moritz, C., T. Uzzell, C. Spolsky, H. Hotz, I. Darevsky, Mulley, J. C. and B. D. H. Latter. 1980. Genetic varia-
L. Kupriyanova and F. Danielyan. 1992b. The tion and evolutionary relationships within a
maternal ancestry and approximate age of group of thirteen species of penaeid prawns.
parthenogenetic species of Caucasian rock lizards Evolution 34:904-916.
(Lacerta: Lacertidae). Genetica 87:53-62. Mullis, K. B. and F. A. Faloona. 1987. Specific synthesis
Moritz, C., L. Joseph and M. Adams. 1993. Cryptic of DNA in vitro via a polymerase catalyzed chain
diversity in an endemic rainforest skink reaction. Meth. Enzymol. 155:335-350.
(Gnypetoscincus queenslandiae). Biodiv. Conserv. Mummenhoff, K. and M. Koch. 1994. Chloroplast
2:412-425. DNA restriction site variation and phylogenetic
Moritz, C., A. Heideman, N. N. FitzSimmons, A. relationships in the genus Thlaspi sensu lato
Hugall and P. Hale. 1996. Microsatellites for (Brassicaceae). Syst. Bot. 19:73-88.
macropods (Marsupialia): Cross-species polymor- Muona, O., R. Yazdani and D. Rudin. 1987. Genetic
phism and amplification artefacts. (unpublished change between life stages in Pinus sylvestris:
manuscript) Allozyme variation in seeds and planted
Moriyama, E. N., Y. Ina, K. Ikeo, N. Shimizu and T. seedlings. Silvae Genet. 36:39-42.
Gojobori. 1991. Mutation pattern of human Muralidharan, K. and K. E. Wakeland. 1993.
immunodeficiency virus genes. J. Mol. Evol. Concentration of primer and template qualitative-
32:360-363. ly affects products in random-amplified polymor-
Morizot, D. C. 1990. Use of fish gene maps to predict phic DNA PCR. BioTechniques 14:362-364.
ancestral vertebrate genome organization, pp.
Muramatsu, T., S. Kan and M. Hiraishi. 1978. Isolation Myers, R. M., T. Maniatis and L. S. Lerman. 1986.
and characterization of lipoamide dehydrogenase Detection and localization of single base changes
from mackerel dark muscle. Comp. Biochem. by denaturing gradient gel electrophoresis. Meth.
Physiol. 61B:247-252. Enzymol. 155:501-527.
Murawski, D. A. and J. L. Hamrick. 1990. The effect of
the density of flowering individuals on the mat- Myers, R. M., V. C. Sheffield and D. R. Cox. 1989.
ing systems of nine tropical tree species. Heredity Mutation detection, GC-clamps, and denaturing
67:167-174. gradient gel electrophoresis, pp. 71-88. In H. A.
Murphy, R. W. 1983a. Paleobiogeography and genetic Erlich (ed.), PCR Technology: Principles and
differentiation of the Baja California herpetofau- Applications for DNA Amplification. Stockton
na. Occ. Pap. California Acad. Sci. 137:1-48. Press, New York.
Murphy, R. W. 1983b. The reptiles: Origin and evolu-
tion, pp. 130-158. In T. J. Case and M. L. Cody Nadeau, J. H., J. Britton-Davidian, F. Bonhomme and
(eds.), Island Biogeography in the Sea of Cortez. L. Thaler. 1988. H-2 polymorphisms are more uni-
University of California Press, Berkeley. formly distributed than allozyme polymorphisms
Murphy, R. W. 1988. The problematic phylogenetic in natural populations of house mice. Genetics
analysis of interlocus heteropolymer isozyme 118:131-140.
characters: A case study from sea snakes and Nakamura, Y., M. Leppert, P. O'Connell, R. Wolff, T.
cobras. Can. J. Zool. 66:2628-2633. Holm, M. Culver, C. Martin, E. Fujimoto, M. Hoff,
Murphy, R. W. 1993. The phylogenetic analysis of E. Kumlin and R. White. 1987. Variable number of
allozyme data: Invalidity of coding alleles by tandem repeat (VNTR) markers for human gene
presence/absence and recommended procedures. mapping. Science 235:1616-1622.
Biochem. Syst. Ecol. 21:25-38. Nakanishi, M., A. C. Wilson, A. Nolan, G. C. Gorman
Murphy, R. W. and C. B. Crabtree. 1985a. Genetic rela- and G. S. Bailey. 1969. Phenoxyethanol: Protein
tionships of the Santa Catalina Island rattleless preservative for taxonomists. Science 163:681-683.
rattlesnake, Crotalus catalinensis (Serpentes: Nanney, D. L., R. M. Preparata, F. P. Preparata, E. B.
Viperidae). Acta Zool. Mexicana (n.s.) 9:1-16. Meyer and E. M. Simon. 1989. Shifting ditypic site
Murphy, R. W. and C. B. Crabtree. 1985b. Evolutionary analysis: Heuristics for expanding the phyloge-
aspects of isozyme patterns, number of loci, and netic range of nucleotide sequences in Sankoff
tissue-specific gene expression in the prairie rat- analysis. J. Mol. Evol. 28:451-459.
tlesnake, Crotalus viridis viridis. Herpetologica Nardi, I., F. Andronico, S. De Lucchini and R. Batistoni.
41:451-470. 1986. Cytogenetics of the European plethodontid
Murphy, R. W. and N. R. Lovejoy. 1995. Punctuated salamanders of the genus Hydromantes (Amphibia,
equilibrium or gradualism in the lizard genus Urodela). Chromosoma 94:377-388.
Sceloporus? Lost in plesiograms and a forest of Navidi, W. C., G. A. Churchill and A. von Haeseler.
trees. Cladistics (in press). 1991. Methods for inferring phylogenies from
Murphy, R. W. and R. H. Matson. 1986. Gene expres- nucleic acid sequence data by using maximum
sion in the tuatara, Sphenodon punctatus. New likelihood and linear invariants. Mol. Biol. Evol.
Zealand J. Zool. 13:573-581. 8:128-143.
Murphy, R. W., W. E. Cooper, Jr. and W. S. Richardson. Neale, D. B. and R. R. Sederoff. 1988. Inheritance and
1983. Phylogenetic relationships of the North evolution of organelle genomes, pp. 251-264. In J.
American five-lined skinks, genus Eumeces W. Hanover and D. E. Keathly (eds.), Genetic
(Sauria: Scincidae). Herpetologica 39:200-211. Manipulation of Woody Plants. Plenum, New York.
Murphy, R. W., F. C. McCollum, G. C. Gorman and R. Neale, D. B., M. A. Saghai-Maroof, R. W. Allard, Q.
Thomas. 1984. Genetics of hybridizing popula- Zhang and R. A. Jorgensen. 1988. Chloroplast
tions of Puerto Rican Sphaerodactylus. J. Herpetol. DNA diversity in populations of wild and culti-
18:93-105. vated barley. Genetics 120:1105-1110.
Murray, V. 1989. Improved double-stranded DNA Nee, S., E. C. Holmes and P. H. Harvey. 1995. Inferring
sequencing using the linear polymerase chain population processes from molecular phyloge-
reaction. Nucl. Acids Res. 17:8889. nies. Phil. Trans. Roy. Soc. London (in press).
Muse, S. V. and B. S. Gaut. 1994. A likelihood approach Needleman, S. B. and C. D. Wunsch. 1970. A general
for comparing synonymous and nonsynonymous method applicable to the search for similarities in
substitution rates, with application to the chloro- the amino acid sequence of two proteins. J. Mol.
plast genome. Mol. Biol. Evol. 11:715-724. Biol. 48:443-453.
Neff, N. A. 1986. A rational basis for a priori character ment length polymorphism in Cucumis melo.
weighting. Syst. Zool. 35:110-123. Theor. Appl. Genet. 83:379-384.
Nei, M. 1972. Genetic distance between populations. Nevo, E., A. Beiles and R. Ben-Shlomo. 1984. The evo-
Am. Nat. 106:283-292. lutionary significance of genetic diversity:
Nei, M. 1973. Analysis of gene diversity in subdivided Ecological, demographic and life history corre-
populations. Proc. Natl. Acad. Sci. USA lates. Lec. Notes Biomath. 53:13-213.
70:3321-3323. Nichol, S. T., J. E. Rowe and W. M. Fitch. 1993a.
Nei, M. 1978. Estimation of average heterozygosity Punctuated equilibrium and positive Darwinian
and genetic distance from a small number of indi- evolution in vesicular stomatitis virus. Proc. Nat.
viduals. Genetics 89:583-590. Acad. Sci. USA 90:10424-10428.
Nei, M. 1987. Molecular Evolutionary Genetics. Nichol, S. T., C. F. Spiropoulou, S. Morzunov, P. E.
Columbia University Press, New York. Rollin, T. G. Ksiazek, H. Feldmann, A. Sanchez, J.
Nei, M. 1991. Relative efficiencies of different tree Childs, S. Zaki and C. J. Peters. 1993b. Genetic
making methods for molecular data, pp. 90-128. identification of a hantavirus associated with an
In M. M. Miyamoto and J. Cracraft (eds.), outbreak of acute respiratory illness. Science
Phylogenetic Analysis of DNA Sequences. Oxford 262:914-917.
University Press, New York. Nichols, E., V. M. Chapman and F. H. Ruddle. 1973.
Nei, M. and R. K. Chesser. 1983. Estimation of fixation Polymorphism and linkage for mannosephos-
indices and gene diversification. Ann. Human phate isomerase in Mus musculus. Biochem.
Genet. 47:253-259. Genet. 8:47-53.
Nei, M. and T. Gojobori. 1986. Simple methods for Nichols, R. A. and D. J. Balding. 1991. Effects of popu-
estimating the numbers of synonymous and non- lation structure on DNA fingerprint analysis in
synonymous nucleotide substitutions. Mol. Biol. forensic science. Heredity 66:297-302.
Evol. 3:418-426. Nickrent, D. L. 1986. Genetic polymorphism in the
Nei, M. and R. K. Koehn (eds.). 1983. Evolution of Genes morphologically reduced dwarf mistletoes
and Proteins. Sinauer, Sunderland, Massachusetts. (Arceuthobium, Viscaceae): An electrophoretic
Nei, M. and W.-H. Li. 1979. Mathematical model for study. Am. J. Bot. 73:1492-1501.
studying genetic variation in terms of restriction Nickrent, D. L., S. I. Guttman and W. M. Eshbaugh.
endonucleases. Proc. Natl. Acad. Sci. 1984. Biosystematic and evolutionary relation-
USA 76:5269-5273. ships among selected taxa of Arceuthobium. U.S.
Nei, M. and F. Tajima. 1983. Maximum likelihood esti- Dept. Agriculture Tech. Report IW-111.
mation of the number of nucleotide substitutions Nierman, W. C. and D. R. Maglott (eds.). 1993.
from restriction sites data. Genetics 105:207-217. ATCC/NIH Repository Catalogue of Human and
Nei, M. and F. Tajima. 1985. Evolutionary change of Mouse DNA Probes and Libraries. 7th ed. American
restriction cleavage sites and phylogenetic infer- Type Culture Collection, Rockville, Maryland.
ence for man and apes. Mol. Biol. Evol. 2:189-205. Nixon, K. C. and J. M. Carpenter. 1993. On outgroups.
Nei, M , T.Maruyama and Ii. Chakraborty. 1975. The Cladistics 9:413-426.
bottleneck effect and genetic variability in popu- Noordermeer, J., F. Meijlink, P. Verrijzer, E Xjsewijk
latlons. Evolution 29:l-10. and 0.Destree. 1989. Isolation of the Xenopus
Nclgel, J E. and J. C. Avlse. 1986. Phylogenet~creia- homolog of ~nt-l/winglessand expression during
tlonships of mitocl~ondrialDNA under various neurula stages of early development. Nucl. Acid
demographic models of speciation, pp. 515-534 Res. 17:ll-18.
J17 E. Nevo and S. Karlin (eds.), Evolutionary Norman, J., C. Moritz and C. I. Limpus. 1994.
Processes and Theoiy. Academic Press, New York. Mitochondria1 DNA control region polymor-
Nelgel, J. E. and 5. C. Avise. 1993. Application of a ran- phisms: Genetic markers for ecological studies of
dom walk model to geographic distributions of marine turtles. Mol. Ecol. 3:363-373.
animal initochondrial DNA variation. Genetics Nuttall, G. 13. E 1904. Blood Immunity and Blood Rela-
135 1209-1220. flonship. Cambridge University Press, Cambridge.
Nelson, I<, R. J , Baker and R. L. Honeycutt, 1987.
Mltochondrial DNA and protein differentiat~on O'Brien, S. J. 1993. Domestic cat. pp.250-253, In S. J.
between l~ybridizingcytotypes of the white-foot- O'Brien (ed.),Genetic maps. Locus maps of complex
ed mouse, Peromyscus leucopus. Evolution genomes Book 4. Nonhuman verfebrates.6th ed.
41.864-872. Cold Spring Harbor Laboratory Press, Cold
Keuhaussen, S. L. 1992. Evaluation of restriction frag- Spring Harbor, New York.
O'Brien, S. J., D. E. Wildt, D. Goldman, C. R. Merril and M. Bush. 1983. The cheetah is depauperate in genetic variation. Science 221:459-462.
O'Brien, S. J., W. G. Nash, D. E. Wildt, M. E. Bush and R. E. Benveniste. 1985a. A molecular solution to the riddle of the giant panda's phylogeny. Nature 317:140-144.
O'Brien, S. J., M. E. Roelke, L. Marker, A. Newman, C. A. Winkler, D. Meltzer, L. Colly, J. F. Evermann, M. Bush and D. E. Wildt. 1985b. Genetic basis for species vulnerability in the cheetah. Science 227:1428-1434.
O'Brien, S. J., D. E. Wildt, M. Bush, T. M. Caro, C. FitzGibbon, I. Aggundey and R. E. Leakey. 1987. East African cheetahs: Evidence for two population bottlenecks? Proc. Natl. Acad. Sci. USA 84:508-511.
Ochman, H., A. S. Gerber and D. L. Hartl. 1988. Genetic applications of an inverse polymerase chain reaction. Genetics 120:621-623.
Odrzykoski, I. J. and J. Szweykowski. 1991. Genetic differentiation without concordant morphological divergence in the thallose liverwort Conocephalum conicum. Plant Syst. Evol. 178:135-151.
O'Grady, R. T. and G. B. Deets. 1987. Coding multistate characters, with special reference to the use of parasites as characters of their hosts. Syst. Zool. 36:268-279.
O'Hara, R. J. 1993. Systematic generalization, historical fate, and the species problem. Syst. Biol. 42:231-246.
Ohno, S. 1970. Evolution by Gene Duplication. Springer-Verlag, New York.
Ohno, S., C. Stenius, L. Christian and G. Schipmann. 1969. De novo mutation-like events observed at the 6PGD locus of the Japanese quail, and the principle of polymorphism breeding more polymorphism. Biochem. Genet. 3:417-428.
Ohta, N., H. Nagashima, S. Kawano and T. Kuroiwa. 1992. Isolation of the chloroplast DNA and the sequence of the trnK gene from Cyanidium caldarium Strain RK-1. Plant Cell Physiol. 33:657-661.
Ohta, T. 1977. Extension of neutral mutation drift hypothesis, pp. 148-167. In M. Kimura (ed.), Molecular Evolution and Polymorphism. National Institute of Genetics, Mishima, Japan.
Ohta, T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.
Ohyama, K., H. Fukuzawa, T. Kohchi, H. Shirai, T. Sano, S. Sano, K. Umesono, Y. Shiki, M. Takeuchi, Z. Chang, S. Aota, H. Inokuchi and H. Ozeki. 1986. Complete nucleotide sequence of liverwort Marchantia polymorpha chloroplast DNA. Plant Mol. Biol. Rep. 4:148-175.
Olkin, I. 1990. History and goals, pp. 3-10. In K. W. Wachter and M. L. Straf (eds.), The Future of Meta-analysis. Russell Sage Foundation, New York.
Olmstead, R. A., R. Langley, M. E. Roelke, R. M. Goeken, D. Adger-Johnson, J. P. Goff, J. P. Albert, C. Packer, M. K. Laurenson, T. M. Caro, L. Scheepers, D. E. Wildt, M. Bush, J. S. Martenson and S. J. O'Brien. 1992. Worldwide prevalence of lentivirus infection in wild feline species: Epidemiologic and phylogenetic aspects. J. Virol. 66:6008-6018.
Olmstead, R. G. and J. D. Palmer. 1992. A chloroplast DNA phylogeny of the Solanaceae: Subfamilial relationships and character evolution. Ann. Missouri Bot. Gard. 79:346-360.
Olmstead, R. G. and J. D. Palmer. 1994. Chloroplast DNA systematics: A review of methods and data analysis. Am. J. Bot. 81:1205-1255.
Olmstead, R. G. and J. A. Sweere. 1994. Combining data in phylogenetic systematics: An empirical approach using three molecular data sets in the Solanaceae. Syst. Biol. 43:467-481.
Olmstead, R. G., H. J. Michaels, K. M. Scott and J. D. Palmer. 1992. Monophyly of the Asteridae and identification of their major lineages as inferred from DNA sequences of rbcL. Ann. Missouri Bot. Gard. 79:249-265.
Olmstead, R. G., J. A. Sweere and K. H. Wolfe. 1993. Ninety extra nucleotides in ndhF gene of tobacco chloroplast DNA: A summary of revisions to the 1986 genome sequence. Plant Mol. Biol. 22:1191-1193.
Olsen, G. J. 1987. Earliest phylogenetic branchings: Comparing rRNA-based evolutionary trees inferred with various techniques. Cold Spring Harbor Symp. Quant. Biol. 52:825-837.
Olsen, G. J. 1988. Phylogenetic analysis using ribosomal RNA. Meth. Enzymol. 164:793-838.
Olsen, G. J. and C. R. Woese. 1989. A brief note concerning archaebacterial phylogeny. Can. J. Microbiol. 35:119-123.
Olsen, G. J., R. Overbeek, N. Larsen, T. L. Marsh, M. J. McCaughey, M. A. Maciukenas, W.-M. Kuan, T. J. Macke, Y. Xing and C. R. Woese. 1992. The ribosomal database project. Nucl. Acids Res. 20 (suppl.):2108-2200.
Olsen, R. R., J. A. Runstadler and T. D. Kocher. 1991. Whose larvae? Nature 351:357-358.
Omland, K. E. 1994. Character congruence between a molecular and a morphological phylogeny for dabbling ducks (Anas). Syst. Biol. 43:369-386.
Orita, M., H. Iwahana, H. Kanazawa, K. Hayashi and T. Sekiya. 1989. Detection of polymorphisms of human DNA by gel electrophoresis and single-strand conformation polymorphisms. Proc. Natl. Acad. Sci. USA 86:2766-2770.
Orosz, J. M. and J. G. Wetmur. 1974. In vitro iodination of DNA. Maximizing iodination while minimizing degradation: Use of buoyant density shifts for DNA-DNA hybrid isolation. Biochemistry 13:5467-5473.
Ostrander, E. A., G. F. Sprague, Jr. and J. Rine. 1993. Identification and characterization of dinucleotide repeat (CA)n markers for genetic mapping in dog. Genomics 16:207-213.
Ou, C.-Y., C. A. Ciesielski, G. Myers, C. I. Bandea, C.-C. Luo, B. T. M. Korber, J. I. Mullins, G. Schochetman, R. L. Berkelman, A. N. Economou, J. J. Witte, L. J. Furman, G. A. Satten, K. A. MacInnes, J. W. Curran and H. W. Jaffe. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256:1165-1171.
Ouchterlony, O. 1958. Diffusion-in-gel methods for immunological analysis. Progr. Allergy 5:1.

Pääbo, S. 1985. Molecular cloning of ancient Egyptian mummy DNA. Nature 314:644-645.
Pääbo, S. 1989. Ancient DNA: Extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl. Acad. Sci. USA 86:1939-1943.
Pääbo, S. 1995. The Y chromosome and the origin of all of us (men). Science 268:1141-1142.
Pääbo, S., J. A. Gifford and A. C. Wilson. 1988. Mitochondrial DNA sequences from a 7000-year old brain. Nucl. Acids Res. 16:9775-9787.
Pääbo, S., R. Higuchi and A. C. Wilson. 1989. Ancient DNA and the polymerase chain reaction. J. Biol. Chem. 264:9709-9712.
Pääbo, S., W. K. Thomas, K. M. Whitfield, Y. Kumazawa and A. C. Wilson. 1991. Rearrangements of mitochondrial transfer RNA genes in marsupials. J. Mol. Evol. 33:426-430.
Pace, N. R., G. J. Olsen and C. R. Woese. 1986. Ribosomal RNA phylogeny and the primary lines of evolutionary descent. Cell 45:325-326.
Pagel, M. D. 1992. A method for the analysis of comparative data. J. Theor. Biol. 156:431-442.
Pagel, M. D. and P. H. Harvey. 1989. Taxonomic differences in the scaling of brain on body size among mammals. Science 244:1589-1593.
Pagel, M. D. and P. H. Harvey. 1992. On solving the correct problem: Wishing does not make it so. J. Theor. Biol. 156:425-430.
Paetkau, D. and C. Strobeck. 1994. Microsatellite analysis of genetic variation in black bear populations. Mol. Ecol. 3:489-496.
Page, R. D. M. 1991. Clocks, clades, cospeciation: Comparing rates of evolution and timing of cospeciation events in host-parasite assemblages. Syst. Zool. 40:188-198.
Page, R. D. M. 1993a. Genes, organisms, and areas: The problem of multiple lineages. Syst. Biol. 42:77-84.
Page, R. D. M. 1993b. On islands of trees and efficacy of different methods of branch swapping in finding most-parsimonious trees. Syst. Biol. 42:200-210.
Page, R. D. M. 1994. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Syst. Biol. 43:58-77.
Palmer, J. D. 1982. Physical and gene mapping of chloroplast DNA from Atriplex triangularis and Cucumis sativa. Nucl. Acids Res. 10:1593-1605.
Palmer, J. D. 1985a. Evolution of chloroplast and mitochondrial DNA in plants and algae, pp. 131-240. In R. J. MacIntyre (ed.), Molecular Evolutionary Genetics. Plenum, New York.
Palmer, J. D. 1985b. Comparative organization of chloroplast genomes. Annu. Rev. Genet. 19:325-354.
Palmer, J. D. 1986a. Isolation and structural analysis of chloroplast DNA. Meth. Enzymol. 118:167-186.
Palmer, J. D. 1986b. Chloroplast DNA and phylogenetic relationships, pp. 63-80. In S. K. Dutta (ed.), DNA Systematics. CRC Press, Boca Raton, Florida.
Palmer, J. D. 1987. Chloroplast DNA evolution and biosystematic uses of chloroplast DNA variation. Am. Nat. 130:S6-S29.
Palmer, J. D. 1991. Plastid chromosomes: Structure and evolution, pp. 5-53. In L. Bogorad and I. K. Vasil (eds.), Cell Culture and Somatic Cell Genetics of Plants, vol. 7A, The Molecular Biology of Plastids. Academic Press, San Diego.
Palmer, J. D. 1992. Mitochondrial DNA in plant systematics: Applications and limitations, pp. 36-49. In P. S. Soltis, D. E. Soltis and J. J. Doyle (eds.), Molecular Systematics of Plants. Chapman and Hall, New York.
Palmer, J. D. and L. A. Herbon. 1988. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J. Mol. Evol. 28:87-97.
Palmer, J. D. and D. Zamir. 1982. Chloroplast DNA evolution and phylogenetic relationships in Lycopersicon. Proc. Natl. Acad. Sci. USA 79:5006-5010.
Palmer, J. D., R. A. Jorgensen and W. F. Thompson. 1985. Chloroplast DNA variation and evolution in Pisum: Patterns of change and phylogenetic analysis. Genetics 109:195-213.
Palmer, J. D., B. Osorio, J. Aldrich and W. F. Thompson. 1987. Chloroplast DNA evolution among legumes: Loss of a large inverted repeat occurred prior to other sequence arrangements. Curr. Genet. 11:275-286.
Palmer, J. D., B. Osorio and W. F. Thompson. 1988a. Evolutionary significance of inversions in legume chloroplast DNA. Curr. Genet. 14:75-89.
Palmer, J. D., R. K. Jansen, H. J. Michaels, M. W. Chase and J. R. Manhart. 1988b. Chloroplast DNA and plant phylogeny. Ann. Missouri Bot. Gard. 75:1180-1206.
Palumbi, S. R. 1992. Marine speciation on a small planet. Trends Ecol. Evol. 7:114-121.
Palumbi, S. R. and C. S. Baker. 1994. Contrasting population structure from nuclear intron sequences and mtDNA of humpback whales. Mol. Biol. Evol. 11:426-435.
Palumbi, S. R. and J. Benzie. 1991. Large mitochondrial DNA differences among morphologically similar penaeid shrimp. Mol. Mar. Biol. Biotechnol. 1:27-34.
Palumbi, S. R., A. P. Martin, S. Romano, W. O. McMillan, L. Stice and G. Grabowski. 1991. The Simple Fool's Guide to PCR. Special Publ. Dept. Zoology, University of Hawaii, Honolulu.
Palva, T. K. and E. T. Palva. 1985. Rapid isolation of animal mitochondrial DNA by alkaline extraction. FEBS Letters 192:267-270.
Pamilo, P. and M. Nei. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5:568-583.
Pardue, M. L. 1985. In situ hybridization, pp. 179-202. In B. D. Hames and S. J. Higgins (eds.), Nucleic Acid Hybridization: A Practical Approach. IRL Press, Oxford.
Pardue, M. L. 1986. In situ hybridization to DNA of chromosomes and nuclei, pp. 111-137. In D. B. Roberts (ed.), Drosophila: A Practical Approach. IRL Press, Oxford.
Paris Conference. 1971. Standardization in human cytogenetics. Cytogenetics 11:313-362.
Parker, E. D., Jr. and R. K. Selander. 1976. The organization of genetic diversity in the parthenogenetic lizard Cnemidophorus tesselatus. Genetics 84:791-805.
Parkin, D. T. and S. R. Cole. 1985. Genetic differentiation and rates of evolution in some introduced populations of the House Sparrow, Passer domesticus in Australia and New Zealand. Heredity 54:15-23.
Parsons, T. J., S. L. Olson and M. J. Braun. 1993. Unidirectional spread of secondary sexual plumage traits across an avian hybrid zone. Science 260:1643-1646.
Patarnello, T. and B. Battaglia. 1992. Glucose-phosphate isomerase and fitness: Effects of temperature on genotype dependent mortality and enzyme activity in two species of the genus Gammarus (Crustacea: Amphipoda). Evolution 46:1568-1573.
Patarnello, T., P. M. Bisol and B. Battaglia. 1989. Studies on differential fitness of PGI genotypes with regard to temperature in Gammarus insensibilis (Crustacea: Amphipoda). Marine Biol. 102:355-359.
Pathak, S. and F. E. Arrighi. 1973. Loss of DNA following C-banding procedures. Cytogenet. Cell Genet. 12:414-422.
Patterson, C. (ed.). 1987. Molecules and Morphology in Evolution: Conflict or Compromise? Cambridge University Press, Cambridge.
Patterson, C. 1988. Homology in classical and molecular biology. Mol. Biol. Evol. 5:603-625.
Patterson, C. 1989. Phylogenetic relations of major groups: Conclusions and prospects, pp. 471-488. In B. Fernholm, K. Bremer and H. Jornvall (eds.), The Hierarchy of Life. Elsevier, Amsterdam.
Patterson, C., D. M. Williams and C. J. Humphries. 1993. Congruence between molecular and morphological phylogenies. Annu. Rev. Ecol. Syst. 24:153-188.
Patton, J. L. and J. H. Feder. 1981. Microspatial genetic heterogeneity in pocket gophers: Non-random breeding and drift. Evolution 35:912-920.
Patton, J. L. and S. W. Sherwood. 1982. Genome evolution in pocket gophers (genus Thomomys). I. Heterochromatin variation and speciation potential. Chromosoma 85:149-162.
Patton, J. L. and S. W. Sherwood. 1983. Chromosome evolution and speciation in rodents. Annu. Rev. Ecol. Syst. 14:139-158.
Patton, J. L. and M. F. Smith. 1994. Paraphyly, polyphyly, and the nature of species boundaries in pocket gophers (genus Thomomys). Syst. Biol. 43:11-26.
Patton, J. L., M. F. Smith, R. D. Price and R. A. Hellenthal. 1984. Genetics of hybridization between the pocket gophers Thomomys bottae and Thomomys townsendii in northeastern California. Great Basin Nat. 44:431-440.
Pearson, W. R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183:63-98.
Pearson, W. R. and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444-2448.
Penny, D. 1982. Towards a basis for classification: The incompleteness of distance measures, incompatibility analysis and phenetic classification. J. Theor. Biol. 96:129-142.
Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree construction. Cladistics 1:266-272.
Penny, D. and M. Hendy. 1986. Estimating the reliability of evolutionary trees. Mol. Biol. Evol. 3:403-417.
Penny, D. and M. D. Hendy. 1987. TurboTree: A fast algorithm for minimal trees. CABIOS 3:183-187.
Penny, D., L. R. Foulds and M. D. Hendy. 1982. Testing the theory of evolution by comparing evolutionary trees constructed from five different protein sequences. Nature 297:197-200.
Penny, D., M. D. Hendy and M. Steel. 1992. Progress with methods for constructing evolutionary trees. Trends Ecol. Evol. 7:73-79.
Perasso, R., A. Baroin, L. H. Qu, J.-P. Bachellerie and A. Adoutte. 1989. Origin of the algae. Nature 339:142-144.
Peterson, D. G., S. M. Stack, J. L. Healy, B. S. Donohoe and L. K. Anderson. 1994. The relationship between synaptonemal complex length and genome size in four vertebrate classes (Osteichthyes, Reptilia, Aves, Mammalia). Chromosome Res. 2:153-162.
Pettigrew, J. D. 1986. Flying primates? Megabats have the advanced pathway from eye to midbrain. Science 231:1304-1306.
Pettigrew, J. D. 1991a. Wings or brain? Convergent evolution in the origins of bats. Syst. Zool. 40:199-216.
Pettigrew, J. D. 1991b. A fruitful wrong hypothesis? Response to Baker, Novacek and Simmons. Syst. Zool. 40:231-239.
Pettigrew, J. D. 1994. Flying DNA. Curr. Biol. 4:277-280.
Pettigrew, J. D., B. C. M. Jamieson, S. K. Robson, L. S. Hall, K. I. McAnally and H. M. Cooper. 1989. Phylogenetic relations between microbats, megabats and primates (Mammalia: Chiroptera and Primates). Phil. Trans. Roy. Soc. B 325:489-559.
Pfennig, D. W. and H. K. Reeve. 1993. Nepotism in a solitary wasp as revealed by DNA fingerprinting. Evolution 47:700-704.
Philbrick, C. T. 1993. Underwater cross-pollination in Callitriche hermaphroditica (Callitrichaceae): Evidence from random amplified polymorphic DNA markers. Am. J. Bot. 80:391-394.
Phillips, R. B. and K. A. Pleyte. 1991. Nuclear DNA and salmonid phylogenies. J. Fish. Biol. 39(suppl. A):259-275.
Pierson, E. D., V. M. Sarich, J. M. Lowenstein, M. J. Daniel and W. E. Rainey. 1986. A molecular link between the bats of New Zealand and South America. Nature 323:60-63.
Pinkel, D., T. Straume and J. W. Gray. 1986. Cytogenetic analysis using quantitative, high-sensitivity, fluorescence hybridization. Proc. Natl. Acad. Sci. USA 83:2934-2938.
Pirrotta, V. 1986. Cloning Drosophila genes, pp. 83-110. In D. B. Roberts (ed.), Drosophila: A Practical Approach. IRL Press, Oxford.
Plante, Y., P. T. Boag and B. N. White. 1987. Nondestructive sampling of mitochondrial DNA from voles. Can. J. Zool. 65:175-180.
Pleyte, K. A., S. D. Duncan and R. B. Phillips. 1992. Evolutionary relationships of the salmonid fish genus Salvelinus inferred from DNA sequences of the first internal transcribed spacer (ITS1) of the ribosomal DNA. Mol. Phylogenet. Evol. 1:223-230.
Pollock, D. D. and D. B. Goldstein. 1994. A comparison of two methods for constructing evolutionary distances from a weighted contribution of transition and transversion differences. Mol. Biol. Evol. 12:713-717.
Ponath, P. D., R. T. Boyd, D. M. Hillis and P. D. Gottlieb. 1989a. Structural and evolutionary comparisons of four alleles of the mouse Igk-J locus which encodes immunoglobulin kappa light chain joining (Jk) segments. Immunogenetics 29:389-396.
Ponath, P. D., D. M. Hillis and P. D. Gottlieb. 1989b. Structural and evolutionary comparisons of four alleles of the mouse immunoglobulin kappa chain gene, Igk-VSer. Immunogenetics 29:249-257.
Pope, T. R. 1992. The influence of dispersal patterns and mating system on genetic differentiation within and between populations of the red howler monkey (Alouatta seniculus). Evolution 46:1112-1128.
Porter, A. H. 1990. Testing nominal species boundaries using gene flow statistics: The taxonomy of two hybridizing admiral butterflies (Limenitis: Nymphalidae). Syst. Zool. 39:131-147.
Porter, C. A., M. J. Hamilton, J. W. Sites, Jr. and R. J. Baker. 1991. Location of ribosomal DNA in chromosomes of squamate reptiles: Systematic and evolutionary implications. Herpetologica 47:271-280.
Porter, C. A., M. W. Haiduk and K. de Queiroz. 1994. Evolution and phylogenetic significance of ribosomal gene location in chromosomes of squamate reptiles. Copeia 2:302-313.
Powell, J. R. and A. Caccone. 1989. Intra- and interspecific genetic variation in Drosophila. Genome 31:233-238.
Powell, J. R. and A. Caccone. 1990. The TEACL method of DNA-DNA hybridization: Technical considerations. J. Mol. Evol. 30:267-272.
Powell, J. R. and M. C. Zuniga. 1983. A simplified procedure for studying mtDNA polymorphisms. Biochem. Genet. 21:1051-1055.
Powell, J. R., A. Caccone, G. D. Amato and C. Yoon. 1986. Rates of nucleotide substitution in Drosophila mitochondrial DNA and nuclear DNA are similar. Proc. Natl. Acad. Sci. USA 83:9090-9093.
Powell, M. J. D. 1964. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comp. J. 7:155-162.
Powers, D. A. 1987. A multidisciplinary approach to the study of genetic variation in species, pp. 102-134. In M. E. Feder, A. F. Bennett and R. B. Huey (eds.), New Directions in Physiological Ecology. Cambridge University Press, New York.
Powers, D. A., G. S. Greaney and A. R. Place. 1979. Physiological correlation between lactate dehydrogenase genotype and haemoglobin function in killifish. Nature 277:240-241.
Prager, E. M. and A. C. Wilson. 1971a. The dependence of immunological cross-reactivity upon sequence resemblance among lysozymes. I. Micro-complement fixation studies. J. Biol. Chem. 246:5978-89.
Prager, E. M. and A. C. Wilson. 1971b. The dependence of immunological cross-reactivity upon sequence resemblance among lysozymes. II. Comparison of precipitin and micro-complement fixation results. J. Biol. Chem. 246:7010-17.
Prager, E. M. and A. C. Wilson. 1976. Congruency of phylogenies derived from different proteins. J. Mol. Evol. 9:45-57.
Prager, E. M. and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: Analysis of DNA and amino acid sequences. J. Mol. Evol. 27:326-335.
Prager, E. M., A. H. Brush, R. A. Nolan, M. Nakanishi and A. C. Wilson. 1974. Slow evolution of transferrin and albumin in birds according to micro-complement fixation analysis. J. Mol. Evol. 3:243-262.
Prager, E. M., A. C. Wilson, J. M. Lowenstein and V. M. Sarich. 1980. Mammoth albumin. Science 209:287-289.
Prensky, W. 1976. The radioiodination of RNA and DNA to high specific activities, pp. 121-152. In D. M. Prescott (ed.), Methods in Cell Biology. Academic Press, New York.
Price, D. K., G. E. Collier and C. F. Thompson. 1989. Multiple parentage in broods of house wrens: Genetic evidence. J. Hered. 80:1-5.
Prodohl, P. A., J. B. Taggart and A. Ferguson. 1994. Single locus inheritance and joint segregation analysis of minisatellite (VNTR) loci in brown trout (Salmo trutta L.). Heredity 73:556-566.
Prout, T. 1965. The estimation of fitness from genotypic frequencies. Evolution 19:546-551.
Pryer, K. M. and C. H. Haufler. 1993. Isozymic and chromosomal evidence for the allotetraploid origin of Gymnocarpium dryopteris (Dryopteridaceae). Syst. Bot. 18:150-172.
Purvis, A. and T. Garland, Jr. 1993. Polytomies in comparative analyses of continuous characters. Syst. Biol. 42:569-575.

Qu, L. H., B. Michot and J.-P. Bachellerie. 1983. Improved methods for structure probing in large RNAs: A rapid heterologous sequencing approach is coupled to the direct mapping of nuclease accessible sites. Application to the 5' terminal domain of eukaryotic 28S rRNA. Nucl. Acids Res. 11:5903-5920.
Queller, D. C., J. E. Strassmann and C. R. Hughes. 1988. Genetic relatedness in colonies of tropical wasps with multiple queens. Science 242:1155-1157.
Queller, D. C., J. E. Strassmann and C. R. Hughes. 1993. Microsatellites and kinship. Trends Ecol. Evol. 8:285-288.
Quinn, T. W. 1992. The genetic legacy of Mother Goose: Phylogeographic patterns of lesser snow goose Chen caerulescens caerulescens maternal lineages. Mol. Ecol. 1:105-117.
Quinn, T. W. and B. N. White. 1987a. Analysis of DNA sequence variation, pp. 163-198. In F. Cooke and P. A. Buckley (eds.), Avian Genetics. Academic Press, London.
Quinn, T. W. and B. N. White. 1987b. Identification of restriction fragment length polymorphisms in genomic DNA of the lesser snow goose. Mol. Biol. Evol. 4:126-143.
Quinn, T. W., J. S. Quinn, F. Cooke and B. N. White. 1987. DNA marker analysis detects multiple maternity and paternity in single broods of the lesser snow goose (Anser caerulescens caerulescens). Nature 326:392-394.

Radtke, R. D., S. D. Donnellan, R. N. Fisher, C. Moritz, K. A. Hanley and T. J. Case. 1995. When species collide: The origin and spread of an asexual species of gecko. Proc. Roy. Soc. London B 258:145-152.
Raff, R. A., K. G. Field, M. T. Ghiselin, D. J. Lane, G. J. Olsen, A. L. Parks, B. A. Parr, N. R. Pace and E. C. Raff. 1988. Molecular analysis of distant phylogenetic relationships in echinoderms, pp. 29-41. In C. R. C. Paul and A. B. Smith (eds.), Echinoderm Phylogeny and Evolutionary Biology. Oxford University Press, Oxford.
Ragghianti, M., S. Bucci, G. Mancino, J. C. Lacroix, D. Boucher and J. Charlemagne. 1988. A novel approach to cytotaxonomic and cytogenetic studies in the genus Triturus using monoclonal antibodies to lampbrush chromosomes antigens. Chromosoma 97:134-144.
Rainboth, W. J. and D. G. Buth. 1992. On the costs of isozyme electrophoresis: Current prices for enzyme stains. Isozyme Bull. 25:22-26.
Rainboth, W. J. and G. S. Whitt. 1974. Analysis of evolutionary relationships among shiners of the subgenus Luxilus (Teleostei, Cypriniformes, Notropis) with the lactate dehydrogenase and malate dehydrogenase isozyme systems. Comp. Biochem. Physiol. 49B:241-252.
Ramshaw, J. A. M., J. A. Coyne and R. C. Lewontin. 1979. The sensitivity of gel electrophoresis as a detector of genetic variation. Genetics 93:1019-1037.
Rand, D. M. 1993. Endotherms, ectotherms, and mitochondrial genome-size variation. J. Mol. Evol. 37:281-295.
Rand, D. M. 1994. Thermal habit, metabolic rate and the evolution of mitochondrial DNA. Trends Ecol. Evol. 9:125-131.
Rand, D. M. and R. G. Harrison. 1986a. Ecological genetics of a mosaic hybrid zone: Mitochondrial, nuclear and reproductive differentiation of crickets by soil type. Evolution 43:432-449.
Rand, D. M. and R. G. Harrison. 1986b. Mitochondrial DNA transmission genetics in crickets. Genetics 114:955-970.
Rand, D. M., M. Dorfsman and L. M. Kann. 1994. Neutral and non-neutral evolution of Drosophila mitochondrial DNA. Genetics 138:741-756.
Randall, S. K., R. Eritja, B. E. Kaplan, J. Petruska and M. F. Goodman. 1987. Nucleotide insertion kinetics opposite abasic lesions in DNA. J. Biol. Chem. 262:6864-6870.
Ranker, T. A. and A. F. Schnabel. 1986. Allozymic and morphological evidence for a progenitor-derivative species pair in Camassia (Liliaceae). Syst. Bot. 11:433-445.
Rassmann, K., C. Schlotterer and D. Tautz. 1991. Isolation of simple-sequence loci for use in polymerase chain reaction-based DNA fingerprinting. Electrophoresis 12:113-118.
Reeck, G. R., C. de Haen, D. C. Teller, R. F. Doolittle, W. M. Fitch, R. E. Dickerson, P. Chambon, A. D. McLachlan, E. Margoliash, T. H. Jukes and E. Zuckerkandl. 1987. "Homology" in proteins and nucleic acids: A terminology muddle and a way out of it. Cell 50:667.
Reed, K. C. and D. A. Mann. 1985. Rapid transfer of DNA from agarose gels to nylon membranes. Nucl. Acids Res. 13:7207-7221.
Reeve, H. K., D. F. Westneat and D. C. Queller. 1992. Estimating average within-group relatedness from DNA fingerprints. Mol. Ecol. 1:223-232.
Reeves, J. W. 1992. Heterogeneity in the substitution process of amino acid sites of proteins coded for by mitochondrial DNA. J. Mol. Evol. 35:17-32.
Remsen, J. V., Jr. 1977. On taking field notes. Am. Birds 31:946-953.
Reynolds, J., B. S. Weir and C. C. Cockerham. 1983. Estimation of the coancestry coefficient: Basis for a short-term genetic distance. Genetics 105:767-779.
Richardson, B. J. 1981. The genetic structure of rabbit populations, pp. 37-52. In K. Myers and C. D. MacInnes (eds.), Proceedings of the World Lagomorph Conference held in Guelph, Ontario, August, 1979. Guelph, University of Guelph.
Richardson, B. J. 1983. Distribution of protein variation in skipjack tuna (Katsuwonus pelamis) from the central and south-western Pacific. Australian J. Marine Freshwater Res. 34:231-251.
Richardson, B. J., P. R. Baverstock and M. Adams. 1986. Allozyme Electrophoresis: A Handbook for Animal Systematics and Population Structure. Academic Press, Sydney.
Riddle, B. R., R. L. Honeycutt and P. L. Lee. 1993. Mitochondrial DNA phylogeography in northern grasshopper mice (Onychomys leucogaster) - the influence of Quaternary climatic oscillations on population dispersion and divergence. Mol. Ecol. 2:183-193.
Rider, C. C. and C. B. Taylor. 1980. Isoenzymes. Chapman and Hall, London.
Ridgway, G. J., S. W. Sherburne and R. D. Lewis. 1970. Polymorphism in the esterases of Atlantic herring. Trans. Am. Fish. Soc. 99:147-151.
Ridley, M. 1983. The Explanation of Organic Diversity: The Comparative Method and Adaptations for Mating. Oxford University Press, Oxford.
Riedy, M. F., W. J. Hamilton and C. F. Aquadro. 1992. Excess of non-parental bands in offspring from known primate pedigrees assayed using RAPD PCR. Nucl. Acids Res. 20:918.
Rieseberg, L. H. 1991. Homoploid reticulate evolution in Helianthus: Evidence from ribosomal genes. Am. J. Bot. 78:1218-1237.
Rieseberg, L. H. and S. J. Brunsfeld. 1992. Molecular evidence and plant introgression, pp. 151-176. In P. S. Soltis, D. E. Soltis and J. J. Doyle (eds.), Plant Molecular Systematics. Chapman and Hall, New York.
Rieseberg, L. H. and N. C. Ellstrand. 1993. What can Roberts, J \V,, S. A. Johnson, I' ffier, T. J. Hall, E 13.
morphological and molecular markers tell us Davidson and R. J. Britten. 1985. Evolutionary
about plant hybridization. Crit. Rev. Plant Sci. conservation of DNA sequences expressed in sea
12:213-241. urchin eggs and embryos. J. Mol Evol. 22:99-107,
Rieseberg, L. EI. and D. E. Soltis. 1991. Phylogenetic Roberts, L. 1989. Genome project under way, at last.
consequences of cytoplasmic gene flow in plants. Science 243:167-168.
Evol. Trends Plants 5:65-84. Roberts, R.J. 1984. Restriction and modificat~on
Riesebcrg, L. W., S. Beckstrom-Sternberg, A. Liston ezymes and their recognltlon sequences. Nucl.
and D. Arias. 1991. Phylogenetic and systematic Acids Res. 12:r167-r204
inferences from chloroplast DNA and isozyme Robeits, R. J and D. Macellis 1993 REBASE-restrlc-
variation in Heliaizfhus sect. Heliantltus. Syst. Rot. tion enzymes and methylascs Nucl. Acids Res.
16:50-76. 21 3125-3137.
Fkcseberg, L. H., M. A. Hanson and C. T. Philbrick. Robertson, D. L., P. M. Sharp, F. E.McCutchan and B
1992. Androdioecy is derived from diaccy in H. Hahn. 1995. Rccombinatlon in HIV,Nature
Datiscaceae: Evidence from restriction site map- 374:124-126.
ping of PCR-amplified chloroplast DNA frag- Rodrigo, A. G. 1992. Two optllnality criteria for select-
ments. Syst. Bot. 17:324-336. ing subsets of most parsimonious trees. Syst. Biol.
Rieseberg, L. Xi., H. Choi, R. Chan and C. Spore. 1993. 41:3340.
Genomic map of a diploid hybrid species. Rodrigo, A. G. 1993. Calibrating the bootstrap test of
Heredity 70:285-293. monophyly. Int. J. Parasitol 23.507-514.
Rigby, P. W. J., M. Dieckmann, C. Rhodes and P. Berg. Rodrigo, A. G., M. Kelly-Borges, P. R. Bergquist and P.
1977. Labelling deoxyribonucleic acid to high spe- L. Bergquist. 1993. A randomisation test of the
cific activity in vitro by nick translation with DNA null hypothesis that two cladograms are sample
polymerase I. J. Mol. Biol. 113:237-251. estimates of a parametric phylogenetic tree. New
Rijsewijk, F., M. Schuermann, E. Wagenaar, P. Parren, Zealand J. Bot. 31:257-268.
D. Weigel and R. Nusse. 1987. The Drosophila Rodriguez, F., J. L. Oliver, A. Marin and J. R. Medina.
homolog of the mouse mammary oncogene int-1 1990. The general stochastic model of nucleotide
is identical to the segment polarity gene wingless. substitution. J. Theor. Biol. 142:485-501.
Cell 50:649-657. Roff, D. A. and P. Bentzen. 1989. The statistical analy-
Riley, M. A., S. R. Kaplan and M. Veuille. 1992. sis of mitochondrial DNA polymorphisms: χ² and
Nucleotide polymorphism at the xanthine dehy- the problem of small samples. Mol. Biol. Evol.
drogenase locus in Drosophila pseudoobscura. Mol. 6:539-545.
Biol. Evol. 9:56-69. Rogers, A. R. and H. Harpending. 1992. Population
Riley, V. 1960. Adaptation of orbital bleeding tech- growth makes waves in the distribution of pair-
nique to rapid serial blood studies. Proc. Soc. Exp. wise genetic differences. Mol. Biol. Evol.
Biol. Med. 104:751-754. 9:552-569.
Ritland, C. E., K. Ritland and N. A. Straus. 1993. Rogers, D. S. and M. D. Engstrom. 1992. Genetic dif-
Variation in the ribosomal internal transcribed ferentiation in spiny pocket mice of the Liomys
spacers (ITS1 and ITS2) among eight taxa of the pictus species-group (family Heteromyidae). Can.
Mimulus guttatus species complex. Mol. Biol. Evol. J. Zool. 70:1912-1919.
10:1273-1288. Rogers, J. S. 1972. Measures of genetic similarity and
Ritland, K. and M. T. Clegg. 1987. Evolutionary analy- genetic distance. Studies in Genet. VII. University
sis of plant DNA sequences. Am. Nat. of Texas Pub. 7213:145-153.
130:S75-S100. Rogers, J. S. 1984. Deriving phylogenetic trees from
Ritland, K. and F. R. Ganders. 1987. Covariation of allele frequencies. Syst. Zool. 33:52-63.
selfing rates with parental gene fixation indices Rogers, J. S. 1986. Deriving phylogenetic trees from
within populations of Mimulus guttatus. allele frequencies: A comparison of nine genetic
Evolution 41:760-771. distances. Syst. Zool. 35:297-310.
Robert-Fortel, I., H. R. Junera, G. Geraud and D. Rogers, S. O. and A. J. Bendich. 1985. Extraction of
Hernandez-Verdun. 1993. Three dimensional DNA from milligram amounts of fresh, herbari-
organization of the ribosomal genes and Ag-NOR um and mummified plant tissues. Plant Mol. Biol.
proteins during interphase and mitosis in PtK1 5:69-76.
cells studied by confocal microscopy.
Chromosoma 102:146-157.
Rogstad, S. H., J. C. Patton and B. A. Schaal. 1988. M13 properties of rattlesnake venom following 26
repeat probe detects DNA minisatellite-like years of storage. Proc. Soc. Exp. Biol. Med.
sequences in gymnosperm and angiosperm. Proc. 103:737-739.
Natl. Acad. Sci. USA 85:9176-9178. Ruvolo, M., T. R. Disotell, M. W. Allard and W. M.
Rogstad, S. H., H. Nybom and B. A. Schaal. 1991a. The Brown. 1991. Resolution of the African hominoid
tetrapod "DNA fingerprinting" M13 repeat probe trichotomy by use of a mitochondrial gene
reveals genetic diversity and clonal growth in sequence. Proc. Natl. Acad. Sci. USA 88:1570-1574.
quaking aspen (Populus tremuloides, Salicaceae). Ryan, M. J. and A. S. Rand. 1995. Female responses to
Plant Syst. Evol. 175:115-123. ancestral advertisement calls in túngara frogs.
Rogstad, S. H., K. Wolff and B. A. Schaal. 1991b. Science 269:390-392.
Geographical variation in Asimina triloba Dunal Ryman, N. and F. Utter (eds.). 1987. Population Genetics
(Annonaceae) revealed by M13 "DNA finger- and Fishery Management. University of
printing" probe. Am. J. Bot. 78:1391-1396. Washington Press, Seattle.
Rohrer, G. A., L. J. Alexander, J. W. Keele, T. Smith Ryman, N., F. W. Allendorf and G. Stahl. 1979.
and C. W. Beattie. 1994. A microsatellite linkage Reproductive isolation with little genetic diver-
map of the porcine genome. Genetics 136:231-245. gence in sympatric populations of brown trout.
Rollo, F. A., A. Amici, R. Salvi and A. Garbuglia. 1988. Genetics 92:247-262.
Short but faithful pieces of ancient DNA. Nature Rzhetsky, A. and M. Nei. 1992a. A simple method for
335:774. estimating and testing minimum-evolution trees.
Rooney, D. E. and B. H. Czepulkowski. 1986. Human Mol. Biol. Evol. 9:945-967.
Cytogenetics. IRL Press, Oxford. Rzhetsky, A. and M. Nei. 1992b. Statistical properties
Roose, M. L. and L. D. Gottlieb. 1976. Genetic and bio- of the ordinary least-squares, generalized least-
chemical consequences of polyploidy in squares, and minimum-evolution methods of
Tragopogon. Evolution 30:818-830. phylogenetic inference. J. Mol. Evol. 35:367-375.
Ropson, I. J. and D. A. Powers. 1989. The allelic Rzhetsky, A. and M. Nei. 1993. Theoretical foundation
isozymes of hexose-6-phosphate dehydrogenase of the minimum-evolution method of phylogenet-
isolated from Fundulus heteroclitus: Physical char- ic inference. Mol. Biol. Evol. 10:1073-1095.
acters and kinetic properties. Mol. Biol. Evol. Rzhetsky, A. and M. Nei. 1995. Tests of applicability of
6:171-185. several substitution models for DNA sequence
Ropson, I. J., D. C. Brown and D. A. Powers. 1990. data. Mol. Biol. Evol. 12:131-151.
Biochemical genetics of Fundulus heteroclitus (L.).
VI. Geographical variation in the gene frequencies Sackler, M. L. 1966. Xanthine oxidase from liver and
of 15 loci. Evolution 44:16-26. duodenum of the rat: Histochemical localization
Rosen, D. E. and D. G. Buth. 1980. Empirical evolu- and electrophoretic heterogeneity. J. Histochem.
tionary research versus neo-Darwinian specula- Cytochem. 14:326-333.
tion. Syst.
Zool. 29:300-308. Sage, R. D. and R. K. Selander. 1979. Hybridization
Ross, J. and S. Leavitt. 1991. Improved sample recov- between species of the Rana pipiens complex in
ery in thermocycle sequencing protocols. central Texas. Evolution 33:1069-1088.
BioTechniques 11:618-619. Saghai-Maroof, M. A., K. M. Soliman, R. A. Jorgensen
Rost, F. W. D. 1992. Fluorescence Microscopy. Vol. 1. and R. W. Allard. 1984. Ribosomal DNA spacer-
Cambridge University Press, Cambridge. length polymorphisms in barley: Mendelian
Roy, M. S., E. Geffen, D. Smith, E. Ostrander and R. K. inheritance, chromosomal location, and popula-
Wayne. 1994. Patterns of differentiation and tion dynamics. Proc. Natl. Acad. Sci. USA
hybridization in North American wolf-like canids 81:8014-8019.
revealed by analysis of microsatellite loci. Mol. Saiki, R. K., S. Scharf, F. Faloona, K. B. Mullis, G. T.
Biol. Evol. 11:553-570. Horn, H. A. Erlich and N. Arnheim. 1985.
Ruano, G., A. S. Deinard, S. Tishkoff and K. K. Kidd. Enzymatic amplification of β-globin genomic
1994. Detection of DNA sequence variation via sequences and restriction site analysis for diagno-
deliberate heteroduplex formation from genomic sis of sickle cell anemia. Science 230:1350-1354.
DNAs amplified en masse in "population tubes". Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R.
PCR Meth. Applic. 3:225-231. Higuchi, G. T. Horn, K. B. Mullis and H. A. Erlich.
Russell, F. E. 1980. Snake Venom Poisoning. Lippincott, 1988. Primer-directed enzymatic amplification of
Philadelphia. DNA with a thermostable DNA polymerase.
Russell, F. E., J. A. Emery and T. B. Long. 1960. Some Science 239:487-491.
Saitou, N. 1988. Property and efficiency of the maxi- Frequency of insertion-deletion, transversion, and
mum likelihood method for molecular phylogeny. transition in the evolution of 5S ribosomal RNA.
J. Mol. Evol. 27:261-273. J. Mol. Evol. 7:133-149.
Saitou, N. 1990. Maximum likelihood methods. Meth. Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B. F.
Enzymol. 183:584-598. Lang and R. Cedergren. 1992. Gene order com-
Saitou, N. 1991. Molecular Systematics (book review). parisons for phylogenetic inference: Evolution of
Mol. Biol. Evol. 4:559-561. the mitochondrial genome. Proc. Natl. Acad. Sci.
Saitou, N. and T. Imanishi. 1989. Relative efficiencies USA 89:6575-6579.
of the Fitch-Margoliash, maximum-parsimony, Santos, F. R., S. D. J. Pena and J. T. Epplen. 1993.
maximum-likelihood, minimum-evolution, and Genetic and population study of a Y-linked
neighbor-joining methods of phylogenetic tree tetranucleotide repeat DNA polymorphism with a
construction in obtaining the correct tree. Mol. simple non-isotopic technique. Human Genet.
Biol. Evol. 6:514-525. 90:655-656.
Saitou, N. and M. Nei. 1987. The neighbor-joining Sarich, V. M. 1977. Rates, sample sizes, and the neu-
method: A new method for reconstructing phylo- trality hypothesis for electrophoresis in evolution-
genetic trees. Mol. Biol. Evol. 4:406-425. ary studies. Nature 265:24-28.
Salthe, S. N. and N. O. Kaplan. 1966. Immunology and Sarich, V. M. 1985. Rodent macromolecular systemat-
rates of enzyme evolution in the amphibia in rela- ics, pp. 423-452. In W. P. Luckett and J.-L.
tion to the origins of certain taxa. Evolution Hartenberger (eds.), Evolutionary Relationships
20:603-616. Among Rodents: A Multidisciplinary Analysis.
Sambrook, J., E. F. Fritsch and T. Maniatis. 1989. Plenum, New York.
Molecular Cloning. Cold Spring Harbor Press, Sarich, V. M. and J. E. Cronin. 1976. Molecular system-
Cold Spring Harbor, New York. atics of the primates, pp. 141-170. In M. Goodman
Sanderson, M. J. 1989. Confidence limits on phyloge- and R. E. Tashian (eds.), Molecular Anthropology.
nies: The bootstrap revisited. Cladistics 5:113-129. Plenum, New York.
Sanderson, M. J. and J. J. Doyle. 1992. Reconstruction Sarich, V. M. and A. C. Wilson. 1966. Quantitative
of organismal and gene phylogenies from data on immunochemistry and the evolution of primate
multigene families: Concerted evolution, homo- albumins: Micro-complement fixation. Science
plasy, and confidence. Syst. Biol. 41:4-17. 154:1563-1566.
Sanderson, M. J., B. G. Baldwin, G. Bharathan, C. S. Sarich, V. M. and A. C. Wilson. 1967. Immunological
Campbell, C. von Dohlen, D. Ferguson, J. M. time scale for hominid evolution. Science
Porter, M. F. Wojciechowski and M. J. Donoghue. 158:1200-1203.
1993. The growth of phylogenetic information Sarich, V. M., C. W. Schmid and J. Marks. 1989. DNA
and the need for a phylogenetic data base. Syst. hybridization as a guide to phylogenies: A critical
Biol. 42:562-568. analysis. Cladistics 5:3-32.
Sanger, F., S. Nicklen and A. R. Coulson. 1977. DNA Sarkar, G., H.-S. Yoon and S. S. Sommer. 1992.
sequencing with chain-terminating inhibitors. Screening for mutations by RNA single-strand
Proc. Natl. Acad. Sci. USA 74:5463-5467. conformation polymorphism (rSSCP):
Sankoff, D. 1975. Minimal mutation trees of sequences. Comparison with DNA-SSCP. Nucl. Acids Res.
SIAM J. Appl. Math. 28:35-42. 20:871-878.
Sankoff, D. and R. J. Cedergren. 1983. Simultaneous SAS Institute. 1985. SAS User's Guide: Statistics, Version
comparison of three or more sequences related by 5. SAS Institute, Cary, North Carolina.
a tree, pp. 253-263. In D. Sankoff and J. B. Kruskal Sasavage, N. 1992. Painting by the chromosome num-
(eds.), Time Warps, String Edits, and bers. J. NIH Res. 4:44-46.
Macromolecules: The Theory and Practice of Sequence Sattath, S. and A. Tversky. 1977. Additive similarity
Comparison. Addison-Wesley, Reading, trees. Psychometrika 42:319-345.
Massachusetts. Savage, J. M. 1973. The geographic distribution of
Sankoff, D. and P. Rousseau. 1975. Locating the ver- frogs: Patterns and predictions, pp. 351-455. In J.
tices of a Steiner tree in arbitrary space. Math. L. Vial (ed.), Evolutionary Biology of the Anurans:
Prog. 9:240-246. Contemporary Research on Major Problems.
Sankoff, D., C. Morel and R. J. Cedergren. 1973. University of Missouri Press, Columbia.
Evolution of 5S RNA and the non-randomness of Scanlan, B. E., L. R. Maxson and W. E. Duellman. 1980.
base replacement. Nature 245:232-234. Albumin evolution in marsupial frogs (Hylidae:
Sankoff, D., R. J. Cedergren and G. Lapalme. 1976. Gastrotheca). Evolution 34:222-229.
Schaal, B. A., W. J. Leverich and J. Nieto-Sotelo. 1987. of simple sequence DNA. Nucl. Acids Res.
Ribosomal DNA variation in the native plant 20:211-215.
Phlox divaricata. Mol. Biol. Evol. 4:611-621. Schlötterer, C., B. Amos and D. Tautz. 1991.
Schaaper, R. M. and R. L. Dunn. 1987. Spectra of spon- Conservation of polymorphic sequence loci in
taneous mutations in Escherichia coli strains defec- certain cetacean species. Nature 354:63-65.
tive in mismatch correction: The nature of in vivo Schlötterer, C., M. T. Hauser, A. von Haeseler and D.
replication errors. Proc. Natl. Acad. Sci. USA Tautz. 1994. Comparative evolutionary analysis of
84:6220-6224. rDNA ITS regions in Drosophila. Mol. Biol. Evol.
Schaeffer, S. W. and C. F. Aquadro. 1987. Nucleotide 11:513-522.
sequence of the alcohol dehydrogenase region of Schmid, M. and M. Guttenbach. 1988. Evolutionary
Drosophila pseudoobscura: Evolutionary change diversity of reverse (R) fluorescent chromosome
and evidence for an ancient duplication. Genetics bands in vertebrates. Chromosoma 97:101-124.
117:61-73. Schmid, M., J. Olert and C. Klett. 1979. Chromosome
Schaeffer, S. W. and E. L. Miller. 1991. Nucleotide banding in Amphibia III. Sex chromosomes in
sequence analysis of Adh gene estimates the time Triturus. Chromosoma 71:29-55.
of geographic isolation of the Bogota population Schoen, D. J. 1982. Genetic variation and the breeding
of Drosophila pseudoobscura. Proc. Natl. Acad. Sci. system of Gilia achilleifolia. Evolution 36:361-370.
USA 88:6097-6101. Schöniger, M. and A. von Haeseler. 1993. A simple
Schafer, M. and W. Kunz. 1985. rDNA in Locusta migra- method to improve the reliability of tree recon-
toria is very variable: Two introns and extensive structions. Mol. Biol. Evol. 10:471-483.
restriction site polymorphisms in the spacer. Schubert, F. R., K. Nieselt-Struwe and P. Gruss. 1993.
Nucl. Acids Res. 13:1251-1266. The antennapaedia-type homeobox genes have
Scharf, S. J. 1990. Cloning with PCR, pp. 84-91. In M. evolved from three precursors separated early in
A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. metazoan evolution. Proc. Natl. Acad. Sci. USA
White (eds.), PCR Protocols. Academic Press, New 90:143-147.
York. Schwaner, T. D. and H. C. Dessauer. 1982.
Scharf, S. J., C. M. Long and H. A. Erlich. 1988a. Comparative immunodiffusion survey of snake
Sequence analysis of the HLA-DRβ and HLA- transferrins focused on the relationships of the
DQβ loci from three Pemphigus vulgaris patients. natricines. Copeia 1982:541-549.
Human Immunol. 22:61-69. Schwaner, T. D., P. R. Baverstock, H. C. Dessauer and G.
Scharf, S. J., A. Friedman, C. Brautbar, F. Szafer, L. A. Mengden. 1985. Immunological evidence for the
Steinman, G. Horn, U. Gyllensten and H. A. phylogenetic relationships of Australian elapid
Erlich. 1988b. HLA class II allelic variation and snakes, pp. 177-184. In G. Grigg, R. Shine and H.
susceptibility to Pemphigus vulgaris. Proc. Natl. Ehmann (eds.), Biology of Australasian Frogs and
Acad. Sci. USA 85:3504-3508. Reptiles. Royal Zool. Soc., New South Wales.
Scherberg, N. H. and S. Refetoff. 1975. Radioiodine Schwartz, M. K., J. S. Nisselbaum and O. Bodansky.
labeling of ribopolymers for special applications 1963. Procedure for staining zones of activity of
in biology, pp. 343-359. In D. M. Prescott (ed.), glutamic oxaloacetic transaminase following elec-
Methods in Cell Biology, Vol. 10. Academic Press, trophoresis with starch gel. Am. J. Clin. Pathol.
New York. 40:103-106.
Schilling, E. E. and R. K. Jansen. 1989. Restriction frag- Schwartz, O. A. and K. B. Armitage. 1980. Genetic
ment analysis of chloroplast DNA and the sys- variation in social mammals: The marmot model.
tematics of Viguiera and related genera Science 207:665-667.
(Asteraceae: Heliantheae). Am. J. Bot. Schwartz, R. M. and M. O. Dayhoff. 1978. Origins of
76:1769-1778. prokaryotes, eukaryotes, mitochondria, and
Schleif, R. F. and P. C. Wensink. 1981. Practical Methods chloroplasts: A perspective is derived from pro-
in Molecular Biology. Springer-Verlag, Berlin. tein and nucleic acid sequence data. Science
Schlötterer, C. and J. Pemberton. 1994. The use of 199:395-403.
microsatellites for genetic analysis of natural pop- Schwengel, D. A., A. E. Jedlicka, E. J. Nanthakumar, J.
ulations, pp. 203-214. In B. Schierwater, B. Streit, L. Weber and R. C. Levitt. 1994. Comparison of
G. P. Wagner and R. DeSalle (eds.), Molecular fluorescence-based semi-automated genotyping
Ecology and Evolution: Approaches and Applications. of multiple microsatellite loci with autoradi-
Birkhauser Verlag, Basel, Switzerland. ographic techniques. Genomics 22:46-54.
Schlötterer, C. and D. Tautz. 1992. Slippage synthesis
Schwert, G. W. 1957. Recovery of native bovine serum Rat gene mapping using PCR-analyzed
albumin after precipitation with trichloracetic acid microsatellites. Genetics 131:701-721.
and solution in organic solvents. J. Am. Chem. Sessions, S. K. 1982. Cytogenetics of diploid and
Soc. 79:139-141. triploid salamanders of the Ambystoma jeffersoni-
Scribner, K. T., J. W. Arntzen and T. Burke. 1994. anum complex. Chromosoma 84:599-621.
Comparative analysis of intra- and interpopula- Sessions, S. K. and J. Kezer. 1987. Cytogenetic evolu-
tion genetic diversity in Bufo bufo, using allozyme, tion in the plethodontid salamander genus
single-locus microsatellite, minisatellite and mul- Aneides. Chromosoma 95:17-30.
tilocus minisatellite data. Mol. Biol. Evol. Sessions, S. K. and A. Larson. 1987. Developmental
11:737-748. correlates of genome size in plethodontid sala-
Sears, B. B. 1980. The elimination of plastids during manders and their implications for genome evo-
spermatogenesis and fertilization in the plant lution. Evolution 41:1239-1251.
kingdom. Plasmid 4:233-255. Seutin, G., B. F. Lang, D. P. Mindell and R. Morais.
Seber, G. A. F. 1982. The Estimation of Animal 1994. Evolution of the WANCY region in amniote
Abundance. Charles Griffin and Co., London. mitochondrial DNA. Mol. Biol. Evol. 11:329-340.
Seed, B., R. C. Parker and N. Davidson. 1982. Shaklee, J. B. 1984. Genetic variation and population
Representation of DNA sequences in recombinant structure in the damselfish, Stegastes fasciolatus,
DNA libraries prepared by restriction enzyme throughout the Hawaiian Archipelago. Copeia
partial digestion. Gene 19:201-209. 1984:629-640.
Selander, R. K., M. K. Smith, S. Y. Yang, W. E. Johnson Shaklee, J. B. and C. P. Keenan. 1986. A Practical
and J. R. Gentry. 1971. Biochemical polymorphism Laboratory Guide to the Techniques and Methodology
and systematics in the genus Peromyscus. I. of Electrophoresis and Its Application to Fish Fillet
Variation in the old-field mouse (Peromyscus Identification. CSIRO Marine Laboratories Publ.
polionotus). Stud. Genet. VI. University of Texas 177. Melbourne, Australia.
Pub. 7103:49-90. Shaklee, J. B. and C. S. Tamaru. 1981. Biochemical and
Selander, R. K., D. A. Caugant, H. Ochman, J. M. morphological evolution of Hawaiian bonefishes
Musser, M. N. Gilmour and T. S. Whittam. 1986. (Albula). Syst. Zool. 30:125-146.
Methods of multilocus enzyme electrophoresis for Shaklee, J. B. and G. S. Whitt. 1981. Lactate dehydro-
bacterial population genetics and systematics. genase isozymes of gadiform fishes: Divergent
Appl. Environ. Microbiol. 51:873-884. patterns of gene expression indicate a heteroge-
Sellers, P. 1974. On the theory and computation of evo- neous taxon. Copeia 1981:563-578.
lutionary distances. SIAM J. Appl. Math. Shaklee, J. B., K. L. Kepes and G. S. Whitt. 1973.
26:787-793. Specialized lactate dehydrogenase isozymes: The
Sensabaugh, G. F. 1982. Isozymes in forensic science, molecular and genetic basis for the unique eye
pp. 247-282. In M. Rattazzi, J. Scandalios and G. and liver LDHs of teleost fishes. J. Exp. Zool.
Whitt (eds.), Isozymes: Current Topics in Biological 185:217-240.
and Medical Research, Vol. 6. A. R. Liss, New York. Shaklee, J. B., F. W. Allendorf, D. C. Morizot and G. S.
Sensabaugh, G. F., A. C. Wilson and P. L. Kirk. 1971a. Whitt. 1992. Gene nomenclature for protein-cod-
Protein stability in preserved biological remains. ing loci in fish. Trans. Am. Fish. Soc. 119:2-15.
I. Survival of biologically active proteins in an 8- Sharkey, M. J. 1989. A hypothesis-independent method
year-old sample of dried blood. Int. J. Biochem. of character weighting for cladistic analysis.
2:545-557. Cladistics 5:63-86.
Sensabaugh, G. F., A. C. Wilson and P. L. Kirk. 1971b. Shaw, C. R. 1965. Electrophoretic variation in
Protein stability in preserved biological remains. enzymes. Science 149:936-943.
II. Modification and aggregation of proteins in an Shaw, C. R. and R. Prasad. 1970. Starch gel elec-
8-year-old sample of dried blood. Int. J. Biochem. trophoresis of enzymes-a compilation of recipes.
2:558-568. Biochem. Genet. 4:297-330.
Seperack, P., M. Slatkin and N. Arnheim. 1988. Shaw, D. D., A. D. Marchant, M. L. Arnold and N.
Linkage disequilibrium in human ribosomal Contreras. 1987. Chromosomal rearrangements,
genes: Implications for multigene family evolu- ribosomal genes and mitochondrial DNA:
tion. Genetics 119:943-949. Contrasting patterns of introgression across a nar-
Serikawa, T., T. Kuramoto, P. Hilbert, M. Mori, J. row hybrid zone, pp. 121-130. In P. E. Brandham
Yamada, C. J. Dubay, K. Lindpaintner, D. Ganten, J. and M. D. Bennett (eds.), Kew Chromosome
-L. Guenet, G. M. Lathrop and J. S. Beckman. 1992. Conference III. Allen and Unwin, .
Shaw, D. D., A. D. Marchant, M. L. Arnold, N. mastodon and woolly mammoth demonstrated
Contreras and B. Kohlmann. 1990. The control of immunologically. Paleobiology 11:429-437.
gene flow across a narrow hybrid zone: A selec- Shows, T. B. and F. H. Ruddle. 1968. Function of the
tive role for chromosomal rearrangement. Can. J. lactate dehydrogenase B gene in mouse erythro-
Zool. 68:1761-1769. cytes: Evidence for control by a regulatory gene.
Shaw, J., T. R. Meagher and P. Harley. 1987. Electro- Proc. Natl. Acad. Sci. USA 61:574.
phoretic evidence of reproductive isolation Shriver, M. D., J. Li, R. Chakraborty and E.
between two varieties of the moss Climacium Boerwinkle. 1993. VNTR allele frequency distrib-
americanum. Heredity 59:337-343. utions under the stepwise mutation model: A
Sheffield, V. C., D. R. Cox and R. M. Myers. 1989. computer simulation approach. Genetics
Attachment of a 40-base pair G+C rich sequence 134:983-993.
(GC-clamp) to genomic fragments by polymerase Sibley, C. G. and J. E. Ahlquist. 1981a. The phylogeny
chain reaction results in improved detection of and relationships of the ratite birds as indicated
single base changes. Proc. Natl. Acad. Sci. USA by DNA-DNA hybridization, pp. 301-335. In G.
86:232-236. G. E. Scudder and J. L. Reveal (eds.), Evolution
Sheffield, V. C., J. S. Beck, E. M. Stone and R. M. Today. Carnegie-Mellon University, Pittsburgh,
Myers. 1992. A simple and efficient method for Pennsylvania.
attachment of a 40 base pair G+C rich sequence to Sibley, C. G. and J. E. Ahlquist. 1981b. Instructions for
PCR amplified DNA. BioTechniques 12:386-387. specimen preservation for DNA extraction: A
Sheldon, F. H. 1987. Rates of single-copy DNA evolu- valuable source of data for systematics. Assoc.
tion in herons. Mol. Biol. Evol. 4:56-69. Syst. Collections Newsletter 9:44-45.
Sheldon, F. H. and A. H. Bledsoe. 1989. Indexes to the Sibley, C. G. and J. E. Ahlquist. 1983. The phylogeny
reassociation and stability of solution DNA and classification of birds based on the data of
hybrids. J. Mol. Evol. 29:328-343. DNA-DNA hybridization, pp. 245-292. In R. F.
Sheldon, F. H., B. Slikas, M. Kinnarney, F. B. Gill, Johnston (ed.), Current Ornithology, Vol. 1.
E. Zhao and B. Silverin. 1992. DNA-DNA hybrid- Plenum, New York.
ization evidence of phylogenetic relationships Sibley, C. G. and J. E. Ahlquist. 1987a. Avian phylogeny
among major lineages of Parus. Auk 109:173-185. reconstructed from comparisons of the genetic
Shera, E. B., N. K. Seitzinger, L. M. Davis, R. A. Keller material, DNA, pp. 95-121. In C. Patterson (ed.),
and S. A. Soper. 1990. Detection of single fluores- Molecules and Morphology in Evolution: Conflict or
cent molecules. Chem. Phys. Letters 175:553-557. Compromise? Cambridge University Press,
Shields, G. F. and A. C. Wilson. 1987. Calibration of Cambridge.
mitochondrial DNA evolution in geese. J. Mol. Sibley, C. G. and J. E. Ahlquist. 1987b. DNA hybridiza-
Evol. 24:212-217. tion evidence of hominoid phylogeny: Results
Shinozaki, K., M. Ohme, M. Tanaka, T. Wakasugi, N. from an expanded data set. J. Mol. Evol.
Hayashida, T. Matsubayashi, N. Zaita, J. 26:99-121.
Chunwongse, J. Obokata, K. Yamaguchi- Sibley, C. G. and J. E. Ahlquist. 1990. Phylogeny and
Shinozaki, C. Ohto, K. Torazawa, B. Y. Meng, M. Classification of Birds. Yale University Press, New
Sugita, H. Deno, T. Kamogashira, K. Yamada, J. Haven.
Kusuda, F. Takaiwa, A. Kato, N. Tohdoh, H. Sibley, C. G., K. W. Corbin, J. E. Ahlquist and A.
Shimada and M. Sugiura. 1986. The complete Ferguson. 1974. Birds, pp. 89-176. In C. A. Wright
nucleotide sequence of tobacco chloroplast (ed.), Biochemical and Immunological Taxonomy of
genome: Its gene organization and expression. Animals. Academic Press, New York.
EMBO J. 5:2043-2049. Sibley, C. G., J. E. Ahlquist and F. H. Sheldon. 1987.
Shochat, D. and H. C. Dessauer. 1981. Comparative DNA hybridization and avian phylogenetics:
immunological study of albumins of Anolis Reply to Cracraft. Evol. Biol. 21:97-125.
lizards of the Caribbean Islands. Comp. Biochem. Sibley, C. G., J. E. Ahlquist and B. L. Monroe Jr. 1988. A
Physiol. 68A:67-73. classification of living birds based on DNA-DNA
Shoemaker, J. S. and W. M. Fitch. 1989. Evidence from hybridization studies. Auk 105:409-423.
nuclear sequences that invariable sites should be Siciliano, M. J. and C. R. Shaw. 1976. Separation and
considered when sequence divergence is calculat- visualization of enzymes on gels, pp. 185-209. In
ed. Mol. Biol. Evol. 6:270-289. I. Smith (ed.), Chromatographic and Electrophoretic
Shoshani, J., J. M. Lowenstein, D. A. Walz and M. Techniques. Vol. 2. Wm. Heinemann Medical Books,
Goodman. 1985. Proboscidean origins of London.
Sidow, A. and A. C. Wilson. 1991. Compositional sta- Sites, J. W., Jr. and S. K.Davis. 1989. Phylogenetic rela-
tistics evaluated by computer simulation, pp. tionships and molecular variability within and
129-146. In M. M. Miyamoto and J. Cracraft among six chromosome races of Sceloporus gram-
(eds.), Phylogenetic Analysis of DNA Sequences. micus (Sauria, Iguanidae), based on nuclear and
Oxford University Press, New York, Oxford. mitochondrial markers. Evolution 43:296-317.
Sidow, A., T. Nguyen and T. P. Speed. 1992. Estimating Sites, J. W., Jr. and C. Moritz. 1987. Chromosome change
the fraction of invariable codons with a capture- and speciation revisited. Syst. Zool. 36:153-174.
recapture method. J. Mol. Evol. 35:253-260. Sites, J. W., Jr. and R. W. Murphy. 1991. Isozyme evi-
Silberman, J. D. and P. J. Walsh. 1992. Species identifi- dence for independently derived, duplicate
cation of spiny lobster phyllosome larvae via G3PDH loci among squamate reptiles. Can. J.
ribosomal DNA analysis. Mol. Mar. Biol. Zool. 69:2381-2396.
Biotechnol. 1:195-205. Sites, J. W., Jr., J. W. Bickham, B. A. Pytel, I. F.
Simmons, G. M., M. E. Kreitman, W. F. Quattlebaum Greenbaum and B. A. Bates. 1984. Biochemical
and N. Miyashita. 1989. Molecular analysis of the characters and the reconstruction of turtle phylo-
alleles of alcohol dehydrogenase along a cline in genies: Relationships among batagurine genera.
Drosophila melanogaster. I. Maine, North Carolina, Syst. Zool. 33:137-158.
and Florida. Evolution 43:392-392. Sites, J. W., Jr., R. L. Bezy and P. Thompson. 1986.
Simmons, N. B., M. J. Novacek and R. J. Baker. 1991. Nonrandom heteropolymer expression of lactate
Approaches, methods and the future of the dehydrogenase isozymes in the lizard family
Chiropteran monophyly controversy: A reply to J. Xantusiidae. Biochem. Syst. Ecol. 14:539-545.
D. Pettigrew. Syst. Zool. 40:239-241. Sites, J. W., Jr., D. M. Peccinini-Seale, C. Moritz, J. W.
Simon, C. 1979. Evolution of periodical cicadas: Wright and W. M. Brown. 1990. The evolutionary
Phylogenetic inferences based upon allozyme history of parthenogenetic Cnemidophorus lemnis-
data. Syst. Zool. 28:22-39. catus (Sauria, Teiidae). I. Evidence for a hybrid
Simon, C. 1991. Molecular systematics at the species origin. Evolution 44:889-905.
boundary: Exploiting conserved and variable Sites, J. W., Jr., S. K. Davis, D. W. Hutchison, B. A.
regions of the mitochondrial genome of animals Maurer and G. Lara. 1993. Parapatric hybridiza-
via direct sequencing of enzymatically amplified tion between chromosome races of the Sceloporus
DNA, pp. 33-71. In G. M. Hewitt, A. W. B. grammicus complex (Phrynosomatidae): Structure
Johnson and J. P. W. Young (eds.), Molecular of the Tulancingo transect. Copeia 1993:341-366.
Techniques in Taxonomy. NATO Advanced Studies Slade, R. W. 1992. Limited MHC polymorphism in the
Institute, H57. Springer, Berlin. southern elephant seal: Implications for MHC
Simon, C., S. Pääbo, T. D. Kocher and A. C. Wilson. evolution and marine mammal population biolo-
1990. Evolution of mitochondrial ribosomal RNA gy. Proc. Roy. Soc. London B 249:163-171.
in insects as shown by the polymerase chain reac- Slade, R. W., C. Moritz, A. Heideman and P. T.
tion, pp. 235-244. In M. Clegg and S. O'Brien Hale. 1993. Rapid assessment of single-copy
(eds.), Molecular Evolution. UCLA Symposium on nuclear DNA variation in diverse species. Mol.
Molecular and Cellular Biology, New Series. Vol. Ecol. 2:359-373.
122. Wiley-Liss, New York. Slade, R. W., C. Moritz and A. Heideman. 1994.
Simon, C., F. Frati, A. Beckenbach, B. Crespi, H. Liu Multiple nuclear-gene phylogenies: Application
and P. Flook. 1994. Evolution, weighting and phy- to pinnipeds and comparison with a mitochondri-
logenetic utility of mitochondrial gene sequences al DNA gene phylogeny. Mol. Biol. Evol.
and a compilation of conserved polymerase chain 11:341-356.
reaction primers. Ann. Entomol. Soc. Am. Slatkin, M. 1985. Gene flow in natural populations.
87:651-701. Annu. Rev. Ecol. Syst. 16:393-430.
Singer-Sam, J., R. L. Tanguay and A. D. Riggs. 1989. Slatkin, M. 1987. Gene flow and the geographic struc-
Use of Chelex to improve the PCR signal from a ture of natural populations. Science 236:787-792.
small number of cells. Amplifications 3:11. Slatkin, M. 1991. Inbreeding coefficients and coales-
Singh, G., N. Neckelmann and D. C. Wallace. 1987. cence times. Genet. Res. Camb. 58:167-175.
Conformational mutations in human mitochondr- Slatkin, M. 1993. Isolation by distance in equilibrium
ial DNA. Nature 329:270-272. and non-equilibrium populations. Evolution
Singh, R. S., R. C. Lewontin and A. A. Felton. 1976. 47:264-279.
Genetic heterogeneity within electrophoretic Slatkin, M. 1995. A measure of population subdivision
"alleles" of xanthine dehydrogenase in Drosophila based on microsatellite frequencies. Genetics
pseudoobscura. Genetics 84:609-629. 139:457-462.
Smith, M. J., J. D. G. Boom and R. A. Raff. 1990.
Slatkin, M. and N. H. Barton. 1989. A comparison of Single-copy DNA distance between two con-
three indirect methods for estimating average lev- generic sea urchin species exhibiting radically dif-
els of gene flow. Evolution 43:1349-1368. ferent modes of development. Mol. Biol. Evol.
Slatkin, M. and R. R. Hudson. 1991. Pairwise compar- 7:315-326.
isons of mitochondrial DNA sequences in stable Smith, M. J., A. Arndt, S. Gorski and E. Fajber. 1993.
and exponentially growing populations. Genetics The phylogeny of echinoderm classes based on
129:555-562. mitochondrial gene arrangements. J. Mol. Evol.
Slatkin, M. and W. P. Maddison. 1989. A cladistic mea- 36:545-554.
sure of gene flow inferred from the phylogenies Smith, M. L., J. N. Bruhn and J. B. Anderson. 1992. The
of alleles. Genetics 123:603-613. fungus Armillaria bulbosa is among the largest and
Slatkin, M. and W. P. Maddison. 1990. Detecting isola- oldest living organisms. Nature 356:428-431.
tion by distance using phylogenies of genes. Smith, M. W., C. E Aquadro, M. H. Smith, R. K.
Genetics 126:249-260. Chesser and W. J. Etges. 1982. Bibliography of
Siightom, J. L., T.W. Theisen, B. P. Koop and M. Electrophoretic Studies of Biochenzical Variation m
Goodman. 1987. Orangutan fetal globin genes. Natural Vertebrate Populations. Texas Tcch Press,
Nucleotide sequences reveal multiple gene con- Lubbock.
versions during horninid phylogeny. J. Biol. Smith, T. A,, J. WheIan P, J. Parry. 1992. Detection of
Chem. 2627472-7483. single-base mutations in mixed population of
Small, E., S. E. Warwick and B. Brookes. 1992. Isozyme cells: A comparison of SSCP and direct DNA
variation and alleged progenitor-derivative rela- sequencing. GATA 9:143-145.
tionships in the Medicago murex complex Smith, T. F., M. S. Waterman and W. M. Fitch. 1981.
(Fabaceae). Plant Syst. Evol. 181:33-43. Comparative biosequence metrics. J. Mol. Evol.
Smith, A. B. 1989. RNA sequence data in phylogenetic 18:38-46.
reconstruction: Testing the limits of its resolution. Smith, T. F., M. S. Waterman and C. Burks. 1985. The
Cladistics 5:321-344. statistical distribution of nucleic acid similarities.
Smith, A. B. 1994. Rooting molecular trees: Problems Nucl. Acids Res. 13:645-656.
and strategies. Biol. J. Linnean Soc. 51:279-292. Smith, V., M. Craxton, A. T. Bankier, C. M. Brown, W.
Smith, C. A., J. M. Jordan and J. Vinograd. 1971. In D. Rawlinson, M. S. Chee and B. G. Barrell. 1993.
vivo effects of intercalating drugs on the superhe- Preparation and fluorescent sequencing of M13
lix density of mitochondrial DNA isolated from clones: Microtiter methods. Meth. Enzymol.
human and mouse cells in culture. J. Mol. Biol. 218:173-187.
59:255-272. Smithies, O. 1955. Zone electrophoresis in starch gels:
Smith, G. R. 1992. Introgression in fishes: Significance Group variations in the serum proteins of normal
for paleontology, cladistics, and evolutionary individuals. Biochem. J. 61:629-641.
rates. Syst. Biol. 41:41-57. Smouse, P. E., T. E. Dowling, J. Tworek, W. R. Hoeh
Smith, J. J., J. S. Scott-Craig, J. R. Leadbetter, G. L. and W. M. Brown. 1991. Effects of intraspecific
Bush, D. L. Roberts and D. W. Fulbright. 1995. variation on phylogenetic inference: A likelihood
Characterization of random amplified polymor- analysis of mtDNA restriction site data in
phic DNA (RAPD) products from Xanthomonas cyprinid fishes. Syst. Zool. 40:393-409.
campestris: Implications for the use of RAPD prod- Sneath, P. H. A. and R. R. Sokal. 1973. Numerical
ucts in phylogenetic analysis. Mol. Phylogenet. Taxonomy. W. H. Freeman, San Francisco.
Evol. 3:135-145. Sneath, P. H. A., M. J. Sackin and R. Ambler. 1975.
Smith, J. S. C. and O. S. Smith. 1991. Restriction frag- Detecting evolutionary incompatibilities from
ment length polymorphisms can differentiate protein sequences. Syst. Zool. 24:311-332.
among U.S. maize hybrids. Crop Sci. 31:893-899. Snedecor, G. W. and W. G. Cochran. 1989. Statistical
Smith, M. F., W. K. Thomas and J. L. Patton. 1992. Methods. 8th ed. Iowa State University Press, Ames.
Mitochondrial-like sequence in the nuclear Sober, E. 1983. Parsimony in systematics:
genome of an Akodontine rodent. Mol. Biol. Evol. Philosophical issues. Annu. Rev. Ecol. Syst.
9:204-215. 14:335-357.
Smith, M. J., R. Nicholson, M. Stuerzl and A. Lui. 1982. Sober, E. 1989. Reconstructing the Past: Parsimony,
Single copy DNA homology in sea stars. J. Mol. Evolution, and Inference. MIT Press, Cambridge,
Evol. 18:92-101. Massachusetts.
Sogin, M. L. 1989. Evolution of eukaryotic microor- Soltis, D. E., P. S. Soltis and B. G. Milligan. 1992.
ganisms and their small subunit ribosomal RNAs. Intraspecific chloroplast variation: Systematic and
Amer. Zool. 29:487-499. phylogenetic implications, pp. 117-150. In P. S.
Sogin, M. L. 1990. Amplification of ribosomal RNA Soltis, D. E. Soltis and J. J. Doyle (eds.), Plant
genes for molecular evolution studies, pp. Molecular Systematics. Chapman and Hall, New
307-314. In M. A. Innis, D. H. Gelfand, J. J. York.
Sninsky and T. J. White (eds.), PCR Protocols: A Soltis, P. S. and D. E. Soltis. 1994. Plant molecular sys-
Guide to Methods and Applications. Academic Press, tematics: Inferences of phylogeny and evolution-
San Diego. ary processes. Evol. Biol. 28:139-194.
Sogin, M. L., H. J. Elwood and J. H. Gunderson. 1986. Song, K. M., T. C. Osborn and P. H. Williams. 1988.
Evolutionary diversity of eukaryotic small-sub- Brassica taxonomy based on nuclear restriction
unit rRNA genes. Proc. Natl. Acad. Sci. USA fragment length polymorphisms (RFLPs). 1.
83:1383-1387. Genome evolution of diploid and amphidiploid
Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. species. Theor. Appl. Genet. 75:784-794.
Alonso and D. A. Peattie. 1989. Phylogenetic Song, K. M., T. C. Osborn and P. H. Williams. 1990.
meaning of the kingdom concept: An unusual Brassica taxonomy based on nuclear restriction
ribosomal RNA from Giardia lamblia. Science fragment length polymorphisms (RFLPs). 3.
243:75-77. Genome relationships in Brassica and related gen-
Sokal, R. R. and F. J. Rohlf. 1981. Biometry, Second era and the origin of B. oleracea x B. rapa (syn.
Edition. W. H. Freeman and Co., San Francisco. campestris). Theor. Appl. Genet. 79:497-506.
Solignac, M., J. Génermont, M. Monnerot and J.-C. Soper, S. A., L. M. Davis, R. R. Fairfield, M. L.
Mounolou. 1984. Genetics of mitochondria in Hammond, C. A. Harger, J. H. Jett, R. A. Keller, B.
Drosophila: mtDNA inheritance in heteroplasmic L. Marone, J. C. Martin, H. L. Nutter, E. B. Shera
strains of D. mauritiana. Mol. Gen. Genet. and D. J. Simmons. 1991. Rapid DNA sequencing
197:183-188. based on single molecule detection. Proc. Int. Soc.
Soltis, D. E. and L. J. Rieseberg. 1986. Autopolyploidy Opt. Engin. 1435:168.
in Tolmiea menziesii (Saxifragaceae): Genetic Sourdis, J. and C. Krimbas. 1987. Accuracy of phyloge-
insights from enzyme electrophoresis. Am. J. Bot. netic trees estimated from DNA sequence data.
73:310-318. Mol. Biol. Evol. 4:159-168.
Soltis, D. E. and P. S. Soltis. 1989. Polyploidy, breeding Southern, E. M. 1975. Detection of specific sequences
systems, and genetic differentiation in homo- among DNA fragments separated by gel elec-
sporous pteridophytes, pp. 241-258. In D. E. Soltis trophoresis. J. Mol. Biol. 98:503-517.
and P. S. Soltis (eds.), Isozymes in Plant Biology. Spears, T., L. G. Abele and W. Kim. 1992. The mono-
Dioscorides Press, Portland, Oregon. phyly of brachyuran crabs: A phylogenetic study
Soltis, D. E., C. H. Haufler, D. C. Darrow and G. J. based on 18S rRNA. Syst. Biol. 41:446-461.
Gastony. 1983. Starch gel electrophoresis of ferns: Spencer, D. F., M. N. Schnare and M. W. Gray. 1984.
A compilation of grinding buffers, gel and elec- Pronounced structural similarities between the
trode buffers, and staining schedules. Am. Fern J. small ribosomal RNA genes of wheat mitochon-
73:9-27. dria and Escherichia coli. Proc. Natl. Acad. Sci.
Soltis, D. E., P. S. Soltis and B. D. Ness. 1989a. USA 81:493-497.
Chloroplast DNA variation and multiple origins Spencer, E. W., V. M. Ingram and C. Levinthal. 1966.
of autopolyploidy in Heuchera micrantha Electrophoresis: An accident and some precau-
(Saxifragaceae). Evolution 43:650-656. tions. Science 152:1722-1723.
Soltis, D. E., T. A. Ranker and B. D. Ness. 1989b. Spencer, N., D. A. Hopkinson and H. Harris. 1964.
Chloroplast DNA variation in a wild plant, Phosphoglucomutase polymorphism in man.
Tolmiea menziesii. Genetics 121:819-826. Nature 204:742-745.
Soltis, D. E., P. S. Soltis, M. T. Clegg and M. Durbin. Spencer, N., D. A. Hopkinson and H. Harris. 1968.
1990. rbcL sequence divergence and phylogenetic Adenosine deaminase polymorphism in man.
relationships in the Saxifragaceae sensu lato. Proc. Ann. Human Genet. 32:9-14.
Natl. Acad. Sci. USA 87:4640-4644. Spielman, R. S., J. V. Neel and F. H. F. Li. 1977.
Soltis, D. E., P. S. Soltis, T. G. Collier and M. L. Inbreeding estimation from population data:
Edgerton. 1991. Chloroplast variation within and Models, procedures and implications. Genetics
among genera of the Heuchera group 85:355-371.
(Saxifragaceae): Evidence for chloroplast transfer
and paraphyly. Am. J. Bot. 78:1150-1161.
Spinella, D. G. and R. C. Vrijenhoek. 1982. Genetic dis- Stecher, P. G., M. Windholz, D. S. Leahy, D. M. Bolton
section of clonally inherited genomes of and L. G. Eaton. 1968. Merck Index. 8th ed.
Poeciliopsis. II. Investigation of a silent car- Steel, M. 1994a. Recovering a tree from the Markov
boxylesterase allele. Genetics 100:279-286. leaf colourations it generates under a Markov
Spolsky, C., C. A. Phillips and T. Uzzell. 1992. model. Appl. Math. Lett. 7:19-23 (also published
Gynogenetic reproduction in hybrid mole sala- as May 1995, Research Rep. 103, Mathematics
manders (genus Ambystoma). Evolution Dept., University of Christchurch, NZ).
46:1935-1944. Steel, M. A. 1994b. The maximum likelihood point for
Springer, M. S. 1988. The phylogeny of diprotodontian a phylogenetic tree is not unique. Syst. Biol.
marsupials based on single-copy DNA-DNA 43:560-564.
hybridization and craniodental anatomy. Ph.D. Steel, M. A., M. D. Hendy and D. Penny. 1993a.
dissertation, University of California, Riverside. Parsimony can be consistent! Syst. Biol.
Springer, M. S. and J. A. W. Kirsch. 1989. Rates of sin- 42:581-587.
gle-copy DNA evolution in phalangeriform mar- Steel, M. A., P. J. Lockhart and D. Penny. 1993b.
supials. Mol. Biol. Evol. 6:331-341. Confidence in evolutionary trees from biological
Springer, M. S. and J. A. W. Kirsch. 1991. DNA sequence data. Nature 364:440-442.
hybridization, the compression effect, and the Steel, M. A., L. Szekely, P. L. Erdos and P. J. Waddell.
radiation of diprotodontian marsupials. Syst. 1993c. A complete family of phylogenetic invari-
Zool. 40:131-151. ants for any number of taxa under Kimura's 3ST
Springer, M. S. and C. Krajewski. 1989. DNA model. New Zealand J. Bot. 31:289-296.
hybridization in animal taxonomy: A critique Steffen, D. L., G. T. Cocks and A. C. Wilson. 1972.
from first principles. Quart. Rev. Biol. 64:291-318. Micro-complement fixation in Klebsiella classifica-
Springer, M. S., Kirsch, J. A. W., Aplin, K. and T. tion. J. Bacteriol. 110:803-808.
Flannery. 1990. DNA hybridization, cladistics, and Steinemann, M., W. Pinsker and D. Sperlich. 1984.
the phylogeny of phalangerid marsupials. J. Mol. Chromosome homologies within the Drosophila
Evol. 30:298-311. obscura group probed by in situ hybridization.
Springer, M. S., Davidson, E. H. and R. J. Britten. Chromosoma 91:46-53.
1992a. Calculation of sequence divergence from Steiner, W. W. M. and D. J. Joslyn. 1979.
the thermal stability of DNA heteroduplexes. J. Electrophoretic techniques for the genetic study
Mol. Evol. 34:379-382. of mosquitoes. Mosquito News 39:35-54.
Springer, M. S., McKay, G., Aplin, K. and J. A. W. Steinmuller, J., E. Schleiermacher and H. Scherthan.
Kirsch. 1992b. Relations among ringtail possums 1993. Direct detection of repetitive, whole chro-
(Marsupialia: Pseudocheiridae) based on DNA- mosome paint and telomere DNA probes by
DNA hybridisation. Australian J. Zool. 40:423-435. immunogold electron microscopy. Chromosome
St. Louis, V. L. and J. C. Barlow. 1988. Genetic differen- Res. 1:45-51.
tiation among ancestral and introduced popula- Stephen, W. P. 1974. Insects, pp. 303-349. In C. A.
tions of the Eurasian tree sparrow (Passer mon- Wright (ed.), Biochemical and Immunological
tanus). Evolution 42:266-276. Taxonomy of Animals. Academic Press, New York.
Stahl, D. A., D. J. Lane, G. J. Olsen and N. R. Pace. Stephens, J. C. 1985. Statistical methods of DNA
1984. Analysis of hydrothermal vent-associated sequence analysis: Detection of intragenic recom-
symbionts by ribosomal RNA sequences. Science bination or gene conversion. Mol. Biol. Evol.
224:409-411. 2:539-556.
Stallings, R. L., A. F. Ford, D. Nelson, D. C. Torney, C. Stewart, C.-B. 1995. Active ancestral molecules. Nature
E. Hildebrand and R. K. Moyzis. 1991. Evolution 374:12-13.
and distribution of (GT)n repetitive sequences in Stewart, C.-B. and A. C. Wilson. 1987. Sequence con-
mammalian genomes. Genomics 10:807-815. vergence and functional adaptation of stomach
Stanhope, M. J., J. Czelusniak, J.-S. Si, J. Nickerson and lysozymes from foregut fermenters. Cold Spring
M. Goodman. 1992. A molecular perspective on Harbor Symp. Quant. Biol. 52:891-899.
mammalian evolution from the gene encoding Stoneking, M., B. May and J. Wright. 1981. Loss of
interphotoreceptor retinoid binding protein, with duplicate gene expression in salmonids: Evidence
convincing evidence for bat monophyly. Mol. for a null allele polymorphism at the duplicate
Phylogenet. Evol. 1:148-160. aspartate aminotransferase loci in brook trout
Stanton, M. 1986. Unveiling the mystery of plant (Salvelinus fontinalis). Biochem. Genet.
paternity. Trends Ecol. Evol. 1:116-117. 19:1063-1077.
Stoneking, M., S. T. Sherry and L. Vigilant. 1992. Freshwater Fishes. Stanford University Press,
Geographic origin of human mitochondrial DNA Stanford.
revisited. Syst. Biol. 41:384-391. Swofford, D. L. and R. B. Selander. 1981. BIOSYS-1: A
Stowell, R. E. (ed.), 1965. Cryobiology, Fed. Proc. FORTRAN program for the comprehensive analy-
24:Sl-S324. sis of electrophoretic data in population genetics
Strobeck, C. and K. Morgan. 1978. The effect of intra- and systematics. J. Hered. 72:281-283.
genic recombination on the number of alleles in a Sytsma, K. J. 1990. DNA and morphology: Inference of
finite population. Genetics 88:829-844. plant phylogeny. Trends Ecol. Evol. 5:104-110.
Studier, J. A. and K. J. Keppler. 1988. A note on the Sytsma, K. J. and W. J. Hahn. 1994. Molecular
neighbor-joining algorithm of Saitou and Nei. Systematics: 1991-1993. Progr. Botany 55:307-333.
Mol. Biol. Evol. 5:729-731. Szekely, L. A., M. A. Steel and P. L. Erdos. 1993.
Sturtevant, A. H. and E. Novitski. 1941. The homolo- Fourier calculus on evolutionary trees. Adv. Appl.
gies of the chromosome elements in the genus Math. 14:200-216.
Drosophila. Genetics 26:517-541. Szmidt, A. E., R. Alden and J.-E. Hällgren. 1987.
Sullivan, J., K. E. Holsinger and C. Simon. 1995a. Paternal inheritance of chloroplast DNA in Larix.
Among-site rate variation and phylogenetic Plant Mol. Biol. 9:59-64.
analysis of 12S rRNA in sigmodontine rodents. Szymura, J. M. and N. H. Barton. 1986. Genetic analy-
Mol. Biol. Evol. 11:261-277. sis of a hybrid zone between the fire-bellied
Sullivan, J., K. E. Holsinger and C. Simon. 1995b. The toads, Bombina bombina and B. variegata, near
effect of topology on estimates of among-site rate Cracow in southern Poland. Evolution
variation. J. Mol. Evol. (in press). 40:1141-1159.
Sumner, A. T. 1990. Chromosome Banding. Unwin Szymura, J. M. and N. H. Barton. 1991. The genetic
Hyman, London. structure of the hybrid zone between the firebel-
Suzuki, H., K. Moriwaka and E. Nevo. 1987. lied toads Bombina bombina and B. variegata:
Ribosomal DNA (rDNA) spacer polymorphism in Comparisons between transects and between loci.
mole rats. Mol. Biol. Evol. 4:602-610. Evolution 45:237-261.
Swofford, D. L. 1981. On the utility of the distance
Wagner procedure, pp. 25-43. In V. A. Funk and Taberlet, P. and J. Bouvet. 1994. MtDNA polymor-
D. R. Brooks (eds.), Advances in Cladistics. Proc. phism, phylogeography and conservation genet-
First Meeting of the Willi Hennig Soc., New York ics of the brown bear Ursus arctos in Europe. Proc.
Bot. Garden, Bronx. Roy. Soc. London B 255:195-200.
Swofford, D. L. 1991. When are phylogeny estimates Tabor, S. and C. C. Richardson. 1987. DNA sequence
from molecular and morphological data incon- analysis with a modified bacteriophage T7 DNA
gruent?, pp. 295-333. In M. M. Miyamoto and J. polymerase. Proc. Natl. Acad. Sci. USA
Cracraft (eds.), Phylogenetic Analysis of DNA 84:4767-4771.
Sequences. Oxford University Press, New York. Taggart, J. B. and A. Ferguson. 1994. A composite
Swofford, D. L. 1993. PAUP: Phylogenetic Analysis DNA size reference ladder suitable for routine
Using Parsimony, version 3.1. Formerly distrib- application in DNA fingerprinting/profiling
uted by Illinois Natural History Survey, studies. Mol. Ecol. 3:271-272.
Champaign, Illinois. Tajima, F. 1983. Evolutionary relationship of DNA
Swofford, D. L. 1996. PAUP*: Phylogenetic Analysis sequences in finite populations. Genetics
Using Parsimony (and Other Methods), version 4.0. 105:437-460.
Sinauer Associates, Sunderland, Massachusetts. Tajima, F. and M. Nei. 1982. Biases of the estimates of
Swofford, D. L. and S. H. Berlocher. 1987. Inferring DNA divergence obtained by the restriction
evolutionary trees from gene frequency data enzyme technique. J. Mol. Evol. 18:115-120.
under the principle of maximum parsimony. Syst. Tajima, F. and M. Nei. 1984. Estimation of evolution-
Zool. 36:293-325. ary distance between nucleotide sequences. Mol.
Swofford, D. L. and W. P. Maddison. 1987. Biol. Evol. 1:269-285.
Reconstructing ancestral character states under Tajima, F. and N. Takezaki. 1994. Estimation of evolu-
Wagner parsimony. Math. Biosci. 87:199-229. tionary distance for reconstructing molecular
Swofford, D. L. and W. P. Maddison. 1992. Parsimony, phylogenetic trees. Mol. Biol. Evol. 11:278-286.
character-state reconstructions, and evolutionary Takahata, N. 1989. Gene genealogy in three related
inferences, pp. 186-223. In R. L. Mayden (ed.), populations: Consistency probability between
Systematics, Historical Ecology, and North American gene and population trees. Genetics 122:957-966.
Takahata, N. and S. R. Palumbi. 1985. Extranuclear dif- sis of natural populations. Genetics
ferentiation and gene flow in the finite island 120:1145-1154.
model. Genetics 109:441-457. Templeton, A. R., K. Shaw, B. Routman and S. K. Davis.
Tammar, A. R. 1974. Bile salts of Amphibia, pp. 67-76. 1989. The genetic consequences of habitat fragmen-
In M. Florkin and B. T. Scheer (eds.), Chemical tation. Ann. Missouri Bot. Garden 77:13-27.
Zoology. Academic Press, New York. Templeton, A. R., K. A. Crandall and C. F. Sing. 1992.
Tamura, K. and M. Nei. 1993. Estimation of the num- A cladistic analysis of phenotypic associations
ber of nucleotide substitutions in the control with haplotypes inferred from restriction endonu-
region of mitochondrial DNA in humans and clease mapping and DNA sequence data. III.
chimpanzees. Mol. Biol. Evol. 10:512-526. Cladogram estimation. Genetics 132:619-633.
Tateno, Y., M. Nei and F. Tajima. 1982. Accuracy of Templeton, A. R., B. Routman and C. A. Phillips. 1995.
estimated phylogenetic trees from molecular data. Separating population structure from population
I. Distantly related trees. J. Mol. Evol. 18:387-404. history: A cladistic analysis of the geographical
Tateno, Y., N. Takezaki and M. Nei. 1994. Relative effi- distribution of mitochondrial DNA haplotypes in
ciencies of the maximum-likelihood, neighbor the tiger salamander, Ambystoma tigrinum.
joining, and maximum-parsimony methods when Genetics 140:767-782.
substitution rate varies with site. Mol. Biol. Evol. Tereba, A. and B. J. McCarthy. 1973. Hybridization of
11:261-277. 125I-labeled ribonucleic acid. Biochemistry
Tautz, D. 1989. Hypervariability of simple sequences 12:4675-4679.
as a general source for polymorphic DNA mark- Therman, E. and M. Susman. 1993. Human
ers. Nucl. Acids Res. 17:6463-6471. Chromosomes: Structure, Behavior, and Effects.
Tavaré, S. 1986. Some probabilistic and statistical prob- Springer-Verlag, New York.
lems on the analysis of DNA sequences. Lec. Thomas, M. R. and N. S. South. 1993. Microsatellite
Math. Life Sci. 17:57-86. repeats in grapevine reveal DNA polymorphisms
Taylor, A. C., W. B. Sherwin and R. K. Wayne. 1994. when analyzed as sequence-tagged sites (STSs).
Genetic variation of microsatellite loci in a bottle- Theor. Appl. Genet. 86:985-990.
necked species: The northern hairy-nosed wom- Thomas, W. K. and A. T. Beckenbach. 1989. Variation
bat Lasiorhinus krefftii. Mol. Ecol. 3:277-290. in salmonid mitochondrial DNA: Evolutionary
Taylor, H. A., S. E. Riley, S. E. Parks and R. E. constraints and mechanisms of substitution. J.
Stevenson. 1978. Longterm storage of tissue sam- Mol. Evol. 29:233-245.
ples for cell culture. In Vitro 14:476-478. Thomas, W. K. and S. Paabo. 1993. DNA sequences
Tegelstrom, H. 1986. Mitochondrial DNA in natural from old tissue remains. Meth. Enzymol.
populations: An improved routine for the screen- 224:406-419.
ing of genetic variation based on sensitive silver- Thomas, W. K., S. Paabo, F. Villablanca and A. C.
staining. Electrophoresis 7:226-229. Wilson. 1990. Spatial and temporal continuity of
Templeton, A. R. 1983a. Convergent evolution and kangaroo rat populations shown by sequencing
non-parametric inferences from restriction frag- mitochondrial DNA from museum specimens. J.
ment and DNA sequence data, pp. 151-379. In B. Mol. Evol. 31:101-112.
Weir (ed.), Statistical Analysis of DNA Sequence Thompson, E. A. 1973. The method of minimum evo-
Data. Marcel Dekker, New York. lution. Ann. Human Genet. 36:333-340.
Templeton, A. R. 1983b. Phylogenetic inference from Thorne, J. S., D. L. Swofford, J. Felsenstein and B. S.
restriction endonuclease cleavage site maps with Wiegmann. 1996. The topology-dependent per-
particular reference to the humans and apes. mutation test for monophyly does not test for
Evolution 37:221-244. monophyly. (unpublished manuscript)
Templeton, A. R. 1987. Nonparametric inference from Thorpe, J. P. 1982. The molecular clock hypothesis:
restriction cleavage sites. Mol. Biol. Evol. Biochemical evaluation, genetic differentiation
4:315-319. and systematics. Annu. Rev. Ecol. Syst.
Templeton, A. R. 1993. The "Eve" hypothesis: A genet- 13:139-168.
ic critique and reanalysis. Am. Anthropol. Thorpe, R. S., D. P, McGregor, A. M. Cummings and
95:51-72. W. C. Jordan. 1994. DNA evolution and coloniza-
Templeton, A. R., C. F. Sing, A. Kessling and S. tion sequence of island lizards in relation to geo-
Humphries. 1988. A cladistic analysis of pheno- logical history: mtDNA RFLP, cytochrome b,
type associations with haplotypes inferred from cytochrome oxidase, 12S rRNA sequence, and
restriction endonuclease mapping. II. The analy- nuclear RAPD analysis. Evolution 48:230-240.
Tibbets, C. A. and T. E. Dowling. 1995. Effects of Tucker, P. K., B. K. Lee and E. M. Eicher. 1989. Y chro-
intrinsic and extrinsic factors on population frag- mosome evolution in the subgenus Mus (genus
mentation in three North American minnows Mus). Genetics 122:169-179.
(Teleostei: Cyprinidae). Evolution (in press). Tucker, P. K., R. D. Sage, J. Warner, A. C. Wilson and E.
Tilley, S. G. 1981. A new species of Desmognathus M. Eicher. 1992. Abrupt cline for sex chromo-
(Amphibia: Caudata: Plethodontidae) from the somes in a hybrid zone between two species of
southern Appalachian mountains. Occ. Pap. Mus. mice. Evolution 46:1146-1163.
Zool. University of Michigan 695:1-23. Turner, B. J. 1973. Genetic variation of mitochondrial
Tilley, S. G. and J. S. Hansman. 1976. Allozymic varia- aspartate aminotransferase in the teleost
tion and occurrence of multiple inseminations in Cyprinodon nevadensis. Comp. Biochem. Physiol.
populations of the salamander Desmognathus 44B:89-92.
ochrophaeus. Copeia 1976:734-741. Turner, B. J. 1974. Genetic divergence of Death Valley
Tilley, S. G. and P. M. Schwerdtfeger. 1981. pupfish species: Biochemical versus morphologi-
Electrophoretic variation in Appalachian popula- cal evidence. Evolution 28:281-294.
tions of the Desmognathus fuscus complex Turner, B. J. 1980. A multiple slicer for starch gels.
(Amphibia: Plethodontidae). Copeia Isozyme Bull. 13:113.
1981:109-119. Turner, B. J. 1984. Evolutionary genetics of artificial
Tillier, E. R. M. 1994. Maximum likelihood with multi- refugium populations of an endangered species,
parameter models of substitution. J. Mol. Evol. the desert pupfish. Copeia 1984:364-369.
39:409-417. Turner, B. J., R. R. Miller and E. M. Rasch. 1980.
Tillier, E. R. M. and R. A. Collins. 1995. Neighbor join- Significant differential gene duplication without
ing and maximum likelihood with RNA ancestral tetraploidy in a genus of Mexican fish.
sequences: Addressing the interdependence of Experientia 36:927-930.
sites. Mol. Biol. Evol. 12:7-15. Turner, B. J., J. S. Balsano, P. J. Monaco and E. M.
Timmis, J. N. and N. S. Scott. 1984. Promiscuous DNA: Rasch. 1983. Clonal diversity and evolutionary
Sequence homologies between DNA of separate dynamics in a diploid-triploid breeding complex
organelles. Trends Biochem. Sci. 9:271-273. of unisexual fishes (Poecilia). Evolution
Titus, T. A. and A. Larson. 1995. A molecular perspec- 37:798-809.
tive on the evolutionary radiation of the salaman- Turner, S., T. Burger-Wiersma, S. J. Giovannoni, L. R.
der family Salamandridae. Syst. Biol. 44:125-251. Mur and N. R. Pace. 1989. The relationship of a
Titus, T. A., D. M. Hillis and W. E. Duellman. 1989. prochlorophyte Prochlorothrix hollandica to green
Color polymorphism in neotropical treefrogs: An chloroplasts. Nature 337:380-382.
allozymic investigation of the taxonomic status of
Hyla favosa Cope. Herpetologica 45:17-23. Uetsuki, T., A. Naito, S. Nagata and Y. Kaziro. 1989.
Tjio, J. H. and A. Levan. 1956. The chromosome num- Isolation and characterization of the human chro-
ber of man. Hereditas 42:2-6. mosomal gene for polypeptide chain elongation
Tobler, J. E. and E. H. Grell. 1978. Genetics and physio- factor-1 alpha. J. Biol. Chem. 264:5791-5798.
logical expression of β-hydroxy acid dehydroge- Upholt, W. B. 1977. Estimation of DNA sequence
nase in Drosophila. Biochem. Genet. 16:333-342. divergence from comparison of restriction
Torroni, A., T. G. Schurr, C.-C. Yang, E. J. E. Szathmary, endonuclease digests. Nucl. Acids Res. 4:1257-1265.
R. C. Williams, M. S. Schanfield, G. A. Troup, W. Utter, F., P. Aebersold and G. Winans. 1987.
C. Knowler, D. N. Lawrence, K. M. Weiss and B. Interpreting genetic variation detected by elec-
C. Wallace. 1992. Native American mitochondrial trophoresis, pp. 21-46. In N. Ryman and F. Utter
DNA analysis indicates that the Amerind and the (eds.), Population Genetics and Fishery Management.
Nadene populations were founded by two inde- University of Washington Press, Seattle.
pendent migrations. Genetics 130:153-162. Uy, R. and F. Wold. 1977. Posttranslational covalent
Tracey, T. E. and L. I. Mulcahy. 1991. A simple method modification of proteins. Science 198:890-896.
for direct automated sequencing of PCR frag- Uzzell, T. and K. W. Corbin. 1971. Fitting discrete
ments. BioTechniques 11:68-75. probability distributions to evolutionary events.
Trask, B. J. 1991. Fluorescence in situ hybridization: Science 172:1089-1096.
Applications in cytogenetics and gene mapping.
Trends Genet. 7:149-154. Vaccino, P., M. Accerbi and M. Corbellini. 1993.
Tripathi, R. L. 1991. Alternative dideoxy sequencing of Cultivar identification in T. aestivum using highly
double-stranded DNA. BioTechniques 12:390-391. polymorphic DNA probes. Theor. Appl. Genet.
87:833-836.
Valdés, A. M. and D. Piñero. 1992. Phylogenetic esti- Vawter, L. and W. M. Brown. 1986. Nuclear and mito-
mation of plasmid exchange in bacteria. chondrial DNA comparisons reveal extreme rate
Evolution 46:641-656. variation in the molecular clock. Science
Valdés, A. M., M. Slatkin and N. B. Freimer. 1993. 234:194-196.
Allele frequencies at microsatellite loci: The step- Vawter, L. and W. M. Brown. 1993. Rates and patterns
wise mutation model revisited. Genetics of base change in the small subunit ribosomal
133:737-749. RNA gene. Genetics 134:597-608.
Valentine, J. E., M. J. Boyle and W. A. Sewell. 1992. Verheyen, G. R., B. Kempenaers, T. Burke, M. Van Den
Presence of single-stranded DNA in PCR prod- Broeck, C. Van Broeckhoven and A. Dhont. 1994.
ucts of slow mobility. BioTechniques 13:222-224. Identification of hypervariable single locus min-
Van Beneden, R. J. and D. A. Powers. 1989. Structural isatellite DNA probes in the blue tit Parus
and functional differentiation of two clinally dis- caeruleus. Mol. Ecol. 3:137-143.
tributed glucosephosphate isomerase allelic Verkerk, A. and 20 others. 1991. Identification of a
isozymes from the teleost fish Fundulus heterocli- gene (FMR-1) containing a CGG repeat coincident
tus. Mol. Biol. Evol. 6:155-270. with a breakpoint cluster region exhibiting length
Van de Peer, Y., J. M. Neefs, P. De Rijk and R. De variation in Fragile X syndrome. Cell 65:905-914.
Wachter. 1993. Reconstructing evolution from Verma, R. S. and A. Babu. 1989. Human Chromosomes:
eukaryotic small-ribosomal-subunit RNA Manual of Basic Techniques. Pergamon Press, New
sequences: Calibration of the molecular clock. J. York.
Mol. Evol. 37:221-232. Vigilant, L., M. Stoneking, H. Harpending, K. Hawkes
Van Den Bussche, R. A., D. M. Hillis, J. P. Huelsenbeck and A. C. Wilson. 1991. African populations and
and R. J. Baker. 1996. Base compositional bias and the evolution of human mitochondrial DNA.
phylogenetic analyses: A test of the "flying DNA" Science 253:1503-1507.
hypothesis. (unpublished manuscript) Vilgalys, R. and B. L. Sun. 1994. Ancient and recent
Van Laarhoven, P. J. M. and E. H. L. Aarts. 1987. patterns of geographic speciation in the oyster
Simulated Annealing: Theory and Applications. mushroom Pleurotus revealed by phylogenetic
Reidel, Boston. analysis of ribosomal DNA sequences. Proc. Natl.
Vanlerberghe, F., B. Dod, P. Boursot, M. Bellis and F. Acad. Sci. USA 91:4599-4603.
Bonhomme. 1986. Absence of Y-chromosome Vogler, A. P. and R. DeSalle. 1994. Evolution and phy-
introgression across the hybrid zone between Mus logenetic information content of the ITS-1 region
musculus and Mus domesticus. Genet. Res. in the tiger beetle Cicindela dorsalis. Mol. Biol.
48:191-197. Evol. 11:393-405.
van Ooyen, A., V. Kwee and R. Nusse. 1985. The Volpi, E. V. and A. Baldini. 1993. MULTIPRINS: A
nucleotide sequence of the human int-1 mamma- method for multicolor primed in situ labelling.
ry oncogene; evolutionary conservation of coding Chromosome Res. 1:257-260.
and non-coding sequences. EMBO J. 4:2905-2909. Vrijenhoek, R. C. 1989. Genetic diversity and the ecolo-
van Tets, P. and I. M. Cowan. 1966. Some sources of gy of asexual populations, pp. 175-197. In K.
variation in the blood sera of deer (Odocoileus) as Wöhrmann and S. Jain (eds.), Population Biology
revealed by starch gel electrophoresis. Can. J. and Evolution. Springer-Verlag, New York.
Zool. 44:631-647. Vrijenhoek, R. C., M. E. Douglass and G. K. Meffe.
Van Treuren, R., R. Bijlsma, W. Van Delden and N. J. 1985. Conservation genetics of endangered popu-
Ouborg. 1991. The significance of genetic erosion lations in Arizona. Science 229:400-402.
in the process of extinction. 1. Genetic differentia-
tion in Salvia pratensis and Scabiosa columbaria in Waddell, P. J. 1995. Statistical methods of phylogenetic
relation to population size. Heredity 66:181-189. analysis, including Hadamard conjugations,
Varley, J. M., H. C. Macgregor, I. Nardi, C. Andrews LogDet transforms, and maximum likelihood.
and H. P. Erba. 1980. Cytological evidence of tran- Ph.D. dissertation, Massey University.
scription of highly repeated DNA sequences dur- Waddell, P. J. and M. D. Hendy. 1995. Families of order
ing the lampbrush stage in Triturus cristatus 2t-1 bipartition invariants under the generalised
carnifex. Chromosoma 80:289-307. Kimura 3P model. Massey University Mathe-
Vassart, G., M. Georges, R. Monsieur, W. Brocas, A. S. matical and Information Sciences Report, Series B.
Lequarre and D. Christophe. 1987. A sequence in Waddell, P. J. and D. Penny. 1996a. Evolutionary trees
M13 phage detects hypervariable minisatellites in of apes and humans from DNA sequences. In A.
human and animal DNA. Science 235:683-684. J. Lock and C. R. Peters (eds.), Handbook of
Symbolic Evolution. Clarendon Press, Oxford (in comparisons of higher plant plastocyanins.
press). Phytochemistry 15:137-141.
Waddell, P. J. and D. Penny. 1996b. Extending Wallace, D. G., M.-C. King and A. C. Wilson. 1973.
Hadamard conjugations to model sequence evo- Albumin differences among ranid frogs:
lution with variable rates across sites. Available Taxonomic and phylogenetic implications. Syst.
by anonymous ftp from onyx.si.edu. Zool. 22:1-13.
Waddell, P. J. and M. A. Steel. 1995. General time Walldorf, U. and B. T. Hovemann. 1990. Apis mellifera
reversible distances allowing a distribution of cytoplasmic elongation factor 1α (EF-1α) is close-
rates across sites. Research Report, Department of ly related to Drosophila melanogaster EF-1α. FEBS
Mathematics and Statistics, Canterbury Letters 267:245-249.
University. Walsh, P. S., D. A. Metzger and R. Higuchi. 1991.
Waddell, P. J., D. Penny, M. D. Hendy and G. Arnold. Chelex 100 as a medium for simple extraction of
1994. The sampling distributions and covariance DNA for PCR-based typing from forensic materi-
matrix of phylogenetic spectra. Mol. Biol. Evol. al. BioTechniques 10:506-513.
11:630-642. Walter, H., W. Selby and J. R. Fransisco. 1965.
Wagner, A., N. Blackstone, P. Cartwright, M. Dick, B. Altered electrophoretic mobilities of some ery-
Misof, P. Snow, G. P. Wagner, J. Bartels, M. Murtha throcytic enzymes as a function of their age.
and J. Pendleton. 1994. Surveys of gene families Nature 208:76-77.
using polymerase chain reaction: PCR selection Waples, R. S. 1989. A generalized approach for esti-
and PCR drift. Syst. Biol. 43:250-261. mating effective population size from temporal
Wagner, D. B., G. R. Furnier, M. A. Saghai-Maroof, S. changes in allele frequency. Genetics 121:379-391.
M. Williams, B. F. Dancik and R. W. Allard. 1987. Ward, R. D., B. J. McAndrew and G. P. Wallis. 1979.
Chloroplast DNA polymorphism in lodgepole Purine nucleoside phosphorylase variation in the
and jack pines and their hybrids. Proc. Natl. brook lamprey, Lampetra planeri (Bloch)
Acad. Sci. USA 84:2097-2100. (Petromyzone, Agnatha): Evidence for a trimeric
Wagner, W. H. 1983. Reticulistics: The recognition of enzyme structure. Biochem. Genet. 17:251-256.
hybrids and their role in cladistics and classifica- Ware, V. C., B. W. Tague, C. G. Clark, R. L. Gourse, R.
tion, pp. 63-79. In N. I. Platnick and V. A. Funk C. Brand and S. A. Gerbi. 1983. Sequence analysis
(eds.), Advances in Cladistics: Proceedings of the of 28S ribosomal DNA from the amphibian
Second Meeting of the Willi Hennig Society. Xenopus laevis. Nucl. Acids Res. 11:7795-7817.
Columbia University Press, New York. Waterman, M. S. 1984. General methods of sequence
Wahlund, S. 1928. The combination of populations comparison. Bull. Math. Biol. 46:473-500.
and the appearance of correlation examined from Waterman, M. S., T. F. Smith and W. A. Beyer. 1976.
the standpoint of the study of heredity. Hereditas Some biological sequence metrics. Adv. Math.
11:65-106. 20:367-387.
Wainwright, P. O., G. Hinkle, M. L. Sogin and S. K. Waterman, M. S., T. F. Smith, M. Singh and W. A.
Stickel. 1993. Monophyletic origins of the meta- Beyer. 1977. Additive evolutionary trees. J. Theor.
zoa: An evolutionary link with fungi. Science Biol. 64:199-213.
260:340-342. Waterman, M. S., J. Joyce and M. Eggert. 1991.
Wake, D. B. and A. Larson. 1987. Multidimensional Computer alignment of sequences, pp. 59-72. In
analysis of an evolving lineage. Science 238:42-48. M. M. Miyamoto and J. Cracraft (eds.),
Wake, D. B., G. Roth and M. H. Wake. 1983. On the Phylogenetic Analysis of DNA Sequences. Oxford
problem of stasis in organismal evolution. J. University Press, Oxford.
Theor. Biol. 101:211-224. Watson, P. R (ed.) 1978. Artificial breeding of non-
Wake, D. B., K. P. Yanev and M. M. Frelow. 1989. domestic animals. Symp. Zool. Soc. London
Sympatry and hybridization in a "ring species": 43:1-376.
The plethodontid salamander Ensatina Watt, J. L. and G. S. Stephen. 1986. Lymphocyte cul-
eschscholtzii, pp. 134-157. In D. Otte and J. A. ture for chromosome analysis, pp. 39-55. In D. E.
Endler (eds.), Speciation and Its Consequences. Rooney and B. H. Czepulkowski (eds.), Human
Sinauer, Sunderland, Massachusetts. Cytogenetics. IRL Press, Oxford.
Wakeley, J. 1993. Substitution rate variation among Watt, W. B. 1972. Intragenic recombination as a source
sites in hypervariable region 1 of human mito- of population genetic variability. Am. Nat.
chondrial DNA. J. Mol. Evol. 37:613-623. 106:737-753.
Wallace, D. G. and D. Boulter. 1976. Immunological
Watt, W. B. 1977. Adaptation at specific loci. I. Natural to isolations from mammalian, insect, higher
selection on phosphoglucose isomerase of Colias plant, algal, yeast, and bacterial sources. Analyt.
butterflies: Biochemical and population aspects. Biochem. 152:376-385.
Genetics 87:177-194. Wegnez, M. 1987. Letter to the editor. Cell 51:516.
Watt, W. B. 1983. Adaptation at specific loci. II. Weining, S. and P. Langridge. 1991. Identification and
Demographic and biochemical elements in the mapping of polymorphisms in cereals based on
maintenance of the Colias PGI polymorphism. the polymerase chain reaction. Theor. Appl.
Genetics 103:691-724. Genet. 82:209-216.
Watt, W. B. 1985. Bioenergetics and evolutionary Weir, B. S. 1989. Sampling properties of gene diversity,
genetics: Opportunities for new synthesis. Am. pp. 23-42. In A. H. D. Brown, M. T. Clegg, A. L.
Nat. 125:118-143. Kahler and B. S. Weir (eds.), Plant Population
Watt, W. B. 1986. Power and efficiency as indices of fit- Genetics, Breeding and Genetic Resources. Sinauer,
ness in metabolic organization. Am. Nat. Sunderland, Massachusetts.
127:629-653. Weir, B. S. 1990. Genetic Data Analysis. Sinauer,
Watt, W. B., P. A. Carter and S. M. Blower. 1985. Sunderland, Massachusetts.
Adaptation at specific loci. IV. Differential mating Weir, B. S. 1992a. Population genetics in the forensic
success among glycolytic allozyme genotypes of DNA debate. Proc. Natl. Acad. Sci. USA
Colias butterflies. Genetics 109:157-175. 89:11654-11659.
Watt, W. B., P. A. Carter and K. Donohue. 1986. Weir, B. S. 1992b. Independence of VNTR alleles
Females' choice of "good genotypes" as mates is defined as fixed bins. Genetics 130:873-887.
promoted by an insect mating system. Science Weir, B. S. 1994. Effects of inbreeding on forensic cal-
233:1187-1190. culations. Annu. Rev. Genet. 28:597-621.
Wayne, R. K., S. K. George, D. Gilbert, P. W. Collins, S. Weir, B. S. and C. C. Cockerham. 1984. Estimating F-
D. Kovach, D. Girman and N. Lehman. 1991a. A statistics for the analysis of population structure.
morphological and genetic study of the island Evolution 38:1358-1370.
fox, Urocyon littoralis. Evolution 45:1849-1868. Weir, B. S. and C. C. Cockerham. 1989a. Complete
Wayne, R. K., B. Van Valkenburgh and S. J. O'Brien. characterization of disequilibrium at two loci, pp.
1991b. Molecular distance and divergence time in 86-110. In M. W. Feldman (ed.), Mathematical
carnivores and primates. Mol. Biol. Evol. 8:297-319. Evolutionary Theory. Princeton University Press,
Weber, J. L. 1990. Informativeness of human Princeton.
(dC-dA)n-(dG-dT)n polymorphisms. Genomics Weir, B. S. and C. C. Cockerham. 1989b. Analysis of
7:524-530. disequilibrium coefficients, pp. 45-51. In W. G.
Weber, J. L. and P. E. May. 1989. Abundant class of Hill and T. F. C. Mackay (eds.), Evolution and
human DNA polymorphism which can be typed Animal Breeding: Reviews on Molecular and
using the polymerase chain reaction. Am. J. Quantitative Genetics Approaches in Honour of Alan
Human Genet. 44:388-396. Robertson. Commonwealth Agricultural Bureaux,
Weber, J. L. and C. Wong. 1993. Mutation of human Slough, United Kingdom.
short tandem repeats. Human Mol. Genet. Weisburg, W. G., M. E. Dobson, J. E. Samuel, G. A.
2:1123-1128. Dasch, L. P. Mallavia, L. Mandelco, J. E. Sechrest,
Weeden, N. F. 1983. Plastid isozymes, pp. 139-158. In E. Weiss and C. R. Woese. 1989a. Phylogenetic
S. D. Tanksley and T. J. Orton (eds.), Isozymes in diversity of the Rickettsiae. J. Bacteriol.
Plant Genetics and Breeding, Part A. Elsevier, 171:4202-4206.
Amsterdam. Weisburg, W. G., J. G. Tully, D. L. Rose, J. P. Petzel, H.
Weeden, N. F. and J. F. Wendel. 1989. Genetics of Oyaizu, D. Yang, L. Mandelco, J. Sechrest, T. G.
plant isozymes, pp. 46-72. In D. E. Soltis and P. S. Lawrence, J. Van Etten, J. Maniloff and C. R.
Soltis (eds.), Isozymes in Plant Biology. Dioscorides Woese. 1989b. A phylogenetic analysis of the
Press, Portland, Oregon. mycoplasmas: Basis for their classification. J.
Weeden, N. F., J. J. Doyle and M. Lavin. 1989. Bacteriol. 171:6455-6467.
Distribution and evolution of a glucosephosphate Weisburg, W. G., S. M. Barns, D. A. Pelletier and D. J.
isomerase duplication in the Leguminosae. Lane. 1991. 16S ribosomal DNA amplification for
Evolution 45:1637-1651. phylogenetic study. J. Bacteriol. 173:697-703.
Weeks, D. F., N. Beerman and O. M. Griffith. 1986. A Weisman, L. S., B. M. Krummel and A. C. Wilson.
small scale five-hour procedure for isolating mul- 1986. Evolutionary shift in the site of cleavage of
tiple samples of CsCl-purified DNA: Application prelysozyme. J. Biol. Chem. 261:2309-2313.
Weller, S. J., D. P. Pashley, J. A. Martin and J. L. White, M. B., M. Carvalho, D. Derse, S. J. O'Brien and
Constable. 1994. Phylogeny of noctuoid moths M. Dean. 1992. Detecting single base substitutions
and the utility of combining independent nuclear as heteroduplex polymorphisms. Genomics
and mitochondrial genes. Syst. Biol. 43:194-211. 12:301-306.
Werman, S. D., Davidson, E. H. and R. J. Britten. 1990. White, M. E., J. J. Bull, I. J. Molineux and D. M. Hillis.
Rapid evolution in a fraction of the Drosophila 1991. Experimental phylogenies from T7 bacterio-
nuclear genome. J. Mol. Evol. 30:281-289. phage, pp. 935-943. In E. Dudley (ed.), The Unity
Werth, C. R. 1985. Implementing an isozyme laborato- of Evolutionary Biology: Proceedings of the Fourth
ry at a field station. Virginia J. Sci. 36:53-76. International Congress of Systematic and
Werth, C. R. and M. D. Windham. 1987. A new model Evolutionary Biology. Dioscorides Press, Portland.
for speciation in polyploid pteridophytes result- White, M. J. D. 1973. Animal Cytology and Evolution.
ing from reciprocal silencing of homoeologous 3rd ed. Cambridge University Press, Cambridge.
genes. Am. J. Bot. 74:713-714. White, M. W., S. D. Mane and R. C. Richmond. 1988.
Werth, C. R., S. I. Guttman and W. H. Eshbaugh. Studies of esterase 6 in Drosophila melanogaster.
1985a. Electrophoretic evidence of reticulate evo- XVIII. Biochemical differences between the slow
lution in the Appalachian Asplenium complex. and fast allozymes. Mol. Biol. Evol. 5:41-62.
Syst. Bot. 10:184-192. White, T. J., N. Arnheim and H. A. Erlich. 1989. The
Werth, C. R., S. I. Guttman and W. H. Eshbaugh. polymerase chain reaction. Trends Genet.
1985b. Recurring origins of allopolyploid species 5:185-189.
in Asplenium. Science 228:731-733. White, T. J., T. Bruns, S. Lee and J. Taylor. 1990.
Wetmur, J. G. and N. Davidson. 1968. Kinetics of Amplification and direct sequencing of fungal
renaturation of DNA. J. Mol. Biol. 31:349-370. ribosomal RNA genes for phylogenetics, pp.
Wetton, J. H., R. E. Carter, D. T. Parkin and D. Walters. 315-322. In M. A. Innis, D. H. Gelfand, J. J.
1987. Demographic study of a wild house spar- Sninsky and T. J. White (eds.), PCR Protocols.
row population by DNA fingerprinting. Nature Academic Press, New York.
327:147-149. Whitehouse, E. and T. Spears. 1991. A simple method
Wheeler, Q. D. 1995. Systematics, the scientific basis for removing oil from cycle sequencing reactions.
for inventories of biodiversity. Biodiv. Conserv. BioTechniques 11:616-628.
4:476-489. Whitkus, R., J. Doebley and J. F. Wendel. 1994. Nuclear
Wheeler, W. C. 1989. The systematics of insect riboso- DNA markers in systematics and evolution, pp.
mal DNA, pp. 307-321. In B. Fernholm, K. Bremer 116-141. In L. Phillips and I. K. Vasil (eds.), DNA-
and H. Jornvall (eds.), The Hierarchy of Life. Based Markers in Plants. Kluwer Academic
Elsevier, Amsterdam. Publishers, Dordrecht, The Netherlands.
Wheeler, W. C. 1990a. Combinatorial weights in phy- Whitmore, D. H. 1990. Isoelectric focusing of proteins,
logenetic analysis: A statistical parsimony proce- pp. 81-105. In D. H. Whitmore (ed.),
dure. Cladistics 6:269-275. Electrophoretic and Isoelectric Focusing Techniques in
Wheeler, W. C. 1990b. Nucleic acid sequence phyloge- Fisheries Management. CRC Press, Boca Raton,
ny and random outgroups. Cladistics 6:363-368. Florida.
Wheeler, W. C. and D. Gladstein. 1992. MALIGN. Whitt, G. S. 1970. Developmental genetics of the lac-
American Museum of Natural History, New York. tate dehydrogenase isozymes of fish. J. Exp. Zool.
Wheeler, W. C. and D. Gladstein. 1994. MALIGN: A 175:1-36.
multiple sequence alignment program. J. Hered. Whitt, G. S. 1981. Evolution of isozyme loci and their
85:417. differential regulation, pp. 271-289. In G. G. E.
Wheeler, W. C. and R. L. Honeycutt. 1988. Paired Scudder and J. L. Reveal (eds.), Evolution Today,
sequence difference in ribosomal RNAs: Proceedings of the Second International Congress of
Evolutionary and phylogenetic implication. Mol. Systematic and Evolutionary Biology. Hunt Inst. Bot.
Biol. Evol. 5:90-96. Documentation, Carnegie-Mellon University,
Wheeler, W. C. and K. Nixon. 1995. A novel method Pittsburgh, Pennsylvania.
for economical diagnosis of cladograms under Whitt, G. S. 1983. Isozymes as probes and participants
Sankoff optimization. Cladistics 10:207-213. in developmental and evolutionary genetics, pp.
Wheeler, W. C., J. Gatesy and R. DeSalle. 1995. Elision: 1-40. In M. C. Rattazzi, J. G. Scandalios and G. S.
A method for accommodating multiple molecular Whitt (eds.), Isozymes: Current Topics in Biological
sequence alignments with alignment-ambiguous and Medical Research, Vol. 10. Genetics and
sites. Mol. Phylogenet. Evol. 4:1-9. Evolution. A. R. Liss, New York.
Whitt, G. S. 1987. Species differences in isozyme tissue Wilkinson, M. 1994. Common cladistic information
patterns: Their utility for systematic and evolu- and its consensus representation: Reduced Adams
tionary analyses, pp. 1-26. In M. C. Rattazzi, J. G. and reduced cladistic consensus trees and pro-
Scandalios and G. S. Whitt (eds.), Isozymes: files. Syst. Biol. 43:343-368.
Current Topics in Biological and Medical Research, Williams, J. G. K., A. R. Kubelik, K. J. Livak, J. A.
Vol. 15. Genetics, Development, and Evolution. A. R. Rafalski and S. V. Tingey. 1990. DNA polymor-
Liss, New York. phisms amplified by arbitrary primers are useful
Whitt, G. S., J. B. Shaklee and C. L. Markert. 1975. as genetic markers. Nucl. Acids Res.
Evolution of the lactate dehydrogenase isozymes 18:6531-6535.
of fishes, pp. 381-400. In C. L. Markert (ed.), Williams, P. L. and W. M. Fitch. 1989. Finding the min-
Isozymes IV: Genetics and Evolution. Academic imal change in a given tree, pp. 453-470. In B.
Press, New York. Fernholm, K. Bremer and H. Jornvall (eds.), The
Whittemore, A. and B. Schaal. 1991. Interspecific gene Hierarchy of Life. Elsevier, Amsterdam.
flow in sympatric oaks. Proc. Natl. Acad. Sci. USA Williams, S. M., R. DeSalle and C. Strobeck. 1985.
88:2540-2544. Homogenization of geographical variants at the
Wichman, H. A., S. S. Potter and D. S. Pine. 1985. Mys, nontranscribed spacer of rDNA in Drosophila mer-
a family of mammalian transposable elements catorum. Mol. Biol. Evol. 2:338-346.
isolated by phylogenetic screening. Nature Williams, S. M., G. R. Furnier, E. Fuog and C. Strobeck.
317:77-81. 1987. Evolution of the ribosomal DNA spacers of
Wichman, H. A., C. T. Payne, O. A. Ryder, M. J. Drosophila melanogaster: Different patterns of vari-
Hamilton, M. Maltbie and R. J. Baker. 1991. ation on X and Y chromosomes. Genetics
Genomic distribution of heterochromatic 116:225-232.
sequences in equids: Implications to rapid chro- Williams, S. M., R. W. DeBry and J. L. Feder. 1988. A
mosomal evolution. J. Hered. 82:369-377. commentary on the use of ribosomal DNA in sys-
Wienberg, J. R., A. Jauch, R. Stanyon and T. Cremer. tematic studies. Syst. Zool. 37:60-63.
1990. Molecular cytotaxonomy of primates by Wilson, A. C., V. M. Sarich and L. R. Maxson. 1974.
chromosomal in situ suppression hybridization. The importance of gene rearrangement in evolu-
Genomics 8:347-350. tion: Evidence from studies of rates of chromoso-
Wienberg, J. R., C. A. Stanyon and T. Cremer. 1992. mal, protein, and anatomical evolution. Proc.
Homologies in human and Macaca fuscata chro- Natl. Acad. Sci. USA 71:3028-3030.
mosomes revealed by in situ suppression Wilson, A. C., G. L. Bush, S. M. Case and M. C. King.
hybridization with human chromosome-specific 1975. Social structuring of mammalian popula-
DNA libraries. Chromosoma 101:265-270. tions and rate of chromosomal evolution. Proc.
Wiens, J. J. and P. T. Chippindale. 1994. Combining Natl. Acad. Sci. USA 72:5061-5065.
and weighting characters and the prior agreement Wilson, A. C., S. S. Carlson and T. J. White. 1977.
approach revisited. Syst. Biol. 43:564-566. Biochemical evolution. Annu. Rev. Biochem.
Wiens, J. J. and D. M. Hillis. 1996. Accuracy of parsi- 46:473-639.
mony analysis using morphological data: A reap- Wilson, A. C., R. L. Cann, S. M. Carr, M. George, Jr., U.
praisal. Syst. Bot. (in press). B. Gyllensten, K. Helm-Bychowski, R. C. Higuchi,
Wiens, J. J. and T. A. Titus. 1992. A phylogenetic analy- S. R. Palumbi, E. M. Prager, R. D. Sage and M.
sis of Spea (Anura: Pelobatidae). Herpetologica Stoneking. 1985. Mitochondrial DNA and two
47:21-28. perspectives on evolutionary genetics. Biol. J.
Wiley, E. O. 1978. The evolutionary species concept Linnean Soc. 26:375-400.
reconsidered. Syst. Zool. 27:17-26. Wilson, A. C., H. Ochman and E. M. Prager. 1987a.
Wiley, E. O. 1982. Phylogenetics: The Theory and Practice Molecular time scale for evolution. Trends Genet.
of Phylogenetic Systematics. Wiley Interscience, 3:241-247.
New York. Wilson, A. C., M. Stoneking, R. L. Cann, E. M. Prager,
Wiley, E. O. 1988a. Vicariance biogeography. Annu. S. U. Ferris, L. A. Wrischnik and R. G. Higuchi.
Rev. Ecol. Syst. 19:513-542. 1987b. Mitochondrial clans and the age of our
Wiley, E. O. 1988b. Parsimony analysis and vicariance common mother, pp. 158-164. In F. Vogel and K.
biogeography. Syst. Zool. 37:271-290. Sperling (eds.), Human Genetics. Proceedings of the
Wilhelmi, R. W. 1942. The application of the precipitin Seventh International Congress, Berlin, 1986.
technique to theories concerning the origin of ver- Springer-Verlag, Berlin.
tebrates. Biol. Bull. 82:179-189.
Wilson, A. C., E. A. Zimmer, E. M. Prager and T. D. Wolfe, K. H., W.-H. Li and P. M. Sharp. 1987. Rates of
Kocher. 1989. Restriction mapping in the molecu- nucleotide substitutions vary greatly among plant
lar systematics of mammals: A retrospective mitochondrial, chloroplast, and nuclear DNAs.
salute, pp. 407-419. In B. Fernholm, K. Bremer Proc. Natl. Acad. Sci. USA 84:9054-9058.
and H. Jornvall (eds.), The Hierarchy of Life. Proc. Wolfe, K. H., W.-H. Li and P. M. Sharp. 1989a. Rates of
Nobel Symp. 70. Elsevier, Amsterdam. synonymous substitution in plant nuclear genes.
Wilson, E. O. 1985. Time to revive systematics. Science J. Mol. Evol. 29:208-211.
230:1227. Wolfe, K. H., M. Gouy, Y.-W. Yang, P. M. Sharp and W.-
Wilson, E. O. 1986. The value of systematics. Science H. Li. 1989b. Date of the monocot-dicot diver-
231:1057. gence estimated from chloroplast DNA sequence
Wilson, F. R., G. S. Whitt and C. L. Prosser. 1973. data. Proc. Natl. Acad. Sci. USA 86:6201-6205.
Lactate dehydrogenase and malate dehydroge- Wolfe, K. H., C. W. Morden and J. D. Palmer. 1992.
nase isozyme patterns in tissues of temperature Function and evolution of a minimal plastid
acclimated goldfish (Carassius auratus). Comp. genome from a nonphotosynthetic parasitic plant.
Biochem. Physiol. 46B:105-116. Proc. Natl. Acad. Sci. USA 89:10648-10652.
Wilson, G. N., M. holler, L. L. Szyura and R. D. Wolff, K., S. H. Rogstad and B. A. Schaal. 1994.
Schmickel. 1984. Individual and evolutionary Population and species variation of minisatellite
variation of primate ribosomal DNA transcription DNA in Plantago. Theor. Appl. Genet. 87:733-740.
initiation regions. Mol. Biol. Evol. 1:221-237. Wolstenholme, D. R. 1992. Animal mitochondrial
Wilson, V. G. and G. Schuller. 1992. PCR-SSCP screen- DNA: Structure and evolution. Int. Rev. Cytol.
ing of M13 plaques. Focus (BRL) 16:59-62. 141:173-216.
Wintero, A. K., M. Fredholm and P. D. Thomsen. 1992. Wolstenholme, D. R., Clary, D. O., MacFarlane, J. L.,
Variable (dG-dT)n(dC-dA)n sequences in the Wahleithner, J. A. and L. Wilcox. 1985.
porcine genome. Genomics 12:281-288. Organization and evolution of invertebrate mito-
Wirz, T., U. Brandle, T. Soldati, J. P. Hossle and J.-C. chondrial genomes, pp. 61-69. In E. Quagliariello,
Perriard. 1990. A unique chicken B-creatine kinase E. C. Slater, F. Palmieri, C. Saccone and A. M.
gene gives rise to two B-creatine kinase isopro- Kroon (eds.), Achievements and Perspectives of
teins with distinct N-termini by alternative splic- Mitochondrial Research. Elsevier, Amsterdam.
ing. J. Biol. Chem. 265:11656-11666. Womack, J. E. 1983. Post-translational modification of
Woese, C. R. and G. J. Olsen. 1986. Archaebacterial enzymes: Processing genes, pp. 175-186. In M. C.
phylogeny: Perspectives on the urkingdoms. Syst. Rattazzi, J. G. Scandalios and G. S. Whitt (eds.),
Appl. Microbiol. 7:161-177. Isozymes: Current Topics in Biological and Medical
Woese, C. R., Maniloff, J. and Zablen, L. B. 1980. Research, Vol. 7. Molecular Structure and Regulation.
Phylogenetic analysis of the mycoplasmas. Proc. A. R. Liss, New York.
Natl. Acad. Sci. USA 77:494-498. Worthington Wilmer, J., C. Moritz, L. Hall and J. Toop.
Woese, C. R., R. Gupta, G. M. Hahn, W. Zillig and J. 1994. Extreme population structuring in the
Tu. 1984a. The phylogenetic relationships of three threatened Ghost Bat, Macroderma gigas: Evidence
sulfur-dependent Archaebacteria. Syst. Appl. from mitochondrial DNA. Proc. Roy. Soc. London
Microbiol. 5:97-105. B 257:193-198.
Woese, C. R., E. Stackebrandt, W. G. Weisburg, B. J. Wong, C., C. E. Dowling, R. K. Saiki, R. G. Higuchi, H.
Paster, M. T. Madigan, V. J. Fowler, C. M. Hahn, P. A. Ehrlich and H. H. Kazazian, Jr. 1987.
Blanz, R. Gupta, K. H. Nealson and G. E. Fox. Characterization of β-thalassaemia mutations
1984b. The phylogeny of purple bacteria: The alpha using direct genomic sequencing of amplified sin-
subdivision. Syst. Appl. Microbiol. 5:315-326. gle copy DNA. Nature 330:384-386.
Woese, C. R., W. G. Weisburg, B. J. Paster, C. M. Hahn, Woodruff, D. S. 1989. Genetic anomalies associated
R. S. Tanner, N. R. Krieg, H.-P. Koops, H. Harms with Cerion hybrid zones: The origin and mainte-
and E. Stackebrandt. 1984c. The phylogeny of nance of new electromorphic variants called
purple bacteria: The beta subdivision. Syst. Appl. hybrizymes. Biol. J. Linnean Soc. 36:281-294.
Microbiol. 5:327-336. Woodruff, R. C. and J. N. Thompson. 1980. Hybrid
Woese, C. R., W. G. Weisburg, C. M. Hahn, B. J. Paster, release of mutator activity and the genetic structure
L. B. Zablen, B. J. Lewis, T. J. Macke, W. Ludwig of natural populations. Evol. Biol. 12:129-162.
and E. Stackebrandt. 1985. The phylogeny of pur- Woodward, S. R., N. J. Weyand and M. Bunnell. 1994.
ple bacteria: The gamma subdivision. Syst. Appl. DNA sequences from Cretaceous Period bone
Microbiol. 6:25-33. fragments. Science 266:1229-1232.
Workman, P. L. and J. D. Niswander. 1970. Population Yang, Z. 1993. Maximum likelihood estimation of phy-
studies on southwestern Indian tribes. II. Local logeny from DNA sequences when substitution
genetic differentiation in the Papago. Am. J. rates differ over sites. Mol. Biol. Evol.
Human Genet. 22:24-29. 10:1396-1401.
Wothe, D. D., H. Charbonneau and B. M. Shapiro. Yang, Z. 1994a. Estimating the pattern of nucleotide
1990. The phosphocreatine shuttle of sea urchin substitution. J. Mol. Evol. 39:105-111.
sperm: Flagellar creatine kinase resulted from a Yang, Z. 1994b. Maximum likelihood phylogenetic
gene triplication. Proc. Natl. Acad. Sci. USA estimation from DNA sequences with variable
87:5203-5207. rates over sites: Approximate methods. J. Mol.
Wright, C. A. (ed.). 1974. Biochemical and Immunological Evol. 39:306-314.
Taxonomy of Animals. Academic Press, New York. Yang, Z. 1994c. Statistical properties of the maximum
Wright, C. A. (ed.). 1978. Biochemical and Immunological likelihood method of phylogenetic estimation and
Taxonomy of Animals. 2nd ed. Academic Press, comparison with distance matrix methods. Syst.
New York. Biol. 43:329-342.
Wright, D. A., C. M. Richards, J. S. Frost, A. M. Yang, Z. 1995. PAML, Phylogenetic Analysis by
Camozzi and B. J. Kunz. 1983. Genetic mapping Maximum Likelihood (PAML), version 1.1. Institute
in amphibians, pp. 287-311. In M. C. Rattazzi, J. of Molecular Genetics, Pennsylvania State
G. Scandalios and G. S. Whitt (eds.), Isozymes: University, University Park.
Current Topics in Biological and Medical Research, Yang, Z., N. Goldman and A. E. Friday. 1994.
Vol. 7. Molecular Structure and Regulation. A. R. Comparison of models for nucleotide substitution
Liss, New York. used in maximum likelihood phylogenetic esti-
Wright, J. W., C. Spolsky and W. M. Brown. 1983. The mation. Mol. Biol. Evol. 11:316-324.
origin of the parthenogenetic lizard Yonenaga-Yassuda, Y., S. Kasahara, T. M. Chu and M.
Cnemidophorus laredoensis inferred from mitochon- T. Rodrigues. 1988. High-resolution RBG-banding
drial DNA analysis. Herpetologica 39:410-416. pattern in the genus Tropidurus (Sauria,
Wright, S. 1943. Isolation by distance. Genetics Iguanidae). Cytogenet. Cell Genet. 48:68-71.
28:114-138. Young, A. and R. Blakesley. 1991. Sequencing plasmids
Wright, S. 1951. The genetical structure of populations. from single colonies with the dsDNA cycle
Ann. Eugen. 15:323-354. sequencing system. Focus (BRL) 13:137.
Wright, S. 1978. Evolution and the Genetics of Youvan, D. C. and J. E. Hearst. 1979. Reverse tran-
Populations. University of Chicago Press, Chicago. scriptase pauses at N2-methylguanine during in
Wrischnik, L. A., R. G. Higuchi, M. Stoneking, H. A. vitro transcription of Escherichia coli 16S ribosomal
Erlich, N. Arnheim and A. C. Wilson. 1987. RNA. Proc. Natl. Acad. Sci. USA 76:3571-3574.
Length mutations in human mitochondrial DNA: Yu, L.-X. and H. T. Nguyen. 1994. Genetic variation
Direct sequencing of enzymatically amplified detected with RAPD markers among upland and
DNA. Nucl. Acids Res. 15:529-542. lowland rice cultivars. Theor. Appl. Genet.
Wu, C.-I. 1991. Inferences of species phylogeny in rela- 87:688-692.
tion to segregation of ancient polymorphisms.
Genetics 127:429-435. Zevering, C. E., C. Moritz, A. Heideman and R. Sturm.
Wu, C.-I. and W.-H. Li. 1985. Evidence for higher rates 1991. Parallel origin of duplications and the for-
of nucleotide substitution in rodents than in man. mation of pseudogenes in mitochondrial DNA
Proc. Natl. Acad. Sci. USA 82:1741-1745. from parthenogenetic lizards (Heteronotia binoei:
Wu, C.-I. and N. Maeda. 1987. Inequality in mutation Gekkonidae). J. Mol. Evol. 33:431-441.
rates of the two strands of DNA. Nature Zhan, T. S., S. Pathak and J. C. Liang. 1984. Induction
327:169-170. of G-bands in the chromosomes of Melanoplus san-
Wulf, J. H. and R. G. Cutler. 1975. Altered protein guinipes (Orthoptera, Acrididae). Can. J. Genet.
hypothesis of mammalian aging processes: I. Cytol. 26:354-359.
Thermal stability of glucose-6-phosphate dehy- Zharkikh, A. 1994. Estimation of evolutionary dis-
drogenase in C57BL/6J mouse tissue. Exp. tances between nucleotide sequences. J. Mol.
Gerontol. 10:101-117. Evol. 39:325-329.
Zharkikh, A. and W.-H. Li. 1992a. Statistical properties
Yang, D., Y. Oyaizu, H. Oyaizu, G. J. Olsen and C. R. of bootstrap estimation of phylogenetic variability
Woese. 1985. Mitochondrial origins. Proc. Natl. from nucleotide sequences. I. Four taxa with a
Acad. Sci. USA 82:4443-4447. molecular clock. Mol. Biol. Evol. 9:1119-1147.
Zharkikh, A. and W.-H. Li. 1992b. Statistical properties Zimmerman, W. 1930. Die Phylogenie der Pflanzen. G.
of bootstrap estimation of phylogenetic variability Fischer, Jena, Germany.
from nucleotide sequences. II. Four taxa without a Zimmerman, W. 1931. Arbeitsweise der botanischen
molecular clock. J. Mol. Evol. 35:356-366. Phylogenetik und anderer
Zharkikh, A. and W.-H. Li. 1993. Inconsistency of the Gruppierungswissenschaften, pp. 941-1053. In E.
maximum-parsimony method. The case of five Abderhalden (ed.), Handbuch der biologischen
taxa with a molecular clock. Syst. Biol. 42:113-125. Arbeitsmethoden. Urban and Schwarzenberg,
Zharkikh, A. and W.-H. Li. 1995. Estimation of confi- Berlin.
dence in phylogeny: The full-and-partial boot- Zimmerman, W. 1934. Research on phylogeny of
strap technique. Mol. Phylogenet. Evol. 4:44-63. species and of single characters. Am. Nat.
Zhen, L. and R. T. Swank. 1993. A simple and high 68:381-384.
yield method for recovering DNA from agarose Zimmerman, W. 1943. Die Methoden der
gels. BioTechniques 14:894-898. Phylogenetik, pp. 20-56. In G. Heberer (ed.), Die
Zhu, D., B. G. M. Jamieson, A. Hugall and C. Moritz. Evolution der Organismen. G. Fischer, Jena,
1994. Sequence evolution and phylogenetic signal Germany.
in control region and cytochrome b sequences of Zorn, A. M. and P. A. Krieg. 1991. PCR analysis of
rainbowfishes (Melanotaeniidae). Mol. Biol. Evol. alternative splicing pathways: Identification of
11:672-683. artifacts generated by heteroduplex formation.
Zischler, H., M. Hoss, O. Handt, A. van Haeseler, A. C. BioTechniques 11:181-183.
van der Kuyl, J. Goudsmit and S. Paabo. 1995. Zouros, E., K. R. Freeman, A. O. Ball and G. H.
Detecting dinosaur DNA. Science 268:1192-1193. Pogson. 1992. Direct evidence for extensive pater-
Zimmer, E. A., S. L. Martin, S. M. Beverly, Y. W. Kan nal mitochondrial DNA inheritance in the marine
and A. C. Wilson. 1980. Rapid duplications and mussel Mytilus. Nature 359:412-414.
loss of genes coding for α chains of hemoglobin. Zuckerkandl, E. and L. Pauling. 1962. Molecular dis-
Proc. Natl. Acad. Sci. USA 77:2158-2162. ease, evolution and genic heterogeneity, pp.
Zimmer, E. A., C. J. Rivin and V. E. Walbot. 1981. A 189-225. In M. Kasha and B. Pullman (eds.),
DNA isolation procedure suitable for most higher Horizons in Biochemistry. Academic Press, New
plant species. Plant Mol. Biol. Newsl. 2:93-96. York.
Zimmer, E. A., E. R. Jupe and V. Walbot. 1988. Zuckerkandl, E. and L. Pauling. 1965. Evolutionary
Ribosomal gene structure, variation and inheri- divergence and convergence in proteins, pp.
tance in maize and its ancestors. Genetics 97-166. In V. Bryson and H. J. Vogel (eds.),
120:1125-1136. Evolving Genes and Proteins. Academic Press, New
Zimmer, E. A., R. K. Hamby, M. L. Arnold, D. A. York.
Leblanc and E. C. Theriot. 1989. Ribosomal RNA Zurawski, G. and M. T. Clegg. 1987. Evolution of high-
phylogenies and flowering plant evolution, pp. er-plant chloroplast DNA-encoded genes.
205-214. In B. Fernholm, K. Bremer and H. Implications for structure-function and phyloge-
Jornvall (eds.), The Hierarchy of Life. Proc. Nobel netic studies. Annu. Rev. Plant Physiol.
Symp. 70. Elsevier, Amsterdam. 38:391-418.
Page numbers in boldface type indi- Algorithm(s) divergence, 59
cate formulas for stock solutions. defined, 408 historical events, 56
exact, 478-482 spatial and temporal, 56
AAT (aspartate aminotransferase), 100 "greedy," 482 ALP (alkaline phosphatase), 99
ABl,l-l, 510 vs. optimality criteria, 408-409, ALPDH (alanopine dehydrogenase), 99
Acetate-Tris-EDTA (ATE), 378 415-416 Aluminum foil tissue packaging, 30-31
Acetone powder, 37-38 single-tree, 529 Alu-repeats, preserved fragments, 39
N-acetyl-β-glucosaminidase (βGA), 97 Algorithmic methods, 486-493 Ambiguities, pairwise sequence compar-
Acid phosphatase (ACP), 97-98 additive trees, 487-493 ison, 454-455
ACOH (aconitate hydratase), 98 cluster analysis, 486-487 Amine-citrate
ACP (acid phosphatase), 97-98 distance Wagner, 493 morpholine, 117
Acrylamide solutions, 378 neighbor joining, 488-490 propanol, 117
AC TC repeats, 271 Alignment, 412 Ammonium acetate (NH4Ac), 378
Actin primer, 241-242 gaps, 453 Ammonium persulfate (APS), 319
Activators, electrophoretic, 73 Alignment algorithms, 331 Amplification. see also Nuclear DNA am-
ADA (adenosine deaminase), 98 global, 375 plifications; Polymerase chain reac-
Adaptation, and allozyme variation, local, 374 tion; specific types
58-59 Alkaline electrophoresis buffers, 201 direct, 225
Addlhon, stepw~se,482-483 Alkaline phosphatase (ALP), 99,139, reverse transcriptase, 336
Additive distances, 172,447-448, 142,265 AMPPD [disodium 3-(4-methoxyspiro-
487493 Allele(s),51,413 [1,2-dioxetane-3-2'-
Additrve tree methods, 448-452 as characters, 413 tri~ydo(3.3.1.l~~~)decan14
algoathmic, 487-493 coalescence, 266 yl)pheny11,265
Fitch -Mdrgohash metl~od,448-451 cryptic, 64 Ancestral polymorphisms, 277
rninlmum evolution (ME) method, homology, 256-259 Ancient DNA, 228
451-452 locus nomenclature, 95 Anesthesia, 33-34
systematic error, 495 null, 66,255 Aneuploidy, 62
Additivity, tree, 447 rare, 67 Animal mitochondrial DNA (mtDNA).
Additivity assumption, 172 segregation, detection limits, 61-66 see Mitochondrial DNA, animal
Adenosine deaminase (ADA), 98 sorting, 9-10 Animal tissue collection, 33-35
Adenylate kinase (AK), 9598 variants, LDH isozyme, 93 Anion, 53
ADH (alcohol dehydrogenase), 99 variation, 23 Annealing
Agarose gel electrophoresis (AGE), 55, Allele frequency(ies), 389-390, 413-414 extension, 227
262. see also Isozyme electrophoresis between-population heterogeneity, 65 PCR cycle, 208-209
protocol, 291-297 gene diversity, 386 temperature, PCR, 227-228
Agar overlay, 86-89, 97 geographic variation, 56 thermal cycling, 227
AGE. see Agarose gel electrophoresis population structure, 54, 56 Anode, 52
AgNOR receding, 414 Anonymous single-copy RKPs, 218
banding protocol, 157-158 space, 425 Anonymous single-copy sequence(s)
developer, 166 variance, 20-21 amplification, 218
AC-TC repeats, 271 Allopatric species/populations, 22-24 population-level comparisons,
AK (adenylate kinase), 98, 100 Allopolyploids, 62 272-275
Akaike information criterion, 440 Allozyme(s), 51 Anti-avidin antibody, biotinylated goat,
Alanopine dehydrogenase (ALPDH), 99 characters, 59 166
ALAT (alanine aminotransferase), 99 clock, 60, 538-539 Antibody(ies). see also Polyclonal; spe-
Albumin data, 413 cific types, e.g., Anti-avidin, Biotin,
freeze-thaw stability, 38 data, parsimony, 425-426 Monoclonal
stability in alcohol, 33 electrophoresis, 3, 19 chromosome painting, 126
Alcohol dehydrogenase (ADH), 99 synapomorphic, 65 cross-contamination, 148
Alcohol tissue preservation, 33 Allozyme variation gold-conjugate, 123
Aldolase primer, 243-245 adaptive differences, 58-59 monoclonal, 123, 126
Index 637
polyclonal, 123,126 probes, 265 subtree pruning and regrafting, 484
primary, 139-140 specific antibodies, 139 tree bisection and reconnection, 485
Antigens, chromosome painting, 126 Biotin-avidin label, 123, 136 BrdU-banding, salamander embryo pro-
Apomorphism, 277 Biotin label, 126, 135-136, 142 tocol, 158
APS (ammonium persulfate), 319 Biotin-labeled probe hybridization buffer, BrdU label, 126, 135-136
ARAB (α-L-arabinofuranosidase), 99 167 Breeding structure, 56-57
Arc distance measure, 463 Biotin-streptavidin, 265 Breeding studies, 53
Area cladogram, 60 Biotinylated goat anti-avidin antibody, Brent-Powell methods, 445
ARK (arginine kinase), 100 166 5-Bromo-4-chloro-3-indolylphosphate
NU(primer, 241 Biotinylated nucleotide, 213-214 (BCIP), 142
Asexual species Biotinylated probes, 127, 136 Brooks parsimony analysis (BPA), 59-60
complex, 23 Biotypes, unisexual, 61 Broth, L, 379
relationships, 520 Bisbenzimide, 259 BSA (bovine serum albumin), 38, 211
Aspartate aminotransferase (AAT), 100 Bivalents, 129-130 Buffer(s)
Assumptions, general, and systematic er- BLAST algorithm, 374 additions, in PCR, 225
ror, 494 Blocking solution, fluorescein-avidin, 166 alkaline electrophoresis, 201
Asymmetric reamplification, 325, 355-356 Blood collection, 33-35 BN, 166
ATE (acetate-Tris-EDTA), 378 Blotting, vacuum, 299 borate, 379
AT repeats, 271 Blucose, 320 chromomycin, 166
Automated sequencers Blunt-end cloning, 357-358 CTAB (hexadecyltrimethylammonium
gel reading, 371-372 BN (bicarbonate-nonidet) buffer, 166 bromide) extraction buffer, 379
types and use, 330 Bone, PCR extraction, 223-224 cycle sequencing, 379
Autopolyploids, 62 Boolean queries, 374 functions, 53
Autoradiography, 123 Bootstrapping, 197, 392, 397-398 hybridization, 137
chromosomes, 123, 138, 162-163 nonparametric, 409 hybridization for biotin-labeled
DNA fragments, 302 parametric, 523-526 probes, 167
DNA sequencing, 328-330, 369-374 parametric vs. nonparametric, 523-524 incubation temperature/strength, 173
genomic libraries, 349 random error, 507-509 ionic, 53
Avidin, 139. see also Anti-avidin antibody; split decomposition, 492 isolation (cpDNA), 319
Avidin-biotin; Fluorescein-avidin stochastic effects, 523 isozyme electrophoresis, 116-120
biochemical properties, 126-127 Bootstrap proportion, 523-524 label, 320
Avidin-biotin, 123, 136 Borate ligation, 380
buffer, 379 lysis (cpDNA), 319
Background staining, 93-94, 148 continuous, 117 McIlvaine's, 167
Bacterial colony amplification, 225 discontinuous, 118 multiple, 93
Bacteriophage lambda. see Lambda (λ) Bottlenecks, 56, 201, 220 NaI binding, 380
bacteriophage Bounces, high-stringency, 227 nick translation, 167, 202
Bacteriophage library screening protocol, Bovine serum albumin (BSA), 38, 211 nick translation (transfer hybridiza-
348-349 Branch-and-bound methods, 480-482 tion), 320
Banding. see Chromosome banding; Mole- Branch attraction, long, 478 32P end-label, 320
cular cytogenetics; specific types Branches phage dilution, 380
problems, 372-373 interior (central), 410 phosphate, 185-186, 167, 202. see also
RFLP, substoichiometric, 310 peripheral, 410 Phosphate buffer
BankIt, 375 reliability of individual, 506-509 phosphate hybridization, 174
Base compositional bias, 4 removing long, 499 reverse transcriptase, 380
factors affecting, 496 Branching Sl nuclease, 202
BCIP (5-bromo-4-chloro-3-indolylphos- patterns, unrooted, 477 screen, 92-93
phate), 86, 142 sequence, 198 STES (sodium
Behavior, 337 timing, 198 chloride-Tris-EDTA-sucrose),
Beta tubulin primer, 245-246 Branch length(s), 439-440 283-284
Bias covariances, 474 stop (nucleic acid sequencing), 380
assessing effect, 496-497 inferred, 472 TAE (Tris-acetate-EDTA), 262, 247
base compositional, 496 least squares, 450 Taq polymerase, 246, 380
codon and nucleotide, 212-213 LogDet distance, 460 TBE (Tris-borate-EDTA), 247, 262
simulation, 527 methods for finding, 441-442 TE (Tris-EDTA), 247
Bifurcation, 410 model, unconstrained, 440 Tris-acetate, 203
Binary characters, 411 negative, 450 Tris-ethanol wash, 381
Biogeographic data, errors, 535 spectrum, estimation, 474 wash (cpDNA), 319
Biogeography, 337 spectrum [$Dl, 468 Buffer systems, 69-70,8243
historical, 269 substitutions, 440 Buffer tray, 70-71
Biopsy protocol, 36 Branch swapping Buffer well, 69
Biotin nearest neighbor interchanges (NNIs), Bulked segregate analysis, 276
biochemical properties, 126-127 484
conjugated nucleotides, 139 rearrangement algorithm, 408 CAGE (cellulose acetate gel electrophore-
sis), 55
CAIC, 510
Calcium-binding proteins (CBP), non-spe- Character state(s), 410-411 karyotypes and idiograms, 165
cific, 100 gap, 453 rearrangements, 143
Camin-Sokal parsimony, 422 isozyme data, 413-414 scoring, 165
CAP (cytosol aminopeptidase), 101 optimal assignments, 415 specialized procedures, 132-133
Carbon dioxide, solid (dry ice), 36-37 particulate data, 414 timing, 133-134
Catalase (CAT), 100 probability of observing on a tree, types, 132
Catalytic properties, and isozyme expres- 464-465 Chromosome characters, 144
sion, 91 reconstructions in Dollo parsimony, Chromosome in situ suppression hy-
CAT (catalase), 100 420 bridization (CISS), 139
Cathode, 52 restriction endonuclease data, 412-413 Chromosome painting, 122-124, 142
Cation, 52-53 tree, 411 antigens and antibodies, 126
Cavender-Felsenstein model, 464-466, Character-state weighting comparative studies, 143
472 systematic error, 502-503 FISH, 138-139,141-142,163-164
C-banding, 125-126, 132-133, 165 using step matrices to implement, 502 Chromosome preparation. see also Lamp-
protocol, 156-157 Character weighting brush chromosomes; Mitotic chro-
CBP (calcium-binding proteins), 100 compatibility methods, 500-501 mosomes
cDNA amplification, 217-218 maximum (unity) vs. minimum (zero), lampbrush, 130-132
Cell fractionation studies, 53 500 meiotic spermatocyte, 129-130
Cell line collection, 35-36 rationale, 442 mitotic metaphase, 127, 149-150,
Cellulose acetate gel electrophoresis successive approximation, 500-502 152-155
(CAGE), 55. see also Isozyme elec- systematic errors, 500-502 polytene, 129
trophoresis Chebyshev's inequality, 392 squash and splash techniques, 127-129
Centrifugation Chemical sequencing. see Maxam-Gilbert Chromosome repatterning hypothesis,
differential, 216, 259 sequencing 144
gradient, 285 Chemiluminescence, 265 CI (chloroform-isoamyl alcohol), 379
ultracentrifugation, 216, 258, 282, 288 CICs (cold-induced constrictions), 133,
velocity, 287 Chi-square (χ²) test, 390-392, 404 135
Centripetal force, 222 Chloroform-isoamyl alcohol (CI), 379 CISS (chromosome in situ suppression
Cesium chloride (CsCl) gradient, 259 Chloroplast DNA (cpDNA) hybridization), 139
animal mtDNA isolation, 285-286 amplification, 217 Cistrons, 270, 275
purification of plasmid DNA, 353-354 CsCl-EB purification, 289-290 CK (creatine kinase), 62-63, 94-95, 101
Cesium chloride-propidium iodide gradi- evolution, 278, 280-281 primer, 241
ent, 283-288 higher-level systematics, 280 CLADOS, 510
Chain reaction, 207 inheritance mode, 255,270 Cleavage site mapping, 257
Chain-termination, dideoxynucleotide, interspecific diversity studies, 337 construction, 302-308
328 interspecific hybridization studies, 275 double digests, 303-304
Change probabilities isolation, sucrose step and CsC1-EB partial (incomplete) digests, 303-304
calculation, 437-440 gradient protocol, 289-290 CLINCH, 510
protein sequence data, 438-439 mapping, 312-314 "Clique," 500
Chaotropic agents, 173-174 nucleotide substitution, 267 Clonal diversity, 61, 518
Character(s), 410-411. see also Chromo- population-level comparisons, 269-270 Cloning, 323-325. see also Nucleic acid
some; Controversies; DNA fragment primers (rbcL, rpoC1, rpoC2), 239-240 cloning; Nucleic acid sequencing
analysis; Isozyme; Species-level rate of change, 270, 278 and cloning; specific types
binary vs. multistate, 411 restriction site variation, 278 vs. direct sequencing, 332-333
computing length, 417-419 species-level comparisons, 277-278 gene libraries, protocol, 347
correlated, continuous, 541 Chord distance, 563 methods for PCR products, protocol,
data, 6, 27, 170 Chromatid, 129 356-359
DNA-based, 276 Chromatography columns, 178 microsatellites, protocol, 370-371
homologous vs. orthologous, 411 Chromomere, 129-130 plasmid subcloning, protocols, 351-355
independence, 256, 411, 413, 416 Chromomycin buffer, 166 TA cloning, 358-359
interval scale measurement, 416 Chromosomal evolution, 144-145 using bacteriophage vectors, 347-350
map locations, 412-413 Chromosomal target accessibility, 147 Cloning vectors
ordered vs. unordered, 421 Chromosome(s) cosmids, 324
polarity, 411, 416 anatomy, 142 lambda (λ) bacteriophage, 323-325
quantitative vs. qualitative, 411 defined, 121 M13, 351-354
reliability and informativeness, identification, 124-125, 146 replacement and insertion vectors, 324
500-502 lampbrush, 130 TA, 356-359
weighting, 4 molecular cytogenetics, 121-168 Closest tree, 471
Character-based methods, 413, 509 morphology, 124-125 CLSM (confocal laser scanning mi-
Character change phylogenetic studies, 143 croscopy), 123, 139, 148
models, 464-474 polytene, 123 Clustal, 377
symmetrical probabilities, 419-420 rearrangements, and banding, 143 Cluster analysis, 451, 486-487
Character compatibility, corrected, 472 sequence analysis, 142 pair-group, 408
Character data, model-based corrections, Chromosome banding, 121, 132-134, 142. systematic error, 495
464-474 see also Banding; Chromosome band- Coalescence
Character pattern probabilities, ing; specific types allele, 266
Hadamard conjugation, 466-470 analysis, 125 theory, 3, 5, 337
Coalescent models, time estimates, 540 Co-speciation, 337 Decay index, 507
Codon-based models, 439 analysis, 59-60 Degeneracy symbols, 232
Codon bias, 212-213 Cost matrices, 422-423 Degenerate primers, 214
Coenzyme(s) cpDNA. see Chloroplast DNA Deletions, 412
electrophoretic, 73 Creatine kinase (CK), 101 Delta approximation method, 474
enzyme stabilization, 38 gene expression, 62-63 Demography, historical, 269
Cofactors intron primer, 241 Denaturation. see DNA denaturation; PCR
electrophoretic, 73 isozymes, 94-95 cycle; specific protocols
PCR, Mg2+, 211 Crossing-over thermal cycling, 226
Colcemid (deacetylmethylcolchicine), 127 intracistronic, 336 Denaturing gradient gel electrophoresis
Colchicine, 127, 165 unequal, 145 (DGGE), 252-253, 255
Cold-induced constrictions (CICs), 133, Cryopreservation, 30-33, 37-39 perpendicular vs. parallel gels, 263-264
135 population screening, 266
Collecting permits, 30 household freezer/refrigerator, 37 resolving power, 263
Collection. see Animal tissue; Blood; Cell laboratory, 37 Denaturing solution, 374
line; Hemolymph; Semen; Tissue; tissue stability, 37-39 Denhardt's solution, 167
Venom Cryptic alleles, 64 Bensedpe, 247
Collections. see Database management; Cryptic species, 519 Deoxynucleotide triphosphates (dNTPs),
Tissue collections, synoptic CsCl. see Cesium chloride 206
Column, glass, DNA hybrid melt, 187 Cot labeled, 265, 298
Comparative methods, analyzing hybridization precision, 173 mix for PCR, 247
macroevolutionary patterns, 540-543 plot, 171 Sanger dideoxy sequencing, 328
Compatibility value, 171 termination mixes, cycle sequencing,
character weighting, 500-501 CTAB (hexadecyltrimethylammonium 381
corrected character, 472 bromide), i24 DEP-treated water, 379
Component, 510 extraction buffer (2x), 379 Developmental stability, 61
COMPROB, 510 Culture medium, peripheral blood cells, Dextran sulfate, 146
Computer programs, for phylogenetic 166-167 DGGE. see Denaturing gradient gel elec-
analyses, 510-514 C-value, 127 trophoresis
Concerted evolution, 7-8, 217, 267-268, Cycle sequencing, 328-330 Dialysis tubing, 319, 379
276, 331 buffer, 379 Diaminobenzidine tetrahydrochloride
Conditional likelihood, 441 end-labeling, 330, 367-368 (DAB), 142
Confocal laser scanning microscopy vs. other methods, 333-334 staining solution, 167
(CLSM), 123, 139, 148 protocol, 367-368 4,6-Diamino-2-phenylindole (DAPI), 142
Congruence, 440 termination mixes, 381 o-Dianisidine diHCl, 86
Consensus repeat sequence, 331 Cytochrome b, 216 Diazo dye, fast, 86
Conservation genetics, 337, 518 interspecific diversity studies, 339 Dichotomy, 410
Conservative substitutions, 453 primers, 236-238 Dideoxynucleotide chain-termination
Conserved gene sequences, 219-220 Cytochrome c primer, 241-243 method, 328
Conserved primers, 219-220 Cytochrome oxidase I, 216 Dideoxynucleotide triphosphates
Consistency, 426 primers, 236-237 (ddNTPs), 328
parameters, in maximum-likelihood Cytochrome oxidase I/II, interspecific di- Differential-replication banding, 132-133
distance, 458 versity studies, 339 BrdU protocol, 158
phylogenetic methods, 527-528 Cytogenetic(s), 3 Digestion. see Nucleic acid; Partial; S1 nu-
Constitutive heterochromatin, 132 stasis, 144 clease; specific protocols
Constraints, evolutionary, 4 Cytosol aminopeptidase (CAP), 101 Digoxigenin label, 136
Contact zones, 275 Dihydrolipoamide dehydrogenase
Contamination DAB (diaminobenzidine tetrahydrochlo- (DDH), 101
PCR, 330 ride), 142 Dimers, 213
RFLP, 309, 312, 315, 318 staining solution, 167 Dimethyl sulfate, 328
Contingency table(s), 391-392 DAPI (4,6-diamino-2-phenylindole), 142 Diploid data, 396, 398-401
analysis, 414 Data, combining, 522 analysis of variance layout, 396
Convergence Database management, tissue collections, inferences about θ, 400
molecular level, 336 Diplotene bivalents, 129
40
RE analysis, 256 Database searches, sequence comparison, Direct fluorescence immunochemistry,
site gain and loss, 257 139-140
374-375
testing, 485 Data elimination, systematic error, Direct sequencing, 264. see also Nucleic
VNTR loci, 257 499-500 acid sequencing and cloning; Poly-
Conversion, gene. see Gene(s), conversion merase chain reaction
Data quality and presentation, 11-12 vs. nucleic acid cloning, 332
Corrected character compatibility, 472 Dayhoff model, 439
Corrected generalized distance, 469 distances, 457
RNA sequencing, 333
Corrected parsimony, 471-472 DDH (dihydrolipoamide dehydrogenase), Direct tandem duplication, 309-310
Corrections, multiple-hit, 427 101 Discrete distribution, 443
Correlations, discrete and continuous ddNTPs (dideoxynucleotides),
328 Disequilibrium statistics, 24
characters, 541 Deacetylmethylcolchicine (colcemid), 127 Dispersal, 56-57
Cosmids, 324 Displacement loop (D-loop) primers,
238-239
Dissimilarity (D) distance, 455 measurement of sequence dissimilari- limitations, 177
Distance(s) ty, 453-454 measurement error, 199-200
additive, 172, 447-448, 487-493 protein-coding DNA sequences, 457 methods, primary, 174-176
additive-tree, 447-452 superimposed events, 454-456 phenol emulsion reassociation tech-
allozyme/restriction endonuclease transition/transversion substitutions, nique (PERT), 187-192
data, 462-464 456-457 phosphate buffer, 174
corrected generalized, 469 treatment of undefined values, 458 phylogenetic reconstruction, 197-201
Dayhoff model, 457 Distance Wagner method, 493 points of contention, 170
dissimilarity (D) or p, 455 Distortion, of additive distances, 448 principles, 170-171
DNA-DNA hybridization, 194-197 Divergence reassociation, 173-174
DNA sequences, 453-462 extent, 25-26 sheared drivers, protocol, 180-181
estimates, 448 melting temperature, 170 single-copy tracer fractionation, proto-
evolutionary, 446 prediction of time, 531-540 col, 181-182
Felsenstein's method, 455 sequence, 176-177 stock solutions, 201-203
F84 model, 456 species, 520 techniques and data analysis, 171-172
Hadamard, 474 time, 10, 25, 387 tracer fragment length estimation pro-
HKY model, 456 timeframe, 520 tocol, 184
immunological, 464 Divergence matrix Fv, 454 tracer preparation by iodination proto-
Jukes-Cantor model, 455 Divisive pairwise clustering, 483 col, 184-185
Kimura's two-parameter model (K2P), D-loop (displacement loop) primers, tracer preparation with 32P or pro-
456 238-239 tocol, 181-182
Ldwii-Swofford method, 456 DMSO, 211, 225 DNA extraction. see DNA isolation; Ex-
Li's method, 487 cell line cryoprotectant, 35 traction, DNA
log-determinant (LogDet), 459-461 DNA. see also Nucleic acids DNA fingerprinting, 3, 270
model-based, 449 ancient, 220 DNA fragment analysis, 3-4, 249-320. see
observed generalized, 469 repetitive, 145, 173 also Restriction fragment length
Poisson model, 455, 457 repetitive vs. single-copy, 173 polymorphism
proportional model, 455-457 satellite, 123 agarose gel electrophoresis protocol,
protein-coding genes, 457 DNA amplification. see Amplification; 291-297
restriction site, 463-464 Nucleic acid sequencing and applications and limitations, 266-282
Tamura and Nei model, 456 cloning; Polymerase chain reaction assumptions, 255-257
ultrametric, 452-453 DNA cloning. see Nucleic acid cloning character independence, 256
uncorrected vs. corrected, 455 DNA denaturation, 170 combined techniques, 266
Distance data, 6, 27, 170, 172 chromosome, 126, 137 cpDNA isolation, sucrose step and
Distance fluctuation, 170 cycle sequencing, 329 CsCl-EB gradient protocol, 289-290
Distance measures, 454 double-stranded template, 364 DNA isolation protocols, 283-290
arc, of Cavalli-Sforza and Edwards, PCR reaction, 206-208 DNA preparation, 257-259
463 Sanger dideoxy sequencing, 328 ethidium bromide staining protocol,
chord, 563 DNA digestion. see Nucleic acid digestion 297
Euclidean, 462-463 and specific protocols forms of variation, 250-254
Li and Graur method, 463 DNADIST program, 458 fragment conformation/stability,
Manhattan, 426, 463, 563 DNA-DNA hybridization, 3, 169-203 252-253
modified Nei, 462 applications, 176-177 fragment size variation, 250-252
Nei, 462 chromatography columns, 178 fragment visualization, 297-302
Nei and Li's transformation method, contamination sources, 193-194 heritability, 255
163 data error and non-additivity, 197-201 homology, 256
normalized percent hybridization data properties, 172 laboratory setup, 282
(NPH), 195-196 distance estimates from melting curve methods, 257-266
Rogers, 462-463 data, 194-197 methylation, 255
T2511,197 distance metric, 170 microsatellite analysis primer labeling
Tjoif, 196-197,200 equipment, 179 prolocol, 298-299
T,,, lYh, 200 error sources, 192-193 mtDNA isolation protocol, animal,
2:.rodi., l95,200 genome size, 200-201 283-288
Distance methods hybrid thermal stability analysis, polyacrylamide gel electrophoresis
vs. maximum-likelihood methods, 446 188-189 (PAGE) protocol, 291, 296-297
pairwise, 446-464 hybridization with hydroxyapatite and α-32P 3' restriction fragment end-label-
Wagner, 487 phosphate buffer, protocol, 185-186 ing protocol, 297-298
Distance tests, comparing two trees, hydroxyapatite column preparation, radioactive labeling, 265
505-506 protocol, 186-187 repeatability, 255-256
Distance transformations, for sequence incomplete reaction, 194 restriction endonuclease DNA diges-
data, 453-462 interpretation and troubleshooting, tion protocol, 290-291
among-site rate variation, 458-459 189-201 restriction site mapping protocol,
evaluating, 461-462 isolation and purification protocol, 302-308
log-determinant distances, 459-460 179-180 restriction site variation, 253-254
maximum-likelihood distances, kinetics, 172, 176 sequence choice, 266-268
457-456 laboratory setup, 178-179 staimng, 264
stock solutions, 319-320 Double-stranded DNA (dsDNA) amplifi- Entrez, 374-375
techniques and their applications, 250 cations, 225-227 Enzymatic sequencing. see Sanger
transfer hybridization protocol, Downstream primers, 232 dideoxy sequencing
299-302 DQα primer, 243-244 Enzyme. see also Restriction endonucle-
DNA hybridization with hydroxyapatite Drift model, 397, 400 ases
and phosphate buffer protocol, Driver, 171 activity, 53
185-186 Dry ice, tissue transport, 36-37 dilution, 67
DNA isolation. see also DNA fragment "Dry shipper," 36 extraction, electrophoresis, 78
analysis; Extraction; Restriction frag- Duplications, DNA fragment analysis, 257 modification, posttranslational, 66-67
ment length polymorphism Duplicative transpositions, 308 multimeric, 62-63, 90
cloning methods, 323 nomenclature, 94-96
polymorphisms, 58
direct methods, 323
double-stranded DNA, for sequencing,
356
E. coli
EB. see Ethidium bromide
with plasmid DNA, stabilization, 38
352 staining formulas, 96-116
target sequences, 323-326 Ecological genetics, 56-59 zones of activity, gel, 91
DNA isolation protocols EDTA (ethylenediaminetetraacetic acid), Enzyme (EC) numbers, 96
cpDNA, 289-290 202, 284, 379 Enzyme systems
genomic, 342-345 blood preservation, 34 freezing and thawing, 72
large-scale, for DNA-DNA hybridiza- DNA stabilization, 39 multilocus, 90
tion, 179-180 tissue preservation, 33 nomenclature, 95
mtDNA, 283-288 Efficiency, phylogenetic methods, 528-529 EPIC (exon-primed intron-crossing)
M13 DNA, 354 EF1α primers, 242-244 amplification, 219-220
PCR products, for sequencing, 355-356 Electrodecantation, 54 primers, 241-246
PEG method, 360-361 Electroendosmosis, 55 Epidemiology, disease, 336
plasmid DNA, 353-354 Electromorphs,413 Epigenetic events, 66
small-scale, for PCR, 222-225 Equal input model, 436
autapomorphic, 60-61
DNA markers, nuclear autosomal, 275 definition, 64 Equal-rates model, 443
DNAnalyzer, 169-170 Electrophoresis. see also Agarose gel; Al- Error
DNA polymerase I, 136 lozyme(s); Isozyme; Polyacrylamide random. see Random error
systematic. see Systematic error(s)
EST (esterase), 102
DNA polymerase(s) gel; Starch gel
autoradiograph quality, 372 agarose gel protocol, 291-297 nonspecific, 102
behavior and manipulation, PCR, allozyme, 3,19 Ethanol, 284
206-207 artifacts, 315 wash, 202
T7,372,374 basic procedure, 52-53 Ethidium bromde (EB), 259,264
Taq, 207,372,374 chamber, 282 solution, 202
Vent, 207,329 chemicals, 74-78 staining protocol, DNA fragment
DNA precipitation to assay melting, analysis, 297
definition and history, 51
174-175 fragment and restriction site Ethylenediaminetetraacetic acid. see
DNA probes, 134-135 analysis, 254-255 EDTA
DNArates, 443, 510 RE fragments, 262-263 ETS (external transcriber spacers), 232
DNA repeats, and hybridization tracer sequential, 64 Euclidean distance measures, 462-463
quality, 193 tracking dye, 117 European Molecular Biology Laboratory
DNase I, 136, 201 variant detection, 64-65 (EMBL), 322
DNA sequence variation, 4, 249-250. see database searches, 375
VNTR loci, 262-263
also Sequence, variation Electrophoresis banding, modifications
screening methods, 252-253, 255, with tissue storage, 38 chromosomal, 144-145
263-264, 266, 361-362 Elongation factor 1α (EF1α) primer, concerted, 7-8, 217, 267-268, 276, 331
DNA sequencing. see also Nucleic acid se- DNA sequence vs. phenotypic, 170
242-244
quencing and cloning gene families, 335-336
Elongation factors (Em, EF1, Em),
automated, 262-263 242-243 genome, 330
reactions, protocol, 363-366 EMBL. see European Molecular Biology
DNA strand synthesis, 206-207 Laboratory
DNA substitution. see also Substitution Emulsion, nuclear track, 138 sequence, 321
models Endangered species, and allozyme varia- Evolutionarily Significant Units, 19
models, 433 tion, 56 Evolutionary change models, 426-478
dNTPs (deoxynucleotide triphosphates), End-labeling Evolutionary constraints, 4
202, 206, 210, 247, 381 DNA fragment analysis, 265 Evolutionary distance, 446
Dollo parsimony, 419-421 PAGE, 262 estimation, 189-192
character state reconstructions, 420 Evolutionary parsimony, 475
primers, 213-330 Evolutionary rate, 60
problems affecting, 421 problems, 373
relaxed criterion in, 421-422 thermal, cycle sequencing, 330, 367-368 constancy, 10-11
restriction-site data, 420 Endonucleases, restriction. see Restriction inheritance 266-267
traditional and unrooted models, endonucleases Evomony, 510
420-421 Endoreplication, 129 Exhaustive search, 478-481
uniquely derived characters, 420 Exon-primed intron-crossing (EPIC) am-
End-product suppression, 88
Dollo's law of irreversibility, 422 plification, 219-220
ENO (enolase), 102
Exon-priming, intron-crossing (EPIC) automated sequencing, 330 buffer screen, 92-93
primers, 241-246 cycle sequencing, 330 horizontal, 292
Exons, nuclear, 272 Fluorochrome-conjugated anti-biotin interpretation, 53
Expected sequence spectrum, 468 monoclonal antibodies, 139-142 loading, 81-82
Expected value, 388-389 Fluorochrome(s), 132, 134, 139 molds, 70
Extension, 209 label, 135 origin guide, 72
temperature, PCR, 228
problems, 148 preparation, sequencing gel, 362-363
External transcriber spacers (ETS), 232 R-banding, protocol, 157 preparation, starch gel, 79-81
Extraction, DNA Flying DNA hypothesis, 49-97 resolution problems, electrophoretic,
animals, protists, and prokaryotes, Forensic identification 91-95
342-343 intraspecific differentiation statistics, scoring, 90-91
DNA fragment analysis, 257-259 401-w slicing, 72, 82, 85-86
minute tissue quantities, 344-345 PCR, 220 splitting/tearing, 83, 85
organellar DNA, 259 Formaldehyde vertical, 294-295
phenol-chloroform, 258 dehydrogenase (FDH), 102 GenBank
plants, fungi, and algae, 343-344 DNA damage, 39 database, 322
salt, 258 Formamide, 225 database searches, 375
ultracentrifugation, 258 melting temperature, 137-138 error rate, 12
Extraction, PCR stock, deionized, 167 sequence comparison, 374
bone, 223-224 Formazan, 86-87 taxonomic distribution of nucleotide
controls, 224 Formic acid, 328 sequences, 322
preserved tissue, 223 Fossil DNA, 220 Genealogy, allelic, 5
problems, 224 Four-fold codons, 212-213 Gene(s)
Extraction, RNA, 345-347 Four-point metric, 490 amplification,3
condition, 447 conversion, 7-8, 146, 252, 279
relaxed, 492-493 dispersal, 57
F81 model, 436 duplication, 62-63
Fractional similarity/dissimilarity, 453
F84 model, 436, 438, 456 expression, 62-63
F statistics, 385-394 Fragment mobility assays, DNA fragment
analysis, 263-264 flow, 56-57, 336
FST, 393-394 levels of relatedness, 387
Fragment visualization, 297-302
F, 454 losses and gains, 280-281
FACS (fluorescence activated cell sorting), Frameshifts, 375
Freeze-drying, 38 mapping, 124, 265
142-143 Freezer, 37. see also Cryopreservation phylogenies, 337
FASTA algorithm, 374 sampling, phylogenetic relationship
French straw, 31, 36
fastDNAml, 510 studies, 26
FASTP algorithm, 374 FREQPARS, 511
Frequency parameters, 432 silencing, 62
FBA (fructose-bisphosphate aldolase), 103 structural arrangement, 414
FBP (fructose-bisphosphatase), 102-103 Fructose-bisphosphatase (FBP), 102-103
Fructose-bisphosphate aldolase (FBA), transfer, horizontal, 172
FDH (formaldehyde dehydrogenase), 102 Gene array, rRNA, 520
Feather tissue preservation, 33 103
Fumarate hydratase (FUMH), 103 Gene diversity, allelic frequency, 386
Felsenstein's method, 424, 455, 541 Gene evolution studies, 335-336, 518
Felsenstein zone, 427, 464-466 FUMH (fumarate hydratase), 103
Function, objective, 408 Gene order data, 414-415
Filter lift, bacteriophage library screening, GENEPOP, 511
Function minimization
348-349 Generalized distance, corrected, 469
multilocus minisatellite, derivative-free methods, 445
Generalized least-squares sum of squares,
g force (RCF), 222 506
FISH. see Fluorescent in situ hybridization Generalized parsimony, 422-424
Fisher's exact test, 391 G statistic, 440
g1 (skewness) statistic, 504 cost matrices, 422-423
FITC (fluorescein isothiocyanate), 139, 142 General protein (GP), nonspecific, 103
Fitch-Margoliash method, 448-451 αGAL (α-galactosidase), 103
βGAL (β-galactosidase), 103 Gene sequence, inferring protein se-
Fitch parsimony, 416-420, 424, 428 quence, 457
Fixation indices, 269 βGALA (N-acetyl-β-galactosaminidase), 97
Gamma (Γ) distribution, 443-445 GENESTRUT, 511
Fixatives, 128 Genetic control, 89-91
Fixed differences, population, 65 βGA (N-acetyl-β-glucosaminidase), 97
GAPDH (glyceraldehyde-3-phosphate de- Genetic differentiation, 58
Fixed population sampling, 388 Genetic distance, 171. see Distance mea-
Fluorescein-avidin, blocking solution, 166 hydrogenasel, 106
Gaps, 375,412 sures; Distance(s); named distances
Fluorescein isothiocyanate (FITC), 139, and measures
142 alignment, 453
character state, 453 Genetic divergence, 24
Fluorescence activated cell sorting Genetic marker(s)
(FACS), 142-143 Gap weighting, 377
G-banding (GTG bands), 126, 132, 134, establishing, 19-20
Fluorescence banding, 132, 134 hybrid zone, 24
Fluorescent in situ hybridization (FISH), 165
protocol, 157 introgression, 61
123, 138-139 sampling feasibility, 20
limitations, 146-148 GC-clamp, 263, 361
GCDH (glucose dehydrogenase), 104 suitability, 20
single-copy genomic probe, protocol, uniparentally inherited non-recombin-
164-165 GDA (guanine deaminase), 106-107, 511
Gel(s) ing, 24
Fluorescent label, 214
Genetic polymorphism, 263-264 G3PDH (glycerol-3-phosphate dehydro- Heterochromatin, constitutive, 132
Genetics, ecological, 58-59 genase), 106 Heterodimer, 95
Genetic sampling, 388 GP (general protein), nonspecific, 103 Heteroduplex analysis, resolving power,
Genetic structure, population, 19 GPI (glucose-6-phosphate isomerase), 92, 267
Genetic variation 104 Heteroduplex(es), 252-253, 255
fixed, 24 Great Deluge method, 485 hybridization, 190
mode of inheritance, 267-268 "Greedy algorithms," 482 PCR amplification, 331
within species, 385-405 GR (glutathione reductase [NAD(P)H]), reaction, 171
survey, RAPDs, 274 105-106 Heterogeneity
Gene transfer, lateral, 410 GTDH (glutamate dehydrogenase), 105 between-population, 65
systematic error, 498 dilution effect, 67 hidden, 64-65
Gene tree, 3 GTDHP [glutamate dehydrogenase Heterologous, defined, 9
phylogeny, 9-10 (NADPH+)], 105 Heterologous probes, 266
Genome GTE (glucose-Tris-EDTA), 379 Heteromer(s)
evolution, 330 GTR model. see Time-reversible models, interlocus, 95
molecular organization, 145 generalized isozymes, 90-91
organization, and kinetics, 176 Guanidine hydrochloride tissue preserva- Heteroplasmy, 268, 309-313
sequencing, 334 tion, 33 Heteropolymer, 63
Genome size Guanine deaminase (GDA), 106-107 Heterospecific, defined, 9
DNA hybridization kinetics, 172-173 Guanylate kinase (GUK), 107 Heterozygosity
paralogous sequence differences, 200 GUK (guanylate kinase), 107 single-locus, 90
phenotypic correlates, 142 studies, 518
Genomic library(ies), 323-324 Hadamard, distance, 474 Heterozygote(s)
screening, 324-325 Hadamard conjugation, 427, 464-474 deficiency, 57
Genotype among-site rate variation/maximum frequency, 386
analysis vs. phenotype analysis, 249 likelihood, 473 Heuristic approaches, 481-485
nomenclature, 95 application to real data, 470-472 tree selection, 483
Genotypic frequency(ies), 389 calculating character-pattern probabili- Hexadecyltrimethylammonium bromide.
conditional, 401-402 ties, 466-470 see CTAB
Hardy-Weinberg equilibrium, 390-391 choosing a tree, 471-472 Hexokinase (HK), 107
Geographic subdivision, 387 data exploration, 472 Hidden heterogeneity, 64-65
GLAL (glutamate-arnrnonla Iigase), 105 distance Hadamard, 474 Hierarchical structure, tests, 504-506
Glass column DNAhybrid melt, 187 extenslon to four character statcs, 473 I-Iill climbmg, 482485
Glassmllk purification method, 360 Felsenstem zone, 464-466 alternatlves, 485-486
Glass tube tissue packagmg, 31 mvertlbdity, 470 fishdme-citra te (disconhnuous), 118
Glucose dehydrogenasc (GCDH), 104 statlstlcs on corrected sequences, Hlstochcm~calstammg, 86-89,97
Glucose-6-phosphate dehydrogenase 473-474 HIStone H2AF pruner, 244-245
(GGPDW), 104 Hadanlard matnx, 466-470 HK (hexolunase), 107
Glucose-6-phosphate isomerase (GPI), 92, deflnction, 466 HKY model, 436,438,456
104 indexing of paxtlttons, 467 3H label (biburn), 135
Glucose-Tns-EBTA (GTE),379 HADH (D-2-hydroxy-aud dehydroge- I-Iomeolog, defmed, 9
K L U R (pglucwonidase), 104-105 nase), 87,107 Homodlmer, 95
&LUS (a-glucos~dase),104 HAGI.1 (hydroxyacylglutatluonc hydro- Homoduplex
P L U S (pglucosidase), 104 lase), 108 hybr~dizatlon,190
Glutamate-ammonia hnase IGLAL), 105 Halrp~nstructure, 209 reaction, 171
Hair hssue prescrvatlon, 33 Homogeneous Markov process, 432
Glutamate dehidroienase (NADPH+) HAOX [(S)-2-hydroxy-ac~doxidasel, Homogeruzahon, aozyme electrophore-
(GTDHP),105 107-108 SIS,73,78-79
Glutathione reductase [NAD(P)I-II(GR), Haplo~ddata, 394398 I-Iornolog(s)
analysts of variance layout, 395 comlg;ating, in DNA fragment analy-
Glyceraldehyde-3-phosphatedehydroge mfcrcnces about 0,397 sis, 257
nase (GAPDH), 106 Hardy-Welnberg eqculibrrium (EWE), 18, deflned, 9
Glycerate dehydrogenase
(GLYDH), 106 53-54, 90 positional, 256
Glycerol, 211 fixed populations, 390 Homologous gene identification, 331
cell line cryoprotectant, 35 microsatellites, 317 Homologous locus nomenclature, 95
protein stabilization, 38 significance level of tests, 404 Homologous sequence alignment, 322
Glycerol-3-phosphate dehydrogenase testing, 390-391 Homology, 411
(G3PDH), 94, 106 HBDH (3-hydroxybutyrate dehydroge- allele, 5, 256-259
GLYDH (glycerate dehydrogenase), 106 nase), 108 defined, 5, 7
Gold-conjugated antibodies, 123 Health hazards, chemical, 74-78 DNA fragment analysis, 256
Goodness-of-fit statistics, 440 Heart puncture, for blood collection, 34 positional, 9, 412
Goodness-of-fit tests (χ²), 390-392, 404 Hemolymph collection, 35 Homomeric isozymes, 89-91
G6PDH (glucose-6-phosphate dehydroge- Hennig86, 511 Homonomy, defined, 9
nase), 104 Heparin blood preservation, 34
Heritability, DNA fragment analysis, 255
reducing effects, 474-475 Incomplete digestion. see Nucleic acid di- phylogeography, 337
systematic error, 500 gestion; Partial digestion statistical methods, 389-401
Homosequentiality hypothesis, 144 Inconsistency, parsimony, 426-428 Intraspecific variation
Homospecific, defined, 9 Inconsistent method, 493 DNA-DNA hybridization accuracy, 201
Horizontal gene transfer, 172 Indel, see Insertion/deletion events (indel) electrophoresis, 63
Horseradish peroxidase, 139 Independent allele coding, 425 Introgression, 10, 61
Hybrid duplexes, 126 Indirect fluorescence immunochemistry, cpDNA, 270
Hybrid dysgenesis, 336 139-140 hybrid zone, 24
Hybridization, 519 Individual relatedness studies, 518 Intron primers, 240-245
chromosome, 137-138 Infinite-alleles mutation model, 401 actin, 241-242
chromosome, sensitivity and efficiency, Infinite-island model, 401 aldolase, 243-245
146 Ingroup taxa, 416, 478 arginine kinase (ARK), 241
detecting reticulation, 522 Inheritance mode beta tubulin, 245-246
DNA-DNA, see DNA-DNA hybridiza- cpDNA, 270 creatine kinase (CK), 241
tion DNA fragment (RFLP) analysis, 255 cytochrome c, 241-243
interspecific, 60-61, 275-276 evolutionary rate, 266-267 DQα, 243-244
intraspecific, 336 genetic variation within species, elongation factor 1α (EF1α), 242-244
RNA-DNA, 137 267-268 histone H2AF, 244-245
screening, 324 Inhibitors, PCR, 225 proto-oncogene int, 242-243
sequential (restriction site mapping), Insertion, 412 Intron(s)
303-304, 308 exogenous material, 308 amplification, 220
systematic error, 498 vectors, 324 losses and gains, cpDNA, 280-281
transfer, 256, 265-266 Insertion/deletion events (indel), 253, 257, nuclear, 272
Hybridization buffer, 137 331, 453 positions, 219-220
biotin-labeled probes, 167 matrix plots/pairwise alignments, 375 sizes, 220
Hybridoma, 126 In situ hybridization (ISH), 122-123, true, 220
Hybrid thermal stability analysis, 188-189 134-137 Invariable sites model, 443
Hybrid zone(s), 24-25, 61, 67 fluorescence (FISH), 123 Invariant + gamma model, 444
differential selection, 336 non-isotopic (NISH), 123 Inverted PCR, 325
Hybrizymes, 61 sensitivity and efficiency, 146 Ion, activating, and enzyme stabilization,
Hydrazine, 328 steps, 134 38
Hydrogen peroxide, 142 transmission electron microscopy ISH. see In situ hybridization
D-2-Hydroxy-acid dehydrogenase (TEMISH), 123 Isocitrate dehydrogenase (IDH), 108
(HADH), 87, 107 troubleshooting, 165-166 Isoelectric point, 52
(S)-2-Hydroxy-acid oxidase (HAOX), Insulin gene localization, 123 Isolation, see DNA fragment analysis,
107-108 int, 242 DNA isolation; RNA isolation; spe-
Hydroxyacylglutathione hydrolase Interior branch, 410 cific protocols
(HAGH), 108 Interlocus heteromers, 95 Isolation, reproductive, 23-24
Hydroxyapatite column preparation pro- Interlocus isozyme, nomenclature, 96 Isolation buffer (cpDNA), 319
tocol, 186-187 Internal nodes, 410 Isolation index, 491
Hydroxyapatite (HAP), 173-176, 185-186 Internal transcribed spacer (ITS), 232, 275, Isolation medium, lampbrush chromo-
chromatography, 182-183 520 some, 167
3-Hydroxybutyrate dehydrogenase interspecific diversity studies, 338 Isoloci, 66
(HBDH), 108 International Union of Biochemistry Isoschizomers, 253, 255
Hyperboloid approximation, 425 (IUBNC), 96 Isozyme(s), 51
Hypothesis-generating studies, 523 Interspecific diversity, nucleic acid se- characters, 63
Hypothesis testing, 523-526 quencing and cloning, 337-339 conformational, 66
Hypotonic treatment, metaphase chromo- Interspecific hybridization, 275-276 expression, and catalytic properties, 91
somes, 121 restriction endonuclease study, 275 homomeric, 89
Intracistronic recombination, 67, 336 homomeric vs. heteromeric subunits,
125I label, 135 Intralocus polymeric isozyme, nomencla- 91
ture, 96 nomenclature, 96
IDDH (L-iditol dehydrogenase), 108 patterns, homozygotes vs. heterozy-
IDH (isocitrate dehydrogenase), 108 Intraspecific differentiation, 385-405
Idiogram, 126 applications, 401-402 gotes, 90
L-Iditol dehydrogenase (IDDH), 108 assumptions, 385 secondary, 66, 91-93
Illumination, reciprocal, 409 biological context, 385-389 subbands, 91-93
Imaging systems, 330 example, 403-405 variation, 66-67, 93
fixed and random models, 388-389 Isozyme data, 413-414
Immunochemistry, chromosome, 139-142 Isozyme electrophoresis, 51-120. see also
Immunogenicity, 126 forensic applications, 401-402
genetic and statistical sampling, specific methods, e.g., Starch gel
Immunoglobulin (Ig) electrophoresis
lyophilization, 38 387-388 agar overlay drying protocol, 88-89
stability in storage, 38 genotype state analysis, 385-386
implementation: sampling and analy- amperage, 83
structure, 126 assumptions, 53-54
Immunological distance (ID), 464 sis, 402-403
nucleic acid sequencing and cloning, buffers, 116-120
Immunoscreening, 324
Inbreeding, 56-57 336-337 buffer systems, 69-70, 82-83
buffer tray, 70-71
coefficient F (FIT), 394
chemicals, use, storage, hazard, 74-78 Labeling Ligation buffer, 380
comparison of methods, 54-55 cycle sequencing, 330 Likelihood
documentation of results, protocol, 89 digoxigenin, 136 comparing two trees, 506
electrophoresis protocol, 82-84 end-labeling, 213, 330 conditional, 441
gel loading protocol, 81-82 fluorescent, 330 vs. parsimony, 428-430, 442
gel molds, 70 nick-translation, 136-137 tree, 440
gel origin guide, 72 radioactive nucleotide, 265 Likelihood ratio statistic, 440
gel scoring, 90-91 random priming, 214, 301 Likelihood ratio test, 391, 495-496
gel slicing, 72, 82, 85-86 in vitro, 135-136 Linearly ordered characters, 411
gel splitting/tearing, 83, 85 Labels. see also specific types Linkage disequilibrium, 386-387
genetic population structure, 53-54 DNA amplification, 137 Liquid nitrogen
histochemical staining protocol, 86-88 γ-labeled ATP, 298 tissue preservation, 32
interpretation and troubleshooting, Laboratory setup tissue transport, 36-37
89-94 DNA-DNA hybridization, 178-179 Lithium-borate/Tris-citrate, 118
interspecific applications, 59-62 DNA fragment analysis, 282 Localization, repeats and sequences, 144
intraspecific applications, 56-59 isozyme electrophoresis, 67-69 Local optima, entrapment, 485
ionic buffers, 53 nucleic acid sequencing and cloning, Loci, putative or presumptive, 413
laboratory setup, 67-69 339-341 Locus
limitations, 63-67 polymerase chain reaction (PCR), 221 as a character, 413-414
posttranslation modification (PTM), 93 restriction fragment length polymor- nomenclature, 94-96
procedure, 52-53 phism (RFLP) analysis, 282 origin of new, 336
project planning, 69-73 L-Lactate dehydrogenase (LDH), 87, sampling, 20
starch gel preparation protocol, 79-81 92-94, 109 LogDet (log-determinant distances),
support medium, 54 enzyme system, 92 459-461
tissue homogenization protocol, 73, isozyme, allelic variants, 93 Long branch attraction, 427, 478
78-79 locus (Uh-cf,62 Long PCR, 220-221, 256, 259, 278, 283
tracking dye, 116-117 Lactoylglutathione lyase (LGL), 109 Low-copy-number sequences, popula-
variables, 72-73 Lake's method of invariants, 474-477 tion-level comparisons, 272-275
wickless system conditions, 83 consistency, 528-529 Lyophilization, 38
ITS (internal transcribed spacer), 232, 275, negative values, 476-477 Lysis buffer (cpDNA), 319
338, 520 performance, 477 Lysogenic cycle, 324
IUBNC (International Union of Biochem- rationale, 474-475 Lytic cycle, 324
istry),
232 transitions and transversions, 477
Lambda (λ) bacteriophage, 323-325 M13
gel for checking clones, 350
Jackknifing, 197,392 growth protocol, 347-348 sequencing vector, 324-325
evaluating taxon stability, 497-498 subcloning, 324
library screening protocol, 348-349 transformation of bacteriophage DNA,
random error, 507-509 miniprep DNA isolation protocol, 352-353
JTT model, 439 349-350
Jukes-Cantor (JC) model, 198-199, MacClade, 511
Lampbrush chromosome
428-429, 435-436 Macroevolutionary patterns, 540-543
isolation medium, 167 MacT, 511
distance, 455
preparation, 130-132 Malate dehydrogenase (MDH), 93, 109
rate heterogeneity, 443
protocol, 155-156 Malate dehydrogenase (NADP+)
testing model fit, 496 Lanave et al.-Rodriguez et al. model, 456 (MDHP), 109
α-L-arabinofuranosidase (ARAB), 99 MALIGN, 377, 512
KAc (potassium acetate), 379 Laser automated sequencing, 330
Karyological evolution, 144 Mammalian genome, size, 170
L-broth, 379 Management Units, 19
Karyotype
chromosome banding, 165 LDH (L-lactate dehydrogenase), 62, 87, 92-94, 109
Manhattan distance, 563
definition, 125 matrices, 426
Least-squares criteria, weighted, 448-451 Manhattan metric, 425
Ketamine, 34 optimal 5S rRNA tree, 450
Kimura's three-parameter model (K3ST), αMAN (α-mannosidase), 110
sum of squares, 506
473 Mannitol protein stabilization, 38
Kimura's two-parameter model (K2P),
435-436
L~fl:~~tions,
mP 314 Mannose-6-phosphate isomerase (MPI),
109-110
terminal, variation, 454
distance, 456 variants, DNA fragment analysis, 257 isozyme, 94
testing model fit, 496 Mapping
Kinetics, and genome organization/struc- l d ~ ~ ~ ~ cleavage
~ site, 257~
~ d
ture, 176 LGL (lactoylglutathione lyase), 109 cpDNA, 312-314
Kishino and Hasegawa test, 505-506 gene, 124
Li and Graur method, 463 nuclear rDNA array, 232-234
KITSCH program, 453 Libraries,see Autoradiography;Bacteno-
Klenow fragment, 265,298 restriction, accuracy,256
K2P model, 435-437 restriction sltcs, protocol, 291,302-308
Subgenomic
phage Genomic
library;libraries MARKOV, 512
Ligation Markov model(s), 431-432, 459
Label buffer (32P end-labeling), 320 blunt-end cloning, 357-358 LogDet, 459
γ-labeled 32P-dATP, 298 TA cloning, 359 Markov process, homogeneous, 432
Mark-recapture model, 445 T50H and ΔT50H, 192 plateau point, 277
Maternity/paternity exclusion analysis, Melting profile, 171 population genetic studies, 336-337
219 Melting temperature, 171 population-level comparisons, 268-269
Mating, random, 399 divergence, 170 preserved fragments, 39
Mating system duplex base composition, 174 primers, 235-239
specific, 400 formamide, 137-138 source of material, 283
studies, 518 2-Mercaptoethanol, 78 species-level comparisons, 276-277
studies, VNTRs, 267 Mercuric chloride, DNA damage, 39 Mitotic chromosomes
Matrix(ces) Meta-analysis, 522-523 fibroblast culture (reptiles) protocol,
additive and distance, 172 Method of moments, 395,403 153
change costs, 375, 377 Methylation-sensitive enzymes, 255 gut epithelium protocol, 149
comparisons, 375 Methyl green stock solution, 167 insect embryo protocol, 154-155
plant root tips protocol, 149-150
preparation of metaphase chromo-
step, 502-503 Microsatellite(s), 255 somes, 127
variance-covariance, 473-474 DNA amplification, 218 vertebrate corneal epithelium protocol,
weighting, 422-423 population-level comparisons, 271-272 153-154
Matrix algebra, 466-467 sequencing and cloning protocol, vertebrate peripheral blood protocol,
Maxam-Gilbert sequencing, 327-328 370-371 152-153
vs. Sanger sequencing, 333 variations, 250-252 Mitotic proliferation, stimulation, 127
Maximum-likelihood methods, 414, Microsatellite analysis, 3, 262-263 Model-based distances, 449
430-446 interpretation, 317-318 Model fit, tests, 495-496
accommodating rate heterogeneity primer labeling protocol, 298-299 Model realism, 527
across sites, 442-444 scoring, 317 Models, utility, 426-430
adding new data, 475 troubleshooting, 317-318 Molecular clock(s), 10-11,60,172,440,470
calculating change probabilities, Microtiter plate modifications, 365-366 confidence limit calculations, 535-539
437-440 Midpoint rooting, 488 lack of generality, 531-532
calculating likelihood of a tree, 440-442 Migration, 57 limitations, 534-535
consistency of, and variance, 430 mutation, 400-401 "local," 532
definition, 430 Migration rates. see Population differenti- perfect, 532-534
vs. distance methods, 446 ation time since divergence, 537
distance model, 455, 457-458 estimation, 386-387 universal, 531-532
estimating model parameters, 444-445 Minimum-absolute-deviation approach, Molecular cytogenetics, 121-168
Hadamard conjugation, 473
451 AgNOR banding protocol, 157-158
models of sequence evolution, 432-436 Minimum evolution (ME) method, applications, 142-146
objective, 430-431 451-452 assumptions, 126-127
other data types, 445-446 Miniprep clone screening, 358 autoradiography, 138
rate homogeneity vs. heterogeneity, 442 Minisatellite(s) autoradiography, protocol to detect
systematic error, 495 DNA preparation in analysis, 258 radioisotopic ISH, 162-163
Maximum parsimony. see Parsimony homologs, 257 BrdU-banding, salamander embryo
McIlvaine's buffer, 167 loci, 2 5 j ' protocol, 158
MDH (malate dehydrogenase), 109 multilocus fingerprinting, 291 C-banding protocol, 156-157
MDHP [malate dehydrogenase population-level comparisons, 270-271 chromosome banding, 132-134
(NADP+)], 109 single-locus, 265 chromosome painting using FISH,
Mean square, observed variation, 251-252 138-139
diploid data, 399 Misinformative character, 500 chromosome painting using FISH, pro-
haploid data, 395 Misinformative patterns, 427 tocol, 163-164
Measurement error, DNA-DNA hy- Misleading, positively, method, 493 chromosome preparation, 127-132
bridization, 199-200 Mismatch differential replication banding, BrdU
MEGA, 512 base pair, 170 protocol, 158
Meiosis analysis, 142 primers, 261-262 equipment, 147
Meiotic spermatocyte chromosome prepa- Mitochondria, isolation, 216 FISH with single-copy genomic probe
ration, 129-130 Mitochondrial DNA (mtDNA), 215-216 protocol, 164-165
Melting assay, 176 CsCl isolation, protocol, 283-288 fluorochrome R-banding, chro-
Melting curve(s) gene phylogenies, 337 momycin A3, protocol, 157
calculation from raw counts, 189-192 higher-level systematics, 279 G-banding protocol, 157
differential, 194 inheritance mode, 255 hybridization reaction, 137-138
distance estimates, 194-197 interspecific diversity studies, 337-338, immunochemistry, 139-142
error sources, 193-194 interspecific hybridization studies, 275 laboratory setup, 147-148
homoduplex and heteroduplex, intraspecific phylogeography, 337 lampbrush chromosome protocol,
191-192 isolation, plant and fungi, 283 155-156
integral, 190-191 non-functional nuclear copies, 331 limitations, 146-148
interspecies, 192 nucleotide substitution, 267 methods, 127-142
problematic, 192-194 PCR amplification, 325-326 mitotic chromosomes, protocols,
TEACL and hydroxyapatite, 175-176 phylogenetic analyses, 338 149-155
nick translation labeling of probes for dinucleotide)/NADH (nicotine ade- random primer, 218-219
ISH protocol, 159-160 nine dinucleotide, reduced), 86 ribosomal RNA genes, 217
polytene chromosomes, dipteran sali- NaI binding buffer, 380 single-copy nuclear genes, 217-218
vary glands protocol, 155 Natural selection, and molecular evolu- Nuclear genes, 2723-279
principles, 121-126 tion, 58 Nuclear IOU,519
Q-banding protocol, 157 NBT (nitro blue tetrazolium), 86-87, 142 Nuclear rDNA array map, 232-234
radioisotopic ISH, 160-161 Nearest neighbor interchanges (NNIs), Nuclear sequences, population-level com-
radioisotopic localization, single-copy branch swapping, 484 parisons, 270-275
sequences, 161-162 Negative intraclass correlation, 398 Nuclear targets, interspecific diversity
in situ hybridization (ISH), 134-137 Nei and Li's transformation method, 463 studies, 337
splash technique, mitotic chromosome Neighbor joining, 408, 451, 483, 487-490, Nuclear track photographic emulsion, 138
slides, protocol, 151-152 505 Nucleic acid cloning, 231-281. see also
squash technique, mitotic/meiotic "Neighborliness" methods, 491 Cloning, Nucleic acid sequencing
chromosomes, protocol, 150-151 Nei's B, 462, 539 and cloning
stock solutions, 167-168 Network(s), 410, 521-522 blunt-end, 357-358
subbed slide protocol, 149 Neutral drift, 170 vs. direct sequencing, 332
technological breakthroughs, 121-122 Neutralist-selectionist controversy, 11 lambda (λ) bacteriophage, 323-325
yeast method, mitotic chromosomes, Neutrality methods, 323
small vertebrate protocol, 151 defined, 5 TA, 358-359
Molecular weight, and electrophoresis mi- molecular variant, 11 Nucleic acid digestion
gration, 254 Neutralizing solution, 380 restriction endonuclease protocol,
Molevol, 512 Newton's method, 442, 444 RFLP analysis, 290-291
MOLPHY, 439, 512 NH4Ac (ammonium acetate), 378 RFLP, partial (incomplete), 310,
Monoclonal antibodies, 123, 126 Nick translation, 136-137 312-313
fluorochromated antibiotin, 139-148 buffer, 167, 202 RFLP, problems, 315
Monomeric protein, 89 DNA fragment analysis, 301-302 Nucleic acid hybridization data, pairwise
Monophyletic taxa, 416 labeling of probes for ISH protocol, distance method, 464
Moore-Goodman-Czelusniak algorithm, 159-160 Nucleic acids. see also DNA, RNA
424 Nicotine adenine dinucleotide EDTA stabilization, 39
Morpholine (amine-citrate), 117 stability in storage, 38-39
Morphological features, diagnostic, 24 NISH (non-isotopic in situ hybridization), Nucleic acid sequencing and cloning, 3,
Morphologically cryptic species, 519 86-87, 123 331-381 see also DNA fragment
Morphotype, sympatric, 23 Nitro blue tetrazolium (NBT), 142 analysis, DNA isolation, Nu-
Most-parsimonious reconstruction (MPR), Nitrogen, liquid. see Liquid nitrogen cleotide(s)
418-419 Nodes, 410 applications and limitations, 335-339
MPI (mannose-6-phosphate isomerase), Nomenclature, enzyme and locus, 94-96 assumptions, 330-331
109-110 NONA, 512 attractions, 321
MPR (most-parsimonious reconstruction), Non-genetic variation, 67 automated sequencing, 330
418-419 Non-isotopic in situ hybridization (NISH), autoradiograph interpretation and
MSG, 395 123 troubleshooting, 371-374
MSI, 399 Non-orthologous genes, systematic error, bacteriophage growth protocol,
MSP, 395, 399 498 347-348
mtDNA. see Mitochondrial DNA Nonparametric tests, systematic error, bacteriophage library screening proto-
MTT [3-(4,5-dimethylthiazol-2-yl)-2,5- 498-499 col, 348-349
diphenyltetrazolium bromide], Non-submarine gel, 293 cloning, 323-325
86-88 Non-synonymous residues, 453 cloning methods
for PCR products,
Mucus, in PCR extraction, 224 Non-synonymous substitutions, 457 356-359
Multidimensional optimization, 444 Non-transcribed spacer (NTS), 232 cloning vs. direct sequencing, 332-333
Multifurcation, 410, 419 NOR banding (nucleolar organizer re- cycle sequencing, 328-330
Multimeric enzyme(s), 52-53, 90 gion), 132-134 cycle sequencing vs. other methods,
Multimeric protein, 52 NPH (Normalized percentage of hy- 333-334
Multiple alignments, 376-378 bridization), 171-172, 195-196 direct sequencing, 323
Multiple-hit corrections, 427 melting curve calculation, 189-190 DNA isolation protocols, 342-345
Multiple slicer, 85 NP-40 (nonidet P-40), 211, 225 DNA sequencing reactions, protocol,
Multiplexing, 252, 262-263 NP (nucleoside triphosphate pyrophos- 363-366
Multiplex sequencing, 334 phatase), 110 equipment, 339-341
MULTIPRINS, 139 NTS (non-transcribed spacer), 232 frozen plasmid clone stock preparation
Multistate characters, 411 Nuclear DNA protocol, 354-355
MUST and 3S, 512 gene evolution studies, 336 gene evolution studies, 335-336
Mutation higher-level systematics, 280, 282 genomic sequencing, 334
migration and, 400-401 Nuclear DNA amplifications, 217-220 history, 326
substitutional, 336 anonymous single-copy sequences, 218 interspecific diversity studies, 337-339
exon-primed intron-crossing (EPIC), intraspecific diversity studies, 336-337
219-220 isolating target sequences, 323-326
Ne, 277 microsatellite DNA, 218 laboratory setup, 339-341
NAD (nicotine adenine
lambda (λ) bacteriophage DNA, Nucleotide substitution Organelle(s)
miniprep protocol, 348-350 animal: mtDNA, 267 DNA extraction, 259
matrix plots, 375-377 estimating from restriction fragment separation, PCR, 222-223
Maxam-Gilbert sequencing, 327-328, data, 464 Orthologous characters, 411
333 methods to estimate, 463-464 Orthologous copies, nuclear genes, 279
M13 DNA miniprep isolation protocol, patterns of and effects on phylogenetic Orthologous DNA fragments, 256
354 trees, 476 Orthologous genes, evolution studies, 331
microsatellite protocol, 370-371 plant: cpDNA, mtDNA, 267 Orthologous locus, 214-215
multiple alignments, 376-378 Nucleotypic correlates, 142 Orthologous sequences, 200
multiplex sequencing, 334 Null alleles, 66, 255 Orthology
pairwise alignments, 375-376, 378 Numerical models, 527 defined, 7-8
PCR product isolation for sequencing Numerical resampling, 392-393 nomenclature, 95
protocol, 355-356 inferences about θ, 397-398 OTUs (operational taxonomic units), 487
PCR product purification for sequenc- Numerical simulation, 526-527 Outcrossing, 56-57
ing protocol, 359-361 Outgroup comparison, 26, 416, 499
plasmid DNA isolation protocol, Overstaining, 88, 91-92
Objective function, 408
353-354 Observed generalized distance, 469
RNA isolation, 325-326 Observed mean square 32P label, 135, 265, 298
RNA isolation, protocols, 345-347 diploid data, 399 stock solutions, 320
RNA sequencing reactions protocol, haploid data, 395 Pachytene bivalents, 129-130
366 Octanol dehydrogenase (ODH), 110 PAGE. see Polyacrylamide gel elec-
running a sequencing gel, D-Octopine dehydrogenase (OPDH), trophoresis
368-370 110-111 Pair-group cluster analysis, 408
Sanger dideoxy sequencing, 328-329, ODEN, 513 Pairwise alignments, 375-376, 378
333 ODH (octanol dehydrogenase), 110 Pairwise clustering, divisive, 483
screening methods for DNA sequence Offsets, 412 Pairwise comparisons, non-indepen-
variation, protocol, 361-362 Offspring-parent relationships, 219 dence, 534
secondary DNA and RNA structure, Oligo dT, 220 Pairwise differences, 269
373-374 Oligonucleotide, 206 Pairwise distance method(s), 446-464
sequence comparison and alignment, synthesizers, 214 systematic error, 499
374-378 One-off site, 413 Pairwise sequence comparison, 454-455
sequencing gel preparation, protocol, OPDH (D-octopine dehydrogenase), Paleobiogeography,5%0
362-363 110-111 PAML, 513
steps, 322-323 Operational taxonomic units (OTUs), 487 PAP (peroxidase-antiperoxidase) im-
stock solutions, 378-381 Optima, local, entrapment, 485 munochemistry, 140
subcloning into plasmids or M13, pro- Optimality criteria, 408, 415-478 Paracentric inversion, 143-144
tocol, 351 accommodating rate heterogeneity Paraformaldehyde, 167
thermal cycle sequencing protocol, across sites, 442-444 Paralinear distance, 459-461
367-368 vs. algorithms, 415-416 Paralogous copies, nuclear genes, 279
transformation, frozen cell preparation calculating change probabilities, Paralogous DNA fragments, 256
protocol, 351-352 437-440 Paralogous loci nomenclature, 95
transformation, of E. coli with plasmid calculating likelihood of a tree, 440-442 Paralogous sequences, 200, 266
DNA, protocol, 352 Camin-Sokal parsimony, 422 Paralogy
transformation, of M13 bacteriophage Dollo parsimony, 419-421 defined, 7-8
DNA, protocol, 352-353 estimating model parameters, 444-445 PCR, 8-9
in vitro amplification, 325 Fitch and Wagner parsimony, 416-419 systematic error, 498
Nucleolar organizer region (NOR) band- Hadamard conjugation, 464-474 Parametric test, comparing two trees, 505
ing, 132-134 Lake's method of invariants, 474-477 Parapatric populations/species, 22-23
Nucleoside triphosphate pyrophos- maximum-likelihood methods, PARBOOT, 513
phatase (NP), 110 430-446 Parentage
Nucleotide(s), see also Deoxynucleotide methods based on evolutionary exclusion analysis, 219
triphosphates change, 426-478 RAPDs estimation, 274-274
bias, 212-213 models of sequence evolution, 432-436 unisexual biotypes, 61
biotinylated, 213-214 objective of phylogenetic analyses, Parsimony
cloning, packaging bacteriophage 430-431 allozyme data, 425-426
DNA, 324 other data types, maximum likelihood, assumptions, 426
preparing dNTPs, 202 445-446 Camin-Sokal, 422
primers, 213 pairwise distance methods, 446-464 character polarities, 416
Nucleotide sequence change, Jukes-Can- parsimony and inconsistency, 426-428 comparing two trees, 505
tor model, 428-429 parsimony methods, 415-426 corrected, 471-472
Nucleotide sequencing. see also Nucleic parsimony vs. likelihood, 428-430 defined, 428
acid sequencing and cloning rooting revisited, 477-478 Dollo, 419-421
direct sequencing, 325 utility of models, 426-430 evolutionary, 475
GenBank taxonomic distribution, 322 Optimal trees, 4 7 ~ 9 3 Fitch and Wagner, 416-419
museum specimens, 325 Optimization, multidimensional, 444 generalized, 422-424
single- vs. double-stranded DNA, 325 Ordered characters, 411-412 inconsistency, 426-428
vs. likelihood, 428-430, 442 Phenol emulsion reassociation technique gene trees, 9-10
model-free nature, 426 (PERT), 187-192 history, 1-3
most-parsimonious reconstruction, Phenol(s), 202 Phylogeography, 3, 266, 269, 385-386
418-419 homogenization, 78 intraspecific, 337
optimality vs. algorithm criteria,
415-416
protein sequences, 424
systematic error, 494
transversion, 422, 476
PCR extraction, 224
Phenotypic evolution, and DNA sequence
evolution,
2-Phenoxyethanol tissue preservation,
32-33
Phytohemagglutinin (PHA), 127
Picric acid, DNA damage, 39
Piperidine, 328
PI (propidium iodide), 142, 259, 285-286
PK (pyruvate kinase), 113-114
tree, adding new data, 475 p-Phenylenediamine, 142 Plaque, 324
uncorrected, 426,428 Phosphate buffered saline (PBS), 167 Plasmid
versatility and popularity, 529-530 Phosphate buffer (PB), 185-186,167,202 cloning, preparing permanent frozen
we~ghted,503 hybridization, 174 stocks, protocol, 354-355
Partial digestion, RFLP, 310,312-313 Phosphate-citrate, 118 DNA isolation protocol, 353-354
Partially ordered characters, 411 6-Phosphofructokinase (PFK), 111-112 screening, 225
Paternity studies, 57-58 Phosphoglucomutase (PGM), 86-87, 112 subcloning, 325
exclusion analysis, 219 Phosphogluconate dehydrogenase Plastic
PAUP, 45, 481-482, 484, 513 (PGDH), 112 bags, tissue packaging, 30-31
"Pause sites," 215 Phosphoglycerate kinase (PGK), 112-113 box, cryopreservation, 31
PB. see Phosphate buffer Phosphoglycerate mutase (PGAM), 113 cryotubes, 30-31
PBS (phosphate buffered saline), 167 PHYLIP, 424, 457, 463, 482, 510, 513 Plerology, 8, 331
PCDH (pyrroline-5-carboxylate dehydro- Phylogenetic accuracy, 526-531 Ploidy level, 62
genase), 113 criteria for evaluating, 527 analysis, 142
PCI (phenol-chloroform-isoamyl alco- number of taxa, 526 PMCs (pollen mother cells), 129
hol), 380 simulations and performance criteria, PMS (phenazine methosulfate), 86-87
PCR, see Polymerase chain reaction 526-530 PNK (polynucleotide kinase), 298
PCR amplimers, 258 Phylogenetic data cycle sequencing, 330
PCR cycle, 207-209, 226-227 allele frequency information, 413 end-labeling problems, 373
annealing, 208-209, 227 character, 411-412 PNP (purine-nucleoside phosphorylase),
denaturation, 207-208, 226 gene order, 414-415 113
extension, 209, 226-227 isozyme, 413-414 Point estimates, 529
PDB (phage dilution buffer), 380 restriction endonuclease, 412-413 Poisson correction, 491
p-distance, 455 restriction-fragment, 412-413 formula, 468
Pearson goodness-of-fit statistic (χ²), sequence, 412 Poisson distribution, 445
390-391 Phylogenetic inference, 407-510. see also molecular clock, 532
Pee-Wee, 513 Distance methods; Maximum likeli- Poisson model, 438-439
PEG (polyethylene glycol), 225, 380 hood; Parsimony distances, 455, 457
isolation method, 360-361 algorithms vs. optimality criteria, Pokeweed mitogen (PWM), 127
Pentobarbital (Nembutal), 34 408-409 Polarity, of characters, 411-412
PEP (peptidase), 111 allozyme electrophoresis, 59 Pollen, stability in storage, 38
PER (peroxidase), 111 choosing appropriate model, 440 Pollen mother cells (PMCs), 129
Percent square divergence, 335 chromosome characters, 144 Polyacrylamide gel electrophoresis
Perfect-fit theorem, 197 consistency, 527-528 (PAGE), 54-55, 256255, 262. see also
Pericentric inversion, 143-144 criterion-based methods, 408 Isozyme electrophoresis
Peripheral blood cell culture medium efficiency, 528-529 accuracy in RFLP, 309
(vertebrate), 166-167 experimental phylogenies, 530-531 autoradiography of sequencing gel,
Peripheral branch, 410 macroevolutionary patterns, 540-543 369
Permits, 30 objective, 430-431 gel drying and autoradiography proto-
Permutation tests, 391, 393, 504 performance, 526-531 col, 296-297
Peroxidase-antiperoxidase (PAP) im- programs/software, 510-514 gel preparation and electrophoresis
munochemistry, 140 protein level, 457 protocol, 296
PERT (phenol emulsion reassociation robustness/speed/discrimination, 529 Maxam-Gilbert sequencing, 328
technique), 187-192 rooting, 477-478 protocol, 291, 296-297
PFK (6-phosphofructokinase), 111-112 sampling size and strategy, 25-27 running a sequencing gel, 368-370
PGAM (phosphoglycerate mutase), 113 simulations and performance criteria, Sanger dideoxy sequencing, 328
PGDH (phosphogluconate dehydroge- 526-530 sequencing gel preparation, 362-363
nase), 112 terms used, 410 Polyclonal antibodies, 123, 126
PGK (phosphoglycerate kinase), 112-113 types of data, 410-415 Poly-C tracts, 309
PGM (phosphoglucomutase), 86-87, 112 use of models and assumptions, Polyethylene glycol (PEG), 225, 380
Phage arms, predigested, 324 409-410 isolation method, 360-361
Phage dilution buffer (PDB), 380 versatility, 529-530 Polymerase chain reaction (PCR), 3, 134,
Phagemid screening, 225 Phylogeny
PHA (phytohemagglutinin), 127 accuracy, 3 205-247, see also PCR amplimers;
Phenol-chloroform extraction (DNA), 258 defined, 1, 410 PCR cycle
Phenol-chloroform-isoamyl alcohol estimation, 519-520. see also alcohol-preserved tissues, 332
(PCI), 380 Phylogenetic inference ancient or fragmented DNA, 332
assumptions, 214-215 Vent, 207,329 plastic box, 31
cDNA, 229-230 Polymeric multilocus system nomencla- tissue collections, animal, 33
cloning of products, protocols, 356-359 ture, 96 tissue collections, synoptic, 40
components, 210-211 Polymorphism(s), 23 Presumptive loci, 413
controls, 212 ancestral, 277 Primed in situ labeling (PRINS), 139
DNA isolation procedure, 222-225 divergence, 25 Primer(s), 207
double-stranded DNA amplification, enzyme, 58 animal mitochondrial, 235-239
225-227 hypothesis, 67 annealing and sequencing (nucleic acid
forensic uses, 332 shared, 25 sequencing), 364-365
hygiene (cross-contamination), 230 Polynucleotide kinase (PNK), 298 chloroplast DNA (rbcL, rpoC1, rpoC2),
inhibitors, 225 cycle sequencing, 330 239-240
inverted, 325 end-labeling problems, 373 conserved, 219-220
isolation of products for sequencing, Polyploidy origin, 62 control region, 238-239
protocol, 355-356 Polytene chromosomes, 123 cycle sequencing, 328,330
laboratory setup, 221 dipteran salivary gland protocol, 155 cytochrome b, 236-238
length polymorphisms, 332 preparation, 129 cytochrome oxidase I, 236-237
long PCR, 220-221,256,259,278,283 Polytomy, 410,419 degenerate, 214
machines (thermal cycler), 210-212, hard vs. soft, 541 D-loop (displacement loop), 238-239
221 Population boundaries, DNA fragment downstream and upstream, 232
mtDNA amplification, 325-326 analysis, 267-268 3' end, 213
multiple reactions, 226 Population differences, fixed, 65 intron, 240-245. see also Intron primers
nucleic probe labeling, 137 Population differentiation, 4 labeled, DNA fragment analysis, 265
optimal conditions, 211 Population frequencies, 394 labeling protocol, microsatellite analysis,
paralogy, 8-9 Population genetics, 336-337 298-299
PCR cycle, 207-209 analysis, 410 length, 213
primers, 212-214,232-245. see also molecular, and phylogenetics, 4-5 match with template, 213
Primer(s) and specific primer theory, 388 mismatch, specifically designed,
names theory, DNA fragment analysis, 266 261-262
principles, 206-207 Population-level comparisons, 518 modifications, 213-214
probe labeling, 137 chloroplast DNA (cpDNA), 269-270 nucleotide composition, 213
problems, common, 230-231 microsatellite sequences, 271-272 reaction temperature, 208
problems, signs and symptoms of minisatellite sequences, 270-271 redundancy reduction, 214
amplification, 231-232 mtDNA, animal, 268-269 12S and 16S rRNA, 216,235-236
protocol, 225-229 nuclear sequences, 270-275 Sanger dideoxy sequencing, 328
purification of products for sequencing, randomly amplified polymorphic "semi-random," 254
protocol, 359-361 DNAs (RAPDs), 273-275 specificity, 213
purifying amplification products, 332 repeated sequences, 275 third, in PCR, 228
random-primed, 214 single-copy sequences, anonymous, universal, cpDNA, 278
reaction conditions, 209-210 272-275 universal, PCR, 209-210,212,214,266
resources, 245-246 single-locus sequences, identified, 272 universal, rRNA, 326
RNA, protocol, 229-230 Population sampling PRINS (primed in situ labeling), 139
single-stranded amplification, 228-229 feasibility, 20 Probability matrix(ces), 437,459
stock solutions, 246-247 fixed, 388 Probability of change matrix, 503
temperature, 227 random, 388 Probe(s)
variations, general, 210-212 Population structure biotin, 265
variations, in PCR cycle, 227-228 allele frequency, 54 core minisatellite, 270
Polymerase chain reaction genetic, 53 DNA, 134-135
amplification(s). see also Amplification; inheritance mode, 267 heterologous, 266
Nuclear DNA models, 19 ISH, 134-138
ancient DNA, 220 studies, 19-22,56,518 labeled nucleic acids, 135-137
animal mtDNA, 216 Population subdivision, 401 radioactive, 265
asymmetric, 228-229 Population values, 389 RNA, 135
cpDNA, 217 Population variation, DNA fragment specific radioactivity, 146
double-stranded DNA, 225 analysis, 267-268 transfer hybridization, 266
evolution studies, 330-331 Positional homology, 412 Project design, 17-27. see also Sampling
fidelity, 330 Positively misleading method, 493 phylogenetic relationship studies,
forensic identification, 7-21 Postorder traversal, 427 25-27
long PCR, 221 Posttranslation modification (PTM), 93 population structure studies, 19-22
nuclear DNA, 217-220 Potassium acetate (KAc), 379 problem definition, 17-18
problems and solutions, 330-331 Potato starch, hydrolyzed, 79 species boundary and hybridization
single-stranded, 228-229 Preservation. see also Cryopreservation; studies, 22-25
tissues, 225 EDTA; Tissue preservation stages, 17
in vitro amplification, 325 blood/eye/feather/hair, 33-34 statistical considerations, 18
Polymerases. see DNA polymerase(s) chemicals used, 32-34,37 Propanol (amine-citrate), 117
error rate, 331 isozyme variation, 66-67 Propidium iodide (PI), 142,259
Taq, 329 liquid nitrogen, 32 animal mtDNA isolation, 285-286
proportional model, 438-439 interspecific hybridization, 275-276 intracistronic, 67
distances, 455-457 parentage estimation, 273-274 intramolecular/interallelic, and RFLP,
protease degradation, and electrophoresis, population-level comparisons, 273-275 252
66-67 Random mating, 399 Rediploidization, 62
Protein Random model analysis, 400 Regression
chemistry and structure, 52 Random population(s), 394-401 estimated time since separation,
monomeric, 89 diploid data, 396,398-401 536-538
shape and size, 52 haploid data, 394-398 model, 536
stability in storage, 38 population subdivision, 401 Regulatory loci, nomenclature, 95
Protein sequence(s) sampling, 388 Relatedness
change probabilities, 438-439 Random primer amplifications, 218-219 genes, 387
inferring from gene sequence, 457 Random priming, 214,301 individual, 518
parsimony on, 424 Range shifts, 220 Reliability, data set, 523
PROTML, 439 RAPDistance, 514 Repeatability, DNA fragment analysis,
Proto-oncogene int primer, 242-243 RAPDs (randomly amplified polymorphic 255-256
PROTPARS, 424 DNAs), 3,214,218-219,250, Repeated sequences, population-level
Pruning algorithm (Felsenstein), 442 253-254 comparisons, 275
Pseudogene(s), 215,220,256,331 Rate heterogeneity, 442-444 Repeat localization, 144
analysis, 4 among-site, in distance corrections, Repeat sequence, consensus, 331
PTM (posttranslation modification), 93 458-459 Repetitive DNA, 173
PT (phosphate buffer + TEACL), 175-176 Rate heterogeneity models Repetitive sequences, 144
Purification gamma (Γ), 443 Replacement substitutions, 457
cpDNA, 289-290 invariable-sites, 443 Replacement vectors, 324
DNA, for DNA-DNA hybridization, invariant + gamma, 444 Replicate populations, 388
179-180 Rate homogeneity vs. heterogeneity Replication, rolling circle, 324
glassmilk method, 360 maximum-likelihood methods, 442 Reproductive isolation, 23-24
PCR products for sequencing, protocol, Rate matrix (Q) REs. see Restriction endonucleases
359-361 instantaneous, 432-433,437 Resampling
plasmid DNA, CsCl gradient, 353-354 symmetric (R), 433 numerical, 392-393
Purine-nucleoside phosphorylase (PNP), Rate parameter, 432 techniques, 197,507
113 Rate table, 432-433 Residues, synonymous vs. non-synonymous,
Putative loci, 413 Rate variation 453
PWM (pokeweed mitogen), 127 among-site, and maximum likelihood, Resolution, gel, problems, 91-95
Pyrroline-5-carboxylate dehydrogenase 47'3- Resolving power, DNA fragment analysis,
(PCDH), 113 among-site, in distance corrections, 263
Pyruvate kinase (PK), 113-114 458-459 Restdist program, 463
site-to-site, 443,445 Restriction endonucleases (REs), 253. see
Q-banding (QFQ), 132,134 gamma (Γ) model, 445 also Enzyme
protocol, 157 stochastic models, 443 banding, 133,135
Qualitative characters, 411 R-banding (reverse), 132,134 DNA digestion protocol, 290-291
binary vs. multistate, 411 rbcL, 217,239-240 fragment electrophoresis, 262-263
Qualitative coding, 414 primer, 239-240 mapping, cpDNA, 312-314
Quick prep, clone screening, 358 RCF (g force), 222 methylation-sensitive, 255
RDH (retinol dehydrogenase), 114 properties, 260-261
Radioisotope labels, 135-137 rDNA, nuclear array map, 232-234 randomly selected, 261
Radioisotopic ISH, protocol, 160-161 Reaction rates, effect of storage, 38-39 recognition sequences, 253
"Ramp" (PCR temperature change), 228 Reactivity, specific, 126 selection criteria, 259-262
Random Cladistics, 514 Realism, model, 527 Restriction endonuclease analysis,
Random error, 503-510 Reamplification, asymmetric, 325,355-356 412-413
comparing two trees, 504-506 REAP, 514 Restriction enzymes. see Restriction endonucleases
decay/support indices and T-PTP test, Rearrangements, DNA fragment analysis,
507 257 Restriction fragment length polymorphism
definition and effects, 493-494 Reassociation (RFLP), 63,249-320
distance tests, 505-506 criteria, 173 agarose electrophoresis protocol,
hierarchical structure, 504-506 curve, 171 291-297
likelihood, 506 kinetics, 170-171,206 32P 3' end-labeling protocol, 297-299
nonparametric resampling methods, optimum rate, 174 animal mtDNA isolation protocol,
507-509 precision, 173-174 283-288
parsimony, 505 stringency, 174 applications and limitations, 266-282
vs. systematic error, 493-494 REBASE, 253 assumptions, 255-257
tree distortion, 497 Reciprocal illumination, 409 biological problems, intrinsic, 317-318
Randomly amplified polymorphic DNAs Reciprocity, 199 character independence, 256
(RAPDs), 3,214,218-219,250, Recombinant sequences, PCR amplification, combined techniques, 266
253-254 331 contamination problems, 309,312,315,
applications, 267 Recombination 318
inheritance mode, 255 algorithm for detecting, 522
cpDNA isolation protocol, CsCl-EB, Retinol dehydrogenase (RDH), 114 population structure studies, 20-22
289-290 Retroposition, 145 size, 20-22
defined, 253 Retroposon families, 145-146 species boundary and hybridization
digestion problems, 315 Reverse transcriptase, 325 studies, 23-24
direct tandem duplication, 309-310 amplification, 336 strategy, 20-22
DNA isolation and storage problems, Reverse transcriptase buffer, 380 substructuring scale, 22
315 RFLP. see Restriction fragment length variance, 18,389
DNA isolation protocols, 283-290 polymorphism Sanger dideoxy sequencing, 328-329
DNA preparation, 257-259 Ribosomal DNA (rDNA) vs. Maxam-Gilbert sequencing, 333
electrophoretic artifacts, 315 cistrons, 270 Sankoff's method, 423-425
ethidium bromide staining protocol, interspecific diversity studies, 337 Satellite DNA, 123,209
297 repeat unit, 280 Satellite sequences, nuclear, 259
forms of variation, 250-254 Ribosomal RNA (rRNA) Saturation effects, minimizing, 500
fragment conformation/stability, cistrons, 275 SCP (saline citrate-phosphate), 168
252-253 gene amplification, 217 Screening
fragment visualization, 297-302 gene array, 520 bacteriophage library, 348-349
fragment vs. site approaches, 308-309 gene evolution studies, 336 blunt-end clones, 358
heritability, 255 interspecific diversity studies, 337-338 DGGE, 252-253,255,263-264,266,361
heteroplasmy, 309-313 PCR amplification, 326 DNA sequence variation, 361-362
homology, 256 primers, 235-236 efficiency, 325
interpretation, 308-314 sequencing, 325-326 gene libraries, 324-325
laboratory setup, 282 sequencing problems, 373-374 heteroduplex analysis, 252-253,255,
length mutations, 314 Ribulose 1,5-bisphosphate carboxylase 361
length variation, 309-310 (rbcL), 217 immunoscreening, 324
mapping large sequences, 313-314 interspecific diversity studies, 339 miniprep clone, 358
methods, 257-266 primer, 239-240 phagemid and plasmid, 225
methylation, 255 Ringer's solution, amphibian, 166 quick prep, 358
microsatellite primer labeling, 298-299 RNA-DNA hybridization, 137 SSCP, 252-253,255-263,266,361-362
PAGE protocol, 291,296-297 RNA isolation, 325-326,345-347 SCs (synaptonemal complexes), 129-130
partial (incomplete) digest, 310, RNA polymerases, rpoC1 and rpoC2 SDS (sodium dodecyl sulfate), 380
312-313 primers, 240 Secondary structure, DNA/RNA, 373-374
principles, 249-255 RNA probes, 135 SEDTA (saline EDTA), 202
radioactive labeling, 265 RNase (DNase-free), 202 Segregant analysis, bulked, 276
RE DNA digestion protocol, 290-291 RNA sequencing, 333,366 Self-priming, 208
repeatability, 255-256 RNA transcription, and in vitro labeling, Semen collection, 36
restriction endonuclease choice, 309 135-136 Semi-random primers, 254
restriction site mapping protocol, Robertsonian translocation, 143-144 Sequence
302-308 Rogers' distance measure, 462-463 class separation, 176
restriction site variation, 253-254 Rogers' method, 425-426 comparison, pairwise, 454-455
sequence choice, 266-268 Rolling circle replication, 324 comparison and alignment, 374-378
staining, 264 Root, location, 431 data, 412
stock solutions, 319-320 Rooting, 477-478,488 divergence, 199
techniques and their applications, 250 rpm, 222 evolution, 321
transfer hybridization problems, rpoC1,C2, 217 evolution, models, 432-436
316-317 primer, 240 localization, 144
transfer hybridization protocol, rRNA. see Ribosomal RNA variation, 194, 199
299-302 IGW, 514 z statistic, 374
troubleshooting, 314-317 Sequence dissimilarity, 453-454
Restriction maps 35S label, 265 Sequencing, direct, 264. see also Nucleic
accuracy, 256 S1 nuclease, 188-189 acid sequencing and cloning
cpDNA, 312-314 buffer, 202 Sequencing gel
Restriction site digestion, 174-175 preparation, 362-363
analysis, 249-320, see also Restriction solution, 202 running, protocol, 368-370
fragment length polymorphism Salamander oocytes, lampbrush chromosomes, Sequential electrophoresis, 64
creation, 261-262 130 Sequential hybridization, 303-304,308
mapping, 291 Salmon sperm DNA, 202 Shadowing, 317
Restriction site mapping protocol, 302-308 Salt extraction (DNA), 258 Sheared drivers, 180-181
double digestion experiments, 304-307 Sampling, see also Project design Shikimate dehydrogenase (SKDH), 114
sequential hybridization, 303-304,308 cost-effective, 22 Shuffles, low-stringency, 227-228
Restriction sites, 213 genetic and statistical, 387-388 Silane, 320
RESTSITE, 514 individuals vs. populations, 402 Silent substitutions, 216,457
Retention, 9 intraspecific differentiation, 402-403 Similarity, definition, 7
Retetraploidization, 62 limitations, isozyme electrophoresis, Siminator, 514
Reticulation, 9-10,521 64-66 Simulation bias, 527
detecting, 522 phylogenetic relationship studies, Simulation studies, 527
lineages, 519-520 25-27 Single-locus sequences, identified, population-level
systematic error, 498 comparisons, 272
Single-strand conformation polymorphisms Spermatocyte, chromosome preparation, STE (sodium chloride-Tris-EDTA), 34,
(SSCPs), 252-253,255 129-130 320,380
population screening, 266 Splash technique, 127-129 STES (sodium chloride-Tris-EDTA-sucrose),
resolving power, 263 mitotic chromosome slides, protocol, 283-284,319
Single-stranded PCR amplifications, 151-152 STET (sodium chloride-Tris-EDTA-Triton),
228-229 Split decomposition, 490-492 380
double-stranded amplifications, 229 choosing distance transformation, 492 miniprep clone screening, 358
problems, 231-232 corrected distances, 492 Stop bands, 372
pure mtDNA, 229 Squash technique, 127-128 Stop buffer, 380
SKDH (shikimate dehydrogenase), 114 mitotic/meiotic chromosomes, protocol, Storage, electrophoresis chemicals, 74-78
Skewness, 504 150-151 Streptavidin, 127,139,142
"Smiling" (bands), 373 SSCP (single-strand conformation polymorphism), Stringency, 209
Sodium acetate, 202 252-253,255 DNA fragment analysis, 265
Sodium chloride-sodium citrate (SSC), SSC (sodium chloride-sodium citrate), high-stringency bounces, 227
380 168,380 low-stringency shuffles, 227-228
Sodium chloride-Tris-EDTA + sodium protocol, 361-362 transfer hybridization, 299
dodecyl sulfate (STE + SDS), 320 Stability "Stuttering," 317
Sodium chloride-Tris-EDTA (STE), 380 developmental, 61 Subbands, 66,91-93
blood preservation, 34 RFLP, 252 Subbed slide protocol, 149
Sodium chloride-Tris-EDTA-sucrose tissue, 37-39 Subbing solution for microscope slides,
(STES), 319 Staining 168
buffers, 283-284 background, 93-94,148 Subcellular organelle loci, nomenclature,
Sodium chloride-Tris-EDTA-Triton direct, 264 95
(STET), 358,380 enzyme formulas, 96-116 Subcloning, 325
Sodium dodecyl sulfate (SDS), 380 incubation, 97 into plasmids or M13, 351
Sodium hydroxide, 203 intensity, isozyme, 62 Subgenomic libraries, 324
Sodium iodide solution, 203 silver, 264 Submarine gel, 293
SOD (superoxide dismutase), 114 volume, 96-97 Substitutional mutations, 336
Software, for phylogenetic analyses, Stains Substitution(s), 375
510-514 agar overlay, 86-89,97 branch length, 440
Spacers costs, 96 conservative, 453
external transcribed (ETS), 232 DAB solution, 167 silent, 216
internal transcribed (ITS-1 and ITS-2), histochemical, 86-89,97. see also specific synonymous (silent) vs. non-synonymous
232,238,275,338 stains (replacement), 457
non-transcribed (NTS), 232 UV-fluorescing, 87 Substitution models
wedge-shaped, 372 Starch gel electrophoresis (SGE), 54. see Dayhoff, 439
Speciation, 337 also Isozyme electrophoresis F81, 436
allozymes, 59 agar overlay drying protocol, 88-89 F84, 436,438
cytogenetic change, 144 amperage used, 83 HKY85, 436,438
Species buffer systems, 82-83 JTT, 439
asexual, 520 buffer well, 69 Jukes-Cantor (JC), 435-436
cryptic, 519 documentation of results protocol, 89 Kimura's two-parameter (K2P),
definition, 22 electrodecantation, 54 435-437
diagnostic morphological features, 24 electrophoresis protocol, 82-84 mathematical expression, 432
divergence, 520 equipment and supplies, 68-69 Poisson, 438-439
types, 22-23 gel loading protocol, 81-82 time-reversible, 433-434
unisexual, 518-519 gel preparation protocol, 79-81 Substitution rate, 436,439-440
Species boundaries and hybridization, gel scoring, 90-91 Subunit gene, 232
22-25 gel slicing, 82,85-86 Subunit structure, 89-91
fixed genetic differences, 24 gel splitting/tearing, 83,85 Succinate dehydrogenase (SUDH), 114
hybrid zone, 24-25 histochemical staining protocol, 86-88 SUDH (succinate dehydrogenase), 114
pilot studies, 23-24 horizontal apparatus, 84 Sulfhydryl reagent, enzyme stabilization,
sampling sizes and strategies, 23-24 horizontal vs. vertical, 54 38
Species boundary studies, electrophoretic, laboratory setup, 67-69 Superimposed events, 446,454-46
58 tissue homogenization protocol, 73, Superoxide dismutase (SOD), 114
Species concept, phylogenetic, 23 78-79 isozyme, 94
Species-level comparisons, 276-279 wickless system conditions, 83 Support index, 507
character choice, 276 Star decomposition methods, 483-484 Svedberg units (S), 232
cpDNA, 277-278 Star tree, 410,483 Swofford-Berlocher method, 425-426
mtDNA, animal, 276-277 State set, 417 Sympatric populations/species, 22-23
nuclear genes, 278-279 Statistical sampling, 387 Synapomorphic allozymes, 65
Specific reactivity, 126 Step matrices, 502-503 Synaptonemal complexes, 129-130
Spectra "Step-up" (PCR), 228 Synaptonemal complexes, 129-130
branch length, 468 Stepwise addition, 482-483 Synonymous residues, 453
expected sequence, 468 Steric hindrance, 175 Synonymous substitutions, 457
Spectral analysis. see Hadamard conjugation STE + SDS, 320 Synoptic tissue collections. see Tissue collections,
synoptic
Temperature materials and supplies, 30-33
annealing, PCR, 227-228 noncryogenic, 32
assessing effect of potential bias, extension, PCR, 228 Tissue stability, long-term, 37-39
496-497 incubation buffer, 173 Tissue storage, 31-33,37
changing assumptions, 499 melting. see Melting temperature Tissue transport, 36-37
character-state weighting, 502-503 "ramp" (PCR), 228 shipping regulations, 36
character weighting, 500-502 Template Tn10, 268
conditions that lead to, 494-495 "jumping," 215 trnL, 217
definition and effects, 493-494 match, with primer, 213 Topology-dependent permutation tail
eliminating unreliable data, 499-500 Terminal deoxynucleotidyl transferase probability (T-PTP) tests, 507
individual taxa and optimality criterion, (TdT), 373-374 "Total evidence," 522
498 Terminal length variation, 454 "Touch down" (PCR), 228
inferences in, based on different molecules, Terminal nodes, 410 TPBS (Tris-phosphate buffered saline), 168
498 Termination mixes, cycle sequencing, 381 TPI (triose-phosphate isomerase), 114-115
nonparametric approaches, 498-499 Tetraethylammonium chloride (TEACL), T-PTP (topology-dependent permutation
vs. random error, 493-494 175-176,188-189,203 tail probability) tests, 507
recognizing, 495-499 TE (Tris-EDTA), 381,203 Tracer, 171
reduction, 499-503 buffer, 247,262 repeated elements in DNA, 193
removing long branches, 499 ThE, 319 short or degraded fragments, 194
specific taxa in tree, 497-498 Thermal cycler (PCR machine), 210-212, Tracer fractionation, 178
tree distortion, 497 221,282 Tracer fragment length estimation protocol,
Systematics Thermal cycle sequencing. see Cycle sequencing 184
definition, 1 Tracer preparation
higher-level, 279-282 Thermal cycling. see Polymerase chain reaction iodination protocol, 184-185
molecular evolution, 3-4 32P or 3H protocol, 181-182
Thermal elution, 169-170 Tracking dye, 73,117
T u n , 197 Thermal stability (ΔTm), sequence difference, Transcriptase, reverse, 325
T50H, 196-197 176 Transfer hybridization, 256,265-266
melting curve, 192 Thermus aquaticus polymerase (Taq), 207, probes, 266
ΔT50H, 172 329 problems in RFLP analysis, 316-317
melting curve, 189,192 buffer, 246,380 protocol, DNA fragment analysis,
Tm, 196,200 Theta (θ, FST), 393-394,397-400 299-302
PCR primers, 208 Thiosulfate sulfurtransferase (TST), 114 Transfer hybridization protocol
ΔTm, 170-171 Time labeling of probe and hybridization,
melting curve calculation, 189 estimates and predictions, 531-540 301-302
sequence difference, 176 relations to substitution rate, 439-440 prehybridization of filter, 301
Tmode, 195,200 Time-reversible models, 430-431 transfer of DNA to membrane, 299-301
ΔTmode, 171,189 Dayhoff, 439 two-sided, dry-blot method, 300
TA cloning, 358-359 F81, 436 washing and autoradiography, 302
TAE (Tris-acetate-EDTA), buffer, 247,262, F84, 436,438 Transformation(s), 325
380 general (GTR), 433-434,436,456 blunt-end cloning, 358
Tamura and Nei model, 456 HKY85 model, 436,438 E. coli with plasmid DNA, protocol,
Tandem duplication JTT, 439 352
direct, 309-310 Jukes-Cantor (JC), 435-436 M13 bacteriophage DNA, protocol,
PCR amplification, 331 Kimura's two-parameter (K2P), 352-353
Tandem repeats, 250-252 435-437 preparation of frozen competent cells,
Taq (Thermus aquaticus) polymerase, 207, Poisson, 438-439 protocol, 351-352
329 Time since divergence, 536-539 Transformations, distance
buffer, 246,380 Tissue, preserved, PCR extraction, 223 allozyme/restriction endonuclease
"Targeted" digestion, 261 Tissue collection, 29-35 data, 462-464
Target sequence isolation, 323-326 animal, 33-35 costs, 422-423
TAT (tyrosine aminotransferase), 115 blood and hemolymph, 33-35 evaluating, 461-462
TBE (Tris-borate-EDTA), 118,119,382 materials and supplies, 30-33 sequence data, 453-462
buffer, 247,262 packaging, 30-32 Transformed distances, 487
TBS (Tris buffered saline), 168 plant, 35 Transition probability matrix, 437
TCA (trichloroacetic acid), 138,203 regulations governing, 30 Transitions, 4,375
TCA (trichloroacetic acid)/BSA (bovine venom, 35 Lake's method, 477
serum albumin), 168 Tissue collections, synoptic Transition substitutions, estimating,
TdT (terminal deoxynucleotidyl transferase), curatorial problems, 40-41 456-457
373-374 development and support, 39-41 Transition:transversion ratio, 215
TEACL (tetraethylammonium chloride), existing collections, 41-47 high, methods for, 477
175-176,188-189,203 tissue preservation, 40 Translation, nick. see Nick translation
TEMED (N,N,N',N'-tetramethylethylenediamine), Tissue homogenization protocol, 73,78-79 Transmission electron microscopy in situ
320 Tissue preservation, 32-33 hybridization (TEMISH), 123
TEMISH (transmission electron microscopy effect on electrophoresis, 38 Transposable elements (transposons), 146,
in situ hybridization), 123 isozyme variation, 66-67 268
Transpositions, duplicative,
308 Triose-phosphate isomerase (TPI), interspecific variation analysis, 270
Transversion parsimony, 422,476 114-115 mating system and population structure
Transversion(s), 4,375
Tris-acetate, neutral gel and tray buffer, studies, 267
informative, 476 203 Variance-covariance matrix, 473-474
Lake's method, 477 Tris-acetate-EDTA (TAE), 247,262, 380 Variance formulas, theoretical, 449
substitutions, estimating, 456-457 Tris-borate-EDTA-lithium, 119 Variances, sample, 389
Traversal, postorder, 417 Tris-borate-EDTA (TBE), 118,119,381 Variation
Tree additivity, 447 buffer, 247,262 evolutionary, 335
TreeAlign, 377,514 Tris buffered saline (TBS), 168 interspecific, 5
Tree alignment, Sankoff's method, Tris-citrate/borate, 120 intraspecific, 4-5,201
423-425 Tris-citrate-EDTA, 120 molecular, 6
TREECON, 514 Tris-citrate II, 119 non-genetic, 67
Tree length Tris-citrate III, 119-120 sequence, 176-177
frequency distributions, 504 Tris-EDTA (TE), 120,203,381 Vectors. see Cloning vectors; Insertion
minimization, 415 Tris-ethanol wash buffer, 381 vectors; Replacement vectors; Sequencing
Tree rooting, 477-478 Tris-HCl, 120,203 vectors; T-vector
locating root, 477-478 Tris-maleate-EDTA, 120 Velocity centrifugation, step gradient, 287
midpoint, 488 Tris-NaCl, 168 Venom collection, 35
outgroup taxa in testing ingroup Tritium label, 135 Vent polymerase, 207,329
monophyly, 478 Triton X-100, 211,225 Vinblastine, 127
role in phylogenetic analysis, 477- trnK, 217 Viruses, epidemiology and evolution, 520
Tree(s) TST (thiosulfate sulfurtransferase), 114 VNTR. see Variable number of tandem repeats
additive, 447 Tubing, dialysis, 319,379
additive, algorithmic methods, T-vector preparation, 358-359 VNTR loci, 255
487-493 Tween-20, 211 convergence, 257
additive, systematic error, 495 Two-allele, single-locus model, 90 fragment electrophoresis, 262-263
calculating likelihood, 430-431, Two-fold codons, 212-213 VOSTORG, 514
440-442 Two-parameter model (K2P), 435-436,496 Wagner method, 487,493
character state, 411 distance, 456 Wagner parsimony, 416-419
closest, 471 Tyrosine aminotransferase (TAT), 115 computing tree length, 417-419
computing length, 417-419 Wahlund effect, 57
evaluation of likelihood, 442 UGUT (UTP-glucose-1-phosphate uridyltransferase), Wash buffer (cpDNA), 319
finding optimal, 416 115 Water
five sequences, 441 UK (uridine kinase), 115-116 DEPC-treated, 379
heuristic selection, 482-485 Ultracentrifugation, 216,282 impurities and electrophoresis, 70
likelihood, 440 bottom puncture of tubes, 288 Weighted least-squares criteria, 448-451
minimum length, 416 DNA extraction, 258 Weighted parsimony, 503
vs. networks, 521-522 Ultrametric distances, 452-453 Weighted PGMA (WPGMA), 486
number of changes, 442 Uncorrected parsimony, 426,428 Weighting
parametric test for comparing, 505 Unequal crossing over, 145 schemes, in additive tree methods, 418
rooted, 521 Uninformative character, 500 transversion:transition, 423
rooting, 417 Uninformative patterns, 427 Weighting matrices, 422-423
star, 410 Unisexual biotypes, parentage, 61 Wickless system, 83
tests for comparing two, 504-506 Unisexual species, 518-519 WINAMOVA, 514
ultrametric, 447 Universal primers. see Primer(s), universal "Winning sites" test, 505
unrooted, 410,433,478-479,488,521 Unordered characters, 411 WPGMA (weighted PGMA), 486
Trees, inferred, 493-510 Unrooted tree, 410
random error, 503-510 UPGMA (unweighted pair group method XDH (xanthine dehydrogenase), 116
systematic error, 494-503 using arithmetic averages), 451,486 Xenology, 7-9
systematic vs. random error, 493-494 clustering, 172 systematic error, 498
Trees, optimal Upstream, 206
branch-and-bound methods, 480-482 Upstream primers, 232 Y chromosome-specific sequences, 275
exact algorithms, 478-482 Urea mix, 381 Yeast method for mitotic chromosomes,
heuristic methods, 481 Uridine kinase (UK), 115-116 151
searching for, 478-493 UTP-glucose-1-phosphate uridyltransferase Yeast suspension, 168
star decomposition methods, 483-484 (UGUT), 115
stepwise addition, 482-483 UV cross-linker, 221
Triallelic variation, 89-90
Tricaine, 34 Vacuum blotting, 299
Trichloroacetic acid (TCA), 138,203 Vacuum drying, 38
Trichloroacetic acid/bovine serum albumin Variable number of tandem repeats (VNTRs),
(TCA/BSA), 168 252. see also VNTR loci
About the Book
Editor: Andrew D. Sinauer
Project Editor: Carol J. Wigg
Book Design: Christopher Small
Cover Design: Concept, David Hillis; Design, Jefferson Johnson
Production Manager: Christopher Small
Book Production in QuarkXPress: Janice Holabird
Book and Cover Manufacture: Best Book Manufacturers
