Computational Biology
Jeremy Ramsden
Bioinformatics
An Introduction
Fourth Edition
Computational Biology
Editors-in-Chief
Andreas Dress, CAS-MPG Partner Institute for Computational Biology, Shanghai,
China
Michal Linial, Hebrew University of Jerusalem, Jerusalem, Israel
Olga Troyanskaya, Princeton University, Princeton, NJ, USA
Martin Vingron, Max Planck Institute for Molecular Genetics, Berlin, Germany
Advisory Editors
Gordon Crippen, University of Michigan, Ann Arbor, MI, USA
Joseph Felsenstein, University of Washington, Seattle, WA, USA
Dan Gusfield, University of California, Davis, CA, USA
Sorin Istrail, Brown University, Providence, RI, USA
Thomas Lengauer, Max Planck Institute for Computer Science, Saarbrücken,
Germany
Marcella McClure, Montana State University, Bozeman, MT, USA
Martin Nowak, Harvard University, Cambridge, MA, USA
David Sankoff, University of Ottawa, Ottawa, ON, Canada
Ron Shamir, Tel Aviv University, Tel Aviv, Israel
Mike Steel, University of Canterbury, Christchurch, New Zealand
Gary Stormo, Washington University in St. Louis, St. Louis, MO, USA
Simon Tavaré, University of Cambridge, Cambridge, UK
Tandy Warnow, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Lonnie Welch, Ohio University, Athens, OH, USA
Editorial Board
Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Leipzig,
Germany
Gene Myers, Max Planck Institute of Molecular Cell Biology and Genetics,
Dresden, Germany
Pavel Pevzner, University of California, San Diego, CA, USA
Endorsed by the International Society for Computational Biology, the
Computational Biology series publishes the very latest, high-quality research
devoted to specific issues in computer-assisted analysis of biological data. The main
emphasis is on current scientific developments and innovative techniques in
computational biology (bioinformatics), bringing to light methods from mathematics,
statistics and computer science that directly address biological problems
currently under investigation.
The series offers publications that present the state-of-the-art regarding the
problems in question; show computational biology/bioinformatics methods at work;
and finally discuss anticipated demands regarding developments in future
methodology. Titles can range from focused monographs, to undergraduate and
graduate textbooks, and professional text/reference works.
Jeremy Ramsden
Department of Biomedical Research
The University of Buckingham
Buckingham, UK
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
We, who do not bind our thirst for knowledge to any one discipline,
long for an overview of the whole.
Imre Madách
Preface to the Fourth Edition
Eight years have elapsed since the previous edition, during which there have been
continuing rapid advances in many of the technologies used to obtain the raw
data of bioinformatics, such as DNA sequencing, as well as enormous increases in
widely available computing power, and discoveries have continued apace. There has
also been a global pandemic, combating which has been greatly assisted by
bioinformatics, and which vastly boosted data acquisition. These developments alone
warranted thorough revision of the material in the book. The opportunity has also
been taken to somewhat rearrange the chapter topics, although admittedly in such a
multidimensional subject as bioinformatics there is probably no ideal arrangement.
There has been a significant increase in the space accorded to regulatory networks
and their analysis, which is now in better balance with the nucleic acid sequencing
aspects, which are usually perceived as the traditional subject matter of
bioinformatics; the transmission of information within the networks, and their architecture,
deserve comparable prominence. We are becoming accustomed to the idea that life is
organized heterarchically and that our DNA is just one of many features contributing
to a living organism, which must survive a lifetime in a changing environment, during
which its DNA sequence is not changing.
New material added includes forensic investigation, viruses, pandemics,
domestication, and multiomics. Nevertheless, every effort has been made to avoid unduly
increasing the overall length of the book. Many new references have been added, and
of course it has never been easier for a reader to find further information from the
vast, albeit uncritically accumulated, resources available on the World Wide Web.
The reader should be cautioned not to accept anything in this book—or indeed in
any other—as the last word. As Max Planck remarked at the end of his 17th Guthrie
lecture, delivered to the Physical Society in London in 1932: “… science does not
mean contemplative rest in possession of sure knowledge, it means untiring work
and steadily advancing development”.
Preface to the Third Edition
The publication of this third edition has provided the opportunity to carefully
scrutinize the entire contents and update them wherever necessary. Overview and aims,
organization and features, and target audiences remain unchanged. The main
additions are in Part III (Applications), which has acquired new sections or chapters
on the seemingly ever expanding “-omics”—now metagenomics, toxicogenomics,
glycomics, lipidomics, microbiomics, and phenomics are all covered, albeit mostly
briefly. The increasing involvement of information theory with ecosystems
management, which is undoubtedly a part of biology, was felt to warrant a new chapter on
that topic. The nervous system has also been explicitly included: it is indubitably
an information processor and at the same time biological and, therefore, certainly
warrants inclusion, although consideration of the vastness of the topic and its
extensive coverage elsewhere has kept the corresponding chapter brief. A section on the
automation of biological research now concludes the work.
In his contribution, entitled “The domain of information theory in biology”, to
the 1956 Symposium on Information Theory in Biology, Henry Quastler remarks
(p. 188) that “every kind of structure and every kind of process has its informational
aspect and can be associated with information functions. In this sense, the domain
of information theory is universal—that is, information analysis can be applied to
absolutely anything”. This sentiment continues to pervade the present work.
The author takes this opportunity to thank all those who kindly commented on
the second edition.
January 2015
Preface to the Second Edition
Overview and aims. This book is intended as a self-contained guide to the entire field
of bioinformatics, interpreted as the application of information science to biology.
The book rests on a strong belief that information is a profound concept underlying
biology, and that familiarity with the concepts of information should make it possible
to gain many important new insights into biology. In other words, the vision
underpinning this book goes beyond the narrow interpretation of bioinformatics
sometimes encountered, which may confine itself to specific tasks such as the attempted
identification of genes in a DNA sequence.
Organization and features. The chapters are grouped into three parts,
respectively covering the relevant fundamentals of information science; overviewing all of
biology; and surveying applications. Thus Part I (fundamentals) carefully explains
what information is, and discusses attributes such as value and quality, and its multiple
aspects of accuracy, meaning, and effect. The transmission of information through
channels is described. Brief summaries of the necessary elements of set theory,
combinatorics, probability, likelihood, clustering, and pattern recognition are given.
Concepts such as randomness, complexity, systems, and networks, needed for the
understanding of biological organization, are also discussed. Part II (biology) covers
both organismal (ontogeny and phylogeny, as well as genome structure) and
molecular aspects. Part III (applications) is devoted to the most important practical
applications of bioinformatics, notably gene identification, transcriptomics, proteomics,
interactomics (dealing with networks of interactions), and metabolomics. These
chapters start with a discussion of the experimental aspects (such as DNA sequencing
in the genomics chapter), and then move on to a thorough discussion of how the data
is analysed. Specifically medical applications are grouped in a separate chapter. A
number of problems are suggested, many of which are open-ended and intended to
stimulate further thinking. The bibliography points to specialized monographs and
review articles expanding on material in the text, and includes guide references to
very recently reported research not yet to be found in reviews.
May 2008
Preface to the First Edition
This little book attempts to give a self-contained account of bioinformatics, so that the
newcomer to the field may, whatever his point of departure, gain a rather complete
overview. At the same time it makes no claim to be comprehensive: The field is
already too vast—and let it be remembered that although its recognition as a distinct
discipline (i.e., one after which departments and university chairs are named) is
recent, its roots go back a long time.
Given that many of the newcomers arrive from either biology or informatics, it was
an obvious consideration that for the book to achieve its aim of completeness, large
portions would have to deal with matter already known to those with backgrounds
in either of those two fields; that is, in the particular chapters dealing with them,
the book would provide no information for them. Since such chapters could hardly
be omitted, I have tried to consider such matter in the light of bioinformatics as a
whole, so that even the student ostensibly familiar with it could benefit from a fresh
viewpoint.
In one regard especially, this book cannot be comprehensive. The field is
developing extraordinarily rapidly and it would have been artificial and arbitrary to take
a snapshot of the details of contemporary research. Hence I have tried to focus on
a thorough grounding of concepts, which should not only enable the student to
understand contemporary work but also serve as a springboard for his or her own
discoveries. Much of the raw material of bioinformatics is open and accessible to all
via the internet, powerful computing facilities are ubiquitous, and we may be
confident that vast tracts of the field lie yet uncultivated. This accessibility extends to the
literature: Research papers on any topic can usually be found rapidly by an internet
search and, therefore, I have not aimed at providing a comprehensive bibliography.
In bioinformatics, so much is to be done, the raw material to hand is already so
vast and vastly increasing, and the problems to be solved are so important (perhaps
the most important of any science at present), that we may be entering an era comparable
to the great flowering of quantum mechanics in the first three decades of the twentieth
century, during which there were periods when practically every doctoral thesis was
a major breakthrough. If this book is able to inspire the student to take up some of
the challenges, then it will have accomplished a large part of what it sets out to do.
Indeed, I would go further to remark that I believe that there are still comparatively
simple things to be discovered and that many of the present directions of work in the
field may turn out not to be right. Hence, at this stage in its development the most
important thing is to foster the viewpoint that will facilitate new discoveries.
This belief also underlies the somewhat more detailed coverage of the biological
processes in which information processing in nature is embodied than might be
considered customary.
A work of this nature depends on a long history of interactions, discussions,
and correspondence with many present and erstwhile friends and colleagues, some
of whom, sadly, are no longer alive. I have tried to reflect some of this debt in
the citations. Furthermore, many scientific subjects and methods other than those
mentioned in the text had to be explored before the ones best suited to the purpose
of this work could be selected, and my thanks are due to all those who helped in
these preliminary studies. I should like to add an especial word of thanks to Victoria
Kechekhmadze for having so ably drawn the figures.
Contents
1 Introduction
1.1 What is Bioinformatics?
1.2 What Can Bioinformatics Do?
1.3 An Ontology of Bioinformatics
1.4 The Organization of This Book
References
Part I Overview
2 Genotype, Phenotype, and Environment
References
3 Regulation and Control
3.1 The Concept of Machine
3.2 Regulation
3.3 Cybernetics
3.4 Adaptation
3.5 The Integrating Rôle of Directive Correlation
3.6 Timescales of Adaptation
3.7 The Architecture of Functional Systems
3.8 Autonomy and Heterarchical Architecture
3.9 Biological Information Processing
References
4 Evolution
4.1 Phylogeny and Evolution
4.1.1 Group and Kin Selection
4.1.2 Models of Evolution
4.2 Evolutionary Systems
4.3 Evolutionary Computing
4.4 Concluding Remarks on Evolution
References
Part II Information
6 The Nature of Information
6.1 Structure and Quantity
6.1.1 The Generation of Information
6.1.2 Conditional and Unconditional Information
6.1.3 Experiments and Observations
6.2 Constraint
6.2.1 The Value of Information
6.2.2 The Quality of Information
6.3 Accuracy, Meaning, and Effect
6.3.1 Accuracy
6.3.2 Meaning
6.3.3 Effect
6.3.4 Significs
6.4 Further Remarks on Information Generation and Reception
6.5 Summary
References
7 The Transmission of Information
7.1 The Capacity of a Channel
7.2 Coding
7.3 Decoding
7.4 Compression
7.4.1 Use of Compression to Measure Distance
7.4.2 Ergodicity
7.5 Noise
7.6 Error Correction
7.7 Summary
References
8 Sets and Combinatorics
8.1 The Notion of Set
8.2 Combinatorics
8.2.1 Ordered Sampling with Replacement
8.2.2 Ordered Sampling Without Replacement
8.2.3 Unordered Sampling Without Replacement
8.2.4 Unordered Sampling With Replacement
8.3 The Binomial Theorem