Biological Sequence Analysis Probabilist
BOOK REVIEW
In a recent press release, the British MRC announced a historic day in scientific history, the completion of the Caenorhabditis elegans genome. This first genetic blueprint of an animal adds more than 19,000 genes to the several hundred thousand already available. The sequence databases are regarded as a treasure store. Genome data mining shows signs of triggering a gold rush.

How can you get to share in the bioinformatics bonanza? On the technical side you need tools to spot the ore. BLAST and FASTA are two standard tools for digging. To get the most out of them, you should have some knowledge of their inner workings. Using those tools you will be one in a crowd. If you wish to improve your chances, you must add something more. You may look for other tools that have been neglected so far or you may invent your own wonder shovel. In any case, as a newcomer, you will look for advice.

This book will get you started on some of the fundamental algorithms in sequence alignment. The topics covered are pairwise sequence alignments, multiple sequence alignments, building phylogenetic trees, and RNA (secondary) structure analysis. In between you will find introductions to Hidden Markov Models (HMMs), transformational grammars, and probability.

There is an obvious bias toward HMMs. Several of the standard alignment algorithms (Needleman-Wunsch, Smith-Waterman, multiple alignments) are developed in terms of HMMs, and there is the unspoken implication that all you need are HMMs. In fact this is an elegant way to work out algorithms for string comparison, and the presentation is quite enjoyable. Moreover, most of the algorithms are clearly stated. Even relatively inexperienced readers should be able to implement their own code quickly, provided they are experienced programmers. There is no actual code provided, nor even a pointer to an FTP site.

The more advanced subjects are less accessible. One of the most important recent topics in sequence comparison is that of e-values, expressing the significance level of sequence matches. The basic ideas are addressed in the text and are summarized in the appendix. However, it is rather unlikely that you will succeed in implementing e-values (and actually understand them) unless you consult the original papers.

After finishing the chapters on sequence comparison, you might ask, what now? How can I apply what I have learned? You will not get an answer here. There is no indication of the exciting things you can do by surfing on the net. The book is good at discussing and presenting algorithms, but it will not tell you what to do with the programs that are around, nor how you can be successful in mining that database.

The subjects of biological sequence comparison are nucleic acid and protein sequences. From the computational point of view, there seems to be little or no difference between the two classes of molecules when represented as strings or sequences of symbols. From a structural point of view, however, these molecules are fundamentally different. The complicated three-dimensional structures of proteins have far-reaching consequences for the design and implementation of alignment algorithms. Consideration of structural constraints is particularly important in the treatment of gaps, a notion fundamental to sequence comparison.

You will not find a protein fold in this book. Three-dimensional structure is neglected. In this respect the text follows the tradition of almost all books on biological sequence alignment. Hence, you will not find the powerful extensions and modifications to standard alignment algorithms that can be applied when at least one of the protein folds is known. One may consider this a minor omission in a book on sequence analysis, but some references would be helpful.

The title of the book implies coverage of biological sequence alignment. In fact it presents a rather narrow subset of topics. Several of the most exciting developments are not even mentioned. Take fold recognition and threading, for example. These methods consistently score well in public blind tests, and they provide a number of exciting problems in algorithm design and implementation, but there is no reference to threading or fold recognition.

All things considered, can this book be recommended? Yes, it can. It is a nice text for anybody who wants to understand or implement sequence alignment techniques. This book will get you started and on the right track. In fact there are not many textbooks available that make the subject of sequence comparison enjoyable. This book is one of them. I look forward to a second edition.

MANFRED J. SIPPL
Center of Applied Molecular Engineering
University of Salzburg
Jakob Haringer Str. 3
Salzburg, Austria