A Practical Guide To Scientific Data Analysis
JWBK419-FM JWBK419/Livingstone September 25, 2009 13:8 Printer Name: Yet to Come
David Livingstone
ChemQuest, Sandown, Isle of Wight, UK
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ,
United Kingdom
For details of our global editorial offices, for customer services and for information about
how to apply for permission to reuse the copyright material in this book please see our
website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in
accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act
1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as
trademarks. All brand names and product names used in this book are trade names, service
marks, trademarks or registered trademarks of their respective owners. The publisher is not
associated with any product or vendor mentioned in this book. This publication is designed
to provide accurate and authoritative information in regard to the subject matter covered. It
is sold on the understanding that the publisher is not engaged in rendering professional
services. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
The publisher and the author make no representations or warranties with respect to the
accuracy or completeness of the contents of this work and specifically disclaim all warranties,
including without limitation any implied warranties of fitness for a particular purpose. This
work is sold with the understanding that the publisher is not engaged in rendering
professional services. The advice and strategies contained herein may not be suitable for
every situation. In view of ongoing research, equipment modifications, changes in
governmental regulations, and the constant flow of information relating to the use of
experimental reagents, equipment, and devices, the reader is urged to review and evaluate the
information provided in the package insert or instructions for each chemical, piece of
equipment, reagent, or device for, among other things, any changes in the instructions or
indication of usage and for added warnings and precautions. The fact that an organization or
Website is referred to in this work as a citation and/or a potential source of further
information does not mean that the author or the publisher endorses the information the
organization or Website may provide or recommendations it may make. Further, readers
should be aware that Internet Websites listed in this work may have changed or disappeared
between when this work was written and when it is read. No warranty may be created or
extended by any promotional statements for this work. Neither the publisher nor the author
shall be liable for any damages arising herefrom.
Library of Congress Cataloging-in-Publication Data
Livingstone, D. (David)
A practical guide to scientific data analysis / David Livingstone.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-85153-1 (cloth : alk. paper)
1. QSAR (Biochemistry) – Statistical methods. 2. Biochemistry – Statistical methods.
I. Title.
QP517.S85L554 2009
615 .1900727–dc22
2009025910
A catalogue record for this book is available from the British Library.
ISBN 978-0-470-85153-1
Typeset in 10.5/13pt Sabon by Aptara Inc., New Delhi, India.
Printed and bound in Great Britain by TJ International, Padstow, Cornwall
P1: OTE/OTE/SPH P2: OTE
Contents
Preface
Abbreviations
4 Data Display
4.1 Introduction
4.2 Linear Methods
4.3 Nonlinear Methods
4.3.1 Nonlinear Mapping
4.3.2 Self-organizing Map
4.4 Faces, Flowerplots and Friends
4.5 Summary
References
Index
Preface
The idea for this book came in part from teaching quantitative drug
design to B.Sc. and M.Sc. students at the Universities of Sussex and
Portsmouth. I have also needed to describe a number of mathematical
and statistical methods to my friends and colleagues in medicinal
(and physical) chemistry, biochemistry, and pharmacology departments
at Wellcome Research and SmithKline Beecham Pharmaceuticals. I have
looked for a textbook which I could recommend which gives practical
guidance in the use and interpretation of the apparently esoteric methods
of multivariate statistics, otherwise known as pattern recognition. I
would have found such a book useful when I was learning the trade, and
so this is intended to be that sort of guide.
There are, of course, many fine textbooks of statistics and these are
referred to as appropriate for further reading. However, I feel that there
isn’t a book which gives a practical guide for scientists to the processes of
data analysis. The emphasis here is on the application of the techniques
and the interpretation of their results, although a certain amount of
theory is required in order to explain the methods. This is not intended
to be a statistical textbook, indeed an elementary knowledge of statistics
is assumed of the reader, but is meant to be a statistical companion to
the novice or casual user.
It is necessary here to consider the type of research which these methods
may be used for. Historically, techniques for building models to
relate biological properties to chemical structure have been developed in
pharmaceutical and agrochemical research. Many of the examples used
in this text are derived from these fields of work. There is no reason,
however, why any sort of property which depends on chemical structure
should not be modelled in this way. This might be termed quantitative
structure–property relationships (QSPR) rather than QSAR where
David Livingstone
Sandown, Isle of Wight
May 2009
Abbreviations