Chemometrics: Data Driven Extraction for Science
Ebook, 1,126 pages

About this ebook

A new, full-color, completely updated edition of the key practical guide to chemometrics

This new edition of the practical guide on chemometrics emphasizes the principles and applications behind the main ideas in the field using numerical and graphical examples, which can then be applied to a wide variety of problems in chemistry, biology, chemical engineering, and allied disciplines. Presented in full color, it features expanded sections on principal component analysis, classification, multivariate evolutionary signals and statistical distributions, new case studies in metabolomics, and extensive updates throughout. Aimed at the large number of users of chemometrics, it includes extensive worked problems and chapters explaining how to analyze datasets, in addition to updated descriptions of how to apply Excel and Matlab for chemometrics.

Chemometrics: Data Driven Extraction for Science, Second Edition offers chapters covering experimental design, signal processing, pattern recognition, calibration, and evolutionary data. The pattern recognition chapter from the first edition is divided into two separate chapters: Principal Component Analysis/Cluster Analysis, and Classification. It also includes new descriptions of Alternating Least Squares (ALS) and Iterative Target Transformation Factor Analysis (ITTFA), as well as updated descriptions of wavelets and Bayesian methods.

  • Includes updated chapters of the classic chemometric methods (e.g. experimental design, signal processing, etc.)
  • Introduces metabolomics-type examples alongside those from analytical chemistry
  • Features problems at the end of each chapter to illustrate the broad applicability of the methods in different fields
  • Supplemented with data sets and solutions to the problems on a dedicated website

Chemometrics: Data Driven Extraction for Science, Second Edition is recommended for post-graduate students of chemometrics as well as applied scientists (e.g. chemists, biochemists, engineers, statisticians) working in all areas of data analysis.

Language: English
Publisher: Wiley
Release date: Mar 13, 2018
ISBN: 9781118904688

    Chemometrics - Richard G. Brereton

    Preface to Second Edition

    The first edition of this book has been well received, with a special emphasis on numerical illustration of a wide range of chemometric methods. Of particular importance were the problems at the end of each chapter that readers could work through in their own favourite environment, such as Excel or Matlab, but also R or Python or Fortran or any number of languages or computational packages if desired. I have performed calculations in both Matlab and Excel, but readers should not feel restricted if they have an alternative.

    The reader of this book is likely to be an applied scientist or statistician who wishes to understand the basis and motivation of many of the main methods used in chemometrics.

    Since the first edition, chemometrics has become much more widespread, including outside mainstream chemistry. In the early 2000s, the major applications were quantitative laboratory analytical science and chemical engineering, including process control. Over the past few years, application areas have broadened as large laboratory-generated analytical data sets have become more widely available, for example, in metabolomics, heritage science and food science. This is reflected in a larger emphasis on pattern recognition in the second edition, including some practical case studies from metabolomics in the form of worked problem sets.

    Despite this, many of the original building blocks of the subject remain unchanged. A factorial design and a principal component are still the same, so parts of the text involve only small changes from the first edition. Nevertheless, feedback both from my students and co-workers and from comments via the Internet has provided valuable guidance as to what changes are desirable for a second edition. Important structural changes, such as multiple choice questions throughout the book and colour printing, bring the original edition up to date as a modern-day textbook.

    Some major updates are as follows.

    • Short multiple choice questions at the end of every section of the main text.

    • Colour printing involving redrawing many figures.

    • New chapter on supervised pattern recognition (classification) involving enhanced discussions of SIMCA, PLS-DA, LDA, QDA, EDC, kNN as well as validation.

    • New case studies on NIR for distinguishing edible oils, and properties of elements, to illustrate unsupervised pattern recognition methods.

    • New case studies in metabolomics, including Arabidopsis genotyping by MS, Raman of cancerous lymph nodes and NMR for diagnosing diabetes, as new problem sets.

    • Additional description of MCR and ITTFA.

    • New and expanded discussions of wavelets and of Bayesian methods in signal analysis.

    • Updated description of Matlab R2016a under Windows 10, and Excel 2016 under Windows 10, in the context of the needs of the chemometrician.

    • Enhanced discussion of the main statistical distributions.

    • Enhanced discussions on validation and optimisation, including description of the bootstrap and of performance indicators.

    To supplement this book, all data sets in this book, both from the main text and the problems at the end of each chapter, are downloadable. In addition, there is a downloadable Excel add-in to perform most of the common multivariate methods and a macro for labelling graphs. Matlab routines corresponding to many of the main methods are also available. The answers to the problems at the end of each chapter can also be found. These are available on the Wiley website associated with this book.

    It is hoped that this text will be useful for students wishing to obtain a fundamental understanding of many chemometric methods. It will also be useful for any practising chemometrician who needs to work through methods they may have only recently encountered, using numerical examples: as a researcher, when I encounter an unfamiliar approach, I usually like to reproduce numerical data from published case studies to check how it works before I am confident enough to use the method. For people encountering chemometrics for the first time, for example, in metabolomics and heritage science, this book presents many of the most widespread methods and so will serve as a good reference. As a refresher, the multiple choice questions test basic understanding. The worked case studies can be collected together and are helpful for courses.

    Finally, I thank the publishers, especially Jenny Cossham, who have encouraged the development of this rather complex project through many stages, and also the colleagues who have provided data, as listed in the acknowledgements.

    Bristol, May 2017

    Richard G. Brereton

    Preface to First Edition

    This book is the product of several years of my own activities. First and foremost, the task of educating graduate students in my research group from a large variety of backgrounds over the past 10 years has been a significant formative experience, and this has allowed me to develop a large series of problems which we set every 3 weeks, presenting the answers in seminars. From my experience, this is the best way to learn chemometrics! In addition, I have had the privilege of organising high-quality international courses, mainly for industrialists, with the participation as tutors of many representatives of the best organisations and institutes around the world, and I have learnt from them. Different approaches are normally taken when teaching industrialists, who may be encountering chemometrics for the first time in mid-career and have a limited period of a few days to attend a condensed course, and university students, who have several months or even years to practise and improve. However, it is hoped that this book represents a symbiosis of both needs.

    In addition, it has been a great inspiration for me to write a regular fortnightly column for Chemweb (available to all registered users on www.chemweb.com), and some of the material in this book is based on articles first available in this format. Chemweb brings a large reader base to chemometrics, and feedback via e-mail and during travels around the world has helped me formulate my ideas. There is a very wide interest in this subject, but it is somewhat fragmented. For example, there is a strong group of near-infrared spectroscopists, primarily in the USA, who have led the application of advanced ideas in process monitoring and who see chemometrics as a quite technical, industrially oriented subject. There are other groups of mainstream chemists who see chemometrics as applicable to almost all branches of research, ranging from kinetics to titrations to synthesis optimisation. Satisfying all these diverse people is not an easy task.

    This book relies mainly on numerical examples: many in the body of the text come from my favourite research interests, which are primarily in analytical chromatography and spectroscopy. Expanding the text further would produce a huge book of twice the size, so I ask the indulgence of readers if your area of application differs. Certain chapters, such as those on calibration, could be approached from widely different viewpoints, but the methodological principles are the most important, and if you understand how the ideas can be applied in one area, you will be able to translate them to your own favourite application. In the problems at the end of each chapter, I cover a wider range of applications to illustrate the broad basis of these methods. The emphasis of this book is on understanding ideas, which can then be applied to a wide variety of problems in chemistry, chemical engineering and allied disciplines.

    It is difficult to select what material to include in this book without making it too long. Every expert I have shown this book to has made suggestions for new material. Some I have taken into account, and I am most grateful for every proposal; others I have mentioned briefly or not at all, mainly for reasons of length and also to ensure that this book sees the light of day rather than expanding without end. There are many outstanding specialist books for the enthusiast. It is my experience, however, that if you understand the main principles (which are quite few in number) and constantly apply them to a variety of problems, you will soon pick up the more advanced techniques, so it is the building blocks that are most important.

    In a book of this nature, it is very difficult to decide what detail is required for the various algorithms: some readers will have no real interest in the algorithms, whereas others will feel the text is incomplete without comprehensive descriptions. The main algorithms for common chemometric methods are presented in Appendix A.2. Step-by-step descriptions of methods, rather than algorithms, are presented in the text. A few approaches that will interest some readers, such as cross-validation in PLS, are described in the problems at the end of the appropriate chapters, which supplement the text. It is expected that readers will approach this book with different levels of knowledge and expectations, so it is possible to gain a great deal without an in-depth appreciation of computational algorithms, but for interested readers, the information is nevertheless available. People rarely read texts in a linear fashion; they often dip in and out according to their background and aspirations, and chemometrics is a subject which people approach with very different previous knowledge and skills, so it is possible to gain from this book without covering every topic in full. Many readers will simply use add-ins or Matlab commands and be able to produce all the results in this text.

    Chemometrics uses a very large variety of software. In this book, we recommend two main environments, Excel and Matlab; the examples have been tried in both environments, and you should be able to get the same answers in both cases. Users of this book will vary from people who simply want to plug the data into existing packages to those who are curious and want to reproduce the methods in their own favourite language, such as Matlab, VBA or even C. In some cases, instructors may use the information available with this book to tailor examples for problem classes. Extra software supplements are available via the publishers' website www.SpectroscopyNOW.com, together with all the data sets in this book.

    The problems at the end of each chapter form an important part of the text, the examples being a mixture of simulations (which have an important role in chemometrics) and real case studies from a wide variety of sources. For each problem, the relevant sections of the text that provide further information are referenced. However, a few problems build on the existing material and take the reader further: a good chemometrician should be able to use the basic building blocks to understand and use new methods. The problems are of various types; thus, not every reader will need to solve all of them. In addition, instructors can use the data sets to construct workshops or course material that goes further than the book.

    I am very grateful for the tremendous support I have had from many people when asking for information and help with data sets and permission where required. I thank Chemweb for agreement to present material modified from articles originally published in their e-zine, The Alchemist, and the RSC for permission to base the text of Chapter 5 on material originally published in the Analyst (125, 2125–2154 (2000)). A full list of acknowledgements for the data sets used in this text is presented after this foreword.

    I thank Tom Thurston and Les Erskine for a superb job on the Excel add-in, and Hailin Shen for outstanding help in Matlab. Numerous people have tested the answers to the problems. Special mention should be given to Christian Airiau, Kostas Zissis, Tom Thurston, Conrad Bessant and Cevdet Demir for access to a comprehensive set of answers on disc for a large number of exercises, so that I could check mine. In addition, several people have read chapters and made detailed comments, particularly checking numerical examples; in particular, I thank Hailin Shen for suggestions about improving Chapter 6 and Mohammed Wasim for careful checking of errors. In some ways, the best critics are the students and postdocs working with me, because they are the people who have to read and understand a book of this nature, and it gives me great confidence that my co-workers in Bristol have found this approach useful and have been able to learn from the examples.

    Finally, I thank the publishers for taking the germ of an idea and making valuable suggestions as to how it could be expanded and improved to produce what I hope is a successful textbook, and for having faith and patience over a protracted period.

    Bristol, February 2002

    Richard G. Brereton

    Acknowledgements

    The following have provided me with sources of data for this text. All other case studies are simulations.

    About the Companion Website

    Do not forget to visit the companion website for this book:

    http://booksupport.wiley.com

    The accompanying website for this text, http://booksupport.wiley.com, provides valuable material designed to enhance your learning, including:

    • Answers to problems at the end of each chapter

    • Software

    • Associated data sets

    • Figures in PPT

    Chapter 1

    Introduction

    1.1 Historical Parentage

    There are many opinions about the origin of chemometrics. Until quite recently, the birth of chemometrics was considered to have happened in the 1970s. Its name first appeared in 1972 in an article by Svante Wold [1]: in fact, the topic of this article was not one that we would recognise as being core to chemometrics, being relevant to neither multivariate analysis nor experimental design. For over a decade, the word chemometrics had a very low profile, and it developed a recognisable presence only in the 1980s, as described below.

    However, if an explorer describes a new species in a forest, the species was there long before the explorer. Thus, the naming of the discipline just recognises that it had reached some level of visibility and maturity. As people re-evaluate the origins of chemometrics, the birth can be traced many years back.

    Chemometrics burst into the world due to three fundamental factors: applied statistics (multivariate and experimental design), statistics in analytical and physical chemistry, and scientific computing.

    1.1.1 Applied Statistics

    The ideas of multivariate statistics have been around a long time. R.A. Fisher and colleagues working in Rothamsted, UK, formalised many of our modern ideas while applying them primarily to agriculture. In the UK, before the First World War, many of the upper classes owned extensive land and relied on their income from tenant farmers and agricultural labourers. After the First World War, the cost of labour became higher, with many moving to the cities, and there was stronger competition from imported food. This meant that historic agricultural practices were seen to be inefficient, and it was hard for landowners (or companies that took over large estates) to be economic and competitive; hence, there was a huge emphasis on agricultural research, including statistics, to improve these practices. R.A. Fisher and co-workers published some of the first major books and papers that we would regard as defining modern statistical thinking [2, 3], introducing ideas ranging from the null hypothesis to discriminant analysis to ANOVA. Some of the work of Fisher followed from the pioneering work of Karl Pearson at University College London, who had previously founded the world's first statistics department and had first formulated ideas such as p values and correlation coefficients.

    During the 1920s and 1930s, a number of important pioneers of multivariate statistics published their work, many strongly influenced by or having worked with Fisher, including Harold Hotelling, credited by many with defining principal components analysis (PCA) [4], although Pearson had independently described this method some 30 years earlier, but under a different guise. As ideas are so often reported several times over in science, it is the person who names and popularises an idea who often gets the credit: in the early twentieth century, libraries were often localised, there were very few international journals (Hotelling working mainly in the US) and there was certainly no internet; therefore, parallel work was often reported.

    The principles of statistical experimental design were also formulated at around this period. There had been earlier reports of what we regard as modern approaches to formal designs, for example James Lind's work on scurvy in the eighteenth century and Charles Peirce's discussion of randomised trials in the nineteenth century, but Fisher's classic work of the 1930s put all the concepts together in a rigorous statistical format [5].

    Much non-Bayesian, applied statistical thinking has been based on principles established in the 1920s and 1930s, for nearly a century. Early applications include agriculture, psychology, finance and genetics. After the Second World War, the chemical industry took an interest. In the 1920s, an important need was to improve agricultural practice, but by the 1950s, a major need was to improve processes in manufacturing, especially chemical engineering; hence, many more statisticians were employed within the industry. O.L. Davies edited an important book on experimental design with contributions from colleagues in ICI [6]. Foremost was G.E.P. Box, son-in-law of Fisher, whose book with colleagues is one of the most important post-war classics in experimental design and multi-linear regression [7].

    These statistical building blocks were already mature by the time people started calling themselves chemometricians and have changed only a little during the intervening period.

    1.1.2 Statistics in Analytical and Physical Chemistry

    Statistical methods, for example, to estimate accuracy and precision of measurements or to determine a best-fit linear relationship between two variables, have been available to analytical and physical chemists for over a century. Almost every general analytical textbook includes chapters on univariate statistics and has done for decades. Although theoretically we could view this as applied statistics, on the whole, the people who advanced statistics in analytical chemistry did not class themselves as applied statisticians and specialist terminology has developed over time.

    Most quantitative analytical and physical chemistry until the 1970s was viewed as a univariate field; that is, only one independent variable was studied in an experiment, and usually all other external factors were kept constant. This approach, the so-called ‘One Factor at a Time’ (OFAT) approach, worked well in mechanics or fundamental physics. Hence, statistical methods were primarily used for univariate analysis of data. By the late 1940s, some analytical chemists were aware of ANOVA, F-tests and linear regression [8], although the term chemometrics had not yet been invented; multivariate data came along much later.

    There would have been very limited cross-fertilisation between applied statisticians, working in mathematics departments, and analytical chemists in chemistry departments during these early days. Different departments often had different buildings, different libraries and different textbooks. A chemist, however numerate, would feel a stranger walking into a maths building and would probably cocoon him or herself in their own library. There was no such thing as the Internet, Web of Knowledge or electronic journals. Maths journals published papers for mathematicians, and chemistry journals for chemists. Although in areas such as agriculture and psychology there was a tradition of consulting statisticians, chemists were numerate and tended to talk to each other: an experimental chemist wanting to fit a straight line would talk to a physical chemist in the tea room if need be. Hence, ideas did not travel far in academia. Industry was somewhat more pragmatic, but even there, the main statistical innovations were in chemical engineering and process chemistry and were often classed as industrial chemistry. The top universities often did not teach or research industrial chemistry, although they did teach Newtonian physics and relativity. In fact, the treatment of variables and errors by physicists trying, for example, to measure gravitational effects or the distance of a star is quite different to multivariate statistics: the former try to design experiments so that only one factor is studied and to make sure any errors are minimised and come from one source, whereas a multivariate statistician might accept and expect data to be multifactorial.

    Hence, statistics in analytical chemistry diverged from applied statistics for many decades. Caulcutt and Boddy's book, first published in 1983, contains nothing on multivariate statistics [9], and in Miller and Miller's book of 1993, just one out of six main chapters is devoted to experimental design, optimisation and pattern recognition (including PCA) [10].

    Even now, there are numerous useful books aimed at analytical and physical chemists that omit multivariate statistics. An elaborate vocabulary has developed for the needs of analytical chemists, with specialist concepts that are rarely encountered in other areas. Some analytical chemists in the 1960s to 1980s were aware that multivariate approaches existed and did venture into chemometrics, but good multivariate data were limited. Most are aware of ANOVA and experimental design. However, statistics for analytical chemistry tends to lead a separate existence from chemometrics, although multivariate methods derived from chemometrics do have a small foothold within most graduate-level courses and books in general analytical chemistry, and certainly quantitative analytical (and physical) chemistry was an important building block for modern chemometrics.

    Over the last two decades, however, applications of chemometrics have moved far beyond traditional quantitative analytical chemistry, for example, into the areas of metabolomics, environment, cultural heritage or food, where the outcome is not necessarily to measure accurately the concentration of an analyte or how many compounds are in the spectra of a series of mixtures. This means that the aim of some chemometric analysis has changed. We often do not have, for example, well-established reference samples, and in many cases, we cannot judge a method by how efficiently it predicts properties of these reference samples. We may not know whether the spectra of some extracts of urine samples contain enough information to tell whether our donors are diseased or not: it may depend on how the disease has progressed, how good the diagnosis is, what the genetics of the donor are and so on. Hence, we may never have a model that perfectly distinguishes two groups of samples. In classical physical or analytical chemistry, the answer is usually known in advance to a greater accuracy than we can predict, so we can always tell which methods are best. This gradual change in culture distinguishes much of modern chemometrics from traditional statistics in analytical chemistry, although analytical chemistry is definitely one of the ancestors of chemometrics, and the two are symbiotic.

    1.1.3 Scientific Computing

    Another revolution happened from the 1960s onwards: the use of computers in scientific research.

    Many of the original statistical computations required complex matrix operations that may have taken days or even weeks to solve using manual calculations, even with calculators or slide rules. This limited the applicability of many statistical methods. Many early statistical papers were intensely theoretical, and some methods were applied only to important and economically significant problems: an agricultural experiment that took several years deserved a couple of weeks of manually computing the trends in the data. However, such calculations were not widespread, especially in scientific laboratories.

    In the 1960s, scientists in the best-resourced laboratories gained access to mainframe computers. Usually, these had to be programmed in languages such as FORTRAN, using punch cards, paper tape and line printers. However, they allowed a rapid adoption of computers by applied scientists, which became the third revolution that led to chemometrics.

    Resolution and rank analysis of spectroscopy of mixtures had its vintage in the 1960s, with a small number of pioneering papers [11, 12] taking advantage of newly available computer power: in earlier papers, such methods were reported but were applied to very small problems, for example, of four mixtures and four wavelengths due to the difficulty of manual calculation. Multivariate spectroscopic resolution developed quite separately to multivariate statistics, primarily via physical chemistry. The original terminology differed quite considerably from statistics and was primarily that of physics.

    Over the 1960s and 1970s, there were many papers about spectroscopic resolution in both the physical chemistry and the analytical chemistry literature, but Ed Malinowski, whose remarkable publication career stretches from 1955 to 2011, is best recognised for having put these concepts together with multivariate statistics. He published what many regard as the first book that covered one important area of chemometrics [13], which he called Factor Analysis, involving determining the number of components in spectroscopic mixtures together with their characteristics.
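    Rank analysis can be illustrated on a toy scale. The sketch below is not Malinowski's factor analysis. It is only a minimal illustration under a stated assumption: for noise-free mixture spectra built as X = C S (concentrations times pure spectra), the number of absorbing components equals the rank of X. It simulates four mixtures measured at four wavelengths from two invented components and counts the independent rows by Gaussian elimination with a tolerance. The helper names `matmul` and `matrix_rank` and the tolerance of 1e-9 are hypothetical choices made for illustration.

```python
# Sketch: number of components in noise-free mixture spectra as the rank
# of X = C S. All spectra and concentrations below are invented.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matrix_rank(rows, tol=1e-9):
    """Count linearly independent rows by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # choose the remaining row with the largest entry in this column as pivot
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # nothing significant left in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        # eliminate this column from all rows below the pivot row
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

# hypothetical pure spectra: 2 components x 4 wavelengths
S = [[1.0, 0.5, 0.2, 0.0],
     [0.1, 0.4, 0.9, 0.6]]
# hypothetical concentrations: 4 mixtures x 2 components
C = [[1, 0],
     [0, 1],
     [1, 1],
     [2, 1]]
X = matmul(C, S)  # 4 mixtures x 4 wavelengths

print(matrix_rank(X))  # prints 2
```

    With real, noisy spectra the rank is never exact. Adding even small random noise to X would typically make all four rows appear independent at this tolerance. That is why factor analysis works with eigenvalues and with criteria for separating significant components from noise, rather than with an exact rank.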

    Meanwhile, a separate development in scientific computing emerged in the 1960s – partly catalysed by NASA's trip to the moon – to use AI to identify compounds spectroscopically [14], a project that involved Nobel Prize winners and spawned the whole new area of expert systems. This in turn led to the field of pattern recognition and the award of several competitive grants in scientific computing, particularly in the USA.

    Isenhour, Jurs and Kowalski were early pioneers of computerised learning in chemistry, primarily using pattern recognition [15], in a group originally founded by Isenhour, who left in 1969. Kowalski took over the reins in 1974, initially with an interest in chemical pattern recognition.

    Hence, computational chemistry arrived via both physical chemistry of spectroscopic mixtures and organic chemistry for pattern recognition and had important elements in the formative mix in the 1960s and 1970s. This allowed the application to comparatively large problems and wider access to algorithms that had previously been rather theoretical.

    1.2 Developments since the 1970s

    Chemometrics slowly gained an identity from the mid-1970s, after Wold first named it. However, some of the recognised pioneers were slow to identify with it. For example, both Wold and Kowalski published far more papers using ‘chemical pattern recognition’ than ‘chemometrics’ as a keyword in the 1970s.

    The first symposia with chemometrics in the name, in the USA, took place in the late 1970s. The first analytical chemistry review entitled ‘Chemometrics’ was published in 1980 [16]. The International Chemometrics Society was founded by Wold and Kowalski in the 1970s.

    By this stage, although still relatively few workers identified themselves with chemometrics, small groups of enthusiasts were promoting the name and idea. In those days, most of those who identified themselves as chemometricians were quite expert programmers who cut their teeth on a mainframe or, latterly, primitive micros. Some even started their scientific careers before scientists had ready access to computers and may have had to learn programming via assembly language, and so were in practice extremely good programmers. If a method was reported in a paper, the authors would typically program it themselves rather than use a package.

    A NATO sponsored workshop in Cosenza, Italy, in 1983, brought together many of the early experts of the time [17] and events moved fast after that. The first journals dedicated to chemometrics, Chemometrics and Intelligent Laboratory Systems (Elsevier) and Journal of Chemometrics (Wiley), were founded in 1986 and 1987. Kowalski and co-workers produced the first comprehensive book in 1986 [18] followed by Massart and co-workers in 1988 [19]. Software packages such as Arthur, Unscrambler and Simca emerged during this period.

    By the 1990s, well-established books, journals, courses and software were available, although there was still only quite a small number of dedicated groups worldwide. However, this changed when laboratory-based data started to become more readily available: the size of data sets and the complexity of problems increased massively. In the 1980s, the emphasis was primarily on small problems such as the resolution of a cluster of HPLC peaks or the deconvolution of UV/vis spectra. Economically important problems in process control and NIR spectroscopy posed new challenges to chemometricians and gradually moved the subject from a rather theoretical application of quantitative analytical chemistry to a more applied subject. There was a special interest in the interface between chemical engineering and chemometrics.

    A further revolution has happened in the last 15 years as complex real-world data have become available. This has opened up applications, ranging from metabolomics to heritage studies to forensics, where large data sets are available. As a result, chemometrics tools have become very widely used, although the core community of experts is probably no bigger than it was a few decades ago. The widespread applicability of common chemometric methods such as PCA, classification and calibration creates an urgent need for users to understand these methods. This book is primarily aimed at potential users who want to understand the underlying mathematical approaches, rather than just use packages.

    1.3 Software and Calculations

    The key to chemometrics is to understand how to perform meaningful calculations on data. In most cases, these calculations are too complex to do by hand or using a calculator; hence, it is necessary to use some software.

    The approach taken in this book, which differs from that of many books on chemometrics, is to understand the methods using numerical examples. Some excellent books and reviews are more descriptive, listing the available methods together with literature references and possibly some examples. Others place a big emphasis on equations and output from packages. This book, however, is primarily based on how I personally learn and understand new methods, and how I have found it most effective to help students working with me. Data analysis is not really a knowledge-based subject but more a skill-based one. A good organic chemist may have encyclopaedic knowledge of reactions in their own area. The best supervisor will be able to list for his or her students thousands of reactions, papers or conditions that will aid them, and with experience this knowledge base grows. In chemometrics, although there are quite a number of named methods, the key is not to learn hundreds of equations but to understand a few basic principles. These ideas, such as multiple linear regression (MLR), occur again and again but in different contexts. To become skilled in chemometric data analysis, practice in manipulating numbers is required, not an enormous knowledge base. Although equations are necessary for the formal description of methods, and cannot easily be avoided, it is easiest to understand the methods in this book by looking at numbers. Hence, the methods described in this book are illustrated using numerical examples, which are available for the reader to reproduce. The data sets employed in this book are available on the publisher's website. In addition to the main text, there are extensive problems at the end of each main chapter. All numerical examples are quite small and are designed so that you can check all the numbers yourself. Some are reduced versions of larger data sets, such as spectra recorded at 5 nm rather than 1 nm intervals. Many real examples, especially in chromatography and spectroscopy, differ only in size from those in this book. In addition, the examples are chosen so that they are feasible to analyse fairly simply.

    One of the difficulties is deciding which software to employ to analyse the data. This book is not restrictive and you can use any approach you like. Some readers may like to program their own methods, for example, in C or Visual Basic. Others may like to use statistical packages such as SAS or SPSS. There is a significant statistical community that uses R. Some groups use ready-packaged chemometrics software such as Pirouette, Simca, Unscrambler, PLS Toolbox and several others on the market. One problem with using packages is that they are often very focussed in their facilities. What they do, they do excellently, but if they cannot do what you want, you may be stuck, even for relatively simple calculations. If you have an excellent multivariate package but want to use a Kalman filter, where do you turn? Perhaps you have the budget to buy another package, but if you just want to explore the method, the simplest implementation takes an experienced Matlab programmer an hour or less. In addition, there are no universally agreed definitions, so a ‘factor’ or ‘eigenvector’ might denote something quite different according to the software used. Some software has limitations that make it unsuitable for many applications of chemometrics, a very simple example being the automatic column centring applied in PCA by many general statistical packages, whereas some chemometric methods involve uncentred PCA.
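    To see why an automatic centring default matters, the sketch below computes PCA by singular value decomposition with and without column centring. It uses Python/NumPy purely as a neutral illustration (the book itself recommends Excel or Matlab); the function name `pca_svd` and the small data matrix are hypothetical, invented for this example. The uncentred first component is dominated by the column means, so its loadings and eigenvalues differ from the centred result.

```python
import numpy as np

def pca_svd(X, centre=True, n_components=2):
    """PCA by singular value decomposition.

    Returns scores T (n x A), loadings P (A x m) and all singular values s.
    If centre is False, the raw (uncentred) matrix is decomposed.
    """
    X = np.asarray(X, dtype=float)
    if centre:
        X = X - X.mean(axis=0)                   # subtract each column mean
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components]                        # loadings, one row per PC
    return T, P, s

# Small hypothetical data matrix (invented values): 5 samples x 3 variables
X = np.array([[0.9, 1.2, 0.4],
              [1.1, 1.5, 0.5],
              [0.7, 0.8, 0.3],
              [1.3, 1.9, 0.7],
              [1.0, 1.1, 0.6]])

T_c, P_c, s_c = pca_svd(X, centre=True)
T_u, P_u, s_u = pca_svd(X, centre=False)

# The uncentred first PC largely describes the mean; the centred one describes
# variation about the mean, so the loadings and eigenvalues differ.
print("centred first loadings:  ", np.round(P_c[0], 3))
print("uncentred first loadings:", np.round(P_u[0], 3))
print("eigenvalues (s**2), centred:  ", np.round(s_c**2, 4))
print("eigenvalues (s**2), uncentred:", np.round(s_u**2, 4))
```

    Remember that the signs of the loadings from any SVD routine are arbitrary, so compare magnitudes rather than signs when checking against another package.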

    Nevertheless, many of the results from the examples in this book can be obtained quite successfully using commercial packages, but be aware of their limitations and understand the output of any software you use. It is important to recognise that the definitions used in this book may differ from those employed by any specific package. As a huge number of often incompatible definitions are available, even for fairly common parameters, we have had to adopt a single definition for each parameter in order not to confuse the reader; thus, if your results appear to differ from those presented in this book, check carefully the definitions used in your favourite package, book or paper. It is not the aim of this book to replace an international committee that defines chemometric terms. Indeed, it is quite unlikely that such a committee would be formed because of the very diverse backgrounds of those interested in chemical data analysis.

    However, in this book, we recommend that readers use one of two environments.

    The first is Excel. Almost everyone has some familiarity with Excel, and specific features that might be useful for chemometrics are described in Appendix A.4. Most calculations can be performed quite simply using normal spreadsheet functions. The exception is PCA, for which a small program must be written. For instructors and users of VBA (a programming language associated with Excel), a small editable macro is available for download from the publisher's website. However, some calculations such as cross-validation and partial least squares (PLS), while possible to program in Excel, can be quite tedious. It is strongly recommended that readers reproduce these methods step by step when they first encounter them, but after a few times, one does not learn much from setting up the spreadsheet each time. Hence, we also provide an Excel add-in to perform PCA, PLS, MLR and PCR (principal components regression), which also contains facilities for validation. Readers of this book should choose whichever approach they prefer.
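    The "small program" needed for PCA is often an iterative algorithm such as NIPALS, which extracts one component at a time and then removes it from the data. The sketch below is one common form of NIPALS, written in Python/NumPy rather than VBA; it is not the macro supplied with the book, and the function name `nipals_pca`, the starting-vector choice, the tolerance and the example data are all assumptions made for illustration.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """PCA by NIPALS: extract one component at a time, then deflate X.

    X is used as supplied (centre it first if centred PCA is wanted).
    Returns scores T (n x A) and loadings P (A x m), so that X is close to T @ P.
    """
    X = np.array(X, dtype=float)            # working copy, deflated below
    n, m = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((n_components, m))
    for a in range(n_components):
        # Initial guess for the scores: the column with the largest sum of squares
        t = X[:, np.argmax(np.sum(X**2, axis=0))].copy()
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)           # regress each column of X on t
            p /= np.linalg.norm(p)          # normalise loadings to unit length
            t_new = X @ p                   # updated scores
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[a] = p
        X -= np.outer(t, p)                 # remove this component before the next
    return T, P

# Usage on a small hypothetical data set (values invented for illustration)
X = np.array([[0.9, 1.2, 0.4],
              [1.1, 1.5, 0.5],
              [0.7, 0.8, 0.3],
              [1.3, 1.9, 0.7],
              [1.0, 1.1, 0.6]])
Xc = X - X.mean(axis=0)                     # column-centred PCA
T, P = nipals_pca(Xc, 2)
print("scores:\n", np.round(T, 3))
print("loadings:\n", np.round(P, 3))
print("eigenvalues:", np.round(np.sum(T**2, axis=0), 4))
```

    Each eigenvalue here is the sum of squares of the corresponding scores, which is the same quantity a spreadsheet implementation would compute column by column.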

    A second environment, which many chemical engineers and statisticians enjoy, is Matlab, described in Appendix A.5. Historically, the first significant libraries of programs in chemometrics became available in the late 1980s. Quantum chemistry, originating in the 1960s, is still very much based on Fortran because this was the major scientific programming environment of the time, and over the years, large libraries have been developed and maintained; hence, a modern quantum chemist will probably learn to use Fortran. Chemometrics is of a later vintage, so the majority have adopted a more recent environment for scientific programming, and many chemometricians exchange software using Matlab. The advantage is that Matlab is very matrix oriented, and it is most convenient to think in terms of matrices, especially as most data are multivariate. In addition, there are built-in facilities for performing singular value decomposition (or PCA) and for computing the pseudo-inverse used in regression, so it is not necessary to program these basic functions. There have been a number of recent enhancements, including links to Excel that allow easy interchange of data, so that simple programs can be written to transfer data to and from Excel. There is no doubt at all that matrix manipulation, especially for complex algorithms, is tedious in VBA and Excel. Matlab is an excellent environment for learning the nuts and bolts of chemometrics. A slight problem with Matlab is that it is possible to avoid looking at the raw numbers, whereas most users of Excel are forced to look at the raw numerical data in detail. I have come across experienced Matlab users who are otherwise very good at chemometrics but who sometimes miss quite basic information because they are not constantly examining the numbers; hence, if you are a dedicated Matlab programmer, look at the numerical information from time to time!
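    As an illustration of the pseudo-inverse used in regression, the sketch below fits an MLR model with NumPy's `np.linalg.pinv` (which, like Matlab's `pinv`, is computed from the SVD) and checks it against the normal-equations form. NumPy is used here only as a stand-in for Matlab, and the calibration data are hypothetical and noise-free so that the known coefficients can be recovered.

```python
import numpy as np

# Hypothetical calibration set (invented): 6 samples, 2 measured variables
x1 = np.array([0.10, 0.40, 0.35, 0.80, 0.60, 0.95])
x2 = np.array([0.30, 0.20, 0.70, 0.50, 0.90, 0.10])
X = np.column_stack([np.ones_like(x1), x1, x2])   # first column models the intercept
y = 0.5 + 2.0 * x1 - 1.0 * x2                      # noise-free, so b is known exactly

# MLR via the pseudo-inverse (NumPy computes it from the SVD)
b_pinv = np.linalg.pinv(X) @ y

# The normal-equations form gives the same answer when X has full column rank
b_normal = np.linalg.inv(X.T @ X) @ X.T @ y

print(np.round(b_pinv, 6))    # close to [0.5, 2.0, -1.0]
print(np.round(b_normal, 6))
```

    The pseudo-inverse route is generally preferred in practice because it avoids explicitly inverting `X.T @ X`, which can be badly conditioned when variables are highly correlated, as they often are in spectra.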

    An ideal situation would probably involve using both Excel and Matlab simultaneously. Excel provides a good interface and allows flexible examination of the data, whereas Matlab is best for developing matrix-based algorithms. The problems in this book have been tested in both Matlab and Excel, and identical answers were obtained. Where either package has quirks, the reader is guided accordingly.

    Two final words of caution are needed. The first is that some answers in this book have been rounded to a few significant figures. Where intermediate results of a calculation are presented, feeding these rounded values back into the calculation may not give exactly the same numerical results as retaining them to higher accuracy and continuing. A second issue that often perplexes new users of multivariate methods is that it is impossible to control the sign of a principal component (see Chapter 4 for a description of PCA). This is because PCs involve calculating square roots that may give negative as well as positive answers. Therefore, using different packages, or even the same package with different starting points, can result in reflected graphs, with scores and loadings that are opposite in sign. It is therefore unlikely to be a mistake if you obtain PCs that are opposite in sign to those in this book.
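    The sign ambiguity can be checked numerically: reversing the sign of a PC in both the scores and the loadings leaves their product, and hence the reconstructed data, unchanged. The Python/NumPy sketch below shows this on invented data. The helper `align_signs` is a hypothetical convention (make the largest-magnitude loading of each PC positive) that can be handy when comparing output from two programs; it is not the convention used in this book or by any particular package.

```python
import numpy as np

# Hypothetical data (invented for illustration), column-centred
X = np.array([[0.9, 1.2, 0.4],
              [1.1, 1.5, 0.5],
              [0.7, 0.8, 0.3],
              [1.3, 1.9, 0.7],
              [1.0, 1.1, 0.6]])
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T, P = U * s, Vt                      # scores and loadings, Xc = T @ P

# Reverse the sign of the first PC in BOTH scores and loadings
T_flip, P_flip = T.copy(), P.copy()
T_flip[:, 0] *= -1
P_flip[0] *= -1

# Both versions reproduce the data equally well, so neither sign is "correct"
print(np.allclose(T @ P, Xc), np.allclose(T_flip @ P_flip, Xc))   # True True

def align_signs(T, P):
    """Flip each PC so that its largest-magnitude loading is positive."""
    T, P = T.copy(), P.copy()
    for a in range(P.shape[0]):
        if P[a, np.argmax(np.abs(P[a]))] < 0:
            T[:, a] *= -1
            P[a] *= -1
    return T, P
```

    Applying the same alignment rule to both sets of results before plotting removes reflections caused purely by sign, so any remaining differences are more likely to be genuine.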

    1.4 Further Reading

    A large number of books and review articles have been written, covering differing aspects of chemometrics and often aimed at a variety of audiences. In Sections 1.1 and 1.2, we list some of the more historic books and papers. This section summarises some of the most widespread and recent works. In most cases, these works will allow the reader to delve further into the methods introduced in this book. In each category, only a few main books are mentioned, but most have extensive bibliographies, allowing the reader to access information especially from the primary literature. Although there are also internet resources and numerous tutorial and review papers, in order to restrict the bibliography, we only list books.

    1.4.1 General

    The largest authored book in chemometrics, published by Massart and co-workers, is in two volumes [20, 21]. These volumes provide an in-depth summary of many modern chemometric methods, involving a wide range of techniques, with many references to the literature. The first volume is quite strongly oriented towards analytical chemists but contains an excellent grounding in basic statistics for measurement science. The books are especially useful as springboards to the primary literature. They are a complete rewrite of the original book published in 1988 [19], which is still cited as a classic in the analytical chemistry literature. Comprehensive Chemometrics [22] is a follow-on from the same publisher: an encyclopaedic collection of edited articles in four volumes covering much of the knowledge base of chemometrics as of 2009, and probably the most comprehensive detailed summary of the subject.

    Otto's book on chemometrics [23] is a well-regarded book, now in its third edition, covering quite a range of topics but at a fairly introductory level. The book looks at computing in analytical chemistry in general, including databases and instrumental data acquisition. It is a very clearly written introduction for the analytical chemist, by an outstanding educator.

    Beebe and co-workers at Dow Chemical have produced a book [24] that is useful for many practitioners and contains very clear descriptions, especially of multivariate calibration in spectroscopy; although some years old, it is still recommended for those working in this area. It reflects a strong ‘American School’ originating in part from the pioneering work of Kowalski in NIR spectroscopy and process control. While it covers the techniques required in this area in an outstanding way, and is well recommended as a next step for readers of this book working in this application area, it lacks a little in generality, probably because of the very close association between NIR and chemometrics in the minds of some.

    Kramer has produced a somewhat more introductory book [25]. He is well known for his consultancy company and highly regarded courses, and his approach is less mathematical. This will suit some people very well, but may not be presented in a way that suits statisticians and chemical engineers.

    The current author published an introductory tutorial book on chemometrics at an early stage in the development of the subject [26], aimed especially at the laboratory-based chemist; it emphasises signal resolution and minimises matrix algebra. This author also published a later book, based on web articles, that covers a range of applications as well as simple descriptions of methods [27]. Since 2014, this author has also written an ongoing series of short tutorial articles on aspects of chemometrics as a column in Journal of Chemometrics; these look more into the statistical principles of the subject.

    The journal Chemometrics and Intelligent Laboratory Systems published regular tutorial review articles over its first decade or more of existence. Some of the earlier articles are good introductions to general subjects such as PCA, Fourier transforms and Matlab. They are collected together as two volumes [28, 29]. They also contain some valuable articles on expert systems.

    Varmuza and Filzmoser have published a very well-regarded book that uses R, with clear descriptions within a statistical context [30]. Gemperline edited a multi-author book, which is currently in its second edition [31]. Pomerantsev has published a book oriented towards users of Excel [32]. Mark and Workman have written a comprehensive book aimed at spectroscopists [33]; it is very strong on analytical instrumental chemistry, and the authors are well regarded.

    Meloun and Militky published a large book based on extensive course work [34]. This covers many topics in chemometrics and has a special feature of 1250 numerical problems and data sets.

    Martens and Martens have produced a fairly detailed discussion of how multivariate methods can be used in quality control [35]; it covers several aspects of modern chemometrics and so could be classed as a general book on chemometrics.

    Although this list is not comprehensive, it includes most general books on chemometrics. There are also several books in different application areas such as food, the environment, various types of spectroscopy and so on.

    1.4.2 Specific Areas

    There are a large number of books and review articles dealing with specific aspects of chemometrics, interesting as a next step after this book, and for a comprehensive chemometrics library. We will list just a few.

    1.4.2.1 Experimental Design

    In the area of experimental design, there are innumerable books, many written by statisticians. Specifically aimed at chemists, Deming and Morgan have produced a highly regarded book [36], which is well recommended as a next step after this book. Bayne and Rubin have written a clear and thorough book [37]. An introductory book mainly discussing factorial designs was written by Morgan as part of the Analytical Chemistry by Open Learning Series [38]. For mixture designs, involving compositional data, the classic statistical book by Cornell is much cited and recommended [39] but is quite mathematical. More historical books such as those by Fisher [5] and by Box and co-workers [7] have already been described above but are still relevant today.

    1.4.2.2 Pattern Recognition and Principal Component Analysis

    There are several books on pattern recognition and PCA. An introduction to several of the main techniques is provided in an edited book [40]. For more in-depth statistical descriptions of Principal Components Analysis, read the books by Jolliffe [41] and by Mardia and co-authors [42]. An early but still valuable book by Massart and Kaufman covers more than its title, ‘cluster analysis’, suggests [43] and provides clear introductory material. Varmuza [44] and Strouf [45] wrote early books in the area when much of the rest of chemometrics was focussed on calibration and signal resolution.

    A more up-to-date book focussed on pattern recognition, illustrated using several case studies, was recently published by this author [46]. Over the past decade, there has been much more interest in pattern recognition than a few decades ago, with increased application to areas such as metabolomics.

    1.4.2.3 Multivariate Signal Analysis

    Multivariate curve resolution (MCR) is the main topic of Malinowski's book [47], which is the third edition of his original book [13]. The author is a physical chemist, so the book is oriented towards that particular audience and relates especially to the spectroscopy of mixtures. Although there have been notable advances in the area, especially in alternating least squares (ALS), these have been published primarily in the form of papers, and Malinowski's book is still the classic in the area. For more up-to-date reading, search for papers on MCR and ALS. However, Malinowski's third edition covers ALS well, and most of the pioneering papers were published some 15–20 years ago.

    1.4.2.4 Multivariate Calibration

    Multivariate calibration is a very popular area, and the much reprinted classic by Martens and Næs [48] is one of the most cited books in chemometrics. Much of the book is based around NIR spectroscopy, which was one of the major success stories in applied chemometrics in the 1980s and 1990s, but the clear mathematical descriptions of algorithms are particularly useful for a wider audience. The book by Beebe and co-workers [24] also has good in-depth discussion about calibration. A more recent book by Naes et al. is somewhat less theoretical and is mainly about multivariate calibration [49].

    1.4.2.5 Statistical Methods

    There are a number of books on general statistical methods in chemistry, mainly oriented towards analytical and physical chemists. Miller and Miller's book [10] has gone through several editions and takes the reader through many of the basic significance tests, distributions and so on. There is a small amount on chemometrics in the final chapter. The Royal Society of Chemistry published quite a nice introductory tutorial book by Gardiner [50]. Caulcutt and Boddy's book [9] is also a much reprinted and useful reference. There are several other competing books, most of which are very thorough, for example, in describing applications of the t-test, F-test and ANOVA but which do not progress much into modern chemometrics. If you are a physical chemist, Gans' viewpoint on deconvolution and curve fitting may suit you more [51], covering many regression methods. Meier and Zund published a book in 2000 [52] with a very thorough discussion of univariate methods especially in industrial practice and a little introduction to multivariate methods. Ellison and co-workers published a book based on the UK Valid Analytical Measurement initiative [53].

    Several other books on statistical approaches (mainly univariate) in analytical chemistry continue to appear, and a number of international initiatives regularly produce reports.

    1.4.2.6 Digital Signal Processing and Time Series

    There are numerous books on digital signal processing (DSP) and Fourier transforms (FTs). Unfortunately, many of the chemically based books are fairly technical in nature and oriented towards specific techniques such as NMR; however, books written primarily by and for engineers and statisticians are often quite understandable. A recommended reference to DSP contains many of the main principles [54], but several similar books are available. A couple of recent general books on FTs are recommended [55, 56]. For non-linear deconvolution, Jansson's book is well known [57]. Methods for time series analysis are described in more depth in an outstanding and much reprinted book written by Chatfield [58].

    1.4.2.7 Multi-way Methods

    For chemometricians, the best book available is by Smilde et al. [59], which gives a thorough description and illustration of the algorithms. There was much development in this area in the 1990s, a very exciting era for new algorithms, and the three authors wrote some of the pioneering papers in the chemometrics literature. This book is the best comprehensive summary of the application of such methods in chemistry.

    References

    1 Wold, S. (1972) Spline functions, a new tool in data-analysis. Kemisk Tidskrift, 3, 34–37.

    2 Fisher, R.A. (1925) Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh.

    3 Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Ann. Eugen., 7, 179–188.

    4 Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components. J. Educ. Psychol., 24, 417–441.

    5 Fisher, R.A. (1935) The Design of Experiments, Hafner, New York.

    6 Davies, O.L. (ed.) (1956) Statistical Methods in Research and Production, Longman, London.

    7 Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978) Statistics for Experimenters, John Wiley & Sons, Inc., New York.

    8 Mandel, J. (1949) Statistical Methods in Analytical Chemistry. J. Chem. Educ., 26, 534–539.

    9 Caulcutt, R. and Boddy, R. (1983) Statistics for Analytical Chemists, Chapman and Hall, London.

    10 Miller, J.C. and Miller, J.N. (1993) Statistics for Analytical Chemistry, 2nd edn, Prentice-Hall, Hemel Hempstead.

    11 Wallace, R.M. and Katz, S.M. (1964) A method for determination of rank in analysis of absorption spectra of multicomponent systems. J. Phys. Chem., 68, 3890–3892.

    12 Katakis, D. (1965) Matrix rank analysis of spectral data. Anal. Chem., 37, 876–878.

    13 Malinowski, E.R. and Howery, D.G. (1980) Factor Analysis in Chemistry, John Wiley & Sons, Inc., New York.

    14 Lindsay, R.K., Buchanan, B.G., Feigenbaum, E.A. and Lederberg, J. (1980) Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill, New York.

    15 Kowalski, B.R., Jurs, P.C., Isenhour, T.L. and Reilly, C.N. (1969) Computerized learning machines applied to chemical problems: interpretation of infrared spectrometry data. Anal. Chem., 41, 1945–1949.

    16 Kowalski, B.R. (1980) Chemometrics. Anal. Chem., 52, R112–R122.

    17 Kowalski, B.R. (ed.) (1984) Chemometrics: Mathematics and Statistics in Chemistry, Reidel, Dordrecht.

    18 Sharaf, M.A., Illman, D.L. and Kowalski, B.R. (1986) Chemometrics, John Wiley & Sons, Inc., New York.

    19 Massart, D.L., Vandeginste, B.G.M., Deming, S.N. et al. (1988) Chemometrics: A Textbook, Elsevier, Amsterdam.

    20 Massart, D.L., Vandeginste, B.G.M., Buydens, L.M.C. et al. (1997) Handbook of Chemometrics and Qualimetrics Part A, Elsevier, Amsterdam.

    21 Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C. et al. (1997) Handbook of Chemometrics and Qualimetrics Part B, Elsevier, Amsterdam.

    22 Tauler, R., Walczak, B. and Brown, S.D. (eds) (2009) Comprehensive Chemometrics, Elsevier, Amsterdam.

    23 Otto, M. (2016) Chemometrics: Statistics and Computer Applications in Analytical Chemistry, 3rd edn, Wiley-VCH Verlag GmbH, Weinheim.

    24 Beebe, K.R., Pell, R.J. and Seasholtz, M.B. (1998) Chemometrics: A Practical Guide, John Wiley & Sons, Inc., New York.

    25 Kramer, R. (1998) Chemometrics Techniques for Quantitative Analysis, Marcel Dekker, New York.

    26 Brereton, R.G. (1990) Chemometrics: Applications of Mathematics and Statistics to Laboratory Systems, Ellis Horwood, Chichester.

    27 Brereton, R.G. (2007) Applied Chemometrics for Scientists, John Wiley & Sons, Ltd, Chichester.

    28 Massart, D.L., Brereton, R.G., Dessy, R.E. et al. (eds) (1990) Chemometrics Tutorials, Elsevier, Amsterdam.

    29 Brereton, R.G., Scott, D.R., Massart, D.L. et al. (eds) (1992) Chemometrics Tutorials II, Elsevier, Amsterdam.

    30 Varmuza, K. and Filzmoser, P. (2009) Introduction to Multivariate Statistical Analysis in Chemometrics, CRC Press, Boca Raton.

    31 Gemperline, P.J. (ed.) (2006) Practical Guide to Chemometrics, 2nd edn, CRC Press, Boca Raton.

    32 Pomerantsev, A.L. (2014) Chemometrics in Excel, John Wiley & Sons, Ltd, Chichester.

    33 Mark, H. and Workman, J. (2007) Chemometrics in Spectroscopy, Academic Press, London.

    34 Meloun, M. and Militky, J. (2011) Statistical Data Analysis: A Practical Guide, Woodhead, New Delhi.

    35 Martens, H. and Martens, M. (2000) Multivariate Analysis of Quality, John Wiley & Sons, Ltd, Chichester.

    36 Deming, S.N. and Morgan, S.L. (1994) Experimental Design: A Chemometric Approach, Elsevier, Amsterdam.

    37 Bayne, C.K. and Rubin, I.B. (1986) Practical Experimental Designs and Optimisation Methods for Chemists, Wiley-VCH Verlag GmbH, Deerfield Beach.

    38 Morgan, E. (1995) Chemometrics: Experimental Design, John Wiley & Sons, Ltd, Chichester.

    39 Cornell, J.A. (1990) Experiments with Mixtures: Design, Models, and the Analysis of Mixture Data, 2nd edn, John Wiley & Sons, Inc., New York.

    40 Brereton, R.G. (ed.) (1992) Multivariate Pattern Recognition in Chemometrics, Illustrated by Case Studies, Elsevier, Amsterdam.

    41 Jolliffe, I.T. (1987) Principal Component Analysis, Springer-Verlag, New York.

    42 Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis, Academic Press, London.

    43 Massart, D.L. and Kaufman, L. (1983) The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, John Wiley & Sons, Inc., New York.

    44 Varmuza, K. (1980) Pattern Recognition in Chemistry, Springer, Berlin.

    45 Strouf, O. (1986) Chemical Pattern Recognition, Research Studies Press, Letchworth.

    46 Brereton, R.G. (2009) Pattern Recognition for Chemometrics, John Wiley & Sons, Ltd, Chichester.

    47 Malinowski, E.R. (2002) Factor Analysis in Chemistry, 3rd edn, John Wiley & Sons, Inc., New York.

    48 Martens, H. and Næs, T. (1989) Multivariate Calibration, John Wiley & Sons, Ltd, Chichester.

    49 Naes, T., Isaksson, T., Fearn, T. and Davies, T. (2002) A User-Friendly Guide to Multivariate Calibration and Classification, NIR Publications, Chichester.

    50 Gardiner, W.P. (1997) Statistical Analysis Methods for Chemists: A Software-Based Approach, Royal Society of Chemistry, Cambridge.

    51 Gans, P. (1992) Data Fitting in the Chemical Sciences: By the Method of Least Squares, John Wiley & Sons, Ltd, Chichester.

    52 Meier, P.C. and Zund, R.E. (2000) Statistical Methods in Analytical Chemistry, 2nd edn, John Wiley & Sons, Inc., New York.

    53 Ellison, S.L.R., Barwick, V.J. and Duguid Farrant, T.J. (2009) Practical Statistics for the Analytical Scientist: A Bench Guide, 2nd edn, Royal Society of Chemistry, Cambridge.

    54 Lynn, P.A. and Fuerst, W. (1998) Introductory Digital Signal Processing with Computer Applications, 2nd edn, John Wiley & Sons, Ltd, Chichester.

    55 James, J.F. (2011) A Student's Guide to Fourier Transforms, 3rd edn, Cambridge University Press, Cambridge.

    56 Bracewell, R.N. (2000) Fourier Transform and Its Applications, McGraw-Hill, Boston.

    57 Jansson, P.A. (ed.) (1984) Deconvolution: with Applications in Spectroscopy, Academic Press, New York.

    58 Chatfield, C. (2003) Analysis of Time Series: An Introduction, 6th edn, Chapman and Hall/CRC, Boca Raton.

    59 Smilde, A., Bro, R. and Geladi, P. (2004) Multi-way Analysis, John Wiley & Sons, Ltd, Chichester.

    Chapter 2

    Experimental Design

    2.1 Introduction

    Although all chemists acknowledge the need to be able to design laboratory-based experiments, formal statistical (or chemometric) rules are rarely developed as part of mainstream chemistry. In contrast, a biologist or a psychologist will often spend weeks carefully constructing a formal statistical design before investing months or years in time-consuming and often unrepeatable experiments and surveys. The simplest experiments in chemistry are relatively quick and can be repeated, if necessary, under slightly different conditions; hence, not all chemists see the need for formalised experimental design early in their career. For example, there is little point in spending a week constructing a set of experiments that take a few hours to perform. This lack of expertise in formal design permeates all levels, from management to professors and students. However, some real-world experiments are expensive and can take days or months of people's time, for example, optimising conditions for a synthesis, testing compounds in a quantitative structure–activity relationship (QSAR) study or improving the chromatographic separation of isomers; under such circumstances, it is essential to have a good appreciation of the fundamentals of design.

    There are several key reasons why the chemist can be more productive if he or she understands the basis of design, including the following four main areas.

    Screening. These types of experiments involve determining which factors are important for the success of a process. An example may be the study of a chemical reaction, dependent on the proportion of the solvent, catalyst concentration, temperature, pH, stirring rate and so on. Typically, 10 or more factors might be relevant. Which can be eliminated, and which should be studied in detail? Approaches such as factorial and Plackett–Burman designs (Sections 2.3.1–2.3.3) are useful in this context.

    Optimisation. This is one of the commonest applications in chemistry. How can a synthetic yield or a chromatographic separation be improved? Systematic methods can result in a better optimum, found more rapidly. Simplex is a classical method for optimisation (Section 2.6), although several designs such as mixture designs (Section 2.5) and central composite designs (Section 2.4) can also be employed to find optima.

    Saving time. In industry, this is possibly the major motivation for experimental design. There are obvious examples in optimisation and screening, but there are even more radical cases, as in the area of quantitative structure–property relationships. From structural data of existing molecules, it is possible to select a small number of compounds for further testing that are representative of a larger set of molecules. This can save an enormous amount of time. Fractional factorial, Taguchi and Plackett–Burman designs (Sections 2.3.2 and 2.3.3) are good examples, although almost all experimental designs have this aspect in mind.

    Quantitative modelling. Almost all experiments, ranging from simple linear calibration in analytical chemistry to complex physical processes, where a series of observations are required to obtain a mathematical model of the system, benefit from good experimental design. Many such designs are based around the central composite design (Section 2.4), although calibration designs (Section 2.3.4) are also useful.
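    The screening idea can be made concrete with a short sketch in Python/NumPy (an illustration only; the book itself works in Excel and Matlab). It builds a two-level full factorial design in coded units (-1 low, +1 high) and estimates each main effect as the mean response at the high level minus the mean response at the low level. The function names and the response model are hypothetical. The run count of a full factorial, 2^k, is what makes screening many factors expensive: ten factors would need 1024 runs, whereas a Plackett–Burman design can screen up to N - 1 factors in N runs, where N is a multiple of four.

```python
import itertools
import numpy as np

def full_factorial(k):
    """All 2**k runs of a two-level design in coded units (-1 = low, +1 = high)."""
    return np.array(list(itertools.product([-1, 1], repeat=k)), dtype=float)

def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    return np.array([response[design[:, j] > 0].mean() - response[design[:, j] < 0].mean()
                     for j in range(design.shape[1])])

# Hypothetical 3-factor screening experiment with a made-up linear response
D = full_factorial(3)
y = 60 + 5 * D[:, 0] + 0.2 * D[:, 1] - 3 * D[:, 2]

# In a balanced two-level design each effect is twice the model coefficient,
# so factor 2 (effect about 0.4) would be a candidate for elimination.
print(main_effects(D, y))   # approximately [10, 0.4, -6]

for k in (3, 5, 10):
    print(k, "factors:", 2**k, "runs for a full factorial")
```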

    An example of where systematic experimental design is valuable is the optimisation of the yield of a reaction as a function of catalyst concentration and pH. A representation is given in Figure 2.1. In reality, this relationship is unknown in advance, but the experimenter wishes to determine the pH and concentration (in mM) that provide the best reaction conditions. To within 0.2 of a pH and concentration unit, this optimum happens to be pH 4.4 and 1.0 mM. Many experimentalists will start by guessing one of the factors, say concentration, and then finding the best pH at that concentration.


    Figure 2.1 Yield of a reaction as a function of pH and catalyst concentration.

    Consider an experimenter who chooses to start the experiment at 2 mM and wants to find the best pH. Figure 2.2 shows the yield at 2.0 mM. The best pH is undoubtedly a low one, in fact pH 3.4. Hence, the next stage is to perform the experiments at pH 3.4 and improve the concentration, as shown in Figure 2.3. The best concentration is 1.4 mM. These answers, pH 3.4 and 1.4 mM, are quite far from the true ones.
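    The failure of this one-factor-at-a-time strategy can be reproduced on a made-up surface. The Python/NumPy sketch below does NOT use the surface of Figure 2.1: it assumes a hypothetical yield function with its maximum at pH 4.4 and 1.0 mM, plus a cross term that couples pH and concentration (an interaction). Starting at 2.0 mM, the two-step search stops well away from the true optimum, whereas evaluating every combination on the same grid of levels finds it. With this invented surface the search stops at about pH 3.5 and 1.8 mM, not at the pH 3.4 and 1.4 mM found for the data in the figures.

```python
import numpy as np

# Hypothetical yield surface (NOT the book's data): a tilted ridge whose
# maximum is at pH 4.4, 1.0 mM. The cross term 1.8*dp*dc couples the factors.
def yield_pct(ph, conc):
    dp = ph - 4.4
    dc = conc - 1.0
    q = dp**2 + 1.8 * dp * dc + dc**2
    return 100.0 * np.exp(-q)

# Candidate experimental levels, spaced 0.1 units apart
ph_levels = np.round(np.linspace(2.0, 6.0, 41), 1)
conc_levels = np.round(np.linspace(0.0, 3.0, 31), 1)

def one_factor_at_a_time(start_conc):
    # Step 1: fix the concentration and vary pH
    best_ph = ph_levels[np.argmax(yield_pct(ph_levels, start_conc))]
    # Step 2: fix that pH and vary the concentration
    best_conc = conc_levels[np.argmax(yield_pct(best_ph, conc_levels))]
    return best_ph, best_conc

ofat_ph, ofat_conc = one_factor_at_a_time(2.0)

# For comparison, evaluate every combination of the same levels
Pg, Cg = np.meshgrid(ph_levels, conc_levels, indexing="ij")
Y = yield_pct(Pg, Cg)
i, j = np.unravel_index(np.argmax(Y), Y.shape)
grid_ph, grid_conc = ph_levels[i], conc_levels[j]

print(f"One factor at a time: pH {ofat_ph}, {ofat_conc} mM, "
      f"yield {yield_pct(ofat_ph, ofat_conc):.1f}%")
print(f"Full grid:            pH {grid_ph}, {grid_conc} mM, "
      f"yield {yield_pct(grid_ph, grid_conc):.1f}%")
```

    The interaction term is what defeats the sequential search: the best pH depends on the concentration at which it is measured, so optimising each factor separately only moves partway along the tilted ridge.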

    Figure 2.2 Cross-section through surface at 2 mM catalyst concentration.