Practical Statistics for Environmental and Biological Scientists
Ebook, 434 pages, 4 hours

About this ebook

All students and researchers in environmental and biological sciences require statistical methods at some stage of their work. Many have a preconception that statistics are difficult and unpleasant and find that the textbooks available are difficult to understand.

Practical Statistics for Environmental and Biological Scientists provides a concise, user-friendly, non-technical introduction to statistics. The book covers planning and designing an experiment, how to analyse and present data, and the limitations and assumptions of each statistical method. The text does not refer to a specific computer package but descriptions of how to carry out the tests and interpret the results are based on the approaches used by most of the commonly used packages, e.g. Excel, MINITAB and SPSS. Formulae are kept to a minimum and relevant examples are included throughout the text.
Language: English
Publisher: Wiley
Release date: Apr 30, 2013
ISBN: 9781118687413

    Book preview

    Practical Statistics for Environmental and Biological Scientists - John Townend

    Preface

    Statistics wasn’t forced upon the environmental and biological sciences; it has been absorbed into their practice because it was realized that it had something to offer. Statistical methods provide us with ways of summarizing our data, objective methods to decide how much confidence we can place in experimental results, and ways of uncovering patterns that are initially masked by the complexity of a dataset. Also, if we carry out scientific investigations according to our instincts, there is a risk that we will bias the results by overlooking some important factor or through our desire to get a particular result. By carefully following accepted statistical procedures we can avoid these problems and, just as importantly, we will be seen to have avoided them, so our results will be more readily accepted by others.

    Statistics is also a useful means of communication. For example, a researcher might state that ‘the molluscs had a mean shell length of 12.2 mm ± 1.6 mm (standard error)’, or report that ‘ANOVA showed significant differences between nitrogen contents in different groups of plants (P = 0.02)’. These are succinct ways of explaining a great deal of detail about how studies have been carried out and what can be concluded from them. Of course, they are only really a useful means of communication if you understand what the terms mean. Like it or not, though, they are widely used, so whether you intend to use statistics yourself or just read about others’ research, it will still be a great help to know something about it.
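    As a rough illustration of where a statement like ‘mean ± standard error’ comes from, the sketch below computes both quantities in Python. The shell lengths are invented for the purpose and the function name is hypothetical; it assumes the usual definition of the standard error as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical shell lengths in mm (invented for illustration only).
shell_lengths_mm = [10.1, 14.3, 9.8, 15.2, 11.6, 12.9, 13.4, 10.7]


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of that mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return mean, sd / math.sqrt(n)


mean, se = mean_and_standard_error(shell_lengths_mm)
print(f"mean shell length = {mean:.1f} mm ± {se:.1f} mm (standard error)")
```

    Any statistical package will report the same two numbers; the point is only that the short statement in a paper is a summary of a whole set of measurements.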

    While teaching statistics in a university I found that, for the most part, the statistical methods required by both environmental and biological scientists were the same. Indeed this might be expected, because much of the science is common to both as well. I also found that requirements were very similar at all levels from undergraduate to experienced professional. Really there is seldom any necessity to use complex statistical methods to do world-class research in environmental and biological sciences. Those who are able to identify the key, simple questions to ask are likely to enjoy the greatest success. So it is that I have tried to put together a book that addresses as many of the most common needs as possible.

    The choice of content is based on the questions I have most frequently been asked and the explanations that seemed to work best. Memorizing formulae will be of very little practical use to you, except perhaps to pass an exam; most calculations can be carried out by computer these days. However, computers do not generally tell you whether you are carrying out the right calculations or exactly what you can conclude from the results. Here textbooks still have a part to play. In this book I try to unlock many of the codes commonly used to present scientific information and to provide you with the tools you need to be an effective user of statistics yourself. I wholeheartedly hope that it will provide you with the information you need.

    PART I

    STATISTICS BASICS

    Chapters 1 to 6 introduce the ideas behind statistical methods and how practical studies should be set up to use them. They aim to give the required background for using the methods in Part II. Readers who are new to statistics or in need of a short refresher might find it useful to read this part in its entirety.

    1

    Introduction

    If your first love was statistics, you probably wouldn’t be studying or working in environmental or biological sciences. I am starting from this premise.

    1.1 Do you need statistics?

    Somebody who is trying to sell you a statistics textbook is probably not the best person to ask whether you need statistics. Maybe you have opened this book because you have an immediate need for these techniques or because you have to study the subject as part of a course. In this case the answer for you is clearly yes, you need statistics. Otherwise, if you want to know whether statistics is really relevant to you, ask people who have been successful in your chosen area – academics, researchers or people doing the kind of job you want to do in the future.

    Some use it more than others, and certainly you will find some very successful people who are not confident with statistics and possibly dislike any involvement with it. I don’t believe being a brilliant statistician is a necessary condition for being a brilliant biologist or environmental scientist. However, you will probably find that most of the people you ask would have found it useful to understand statistics at some stage in their career, perhaps very regularly. Even if you do not need it to present results yourself, you will need to understand some statistics in order to understand the real meaning of almost any scientific information given to you.

    The fact that most university degrees in environmental and biological sciences include a compulsory statistics course is simply a recognition of this. However, do not think that understanding statistics is all or nothing. Even a basic understanding of why and when it is used will be very valuable. If you can grasp the detail too, so much the better.

    1.2 What is statistics?

    Football scores, unemployment rates and lengths of hospital waiting lists are statistics, but not what we commonly think of as being included in the subject of statistics. An interesting definition I heard recently was that statistics is ‘that part of a degree which causes a sinking feeling in your stomach’. I don’t have an all-encompassing definition myself, but it will be helpful if you can keep in mind that more or less everything in this book is concerned with trying to draw conclusions about very large groups of individuals (animate or inanimate) when we can only study small samples of them. The fact that we have to draw conclusions about large groups by studying only small samples is the main reason that we use statistics in environmental and biological science.

    Suppose we select a small sample of individuals on which to carry out a study. The questions we are trying to answer usually boil down to these two:

    If I assume that the sample of individuals I have studied is representative of the group they come from, what can I tell about the group as a whole?

    How confident can I be that the sample of individuals I have studied was like the group as a whole?

    These questions are central to the kind of statistical methods described in this book and to most of those commonly used in practical environmental or biological science. We are usually interested in a very large group of individuals (e.g. bacteria in soil, ozone concentrations in the air at some location which change moment by moment, or the yield of wheat plants given a particular fertilizer treatment) but limited to studying a small number of them because of time or resources.

    Fortunately, if we select a sample of individuals in an appropriate way and study them, we can usually get a very good idea about the rest of the group. In fact, using small, representative samples is an excellent way to study large groups and is the basis of most scientific research. Once we have collected our data, our best estimate always has to be that the group as a whole was just like the sample we studied; what other option do we have? But in any scientific study, we cannot just assume this has to be correct; we also need to use our data to say how confident we can be that this is true. This is where statistics usually comes in.
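    The idea can be tried out on a computer. The sketch below is only a simulation under stated assumptions: the ‘population’ is 10,000 made-up fish lengths drawn from a Normal distribution with an arbitrary mean of 30 cm and standard deviation of 8 cm, and the sample size of 30 is equally arbitrary. In a real study the population mean would not be available for comparison.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 10,000 fish lengths in cm (simulated, not real data).
population = [random.gauss(30, 8) for _ in range(10_000)]

# In practice we could only afford to measure a small random sample.
sample = random.sample(population, 30)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
# The standard error indicates how far the sample mean is likely to be
# from the population mean.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"population mean (normally unknown): {population_mean:.1f} cm")
print(f"sample mean (our best estimate):    {sample_mean:.1f} cm")
print(f"standard error of the sample mean:  {standard_error:.1f} cm")
```

    Changing the seed or the sample size shows how the estimate moves around the true value, and how larger samples tend to land closer to it.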

    Almost all experimental results are as described above. They state what is the case in a small sample that was studied, and how likely it is to be true of the group it was taken from. For the sake of clarity, elementary textbooks often quote results without any indication of how much confidence we can place in them. However, most of the results they quote originally come from papers published in scientific journals. If you look at the results presented in a scientific journal, you will see statements like:

    Big gnomes catch more fish than little gnomes (P = 0.04)

    The study would have been carried out using samples of big gnomes and small gnomes and the statement is really shorthand for:

    In our samples, on average, big gnomes caught more fish than little gnomes, so we expect that big gnomes in general catch more fish than little gnomes.

    Based on the evidence of our samples, we can really only be 96% confident that big gnomes in general do catch more fish than little gnomes.

    You can see that the second, qualifying, statement (which comes from the P = 0.04) is really quite important to understanding what the researchers have actually shown. It is not as clear-cut a proof as you might otherwise think.

    We will look in more detail at how to interpret the various forms of shorthand as we go through the different statistical techniques, but notice that when the result is stated in full we have (i) a result for the whole group of interest assuming that the samples studied were representative, and (ii) a measure of confidence that the samples studied actually were representative of the rest of the groups. This point is easy to lose sight of when we start to look at different techniques.

    Textbooks tend to emphasize differences between statistical techniques so that you can see when to use each. However, these same ideas lie behind nearly all of them. Statistical methods, in a wide variety of disguises, aim to quantify both the effects we are studying (i.e. what the samples showed), and the confidence we can have that what we observed in our samples would also hold for the rest of the groups they were taken from. If you can keep this fact in mind, you already understand the most important point you need to know about statistics.

    1.3 Some important lessons I have learnt

    Statistics as a science in its own right can be very complicated. The statistics you need to be a good environmental scientist or biologist is only a small and fairly straightforward subset of this. Even a general understanding of the basic ideas will be a great asset when you come to interpret other people’s experimental results. When you know some of the shorthand, like the example of the gnomes, you will see that very many scientific ‘facts’ are not as clear-cut and certain as we often imagine. Understanding just this already gives you statistical and scientific skills beyond those of the general public. You will quickly learn to be more discerning about what scientific ‘facts’ you really believe.

    There is no denying that a skilled statistician would have methods in his or her armoury beyond those I have included in this book. There are not statistical techniques available for every eventuality, but there are techniques for a good many of them. However, it takes rather a long time to learn about them all and probably you want to get on with some environmental or biological science too. I have therefore selected in this book a range of techniques that I consider most relevant and useful, and I believe these are sufficient to allow you to conduct most types of environmental or biological study with a little careful planning. Now here’s the bit that a lot of people find difficult to grasp. The thing that separates competent environmental scientists and biologists from incompetent ones, in terms of statistical skills, is not numeracy, but careful planning. The chances are that a computer will do all of your calculations for you.

    By the time you sit down at the keyboard with your data you will have already made most of the mistakes you are likely to make. Just when you think you are about to start the statistical part of your project, your part in the statistics is really coming to an end. If you have planned carefully, formed a clear idea of what you are investigating, followed the layout of appropriate examples from this or other books, and carried out your survey or experiment accordingly, the analysis and interpretation will be plain sailing. Please don’t leave all thought of statistical analysis to the point where you sit down with your data already in hand. You would be unlikely to find the analysis plain sailing then. This is an important lesson I have learnt.

    1.4 Statistics is getting easier

    Until the 1980s most statistical calculations were done using a pocket calculator or by hand. Nowadays almost all calculations are carried out by computer. We need only know which test to use and how to enter the data in order to carry it out. I have heard concerns that many students nowadays just quote the output without understanding it. This is probably true, but it was always thus. As far as I can see, the only difference from pre-computer days is that then you would spend two hours struggling with the calculations, so there was a feeling you had earned the right to give the result. I don’t believe the average user of statistics either knew or cared what the calculations were actually doing any more then than they do now.

    Although I do not think that as users of statistics we need to do the calculations ourselves, we do lose a lot if we take the results without understanding anything about the methods. Until recently it was necessary to teach the calculations behind statistics because without them you could not use statistics, whether you understood them or not. To someone who is comfortable with mathematical concepts, the formulae are also a satisfactory explanation of what is going on, so teachers often believed they had covered method and understanding at the same time.

    An aunt of mine used to say, ‘There are liars, damn liars and sadistics. Most of the liars and damn liars go into law or advertising so don’t bother us much, but most of the sadistics teach numeracy skills. That’s why maths and statistics are hard.’ It is my belief that because statistics has traditionally been taught as a mathematical skill, although most students got by with the methods, very few picked up the understanding along the way. There is a great challenge here for teachers of statistics. Rather than seeing the removal of the calculations as a sad loss to understanding, we should take advantage of this to try to make the meaning and value of statistics more accessible to all.

    1.5 Integrity in statistics

    Scientific research relies on the integrity of the people conducting the research. Most of the time, we just have to believe that researchers have been honest in their work as there is no way to tell if results have been made up. In fact, in my experience very few people do lie about the actual values they have collected, even if they are disappointing. Most scientists, I think, have a fairly strong sense of conscience. We also need to have this attitude to carrying out an appropriate statistical analysis. Some kinds of analysis are easier to do than others and some may appear to give us the result we want whilst others do not. However, just because it is possible to use one statistical technique does not necessarily mean it is valid. Usually it is necessary to make certain checks on the data to discover whether a particular method can be applied validly (Chapter 6). Unfortunately, this can sometimes lead us to have to do more work, so there is a temptation to skip this stage.

    The reader of our work, of course, has to assume that we have done the appropriate checks and, if necessary, carried out the extra work. Otherwise we should add the qualifying statement ‘assuming the test was valid in this case’, but then who would take our results seriously? If we just present results without checking the validity of using our chosen statistical method, we are deliberately deceiving the reader. If you value the integrity of your work, therefore, checking the validity of applying particular statistical methods must be seen as part of the normal process of statistical analysis. The checks required are described in the ‘Limitations and assumptions’ sections preceding each of the methods described in this book.

    1.6 About this book

    I have tried as far as possible to avoid mathematical descriptions of the techniques. There are a few simple formulae which readers might find occasion to use because they are not covered by some of the common statistical programs. I have included these in boxes; you can skip them if you want. I have also included some formulae in Appendix A, principally because they might be needed by some people for examinations in statistics. Mainly I have tried here to describe some of the range of techniques available, when you can use them, how to use them, and what the results are telling you; I have assumed that you will use a computer to do most of the calculations.

    Competence and confidence in statistics will be an asset to you as an environmental or biological scientist, but at the same time it is only one of many things that will make you a good environmental or biological scientist. You only have so much time available, and to suggest you study the detail of statistics may not be the best use of it. With this in mind I have tried to include only techniques and a level of detail that I think will be genuinely useful to those studying or working in environmental or biological sciences.

    There are different schools of thought about whether or not one should illustrate statistics with real experimental data. My own thoughts on this are that it is best to use simple datasets to demonstrate the techniques. It is not possible to cover all the eventualities that will arise in real-life results. However, provided you understand clearly what is required, you will be in a strong position to decide how to collect and handle your own data. All of the datasets in this book are therefore invented to demonstrate particular points.

    The book is divided into two parts. Chapters 1 to 6 cover some basic statistical ideas and are intended to give you the necessary background for any of the statistical techniques in later chapters. Chapters 7 to 15 are more of a reference section with different statistical tests or methods described in each chapter. Guidance on which test to use in a particular situation is given in Section 3.2 and in the decision chart in Appendix D.

    I have also included some pointers to more advanced techniques that readers might find useful in the further reading sections at the end of some of the chapters. If you have a computer package available to carry these out, understanding the details of the calculations need not necessarily be a problem to you. Nevertheless, before going ahead and using any of them it is important to familiarize yourself with what the tests are actually testing, and the assumptions and limitations they have about the types of data they are suitable for. In general, I have not specified particular texts to consult because these techniques are widely covered in many of the more in-depth statistical textbooks, and probably most of these would give you similar information.

    2

    A Brief Tutorial on Statistics

    2.1 Introduction

    From Chapter 3 onwards I describe a range of statistical tests and methods, and how to design experiments or surveys that make use of them. If you are studying this subject for the first time, you will probably find it difficult to retain all this information in your head. For the most part, this is not a problem. You can refer back to the book when you need to. However, there are some basic ideas behind all statistical methods and if you can keep these in mind, they will help you to make sense of statistical methods in general. These basic ideas are the subject of this chapter.

    2.2 Variability

    Think of a group you might want to study, e.g. the lengths of fish in a large lake. If all of these fish were the same length, you would only need to measure one. You can probably accept that they are not all the same length, just as people are not all the same height, not all volcanic lava flows are the same temperature, and not all carrots have the same sugar content. In fact, most characteristics we might want to study vary between individuals. If we measured the lengths of 100 fish, we could plot them on a graph as in Figure 2.1(a).

    To understand this graph, think how we would add the extra point if we measured another fish to be 42 cm. It would appear as an additional fish in the column labelled >40–45 cm. The graph tells us that most of the fish were about the same length, and gives us a picture of how widely spread the individuals’ lengths were around this. We can see that only a few fish were greater than 50 cm or less than 15 cm. Figure 2.1(b) shows the more usual ways of presenting this kind of data.

    Figure 2.1 (a) Numbers of fish out of a sample of 100 falling into different size ranges. Note >10–15 means fish more than 10 cm long, up to and including 15 cm. (b) When larger numbers of measurements are involved, it becomes inconvenient to represent each individual, so a column graph or a line can be used to show the shape of the distribution.
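    A frequency table of this kind is easy to build yourself. The following sketch assumes 5 cm size classes that exclude the lower limit and include the upper one, as in the caption above; the twelve lengths are invented and the function name `size_class` is hypothetical.

```python
import math
from collections import Counter


def size_class(length_cm, width_cm=5):
    """Label the size class, e.g. '>40–45' for more than 40 and up to 45 cm."""
    upper = math.ceil(length_cm / width_cm) * width_cm
    return f">{upper - width_cm}–{upper}"


# Hypothetical fish lengths in cm (invented for illustration only).
lengths_cm = [12.5, 18.0, 22.4, 25.0, 27.3, 29.9, 31.2, 33.8, 35.0, 36.1, 42.0, 51.7]
counts = Counter(size_class(x) for x in lengths_cm)

# Print one row per size class, with one star per fish, like Figure 2.1(a)
# turned on its side.
for upper in range(15, 60, 5):
    label = f">{upper - 5}–{upper}"
    print(f"{label:>7} cm | " + "*" * counts[label])
```

    A fish of exactly 35 cm falls in the >30–35 class, not the >35–40 class, which is the convention the caption describes.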

    The graphs in Figure 2.1 are called frequency distributions. For some things we might measure, we would find different distributions such as a lot of low values, some high values, and a few extremely high values. These result in different shapes of graph (Sections 6.1 and 6.2). However, it turns out that if we measure a set of naturally occurring lengths, concentrations, times, temperatures, or whatever, and plot their distribution, very often we do get a diagram with a shape similar to those in Figure 2.1. This shape is called a Normal distribution. Statisticians have derived a mathematical formula which, when plotted on a graph, has the same shape. Being able to describe the distribution of individual measurements using a mathematical formula turns out to be very useful because, from only a few actual measurements, we can estimate what other members of the population are likely to be like. This idea is the basis of many statistical methods.
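    For readers who like to see it written down, the formula in question is presumably the standard Normal density shown below. The symbols are not used in the text above and are assumptions here: μ stands for the population mean and σ for the population standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)
```

    Only two numbers, μ and σ, are needed to fix the whole curve, which is why a few measurements can tell us so much about the rest of the population.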

    2.3 Samples and populations

    As discussed in Chapter 1, practical considerations almost always dictate that we study any group we are interested in by making measurements or observations on a relatively small sample of individuals. We call the group we are actually interested in the population. A population in the statistical sense is fairly close to the common meaning of the word, but can refer to things other than people, and usually to some particular characteristic of them. Here are some examples of statistical populations:

    The lengths of blue whales in the Arctic Ocean

    All momentary light intensities at some point in a forest

    Root lengths of rice plants of a particular variety grown under a specific set of conditions

    In the first example, the population is real but we are unlikely to be able to study all of the whales in practice. Populations in the statistical sense, however, need not be finite, or even exist in real life. In the second example, the light intensity could be measured at any moment, but the number of moments is infinite, so we could never obtain measurements at every moment. In the third example, the population is just conceptual. We really want to know about how rice plants of this variety in general would grow under these conditions but we would have to infer this by growing a limited number of rice plants under the specified conditions. Although the few plants in our sample may be the only rice plants ever to be grown in these conditions, we still consider them to be a sample representing rice plants of this variety in general growing in these conditions.

    2.4 Summary statistics

    It is often useful to be able to characterize a population in terms of a few well-chosen statistics. These allow us to summarize possibly large numbers of measurements in order to present results and also to compare populations with one another.

    Mean, variance, standard deviation and coefficient of variation

    If we want to describe a population it may sometimes be useful to present a frequency distribution like those in Figure 2.1, but this is usually more information than is needed. Two items are often sufficient:

    A measure which tells us what a ‘typical’ member of the population is like

    A measure which tells us about how spread out the other members of the population are
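    The four statistics named in the heading can be obtained in a few lines. The sketch below uses Python's standard `statistics` module on invented fish lengths. It assumes the usual sample definitions (variance with an n - 1 divisor, standard deviation as its square root, and coefficient of variation as the standard deviation expressed as a percentage of the mean), which may differ in detail from the definitions given later in the book.

```python
import statistics

# Hypothetical fish lengths in cm (invented for illustration only).
lengths_cm = [22.0, 25.0, 28.0, 30.0, 31.0, 33.0, 36.0, 43.0]

mean = statistics.mean(lengths_cm)           # a 'typical' value
variance = statistics.variance(lengths_cm)   # sample variance (n - 1 divisor)
sd = statistics.stdev(lengths_cm)            # square root of the variance
cv_percent = 100 * sd / mean                 # spread relative to the mean

print(f"mean = {mean:.1f} cm")
print(f"variance = {variance:.1f} cm^2")
print(f"standard deviation = {sd:.1f} cm")
print(f"coefficient of variation = {cv_percent:.1f}%")
```

    The mean answers the first of the two questions above; the other three are different ways of answering the second.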
