P5 and Statistics PDF
Hypothesis: A scientific hypothesis is a tentative explanation of a natural phenomenon capable of being tested by
observation or experiments.
Extension
A hypothesis is never really proved but can be disproved.
It was Karl Popper, a philosopher of science, who showed that a hypothesis is never really proved but can be
disproved at any time, by a single contrary observation. He maintained that all explanations of science are of this type.
That is, scientific knowledge is always only tentative. It is the best available explanation we can offer at any time. Any
explanation may be proved wrong or incomplete, and often is, sooner or later.
Qualitative
ordered (ordinal): values that can be placed in an order or rank, although the intervals
between them may not be equal, e.g. opinion judgements – ‘completely
agree’, ‘mostly agree’, ‘mostly disagree’, ‘completely disagree’.
Quantitative
continuous (interval): values that can fall anywhere within a specific range and can be a whole
number, a fraction or a decimal. They can be counted, ordered and measured,
e.g. body mass, time taken for seeds to germinate after different treatments.
discrete (interval): values that can take only a limited number of values, which are usually
whole numbers, e.g. the number of seeds in a bean pod, the number of cells
in a haemocytometer grid.
Extension
Handling very large and very small numbers
In science ‘powers of ten’ are used to avoid writing long strings of zeros when recording numbers. For example, the
age of the Earth, about 4 500 000 000 years, is written as 4.5 × 10⁹ years. Similarly, a cyanobacterium, a
photosynthetic bacterium, may be about 0.000 003 6 metres in diameter, which is written as 3.6 × 10⁻⁶ m.
This way of recording numbers is called scientific or standard notation. It is used to avoid the
errors that are easily made when writing down a large number of zeros. Also, when we need to
multiply numbers we can do so by adding powers. Similarly, to divide, the powers are subtracted.
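As a brief illustration of adding and subtracting powers (the numbers are chosen only for this example and do not come from the text):

```latex
\[
(3 \times 10^{4}) \times (2 \times 10^{5}) = (3 \times 2) \times 10^{4+5} = 6 \times 10^{9}
\]
\[
\frac{6 \times 10^{9}}{2 \times 10^{5}} = \frac{6}{2} \times 10^{9-5} = 3 \times 10^{4}
\]
```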
Several of the powers of ten have prefixes which are frequently used in biology and are
represented by agreed symbols (Table A2.2).
Table A2.2 Powers of ten
× 10³   kilo- or k      × 10⁻¹   deci- or d
× 10⁶   mega- or M      × 10⁻²   centi- or c
× 10⁹   giga- or G      × 10⁻³   milli- or m
                        × 10⁻⁶   micro- or μ
                        × 10⁻⁹   nano- or n
Standardised (controlled) variables: the variables in an experiment or investigation that are kept the same so they
do not influence the measurement of the dependent variable.
The variables that are always kept the same in this investigation of the rate of reaction of catalase are the
conditions such as temperature, the volumes and concentrations of the reagents, and perhaps the pH (the
volume and type of buffer, if used, for example). Variables that are kept constant are called
the controlled or standardised variables.
Meanwhile, in Figure 3.8 you can see that measurements are made at 30 second intervals. So the time at
which each reading is taken is the variable that is manipulated by the experimenter, and is called
the independent variable. Note that it has been recorded in the table as a list before the experiment was
started.
In the experiment, the amount of gas (oxygen) that has collected at 30 second intervals is the variable
accurately measured by the experimenter. It is called the dependent variable and is recorded in the table.
(Sometimes, two dependent variables may be measured. So, for example, in photosynthesis both oxygen
production and carbon dioxide intake may be measured.)
In a scientific investigation a control is used to ensure that any effects observed are due to changes in
values of the independent variable and not some unidentified variable. Thus a control is one value of the
independent variable. For example, in the investigation of the action of the enzyme catalase in bringing
about the breakdown of hydrogen peroxide, the control involves the use of boiled enzyme, since heat
denatures a protein and destroys its catalytic properties (Figure 3.10, page 64).
Reliability: reliable results are repeatable by the same student and reproducible by others.
Linear magnification and the actual sizes of images and objects
Calculating the linear magnification of drawings and photographs
Example
In a TEM of a red blood cell, the diameter of the image of the cell was 45 mm, and the actual size of the red
blood cell was 7.5 μm. What is the magnification of this TEM?
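Magnification is the number of times larger an image is than the real size of the object:
magnification = size of image ÷ size of object
Both sizes must be in the same units, so 45 mm is first converted to 45 000 μm. Then:
magnification = 45 000 μm ÷ 7.5 μm = ×6000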
To try
In a photomicrograph of a green plant cell, the image had been magnified ×400. The image of the cell is 94
mm in length.
What is the actual size of the cell?
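The relationship between image size, actual size and magnification can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and both sizes are assumed to be converted to micrometres before dividing.

```python
def mm_to_um(length_mm):
    """Convert a length in millimetres to micrometres (1 mm = 1000 um)."""
    return length_mm * 1000


def magnification(image_size_um, actual_size_um):
    """Magnification = size of image / actual size (same units for both)."""
    return image_size_um / actual_size_um


def actual_size(image_size_um, magnification_factor):
    """Rearranged form: actual size = size of image / magnification."""
    return image_size_um / magnification_factor


# The red blood cell example: image 45 mm across, actual diameter 7.5 um.
print(magnification(mm_to_um(45), 7.5))
```

The rearranged function `actual_size` is the one needed for the ‘To try’ exercise, once the image length has been converted to micrometres.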
Presenting data
Firstly, you must select from the data what is important. For example, it may be appropriate to round
numbers in order to avoid giving data to a greater level of accuracy than the measurements warrant. Then
you need to display the important data using a visual summary, making their importance clear. This might
involve graphs, bar charts, histograms, scatter graphs or pie charts, for example.
Graphs
Graphs show relationships or trends between two variables. It is conventional to plot the dependent variable
(the variable being measured) on the y (vertical) axis and the independent variable (the variable altered by
the experimenter) on the x (horizontal) axis. Both axes must be labelled and the units indicated. Points (a dot
within a circle or a cross, for example) must be plotted with a sharp pencil. The points should be joined by a
smooth curve only if you are confident that it indicates the likely points of intermediate readings. If you are
not confident of this, connect points with a straight line. If more than one line is plotted, then the plot points
and lines must be different and be clearly labelled. If the two plots have different vertical axes, then the scale
of the axes should be placed on either side of the graph.
The term graph applies to the whole graphic representation. The line on the graph showing a relationship,
whether straight or curved, is referred to as a curve. Examples of graphs are seen throughout this book.
Question
1 The following results were obtained in an investigation of the effects of pre-incubation of starch and
amylase solutions at different temperatures on the subsequent hydrolysis of the starch to sugar.
Plot a graph of these results, applying the rules and conventions detailed here.
Check your work against the model graph provided at the end of this Appendix.
Bar charts
Bar charts are used to show relationships between the independent variable (on the x axis) and the dependent
variable (on the y axis) when the independent variable is categoric and the dependent variable is continuous, such as
the range of tree species found in woods. There should be small gaps between the lines or bars used, which
should be of equal width, and typically presented in order of magnitude (Figure A2.4a).
Histograms
Histograms are useful to display continuous data. They show the variations in a sample of repeated
measurements. The x axis represents the variation in the repeated measurements. The y axis is the frequency
or number in each class. Normally the blocks are drawn touching. There should be an informative title
(see Figure A2.3).
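Before a histogram can be drawn, the repeated measurements have to be sorted into classes and counted. The sketch below shows one way to do this for whole-number measurements. The leaf widths are invented for illustration, and the class limits (20–24, 25–29 and so on) are an assumption, not data from this appendix.

```python
def class_frequencies(values, start, width):
    """Count how many whole-number values fall in each class of equal width.

    Classes are labelled by their inclusive limits, so start=20 and
    width=5 give '20-24', '25-29', and so on.
    """
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    # Return the classes in order of their lower limit.
    return {f"{lo}-{lo + width - 1}": counts[lo] for lo in sorted(counts)}


# Hypothetical leaf widths in mm (illustrative only).
widths_mm = [24, 26, 26, 31, 33, 37, 38, 41, 43, 45]

for label, count in class_frequencies(widths_mm, start=20, width=5).items():
    print(f"{label:>5} | {'#' * count}")
```

Each printed row corresponds to one block of the histogram, with the number of `#` marks giving the frequency for that class.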
Scatter graphs
We construct a scatter graph to investigate whether there may be a relationship between two variables. If one
variable increases, does the other also increase? This is a common situation in biology, for example, the
question of whether there is a correlation between blood pressure and heart rate, or smoking and heart
disease, or river flow rate and diversity of aquatic non-vertebrates. The first step in investigating a
correlation is to plot a scatter graph of one variable against another. The shape of the scatter graph indicates
the type of correlation. By ‘correlation’ we mean a mutual relation between two (or more) things, or an
interdependence of variable quantities. If both variables increase together then there is a positive
correlation; if one variable decreases when the other increases then there is a negative correlation.
The closer the data points come to lying on a straight line, the closer the relationship; the more scattered
the data points the less close the relationship. If the scatter graph has random points then there is no
correlation (Figure A2.4b).
It is important to realise that a correlation between two variables does not necessarily mean that the
variables are causally linked. So, having applied a statistical test that indicates the possibility of a
correlation, we have to go on to investigate the mechanisms of the linkage, if there is one.
Pie charts
Pie charts are best used for showing relative proportions.
Statistical checks on data
Statistical tests should be used when you are not sure about the numerical relationships your data indicate.
The application of simple statistical tests is described below. Many calculators are programmed to carry out
statistical tests. Microcomputers will run spreadsheet programs with statistical tests programmed in.
Dedicated statistical software is available too.
Standard deviation: a measure of the spread of a set of data about the mean of the sample; it is used to estimate the
variability of a population from a sample. A small standard deviation indicates that the data are more reliable.
Standard error: an estimate of the reliability of the mean of a population sample. A small standard error indicates that
the mean value is close to the actual mean of the population.
Standard error
The standard error (SM) represents how well the sample mean approximates the population mean. The
larger the sample, the smaller the standard error, and the closer the sample mean approximates the
population mean. The standard error is obtained by dividing the standard deviation, s, by the square root
of n, the sample size.
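Written out in symbols, the two quantities are shown below. The form of the standard deviation with n − 1 in the denominator is an assumption, since the formula itself is not reproduced in this section. Here x is an individual measurement and x̄ is the sample mean.

```latex
\[
s = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n - 1}}
\qquad\qquad
S_M = \frac{s}{\sqrt{n}}
\]
```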
When graphs are presented showing mean values, error bars are added to each value plotted to demonstrate
the deviation of the sample from the true population mean. Error bars (±SM) extend above and below the
points plotted on a graph to show this variability.
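As a minimal sketch of how the limits of an error bar might be worked out, the Python below uses the standard library's `statistics.stdev` (which divides by n − 1) and then divides by the square root of the sample size. The readings are invented for illustration, and the function names are not taken from any textbook or syllabus.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def error_bar_limits(sample):
    """Lower and upper ends of an error bar drawn as mean ± standard error."""
    mean = statistics.mean(sample)
    sm = standard_error(sample)
    return mean - sm, mean + sm


# Hypothetical repeated readings (illustrative only).
readings = [12.0, 15.0, 11.0, 14.0, 13.0]

lower, upper = error_bar_limits(readings)
print(f"mean = {statistics.mean(readings):.2f}")
print(f"error bar from {lower:.2f} to {upper:.2f}")
```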
The t-test
Statistical tests using standard deviation or standard error typically compare large, randomly selected
representative samples of normally distributed data. In practice it is often the case that data can only be
obtained from quite small samples. The t-test may be applied to sample sizes of more than 5 and less than 30
taken from normally distributed data. It provides a way of measuring the overlap between two sets of data –
a large value of t indicates little overlap and makes it highly likely there is a significant difference between
the two data sets. The following example illustrates the method of the t-test; however, you should note that
you are not expected to calculate values of t.
Extension
The null hypothesis
Statistical tests are hypothesis-testing statistics. They test a mathematical statement called the null hypothesis.
Where we are comparing data the null hypothesis states that there is no difference between the sets of data. When
we are looking for an association it states that there is no association.
The outcome of a statistical test is a probability (known as the P-value): the probability of
obtaining results at least as different as those observed if the null hypothesis were true. A
probability varies from 0 (impossible) to 1 (certain). Since the P-values are small, they
are often given as a percentage (0 to 100%) to avoid possible confusion with small numbers. The lower
the probability, the weaker the support for the null hypothesis.
The t-test makes comparison between means of data to test for significant differences between
the samples. For the t-test the null hypothesis is ‘There is no difference between the means’. By
convention in biology, if the probability is greater than 0.05 (5%) then the null hypothesis is
accepted. However, if the probability is 0.05 or less (P ≤ 0.05), then the null hypothesis is rejected.
This implies that a difference of this size is predicted to arise by chance no more than once in twenty times. So the
difference is judged to be significant.
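The convention can be expressed as a simple decision rule. The sketch below is illustrative only: the function name is invented, and it follows the wording ‘0.05 or less’ for rejecting the null hypothesis.

```python
def judge_null_hypothesis(p_value, significance_level=0.05):
    """Apply the 5% convention: reject when P is at or below the level."""
    if not 0 <= p_value <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    if p_value <= significance_level:
        return "reject"  # the difference is judged to be significant
    return "accept"      # the difference may well have arisen by chance


print(judge_null_hypothesis(0.20))
print(judge_null_hypothesis(0.01))
```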
20–24 24
25–29 26, 26 26
40–44 43 41, 42
45–49 45
The steps of the t-test
1 The null hypothesis assumes the difference under investigation has arisen by chance. That is, there is no
difference in width between leaves from plants growing in sun and shade. The role of this statistical test is to
determine whether to accept or reject the null hypothesis. If it is rejected in this case, we can have
confidence that the difference in the leaf sizes of the two samples is statistically significant.
2 Next, check that the data are normally distributed. This is done by arranging the data for the two samples as
in Table A2.3. (and plotting a histogram, if necessary).
3 You are not expected to calculate values of t. This is a statistic which, if required, can be found by using a
scientific or statistics calculator or by means of a spreadsheet incorporating formulae.
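Actually, the formula for the t-test for unmatched samples (data sets a versus b) is:
$$t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}}$$
where:
$\bar{x}_a$ = the mean of data set a
$\bar{x}_b$ = the mean of data set b
$s_a^2$ = the standard deviation of data set a, squared
$s_b^2$ = the standard deviation of data set b, squared
$n_a$ = the number of data values in set a
$n_b$ = the number of data values in set b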
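For readers who want to see the calculation carried out, the following Python sketch computes t for two unmatched samples. It assumes the usual form: the difference between the two means divided by the square root of the sum of each sample's variance over its sample size. The leaf widths are invented for illustration and are not the values of Table A2.3.

```python
import math
import statistics


def t_unmatched(sample_a, sample_b):
    """t for two unmatched samples: difference in means over its standard error."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance gives the standard deviation squared (n - 1 form).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))


# Hypothetical leaf widths in mm for plants grown in sun and in shade.
sun = [23, 25, 27, 29, 31, 33]
shade = [33, 35, 37, 39, 41, 43]

t = t_unmatched(sun, shade)
print(f"t = {t:.3f}")
```

The sign of t only shows which mean is larger. It is the size of t, ignoring the sign, that is compared with the value in a t-table.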
The chi-squared test examines whether observed numerical results differ from the expected numerical result (Table 16.5, page 365). It tests
for ‘goodness of fit’ between an observed distribution and a theoretical one. It allowed us to test whether the
observed results obtained from the dihybrid cross between Drosophila of normal flies (wild type) with flies
homozygous for vestigial wing and ebony body differ significantly from the expected outcome.
Turn back to pages 364–5 and refresh your memory of the formula for the chi-squared statistic and how it
is applied. Note that in the chi-squared test, the null hypothesis is ‘Observed frequencies equal expected
frequencies’. The chi-squared test is also applied in ecology for looking at the differences in distribution of
organisms in different habitats.
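As a reminder of how the statistic is built up, the sketch below sums (observed − expected)² ÷ expected over all the categories. The offspring counts are invented, and the 9 : 3 : 3 : 1 expected ratio is assumed here purely for illustration.

```python
def expected_from_ratio(total, ratio):
    """Share a total count between categories according to an expected ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical offspring counts in four phenotype classes (illustrative only).
observed = [90, 30, 28, 12]
expected = expected_from_ratio(sum(observed), [9, 3, 3, 1])

print(f"chi-squared = {chi_squared(observed, expected):.3f}")
```

The calculated value is then compared with the value in a chi-squared table to decide whether to accept or reject the null hypothesis.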
In summary
We have now introduced the four tests that you may need to be able to apply. The formulae for all four are
given in Figure A2.11.
You do not need to memorise the formulae or the meaning of their symbols. However, you may need to
use them in practical work:
• to calculate a standard deviation
• to put error bars on graphs
• to test for a significant difference between the means of two small samples
• to perform a chi-squared test on suitable data.
To do this you will have access to the formulae, the meaning of the symbols, a t-table and a chi-squared
table. Rather than carry out all the steps of a test in an examination, you may be given partly completed
calculations to finish. Consequently, it is helpful to be fully acquainted with the use of an approved
electronic calculator and have used it to become familiar with each of the four tests.
Answer to question 1