BIO 610 Lab Edited (Student)
BIO 610
EXPERIMENTAL BIOLOGY:
DESIGN & ANALYSIS
LAB MANUAL
INTRODUCTION
________________________________________________________________________
The students are also expected to read the labs carefully and understand them well before coming
into the lab.
Attendance is compulsory!!!!
LABORATORY SCHEDULE
WEEK TOPIC
3 Summary Statistics.
4 Probability.
Lab report:-
i) should be completed and handed in for marking the following week, during the
next practical.
iii) will provide the basis for your course work (lab report – 10%) assessment mark in
BIO 610.
Introduction
The following format is to be used for writing "full" lab reports.
The report should preferably be typed; however, a handwritten report is also acceptable.
Lab report grades are based on the following criteria:
o completeness,
o neatness,
o clarity,
o results,
o and answers to questions (if any).
I. Front Page
The students must write:
o Full name,
o Title,
o Date of experiment,
o Group
II. Title
Give the full title of the lab exercise.
III. Purpose/objectives: State in a complete sentence the reason for doing the lab exercise, or
Hypothesis: Include a hypothesis, if it is appropriate for the exercise (otherwise delete this item).
The hypothesis should be written as an "if ______ then ______" statement.
IV. Introduction
An introduction of 1-2 paragraphs should be sufficient. It should provide the background of the
underlying lab and techniques that are used in it.
B. Organize data in an easy to read format. Use data tables whenever possible and be sure to give
all tables good titles.
C. All measurements must be in proper metric units.
D. Whenever you need to perform experiments with multiple replicates, condense your data into
averages and standard deviations across replicates.
E. Do not show all raw data unless I specifically ask you to.
VIII. Discussion
A. Conclusion:
Present a brief conclusion that ties together the reason for doing the exercise and results
obtained.
You must either accept or reject your hypotheses.
Use specific examples from your data, data processing and observations to support your
conclusion.
Use complete sentences.
B. Source of Error:
Give at least one source of error.
Be specific, and explain why it is a source of error. Also, explain how this error affected your
results.
IX. References
Include complete citations of any works you cite such as textbooks, journals and internet
materials.
Scheme of Lab Report
Marks will be given as follows:
Result:
Data Observation (diagram/ table, etc.) 5
Data Analysis (calculation/graph, etc.)
Discussion:
(Include answers to any post-lab questions) 5
Conclusion 2
References 1
Overall structure:
Include grammar & quality. 1
TOTAL 20
LAB 1: SAMPLING, FREQUENCY TABLES AND STEM-AND-LEAF PLOTS
Introduction
Two of the commonest types of mistake in statistical calculations are simple arithmetical errors
and errors in copying numbers, especially the very large numbers which tend to arise at
intermediate stages in statistical calculations. It is therefore important to cultivate good work
habits which keep these to a minimum. One such habit is the tabulation of data in a properly
constructed table, preferably on ruled paper.
Besides that, in statistics we are concerned not with the particular results of individual
measurements but with the distribution of the measured values. A great deal of work in statistics
is spent in identifying and describing the distribution associated with a particular set of
measurements or observations and the first thing we must do is to consider ways of representing
distributions of random numbers.
Objective:
i) To select a simple random sample from the population.
ii) To explore the LEAF data in your sample with a stem-and-leaf plot and frequency
table.
Materials:
Leaves
Ruler
Activities
1. Data Gathering
a. Find a tree or shrub (if possible, each group should pick leaves from a different tree or shrub).
b. Pick 50 leaves per tree keeping the petiole attached. Try to pick a "random" assortment of
sizes if they exist on your chosen tree.
c. For each leaf, record the length of leaf (in tenths of mm).
d. Enter the leaf data into a computer file using an Excel/SPSS format, if possible.
Tasks:
1. Present the results as:
a. Table of raw data.
b. Frequency distribution table which contains measured length, implied length,
class mark, frequency, relative frequency and cumulative frequency.
2. Construct a stem-and-leaf plot of the above data. Describe the shape, location, and spread
of the distribution.
1. Stem-and-leaf plot
a. Start SPSS
b. Open the file which contains your data and click Analyze > Descriptive Statistics >
Explore.
c. Place the LENGTH OF LEAF in the Dependent List and click OK.
d. After the program runs, go to the OUTPUT window and navigate to the Stem-and-leaf
plot.
(How does this plot compare with the one you constructed by hand?)
2. Frequency table
a. Click the Variable View tab (toward the bottom of the screen) and create a new
variable named LENGTHGRP. Make this a numerical variable with width 8 and 0
decimals.
b. In the column called “Label,” enter “Length Group” to give the variable a descriptive
label.
c. Click the Data View tab at the bottom of the screen and classify each leaf with the
appropriate codes: e.g. 1 = 0-9 mm, 2 = 10-19 mm, and so on.
d. Click Analyze > Descriptive Statistics > Frequencies, select the LENGTHGRP
variable, and click OK.
e. Go to the Output Window and navigate to the frequency table for LENGTHGRP.
View the frequency table compiled by SPSS.
(How does this frequency table compare with the one you prepared by hand?)
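For groups that want a second check on the hand-made table and plot, the two constructions can also be sketched in Python. This is only an illustration under stated assumptions: the leaf lengths are hypothetical whole-millimetre values, the classes are 10 mm wide (0-9, 10-19, and so on), and the function names are made up for this sketch. Empty classes are simply omitted from the table.

```python
from collections import Counter, defaultdict

# Hypothetical leaf lengths in mm (illustrative only, not real lab data).
lengths = [12, 15, 21, 24, 24, 27, 30, 33, 35, 38, 41, 46]

def frequency_table(values, width=10):
    """Group values into classes 0-9, 10-19, ... and tabulate them."""
    counts = Counter(v // width for v in values)
    n = len(values)
    rows, cumulative = [], 0
    for k in sorted(counts):
        lower, upper = k * width, k * width + width - 1
        cumulative += counts[k]
        rows.append({
            "class": f"{lower}-{upper}",
            "class_mark": (lower + upper) / 2,  # midpoint of the class
            "frequency": counts[k],
            "relative": counts[k] / n,
            "cumulative": cumulative,
        })
    return rows

def stem_and_leaf(values):
    """Return {stem: [leaves]} with tens as stems and units as leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

for row in frequency_table(lengths):
    print(row)
for stem, leaves in stem_and_leaf(lengths).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the stems here are the tens digits, each row of the stem-and-leaf plot corresponds to one class of the frequency table, which makes the two outputs easy to compare.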
LAB 2: SUMMARY STATISTICS
Introduction
In biology, we generally classify the objects around us and we will need to do the same in
statistics. The ‘objects’ we are concerned with in statistics are probability distributions. Collected
data can be shown via the various probability distributions (binomial, normal, chi-square, etc.).
Summary statistics provide ways of classifying probability distributions so that we do not
need to specify the distributions in every detail but rather can pick out the key properties as we
need them.
So, once a large set of data have been collected, we need to use some descriptive statistics to
convey the important aspects of the distribution of the data.
Objectives:
i) To calculate and interpret summary statistics (descriptive statistics) of the PULSE-RATE
data.
ii) To determine whether the data follow a normal distribution.
Materials:
Stopwatch
Activities
1. Data Gathering
a. Count the resting pulse rate of the students in the class – number of beats per
minute
b. Get all the counts and show the raw data in your lab report.
c. Enter the pulse-rate data into a computer file using an Excel/SPSS format, if
possible.
Tasks
1. Calculate the summary statistics (mean, mode, median, variance, standard deviation and
range) for the class data set.
Treat the class values as the population and each group's values as a sample.
2. Calculate the summary statistics for males and females.
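The hand calculations in these tasks can be cross-checked with Python's standard `statistics` module. The sketch below is illustrative only: the pulse rates are hypothetical, and the helper name `summarize` is an assumption of this example. It uses the population variance (divide by N) for the class and the sample variance (divide by n − 1) for a group, matching the population/sample distinction made in task 1.

```python
import statistics

# Hypothetical resting pulse rates in beats per minute (illustrative only).
class_rates = [62, 68, 70, 72, 72, 75, 78, 80, 84, 90]  # whole class = population
group_rates = [68, 72, 72, 80]                           # one group = sample

def summarize(values, population=False):
    """Return the summary statistics requested in the tasks."""
    if population:
        var = statistics.pvariance(values)  # divides by N
    else:
        var = statistics.variance(values)   # divides by n - 1
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": var,
        "sd": var ** 0.5,
        "range": max(values) - min(values),
    }

print("class:", summarize(class_rates, population=True))
print("group:", summarize(group_rates))
```

The same function can be applied to the male and female subsets once the real class data have been entered.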
Post-lab Question
1. Draw a histogram for the class data set (population) and for the sample.
On a separate graph, mark the class mean (population), the mean for males, the mean for
females and the means for the different groups (samples).
Describe the general shape of the data distribution: normal (bell-shaped), uniform, skewed with a
long tail to the left or right, middle-heavy (platykurtic), or tail-heavy (leptokurtic), or bimodal.
2. What are the consequences of too few intervals in a histogram? Too many?
4. Does the box plot change when you manipulate the histogram? Why?
LAB 4: ONE-SAMPLE INFERENCE
Whenever we carry out an experiment or make an observation, we should always take at least three
readings. One reading gives us an estimate of the mean but no indication of the dispersion. Two
readings enable us to calculate the standard deviation, and a 95% confidence interval for the
mean is then
$$\bar{x} \pm t_{0.05(2),1}\,\frac{s}{\sqrt{2}} = \bar{x} \pm 12.71\,\frac{s}{\sqrt{2}} \approx \bar{x} \pm 8.98\,s$$
Two repeats is thus the absolute minimum number of readings we should take. However, with
three repeats the confidence interval for the mean is
$$\bar{x} \pm t_{0.05(2),2}\,\frac{s}{\sqrt{3}} = \bar{x} \pm 4.30\,\frac{s}{\sqrt{3}} \approx \bar{x} \pm 2.48\,s$$
With 50% more effort, we have increased the precision of our estimate by a factor of almost 4!
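The gain from a third reading, and the smaller gains from further readings, can be checked numerically. The sketch below assumes the same sample standard deviation s at every sample size and takes the two-tailed 5% critical values of t from standard tables; the function name is hypothetical. It prints the half-width of the 95% confidence interval as a multiple of s.

```python
import math

# Two-tailed 5% critical values of Student's t from standard tables,
# keyed by degrees of freedom (n - 1).
T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def half_width_multiplier(n):
    """Half-width of the 95% CI for the mean, as a multiple of s."""
    return T_CRIT[n - 1] / math.sqrt(n)

for n in range(2, 7):
    print(n, round(half_width_multiplier(n), 2))

# How much narrower the interval is with three readings than with two.
ratio = half_width_multiplier(2) / half_width_multiplier(3)
print("two vs three readings:", round(ratio, 1))
```

Under these assumptions the interval narrows by a factor of about 3.6 when going from two to three readings, in line with the rough factor of 4 quoted above, and each additional reading after that helps progressively less.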
The standard error (SE) is a measure of the reliability or precision of x̄ as an estimate of μ. The
smaller the SE, the more precise the estimate.
Under certain circumstances, the SE can be given a definite quantitative interpretation by using it
to construct a confidence interval for the population mean. A confidence interval for μ is an
interval whose upper and lower limits are computed from the data. The interval always contains x̄.
Unfortunately, the mean calculated from a sample, X̄, will generally differ from the population mean μ.
The expected discrepancy between X̄ and μ depends on the size of the sample and the variability
of X. If the sample size is small and X has high variance, then X̄ may be quite far from the population
mean. In contrast, if the sample size is large and X has low variance, X̄ will probably be close to μ.
The first step in constructing a confidence interval is to choose a value called the confidence
coefficient, which measures our “confidence” that the confidence interval contains µ.
Student’s t distribution provides the method for constructing a confidence interval for μ. First, suppose we
have chosen a confidence coefficient equal to 95%. To construct a 95% confidence interval for μ, we
compute the upper and lower limits of the interval as
$$\bar{x} \pm t_{0.05}\,SE_{\bar{x}}$$
that is,
$$\bar{x} \pm t_{0.05}\,\frac{s}{\sqrt{n}}$$
The confidence interval (CI) combines information on sample size and variability to put
probabilistic bounds on estimates of the population mean. CI’s can be calculated for any desired
degree of confidence, but 95% confidence intervals are most common. If your sample is random
and the population has a normal distribution, you can be “95% confident” that your confidence
interval includes the population mean. More accurately, if you sample repeatedly and generate a
95% CI each time, you can expect the CI to include the population mean in about 95% of cases
and to miss it in the other 5%.
Since you usually don't know the population mean, you'll never know when this happens. If the
data are not from a normal distribution, then the 95% CI will include the true mean in
approximately 95% of cases only if sample size is large (follows from the Central Limit
Theorem).
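The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is a demonstration under assumptions, not part of the lab procedure: the population is taken to be normal with a hypothetical mean of 70 and standard deviation of 10, every sample has n = 10, and the critical value 2.262 (two-tailed 5%, 9 degrees of freedom) comes from standard t tables.

```python
import math
import random
import statistics

N = 10              # sample size used for every simulated sample
T_CRIT_DF9 = 2.262  # two-tailed 5% critical value of t for N - 1 = 9 df

def coverage(mu=70.0, sigma=10.0, trials=2000, seed=1):
    """Fraction of 95% CIs, built from normal samples, that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(N)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(N)
        # Count the interval as a "hit" when it captures the true mean.
        if mean - T_CRIT_DF9 * se <= mu <= mean + T_CRIT_DF9 * se:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should land close to 0.95. In a real experiment only one of these intervals is ever observed, which is why we cannot tell whether ours is one of the roughly 5% that miss.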
If the random sample is from a normally distributed population, then the (two-tailed) one-sample
t test may be used:
reject $H_0$ if $t \ge t_{0.05(2),\nu}$ or $t \le -t_{0.05(2),\nu}$, where
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
μ is the population mean specified by $H_0$, and ν = n − 1 is the degrees of freedom. If the data are not from a normal
distribution, the same procedure may be used only if n is large (this follows from the Central Limit Theorem).
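As a minimal sketch of the decision rule, the Python below computes t for a small sample and compares it with the tabulated critical value. The weights and the hypothesized mean are hypothetical values chosen for illustration, and the critical value 2.776 (two-tailed 5%, 4 degrees of freedom) is taken from standard t tables.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (mean - mu0) / se, n - 1

# Hypothetical body weights in kg and a hypothetical null value.
weights = [52.0, 55.0, 58.0, 61.0, 64.0]
mu0 = 55.0

t, df = one_sample_t(weights, mu0)
T_CRIT = 2.776  # two-tailed 5% critical value of t for 4 df
reject = abs(t) >= T_CRIT
print(round(t, 3), df, "reject H0" if reject else "do not reject H0")
```

With these made-up numbers t is about 1.41, which is smaller than 2.776, so the null hypothesis is not rejected at the 5% level.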
If the data are not from a normal distribution and the sample size is not large, the Wilcoxon
signed-rank test may be used instead. We will learn more about the Wilcoxon and other “non-
parametric” tests in future lab exercises. These tests are based on ranks and do not require the
assumption that the population is normally distributed. However, rank tests are generally less
powerful than tests based on the normal distribution, and the latter are therefore preferred if the
assumption of normality can be met.
Objective:
i) To learn about distributions of sample means and confidence intervals for means.
Material:
Health scale machines meter
Methods:
1. Data Gathering
b. Get all the measurements and show the raw data in your lab report.
c. Enter the weight data into a computer file using an Excel/SPSS format, if possible.
d. Determine the population mean and population standard deviation for your class.
Tasks
1. Confidence interval for WEIGHT, estimated.
Calculate the 95% confidence interval for the mean weight for your group. Use both your
group standard deviation (s) and the population standard deviation (σ) as comparisons.
Does the confidence interval contain (capture) the population mean in both cases?
(Refer to the example calculations given.)
2. Then repeat for your group and the 2nd group; then your group, 2nd group and 3rd group;
then your group, 2nd group, 3rd group and 4th group. Does the SE decrease as the sample
size increases? Report your results as μ = X̄ ± SE, and graph your answer with SE
on the y-axis and n on the x-axis.
5. Open your data set in SPSS and check your confidence interval calculations with Analyze
> Descriptive Statistics > Explore (select the WEIGHT variable). The confidence
interval is reported in the output area labeled “Descriptives”.
Example calculation for Confidence Interval.
σx̄ = 6.21 n = 10 x̄ = 159.40
Using x̄ as the point estimate of µ, and with knowledge of the population sd, the upper
limit (UL) and lower limit (LL) for a 95% confidence interval of the mean are
given by:-
UL0.95 = x̄ + 1.96 σx̄ = 159.40 + 1.96 * 6.21 = 171.57
LL0.95 = x̄ – 1.96 σx̄ = 159.40 – 1.96 * 6.21 = 147.23
Conclusion: We feel 95% confident that the population mean is
included in the interval 147.23 to 171.57.
*** 95% (0.95) of the standard normal distribution lies between z scores of -1.96 and
1.96.
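n = 20 x̄ = 21.0 mm s = 1.76 mm
SEx̄ = s / √n = 1.76 / √20 = 0.394 mm
and, with t0.05(2),19 = 2.093,
UL0.95 = 21.0 + 2.093 * 0.394
= 21.825
LL0.95 = 21.0 – 2.093 * 0.394
= 20.175
Or, µ = x̄ ± 0.825
Thus, we conclude with 95% (0.95) confidence that the interval from 20.175 mm to
21.825 mm includes the population mean.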
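The arithmetic in the two examples can be reproduced with a few lines of Python. This sketch uses only the values quoted in the examples. It assumes that 6.21 in the first example is the standard error of the mean (the reading that reproduces the quoted limits), and it rounds the standard error in the second example to three decimals, as the hand calculation does. The function name `ci` is made up for this illustration.

```python
import math

def ci(mean, se, crit):
    """Return (lower, upper) limits: mean -/+ crit * se."""
    half = crit * se
    return mean - half, mean + half

# Example 1: population sd known, so the z value 1.96 is used.
ll1, ul1 = ci(159.40, 6.21, 1.96)

# Example 2: sd estimated from the sample, so t with n - 1 = 19 df is used.
se2 = round(1.76 / math.sqrt(20), 3)  # 0.394, rounded as in the hand calculation
ll2, ul2 = ci(21.0, se2, 2.093)

print(round(ll1, 2), round(ul1, 2))  # 147.23 171.57
print(round(ll2, 3), round(ul2, 3))  # 20.175 21.825
```

If the standard error in the second example is left unrounded, the limits change in the third decimal place, which is a reminder to carry extra digits through intermediate steps.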
LAB 7: INDEPENDENT SAMPLES AND THEIR DIFFERENCES
Introduction
Two-sample inference concerns the problem of estimating and testing differences
between two means. For example, we may be interested in comparing the effects of two different
medications on patient mean blood pressure. Or, we may wish to compare the effects of different
fertilizers on mean plant growth. There are two completely different ways of carrying out such
comparisons of means. The first approach is to randomly assign independent observations (e.g.,
patients, field plots) to different treatments. In this case we have two samples of individuals,
each from separate populations: one sample of individuals given drug #1 and a second sample
of individuals given drug #2 (or, one sample of field plots treated with fertilizer #1 and another
sample treated with fertilizer #2). This is the two-sample design, the subject of the present lab
exercise. Our goal is to compare the two population means (μ1 and μ2) using two random
samples of patients (or, field plots).
You will have only a single estimate of each mean, but keep in mind that if you were to go back
and collect two more random samples, the value of X̄1 − X̄2 obtained the second time would be
different from that obtained the first time. The mean of the distribution of possible values for X̄1
− X̄2 is μ1 − μ2, and its standard deviation is $\sigma_{\bar{X}_1 - \bar{X}_2}$.
The quantity
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
has a t-distribution with n1 + n2 − 2 degrees of freedom. This fact is the basis of the two-sample t
test for a difference between population means, and of the confidence interval for the difference
between two means. The quantity $s_{\bar{X}_1 - \bar{X}_2}$ is computed from the pooled sample variance, $s_p^2$, where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
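As a rough illustration of the pooled-variance calculation, the Python sketch below computes the pooled variance, the standard error of the difference between means, and t under the null hypothesis μ1 − μ2 = 0. The two samples are small hypothetical data sets chosen so that the arithmetic is easy to follow by hand, and the function name is an assumption of this example. The standard pooled formula is used, which assumes equal population variances.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Two-sample t statistic for H0: mu1 - mu2 = 0, using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: the two sample variances weighted by their df.
    sp2 = ((n1 - 1) * statistics.variance(sample1)
           + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(sp2 / n1 + sp2 / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    return t, n1 + n2 - 2, se_diff

# Hypothetical measurements for two independent groups (illustrative only).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]

t, df, se_diff = pooled_t(group_a, group_b)
print(t, df, se_diff)
```

Here both sample variances are 2.5, so the pooled variance is also 2.5, the standard error of the difference is 1.0, and t is −2.0 with 8 degrees of freedom. Since |t| is below the tabulated two-tailed 5% value of 2.306, the difference would not be declared significant.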
conditions the Wilcoxon rank sum test is about 95% as powerful as a 2 sample t-test, although it
may be less powerful in specific settings.
Power Analysis
When researchers carry out an experiment to test the difference between two treatment means,
how do they decide on the appropriate sample sizes to take? How confident are they about their
abilities to detect a difference if one is present? Power is the probability of correctly rejecting the
null hypothesis when it is false (power is 1−β, where β is the probability of making a Type II
error). The power of the two-sample t test depends on:
1. The sample size (n1+n2). Greater sample size increases the power of a test.
2. The significance level (α). Power decreases with decreasing α. For example, reducing α from
0.05 to 0.01 reduces the probability of making a Type I error but increases the probability of
making a Type II error.
3. The within-population variation (σ). Higher variation reduces power.
4. The difference between means, μ1−μ2. The larger the difference between the population
means, the greater the probability of rejecting Ho.
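The four dependencies listed above can be explored with a quick calculation. The sketch below is an approximation under stated assumptions, not the exact power of the t test: it treats σ as known and equal in both populations, uses equal group sizes, and replaces the t distribution with the standard normal. The function name and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test."""
    z = NormalDist()
    se = sigma * math.sqrt(2.0 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se                    # true difference in SE units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

base = approx_power(delta=10, sigma=10, n_per_group=20)
print("baseline:        ", round(base, 3))
print("larger n:        ", round(approx_power(10, 10, 40), 3))
print("smaller alpha:   ", round(approx_power(10, 10, 20, alpha=0.01), 3))
print("larger sigma:    ", round(approx_power(10, 20, 20), 3))
print("larger difference:", round(approx_power(20, 10, 20), 3))
```

Each line changes one factor relative to the baseline, so the direction of the change in power can be read off directly. When the true difference is zero, the function returns α, the Type I error rate.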
Objective
i) To describe independent samples.
ii) To estimate a mean difference with 95% confidence.
iii) To conduct an independent t test.
Tasks
1. Independent samples and Side-by-side boxplot.
Use the data from the Appendix (the average daily Na+ intakes (in milligrams) of 12
Normal and 10 Hypertensive subjects) to:
a. Determine 5-point summaries of the Normal (n1 = 12) and Hypertensive (n2 = 10) groups in
the sample.
b. Then, construct a side-by-side boxplot of these distributions. Do the distributions
overlap? How do the medians compare?
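A 5-point summary (minimum, lower quartile, median, upper quartile, maximum) can be sketched in Python as shown below. The data are hypothetical and are not the Appendix values. Note that quartile conventions differ between textbooks and software, so the "inclusive" method of `statistics.quantiles` used here is one choice among several and may not match hand-calculated hinges or SPSS output exactly.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical observations (illustrative only).
hypothetical = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(five_number_summary(hypothetical))
```

The five values are exactly what is needed to draw one box of a side-by-side boxplot: the box spans Q1 to Q3, the line inside marks the median, and the whiskers extend towards the minimum and maximum.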
The mean difference in Na+ in Normal and Hypertensive in the population (μ1−μ2) = 50 mg.
4. Statistical hypothesis test
a. Test H0: μ1−μ2 = 0. List all hypothesis testing steps.
Were you able to reject the null hypothesis? Does this imply the null hypothesis is
correct? Did you make a type I or type II error?
5. SPSS
a. Enter the data into a computer file using an Excel/SPSS format, if possible and make
note of Na+ values for the Normal and Hypertensive in this sample.
b. Open the .sav file in SPSS and click Analyze > Descriptive Statistics > Explore.
c. Put the variable Na+ in the Dependent List and the grouping variable (People) in the Factor List.
d. Go to the output window and navigate to the boxplot. How does this boxplot compare
with the one you produced by hand?
e. Check your statistical hypothesis test calculations with SPSS by clicking Analyze >
Compare Means > Independent Samples T test. The Test variable is Na+ and the Group
Variable is People. You must use the Define Groups button to tell SPSS that Group 1 is
coded “Y” and Group 2 is coded “N”.
f. After the program runs, go to the output window and navigate to the region labeled
“Independent Samples Test”. The first row of the output table (labeled “Equal Variances
Assumed”) contains confidence interval and test statistics. These should match the
statistics you calculated by hand in part 5 and 6, respectively.
Appendix
The average daily Na+ intakes (in milligrams) of 12 normal and 10 hypertensive subjects.
_____________________________________________________________________
Normal:
10.2 2.2 0.0 2.6 0.0 43.1 45.8 63.6 1.8 0.0 3.7 0.0
_____________________________________________________________________
Hypertensive:
92.8 54.8 51.6 61.7 250.8 84.5 34.7 62.2 11.0 39.1
_____________________________________________________________________
* The two groups were isolated for a week and compared with respect to Na+ intake.
These data deal with sodium chloride preference as related to hypertension.