Bio111 Lab Manual 1 Fall 2021


LABORATORY #1: HYPOTHESIS TESTING


Purpose: The purpose of this activity is to introduce you to one method of scientific inquiry – the
laboratory experiment. Many of the skills you will learn here are used in other types of scientific
investigations, as well. Today you will learn how to identify and categorize variables, use statistical
tests to examine patterns in data, and design and modify hypotheses. You will experience the
tentative, probabilistic, cumulative nature of ‘truth’ in science.
Methods and Techniques: You will learn how to construct testable hypotheses, collect data, subject
these data to a statistical test, interpret the results of the statistical test, and reach a conclusion
regarding your hypothesis.
Major Concepts: variables (controlled, random, independent, dependent, confounding); falsifiability;
replication; artifact; probability; inductive logic; deductive logic.

INTRODUCTION
Webster’s New World Dictionary (2010) defines science as “systematized knowledge derived
from observation, study, and experimentation carried on in order to determine the nature or principles of
what is being studied; the systematized knowledge of nature and the physical world.” So, science seeks
to understand the physical world. Therefore, questions about or dependent upon supernatural
phenomena (by definition), or questions regarding ethics or morality, are outside the domain of science.
Likewise, Webster’s definition specifies a method of inquiry – observation, study, and experimentation.
You may know this process as ‘the scientific method.’
In fact, there is no single, monolithic ‘scientific method.’ There are several ways to gain
information about the physical world through systematized observation, study, and experimentation.
For some questions, observation alone may be sufficient. These are called ‘descriptive studies.’ For
instance, to answer the question “do humans have tails?” you just have to look – there is no true
‘experiment’ that is conducted. Of course, the validity of the conclusion is still dependent on the rigor
and completeness of the investigation, and rigorous observations may lead you to modify your question
to be more specific. If you simply use common sense and your own biased impressions (i.e., without
applying systematic observation), you might say “of course not – humans do not have tails.” That is
why common sense is not science – it is neither rigorous nor specific. (For instance, common sense
led humans to believe that the Earth was flat, stationary, and the center of the solar system.) If you
observed humans in a systematic, unbiased way, you would find that almost all humans are born
without external tails. However, there are a few individuals who are, indeed, born with a small external
tail. And, barring other rare mutations, human embryos have post-anal tails that develop into internal
tailbones. So, observations of humans at different stages of development, and observations of internal
anatomy, might lead you to answer this simple question about the physical world in a rather specific
manner couched in qualifiers: “Like all Chordates, human embryos typically develop a post-anal tail.
Typically, this develops into post-anal coccygeal vertebrae. At birth, these tail bones usually do not
protrude from the body as an external tail, except in very rare cases.” Therefore, many facts in science
are ‘just’ observations; but they are observations taken in an unbiased, systematic manner. You might
think humans do not have tails, but that is because your nonscientific impressions were based on
biased considerations of only the external anatomy of most postnatal humans. In addition, external tails
are often surgically removed shortly after birth, so observations of older children and adults give a biased
impression of the frequency of this condition at birth.
Our definition then lists study as a scientific method of inquiry. This is the least specific of all,
and demands further elaboration. In science, we equate ‘study’ with logical, rational thought based on
the principle of natural causality. This means that when we observe a phenomenon in the physical
world and we decide to try to explain it scientifically, we seek a rational explanation for that occurrence
as a function of natural causes. We do this for two reasons. First, we have no reliable, repeatable
method for inquiries based on supernatural explanations, and no way to distinguish between alternate
supernatural ‘hypotheses’ with evidence from the physical world. (Did Yahweh make it rain? Zeus?

Vishnu? Odin? There is no evidence from the physical world that can distinguish one supernatural
hypothesis from the other; as such, it is a moot issue scientifically). A second reason for seeking a
rational explanation is that describing the natural causality of events can potentially give us the ability
to control those events by inducing these cause-effect relationships ourselves. In other words, science
WORKS…. It is a useful, predictive tool for understanding and manipulating the physical universe.
Through science, for example, humans have come to understand electricity. We can control it and
cause electrical impulses to pass through thin pieces of metal, causing them to glow. Although many
questions lie outside the domain of science, science is the most appropriate and reliable tool to gain a
correct understanding of cause-effect relationships about the physical universe. The first step in
assessing causality is to describe relationships between variables; ‘when ‘X’ happens, it seems that ‘Y’
then occurs’. This is a correlational pattern – the occurrence of one event correlates with the
occurrence of another. However, this DOES NOT define causality… event ‘Z’ could cause both ‘X’ and
‘Y’ and that could be why X and Y co-occur.
Also, it is important to realize that quantification is typical of a scientific approach. In science,
we often represent what we observe in mathematical terms because these can be relatively precise
and less sensitive to subjective interpretation than other languages. Although scientific facts may also
be described in prose (‘narrative data’), or as relative, semi-quantitative categorical terms (‘high,
medium, low’), eventually a more precise study will demand quantitative data. Biology is a science,
and you must be comfortable with mathematics to do it successfully.
The final method mentioned in the definition is experimentation. The goal of an experiment is
to create a situation in which a potential causal link between two correlated variables can be tested. To
do this, other variables must be controlled, randomized, or measured so that their potential effects can
be nullified or described. There are several types of experiments, too; from laboratory experiments to
field experiments, to post-dictive experiments. Laboratory experiments provide the greatest control of
the experimental environment, and so give the greatest sensitivity for observing a causal relationship
between two variables. However, the goal of science is to describe how the physical (i.e. ‘natural’)
world operates, and the laboratory environment may be so unnatural that what you observe is a
function of this unnatural environment itself, and is not what actually happens in the ‘real world.’ Such
results are called laboratory artifacts. Suppose you are studying the ecological interactions between
spiders and lizards on islands in the Caribbean, and you systematically observe that there are few
spiders on islands that have many lizards. You hypothesize that the lizards are eating the spiders and
reducing spider abundance. You bring the system into the laboratory and you conduct an experiment
demonstrating that lizards do eat the spiders. But, maybe the lizards eat the spiders in the laboratory
because there is nothing else to eat. On the islands, maybe lizards rarely eat the spiders; rather, they
out-compete spiders for flies. Lots of lizards mean fewer flies for spiders to eat, and spider populations
decline. Some good field observations could have prevented this premature lab experiment.
Field experiments are more realistic, but the investigator cannot control (or even measure) all of
the variables that are fluctuating in the environment. As such, it may be difficult to ascertain exactly
which variable is responsible for the response you measure. Finally, post-dictive experiments allow us
to test hypotheses about the past. The typical post-dictive hypothesis is: if ‘X’ happened in the past,
then ‘Y’ should be present now. (It is best if you have not observed ‘Y’ yet… otherwise you may
unconsciously be trying to explain ‘Y’ rather than testing the effects of ‘X.’) For instance, the ‘Big Bang’
hypothesis predicted that there should still be a radiational ‘echo’ of the universal explosion, and that it
should be about 2-3 K (kelvins above absolute zero temperature, on the Kelvin scale). Two scientists
at Bell Labs, unaware of this prediction, kept finding ‘background radiation’ of this magnitude, no matter
which direction that they placed their antenna. They had confirmed a prediction of the Big Bang
hypothesis…and with subsequent confirmations of other predictions, this idea was elevated to the
status of a Theory – an explanatory model of how the physical world operates that has been
tested and supported by numerous, independent experiments. Evolutionary Theory is another
example. In 1859, Darwin suggested that species alive today are descended from common ancestors
that lived in the past. Essentially, all of life is one big family tree, with some species closely related (like
siblings in a nuclear family that share the same parents), and others more distantly related (like second

cousins that share great-grandparents). Based on anatomical similarities, Darwin hypothesized that
humans were more closely related to other primates than to cats, birds, or fish. This hypothesis of
relatedness was tested and confirmed in the 1970’s by looking directly at DNA similarity among existing
species.
All true experiments – whether laboratory, field, or post-dictive – involve the rigorous
identification of variables. Today’s first activity will give you some practice identifying and categorizing
variables. Then, you will create and conduct experiments with the response of crickets to light. You
will analyze your data statistically (employing quantification) and draw conclusions based on an
understanding of probability. Throughout the work, you should also gain an appreciation for the fact
that scientific investigations are iterative (you have to keep testing and modifying your hypothesis until it
explains ALL of the related data - a single experiment is not sufficient), cooperative (since no one
person can do ALL of the relevant experiments related to a hypothesis, the sharing of data among
investigators is crucial), and tentative (you will never know for SURE that your hypothesis is absolutely
true since it can never be tested in every conceivable way that controls or excludes every other
variable).
ACTIVITY A: VARIOUS VARIABLES
Introduce yourself to the person sitting next to you – you will be lab partners for today. Read the
following description of a hypothetical experiment conducted to determine the value of a new anti-
cancer drug. Then, with your partner, answer the questions that follow. If directed to do so, review
these answers as a class.
You and your partner are the scientific advisors to the Board of Directors of Pilco, a fledgling
pharmaceutical corporation. Pilco has invested a significant percentage of their research budget in a
1.3 million dollar grant to Dr. Marsh, a medical research scientist at Massachusetts General Hospital.
Dr. Marsh is conducting the first human trials on the value of a new drug, X136. In previous
experiments using human colorectal carcinoma (cancer) cell lines, X136 reduced the rate of malignant
cell division by 75% compared to untreated cells. Trials with mice also demonstrated an anti-cancer
effect. She is hopeful that this drug may provide a non-surgical cure for colon cancer. Pilco was
encouraged by the previous results with the drug. They know that colon cancer is one of the most
common cancers in humans, so there is a large market for a successful, non-invasive treatment. In
other words, a truly effective drug could make the company a lot of money and save many lives.
However, if the experimental results are dubious and they promote an ineffective drug, some patients
may die, the public will lose faith in the company, and their business will fail (not to mention U. S. Food
and Drug Administration liabilities). There is a lot riding on your advice. You must advise the board of
directors regarding: 1) the reliability of the results – was the experiment sound? and 2) what do the
results suggest? Should Pilco invest in the process of seeking FDA approval and the manufacture,
marketing, distribution, and sales of this drug?
The Experiment:
Dr. Marsh identifies 300 patients who are otherwise healthy but have early stages of colon
cancer. She selects the 250 patients who are between the ages of 25-55 for screening. She asks them
if they would like to be in an experiment for a new anti-cancer drug. Even though they are told that they
might not receive the drug in the experiment, 205 agree to participate. Dr. Marsh randomly selects 100
of these individuals for the experiment. She and her colleagues conduct a colonoscopy on each patient
and count the number of malignant polyps seen in a particular region of the colon. Fifty patients are
randomly assigned to the treatment group and will receive the drug in a pill they will take every day for
the next three months. The other fifty are assigned to the ‘placebo’ group, and they will receive a pill
every day that does not contain the drug. The patients do not know to which group they have been
assigned; they are ‘blind’ to the treatment. In addition, these assignments were made by Dr. Marsh’s
graduate student. As such, Dr. Marsh is also ‘blind’ to which patients are in which treatment group.
This is called a ‘double-blind’ experiment. Dr. Marsh tells them to eat their normal diet and maintain
their typical daily routine, and she meets with each patient once a month to see how they are doing. All
subjects say they have taken their pill every day.

After three months, Dr. Marsh conducts another colonoscopy on each patient and again counts
the malignant polyps. Now that the data are collected, she can look and see to which group the
patients were assigned. She tabulates the number of patients from each group that had at least 5
fewer polyps after three months, about the same number of polyps (within 5 of the original count), or at
least 5 more polyps than in the original sample. Her results look like this:

CHANGE IN POLYP NUMBER        X136 PILL    PLACEBO PILL

DECREASE OF 5 OR MORE             15             10
NO CHANGE (WITHIN 5)              25             20
INCREASE OF 5 OR MORE             10             20
Dr. Marsh conducts a statistical test (which you will do later), and determines that there is only a
5% probability that two identical groups would differ this much just by chance. Because a difference this
large would rarely arise by chance alone, she concludes that it was probably caused by something
else – such as the difference in pill types between her groups (or any other systematic difference
between her groups).
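The manual does not specify exactly which statistical test Dr. Marsh ran, so the exact p-value depends on how the test is set up. As a sketch only, here is one common choice for a table of counts like hers – a chi-squared test of independence applied to the full 3x2 table (its p-value comes out near, but not exactly at, the 5% quoted above):

```python
import math

# Dr. Marsh's counts (rows: change categories; columns: X136, placebo).
observed = [
    [15, 10],   # decrease of 5 or more
    [25, 20],   # no change (within 5)
    [10, 20],   # increase of 5 or more
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if pill type and outcome were independent,
# then the usual sum of (O - E)^2 / E over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3-1)*(2-1) = 2
# For df = 2, the chi-squared survival function has the closed form exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
# prints: chi2 = 4.889, df = 2, p = 0.087
```

The point of the sketch is the logic, not the exact number: the test asks how often chance alone would scatter 100 patients into a table at least this uneven.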
Questions:
Consider this list of variables, and answer the following questions with your lab partner.
Patient age patient sex patient ethnic group
Patient health disease status patient diet
Patient activity level type of pill change in polyp number
Dr. Marsh’s ‘bedside manner’ weather on examination days

1. What does Dr. Marsh measure, as an indicator of the effectiveness of her pill? (This is called
the dependent variable, because the value may depend upon the treatment.)

2. What variable is purposefully changed between the groups by the experimenters? (This is
called the independent variable, and it is the hypothesized ‘cause’ of changes in the
dependent variable.)

3. What variables does Dr. Marsh either hold constant or allow to vary in a narrow range? (These
are controlled variables.)

4. What variables are randomized across the experiment? Why control some variables and
randomize others?

5. What variables are uncontrolled (and unrandomized)?



6. What is the purpose of the placebo group? Why do they also receive a pill? Could the
decrease in polyp number in the placebo group be biologically interesting? How would you
test for a ‘placebo effect?’

7. What would you change about the experimental design?

8. Are you confident that the drug X136 has anti-cancer properties? If there was only a 1%
probability that similar groups would differ as much as these, would you be more confident
that the drug has an effect?

9. Suppose that, when the patients were assigned to groups, all the men were assigned to the
X136 group and all the women to the placebo group. Now, can you attribute the difference in
response solely to the effect of the drug, or do the groups now also differ systematically in
another variable? (This is called a confounding variable).

10. Why employ a blind and double-blind experiment? Explain in terms of confounding
variables.

ACTIVITY B: TESTING PATTERNS OF HABITAT SELECTION IN CRICKETS


In this exercise, you will be investigating the tendency of crickets to gather in shaded or lit habitats
by collecting quantitative data from a laboratory experiment, and then subjecting these data to a
statistical test. The goal of this activity is to learn how to use a statistical test to reach a scientific
conclusion, and to hone your critical experimental skills.
Today's experiments: the biological phenomenon of taxis
Typically, animals select habitats that provide food, shelter, and/or mates, or place them in
environmental conditions that they can tolerate. Important environmental variables include
temperature, humidity, light, wind, sound, disturbance, etc. Obviously, animals usually select habitats
based on more than one variable… but which ones? Some variables may be more important to some
species than others. Controlled laboratory experiments allow scientists to determine whether
organisms respond to particular stimuli. Orientation in response to a stimulus is called taxis.
Furthermore, taxis can be characterized by the type of environmental stimulus guiding the orientation
and whether the orientation is toward (positive taxis) or away from (negative taxis) the stimulus. For
example, the grayling butterfly (Eumenis semele) flies upward toward the sun when confronted by a
predator. The behavior in this case is termed positive phototaxis, a positive orientation towards light.
Similarly, adult female crickets tend to turn and move toward a speaker playing the recorded chirps of a
male cricket of the same species (positive phonotaxis). Conversely, cockroaches typically avoid lighted
areas of their environment, a type of negative phototaxis. The wood-boring larvae of many species of
beetles change direction when exposed to simulated gnawing sounds in front of them (negative
phonotaxis). This keeps them away from potential competitors, and creates the irregular ‘galleries’ of
burrows in trees that are heavily infested.
In this experiment, you will study the taxis response of crickets to light. Crickets are nocturnal, and
they are fairly skittish creatures. Why? Is their nocturnal behavior a function of an aversion to light, or
are they nocturnal because that is when their food is available, or when their predators are least active?
function of these issues, you might become interested in whether crickets respond to differences in
their light environment. (You might not be interested, but you will play along, won’t you?) Based on
your casual observations, you probably have a preconceived opinion on the matter – this is your
‘working hypothesis.’ Based on the fact that crickets are nocturnal and skittish, your working
hypothesis is probably that ‘crickets congregate in shaded habitats.’ To test that general idea, you are
going to conduct a specific, restricted test. Therefore, you are going from a general principle to a
specific circumstance – this is deductive logic. ‘If, in general, crickets congregate in shade, then they
will do so in my specific lab experiment when they are given a choice between shaded and lit habitats.’
EXPERIMENTAL PROTOCOL
The first experiment offers crickets lit and shaded environments in a light-controlled room. You will
record the number of individuals that select each type of environment.
As you move through this experiment, think about how your procedures and the ‘choices’ you
make could affect the behavior of the crickets. At the end of the experiment, your goal is to reach a
conclusion about how crickets behave in the restricted conditions of your specific experiment.
However, your conclusion will be more valuable if you can legitimately apply this conclusion to cricket
behavior in general. You want to be able to apply inductive logic (compare with deductive logic
mentioned earlier) at the end of your experiment, and apply your conclusion about these specific
crickets in this specific situation to the behavior of all crickets, in general. In order to legitimately
generalize, the specific conditions of your experiment can’t be too unnatural or unique. Otherwise, the
patterns you observe really ARE specific to your experimental conditions and have no predictive or
explanatory power outside of these particular conditions. As you go through this experiment, think
about the choices you are making, either purposefully or not, and think about whether these choices
could be limiting the generality of your conclusion. After this experiment is over, you will design a new
hypothesis and will conduct another experiment. That is, you will have the chance to improve upon this
experiment and ask a more refined question that has greater explanatory power.

Procedure:
1. Examine your arena. The top is clear plastic and can slide out. There is also an entry port on one
side, with a stopper. Also note a black sleeve, and a wooden rod that fits in the sleeve. There are
three slots in the back of the arena so that you can divide the arena in half or into thirds, and you
have two metal dividers that fit these slots. You will insert the dividers at the end of a run so that
you can count crickets at each end of the maze without them moving back and forth. (Here are
some of the choices you have made so far: this arena… its size, color, composition, etc.; this room
and all of the environmental conditions in it.)
2. Position the arena on the table, under a light source. Cover the half of the arena closest to the light
– this creates a shady region under this cover. Align the arena and cover so that half of the arena
is shady and half is lit. Put a piece of tape on the table to note this position. (Choices made: this
type of light: its quality and quantity. This type of shaded spot: its size, position, and amount of
reflected light.)
3. Pick up the arena, open one end, and shake in the crickets. Be quick, and do not bother to count
the crickets entering the box! Grab any stragglers or escapees and add them through the ‘cork-hole’
in the front of the arena. Give the arena a sideways shake to randomize the position of the crickets
in the box. Quickly place the arena in position, with the cover in position, too. Note the time.
(Choices made: Dumping in the crickets through the top of the box; shaking the box; using multiple
crickets at once.)
4. You will terminate the experiment after five minutes by sliding a divider into the center slot. In
the meantime, observe your study organisms. Note that males have two terminal ‘spines’ coming
off their abdomen – these are cerci. Females have shorter cerci, but a very long spine between
them. This is the ovipositor – an egg-laying device. (Choices made: five-minute interval; both sexes
present in box at the same time.)
5. After five minutes, slide the divider into position and count the number of crickets in each end of the
maze. Place your data in the results table on the next page:
6. What do you conclude? How sure are you that this pattern was not just ‘dumb luck?’ In other
words, how likely is it that crickets DO NOT typically gather in one area over the other, but just by
chance ended up sorting themselves as they did? Knowing this particular probability is very
important. If you conclude that crickets move to one area over the other, but in fact this pattern is
due to chance, then you are wrong. This type of error can have grave consequences. Think about
Dr. Marsh and the Pilco drug company. Suppose they claim that the difference between the
treatment group and the placebo group was caused by their treatment – in other words, they
conclude that the drug works. However, suppose this difference was due to chance. They are
WRONG. The drug doesn’t work, their reputations will be shot, and they will be sued. Hmmm….
Knowing the probability that chance could be responsible for your pattern becomes pretty important!
They should be conservative; they should only say the drug works if the probability that chance
caused the pattern is very low (< 1%). This is the function of most statistical tests: determining the
probability that the pattern you observe could be caused by chance. Also, think about the
physicians who read Dr. Marsh’s report in the Journal of the American Medical Association, and
then have to decide whether to prescribe this drug or not, based on their interpretation of her
results. These physicians need to know something about statistics and probability, too!

‘Working’ Hypothesis: Crickets congregate in shaded habitats

Methods Summary: Crickets were added to an arena and the arena was gently shaken to
randomize their position. The arena was placed beneath a light source and partially covered,
creating a shaded region and a well-lit region within the arena. After five minutes, the number of
male and female crickets at each end of the maze was determined. The frequency distribution of
total crickets in shaded and lit regions was compared to a 1:1 distribution that we would expect if
there were no tendency for congregation in one of the regions, using a Chi-squared Goodness-of-
Fit test.

‘Testable’ Hypothesis for which an expected distribution can be calculated:


Location of crickets is unaffected by shaded or lit regions of the arena after disturbance.

Results: Treatment # of Males # of Females Total


Lit
Shaded

8. Read ‘The Researcher’s Toolkit #1: Comparing Observed and Theoretical Distributions
Using the Chi-squared Goodness-of-Fit Test’ on page 87. Then, fill in the table below.
Calculate your Chi-squared value, and compare it with the critical values in the toolkit.

Expected Distribution under this testable hypothesis: If there were no preference, you
would expect equal numbers in the shaded and lit regions (for example, with 20 crickets, 10 in
each region). So, to calculate your expected values, divide your total sample size by 2. Enter
your expected and observed values in the table below.
Table comparing observed and expected distributions

              Obs.   Exp.   O-E   (O - E)2   (O – E)2/E
Shaded
Lit
Σ (sum)        -      -                      X2 =

Calculated Experimental Chi-squared value (X2) = ______

Critical Value from Chi-squared table (p = 0.05) = _______

How likely is it that any differences occurred by chance? (> 5% or < 5%) _______

Conclusion:

Variables:
Dependent:

Independent:

Controlled:

Uncontrolled:

Confounding:
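Once your counts are in, the chi-squared arithmetic from the toolkit takes only a few lines. The sketch below uses made-up counts (14 shaded, 6 lit), purely for illustration – substitute your own results:

```python
# Hypothetical counts for illustration only -- substitute your own results.
shaded, lit = 14, 6          # observed crickets in each region
total = shaded + lit
expected = total / 2         # 1:1 expectation under the testable hypothesis

# Sum of (O - E)^2 / E over the two categories.
chi2 = (shaded - expected) ** 2 / expected + (lit - expected) ** 2 / expected

critical_value = 3.841       # chi-squared table, df = 1, p = 0.05
print(f"chi2 = {chi2:.3f}")
if chi2 > critical_value:
    print("Reject the testable hypothesis: the distribution differs from 1:1 (p < 0.05).")
else:
    print("Fail to reject: a 1:1 distribution could produce this result by chance (p > 0.05).")
```

With these hypothetical counts the statistic is 3.2, just below the 5% critical value of 3.841, so the 14:6 split would not be judged significant.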

The Researcher’s Toolkit


#1: Comparing Observed and Theoretical Distributions
Using the Chi-squared Goodness-of-Fit Test
Suppose you perform a study in which the collected data are not continuous measurements
(length, mass, time, etc.), but are categorical outcome counts (how many individuals have this or that
attribute, or how many times an event happened). The data could be in the form of male/female, or
blue/green/brown eyes. Suppose you want to compare the distribution of these outcomes with a
theoretical distribution. The Chi-squared goodness-of-fit test determines the probability that the
difference between the observed and theoretical (expected) distribution is due to chance.
Suppose you want to know if a coin is balanced. You state a hypothesis for which you can
envision a particular theoretical result. In this case, the hypothesis is “the coin is balanced,” and you
envision that in a sample of flips, the coin should theoretically give equal numbers of heads and tails.
You conduct an experiment to test your hypothesis. You flip the coin 21 times. It lands heads 13
times and tails 8 times. Well, what do you conclude? You know that even a perfectly balanced coin will
not yield exactly equal heads and tails in every sample; there is chance deviation, especially in a small
sample (21 observations). It would be useful to know how likely it is that a truly balanced coin would
yield a 13:8 ratio. If you knew that a 13:8 ratio was a very unlikely event for a perfectly balanced coin,
then you’d be confident that your coin was not balanced. However, if a 13:8 split is a rather common
occurrence for a perfectly balanced coin, then you wouldn’t want to claim that your coin was
unbalanced. The Chi-square Goodness-of-Fit test determines the probability that observed results
could be produced by chance when a particular theoretical distribution is expected.
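Before working through the formula, it can help to estimate this chance by brute force. The following sketch (not part of the manual's procedure) draws many samples of 21 flips from a perfectly balanced coin and counts how often the split is at least as lopsided as 13:8:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

trials = 100_000
lopsided = 0
for _ in range(trials):
    # Flip a fair coin 21 times and count heads.
    heads = sum(random.random() < 0.5 for _ in range(21))
    # A split at least as uneven as 13:8, in either direction.
    if heads >= 13 or heads <= 8:
        lopsided += 1

print(f"About {100 * lopsided / trials:.0f}% of fair-coin samples are at least this lopsided.")
```

The simulated fraction lands in the high 30s of percent, which agrees with the 10-50% range the Chi-squared table gives below: a 13:8 split is quite ordinary for a fair coin.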
If the coin is balanced, our theoretical expectation for the outcome of this sample would be 10.5
heads and 10.5 tails. So, how close are our observed results to this expectation? By subtraction, we
can measure the difference between each outcome value and the one we would expect if the coin was
balanced. Then, we can sum those differences to get a total ‘index’ of the difference between our
observed and expected results.
The table below shows our data (‘Obs.’ = observed), our expectations (‘Exp.’) under the hypothesis of
a perfectly balanced coin, and the difference (Obs minus Exp) between them:
Outcome Category   Obs.   Exp.   O - E
Heads               13    10.5    2.5
Tails                8    10.5   -2.5
(sum)               21    21      0
However, if we sum these differences to get a total measure of the difference between our observed
and expected results, it always sums to zero. We can overcome this hurdle if we square these
differences to make them positive, (O – E)2, and then add them:
Outcome Category   Obs.   Exp.   O-E   (O - E)2
Heads               13    10.5    2.5    6.25
Tails                8    10.5   -2.5    6.25
(sum)               21    21            12.50
What would the value of 12.5 mean? Obviously, as this total gets larger, the observed results are
less similar to our expected results (and less supportive of the hypothesis of ‘even balance’ from which
our expected results were generated). But, this total of ’12.5’ has to be evaluated in the context of the
sample size of the experiment. We need to include some weighting procedure that weights each

squared difference by some index of sample size. Statisticians divide the squared difference by the
expected value to correct for sample size. So, our value, standardized for sample size, is:
Outcome Category   Obs.   Exp.   O-E   (O - E)2   (O – E)2/E
Heads               13    10.5    2.5    6.25     6.25/10.5 ≈ 0.595
Tails                8    10.5   -2.5    6.25     6.25/10.5 ≈ 0.595
(sum)               21    21                      1.19
But what does 1.19 mean? This is our Chi-squared (X2) value, and statisticians have determined
how likely a value of at least 1.19 is, just by chance. In fact, they have determined how likely every X2
value is, just by chance, and have tabulated particular values in a Chi-squared table:
                                  Probability
# Categories   "df"     0.5      0.1      0.05     0.01     0.001
     2           1     0.455    2.706    3.841    6.635    10.828
     3           2     1.386    4.605    5.991    9.210    13.816
     4           3     2.366    6.251    7.815   11.345    16.266
Common ‘critical’ values for evaluating tests with two outcome categories (‘heads’ and ‘tails’) are
3.841, which will occur by chance 5% of the time, and 6.635, which will occur by chance only 1% of the
time. (REMEMBER: the larger the chi-square value, the greater the difference between observed and
expected values and the LESS LIKELY it is that random chance could produce a difference this large.)
In scientific experiments, it is common to use either the 5% or 1% criteria for evaluating hypotheses.
Our value is 1.19. In the table, using the two-outcome-category row (df = 1), find where 1.19 would
fall. Now read the column headings: a value at least this large would occur between 10% and 50% of
the time just by chance. So, a perfectly balanced coin would yield a split at least as uneven as our
13:8 result 10-50% of the time. Our conclusion is that this is a pretty common result for a perfectly
balanced coin. So, this study has failed to show that the coin is unbalanced. If we had done a larger
study with a greater number of tosses, we may have found that the coin is indeed slightly unbalanced,
but our study has failed to show this. We have not proven the coin is balanced; we have simply failed
to show that it is unbalanced.
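The whole calculation above can be reproduced in a few lines of Python. This is a sketch of the coin-toss example only, not a general-purpose routine:

```python
# Chi-squared Goodness-of-Fit for the coin-toss example:
# 13 heads and 8 tails observed in 21 tosses; 10.5 of each expected.
observed = [13, 8]
expected = [10.5, 10.5]

# Sum the squared differences, each weighted by its expected value.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_squared, 2))  # 1.19

# Critical value for two outcome categories (df = 1) at p = 0.05, from the table.
critical_value = 3.841
print(chi_squared > critical_value)  # False: fail to reject 'balanced coin'
```

Because the calculated value falls below the critical value, the code reaches the same conclusion as the worked example: the result is common for a balanced coin.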
There are a few things you need to keep in mind about the Chi-squared Goodness-of-Fit test:
1) You can compare two or more observed distributions to one another, to test for differences
among various observed distributions. (This is called the Chi-squared Test of Independence,
and is described in a later ‘Researcher’s Toolkit’).
2) Your data must be count (integer) data. If the values in your observed outcome column
contain percentages, fractions, or decimal points, you cannot use the Chi-squared
Goodness-of-Fit test. The expected values, however, may have decimal points.
3) The Chi-squared Goodness-of-Fit test may indicate there are differences between the observed
and expected distributions, but it may not indicate which observed values are significantly
different from which expected values. For instance, if you have three categories (blue, green,
brown), and conclude their distribution does not fit a particular theoretical one, you don’t know which
color outcome(s) gave rise to the discrepancy. All you know is that a discrepancy exists
somewhere.
The Researcher’s Toolkit
#2: Developing a Testable Scientific Hypothesis
A scientific hypothesis is a tentative, testable statement of description, relationship, or causality in
the physical world. ‘Testable’ means that you can gather evidence from the physical world that will
either support or refute this statement. So, ‘testable’ means ‘falsifiable’ – you can envision data that
would show your hypothesis to be false (as well as data that would support it). One un-testable
hypothesis is: ‘There is an invisible, mass-less Leprechaun standing on your shoulder.’ It might be
true; in fact, the good luck that you’ve had in your life is supporting evidence for his presence.
However, how would you disprove his existence with evidence from the physical world? You can’t see
him, and his feet leave no impression on your shirt (he has no mass). So, the issue can’t be
addressed with science. Of course, neither can any hypothesis dependent on this one. If you find a
$10 bill and I state that ‘your Leprechaun caused you to find the $10’, then this hypothesis is also un-
testable because it assumes the previous hypothesis (that the Leprechaun exists) is true. But it
couldn’t be tested, remember? In the realm of biology, ‘Scientific Creationism’ and ‘Intelligent Design
Theory’ are not science simply because they explain events as the consequences of un-testable,
supernatural agents.
The hypothesis should be a one sentence statement of ‘fact’. This fact can be descriptive
(“Humans have tails.”), correlational (“As height increases, weight increases.”), or causal (“The HOX-2
gene causes tail-bud formation.”). If you cannot put your hypothesis into one sentence, it may need
rethinking. Often, a long-winded hypothesis is actually several hypotheses mistakenly rolled into one; it
is always easier to test just one hypothesis at a time. Making the hypothesis into a statement as if it
were a fact will help you to see how further experiments could help support or contradict the
hypothesis. This can help sharpen your logic and assist you in seeking and eliminating hidden
assumptions in your arguments.
The hypothesis should explain more than just the initial observations that led to it. If you noted that
a certain insect species consistently moved away from the direction of the sun, the hypothesis “This
insect species moves away from the direction of the sun” may actually be a good hypothesis in another
situation to explain other phenomena. Here, however, it merely restates what you already know. Thus,
a good hypothesis often explains the observations in terms more fundamental than those in which the
observations were made. In this case, such a hypothesis may be “This insect species orients away
from light sources.” This is not simply restating the observations, since the insect could conceivably be
moving away from the sun’s heat, or it could just be following its shadow. This more fundamental
hypothesis of the mechanism underlying the observed phenomenon not only helps explain the
observation, but could help explain other related aspects of the insect's behavior. For example, it could
explain why the insect tends to burrow when an artificial light is placed directly overhead, or why it stays
under the forest floor litter, even on overcast days. Indeed, it is the capacity of tested hypotheses to
predict and explain as yet unobserved phenomena that is a power of science.
The hypothesis should be focused, both in the terms that it uses, and in the problem it seeks to
explain. Unfocused, vague hypotheses require numerous experiments to be supported, and often use
poorly defined terms or immeasurable variables. Not only are they shots in the dark, they are shots at
imaginary targets. Such hypotheses often have hidden assumptions, and may be based in turn upon
hypotheses the researcher has not even considered. In some respects, a hypothesis is a speculation,
and you always put yourself on shaky ground when you build one speculation on top of another. Try to
keep your hypothesis just one logical step away from the observations you actually made. Remember,
most scientific progress is made one small step at a time.
The Researcher’s Toolkit
#3: Compiling Data with Tables
A table is simply a presentation of numbers in an organized form. During an experiment, it is
often more convenient to enter the data, as you collect them, into a ready-made table than to write
them scattered down or across the page(s) of your notebook. Before you collect the data, construct a
table. Not only will this make it easier for you to see when to collect the next data point, but also to
make sure at the end of the study that you have not missed anything. In addition, making a table
beforehand ensures that you know exactly what you are going to measure and when or where you will
measure it.
Tables are used both for initial collection of data and for analysis of already-collected data. For
collecting data, each variable should be represented by a different column. Although this requires
much repetition in each column, it also unambiguously identifies each data point. Each data point is a
row in the table, explicitly identified by the unique entries in each descriptive variable column. Also,
when you analyze data with a computer statistical program, the program will require that the data take
the form of a table. So, for collecting data on insect location, you might have a data sheet that looks
like this:
Position       Elapsed Time (min.)   # Males   # Females   Total
Lighted area            5
Middle area             5
Shaded area             5
Make sure you include the units of measurement in your table. They need only be noted once at
the top of the appropriate columns in the table. (‘min.’ in the second column, above, symbolizes
‘minutes.’) It may be obvious to you now that time was measured in minutes, but how about in 3 years
when you return to these data? Would someone reading your table be able to guess this unit if it were
not explicitly indicated? Better that your information is overly specific, rather than too vague.
Choose the units of measurement so that they can be conveniently written and read. For example,
if you were measuring the mass of the crickets, do not use units of kilograms when grams or milligrams
would be more appropriate. Thus, write 2.3 g instead of 0.0023 kg, and write 18.7 ml instead of 18,700
µl. A long string of zeros is easy to misread or miswrite.
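The collecting-style layout described above (one row per data point, one column per variable) is exactly how statistical software expects data to arrive. A minimal sketch in Python; the field names and insect counts here are made up, mirroring the data sheet above:

```python
# One record per observation, one field per variable; names and counts
# are illustrative only, following the insect-location data sheet.
observations = [
    {"position": "Lighted area", "elapsed_min": 5, "males": 3, "females": 2},
    {"position": "Middle area",  "elapsed_min": 5, "males": 1, "females": 1},
    {"position": "Shaded area",  "elapsed_min": 5, "males": 6, "females": 8},
]

# A 'Total' column is derived from the others, so compute it when needed
# rather than recording it by hand.
for row in observations:
    row["total"] = row["males"] + row["females"]

print([row["total"] for row in observations])  # [5, 2, 14]
```

Because every row carries its own descriptive values, no data point is ambiguous, and derived columns can always be recomputed.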
For analyzing data, sometimes a particular table structure is necessary. For instance, you need to
set your table in a particular way in order to conduct the Chi-squared Goodness-of-Fit test:
Outcome category   Obs.   Exp.   O - E   (O - E)2   (O - E)2/E
Shaded
Lit
(sum)                              -        -       X2 =
However, YOU SHOULD NOT USE SUCH A TABLE TO COLLECT DATA. Doing so will cause the
loss of information that could be important later, such as the time interval and the numbers of males
and females. So, remember that there are tables for collecting data, analyzing data, and reporting data
(that’s next).
When reporting data in a formal paper, you may need to restructure the table to reduce some of
that redundancy. For instance, with the insect location data above you would not need to include a
column for ‘Elapsed Time’ in the table. Rather, you could refer to the elapsed time in the title of the
table. It is accepted practice to include a title for the table and place it above the actual table. The title
should be clear enough that the table is generally understandable without having to refer to the text of
the paper itself. But the text of the paper must include mention of the table, directing the reader to it.
Give some thought about how to arrange the data in the table in the most concise and readable
manner. A well designed table can actually help you to see relationships in the data. This is especially
true when large amounts of data are reported. Rarely are raw data reported. Rather, tables of means,
standard deviations, or statistical results are often reported because these directly show the patterns in
the data.
Here are examples of a poor table and a good table, using the same hypothetical data.
male hi 0.053
male lo 0.034
female lo 0.018
female hi 0.026
female lo 0.017
male hi 0.042
male lo 0.027
female hi 0.036
Table 1. Effect of growth medium on whole body mass of
male and female Drosophila melanogaster larvae at day 7.

                       Body mass (mg)
Sex      Low protein medium   High protein medium
Female           18                   26
                 17                   36
Male             34                   53
                 27                   42
The Researcher’s Toolkit
#4 Measuring Small Distances with a Compound
Microscope
Note: The following instructions are for the Meiji Techno ML2000 microscope. Although the
basic parts and procedures are similar for many other compound microscopes, the availability
and locations of the controls may vary.
Always follow these precautions
 Carry the microscope upright. Grasp the arm of
the microscope in one hand and support its
base with the other. The eyepiece (ocular) can
fall out of the microscope if the microscope is
tilted.
 Place the microscope away from the edge of the
table.
 Use only lens paper (not paper towels or tissue
paper) for cleaning the microscope lenses.
 Rotate the low-power objective into the viewing
position before removing a slide from the
microscope and when you are finished with the
microscope.
 If you leave a microscope out after you use it, place a microscope cover over it.
 When viewing a specimen, always start with the lowest power objective and use the coarse
adjustment to bring it into focus. After first focusing at the lowest power, you can then change to
progressively higher power objectives, using only the fine adjustment for focusing at each subsequent
step.
Initial focusing:
1. Position the microscope on the table for comfortable viewing.
2. Lower the stage and put the lowest power objective in place.
3. Watch from the side as you raise the stage with the coarse focus control, until the objective is
almost touching the slide or the stage is at its highest position.
4. Look through the eyepieces and adjust the light intensity to a comfortable viewing level. While
looking through the eyepiece, use the coarse adjustment to move the stage slowly downward.
When the specimen becomes visible and is approximately in focus, turn the fine adjustment slowly
to more precisely sharpen the focus.
5. Slide the eyepieces apart or together so that you can see clearly through both eyepieces at the
same time. Adjust the fine focus while looking just through the left eyepiece. Now look just through
the right eyepiece and rotate the right eyepiece adjustment until its image is also focused.
Optimizing the illumination:
6. Adjusting the diaphragm: Look through the ocular while you rotate the diaphragm adjustment to
reduce the amount of light reaching the specimen and hence to reduce glare. If you turn it down too
far, you will start to see halos around things. Adjust the diaphragm to give the best contrast; not too
dark, not too washed out. The diaphragm and light control need to be readjusted for each objective
you use.
Calibrating the ocular micrometer:
1. Arrange the stage micrometer for viewing under low power, and move the slide so that the scale of
the ocular micrometer is seen superimposed on the scale of the stage micrometer. The actual
distance between the smallest adjacent lines of your stage micrometer is 0.01 mm (or 10 microns).
Its entire length is 1.00 mm. Align the left ends of the stage micrometer ruler and the ocular
micrometer ruler (see figure).
2. Using the 4X objective, 30 units on the ocular micrometer corresponds to what length on the stage
micrometer ruler? To what actual distance (using the 4X objective) does one unit on the ocular
micrometer correspond? For example, if 30 ocular micrometer units span a distance of
0.6 mm on the stage micrometer (as shown in the figure below), then each ocular micrometer unit
equals 0.6 mm/30, or 0.02 mm. A more convenient unit for microscopic distances is the micron
(µm), which is 1/1000 of a mm. Thus, in this example each ocular micrometer unit would
correspond to 20 µm. Record this calibration factor (in this example, it would be 20 µm per
micrometer unit).
3. Repeat this type of calibration for the 10X and the 40X objectives.
4. Convert ocular micrometer units to microns by multiplying by the appropriate calibration factor you
determined in steps 2-3. For example, if the calibration factor of the 40X objective were 3.0 µm per
micrometer unit, and the object viewed were measured to be 4 1/2 ocular micrometer units in
diameter, the object diameter would be 4.5 X 3.0 = 13.5 µm.
[Figure: ocular micrometer scale superimposed on the stage micrometer scale, left ends aligned]
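The calibration arithmetic in steps 2-4 can be sketched as a small helper function; the function name and example values are illustrative, taken from the worked examples above:

```python
def calibration_factor(stage_distance_mm, ocular_units):
    """Microns per ocular-micrometer unit for one objective (1 mm = 1000 um)."""
    return stage_distance_mm * 1000 / ocular_units

# Step 2 example: 30 ocular units span 0.6 mm on the stage micrometer.
factor_4x = calibration_factor(0.6, 30)
print(factor_4x)  # 20.0 um per ocular unit

# Step 4 example: an object measuring 4.5 ocular units under an objective
# calibrated at 3.0 um per unit.
print(4.5 * 3.0)  # 13.5 um
```

Each objective gets its own calibration factor, so in practice you would call the function once per objective (4X, 10X, 40X) with that objective's measured span.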
The Researcher’s Toolkit
#5 Measuring Volumes with a Pipettor
Although each type of pipettor is slightly different, all have the same basic adjustments and operating
procedure.
Always follow these precautions when using a pipettor.
 Make sure a plastic disposable tip is on the pipettor before you dip it into a solution.
 Do not pipette volatile solvents (e.g., ether, chloroform), corrosive acids (e.g., concentrated HCl) or
strong volatile bases (e.g., concentrated ammonia) with a pipettor. Such solutions give off vapors
that can damage the interior of the pipettor. Use a glass pipet and pipet bulb to measure out such
solutions.
 Do not attempt to pipet a volume that is greater than the printed maximum of the pipettor or less
than the printed minimum. Attempting to achieve such settings can damage the volume control
mechanisms of the pipettor.
 If your pipettor drips, either you are pipetting a volatile material, or the pipettor tip is not on securely,
or there is a leak in the internal pipettor seal. Solve the problem or call it to the attention of the lab
supervisor before using the pipettor further.
 Draw fluids up into the pipettor tip slowly. Rapid pulling of solutions can cause the solution to splash
up into the body of the pipettor.
 Always keep the pipettor in a vertical position with the tip down when it contains fluid.
 When in doubt, see if your pipettor is delivering accurately by pipetting a volume of water into a
small container on a top-loading balance. One ml of water equals one gram.
Using the pipettor:
1. Securely place an appropriately sized disposable pipettor tip on the pipettor. Most pipettors will only
accept a single size of pipettor tip.
2. Unlock the volume adjustment (some pipettors do not have this lock). Adjust the volume to a
volume within the stated range of the pipettor. The volume readout is typically a series of vertical
numbers, with the position of a decimal point indicated by a line or by different colors of the digits.
Become familiar with the meaning of the numbers on the pipettor you are using. Relock the volume
adjustment.
3. Depress the pipettor plunger until it reaches its first stop. Hold the pipettor vertically and put the
end of the pipettor tip into the solution. There is no need to submerge a large portion of the pipettor
tip into the solution. Let the plunger rise slowly to its uppermost position. Raise the pipettor tip out
of the solution.
4. Dispense the solution by slowly depressing the plunger to its first stop. When no more solution exits
the pipettor tip, you may still see a small drop in the end of the tip. This drop should also be
delivered by depressing the plunger down to its second (and final) stop.
5. Although pipettor tips are disposable, make sure you place them in the proper waste container,
especially if they have been used to pipet toxins, biohazards, or radioactive solutions.
The Researcher’s Toolkit
#6: Showing Quantitative Relationships with
Graphs and Charts
A table is best used to present data to the reader when the exact magnitude of the values is
critical to an understanding of the results. However, usually, the relative sizes of the values, or the
relationships among them, are their most important aspects. In these cases, visual representation of
the data in charts or graphs is best, since relationships are much more apparent in a ‘picture’ than in a
column of numbers.
A chart is a figure indicating the magnitude of a value for several conditions or groups. A chart is
best used when you want to compare several values. Examples are bar charts and pie charts. A
graph is a figure indicating the relationship between two variables. Examples are line graphs and
‘scatterplots.’
For presentation of both graphs and charts in formal written papers, it is customary to give each an
explanatory title at the bottom of the figure. This explanation should be enough to make the figure
generally understandable without having to refer to the text of the paper. Nevertheless, the text of the
paper must include mention of the figure, directing the reader to it.
GRAPHS:
1. Be sure to put axis labels on both the x and y axes, and include units in both labels: e.g., time
(minutes).
2. For data points, use filled circles if only one type of symbol is used. If multiple symbols are used (e.g.,
you are putting more than one curve on the same graph) use hollow circles, followed by filled triangles,
and finally hollow triangles if there are four curves on the graph. Avoid putting more than four curves
on a graph, since it tends to get crowded and hard to see. Do not use asterisks for data points, since
asterisks on a graph, chart, or table typically represent the results of a statistical analysis of the data.
3. Any lines connecting the data points should not obscure the shape of the data points. If you put
multiple lines on a graph, make them all solid lines, since the shapes of the data points should
differentiate between them. "Break" one line or the other, if lines cross each other. You should use a
smooth curve through a collection of points only when the curve has been generated by a theoretical
equation, which is usually not the case. To be safe, connect the data points with line segments. This
makes no assumptions about the actual shape of the curve which underlies the relationship between
the two variables. Finally, do not extrapolate (extend) a line beyond the range of the data points unless
doing so is part of your data analysis.
4. The ‘calibration’ marks and numbers on the axes should look like a ruler. Do not arbitrarily place the
marks and numbers wherever there is a data point. For example, if you have points at 2, 4, and 16,
the axis marks should still be at 0, 5, 10, 15, and 20. Always label the lowest value on each axis,
which will usually be zero, unless it is a semi-log graph (see next page). Use an appropriate unit for
your axes so that you can avoid numerous zeros (use 1 mg instead of 0.001 g). Minimize the use of
"broken axes." These are used when the values would be clustered at the top or right of a graph with
"normal" axes, and they thus avoid excess white space on the graph. If you use broken axes, make
sure they are clearly indicated, so as not to mislead the reader about the true relationship between the
variables.
5. A semi-log graph is one which has a logarithmic scale, usually as the y axis, and a "normal" linear
scale on the other axis. Such logarithmic axes are used to graphically show values that range widely
in magnitude, or that are related to one another by exponential or power relationships. Thus, a plot of
y = 10^x is a curve on a normal graph, but becomes a straight line on a semi-log graph.
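The semi-log claim can be checked numerically with nothing but the standard library: a log-scaled y axis effectively plots log10(y) on a linear scale, so for y = 10^x equal steps in x produce equal plotted steps, which is a straight line. A minimal sketch:

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [10 ** x for x in xs]            # exponential growth: 1 up to 10000

# A log-scaled y axis effectively plots log10(y) on a linear scale.
log_ys = [math.log10(y) for y in ys]

# Equal x-steps now give equal (log) y-steps: the curve plots as a line.
steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]
print(steps)  # every step is 1.0 (to within floating-point error)
```

The same data plotted on ordinary linear axes would bunch the small values against the x axis, which is exactly the wide-ranging-magnitude problem semi-log axes solve.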
BAR CHARTS:
1. A baseline is really not needed, since the base of the bars forms a visual baseline.
2. The bars should be wider than the spaces between them. Use white, black or dark gray as the
filling pattern of the bars. Avoid using bold stripes, which are distracting, or detailed patterns which are
hard to distinguish from one another.
Below are examples of good graphs and a chart.
The Researcher’s Toolkit
#7: Comparing the Means of Two Groups Using the
Student's t-Test
Whenever you gather measured values from two groups of experimental subjects or observations,
their means (averages) are likely to be different. To a researcher, the important question is whether
the difference is meaningful. Is the difference greater than what might be caused simply by chance?
The Student's t-test analyzes the difference between two means. (Fun fact: “Student” was the
pseudonym of W. S. Gosset, a brewmaster at Guinness in the early 1900s. Guinness didn’t want
competitors to realize that they were using statistics to make better beer
(http://udel.edu/~mcdonald/statttest.html).)
One factor that affects our conclusion of whether two means are different is the reliability of each
mean - how confident are we in each mean - how much error is associated with each mean? An "error"
associated with a mean is called the "standard error of the mean" or SEM. It is an estimate of the
reliability of this value. You see, your gathered numbers are only a "sample" of the whole population
you are trying to describe. For instance, suppose we examined the skulls of 16 mink. We hope that
these skulls are representative of all mink skulls. We calculate a mean based on our sample, and we
hope that this mean is a good, reliable representation of the average mink skull. How reliable is this
mean, as an indicator of the "true" average mink skull? If all minks had exactly the same sized head - if
therefore there was no variation among the measurements in our sample - then we would be confident
that the mean is an exact descriptor of this sample and an exact estimate of the true mean of the
population. However, if our skulls are very different in size, then some values will be far from the mean.
In this case, the mean is not a perfect descriptor of the entire sample. And it seems possible that, given
this variation in mink skull size, that another sample might yield a different mean. So, you should have
less confidence in the reliability of a mean computed on a variable sample. As such, you should be
less confident that two means are REALLY different from one another if the samples upon which they
are based are quite variable.
Consider these examples: In situation ‘A’, below, we are comparing two samples that have means of
4.3 and 6.2, and each sample shows very little variation (represented by the 'spread' of data values
around each mean). So, in this case, we are confident in each mean, and the difference between them
seems meaningful in relation to the low variation within the groups - the samples seem different; there
are no values that occur in both samples. In B, we compare two other samples. In this case, the
means are also 4.3 and 6.2, but there is a large amount of variation within each sample. As such, we
have less confidence in each mean. It seems that, if we sampled these populations again, we might
get means of 5.7 for both - they could be the same. So, in this case, although the difference between
the mean values is the same as in example A, we are unwilling to call this difference meaningful
because of the variation within each group. Hopefully this shows you that evaluating differences
between means depends on the difference between the means relative to the variation within the
groups.
[Figure: dot plots for situations A and B; in each, the sample means are 4.3 and 6.2, with little spread
around each mean in A and wide spread in B]
Computations:
1. So, the first step is to calculate the average value for each sample:
      Xa = ( Σ xa ) / na

(Xa = group mean; xa = each data value in sample A; na = number of data values in sample A.)
2. Calculate the Standard Error of the Mean (SEM) of each mean, which is the average difference
between the mean and the data points - describing the "spread" about the mean. Since we square
each difference (to avoid negative values), we must take the square-root of the final calculation so that
the SEM is in the same units as the mean. Repeat for SEMb, using Xb, xb, and nb.
      SEMa = √[ Σ ( xa − Xa )2 / ( na ( na − 1 ) ) ]
3. Compute a t value, which is the ratio we want; the difference between the means in relation to the
variation within the samples:
      t = | Xa − Xb | / √( SEMa2 + SEMb2 )
4. Now, we compare this calculated t value with a critical value from a statistical table, p = 0.05. If your
calculated value is greater than the critical value, then you reject the null hypothesis of equality and
accept the alternative: there is a statistically significant difference between your means.
Sum of both               Sum of both               Sum of both
group sizes  Critical t   group sizes  Critical t   group sizes  Critical t
     4          4.30          11          2.26          18          2.12
     5          3.18          12          2.23          19          2.11
     6          2.78          13          2.20          20          2.10
     7          2.57          14          2.18          25          2.07
     8          2.45          15          2.16          30          2.05
     9          2.37          16          2.14          40          2.02
    10          2.31          17          2.13          50          2.01
There are a few things to keep in mind about the Student’s t test.
Only two groups can be compared. It is not valid to compare numerous data groups to each other
by repeating the Student’s t test for each comparison. Each data group can be included in only one
Student’s t test.
The Student’s t test assumes that both data groups are normally distributed (i.e., each distribution
follows roughly a bell-shaped curve).
The Student’s t test assumes the variation of the two data groups is roughly equal.
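The four computational steps can be sketched in Python. The two samples below are invented, chosen so their means mirror situation A above (4.3 vs. 6.2); the SEM is computed in the standard sum-of-squared-deviations form:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sem(xs):
    # Standard error of the mean: spread of the sample, scaled by sample size.
    m, n = mean(xs), len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n * (n - 1)))

# Invented samples whose means mirror situation A above (4.3 vs. 6.2).
group_a = [4.1, 4.3, 4.2, 4.5, 4.4]
group_b = [6.0, 6.2, 6.3, 6.1, 6.4]

# t = difference between the means relative to variation within the samples.
t = abs(mean(group_a) - mean(group_b)) / math.sqrt(sem(group_a) ** 2 + sem(group_b) ** 2)

# Sum of both group sizes is 10, so the critical t at p = 0.05 is 2.31.
print(t > 2.31)  # True: reject the null hypothesis of equal means
```

Because these samples show very little spread around each mean (like situation A), the calculated t is large and the difference is judged statistically significant.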
The Researcher’s Toolkit
#8: Comparing Distributions with the Chi-Squared Test of Independence
There are occasions when you wish to know whether two events are independent of one another, or
whether the occurrence of one event increases or decreases the likelihood of the second
(dependence). This is a common issue in the heredity of multiple genes, where the investigator wants
to know whether the inheritance of the genes for one trait is independent of or related to the inheritance
pattern of other genes for other traits. So, consider the following results from a dihybrid test-cross
(assume complete dominance for each locus):
Parents: AaBb x aabb
Offspring: Phenotype N
AB 40
Ab 20
aB 18
ab 22
Total 100
You might think that, if the genes assort independently, the double heterozygous parent (AaBb) should
produce AB, Ab, aB, and ab gametes in a ¼:¼:¼:¼ ratio. So, you might think that you should compare
your observed results to an expected distribution of 25 AB : 25 Ab : 25 aB : 25 ab with a Goodness-of-
Fit test (RTK #1 – pg. 107). However, the prediction that the AB gametes should be produced ¼ of the
time is actually based on THREE hypotheses, not just the single hypothesis of independent assortment:
Hypothesis 1: The ‘A’ gamete should be produced ½ the time
Hypothesis 2: The ‘B’ gamete should be produced ½ the time
AND Hypothesis 3: The genes assort independently so the chance of their co-occurrence = ½ x ½ = ¼
There are three different assumptions that must be true in order for your observed results to match
these expectations. In other words, your genes might assort independently, but not match the
expected 25:25:25:25 ratio because gene A or gene B is not segregating exactly evenly (the frequency
of A or B gametes produced by the AaBb parent is not exactly ½). So, you could reject the null
hypothesis and erroneously conclude that the genes are not assorting independently, even though they
are. If you want to test for independence, then you use the Test of Independence. If you want to test
whether results fit a particular distribution (1:1:1:1, for instance), the Goodness-of-Fit test is appropriate.
Just understand the hypotheses you are testing in each case.
You really want to test whether the way the ‘A’ genes are inherited is independent of the way that the
‘B’ genes are inherited. Your first step is to set up a ‘contingency table’, in which the categories of one
variable (Types of ‘A’ phenotypes) are rows, and categories of the other variable (types of B
phenotypes) are columns. (‘Contingency’ is just another word for ‘dependency’ … whether B occurs or
not may be contingent - may depend on - whether A also occurs.) Then enter your observed data into
the table.
Contingency Table:
Categories of Variable 2
Categories of Variable 1 B Phenotype b phenotype (Row Totals)
A phenotype 40 20 60
a phenotype 18 22 40
(Column totals) 58 42 100 (Grand Total)
A quick look at the row and column totals will show you that, indeed, neither the A gene nor the B gene
was produced exactly ½ the time. But those aren’t the hypotheses we are testing here. We want to
know if the genes assort independently. So, we calculate our expected frequencies by using the
product rule.
Under the hypothesis of independent assortment:
1) the frequency of AB = f(A) x f(B) x N, where f(A) = 60/100, f(B) = 58/100, and N = 100
2) f(AB) = (60/100) x (58/100) x 100 = (60 x 58)/100 = (Row Total x Column Total)/Grand Total = 34.8
If these genes assorted independently, we should expect to see 34.8 AB offspring. This expectation is
based SOLELY on the hypothesis of independent assortment; no assumptions about the frequencies of
A or B are made, because we used the ACTUAL, OBSERVED frequencies of A and B (not a guess at
½ for each). Now, if the observed results do not fit the expectations from our hypothesis, there is only
one reason – the single hypothesis of independent assortment must be wrong.
Next, compute the other three expected ratios the same way: (RT x CT)/ GT
3) f(Ab) = (60 x 42) / 100 = 25.2
f(aB) = (40 x 58) / 100 = 23.2
f(ab) = (40 x 42) / 100 = 16.8
Then compare these calculated expected values with the observed values, using the same formulae as
the Goodness-of-Fit test.
Phenotype   Observed   Expected   (o - e)   (o - e)2 / e
AB             40        34.8        5.2        0.78
Ab             20        25.2       -5.2        1.07
aB             18        23.2       -5.2        1.17
ab             22        16.8        5.2        1.61
                                          SUM = 4.63
Next, compare your calculated value with the critical value from the table (pg. 14). For the test-of-
independence, however, the degrees of freedom are computed from the number of columns and
rows in the contingency table: df = (c – 1)(r – 1) = (2 – 1)(2 – 1) = 1.
The critical chi-square, with df = 1 and p = 0.05, = 3.84. Your calculated value is greater than the
critical value, so independently assorting genes would only produce a deviant pattern like yours less
than 5% of the time. Your observed results are very unlikely for independently assorting genes. So, the
correct conclusion, at the 95% confidence level, is to reject the hypothesis of independence.
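The whole comparison can be sketched in Python (an illustrative summary of the steps above, using the observed and expected counts already calculated):

```python
# Chi-square test of independence for the 2x2 table above.
observed = [40, 20, 18, 22]          # AB, Ab, aB, ab
expected = [34.8, 25.2, 23.2, 16.8]  # (row total x column total) / grand total

# Sum the (o - e)^2 / e terms across all four phenotype classes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 4.63

critical_value = 3.84  # chi-square table, df = (2-1)(2-1) = 1, p = 0.05
print(chi_square > critical_value)  # True: reject independent assortment
```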
The Researcher's Toolkit
#9: Detecting Correlations with Spearman’s Rank Correlation
Suppose you are interested in determining whether there is a significant relationship between
two continuous variables. As the value of one variable increases, does the other variable change in a
non-random manner (either increasing or decreasing)? To address these correlational questions, you
conduct either a correlation analysis or regression analysis. Regression analyses are used when you
believe there is a causal relationship between the variables; as the independent variable changes, it
causes a change in the dependent variable. Correlational analyses are used when no causality is
implied or known. For instance, arm length correlates with leg length, but an increase in arm length
does not cause an increase in leg length.
As in other statistical tests, you are confronted with a dilemma: either there IS a relationship
between the variables, or there is not. Which hypothesis should you test? Well, once again, it is easier
to predict what the results should be if there is NOT a relationship. (This is similar to the hypothesis
that there is NO difference between one distribution and another, or one mean and another). In that
case, as one variable increased in value, the other variable would change randomly. Thus, you will be
determining the probability that any observed trend between the two variables is due to chance alone.
Procedure:
1. Consider two variables that you have measured, such as the diameter of a tree and the
number of neighbors within a 4 meter radius. Suppose you had the following 5 data points:
Tree   Diameter (cm)   Neighbors
  1        13.4           11
  2        12.2           14
  3        24.7           10
  4         1.1           15
  5        18.9           12
2. The first step is to RANK the values for each variable from lowest (1) to highest (5).
Tree   Diameter (cm)   RANK (Diam)   Neighbors   RANK (Neigh)
  1        13.4             3            11            2
  2        12.2             2            14            4
  3        24.7             5            10            1
  4         1.1             1            15            5
  5        18.9             4            12            3
3. Now, you work JUST with these ranks. (That’s why the distribution of the real data doesn’t
matter. Even if your largest tree had been 100 cm in diameter and your data were non-normal, the tree
would still be ranked as ‘5’.) Compute the difference (d) between the ranks for each data point, and
square these differences. Then sum the squared differences. This equals 38 in the table, below.
Tree   Diameter (cm)   RANK (Diam)   Neighbors   RANK (Neigh)   Difference between ranks (d)   Squared Differences (d²)
  1        13.4             3            11            2                    1                           1
  2        12.2             2            14            4                   -2                           4
  3        24.7             5            10            1                    4                          16
  4         1.1             1            15            5                   -4                          16
  5        18.9             4            12            3                    1                           1
                                                                                              SUM = 38
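Steps 2 and 3 can be sketched in Python (a minimal illustration; the `ranks` helper is our own, and, as the comment notes, it does not handle tied values, which require average ranks):

```python
# Rank both variables, then compute and sum the squared rank differences.
diameters = [13.4, 12.2, 24.7, 1.1, 18.9]
neighbors = [11, 14, 10, 15, 12]

def ranks(values):
    # Rank from lowest (1) to highest (n); ties would need average ranks.
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# d = difference between the two ranks for each data point (each tree).
d = [r1 - r2 for r1, r2 in zip(ranks(diameters), ranks(neighbors))]
sum_d_squared = sum(x * x for x in d)
print(d)              # [1, -2, 4, -4, 1]
print(sum_d_squared)  # 38
```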
4. Now, apply the following formula to compute the Spearman correlation coefficient,
symbolized as rs:
rs = 1 – ((6 (d2)) / (n3 – n)) where n = # of data points and d = the difference between ranks.
For this example, the calculation would be:
rs = 1 – ((6 x 38) / (53 – 5)
rs = 1 – (228 / 120)
rs = 1 – (1.900) = -0.900
5. Next, compare the absolute value of your calculated correlation coefficient to the critical value at the p = 0.05 level (see table below). If the absolute value of your coefficient is greater than or equal to the critical value, then there is a statistically significant correlation between the variables at the p = 0.05 level. With n = 5, the critical value = 1.00. The absolute value of our coefficient is 0.900, so our correlation is not strong enough at the p = 0.05 level. Thus, we cannot reject the idea that the observed relationship is merely due to chance. But
notice that, as sample size increases, the critical value (threshold) declines dramatically. That’s why
large samples are better than small samples – they allow us to resolve whether a relationship exists,
even if the relationship is weak.
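The calculation and the comparison in step 5 can be sketched together in Python (illustrative variable names; the critical value 1.00 is the tabulated value for n = 5 at p = 0.05):

```python
# Spearman's rank correlation coefficient from the summed squared rank
# differences, followed by the comparison against the critical value.
n = 5
sum_d_squared = 38

r_s = 1 - (6 * sum_d_squared) / (n ** 3 - n)
print(round(r_s, 3))  # -0.9

critical_value = 1.00  # tabulated critical value for n = 5, p = 0.05
significant = abs(r_s) >= critical_value
print(significant)  # False: cannot reject chance at the p = 0.05 level
```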
  n    Critical value      n    Critical value      n    Critical value
  5        1.000          15        0.521          25        0.398
  6        0.886          16        0.503          26        0.390
  7        0.786          17        0.485          27        0.382
  8        0.738          18        0.472          28        0.375
  9        0.700          19        0.460          29        0.368
 10        0.648          20        0.447          30        0.362
 11        0.618          21        0.435          40        0.313
 12        0.587          22        0.425          50        0.279
 13        0.560          23        0.415          80        0.220
 14        0.538          24        0.406         100        0.197