Hartl Genetics Chapter2

Uploaded by

Zeriabrook


CHAPTER 2
DNA Structure and DNA Manipulation
CHAPTER OUTLINE

2.1 Genomes and Genetic Differences Among Individuals
  DNA Markers as Landmarks in Chromosomes
2.2 The Molecular Structure of DNA
  Polynucleotide Chains
  Base Pairing and Base Stacking
  Antiparallel Strands
  DNA Structure as Related to Function
2.3 The Separation and Identification of Genomic DNA Fragments
  Restriction Enzymes and Site-Specific DNA Cleavage
  Gel Electrophoresis
  Nucleic Acid Hybridization
  The Southern Blot
2.4 Selective Replication of Genomic DNA Fragments
  Constraints on DNA Replication: Primers and 5'-to-3' Strand Elongation
  The Polymerase Chain Reaction
2.5 The Terminology of Genetic Analysis
2.6 Types of DNA Markers Present in Genomic DNA
  Single Nucleotide Polymorphisms (SNPs)
  Restriction Fragment Length Polymorphisms (RFLPs)
  Random Amplified Polymorphic DNA (RAPD)
  Amplified Fragment Length Polymorphisms (AFLPs)
  Simple Tandem Repeat Polymorphisms (STRPs)
2.7 Applications of DNA Markers
  Genetic Markers, Genetic Mapping, and “Disease Genes”
  Other Uses for DNA Markers

PRINCIPLES

• A DNA strand is a polymer of A, T, G and C deoxyribonucleotides joined 3'-to-5' by phosphodiester bonds.
• The two DNA strands in a duplex are held together by hydrogen bonding between the AT and GC base pairs and by base stacking of the paired bases.
• Each type of restriction endonuclease enzyme cleaves double-stranded DNA at a particular sequence of bases, usually four or six nucleotides in length.
• The DNA fragments produced by a restriction enzyme can be separated by electrophoresis, isolated, sequenced, and manipulated in other ways.
• Separated strands of DNA or RNA that are complementary in nucleotide sequence can come together (hybridize) spontaneously to form duplexes.
• DNA replication takes place only by elongation of the growing strand in the 5'-to-3' direction through the addition of successive nucleotides to the 3' end.
• In the polymerase chain reaction, short oligonucleotide primers are used in successive cycles of DNA replication to amplify selectively a particular region of a DNA duplex.
• Genetic markers in DNA provide a large number of easily accessed sites in the genome that can be used to identify the chromosomal locations of disease genes, for DNA typing in individual identification, for the genetic improvement of cultivated plants and domesticated animals, and for many other applications.

CONNECTIONS

The Double Helix
  James D. Watson and Francis H. C. Crick 1953
  A Structure for Deoxyribose Nucleic Acid
Origin of the Human Genetic Linkage Map
  David Botstein, Raymond L. White, Mark Skolnick, and Ronald W. Davis 1980
  Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms
In Chapter 1, we reviewed the experimental evidence demonstrating that the genetic material is DNA. We saw how, through the unique structure of the DNA molecule, genetic information can be transcribed and translated into proteins that affect the inherited characteristics of organisms. When a mutant gene encodes a nonfunctional protein that results in some physical or physiological abnormality (for example, an “inborn error of metabolism”), the expression of that abnormality can be used to trace the transmission of the mutant gene from one generation to the next in a pedigree (family history). As a consequence, until recently, the first step in genetic analysis was the identification of organisms with abnormal traits, such as peas with wrinkled seeds instead of round seeds and fruit flies with white eyes instead of red. These traits were studied by means of controlled crosses so that the parentage of each individual could be traced. Large-scale genetic studies were typically limited to one of a small number of model organisms especially favorable for isolating and identifying mutant genes, such as the budding yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans, or the fruit fly Drosophila melanogaster.

Since the mid-1970s, studies in genetics have undergone a revolution based on the use of increasingly sophisticated ways to isolate and identify specific fragments of DNA. The culmination of these techniques was large-scale genomic sequencing: the ability to determine the correct sequence of the base pairs that make up the DNA in an entire genome and to identify the sequences associated with genes. Because many of the model organisms used in genetics have relatively small genomes, these sequences were completed first, in the late 1990s (Figure 2.1). The techniques used to sequence these simpler genomes were then scaled up to sequence the human genome. The initial “rough draft” of the human genome was announced in June 2000; this
represents an important milestone in the Human Genome Project, whose goals include determining the sequence, and identifying the function, of all human genes.

Figure 2.1 Timeline of large-scale genomic DNA sequencing.

2.1 Genomes and Genetic Differences Among Individuals

The numbers associated with the genome of even a simple organism can be intimidating. The sequenced genomes of D. melanogaster and C. elegans, both approximately 100 million base pairs in length, encode 13,601 proteins and 18,424 proteins, respectively. The human genome is considerably larger. As found in a human reproductive cell, the human genome consists of 3 billion base pairs organized into 23 distinct chromosomes (each chromosome contains a single molecule of duplex DNA). A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. The sequences that make up the protein-coding part of these genes actually account for only about 4 percent of the entire genome. The other 96 percent of the sequences do not code for proteins. Some noncoding sequences are genetic “chaff” that gets separated from the protein-coding “wheat” when genes are transcribed and the RNA transcript is processed into messenger RNA. Other noncoding sequences are relatively short sequences that are found in hundreds or thousands of copies scattered throughout the genome. Still other noncoding sequences are remnants of genes called pseudogenes. As might be expected, identifying the protein-coding genes from among the large background of noncoding DNA in the human genome is a challenge in itself.

Geneticists often speak of the nucleotide sequence of “the” human genome because 99.9 percent of the DNA sequences in any two individuals are the same. This is our evolutionary legacy; it contains the genetic information that makes us human beings. In reality, however, there are many different human genomes. Geneticists have great interest in the 0.1 percent of the human DNA sequence (3 million base pairs) that differs from one genome to the next, because these differences include the mutations that are responsible for genetic diseases such as phenylketonuria and other inborn errors of metabolism, as well as the mutations that increase the risk of more complex diseases such as heart disease, breast cancer, and diabetes.

Fortunately, only a small proportion of all differences in DNA sequence are associated with disease. Some of the others are associated with inherited differences in height, weight, hair color, eye color, facial features, and other traits. Most of the genetic differences between people are completely harmless. Many have no detectable effects on appearance or health. Such differences can be studied only through direct examination of the DNA itself. These harmless differences are nevertheless important, because they serve as genetic markers.

DNA Markers as Landmarks in Chromosomes

In genetics, a genetic marker is any difference in DNA, no matter how it is detected, whose pattern of transmission from generation to generation can be tracked. Each individual who carries the marker also carries a length of chromosome on either side of it, so it marks a particular region of the genome. A mutant gene, or some portion of a mutant gene, can serve as a genetic marker. In the “classical” approach to genetics, it is the outward expression of a gene (or lack of expression) that forms the basis of genetic analysis. For example, a mutation causing wrinkled peas is a genetic marker, which can be identified through its effects on pea shape. In modern genetic analysis, any difference in DNA sequence between two individuals can serve as a genetic marker. And although these genetic markers are often harmless in themselves, they allow the positions of disease genes to be located and their DNA isolated, identified, and studied.

Genetic markers that are detected by direct analysis of the DNA are often called DNA markers. DNA markers are important in genetics because they serve as landmarks in long DNA molecules, such as those found in chromosomes, which allow genetic differences among individuals to be tracked. They are like signposts along a highway. Using DNA markers as landmarks, the geneticist can identify the positions of normal genes, mutant genes, breaks in chromosomes, and other features important in genetic analysis (Figure 2.2).

The detection of DNA markers usually requires that the genomic DNA (the total DNA extracted from cells of an organism) be fragmented into pieces of manageable size (usually a few thousand nucleotide pairs) that can be manipulated in laboratory experiments. In the following sections, we examine some of the principal ways in which DNA is manipulated to reveal genetic differences among individuals, whether or not these differences find outward expression. Use of these methods broadens the scope of genetics, making it possible to carry out genetic analysis in any organism. This means that detailed genetic analysis is no longer restricted to human beings, domesticated animals, cultivated plants, and the relatively small number of model organisms favorable for genetic studies. Direct study of DNA eliminates the need for prior identification of genetic differences between individuals; it even eliminates the need for controlled crosses. The methods of molecular analysis discussed in this chapter have transformed genetics:

The manipulation of DNA is the basic experimental operation in modern genetics.

[Figure 2.2 diagram. Labels: Human cell; Nucleus; Chromosomes; Double-stranded DNA molecule; DNA markers A, B, C, and D; Cleavage; DNA fragment; Bacterial cell; Replication in bacterial cells; Cloned DNA. Annotations: Chromosomes are located in the cell nucleus. Each chromosome contains one long molecule of duplex (double-stranded) DNA. DNA markers are used to identify particular regions along the DNA in chromosomes. Single DNA fragments can be cleaved from the molecule. The fragments can be transferred into bacterial cells, where they can replicate. (This is the procedure of cloning.) Large quantities of the cloned human DNA fragment can be isolated from the bacterial cells. A DNA marker, in this case D, serves to identify bacterial cells containing a particular DNA fragment of interest.]

Figure 2.2 DNA markers serve as landmarks that identify physical positions along a DNA molecule, such as DNA from a chromosome. As shown at the right, a DNA marker can also be used to identify bacterial cells into which a particular fragment of DNA has been introduced. The procedure of DNA cloning is not quite as simple as indicated here; it is discussed further in Chapter 13.

These methods are the principal techniques
used in virtually every modern genetics
laboratory.

2.2 The Molecular Structure of DNA

Modern experimental methods for the manipulation and analysis of DNA grew out of a detailed understanding of its molecular structure and replication. Therefore, to understand these methods, one needs to know something about the molecular structure of DNA. We saw in Chapter 1 that DNA is a helix of two paired, complementary strands, each composed of an ordered string of nucleotides, each bearing one of the bases A (adenine), T (thymine), G (guanine), or C (cytosine). Watson–Crick base pairing between A and T and between G and C in the complementary strands holds the strands together. The complementary strands also hold the key to replication, because each strand can serve as a template for the synthesis of a new complementary strand. We will now take a closer look at DNA structure and at the key features of its replication.

Polynucleotide Chains

In terms of biochemistry, a DNA strand is a polymer, a large molecule built from repeating units. The units in DNA are composed of 2'-deoxyribose (a five-carbon sugar), phosphoric acid, and the four nitrogen-containing bases denoted A, T, G, and C. The chemical structures of the bases are shown in Figure 2.3. Note that two of the bases have a double-ring structure; these are called purines. The other two bases have a single-ring structure; these are called pyrimidines.

• The purine bases are adenine (A) and guanine (G).
• The pyrimidine bases are thymine (T) and cytosine (C).

[Figure 2.3 diagram: structural formulas of the purines (adenine, guanine) and the pyrimidines (thymine, cytosine), each shown attached to deoxyribose.]

Figure 2.3 Chemical structures of the four nitrogen-containing bases in DNA: adenine, thymine, guanine, and cytosine. The nitrogen atom linked to the deoxyribose sugar is indicated. The atoms shown in red participate in hydrogen bonding between the DNA base pairs.

[Figure 2.4 diagram: a nucleoside and a nucleotide drawn side by side. Labels: Base (A, G, T, or C); Sugar, with carbon atoms numbered 1 to 5; Phosphate; "This group is OH in RNA."]

Figure 2.4 A typical nucleotide, showing the three major components (phosphate, sugar, and base), the difference between DNA and RNA, and the distinction between a nucleoside (no phosphate group) and a nucleotide (with phosphate). Nucleotides are monophosphates (with one phosphate group). Nucleoside diphosphates contain two phosphate groups, and nucleoside triphosphates contain three.

In DNA, each base is chemically linked to one molecule of the sugar deoxyribose, forming a compound called a nucleoside. When a phosphate group is also attached to the sugar, the nucleoside becomes a nucleotide (Figure 2.4). Thus a nucleotide is a nucleoside plus a phosphate. In the conventional numbering of the carbon atoms in the sugar in Figure 2.4, the carbon atom to which the base is attached is the 1' carbon. (The atoms in the sugar are given primed numbers to distinguish them from atoms in the bases.) The nomenclature of the nucleoside and nucleotide derivatives of the DNA bases is summarized in Table 2.1. Most of these terms are not needed in this book; they are included because they are likely to be encountered in further reading.

In nucleic acids, such as DNA and RNA, the nucleotides are joined to form a polynucleotide chain, in which the phosphate attached to the 5' carbon of one sugar is linked to the hydroxyl group attached to the 3' carbon of the next sugar in line (Figure 2.5). The chemical bonds by which the sugar components of adjacent nucleotides are linked through the phosphate groups are called phosphodiester bonds. The 5'–3'–5'–3' orientation of these linkages

Table 2.1 DNA nomenclature

Base           Nucleoside        Nucleotide
Adenine (A)    Deoxyadenosine    Deoxyadenosine-5' monophosphate (dAMP)
                                 Deoxyadenosine-5' diphosphate (dADP)
                                 Deoxyadenosine-5' triphosphate (dATP)
Guanine (G)    Deoxyguanosine    Deoxyguanosine-5' monophosphate (dGMP)
                                 Deoxyguanosine-5' diphosphate (dGDP)
                                 Deoxyguanosine-5' triphosphate (dGTP)
Thymine (T)    Deoxythymidine    Deoxythymidine-5' monophosphate (dTMP)
                                 Deoxythymidine-5' diphosphate (dTDP)
                                 Deoxythymidine-5' triphosphate (dTTP)
Cytosine (C)   Deoxycytidine     Deoxycytidine-5' monophosphate (dCMP)
                                 Deoxycytidine-5' diphosphate (dCDP)
                                 Deoxycytidine-5' triphosphate (dCTP)

[Figure 2.5 diagram. Panel (A): chemical structure of three linked nucleotides bearing the bases A, G, and C. Panel (B): schematic of the same strand. Labels: 5' end terminates with phosphate group; Phosphate linked to 5' carbon and to 3' carbon; Phosphodiester bonds; 3' end terminates with hydroxyl (–OH).]

Figure 2.5 Three nucleotides at the 5' end of a single polynucleotide strand. (A) The chemical structure of the sugar–phosphate linkages, showing the 5'-to-3' orientation of the strand (the red numbers are those assigned to the carbon atoms). (B) A common schematic way to depict a polynucleotide strand.

continues throughout the chain, which typically consists of millions of nucleotides. Note that the terminal groups of each polynucleotide chain are a 5'-phosphate (5'-P) group at one end and a 3'-hydroxyl (3'-OH) group at the other. The asymmetry of the ends of a DNA strand implies that each strand has a polarity determined by which end bears the 5' phosphate and which end bears the 3' hydroxyl.

A few years before Watson and Crick proposed their essentially correct three-dimensional structure of DNA as a double helix, Erwin Chargaff developed a chemical technique to measure the amount of each base present in DNA. As we describe this technique, we will let the molar concentration of any base be represented by the symbol for the base in square brackets; for example, [A] denotes the molar concentration of adenine. Chargaff used his technique to measure the [A], [T], [G], and [C] content of the DNA from a variety of sources. He found that the base composition of the DNA, defined as the percent G + C, differs among species but is constant in all cells of an organism and within a species. Data on the base composition of DNA from a variety of organisms are given in Table 2.2.

Chargaff also observed certain regular relationships among the molar concentrations of the different bases. These relationships are now called Chargaff’s rules:

• The amount of adenine equals that of thymine: [A] = [T].
• The amount of guanine equals that of cytosine: [G] = [C].
• The amount of the purine bases equals that of the pyrimidine bases: [A] + [G] = [T] + [C].

Although the chemical basis of these observations was not known at the time, one of the appealing features of the Watson–Crick structure of paired complementary strands was that it explained Chargaff’s rules. Because A is always paired with T in double-stranded DNA, it must follow that [A] = [T]. Similarly, because G is paired with C, we know that [G] = [C]. The third rule follows by addition of the other two: [A] + [G] = [T] + [C]. In the next section, we examine the molecular basis of base pairing in more detail.

Base Pairing and Base Stacking

In the three-dimensional structure of the DNA molecule proposed in 1953 by Watson and Crick, the molecule consists of two polynucleotide chains twisted around one another to form a double-stranded helix in which adenine and thymine, and guanine and cytosine, are paired in opposite strands (Figure 2.6). In the standard structure, which is called the B form of DNA, each chain makes one complete turn every 34 Å. The helix is right-handed, which means that as one looks down the barrel, each chain follows a clockwise path as it progresses. The bases are spaced at 3.4 Å, so there are ten bases per helical turn in each strand and ten base pairs per turn of the double helix.
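The B-form dimensions quoted here (a 3.4 Å spacing between bases and ten base pairs per turn) lend themselves to a quick arithmetic check. The sketch below is illustrative only and is not part of the original chapter: the constant and function names are our own, and it assumes an idealized, perfectly regular B-form helix that is fully extended.

```python
# Idealized B-form DNA geometry, using only the figures quoted in the text.
RISE_PER_BASE_PAIR_ANGSTROM = 3.4   # spacing between adjacent base pairs
BASE_PAIRS_PER_TURN = 10            # base pairs in one complete helical turn
METERS_PER_ANGSTROM = 1e-10


def helix_pitch_angstrom() -> float:
    """Length of one complete turn of the helix, in angstroms."""
    return RISE_PER_BASE_PAIR_ANGSTROM * BASE_PAIRS_PER_TURN


def helical_turns(base_pairs: int) -> float:
    """Number of complete helical turns in a duplex of the given length."""
    return base_pairs / BASE_PAIRS_PER_TURN


def duplex_length_meters(base_pairs: int) -> float:
    """End-to-end length of a fully extended duplex, in meters."""
    return base_pairs * RISE_PER_BASE_PAIR_ANGSTROM * METERS_PER_ANGSTROM


if __name__ == "__main__":
    print("Pitch (angstroms):", helix_pitch_angstrom())
    print("Turns in 1000 base pairs:", helical_turns(1000))
    print("Length of 3 billion base pairs (m):",
          duplex_length_meters(3_000_000_000))
```

Under these assumptions the pitch works out to the 34 Å per turn stated in the text, and a duplex of 3 billion base pairs, the size given earlier for the human genome, would stretch to roughly one meter.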

Table 2.2 Base composition of DNA from different organisms

                                 Base (percentage of total bases)       Base composition
Organism                       Adenine   Thymine   Guanine   Cytosine   (percent G + C)

Bacteriophage T7                 26.0      26.0      24.0      24.0         48.0

Bacteria
  Clostridium perfringens        36.9      36.3      14.0      12.8         26.8
  Streptococcus pneumoniae       30.2      29.5      21.6      18.7         40.3
  Escherichia coli               24.7      23.6      26.0      25.7         51.7
  Sarcina lutea                  13.4      12.4      37.1      37.1         74.2

Fungi
  Saccharomyces cerevisiae       31.7      32.6      18.3      17.4         35.7
  Neurospora crassa              23.0      22.3      27.1      27.6         54.7

Higher plants
  Wheat                          27.3      27.2      22.7      22.8*        45.5
  Maize                          26.8      27.2      22.8      23.2*        46.0

Animals
  Drosophila melanogaster        30.8      29.4      19.6      20.2         39.8
  Pig                            29.4      29.6      20.5      20.5         41.0
  Salmon                         29.7      29.1      20.8      20.4         41.2
  Human being                    29.8      31.8      20.2      18.2         38.4

*Includes one-fourth 5-methylcytosine, a modified form of cytosine found in most plants more complex than algae and in many animals.
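Chargaff's rules can be checked directly against the figures in Table 2.2. The following is a minimal sketch rather than anything from the original chapter: the dictionary and function names are hypothetical, only three rows of the table are copied in, and results are rounded to one decimal place to match the precision of the table.

```python
# Base percentages (A, T, G, C) copied from three rows of Table 2.2.
TABLE_2_2_ROWS = {
    "Bacteriophage T7": (26.0, 26.0, 24.0, 24.0),
    "Escherichia coli": (24.7, 23.6, 26.0, 25.7),
    "Human being": (29.8, 31.8, 20.2, 18.2),
}


def percent_gc(g: float, c: float) -> float:
    """Base composition as defined in the text: percent G + C."""
    return round(g + c, 1)


def chargaff_deviations(a: float, t: float, g: float, c: float) -> dict:
    """Absolute differences that Chargaff's rules predict to be close to zero."""
    return {
        "[A] - [T]": round(abs(a - t), 1),
        "[G] - [C]": round(abs(g - c), 1),
        "purines - pyrimidines": round(abs((a + g) - (t + c)), 1),
    }


if __name__ == "__main__":
    for organism, (a, t, g, c) in TABLE_2_2_ROWS.items():
        print(organism, percent_gc(g, c), chargaff_deviations(a, t, g, c))
```

The measured values satisfy the rules only approximately (for example, the Escherichia coli row has [A] and [T] differing by about one percentage point), which is presumably a matter of experimental error in the chemical measurements rather than a real departure from base pairing.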

[Figure 2.6 diagram. Panel (A): ribbon diagram of the double helix with A–T and G–C base pairs. Panel (B): computer model of B-form DNA with the four base pairs drawn as structural formulas. Labels: Minor groove; Major groove; Adenine; Thymine; Guanine; Cytosine; Phosphate; Deoxyribose sugar; Base; 34 Å per complete turn (10 base pairs per turn); Diameter 20 Å. Atom key: Oxygen; Hydrogen; Phosphorus; C in sugar–phosphate chain; C and N in bases.]

Figure 2.6 Two representations of DNA, illustrating the three-dimensional structure of the double helix. (A) In a ribbon diagram, the sugar–phosphate backbones are depicted as bands, with horizontal lines used to represent the base pairs. (B) A computer model of the B form of a DNA molecule. The stick figures are the sugar–phosphate chains winding around outside the stacked base pairs, forming a major groove and a minor groove. The color coding for the base pairs is as follows: A, red or pink; T, dark green or light green; G, dark brown or beige; C, dark blue or light blue. The bases depicted in dark colors are those attached to the blue sugar–phosphate backbone; the bases depicted in light colors are attached to the beige backbone. [B, courtesy of Antony M. Dean.]

The strands feature base pairing, in which each base is paired to a complementary base in the other strand by hydrogen bonds. (A hydrogen bond is a weak bond in which two participating atoms share a hydrogen atom between them.) The hydrogen bonds provide one type of force holding the strands together. In Watson–Crick base pairing, adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). The hydrogen bonds that form in the adenine–thymine base pair and in the guanine–cytosine pair are illustrated in Figure 2.7. Note that an AT pair (Figure 2.7A and B) has two hydrogen bonds and that a GC pair (Figure 2.7C and D) has three hydrogen bonds. This means that the hydrogen bonding between G and C is stronger in the sense that it requires more energy to break; for example, the amount of heat required to separate the paired strands in a DNA duplex increases with the percent of G + C. Because nothing restricts the sequence of bases in a single strand, any sequence could be present along one strand. This explains Chargaff’s observation that DNA from different organisms may differ in base composition. However, because the strands in duplex DNA are complementary, Chargaff’s rules of [A] = [T] and [G] = [C] are true whatever the base composition.

In the B form of DNA, the paired bases are planar, parallel to one another, and perpendicular to the long axis of the double helix. This feature of double-stranded DNA is known as base stacking. The upper and lower faces of each nitrogenous base are relatively flat and nonpolar (uncharged). These surfaces are said to be hydrophobic because they bind poorly to water molecules, which are very polar. (The polarity refers to the asymmetrical distribution of charge across the V-shaped water molecule;

[Figure 2.7 diagram. Panels (A) and (B): adenine paired with thymine, as a structural formula and as a space-filling model; label: "Two hydrogen bonds attract A and T." Panels (C) and (D): guanine paired with cytosine, as a structural formula and as a space-filling model; label: "Three hydrogen bonds attract G and C." Each base is shown attached to deoxyribose.]

Figure 2.7 Normal base pairs in DNA. On the left, the hydrogen bonds (dotted lines) with the joined atoms are shown in red. (A and B) AT base pairing. (C and D) GC base pairing. In the space-filling models (B and D), the colors are as follows: C, gray; N, blue; O, red; and H (shown in the bases only), white. Each hydrogen bond is depicted as a white disk squeezed between the atoms sharing the hydrogen. The stick figures on the outside represent the backbones winding around the stacked base pairs. [Space-filling models courtesy of Antony M. Dean.]

the oxygen at the base of the V tends to be quite negative, whereas the hydrogens at the tips are quite positive). Owing to their repulsion of water molecules, the paired nitrogenous bases tend to stack on top of one another in such a way as to exclude the maximum amount of water from the interior of the double helix. Hence a double-stranded DNA molecule has a hydrophobic core composed of stacked bases, and it is the energy of base stacking that provides double-stranded DNA with much of its chemical stability.

When discussing a DNA molecule, molecular biologists frequently refer to the individual strands as single strands or as single-stranded DNA; they refer to the double helix as double-stranded DNA or as a duplex molecule. The two grooves spiraling along outside of the double helix are not symmetrical; one groove, called the major groove, is larger than the other, which is called the minor groove. Proteins that interact with double-stranded DNA often have regions that make contact with the base pairs by fitting into the major groove, into the minor groove, or into both grooves (Figure 2.6B).

Antiparallel Strands

Each backbone in a double helix consists of deoxyribose sugars alternating with phosphate groups that link the 3' carbon atom of one sugar to the 5' carbon of the next in line (Figure 2.5). The two polynucleotide strands of the double helix have opposite polarity in the sense that the 5' end of one strand is paired with the 3' end of the other strand. Strands with such an arrangement are said to be antiparallel. One implication of antiparallel strands in duplex DNA is that in each pair of bases, one base is attached to a sugar that lies above the plane of pairing, and the other base is attached to a sugar that lies below the plane of pairing. Another implication is that each terminus of the double helix possesses one 5'-P group (on one strand) and one 3'-OH group (on the other strand), as shown in Figure 2.8.

[Figure 2.8 diagram: six base pairs (T–A, G–C, C–G, A–T, T–A, G–C) joined by phosphates (P). At each end of the duplex, one strand is labeled "5' end (terminates in 5' phosphate)" and the other "3' end (terminates in 3' hydroxyl)".]

Figure 2.8 A segment of a DNA molecule, showing the antiparallel orientation of the complementary strands. The overlying blue arrows indicate the 5'-to-3' direction of each strand. The phosphates (P) join the 3' carbon atom of one deoxyribose (horizontal line) to the 5' carbon atom of the adjacent deoxyribose.

The diagram of the DNA duplex in Figure 2.6 is static and so somewhat misleading. DNA is a dynamic molecule, constantly in motion. In some regions, the strands can separate briefly and then come together again in the same conformation or in a different one. The right-handed double helix in Figure 2.6 is the standard B form, but depending on conditions, DNA can actually form more than 20 slightly different variants of a right-handed helix, and some regions can even form helices in which the strands twist to the left (called the Z form of DNA). If there are complementary stretches of nucleotides in the same strand, then a single strand, separated from its partner, can fold back upon itself like a hairpin. Even triple helices consisting of three strands can form in regions of DNA that contain suitable base sequences.
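The base-pairing rules and the antiparallel orientation described in this section together determine the partner of any strand: it is the complement of the strand read in the reverse direction. The sketch below illustrates this under simplifying assumptions. The function names and the example sequence are hypothetical, strands are written as plain strings of A, T, G, and C in the 5'-to-3' direction, and the hydrogen-bond tally uses only the counts given in the text (two per AT pair, three per GC pair).

```python
# Watson-Crick partners and hydrogen bonds per base pair, as given in the text.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5'-to-3'.

    Because the two strands are antiparallel, the partner is obtained by
    complementing each base and reading the result in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))


def hydrogen_bond_count(strand_5_to_3: str) -> int:
    """Total hydrogen bonds in the duplex formed by a strand and its partner."""
    return sum(HYDROGEN_BONDS[base] for base in strand_5_to_3.upper())


if __name__ == "__main__":
    strand = "ACGTAC"  # hypothetical example sequence, written 5'-to-3'
    print("Strand :", strand)
    print("Partner:", partner_strand(strand))
    print("Hydrogen bonds in duplex:", hydrogen_bond_count(strand))
```

Applying `partner_strand` twice returns the original sequence, which is one way of seeing that each strand fully specifies the other. A duplex richer in G and C has more hydrogen bonds for the same length, in line with the statement that the heat needed to separate the strands increases with the percent G + C.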

2.2 The Molecular Structure of DNA 47

The Double Helix
We wish to suggest a structure for the salt nine and cytosine. The sequence of
James D. Watson and
of deoxyribose nucleic acid (DNA). . . . bases on a single chain does not appear
Francis H. C. Crick 1953
The structure has two helical chains to be restricted in any way. However, if
Cavendish Laboratory,
each coiled round the same axis. . . . only specific pairs of bases can be
Cambridge, England
Both chains follow right-handed helices, formed, it follows that if the sequence of
A Structure for Deoxyribose Nucleic Acid
but the two chains run in opposite di- bases on one chain is given, then the se-
rections. . . . The bases are quence on the other
This is one of the watershed papers of twen- on the inside of the helix and If only specific pairs chain is automatically
tieth-century biology. After its publication, the phosphates on the out- of bases can be determined. . . . It has
nothing in genetics was the same. Every- side. . . . There is a residue not escaped our notice
formed, it follows
thing that was known, and everything still on each chain every 3.4 Å and that the specific pair-
to be discovered, would now need to be in- the structure repeats after 10 that if the sequence ing we have postulated
terpreted in terms of the structure and residues. . . . The novel fea- of bases on one immediately suggests
function of DNA. The importance of the ture of the structure is the chain is given, then a plausible copying
paper was recognized immediately, in no manner in which the two mechanism for the ge-
the sequence on the
small part because of its lucid and concise chains are held together by netic material. . . . We
description of the structure. Watson and the purine and pyrimidine other chain is are much indebted to
Crick benefited tremendously in knowing bases. The planes of the automatically Dr. Jerry Donohue for
that their structure was consistent with the bases are perpendicular to determined. constant advice and
unpublished structural studies of Maurice the fiber axis. They are joined criticism, especially on
Wilkins and Rosalind Franklin. The same together in pairs, a single base from one interatomic distances. We have also
issue of Nature that included the Watson chain being hydrogen-bonded to a sin- been stimulated by a knowledge of the
and Crick paper also included, back to gle base from the other chain, so that general nature of the unpublished ex-
back, a paper from the Wilkins group and the two lie side by side. One of the pair perimental results and ideas of Dr.
one from the Franklin group detailing their must be a purine and the other a pyrimi- Maurice H. F. Wilkins, Dr. Rosalind
data and the consistency of their data with dine for bonding to occur. . . . Only spe- Franklin and their co-workers at King’s
the proposed structure. It has been said cific pairs of bases can bond together. College, London.
that Franklin was poised a mere two half- These pairs are adenine (purine) with
steps from making the discovery herself, thymine (pyrimidine), and guanine Source: Nature 171: 737–738
alone. In any event, Watson and Crick and (purine) with cytosine (pyrimidine). In
Wilkins were awarded the 1962 Nobel other words, if an adenine forms one
Prize for their discovery of DNA structure. member of a pair, on either chain, then
Rosalind Franklin, tragically, died of cancer on these assumptions the other mem-
in 1958 at the age of 37. ber must be thymine; similarly for gua-

DNA Structure as Related to Function

In the structure of the DNA molecule, we can see how three essential requirements of a genetic material are met.

1. Any genetic material must be able to be replicated accurately, so that the information it contains will be precisely replicated and inherited by daughter cells. The basis for exact duplication of a DNA molecule is the pairing of A with T and of G with C in the two polynucleotide chains. Unwinding and separation of the chains, with each free chain being copied, results in the formation of two identical double helices (see Figure 1.6).
2. A genetic material must also have the capacity to carry all of the information needed to direct the organization and metabolic activities of the cell. As we saw in Chapter 1, the product of most genes is a protein molecule, a polymer composed of repeating units of amino acids. The sequence

48 Chapter 2 DNA Structure and DNA Manipulation

of amino acids in the protein determines its chemical and physical properties. A gene is expressed when its protein product is synthesized, and one requirement of the genetic material is that it direct the order in which amino acid units are added to the end of a growing protein molecule. In DNA, this is done by means of a genetic code in which groups of three bases specify amino acids. Because the four bases in a DNA molecule can be arranged in any sequence, and because the sequence can vary from one part of the molecule to another and from organism to organism, DNA can contain a great many unique regions, each of which can be a distinct gene. A long DNA chain can direct the synthesis of a variety of different protein molecules.
3. A genetic material must also be capable of undergoing occasional mutations in which the information it carries is altered. Furthermore, so that mutations will be heritable, the mutant molecules must be capable of being replicated as faithfully as the parental molecule. This feature is necessary to account for the evolution of diverse organisms through the slow accumulation of favorable mutations. Watson and Crick suggested that heritable mutations might be possible in DNA by rare mispairing of the bases, with the result that an incorrect nucleotide becomes incorporated into a replicating DNA strand.

2.3 The Separation and Identification of Genomic DNA Fragments

The following sections show how an understanding of DNA structure and replication has been put to practical use in the development of procedures for the separation and identification of particular DNA fragments. These methods are used primarily either to identify DNA markers or to aid in the isolation of particular DNA fragments that are of genetic interest. For example, consider a pedigree of familial breast cancer in which a particular DNA fragment serves as a marker for a bit of chromosome that also includes the mutant gene responsible for the increased risk; then the ability to identify the fragment is critically important in assessing the relative risk for each of the women in the pedigree who might carry the mutant gene. To take another example, suppose there is reason to believe that a mutation causing a genetic disease is present in a particular DNA fragment; then it is important to be able to pinpoint this fragment and isolate it from affected individuals to verify whether this hypothesis is true and, if so, to identify the nature of the mutation.

Most procedures for the separation and identification of DNA fragments can be grouped into two general categories:

1. Those that identify a specific DNA fragment present in genomic DNA by making use of the fact that complementary single-stranded DNA sequences can, under the proper conditions, form a duplex molecule. These procedures rely on nucleic acid hybridization.
2. Those that use prior knowledge of the sequence at the ends of a DNA fragment to specifically and repeatedly replicate this one fragment from genomic DNA. These procedures rely on selective DNA replication (amplification) by means of the polymerase chain reaction.

The major difference between these approaches is that the first (relying on nucleic acid hybridization) identifies fragments that are present in the genomic DNA itself, whereas the second (relying on DNA amplification) identifies experimentally manufactured replicas of fragments whose original templates (but not the replicas) were present in the genomic DNA. This difference has practical implications:

• Hybridization methods require a greater amount of genomic DNA for the experimental procedures, but relatively large fragments can be identified, and no prior knowledge of the DNA sequence is necessary.
• Amplification methods require extremely small amounts of genomic DNA for the experimental procedures, but the amplification is usually restricted to relatively small fragments, and some prior knowledge of DNA sequence is necessary.

The following sections discuss both types of approaches and give examples of how they are used. In methods that use nucleic acid hybridization to identify particular fragments present in genomic DNA, the first step is usually cutting the genomic DNA into fragments of experimentally manageable size. This procedure is discussed next.

2.3 The Separation and Identification of Genomic DNA Fragments 49

Restriction Enzymes and Site-Specific DNA Cleavage

Procedures for chemical isolation of DNA, such as those developed by Avery, MacLeod, and McCarty (Chapter 1), usually lead to random breakage of double-stranded molecules into an average length of about 50,000 base pairs. This length is denoted 50 kb, where kb stands for kilobases (1 kb = 1000 base pairs). A length of 50 kb is close to the length of double-stranded DNA present in the bacteriophage λ that infects E. coli. The 50-kb fragments can be made shorter by vigorous shearing forces, such as occur in a kitchen blender, but one of the problems with breaking large DNA molecules into smaller fragments by random shearing is that the fragments containing a particular gene, or part of a gene, will be of different sizes. In other words, with random shearing, it is not possible to isolate and identify a particular DNA fragment on the basis of its size and sequence content, because each randomly sheared molecule that contains the desired sequence somewhere within it differs in size from all other molecules that contain the sequence. In this section we describe an important enzymatic technique that can be used for cleaving DNA molecules at specific sites. This method ensures that all DNA fragments that contain a particular sequence have the same size; furthermore, each fragment that contains the desired sequence has the sequence located at exactly the same position within the fragment.

Figure 2.9 Structure of the part of the restriction enzyme BamHI that comes into contact with its recognition site in the DNA (blue). The pink and green cylinders represent regions of the enzyme in which the amino acid chain is twisted in the form of a right-handed helix. [Courtesy of A. A. Aggarwal. Reprinted with permission from M. Newman, T. Strzelecka, L. F. Dorner, I. Schildkraut, and A. A. Aggarwal, 1995. Science 269: 656. Copyright 2000 American Association for the Advancement of Science.]

The cleavage method makes use of an important class of DNA-cleaving enzymes isolated primarily from bacteria. The enzymes are called restriction endonucleases or restriction enzymes, and they are able to cleave DNA molecules at the positions at which particular, short sequences of bases are present. These naturally occurring enzymes serve to protect the bacterial cell by disabling the DNA of bacteriophages that attack it. Their discovery earned Werner Arber of Switzerland a Nobel Prize in 1978. Technically, the enzymes are known as type II restriction endonucleases. The restriction enzyme BamHI is one example; it recognizes the double-stranded sequence

5'-GGATCC-3'
3'-CCTAGG-5'

and cleaves each strand between the two adjacent G-bearing nucleotides. Figure 2.9 shows how the regions that make up the active site of BamHI contact the recognition site (blue) just prior to cleavage, and the cleavage reaction is indicated in Figure 2.10. Table 2.3 lists nine of the several hundred restriction enzymes that are known. Most restriction enzymes are named after the species in which they were found. BamHI, for example, was isolated from the bacterium Bacillus amyloliquefaciens strain H, and it is the first (I) restriction enzyme isolated from this organism. Because the first three letters in the name of each restriction enzyme stand for the bacterial species of origin, these letters are printed in italics; the rest of the symbols in the name are not italicized. Most restriction enzymes recognize only one short base sequence, usually four or six nucleotide pairs. The enzyme binds with the DNA at these sites and makes a break in each strand of the DNA molecule, producing 3'-OH and 5'-P groups at each position. The nucleotide sequence recognized for cleavage by a restriction enzyme is called the restriction site of the enzyme. The examples in Table 2.3 show that some restriction enzymes cleave their restriction site asymmetrically (at different sites in the


Figure 2.10 Mechanism of DNA cleavage by the restriction enzyme BamHI. Wherever the duplex contains a BamHI restriction site, the enzyme makes a single cut in the backbone of each DNA strand. Each cut creates a new 3' end and a new 5' end, separating the duplex into two fragments. In the case of BamHI the cuts are staggered, so the resulting ends terminate in single-stranded regions, each four nucleotides in length.

BamHI restriction site, GGATCC:

5' ...GGATCC... 3'
3' ...CCTAGG... 5'

Cleavage occurs in each strand between the two G-bearing nucleotides and creates a short complementary single-stranded overhang in each cleaved end (“sticky ends”):

5' ...G          GATCC... 3'
3' ...CCTAG          G... 5'
Restriction fragment    Restriction fragment
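The staggered cut shown in Figure 2.10 can be traced in a few lines of Python. This is a minimal sketch rather than a laboratory tool: the helper names (`complement`, `bamhi_cut`) and the 12-base example duplex are made up for illustration, and only the first BamHI site in the sequence is cut.

```python
# Sketch of BamHI cleavage (G^GATCC) on a short duplex.
# Function names and the example sequence are illustrative assumptions.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
SITE = "GGATCC"   # BamHI restriction site, read 5'->3' on either strand
CUT_OFFSET = 1    # the cut falls after the first G of the site


def complement(strand):
    """Return the base-paired partner, written 3'->5' so it aligns under the input."""
    return "".join(PAIR[base] for base in strand)


def bamhi_cut(top):
    """Cut a duplex at its first BamHI site.

    `top` is the upper strand written 5'->3'. Returns the left and right
    fragments, each as a (top, bottom) pair with the bottom strand written
    3'->5', or None if the duplex contains no BamHI site.
    """
    start = top.find(SITE)
    if start < 0:
        return None
    bottom = complement(top)
    top_cut = start + CUT_OFFSET                 # top strand:    G^GATCC
    bottom_cut = start + len(SITE) - CUT_OFFSET  # bottom strand: CCTAG^G
    left = (top[:top_cut], bottom[:bottom_cut])
    right = (top[top_cut:], bottom[bottom_cut:])
    return left, right


left, right = bamhi_cut("ATTGGATCCGTA")
# Unpaired bases at the start of the right fragment's top strand: the sticky end.
overhang = right[0][: len(right[0]) - len(right[1])]
print(left)      # ('ATTG', 'TAACCTAG')
print(right)     # ('GATCCGTA', 'GCAT')
print(overhang)  # GATC
```

The left fragment carries the complementary four-nucleotide overhang on its bottom strand, which is why the two ends can pair with each other again.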

Table 2.3 Some restriction endonucleases, their sources, and their cleavage sites

Enzyme   Microorganism                  Target sequence and cleavage site   Ends
                                        (top 5'→3' / bottom 3'→5')
EcoRI    Escherichia coli               G^AATTC / CTTAA^G                   sticky
HindIII  Haemophilus influenzae         A^AGCTT / TTCGA^A                   sticky
AluI     Arthrobacter luteus            AG^CT / TC^GA                       blunt
BamHI    Bacillus amyloliquefaciens H   G^GATCC / CCTAG^G                   sticky
PstI     Providencia stuartii           CTGCA^G / G^ACGTC                   sticky
RsaI     Rhodopseudomonas sphaeroides   GT^AC / CA^TG                       blunt
HaeII    Haemophilus aegyptius          PuGCGC^Py / Py^CGCGPu               sticky
TaqI     Thermus aquaticus              T^CGA / AGC^T                       sticky
PvuII    Proteus vulgaris               CAG^CTG / GTC^GAC                   blunt

Note: Each sequence is symmetrical about its midpoint. A caret (^) marks the site of cutting in each strand. The enzyme TaqI yields cohesive ends consisting of two nucleotides, whereas the cohesive ends produced by the other sticky-end enzymes contain four nucleotides. Pu and Py refer to any purine and pyrimidine, respectively.
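The entries in Table 2.3 lend themselves to a small worked example. The sketch below is illustrative only: the function names and the 30-base test sequence are invented, Pu and Py are written with the IUPAC letters R and Y, and the cut offsets are the commonly reported positions for these enzymes, which should be checked against a supplier's reference before any real use. It checks that each site reads the same on both strands, computes the overhang length, and lists fragment lengths after complete digestion of a linear molecule.

```python
import re

# Recognition sites written 5'->3' (R = any purine, Y = any pyrimidine).
# The integer is the number of bases to the left of the cut in the strand
# as written; these offsets are an assumption taken from general references.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "AluI": ("AGCT", 2),
    "BamHI": ("GGATCC", 1),
    "PstI": ("CTGCAG", 5),
    "RsaI": ("GTAC", 2),
    "HaeII": ("RGCGCY", 5),
    "TaqI": ("TCGA", 1),
    "PvuII": ("CAGCTG", 3),
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "R": "Y", "Y": "R"}
IUPAC_REGEX = {"R": "[AG]", "Y": "[CT]"}


def reverse_complement(site):
    """Sequence of the opposite strand, also read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(site))


def is_palindrome(site):
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def overhang_length(enzyme):
    """Single-stranded nucleotides left at each end (0 means blunt ends)."""
    site, offset = ENZYMES[enzyme]
    return abs(len(site) - 2 * offset)


def fragment_lengths(dna, enzyme):
    """Top-strand fragment lengths after complete digestion of linear `dna`."""
    site, offset = ENZYMES[enzyme]
    pattern = "".join(IUPAC_REGEX.get(base, base) for base in site)
    # A lookahead is used so that overlapping sites would also be found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", dna)]
    edges = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(edges, edges[1:])]


dna = "AAGAATTCCCGGATCCTTAGCTGAATTCAA"  # hypothetical 30-base top strand
print(all(is_palindrome(site) for site, _ in ENZYMES.values()))  # True
print(overhang_length("TaqI"), overhang_length("PstI"))          # 2 4
print(fragment_lengths(dna, "EcoRI"))                            # [3, 20, 7]
print(fragment_lengths(dna, "BamHI"))                            # [11, 19]
```

An enzyme with no site in the molecule leaves it as a single fragment, so the number of cuts depends only on how many restriction sites are present.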


two DNA strands), but other restriction enzymes cleave symmetrically (at the same site in both strands). The former leave sticky ends because each end of the cleaved site has a small, single-stranded overhang that is complementary in base sequence to the other end (Figure 2.10). In contrast, enzymes that have symmetrical cleavage sites yield DNA fragments that have blunt ends. In virtually all cases, the restriction site of a restriction enzyme reads the same on both strands, provided that the opposite polarity of the strands is taken into account; for example, each strand in the restriction site of BamHI reads 5'-GGATCC-3' (Figure 2.10). A DNA sequence with this type of symmetry is called a palindrome. (In ordinary English, a palindrome is a word or phrase that reads the same forwards and backwards, such as “madam.”) Restriction enzymes have the following important characteristics:

• Most restriction enzymes recognize a single restriction site.
• The restriction site is recognized without regard to the source of the DNA.
• Because most restriction enzymes recognize a unique restriction site sequence, the number of cuts in the DNA from a particular organism is determined by the number of restriction sites present.

[Margin illustrations: BamHI (Bacillus amyloliquefaciens H), site GGATCC over CCTAGG; AluI (Arthrobacter luteus), site AGCT over TCGA.]

The DNA fragment produced by a pair of adjacent cuts in a DNA molecule is called a restriction fragment. A large DNA molecule will typically be cut into many restriction fragments of different sizes. For example, an E. coli DNA molecule, which contains 4.6 × 10^6 base pairs, is cut into several hundred to several thousand fragments, and mammalian genomic DNA is cut into more than a million fragments. Although these numbers are large, they are actually quite small relative to the number of sugar–phosphate bonds in the DNA of an organism.

Gel Electrophoresis

The DNA fragments produced by a restriction enzyme can be separated by size using the fact that DNA is negatively charged and moves in response to an electric field. If the terminals of an electrical power source are connected to the opposite ends of a horizontal tube containing a DNA solution, then the DNA molecules will move toward the positive end of the tube at a rate that depends on the electric field strength and on the shape and size of the molecules. The movement of charged molecules in an electric field is called electrophoresis.

Figure 2.11 Apparatus for gel electrophoresis. Liquid gel is allowed to harden with an appropriately shaped mold in place to form “wells” for the samples (purple). After electrophoresis, the DNA fragments, located at various positions in the gel, are made visible by immersing the gel in a solution containing a reagent that binds to or reacts with DNA. The separated fragments in a sample appear as bands, which may be either visibly colored or fluorescent, depending on the particular reagent used. The region of a gel in which the fragments in one sample can move is called a lane; this gel has seven lanes. [Figure labels: slots for samples; gel; bands (visible after suitable treatment); electrodes (– and +); buffer solution; direction of movement.]

The type of electrophoresis most commonly used in genetics is gel electrophoresis. An experimental arrangement for gel electrophoresis of DNA is shown in Figure 2.11. A thin slab of a gel, usually agarose or acrylamide, is prepared containing small slots (called wells) into which samples are placed. An electric field is applied, and the negatively charged DNA molecules penetrate and move through the gel toward the anode (the positively charged electrode). A gel is a complex molecular network that contains narrow, tortuous passages, so smaller DNA molecules pass through more easily; hence the rate of movement increases as the size of the DNA fragment decreases. Figure 2.12 shows the result of electrophoresis of a set of double-stranded DNA molecules in an agarose gel. Each discrete region containing DNA is called a band. The bands can be visualized under ultraviolet light after soaking the gel in the dye ethidium bromide, the molecules of which intercalate between the stacked bases in duplex DNA and render it fluorescent. In Figure 2.12, each band in the gel


Figure 2.12 Gel electrophoresis of DNA. Fragments of different sizes were mixed and placed in a well. Electrophoresis was in the downward direction. The DNA has been made visible by the addition of a dye (ethidium bromide) that binds only to DNA and that fluoresces when the gel is illuminated with short-wavelength ultraviolet light. [Figure labels: band from heaviest fragment (moves least) at the top; band from lightest fragment (moves most) at the bottom; arrow showing the direction of movement.]

[Figure 2.13 plot: vertical axis, size range (kb) for efficient separation of linear double-stranded DNA fragments. Inset, distance migrated plotted against log10 of size in bp: For any one agarose concentration, except for the largest fragments, the distance migrated decreases as a linear function of the logarithm of fragment size.]

results from the fact that all DNA fragments of a given size have migrated to the same position in the gel. To produce a visible band, a minimum of about 5 × 10^-9 grams of DNA is required, which for a fragment of size 3 kb works out to about 10^9 molecules. The point is that a very large number of copies of any particular DNA fragment must be present in order to yield a visible band in an electrophoresis gel.
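The figure of about 10^9 molecules per visible band can be checked with a short calculation. The sketch below assumes an average mass of roughly 650 grams per mole for each base pair, a commonly used approximation that is not given in the text; the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650  # assumed average molar mass of one base pair


def molecules_in_band(mass_grams, length_bp):
    """Number of duplex DNA molecules of a given length in a given mass."""
    molar_mass = length_bp * GRAMS_PER_MOLE_PER_BP
    return mass_grams / molar_mass * AVOGADRO


# Minimum visible band: about 5 x 10^-9 grams of a 3-kb fragment.
n = molecules_in_band(5e-9, 3000)
print(f"{n:.1e} molecules")  # 1.5e+09 molecules
```

Under this assumption the result is on the order of 10^9 copies, in line with the estimate above.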

A linear double-stranded DNA fragment has an electrophoretic mobility that decreases in proportion to the logarithm of its length in base pairs (the longer the fragment, the slower it moves), but the proportionality constant depends on the agarose concentration, the composition of the buffering solution, and the electrophoretic conditions. This means that different concentrations of agarose allow efficient separation of different size ranges of DNA fragments (see Figure 2.13). Less dense gels, such as 0.6 percent agarose, are used to separate larger fragments, whereas more dense gels, such as 2 percent agarose, are used to separate smaller fragments. The inset in Figure 2.13 shows the dependence of electrophoretic mobility on the logarithm of fragment size. It also indicates that the linear relationship breaks down for the largest fragments that can be resolved under a given set of conditions.

Figure 2.13 In agarose gels, the concentration of agarose is an important factor in determining the size range of DNA fragments that can be separated. [Plot: horizontal axis, agarose concentration (%), marked at 0.6, 0.7, 0.9, 1.2, 1.5, and 2.0; vertical axis marked from 0 to 20; inset axis, distance migrated.]
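Because distance migrated falls off linearly with the logarithm of fragment size over the useful range of a gel, the size of an unknown fragment can be estimated from a lane of size markers. The sketch below shows one way to do this with an ordinary least-squares line. The marker sizes and distances are invented for illustration, and the function names are hypothetical; real calibration data would come from the gel itself and would exclude the largest fragments, where the linear relationship breaks down.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(markers):
    """Fit distance migrated against log10(size in bp) for the marker lane."""
    logs = [math.log10(size) for size, _ in markers]
    distances = [distance for _, distance in markers]
    return fit_line(logs, distances)


def estimate_size(distance, slope, intercept):
    """Invert the calibration line to get a fragment size in base pairs."""
    return 10 ** ((distance - intercept) / slope)


# Hypothetical markers: (length in bp, distance migrated in mm).
markers = [(500, 62.0), (1000, 50.0), (2000, 38.0), (4000, 26.0)]
slope, intercept = calibrate(markers)
unknown = estimate_size(44.0, slope, intercept)
print(round(unknown))  # about 1414 bp, midway between 1000 and 2000 on a log scale
```

The negative slope reflects the rule that longer fragments move more slowly; its magnitude would change with agarose concentration, buffer, and running conditions.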


[Figure 2.14 diagram. (A) EcoRI map of λ DNA, fragments from left to right with their size ranks in parentheses: 22 kb (1), 5 kb (5), 5.5 kb (4), 7.5 kb (2), 6 kb (3), 4 kb (6). (B) Gel lanes: EcoRI, bands 1 through 6; BamHI, bands 1, 2, 3, 4, and 5/6. (C) BamHI map of λ DNA, fragments from left to right: 5.5 kb (6), 17.5 kb (1), 5 kb (5), 6.5 kb (4), 7.5 kb (3), 8 kb (2).]

Figure 2.14 Restriction maps of λ DNA for the restriction enzymes (A) EcoRI and (C) BamHI. The vertical bars indicate the sites of cutting. The numbers within the arrows are the approximate lengths of the fragments in kilobase pairs (kb). (B) An electrophoresis gel of BamHI and EcoRI enzyme digests of λ DNA. Numbers indicate fragments in order from largest (1) to smallest (6); the circled numbers on the maps correspond to the numbers beside the gel. The DNA has not undergone electrophoresis long enough to separate bands 5 and 6 of the BamHI digest. Note: In Problem 2 at the end of this chapter (Guide to Problem Solving), we show how to use the results of a double digest to determine the particular order of fragments for a pair of restriction enzymes.

Because of the sequence specificity of cleavage, a particular restriction enzyme produces a unique set of fragments for a particular DNA molecule. Another enzyme will produce a different set of fragments from the same DNA molecule. In Figure 2.14, this principle is illustrated for the digestion of E. coli phage λ DNA by either EcoRI or BamHI (see part B). The locations of the cleavage sites for these enzymes in λ DNA are shown in Figures 2.14A and C. A diagram showing sites of cleavage along a DNA molecule is called a restriction map. Particular DNA fragments can be isolated by cutting out the small region of the gel that contains the fragment and removing the DNA from the gel. One important use of isolated restriction fragments employs the enzyme DNA ligase to insert them into self-replicating molecules such as bacteriophage, plasmids, or even small artificial chromosomes (Figure 2.2). These procedures constitute DNA cloning and are the basis of one form of genetic engineering, discussed further in Chapter 13.

DNA fragments that have been cloned into organisms such as E. coli are widely used because the fragments can be isolated in large amounts and purified relatively easily. Among the uses of cloned DNA are:

• DNA sequencing. All current methods of DNA sequencing require cloned DNA fragments. These methods are discussed in Chapter 6.
• Nucleic acid hybridization. As we shall see below, an important application of cloned DNA fragments entails incorporating a radioactive or light-emitting “label” into them, after which the labeled material is used to “tag” DNA fragments containing similar sequences.
• Storage and distribution. Cloned DNA can be stored for long periods without risk of change and can easily be distributed to other researchers.

Nucleic Acid Hybridization

Most genomes are sufficiently large and complex that digestion with a restriction enzyme produces many bands that are the same or similar in size. Identifying a particular DNA fragment in a background of many other fragments of similar size presents a needle-in-a-haystack problem. Suppose, for example, that we are interested in a particular 3.0-kb BamHI fragment from the human genome that serves as a marker indicating the presence of a genetic risk factor for breast cancer among the individuals in a particular pedigree. This fragment of 3.0 kb is indistinguishable, on the basis of size alone, from fragments ranging from about 2.9 to 3.1 kb. How many fragments in this size range are expected? When human genomic DNA is cleaved with BamHI, the average length of a restriction fragment is 4^6 = 4096 base pairs, and the expected total number of BamHI fragments is about 730,000; in the size range 2.9–3.1 kb, the expected number of fragments is about 17,000. What this means is that even though we know that the fragment we are interested in is 3 kb in length, it is only one of 17,000 fragments that are so similar in size that ours cannot be distinguished from the others by length alone.

This identification task is actually harder than finding a needle in a haystack because


haystacks are usually dry. A more accurate analogy would be looking for a needle in a haystack that had been pitched into a swimming pool full of water. This analogy is more relevant because gels, even though they contain a supporting matrix to make them semisolid, are primarily composed of water, and each DNA molecule within a gel is surrounded entirely by water. Clearly, we need some method by which the molecules in a gel can be immobilized and our specific fragment identified.
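The size of the haystack in the BamHI example (about 730,000 fragments in total, about 17,000 of them between 2.9 and 3.1 kb) can be reproduced with a simple model. The sketch below rests on assumptions that are not spelled out in the text: a genome of 3 × 10^9 base pairs, the four bases equally frequent and arranged at random, and fragment lengths approximated by an exponential distribution with a mean of 4^6 base pairs. The variable and function names are illustrative.

```python
import math

SITE_LENGTH = 6   # BamHI recognizes a six-base-pair site
GENOME_BP = 3.0e9  # assumed size of the human genome in base pairs

# With four equally likely bases, a given six-base site is expected
# once every 4^6 positions, which is also the mean fragment length.
mean_length = 4 ** SITE_LENGTH
total_fragments = GENOME_BP / mean_length


def fraction_between(low_bp, high_bp, mean_bp):
    """Fraction of fragments with length in [low_bp, high_bp], assuming
    exponentially distributed fragment lengths with the given mean."""
    return math.exp(-low_bp / mean_bp) - math.exp(-high_bp / mean_bp)


in_window = total_fragments * fraction_between(2900, 3100, mean_length)

print(mean_length)             # 4096
print(round(total_fragments))  # roughly 730,000
print(round(in_window))        # roughly 17,000
```

Real genomes are not random sequences, so these values are order-of-magnitude estimates, but they show why size alone cannot single out one 3-kb fragment.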
The DNA fragments in a gel are usually immobilized by transferring them onto a sheet of special filter paper consisting of nitrocellulose, to which DNA can be permanently (covalently) bound. How this is done is described in the next section. In this section we examine how the two strands in a double helix can be “unzipped” to form single strands and how, under the proper conditions, two single strands that are complementary or nearly complementary in sequence can be “zipped” together to form a different double helix. The “unzipping” is called denaturation, the “zipping” renaturation. The practical applications of denaturation and renaturation are many:

• A small part of a DNA fragment can be “zipped” with a much larger DNA fragment. This principle is used in identifying specific DNA fragments in a complex mixture, such as the 3-kb BamHI marker for breast cancer that we have been considering. Applications of this type include the tracking of genetic markers in pedigrees and the isolation of fragments containing a particular mutant gene.
• A DNA fragment from one gene can be “zipped” with similar fragments from other genes in the same genome; this principle is used to identify different members of families of genes that are similar, but not identical, in sequence and that have related functions.
• A DNA fragment from one species can be “zipped” with similar sequences from other species. This allows the isolation of genes that have the same or related functions in multiple species. It is used to study aspects of molecular evolution, such as how differences in sequence are correlated with differences in function, and the patterns and rates of change in gene sequences as they evolve.

[Figure 2.15 plot: relative light absorbance at 260 nm (vertical axis, 1.00 to 1.40) against temperature in °C (horizontal axis, 30 to 110, with Tm marked between 70 and 90). Annotations along the curve, from low to high temperature: double-stranded DNA; denaturation begins; temperature at which half the base pairs are denatured and half remain intact; further denaturation; strand separation; denatured single strands.]

Figure 2.15 Mechanism of denaturation of DNA by heat. The temperature at which 50 percent of the base pairs are denatured is the melting temperature, symbolized Tm.

As we saw in Section 2.2, the double-stranded helical structure of DNA is maintained by base stacking and by hydrogen bonding between the complementary base pairs. When solutions containing DNA fragments are raised to temperatures in the range 85–100°C, or to the high pH of strong alkaline solutions, the paired strands begin to separate, or “unzip.” Unwinding of the helix happens in less than a few minutes (the time depends on the length of the molecule). When the helical structure of DNA is disrupted, and the strands have become completely unzipped, the molecule is said to be denatured. A common way to detect denaturation is by measuring the capacity of DNA in solution to absorb ultraviolet light of wavelength 260 nm, because the absorption at 260 nm (A260) of a solution of single-stranded molecules is 37 percent higher than the absorption of the double-stranded molecules at the same concentration. As shown in Figure 2.15, the progress of denaturation can be followed by slowly heating a solution of double-stranded DNA and recording the value of A260 at various temperatures. The temperature required for denaturation increases with G + C content, not only because GC base pairs have three hydrogen bonds and AT base pairs two, but because consecutive GC base pairs have stronger base stacking.
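A melting curve of the kind described above can be turned into an estimate of Tm with a few lines of code. The sketch below is illustrative only. It assumes that the fraction of denatured base pairs rises linearly with relative A260 between 1.00 (fully double-stranded) and 1.37 (fully single-stranded), and it finds the temperature at which that fraction reaches one half by linear interpolation between readings. The readings and function names are invented.

```python
A_DOUBLE = 1.00  # relative A260 of fully double-stranded DNA
A_SINGLE = 1.37  # relative A260 of single strands (37 percent higher)


def fraction_denatured(a260):
    """Fraction of base pairs denatured, assuming a linear response of A260."""
    return (a260 - A_DOUBLE) / (A_SINGLE - A_DOUBLE)


def melting_temperature(readings):
    """Estimate Tm from (temperature in deg C, relative A260) pairs.

    The readings must be sorted by temperature. Returns the interpolated
    temperature at which half the base pairs are denatured, or None if the
    curve never crosses the halfway point.
    """
    points = [(t, fraction_denatured(a)) for t, a in readings]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical readings taken while slowly heating a DNA solution.
readings = [(70, 1.00), (75, 1.02), (80, 1.10), (85, 1.27), (90, 1.35), (95, 1.37)]
tm = melting_temperature(readings)
print(round(tm, 1))  # 82.5
```

A sample richer in G + C would shift the whole curve, and therefore the estimated Tm, toward higher temperatures.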


Denatured DNA strands can, under certain conditions, form double-stranded DNA with other strands, provided that the strands are sufficiently complementary in sequence. This process of renaturation is called nucleic acid hybridization because the double-stranded molecules are “hybrid” in that each strand comes from a different source. For DNA strands to hybridize, two requirements must be met:

1. The salt concentration must be high (> 0.25 M) to neutralize the negative charges of the phosphate groups, which would otherwise cause the complementary strands to repel one another.
2. The temperature must be high enough to disrupt hydrogen bonds that form at random between short sequences of bases within the same strand, but not so high that stable base pairs between the complementary strands are disrupted.

The initial phase of renaturation is a slow process because the rate is limited by the random chance that a region of two complementary strands will come together to form a short sequence of correct base pairs. This initial pairing step is followed by a rapid pairing of the remaining complementary bases and rewinding of the helix. Rewinding is accomplished in a matter of seconds, and its rate is independent of DNA concentration because the complementary strands have already found each other.

The example of nucleic acid hybridization in Figure 2.16 will enable us to understand some of the molecular details and also to see how hybridization is used to “tag” and identify a particular DNA fragment. Shown in part A is a solution of denatured DNA, called the probe, in which each molecule has been labeled with either radioactive atoms or light-emitting molecules. Probe DNA is typically obtained from a clone, and the labeled probe usually contains denatured forms of both strands present in the original duplex molecule. (This has led to some confusing terminology. Geneticists say that probe DNA hybridizes with DNA fragments containing sequences that are similar to the probe, rather than complementary. What actually occurs is that one strand of the probe undergoes hybridization with a complementary sequence in the fragment. But because the probe usually contains both strands, hybridization takes place with any fragment that contains a similar sequence, each strand in the probe undergoing hybridization with the complementary sequence in the fragment.)

Part B in Figure 2.16 is a diagram of genomic DNA fragments that have been immobilized on a nitrocellulose filter. When the probe is mixed with the genomic fragments (part C), random collisions bring short, complementary stretches together. If the region of complementary sequence is short (part D), then random collision cannot initiate renaturation because the flanking sequences cannot pair; in this case the probe falls off almost immediately. If, however, a collision brings short sequences together in the correct register (part E), then this initiates renaturation, because the pairing proceeds zipperlike from the initial contact. The main point is that DNA fragments are able to hybridize only if the length of


[Figure 2.16 diagram. (A) Fragments of denatured and labeled probe DNA; the denatured probe usually contains both complementary strands (example shown: GTATAATGCGAGCC paired with CATATTACGCTCGG). (B) Fragments of denatured genomic DNA immobilized on filter; some fragments in the genomic DNA may contain a sequence similar to that in the probe DNA. The two are mixed in a heat-sealed bag for renaturation. (C) Random collisions bring small regions of complementary sequences together to start the renaturation. (D) Initial pairing with incorrect fragment: base pairing cannot go farther because flanking sequences are not complementary; probe falls away. (E) Initial pairing with correct fragment: base pairing proceeds in a zipper-like fashion because flanking sequences are complementary; probe sticks.]

Figure 2.16 Nucleic acid hybridization. (A) Duplex molecules of probe DNA (obtained from a clone) are denatured and (B) placed in contact with a filter to which are attached denatured strands of genomic DNA. (C) Under the proper conditions of salt concentration and temperature, short complementary stretches come together by random collision. (D) If the sequences flanking the paired region are not complementary, then the pairing is unstable and the strands come apart again. (E) If the sequences flanking the paired region are complementary, then further base pairing stabilizes the renatured duplex.
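The logic of Figure 2.16 can be imitated with a simple string search. The sketch below is a deliberately crude model, not a description of hybridization kinetics: it treats one probe strand as able to pair wherever the target strand contains the probe's reverse complement, and it stands in for the stringency of the experiment with a single hypothetical parameter, `max_mismatch_fraction`. The probe is the 14-base sequence shown in the figure; the target sequence and all function names are invented.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the strand that pairs with `seq`, read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def hybridization_sites(target, probe, max_mismatch_fraction=0.0):
    """Positions in `target` (5'->3') where the probe strand can pair.

    Returns a list of (start position, number of mismatches). A window is
    accepted if its mismatches do not exceed the allowed fraction of the
    probe length; 0.0 demands perfect complementarity.
    """
    wanted = reverse_complement(probe)
    allowed = int(max_mismatch_fraction * len(probe))
    hits = []
    for start in range(len(target) - len(wanted) + 1):
        window = target[start:start + len(wanted)]
        mismatches = sum(a != b for a, b in zip(window, wanted))
        if mismatches <= allowed:
            hits.append((start, mismatches))
    return hits


probe = "GTATAATGCGAGCC"
# Hypothetical target strand with one perfect site and one site with a
# single mismatched base.
target = "TTAA" + "GGCTCGCATTATAC" + "CCGG" + "GGCTCGCTTTATAC" + "AT"

print(hybridization_sites(target, probe))       # [(4, 0)]
print(hybridization_sites(target, probe, 0.1))  # [(4, 0), (22, 1)]
```

Raising the allowed mismatch fraction plays the role of relaxing the hybridization conditions, so the probe also tags sequences that are similar but not identical to its exact complement.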

the region in which they can pair is sufficiently long. Some mismatches in the paired region can be tolerated. How many mismatches are allowed is determined by the conditions of the experiment: The lower the temperature at which the hybridization is carried out, and the higher the salt concentration, the greater the proportion of mismatches that are tolerated.

The Southern Blot

The ability to renature DNA in the manner outlined in Figure 2.16 means that a solution containing a small fragment of denatured DNA, if it is suitably labeled (for example, with radioactive 32P), can be combined with a complex mixture of denatured DNA fragments, and upon renaturation the small fragment will “tag” with radioactivity any molecules in the complex mixture with which it can hybridize. The radioactive tag allows these molecules to be identified.

The methods of DNA cleavage, electrophoresis, transfer to nitrocellulose, and hybridization with a probe are all combined in the Southern blot, named after its inventor Edwin Southern. In this procedure, a gel in which DNA molecules have been separated by electrophoresis is treated with


(A) DNA is cleaved; electrophoresis is used to separate DNA. Gel labels: size markers; DNA restriction fragments (so many that individual bands run together).

(B) DNA fragments are blotted onto nitrocellulose filter. Labels: buffer solution, gel, nitrocellulose filter, absorbent paper. DNA fragments transferred onto filter (invisible at this stage).

(C) Filter is exposed to radioactive probe. Labels: filter, probe in solution, heat-sealed bag. Probe hybridizes with homologous DNA fragments (still not visible).

(D) Filter is exposed to photographic film; film is developed. Bands on x-ray film created by labeled probe.

Figure 2.17 Southern blot. (A) DNA restriction fragments are separated by electrophoresis, blotted from the gel onto a nitrocellulose or nylon filter, and chemically attached by the use of ultraviolet light. (B) The strands are denatured and mixed with radioactive or light-sensitive probe DNA, which binds with complementary sequences present on the filter. The bound probe remains, whereas unbound probe washes off. (C) Bound probe is revealed by darkening of photographic film placed over the filter. The positions of the bands indicate which restriction fragments contain DNA sequences homologous to those in the probe.

alkali to denature the DNA and render it single-stranded (Figure 2.17). Then the DNA is transferred to a sheet of nitrocellulose filter in such a way that the relative positions of the DNA fragments are maintained. The transfer is accomplished by overlaying the nitrocellulose onto the gel and stacking many layers of absorbent paper on top; the absorbent paper sucks water molecules from the gel and through the nitrocellulose, to which the DNA fragments adhere. (This step is the “blot” component of the Southern blot, parts A and B.) Then the filter is treated so that the single-stranded DNA becomes permanently bound. The treated filter is mixed with a solution containing denatured probe (DNA or RNA) under conditions that allow complementary strands to hybridize to form duplex molecules (part C). Radioactive or other label present in the probe becomes stably bound to the filter, and therefore resistant to removal by washing, only at positions at which base sequences complementary to the probe are already present on the filter, so that the probe can form duplex molecules. The label is located by placing the paper in contact with x-ray film. After development of the film, blackened regions indicate positions of bands containing the radioactive or light-emitting label (part D).

The procedure in Figure 2.17 solves the wet-haystack problem by transferring and immobilizing the genomic DNA fragments to a filter and identifying, by hybridization, the ones that are of interest. Practical applications of Southern blotting center on identifying DNA fragments that contain sequences similar to the probe DNA or RNA, where the proportion of mismatched nucleotides allowed is determined by the conditions of hybridization. The advantages of the Southern blot are convenience and sensitivity. The sensitivity comes from the fact that both hybridization with a labeled probe and the use of photographic film amplify the signal; under typical conditions, a band can be observed on the film with only 5 × 10^-12 grams of DNA, a thousand times less DNA than the amount required to produce a visible band in the gel itself.

2.4 Selective Replication of Genomic DNA Fragments
Although nucleic acid hybridization allows a particular DNA fragment to be identified when present in a complex mixture of fragments, it does not enable the fragment to be separated from the others and purified. Obtaining the fragment in purified form requires cloning, which is straightforward but time-consuming. (Cloning methods are discussed in Chapter 13.) However, if the fragment of interest is not too long, and if the nucleotide sequence at each end is known, then it becomes possible to obtain large quantities of the fragment merely by


selective replication. This process is called amplification. How would one know the sequence of the ends? Let us return to our example of the 3.0-kb BamHI fragment that serves to mark a risk factor for breast cancer in certain pedigrees. Suppose that this fragment is cloned and sequenced from one affected individual, and it is found that, relative to the normal genomic sequence in this region, the BamHI fragment is missing a region of 500 base pairs. At this point the sequences at the ends of the fragment are known, and we can also infer that amplification of genomic DNA from individuals with the risk factor will yield a band of 3.0 kb, whereas amplification from genomic DNA of noncarriers will yield a band of 3.5 kb. This difference allows every person in the pedigree to be diagnosed as a carrier or noncarrier merely by means of DNA amplification. To understand how amplification works, it is first necessary to examine a few key features of DNA replication.

Constraints on DNA Replication: Primers and 5'-to-3' Strand Elongation
As with most metabolic reactions in living cells, nucleic acids are synthesized in chemical reactions controlled by enzymes. An enzyme that forms the sugar–phosphate bond (the phosphodiester bond) between adjacent nucleotides in a nucleic acid chain is called a DNA polymerase. A variety of DNA polymerases have been purified, and for amplification of a DNA fragment, the DNA synthesis is carried out in vitro by combining purified cellular components in a test tube under precisely defined conditions. (In vitro means “without participation of living cells.”)

In order for DNA polymerase to catalyze synthesis of a new DNA strand, preexisting single-stranded DNA must be present. Each single-stranded DNA molecule present in the reaction mix can serve as a template upon which a new partner strand is created by the DNA polymerase. For DNA replication to take place, the 5'-triphosphates of the four deoxynucleosides must also be present. This requirement is rather obvious, because the nucleoside triphosphates are the precursors from which new DNA strands are created. The triphosphates needed are the compounds denoted in Table 2.1 as dATP, dGTP, dTTP, and dCTP, which contain the bases adenine, guanine, thymine, and cytosine, respectively. Details of the structures of dCTP and dGTP are shown in Figure 2.18, in which the phosphate groups cleaved off during DNA synthesis are indicated. DNA synthesis requires all four nucleoside 5'-triphosphates and does not take place if any of them is omitted.

A feature found in all DNA polymerases is that:

A DNA polymerase can only elongate a DNA strand. It is not possible for DNA polymerase to initiate synthesis of a new strand, even when a template molecule is present.

One important implication of this principle is that DNA synthesis requires a preexisting segment of nucleic acid that is hydrogen-bonded to the template strand. This segment is called a primer. Because the primer molecule can be very short, it is an oligonucleotide, which literally means “few nucleotides.” As we shall see in Chapter 6, in living cells the primer is a short segment of RNA, but in DNA amplification in vitro, the primer employed is usually DNA.

Structural formulas: deoxycytidine 5'-triphosphate (dCTP) and deoxyguanosine 5'-triphosphate (dGTP). The outer two phosphate groups are cleaved off when nucleotides are added to the growing DNA strand.

Figure 2.18 Two deoxynucleoside triphosphates used in DNA synthesis. The outer two phosphate groups are removed during synthesis.


Diagram labels: template strand; newly synthesized strand; 5' end; 3' end.

1. Base pairing specifies the next nucleotide to be added at the 3' end.
2. The 3' hydroxyl group at the 3' end of the growing strand attacks the innermost phosphate group of the incoming nucleoside triphosphate.
3. An O–P bond is formed to attach the new nucleotide.
4. A P–P (pyrophosphate) group is released.

Figure 2.19 Addition of nucleotides to the 3'-OH terminus of a growing strand. The recognition step is shown as the formation of hydrogen bonds between the A and the T. The chemical reaction is that the 3'-OH group of the 3' end of the growing chain attacks the innermost phosphate group of the incoming nucleoside triphosphate.
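
The two constraints illustrated in Figure 2.19 (a primer is required, and nucleotides are added only at the 3' end) can be mimicked with a short sketch. The function name `extend_primer` and the example template are illustrative assumptions; the code models only base-pairing rules, not the chemistry of the polymerase.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, primer_5to3):
    """Extend a primer along a template, one nucleotide at a time, at the 3' end.

    template_3to5: template strand written 3'-to-5' (left to right).
    primer_5to3:   primer written 5'-to-3'; it must pair with the start of the template.
    Returns the finished new strand written 5'-to-3'.
    """
    # A polymerase can only elongate: without a paired primer nothing is made.
    if not primer_5to3:
        raise ValueError("a primer is required; synthesis cannot be initiated")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    strand = primer_5to3
    # Base pairing with the template selects each incoming nucleotide,
    # which is appended to the 3' end (the right-hand end) of the growing strand.
    for template_base in template_3to5[len(primer_5to3):]:
        strand += COMPLEMENT[template_base]
    return strand


# Made-up template; the primer is the 9-mer used for illustration in Figure 2.20.
print(extend_primer("TGTACTGCAGGATC", "ACATGACGT"))
```

Because the growing strand is written 5'-to-3' and only ever lengthens on the right, the sketch cannot elongate in the 3'-to-5' direction, mirroring the constraint described in the text.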

It is the 3' end of the primer that is essential, because, as emphasized in Chapter 1, DNA synthesis proceeds only by addition of successive nucleotides to the 3' end of the growing strand. In other words, chain elongation always takes place in the 5'-to-3' direction (5' → 3').

The reason for the 5' → 3' direction of chain elongation is illustrated in Figure 2.19. It is a consequence of the fact that the reaction catalyzed by DNA polymerase is the formation of a phosphodiester bond between the free 3'-OH group of the chain being extended and the innermost phosphorus atom of the nucleoside triphosphate being incorporated at the 3' end. Recognition of the appropriate incoming nucleoside triphosphate in replication depends on base pairing with the opposite nucleotide in the template strand. DNA polymerase will usually catalyze the polymerization reaction that incorporates the new nucleotide at the primer terminus only when the correct base pair is present. The same DNA polymerase is used to add each of the four deoxynucleoside phosphates to the 3'-OH terminus of the growing strand.

The Polymerase Chain Reaction
The requirement for an oligonucleotide primer, and the constraint that chain elongation must always occur in the 5' → 3' direction, make it possible to obtain large quantities of a particular DNA sequence by selective amplification in vitro. The method for selective amplification is called the polymerase chain reaction (PCR). For its invention, Californian Kary B. Mullis was awarded a Nobel Prize in 1993. PCR amplification uses DNA polymerase and a pair of short, synthetic oligonucleotide primers, usually 18–22 nucleotides in length, that are complementary in sequence to the ends of the DNA sequence to be amplified. Figure 2.20 gives an example in which the primer oligonucleotides (green) are 9-mers. These are too short for most practical purposes, but they will serve for illustration. The original duplex molecule (part A) is shown in blue. This duplex is mixed with a vast excess of primer molecules, DNA polymerase, and all four nucleoside triphosphates. When the temperature is raised, the strands of the duplex denature and become separated.


(A) Original duplex DNA
ACATGACGT ... CTATGCATG
TGTACTGCA ... GATACGTAC

(B) First cycle: primers attached. The primers ACATGACGT and GATACGTAC flank the region to be amplified.

(C) Elongation (DNA synthesis)

(D) Second cycle: after elongation

(E) Third cycle: after elongation

Figure 2.20 Role of primer sequences in PCR amplification. (A) Target DNA duplex (blue), showing sequences chosen as the primer-binding sites flanking the region to be amplified. (B) Primer (green) bound to denatured strands of target DNA. (C) First round of amplification. Newly synthesized DNA is shown in pink. Note that each primer is extended beyond the other primer site. (D) Second round of amplification (only one strand shown); in this round, the newly synthesized strand terminates at the opposite primer site. (E) Third round of amplification (only one strand shown); in this round, both strands are truncated at the primer sites. Primer sequences are normally about twice as long as shown here.
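
As a rough sketch of how the two primers in Figure 2.20 delimit the amplified region, the code below locates both primer-binding sites on the top strand and returns the sequence between and including them. The 9-mer primer-site sequences are taken from the figure; the flanking and internal sequences, and the function name `pcr_product`, are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pcr_product(top_strand, forward_primer, reverse_primer):
    """Return the top-strand sequence bounded by the two primers, or None.

    All sequences are written 5'-to-3'. The forward primer matches the top
    strand directly; the reverse primer pairs with the top strand, so its
    reverse complement is what appears in the top-strand sequence.
    """
    start = top_strand.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = top_strand.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return top_strand[start:end + len(reverse_site)]


forward = "ACATGACGT"   # left-hand 9-mer from Figure 2.20
reverse = "CATGCATAG"   # right-hand 9-mer (GATACGTAC in the figure, read 5'-to-3')

# Hypothetical genomic context around the two primer sites.
genomic = "TTGCA" + "ACATGACGT" + "GGCCTTAAGG" + "CTATGCATG" + "AACCG"

print(pcr_product(genomic, forward, reverse))
print(pcr_product("TTGCAGGCCTTAAGGAACCG", forward, reverse))  # no primer sites
```

The second call returns `None`, reflecting the point that a region can be amplified only when both primer-binding sites flank it.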

When the temperature is lowered again to allow renaturation, the primers, because they are in great excess, become annealed to the separated template strands (part B). Note that the primer sequences are different from each other but complementary to sequences present in opposite strands of the original DNA duplex and flanking the region to be amplified. The primers are oriented with their 3' ends pointing in the direction of the region to be amplified, because each DNA strand elongates only at the 3' end. After the primers have annealed, each is elongated by DNA polymerase using the original strand as a template, and the newly synthesized DNA strands (pink) grow toward each other as synthesis proceeds (part C). Note that:

A region of duplex DNA present in the original reaction mix can be PCR-amplified only if the region is flanked by the primer oligonucleotides.


To start a second cycle of PCR amplification, the temperature is raised again to denature the duplex DNA. Upon lowering of the temperature, the original parental strands anneal with the primers and are replicated as shown in Figure 2.20B and C. The daughter strands produced in the first round of amplification also anneal with primers and are replicated, as shown in part D. In this case, although the daughter duplex molecules are identical in sequence to the original parental molecule, they consist entirely of primer oligonucleotides and nonparental DNA that was synthesized in either the first or the second cycle of PCR. As successive cycles of denaturation, primer annealing, and elongation occur, the original parental strands are diluted out by the proliferation of new daughter strands until eventually, virtually every molecule produced in the PCR has the structure shown in part D.

The power of PCR amplification is that the number of copies of the template strand increases in exponential progression: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so forth, doubling with each cycle of replication. Starting with a mixture containing as little as one molecule of the fragment of interest, repeated rounds of DNA replication increase the number of amplified molecules exponentially. For example, starting with a single molecule, 25 rounds of DNA replication will result in 2^25 ≈ 3.4 × 10^7 molecules. This number of molecules of the amplified fragment is so much greater than that of the other unamplified molecules in the original mixture that the amplified DNA can often be used without further purification. For example, a single fragment of 3 kb in E. coli accounts for only 0.06 percent of the DNA in this organism. However, if this single fragment were replicated through 25 rounds of replication, then 99.995 percent of the resulting mixture would consist of the amplified sequence. A 3-kb fragment of human DNA constitutes only 0.0001 percent of the total genome size. Amplification of a 3-kb fragment of human DNA to 99.995 percent purity would require about 34 cycles of PCR.

An overview of the polymerase chain reaction is shown in Figure 2.21. The DNA sequence to be amplified is again shown in blue and the oligonucleotide primers in green. The oligonucleotides anneal to the ends of the sequence to be amplified and become the substrates for chain elongation by DNA polymerase. In the first cycle of PCR amplification, the DNA is denatured to separate the strands. The denaturation temperature is usually around 95°C. Then the temperature is decreased to allow annealing in the presence of a vast excess of the primer oligonucleotides. The annealing temperature is typically in the range of 50°C–60°C, depending largely on the G + C content of the oligonucleotide primers. To complete the cycle, the temperature is raised slightly, to about 70°C, for the elongation of each primer. The steps of denaturation, renaturation, and replication are repeated 20–30 times, and in each cycle the number of molecules of the amplified sequence is doubled.

Implementation of PCR with conventional DNA polymerases is not practical, because at the high temperature necessary for denaturation, the polymerase is itself irreversibly unfolded (denatured) and becomes inactive. However, DNA polymerase isolated from certain organisms is heat stable because the organisms normally live in hot springs at temperatures well above 90°C, such as are found in Yellowstone National Park. Such organisms are said to be thermophiles. The most widely used heat-stable DNA polymerase is called Taq polymerase, because it was originally isolated from the thermophilic bacterium Thermus aquaticus.

PCR amplification is very useful for generating large quantities of a specific DNA sequence. The principal limitation of the technique is that the DNA sequences at the ends of the region to be amplified must be known so that primer oligonucleotides can be synthesized. In addition, sequences longer than about 5000 base pairs cannot be replicated efficiently by conventional PCR procedures, although there are modifications of PCR that allow longer fragments to be amplified. On the other hand, many applications require amplification of relatively small fragments. The major advantage of PCR amplification is that it requires only trace amounts of template DNA. Theoretically only one template molecule is required, but in practice the amplification of a single molecule may fail because the molecule may, by chance, be broken or damaged. But amplification is usually reliable with as


(A) First cycle: DNA sequence to be amplified; denaturation and annealing of primer oligonucleotides; DNA replication.

(B) Second cycle

(C) 20–30 cycles: amplified DNA sequences

Figure 2.21 Polymerase chain reaction (PCR) for amplification of particular DNA sequences. Only
the region to be amplified is shown. Oligonucleotide primers (green) that are complementary to the
ends of the target sequence (blue) are used in repeated rounds of denaturation, annealing, and DNA
replication. Newly replicated DNA is shown in pink. The number of copies of the target sequence
doubles in each round of replication, eventually overwhelming any other sequences that may be
present.
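
The doubling arithmetic behind PCR amplification can be checked with a short calculation. This is a minimal sketch under stated assumptions: every cycle doubles the fragment perfectly, the starting material is one genome copy, and purity is measured by DNA length. The E. coli genome size of 5 × 10^6 bp is an assumption back-calculated from the text's figure of 0.06 percent for a 3-kb fragment; the human value of 3 × 10^9 bp is the haploid size quoted in this chapter. The function names are illustrative.

```python
def copies_after(cycles, starting_copies=1):
    """Number of fragment copies after ideal doubling in each cycle."""
    return starting_copies * 2 ** cycles


def purity(fragment_bp, genome_bp, cycles):
    """Fraction of total DNA (by length) that is amplified fragment.

    Assumes one starting genome copy containing one copy of the fragment;
    the rest of the genome is not amplified.
    """
    amplified = fragment_bp * copies_after(cycles)
    background = genome_bp - fragment_bp
    return amplified / (amplified + background)


def cycles_for_purity(fragment_bp, genome_bp, target):
    """Smallest number of ideal cycles at which purity reaches `target`."""
    cycles = 0
    while purity(fragment_bp, genome_bp, cycles) < target:
        cycles += 1
    return cycles


ECOLI_BP = 5_000_000        # assumed, implied by "0.06 percent" for a 3-kb fragment
HUMAN_BP = 3_000_000_000    # haploid human genome size given in the text

print(copies_after(25))                           # 2**25 copies
print(purity(3000, ECOLI_BP, 0))                  # starting fraction in E. coli
print(purity(3000, ECOLI_BP, 25))                 # fraction after 25 cycles
print(cycles_for_purity(3000, HUMAN_BP, 0.99995)) # cycles needed for human DNA
```

Under these idealized assumptions the loop lands within one cycle of the "about 34" quoted in the text; real reactions double less than perfectly, so more cycles may be needed in practice.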


few as 10–100 template molecules, which makes PCR amplification 10,000–100,000 times more sensitive than detection via nucleic acid hybridization.

The exquisite sensitivity of PCR amplification has led to its use in DNA typing for criminal cases in which a minuscule amount of biological material has been left behind by the perpetrator (skin cells on a cigarette butt or hair-root cells on a single hair can yield enough template DNA for amplification). In research, PCR is widely used in the study of independent mutations in a gene whose sequence is known in order to identify the molecular basis of each mutation, to study DNA sequence variation among alternative forms of a gene that may be present in natural populations, or to examine differences among genes with the same function in different species. The PCR procedure has also come into widespread use in clinical laboratories for diagnosis. To take just one very important example, the presence of the human immunodeficiency virus (HIV), which causes acquired immune deficiency syndrome (AIDS), can be detected in trace quantities in blood banks via PCR by using primers complementary to sequences in the viral genetic material. These and other applications of PCR are facilitated by the fact that the procedure lends itself to automation by the use of mechanical robots to set up and run the reactions.

2.5 The Terminology of Genetic Analysis
In order to discuss the types of DNA markers that modern geneticists commonly use in genetic analysis, we must first introduce some key terms that provide the essential vocabulary of genetics. These terms can be understood with reference to Figure 2.22. In Chapter 1 we defined a gene as an element of heredity, transmitted from parents to offspring in reproduction, that influences one or more hereditary traits. Chemically, a gene is a sequence of nucleotides along a DNA molecule. In a population of organisms, not all copies of a gene may have exactly the same nucleotide sequence. For example, whereas one form of a gene has the codon GCA, which specifies alanine in the polypeptide chain that the gene encodes, another form of the same gene may have, at the same position, the codon GCG, which also specifies alanine. Hence the two forms of the gene encode the same sequence of amino acids yet differ in DNA sequence. The alternative forms of a gene are called alleles of the gene. Different alleles may also code for different amino acid sequences, sometimes with drastic effects. Recall the example of the PAH gene for phenylalanine hydroxylase in Chapter 1, in which a change in codon 408 from CGG (arginine) to TGG (tryptophan) results in an inactive enzyme that becomes expressed as the inborn error of metabolism phenylketonuria.

Within a cell, genes are arranged in linear order along microscopic thread-like bodies called chromosomes, which we will examine in detail in Chapters 4 and 8. Each human reproductive cell contains one complete set of 23 chromosomes containing 3 × 10^9 base pairs of DNA. A typical chromosome contains several hundred to several thousand genes. In humans the average is approximately 3500 genes per chromosome. Each chromosome contains a single molecule of duplex DNA along its length, complexed with proteins and very tightly coiled. The DNA in the average human chromosome, when fully extended, has relative dimensions comparable to those of a wet spaghetti noodle 25 miles long; when the DNA is coiled in the form of a chromosome, its physical compaction is comparable to that of the same noodle coiled and packed into an 18-foot canoe.

The physical position of a gene along a chromosome is called the locus of the gene. In most higher organisms, including human beings, each cell other than a sperm or egg contains two copies of each type of chromosome, one inherited from the mother and one from the father. Each member of such a pair of chromosomes is said to be homologous to the other. (The chromosomes that determine sex are an important exception, discussed in Chapter 4, that we will ignore for now.) At any locus, therefore, each individual carries two alleles, because one allele is present at a corresponding position in each of the homologous maternal and paternal chromosomes (Figure 2.22).


The genetic constitution of an individual is called its genotype. For a particular gene, if the two alleles at the locus in an individual are indistinguishable from each other, then for this gene the genotype of the individual is said to be homozygous for the allele that is present. If the two alleles at the locus are different from each other, then for this gene the genotype of the individual is said to be heterozygous for the alleles that are present. Typographically, genes are indicated in italics, and alleles are typically distinguished by uppercase or lowercase letters (A versus a), subscripts (A1 versus A2), superscripts (a+ versus a−), or sometimes just + and −. Using these symbols, homozygous genes would be portrayed by any of these formulas: AA, aa, A1A1, A2A2, a+a+, a−a−, +/+, or −/−. As in the last two examples, the slash is sometimes used to separate alleles present in homologous chromosomes to avoid ambiguity. Heterozygous genes would be portrayed by any of the formulas Aa, A1A2, a+a−, or +/−. In Figure 2.22, the genotype Bb is heterozygous because the B and b alleles are distinguishable (which is why they are assigned different symbols), whereas the genotype CC is homozygous. These genotypes could also be written as B/b and C/C, respectively.

Diagram labels: homologous chromosomes carrying gene A, gene B, and gene C; locus (physical position) of gene A in each homologous chromosome; heterozygous genotype Bb; homozygous genotype CC. One of each pair of chromosomes is maternal in origin, the other paternal. Many different A alleles (A1, A2, A3, A4, A5, and so on) can exist in an entire population of organisms, but only a single allele can be present at the locus of the A gene in any one chromosome. Genotypes are sometimes written with a slash (for example, B/b and C/C) to distinguish the alleles in homologous chromosomes.

Figure 2.22 Key concepts and terms used in modern genetics. Note that a single gene can have any number of alleles in the population as a whole, but no more than two alleles can be present in any one individual.

Whereas the alleles that are present in an individual constitute its genotype, the physical or biochemical expression of the genotype is called the phenotype. To put it as simply as possible, the distinction is that the genotype of an individual is what is on the inside (the alleles in the DNA), whereas the phenotype is what is on the outside (the observable traits, including biochemical traits, behavioral traits, and so forth). The distinction between genotype and phenotype is critically important because there usually is not a one-to-one correspondence between genes and traits. Most complex traits, such as hair color, skin color, height, weight, behavior, life span, and reproductive fitness, are influenced by many genes. Most traits are also influenced more or less strongly by environment. This means that the same genotype can result in different phenotypes, depending on the environment. Compare, for example, two people with a genetic risk for lung cancer; if one smokes and the other does not, the smoker is much more likely to develop the disease. Environmental effects also imply that the same phenotype can result from more than one genotype; smoking again provides an example, because most smokers who are not genetically at risk can also develop lung cancer.

2.6 Types of DNA Markers Present in Genomic DNA
Genetic variation, in the form of multiple alleles of many genes, exists in most natural populations of organisms. We have called such genetic differences between individuals DNA markers; they are also called DNA polymorphisms. (The term polymorphism literally means “multiple forms.”) The methods of DNA manipulation examined in Sections 2.3 and 2.4 can be used in a variety of combinations to detect differences among individuals. Anyone who reads the literature in modern genetics will encounter a bewildering variety of acronyms referring to different ways in which genetic polymorphisms are detected. The different approaches are in use because no single


method is ideal for all applications, each method has its own advantages and limitations, and new methods are continually being developed. In this section we examine some of the principal methods for detecting DNA polymorphisms among individuals.

Single-Nucleotide Polymorphisms (SNPs)
A single-nucleotide polymorphism, or SNP (pronounced “snip”), is present at a particular nucleotide site if the DNA molecules in the population frequently differ in the identity of the nucleotide pair that occupies the site. For example, some DNA molecules may have a TA base pair at a particular nucleotide site, whereas other DNA molecules in the same population may have a CG base pair at the same site. This difference constitutes a SNP. The SNP defines two “alleles” for which there could be three genotypes among individuals in the population: homozygous with TA at the corresponding site in both homologous chromosomes, homozygous with CG at the corresponding site in both homologous chromosomes, or heterozygous with TA in one chromosome and CG in the homologous chromosome. The word allele is in quotation marks above because the SNP need not be in a coding sequence, or even in a gene. In the human genome, any two randomly chosen DNA molecules are likely to differ at one SNP site about every 1000–3000 bp in protein-coding DNA and at about one SNP site every 500–1000 bp in noncoding DNA. Note, in the definition of a SNP, the stipulation that DNA molecules must differ at the nucleotide site “frequently.” This provision excludes rare genetic variation of the sort found in less than 1 percent of the DNA molecules in a population. The reason for the exclusion is that genetic variants that are too rare are not generally as useful in genetic analysis as the more common variants. A catalog of SNPs is regarded as the ultimate compendium of DNA markers, because SNPs are the most common form of genetic differences among people and because they are distributed approximately uniformly along the chromosomes. By the middle of 2001, some 300,000 SNPs are expected to have been identified in human populations and their positions in the chromosomes located.

Restriction Fragment Length Polymorphisms (RFLPs)
Although most SNPs require DNA sequencing to be studied, those that happen to be located within a restriction site can be analyzed with a Southern blot. An example of this situation is shown in Figure 2.23, where the SNP consists of a TA nucleotide pair in some molecules and a CG pair in others. In this example, the polymorphic nucleotide

(A) Polymorphic site: GAATTC
5'-GAATTC ... GAATTC ... GAATTC-3'
3'-CTTAAG ... CTTAAG ... CTTAAG-5'
Treatment of DNA with EcoRI: cleavage. Result: Two fragments

(B) Polymorphic site: GAACTC
5'-GAATTC ... GAACTC ... GAATTC-3'
3'-CTTAAG ... CTTGAG ... CTTAAG-5'
Treatment of DNA with EcoRI: no cleavage at the polymorphic site. Result: One larger fragment

Figure 2.23 A minor difference in the DNA sequence of two molecules can be detected if the differ-
ence eliminates a restriction site. (A) This molecule contains three restriction sites for EcoRI, includ-
ing one at each end. It is cleaved into two fragments by the enzyme. (B) This molecule has an altered
EcoRI site in the middle, in which 5'-GAATTC-3' becomes 5'-GAACTC-3'. The altered site cannot be
cleaved by EcoRI, so treatment of this molecule with EcoRI results in one larger fragment.
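
The effect shown in Figure 2.23 can be reproduced with a small digestion sketch. EcoRI recognizes 5'-GAATTC-3' and cuts between the G and the first A; everything else here is an illustrative assumption, including the spacer sequences, the function names, and the simplification of tracking only one strand and only the fragments lying between two cut sites.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts after the G: G^AATTC


def cut_positions(seq, site=ECORI_SITE, cut_offset=CUT_OFFSET):
    """Return the positions in `seq` at which the enzyme cuts."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    return cuts


def internal_fragments(seq):
    """Return lengths of the fragments bounded by a cut site on both sides."""
    cuts = cut_positions(seq)
    return [right - left for left, right in zip(cuts, cuts[1:])]


# Hypothetical spacers between the three sites; they contain no EcoRI site.
SPACER_1 = "ACGTACGTAC"
SPACER_2 = "TTGACCTTGACCTTG"

molecule_a = "GAATTC" + SPACER_1 + "GAATTC" + SPACER_2 + "GAATTC"  # middle site intact
molecule_b = "GAATTC" + SPACER_1 + "GAACTC" + SPACER_2 + "GAATTC"  # middle site altered

print(internal_fragments(molecule_a))  # two fragments
print(internal_fragments(molecule_b))  # one larger fragment
```

The single fragment from the altered molecule has the same length as the two fragments from the intact molecule combined, which is why the polymorphism shows up as a difference in restriction fragment length.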


site is included in a cleavage site for the restriction enzyme EcoRI (5'-GAATTC-3'). The two nearest flanking EcoRI sites are also shown. In this kind of situation, DNA molecules with TA at the SNP will be cleaved at both flanking sites and also at the middle site, yielding two EcoRI restriction fragments. Alternatively, DNA molecules with CG at the SNP will be cleaved at both flanking sites but not at the middle site (because the presence of CG destroys the EcoRI restriction site) and so will yield only one larger restriction fragment. A SNP that eliminates a restriction site is known as a restriction fragment length polymorphism, or RFLP (pronounced either as “riflip” or by spelling it out).

Because RFLPs change the number and size of DNA fragments produced by digestion with a restriction enzyme, they can be detected by the Southern blotting procedure discussed in Section 2.3. An example appears in Figure 2.24. In this case the labeled probe DNA hybridizes near the restriction site at the far left and identifies the position of this restriction fragment in the electrophoresis gel. The duplex molecule labeled “allele A” has a restriction site in the middle and, when cleaved and subjected to electrophoresis, yields a small band that

[Figure 2.24 diagram. Labels: positions of cleavage sites; site of hybridization with probe DNA; direction of current, from larger DNA fragments to smaller DNA fragments. Top: duplex DNA of “allele” A and of “allele” a, each with its DNA band. Bottom: duplex DNA in homologous chromosomes for homozygous AA, heterozygous Aa, and homozygous aa, with the DNA band or bands from each genotype.]

Figure 2.24 In a restriction fragment length polymorphism (RFLP), alleles may differ in the presence or absence of a cleavage site in the DNA. In this example, the a allele lacks a restriction site that is present in the DNA of the A allele. The difference in fragment length can be detected by Southern blotting. RFLP alleles are codominant, which means (as shown at the bottom) that DNA from the heterozygous Aa genotype yields each of the single bands observed in DNA from homozygous AA and aa genotypes.
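
The codominance described in the caption of Figure 2.24 can be expressed as a small lookup. The sketch below is illustrative only; the function and dictionary names are hypothetical, and the band sizes are labeled simply "small" and "large" as in the text.

```python
# Band produced by each RFLP allele on the Southern blot:
# allele A has the middle restriction site (small band), allele a lacks it (large band).
BAND_FOR_ALLELE = {"A": "small", "a": "large"}


def southern_bands(genotype):
    """Return the sorted list of distinct bands expected for a two-allele genotype."""
    return sorted({BAND_FOR_ALLELE[allele] for allele in genotype})


for genotype in ("AA", "Aa", "aa"):
    print(genotype, southern_bands(genotype))
```

Because each of the three genotypes gives a different band pattern, the genotype can be read directly from the gel, which is what codominance means here.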

2.6 Types of DNA Markers Present in Genomic DNA 67

Origin of the Human Genetic Linkage Map

David Botstein,1 Raymond L. White,2 Mark Skolnick,3 and Ronald W. Davis4 1980
1 Massachusetts Institute of Technology, Cambridge, Massachusetts
2 University of Massachusetts Medical Center, Worcester, Massachusetts
3 University of Utah, Salt Lake City, Utah
4 Stanford University, Stanford, California

Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms

This historic paper stimulated a major international effort to establish a genetic linkage map of the human genome based on DNA polymorphisms. Pedigree studies using these genetic markers soon led to the chromosomal localization and identification of mutant genes for hundreds of human diseases. A more ambitious goal, still only partly achieved, is to understand the genetic and environmental interactions involved in complex traits such as heart disease and cancer. The "small set of large pedigrees" called for in the excerpt was soon established by the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris, France, and made available to investigators worldwide for genetic linkage studies. Today the CEPH maintains a database on the individuals in these pedigrees that comprises approximately 12,000 polymorphic DNA markers and 2.5 million genotypes.

No method of systematically mapping human genes has been devised, largely because of the paucity of highly polymorphic marker loci. The advent of recombinant DNA technology has suggested a theoretically possible way to define an arbitrarily large number of arbitrarily polymorphic marker loci. . . . A subset of such polymorphisms can readily be detected as differences in the length of DNA fragments after digestion with DNA sequence-specific restriction endonucleases. These restriction fragment length polymorphisms (RFLPs) can be easily assayed in individuals, facilitating large population studies. . . . [Genetic mapping] of many DNA marker loci should allow the establishment of a set of well-spaced, highly polymorphic genetic markers covering the entire human genome [and enabling] any trait caused wholly or partially by a major locus segregating in a pedigree to be mapped. Such a procedure would not require any knowledge of the biochemical nature of the trait or of the nature of the alterations in the DNA responsible for the trait. . . . The most efficient procedure will be to study a small set of large pedigrees which have been genotyped for all known polymorphic markers. . . . The resolution of genetic and environmental components of disease . . . must involve unraveling the underlying genetic predisposition, understanding the environmental contributions, and understanding the variability of expression of the phenotype. In principle, linked marker loci can allow one to establish, with high certainty, the genotype of an individual and, consequently, assess much more precisely the contribution of modifying factors such as secondary genes, likelihood of expression of the phenotype, and environment.

Source: American Journal of Human Genetics 32: 314–331

contains sequences homologous to the probe DNA. The duplex molecule labeled “allele a” lacks the middle restriction site and yields a larger band. In this situation there can be three genotypes (AA, Aa, or aa, depending on which alleles are present in the homologous chromosomes), and all three genotypes can be distinguished as shown in Figure 2.24. Homozygous AA yields only a small fragment, homozygous aa yields only a large fragment, and heterozygous Aa yields both a small and a large fragment. Because the presence of both the A and a alleles can be detected in heterozygous Aa genotypes, A and a are said to be codominant. In Figure 2.24, the bands from AA and aa have been shown as somewhat thicker than those from Aa, because each AA genotype has two copies of the A allele and each aa genotype has two copies of the a allele, compared with only one copy of each allele in the heterozygous genotype Aa.

Random Amplified Polymorphic DNA (RAPD)

For studying DNA markers, one limitation of Southern blotting is that it requires material (probes available in the form of cloned DNA) and one limitation of PCR is that it requires sequence information (so primer oligonucleotides can be synthesized). These are not severe handicaps for organisms that are well studied (for example, human beings, domesticated animals and cultivated plants, and model genetic organisms such as yeast, fruit fly, nematode, or mouse), because research materials and sequence information are readily available. But for the vast majority of organisms that biologists study, there are neither research materials nor sequence information. Genetic analysis can still be carried out in these organisms by using an approach called random amplified polymorphic DNA or RAPD (pronounced “rapid”), described in this section.

RAPD analysis makes use of a set of PCR primers of 8–10 nucleotides whose sequence is essentially random. The random primers are tried individually or in pairs in PCR reactions to amplify fragments of genomic DNA from the organism of interest. Because the primers are so short, they often anneal to genomic DNA at multiple sites. Some primers anneal in the proper orientation and at a suitable distance from each other to support amplification of the unknown sequence between them. Among the set of amplified fragments are ones that can be amplified from some genomic DNA samples but not from others, which means that the presence or absence of the amplified fragment is polymorphic in the population of organisms.

In most organisms it is usually straightforward to identify a large number of RAPDs that can serve as genetic markers for many different kinds of genetic studies. An example of RAPD gel analysis is illustrated in Figure 2.25, where three pairs of primers (sets 1–3) are used to amplify genomic DNA from four individuals in a population. The fragments that amplify are then separated on an electrophoresis gel and visualized after staining with ethidium bromide. Many amplified bands are typically observed for each primer set, but only some of these are

[Figure 2.25 diagram. RAPD bands on an electrophoresis gel for RAPD primer sets 1, 2, and 3, each tested on individuals A, B, C, and D. Labels: monomorphic bands; polymorphic bands (fragments that PCR-amplify with genomic DNA from some individuals but not others, using the same primer set).]

Figure 2.25 Random amplified polymorphic DNA (RAPD) is detected through the use of relatively short primer sequences that, by chance, match genomic DNA at multiple sites that are close enough together to support PCR amplification. Genomic DNA from a single individual typically yields many bands, only some of which are polymorphic in the population. Different sets of primers amplify different fragments of genomic DNA.


polymorphic. These are indicated in Figure 2.25 by the colored dots. The amplified bands that are not polymorphic are said to be monomorphic in the sample, which means that they are the same from one individual to the next. This example shows 17 RAPD polymorphisms. Figure 2.26 shows an actual RAPD gel amplified from genomic DNA obtained from small tissue samples from a population of fish (Campostoma anomalum) in the Great Miami River Basin, Ohio. The fish were collected as part of a water quality assessment program to determine whether fish populations in stressful water environments progressively lose their genetic variation (that is, become increasingly monomorphic). Each pair of samples is flanked by a lane containing DNA size standards producing a “ladder” of fragments at 100-bp increments.

[Figure 2.26 gel image not reproduced; size labels at 900 bp and 400 bp.]

Figure 2.26 RAPD polymorphisms in the stoneroller fish (Campostoma anomalum) trapped in tributaries of the Great Miami River in Ohio. Each pair of samples is flanked by a lane containing DNA size standards; in these lanes, the smallest DNA fragment is 100 base pairs (bp), and each successively larger fragment increases in size by 100 bp. Fragments whose sizes are multiples of 500 bp are present in greater concentration and so yield darker bands. [Courtesy of Michael Simonich, Manju Garg, and Ana Braam (Pathology Associates International, Cincinnati, Ohio).]

Figures 2.24 through 2.26 illustrate an important point:

In modern genetics, the phenotypes that are studied are very often bands in a gel rather than physical or physiological characteristics.

Figure 2.25 offers a good example. Each position at which a band is observed in one or more samples is a phenotype, whether or not the band is polymorphic. For example, primer set 1 yields a total of 19 bands, of which 5 are polymorphic and 14 are monomorphic in the sample. The phenotypes could be named in any convenient way, such as by indicating the primer set and the fragment length. For example, suppose that the smallest amplified fragment for primer set 1 is 125 bp, which is the polymorphic fragment at the bottom left in Figure 2.25. We could name this fragment unambiguously as 1-125 because it is a fragment of 125 bp amplified by primer set 1.

To understand why a DNA band is a phenotype, rather than a gene or a genotype, it is useful to assign different names to the “alleles” that do or do not support amplification. (The word allele is in quotation marks again, because the 1-125 fragment that is amplified need not be part of a gene.) We are talking only about the 1-125 fragment, so we could call the allele capable of supporting amplification the plus (+) allele and the allele not capable of supporting amplification the minus (-) allele. Then there are three possible genotypes with regard to the amplified fragment: +/+, +/-, and -/-. Using genomic DNA from these genotypes, the homozygous +/+ and heterozygous +/- will both support amplification of the 1-125 fragment, whereas the -/- genotype will not support amplification. Hence the presence of the 1-125 fragment is the phenotype observed in both +/+ and +/- genotypes. In other words, with regard to amplification, the + allele is dominant to the - allele, because the phenotype (presence of the 1-125 band) is present in both homozygous +/+ and heterozygous +/- genotypes. Therefore, on the basis of the phenotype for the 1-125 band in Figure 2.25, we could say that individuals A and D could have either a +/+ or +/- genotype but that individuals B and C must have genotype -/-.
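
The reasoning about the 1-125 band can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes, as in the text, that a single + allele is enough for the band to amplify.

```python
# The three possible genotypes for a RAPD band, written as pairs of "alleles".
ALL_GENOTYPES = [("+", "+"), ("+", "-"), ("-", "-")]


def band_present(genotype):
    """A RAPD band amplifies if at least one allele supports amplification."""
    return "+" in genotype


def genotypes_consistent_with(band_observed):
    """List the genotypes that would give the observed band phenotype."""
    return [g for g in ALL_GENOTYPES if band_present(g) == band_observed]


print(genotypes_consistent_with(True))   # band present: genotype is ambiguous
print(genotypes_consistent_with(False))  # band absent: only one genotype fits
```

A band that is present is compatible with two genotypes, whereas an absent band identifies the genotype uniquely. This is the practical difference between a dominant marker such as a RAPD and a codominant marker such as an RFLP.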


Amplified Fragment Length Polymorphisms (AFLPs)

Because RAPD primers are small and may not match the template DNA perfectly, the amplified DNA bands often differ a great deal in how dark or light they appear. This variation creates a potential problem, because some exceptionally dark bands may actually result from two amplified DNA fragments of the same size, and some exceptionally light bands may be difficult to visualize consistently. To obtain amplified fragments that yield more uniform band intensities, double-stranded oligonucleotide sequences that match the primer sequences perfectly can be attached to genomic restriction fragments enzymatically prior to amplification. This method, which is outlined in Figure 2.27, yields a class of DNA polymorphisms known as amplified fragment length polymorphisms, or AFLPs (usually pronounced by spelling it out).

The first step (part A) is to digest genomic DNA with a restriction enzyme; this example uses the enzyme EcoRI, whose restriction site is 5'-GAATTC-3'. Digestion yields a large number of restriction fragments flanked by what remains of an EcoRI site on each side. In the next step (part B), double-stranded oligonucleotides called primer adapters, with single-stranded overhangs complementary to those on the restriction fragments, are ligated onto the restriction fragments using the enzyme DNA ligase. The resulting fragments (C) are ready for amplification by means of PCR. Note that the same adapter is ligated onto each end, so a single primer sequence will anneal to both ends and support amplification.

(A) Genomic DNA with two EcoRI sites

5'–GAATTC ... GAATTC–3'
3'–CTTAAG ... CTTAAG–5'

Cleavage

(B) Primer adapter, restriction fragment, primer adapter

Primer adapter:        5'–NN…NN–3'
                       3'–NN…NNTTAA–5'
Restriction fragment:  5'–AATTC ... G–3'
                       3'–G ... CTTAA–5'
Primer adapter:        5'–AATTNN…NN–3'
                       3'–NN…NN–5'

Adapter ligation

Amplification

(D) Primer sequence                              Fraction of fragments amplified
5'–NN…NNAATTC–3'      3'–CTTAANN…NN–5'          All
5'–NN…NNAATTCA–3'     3'–ACTTAANN…NN–5'         1/16
5'–NN…NNAATTCAC–3'    3'–CACTTAANN…NN–5'        1/256
5'–NN…NNAATTCACT–3'   3'–TCACTTAANN…NN–5'       1/4096

Nucleotide extensions reduce the number of amplifications.

Figure 2.27 An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA is
digested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide adaptors are
ligated onto the fragments; note that the single-stranded overhang of the adaptors matches those of
the genomic DNA fragments. (D) The resulting fragments are subjected to PCR using primers
complementary to the adaptors. The number of amplified fragments can be adjusted by manipulat-
ing the number of nucleotides in the adaptors that are also present in the primers.
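
The fractions listed in part D of Figure 2.27 follow from a simple probability argument, sketched below. The function name is hypothetical, and the calculation assumes that the four nucleotides are equally frequent and independent at the positions adjacent to the EcoRI site, which is the simplification implicit in the figure.

```python
from fractions import Fraction


def expected_fraction_amplified(extension_length):
    """Expected fraction of restriction fragments amplified by an AFLP primer.

    Each extra 3' nucleotide must find its complement next to the EcoRI site
    (probability 1/4 per position, assuming equal base frequencies), and the
    match is required independently at both ends of the fragment.
    """
    per_end = Fraction(1, 4) ** extension_length
    return per_end ** 2


for k in range(4):
    print(k, expected_fraction_amplified(k))
```

Each additional nucleotide in the 3' extension therefore reduces the expected number of amplified fragments by a factor of 16.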


There are nevertheless a number of choices concerning the primer sequence. A primer that matches the adapters perfectly will amplify all fragments, but this often results in so many amplified fragments that they are not well separated in the gel. A PCR primer must match perfectly at its 3' end to be elongated. Thus, additional nucleotides added to the 3' end reduce the number of amplified fragments, because these primers will amplify only those fragments that, by chance, have a complementary nucleotide immediately adjacent to the EcoRI site. For example, the primer sequence in the second row has a single-nucleotide 3' extension; because only 1/4 × 1/4 of the fragments would be expected to have a complementary T immediately adjacent to the EcoRI site on both sides, this primer is expected to amplify 1/16 of all the restriction fragments. Similarly, the primer sequence in the third row has a two-nucleotide 3' extension, so this primer is expected to amplify 1/16 × 1/16 = 1/256 of all the fragments. One application of AFLP analysis is to organisms with large genomes, such as grasshoppers and crickets, for which RAPD analysis would yield an excessive number of amplified bands. How large is a “large” genome? Compared to the human genome, that of the brown mountain grasshopper Podisma pedestris is 7 times larger, and those of North American salamanders in the genus Amphiuma are 70 times larger! (Genome size is discussed in Chapter 8.)

Simple Tandem Repeat Polymorphisms (STRPs)

One more type of DNA polymorphism warrants consideration because it is useful in DNA typing for individual identification and for assessing the degree of genetic relatedness between individuals. This type of polymorphism is called a simple tandem repeat polymorphism (STRP) because the genetic differences among DNA molecules consist of the number of copies of a short DNA sequence that may be repeated many times in tandem at a particular locus in the genome. STRPs that are present at different loci may differ in the sequence and length of the repeating unit, as well as in the minimum and maximum number of tandem copies that occur in DNA molecules in the population. A STRP with a repeating unit of 2–9 bp is often called a microsatellite or a simple sequence length polymorphism (SSLP), whereas a STRP with a repeating unit of 10–60 bp is often called a minisatellite or a variable number of tandem repeats (VNTR).

Figure 2.28 shows an example of a STRP with a copy number ranging from 1 through 10. Because the number of copies determines the size of any restriction fragment that includes the STRP, each DNA molecule yields a single-size restriction fragment depending on the number of copies it contains. The STRP in Figure 2.28 has 10 different “alleles” (again, we use quotation marks because the STRP may not

[Figure 2.28 diagram. Ten duplex DNA molecules carrying 1 through 10 tandem repeats of a DNA sequence between the positions of the flanking cleavage sites, each shown beside the position of its band in a DNA gel. The direction of current runs from larger DNA fragments to smaller DNA fragments.]

Figure 2.28 In a simple tandem repeat polymorphism (STRP), the alleles in a population differ in the
number of copies of a short sequence (typically 2–60 bp) that is repeated in tandem along the DNA
molecule. This example shows alleles in which the repeat number varies from 1 to 10. Cleavage at
restriction sites flanking the STRP yields a unique fragment length for each allele. The alleles can also
be distinguished by the size of the fragment amplified by PCR using primers that flank the STRP.

be in a gene), which could be distinguished either by Southern blotting using a probe to a unique (nonrepeating) sequence within the restriction fragment or by PCR amplification using primers to a unique sequence on either side of the tandem repeats. In this situation the locus is said to have multiple alleles in the population. Even with multiple alleles, however, any one chromosome can carry only one of the alleles, and any individual genotype can carry at most two different alleles. Nevertheless, a large number of alleles means an even larger number of genotypes, which is the feature that gives STRPs their utility in individual identification. For example, even with only 10 alleles in a population of organisms, there could be 10 different homozygous genotypes and 45 different heterozygous genotypes.
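
The genotype counts quoted for 10 alleles can be checked by direct enumeration. The sketch below uses hypothetical function names and counts unordered pairs of alleles, on the assumption that a genotype such as Aa is the same as aA.

```python
from itertools import combinations_with_replacement


def genotype_counts(n_alleles):
    """Return (homozygous, heterozygous, total) genotype counts for n alleles."""
    homozygous = n_alleles                           # one per allele
    heterozygous = n_alleles * (n_alleles - 1) // 2  # unordered pairs of distinct alleles
    return homozygous, heterozygous, homozygous + heterozygous


def enumerate_genotypes(n_alleles):
    """List every unordered pair of alleles, numbering the alleles 1..n."""
    alleles = range(1, n_alleles + 1)
    return list(combinations_with_replacement(alleles, 2))


print(genotype_counts(10))
print(len(enumerate_genotypes(10)))
```

For 10 alleles the formula and the enumeration agree on 55 genotypes in total, of which 10 are homozygous and 45 are heterozygous.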


More generally, with n alleles there are n homozygous genotypes and n(n − 1)/2 heterozygous genotypes, or n(n + 1)/2 different genotypes altogether. With STRPs, not only are there a relatively large number of alleles, but no one allele is exceptionally common, so each of the many genotypes in the population has a relatively low frequency. If the genotypes at 6–8 STRP loci are considered simultaneously, then each possible multiple-locus genotype is exceedingly rare. Because of their high degree of variation among people, STRPs are widely used in DNA typing (sometimes called DNA fingerprinting) to establish individual identity for use in criminal investigations, parentage determinations, and so forth (Chapter 17).

2.7 Applications of DNA Markers

Why are geneticists interested in DNA markers (DNA polymorphisms)? Their interest can be justified on any number of grounds. In this section we consider the reasons most often cited.

Genetic Markers, Genetic Mapping, and “Disease Genes”

Perhaps the key goal in studying DNA polymorphisms in human genetics is to identify the chromosomal location of mutant genes associated with hereditary diseases. In the context of disorders caused by the interaction of multiple genetic and environmental factors, such as heart disease, cancer, diabetes, depression, and so forth, it is important to think of a harmful allele as a risk factor for the disease, which increases the probability of occurrence of the disease, rather than as a sole causative agent. This needs to be emphasized, especially because genetic risk factors are often called disease genes. For example, the major “disease gene” for breast cancer in women is the gene BRCA1. For women who carry a mutant allele of BRCA1, the lifetime risk of breast cancer is about 36 percent, and hence most women with this genetic risk factor do not develop breast cancer. On the other hand, among women who are not carriers, the lifetime risk of breast cancer is about 12 percent, and hence many women without the genetic risk factor do develop breast cancer. Indeed, BRCA1 mutations are found in only 16 percent of affected women who have a family history of breast cancer. The importance of a genetic risk factor can be expressed quantitatively as the relative risk, which equals the risk of the disease in persons who carry the risk factor divided by the risk in persons who do not. The relative risk for BRCA1 equals 3.0 (calculated as 36 percent/12 percent).

The utility of DNA polymorphisms in locating and identifying disease genes results from genetic linkage, the tendency for genes that are sufficiently close together in a chromosome to be inherited together. Genetic linkage will be discussed in detail in Chapter 5, but the key concepts are summarized in Figure 2.29, which shows the location of many DNA polymorphisms along a chromosome that also carries a genetic risk factor denoted D (for disease gene). Each DNA polymorphism serves as a genetic marker for its own location in the chromosome. The importance of genetic linkage is that DNA markers that are sufficiently close to the disease gene will tend to be inherited together with the disease gene in pedigrees, and the closer the markers, the stronger this association. Hence, the initial approach to the identification of a disease gene is to find DNA markers that are genetically linked with the disease gene in order to identify its chromosomal location, a procedure known as genetic mapping. Once the chromosomal position is known, other methods can be used to pinpoint the disease gene itself and to study its functions.

If genetic linkage seems a roundabout way to identify disease genes, consider the alternative. The human genome contains approximately 80,000 genes. If genetic linkage did not exist, then we would have to examine 80,000 DNA polymorphisms, one in each gene, in order to identify a disease gene. But the human genome has only 23 pairs of chromosomes, and because of genetic linkage and the power of genetic mapping, it actually requires only a few hundred DNA polymorphisms to identify the chromosome and approximate location of a genetic risk factor.
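
The relative-risk calculation quoted earlier in this section for BRCA1 is a single division, sketched here for completeness. The function name is hypothetical, and the risks are entered as whole-number percentages exactly as they appear in the text.

```python
def relative_risk(risk_in_carriers, risk_in_noncarriers):
    """Risk of disease in carriers of a risk factor divided by the risk in noncarriers."""
    return risk_in_carriers / risk_in_noncarriers


# Lifetime breast cancer risks from the text, in percent.
brca1_relative_risk = relative_risk(36, 12)
print(brca1_relative_risk)
```

Note that a relative risk of 3 says nothing about the absolute risk: most carriers in this example still do not develop the disease.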


[Figure 2.29 diagram. DNA polymorphisms (genetic markers) along the chromosomes, with the locus of a “disease gene” (a genetic risk factor), D. Labels: DNA markers that are close enough to a disease gene tend to be inherited together (genetically linked) with the disease gene. The closer a marker is to the disease gene, the closer the linkage and the more likely it is that they will be inherited together. DNA markers that are too far from the disease gene in the chromosome (or are in a different chromosome) are not linked to the disease gene. They do not tend to be inherited with the disease gene in pedigrees.]

Figure 2.29 Concepts in genetic localization of genetic risk factors for disease. Polymorphic DNA
markers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the chromosome
tend to be inherited together with the disease itself. The genomic location of the risk factor is deter-
mined by examining the known genomic locations of the DNA polymorphisms that are linked with it.

Other Uses for DNA Markers

DNA polymorphisms are widely used in all aspects of modern genetics because they provide a large number of easily accessed genetic markers for genetic mapping and other purposes. Among the other uses of DNA polymorphisms are the following.

Individual identification. We have already mentioned that DNA polymorphisms have application as a means of DNA typing (DNA fingerprinting) to identify different individuals in a population. DNA typing in other organisms is used to identify individual animals in endangered species and to determine the degree of genetic relatedness among individual organisms that live in packs or herds. For example, DNA typing in wild horses has shown that the wild stallion in charge of a harem of mares actually sires fewer than one-third of the foals.

Epidemiology and food safety science. DNA typing also has important applications in tracking the spread of viral and bacterial epidemic diseases, as well as in identifying the source of contamination in contaminated foods.

Human population history. DNA polymorphisms are widely used in anthropology to reconstruct the evolutionary origin, global expansion, and diversification of the human population.

Improvement of domesticated plants and animals. Plant and animal breeders have turned to DNA polymorphisms as genetic markers in pedigree studies to identify, by genetic mapping, genes that are associated with favorable traits in order to incorporate these genes into currently used varieties of plants and breeds of animals.

History of domestication. Plant and animal breeders also study genetic polymorphisms to identify the wild ancestors of cultivated plants and domesticated animals, as well as to infer the practices of artificial selection that led to genetic changes in these species during domestication.

DNA polymorphisms as ecological indicators. DNA polymorphisms are being evaluated as biological indicators of genetic diversity in key indicator species present in biological communities exposed to chemical, biological, or physical stress. They are also used to monitor genetic diversity in endangered species and species bred in captivity.

Evolutionary genetics. DNA polymorphisms are studied in an effort to describe the patterns in which different types of genetic variation occur throughout the genome, to infer the evolutionary mechanisms by which genetic variation is maintained, and to illuminate the processes by which genetic polymorphisms within

2.7 Applications of DNA Markers 75

species become transformed into genetic differences between species.

Population studies. Population ecologists employ DNA polymorphisms to assess the level of genetic variation in diverse populations of organisms that differ in genetic organization (prokaryotes, eukaryotes, organelles), population size, breeding structure, or life-history characters, and they use genetic polymorphisms within subpopulations of a species as indicators of population history, patterns of migration, and so forth.

Evolutionary relationships among species. Differences in homologous DNA sequences between species are the basis of molecular systematics, in which the sequences are analyzed to determine the ancestral history (phylogeny) of the species and to trace the origin of morphological, behavioral, and other types of adaptations that have arisen in the course of evolution.

Chapter Summary
The sequence of bases in the human genome is 99.9% stable duplexes with whatever fragments contain suffi-
identical from one person to the next. The remaining ciently complementary base sequences, and the positions
0.1%—comprising 3 million base pairs—differs among in- of these duplexes can be determined by exposing the fil-
dividuals. Included in these differences are many muta- ter to x-ray film on which radioactive emission (or, in
tions that cause or increase the risk of disease, but the some procedures, light emission) produces an image of
majority of the differences are harmless in themselves. the band. Particular DNA sequences can also be amplified
Any of these differences between genomes can be used as without cloning by means of the polymerase chain reac-
a genetic marker. Genetic markers are widely employed in tion (PCR), in which short, synthetic oligonucleotides are
genetics to serve as positional landmarks along a chromo- used as primers to replicate repeatedly and amplify the
some or to identify particular cloned DNA fragments. The sequence between them. The primers must flank, and
manipulation of DNA molecules to identify genetic mark- have their 3' ends oriented toward, the region to be am-
ers is the basic experimental operation in modern genetics. plified, because DNA polymerase can elongate the
A DNA strand is a polymer of deoxyribonucleotides, primers only by the addition of successive nucleotides to
each composed of a nitrogenous base, a deoxyribose the 3' end of the growing chain. Each round of PCR am-
sugar, and a phosphate. Sugars and phosphates alternate plification results in a doubling of the number of ampli-
in forming a polynucleotide chain with one terminal 3'- fied fragments.
OH group and one terminal 5'-P group. In double- Most genes are present in pairs in the nonreproduc-
stranded (duplex) DNA, the two strands are paired and tive cells of most animals and higher plants. One member
antiparallel. Each end of the double helix carries a termi- of each gene pair is in the chromosome inherited from the
nal 3'-OH group in one strand and a terminal 5'-P group in the other strand. The four bases found in DNA are the purines, adenine (A) and guanine (G), and the pyrimidines, cytosine (C) and thymine (T). Equal numbers of purines and pyrimidines are found in double-stranded DNA (Chargaff's rules), because the bases are paired as AT pairs and GC pairs. The hydrogen-bonded base pairs, along with hydrophobic base stacking of the nucleotide pairs in the core of the double helix, hold the two polynucleotide strands together in a double helix.

Duplex DNA can be cleaved into fragments of defined length by restriction enzymes, each of which cleaves DNA at a specific recognition sequence (restriction site) usually four or six nucleotide pairs in length. These fragments can be separated by electrophoresis. The positions of particular restriction fragments in a gel can be visualized by means of nucleic acid hybridization, in which strands of duplex DNA that have been separated (denatured) by heating are mixed and come together (renature) with strands having complementary nucleotide sequences. In a Southern blot, denatured and labeled probe DNA is mixed with denatured DNA made up of restriction fragments that have been transferred to a filter membrane after electrophoresis. The probe DNA anneals and forms

maternal parent, and the other member of the gene pair is at a corresponding location (locus) in the homologous chromosome inherited from the paternal parent. A gene can have different forms that correspond to differences in DNA sequence. The different forms of a gene are called alleles. The particular combination of alleles present in an organism constitutes its genotype. The observable characteristics of the organism constitute its phenotype. In an organism, if the two alleles of a gene pair are the same (for example, AA or aa), then the genotype is homozygous for the A or a allele; if the alleles are different (Aa), then the genotype is heterozygous. Even though each genotype can include at most two alleles, multiple alleles are often encountered among the individuals in natural populations.

DNA polymorphisms (DNA markers) are common in natural populations of most organisms. Among the most widely used DNA polymorphisms are single-nucleotide polymorphisms (SNPs), restriction fragment length polymorphisms (detected by Southern blots), and such PCR-based polymorphisms as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), and simple tandem repeat polymorphisms (STRPs). DNA polymorphisms are used in genetic mapping studies to identify DNA markers that are genetically linked to disease genes (genetic risk factors) in the chromosome in order to pinpoint their location. They are also used in DNA typing for identifying individuals, tracking the course of virus and bacterial epidemics, studying human population history, and improving cultivated plants and domesticated animals, as well as for the genetic monitoring of endangered species and for many other purposes.
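The site-specific cleavage described in the summary can be illustrated with a minimal Python sketch. It assumes a linear molecule, looks only at the top strand, and uses a made-up 20-base sequence; the function name `digest_linear` and its `cut_offset` parameter are hypothetical and chosen only for this illustration. EcoRI (recognition sequence 5'-GAATTC-3') cuts after the first G, which corresponds to an offset of 1.

```python
def digest_linear(seq, site, cut_offset):
    """Fragment lengths after cutting a linear strand at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the two ends of the molecule plus every cut.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Made-up sequence containing two EcoRI sites (an assumption for illustration).
example = "AAGAATTCTTTTGAATTCAA"
print(digest_linear(example, "GAATTC", 1))  # [3, 10, 7]
```

Two sites in a linear molecule give three fragments. A circular molecule with the same two sites would give only two, which is why the plasmid digests in the Guide to Problem Solving show two bands per enzyme.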

Key Terms
allele
amplification
amplified fragment length polymorphism (AFLP)
antiparallel
band
base composition
base pairing
base stacking
B form of DNA
blunt end
Chargaff's rules
chromosome
codominant
denaturation
denatured DNA
disease gene
DNA cloning
DNA fingerprinting
DNA marker
DNA polymerase
DNA polymorphism
DNA typing
dominant
5'-P (phosphate) group
gel electrophoresis
gene
genetic linkage
genetic mapping
genetic marker
genomic DNA
genotype
heterozygous
homologous chromosomes
homozygous
hydrogen bond
hydrophobic interaction
kilobase (kb)
locus
major groove
microsatellite
minisatellite
minor groove
monomorphic
multiple alleles
nucleic acid hybridization
nucleoside
nucleotide
oligonucleotide
palindrome
percent G + C
phenotype
phosphodiester bond
polarity
polymerase chain reaction (PCR)
polynucleotide chain
primer
primer adapter
probe
purine
pyrimidine
random amplified polymorphic DNA (RAPD)
relative risk
renaturation
restriction endonuclease
restriction enzyme
restriction fragment
restriction fragment length polymorphism (RFLP)
restriction map
restriction site
risk factor
simple sequence length polymorphism (SSLP)
simple tandem repeat polymorphism (STRP)
single-nucleotide polymorphism (SNP)
Southern blot
sticky end
thermophile
3'-OH (hydroxyl) group
variable number of tandem repeats (VNTR)
Z form of DNA

Review the Basics

• What four bases are commonly found in the nucleotides in DNA? Which form base pairs?
• Which chemical groups are present at the extreme 3' and 5' ends of a single polynucleotide strand?
• What does it mean to say that a single strand of DNA has a polarity? What does it mean to say that the DNA strands in a duplex molecule are antiparallel?
• What are restriction enzymes and why are they important in the study of particular DNA fragments? What does it mean to say that most restriction sites are palindromes?
• Describe how a Southern blot is carried out. Explain what it is used for. What is the role of the probe?
• How does the polymerase chain reaction work? What is it used for? What information about the target sequence must be known in advance? What is the role of the oligonucleotide primers?
• What is a DNA marker? Explain how harmless DNA markers can serve as aids in identifying disease genes through genetic mapping.
• Define and give an example of each of the following key genetic terms: locus, allele, genotype, heterozygous, homozygous, phenotype.


GeNETics on the Web will introduce you to some of the most important sites for finding genetic information on the Internet. To explore these sites, visit the Jones and Bartlett home page at

http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choose the link that says Enter GeNETics on the Web. You will be presented with a chapter-by-chapter list of highlighted keywords. Select any highlighted keyword and you will be linked to a Web site containing genetic information related to the keyword.

• DNA is like Coca-Cola? According to this keyword site, it is. It contains sugar, which is highly soluble in water; phosphate groups, which are of moderate solubility; and bases, which have extremely low solubility. (The base in the soft drink is caffeine, which is chemically similar to adenine and can sometimes be incorporated into DNA, causing a mutation.) As this keyword site emphasizes, the most important property contributing to the stability of double-stranded DNA is the stacking of the base pairs on top of one another as a result of hydrophobic interactions. For further discussion of this feature of DNA, and much else of interest regarding the discovery and analysis of this critical biological macromolecule, consult the keyword site.

• The concept of the polymerase chain reaction (PCR) occurred to Kary Mullis one night while cruising on Route 128 from San Francisco to Mendocino. He immediately realized that this approach would be unique in its ability to amplify, at an exponential rate, a specific nucleotide sequence present in a vanishingly small quantity amid a much larger background of total nucleic acid. Once its feasibility was demonstrated, PCR was quickly recognized as a major technical advance in molecular biology. The new technique earned Mullis the 1993 Nobel Prize in chemistry, and today it is the basis of a large number of experimental and diagnostic procedures. At this keyword site you can learn more about the

Guide to Problem Solving

Problem 1 Distinguish between base pairing and base stacking in double-stranded DNA.

Answer Base pairing is the hydrogen bonding between corresponding bases in opposite strands of duplex DNA; A (adenine) is paired with T (thymine), and G (guanine) is paired with C (cytosine). Base stacking refers to the hydrophobic (water-hating) interaction between consecutive base pairs along a DNA duplex, which promotes the formation of a “stack” of base pairs with the sugar–phosphate backbones of the strands running along outside.

Problem 2 The restriction enzyme EcoRI cleaves double-stranded DNA at the sequence 5'-GAATTC-3', and the restriction enzyme HindIII cleaves at 5'-AAGCTT-3'. A 20-kilobase (kb) circular plasmid is digested with each enzyme individually and then in combination, and the resulting fragment sizes are determined by means of electrophoresis. The results are as follows:

EcoRI alone          fragments of 6 kb and 14 kb
HindIII alone        fragments of 7 kb and 13 kb
EcoRI and HindIII    fragments of 2 kb, 4 kb, 5 kb, and 9 kb

How many possible restriction maps are compatible with these data? For each possible restriction map, make a diagram of the circular molecule and indicate the relative positions of the EcoRI and HindIII restriction sites.

Answer Because the single-enzyme digests give two bands each, there must be two restriction sites for each enzyme in the molecule. Furthermore, because digestion with HindIII makes both the 6-kb and the 14-kb restriction fragments disappear, each of these fragments must contain one HindIII site. Considering the sizes of the fragments in the double digest, the 6-kb EcoRI fragment must be cleaved into 2-kb and 4-kb fragments, and the 14-kb EcoRI fragment must be cleaved into 5-kb and 9-kb fragments. Two restriction maps are compatible with the data, depending on which end of the 6-kb EcoRI fragment the HindIII site is nearest. The position of the remaining HindIII site is determined by the fact that the 2-kb and 5-kb fragments in the double digest must be adjacent in the intact molecule in order for a 7-kb fragment to be produced by HindIII digestion alone (the 4-kb and 9-kb fragments are then adjacent and account for the 13-kb fragment). The accompanying figure shows the relative positions of the

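[Figure: three circular maps of the 20-kb plasmid. (A) The two EcoRI sites, separating the 6-kb and 14-kb fragments. (B) and (C) The two possible restriction maps, each with two HindIII sites that divide the EcoRI fragments into 2-kb, 4-kb, 5-kb, and 9-kb segments.]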


development of PCR from Mullis's original conception, including two major innovations that were necessary to perfect the process.

• Human beings rely on plants for food, shelter, and medicines. Although at least 5000 species are cultivated, modern agricultural research emphasizes a few widely cultivated crops while largely ignoring plants such as Bambara groundnut (Vigna subterranea), breadfruit (Artocarpus altilis), carob (Ceratonia siliqua), coriander (Coriandrum sativum), emmer wheat (Triticum dicoccum), oca (Oxalis tuberosa), and ulluco (Ullucus tuberosus). To learn more about these minor crops and the use of molecular markers for characterizing and preserving their genetic diversity, consult this keyword site.

• The Mutable Site changes frequently. Each new update includes a different site that highlights genetics resources available on the World Wide Web. Select the Mutable Site for Chapter 2 and you will be linked automatically.

• The Pic Site showcases some of the most visually appealing genetics sites on the World Wide Web. To visit the genetics Web site pictured below, select the PIC Site for Chapter 2.

EcoRI sites (part A). Parts B and C are the two possible restriction maps, which differ according to whether the EcoRI site at the top generates the 2-kb or the 4-kb fragment in the double digest.

Problem 3 The accompanying diagram shows the positions of restriction sites (tick marks) for a particular restriction enzyme that can be present in the DNA at a locus in a human chromosome. The DNA present in any particular chromosome may be that shown at the top or that shown at the bottom. A probe DNA binds to the fragments at the position shown by the rectangle. With respect to an RFLP based on these fragments, three genotypes are possible. What are they? Use the symbol A1 to refer to the allele that yields the upper DNA fragment, and use A2 to refer to the allele that yields the lower DNA fragment. In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. The scale on the right shows the expected positions of fragments ranging in size from 1 to 12 kb.

[Figure: allele A1 has a restriction site that divides the region into a 3-kb fragment and a 9-kb fragment; allele A2 lacks this site and yields a single 12-kb fragment.]

Answer After cleavage with the restriction enzyme, the A1-type DNA yields a 3-kb fragment that binds with the probe (yielding a 3-kb band) and a 9-kb fragment that does not bind with the probe (yielding no visible band), whereas the A2-type DNA yields a 12-kb fragment that binds with the probe (yielding a 12-kb band). A particular chromosome may carry allele A1 or allele A2. Because individuals have two copies of each chromosome (except for the sex chromosomes), any individual may carry A1A1, A1A2, or A2A2. DNA from homozygous A1A1 genotypes yields a 3-kb band, that from heterozygous A1A2 genotypes yields both a 3-kb and a 12-kb band, and that from homozygous A2A2 genotypes yields a 12-kb band. The expected phenotypes are illustrated here.

[Figure: gel diagram with lanes A1A1, A1A2, and A2A2 and a size scale of 1, 3, 6, 9, and 12 kb.]
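The reasoning in the answer to Problem 2 can be checked by brute force. The sketch below is only an illustration under stated assumptions: positions on the 20-kb circle are restricted to whole kilobases, the two EcoRI sites are placed arbitrarily at coordinates 0 and 6 (so that they are 6 kb and 14 kb apart), and the helper name `circular_fragments` is hypothetical. It tries every pair of HindIII positions and keeps those that reproduce both the HindIII digest and the double digest.

```python
from itertools import combinations


def circular_fragments(sites, size):
    """Sorted fragment sizes from cutting a circle of `size` kb at `sites`."""
    s = sorted(sites)
    return sorted((s[(i + 1) % len(s)] - s[i]) % size for i in range(len(s)))


PLASMID = 20      # plasmid size in kb, from Problem 2
ECORI = (0, 6)    # assumed coordinates giving the 6-kb and 14-kb fragments

compatible = []
for hind in combinations(range(1, PLASMID), 2):
    single = circular_fragments(hind, PLASMID)           # HindIII alone
    double = circular_fragments(ECORI + hind, PLASMID)   # EcoRI and HindIII
    if single == [7, 13] and double == [2, 4, 5, 9]:
        compatible.append(hind)

print(compatible)  # [(2, 15), (4, 11)]
```

Exactly two placements survive, matching the two maps in the answer: the HindIII site inside the 6-kb EcoRI fragment lies 2 kb from one end or 2 kb from the other, and in both cases the 2-kb and 5-kb pieces end up adjacent.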


Problem 4 A geneticist plans to use the polymerase chain reaction (PCR) to amplify part of the DNA sequence shown below, using oligonucleotide primers that are hexamers matching the regions shown in red. (In practice, hexamers are too short for most purposes.) State the sequence of the primer oligonucleotides that should be used, including the polarity, and give the sequence of the DNA molecule that results from amplification.

5'-GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC-3'
3'-CATGCCCGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGCAG-5'

Answer The primers must be able to base-pair with the chosen primer sites and must be oriented with their 3' ends facing one another. Thus the “forward primer” (the one that is elongated in a left-to-right direction) should have the sequence 5'-GCAATG-3' and the “reverse primer” (the one that is elongated in a right-to-left direction) should have the sequence 3'-TTCGGC-5'. The resulting amplified sequence is shown below.

5'-GCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCG-3'
3'-CGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGC-5'
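The primer logic in the answer to Problem 4 can be traced with a short Python sketch. It works on the top strand only, takes the reverse primer written 3'-to-5' exactly as in the answer, and assumes ideal priming at the first matching site; the function names `complement` and `pcr_product` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, keeping the left-to-right order."""
    return "".join(PAIR[base] for base in seq)


def pcr_product(top, forward_5to3, reverse_3to5):
    """Top strand of the amplified region, given the top strand written 5'->3'."""
    start = top.find(forward_5to3)
    # The reverse primer pairs with the top strand, so its complement
    # (read in the same left-to-right order) is the top-strand primer site.
    reverse_site = complement(reverse_3to5)
    hit = top.find(reverse_site, start + len(forward_5to3)) if start != -1 else -1
    if hit == -1:
        raise ValueError("primer sites not found in the expected orientation")
    return top[start:hit + len(reverse_site)]


top = "GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC"
product = pcr_product(top, "GCAATG", "TTCGGC")
print(product)  # GCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCG
```

The product begins with the forward primer and ends with the complement of the reverse primer, so the amplified molecule is bounded by the two primer sites and includes both of them.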

Analysis and Applications

2.1 Many restriction enzymes produce restriction fragments that have “sticky ends.” What does this mean?

2.2 Which of the following sequences are palindromes and which are not? Explain your answer. Symbols such as (AT) mean that the site may be occupied by (in this case) either A or T, and N stands for any nucleotide.
(a) 5'-AATT-3'
(b) 5'-AAAA-3'
(c) 5'-AANTT-3'
(d) 5'-AA(AT)AA-3'
(e) 5'-AA(GC)TT-3'

2.3 The following list gives half of each of a set of palindromic restriction sites. What is the complete sequence of each restriction site? (N stands for any nucleotide.)
(a) 5'-AA??-3'
(b) 5'-ATG???-3'
(c) 5'-GGN??-3'
(d) 5'-ATNN??-3'

2.4 Apart from the base sequence, what is different about the ends of restriction fragments produced by the following restriction enzymes? (The downward arrow represents the site of cleavage in each strand.)
(a) HaeIII (5'-GG↓CC-3')
(b) MaeI (5'-C↓TAG-3')
(c) CfoI (5'-GCG↓C-3')

2.5 A solution contains double-stranded DNA fragments of size 3 kb, 6 kb, 9 kb, and 12 kb. They are separated in an electrophoresis gel. In the diagram of the gel at the right, match the fragment sizes with the correct bands.

[Figure for Problem 2.5: gel diagram with four bands labeled (a), (b), (c), and (d).]

2.6 The linear DNA fragment shown here has cleavage sites for BamHI (B) and EcoRI (E). In the accompanying diagram of an electrophoresis gel, indicate the positions at which bands would be found after digestion with:
(a) BamHI alone
(b) EcoRI alone
(c) BamHI and EcoRI together
The dashed lines on the right indicate the positions to which bands of 1–12 kb would migrate.

[Figure for Problem 2.6: linear DNA fragment on a scale of 0 to 12 kb with cleavage sites in the order B, E, B, E; gel diagram with lanes BamHI, EcoRI, and BamHI + EcoRI and a size scale of 3, 6, 9, and 12 kb.]

2.7 The circular DNA molecule shown at the top of page 81 has cleavage sites for BamHI and EcoRI. In the accompanying diagram of an electrophoresis gel, indicate the positions at which bands would be found after digestion with:
(a) BamHI alone
(b) EcoRI alone
(c) BamHI and EcoRI together
The dashed lines on the right indicate the positions to which bands of 1–12 kb would migrate.


[Figure for Problem 2.7: circular DNA molecule with a scale marked in kb around the circle (0 through 9) and two BamHI and two EcoRI cleavage sites; gel diagram with lanes BamHI, EcoRI, and BamHI + EcoRI and a size scale of 1, 3, 6, 9, and 12 kb.]

2.8 Consider the accompanying diagram of a region of duplex DNA, in which the B's represent bases in Watson–Crick pairs. Specify as precisely as possible the identity of:
(a) B5, assuming that B1 = A
(b) B6, assuming that B2 = C
(c) B7, assuming that B3 = purine
(d) B8, assuming that B4 = A or T

[Figure for Problem 2.8: region of duplex DNA with base pairs B1–B5, B2–B6, B3–B7, and B4–B8, phosphate groups (P) numbered 1 through 4 along one strand, and a terminal OH group.]

2.9 Refer to the DNA molecule diagrammed in Problem 2.8. In the precursor nucleotides of this molecule, which base was each of the phosphate groups 1–4 associated with?

2.10 In a random sequence consisting of equal proportions of all four nucleotides, what is the probability that a particular short sequence of nucleotides matches a restriction site for:
(a) A restriction enzyme with a 4-base cleavage site?
(b) A restriction enzyme with a 6-base cleavage site?
(c) A restriction enzyme with an 8-base cleavage site?

2.11 In a random sequence consisting of equal proportions of all four nucleotides, what is the average distance between restriction sites for:
(a) A restriction enzyme with a 4-base cleavage site?
(b) A restriction enzyme with a 6-base cleavage site?
(c) A restriction enzyme with an 8-base cleavage site?

2.12 If human DNA were essentially a random sequence of 3 × 10⁹ bp with equal proportions of all four nucleotides (this is an oversimplification), approximately how many restriction fragments would be expected from cleavage with
(a) A “4-cutter” restriction enzyme?
(b) A “6-cutter” restriction enzyme?
(c) An “8-cutter” restriction enzyme?

2.13 Consider the restriction enzymes BamHI (cleavage site 5'-G↓GATCC-3') and Sau3A (cleavage site 5'-↓GATC-3'), where the downward arrow denotes the site of cleavage in each strand. Is every BamHI site a Sau3A site? Is every Sau3A site a BamHI site? Explain your answer.

2.14 A DNA duplex with the sequence shown here is cleaved with BamHI (cleavage site 5'-G↓GATCC-3'), where the arrow denotes the site of cleavage in each strand. If the resulting fragments were brought together in the right order, and the breaks in the backbones repaired, what possible DNA duplexes would be expected?

5'-ATTGGATCCAAACCCCAAAGGATCCTTA-3'
3'-TAACCTAGGTTTGGGGTTTCCTAGGAAT-5'

2.15 The restriction enzymes PstI, PvuII, and MluI have the following restriction sites, where the arrow indicates the site of cleavage in each strand.

PstI     5'-CTGCA↓G-3'
PvuII    5'-CAG↓CTG-3'
MluI     5'-A↓CGCGT-3'

Problems 2.15, 2.16

5'-ATGCCCTGCAGTACCATGACGCGTTACGCAGCTGATCGAAACGCGTATATATGCC-3'
3'-TACGGGACGTCATGGTACTGCGCAATGCGTCGACTAGCTTTGCGCATATATACGG-5'

A DNA duplex with the sequence above is digested. What fragments would result from cleavage with:
(a) PstI?
(b) PvuII?
(c) MluI?

2.16 With regard to the restriction enzymes and the DNA duplex in Problem 2.15, what fragments would result from digestion with
(a) PstI and MluI?
(b) PvuII and MluI?


2.17 Consider the sequence:

5'-CTGCAGGTG-3'
3'-GACGTCCAC-5'

If this sequence were cleaved with PstI (5'-CTGCA↓G-3'), could it still be cleaved with PvuII (5'-CAG↓CTG-3')? If it were cleaved with PvuII, could it still be cleaved with PstI? Explain your answer.

2.18 A circular DNA molecule is cleaved with BamHI, EcoRI, or the two restriction enzymes together. The accompanying diagram shows the resulting electrophoresis gel, with the band sizes indicated. Draw a diagram of the circular DNA, showing the relative positions of the BamHI and EcoRI sites.

[Figure for Problem 2.18: gel with lanes BamHI, EcoRI, and BamHI + EcoRI; the band sizes indicated are 10 kb, 7 kb, and 3 kb.]

2.19 In the diagrams of DNA fragments shown here, the tick marks indicate the positions of restriction sites for a particular restriction enzyme. A mixture of the two types of molecules is digested and analyzed with a Southern blot using either probe A or probe B, which hybridizes to the fragments where shown by the rectangles. In the accompanying gel diagram, indicate the bands that would result from the use of each of these probes. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure for Problem 2.19: two DNA molecules, one divided into 5-kb and 7-kb fragments and one consisting of a single 12-kb fragment, with rectangles marking probe A and probe B; gel diagram with lanes Probe A and Probe B and a size scale of 1, 3, 6, 9, and 12 kb.]

2.20 In the accompanying diagram, the tick marks indicate the positions of restriction sites in two alternative DNA fragments that can be present at the A locus in a human chromosome. An RFLP analysis is carried out, using probe DNA that binds to the fragments at the position shown by the rectangle. With respect to this RFLP, how many genotypes are possible? (Use the symbol A1 to refer to the allele that yields the upper DNA fragment, and use A2 to refer to the allele that yields the lower DNA fragment.) In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure for Problem 2.20: allele A1 divided into 4-kb and 2-kb fragments; allele A2 a single 6-kb fragment; gel diagram with a size scale of 1, 3, 6, 9, and 12 kb.]

2.21 The accompanying diagram shows the DNA fragments associated with an RFLP revealed by a probe that hybridizes where shown by the rectangle. The tick marks are cleavage sites for the restriction enzyme used in the RFLP analysis. How many alleles does this RFLP have? How many genotypes are possible? In the accompanying gel diagram, indicate the phenotype (pattern of bands) expected of each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure for Problem 2.21: two DNA molecules, one divided into 4-kb and 8-kb fragments and one consisting of a single 12-kb fragment; gel diagram with a size scale of 1, 3, 6, 9, and 12 kb.]

2.22 The thick horizontal lines shown below represent alternative DNA molecules at a particular locus in a human chromosome. The tick marks indicate the positions of restriction sites for a particular restriction enzyme. Genomic DNA from a sample of people is digested and analyzed by a Southern blot using a probe DNA that hybridizes at the position shown by the rectangle. How many possible RFLP alleles would be observed in the sample? How many genotypes?

[Figure for Problem 2.22: three alternative DNA molecules, with restriction sites dividing them into fragments of 3 kb, 3 kb, 2 kb, and 4 kb; 3 kb, 5 kb, and 4 kb; and 6 kb, 2 kb, and 4 kb; a rectangle marks the probe position.]


2.23 The RFLPs described in Problem 2.22 are analyzed with the same restriction enzyme but a different probe, which hybridizes at the site indicated here by the rectangle. How many RFLP alleles would be found? How many genotypes? (Use the symbols A1, A2, . . . to indicate the alleles.) In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure for Problem 2.23: the three alternative DNA molecules of Problem 2.22 (fragments of 3 kb, 3 kb, 2 kb, and 4 kb; 3 kb, 5 kb, and 4 kb; and 6 kb, 2 kb, and 4 kb) with a rectangle marking the new probe position; gel diagram with a size scale of 1, 3, 6, 9, and 12 kb.]

2.24 If hexamers were long enough oligonucleotides to serve as specific primers for PCR (for most purposes they are too short), what DNA fragment would be amplified using the “forward” primer 5'-AATGCC-3' and the “reverse” primer 3'-GCATGT-5' on the double-stranded DNA molecule shown below?

5'-GATTACCGGTAAATGCCGGATTAACCCGGGTTATCAGGCCACGTACAACTGGAGTCC-3'
3'-CTAATGGCCATTTACGGCCTAATTGGGCCCAATAGTCCGGTGCATGTTGACCTCAGG-5'

2.25 Would the primer pair 3'-AATGCC-5' and 5'-GCATGT-3' amplify the same fragment described in the previous problem? Explain your answer.

2.26 A human DNA fragment of 3 kb is to be amplified by PCR. The total genome size is 3 × 10⁹ bp.
(a) Prior to amplification, what fraction of the total DNA does the target sequence constitute?
(b) What fraction does it constitute after 10 cycles of PCR?
(c) After 20 cycles of PCR?
(d) After 30 cycles of PCR?

2.27 RAPD analysis is carried out using genomic DNA from four individuals (A–D) sampled from a natural population of Hawaiian crickets. The gel shown at the top of the page resulted from PCR with one of the primer pairs tested. Which bands are the RAPD polymorphisms?

[Figure for Problem 2.27: gel with lanes A, B, C, and D and a band-number scale of 2, 4, 6, 8, 10, and 12.]

2.28 A cigarette butt found at the scene of a robbery is found to have a sufficient number of epithelial cells stuck to the paper for the DNA to be extracted and typed. Shown below are the results of typing for three probes (locus 1, locus 2, and locus 3) of the evidence (X) and 7 suspects (A through G). Which of the suspects can be excluded? Which cannot be excluded? Can you identify the robber? Explain your reasoning.

[Figure for Problem 2.28: three gels, one each for Locus 1, Locus 2, and Locus 3, with lanes A through G and X.]


2.29 A woman is uncertain which of two men is the father of her child. DNA typing is carried out on blood from the child (C), the mother (M), and each of the two males (A and B), using probes for a highly polymorphic DNA marker on two different chromosomes (“locus 1” and “locus 2”). The result is shown in the accompanying diagram. Can either male be excluded as the possible father? Explain your reasoning.

[Figure for Problem 2.29: two gels, Locus 1 and Locus 2, each with lanes A, B, M, and C.]

2.30 Snake venom phosphodiesterase cleaves the chemical bonds shown in red in the accompanying diagram, leaving mononucleotides that are phosphorylated in the 3' position. If the phosphates numbered 2 and 4 are radioactive, which mononucleotides will be radioactive after cleavage with snake venom phosphodiesterase?

[Figure for Problem 2.30: region of duplex DNA with base pairs B1–B5, B2–B6, B3–B7, and B4–B8, phosphate groups (P) numbered 1 through 4 along one strand, and a terminal HO group.]

Challenge Problems
2.31 The genome of Drosophila melanogaster is 180 × 10⁶ bp, and a fragment of size 1.8 kb is to be amplified by PCR. How many cycles of PCR are necessary for the amplified target sequence to constitute at least 99 percent of the total DNA?
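The bookkeeping behind questions of this kind can be sketched in Python. The sketch assumes idealized PCR in which the target doubles exactly in every cycle and the rest of the genome is not amplified at all; the function names and the numbers in the example (a 1-kb target in a 10⁶-bp genome) are hypothetical and deliberately differ from those in the problems.

```python
def target_fraction(target_bp, genome_bp, cycles):
    """Fraction of total DNA made up by the target after `cycles` ideal doublings."""
    amplified = target_bp * 2 ** cycles
    # The non-target part of the genome stays at its original amount.
    return amplified / (genome_bp - target_bp + amplified)


def cycles_needed(target_bp, genome_bp, threshold):
    """Smallest number of cycles at which the target fraction reaches `threshold`."""
    n = 0
    while target_fraction(target_bp, genome_bp, n) < threshold:
        n += 1
    return n


# Hypothetical example: 1-kb target in a 1,000,000-bp genome.
print(target_fraction(1000, 10**6, 0))   # 0.001 before amplification
print(cycles_needed(1000, 10**6, 0.5))   # 10
```

Because the target grows as a power of 2 while the background stays fixed, the fraction rises slowly at first and then approaches 1 within a few additional cycles.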

2.32 A murder victim is found in an advanced state of decomposition and cannot be identified. Police suspect that the victim is one of five persons reported by their parents as missing. DNA typing is carried out on tissues from the victim (X) and on the five sets of parents (A through E), using probes for a highly polymorphic DNA marker on two different chromosomes (“locus 1” and “locus 2”). The result is shown in the diagram at the right. How do you interpret the fact that genomic DNA from each individual yields two bands? Can you identify the parents of the victim? Explain your reasoning.

[Figure for Problem 2.32: two gels, Locus 1 and Locus 2, each with lanes for both parents in sets A through E and for the victim (X).]

2.33 The snake venom phosphodiesterase enzyme described in Problem 2.30 was originally used in a procedure called “nearest neighbor” analysis. In this procedure, a DNA strand is synthesized in the presence of all four nucleoside triphosphates, one of which carries a radioactive phosphate in the α (innermost) position. Then the DNA is digested to completion with snake venom phosphodiesterase, and the resulting mononucleotides are separated and assayed for radioactivity. Examine the diagram in Problem 2.30, and then answer the following questions.
(a) How does this procedure reveal the “nearest neighbors” of the radioactive nucleotide?
(b) Is the “nearest neighbor” on the 5' or the 3' side of the labeled nucleotide?

