Bulk Segregant Analysis
Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 9828-9832, November 1991
Genetics
ABSTRACT We developed bulked segregant analysis as a method for rapidly identifying markers linked to any specific gene or genomic region. Two bulked DNA samples are generated from a segregating population from a single cross. Each pool, or bulk, contains individuals that are identical for a particular trait or genomic region but arbitrary at all unlinked regions. The two bulks are therefore genetically dissimilar in the selected region but seemingly heterozygous at all other regions. The two bulks can be made for any genomic region and from any segregating population. The bulks are screened for differences using restriction fragment length polymorphism probes or random amplified polymorphic DNA primers. We have used bulked segregant analysis to identify three random amplified polymorphic DNA markers in lettuce linked to a gene for resistance to downy mildew. We showed that markers can be reliably identified in a 25-centimorgan window on either side of the targeted locus. Bulked segregant analysis has several advantages over the use of near-isogenic lines to identify markers in specific regions of the genome. Genetic walking will be possible by multiple rounds of bulked segregation analysis; each new pair of bulks will differ at a locus identified in the previous round of analysis. This approach will have widespread application both in those species where selfing is possible and in those that are obligatorily outbreeding.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: RFLP, restriction fragment length polymorphism; RAPD, random amplified polymorphic DNA; cM, centimorgan(s); NIL, near-isogenic line.

We have developed a method, bulked segregant analysis, as a rapid procedure for identifying markers in specific regions of the genome. The method involves comparing two pooled DNA samples of individuals from a segregating population originating from a single cross. Within each pool, or bulk, the individuals are identical for the trait or gene of interest but are arbitrary for all other genes. Two pools contrasting for a trait (e.g., resistant and susceptible to a particular disease) are analyzed to identify markers that distinguish them. Markers that are polymorphic between the pools will be genetically linked to the loci determining the trait used to construct the pools. Bulked segregant analysis has two immediate applications in developing genetic maps. Detailed genetic maps for many species (1) are being developed by analyzing the segregation of randomly selected molecular markers in single populations. As a genetic map approaches saturation, the continued mapping of polymorphisms detected by arbitrarily selected markers becomes progressively less efficient (2). Bulked segregant analysis provides a method to focus on regions of interest or areas sparsely populated with markers. Also, bulked segregant analysis is a method for rapidly locating genes that do not segregate in populations initially used to generate the genetic map.

Two types of molecular markers have been used to develop detailed genetic maps, restriction fragment length polymorphisms (RFLPs) and random amplified polymorphic DNAs (RAPDs). The majority of studies over the last 7 years have employed RFLPs. They are often codominant but are restricted to regions with low or single copy sequences. Recently, RAPD markers have been developed by Williams et al. (3). This technique relies on the differential enzymic amplification of small DNA fragments using PCR with arbitrary oligonucleotide primers (usually 10-mers). Polymorphisms result from either chromosomal changes in the amplified regions or base changes that alter primer binding. The procedure is rapid, requires only small amounts of DNA, which need not be of high quality, and involves no radioactivity. As no Southern hybridization is involved, polymorphisms can be detected in fragments containing highly repeated sequences; this provides markers in regions of the genome previously inaccessible to analysis. The RAPD markers are usually dominant because polymorphisms are detected as the presence or absence of bands. RAPD markers provide a quick method for generating genetic maps and analyzing populations. The ability to target RAPD markers efficiently to specific genes or regions monomorphic for previously characterized markers or to regions sparsely populated with markers will increase their usefulness further.

In this paper, we describe bulked segregant analysis as a method for rapidly identifying RFLP or RAPD markers in any genomic region of interest. We illustrate the procedure by identifying RAPD markers linked to a disease-resistance gene for which no near-isogenic lines (NILs) exist and by analyzing the performance of RFLP and RAPD markers that are known to be linked to disease-resistance genes. This procedure efficiently identifies markers linked to genes of interest, allowing their rapid placement on a genetic map. It also can be used to consolidate genetic maps by identifying markers in sparsely populated regions and at the end of linkage groups.

MATERIALS AND METHODS

Basic Method. Bulked segregant analysis involves screening for differences between two pooled DNA samples derived from a segregating population that originated from a single cross. Each pool, or bulk, contains individuals selected to have identical genotypes for a particular genomic region ("target locus or region") but random genotypes at loci unlinked to the selected region (Fig. 1). Therefore, the two resultant bulked DNA samples differ genetically only in the selected region and are seemingly heterozygous and monomorphic for all other regions. The two bulks are screened for differences the same way as NILs, with several RFLP probes simultaneously (4) or individual RAPD primers (5-10 loci
Genetics: Michelmore et al. Proc. Natl. Acad. Sci. USA 88 (1991) 9829
tion between the two bulks with decreasing linkage until the locus appears unlinked. Obviously, loci not segregating in the population, whether linked or not, will not distinguish the bulks. Bulked segregant analysis does not reveal novel types of variation but rather allows the rapid screening of many loci and therefore the identification of segregating markers in the target region.

Plant Material. To identify new markers linked to Dm518 and analyze the performance of existing markers, bulks were made from the basic mapping population, cvs. Calmar × Kordaat, used to generate the genetic map of lettuce (5, 6). This F2 population consists of 66 individuals and segregates for six downy mildew resistance genes, Dm1, Dm3, Dm4, Dm518, Dm7, and Dm13. DNA was extracted as described (5) or by a modified hexadecyltrimethylammonium bromide (CTAB) procedure (7). Aliquots (2.5 µg of DNA) of each individual homozygous for one or the other allele of the targeted gene were bulked together. The number of individuals in each bulk varied between 14 and 20 plants. The bulks were screened with arbitrary RAPD primers. Linkage between Dm518 and loci polymorphic between the bulks was confirmed and quantified by analyzing their cosegregation in the F2 population used to construct the bulks. The F2 population of a second cross, Lactuca sativa cv. Saffier × Lactuca serriola PIVT1309, comprising 80 individuals, was analyzed for markers linked to Dm16.

An F2 population was used, as it provided the greatest genetic window (the segment of the genome in which markers are likely to be detected) around the locus. F3 families from each F2 individual had previously been analyzed for resistance (5). F2 individuals heterozygous for Dm518 were excluded from the analysis to allow the identification of RAPD markers from both parents (i.e., RAPD bands in both cis and trans to the dominant Dm518 allele). If F3 analysis had not

detect unequal amounts of alleles was tested with RFLP and RAPD markers to determine at what genetic distance there would be sufficient recombinants to cause the bulked samples to appear monomorphic. Both the screening of artificial mixtures of parental DNA with single markers and the screening of bulks with markers known to be linked to various degrees with the target loci were tested.

Artificial mixtures were used to test the sensitivity of RAPD and RFLP analysis. For RAPDs, DNA samples from two species, Lactuca saligna and L. sativa, were mixed in various ratios. The use of distinct species ensured a large number of polymorphic bands. Reciprocal dilutions were made to provide a series of 15 mixtures with each species present in proportions of 0.0, 0.001, 0.02, 0.04, 0.1, 0.2, 0.4, and 0.5. These mixtures were screened for RAPDs with an arbitrary primer (Fig. 2; 0.04-0.001 dilutions are not shown). As anticipated, the precise sensitivity of the procedure to detect the rarer allele was band (locus) specific. However, RAPD analysis was unexpectedly insensitive in its ability to detect the rarer allele. The rarer allele was barely detectable at a proportion of 0.1 of the mixture and was never detected if it constituted a proportion of 0.04 or less of the total. Depending on the band, polymorphisms could be detected when the rarer allele constituted proportions up to 0.2-0.4 of the mixture, at least as differences in band intensities. Therefore, segregating markers within a window of 10% recombination either side of the target locus will always be detectable. Many markers within a 30% recombination window will also be detectable, at least as bands of unequal intensity. Similar experiments were done with an RFLP marker (CL922; data not shown). Again, in mixtures equivalent to 20% or less recombination, linkage could be detected because the hybridization intensities between the alleles were obviously different. In mixtures equivalent to 30% recombi-
Table 1. The sensitivity of known RAPD markers to detect polymorphism between bulked segregants

Primer   Dm gene targeted   Recombination frequency   Map distance (cM)   Polymorphic
OPI11    3                  0.02                       2                  Yes
R62      4                  0.13                      15                  Yes
OPB12    4                  0.21                      27                  No
OPA01    3                  0.25                      35                  No
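
The recombination frequencies and map distances in Table 1 can be compared against a standard mapping function. The sketch below is an illustration, not the authors' stated procedure: it assumes Haldane's mapping function, d = -50 ln(1 - 2r) cM, which the source does not name, and checks it against the four tabulated recombination frequencies. The function name `haldane_cm` is hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans.

    Assumption: Haldane's mapping function (no crossover interference).
    The source does not say which mapping function produced Table 1.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# (recombination frequency, map distance in cM as reported in Table 1)
TABLE_1 = [(0.02, 2), (0.13, 15), (0.21, 27), (0.25, 35)]

for r, reported in TABLE_1:
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):5.1f} cM, reported {reported} cM")
```

Under this assumption each computed distance rounds to the tabulated value, which suggests, but does not prove, that a Haldane-type function underlies the map distances.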
FIG. 3. Southern blots of parental and bulked DNA samples probed with markers at known distances from the loci used to distinguish the bulks. (A-C) Bulks were made for alternate alleles of Dm16 from F2 individuals of L. sativa cv. Saffier × L. serriola PIVT1309. (D-F) Bulks were made for alternate alleles of Dm518 from F2 individuals of cvs. Calmar × Kordaat. The first lane contains bulked DNA from homozygous-susceptible individuals; the second lane contains DNA from homozygous-resistant individuals. The third and fourth lanes contain parental DNA from PIVT1309 and Saffier, respectively (A-C), or Kordaat and Calmar, respectively (D-F). Blots in A, B, and C were probed with pCL922, pCL1419, and pCL1407, respectively. Dm16 is 0 cM, 7 cM, and 9 cM from CL922, CL1419, and CL1407, respectively. Blots in D, E, and F were probed with pCL250, pCL849, and pCL1007, respectively. Dm518 is 11 cM from CL250, 26 cM from CL849, and unlinked to CL1007.

FIG. 4. RAPD markers detecting polymorphisms between bulks made for alternate alleles of Dm518. Each set of four lanes results from PCR amplification with a different 10-mer oligonucleotide primer: OPH04, OPH15, or OPF12. In each set, the first and second lanes contain parental DNA from Kordaat (K) and Calmar (C). The third lane contains bulked DNA from the homozygous susceptible individuals (S), and the fourth lane contains bulked DNA from the homozygous resistant individuals (R). The polymorphisms distinguishing the bulks are indicated by a solid arrowhead. Other polymorphisms at unlinked loci that distinguish the parents but not the bulks of F2 individuals are indicated by small open arrowheads. Occasionally bands are present in the bulks but not either parent; this phenomenon is sometimes also observed with heterozygous individuals. A-D beside the lanes for primer OPF12 identify the loci that are diagrammed in Fig. 1.
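
The marker-to-gene distances quoted in the Fig. 3 caption can be translated into the proportion of the alternate parental allele expected in each bulk. The following sketch rests on assumptions not stated in the source: Haldane's mapping function converts centimorgans back to a recombination fraction r, each bulk is modelled as 2n independently sampled chromosomes from individuals homozygous at the target locus, and the expected proportion of the alternate allele at a linked marker is taken to equal r (the same equivalence the mixture experiments rely on when they speak of mixtures "equivalent to" a given recombination percentage). The function names are hypothetical.

```python
import math


def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane; an assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def bulk_minor_allele(d_cm: float, n_plants: int) -> tuple[float, float]:
    """Expected proportion of the alternate allele in a bulk and its binomial SD.

    A bulk of n_plants homozygous individuals is modelled as 2 * n_plants
    independent chromosomes, each recombinant with probability r.
    """
    r = haldane_r(d_cm)
    sd = math.sqrt(r * (1.0 - r) / (2 * n_plants))
    return r, sd


# Distances (cM) quoted in the Fig. 3 caption; n = 14 is the smallest
# bulk size mentioned in the text.
for d in (0, 7, 9, 11, 26):
    mean, sd = bulk_minor_allele(d, n_plants=14)
    print(f"{d:2d} cM: expected proportion {mean:.3f} (SD {sd:.3f})")
```

At 26 cM the expected proportion is about 0.20, the lower edge of the 0.2-0.4 range in which the mixture experiments detected polymorphisms only as differences in band intensity.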
using the basic mapping population. Two-point analysis showed the new markers to be 6, 8, and 12 cM from Dm518. Multipoint analysis indicated the map positions shown (Fig. 5).

FIG. 5. Genetic map of the region of the lettuce genome containing Dm518. Three RAPD markers, OPH04, OPF12, and OPH15, were identified by bulked segregant analysis. The CL prefix designates a RFLP locus detected by a cDNA clone. Markers flanking this region have been described (5, 6). Genetic distances were derived by multipoint analysis and are shown in centimorgans. Pairwise recombination distances between the markers and Dm518 are reported in the text.

DISCUSSION

Several approaches have been suggested to saturate genomic regions of interest with molecular markers. These include preselection using NILs (4), preparative pulsed-field gel electrophoresis (10), and chromosome walking and jumping (11). Bulked segregant analysis provides a rapid, technically simple alternative for identifying markers linked to specific genes. The only prerequisite is the existence of a population resulting from a cross that segregates for the gene of interest. The success of the approach will depend on the genetic divergence between the parents in the target region.

The underlying principle of bulked segregant analysis is the grouping of the informative individuals together so that a particular genomic region can be studied against a randomized genetic background of unlinked loci. The minimum size of the bulk will be determined by the frequency with which unlinked loci might be detected as polymorphic between the bulked samples. This in turn will depend on the type of marker being screened (dominant or codominant) and the type of population used to generate the bulks (F2, backcross, full sib, etc.). For a dominant RAPD marker segregating in an F2 population, the probability of a bulk of n individuals having a band and a second bulk of equal size not having a band will be $2(1 - [1/4]^{n})(1/4)^{n}$ when the locus is unlinked to the target gene. Therefore, few individuals per bulk are required. For example, the probability of an unlinked locus being polymorphic between bulks of 10 such individuals is $2 \times 10^{-6}$. Even when many loci are screened, the chances of detecting an unlinked locus are small. As smaller bulks are utilized, the frequency of false positives will increase. However, as the linkage of all polymorphisms is confirmed by analysis of a segregating population, bulked segregant analysis with only small numbers of individuals in one or both bulks will provide great enrichment for markers linked to target loci.

Bulked segregant analysis successfully identified markers linked to Dm518 for which no NILs exist. The procedure was rapid; it required fewer than 300 PCR reactions to identify and map three new markers. The 100 primers screened ~900 loci. Screening more primers should identify more closely linked markers, assuming a random distribution of loci detected as RAPD markers and sufficient polymorphism in the target region. The first assumption has yet to be tested, but mapping data generated in this and other labs indicate that RAPD markers are at least distributed throughout the genome. Calmar and Kordaat represent distinct types within the cultivated species; arbitrary primers detected an average of 0.53 polymorphism per primer between these two parents (R.V.K., unpublished results). Polymorphisms may be detected even more frequently in the Dm518 region because this gene was originally introgressed from the wild species, L. serriola.

The observed experimental sensitivity of bulked segregant analysis correlated well with that predicted from reconstruction experiments and studies with known markers. All polymorphic loci assayed within 15 cM of the target locus are likely to be identified; loci further away will be detected with decreasing frequency as genetic distance increases. Similar results were obtained with RFLP and RAPD markers. The narrow width of the genetic window for RAPD markers was not anticipated; in theory, the sensitivity of PCR might be expected to reveal alleles even when rarely present in the mixture. In practice, even alleles as prevalent as a proportion of 0.1 of a mixture were barely detectable. This probably reflects the competition that occurs during the initial cycles of RAPD amplification between templates with various degrees of mismatch with the primer. Precise sensitivity will vary with the sequence amplified for RAPD markers and with the age of the blot and the particular probe for RFLP markers.

The width of the genetic window will also depend on the segregating population used to construct the bulks. Any segregating population originating from a single cross can be used; bulks made from backcross populations would provide greater focus around the region of interest than F2 populations, which provide maximal genetic width of the region screened for polymorphism. If sufficient individuals are pooled to form each bulk, the genetic window will be symmetrical around the target locus; this is in contrast to the region around a locus selected during the generation of NILs, which may be extremely asymmetrical (12).

Bulked segregant analysis will allow rapid mapping of loci that do not segregate in the original populations used to develop the genetic map. Bulks for the unmapped locus would be made from a new population segregating for that locus and screened using markers known to be spaced at ~30- to 40-cM intervals through the genome. If RFLP markers were being used, Southern blots could include multiple pairs of bulks for several loci segregating in different populations. Probes could be combined so that the whole genome could be screened rapidly to locate several loci simultaneously. Once markers that distinguish the bulks are identified, precise linkage distance could be determined by segregation analysis. If differences between a pair of bulks are not detected with existing RFLP or RAPD markers, the bulks would be screened for further markers using additional arbitrary RAPD primers; new polymorphic markers and therefore the locus would then be mapped by using an already characterized population. If only the approximate genetic position is required for a trait, the individuals segregating for the trait could be bulked prior to DNA extraction, necessitating only two extractions; the genetic position would be fixed by analyzing the ability of a series of linked markers in the region to distinguish the bulks.

Bulked segregant analysis overcomes several problems inherent in using NILs or cytogenetic stocks to identify markers linked to particular genes. There is minimal chance that regions unlinked to the target region will differ between the bulked samples of many individuals. In contrast, even after five backcrosses, only half the loci polymorphic between NILs are expected to map to the selected region (13). Linkage drag of large regions of DNA associated with the selected region in NILs (12) will not be problematic. In control experiments and in bulks from an F2 population, RFLP and RAPD polymorphisms were not detected further than 30 cM from the target locus. As bulked segregant analysis detects polymorphic loci using a segregating population, all loci detected will segregate and can be mapped. Some of the loci we have detected as polymorphic between NILs did not segregate in any of the populations we are
currently mapping (8). Near-isogenic lines require many backcrosses to develop and are therefore time consuming to generate; in contrast, bulked segregants can be made immediately for any locus or genomic region once the segregating population has been constructed. In addition, bulked segregants can substitute for cytogenetic stocks such as substitution and addition lines, for assigning probes to linkage groups or chromosome arms (14, 15), because bulks can be accurately made for particular regions as needed and do not require extensive cytological manipulations to generate and maintain.

Bulks can be made to identify markers in regions that lack markers, such as gaps in the genetic map or ends of linkage groups. To fill a gap in the genetic map, two bulks would be made from the segregating F2 individuals of the mapping population. Each bulk would be homozygous for each nonrecombinant genotype for the interval; recombinants would be excluded from the analysis. The bulks would be screened for RAPDs; as only two reactions are required for each primer and each primer detects 5-10 loci, hundreds of loci can be screened per day. Linkage would be confirmed by segregation analysis. Similarly, bulked segregant analysis can be made sequentially for neighboring regions to define the genetic end of a linkage group. Bulks would be made of each homozygous genotype for the terminal locus of a linkage group. If new markers are identified that are distal to the original terminal marker, bulked segregant analysis will be repeated until no more distal markers are identified and the genetic end of the linkage group is reached. This would be "genetic walking" along the chromosome: a locus identified in one round of bulked segregation analysis is used to generate the bulks for the next round. Such an analysis would consolidate the genetic map until the number of linkage groups equaled the chromosome number, unless a large region is nearly monomorphic between the two parents or there are regions of extremely frequent recombination.

Bulked segregant analysis could be extended to the analysis of genetically complex traits by screening bulks of informative individuals. If a quantitative trait is controlled by a few major genes (QTL), comparison of bulks of extreme individuals could rapidly identify markers linked to QTL. This could be made more powerful by progeny testing the extreme individuals and discarding those that do not show heritable variation. Bulked segregant analysis may also be useful in mapping loci showing partial penetrance, such as some disease loci in humans. A bulk of those progeny expressing the trait would be compared to the parents or a bulk of nonexpressing progeny (depending on the dominance relationships and the homozygosity of the parents).

Bulked segregant analysis should also be useful in analyzing species that are obligatorily outbreeding as in most animal species. In obligatorily outbreeding species, if the two parents originate from an interbreeding population, linkage equilibrium between some loci and the target locus may prevent their detection by bulked segregant analysis. However, if individuals from only a single family are bulked, bulked segregant analysis should identify some linked markers; only when the same alleles segregate in the gametes of both parents but in linkage equilibrium with the target locus will differences not be detected between the bulks. Bulking individuals from multiple families will increase the probability that linkage equilibrium will obstruct bulked segregant analysis. The challenge for human genetics will be to identify individual families of sufficient size to allow informative bulking. Even bulks made from small families will provide great enrichment for linked polymorphic markers; linkage can be subsequently confirmed by segregation analysis of many families.

We thank S. V. Tingey, J. A. Rafalski, and J. G. K. Williams (DuPont and Nemours) for their help with RAPD markers and R. Sa4l (Operon Technologies) for providing many of the oligonucleotide primers. We gratefully acknowledge the support of U.S. Department of Agriculture Grant USDA-88-CRCR-37262-3522.

1. O'Brien, S. J., ed. (1990) Genetic Maps (Cold Spring Harbor Lab., Cold Spring Harbor, NY), Vol. 6.
2. Bishop, D. T., Cannings, C., Scolnick, M. & Williamson, J. (1983) in Statistical Analysis of DNA Sequencing Data, ed. Weir, B. S. (Dekker, New York), pp. 181-200.
3. Williams, J. G. K., Kubelik, A. R., Livak, K. J., Rafalski, J. A. & Tingey, S. V. (1990) Nucleic Acids Res. 18, 6531-6535.
4. Young, N. D., Zamir, D., Ganal, M. W. & Tanksley, S. D. (1988) Genetics 120, 579-585.
5. Landry, B. S., Kesseli, R. V., Farrara, B. & Michelmore, R. W. (1987) Genetics 116, 331-337.
6. Kesseli, R. V., Paran, I. & Michelmore, R. W. (1990) in Genetic Maps, ed. O'Brien, S. J. (Cold Spring Harbor Lab., Cold Spring Harbor, NY), Vol. 6, pp. 100-102.
7. Bernatzky, R. & Tanksley, S. D. (1986) Theor. Appl. Genet. 72, 314-321.
8. Paran, I., Kesseli, R. & Michelmore, R. W. (1991) Genome, in press.
9. Lander, E. S., Green, P., Abrahamson, J., Barlow, A., Daly, M. J., Lincoln, S. E. & Newburg, L. (1987) Genomics 1, 174-181.
10. Michiels, F., Burmeister, M. & Lehrach, H. (1987) Science 236, 1305-1308.
11. Rommens, J. M., Iannuzzi, M. C., Kerem, B.-S., Drumm, M. L., Melmer, G., Dean, M., Rozmahel, R., Cole, J. L., Kennedy, D., Hidaka, N., Zsiga, M., Buchwald, M., Riordan, J. R., Tsui, L.-C. & Collins, F. S. (1989) Science 245, 1059-1065.
12. Young, N. D. & Tanksley, S. D. (1989) Theor. Appl. Genet. 77, 353-359.
13. Muelbauer, G. J., Specht, J. E., Thomas-Compton, M. A., Staswick, P. E. & Bernard, R. L. (1988) Crop Sci. 28, 729-735.
14. Helentjaris, T., Weber, D. F. & Wright, S. (1986) Proc. Natl. Acad. Sci. USA 83, 6035-6039.
15. Weber, D. & Helentjaris, T. (1989) Genetics 121, 583-590.
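
As a worked check of the false-positive estimate given in the Discussion, the sketch below evaluates 2(1 - (1/4)^n)(1/4)^n, the probability that an unlinked dominant marker segregating in an F2 population shows a band in one bulk of n individuals and none in the other. The function name and the bulk sizes in the loop are illustrative assumptions; only the formula and the n = 10 case come from the text.

```python
def p_unlinked_polymorphic(n: int) -> float:
    """Probability that an unlinked dominant F2 marker distinguishes two bulks of size n.

    (1/4)**n is the chance that all n individuals in a bulk lack the band
    (each is homozygous recessive with probability 1/4); the factor 2 covers
    either bulk being the band-less one.
    """
    if n < 1:
        raise ValueError("bulk size must be at least 1")
    q = 0.25 ** n
    return 2.0 * (1.0 - q) * q


# Illustrative bulk sizes; 14 and 20 bracket the bulk sizes used in the study.
for n in (2, 5, 10, 14, 20):
    print(f"n = {n:2d}: {p_unlinked_polymorphic(n):.3e}")
```

For n = 10 this evaluates to about 1.9 x 10^-6, in line with the 2 x 10^-6 quoted in the text. For very small bulks the probability rises quickly, consistent with the remark that false positives become more frequent as bulks shrink.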