Phenotypic and Molecular Evolution Across 10,000 Generations in Laboratory Budding Yeast Populations
complex combination of deterministic and stochastic forces. On the one hand, beneficial mutations
that establish within a population often rise to fixation at rates nearly perfectly predicted by deca-
des-old theory. On the other hand, random forces such as mutation, genetic drift, and recombina-
tion ensure an enduring role for chance and contingency. To understand evolution, we must
appreciate the interactions between these deterministic and stochastic components.
While there is extensive theoretical work analyzing how the interplay between these factors
affects the rate, predictability, and molecular basis of evolution, empirical evidence remains relatively
limited. In large part, this stems from a basic difficulty: we cannot easily characterize the predictabil-
ity of evolution using observational studies of natural populations, because we cannot replicate evo-
lutionary history. In addition, the inferences we can make from extant populations and the fossil
record are limited by a lack of complete data.
To circumvent these difficulties, scientists have turned to laboratory evolution experiments, pri-
marily in microbial populations. These provide a simple model system in which researchers can main-
tain many replicate populations for hundreds or thousands of generations, in a setting where the
environment and other relevant parameters (e.g. population size) can be precisely controlled and
manipulated. By conducting phenotypic and sequencing studies of the resulting evolved lines, we
can observe evolution in action, and ask whether specific phenotypic and genotypic outcomes are
predictable.
Over the last several decades, a few consistent results have emerged from these types of experi-
ments (reviewed in Kassen, 2014). As populations evolve in a constant environment, they gain fit-
ness along a fairly predictable trajectory, following a pattern of declining adaptability in which the
rate of fitness increase slows as populations adapt (Couce and Tenaillon, 2015;
Kryazhimskiy et al., 2014; Wiser et al., 2013). Meanwhile, the rate of molecular evolution remains
roughly constant (Barrick et al., 2009; Good et al., 2017; Tenaillon et al., 2016). Mutations are
rarely predictable at the nucleotide level but often moderately predictable at higher levels: muta-
tions in certain genes or pathways are repeatedly fixed across replicate populations (Bailey et al.,
2015; Kryazhimskiy et al., 2014; Tenaillon et al., 2012; Tenaillon et al., 2016). Phenotypes not
under direct selection change less predictably than fitness in the evolution environment, but some-
times still exhibit some correlation with level of adaptation in the evolution environment
(Jerison et al., 2020; Leiby and Marx, 2014; Ostrowski et al., 2005).
Most of these microbial and viral evolution experiments, as well as those in multicellular eukaryotes
such as C. elegans and Drosophila melanogaster, involve at most about 1000 generations of
adaptation to a novel environment. This makes them well suited to studying the initial dynamics of
adaptation, where a population encounters a novel environment and rapidly acquires beneficial
mutations as it evolves in response to this new challenge. However, it is unclear how far we can
extrapolate findings from this type of study. Will evolutionary dynamics remain similar over longer
timescales? Or will the evolutionary dynamics change in qualitative ways once a population has had
thousands of generations to become well-adapted to the laboratory environment?
The experiment best equipped to answer this question is the Long-Term Evolution Experiment
(LTEE) conducted by Richard Lenski and collaborators. For over 30 years and 70,000 generations
(reviewed in Lenski, 2017), the Lenski lab has propagated 12 Escherichia coli populations in minimal
media by batch culture. The LTEE has led to numerous insights into evolutionary dynamics over both
short and long timescales, and has also provided many examples of interesting phenomena such as
contingency (Blount et al., 2012; Good et al., 2017), the spontaneous emergence of quasi-stable
coexistence (Good et al., 2017; Plucain et al., 2014; Rozen and Lenski, 2000), and evolution of
mutation rates (Sniegowski et al., 1997; Wielgoss et al., 2013). The LTEE is unique among micro-
bial evolution experiments in its long timescale, and provides an important look at evolution well
beyond the initial rapid adaptation of a population to a novel laboratory environment. However, it is
limited by its specificity: it involves 12 replicate populations, each founded from a single E. coli
strain, all evolving in the same constant environment. It thus remains unclear which of the broad con-
clusions drawn from this experiment will be generalizable to other organisms and environments.
Would we draw similar conclusions when other species are allowed to evolve in other environments
for long periods of time?
While no other laboratory evolution experiments match the LTEE in timescale, a few have
extended beyond the ~1,000 generations of most other experiments. For example, Behringer et al.,
2018 evolved E. coli populations in tubes for up to 10,000 generations and found that they
repeatedly evolved a biofilm phenotype and stable coexisting subpopulations. Fisher et al., 2018
evolved laboratory populations of the budding yeast S. cerevisiae for 4000 generations, finding that
as in E. coli, these populations gain fitness along predictable trajectories characterized by declining
adaptability. This experiment, along with Marad et al., 2018, also studied the relationship between
ploidy and adaptation, finding that in general diploids adapt more slowly than haploids. Slower
adaptation in diploids has been observed in yeast evolution experiments in a variety of environments
and appears to be caused by the reduced efficacy of selection on recessive or partially recessive
beneficial mutations in diploids (Zeyl et al., 2003; Gerstein et al., 2011). While these experiments
provide an important first look into long-term adaptation in yeast and E. coli, they all involve rela-
tively limited whole-population sequencing, and none have provided data on the dynamics of molec-
ular evolution in both haploid and diploid populations over many thousands of generations.
To fill this gap, we established a long-term evolution experiment in the spirit of the LTEE, with a
total of 205 budding yeast populations (split between haploids and diploids) evolving in three differ-
ent laboratory environments. In this paper, we describe the first 10,000 generations of this experi-
ment. We find that some aspects of evolution in our system are broadly consistent with the
conclusions of the LTEE and other long-term evolution experiments. For example, the dynamics of
fitness increase are largely repeatable between replicate lines and show a pattern of declining
adaptability over time even while the rate of molecular evolution remains relatively constant. How-
ever, there are also key differences: we find no evidence of stably coexisting lineages or widespread
evolution of mutator phenotypes. As the first laboratory evolution of this length in a eukaryotic sys-
tem, our study provides an important test of the generality of conclusions from earlier work (primar-
ily the LTEE), as well as a novel opportunity to observe evolutionary dynamics over long timescales
across many replicate populations in multiple environmental conditions.
Results
We founded 45 haploid mating type a (MATa), eight mating type α (MATα), and 37 diploid S. cerevisiae
populations in each of three evolution environments (90 populations per environment, for a total
of 270 independent lines; see Figure 1). Each population was founded from a single independent
colony of the corresponding ancestral W303 MATa, MATα, or diploid strain (see Materials and methods
for details). We then propagated each population in batch culture in one well of an unshaken
[Figure 1 graphic: 96-well plate layouts (rows A–H, columns 1–12) for the three environments (YPD 30˚C, SC 30˚C, SC 37˚C), with a 1/1024 dilution step marked and timepoints labeled Gen. 0, Gen. 10, and Gen. 10,000 for the 30˚C environments and Gen. 0, Gen. 8, and Gen. 8,000 for SC 37˚C.]
Figure 1. Experimental design. We propagated budding yeast lines in 96-well microplates in one of three environmental conditions, using a daily
dilution protocol as shown at top. Each population was founded by a single clone of one of three ancestral genotypes (a haploid MATa, a haploid
MATα, and a diploid, all derived from the W303 strain background). On a weekly basis, we froze all populations in glycerol at −80˚C for long-term
storage. The frozen timepoints used for the analyses in this paper are indicated at bottom.
96-well microplate in the appropriate environment (YPD at 30˚C, SC at 30˚C, and SC at 37˚C), with
daily 1:2^10 dilutions for the 30˚C environments, and 1:2^8 dilutions for the 37˚C environment. We froze
glycerol stocks of each population every week (corresponding to every 70 generations in the 30˚C
environments, and every 56 generations in the 37˚C environment), creating a frozen fossil record for
future analysis. A total of 65 populations were lost during the first 10,000 generations of evolution
due to contamination, evaporation, or pipetting errors (see Materials and methods for details;
Supplementary file 1), leaving us with 205 populations.
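The generation counts quoted above follow from the dilution factors by simple arithmetic. The sketch below is an illustration only: it assumes dilution factors of 2^10 (30˚C) and 2^8 (37˚C), seven daily transfers between freezes, and that each culture regrows to its pre-dilution density every day. The function names are hypothetical.

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow to the pre-dilution density (assumed)."""
    return math.log2(dilution_factor)


def generations_between_freezes(dilution_factor: float, transfers: int = 7) -> float:
    """Generations elapsed over a number of daily transfers (default: one week)."""
    return generations_per_transfer(dilution_factor) * transfers


# Assumed dilution factors: 2**10 for the 30C environments, 2**8 for 37C.
per_week_30c = generations_between_freezes(2 ** 10)
per_week_37c = generations_between_freezes(2 ** 8)
print(per_week_30c, per_week_37c)
```

Under these assumptions the weekly totals match the 70 and 56 generations between frozen stocks stated in the text.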
Figure 2. Fitness changes during evolution. Competitive fitness is plotted relative to a reference strain in each environment. Inferred ancestral fitness is
indicated by horizontal lines and colored by strain. Populations with premature stop-codon reversion mutations in ADE2 are indicated by asterisks.
Correlations between replicate fitness measurements are shown in Figure 2—figure supplement 2.
The online version of this article includes the following figure supplement(s) for figure 2:
Figure supplement 1. Declining adaptability.
Figure supplement 2. Correlations between absolute fitness measured in replicate competitions with a fluorescent reference.
population, but also between ancestral strains of different ploidy and mating type (Figure 2—figure
supplement 1).
Molecular evolution
At six of the timepoints used for fitness assays (Figure 1), we also performed whole-population,
whole-genome sequencing in 90 focal populations (12 MATa, 12 diploid, and 6 MATα from each
environment). After aligning sequencing reads and calling variants, we use observed allele counts
across multiple timepoints to filter out sequencing and alignment errors and identify a set of muta-
tions present in each evolving population (Materials and methods). At each sequenced timepoint,
we call mutations fixed if they are at greater than or equal to 40% frequency (diploids) or 90% fre-
quency (haploids) and do not drop below these thresholds at a later timepoint. We additionally call
loss of heterozygosity in mutations in diploids using the criteria for fixation in haploids (90%
threshold).
Our data show that mutations fix steadily through time across all sequenced populations (Figure 3).
While we would need more sequenced timepoints to fully observe the frequency trajectories
of mutations in these populations, we can see a few patterns from our temporally sparse sequencing
(Figure 3A, Figure 3—figure supplement 1–9). We frequently observe clonal interference in which
groups of mutations rise to high frequency and then plummet to extinction, outcompeted by
another group. All populations fix mutations throughout the experiment; we find no evidence for
the emergence of stably coexisting lineages within any of our populations (Figure 3B, Figure 3—fig-
ure supplement 10). Denser sequencing through time would be required to determine whether any
populations exhibit shorter periods of semistable coexistence (e.g. as seen by Frenkel et al., 2015).
It is also possible we are missing coexistence of haplotypes at very low frequency (<~5%), which
sequencing may not be able to detect. However, our results rule out long-term coexistence of multi-
ple lineages at substantial frequencies like that observed after 10,000 generations of evolution in the
LTEE or Behringer et al., 2018.
We find that the rate of mutation accumulation in the MATα populations is consistently higher
than in MATa or diploid populations (Figure 3B). This is likely due to a higher mutation rate in our
MATα ancestor. Consistent with this hypothesis, we find that MATα populations have a lower ratio
of nonsynonymous to synonymous mutations than MATa or diploid populations in all three environments,
as expected if a higher mutation rate leads to an increase in hitchhiking (although we note
that this comparison is only significant in SC 37˚C; p<0.01, Mann-Whitney U Test, Figure 4A). We
identified a putative causal mutation in TSA1 in our MATα ancestor; this mutation is absent in our
MATa ancestor and heterozygous in our diploid ancestor. We confirmed that the TSA1 mutation
increases mutation rate in a BY strain background (Figure 4—figure supplement 1).
Overall, we find that dN/dS ratios for fixed mutations in our populations are near one
(Figure 4A), suggesting that selection in favor of beneficial (and presumably typically nonsynonymous)
mutations is balanced by hitchhiking of neutral mutations and purifying selection against deleterious
mutations. The relative prevalence of different types of fixed mutations across strains and
environments is similar, with roughly 45–50% missense mutations, 40–45% synonymous and noncoding
mutations, and 5–10% nonsense and indel mutations (Figure 4B). While there is variation
between populations in the number of mutations accumulated, we do not observe any sudden
increases in the rate of mutation accumulation (Figure 3B). This stands in contrast to the LTEE,
where mutator alleles sweep to fixation and dramatically increase the mutation rate in 6 of 12 repli-
cate populations (four of which are apparent in sequencing data after 10,000 generations
[Good et al., 2017]). We do observe one potential mutator event: P1E11, an MATa population
evolved in YPD 30˚C, has an unusually large number of indel mutations, likely due to a mutation in
the mismatch repair protein MSH3 that hitchhiked to fixation with an indel mutation in GPB2 (Fig-
ure 4—figure supplement 2). However, the elevation in mutation rate in this population remains rel-
atively modest. While we do observe mutations in mutator-associated genes such as MSH3 in other
populations, we do not observe clear differences in mutation-type distribution or rate of mutation
accumulation in these populations, suggesting that these mutations lead to at most subtle changes
in mutation rate (stacked mutation type plots for each population are shown in Figure 4—figure
supplements 3–14). Further work will be needed to characterize more subtle variation in mutation
rate in each of these populations.
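The dN/dS statistic discussed above is defined in the Figure 4 caption as the ratio of nonsynonymous to synonymous fixations, scaled by the ratio of possible nonsynonymous to synonymous mutations across the genome. A minimal sketch of that calculation follows; the function name and all numbers are invented for illustration, and how the counts of possible mutations are obtained is not covered here.

```python
def dn_ds(n_nonsyn_fixed: int, n_syn_fixed: int,
          possible_nonsyn: float, possible_syn: float) -> float:
    """Observed nonsynonymous/synonymous ratio divided by the ratio expected
    if fixations fell at random among the possible mutations."""
    if n_syn_fixed == 0 or possible_nonsyn == 0 or possible_syn == 0:
        raise ValueError("ratio undefined when a denominator is zero")
    observed_ratio = n_nonsyn_fixed / n_syn_fixed
    expected_ratio = possible_nonsyn / possible_syn
    return observed_ratio / expected_ratio


# Hypothetical population: 30 nonsynonymous and 10 synonymous fixations,
# with an assumed 3:1 ratio of possible nonsynonymous to synonymous mutations.
example = dn_ds(30, 10, possible_nonsyn=3.0, possible_syn=1.0)
print(example)
```

A value near one, as in this invented example, means nonsynonymous and synonymous mutations fix in proportion to their availability, which is the pattern reported for most populations.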
Figure 3. Dynamics of molecular evolution. (A) Allele frequencies over time in four example populations. Nonsynonymous mutations in ‘multi-hit’ genes
are solid black lines (see ‘Parallelism’ section below), nonsynonymous mutations in the adenine biosynthesis pathway are colored orange and labeled,
other nonsynonymous mutations are thin gray lines, and synonymous mutations are dotted lines. (B) Number of fixed mutations over time in each
population. Timepoints with average coverage less than 10 (for haploids) or 20 (for diploids) are not plotted.
The online version of this article includes the following figure supplement(s) for figure 3:
Figure supplement 1. Allele frequencies over time in all focal diploid populations in YPD 30˚C.
Figure supplement 2. Allele frequencies over time in all focal MATa populations in YPD 30˚C.
Figure supplement 3. Allele frequencies over time in all focal MATα populations in YPD 30˚C.
Figure supplement 4. Allele frequencies over time in all focal diploid populations in SC 30˚C.
Figure supplement 5. Allele frequencies over time in all focal MATa populations in SC 30˚C.
Figure supplement 6. Allele frequencies over time in all focal MATα populations in SC 30˚C.
Figure supplement 7. Allele frequencies over time in all focal diploid populations in SC 37˚C.
Figure supplement 8. Allele frequencies over time in all focal MATa populations in SC 37˚C.
Figure supplement 9. Allele frequencies over time in all focal MATα populations in SC 37˚C.
Figure supplement 10. No evidence of coexistence.
Figure supplement 11. Copy number variation in the ribosomal DNA array and CUP1 array, determined from sequencing coverage data.
Figure 4. Types of mutations. (A) Swarm plot of dN/dS (ratio of nonsynonymous / synonymous fixations by the final timepoint, scaled by the ratio of
possible nonsynonymous / synonymous mutations across the genome) for each environment-strain combination. Each point represents one population
and the horizontal line represents the median. Asterisks indicate significant differences (p<0.01, Mann-Whitney U test) between strains in the same
environment. (B) Breakdown of mutation types for all mutations fixed by the final timepoint, in all populations corresponding to each environment-strain
combination.
The online version of this article includes the following figure supplement(s) for figure 4:
Figure supplement 1. Confirmation that the TSA1 mutation increases mutation rate.
Figure supplement 2. Population P1E11, a putative mutator.
Figure supplement 3. Stacked plot of heterozygous or homozygous fixed mutation types over time in all focal diploid populations in YPD 30˚C.
Figure supplement 4. Stacked plot of homozygous-only (lost heterozygosity) fixed mutation types over time in all focal diploid populations in YPD 30˚
C.
Figure supplement 5. Stacked plot of fixed mutation types over time in all focal MATa populations in YPD 30˚C.
Figure supplement 6. Stacked plot of fixed mutation types over time in all focal MATα populations in YPD 30˚C.
Figure supplement 7. Stacked plot of heterozygous or homozygous fixed mutation types over time in all focal diploid populations in SC 30˚C.
Figure supplement 8. Stacked plot of homozygous-only (lost heterozygosity) fixed mutation types over time in all focal diploid populations in SC 30˚C.
Figure supplement 9. Stacked plot of fixed mutation types over time in all focal MATa populations in SC 30˚C.
Figure supplement 10. Stacked plot of fixed mutation types over time in all focal MATα populations in SC 30˚C.
Figure supplement 11. Stacked plot of heterozygous or homozygous fixed mutation types over time in all focal diploid populations in SC 37˚C.
Figure supplement 12. Stacked plot of homozygous-only (lost heterozygosity) fixed mutation types over time in all focal diploid populations in SC 37˚
C.
Figure supplement 13. Stacked plot of fixed mutation types over time in all focal MATa populations in SC 37˚C.
Figure supplement 14. Stacked plot of fixed mutation types over time in all focal MATα populations in SC 37˚C.
Parallelism
Next, we examined whether mutations in certain genes are fixed more frequently than we would
expect by chance. We define a ‘hit’ as a nonsynonymous mutation that is fixed by the final timepoint,
and define the multiplicity of a gene as the number of hits in that gene across all sequenced popula-
tions, divided by its relative target size (Good et al., 2017). As in many other laboratory evolution
experiments, we observe an excess of high multiplicity genes in our data, relative to a null in which
mutations are fixed randomly across all open-reading frames (Figure 5A).
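To make the multiplicity definition and the null comparison concrete, here is a small sketch. It assumes that "relative target size" means a gene's length divided by the mean gene length, and it draws null hits across genes in proportion to length while matching only the total number of hits (the null described in the Figure 5 caption also accounts for the number of hits in each population). Gene names, lengths, and counts are invented.

```python
import random
from collections import Counter
from typing import Dict


def multiplicities(hits: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Hits per gene divided by relative target size (length / mean length)."""
    mean_len = sum(lengths.values()) / len(lengths)
    return {g: hits.get(g, 0) / (lengths[g] / mean_len) for g in lengths}


def simulate_null_hits(total_hits: int, lengths: Dict[str, int],
                       rng: random.Random) -> Dict[str, int]:
    """Scatter the same number of hits across genes, weighted by gene length."""
    genes = list(lengths)
    weights = [lengths[g] for g in genes]
    draws = rng.choices(genes, weights=weights, k=total_hits)
    return dict(Counter(draws))


# Invented toy genome of three genes.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}
hits = {"geneA": 6, "geneB": 2}

observed = multiplicities(hits, lengths)
null_hits = simulate_null_hits(sum(hits.values()), lengths, random.Random(0))
null = multiplicities(null_hits, lengths)
print(observed)
print(null)
```

Repeating the null draw many times and comparing the resulting distribution of multiplicities with the observed one is the kind of comparison summarized in Figure 5A.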
To understand the functional basis of this parallelism, we focus on multi-hit genes, defined as
those with hits in six or more populations. These multi-hit genes (Figure 6) are enriched for several
gene ontology (GO) terms (Supplementary file 4), indicating parallelism at the level of biosynthetic
Figure 5. Parallelism. Comparison between null and actual distributions of (A) the fraction of genes with multiplicity m (see Materials and methods),
(B) the fraction of genes with hits in PH populations, and (C) the fraction of amino acid sites with hits in PH populations (those with PH ≥ 3 are listed
in Supplementary file 4). For all three plots, the null distribution (shown in gray) is obtained by simulating random hits to genes, taking into account
the number of hits in each population in our data and the relative length of each gene.
and signaling pathways. In Figure 6, we show all genes with hits in ten or more populations, and
highlight several key functional groups (adenine biosynthesis, sterility, and negative regulators of the
Ras pathway; see Figure 6—figure supplements 1–3 for analogous figures for all other multi-hit
genes). Mutations in the latter two functional groups are commonly observed in yeast evolution
experiments, and have been shown to be beneficial in similar environments (Rojas Echenique et al.,
2019; Kryazhimskiy et al., 2014; Lang et al., 2013; Venkataram et al., 2016). The mutations in
adenine biosynthesis, by contrast, reflect the particular genotype of our ancestral strains; we discuss
these further below.
We next asked whether some multi-hit genes are more likely to fix mutations in particular strain
backgrounds or environments. We find that most multi-hit genes have mutations distributed across
both haploid mating types, diploids, and all three environmental conditions, indicating that these
mutations are presumably beneficial in all these contexts. However, we do find several mutations
that are either strain or environment specific (‘Effect’ column in Figure 6; Figure 6—figure supple-
ment 4, Supplementary file 4). For example, mutations in SRS2 and LCB3 are fixed more often in
SC 37˚C, while mutations in CCW12 are fixed more in diploids.
To investigate the impact of the mutations in multi-hit genes on protein function, we used SnpEff
(Cingolani et al., 2012) to predict the impact of each mutation. In Figure 6, we show the fraction of
mutations in each multi-hit gene that were annotated as ‘High Impact’. Because most of these high-impact
mutations are nonsense or frameshift mutations, they are very likely to lead to loss of function
of the associated gene, as are some fraction of the ‘Moderate Impact’ mutations (e.g. some mis-
sense mutations or in-frame deletions). We find that many of our multi-hit genes have a large per-
centage of high-impact mutations, suggesting that selection acts in favor of loss-of-function of the
corresponding genes, consistent with many earlier laboratory evolution experiments (Murray, 2020).
However, this is not universal: a few genes with 10 or more hits have no high-impact mutations fixed,
and several of these genes are essential (Figure 6). This suggests that selection in these genes may
be instead for change- or gain-of-function.
Figure 6. Multi-hit genes. Each row represents a gene. The first three blocks are groups of genes identified from gene-ontology enrichment analysis of
multi-hit genes (from top to bottom: adenine biosynthesis, sterility, and negative regulation of the Ras pathway). The bottom block is all other genes
with hits in at least 10 populations. Each column in the heatmap represents a population, such that if a gene is hit in that population the square will be
colored (darker color if a gene is hit two or more times in that population). Red squares indicate premature-stop-lost mutations in ADE2, which
correspond to the populations with asterisks in Figure 2. One population that was not sequenced (not shown here) also has this mutation (confirmed
by Sanger sequencing). The table at left gives more information on each multi-hit gene: ‘High impact’ is the fraction of hits that are likely to cause a
loss-of-function, as annotated by SnpEff (e.g. nonsense mutations), ‘LOH’ (loss of heterozygosity) is the fraction of hits in diploid populations that fix
homozygously, and ‘Effect’ describes whether the hits are distributed significantly unevenly across strain-types (S), environments (E), or both (SxE), when
compared to a null model where fixations are not strain or environment dependent.
The online version of this article includes the following figure supplement(s) for figure 6:
Figure supplement 1. Same as Figure 6, but for all multi-hit genes not shown in Figure 6 (plot 1/3).
Figure supplement 2. Same as Figure 6, but for all multi-hit genes not shown in Figure 6 (plot 2/3).
Figure supplement 3. Same as Figure 6, but for all multi-hit genes not shown in Figure 6 (plot 3/3).
Figure supplement 4. Same as Figure 6, but for all multi-hit genes where hits are distributed significantly unevenly across strain-types (S),
environments (E), or both (SxE) compared to a null model where fixations are not strain or environment dependent.
[Figure 7 graphic. Panel A labels: ADE4, ADE5, ADE8, ADE6, ADE7, ADE2, ADE3, AIR, Adenine, with per-gene counts of fixed mutations. Panel B state labels: Ancestral state, False summit, Fitness peak, Unexplored ridgeline. Panel B annotations: "Functional ADE pathway"; "LOF upstream in ADE pathway, ADE2 broken"; "ADE2 fixed, pathway still broken"; "High mutation rate (large target size)".]
Figure 7. ADE pathway evolution. (A) Simplified schematic of the adenine biosynthesis pathway. Circles represent metabolic intermediates; AIR is the
toxic metabolic intermediate phosphoribosylaminoimidazole. Annotations represent the number of fixed nonsynonymous mutations in each gene (note
that ADE5 and ADE7 are both products of the same gene). (B) Schematic of a fitness landscape with four possible states defined by whether ADE2 is
functional and whether the ADE pathway upstream of ADE2 is functional. The small insets represent the state of the pathway in (A) at each position.
Elevation in the landscape represents putative fitness differences, and the width of the arrows represents the putative mutation rates between the
different states.
The online version of this article includes the following figure supplement(s) for figure 7:
Figure supplement 1. Overdispersion.
Figure supplement 2. Mutual information analysis.
functional because they disrupt adenine biosynthesis, are strongly beneficial in the ade2-1 back-
ground because they prevent this toxic buildup (Rojas Echenique et al., 2019). Consistent with this,
we see rapid fixation of at least one mutation in the ADE pathway, typically upstream of ADE2, in
almost all of our sequenced populations, along with frequent loss of heterozygosity of these muta-
tions in diploids (Figure 3A).
Five of our sequenced populations find a better solution: they fix mutations that revert the pre-
mature stop codon so that the full ADE2 sequence can be translated (populations indicated by aster-
isks in Figure 2 and mutations shown in red in Figure 6; note that one unsequenced high-fitness
population also has this mutation, confirmed by Sanger sequencing). These populations have higher
fitness than other populations from the same strain background and environment, presumably
because they have both repaired the defect in adenine biosynthesis and avoided the buildup of the
toxic intermediate. As we would expect, these populations do not fix any loss-of-function mutations
in other ADE pathway genes. The fact that only six of our populations find this higher fitness rever-
sion of ade2-1 is presumably a consequence of differences in target size: while loss-of-function in
genes upstream in the pathway can arise from a variety of mutations in five genes upstream of
ADE2, the ade2-1 reversion requires a mutation at a specific codon in ADE2.
We note that once a population has fixed an upstream loss-of-function mutation, it requires rever-
sion of both the original ade2-1 mutation and the upstream mutation to find the higher fitness geno-
type. While this is possible in principle, both mutations have single-codon target sizes and when
they occur alone are likely neutral and deleterious respectively, making this evolutionary path
extremely improbable. We do not observe any populations that move from the lower fitness geno-
type to the higher fitness genotype even after 10,000 generations of evolution. Figure 7 depicts
these evolutionary states using a simple fitness landscape framework.
Contingency
The alternative evolutionary paths involving mutations in the ADE pathway are an example of contin-
gency that is already well understood (Rojas Echenique et al., 2019; Roman, 1956). We next sought
to analyze the role of contingency more broadly in our experiment. To do so, we first analyzed
whether mutations are over-dispersed or under-dispersed among populations, following
Good et al., 2017. Looking within each environment-strain combination, we find that nonsynony-
mous mutations are more over-dispersed than expected by chance; this is still true if we also include
mutations that are present but not fixed (Figure 7—figure supplement 1). This provides evidence
of ‘coupon collecting’: populations with a fixed nonsynonymous mutation in a gene are less likely to
fix another mutation in that gene.
We next sought to test whether mutations in a given gene tend to open up or close off opportu-
nities for beneficial mutations in other genes. To do so, we calculated the mutual information
between multi-hit genes (i.e. for each pair of multi-hit genes, whether a population with a fixed non-
synonymous mutation in the first gene is more or less likely to have a fixed nonsynonymous mutation
in the second). As in Fisher et al., 2019, we find that the sum of mutual information across all pairs
of multi-hit genes in our experiment is higher than in simulations (p=0.036, Figure 7—figure supple-
ment 2). Thus, there is an overall statistical signature of contingency in our data: mutations in certain
genes make mutations in others more or less likely. However, we do not have power to isolate this
signature to individual pairs of genes; the mutual information between any two multi-hit genes in
our experiment is not higher than we would expect by chance. Note that because we calculate
mutual information separately for each environment-strain combination (at most 12 populations per
group), we have less power than Fisher et al., 2019 to detect interactions between genes. In sum,
while we cannot confidently identify more specific examples of contingency in our data beyond a
general pattern of coupon-collecting, it is likely to be playing a role, as in the LTEE (Good et al.,
2017).
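The mutual information statistic used in this analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each multi-hit gene is represented by a 0/1 vector recording whether each population in one environment-strain group has a fixed nonsynonymous mutation in that gene, mutual information is measured in bits, and the simulation used to obtain the null distribution is not reproduced. Function names and data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (bits) between two binary hit/no-hit vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:  # empty cells contribute nothing to the sum
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def total_mutual_information(hit_table: Dict[str, List[int]]) -> float:
    """Sum of pairwise mutual information over all pairs of genes."""
    return sum(mutual_information(hit_table[g1], hit_table[g2])
               for g1, g2 in combinations(sorted(hit_table), 2))


# Invented group of four populations and three genes.
hit_table = {
    "geneA": [1, 1, 0, 0],
    "geneB": [0, 0, 1, 1],  # never co-occurs with geneA
    "geneC": [1, 0, 1, 0],  # independent of both
}
print(total_mutual_information(hit_table))
```

Note that mutual information is high both when hits co-occur and when they exclude each other, which is why the summed statistic detects "more or less likely" in either direction but cannot by itself say which.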
Figure 8. Patterns of molecular evolution and loss of heterozygosity in diploids. (A) Genomic positions of all mutations that experienced loss of
heterozygosity (LOH) across all diploid populations (loss of heterozygosity defined by a mutation reaching >90% frequency). Orange marks represent
mutations in the ADE pathway. Each horizontal line represents one population, and the histogram at right represents the total number of LOH fixations
in each population, with populations arranged by environment. The top histogram represents the frequency of loss of heterozygosity across the
genome, and the chromosomes underneath show the centromere location with a black circle. Genes with five or more LOH fixations are annotated. (B)
The fraction of fixed nonsynonymous mutations that are in essential genes, plotted for mutations fixed in haploid populations, mutations fixed
homozygously in diploid populations (LOH) and mutations fixed heterozygously in diploid populations, plotted separately for mutations annotated as
high or moderate impact by SnpEff (high-impact mutations are likely to cause loss-of-function). The dashed line represents the fraction of the coding
genome that is in essential genes. (C) The ratio of high-impact to moderate-impact fixations in the same three mutation groups as in (B), for mutations
in non-essential genes only.
The online version of this article includes the following figure supplement(s) for figure 8:
Figure supplement 1. The ploidy state of two clones from each focal population, shown by FITC histograms of Sytox-stained cells.
Figure supplement 2. Cell imaging from three populations with abnormal Sytox data.
chromosomes XII and IV. These concentrations of LOH are likely due to some combination of selec-
tion in favor of LOH events and differences in the rates at which they occur. Higher rates of LOH on
the right arm of chromosome XII are likely related to high levels of recombination associated with
the ribosomal DNA array (Fisher et al., 2018; Marad et al., 2018), but we also see evidence that
patterns of LOH are affected by selection for recessive beneficial mutations that would otherwise be
filtered out by Haldane’s sieve (as in Gerstein et al., 2014), notably among loss-of-function muta-
tions in the adenine pathway (Figure 3A, Figure 8A).
As driver mutations sweep to fixation in diploids, they have the potential to bring along recessive
deleterious hitchhikers (which then also fix as heterozygotes). Consistent with this, we find that muta-
tions fixed as heterozygotes in diploids include a large percentage of high-impact mutations in
essential genes, while mutations fixed in haploids and mutations fixed homozygously in diploids
include nearly zero high impact mutations in essential genes (Figure 8B). Even in non-essential
genes, mutations that fix heterozygously in diploids are more likely to be high-impact mutations
compared to those that fix in haploids or those that fix homozygously in diploids, again suggesting
that diploids are fixing recessive deleterious mutations as heterozygotes (Figure 8C). This build-up
of recessive deleterious load in diploids is expected, but takes on an interesting light in the context
of the widespread loss of heterozygosity we observe. As recessive deleterious load accumulates in
the population, it will limit the rate of LOH by making many LOH events strongly deleterious or
lethal. Thus, passage through Haldane’s sieve by loss of heterozygosity should become less likely as
populations accumulate a substantial load of hitchhiking heterozygous mutations. This process is
likely to be occurring in domesticated industrial diploid yeast lineages, which often become obli-
gately asexual and accumulate many heterozygous mutations (Gallone et al., 2016). However, we
note that in sexual lineages recombination with sufficient inbreeding could dramatically alter these
dynamics, by continuously purging recessive deleterious load (Charlesworth and Willis, 2009).
While we hypothesize that most of the heterozygous fixations in diploids are either dominant
beneficial mutations or neutral or deleterious hitchhikers, some may be overdominant beneficial
mutations. One possible candidate for overdominance is CCW12, which is hit preferentially in dip-
loids (Figure 6) and in which only 2 of the 17 fixed mutations lost heterozygosity (both these muta-
tions are in-frame deletions of the final amino acid, and note that CCW12 is in a region on the right
arm of chromosome XII where LOH appears common). In Leu et al., 2020, mutations in CCW12
were maintained in asexual diploid populations but lost in sexual populations, supporting a hypothe-
sis of overdominance, although Leu et al., 2020 did not detect overdominance in reconstructed
strains in their evolution environments. Extensive reconstructions or backcrossing will be required to
understand the importance of overdominance in the evolution of our diploid populations.
Discussion
Evolution experiments are as much about hypothesis generation as hypothesis testing, and work
across the field has now laid out a series of hypotheses about evolution in general. No experiment
can cover the breadth of biological and environmental diversity needed to fully test these hypothe-
ses; we cannot replay all of evolution. However, a relatively consistent set of results has emerged
across microbial species evolved asexually for thousands of generations in the lab (Kassen, 2014).
Our results confirm many aspects of the picture drawn by previous work, with several important
exceptions.
Most of our populations followed predictable fitness trajectories in which fitness increases slowed
over time. This pattern was not observed, however, in some of our diploid populations in SC 30˚C,
which instead increased in fitness at a slow constant rate similar to Marad et al., 2018, before
experiencing significant rapid increases in fitness likely associated with individual selective sweeps
(Figure 2). Our populations show signatures of clonal interference, and they accumulated fixed
mutations linearly through time even late in the experiment. We find only one strong case of repeatable
selection at the level of the nucleotide change (ade2-1 reversions), but we observe widespread
parallelism across strains and environments at the level of genes and pathways: populations predictably
adapt through loss-of-function mutations in the adenine biosynthesis pathway, sterility-associated
genes, and negative regulators of the Ras pathway.
Figure 9. Loss of extrachromosomal elements. (A) Killer virus activity at each sequenced timepoint, determined by a killer assay against a sensitive
strain. Each row represents one population. Examples of raw data for each qualitative phenotypic category are shown in the key, and the full raw data
underlying these scores is shown in Figure 9—figure supplement 1. (B) 2-micron plasmid copy number at each sequenced timepoint. Rows represent
the same populations as in A. The x in a diploid population at generation 1410 marks a population we excluded due to contamination in the
population during these experiments.
The online version of this article includes the following figure supplement(s) for figure 9:
Figure supplement 1. Contrast-enhanced scanned images of killer virus halo assays.
We do not observe two phenomena that results from the LTEE had previously suggested might
be common: the fixation of mutator alleles that dramatically increase mutation rates, and the sponta-
neous emergence of long-term quasi-stable coexistence between competing lineages. The reasons
for these differences remain unclear. In part, we may not observe these phenomena simply because
of the shorter timescale of our experiment. However, we note that within the first 10,000 genera-
tions of the LTEE, 4 of the 12 populations fix mutator alleles, and 3 of the 12 populations have coex-
isting lineages detectable from sequencing data. Instead, the lack of mutator lineages may stem
from a difference in the rate at which mutators arise or a different balance between the relative
importance of beneficial and deleterious mutations (which depends on the environment and ances-
tral fitness; see e.g. Swings et al., 2017 and Kryazhimskiy et al., 2014) that leads to less indirect
selection for mutators (Good and Desai, 2016). We also may have less second-order selection for
mutators in our experiment because our strains have mutation rates that are higher than the ances-
tral E. coli strain in the LTEE, though lower than the LTEE mutator lineages (Lang and Murray,
2008; Wielgoss et al., 2013). The difference in how commonly coexistence emerges is similarly
unclear. Our strains and environments may simply lack the metabolic pathway architecture to pro-
duce cross-feeding or other interactions that could be the basis for coexistence. Alternately, coexist-
ing lineages may have emerged in our experiment but been lost due to drift or strong within-lineage
adaptation. Since our populations are smaller than those in the LTEE, low-frequency lineages will be
lost more commonly during the daily bottleneck. Regardless of the reasons for these differences,
our results suggest that the evolution of mutation rates and of stable ecological interactions may not
be as general or widespread as the LTEE has suggested, and may instead vary substantially based
on differences in the organisms or details of the environmental conditions.
As the longest running evolution experiment in yeast, this project provides a window into how
dominance and loss of heterozygosity can affect the dynamics of adaptation in diploids. Our diploid
populations appear to carry substantial recessive deleterious load (Figure 8B–C) and may carry ben-
eficial overdominant mutations, but future studies involving genetic reconstructions or backcrossing
will be needed to fully characterize these effects. We also observe widespread loss of heterozygosity.
The dynamics of mutations in the adenine biosynthesis pathway provide a particularly interesting
example of both how Haldane’s sieve slows adaptation in diploids and how diploids can bypass
the sieve by loss of heterozygosity. At some point during the experiment, most of our diploid popu-
lations homozygously fix a loss-of-function mutation upstream in the pathway, which eliminates the
deleterious toxic intermediate produced as a result of the ancestral ade2-1 mutation. While the rate
of loss of heterozygosity was high enough to produce these genotypes and expose them to selec-
tion, it appears to have been a limiting factor; haploids typically fixed these mutations earlier in the
experiment (Figure 3—figure supplements 1–9). Haldane’s ‘speedcheck’ here slowed adaptation
but also provided diploid populations with more time to search for the single-codon target of the
(highly beneficial and apparently dominant) ade2-1 reversion, and indeed, 4/6 populations with this
mutation in our experiment are diploids.
Perhaps the most important product of microbial evolution experiments is a base of intuition for
understanding how the interactions between different evolutionary forces determine the dynamics
and outcomes of genotypic and phenotypic evolution. The extent to which this base of intuition can
be generalized across systems and scales – ranging from specific protein complexes to human
pathogens to entire clades of sexually reproducing species – is an important set of largely unan-
swered questions. However, laboratory microbial evolution experiments have provided basic expect-
ations to compare against, and have highlighted a collection of phenomena that can sometimes play
a major role in adaptation. Our results here reinforce the conclusion that long-term adaptation to a
constant environment can be characterized by widespread clonal interference, contingency, and
steady molecular evolution even as fitness increases slow down over time. They also highlight the
role of dominance and loss of heterozygosity in diploid evolution. However, our work also calls into
question the generality of conclusions about the importance of the evolution of mutation rates or
stable coexistence. As our populations continue to evolve, further analysis of our experiment and of
other complementary studies will further broaden our understanding of the processes that deter-
mine the rate, predictability, and molecular basis of evolution.
Culture conditions
We propagated all populations in 128 µL of media in unshaken flat-bottom polypropylene 96-well
plates (VWR #82050–786). For one environment, we used rich YPD media (1% Bacto yeast extract
(VWR #90000–726), 2% Bacto peptone (VWR #90000–368), 2% dextrose (VWR #90000–904)) and
grew populations at 30˚C. For the other two environments, we used synthetic complete (SC) media
(0.671% YNB with nitrogen (Sunrise Science #1501–250), 0.2% SC (Sunrise Science # 1300–030), 2%
dextrose) and grew populations at 30˚C or 37˚C. All media was supplemented with 100 µg/ml ampicillin
and 25 µg/ml tetracycline. Using a Biomek FXp robot (Beckman Coulter), we performed daily
1:2^10 dilutions of populations in YPD 30˚C and SC 30˚C and daily 1:2^8 dilutions of populations in SC
37˚C (we used 384-well plates (VWR #82051–306) for serial dilution). These dilutions determine the
number of doublings or generations per day (10 for YPD 30˚C and SC 30˚C, eight for SC 37˚C), the
bottleneck population size (~8 × 10^3 for YPD 30˚C and SC 37˚C, ~2 × 10^3 for SC 30˚C), and the
corresponding effective population size (~6 × 10^4 for YPD 30˚C, ~4 × 10^4 for SC 37˚C, ~1 × 10^4 for SC
30˚C) (Wahl and Gerrish, 2001). These bottleneck sizes are based on estimated saturation densities
of ~6 × 10^7 cells/mL for YPD and ~1.6 × 10^7 cells/mL for SC in 96-well plates, measured using a Coulter
Counter Z2 (Beckman Coulter). Before dilution, we resuspended cultures by shaking at 1200 rpm for
2 min, and after dilution we shook the new plates at 1200 rpm for 1 min, both on a Titramax 100
plate shaker (Heidolph Instruments). After each transfer, the tips (VWR #89204–794) used to dilute
cultures were washed with water (to wash out cells) and 100% ethanol (to lyse residual cells), left to
dry overnight, and reused in culture propagation. The 96-well microplates used to maintain popula-
tions were bleached (to lyse cells), washed with distilled water, and autoclaved (121˚C, 30 min)
before being reused. Every 7 days, we froze aliquots of all populations in 27% glycerol (final concentration)
at -80˚C. To monitor for contamination, six well-spaced wells in each environment were
intentionally left ‘blank’ at the start of the experiment (i.e. they contained only media and no cells).
At several timepoints during the evolution we noticed contamination in the previously blank wells of
our 96-well plates. During instances of contamination, we unfroze all populations from an older glycerol
archive and inoculated 4 µL directly into 124 µL of the appropriate media for each environment.
A record with notes on the evolution is available in Supplementary file 1.
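The relationship between dilution factor, generations per day, and bottleneck size described above can be checked with a short calculation. The sketch below is only illustrative: it takes the culture volume as 128 µL (0.128 mL), which is the reading consistent with the stated bottleneck sizes, and it assumes the effective population size is approximated by N0 × g × ln 2 (bottleneck size times generations per cycle times ln 2), an approximation we attribute to the serial-transfer theory cited above rather than a formula stated in the text. The function and variable names are hypothetical.

```python
import math

# Values taken from the text; the 0.128 mL volume assumes 128 microliters per well.
CULTURE_VOLUME_ML = 0.128
ENVIRONMENTS = {
    "YPD 30C": {"density": 6e7, "dilution": 2**10},
    "SC 30C": {"density": 1.6e7, "dilution": 2**10},
    "SC 37C": {"density": 1.6e7, "dilution": 2**8},
}


def transfer_parameters(density, dilution, volume_ml=CULTURE_VOLUME_ML):
    """Return (generations per day, bottleneck size, approximate effective size)."""
    generations = math.log2(dilution)            # doublings needed to regrow after dilution
    bottleneck = density * volume_ml / dilution  # cells transferred at each dilution
    # Assumed approximation (not stated in the text): N_e ~ N_0 * g * ln(2).
    effective = bottleneck * generations * math.log(2)
    return generations, bottleneck, effective


for name, env in ENVIRONMENTS.items():
    g, n0, ne = transfer_parameters(env["density"], env["dilution"])
    print(f"{name}: {g:.0f} generations/day, N0 ~ {n0:.0f}, Ne ~ {ne:.0f}")
```

Under these assumptions the bottleneck sizes come out at roughly 7.5 × 10^3, 2 × 10^3, and 8 × 10^3 cells, in line with the rounded values quoted above.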
Fitness assays
In order to assess competitive fitness using a consistent reference for each environment, we isolated
clones at various generations from an arbitrarily chosen evolving diploid population in YPD 30˚C
(P1G09). We looked for clones that had fitnesses intermediate between the ancestral strains and
evolved strains in each environment, and tagged these clones by inserting a yNatMX cassette and
GFP (pRPL39::eGFP::tADH) into an intergenic region (chromosome VII, position 649234) that was
previously used as a neutral insertion site control in Johnson et al., 2019. This produced the refer-
ence strains used for fitness assays in YPD 30˚C and SC 30˚C (2490A-GFP1), and SC 37˚C (11470A-
GFP1).
Fitness assays were performed as described previously (Lang et al., 2011). Briefly, we unfroze
populations and a reference strain from glycerol stocks, allowed them to grow in their evolution
environment for one full growth cycle, and then mixed the populations with the reference strain in
equal proportions. We then maintained these mixed populations for three daily growth cycles, as
described above. At each transfer, we diluted cells from each well into PBS and used flow cytometry
(Fortessa and LSRII, BD Biosciences) to measure the ratio of the two competing types, counting
approximately 10,000–40,000 cells for each measurement.
To get fitness measurements for each population-timepoint, we first calculated the frequency of
fluorescent reference cells in each sample by gating our flow cytometry data to separate the fluores-
cent cells. Because a small percentage of reference cells do not fluoresce, we estimated this percent-
age from six wells that only contained the reference in each environment and used these values to
correct the reference frequency in all other wells. We then calculated the fitness of each population-
timepoint as the slope of the natural log of the ratio between the frequencies of the non-reference
and reference cell populations over time (timepoints with reference frequency under 5% or over 95%
were excluded). After taking the mean of fitness measurements from two replicates, we corrected
for batch effects in our assays by subtracting the mean fitness measured for an unlabeled reference
(2490A) in the same fitness assay. While our replicate fitness measurements generally correlate very
well (Figure 2—figure supplement 2), there is a deviation from the 1:1 line for replicate fitness
measurements for low-fitness populations in SC 37˚C. We believe this is due to batch effects specific
to each flow cytometry machine, which in turn affect the reference frequency correction explained
above. Only assays in which the reference frequency goes up very rapidly (low-fitness populations)
are strongly affected.
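The fitness calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: it assumes that the reference-frequency correction is a simple division of the observed fluorescent fraction by the fraction of reference cells that fluoresce, and that time is measured in generations. All function and parameter names are hypothetical.

```python
import math


def corrected_reference_frequency(observed_gfp_freq, gfp_fraction_of_reference):
    """Assumed correction: scale the observed GFP+ frequency up to the true
    reference frequency, since a small fraction of reference cells do not fluoresce."""
    return observed_gfp_freq / gfp_fraction_of_reference


def fitness_from_timecourse(generations, observed_gfp_freqs, gfp_fraction_of_reference):
    """Slope of ln(non-reference / reference) against time, by least squares."""
    xs, ys = [], []
    for gen, observed in zip(generations, observed_gfp_freqs):
        ref = corrected_reference_frequency(observed, gfp_fraction_of_reference)
        if ref < 0.05 or ref > 0.95:
            continue  # exclude timepoints with extreme reference frequencies
        xs.append(gen)
        ys.append(math.log((1.0 - ref) / ref))
    if len(xs) < 2:
        raise ValueError("need at least two usable timepoints to fit a slope")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

The batch correction would then amount to subtracting the slope obtained for the unlabeled reference in the same assay from the replicate-averaged slope of each population.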
Because the ancestral genotypes have a strongly deleterious mutation in the adenine biosynthesis
pathway and haploids very quickly fix strongly beneficial suppressor mutations, it was difficult to
measure ancestral fitness in some cases; we sometimes observed changes in fitness during the fit-
ness assay even when using clones from generation zero (which had been grown prior to and after
freezing glycerol stocks). In all but one environment-strain combination, we were able to identify
populations without any nonsynonymous mutations detected from our sequencing data at the first
timepoint (generation 70 for YPD 30˚C and SC 30˚C, generation 56 for SC 37˚C), so we used the
median of the fitness measured among these populations at the first timepoint to define ancestral
fitness. All MATα populations in YPD 30˚C had nonsynonymous mutations present (and often fixed)
at generation 70, but one unsequenced population had a significantly lower generation 70 fitness
than all others (similar to the one MATa population in YPD 30˚C with no nonsynonymous mutations
at generation 70), so we use the fitness estimated at generation 70 for that population as our ancestral
fitness for MATα populations in YPD.
Whole-genome sequencing
For each of the three environments, we selected 30 focal populations: 12 diploid populations, 12
MATa populations, and 6 MATα populations. We chose these populations randomly after excluding
populations in wells along the edge of the plate (which are the wells most often lost to
evaporation or pipetting errors) and populations where we had detected cross-contamination.
We performed whole-genome, whole-population sequencing on each of these populations at
six timepoints. After unfreezing populations as described above, we transferred each of our focal
populations into five replicate wells in their evolution environment, let them grow for 24 hr, and then
pelleted ~0.5 mL of cells. We used a DNA extraction protocol based on the ‘BOMB gDNA extraction
using GITC lysis’ from Oberacker et al., 2019. Briefly, we resuspended the cell pellets in 50 µL of
zymolyase buffer (5 mg/mL Zymolyase 20T (Nacalai Tesque), 1M Sorbitol, 100 mM Sodium Phosphate
pH 7.4, 10 mM EDTA, 0.5% 3-(N,N-Dimethylmyristylammonio)-propanesulfonate (Sigma,
T7763), 200 µg/mL RNAse A, and 20 mM DTT) (Nguyen Ba et al., 2019) and incubated the suspension
at 37˚C for 1 hr. Subsequently, we added 85 µL of a modified BOMB buffer (4M guanidinium-isothiocyanate
(Goldbio G-210–500), 50 mM Tris-HCl pH 8, 20 mM EDTA) and then 115 µL of isopropanol
(VWR# BDH1133-4LP), mixing by pipetting for 3 min after each addition. We then added 20
µL of Zymo Research MagBinding beads to bind DNA, mixed for 3 min by pipetting, separated
beads from the solution using a Magnum FLX 96-well magnetic separation rack (Alpaqua), and
removed the supernatant. We washed the beads with 400 µL of isopropanol and twice with 300 µL
of 80% ethanol. Next, we added 75 µL of sterile water to the beads and mixed by pipetting for 3
min. Finally, we separated beads from solution and transferred 44 µL of the supernatant (containing
the DNA) into a new 96-well PCR plate (Bio-Rad HSP9631) for library preparation. This entire process
was carried out on a Biomek FXp robot (Beckman Coulter).
Sequencing libraries were prepared using a Nextera (Illumina) kit as previously described
(Baym et al., 2015), but with three additional PCR cycles for a total of 16, and with a two-sided
bead-based size selection after PCR (we used either 0.5/0.7X or 0.55/0.65X bead buffer ratios with
PCRClean DX Magnetic Beads (Aline)). Libraries were sequenced to an average depth of 20-fold
(haploids) or 40-fold (diploids) coverage using a NextSeq 500 or Novaseq (Illumina).
Sequencing analysis
We trimmed Illumina reads with NGmerge version 0.2 (Gaspar, 2018), aligned all the first-timepoint
samples to a SNP-corrected W303 genome (Lang et al., 2013) using BWA version 0.7.15 (Li, 2013),
and marked duplicate reads with Picard version 2.9.0 (http://broadinstitute.github.io/picard). We
used samtools (Li et al., 2009) to merge these alignments and then used Pilon version 1.23
(Walker et al., 2014) to create a new reference genome that is corrected for additional SNPs present
in the ancestral strains. We then repeated this process, through the duplicate-marking step, for
all samples using this new reference and called variants using GATK version 4.1.3.0 (McKenna et al.,
2010), specifically using HaplotypeCaller, GenomicsDBImport, and GenotypeGVCFs with heterozy-
gosity set to 0.005. We annotated these variants using SnpEff version 4.3T (Cingolani et al., 2012),
and split multi-allelic records into individual records.
We extracted allele depths for each variant to determine the number of reads supporting the reference
and alternate alleles at each site. We then filtered variants based on these read counts. We
first excluded mutations with fewer than five reads representing the alternate allele across all timepoints.
To create this filtered list of variants present in each population, we required that mutations
pass at least one of these two criteria:
1. The total alternate-allele reads across all timepoints for the population in question is more
than 90% of the total alternate-allele reads across all populations and all timepoints.
OR
2. At least two timepoints have at least five reads supporting the alternate allele AND the total
alternate-allele reads across all timepoints for the population in question is more than 90% of
the total alternate-allele reads across all populations at only the first timepoint.
The first criterion addresses if a mutation is unique to a single population, which provides strong
evidence that it is not a common sequencing or alignment error. However, we do not want to
exclude the possibility of parallelism at the nucleotide level, so we include the second criterion as a
more lenient way to exclude these types of errors while not requiring uniqueness. Some small num-
ber of sequencing or alignment errors will pass these filters, so we emphasize that this is only a
lenient first step, and that our analysis of parallelism and contingency relies on also observing
fixation.
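The two filtering criteria above can be expressed compactly in code. The following is a sketch under stated assumptions rather than the pipeline itself: the input layout (a dictionary mapping each population to a list of alternate-allele read counts, ordered by timepoint with the first timepoint first) and the function name are hypothetical, and the initial five-read minimum is interpreted here as applying to the summed counts for the population in question.

```python
def passes_filters(alt_reads, population, min_reads=5, fraction=0.9):
    """Decide whether one mutation is kept for one population.

    alt_reads: dict mapping population name -> list of alternate-allele read
    counts per timepoint (hypothetical layout; first timepoint at index 0).
    """
    pop_counts = alt_reads[population]
    pop_total = sum(pop_counts)
    if pop_total < min_reads:
        return False  # too few supporting reads overall

    all_total = sum(sum(counts) for counts in alt_reads.values())
    first_timepoint_total = sum(counts[0] for counts in alt_reads.values())

    # Criterion 1: the mutation is essentially unique to this population.
    criterion_1 = pop_total > fraction * all_total

    # Criterion 2: repeated support within the population, and support that is
    # large relative to what is seen across all populations at the first timepoint.
    supported_timepoints = sum(1 for count in pop_counts if count >= min_reads)
    criterion_2 = (supported_timepoints >= 2
                   and pop_total > fraction * first_timepoint_total)

    return criterion_1 or criterion_2
```

Written this way, a variant that arises in parallel in two populations can pass through the second criterion, while a recurrent artifact that is already present in every population at the first timepoint fails both.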
We simplify our SnpEff annotations to indicate one of five types of mutation; in order of decreas-
ing putative effect they are indel, nonsense, missense, synonymous, or noncoding. For mutations
with multiple annotations, we assign the mutation type with the largest putative effect. To test if
some nearby mutations are part of a single mutational event, we perform Fisher’s exact test on the
alternate and reference allele counts at each timepoint for mutations within 25 bp of each other. If
two mutations have no significant differences detected at the p<0.01 level for any timepoint, we
label them as part of the same ‘mutation group,’ and they are counted as one mutation in subse-
quent analysis. We define mutations as ‘present’ at a particular timepoint if they have coverage of at
least 5X and are at greater than or equal to 0.1 frequency. We define mutations as ‘fixed’ at a partic-
ular timepoint if they have coverage of at least 5X, are at greater than or equal to a frequency of
40% (diploids) or 90% (haploids), and do not drop below these thresholds while still at >= 5X cover-
age at a later timepoint. If a mutation is called fixed at one timepoint, it is automatically called fixed
at later timepoints, even if they have less than 5X coverage. Using the same rules, we also call loss of
heterozygosity of a mutation in diploids using a frequency threshold of 90%. We exclude mutations
called in the 2-micron plasmid from further analysis since most populations lose this plasmid during
evolution, and variation in coverage and misalignments can easily produce false mutation calls in
these cases. We also exclude mutations in the telomeres, where alignment errors and repetitive
regions make mutation calling difficult.
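The rule for calling a mutation fixed can be sketched as below. This is an illustrative reading of the definition above, with hypothetical function and argument names: a mutation is called fixed from the first well-covered timepoint at which it meets the frequency threshold and never falls below that threshold at any later well-covered timepoint, and later timepoints inherit the call regardless of coverage.

```python
def call_fixed(frequencies, coverages, threshold, min_coverage=5):
    """Return one boolean per timepoint indicating whether the mutation is called fixed.

    threshold would be 0.4 for diploids and 0.9 for haploids; using 0.9 in a
    diploid population gives the loss-of-heterozygosity call.
    """
    n = len(frequencies)
    covered = [cov >= min_coverage for cov in coverages]
    fixed = [False] * n
    for t in range(n):
        if not covered[t] or frequencies[t] < threshold:
            continue
        # The call only holds if no later, adequately covered timepoint drops below threshold.
        stays_high = all(frequencies[u] >= threshold
                         for u in range(t + 1, n) if covered[u])
        if stays_high:
            for u in range(t, n):
                fixed[u] = True  # later timepoints inherit the call, even at low coverage
            break
    return fixed
```

For example, a haploid trajectory that reaches 95% and then dips to 50% before returning to 97% would only be called fixed from the later timepoint onward.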
divide all window-depths by this value to get ‘relative depth.’ To account for regions that are at a
different copy number in our ancestral strains, we calculate the average relative depth at the first
sequenced timepoint for each window (and for each strain). We divide the relative depth in our data
by these values to get ‘standardized depth.’ Windows with a relative depth less than 0.25 at the first
timepoint are excluded from analysis. For each chromosome in each sample, we use a simple,
untrained HMM to detect tracts of standardized depth that deviate from the expectation of 1. We
allow states 0, 0.5, 1, 1.5, 2, 3, and 4, with variances equal to the calculated variance in standardized
depth multiplied by the state (except for state 0, where we use the calculated variance multiplied by
0.5), initial probabilities of 1% for each non-1 state (94% for state 1), and transition matrix probabili-
ties of 0.01% for all non-diagonal entries (99.94% along the diagonal). This is a rough detection
method, but it succeeds in identifying putative CNVs, which we then subject to a filtering process.
First, we merge CNV records across timepoints if they cover the same region. Next, we exclude
CNVs in telomeric regions, CNVs found in only one timepoint, and CNVs that are less than four win-
dows (2 kb) long. Finally, we manually inspect our structural variant and CNV calls together using a
modified version of Samplot (Belyeu et al., 2020) to create a list of confirmed copy number variants
in our populations (Supplementary file 3). During this analysis, we noticed two regions with high
copy-number that experienced copy-number changes in many populations: one associated with the
CUP1 tandem array and one associated with the ribosomal DNA tandem array. We excluded these
regions from the above analysis and show their copy number changes in every population in Fig-
ure 3—figure supplement 11.
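To make the HMM step concrete, the sketch below decodes a most-likely copy-number path for one chromosome. Several details are assumptions on our part and are not specified in the text: emissions are modeled as Gaussians centered on each state's value, decoding uses the Viterbi algorithm, and the function names are hypothetical. The state values, variance scaling, initial probabilities, and transition probabilities follow the description above.

```python
import math

STATES = [0, 0.5, 1, 1.5, 2, 3, 4]  # allowed standardized-depth states


def log_gaussian(x, mean, var):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def viterbi_copy_number(depths, depth_variance):
    """Most likely state path for a list of standardized depths along one chromosome."""
    n_states = len(STATES)
    # Variance scales with the state; state 0 uses half the calculated variance.
    variances = [depth_variance * (s if s > 0 else 0.5) for s in STATES]
    log_init = [math.log(0.94 if s == 1 else 0.01) for s in STATES]
    log_stay = math.log(0.9994)
    log_switch = math.log(0.0001)

    score = [log_init[k] + log_gaussian(depths[0], STATES[k], variances[k])
             for k in range(n_states)]
    backpointers = []
    for x in depths[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda j: score[j] + (log_stay if j == k else log_switch))
            transition = log_stay if best_prev == k else log_switch
            new_score.append(score[best_prev] + transition
                             + log_gaussian(x, STATES[k], variances[k]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[k] for k in path]
```

Runs of consecutive windows assigned to a state other than 1 would then be the putative CNVs passed on to the filtering steps described above.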
populations in the set (we ignore when a gene is hit multiple times in the same population for this
probability calculation). We model the probability a gene is hit in population g based on the total
number of gene hits in population g, Mg, where hgi is 1 if gene i is hit in population g, and 0 if not:
To test whether genes are disproportionately hit in different strain backgrounds (MATa, MATα,
diploid) or different environments, we compare four models that use different sets of populations to
compute P(hgi):
1. P(hgi) is calculated using the entire set of populations (so that there is only one P(hgi) for each
multi-hit gene)
2. P(hgi) is calculated separately for each strain background (so that there are three P(hgi) for
each multi-hit gene, one for each strain)
3. P(hgi) is calculated separately for each environment (so that there are three P(hgi) for each
multi-hit gene, one for each environment)
4. P(hgi) is calculated separately for each environment-strain combination (so that there are nine
P(hgi) for each multi-hit gene, one for each environment-strain combination)
For each multi-hit gene, we calculate the log-likelihood of the data under each model and calcu-
late log-likelihood ratios between model 1 and each of the other three models. We then create
10,000 null datasets by drawing values from the probabilities defined in model 1 and compute log-
likelihood ratios for these simulated data. To define significant effects (at a p<0.05 level), we com-
pare our log-likelihood ratios to distributions of log-likelihood ratios from these null datasets and
correct for multiple hypothesis testing using a Benjamini-Hochberg correction. If multiple models are
significantly better than model 1, we use the Akaike information criterion (AIC) to determine the
model that best explains the data. Data on multi-hit genes and these statistical tests are available in
Supplementary file 4.
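The final two steps of this test, converting an observed log-likelihood ratio into an empirical p-value from the null datasets and then applying the Benjamini-Hochberg procedure, can be sketched as follows. The add-one form of the empirical p-value is an assumption (the text does not say which estimator was used), and the function names are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null statistics at least as large as the observed one.
    The +1 in numerator and denominator is an assumed convention that avoids p = 0."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            largest_passing_rank = rank  # step-up rule: keep the largest passing rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[i] = True
    return significant
```

In this sketch each multi-hit gene would contribute one p-value per alternative model, and the correction would be applied across genes.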
$$P(h_i \mid e) = C_M \sum_{g=1}^{N_e} \tilde{M}_{gi}$$
where $\tilde{M}_{gi} = M_{gi} + \varepsilon$ and $M_{gi} = 1$ if there is a hit in gene i in population g, $N_e$ is the total number of
populations in environment-strain combination e, $C_M = 1/(N_e(1 + \varepsilon))$, and the pseudocount $\varepsilon = 1/M$ (M
is the total number of gene hits across all populations in the environment-strain combination, as in
model 4 above). In contrast to the enrichment analysis above, this formula treats populations as
exchangeable, which is a reasonable assumption since the number of fixed mutations in each envi-
ronment-strain combination is not highly variable (Figure 3B).
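As a worked illustration of this smoothed hit probability, the sketch below evaluates it for a single gene in one environment-strain combination. It reads the normalization constant as 1/(N_e(1 + ε)) with N_e the number of populations in the combination, which is our interpretation of the formula above; the function and argument names are hypothetical.

```python
def hit_probability(hits_by_population, total_gene_hits):
    """Smoothed probability that a gene is hit in a population.

    hits_by_population: list of 0/1 indicators, one per population in the
    environment-strain combination.
    total_gene_hits: M, the total number of gene hits across those populations.
    """
    n_pops = len(hits_by_population)
    epsilon = 1.0 / total_gene_hits                 # pseudocount
    normalization = 1.0 / (n_pops * (1.0 + epsilon))  # assumed reading of C_M
    return normalization * sum(hit + epsilon for hit in hits_by_population)
```

With 12 populations, 3 of them hit, and M = 40, this gives (3 + 12 × 0.025) / (12 × 1.025) = 3.3 / 12.3, or about 0.27; a gene hit in every population gets probability 1, and a gene hit in none gets a small nonzero probability.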
We calculate joint probabilities and mutual information as in Fisher et al., 2019 (see equations
2.2-2.7). Because we separate our data into sets of populations with a shared environment and strain
background, our data contain many cases where a gene is hit zero times, which inflates the sensitiv-
ity to the pseudocount used in Fisher et al., 2019. To avoid this issue, we set the mutual information
between two genes to zero if either of the genes has no mutations in a given environment-strain
combination. We sum the mutual information values for each pair of genes across the nine possible
environment-strain combinations, and record the total mutual information (MItot, the sum of MI
values across all possible gene pairs) and the maximum mutual information between any two genes
(MImax).
Next, we compare these results to simulated datasets. We create 10,000 null datasets by drawing
from P(hi|e) for each gene, and calculate mutual information as described above to build null distri-
butions for MItot and MImax.
The results are plotted in Figure 7—figure supplement 2. While MItot for our data is higher than
in simulated datasets (p=0.036), MImax for our data lies well within the range of simulated data, so
we cannot detect any specific examples of contingency. As in Fisher et al., 2019, we test the robustness
of our results to choices of the pseudocount εM between 0.1 and 2 (the value used above was
1), and find that it does not qualitatively change our results.
Over-/under-dispersion analysis
Following Good et al., 2017, we looked for statistical patterns of contingency by comparing the dis-
persion configurations for genes with simulated data. For each environment-strain combination, we
record the number of times each gene is hit and the number of populations in which it is hit. We
also simulate distributing these hits across populations by multinomial draws weighted by the num-
ber of hits in each population. We run this simulation, for each possible number of hits (up to the
maximum observed), 10,000 times for each environment-strain combination. For each number of
hits, we compute the probability of those hits being distributed among each possible number of
populations for both our data and the simulated data. We compute the ‘excess probability’ in our
data by subtracting the simulated probability from the data probability. The results are plotted in
Figure 7—figure supplement 1A. We repeat this process with nonsynonymous mutations that are
detected but do not fix included (Figure 7—figure supplement 1B). Red squares along the diagonal
suggest that the mutations are overdispersed, meaning that nonsynonymous mutations are less likely
to fix multiple times in the same population than we would expect by chance. As in Good et al.,
2017, we quantify this observation of overdispersion by showing that mutations have fewer ‘missed
opportunities’ than we would expect by chance (Figure 7—figure supplement 1).
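The simulation underlying this comparison can be sketched as follows. This is a minimal version under our own assumptions about the bookkeeping: hits are assigned to populations by weighted sampling with replacement (a multinomial draw weighted by the number of hits in each population), and the quantity recorded is how many distinct populations receive at least one hit. The function names and the fixed random seed are hypothetical choices for reproducibility.

```python
import random
from collections import Counter


def simulate_population_counts(n_hits, hits_per_population, n_sims=10000, seed=0):
    """Simulated probability that n_hits land in exactly k distinct populations."""
    rng = random.Random(seed)
    populations = list(range(len(hits_per_population)))
    counts = Counter()
    for _ in range(n_sims):
        # Weighted draws with replacement: each hit picks a population in
        # proportion to that population's total number of hits.
        draws = rng.choices(populations, weights=hits_per_population, k=n_hits)
        counts[len(set(draws))] += 1
    return {k: counts[k] / n_sims for k in range(1, n_hits + 1)}


def excess_probability(observed_probs, simulated_probs):
    """Observed minus simulated probability for each number of populations hit."""
    return {k: observed_probs.get(k, 0.0) - simulated_probs[k]
            for k in simulated_probs}
```

Positive excess probability where the number of populations equals the number of hits corresponds to the overdispersion pattern described above, in which repeat hits within a single population are rarer than expected.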
Ploidy assays
To investigate whether any of our focal populations had changed ploidy during the course of the
experiment, we measured the DNA content of clones isolated from each focal population at the final
timepoint. We isolated one to two clones from each focal population and measured DNA content
using a nucleic acid stain as described previously in Jerison et al., 2020, but with minor
modifications. Briefly, we diluted 4 µL of saturated cultures from each clone (grown in YPD) into 120 µL of
water in a 96-well plate, centrifuged the plate, removed the supernatant, resuspended in 50 µL
water, added 100 µL of ethanol and pipetted slowly to mix, and incubated at room temperature for
1 hr. Next, we centrifuged the plate, removed the supernatant, let dry for ~5 min, resuspended in 65
µL RNase solution (2 mg/ml RNase in 10 mM Tris-HCl, pH 8.0 and 15 mM NaCl), and incubated at
37°C for 2 hr. We then added 65 µL of 2 µM Sytox Green (Thermo Fisher Scientific S7020), covered
the plates in aluminum foil, and shook on a Titramax 100 plate shaker (Heidolph Instruments) for
approximately 45 min at room temperature. We measured DNA content using a linear FITC channel
on a Fortessa flow cytometer (BD Biosciences). FITC histograms are shown and described in Fig-
ure 8—figure supplement 1.
Preliminary imaging
To investigate the possibility of clustering phenotypes in some of our populations with abnormal
ploidy stain data, we imaged our focal populations at each of the sequenced timepoints. We diluted
cultures 1:360 into 384-well plates (VWR #82051–306) and imaged cells using the Celldiscoverer 7
(Zeiss). Images for all populations are available in Supplementary file 6.
All analysis scripts used in this project are available on GitHub:
https://github.com/mjohnson11/VLTE_PIPELINES (copy archived at
swh:1:rev:588043a94abb34b13a6dd7a1b25277c25ae8deaf)
(archived permanently at https://doi.org/10.5281/zenodo.4422067).
An interactive data browser for this project is available online:
https://www.miloswebsite.com/exp_evo_browser.
Acknowledgements
We thank Andrew Murray, Nina Benites, Yi Chen, members of the Desai lab, members of the Sher-
lock lab, and three reviewers for helpful comments on the manuscript. We thank the Northwest
building staff, in particular Francisco Gonzalez, and the Bauer Core staff, without whom we could
not have done this work. This work was supported by NSF Graduate Research Fellowships (MSJ,
ERJ, KK, and KRL), the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at
Harvard University Grant DMS-1764269 (KRL), the Harvard Program for Research in Science and
Engineering (JG), the NDSEG Fellowship Program (CWB), the Fannie and John Hertz Foundation
Graduate Fellowship Award (KRL), the Boston Bangalore Biosciences Beginnings Program from DBT,
India (RP), the ARC Grant FT 170100441 (MJM), the NSERC (ANNB), Simons Foundation Grant
376196 (MMD), NSF Grant PHY-1914916 (MMD), and NIH Grant R01 GM104239 (MMD). Computa-
tional work was performed on the Cannon cluster supported by the Research Computing Group at
Harvard University. We thank the Harvard Center for Biological Imaging for infrastructure and
support.
Additional information
Competing interests
Julia C Piper: Julia C. Piper is affiliated with Aeronaut Brewing Co. The author has no financial inter-
ests to declare. The other authors declare that no competing interests exist.
Funding
Funder | Grant reference number | Author
National Science Foundation | Graduate Fellowship | Milo S Johnson, Elizabeth R Jerison, Katya Kosheleva, Katherine R Lawrence
Simons Foundation | DMS-1764269 | Katherine R Lawrence
Harvard University | PRISE | Juhee Goyal
National Defense Science and Engineering Graduate | Graduate Fellowship | Christopher W Bakerlee
Hertz Foundation | Graduate Fellowship | Katherine R Lawrence
Department of Biotechnology, Ministry of Science and Technology | Boston Bangalore Biosciences Beginnings Program | Ramya Purkanti
Australian Research Council | FT170100441 | Michael J McDonald
Natural Sciences and Engineering Research Council of Canada | | Alex N Nguyen Ba
Simons Foundation | 376196 | Michael M Desai
National Science Foundation | PHY-1914916 | Michael M Desai
National Institutes of Health | R01 GM104239 | Michael M Desai
The funders had no role in study design, data collection and interpretation, or the
decision to submit the work for publication.
Author contributions
Milo S Johnson, Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodol-
ogy, Writing - original draft, Writing - review and editing; Shreyas Gopalakrishnan, Conceptualiza-
tion, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and
editing; Juhee Goyal, Formal analysis, Investigation, Methodology; Megan E Dillingham, Investiga-
tion, Methodology; Christopher W Bakerlee, Investigation, Writing - review and editing; Parris T
Humphrey, Tanush Jagdish, Katherine R Lawrence, Jiseon Min, Alief Moulana, Angela M Phillips,
Julia C Piper, Ramya Purkanti, Artur Rego-Costa, Investigation; Elizabeth R Jerison, Katya Kosheleva,
Michael J McDonald, Conceptualization, Investigation, Methodology; Alex N Nguyen Ba, Conceptu-
alization, Investigation, Methodology, Writing - review and editing; Michael M Desai, Conceptualiza-
tion, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing - original
draft, Project administration, Writing - review and editing
Author ORCIDs
Milo S Johnson https://orcid.org/0000-0003-0169-2494
Shreyas Gopalakrishnan http://orcid.org/0000-0002-7243-0005
Elizabeth R Jerison http://orcid.org/0000-0003-3793-8839
Michael M Desai https://orcid.org/0000-0002-9581-1150
Additional files
Supplementary files
• Supplementary file 1. Experimental record. Includes daily notes, phenotype information for
individual wells (including fitness), and a record of sample sizes for statistical tests.
• Supplementary file 2. A zip file of processed variant calling files for each population.
• Supplementary file 3. A table of all confirmed copy number variants.
• Supplementary file 4. Summary information on mutations, including which genes are mutated in
which populations, GO-term enrichments, multi-hit codons, and statistical test results for strain or
environment enrichment for each multi-hit gene.
• Supplementary file 5. A record of the detected differences between our ancestral strains and of
the fluctuation assay confirming that the TSA1 mutation increases mutation rate.
• Supplementary file 6. A zip file of all preliminary cell imaging.
• Transparent reporting form
Data availability
Sequencing data have been deposited in the GenBank SRA (accession: SRP286889). Analysis code is
available at https://github.com/mjohnson11/VLTE_PIPELINES (copy archived at
https://archive.softwareheritage.org/swh:1:rev:588043a94abb34b13a6dd7a1b25277c25ae8deaf/).
All other generated data is available in Supplementary files 1–6. An interactive data browser is
available at https://www.miloswebsite.com/exp_evo_browser.
References
Bailey SF, Rodrigue N, Kassen R. 2015. The effect of selection environment on the probability of parallel
evolution. Molecular Biology and Evolution 32:1436–1448. DOI: https://doi.org/10.1093/molbev/msv033,
PMID: 25761765
Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and
adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1247.
DOI: https://doi.org/10.1038/nature08480, PMID: 19838166
Baym M, Kryazhimskiy S, Lieberman TD, Chung H, Desai MM, Kishony R. 2015. Inexpensive multiplexed library
preparation for megabase-sized genomes. PLOS ONE 10:e0128036.
DOI: https://doi.org/10.1371/journal.pone.0128036, PMID: 26000737
Behringer MG, Choi BI, Miller SF, Doak TG, Karty JA, Guo W, Lynch M. 2018. Escherichia coli cultures maintain
stable subpopulation structure during long-term evolution. PNAS 115:E4642–E4650.
DOI: https://doi.org/10.1073/pnas.1708371115, PMID: 29712844
Belyeu JR, Brown J, Pedersen BS, Cormier MJ, Layer R, Brueffer C, Valle-Inclan JE. 2020. Samplot: A Platform for
Structural Variant Visual Validation and Automated Filtering. GitHub. 2fb0b75.
https://github.com/ryanlayer/samplot
Blount ZD, Barrick JE, Davidson CJ, Lenski RE. 2012. Genomic analysis of a key innovation in an experimental
Escherichia coli population. Nature 489:513–518. DOI: https://doi.org/10.1038/nature11514, PMID: 22992527
Bondarev T. 2020. FluCalc. GitHub. 8dc996b. https://github.com/bondarevts/flucalc
Buskirk SW, Rokes AB, Lang GI. 2020. Adaptive evolution of nontransitive fitness in yeast. eLife 9:e62238.
DOI: https://doi.org/10.7554/eLife.62238, PMID: 33372653
Charlesworth D, Willis JH. 2009. The genetics of inbreeding depression. Nature Reviews Genetics 10:783–796.
DOI: https://doi.org/10.1038/nrg2664, PMID: 19834483
Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ. 2011. Diminishing returns epistasis among beneficial
mutations decelerates adaptation. Science 332:1190–1192. DOI: https://doi.org/10.1126/science.1203799,
PMID: 21636771
Cingolani P, Platts A, Wang leL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for
annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of
Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92. DOI: https://doi.org/10.4161/fly.19695,
PMID: 22728672
Couce A, Tenaillon OA. 2015. The rule of declining adaptability in microbial evolution experiments. Frontiers in
Genetics 6:99. DOI: https://doi.org/10.3389/fgene.2015.00099, PMID: 25815007
Fisher KJ, Buskirk SW, Vignogna RC, Marad DA, Lang GI. 2018. Adaptive genome duplication affects patterns of
molecular evolution in Saccharomyces cerevisiae. PLOS Genetics 14:e1007396.
DOI: https://doi.org/10.1371/journal.pgen.1007396, PMID: 29799840
Fisher KJ, Kryazhimskiy S, Lang GI. 2019. Detecting genetic interactions using parallel evolution in experimental
populations. Philosophical Transactions of the Royal Society B: Biological Sciences 374:20180237.
DOI: https://doi.org/10.1098/rstb.2018.0237, PMID: 31154981
Forche A, Abbey D, Pisithkul T, Weinzierl MA, Ringstrom T, Bruck D, Petersen K, Berman J. 2011. Stress alters
rates and types of loss of heterozygosity in Candida albicans. mBio 2:e00129-11.
DOI: https://doi.org/10.1128/mBio.00129-11, PMID: 21791579
Frenkel EM, McDonald MJ, Van Dyken JD, Kosheleva K, Lang GI, Desai MM. 2015. Crowded growth leads to the
spontaneous evolution of semistable coexistence in laboratory yeast populations. PNAS 112:11306–11311.
DOI: https://doi.org/10.1073/pnas.1506184112, PMID: 26240355
Gallone B, Steensels J, Prahl T, Soriaga L, Saels V, Herrera-Malaver B, Merlevede A, Roncoroni M, Voordeckers
K, Miraglia L, Teiling C, Steffy B, Taylor M, Schwartz A, Richardson T, White C, Baele G, Maere S, Verstrepen
KJ. 2016. Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts. Cell 166:1397–1410.
DOI: https://doi.org/10.1016/j.cell.2016.08.020, PMID: 27610566
Gaspar JM. 2018. NGmerge: merging paired-end reads via novel empirically-derived models of sequencing
errors. BMC Bioinformatics 19:536. DOI: https://doi.org/10.1186/s12859-018-2579-2, PMID: 30572828
Gerstein AC, Chun HJ, Grant A, Otto SP. 2006. Genomic convergence toward diploidy in Saccharomyces
cerevisiae. PLOS Genetics 2:e145. DOI: https://doi.org/10.1371/journal.pgen.0020145, PMID: 17002497
Gerstein AC, Cleathero LA, Mandegar MA, Otto SP. 2011. Haploids adapt faster than diploids across a range of
environments. Journal of Evolutionary Biology 24:531–540.
DOI: https://doi.org/10.1111/j.1420-9101.2010.02188.x, PMID: 21159002
Gerstein AC, Kuzmin A, Otto SP. 2014. Loss-of-heterozygosity facilitates passage through Haldane’s sieve for
Saccharomyces cerevisiae undergoing adaptation. Nature Communications 5:3819.
DOI: https://doi.org/10.1038/ncomms4819, PMID: 24804896
Giaever G, Chu AM, Ni L, Connelly C, Riles L, Véronneau S, Dow S, Lucau-Danila A, Anderson K, André B, Arkin
AP, Astromoff A, El-Bakkoury M, Bangham R, Benito R, Brachat S, Campanaro S, Curtiss M, Davis K,
Deutschbauer A, et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387–391.
DOI: https://doi.org/10.1038/nature00935, PMID: 12140549
Good BH, McDonald MJ, Barrick JE, Lenski RE, Desai MM. 2017. The dynamics of molecular evolution over
60,000 generations. Nature 551:45–50. DOI: https://doi.org/10.1038/nature24287, PMID: 29045390
Good BH, Desai MM. 2016. Evolution of mutation rates in rapidly adapting asexual populations. Genetics
204:1249–1266. DOI: https://doi.org/10.1534/genetics.116.193565, PMID: 27646140
Harari Y, Ram Y, Rappoport N, Hadany L, Kupiec M. 2018. Spontaneous changes in ploidy are common in yeast.
Current Biology 28:825–835. DOI: https://doi.org/10.1016/j.cub.2018.01.062, PMID: 29502947
Harrison E, Koufopanou V, Burt A, MacLean RC. 2012. The cost of copy number in a selfish genetic element: the
2-µm plasmid of Saccharomyces cerevisiae. Journal of Evolutionary Biology 25:2348–2356.
DOI: https://doi.org/10.1111/j.1420-9101.2012.02610.x, PMID: 22994599
Huang ME, Rio AG, Nicolas A, Kolodner RD. 2003. A genomewide screen in Saccharomyces cerevisiae for genes
that suppress the accumulation of mutations. PNAS 100:11529–11534.
DOI: https://doi.org/10.1073/pnas.2035018100, PMID: 12972632
Jerison ER, Nguyen Ba AN, Desai MM, Kryazhimskiy S. 2020. Chance and necessity in the pleiotropic
consequences of adaptation for budding yeast. Nature Ecology & Evolution 4:601–611.
DOI: https://doi.org/10.1038/s41559-020-1128-3, PMID: 32152531
Johnson MS, Martsul A, Kryazhimskiy S, Desai MM. 2019. Higher-fitness yeast genotypes are less robust to
deleterious mutations. Science 366:490–493. DOI: https://doi.org/10.1126/science.aay4199, PMID: 31649199
Kassen R. 2014. Experimental Evolution and the Nature of Biodiversity. Roberts.
Khan AI, Dinh DM, Schneider D, Lenski RE, Cooper TF. 2011. Negative epistasis between beneficial mutations in
an evolving bacterial population. Science 332:1193–1196. DOI: https://doi.org/10.1126/science.1203801,
PMID: 21636772
Klopfenstein DV, Zhang L, Pedersen BS, Ramírez F, Warwick Vesztrocy A, Naldi A, Mungall CJ, Yunes JM,
Botvinnik O, Weigel M, Dampier W, Dessimoz C, Flick P, Tang H. 2018. GOATOOLS: a Python library for gene
ontology analyses. Scientific Reports 8:10872. DOI: https://doi.org/10.1038/s41598-018-28948-z,
PMID: 30022098
Kokina A, Kibilds J, Liepins J. 2014. Adenine auxotrophy–be aware: some effects of Adenine auxotrophy in
Saccharomyces cerevisiae strain W303-1A. FEMS Yeast Research 14:697–707.
DOI: https://doi.org/10.1111/1567-1364.12154, PMID: 24661329
Kryazhimskiy S, Rice DP, Jerison ER, Desai MM. 2014. Microbial evolution. Global epistasis makes adaptation
predictable despite sequence-level stochasticity. Science 344:1519–1522.
DOI: https://doi.org/10.1126/science.1250939, PMID: 24970088
Lang GI, Botstein D, Desai MM. 2011. Genetic variation and the fate of beneficial mutations in asexual
populations. Genetics 188:647–661. DOI: https://doi.org/10.1534/genetics.111.128942, PMID: 21546542
Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic
hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.
DOI: https://doi.org/10.1038/nature12344, PMID: 23873039
Lang GI, Murray AW. 2008. Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae.
Genetics 178:67–82. DOI: https://doi.org/10.1534/genetics.107.071506, PMID: 18202359
Layer RM, Chiang C, Quinlan AR, Hall IM. 2014. LUMPY: a probabilistic framework for structural variant
discovery. Genome Biology 15:R84. DOI: https://doi.org/10.1186/gb-2014-15-6-r84, PMID: 24970577
Leiby N, Marx CJ. 2014. Metabolic erosion primarily through mutation accumulation, and not tradeoffs, drives
limited evolution of substrate specificity in Escherichia coli. PLOS Biology 12:e1001789.
DOI: https://doi.org/10.1371/journal.pbio.1001789, PMID: 24558347
Lenski RE. 2017. Experimental evolution and the dynamics of adaptation and genome evolution in microbial
populations. The ISME Journal 11:2181–2194. DOI: https://doi.org/10.1038/ismej.2017.69, PMID: 28509909
Leu JY, Chang SL, Chao JC, Woods LC, McDonald MJ. 2020. Sex alters molecular evolution in diploid
experimental populations of S. cerevisiae. Nature Ecology & Evolution 4:453–460.
DOI: https://doi.org/10.1038/s41559-020-1101-1, PMID: 32042122
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome
Project Data Processing Subgroup. 2009. The sequence alignment/Map format and SAMtools. Bioinformatics
25:2078–2079. DOI: https://doi.org/10.1093/bioinformatics/btp352, PMID: 19505943
Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv.
https://arxiv.org/abs/1303.3997.
Liu G, Yong MY, Yurieva M, Srinivasan KG, Liu J, Lim JS, Poidinger M, Wright GD, Zolezzi F, Choi H, Pavelka N,
Rancati G. 2015. Gene essentiality is a quantitative property linked to cellular evolvability. Cell 163:1388–1399.
DOI: https://doi.org/10.1016/j.cell.2015.10.069, PMID: 26627736
Marad DA, Buskirk SW, Lang GI. 2018. Altered access to beneficial mutations slows adaptation and biases fixed
mutations in diploids. Nature Ecology & Evolution 2:882–889.
DOI: https://doi.org/10.1038/s41559-018-0503-9, PMID: 29581586
McDonald MJ, Rice DP, Desai MM. 2016. Sex speeds adaptation by altering the dynamics of molecular
evolution. Nature 531:233–236. DOI: https://doi.org/10.1038/nature17143, PMID: 26909573
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly
M, DePristo MA. 2010. The genome analysis toolkit: a MapReduce framework for analyzing next-generation
Wielgoss S, Barrick JE, Tenaillon O, Wiser MJ, Dittmar WJ, Cruveiller S, Chane-Woon-Ming B, Médigue C, Lenski
RE, Schneider D. 2013. Mutation rate dynamics in a bacterial population reflect tension between adaptation
and genetic load. PNAS 110:222–227. DOI: https://doi.org/10.1073/pnas.1219574110, PMID: 23248287
Wiser MJ, Ribeck N, Lenski RE. 2013. Long-term dynamics of adaptation in asexual populations. Science
342:1364–1367. DOI: https://doi.org/10.1126/science.1243357, PMID: 24231808
Woods DR, Bevan EA. 1968. Studies on the nature of the killer factor produced by Saccharomyces cerevisiae.
Journal of General Microbiology 51:115–126. DOI: https://doi.org/10.1099/00221287-51-1-115,
PMID: 5653223
Zeyl C, Vanderford T, Carter M. 2003. An evolutionary advantage of haploidy in large yeast populations. Science
299:555–558. DOI: https://doi.org/10.1126/science.1078417, PMID: 12543972