Sushant - NGL Technologies Project
Impressive progress has been made in the field of Next Generation Sequencing (NGS). Through advances in molecular biology and engineering, parallelization of the sequencing reaction has profoundly increased the total number of sequence reads produced per run. Current sequencing platforms offer an unprecedented view into complex mixtures of RNA and DNA samples. NGS is evolving into a molecular microscope that is finding its way into virtually every field of biomedical research. In this chapter we review the technical background of the different commercially available NGS platforms with respect to template generation and the sequencing reaction, and take a brief look at what upcoming NGS technologies will bring. We close with an overview of different implementations of NGS in biomedical research. This article is part of a Special Issue entitled: From Genome to Function.
1. Introduction
The growing power and falling cost of Next Generation Sequencing (NGS) technology have sparked an enormous range of applications. Gradually, sequencing is becoming the standard technology to apply, certainly at the first step, where the main questions are “what is involved?” and “what is the basis?”. It should be realized that for many applications sequencing would always have been the method of choice, yet it was science fiction, then technically unthinkable, and later possible but far too costly. We perform genome-wide association studies (GWAS) using SNP arrays simply because we cannot afford whole-genome sequencing of tens of thousands of individuals. This is changing rapidly, and sequencing will become our molecular microscope, the tool to get a first look. Although replication, transcription, translation, methylation and nuclear DNA folding are completely different processes, they can all be studied using sequencing.
An important advantage of sequence data is its quality, robustness and low noise. It should be noted that a successful NGS project requires expertise both in the wet lab and on the bioinformatics side in order to warrant high-quality data and data interpretation. The sequence itself is hard evidence of its correctness. A sequencing system will not produce “random” sequences, and when it does, this becomes evident immediately from QC calls obtained from spike-in controls. Furthermore, random sequences will have no match and can easily be discarded during data analysis; when their number exceeds a certain threshold, it is evident that there is a serious problem somewhere in the study.
2. Sequence library preparation
All currently available sequencing platforms require some level of DNA pre-processing into a library suitable for sequencing. In general, these steps involve shearing of high molecular weight DNA into an appropriate platform-specific size range, followed by an end polishing step to generate blunt-ended DNA fragments. Specific adapters are ligated to these fragments by either A/T overhang or direct blunt ligation. A functional library requires specific adapter sequences to be added to the 3′ and 5′ ends. Each of the sequencing platforms uses a different set of unique adapter sequences to be compatible with the further steps of the process (Fig. 1).
Following adapter ligation, Life Technologies (Solid, PGM, Proton) libraries require a nick translation step to obtain functional molecules, while for the other technologies the sample is in principle ready for loading immediately after ligation. One may then choose to sequence these libraries directly as amplification-free libraries or to introduce a pre-amplification step prior to sequencing. It is important to realize that any pre-processing step which involves amplification of the molecules [1], or which has been shown to be sequence biased, like ligation [2], will impose a selection on the molecules that end up in the sequenceable libraries.
Fig. 1. Structure of sequence library molecules for the different technologies. Linear library molecules (Panel A) contain different adapter sequences at the 5′ [A] and 3′ [B] ends of the library inserts. Circular library molecules (Panel B) contain identical adapter molecules at both ends of the insert.
3. Current sequencing technology
The different sequencing platform vendors have devised different strategies to prepare the sequence libraries into suitable templates as well as to detect the signal and ultimately read the DNA sequence. For the Illumina, Solid, PGM and 454 systems, a local clonal amplification of the initial template molecules into polonies [3] is required to increase the signal-to-noise ratio, because these systems are not sensitive enough to detect the extension of one base at the level of an individual DNA template molecule. In contrast, the Heliscope and PacBio SMRT systems do not need any pre-amplification steps, as they are sensitive enough to detect individual single-molecule template extensions. The different strategies used to generate the sequence reads also lead to differences in output capacity between the platforms (Table 1). Below we focus on the newer sequencing platforms, namely Illumina, Life Technologies semiconductor sequencing and PacBio. Older platforms are briefly discussed in Online Supplement 1.
3.1. Illumina technology
All of the enzymatic processes and imaging steps of the Illumina technology take place in a flow cell. Depending on the specific Illumina platform, the flow cell may be partitioned into 1 (MiSeq), 2 (HiSeq2500) or 8 (HiSeq2000, HiSeq2500) separate lanes. The Illumina platform uses bridge amplification for polony generation and a sequencing by synthesis (SBS) approach (Fig. 2A). Forward and reverse oligos for amplification (one with a cleavable site), complementary to the adapter sequences introduced during the library preparation steps, are attached to the entire inside surface of the flow cell lanes. The first step in loading the library onto the flow cell is denaturation of the dsDNA fragments into individual ssDNA molecules. On the flow cell, these hybridize to the oligonucleotides on the surface (Fig. 1A; step 1), which are used as primers to form an initial copy of each individual sequencing template molecule (Fig. 1A; step 2). The initial library molecules are removed, and the copied, flow cell-attached fragments are used to generate a cluster of identical template molecules by isothermal amplification. This is done through cyclic alternation of three specific buffers that mediate the denaturation, annealing and extension steps at 60 °C. During these steps the 3′ end of the copied library molecules can hybridize to the complementary oligos on the flow cell, thus forming a bridge structure (Fig. 1A; steps 3–5).
The final step is to remove one strand of the dsDNA fragments using the cleavable site in the surface oligo (Fig. 1A; step 6) and to block all 3′ ends with ddNTPs, which prevents the otherwise open 3′ ends from acting as sequencing primer sites on adjacent library molecules [4].
With optimal loading of library molecules, one flow-cell lane will yield approximately 800–1000 K clusters per mm². Optimal amounts depend not only on the concentration of the library but also on the length of the molecules. Short molecules yield clusters with a small area that are denser and therefore generate more intense signals. Loading a wide fragment size distribution will generate clusters that vary widely in size and signal strength, which may reduce the number of reads passing filter.
Bridge amplification is not a very efficient method for clonal amplification: the 35 cycles of isothermal amplification yield a mere ~1000 copies of the initial molecule. Moreover, the clusters grow predominantly outward, there is a high probability that template strands re-hybridize instead of annealing to a new primer site on the glass surface, and there is both an upper and a lower limit to the length of the template molecules that can be reliably amplified. In addition, DNA polymerases, which are known to have biases towards specific DNA templates, are used during the amplification process.
The bridge amplification scheme that Illumina exploits yields a high number of clusters: with good loading of the flow cell, the total number of reads generated per HiSeq2000 lane may reach ~180 million. With a paired-end 2 × 100 bp read format, the total output of one flow-cell lane is up to ~36 Gb. A full run of 2 flow cells sequencing in parallel may yield ~600 Gb of data.
During sequencing, the polonies on the flow cell are read one nucleotide at a time in repetitive cycles. During these cycles, fluorescently labeled dNTPs are incorporated into the growing DNA chain. Each of the four dNTP species (A, C, T, G) carries a different fluorescent label, which identifies the base, and acts as a reversible terminator that prevents multiple extension events. After imaging, the fluorescent group is cleaved off, the reversible terminator is deactivated and the template strands are ready for the next incorporation cycle. The sequence is read by following the fluorescent signal per extension step for each cluster. Under ideal circumstances, all molecules within a cluster are extended in phase. However, a small portion of the molecules do not extend properly and either fall behind (phasing) or advance a base (pre-phasing). Over many cycles these errors accumulate and decrease the signal-to-noise ratio per cluster, causing a decrease in quality towards the ends of the reads.
The cycle time for the HiSeq2000 is approximately 1 h. The major contributor is imaging of the flow cell; the enzymatic reactions take very little time. By reducing the imaging time, the whole sequencing process can be sped up considerably. This is implemented in the MiSeq and HiSeq2500 platforms by providing the option to decrease the total surface area to be imaged. In rapid mode, cycle time can thereby be reduced to 5 and 10 min for the MiSeq and HiSeq2500, respectively. Furthermore, with reagent kits optimized for these short cycle times, it is possible to achieve a 2 × 300 bp paired-end run on the MiSeq, with 85% of data points above Q30 and a run time of ~65 h. However, the increased sequencing speed comes at a price: with the decreased surface area, the total number of data points that can be generated per run is reduced, which increases the sequencing cost per nucleotide significantly.
In early 2014, Illumina announced the release of two new sequencer models, the NextSeq 500 and the HiSeq X Ten. The former was designed as a highly flexible, smaller version of the HiSeq2500, providing both a medium (40 Gb) and a high output (120 Gb) mode, both with run times under 30 h. The HiSeq X Ten was designed for one main purpose: enabling whole human genome sequencing and reaching the $1000 genome in run costs. The main advance enabling this is the introduction of patterned flow cells. In contrast to the spatially random cluster generation on the HiSeq and MiSeq flow cells, the X Ten flow cells contain a pre-formatted grid of nano-wells, each of which can produce one sequence polony. This allows for optimized cluster densities and
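The lane-output figures and the phasing argument in this section can be checked with a few lines of arithmetic. This is an illustrative sketch only: `lane_output_gb` multiplies reads by read length, and `in_phase_fraction` uses a simplified model assumed here (constant per-cycle phasing and pre-phasing rates, no recovery of out-of-phase molecules). Both function names are hypothetical, and the rates in the example are made-up values, not Illumina specifications.

```python
# Illustrative arithmetic for Illumina lane output and signal dephasing.
# Read counts and read lengths are the ones quoted in the text; the phasing
# rates in the example are hypothetical values chosen for illustration.


def lane_output_gb(reads_per_lane: float, read_length_bp: int, paired: bool = True) -> float:
    """Bases per lane in Gb: reads x read length (x 2 mates for paired-end)."""
    mates = 2 if paired else 1
    return reads_per_lane * read_length_bp * mates / 1e9


def in_phase_fraction(cycles: int, phasing: float, prephasing: float) -> float:
    """Fraction of a cluster's molecules still in phase after `cycles` cycles.

    Simplified model (an assumption): in every cycle an in-phase molecule falls
    behind with probability `phasing` or runs ahead with probability
    `prephasing`, and out-of-phase molecules never return to phase.
    """
    return (1.0 - phasing - prephasing) ** cycles


if __name__ == "__main__":
    per_lane = lane_output_gb(180e6, 100)  # HiSeq2000 lane, 2 x 100 bp
    print(f"one lane: {per_lane:.0f} Gb")
    print(f"2 flow cells x 8 lanes: {16 * per_lane:.0f} Gb")
    for cycle in (25, 50, 100):
        frac = in_phase_fraction(cycle, phasing=0.002, prephasing=0.001)
        print(f"cycle {cycle}: {frac:.1%} of molecules in phase")
```

With 180 million reads of 2 × 100 bp, one lane gives 36 Gb, and 16 lanes (two 8-lane flow cells) give 576 Gb, in line with the ~600 Gb quoted. Because the in-phase fraction decays geometrically with the cycle number, quality drops towards the read ends even when the per-cycle error rates are tiny.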
Table 1 (platform comparison; recoverable row labels: Illumina/Solexa, Life Technologies, Proton, GS Junior; recoverable columns: run types, output per run, detection, remarks).
3.2. Ion torrent technology
achieve optimal loading of one library molecule into all individual vesicles. In fact, 1/3 of the vesicles will have the one molecule to one vesicle ratio; the remaining 2/3 will be either without a molecule or have more
ment step from empty spheres and the loaded spheres are deposited into the sequencing chip.
The Ion torrent chip consists of a flow compartment and solid state
pH within the sensor wells. Due to the lack of the time consuming
increase is just 1.2 fold. This decrease of the relative increase of the
0.5–1% of the molecules deviate from the flow either because they
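The one-third figure for vesicle loading is what random loading predicts. A minimal sketch, assuming (our assumption, not stated in the text) that template molecules are distributed over vesicles according to a Poisson distribution with a given mean number of molecules per vesicle; `poisson_pmf` and `loading_profile` are hypothetical helper names.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a vesicle receives exactly k template molecules."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def loading_profile(mean: float) -> tuple[float, float, float]:
    """Fractions of vesicles that are empty, clonal (one molecule) or polyclonal."""
    empty = poisson_pmf(0, mean)
    clonal = poisson_pmf(1, mean)
    return empty, clonal, 1.0 - empty - clonal


if __name__ == "__main__":
    for mean in (0.5, 1.0, 2.0):
        empty, clonal, poly = loading_profile(mean)
        print(f"mean {mean}: empty {empty:.1%}, one molecule {clonal:.1%}, more {poly:.1%}")
```

At a mean of one molecule per vesicle, about 37% of vesicles are empty, about 37% carry exactly one molecule and about 26% carry more than one. The single-molecule fraction λe^(−λ) peaks at λ = 1, so no choice of dilution pushes it much beyond one-third, which is why the loaded spheres have to be separated from the empty ones before they are deposited into the chip.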
the A nucleotide to catch up after the T base. Although this comes at the cost of a decreased overall read length, the overall quality of the reads does improve. Still, read quality gradually decreases towards the ends of the reads. By taking the flow order into account, it is possible to build flow-aware base callers, flow-space-aware aligners and variant detection tools that use the actual flow order when processing the data, in order to generate higher-accuracy data [7,8]. The present error rate for substitutions is ~0.1% [8], which is very similar to that of the Illumina systems. The main criticism of the system concerns homopolymer errors: despite many improvements, the 5-mer homopolymer error rate is still 3.5% [8].
Since the initial release of the Ion torrent platform, the technology has evolved at a very rapid pace. The output of the first Ion-314 chip was a mere 10 Mb. Through increasing the total surface area of the chips and the sensor well density, all on 350 nm CMOS technology, and increasing the average read length from 100 up to 400 bp, the newest Ion-318 chips produce ~1 Gb.
For the Ion Proton System, 110 nm CMOS technology was used to manufacture the Proton-I chips. The diameters of the spheres and the sensor wells decreased, which allowed the number of wells to increase to ~165 million per chip. The Proton-I chips currently yield 60–80 million reads per run, reaching 10 Gb. This is enough to sequence two human exomes at ~50× coverage. The announced Proton-II chip will have 4× the number of sensor wells, with an expected output of ~32 Gb per chip, promising to generate a whole human genome at ~10× coverage, still within the 4 h run time. This output puts it on par with a paired-end run on a single HiSeq2000 lane.
Life Technologies has developed an alternative method to generate polonies called Wildfire [9]. The process generates clusters on a solid surface using isothermal amplification without denaturation or amplification cycles. Although initially designed for the Solid 5500 system, it is likely that this method could be applied to Ion torrent semiconductor sequencing as well. This may involve spheres as an intermediate carrier, or clusters may be generated directly in the sensor wells. A Proton-III chip has been announced that will double the number of wells to 1.2 billion, leading to an expected output of ~64 Gb per run. With these output levels, the Ion Proton will become a competitor to the current Illumina HiSeq systems.
3.3. Pacific biosciences technology
The principles underlying the Pacific Biosciences single molecule real-time (SMRT) sequencing technology are quite different from those of the above-mentioned sequencers. First of all, the technology works with single-molecule detection, i.e., the optics used are sensitive enough to detect incorporation of one fluorescently labeled nucleotide. Consequently, template preparation does not require any amplification steps, and the prepared library molecule is the sequencing template. Library preparation is similar to that of shotgun libraries for the other platforms, i.e., fragmentation of the gDNA to the required size, here multi-kb, followed by end repair and either A/T overhang or direct blunt adapter ligation. The main difference is that the adapters have a hairpin structure (SMRT loop adapters), so that after ligation the dsDNA fragments have become circular (Fig. 1B). Preparation of the sequencing template consists of annealing a sequencing primer to the ssDNA region of the SMRT loop adapters, followed by binding of the DNA polymerase to form the active polymerization complex. The combination of insert size and read length determines whether a short molecule is read several times (CCS, circular consensus sequencing) or a long molecule only once.
The standard PacBio DNA library preparation starting amounts can be as high as 1–5 μg total genomic DNA or 0.5–1.0 μg sheared and size-selected DNA fragments for 10 kb libraries. These numbers are unexpectedly high for a system that reads single DNA molecules. Such high input requirements may limit the use of the PacBio system for low-input applications such as ChIP-Seq or single-cell genomics. The bottleneck lies in the stringent XP bead clean-up steps and the Exo III and VII treatments during library preparation, which exclude short and/or non-circular fragments from the final library pool. A lot of sample is lost during these steps, but omitting them would lead to suboptimal run output.
Alternative methods have been explored that bypass these stringent library preparation steps and proceed directly to sequencing the unprocessed DNA template molecules, by either specifically primed or random hexamer primed sequencing on plasmid DNA. Although the output of the SMRT cells was far below standard specs, Coupland et al. were able to sequence the entire M13mp18 genome at an average of 56× coverage using as little as 3.1 ng input DNA [10]. These experiments demonstrated that it is possible to directly sequence small single- and double-stranded DNA genomes without the need for any DNA-hungry library preparation steps.
The sequencing reaction takes place at the bottom of the ~150,000 zero-mode waveguide (ZMW) wells [11] on a SMRT cell. These ZMWs are small reaction wells, each ideally containing one complex consisting of template molecule, sequencing primer and DNA polymerase bound to the bottom of the ZMW [12]. Unlike the Illumina and Life Technologies platforms, PacBio does not rely on interrupted cycles of extension and imaging to read the template strand. Instead, the fluorescent signals of the incorporated nucleotides are recorded in real time at 75 frames per second for the individual ZMWs. This is achieved by a powerful optical system that illuminates the individual ZMWs with red and green laser beamlets from the bottom of the SMRT cell, and a parallel confocal recording system that detects the signal from the fluorescent nucleotides [13]. The width of the ZMWs is chosen in relation to the laser wavelength such that the light cannot pass through the ZMWs, but a zeptoliter-sized illumination zone is formed at the bottom of the ZMWs where the active polymerase complex is bound.
As with the Illumina platform, each of the four nucleotide species is labeled with a different fluorescent label. However, a crucial difference is that the PacBio system uses terminally phospholinked nucleotides, meaning the label is cleaved off during strand extension. In addition, the nucleotides do not contain a terminator group, allowing continuous extension of the growing DNA copy. When a nucleotide complementary to the template is held in position by the polymerase within the illumination zone of the ZMW, the identity of the nucleotide is recorded from its fluorescent label. During extension the label is cleaved off and diffuses outside the illumination zone, and the complex is ready for the next extension. In essence, the system records a movie of the activity of the polymerase during rolling circle amplification of the template. The polymerase used for sequencing in the PacBio system is a modified version of phi29 [14] which, although exhibiting reduced 3′–5′ exonuclease activity, still has many of the properties of the original phi29, such as high processivity of several hundred kilobases, a low error rate of ~10⁻⁵ [15], no GC bias and strand displacement properties [16].
Through several advances at the technological, bioinformatics and chemistry level, the average output per SMRT cell has increased considerably since the release of the system in early 2011. The output has mainly increased through longer average read lengths, starting from ~1 kb at the time of release, while the number of reads passing filter has remained ~50–60 K per SMRT cell. Although the maximal achievable read length is directly related to the length of the sequencing time, not all polymerase complexes reach identical read lengths. The main cause for this is photo-damage of the phi29 polymerase, which terminates the sequencing reaction. In the C3 chemistry, photo-protected nucleotide analogs are used which shield the polymerase from damage; consequently, half of the reads are over 8–10 kb in length when using the P5 enzyme (Fig. 3). With the current maximum movie length of 180 min, read lengths of over 40,000 bases have been reported. With these specifications the output per SMRT cell reaches ~400 Mb for the PacBio RSII.
One much debated aspect of the PacBio data is the high single-pass error rate of 10–15%, the majority of which are insertion/deletion errors, with only a small fraction of miscalls. It is important to keep in mind that, in contrast to the other sequencing technologies, the errors in PacBio reads are randomly distributed and do not occur more frequently towards the end of the reads. This property can be used to create consensus calls from the information of multiple reads covering each reference position. Consensus accuracy can reach over 99.999% using the PacBio Quiver software [17]. Although this requires a coverage of ~40× per base, such levels can easily be reached with the output of a few SMRT cells for small (bacterial) genomes.
Despite the relatively low output of the system per SMRT cell, the PacBio long-read data, the absence of GC bias and the insight into the kinetic state of the polymerase during sequencing create a niche of specific applications for this system that cannot be covered by any of the other currently available sequencing platforms.
4. Future sequencing technology
The advances made in sequencing technology over recent years have been impressive. However, the ultimate sequencing platform would work on single DNA or RNA molecules without any (pre-)amplification and without optical steps, deliver reads of Mb to Gb in length with no GC bias and high read accuracy, and be flexible enough to generate as many sequence reads as are necessary for the specific research question at hand. In addition, it should be cheap to acquire and run, easy to operate, and have short run times and simple or no library preparation steps. Needless to say, this sequencing platform does not exist yet. In the next section we discuss one emerging sequencing platform which may have the potential to make the next step towards these ultimate sequencers.
Fig. 3. Total read length distribution for PacBio reads obtained with the P5 enzyme in combination with the C3 chemistry. Blue bars (left y-axis) represent the number of reads, and the black line (right y-axis) represents the total amount of data from reads longer than the read length on the x-axis. The data presented are from 8 SMRT cells run on the same library. The total number of bases and reads is indicated, with the average per SMRT cell in parentheses.
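The PacBio argument that randomly distributed errors can be outvoted by coverage can be made concrete with a deliberately naive model. This sketch is not the Quiver algorithm: it assumes that each read covering a position is independently correct with probability 1 − e, and that the consensus is right only when a strict majority of reads is correct. Real PacBio errors are mostly indels, which this model ignores; `majority_error` is a hypothetical helper name.

```python
from math import comb


def majority_error(coverage: int, read_error: float) -> float:
    """P(consensus base is wrong) under a naive majority vote.

    Assumed model (not the Quiver algorithm): each of `coverage` reads is
    independently correct with probability 1 - read_error, and the consensus
    counts as correct only if a strict majority of the reads is correct.
    """
    p_ok = 1.0 - read_error
    needed = coverage // 2 + 1  # smallest strict majority
    # Sum the failure side directly: fewer than `needed` correct reads.
    return sum(
        comb(coverage, k) * p_ok ** k * read_error ** (coverage - k)
        for k in range(needed)
    )


if __name__ == "__main__":
    for depth in (1, 5, 10, 20, 40):
        err = majority_error(depth, read_error=0.15)
        print(f"{depth:>2}x: P(wrong) = {err:.2e}, accuracy = {100 * (1 - err):.5f}%")
```

At e = 0.15, ten reads still leave roughly a 1% chance of a wrong call, whereas at 40× the failure probability falls to the order of 10⁻⁷, beyond the 99.999% quoted for Quiver. The same calculation offers no protection against systematic errors, which would repeat in every read.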
4.1. Oxford nanopore sequencing
5. NGS applications
5.2. ChIP-seq
5.3. Methylation
Application | Reads per sample | Read type | Read length (bp) | Remarks
Transcriptome analysis
Tag based (SAGE/CAGE) | >10 million | Single-end | 20–50 |
Small RNA | >10 million | Single-end | 20–50 |
mRNA-Seq | >30 million | Paired-end | >50 | Efficient exclusion of rRNA-derived sequences increases the resolution of the transcripts of interest
Ribosome profiling | >20 million | Single-end | 20–50 |
ChIP-Seq | >20 million | Single- or paired-end | ≥50 | Specificity of the ChIP enzyme determines the number of reads needed; low specificity means more background, so more reads are needed
De novo sequencing | 30× genome coverage, preferably more | Long single-end reads and paired-end | As long as possible | Ideally PacBio long reads, or a combination of paired-end, mate-pair and PacBio
Metagenomics
Tag based (ITS, 16S) | >100,000 | Paired-end, long single-end reads | As long as possible | Complexity of the specific biosphere determines the primer pairs and/or number of reads per sample; longer reads allow better differentiation between related species
Shotgun | >100 million | Paired-end, long single-end reads | As long as possible | Complexity of the specific biosphere determines the library insert size and/or number of reads per sample
Methylation analysis
Whole genome | >400 million | Paired-end | ≥100 | Ideal situation: all PacBio long reads on native/unmodified shotgun libraries
Enrichment strategies | >50 million | Paired-end | ≥100 |
Infections | >25 million | Single- or paired-end | ≥100 | ~2% of cell-free DNA from plasma is of non-human origin
Non-invasive prenatal testing | >10–20 million | Single-end | >50 | Trisomy detection from cell-free fetal DNA in maternal plasma
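Coverage targets such as the 30× in the table translate into read numbers through the usual relation C = N·L/G, for N reads of length L on a genome of size G. A minimal sketch, assuming uniformly random read placement as in the Lander-Waterman model (the GC and amplification biases discussed earlier violate this to some degree); the helper names are hypothetical, and 3.1 Gb is an approximate round figure for the human genome.

```python
import math


def reads_needed(coverage: float, genome_size_bp: float, read_length_bp: int,
                 paired: bool = False) -> int:
    """Reads (or read pairs) needed for a mean coverage C: N = C * G / L."""
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(coverage * genome_size_bp / bases_per_fragment)


def coverage_from_output(output_bp: float, genome_size_bp: float) -> float:
    """Mean coverage delivered by a given sequencing output: C = output / G."""
    return output_bp / genome_size_bp


def fraction_uncovered(coverage: float) -> float:
    """Idealized (Lander-Waterman / Poisson) fraction of bases with zero reads: e^-C."""
    return math.exp(-coverage)


if __name__ == "__main__":
    human = 3.1e9  # approximate haploid human genome size in bp
    print(reads_needed(30, human, 100, paired=True), "read pairs for 30x")
    print(f"{coverage_from_output(32e9, human):.1f}x from 32 Gb")
    print(f"{fraction_uncovered(10):.1e} of bases uncovered at 10x")
```

For a 3.1 Gb genome at 30× with 2 × 100 bp reads this gives 465 million read pairs, and the ~32 Gb expected per Proton-II chip corresponds to about 10× coverage of a human genome, matching the figure given for that chip. In this idealized model virtually no base remains uncovered at 30×; in practice it is uneven coverage that motivates the “preferably more”.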
5.5. Metagenomics
A complex variant of de novo genome sequencing is “sequence it all”: metagenomics. Using brute-force sequencing, simply reading all DNA sequences present in a sample, metagenomics makes an inventory of what is present in a sample, of what is living where. A simple but effective application of this is trying to detect the cause of an infectious disease. Simply analyzing all DNA from control versus infected (diseased) individuals will reveal the “extra” DNA, which most likely derives from the infectious agent. The approach was used successfully to identify, e.g., the cause of colony collapse disorder killing honeybees [38], but also the cause of diseases that killed thousands of humans in the past, including the Black Death [39].
Metagenomics can be performed by a sequence-it-all approach or by focusing only on specific, uniformly conserved sequences such as ribosomal RNA genes. The latter approach has two main advantages: 1) the complexity of the data obtained is much lower, and 2) more sequences can be assigned to a specific organism or a group of related organisms. The latter facilitates semi-quantitative analysis, which is much more difficult when analyzing all sequences mixed from many organisms with widely varying genome sizes. Metagenomics focusing on rDNA genes has been used to study many different things, e.g. the effect of the 2004 tsunami on microbial ecologies in marine, brackish, freshwater and terrestrial communities in Thailand [40]. Targeting the chloroplast trnL gene was successfully used to study airborne pollen and its relation with hay fever [41].
A unique feature of metagenomics approaches is that one does not need to culture the organisms one wants to study. The most impressive result of the first studies that read all DNA sequences was that up to ten times more organisms were encountered than had been seen previously. One can now study organisms that no one is able to culture and/or that no one has ever seen. DNA sequencing gives an idea of the complexity and constitution of entire ecosystems [42,43]; RNA sequencing gives an overview of “what's happening” [44]. The technology facilitates the study of the consequences of environmental changes and provides a way to determine the cause of disturbances. Enormous progress has been made in a medical setting. Studying the human microbiome gave a range of surprising findings, including its enormous complexity, containing 10× more cells than the human body and 1000× more genes. The microbiome of the human gut has been studied in great detail [45], including its relation with phenotypes like obesity [46]. Microbiomics is a hot topic, and it will undoubtedly bring new insights into the interplay between human health and the bugs living on and within us.
5.6. Non-invasive prenatal testing
To obtain DNA from a fetus, prenatal diagnosis generally involves the costly and risky sampling of either chorionic villi or amniotic fluid. It has long been known that fetal DNA can be found in maternal blood (cell-free serum), yet it has low abundance and low quality, and it is not easy to discriminate fetal from maternal DNA. These characteristics prevented widespread implementation of prenatal tests performed on maternal blood. However, the enormous power of NGS technology seems particularly attractive for non-invasive prenatal testing (NIPT). To detect trisomies, in particular trisomy 21 or Down's syndrome, a very simple but effective brute-force method was developed: sequence, map, and count. DNA isolated from maternal serum is sequenced, and the reads are mapped to the human genome and counted per chromosome. When 5–10 million reads are mapped, trisomies reveal themselves by a significantly too high number of reads mapping to a particular chromosome [47]. A recent study showed that when genome sequencing of both parents, genome-wide maternal haplotyping and deep sequencing of maternal plasma DNA are combined, even the genome sequence of a fetus at 18.5 weeks can be determined [48].
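The “sequence, map, and count” approach needs a criterion for “significantly too high”. One common way to formalize it, assumed here rather than taken from the text, is a z-score of the test sample's chromosome 21 read fraction against a set of euploid reference samples. The function names, counts, reference fractions and the cut-off of 3 are all hypothetical illustration values.

```python
from statistics import mean, stdev


def chromosome_fraction(counts: dict[str, int], chrom: str) -> float:
    """Fraction of all mapped reads that fall on one chromosome."""
    return counts[chrom] / sum(counts.values())


def trisomy_z_score(test_fraction: float, reference_fractions: list[float]) -> float:
    """Number of reference standard deviations the test lies above the reference mean."""
    return (test_fraction - mean(reference_fractions)) / stdev(reference_fractions)


if __name__ == "__main__":
    # Hypothetical chr21 read fractions from euploid (normal) pregnancies.
    reference = [0.01300, 0.01305, 0.01298, 0.01302, 0.01295]
    # Hypothetical test sample: 10 million mapped reads, 133,000 of them on chr21.
    sample = {"chr21": 133_000, "other": 9_867_000}
    z = trisomy_z_score(chromosome_fraction(sample, "chr21"), reference)
    call = "possible trisomy 21" if z > 3 else "no call"  # cut-off of 3 is an assumption
    print(f"z = {z:.1f} -> {call}")
```

Here a 2.3% relative excess of chromosome 21 reads stands out clearly (z ≈ 7.9). With 10 million reads, the binomial sampling noise on a fraction of about 0.013 is only around 3.6 × 10⁻⁵, which is why a few million mapped reads are enough for this test.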
5.7. Disease gene identification
A combination of genome-wide association studies (GWAS) and specific targeting, by sequence capture, of the genomic regions detected is now used extensively to identify the variants that functionally link the DNA with the phenotype. Similarly, genome sequencing can be used as a tool to characterize genetic variation in a specific population, determine haplotype structure and use this knowledge to impute alleles and boost the outcome of GWAS analysis [GoNL consortium, 2014, in press].
One of the most impressive applications of NGS lies in the field of human genetics and disease gene identification. In the past, large families were an absolute requirement for a successful approach. Without being able to first map a disease gene to a specific position and then zoom in on the genes in that region, the human genome was simply too big and analysis too costly. Some successes were obtained using candidate disease gene approaches, but generally these only worked when the discovery of a new gene for a specific disease made similar genes, or neighboring genes in a certain pathway, obvious candidates. NGS studies were much more successful, even when only one or a few cases were available [49]. Parent–child trio analysis turned out to be very effective in revealing dominant de novo diseases [50], while recessive diseases can be revealed when several unrelated cases are available or when clearly damaging variants are present [51]. The latter successes could already be obtained with exome sequencing, i.e. a method that zooms in on only the 1–2% of the human genome that is protein coding. Needless to say, when costs drop further, full genome sequencing will be used to also detect deleterious variants outside the protein coding regions. Early steps towards the ultimate application, genome-based or personalized medicine, have been taken by the UK and Saudi Arabia, which last year both announced projects to sequence the genomes of 100,000 individuals.
5.8. Human disease and health
Thus far, studies have mostly been performed on the level of cell cultures, whole tissues or sorted cell populations. Although the yield per cell, 30%–70% of all RNA or DNA present, can still be improved, recent NGS developments have now made genome-wide single-cell analysis feasible. Individual cells turn out to be quite different, showing extensive genomic and transcriptomic heterogeneity in both normal development and disease [52]. This turns out to be especially true for cancer tissue, which is a complex mixture of many different cell populations, each carrying a range of genomic rearrangements driving its unrestricted growth. Dissecting these using (very) deep
studies, low accuracy (>85%) single-molecule long-read sequences can be sufficient to make a significant difference. De novo genome assembly can be improved considerably when long, kilobase-sized reads are available that span gaps left by short-read paired-end sequencing. The single-molecule technologies are amplification free and thereby not hampered by PCR-based artifacts like uneven amplification and GC bias. Consequently, they give much more uniform coverage and span GC-rich regions. In addition, they may span repeat structures or duplications that cannot be resolved using short-read sequencing. English et al. have used PacBio data to upgrade existing draft genome assemblies derived from Illumina sequencing data by looking for reads that extend into or cover gaps in the assembly [56]. A paper by Loomis et al. showed that the system has no problems sequencing long regions of 100% GC content in CGG trinucleotide repeat expansions [57].
Amplification-free methods facilitate the analysis of DNA modifications deriving from cellular processes like methylation [58] and/or from damage (irradiation, chemicals, etc.) [59]. DNA modifications present on the template molecule affect DNA polymerase activity to a certain extent, i.e., the time needed for incorporation of a nucleotide at a particular site. The identity of modifications can therefore be inferred by analysis of the kinetic state of the polymerase. This works well for modifications that have a large effect on polymerase activity, e.g., N6-methyladenine (m6A) and N4-methylcytosine (m4C) [60,61]. The 5-methylcytosine (m5C) modification has a weaker signal but, given enough coverage, can still be inferred from the data [61]. However, conversion of 5-methylcytosine to 5-carboxylcytosine by Ten-eleven Translocation Gene Protein 1 (Tet1) results in a greater disturbance of the signal, making it easier to detect even at lower coverage [62].
Other applications of long-read sequencing are RNA structure analysis [63] and studies to unravel complex repeat structures (e.g. segmental duplications in the human genome) and large segments of repetitive DNA.
6. Online supplement
6.1. Solid sequencing
The Solid systems read DNA by an intricate sequencing-by-ligation scheme [64]. After positioning the templated beads on the flow cell, the first step is hybridization of a primer complementary to the complete adapter, followed by hybridization of octamer probes. The first two nucleotides of these probes represent 16
sequencing of the cancer genome/transcriptome as a whole or from
dinucleotide combina- tions, bases 3–5 are degenerate and bases 6–
single cell analysis should give us a tool to identify the so-called driver
8, also degenerate, contain a fluorescent label for identification of
mutations. These will be different in different tumors and instrumental
bases 1 and 2. Four different di- nucleotide combinations are
to direct treat- ment and prescribe the best (set of) drugs to be used,
labeled with the same fluorescent tag. Use of 4 different fluorescent
personalized cancer treatment. Similarly NGS approaches will be used
groups allows for labeling all 16 dinucleo- tide tags. When a probe
to study drug resistance, identify their mechanism and provide
anneals adjacent to the adapter primer strands are ligated and
strategies to combat resistance [53]. Coordinated by the International
information on the first two bases is recorded from the fluorescent
Cancer Genome Consortium (ICGC) a large project is ongoing trying to
signal at bases 6–8. These last three bases are removed and new
resolve the geno- mic changes present in many forms of cancers by
octomer probes are allowed to hybridize and ligate, providing
analyzing 50 cancer/ normal tissue pairs [54]. In due time NGS information on bases 6 and 7, i.e., position +5 relative to the
developments will start to impact our daily life. While it will initially be previous cycle, of the template. Depending on the required read
used as a molecular micro- scope to diagnose disease, ultimately it will length, 5–7 of such cycles are performed after which the entire
also be used to monitor our personal health. Our genome sequence will synthesized strand is removed from the template. A new primer
be read once, but e.g. blood- derived RNA analysis, completed with with a −1 nt shift in end po- sition is hybridized and octomer probes
proteomics and metabolomics measurements, will be used on a regular are allowed to hybridize and li- gate in 5–7 cycles as described above,
basis to study the status of our body [55], the Whole-body-BIOscan. but now probing nucleotides 0–1, 5–6, and 10–11 etc. of the
template. In total 5 of these re-priming cycles are performed. The
5.9. Single molecule & long read sequencing template sequence is subsequently decoded from the color labels
from the two ligation events per base [65].
It should be noted that high accuracy sequences, i.e. sequences con- Similar to Illumina sequencing, the Solid system is able to
taining few read errors, are not essential for all applications. For specific generate paired-end reads. However, due to limitations of
sequencing by ligation scheme the maximal read length is limited to
75 bases for read 1 and 35 bases for read 2. Although the total
number of reads generated per Solid run is comparable to HiSeq,
the total output in Gb per run is only half of that for HiSeq due to
the shorter read length. The main advantage of this
scheme is that each base is interrogated by two octamer ligations, which results in a significantly higher read accuracy, with error rates that may be as low as 0.01%. By adding yet another re-priming cycle, the accuracy can theoretically be improved even more.

6.2. Roche 454

Roche has produced two platforms based on the same pyrosequencing technology [66], i.e., the FLX+ and the benchtop GS Junior system. Similar to Ion Torrent sequencing, templated beads are deposited in a picotiter plate to separate the individual sequencing reactions, and T–A–C–G nucleotide flows are applied to the plate. The main difference is that the sequencing reaction is monitored by detecting the light generated by luciferase-mediated conversion of luciferin to oxyluciferin upon primer extension. For a long time, the main niche for these systems was their long read length of 500–800 bp. However, increased read lengths on competitor platforms have made pyrosequencing less cost efficient. The Roche systems were announced to be phased out in mid-2016.

References

[1] J. Dabney, M. Meyer, Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries, Biotechniques 52 (2012) 87–94.
[2] M. Hafner, N. Renwick, M. Brown, A. Mihailović, D. Holoch, C. Lin, J.T. Pena, J.D. Nusbaum, P. Morozov, J. Ludwig, et al., RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries, RNA 17 (2011) 1697–1712.
[3] R.D. Mitra, G.M. Church, In situ localized amplification and contact replication of many individual DNA molecules, Nucleic Acids Res. 27 (1999) e34–e39.
[4] D.R. Bentley, S. Balasubramanian, H.P. Swerdlow, G.P. Smith, J. Milton, C.G. Brown, K.P. Hall, D.J. Evers, C.L. Barnes, H.R. Bignell, et al., Accurate whole human genome sequencing using reversible terminator chemistry, Nature 456 (2008) 53–59.
[5] M. Nakano, J. Komatsu, S.-i. Matsuura, K. Takashima, S. Katsura, A. Mizuno, Single-molecule PCR using water-in-oil emulsion, J. Biotechnol. 102 (2003) 117–124.
[6] D. Dressman, H. Yan, G. Traverso, K.W. Kinzler, B. Vogelstein, Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations, Proc. Natl. Acad. Sci. 100 (2003) 8817–8822.
[7] D. Golan, P. Medvedev, Using state machines to model the Ion Torrent sequencing process and to improve read error rates, Bioinformatics 29 (2013) i344–i351.
[8] B. Merriman, Ion Torrent R&D Team, J.M. Rothberg, Progress in Ion Torrent semiconductor chip based sequencing, Electrophoresis 33 (2012) 3397–3417.
[9] Z. Ma, R.W. Lee, B. Li, P. Kenney, Y. Wang, J. Erikson, S. Goyal, K. Lao, Isothermal amplification method for next-generation sequencing, Proc. Natl. Acad. Sci. 110 (35) (2013) 14320–14323.
[10] P. Coupland, T. Chandra, M. Quail, W. Reik, H. Swerdlow, Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation, Biotechniques 53 (2012) 365–372.
[11] M.J. Levene, J. Korlach, S.W. Turner, M. Foquet, H.G. Craighead, W.W. Webb, Zero-mode waveguides for single-molecule analysis at high concentrations, Science 299 (2003) 682–686.
[12] J. Korlach, P.J. Marks, R.L. Cicero, J.J. Gray, D.L. Murphy, D.B. Roitman, T.T. Pham, G.A. Otto, M. Foquet, S.W. Turner, Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures, Proc. Natl. Acad. Sci. 105 (2008) 1176–1181.
[13] P.M. Lundquist, C.F. Zhong, P. Zhao, A.B. Tomaney, P.S. Peluso, J. Dixon, B. Bettman, Y. Lacroix, D.P. Kwo, E. McCullough, et al., Parallel confocal detection of single molecules in real time, Opt. Lett. 33 (2008) 1026–1028.
[14] M. de Vega, J.M. Lazaro, M. Salas, L. Blanco, Primer-terminus stabilization at the 3′–5′ exonuclease active site of phi29 DNA polymerase. Involvement of two amino acid residues highly conserved in proofreading DNA polymerases, EMBO J. 15 (1996) 1182.
[15] J. Esteban, M. Salas, L. Blanco, Fidelity of phi 29 DNA polymerase. Comparison between protein-primed initiation and DNA polymerization, J. Biol. Chem. 268 (1993) 2719–2726.
[16] L. Blanco, A. Bernad, J.M. Lázaro, G. Martin, C. Garmendia, M. Salas, Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication, J. Biol. Chem. 264 (1989) 8935–8940.
[17] C.-S. Chin, D.H. Alexander, P. Marks, A.A. Klammer, J. Drake, C. Heiner, A. Clum, A. Copeland, J. Huddleston, E.E. Eichler, et al., Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data, Nat. Methods 10 (6) (2013) 563–569.
[18] N. Ashkenasy, J. Sánchez-Quesada, H. Bayley, M.R. Ghadiri, Recognizing a single base in an individual DNA strand: a step toward DNA sequencing in nanopores, Angew. Chem. Int. Ed. 44 (2005) 1401–1404.
[19] J. Clarke, H.-C. Wu, L. Jayasinghe, A. Patel, S. Reid, H. Bayley, Continuous base identification for single-molecule nanopore DNA sequencing, Nat. Nanotechnol. 4 (2009) 265–270.
[20] J. Li, M. Gershow, D. Stein, E. Brandin, J.A. Golovchenko, DNA molecules and configurations in a solid-state nanopore microscope, Nat. Mater. 2 (2003) 611–615.
[21] D. Fologea, M. Gershow, B. Ledden, D.S. McNabb, J.A. Golovchenko, J. Li, Detecting single stranded DNA with a solid state nanopore, Nano Lett. 5 (2005) 1905–1909.
[22] P.A. 't Hoen, Y. Ariyurek, H.H. Thygesen, E. Vreugdenhil, R.H. Vossen, R.X. de Menezes, J.M. Boer, G.-J.B. van Ommen, J.T. den Dunnen, Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms, Nucleic Acids Res. 36 (2008) e141.
[23] T. Lappalainen, M. Sammeth, M.R. Friedlander, P.A.C. 't Hoen, J. Monlong, M.A. Rivas, M. Gonzalez-Porta, N. Kurbatova, T. Griebel, P.G. Ferreira, M. Barann, T. Wieland, L. Greger, M. van Iterson, J. Almlof, P. Ribeca, I. Pulyakhina, D. Esser, T. Giger, A. Tikhonov, M. Sultan, G. Bertier, D.G. MacArthur, M. Lek, E. Lizano, H.P.J. Buermans, I. Padioleau, T. Schwarzmayr, O. Karlberg, H. Ongen, H. Kilpinen, S. Beltran, M. Gut, K. Kahlem, V. Amstislavskiy, O. Stegle, M. Pirinen, S.B. Montgomery, P. Donnelly, M.I. McCarthy, P. Flicek, T.M. Strom, T.G. Consortium, H. Lehrach, S. Schreiber, R. Sudbrak, A. Carracedo, S.E. Antonarakis, R. Hasler, A.-C. Syvanen, G.-J. van Ommen, A. Brazma, T. Meitinger, P. Rosenstiel, R. Guigo, I.G. Gut, X. Estivill, E.T. Dermitzakis, Transcriptome and genome sequencing uncovers functional variation in humans, Nature 501 (2013) 506–511.
[24] H.J. Westra, M.J. Peters, T. Esko, H. Yaghootkar, C. Schurmann, J. Kettunen, M.W. Christiansen, B.P. Fairfax, K. Schramm, J.E. Powell, A. Zhernakova, D.V. Zhernakova, J.H. Veldink, L.H. Van den Berg, J. Karjalainen, S. Withoff, A.G. Uitterlinden, A. Hofman, F. Rivadeneira, P.A.C. 't Hoen, E. Reinmaa, K. Fischer, M. Nelis, L. Milani, D. Melzer, L. Ferrucci, A.B. Singleton, D.G. Hernandez, M.A. Nalls, G. Homuth, M. Nauck, D. Radke, U. Volker, M. Perola, V. Salomaa, J. Brody, A. Suchy-Dicey, S.A. Gharib, D.A. Enquobahrie, T. Lumley, G.W. Montgomery, S. Makino, H. Prokisch, C. Herder, M. Roden, H. Grallert, T. Meitinger, K. Strauch, Y. Li, R.C. Jansen, P.M. Visscher, J.C. Knight, B.M. Psaty, S. Ripatti, A. Teumer, T.M. Frayling, A. Metspalu, J.B.J. van Meurs, L. Franke, Systematic identification of trans eQTLs as putative drivers of known disease associations, Nat. Genet. 45 (2013) 1238–1243.
[25] E. Valen, G. Pascarella, A. Chalk, N. Maeda, M. Kojima, C. Kawazu, M. Murata, H. Nishiyori, D. Lazarevic, D. Motti, et al., Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE, Genome Res. 19 (2009) 255–265.
[26] E. de Klerk, A. Venema, S.Y. Anvar, J.J. Goeman, O. Hu, C. Trollet, G. Dickson, J.T. den Dunnen, S.M. van der Maarel, V. Raz, et al., Poly(A) binding protein nuclear 1 levels affect alternative polyadenylation, Nucleic Acids Res. 40 (2012) 9089–9101.
[27] E. Berezikov, F. Thuemmler, L.W. van Laake, I. Kondova, R. Bontrop, E. Cuppen, R.H.A. Plasterk, Diversity of microRNAs in human and chimpanzee brain, Nat. Genet. 38 (2006) 1375–1377.
[28] E.N.M. Nolte-'t Hoen, H.P.J. Buermans, M. Waasdorp, W. Stoorvogel, M.H.M. Wauben, P.A.C. 't Hoen, Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions, Nucleic Acids Res. 40 (2012) 9272–9285.
[29] R. Andersson, C. Gebhard, I. Miguel-Escalada, I. Hoof, J. Bornholdt, M. Boyd, Y. Chen, X. Zhao, C. Schmidl, T. Suzuki, E. Ntini, E. Arner, E. Valen, K. Li, L. Schwarzfischer, D. Glatz, J. Raithel, B. Lilje, N. Rapin, F.O. Bagger, M. Jorgensen, P.R. Andersen, N. Bertin, O. Rackham, A.M. Burroughs, J.K. Baillie, Y. Ishizu, Y. Shimizu, E. Furuhata, S. Maeda, Y. Negishi, C.J. Mungall, T.F. Meehan, T. Lassmann, M. Itoh, H. Kawaji, N. Kondo, J. Kawai, A. Lennartsson, C.O. Daub, P. Heutink, D.A. Hume, T.H. Jensen, H. Suzuki, Y. Hayashizaki, F. Muller, T.F. Consortium, A.R.R. Forrest, P. Carninci, M. Rehli, A. Sandelin, An atlas of active enhancers across human cell types and tissues, Nature 507 (2014) 455–461.
[30] N.T. Ingolia, S. Ghaemmaghami, J.R. Newman, J.S. Weissman, Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling, Science 324 (2009) 218–223.
[31] Y. Blat, N. Kleckner, Cohesins bind to preferential sites along yeast chromosome III, with differential regulation along arms versus the centric region, Cell 98 (1999) 249–259.
[32] E.P. Consortium, et al., The ENCODE (ENCyclopedia Of DNA Elements) project, Science 306 (2004) 636–640.
[33] P.W. Laird, Principles and challenges of genome-wide DNA methylation analysis, Nat. Rev. Genet. 11 (2010) 191–203.
[34] B.R. Herb, F. Wolschin, K.D. Hansen, M.J. Aryee, B. Langmead, R. Irizarry, G.V. Amdam, A.P. Feinberg, Reversible switching between epigenetic states in honeybee behavioral subcastes, Nat. Neurosci. 15 (2012) 1371–1373.
[35] R. Li, W. Fan, G. Tian, H. Zhu, L. He, J. Cai, Q. Huang, Q. Cai, B. Li, Y. Bai, et al., The sequence and de novo assembly of the giant panda genome, Nature 463 (2009) 311–317.
[36] N. Rohland, D. Reich, S. Mallick, M. Meyer, R.E. Green, N.J. Georgiadis, A.L. Roca, M. Hofreiter, Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants, PLoS Biol. 8 (2010) e1000564.
[37] D. Reich, R.E. Green, M. Kircher, J. Krause, N. Patterson, E.Y. Durand, B. Viola, A.W. Briggs, U. Stenzel, P.L. Johnson, et al., Genetic history of an archaic hominin group from Denisova Cave in Siberia, Nature 468 (2010) 1053–1060.
[38] D.L. Cox-Foster, S. Conlan, E.C. Holmes, G. Palacios, J.D. Evans, N.A. Moran, P.-L. Quan, T. Briese, M. Hornig, D.M. Geiser, et al., A metagenomic survey of microbes in honey bee colony collapse disorder, Science 318 (2007) 283–287.
[39] K.I. Bos, V.J. Schuenemann, G.B. Golding, H.A. Burbano, N. Waglechner, B.K. Coombes, J.B. McPhee, S.N. DeWitte, M. Meyer, S. Schmedes, et al., A draft genome of Yersinia pestis from victims of the Black Death, Nature 478 (2011) 506–510.
[40] N. Somboonna, A. Wilantho, K. Jankaew, A. Assawamakin, D. Sangsrakru, S. Tangphatsornruang, S. Tongsima, Microbial ecology of Thailand tsunami and non-tsunami affected terrestrials, PLoS One 9 (2014) e94236.
[41] K. Kraaijeveld, L.A. de Weger, M. Ventayol García, H. Buermans, J. Frank, P.S. Hiemstra, J.T. den Dunnen, Efficient and sensitive identification and quantification of airborne pollen using next-generation DNA sequencing, Mol. Ecol. Resour. (2014), http://dx.doi.org/10.1111/1755-0998.12288 (http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12288/abstract).
[42] J.C. Venter, K. Remington, J.F. Heidelberg, A.L. Halpern, D. Rusch, J.A. Eisen, D. Wu, I. Paulsen, K.E. Nelson, W. Nelson, et al., Environmental genome shotgun sequencing of the Sargasso Sea, Science 304 (2004) 66–74.
[43] D. Ercolini, F. De Filippis, A. La Storia, M. Iacono, Remake by high-throughput sequencing of the microbiota involved in the production of water buffalo mozzarella cheese, Appl. Environ. Microbiol. 78 (2012) 8142–8145.
[44] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary tool for transcriptomics, Nat. Rev. Genet. 10 (2009) 57–63.
[45] M. Arumugam, J. Raes, E. Pelletier, D. Le Paslier, T. Yamada, D.R. Mende, G.R. Fernandes, J. Tap, T. Bruls, J.-M. Batto, et al., Enterotypes of the human gut microbiome, Nature 473 (2011) 174–180.
[46] P.J. Turnbaugh, F. Bäckhed, L. Fulton, J.I. Gordon, Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome, Cell Host Microbe 3 (2008) 213–223.
[47] M.E. Norton, H. Brar, J. Weiss, A. Karimi, L.C. Laurent, A.B. Caughey, M.H. Rodriguez, J.W. III, M.E. Mitchell, C.D. Adair, H. Lee, B. Jacobsson, M.W. Tomlinson, D. Oepkes, D. Hollemon, A.B. Sparks, A. Oliphant, K. Song, Non-Invasive Chromosomal Evaluation (NICE) study: results of a multicenter prospective cohort study for detection of fetal trisomy 21 and trisomy 18, Am. J. Obstet. Gynecol. 207 (2012) 137.e1–137.e8.
[48] J.O. Kitzman, M.W. Snyder, M. Ventura, A.P. Lewis, R. Qiu, L.E. Simmons, H.S. Gammill, C.E. Rubens, D.A. Santillan, J.C. Murray, H.K. Tabor, M.J. Bamshad, E.E. Eichler, J. Shendure, Noninvasive whole-genome sequencing of a human fetus, Sci. Transl. Med. 4 (2012) 137ra76.
[49] A. Hoischen, B.W.M. van Bon, C. Gilissen, P. Arts, B. van Lier, M. Steehouwer, P. de Vries, R. de Reuver, N. Wieskamp, G. Mortier, K. Devriendt, M.Z. Amorim, N. Revencu, A. Kidd, M. Barbosa, A. Turner, J. Smith, C. Oley, A. Henderson, I.M. Hayes, E.M. Thompson, H.G. Brunner, B.B.A. de Vries, J.A. Veltman, De novo mutations of SETBP1 cause Schinzel–Giedion syndrome, Nat. Genet. 42 (2010) 483–485.
[50] L.E.L.M. Vissers, J. de Ligt, C. Gilissen, I. Janssen, M. Steehouwer, P. de Vries, B. van Lier, P. Arts, N. Wieskamp, M. del Rosario, B.W.M. van Bon, A. Hoischen, B.B.A. de Vries, H.G. Brunner, J.A. Veltman, A de novo paradigm for mental retardation, Nat. Genet. 42 (2010) 1109–1112.
[51] S.B. Ng, K.J. Buckingham, C. Lee, A.W. Bigham, H.K. Tabor, K.M. Dent, C.D. Huff, P.T. Shannon, E.W. Jabs, D.A. Nickerson, J. Shendure, M.J. Bamshad, Exome sequencing identifies the cause of a Mendelian disorder, Nat. Genet. 42 (2010) 30–35.
[52] R. Bernards, Finding effective cancer therapies through loss of function genetic screens, Curr. Opin. Genet. Dev. 24 (2014) 23–29.
[53] I.C. Macaulay, T. Voet, Single cell genomics: advances and future perspectives, PLoS Genet. 10 (2014) e1004126.
[54] J. Zhang, J. Baran, A. Cros, J.M. Guberman, S. Haider, J. Hsu, Y. Liang, E. Rivkin, J. Wang, B. Whitty, M. Wong-Erasmus, L. Yao, A. Kasprzyk, International Cancer Genome Consortium Data Portal: a one-stop shop for cancer genomics data, Database 2011 (2011) bar026, http://dx.doi.org/10.1093/database/bar026.
[55] R. Chen, G. Mias, J. Li-Pook-Than, L. Jiang, H. Lam, R. Chen, E. Miriami, K. Karczewski, M. Hariharan, F. Dewey, Y. Cheng, M. Clark, H. Im, L. Habegger, S. Balasubramanian, M. O'Huallachain, J. Dudley, S. Hillenmeyer, R. Haraksingh, D. Sharon, G. Euskirchen, P. Lacroute, K. Bettinger, A. Boyle, M. Kasowski, F. Grubert, S. Seki, M. Garcia, M. Whirl-Carrillo, M. Gallardo, M. Blasco, P. Greenberg, P. Snyder, T. Klein, R. Altman, A.J. Butte, E. Ashley, M. Gerstein, K. Nadeau, H. Tang, M. Snyder, Personal omics profiling reveals dynamic molecular and medical phenotypes, Cell 148 (2012) 1293–1307.
[56] A.C. English, S. Richards, Y. Han, M. Wang, V. Vee, J. Qu, X. Qin, D.M. Muzny, J.G. Reid, K.C. Worley, et al., Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology, PLoS One 7 (2012) e47768.
[57] E.W. Loomis, J.S. Eid, P. Peluso, J. Yin, L. Hickey, D. Rank, S. McCalmon, R.J. Hagerman, F. Tassone, P.J. Hagerman, Sequencing the unsequenceable: expanded CGG-repeat alleles of the fragile X gene, Genome Res. 23 (2013) 121–128.
[58] B.A. Flusberg, D.R. Webster, J.H. Lee, K.J. Travers, E.C. Olivares, T.A. Clark, J. Korlach, S.W. Turner, Direct detection of DNA methylation during single-molecule, real-time sequencing, Nat. Methods 7 (2010) 461–465.
[59] T.A. Clark, K.E. Spittle, S.W. Turner, J. Korlach, et al., Direct detection and sequencing of damaged DNA bases, Genome Integr. 2 (2011) 10.
[60] I.A. Murray, T.A. Clark, R.D. Morgan, M. Boitano, B.P. Anton, K. Luong, A. Fomenkov, S.W. Turner, J. Korlach, R.J. Roberts, The methylomes of six bacteria, Nucleic Acids Res. 40 (2012) 11450–11462.
[61] G. Fang, D. Munera, D.I. Friedman, A. Mandlik, M.C. Chao, O. Banerjee, Z. Feng, B. Losic, M.C. Mahajan, O.J. Jabado, et al., Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing, Nat. Biotechnol. 30 (12) (2012) 1232–1239.
[62] T.A. Clark, X. Lu, K. Luong, Q. Dai, M. Boitano, S.W. Turner, C. He, J. Korlach, Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation, BMC Biol. 11 (2013) 4.
[63] D. Sharon, H. Tilgner, F. Grubert, M. Snyder, A single-molecule long-read survey of the human transcriptome, Nat. Biotechnol. 31 (11) (2013) 1009–1014.
[64] J. Shendure, G.J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M. Rosenbaum, M.D. Wang, K. Zhang, R.D. Mitra, G.M. Church, Accurate multiplex polony sequencing of an evolved bacterial genome, Science 309 (2005) 1728–1732.
[65] M.L. Metzker, Sequencing technologies – the next generation, Nat. Rev. Genet. 11 (2010) 31–46.
[66] M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman, Y.-J. Chen, Z. Chen, et al., Genome sequencing in microfabricated high-density picolitre reactors, Nature 437 (2005) 376–380.
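The parent–child trio analysis mentioned in the discussion of human genetics and disease gene identification rests on a simple comparison: a variant present in the child but absent from both parents is a de novo candidate. The Python sketch below illustrates only that filter. It assumes genotypes have already been called and coded as alternate-allele counts (0, 1 or 2). The names `TrioCall` and `candidate_de_novo` and the site labels are hypothetical and do not come from any specific pipeline.

```python
from __future__ import annotations

from typing import NamedTuple


class TrioCall(NamedTuple):
    """Genotypes at one variant site, coded as alternate-allele counts (0, 1, 2)."""
    site: str      # hypothetical site label, e.g. "chr1:1000 A>G"
    child: int
    mother: int
    father: int


def candidate_de_novo(calls: list[TrioCall]) -> list[str]:
    """Return sites where the child carries an alternate allele absent from both parents."""
    return [
        call.site
        for call in calls
        if call.child > 0 and call.mother == 0 and call.father == 0
    ]


if __name__ == "__main__":
    calls = [
        TrioCall("site_1", child=1, mother=0, father=0),  # absent in both parents
        TrioCall("site_2", child=1, mother=1, father=0),  # inherited from the mother
        TrioCall("site_3", child=0, mother=0, father=1),  # not carried by the child
    ]
    print(candidate_de_novo(calls))  # ['site_1']
```

In practice, such candidates still need filtering for sequencing artefacts and low parental coverage, because a variant missed in a parent looks exactly like a de novo event.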
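Section 5.9 explains that base modifications are inferred from polymerase kinetics, and that weaker signals such as m5C require more coverage. The minimal Python sketch below illustrates that reasoning. For each template position, it compares the mean interpulse duration (IPD, the waiting time between successive incorporation events) in the sample against an unmodified control, such as amplified DNA, which carries no modifications. It assumes per-read IPDs have already been aligned to template positions. The function names, `min_coverage=5` and `threshold=2.0` are illustrative assumptions, not recommended settings.

```python
from __future__ import annotations

from statistics import mean


def ipd_ratios(
    sample: dict[int, list[float]],
    control: dict[int, list[float]],
    min_coverage: int = 5,
) -> dict[int, float]:
    """Per-position ratio of mean sample IPD to mean control IPD.

    Positions with fewer than `min_coverage` reads in either data set are
    skipped, because a mean over very few reads is too noisy to interpret.
    """
    ratios: dict[int, float] = {}
    for position, durations in sample.items():
        reference = control.get(position, [])
        if len(durations) >= min_coverage and len(reference) >= min_coverage:
            ratios[position] = mean(durations) / mean(reference)
    return ratios


def flag_modified(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Positions whose polymerase slow-down reaches an illustrative threshold."""
    return sorted(position for position, ratio in ratios.items() if ratio >= threshold)


if __name__ == "__main__":
    sample = {
        10: [1.0, 1.0, 1.0, 1.0, 1.0],  # no slow-down
        11: [3.0, 4.0, 5.0, 4.0, 4.0],  # strong slow-down (made-up values)
        12: [5.0, 5.0],                 # too few reads to judge
    }
    control = {position: [1.0] * 5 for position in (10, 11, 12)}
    ratios = ipd_ratios(sample, control)
    print(ratios)                 # {10: 1.0, 11: 4.0}
    print(flag_modified(ratios))  # [11]
```

A modification with a large kinetic effect pushes the ratio well above 1 even at modest coverage. A subtle one shifts the mean only slightly, so more reads per position are needed before that shift stands out from read-to-read noise.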
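The two-base encoding of the Solid system (section 6.1), in which 4 colors label 16 dinucleotides and the sequence is decoded from color labels, can be illustrated with a minimal Python sketch. It assumes the widely described XOR color code: with A=0, C=1, G=2 and T=3, a dinucleotide's color is the XOR of its two base indices. It also assumes the first base is known, for example from the adapter. The sketch models only the resulting color per dinucleotide, not the octamer ligation and re-priming rounds. The names `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

# Assumed color code: with A=0, C=1, G=2, T=3, the color of a dinucleotide
# is the XOR of its two base indices, so each of the 4 colors labels
# 4 of the 16 dinucleotides.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}
INDEX_BASE = "ACGT"


def encode(sequence: str) -> list[int]:
    """One color per overlapping dinucleotide, so each base contributes to two colors."""
    return [BASE_INDEX[a] ^ BASE_INDEX[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild the bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse: next base = previous base XOR color
        bases.append(INDEX_BASE[BASE_INDEX[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    print(encode("TACGGT"))              # [3, 1, 3, 0, 1]
    print(encode("TACAGT"))              # [3, 1, 1, 2, 1]: a true SNP changes two adjacent colors
    print(decode("T", [3, 1, 3, 2, 1]))  # TACGAC: one miscalled color corrupts all later bases
```

Because every base contributes to two colors, a real single-nucleotide difference changes two adjacent colors. A single color-calling error changes only one color, and if naively decoded it corrupts every downstream base. This asymmetry is what lets color-space analysis separate measurement errors from true variants.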
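In the Roche 454 system (section 6.2), nucleotides are flowed in a fixed T–A–C–G cycle, and the light emitted in each flow reflects how many identical nucleotides were incorporated. The minimal Python sketch below assumes an idealized signal exactly proportional to homopolymer length, with no phasing effects. It simulates such a flowgram for a synthesized sequence and then calls bases by rounding each flow intensity. The function names and example values are hypothetical.

```python
from __future__ import annotations

FLOW_ORDER = "TACG"  # cyclic nucleotide flow order described in section 6.2


def simulate_flowgram(read: str, n_flows: int) -> list[int]:
    """Ideal signal per flow: the number of identical bases incorporated.

    Assumes light output is exactly proportional to homopolymer length,
    with no noise and no out-of-phase extension.
    """
    signal: list[int] = []
    position = 0
    for flow in range(n_flows):
        nucleotide = FLOW_ORDER[flow % len(FLOW_ORDER)]
        count = 0
        while position < len(read) and read[position] == nucleotide:
            count += 1
            position += 1
        signal.append(count)
    return signal


def call_bases(flowgram: list[float]) -> str:
    """Naive base calling: round each flow intensity to a homopolymer length."""
    return "".join(
        FLOW_ORDER[flow % len(FLOW_ORDER)] * max(0, round(value))
        for flow, value in enumerate(flowgram)
    )


if __name__ == "__main__":
    print(simulate_flowgram("TTAGGC", 8))  # [2, 1, 0, 2, 0, 0, 1, 0]
    noisy = [1.9, 1.1, 0.2, 2.4, 0.1, 0.0, 0.8, 0.1]
    print(call_bases(noisy))  # TTAGGC
```

The rounding step shows why homopolymers are the weak point of pyrosequencing. As runs get longer, the intensities for n and n+1 identical bases become harder to tell apart, and a misjudged intensity becomes an insertion or deletion in the read.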