Seq Quantification App Note
[Figure 2 panels: n (G); “1n” (G); n (G)]
Figure 2: Allele Drop-Out. The peak signal is only approximately half of the signal expected in the case of homozygosity.
[Figure 3 panels: 2n I, R (G+A); n, G; 2n II, R (G+A); n, A; 2n III, R (G+A)]
Figure 3: Heterozygosity: sequencing a heterozygous allele may ideally present in an electropherogram as a balanced peak pair (Outcome I) or
may appear somewhat imbalanced (Outcome II or III). The specific outcome for a given peak pair is typically highly reproducible and depends on the
local sequence context.
similar position as a mixed base. The signal strength of each component is approximately half of the homozygous counterpart. Ideally, the two heterozygous peaks appear to be of equal height (see Outcome I), but in reality they may be somewhat imbalanced (Outcome II or III) depending on the DNA strand sequenced and the local sequence context. This complicates the determination of peak height ratios. However, this imbalance is typically highly reproducible for a given allele from sample to sample and can be accounted for using homozygous control samples (see text). (Figure 3)

The simple principle that the proportion of each of two sequence variants in a mixture determines the relative heights of the peaks that represent each variant in a sequence electropherogram has inspired Ian Carr and colleagues from the University of Leeds Institute of Molecular Medicine to develop a software application that exploits the quantitative information embedded in a sequencing trace.

Homozygous and heterozygous sequence variants are readily detected by commercial and public domain sequence analysis software packages. However, minor sequence variants, such as those found in somatic mutations in tumor tissue or in emerging mutations in subpopulations of microbial or viral organisms, often elude detection because the abundance of the minor allele is too low to trigger a (mixed) basecall.

The heights of the primary and secondary peaks in a mixed-base situation are the most important attributes for basecalling. If the peak height ratio of a secondary to a primary peak drops below 30% (or other user-set threshold), the secondary peak is usually not considered and therefore not called out as a mixed base.

In this application note we review the paper and the QSVanalyzer software published by Ian Carr et al. from the University of Leeds, UK, and recommend its utility for the detection and quantification of sequence variants.

We also describe a new bioinformatics utility, ab1PeakReporter, which is available on the Life Technologies web site. The utility provides numerical peak height data of Sanger sequencing traces, allowing the quantitative analysis of peak height data. To that end, we show how minor alleles can be quantified by polynomial regression analysis using Microsoft Excel® software.

Inferring Allelic Variant Ratios using QSVanalyzer
In 2009, Carr et al. published a paper describing the QSVanalyzer desktop application in the journal Bioinformatics. QSVanalyzer enables the high-throughput quantification of the proportions of DNA sequences containing single-nucleotide sequence variants (SNVs) from fluorescent Sanger sequencing traces. The paper is open access and can be downloaded with supplementary data from [1]. The QSVanalyzer application, including the original sequencing trace files used in the study, can be downloaded from http://dna.leeds.ac.uk/qsv/ .

In the paper, Carr et al. demonstrated the utility of the method for estimation of copy number proportions (CNPs) for various quantitative sequence variants: regular SNPs, paralogous sequence variants (PSV) and SNPs in the background of copy number variation (CNV).

Workflow for QSV analysis (figure):
• Move .ab1 files for QSV analysis into a project folder
• Must include homozygous counterparts for (heterozygous) allele(s) of interest
• Open .ab1 file of sample with allele of interest
• Inspect electropherogram
• Select peak(s) of interest and 5´ reference peaks
• Run Batch command and select project folder as input source
• QSV analysis is executed and a report folder with results is deposited in the project folder
• Open result folder “QSV Data” located in project folder
• Review results

An important concept presented in the paper is the normalization of electropherograms: fluorescent dideoxynucleotide terminators are incorporated dependent on their sequence context and may appear imbalanced in heterozygous mixed bases (see Figure 3). Further, the amount of template DNA and other factors affect the absolute peak height. Therefore, relative (rather than absolute) peak heights are determined by comparing the variant nucleotide’s peak height to that of an invariant nucleotide located 5’ (upstream) where one can assume a neutral sequence background, i.e., no variant-introduced effects. The software also corrects for the
Figure 6: Output reports of the QSVanalyzer application. (A) Widget of the electropherogram accompanied by peak heights of the area. (B)
Comprehensive Excel-readable table with raw and reference-adjusted data. (C) Final Quantitative Sequence Variant (QSV) report with adjusted peak
heights (see Carr et al. for details).
trace and subtracts the allele-
specific “background noise” from
the relative peak height for a final
normalized peak height (NPH). To
calculate the QSV ratio, the program
needs two reference sequences,
each containing the homozygous
allele of the two variants.
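The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QSVanalyzer implementation: the dictionary keys `a`, `b` and `ref`, the function names, and the final step that scales each normalized peak height by its homozygous control are all hypothetical choices made for this example.

```python
def relative_height(peak, reference_peak):
    """Variant peak height relative to an invariant 5' reference peak."""
    if reference_peak <= 0:
        raise ValueError("reference peak height must be positive")
    return peak / reference_peak


def normalized_peak_height(sample_rel, background_rel):
    """Relative peak height minus allele-specific background, floored at zero."""
    return max(sample_rel - background_rel, 0.0)


def variant_proportion(sample, hom_a, hom_b):
    """Estimate the proportion of allele A in a mixed sample.

    Each argument is a dict with the hypothetical keys 'a' and 'b' (peak
    heights of the two variant bases) and 'ref' (5' reference peak height).
    hom_a and hom_b are the homozygous controls for allele A and allele B.
    """
    # Assumption: the background of allele A is its residual signal in the
    # homozygous-B control, and vice versa.
    bg_a = relative_height(hom_b["a"], hom_b["ref"])
    bg_b = relative_height(hom_a["b"], hom_a["ref"])

    # Full-scale signal of each allele, taken from its own homozygous control.
    full_a = normalized_peak_height(relative_height(hom_a["a"], hom_a["ref"]), bg_a)
    full_b = normalized_peak_height(relative_height(hom_b["b"], hom_b["ref"]), bg_b)
    if full_a == 0 or full_b == 0:
        raise ValueError("homozygous controls show no signal above background")

    nph_a = normalized_peak_height(relative_height(sample["a"], sample["ref"]), bg_a)
    nph_b = normalized_peak_height(relative_height(sample["b"], sample["ref"]), bg_b)

    # Scaling by the homozygous signal cancels sequence-context imbalance.
    frac_a = nph_a / full_a
    frac_b = nph_b / full_b
    if frac_a + frac_b == 0:
        raise ValueError("sample shows no signal above background")
    return frac_a / (frac_a + frac_b)
```

Using the two homozygous controls on both sides of the calculation is what lets a reproducible peak imbalance, such as Outcome II or III in Figure 3, drop out of the estimate.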
To meet the need for quantitative information from Sanger sequencing traces we have
Figure 8: Data from polynomial regression analysis of peak height data of a particular allele containing defined proportions; only values for 0% and 10% are listed (dilution series data provided by Carr et al. 2009). RFU = relative fluorescent units = peak height.
file is extracted and opened as an
Excel-readable .csv file.
Figure 13: Applying Filters to the data. 1) Click on row 16, 2) go to tab “Data” and 3) select Filter.
Scanner software to readily find a
peak of interest. 1
and higher)
Columns Q, R, S, T are populated with the amplitude and sequence output data from the KB™
Figure 18: Amplitudes and basecalls of primary and secondary peak as determined by KB™ Basecaller.
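Given primary and secondary amplitudes such as those reported here, the secondary-to-primary peak height ratio can be computed and compared with the mixed-base threshold mentioned earlier in this note (30%, or another user-set value). The following is a minimal sketch; the function and parameter names are hypothetical and do not correspond to any column header or setting in the ab1PeakReporter output.

```python
def secondary_to_primary_ratio(primary_amplitude, secondary_amplitude):
    """Peak height ratio of the secondary peak to the primary peak."""
    if primary_amplitude <= 0:
        raise ValueError("primary amplitude must be positive")
    return secondary_amplitude / primary_amplitude


def is_mixed_base(primary_amplitude, secondary_amplitude, threshold=0.30):
    """True if the secondary peak reaches the mixed-base threshold.

    The default of 0.30 mirrors the 30% figure quoted in the text; it is a
    user-adjustable setting, not a fixed property of the data.
    """
    ratio = secondary_to_primary_ratio(primary_amplitude, secondary_amplitude)
    return ratio >= threshold
```

A secondary peak at 12% of the primary would not be called as a mixed base under the default threshold, yet its ratio is still available as a number for quantitative analysis, which is the point of exporting the amplitudes.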
Quality Values
The QV is a per-base estimate of the KB™ Basecaller accuracy.
Figure 19: The Quality values indicate the probability of an incorrect basecall of primary peak.
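To make the relationship between a quality value and an error probability concrete, the sketch below assumes the Phred-style definition QV = -10 · log10(p), where p is the probability that the basecall is wrong. That definition is an assumption stated for this example; the note itself does not give the formula, and the function names are hypothetical.

```python
import math


def error_probability(qv):
    """Probability of an incorrect basecall for a Phred-style quality value."""
    return 10 ** (-qv / 10)


def quality_value(p_error):
    """Phred-style quality value for a given basecall error probability."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)
```

Under this assumption, a QV of 20 corresponds to a 1 in 100 chance of an incorrect primary basecall and a QV of 30 to 1 in 1000.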
Measuring allele proportions by peak height ratios
To demonstrate the utility of the tool, we prepared genomic DNA mixtures of the normal and mutant TP53 gene (exon 11) at various proportions and determined the peak height ratios between the minor and major alleles using the ab1PeakReporter tool. Figure 20 shows that, for this particular allele, the peak height ratios obtained from both channels (1-scan window or 7-scan window) correlated quite well up to 15%. A 5% level of mutant allele was clearly distinguishable from 0% (normal control; Figure 21).
[Figure panels: 0%, 2.5%, 5%, 7.5%]
Feature | QSVanalyzer | ab1PeakReporter
Number of alleles | Limited to predefined positions | All bases in trace file
Number of .ab1 files that can be analyzed | Multiple (maximum # not specified). QSV analysis requires presence of homozygous controls for either variant | 96 (maximum upload per processing)
Table of peak height data of primary and | Yes (see Figure 6, columns B and C) | Yes (requires that .ab1 file is analyzed with KB™
Table 1: Summary of features available in the QSVanalyzer application and the ab1PeakReporter tool.
Conclusions
This application note shows tools and methods for extracting and using peak height data from fluorescent Sanger sequencing traces for determination of allele ratios or allele quantification. Table 1 summarizes the features of the two software applications presented.

References
[1] Carr IM, Robinson JI, Dimitriou R et al. (2009) Bioinformatics, 25(24):3244–3250. http://bioinformatics.oxfordjournals.org/content/25/24/3244.long
[2] White paper: Applied Biosystems Genetic Analysis Data File Format. http://www6.appliedbiosystems.com/support/software_community/ABIF_File_Format.pdf
Find out more at lifetechnologies.com
For Research Use Only. Not for use in diagnostic procedures. ©2013 Life Technologies Corporation. All rights reserved. The trademarks
mentioned herein are the property of Life Technologies Corporation and/or its affiliate(s) or their respective owners. Excel is a registered
trademark of Microsoft Corporation. CO07793 1113