COMP90016 2023 08 Variant Calling II
Lecture 8
Variant Calling II
Dr Khalid Mahmood
Before watching this lecture, make sure you are familiar with…
Today
Broad Institute
Improving alignment
● A sequencing experiment results in a large
volume of sequencing reads
● Reads are now mapped to a reference
● Reads can contain errors and technical
artifacts
● e.g. a molecule sequenced multiple times
will result in duplicate reads
● We need to pre-process the aligned reads
to get them ready for variant calling
Improving alignment
Before variant calling, improve the alignment:
● Local realignment
○ fix reads misaligned around indels
In reality we usually filter our variant call list after variant calling.
Filtering - improving specificity
In many research projects the motivation is to narrow the list of
calls to the most likely candidates for further investigation.
● Hard filtering
○ Filter on quality metrics and strand bias (overall, reassess
the supporting evidence for each call)
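Hard filtering can be sketched as a set of fixed thresholds applied to each call. The example below is a minimal illustration only: the `Call` record, its field names, the threshold values and the example calls are all assumptions made for this sketch, not values from the lecture or from any particular variant caller.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical variant call record (field names are assumptions)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float   # variant quality score
    depth: int    # read depth at the site
    alt_fwd: int  # alt-supporting reads on the forward strand
    alt_rev: int  # alt-supporting reads on the reverse strand


def minor_strand_fraction(call):
    """Fraction of alt-supporting reads on the less-represented strand."""
    total = call.alt_fwd + call.alt_rev
    if total == 0:
        return 0.0
    return min(call.alt_fwd, call.alt_rev) / total


def hard_filter(calls, min_qual=30.0, min_depth=10, min_strand_fraction=0.1):
    """Keep calls that pass every threshold (illustrative thresholds)."""
    return [
        c for c in calls
        if c.qual >= min_qual
        and c.depth >= min_depth
        and minor_strand_fraction(c) >= min_strand_fraction
    ]


# Made-up calls: one passes, one has low quality, one has strong strand bias.
calls = [
    Call("chr1", 102, "A", "G", qual=50.0, depth=30, alt_fwd=7, alt_rev=8),
    Call("chr1", 104, "A", "T", qual=12.0, depth=30, alt_fwd=6, alt_rev=6),
    Call("chr1", 250, "C", "T", qual=60.0, depth=25, alt_fwd=12, alt_rev=0),
]
kept = hard_filter(calls)
print([(c.chrom, c.pos) for c in kept])
```

Raising the thresholds makes the list more specific at the cost of sensitivity, which is the trade-off this slide describes.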
chr1 102 A G
chr1 104 A T
vs.
CCGACAATT -> CCG.ACA.ATT.A…
CCACAGGATT -> CCA.CAG.GAT.TA…
Phased on different chromosome
Phasing
If variants are close enough together we can use overlapping
reads or read pairs for phasing.
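A minimal sketch of read-based phasing for two nearby heterozygous sites follows. It assumes each read (or read pair) has already been reduced to a mapping of position to observed base, that both sites are biallelic, and that a simple majority vote is enough; the function name `phase_pair` and the example reads are hypothetical.

```python
from collections import Counter


def phase_pair(reads, pos_a, pos_b, alt_a, alt_b):
    """Decide whether two alt alleles sit on the same molecule (cis) or not (trans).

    reads: list of dicts mapping position -> observed base, one dict per
    read or read pair. Only reads covering both sites are informative.
    """
    counts = Counter()
    for observed in reads:
        if pos_a not in observed or pos_b not in observed:
            continue  # read does not span both sites
        a_is_alt = observed[pos_a] == alt_a
        b_is_alt = observed[pos_b] == alt_b
        if a_is_alt == b_is_alt:
            counts["cis"] += 1    # alt+alt or ref+ref on one molecule
        else:
            counts["trans"] += 1  # alt on one site, ref on the other
    if counts["cis"] == counts["trans"]:
        return "unphased", counts
    return ("cis" if counts["cis"] > counts["trans"] else "trans"), counts


# Made-up reads over chr1:102 (A>G) and chr1:104 (A>T).
reads = [
    {102: "G", 104: "T"},
    {102: "A", 104: "A"},
    {102: "G", 104: "T"},
    {102: "G"},            # covers only one site, not informative
    {102: "G", 104: "A"},  # conflicting read, e.g. a sequencing error
]
phase, support = phase_pair(reads, 102, 104, "G", "T")
print(phase, dict(support))
```

The approach only works when a read or read pair can span both sites, which is why the distance between variants matters.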
Haplotypes
A haplotype (short for haploid genotype) is a set of
genotypes that are phased, so that we know which variants
are on the same DNA molecule.
Two parts of the genome that are not adjacent in the reference
may be adjacent at the breakpoint.
Detecting SVs: breakpoints
Using anomalous alignments:
● paired-end mapping: read pairs do not map at expected
distance or orientation
○ can span larger distances using short reads
○ gives only approximate breakpoint location
○ dependent on uncertainty in insert size
● split-read mapping: read maps over breakpoint, so ends of
read align to different parts of reference genome
○ need long enough reads (>200bp)
○ gives single-base resolution of breakpoint
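The two kinds of anomalous alignment above can be flagged with simple rules. The sketch below is illustrative only: the `ReadPair` record, the expected insert size (mean 400, SD 50), the 3-SD cutoff, the assumption of a standard inward-facing (+/-) paired-end library, and the minimum clip length are all assumptions chosen for the example. Split-read candidates are detected here from long soft or hard clips in a SAM-style CIGAR string.

```python
import re
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical summary of one aligned read pair."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str
    insert_size: int  # observed distance spanned by the pair


def is_discordant(pair, mean_insert=400, sd_insert=50, n_sd=3):
    """True if the pair maps at an unexpected distance or orientation."""
    if pair.chrom1 != pair.chrom2:
        return True
    # Order the reads by position so orientation is read left to right.
    if pair.pos1 <= pair.pos2:
        left, right = pair.strand1, pair.strand2
    else:
        left, right = pair.strand2, pair.strand1
    if (left, right) != ("+", "-"):
        return True
    # The cutoff widens with insert-size uncertainty (sd_insert).
    return abs(pair.insert_size - mean_insert) > n_sd * sd_insert


def is_split_candidate(cigar, min_clip=20):
    """True if either end of the read has a clip of at least min_clip bases."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if not ops:
        return False
    ends = [ops[0], ops[-1]]
    return any(op in "SH" and int(length) >= min_clip for length, op in ends)


print(is_discordant(ReadPair("chr1", 1000, "+", "chr1", 5000, "-", 4100)))
print(is_split_candidate("60M40S"))
```

Because the discordance cutoff scales with the insert-size spread, a library with a broad insert-size distribution can only detect larger events this way, whereas a clipped read pins the breakpoint to the base where the clip starts.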
Detecting SVs: breakpoints
If there is a variant, we expect reads to be split and/or paired reads
to map discordantly:
[Figure: read alignments shown against Alt and Ref for two cases, no variant vs. deletion]
Detecting SVs: breakpoints
Other SV signatures:
Duplication (tandem)
[Figure: read pairs shown against Alt and Ref]
Detecting SVs: breakpoints
Other SV signatures:
Inversion
[Figure: read pairs shown against Alt and Ref]
Evidence: 2 read pairs, each with a wider insert size than expected and with both reads facing the same way
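The read-pair signatures can be summarised as a small lookup on orientation and insert size. This is a sketch under stated assumptions: the inversion rule follows the evidence line above, while the deletion rule (normal orientation, wider insert) and the tandem duplication rule (outward-facing pair) are the commonly used signatures rather than details recoverable from the slide figures. The function name and the insert-size parameters are hypothetical.

```python
def sv_signature(left_strand, right_strand, insert_size,
                 mean_insert=400, sd_insert=50, n_sd=3):
    """Suggest an SV type for one read pair.

    left_strand / right_strand: strands ("+" or "-") of the leftmost and
    rightmost read of the pair on the reference.
    """
    too_wide = insert_size > mean_insert + n_sd * sd_insert
    if left_strand == right_strand:
        # Both reads face the same way: inversion-like signature.
        return "inversion"
    if (left_strand, right_strand) == ("-", "+"):
        # Reads face away from each other: tandem-duplication-like signature.
        return "tandem duplication"
    if too_wide:
        # Normal orientation but mapped too far apart on the reference.
        return "deletion"
    return "no variant"


for example in [("+", "-", 400), ("+", "-", 2000), ("-", "+", 900), ("+", "+", 1500)]:
    print(example, sv_signature(*example))
```

A single pair is weak evidence; callers typically require several pairs supporting the same signature at the same location before reporting an event.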
Detecting SVs
Basic approach:
● Look for breakpoints and/or CNV
○ breakpoints: paired-end reads / split read mapping
○ CNV: read depth / B-allele frequencies
○ Tools such as GRIDSS, Manta and DELLY use both
paired-end and split-read evidence
● Deduce SV events
○ which breakpoints are linked?
○ what are the likely SV types based on distances and
orientations?
● Possibly, try de novo assembly using reads from the relevant
regions
○ can be effective
○ repeat regions cause problems
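The read-depth signal for CNV mentioned above can be sketched as a log2 ratio of each window's depth to a baseline. The example assumes depths have already been counted in fixed-size windows, uses the median window depth as the baseline, and applies illustrative gain and loss cutoffs; the function names, cutoffs and depth values are assumptions for this sketch only.

```python
import math
from statistics import median


def depth_log2_ratios(window_depths):
    """log2(depth / median depth) for each window."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [
        math.log2(d / baseline) if d > 0 else float("-inf")
        for d in window_depths
    ]


def call_cnv(window_depths, gain=0.4, loss=-0.6):
    """Label each window as gain, loss or neutral (illustrative cutoffs)."""
    labels = []
    for ratio in depth_log2_ratios(window_depths):
        if ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Made-up window depths: a dip (possible deletion) and a rise (possible duplication).
depths = [30, 31, 29, 15, 14, 30, 46, 45, 30]
labels = call_cnv(depths)
print(labels)
```

In a diploid sample, losing one copy roughly halves the depth (log2 ratio near -1) and gaining one copy gives about 1.5 times the depth (log2 ratio near 0.58), which motivates cutoffs on either side of zero.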
Detecting SVs: summary
Brief Funct Genomics, Volume 14, Issue 5, September 2015, Pages 305–314, https://doi.org/10.1093/bfgp/elv014
Summary overview of SV types and their mechanisms: https://doi.org/10.1038/nrg3373