Sajeev Sequencing
SAJEEV RAJ S
MSc P Batch
INTRODUCTION
• The information content of DNA is encoded in the form of four
bases (A, G, C and T), and the process of determining the sequence of
these bases in a given DNA molecule is referred to as DNA
sequencing.
[Figure: NGS workflow. 1. Isolation of DNA, 2. Fragmentation, 3. Library Preparation, 4. Amplification, 5. Sequencing, 6. Data Analysis]
Isolation of DNA
• Cell lysis
• Chemical lysis
• Mechanical lysis
• Enzymatic lysis
• DNA stabilization
• Protein removal
• DNA precipitation
Fragmentation
• After isolation of the DNA, the next step is fragmentation of the DNA
molecule.
• Two main types of PCR are used in NGS techniques: emulsion PCR and
bridge amplification.
1. Sequencing by Synthesis
2. Sequencing by Ligation
Sequencing by Synthesis
• In NGS, sequencing by synthesis is generally used in the 454,
Ion Torrent and Illumina sequencing methods.
• Each nucleotide is then added to the wells in a fixed order, for a
span of 4-10 s.
• Separate software then converts this BCL file into a FASTQ file
for further analysis.
• In the case of Illumina, this is done by a separate base-calling
process.
• The BCL file is a binary file containing base calls and quality scores for each tile and each cycle.
• Each base is given a quality score according to the intensity of the
detected fluorescence.
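As a minimal sketch of what a quality score expresses, the snippet below assumes the scores follow the Phred convention (Q = -10 · log10 of the error probability), which is the usual scale for Illumina base calls but is not stated on the slide. The function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability back into a Phred quality score."""
    return -10 * math.log10(p)


# Assumption: Phred-scaled scores; Q30 corresponds to 1 error in 1000 calls.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

A higher score therefore means a more trustworthy base call: each step of 10 reduces the expected error rate tenfold.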
Sequencing Analysis Viewer (SAV)
bcl2fastq
• bcl2fastq is Illumina's conversion software that demultiplexes sequencing data and converts BCL files into
FASTQ files. bcl2fastq v1.8.4 can be used for Illumina sequencing systems.
• FASTQ is a text-based sequence file format, generated from the BCL
file, that stores both raw sequence data and quality scores.
• --bcl-input-directory: The path to the input directory containing BCL files
• --output-directory: The path to an output directory for newly created FASTQ files
• --sample-sheet: The path to a CSV file containing sample information
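The FASTQ format described above can be illustrated with a small reader. This is only a sketch: it assumes the common four-line record layout (header, sequence, `+` separator, quality string) and Phred+33 quality encoding, and the names `FastqRecord` and `read_fastq` are hypothetical, not part of any Illumina tool.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding, so subtract 33 from each character code.
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


# A made-up single-read file held in memory for illustration.
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
records = list(read_fastq(example))
for rec in records:
    print(rec.name, rec.sequence, rec.qualities)
```

Each record keeps the raw sequence and its per-base quality scores side by side, which is what downstream alignment and variant-calling tools consume.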
Sequencing by Ligation
• SOLiD is the best-known sequencing platform that uses the
sequencing-by-ligation method, in which the enzyme DNA ligase performs the ligation.
• In this method, after emulsion PCR the beads are deposited
onto a glass surface. A high density of beads can be achieved, which
increases the throughput of the technique.
• Up to 5 rounds of sequencing, using a primer shorter by one base each time (i.e. N−1, N−2,
N−3 and N−4), and measuring the fluorescence ensure that the whole target is
sequenced.
• Because of the two-base encoding, each base is
effectively sequenced twice, which gives a reported accuracy of 99.999%; the method is also
inexpensive.
• The main disadvantage is that it produces only short reads, which makes it
unsuitable for many applications.
• The results of this method are interpreted
using a unique color matrix.
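The color matrix can be sketched in code. The example below assumes the standard SOLiD two-base encoding, in which each color (0 to 3) corresponds to a pair of adjacent bases and equals the XOR of their 2-bit codes with A=0, C=1, G=2, T=3. The function names are illustrative.

```python
BASES = "ACGT"  # index of each base is its 2-bit code: A=0, C=1, G=2, T=3


def encode_colors(sequence: str) -> str:
    """Encode a base sequence as colors, one color per pair of adjacent bases."""
    codes = [BASES.index(base) for base in sequence]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def decode_colors(first_base: str, colors: str) -> str:
    """Decode a color string back to bases, given the known first base."""
    current = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        current ^= int(color)  # the next base follows from previous base and color
        decoded.append(BASES[current])
    return "".join(decoded)


colors = encode_colors("ATGGC")
print(colors)                      # color-space representation
print(decode_colors("A", colors))  # recovers the original bases
```

Because every base contributes to two neighboring colors, a single miscalled color is inconsistent with its neighbors, which is the basis of the method's error checking.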
IMAGE CALLING
Data Analysis
• After sequencing, whatever the
platform and method used, we
obtain a raw base-call file
(a BCL file on Illumina platforms).
➢ The sequence reads of variable lengths are aligned using different bioinformatics
alignment tools such as BWA, Bowtie, and TopHat.
➢ These heuristic-based aligners allow fast sequence alignment and generate a
consensus sequence from the alignment by searching the overlapping portions of
the reads and merging them into longer reads in order to construct a region of
interest, that is, genes or a whole genome.
➢ The main aim of this step is to generate a consensus sequence from the millions of
reads.
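The idea of building a consensus from overlapping reads can be sketched as a per-position majority vote. This is a toy illustration, not how BWA, Bowtie or TopHat work internally: the start position of each read is assumed to be already known, and the reads and the `consensus` function are made up for the example.

```python
from collections import Counter
from typing import List, Tuple


def consensus(aligned_reads: List[Tuple[int, str]], length: int) -> str:
    """Majority-vote consensus from reads given as (start position, sequence)."""
    columns = [Counter() for _ in range(length)]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    # Positions covered by no read are reported as 'N'.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Hypothetical reads; the last one carries a sequencing error (T instead of A at position 4).
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT"), (3, "TTCGGT")]
print(consensus(reads, 10))
```

With enough overlapping reads, an isolated sequencing error is outvoted by the correct bases at the same position.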
Reference based assembly
➔ Reference-assisted assembly is like painting scenery that is in front of you.
➔ The landscape in the painting may look a little different and the terrain need not
be the same, but having the scenery in front of you still makes the job
relatively simpler.
➢ Variant analysis uses the reads file to determine the conserved and
variable nucleotides at specific positions.
★ SNPs are variations at a single nucleotide position in the genome where different individuals may have different nucleotides.
★ For example, in one person, a specific position in the genome might have an "A," while in another person, it might be a "G." SNPs are common and can contribute to genetic diversity and disease susceptibility.

★ SNVs are similar to SNPs but encompass any single nucleotide change in a sequence, regardless of its frequency in the population.
★ SNVs can be rare and include all single nucleotide changes whether or not they are common.
★ For example, an SNV changes adenine to thymine in the coding region of the HBB gene, which can result in sickle cell disease due to a change in the amino acid sequence from glutamic acid to valine.

★ Indels refer to small insertions or deletions of nucleotides in the genome.
★ For example, an indel could involve the addition or removal of a few nucleotides at a specific location, which can lead to changes in the reading frame of a gene or affect gene function.
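The distinction between these variant types can be sketched by comparing a reference allele with an alternate allele. The functions below are illustrative only and assume alleles are given as plain strings, in the way variant files typically list them; they do not distinguish SNPs from SNVs, since that depends on population frequency rather than on the sequence itself.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by comparing reference and alternate allele lengths."""
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-base substitution"


def shifts_reading_frame(ref: str, alt: str) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0


print(classify_variant("A", "T"))        # single-base change, as in the HBB example
print(classify_variant("A", "ATG"))      # two bases inserted
print(shifts_reading_frame("A", "ATG"))  # 2 extra bases shift the frame
```

An insertion or deletion of exactly three bases adds or removes one codon without shifting the frame, which is why the length check uses a multiple of 3.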
Annotation:
➢ The genetic variants detected are annotated based on the published
peer-reviewed literature and public genetic variant databases using
tools like ANNOVAR, SnpEff, or VEP (Variant Effect Predictor).
Interpretation of variants:
➢ Lastly, medical professionals interpret these variants by
examining different disease pathways and gene networks and
identifying the actual mutations that cause a disease.
THANK YOU