Genomic DNA Libraries For Shotgun Sequencing Projects
William C. Nierman
TIGR
THE INSTITUTE FOR GENOMIC RESEARCH
Whole Genome Shotgun Sequencing
Library Construction: a. isolate DNA; b. fragment DNA; c. clone DNA
Random Sequencing Phase: a. sequence DNA (15,000 sequences/Mb)
Closure Phase: a. assemble sequences; b. close gaps; c. edit; d. annotation
Result: Complete Genome Sequence
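The 15,000 sequences/Mb figure translates directly into fold coverage once a read length is fixed. The short Python sketch below assumes an average usable read length of 500 bp (a hypothetical value, not stated on the slide). Under that assumption, 15,000 reads per Mb corresponds to about 7.5X coverage.

```python
# Fold coverage implied by a sequencing rate expressed as reads per Mb of genome.
READS_PER_MB = 15_000   # from the slide
READ_LENGTH_BP = 500    # ASSUMPTION: hypothetical average usable read length


def fold_coverage(reads_per_mb: int, read_length_bp: int) -> float:
    """Return coverage (X): total bases sequenced per base of genome."""
    return reads_per_mb * read_length_bp / 1_000_000


print(f"{fold_coverage(READS_PER_MB, READ_LENGTH_BP):.1f}X")
```

Changing `READ_LENGTH_BP` shows how strongly the coverage estimate depends on the assumed read length.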
Genomic Sequencing Overview
[Figure: genomic DNA with Marker1 and Marker2]
Genomic DNA → Sequencing (6-8X) → Assembly → Gap Closure → Analysis
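One way to relate the 6-8X sequencing target to the expected number of gaps is the idealized Lander-Waterman model. It assumes that read start points are uniformly random, and this version ignores the minimum overlap needed to join two reads. It is a textbook estimate, not TIGR's planning tool. The 2 Mb genome and 500 bp reads below are hypothetical inputs.

```python
import math


def lander_waterman(genome_bp: int, read_bp: int, coverage: float):
    """Idealized Lander-Waterman estimates with zero minimum overlap.

    Assumes reads start uniformly at random along the genome.
    Returns (number of reads, expected uncovered fraction, expected contigs).
    """
    n_reads = coverage * genome_bp / read_bp
    uncovered = math.exp(-coverage)          # expected fraction of bases with no read
    contigs = n_reads * math.exp(-coverage)  # expected number of contigs ("islands")
    return n_reads, uncovered, contigs


# Hypothetical 2 Mb genome sequenced with 500 bp reads.
for c in (6, 7, 8):
    n, frac, contigs = lander_waterman(2_000_000, 500, c)
    print(f"{c}X: {n:,.0f} reads, {frac:.4%} uncovered, ~{contigs:.0f} contigs")
```

In practice, cloning bias and repeats leave more gaps than this model predicts. That gap is why library quality matters as much as raw coverage.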
Shotgun Sequencing Phase
Library Construction
Clone Picking
Template Preparation
Sample Tracking
Sequencing Reactions
Electrophoresis and Base Calling
Sequence Files
Genome Assembly
Consensus Quality Values
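Consensus quality values are commonly expressed on the Phred scale, where Q = -10 log10(P_error). The Python below is a minimal sketch that assumes Phred-scaled read qualities. It calls a consensus base for one alignment column by a simple quality-weighted vote. This is an illustration only, not the consensus algorithm used by TIGR's assembler.

```python
from collections import defaultdict


def phred_error_probability(q: float) -> float:
    """A Phred quality of Q corresponds to an error probability of 10**(-Q/10)."""
    return 10 ** (-q / 10)


def consensus_call(column):
    """Quality-weighted vote for one alignment column.

    `column` is a list of (base, phred_quality) pairs from the reads covering
    this position. Returns the base with the largest summed quality together
    with that sum, used here as a crude consensus quality.
    """
    scores = defaultdict(int)
    for base, q in column:
        scores[base] += q
    best = max(scores, key=scores.get)
    return best, scores[best]


# Two moderate-quality A calls outvote one higher-quality C call (50 vs 40).
print(consensus_call([("A", 30), ("A", 20), ("C", 40)]))
```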
Three-Tier Whole Genome Shotgun Library Strategy
Repetitive Sequences
TIGR Assembly Viewer. Green arrows represent forward (F) and reverse (R) sequences from the same clone. Red arrows represent sequences whose mate lies in a different contig. The 5’ end of the assembly points to a telomeric repeat and is linked to a clone containing telomeric sequence.
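Contigs are linked into scaffolds by mate pairs whose F and R reads assemble into different contigs (the red arrows in the viewer). The Python below is a sketch that simply counts such links for each contig pair. Read names and contig IDs are hypothetical. A real scaffolder would also check mate orientation and the expected insert distance.

```python
from collections import Counter


def contig_links(placements, mates):
    """Count mate-pair links between contigs.

    placements: read name -> contig ID where the read assembled
    mates: list of (forward_read, reverse_read) pairs from the same clone
    Pairs with both reads in one contig are consistent and add no link.
    Pairs split across two contigs are counted as links between them.
    """
    links = Counter()
    for fwd, rev in mates:
        a, b = placements.get(fwd), placements.get(rev)
        if a is None or b is None or a == b:
            continue  # unplaced mate, or both reads in the same contig
        links[tuple(sorted((a, b)))] += 1
    return links


# Hypothetical placements: clone c1 is internal to ctg1; c2 and c3 bridge ctg1/ctg2.
placements = {"c1F": "ctg1", "c1R": "ctg1", "c2F": "ctg1", "c2R": "ctg2",
              "c3F": "ctg2", "c3R": "ctg1"}
mates = [("c1F", "c1R"), ("c2F", "c2R"), ("c3F", "c3R")]
print(contig_links(placements, mates))
```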
Repetitive Regions
[Figure label: large-insert spanning clone DMGRG22]
Output from the TIGR software tool repeat Display showing a section of an assembly. The black boxes represent a 700 bp repeat (7V, 24 copies/genome) and a 3,100 bp repeat (9D, 9 copies/genome). Both repeats are spanned by clone DMGRG22. To confirm the sequence of these repeats, this clone was transposon-mutagenized for sequencing.
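A clone can confirm a repeat only if its insert covers the entire repeat and reaches unique sequence on both sides. The check below is a sketch under assumed coordinates. The 3,100 bp repeat size comes from the slide. The repeat's position, the clone names and coordinates, and the 500 bp minimum flank are all hypothetical.

```python
def spans_repeat(clone_start, clone_end, repeat_start, repeat_end, min_flank=500):
    """True if the clone insert covers the whole repeat plus at least
    `min_flank` bp of flanking sequence on each side.

    Coordinates are 0-based and half-open. `min_flank` is an assumed
    minimum length of unique sequence needed on each side of the repeat.
    """
    return (clone_start <= repeat_start - min_flank
            and clone_end >= repeat_end + min_flank)


# Hypothetical 3,100 bp repeat at 10,000-13,100 and three hypothetical clones.
clones = {"cloneA": (5_000, 45_000), "cloneB": (12_000, 52_000), "cloneC": (9_800, 45_000)}
print([name for name, (s, e) in clones.items() if spans_repeat(s, e, 10_000, 13_100)])
```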
Library Requirements
1. Free vector should be at low or undetectable level.
BstXI adaptor cloning system
[Figure: a DNA insert is ligated to BstXI adaptors (strands CTTTCCAGCACA and GAAAGGTC), then ligated into the vector to give vector plus insert]
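Aligning the two adaptor strands (CTTTCCAGCACA over GAAAGGTC) leaves a 4-nt 3' overhang, CACA. CACA is not its own reverse complement, so adapted inserts cannot ligate to one another. The sketch below encodes that rule. It assumes, as in typical BstXI adaptor systems, that the BstXI-cut vector ends carry the complementary TGTG overhang. That overhang also cannot anneal to itself, which helps keep free (self-ligated) vector low.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two 3' overhangs (each written 5'->3') can base-pair when one is the
    reverse complement of the other."""
    return overhang_a == reverse_complement(overhang_b)


ADAPTOR_OVERHANG = "CACA"  # left by the CTTTCCAGCACA / GAAAGGTC adaptor
VECTOR_OVERHANG = "TGTG"   # ASSUMPTION: complementary overhang on the cut vector

print(overhangs_anneal(ADAPTOR_OVERHANG, ADAPTOR_OVERHANG))  # insert-insert joining
print(overhangs_anneal(VECTOR_OVERHANG, VECTOR_OVERHANG))    # vector self-ligation
print(overhangs_anneal(ADAPTOR_OVERHANG, VECTOR_OVERHANG))   # insert-vector joining
```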
Genome representation in
shotgun libraries is limited by
the need for faithful
propagation in Escherichia coli
What is “Unclonable” DNA?
Difficult cloning targets include several
different types of sequences, such as:
– Toxic coding sequences
– Promoters
– A/T Rich DNA
– Modified bases
– Repetitive regions
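A/T-rich DNA appears in the list of difficult targets above, and candidate regions can be flagged by scanning G+C content in sliding windows. The window size, step, and 30% threshold below are illustrative assumptions, not values from the slide.

```python
def at_rich_windows(seq: str, window: int = 100, step: int = 50, max_gc: float = 0.30):
    """Yield (start, gc_fraction) for windows whose G+C fraction is <= max_gc.

    Window, step and threshold are illustrative defaults, not published cutoffs.
    A sequence shorter than `window` is treated as a single window.
    """
    seq = seq.upper()
    if not seq:
        return
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if gc <= max_gc:
            yield start, gc


# Hypothetical 300 bp sequence: a 100 bp A/T block between two G/C blocks.
example = "GC" * 50 + "AT" * 50 + "GC" * 50
print(list(at_rich_windows(example)))
```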
Library Coverage and Randomness
• Insert size
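After assembly, insert size can be checked using mates that land in the same contig. For an inward-facing pair, the distance from the start of the forward read to the end of the reverse read approximates the clone length. The coordinates below are hypothetical placements chosen to resemble a 2-3 kb small-insert library.

```python
from statistics import mean, stdev


def insert_sizes(pairs):
    """Estimate clone lengths from mate placements.

    pairs: (forward_start, reverse_end) contig coordinates for mates that
    assembled into the same contig in the expected inward-facing orientation.
    """
    return [rev_end - fwd_start for fwd_start, rev_end in pairs]


# Hypothetical placements from a 2-3 kb library.
sizes = insert_sizes([(100, 2600), (5000, 7300), (9000, 11800)])
print(sizes, round(mean(sizes)), round(stdev(sizes)))
```

Pairs that fall far outside the observed distribution are candidates for chimeric clones or misassemblies.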
Vector Design Issues
• Vector-driven transcription and translation into the
insert can induce expression of the cloned sequence.
• Fortuitous transcription out of the insert can interfere
with vector maintenance.
• False positives and false negatives arise from
inappropriate transcription.
• High copy number can cause plasmid instability.
[Figure: lac promoter (P lac) adjacent to the cloned fragment]
Sequencing Project Vector Features
1. The sequencing primer sites immediately flank the cloning
site to avoid excessive re-sequencing of vector DNA.
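Placing the primer sites right next to the cloning site keeps the vector-derived prefix of each read short. That prefix still has to be trimmed before assembly. The sketch below trims at an exact match to a junction sequence within the first bases of the read. The junction string, the read, and the 100-base search window are hypothetical. Production trimmers tolerate sequencing errors rather than requiring an exact match.

```python
def trim_vector(read: str, junction: str, search_window: int = 100) -> str:
    """Return the insert portion of a read.

    Removes everything up to and including `junction` if it lies entirely
    within the first `search_window` bases. Otherwise the read is returned
    unchanged.
    """
    idx = read.find(junction, 0, search_window)
    if idx == -1:
        return read
    return read[idx + len(junction):]


# Hypothetical read: 40 bases of vector, a GGATCC junction, then insert sequence.
read = "ACGT" * 10 + "GGATCC" + "TTAGGCAT"
print(trim_vector(read, "GGATCC"))
```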
Design Features of a BstXI Adaptor Cloning System
[Figure: vector map with the two BstXI sites flanked by forward and reverse sequencing primer sites and by forward and reverse PCR primer sites; also shown are the rrnBT1 and rrnBT2 terminators, ter1, ter2, and AmpR]
Construction of Linking Library in pHOS2Kan
[Figure: ~50 kb genomic DNA with BstXI adaptors (CTGGAAAG/ACACGACCTTTC and CTTTCCAGCACA/GAAAGGTC) is ligated into pHOS2 (Amp); the clones are restriction digested, phosphatase treated, and ligated to a Kan cassette; recombinants are recovered by double Amp/Kan selection]
Fosmid Library Construction
Copy Number Induced (+) vs. Uninduced (-) Fosmid DNA preps
Library Mix
Wolbachia (endosymbiont of B. malayi), sequenced to 20X
True genome size: 1,080,471 bases
At 7.6X coverage:
• 0% small-insert / 100% large-insert: 1 scaffold of 1,076,660 bp, 10 contigs
• 5% small-insert / 95% large-insert: 1 scaffold of 1,077,210 bp, 12 contigs
• 60% small-insert / 40% large-insert: 14 scaffolds (largest = 160 kb), 79 contigs
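For the two single-scaffold results, the scaffold length can be compared directly with the true genome size. The arithmetic below uses only numbers from the slide.

```python
TRUE_GENOME_BP = 1_080_471  # Wolbachia genome size given on the slide


def fraction_in_scaffold(scaffold_bp: int, genome_bp: int = TRUE_GENOME_BP) -> float:
    """Fraction of the true genome length represented by a scaffold."""
    return scaffold_bp / genome_bp


for mix, scaffold_bp in [("0% small / 100% large", 1_076_660),
                         ("5% small / 95% large", 1_077_210)]:
    short_by = TRUE_GENOME_BP - scaffold_bp
    print(f"{mix}: {fraction_in_scaffold(scaffold_bp):.2%} of genome, {short_by:,} bp short")
```

Both mixes recover about 99.7% of the genome length in one scaffold. The 60% small-insert mix fragments into 14 scaffolds at the same coverage.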
Redundancy Analysis from Completed Projects
[Figure: pie chart of closure effort by category: sequence gaps, 5 RNAs, Matt's help, coverage, repeats, editing, failed mates; slice values 12%, 8%, 3%, 3%, 15%, 8%, 1%]

Genome        BSP             GBS             GSA          GSE
Organism      S. pneumoniae   S. agalactiae   S. aureus    S. epidermidis
Size (Mb)     2.1             2.1             2.8          2.7
Groups        160             58              134          12
Seq gaps      290             46              198          24
Start date    Nov 95          Dec 00          Mar 99       Feb 01
In closure    49 months       10 months       26 months    7 months
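Normalizing sequence gaps by genome size puts the four projects on a common scale. The sketch below uses only values from the project table.

```python
def gaps_per_mb(seq_gaps: int, size_mb: float) -> float:
    """Sequence gaps normalized by genome size."""
    return seq_gaps / size_mb


# (size in Mb, sequence gaps, months in closure), transcribed from the table.
projects = {
    "S. pneumoniae": (2.1, 290, 49),
    "S. agalactiae": (2.1, 46, 10),
    "S. aureus": (2.8, 198, 26),
    "S. epidermidis": (2.7, 24, 7),
}

for name, (size_mb, seq_gaps, months) in projects.items():
    print(f"{name:<15} {gaps_per_mb(seq_gaps, size_mb):6.1f} gaps/Mb, {months} months in closure")
```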
Myxococcus xanthus Sequencing Statistics
• Total shotgun sequences: 130,436
– TIGR library insert sizes: 2-3 kb and 10-12 kb
– Sequence coverage of 9X
• Assembled into a single scaffold of 103 contigs
• Two rounds of autoprimer sequencing reduced the contig number to 36
• 9,131,959 bases, 3,500 Ns in gaps
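Two quantities follow from these statistics by arithmetic. Coverage equals reads times mean read length divided by genome length. So 9X over 130,436 reads implies a mean of roughly 630 bases per read, counted in whatever bases the coverage figure used. If the scaffold is laid out linearly (an assumption), its 36 contigs leave 35 gaps. The 3,500 Ns then divide into exactly 100 Ns per gap, which is consistent with fixed-length gap padding.

```python
TOTAL_READS = 130_436
COVERAGE_X = 9
ASSEMBLED_BP = 9_131_959
N_IN_GAPS = 3_500
CONTIGS_IN_SCAFFOLD = 36

# coverage = reads * mean_read_length / genome_length, solved for read length.
mean_read_bp = COVERAGE_X * ASSEMBLED_BP / TOTAL_READS

# ASSUMPTION: a linear scaffold of 36 contigs has 35 inter-contig gaps.
ns_per_gap = N_IN_GAPS / (CONTIGS_IN_SCAFFOLD - 1)

print(f"implied mean read length ~{mean_read_bp:.0f} bp; {ns_per_gap:.0f} Ns per gap")
```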
Aspergillus fumigatus karyotype
[Figure: chromosome sizes 1,789 kb; 3,779 kb; 2,021 kb; 3,992 kb; 4,018 kb; 4,834 kb; 4,891 kb; 3,933* kb]
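Summing the chromosome sizes in the karyotype gives an independent estimate of genome size that can be compared with an assembly. The sizes below are read from the figure. The slide does not explain the asterisk on 3,933 kb.

```python
# Chromosome sizes (kb) from the A. fumigatus karyotype figure.
chromosome_kb = [1_789, 3_779, 2_021, 3_992, 4_018, 4_834, 4_891, 3_933]

total_kb = sum(chromosome_kb)
print(f"{len(chromosome_kb)} chromosomes, {total_kb:,} kb (~{total_kb / 1000:.1f} Mb)")
```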
Optical Analysis
A. fumigatus chr5-7 contig placement
Aspergillus fumigatus Chromosomes
[Figure labels: presumed centromeric area; telomere]