
Genomic DNA Libraries For Shotgun Sequencing Projects

The document discusses genomic DNA libraries and shotgun sequencing projects. It provides information on constructing libraries with different insert sizes for the random sequencing and closure phases. The sequencing process involves fragmenting DNA, cloning fragments, randomly sequencing clones, assembling sequences, and closing gaps. Libraries should have low vector contamination, uniform insert sizes, and represent the entire genome. Different library strategies are discussed, including using fosmids for large inserts. Coverage levels and redundancy analyses from completed projects are also presented.

Uploaded by

Govind Kumar Rai
Copyright
© Attribution Non-Commercial (BY-NC)

Genomic DNA Libraries for Shotgun Sequencing Projects

William C. Nierman

TIGR
THE INSTITUTE FOR GENOMIC RESEARCH
Whole Genome Shotgun Sequencing

Library construction
a. isolate DNA
b. fragment DNA
c. clone DNA

Random Sequencing Phase
a. sequence DNA (15,000 sequences/Mb)

Closure Phase
a. assemble sequences
b. close gaps
c. edit
d. annotation

COMPLETE GENOME SEQUENCE
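As a rough check on the figure of 15,000 sequences per Mb, fold coverage is the number of reads times the read length, divided by the genome length. The sketch below assumes a mean usable read length of 500 bp, which the slide does not state; the function name is illustrative only.

```python
def fold_coverage(n_reads: int, mean_read_len_bp: float, genome_len_bp: float) -> float:
    """Average sequence coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_len_bp / genome_len_bp

# 15,000 reads per Mb is taken from the slide.
READS_PER_MB = 15_000
# ASSUMPTION: 500 bp mean usable read length (not given in the source).
ASSUMED_READ_LEN_BP = 500

coverage = fold_coverage(READS_PER_MB, ASSUMED_READ_LEN_BP, 1_000_000)
print(f"{coverage:.1f}X")  # 7.5X
```

Under that assumed read length, the result falls inside the 6-8X range quoted elsewhere in the deck.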

Genomic Sequencing Overview
Genomic DNA (Marker1, Marker2)
Large Insert Library (20-500 kb)
Physical Map
Shotgun Library (2-3 kb)
Sequencing (6-8X)
Assembly
Gap Closure
Analysis
Genomic Sequencing Overview
Genomic DNA (Marker1, Marker2)
Shotgun Library (2, 10, 50 kb)
Sequencing (6-8X)
Assembly
Gap Closure
Analysis
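Why 6-8X? A common back-of-the-envelope model (Lander and Waterman's idealized random-shotgun model, which is not cited in the slides) predicts that at coverage c a fraction of about e^-c of the genome is left unsequenced, and that N reads fall into roughly N * e^-c contigs when the minimum detectable overlap is ignored. The sketch below applies that model; the 2 Mb genome and 500 bp read length are assumptions chosen only for illustration.

```python
import math

def lander_waterman(genome_len_bp: float, read_len_bp: float, coverage: float):
    """Idealized random-shotgun estimates (minimum overlap ignored)."""
    n_reads = coverage * genome_len_bp / read_len_bp
    frac_uncovered = math.exp(-coverage)            # probability a base is unsequenced
    expected_contigs = n_reads * math.exp(-coverage)
    return n_reads, frac_uncovered, expected_contigs

# ASSUMPTIONS: 2 Mb genome and 500 bp reads, picked for illustration only.
for c in (2, 4, 6, 8):
    n, gap, contigs = lander_waterman(2_000_000, 500, c)
    print(f"{c}X: {n:.0f} reads, {gap:.2%} uncovered, ~{contigs:.0f} contigs")
```

Real libraries are not perfectly random, so these figures should be read as a lower bound on the gaps left for the closure phase.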
Shotgun Sequencing Phase
Library Construction
Clone Picking
Template Preparation
Sequencing Reactions
Electrophoresis and Base Calling
Sequence Files
Genome Assembly
Sample Tracking (side label on the diagram)

Consensus quality values

Graphical representation of phred quality values
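For readers unfamiliar with the scale, a phred quality value Q encodes the estimated probability P that a base call is wrong as Q = -10 * log10(P). The sketch below only converts between the two; how the assembler combines per-read values into a consensus quality is not described in the slides and is not modeled here. The function names are illustrative.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred quality value for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```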

3 Tier Whole Genome Shotgun Library Strategy

1. Moderate copy number plasmids containing ~2-kb inserts
2. Moderate copy number plasmids containing ~10-kb inserts
3. Fosmid or other clones containing 40-200-kb inserts

Repetitive sequences

TIGR Assembly Viewer. Green arrows represent F and R sequences from the same clone. Red arrows represent sequences with a sequence mate in a different contig. The 5’ end of the assembly points to a telomeric repeat and is linked to a clone containing telomeric sequence.

Repetitive Regions

Large-insert spanning clone, DMGRG22

Output from the TIGR software tool repeat Display showing a section of an assembly. The
black boxes represent a 700 bp repeat (7V, 24 copies/genome) and a 3100 bp repeat (9D, 9
copies/genome). Both repeats are spanned by clone DMGRG22. To confirm the sequence
of these repeats, this clone was transposed.

Library Requirements
1. Free vector should be at a low or undetectable level.
2. None of the clones should contain chimeras derived by insertion of two or more random fragments from separate parts of the genome.
3. The inserts should be of relatively uniform size.
4. Libraries of different insert sizes should be used for linking.
5. Libraries should be representative of the genome.

BstXI adaptor cloning system

DNA insert ligated to BstXI adaptor:
    CTTTCCAGCACA
    GAAAGGTC

Vector end, complementary to BstXI adaptor:
        CTGGAAAG
    GTGTGACCTTTC

Ligate: vector plus insert
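The adaptor and vector sequences on this slide can be checked for the property that presumably makes the system useful: the insert and vector overhangs pair with each other but not with themselves. The sketch below assumes the upper strands are written 5'->3' and the lower strands 3'->5' (the slide does not label strand polarity), and uses the standard BstXI recognition pattern CCA(N)6TGG.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base-by-base complement, keeping the written left-to-right order."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]

# Sequences as drawn on the slide.
# ASSUMPTION: upper strands read 5'->3', lower strands read 3'->5'.
ADAPTOR_TOP, ADAPTOR_BOTTOM = "CTTTCCAGCACA", "GAAAGGTC"
VECTOR_TOP, VECTOR_BOTTOM = "CTGGAAAG", "GTGTGACCTTTC"

# Adaptor: the short strand pairs with the left part of the long strand.
paired = len(ADAPTOR_BOTTOM)
assert complement(ADAPTOR_TOP[:paired]) == ADAPTOR_BOTTOM
adaptor_overhang = ADAPTOR_TOP[paired:]        # upper strand, 5'->3'

# Vector: the short strand pairs with the right part of the long strand.
unpaired = len(VECTOR_BOTTOM) - len(VECTOR_TOP)
assert complement(VECTOR_TOP) == VECTOR_BOTTOM[unpaired:]
vector_overhang = VECTOR_BOTTOM[:unpaired]     # lower strand, 3'->5'

# Insert and vector overhangs pair with each other...
overhangs_pair = complement(adaptor_overhang) == vector_overhang
# ...but an adaptor overhang cannot pair with another copy of itself.
self_complementary = reverse_complement(adaptor_overhang) == adaptor_overhang

# Upper strand across the ligated junction should contain a BstXI site.
site = re.search(r"CCA[ACGT]{6}TGG", ADAPTOR_TOP + VECTOR_TOP)

print(adaptor_overhang, vector_overhang, overhangs_pair, self_complementary)
print(site.group(0) if site else None)
```

Under these assumptions the overhangs are CACA and GTGT, so insert-to-insert and vector-to-vector joining is disfavored, which would be consistent with the requirements on chimeras and free vector listed in the deck.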

Library Requirements
1. Free vector should be at a low or undetectable level.
2. None of the clones should contain chimeras derived by insertion of two or more random fragments from separate parts of the genome.
3. The inserts should be of relatively uniform size.
4. Libraries of different insert sizes should be used for linking.
5. Libraries should be representative of the genome.

Genome representation in shotgun libraries is limited by the need for faithful propagation in Escherichia coli.

What is “Unclonable” DNA?
Difficult cloning targets include several different types of sequences, such as:
– Toxic coding sequences
– Promoters
– A/T Rich DNA
– Modified bases
– Repetitive regions
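Of the categories above, A/T-rich DNA is the easiest to screen for computationally once some sequence is available. The sketch below flags windows whose A+T fraction exceeds a cutoff; the window size, step, and 75% threshold are arbitrary assumptions for illustration, not values from the slides.

```python
def at_rich_windows(seq: str, window: int = 100, step: int = 50,
                    threshold: float = 0.75):
    """Return (start, end, at_fraction) for windows with A+T >= threshold.

    ASSUMPTION: window, step and threshold are illustrative defaults.
    """
    seq = seq.upper()
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        if not chunk:
            break
        at_fraction = (chunk.count("A") + chunk.count("T")) / len(chunk)
        if at_fraction >= threshold:
            hits.append((start, start + len(chunk), at_fraction))
    return hits

# Toy sequence: GC-only flanks around a 100 bp AT-only block.
toy = "GC" * 50 + "AT" * 50 + "GC" * 50
print(at_rich_windows(toy))
```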

Library Coverage and Randomness

• Tolerance of cloned DNA by E. coli host

• Vector copy number

• Insert size

Vector Design Issues
• Vector driven transcription and translation into the
insert induce expression of the cloned sequence.
• Fortuitous transcription out of the insert can interfere
with vector maintenance.
• False positives and false negatives arise from
inappropriate transcription.
• High copy number can cause plasmid instability.

P lac

Cloned fragment

Sequencing Project Vector Features
1. The sequencing primer sites immediately flank the cloning site to avoid excessive re-sequencing of vector DNA.
2. PCR primer sites are located immediately outside of the sequencing primer sites to allow PCR amplification for template preparation.
3. The entire cloning region, including the primer sites, is isolated from RNA transcription.

Design Features of a BstXI Adaptor Cloning System

Figure: map of the pHOS vector plus insert. Labeled features: forward and reverse sequencing primers; two BstXI sites; forward and reverse PCR primers; rrnBT1 and rrnBT2; ter1 and ter2; Ori (copy number); Pr; AmpR.

Construction of Linking Library in pHOS2Kan

Figure: workflow diagram. Labels recoverable from the figure:
• Genomic DNA (~50 kb) with BstXI adaptors
• Adaptor sequences: CTGGAAAG / ACACGACCTTTC and CTTTCCAGCACA / GAAAGGTC
• pHOS2 (Amp)
• Ligation
• Restriction Digest
• Phosphatase
• Ligate Kan Cassette
• Double Amp/Kan Selection
• pHOS2 (Kan, Amp)

Fosmid Library Construction

Copy Number Induced (+) vs. Uninduced (-) Fosmid DNA preps

Library Mix
Wolbachia (endosymbiont of B. malayi)
Sequenced to 20X
True genome size: 1,080,471 bases
At 7.6X coverage:
• 0% small, 100% large gave 1 scaffold of 1,076,660 bp and 10 contigs
• 5% small, 95% large gave 1 scaffold of 1,077,210 bp and 12 contigs
• 60% small, 40% large gave 14 scaffolds (largest = 160 kb) and 79 contigs
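One way to read these results is in terms of clone (physical) coverage: at a fixed sequence coverage, reads from larger inserts span more of the genome between their two ends, which helps link contigs into scaffolds. The sketch below is a hypothetical calculation, not the analysis behind the slide. It assumes the percentages are fractions of reads, that every clone is sequenced from both ends, a 600 bp read length, and insert sizes of 2 kb (small) and 10 kb (large); none of these values are given on the slide.

```python
def clone_coverage(seq_coverage: float, read_len_bp: float, mix: dict) -> float:
    """Clone (physical) coverage for a mix of paired-end libraries.

    mix maps insert size in bp -> fraction of all reads from that library.
    Each clone contributes two reads, so clones = reads / 2.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("library fractions must sum to 1")
    return sum(frac * seq_coverage * insert_bp / (2 * read_len_bp)
               for insert_bp, frac in mix.items())

# ASSUMPTIONS: 600 bp reads, 2 kb small inserts, 10 kb large inserts.
SMALL_BP, LARGE_BP, READ_LEN_BP = 2_000, 10_000, 600

for small, large in ((0.0, 1.0), (0.05, 0.95), (0.60, 0.40)):
    cov = clone_coverage(7.6, READ_LEN_BP, {SMALL_BP: small, LARGE_BP: large})
    print(f"{small:.0%} small / {large:.0%} large: clone coverage ~{cov:.0f}X")
```

Under these assumptions the 60% small mix has roughly half the clone coverage of the all-large mix, which points in the same direction as the scaffold counts above.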

Redundancy Analysis from Completed Projects

Genome  Insert size  Est. coverage  Actual coverage  Scaffolds  Contigs
GBS     Small        9.0X           8.9X             75         95
GBS     Large        2.1X           4.7X             94         279
GMX     Small        2.2X           3.4X             996        1321
GMX     Large        3.2X           4.1X             38         1238
GFS     Small        10.8X          10.4X            123        210
GFS     Large        2.5X           5.1X             104        422
TPG     Small        6.5X           6.5X             91         321
TPG     Large        3.2X           4.2X             128        1620

GBS = Streptococcus agalactiae
GMX = Myxococcus xanthus
GFS = Fibrobacter succinogenes
TPG = Theileria parva
Comparison of Library Strategies

GBS effort (pie chart): PHYS GAPS 23%; 2 DIFFICULT REPEATS 27%; SEQ GAPS 12%; 5 RNA'S 8%; MATT'S HELP 3%; COVERAGE 3%; FAILED MATES 8%. The REPEATS and EDITING slices account for the remaining 15% and 1%; which value belongs to which label is not recoverable from the extracted text.

Genome      BSP            GBS            GSA        GSE
Organism    S. pneumoniae  S. agalactiae  S. aureus  S. epidermidis
Size (Mb)   2.1            2.1            2.8        2.7
Groups      160            58             134        12
Seq gaps    290            46             198        24
Start date  Nov 95         Dec 00         Mar 99     Feb 01
In closure  49 months      10 months      26 months  7 months

Myxococcus xanthus Sequencing Statistics
• Total shotgun sequences: 130,436
– TIGR library insert sizes: 2-3 kb and 10-12 kb
– Sequence coverage of 9X
• Assembled into a single scaffold of 103 contigs
• Two rounds of autoprimer sequencing reduced the contig number to 36
• 9,131,959 bases, 3,500 Ns in gaps
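The three numbers on this slide imply an average read length, which is a useful sanity check. The sketch below assumes the 9X figure was computed over all 130,436 shotgun reads and the final 9,131,959-base sequence; the slide does not say how coverage was calculated.

```python
# Values from the slide.
N_READS = 130_436
GENOME_BP = 9_131_959
COVERAGE = 9  # ASSUMPTION: computed over all shotgun reads and the final length

# coverage = reads * read_length / genome  =>  read_length = coverage * genome / reads
implied_read_len_bp = COVERAGE * GENOME_BP / N_READS
print(f"Implied mean read length: {implied_read_len_bp:.0f} bp")  # 630 bp
```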

Aspergillus fumigatus karyotype

Sizes labeled on the figure (kb): 1,789; 3,779; 2,021; 3,992; 4,018; 4,834; 4,891; 3,933*
Optical Analysis

• Molecule maps generated from images of single DNA molecules digested with NheI
• Resolution (average fragment size): 8.28 kb
• Total coverage: 8,987 Mb, or about 300X
• Total of 8 chromosomes
• Total size: 29.189 Mb
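The coverage and size figures on this slide can be cross-checked against each other. The short calculation below uses only the slide's numbers and assumes "total coverage" means the summed length of all mapped molecules.

```python
# Values from the slide.
TOTAL_MAPPED_MB = 8_987   # ASSUMPTION: summed length of all mapped molecules
GENOME_MB = 29.189
N_CHROMOSOMES = 8

fold = TOTAL_MAPPED_MB / GENOME_MB
mean_chromosome_mb = GENOME_MB / N_CHROMOSOMES

print(f"Optical map coverage: {fold:.0f}x")                 # 308x
print(f"Mean chromosome size: {mean_chromosome_mb:.2f} Mb")  # 3.65 Mb
```

The result of about 308x agrees with the rounded figure of 300x quoted on the slide.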

A. fumigatus chr5-7 contig placement

Aspergillus fumigatus Chromosomes

Chromosome  Positions marked on figure (Mb)  Size (Mb)
1           2.2, 2.7                         4.9
2           1.8, 3.0                         4.8
3           1.3, 2.8                         4.0
4           0.4, 0.7, 2.5                    3.9
5           1.2, 2.6                         3.9
6           1.3, 2.5                         3.6
7           0.7, 1.3                         2.0
8           0.8, 1.0                         1.8

Additional labels near chromosome 4: rRNA; 0.3
Mitochondrion: 32 kb
Legend: presumed centromeric area; telomere
