
The C-Value Paradox

total genome

Uploaded by

sabinp2023
Copyright
© All Rights Reserved

The C-value paradox

• The C-value is the total number of DNA nucleotide residues in the genome (per
haploid set of chromosomes).

• When you compare this to the complexity of the organism you find a massive
disparity.

• Clearly the amount of DNA is not proportional to that required to produce all
the proteins made by the organism.

• The E. coli genome has 4.6 million base pairs and codes for about 3,000
different proteins (assuming an average protein of ~40,000 Da and 500 bp for
each promoter).

• Using the same kind of assumptions, the human genome (3 billion base pairs,
3 × 10^9) should code for about 1 million proteins (average protein of
~30,000 Da and promoters of 1,500 bp).
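The arithmetic behind these two estimates can be sketched roughly as follows. This is only an illustration: it reads the protein sizes as molecular weights in daltons and assumes an average of about 110 Da per amino acid residue, a figure that is not given on the slide. The function name `max_genes` is made up for the example.

```python
# Rough estimate of how many genes would fit if the whole genome were genes.
# ASSUMPTION (not from the slide): ~110 Da per amino acid residue on average.
DA_PER_RESIDUE = 110
BP_PER_CODON = 3


def max_genes(genome_bp, protein_da, promoter_bp):
    """Genome size divided by the size of one 'gene' (coding region + promoter)."""
    coding_bp = protein_da / DA_PER_RESIDUE * BP_PER_CODON
    return genome_bp / (coding_bp + promoter_bp)


ecoli = max_genes(4.6e6, 40_000, 500)   # close to 3,000
human = max_genes(3e9, 30_000, 1500)    # on the order of 1 million

print(f"E. coli: ~{ecoli:,.0f} genes")
print(f"Human:   ~{human:,.0f} genes")
```

Under these assumptions the E. coli figure lands close to 3,000 and the human figure at roughly a million, which is far more than the number of proteins humans actually make. That mismatch is the paradox.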
Re-annealing experiments
• Only about 3% of the DNA in the genome actually codes for proteins. What is
the rest of it doing?

• Some clues come from re-annealing experiments.

• The time it takes for DNA to re-anneal depends on the complexity of the
sequence.
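Cot plots
[Figure: Cot curve. Vertical axis: A260, with single-stranded DNA at high
absorbance and double-stranded DNA at low absorbance. Horizontal axis:
Co × time (M·s). The midpoint of the transition is marked Cot½.]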
• You need to account for the starting concentration (Co).

• If you start with more of a sequence it anneals more quickly, because there
are more matching strands to find each other.

• By using the Cot value (Co × time) you can plot, on the same graph,
different annealing experiments with different starting concentrations
(10 – 2000 µg/mL) from the same source, and they will all lie on the same
curve.

• The rate of re-association, and hence the time taken to renature, depends
on the complexity of the DNA sequence.

• The complexity is defined as the number of bases in each unique sequence,
e.g. poly(U) + poly(A) has a complexity of 1, and the repeating sequence
(AGTGC)n has a complexity of 5.
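A minimal sketch of why plotting against Co × time collapses different starting concentrations onto one curve. It assumes ideal second-order re-association kinetics, where the fraction still single-stranded is 1 / (1 + k·Co·t), so that Cot½ = 1/k. It also assumes the rate constant is inversely proportional to complexity. The value `K_UNIT` and the function names are hypothetical, chosen only for illustration.

```python
# Ideal second-order re-association: fraction single-stranded = 1 / (1 + k*C0*t).
# K_UNIT is a HYPOTHETICAL rate constant (M^-1 s^-1) for a sequence of complexity 1.
K_UNIT = 1.0e6


def rate_constant(complexity):
    """Assumed: re-association rate falls in proportion to sequence complexity."""
    return K_UNIT / complexity


def fraction_single_stranded(c0, t, k):
    """Fraction of DNA still single-stranded after time t (s) at concentration c0 (M)."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k):
    """Cot value at which half of the DNA has re-annealed."""
    return 1.0 / k


k = rate_constant(complexity=5)          # e.g. the (AGTGC)n repeat
for c0 in (1e-5, 1e-4, 1e-3):            # three starting concentrations
    t_half = cot_half(k) / c0            # time needed differs with c0 ...
    frac = fraction_single_stranded(c0, t_half, k)
    print(f"C0={c0:g} M  t_half={t_half:g} s  fraction ss={frac:.2f}")  # ... Cot does not
```

In this model the time to half re-association changes with the starting concentration, but the Cot value at the halfway point stays the same, and it scales directly with the complexity.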
• The Cot½ for a given DNA depends on the complexity.

• Eukaryotic genomic DNA can be divided up into 4 classes: highly repetitive
(hundreds to millions of copies), moderately repetitive (tens to hundreds of
copies), slightly repetitive (1 – 10 copies) and single-copy sequences.

• The last 2 are often combined.

[Figure labels: highly repetitive, moderately repetitive, unique]
CpG island
Any stretch of DNA greater than 500 bp with a CG content of greater than 50%.
It is a region of DNA in which the frequency of the CG sequence is higher
than in other regions.
The "p" indicates that "C" and "G" are connected by a phosphodiester bond.
CpG islands are from several hundred to several thousand base pairs long.
In humans there are about 45,000 CpG islands, mostly found at the 5' ends of
genes.

CpG island properties

CpG islands are often located around the promoters of genes that are frequently expressed in a
cell.

A promoter is a specific region just upstream of a gene that acts as a binding site for
transcription factors and RNA polymerase during the initiation of transcription.

Thus, knowledge of CpG islands is important for the computational prediction of
gene promoters. It has been reported that predicting promoters that are not associated with a
CpG island may not even be possible.

According to a recent study, human chromosomes 21 and 22
contain about 1100 CpG islands and about 750 genes.
(Comprehensive analysis of CpG islands in human chromosomes 21 and 22, Proc. Natl. Acad. Sci. USA, March 19,
2002)

CpG islands are not really a repeated sequence, but a special type of DNA sequence with a particular
function.

This is a typical gene with a CpG island. The island includes the first exon.

A function for islands: molecular studies showed that the chromatin in these regions has an "open" configuration,
with no nucleosomes or histone H1.

This would make the DNA accessible to transcription factors, etc., and hence able to be transcribed.

What are CpG islands?

• CpG dinucleotides are rare in mammalian DNA

• In mammals, DNA methylation occurs almost exclusively at CpG sites

• Methylated cytosines may be converted to thymine by deamination over evolutionary time

• CpG → TpG

• CpG islands are short stretches of DNA with a higher frequency of the CG sequence

• Usually they are not methylated


• Definition from Gardiner-Garden & Frommer
• At least 200 bases long
• G+C content: > 50%
• observed CpG/expected CpG ratio: >= 0.6

• Definition from Takai & Jones


• Longer than 500 bp
• G+C content: > 55%
• observed CpG/expected CpG ratio: >= 0.65
• With this definition, these CpGi’s are more likely to be associated with the 5’
regions of genes and exclude most Alu’s

• There are about 29,000 such regions in the human genome
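The two sets of criteria above can be sketched as a simple check on a single candidate sequence. This is only an illustration, not the authors' published software. It uses the standard observed/expected formula (number of CpG × length) / (number of C × number of G). The function names are hypothetical, and real island finders scan a genome with sliding windows rather than testing one fixed stretch.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                      # CpG dinucleotides on this strand
    gc_fraction = (c + g) / n if n else 0.0
    # observed/expected = (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len, min_gc, min_ratio):
    """True if the stretch meets the given length, G+C and obs/exp thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp >= min_ratio


def gardiner_garden_frommer(seq):
    # At least 200 bases, G+C > 50%, obs/exp >= 0.6
    return is_cpg_island(seq, min_len=200, min_gc=0.50, min_ratio=0.6)


def takai_jones(seq):
    # Longer than 500 bp, G+C > 55%, obs/exp >= 0.65
    return is_cpg_island(seq, min_len=501, min_gc=0.55, min_ratio=0.65)


candidate = "CG" * 150                          # artificial 300 bp CpG-rich stretch
print(cpg_stats(candidate))
print(gardiner_garden_frommer(candidate), takai_jones(candidate))
```

The artificial 300 bp stretch passes the first definition but fails the second on length alone, which shows how the stricter criteria reduce the number of regions called.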


CpG islands and Genes

• CpG islands located in the promoter regions of genes can play important roles in gene silencing
• Housekeeping genes
• Almost all housekeeping genes are associated with at least one CpG island
• The CpG islands start 5’ of the transcription start site and cover one or more exons and introns
• Tissue-specific genes
• About 40% of tissue-specific genes are associated with islands
• The position of these islands is not as strongly biased toward the transcription start site as in the housekeeping genes
• Not all CpG islands are associated with genes
• Ioshikhes & Zhang determined features that discriminate promoter-associated from non-associated CpG
islands
• There are methylation-prone and methylation-resistant CpG islands
• Feltus et al. found patterns that discriminate methylation-prone from methylation-resistant CpG islands
[Figure: positions of CpG islands (CpGi) relative to a gene: promoter CpG
islands at the 5’ end, CpG islands in the gene body, and CpG islands at the
3’ end.]


Highly repetitive DNA
• Short sequences arranged in tandem repeats, sometimes thousands of times.
• Short Tandem Repeats (STRs) or satellite DNA
• Example: in the 16 bp sequence "gatagatagatagata" the unit gata is
repeated four times
• Microsatellites: repeat units of 1 – 13 nucleotides
• Minisatellites: repeat units of 14 – 500 nucleotides
• Often found clustered around the centromere or the telomere.
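As a small illustration of the gata example and the size classes above, the sketch below counts how many times a unit repeats back to back and labels the unit by its length. The function names are hypothetical, and the size cut-offs are simply the ones quoted on this slide.

```python
def tandem_repeat_count(seq, unit):
    """Count consecutive copies of `unit` starting at the beginning of `seq`."""
    if not unit:
        return 0
    count = 0
    # Step through the sequence one unit at a time while the unit keeps matching.
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


def satellite_class(unit_length):
    """Classify a repeat unit by length, using the ranges given on the slide."""
    if 1 <= unit_length <= 13:
        return "microsatellite"
    if 14 <= unit_length <= 500:
        return "minisatellite"
    return "unclassified"


seq = "gatagatagatagata"
copies = tandem_repeat_count(seq, "gata")
print(copies, satellite_class(len("gata")))
```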
Moderately repetitive DNA
• Segments of 100 to several thousand base pairs, repeated

• Repeated groups of genes whose products are needed by cells in large
quantities, e.g. histones, ribosomal and transfer RNA (although these are
sometimes classified in the highly repetitive group)

• Retrotransposons: DNA that has been reverse-transcribed from RNA
Retrotransposons
• Around 40% of the human genome

• LINEs (long interspersed nuclear elements): 6 – 8 kb segments that encode
the proteins that enable the transposition (e.g. human L1 was identified
from its retrotransposition into the factor VIII gene, causing hemophilia)

• SINEs (short interspersed nuclear elements): 100 – 400 bp sections
containing remnants of tRNA transcription machinery.

• LTR (long terminal repeat) retrotransposons


Gene Families
• Most genes in the genome are only represented once.

• Some have a few copies in the genome.

• One example is the globin family. This set of genes contains a number of
closely related sequences which vary by only a few changes in the code.

• Sometimes found clustered together on the one chromosome (but not always!)
Single copy genes
• Most of the genes of the organism are single-copy genes.

• But they only make up a small proportion of the total genome.

• They are the most complex group and hence take the longest to re-anneal.
