2 Genome Organization

Genome Organization and Evolution
Genes
●
A gene is the basic physical and functional unit of
heredity.
●
Genes, which are made up of DNA, act as
instructions to make proteins
– DNA which codes for functional RNA?
– Control regions?
Gene organization
●
A gene may occur on either strand of DNA
●
Genes are continuous stretches (almost always) in
prokaryotes
●
Genes are (often) discontinuous stretches (exons)
in eukaryotes. The intervening regions are called
introns
●
Upstream of the gene is a binding site (the promoter)
where transcription is initiated
●
The location of other regulatory regions is less
predictable
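The exon/intron layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `splice`, the toy sequence, and the exon coordinates are invented for the example and do not describe any real gene.

```python
def splice(sequence, exons):
    """Join exon segments given as (start, end) pairs (0-based, end-exclusive)."""
    return "".join(sequence[start:end] for start, end in exons)

# Toy gene (hypothetical): uppercase = exon, lowercase = intron.
gene = "ATGGCC" + "gtaagtttcag" + "GATTAA"
exons = [(0, 6), (17, 23)]

mature = splice(gene, exons)
print(mature)  # ATGGCCGATTAA
```

In a prokaryotic gene the same call would typically take a single (start, end) pair, since the coding stretch is almost always continuous.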
The Central Dogma
●
DNA makes RNA makes protein; often simplified to
one gene, one protein
●
Like most dogmas, not entirely true
●
Alternative splicing permits the manufacture of
many products from a single gene
●
The complete set of protein products is called the
proteome
●
With current technology, more gene information
is available than protein information
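A minimal sketch of how alternative splicing yields several products from one gene, assuming the simplest model in which each optional (cassette) exon is independently kept or skipped. The function `isoforms`, the exon strings, and the choice of optional exons are hypothetical; real splicing is regulated and usually produces only some of these combinations.

```python
from itertools import product

def isoforms(exons, optional):
    """Yield every transcript obtained by keeping or skipping each optional exon.

    exons: exon strings in genomic order.
    optional: set of indices of exons that may be skipped.
    """
    choices = [(True, False) if i in optional else (True,)
               for i in range(len(exons))]
    for keep in product(*choices):
        yield "".join(exon for exon, kept in zip(exons, keep) if kept)

# Toy exons (invented for illustration); exons 1 and 2 are optional.
exons = ["AUGGCU", "GAAUUC", "CCGGGA", "UAA"]
variants = list(isoforms(exons, optional={1, 2}))
print(len(variants))  # 4
```

With n optional exons this simple model gives up to 2^n transcripts, which is why the protein count can exceed the gene count.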
Transmission of information
●
How hereditary information is stored, passed on,
and implemented is considered a fundamental
problem in biology.
●
Three types of maps are essential:
– Linkage maps of genes
– Banding patterns of chromosomes
– DNA sequences
Gene maps
●
Gene maps help describe the spatial arrangement
of genes on a chromosome.
●
Genes are assigned to a specific location on a
chromosome, known as the locus, and can be used
as molecular markers to find the distance between
other genes on a chromosome.
●
Maps provide researchers with the opportunity to
predict the inheritance patterns of specific traits
Chromosome banding pattern maps
●
Chromosomes are identified by the banding
patterns revealed by different staining techniques.
DNA sequence
●
Physically, a sequence of nucleotides in the
molecule
●
Computationally, a string of characters: A, T, G,
and C
●
Genes are regions of the sequence, in many cases
interrupted by noncoding regions
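Treating DNA as a string makes the two-strand picture easy to compute. Because a gene may lie on either strand, the sketch below derives the opposite strand by reverse complementing. The name `reverse_complement` and the example sequence are illustrative choices, and only the four unambiguous bases are handled.

```python
# Translation table for the four unambiguous bases.
COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

forward = "ATGGCCTAA"
print(reverse_complement(forward))  # TTAGGCCAT
```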
High-resolution maps
●
Variable number tandem repeats (VNTRs –
minisatellites), 10-100 bp, are a sort of genetic
fingerprint
●
Short tandem repeat polymorphisms (STRPs –
microsatellites), 2-5 bp, are another kind of marker
●
A contig is a series of overlapping DNA clones of
known order along a chromosome from an organism
●
A sequence tagged site (STS), 200-600 bp, is a
known unique location in the genome
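Tandem-repeat markers work because individuals differ in the number of repeat copies at a locus. The sketch below counts back-to-back copies of a repeat unit with a regular expression. The function name `longest_tandem_repeat`, the flanking sequences, and the two "alleles" are made up for the example.

```python
import re

def longest_tandem_repeat(seq, unit):
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

# Two hypothetical alleles of a microsatellite with a 2 bp repeat unit.
allele_a = "GGCT" + "CA" * 12 + "TTGA"
allele_b = "GGCT" + "CA" * 15 + "TTGA"

print(longest_tandem_repeat(allele_a, "CA"),
      longest_tandem_repeat(allele_b, "CA"))  # 12 15
```

In practice repeat lengths are usually measured from fragment sizes rather than by scanning raw sequence, so this is a conceptual sketch only.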
Identifying genes
●
An open reading frame (ORF) is a region of DNA
that begins with an initiation codon and ends with
a stop codon.
●
An ORF is a potential gene
●
Gene finding techniques are based on one or a
combination of the following:
– Similarity to known genes
– Properties of the DNA sequence itself (ab-initio
approaches)
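The ORF definition above translates almost directly into code. The following is a minimal sketch, assuming ATG as the only initiation codon and the standard stop codons TAA, TAG, and TGA; it scans the three reading frames of one strand. The function name `find_orfs` and the `min_codons` parameter are hypothetical, and real gene finders add similarity searches or statistical models on top of this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ORF on the given strand.

    start is the index of the ATG, end is one past the stop codon, and
    min_codons counts the start codon but not the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs

dna = "CCATGGCTGAATAAGG"
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])  # 2 ATGGCTGAATAA
```

A fuller search would run the same scan on the reverse complement as well, since a gene may occur on either strand.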
Prokaryote genomes
●
Genetic material of the cell takes the form of a large
single circular piece of double stranded DNA.
Example: E. coli, 4,639,221 bp
●
89% coding
●
4,285 genes
●
122 structural RNA genes
●
Prophage remnants
●
Insertion sequence elements
●
Horizontal transfers
Metagenome
●
Genetic information of an entire environmental
sample
●
DNA is extracted directly from the environment
and sequenced using next-generation sequencing
●
Determine the sequences directly from a sample
without culturing individual strains
●
Provide information about species that cannot be
cultured in the traditional way
Eukaryotic genome
●
The full genetic information in a eukaryotic cell
●
Example: C. elegans
●
6 chromosomes (5 autosomes and the X chromosome)
●
19,099 genes
●
Coding region – 27%
●
Average of 5 introns/gene
●
Both long and short duplications
Human Genome Project
●
At the height of the Human Genome Project,
sequencing factories were generating DNA
sequences at a rate of 1000 nucleotides per
second 24/7.
●
Technical breakthroughs that allowed the Human
Genome Project to be completed have had an
enormous impact on all of biology.
Molecular Biology Of The Cell. Alberts et al. 491-495
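A back-of-the-envelope calculation gives a feel for the quoted peak rate. It assumes a genome of about 3 billion base pairs and, purely for illustration, 4x redundant coverage; the constant names are invented for the example, and the result ignores assembly, finishing, and the fact that the peak rate was not sustained for the whole project.

```python
RATE_NT_PER_SECOND = 1_000        # peak rate quoted above
GENOME_SIZE_BP = 3_000_000_000    # approximate human genome size
COVERAGE = 4                      # assumed redundancy, for illustration only

per_day = RATE_NT_PER_SECOND * 60 * 60 * 24
days_single_pass = GENOME_SIZE_BP / per_day
days_with_coverage = days_single_pass * COVERAGE

print(per_day)                       # 86400000
print(round(days_single_pass, 1))    # 34.7
print(round(days_with_coverage, 1))  # 138.9
```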

Human Genome Project
Goals:
■ identify all of the approximately 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Milestones:
■ 1990: Project initiated as joint effort of U.S. Department of Energy and the National
Institutes of Health
■ June 2000: Completion of a working draft of the entire human genome (covers >90% of
the genome to a depth of 3-4x redundant sequence)
■ February 2001: Analyses of the working draft are published
■ April 2003: HGP sequencing is completed and Project is declared finished two years
ahead of schedule
http://doegenomes.org
http://www.sanger.ac.uk/HGP/overview.shtml
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
What does the draft human
genome sequence tell us?
By the Numbers
• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known
human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at around 30,000, much lower than previous
estimates of 80,000 to 140,000.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
http://doegenomes.org
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the
DNA building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC-
and AT-rich regions usually can be seen through a microscope as light and dark bands
on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast
expanses of noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent
to gene-rich areas, forming a barrier between the genes and the "junk DNA." These
CpG islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest
(231).
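The GC-rich versus AT-rich contrast described above can be measured with a sliding window. This is a minimal sketch: the names `gc_content` and `gc_windows`, the toy sequence, and the window size are illustrative assumptions, and real analyses use windows of thousands of bases.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_windows(seq, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

# Toy sequence (invented): a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCGC" + "ATATTAATAT"
for start, gc in gc_windows(toy, window=10, step=10):
    print(start, gc)
# 0 1.0
# 10 0.0
```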
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least
50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on
chromosome structure and dynamics. Over time, these repeats reshape the genome
by rearranging it, creating entirely new genes, and modifying and reshuffling existing
genes.
• The human genome has a much greater portion (50%) of repeat sequences than the
mustard weed (11%), the worm (7%), and the fly (3%).
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other
organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm
because of mRNA transcript "alternative splicing" and chemical modifications to the proteins.
This process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants; but the
number of gene family members has expanded in humans, especially in proteins involved in
development and immunity.
• Although humans appear to have stopped accumulating repeated DNA over 50 million
years ago, there seems to be no such decline in rodents. This may account for some of the
fundamental differences between hominids and rodents, although gene estimates are similar
in these species. Scientists have proposed many theories to explain evolutionary contrasts
between humans and other organisms, including those of life span, litter sizes, inbreeding,
and genetic drift.
Variations and Mutations
• Scientists have identified about 3 million locations where single-base DNA differences
(SNPs) occur in humans. This information promises to revolutionize the processes of finding
chromosomal locations for disease-associated sequences and tracing human history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females.
Researchers point to several reasons for the higher mutation rate in the male germline,
including the greater number of cell divisions required for sperm formation than for eggs.
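Single-base differences are simple to express computationally. The sketch below lists the positions where two already aligned sequences differ, then works out how many differences 99.9% identity implies for a genome of about 3 billion bases. The function name `snp_positions` and both toy sequences are hypothetical.

```python
def snp_positions(seq_a, seq_b):
    """Positions where two equal-length, already aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

person_1 = "ATGGCCTTAGCA"
person_2 = "ATGGCATTAGCG"
print(snp_positions(person_1, person_2))  # [5, 11]

# 99.9% identity leaves 1 base in 1,000 free to differ.
genome_size = 3_000_000_000
expected_differences = genome_size * (1000 - 999) // 1000
print(expected_differences)  # 3000000
```

The result is of the same order as the roughly 3 million SNP locations quoted above, although the two figures measure different things: differences between two individuals versus catalogued variable sites in the population.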
What does the draft human genome
sequence tell us?
●
Led to the discovery of whole new classes of
proteins and genes, while revealing that many
proteins have been much more highly conserved in
evolution than had been suspected.
●
Provided new tools for determining the functions of
proteins and of individual domains within proteins,
revealing a host of unexpected relationships between
them.

What does the draft human genome
sequence tell us?
●
By making large amounts of protein available, it
has yielded an efficient way to mass produce
protein hormones and vaccines
●
Dissection of regulatory genes has provided an
important tool for unraveling the complex
regulatory networks by which eukaryotic gene
expression is controlled.

How does the human genome stack
up?
Organism                             Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                 3 billion             30,000
Laboratory mouse (M. musculus)       2.6 billion           30,000
Mustard weed (A. thaliana)           100 million           25,000
Roundworm (C. elegans)               97 million            19,000
Fruit fly (D. melanogaster)          137 million           13,000
Yeast (S. cerevisiae)                12.1 million          6,000
Bacterium (E. coli)                  4.6 million           3,200
Human immunodeficiency virus (HIV)   9,700                 9
http://doegenomes.org
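One way to read the comparison above is as gene density, that is, estimated genes per million bases. The sketch below computes it from the figures in the table; the dictionary layout and the function name `genes_per_mb` are illustrative choices, and the inputs are the rounded estimates shown, so the outputs are rough.

```python
# (genome size in bases, estimated genes), copied from the table above.
genomes = {
    "Human": (3_000_000_000, 30_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}

def genes_per_mb(size_bases, gene_count):
    """Estimated genes per million bases of genome."""
    return gene_count / (size_bases / 1_000_000)

density = {name: genes_per_mb(*values) for name, values in genomes.items()}
for name in sorted(density, key=density.get):
    print(f"{name:18s} {density[name]:8.1f}")
```

On these numbers the human genome comes out lowest, at about 10 genes per million bases, which fits the earlier point that less than 2% of it codes for protein.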
Future Challenges:
What We Still Don’t Know
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and
disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental
restoration
• Developmental genetics, genomics
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
Evolution of genomes
●
Adaptation of species is coterminous with
adaptation of genomes
●
Where do genes come from? (Answer: from other
genes)
●
Homologs: orthologs and paralogs
●
Lateral transfer
●
Molecular species each have their own family tree
●
Genes are widely shared
Close relatives
●
Yeast, fly, worm and human share at least 1308
groups of proteins
●
Unique to vertebrates: immune proteins (for
example)
●
Unique molecules are adapted from ancient
molecules of different purpose but similar design
●
Most new proteins come from domain
rearrangement
●
Most new species come from control region
variation
