Genes, Genome and Genetic Code

The document discusses the nature of genetic information, revealing that DNA carries this information and is organized into genomes, which include coding and non-coding regions. It highlights the complexity of gene expression, the role of introns and regulatory sequences, and the significance of transposable elements in evolution. Additionally, it explains the genetic code, the structure of eukaryotic genes, and the processes of transcription and splicing in gene expression.

Uploaded by

Sobia Anwar

MIC-304

Genes, Genome and Genetic Code


Introduction
For many years, scientists wondered about the nature of the information that directed the activities
of cells. What kind of molecules carried the information, and how was the information passed on
from one generation to the next? Key experiments, done between the 1920s and the 1950s,
established convincingly that this genetic information was carried by DNA. In 1953, with the
elucidation of the structure of DNA, it became possible to begin investigating how this information is
passed on, and how it is used.
Genomes
We use the word “genome” to describe all of the genetic material of the cell. That is, a genome is
the entire sequence of nucleotides in the DNA that is in all of the chromosomes of a cell. When
we use the term genome without further qualification, we are generally referring to the
chromosomes in the nucleus of a eukaryotic cell. As you know, eukaryotic cells have organelles
like mitochondria and chloroplasts that have their own DNA (Figures 7.1 and 7.2). These are referred
to as the mitochondrial or chloroplast genomes to distinguish them from the nuclear genome.
Starting in the 1980s, scientists began to determine the complete sequence of the genomes of many
organisms, in the hope of better understanding how the DNA sequence specifies cellular functions.
Today, the complete genome sequences have been determined for thousands of species from all
domains of life, and many more are in the process of being worked out by groups of scientists
across the world.

Figure 7.1 - Mitochondria carry their own genome

Figure 7.2 - The human mitochondrial genome


Global genome initiative


The Global Genome Initiative, a collaborative effort to sequence at least one species from each of
the 9,500 described invertebrate, vertebrate, and plant families, is one of many such ventures. The
information from these various efforts is collected in enormous online repositories, so that it is
freely available to scientists. As the sequence databases compile ever more information, the fields
of computational biology and bioinformatics have arisen, to analyze and organize the data in a way
that helps biologists understand what the information in DNA means in the cellular context.
Genes
It has been known for many years that phenotypic traits are controlled by specific regions of the
DNA that were termed “genes”. Thus, DNA was envisioned as a long string of nucleotides, in
which certain regions, the genes, were separated by non-coding regions that were simply referred
to as intergenic sequences (inter=between; genic=of genes). Early experiments in molecular
biology suggested a simple relationship between the DNA sequence of a gene and its product, and
led scientists to believe that each gene carried the information for a single protein. Changes, or
mutations, in the base sequence of a gene would be reflected in changes in the gene product, which,
in turn, would manifest themselves in the phenotype or observable trait. This simple picture, while still
useful, has been modified by subsequent discoveries that demonstrated that the use of genetic
information by cells is somewhat more complicated. Our definition of a gene is also evolving to
take new knowledge into consideration.

Figure 7.3 - From genomes to genes

Figure 7.4 - Human genes sorted by class


Matters of size
A common-sense assumption about genomes would be that if genes specify proteins, then the more
proteins an organism made, the more genes it would need to have, and thus, the larger its genome
would be. Comparison of various genomes shows, surprisingly, that there is not necessarily a direct
relationship between the complexity of an organism and the size of its genome (Figure 7.5). To
understand how this could be true, it is necessary to recognize that while genes are made up of
DNA, not all DNA consists of genes (for purposes of our discussion, we define a gene as a
section of DNA that encodes an RNA or protein product). In the human genome, less than 2% of
the total DNA seems to be the sort of coding sequence that directs the synthesis of proteins. For
many years, non-coding DNA in genomes was believed to be useless, and was described as “junk
DNA”, although it was perplexing that there seemed to be so much “useless” sequence. Recent
discoveries have, however, demonstrated that much of this so-called junk DNA may play
important roles in evolution, as well as in regulation of gene expression.

Figure 7.5 - Sizes of various genomes


Introns
So, what is all the non-coding DNA doing there? We know that even coding regions in our DNA
are interrupted by non-coding sequences called introns. This is true of most eukaryotic genomes.
An examination of genes in eukaryotes shows that non-coding intron sequences can be much
longer than the coding sections of the gene, or exons. Most exons are relatively small, and code
for fewer than a hundred amino acids, while introns can vary in size from several hundred
base-pairs to many kilobase-pairs (thousands of base-pairs) in length. For many genes in humans,
there is much more intron sequence than coding (exon) sequence. Intron sequences account for
roughly a quarter of the genome in humans.
Other non-coding sequences
What other kinds of non-coding sequences are there? One function for some DNA sequences that
do not encode RNA or proteins is in specifying when and to what extent a gene is used, or
expressed. Such regions of DNA are called regulatory regions and each gene has one or more
regulatory sequences that control its expression. However, regulatory sequences do not account
for all the rest of the DNA in our genomes, either.
Transposable sequences
Surprisingly, almost half of the human genome appears to consist of several kinds of repetitive
sequences. Many of the repetitive sequences are known to be transposable elements (transposons),
sections of DNA that can move around within the genome. Sometimes referred to as “jumping
genes”, these transposable elements can move from one chromosomal location to another, either
through a simple “cut and paste” mechanism that cuts the sequence out of one region of the DNA
and inserts it into another location, or through a process called retrotransposition involving an
RNA intermediate.
LINEs & SINEs
There are, in total, millions of copies of two major classes of such transposable elements in our
genomes, the LINEs (Long Interspersed Nuclear Elements) and SINEs (Short Interspersed Nuclear Elements).
LINEs and SINEs are both retrotransposons, transposable elements that
are copied into RNA, then reverse transcribed back into DNA before being inserted into new
locations. This movement is typically not sequence specific, meaning that the transposons can be
inserted randomly in the genome, in many cases within coding regions. As might be expected, this
can disrupt the function of the gene. Transposons may also insert within regulatory regions, and
change the expression of the genes they control. As a major cause of mutation in genomes,
transposons play an important role in evolution.
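
The two modes of movement described above can be contrasted with a small string-based sketch. It is only an illustration under simplifying assumptions: the genome is a toy string, the element coordinates and function names are hypothetical, and the RNA intermediate of retrotransposition is not modelled, only its net effect of adding a copy.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and reinsert it at index `target` of what remains."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Keep the original element in place and insert a new copy at `target`.

    Stands in for retrotransposition: transcription and reverse transcription
    are skipped, only the resulting extra DNA copy is represented.
    """
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


# Upper case marks a hypothetical 4-bp transposable element in a toy genome.
genome = "aaaaTTGGccccgggg"
moved = cut_and_paste(genome, 4, 8, 8)
copied = copy_and_paste(genome, 4, 8, 12)

print(moved)   # element relocated, genome length unchanged
print(copied)  # element duplicated, genome grows by the element length
```

Under these assumptions, only the copy-based mode increases genome size, which fits the observation that retrotransposons make up such a large share of the genome.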
Finally, recent findings have shown that much of the genome is transcribed into RNAs, even
though only about 2% encodes proteins. What are the RNAs that do not encode proteins?
Ribosomal RNAs (Figure 7.7) and transfer RNAs, together with the small nuclear RNAs that
function in splicing, account for some of these non-translated transcripts, but not all. The remaining
RNAs are regulatory RNAs, small molecules that play an important role in regulating gene
expression.

Figure 7.6 - Components of the human genome


Figure 7.7 - 5S rRNA structure


The Genetic Code
The genetic code is the set of rules by which information encoded in genetic material (DNA or
mRNA sequences) is translated into proteins (amino acid sequences) by living cells.
The code defines how sequences of three nucleotides, called codons, specify the order in which
the 20 naturally occurring amino acids are put together to produce a specific polypeptide during
protein synthesis. The existence of a molecular code was predicted by biochemists even before
DNA was confirmed to be the blueprint of life. The challenge was to understand how the DNA
molecule, which consists of only four molecular “letters” (A, T, G, and C), can encode 20 amino
acids. The simplest assumption is that one DNA nucleotide specifies one amino acid. This,
however, leaves 16 amino acids without a code. If we increase the complexity by one step and
suppose that two DNA “letters” code for one amino acid, we get 4² = 16 combinations; we are
still four “words” short of 20. Increasing the complexity one step further, we can imagine that a
combination of three “letters” specifies one amino acid. 4³ = 64; now we have too many “words”,
but this is the most parsimonious combination.
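
The counting argument above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
from itertools import product

BASES = "ATGC"      # the four DNA "letters"
AMINO_ACIDS = 20    # number of amino acids that need a "word"

# Enumerate every possible word of length 1, 2 and 3.
for length in (1, 2, 3):
    words = ["".join(p) for p in product(BASES, repeat=length)]
    print(length, len(words), len(words) >= AMINO_ACIDS)

# Smallest word length whose vocabulary covers all 20 amino acids.
min_length = next(n for n in range(1, 10) if len(BASES) ** n >= AMINO_ACIDS)
print(min_length)
```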
The theoretical genetic “vocabulary” was verified experimentally in the early 1960s. It has been
determined that of the 64 nucleotide “triplets”, or codons, 61 code for amino acids; the remaining
three serve as termination sequences (stop codons) during protein synthesis. Many amino acids
are coded by more than one codon (the code is redundant). The AUG codon specifying the amino acid
methionine also serves as the “start” codon of nearly every eukaryotic protein-coding gene. AUG is
the first codon of a messenger RNA (mRNA) transcript to be translated by a ribosome during protein
synthesis. The methionine it specifies is incorporated as the first amino acid of the polypeptide
chain, although it is often removed afterwards.
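
The figures in this paragraph can be reproduced from the standard codon table. The sketch below builds the table from a compact 64-letter string (one-letter amino acid codes listed in U, C, A, G order, with `*` for stop); this compact layout is a convenience chosen for the example, not something given in these notes.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 RNA codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in codon_table.items() if aa == "*")
sense_codons = [c for c, aa in codon_table.items() if aa != "*"]
# How many codons specify each amino acid (redundancy of the code).
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")

print(len(codon_table), len(sense_codons), stop_codons)
print(codon_table["AUG"], codons_per_amino_acid["L"], codons_per_amino_acid["M"])
```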
When more than one codon specifies the same amino acid, the codons are said to be synonymous to
one another. Synonymous codons most often differ at the third nucleotide, which sits in the
so-called wobble position. A nucleotide change in the wobble position often does not change the
specified amino acid, because during translation the transfer RNA (tRNA), which carries the amino
acid to be inserted into the polypeptide and recognizes a specific codon on the messenger RNA
(mRNA), pairs less strictly with the nucleotide in the third position than with the first two.
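
How far third-position changes are tolerated can be explored with the standard codon table. The sketch below groups codons by their first two bases and reports the groups in which all four third-position variants give the same amino acid; the compact table string is an assumption of the example, as in any such encoding of the standard code.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 RNA codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codon families (same first two bases) where the third base never matters.
fourfold = []
for first_two in ("".join(p) for p in product(BASES, repeat=2)):
    encoded = {codon_table[first_two + third] for third in BASES}
    if len(encoded) == 1:
        fourfold.append(first_two)

print(len(fourfold), fourfold)
# A counter-example: here the third base does change the amino acid.
print(codon_table["AUA"], codon_table["AUG"])
```

Only some of the sixteen families are insensitive to every third-position change, which is why a wobble-position substitution is usually, but not always, silent.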
The vast majority of genes are encoded using the same code, but there are also many variant codes.
For example, protein synthesis in human mitochondria relies on a genetic code that differs from
the standard genetic code.
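
As an illustration of a variant code, the sketch below applies the four codon reassignments of the vertebrate mitochondrial code to the standard table. The reassignments are not listed in these notes; they are taken here from NCBI translation table 2 and should be read as an outside assumption of the example.

```python
from itertools import product

BASES = "UCAG"
# Standard code: one-letter amino acid codes in UCAG codon order; "*" marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
standard_code = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD)
}

# Assumed reassignments (vertebrate mitochondrial code, NCBI translation table 2).
MITO_CHANGES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
mito_code = {**standard_code, **MITO_CHANGES}

# Codons whose meaning differs: codon -> (standard, mitochondrial)
differences = {
    codon: (standard_code[codon], mito_code[codon])
    for codon in standard_code
    if standard_code[codon] != mito_code[codon]
}
mito_stops = sorted(c for c, aa in mito_code.items() if aa == "*")

print(differences)
print(mito_stops)
```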

The Genetic Code. Credit: dunk (CC BY 2.0)


Not every nucleotide in a DNA sequence is part of the genetic code. All organisms’ DNA contains
regulatory sequences, intergenic segments, chromosomal structural areas, and other non-coding
DNA that can contribute greatly to phenotype. Those elements operate under sets of rules that are
distinct from the codon-to-amino acid paradigm underlying the genetic code.
Codons appear on the coding (sense) DNA strand as they are read in the 5′ to 3′ direction. Each
protein-coding gene is transcribed into a molecule of the related polymer RNA, using the other
DNA strand in the double helix (called the template strand).
Therefore, the sequence of nucleotides in the RNA copy of the DNA code appears exactly the
same as the code of the DNA, except that the nucleotide thymine (T) of the DNA is replaced by
uracil (U) in the RNA. Since the discovery of the genetic code took place through the analysis of
RNA copies of the genetic message, the genetic code table is traditionally composed of RNA
codons. However, with the rise of computational biology and genomics, proteins have become
increasingly studied at a genomic level rather than by isolating
mRNA molecules from a cell undergoing gene expression. As a result, the practice of representing
the genetic code as a DNA codon table has become more popular.
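
The relationship between the two DNA strands and the RNA copy can be sketched as follows. The sequence is a made-up example and the function names are hypothetical; both strands are written 5′ to 3′.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(template_strand: str) -> str:
    """Build the mRNA (5' to 3') that pairs with a template strand written 5' to 3'."""
    return reverse_complement(template_strand).replace("T", "U")


coding_strand = "ATGGCCTTTTAA"                        # hypothetical coding (sense) strand
template_strand = reverse_complement(coding_strand)   # the other strand of the helix
mrna = transcribe(template_strand)

print(template_strand)
print(mrna)
# The mRNA reads like the coding strand with U in place of T.
print(mrna == coding_strand.replace("T", "U"))
```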
Structure of a eukaryotic gene
A gene is a sequence of nucleotide triplets of a DNA molecule bounded by a start codon (ATG) and
a stop codon (TGA, TAA, or TAG) that specifies a cellular product.
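
Taken literally, this definition describes a run of triplets from ATG to the first in-frame stop codon. A minimal sketch of a scan for such a run is shown below; it ignores introns, looks at one strand only, and uses a made-up sequence and a hypothetical function name.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def find_orf(dna: str):
    """Return the first ATG...stop run of in-frame triplets in `dna`, or None."""
    start = dna.find("ATG")
    while start != -1:
        # Step through the triplets that follow this ATG, in frame.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop after this ATG; try the next one.
        start = dna.find("ATG", start + 1)
    return None


sequence = "CCATGGCTTGGTAACC"   # hypothetical
orf = find_orf(sequence)
print(orf)
```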
Most of the time, the cellular product “coded” in DNA is a protein, in which case the nucleotide
triplets in a gene will specify a specific sequence of amino acids on a polypeptide chain. A gene’s
final product can, sometimes, be an RNA molecule (rRNA, tRNA).
In humans, as in other eukaryotes, the region of the DNA coding for a protein is usually not
continuous. This region is composed of alternating stretches of exons (the actual nucleotide
sequences that carry a message delivered to the site of polypeptide assembly) and introns
(non-coding “spacer” sequences that take part in regulation of gene expression at the mRNA level).
During transcription, both exons and introns are transcribed into a precursor messenger RNA
(pre-mRNA), in their linear order. Thereafter, a process called splicing takes place, in which the
intron sequences are excised from the transcript and discarded. The remaining RNA segments, the ones
corresponding to the exons, are ligated to form the mature mRNA strand.
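
The splicing step can be pictured as joining the exon segments of the transcript in order. The sketch below assumes the exon coordinates are already known (0-based, end-exclusive); the transcript and the function name are invented for illustration.

```python
def splice(transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates) in their linear order."""
    return "".join(transcript[start:end] for start, end in sorted(exons))


# Upper case = exon, lower case = intron (hypothetical transcript).
transcript = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guacgucccuag" + "CCCUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature = splice(transcript, exons)
print(mature)
```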

Structure of a eukaryotic gene and the accompanying control elements


A typical human multi-exon gene starts with the promoter region, which is followed by a
transcribed but non-coding region called the 5′ untranslated region (5′ UTR). The 5′ UTR begins at
the transcription start site (TSS) and is followed by the start codon.

Source: addgene. The use of the image adheres to the Fair Use application of 17 U.S. Code §
107.
The start codon starts the coding sequence of a gene, and it also serves as the translation start site
during the translation of mRNA into a polypeptide chain on the ribosomes. The start codon in
eukaryotes is ATG. Following the start codon, there is an alternating series of exons separated
by internal introns, followed by the terminating exon, which contains the stop codon (either TGA,
TAA, or TAG). It is followed by another non-coding region called the 3′ UTR. Near its end, the 3′
UTR carries the polyadenylation signal; the stretch of repeated adenine nucleotides known as the
polyadenylation (polyA) tail is added to the mRNA after transcription rather than being encoded in
the gene. The exon-intron boundaries (i.e., the splice sites) are signaled by specific short (2
nucleotide-long) sequences, most commonly GT at the start of an intron and AG at its end.
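
The layout described above can be written down as an ordered list of parts and checked mechanically. The sketch below uses a made-up miniature gene and a hypothetical helper; the GT...AG boundary rule it tests is the common case for introns and is an assumption of the example.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (coding-strand DNA)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical miniature gene, parts listed in 5' to 3' order.
gene = [
    ("5' UTR", "GACCTC"),
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "TTTGGATAA"),
    ("3' UTR", "CCTAATAAAGC"),
]

introns = [seq for kind, seq in gene if kind == "intron"]
coding = "".join(seq for kind, seq in gene if kind == "exon")

print(all(has_canonical_splice_sites(i) for i in introns))
print(coding.startswith("ATG"), coding[-3:] in STOP_CODONS)
```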

Components of a eukaryotic gene. Credit: Genomics Education Programme, CC BY 2.0, via Wikimedia Commons.
On average, a vertebrate gene is around 30,000 nucleotides (30 kilobases, or kb) long, out of which
the coding region is only about 1 kb long. The average coding region consists of six exons, each
about 150 nucleotides (base pairs, since DNA is double stranded, abbreviated as bp) long. Huge
deviations from the average are observed. For example, the gene called dystrophin is 2.4 million
bp (2.4 Mb) long. The gene for blood coagulation factor VIII has 26 exons whose size varies from
69 bp to 3106 bp, with the whole gene spanning around 186 kb and individual introns reaching up
to 32.4 kb. Intron number 22 produces two transcripts unrelated to this gene, one for each strand.
An average 5′ UTR is 750 bp long, but it can be longer and span several exons. On average, the 3′
UTR is about 450 bp long, but examples exist where its length reaches 5 kb (e.g., the gene for
Kallmann syndrome).
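
The averages quoted for a vertebrate gene can be combined in a quick calculation. This is a rough sketch using only the round figures given above; the variable names are illustrative.

```python
GENE_LENGTH_BP = 30_000   # average vertebrate gene
EXON_COUNT = 6            # average number of coding exons
EXON_LENGTH_BP = 150      # average exon length

coding_bp = EXON_COUNT * EXON_LENGTH_BP          # close to the quoted ~1 kb
coding_fraction = coding_bp / GENE_LENGTH_BP     # share of the gene that is coding
codons = coding_bp // 3                          # triplets in the coding region

print(coding_bp, f"{coding_fraction:.1%}", codons)
```

On these figures, only about 3% of an average gene is coding sequence; the rest is made up of introns and untranslated regions.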
