Genes, Genome and Genetic Code
Matters of size
A common-sense assumption about genomes would be that if genes specify proteins, then the more
proteins an organism made, the more genes it would need to have, and thus, the larger its genome
would be. Comparison of various genomes shows, surprisingly, that there is not necessarily a direct
relationship between the complexity of an organism and the size of its genome (Figure 7.5). To
understand how this could be true, it is necessary to recognize that while genes are made up of
DNA, not all DNA consists of genes (for the purposes of our discussion, we define a gene as a
section of DNA that encodes an RNA or protein product). In the human genome, less than 2% of
the total DNA seems to be the sort of coding sequence that directs the synthesis of proteins. For
many years, non-coding DNA in genomes was believed to be useless and was described as “junk
DNA”, although it was perplexing that there seemed to be so much “useless” sequence. Recent
discoveries have, however, demonstrated that much of this so-called junk DNA may play
important roles in evolution, as well as in regulation of gene expression.
In addition to its coding sequence, each gene is associated with
regulatory sequences that control its expression. However, regulatory sequences do not account
for all the rest of the DNA in our genomes, either.
Transposable sequences
Surprisingly, almost half of the human genome appears to consist of several kinds of repetitive
sequences. Many of the repetitive sequences are known to be transposable elements (transposons),
sections of DNA that can move around within the genome. Sometimes referred to as “jumping
genes”, these transposable elements can move from one chromosomal location to another, either
through a simple “cut and paste” mechanism that cuts the sequence out of one region of the DNA
and inserts it into another location, or through a process called retrotransposition involving an
RNA intermediate.
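
The two modes of movement described above can be illustrated with a toy string model. This is only a sketch: the function names, coordinates, and the example sequence are hypothetical, and the model ignores real details such as strand complementarity and the enzymes involved.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and reinsert it at index `target`
    of the remaining sequence (the element moves; nothing is duplicated)."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] through an RNA intermediate and insert the
    copy at index `target`; the original element stays where it was."""
    element = genome[start:end]
    rna_copy = element.replace("T", "U")   # toy "transcription" into RNA
    dna_copy = rna_copy.replace("U", "T")  # toy "reverse transcription" into DNA
    return genome[:target] + dna_copy + genome[target:]


# Hypothetical 12-base genome with a 4-base element (GTCA) in the middle.
genome = "AAAA" + "GTCA" + "CCCC"

moved = cut_and_paste(genome, 4, 8, 0)     # "GTCAAAAACCCC"
copied = retrotranspose(genome, 4, 8, 12)  # "AAAAGTCACCCCGTCA"
```

In this model, cut and paste leaves the genome the same length, while retrotransposition lengthens it by one element with every move, which is consistent with retrotransposons accumulating in very large copy numbers.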
LINEs & SINEs
Our genomes contain hundreds of thousands to millions of copies of each of two major classes of
such transposable elements, the LINEs (Long Interspersed Nuclear Elements) and SINEs (Short
Interspersed Nuclear Elements).
LINEs and SINEs are both retrotransposons, a kind of transposable element that
is copied into RNA, then reverse transcribed back into DNA before being inserted into new
locations. This movement is typically not sequence specific, meaning that the transposons can be
inserted randomly in the genome, in many cases within coding regions. As might be expected, this
can disrupt the function of the gene. Transposons may also insert within regulatory regions, and
change the expression of the genes they control. As a major cause of mutation in genomes,
transposons play an important role in evolution.
Finally, recent findings have shown that much of the genome is transcribed into RNAs, even
though only about 2% encodes proteins. What are the RNAs that do not encode proteins?
Ribosomal RNAs (Figure 7.7) and transfer RNAs, together with the small nuclear RNAs that
function in splicing, account for some of these non-translated transcripts, but not all. Many of the
remaining RNAs are regulatory RNAs, which play an important role in regulating gene
expression.
The vast majority of genes are encoded using the same standard genetic code, but a number of variant codes also exist.
For example, protein synthesis in human mitochondria relies on a genetic code that differs from
the standard genetic code.
Genes are now increasingly studied at a genomic level, rather than by learning about protein
composition from mRNA molecules isolated from a cell undergoing gene expression. As a result, the practice of representing
the genetic code as a DNA codon table has become more popular.
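
A DNA codon table, and the way a variant code differs from the standard one, can be sketched in a few lines of Python. The names below (`STANDARD_CODE`, `VERTEBRATE_MITO_CODE`, `translate`) and the example sequence are illustrative assumptions. The four reassigned codons are the well-known differences of the vertebrate mitochondrial code, which are not listed in the text above.

```python
from itertools import product

BASES = "TCAG"
# Amino acids (one-letter codes, "*" = stop) for the 64 DNA codons, listed in
# TCAG order with the third base varying fastest: TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
}

# Vertebrate mitochondrial code: identical except for four reassigned codons.
VERTEBRATE_MITO_CODE = {
    **STANDARD_CODE,
    "TGA": "W",  # stop -> tryptophan
    "ATA": "M",  # isoleucine -> methionine
    "AGA": "*",  # arginine -> stop
    "AGG": "*",  # arginine -> stop
}


def translate(dna: str, code: dict = STANDARD_CODE) -> str:
    """Translate a DNA coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGATATGAGGC"  # hypothetical sequence: ATG ATA TGA GGC
standard_protein = translate(dna)                    # "MI" (TGA is a stop)
mito_protein = translate(dna, VERTEBRATE_MITO_CODE)  # "MMWG"
```

The same DNA sequence yields different polypeptides under the two codes, which is why the code in use has to be known before a sequence is translated.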
Structure of a eukaryotic gene
A gene is a sequence of nucleotide triplets of a DNA molecule, bounded by a start codon (ATG) and
a stop codon (TGA, TAA, or TAG) that specifies a cellular product.
Most of the time, the cellular product “coded” in DNA is a protein, in which case the nucleotide
triplets in a gene will specify a specific sequence of amino acids on a polypeptide chain. A gene’s
final product can sometimes be an RNA molecule (for example, rRNA or tRNA).
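
The definition above (a run of triplets bounded by a start codon and a stop codon) can be turned into a minimal search routine. This is a sketch under simplifying assumptions: it scans only one strand, takes the first ATG it finds, and ignores introns. The function name and the example sequences are hypothetical.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def find_coding_sequence(dna: str):
    """Return the sequence from the first ATG through the first in-frame
    stop codon (inclusive), or None if either boundary is missing."""
    start = dna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence one triplet at a time, in frame with the ATG.
    for i in range(start + 3, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return None


coding = find_coding_sequence("CCATGGCTTAAGG")  # "ATGGCTTAA"
no_start = find_coding_sequence("CCCC")         # None (no ATG)
no_stop = find_coding_sequence("ATGAAA")        # None (no in-frame stop)
```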
In humans, as in other eukaryotes, the region of the DNA coding for a protein is usually not
continuous. This region is composed of alternating stretches of exons (the actual nucleotide
sequences that carry a message delivered to the site of polypeptide assembly) and introns
(non-coding “spacer” sequences that can take part in regulating gene expression at the mRNA level).
During transcription, both exons and introns are transcribed into a precursor messenger RNA
(pre-mRNA), in their linear order. Thereafter, a process called splicing takes place, in which the
intron sequences are excised and discarded from the RNA sequence. The remaining RNA segments, the
ones corresponding to the exons, are ligated to form the mature mRNA strand.
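
Splicing as described above amounts to keeping the exon segments, in order, and discarding everything between them. The sketch below assumes the exon coordinates are already known. The `splice` function, the coordinates, and the toy transcript are hypothetical.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the exon segments of a transcript in their linear order.

    `exons` is a list of (start, end) index pairs, end-exclusive; anything
    not covered by an exon is treated as intron and discarded."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1 (6 bases), an 11-base intron, exon 2 (6 bases).
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "GAAUAA"
exons = [(0, 6), (17, 23)]

mature_mrna = splice(pre_mrna, exons)  # "AUGGCCGAAUAA"
```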
Source: addgene. The use of the image adheres to the Fair Use application of 17 U.S. Code §
107.
MIC-304
The start codon marks the beginning of the coding sequence of a gene, and it also serves as the
translation start site during the translation of mRNA into a polypeptide chain on the ribosomes.
The start codon in eukaryotes is ATG (AUG in the mRNA). Following the start codon, there is an
alternating series of exons interspersed with internal introns, followed by the terminating exon,
which contains the stop codon (either TGA, TAA, or TAG). The stop codon is followed by a non-coding
region called the 3′ untranslated region (3′ UTR). The mature mRNA ends with a stretch of repeated
adenine nucleotides, called the polyadenylation (poly(A)) tail, which is added after transcription.
The exon-intron boundaries (i.e., the splice sites) are signaled by specific short
(2-nucleotide-long) sequences.
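
The text does not name the two-nucleotide splice-site signals. The sketch below assumes the canonical case, in which an intron begins with GT and ends with AG in the DNA sequence. The helper names, coordinates, and toy gene are hypothetical.

```python
def introns_between(exons: list) -> list:
    """Derive intron coordinates as the gaps between consecutive exons.

    `exons` is a list of (start, end) index pairs, end-exclusive."""
    return [
        (previous_end, next_start)
        for (_, previous_end), (next_start, _) in zip(exons, exons[1:])
    ]


def has_canonical_splice_sites(gene: str, intron: tuple) -> bool:
    """Check the assumed canonical boundaries: GT at the 5' end of the
    intron and AG at its 3' end."""
    start, end = intron
    sequence = gene[start:end]
    return sequence.startswith("GT") and sequence.endswith("AG")


# Toy gene: exon 1, one intron, terminating exon containing the stop codon.
gene = "ATGGCC" + "GTAAGTTTCAG" + "GAATAA"
exons = [(0, 6), (17, 23)]

introns = introns_between(exons)  # [(6, 17)]
canonical = [has_canonical_splice_sites(gene, i) for i in introns]  # [True]
```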