DNA The Code of Life
The Human Genome Project (HGP) was a mega project launched in 1990. Advances in genetic
engineering techniques made this project possible. Its aims reveal the magnitude and the
requirements of the project.
The human genome (i.e. the complete set of DNA, including all genes) has approximately
3 × 10^9 base pairs. If the cost of sequencing is US$ 3 per base pair, then the cost of the
entire project would be approximately US$ 9 billion! Moreover, let’s say the sequencing data
were to be stored in books. Then if each page has 1000 letters and each book has 1000 pages,
we would need about 3000 (or 3300, taking the genome size as 3.3 × 10^9 base pairs)
such books to store the genetic information from a single cell!
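The arithmetic behind these estimates can be checked with a short Python sketch. The figures come from the text; the variable names are illustrative. With the rounded genome size of 3 × 10^9 base pairs the division gives 3000 books, while a figure of 3300 corresponds to a genome size of about 3.3 × 10^9 base pairs.

```python
# Figures taken from the text; the variable names are illustrative.
BASE_PAIRS = 3 * 10**9        # approximate size of the human genome
COST_PER_BASE_USD = 3         # assumed sequencing cost per base pair
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000

# Total cost if every base pair costs the same to sequence.
total_cost_usd = BASE_PAIRS * COST_PER_BASE_USD

# One letter per base pair, so a book holds pages x letters base pairs.
letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK
books_needed = BASE_PAIRS / letters_per_book

print(f"Estimated cost: US$ {total_cost_usd / 10**9:.0f} billion")
print(f"Books needed: {books_needed:.0f}")
```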
Such a large amount of data also needs high-speed computational devices to store, retrieve and
analyse it. Therefore, HGP aided the rapid development of another field in
biology – Bioinformatics.
Goals Of HGP
• Identify all the genes in the human genome (approximately 20,000-25,000 genes).
• Provide a complete and accurate sequence of the 3 billion base pairs that make up the
human genome.
• Develop new tools to obtain and analyse data and make the information widely
available.
• Address the ethical, legal and social implications of the project on society.
HGP was a 13-year project, coordinated by the National Institutes of Health (NIH) and the U.S.
Department of Energy. It also involved contributions from other countries such as Japan,
Germany, China and France. The project is expected to lead to revolutionary
new ways to diagnose, treat and prevent human diseases.
Besides the human genome, information about the genomes of non-human organisms can also
be very helpful. We can understand their natural capabilities and apply them towards solving
problems in human healthcare, agriculture, energy production etc. Therefore, scientists have
also sequenced the genomes of many non-human organisms such as bacteria, yeast, fruit fly, plants etc.
Methodologies Of HGP
• Expressed Sequence Tags (ESTs) – This approach focussed on identifying all genes
expressed as RNA.
• Sequence Annotation – This blind approach involved sequencing the whole genome
(coding and non-coding) and later assigning functions to the different regions.
DNA sequencing involves the following steps:
• First, total DNA is isolated from a cell and converted into random, small-size fragments
since it is difficult to sequence long pieces of DNA. These fragments are then cloned into
a suitable host (bacteria or yeast) using special vectors such as Bacterial Artificial
Chromosomes (BAC) or Yeast Artificial Chromosomes (YAC). This amplifies each DNA
fragment so that it can be sequenced easily.
• Next, the fragments are sequenced using automated DNA sequencers. These sequencers
work on the principle of Frederick Sanger’s method.
• Special computer-based programs are used to arrange and align the DNA sequences
based on overlapping regions present in them.
Salient Features Of The Human Genome
• An average gene has 3000 bases. However, sizes vary greatly, with the largest human
gene being ‘dystrophin’ that has 2.4 million bases.
• The original estimate of the number of genes was between 80,000 and 140,000.
However, HGP gave an estimate of about 30,000 genes. Approximately 99.9% of nucleotide
bases are the same in all people.
• For over 50% of the discovered genes, the functions are unknown.
• Stretches of DNA sequences that are repeated many times (sometimes 100 to 1000
times) are repetitive sequences. Although they don’t code for proteins, they shed light
on chromosome structure, evolution, and dynamics.
• Chromosome 1 has the greatest number of genes (2968), and chromosome Y has the
least (231).
• HGP has identified 1.4 million locations with single base DNA differences in humans. This
information will revolutionize the identification of disease-associated sequences and
tracking of human history.
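Returning to the sequencing steps described earlier, the idea of arranging sequenced fragments by their overlapping regions can be illustrated with a toy greedy merge in Python. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats). The function names and the example reads are hypothetical, and this is not the software used by HGP.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Hypothetical short reads that overlap end to end.
reads = ["ATGCGTAC", "GTACGGA", "CGGATTC"]
print(greedy_assemble(reads))
```

Real genomes contain long repetitive stretches and sequencing errors, so practical assembly programs need far more careful strategies than this greedy merge.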
The need to derive meaningful knowledge from genomic sequences and better understand
biological systems will drive future research. This enormous task will require the coordinated
effort of scientists from various fields.
A major impact of HGP is that it enables a radically new approach to biological research. Earlier,
researchers studied one or a few genes at a time. Now, with new technologies and whole
genome sequences, they can study all the genes in a genome, for example, all the transcripts
in a tissue or an organ. They can also study how thousands of genes work together in networks
to make a system function.
DNA Fingerprinting
As we know, 99.9% of nucleotide bases are the same in all humans. However, the small
differences in DNA sequence make each individual unique. These differences form a person’s DNA fingerprint.
How do we determine these differences? If we compare the whole DNA sequences of two
individuals, it’ll take far too long. DNA fingerprinting is a quicker way to compare the sequences
of two individuals.
This technique involves identifying differences in the repetitive DNA regions, in which a small
stretch of DNA is repeated many times. During density gradient centrifugation, this repetitive
DNA separates from the bulk genomic DNA as distinct peaks: the bulk DNA forms a major peak,
while the repetitive DNA forms smaller peaks called satellite DNA.
Satellite DNA is classified into categories such as micro-satellites and mini-satellites based on
base composition (A:T rich or G:C rich), number of repetitive units, length of segment etc.
These sequences do not code for any protein but are abundant in the human genome. They
also show a high degree of polymorphism i.e. differences in DNA sequence and therefore,
form the basis of DNA fingerprinting.
DNA from every tissue of an individual, such as hair follicle, saliva, skin, bone etc., shows the
same degree of polymorphism. Thus, these polymorphisms are a very important identification tool
in forensic applications.
Moreover, since polymorphisms are passed on from parents to children, this fingerprinting
technique is also the basis of paternity testing.
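The inheritance logic behind paternity testing can be sketched in Python. The sketch assumes each person's fingerprint is recorded, for each polymorphic region, as the two variants they carry (one inherited from each parent), written here as repeat counts. The region names, the numbers and the function name are all hypothetical, and new mutations and statistical weighting are ignored.

```python
def consistent_with_parentage(child, mother, alleged_father):
    """Check that, at every region, the child's two variants can be split
    into one inherited from the mother and one from the alleged father."""
    for region, (variant_1, variant_2) in child.items():
        from_mother = set(mother[region])
        from_father = set(alleged_father[region])
        option_a = variant_1 in from_mother and variant_2 in from_father
        option_b = variant_2 in from_mother and variant_1 in from_father
        if not (option_a or option_b):
            return False
    return True


# Hypothetical fingerprints: region name -> the two variants carried.
child = {"region_A": (12, 15), "region_B": (7, 9)}
mother = {"region_A": (12, 14), "region_B": (9, 11)}
man_1 = {"region_A": (15, 18), "region_B": (7, 7)}
man_2 = {"region_A": (13, 18), "region_B": (7, 10)}

print(consistent_with_parentage(child, mother, man_1))
print(consistent_with_parentage(child, mother, man_2))
```

A match at every region only shows that the man cannot be excluded; a mismatch, as for the second man here, argues against paternity.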
Polymorphism
Polymorphisms are variations at the genetic level that arise due to mutations. In an individual,
new mutations can arise either in somatic cells or germ cells i.e. cells that generate sperm and
ovum. If the germ cell mutation doesn’t affect the individual’s ability to reproduce, then it is
passed on to the next generation and thus, spreads in the population.
Technique
Alec Jeffreys initially developed the technique of DNA fingerprinting using, as a probe, a
satellite DNA that shows a very high degree of polymorphism. This DNA is called Variable Number
of Tandem Repeats (VNTR). VNTR belongs to the class of mini-satellites. Here, a small DNA sequence is
arranged in many copies. The copy number varies between individuals and the number of
repeats shows a high degree of polymorphism.
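To make the idea of a variable copy number concrete, the following Python sketch counts how many back-to-back copies of a repeat unit occur in a sequence. The repeat unit, the flanking bases and the function name are made up for readability; real mini-satellite repeat units are typically longer.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


repeat_unit = "GATA"  # hypothetical repeat unit

# Two hypothetical individuals with the same flanking DNA
# but different numbers of repeats at the same location.
person_1 = "CCT" + repeat_unit * 5 + "TTG"
person_2 = "CCT" + repeat_unit * 9 + "TTG"

print(longest_tandem_run(person_1, repeat_unit))
print(longest_tandem_run(person_2, repeat_unit))
```

Because the copy number differs, a fragment containing this region has a different length in each individual, which is what gives each person a distinct band pattern.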
The technique of DNA fingerprinting involves Southern blot hybridization using radiolabelled
VNTR as a probe. The steps are:
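• Sample collection.
• DNA isolation.
• Digestion of the DNA with restriction endonucleases.
• Separation of the DNA fragments by gel electrophoresis.
• Transfer (blotting) of the separated fragments to a synthetic membrane such as
nitrocellulose or nylon.
• Hybridization with the radiolabelled VNTR probe.
• Detection of the hybridized DNA fragments by autoradiography.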
Apart from forensic science and paternity testing, this technique is also useful in
determining population and genetic diversities. Therefore, many different probes are used
currently to generate DNA fingerprints.
Gene Expression
We now know that genes encode proteins and proteins control the functions of a cell. Are all
the genes in a cell expressed at the same time? And, are all genes expressed all the time? No!
Doing so would not only waste cellular energy but also disturb the balance within a cell.
This is the reason why gene expression is regulated. Exactly how are genes regulated? Let’s find
out.
Protein synthesis begins at transcription, ends at translation and involves multiple steps.
Therefore, regulation of gene expression can happen at any of these steps. In eukaryotes, gene
regulation occurs at any of the following steps:
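• Transcriptional level (formation of the primary transcript).
• Processing level (regulation of splicing).
• Transport of mRNA from the nucleus to the cytoplasm.
• Translational level.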
In prokaryotes, the main site for regulation of gene expression is transcription initiation. Within
a transcription unit, the activity of RNA polymerase at the promoter is regulated by ‘accessory
proteins’, which affect the ability of RNA polymerase to recognize start sites. These
proteins can act either positively (activators) or negatively (repressors).
In prokaryotic DNA, the accessibility of the promoter depends on the interaction of proteins
with sequences called operators. In most operons, the operator is adjacent to the promoter
elements. Moreover, in most cases, the operator has a repressor protein bound to it.
Each operon has its own specific operator and repressor. Let’s understand this
better using the lac operon as an example.
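The lac operon consists of:
• One regulatory gene – The i gene, where ‘i’ is derived from ‘inhibitor’. This gene codes
for the repressor of the lac operon.
• Three structural genes – z, y and a:
1. The z gene codes for the enzyme beta-galactosidase, which hydrolyses lactose into
galactose and glucose.
2. The y gene codes for the enzyme permease that increases the permeability of the
cell to beta-galactosides.
3. The a gene codes for the enzyme transacetylase.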
In the absence of lactose, the repressor encoded by the i gene binds to the operator
region of the operon. This prevents RNA polymerase from transcribing the structural genes
(z, y, a) of the operon. Therefore, if there is no lactose, the operon does not produce the
enzymes for its utilization. Because the repressor switches the operon off, its action on the
lac operon is called negative regulation.
In the presence of lactose, the repressor binds to lactose and is inactivated, so it can no longer
bind to the operator. Thus, RNA polymerase is free to transcribe the genes in the operon.
Therefore, if lactose is present, the operon produces the enzymes for its utilization. In
essence, the presence of the
substrate i.e. lactose regulates the synthesis of enzymes for its utilization.
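The on/off behaviour of the lac operon described above can be summarized as a small piece of Boolean logic. The Python sketch below models only the repressor-based negative regulation covered in this section; the function and variable names are illustrative.

```python
def lac_operon_transcribed(lactose_present: bool) -> bool:
    """Return True if RNA polymerase can transcribe the z, y and a genes."""
    # The repressor is active only when there is no lactose to inactivate it.
    repressor_active = not lactose_present
    # An active repressor sits on the operator and blocks RNA polymerase.
    operator_blocked = repressor_active
    return not operator_blocked


for lactose in (False, True):
    state = "ON" if lac_operon_transcribed(lactose) else "OFF"
    print(f"lactose present: {lactose} -> operon {state}")
```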