Bioinformatics Intro
Bioinformatics: The sum of the computational approaches to analyze, manage, and store
biological data. Bioinformatics involves the analysis of biological information using computers
and statistical techniques; it is the science of developing and utilizing computer databases and
algorithms to accelerate and enhance biological research.
OR
The mathematical, statistical and computing methods that aim to solve biological problems using
DNA and amino acid sequences and related information.
History of Bioinformatics
Modern bioinformatics can be classified into two broad categories: biological science and
computational science. Here are some historical events for both biology and computer
science.
The history of biology from antiquity (B.C.) up to the discovery of genetic inheritance by G.
Mendel in 1865 is extremely sketchy and inaccurate. Mendel's discovery marks the start of
bioinformatics history.
Gregor Mendel is known as the "Father of Genetics". He experimented on the cross-fertilization
of pea plants of the same species that differed in color. He carefully recorded and analyzed
the data. Mendel showed that the inheritance of traits could be more easily explained if it was
controlled by factors passed down from generation to generation.
The understanding of genetics has advanced remarkably in the last thirty years. In 1972, Paul
Berg made the first recombinant DNA molecule using ligase, and in 1973 Stanley Cohen,
Annie Chang and Herbert Boyer produced the first recombinant DNA organism. The advancement
of computing in the 1960s and 1970s laid down the basic methodology of bioinformatics. However,
it was only in the 1990s, with the arrival of the Internet, that the full-fledged field of
bioinformatics was born.
Here are some of the major events in bioinformatics over the centuries. Many of the events
listed occurred long before the term "bioinformatics" was coined.
Bioinformatics Events
1665 Robert Hooke published Micrographia, which described the cellular structure of cork. He also
described microscopic examinations of fossilized plants and animals, comparing their
microscopic structure to that of the living organisms they resembled. He argued for an
organic origin of fossils, and suggested a plausible mechanism for their formation.
1683 Antoni van Leeuwenhoek discovered bacteria.
1686 John Ray, in his book "Historia Plantarum", catalogued and described 18,600
kinds of plants. His book gave the first definition of species based upon common descent.
1843 Richard Owen elaborated the distinction of homology and analogy.
1864 Ernst Haeckel (Häckel) outlined the essential elements of modern zoological classification.
1865 Gregor Mendel (1822-1884), Austria, established the theory of genetic inheritance.
1902 The chromosome theory of heredity is proposed by Sutton and Boveri, working
independently.
1905 The word "genetics" is coined by William Bateson.
1913 First ever linkage map created by Columbia undergraduate Alfred Sturtevant (working with
T.H. Morgan).
1930 Tiselius, Uppsala University, Sweden, introduces a new technique, electrophoresis, for
separating proteins in solution: "The moving-boundary method of studying the
electrophoresis of proteins" (published in Nova Acta Regiae Societatis Scientiarum
Upsaliensis, Ser. IV, Vol. 7, No. 4).
1946 Lederberg and Tatum show that genetic material can be transferred laterally between
bacterial cells.
1952 Alfred Day Hershey and Martha Chase show, through their bacteriophage experiments,
that DNA alone carries genetic information.
1961 Sidney Brenner, François Jacob and Matthew Meselson identify messenger RNA.
1962 Pauling's theory of molecular evolution
1965 Margaret Dayhoff's Atlas of Protein Sequence and Structure
1970 Needleman-Wunsch algorithm
1977 DNA sequencing and software to analyze it (Staden)
1981 Smith-Waterman algorithm developed
1981 The concept of a sequence motif (Doolittle)
1982 GenBank Release 3 made public
1982 Phage lambda genome sequenced
1983 Sequence database searching algorithm (Wilbur-Lipman)
1985 FASTP/FASTN: fast sequence similarity searching
1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
1990 BLAST: fast sequence similarity searching
1991 EST: expressed sequence tag sequencing
1993 Sanger Centre, Hinxton, UK
1994 EMBL European Bioinformatics Institute, Hinxton, UK
1995 First bacterial genomes completely sequenced
1996 Yeast genome completely sequenced
1997 PSI-BLAST
1998 Worm (multicellular) genome completely sequenced
1999 Fly genome completely sequenced
2000 Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of
metabolic networks. Nature 2000 Oct 5;407(6804):651-4.
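The 1970 Needleman-Wunsch entry refers to a dynamic-programming method for globally aligning two sequences. As a rough illustration only, the sketch below computes just the optimal global alignment score under a simple linear gap penalty. The function name `needleman_wunsch_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not parameters from the original paper.

```python
# Minimal sketch of Needleman-Wunsch global alignment scoring.
# Assumption: illustrative scores (match=+1, mismatch=-1, gap=-1) and a
# linear gap penalty; real tools use substitution matrices and affine gaps.


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal move: pair a[i-1] with b[j-1] (match or mismatch).
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = dp[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = dp[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            dp[i][j] = max(diag, up, left)
    return dp[n][m]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GCATGCU"))  # prints 0
```

Filling the table takes time proportional to the product of the two sequence lengths. Recovering the alignment itself would need a traceback through the table, which this sketch omits. The Smith-Waterman algorithm (1981) adapts the same recurrence to local alignment by never letting a cell score drop below zero.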
Phylogenomics is the intersection of the fields of evolution and genomics.[1] The term has been
used in multiple ways to refer to analysis that involves genome data and evolutionary
reconstructions. It is a group of techniques within the larger fields of phylogenetics and
genomics. Phylogenomics draws information by comparing entire genomes, or at least large
portions of genomes.[2] Phylogenetics compares and analyzes the sequences of single genes, or a
small number of genes, as well as many other types of data. Four major areas fall under
phylogenomics:
Molecular medicine:
The human genome will have profound effects on the fields of biomedical research and clinical
medicine. Every disease has a genetic component. This may be inherited (as is the case with an
estimated 3000-4000 hereditary diseases, including cystic fibrosis and Huntington's disease) or a
result of the body's response to an environmental stress which causes alterations in the genome
(e.g. cancers, heart disease, diabetes).
The completion of the human genome means that we can search for the genes directly associated
with different diseases and begin to understand the molecular basis of these diseases more
clearly. This new knowledge of the molecular mechanisms of disease will enable better
treatments, cures and even preventative tests to be developed.
Personalised medicine: Clinical medicine will become more personalised with the development
of the field of pharmacogenomics. This is the study of how an individual's genetic inheritance
affects the body's response to drugs. At present, some drugs fail to make it to the market because
a small percentage of the clinical patient population shows adverse effects to a drug due to
sequence variants in their DNA. As a result, potentially life-saving drugs never make it to the
marketplace. Today, doctors have to use trial and error to find the best drug to treat a particular
patient as those with the same clinical symptoms can show a wide range of responses to the same
treatment. In the future, doctors will be able to analyse a patient's genetic profile and prescribe
the best available drug therapy and dosage from the beginning.
Preventative medicine: With the specific details of the genetic mechanisms of diseases being
unraveled, the development of diagnostic tests to measure a person's susceptibility to different
diseases may become a distinct reality. Preventative actions, such as lifestyle changes or
treatment at the earliest possible stages when it is more likely to be successful, could result
in huge advances in our struggle to conquer disease.
Gene therapy: In the not too distant future, the potential for using genes themselves to treat
disease may become a reality. Gene therapy is the approach used to treat, cure or even prevent
disease by changing the expression of a person’s genes. Currently, this field is in its infancy,
with clinical trials for many different types of cancer and other diseases ongoing.
Drug development: At present all drugs on the market target only about 500 proteins. With an
improved understanding of disease mechanisms and using computational tools to identify and
validate new drug targets, more specific medicines that act on the cause, not merely the
symptoms, of the disease can be developed. These highly specific drugs promise to have fewer
side effects than many of today's medicines.
Microbial genome applications: Microorganisms are ubiquitous; that is, they are found
everywhere. They have been found surviving and thriving in extremes of heat, cold, radiation,
salt, acidity and pressure. They are present in the environment, our bodies, the air, food and
water. Traditionally, use has been made of a variety of microbial properties in the baking,
brewing and food industries. The arrival of the complete genome sequences and their potential to
provide a greater insight into the microbial world and its capacities could have broad and far
reaching implications for environment, health, energy and industrial applications. For these
reasons, in 1994, the US Department of Energy (DOE) initiated the MGP (Microbial Genome
Project) to sequence genomes of bacteria useful in energy production, environmental cleanup,
industrial processing and toxic waste reduction. By studying the genetic material of these
organisms, scientists can begin to understand these microbes at a very fundamental level and
isolate the genes that give them their unique abilities to survive under extreme conditions.
Waste cleanup: Deinococcus radiodurans is known as the world's toughest bacterium and it is the
most radiation resistant organism known. Scientists are interested in this organism because of its
potential usefulness in cleaning up waste sites that contain radiation and toxic chemicals.
Climate change studies: Increasing levels of carbon dioxide emission, mainly through the
expanding use of fossil fuels for energy, are thought to contribute to global climate change.
Recently, the DOE (Department of Energy, USA) launched a program to decrease atmospheric
carbon dioxide levels. One method of doing so is to study the genomes of microbes that use
carbon dioxide as their sole carbon source.
Alternative energy sources: Scientists are studying the genome of the microbe Chlorobium
tepidum, which has an unusual capacity for generating energy from light.
Biotechnology: The archaeon Archaeoglobus fulgidus and the bacterium Thermotoga maritima
have potential for practical applications in industry and government-funded environmental
remediation. These microorganisms thrive in water temperatures above the boiling point and
therefore may provide the DOE, the Department of Defense and private companies with
heat-stable enzymes suitable for use in industrial processes.
Lactococcus lactis is one of the most important micro-organisms involved in the dairy industry;
it is a non-pathogenic, spherical (coccus-shaped) bacterium that is critical for manufacturing
dairy products like buttermilk, yogurt and cheese. This bacterium, Lactococcus lactis ssp., is
also used to prepare pickled vegetables, beer, wine, some breads, sausages and other fermented
foods. Researchers anticipate that understanding the physiology and genetic make-up of this
bacterium will prove invaluable for food manufacturers as well as the pharmaceutical industry,
which is exploring the capacity of L. lactis to serve as a vehicle for delivering drugs.
Antibiotic resistance: Scientists have been examining the genome of Enterococcus faecalis, a
leading cause of bacterial infection among hospital patients. They have discovered a virulence
region made up of a number of antibiotic-resistance genes that may contribute to the bacterium's
transformation from a harmless gut bacterium to a menacing invader. The discovery of the region,
known as a pathogenicity island, could provide useful markers for detecting pathogenic strains
and help to establish controls to prevent the spread of infection in hospital wards.
Forensic analysis of microbes: Scientists used their genomic tools to help distinguish the
strain of Bacillus anthracis that was used in the autumn 2001 terrorist attack in Florida
from closely related anthrax strains.
The reality of bioweapon creation: Scientists have recently built poliovirus using
entirely artificial means. They did this using genomic data available on the Internet and materials
from a mail-order chemical supplier. The research was financed by the US Department of Defense
as part of a biowarfare response program to prove to the world the reality of bioweapons. The
researchers also hope their work will discourage officials from ever relaxing programs of
immunisation. This project has been met with very mixed reactions.
Evolutionary studies: The sequencing of genomes from all three domains of life, eukaryota,
bacteria and archaea means that evolutionary studies can be performed in a quest to determine
the tree of life and the last universal common ancestor.
Crop improvement: Comparative genetics of the plant genomes has shown that the organisation
of their genes has remained more conserved over evolutionary time than was previously
believed. These findings suggest that information obtained from the model crop systems can be
used to suggest improvements to other food crops. At present the complete genomes of
Arabidopsis thaliana (thale cress) and Oryza sativa (rice) are available.
Insect resistance: Genes from Bacillus thuringiensis that can control a number of serious pests
have been successfully transferred to cotton, maize and potatoes. This new ability of the plants to
resist insect attack means that the amount of insecticides being used can be reduced and hence
the nutritional quality of the crops is increased.
Scientists have recently succeeded in transferring genes into rice to increase levels of Vitamin A,
iron and other micronutrients. This work could have a profound impact in reducing occurrences
of blindness and anaemia caused by deficiencies in Vitamin A and iron respectively. Scientists
have inserted a gene from yeast into the tomato, and the result is a plant whose fruit stays longer
on the vine and has an extended shelf life
Progress has been made in developing cereal varieties that have a greater tolerance for soil
alkalinity, free aluminium and iron toxicities. These varieties will allow agriculture to succeed in
poorer soil areas, thus adding more land to the global production base. Research is also in
progress to produce crop varieties capable of tolerating reduced water conditions.
Veterinary Science: Sequencing projects of many farm animals including cows, pigs and sheep
are now well under way in the hope that a better understanding of the biology of these organisms
will have huge impacts for improving the production and health of livestock and ultimately have
benefits for human nutrition.
Comparative Studies: Analysing and comparing the genetic material of different species is an
important method for studying the functions of genes, the mechanisms of inherited diseases and
species evolution. Bioinformatics tools can be used to make comparisons between the numbers,
locations and biochemical functions of genes in different organisms.
Organisms that are suitable for use in experimental research are termed model organisms. They
have a number of properties that make them ideal for research purposes, including short life
spans and rapid reproduction; they are also easy to handle, inexpensive and can be manipulated
at the genetic level. A widely used model organism for studying human biology is the mouse.
Mouse and human are closely related, and for the most part we see a one-to-one correspondence
between genes in the two species. Manipulation of the mouse at the molecular level and genome
comparisons between the two species are revealing detailed information on the functions of
human genes, the evolutionary relationship between the two species and the molecular
mechanisms of many human diseases.
Bioinformatics is defined broadly as the study of the inherent structure of biological information.
It is the marriage of biology and the information sciences. Examples of current bioinformatics
research include the analysis of gene and protein sequences to reveal protein evolution and
alternative splicing, the development of computational approaches to study and predict protein
structure to further understanding of function, the analysis of mass spectrometry data to
understand the connection between phosphorylation and cancer, the development of
computational methods to utilize expression data to reverse engineer gene networks in order to
more completely model cellular biology, and the study of population genetics and its connection
to human disease.