Unit I Bioinformatics-Introduction

Uploaded by

vampier7277788
BIOINFORMATICS

INTRODUCTION:
Unit I.
• An overview of Bioinformatics, scope
Bioinformatics:
• Study of information content and information flow in
biological systems and processes
• Field of science in which biology, computer science and
information technology merge to form a single discipline
• The use of information technology to acquire, store,
manage, share, analyze, represent and transmit genetic
data, etc.
• The pioneering work of Margaret Dayhoff, Richard Eck and
Robert Ledley in computer-aided analysis of protein data
dates back to around 1960.
• Dayhoff, Eck, and Ledley capitalized on their experience and
training in computing, mathematics, and life sciences in collecting
and organizing protein sequences, sequence analysis, and studies
of protein evolution
• Their work could be regarded as the direct ancestor of modern
bioinformatics
• In 1965, Dayhoff, Eck, and a couple of colleagues compiled the first
Atlas of Protein Sequence and Structure, which had around 50
sequences known at the time.
• The second volume was published in 1966 and had a little over 100
sequences
• This compilation of protein sequence and structure information
was the predecessor of the current gene and protein databases
that form the backbone of contemporary bioinformatics
• In subsequent years, as more and more protein sequences were
reported, the Atlas grew in size and popularity under the
leadership of Dayhoff
• Eventually, this database became The Protein Information
Resource (PIR) database, now maintained at Georgetown
University
• Margaret Dayhoff (Professor at Georgetown University Medical
Center)
• Richard Eck (Chemical Engineering and Plant Biology)
• Robert Ledley (Theoretical Physics and Dentistry)
• Margaret Dayhoff was a professor at Georgetown University Medical
Center
• As an independent researcher, Dayhoff brought her background of
mathematics, chemistry, and computing to address problems in
biology, particularly protein chemistry, and became the pioneer in the
application of mathematics and computational methods to
biochemistry
• One of her most important contributions was developing, together
with Richard Eck, the single-letter code for amino acids that is used by
all protein analysis tools
• She developed a computer algorithm for protein-sequence alignment,
which was (correctly) thought to reveal the evolutionary history of
the aligned proteins
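The single-letter code mentioned above can be illustrated with a short Python sketch. The mapping is the standard one for the 20 common amino acids; the function name `to_one_letter` and the example peptide are illustrative assumptions, not part of the original slides.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string.

    Hypothetical helper for illustration; raises KeyError on unknown names.
    """
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


# Illustrative five-residue peptide (an assumption chosen for the example).
peptide = ["Gly", "Ile", "Val", "Glu", "Gln"]
print(to_one_letter(peptide))  # GIVEQ
```

Writing one character per residue is what lets a protein be stored and compared as an ordinary text string, which most sequence-analysis tools rely on.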
• Richard Eck studied chemical engineering and plant biology. In
1961, Eck published a paper in Nature in which he compared all
the sequences of hemoglobin variants, and other proteins such
as insulin, from different species
• He realized that the information on amino-acid sequences could
be organized in different ways in order to produce specific
patterns
• He also identified numerous amino-acid substitutions in proteins
and noted that the pattern of substitutions was not random
• At a conference in 1964, Eck presented a cryptogrammic method
to trace the evolution of proteins
• He suggested that, using this result, one could calculate the degree
of relatedness of each protein with reference to its ancestors, and
draw a family tree in which the distances between the branches
represented a quantitative measure of relatedness
• Thus, Eck outlined the basis of reconstruction of a phylogenetic
tree.
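A quantitative measure of relatedness of the kind described above can be sketched in Python. This is not Eck's cryptogrammic method; it substitutes a much simpler measure, the fraction of differing positions between two already-aligned sequences (often called the p-distance). The sequences and species labels are made-up toy data.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {
        (a, b): p_distance(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned fragments from three species (toy data).
seqs = {"sp1": "MKTAYIAK", "sp2": "MKTAYLAK", "sp3": "MRTSYLGK"}
for pair, d in distance_matrix(seqs).items():
    print(pair, d)
```

In this toy data, sp1 and sp2 differ at one of eight positions (0.125), so a tree built from these distances would join them before attaching sp3.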
• Robert Ledley, who studied theoretical physics and dentistry,
envisioned an important application of computers to sequence
analysis
• He suggested that, after a polypeptide chain is cut into many
overlapping fragments whose sequences are determined by
peptide sequencing, computers could be used to reassemble the
partial sequences into the full sequence
• Thus, Ledley suggested that computers could assist biochemists
in their efforts to determine protein sequences
• He invited Dayhoff to join the staff of the National Biomedical
Research Foundation (NBRF) in 1960 to continue investigating this
question
• Dayhoff and Ledley wrote FORTRAN programs that could direct
the assembly of partial peptide sequences in the right order in less
than 5 minutes
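The idea of reassembling overlapping fragments can be sketched in Python. This is a minimal greedy approach written for illustration, not a reconstruction of the original FORTRAN programs; the function names, the `min_len` threshold, and the toy fragments are assumptions.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=2):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the list of sequences left when no overlap of at least
    min_len residues remains (a single sequence if assembly succeeds).
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Toy overlapping peptide fragments in arbitrary order (illustrative data).
fragments = ["CTSICS", "GIVEQC", "ICSLYQ", "EQCCTS"]
print(greedy_assemble(fragments))  # ['GIVEQCCTSICSLYQ']
```

A greedy merge like this can be misled by repeated stretches within a sequence, which is one reason real assembly methods are considerably more involved.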
• Both Dayhoff and Eck became involved in evolutionary studies of
proteins while Ledley continued with his interest in the
application of computers in biology
• Dayhoff started playing an increasingly important role in protein-
sequence analysis and continued to contribute to evolutionary
biology based on her studies on protein sequences
• She published the first reconstruction of a phylogenetic tree using
a maximum parsimony method
• She also developed the first amino-acid substitution matrix for
studying protein evolution, called the PAM matrix
• PAM stands for point accepted mutation (also referred to as
percent accepted mutation) because one PAM unit represents one
accepted point mutation per 100 amino acid residues
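The "per 100 residues" unit can be made concrete with a small Python sketch. It assumes a made-up three-letter alphabet and a toy one-PAM mutation probability matrix in which each residue has a 1% chance of being replaced; larger evolutionary distances are then obtained by multiplying the matrix by itself, which is how a PAM250 matrix is extrapolated from PAM1. The numbers are invented for illustration and are not Dayhoff's values, and the final step of converting probabilities into log-odds scores is omitted.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy 1-PAM matrix for a hypothetical 3-residue alphabet: entry [i][j] is
# the probability that residue i is replaced by residue j over one PAM
# unit. Each row sums to 1 and each diagonal entry is 0.99, so on average
# 1 residue in 100 changes.
toy_pam1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

After 250 PAM units the diagonal entries of the toy matrix fall well below 0.99, because repeated substitutions accumulate and some sites change more than once.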
• A publication by Dayhoff in the popular science magazine
Scientific American, entitled "Computer Analysis of Protein
Evolution", can be regarded as one of the most important initial
publications in bioinformatics and molecular phylogenetics
• For her enormous pioneering contributions, Margaret Dayhoff is
popularly regarded as the founder of modern bioinformatics.
• The proximate goal of bioinformatics is to develop an
understanding of biological systems through analysis and
integration of the information obtained on genes and proteins, as
well as to develop new tools and continuously improve the existing
set of tools for diverse types of analyses
• Bioinformatics also aims to develop tools that help in the
management of and access to data and information, including
improved search and retrieval capability of genomic data and
information from various types of databases
• The term “bioinformatics” was coined by Paulien Hogeweg and
Ben Hesper in 1978
• The term had been used by Hogeweg and Hesper since the
beginning of the 1970s, but was formally coined in 1978 in an
article written in Dutch
• In the beginning, the term was used to mean the study of
informatic processes in biotic systems.
• Some examples of common bioinformatic tools and analyses that are
continuously being improved and refined are:
• Data capture and storage capability
• The usability of databases
• Data analysis
• Nucleic acid and protein sequence analysis and sequence annotation
• Structural analysis of proteins and prediction of protein structure,
including three-dimensional (3D) structure; protein domain prediction
• Gene prediction
• Analysis of functional studies
• Analysis of gene and protein networks
• Phylogenetic analysis
• Therefore, as more data accumulate in the databases and more
scientific information becomes available, the progress of science
and its prognostic ability will require and hence dictate the
development of new bioinformatic tools
• Acquisition of more data and information, storage of all that
information, expansion of databases, new strategies needed for
analysis, and advances in computing power are all expected to
facilitate the analysis of large volumes of data and discovery of
new biological principles and insights from which unifying
principles of life and its evolution can be discerned
• The analytical tools in bioinformatics are computer algorithms
and statistics
• Improvements in the capacity of existing tools and the
development of new tools are both driven by the need for newer
dimensions and greater speed of analysis, as well as the ability to
handle an ever-increasing amount of data
• However, the success and prediction accuracy of bioinformatic
analysis ultimately depend on our knowledge of the biology of
organisms
BIOINFORMATICS - TECHNICAL TOOLBOX
• Bioinformatic analysis requires data (such as sequence
information), databases, and analysis tools
• Databases are built from data obtained through wet laboratory
experiments
• Some of the original nucleotide- and protein-sequence databases
were created more than 30 years ago
• Subsequently, information from these original databases was
utilized to create curated and more refined databases to meet
specific research needs
