Unit 1 Bioinformatics
Introduction to Bioinformatics
Computer Fundamentals: An introduction
Hardware Basics
❑ CPU (central processing unit) - the “brain” of the machine, where all the basic
operations are carried out, such as adding two numbers or performing logical operations.
❑ Main Memory – stores programs & data. CPU can ONLY directly access info stored in
the main memory, called RAM (Random Access Memory). Main memory is fast, but
volatile.
Perl
The Practical Extraction and Report Language (Perl) has long been one of the most heavily
used programming languages in bioinformatics. It is particularly adept at handling arbitrary
strings of text and detecting patterns within data, which makes it well suited to working
with protein and DNA sequences. In addition, it features a very flexible grammar that
allows one to write in a variety of styles, ranging from simple to complex. Perl has been
widely used in genomics, including by the Human Genome Project and TIGR. It is distributed
under the free, open source Artistic License and has been widely adopted by the open
source programming community, resulting in numerous useful add-on modules for Perl
(www.perl.org).
Python
Python is a simple object-oriented scripting language that is well suited for developing
bioinformatics applications and is available under a free open source license. It is
particularly easy to read and understand, and has become increasingly popular in
bioinformatics applications (www.python.org).
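The kind of string handling and pattern detection that makes scripting languages useful for sequence work can be sketched in Python as follows. This is only a minimal illustration: the function names `find_motif` and `gc_content` and the example sequence are made up for this sketch, not taken from any particular library or genome. GAATTC is the recognition site of the restriction enzyme EcoRI.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping regex match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


def gc_content(sequence):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical example sequence, invented for illustration only.
dna = "ATGCGAATTCGGATCCGAATTCTAG"

# Locate every EcoRI recognition site (GAATTC) in the sequence.
ecori_sites = find_motif(dna, "GAATTC")
print(ecori_sites)                 # [(4, 'GAATTC'), (16, 'GAATTC')]
print(round(gc_content(dna), 2))   # 0.44
```

Because `pattern` is a regular expression, the same function also accepts degenerate motifs such as `"GA[AT]TTC"`.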
Java
Java is a powerful object-oriented, cross-platform programming language developed and
made available for free by Sun Microsystems. It was originally developed for controlling
consumer appliances, was then repurposed for web development, and was later expanded
further. It is particularly well suited for developing complex projects. Although it is
simpler than C++, the object-oriented version of C, it still takes significant effort to
master. It is very powerful and has been used in a number of major bioinformatics
projects (www.java.com).
Supercomputers
• A supercomputer is a computer that runs at very high
speed and performs calculations very efficiently.
• Supercomputers first came into use during the 1960s.
• In principle, supercomputers are similar to ordinary
computers, but they have many more processors, which
gives them their high speed.
• Today, supercomputers have largely been replaced by
parallel supercomputers, in which thousands of processors
are connected within a single system (Hoffman et al.,
1990; Hill et al., 1999; Prodan et al., 2007).
Role of Supercomputers
• Supercomputers play an important role in the field of
computational science, and are used for a wide range
of computational tasks in various fields, including
quantum mechanics, weather forecasting, climate
research, oil and gas exploration, molecular modeling
(computing the structures and properties of chemical
compounds, biological macromolecules, polymers, and
crystals), and physical simulations (such as simulations
of the early moments of the universe, airplane and
spacecraft aerodynamics, the detonation of nuclear
weapons, and nuclear fusion).
Supercomputers are also involved in:
Computational Methods for Protein Structure Prediction
Three-dimensional (3D) visualization tools help determine the different levels of
protein structure: primary, secondary, and tertiary.
The advent of sophisticated supercomputers has made it possible to predict these
structures computationally. Predicting the interactions between
proteins and their ligands, as well as protein-surface interactions, is also made
possible by the use of advanced computational methods.
Homology Modeling
Homology modeling, or comparative modeling, is generally regarded as the most reliable
computational method for modeling the 3D structures of proteins. It predicts the unknown
structure of a target protein using the known structure of a homologous template protein.
Homology modeling is based on the principle that evolutionarily related proteins
have similar structures.
Therefore, the structure of the target protein can be modeled using the template
structure.
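Choosing a template usually starts from how similar the target and candidate template sequences are. The sketch below computes percent identity over an existing pairwise alignment and picks the most similar template. It assumes the sequences are already aligned (equal length, `-` for gaps). The 30% cutoff is a commonly quoted rule of thumb used here as an assumed default, and the function names and sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of non-gap alignment columns in which the two residues match."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_template(alignments, min_identity=30.0):
    """Return the name of the most similar template, or None if none passes.

    alignments maps a template name to (aligned_target, aligned_template).
    min_identity is an assumed rule-of-thumb cutoff, not a fixed standard.
    """
    scores = {name: percent_identity(t, s) for name, (t, s) in alignments.items()}
    passing = {name: score for name, score in scores.items() if score >= min_identity}
    return max(passing, key=passing.get) if passing else None


# Hypothetical pre-aligned sequences, invented for illustration only.
alignments = {
    "template_A": ("MKTAYIAKQR", "MKTAYLAKQR"),
    "template_B": ("MKTAYIAKQR", "MR-AWIGK-R"),
}
print(pick_template(alignments))  # template_A
```

Real homology modeling pipelines also weigh alignment coverage and template resolution; identity alone is only a first filter.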
Proteomics
• Proteins are responsible for a vast number of tasks within the cell.
• The complete set of proteins in a cell is referred to as its proteome. The study of
protein structure and function, and of what every protein in the cell is doing,
is known as proteomics.
• The proteome is highly dynamic and changes over time in response to
different environmental stimuli.
• The goal of proteomics is to understand how the structure and function of
proteins allow them to do what they do, what they interact with, and how they
contribute to life processes.
• One application of proteomics is protein “expression profiling,” in which the
proteins expressed in an organism at a given time, in response to a stimulus,
are identified.
• Proteomics can also be used to develop a protein-network map, in which the interactions
among proteins are determined for a particular living system.
• Proteomics can also be applied to map protein modifications in order to determine the
differences between a wild type and a genetically modified organism.
• It is also used to study protein-protein interactions involved in plant defense
reactions.
For example, proteomics research at Iowa State
University, USA includes:
Source: http://biotech.nature.com
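Expression profiling, as described in the bullets above, comes down to comparing protein abundance with and without a stimulus. A minimal sketch, assuming abundance has already been measured for each protein in both conditions: the protein names, the numbers, and the two-fold cutoff are all hypothetical choices made for illustration.

```python
import math


def log2_fold_changes(control, treated):
    """Return log2(treated / control) for proteins measured in both conditions."""
    changes = {}
    for protein in control.keys() & treated.keys():
        c, t = control[protein], treated[protein]
        if c > 0 and t > 0:  # a log ratio is undefined for zero or negative values
            changes[protein] = math.log2(t / c)
    return changes


def responsive_proteins(changes, threshold=1.0):
    """Names of proteins whose abundance changed at least 2**threshold-fold."""
    return sorted(p for p, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical abundance values (arbitrary units), invented for illustration.
control = {"P1": 100.0, "P2": 50.0, "P3": 80.0}
treated = {"P1": 400.0, "P2": 55.0, "P3": 20.0}

changes = log2_fold_changes(control, treated)
print(responsive_proteins(changes))  # ['P1', 'P3']
```

Here P1 rises four-fold and P3 falls four-fold after the stimulus, while P2 barely changes and is not reported.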
Molecular Phylogeny
• A phylogenetic tree is a visual representation of the relationships
between different organisms, showing the path through
evolutionary time from a common ancestor to different
descendants.
• Similarities and divergence among related biological sequences
revealed by sequence alignment often have to be rationalized and
visualized in the context of phylogenetic trees.
• Thus, molecular phylogenetics is a fundamental aspect of
bioinformatics.
• Molecular phylogenetics is the branch of phylogeny that analyzes
genetic, hereditary molecular differences, predominantly in DNA
sequences, to gain information on an organism’s evolutionary
relationships.
• The result of a molecular phylogenetic analysis is expressed in
a phylogenetic tree.
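Distance-based tree building starts from pairwise differences between aligned sequences. The sketch below computes the p-distance (the proportion of differing sites) for every pair and reports the closest pair, which is the first pair a clustering method such as UPGMA would join. The taxon names and sequences are hypothetical, and the input is assumed to be already aligned.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


def distance_table(alignment):
    """Map each pair of taxon names to its p-distance."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


def closest_pair(alignment):
    """Return the pair of taxa with the smallest p-distance."""
    table = distance_table(alignment)
    return min(table, key=table.get)


# Hypothetical aligned DNA sequences, invented for illustration only.
alignment = {
    "taxon_A": "ATGCTAGCTA",
    "taxon_B": "ATGCTAGCTT",
    "taxon_C": "ATGATAGGTT",
}
print(closest_pair(alignment))  # ('taxon_A', 'taxon_B')
```

The p-distance ignores multiple substitutions at the same site, so real analyses usually apply a substitution model correction before building the tree.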
Computer-Aided Drug Design (structure-
and ligand-based approaches)
• Major types of approaches in CADD
• There are two main approaches to drug design
through CADD:
• 1. Structure-based drug design / direct
approach
• 2. Ligand-based drug design / indirect
approach
1. Structure-based drug design
• In SBDD, the structure of the target protein is known, and the
interaction or binding affinity of every tested compound is calculated
after docking, in order to design a new drug molecule
that interacts better with the target protein.
• SBDD runs through multiple cycles before the optimized lead
reaches clinical trials.
• The first cycle comprises isolation, purification and structure
determination of the target protein by one of three key
methods: X-ray crystallography, homology modeling or
NMR.
• The second cycle comprises structure determination of the
protein in complex with the most promising lead from the first
cycle, one showing at least micromolar inhibition in vitro.
This reveals sites on the compound that can be optimized to
further increase potency.
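The selection step between cycles can be sketched as a simple filter and ranking. This is a minimal illustration under stated assumptions: docking scores are treated as energy-like values where more negative means stronger predicted binding, in vitro potency is given as IC50 in micromolar, and the 10 µM cutoff, compound names, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    docking_score: float  # assumed energy-like score: more negative is better
    ic50_um: float        # assumed in vitro IC50 in micromolar: lower is better


def rank_by_docking(compounds):
    """Return compounds ordered from best to worst predicted binding."""
    return sorted(compounds, key=lambda c: c.docking_score)


def select_lead(compounds, max_ic50_um=10.0):
    """Return the best-docking compound with IC50 at or below the cutoff."""
    candidates = [c for c in compounds if c.ic50_um <= max_ic50_um]
    if not candidates:
        return None
    return rank_by_docking(candidates)[0]


# Hypothetical screening results, invented for illustration only.
compounds = [
    Compound("cpd_1", docking_score=-7.2, ic50_um=5.0),
    Compound("cpd_2", docking_score=-9.1, ic50_um=250.0),
    Compound("cpd_3", docking_score=-8.4, ic50_um=0.8),
]

lead = select_lead(compounds)
print(lead.name)  # cpd_3
```

Note that cpd_2 has the best docking score but is rejected because its measured inhibition is too weak, which is why both computed and experimental values feed into the choice of lead.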
2. Ligand-based drug design