
Unit 1 Bioinformatics

This document provides an introduction to key concepts in bioinformatics and computer science. It discusses hardware components like the CPU and memory, programming languages like Python and C++, and commonly used languages in bioinformatics like Perl, Python, and C. It also mentions the history of bioinformatics and its role in areas like genomics, proteomics, and supercomputing applications like protein structure prediction.

Uploaded by

Isha Chopra
Copyright
© All Rights Reserved

Unit 1

Introduction to Bioinformatics
Computer Fundamentals: An introduction
Hardware Basics

❑ CPU (central processing unit) - the “brain” of the machine, where all the basic
operations are carried out, such as adding two numbers or performing logical operations

❑ Main Memory – stores programs & data. The CPU can ONLY directly access information
stored in main memory, called RAM (Random Access Memory). Main memory is fast, but
volatile (its contents are lost when the power is switched off).

❑ Secondary Memory – provides more permanent storage


o Hard disk (magnetic)
o Optical discs
o Flash drives

❑ Input Devices – keyboard, mouse, etc.

❑ Output Devices – monitor, printer, etc.


Programming Languages
▪ A program is simply a sequence of instructions
telling a computer what to do.
▪ Programming languages are special notations for
expressing computations in an exact and
unambiguous way.
▪ Every structure in a programming language has a precise
form (its syntax) and a precise meaning (its
semantics).
▪ Python is one example of a programming
language. Others include C++, Fortran, Java, Perl,
MATLAB, etc.
High-level vs. machine language
▪ Python, C++, Fortran, Java, and Perl are high-
level computer languages, designed to be
used and understood by humans.
▪ However, the CPU can only execute a very low-level
language known as machine language, so high-level
programs must be translated before they can run.
Most Popular Programming Languages
1. Java
2. C
3. C++
4. C#
5. Python
6. Visual Basic .NET
7. PHP
8. JavaScript
9. Delphi/Object Pascal
10. Swift
11. Perl
12. Ruby
13. Assembly language
14. R
15. Visual Basic

Source: March 2017 TIOBE Programming Community Index
(only Turing-complete languages, so SQL and HTML are not included)
Commonly Used Programming Languages in Bioinformatics
• C
• The C programming language is one of the oldest programming languages still in wide use. A
compiled C program offers excellent performance, and its syntax has influenced most
programming languages in current use (www.lysator.liu.se/c/).

• Perl
• Perl (often expanded as Practical Extraction and Report Language) has historically been one
of the most heavily used programming languages in bioinformatics. It is particularly adept at
handling arbitrary strings of text and detecting patterns within data, which makes it well
suited to working with protein and DNA sequences. In addition, it features a very flexible
grammar that allows one to write in a variety of styles, ranging from simple to complex. Perl
has been widely used in genomics, including by the Human Genome Project and TIGR. It is
distributed under the free, open source Artistic License and has been widely adopted by the
open source programming community, resulting in numerous useful add-on modules
(www.perl.org).
• Python
• A simple object-oriented scripting language that is well suited for developing
bioinformatics applications and is available under a free open source license. It is
particularly easy to read and understand, and has become increasingly popular in
bioinformatics applications (www.python.org).

• Java
• Java is a powerful object-oriented, cross-platform programming language developed and
made available for free by Sun Microsystems. It was originally developed for controlling
consumer appliances, then repurposed for web development and later expanded. It is
particularly well suited for developing complex projects. Although it is simpler than C++,
the object-oriented extension of C, it still takes significant effort to master. It is very
powerful and has been used in a number of major bioinformatics projects (www.java.com).
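The pattern detection that makes scripting languages attractive for sequence work can be sketched in a few lines of Python. This is only an illustration: the protein sequence is made up, the helper name find_motif is hypothetical, and the pattern used (N, then any residue except P, then S or T, then any residue except P) is the commonly cited N-glycosylation motif.

```python
import re

def find_motif(sequence, pattern):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]

# Hypothetical protein fragment, used only for illustration.
protein = "MKNASTLLNPSGANGTW"

# N, then anything but P, then S or T, then anything but P.
hits = find_motif(protein, r"N[^P][ST][^P]")
print(hits)
```

The same function works unchanged on DNA strings, since both kinds of sequence are ordinary text to the regular expression engine.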
Supercomputers
• A supercomputer is a computer with very high processing
speed and computational efficiency.
• Supercomputers first came into use during the 1960s.
• Supercomputers are similar in principle to other
computers but have many more processors, which makes
them faster.
• Today, they have been replaced by parallel
supercomputers, in which thousands of processors
are connected in a single system (Hoffman et al.,
1990; Hill et al., 1999; Prodan et al., 2007).
Role of Supercomputers
• Supercomputers play an important role in the field of
computational science, and are used for a wide range
of computational tasks in various fields, including
quantum mechanics, weather forecasting, climate
research, oil and gas exploration, molecular modeling
(computing the structures and properties of chemical
compounds, biological macromolecules, polymers, and
crystals), and physical simulations (such as simulations
of the early moments of the universe, airplane and
spacecraft aerodynamics, the detonation of nuclear
weapons, and nuclear fusion).
Supercomputers are also involved in:
Computational Methods for Protein Structure Prediction
Three-dimensional (3D) visualization and prediction tools are used to determine
protein structure at its different levels: primary, secondary, and tertiary.
The advent of sophisticated supercomputers has made it possible to predict these
structures. Advanced computational methods also make it possible to predict the
interactions between proteins and their ligands, as well as protein-surface
interactions.

Homology Modeling
Homology modeling, or comparative modeling, has long been regarded as one of the
most reliable methods for modeling the 3D structure of a protein: the unknown
structure of a target protein is built from the known structure of a homologous
template protein.
Homology modeling is based on the principle that evolutionarily related proteins
have similar structures.
Therefore, the structure of the target protein can be modeled using the template
structure.
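Choosing a template usually starts from how similar the target and template sequences are. The sketch below computes percent identity over an existing pairwise alignment. It assumes the two sequences are already aligned with "-" marking gaps; the sequences and the function name percent_identity are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of gap-free alignment columns where the residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Hypothetical aligned fragments (one gap in the target).
target = "MKT-AYIAKQR"
template = "MKTLAYLAKQR"
identity = percent_identity(target, template)
print(identity)
```

In general, the higher the identity between target and template, the more reliable the resulting model is expected to be.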

Fold Recognition or Threading


Threading predicts the 3D structure of a protein by aligning its primary
sequence against known structures in the Protein Data Bank (PDB) to find a
compatible fold.
History of Bioinformatics
• The discipline of bioinformatics integrates computer science with biology to acquire,
store, analyze, and share data about biological systems. Most often this data is
concerned with DNA and sequences of amino acids.
• Margaret Dayhoff (1925–1983) was an American physical chemist who pioneered the
application of computational methods to the field of biochemistry.
• Between 1958 and 1962 she worked alongside Robert S. Ledley at the National
Biomedical Research Foundation to develop COMPROTEIN, a computer program regarded
as the first de novo sequence assembler, which used Edman peptide sequencing data
to determine protein primary structure.
• Dayhoff’s contribution to this field is so important that David J. Lipman, former director
of the National Center for Biotechnology Information (NCBI), called her ‘the mother and
father of bioinformatics’.
• The term bioinformatics was coined in 1970 by Paulien Hogeweg and Ben Hesper, and came into wide use in the 1990s.
• Originally, it dealt with the management and analysis of the data pertaining to DNA, RNA
and protein sequences.
• As the biological data is being produced at an unprecedented rate, its management and
interpretation invariably requires bioinformatics.
• The origin of the field can be traced back to 1953, when the double-helix
structure of DNA was determined.
• Scientists recognized the potential applications of being able to
sequence DNA; however, it wasn’t until about 25 years later that the first
DNA sequencing methods emerged.
• Before DNA sequencing became possible, scientists first focused on protein
analysis as their starting point.
• The foundations of bioinformatics were laid in the early 1960s with the
application of computational methods to protein sequence analysis
(notably, de novo sequence assembly, biological sequence databases and
substitution models).
• Later on, DNA analysis also emerged due to parallel advances in (i)
molecular biology methods, which allowed easier manipulation of DNA,
as well as its sequencing, and (ii) computer science, which saw the rise of
increasingly miniaturized and more powerful computers, as well as novel
software better suited to handle bioinformatics tasks.
Scope of Bioinformatics
• Bioinformatics helps in understanding
biomolecules at the molecular level. It draws on
visualization software and machine learning, and
supports evolutionary genetics, DNA sequencing,
drug development, precision medicine, and data
analysis and interpretation.
• It is a growing field, driven by the revolution
in information technology and artificial
intelligence.
Bioinformatics is involved in all areas of Science
Genomics provides an overview of the complete set of genetic instructions
provided by the DNA, while transcriptomics looks into gene expression
patterns.
Proteomics studies dynamic protein products and their interactions, while
metabolomics is an intermediate step in understanding an organism’s entire
metabolism.
Genomics
• Genomics is the science that deals with the discovery and recording of
all the sequences in the entire genome of a particular organism.
• The genome can be defined as the complete set of genetic material (DNA),
including all genes, inside a cell. Genomics is, therefore, the study of the
genetic make-up of organisms.
• Determining the genomic sequence, however, is only the beginning of
genomics.
• Once this is done, the genomic sequence is used to study the function of
the numerous genes (functional genomics), to compare the genes in one
organism with those of another (comparative genomics), or to generate
the 3-D structure of one or more proteins from each protein family, thus
offering clues to their function (structural genomics).
• In crop agriculture, the main purpose of the application of genomics is to
gain a better understanding of the whole genome of plants.
• Agronomically important genes may be identified and targeted to
produce more nutritious and safe food while at the same time preserving
the environment.
• Genomics is an entry point for looking at the other ‘omics’ sciences.
• The information in the genes of an organism, its genotype, is largely
responsible for the final physical makeup of the organism, referred to as the
“phenotype”. However, the environment also has some influence on the
phenotype.
• DNA in the genome is only one aspect of the complex mechanism that keeps
an organism running, so decoding the DNA is one step towards
understanding the process. However, by itself, it does not specify everything
that happens within the organism.
• The basic flow of genetic information in a cell is as follows. The DNA is
transcribed or copied into a form known as “RNA”.
• The complete set of RNA (also known as its transcriptome) is subject to some
editing (cutting and pasting) to become messenger-RNA, which carries
information to the ribosome, the protein factory of the cell, which then
translates the message into protein.
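The flow from DNA to RNA to protein described above can be sketched in Python. This is a simplified illustration: it assumes the input is the coding (sense) strand, so transcription is modeled as replacing T with U, it ignores RNA editing and splicing, and it uses the standard genetic code. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna):
    """Model transcription of the coding strand: same sequence with U for T."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Read codons from position 0 and stop at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical coding sequence: start codon, two codons, stop codon.
dna = "ATGGCCAAATAG"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```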
Figure 1. Genes, proteins, and molecular machines
Source: U.S. Department of Energy Genomes to Life Program, http://doegenomestolife.org
For example: The International Rice Genome Sequencing Project

• This ongoing genomic research in rice is a
collaborative effort of several public and
private laboratories worldwide.
• This project aims to completely sequence the
entire rice genome (12 rice chromosomes)
and subsequently apply the knowledge to
improve rice production.
Proteomics

• Proteins are responsible for an endless number of tasks within the cell.
• The complete set of proteins in a cell can be referred to as its proteome and the
study of protein structure and function and what every protein in the cell is doing
is known as proteomics.
• The proteome is highly dynamic and it changes from time to time in response to
different environmental stimuli.
• The goal of proteomics is to understand how the structure and function of
proteins allow them to do what they do, what they interact with, and how they
contribute to life processes.
• One application of proteomics is protein “expression profiling”, in which the
proteins expressed in an organism at a certain time are identified in response
to a stimulus.
• Proteomics can also be used to develop a protein-network map where interaction
among proteins can be determined for a particular living system.
• Proteomics can also be applied to map protein modification to determine the
difference between a wild type and a genetically modified organism.
• It is also used to study protein-protein interactions involved in plant defense
reactions.
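A protein-network map of the kind mentioned above is often stored as a graph. The sketch below builds one as a dictionary of neighbor sets from a list of pairwise interactions and then picks the most connected protein. The protein names and interactions are invented placeholders, not experimental data, and build_network is a hypothetical helper.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected interaction map: protein -> set of partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)

# Made-up pairwise interactions, for illustration only.
interactions = [
    ("ProtA", "ProtB"),
    ("ProtA", "ProtC"),
    ("ProtA", "ProtD"),
    ("ProtC", "ProtD"),
]

network = build_network(interactions)
# The protein with the most interaction partners (a "hub").
hub = max(network, key=lambda protein: len(network[protein]))
print(hub, sorted(network[hub]))
```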
For example, proteomics research at Iowa State
University, USA includes:

• an examination of changes in the corn proteome
at low temperatures, which are a major problem
for young corn seedlings;
• analysis of the differences that occur in
genome expression in developing soybean
stressed by high temperatures; and
• identification of the proteins expressed in response
to diseases such as soybean cyst nematode.
Metabolomics

• Metabolomics is one of the newest ‘omics’ sciences.
• The metabolome refers to the complete set of low molecular weight
compounds in a sample.
• These compounds are the substrates and by-products of enzymatic
reactions and have a direct effect on the phenotype of the cell.
• Thus, metabolomics aims at determining a sample’s profile of these
compounds at a specified time under specific environmental conditions.
• Genomics and proteomics have provided extensive information regarding
the genotype but convey limited information about phenotype.
• Low molecular weight compounds are the closest link to phenotype.
• Metabolomics can be used to determine differences in the levels of
thousands of molecules between healthy and diseased plants.
• The technology can also be used to determine the nutritional difference
between traditional and genetically modified crops, and in identifying
plant defense metabolites.
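Comparing metabolite levels between two samples is commonly summarized as a log2 fold change. The sketch below shows that calculation. The metabolite names and values are invented for illustration and the function name is hypothetical. A result of 0 means no change, positive values mean higher levels in the diseased sample, and negative values mean lower levels.

```python
import math

def log2_fold_changes(healthy, diseased):
    """log2(diseased / healthy) for metabolites with positive levels in both."""
    return {
        name: math.log2(diseased[name] / healthy[name])
        for name in healthy
        if name in diseased and healthy[name] > 0 and diseased[name] > 0
    }

# Made-up relative abundances, for illustration only.
healthy_plant = {"sucrose": 8.0, "proline": 1.0, "citrate": 4.0}
diseased_plant = {"sucrose": 4.0, "proline": 4.0, "citrate": 4.0}

changes = log2_fold_changes(healthy_plant, diseased_plant)
for name, value in sorted(changes.items()):
    print(name, value)
```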
Figure 2. Example of a metabolic network model for E. coli
Source: U.S. Department of Energy Genomes to Life Program, http://doegenomestolife.org
Source: http://biotech.nature.com
Molecular Phylogeny
• A phylogenetic tree is a visual representation of the relationship
between different organisms, showing the path through
evolutionary time from a common ancestor to different
descendants.
• Similarities and divergence among related biological sequences
revealed by sequence alignment often have to be rationalized and
visualized in the context of phylogenetic trees.
• Thus, molecular phylogenetics is a fundamental aspect of
bioinformatics.
• Molecular phylogenetics is the branch of phylogeny that analyzes
genetic, hereditary molecular differences, predominantly in DNA
sequences, to gain information on an organism’s evolutionary
relationships.
• The result of a molecular phylogenetic analysis is expressed in
a phylogenetic tree.
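Many tree-building methods start from pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the p-distance (the fraction of aligned positions that differ), and reports the closest pair, which a distance-based method would typically join first. The species names and sequences are invented, and this is not a full tree-building algorithm.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Fraction of positions that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def pairwise_distances(aligned):
    """p-distance for every pair of named sequences."""
    return {
        (name_a, name_b): p_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }

# Made-up aligned DNA fragments, for illustration only.
aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",
    "species_C": "ACGAACCTTT",
}

distances = pairwise_distances(aligned)
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```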
Computer-Aided Drug Design (structure-
and ligand-based approaches)
• Major types of approaches in CADD
• There are two main types of approach to
drug design through CADD:
• 1. Structure-based drug design / direct
approach
• 2. Ligand-based drug design / indirect
approach
1. Structure-based drug design
• In SBDD, the structure of the target protein is known. The
interaction (binding affinity) of each tested compound is
calculated by docking, with the aim of designing a new drug
molecule that interacts better with the target protein.
• SBDD runs through multiple cycles before an optimized lead
reaches clinical trials.
• The first cycle comprises isolation, purification and structure
determination of the target protein by one of three key
methods: X-ray crystallography, NMR or homology
modeling.
• The second cycle comprises structure determination of the
protein in complex with the most promising lead from the first
cycle (one showing at least micromolar inhibition in vitro),
which reveals sites on the compound that can be optimized to
further increase potency.
2. Ligand-Based drug design

• In LBDD, the 3D structure of the target protein is not
known, but ligands that bind to the desired target
site are known.
• These ligands can be used to develop a pharmacophore
model, or a molecule that possesses all the structural
features necessary for binding to the target active site.
• The main ligand-based techniques are pharmacophore-
based approaches and quantitative structure-activity
relationships (QSARs).
• LBDD assumes that compounds with similar
structures also have similar biological activity
and interaction with the target protein.
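The similarity assumption behind ligand-based design is often quantified with the Tanimoto coefficient, which compares two sets of molecular features: the number of shared features divided by the number of features present in either molecule. The sketch below uses invented feature sets in place of real chemical fingerprints, so the names and values are assumptions for illustration only.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features / all features in either set."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Made-up feature sets standing in for molecular fingerprints.
known_ligand = {"aromatic_ring", "h_bond_donor", "h_bond_acceptor", "carboxyl"}
candidate_1 = {"aromatic_ring", "h_bond_donor", "h_bond_acceptor", "hydroxyl"}
candidate_2 = {"alkyl_chain", "hydroxyl"}

score_1 = tanimoto(known_ligand, candidate_1)
score_2 = tanimoto(known_ligand, candidate_2)
print(score_1, score_2)
```

Under the similarity assumption, candidate_1 would be ranked ahead of candidate_2 for testing against the same target.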
Systems biology
• Systems biology is the iterative and integrative study of
biological systems as systems in response to
perturbations. It is founded on hypotheses formalized
in models built from the results of global functional
genomics analyses of the complexity of the
genome, transcriptome, proteome, metabolome, etc.
• Systems biology studies all of the elements in a system
in response to internal and external signals in order to
understand the emergent properties.
Systems biology requires the development and application of
powerful new technologies and computational tools to carry out a
systems approach and, accordingly, requires a cross-disciplinary
environment, including biologists, chemists, computer scientists,
engineers, mathematicians, and physicists.
We believe systems biology will be a powerful engine driving
biology in the 21st century to collect new knowledge and develop
useful applications for monitoring and improving the environment,
agriculture, nutrition and human health.
In addition, this will require the development of a new conceptual
and epistemological framework founded on the lessons of the
history of science, and the integration of ethical and legal issues
into new practices for the organization and conduct of science.
Functional Biology
• The field of functional genomics attempts to describe the
functions and interactions of genes and proteins by making
use of genome-wide approaches, in contrast to the
gene-by-gene approach of classical molecular biology
techniques.
• It combines data derived from the various processes
related to DNA sequence, gene expression, and protein
function, such as coding and noncoding transcription,
protein translation, protein–DNA, protein–RNA, and
protein–protein interactions.
• Together, these data are used to model interactive and
dynamic networks that regulate gene expression, cell
differentiation, and cell cycle progression.
Applications of Bioinformatics

• The advent of bioinformatics has revolutionized
biological science.
• Biotechnology has benefited greatly from bioinformatics.
• The best example is the sequencing of the human genome in
record time, which would not have been possible without
bioinformatics.
• A list of applications of bioinformatics is given below:
i. Sequence mapping of biomolecules (DNA, RNA, proteins).
ii. Identification of nucleotide sequences of functional
genes.
iii. Finding sites that can be cut by restriction
enzymes.
iv. Designing primer sequences for the polymerase chain
reaction.
v. Prediction of functional gene products.
vi. Tracing the evolutionary trees of genes.
vii. Prediction of the 3-dimensional structure of
proteins.
viii. Molecular modelling of biomolecules.
ix. Designing drugs for medical treatment.
x. Handling vast amounts of biological data, which would
otherwise not be possible.
xi. Development of models for the functioning of various
cells, tissues and organs.
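Two of the applications listed above, finding restriction sites and designing primers, reduce to simple string operations. The sketch below searches a DNA sequence for the EcoRI recognition site (GAATTC) and estimates a primer melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, a rough approximation intended for short oligonucleotides. The sequences and function names are hypothetical.

```python
def find_sites(dna, site):
    """0-based start positions of every occurrence of site, overlaps included."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions

def wallace_tm(primer):
    """Approximate melting temperature (deg C) using the Wallace rule."""
    primer = primer.upper()
    at_count = primer.count("A") + primer.count("T")
    gc_count = primer.count("G") + primer.count("C")
    return 2 * at_count + 4 * gc_count

# Made-up sequences, for illustration only.
dna = "ATGCGAATTCTTAGGAATTCCATG"
ecori_sites = find_sites(dna, "GAATTC")
primer_tm = wallace_tm("ATGCGAATTCTTAGGAAT")
print(ecori_sites, primer_tm)
```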
