Lab Work
Zainab Sohail
Arid No #: 16-ARID-2582
Semester: 5th
Subject: Bioinformatics
DEPARTMENT OF BIOCHEMISTRY
INDEX
S.NO  Experiment                       Date        Signature
1.    Databases or Software            6-12-2018
2.    Retrieval of gene sequence       13-12-2018
3.    Multiple Sequence Alignment      20-12-2018
4.    Phylogenetic Analysis            27-12-2018
5.    Retrieval of protein sequence    3-1-2018
6.    Secondary Structure Prediction   10-1-2018
7.    Tertiary Structure Prediction    17-1-2018
8.    Structure Visualization          17-1-2018
PRACTICAL#1:
DATABASES OR SOFTWARE
Databases:
A database is an organized collection of data, generally stored
and accessed electronically from a computer system. Where databases
are more complex, they are often developed using formal design and
modelling techniques.
Biological databases:
A biological database is a large, organized body of persistent
data, usually associated with computerized software designed to update,
query, and retrieve components of the data stored within the system.
A simple database might be a single file containing many
records, each of which includes the same set of information.
Popular databases:
A few popular databases are GenBank from NCBI (National
Center for Biotechnology Information), Swiss-Prot from the Swiss Institute
of Bioinformatics and PIR from the Protein Information Resource.
GenBank:
GenBank (Genetic Sequence Databank) is one of the fastest
growing repositories of known genetic sequences.
EMBL:
The EMBL Nucleotide Sequence Database is a comprehensive
database of DNA and RNA sequences collected from the scientific
literature and patent applications and submitted directly by
researchers and sequencing groups.
Swiss-Prot:
This is a protein sequence database that provides a high level of
integration with other databases and has a very low level of redundancy
(that is, few identical sequences are present in the database).
PRACTICAL#2:
RETRIEVAL OF GENE SEQUENCE
Procedure:
In order to retrieve a gene sequence, follow the steps given below:
1. Go to the NCBI database search.
4. Your device will display all the records for this gene. Select the one
whose gene sequence you want to retrieve, e.g. I have selected
CYBA. (The gene record contains a graphical representation of the
gene locus, the gene sequence, transcripts, products and related
literature information.)
5. To see the gene sequence information, click on FASTA.
6. After you click FASTA, the gene sequence will appear. Copy this
sequence to retrieve it; it can then be used for further
processing.
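Once the FASTA text has been copied, it can be read into a program for further processing. The following is a minimal sketch in Python, assuming the copied text follows the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name parse_fasta and the example record are made up for illustration and are not real NCBI data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined so the sequence becomes one string.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record: the header and the bases are invented for illustration.
example = """>example_gene hypothetical record
ATGGCGTACG
TTAGC
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

The same function also accepts several records pasted one after another, which is the form needed for multiple sequence alignment.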
PRACTICAL#3:
MULTIPLE SEQUENCE ALIGNMENT
Definition:
A multiple sequence alignment (MSA) is a sequence
alignment of three or more biological sequences, generally protein,
DNA, or RNA. In many cases, the input set of query sequences is
assumed to have an evolutionary relationship by which they share a
linkage and are descended from a common ancestor. From the resulting
MSA, sequence homology can be inferred, and phylogenetic analysis
can be conducted to assess the sequences' shared evolutionary origins.
Visual depictions of the alignment illustrate mutation events
such as point mutations (single amino acid or nucleotide changes) that
appear as differing characters in a single alignment column, and
insertion or deletion mutations (indels or gaps) that appear as hyphens
in one or more of the sequences in the alignment.
Multiple sequence alignment is often used to assess sequence
conservation of protein domains, tertiary and secondary structures, and
even individual amino acids or nucleotides.
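The way an alignment shows conservation, point mutations and gaps can be sketched in a few lines of Python. This is only an illustration, assuming the sequences are already aligned (all the same length, with "-" marking a gap). The function name classify_columns and the three short sequences are hypothetical.

```python
def classify_columns(aligned):
    """Label each alignment column as 'conserved', 'substitution' or 'gap'."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for column in zip(*aligned):  # walk through the alignment column by column
        if "-" in column:
            labels.append("gap")            # insertion or deletion (indel)
        elif len(set(column)) == 1:
            labels.append("conserved")      # every sequence has the same character
        else:
            labels.append("substitution")   # differing characters: point mutation
    return labels


# Hypothetical aligned sequences, invented for illustration.
alignment = [
    "ATG-CA",
    "ATGTCA",
    "ACGTCA",
]

labels = classify_columns(alignment)
for position, label in enumerate(labels, start=1):
    print(position, label)
```

Here column 2 is reported as a substitution and column 4 as a gap, while the other columns are conserved.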
Explanation:
Multiple sequence alignment also refers to the process of
aligning such a sequence set. Because three or more sequences of
biologically relevant length can be difficult and are almost always time-
consuming to align by hand, computational algorithms are used to
produce and analyse the alignments.
MSAs require more sophisticated methodologies than pairwise
alignment because they are more computationally complex. Most
multiple sequence alignment programs use heuristic methods rather
than global optimization because identifying the optimal alignment
between more than a few sequences of moderate length is prohibitively
computationally expensive.
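To see where the cost comes from, consider the exact (global optimization) method for just two sequences. The sketch below is a plain Needleman-Wunsch scoring routine, assuming a simple scoring scheme (match +1, mismatch -1, gap -1) chosen only for illustration. It fills a table with (m+1) x (n+1) cells for sequences of length m and n. The same exact approach for N sequences of length L needs on the order of L^N cells, which is why MSA programs fall back on heuristics.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions, not values used by
    any particular alignment program.
    """
    # previous[j] is the best score for aligning the prefix of a handled
    # so far with b[:j]; initially a is empty, so everything is gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap placed in b
            left = current[j - 1] + gap   # gap placed in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(nw_score("ACGT", "AGT"))      # best alignment is ACGT / A-GT
print(nw_score("GATTACA", "GATTACA"))
```

Only two rows of the table are kept in memory here because only the score is needed; recovering the alignment itself would require keeping the whole table.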
1. Kalign:
Very fast MSA tool that concentrates on local regions.
Suitable for large alignments.
2. T-Coffee:
Consistency-based MSA tool that attempts to mitigate the
pitfalls of progressive alignment methods. Suitable for small alignments.
3. WebPRANK:
The EBI has a new phylogeny-aware multiple sequence
alignment program which makes use of evolutionary information to help
place insertions and deletions.
4. Clustal Omega:
New MSA tool that uses seeded guide trees and HMM profile-
profile techniques to generate alignments. Suitable for medium-large
alignments.
Procedure:
1. Search for Kalign. This is a tool for multiple sequence alignment.
6. Now select one more gene sequence from NCBI and paste it into the
same box, on the line after the first sequence. (We select two or
more sequences because this is a multiple sequence
alignment.)
7. After pasting your sequences, click “submit”.
PRACTICAL#4:
PHYLOGENETIC ANALYSIS
Phylogenetic Analysis:
Phylogenetic methods can be used for many purposes, including
analysis of morphological and several kinds of molecular data. These
can be used for:
• Comparisons of more than two sequences
• Analysis of gene families, including functional predictions
• Estimation of evolutionary relationships among organisms
Phylogenetic Methods:
Phylogenetic methods can be divided into three general categories:
1. Parsimony
2. Minimum distance
3. Likelihood
1. Parsimony:
• Finds the optimum tree by minimizing the number of evolutionary
changes
• No assumption on the evolutionary pattern
• May oversimplify evolution
• May produce several equally good trees
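Counting the evolutionary changes that a given tree requires can be sketched with the Fitch algorithm. This is one standard way of scoring a tree under parsimony, named here as an assumption since the lab does not say which method the tool uses. The tree representation (nested pairs of leaf names), the function names and the four short sequences are all hypothetical.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping leaf name -> character observed at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change needed.
        return common, left_changes + right_changes
    # No shared state: one change is required on one of the two branches.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all alignment columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    total = 0
    for site in range(length):
        column = {name: alignment[name][site] for name in names}
        total += fitch(tree, column)[1]
    return total


# Hypothetical aligned sequences for four taxa, invented for illustration.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
tree_3 = (("A", "D"), ("B", "C"))

for name, tree in [("tree_1", tree_1), ("tree_2", tree_2), ("tree_3", tree_3)]:
    print(name, parsimony_score(tree, alignment))
```

With this made-up data, tree_1 and tree_3 both need 3 changes and tree_2 needs 4, so parsimony cannot choose between the first and the third tree. This is a small case of the method producing several equally good trees.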
2. Minimum distance:
• Pairwise distances can be aggregated into a phylogenetic tree
• Search for the tree that minimizes discrepancies among pairwise
distances
• May or may not use an explicit model of sequence evolution
• How the distances are calculated and how the tree is found can be
mixed and matched
• To know what method is being used, you have to know both how the
distance matrix was constructed, and how the tree was determined
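The first half of a distance method, building the distance matrix, can be sketched as follows. The sketch assumes the simplest distance measure, the proportion of differing sites (often called the p-distance), with no explicit model of evolution. The function names and sequences are hypothetical. Turning the matrix into a tree is a separate step with its own choice of method, which is not shown here.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Build a nested dict of pairwise p-distances from aligned sequences."""
    names = list(aligned)
    return {a: {b: p_distance(aligned[a], aligned[b]) for b in names} for a in names}


# Hypothetical aligned sequences, invented for illustration.
aligned = {
    "seq1": "ATGCCA",
    "seq2": "ATGTCA",
    "seq3": "ACGTCA",
}

matrix = distance_matrix(aligned)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in aligned])
```

Swapping p_distance for a model-corrected distance would change the matrix but not the tree-building step, which is what "mixed and matched" refers to.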
3. Likelihood:
• A model of sequence evolution can be used to relate the data to a
hypothesis (typically a tree topology).
• Maximum likelihood: search for the tree that maximizes the
likelihood function
• The idea is to find the tree under which the observed data are most
probable, given the model.
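The idea of maximizing a likelihood function can be shown on the smallest possible tree: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor model of nucleotide substitution, chosen here only because it is the simplest standard model and not because the lab used it. It searches for the branch length d (substitutions per site) that makes the observed number of differences most probable. The function names and the counts (100 sites, 10 differences) are made up for illustration.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of branch length d (d > 0) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site keeps the same base
    p_diff = 0.25 - 0.25 * e   # probability of changing to one specific other base
    return (n_sites * math.log(0.25)                # equal base frequencies
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum likelihood distance; needs n_diff / n_sites < 0.75."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical data: 100 aligned sites, 10 of which differ.
n_sites, n_diff = 100, 10

# Simple grid search: try many branch lengths and keep the most likely one.
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

print("grid search estimate:", best)
print("closed-form estimate:", round(jc69_ml_distance(n_sites, n_diff), 4))
```

The grid search and the closed-form formula agree to within the grid spacing. For real trees with many sequences there is no closed form, so programs search over both tree shapes and branch lengths.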
Procedure:
1. In the results of Kalign, at the top you will see the option
“Phylogenetic Tree”. Select it.
PRACTICAL#5:
RETRIEVAL OF PROTEIN SEQUENCE
Procedure:
3. In the All Databases menu, select “Protein” and search for your
protein by name, for example haemoglobin Homo sapiens.
4. Your device will display all the records for this protein. Select the one
whose protein sequence you want to retrieve, e.g. I have selected
beta-globin.
PRACTICAL#6:
SECONDARY STRUCTURE PREDICTION
Introduction:
Secondary structure prediction is a set of techniques in
bioinformatics that aim to predict the secondary structures of proteins
and nucleic acid sequences based only on knowledge of their primary
structure. For proteins, this means predicting the formation of protein
structures such as alpha helices and beta strands, while for nucleic
acids it means predicting the formation of nucleic acid structures like
helices and stem-loop structures through base pairing and base stacking
interactions.
Procedure:
In order to predict the secondary structure of a protein, follow the
steps given below:
1. Go to the ScanProsite search.
4. In step 2, select the options you need and start the scan.
5. The result will appear on the screen.
PRACTICAL#7:
TERTIARY STRUCTURE PREDICTION
Introduction:
Protein tertiary structure refers to the three-dimensional form of the
protein, presented as a polypeptide chain backbone with one or more
protein secondary structures, the protein domains.
The tertiary structure of a protein can be determined
by X-ray crystallography, nuclear magnetic resonance, and dual
polarization interferometry.
Alternatively, protein tertiary structure can be predicted from the
amino acid sequence using specific algorithms and software tools.
Procedure:
1. Go to the SWISS-MODEL search.
PRACTICAL#8:
STRUCTURE VISUALIZATION TOOLS
Visualization:
Visualization tools allow us to
• see 3D structure data.
• communicate features about 3-D structures to colleagues.
• illustrate biological processes (catalytic/binding).
• educate laypersons about structural biology.
Tools:
1. RasMol
2. Jmol
3. Chime
4. PyMOL
5. Swiss PDB Viewer
1. RasMol:
This tool was developed by Roger Sayle. It is open
source, and binaries are available. RasMol is widely used and
simple to use (through menus) for simple operations. Complex
operations require the command-line interface.
2. Jmol:
Jmol can connect to certain databases in order to directly
retrieve structures. This applies to the Jmol application, to the JSmol
HTML5 object and to the Jmol signed Java applet. (The unsigned applet
is not allowed to connect to external servers and so does not support
this method.)
3. Chime:
Chime stands for “Chemical mIME”. It is a free molecular
viewer web browser plugin based on RasMol and was developed by
MDL Information Systems.
4. PyMOL:
PyMOL is a set of structure tools built on top of Python that
supports all standard features. It is extensible and scriptable and has
a native ray tracer. It is a freely available tool.
5. Swiss PDB Viewer:
This tool is used for superimposition, to compare proteins and
their components such as active or binding sites. Its functions include:
• measuring angles and distances between atoms
• manual or automated (Swiss-Model) homology modelling, including
loop modelling
• threading (fold recognition)
• mutations and energy minimization
• reading electron density maps and model building (crystallography data)
• an interface to the POV-Ray rendering software
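Measuring distances and angles between atoms, one of the functions listed above, comes down to simple geometry on the atomic coordinates. The following is a minimal sketch, assuming each atom is given as an (x, y, z) tuple in ångströms, as in a PDB file. The function names and the coordinates are hypothetical and are not taken from any viewer's own code.

```python
import math


def distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))


def angle(a, b, c):
    """Angle in degrees at atom b, formed by the atoms a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]  # vector from b to a
    v2 = [c[i] - b[i] for i in range(3)]  # vector from b to c
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates, invented for illustration.
atom_1 = (1.0, 0.0, 0.0)
atom_2 = (0.0, 0.0, 0.0)
atom_3 = (0.0, 1.0, 0.0)

print(distance(atom_1, atom_2))
print(angle(atom_1, atom_2, atom_3))
```

In a viewer the same calculation is run on the atoms that the user clicks, with the coordinates read from the loaded structure file.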