
Lab Work

This document contains a summary of 5 practical experiments conducted by a student named Zainab Sohail in their 5th semester of studying Bioinformatics. The experiments include: 1) Retrieving gene sequences from databases; 2) Performing multiple sequence alignments; 3) Conducting phylogenetic analysis; 4) Retrieving protein sequences; 5) Predicting protein secondary and tertiary structures. For each experiment, the document provides details on the objectives, procedures, and results.

Uploaded by

Aleena Khan

Name:

Zainab Sohail
Arid No #:
16-ARID-2582
Semester:
5th
Subject:
Bioinformatics

DEPARTMENT OF BIOCHEMISTRY
INDEX
S.NO  Experiment                      Date        Signature
1.    Databases or Software           6-12-2018
2.    Retrieval of gene sequence      13-12-2018
3.    Multiple Sequence Alignment     20-12-2018
4.    Phylogenetic Analysis           27-12-2018
5.    Retrieval of protein sequence   3-1-2018
6.    Secondary Structure Prediction  10-1-2018
7.    Tertiary Structure Prediction   17-1-2018
8.    Structure Visualization         17-1-2018
PRACTICAL# 1:
DATABASES OR SOFTWARE
Databases:
A database is an organized collection of data, generally stored
and accessed electronically from a computer system. Where databases
are more complex, they are often developed using formal design and
modelling techniques.

Biological databases:
A biological database is a large, organized body of persistent
data, usually associated with computerized software designed to update,
query, and retrieve components of the data stored within the system.
A simple database might be a single file containing many
records, each of which includes the same set of information.

Popular databases:
A few popular databases are GenBank from NCBI (National
Center for Biotechnology Information), Swiss-Prot from the Swiss Institute
of Bioinformatics and PIR from the Protein Information Resource.
GenBank:
GenBank (Genetic Sequence Databank) is one of the fastest
growing repositories of known genetic sequences.
EMBL:
The EMBL Nucleotide Sequence Database is a comprehensive
database of DNA and RNA sequences collected from the scientific
literature and patent applications and directly submitted from
researchers and sequencing groups.
Swiss-Prot:
This is a protein sequence database that provides a high level of
integration with other databases and has a very low level of redundancy
(that is, few identical sequences are present in the database).
PRACTICAL#2:
RETRIEVAL OF GENE SEQUENCE
Procedure:
In order to retrieve a gene sequence, follow the steps given below:
1. Search for the NCBI database in a web browser.

2. Select “National Center for Biotechnology Information”. The NCBI
home page will appear.
3. In the “All Databases” drop-down menu, select “Gene”, then search
for your gene name, for example cytochrome b.

4. All the records for this gene will be displayed. Select the one
whose gene sequence you want to retrieve, e.g. I have selected
CYBA. (The gene record contains a graphical representation of the
gene locus, the gene sequence, transcripts, products and related
literature information.)
5. To see the gene sequence information, click on FASTA.
6. The gene sequence will appear in FASTA format. Copy this sequence
so that it can be used for further processing.
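
Once the FASTA text has been copied, it can also be read by a short
script. The sketch below is only an illustration in Python (it is not
part of the practical): the function name parse_fasta and the example
record are made up, and the bases shown are placeholders rather than a
real gene sequence.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # a new record starts at every '>' line
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Placeholder record: header and bases are invented for illustration only.
example = """>example_gene hypothetical record
ATGGCGTACG
TTAGC
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```

The header line (starting with ">") identifies the record, and all the
lines after it are joined into one sequence string.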

PRACTICAL#3:
MULTIPLE SEQUENCE ALIGNMENT
Definition:
A multiple sequence alignment (MSA) is a sequence
alignment of three or more biological sequences, generally protein,
DNA, or RNA. In many cases, the input set of query sequences is
assumed to have an evolutionary relationship by which they share a
linkage and are descended from a common ancestor. From the resulting
MSA, sequence homology can be inferred, and phylogenetic analysis
can be conducted to assess the sequences' shared evolutionary origins.
Visual depictions of the alignment illustrate mutation events
such as point mutations (single amino acid or nucleotide changes) that
appear as differing characters in a single alignment column, and
insertion or deletion mutations (indels or gaps) that appear as hyphens
in one or more of the sequences in the alignment.
Multiple sequence alignment is often used to assess sequence
conservation of protein domains, tertiary and secondary structures, and
even individual amino acids or nucleotides.
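
The way alignment columns show conservation, point mutations and gaps
can be sketched in a few lines of Python. This is only a toy
illustration: the function name classify_columns and the three short
aligned sequences are made up, and real tools report conservation in
more detail.

```python
def classify_columns(aligned):
    """Label each alignment column as 'conserved', 'variable', or 'gap'."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for column in zip(*aligned):       # one tuple of characters per column
        if "-" in column:
            labels.append("gap")        # insertion or deletion (indel)
        elif len(set(column)) == 1:
            labels.append("conserved")  # same character in every sequence
        else:
            labels.append("variable")   # differing characters (point mutation)
    return labels


# Toy alignment, invented for illustration only.
alignment = ["ATG-CA", "ATGTCA", "ACG-CA"]
labels = classify_columns(alignment)
print(labels)
```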

Explanation:
Multiple sequence alignment also refers to the process of
aligning such a sequence set. Because three or more sequences of
biologically relevant length can be difficult and are almost always time-
consuming to align by hand, computational algorithms are used to
produce and analyse the alignments.
MSAs require more sophisticated methodologies than pairwise
alignment because they are more computationally complex. Most
multiple sequence alignment programs use heuristic methods rather
than global optimization because identifying the optimal alignment
between more than a few sequences of moderate length is prohibitively
computationally expensive.
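
To see why exact alignment becomes expensive, it helps to look at the
pairwise case. The sketch below uses the Needleman-Wunsch dynamic
programming method for two sequences; it is my own illustration and is
not how Kalign or the other tools listed here work internally. The
scoring values (match +1, mismatch -1, gap -1) are assumptions chosen
for simplicity.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

The table has (m + 1) x (n + 1) cells for two sequences of lengths m and
n. Extending the same idea to k sequences needs a k-dimensional table
whose size is the product of all the sequence lengths, which is why
heuristic methods are used for multiple alignment.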

Multiple Sequence Alignment Tools:


Some of the multiple sequence alignment tools are:

1. Kalign:
Very fast MSA tool that concentrates on local regions.
Suitable for large alignments.

2. T-Coffee:
Consistency-based MSA tool that attempts to mitigate the
pitfalls of progressive alignment methods. Suitable for small alignments.

3. WebPRANK:
The EBI has a new phylogeny-aware multiple sequence
alignment program which makes use of evolutionary information to help
place insertions and deletions.

4. Clustal Omega:
New MSA tool that uses seeded guide trees and HMM profile-
profile techniques to generate alignments. Suitable for medium-large
alignments.

Procedure:
1. Search for Kalign, a multiple sequence alignment tool, in a web
browser.

2. Select “Kalign < Multiple Sequence Alignment < EMBL-EBI”. The Kalign
input form will appear.

3. In step 1 of the form, select Nucleic Acid in place of Protein.

4. Retrieve a gene sequence from NCBI (as discussed in the previous
practical), e.g. I have retrieved a sequence of cytochrome b. Copy
the reference (header) line along with the sequence.
5. Paste this sequence into the input box on the Kalign page.

6. Retrieve one more gene sequence from NCBI and paste it in the
same box, starting on the line after the first sequence. (We
paste two or more sequences because this is a multiple sequence
alignment.)
7. After pasting your sequences, click “Submit”.

8. Your result will appear on the screen.
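
The input pasted into the box is simply several FASTA records one after
another. As a rough sketch, such a block could be assembled in Python as
follows; the function name to_multi_fasta, the record names and the
bases are all made up for illustration.

```python
def to_multi_fasta(records, width=60):
    """Join (name, sequence) pairs into one multi-FASTA text block."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")                   # header line of the record
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])  # wrap long sequences
    return "\n".join(lines) + "\n"


# Invented placeholder sequences, not real cytochrome b data.
block = to_multi_fasta([("seq_one", "ATGGCGTACG"), ("seq_two", "ATGGCTTACG")])
print(block)
```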


PRACTICAL#4:
PHYLOGENETIC ANALYSIS
Phylogenetics:
Phylogenetics is the study of the evolutionary history and
relationships among individuals or groups of organisms. These
relationships are discovered through phylogenetic inference methods
that evaluate observed heritable traits, such as DNA sequences or
morphology under a model of evolution of these traits. The result of
these analyses is a phylogeny--a diagrammatic hypothesis about the
history of the evolutionary relationships of a group of organisms.
The tips of a phylogenetic tree can be living organisms or fossils, and
represent the "end", or the present, in an evolutionary lineage.
Phylogenetic analyses have become central to understanding
biodiversity, evolution, ecology, and genomes.

Phylogenetic Analysis:
Phylogenetic methods can be used for many purposes, including
analysis of morphological and several kinds of molecular data. These
can be used for:
• Comparisons of more than two sequences
• Analysis of gene families, including functional predictions
• Estimation of evolutionary relationships among organisms

Steps for analysis:


1. Choosing the sequence type
2. Alignment of sequence data
3. Search for the best tree
4. Evaluation of tree reproducibility

Phylogenetic Methods:
Phylogenetic methods can be divided into three general categories:
1. Parsimony
2. Minimum Distance
3. Likelihood

1. Parsimony:
• Finds the optimum tree by minimizing the number of evolutionary
changes
• No assumption on the evolutionary pattern
• May oversimplify evolution
• May produce several equally good trees
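
Counting the evolutionary changes that a given tree requires can be
sketched with Fitch's algorithm, a standard way to score one site on a
rooted binary tree. This is my own illustration, not the method used by
any tool in this practical. The tree encoding (nested tuples with taxon
names at the tips) and the four toy sequences are assumptions.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one site."""
    if isinstance(tree, str):              # a tip: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:                             # children agree: no extra change
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every alignment column."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total


# Toy data, invented for illustration only.
alignment = {"A": "AAG", "B": "AAG", "C": "ATC", "D": "ATC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

Here tree_1 needs fewer changes than tree_2, so parsimony would prefer
tree_1 for this toy data.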

2. Minimum distance:
• Pairwise distances can be aggregated into a phylogenetic tree
• Search for the tree that minimizes discrepancies among pairwise
distances
• May or may not use an explicit model of sequence evolution
• How the distances are calculated and how the tree is found can be
mixed and matched
• To know what method is being used, you have to know both how the
distance matrix was constructed, and how the tree was determined
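
The first half of a distance method, building the distance matrix, can
be sketched as below. I have assumed the simplest measure, the
p-distance (the proportion of sites that differ), with gapped columns
ignored; this is only one of several possible choices. The sequences are
made up, and the tree-building half (for example UPGMA or neighbour
joining) is not shown.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return a dict of p-distances keyed by (name, name) for every pair."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Toy aligned sequences, invented for illustration only.
aligned = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
matrix = distance_matrix(aligned)
print(matrix[("s1", "s2")], matrix[("s1", "s3")])
```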

3. Likelihood:
• A model of sequence evolution can be used to relate the data to a
hypothesis (typically a tree topology).
• Maximum likelihood: search for the tree that maximizes the
likelihood function.
• The idea is to find the tree under which the observed data are most
probable, given the model.
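
The smallest possible example of this idea is a "tree" with only two
sequences, where the single unknown is the distance d between them. The
sketch below assumes the Jukes-Cantor (JC69) model, in which all
substitutions are equally likely; the model choice, the function names
and the numbers (100 sites, 10 differences) are my own assumptions for
illustration.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e    # probability a site shows the same base
    p_other = 0.25 - 0.25 * e   # probability of one specific different base
    n_same = n_sites - n_diff
    # 0.25 is the assumed equal frequency of each starting base.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_other)


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form maximum likelihood distance under JC69."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


n_sites, n_diff = 100, 10
grid = [i / 1000 for i in range(1, 1001)]   # candidate distances 0.001 .. 1.000
best_d = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))
print(best_d, jc69_ml_distance(n_sites, n_diff))
```

The grid search and the closed-form formula should agree closely. For
real trees, the same principle is applied over many branch lengths and
tree topologies at once.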

Properties of analytical methods


1. Consistency
A method is consistent if it converges on the correct
answer as more data are added.
2. Power
A method is powerful if it can find the correct answer with
relatively little data.
3. Accuracy
A method is accurate if in multiple trials it produces answers
that follow a normal distribution centered on the correct answer.
4. Precision
A method is precise if in multiple trials it finds answers that
are very close to each other.

Procedure:

1. At the top of the Kalign results page you will see the
“Phylogenetic Tree” option. Select it.

2. Your result will appear on the screen.


PRACTICAL#5:
RETRIEVAL OF PROTEIN SEQUENCE
Procedure:
In order to retrieve a protein sequence, follow the steps given below:
1. Search for the NCBI database in a web browser.

2. Select “National Center for Biotechnology Information”. The NCBI
home page will appear.

3. In the “All Databases” drop-down menu, select “Protein”, then search
for your protein name, for example haemoglobin Homo sapiens.
4. All the records for this protein will be displayed. Select the one
whose protein sequence you want to retrieve, e.g. I have selected
beta-globin.

5. To find the protein sequence, click FASTA.

6. The protein sequence will appear in FASTA format. Copy this sequence
so that it can be used for further processing.
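
The same FASTA record can, as far as I know, also be requested through
NCBI's E-utilities "efetch" service instead of the web page. The sketch
below only builds the request address and does not download anything;
the function name fasta_url is made up, and "YOUR_ACCESSION" is a
placeholder that must be replaced with a real accession number.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(database, accession):
    """Build an efetch URL that asks for one record as plain-text FASTA."""
    query = urlencode({
        "db": database,        # e.g. "protein" or "nuccore"
        "id": accession,       # accession number of the record
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


url = fasta_url("protein", "YOUR_ACCESSION")  # placeholder accession
print(url)
```

Opening the printed address in a browser (with a real accession) should
show the same FASTA text that the FASTA link displays.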

PRACTICAL#6:
SECONDARY STRUCTURE PREDICTION
Introduction:
Secondary structure prediction is a set of techniques in
bioinformatics that aim to predict the secondary structures of proteins
and nucleic acid sequences based only on knowledge of their primary
structure. For proteins, this means predicting the formation of protein
structures such as alpha helices and beta strands, while for nucleic
acids it means predicting the formation of nucleic acid structures like
helices and stem-loop structures through base pairing and base stacking
interactions.
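
For nucleic acids, the idea of predicting structure from base pairing
alone can be sketched with the Nussinov algorithm, which finds the
largest number of non-crossing base pairs in an RNA sequence. This is my
own illustration and is not what ScanProsite does. The rules assumed
here (only A-U and G-C pairs, at least 3 unpaired bases in every loop)
and the toy sequence are simplifications.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]   # best[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):  # shorter spans cannot hold a pair
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs:
                score = max(score, best[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):                      # split into two parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


# Toy hairpin: three G-C pairs closing a four-base loop.
print(nussinov_max_pairs("GGGAAAUCCC"))
```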

Procedure:
In order to predict the secondary structure of a protein, follow
these steps:
1. Search for ScanProsite in a web browser.

2. Select ScanProsite. The ScanProsite input form will appear.

3. In step 1 of the form, paste a protein sequence retrieved from
NCBI into the input box.

4. In step 2 of the form, select the options you require and start the
scan.
5. The result will appear on the screen.

PRACTICAL#7:
TERTIARY STRUCTURE PREDICTION
Introduction:
Protein tertiary structure refers to the three-dimensional form of the
protein, presented as a polypeptide chain backbone with one or more
protein secondary structures, the protein domains.
Determining the tertiary structure of a protein can be achieved
by X-ray crystallography, nuclear magnetic resonance, and dual
polarization interferometry.
Alternatively, protein tertiary structure can be predicted using
specific algorithms and software tools based on the amino acid sequence.

Procedure:
1. Search for SWISS-MODEL in a web browser.

2. Click “Start Modelling”. The input form will appear.


3. Paste a protein sequence retrieved from NCBI into the input box,
then select “Build Model”.
4. The model results are given below.

PRACTICAL#8:
STRUCTURE VISUALIZATION TOOLS

Visualization:
Visualization tools allow us to
• see 3D structure data.
• communicate features about 3-D structures to colleagues.
• illustrate biological processes (catalytic/binding).
• educate laypersons about structural biology.

Tools:
1. RasMol
2. Jmol
3. CHIME
4. PyMOL
5. Swiss PDB Viewer
1. RasMol:
This tool was developed by Roger Sayle. It is open
source, and binaries are available. RasMol is widely used and
simple to use (through menus) for simple operations. Complex
operations require the command-line interface.
2. Jmol:
Jmol can connect to certain databases in order to directly
retrieve structures. This applies to the Jmol application, to the JSmol
HTML5 object and to the Jmol signed Java applet. (The unsigned applet
is not allowed connection to external servers and so does not support
this method.)
3. CHIME:
CHIME stands for “Chemical mIME”. It is a free molecular
viewer web browser plugin based on RasMol and was developed by
MDL Information Systems.
4. PyMOL:
It is a set of structure tools built on top of Python and
supports all standard features. PyMOL is extensible and scriptable,
and includes a native ray tracer. It is a freely available tool.
5. Swiss PDB Viewer:
This tool is used for superimposition, to compare proteins
and their components such as active or binding sites. Its functions
include measuring angles and distances between atoms; manual or
automated (SWISS-MODEL) homology modelling, including loop modelling;
threading (fold recognition); mutations and energy minimization;
reading electron density maps and model building (crystallography
data); and an interface to the POV-Ray rendering software.
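
Measuring the distance between two atoms, as these viewers do, comes
down to reading the x, y and z coordinates from the structure file. The
sketch below assumes the fixed-column layout of PDB ATOM records
(x in columns 31-38, y in 39-46, z in 47-54). The helper atom_record is
a made-up function that writes simplified lines only up to the z
coordinate, and the coordinates are invented so that the answer is easy
to check by hand.

```python
import math


def atom_record(serial, name, residue, chain, res_seq, x, y, z):
    """Format a simplified PDB-style ATOM line (columns up to z only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {residue:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def coordinates(line):
    """Read x, y, z from columns 31-38, 39-46 and 47-54 of an ATOM line."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def distance(line_a, line_b):
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist(coordinates(line_a), coordinates(line_b))


# Invented coordinates, chosen so the distance is a 3-4-5 triangle.
atom_a = atom_record(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0)
atom_b = atom_record(2, "CA", "GLY", "A", 2, 3.0, 4.0, 0.0)
print(distance(atom_a, atom_b))
```

math.dist needs Python 3.8 or later. With a real PDB file, the same
coordinates function could be applied to any two lines that start with
"ATOM".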
