LabManual Bioinformatics
© April, 2023 @authors
All rights reserved. No part of this publication may be reproduced or
transmitted in any form or by any means without permission of the author.
Any person who does any unauthorised act in relation to this publication
may be liable to criminal prosecution and civil claims for damage.
FOREWORD
This Lab Manual on “BIOINFORMATICS” is prepared in accordance with
the updated syllabus under DBT Star College Scheme sponsored by the Department
of Biotechnology, Ministry of Science and Technology, MHRD, New Delhi to
fulfill the needs of students.
This manual could enable the students to retrieve protein, nucleic acid sequence
using different tools and to interpret the results. The protocols included in this
manual elaborate the step by step procedure with URL link and illustrations.
We hope this manual will meet the students' needs in performing the
experiments that enhance their research skills in drug development.
Bioinformatics is an evergreen, emerging interdisciplinary field in life
science. It provides a medium for the exchange of information in the fields of
computational molecular biology and the post-genome era, with an emphasis
on the documentation of large data sets and databases that allow the progress of
biomedical research in a significant manner.
This lab manual presents a collection of twenty practical exercises, aimed at
providing standard protocols to access nucleic acid and protein sequence
databases, perform sequence alignments, predict secondary structure, and
visualize proteins. This manual introduces the theory and provides a systematic
procedure to help students carry out the practical exercises easily.
Understanding NCBI resources, accessing biological sequences from GenBank,
performing sequence alignments, and using protein visualization tools will
become easy if the students and researchers use this manual effectively. I
assure readers that this will become a handy tool and a motivating factor for
the basic analysis of sequences.
When the readers finish practicing the exercises in this manual, they will
possess the knowledge to use the various online tools available for processing
the biological data and will have a platform to develop their skills in handling
bioinformatics techniques in the future.
P. Sugapriya Menaga
Reviewer
Assistant Professor of Biotechnology,
ANJAC, Sivakasi.
The field of bioinformatics, which applies both information technology
and biological science, makes predictions about biological processes using
bioinformatics tools. Moreover, bioinformatics is an essential part of biological
research and has applications in different practical fields.
This bioinformatics manual helps beginners to learn a step-by-step
procedure that makes the experiments easy to follow. This
manual covers the curriculum for the Biotechnology, Microbiology and
Biochemistry students at the undergraduate level.
I would like to express my heartfelt gratitude to the reviewer,
Mrs. P. Sugapriya Menaga, Assistant Professor, Department of Biotechnology,
ANJAC, Sivakasi for her useful suggestions to improve the quality of this
manual.
K. Sudha Rameshwari
Assistant Professor,
Department of Biochemistry,
V.V.Vanniaperumal College for Women,
Virudhunagar-626001,
Tamil Nadu, India.
BIOINFORMATICS
LABORATORY MANUAL
LIST OF PRACTICALS
S.No. TITLE
1. Introduction to Bioinformatics
2. Retrieval of Nucleotide sequence from GenBank
3. Retrieval of Protein sequence from GenBank
4. Sequence Similarity Search using BLASTN
5. Sequence Similarity Search using BLASTP
6. Accessing Structural Database and Download the Protein Structure
7. Working with Ensembl
8. Multiple Sequence Alignment
9. Predicting Physicochemical properties of protein sequence
10. Predicting Peptide mass of protein sequence
11. Predicting cleavage site of protein sequence
12. Predicting secondary structure using SOPMA tool
13. Predicting secondary structure using CFSSP tool
14. Predicting transmembrane region of Protein sequence
15. Predicting hydrophilic region in the protein sequence
16. Detecting alignment of repeats in protein sequences
17. Predicting the peptide structure for the given protein sequence
18. Conversion of nucleotide sequence into Protein sequence
19. Molecular Visualization using RasMol
20. Measurement of bond length in protein structure using RasMol
21. Measurement of bond angle in protein structure using RasMol
22. Web links
INTRODUCTION TO BIOINFORMATICS
Bioinformatics is a newly emerging scientific discipline for
the computational analysis and storage of biological data.
It is the field in which biology, computer science and
information technology merge into a single discipline for managing
and analyzing biological data using advanced computing techniques.
OBJECTIVES
To organize data, to access existing information and to
submit new entries as they are produced.
To develop tools and resources that aid in the analysis of data.
To analyze the data and interpret the results in a biologically
meaningful manner.
SCOPE OF BIOINFORMATICS
personalized medicine depends on the application of
bioinformatics approaches.
This course focuses on employing existing bioinformatics
resources - mainly web-based programs and databases - to access
the wealth of data to answer questions relevant to the average
biologist, and is highly hands-on. Different types of career
opportunities are available for the students of bioinformatics, such as
Scientific Curator, Gene Analyst, Protein Analyst, Phylogeneticist,
Research Scientist / Associate, Database Programmer,
Bioinformatics Software Developer, Computational Biologist,
Network Administrator / Analyst, Structural Analyst, Molecular
Modeler, Biostatistician, Biomechanics, Cheminformatician,
Pharmacogenetician, Pharmacogenomics etc.
RETRIEVAL OF NUCLEOTIDE SEQUENCE FROM GENBANK
AIM
To retrieve the gene from GenBank and to save the sequence in
FASTA format.
INTRODUCTION
Entries of nucleotide sequence are stored in the GenBank nucleotide
database. Each entry contains complete information about the particular
gene. The GenBank website is accessible to anyone who is interested in
getting the entry for research.
PROCEDURE
Type NCBI in the web browser and click search, then click National
Center for Biotechnology Information; it leads to the URL
https://www.ncbi.nlm.nih.gov/ (Figure 1), OR
Type the URL https://www.ncbi.nlm.nih.gov/ directly in the
address bar and press the Enter key.
The NCBI homepage will appear.
Click the All Databases drop-down menu and select Nucleotide (Figure 2).
Type Collagen (the gene name) in the search area and click Search (Figure 3).
The search list will be displayed; click the suitable accession number
or any gene of interest (Figure 4).
A new window will appear and show the entry of the collagen
gene in detail (Figure 5).
Click FASTA; the FASTA sequence appears in the new window
(Figure 6).
Copy the FASTA sequence and paste it in Notepad.
Save it for further investigation.
Figure 1: Open the web browser
Figure 3: Home page of NCBI with the gene entered in the search area
Figure 4: Search list page
Figure 5: GenBank entry format
RESULT
The collagen gene was retrieved and saved in FASTA format in
Notepad.
OUTCOME
Genbank. From the retrieved sequence flat file, details of sequences,
submitter’s details, biological significance, and the scientific name
and taxonomy of the organism are understood. A feature table shows
characteristics that indicate coding regions, transcription units,
mutation locations, etc. The retrieved gene sequence is useful for gene
analysis. It can be compared with other sequences to determine which
gene is mutant. It aids the diagnosis of hereditary disease.
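Once the sequence has been saved from the FASTA view, it can also be read programmatically. The following is a minimal sketch in Python, assuming the saved text follows the usual FASTA layout (a header line starting with ">" followed by sequence lines). The function name parse_fasta and the example record are illustrative and are not real GenBank data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record standing in for the text saved in Notepad.
example = """>example_id made-up description
ATGCCGTTA
ggctaa
"""

for header, sequence in parse_fasta(example):
    # Print the identifier (first word of the header) and the sequence length.
    print(header.split()[0], len(sequence))
```

Keeping the header separate from the sequence is convenient because some tools accept only the raw sequence without the first FASTA line.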
RETRIEVAL OF PROTEIN SEQUENCE FROM GENBANK
AIM
To retrieve the protein from GenBank and to save the sequence in
FASTA format.
INTRODUCTION
Entries of protein sequence are stored in the GenBank database.
Each entry contains complete information about a particular protein.
The GenBank website is accessible to anyone who is interested in getting
the entry for research.
PROCEDURE
Type NCBI in the web browser and click search; it shows
National Center for Biotechnology Information. Click National
Center for Biotechnology Information; it leads to the URL
https://www.ncbi.nlm.nih.gov/, the home page of NCBI (Figure 7),
or
Type the URL https://www.ncbi.nlm.nih.gov/web/Genbank/
directly in the address bar and press the Enter key.
The NCBI homepage will appear.
Click the All Databases drop-down menu and select Protein (Figure 8).
Type collagen in the search area and press the Enter key or click
Search.
The search list will be displayed (Figure 8).
Select the suitable accession number that describes collagen.
Click the accession number of collagen.
A new window will appear and show the entry of the collagen
gene (Figure 9).
In display format, in place of Summary choose “FASTA”.
Only the sequence will be shown (Figure 10).
Copy the sequence (Figure 11), paste it in Notepad (Figure
12) and save it for further investigation.
Figure 7: Home page of NCBI, selection of Protein
Figure 9: Collagen gene GenBank entry page
Figure 11: Copy the FASTA sequence
RESULT
The protein sequence was retrieved and saved in FASTA
format in Notepad.
OUTCOME
Students master the technique of sequence retrieval, which is the
basic step in bioinformatics. From the retrieved sequence flat file,
details of sequences, submitter’s details, biological significance, and
the scientific name and taxonomy of the organism are understood.
Sequence retrieval is essential for the analysis of primary, secondary
and tertiary structure of any protein sequence.
SEQUENCE SIMILARITY SEARCH USING BLASTN
AIM
To find the similarity of sequence for the given nucleotide or
protein sequence.
INTRODUCTION
BLAST is the Basic Local Alignment Search Tool. It is a
technique for finding homology and similarity, and a tool for
searching databases for sequences that are similar to one another. It
compares a novel gene sequence with nucleotide databases by
matching the novel sequence with previously characterized genes. This tool
focuses on identifying regions of sequence similarity rather than
the best overall alignment, and it provides
information on the structure and function of the novel sequence.
The original BLAST algorithm produced ungapped alignments; current
NCBI BLAST also reports gapped alignments. It reports multiple local
alignments between the query and database. It is based on an explicit
statistical theory.
PROCEDURE
Type BLAST in the web browser and click search.
Click the first link, BLAST (Figure 13).
Go to the BLAST home page (Figure 14).
Depending on the type of sequence, the programs in BLAST
differ. They are:
  blastp compares an amino acid query sequence against a
  protein sequence database.
  blastn compares a nucleotide query sequence against a
  nucleotide sequence database.
  blastx compares a nucleotide query sequence translated in
  all reading frames against a protein sequence database.
  tblastn compares a protein query sequence against a
  nucleotide sequence database dynamically translated in all
  reading frames.
  tblastx compares the six-frame translation of a nucleotide
  query sequence against the six-frame translation of a
  nucleotide sequence database.
Click Nucleotide BLAST and check that blastn is selected.
Paste the sequence (FASTA format) in the given box of
BLASTN or paste the accession number (Figure 15).
Tick the box “results in new window” in the last line of the
BLASTN home page (Figure 15).
Click BLAST (Figure 15).
In the new window, the format request display is shown; wait for a
few seconds, and it leads to the result page.
In the result page, the BLAST results for the sequence similarity
search appear both graphically and as text (descriptions,
alignments).
Click each text entry and find the similarity between the two
sequences.
Click graphic summary (Figure 16).
View the graphical representation of the sequence alignment.
Lines in pink and red colors represent the sequences in a set of
score values.
The length of the line indicates the length of the local
alignment with the query sequence (Figure 17).
Figure 13: Search list of BLAST
Figure 15: BlastN page
RESULT
The most significant and similar sequence fetched by
BLAST is Homo sapiens collagen type II alpha 1 chain (COL2A1),
transcript variant 2, mRNA.
OUTCOME
By learning BLASTN, students obtain the skill to compare a
new gene sequence with the nucleotide database by aligning the
novel sequence with previously characterized genes/proteins. From
the comparison, students gather functional and evolutionary
clues about the structure and function of the novel sequence.
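The choice among the five BLAST programs described in the procedure can be summarised as a small lookup. The sketch below is only an illustration in Python. The function name choose_blast_program and its parameters are hypothetical and are not part of any NCBI interface; the mapping itself follows the program descriptions given in this practical.

```python
# (query type, database type, translated search?) -> program name
BLAST_PROGRAMS = {
    ("protein", "protein", False): "blastp",
    ("nucleotide", "nucleotide", False): "blastn",
    ("nucleotide", "protein", True): "blastx",
    ("protein", "nucleotide", True): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_blast_program(query_type, database_type, translated=False):
    """Pick a BLAST program name from the query and database types."""
    # Comparing a nucleotide sequence with a protein sequence (or the
    # reverse) is only possible through translation.
    if query_type != database_type:
        translated = True
    key = (query_type, database_type, translated)
    if key not in BLAST_PROGRAMS:
        raise ValueError("no BLAST program for this combination: %r" % (key,))
    return BLAST_PROGRAMS[key]


print(choose_blast_program("nucleotide", "nucleotide"))  # blastn
print(choose_blast_program("nucleotide", "protein"))     # blastx
```

The translated flag only matters for nucleotide against nucleotide searches, where it distinguishes blastn from tblastx.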
Figure 17: Result page of Graphical summary
SEQUENCE SIMILARITY SEARCH USING BLASTP
AIM
To find the similarity of sequence for the given protein sequence.
INTRODUCTION
BLAST is the Basic Local Alignment Search Tool. It is a technique
for finding homology and similarity of protein sequences. It is
provided by NCBI as a web service and as standalone programs, and is
employed to carry out DNA or protein similarity searches. It is a tool
for searching databases for sequences that are similar to one another.
It is used to
compare a novel protein sequence against a protein database by aligning
the novel sequence with previously characterized proteins. The
emphasis of this tool is to find regions of sequence similarity, which
yield functional and evolutionary clues about the structure and function
of the novel sequence. It finds patches of sequence similarity
rather than the best overall alignment. The original BLAST algorithm
produced ungapped alignments; current NCBI BLAST also reports gapped
alignments. It
reports multiple local alignments between the query and database. It is
based on an explicit statistical theory.
PROCEDURE
Go to the NCBI BLAST page.
Choose the BLAST program for proteins, BLASTP (Figure 18).
Copy the sequence from Notepad. Avoid the first FASTA line.
Paste the protein sequence in the search window of BLASTP
(Figure 19).
Tick the box “results in new window” in the last line of the BLAST
home page.
Click BLAST.
In the new window, the format request display is shown; wait for a few
seconds, and it leads to the result page (Figure 19a).
In the result page, the BLAST results for the sequence similarity search
appear both graphically and as text.
Click each text entry and find the similarity between the sequences.
Click graphic summary.
View the graphical representation of the sequence alignment
(Figure 19b).
Lines in pink and red colors represent the sequences in a set of score
values.
The length of the line indicates the length of the local alignment with
the query sequence.
Figure 18: Select Protein BLAST
Figure 19: BlastP page
Figure 19a: Result page of BLASTP
RESULT
The most significant and similar sequence fetched by
BLASTP is hemoglobin [Pseudoterranova decipiens], CAA77743.1.
OUTCOME
By learning BLASTP, students gain the knowledge to compare
a new protein sequence with the protein database by aligning the
novel sequence with previously characterized proteins. From the
comparison, students can obtain functional and evolutionary clues
about the structure and function of the novel sequence.
ACCESSING STRUCTURAL DATABASE AND
DOWNLOADING THE PROTEIN STRUCTURE
AIM
To access the PDB (Protein Data Bank) structural database and
to download the protein structure.
INTRODUCTION
The three dimensional structure of biomolecules plays an
important role in the functions and maintenance of the structural
Figure 20: Web browser page
Choose the protein name of interest (Figure 22).
Click the ID.
Click Download Files (Figure 23).
Click PDB Format (Figure 23).
Download the file in PDB format (Figure 24).
Save the file.
View the protein structure in a molecular visualization
tool such as RasMol or Swiss-PdbViewer (Figure 25).
Figure 23: Download the protein structure in PDB format
Figure 24: Protein structure was downloaded
RESULT
The protein structure was downloaded in PDB format and its
description was observed.
OUTCOME
Students are trained to retrieve any protein structure from the
structural protein database and can understand the characteristics of
the protein structure.
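A downloaded PDB file is plain text, so its coordinate records can be inspected before opening it in a viewer. The sketch below, in Python, reads ATOM and HETATM lines using the fixed column positions of the PDB format. The function names and the commented file name are hypothetical, and only a few fields are extracted.

```python
def parse_atoms(pdb_text):
    """Extract selected fields from ATOM/HETATM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Coordinate records are fixed-width; skip other record types
        # and lines too short to hold the x, y, z columns.
        if line[:6].strip() not in ("ATOM", "HETATM") or len(line) < 54:
            continue
        atoms.append({
            "name": line[12:16].strip(),      # atom name, columns 13-16
            "res_name": line[17:20].strip(),  # residue name, columns 18-20
            "chain": line[21],                # chain identifier, column 22
            "res_seq": int(line[22:26]),      # residue number, columns 23-26
            "x": float(line[30:38]),          # coordinates, columns 31-54
            "y": float(line[38:46]),
            "z": float(line[46:54]),
        })
    return atoms


def summarize(atoms):
    """Count atoms, distinct residues and chains."""
    residues = {(a["chain"], a["res_seq"]) for a in atoms}
    chains = sorted({a["chain"] for a in atoms})
    return {"atoms": len(atoms), "residues": len(residues), "chains": chains}


# Hypothetical usage with a file saved from the structure database:
# with open("downloaded_structure.pdb") as handle:
#     print(summarize(parse_atoms(handle.read())))
```

The summary gives a quick check that the download is complete before the structure is visualized.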
WORKING WITH ENSEMBL
AIM
To retrieve the vertebrate genomic information.
INTRODUCTION
Ensembl is one of the several well-known genome browsers for
Figure 28: Ensembl home page
A list of genes is displayed (Figure 29).
Select one and click.
Details of diabetes are shown (Figure 30).
Click the hyperlinks one by one; the details of variant, genomic
location, reported genes, phenotype/genotype trait, annotation
source, submitter, and external reference are shown (Figure
30).
Figure 29: List of diabetes genes
RESULT
The diabetic gene annotations were retrieved from the Ensembl
genome browser.
OUTCOME
Students can retrieve any vertebrate gene and its details from the
Ensembl genome browser.
MULTIPLE SEQUENCE ALIGNMENT
AIM
To study the closely related genes or proteins.
INTRODUCTION
The ClustalW tool is used for aligning multiple nucleotide or
protein sequences. It uses progressive alignment methods, which align
the most similar sequences first and work their way down to the least
similar sequences until a global alignment is created. Another multiple
sequence alignment tool, Clustal Omega, creates alignments
between three or more sequences by using HMM profile-profile
algorithms and seeded guide trees. Multiple sequence alignment is
used to study closely related genes or proteins in order to find the
evolutionary relationships between genes and to identify shared
patterns among functionally or structurally related genes.
Figure 31: Google Search
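The idea of working from the most similar pair of sequences to the least similar can be illustrated with a short Python sketch. It assumes that the rows are already aligned to equal length, with "-" marking gaps, and it simply ranks pairs by percent identity. This is a simplification: ClustalW derives its ordering from pairwise alignments and a guide tree. The sequence names and rows below are made up.

```python
from itertools import combinations


def percent_identity(row_a, row_b):
    """Percent of compared columns in which two aligned rows agree."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # column is a gap in both rows, so it is not compared
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_pairs(aligned):
    """Return (identity, name1, name2) tuples, most similar pair first."""
    pairs = [
        (percent_identity(aligned[p], aligned[q]), p, q)
        for p, q in combinations(sorted(aligned), 2)
    ]
    return sorted(pairs, reverse=True)


# Made-up aligned rows for illustration only.
rows = {"seqA": "MKT-LLV", "seqB": "MKTALLV", "seqC": "MRTA-IV"}
for identity, first, second in rank_pairs(rows):
    print(first, second, round(identity, 1))
```

In this toy input the pair seqA and seqB is the most similar, so a progressive method would join those two first.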
PROCEDURE
Type multiple sequence alignment tool in the web browser
search field (Figure 31).
Click Clustal Omega < Multiple Sequence Alignment <
EMBL-EBI, or
click Multiple Sequence Alignment - CLUSTALW -
GenomeNet (Figure 31).
Go to the home page (Figure 32).
Copy the sequences from Notepad (Figure 33).
Paste more than two sequences in the input sequences box
(Figure 34).
Figure 32: ClustalW home page
Figure 36: Phylogenetic tree
PREDICTING PHYSICOCHEMICAL PROPERTIES OF
PROTEIN SEQUENCE
AIM
To predict the physicochemical properties of a protein sequence
using ExPASy resources.
INTRODUCTION
With the help of the tool ProtParam, users may compute a
number of physical and chemical parameters for a specific protein that
is contained in Swiss-Prot or TrEMBL, as well as for a user-entered
protein sequence. Molecular weight, theoretical pI (isoelectric point),
amino acid composition, atomic composition, extinction coefficient,
estimated half-life, instability index, aliphatic index, and grand average
of hydropathicity (GRAVY) are among the calculated characteristics.
PROCEDURE
Type ExPASy in the web browser search box (Figure 37).
Click ExPASy.
Go to the ExPASy page and enter the specific proteomic tool in the search
box (Figure 38), or
enter the URL https://web.expasy.org/protparam/
For predicting physicochemical properties, type ProtParam
and press Enter.
Select the ProtParam tool (Figure 39).
Click “browse the resource website”; the ProtParam tool page is
opened in a new window (Figure 40).
Paste the protein sequence / accession number in the given box
(Figure 41).
Click compute parameters (Figure 41).
Figure 37: Search bar
Figure 38: Home page of ExPASy
OUTCOME
Students learn to analyse the physical and chemical
properties of a specific protein.
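Two of the parameters listed in the introduction are simple enough to compute by hand. The Python sketch below calculates the amino acid composition in mole percent and the aliphatic index using the commonly cited formula, mole percent of Ala + 2.9 x mole percent of Val + 3.9 x (mole percent of Ile + mole percent of Leu). It is an illustration under that assumption and is not the code used by the ProtParam server; the function names are hypothetical.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Mole percent of each residue in a one-letter protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return {aa: 100.0 * n / len(sequence) for aa, n in counts.items()}


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    comp = amino_acid_composition(sequence)
    ala = comp.get("A", 0.0)
    val = comp.get("V", 0.0)
    ile_leu = comp.get("I", 0.0) + comp.get("L", 0.0)
    return ala + 2.9 * val + 3.9 * ile_leu


# Toy peptide used only to show the calls.
print(amino_acid_composition("AAGG"))
print(round(aliphatic_index("AVIL"), 1))
```

A higher aliphatic index reflects a larger share of aliphatic side chains in the sequence.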
Figure 42: Result page of ProtParam tool
PREDICTING PEPTIDE MASS OF PROTEIN SEQUENCE
AIM
To predict the peptide mass of a protein sequence using an ExPASy
resource.
INTRODUCTION
In the analytical method of protein identification known as
peptide mass fingerprinting (PMF), the unknown protein of interest is
first broken up into smaller peptides whose absolute masses may be
precisely determined with a mass spectrometer such as MALDI-TOF
or ESI-TOF.
PROCEDURE
Type ExPASy in the web browser search field.
Click ExPASy.
Go to the ExPASy page and enter Peptide Mass in the search field
(Figure 43), or
type the URL https://web.expasy.org/peptide_mass/
Click the PeptideMass tool.
It leads to the PeptideMass resource page.
Click “Browse the resource website”; the PeptideMass tool
site opens in a new window (Figure 44).
Paste the protein sequence of interest.
Move the cursor to choose which type of enzyme is used to cleave the
protein sequence of interest (Figure 45).
Select "trypsin" from the drop-down box under "Select enzyme";
if another enzyme is wanted, use the drop-down menu and select
the desired enzyme (Figure 45).
Click perform (Figure 46).
The result page is shown (Figure 47).
Figure 43: ExPASy proteomic tool search site
Figure 44: PeptideMass resource site
Figure 45: PeptideMass tool site showing different enzymes for selecting to cleave
protein sequence
RESULT
The peptide with the highest mass is
TGPPGKPGPPGPPGPPGIQGIHQTLGGYYNK, position 309-339,
and its mass is 3036.5689.
The peptide with the lowest mass is QELK,
position 174-177, and its mass is 517.2980.
OUTCOME
Students can identify the molecular weight of peptides after
cleavage of a protein using different enzymes or chemicals.
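The two masses reported in the RESULT can be reproduced by summing residue masses. The Python sketch below assumes monoisotopic residue masses and the singly protonated form [M+H]+, which appear to correspond to the settings used for the result above. The trypsin rule used here (cleave after K or R unless the next residue is P) is the usual textbook simplification, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added for the [M+H]+ ion


def trypsin_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


# Toy sequence: the K followed by P is not cleaved.
for peptide in trypsin_digest("AKPGRQELK"):
    print(peptide, round(peptide_mass(peptide), 4))
```

With these values the peptide QELK comes to about 517.298, in agreement with the mass listed in the RESULT.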
Figure 46: PeptideMass tool site with pasted sequence and click to perform
Figure 47: Result page of PeptideMass
PREDICTING CLEAVAGE SITE OF PROTEIN SEQUENCE
AIM
To predict the cleavage site of protein sequence using ExPASy
Go to the PeptideCutter tool home page (Figure 49).
Paste the protein sequence of interest in the given box.
Move the cursor to choose the enzyme type to cleave the
sequence of interest, or select all available enzymes and
chemicals; click perform (Figure 50).
The result page is displayed (Figure 51).
RESULT
Proteinase K (the specific enzyme chosen) cleaves at 328 sites
in the given protein sequence.
OUTCOME
Students can predict the potential cleavage sites cleaved by
proteases or chemicals in a given protein sequence. PeptideCutter
returns the query sequence with the possible cleavage sites mapped on
it and/or a table of cleavage site positions.
PREDICTING SECONDARY STRUCTURE OF PROTEIN
SEQUENCE USING SOPMA TOOL
AIM
To predict the secondary structure of a protein sequence using
ExPASy resources.
INTRODUCTION
Secondary structure prediction is a group of techniques
in bioinformatics that aim to calculate the secondary structures of
proteins and nucleic acid sequences based on the information from
their primary structures. For nucleic acids, it predicts structures
such as helices and stem-loops, which form through base pairing and
base stacking interactions; for proteins, it predicts structures
such as alpha helices and beta strands. The Self-Optimized Prediction
Method with Alignment
(SOPMA) is a tool to predict the secondary structure of a protein.
Based on the query (primary sequence of a protein), SOPMA will
predict its secondary structure. Protein secondary structure prediction
offers insight into the activity, interactions, and functions of proteins
and serves as an important initial step toward tertiary structure
prediction. The local conformation of the polypeptide backbone
is referred to as the protein's secondary structure.
PROCEDURE
Type SOPMA tool in the web browser search field and press
Enter.
Click NPS@ SOPMA secondary structure prediction
(Figure 52).
Open the SOPMA secondary structure prediction tool home
page (Figure 53).
Paste a protein sequence in the given box (Figure 54).
Click submit (Figure 54).
The result page (Figure 55) is displayed.
Figure 52: Google search list with secondary structure prediction tool
Figure 53: Home page of SOPMA tool
Figure 54: Sequence pasted in box
Figure 55: Result page of SOPMA
RESULT
The location of secondary structures in the protein
sequence is predicted.
The secondary structures present in the given sequence are
alpha helix, extended strand, beta turn and random coil.
Random coil is the most prominent secondary structure
(70.89%) in the given sequence.
The lowest percentage of secondary structure in the given
sequence is beta turn.
OUTCOME
Students learn how to predict the regions of different forms of
secondary structure from the protein sequence.
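The percentages quoted in a secondary structure RESULT are simple proportions of a per-residue state string. The Python sketch below assumes the prediction is given as one letter per residue, with h, e, t and c standing for alpha helix, extended strand, beta turn and random coil. Both the letter coding and the example string are assumptions made for illustration.

```python
from collections import Counter

# Assumed one-letter state codes for the four reported classes.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def state_percentages(states):
    """Percentage of residues assigned to each secondary structure state."""
    counts = Counter(states.lower())
    total = sum(counts[code] for code in STATE_NAMES)
    if total == 0:
        raise ValueError("no recognised state letters in the input")
    return {
        name: round(100.0 * counts[code] / total, 2)
        for code, name in STATE_NAMES.items()
    }


# Made-up ten-residue prediction string.
for name, percent in state_percentages("hhhheecctc").items():
    print(name, percent)
```

Sorting the resulting values identifies the most and least prominent structure in the sequence.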
PREDICTING SECONDARY STRUCTURE OF PROTEIN
SEQUENCE USING CFSSP TOOL
AIM
To predict the secondary structure of the given protein sequence
through the CFSSP tool.
INTRODUCTION
CFSSP (Chou and Fasman Secondary Structure Prediction
Server) is an online protein secondary structure prediction server. The
output predicts regions of secondary structure, such as alpha helix,
beta sheet, and turns, from the amino
acid sequence. The method implemented in CFSSP is the Chou and
Fasman algorithm, which is based on analyses of the relative
frequencies of each amino acid in alpha helices, beta sheets, and turns
in known protein structures solved with X-ray crystallography.
CFSSP is freely accessible via the ExPASy server or directly from
BioGem tools at http://www.biogem.org/tool/chou-fasman.
PROCEDURE
In the web browser, type Chou Fasman secondary structure
prediction tool; the search list will be displayed (Figure 56).
Click CFSSP (http://www.biogem.org › tool › chou-fasman).
Open the CFSSP secondary structure prediction tool home
page (Figure 57).
Paste a protein sequence in the given box (Figure 58).
Click predict (Figure 58).
The result page (Figure 59) is displayed.
Figure 56: Search list of secondary structure prediction tool
Figure 57: CFSSP home page
Figure 58: Sequence pasted in the box
RESULT
The secondary structures present in the given sequence are
alpha helix, beta sheet and beta turn.
Alpha helix is the most prominent secondary structure
(32.9%) in the given sequence.
The lowest percentage of secondary structure (16.7%) in the
given sequence is beta turn.
OUTCOME
Students understand how to predict the regions of different
forms of secondary structure from the protein sequence.
PREDICTING TRANSMEMBRANE REGION OF
PROTEIN SEQUENCE
AIM
To predict the transmembrane region in the given protein
sequence/ID.
INTRODUCTION
The TMpred software predicts membrane-spanning regions and
their orientation. The technique is based on a statistical
analysis of TMbase, a database of transmembrane proteins.
A combination of several weight matrices for scoring is
used to make the prediction.
PROCEDURE
Go to the ExPASy page and enter TMpred in the search box (Figure 60).
Click TMpred.
Go to the home page of the TMpred tool (Figure 61).
Paste the protein sequence in the search field.
Click Run TMpred (Figure 61).
The results page is displayed (Figure 62).
Figure 60: ExPASy home page
RESULT
OUTCOME
Students comprehend the technique to predict the membrane-
spanning regions and their orientation. They also understand that
transmembrane proteins act as gateways for transporting specific
substances across the membrane.
PREDICTING HYDROPHILIC REGION IN THE
PROTEIN SEQUENCE
AIM
On the other hand, amino acids with high hydrophilicity show
that these residues are in touch with a solvent, such as water, and are
therefore likely to be found on the protein's outer surface. First, a
hydrophobicity rating between 4.6 and -4.6 is assigned to each
amino acid. The most hydrophobic score is 4.6, and the most
hydrophilic score is -4.6.
PROCEDURE
Type hydrophilicity plot in the web browser and press Enter.
The search list will be displayed; click protein hydrophilicity
plot - novoprolabs (Figure 63).
Go to the home page (Figure 64).
Paste the protein sequence in the given field.
Click submit (Figure 65).
Results will be displayed (Figure 66).
Figure 65: Home page
Figure 66: Hydrophilic regions in the protein sequence
RESULT
The hydrophilic residues in the given protein sequence were
predicted.
OUTCOME
Students learn to predict whether a protein segment is
hydrophilic enough to lie on the protein surface or hydrophobic
enough to either interact with or reside in a membrane.
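A hydropathy plot is obtained by averaging per-residue scores over a sliding window. The Python sketch below assumes the Kyte-Doolittle scale and a window of 7 residues. Both are illustrative choices, and the web tool used in this practical may apply a different scale or window; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Average hydropathy of every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("IIVLLAAKKDDEERR", window=7)
print([round(value, 2) for value in profile])
```

Windows with strongly negative averages mark hydrophilic stretches that are likely to be exposed to solvent, while strongly positive averages mark hydrophobic stretches.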
DETECTING ALIGNMENT OF REPEATS IN PROTEIN
SEQUENCES
AIM
To detect the alignment of repeats in a given protein sequence.
INTRODUCTION
RADAR stands for Rapid Automatic Detection and
Alignment of Repeats in protein sequences. RADAR identifies
gapped approximate repeats and complex repeat architectures
involving different types of repeats.
PROCEDURE
Type Radar protein tool in the web browser and press Enter.
Click the RADAR tool (ebi.uk) in the search list (Figure 67).
Go to the RADAR home page (Figure 68).
Paste the FASTA protein sequence in the input box.
Click submit.
The result page is shown (Figure 69).
Figure 67: Google search page
Figure 68: RADAR tool home page
RESULT
The given sequence contains 2 aligned repeats.
OUTCOME
Students are able to predict the number of aligned repeats
in protein sequences.
PREDICTING THE PEPTIDE STRUCTURE FOR THE
GIVEN PROTEIN SEQUENCE
AIM
To predict the peptide structure of a given protein sequence
using the Pepdraw tool.
INTRODUCTION
The Pepdraw tool is used to draw the primary structure of a peptide
and also to calculate its physicochemical properties.
PROCEDURE
Type Pepdraw tool in the web browser and press Enter.
Click Pepdraw in the search list (Figure 70).
Go to the Pepdraw home page (Figure 71).
Paste the FASTA protein sequence in the input box.
Click draw peptide.
The result page will be displayed (Figure 72).
Figure 70: Google search list
Figure 71: Pepdraw home page
RESULT
The protein sequence was converted into a peptide chain.
OUTCOME
Students acquire the skill to draw the primary chemical
structure of an amino acid sequence and to predict chemical
properties for any protein sequence.
CONVERSION OF NUCLEOTIDE SEQUENCES INTO
PROTEIN SEQUENCES
AIM
To convert the nucleotide sequences into protein sequences and to
identify the correct reading frame.
INTRODUCTION
Select compact Translate is a tool which allows the translation of a nucleotide
(DNA/RNA) sequence into a protein sequence. Translate accepts a
DNA sequence and converts it into a protein in the reading frame as
specified. Translate supports the entire IUPAC alphabet and several
genetic codes. A raw sequence or one or more FASTA sequences is
pasted in the text area. Input limit is 200,000,000 characters.
Determining is a complex process if a nucleic acid sequence actually
codes for a protein. Because, generally it is not known which strand is
the coding strand or which is the correct reading frame. Both these
questions are resolved by translating both strands in all three reading
frames and looking for the one that gives the longest amino acid
sequence before a stop codon is encountered. A stop codon is expected
to appear on average once for every 20 amino acids when reading a
sequence in the incorrect frame. It is possible for an out of frame
77 78
PROCEDURE
Select the nucleotide sequence, copy it and then paste it into the
translate sequence window in the ExPASy translate tool
Under Output format, select "Compact" or nucleotide sequence
without space
Click on Translate Sequence
Select nucleotide
RESULT
Output compact is selected; it gives the amino acid sequence as
one letter code with stop codons indicated by a hyphen with
different frames.
Output nucleotide sequence without space is selected; it gives the
nucleotide sequence with one letter code aminoacid with different
frames.
Red colour indicates the open reading frame.
OUTCOME
Students gain the knowledge to convert any unknown nucleotide
sequence into a protein sequence and to identify the most likely reading
frame. They also acquire knowledge about exons, pseudogenes, non-coding
regions of DNA and regulatory functions.
Figure 74: Result page of Translate showing the nucleotide sequence with amino acids
MOLECULAR VISUALIZATION USING RASMOL
AIM
To visualize the tertiary structure of a protein molecule in the
graphic view and through the command line
INTRODUCTION
RasMol is free software for molecular visualization created by
Roger Sayle. It is a molecular graphics program intended for the
visualization of proteins, nucleic acids and small molecules. The
program aims at display, teaching and generation of publication-quality
images. The program reads in a molecule coordinate file and
interactively displays the molecule on the screen in a variety of colour
schemes and molecular representations.
Figure 75: Open the PDB file in RASMOL tool
REQUIREMENT
RASMOL software, PDB molecule
PROCEDURE
Open a molecular visualization tool (Figure 75)
From the file menu open a PDB atom co-ordinate file
(Figure 76)
Rotate the molecule
Try various options (Figures 77, 78 and 79)
Try different commands in the command line and observe the
changes in structure (Figures 80 and 81)
Save the required structural view
Exit the application
COMMANDS
Select
Colour
Zoom on
Zoom off
Label on
Label off
Spacefill
Star on
Background pink
Stereo on
Stereo off
Pick angle
Pick distance
Label 250
Star
Hbonds
Wireframe
Cartoon
dots
Quit
Zoom
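The commands listed above can be typed one at a time at the RasMol prompt, or collected in a plain text file and run together with RasMol's script command. The Python sketch below merely writes such a file. The file names protein.pdb and view.ras and the function name are hypothetical, and the example commands are taken from this manual's command list and figure labels.

```python
from pathlib import Path


def build_rasmol_script(pdb_file, commands):
    """Return script text: a load line followed by one command per line."""
    lines = [f"load pdb {pdb_file}"]
    # Skip blank entries and trim stray spaces around each command.
    lines.extend(cmd.strip() for cmd in commands if cmd.strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    commands = [
        "background pink",
        "select carbon",
        "colour green",
        "spacefill",
        "zoom on",
    ]
    # protein.pdb and view.ras are placeholder names for this illustration.
    Path("view.ras").write_text(build_rasmol_script("protein.pdb", commands))
```

Typing script view.ras at the RasMol command line would then run the saved commands in order.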
RESULT
The tertiary structure of the protein molecule was visualized in the
graphic view and through the command line
Figure 78: Visualise the different forms of protein structure
OUTCOME
Students gain the ability to evaluate and interpret molecular
models. With molecular visualisation tools they can interpret
complicated molecular structures, properties and interactions.
These resources aid their study in the fields of chemistry,
pharmacology, biology and bioinformatics.
Figure 80: Visualise the different forms of protein structure
Figure 81: Various commands used in the command line (select carbon, colour green) to visualize the changes in structure
MEASUREMENT OF BOND LENGTH IN PROTEIN
STRUCTURE USING RASMOL
AIM
To measure the bond length between atoms in a protein structure
and to visualize it in the graphic view and through the command line.
INTRODUCTION
RasMol is free software for molecular visualization created
by Roger Sayle. It is a molecular graphics program intended for
the visualization of proteins, nucleic acids and small molecules. The
program aims at display, teaching and generation of publication-quality
images. The program reads in a molecule coordinate file and
interactively displays the molecule on the screen in a variety of colour
schemes and molecular representations.
Figure 82: Various commands used in the command line (label on, stereo on, background green) to visualize the changes in structure
PROCEDURE
Open RasMol and load a PDB atom coordinate file
Use the various menu options to explore the composition of the
molecule
Set the display style to “ball and stick” (preferable, but other
display styles work as well)
Use Shift + mouse down to zoom in on the molecule to see the
bonds more clearly.
Open the command line window
Type set picking distance and press the Enter key
Go to the display window and select the two atoms
participating in the bond by clicking on them one after the
other.
The command line window displays the bond length.
Record the results
Alternatively, to draw the bond line and measure the bond length
between two atoms, type set picking monitor in the command
line window.
Now click on the two atoms again.
A line appears between the two atoms (the line is removed if
the same atoms are clicked again).
Note the results from the command line window.
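The value reported when two atoms are picked is the straight-line distance between their coordinates, in ångströms. As a cross-check, the same number can be computed from the x, y and z columns of the PDB file. The Python sketch below assumes the standard fixed-width ATOM record layout (x in columns 31 to 38, y in 39 to 46, z in 47 to 54). The example lines use made-up coordinates, not values from a real structure, and the function names are hypothetical.

```python
import math

# Illustrative ATOM records (truncated after the z column, made-up values).
PDB_LINES = [
    "ATOM      1  N   ALA A   1       0.000   0.000   0.000",
    "ATOM      2  CA  ALA A   1       1.458   0.000   0.000",
]


def parse_atom(line):
    """Read the serial number, atom name, residue and coordinates."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def bond_length(a, b):
    """Euclidean distance between two (x, y, z) points, in ångströms."""
    return math.dist(a, b)


if __name__ == "__main__":
    first, second = (parse_atom(line) for line in PDB_LINES)
    length = bond_length(first["xyz"], second["xyz"])
    print(f"{first['name']} to {second['name']}: {length:.3f} Å")
```

Small differences from the value shown on screen can arise from rounding, so agreement to two decimal places is a reasonable check.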
RESULT
OUTCOME
Students learn how to find the distance between two atoms
using the command line and by directly picking atoms in the structure.
MEASUREMENT OF BOND ANGLE IN PROTEIN
STRUCTURE USING RASMOL
AIM
To measure the bond angle between atoms in a protein
structure and to visualize it in the graphic view and through the command line.
INTRODUCTION
RasMol is free and widely used software for
molecular visualization created by Roger Sayle. It is a molecular
graphics program intended for the visualization of proteins, nucleic
acids and small molecules. The program aims at display, teaching and
generation of publication-quality images. The program reads in a
molecule coordinate file and interactively displays the molecule on
the screen in a variety of colour schemes and molecular representations.
PROCEDURE
Open RasMol and load a PDB atom coordinate file
Use the various menu options to get a feel for the molecule
Set the display style to “ball and stick” (preferable, but other
display styles work as well)
Use Shift + mouse down to zoom in on the molecule to see the
bonds more clearly.
Go to the command line window.
Type set picking angle and press the Enter key
Go to the display window and select the three atoms forming the
bond angle by clicking on them one after the other.
The command line window displays the bond angle.
Note the results.
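The reported bond angle can be checked by hand from the coordinates of the three atoms. The Python sketch below computes the angle at the middle atom from the dot product of the two bond vectors. It assumes the second atom is the vertex of the angle. The coordinates are made-up illustrative values and the function name is hypothetical.

```python
import math


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the central atom."""
    ba = [p - q for p, q in zip(a, b)]  # vector from b to a
    bc = [p - q for p, q in zip(c, b)]  # vector from b to c
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Made-up coordinates (in ångströms) for three consecutive atoms.
    n = (0.000, 0.000, 0.000)
    ca = (1.458, 0.000, 0.000)
    c = (2.009, 1.420, 0.000)
    print(f"N-CA-C angle: {bond_angle(n, ca, c):.1f} degrees")
```

The order of the atoms matters: swapping the middle atom with one of the end atoms gives a different angle, just as picking the atoms in a different order does on screen.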
RESULT
OUTCOME
Students acquire the skills necessary to find the bond angle
between atoms using the command line and by directly picking atoms in
the structure.
V.V.VANNIAPERUMAL COLLEGE FOR WOMEN
(Belonging to Virudhunagar Hindu Nadars)
An Autonomous Institution Affiliated to Madurai Kamaraj University
Re–accredited with ‘A’ Grade (3rd cycle) by NAAC
VIRUDHUNAGAR.
WEBLINKS: