Practice in Bioinformatics (BT338IU) This is used internally only.

Labwork 5

BASIC MODELLER
MODELING A PROTEIN BASED ON A SINGLE TEMPLATE

Objectives
• Using MODELLER to construct a protein 3D structure from its amino acid sequence
• Choosing an appropriate available 3D protein structure as the template
• Changing the important information in MODELLER scripts
• Estimating the quality of the modeled protein using PROCHECK

Taken and revised from:

1/ Modeling lactate dehydrogenase from Trichomonas vaginalis based on a single template [1]. University of Illinois at Urbana-Champaign, Beckman Institute for Advanced Science and Technology, Theoretical and Computational Biophysics Group, Computational Biophysics Workshop.
2/ MODELLER - A Program for Protein Structure Modeling, Release 10.1, r12156 [2]. Andrej Šali.
3/ PROCHECK v.3.5 - Programs to check the Stereochemical Quality of Protein Structures [3]. Roman A Laskowski, Malcolm W MacArthur, David K Smith, David T Jones, E Gail Hutchinson, A Louise Morris, David S Moss, and Janet M Thornton.

[1] https://salilab.org/modeller/tutorial/basic.html
[2] https://salilab.org/modeller/manual/manual.html
[3] http://www.csb.yale.edu/userguides/datamanip/procheck/manual/index.html

MODELLER
MODELLER is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints.
MODELLER is most frequently used for homology or comparative protein structure modeling: the user provides an alignment of a sequence to be modeled with known related structures, and MODELLER automatically calculates a model with all non-hydrogen atoms (these structures are often homologs, but they do not have to be, hence the term "comparative" modeling).
More generally, the inputs to the program are restraints on the spatial structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well as possible. Restraints can in principle be derived from a number of different sources. These include related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of secondary structure packing (combinatorial modeling), cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis, intuition, residue-residue and atom-atom potentials of mean force, etc. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo-atoms. Presently, MODELLER automatically derives the restraints only from the known related structures and their alignment with the target sequence.
A 3D model is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for comparative modeling is optimized with the variable target function procedure in Cartesian space that employs methods of conjugate gradients and molecular dynamics with simulated annealing.
MODELLER can also perform multiple comparison of protein sequences and/or structures, clustering of proteins, and searching of sequence databases. The program is used with a scripting language and does not include any graphics. It is written in standard FORTRAN 90 and will run on UNIX, Windows, or Mac computers.

Figure 5.1: First, the known template 3D structures are aligned with the target sequence to be modeled. Second, spatial features, such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles, are transferred from the templates to the target (yielding a number of spatial restraints on its structure). Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Figure 5.2: The Modeller version 9.21 window

The individual modeling steps of this example are explained below. Note that we go through every
step in this tutorial to build a model knowing only the amino acid sequence. In practice you may already
know the related structures, and may even have an alignment from another program, so you can skip
one or more steps.


Lab procedure
Step 1. Making the amino acid sequence file in PIR format – sequence.ali file
First, put the target amino acid sequence into a PIR-format file, sequence.ali, which MODELLER can read.

sequence.ali: Change the protein code (on the first two lines) and the sequence to match your protein.

>P1;name_of_protein_in_code
sequence:name_of_protein_in_code:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKAAFKD
IDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPENFSSLSMLD
QNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRG
FTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFT
EKDLFHEKEIALNHLAQGG*

The first line contains two fields separated by a semicolon, in the format ">P1;name_of_protein_in_code". The protein code should be short, contain no spaces, and be distinct from the codes of other proteins.
The second line contains ten fields separated by colons and generally holds information about the structure file, if applicable. Only two of these fields are used for sequences: "sequence" (indicating that the entry is a sequence without a known structure) and "name_of_protein_in_code" (which should be identical to the code on the previous line).
The rest of the entry contains the sequence of the protein of interest, with "*" marking its end. The standard one-letter amino acid codes are used. (Note that they must be upper case; some lower case letters are used for non-standard residues and will be ignored.)

Fill in the file with the required information (name_of_protein_in_code and the amino acid sequence) and save it.

Note that the .ali file should be named after your name_of_protein_in_code to avoid confusion later. In this example, I will name it TvLDH (a short name for the protein), i.e., TvLDH.ali.
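The two header lines and the terminating "*" are easy to get wrong when the file is edited by hand. The following is a minimal sketch, not a MODELLER tool, that builds a PIR entry from a protein code and a sequence following the rules above. The function name make_pir_entry and the 75-residue line width are illustrative choices, and, as a conservative assumption, only the 20 standard upper-case residue letters are accepted (anything else raises an error instead of being silently ignored).

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def make_pir_entry(code: str, sequence: str, width: int = 75) -> str:
    """Build a PIR entry for a target sequence without a known structure.

    code     -- short protein code without spaces (e.g. 'TvLDH')
    sequence -- one-letter codes; whitespace and a trailing '*' are allowed
    width    -- residues per output line (an arbitrary layout choice)
    """
    if not code or any(ch.isspace() for ch in code):
        raise ValueError("the protein code must be non-empty and contain no spaces")
    seq = "".join(sequence.split()).rstrip("*")
    bad = sorted(set(seq) - STANDARD_RESIDUES)
    if bad:
        raise ValueError(f"non-standard or lower-case residue letters: {bad}")
    # Line 1: '>P1;' + code.  Line 2: ten colon-separated fields, of which only
    # the entry type ('sequence') and the code are filled in.
    lines = [f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00"]
    body = seq + "*"  # '*' marks the end of the sequence
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    entry = make_pir_entry("TvLDH", "MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLL")
    print(entry, end="")
    # To save it:  open("TvLDH.ali", "w").write(entry)
```

Checking the code here (no spaces) and the residue letters up front catches the most common reasons MODELLER later fails to find or read the sequence.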

Step 2. Searching for structures related to your sequence using build_profile.py file
A search for potentially related sequences of known structure can be performed with the Profile.build() command of MODELLER. The following script in the build_profile.py file, taken group by group (a line starting with "#" is a comment and is not run), does the job:
1. Initializes the 'environment' for this modeling run by creating a new 'Environ' object. Almost all MODELLER scripts require this step, as the new object (which we call here 'env', but you can call it anything you like) is needed to build most other useful objects.
2. Creates a new 'SequenceDB' object, calling it sdb. 'SequenceDB' objects are used to contain large databases of protein sequences. It then reads a text-format file containing non-redundant PDB sequences at 95% sequence identity into the sdb database. The sequences can be found in the file pdb_95.pir (which can be downloaded from https://salilab.org/modeller/supplemental.html). Like the sequence file from Step 1, this file is in PIR format. Sequences that have fewer than 30 or more than 4000 residues are discarded, and non-standard residues are removed.
3. Writes a binary, machine-specific file containing all sequences read in the previous step.
4. Reads the binary-format file back in. Note that if you plan to use the same database several times, you should use the previous two steps only the first time, to produce the binary database. On subsequent runs, you can omit those two steps and use the binary file directly, since reading the binary file is a lot faster than reading the PIR file.
5. Creates a new 'Alignment' object, calling it aln, reads our query sequence "name_of_protein_in_code" from the file sequence.ali (or whatever name you gave it in Step 1, e.g. TvLDH.ali), and converts it to a profile, prf. Profiles contain similar information to alignments, but are more compact and better for sequence database searching.
6. Searches the sequence database sdb for our query profile prf. Matches from the sequence database are added to the profile. The Profile.build() command has many options. In this example, the parameters are: the BLOSUM62 similarity matrix (rr_file); -450 (matrix_offset) and -500 and -50 (gap_penalties_1d), which are appropriate for the BLOSUM62 matrix; 1 search iteration (n_prof_iterations); False, so that the profile is not checked for deviation (check_profile); and a cutoff that keeps only sequences with e-values smaller than or equal to 0.01 (max_aln_evalue).
7. Writes the profile of the query sequence and its homologs to the build_profile.prf file. The equivalent information is also written out in standard alignment format (build_profile.ali).

build_profile.py: Change the input sequence file name (and, optionally, the output file names) to match your protein.

from modeller import *

log.verbose()
env = Environ()

#-- Prepare the input files

#-- Read in the sequence database
sdb = SequenceDB(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)

#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')

#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')

#-- Read in the target sequence/alignment
#   (use the name of your file from Step 1, e.g. 'TvLDH.ali' if you renamed it)
aln = Alignment(env)
aln.append(file='sequence.ali', alignment_format='PIR', align_codes='ALL')

#-- Convert the input sequence/alignment into profile format
prf = aln.to_profile()

#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)

#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')

#-- Convert the profile back to alignment format
aln = prf.to_alignment()

#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')

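After changing the required information and downloading the newest pdb_95.pir, run this regular Python script from your command-line window with the mod command of your MODELLER installation, for example: mod10.1 build_profile.py

Note that the command depends on the MODELLER version (mod9.21 for version 9.21, mod10.0 for version 10.0, and so on). The scripts in this manual use the class names introduced in MODELLER 10 (Environ, SequenceDB, Alignment, Model, AutoModel); with 9.x releases such as 9.21, the corresponding lower-case names (environ, sequence_db, alignment, model, automodel) must be used instead. The run creates two output files, build_profile.prf and build_profile.ali, in addition to the log file build_profile.log.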

Step 3. Selecting a template from output build_profile.prf file

The output of the build_profile.py script is written to the build_profile.log and build_profile.prf files (or whatever names you chose). MODELLER always produces a log file. Errors and warnings in log files can be found by searching for the "_E>" and "_W>" strings, respectively. In the text-format output file build_profile.prf, the first 6 commented lines indicate the input parameters used by MODELLER to build the profile. Subsequent lines correspond to the similarities detected by Profile.build().
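Searching the log by hand works, but a short filter makes the check repeatable. This is a minimal sketch relying only on the "_E>" and "_W>" markers described above; the function name is made up for illustration, and a line containing both markers is counted as an error.

```python
from typing import Iterable, List, Tuple


def scan_modeller_log(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): log lines containing '_E>' and '_W>' respectively."""
    errors: List[str] = []
    warnings: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        if "_E>" in text:
            errors.append(text)
        elif "_W>" in text:
            warnings.append(text)
    return errors, warnings
```

Usage, assuming the log file is in the current directory: `with open("build_profile.log") as fh: errors, warnings = scan_modeller_log(fh)`. An empty `errors` list is a quick sanity check before moving on to template selection.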

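build_profile.prf: The "#column number" line was added to this copy for reference and is not part of MODELLER's output. Note that the six header lines of this example do not match the settings in build_profile.py above (they show different GAP_PENALTIES_1D, MATRIX_OFFSET and RR_FILE values).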

# Number of sequences: 30
# Length of profile : 335
# N_PROF_ITERATIONS : 1
# GAP_PENALTIES_1D : -900.0 -50.0
# MATRIX_OFFSET : 0.0
# RR_FILE : ${MODINSTALL8v1}/modlib//as1.sim.mat

#column number
  1 2      3 4    5    6    7    8    9   10   11        12
  1 TvLDH  S 0  335    1  335    0    0    0   0.       0.0
  2 1a5z   X 1  312   75  242   63  229  164  28.  0.83E-08
  3 1b8pA  X 1  327    7  331    6  325  316  42.       0.0
  4 1bdmA  X 1  318    1  325    1  310  309  45.       0.0
  5 1t2dA  X 1  315    5  256    4  250  238  25.  0.66E-04
  6 1civA  X 1  374    6  334   33  358  325  35.       0.0
  7 2cmd   X 1  312    7  320    3  303  289  27.  0.16E-05
  8 1o6zA  X 1  303    7  320    3  287  278  26.  0.27E-05
  9 1ur5A  X 1  299   13  191    9  171  158  31.  0.25E-02
 10 1guzA  X 1  305   13  301    8  280  265  25.  0.28E-08
 11 1gv0A  X 1  301   13  323    8  289  274  26.  0.28E-04
 12 1hyeA  X 1  307    7  191    3  183  173  29.  0.14E-07
 13 1i0zA  X 1  332   85  300   94  304  207  25.  0.66E-05
 14 1i10A  X 1  331   85  295   93  298  196  26.  0.86E-05
 15 1ldnA  X 1  316   78  298   73  301  214  26.  0.19E-03
 16 6ldh   X 1  329   47  301   56  302  244  23.  0.17E-02
 17 2ldx   X 1  331   66  306   67  306  227  26.  0.25E-04
 18 5ldh   X 1  333   85  300   94  304  207  26.  0.30E-05
 19 9ldtA  X 1  331   85  301   93  304  207  26.  0.10E-05
 20 1llc   X 1  321   64  239   53  234  164  26.  0.20E-03
 21 1lldA  X 1  313   13  242    9  233  216  31.  0.31E-07
 22 5mdhA  X 1  333    2  332    1  331  328  44.       0.0
 23 7mdhA  X 1  351    6  334   14  339  325  34.       0.0
 24 1mldA  X 1  313    5  198    1  189  183  26.  0.13E-05
 25 1oc4A  X 1  315    5  191    4  186  174  28.  0.18E-04
 26 1ojuA  X 1  294   78  320   68  285  218  28.  0.43E-05
 27 1pzgA  X 1  327   74  191   71  190  114  30.  0.16E-06
 28 1smkA  X 1  313    7  202    4  198  188  34.       0.0
 29 1sovA  X 1  316   81  256   76  248  160  27.  0.93E-03
 30 1y6jA  X 1  289   77  191   58  167  109  33.  0.32E-05

The most important columns in the Profile.build() output are the second, tenth, eleventh and twelfth columns. In detail:
➢ Column 1: row number
➢ Column 2: code of the sequence; your target sequence in the first row, and for the database entries the PDB code (first 4 characters) followed by the chain ID (last character, if present)
➢ Columns 3-4: S and 0 for your sequence; X and 1 for the database entries of known structure
➢ Column 5: number of amino acids in the sequence
➢ Columns 6-7: start and end positions of the alignment in your sequence
➢ Columns 8-9: start and end positions of the alignment in the database entry
➢ Column 10: length of the alignment between your sequence and the PDB entry
➢ Column 11: percentage sequence identity between your sequence and the PDB sequence, over the length of the alignment
➢ Column 12: e-value of the pairwise alignment
➢ Column 13: the alignment between your sequence and the entry (omitted in this example)

Two criteria help to choose a suitable template. 1/ A sequence identity above approximately 25% over the length of the alignment indicates a potential template: the higher the identity, the better the template. 2/ The e-value of the alignment is a better measure of its significance: the smaller the e-value, the more significant the alignment. After choosing your template, download it in .pdb format from the RCSB PDB [4] database.

For example, in this case six PDB entries have the lowest e-value (0.0): 1b8pA, 1bdmA, 1civA, 5mdhA, 7mdhA, and 1smkA. Among them, 1bdmA has the highest identity (45%), but its alignment length is only 309 (309 × 0.45 ≈ 139 identical amino acids). Meanwhile, 5mdhA has the second-highest identity (44%) over an alignment length of 328 (328 × 0.44 ≈ 144 identical amino acids). Since 139 and 144 are not significantly different, either 1bdmA or 5mdhA could be chosen. In this case, I will choose 1bdmA (chain A of PDB ID 1bdm).
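The selection above can be sketched in a few lines of Python that read the columns listed earlier. This is an illustrative helper, not part of MODELLER: the 25% identity threshold and the 0.01 e-value cutoff mirror the criteria and the max_aln_evalue setting in this manual, and ranking by e-value first and identity second is an assumption chosen to reproduce the choice of 1bdmA. The estimated number of identical residues (column 10 × column 11 / 100) is reported alongside so the 139 vs. 144 comparison can be made directly.

```python
from typing import List, NamedTuple


class ProfileHit(NamedTuple):
    code: str          # column 2
    kind: str          # column 3: 'S' = your sequence, 'X' = database entry
    aln_length: int    # column 10
    identity: float    # column 11, percent
    evalue: float      # column 12

    @property
    def identical_residues(self) -> float:
        """Approximate number of identical residues = alignment length x identity."""
        return self.aln_length * self.identity / 100.0


def parse_prf_line(line: str) -> ProfileHit:
    f = line.split()
    return ProfileHit(f[1], f[2], int(f[9]), float(f[10]), float(f[11]))


def rank_templates(lines: List[str], min_identity: float = 25.0,
                   max_evalue: float = 0.01) -> List[ProfileHit]:
    """Database hits passing both criteria, best first (lowest e-value, then highest identity)."""
    hits = [parse_prf_line(ln) for ln in lines if ln.strip() and not ln.startswith("#")]
    ok = [h for h in hits
          if h.kind == "X" and h.identity >= min_identity and h.evalue <= max_evalue]
    return sorted(ok, key=lambda h: (h.evalue, -h.identity))


if __name__ == "__main__":
    rows = [  # a few rows copied from build_profile.prf above
        "1 TvLDH S 0 335 1 335 0 0 0 0. 0.0",
        "4 1bdmA X 1 318 1 325 1 310 309 45. 0.0",
        "22 5mdhA X 1 333 2 332 1 331 328 44. 0.0",
        "16 6ldh X 1 329 47 301 56 302 244 23. 0.17E-02",
    ]
    for hit in rank_templates(rows):
        print(f"{hit.code}: {hit.identity:.0f}% over {hit.aln_length} residues "
              f"(~{hit.identical_residues:.0f} identical), e-value {hit.evalue:g}")
```

Sorting by identity rather than by the estimated identical-residue count is what puts 1bdmA ahead of 5mdhA; swapping the sort key would pick 5mdhA, which, as noted above, is an equally reasonable template here.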

Step 4. Aligning your protein with your chosen PDB template by align2d.py file
A good way of aligning your sequence with the PDB template is the align2d() command
in MODELLER. This align2d() is based on a dynamic programming algorithm and takes into account
structural information from the template when constructing an alignment. This task is achieved through
a variable gap penalty function that tends to place gaps in solvent exposed and curved regions, outside
secondary structure segments, and between two positions that are close in space. As a result, the
alignment errors are reduced by approximately one third relative to those that occur with standard
sequence alignment techniques. This improvement becomes more important as the similarity between
the sequences decreases and the number of gaps increases. In the current example, the template-target similarity is so high that almost any alignment method with reasonable parameters will result in the same alignment. The following MODELLER script aligns your target sequence in the file TvLDH.ali with chain A of the 1bdm structure in the PDB file 1bdm.pdb.

[4] www.rcsb.org

align2d.py: Change the PDB code and chain of the template, the align_codes, and the file names to match your protein and template.

from modeller import *

env = Environ()
aln = Alignment(env)
mdl = Model(env, file='1bdm', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
aln.append(file='TvLDH.ali', align_codes='TvLDH')
aln.align2d()
aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')

Each command does the following:

1. Creates an 'Environ' object to use as input to later commands.
2. Creates an empty alignment, aln, and then a new protein model, mdl, into which the chain A segment of the 1bdm PDB structure file is read.
3. The append_model() command transfers the PDB sequence of this model to the alignment and assigns it the name "1bdmA" (align_codes for chain A of 1bdm).
4. The append() command adds your sequence from the file TvLDH.ali to the alignment, under your protein code (TvLDH).
5. The align2d() command is then executed to align the two sequences.
6. Finally, the alignment is written out in two formats, PIR (TvLDH-1bdmA.ali) and PAP (TvLDH-1bdmA.pap). The PIR format is used by MODELLER in the subsequent model building stage, while the PAP alignment format is easier to inspect visually. In the PAP format, all identical positions are marked with a "*".

After changing the required information and downloading the chosen PDB template (1bdm.pdb), run the command (use the mod command of your installed MODELLER version, see Step 2):
mod10.1 align2d.py

TvLDH-1bdmA.pap

_aln.pos          10        20        30        40        50        60
1bdmA     MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGL
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF
_consrvd  *     *  ********* *   ** **  * *  * * ** ** **  *    ********* ***

_aln.pos  70        80        90       100       110       120       130
1bdmA     EATDDPDVAFKDADYALLVGAAPRL---------QVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTN
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTN
_consrvd   ** **  **** * * **   *             *  **   *  *   **  ***** *** ***

_aln.pos   140       150       160       170       180       190       200
1bdmA     ALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSIMFPDLFHAEVD---
TvLDH     CEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEG
_consrvd    **   *  * * **     ** ***    * * *  *       *****   *  **  *

_aln.pos     210       220       230       240       250       260       270
1bdmA     -GRPALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPS
TvLDH     KTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPV
_consrvd           *       *      *   *   **  ****   *** *   *  **  *   **  *

_aln.pos       280       290       300       310       320       330
1bdmA     Q--GEYGIPEGIVYSFPVTAK-DGAYRVVEGLEINEFARKRMEITAQELLDEMEQVKAL--GLI
TvLDH     PEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG
_consrvd       ***  * * ***      *   ****   *   *     *   *  * *
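The "_consrvd" line simply marks the columns where both aligned sequences carry the same residue. The sketch below reproduces that marking and computes a percent identity for an aligned block; the function names are hypothetical, and dividing by the gap-free columns is one common definition assumed here, which may differ from the normalization MODELLER uses internally. For the first block above it finds 39 identical positions out of 67 gap-free columns (about 58%).

```python
def conserved_line(seq_a: str, seq_b: str) -> str:
    """'*' where two aligned sequences carry the same residue, ' ' elsewhere (gaps never match)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Identical positions as a percentage of the columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # First block of TvLDH-1bdmA.pap
    template = "MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGL"
    target = "MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF"
    print("1bdmA     " + template)
    print("TvLDH     " + target)
    print("_consrvd  " + conserved_line(template, target))
    print(f"identity over this block: {percent_identity(template, target):.1f}%")
```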

Step 5. Building the models using model-single.py file

Once a target-template alignment is constructed, MODELLER calculates a 3D model of the target
completely automatically, using its AutoModel class. The following script will generate five similar
models of the sequence based on the PDB template structure and the aligned .ali file from the previous
step.
Each command does the following:
1. The first two lines load the MODELLER module and the AutoModel class and prepare them for use.
2. An AutoModel object is created, called 'a', with parameters that guide the model-building procedure: alnfile names the file that contains the target-template alignment in PIR format (the .ali output of the previous step); knowns defines the known template structure; sequence defines the name of the target sequence; assess_methods requests one or more assessment scores.
3. starting_model and ending_model define the number of models that are calculated (their indices run from starting_model to ending_model, here 1 to 5).
4. The last line in the file calls the make() method, which actually calculates the models.

model-single.py: Change alnfile, knowns, and sequence to match your alignment file, template code, and protein code.

from modeller import *

from modeller.automodel import *
#from modeller import soap_protein_od

env = Environ()
a = AutoModel(env, alnfile='TvLDH-1bdmA.ali',
              knowns='1bdmA', sequence='TvLDH',
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 5
a.make()

Run the command (again with the mod command of your installed MODELLER version): mod10.1 model-single.py

The codes for knowns and sequence must be exactly the same as those used earlier, i.e., as in the aligned .ali file: knowns = the align_codes of the template model and sequence = the align_codes of your sequence in align2d.py. Equivalently, knowns = the first code and sequence = the second code in the TvLDH-1bdmA.pap file.
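A mismatch between these codes and the alignment file is a common cause of failed runs, and it can be checked before calling make(). The sketch below is a hypothetical helper, not a MODELLER function: it lists the ">P1;" codes in a PIR alignment together with the entry type on the following line (such as "sequence", or a type beginning with "structure" for templates) and reports problems. It assumes a single template code for knowns.

```python
from typing import Dict, List


def pir_entries(text: str) -> Dict[str, str]:
    """Map each '>P1;code' in a PIR alignment to the entry type on the next line."""
    lines = text.splitlines()
    entries: Dict[str, str] = {}
    for i, line in enumerate(lines):
        if line.startswith(">P1;"):
            code = line[4:].strip()
            header = lines[i + 1] if i + 1 < len(lines) else ""
            entries[code] = header.split(":", 1)[0]  # e.g. 'sequence' or 'structureX'
    return entries


def check_automodel_codes(text: str, knowns: str, sequence: str) -> List[str]:
    """Return a list of problems; an empty list means both codes were found."""
    entries = pir_entries(text)
    problems: List[str] = []
    if knowns not in entries:
        problems.append(f"knowns={knowns!r} not found; codes present: {sorted(entries)}")
    elif not entries[knowns].startswith("structure"):
        problems.append(f"knowns={knowns!r} is not a structure entry")
    if sequence not in entries:
        problems.append(f"sequence={sequence!r} not found; codes present: {sorted(entries)}")
    return problems
```

Usage: `check_automodel_codes(open("TvLDH-1bdmA.ali").read(), "1bdmA", "TvLDH")` should return an empty list for the alignment produced in Step 4.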

The most important output file is model-single.log, which reports warnings, errors and other useful
information including the input restraints used for modeling that remain violated in the final model. The
last few lines from this log file are shown below.


model-single.log

>> Summary of successfully produced models:

Filename molpdf DOPE score GA341 score
----------------------------------------------------------------------
TvLDH.B99990001.pdb 1763.56104 -38079.76172 1.00000
TvLDH.B99990002.pdb 1560.93396 -38515.98047 1.00000
TvLDH.B99990003.pdb 1712.44104 -37984.30859 1.00000
TvLDH.B99990004.pdb 1720.70801 -37869.91406 1.00000
TvLDH.B99990005.pdb 1840.91772 -38052.00781 1.00000

The log file gives a summary of all the models built. For each model, it lists the file name, which
contains the coordinates of the model in PDB format. The models can be viewed by any program that
reads the PDB format, such as VMD. The log also shows the score(s) of each model:
✓ The MODELLER objective function, molpdf, is always calculated, and is also reported in a
REMARK in each generated PDB file.
✓ The Discrete Optimized Protein Energy, DOPE, is used to assess homology models in protein
structure prediction.
✓ The Statistically Optimized Atomic Potentials, SOAP, is an orientation-dependent potential
and can only reliably be used for scoring (not optimization) as its first derivatives are zero.
✓ The GA341 score uses the percentage sequence identity between the template and the model
as a parameter and always ranges from 0.0 (worst) to 1.0 (native-like).
The molpdf, DOPE, and SOAP scores are not absolute measures, in the sense that they can only be
used to rank models calculated from the same alignment. The best model can be selected in several
ways: the model with the lowest value of the molpdf or the DOPE or the SOAP assessment scores, or
with the highest GA341 assessment score; however, GA341 is not as good as DOPE or SOAP at
distinguishing 'good' models from 'bad' models.
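Picking the best model from the summary table can be scripted. The following sketch parses the ">> Summary of successfully produced models" block shown above and returns the model with the lowest DOPE score. The function name is hypothetical, and it assumes exactly the three score columns of this example (molpdf, DOPE, GA341); adding SOAP to assess_methods would add a column and require adjusting the unpacking.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def parse_model_summary(log_text: str) -> List[Row]:
    """Read the summary table (file name, molpdf, DOPE, GA341) from an AutoModel log."""
    rows: List[Row] = []
    in_summary = False
    for line in log_text.splitlines():
        if line.startswith(">> Summary of successfully produced models"):
            in_summary = True
            continue
        fields = line.split()
        if in_summary and len(fields) == 4 and fields[0].endswith(".pdb"):
            name, molpdf, dope, ga341 = fields
            rows.append({"name": name, "molpdf": float(molpdf),
                         "DOPE": float(dope), "GA341": float(ga341)})
    return rows


def best_by(rows: List[Row], score: str = "DOPE") -> str:
    """Name of the model with the lowest value of the given score (molpdf or DOPE)."""
    return str(min(rows, key=lambda r: r[score])["name"])


if __name__ == "__main__":
    example = """>> Summary of successfully produced models:
Filename molpdf DOPE score GA341 score
----------------------------------------------------------------------
TvLDH.B99990001.pdb 1763.56104 -38079.76172 1.00000
TvLDH.B99990002.pdb 1560.93396 -38515.98047 1.00000
TvLDH.B99990003.pdb 1712.44104 -37984.30859 1.00000
TvLDH.B99990004.pdb 1720.70801 -37869.91406 1.00000
TvLDH.B99990005.pdb 1840.91772 -38052.00781 1.00000
"""
    print("lowest DOPE:", best_by(parse_model_summary(example)))
```

In the log shown above, TvLDH.B99990002.pdb has both the lowest DOPE score and the lowest molpdf, so either criterion selects the same model; GA341 is 1.0 for all five and does not discriminate between them.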
To calculate the SOAP score, you will first need to download the SOAP-Protein potential file from the SOAP website (https://salilab.org/SOAP/), then uncomment the SOAP-related lines in model-single.py by removing the '#' characters.

Step 6. Evaluating the quality of the model using PROCHECK on the SAVES server: https://saves.mbi.ucla.edu/

• Upload your chosen model to the server and run the programs.
• The next page offers six different programs to run on your protein. Choose PROCHECK.
• Read the output (Figure 5.3):
1. Summary: an overview of all criteria used to assess the stereochemical quality of the protein. The evaluations are marked "+" (warning: may be worth investigating further, but you can continue to the next experiments) or "*" (error: should be rechecked and investigated further).
2. Ramachandran plot: shows the phi-psi torsion angles for all residues in the structure (except those at the chain termini). Residues in unfavourable conformations are labelled with their residue name and position number.
3. All Ramachandrans: Ramachandran plots for each amino acid type. The numbers in brackets following each residue name show the total number of data points on that graph. Residues in unfavourable conformations are labelled by their position number.

4. Chi1-chi2 plots: show the chi1-chi2 torsion angle combinations for all residue types that have
both these angles. The shading on each plot indicates how favourable each region on the
plot is; the darker the shade the more favourable the region. The numbers in brackets,
following each residue name, show the total number of data points on that graph. Those
in unfavourable conformations are labelled by the position number.
5. Main-chain params: The six graphs on the main-chain parameters plot show how the structure (represented by the solid square) compares with well-refined structures at a similar resolution. The dark band in each graph represents the results from the well-refined structures; the central line is a least-squares fit to the mean trend as a function of resolution, while the width of the band on either side of it corresponds to a variation of one standard deviation about the mean. The six parameters are:
   a. Ramachandran plot quality, measured by the percentage of the protein's residues that are in the most favoured, or core, regions of the Ramachandran plot.
   b. Peptide bond planarity, measured by the standard deviation of the protein structure's omega torsion angles.
   c. Bad non-bonded interactions, measured by the number of bad contacts per 100 residues.
   d. Cα tetrahedral distortion, measured by the standard deviation of the zeta torsion angle.
   e. Main-chain hydrogen bond energy, measured by the standard deviation of the hydrogen bond energies for main-chain hydrogen bonds.
   f. Overall G-factor, a measure of the overall normality of the structure.
6. Side-chain params: The five graphs on the side-chain parameters plot show how the structure
(represented by the solid square) compares with well-refined structures at a similar
resolution. The dark band in each graph represents the results from the well-refined
structures; the central line is a least-squares fit to the mean trend as a function of
resolution, while the width of the band on either side of it corresponds to a variation of
one standard deviation about the mean. The 5 properties plotted are: Standard deviation
of the chi-1 gauche minus torsion angles, Standard deviation of the chi-1 trans torsion
angles, Standard deviation of the chi-1 gauche plus torsion angles, Pooled standard
deviation of all chi-1 torsion angles, Standard deviation of the chi-2 trans torsion angles.
7. Residue properties and Bond len/angle: The various graphs and diagrams on this plot show
how the protein's geometrical properties vary along its sequence. This gives a visualization
of which regions appear to have consistently poor or unusual geometry (perhaps because
they are poorly defined) and which have more normal geometry. Red bars indicate residues with unusual geometry.
8. M/c bond lengths: The histograms on this Main-chain bond length distributions plot show
the distributions of each of the different main-chain bond lengths in the structure. The
solid line in the centre of each plot corresponds to the small-molecule mean value, while
the dashed lines either side show the small-molecule standard deviation.
9. M/c bond angles: The histograms on this Main-chain bond angle distributions plot show the
distributions of each of the different main-chain bond angles in the structure. The solid
line in the centre of each plot corresponds to the small-molecule mean value, while the
dashed lines either side show the small-molecule standard deviation.

10. Planar groups: show the RMS distances from planarity for the different planar groups in the structure. The dashed lines indicate different ideal values for aromatic rings (Phe, Tyr, Trp, His) and for planar end-groups (Arg, Asn, Asp, Gln, Glu). The default values are 0.03 Å and 0.02 Å, respectively.
11. Program output: summarizes all information and files for each criterion.
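The phi and psi angles on the Ramachandran plots (and the omega angles behind the peptide-planarity check) are torsion angles, each defined by four consecutive backbone atoms: phi by C(i-1)-N-CA-C, psi by N-CA-C-N(i+1), and omega by CA-C-N-CA across the peptide bond. The sketch below computes such an angle from four sets of coordinates with the standard atan2 formula; it is an illustration of the geometry only, not PROCHECK's code, and reading atom coordinates from your model's PDB file is left out.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not from a real protein): the last atom is rotated
    # 90 degrees about the central bond relative to the first one.
    print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))
```

With this sign convention a cis arrangement gives 0°, trans gives ±180°, and a well-formed peptide bond should therefore have omega close to 180°.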

Figure 5.3: PROCHECK output

Drug Design
18 pages
Phyre2: Protein Modeling, Prediction and Analysis
No ratings yet
Phyre2: Protein Modeling, Prediction and Analysis
5 pages
Eswar MethodsMolBiol 2008
No ratings yet
Eswar MethodsMolBiol 2008
25 pages
Xi4 Series Parts Catalog en Us
No ratings yet
Xi4 Series Parts Catalog en Us
17 pages
Homology Modeling, Also Known As Comparative Modeling of
No ratings yet
Homology Modeling, Also Known As Comparative Modeling of
19 pages
Quectel BG96: Lte Cat Nb1
No ratings yet
Quectel BG96: Lte Cat Nb1
2 pages
Homolgy Modeling
No ratings yet
Homolgy Modeling
19 pages
Phyre2 (English Version)
No ratings yet
Phyre2 (English Version)
3 pages
Lab Report 2 Bioinformatics
No ratings yet
Lab Report 2 Bioinformatics
17 pages
EasyModeller4 Manual
No ratings yet
EasyModeller4 Manual
11 pages
Introduction To Solana - Grayscale-Building-Blocks-Solana-1
No ratings yet
Introduction To Solana - Grayscale-Building-Blocks-Solana-1
18 pages
Structure Based Drug Designing For Mycoplasmal Pneumonia
No ratings yet
Structure Based Drug Designing For Mycoplasmal Pneumonia
23 pages
Modeller
No ratings yet
Modeller
6 pages
Blender Blender: Bio Bio
No ratings yet
Blender Blender: Bio Bio
25 pages
Bioinformatics Softwares: by Rifat Shahriyar Student No: 100705037P
No ratings yet
Bioinformatics Softwares: by Rifat Shahriyar Student No: 100705037P
20 pages
Re Cross Docking Tutorial 27aug12
No ratings yet
Re Cross Docking Tutorial 27aug12
5 pages
BIOT 206 - Lecture 1
No ratings yet
BIOT 206 - Lecture 1
3 pages
Chimera Procedures
No ratings yet
Chimera Procedures
3 pages
Mastering Generic Programming in C++: Unlock the Secrets of Expert-Level Skills
From Everand
Mastering Generic Programming in C++: Unlock the Secrets of Expert-Level Skills
Larry Jones
No ratings yet
BentoML Adapter Integrations for Machine Learning Frameworks: The Complete Guide for Developers and Engineers
From Everand
BentoML Adapter Integrations for Machine Learning Frameworks: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Caffe Deep Learning Framework Essentials: Definitive Reference for Developers and Engineers
From Everand
Caffe Deep Learning Framework Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Mastering the Craft of C Programming: Unraveling the Secrets of Expert-Level Programming
From Everand
Mastering the Craft of C Programming: Unraveling the Secrets of Expert-Level Programming
Steve Jones
No ratings yet
Python for Chemistry: An introduction to Python algorithms, Simulations, and Programing for Chemistry (English Edition)
From Everand
Python for Chemistry: An introduction to Python algorithms, Simulations, and Programing for Chemistry (English Edition)
Dr. M. Kanagasabapathy
5/5 (1)
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
César Pérez López
No ratings yet
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. CLASSIFICATION PREDICTIVE TECHNIQUES: NAIVE BAYES, NEAREST NEIGHBORS and NEURAL NETWORKS: Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
From Everand
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
César Pérez López
No ratings yet
Image Classification: Step-by-step Classifying Images with Python and Techniques of Computer Vision and Machine Learning
From Everand
Image Classification: Step-by-step Classifying Images with Python and Techniques of Computer Vision and Machine Learning
Mark Magic
No ratings yet
Advanced Backend Code Optimization
From Everand
Advanced Backend Code Optimization
Sid Touati
No ratings yet
