Lab05 Manual

Uploaded by

laiphuong1112

Practice in Bioinformatics (BT338IU) This is used internally only.

Labwork 5

BASIC MODELLER
MODELING A PROTEIN BASED ON A SINGLE TEMPLATE

Objectives
• Using MODELLER to construct a protein structure from an amino acid sequence
• Choosing an appropriate available 3D protein structure as the template
• Changing the important information in MODELLER scripts
• Estimating the quality of the modeled protein using PROCHECK

Taken and revised from:


1/ Modeling lactate dehydrogenase from Trichomonas vaginalis based on a single template¹. University
of Illinois at Urbana-Champaign, Beckman Institute for Advanced Science and Technology, Theoretical
and Computational Biophysics Group, Computational Biophysics Workshop.
2/ MODELLER - A Program for Protein Structure Modeling, Release 10.1, r12156². Andrej Šali.
3/ PROCHECK v.3.5 - Programs to check the Stereochemical Quality of Protein Structures³. Roman A.
Laskowski, Malcolm W. MacArthur, David K. Smith, David T. Jones, E. Gail Hutchinson, A. Louise Morris,
David S. Moss, and Janet M. Thornton.

1 https://salilab.org/modeller/tutorial/basic.html
2 https://salilab.org/modeller/manual/manual.html
3 http://www.csb.yale.edu/userguides/datamanip/procheck/manual/index.html

MODELLER
MODELLER is a computer program that models three-dimensional structures of proteins and their
assemblies by satisfaction of spatial restraints.
MODELLER is most frequently used for homology or comparative protein structure modeling: The
user provides an alignment of a sequence to be modeled with known related structures and
MODELLER will automatically calculate a model with all non-hydrogen atoms (these structures are often
homologs, but certainly don't have to be, hence the term “comparative” modeling).
More generally, the input to the program is a set of restraints on the spatial structure of the amino acid
sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well
as possible. Restraints can in principle be derived from a number of different sources. These include
related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of
secondary structure packing (combinatorial modeling), cross-linking experiments, fluorescence
spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis, intuition,
residue-residue and atom-atom potentials of mean force, etc. The restraints can operate on distances, angles,
dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo
atoms. Presently, MODELLER automatically derives the restraints only from the known related
structures and their alignment with the target sequence.
A 3D model is obtained by optimization of a molecular probability density function (pdf). The
molecular pdf for comparative modeling is optimized with the variable target function
procedure in Cartesian space that employs methods of conjugate gradients and molecular
dynamics with simulated annealing.
MODELLER can also perform multiple comparison of protein sequences and/or
structures, clustering of proteins, and searching of sequence databases. The program is used with a
scripting language and does not include any graphics. It is written in standard FORTRAN 90 and
will run on UNIX, Windows, or Mac computers.

Figure 5.1: First, the known, template 3D structures are aligned with the target sequence to be modeled. Second,
spatial features, such as Cα-Cα distances, hydrogen bonds, and mainchain and sidechain dihedral angles, are
transferred from the templates to the target (a number of spatial restraints on its structure are obtained). Third, the
3D model is obtained by satisfying all the restraints as well as possible.

Figure 5.2: The Modeller version 9.21 window

The individual modeling steps of this example are explained below. Note that we go through every
step in this tutorial to build a model knowing only the amino acid sequence. In practice you may already
know the related structures, and may even have an alignment from another program, so you can skip
one or more steps.


Lab procedure
Step 1. Making the amino acid sequence file in PIR format – sequence.ali file
It is necessary to put the target amino acid sequence into the PIR format file sequence.ali which is
readable by MODELLER.

sequence.ali: The yellow highlighted sections should be changed depending on your protein.

>P1;name_of_protein_in_code
sequence:name_of_protein_in_code:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKAAFKD
IDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPENFSSLSMLD
QNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKIGHALNHLAQGG*

The first line contains two fields separated by a semicolon, in the format
">P1;name_of_protein_in_code". The protein code should be short, contain no spaces, and be
distinct from the codes of other proteins.
The second line with ten fields separated by colons generally contains information about the structure
file, if applicable. Only two of these fields are used for sequences, "sequence" (indicating that the file
contains a sequence without known structure) and "name_of_protein_in_code" (should be identical with
previous line).
The rest of the file contains the sequence of the interesting protein, with "*" marking its end. The
standard one-letter amino acid codes are used. (Note that they must be upper case; some lower case
letters are used for non-standard residues and will be ignored.)

Fill in the file with the needed information (name_of_protein_in_code, amino acid sequence)
and save it.

Note that the .ali file should be named after your name_of_protein_in_code to avoid confusion
later. In this example, I will name it TvLDH (a shortened name of the protein), so the file is TvLDH.ali.
Any script below that still reads sequence.ali must then be pointed at this file name.
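
The PIR layout described above can also be produced programmatically. The following is a minimal sketch in plain Python (it does not use MODELLER); the function name format_pir_entry and the 75-character line width are assumptions made for illustration, while the header layout follows the sequence.ali example shown in this step.

```python
import textwrap


def format_pir_entry(code: str, sequence: str, width: int = 75) -> str:
    """Return a PIR entry for a target sequence without known structure."""
    # Remove whitespace and line breaks, and force upper case, because
    # lower-case letters would be read as non-standard residues.
    residues = "".join(sequence.split()).upper().rstrip("*")
    if not code or any(ch.isspace() for ch in code):
        raise ValueError("the protein code must be non-empty and contain no spaces")
    if not residues.isalpha():
        raise ValueError("the sequence may contain only one-letter amino acid codes")
    header = f">P1;{code}"
    # Ten colon-separated fields; only the first two are used for a sequence.
    fields = ["sequence", code, "", "", "", "", "", "", "0.00", " 0.00"]
    # The '*' marks the end of the sequence; long sequences are wrapped.
    body = textwrap.wrap(residues + "*", width)
    return "\n".join([header, ":".join(fields), *body]) + "\n"


# First 40 residues of the example sequence, used only as a short demonstration
entry = format_pir_entry("TvLDH", "MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLL")
print(entry)
```

To save the result, the returned text can be written to a file named after the protein code, for example TvLDH.ali.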

Step 2. Searching for structures related to your sequence using build_profile.py file
A search for potentially related sequences of known structure can be performed with
the Profile.build() command of MODELLER. The following script in the build_profile.py file, taken
group by group (a line starting with “#” is a comment and will not be run), does the job:
1. Initializes the 'environment' for this modeling run, by creating a new 'Environ' object. Almost
all MODELLER scripts require this step, as the new object (which we call here 'env', but you
can call it anything you like) is needed to build most other useful objects.
2. Creates a new 'SequenceDB' object, calling it sdb. 'SequenceDB' objects are used to contain
large databases of protein sequences. It then reads a text format file containing
non-redundant PDB sequences at 95% sequence identity into the sdb database. The sequences
can be found in the file pdb_95.pir (which can be downloaded using the link
https://salilab.org/modeller/supplemental.html). Like the previously created sequence file, this
file is in PIR format. Sequences which have fewer than 30 or more than 4000 residues are
discarded, and non-standard residues are removed.

3. Writes a binary machine-specific file containing all sequences read in the previous step.
4. Reads the binary format file back in. Note that if you plan to use the same database several
times, you should use the previous two steps only the first time, to produce the binary
database. On subsequent runs, you can omit those two steps and use the binary file directly,
since reading the binary file is a lot faster than reading the PIR file.
5. Creates a new 'Alignment' object, calling it aln, reads our query sequence
"name_of_protein_in_code" from the file sequence.ali, and converts it to a profile prf.
Profiles contain similar information to alignments, but are more compact and better for
sequence database searching.
6. Searches the sequence database sdb for our query profile prf. Matches from the sequence
database are added to the profile. The Profile.build() command has many options.
In this example, we set the parameters as follows: the BLOSUM62 similarity matrix (rr_file);
-450 (matrix_offset) and -500 and -50 (gap_penalties_1d), which are appropriate for the
BLOSUM62 matrix; 1 search iteration (n_prof_iterations); False to skip checking
the profile for deviation (check_profile); and only sequences with e-values smaller than or
equal to 0.01 (max_aln_evalue).
7. Writes the profile of the query sequence and its homologs to the build_profile.prf file. The equivalent
information is also written out in standard alignment format (build_profile.ali).

build_profile.py: The yellow highlighted sections should be changed depending on your protein.

from modeller import *

log.verbose()
env = Environ()

#-- Prepare the input files

#-- Read in the sequence database
sdb = SequenceDB(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)

#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')

#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')

#-- Read in the target sequence/alignment
aln = Alignment(env)
aln.append(file='sequence.ali', alignment_format='PIR', align_codes='ALL')

#-- Convert the input sequence/alignment into
#   profile format
prf = aln.to_profile()

#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)

#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')

#-- Convert the profile back to alignment format
aln = prf.to_alignment()

#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')

After changing the needed information and downloading the newest pdb_95.pir, run this regular Python
script from your command line window: mod9.21 build_profile.py

Note that the command depends on the version of MODELLER. For example, to run this script using
MODELLER version 10.0, type mod10.0 build_profile.py. The run creates the output files described in Step 3.

Step 3. Selecting a template from output build_profile.prf file


The output of the build_profile.py script is written to the build_profile.log and build_profile.prf files (or
whatever names you chose). MODELLER always produces a log file. Errors and warnings in log files can be
found by searching for the "_E>" and "_W>" strings, respectively. In the text-format output file
build_profile.prf, the first 6 commented lines indicate the input parameters used in MODELLER to build
the profile. Subsequent lines correspond to the similarities detected by Profile.build().
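
Searching a log for the "_E>" and "_W>" markers can be automated. The sketch below is plain Python and only illustrates the idea; the function name scan_modeller_log is an assumption, and the sample log lines are made-up placeholders, not real MODELLER output.

```python
def scan_modeller_log(text: str) -> dict:
    """Collect (line number, line) pairs containing the error/warning markers."""
    found = {"errors": [], "warnings": []}
    for number, line in enumerate(text.splitlines(), start=1):
        if "_E>" in line:
            found["errors"].append((number, line.strip()))
        elif "_W>" in line:
            found["warnings"].append((number, line.strip()))
    return found


# Placeholder lines invented for this demonstration (NOT real MODELLER output)
sample_log = """\
placeholder informational line
xxxxxx_W> placeholder warning text
yyyyyy_E> placeholder error text
"""

report = scan_modeller_log(sample_log)
for kind, hits in report.items():
    print(kind, len(hits))
    for number, line in hits:
        print(f"  line {number}: {line}")
```

For a real run, the text of build_profile.log would be read from disk and passed to the function instead of sample_log.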

build_profile.prf: The yellow highlighted sections were added afterwards for more information.

# Number of sequences: 30
# Length of profile : 335
# N_PROF_ITERATIONS : 1
# GAP_PENALTIES_1D : -900.0 -50.0
# MATRIX_OFFSET : 0.0
# RR_FILE : ${MODINSTALL8v1}/modlib//as1.sim.mat

#column number
 1 2     3 4   5   6   7   8   9  10  11 12 13
 1 TvLDH S 0 335   1 335   0   0   0  0. 0.0
 2 1a5z  X 1 312  75 242  63 229 164 28. 0.83E-08
 3 1b8pA X 1 327   7 331   6 325 316 42. 0.0
 4 1bdmA X 1 318   1 325   1 310 309 45. 0.0
 5 1t2dA X 1 315   5 256   4 250 238 25. 0.66E-04
 6 1civA X 1 374   6 334  33 358 325 35. 0.0
 7 2cmd  X 1 312   7 320   3 303 289 27. 0.16E-05
 8 1o6zA X 1 303   7 320   3 287 278 26. 0.27E-05
 9 1ur5A X 1 299  13 191   9 171 158 31. 0.25E-02
10 1guzA X 1 305  13 301   8 280 265 25. 0.28E-08
11 1gv0A X 1 301  13 323   8 289 274 26. 0.28E-04
12 1hyeA X 1 307   7 191   3 183 173 29. 0.14E-07
13 1i0zA X 1 332  85 300  94 304 207 25. 0.66E-05
14 1i10A X 1 331  85 295  93 298 196 26. 0.86E-05
15 1ldnA X 1 316  78 298  73 301 214 26. 0.19E-03
16 6ldh  X 1 329  47 301  56 302 244 23. 0.17E-02
17 2ldx  X 1 331  66 306  67 306 227 26. 0.25E-04
18 5ldh  X 1 333  85 300  94 304 207 26. 0.30E-05
19 9ldtA X 1 331  85 301  93 304 207 26. 0.10E-05
20 1llc  X 1 321  64 239  53 234 164 26. 0.20E-03
21 1lldA X 1 313  13 242   9 233 216 31. 0.31E-07
22 5mdhA X 1 333   2 332   1 331 328 44. 0.0
23 7mdhA X 1 351   6 334  14 339 325 34. 0.0
24 1mldA X 1 313   5 198   1 189 183 26. 0.13E-05
25 1oc4A X 1 315   5 191   4 186 174 28. 0.18E-04
26 1ojuA X 1 294  78 320  68 285 218 28. 0.43E-05
27 1pzgA X 1 327  74 191  71 190 114 30. 0.16E-06
28 1smkA X 1 313   7 202   4 198 188 34. 0.0
29 1sovA X 1 316  81 256  76 248 160 27. 0.93E-03
30 1y6jA X 1 289  77 191  58 167 109 33. 0.32E-05

The most important columns in the Profile.build() output are the second, tenth, eleventh and
twelfth columns. In detail:
➢ Column 1: sequence number (order in the profile)
➢ Column 2: PDB code (first 4 characters) and chain name (last letter) of the PDB entry
➢ Column 3-4: flags marking your sequence (S/0) or a PDB entry with known structure (X/1)
➢ Column 5: number of amino acids in the sequence
➢ Column 6-7: start and end positions of the aligned region in your sequence
➢ Column 8-9: start and end positions of the aligned region in the PDB entry
➢ Column 10: length of the alignment between your sequence and the PDB entry
➢ Column 11: percentage of identical residues between your sequence and the PDB sequence
➢ Column 12: e-value of the pairwise alignment
➢ Column 13: the aligned sequence of each entry (omitted in this example)

There are 2 criteria for choosing a suitable template. 1/ A sequence identity above approximately
25% over the length of the alignment indicates a potential template; the higher the identity, the better
the template. 2/ The e-value is a better measure of the significance of the alignment; the smaller, the better.
After choosing your template, download it in .pdb format from the RCSB PDB⁴ database.

For example, in this case 6 PDB entries have the lowest e-value of 0: 1b8pA, 1bdmA, 1civA, 5mdhA,
7mdhA, and 1smkA. Among them, 1bdmA has the highest identity (45%), but its alignment
length is only 309 (~139 identical amino acids). Meanwhile, 5mdhA has the second highest
identity (44%) with an alignment length of 328 (~144 identical amino acids). The difference between 139
and 144 is not significant, so either 1bdmA or 5mdhA can be chosen. In this case, I will choose
1bdmA (chain A of PDB ID 1bdm).
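
The two selection criteria can be applied to the profile rows with a few lines of plain Python. This is only a sketch: the names Hit, parse_hits and rank_templates are assumptions, the ranking rule (e-value first, then the estimated number of identical residues) is one possible heuristic rather than a MODELLER feature, and only a handful of rows from the table above are embedded.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    code: str          # column 2
    aln_length: int    # column 10
    identity: float    # column 11, in percent
    evalue: float      # column 12

    @property
    def identical_residues(self) -> int:
        # e.g. 45% of 309 aligned positions is about 139 residues
        return round(self.aln_length * self.identity / 100)


def parse_hits(prf_text: str) -> list:
    """Read the database hits (type 'X') from profile rows in text format."""
    hits = []
    for line in prf_text.splitlines():
        cols = line.split()
        # Skip comments, short lines and the target row (type 'S')
        if len(cols) < 12 or line.lstrip().startswith("#") or cols[2] != "X":
            continue
        hits.append(Hit(cols[1], int(cols[9]), float(cols[10]), float(cols[11])))
    return hits


def rank_templates(hits, min_identity: float = 25.0) -> list:
    """Keep hits at or above the identity threshold; smallest e-value first."""
    candidates = [h for h in hits if h.identity >= min_identity]
    return sorted(candidates, key=lambda h: (h.evalue, -h.identical_residues))


# A few rows copied from the build_profile.prf example above
rows = """\
 1 TvLDH S 0 335   1 335   0   0   0  0. 0.0
 3 1b8pA X 1 327   7 331   6 325 316 42. 0.0
 4 1bdmA X 1 318   1 325   1 310 309 45. 0.0
 5 1t2dA X 1 315   5 256   4 250 238 25. 0.66E-04
22 5mdhA X 1 333   2 332   1 331 328 44. 0.0
"""

ranked = rank_templates(parse_hits(rows))
for hit in ranked:
    print(hit.code, hit.evalue, hit.identity, hit.identical_residues)
```

With this heuristic 5mdhA and 1bdmA come out next to each other at the top, which agrees with the reasoning above that either of them is an acceptable choice.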

4 www.rcsb.org

Step 4. Aligning your protein with your chosen PDB template using the align2d.py file
A good way of aligning your sequence with the PDB template is the align2d() command
in MODELLER. align2d() is based on a dynamic programming algorithm and takes into account
structural information from the template when constructing an alignment. This task is achieved through
a variable gap penalty function that tends to place gaps in solvent exposed and curved regions, outside
secondary structure segments, and between two positions that are close in space. As a result, the
alignment errors are reduced by approximately one third relative to those that occur with standard
sequence alignment techniques. This improvement becomes more important as the similarity between
the sequences decreases and the number of gaps increases. In the current example, the template-target
similarity is so high that almost any alignment method with reasonable parameters will result in the same
alignment. The following MODELLER script aligns your target sequence in the file TvLDH.ali with
chain A of the 1bdm structure in the PDB file 1bdm.pdb.

align2d.py: The yellow highlighted sections should be changed depending on your protein.

from modeller import *

env = Environ()
aln = Alignment(env)
mdl = Model(env, file='1bdm', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
aln.append(file='TvLDH.ali', align_codes='TvLDH')
aln.align2d()
aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')

Each command/line does the following:
1. Creates an 'Environ' object to use as input to later commands.
2. Creates an empty alignment aln, and then a new protein model mdl, into which the chain A
segment of the 1bdm PDB structure file is read.
3. The append_model() command transfers the PDB sequence of this model to the alignment
and assigns it the name "1bdmA" (align_codes for chain A of 1bdm).
4. The append() command adds your sequence from the file TvLDH.ali to the alignment under
your protein code (align_codes='TvLDH').
5. The align2d() command is then executed to align the two sequences.
6. Finally, the alignment is written out in two formats, PIR (TvLDH-1bdmA.ali) and PAP
(TvLDH-1bdmA.pap). The PIR format is used by MODELLER in the subsequent model building stage,
while the PAP alignment format is easier to inspect visually. In the PAP format, all identical
positions are marked with a "*".

After changing needed information and downloading the chosen PDB template, run the command:
mod9.21 align2d.py

TvLDH-1bdmA.pap

_aln.pos          10        20        30        40        50        60
1bdmA     MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPVILQLLEIPQAMKALEGVVMELEDCAFPLLAGL
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF
_consrvd  *     *  ********* *   ** **  * *  * * ** ** **  *    ********* ***

_aln.pos  70        80        90       100       110       120       130
1bdmA     EATDDPDVAFKDADYALLVGAAPRL---------QVNGKIFTEQGRALAEVAKKDVKVLVVGNPANTN
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTN
_consrvd   ** **  **** * * **   *             *  **   *  *   **  ***** *** ***

_aln.pos   140       150       160       170       180       190       200
1bdmA     ALIAYKNAPGLNPRNFTAMTRLDHNRAKAQLAKKTGTGVDRIRRMTVWGNHSSIMFPDLFHAEVD---
TvLDH     CEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEG
_consrvd    **   *  * * **     ** ***    * * *  *       *****   *  **  *

_aln.pos     210       220       230       240       250       260       270
1bdmA     -GRPALELVDMEWYEKVFIPTVAQRGAAIIQARGASSAASAANAAIEHIRDWALGTPEGDWVSMAVPS
TvLDH     KTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPV
_consrvd           *       *      *   *   **  ****   *** *   *  **  *   **  *

_aln.pos       280       290       300       310       320       330
1bdmA     Q--GEYGIPEGIVYSFPVTAK-DGAYRVVEGLEINEFARKRMEITAQELLDEMEQVKAL--GLI
TvLDH     PEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG
_consrvd       ***  * * ***      *   ****   *   *     *   *  * *
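
The "_consrvd" line of the PAP file simply marks the columns in which both aligned sequences carry the same residue. As a minimal sketch in plain Python (the function name conserved_line is an assumption, and the input is the first 36 columns of the alignment above), it can be reproduced like this:

```python
def conserved_line(seq_a: str, seq_b: str) -> str:
    """Return a string with '*' under identical residues and ' ' elsewhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b)
    )


# First 36 alignment columns of 1bdmA (template) and TvLDH (target)
template = "MKAPVRVAVTGAAGQIGYSLLFRIAAGEMLGKDQPV"
target = "MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQV"

marks = conserved_line(template, target)
print(template)
print(target)
print(marks)
print("identical positions:", marks.count("*"))
```

Gap characters ("-") are never counted as identical, so a column with a gap always stays blank.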

Step 5. Building the models using model-single.py file


Once a target-template alignment is constructed, MODELLER calculates a 3D model of the target
completely automatically, using its AutoModel class. The following script will generate five similar
models of the sequence based on the PDB template structure and the aligned .ali file from the previous
step.
Each command/line does the following:
1. The first and second lines load the AutoModel class and prepare it for use.
2. An AutoModel object is created and called 'a', with parameters set to guide the model building
procedure: alnfile names the file that contains the target-template alignment in the PIR
format (the .ali file written in the previous step); knowns defines the known template structure;
sequence defines the name of the target sequence; assess_methods requests one or more
assessment scores.
3. starting_model and ending_model define the number of models that are calculated
(their indices run from starting_model to ending_model, here 1 to 5).
4. The last line in the file calls the make method that actually calculates the models.

model-single.py: The yellow highlighted sections should be changed depending on your protein.

from modeller import *
from modeller.automodel import *
#from modeller import soap_protein_od

env = Environ()
a = AutoModel(env, alnfile='TvLDH-1bdmA.ali',
              knowns='1bdmA', sequence='TvLDH',
              assess_methods=(assess.DOPE,
                              #soap_protein_od.Scorer(),
                              assess.GA341))
a.starting_model = 1
a.ending_model = 5
a.make()

Run the command: mod9.21 model-single.py

The codes for knowns and sequence must be exactly the same as those used before, i.e. as in the aligned .ali
file: knowns = the align_codes of the model and sequence = the align_codes of your sequence in
align2d.py. Equivalently, knowns = the first code and sequence = the second code in the TvLDH-1bdmA.pap file.
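
A mismatch between these codes and the alignment file is easy to catch before starting a modeling run. The sketch below is plain Python; the helper names pir_codes and check_codes are assumptions, and the embedded text is a shortened stand-in for TvLDH-1bdmA.ali in which the header fields of the template entry are replaced by a placeholder.

```python
def pir_codes(ali_text: str) -> list:
    """Return the codes of all PIR entries (the text after '>P1;'), in file order."""
    return [line[4:].strip() for line in ali_text.splitlines()
            if line.startswith(">P1;")]


def check_codes(ali_text: str, knowns: str, sequence: str) -> list:
    """Raise ValueError if either code is missing from the alignment file."""
    codes = pir_codes(ali_text)
    missing = [code for code in (knowns, sequence) if code not in codes]
    if missing:
        raise ValueError(f"codes not found: {missing}; available codes: {codes}")
    return codes


# Shortened stand-in for TvLDH-1bdmA.ali: sequences are truncated and the
# template header is a placeholder, NOT the real line written by MODELLER.
ali = """\
>P1;1bdmA
structureX:1bdm.pdb:PLACEHOLDER_FIELDS
MKAPVRVAVTG*

>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
MSEAAHVLITG*
"""

print(check_codes(ali, knowns="1bdmA", sequence="TvLDH"))
```

In practice the text would be read from the real alignment file, and the same two strings would then be passed to AutoModel as knowns and sequence.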

The most important output file is model-single.log, which reports warnings, errors and other useful
information including the input restraints used for modeling that remain violated in the final model. The
last few lines from this log file are shown below.


model-single.log

>> Summary of successfully produced models:
Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
TvLDH.B99990001.pdb           1763.56104   -38079.76172        1.00000
TvLDH.B99990002.pdb           1560.93396   -38515.98047        1.00000
TvLDH.B99990003.pdb           1712.44104   -37984.30859        1.00000
TvLDH.B99990004.pdb           1720.70801   -37869.91406        1.00000
TvLDH.B99990005.pdb           1840.91772   -38052.00781        1.00000

The log file gives a summary of all the models built. For each model, it lists the file name, which
contains the coordinates of the model in PDB format. The models can be viewed by any program that
reads the PDB format, such as VMD. The log also shows the score(s) of each model:
✓ The MODELLER objective function, molpdf, is always calculated, and is also reported in a
REMARK in each generated PDB file.
✓ The Discrete Optimized Protein Energy, DOPE, is used to assess homology models in protein
structure prediction.
✓ The Statistically Optimized Atomic Potentials, SOAP, is an orientation-dependent potential
and can only reliably be used for scoring (not optimization) as its first derivatives are zero.
✓ The GA341 score uses the percentage sequence identity between the template and the model
as a parameter and always ranges from 0.0 (worst) to 1.0 (native-like).
The molpdf, DOPE, and SOAP scores are not absolute measures, in the sense that they can only be
used to rank models calculated from the same alignment. The best model can be selected in several
ways: the model with the lowest value of the molpdf or the DOPE or the SOAP assessment scores, or
with the highest GA341 assessment score; however, GA341 is not as good as DOPE or SOAP at
distinguishing 'good' models from 'bad' models.
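
Picking the model with the lowest DOPE score from the summary can be scripted. The following plain-Python sketch assumes the four-column layout shown above (without the optional SOAP column); the function name parse_summary is an assumption, and the numbers are the ones from the model-single.log excerpt.

```python
def parse_summary(log_text: str) -> list:
    """Extract file name, molpdf, DOPE and GA341 from the summary rows."""
    models = []
    for line in log_text.splitlines():
        cols = line.split()
        # Data rows have exactly four columns and start with a .pdb file name
        if len(cols) == 4 and cols[0].endswith(".pdb"):
            models.append({
                "file": cols[0],
                "molpdf": float(cols[1]),
                "DOPE": float(cols[2]),
                "GA341": float(cols[3]),
            })
    return models


summary = """\
Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
TvLDH.B99990001.pdb           1763.56104   -38079.76172        1.00000
TvLDH.B99990002.pdb           1560.93396   -38515.98047        1.00000
TvLDH.B99990003.pdb           1712.44104   -37984.30859        1.00000
TvLDH.B99990004.pdb           1720.70801   -37869.91406        1.00000
TvLDH.B99990005.pdb           1840.91772   -38052.00781        1.00000
"""

models = parse_summary(summary)
best_dope = min(models, key=lambda m: m["DOPE"])      # lower is better
best_molpdf = min(models, key=lambda m: m["molpdf"])  # lower is better
print("lowest DOPE:  ", best_dope["file"], best_dope["DOPE"])
print("lowest molpdf:", best_molpdf["file"], best_molpdf["molpdf"])
```

In this example both criteria point to the same model, TvLDH.B99990002.pdb; when they disagree, the remark above about DOPE being the more reliable discriminator applies.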
To calculate the SOAP score, you will first need to download the SOAP-Protein potential file from the
SOAP website⁵, then uncomment the SOAP-related lines in model-single.py by removing the '#'
characters.

5 https://salilab.org/SOAP/

Step 6. Evaluating the quality of the model using the PROCHECK server: https://saves.mbi.ucla.edu/
• Upload your chosen model to the server and run the programs.
• The next page will give you 6 different things to do with your protein. Choose PROCHECK.
• Read the output (fig 5.3):
1. Summary: overview of all criteria used to assess the quality of the protein based on its stereochemistry.
The evaluations include “+”: a warning, which may or may not be worth investigating further
before continuing to the next experiments; and “*”: an error, which should be rechecked and investigated
further.
2. Ramachandran plot: shows the phi-psi torsion angles for all residues in the ensemble (except
those at the chain termini). Those in unfavourable conformations are labelled with
their residue name and position number.
3. All Ramachandrans: Ramachandran plot of each available amino acid type. The numbers in
brackets, following each residue name, show the total number of data points on that
graph. Those in unfavourable conformations are labelled with the position number.
4. Chi1-chi2 plots: show the chi1-chi2 torsion angle combinations for all residue types that have
both these angles. The shading on each plot indicates how favourable each region on the
plot is; the darker the shade the more favourable the region. The numbers in brackets,
following each residue name, show the total number of data points on that graph. Those
in unfavourable conformations are labelled with the position number.
5. Main-chain params: The six graphs on the main-chain parameters plot show how the
structure (represented by the solid square) compares with well-refined structures at a
similar resolution. The dark band in each graph represents the results from the well-refined
structures; the central line is a least-squares fit to the mean trend as a function of
resolution, while the width of the band on either side of it corresponds to a variation of
one standard deviation about the mean.
a. Ramachandran plot quality is measured by the percentage of the protein's residues that are
in the most favoured, or core, regions of the Ramachandran plot.
b. Peptide bond planarity is measured by calculating the standard deviation of the protein
structure's omega torsion angles.
c. Bad non-bonded interactions are measured by the number of bad contacts per 100 residues.
d. Calpha tetrahedral distortion is measured by calculating the standard deviation of the zeta
torsion angle.
e. Main-chain hydrogen bond energy is measured by the standard deviation of the hydrogen bond
energies for main-chain hydrogen bonds.
f. Overall G-factor is a measure of the overall normality of the structure.
6. Side-chain params: The five graphs on the side-chain parameters plot show how the structure
(represented by the solid square) compares with well-refined structures at a similar
resolution. The dark band in each graph represents the results from the well-refined
structures; the central line is a least-squares fit to the mean trend as a function of
resolution, while the width of the band on either side of it corresponds to a variation of
one standard deviation about the mean. The 5 properties plotted are: standard deviation
of the chi-1 gauche minus torsion angles, standard deviation of the chi-1 trans torsion
angles, standard deviation of the chi-1 gauche plus torsion angles, pooled standard
deviation of all chi-1 torsion angles, and standard deviation of the chi-2 trans torsion angles.
7. Residue properties and Bond len/angle: The various graphs and diagrams on this plot show
how the protein's geometrical properties vary along its sequence. This gives a visualization
of which regions appear to have consistently poor or unusual geometry (perhaps because
they are poorly defined) and which have more normal geometry. The red colored bars
indicate the unusual residues.
8. M/c bond lengths: The histograms on this main-chain bond length distributions plot show
the distributions of each of the different main-chain bond lengths in the structure. The
solid line in the centre of each plot corresponds to the small-molecule mean value, while
the dashed lines on either side show the small-molecule standard deviation.
9. M/c bond angles: The histograms on this main-chain bond angle distributions plot show the
distributions of each of the different main-chain bond angles in the structure. The solid
line in the centre of each plot corresponds to the small-molecule mean value, while the
dashed lines on either side show the small-molecule standard deviation.
10. Planar groups: show the RMS distances from planarity for the different planar groups in the
structure. The dashed lines indicate different ideal values for aromatic rings (Phe, Tyr, Trp,
His) and for planar end-groups (Arg, Asn, Asp, Gln, Glu). The default values are 0.03Å and
0.02Å, respectively.
11. Program output: summarizes all information and files for each criterion.
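
The phi and psi values plotted in the Ramachandran plots (items 2 and 3) are dihedral angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for phi and N, CA, C, N(i+1) for psi. The sketch below shows one common way to compute a signed dihedral angle from four points in plain Python. It is not the PROCHECK implementation, the function name dihedral is an assumption, and the coordinates are artificial points chosen so that the result is easy to verify.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Artificial test points (not atoms from a real structure)
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, right_angle, trans)
```

Applied to the backbone atom coordinates of each residue in a model PDB file, the same function would give the phi and psi values that place that residue on the plot.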

Figure 5.3: PROCHECK output
