Tutorial 3
Introduction:
This tutorial covers the following advanced processing of SAXS data:
• Evaluating ambiguity of 3D shape reconstructions - AMBIMETER
• 3D reconstruction of bead models - DAMMIF/N and DAMAVER
• Comparing 3D bead model reconstructions and crystal structures (real space)
– SUPCOMB and ChimeraX
• Comparing scattering data and crystal structures (q space) - CRYSOL and
FoXS
• 3D reconstruction of electron density – DENSS
• Comparing 3D electron density reconstructions and crystal structures (real
space) – ChimeraX and PyMOL
Requirements:
1. BioXTAS RAW, version 2.1.4 (newest).
• Install instructions are available from:
https://bioxtas-raw.readthedocs.io/en/latest/install.html
• This tutorial assumes you are familiar with RAW.
2. ATSAS programs, version >=3.0.1.
• Download and install instructions are available from:
http://www.embl-hamburg.de/biosaxs/download.html
• Requires a free registration for academic users. Industrial users must
pay to use.
3. ChimeraX.
• Download and install instructions are available from:
https://www.cgl.ucsf.edu/chimera/download.html
4. PyMOL, version >=2.0.
• A free trial download is available from: https://pymol.org/2/
5. Internet connection.
1. Clear all of the data in RAW. Load the glucose_isomerase.out file that you
saved in the reconstruction_data folder in a previous part of the tutorial.
• Note: If you haven’t done the previous part of the tutorial, or forgot to
save the results, you can find the glucose_isomerase.out file in the
reconstruction_data/gi_complete folder.
2. Right click on the glucose_isomerase.out item in the IFT list. Select the
“AMBIMETER” option.
3. The new window will show the results of AMBIMETER. It includes the number
of shape categories that are compatible with the scattering profile, the
ambiguity score (also called an “a-score”, which is log base 10 of the number
of shape categories), and the AMBIMETER interpretation of whether or not
you can obtain a unique 3D reconstruction.
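As a quick illustration of the a-score defined in step 3, here is a minimal Python sketch that computes it from the number of shape categories. The function name and the example category counts are hypothetical and are not AMBIMETER output; AMBIMETER itself reports the score for you.

```python
import math


def ambiguity_score(n_shape_categories: int) -> float:
    """Return the a-score: log base 10 of the number of compatible shape categories."""
    if n_shape_categories < 1:
        raise ValueError("the number of shape categories must be at least 1")
    return math.log10(n_shape_categories)


# Hypothetical category counts, for illustration only.
for n_categories in (1, 30, 1000):
    print(n_categories, round(ambiguity_score(n_categories), 2))
```

Because the score is logarithmic, a tenfold increase in compatible shape categories raises the a-score by 1.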
1. Right click on the glucose_isomerase.out item in the IFT list. Select the
“Bead Model (DAMMIF/N)” option.
• Note: If necessary, load the glucose_isomerase.out file that you
saved in the reconstruction_data folder in a previous part of the
tutorial. If you haven’t done the previous part of the tutorial, or forgot
to save the results, you can find the glucose_isomerase.out file in
the reconstruction_data/gi_complete folder.
2. Running DAMMIF generates a lot of files. Click the “Select” button for the
output directory, make a new folder in the reconstruction_data directory
called gi_dammif and select that folder.
3. Change the number of reconstructions to 5 and the Mode to Fast (if
necessary).
• Note: It is generally recommended that you do 15-20 reconstructions.
However, for the purposes of this exercise, or for obtaining an initial
quick look at results, 3-5 are enough.
• Note: For final reconstructions for a paper, DAMMIF should be run in
Slow mode. For this tutorial, or for obtaining an initial quick look at
results, Fast mode is fine.
4. Uncheck the “Refine average with dammin” checkbox.
• Note: For final reconstructions for a paper, DAMMIN refinement should
be done. However, it is quite slow, so for the purposes of this tutorial
we won't do it.
5. RAW can align the DAMMIF/N output with a PDB/mmCIF structure using
CIFSUP from the ATSAS package. To do so, check the ‘Align output to
PDB/mmCIF’ box and select the 1XIB_4mer.pdb file in the
reconstruction_data/gi_complete folder.
• Tip: If you’re not sure if you selected the correct file, hovering your
mouse over the filename will show the full path.
• Note: Some settings are accessible in the panel, and all settings can
be changed in the advanced settings panel.
8. Wait for all of the DAMMIF runs, DAMAVER and alignment to finish. Depending
on the speed of your computer this could take a bit.
• Question: Based on the AMBIMETER results from the previous part of
the tutorial, how good a reconstruction do you expect?
9. Once the reconstructions are finished, the window should automatically switch
to the results tab. If it doesn’t, click on the results tab.
10.The results panel summarizes the results of the reconstruction run. At the top
of the panel there is the AMBIMETER evaluation of how ambiguous the
reconstructions might be. If DAMAVER was run, there are results from the
normalized spatial discrepancy (NSD), showing the mean and standard
deviation of the NSD, as well as how many of the reconstructions were
included in the average. If DAMAVER was run on 3 or more reconstructions,
and ATSAS >=2.8.0 is installed, there will be the output of SASRES which
provides information on the resolution of the reconstruction. If DAMAVER
found more than one cluster, the number of clusters and information on each
cluster is shown. Note that DAMCLUST (ATSAS <=3.1.0) provided more
information about the clusters, so some fields will be blank with ATSAS
>=3.1.1.
11.Information on each individual model is shown at the bottom. The summary
tab gives the model χ², Rg, Dmax, excluded volume, molecular weight
estimated from the excluded volume, and, if appropriate, mean NSD of the
model.
• Any models that are rejected from the average by DAMAVER will be shown
in red in the models list.
• The model highlighted in blue is the ‘most probable’ model. This can be
used as your final bead model instead of doing a DAMMIN refinement.
12.Also, each individual model has a tab which shows the data, the model fit,
and the residuals. Check that for each model the visual fit is good, and the
residuals are flat and randomly distributed about zero.
• Note: Generally, the file of interest is the -1.cif file, in this case
glucose_isomerase_01-1.cif, glucose_isomerase_02-1.cif, etc.
21.If averaging was done with DAMAVER, the results are saved in the selected
output folder with the given prefix, in this case glucose_isomerase. The
output files generated are described in the DAMAVER manual
(https://www.embl-hamburg.de/biosaxs/manuals/damaver.html).
• Note: Generally, the file of interest is the generated damfilt mmCIF:
<prefix>_damfilt.cif. For this tutorial, those would be
glucose_isomerase_damfilt.cif.
22.If multiple clusters were found, the results are saved in the selected output
folder with the given prefix (for this tutorial, glucose_isomerase). The files
generated are described in the DAMAVER manual
(https://www.embl-hamburg.de/biosaxs/manuals/damaver.html).
23.If refinement was done with DAMMIN, the results are saved in the selected
output folder as refine_<prefix>, e.g. for this tutorial
refine_glucose_isomerase. The files generated are described in the
DAMMIN manual
(https://www.embl-hamburg.de/biosaxs/manuals/dammin.html#output).
• Note: Generally, the file of interest is the -1.cif file, in this case
refine_glucose_isomerase-1.cif.
24.If alignment to a reference PDB was done with CIFSUP, the files aligned
depend on what other processing was done.
• If refinement was done, then there will be a single file named
refine_<prefix>-1_aligned.cif. For this tutorial,
refine_glucose_isomerase-1_aligned.cif.
• If no refinement is done but averaging is done, then the damaver and
damfilt results are aligned, as well as the most probable model (the
blue highlighted model in the summary panel). The associated
filenames would be <prefix>_damaver_aligned.cif,
<prefix>_damfilt_aligned.cif, and <prefix>_##-1_aligned.cif
where ## is the model number of the most probable model. For this
tutorial, glucose_isomerase_damaver_aligned.cif,
glucose_isomerase_damfilt_aligned.cif, and
glucose_isomerase_##-1_aligned.cif.
• If no refinement is done but clustering is done, then the representative
model of each cluster is aligned. The associated filenames would be
<prefix>_##-1_aligned.cif where ## is the model number of the
representative model. For this tutorial, that is
glucose_isomerase_##-1_aligned.cif.
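To keep track of which files to look for, the naming patterns in the notes above can be sketched in Python. This is only an illustration that follows the example names given for this tutorial: the function name, the dictionary keys, and the choice of model 3 as the most probable model are hypothetical, and the two-digit model number is inferred from names such as glucose_isomerase_01-1.cif.

```python
def expected_output_files(prefix: str, n_models: int, most_probable: int) -> dict:
    """Build file names that follow the patterns described in the notes above."""
    # Individual DAMMIF models: <prefix>_01-1.cif, <prefix>_02-1.cif, ...
    individual = [f"{prefix}_{i:02d}-1.cif" for i in range(1, n_models + 1)]
    return {
        "individual": individual,
        "damfilt": f"{prefix}_damfilt.cif",
        "refined": f"refine_{prefix}-1.cif",
        # Aligned files when averaging was done without refinement.
        "aligned_average": [
            f"{prefix}_damaver_aligned.cif",
            f"{prefix}_damfilt_aligned.cif",
            f"{prefix}_{most_probable:02d}-1_aligned.cif",
        ],
    }


# Hypothetical run: 5 reconstructions, model 3 assumed to be the most probable.
files = expected_output_files("glucose_isomerase", n_models=5, most_probable=3)
for name in files["individual"]:
    print(name)
```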
Ambiguity
As discussed in the previous section, AMBIMETER determines how many shapes
might have produced your measured scattering profile. Having an ambiguity score <
2.5 (ideally < 1.5) is important for a good reconstruction.
The average NSD is commonly used to evaluate the stability of the reconstruction.
Roughly speaking, if your average NSD is less than 1.0, the reconstruction can
probably be trusted (if all of the other validation metrics also check out),
while if it is greater than 1.0 you should proceed with caution, or not use
the reconstructions at all.
The NSD is also used to determine which models to include in the average.
If the average NSD of a given model is more than two standard deviations above
the overall average NSD, that model is not included in the average. If more
than ~2 models are rejected (out of 15), that may be a sign of an unstable
reconstruction.
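The rejection rule described above (more than two standard deviations above the overall average NSD) can be sketched as follows. This is a minimal illustration, not DAMAVER's implementation: the function name and the NSD values are hypothetical, and using the population standard deviation is an assumption.

```python
import statistics


def rejected_models(mean_nsd_per_model, n_sigma=2.0):
    """Return indices of models whose mean NSD exceeds the overall mean
    by more than n_sigma standard deviations."""
    overall_mean = statistics.mean(mean_nsd_per_model)
    # Assumption: population standard deviation; the tutorial does not say which is used.
    overall_std = statistics.pstdev(mean_nsd_per_model)
    cutoff = overall_mean + n_sigma * overall_std
    return [i for i, nsd in enumerate(mean_nsd_per_model) if nsd > cutoff]


# Hypothetical mean NSD values for 10 models; index 8 is a clear outlier.
nsd_values = [0.55, 0.57, 0.56, 0.58, 0.54, 0.56, 0.57, 0.55, 0.95, 0.56]
print(rejected_models(nsd_values))
```

With only a handful of models, a single outlier inflates the standard deviation considerably, which is one reason more reconstructions give a more reliable rejection test.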
Clusters
DAMCLUST creates clusters of models that are more similar to each other than they
are to the rest of the models. This is a way of assessing the ambiguity of the
reconstruction. If you have more than one cluster of models in your reconstructions,
you may have several distinct shapes that are being reconstructed by the DAMMIF
algorithm. This typically indicates that there are several distinct shapes in solution
that could generate the measured scattering profile, and so is another indication of a
highly ambiguous reconstruction.
The caveat to this is that with good quality data that is very low ambiguity
(ambiguity score from AMBIMETER < 0.5) and yields a set of reconstructions with a
very small average NSD (<0.5, typically) and NSD standard deviation (~0.01), I
have seen several (often >5) clusters identified with DAMCLUST. I believe that in
this case there are not actually multiple clusters, but the extremely low deviation
between the models is fooling the DAMCLUST algorithm.
Note that the different clusters should not be taken as representatives of different
distinct shapes in solution. Even if there are a finite number of distinct shapes
scattering in the solution (such as an open and closed state of a protein), the
measured scattering profile is an average of the scattering from each component,
and each individual reconstruction fits that measured scattering profile. As such,
there is no way for an individual reconstruction to fit just the scattering from one of
the components and so the different clusters cannot be representative of the
different shapes in the solution.
The Rg and Dmax obtained from the model should be close to those calculated from
the P(r) function. If that is not the case, you should reevaluate your P(r) function.
The volume is reported for each bead model, but it is usually easier to compare the
molecular weight calculated from that volume with the expected molecular weight. In
this case, M.W. is calculated by dividing the volume (nominally representing the
sample's excluded volume) by an empirically determined constant of 1.66 (used in
RAW, other programs may use different values). This value is approximate, and
varies between roughly 1.5 and 2.0 depending on the shape of the macromolecule.
This M.W. is less well determined than that from other SAXS methods, given the variation in
the coefficient. As such, it is mostly useful for indicating general agreement between
the overall size of the reconstruction and the expected size. If the M.W. is different
from the expected M.W. by more than 20-25% you should consider the
reconstructions to be suspect.
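The molecular weight estimate and the 20-25% check described above amount to a few lines of arithmetic. The sketch below assumes the excluded volume is given in cubic angstroms and that dividing by 1.66 yields a mass in Da; that unit convention, the function names, and the example numbers are assumptions for illustration.

```python
def mw_from_excluded_volume(volume, coefficient=1.66):
    """Estimate M.W. by dividing the excluded volume by an empirical coefficient.

    Assumption: volume in cubic angstroms gives M.W. in Da. The coefficient
    is approximate and varies between roughly 1.5 and 2.0 with shape.
    """
    return volume / coefficient


def mw_is_consistent(mw_model, mw_expected, tolerance=0.25):
    """True if the model M.W. is within the given fractional tolerance of the expected M.W."""
    return abs(mw_model - mw_expected) / mw_expected <= tolerance


# Hypothetical excluded volume and expected M.W., for illustration only.
mw_model = mw_from_excluded_volume(286000.0)
print(round(mw_model), mw_is_consistent(mw_model, 172000.0))
```

Rerunning the estimate with coefficients of 1.5 and 2.0 gives a feel for how wide the uncertainty on this M.W. really is.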
3. In the window that opens, ‘Target’ is the model that is aligned, whereas
‘Reference’ is the model that the target is aligned to. In other words, the
Reference model stays unchanged, while the target model is moved to best
align with the Reference.
4. Use the Reference ‘Select’ button to select the 1XIB_4mer.pdb file in the
reconstruction_data/gi_complete folder.
• Tip: Only the filename will show up in either the Reference or Target
box. If you hover your mouse over the filename it will show the full
path to the file.
5. Use the Target ‘Select’ button to select the refine_glucose_isomerase-1.pdb
file in the reconstruction_data/gi_complete/gi_dammif folder.
6. Click the start button. CIFSUP will run, and you should see the ‘Status’ update
to ‘Running alignment’ and then ‘Alignment finished’.
7. When CIFSUP is finished, in the same folder as the target file you will see a
<target_name>_aligned.pdb file, which is the target model aligned with
the reference file.
8. Advanced settings can be accessed by clicking on the ‘Advanced Settings’ text
to expand the section. These settings are described in the CIFSUP manual
(https://www.embl-hamburg.de/biosaxs/manuals/supcomb.html).
12.Now you need to set the dummy atom radius to be what DAMMIF/N used. To
find this, open the bead model .pdb file in a text editor, such as Notepad
(Windows) or TextEdit (MacOS).
13.Find the “Atomic Radius” (DAMMIF/DAMAVER) or “DAM packing radius”
(DAMMIN) number. Make a note of this radius. This is the bead size you need
to set.
14.In the command line, type “size atomradius X” (without quotes) where X is
the atomic radius you just found.
15.Your model is now the right size. You can either stop here, and just adjust the
settings of the beads, or you can make an envelope. Most typically models
are presented as an envelope, but either is fine. The next steps detail how to
make an envelope.
16.Look in the Model Panel. Identify the ID number of the bead model.
17.We now need to make a surface. In the command line type “molmap #y z”
(without quotes) where y is the ID number for the bead model and z is 3x the
bead size you found in the previous steps.
• Note: You should see an ‘envelope’ form, but it still needs some
adjustment to be useful.
• Tip: You can vary the final number, which sets the smoothness of the
envelope. I find that 3*(bead size) is reasonable, but it depends on the
size of the model beads, and how smooth you want your envelope. It
is generally a good idea in SAXS to leave various lumps, or to actually
be able to see the outline of beads, so that your audience (and you)
remembers that an envelope is NOT an electron density contour.
18.In the Model Panel, turn off the bead model by selecting it and clicking the
“Hide” button.
19.In the Volume Viewer controls that appear after the molmap command (lower
right of the window) click on the color box.
20.In the window that appears, set the Opacity to 40%. Change the color if you
want. Close the color window.
21.Click and drag to rotate your viewer and compare the envelope to the crystal
structure.
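If you prefer not to hunt for the bead radius by hand, the lookup in steps 12-13 and the two commands from steps 14 and 17 can be sketched in Python. This is an illustration under assumptions: the header line layout in the example is hypothetical (check it against your own file), and the function names are made up for this sketch. Only the two command forms, “size atomradius X” and “molmap #y z”, come from the steps above.

```python
import re

# Labels mentioned in step 13, matched case-insensitively.
RADIUS_LABELS = ("atomic radius", "dam packing radius")


def find_bead_radius(header_text):
    """Return the first number that follows a radius label in the model file text."""
    for line in header_text.splitlines():
        lowered = line.lower()
        for label in RADIUS_LABELS:
            position = lowered.find(label)
            if position >= 0:
                match = re.search(r"[-+]?\d*\.?\d+", line[position + len(label):])
                if match:
                    return float(match.group())
    raise ValueError("no bead radius line found")


def chimerax_commands(radius, model_id, smoothing=3.0):
    """Build the two commands from steps 14 and 17 (smoothing = 3x bead size by default)."""
    return [
        f"size atomradius {radius:g}",
        f"molmap #{model_id} {smoothing * radius:g}",
    ]


# Hypothetical header line; the real layout may differ.
example_header = "REMARK 265 Atomic Radius ............ : 3.40"
radius = find_bead_radius(example_header)
for command in chimerax_commands(radius, model_id=1):
    print(command)
```

Changing the smoothing argument corresponds to varying the final molmap number, as described in the tip under step 17.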
Part 4b. Compare crystal structure and SAXS data (real space)
– PyMOL (optional)
In addition to ChimeraX (Part 4a) you can also use PyMOL to visualize bead model
reconstructions/envelopes.
6. For the bead model, use the model Show menu (‘S’) to show ‘spheres’.
7. Now you need to set the dummy atom radius to be what DAMMIF/N used. To
find this, open the bead model .pdb file in a text editor, such as Notepad
(Windows) or TextEdit (MacOS).
8. Find the “Atomic Radius” (DAMMIF/DAMAVER) or “DAM packing radius”
(DAMMIN) number. Make a note of this radius. This is the bead size you need
to set.
11.Your model is now displayed correctly with the beads. However, it is more
useful to set the beads as partly transparent, so you can see the high
resolution structure through them. Do this with the command “set
sphere_transparency, 0.6”.
12.Instead of the sphere representation you can create an envelope that shows
the edges of the model. To do so, using the model Show menu show ‘surface’
and using the model Hide menu, hide ‘spheres’.
13.You can set the transparency of your surface with the command “set
transparency, 0.5”.
14.If you want to smooth out the surface you can adjust the probe radius using
the command “set solvent_radius, 3.0” (where you can vary the size from
3.0).
15.You can also improve the surface quality using the command “set
surface_quality, 1”.
• Note that values larger than 1 may take a while to render.
16.Finally you might want to set the colors to something a bit nicer, to get a final
display of your envelope and high resolution structure.
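The envelope commands from steps 12-15 can be collected into a small script so you don't have to retype them. The Python sketch below only assembles the command text; the object name bead_model and the function name are hypothetical, and the default values are the ones quoted in the steps above.

```python
def envelope_script(bead_object, transparency=0.5, probe_radius=3.0, quality=1):
    """Return PyMOL command lines that switch a bead model from spheres to a surface."""
    return "\n".join([
        f"hide spheres, {bead_object}",
        f"show surface, {bead_object}",
        f"set transparency, {transparency}",
        f"set solvent_radius, {probe_radius}",
        f"set surface_quality, {quality}",
    ])


# "bead_model" is a placeholder; use the object name shown in your PyMOL model panel.
print(envelope_script("bead_model"))
```

The printed lines can be pasted into the PyMOL command line one at a time.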
5. On the results page that appears, click on the Profile file name
(2POL.pdb.dat) to download. Move that downloaded file to your
polymerase_theory folder and rename it 2POL_foxs.dat.
6. Open RAW if it is not already open. If it is open, clear all data loaded in RAW
(unless your reconstruction is still running. If it is, just remove any items in
the manipulation panel).
7. Load the polymerase.dat file in the Example_Data/reconstruction_data
folder.
8. Carry out Guinier, molecular weight, and GNOM analysis on the scattering
profile. Save the polymerase.out file in the reconstruction_data folder.
9. Load the 2POL_foxs.dat file in the polymerase_theory folder.
10.The theoretical scattering profile extends from q=0 to q=0.5, a much wider
range than the measured profile. To make comparison easy you can trim the
q range of the 2POL_foxs.dat profile to match the polymerase.dat profile.
Use the triangle to show more options for the scattering profile and adjust the
qmin until it is 0.01. Adjust qmax until it is 0.24.
11.Star the polymerase.dat file, right click on the 2POL_foxs.dat file and
select the Other Options->Superimpose. In the dialog window that pops up,
select ‘Scale’.
• Note: The first file is the crystal structure from which to generate a
theoretical profile. The second file is the experimental data file to fit
the theoretical profile to.
• Note: Again, this fits parameters to adjust the excluded volume and
hydration layer contrast (Ra and Dro respectively).
• Note: You can also use CRYSOL without fitting to data.
18.This will generate several files. The file with the scattering profile is the
2POL00.fit file. Load it into RAW.
• Note: When it loads, it will load two scattering profiles. The
2POL00.fit profile is the experimental data (identical to the
polymerase.dat file except that I(q) values of 0 have been added for
all q points below the minimum measured q). The 2POL00_FIT profile
is the theoretical profile.
19.Hide the 2POL00.fit profile.
20.Adjust the q range of the 2POL00_FIT file until qmin is 0.0097.
• Question: How does this theoretical profile compare to those from
FoXS?
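To make the q-range trimming and the ‘Scale’ superimposition from steps 10 and 11 above more concrete, here is a minimal Python sketch. It is not RAW's implementation: the unweighted least-squares scale factor is an assumption (RAW may weight by the measurement errors or use a different procedure), the function names are hypothetical, and the profiles are synthetic.

```python
import math
from bisect import bisect_left


def trim_q_range(q, intensity, q_min, q_max):
    """Keep only the points with q_min <= q <= q_max."""
    kept = [(qi, ii) for qi, ii in zip(q, intensity) if q_min <= qi <= q_max]
    return [qi for qi, _ in kept], [ii for _, ii in kept]


def interpolate(q_new, q, intensity):
    """Linearly interpolate intensity at q_new; q must be sorted ascending."""
    if q_new <= q[0]:
        return intensity[0]
    if q_new >= q[-1]:
        return intensity[-1]
    upper = bisect_left(q, q_new)
    lower = upper - 1
    fraction = (q_new - q[lower]) / (q[upper] - q[lower])
    return intensity[lower] + fraction * (intensity[upper] - intensity[lower])


def least_squares_scale(q_ref, i_ref, q_other, i_other):
    """Scale factor c that minimizes sum((i_ref - c * i_other)^2) on the q_ref grid."""
    i_interp = [interpolate(qi, q_other, i_other) for qi in q_ref]
    numerator = sum(a * b for a, b in zip(i_ref, i_interp))
    denominator = sum(b * b for b in i_interp)
    return numerator / denominator


# Synthetic profiles: a 'theoretical' curve from q=0 to 0.5 and an 'experimental'
# curve over a narrower range with 2.5 times the intensity.
q_theory = [0.001 * k for k in range(501)]
i_theory = [math.exp(-((30.0 * q) ** 2) / 3.0) for q in q_theory]
q_exp = [0.01 + 0.002 * k for k in range(116)]
i_exp = [2.5 * math.exp(-((30.0 * q) ** 2) / 3.0) for q in q_exp]

q_trim, i_trim = trim_q_range(q_theory, i_theory, 0.01, 0.24)
print(least_squares_scale(q_exp, i_exp, q_trim, i_trim))
```

The scale factor is applied to the theoretical profile, which mirrors starring the experimental profile in RAW and superimposing the other profile onto it.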
1. Clear all of the data in RAW. Load the polymerase.out file located in the
reconstruction_data/polymerase_complete folder.
• Note: This is the P(r) function for the polymerase data without the
truncation to 8/Rg. When using DENSS you should use the full q range
of your data.
• Note: Unlike DAMMIF/N, DENSS can also be run on BIFT P(r)
functions.
2. Right click on the polymerase.out item in the IFT control panel. Select the
“Electron Density (DENSS)” option.
3. Running DENSS generates a lot of files. Click the “Select” button for the
output directory, make a new folder in the reconstruction_data directory
called polymerase_denss and select that folder.
4. Change the number of reconstructions to 4 and the mode to Fast.
• Note: It is generally recommended that you do at least 20
reconstructions. However, for the purposes of this tutorial, 4 are
enough.
• Note: For final reconstructions for a paper, DENSS should be run in
Slow mode. For this tutorial, or for obtaining an initial quick look at
results, Fast mode is fine.
5. RAW can align the DENSS output with a PDB structure. To do so, check the
‘Align output to PDB/MRC’ box and select the 2POL.pdb file in the
reconstruction_data/polymerase_complete folder.
• Tip: If you’re not sure if you selected the correct file, hovering your
mouse over the filename will show the full path to the file.
9. Once the reconstructions are finished, the window should automatically switch
to the results tab. If it doesn’t, click on the results tab.
10.The results panel summarizes the results of the reconstruction runs. If you
are using a .out file, then at the top of the panel there is the AMBIMETER
evaluation of how ambiguous the reconstructions might be (see earlier
tutorial section). If averaging was run there is an estimate of the
reconstruction resolution based on the Fourier shell correlation. In the models
section there are several tabs. The summary tab shows the χ², Rg, support
volume, and RSC to the reference model. If any model was not included in
the averaging it is highlighted in red.
• Verify that the Rg is close to the expected value, and that the χ² and
support volumes are relatively consistent between models.
• Note: The χ² is much too small at the moment. The DENSS algorithm
doesn’t properly compute χ² for smoothed data. For the moment χ²
should only be used as a convergence criterion, not to evaluate the
model fit.
11.Individual model results are displayed in the numbered tabs. For each
individual model there are plots of: the original data and the model data
(scattering from density); the residual between the original data and the
model data; and χ², Rg and support volume vs. refinement step.
• Verify that the residual between the actual data and the model data is
small.
• Check that the χ², Rg, and support volume have all plateaued
(converged) by the final steps.
12.If the densities were averaged, the average tab will display the Fourier shell
correlation vs. resolution.
• Note: The reconstruction resolution is taken as the resolution in
angstroms where the correlation first crosses 0.5.
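The resolution criterion in the note above can be sketched in Python. This is an illustration, not the DENSS or RAW implementation: it assumes the correlation curve is tabulated against spatial frequency in inverse angstroms, takes the resolution as the reciprocal of the frequency at the crossing, and uses linear interpolation between the two bracketing points. The function name and the example curve are hypothetical.

```python
def fsc_resolution(spatial_frequency, fsc, threshold=0.5):
    """Return the resolution (1 / spatial frequency) where the correlation
    first drops below the threshold, or None if it never does."""
    if not fsc or fsc[0] < threshold:
        return None  # curve starts below the threshold: no meaningful crossing
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            f0, f1 = spatial_frequency[i - 1], spatial_frequency[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # Linear interpolation between the two points that bracket the crossing.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            if f_cross <= 0:
                return None
            return 1.0 / f_cross
    return None


# Hypothetical correlation curve, for illustration only.
frequencies = [0.01, 0.02, 0.03, 0.04, 0.05]
correlation = [1.0, 0.9, 0.7, 0.3, 0.1]
print(fsc_resolution(frequencies, correlation))
```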
3. In the window that opens, ‘Target’ is the model that is aligned, whereas
‘Reference’ is the model that the target is aligned to. In other words, the
Reference model stays unchanged, while the target model is moved to best
align with the Reference.
4. Use the Reference ‘Select’ button to select the 2POL.pdb file in the
reconstruction_data/polymerase_complete folder.
• Tip: Only the filename will show up in either the Reference or Target
box. If you hover your mouse over the filename it will show the full
path to the file.
5. Use the Target ‘Select’ button to select the polymerase_refine.mrc file in
the reconstruction_data/polymerase_denss folder.
7. When alignment is finished, in the same folder as the target file you will see a
<target_name>_aligned.mrc file. Compare this to the
<reference_name>_centered.pdb file in the reference file folder. In this
case those names are polymerase_refine_aligned.mrc and
2POL_centered.pdb.
8. You can change the advanced settings by expanding the Advanced Settings
section. These advanced settings are:
• Number of cores: Number of cores to use during alignment.
• Enantiomorphs: Whether to generate enantiomorphs of the Target
before doing the alignment.
• Center reference: Whether to center the reference model at the origin.
If used, this creates a <reference_name>_centered.pdb in the
same folder as the reference file.
• PDB calc. resolution: The resolution of the density map created from
the Reference PDB model to compare with the Target model. This has
no effect if the Reference is already a density.
Note: Significant portions of this tutorial are based on this tutorial by Thomas Grant:
https://www.tdgrant.com/denss/tips/
8. You now have a version of an envelope view for the electron density. If you
want a more advanced visualization, you can use PyMOL.
9. Close ChimeraX.
10.Open PyMOL.
11.In PyMOL, open the 2POL_centered.pdb and polymerase_refine.mrc
files.
• Note: There is a bug in the DENSS alignment program for PDBs with
multiple chains, which can cause them to not load properly into
PyMOL.
12.In older versions of PyMOL, when you open polymerase_refine.mrc you will
get the ‘Map Import’ dialog. Check the ‘volume’ representation box and click
‘Load’.
13.In newer versions of PyMOL, when you open polymerase_refine.mrc, in the
‘A’ model menu select ‘Volume’. This will create a new model in the model
menu showing the volume.
14.In the model panel, click on the ‘H’ in the 2POL_centered line, and select
waters to hide the waters in the PDB model.
15.In the polymerase_refine_volume line click on the ‘C’ and select ‘rainbow’.
This creates an initial rainbow map for the density.
16.In the polymerase_refine_volume line click on the ‘C’ and select ‘panel’.
This opens up a panel where you can adjust the colors. By dragging the
colored dot left or right you adjust the sigma threshold for the color map. By
dragging the colored dot up or down you adjust the opacity of the color.
17.You can also explicitly create a color ramp using the PyMOL command line.
Enter the following command and then hit enter: volume_ramp_new
colored_density, 2 blue 0.0 2.5 blue 0.01 5 cyan 0.01 7.5
green 0.01 10 yellow 0.01 15 red 0.01 200 red 0.03
18.Once you have created the color ramp you can apply it to the volume object
with the following command: volume_color
polymerase_refine_aligned_volume, colored_density
19.You may have noticed that the map isn’t particularly accurate: it has low
density in some parts of the ring, and some larger bulges on the outside. Why
might this be? How would you improve it?
Depending on your needs, these can be very handy resources, and we encourage
you to check them out:
http://www.embl-hamburg.de/biosaxs/atsas-online/
The FoXS server, which you have used in a previous part of the tutorial, is another
handy online tool:
https://modbase.compbio.ucsf.edu/foxs/index.html
Windows 7:
1. Open a command prompt by clicking on the start menu, searching for “cmd”
(no quotes) and running the cmd program.
2. Type “cd ” (no quotes)
3. Drag the folder you want to move to into the command prompt. It should
automatically put the folder path in the command prompt. For example, if you
put the Example_Data directory on your desktop, and wanted to move to it,
you should now see “cd C:\Users\<username>\Desktop\Example_Data” (no
quotes) on the command line.
4. Hit enter.
5. The command prompt should show you what directory you are in (listed to
the left of the prompt).
6. To check what files are in the current directory, type “dir” (no quotes) and hit
enter.
Windows 8:
1. Open a command prompt by clicking on the windows tile and clicking the
down arrow to show all apps. In the all apps screen select Command
Prompt in the Windows System section.
2. Steps 2-6 for Windows 7 also work for Windows 8.
Windows 10:
1. Open a command prompt by clicking on the windows/start menu, selecting All
Files, selecting Windows System, and clicking on Command Prompt.
2. Steps 2-6 for Windows 7 also work for Windows 10.
Mac OS X:
1. In the Applications/Utilities folder, open the Terminal app.
2. Type “cd ” (without quotes).
3. Drag the folder you want to move to into the terminal. It should automatically
put the folder path in the command prompt. For example, if you put the
Example_Data directory on your desktop, and wanted to move to it, you
should now see “cd /Users/<username>/Desktop/Example_Data” (no quotes)
on the command line.
4. Hit enter.
5. To check what directory you are in, type “pwd” (no quotes) and hit enter.
6. To check what files are in the current directory, type “ls” (no quotes) and hit
enter.
Linux:
It depends upon the flavor of Linux you are using. On many Linux machines:
1. Open the folder in your system file manager.
Otherwise, you can open a terminal and use the “cd” command to change to the
proper directory.