PyContact - A Tool For Analysis of Non-Covalent Interactions in MD Trajectories
July 2, 2017
Version: 1.0.1
Code: https://github.com/maxscheurer/pycontact
1 Introduction
Non-covalent interactions of biomolecules are the cornerstones of biochemical processes: they
govern molecular recognition, induce conformational changes in proteins and fulfill a plethora
of other key functions in the cell. For example, cellular signaling works only as a result of
specifically evolved interactions between biomolecules.
Atomic interactions through electrostatics, hydrogen bonds or hydrophobic properties of a
biomolecule are invisible to common microscopes, but they become visible through the computational
microscope: molecular dynamics (MD) simulations aim at describing these interactions at an atomic
level of detail, thereby also yielding the dynamics of proteins, DNA, ligands, membranes or small
molecules. Thus, MD is the gateway to studying non-bonding interactions with high spatial and
temporal resolution. The result of a plain MD simulation is the position of every atom at every
timestep, called the trajectory. Depending on the system and simulation time, this amounts to a
large volume of data that needs to be analyzed. Furthermore, the different types of interactions
have to be distinguished, and the level of detail that can be studied ranges from interactions
between individual atoms to complete protein chains. How fine the "resolution" of the analysis
ought to be depends on the scientific question.
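To make the data layout concrete, the following minimal sketch treats a trajectory as a list of frames, each holding one (x, y, z) position per atom, and reports the frames in which two atoms lie within a distance cutoff. The function name, the toy coordinates and the 5.0 cutoff are illustrative assumptions, not PyContact's internal implementation.

```python
import math


def frames_in_contact(trajectory, atom_i, atom_j, cutoff=5.0):
    """Return the indices of frames in which two atoms are within `cutoff`.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples.
    cutoff: distance threshold in the unit of the coordinates
            (5.0 is an assumed, illustrative value).
    """
    hits = []
    for frame_index, frame in enumerate(trajectory):
        if math.dist(frame[atom_i], frame[atom_j]) <= cutoff:
            hits.append(frame_index)
    return hits


# Toy trajectory with 3 frames and 2 atoms (made-up coordinates).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],  # distance 3.0
    [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)],  # distance 8.0
    [(0.0, 0.0, 0.0), (0.0, 4.0, 3.0)],  # distance 5.0
]
contact_frames = frames_in_contact(toy_trajectory, 0, 1)
```

A real trajectory holds thousands of atoms per frame, which is why a dedicated analysis tool is preferable to ad hoc loops like this one.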
To address these tasks, we provide a novel tool, PyContact, for the analysis of non-covalent
interactions (or contacts) in MD simulation trajectories. It offers high flexibility and can be
used without any programming experience, as it is primarily a GUI (graphical user interface)
application. In the following sections, we will examine interactions between two key players in
proteasome-guided protein degradation: Ubiquitin (Ub) and Rpn11, a metalloprotease residing
in the lid of the 26S proteasome.
We provide a short sample trajectory file. If you plan to learn how to perform MD simulations
yourself, QwikMD (http://www.ks.uiuc.edu/Research/vmd/plugins/qwikmd) is a good starting point.
User Contributions
PyContact is a very new tool and under constant development. The source code is publicly
available on GitHub (https://github.com/maxscheurer/pycontact). Your ideas, contributions
of new features and bug reports are very much appreciated. If you want to contribute to the
project, opening Pull Requests or Issues directly on GitHub is the most convenient way. By
subscribing to the project on GitHub, you can also follow the development of PyContact in a
transparent manner.
How to cite
Available soon.
2 Installing PyContact
To build the C/C++ modules, first install Cython by running
pip install cython
PyContact itself is available on pip. To install it, run
pip install pycontact
in the terminal. This will download and install the other dependencies, which are available
on pip.
3 Basic Analysis of Protein Interactions
1 Loading an MD trajectory
First, we need to load a trajectory (i.e. the coordinates for every simulation timestep) together
with the topology (information about bonds etc.) into the tool. To do so, click on File
→ Load Trajectory Data (or hit Ctrl+I / ⌘+I). Then, click on the Topology button and
select the rpn11_ubq.psf file in the tutorial folder. Afterwards, click on Trajectory and select
the trajectory file rpn11_ubq.dcd, also residing in the tutorial folder. As we want to elucidate
the interactions between Rpn11 ("segid RN11") and Ubiquitin ("segid UBQ"), we will put
those in the input selections 1 and 2, respectively. See Tab. 1 for a detailed description of
the input fields.
Finally, click on OK to load the trajectory into the program and run the atom-atom contact
analysis.
When the task is finished, the Status field should say "50 frames loaded".
Green: sidechain-sidechain interaction
Yellow: backbone-sidechain interaction
Blue: backbone-backbone interaction
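The color legend above distinguishes three interaction types. As a minimal sketch of how a single atom pair could be sorted into these types, the code below assumes that backbone atoms are those named N, CA, C and O; this atom set and the function names are illustrative assumptions and may differ from how PyContact assigns colors to accumulated contacts.

```python
# Assumed backbone atom names (common protein naming convention).
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

# Colors taken from the legend in the text.
COLOR_BY_TYPE = {
    "sidechain-sidechain": "green",
    "backbone-sidechain": "yellow",
    "backbone-backbone": "blue",
}


def contact_type(atom_name_1, atom_name_2):
    """Classify an atom pair by whether each atom belongs to the backbone."""
    on_backbone = [name in BACKBONE_ATOMS for name in (atom_name_1, atom_name_2)]
    if all(on_backbone):
        return "backbone-backbone"
    if any(on_backbone):
        return "backbone-sidechain"
    return "sidechain-sidechain"


def contact_color(atom_name_1, atom_name_2):
    """Look up the legend color for an atom pair."""
    return COLOR_BY_TYPE[contact_type(atom_name_1, atom_name_2)]
```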
To export contact data, we provide a tool that is accessible in the menu under
Tools → Export Contact Data or by hitting Ctrl+E / ⌘+E. The first tab lets you save
the current contact view from the main window to a file. The format can be chosen in the dropdown
menu on the left, where the common png format as well as the svg vector graphics format are available.
The next tab brings us to the histogram tool, which allows a fast and clear visualization of useful
properties like Mean Score, Mean Lifetime or Hbond percentage (selectable on the right-hand side
of the widget). Two major histogram options are available, General Histogram and Bin per Contact.
The former groups the values of the chosen property into numerical bins. If you would rather have
one bin for each analyzed contact, choose Bin per Contact.
To apply your choice and draw the histogram, simply click on Show Preview. With the Bin per
Contact option, you may want to adjust the font size of the bin labels, which can be done in the
bin per contact font size text field. To save the histogram, pick a suitable file format from the
menu on the right and press Save Histogram to choose the file location.
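The difference between the two histogram options can be sketched in a few lines. The example below is only an illustration under stated assumptions: equal-width bins between the smallest and largest value, hypothetical function names, and made-up scores with placeholder contact labels. It is not the plotting code used by PyContact.

```python
def general_histogram(values, n_bins):
    """Count values in `n_bins` equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for value in values:
        # The largest value would land in bin n_bins, so clamp it into the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


def bin_per_contact(value_by_contact):
    """One bar per contact: return (label, value) pairs in input order."""
    return list(value_by_contact.items())


# Made-up mean scores for four placeholder contacts.
mean_scores = {"contact_A": 0.5, "contact_B": 1.0, "contact_C": 1.5, "contact_D": 4.0}
binned = general_histogram(list(mean_scores.values()), n_bins=2)
bars = bin_per_contact(mean_scores)
```

With numerical bins, the three weak contacts collapse into a single bar, whereas Bin per Contact keeps each contact identifiable at the cost of a wider plot.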
Another way to visualize the data can be found in the third tab, called Contact Map. Here,
the same properties that are provided in the histogram tool can be plotted as a grayscale matrix,
with the two selections on the two axes. As with the previous tools, the view can be
updated via the Show Preview button and saved by clicking on Save Map.
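As a rough illustration of the contact map idea, the sketch below arranges per-contact values in a matrix with one selection per axis and scales them to gray levels. The function names, the placeholder residue labels, the values and the linear 0 to 255 scaling are assumptions made for this example only.

```python
def contact_map(contacts, rows, columns):
    """Build a matrix of values; pairs without a contact get 0.0.

    contacts: dict mapping (row_label, column_label) to a property value.
    """
    return [[contacts.get((r, c), 0.0) for c in columns] for r in rows]


def to_gray_levels(matrix):
    """Scale all values linearly to integers in 0..255 (0 means no contact)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * value / peak) for value in row] for row in matrix]


# Placeholder residue labels for selection 1 (rows) and selection 2 (columns).
rows = ["sel1_res1", "sel1_res2"]
columns = ["sel2_res1", "sel2_res2"]
scores = {("sel1_res1", "sel2_res2"): 4.0, ("sel1_res2", "sel2_res1"): 1.0}

matrix = contact_map(scores, rows, columns)
gray = to_gray_levels(matrix)
```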
To view the trajectory data with VMD, go to the VMD tab and let the tool create a
suitable Tcl script for you. If you wish to differentiate between single contacts, tick the
Split selections for each contact checkbox. Additional selection texts can be given in the two text
fields below.
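To give an idea of what a Tcl script for VMD can contain, the following sketch generates one from Python. It uses standard VMD text commands (mol new, mol addfile, mol delrep, mol representation, mol selection, mol addrep), but the function itself and the choice of representation are assumptions for illustration; the script written by PyContact may look different.

```python
def vmd_script(topology, trajectory, selections, representation="NewCartoon"):
    """Return Tcl text that loads a trajectory and adds one representation per selection."""
    lines = [
        f"mol new {{{topology}}}",                    # load the topology
        f"mol addfile {{{trajectory}}} waitfor all",  # load all trajectory frames
        "mol delrep 0 top",                           # drop the default representation
    ]
    for selection in selections:
        lines += [
            f"mol representation {representation}",
            f"mol selection {{{selection}}}",
            "mol addrep top",
        ]
    return "\n".join(lines) + "\n"


script = vmd_script("rpn11_ubq.psf", "rpn11_ubq.dcd", ["segid RN11", "segid UBQ"])
```

Passing one selection string per contact to such a function would mirror the idea behind the Split selections for each contact checkbox.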
Finally, for further data analysis we provide a raw data export in the last tab, titled
Plain Text. Pick the combination of properties you want to export with the checkboxes on the
left and click on Export to text, which will open the standard file dialog.
The data will be written to a tab-separated ASCII text file.
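A tab-separated text file is easy to process further with standard tools. The sketch below reads such a table with Python's csv module; the column names and values are hypothetical, since the actual header depends on the properties selected for export.

```python
import csv
import io


def read_contact_table(handle):
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for an exported file; column names and values are made up.
example_export = io.StringIO(
    "contact\tmean_score\tmean_lifetime\n"
    "contact_A\t2.5\t10\n"
    "contact_B\t0.8\t3\n"
)

table = read_contact_table(example_export)
# All fields are read as strings, so convert before comparing numerically.
strongest = max(table, key=lambda row: float(row["mean_score"]))
```

For a real export, replace the StringIO object with open(path, newline="") pointing at the saved file.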
4 Contact Area Calculation
2 Atom selections
To specify the atom selections for contact area calculations, one needs to understand the
underlying principle of solvent-accessible surface area (SASA) calculations to some extent.
(...)
Let’s say we are interested in the contact area of Rpn11 to Ubq. Then we specify "segid
RN11" in the Selection text field. We want to shrink the selection to the Ubq interface, so we
only select those residues within 5 Å by typing "segid RN11 and around 5 segid UBQ"
in the Restriction text field. Finally, the program needs to subtract the SASA of protein
residues outside the interface. Type "protein" in the Selection 2 text field and check
the contact checkbox.
3 Running the calculation
Choose the number of cores to run the calculation on. Then click
on Calculate to run the SASA/contact area calculations. The results will be plotted directly
in the GUI and are available for export. See Fig. 1 for an example output.
Figure 1: Contact area calculation example. As explained in section 4, we calculated the time-
dependent evolution of the contact area between Rpn11 and Ubiquitin.
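The subtraction mentioned in the atom selection step can be illustrated with plain arithmetic. The sketch below assumes that the contact area in a frame equals the SASA of the restricted interface residues computed for Selection alone minus their SASA computed in the presence of Selection 2. This buried-surface definition is an assumption for illustration, not a statement of PyContact's exact algorithm, and the numbers are made up.

```python
def contact_area_per_frame(sasa_alone, sasa_in_complex):
    """Return the buried area per frame as the difference of two SASA series.

    sasa_alone: SASA of the interface residues without the binding partner.
    sasa_in_complex: SASA of the same residues with the partner present.
    Both lists hold one value per trajectory frame, in the same area unit.
    """
    if len(sasa_alone) != len(sasa_in_complex):
        raise ValueError("both SASA series need one value per frame")
    return [alone - buried for alone, buried in zip(sasa_alone, sasa_in_complex)]


# Made-up SASA values for two frames.
areas = contact_area_per_frame([100.0, 110.0], [60.0, 80.0])
```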
5 Python Scripting for Job Automation
This scripting capability speeds up contact analyses, for example when several trajectories are
to be analyzed with identical parameters. Furthermore, it should motivate users to understand how
PyContact works, dig into the code and perhaps come up with their own feature ideas and code.
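The pattern of running the same analysis over several trajectories can be sketched as follows. All names here (AnalysisParameters, run_batch, placeholder_analysis, the second trajectory file and the cutoff value) are hypothetical and only illustrate the batch idea; they are not PyContact's scripting interface, which should be looked up in the source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisParameters:
    """Parameters shared by all jobs in a batch (hypothetical fields)."""
    selection_1: str
    selection_2: str
    cutoff: float  # assumed distance cutoff, illustrative value only


def run_batch(jobs, parameters, analyze):
    """Call analyze(topology, trajectory, parameters) for every job.

    jobs: iterable of (topology_file, trajectory_file) pairs.
    Returns a dict mapping each trajectory file to its result.
    """
    results = {}
    for topology, trajectory in jobs:
        results[trajectory] = analyze(topology, trajectory, parameters)
    return results


def placeholder_analysis(topology, trajectory, parameters):
    # Stand-in for a real contact analysis call; returns a short summary string.
    return f"{parameters.selection_1} vs {parameters.selection_2} in {trajectory}"


parameters = AnalysisParameters("segid RN11", "segid UBQ", cutoff=5.0)
jobs = [
    ("rpn11_ubq.psf", "rpn11_ubq.dcd"),
    ("rpn11_ubq.psf", "second_run.dcd"),  # hypothetical second trajectory
]
summaries = run_batch(jobs, parameters, placeholder_analysis)
```

Keeping the parameters in one immutable object ensures that every trajectory in the batch is analyzed with exactly the same settings.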