PTRAJ and CPPTRAJ: Software For Processing and Analysis of Molecular Dynamics Trajectory Data
pubs.acs.org/JCTC
■ INTRODUCTION
Biomolecular simulation methods have proven useful in a wide variety of applications ranging from protein folding and computer aided drug design to the characterization of materials properties.1 Broadly defined, such methods include not only the molecular dynamics (MD) and Monte Carlo approaches, but also molecular docking, geometry optimization, free energy, and/or path sampling approaches. Any of these methods may generate a time series or an ensemble of three-dimensional (3D) atomic positions of a model or set of models, i.e. a coordinate “trajectory.” In practice, a key enabler of any robust biomolecular simulation method is not only providing the ability to generate conformational ensembles that describe the processes of interest but also facilitating the means to analyze the ensemble data. Beyond simply understanding the static structure derived from experiment or the starting and end points of a simulation, much more information can be gleaned by characterizing the full ensemble or time dependent evolution of the sets of 3D atomic or model coordinates. This “trajectory analysis” refers to analyzing a (potentially large) set or time series of 3D positions and their derived properties. Compared to the gigabyte or smaller trajectories that could be generated in the mid-1990s, today we can run 100 ∼300 ns replicate MD trajectories of a solvated DNA 18-mer on GPU resources such as XSEDE’s Keeneland in ∼4 months to generate over 22 terabytes (TB) of data representing an aggregate ∼30 μs of simulation data. The data explosion is only getting worse with access to resources like Blue Waters at NCSA which has over 3000 K20X GPUs available. Given that these data sets are becoming larger and are generated more quickly, it is critical that the data generated can be analyzed not only rapidly and efficiently but in a manner that is both flexible and easy to use and ideally within a generalizable and extensible framework.

PTRAJ (short for Process TRAJectory) has served as the main analysis program of the Amber software package2 since the early 1990s and offers a wide range of functionality, including simple geometric analysis of coordinates, conversion between different coordinate formats, advanced vector and matrix analysis, and so on. The overall goal of PTRAJ was to provide a unified interface to commonly needed analysis tools as well as provide reusable routines and an extensible framework for the easy development and incorporation of new analysis tools. Although PTRAJ has been an integral part of Amber for decades, this is the first publication providing an in-depth description of the code outside the information provided by the Amber manuals and tutorials.

Several other software packages for the analysis of coordinate trajectories exist, including VMD,3 MMTSB,4 MDAnalysis,5 Pteros,6 LOOS/PyLOOS,7 and HiMach.8 The MDAnalysis, Pteros, LOOS/PyLOOS, and HiMach packages provide libraries of functions (essentially analysis-oriented programming language extensions) that can be used to construct analysis programs. While this approach provides a great deal of flexibility, the low-level interface can be challenging for some users. The MMTSB package provides a series of Perl scripts,
Received: April 26, 2013
Published: June 10, 2013
© 2013 American Chemical Society 3084 dx.doi.org/10.1021/ct400341p | J. Chem. Theory Comput. 2013, 9, 3084−3095
Journal of Chemical Theory and Computation Article
each of which can perform a different type of analysis; however, combining different types of analysis into a single run is not straightforward. VMD provides a convenient graphical interface for various kinds of coordinate analysis. While VMD does provide a TCL text interface that can be used for scripting, it is not typically used for batch-processing various types of analyses (which is particularly useful when analyzing data at remote sites). In contrast to these software packages, PTRAJ provides a variety of higher-level analysis commands via a unified interface that is still amenable to batch-processing and performing multiple types of analysis in one run.

The initial incarnation of PTRAJ evolved out of the program RDPARM (ReaD PARaMeters), which was designed to read and display parameter information from a single Amber Topology file. At the time RDPARM was developed in the Kollman lab at UCSF, the formatted Amber Topology data files could be generated via multiple mechanisms including the soon-after retired Prep-Link-Edit-Parm (PLEP) programs and their replacement program still used in Amber today, LEaP.2a The topology files were difficult to read not only due to the formatting but also since the ordering and specification of information could be different yet still lead to equivalent results (for example, via different bond orderings or equivalent but opposite torsion atom definitions X−C−C−Y vs Y−C−C−X); because of this, one could not simply “diff” the files to expose differences between PLEP and LEaP topologies. RDPARM emerged when many of the members of the Kollman lab realized after months of simulations with new LEaP topologies that there were inconsistencies in the simulation results. A tool was needed to “read” the topology files and allow easy display of the parameters (to see what was wrong in the files), and RDPARM was created. A summary of the functionality and output is presented in the Supporting Information, section 1. At this time, many members of the Amber community had a myriad of scripts and programs to do basic MD trajectory processing and analysis, written in various languages with significant code duplication. This set of tools included both home-grown scripts and those distributed with Amber.2a

While experimenting with adding analysis capabilities to RDPARM, PTRAJ was designed as an extensible tool with the intention of it becoming the central place to aggregate Amber MD trajectory analysis tools to facilitate MD trajectory analysis. PTRAJ was built within RDPARM (the two source codes are merged, with different run time behaviors determined by the name of the executable), promoting reuse of common functionality and providing the underlying analysis code with a single interface to the various trajectory and topology formats. The code is fairly well documented and attempts to be modular and extensible by others. In fact, the code for PTRAJ has been modified and extended by several different authors over the years, leading to new functionality (time correlation functions, matrix analyses, etc.). Moreover, PTRAJ grew to support other MD trajectory formats beyond PDB and Amber, including CHARMM9 PSF topology and DCD MD trajectory formats (among others). Despite the attempts to abstract the functionality, extending the code is not particularly easy since the RDPARM and PTRAJ codes were written primarily in C, leading to an overly complicated code base. C was chosen as the primary language since C++ was not yet a reliable standard when the RDPARM and PTRAJ programs were initially written. The complicated code base led to code fragmentation and duplication, and memory leaks are still present. Moreover, PTRAJ was not written with speed or efficiency in mind; more important were simplicity, functionality, generality, and ease of implementation. When the software was originally written, the rate determining step of biomolecular simulation was MD trajectory generation, not analysis, and the simulations tended to be relatively short (up to thousands of frames in 1−2 ns length simulations that took months to generate). PTRAJ was also initially designed with the idea of only processing single sets of input trajectory data, with a single output trajectory file, using a single parameter/topology file. For example, it is not readily possible to calculate the Root-Mean-Square Deviation (RMSD) of a coordinate frame to a reference structure with a different topology (e.g., a mutant protein structure or a related receptor with a different ligand bound). Moreover, PTRAJ tends to output derived data as raw text files necessitating further postprocessing of the data with graphing programs to display the results. Although PTRAJ has served the computational chemistry community well for about two decades, there are several factors that drove a complete overhaul of the code.

This overhaul was the development of CPPTRAJ, a rewrite of PTRAJ in C++. Although certain portions of the PTRAJ code are used in CPPTRAJ (often initially cut-and-paste and then recoded and optimized) and elements of the PTRAJ design philosophy have been retained, the code base has been rebuilt from the ground up, with an eye toward improving calculation speed and making future additions to the code as simple as possible. As CPPTRAJ was developed, the intent was to be fully backward-compatible with PTRAJ commands and input files, although there are some differences, most notably in output formats (to allow direct visualization of results and to facilitate further postprocessing of the derived data). CPPTRAJ can read multiple topology files and reference structures, write multiple output trajectories (for which specific frames to be written can be specified), and strip topology files. In addition, the results from separate commands can be directed to the same data file (e.g., the results from two dihedral angle time series calculations like protein phi and psi angles can be written to one file), and there is native support for compressed files along with many other improvements. Overall, CPPTRAJ shows a significant speedup compared to PTRAJ, particularly when processing Amber NetCDF trajectories. In addition, several commands have been parallelized with OpenMP10 to take advantage of multicore machines for even more speedup.

■ GENERAL OVERVIEW
Although PTRAJ and CPPTRAJ have some limited interactive capability, they are mainly designed to be run as a batch job with predefined input outlining a series of commands to process and analyze the data. An example run might look like

    ptraj GAAC.topo commands.in

where in this case, the file GAAC.topo defines the topology of an 18-mer DNA duplex in solution, and the input file (commands.in) is as follows:
    trajin GAAC.cdf.1
    trajin GAAC.cdf.2
    trajout GAAC-strip.crd nobox

    center :1-18 mass origin
    image origin center familiar byres :19-36
    center :1-36 mass origin
    image origin center familiar
    rms first mass out rms time 50 :1-36
    average avgpdb/avg.pdb pdb :1-36

    strip :WAT

The above input file reads in two trajectory files (via trajin), performs centering and imaging of the coordinates (via center and image with respect to a particular atom selection, e.g., :1−18 refers to residues 1−18; see Supporting Information, section 3 for an extended description of PTRAJ/CPPTRAJ mask syntax), performs an RMS fit to the first frame, outputs an average structure, strips (removes) residues named WAT (in Amber this is the standard name for water molecules), and outputs the resulting coordinates to another trajectory (via trajout). Note that the placement of the trajout command does not matter; coordinates produced via trajout are always output after all actions have processed a frame.

Figure 1. Overall program flow of PTRAJ (left) and CPPTRAJ (right). Orange boxes represent data; blue boxes represent phases of the program.

To better understand the trajectory processing and analysis, it is useful to describe the code flow which is shown in Figure 1 and is similar for both PTRAJ and CPPTRAJ. PTRAJ has three distinct phases: initialization, actions, and analysis. In the initialization phase, the topology file and an input file of commands are read in and parsed. The format of the topology file is automatically detected and is used to define the size of the system, the atom and residue names and numbers for atom selections, solvent, and other properties if present (e.g., atomic masses, charges, etc.). Next, input commands are parsed, either from a file or from standard input; this may include reading and checking reference coordinates or input trajectories. As with topology files, the format of coordinate/trajectory files is automatically detected. One current limitation of PTRAJ is that for many actions it is necessary to first predetermine how many frames will be processed for memory allocation, meaning that certain actions will not function properly if the total number of frames cannot be determined, as is the case for corrupted or certain compressed trajectories.

As the input file is read in, the various commands to be processed are put into a stack of “actions” which will be performed sequentially on each coordinate frame. At this stage, necessary memory allocation and setup is performed. After the initialization phase comes the actions phase where coordinate frames are read one by one from the input trajectories and processed by each action. Actions can generate data at this point, as well as change the “state” of the system (for example, if a strip command is specified to remove coordinates, all subsequent actions after the strip will use the new and truncated set of coordinates). In order to retain generality, there is little checking on the “actions” to see if they make sense; this can lead to issues if care is not taken to understand the implications. For example, trying to periodically image coordinates after an RMS fit rotates the periodic cell will alter the coordinates incorrectly by imaging with respect to the unrotated unit cell. On the other hand, by not imposing strict rules there is more flexibility. For example, atomic positional fluctuations (or B-factors) can be calculated either with free coordinates (including molecule rotation) or by prefitting to a common reference frame; in other words, the atomicfluct command does not require a particular fit to a reference frame. If an output trajectory was specified, output coordinates are written during this phase once all actions have processed the current trajectory frame. Once coordinate processing is complete, any accumulated data can be used to perform various types of analysis in the analysis phase. Last, any data or information not written during the actions phase is printed.

CPPTRAJ extends this flow into five distinct phases: initialization, setup, actions, analysis, and data formatting. During initialization, everything (including specification of topologies) can be prepared based on user input, again either from an input file or via standard input. All topology information and reference structures are read. Coordinate trajectories are prepared for reading, and actions/analyses to be performed are instantiated. The next two phases, setup and actions, comprise coordinate reading and data accumulation.
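The frame-by-frame processing of a list of actions can be illustrated with a minimal Python sketch. It is not PTRAJ or CPPTRAJ code: the `Frame` representation, the two action functions, and `run_actions` are hypothetical names chosen for this example, and atoms are reduced to a dictionary of labels and coordinates. The sketch shows only that actions are applied in order to each frame, that an action such as a strip changes the state seen by later actions, and that output coordinates are collected once all actions have processed the frame.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical frame representation: atom label -> (x, y, z).
Frame = Dict[str, Tuple[float, float, float]]
# Accumulated results, keyed by data set name.
DataSets = Dict[str, List[float]]
# An action takes the current frame and the shared data sets and
# returns the (possibly modified) frame for the next action.
Action = Callable[[Frame, DataSets], Frame]


def count_atoms(frame: Frame, data: DataSets) -> Frame:
    """Record how many atoms this action sees; leave the frame unchanged."""
    data.setdefault("natoms", []).append(float(len(frame)))
    return frame


def strip_water(frame: Frame, data: DataSets) -> Frame:
    """Drop atoms whose label starts with 'WAT' (a stand-in for 'strip :WAT')."""
    return {name: xyz for name, xyz in frame.items() if not name.startswith("WAT")}


def run_actions(frames: List[Frame], actions: List[Action]) -> Tuple[DataSets, List[Frame]]:
    """Actions phase: apply every action, in order, to every frame."""
    data: DataSets = {}
    written: List[Frame] = []
    for frame in frames:
        for action in actions:
            frame = action(frame, data)
        # Output coordinates are collected only after all actions have run.
        written.append(frame)
    return data, written


# Two made-up frames: one solute atom and two water atoms each.
frames = [
    {"C1": (0.0, 0.0, 0.0), "WAT1": (5.0, 0.0, 0.0), "WAT2": (0.0, 5.0, 0.0)},
    {"C1": (0.1, 0.0, 0.0), "WAT1": (5.1, 0.0, 0.0), "WAT2": (0.0, 5.1, 0.0)},
]
data, written = run_actions(frames, [count_atoms, strip_water, count_atoms])
print(data["natoms"])              # atom counts seen before and after the strip
print([len(f) for f in written])   # atoms per frame in the output coordinates
```

Placing `count_atoms` on both sides of `strip_water` makes the state change visible: the second count sees only the atoms that survive the strip, which mirrors why the order of actions in an input file matters.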
During the setup phase, all topology-dependent aspects of each action (e.g., atom mask parsing) are processed. When a trajectory with a different topology needs to be loaded, all actions are set up again for the new topology before continuing. During the actions phase, input trajectories are read in one frame at a time and processed by each action. As with PTRAJ, actions are performed in sequence on each coordinate frame that is read in, and data may be generated at this point. It should be noted that although some actions do output data in the actions phase, the majority do not for reasons of efficiency (see e.g. the “secstruct” command in the Discussion). In contrast to PTRAJ, output coordinates can be written out during action processing (as opposed to only after all actions are processed); this allows output of coordinates before modification from, e.g., an RMS fit. CPPTRAJ also has the ability to write out Amber topology files, either during the initialization or setup phases, which is particularly useful when actions modify topologies. As with PTRAJ, once coordinate processing is complete, any accumulated data can be used to perform analysis during the analysis phase. During the last phase, data formatting, any data slated for output is formatted and then written to disk.

Overall, the procedure followed by CPPTRAJ is similar to PTRAJ, with three notable exceptions: (1) Trajectories can have different topologies (thus actions require separate initialization and setup phases). (2) The number of frames does not need to be known for memory allocation in actions, as data set memory can be allocated dynamically. (3) There is now a distinct output phase after trajectory processing and analysis in which data can be formatted. This latter phase is intended to give the user more output format options, as well as to eliminate the need for some common postprocessing steps. Most data generated from actions can be output in one of three formats (essentially any output specified with the “out” keyword): standard data (data in whitespace-delimited columns), XMGRACE (a freely available graphing program for X-windows, http://plasma-gate.weizmann.ac.il/Grace/), and Gnuplot contour map (another freely available graphing program, http://www.gnuplot.info/). The output file type is detected from the filename extension. Data from separate actions can be output to single or multiple files in any combination desired by the user.

For example, consider the following CPPTRAJ input:

    parm DPDP.parm7
    trajin DPDP.nc
    distance end_to_end :1 :22 out standard.dat
    radgyr RoG nomax out standard.dat

are taken from the name specified with the action; default names are assigned if not specified.

The format of the output can be changed simply by changing the filename extension. For example, to output in XMGRACE format, the user can simply change the name of the output file to “standard.agr”, or to obtain Gnuplot map output, the name can be changed to “standard.gnu”. The results of this are shown in Figure 2. The figures shown are exactly what is rendered by XMGRACE and Gnuplot directly from the CPPTRAJ output; i.e., no alterations were made, and no massaging of the files was required to get axis labels and legends.

Figure 2. XMGRACE (top) and Gnuplot map (bottom) output of CPPTRAJ showing how data from separate actions (in this case, a distance and radius of gyration calculation) may be combined. The plots were created using XMGRACE 5.1.23 and GNUPLOT 4.4 on the data output from CPPTRAJ and were not modified in any way.
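As a rough illustration of choosing the output format from the filename extension, the following Python sketch maps the extensions mentioned in the text (“.agr” for XMGRACE, “.gnu” for Gnuplot, anything else for standard columns) to a format label and writes two data sets into one whitespace-delimited file. The function names, the column layout, and the sample values are assumptions made for this example; they do not reproduce CPPTRAJ's actual writers or its exact file layout.

```python
import io
import os
from typing import Dict, List, TextIO


def detect_format(filename: str) -> str:
    """Pick an output format label from the filename extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".agr":
        return "xmgrace"
    if ext == ".gnu":
        return "gnuplot"
    return "standard"


def write_standard(out: TextIO, datasets: Dict[str, List[float]]) -> None:
    """Write data sets side by side in whitespace-delimited columns.

    Assumes every data set holds one value per frame (equal lengths).
    """
    names = list(datasets)
    nframes = len(datasets[names[0]])
    out.write("#Frame " + " ".join(f"{n:>12}" for n in names) + "\n")
    for i in range(nframes):
        row = " ".join(f"{datasets[n][i]:12.4f}" for n in names)
        out.write(f"{i + 1:6d} {row}\n")


# Made-up values standing in for the results of two separate actions
# (a distance and a radius of gyration) directed to the same file.
datasets = {"end_to_end": [10.5, 11.25], "RoG": [7.0, 7.125]}
buffer = io.StringIO()
if detect_format("standard.dat") == "standard":
    write_standard(buffer, datasets)
print(buffer.getvalue())
```

Keeping format detection separate from the writers means that switching from “standard.dat” to “standard.agr” would change only which writer is called, not the actions that produced the data.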
Table 1. File Formats Supported by PTRAJ/CPPTRAJ

    file format            type (a)      supported by
    Amber Topology         Parm          both
    PDB                    Coord/Parm    both
    Charmm PSF             Parm          both
    Mol2                   Coord/Parm    CPPTRAJ
    Amber Trajectory       Coord         both
    Amber NetCDF           Coord         both
    Amber Restart          Coord         both
    Amber NetCDF Restart   Coord         CPPTRAJ
    Charmm DCD             Coord         both
    Scripps Binpos         Coord         PTRAJ

    (a) Parm = topology file; Coord = coordinates/trajectory.

generated (via the “bondsearch” keyword) for topology files that do not already have such information for use in actions which require it (e.g., the linear combination of pairwise overlaps surface area algorithm11). CPPTRAJ also has the ability to write Amber topology files, which can be particularly useful when the topology has been modified such as after a “strip” command; see the Commands Overview section for more details.

PTRAJ and CPPTRAJ can read and write in several trajectory formats: SCRIPPS BINPOS (PTRAJ only), Amber Coordinates (ASCII), Amber Restart, Amber NetCDF trajectory, Charmm DCD, PDB, and (CPPTRAJ only) Amber NetCDF restart and Mol2 files. Trajectory files can be read in either for processing or for use as reference frames for certain actions (e.g., RMSD). Trajectory files can be read in using user-specified start, stop, and offset frame numbers. In addition, users can choose to process frames at a specific temperature from an ensemble of trajectories generated from Amber replica exchange molecular dynamics (REMD) via the “remdtraj remdtrajtemp” keywords. Replica trajectories are found automatically from a single given replica filename, based on the assumption of a numeric file suffix. Output trajectories are written after coordinate processing. Multiple input trajectories can be written to a single output trajectory. Existing trajectories can also be appended to.

CPPTRAJ also has some additional trajectory processing functionality worth mentioning. For processing replica trajectories, users can explicitly specify trajectory file names in addition to the automatic file name search. A given replica ensemble can also be converted to a temperature ensemble in one pass using the “remdout” keyword. When reading in trajectories, the “last” keyword can be used if the stop frame is not known in advance, and the “lastframe” keyword can be used to explicitly choose the final frame of a trajectory. When writing out trajectories, users can specify which input frames are to be output. Also, multiple output trajectories can be specified, and trajectories can be written out during action processing (as opposed to only after in PTRAJ) with the “outtraj” command. Examples are shown in the Supporting Information, section 5.

■ PTRAJ/CPPTRAJ COMMANDS OVERVIEW
Both PTRAJ and CPPTRAJ can calculate various geometric quantities, including distance, angle, dihedral, radius of gyration, and pucker; a complete list of commands is provided in the Supporting Information, section 2. The distance calculations can perform imaging for both orthorhombic and non-orthorhombic cells; by default the shortest imaged distance is used, although this behavior can be changed. Radius of gyration is defined by

$$ R_g = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - r_m)^2} $$

where N is the number of atoms, r_i denotes atomic position, and r_m denotes the mean position of all atoms. Five-membered ring pucker can be calculated using the method of Altona and Sundaralingam12 or the method of Cremer and Pople.13 With the analysis utilities, averages and standard deviations can be reported, with proper cyclic averages being calculated for periodic values (e.g., torsions).

Both PTRAJ and CPPTRAJ employ a version of Kabsch’s algorithm14 for calculating the best-fit RMSD of a structure to a reference structure, although there are several small differences in the implementation between the two programs. Specifically, CPPTRAJ allows the specification of separate masks for the target and reference, allowing some more flexibility. CPPTRAJ also contains an additional mode that automatically calculates the no-fit RMSD of specified residues after an overall RMS-fit has been performed (enabled with the “perres” keyword), which gives an idea of the motion of individual residues within the overall reference frame.

PTRAJ and CPPTRAJ have the ability to calculate protein secondary structure using the method of Kabsch and Sander15 (also known as the DSSP method). Both programs can output the secondary structure for each residue per frame as well as the average secondary structure of each residue over all frames. In addition, CPPTRAJ can output these results in native XMGRACE and/or Gnuplot format, shown in Figure 3.

PTRAJ and CPPTRAJ also have several commands for modifying coordinates and/or topology. The “strip” command removes user-specified atoms from coordinates and topology, which is useful when, for example, removing solvent molecules from a trajectory. In addition to this, CPPTRAJ can output a corresponding stripped Amber topology file that matches the stripped coordinates. Similar to “strip,” the “closest” or “closestwaters” command can be used to keep only a user-specified number of waters near a specific area of solute (e.g., near a bound ligand), and solvent can be redefined using the “solvent” command for the generality of keeping nearby molecules. As with “strip,” CPPTRAJ can also be used to output a corresponding Amber topology. CPPTRAJ also has an additional command, “unstrip,” which can be used to restore a topology to its original state; this can be used, for example, to separate a complex into separate receptor and ligand trajectories in one pass. Indeed, CPPTRAJ is used by the MM-PBSA method in Amber (MMPBSA.py python script) for this very purpose.16

For trajectories with periodic boundary conditions, PTRAJ and CPPTRAJ can perform imaging in order to bring molecules outside the primary unit cell back into the primary unit cell. This is usually done by centering molecules of interest prior to imaging. Imaging can be performed for both orthorhombic and nonorthorhombic cells. Nonorthorhombic cell types include all of the general triclinic unit cells, and imaging can be performed to the triclinic unit cell shape (which often looks like a slanted rectangle, see Figure 4) with the “triclinic” keyword or to the more familiar or spherical shape that displays the molecules such that the single periodic image closest to the center of the unit cell is displayed (with the “familiar” keyword). PTRAJ defaults to the triclinic, for historical reasons, whereas
CPPTRAJ defaults to familiar, which is the common imaging mode in Amber for truncated octahedral unit cells.

In addition to imaging by molecule, PTRAJ can image by atom, by residue, or according to a user-specified mask expression and can build nearby unit cells through user-

Figure 3. CPPTRAJ native XMGRACE (top) and Gnuplot (bottom) output for the “secstruct” (DSSP) command for a short trajectory of a model β sheet peptide. The plots represent exactly what is output from CPPTRAJ and were not modified in any way. In the XMGRACE output, the average secondary structure content (y axis) for each residue (x axis) is shown. In the Gnuplot output, each residue (y axis) is assigned a secondary structure type (0 = no structure, 1 = parallel beta, 2 = antiparallel beta, 3 = 3-10 helix, 4 = α helix, 5 = PI helix, 6 = turn) for each frame (x axis).

Figure 4. An RNA tetraloop (yellow) with surrounding water solvent (blue, only showing the oxygen atoms) and ions (Na+ in red and Cl− in green) showing equivalent periodic unit cells imaged either to the triclinic unit cell (left) or more spherical or familiar imaging (right). Image created with VMD 1.9.1.3

■ VECTOR/MATRIX ANALYSIS
PTRAJ and CPPTRAJ have several commands which allow users to track and analyze vectors and matrices generated from input coordinates. In addition to storing simple atom coordinate vectors (e.g., between a nitrogen atom bonded to a hydrogen atom), both principal axis vectors and dipole vectors can be stored as well. Vectors can also be stored for use with the Isotropic Reorganizational Eigenmode Dynamics (IRED) approach of Prompers and Brüschweiler.17 Vectors are stored as 2 sets of Cartesian coordinates (magnitude and origin). Auto or cross time correlation functions can be calculated for vectors using spherical harmonics addition theorem and a fast-Fourier transform (FFT) approach via cross-correlation/Wiener-Khinchin theorem.

Several types of matrices can be calculated including distance, covariance, mass-weighted covariance, correlation, distance-covariance, IRED, and Isotropically Distributed Ensemble Analysis.18 Distance matrices record the average distance between atom pairs. (Mass-weighted) covariance matrices record the coordinate covariance. Correlation records the correlation between atom vectors, and distance-covariance records the covariance between atom-pair distances. IRED matrices use previously defined IRED vectors to generate an IRED matrix.

Using the “analyze matrix” command, symmetric matrices can be diagonalized, and the eigenvectors and eigenvalues calculated and output. If all eigenvalues/eigenvectors are desired, the dspev() routine from the LAPACK math library19 is used for the calculation. If desired, only the eigenvalues may be computed. If a subset of the eigenvectors/eigenvalues is desired, the dsaupd()/dseupd() routines from the ARPACK math library20 are used. If the matrix being analyzed is an IRED matrix, IRED order parameters can be calculated and output. If the matrix being analyzed is mass-weighted covariance, quasi-harmonic analysis (entropy, heat capacity, and internal energy) can be performed using the standard statistical mechanical formula for an ideal gas. Eigenvectors from covariance matrices may also be reduced according to the procedure of Abseher and Nilges,21 which can be useful for comparing eigenvectors from Cartesian-space to those in distance space. Eigenmodes generated from “analyze matrix” can be further analyzed to calculate RMS fluctuations, displacements of Cartesian coordinates along mode directions, or dipole−dipole correlation functions. The modes can also be read back in and coordinate frames projected onto them (using the “project” command) to calculate how much that frame contributes to each mode. Principal component (or quasi-harmonic) analysis is a common use of the matrix analysis facilities. A typical workflow would involve calculating the covariance matrix (mass-weighted for quasi-harmonic analysis), diagonalizing it to get the eigenmodes, then projecting frames onto those
eigenmodes to obtain the values of the principal components at each frame. Examples are shown in the Supporting Information, section 4.

■ PTRAJ-SPECIFIC COMMANDS
Advanced Cluster Analysis. While CPPTRAJ has the ability to perform cluster analysis on input coordinates via a hierarchical agglomerative approach with single, average, or complete linkage, PTRAJ is able to perform clustering using a much greater variety of clustering algorithms, including top-down splitting, Bayesian, and COBWEB among others.22 In addition, in PTRAJ clustering one has the ability to “sieve” input trajectories in order to reduce total computation time. During a cluster sieve, clustering is initially performed on a smaller subset of the input coordinates, after which the remaining input coordinates are added to resulting clusters based on their closeness to the cluster centroid.

Due to the complexity of the clustering code in PTRAJ, it has not yet been fully ported to CPPTRAJ, although this is planned for future versions of CPPTRAJ.

Hydrogen Bond Facility. Although both PTRAJ and CPPTRAJ have commands to track hydrogen bonds, the PTRAJ hydrogen bonding facility differs significantly from that of CPPTRAJ. In PTRAJ, hydrogen bond donors and acceptors are defined prior to an “hbond” command with the “donor” and “acceptor” keywords. Note that in PTRAJ the definitions of hydrogen bond donor and acceptor are reversed with respect to standard conventions; the electron pair “acceptor” is bonded to hydrogen and the “donor” is the atom to which the hydrogen bond is formed (i.e., in PTRAJ a “donor” can be thought of as donating electrons to the hydrogen atom). This has not been changed in order to preserve backward compatibility. Solute donors can be specified either by giving a residue and atom name or by specifying a mask. Solute acceptors can be specified by giving residue, heavy atom, and hydrogen atom names, or by giving heavy atom and hydrogen atom masks. This information is then used by a subsequent “hbond” command. Solute−solute, solute−solvent, and solvent−solvent hydrogen bonds can all be tracked. A hydrogen bond is considered formed when distance and angle cutoff

using either the Connolly method23 or the LCPO method11 (with the “molsurf” and “surf” commands, respectively). Although both PTRAJ and CPPTRAJ can use distance-based masks that use reference coordinates, CPPTRAJ can also make use of a distance-based mask that updates each frame via the “mask” command, which can be used for example to write separate PDB files of all water molecules within a certain distance of a ligand (as opposed to the “closest” command, which only outputs a fixed number of molecules). CPPTRAJ can also calculate J-coupling values from dihedral angles according to the procedure of Chou et al.24 or Perez et al.25

CPPTRAJ comes with many test cases that also serve as examples of how to run various commands. These test cases are part of the larger AmberTools distribution; when AmberTools is installed, they are located in the directory $AMBERHOME/AmberTools/test/cpptraj.

Automatic Imaging. Imaging a trajectory could prove to be challenging with PTRAJ for systems with more than one nonsolvent molecule (e.g., DNA duplexes, receptor−ligand complexes, protein tetramers, etc.). The “autoimage” command in CPPTRAJ was designed to perform centering and imaging in one step with minimal input from the user. The command functions by dividing all molecules into three regions: anchor, mobile, and fixed. The command will pick defaults for each region, although any of the regions can be chosen by the user via mask expressions. The anchor region is made up of a single molecule that will be centered either to the box center or the coordinate origin; all other molecules are imaged with respect to the anchor molecule. By default the first molecule is chosen as the anchor. The mobile region is made up of molecules that can be imaged freely; by default all solvents and ions are chosen to be mobile. The fixed region is made up of all remaining molecules; molecules in this region are imaged only if the imaged position is closer to the anchor molecule. An example of this command is shown in the Supporting Information, section 5.7.

New Hydrogen Bonding Facility. Like PTRAJ, CPPTRAJ has the ability to keep track of hydrogen bonds formed over the course of a trajectory. However, there are several important differences from the PTRAJ implementation. Unlike PTRAJ,
criteria are met. If desired, the user can disable the angle where hydrogen bond donors and acceptors were specified with
criterion. PTRAJ can also be used to track particle−particle separate commands, in CPPTRAJ, hydrogen bond donors and
interactions by specifying the same selection for the “acceptor” acceptors are specified with the “hbond” command as masks.
specification twice, for example to track ion interactions with a Note that unlike PTRAJ, the definitions of hydrogen bond
typical donor. In addition, PTRAJ allows definition of donor and acceptor follow standard conventions (hydrogen
solventdonor and solventacceptor hydrogen bonds, for bond donors consist of a heavy atom and hydrogen; hydrogen
example to track generic solvent interactions (e.g., if any bond acceptors consist of a single atom). Hydrogen bond
solvent interacts in contrast to each specific solvent molecule donors and acceptors can be searched for automatically
and the definition of solvent is general). An example script is following the simplistic criterion that donors are considered
shown in the Supporting Information, section 6. At the end of to be N, O, and F atoms bonded to hydrogen, while acceptors
coordinate processing, a summary of the average occupancy, are considered to be N, O, and F atoms with no bonded
distance, and angle of each hydrogen bond found is output. hydrogen atoms. Donors and/or acceptors can also be explicitly
Note that in PTRAJ, hydrogen bond angles are reported as specified with “donormask” and “acceptormask” keywords,
deviations from linear (i.e. as 180° - ANGLE). The time-series
respectively. As with PTRAJ, a distance and angle criterion is
of each hydrogen bond can optionally be saved, from which
used to determine when a hydrogen bond is present, and the
details on hydrogen bond lifetimes can be calculated. PTRAJ
average occupancy, distance, and angle of each hydrogen bond
will also report when solute residues are bridged by a single
found is output. Unlike PTRAJ however, only solute−solute
water molecule.
■
hydrogen bonds are searched for, and there is no option to
record the time series (and hence get lifetimes) of hydrogen
CPPTRAJ-SPECIFIC COMMANDS bonds, although these features are planned for future releases of
CPPTRAJ has several actions not available in PTRAJ. The CPPTRAJ. An example of this command is shown in the
solvent accessible surface area of a molecule can be calculated Supporting Information, section 5.8.
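The automatic donor/acceptor search and the distance/angle test described under New Hydrogen Bonding Facility can be sketched as follows. This is a minimal Python illustration under stated assumptions, not CPPTRAJ code: the function names, the input layout (element symbols plus a bond list), the cutoff values of 3.0 Å and 135°, and the choice to measure the distance between the donor heavy atom and the acceptor are all hypothetical choices made for the example.

```python
import math

# Elements treated as potential hydrogen bond partners (from the text: N, O, F).
HBOND_ELEMENTS = {"N", "O", "F"}


def find_donors_acceptors(elements, bonds):
    """Classify atoms with the simple N/O/F rule.

    elements: list of element symbols, one per atom.
    bonds: list of (i, j) atom index pairs.
    Returns (donors, acceptors): donors are (heavy_atom, hydrogen) index
    pairs, acceptors are single atom indices.
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    donors, acceptors = [], []
    for i, element in enumerate(elements):
        if element not in HBOND_ELEMENTS:
            continue
        hydrogens = [j for j in neighbors[i] if elements[j] == "H"]
        if hydrogens:
            # N, O, or F bonded to hydrogen: one donor entry per hydrogen.
            donors.extend((i, h) for h in hydrogens)
        else:
            # N, O, or F with no bonded hydrogen: acceptor.
            acceptors.append(i)
    return donors, acceptors


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def is_hbond(donor_xyz, hydrogen_xyz, acceptor_xyz, dist_cut=3.0, angle_cut=135.0):
    """True if both the distance and angle criteria are met (assumed cutoffs)."""
    distance = math.dist(donor_xyz, acceptor_xyz)
    angle = angle_deg(donor_xyz, hydrogen_xyz, acceptor_xyz)
    return distance <= dist_cut and angle >= angle_cut


# Hypothetical example: one water (O, H, H) plus a lone oxygen atom.
elements = ["O", "H", "H", "O"]
bonds = [(0, 1), (0, 2)]
donors, acceptors = find_donors_acceptors(elements, bonds)
linear = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
bent = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.9, 0.0))
```

In this toy system the water oxygen is classified as a donor (once per hydrogen) and the lone oxygen as an acceptor. The linear arrangement passes both criteria, while the bent one fails the angle test even though the distance is within the cutoff.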
3090 dx.doi.org/10.1021/ct400341p | J. Chem. Theory Comput. 2013, 9, 3084−3095
Journal of Chemical Theory and Computation Article
Nucleic Acid Structure Analysis. CPPTRAJ has the ability to calculate basic nucleic acid structure parameters. Base pair parameters (shear, stretch, stagger, buckle, propeller twist, and opening), base pair step parameters (shift, slide, rise, tilt, roll, and twist), and helical parameters (X-displacement, Y-displacement, rise, inclination, tip, and twist) are calculated using the same procedure employed by 3DNA26 using reference frame coordinates from Olson et al.27 Base pairing is determined on a per frame basis. First, each base is identified and assigned a standard reference frame. Next, for each base, potential base-pairing partners are determined based on how close their reference frames are and whether they are hydrogen-bonded; currently only base pairs with standard Watson−Crick hydrogen bonding patterns are recognized. CPPTRAJ has some ability to recognize nonstandard/modified nucleic acid bases through a user-specified argument which attempts to map the nonstandard base to one of the five standard bases. Currently, this works as long as the modified base contains at least the same heavy atoms as the standard base to which it is being mapped. An example of this command is shown in the Supporting Information, section 5.9.

Rotational Diffusion Tensor. The rotational diffusion properties of a molecule can be calculated with CPPTRAJ using the “rotdif” command, which determines the diffusion tensor with both small and full anisotropy. The procedure followed is briefly described here; for further details, refer to the original implementation by Wong and Case.28 First, a user-specified number of random vectors are generated and rotated using rotation matrices obtained from RMS fitting the trajectory of the target molecule with a suitable reference (typically the average structure). The correlation function $C_l$ for each vector is then determined using a Legendre polynomial $P_l$ of order l = 1 or l = 2, specified by the user (default is 2):

$$C_l(\tau) = \sum_{i=0}^{\tau} \sum_{j=0}^{N-i} P_l[n_j \cdot n_{j+i}]$$

where τ denotes the maximum lag time, n represents the rotated vector, i and j denote different times, and N is the total number of frames. Since trajectories are of finite size, τ should be somewhat less than N in order to limit statistical errors. The average (or “effective”) correlation time for a vector $T_l(n)$ can be determined from the integral over the calculated correlation function; in CPPTRAJ, this is accomplished by first creating a cubic spline mesh, then performing simple integration via the trapezoid rule. This can then be used to calculate the local effective diffusion constant $d_{loc}(n, l)$ for that vector using an iterative procedure:

$$d_{loc}(n, l) \equiv \frac{1}{l(l+1)\,T_l(n)}$$

Once the local effective diffusion constants for each vector have been determined, they can be used to determine a tensor Q by solving

$$d_{eff} = A^{T} Q$$

where $d_{eff}$ is a column vector composed of the local diffusion constants and $A^{T}$ is a 6 by N matrix, the rows of which are composed of n vector components. Q is related to the diffusion tensor D by

$$Q = \frac{3 D_{av} I - D}{2}$$

The $d_{eff}$ equation can be solved for Q (and therefore D) using singular value decomposition; in CPPTRAJ this is done via an external call to a LAPACK routine (dgesvd). Diagonalization of the diffusion tensor to yield principal components and axes is also accomplished with a LAPACK routine (dsyev).19 The diffusion tensor in the full anisotropic limit is then found using a downhill simplex minimization scheme that uses the diffusion tensor in the small anisotropic limit as initial input. An example of this command is shown in the Supporting Information, section 5.10.

RMSD Autocorrelation. Time correlation functions are useful tools that allow discernment of patterns that may be hidden in a given set of data. A property that one may be interested in calculating the time correlation function for is RMSD, in order to determine the similarity of structures in a trajectory over different time scales and to assess convergence to the average structure. However, one cannot directly calculate the time correlation function of a series of RMSD values and obtain this, as the difference between two RMSD values may not be indicative of how similar the structures are. For example, suppose one has a trajectory of a small peptide which starts in a helical conformation, then adopts both hairpin and extended conformations. With respect to the starting structure, the hairpin and extended conformations may have similar RMSD values, although they are obviously different structures.

To calculate the similarity of structures over different time scales, we have developed the RMSD average correlation method, or “rmsavgcorr”. This method calculates the autocorrelation of RMSD as the average RMSD of coordinate frames which have been averaged over sliding time windows of a certain size:

$$RAC(\tau) = \frac{\sum_{t=0}^{N} RMSD(AvgCrd(t, t+\tau))}{N - \tau + 1}$$

where τ is the window size (ranging from 1 to N), N is the total number of frames, AvgCrd(t, t + τ) is the average coordinates from frames t to t + τ, and RMSD(.) represents the calculation of best-fit RMSD of the given frame to reference AvgCrd(0, τ). This means that RAC(1) is just the average RMSD to the first frame, and RAC(N) is always 0.0.

Atom Mapping. In some cases, two structures of the same molecule may exist with different atom ordering. This can occur for example when generating a ligand with two different programs; hydrogen atoms may be placed at the end of one file, whereas they may be placed close to their bonded heavy atom in another. The two structures then become impossible to compare using, e.g., the RMSD algorithm in PTRAJ/CPPTRAJ, since it is expected that the atom ordering between the target and reference structure matches. To address this issue, the “atommap” command was created for CPPTRAJ. The “atommap” command employs a very simple algorithm which attempts to map a target structure onto a reference structure (i.e., solve in an approximate way the maximum common subgraph isomorphism problem).29

The central idea of the algorithm is to assign each atom in the target and reference a unique ID based on its chemical environment and from these IDs attempt to create a map between the target and reference. To create the unique ID, first each atom is assigned a single character based on its atomic element, so carbon becomes C, hydrogen becomes H, etc. The next step is to assign each atom what is called an AtomID, which is made up of the single character plus the characters of
the atoms it is bonded to. So for example, the α carbon in an alanine side chain would receive an AtomID of CCCHN since it is bonded to a carbonyl carbon, a beta carbon, an alpha hydrogen, and an amide N. The AtomID is then combined with the AtomID of all bonded atoms and sorted in alphabetical order to form the unique ID. In the case of alanine, the AtomIDs of the previously mentioned atoms bonded to the α carbon are CCOO, CCHHH, HC, and NCHH, so the total unique ID for the α carbon is CCCCCCCCCHHHHHHHNNOO. The unique ID of an atom therefore reflects the local chemical environment of the atom. Because the chemical environment of an atom may not truly be unique (which is the case with symmetric atoms for example) each unique ID is checked to see how many times it is repeated in the molecule. If it is not repeated, it is marked as truly unique.

The atom mapping procedure is illustrated in Figure 5. Once the truly unique atoms have been found in the target and […] the chiral atom and two atoms it is bonded to are mapped). These atoms are then mapped by matching dihedral angles formed between them and the three mapped atoms (Figure 5, orange atoms). If any atoms are mapped in this way, the algorithm returns to the first step. The final step is to map atoms based on their element and bonding to previously mapped atoms (Figure 5, yellow atoms). The atoms in this stage are usually symmetric in some way. If atoms are mapped in this way, the algorithm then returns to the first step (Figure 5, green atoms); otherwise mapping is complete.

When the molecule is highly symmetric, it is possible that no unique atoms can be found to make an initial map. When this occurs, the algorithm attempts to guess an initial mapping based on atoms whose unique IDs are duplicated only once, preferring chiral atoms. The algorithm performs mapping using each potential pair as an initial guess and ranks each guess based on RMSD of mapped atoms. The mapping which produces the lowest RMSD is then used.

After mapping is complete, several options are available. The map can be used to reorder an input trajectory based on the target so that it matches the reference, or only the map itself or the RMSD of mapped atoms can be printed if desired. In cases where not all atoms can be mapped (either due to different numbers of atoms in the target and reference or incomplete mapping), structures can be modified to only include mapped atoms. Despite the relative simplicity of the algorithm, it can be quite effective, at least for small molecules. In a recent docking study in which the atom ordering of crystal poses did not match that of the starting poses, the CPPTRAJ “atommap” algorithm was able to map all 85 ligands, including structures where the protonation state changed and highly symmetric structures.30 An example of this command is shown in the Supporting Information, section 5.11.

■ DISCUSSION OF PTRAJ/CPPTRAJ PERFORMANCE
There are several differences between PTRAJ and CPPTRAJ, both in the implementation of underlying algorithms and general code layout, which are worth noting as they contribute to the generally better performance of CPPTRAJ vs PTRAJ.
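As an aside on the Atom Mapping procedure described before this section, the AtomID and unique ID construction can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions, not the CPPTRAJ implementation: the function names and the bond-list input are hypothetical, element symbols are assumed to be single characters already, and the neighbor characters inside an AtomID are sorted here (an assumption) so that the result does not depend on bond order. The test molecule is an alanine with an amine nitrogen and a carboxylate carbon, chosen to match the AtomIDs quoted in the text.

```python
from collections import Counter


def atom_id(index, elements, neighbors):
    """Own element character followed by the characters of bonded atoms.

    Sorting the neighbor characters is an assumption of this sketch.
    """
    return elements[index] + "".join(sorted(elements[j] for j in neighbors[index]))


def unique_id(index, elements, neighbors):
    """AtomID of the atom combined with the AtomIDs of all bonded atoms,
    with every character then sorted alphabetically."""
    combined = atom_id(index, elements, neighbors)
    for j in neighbors[index]:
        combined += atom_id(j, elements, neighbors)
    return "".join(sorted(combined))


def truly_unique_atoms(elements, neighbors):
    """Indices of atoms whose unique ID occurs exactly once in the molecule."""
    ids = [unique_id(i, elements, neighbors) for i in range(len(elements))]
    counts = Counter(ids)
    return [i for i, uid in enumerate(ids) if counts[uid] == 1]


# Hypothetical alanine: N(0) H(1) H(2) CA(3) HA(4) CB(5) HB(6-8) C(9) O(10) O(11)
elements = ["N", "H", "H", "C", "H", "C", "H", "H", "H", "C", "O", "O"]
bonds = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5), (3, 9),
         (5, 6), (5, 7), (5, 8), (9, 10), (9, 11)]

# Build a neighbor list from the bond list.
neighbors = {i: [] for i in range(len(elements))}
for i, j in bonds:
    neighbors[i].append(j)
    neighbors[j].append(i)

alpha_carbon_atom_id = atom_id(3, elements, neighbors)
alpha_carbon_unique_id = unique_id(3, elements, neighbors)
unique_atoms = truly_unique_atoms(elements, neighbors)
```

For the α carbon this reproduces the AtomID CCCHN and the unique ID CCCCCCCCCHHHHHHHNNOO given in the text. The symmetric hydrogens and the two carboxylate oxygens share unique IDs, so only the amine N, the α carbon, the α hydrogen, the β carbon, and the carbonyl carbon come out as truly unique in this example.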
One major difference is that by design CPPTRAJ attempts to exploit the locality of reference as much as possible, i.e. to make data access as efficient as possible.31 This tends to be a feature when programming with an object-oriented language such as C++ using modern compilers, as classes tend to keep both code and the target data together. In CPPTRAJ, this strategy was also employed for the storage of coordinates. Whereas in PTRAJ, X, Y, and Z coordinates are stored in three separate arrays, in CPPTRAJ, X, Y, and Z coordinates are stored sequentially in a single array. This results in significantly improved performance when processing Amber NetCDF trajectories in particular, since such trajectories have their coordinates stored in a similar fashion.

Another difference is in the implementation of atom masks. In PTRAJ, atom masks are stored as an integer array of size NAtom (where NAtom is the total number of atoms in the system), set to 1 if the atom is selected and 0 if not. This means that for example a routine calculating the center of mass of atoms in a mask would have a loop that always executes NAtom times, with a conditional inside the loop determining whether an atom is selected or not (although it should be noted there are some actions in PTRAJ which do have special cases where only one atom is selected). In contrast, CPPTRAJ employs two mask types. The first is similar to PTRAJ masks, where there is a character array of size NAtom, set to “T” if the atom is selected and “F” if the atom is not selected. This is useful when one needs to know both selected and unselected atoms. However, it is far more common to only be interested in selected atoms, and so the second mask type in CPPTRAJ is an integer array of size NSelected (where NSelected is the total number of selected atoms), containing only the atom numbers of selected atoms. Typically NSelected is much smaller than NAtom. So to use the previous example of a center of mass calculation, instead of having to execute NAtom times, the loop only needs to be executed NSelected times with no conditional inside the loop. This also saves memory as only selected atoms are stored versus all atoms for a PTRAJ mask.

Although both PTRAJ and CPPTRAJ have parallelized code, the strategies employed in each case are radically different. For PTRAJ, the strategy was to parallelize trajectory reads, i.e. divide each trajectory among the number of threads being used, within an MPI32 framework. So given a 1000 frame trajectory and two threads, thread 0 would read frames 0−499, while thread 1 would read frames 500−999. While this did at times result in a significant speedup and scaled reasonably well, there were several problems with this approach. The first was that as implemented the number of input frames was required to be a multiple of the number of threads; so for example it was not possible to read 753 frames with two threads. Another is that the performance has proved to be extremely dependent on the underlying file system; for some standard nonparallel file systems it was possible to see no speedup (or even a slowdown) when using multiple threads. Finally, this approach also requires that the trajectory be “seekable”, i.e. the file can be opened at a specific frame; as such its use is currently restricted to Amber formatted and NetCDF trajectories.

To avoid these issues, it was decided that for CPPTRAJ only certain time-consuming actions would be parallelized using OpenMP.10 As it is now quite commonplace for even desktop machines to have two or more cores, the simplicity of parallelizing time-consuming actions within a shared-memory framework makes OpenMP an attractive parallelization solution. In many cases, speedup can be achieved by simply using an OpenMP pragma in front of a key loop. Since the input trajectory is never split up, there is no postprocessing to recombine data from actions on different threads in the correct order.

Benchmarks. Benchmarks of several commonly used actions are shown for PTRAJ and CPPTRAJ in Table 2.

Table 2. Time Necessary to Complete Common Actions for 1000 Frames (Unless Otherwise Noted) in AmberTools 12 CPPTRAJ and PTRAJ, and Speedup of CPPTRAJ vs PTRAJ

command                      CPPTRAJ (s)   PTRAJ-OPT (s)   speedup
angle                               2.77            4.18      1.51
center                              3.32            5.71      1.72
dihedral                            2.78            4.20      1.51
DSSP                                9.47          138.94     14.67
image (general triclinic)           4.46            7.97      1.79
image (truncated oct.)              8.44           16.12      1.91
pucker                              2.80            4.21      1.50
radius of gyration                  2.80            4.48      1.60
RMSD (best-fit)                     3.60           19.23      5.34
running avg                         6.48           27.46      4.24
strip                               2.92            3.85      1.32
closest (a)                        16.62           33.33      2.01
radial dist. fn (a)                65.96          125.06      1.90
(a) Only one frame processed.

Both programs were compiled with GNU compilers, version 4.5.1. In AmberTools, PTRAJ is currently built without explicit compiler optimizations turned on for historical reasons, so timings are reported for PTRAJ with compiler optimizations turned on (denoted “PTRAJ-OPT”). Turning on compiler optimizations for PTRAJ results in a speedup (versus unoptimized PTRAJ) of 1.39× on average. The trajectory processed was an Amber NetCDF trajectory (49 115 atoms). All actions processed 1000 frames, except for the “closest” and “radial” actions, which only processed one frame due to their time-consuming nature. The CPU used was an AMD Athlon 64 X2 4600+ (2.4 GHz).

For all commands, CPPTRAJ is faster than PTRAJ, with an overall average speedup of 3.15×. For relatively simple calculations such as angle, dihedral, pucker, radgyr (radius of gyration), strip, and center, the average speed-up for CPPTRAJ is 1.53×. Much of this is a result of the better handling of NetCDF files and improved locality of reference as mentioned in the previous section. Both image actions show slightly better speedups of 1.79× and 1.91×; this is due to faster handling of atom masks in CPPTRAJ, as well as vectorization of the nonorthogonal imaging code originally used in PTRAJ.

The rms action shows a much larger speedup of 5.34×. This is largely the result of two changes to the RMSD calculation from PTRAJ: (1) copies of the target and reference coordinate frames are made prior to the calculation based on atom masks, so that the RMSD calculation itself is only ever on the atoms of interest, eliminating the need for conditionals to check for selected atoms and/or mass information inside the RMSD calculation loop, and (2) the reference frame is set up and precentered once instead of each time the RMSD calculation is performed.

The secstruct action (DSSP analysis) has a huge speedup of 14.67×. Much of this is due to the fact that in PTRAJ secstruct data are output during trajectory processing, whereas in CPPTRAJ data are output after trajectory processing. This
illustrates how optimizing disk access can dramatically improve performance, although it does so at the expense of using more memory.

OpenMP Benchmarks. Benchmarks for several generally time-consuming actions which have been parallelized using OpenMP in CPPTRAJ are shown in Table 3. CPPTRAJ was compiled with GNU version 4.4.3 compilers. The trajectory is in NetCDF format (49 115 atoms and 15 022 water molecules); the number of frames used for each command was chosen so that the single thread benchmark would be around 60 s in length and is listed next to the command. This was run on a system with four AMD Opteron 6174 CPUs (48 cores total), 2.2 GHz.

Table 3. CPPTRAJ OpenMP Benchmarks for Certain Parallelized Actions

command                              # frames   threads   time (s)   speedup   efficiency
closest 10 :1−268 first                     3         1      49.68      1.00         1.00
                                                      2      25.15      1.98         0.99
                                                      4      12.94      3.84         0.96
                                                      8       6.93      7.17         0.90
mask ″(:190−210 <:3.0) &:WAT″              10         1      44.76      1.00         1.00
                                                      2      22.73      1.97         0.98
                                                      4      11.77      3.80         0.95
                                                      8       6.09      7.35         0.92
radial Radial.agr 0.5 10.0 :WAT@O           1         1      69.02      1.00         1.00
                                                      2      35.00      1.97         0.99
                                                      4      17.63      3.91         0.98
                                                      8       9.48      7.28         0.91
secstruct out dssp.gnu                   1000         1      11.90      1.00         1.00
                                                      2       7.11      1.67         0.84
                                                      4       5.75      2.07         0.52
                                                      8       5.64      2.11         0.26
surf :1−268 out surf.dat                  400         1      62.21      1.00         1.00
                                                      2      32.52      1.91         0.96
                                                      4      17.26      3.60         0.90
                                                      8       9.36      6.65         0.83

In general, actions scale reasonably well up to eight threads. Routines which need to calculate many distances between single atoms (closest, mask, and radial) scale the best and remain around 90% efficient to eight threads. The surf action scales reasonably well out to eight threads, remaining 83% efficient. The reason surf does not scale as well is likely due to the fact that the underlying algorithm contains more nested loops than closest, mask, and radial. In contrast, the secstruct action does not scale well beyond two threads. This is probably because the parallelization of secstruct was done on the level of residues (rather than the level of distances), resulting in an often uneven distribution of calculations to each thread. It should be noted that with the exception of the radial command, the only coding required to parallelize these actions was a single OpenMP pragma in front of the outermost loop, and there remains significant room for improvement.

■ CONCLUSIONS
PTRAJ has provided the computational chemistry community the means to perform a wide variety of analyses on data generated from computational simulations for almost two decades. However, dramatic increases in the size of trajectories being processed combined with the aging PTRAJ code-base necessitated a code rewrite. CPPTRAJ has been created to initially complement but eventually replace PTRAJ as the main analysis engine of the Amber software package. CPPTRAJ has significantly improved performance compared to PTRAJ, and the reorganization of the code should make future additions to the code easier. Although CPPTRAJ supports most of the functionality and syntax from PTRAJ, the code is not yet fully compatible.

One of the main goals for further development of CPPTRAJ is to enhance and increase the flexibility of data set handling. For example, although CPPTRAJ allows the creation of several types of matrices, users may want to create a custom matrix composed of data from various different sources (e.g., distances, angles, dihedrals, etc.) and perform various operations on that matrix (such as principal component analysis). Additional future aims for CPPTRAJ are to become fully backward-compatible with PTRAJ, add novel analysis capabilities, add more parallelization and improve upon existing parallelization, and increase the number of recognized topology and coordinate file formats.

■ ASSOCIATED CONTENT
Supporting Information
Summary of RDPARM functionality and output. Summary of PTRAJ/CPPTRAJ commands. Description of PTRAJ/CPPTRAJ atom mask selection syntax. PTRAJ/CPPTRAJ matrix/vector analysis scripts. Example CPPTRAJ scripts/commands. Example PTRAJ scripts. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected] (D.R.R.), [email protected] (T.E.C.III)
Author Contributions
The manuscript was written through contributions of all authors. Cheatham is the primary author of PTRAJ and Roe is the primary author of CPPTRAJ. All authors have given approval to the final version of the manuscript.
Funding Sources
NIH R01-GM081411 and NSF OCI-1036208.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
We would like to acknowledge locally Niel Henriksen, Rodrigo Galindo and Christina Bergonzo for extensive testing of the codes, and also the larger community of Amber developers, members of the Amber mailing list, and others for feedback on improvements to the codes. We would also like to acknowledge the Center for High Performance Computing at the University of Utah, NIH R01 GM-081411, NRAC XSEDE MCA01S027, and NSF/NCSA/U Illinois Blue Waters (PRAC OCI-1036208 and OCI 07-25070) for access to exceptional computational resources and support.

■ REFERENCES
(1) (a) Klepeis, J. L.; Lindorff-Larsen, K.; Dror, R. O.; Shaw, D. E. Long-timescale molecular dynamics simulations of protein structure and function. Curr. Opin. Struct. Biol. 2009, 19 (2), 120−127. (b) Durrant, J. D.; McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC Biol. 2011, 9, 71. (c) Schlick, T.; Collepardo-
Guevara, R.; Halvorsen, L. A.; Jung, S.; Xiao, X. Biomolecular modeling and simulation: a field coming of age. Q. Rev. Biophys. 2011, 44 (2), 191−228. (d) Wereszczynski, J.; McCammon, J. A. Statistical mechanics and molecular dynamics in evaluating thermodynamic properties of biomolecular recognition. Q. Rev. Biophys. 2012, 45 (1), 1−25. (e) Perez, A.; Luque, F. J.; Orozco, M. Frontiers in molecular dynamics simulations of DNA. Acc. Chem. Res. 2012, 45 (2), 196−205.
(2) (a) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham, T. E.; Debolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structure and energetic properties of molecules. Comput. Phys. Commun. 1995, 91 (1−3), 1−41. (b) Case, D. A.; Cheatham, T. E., III; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber biomolecular simulation programs. J. Comput. Chem. 2005, 26 (16), 1668−1688.
(3) Humphrey, W.; Dalke, A.; Schulten, K. VMD - Visual Molecular Dynamics. J. Mol. Graphics Modell. 1996, 14, 33−38.
(4) Feig, M.; Karanicolas, J.; Brooks, C. L., III. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J. Mol. Graphics Modell. 2001, 22 (5), 377−395.
(5) Michaud-Agrawal, N.; Denning, E. J.; Woolf, T. B.; Beckstein, O. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 2011, 32, 2319−2327.
(6) Yesylevskyy, S. O. Pteros: Fast and Easy to Use Open-Source C++ Library for Molecular Analysis. J. Comput. Chem. 2012, 33, 1632−1636.
(7) Romo, T. D.; Grossfield, A. LOOS: An extensible platform for the structural analysis of simulations. In 31st Annual International Conference of the IEEE EMBS; IEEE: New York, 2009; pp 2332−2335.
(8) Tu, T.; Rendleman, C. A.; Borhani, D. W.; Dror, R. O.; Gullingsrud, J.; Jensen, M. Ø.; Klepeis, J. L.; Maragakis, P.; Miller, P.; Stafford, K. A.; Shaw, D. E. A Scalable Parallel Framework for Analyzing Terascale Molecular Dynamics Simulation Trajectories. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC08), Austin, TX, November 15−21, 2008; IEEE: New York: 2008; pp 15−21.
(9) Brooks, B. R.; Brooks, C. L., III; Mackerell, A. D., Jr.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: the biomolecular simulation program. J. Comput. Chem. 2009, 30 (10), 1545−1614.
(10) Dagum, L.; Menon, R. OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng. 1998, 5 (1), 46−55.
(11) Weiser, J.; Shenkin, P. S.; Still, W. C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 1999, 20, 217−230.
(12) (a) Altona, C.; Sundaralingam, M. Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation. J. Am. Chem. Soc. 1972, 94 (23), 8205−8212. (b) Harvey, S. C.; Prabhakaran, M. Ribose puckering - structure, dynamics, energetics, and the pseudorotation cycle. J. Am. Chem. Soc. 1986, 108, 6128−6136.
(17) Prompers, J. J.; Bruschweiler, R. General framework for studying the dynamics of folded and nonfolded proteins by NMR relaxation spectroscopy and MD simulation. J. Am. Chem. Soc. 2002, 124 (16), 4522−4534.
(18) Prompers, J. J.; Bruschweiler, R. Dynamic and structural analysis of isotropically distributed molecular ensembles. Proteins 2002, 46 (2), 177−189.
(19) Anderson, E.; Bai, Z.; Bischof, C.; Blackford, J.; Demmel, J.; Dongarra, J.; Du Croz, J.; Greenbaum, A.; Hammarling, S.; McKenney, A.; Sorensen, D. C. LAPACK Users’ Guide, 3rd ed.; SIAM: Philadelphia, PA, 1999.
(20) Lehoucq, R. B.; Sorensen, D. C.; Yang, C. ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods; SIAM: Philadelphia, PA, 1998.
(21) Abseher, R.; Nilges, M. Are there non-trivial dynamic cross-correlations in proteins? J. Mol. Biol. 1998, 279 (4), 911−920.
(22) Shao, J.; Tanner, S. W.; Thompson, N.; Cheatham, T. E., III. Clustering molecular dynamics trajectories. 1. Characterizing the performance of different clustering algorithms. J. Chem. Theory Comput. 2007, 3, 2312−2334.
(23) (a) Connolly, M. L. Solvent-accessible surfaces of proteins and nucleic acids. Science 1983, 221 (4612), 709−713. (b) Connolly, M. Analytical molecular surface calculation. J. Appl. Crystallogr. 1983, 16, 548−558.
(24) Chou, J. J.; Case, D. A.; Bax, A. Insights into the mobility of methyl-bearing side chains in proteins. J. Am. Chem. Soc. 2003, 125, 8959−8966.
(25) Perez, C.; Lohr, F.; Ruterjans, H.; Schmidt, J. M. Self-consistent Karplus parametrization of 3J couplings depending on the polypeptide side-chain torsion chi1. J. Am. Chem. Soc. 2001, 123 (29), 7081−7093.
(26) (a) Lu, X. J.; Olson, W. K. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 2003, 31 (17), 5108−5121. (b) Babcock, M. S.; Pednault, E. P.; Olson, W. K. Nucleic acid structure analysis. Mathematics for local Cartesian and helical structure parameters that are truly comparable between structures. J. Mol. Biol. 1994, 237 (1), 125−156.
(27) Olson, W. K.; Bansal, M.; Burley, S. K.; Dickerson, R. E.; Gerstein, M.; Harvey, S. C.; Heinemann, U.; Lu, X. J.; Neidle, S.; Shakked, Z.; Sklenar, H.; Suzuki, M.; Tung, C. S.; Westhof, E.; Wolberger, C.; Berman, H. M. A standard reference frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 2001, 313 (1), 229−237.
(28) Wong, V.; Case, D. A. Evaluating rotational diffusion from protein MD simulations. J. Phys. Chem. B 2008, 112 (19), 6013−6024.
(29) Raymond, J. W.; Willett, P. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Comput.-Aided Mol. Des. 2002, 16 (7), 521−533.
(30) Brozell, S. R.; Mukherjee, S.; Balius, T. E.; Roe, D. R.; Case, D. A.; Rizzo, R. C. Evaluation of DOCK 6 as a pose generation and database enrichment tool. J. Comput.-Aided Mol. Des. 2012, 26 (6), 749−773.
(31) Bacon, D. F.; Graham, S. L.; Sharp, O. J. Compiler transformations for high-performance computing. ACM Comput. Surv. 1994, 26, 345−420.
(32) Geist, A.; Gropp, W.; Huss-Lederman, S.; Lumsdaine, A.; Lusk, E.; Saphir, W.; Skjellum, T.; Snir, M. MPI-2: Extending the message-passing interface. In Euro-Par’96 Parallel Processing Lecture Notes in Computer Science; Springer: New York, 1996; pp 128−135.
(13) Cremer, D.; Pople, J. A. A general definition of ring puckering
coordinates. J. Am. Chem. Soc. 1975, 97, 1354−1358.
(14) Kabsch, W. A discussion of the solution for the best rotation to
relate two sets of vectors. Acta Crystallogr., Sect. A 1978, 34, 827−828.
(15) Kabsch, W.; Sander, C. Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and geometrical
features. Biopolymers 1983, 22 (12), 2577−2637.
(16) Miller, B. R., III; McGee, T. D., Jr.; Swails, J. M.; Homeyer, N.;
Gohlke, H.; Roitberg, A. E. MMPBSA.py: An efficient program for
end-state free energy calculations. J. Chem. Theory Comput. 2012, 8,
3314−3321.