Thesis Ls5201 17ms204 Saswat Mohanty
Biological Sciences
by
INDIAN INSTITUTE OF SCIENCE
EDUCATION AND RESEARCH, KOLKATA
Certificate
I hereby certify that the matter embodied in the thesis entitled, “Investigating
Drug-Mediated Conformational Changes in KRas”, is the result of investi-
gations carried out by Saswat Kumar Mohanty at the Department of Biological
Sciences and Department of Chemical Sciences, Indian Institute of Science
Education and Research, Kolkata under my supervision and has not been sub-
mitted elsewhere for the award of any degree, diploma or other qualification.
Supervisor Signature
Declaration
Acknowledgement
First and foremost, I’d like to express my gratitude to my advisor, Dr. Susmita
Roy, for piquing my interest in computational biophysics and her invaluable
guidance, never-ending support, and encouragement throughout the research
process. Being her Master’s student has been a privilege. She introduced me
to international collaborations so early in my career, which has been a lifetime
experience. I have learned a lot from her over the last 2.5 years, including how
to approach a problem from different perspectives, turn a problem into a new
possibility, and, most importantly, look at failures positively. Therefore, I owe
her a debt of gratitude for being an outstanding mentor.
I’d also like to express my heartfelt gratitude to all of the faculty members at
IISER Kolkata, who have instilled in me the knowledge and enthusiasm to carry
out this research work.
I want to take this opportunity to thank my parents, Anagendra Nath Mohanty
and Binata Mohanty, as well as my sister, Swagatika Mohanty, for providing me
with moral support and motivation throughout my time at IISER Kolkata. I also
owe my deepest gratitude to my friends’ group “Crazzzy Confused” for their
constant support throughout this journey of ups and downs at IISER Kolkata.
I would also take the opportunity to thank my co-supervisor, Dr. Purba Mukher-
jee, who agreed to co-supervise my thesis. She has helped me with all the de-
partmental logistics and has extended her constant support whenever I have
needed it. I’d also like to thank Raju Sarkar, Satyam Sangeet,
Anushree Sinha, Avijit Mainan, and my other lab mates, who have been ex-
tremely helpful in providing me with their unwavering support and helping
me grow. I am also grateful to the DIRAC supercomputing facility at IISER
Kolkata and the supercomputing facility at IACS Kolkata, without which I
would not have been able to carry out my research work.
Abstract
Intrinsically Disordered Proteins (IDPs) are a significant part of the human
proteome, and their involvement in numerous diseases is well documented.
As IDPs have no single fixed structure, they represent an exception to the
structured-protein concept, known as Anfinsen’s dogma. Some otherwise
structured proteins also contain Intrinsically Disordered Regions (IDRs), which
can be fully or partially disordered and are rich in charged amino acid
residues. Despite being unstructured, these regions play critical roles in
cellular functioning.
KRas, a member of the Ras GTPase family, is one such protein; it contains a
number of IDRs known as switch regions. Mutations of wild-type KRas at the
G12 position cause loss of GTPase activity and confer oncogenic properties
that result in tumour cell growth and cancer progression. AMG-510 was one
of the first KRas G12C inhibitors shown to be efficacious against KRas G12C
tumours. However, the more recently FDA-approved drug MRTX-849 is more
efficacious than AMG-510 in tumour regression in KRas G12C mutant cell lines
of multiple tumour types, especially lung and colon cancers.
As acquired resistance to mutant-selective KRas G12C inhibitors such as
AMG-510 is a major concern in lung cancer, this thesis uses computational
studies to understand the drug-induced structural changes of KRas. We study
G12C-mutated KRas as well as its two drug-bound forms, with AMG-510 and
MRTX-849. The thesis contains four chapters. Chapter I introduces IDPs/IDRs,
their structural plasticity, and the experimental and computational methods
used to characterise their properties. Chapter II reviews the computational
methods that we used to run our molecular dynamics simulations and to
extract meaningful data from our trajectories. In Chapter III, through our
analysis of fluctuation, contact and correlation maps, we find that MRTX is
more potent than AMG in inhibiting the fluctuation of the IDR switches. MRTX
forms a large number of hydrogen bonds and a few hydrophobic interactions
with the Switch-II loop. In Chapter IV, our free energy analyses and comparison
of the drug-bound and unbound forms of KRas explore the possible
drug-induced conformational states. This exploration indicates that MRTX is
likely to restrict the GDP-GTP exchange in the functional cycle of KRas, which
is possibly one of the reasons for its high efficacy. This study also predicts
that switch-II inhibition can act as a potent target for any future drug to make
KRas-G12C inhibition operational and devoid of acquired drug resistance.
Contents
Acknowledgement
Abstract
List of Abbreviations
1 Introduction
1.1 Order-Disorder Transition: Limit of Anfinsen’s Dogma
1.1.1 A Continuum rather than Binary
1.1.2 Structural plasticity and implications
1.1.3 Interactions of IDPs and mechanism
1.2 Experimental characterization of IDPs
1.3 Computational aspects of IDPs
1.3.1 Molecular Dynamics Simulations
1.3.2 Energy Landscape Visualization Method (ELViM)
1.3.3 Parallel Tempering
1.4 IDP Examples
1.4.1 Prostate-Associated Gene 4 (PAGE4)
1.4.2 Ras-family proteins
2.1.3 The First Postulate: Time average is equal to Ensemble average
2.1.4 The Second Postulate: Equal A Priori Probability
2.1.5 The Ergodic Hypothesis
2.1.6 Canonical Partition Function
2.1.7 Relating Thermodynamics and Partition Function
2.2 Basic Concepts of Molecular Dynamics Simulation
2.2.1 General Features of Force Field
2.2.1.1 Bonded Potential
2.2.1.2 Angular Potential
2.2.1.3 Torsional Potential
2.2.1.4 Non-bonded Potential
2.2.2 Energy Minimisation
2.2.2.1 Steepest Descent
2.2.2.2 Conjugate Gradient
2.2.3 Basic Approach
2.2.4 Numerical Integration Methods
2.2.4.1 Verlet Algorithm
2.2.4.2 Leap-Frog Algorithm
2.2.4.3 Velocity-Verlet Algorithm
2.3 Temperature and Pressure Control
2.3.1 Temperature Coupling
2.3.2 Pressure Coupling
2.4 Tricks for Computational Efficiency
2.4.1 Periodic Boundary Conditions
2.4.2 Minimum Image Convention and Truncation of Intermolecular Interaction
2.4.3 Long Range Forces: Ewald Summation and Particle Mesh Ewald
2.4.4 Neighbour Lists and Cell Lists
2.4.5 Free Energy Calculations: Umbrella Sampling
2.4.6 Free Energy Calculations: Metadynamics
2.4.6.1 Standard Metadynamics
2.4.6.2 Well-tempered Metadynamics
2.5 Computational Methods for Analysis
2.5.1 Root-Mean Square Distance (RMSD) Analysis
2.5.2 Root-Mean Square Fluctuation (RMSF) Analysis
2.5.3 Theory of Correlation Analysis
2.5.3.1 Gaussian Network Model
2.5.3.2 Covariance Matrix
2.5.3.3 Correlation Matrix
3.2.1.2 Mutated protein in drug-bound state: Specific interaction with AMG-510 & MRTX-849
3.2.2 Hybrid protein specific force field: CHARMM36IDPSFF
3.2.3 Atomistic simulation methods
3.3 Results & Analysis: Comparison of conformational dynamics among the G12C variants, and the AMG and MRTX drug-bound forms
3.3.1 Finding fluctuating motifs from RMSF analysis
3.3.2 Quantifying & comparing fluctuation of different IDRs in KRas from RMSD
3.3.3 Fluctuation-fluctuation correlation at residual level between Switch-I & Switch-II
3.3.4 Structural investigation in the neighbourhood of switch regions–specifically focussing on α-2 & α-3
3.3.4.1 Temporal Helicity comparison
3.3.4.2 Dihedral Analysis
3.3.5 Exploration of drug-mediated interaction through contact map analysis
3.4 Conclusion
4.3.2 Comparison of conformational states of the oncogenic variant & the drug-bound forms of KRas
4.4 Conclusion
Future Aspects
References
List of Abbreviations
MD Molecular Dynamics
IDP Intrinsically Disordered Protein
IDR Intrinsically Disordered Region
PIN Protein Interaction Network
MORF MOlecular Recognition Feature
SLiM Short-LInear Motif
EM Electron Microscopy
SAXS Small-Angle X-ray Scattering
DLS Dynamic Light Scattering
AFM Atomic Force Microscopy
CD Circular Dichroism
smFRET single-molecule Förster Resonance Energy Transfer
2f-FCS Two-focus Fluorescence Correlation Spectroscopy
MS Mass Spectrometry
NMR Nuclear Magnetic Resonance
FTIR Fourier Transform Infrared Spectroscopy
PAGE4 Prostate-Associated GEne 4
PCa Prostate Cancer
ELViM Energy Landscape Visualization Method
NVE Number of particles, Volume, Energy
NVT Number of particles, Volume, Temperature
NPT Number of particles, Pressure, Temperature
GROMACS GROningen MAchine for Chemical Simulations
MIC Minimum Image Convention
TIMI Truncation of InterMolecular Interaction
PBC Periodic Boundary Condition
CV Collective Variable
MAPK Mitogen Activated protein Kinase
ERK Extracellular Signal-Regulated Kinase
DNA Deoxyribonucleic Acid
KRas Kirsten Rat sarcoma
SOS Son of Sevenless
GEF Guanine nucleotide Exchange Factor
GDP Guanosine DiPhosphate
GTP Guanosine TriPhosphate
GAP GTPase Activating Protein
P-loop Phosphate-binding loop
HVR Hyper-Variable Region
NSCLC Non-Small-Cell Lung Cancer
AMG AMGen
MRTX MiRati Therapeutics
FDA Food and Drug Administration
CHARMM36IDPSFF Chemistry at Harvard Macromolecular Mechanics-36
Intrinsically Disordered Protein Specific Force Field
CMAP Correction MAP
LINCS Linear Constraint Solver
GNM Gaussian Network Model
RMSD Root-Mean Square Distance
RMSF Root-Mean Square Fluctuation
List of Figures
2.16 Nodes in the GNM model connected with springs; adapted from Ref. [111]
3.1 Schematic of MAPK/ERK cellular signalling pathway; adapted from Ref. [112]
3.2 KRas function under physiological and mutated states; adapted from Ref. [114]
3.3 KRas Structure and schematic representation of helices and sheets; adapted from Ref. [115]
3.4 Pie chart showing mutational distribution for KRas malignancy in NSCLC
3.5 Mutated KRas at position 12 from Glycine to Cysteine
3.6 Mutated KRas covalently attached to AMG-510 at position 12
3.7 Mutated KRas covalently attached to MRTX-849 at position 12
3.8 RMSF plot of GDP bound G12C variant, AMG and MRTX drug-bound
3.9 RMSD plots of GDP bound G12C variant, AMG and MRTX drug-bound
3.10 Correlation plots of GDP bound G12C variant, AMG and MRTX drug-bound
3.11 α-2 helix melting histogram comparison
3.12 Frequency-dependent histograms of GDP bound G12C variant, AMG and MRTX drug-bound
3.13 Contact map of Switch-II loop region with drugs
3.14 Contact map of Switch-II’s α-2 region with drugs
3.15 Contact map of α-3 helix with drugs
4.1 Schematic diagram of KRas-GEF Interaction scheme showing Kick-Out; adapted from Ref. [80]
4.2 Binding and Unbinding mechanism of KRas-GEF interaction (PDB ID: 7KFZ)
4.3 Well-tempered Metadynamics plot of WT KRas along with its most stable state
4.4 Well-tempered Metadynamics plot of G12C-mutated KRas along with its most stable state
4.5 Well-tempered Metadynamics plot of AMG-bound mutated KRas along with its most stable states
4.6 Well-tempered Metadynamics plot of MRTX-bound mutated KRas along with its most stable states
1 Introduction
as binary states, whereas in IDPs it essentially dictates a continuum of states [9].
Because of the structural heterogeneity exhibited by the IDPs, they occupy the
key nodal positions in the Protein Interaction Networks (PINs) [17, 18]. PINs,
being the channel system inside cells, are essential for the coordinated
functioning of the cell. However, because IDPs can interact promiscuously,
overexpressed IDPs can rewire the PINs, adapting them to new environmental
perturbations [19].
Being major hubs in the PINs, IDPs perform a multitude of functions such as
signaling via cellular protein networks, splicing, embryonic differentiation and
development, and transcriptional regulation [6, 20]. The interactions exhibited
by IDPs combine high specificity with low affinity, which leads to rapid and
spontaneous dissociation and, hence, termination of the downstream signal,
allowing high levels of cellular control [21, 22]. This lets IDPs function as
sensitive rheostats and switches in the PIN regulatory circuits [22, 23].
Apart from these, IDPs are involved in major cellular events such as: regulation
of cell cycle, phenotypic plasticity, stress response, and circadian rhythm [24,
25, 26, 27, 28, 29, 30]. Moreover, most IDPs can form protein-based memo-
ries that drive the development and inheritance of biological characteristics in a
prion-like way [31]. IDPs are also involved in ensuring that other proteins fold
properly. Therefore, several chaperones, heat-shock proteins (Hsp22 and αβ-e)
and stress-response proteins are IDPs in nature [32].
Because of the vast repertoire of functions that IDPs execute, any dysregulation
of IDPs can lead to pathological states [33]. Hence, IDPs are seen to be
dysregulated in many diseases, such as cancer, neurodegenerative diseases,
genetic diseases, and diabetes [34, 35, 36].
Several IDPs are known to undergo a disorder-to-order transition upon binding.
Once they bind to their cognate partners, they undergo the “coupled folding
and binding” phenomenon [37]. Two mechanisms have been proposed for this
process: “induced fit” and “conformational selection”. The former states that
IDPs fold after associating with the target, while the latter envisages that all
potential conformations of the ensemble pre-exist, among which one is selected
by the ligand [38]. However, both can co-exist, suggesting that the binding
mechanism of an IDP is determined by its intrinsic secondary structure
propensities [39]. Hence, the disorder-to-order transition in IDPs is referred to
as “template folding”, where the partner binding to the IDP dictates the route
to the product formed, ensuring cooperative binding [40]. However, some
IDPs continue to stay disordered even when bound to their cognate partner.
Such interactions have been described as “fuzzy complexes” [41]. IDPs mainly
interact with their binding partners with the help of molecular recognition
features (MORFs) and short-linear motifs (SLiMs), in addition to low-complexity
sequences [42, 43]. The interactions of IDPs through SLiMs are predominantly
electrostatic in nature, mediated by either a highly positively charged patch or a
negatively charged one [44, 45]. However, some hydrophobic regions are also
found to interact [46].
Experimentally characterizing IDPs, especially the IDRs in large proteins and
complexes, remains a major challenge. Since well-known techniques such as
cryo-EM and X-ray crystallography provide only static images of proteins in the
frozen and crystallized states, respectively, they are not adequate to study the
vast structural heterogeneity of IDPs [47]. Therefore, the go-to techniques for
the characterization of IDPs are small-angle X-ray scattering (SAXS), dynamic
light scattering (DLS), atomic force microscopy (AFM), circular dichroism (CD),
single-molecule Förster resonance energy transfer (smFRET), fluorescence,
two-focus fluorescence correlation spectroscopy (2f-FCS), mass spectrometry
(MS), nuclear magnetic resonance (NMR), Fourier transform infrared
spectroscopy (FTIR) and Raman spectroscopy [48, 49, 50, 51, 52, 53, 54, 55].
However, all these experiments provide only limited resolution and limited
structural and dynamical information. Here, computational methods have come
to the rescue by elucidating the conformational ensemble in greater detail and
by validating the experimental results [56, 57]. Nevertheless, the visualisation
of the whole complex energy landscape still remains a computational challenge.
tive variables associated with the slowest motions[59]. These techniques have a
common limitation in that they require a priori definition of coordinates, which
can be computationally expensive.
Local minima can be addressed individually, and visualization of the distances
between local minima in a hierarchical representation is also an appealing way
to probe the energy landscape[60]. The methods described above are well suited
to studying funnel-like landscapes with well-defined energy basins. IDPs, on
the other hand, are far more difficult systems to study because of their high
disorder, shallow energy minima, and lack of reference structures.
IDPs are found in shallow, rugged free energy landscapes with multiple confor-
mational populations in dynamic equilibrium. As a result, using experimental
techniques to structurally resolve them at high resolution is difficult. Molecular
simulation has recently been used in conjunction with low-resolution ensemble-
averaged data to elucidate the structural and dynamical features of IDPs at
higher resolution[64, 65, 66]. Despite many advances, extracting an IDP’s ex-
perimentally consistent ensemble remains a difficult task. This is due in part to
the presence of multiple conformational states in an ensemble, which makes ex-
perimental data noisy, sparse, and/or ambiguous. Molecular simulations, on the
other hand, typically sample only a small portion of an IDP ensemble’s phase
space, despite the fact that the underlying free energy landscape is shallow. The
presence of significant entropic barriers between different population clusters is
an often overlooked aspect of IDP sampling and the main reason why sampled
ensembles fail to reproduce the ensemble and thermodynamic averages of
experiments.
Adequate sampling is required to determine experimentally consistent ensemble
data from simulation. In advanced sampling approaches, this is typically
accomplished either by applying structural restraints using collective variables
or by reweighting the obtained conformations to arrive at Boltzmann-weighted
populations[67, 68]. Parallel tempering (PT) sampling is appealing because it
can be used effectively without any reweighting or restraining, and it does not
require a low-dimensional collective variable (CV) to define the ensemble states.
Furthermore, in cases where sampling results do not match experimental data,
PT can be seamlessly combined with other CV-based restraining methods or
reweighted to solve the problems of interest[69, 70, 71]. Several variants of PT
have evolved in recent years, such as TREMD, REST/REST2, REHT and gREST.
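To make the exchange step of PT concrete, the following minimal Python sketch evaluates the standard Metropolis criterion for swapping the configurations of two replicas, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). The function names, the unit choice (kJ/mol) and the example energies are illustrative assumptions and are not taken from the simulations in this thesis.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); unit choice is an assumption


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    energy_i is the potential energy of the configuration currently held
    at temperature temp_i, and likewise for j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # delta >= 0 means the swap is always accepted; this also avoids overflow in exp
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the exchange is accepted for one random draw."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, whereas the reverse swap is accepted only with a Boltzmann-like probability, which is what allows configurations to diffuse across the temperature ladder.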
1.4 IDP Examples
Prostate Cancer (PCa) is a leading cause of mortality and morbidity around the
world. PAGE4, a protein that appears to act as both an oncogenic factor and
a metastasis suppressor, has been identified as a novel therapeutic target for
PCa. PAGE4 is a prostate-specific Cancer/Testis Antigen that is highly upreg-
ulated in the human foetal prostate and its diseased states, but not in the adult
normal gland[72]. The PAGE4 protein is predicted to be highly disordered by
bioinformatic algorithms[73]. PAGE4 is expected to have several regions (most
notably residues 13–19 and 86–92, and to a lesser extent 49–61) with a slightly
increased propensity to order, according to these analyses[74]. Nuclear magnetic
resonance (NMR) experiments have also indicated that PAGE4 has metastable
secondary structures.
PAGE4 acts as a stress-response protein by suppressing reactive oxygen species
and preventing DNA damage. The kinase HIPK1 can phosphorylate PAGE4 at
two residues (S9, T51); phosphorylation of PAGE4 allows it to interact with the
AP-1 transcription factor complex [75]. Another kinase, CLK2, can phosphory-
late PAGE4, and the two phosphorylated versions of PAGE4 (HIPK1-PAGE4
and CLK2-PAGE4) have opposing functions due to their different conforma-
tional dynamics[76, 72]. CLK2-PAGE4 has a reduced affinity for AP-1 due
to its random coil-like structure, whereas HIPK1-PAGE4 has a compact con-
formational ensemble that can bind AP-1 and potentiate c-Jun[72]. Because
c-Jun potentiation indirectly increases CLK2 levels via AR, a negative feed-
back loop is formed, resulting in oscillations in AR levels and those of different
phosphorylated versions of PAGE4. These oscillations can cause non-genetic
Figure 1.2: ELViM representation of different phosphorylated states of PAGE4; adapted from
Ref. [77]
Ras, a name derived from “Rat sarcoma”, is a group of proteins found in all
animal cell types and organs. Ras proteins are members of the small GTPase
protein family, which is involved in signal transmission within cells. When
incoming signals “switch on” Ras, it activates other proteins, which in turn
activate genes involved in cell growth, differentiation, and survival. Ras gene
mutations can result in the production of permanently activated Ras proteins, which
can cause unintended and overactive signalling within the cell even when no ex-
ternal signals are present. Overactive Ras signalling can lead to cancer because
these signals cause cell growth and division[78]. HRAS, KRAS, and NRAS
are the three most common oncogenes in human cancer; mutations that perma-
nently activate Ras are found in 20 to 25 percent of all human tumours, and up
to 90 percent in certain types of cancer[79]. As a result, Ras inhibitors are being
investigated as a potential treatment for cancer and other diseases characterised
by Ras over-expression.
Six beta strands and five alpha helices make up Ras[80]. It has two domains:
a G domain that binds guanosine nucleotides and a C-terminal membrane tar-
geting region (CAAX-COOH, also known as CAAX box) that is lipid-modified
by farnesyl transferase, RCE1, and ICMT. The G domain has five G motifs that
directly bind GDP/GTP. The P-loop, also known as the G1 motif, binds the beta
phosphate of GDP and GTP. The threonine35 in the G2 motif, also known as
Switch-I or SW1, binds the terminal phosphate (γ-phosphate) of GTP and the
divalent magnesium ion bound in the active site. The DXXGQ motif is found
in the G3 motif, also known as Switch-II or SW2. The D stands for aspartate57,
which is specific for guanine versus adenine binding, and the Q stands for glu-
tamine61, which activates a catalytic water molecule for GTP to GDP hydroly-
sis. The LVGNKXDL motif in the G4 motif provides specific interaction with
guanine. A SAK consensus sequence can be found in the G5 motif. The A is
alanine146, which provides guanine specificity rather than adenine specificity.
When GTP is hydrolyzed into GDP, the two switch motifs, G2 and G3, are
the main parts of the protein that move. The basic functionality of a molecular
switch protein is mediated by this conformational change of the two
switch motifs. The “on” state of Ras is the GTP-bound state, while the “off”
state is the GDP-bound state. Ras also binds a magnesium ion, which aids in
nucleotide binding coordination.
2 Concepts of Computational Methods and Techniques
2.1.2 Ensembles
The concept of the ensemble is based on the fact that an equilibrium system is
made up of a large number of microscopic states, also known as microstates.
When the system’s temperature is nonzero, the system’s natural motion takes it
through some of these microstates on a timescale comparable to the macrostate’s
measurement timescale. If we assume that the system has constant energy at all
times, the trajectory moves on a constant-energy surface. As a result, the
probability distribution of the thermodynamic and mechanical properties over
these microstates is required to calculate the system’s average equilibrium
properties. Instead of following the time evolution of these microstates to obtain
this distribution, we consider a mental picture of a large number of systems with
the same macroscopic properties such as the number of particles (N), pressure (P),
volume (V), and energy (E). Given the large number of microstates, it is highly
likely that the microstates inhabited by each system are distinct. Hence, an
ensemble is a mentally constructed collection of thermodynamically identical
systems. Ensembles can be classified based on the thermodynamic constraints.
Figure 2.2: Schematic representation of microcanonical ensemble
correspond to the same temperature and have the same number of particles
and volume. To achieve this ensemble, the system must be a closed sys-
tem that cannot exchange particles with the environment but can exchange
energy in order to maintain an equilibrium temperature. This is the most
commonly used ensemble in molecular dynamics simulation. As a result, we
will discuss it further in the second half of the chapter. The schematic is
shown in Figure 2.3.
Figure 2.4: Schematic representation of grandcanonical ensemble
The time average of a quantity (Ō) can be defined as the average of the quantity
over a long period of time. Mathematically,
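\[ \bar{O} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_{t_0}^{t_0+\tau} O(t)\, dt \tag{2.1} \]
\[ \langle O \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} O_i\, p_i \tag{2.2} \]
\[ \bar{O} = \langle O \rangle \tag{2.3} \]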
This postulate states that, given a very long time, a system has an equal
chance of being in any microstate corresponding to the system’s macrostate.
In other words, given enough time, the system visits each of the microstates
an equal number of times. However, because the criterion for a “long time” has
not been defined precisely, it is assumed that this time is much longer than any
relaxation time of the system, so that the system can be captured in all possible
states.
The Ergodic hypothesis states that if a system is given enough time, it is free
to explore all of the microstates associated with it, and the time spent in each
microstate is proportional to its volume in phase space. However, if a system
becomes trapped in a region of phase space, this hypothesis is broken, and the
ensemble average is no longer equal to the time average, violating the statistical
mechanics’ first postulate. As a result, the Ergodic hypothesis serves as a link
between the two statistical mechanics postulates. The second postulate is valid
because of the Ergodic hypothesis, and the ensemble average is done because
of the second postulate.
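The equality of time and ensemble averages for an ergodic system can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: the "system" is a symmetric random walk on a small ring of equally probable microstates, the observable values are arbitrary, and the random walk stands in for the true dynamics.

```python
import random

# Arbitrary observable value attached to each of five microstates (assumed values)
OBSERVABLE = [1.0, 4.0, 2.0, 8.0, 5.0]


def ensemble_average(values):
    """Average over microstates, each with equal a priori probability."""
    return sum(values) / len(values)


def time_average(values, n_steps, seed=0):
    """Average of the observable along one long random-walk trajectory."""
    rng = random.Random(seed)
    state = 0
    total = 0.0
    for _ in range(n_steps):
        # hop to the left or right neighbour on the ring with equal probability
        state = (state + rng.choice((-1, 1))) % len(values)
        total += values[state]
    return total / n_steps
```

For a long enough trajectory, `time_average(OBSERVABLE, n_steps)` approaches `ensemble_average(OBSERVABLE)`. If the walk were confined to a subset of the ring, mimicking a system trapped in a region of phase space, the two averages would in general differ.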
From here on, we refer to the macrocanonical ensemble as the canonical
ensemble. We now consider a system consisting of Np replicas of the original
NVT system; in the canonical ensemble, N, V, and T are kept constant. All of the
ensemble’s systems are placed next to each other so that they can exchange
heat through their heat-conducting walls, but not matter. This entire ensemble
is placed in a heat bath, and after reaching equilibrium, thermal insulation is
placed around the entire ensemble, forming an isolated super-system. This is
done to convert the canonical ensemble to a microcanonical ensemble, allowing
statistical thermodynamics postulates that are only true for the microcanonical
ensemble to be applied to the canonical ensemble as well. The microcanonical
super-ensemble of the system under consideration is made up of mental replicas
of such a super-system.
We now denote by $n_i$ the number of systems in the super-system that have
energy $E_i$, and by $E_t$ the total energy of the super-system. Hence,
\[ \sum_i n_i = N_p \quad \text{and} \quad \sum_i n_i E_i = E_t \tag{2.4} \]
\[ \Omega(n_i) = \frac{N_p!}{n_1!\, n_2!\, n_3! \cdots} \tag{2.5} \]
The two undetermined multipliers here are α and β. Now we differentiate,
using the maximum term method, to obtain the following expression:
\[ \ln\Big( \sum_j n_j \Big) - \ln n_i^{*} - \alpha - \beta E_i = 0 \tag{2.9} \]
\[ \frac{n_i^{*}}{N_p} = e^{-\alpha - \beta E_i}, \quad i = 1, 2, \ldots \tag{2.10} \]
Here $n_i^{*}$ is the most probable distribution. The sum over $n_i^{*}$ is equal to
the total number of systems in the ensemble, $N_p$, so that
\[ e^{\alpha} = \sum_i e^{-\beta E_i} \tag{2.11} \]
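Hence,
\[ \bar{P}_i = \frac{n_i^{*}}{N_p} = \frac{e^{-\beta E_i(N,V)}}{\sum_j e^{-\beta E_j(N,V)}} \tag{2.12} \]
\[ Z_N(V,T) = \sum_i e^{-\beta E_i(N,V)} \tag{2.13} \]
\[ \bar{E} = \sum_i P_i E_i = \frac{\sum_i E_i\, e^{-\beta E_i(N,V)}}{\sum_i e^{-\beta E_i(N,V)}} \tag{2.14} \]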
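Hence,
\[ \begin{aligned} d\bar{E} &= \sum_i \left( E_i\, dP_i + P_i\, dE_i \right) \\ &= -\frac{1}{\beta} \sum_i \left( \ln P_i + \ln Z \right) dP_i + \sum_i P_i \left( \frac{\partial E_i}{\partial V} \right)_{N} dV \end{aligned} \tag{2.15} \]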
We also know that,
\[ dS = -k_B\, d\Big( \sum_i P_i \ln P_i \Big) = -k_B \Big( \sum_i dP_i + \sum_i \ln P_i\, dP_i \Big) \tag{2.18} \]
Hence,
\[ \beta = \frac{1}{k_B T} \tag{2.21} \]
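Using the thermodynamic relationship between entropy, internal energy and
Helmholtz free energy, we get,
\[ S = \frac{\bar{E}}{T} + k_B \ln Z = \frac{\bar{E}}{T} - \frac{A}{T} \tag{2.22} \]
Hence,
\[ A = -k_B T \ln Z(N, V, T) \tag{2.23} \]
Hence,
\[ S = -\left( \frac{\partial A}{\partial T} \right)_{V,N} = k_B T \left( \frac{\partial \ln Z}{\partial T} \right)_{V,N} + k_B \ln Z \tag{2.25} \]
and
\[ p = -\left( \frac{\partial A}{\partial V} \right)_{T,N} = k_B T \left( \frac{\partial \ln Z}{\partial V} \right)_{T,N} \tag{2.26} \]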
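The relations between the partition function and the thermodynamic quantities can be verified numerically for a small set of discrete energy levels. The following Python sketch is a minimal illustration under stated assumptions: reduced units with k_B = 1 by default, a hypothetical three-level system, and a helper name (`canonical_properties`) that is not part of any library. It computes the Boltzmann probabilities, ln Z, the mean energy, the Helmholtz free energy A = −k_B T ln Z and the Gibbs entropy.

```python
import math


def canonical_properties(energies, temperature, k_b=1.0):
    """Return (probabilities, ln Z, mean energy, Helmholtz free energy, entropy)."""
    beta = 1.0 / (k_b * temperature)
    e_min = min(energies)
    # Shift by the lowest level so the exponentials cannot overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    probs = [w / z_shifted for w in weights]
    ln_z = math.log(z_shifted) - beta * e_min
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    helmholtz = -k_b * temperature * ln_z
    # Gibbs entropy, S = -k_B * sum_i P_i ln P_i
    entropy = -k_b * sum(p * math.log(p) for p in probs if p > 0.0)
    return probs, ln_z, mean_energy, helmholtz, entropy
```

For example, calling `canonical_properties([0.0, 1.0, 2.0], temperature=1.5)` returns an entropy that agrees with (Ē − A)/T to numerical precision, which is the content of the relation between S, Ē and A given above.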
Force fields are the most important part of molecular dynamics simulations.
A force field contains terms that allow the overall energy of the system to be
calculated at any time point. According to the Born-Oppenheimer
approximation, the electronic and nuclear motions of the system can be
decoupled, so force fields account only for the nuclear part of the system[81].
This greatly reduces the computational cost because the electronic motion does
not have to be treated explicitly[82, 83]. The drawback, however, is that a
force field cannot predict bond formation and breakage.
The terms in the force fields are simple and come from molecular motions such
as bond stretching and angle bending, described by Hooke’s law[84, 85, 86].
Most force fields describe the motion with four components: the first two arise
from bonds and angles, the third from bond rotations, and the last from the
non-bonded interactions. Therefore, the general form looks like this:
\[ V(r^N) = \sum_{\text{bond}} V_{ij} + \sum_{\text{angle}} V_{ijk} + \sum_{\text{dihedral}} V_{ijkl} + \sum_{\text{nb}} V_{ij} \tag{2.27} \]
Here, $r$ signifies that $V$ is a function of the particle coordinates and $N$ is the
number of particles in the system.
2.2.1.1 Bonded Potential
Figure 2.5: Atoms i and j connected by spring with force constant kij ; adapted from Ref. [87]
The bonded potential corresponds to the energy term contributed by the covalent
bonds present in the system and is derived from Hooke’s law. The expression
is as follows:
\[ \sum_{\text{bond}} V_{ij} = \sum_{\text{bond}} \frac{1}{2} K_{ij} \left( r_{ij} - r_{ij}^{eq} \right)^2 \tag{2.28} \]
Here, $K_{ij}$ refers to the force constant of the covalent bond, $r_{ij}$ is the
instantaneous bond length, and $r_{ij}^{eq}$ is the equilibrium bond length between
atoms i and j.
Figure 2.6: Atoms i, j and k making an angle θijk ; adapted from Ref. [87]
The angular potential term corresponds to the energy term contributed by the
vibrational angular motion present in the system corresponding to atoms i, j,
and k. The expression is as follows:
\[ \sum_{\text{angle}} V_{ijk} = \sum_{\text{angle}} \frac{1}{2} \left[ K_{ijk} \left( \theta_{ijk} - \theta_{ijk}^{eq} \right)^2 + K_{UB} \left( r_{ik} - r_{ik}^{eq} \right)^2 \right] \tag{2.29} \]
Here $K_{ijk}$ refers to the angle constant and $K_{UB}$ is the Urey-Bradley constant
used to describe a non-covalent spring between the ith and kth atoms. $\theta_{ijk}$ is
the instantaneous angle and $\theta_{ijk}^{eq}$ is the equilibrium angle between i, j,
and k. $r_{ik}$ refers to the instantaneous distance between atoms i and k, and
$r_{ik}^{eq}$ is the respective equilibrium term.
The torsional potential term corresponds to the energy term contributed by the
dihedral angular spring present between the two planes made by the first three
atoms and by the last three atoms. There are two types of dihedral terms:
• Proper Dihedral: The proper dihedral consists of four atoms which are
joined consecutively in a chain fashion.
Figure 2.7: Atoms i, j, k and l making a proper dihedral θijkl ; adapted from Ref. [87]
The potential can be expressed as follows, taking the first few terms of
the Fourier series:
\[ \sum_{\text{dihedral}} V_{ijkl} = \sum_{\text{dihedral}} \frac{1}{2} K_{ijkl} \left( 1 + \cos(n\phi_{ijkl} - \phi_0) \right) \tag{2.30} \]
Here $K_{ijkl}$ is the torsional force constant, $\phi_{ijkl}$ is the instantaneous
dihedral angle and $\phi_0$ is the minimum-potential angle. n is the multiplicity,
which gives the number of minima present in a complete 360° rotation
of the dihedral.
Figure 2.8: Atoms i, j, k and l making an improper dihedral θijkl ; adapted from Ref. [87]
Here Kijkl is the torsional angle constant, the ϕijkl is the instantaneous
dihedral angle and ϕ0 is the reference dihedral angle.
2.2.1.4 Non-bonded Potential
Figure 2.9: Charged atoms i and j separated by distance rij ; adapted from Ref. [87]
\[ \sum_{\text{elec}} V_{ij} = \sum_{\text{elec}} \frac{1}{4\pi\epsilon_0} \frac{q_i q_j}{r_{ij}} \tag{2.32} \]
Here, qi and qj are the charges of species i and j, respectively. The rij
corresponds to the distance between the two species.
Figure 2.10: Atoms i and j separated by distance rij ; adapted from Ref. [87]
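The expression is as follows:
\[ \sum_{\text{LJ}} V_{ij} = \sum_{\text{LJ}} 4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] \tag{2.33} \]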
Here ϵij is the depth of the potential well. The rij corresponds to the dis-
tance between the two species and σij refers to the distance at which the
species-species potential energy becomes zero.
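The individual force-field terms described in this section can be sketched as small functions. This is an illustrative sketch only, not the implementation used by any MD package: the function names are hypothetical, GROMACS-style units (kJ/mol, nm, elementary charges) are assumed, and the Lennard-Jones term uses the standard 12-6 form in which σ is the zero-crossing distance and ε the well depth, as described in the text.

```python
import math

# Assumed electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
COULOMB_CONST = 138.935458


def bond_energy(k_ij, r_ij, r_eq):
    """Harmonic bond stretching."""
    return 0.5 * k_ij * (r_ij - r_eq) ** 2


def angle_energy(k_ijk, theta, theta_eq, k_ub=0.0, r_ik=0.0, r_ik_eq=0.0):
    """Harmonic angle bending with an optional Urey-Bradley 1-3 term."""
    return 0.5 * (k_ijk * (theta - theta_eq) ** 2 + k_ub * (r_ik - r_ik_eq) ** 2)


def dihedral_energy(k_ijkl, n, phi, phi0):
    """Periodic proper dihedral with multiplicity n (angles in radians)."""
    return 0.5 * k_ijkl * (1.0 + math.cos(n * phi - phi0))


def coulomb_energy(q_i, q_j, r_ij):
    """Electrostatic pair energy between two point charges."""
    return COULOMB_CONST * q_i * q_j / r_ij


def lj_energy(eps_ij, sigma_ij, r_ij):
    """12-6 Lennard-Jones pair energy; zero at r = sigma, minimum of -eps."""
    sr6 = (sigma_ij / r_ij) ** 6
    return 4.0 * eps_ij * (sr6 ** 2 - sr6)


# The LJ energy vanishes at r = sigma and reaches -eps at r = 2^(1/6) sigma
print(lj_energy(1.0, 0.3, 0.3), lj_energy(1.0, 0.3, 0.3 * 2 ** (1 / 6)))
```

Summing such terms over all bonds, angles, dihedrals and non-bonded pairs gives the total potential energy of a configuration.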
Before any simulation is started, the system must be in a minimum-energy
conformation. However, the structures reported in the databases are not in an
energy-minimised state, and the randomly added solvent molecules can also
create strong steric clashes among themselves or with the solute. Therefore,
energy minimisation is performed to reach a local minimum. Under this
procedure, the gradient of the potential is equated to zero and the second
derivative is required to be positive, as follows:
\[ \frac{\partial V}{\partial R_i} = 0 \quad \text{and} \quad \frac{\partial^2 V}{\partial R_i^2} > 0 \tag{2.34} \]
Here V is the potential associated with the system and $R_i$ are the
three-dimensional coordinates of the atoms in the system.
Most of the minimisation algorithms tend to locate the minima closest to the
initial configuration. The most used ones are: Steepest Descent and Conjugate
Gradient[81].
This algorithm moves the system one step at a time in the direction opposite
to the gradient of the potential energy, which is a function of the coordinates,
as follows:
\[ r_{n+1} = r_n - \alpha_n \nabla V(r_n) \tag{2.35} \]
where $\alpha_n$ is the step size and $\nabla V(r_n)$ is the gradient of the potential
energy. However, GROMACS implements this in a slightly conditional way:
\[ r_{n+1} = r_n - \alpha_n \frac{\nabla V(r_n)}{\max[\nabla V(r_n)]} \tag{2.36} \]
Here $\max[\nabla V(r_n)]$ is the largest scalar force on any atom in the system. Then,
• If Vn+1 < Vn , the new position is accepted and αn+1 is rescaled to 1.2αn
• If Vn+1 > Vn , the new position is rejected and αn+1 is rescaled to 0.5αn
The algorithm stops either after a user-specified number of iterations or once
a user-specified accuracy criterion is met.
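The adaptive scheme described above can be sketched in a few lines. This is a minimal illustration and not GROMACS code: the function name, the default step size, and the growth and shrink factors (exposed as parameters) are assumptions, and the largest absolute gradient component stands in for the largest force on any atom.

```python
def steepest_descent(grad, energy, r0, step=0.01, max_steps=1000,
                     f_tol=1e-6, grow=1.2, shrink=0.5):
    """Adaptive steepest descent on a list of coordinates.

    grad and energy are callables taking the coordinate list.
    """
    r = list(r0)
    v = energy(r)
    for _ in range(max_steps):
        g = grad(r)
        g_max = max(abs(c) for c in g)      # largest gradient component
        if g_max < f_tol:                   # accuracy criterion reached
            break
        # Normalised step opposite to the gradient
        trial = [ri - step * gi / g_max for ri, gi in zip(r, g)]
        v_trial = energy(trial)
        if v_trial < v:                     # accept and enlarge the step
            r, v = trial, v_trial
            step *= grow
        else:                               # reject and shrink the step
            step *= shrink
    return r, v


# Usage on a hypothetical quadratic surface with its minimum at (1, -2)
energy = lambda r: (r[0] - 1.0) ** 2 + (r[1] + 2.0) ** 2
grad = lambda r: [2.0 * (r[0] - 1.0), 2.0 * (r[1] + 2.0)]
r_min, v_min = steepest_descent(grad, energy, [0.0, 0.0])
print(r_min, v_min)
```

Because rejected moves never change the coordinates, the energy decreases monotonically, which is why this method is robust far from the minimum.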
2.2.2.2 Conjugate Gradient
This method chooses successive search directions so as to avoid repeated
minimization along the same direction[81]. A range bracketing the minimum
is first determined along each direction, and the minimum is then located using
either a golden section search or a quadratically convergent method. The
gradient history is considered when determining the best next step direction.
In the early stages of the minimization, this approach is slower than steepest de-
scent, but it becomes more efficient as you get closer to the energy minimum.
The stop criterion and parameters used in conjugate gradient are the same as
those used in steepest descent in GROMACS[89].
We now have the interaction potential terms for the atoms and the
corresponding energy-minimised structure. Hence, we can use Newton's
equations to calculate the positions and velocities of the atoms at different
instants. First, the force is obtained from the potential as follows:
\[ F_i = -\frac{\partial V}{\partial r_i} \tag{2.37} \]
The particle positions then follow from the momentum equation:
\[ F_i = \frac{dp_i}{dt} = m \frac{dv_i}{dt} = m \frac{d^2 r_i}{dt^2} \tag{2.38} \]
The positions and velocities can then be found by integrating Equation 2.38.
2.2.4 Numerical Integration Methods
As described above, the positions and velocities of the particles can be
calculated using Newton’s equations of motion. However, the initial positions
and velocities need to be specified. The positions are given by the
energy-minimised structure and the velocities are randomly assigned from a
Maxwell-Boltzmann distribution, as follows:
\[ P(v_i) = \sqrt{\frac{m_i}{2\pi k_B T}} \exp\left( -\frac{m_i v_i^2}{2 k_B T} \right) \tag{2.39} \]
Here $P(v_i)$ is the probability that particle i with mass $m_i$ has the velocity $v_i$
at temperature T.
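Drawing initial velocities from this distribution amounts to sampling each Cartesian component from a Gaussian with variance k_B T / m_i. The sketch below assumes GROMACS-style units (masses in g/mol, velocities in nm/ps, k_B in kJ mol⁻¹ K⁻¹); the function name and the carbon-like test masses are hypothetical.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def sample_velocities(masses, temperature, rng):
    """Draw one 3-D velocity per particle from the Maxwell-Boltzmann distribution."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)   # per-component standard deviation
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities


# Usage: 5000 hypothetical carbon-like particles at 300 K
rng = random.Random(0)
masses = [12.011] * 5000
velocities = sample_velocities(masses, 300.0, rng)
print(velocities[0])
```

In practice the centre-of-mass velocity is usually removed after sampling; that step is omitted here for brevity.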
Once the initial conditions are set, an integration algorithm such as (i) the
Verlet algorithm[90], (ii) the leap-frog algorithm[91], or (iii) the velocity
Verlet algorithm[92] is used to perform the integration numerically and obtain
the future coordinates and velocities.
Using the Verlet algorithm[90], the velocity at t and position at time t+δt can be
calculated using positions from time t and t − δt. This is an iterative algorithm.
The position at time t + δt and t − δt can be expressed as:
\[ r(t + \delta t) = r(t) + v(t)\,\delta t + \frac{1}{2} a(t) (\delta t)^2 + \ldots \tag{2.40} \]
\[ r(t - \delta t) = r(t) - v(t)\,\delta t + \frac{1}{2} a(t) (\delta t)^2 + \ldots \tag{2.41} \]
Adding Equations 2.40 and 2.41 yields:
\[ r(t + \delta t) = 2 r(t) - r(t - \delta t) + a(t) (\delta t)^2 + \ldots \]
And the velocities can be calculated as follows:
Therefore, this algorithm calculates the half-integer step velocities which are
used to calculate the coordinates of the next steps.
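\[ r(t + \delta t) = r(t) + v(t)\,\delta t + \frac{1}{2} a(t) (\delta t)^2 \tag{2.47} \]
\[ v(t + \delta t) = v(t) + \frac{1}{2} \left[ a(t) + a(t + \delta t) \right] \delta t \tag{2.48} \]
The velocity $v(t + \delta t)$ is determined by substituting:
\[ v\left( t + \frac{1}{2}\delta t \right) = v(t) + \frac{1}{2} a(t)\,\delta t \tag{2.49} \]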
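The velocity Verlet update can be written as a half-step velocity kick, a position drift, and a second half-step kick with the new acceleration. The sketch below is a one-dimensional illustration under assumed reduced units; the function name and the harmonic test system are hypothetical.

```python
def velocity_verlet(r, v, accel, dt, n_steps):
    """Integrate one degree of freedom; accel(r) returns the acceleration."""
    a = accel(r)
    for _ in range(n_steps):
        v_half = v + 0.5 * a * dt      # half-step velocity
        r = r + v_half * dt            # position update using the half-step velocity
        a = accel(r)                   # acceleration at the new position
        v = v_half + 0.5 * a * dt      # complete the velocity update
    return r, v


# Usage: harmonic oscillator with unit mass and unit force constant
r_end, v_end = velocity_verlet(1.0, 0.0, lambda x: -x, 0.01, 1000)
total_energy = 0.5 * (r_end ** 2 + v_end ** 2)
print(r_end, v_end, total_energy)
```

For this test system the total energy stays very close to its initial value of 0.5, which illustrates the good long-time energy behaviour of this integrator.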
During the molecular dynamics simulation, the velocities ($v_i$) of all atoms in
the system are rescaled according to the equipartition theorem so as to
maintain the temperature of the system. Hence, the temperature at each step of
the trajectory is calculated and the velocities are rescaled to bring the
instantaneous temperature (T) to the required temperature ($T_{req}$).
\[ \frac{3}{2} k_B T = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} m_i v_i^2 \tag{2.51} \]
\[ v_i \rightarrow v_i \sqrt{\frac{T_{req}}{T}} \tag{2.52} \]
Here $v_i$ is the velocity of the ith particle having mass $m_i$ in a system of N atoms.
These equations describe an isokinetic thermostat.
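The rescaling step can be illustrated as follows. This is a minimal sketch with hypothetical function names, assuming GROMACS-style units and ignoring any removal of constrained or centre-of-mass degrees of freedom.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from (3/2) k_B T = (1/N) sum_i (1/2) m_i v_i^2."""
    kinetic = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)


def rescale_velocities(masses, velocities, t_req):
    """Scale all velocities by sqrt(T_req / T) so that the temperature equals t_req."""
    scale = math.sqrt(t_req / kinetic_temperature(masses, velocities))
    return [[scale * c for c in v] for v in velocities]


# Usage with two hypothetical particles
masses = [1.008, 15.999]
velocities = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.2]]
rescaled = rescale_velocities(masses, velocities, 300.0)
print(kinetic_temperature(masses, rescaled))
```

After rescaling, the instantaneous temperature matches the target exactly at every step, which is the defining property of this scheme.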
However, there is an intrinsic problem with the isokinetic thermostat: it
maintains a strictly constant temperature, which is highly unrealistic for a
biomolecular system. Rather, biomolecular systems experience a range of
temperatures whose average equals the required temperature.
Therefore, small fluctuations in the kinetic energy, which depend on the
temperature, are allowed. As the number of particles (N) increases, these
fluctuations reduce. However, since simulations cannot handle a very large
number of particles due to computational cost, appropriate thermostats are
required that allow thermal fluctuations while maintaining the required average
temperature. Various thermostats have therefore been developed, such as the
Andersen[93], Berendsen[94], and Nosé-Hoover[95, 96] thermostats. These
thermostats generate thermodynamic ensembles in which the average temperature
is maintained throughout the trajectory.
For example, in the Nose-Hoover thermostat the Boltzmann distribution is re-
tained along with an extended ensemble approach. The system is strongly cou-
pled with the required temperature (Treq ), giving the Hamiltonian extra degrees
of freedom.
and $\vec{r}_i$ is its coordinate. In pressure coupling, the pressure is kept
constant by rescaling the box vectors.
Similar to the thermostat described above, the Parrinello-Rahman[95, 97]
barostat also uses an extended ensemble approach, generating an
isothermal-isobaric (NPT) ensemble in combination with the Nosé-Hoover
thermostat. Under this scheme, Newton’s equations are modified to also
incorporate the pressure, volume and temperature. In addition, extra terms
account for the strength of the coupling of the thermostat and barostat to the
system.
Figure 2.12: Periodic Boundary Conditions; adapted from Ref. [98]
Using this concept, the periodic images are arranged in all possible directions
in a 3-D lattice. The coordinates of the image particles are calculated by adding
integral multiples of the box edge lengths to the coordinates of the real
particles. Thus, if a real particle leaves the box during the simulation, an
image particle enters the box from the opposite side to mimic the real system.
For the calculation of particle interactions within the cutoff range, both real
and image neighbours are included.
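Wrapping coordinates back into the box and measuring distances to the nearest periodic image can be sketched as below. The sketch assumes a rectangular box for simplicity (the simulations in this work use a dodecahedral box, which requires a triclinic treatment), and the function names are hypothetical.

```python
def wrap(position, box):
    """Map a position back into the primary box [0, L) along each axis."""
    return [x % length for x, length in zip(position, box)]


def minimum_image_distance(r_i, r_j, box):
    """Distance between particle i and the nearest periodic image of particle j."""
    d2 = 0.0
    for xi, xj, length in zip(r_i, r_j, box):
        dx = xj - xi
        dx -= length * round(dx / length)   # shift by whole box lengths
        d2 += dx * dx
    return d2 ** 0.5


# Usage in a hypothetical cubic box of edge 5 nm
box = [5.0, 5.0, 5.0]
print(wrap([5.2, -0.3, 1.0], box))
print(minimum_image_distance([0.1, 0.0, 0.0], [4.9, 0.0, 0.0], box))
```

Two particles near opposite faces of the box are therefore treated as close neighbours, as required by the periodic lattice of images.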
On the other hand, Truncation of Intermolecular Interaction (TIMI) takes into
account that two atoms separated by a large distance interact negligibly or not
at all. This is true for short-range interaction potentials where
$V(r) \propto 1/r^n$ and n > 3. Therefore, not all particle pairs need to be
considered, which greatly reduces the computing cost.
So, for a system of N particles, there are a total of $^{N}C_2 = N(N-1)/2$
interacting pairs. Therefore, if N >> 1, then (N − 1)N ≈ N^2, which implies that
the computational cost scales as N^2. MIC and TIMI therefore come to the rescue
and save a large amount of computational power by defining a spherical cutoff
range.
2.4.3 Long Range Forces: Ewald Summation and Particle Mesh Ewald
Methods described above, such as TIMI, take into account only short-ranged
interactions and neglect the Coulombic and ion-dipole interactions. For such
long-ranged interactions, implementing a cutoff radius results in a very large
error in the calculated interaction forces. To overcome this, the Ewald
summation[99] and Particle Mesh Ewald[100, 101] methods were introduced.
For the long-range forces, all the periodic images of the box are used to cal-
culate the electrostatic potential. Hence, the total electrostatic potential on an
atom ‘i’ is derived as the infinite pair potential sum of all the charged particles
in the box and their respective images. The sum that is derived is divided into
two parts: long-ranged and short-ranged. The short-ranged part is calculated
using the cutoff scheme, whereas the long-ranged part is usually taken care of
by the Ewald summation methods[99] by decomposition.
In addition to this, under the Particle Mesh Ewald[100, 101], each atom is repre-
sented on a mesh grid and the potential function is represented as the interaction
between the mesh points. Each mesh point is a separate particle that interacts
with all other mesh particles in a convolution. The potential function in the
mesh space is then evaluated via a Fast Fourier transformation. The mesh size
and interpolation strategy determine the accuracy and computational efficiency.
However, a naive implementation of the cutoff distance drastically decreases
the efficiency of the interaction calculation, because all atom pairs must be
checked to determine which atoms fall inside the cutoff radius before the final
interaction energy can be calculated.
Therefore, the neighbour list scheme[90, 91, 102] is implemented to increase
the computational efficiency. A list of nearby atoms to be included in the non-
bonded interaction calculation is stored in an array and updated periodically
in this method. The distance used to calculate each atom’s neighbouring list
must be greater than the non-bonded cutoff distance, so that no atom outside
the neighbour cutoff gets closer than the non-bonded cutoff distance before the
neighbour list is updated. A correction term can be added to the energy estimate
at each step of updating the neighbour list.
Preparing and updating such a neighbour list requires of the order of N^2
distance calculations for an N-particle system. To reduce the number of
calculations or neighbour searches, the entire simulation space can be divided
into cells, with the search limited to particles that are present within nearby
cells[102, 103, 104].
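A neighbour list with a buffer ("skin") beyond the non-bonded cutoff can be sketched as follows. This is a deliberately simple O(N²) build without periodic images or cell decomposition; the function name and the skin parameter are assumptions made for illustration.

```python
def build_neighbour_list(positions, r_cut, skin):
    """Return, for each particle i, the indices j > i within r_cut + skin."""
    r_list2 = (r_cut + skin) ** 2            # squared list radius (cutoff plus buffer)
    neighbours = [[] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 <= r_list2:
                neighbours[i].append(j)      # store each pair once
    return neighbours


# Usage: four hypothetical particles on a line (coordinates in nm)
positions = [[0.0, 0.0, 0.0], [0.9, 0.0, 0.0], [1.15, 0.0, 0.0], [3.0, 0.0, 0.0]]
neighbours = build_neighbour_list(positions, r_cut=1.0, skin=0.2)
print(neighbours)
```

The third particle lies outside the 1.0 nm cutoff of the first but inside the list radius, so it is already tracked if it drifts within the cutoff before the next list update.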
Under steered molecular dynamics, there can be instances when the
biomolecule of interest gets trapped in a local free energy minimum due to
the presence of a high energy barrier, violating the ergodic hypothesis. Under
these conditions the umbrella sampling technique, which is a type of enhanced
sampling method, is used. Umbrella sampling was developed by Torrie and
Valleau in 1977[105]. In this method, a bias potential is used to increase the
probability that the molecule visits the unexplored minimum on the other side
of the high energy barrier. After sampling the whole phase space, the effect of
this bias potential is removed.
Figure 2.13: High Activation Energy Barrier separating State I and State II; adapted from Ref.
[106]
As shown in Figure 2.13, states I and II are separated by a very high
energy barrier. Therefore, a biomolecule trapped in state I has a very low
chance of visiting state II within a realistic time frame, and vice versa.
Therefore, a bias potential is applied, as follows:
\[ W(\chi) = K (\chi - \chi_0)^2 \tag{2.54} \]
Here K is the spring constant of the bias potential and χ is the reaction
coordinate of the system.
Therefore, the effective potential is:
We must now recover the probability distribution for the unbiased trajectories
after sampling the barrier top. This is done using a simple mathematical trick
as written below.
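\[
\begin{aligned}
\langle O \rangle_{V_0} &= \frac{\int O(\chi) \exp(-\beta V_0(\chi))\, d\chi}{\int \exp(-\beta V_0(\chi))\, d\chi} \\
&= \frac{\int O(\chi) \exp(-\beta (V(\chi) - W))\, d\chi}{\int \exp(-\beta (V(\chi) - W))\, d\chi} \\
&= \frac{\int O(\chi) \exp(-\beta V(\chi)) \exp(\beta W)\, d\chi}{\int \exp(-\beta V(\chi)) \exp(\beta W)\, d\chi} \\
&= \frac{\langle O(\chi) \exp(\beta W) \rangle_V}{\langle \exp(\beta W) \rangle_V}
\end{aligned}
\tag{2.56}
\]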
Using the above mathematical equations, we can adjust the bias potential (W)
to achieve sufficient sampling in the desired region of the phase space, and then
remove the bias potential to recover the unbiased ensemble average. Because
we are using a harmonic potential that resembles an umbrella to constrain the
system to a specific region of phase space in this case, this method is known as
the umbrella sampling method [84, 102, 106].
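The reweighting identity can be verified numerically on a one-dimensional model. The sketch below replaces sampling by quadrature on a grid, so it only illustrates the algebra and not a real umbrella sampling run; the double-well potential, the bias parameters and the function names are all hypothetical.

```python
import math


def boltzmann_average(observable, potential, grid, beta):
    """Ensemble average of an observable under a potential, by quadrature."""
    weights = [math.exp(-beta * potential(x)) for x in grid]
    return sum(observable(x) * w for x, w in zip(grid, weights)) / sum(weights)


def reweighted_average(observable, biased_potential, bias, grid, beta):
    """Unbiased average recovered from the biased ensemble: <O e^{bW}>_V / <e^{bW}>_V."""
    weights = [math.exp(-beta * biased_potential(x)) for x in grid]
    num = sum(observable(x) * math.exp(beta * bias(x)) * w for x, w in zip(grid, weights))
    den = sum(math.exp(beta * bias(x)) * w for x, w in zip(grid, weights))
    return num / den


beta = 1.0
v0 = lambda x: (x * x - 1.0) ** 2 + 0.3 * x   # hypothetical asymmetric double well
w = lambda x: 2.0 * (x - 0.5) ** 2            # harmonic umbrella bias centred at 0.5
v = lambda x: v0(x) + w(x)                    # biased potential
grid = [-3.0 + 0.01 * k for k in range(601)]

direct = boltzmann_average(lambda x: x, v0, grid, beta)
recovered = reweighted_average(lambda x: x, v, w, grid, beta)
print(direct, recovered)
```

Both numbers agree to numerical precision, showing that the bias can be removed exactly once the biased distribution is known.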
However, umbrella sampling is computationally expensive. This is because the
method involves creating multiple windows, which should also overlap
significantly to minimise errors, as the errors from the individual windows add
quadratically.
Figure 2.14: Reaction coordinates between two states divided into distinct windows;
adapted from Ref. [106]
However, the data from the windows is difficult to analyse directly. Therefore,
a method called the Weighted Histogram Analysis Method (WHAM) [84, 106] is
used to combine the simulations with different biasing potentials and generate
the combined Potential of Mean Force (PMF). The PMF provides important insights,
as it gives the free energy barrier separating different states. Hence,
information about the relative stability of the different states can be
inferred. The method can also be extended to multiple reaction coordinates.
Metadynamics is another enhanced sampling method in which rare events beyond
high energy barriers can be explored, so that the sampling becomes ergodic, and
hence the free energy of the system can be estimated[107]. The process is
well known as “filling the free energy with computational sand”. Under this
algorithm, the assumption is made that the free energy of the system can be
described by collective variables (CVs)[107]. During a metadynamics simulation,
more and more Gaussian hills are added as time progresses, discouraging the
system from returning to already visited regions, until it has explored the
complete energy landscape and starts performing a random walk.
Let the Hamiltonian of the system with the bias potential, $V_{bias}$, be
$H = K + V + V_{bias}$, where $V_{bias}$ is a function of the CVs. The bias potential
is updated at the bias rate ω, and $s_t$ is the instantaneous value of the
collective variable at time t. Thus,
\[ \frac{\partial V_{bias}(s)}{\partial t} = \omega\, \delta(|s - s_t|) \tag{2.57} \]
which implies,
\[ V_{bias} = \int_0^{t_{sim}} \omega\, \delta(|s - s_t|)\, dt \tag{2.58} \]
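For computer simulations, the time t is discretized into intervals of length τ
and the δ function is replaced by a multidimensional positive Gaussian kernel
function. The equation then becomes:
\[
\begin{aligned}
V_{bias} &\approx \tau \sum_{j=0}^{t_{sim}/\tau} \omega K(|s - s_j|) \\
&\approx \tau \sum_{j=0}^{t_{sim}/\tau} \omega \exp\left( -\frac{1}{2} \left( \frac{s - s_j}{\sigma} \right)^2 \right)
\end{aligned}
\tag{2.59}
\]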
2.4.6.1 Standard Metadynamics
As per the assumption of metadynamics, in the long-time limit, the bias potential
converges to minus the free energy as a function of the CVs. Hence,
• The bias potential overfills the underlying Free Energy Surface and pushes
the system toward high-energy regions of the CVs space, which makes it
non-trivial to decide when to stop a simulation.
2.4.6.2 Well-tempered Metadynamics
Therefore, to address the first limitation of the standard metadynamics, the con-
cept of well-tempered metadynamics comes into the picture[109].
Here, the bias deposition decreases with time. The new W therefore be-
comes,
\[ W(\omega\tau) = W_0 \exp\left( -\frac{V_{bias}(s(q(\omega\tau)), \omega\tau)}{k_B \Delta T} \right) \tag{2.63} \]
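The bias potential then becomes
\[ V_{bias} = \sum W_0\, e^{-\frac{V_{bias}(s(q(\omega\tau)), \omega\tau)}{k_B \Delta T}} \exp\left( -\sum_{i=1}^{d} \frac{1}{2} \left( \frac{s_i - s_i(q(\omega\tau))}{\sigma_i} \right)^2 \right) \tag{2.64} \]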
∆T → ∞ ⇒ Standard Metadynamics
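This factor is defined as the bias factor in well-tempered metadynamics and is
denoted by the symbol γ. Therefore,
\[ \gamma = 1 + \frac{\Delta T}{T} \Rightarrow \frac{\gamma - 1}{\gamma} = \frac{\Delta T}{T + \Delta T} \tag{2.66} \]
\[ V_{bias}(s, t \to \infty) = -\frac{\gamma - 1}{\gamma} F(s) \tag{2.67} \]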
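The difference between standard and well-tempered hill deposition can be sketched for a single collective variable. This is an illustration only and does not reproduce the interface of any metadynamics package: the function names are hypothetical, and the initial hill height and k_B ΔT are assumed to be given in the same energy units.

```python
import math


def bias_potential(s, hills, sigma):
    """Sum of the deposited Gaussian hills; each hill is a (centre, height) pair."""
    return sum(h * math.exp(-0.5 * ((s - c) / sigma) ** 2) for c, h in hills)


def deposit_hill(hills, s_t, w0, sigma, kb_delta_t=None):
    """Add a hill at the current CV value s_t and return its height.

    kb_delta_t=None gives standard metadynamics (constant height);
    otherwise the height is damped by exp(-V_bias(s_t) / (k_B * Delta T)).
    """
    if kb_delta_t is None:
        height = w0
    else:
        height = w0 * math.exp(-bias_potential(s_t, hills, sigma) / kb_delta_t)
    hills.append((s_t, height))
    return height


# Usage: two consecutive hills deposited at the same CV value
hills = []
h1 = deposit_hill(hills, 0.0, 1.2, 0.1, kb_delta_t=5.0)
h2 = deposit_hill(hills, 0.0, 1.2, 0.1, kb_delta_t=5.0)
print(h1, h2, bias_potential(0.0, hills, 0.1))
```

The second hill is lower than the first because bias has already accumulated at that point, which is how the deposition rate decreases in regions that have been visited often.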
RMSF is mainly useful for quantifying how much a protein segment fluctuates
over time. It particularly helps in differentiating between structured and
unstructured regions in a protein of interest. It is calculated by averaging
over time, giving particle-specific values. The expression is given as:
\[ RMSF_i = \sqrt{\frac{1}{T} \sum_{\tau=1}^{T} |x_i(\tau) - \bar{x}_i|^2} \tag{2.69} \]
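The RMSF expression translates directly into code. The sketch below assumes that the trajectory frames have already been aligned to a common reference and are stored as nested lists; the function name and the toy trajectory are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF; trajectory[t][i] is the (x, y, z) position of atom i in frame t."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        # Mean squared deviation from the average position
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3)) for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Usage: atom 0 oscillates between x = +1 and x = -1, atom 1 is static
toy_trajectory = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

A mobile atom gives a large value while a rigid atom gives zero, which is what distinguishes flexible from structured regions in an RMSF profile.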
The Gaussian Network Model (GNM) is used to study the fluctuations and
correlated motions of atoms. In this model, the α-carbons of the amino acids of
the protein are identified as nodes, and all pairs of nodes lying within an
interaction range, generally a cutoff distance ($r_c$) of 0.7 nm, are connected
by springs.
Figure 2.16: Nodes in the GNM model connected with springs; adapted from Ref. [111]
Therefore,
∆Rij = ∆Rj − ∆Ri (2.71)
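The potential energy of the model can be written as:
\[
\begin{aligned}
V_{GNM} &= \frac{\gamma}{2} \left[ \sum_{i,j}^{N} \Gamma_{ij} (\Delta R_i - \Delta R_j)^2 \right] \\
&= \frac{\gamma}{2} \left[ \sum_{i,j}^{N} \Gamma_{ij} \left( (\Delta X_i - \Delta X_j)^2 + (\Delta Y_i - \Delta Y_j)^2 + (\Delta Z_i - \Delta Z_j)^2 \right) \right]
\end{aligned}
\tag{2.72}
\]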
Here γ is the spring constant and Γij is the ijth element of Kirchhoff’s matrix
of inter-residue contact, Γ, defined by:
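\[
\Gamma_{ij} =
\begin{cases}
-1, & \text{if } i \neq j \text{ and } R_{ij} \leq r_c \\
0, & \text{if } i \neq j \text{ and } R_{ij} > r_c \\
-\sum_{i \neq j} \Gamma_{ij}, & \text{if } i = j
\end{cases}
\tag{2.73}
\]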
The general assumption of the GNM is that all fluctuations are isotropic and
Gaussian in nature. After further derivations, it can be shown that the covari-
ance matrix is a combination of expectation values of residue fluctuations and
cross-correlations in the diagonal and off-diagonal elements, respectively. The
covariance matrix (for X) is related to Kirchhoff’s matrix as follows:
\[ \Xi = \frac{k_B T}{\gamma} \Gamma^{-1} \tag{2.74} \]
Similarly, it can be written for Y and Z. Therefore, the residue fluctuations and
cross-correlations can be expressed as follows:
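\[ \langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma} (\Gamma^{-1})_{ii} \tag{2.75} \]
\[ \langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_B T}{\gamma} (\Gamma^{-1})_{ij} \tag{2.76} \]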
Therefore, the covariance matrix (for X) is as follows:
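\[
\Xi =
\begin{bmatrix}
\langle \Delta x_1^2 \rangle & \langle (\Delta x_1)(\Delta x_2) \rangle & \cdots & \langle (\Delta x_1)(\Delta x_n) \rangle \\
\langle (\Delta x_2)(\Delta x_1) \rangle & \langle \Delta x_2^2 \rangle & \cdots & \langle (\Delta x_2)(\Delta x_n) \rangle \\
\vdots & & \ddots & \vdots \\
\langle (\Delta x_n)(\Delta x_1) \rangle & \langle (\Delta x_n)(\Delta x_2) \rangle & \cdots & \langle \Delta x_n^2 \rangle
\end{bmatrix}
\tag{2.77}
\]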
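\[ C(i, j) = \frac{\langle \Delta R_i \cdot \Delta R_j \rangle}{\sqrt{\langle \Delta R_i \cdot \Delta R_i \rangle \langle \Delta R_j \cdot \Delta R_j \rangle}} \tag{2.81} \]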
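Two steps of this section lend themselves to a short sketch: building the Kirchhoff matrix from node coordinates and a cutoff, and normalising a covariance matrix into the correlation matrix C(i, j). The function names and the three-node test geometry are hypothetical; coordinates and cutoff are assumed to be in the same length unit.

```python
import math


def kirchhoff_matrix(coords, r_c):
    """Kirchhoff (connectivity) matrix for nodes connected within the cutoff r_c."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= r_c:
                gamma[i][j] = gamma[j][i] = -1.0      # nodes in contact
    for i in range(n):
        # Diagonal element: minus the sum of the off-diagonal elements of row i
        gamma[i][i] = -sum(gamma[i][j] for j in range(n) if j != i)
    return gamma


def correlation_from_covariance(cov):
    """Normalise a covariance matrix so that every diagonal element equals 1."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Usage: three hypothetical nodes on a line, 0.5 nm apart, with a 0.7 nm cutoff
nodes = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(kirchhoff_matrix(nodes, 0.7))
print(correlation_from_covariance([[4.0, 2.0], [2.0, 9.0]]))
```

Only adjacent nodes are connected in this example, so each diagonal element of the Kirchhoff matrix simply counts the contacts of that node.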
3 Drug-induced conformational dynamics of oncogenic
KRas: Comparing the effects of AMG-510 & MRTX-849
3.1 Introduction
The MAPK (Mitogen-Activated Protein Kinase) pathway, also known as the ERK
(Extracellular Signal-Regulated Kinase) pathway, is a relay of proteins in the
cell that communicates a signal from an extracellular receptor to the cellular
DNA. It mainly consists of proteins that phosphorylate downstream proteins to
make them “active” or “inactive”, thereby acting as molecular switches.
Figure 3.1: Schematic of MAPK/ERK cellular signalling pathway; adapted from Ref. [112]
The Ras-family proteins act as a crucial relay in this chain, and KRas (Kirsten
Rat sarcoma) is one of them [113]. It generally stays in an “inactive”
conformation when bound to GDP. When an upstream signal is received, the SOS
(Son of Sevenless) family of proteins, which includes GEFs (Guanine nucleotide
Exchange Factors), catalyses the GDP-to-GTP exchange in KRas, switching the
latter from an inactive state to an active state. In this state, it is capable
of activating downstream targets by phosphorylating them. Once it has done so,
under normal physiological conditions, KRas, being a GTPase, hydrolyses the
GTP to GDP with the help of a protein called GAP (GTPase Activating Protein),
which catalyses the process. The schematic of the pathway is shown in
Figure 3.1.
Figure 3.2: KRas function under physiological and mutated states; adapted from Ref. [114]
The schematic structure of KRas is given in Figure 3.3. It has three important
regions which act as the active site of the protein: the P-loop, Switch-I and
Switch-II, of which the last two act as Intrinsically Disordered Regions
(IDRs). There are a total of 5 α-helices, 5 β-sheets and multiple loops forming
a globular protein structure. The HVR (Hyper-variable Region) beyond amino acid
169 has been deleted for illustrative purposes.
Figure 3.3: KRas Structure and schematic representation of helices and sheets; adapted from
Ref. [115]
However, mutations can occur in the KRas at some of the potential sites, which
can cause the protein to be oncogenic. Potential mutation sites are G12, G13,
and Q61. These sites are essential in the functioning of KRas, and hence their
mutation leads to abnormalities in the MAPK pathway. These mutations es-
sentially lead to the loss in the GTPase activity of the KRas leading to cancer,
predominantly seen in the lung cells. Therefore, the KRas always stays in the
active state and promotes cell growth and proliferation. The schematic of the
abnormal mechanism is shown in Figure 3.2.
Lung cancer is the most frequently diagnosed cancer and a leading cause of
cancer-related death worldwide making up almost 25% of all cancer deaths
[116]. Non-small-cell lung cancer (NSCLC) is the most commonly diagnosed
form of the disease, accounting for >85% of the total cases [117]. Squamous cell
carcinoma and adenocarcinoma are examples of non-small-cell lung tumours
that act similarly. Mutation in the protein involved in the MAPK signalling
pathway, i.e. KRas, is the leading cause of NSCLC.
Figure 3.4: Pie chart showing mutational distribution for KRas malignancy in NSCLC
3.1.3 Most fatal G12C mutation and its drug induced inhibition
The major codon affected in these NSCLC cells is codon 12, the remaining being
codons 13 and 61 [118]. Among the codon 12 mutations, G12C is the most common,
accounting for around 46% as shown in Figure 3.4 [119]. The other major ones
include G12V, G12D, and G12A. Recently, two drugs for G12C-mutated KRas,
AMG-510 and MRTX-849, were designed by Amgen and Mirati Therapeutics,
respectively. The latter was recently approved by the FDA [120]. One aspect of
this project involves investigating the G12C-mutated KRas and its interactions
in the presence of these two drugs: Adagrasib (MRTX-849) and Sotorasib
(AMG-510). These two drugs covalently bind to the mutated site by forming a
C–S bond and inhibit the binding of GTP to KRas,
leading KRas to remain inactive for an indefinite period of time. The two
drugs are very similar in structure. However, it has been shown in the
literature that the former has an overall potency (Kinact/KI) of 35 mM⁻¹ s⁻¹,
whereas the latter has a potency of 9.9 mM⁻¹ s⁻¹ [121, 122]. Here, Kinact
represents the rate of inactivation and KI represents the reversible affinity.
Therefore, it was of interest to look into the mechanism of the drug binding
and the differential inhibition caused by the two drugs at the molecular scale.
The original KRas protein configuration was taken from the Protein Data Bank
(PDB ID: 4OBE), from an X-ray diffraction experiment by Hunter et al.[123].
For the simulation of the G12C variant with GDP attached, the glycine at
position 12 was converted to cysteine in silico using PyMOL’s mutagenesis tool.
The structure was homology modelled with the SWISS-MODEL server to account for
missing regions[124]. The structure of the G12C-mutated protein is shown in
Figure 3.5.
3.2.1.2 Mutated protein in drug-bound state: Specific interaction with AMG-510 &
MRTX-849
X-ray diffraction structures from the Protein Data Bank, PDB ID: 6OIM[121]
and PDB ID: 6UT0[122], were utilised to investigate the mutated KRas structure
with the bound drugs Sotorasib and Adagrasib, respectively. Since both drugs
are covalently bound to the mutated site and the force field does not account
for this new bond, the simulation parameters for this linkage were missing.
Therefore, the bond was modelled using a distance-restraining spring of force
constant k = 10,000 kJ mol⁻¹ nm⁻². Also, because the drug molecules were not
contained in the CHARMM36IDPSFF force field, the SwissParam module was used to
determine their parameters. Both structures were homology modelled with the
SWISS-MODEL server to account for missing regions[124]. The structures of the
G12C-mutated drug-bound proteins are shown in Figure 3.6 and Figure 3.7.
The force field used for the explicit solvent simulations of KRas with its
ligands and drugs was CHARMM36IDPSFF, as this force field was specifically
designed to simulate intrinsically disordered proteins (IDPs) or regions
(IDRs) [125].
Figure 3.6: Mutated KRas covalently attached to AMG-510 at position 12
This force field was derived from the previously established force field
CHARMM36m (C36m), which was itself an improved version of CHARMM36
(C36) [126]. The C36m force field yielded a high energy barrier in the
backbone dihedrals between the poly-proline II region and the helix region of
the Ramachandran plot. Therefore, the CHARMM36IDPSFF force field was developed
with modified grid-based energy correction map (CMAP) parameters for all 20
naturally occurring amino acids. It is to be noted that the CMAP method was
first used in CHARMM22 to improve the sampling of backbone dihedrals [127].
Since CHARMM36IDPSFF accounted for the backbone dihedrals better than C36m, it
was the force field of choice for the hybrid protein system.
GROMACS was used to run the three simulations, and the topologies were created
using the CHARMM36IDPSFF force field, which is specifically built for proteins
having Intrinsically Disordered Regions[125]. Each system was then centred in
a dodecahedral box and solvated using the TIP3P water model[128]. To mimic the
normal physiological environment, the systems were then neutralised with
sodium and chloride ions.
After the systems were prepared, they were subjected to energy minimisation
using the steepest descent algorithm to remove steric clashes. The protein and
bound ligands were first position restrained with a force constant of 1000
kcal mol⁻¹ nm⁻² and the solvent was equilibrated. At this stage, the systems
were allowed to go through an NVT equilibration at 300 K using the modified
Berendsen thermostat[94] for 6 ns. The Parrinello-Rahman barostat[97] was then
used to maintain an average pressure of 1 bar on all systems for 7 ns.
Throughout the simulations, a time step of 1 fs was maintained and the
leap-frog integrator was utilised. The position restraints were removed for the
final simulations, which were run for 1 µs each using the NPT parameters.
Particle Mesh Ewald was used for electrostatic calculations, with a cubic
interpolation of order 4 and a grid spacing of 0.16 nm for the Fast Fourier
Transform. Periodic boundary conditions were used in all directions throughout
all the simulations. The neighbour list was updated every 10 steps using a grid
method, with a short-range neighbour list cut-off of 1 nm. LINCS constraints
were applied to all of the bonds.
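For orientation, the production settings described above could be collected into a GROMACS parameter (.mdp) file along the following lines. This is a hedged sketch and not the input actually used in this work: the helper function is hypothetical, and the thermostat choice for the production run, the coupling groups, the coupling time constants and the compressibility are assumptions that do not come from the text.

```python
def render_mdp(settings):
    """Render a dictionary of options as 'key = value' lines of an .mdp file."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


production = {
    "integrator": "md",                  # leap-frog integrator
    "dt": 0.001,                         # 1 fs time step (in ps)
    "nsteps": 1_000_000_000,             # 1 microsecond at 1 fs per step
    "tcoupl": "V-rescale",               # assumption: modified Berendsen thermostat
    "tc-grps": "System",                 # assumption: single coupling group
    "tau_t": 0.1,                        # assumption: thermostat time constant (ps)
    "ref_t": 300,                        # target temperature (K)
    "pcoupl": "Parrinello-Rahman",       # barostat
    "tau_p": 2.0,                        # assumption: barostat time constant (ps)
    "ref_p": 1.0,                        # target pressure (bar)
    "compressibility": 4.5e-5,           # assumption: water compressibility (bar^-1)
    "coulombtype": "PME",                # Particle Mesh Ewald electrostatics
    "pme_order": 4,                      # cubic interpolation
    "fourierspacing": 0.16,              # FFT grid spacing (nm)
    "nstlist": 10,                       # neighbour list update interval (steps)
    "rlist": 1.0,                        # short-range neighbour list cut-off (nm)
    "constraints": "all-bonds",          # constrain all bonds
    "constraint_algorithm": "lincs",     # LINCS constraint solver
    "pbc": "xyz",                        # periodic in all directions
}

mdp_text = render_mdp(production)
print(mdp_text)
```

The printed text can be compared option by option with the protocol above; the values marked as assumptions should be replaced by those actually used.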
• For the GDP-bound G12C variant, both switch regions show very high
fluctuations in the range of 0.4-0.6 nm.
• For the GDP- and AMG-bound protein, the fluctuation is reduced only in the
Switch-I region.
• The GDP- and MRTX-bound protein shows reduced fluctuations in both the
switch regions.
Figure 3.8: RMSF plot of GDP bound G12C variant, AMG and MRTX drug-bound
3.3.2 Quantifying & comparing fluctuation of different IDRs in KRas from RMSD
Once the fluctuations were obtained, the Root Mean Square Deviation (RMSD) of
the P-loop, Switch-I and Switch-II was analysed and compared to understand
their flexibility. The blue part indicated in Figure 3.9 represents the
non-equilibrium part of the simulation and has been ignored for all further
analysis.
Figure 3.9: RMSD plots of GDP bound G12C variant, AMG and MRTX drug-bound
The RMSD of the P-loop, Switch-I and Switch-II of the three systems was
analysed as shown in Figure 3.9. As is evident from the figure, the RMSD
fluctuations for the Switch-II region of the MRTX-bound species are
significantly lower than those of the AMG-bound and drug-free G12C-mutated
species. The Switch-I fluctuation is also restricted in the MRTX-bound species
as compared to the other two.
Since the sections above show that the Switch regions are highly flexible
compared to the other regions of the protein, it was of interest to see if the
motions of the switch regions are correlated in some way. Therefore, the gmx
covar module of GROMACS was used to calculate the covariance matrix, which was
then used to calculate the correlation matrix using the theory from
Section 2.5.3.
Figure 3.10: Correlation plots of GDP bound G12C variant, AMG and MRTX drug-bound
The covariance matrix of the Cα atoms was constructed first, and the correlation
matrix was then calculated from it. The matrices were plotted as shown in
Figure 3.10.
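The step from covariance to correlation can be illustrated with a short sketch. It assumes the commonly used normalised cross-correlation, C_ij = <Δr_i·Δr_j> / sqrt(<Δr_i²><Δr_j²>), which may differ in detail from the formulation in Section 2.5.3, and it works directly on a pre-fitted coordinate array rather than on gmx covar output. The function name is hypothetical.

```python
import numpy as np


def cross_correlation(positions):
    """Normalised cross-correlation matrix of atomic displacements.

    positions: (n_frames, n_atoms, 3), e.g. C-alpha coordinates, assumed
    to be already fitted to a reference structure.
    Returns an (n_atoms, n_atoms) matrix with values in [-1, 1].
    """
    positions = np.asarray(positions, dtype=float)
    disp = positions - positions.mean(axis=0)                  # displacement from the mean
    cov = np.einsum("tia,tja->ij", disp, disp) / len(disp)     # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))                               # sqrt(<dr_i^2>)
    return cov / np.outer(norm, norm)


# Toy example: atoms 0 and 2 move together, atom 1 moves in the opposite direction.
t = np.linspace(0.0, 2.0 * np.pi, 200)
toy = np.zeros((t.size, 3, 3))
toy[:, 0, 0] = np.sin(t)
toy[:, 1, 0] = -np.sin(t)
toy[:, 2, 0] = np.sin(t)

corr = cross_correlation(toy)
print(np.round(corr, 2))
```

Values near -1 correspond to the anti-correlated motion discussed for the switch regions, and values near 0 to uncorrelated motion.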
In Figure 3.10, the blue circles indicate the regions of Switch-I and Switch-II
correlation. Although a prominent anti-correlation can be seen for both the
GDP-bound G12C variant and the AMG-bound one, this correlation is completely
diminished in the MRTX-bound one. Therefore, it can be concluded that MRTX
not only reduces the fluctuation of the Switch regions but also diminishes the
correlated motion between the two Switch regions.
As mentioned before, the α-2 helix of the KRas protein is a part of the Switch-II
region. It is counted as part of this region because the helix has a very high
propensity to fluctuate under normal conditions and continuously shifts its
configuration between an alpha helix and a loop. Therefore, a study was
conducted to see whether the systems under study differ in helicity.
For this study, the gmx helix module of GROMACS was used to collect the
time-dependent data, and a frequency-dependent histogram was then plotted by
normalizing all the plots with the global maximum of the frequency obtained from
the three data sets. The data obtained are shown in Figure 3.11.
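The normalisation by a global maximum can be sketched as below. This is a minimal example and not the plotting script used here: the data set names, bin edges and toy values are placeholders, and the function name is hypothetical.

```python
import numpy as np


def normalised_histograms(datasets, bins):
    """Histogram several data sets on common bins and scale all of them
    by the single largest bin count found across all data sets."""
    counts = {
        name: np.histogram(values, bins=bins)[0]
        for name, values in datasets.items()
    }
    global_max = max(c.max() for c in counts.values())
    return {name: c / global_max for name, c in counts.items()}


# Toy hydrogen-bond counts per frame for two hypothetical systems.
toy_data = {"system_a": [1, 1, 1, 2], "system_b": [2, 2]}
toy_bins = [0.5, 1.5, 2.5]

hists = normalised_histograms(toy_data, toy_bins)
print(hists)
```

Scaling by one shared maximum, rather than per data set, keeps the relative heights of the three histograms comparable.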
The more hydrogen bonds are present, the better stabilized the helix is. As can
be seen in Figure 3.11, the frequency-dependent normalised histogram of the α-2
helix for the GDP-bound G12C variant and the AMG-bound protein mostly
stays around 3 to 4 hydrogen bonds during the simulation. However, for the
MRTX-bound one, the peak lies around 5 to 6 hydrogen bonds. Therefore,
it can be concluded that MRTX (Adagrasib) inhibits the melting of the helix
more effectively than in the other two systems. This might be one of the reasons
why MRTX inhibits the fluctuation of the Switch-II region, as shown in the RMSF
plot in Figure 3.8.
The α-2-α-3 pocket is the main binding pocket of the drugs AMG and MRTX.
Therefore, the dihedral angle formed by these two helices was analysed to see
whether the drug-binding event has any effect on the bending of the two towards
each other.
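A dihedral angle of this kind is defined by four points. The sketch below shows the standard four-point calculation; which four points (for example centres of mass of helix segments) were used in this work is not stated here, so the inputs are placeholders and the function name is hypothetical.

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    The angle is measured between the plane (p0, p1, p2) and the
    plane (p1, p2, p3), around the p1 -> p2 axis.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central axis
    v = b0 - np.dot(b0, b1) * b1          # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1          # component of b2 perpendicular to the axis
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


# Toy geometry with known answers.
right_angle = dihedral_degrees([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
trans = dihedral_degrees([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
print(right_angle, trans)
```

Evaluating this function for every frame gives the time series from which an angle histogram can be built.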
The frequency-dependent histogram was plotted for all three species as shown
in Figure 3.12. As can be seen in the figure, both the G12C variant and
AMG-bound KRas show a bimodal distribution ranging from 50° to 148°, with
peaks at around 80° and 110°. For MRTX-bound KRas, in contrast, only a
unimodal distribution is evident, restricted to between 70° and 118° and peaking
at around 96°. This also indicates that MRTX heavily restricts the fluctuation of
the Switch-II α-2 helix and keeps it close to the α-3 helix, unlike the heavy
fluctuation seen for the G12C variant and AMG-bound KRas.
Figure 3.12: Frequency-dependent histograms of GDP bound G12C variant, AMG and MRTX
drug-bound
Contact maps are a very good tool for visualizing the domain-domain
interactions present in a biomolecular system. For this study,
frequency-dependent contact maps were used to analyse the major long-timescale
interactions between a protein segment and the associated ligand.
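A frequency-dependent contact map can be understood as the fraction of frames in which each protein-ligand pair lies within a distance cut-off. The sketch below illustrates this idea only: the cut-off value of 0.45 nm is an assumed placeholder, not the value used in this work, and the function name is hypothetical.

```python
import numpy as np


def contact_frequency(protein_xyz, ligand_xyz, cutoff=0.45):
    """Fraction of frames in which each protein-ligand pair is in contact.

    protein_xyz: (n_frames, n_protein_sites, 3)
    ligand_xyz:  (n_frames, n_ligand_sites, 3)
    cutoff: contact distance in nm (assumed value, for illustration).
    Returns an (n_protein_sites, n_ligand_sites) matrix of values in [0, 1].
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Pairwise displacement vectors for every frame via broadcasting.
    delta = protein_xyz[:, :, None, :] - ligand_xyz[:, None, :, :]
    dist = np.linalg.norm(delta, axis=-1)
    return (dist < cutoff).mean(axis=0)


# Toy data: one protein site at the origin, one ligand site that is
# close in the first frame and far away in the second.
protein = np.zeros((2, 1, 3))
ligand = np.array([[[0.3, 0.0, 0.0]], [[1.0, 0.0, 0.0]]])

freq = contact_frequency(protein, ligand)
print(freq)
```

Entries close to 1 correspond to sustained contacts, and entries close to 0 to transient or absent ones.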
For our two drug systems, the drugs lie very close to the Switch-II and α-3
regions. Therefore, the analysis was performed independently for all possible
combinations. As can be seen in Figure 3.13 and Figure 3.14, the MRTX
drug makes significantly more sustained contacts than AMG. The contacts are
formed mainly through hydrogen bonds and hydrophobic contacts. Furthermore,
it should be noted that there is no significant difference in the contacts formed
between α-3 and the two drugs, as shown in Figure 3.15. This leads to the
conclusion that MRTX forms more contacts with the Switch-II region than AMG
does, and is hence able to strongly reduce its fluctuation, as confirmed by the
RMSF and RMSD plots.
Figure 3.13: Contact map of Switch-II loop region with drugs
3.4 Conclusion
From all these studies on the oncogenic variant of KRas and its drug-bound
states, it can be concluded that:
• The two switch regions are highly disordered compared to the other
parts of the protein.
• In the MRTX-bound variant, the motion of both Switch-I and Switch-II is
restricted compared to the G12C oncogenic variant and the AMG-bound one.
• The MRTX-bound variant forms the most contacts, mainly through
hydrogen bonds and hydrophobic interactions.
4 Exploring conformational landscape of drug-bound and unbound forms of KRAS: Deducing switch-mediated kick-out mechanism
4.1 Introduction
Figure 4.1: Schematic diagram of KRas-GEF Interaction scheme showing Kick-Out; adapted
from Ref. [80]
Figure 4.2: Binding and Unbinding mechanism of KRas-GEF interaction (PDB ID: 7KFZ)
Hence, our interest was to explore whether KRas can intrinsically reach this
push-pull configuration in its energy landscape and how accessible that
configuration is in its drug-bound forms. This could indicate an alternative
pathway through which drugs might inhibit the GTP binding of mutated KRas.
Along with the systems described in Chapter 3, one additional system, the
Wild-type KRas, was incorporated here to explore the conformational states
visited by the native KRas system. All four systems were prepared following the
general MD protocol described in Section 3.2 for the three earlier species, up
to the NPT equilibration. Once the systems were equilibrated, the simulations
were patched with PLUMED [129, 130, 131] to perform well-tempered 2D
metadynamics with Langevin dynamics. The previously chosen distance order
parameter was a 1D order parameter and was unable to capture the
multidimensional free energy surface properly. Therefore, RMSD was chosen as
the collective variable, and the order parameters for this run were “RMSD of
Switch-I” and “RMSD of Switch-II”. For the well-tempered metadynamics, the
rate of hill deposition was set to 500, the height and width of the Gaussian
hills were set to 0.1 and 0.001, respectively, and the bias factor was set to
15. The simulations were run for 200 ns at a temperature of 310 K, using grids
for computational optimization. The data were then analysed to generate the
free energy surface using the sum_hills module of PLUMED.
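To make the role of the bias factor concrete, the sketch below shows how, in well-tempered metadynamics, the height of each newly deposited hill is scaled down according to the bias already accumulated at that point, and how the free energy estimate follows from the converged bias. It is a schematic illustration of the method, not the PLUMED implementation. The energy unit (kJ/mol) of the initial height is an assumption, since the text gives no units, and the function names are hypothetical.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def hill_height(bias, h0=0.1, bias_factor=15.0, temperature=310.0):
    """Height of the next Gaussian hill in well-tempered metadynamics.

    bias: bias potential already deposited at the current CV value (kJ/mol).
    h0: initial hill height (0.1 as in the text; kJ/mol is assumed).
    bias_factor: gamma = (T + dT) / T, so dT = (gamma - 1) * T.
    """
    delta_t = (bias_factor - 1.0) * temperature
    return h0 * math.exp(-bias / (KB * delta_t))


def free_energy_from_bias(bias, bias_factor=15.0):
    """Free energy estimate (up to a constant) from the converged bias:
    F(s) = -(gamma / (gamma - 1)) * V(s)."""
    return -bias_factor / (bias_factor - 1.0) * bias


# Hills shrink as bias accumulates in an already visited region.
for accumulated in (0.0, 5.0, 20.0, 50.0):
    print(accumulated, round(hill_height(accumulated), 4))
```

With a bias factor of 15 the hills shrink slowly, so barriers are still crossed readily while the bias converges more smoothly than in standard metadynamics.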
4.3 Results
4.3.1 Conformational states of Wild-type (WT) and G12C oncogenic variant of KRas
For the Wild-type (WT) variant of KRas, as can be seen from Figure 4.3,
there exists a single stable state. There is also very little heterogeneity
in the structural landscape, which ranges from 0.12 nm to 0.24 nm for the
Switch-I RMSD and from 0.23 nm to 0.33 nm for the Switch-II RMSD. We propose
this region to be the GDP-GTP exchange state. In Figures 4.3 to 4.6, red,
yellow and cyan represent the P-loop, Switch-I and Switch-II, respectively.
Figure 4.3: Well-tempered Metadynamics plot of WT KRas along with its most stable state
Figure 4.4: Well-tempered Metadynamics plot of G12C-mutated KRas along with its most stable state
However, for the G12C-mutated oncogenic variant, there exists a different
global minimum from that of the WT. This minimum shows more deviation in
Switch-I and less in Switch-II compared to the WT. Nevertheless, the variant
also has a small population in the state occupied by the WT, and it shows more
heterogeneity than the WT. Hence, we propose that this variant has a small
propensity for GDP-GTP exchange but has a different major stable state, which
might be leading to the oncogenicity.
4.3.2 Comparison of conformational states of the oncogenic variant & the drug-bound
forms of KRas
Figure 4.5: Well-tempered Metadynamics plot of AMG-bound mutated KRas along with its
most stable states
As can be seen in Figure 4.5, the energy landscape of the AMG-bound mutated
KRas shows very large heterogeneity, with two stable states. However, the WT
state still exists with a small population, which indicates that the GDP-GTP
exchange pathway remains open even in the AMG-bound state. This might be a
reason why AMG is not a potent drug in inhibiting the oncogenicity of mutated
KRas.
Figure 4.6: Well-tempered Metadynamics plot of MRTX-bound mutated KRas along with its
most stable states
In Figure 4.6, however, MRTX completely restricts the fluctuations in the
Switch-II region of KRas in a controlled manner, and the energy landscape has
two stable states. This is consistent with the conclusion drawn from the
molecular dynamics data in the previous chapter. Moreover, the WT state
population does not exist. Therefore, it can be inferred that MRTX completely
restricts the GDP-GTP exchange pathway.
4.4 Conclusion
• For the WT KRas, there exists a single stable state, which we propose is the
state that favours GDP-GTP exchange.
• G12C-mutated KRas shows heterogeneity and also exhibits a small
population that favours GDP-GTP exchange.
Future Aspects
• The allosteric effect of other G12 mutations (G12V, G12D, G12A) on the
switch regions needs to be worked out.
References
[3] Jonathan J Ward et al. “Prediction and functional analysis of native disor-
der in proteins from the three kingdoms of life”. In: Journal of molecular
biology 337.3 (2004), pp. 635–645.
[6] Peter E Wright and H Jane Dyson. “Intrinsically unstructured proteins: re-
assessing the protein structure-function paradigm”. In: Journal of molec-
ular biology 293.2 (1999), pp. 321–331.
[7] Peter Csermely, Robin Palotai, and Ruth Nussinov. “Induced fit, confor-
mational selection and independent dynamic segments: an extended view
of binding events”. In: Nature Precedings (2010), pp. 1–1.
[8] Malene Ringkjøbing Jensen et al. “Exploring free-energy landscapes of
intrinsically disordered proteins at atomic resolution using NMR spec-
troscopy”. In: Chemical reviews 114.13 (2014), pp. 6632–6660.
[9] Shelly DeForte and Vladimir N Uversky. “Order, disorder, and every-
thing in between”. In: Molecules 21.8 (2016), p. 1090.
[12] Vladimir N Uversky. “Dancing protein clouds: the strange biology and
chaotic physics of intrinsically disordered proteins”. In: Journal of Bio-
logical Chemistry 291.13 (2016), pp. 6681–6688.
continuum with intrinsic disorder-based proteoforms”. In: Cellular and
Molecular Life Sciences 76.22 (2019), pp. 4461–4492.
[17] Chad Haynes et al. “Intrinsic disorder is a common feature of hub proteins
from four eukaryotic interactomes”. In: PLoS computational biology 2.8
(2006), e100.
[18] A Keith Dunker et al. “Flexible nets: the roles of intrinsic disorder
in protein interaction networks”. In: The FEBS journal 272.20 (2005),
pp. 5129–5148.
[20] A Keith Dunker et al. “Intrinsic disorder and protein function”. In: Bio-
chemistry 41.21 (2002), pp. 6573–6582.
[23] Jörg Gsponer and M Madan Babu. “The rules of disorder or why disorder
rules”. In: Progress in biophysics and molecular biology 99.2-3 (2009),
pp. 94–103.
[24] Charles A Galea et al. “Regulation of cell division by intrinsically un-
structured proteins: intrinsic flexibility, modularity, and signaling con-
duits”. In: Biochemistry 47.29 (2008), pp. 7598–7609.
[25] Mi-Kyung Yoon et al. “Cell cycle regulation by the intrinsically disor-
dered proteins p21 and p27”. In: Biochemical Society Transactions 40.5
(2012), pp. 981–988.
[26] Jennifer M Hurley et al. “Conserved RNA helicase FRH acts nonenzy-
matically to support the intrinsically disordered Neurospora clock protein
FRQ”. In: Molecular cell 52.6 (2013), pp. 832–843.
[27] Pei Dong et al. “A dynamic interaction process between KaiA and KaiC
is critical to the cyanobacterial circadian oscillator”. In: Scientific reports
6.1 (2016), pp. 1–11.
[32] Peter Tompa and Denes Kovacs. “Intrinsically disordered chaperones
in plants and animals”. In: Biochemistry and Cell Biology 88.2 (2010),
pp. 167–174.
[37] H Jane Dyson and Peter E Wright. “Coupling of folding and binding
for unstructured proteins”. In: Current opinion in structural biology 12.1
(2002), pp. 54–60.
[38] David D Boehr, Ruth Nussinov, and Peter E Wright. “The role of dy-
namic conformational ensembles in biomolecular recognition”. In: Na-
ture chemical biology 5.11 (2009), pp. 789–796.
[40] Angelo Toto et al. “Molecular recognition by templated folding of an
intrinsically disordered protein”. In: Scientific reports 6.1 (2016), pp. 1–
9.
[43] Norman E Davey et al. “Attributes of short linear motifs”. In: Molecular
BioSystems 8.1 (2012), pp. 268–281.
[44] Nicolás Palopoli et al. “Short linear motif core and flanking regions mod-
ulate retinoblastoma protein binding affinity and specificity”. In: Protein
Engineering, Design and Selection 31.3 (2018), pp. 69–77.
[45] Andreas Prestel et al. “The PCNA interaction motifs revisited: thinking
outside the PIP-box”. In: Cellular and Molecular Life Sciences 76.24
(2019), pp. 4923–4943.
[46] Heli I Alanen et al. “Beyond KDEL: the role of positions 5 and 6 in deter-
mining ER localization”. In: Journal of molecular biology 409.3 (2011),
pp. 291–297.
[47] Rob Kaptein and Gerhard Wagner. Integrative methods in structural bi-
ology. 2019.
binding domain of CBP”. In: Proceedings of the National Academy of
Sciences 107.28 (2010), pp. 12535–12540.
[50] Ewa Jurneczko et al. “Intrinsic disorder in proteins: a challenge for (un)
structural biology met by ion mobility–mass spectrometry”. In: Biochem-
ical Society Transactions 40.5 (2012), pp. 1021–1026.
[56] Supriyo Bhattacharya and Xingcheng Lin. “Recent advances in com-
putational protocols addressing intrinsically disordered proteins”. In:
Biomolecules 9.4 (2019), p. 146.
[57] Claire C Hsu, Markus J Buehler, and Anna Tarakanova. “The order-
disorder continuum: linking predictions of protein structure and disorder
through molecular simulation”. In: Scientific reports 10.1 (2020), pp. 1–
14.
[59] Frank Noé and Cecilia Clementi. “Collective variables for the study of
long-time kinetics from molecular trajectories: theory and methods”. In:
Current opinion in structural biology 43 (2017), pp. 141–147.
[60] David J Wales. “Energy landscapes: some new horizons”. In: Current
opinion in structural biology 20.1 (2010), pp. 3–10.
[65] Massimiliano Bonomi et al. “Principles of protein structural ensemble
determination”. In: Current opinion in structural biology 42 (2017),
pp. 106–116.
[69] Gül H Zerze et al. “Free energy surface of an intrinsically disordered pro-
tein: comparison between temperature replica exchange molecular dy-
namics and bias-exchange metadynamics”. In: Journal of chemical the-
ory and computation 11.6 (2015), pp. 2776–2782.
[70] Shalini Awasthi and Nisanth N Nair. “Exploring high dimensional free
energy landscapes: Temperature accelerated sliced sampling”. In: The
Journal of Chemical Physics 146.9 (2017), p. 094108.
[71] Trang Nhu Do, Wing-Yiu Choy, and Mikko Karttunen. “Accelerating the
conformational sampling of intrinsically disordered proteins”. In: Jour-
nal of Chemical Theory and Computation 10.11 (2014), pp. 5081–5094.
[72] Xingcheng Lin et al. “Structural and dynamical order of a disordered
protein: molecular insights into conformational switching of PAGE4 at
the systems level”. In: Biomolecules 9.2 (2019), p. 77.
[78] David S Goodsell. “The molecular perspective: the ras oncogene”. In:
Stem cells 17.4 (1999), pp. 235–236.
[80] Ingrid R Vetter and Alfred Wittinghofer. “The guanine nucleotide-
binding switch in three dimensions”. In: Science 294.5545 (2001),
pp. 1299–1304.
[82] Hans Martin Senn and Walter Thiel. “QM/MM methods for biomolecu-
lar systems”. In: Angewandte Chemie International Edition 48.7 (2009),
pp. 1198–1229.
[84] Biman Bagchi. Statistical Mechanics for Chemistry and Materials Sci-
ence. CRC Press, 2018.
[85] Ralf Schneider, Amit Raj Sharma, and Abha Rai. “Introduction to molec-
ular dynamics”. In: Computational Many-Particle Physics. Springer,
2008, pp. 3–40.
[86] M Sprik. “Effective pair potentials and beyond”. In: Computer simulation
in chemical physics (1993), pp. 211–259.
[88] Kunal Roy, Supratik Kar, and Rudra Narayan Das. Understanding the
basics of QSAR for applications in pharmaceutical sciences and risk as-
sessment. Academic press, 2015.
[89] Mark James Abraham et al. “GROMACS: High performance molecular
simulations through multi-level parallelism from laptops to supercom-
puters”. In: SoftwareX 1 (2015), pp. 19–25.
[92] William C Swope et al. “A computer simulation method for the calcu-
lation of equilibrium constants for the formation of physical clusters of
molecules: Application to small water clusters”. In: The Journal of chem-
ical physics 76.1 (1982), pp. 637–649.
[98] LAMMPS Tube. Periodic boundary conditions. Jan. 2022. URL:
https://lammpstube.com/2019/10/30/periodic-boundary-conditions/.
[99] Paul Peter Ewald. “Ewald summation”. In: Ann. Phys 369.253 (1921),
pp. 1–2.
[101] Tom Darden, Darrin York, and Lee Pedersen. “Particle mesh Ewald: An
N log (N) method for Ewald sums in large systems”. In: The Journal of
chemical physics 98.12 (1993), pp. 10089–10092.
[102] Daan Frenkel and Berend Smit. “Chapter 4-molecular dynamics simu-
lations”. In: Understanding Molecular Simulation 2 (2002), pp. 63–107.
[107] Alessandro Laio and Michele Parrinello. “Escaping free-energy min-
ima”. In: Proceedings of the National Academy of Sciences 99.20 (2002),
pp. 12562–12566.
[112] Anne B Vojtek and Channing J Der. “Increasing complexity of the Ras
signaling pathway”. In: Journal of Biological Chemistry 273.32 (1998),
pp. 19925–19928.
[113] Nobuo Tsuchida, Tom Ryder, and Eiichi Ohtsubo. “Nucleotide sequence
of the oncogene encoding the p21 transforming protein of Kirsten murine
sarcoma virus”. In: Science 217.4563 (1982), pp. 937–939.
[114] Daniel Zeitouni et al. “KRAS mutant pancreatic cancer: no lone path to
an effective treatment”. In: Cancers 8.4 (2016), p. 45.
[115] Sezen Vatansever, Zeynep H Gümüş, and Burak Erman. “Intrinsic K-
Ras dynamics: A novel molecular dynamics data analysis method shows
causality between residue pair motions”. In: Scientific reports 6.1 (2016),
pp. 1–12.
[116] Rebecca L Siegel et al. “Cancer statistics, 2021”. In: CA: a cancer jour-
nal for clinicians 71.1 (2021), pp. 7–33.
[117] Ravi Salgia. “Mutation testing for directing upfront targeted therapy
and post-progression combination therapy strategies in lung adenocar-
cinoma”. In: Expert Review of Molecular Diagnostics 16.7 (2016),
pp. 737–749.
[120] Hayley Virgil. New Drug Application for ADAGRASIB accepted by FDA
for Kras G12c+ NSCLC. 2022. URL:
https://www.cancernetwork.com/view/new-drug-application-for-adagrasib-accepted-by-fda-for-kras-g12c-nsclc.
[121] Jude Canon et al. “The clinical KRAS (G12C) inhibitor AMG 510 drives
anti-tumour immunity”. In: Nature 575.7781 (2019), pp. 217–223.
[123] John C Hunter et al. “In situ selectivity profiling and crystal structure of
SML-8-73-1, an active site inhibitor of oncogenic K-Ras G12C”. In: Pro-
ceedings of the National Academy of Sciences 111.24 (2014), pp. 8895–
8900.
[127] Alexander D MacKerell Jr, Michael Feig, and Charles L Brooks. “Im-
proved treatment of the protein backbone in empirical force fields”. In:
Journal of the American Chemical Society 126.3 (2004), pp. 698–699.
[128] Pekka Mark and Lennart Nilsson. “Structure and dynamics of the TIP3P,
SPC, and SPC/E water models at 298 K”. In: The Journal of Physical
Chemistry A 105.43 (2001), pp. 9954–9960.
[131] Gareth A Tribello et al. “PLUMED 2: New feathers for an old bird”. In:
Computer Physics Communications 185.2 (2014), pp. 604–613.