Bioinformatics 21 8 1311
Systems biology
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
A.del Sol et al.
[Fig. 1 panels: (a) frequency versus residue number of links (λ = 13.195); (b) frequency versus betweenness centrality on logarithmic axes (C = 2.287 × 10⁻³). R² values shown in the panels: 0.99975 and 0.95121.]
Fig. 1. Frequency distributions of the residue number of links and of betweenness centrality, averaged over both sets of monomers and dimers. (a) Bell-shaped Poisson frequency distribution of the residue number of links averaged over the monomers (pink dots) and the dimers (blue dots). The discrete Poisson fit $P(x) = \lambda^x e^{-\lambda}/x!$ is shown with pink and blue lines for the monomers and dimers, respectively. The average residue number of links $\lambda$ and the squared correlation coefficients $R^2$ are given in the graph. (b) Frequency distribution of betweenness centrality averaged over the monomers (pink dots) and the dimers (blue dots). Both distributions follow a power law with an exponential cut-off, $P(\beta) = C\beta^{-\eta}\exp(-\beta/\beta_c)$, shown with pink and blue lines for the monomers and dimers, respectively. The data are plotted on a logarithmic scale, and the power-law scaling exponent $\eta$, the exponential cut-off $\beta_c$, the constant $C$ and the squared correlation coefficients $R^2$ for both datasets are given in the graph. There was no statistically significant difference between the monomer and dimer frequency distributions in either (a) or (b).
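The network quantities behind Fig. 1 (the 5.0 Å residue contact graph, the characteristic path length L, the clustering coefficient C, betweenness centrality and the z-score ≥ 3.0 selection of central residues) are defined in Systems and Methods and in the Discussion. A minimal stdlib-only Python sketch of them follows. It is not the authors' code: the input format (a hypothetical mapping from residue identifier to atom coordinates), the exclusion of disconnected pairs from L, the zero contribution of vertices with fewer than two neighbours to C, the use of Brandes' algorithm (which credits vertices fractionally when a pair has several shortest paths) and the population standard deviation in the z-score are all assumptions.

```python
import math
from collections import deque
from itertools import combinations

Coord = tuple[float, float, float]
Graph = dict[str, set[str]]


def contact_graph(residues: dict[str, list[Coord]], cutoff: float = 5.0) -> Graph:
    """Residues are vertices; i and j are joined if any atom of i lies
    within `cutoff` angstroms of any atom of j (brute force over atom pairs)."""
    graph: Graph = {r: set() for r in residues}
    for a, b in combinations(residues, 2):
        if any(math.dist(p, q) <= cutoff for p in residues[a] for q in residues[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def characteristic_path_length(graph: Graph) -> float:
    """L = (1/Np) * sum over pairs j > i of the shortest-path length l_ij.
    Assumption: pairs with no connecting path are left out of the average."""
    nodes = list(graph)
    total, pairs = 0, 0
    for idx, source in enumerate(nodes):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for target in nodes[idx + 1:]:
            if target in dist:
                total += dist[target]
                pairs += 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph: Graph) -> float:
    """C = (1/Nv) * sum_i n_i / (N_i (N_i - 1) / 2).
    Assumption: vertices with fewer than two neighbours contribute 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k >= 2:
            n_i = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += n_i / (k * (k - 1) / 2)
    return total / len(graph)


def betweenness(graph: Graph) -> dict[str, float]:
    """Brandes' algorithm for unweighted graphs. Scores are divided by the
    total number of vertex pairs, Nv (Nv - 1) / 2, following the text."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(graph)
    n_pairs = n * (n - 1) / 2
    # each unordered pair is visited from both of its ends, hence the extra / 2
    return {v: b / 2 / n_pairs for v, b in score.items()} if n_pairs else score


def high_betweenness(scores: dict[str, float], z_min: float = 3.0) -> list[str]:
    """Residues whose betweenness z-score is >= z_min (population SD assumed)."""
    values = list(scores.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    if sd == 0:
        return []
    return sorted(r for r, b in scores.items() if (b - mean) / sd >= z_min)
```

For example, in a star-shaped graph with one hub and ten leaves, every leaf-to-leaf shortest path runs through the hub, so `high_betweenness(betweenness(star))` singles out the hub (its z-score is √10 ≈ 3.16). Real structures would need a spatial grid or k-d tree instead of the brute-force contact search.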
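The two model forms in the Fig. 1 caption, and the Kolmogorov–Smirnov comparison of monomer and dimer distributions, can be illustrated with the short sketch below. The authors used Systat; this stdlib-only version, with hypothetical function names, only evaluates the fitted forms and computes the two-sample KS statistic D against the common large-sample 5% critical value 1.36·√((n + m)/(nm)). It does not fit λ, C, η or β_c.

```python
import bisect
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = lam**x * exp(-lam) / x!, evaluated in log space to avoid overflow."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def power_law_cutoff(beta: float, c: float, eta: float, beta_c: float) -> float:
    """P(beta) = C * beta**(-eta) * exp(-beta / beta_c), for beta > 0."""
    return c * beta ** (-eta) * math.exp(-beta / beta_c)


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|,
    where F_a and F_b are the empirical cumulative distribution functions."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the maximum is attained at one of the observed values
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def ks_differs(a: list[float], b: list[float], c_alpha: float = 1.36) -> bool:
    """Reject 'same distribution' at roughly the 5% level when D exceeds
    the large-sample critical value c_alpha * sqrt((n + m) / (n * m))."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))
```

With the reported λ = 13.195, `poisson_pmf(x, 13.195)` summed over x = 0…199 is 1 to within rounding. Note that the KS test is exact only for continuous data; applied to discrete counts such as the residue number of links it tends to be conservative.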
A set of 18 protein complexes with experimental information on hot spot residues was obtained by searching the Alanine Scanning Energetics database (ASEdb) (http://www.140.247.111.161/hotspot/index.php) (Thorn and Bogan, 2001). Experimentally measured hot spots of binding free energy were defined as residues for which the change in binding free energy upon mutation is greater than or equal to 1.0 kcal/mol. Some additional data were taken from previous studies of phenylalanine substitutions (Mainfroid et al., 1996).
The conservation of residues in the protein complexes was analyzed using multiple sequence alignments generated by ClustalW (Thompson et al., 1994) from homologous protein sequences obtained from the Swiss-Prot database (http://www.us.expasy.org/sprot/) (Boeckmann et al., 2003).
The accessible surface areas (ASAs) of the protein complexes were determined using the DSSP program (Kabsch and Sander, 1983). Experimental data on hot spot enrichment were obtained from the literature (Bogan and Thorn, 1998).

The protein graphs
The protein structures are modeled as networks in which amino acid residues are the vertices and atom contacts between them are the edges. Two residues i and j are in contact when at least one atom of residue i lies within 5.0 Å of an atom of residue j (Greene and Higman, 2003).
The characteristic path length L is defined as the average minimal distance between all pairs of vertices in the graph:

$$L = \frac{1}{N_p}\sum_{j>i} l_{ij},$$

where $N_p$ is the number of pairs of vertices in the graph and $l_{ij}$ is the length of the shortest path between vertices i and j (Vendruscolo et al., 2002).
The clustering coefficient C is defined as the average, over all vertices, of the fraction of pairs of neighbors of each vertex that are themselves connected:

$$C = \frac{1}{N_v}\sum_{i}\frac{n_i}{N_i(N_i-1)/2},$$

where $N_v$ is the number of vertices, $N_i$ is the number of neighbors of vertex i, and $n_i$ is the actual number of edges between the neighbors of i (Vendruscolo et al., 2002).

Statistical analysis
The frequency distributions of the residue number of links and of betweenness centrality, averaged over both sets of monomers and dimers, were plotted and analyzed using the Systat statistical software package. The Kolmogorov–Smirnov test was used to assess whether the monomer and dimer frequency distributions differ significantly.
Our analysis was carried out on a PC Linux cluster with 40 nodes (dual 3.02 GHz Xeon) and on a Windows PC (3.0 GHz Pentium IV).

DISCUSSION
We start by modeling protein structures as networks (see Systems and Methods). We base our analysis on a representative set of 42 biologically diverse protein complexes (with one or both of their unbound structures available) and find, in agreement with previous studies (Vendruscolo et al., 2002; Dokholyan et al., 2002; Greene and Higman, 2003), that both the dimer and monomer structures exhibit small-world character, judged by comparing their clustering coefficients and characteristic path lengths with those of random and regular graphs with the same number of vertices and average number of neighbors (see Supplementary material). Figure 1a shows the frequency distribution of the residue number of links N, averaged over both sets of monomers and dimers; both distributions are Poisson-like, $P(x) = \lambda^x e^{-\lambda}/x!$ (where $\lambda$ is the average residue number of links), with no statistically significant difference between them. The concept of betweenness centrality used in sociology (Freeman, 1977), defined for each vertex k as the number of pairs of vertices whose shortest path passes through k, normalized by the total number of pairs of vertices, is a good
Small-world networks of protein–protein complex structures
Table 1. Statistically significant high betweenness residues (z-score ≥ 3.0) obtained from the 18 complexes analyzed, and their correlation with hot spots of binding free energy

Protein complex | PDB code and chain identifier | Statistically significant high betweenness residues (z-score ≥ 3) | Clusters (ratio > 0.8)

The types of protein complexes (column 1), with their PDB codes and chain identifiers (column 2), are shown together with their statistically significant high betweenness residues (column 3). The clusters containing statistically significant high betweenness residues and experimentally annotated hot spots are also given for each complex (column 4). The clustering ratio was required to be ≥ 0.8 in each case; it is defined as ratio = $N_e/[N_v(N_v-1)/2]$, where $N_e$ is the number of edges among the residues in the cluster and $N_v$ is the number of residues in the cluster. In columns 3 and 4, green residues are experimentally annotated hot spots and blue residues are statistically significant high betweenness residues for which no experimental information on binding free energy is available. In each cluster, residues occurring in both columns 3 and 4 are shown in bold.
indicator of the centrality of the vertex in the network. The frequency distribution of the residue betweenness centrality β, averaged over both sets of monomers and dimers, follows a power law with an exponential cut-off, $P(\beta) = C\beta^{-\eta}\exp(-\beta/\beta_c)$, with approximately the same values of the power-law scaling exponent $\eta$ and the exponential cut-off $\beta_c$ in the monomer and dimer structures, and with no statistically significant difference between the betweenness centrality distributions in the two cases (Fig. 1b). Unlike the frequency distribution of the residue number of links, the betweenness centrality distribution is highly inhomogeneous: most residues have a small betweenness centrality, while only a few have a large value. This protein representation agrees with the wiring model proposed by Watts and Strogatz (1998), in which short cuts play an important role: they are responsible for the small characteristic path length, while the clustering coefficient remains high.
We study the protein–protein interaction mechanism using this representation of protein structures as small-world networks in order to elucidate some of the important topological changes occurring upon dimerization and to investigate the existence of topological determinants possibly related to key residues in complex stability.
The process of dimerization can be viewed as a particular rewiring (rather than preferential attachment) of the system formed by the two monomers, each corresponding to a small-world network. Owing to conformational changes, links are removed and added within each monomer and new links form between the monomers; nevertheless, the frequency distributions of the residue number of links and of betweenness centrality show no statistically significant difference between the sets of monomers and dimers (see Fig. 2 in Supplementary material). Interestingly, this rewiring produces new central residues (with statistically significant high betweenness, z-score ≥ 3.0), which are not homogeneously distributed but appear mainly at the protein–protein interfaces, while some residues that were central in the monomeric structures lose their centrality in the dimer structure. At the same time, a number of central residues in the monomer structures remain central in the complex (see Fig. 3 in Supplementary material).
Perhaps the most interesting result of this work is the strong correlation between the statistically significant central residues at protein–protein interfaces (topological determinants) and the residues contributing most to the binding free energy in protein–protein
interactions. Experimental results from alanine scanning mutagenesis (Thorn and Bogan, 2001) and phenylalanine substitution (Mainfroid et al., 1996) of protein–protein interfaces have shown that the free energy contribution of individual amino acids to protein–protein binding is not uniformly distributed over the binding site; instead, there are hot spots of binding free energy ($\Delta G \geq 1.0$ kcal/mol) made up of a small subset of residues at the complex interface (Bogan and Thorn, 1998). Our analysis, based on a set of 18 protein complexes with experimental information on hot spot residues and covering different biological examples of protein–protein interactions, shows that the statistically significant high betweenness residues (z-score ≥ 3.0) occurring at the protein–

… order to improve current methods of protein docking. Some initial results in this direction were reported in our recent work (del Sol and O'Meara, 2004), where we show that some central residues in the monomeric structures remain central after dimerization and that information on hot spots of binding free energy could possibly be obtained from the unbound structures. We plan to continue this study in the future.

ACKNOWLEDGEMENTS
We would like to acknowledge interesting discussions on issues related to the small-world view of protein structures with Dr Alfonso
Thorn,K.S. and Bogan,A.A. (2001) ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics, 17, 284–285.
Vendruscolo,M., Dokholyan,N.V., Paci,E. and Karplus,M. (2002) Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E, 65, 061910-1–061910-4.
Verkhivker,G.M., Bouzida,D., Gehlhaar,D.K., Rejto,P.A., Freer,S.T. and Rose,P.W. (2002) Monte Carlo simulations of the peptide recognition at the consensus binding site of the constant fragment of human immunoglobulin G: the energy landscape analysis of a hot spot at the intermolecular interface. Proteins, 48, 539–557.
Watts,D.J. and Strogatz,S.H. (1998) Collective dynamics of small-world networks. Nature (London), 393, 440–442.