Bioinformatics 21 8 1311

Vol. 21 no. 8 2005, pages 1311–1315


BIOINFORMATICS DISCOVERY NOTE doi:10.1093/bioinformatics/bti167

Systems biology

Topology of small-world networks of protein–protein complex structures
Antonio del Sol∗, Hirotomo Fujihashi and Paul O’Meara
Bioinformatics Research Project, Research and Development Division, Fujirebio Inc., 51 Komiya-cho, Hachioji-shi,
Tokyo 192-0031, Japan
Received on September 6, 2004; revised on November 11, 2004; accepted on November 17, 2004

Downloaded from https://academic.oup.com/bioinformatics/article/21/8/1311/249420 by guest on 09 February 2023


Advance Access publication January 19, 2005

ABSTRACT
Most real examples of small-world networks exhibit a power-law distribution of edges among the nodes and therefore do not fit the wiring model proposed by Watts and Strogatz. However, protein structures can be modeled as small-world networks whose distribution of the number of links decays exponentially, as in this wiring model. We approach the protein–protein interaction mechanism by viewing it as a particular rewiring occurring in the system of two small-world networks represented by the monomers, in which links are rearranged upon dimerization while the dimer network keeps its small-world character. Due to this rewiring, the most central residues at the complex interfaces tend to form clusters, which are not homogeneously distributed. We show that these highly central residues are strongly correlated with the presence of hot spots of binding free energy.
Contact: [email protected]
Supplementary information: http://www.fujirebio.co.jp/support/index.php (under construction).
∗To whom correspondence should be addressed.

INTRODUCTION
Networks have become a powerful and useful tool for modeling and understanding the evolution of different complex systems (Kuramoto, 1984; Strogatz and Steward, 1993; Braiman et al., 1995; Gerhardt et al., 1990; Nowak and May, 1992). Although the connection topology is frequently assumed to be either completely random or completely regular (Watts and Strogatz, 1998; Bollabas, 1985), in many cases both of these models give an oversimplified representation of real complex systems. Indeed, many real networks lie somewhere between the extremes of order and randomness with respect to their topological characteristics. This is the case of the so-called small-world network, in which any pair of vertices can be connected through just a few links. The topology of such networks is characterized by large values of the clustering coefficient (as for regular graphs), defined as the average over all vertices of the fraction of pairs of neighbors of each vertex that are themselves connected, and by small values of the characteristic path length (as for random graphs), defined as the average minimal distance between all pairs of vertices in the graph.

The representation of protein structures as small-world networks has recently become an interesting approach to a variety of problems associated with protein function and structure, such as the identification of key residues involved in the protein folding mechanism (Vendruscolo et al., 2002), the correlation between the topological properties of protein conformations and their kinetic ability to fold (Dokholyan et al., 2002; Greene and Higman, 2003), and the identification of functional sites in protein structures (Shemesh et al., 2004).

An interesting application of small-world networks would be the representation of protein–protein complexes as such networks, in order to elucidate structural characteristics associated with the presence of the residues that contribute the most to the binding free energy (hot spots), which are unevenly distributed at the binding interface (Bogan and Thorn, 1998). Although different approaches involving sequence and structural information or energetic calculations have been proposed to study and predict hot spots of binding free energy (Kortemme and Baker, 2002; Sheinerman and Honig, 2002; Verkhivker et al., 2002; Ma et al., 2003; Brinda et al., 2002), the small-world representation of protein–protein complexes could offer a complementary view of this problem.

Here, we show that the protein–protein interaction mechanism can be viewed as a specific rewiring occurring in the system of two small-world networks represented by the monomers, in which links are rearranged upon dimerization while the dimer network keeps its small-world character. Due to this specific rewiring, residue centrality is rearranged, leading to the appearance of a significant percentage of central residues at the protein–protein interface. The analysis of 18 protein complexes with experimentally annotated hot spots of binding free energy shows that the most central residues at the protein–protein interface, which are responsible for the small-world character, are strongly correlated with the presence of hot spots.

SYSTEMS AND METHODS

Datasets
A dataset of 42 dimer complexes, each with at least one monomeric (unbound) structure available, was obtained by searching the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/) (Berman et al., 2000) and the Structural Classification of Proteins (SCOP) database (http://www.scop.berkeley.edu/) (Murzin et al., 1995). Unbound structures were chosen only if their sequence was identical to that of the bound form, with no insertions or deletions. If a complex had more than two structures in the unbound form, the most recently solved structures were used. As a result, a dataset of 58 monomers was compiled.
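The residue networks used throughout the paper (the 5.0 Å contact definition and the formulas for L and C given under "The protein graphs"), together with the clustering ratio used to define the clusters in Table 1, can be illustrated with a short stdlib-only Python sketch. This is a minimal illustration under stated assumptions, not the authors' code: the input format (residue label mapped to a list of atom coordinates), the function names, and the convention that vertices with fewer than two neighbours contribute zero to C are all hypothetical choices; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import deque
from itertools import combinations


def contact_graph(residues, cutoff=5.0):
    """Residue graph: residues i and j are linked if any atom of i lies
    within `cutoff` angstroms of any atom of j.

    `residues` (assumed format) maps a residue label to a list of (x, y, z) atom coordinates.
    """
    labels = list(residues)
    adj = {r: set() for r in labels}
    for a, ri in enumerate(labels):
        for rj in labels[a + 1:]:
            if any(math.dist(p, q) <= cutoff for p in residues[ri] for q in residues[rj]):
                adj[ri].add(rj)
                adj[rj].add(ri)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """L = (1/N_p) * sum_{j>i} l_ij; assumes the graph is connected."""
    nodes = list(adj)
    total, pairs = 0, 0
    for a, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[a + 1:]:
            total += dist[v]  # a KeyError here would signal a disconnected graph
            pairs += 1
    return total / pairs


def clustering_coefficient(adj):
    """C = (1/N_v) * sum_i n_i / [N_i (N_i - 1) / 2].

    Vertices with fewer than two neighbours contribute 0 (assumed convention).
    """
    acc = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            n_i = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
            acc += n_i / (k * (k - 1) / 2)
    return acc / len(adj)


def cluster_ratio(adj, cluster):
    """Table 1 ratio N_e / [N_v (N_v - 1) / 2]: edge density among a set of residues."""
    members = list(cluster)
    n_v = len(members)
    n_e = sum(1 for x, y in combinations(members, 2) if y in adj[x])
    return n_e / (n_v * (n_v - 1) / 2)


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle A1-A2-A3 with a pendant residue A4 on A3.
    toy = {"A1": {"A2", "A3"}, "A2": {"A1", "A3"},
           "A3": {"A1", "A2", "A4"}, "A4": {"A3"}}
    print(characteristic_path_length(toy), clustering_coefficient(toy))
    print(cluster_ratio(toy, ["A1", "A2", "A3"]))
```

The all-pairs atom comparison in `contact_graph` is kept only for clarity; a real analysis would read coordinates from PDB files and would likely use a spatial index. Comparing L and C against random and regular graphs with the same number of vertices and average degree, as described in the Discussion, would reuse the same two functions.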

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
A.d.Sol et al.

[Figure 1: (a) Frequency vs. number of links (0–30); Poisson fits: monomer λ = 13.195, R² = 0.95121; dimer λ = 13.630, R² = 0.96002. (b) Frequency vs. betweenness centrality (log–log); power-law fits with exponential cut-off: monomer η = 0.809, βc = 3.354 × 10⁻², C = 3.124 × 10⁻³, R² = 0.99960; dimer η = 0.870, βc = 3.293 × 10⁻², C = 2.287 × 10⁻³, R² = 0.99975.]

Fig. 1. Frequency distributions of the residue number of links and of betweenness centrality, averaged over the sets of monomers and dimers. (a) Bell-shaped Poisson frequency distribution of the residue number of links averaged over the monomers (pink dots) and dimers (blue dots). The discrete Poisson fit $P(x) = \lambda^{x} e^{-\lambda}/x!$ is shown with the pink and blue lines for the monomers and dimers, respectively. The average residue number of links λ and the squared correlation coefficients R² are shown in the graph. (b) Frequency distribution of betweenness centrality averaged over the monomers (pink dots) and dimers (blue dots). Both distributions follow a power law with an exponential cut-off, $P(\beta) = C\beta^{-\eta}\exp(-\beta/\beta_c)$, shown with the pink and blue lines for the monomers and dimers, respectively. The data are plotted on logarithmic scales, and the power-law scaling exponent η, exponential cut-off βc, constant C and squared correlation coefficients R² for both datasets are shown in the graph. There was no statistically significant difference between the monomer and dimer frequency distributions in either (a) or (b).
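The two fitted forms in the caption, and the betweenness centrality plotted in panel (b) and thresholded at z ≥ 3.0 in Table 1, can be illustrated with a short stdlib-only Python sketch. It is hedged in several ways: the function names are hypothetical; betweenness counts ties between multiple shortest paths fractionally (Brandes' algorithm, a standard way to compute Freeman's measure) and is divided by N(N − 1)/2, which is one reading of "the total number of pairs of vertices"; and the z-score is computed within a single graph using the population standard deviation, since the paper does not state the reference distribution. It does not reproduce the fitting performed in Systat.

```python
import math
import statistics
from collections import deque


def poisson_pmf(x, lam):
    """Discrete Poisson form of Fig. 1a: P(x) = lam**x * exp(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def truncated_power_law(beta, C, eta, beta_c):
    """Form of Fig. 1b: P(beta) = C * beta**(-eta) * exp(-beta / beta_c)."""
    return C * beta ** (-eta) * math.exp(-beta / beta_c)


def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected graph {vertex: set(neighbours)}.

    For each vertex k, sums over unordered pairs {s, t} (s, t != k) the fraction of
    shortest s-t paths passing through k, then divides by N(N-1)/2
    (assumed normalization: the total number of vertex pairs).
    """
    raw = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies, farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                raw[w] += delta[w]
    n = len(adj)
    pairs = n * (n - 1) / 2
    # every unordered pair {s, t} was visited twice (once from s, once from t)
    return {v: r / 2.0 / pairs for v, r in raw.items()}


def high_betweenness(adj, z_min=3.0):
    """Vertices whose betweenness z-score (assumed: within this one graph) is >= z_min."""
    b = betweenness(adj)
    vals = list(b.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        return []
    return sorted(v for v, x in b.items() if (x - mu) / sd >= z_min)


if __name__ == "__main__":
    # Using the fitted monomer lambda from Fig. 1a, the most probable number of links is 13.
    print(max(range(31), key=lambda x: poisson_pmf(x, 13.195)))
    # Hypothetical star graph: only the hub passes the z >= 3 threshold.
    star = {0: set(range(1, 11)), **{i: {0} for i in range(1, 11)}}
    print(high_betweenness(star))
```

For the Poisson form, the maximum-likelihood estimate of λ is simply the mean number of links per residue, which is consistent with the caption's description of λ as the average residue number of links. In the star example the hub's z-score is √10 ≈ 3.16, which shows how a single shortcut-carrying vertex stands out against many low-betweenness vertices, the inhomogeneity seen in panel (b).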

A set of 18 protein complexes with experimental information on hot spot residues was obtained by searching the Alanine Scanning Energetics database (ASEdb) (http://www.140.247.111.161/hotspot/index.php) (Thorn and Bogan, 2001). Experimentally measured hot spots of binding free energy were defined as residues with a change in binding free energy greater than or equal to 1.0 kcal/mol. Some additional data were taken from previous phenylalanine substitution studies (Mainfroid et al., 1996).
The conservation of residues in the protein complexes was analyzed based on multiple sequence alignments generated by ClustalW (Thompson et al., 1994), using homologous protein sequences obtained from the Swiss-Prot database (http://www.us.expasy.org/sprot/) (Boeckmann et al., 2003).
The accessible surface areas (ASAs) of the protein complexes were determined using the DSSP program (Kabsch and Sander, 1983). Experimental enrichment of hot spot information was obtained from the literature (Bogan and Thorn, 1998).

The protein graphs
The protein structures are modeled as networks with amino acid residues as the vertices and the atom contacts between them as the edges. Two residues i and j are in contact if at least one atom of residue i lies within 5.0 Å of an atom of residue j (Greene and Higman, 2003).
The characteristic path length L is defined as the average minimal distance between all pairs of vertices in the graph:
$$L = \frac{1}{N_p}\sum_{j>i} l_{ij},$$
where N_p is the number of pairs of vertices in the graph and l_ij is the length of the minimal path between vertices i and j (Vendruscolo et al., 2002).
The clustering coefficient C is defined as the average over all vertices of the fraction of pairs of neighbors of each vertex that are connected to each other:
$$C = \frac{1}{N_v}\sum_{i} \frac{n_i}{N_i(N_i-1)/2},$$
where N_v is the number of vertices, N_i is the number of neighbors of vertex i, and n_i is the actual number of edges between the neighbors of i (Vendruscolo et al., 2002).

Statistical analysis
The frequency distributions of the residue number of links and of betweenness centrality, averaged over the sets of monomers and dimers, were plotted and analyzed using the Systat statistical software package. The Kolmogorov–Smirnov test was used to assess whether the monomer and dimer frequency distributions differ significantly.
Our analysis was carried out on a PC Linux cluster with 40 nodes (dual 3.02 GHz Xeon), and on a Windows PC (3.0 GHz Pentium IV).

DISCUSSION
We start by modeling protein structures as networks (see Systems and Methods). We base our analysis on a representative set of 42 biologically diverse protein complexes (with one or both of their unbound structures available) and find, in agreement with previous studies (Vendruscolo et al., 2002; Dokholyan et al., 2002; Greene and Higman, 2003), that both the dimer and the monomer structures exhibit small-world character according to their clustering coefficients and characteristic path lengths, compared with random and regular graphs with the same number of vertices and average number of neighbors (see Supplementary material). Figure 1a shows the frequency distribution of the residue number of links N averaged over the sets of monomers and dimers: both distributions are Poisson-like, $P(x) = \lambda^{x} e^{-\lambda}/x!$ (where λ is the average residue number of links), with no statistically significant difference between them. The concept of betweenness centrality used in sociology (Freeman, 1977), defined for each vertex k as the number of pairs of vertices whose shortest path passes through k, normalized by the total number of pairs of vertices, is a good indicator of the centrality of the vertex in the network. The frequency distribution of the residue betweenness centrality β averaged over the sets of monomers and dimers follows a power law with an exponential cut-off, $P(\beta) = C\beta^{-\eta}\exp(-\beta/\beta_c)$, with approximately the same values of the power-law scaling exponent η and the exponential cut-off βc in the monomer and dimer structures, and no statistically significant difference between the betweenness centrality distributions in the two cases (Fig. 1b). Unlike the frequency distribution of the residue number of links, the betweenness centrality distribution is quite inhomogeneous: a large number of residues have small betweenness centrality values, while only a few residues have large values. This protein representation is in agreement with the wiring model proposed by Watts and Strogatz (1998), in which shortcuts play an important role: they are responsible for the small values of the characteristic path length, while the clustering coefficient values remain high.

We study the protein–protein interaction mechanism using this representation of protein structures as small-world networks in order to elucidate some of the important topological changes occurring upon dimerization and the existence of topological determinants possibly related to residues that are key for complex stability.

The process of dimerization can be viewed as a particular rewiring (rather than preferential attachment) in the system of the two monomers, each corresponding to a small-world network. Owing to conformational changes, links are removed and added within each monomer and new links form between the monomers; nevertheless, the frequency distributions of the residue number of links and of betweenness centrality show no statistically significant difference between the sets of monomers and dimers (see Fig. 2 in Supplementary material). Interestingly, as a result of this rewiring, new central residues (with statistically significant high betweenness values, z-score ≥ 3.0), which are not homogeneously distributed, appear mainly at the protein–protein interfaces, while other residues that were central in the monomeric structures lose their centrality in the dimer structure. Conversely, a number of central residues in the monomer structures remain central in the complex (see Fig. 3 in Supplementary material).

Table 1. Statistically significant high betweenness (z-score ≥ 3.0) residues obtained from the 18 complexes analyzed, and their correlation with hot spots of binding free energy

| Protein complex | PDB code and chain identifiers | Statistically significant high betweenness residues (z-score ≥ 3) | Clusters (ratio > 0.8) |
|---|---|---|---|
| Hormone/receptor | 1a22AB | 18A, 178A, 365B | [18A] [175A,178A,365B,369B] |
| Enzyme/inhibitor | 1a4yAB | 33A, 63A, 150A, 27B, 31B, 41B, 89B, 93B | [33A,63A,31B] [150A,27B,93B] [263A,93B] [318A,375A,89B] [434A,41B] [27B,31B] |
| Enzyme/inhibitor | 1brsAD | 27A, 73A, 38D, 39D | [27A,73A,38D,39D] |
| Immune system protein | 1bxiAB | 30A, 55A | [30A,33A,34A,37A] [50A,51A,54A,55A,56A] |
| Enzyme/inhibitor | 1cbwCD | 15D, 17D | [15D,17D] |
| Immune system protein, receptor | 1cdcBA | 29A, 31A, 32A, 33A, 16B, 31B, 32B | [29A,31A,81A,29B,31B] [31A,32A,33A,38A,81A] [29A,31B,32B,38B] [16B,32B] |
| Enzyme/inhibitor | 1dfjEI | 146I | [146I,202I] |
| Antibody/antigen | 1fccAC | 28C | [27C,28C,31C,43C] |
| Antibody/antigen | 1fvcAB | 36A, 38A, 89A, 37B, 39B | [36A,89A,91A,105B] [38A,37B,39B,95B] |
| Antibody/antigen | 1gc1CG | 29C, 43C, 46C | [29C,81C] [29C,85C] [43C,44C,59C] [44C,46C] |
| Cytokine | 1il8AB | 25A, 25B | [25A,27A,25B,27B] |
| Toxin/receptor | 1jckCD | 26D, 60D, 176D | [55C,20D,23D,176D] [23D,26D,90D,210D] [26D,60D,90D,210D] |
| Hydrolase | 1pp2RL | 31L, 31R | [5L,9L,31R] [31L,5R,9R] |
| Isomerase | 1ypiAB | 12A, 64A, 77A, 82A, 98A, 12B, 46B, 77B, 98B | [12A,64A,77A,98A,77B,98B] [82A,12B] [82A,46B] |
| Enzyme/inhibitor | 2ptcEI | 15I, 17I, 19I | [15I,17I] [17I,19I] |
| Antibody/antigen | 3hfmHY | 58H | [58H] |
| Hormone/receptor | 3hhrAB | 21A, 178A, 164B, 165B | [21A,172A] [64A,42B,43B,44B,164B,169B] [64A,43B,44B,103B,164B,169B] [175A,176A,178A,104B,169B] [178A,164B,165B,169B] |
| Cytokine | 3inkCD | 43D, 45D, 68D | [42D,43D] [42D,68D] [43D,45D] |

The types of protein complexes (column 1) are shown with their PDB codes and chain identifiers (column 2) and their respective statistically significant high betweenness residues (column 3). The clusters containing statistically significant high betweenness residues and experimentally annotated hot spots are listed for each complex (column 4). The clustering ratio was required to be ≥0.8 in each case; it is defined as $\mathrm{ratio} = N_e/[N_v(N_v-1)/2]$, where N_e is the number of edges among residues in the cluster and N_v is the number of residues in the cluster. In columns 3 and 4, green residues represent experimentally annotated hot spots and blue residues represent statistically significant high betweenness residues for which no experimental information on binding free energy is available. In each cluster, residues occurring in both columns 3 and 4 are shown in bold.

Perhaps the most interesting result of this work is the strong correlation between the statistically significant central residues at protein–protein interfaces (topological determinants) and the residues contributing the most to the binding free energy in protein–protein
interactions. Experimental results based on alanine scanning mutagenesis (Thorn and Bogan, 2001) and phenylalanine substitution (Mainfroid et al., 1996) of protein–protein interfaces have shown that the free energy contribution of individual amino acids in protein–protein binding is not uniformly distributed over the binding site; instead, there are hot spots of binding free energy (ΔG ≥ 1.0 kcal/mol) comprising a small subset of residues at the complex interface (Bogan and Thorn, 1998). Our analysis of 18 protein complexes with experimental information on hot spot residues, covering different biological examples of protein–protein interactions, shows that the statistically significant high betweenness residues (z-score ≥ 3.0) at protein–protein interfaces are not uniformly distributed but instead cluster together, surrounded by regions of residues with relatively low betweenness centrality, resembling the distribution of binding free energy described above. More detailed analysis reveals a clear tendency of the statistically significant high betweenness residues to be located in hot spot regions, with the experimentally annotated hot spots exhibiting statistically significant high betweenness values in the majority of cases. Table 1 shows that in the 18 complexes analyzed, 81% of these central residues form clusters with an experimentally annotated hot spot at the cluster center, with 22 of these statistically significant high betweenness residues being actual hot spots (see Fig. 4 in Supplementary material).

The remaining 19% of our predicted residues occur mainly in those protein complexes with little experimental information on hot spot residues, such as the enzyme/inhibitor complex 2ptcEI, which contains only one experimentally annotated hot spot of binding free energy. Nevertheless, these residues tend to cluster together, are highly correlated with the experimental data on hot spot enrichment, and are generally conserved in the sequence alignments or not exposed to the solvent in the dimer structure, indicating that many of them are hot spot candidates.

Despite the complexity of the real physical interaction networks in protein structures, our simple network representation provides some insight into this complicated picture. Indeed, using only one network topology characteristic (betweenness centrality), we are able to identify hot spot regions at protein–protein interfaces, taking into account the global topology of the complex while keeping the method simple. This simplicity, combined with the reduced computational requirements, is a clear advantage of our method over previous physical models proposed to identify hot spots of binding free energy (Kortemme and Baker, 2002; Sheinerman and Honig, 2002). On the other hand, comparison with the graph-spectral method proposed by Brinda et al. (2002), which includes additional information such as residue solvent accessibility and sequence conservation, shows that betweenness centrality turns out to be a better and simpler predictor of hot spot regions. It is possible that the correspondence between energy hot spots and structurally conserved residues noted by Ma et al. (2003) is related to the tendency of energy hot spots to remain central in the interacting network.

Finally, we note that a graph-theoretical representation similar to ours has been proposed by Shemesh et al. (2004) for identifying functional sites in protein structures. These authors reported that the most central residues in protein structure networks are found in functional sites (catalytic or ligand binding sites). Although their measure of centrality differs from our definition of betweenness centrality, it would be interesting to explore the possibility of using residue centrality information from the monomeric structures in order to improve current protein docking methods. Some initial results in this direction were reported in our recent work (del Sol and O’Meara, 2004), where we show that some central residues in the monomeric structures remain central after dimerization and that information on hot spots of binding free energy could possibly be obtained from the unbound structures. We plan to continue this study in the future.

ACKNOWLEDGEMENTS
We would like to acknowledge interesting discussions on issues related to the small-world view of protein structures with Dr Alfonso Valencia (CNB), and thank Professor Ruth Nussinov (NCI, Tel Aviv University) for helpful discussions on protein–protein interactions.

REFERENCES
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.-C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O’Donovan,C., Phan,I., Pilbout,S. and Schneider,M. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
Bogan,A.A. and Thorn,K.S. (1998) Anatomy of hot spots in protein interfaces. J. Mol. Biol., 280, 1–9.
Bollabas,B. (1985) Random Graphs. Academic Press, London.
Braiman,Y., Lindner,J.F. and Ditto,W.L. (1995) Taming spatiotemporal chaos with disorder. Nature, 378, 465–467.
Brinda,K.V., Kannan,N. and Vishveshwara,S. (2002) Protein Eng., 15, 265–277.
del Sol,A. and O’Meara,P. (2004) Small-world network approach to identify key residues in protein–protein interaction. Proteins, 58, 672–682.
Dokholyan,N.V., Li,L., Ding,F. and Shakhnovich,E.I. (2002) Topological determinants of protein folding. Proc. Natl Acad. Sci. USA, 99, 8637–8641.
Freeman,L.C. (1977) A set of measures of centrality based on betweenness. Sociometry, 40, 35–43.
Gerhardt,M., Schuster,H. and Tyson,J.J. (1990) A cellular automaton model of excitable media including curvature and dispersion. Science, 247, 1563–1566.
Greene,L.H. and Higman,V.A. (2003) Uncovering network systems within protein structures. J. Mol. Biol., 334, 781–791.
Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
Kortemme,T. and Baker,D. (2002) A simple model for binding free energy hot spots in protein–protein complexes. Proc. Natl Acad. Sci. USA, 99, 14116–14121.
Kuramoto,Y. (1984) Chemical oscillation. In Waves and Turbulence. Springer, Berlin.
Ma,B., Elkayam,T., Wolfson,H. and Nussinov,R. (2003) Protein–protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc. Natl Acad. Sci. USA, 100, 5772–5777.
Mainfroid,V., Mande,S.C., Hol,W.G., Martial,J.A. and Goraj,K. (1996) Stabilization of human triosephosphate isomerase by improvement of the stability of individual alpha-helices in dimeric as well as monomeric forms of the protein. Biochemistry, 35, 4110–4117.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
Nowak,M.A. and May,R.M. (1992) Evolutionary games and spatial chaos. Nature, 359, 826–829.
Sheinerman,F.B. and Honig,B. (2002) On the role of electrostatic interactions in the design of protein–protein interfaces. J. Mol. Biol., 318, 161–177.
Shemesh,A., Amitai,G., Sitbon,E., Shklar,M., Netanely,D., Venger,I. and Pietrokovski,S. (2004) Structural analysis of residue interaction graphs. The First Structural Bioinformatics Meeting, ISMB/ECCB2004, pp. 22–23.
Strogatz,S.H. and Steward,I. (1993) Coupled oscillators and biological synchronization. Sci. Am., 269, 102–109.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
Thorn,K.S. and Bogan,A.A. (2001) ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics, 17, 284–285.
Vendruscolo,M., Dokholyan,N.V., Paci,E. and Karplus,M. (2002) Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E, 65, 061910-1–061910-4.
Verkhivker,G.M., Bouzida,D., Gehlhaar,D.K., Rejto,P.A., Freer,S.T. and Rose,P.W. (2002) Monte Carlo simulations of the peptide recognition at the consensus binding site of the constant fragment of human immunoglobulin G: the energy landscape analysis of a hot spot at the intermolecular interface. Proteins, 48, 539–557.
Watts,D.J. and Strogatz,S.H. (1998) Collective dynamics of small-world networks. Nature (London), 393, 440–442.
