Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models
Stein RR, Marks DS, Sander C (2015) PLoS Comput Biol 11(7): e1004182. doi:10.1371/journal.pcbi.1004182
Abstract
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene–gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
Introduction

Modern high-throughput techniques allow for the quantitative analysis of various components of the cell. This ability opens the door to analyzing and understanding complex interaction patterns of cellular regulation, organization, and evolution. In the last few years, undirected pairwise maximum-entropy probability models have been introduced to analyze biological data and have performed well, disentangling direct interactions from artifacts introduced by intermediates or spurious coupling effects. Their performance has been studied for diverse problems, such as gene network inference [1,2], analysis of neural populations [3,4], protein contact prediction [5–8], analysis of a text corpus [9], modeling of animal flocks [10], and prediction of multidrug effects [11]. Statistical inference methods using partial correlations in the context of graphical Gaussian models (GGMs) have led to similar results and provide a more intuitive understanding of direct versus indirect interactions by employing the concept of conditional independence [12,13].

Our goal here is to derive a unified framework for pairwise maximum-entropy probability models for continuous and categorical variables and to discuss some of the recent inference approaches presented in the field of protein contact prediction. The structure of the manuscript is as follows: (1) introduction and statement of the problem, (2) deriving the probabilistic
defined as,
$$ r_{ij} := \frac{\hat{C}_{ij}}{\sqrt{\hat{C}_{ii}\,\hat{C}_{jj}}}, $$
where $\hat{C}_{ij} := \frac{1}{M}\sum_{m=1}^{M}\big(x_i^m - \bar{x}_i\big)\big(x_j^m - \bar{x}_j\big)$ denotes the (i, j)-element of the empirical covariance matrix $\hat{C} = (\hat{C}_{ij})_{i,j=1,\dots,L}$. The sample mean operator provides the empirical mean of the measured data and is defined as $\bar{x}_i := \frac{1}{M}\sum_{m=1}^{M} x_i^m$. A simple way to characterize dependencies
in data is to classify two variables as being dependent if the absolute value of their correlation coefficient is above a certain threshold (and independent otherwise) and then use those pairs to draw a so-called relevance network [14]. However, the Pearson correlation is a misleading measure for direct dependence as it only reflects the association between two variables while ignoring the influence of the remaining ones. Therefore, the relevance network approach is not suitable to deduce direct interactions from a dataset [15–18]. The partial correlation between two variables removes the variational effect due to the influence of the remaining variables (Cramér [19], p. 306). To illustrate this, let us take a simplified example with three random variables $x_A$, $x_B$, $x_C$. Without loss of generality, we can scale each of these variables to zero mean and unit standard deviation by $x_i \mapsto (x_i - \bar{x}_i)/\sqrt{\hat{C}_{ii}}$, which simplifies the correlation coefficient to $r_{ij} \equiv \overline{x_i x_j}$. The sample partial correlation coefficient of a three-variable system between $x_A$ and $x_B$ given $x_C$ is then defined as [19,20]
$$ r_{AB\cdot C} := \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{\big(1 - r_{AC}^2\big)\big(1 - r_{BC}^2\big)}} = -\frac{(\hat{C}^{-1})_{AB}}{\sqrt{(\hat{C}^{-1})_{AA}\,(\hat{C}^{-1})_{BB}}}. $$
The latter equivalence by Cramér's rule holds if the empirical covariance matrix, $\hat{C} = (\hat{C}_{ij})_{i,j\in\{A,B,C\}}$, is invertible. Krumsiek et al. [21] studied Pearson correlations and partial correlations on simulated reaction networks. A comparison of the Pearson correlations, $r_{AB}$, $r_{AC}$, $r_{BC}$, with the corresponding partial correlations, $r_{AB\cdot C}$, $r_{AC\cdot B}$, $r_{BC\cdot A}$, shows that variables A and C appear to be correlated when using Pearson's correlation as a dependency measure since both are highly correlated with variable B, which results in a falsely inferred interaction $r_{AC}$. The strength of the incorrectly inferred interaction can be numerically large and therefore particularly misleading if there are multiple intermediate variables B [22].
The partial correlation analysis removes the effect of the mediating variable(s) B and correctly
recovers the underlying interaction structure. This is always true for variables following a mul-
tivariate Gaussian distribution, but also seems to work empirically on realistic systems as
Krumsiek et al. [21] have shown for more complex reaction structures than the example pre-
sented here.
Although results did show promise, an important improvement was made years later by using a maximum-entropy approach on the same setup [5–7,30]. In this framework, the direct information of residues i and j was introduced by replacing $f_{ij}$ in the mutual information by $P_{ij}^{dir}$,
$$ DI_{ij} = \sum_{\sigma,\omega} P_{ij}^{dir}(\sigma,\omega)\, \ln\!\left(\frac{P_{ij}^{dir}(\sigma,\omega)}{f_i(\sigma)\, f_j(\omega)}\right), \qquad (1) $$
where $P_{ij}^{dir}(\sigma,\omega) = \frac{1}{Z_{ij}}\exp\!\big(e_{ij}(\sigma,\omega) + \tilde{h}_i(\sigma) + \tilde{h}_j(\omega)\big)$, and $\tilde{h}_i(\sigma)$, $\tilde{h}_j(\omega)$, and $Z_{ij}$ are chosen such that $P_{ij}^{dir}$, which is based on a pairwise probability model of an amino acid sequence compatible with the iso-structural sequence family, is consistent with the single-site frequency counts. In an approximate solution, [6,7] determined the contact strength between the amino acids σ and ω in positions i and j, respectively, by
$$ e_{ij}(\sigma,\omega) \simeq -\big(C^{-1}(\sigma,\omega)\big)_{ij}. \qquad (2) $$
Here, $(C^{-1}(\sigma,\omega))_{ij}$ denotes the element of the inverse corresponding to $C_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) - f_i(\sigma)\, f_j(\omega)$ for amino acids σ, ω from a subset of 20 out of the 21 different states (the so-called gauge fixing, see below). The comparison of contact prediction results based on the MI- and DI-scores for the human RAS protein, mapped onto the actual crystal structure, shows a much more accurate prediction when using the direct information instead of the mutual information (Fig 1B).
The next section lays the foundation for deriving maximum-entropy models for the two data types: continuous, as used in the first example, and categorical, as used in the second one. Subsequently, we will present inference techniques to solve for their interaction parameters.
To begin, the probability distribution P(x) is required to be normalized,
$$ \langle 1 \rangle = \int_x P(x)\, dx = 1, \qquad (3) $$
which is a natural requirement on any probability distribution. Additionally, the first moment of variable $x_i$ is supposed to match the value of the corresponding sample mean over M measurements in each i = 1, …, L,
$$ \langle x_i \rangle = \int_x P(x)\, x_i\, dx = \frac{1}{M}\sum_{m=1}^{M} x_i^m = \bar{x}_i, \qquad (4) $$
where we define the n-th moment of the random variable $x_i$ distributed by the multivariate probability distribution P as $\langle x_i^n \rangle := \int_x P(x)\, x_i^n\, dx$. Analogously, the second moment of the variables $x_i$ and $x_j$ and its corresponding empirical expectation are supposed to be equal,
$$ \langle x_i x_j \rangle = \int_x P(x)\, x_i x_j\, dx = \frac{1}{M}\sum_{m=1}^{M} x_i^m x_j^m = \overline{x_i x_j} \qquad (5) $$
for i, j = 1, …, L. Taken together, Eqs 4 and 5 constrain the distribution's covariance matrix to be consistent with the empirical covariance matrix. Finally, the probability distribution should maximize the information entropy,
$$ \text{maximize}\quad S = -\int_x P(x) \ln P(x)\, dx \qquad (6) $$
with the natural logarithm ln. A well-known analytical strategy for finding functional extrema under equality constraints is the method of Lagrange multipliers [32], which converts a constrained optimization problem into an unconstrained one by means of the Lagrangian $\mathcal{L}$. In our case, the probability distribution maximizing the entropy (Eq 6) subject to Eqs 3–5 is found as the stationary point of the Lagrangian $\mathcal{L} = \mathcal{L}(P(x), \alpha, \beta, \gamma)$ [33,34],
$$ \mathcal{L} = S + \alpha\big(\langle 1 \rangle - 1\big) + \sum_{i=1}^{L} \beta_i \big(\langle x_i \rangle - \bar{x}_i\big) + \sum_{i,j=1}^{L} \gamma_{ij} \big(\langle x_i x_j \rangle - \overline{x_i x_j}\big). \qquad (7) $$
The real-valued Lagrange multipliers $\alpha$, $\beta = (\beta_i)_{i=1,\dots,L}$, and $\gamma = (\gamma_{ij})_{i,j=1,\dots,L}$ correspond to the constraints Eqs 3, 4, and 5, respectively. The maximizing probability distribution is then found by setting the functional derivative of $\mathcal{L}$ with respect to the unknown density P(x) to zero [33,35],
$$ \frac{\delta \mathcal{L}}{\delta P(x)} = 0 \quad\Rightarrow\quad -\ln P(x) - 1 + \alpha + \sum_{i=1}^{L} \beta_i x_i + \sum_{i,j=1}^{L} \gamma_{ij}\, x_i x_j = 0. $$
Solving this equation for P(x) yields
$$ P(x; \beta, \gamma) = e^{-1+\alpha} \exp\!\Big(\sum_{i=1}^{L}\beta_i x_i + \sum_{i,j=1}^{L}\gamma_{ij}\, x_i x_j\Big) = \frac{1}{Z(\beta,\gamma)}\, e^{-\mathcal{H}(x)}, \qquad (8) $$
which is contained in the family of exponential probability distributions and assigns a non-negative probability to any system configuration $x = (x_1,\dots,x_L)^T \in \mathbb{R}^L$. For the second identity, we introduced the partition function as normalization constant,
$$ Z(\beta,\gamma) := \int_x \exp\!\Big(\sum_{i=1}^{L}\beta_i x_i + \sum_{i,j=1}^{L}\gamma_{ij}\, x_i x_j\Big)\, dx \equiv \exp(1-\alpha), $$
with the Hamiltonian $\mathcal{H}(x) := -\sum_{i=1}^{L}\beta_i x_i - \sum_{i,j=1}^{L}\gamma_{ij}\, x_i x_j$. It can be shown by means of the information inequality that Eq 8 is the unique maximum-entropy distribution satisfying the constraints Eqs 3–5 (Cover and Thomas [35], p. 410). Note that α is fully determined for given $\beta = (\beta_i)$ and $\gamma = (\gamma_{ij})$ by the normalization constraint Eq 3 and is therefore not a free parameter. The right-hand representation of Eq 8 is also referred to as the Boltzmann distribution. The matrix of Lagrange multipliers $\gamma = (\gamma_{ij})$ has to have full rank in order to ensure a unique parametrization of P(x); otherwise, one can eliminate dependent constraints [33,36]. In addition, for the integrals in Eqs 3–6 to converge with respect to the L-dimensional Lebesgue measure, we require γ to be negative definite, i.e., all of its eigenvalues to be negative or $\sum_{i,j}\gamma_{ij}\, x_i x_j = x^T \gamma\, x < 0$ for $x \neq 0$.
we use a so-called multiple sequence alignment, $\{x^1,\dots,x^M\} \in \Omega^{L\times M}$, a collection of closely homologous protein sequences that is formatted such that it allows comparison of the evolution at each residue position [44]. These alignments may stem from different hidden Markov model-derived resources, such as PFAM [45], HHblits [46], and Jackhmmer [47].
To formalize the derivation of the pairwise maximum-entropy probability distribution on categorical variables, we use the approach of [8,30,48] and replace, as depicted in Fig 2, each categorical variable $x_i$ by an indicator function of the amino acid σ ∈ Ω, $\mathbb{1}_\sigma: \Omega \to \{0,1\}^q$,
$$ x_i \mapsto x_i(\sigma):\quad \mathbb{1}_\sigma(x_i) = \begin{cases} 1 & \text{if } x_i = \sigma, \\ 0 & \text{otherwise.} \end{cases} $$
Inserting this embedding into the first and second moment constraints, corresponding to Eqs 4 and 5 in the continuous variable case, we find their embedded analogues, the single and pairwise marginal probabilities in positions i and j for amino acids σ, ω ∈ Ω,
$$ \langle x_i(\sigma) \rangle = \sum_{x(\boldsymbol{\sigma})} P(x(\boldsymbol{\sigma}))\, x_i(\sigma) = P(x_i = \sigma) = P_i(\sigma), $$
$$ \langle x_i(\sigma)\, x_j(\omega) \rangle = \sum_{x(\boldsymbol{\sigma})} P(x(\boldsymbol{\sigma}))\, x_i(\sigma)\, x_j(\omega) = P(x_i = \sigma,\, x_j = \omega) = P_{ij}(\sigma,\omega), $$
including $P_{ii}(\sigma,\omega) = P_i(\sigma)\,\mathbb{1}_\sigma(\omega)$ and with the distribution's first moment in each random variable, $\langle y_i \rangle = \sum_y P(y)\, y_i$ and $y = (y_1,\dots,y_{Lq})^T \in \mathbb{R}^{Lq}$. The analogue of the covariance matrix then becomes a symmetric Lq × Lq matrix of connected correlations whose entries $C_{ij}(\sigma,\omega) = P_{ij}(\sigma,\omega) - P_i(\sigma)\, P_j(\omega)$ characterize the dependencies between pairs of variables. In the same way, the sample means of the embedded variables yield the empirical single and pair frequency counts,
$$ \overline{x_i(\sigma)} = \frac{1}{M}\sum_{m=1}^{M} x_i^m(\sigma) = f_i(\sigma), \qquad \overline{x_i(\sigma)\, x_j(\omega)} = \frac{1}{M}\sum_{m=1}^{M} x_i^m(\sigma)\, x_j^m(\omega) = f_{ij}(\sigma,\omega). $$
Furthermore, the single and pair constraints, the analogues of Eqs 4 and 5, force the resulting probability distribution to be compatible with the measured single and pair frequency counts,
$$ P_i(\sigma) = f_i(\sigma), \qquad P_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) \qquad (10) $$
for each i, j = 1, …, L and amino acids σ, ω ∈ Ω. As before, we require the probability distribution to maximize the information entropy,
$$ \text{maximize}\quad S = -\sum_x P(x) \ln P(x) = -\sum_{x(\boldsymbol{\sigma})} P(x(\boldsymbol{\sigma})) \ln P(x(\boldsymbol{\sigma})). \qquad (11) $$
The corresponding Lagrangian, $\mathcal{L} = \mathcal{L}(P(x(\boldsymbol{\sigma})), \alpha, \beta(\boldsymbol{\sigma}), \gamma(\boldsymbol{\sigma},\boldsymbol{\omega}))$, has the functional form
$$ \mathcal{L} = S + \alpha\big(\langle 1 \rangle - 1\big) + \sum_{i=1}^{L}\sum_{\sigma\in\Omega} \beta_i(\sigma)\big(P_i(\sigma) - f_i(\sigma)\big) + \sum_{i,j=1}^{L}\sum_{\sigma,\omega\in\Omega} \gamma_{ij}(\sigma,\omega)\big(P_{ij}(\sigma,\omega) - f_{ij}(\sigma,\omega)\big). $$
For notational convenience, the Lagrange multipliers $\beta_i(\sigma)$ and $\gamma_{ij}(\sigma,\omega)$ are grouped into the Lq-vector $\beta(\boldsymbol{\sigma}) = (\beta_i(\sigma))_{i=1,\dots,L;\,\sigma\in\Omega}$ and the Lq × Lq matrix $\gamma(\boldsymbol{\sigma},\boldsymbol{\omega}) = (\gamma_{ij}(\sigma,\omega))_{i,j=1,\dots,L;\,\sigma,\omega\in\Omega}$, respectively. The Lagrangian's stationary point, found as the solution of $\partial\mathcal{L}/\partial P(x(\boldsymbol{\sigma})) = 0$, determines the pairwise maximum-entropy probability distribution in categorical variables [30,49],
$$ P(x(\boldsymbol{\sigma}); \beta, \gamma) = \frac{1}{Z} \exp\!\Big(\sum_{i=1}^{L}\sum_{\sigma\in\Omega} \beta_i(\sigma)\, x_i(\sigma) + \sum_{i,j=1}^{L}\sum_{\sigma,\omega\in\Omega} \gamma_{ij}(\sigma,\omega)\, x_i(\sigma)\, x_j(\omega)\Big). \qquad (12) $$
Network interpretation
The derived pairwise maximum-entropy distributions (Eqs 13 or 12, and 8) specify an undirected graphical model or Markov random field [34,41]. In particular, a graphical model represents a probability distribution in terms of a graph that consists of a node set and an edge set. Edges characterize the dependence structure between nodes; a missing edge then corresponds to conditional independence given the remaining random variables. For continuous, real-valued variables, the maximum-entropy distribution with first and second moment constraints is multivariate Gaussian, as will be demonstrated in the next section. Its dependency structure is represented by a graphical Gaussian model (GGM) in which a missing edge, $\gamma_{ij} = 0$, corresponds to conditional independence between the random variables $x_i$ and $x_j$ (given the remaining ones) and is equivalently specified by a zero entry in the corresponding inverse covariance matrix, $(C^{-1})_{ij} = 0$.
In the next section, we describe how the dependency structure of the graph is inferred.
Inference of Interactions
Up to this point, the functional form of the maximum-entropy probability distribution is specified, but not its determining parameters. For categorical variables with dimension L > 1, there is typically no closed-form solution. In the following section, we present several inference approaches.

For continuous variables, the maximum-entropy distribution Eq 8 can be rewritten as
$$ P(x) = \frac{1}{Z}\exp\!\big(\beta^T x + x^T \gamma\, x\big) = \frac{1}{Z}\exp\!\big(\beta^T x - \tfrac{1}{2}\, x^T \tilde{\gamma}\, x\big), $$
where we use the replacement $\tilde{\gamma} := -2\gamma$ and require $\tilde{\gamma}$ to be positive definite (which is equivalent to γ being negative definite), i.e., $x^T \tilde{\gamma}\, x > 0$ for any $x \neq 0$, which makes its inverse $\tilde{\gamma}^{-1} = -\frac{1}{2}\gamma^{-1}$ well-defined. As already discussed, this is a sufficient condition for the integrals in Eqs 3–6 to be finite. For notational convenience, we define the shifted variable $z = (z_1,\dots,z_L)^T := x - \tilde{\gamma}^{-1}\beta$, or $x_i = z_i + \sum_{j=1}^{L}(\tilde{\gamma}^{-1})_{ij}\beta_j$, and accordingly, the maximum-entropy probability distribution becomes
$$ P(x) = \frac{1}{\tilde{Z}} \exp\!\Big(\!-\tfrac{1}{2}\big(x - \tilde{\gamma}^{-1}\beta\big)^T \tilde{\gamma}\,\big(x - \tilde{\gamma}^{-1}\beta\big)\Big) = \frac{1}{\tilde{Z}}\, e^{-\frac{1}{2} z^T \tilde{\gamma}\, z} \qquad (14) $$
with the normalization constant $\tilde{Z} = \exp\!\big(1 - \alpha - \tfrac{1}{2}\beta^T \tilde{\gamma}^{-1}\beta\big)$. The normalization condition Eq 3 in the new variable is
$$ 1 = \int_x P(x)\, dx = \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma}\, z}\, dz, \qquad (15) $$
and the linear shift does not affect the integral when integrated over $\mathbb{R}^L$, yielding for the normalization constant $\tilde{Z} = \int_z e^{-\frac{1}{2} z^T \tilde{\gamma}\, z}\, dz$. Furthermore, the first-moment constraint Eq 4 becomes, for each i = 1, …, L,
$$ \langle x_i \rangle = \int_x P(x)\, x_i\, dx = \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma}\, z} \Big(z_i + \sum_{j=1}^{L}(\tilde{\gamma}^{-1})_{ij}\beta_j\Big)\, dz = \sum_{j=1}^{L}(\tilde{\gamma}^{-1})_{ij}\beta_j, $$
where we used the point symmetry of the integrand, $\int_z e^{-\frac{1}{2} z^T \tilde{\gamma}\, z}\, z_i\, dz = 0$, in each i = 1, …, L. Analogously, we find for the second moment, determining the correlations for each index pair i, j = 1, …, L,
$$ \langle x_i x_j \rangle = \int_x P(x)\, x_i x_j\, dx = \frac{1}{\tilde{Z}} \int_z e^{-\frac{1}{2} z^T \tilde{\gamma}\, z}\, \big(z_i + \langle x_i \rangle\big)\big(z_j + \langle x_j \rangle\big)\, dz = \langle z_i z_j \rangle + \langle x_i \rangle \langle x_j \rangle, $$
where we use again the point symmetry and the result on the normalization constraint. Based on Eqs 4 and 5, the covariance of $x_i$ and $x_j$ is thus given by the connected correlation, $C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle = \langle z_i z_j \rangle$.
Finally, the term $\langle z_i z_j \rangle$ is computed using a spectral decomposition of the symmetric, positive-definite matrix $\tilde{\gamma}$ as a sum over products of its eigenvectors $v_1,\dots,v_L$ and real-valued, positive eigenvalues $\lambda_1,\dots,\lambda_L$, $\tilde{\gamma} = \sum_{k=1}^{L} \lambda_k v_k v_k^T$. The eigenvectors form a basis of $\mathbb{R}^L$ and assign new coordinates, $y_1,\dots,y_L$, to $z = \sum_{k=1}^{L} y_k v_k$, which allows writing the exponent as $z^T \tilde{\gamma}\, z = \sum_{k=1}^{L} \lambda_k y_k^2$. The covariance between $x_i$ and $x_j$ then reads (Bishop [52], p. 83)
$$ \langle z_i z_j \rangle = \frac{1}{\tilde{Z}} \sum_{l,n=1}^{L} (v_l)_i (v_n)_j \int_y \exp\!\Big(\!-\frac{1}{2}\sum_{k=1}^{L} \lambda_k y_k^2\Big)\, y_l\, y_n\, dy = \sum_{k=1}^{L} \frac{1}{\lambda_k} (v_k)_i (v_k)_j = (\tilde{\gamma}^{-1})_{ij}, $$
and we refer to [52] for the derivation of the normalization factor. The initial requirement that $\tilde{\gamma} = -2\gamma$ be positive definite results in a positive-definite covariance matrix C, a necessary condition for the Gaussian density to be well defined. In summary, the multivariate Gaussian distribution maximizes the entropy among all probability distributions of continuous variables with specified first and second moments. The pair interaction strength is now evaluated by the already introduced partial correlation coefficient between $x_i$ and $x_j$ given the remaining variables $\{x_r\}_{r\in\{1,\dots,L\}\setminus\{i,j\}}$,
$$ r_{ij\cdot\{1,\dots,L\}\setminus\{i,j\}} = \frac{\gamma_{ij}}{\sqrt{\gamma_{ii}\,\gamma_{jj}}} = \begin{cases} -\dfrac{(C^{-1})_{ij}}{\sqrt{(C^{-1})_{ii}\,(C^{-1})_{jj}}} & \text{if } i \neq j, \\[2mm] 1 & \text{if } i = j. \end{cases} \qquad (18) $$
Data integration

In biological datasets as used to study gene association, the number of measurements, M, is typically smaller than the number of observables, L, i.e., M < L in our terminology. Consequently, the empirical covariance matrix, $\hat{C} = \frac{1}{M}\sum_{m=1}^{M}(x^m - \bar{x})(x^m - \bar{x})^T$, will in these cases always be rank-deficient (and, thus, not invertible) since its rank can exceed neither the number of variables, L, nor the number of measurements, M. Moreover, even in cases when M ≥ L, the empirical covariance matrix may become non-invertible or badly conditioned (i.e., close to singular) due to dependencies in the data. However, for variables following a multivariate Gaussian distribution, one can access the elements of its inverse by maximizing the penalized Gaussian loglikelihood, which results in the following estimate of the inverse covariance matrix,
$$ \hat{\Theta} = \arg\max_{\Theta \succ 0} \Big( \ln\det\Theta - \mathrm{tr}\big(\hat{C}\Theta\big) - \lambda \|\Theta\|_\delta^\delta \Big), $$
with penalty parameter λ ≥ 0 and $\|\Theta\|_\delta^\delta = \sum_{i,j} |\Theta_{ij}|^\delta$. If λ = 0, we obtain the maximum-likelihood estimate; for δ = 1 and λ > 0, the ℓ1-regularized (sparse) maximum-likelihood solution that selects for sparsity [53,54]; and for δ = 2 and λ > 0, the ℓ2-regularized maximum-likelihood solution that favors small absolute values in the entries of the selected inverse covariance matrix [55]. For δ = 1 and λ > 0, the method is called the LASSO; for δ = 2 and λ > 0, ridge regression. Alternatively, regularization can be directly applied to the covariance matrix, e.g., by shrinkage [17,56].
and replace the sums in the distribution and the moments $\langle\cdot\rangle$ by integrals. The extended binary maximum-entropy distribution Eq 12 is then approximated by the Lq-dimensional multivariate Gaussian with the inherited analogues of the mean, $\langle y \rangle = (f_i(\sigma_k))_{i,k} \in \mathbb{R}^{L(q-1)}$, and the empirical covariance matrix, $\hat{C}(\boldsymbol{\sigma},\boldsymbol{\omega}) = (\hat{C}_{ij}(\sigma_k,\sigma_l))_{i,j,k,l} \in \mathbb{R}^{L(q-1)\times L(q-1)}$, whose elements $\hat{C}_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) - f_i(\sigma)\, f_j(\omega)$ characterize the pairwise dependency structure. The gauge fixing results in setting the preassigned entries referring to the last amino acid in the mean vector and the covariance matrix to zero, which reduces the model's dimension from Lq to L(q−1); otherwise, the unregularized covariance matrix would always be non-invertible. Typically, the single and pair frequency counts are reweighted and regularized by pseudocounts (see section “Sequence data preprocessing”) to additionally ensure that $\hat{C}(\boldsymbol{\sigma},\boldsymbol{\omega})$ is invertible. Final application of the closed-form solution for continuous variables Eq 16 to the extended binary variables with $C^{-1}(\boldsymbol{\sigma},\boldsymbol{\omega}) \approx \hat{C}^{-1}(\boldsymbol{\sigma},\boldsymbol{\omega})$ yields the so-called mean-field (MF) approximation [48],
$$ \gamma_{ij}^{MF}(\sigma,\omega) = -\tfrac{1}{2}\big(C^{-1}\big)_{ij}(\sigma,\omega) \;\Rightarrow\; e_{ij}^{MF}(\sigma,\omega) = -\big(C^{-1}\big)_{ij}(\sigma,\omega) \qquad (20) $$
for amino acids σ, ω ∈ Ω and with restriction to residues i < j in the latter identity. The same solution has been obtained by [6,7] using a perturbation ansatz to solve the q-state Potts model, termed (mean-field) Direct Coupling Analysis (DCA or mfDCA). In Ising models, this result is also known as the naïve mean-field approximation [57–59].
The following section is dedicated to maximum likelihood-based inference approaches, which have been presented in the field of protein contact prediction.

Maximum-Likelihood Inference

A well-known approach to estimating the parameters of a model is maximum-likelihood inference. The likelihood is a scalar measure of how likely the model parameters are, given the observed data (MacKay [34], p. 29), and the maximum-likelihood solution denotes the parameter set maximizing the likelihood function. For Markov random fields, the maximum-likelihood solution is consistent, i.e., recovers the true model parameters in the limit of infinite data (Koller and Friedman [32], p. 949). In particular, for a pairwise model with parameters $h(\boldsymbol{\sigma}) = (h_i(\sigma))_{i=1,\dots,L;\,\sigma\in\Omega}$ and $e(\boldsymbol{\sigma},\boldsymbol{\omega}) = (e_{ij}(\sigma,\omega))_{1\le i<j\le L;\,\sigma,\omega\in\Omega}$, the likelihood of the observed data is $l(h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})) = \prod_{m=1}^{M} P(x^m; h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega}))$. The estimates of the model parameters are then obtained as the maximizer of l or, using the monotonicity of the logarithm, the minimizer of −ln l,
$$ \{h^{ML}(\boldsymbol{\sigma}),\, e^{ML}(\boldsymbol{\sigma},\boldsymbol{\omega})\} = \arg\max_{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})} l\big(h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big) \equiv \arg\min_{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})} \Big(-\ln l\big(h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big)\Big). $$
$$ \frac{\partial}{\partial h_i(\sigma)} \ln Z\, \Big|_{\{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})\}} = \frac{1}{Z} \frac{\partial Z}{\partial h_i(\sigma)}\, \Big|_{\{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})\}} = P_i\big(\sigma;\, h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big), $$
$$ \frac{\partial}{\partial e_{ij}(\sigma,\omega)} \ln Z\, \Big|_{\{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})\}} = \frac{1}{Z} \frac{\partial Z}{\partial e_{ij}(\sigma,\omega)}\, \Big|_{\{h(\boldsymbol{\sigma}),\, e(\boldsymbol{\sigma},\boldsymbol{\omega})\}} = P_{ij}\big(\sigma,\omega;\, h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big). $$
The maximizing parameters, $h^{ML}(\boldsymbol{\sigma}) = (h_i^{ML}(\sigma))_{i=1,\dots,L;\,\sigma\in\Omega}$ and $e^{ML}(\boldsymbol{\sigma},\boldsymbol{\omega}) = (e_{ij}^{ML}(\sigma,\omega))_{1\le i<j\le L;\,\sigma,\omega\in\Omega}$, are those matching the distribution's single and pair marginal probabilities with the empirical single and pair frequency counts,
$$ P_i\big(\sigma;\, h^{ML}(\boldsymbol{\sigma}), e^{ML}(\boldsymbol{\sigma},\boldsymbol{\omega})\big) = f_i(\sigma), \qquad P_{ij}\big(\sigma,\omega;\, h^{ML}(\boldsymbol{\sigma}), e^{ML}(\boldsymbol{\sigma},\boldsymbol{\omega})\big) = f_{ij}(\sigma,\omega) $$
in residues i = 1, …, L and i, j = 1, …, L, respectively, and for amino acids σ, ω ∈ Ω. In other words, matching the moments of the pairwise maximum-entropy probability distribution to the given data is equivalent to maximum-likelihood fitting of an exponential family [34,60]. Although the maximum-likelihood solution is globally optimal for the pairwise maximum-entropy probability model, based on the concavity of ln l, the resulting distribution is not necessarily unique, due to dependencies in the input data (Koller and Friedman [32], p. 948). To remove these equivalent optima and select a unique representation, one needs to introduce further constraints by, for example, gauge fixing or regularization.
$$ \Delta e_{ij}^{(k)}(\sigma,\omega) = \varepsilon\, \frac{\partial}{\partial e_{ij}(\sigma,\omega)} \ln l\, \Big|_{\{h^{(k)}(\boldsymbol{\sigma}),\, e^{(k)}(\boldsymbol{\sigma},\boldsymbol{\omega})\}} \propto f_{ij}(\sigma,\omega) - P_{ij}\big(\sigma,\omega;\, h^{(k)}(\boldsymbol{\sigma}), e^{(k)}(\boldsymbol{\sigma},\boldsymbol{\omega})\big), $$
$$ \{h^{ML}(\boldsymbol{\sigma}),\, e^{ML}(\boldsymbol{\sigma},\boldsymbol{\omega})\} = \lim_{k\to\infty} \{h^{(k)}(\boldsymbol{\sigma}),\, e^{(k)}(\boldsymbol{\sigma},\boldsymbol{\omega})\}, $$
or, equivalently, $\Delta h_i^{(k)}(\sigma) \to 0$ for i = 1, …, L and $\Delta e_{ij}^{(k)}(\sigma,\omega) \to 0$ for 1 ≤ i < j ≤ L and σ, ω ∈ Ω \ {σ_q}.
Pseudo-likelihood maximization
Besag [62] introduced the pseudo-likelihood as an approximation to the likelihood function in which the global partition function is replaced by computationally tractable local estimates. The pseudo-likelihood inherits the concavity of the likelihood and yields the exact maximum-likelihood parameters in the limit of infinite data for Gaussian Markov random fields [41,62], but not in general [63]. Applications of this approximation to non-continuous categorical variables have been studied, for instance, in sparse inference of Ising models [64], but may lead to results that differ from the maximum-likelihood estimate. In this approach, the probability of the m-th observation, $x^m$, is approximated by the product of the conditional probabilities of $x_r = x_r^m$ given the observations in the remaining variables $x_{\setminus r} := (x_1,\dots,x_{r-1},x_{r+1},\dots,x_L)^T \in \Omega^{L-1}$ [51],
$$ P\big(x^m;\, h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big) \simeq \prod_{r=1}^{L} P\big(x_r = x_r^m \mid x_{\setminus r} = x^m_{\setminus r};\, h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big), $$
where each conditional probability takes the closed form
$$ P\big(x_r = \sigma \mid x_{\setminus r} = x^m_{\setminus r}\big) = \frac{\exp\!\big(h_r(\sigma) + \sum_{j\neq r} e_{rj}(\sigma, x_j^m)\big)}{\sum_{\sigma'\in\Omega} \exp\!\big(h_r(\sigma') + \sum_{j\neq r} e_{rj}(\sigma', x_j^m)\big)}, $$
which only depends on the unknown parameters $(e_{rj}(\sigma,\omega))_{j\neq r}$ and $h_r(\sigma)$ and makes the computation of the pseudo-likelihood tractable. Note that we treat $e_{ij}(\sigma,\omega) = e_{ji}(\omega,\sigma)$ and $e_{ii}(\cdot,\cdot) = 0$. By this approximation, the loglikelihood Eq 21 becomes the pseudo-loglikelihood,
$$ \ln l_{PL}\big(h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big) := \sum_{m=1}^{M} \sum_{r=1}^{L} \ln P\big(x_r = x_r^m \mid x_{\setminus r} = x^m_{\setminus r};\, h(\boldsymbol{\sigma}), e(\boldsymbol{\sigma},\boldsymbol{\omega})\big). $$
where λh, λe > 0 adjust the complexity of the problem and are selected in a consistent manner across different protein families to avoid overfitting. This approach has been presented (with scaling of the pseudo-loglikelihood by the sequence weights $\frac{1}{M_{eff}} w_m$ to include sequence weighting, see section “Sequence data preprocessing”) by [51] under the name plmDCA (PseudoLikelihood Maximization Direct Coupling Analysis) and has shown performance improvements compared to the mean-field approximation Eq 20. Another inference method based on pseudo-likelihood maximization, but including prior knowledge in terms of secondary structure and information on pairs likely to be in contact, is Gremlin (Generative REgularized ModeLs of proteINs) [65–67].
for σ, ω ∈ Ω; in the second identity, the symmetric Lagrange multipliers $\gamma_{ij}(\sigma,\omega)$ defined for
finite sampling bias and to account for underrepresentation [5–8,44,48], resulting in zero entries in $\hat{C}(\boldsymbol{\sigma},\boldsymbol{\omega})$, for instance, if a certain amino acid pair is never observed. The use of pseudocounts is equivalent to a maximum a posteriori (MAP) estimate under a specific inverse Wishart prior on the covariance matrix [48]. Both preprocessing steps combined yield the reweighted single and pair frequency counts,
$$ f_i(\sigma) = \frac{\tilde{\lambda}}{q} + \big(1-\tilde{\lambda}\big)\frac{1}{M_{eff}} \sum_{m=1}^{M} w_m\, x_i^m(\sigma), \qquad f_{ij}(\sigma,\omega) = \frac{\tilde{\lambda}}{q^2} + \big(1-\tilde{\lambda}\big)\frac{1}{M_{eff}} \sum_{m=1}^{M} w_m\, x_i^m(\sigma)\, x_j^m(\omega) $$
in residues i, j = 1, …, L and for amino acids σ, ω ∈ Ω. Ideally, for maximum-likelihood inference the observations are assumed to be independent and identically distributed. However, this assumption is typically violated in realistic sequence data due to phylogenetic and sequencing bias, and the reweighting presented here does not necessarily solve this problem.
has been introduced [5]. In $P_{ij}^{dir}(\sigma,\omega)$, $\tilde{h}_i(\sigma)$ and $\tilde{h}_j(\omega)$ are chosen to be consistent with the single-site frequency counts.
However, this expression is not gauge-invariant [5]. In this context, the notation with $e_{ij}(\sigma,\omega)$, which refers to indices restricted to i < j, is extended and treated such that $e_{ij}(\sigma,\omega) = e_{ji}(\omega,\sigma)$ and $e_{ii}(\cdot,\cdot) = 0$; then $\|e_{ij}\|_F = \|e_{ji}\|_F$ and $\|e_{ii}\|_F = 0$. In order to correct for phylogenetic biases in the identification of co-evolved residues, Dunn et al. [27] introduced the average product correction (APC). It was originally used in combination with the mutual information but has recently been combined with the ℓ1-norm [8] and the Frobenius/ℓ2-norm [51]; it is derived from the averages over rows and columns of the corresponding norm of the matrix of the $e_{ij}$ parameters. In this formulation, the pair scoring function is
$$ APC^{FN}_{ij} = \|e_{ij}\|_F - \frac{\|e_{i\cdot}\|_F\, \|e_{\cdot j}\|_F}{\|e_{\cdot\cdot}\|_F} \qquad (24) $$
for $e_{ij}$-parameters fixed by the zero-sum gauge and with the means over the non-zero elements in row, column, and full matrix, $\|e_{i\cdot}\|_F := \frac{1}{L-1}\sum_{j=1}^{L}\|e_{ij}\|_F$, $\|e_{\cdot j}\|_F := \frac{1}{L-1}\sum_{i=1}^{L}\|e_{ij}\|_F$, and $\|e_{\cdot\cdot}\|_F := \frac{1}{L(L-1)}\sum_{i,j=1}^{L}\|e_{ij}\|_F$, respectively. Alternatively, the average product-corrected ℓ1-norm applied to the 20×20 submatrices of the estimated inverse covariance matrix, in which contributions from gaps are ignored, has been introduced by the authors of [8] as the PSICOV score.
Using the average product correction, the authors of [51] showed for interaction parameters
inferred by the mean-field approximation that scoring with the average product-corrected Fro-
benius norm increased the precision of the predicted contacts compared to scoring with the
DI-score. The practical consequence of the choice of scoring method depends on the dataset
and the parameter inference method.
as the result of extraordinary advances in sequencing technology. The quality of existing meth-
ods can be improved by careful refinement of sequence alignments in terms of cutoffs and gaps
or by attaching optimized weights to each of the data sequences. Alternatively, one could try to
improve the existing model frameworks by accounting for phylogenetic progression [27,49,72]
and finite sampling biases.
The advancement of inference methods for biological datasets could help solve many interesting biological problems, such as protein design or the analysis of multi-gene effects in relating variants to phenotypic changes as well as multi-genic traits [73,74]. The methods presented here could help reduce the parameter space of genome-wide association studies to a first approximation. In particular, we envision the following applications: (1) in the disease context, co-evolution studies of oncogenic events, for example copy number alterations, mutations, fusions, and alternative splicing, could be used to derive direct co-evolution signatures of cancer from available data, such as The Cancer Genome Atlas (TCGA); (2) de novo design of protein sequences as, for example, described in [65,75] for the WW domain, using design rules based on the evolutionary information extracted from the multiple sequence alignment; and (3) the development of quantitative models of protein fitness computed from sequence information.
In general, in a complex biological system, it is often useful for descriptive and predictive
purposes to derive the interactions that define the properties of the system. With the methods
presented here and available software (Table 1), our goal is not only to describe how to infer
these interactions but also to highlight tools for the prediction and redesign of properties of
biological systems.
Acknowledgments
We thank Theofanis Karaletsos, Sikander Hayat, Stephanie Hyland, Quaid Morris, Deb Bemis,
Linus Schumacher, John Ingraham, Arman Aksoy, Julia Vogt, Thomas Hopf, Andrea Pagnani,
and Torsten Groß for insightful discussions.
References
1. Lezon TR, Banavar JR, Cieplak M, Maritan A, Fedoroff NV. Using the principle of entropy maximization
to infer genetic interaction networks from gene expression patterns. Proceedings of the National Acad-
emy of Sciences of the United States of America. 2006; 103(50):19033–19038. PMID: 17138668
2. Locasale JW, Wolf-Yadlin A. Maximum entropy reconstructions of dynamic signaling networks from
quantitative proteomics data. PloS one. 2009; 4(8):e6522. doi: 10.1371/journal.pone.0006522 PMID:
19707567
3. Schneidman E, Berry II MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated
network states in a neural population. Nature. 2006; 440:1007–1012. PMID: 16625187
4. Tang A, Jackson D, Hobbs J, Chen W, Smith JL, Patel H, et al. A maximum entropy model applied to
spatial and temporal correlations from cortical networks in vitro. The Journal of Neuroscience. 2008; 28
(2):505–518. doi: 10.1523/JNEUROSCI.3359-07.2008 PMID: 18184793
5. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proceedings of the National Academy of Sciences of the United States of America. 2009; 106(1):67–72. doi: 10.1073/pnas.0805923106 PMID: 19116270
6. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D Structure Com-
puted from Evolutionary Sequence Variation. PLoS One. 2011; 6(12):e28766. doi: 10.1371/journal.
pone.0028766 PMID: 22163331
7. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks D, Sander C, et al. Direct-coupling analysis of residue
co-evolution captures native contacts across many protein families. Proceedings of the National Acad-
emy of Sciences of the United States of America. 2011; 108:E1293–E1301. doi: 10.1073/pnas.
1111471108 PMID: 22106262
8. Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using
sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012; 28
(2):184–190. doi: 10.1093/bioinformatics/btr638 PMID: 22101153