Airoldi_GettingStarted


Getting Started in Probabilistic Graphical Models

The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Airoldi, Edoardo M. 2007. Getting Started in Probabilistic Graphical Models. PLoS Computational Biology 3(12): e252. doi:10.1371/journal.pcbi.0030252

Published Version: http://dx.doi.org/10.1371/journal.pcbi.0030252

Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:2757496

Terms of Use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Message From ISCB

Getting Started in Probabilistic Graphical Models
Edoardo M. Airoldi

Editor: William Noble, University of Washington, United States of America
Citation: Airoldi EM (2007) Getting started in probabilistic graphical models. PLoS Comput Biol 3(12): e252. doi:10.1371/journal.pcbi.0030252
Copyright: © 2007 Edoardo M. Airoldi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abbreviations: EM, expectation–maximization; MCMC, Monte Carlo Markov chain; PGM, probabilistic graphical model
Edoardo M. Airoldi is with the Lewis-Sigler Institute for Integrative Genomics and the Computer Science Department, Princeton University, Princeton, New Jersey, United States of America. E-mail: eairoldi@princeton.edu
PLoS Computational Biology | www.ploscompbiol.org | December 2007 | Volume 3 | Issue 12 | e252

Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But what exactly are they and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This Message sketches out some answers and illustrates the main ideas behind the statistical approach to biological pattern discovery.

Introduction

Probabilistic graphical models offer a common conceptual architecture where biological and mathematical objects can be expressed with a common, intuitive formalism. This enables effective communication between scientists across the mathematical divide by fostering substantive debate in the context of a scientific problem, and ultimately facilitates the joint development of statistical and computational tools for quantitative data analysis. A number of success stories have appeared over the years [1–4]. Today, probabilistic graphical models promise to play a major role in the resolution of many intriguing conundrums in the biological sciences. The goal of this short article is to be a dense, informative introduction to the language of probabilistic graphical models, for beginners, with pointers to successful applications in selected areas of biology. The exposition introduces the essential concepts involved in PGMs in the context of the various stages of a typical collaboration between natural and computational scientists, and discusses the aspects to which each scientist should contribute to carry out the data analysis successfully using PGMs.

Let us start by considering a specific problem in transcriptional regulation. Given measurements of the abundance of gene transcripts in retinal cells across stages of development, we would like to discover which functional processes are relevant for development, and reveal which ones are most important at which stage. To develop a PGM to address this problem, we begin by identifying the biological objects that would appear in a cartoon model of how cellular development impacts transcription. In this illustrative example, we have genes and functional processes/contexts. It is reasonable to assume that each gene will participate in multiple functional processes, although typically in a small number of them, and that not all functional processes will be important at all stages of development. We then assess what aspects of the problem we can probe directly, with experimental techniques, and what aspects we cannot. In the example, while the abundance of gene transcripts can be measured, for instance, via SAGE (serial analysis of gene expression), it is harder to measure functional processes. However, the latter could be operationally defined as sets of genes that share a similar temporal regulation pattern; this definition has the advantage of creating a connection between the membership of genes in functional processes (i.e., an unobservable mapping) and the similarity of the temporal expression profiles (i.e., observable quantities). The establishment of connections between those biological objects that we can probe and those that we cannot ends a first conceptual effort.

A cartoon model of how cellular development impacts transcription is now specified in terms of genes and their abundance, functional processes, and membership of genes in functional processes. Next we translate the biological players and the connections we established among them into mathematical quantities (i.e., random variables) and connections among them (i.e., statistical dependencies). This translation specifies the model structure. At this stage, we rely on biological intuitions to fine-tune the model, for instance, by deciding which sources of variability in the measurements carry information about the latent variables and which do not: if the temporal expression profiles of genes A and B are similar on a relative scale, but their absolute abundance is quite different, should we believe that they both participate in the same functional processes? Last, we assign numerical values to those quantities that are unknown in the final model specifications (i.e., we fit the model to the data) and we use them to develop biological intuitions in the context of the original problem. (Functional aspects of retinal development, in mouse, are fully addressed in [5].)

In the following, we briefly introduce the basic mathematical quantities that enable the translation of a cartoon model of biology into a PGM, and we
review strategies to assign numerical values to the unknown quantities underlying any PGM that are most likely given the observations. We conclude with an overview of selected applications, complete with pointers to published work.

The Basics

A probabilistic graphical model defines a family of probability distributions that can be represented in terms of a graph. Nodes in the graph correspond to random variables; its structure translates into statistical dependencies (among such variables) that drive the computation of joint, conditional, and marginal probabilities of interest [6]. In applications, most of the (node-specific) random variables are chosen to express the variability of an observed quantity, such as the expression of a specific gene measured under a certain condition. Some random variables, however, may specify unobserved quantities that are believed to influence the observable outcomes of a given experiment, such as which cellular processes were active at the time measurements were taken. The (directed or undirected) arcs of the graph specify the biological hypotheses about how observable and latent quantities influence one another. A set of constants underlying the distributions of the random variables completes the picture. These constants are referred to as parameters in the frequentist paradigm and as hyper-parameters in the Bayesian paradigm. (See [7], pp. 185–189, for a discussion of when the distinction matters in practice, with examples.)

Figure 1 shows an example of a probabilistic graphical model for gene expression. (We note that there is a considerable overlap between the class of probabilistic graphical models and the class of Bayesian networks. A number of scholars choose to refer to PGMs that can be represented as directed acyclic graphs, with nodes corresponding to discrete-valued random variables, encoding observed measurements, and no latent variables as Bayesian networks.) The observed expression of a gene, $Y(g)$, depends on the latent functional process it is involved in, $X(g)$. The underlying constants, $(\alpha, \beta)$, control the probability that any given functional process is active and the probability of observing expression of a certain magnitude, respectively. The left panel shows the full model, and the right panel shows the same model expressed in compact form.

doi:10.1371/journal.pcbi.0030252.g001
Figure 1. Two Equivalent Representations of the Same Probabilistic Graphical Model
The left panel shows the full model, and the right panel shows the same model expressed in compact form. Nodes denote random variables; observed random variables are shaded while latent random variables are not; edges denote possible dependences. The box in the right panel is called a plate; it denotes independent and identically distributed replicates.

The likelihood function, or the probability of the measurements given the underlying constants, is the main quantity of interest in PGMs. It summarizes how well the observations are explained by the specific PGM that is identified by a given value of the underlying constants. The likelihood can be computed using the structural hypotheses encoded by the graph, and the probability distributions specified for the nodes. Continuing the example, the likelihood corresponding to the model in Figure 1 is computed as follows:

\begin{align}
\Pr(Y \mid \alpha, \beta) &= \int_X \Pr(Y, X \mid \alpha, \beta)\, dX \tag{1} \\
&= \int_X \prod_{g=1}^{G} \left[ \Pr(Y(g) \mid X(g), \beta) \times \Pr(X(g) \mid \alpha) \right] dX \tag{2} \\
&\equiv \ell(Y \mid \Theta), \tag{3}
\end{align}

for $\Theta \equiv (\alpha, \beta)$. The joint probability of measurements and latent variables given the underlying constants, that is, the integrand on the right-hand side of Equation 1, is often referred to as the complete likelihood function in the literature. It is an important quantity in the statistical treatment of PGMs with latent variables.

Estimation and Inference

A family of PGMs is fit to the data to find likely values for its underlying constants and likely distributions for its latent variables. This process boils down to an optimization problem where the objective function is based on the likelihood. Considered jointly, the estimation and inference tasks identify a specific model, within the family of PGMs defined by the assumptions on the graph and the random variables, that successfully summarizes the variability of the observations.

In the language of the statistical literature, we distinguish the task of estimating the underlying constants of a probabilistic graphical model (i.e., the parameters in a frequentist statistical setting, or the hyper-parameters in a Bayesian statistical setting) from the task of inferring the distributions of the latent variables given the observations. Let us consider strategies to address the latter task first.
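To make the likelihood of Equations 1 to 3 and the inference task concrete, the sketch below works through a deliberately simplified version of the model in Figure 1. It assumes (these are illustrative assumptions, not the article's specification) that each latent variable X(g) takes one of K discrete values with probabilities `alpha`, and that Y(g) given X(g) = k is Gaussian with mean `beta[k]` and unit variance. Under these assumptions the integral in Equation 1 reduces to a finite sum that can be evaluated exactly. All function names and numerical values are hypothetical.

```python
import math

def gaussian_pdf(y, mean, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(ys, alpha, beta):
    """log Pr(Y | alpha, beta) for independent genes with a discrete latent X(g)."""
    total = 0.0
    for y in ys:
        # Complete likelihood Pr(Y(g), X(g)=k | alpha, beta), summed over k:
        # the discrete analogue of the integral in Equations 1 and 2.
        marginal = sum(a * gaussian_pdf(y, b) for a, b in zip(alpha, beta))
        total += math.log(marginal)
    return total

def posterior(y, alpha, beta):
    """Pr(X(g) = k | Y(g) = y, alpha, beta) for each k, by Bayes' rule."""
    joint = [a * gaussian_pdf(y, b) for a, b in zip(alpha, beta)]
    norm = sum(joint)
    return [j / norm for j in joint]

# Hypothetical values, for illustration only.
alpha = [0.5, 0.5]      # probability that each functional process is active
beta = [0.0, 4.0]       # mean expression under each functional process
ys = [0.1, 3.9, 2.0]    # observed expression of three genes

print(log_likelihood(ys, alpha, beta))
print(posterior(0.1, alpha, beta))   # strongly favors the first process
print(posterior(2.0, alpha, beta))   # midway between the two means
```

Here `log_likelihood` addresses the quantity used for estimation, while `posterior` addresses inference about a latent variable. Both are exact only because the latent space in this toy setting is small and discrete.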

The choice among the many strategies available is often informed by the complexity of the model, and in particular by whether the integral on the right-hand side of Equation 1 can be computed in closed form. Exact inference is available for models that belong to special families [6]. Focusing on the biology of the problem, however, often leads to a model structure and probabilistic specifications that cannot be subsumed under any special family. The likelihood is intractable in many such cases (that is, the integral in Equation 1 cannot be solved in closed form) and we resort to approximations. Below, we briefly survey the intuitions behind three popular strategies to perform approximate inference in PGMs: Monte Carlo Markov chains (sampling-based), and expectation–maximization (EM) and variational methods (optimization-based).

Monte Carlo Markov chain (MCMC) techniques such as the Gibbs or Metropolis-Hastings samplers can be used to explore the joint posterior distribution of the latent variables [8,9]. Although the likelihood is intractable, the complete likelihood $\Pr(Y, X \mid \alpha, \beta)$ can be easily computed for the large majority of PGMs. The main concept behind MCMC schemes is to work with the complete likelihood, and to reduce the full joint posterior to lower-dimensional conditional distributions, on individual latent variables or blocks of them, that we can sample from. Samples from the joint posterior are then obtained by composing conditional samples. The Gibbs sampler, for instance, requires that one can sample from all univariate, full-conditional distributions:

$$\Pr(X(g) \mid X(-g), Y, \alpha, \beta), \quad \text{for } g = 1, \ldots, G, \tag{4}$$

where $X(-g)$ is the collection of random variables $X$ without $X(g)$. The Metropolis-Hastings sampler requires that one can at least compute a quantity proportional to the desired posterior; samples are drawn from an arbitrary proposal distribution and are accepted or rejected using a formula that depends on the proposal. Other sampling-based algorithms such as particle filters can be used to perform inference in PGMs of sequential observations [10].

The two alternatives to sampling we survey here aim at approximating the integral on the right-hand side of Equation 1. The main idea shared by both approaches is to find a lower bound for the likelihood, $\ell(Y \mid \Theta)$, making use of Jensen's inequality and of an arbitrary distribution on the latent variables, $q(X)$:

\begin{align}
\log \ell(Y \mid \Theta) &= \log \int_X \Pr(Y, X \mid \alpha, \beta)\, dX \notag \\
&= \log \int_X q(X) \times \frac{\Pr(Y, X \mid \alpha, \beta)}{q(X)}\, dX \quad \text{(for any } q\text{)} \notag \\
&\geq \int_X q(X) \times \log \frac{\Pr(Y, X \mid \alpha, \beta)}{q(X)}\, dX \quad \text{(Jensen's inequality)} \notag \\
&= \mathbb{E}_q\left[\log \Pr(Y, X \mid \Theta)\right] - \mathbb{E}_q\left[\log q(X)\right] \equiv L(q, \Theta). \tag{5}
\end{align}

In EM, the lower bound $L(q, \Theta)$ is iteratively maximized with respect to $\Theta$ in the M step, and with respect to $q$ in the E step [11]. In particular, at the $t$-th iteration of the E step the $q$ distribution must satisfy the following equation:

$$q^{(t)} = \Pr(X \mid Y, \Theta^{(t-1)}). \tag{6}$$

That is, we set the arbitrary distribution $q$ equal to the posterior distribution of the latent variables given the data and the estimates of the parameters at the previous iteration. Unfortunately, it is not always possible to express the distribution $q^{(t)}$ in Equation 6 in analytic form. In such cases, a variational approximation to the EM [12] can be obtained by defining a parametric approximation to the posterior in Equation 6, denoted by $\tilde{q} \equiv q_{\Delta}(X)$, which involves an extra set of variational parameters, $\Delta$, and leads to an approximate lower bound for the likelihood, $L_{\Delta}(q, \Theta)$. At the $t$-th iteration of the E step, we then minimize the Kullback-Leibler divergence between $q^{(t)}$ and $q^{(t)}_{\Delta}$, with respect to $\Delta$, using the data; this is equivalent to maximizing the approximate lower bound for the likelihood, $L_{\Delta}(q, \Theta)$, with respect to $\Delta$. The optimal parametric approximation can be thought of as an approximate posterior distribution for the latent variables in the sense that it depends on the data $Y$, although indirectly: $q^{(t)} \approx q^{(t)}_{\Delta^{*}(Y)}(X) = \Pr(X \mid Y)$.

Let us now return to the task of estimating the constants underlying a PGM; few established strategies exist. The estimates for the underlying constants may be chosen, for instance, to maximize the likelihood, or to match empirical and theoretical moments of the random variables that correspond to measurements ([7], pp. 120–124). Alternatively, when the likelihood is too difficult or expensive to compute, an approximation, $L_{\Delta} \approx \ell$, or a lower bound, $L \leq \ell$, for the likelihood can be used as a surrogate. These alternatives and others are sometimes referred to as empirical Bayes estimates in the context of nontrivial probabilistic graphical models ([13], Chapter 3).

Popular software packages that implement a language to specify and fit PGMs are available. For MCMC, see BUGS [14]; for variational inference, see VIBES [15].

Applications

With the technical machinery we just introduced, we are now ready to bring the biological intuition back into the picture. Let us continue with the transcriptional regulation example. In the PGM of Figure 1, the expression of gene $g$ may be encoded by a real-valued random variable $Y(g)$. The mixed membership of gene $g$ in nonobservable biological contexts may be encoded by the nonzero components of a latent random vector, $X(g)$. The number of latent biological contexts we ask the PGM to infer, denoted by $K$, is an important quantity in this model, which we discuss later. Briefly, the value of $K$ specifies the dimensionality of this PGM, that is, the number of components of the vector-valued latent variables, $X(g)$. The two constants $(\alpha, \beta)$ may be used to encode biological constraints. For instance, $\alpha$ may be used to introduce a notion of biological parsimony in the form of a probabilistic (soft) constraint on the number of biological contexts each gene may participate in, and $\beta$ may be used to specify gene expression patterns in the form of differential expression levels across those experimental conditions for which microarray measurements were taken; alternative pattern specifications and parameterizations exist [5]. For any given number of latent biological contexts, $K$, the PGM is fit to the data.
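As a rough illustration of what fitting a PGM for a fixed K can involve, the sketch below runs EM on a one-dimensional Gaussian mixture with K components and unit variance. This is a simplification chosen for brevity: each gene belongs to a single context rather than having mixed membership, so it is not the model of [5]. The E step computes the posterior of Equation 6 exactly, and the M step maximizes the lower bound of Equation 5 over the mixing weights `alpha` and the component means `beta`. The function names and data values are hypothetical.

```python
import math

def normal_pdf(y, mean):
    """Unit-variance normal density."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def em_mixture(ys, alpha, beta, n_iter=50):
    """EM for a K-component, unit-variance Gaussian mixture.

    Returns the fitted (alpha, beta) and the log-likelihood evaluated at the
    start of each iteration.
    """
    alpha, beta = list(alpha), list(beta)
    K = len(alpha)
    trace = []
    for _ in range(n_iter):
        # E step (Equation 6): q(X(g) = k) = Pr(X(g) = k | Y(g), current alpha, beta).
        resp = []
        loglik = 0.0
        for y in ys:
            joint = [alpha[k] * normal_pdf(y, beta[k]) for k in range(K)]
            norm = sum(joint)
            resp.append([j / norm for j in joint])
            loglik += math.log(norm)
        trace.append(loglik)
        # M step: maximize the lower bound over (alpha, beta) with q held fixed.
        for k in range(K):
            weight = sum(r[k] for r in resp)
            alpha[k] = weight / len(ys)
            beta[k] = sum(r[k] * y for r, y in zip(resp, ys)) / weight
    return alpha, beta, trace

# Hypothetical expression values clustered around 0 and around 5.
ys = [-0.3, 0.2, 0.1, -0.1, 4.8, 5.1, 5.3, 4.9]
alpha_hat, beta_hat, trace = em_mixture(ys, alpha=[0.5, 0.5], beta=[1.0, 3.0])
print(alpha_hat, beta_hat)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful sanity check on any EM implementation. In practice one would repeat the fit for several values of K and from several starting points before comparing models.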

Estimation and inference will assign numerical values to the unknown quantities $(X, \alpha, \beta)$. These quantities provide us with model-based and observation-induced summaries of the data. In the example, for instance, while $\beta$ summarizes gene expression patterns that capture the main trends of transcription in a collection of microarrays, the values assigned to the latent variables, $X(g)$, provide gene-specific information that can be used for making fine-grained predictions.

In the last stage of the analysis, we assess the biological relevance of the patterns we inferred from the data (such as the biological contexts, or gene coexpression patterns, in the example) to make sure the model is capturing the signal we set out to capture, and we use the inferred patterns to gain insights into the problem. Assessment of biological relevance can be qualitative or quantitative. Qualitative methods such as visual inspection are typically useful for focused scientific endeavors, for instance, whenever a biological problem targets a small set of genes, a specific cellular process or component, or a signaling pathway. Quantitative methods are necessary for genome-wide scientific endeavors, and typically rely on knowledge-based repositories and ontologies (such as gene ontology [16]) and bioinformatics tools to carry out the evaluation [17,18]. Arguably, in any given application, the more interpretable the patterns are, in terms of functional processes and other biological concepts of interest, the better the family of PGMs captures some aspects of biology that may be relevant for the understanding of the phenomenon under investigation, and that are not directly measurable with experimental techniques.

Moving a step forward, the goodness of model fit is often taken as a measure of how well the data support structural biological hypotheses encoded by the cartoon model of biology that was used to posit a given family of PGMs. Measures of goodness of model fit include the Bayesian information criterion, the held-out likelihood obtained using bootstrap or cross-validation techniques, measures of predictive power such as the predictive $R^2$ in linear regression, or other quantities, depending on the goals of the analysis. (These measures can also be used to select the dimensionality, $K$, of the PGM in the example.) The goodness of fit, along with the substantive value of the inferred patterns, should inform a critical review of the biological assumptions underlying the initial cartoon model, and possibly suggest new hypotheses that are testable either with new statistical analyses, or with new experimental probes at the bench. In this sense, probabilistic graphical models contribute to an iterative process of scientific discovery, where statistical and biological thinking are intertwined as both cause and effect.

There is a rich history of applied research that applies the probabilistic graphical models approach outlined above to problems in the biological sciences. It includes a model for inferring the ancestral population structure of individuals starting from a collection of multilocus genotype measurements [2] and a model for inferring HIV mutation patterns from longitudinal clonal sequence data [19]; the former model is closely related to the classic probabilistic graphical models used to infer phylogenetic trees [1,20] and to recent extensions, in particular those that take into account the dependence among the bases at neighboring sites [21,22]. Models for sequence analysis are well-established in the community [4,23]; more recently, the connection between sequence information and gene expression has been investigated using probabilistic graphical models as well [24,25]. Other applications of this research include: a model for predicting the clinical status of breast cancer using gene expression profiles [26]; a model for facilitating content browsing of biomedical literature about the nematode Caenorhabditis elegans [27]; a model for inferring the location of chromosome aberrations from array-based comparative genomic hybridization measurements [28], and an extension that leverages array-based comparative genomic hybridization profiles from multiple individuals to recover shared aberration patterns [29]; a model for reconstructing features of the internal organization of the cell from the nested structure of observed perturbation effects, such as those measured via high-dimensional phenotype screens [30]; a model for inferring proteins' multiple functional roles from a large collection of manually curated protein interactions, as well as cross-talk patterns among proteins that participate in distinct functional processes [31]; and a model for inferring temporal patterns of coexpressed genes from time-course expression data measured via SAGE and microarray technologies [5].

Note that the graphical representation of a family of PGMs goes only so far in specifying the model; it is informative, but not exhaustive. Probabilistic assumptions and some features of the sampling scheme cannot be specified by the graph. Such subtle variants typically make a significant difference in applications.

Conclusions

Probabilistic graphical models offer a common conceptual architecture where biological and mathematical objects can be expressed with a common, intuitive formalism. This enables effective communication between scientists across the mathematical divide by fostering substantive debate in the context of a scientific problem, and ultimately facilitates the joint development of statistical and computational tools for quantitative data analysis. In other words, probabilistic graphical models provide a bridge between biology and statistical computations. These models recently earned a spot at the center stage of modern (computational) biology by furthering our ability to probe data for biological hypotheses, and will undoubtedly play an important role in resolving many intriguing conundrums in the biological sciences in the future.

Acknowledgments

The author thanks Florian Markowetz, Chad Myers, David Hess, and Olga Troyanskaya at Princeton University, and Eric Xing at Carnegie Mellon University, for comments on an early draft of this manuscript.
Author contributions. EMA wrote the paper.
Funding. This research was partly supported by United States National Institute of General Medical Sciences Center of Excellence grant P50 GM071508, by National Science Foundation grants DBI-0546275 and IIS-0513552, and by National Institutes of Health grant R01 GM071966.
Competing Interests. The author has declared that there are no competing interests.

References

1. Felsenstein J (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17: 368–376.
2. Pritchard J, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
3. Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science 303: 799–805.
4. Xing EP, Karp RM (2004) MotifPrototyper: A profile Bayesian model for motif family. Proc Natl Acad Sci U S A 101: 10523–10528.
5. Airoldi EM, Fienberg SE, Xing EP (2006) Mixed membership analysis of expression studies: Attribute data. Available: http://arxiv.org/abs/0711.2520/. Accessed 20 November 2007.
6. Jordan MI (2004) Graphical models. Statistical Science 19: 140–155.
7. Wasserman L (2004) All of statistics. New York: Springer-Verlag.
8. Gelman A, Carlin J, Stern H, Rubin D (1995) Bayesian data analysis. London: Chapman & Hall.
9. Robert C, Casella G (2005) Monte Carlo statistical methods. Springer texts in statistics. Corrected second edition. New York: Springer-Verlag.
10. Liu JS (2001) Monte Carlo strategies in scientific computing. New York: Springer-Verlag.
11. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc [Series B] 39: 1–38.
12. Jordan M, Ghahramani Z, Jaakkola T, Saul L (1999) Introduction to variational methods for graphical models. Machine Learning 37: 183–233.
13. Carlin BP, Louis TA (2005) Bayes and empirical Bayes methods for data analysis. Second edition. London: Chapman & Hall.
14. Lunn DJ, Thomas A, Best NG, Spiegelhalter DJ (2000) WinBUGS: A Bayesian modeling framework: Concepts, structure and extensibility. Statistics and Computing 10: 321–333. Available: http://www.mrc-bsu.cam.ac.uk/bugs/. Accessed 8 November 2007.
15. Bishop C, Spiegelhalter D, Winn J (2003) VIBES: A variational inference engine for Bayesian networks. In: Becker S, Thrun S, Obermayer K, editors. Advances in neural information processing systems 15. Cambridge (Massachusetts): MIT Press. pp. 777–784. Available: http://vibes.sourceforge.net/. Accessed 8 November 2007.
16. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nat Genet 25: 25–29.
17. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, et al. (2004) GO::TermFinder: Open source software for accessing Gene Ontology terms associated with a list of genes. Bioinformatics 20: 3710–3715.
18. Myers CL, Barret DA, Hibbs MA, Huttenhower C, Troyanskaya OG (2006) Finding function: An evaluation framework for functional genomics. BMC Genomics 7: 187.
19. Beerenwinkel N, Drton M (2007) A mutagenetic tree hidden Markov model for longitudinal clonal HIV sequence data. Biostatistics 8: 53–71.
20. Felsenstein J, Churchill GA (1996) A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13: 93–104.
21. McAuliffe JD, Pachter L, Jordan MI (2004) Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. Bioinformatics 20: 1850–1860.
22. Siepel A, Haussler D (2004) Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol 11: 413–428.
23. Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
24. Segal E, Yelensky R, Koller D (2003) Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19 (Supplement 1): i273–282.
25. Beer MA, Tavazoie S (2004) Predicting gene expression from sequence. Cell 117: 185–198.
26. West M, Blanchette C, Dressman H, Huang E, Ishida S, et al. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci U S A 98: 11462–11467.
27. Blei DM, Franks K, Jordan MI, Mian IS (2006) Statistical modeling of biomedical corpora: Mining the Caenorhabditis genetic center bibliography for genes related to life span. BMC Bioinformatics 7: 250.
28. Myers CL, Dunham MJ, Kung SY, Troyanskaya OG (2004) Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics 20: 3533–3543.
29. Shah SP, Lam WL, Ng RT, Murphy KP (2007) Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics 23: i450–i458.
30. Markowetz F, Kostka D, Troyanskaya OG, Spang R (2007) Nested effects models for high-dimensional phenotyping screens. Bioinformatics 23: i305–i312.
31. Airoldi EM, Blei DM, Fienberg SE, Xing EP (2006) Mixed membership analysis of high-throughput interaction studies: Relational data. Available: http://arxiv.org/abs/0706.0294/. Accessed 20 November 2007.
