Module 2 Local Probabilistic Models

The material in this presentation belongs to St. Francis Institute of Technology and is solely for educational purposes.

Distribution and modifications of the content is prohibited.

Probabilistic Graphical Models


CSDLO5011
2024-25

Subject Incharge
Dr. Bidisha Roy
Associate Professor
Room No. 401
email: [email protected]

St. Francis Institute of Technology PGM


Department of Computer Engineering Dr. Bidisha Roy 1

Module 2: Bayesian Network Model and Inference


Local Probabilistic Models


What are Local Probabilistic Models?
❑ Local Probabilistic Models specify how each node in a Bayesian
network is conditionally dependent on its parent nodes
❑Define the relationship between a variable and its parents using
Conditional Probability Distributions (CPDs)
❑Key Idea: In Bayesian networks, every node’s behavior is governed
locally by the CPD associated with it
❑The type of CPD depends on the nature of the variables and the
relationships they model
❑Types:
❑Tabular CPDs
❑Deterministic CPDs
❑Context Specific CPDs

Tabular CPDs
❑Used for discrete variables where the probabilities of each outcome
are represented in a table.
❑Specifies the conditional probabilities associated with each
combination of values of the parents of a particular variable
❑ Are common when all variables are discrete and there are relatively
few parents.
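
A minimal sketch of a tabular CPD in Python may make the idea concrete. The variables (a binary child WetGrass with binary parents Sprinkler and Rain), the helper names, and all probabilities are illustrative assumptions, not taken from the slides.

```python
from itertools import product

# Hypothetical binary variables: child WetGrass with parents (Sprinkler, Rain).
# Each key is one parent configuration; each value is a distribution over the child.
# All probabilities are made-up illustrative numbers.
wet_grass_cpd = {
    (False, False): {True: 0.01, False: 0.99},
    (False, True):  {True: 0.80, False: 0.20},
    (True,  False): {True: 0.90, False: 0.10},
    (True,  True):  {True: 0.99, False: 0.01},
}

def is_valid_cpd(cpd, parent_domains):
    """Check that every parent configuration has a row and each row sums to 1."""
    for config in product(*parent_domains):
        row = cpd.get(config)
        if row is None or abs(sum(row.values()) - 1.0) > 1e-9:
            return False
    return True

def lookup(cpd, child_value, parent_values):
    """Return P(child = child_value | parents = parent_values)."""
    return cpd[tuple(parent_values)][child_value]

valid = is_valid_cpd(wet_grass_cpd, [(False, True), (False, True)])
p_wet = lookup(wet_grass_cpd, True, (True, False))  # Sprinkler on, no rain
print(valid, p_wet)
```

The table has one row per combination of parent values, which is why this representation is exact but only practical for a small number of parents.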


Tabular CPDs… Advantages


❑Simplicity and Clarity:
❑Provide a straightforward way to represent conditional
probabilities for discrete variables. Each possible combination of
parent states is explicitly listed, making it easy to understand and
interpret.
❑Flexibility:
❑Can accommodate any type of discrete relationship between
variables, regardless of whether the relationship is linear, non-
linear, or even irregular.
❑Direct Representation:
❑Allow for an exact and direct representation of the conditional
probability distribution without relying on approximations or
assumptions.

Tabular CPDs… Advantages


❑Easy to Implement:
❑Simple to implement and work well when the number of parent
variables is small and there is enough data to estimate every entry.
❑General Purpose:
❑Can be used in a wide range of applications, especially when
working with categorical data or when the relationship between
variables cannot be easily captured by deterministic functions or
linear models.
When to use:
❑When all variables are discrete
❑When the number of parent variables is small, so that the table stays small
❑When an exact and comprehensive listing of all possible parent-child
configurations is desired

Tabular CPDs… Disadvantages


❑Scalability:
❑As the number of parent variables increases, the size of the
probability table grows exponentially. For 𝑛 binary parent variables,
the table requires 2^𝑛 rows (one per parent configuration), leading to
a large and complex CPD that is difficult to manage.
❑Data Sparsity:
❑In cases where the dataset is limited, it can be challenging to
estimate reliable probabilities for every possible combination of
parent states. This can result in inaccurate or biased models.
❑Storage and Computation:
❑Large tables require significant storage space and computational
power, which can become impractical for networks with many
variables or parent-child dependencies.
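
The exponential growth described above can be checked with a short calculation. This is a sketch only; the function name `tabular_cpd_size` is an assumption made for illustration.

```python
def tabular_cpd_size(child_states, parent_states):
    """Return (rows, entries, free_parameters) for a full tabular CPD.

    child_states: number of values the child can take.
    parent_states: list with the number of values of each parent.
    """
    rows = 1
    for k in parent_states:
        rows *= k                      # one row per parent configuration
    entries = rows * child_states      # one probability per (row, child value)
    free_parameters = rows * (child_states - 1)  # each row must sum to 1
    return rows, entries, free_parameters

# Binary child with n binary parents: the number of rows is 2**n.
sizes = {n: tabular_cpd_size(2, [2] * n) for n in (1, 5, 10, 20)}
for n, (rows, entries, free) in sizes.items():
    print(f"{n:>2} binary parents: {rows} rows, {entries} entries, {free} free parameters")
```

With 20 binary parents the table already has more than a million rows, which is the scalability problem in practice.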

Tabular CPDs… Disadvantages


❑Lack of Generalization:
❑Do not generalize well across unseen or rare parent states. Each
combination must be explicitly specified, limiting the model’s ability
to infer probabilities in new scenarios.
❑Overfitting Risk:
❑Every combination is explicitly represented, hence there is a risk of
overfitting to the training data, especially when dealing with noisy
or limited datasets.
❑Interpretability in Large Networks:
❑While tabular CPDs are easy to interpret for small networks, they
can become difficult to understand and manage as the number of
parent variables and states increases.

Tabular CPDs… Disadvantages


❑When not to use:
❑When the number of parent variables is high, leading to exponentially
large tables
❑When data is limited or sparse, making it hard to estimate accurate
probabilities
❑When the relationship between variables can be captured more
efficiently using other models


Deterministic CPDs
❑There is no uncertainty or variation in the conditional relationship
between the variables
❑Specifies a deterministic relationship between the parent and child
nodes, where the value of the child variable is fully determined by the
parents’ state (the values of the parent variables).
❑Key Idea: Deterministic Relationship: a function that maps each
combination of parent values to exactly one child value
❑For every combination of parent node values, the child node takes a single,
fixed value. Different combinations may map to the same value (e.g., AND).
❑Often represent logical operations (like AND, OR, XOR) or arithmetic
functions (like sum, max, min) between parent nodes
❑The outcome is determined directly based on these operations


Deterministic CPDs
❑No Uncertainty Involved
❑Deterministic CPDs assign a probability of 1 to one specific outcome
and 0 to all others
❑A “hard rule”
❑ Examples
❑Boolean logic circuits
❑Decision rules in expert systems
❑Physical systems where outputs are exact functions of inputs
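
As a rough illustration, a deterministic CPD can be written as a plain function and, if needed, expanded into the equivalent table of 0s and 1s. The helper `deterministic_cpd` and the OR/XOR gates below are assumptions chosen for this example.

```python
from itertools import product

def deterministic_cpd(func, parent_domains, child_domain):
    """Expand a deterministic function into an explicit 0/1 CPD table.

    For each parent configuration, the child value func(*config) gets
    probability 1 and every other child value gets probability 0.
    """
    table = {}
    for config in product(*parent_domains):
        value = func(*config)
        table[config] = {c: (1.0 if c == value else 0.0) for c in child_domain}
    return table

binary = (0, 1)
or_cpd = deterministic_cpd(lambda a, b: a | b, [binary, binary], binary)
xor_cpd = deterministic_cpd(lambda a, b: a ^ b, [binary, binary], binary)

for config, row in or_cpd.items():
    print(config, row)
```

Storing only the function (here a one-line lambda) instead of the expanded table is what makes the representation compact.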


Deterministic CPDs… Advantages


❑Compact Representation: Can be represented more compactly than
tabular CPDs, especially when the relationship is simple (like logical
operations)
❑Reduces storage requirements and makes the model easier to
manage.
❑Efficiency in computation: The relationship between parent and child
nodes is deterministic, hence the child's value can be computed directly
once the parents' values are known.
❑Can significantly speed up calculations in a Bayesian network.
❑No learning from data required:
❑The distributions are based on rule-based relationships and can be
directly encoded without needing extensive training data.

Deterministic CPDs… Advantages


❑Easy Interpretability: relationship between variables is explicit and easy
to understand. (AND, OR etc.)
❑Reduction in Overfitting: don’t rely on fitting a probabilistic model to
data
❑They are simply rules encoded in the model


Deterministic CPDs… Disadvantages


❑Limited Flexibility: Rigid as they cannot capture uncertainty or
stochastic relationships
❑Noise or variability cannot be modelled accurately
❑Scalability with Complex Relationships: Difficult to manage when the
relationships involve complex functions or large numbers of variables
❑Inapplicable for Uncertain or Probabilistic Relationships
❑Missing data not handled: Exact knowledge of all parent nodes’ states
needed to determine the child node’s state
❑If some parent information is missing or uncertain, the
deterministic approach breaks down
❑Reduced Generalization: Do not generalize well to situations where
slight variations in input should lead to variations in output
❑Lack the ability to handle scenarios outside predefined rules

Deterministic CPDs
❑When to use:
❑When relationships are clear-cut, well-understood, and exact (e.g.,
logical circuits, rule-based systems)
❑When to avoid:
❑When modeling noisy, probabilistic, or uncertain systems where
relationships are better captured with probabilistic CPDs


Context-Specific CPDs
❑A way to represent probabilistic dependencies in a more efficient
manner by taking advantage of regularities in the data
❑ Instead of specifying a full CPD for every possible combination of
parent variables, context-specific CPDs allow us to define distributions
that are specific to certain contexts or conditions
❑Key Idea: Regularity in parameters, since the same effect is often
observed in multiple contexts
❑Leverage the idea of context-specific independence, where a node’s value
may be independent of some parents in certain contexts
❑Represented using decision trees and decision graphs


Context-Specific CPDs

Consider a scenario where a student’s job offer depends on their SAT score, whether they
applied, and a recommendation letter. Instead of specifying a CPD for every combination of
these variables, we can use a tree CPD to simplify:

❑ If the student did not apply, the job offer probability might only depend on the SAT score.
❑ If the student applied, the job offer probability might depend on both the SAT score and
the recommendation letter.
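
The scenario above can be sketched as a small tree CPD in Python. The variable values ("low"/"high", "weak"/"strong"), the nested-tuple encoding, and every probability are illustrative assumptions; only the branching structure follows the two bullets above.

```python
# Tree CPD for P(Job = yes | Apply, SAT, Letter).
# An internal node is (variable_name, {value: subtree}); a leaf is a float
# holding P(Job = yes). All numbers are made up for illustration.
job_tree = ("Apply", {
    # Did not apply: only the SAT score matters, Letter is never consulted.
    "no": ("SAT", {"low": 0.05, "high": 0.20}),
    # Applied: both the SAT score and the letter matter.
    "yes": ("SAT", {
        "low": ("Letter", {"weak": 0.10, "strong": 0.40}),
        "high": ("Letter", {"weak": 0.60, "strong": 0.90}),
    }),
})

def p_job_offer(tree, assignment):
    """Walk from the root to a leaf using the parent values in `assignment`."""
    node = tree
    while not isinstance(node, float):
        variable, branches = node
        node = branches[assignment[variable]]
    return node

def count_leaves(tree):
    """Number of distinct distributions (parameters) stored in the tree."""
    if isinstance(tree, float):
        return 1
    _, branches = tree
    return sum(count_leaves(subtree) for subtree in branches.values())

p_weak = p_job_offer(job_tree, {"Apply": "no", "SAT": "high", "Letter": "weak"})
p_strong = p_job_offer(job_tree, {"Apply": "no", "SAT": "high", "Letter": "strong"})
n_leaves = count_leaves(job_tree)
print(p_weak, p_strong, n_leaves)
```

In the context Apply = no the letter does not change the result, so the tree needs 6 leaves where a full table over three binary parents would need 8 rows.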

Context-Specific CPDs. Advantages


❑Efficiency in representation
❑Reduced Complexity, as parameters are reduced
❑Simplified models
❑Improved Accuracy
❑Tailored Distributions and hence more accurate probabilistic
dependencies
❑Contextual Relevance as they better reflect the actual dependencies
❑Flexibility
❑Adaptability to different scenarios
❑Customizability to modify or extend to include new contexts


Context-Specific CPDs. Advantages


❑Scalability
❑Useful in large datasets
❑Scalable Computation
❑Enhanced Interpretability
❑Clearer insights as the data is broken down into context-specific
components
❑Use of decision trees and graphs provides a more intuitive way to
interpret and explain the dependencies


Context-Specific CPDs. Disadvantages


❑Complexity in Design
❑Initial setup can be complex and time consuming, requiring a deep
understanding of the domain and the relationships between
variables
❑Expert knowledge: Domain experts needed
❑ Scalability Issues
❑As the number of contexts increases, the complexity of managing
and maintaining the CPDs can grow significantly.
❑Computational overhead in cases of large datasets
❑ Data Sparsity: insufficient data for some contexts can result in overfitting
❑Limited applicability when clear contextual dependencies are not defined
❑In cases where dependencies are complex or not context-dependent,
a traditional CPD might be more suitable

Context-Specific CPDs
❑When to use:
❑When you can identify specific contexts that simplify relationships and
reduce the complexity of your model.
❑When to avoid:
❑Where dependencies are not context-dependent, or when the cost of
identifying relevant contexts outweighs the benefits of the compact
representation

Self Study
❑Generalized Linear Models
