Chapter 04
Unsupervised Learning
• It is primarily used for exploratory data analysis and tasks where the goal is to uncover hidden patterns or structure in the data.
Key Characteristics
• No labeled data is provided; only input data is available.
Major Techniques
• Clustering
• Dimensionality Reduction
K-Means Clustering
• The algorithm uses centroids to determine which data points belong to which cluster and updates their positions iteratively to improve the clustering.
Definition
• A centroid is the geometric center of a cluster, calculated as the mean of all data points
within the cluster.
Convergence in K-Means occurs when the algorithm stops updating centroids or cluster assignments. This happens when the clusters stabilize, meaning the centroids no longer change significantly.
• Manhattan and Euclidean distances are used to measure the distance between two data points.
Example:
Final clusters are: {A1, B1, C2}, {A3, B2, B3}, and {A2, C1}
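To make the procedure concrete, here is a minimal K-Means sketch in Python (NumPy). The 2-D points, k = 2, and the seed are made-up illustrations, not the data from the example above.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-Means: assign each point to its nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Euclidean distance from every point to every centroid -> shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each centroid becomes the mean of its assigned points
        # (clusters are assumed non-empty in this toy setting).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Convergence: centroids no longer change significantly.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up 2-D points for illustration.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```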
Hierarchical Clustering
• Hierarchical clustering is a technique that builds a hierarchy of clusters for a dataset. Unlike flat
clustering methods like K-Means, hierarchical clustering organizes data into a tree-like structure
called a dendrogram, which illustrates how clusters are merged (or split) at different levels.
1. Agglomerative Clustering
• Bottom-Up Approach: Each data point starts as its own cluster, and clusters are iteratively merged
based on similarity until a single cluster (or a specified number of clusters) remains.
Steps:
1. Start with each data point as its own cluster.
2. Compute the distance (similarity) between all pairs of clusters using a distance metric (e.g., Euclidean distance).
3. Merge the two closest clusters.
4. Repeat steps 2 and 3 until all points are in a single cluster or the desired number of clusters is reached.
Distance Metrics for Clustering
• Single Linkage: Measures the shortest distance between any two points in different clusters.
• Complete Linkage: Measures the greatest distance between any two points in different clusters.
• Average Linkage: Measures the average distance between all pairs of points in different clusters.
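As a sketch of the agglomerative procedure, SciPy's scipy.cluster.hierarchy module builds the dendrogram and lets you cut it into a desired number of flat clusters; the data points below are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D points for illustration.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

# Bottom-up merging; "single" linkage uses the shortest distance
# between any two points in different clusters.
Z = linkage(X, method="single", metric="euclidean")

# Cut the dendrogram into a desired number of clusters (here, 2).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```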
Bayesian Networks
• A Bayesian network is a probabilistic graphical model that represents a set of random variables and their dependencies as a directed acyclic graph (DAG).
• Each node in the DAG corresponds to a random variable, and the edges represent direct causal or influential relationships between the variables.
Why Do We Need Bayesian Networks?
Bayesian networks are valuable tools for several reasons:
• Uncertainty Modeling: They allow us to model uncertain information and make probabilistic inferences.
• Causal Reasoning: They can represent causal relationships between variables, enabling us to understand how changes in one variable affect others.
• Decision Making: They can assist in making decisions under uncertainty by considering multiple factors and their probabilities.
• Learning from Data: They can be learned from data, allowing us to discover hidden relationships and patterns.
• Inference and Prediction: They enable us to make predictions about unobserved variables based on observed evidence.
Key Components of a Bayesian Network:
• Nodes: Represent random variables.
• Edges: Represent direct dependencies between variables.
• Conditional Probability Distributions (CPDs): Associated with each node, these
specify the probability distribution of a node's value given the values of its parent
nodes.
Formulas and Concepts:
• Joint Probability Distribution: The joint probability distribution of all variables in
a Bayesian network can be factorized as the product of the conditional probabilities
of each variable given its parents:
P(X1, X2, ..., Xn) = Π P(Xi | Parents(Xi))
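For example, in a small chain-structured network A → B → C, this factorization gives P(A, B, C) = P(A) P(B | A) P(C | B).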
• Bayesian Inference: Bayes' theorem is used to update beliefs about a variable given new evidence:
P(H | E) = P(E | H) P(H) / P(E)
Example:
You have a new burglar alarm installed at home. It is fairly reliable at detecting burglary, but it also sometimes responds to minor earthquakes. You have two neighbors, John and Mary, who promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but he sometimes confuses the telephone ringing with the alarm and calls then, too. Mary likes loud music and sometimes misses the alarm. Given the evidence of who has or has not called, estimate the probability of a burglary.
What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?
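A minimal sketch of this query, assuming the standard textbook CPT values for the burglary example (P(B) = 0.001, P(E) = 0.002, P(A | no B, no E) = 0.001, P(J | A) = 0.90, P(M | A) = 0.70); these numbers are not given above and are assumptions.

```python
# Assumed CPT values (not given in the notes above; standard textbook figures).
p_b = 0.001        # P(Burglary)
p_e = 0.002        # P(Earthquake)
p_a_nb_ne = 0.001  # P(Alarm | no Burglary, no Earthquake)
p_j_a = 0.90       # P(John calls | Alarm)
p_m_a = 0.70       # P(Mary calls | Alarm)

# Apply the factorization P(X1, ..., Xn) = product of P(Xi | Parents(Xi)):
# P(J, M, A, no B, no E) = P(J|A) P(M|A) P(A|no B, no E) P(no B) P(no E)
p = p_j_a * p_m_a * p_a_nb_ne * (1 - p_b) * (1 - p_e)
print(f"P(J, M, A, no B, no E) = {p:.6f}")  # about 0.00063
```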
Markov Chains
• State Space:
The state space of a Markov chain is the set of all possible states of the system. In the given example, the state space consists of two states: "sunny" and "rainy."
• Initial Probability:
The initial probability distribution specifies the probability of each state at the
initial time step. In the example: P(sunny) = 0.5, P(rainy) = 0.5.
• Transition Matrix:
The transition matrix defines the probabilities of transitioning from one state to
another. In the example, the transition matrix gives P(next state | current state) for the "sunny" and "rainy" states.
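The matrix values were not preserved in these notes; the sketch below assumes illustrative probabilities and shows how the matrix propagates the state distribution over time.

```python
import numpy as np

# States and initial distribution from the example above.
states = ["sunny", "rainy"]
pi = np.array([0.5, 0.5])

# Assumed transition matrix (illustrative values; not from the notes).
# Row = current state, column = next state.
P = np.array([[0.8, 0.2],   # sunny -> sunny, sunny -> rainy
              [0.4, 0.6]])  # rainy -> sunny, rainy -> rainy

# Distribution after n steps: pi_n = pi @ P^n.
dist = pi
for _ in range(3):
    dist = dist @ P
print(dict(zip(states, dist.round(3))))
```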
What are HMMs?
• Hidden Markov Models (HMMs) are statistical models used to represent systems
that are assumed to be Markov processes with unobserved (hidden) states.
• HMMs are widely utilized in various fields, including speech recognition,
bioinformatics, and finance.
Why Are HMMs Needed?
• Modeling Sequential Data: They can effectively capture temporal
dependencies in sequences.
• Dealing with Uncertainty: HMMs provide a framework for modeling systems
with hidden states and observations that can be noisy or incomplete.
• Probabilistic Inference: They allow for the computation of probabilities of
sequences of observations, making them suitable for tasks such as classification
and prediction.
Parameters of HMM
• States (S): A set of hidden states in the model. Each state
represents a possible condition of the system.
• Observations (O): A set of possible observations that can be
generated by the states.
• Transition Probabilities (A): A matrix where A[i][j] represents
the probability of transitioning from state i to state j.
• Emission Probabilities (B): A matrix where B[i][k] represents
the probability of emitting observation k from state i.
• Initial State Distribution (π): A vector where π[i] indicates the probability of starting in state i.
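To tie these parameters together, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under an HMM; every number below is an assumed illustration, not a value from these notes.

```python
import numpy as np

# Hidden states S and observations O for a toy weather HMM (assumed values).
states = ["sunny", "rainy"]
observations = ["walk", "umbrella"]

pi = np.array([0.5, 0.5])      # initial state distribution: pi[i] = P(start in state i)
A = np.array([[0.8, 0.2],      # transition probabilities: A[i][j] = P(state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities: B[i][k] = P(observation k | state i)
              [0.2, 0.8]])

def forward(obs):
    """Probability of an observation sequence (given as indices into observations)."""
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for k in obs[1:]:
        alpha = (alpha @ A) * B[:, k]  # propagate through A, weight by emission
    return alpha.sum()

# P(walk, walk, umbrella) under the assumed parameters.
print(forward([0, 0, 1]))
```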
Properties of HMM
• Markov Property: The future state depends only on the current state and not on
the previous states.
• Stationary Transition Probabilities: The transition probabilities remain the
same over time.
• Memoryless: The model does not retain memory of past states beyond the
current one.