
Fundamentals of Machine Learning

Prepared by

Prince Thomas M.E., PhD


Associate Professor
Chapter 4: Unsupervised Learning and Graphical Models
• Clustering Methods
• K-Means Clustering: Centroid Calculation and Convergence
• Hierarchical Clustering: Agglomerative and Divisive Methods
• Frequent Pattern Mining
• Apriori Algorithm and Association Rules
• Applications in Market Basket Analysis
• Graphical Models
• Bayesian Networks
• Markov Networks
• Hidden Markov Models (HMMs)
• States, Transitions, and Observations
• Forward Algorithm and Viterbi Algorithm
• Applications in Speech Recognition and NLP
Unsupervised learning
• It involves analyzing and clustering data without labeled outputs.

• It tries to find hidden patterns, structures, or features within the data.

• It is primarily used for exploratory data analysis and tasks where the goal is to uncover insights rather than predict labels.

Key Characteristics
• No labeled data is provided; only input data is available.

• Focuses on understanding data distributions, relationships, and patterns.

• Examples include clustering, dimensionality reduction, and anomaly detection.

Major Techniques
• Clustering

• Dimensionality Reduction

• Association Rule Mining


Types of Unsupervised Learning Algorithms
• Unsupervised learning algorithms can be further categorized
into two types of problems: clustering and association.
Clustering
• It is an unsupervised learning technique.
• It is used to group data points into clusters based on their similarity.
• Data points in the same cluster are more similar to each other than to those in other clusters.
• Clustering is widely used for exploratory data analysis and pattern recognition.
Clustering Methods
• K-Means Clustering
• Hierarchical Clustering
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Mean-Shift Clustering
• Gaussian Mixture Models (GMM)
Applications of Clustering
• Customer segmentation
• Social network analysis
• Market segmentation
K-Means Clustering
• K-Means Clustering is a popular unsupervised machine learning algorithm.
• It is used to partition a dataset into distinct clusters.
• Each cluster is represented by a centroid, and the algorithm iteratively assigns data
points to clusters and updates these centroids until convergence.
Steps of the Algorithm
• Select K random points as the initial centroids.
• Assign each data point to its closest centroid, forming the K clusters.
• Recompute the centroid (mean) of each cluster.
• Reassign each data point to the closest of the new centroids.
• If any assignment changed, repeat the recompute-and-reassign steps; otherwise, the algorithm has converged and finishes.
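A minimal from-scratch sketch of these steps (Python with NumPy, using Euclidean distance; the toy data and K value are made up for illustration):

```python
import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to its closest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop when the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy example with two obvious clusters
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = k_means(X, k=2)
print(labels, centroids)
```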
Centroid in K-Means Clustering
• The centroid is the central point of a cluster in K-Means Clustering.

• It is a representative "average" position for all the data points in a cluster.

• The algorithm uses centroids to determine which data points belong to which cluster and
updates their positions iteratively for better clustering.

Definition
• A centroid is the geometric center of a cluster, calculated as the mean of all data points
within the cluster.

Convergence in K-Means occurs when the algorithm stops updating centroids or cluster
assignments. This happens when the clusters stabilize, meaning the centroids no longer change
significantly

• Manhattan or Euclidean distance is used to measure the distance between two data points.
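For reference, a tiny sketch of the two distance measures on made-up points:

```python
import numpy as np

p, q = np.array([1.0, 2.0]), np.array([4.0, 6.0])

euclidean = np.linalg.norm(p - q)   # sqrt((1-4)^2 + (2-6)^2) = 5.0
manhattan = np.abs(p - q).sum()     # |1-4| + |2-6| = 7.0
print(euclidean, manhattan)
```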
Example:
Final clusters: {A1, B1, C2}, {A3, B2, B3}, and {A2, C1}
Hierarchical Clustering
• Hierarchical clustering is a technique that builds a hierarchy of clusters for a dataset. Unlike flat
clustering methods like K-Means, hierarchical clustering organizes data into a tree-like structure
called a dendrogram, which illustrates how clusters are merged (or split) at different levels.

Types of Hierarchical Clustering

1. Agglomerative Clustering
• Bottom-Up Approach: Each data point starts as its own cluster, and clusters are iteratively merged
based on similarity until a single cluster (or a specified number of clusters) remains.

Steps:

1. Start with each data point as an individual cluster.

2. Compute the distance (similarity) between all pairs of clusters using a distance metric (e.g., Euclidean distance).

3. Merge the two closest clusters into a single cluster.

4. Repeat steps 2 and 3 until all points are in a single cluster or the desired number of clusters is reached.
Distance Metrics for Clustering

• Single Linkage: Measures the shortest distance between any two points in different clusters.

• Complete Linkage: Considers the farthest distance between points in different clusters; tends to produce compact, spherical clusters.

• Average Linkage: Calculates the mean of all pairwise distances between points in two clusters; balances between single and complete linkage.

• Centroid Linkage: Uses the Euclidean distance between the centroids (average positions) of two clusters; focuses on cluster centers rather than individual points.
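A minimal agglomerative-clustering sketch using SciPy (assuming SciPy is available); the method argument selects one of the linkage criteria above ('single', 'complete', 'average', or 'centroid'):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Toy data: two loose groups of points
X = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 0.9],
              [5.0, 5.0], [5.2, 4.8], [4.9, 5.3]])

# Agglomerative clustering with single linkage (try 'complete', 'average', 'centroid')
Z = linkage(X, method='single', metric='euclidean')

# Cut the dendrogram to obtain 2 flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)

# dendrogram(Z) can be plotted with matplotlib to visualize the merge hierarchy
```

Cutting the dendrogram at different levels yields different numbers of flat clusters.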
2. Divisive Clustering
• Top-Down Approach: Start with all data points in one cluster and recursively split
clusters into smaller clusters until each data point forms its own cluster (or a
desired number of clusters is reached).
Steps:
1. Begin with all data points in a single cluster.
2. Identify the cluster to split using a criterion (e.g., largest variance or dissimilarity).
3. Divide the selected cluster into two sub-clusters based on a distance metric or
other criteria.
4. Repeat step 2 and 3 until all points are in their own cluster or the desired number
of clusters is achieved.
Challenges:
• More computationally intensive than agglomerative clustering, since splitting a cluster requires evaluating many possible ways to partition it at each step.
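Library support for divisive clustering is limited; a common approximation is bisecting K-Means, which repeatedly splits one cluster into two. A rough sketch under that assumption (scikit-learn's KMeans used for each split, with "largest cluster" as a simple splitting criterion):

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    # Start with all points in a single cluster (top-down approach)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # Pick the largest cluster to split (a simple splitting criterion)
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        # Split the selected cluster into two sub-clusters with 2-means
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X = np.random.default_rng(0).normal(size=(30, 2))
print([len(c) for c in divisive_clustering(X, 3)])
```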
Frequent Pattern Mining
• It is a data mining technique to discover patterns, correlations, or
associations in datasets.
• It identifies sets of items, sequences, or events that occur frequently
together in transactional databases.
Apriori Algorithm
• The Apriori Algorithm is an iterative method used to identify frequent
itemsets in a dataset and generate association rules.
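A minimal market-basket sketch using the mlxtend library (assumed to be installed; the transactions are made up for illustration):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactional data (each list is one shopping basket)
transactions = [
    ['bread', 'milk'],
    ['bread', 'diapers', 'beer', 'eggs'],
    ['milk', 'diapers', 'beer', 'cola'],
    ['bread', 'milk', 'diapers', 'beer'],
    ['bread', 'milk', 'diapers', 'cola'],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Find itemsets that appear in at least 60% of the transactions
frequent = apriori(df, min_support=0.6, use_colnames=True)

# Generate association rules with at least 70% confidence
rules = association_rules(frequent, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])
```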
Graphical Models
• Graphical models are probabilistic models that represent dependencies among
variables using a graph structure. They are used for reasoning about
uncertainty and for making predictions based on observed data.
Key Features:
• Nodes: Represent random variables.
• Edges: Represent probabilistic dependencies.
• Types: Directed and undirected graphs.
Types of Graphical Models:
• Bayesian Networks (Directed Acyclic Graphs): Capture conditional dependencies via directed edges.
• Markov Networks (Undirected Graphs): Capture undirected (symmetric) dependencies between variables.
What is a Bayesian Network?

• A Bayesian network is also known as a Bayes network, belief network, or probabilistic directed acyclic graphical model.

• It is a graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

• Each node in the DAG corresponds to a random variable, and the edges
represent direct causal or influential relationships between the variables.
Why Do We Need Bayesian Networks?
Bayesian networks are valuable tools for various reasons:
• Uncertainty Modeling: They allow us to model uncertain information and make probabilistic inferences.
• Causal Reasoning: They can represent causal relationships between variables, enabling us to understand how changes in one variable affect others.
• Decision Making: They can assist in making decisions under uncertainty by considering multiple factors and their probabilities.
• Learning from Data: They can be learned from data, allowing us to discover hidden relationships and patterns.
• Inference and Prediction: They enable us to make predictions about unobserved variables based on observed evidence.
Key Components of a Bayesian Network:
• Nodes: Represent random variables.
• Edges: Represent direct dependencies between variables.
• Conditional Probability Distributions (CPDs): Associated with each node, these
specify the probability distribution of a node's value given the values of its parent
nodes.
Formulas and Concepts:
• Joint Probability Distribution: The joint probability distribution of all variables in
a Bayesian network can be factorized as the product of the conditional probabilities
of each variable given its parents:
P(X1, X2, ..., Xn) = Π P(Xi | Parents(Xi))
• Bayesian Inference: Bayes' theorem is used to update beliefs about a variable given new evidence:
P(H | E) = P(E | H) · P(H) / P(E)
Example:
You have a new burglar alarm installed at home. It is fairly reliable at
detecting burglary, but also sometimes responds to minor earthquakes. You
have two neighbors, John and Merry, who promised to call you at work when
they hear the alarm. John always calls when he hears the alarm, but
sometimes confuses telephone ringing with the alarm and calls too. Merry
likes loud music and sometimes misses the alarm. Given the evidence of who
has or has not called, estimate the probability of a burglary.
What is the probability that the alarm has sounded but neither a
burglary nor an earthquake has occurred, and both John and Merry
call?

What is the probability that John calls?
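A small sketch of how the first query can be answered directly from the factorization P(X1, ..., Xn) = Π P(Xi | Parents(Xi)). The conditional probability values below are the ones commonly used with this textbook example and are assumptions here, since the slide's probability tables are not reproduced:

```python
# Assumed CPT values (standard textbook numbers for this example)
P_B = 0.001                       # P(Burglary)
P_E = 0.002                       # P(Earthquake)
P_A_given = {                     # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J_given_A = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M_given_A = {True: 0.70, False: 0.01}   # P(MerryCalls | Alarm)

# P(J, M, A, not B, not E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p = (P_J_given_A[True] * P_M_given_A[True]
     * P_A_given[(False, False)] * (1 - P_B) * (1 - P_E))
print(p)   # ~0.00063 with these assumed numbers
```

The second query, P(John calls), would be obtained by summing the joint probability over all combinations of the remaining variables.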


Markov Network
• It is also known as a Markov Random Field (MRF).
• It is a type of undirected graphical model that represents probabilistic
relationships between random variables.
• Markov Networks use undirected edges to represent pairwise relationships
between variables.
Markov Network Parameters
• Nodes: Represent random variables (e.g., "sunny," "rainy").
• Edges: Represent pairwise relationships between variables (e.g., the influence
of today's weather on tomorrow's).
• Potential Functions: Measure the compatibility of joint assignments of values
to variables within a clique (a fully connected subgraph). These functions
quantify the strength of the relationship between variables.
Markov Network Parameters

• State Space:
The state space of a Markov Network is the set of all possible configurations of
the variables. In the given example, the state space consists of two states:
"sunny" and "rainy."
• Initial Probability:
The initial probability distribution specifies the probability of each state at the
initial time step. In the example: P(sunny) = 0.5 P(rainy) = 0.5
• Transition Matrix:
The transition matrix defines the probabilities of transitioning from one state to another at the next time step, e.g., P(sunny → rainy) and P(rainy → sunny).
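A small sketch of the sunny/rainy example with an assumed transition matrix (placeholder values, since the slide's matrix is not reproduced):

```python
import numpy as np

states = ['sunny', 'rainy']
initial = np.array([0.5, 0.5])     # P(sunny), P(rainy) at time 0

# Assumed transition matrix: row = today's state, column = tomorrow's state
T = np.array([[0.8, 0.2],    # P(sunny->sunny), P(sunny->rainy)
              [0.4, 0.6]])   # P(rainy->sunny), P(rainy->rainy)

# Distribution over tomorrow's weather
tomorrow = initial @ T
print(dict(zip(states, tomorrow)))   # {'sunny': 0.6, 'rainy': 0.4}
```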
What are HMMs?
• Hidden Markov Models (HMMs) are statistical models used to represent systems
that are assumed to be Markov processes with unobserved (hidden) states.
• HMMs are widely utilized in various fields, including speech recognition,
bioinformatics, and finance.
Why are HMMs Needed?
• Modeling Sequential Data: They can effectively capture temporal
dependencies in sequences.
• Dealing with Uncertainty: HMMs provide a framework for modeling systems
with hidden states and observations that can be noisy or incomplete.
• Probabilistic Inference: They allow for the computation of probabilities of
sequences of observations, making them suitable for tasks such as classification
and prediction.
Parameters of HMM
• States (S): A set of hidden states in the model. Each state
represents a possible condition of the system.
• Observations (O): A set of possible observations that can be
generated by the states.
• Transition Probabilities (A): A matrix where A[i][j] represents
the probability of transitioning from state i to state j.
• Emission Probabilities (B): A matrix where B[i][k] represents
the probability of emitting observation k from state i.
• Initial State Distribution (π): A vector where π[i] indicates the probability that the system starts in state i.
Properties of HMM
• Markov Property: The future state depends only on the current state and not on
the previous states.
• Stationary Transition Probabilities: The transition probabilities remain the
same over time.
• Memoryless: The model does not retain memory of past states beyond the
current one.

States, Transitions, and Observations


• States: These are not directly observable and represent the underlying processes.
• Transitions: These define how likely it is to move from one state to another.
• Observations: These are the outputs we can observe and are dependent on the
states.
Forward Algorithm
• The Forward Algorithm is used to calculate the probability of observing
a sequence of events (observations) given a Hidden Markov Model.

Forward Algorithm Steps


• Initialization: Compute the initial probabilities of the first observation.
• Recursion: For each subsequent observation, update the probabilities
based on the previous state probabilities and the transition/emission
probabilities.
• Termination: Sum the probabilities of ending in any state after the last
observation.
States: Start, Sunny, Rainy, End
Transition Probabilities:
• Start to Sunny: 0.6
• Start to Rainy: 0.4
• Sunny to Sunny: 0.4
• Sunny to Rainy: 0.6
• Rainy to Sunny: 0.2
• Rainy to Rainy: 0.3
• Rainy to End: 0.1
• Sunny to End: 0.2
Emission Probabilities (for illustration), assume:
• P(Play | Sunny) = 0.7
• P(Play | Rainy) = 0.4
• P(Shop | Sunny) = 0.2
• P(Shop | Rainy) = 0.5
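A minimal forward-algorithm sketch. Because the probabilities listed above are only a partial illustration, the code below uses a simplified two-state Sunny/Rainy model with assumed, normalized values; the structure (initialization, recursion, termination) follows the steps described earlier:

```python
import numpy as np

states = ['Sunny', 'Rainy']
obs_names = ['Play', 'Shop']

pi = np.array([0.6, 0.4])            # assumed initial distribution
A = np.array([[0.6, 0.4],            # assumed P(Sunny->Sunny), P(Sunny->Rainy)
              [0.3, 0.7]])           # assumed P(Rainy->Sunny), P(Rainy->Rainy)
B = np.array([[0.7, 0.3],            # assumed P(Play|Sunny), P(Shop|Sunny)
              [0.4, 0.6]])           # assumed P(Play|Rainy), P(Shop|Rainy)

def forward(observations):
    # Initialization: probability of the first observation from each state
    alpha = pi * B[:, observations[0]]
    # Recursion: propagate through transition and emission probabilities
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]
    # Termination: sum over all possible final states
    return alpha.sum()

obs = [obs_names.index('Play'), obs_names.index('Shop')]
print(forward(obs))   # P(Play, Shop) under this assumed model
```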
Viterbi Algorithm
• The Viterbi Algorithm finds the most likely sequence of hidden states given
the observed sequence. It uses dynamic programming similar to the
Forward Algorithm but tracks the best paths taken to reach each state.
Viterbi Algorithm Steps
• Initialization: Similar to the Forward Algorithm but also record the state
paths.
• Recursion: Update the probabilities and the paths for each observation.
• Termination: Trace back the most likely path from the last state to the first.
• The path from Sunny → Rainy → Shop → End yields a maximum probability of 0.021.
• The path from Rainy → Rainy → Shop → End yields a lower probability of 0.006.
• The better path, based on the maximum probability of reaching the End state after observing the sequence, is therefore Sunny → Rainy → Shop → End.
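A matching Viterbi sketch under the same assumed two-state model used in the forward-algorithm example; it records the best predecessor of each state and backtracks at the end:

```python
import numpy as np

states = ['Sunny', 'Rainy']
pi = np.array([0.6, 0.4])                       # assumed initial distribution
A = np.array([[0.6, 0.4], [0.3, 0.7]])          # assumed transition matrix
B = np.array([[0.7, 0.3], [0.4, 0.6]])          # assumed emission matrix (Play, Shop)

def viterbi(observations):
    # Initialization: best probability of each state for the first observation
    delta = pi * B[:, observations[0]]
    backpointers = []
    # Recursion: keep, for every state, the best path probability and its predecessor
    for o in observations[1:]:
        scores = delta[:, None] * A              # scores[i, j]: best path ending i -> j
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o]
    # Termination: trace back the most likely state sequence
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):
        path.insert(0, int(bp[path[0]]))
    return [states[s] for s in path], delta.max()

print(viterbi([0, 1]))   # most likely weather sequence for observations Play, Shop
```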
