Module 4 - Supervised and Unsupervised learning techniques

SUPERVISED LEARNING

ALGORITHMS

Presenter: Dr. Amit Kumar Das


Professor,
Dept. of Computer Science and Engg.,
Institute of Engineering & Management.
K-NEAREST NEIGHBOR
ALGORITHM
LET’S CONSIDER THE INPUT DATA …
DATA HOLDOUT

Training data

Test data
LET’S SEE HOW THE TRAINING DATA IS
GROUPED…
Name      Aptitude  Communication  Class
Karuna       2           5         Speaker
Bhuvna       2           6         Speaker
Gaurav       7           6         Leader
Parul        7           2.5       Intel
Dinesh       8           6         Leader
Jani         4           7         Speaker
Bobby        5           3         Intel
Parimal      3           5.5       Speaker
Govind       8           3         Intel
Susant       6           5.5       Leader
Gouri        6           4         Intel
Bharat       6           7         Leader
Ravi         6           2         Intel
Pradeep      9           7         Leader

[Scatter plot: Communication (y-axis) vs Aptitude (x-axis), training points grouped by class]
SAY WE DON’T KNOW WHICH CLASS THE
TEST DATA BELONGS TO …
Name      Aptitude  Communication  Class
Karuna       2           5         Speaker
Bhuvna       2           6         Speaker
Gaurav       7           6         Leader
Parul        7           2.5       Intel
Dinesh       8           6         Leader
Jani         4           7         Speaker
Bobby        5           3         Intel
Parimal      3           5.5       Speaker
Govind       8           3         Intel
Susant       6           5.5       Leader
Gouri        6           4         Intel
Bharat       6           7         Leader
Ravi         6           2         Intel
Pradeep      9           7         Leader
Josh         5           4.5       ???

[Scatter plot: Communication vs Aptitude, with the unlabelled test point (Josh) added]
LET’S TRY TO FIND SIMILARITY WITH THE
DIFFERENT TRAINING DATA INSTANCES…
We calculate the “Euclidean distance” from the test data point
to the different training data points using the formula:

distance = √((a2 − a1)² + (c2 − c1)²)

where (a1, c1) and (a2, c2) are the (Aptitude, Communication) values of the two data points.
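A minimal sketch of this idea in Python, using the training records from the table above; the choice of k = 3 is an assumption for illustration:

```python
import math

# Training data from the table above: (name, aptitude, communication, class)
training = [
    ("Karuna", 2, 5, "Speaker"), ("Bhuvna", 2, 6, "Speaker"),
    ("Gaurav", 7, 6, "Leader"),  ("Parul", 7, 2.5, "Intel"),
    ("Dinesh", 8, 6, "Leader"),  ("Jani", 4, 7, "Speaker"),
    ("Bobby", 5, 3, "Intel"),    ("Parimal", 3, 5.5, "Speaker"),
    ("Govind", 8, 3, "Intel"),   ("Susant", 6, 5.5, "Leader"),
    ("Gouri", 6, 4, "Intel"),    ("Bharat", 6, 7, "Leader"),
    ("Ravi", 6, 2, "Intel"),     ("Pradeep", 9, 7, "Leader"),
]

def knn_predict(aptitude, communication, k=3):
    # Euclidean distance from the test point to every training point
    dists = sorted(
        (math.hypot(apt - aptitude, comm - communication), cls)
        for _, apt, comm, cls in training
    )
    # Majority vote among the k nearest neighbours
    votes = [cls for _, cls in dists[:k]]
    return max(set(votes), key=votes.count)

print(knn_predict(5, 4.5))  # classify the test record (Josh)
```

With k = 3, the nearest neighbours of the test point (5, 4.5) are Gouri, Susant and Bobby, so the majority vote assigns the class Intel.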
DIFFERENT MEASURES OF SIMILARITY
Distance-based similarity measure –
The most common distance measure is the Euclidean distance, which, between two
features F1 and F2, is calculated as:

d(F1, F2) = √((f11 − f21)² + (f12 − f22)² + … + (f1n − f2n)²)

where F1 = (f11, f12, …, f1n) and F2 = (f21, f22, …, f2n) are features of an n-dimensional data set.


A more generalized form of the Euclidean distance is the Minkowski distance:

d(F1, F2) = (|f11 − f21|^r + |f12 − f22|^r + … + |f1n − f2n|^r)^(1/r)

It takes the form of the Euclidean distance (also called the L2 norm) when r = 2, and at r = 1 it takes the form of the Manhattan distance (also called the L1 norm):

d(F1, F2) = |f11 − f21| + |f12 − f22| + … + |f1n − f2n|
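A small Python sketch of the Minkowski family of distances; the two feature vectors are made-up values for illustration:

```python
def minkowski(f1, f2, r):
    # Minkowski distance; r = 2 gives Euclidean (L2), r = 1 gives Manhattan (L1)
    return sum(abs(a - b) ** r for a, b in zip(f1, f2)) ** (1 / r)

x, y = (2, 5), (5, 4.5)    # two points in a 2-D feature space
print(minkowski(x, y, 2))  # Euclidean distance
print(minkowski(x, y, 1))  # Manhattan distance
```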


DIFFERENT MEASURES OF SIMILARITY
Distance-based similarity measure –
To calculate the distance between binary vectors, the Hamming distance is used.

For example, the Hamming distance between the two vectors 01101011 and 11001001 is 3,
as they differ in three bit positions (the first, third and seventh).
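The same count in a short Python sketch, using the two vectors from the example:

```python
def hamming(v1, v2):
    # Number of positions at which the two binary strings differ
    return sum(b1 != b2 for b1, b2 in zip(v1, v2))

print(hamming("01101011", "11001001"))  # -> 3
```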

Other similarity measures –


The Jaccard distance, a measure of dissimilarity between two features, is the
complement of the Jaccard index. For two features having binary values, the Jaccard
index is measured as:

J = n11 / (n01 + n10 + n11)

where n11 = number of cases where both the features have value 1
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0

Jaccard distance, dJ = 1 − J
DIFFERENT MEASURES OF SIMILARITY
Other similarity measures –
Jaccard distance
Let’s consider two features F1 and F2 having values (0, 1, 1, 0, 1, 0, 1, 0) and (1, 1, 0, 0, 1, 0, 0, 0). Here n11 = 2, n01 = 1 and n10 = 2, so J = 2 / (1 + 2 + 2) = 0.4 and the Jaccard distance is dJ = 1 − 0.4 = 0.6.
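The same computation in Python, following the n11/n01/n10 definitions above:

```python
def jaccard_index(f1, f2):
    # n11: both 1; n01: feature 1 is 0, feature 2 is 1; n10: the reverse.
    # Cases where both features are 0 are ignored.
    n11 = sum(a == 1 and b == 1 for a, b in zip(f1, f2))
    n01 = sum(a == 0 and b == 1 for a, b in zip(f1, f2))
    n10 = sum(a == 1 and b == 0 for a, b in zip(f1, f2))
    return n11 / (n01 + n10 + n11)

F1 = (0, 1, 1, 0, 1, 0, 1, 0)
F2 = (1, 1, 0, 0, 1, 0, 0, 0)
J = jaccard_index(F1, F2)
print(J, 1 - J)  # Jaccard index 0.4, Jaccard distance 0.6
```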

Cosine Similarity

It is calculated as:

cos θ = (x · y) / (‖x‖ ‖y‖)

where x · y is the dot product of the two vectors and ‖x‖, ‖y‖ are their magnitudes.
DIFFERENT MEASURES OF SIMILARITY
Cosine Similarity
Let’s calculate the cosine similarity of x and y, where x = (2, 4, 0, 0, 2, 1, 3, 0, 0) and y
= (2, 1, 0, 0, 3, 2, 1, 0, 1).
In this case, x·y = 2*2 + 4*1 + 0*0 + 0*0 + 2*3 + 1*2 + 3*1 + 0*0 + 0*1 = 19,
‖x‖ = √(2² + 4² + 2² + 1² + 3²) = √34 ≈ 5.83 and ‖y‖ = √(2² + 1² + 3² + 2² + 1² + 1²) = √20 ≈ 4.47,
so the cosine similarity is 19 / (5.83 × 4.47) ≈ 0.73.

Cosine similarity actually measures the cosine of the angle between the x and y vectors. In the above
example, the angle comes to about 43°, which indicates a good level of similarity. If the cosine
similarity has a value of 1, the angle between x and y is 0°, which means x and y are the same
except for their magnitude.

Cosine similarity is one of the most popular measures in text classification.
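A short Python sketch that reproduces the calculation above:

```python
import math

def cosine_similarity(x, y):
    # cos(theta) = (x . y) / (||x|| * ||y||)
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = (2, 4, 0, 0, 2, 1, 3, 0, 0)
y = (2, 1, 0, 0, 3, 2, 1, 0, 1)
sim = cosine_similarity(x, y)
print(sim)                           # ~0.73
print(math.degrees(math.acos(sim)))  # angle of ~43 degrees
```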


DECISION TREE
ALGORITHM
WHAT IS ENTROPY?
 In statistical mechanics, entropy is a property of a
thermodynamic system, closely related to its microscopic
configurations (known as microstates).

 Because it is determined by the number of possible
microstates, entropy is related to the amount of additional
information needed to specify the exact physical state of a
system, given its macroscopic specification. For this
reason, it is often said that entropy is an expression of the
disorder, or randomness, of a system, or of the lack of
information about it.

 The concept of entropy plays a central role in information
theory.
TRAINING DATA FOR GTS RECRUITMENT
ENTROPY AND INFORMATION GAIN
CALCULATION (LEVEL 1)

Entropy(S) = −Σ pi log2(pi)

where pi is the proportion of examples in S belonging to class i.

Information Gain (S, A) = Entropy(Sbs) − Entropy(Sas)

where Sbs refers to the data before the split and Sas to the data after the split on attribute A. The entropy after the split is the weighted average of the entropies of the partitions Sj created by A:

Entropy(Sas) = Σ (|Sj| / |S|) × Entropy(Sj)
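The following Python sketch implements these formulas. The class labels and the split shown are hypothetical, since the GTS recruitment table itself is not reproduced here:

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, partitions):
    # Entropy before the split minus the weighted entropy after the split
    total = len(labels)
    entropy_after = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(labels) - entropy_after

# Hypothetical labels: 9 selected, 5 rejected; an attribute splits them in two
labels = ["Yes"] * 9 + ["No"] * 5
partitions = [["Yes"] * 6 + ["No"] * 1, ["Yes"] * 3 + ["No"] * 4]
print(entropy(labels))                       # ~0.94
print(information_gain(labels, partitions))  # gain from this split
```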
ENTROPY AND INFORMATION GAIN CALCULATION
(LEVEL 2)
ENTROPY AND INFORMATION GAIN
CALCULATION (LEVEL 3)
RANDOM FOREST CLASSIFIERS
SIMPLE LINEAR
REGRESSION ALGORITHM
MOTIVATING PROBLEM

How can he validate what he believes?

As we know, in simple linear regression, the line is drawn using the
regression formula:

Y = a + bX

If we know the values of ‘a’ and ‘b’, then it is easy to predict the value of Y
for any given X by using the above formula. But the question is how to
calculate the values of ‘a’ and ‘b’ for a given set of X and Y values?
MOTIVATING PROBLEM (CONTD.)
A scatter plot was drawn to explore the relationship between the
independent variable (internal marks) mapped to X-axis and dependent
variable (external marks) mapped to Y-axis as depicted in the figure below.
ORDINARY LEAST SQUARES (OLS) TECHNIQUE
A straight line is drawn as close as possible over the points on the scatter
plot. Ordinary Least Squares (OLS) is the technique used to estimate a
line that will minimize the error (ε), which is the difference between the
predicted and the actual values of Y. This means summing the errors of
each prediction or, more appropriately, the Sum of the Squares of the
Errors (SSE), i.e. SSE = Σ(yi − ŷi)².

It is observed that the SSE is least when ‘b’ takes the value

b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

The corresponding value of ‘a’ calculated using the above value of ‘b’ is

a = ȳ − b·x̄
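A small Python sketch of this calculation; the internal/external marks below are made-up values, not the data set used in the slides:

```python
def ols_fit(x, y):
    # b = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2);  a = y_bar - b*x_bar
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return a, b

internal = [20, 25, 22, 30, 28]   # hypothetical internal marks (X)
external = [55, 68, 62, 76, 72]   # hypothetical external marks (Y)
a, b = ols_fit(internal, external)
print(f"MExt = {a:.2f} + {b:.2f} * MInt")
```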
OLS TECHNIQUE BASED CALCULATION
As we have already seen, the simple linear regression model built on the
data in the example is

MExt = 19.04 + 1.89 * MInt

The value of the intercept from the above equation is 19.04. However, none of
the internal marks is 0. So, intercept = 19.04 indicates that 19.04 is the portion of
the external examination marks not explained by the internal examination marks.

 Slope measures the estimated change in the average value of Y as a result of a
one-unit change in X. Here, slope = 1.89 tells us that the average value of the
external examination marks increases by 1.89 for each additional mark in the
internal examination.
MULTIPLE LINEAR REGRESSION
 Two or more independent variables, i.e. predictors are
involved in the model
 Say, while predicting Price of a Property as the dependent
variable, the possible predictors can be Area of the Property,
location, floor, number of years since purchase, amenities
available, etc.
 We can form a multiple regression equation as shown below:
PriceProperty = f (AreaProperty, location, floor, Ageing, Amenities)
 The following expression describes the equation involving the
relationship with two predictor variables, namely X1 and X2:

Ŷ = a + b1X1 + b2X2

The model describes a plane in the three-dimensional space of Ŷ,
X1, and X2. Parameter ‘a’ is the intercept of this plane. Parameters
‘b1’ and ‘b2’ are referred to as partial regression coefficients.
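A brief sketch of fitting such a plane using NumPy's least-squares solver; the property data below is hypothetical:

```python
import numpy as np

# Hypothetical data: area (sq. ft) and floor as the two predictors of price
X1 = np.array([800, 950, 1100, 1250, 1400])   # AreaProperty
X2 = np.array([2, 5, 3, 8, 10])               # floor
Y = np.array([40.0, 52.0, 55.0, 70.0, 80.0])  # PriceProperty (in lakhs)

# Design matrix with a column of ones for the intercept 'a'
A = np.column_stack([np.ones_like(X1, dtype=float), X1, X2])
(a, b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(f"Y_hat = {a:.2f} + {b1:.4f}*X1 + {b2:.2f}*X2")
```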
UNSUPERVISED
LEARNING
UNSUPERVISED LEARNING - CLUSTERING

[Scatter plot: data points grouped into four clusters, labelled Cluster 1 to Cluster 4]
UNSUPERVISED LEARNING – ASSOCIATION
ANALYSIS
DIFFERENT CLUSTERING TECHNIQUES
 Partitioning techniques
 Hierarchical techniques
 Density-based techniques
DIFFERENT CLUSTERING TECHNIQUES (CONTD.)
Partitioning techniques
 Uses mean or medoid (etc.) to represent cluster
centre
 Adopts distance-based approach to refine
clusters
 Finds mutually exclusive clusters of spherical
or nearly spherical shape
 Effective for data sets of small to medium size
 Two of the most important algorithms for
partitioning-based clustering are k-means
and k-medoids
DIFFERENT CLUSTERING TECHNIQUES (CONTD.)
Hierarchical techniques
 Creates hierarchical or tree-like structure through
decomposition or merger
 For example, in a problem of organizing employees of
a university in different departments, first the
employees are grouped under the different
departments in the university. Then within each
department, the employees can be grouped according
to their roles such as professors, assistant professors,
supervisors, lab assistants, etc.
 Uses distance between the nearest or furthest points
in neighbouring clusters as a guideline for refinement
 Two main hierarchical clustering methods:
agglomerative clustering and divisive clustering
DIFFERENT CLUSTERING TECHNIQUES (CONTD.)
Hierarchical techniques
 Agglomerative clustering is a bottom-up technique
which starts with individual objects as clusters and
then iteratively merges them to form larger clusters.
 Divisive clustering starts with one cluster with all
given objects and then splits it iteratively to form
smaller clusters.
 In both these cases, it is important to select the split
and merger points carefully, because the subsequent
splits or mergers will use the result of the previous
ones, and there is no option to swap objects between
the clusters or rectify the decisions made in previous
steps. A minimal sketch of the agglomerative approach
is shown below.
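A minimal sketch of agglomerative clustering using SciPy; the points and the choice of single linkage (merging by nearest points) are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Six made-up 2-D points forming two natural groups
X = np.array([[1, 1], [1.5, 2], [8, 8], [9, 9], [1, 0.5], [8.5, 9.5]])

# Bottom-up merging; 'single' linkage uses the nearest points of two clusters,
# 'complete' linkage would use the furthest points instead
Z = linkage(X, method="single")

# Cut the resulting tree to obtain two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2 1 2]
```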
DIFFERENT CLUSTERING TECHNIQUES (CONTD.)
Density-based techniques
 Useful for identifying arbitrarily shaped clusters

 Guiding principle of cluster creation is the identification of
dense regions of objects in space, which are separated by low-
density regions. The key idea is that for each point of a
cluster, the neighbourhood of a given radius has to contain at
least a minimum number of points.
 DBSCAN is one of the most popular density-based algorithms
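A minimal sketch using scikit-learn's DBSCAN implementation; the points and the eps/min_samples settings are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense regions of made-up points plus one isolated point (noise)
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3],
              [8, 8], [8.1, 8.2], [7.9, 7.8],
              [4.5, 4.5]])

# eps: neighbourhood radius; min_samples: minimum points per dense region
labels = DBSCAN(eps=0.6, min_samples=3).fit_predict(X)
print(labels)  # [0 0 0 1 1 1 -1]; noise points are labelled -1
```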
K-MEANS ALGORITHM
BASIC ALGORITHM OF K-MEANS
 Choose k points in the feature space to serve as
the cluster centres
 After choosing the initial cluster centres, the
other examples are assigned to the cluster centre
that is most similar or nearest according to the
distance function
 Once the initial assignment phase is complete, the
k-means algorithm proceeds to the update phase.
 The first step of updating the clusters involves
shifting the initial centres to a new location,
known as the centroid, which is calculated as
the mean value of the points currently assigned
to that cluster
BASIC ALGORITHM OF K-MEANS (CONTD.)
 As points are reassigned from one cluster to
another, the centroids change again, which
leads to another update stage.
 When no points are reassigned, the k-means
algorithm stops. The cluster assignments are
now final.
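The steps above can be condensed into a short Python sketch; the sample points are made up, and the initial centres are chosen at random:

```python
import math
import random

def kmeans(points, k, max_iters=100):
    # Choose k initial cluster centres at random from the data
    centroids = random.sample(points, k)
    for _ in range(max_iters):
        # Assignment phase: attach each point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update phase: move each centre to the mean of its assigned points
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no reassignments: stop
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0.5), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```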
Figure 9.6 Clustering dataset

[Figure: Clustering with initial centroids - clusters C1, C2, C3 and C4]

Iteration 1: 4 clusters and distance of points from the centroids

Iteration 2: Centroids recomputed and points redistributed among clusters based on the nearest centroid (some points move out of Cluster 1)

Iteration 3: Final cluster arrangement - centroids recomputed and points redistributed among clusters based on the nearest centroid (some points move out of Cluster 3)
HOW TO ARRIVE AT A VALUE OF K?
 In case there is some a priori knowledge, the
same can be used
 Sometimes it is dictated by business need or
the motivation for the analysis
 Without any a priori knowledge at all, one
rule of thumb suggests setting k equal to the
square root of (n / 2), where n is the number
of examples in the dataset
 A technique known as the elbow method
attempts to gauge how the homogeneity or
heterogeneity within the clusters changes for
various values of k
HOW TO ARRIVE AT A VALUE OF K?
- ELBOW METHOD
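A minimal sketch of the elbow method using scikit-learn; the data set is synthetic and the range of k values is an arbitrary choice:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data with three natural groups
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.5, size=(30, 2)) for loc in (0, 4, 8)])

# Within-cluster sum of squares (inertia) for k = 1..9;
# the 'elbow' where the curve flattens suggests a good k (here, 3)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 10)]
plt.plot(range(1, 10), wss, marker="o")
plt.xlabel("k")
plt.ylabel("within-cluster sum of squares")
plt.show()
```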
ASSOCIATION ANALYSIS
ASSOCIATION ANALYSIS
 Useful for identifying interesting relationships hidden
in large data sets
 A common application of this analysis is the Market
Basket Analysis that retailers use for cross-selling of
their products
 Itemset - One or more items are grouped together.
They are surrounded by brackets to indicate that they
form a set. E.g. {Bread, Milk, Egg} can be grouped
together to form an itemset as those are frequently
bought together
 Support count - Denotes the number of transactions
in which a particular itemset is present. This is a very
important property of an itemset as it denotes the
frequency of occurrence for the itemset.
ASSOCIATION ANALYSIS (CONTD.)
 Association rules - set of rules that specify patterns
of relationships among items. A typical rule might be
expressed as {Bread, Milk}→{Egg}, which denotes that
if Bread and Milk are purchased, then Egg is also
likely to be purchased.
 Association rules are learned from subsets of itemsets.
For example, the preceding rule was identified from
the set of {Bread, Milk, Egg}.
 It should be noted that an association rule is an
expression of the form X → Y, where X and Y are disjoint
itemsets, i.e. X ∩ Y = ∅.
 Support and confidence are the two concepts that
are used for measuring the strength of an association
rule.
ASSOCIATION ANALYSIS (CONTD.)
 Support denotes how often a rule is applicable to a
given data set.
 Confidence indicates how often the items in Y
appear in transactions that contain X. It denotes
the predictive power or accuracy of the rule.
 Besides support and confidence, a few other measures
are used to evaluate the strength of an association rule:

lift(X→Y) = confidence(X→Y) / support(Y)
leverage(X→Y) = support(X→Y) − support(X) × support(Y)
conviction(X→Y) = (1 − support(Y)) / (1 − confidence(X→Y))
ASSOCIATION ANALYSIS (CONTD.)
 In the data set presented below, if we consider the
association rule {Bread, Milk} → {Egg}, then from the
formula of support and confidence we can say:
ASSOCIATION ANALYSIS (CONTD.)
Role of support & confidence in association
analysis :
 A low support may indicate that the rule has occurred
by chance, i.e. in the context of retail, the items are
seldom bought together by customers
 Confidence provides the measurement for reliability of
the inference of a rule. Higher confidence of a rule X →
Y denotes more likelihood of Y to be present in
transactions that contain X.
 The confidence of X leading to Y is not the same as the
confidence of Y leading to X. In the given data set, the
confidence of {Bread, Milk} → {Egg} = 3/4 = 0.75, but the
confidence of {Egg} → {Bread, Milk} = 3/5 = 0.6. Here,
{Bread, Milk} → {Egg} is the stronger rule (see the sketch below).
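The following Python sketch computes support, confidence and lift from first principles. The transaction list is hypothetical, constructed so that the two rule confidences match the 0.75 and 0.6 values above:

```python
# Each transaction is the set of items bought together
transactions = [
    {"Bread", "Milk", "Egg"}, {"Bread", "Milk"}, {"Bread", "Milk", "Egg"},
    {"Milk", "Egg"}, {"Bread", "Milk", "Egg"}, {"Bread", "Egg"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    return support(X | Y) / support(X)

def lift(X, Y):
    return confidence(X, Y) / support(Y)

X, Y = {"Bread", "Milk"}, {"Egg"}
print(confidence(X, Y))  # 0.75
print(confidence(Y, X))  # 0.6
print(lift(X, Y))        # 0.9
```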
THANK YOU &
QUESTIONS PLEASE!
