
Unit-3

Non-Parametric Decision Making


Dr. Srinath.S
Syllabus
• Nonparametric Decision Making:
• Introduction, Histograms,
• Kernel and Window Estimators,
• Nearest Neighbour Classification Techniques: Nearest neighbour
algorithm, Adaptive Decision Boundaries, Minimum Squared
Error Discriminant Functions, Choosing a decision-making
technique
NON-PARAMETRIC DECISION MAKING
In parametric decision making, only the parameters of the densities, such as their MEAN or VARIANCE, had to be estimated from the data before using them to estimate probabilities of class membership.

In the nonparametric approach, the distribution of the data is not defined by a finite set of parameters.

A nonparametric model does not take a predetermined form; instead, the model is constructed from information derived from the data itself.
It does not rely on the MEAN or VARIANCE.
Non-parametric decision making is therefore considered more robust.
Some of the popular non-parametric decision-making techniques include:
Histograms, scatterplots or tables of data
Kernel Density Estimation
K-Nearest Neighbours (KNN)
Support Vector Machines (SVM)
HISTOGRAM
• The histogram is one of the easiest ways of obtaining an approximate density function p̂(x) from sampled data.
• A histogram estimates the distribution of data without assuming any particular shape for the distribution (Gaussian, beta, etc.).
• A histogram shows the proportion of cases that fall into each of several categories.
• The total area of a histogram is normalized to 1 so that it represents a valid probability density (thus, it is a frequentist approach).
• Histogram plots provide a fast and reliable way to visualize the probability density of a data sample.
• A histogram is built by first grouping the observations into bins and counting the number of events that fall into each bin.
HISTOGRAM Continued
• The counts, or frequencies of observations, in each bin are then plotted as a bar graph with the bins on the x-axis and the frequency on the y-axis.
• One rule of thumb is to choose the number of intervals (bins) equal to the square root of the number of samples (see the sketch below).
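As a minimal sketch (not from the slides), the following Python code builds a normalized histogram density estimate using the square-root rule for the number of bins; the sample values are made up for illustration.

import numpy as np

# hypothetical sample data, for illustration only
samples = np.array([2.1, 3.4, 1.8, 5.0, 4.2, 3.9, 2.7, 4.8, 3.1, 2.5,
                    4.0, 3.6, 1.9, 5.3, 2.2, 3.3])

# rule of thumb: number of bins = square root of the number of samples
num_bins = int(np.sqrt(len(samples)))

# density=True scales the bar heights so the total area equals 1
heights, bin_edges = np.histogram(samples, bins=num_bins, density=True)

for h, lo, hi in zip(heights, bin_edges[:-1], bin_edges[1:]):
    print(f"bin [{lo:.2f}, {hi:.2f}): estimated density {h:.3f}")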
Histogram Examples (figures)
HISTOGRAM Continued

For example: if the proportion of samples in the first bin is 0.1 and the bin width is 4, the bar height for that bin is 0.1/4 = 0.025, so that the area of the bar equals the proportion.
Kernel and Window Estimators

• A histogram is a good representation for discrete data; it shows a spike (bar) for each bin.
• It may not suit continuous data. In that case a kernel (function) is placed at each data point, and the total density is estimated by the kernel density function.
• This is useful for applications such as audio density estimation.

• A sum of delta functions, one spike per sample, is not a useful approximation to a continuous density for decision making.

• Each delta function is therefore replaced by a kernel function, such as a rectangle, triangle or normal density, scaled so that the combined area of all the kernels equals one. (A short sketch follows.)
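The following is a minimal sketch (not from the slides) of kernel density estimation with a Gaussian kernel; the sample values and the bandwidth h are made-up choices for illustration.

import numpy as np

def gaussian_kernel(u):
    # standard normal density
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, samples, h):
    # average of one scaled kernel per sample; the total area integrates to 1
    return np.mean(gaussian_kernel((x - samples) / h)) / h

samples = np.array([-3.0, -1.5, -1.0, 0.5, 4.5, 6.5])  # hypothetical data
h = 1.0  # bandwidth (assumed)

for x in [-2.0, 0.0, 2.0, 5.0]:
    print(f"p_hat({x:+.1f}) = {kde(x, samples, h):.4f}")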
Kernel Density function

Example bin counts (6 samples, bin width 2): −4 to −2 → 1, −2 to 0 → 2, 0 to 2 → 1, 2 to 4 → 0, 4 to 6 → 1, 6 to 8 → 1.

Height of a bar containing one sample = 1/(6 × 2) ≈ 0.08 (first case), and so on.
KERNEL DENSITY ESTIMATION
Similarity and Dissimilarity

Distance or similarity measures are essential in solving many pattern recognition problems such as classification and clustering. Various distance/similarity measures are available in the literature to compare two data distributions.
As the name suggests, a similarity measure quantifies how close two distributions (or data points) are.

For algorithms like the k-nearest neighbor and k-means, it is essential to measure the
distance between the data points.
• In KNN we calculate the distance between points to find the nearest neighbor.
• In K-Means we find the distance between points to group data points into clusters based
on similarity.
• It is vital to choose the right distance measure as it impacts the results of our algorithm.
Euclidean Distance
• We are most likely to use Euclidean distance when calculating the distance between two rows of data that have numerical values, such as floating point or integer values.
• If columns have values with differing scales, it is common to normalize or standardize the numerical values across all columns prior to calculating the Euclidean distance. Otherwise, columns that have large values will dominate the distance measure.

dist = sqrt( Σ_{k=1}^{n} (p_k − q_k)^2 )

• where n is the number of dimensions (attributes) and p_k and q_k are, respectively, the kth attributes (components) of data objects p and q.
• Euclidean distance is also known as the L2 norm of a vector.
Compute the Euclidean Distance between the following data set
• D1= [10, 20, 15, 10, 5]
• D2= [12, 24, 18, 8, 7]
Apply the Pythagorean theorem (square root of the sum of squared differences) to obtain the Euclidean distance; a short sketch follows.
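A minimal Python sketch (not part of the slides) that computes the Euclidean distance between the two rows above:

import math

D1 = [10, 20, 15, 10, 5]
D2 = [12, 24, 18, 8, 7]

# square root of the sum of squared component differences
euclidean = math.sqrt(sum((p - q) ** 2 for p, q in zip(D1, D2)))
print(round(euclidean, 2))  # sqrt(4 + 16 + 9 + 4 + 4) = sqrt(37) ≈ 6.08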
Manhattan distance:
Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Simply put, it is the total of the absolute differences between the x-coordinates and the y-coordinates.
Formula: in a plane with p1 at (x1, y1) and p2 at (x2, y2), the distance is |x1 − x2| + |y1 − y2|.

• The Manhattan distance is related to the L1 vector norm.

• In general, ManhattanDistance = Σ_{i=1}^{N} |v1[i] − v2[i]|
Compute the Manhattan distance for the following
• D1 = [10, 20, 15, 10, 5]
• D2 = [12, 24, 18, 8, 7]
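A matching sketch (again not from the slides) for the Manhattan distance of the same two rows:

D1 = [10, 20, 15, 10, 5]
D2 = [12, 24, 18, 8, 7]

# sum of absolute component differences
manhattan = sum(abs(p - q) for p, q in zip(D1, D2))
print(manhattan)  # 2 + 4 + 3 + 2 + 2 = 13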
Manhattan distance:
is also popularly called city block distance.

Euclidean distance is like the flying (straight-line) distance;
Manhattan distance is like travelling by car through the street grid.
Minkowski Distance
• It calculates the distance between two real-valued vectors.
• It is a generalization of the Euclidean and Manhattan distance measures and
adds a parameter, called the “order” or “r“, that allows different distance
measures to be calculated.
• The Minkowski distance measure is calculated as follows:

dist = ( Σ_{k=1}^{n} |p_k − q_k|^r )^(1/r)

where r is a parameter, n is the number of dimensions (attributes) and p_k and q_k are, respectively, the kth attributes (components) of data objects p and q.
Minkowski is called a generalization of Manhattan and Euclidean:

Manhattan distance is called the L1 norm and Euclidean distance is called the L2 norm.

Minkowski is called Lp, where choosing p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance (a short sketch follows).
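A short sketch (not from the slides) showing that the Minkowski distance with r = 1 and r = 2 reproduces the Manhattan and Euclidean results computed earlier:

def minkowski(p, q, r):
    # generalised distance: (sum of |p_k - q_k|^r) ^ (1/r)
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

D1 = [10, 20, 15, 10, 5]
D2 = [12, 24, 18, 8, 7]

print(minkowski(D1, D2, r=1))            # 13.0  (Manhattan / L1)
print(round(minkowski(D1, D2, r=2), 2))  # 6.08  (Euclidean / L2)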


Cosine Similarity
(widely used in recommendation systems and NLP)
– If A and B are two document vectors:
– Cosine similarity ranges between −1 and +1.
– −1 indicates not at all close and +1 indicates very high similarity.
– In cosine similarity, data objects are treated as vectors.
– It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.
– Cosine Distance = 1 − Cosine Similarity
cos(A, B) = 1: exactly the same
0: orthogonal
−1: exactly opposite
Formula for Cosine Similarity
• The cosine similarity between two vectors is measured by the angle θ between them.
• If θ = 0°, the ‘x’ and ‘y’ vectors overlap, so they are similar.
• If θ = 90°, the ‘x’ and ‘y’ vectors are dissimilar.

• If two points lie on the same vector (same direction from the origin):
• In the example given, P1 and P2 are on the same vector, so the angle between them is 0 and cos(0°) = 1, indicating high similarity.
• In the next example, the two points P1 and P2 are separated by 45 degrees, so the cosine similarity is cos(45°) ≈ 0.71.

In this example, P1 and P2 are separated by 90 degrees, so the cosine similarity is cos(90°) = 0.
If P1 and P2 are on opposite sides
• If P1 and P2 point in opposite directions, the angle between them is 180 degrees and cos(180°) = −1.

• If the angle is 270°, the cosine is again 0; at 360° (or 0°) it is 1.


Cosine Similarity
Advantages of Cosine Similarity
• Cosine similarity is beneficial because even if two similar data objects are far apart by Euclidean distance (because of their size), they can still have a small angle between them. The smaller the angle, the higher the similarity.
• When plotted on a multi-dimensional space, the cosine similarity
captures the orientation (the angle) of the data objects and not
the magnitude.
Example1 for computing cosine distance
Consider an example to find the similarity between two vectors – ‘x’ and ‘y’, using Cosine
Similarity. (if angle can not be estimated directly)

The ‘x’ vector has values, x = { 3, 2, 0, 5 }


The ‘y’ vector has values, y = { 1, 0, 0, 0 }

The formula for calculating the cosine similarity is: Cos(x, y) = (x · y) / (||x|| × ||y||)

x · y = 3×1 + 2×0 + 0×0 + 5×0 = 3

||x|| = √(3² + 2² + 0² + 5²) = √38 ≈ 6.16

||y|| = √(1² + 0² + 0² + 0²) = 1

∴ Cos(x, y) = 3 / (6.16 × 1) = 0.49


Example 2 for computing cosine distance
d1 = [3 2 0 5 0 0 0 2 0 0] ; d2 = [1 0 0 0 0 0 0 1 0 2]

d1 · d2 = 3×1 + 2×0 + 0×0 + 5×0 + 0×0 + 0×0 + 0×0 + 2×1 + 0×0 + 0×2 = 5

||d1|| = (3×3 + 2×2 + 0×0 + 5×5 + 0×0 + 0×0 + 0×0 + 2×2 + 0×0 + 0×0)^0.5 = (42)^0.5 = 6.481
(square root of the sum of squares of all the elements)

||d2|| = (1×1 + 0×0 + 0×0 + 0×0 + 0×0 + 0×0 + 0×0 + 1×1 + 0×0 + 2×2)^0.5 = (6)^0.5 = 2.449

So cosine similarity = cos(d1, d2) = (d1 · d2) / (||d1|| × ||d2||) = 5 / (6.481 × 2.449) = 0.3150

Cosine distance (or dissimilarity) = 1 − cos(d1, d2) = 1 − 0.3150 = 0.6850
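A minimal Python sketch (not from the slides) that reproduces Example 2:

import math

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]

dot = sum(a * b for a, b in zip(d1, d2))   # 5
norm1 = math.sqrt(sum(a * a for a in d1))  # sqrt(42) ≈ 6.481
norm2 = math.sqrt(sum(b * b for b in d2))  # sqrt(6)  ≈ 2.449

cos_sim = dot / (norm1 * norm2)
print(round(cos_sim, 4))       # cosine similarity ≈ 0.315
print(round(1 - cos_sim, 4))   # cosine distance   ≈ 0.685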
Find Cosine distance between
D1 = [5 3 8 1 9 6 0 4 2 1] D2 = [1 0 3 6 4 5 2 0 0 1]
When to use Cosine Similarity
• Cosine similarity looks at the angle between two vectors, while Euclidean distance looks at the distance between two points. Hence cosine similarity is very popular for NLP applications.

• Say you are in an e-commerce setting and you want to compare users for product recommendations:
• User 1 bought 1x eggs, 1x flour and 1x sugar.
• User 2 bought 100x eggs, 100x flour and 100x sugar.
• User 3 bought 1x eggs, 1x Vodka and 1x Red Bull.

• By cosine similarity, user 1 and user 2 are more similar. By Euclidean distance, user 3 is more similar to user 1.
JACCARD SIMILARITY AND DISTANCE:
In Jaccard similarity we use sets instead of vectors.
It is used to find the similarity between two sets.
Jaccard similarity is defined as the size of the intersection of the sets divided by the size of their union (a count).

Jaccard similarity between two sets A and B is J(A, B) = |A ∩ B| / |A ∪ B|

A simple example using set notation: how similar are these two sets?
A = {0,1,2,5,6}
B = {0,2,3,4,5,7,9}
J(A,B) = |{0,2,5}| / |{0,1,2,3,4,5,6,7,9}| = 3/9 = 0.33
Jaccard Similarity is given by:
overlapping items vs total items.
• Jaccard Similarity ranges between 0 and 1
• 1 indicates the highest similarity
• 0 indicates no similarity
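A minimal Python sketch (not from the slides) for the set example above:

A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}

# intersection count divided by union count
jaccard_similarity = len(A & B) / len(A | B)
print(round(jaccard_similarity, 2))      # 3/9 ≈ 0.33
print(round(1 - jaccard_similarity, 2))  # Jaccard distance ≈ 0.67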
Application of Jaccard Similarity
• Language processing is one example where Jaccard similarity is used.

• In the example (comparing the word sets of two sentences), the similarity is 4/12 = 0.33.

Jaccard Similarity is popularly used for ML model performance analysis
• In this example, a table is designed of actual vs predicted labels.

This gives an idea of how our algorithm is working.

• The example shows the overlapping positives vs the total positives, including actual and predicted.
Common Properties of a Distance

• Distances, such as the Euclidean distance, have some well-known properties:
1. d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q. (Positive definiteness)
2. d(p, q) = d(q, p) for all p and q. (Symmetry)
3. d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r. (Triangle Inequality)
where d(p, q) is the distance (dissimilarity) between points (data objects) p and q.

• A distance that satisfies these properties is a metric, and the space is called a metric space.
Distance Metrics Continued
• Dist(x, y) ≥ 0
• Dist(x, y) = Dist(y, x) (symmetry)
• Detours cannot shorten distance (triangle inequality):
Dist(x, z) ≤ Dist(x, y) + Dist(y, z)
Euclidean Distance

Points: p1 = (0, 2), p2 = (2, 0), p3 = (3, 1), p4 = (5, 1)

Distance Matrix (L2):
        p1      p2      p3      p4
p1      0       2.828   3.162   5.099
p2      2.828   0       1.414   3.162
p3      3.162   1.414   0       2
p4      5.099   3.162   2       0
Minkowski Distance

Points: p1 = (0, 2), p2 = (2, 0), p3 = (3, 1), p4 = (5, 1)

L1 (Manhattan) Distance Matrix:
        p1   p2   p3   p4
p1      0    4    4    6
p2      4    0    2    4
p3      4    2    0    2
p4      6    4    2    0

L2 (Euclidean) Distance Matrix:
        p1      p2      p3      p4
p1      0       2.828   3.162   5.099
p2      2.828   0       1.414   3.162
p3      3.162   1.414   0       2
p4      5.099   3.162   2       0

L∞ (supremum) Distance Matrix:
        p1   p2   p3   p4
p1      0    2    3    5
p2      2    0    1    3
p3      3    1    0    2
p4      5    3    2    0
Summary of Distance Metrics

• Manhattan Distance: |x1 − x2| + |y1 − y2|
• Euclidean Distance: √((x1 − x2)² + (y1 − y2)²)
Nearest Neighbors Classifiers
• Basic idea:
– If it walks like a duck and quacks like a duck, then it’s probably a duck.

• To classify a test record: compute its distance to the training records and choose the k “nearest” records.
K-Nearest Neighbors (KNN): ML algorithm
• Simple, but a very powerful classification algorithm
• Classifies based on a similarity measure
• This algorithm does not build a model
• Does not “learn” until a test example is submitted for classification
• Whenever we have new data to classify, we find its K nearest neighbours in the training data
• The class is decided by a “MAJORITY VOTE” of its neighbours’ classes
• The sample is assigned to the most common class amongst its K nearest neighbours (by measuring the “distance” between data points)
• In practice, K is usually chosen to be odd, so as to avoid ties
• The K = 1 rule is generally called the “nearest-neighbour classification” rule
K-Nearest Neighbors (KNN)
• K-Nearest Neighbor is one of the simplest Machine Learning algorithms based
on Supervised Learning technique.
• The K-NN algorithm assesses the similarity between the new case/data/pattern and the available cases, and puts the new case into the category that is most similar to the available categories.

• The K-NN algorithm can be used for regression as well as for classification, but it is mostly used for classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
• In the training phase the KNN algorithm just stores the dataset, and when it gets new data it classifies that data into the category most similar to the new data.
Illustrative Example for KNN
Data collected over the past few years (training data).
Considering K = 1: based on the single nearest neighbour, the test data is assigned to the class ‘Africa’.
Now with K = 3: two of the three nearest neighbours are close to North/South America, and hence the new data (the data under test) is assigned to that class.
In this case K = 3 may still not be the right value to classify correctly; hence a new value of K can be selected.
Algorithm
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance to all the data points in
training.
• Step-3: Take the K nearest neighbors as per the calculated
Euclidean distance.
• Step-4: Among these k neighbors, apply voting algorithm
• Step-5: Assign the new data points to that category for which the
number of the neighbor is maximum.
• Step-6: Our model is ready (a code sketch of these steps follows).
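A minimal Python sketch of these steps (not from the slides); the training points and the query are taken from the pharmaceutical example on the next slides.

import math
from collections import Counter

# training data: (acid durability, strength) -> class
train = [((7, 7), "BAD"), ((7, 4), "BAD"), ((3, 4), "GOOD"), ((1, 4), "GOOD")]

def knn_classify(query, train, k):
    # Steps 2-3: compute Euclidean distances and take the k nearest
    nearest = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    # Steps 4-5: majority vote among the k neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((3, 7), train, k=3))  # GOOD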
Consider the following data set of a pharmaceutical company with assigned class labels. Using the K nearest neighbour method, classify a new unknown sample using K = 3 and K = 2.

Points   X1 (Acid Durability)   X2 (Strength)   Y = Classification
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD

New pattern with X1 = 3 and X2 = 7: identify the class.

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               ?
KNN

Euclidean distance of P5(3, 7) from each training point:

Point   Coordinates   Distance from P5(3, 7)              Class
P1      (7, 7)        √((7−3)² + (7−7)²) = √16 = 4        BAD
P2      (7, 4)        √((7−3)² + (4−7)²) = √25 = 5        BAD
P3      (3, 4)        √((3−3)² + (4−7)²) = √9 = 3         GOOD
P4      (1, 4)        √((1−3)² + (4−7)²) = √13 ≈ 3.60     GOOD

With K = 3 the nearest neighbours are P3 (GOOD), P4 (GOOD) and P1 (BAD), so by majority vote P5 is classified as GOOD. With K = 2 the nearest neighbours are P3 and P4, both GOOD, so P5 is again classified as GOOD.


A new customer named 'Mary' has height 161 cm and weight 61 kg.
Suggest the T-shirt size with K = 3 and K = 5, using Euclidean distance and also Manhattan distance. (A code sketch follows the table.)

Height (in cms)   Weight (in kgs)   T-Shirt Size
158               58                M
158               59                M
158               63                M
160               59                M
160               60                M
163               60                M
163               61                M
160               64                L
163               64                L
165               61                L
165               62                L
165               65                L
168               62                L
168               63                L
168               66                L
170               63                L
170               64                L
170               68                L
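A brief sketch (not from the slides) that applies the same KNN procedure to this table; the data is typed in directly and both K values and both distance measures are tried.

import math
from collections import Counter

data = [
    (158, 58, "M"), (158, 59, "M"), (158, 63, "M"), (160, 59, "M"),
    (160, 60, "M"), (163, 60, "M"), (163, 61, "M"), (160, 64, "L"),
    (163, 64, "L"), (165, 61, "L"), (165, 62, "L"), (165, 65, "L"),
    (168, 62, "L"), (168, 63, "L"), (168, 66, "L"), (170, 63, "L"),
    (170, 64, "L"), (170, 68, "L"),
]

def classify(query, k, metric):
    # pick the k rows closest to the query and take a majority vote on the size
    neighbours = sorted(data, key=lambda row: metric(query, row[:2]))[:k]
    return Counter(label for *_, label in neighbours).most_common(1)[0][0]

euclidean = lambda a, b: math.dist(a, b)
manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))

mary = (161, 61)
for k in (3, 5):
    print(k, classify(mary, k, euclidean), classify(mary, k, manhattan))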
There is a car manufacturer that has manufactured a new SUV. The company wants to show ads to the users who are interested in buying that SUV. For this problem, we have a dataset that contains information about multiple users gathered through a social network. The dataset contains many fields, but we will consider Estimated Salary and Age as the independent variables and Purchased as the dependent variable. The dataset is as shown in the table. Using K = 5, classify the new sample.
• There is no particular way to determine the best value for "K", so we need to try several values to find the best of them. The most commonly preferred value for K is 5.
• A very low value such as K = 1 or K = 2 can be noisy and makes the model sensitive to outliers.
• Larger values for K smooth the decision, but if K is too large the neighbourhood may include points from other classes and the computation becomes heavier.

• Advantages of KNN Algorithm:

• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.

• Disadvantages of KNN Algorithm:

• The value of K always needs to be determined, which can sometimes be tricky.
• The computation cost is high because the distance to every training sample must be calculated.
Another example: solve
• Because the distance function used to find the k nearest neighbours is not linear, it usually will not lead to a linear decision boundary.
Adaptive decision Boundaries
• Nearest neighbour techniques can approximate arbitrarily complicated decision regions, but their error rates may be larger than the Bayes (optimal) error rates.
• Experimentation may be required to choose K and to edit the
reference samples.
• Classification may be time consuming if the number of reference
samples is large.
• An alternate solution is to assume that the functional form of the
decision boundary between each pair of classes is given, and to find
the decision boundary of that form which best separates the classes
in some sense.
Adaptive Decision Boundaries. Continued
• For example, assume that a linear decision boundary will be used to classify samples into two classes and that each sample has M features.
• Then the discriminant function has the form
D = w0 + w1x1 + … + wMxM
• D = 0 is the equation of the decision boundary between the two classes.
• The weights w0, w1, …, wM are to be chosen to provide good performance on the training set.
• A sample with feature vector (x1, x2, …, xM) is classified into one class, say class 1, if D > 0 and into the other class, say class −1, if D < 0.
• w0 is the intercept and w1, w2, …, wM are the weights related to the slopes.
This is of the same form as the line equation y = mx + c (or y = c + mx).
Adaptive decision boundaries … continued
• Geometrically, D = 0 is the equation of a hyperplane decision boundary that divides the M-dimensional feature space into two regions.
• Two classes are said to be linearly separable if there exists a hyperplane decision boundary such that D > 0 for all the samples in class 1 and D < 0 for all the samples in class −1.
• The figure shows two classes which are separated by a hyperplane.
• The weights w1, w2, …, wM can be varied; the boundary adapts as the weights change.
• During the adaptive or training phase, samples are presented to the current form of the classifier. Whenever a sample is correctly classified, no change is made to the weights.
• When a sample is incorrectly classified, each weight is changed to correct the output.
Adaptive decision boundary algorithm
1. Initialize the weights w0, w1, …, wM to zero, to small random values, or to some initial guesses.
2. Choose the next sample x = (x1, x2, …, xM) from the training set. Let the ‘true’ class or desired value of D be d, so that d = 1 or −1 represents the true class of x.
3. Compute D = w0 + w1x1 + … + wMxM.
4. If the sample is misclassified (the sign of D does not match d), replace each wi by wi + c·d·xi (a small change), where c is a small positive constant and x0 = 1.
5. Repeat steps 2 to 4 with each sample in the training set. When finished, run through the entire training set again.
6. Stop and report perfect classification when all the samples are classified properly. (A code sketch of this loop follows.)
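A minimal sketch of this training loop in Python (not from the slides); the two-class data, the constant c and the iteration cap are made-up choices for illustration.

import numpy as np

# hypothetical linearly separable 2-feature data; labels d are +1 or -1
X = np.array([[2.0, 3.0], [1.0, 4.0], [3.0, 3.5], [6.0, 1.0], [7.0, 2.0], [6.5, 0.5]])
d = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(X.shape[1] + 1)  # w[0] is the intercept w0
c = 0.1                       # small positive learning constant (assumed)

for _ in range(100):          # cap on passes through the training set
    errors = 0
    for xi, di in zip(X, d):
        D = w[0] + w[1:] @ xi        # D = w0 + w1*x1 + ... + wM*xM
        if np.sign(D) != di:         # misclassified (D = 0 also counts as wrong)
            w[0] += c * di           # update the intercept (x0 = 1)
            w[1:] += c * di * xi     # update each weight by c*d*xi
            errors += 1
    if errors == 0:                  # perfect classification reached
        break

print(w)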
• If there are N classes and M features, the set of linear discriminant functions is
• D1 = w10 + w11x1 + … + w1MxM
• D2 = w20 + w21x1 + … + w2MxM
• ……
• DN = wN0 + wN1x1 + … + wNMxM
Minimum Squared Error Discriminant Functions

• Although the adaptive decision boundary and adaptive discriminant function techniques have considerable appeal, they require many iterations.
• An alternative is the “Minimum Squared Error (MSE)” classification procedure.
• MSE does not require iteration.
• MSE uses a single discriminant function regardless of the number of classes.
MSE
• If there are V samples and M features for each sample, there will be V feature vectors
xi = (xi1, xi2, …, xiM), i = 1 to V
• Let the true class of xi be represented by di, which can have any numerical value. We want to find a set of weights wj, j = 0, …, M for a single linear discriminant function
– D(xi) = w0 + w1xi1 + … + wMxiM
– such that D(xi) = di for all samples i. In general this will not be possible.
• But by properly choosing the weights w0, w1, …, wM, the sum of the squared differences between the set of desired values di and the actual values D(xi) can be minimized. The sum of squared errors E is
E = Σ_{i=1}^{V} (D(xi) − di)^2
• The values of the weights that minimize E may be found by computing the partial derivatives of E with respect to each wj, setting each derivative to zero, and solving for the weights w0, …, wM (a least-squares sketch follows).
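As a minimal sketch (not from the slides), the same weights can be obtained with an ordinary least-squares solve; the tiny two-class data set and the target values di = ±1 are made-up choices for illustration.

import numpy as np

# hypothetical samples (V = 6, M = 2) and desired values d_i = +1 or -1
X = np.array([[2.0, 3.0], [1.0, 4.0], [3.0, 3.5], [6.0, 1.0], [7.0, 2.0], [6.5, 0.5]])
d = np.array([1, 1, 1, -1, -1, -1])

# augment with a column of ones so the first weight plays the role of w0
A = np.hstack([np.ones((X.shape[0], 1)), X])

# the least-squares solution minimizes E = sum (D(x_i) - d_i)^2
w, *_ = np.linalg.lstsq(A, d, rcond=None)
print(w)

# classify each sample by the sign of D(x)
D = A @ w
print(np.sign(D))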
End of Unit 3