
K-Nearest Neighbour Classification using Python (Lazy Learner)

Let's read it...

• Recently, I read an article describing a new type of dining experience. Patrons are served in a completely darkened restaurant by waiters who move carefully around memorized routes using only their sense of touch and sound.
• The allure of these establishments is rooted in the
idea that depriving oneself of visual sensory input
will enhance the sense of taste and smell, and foods
will be experienced in new and exciting ways. Each
bite is said to be a small adventure in which the
diner discovers the flavors the chef has prepared.
What sort of Machine Learning?

• This dining experience embodies an idea that can be used for machine learning, as does another maxim involving poultry: "birds of a feather flock together."
• In other words, things that are alike are likely to
have properties that are alike.
• We can use this principle to classify data by
placing it in the category with the most similar,
or "nearest" neighbors.
Nearest Neighbor Classification

• In a single sentence, nearest neighbor classifiers are defined by their characteristic of classifying unlabeled examples by assigning them the class of the most similar labeled examples. Despite the simplicity of this idea, nearest neighbor methods are extremely powerful. They have been used successfully for:
– Computer vision applications, including optical character recognition and facial recognition in both still images and video
– Predicting whether a person will enjoy a movie that he or she has been recommended (as in the Netflix challenge)
– Identifying patterns in genetic data, for use in detecting specific proteins or diseases
The kNN Algorithm

• The kNN algorithm begins with a training dataset made up of examples that are classified into several categories, as labeled by a nominal variable.
• Assume that we have a test dataset containing unlabeled examples that otherwise have the same features as the training data.
• For each record in the test dataset, kNN identifies k records in the training data that are the "nearest" in similarity, where k is an integer specified in advance.
• The unlabeled test instance is assigned the class of the majority of the k nearest neighbors.
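As a minimal from-scratch sketch of the procedure just described (not code from these slides; the names knn_predict, train_X, train_y and query are illustrative), the following Python function finds the k closest training examples by Euclidean distance and returns the majority class among them:

    from collections import Counter
    import math

    def knn_predict(train_X, train_y, query, k=3):
        # Distance from the query to every labeled training example
        distances = []
        for features, label in zip(train_X, train_y):
            d = math.sqrt(sum((f - q) ** 2 for f, q in zip(features, query)))
            distances.append((d, label))
        # Keep the k closest examples and take a majority vote on their labels
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[:k]]
        return Counter(nearest_labels).most_common(1)[0][0]

    # Toy usage: the query point sits near the two 'a' examples
    print(knn_predict([[1, 1], [2, 2], [8, 8]], ['a', 'a', 'b'], [1.5, 1.5], k=3))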
Example: (figure slides omitted; they plot food ingredients by sweetness and crunchiness and show a new tomato to be classified)
Calculating Distance

• Locating the tomato's nearest neighbors requires a distance function, or a formula that measures the similarity between two instances.
• There are many different ways to calculate
distance.
• Traditionally, the kNN algorithm uses Euclidean
distance, which is the distance one would measure
if you could use a ruler to connect two points,
illustrated in the previous figure by the dotted
lines connecting the tomato to its neighbors.
Distance

• Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:

    dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)

• The distance formula involves comparing the values of each feature. For example, to calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7), we can use the formula as follows:

    dist(tomato, green bean) = sqrt((6 - 3)^2 + (4 - 7)^2) = sqrt(9 + 9) = sqrt(18) ≈ 4.2
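To double-check this arithmetic in code, here is a small NumPy snippet (not part of the original slides; the feature values are the ones given above):

    import numpy as np

    tomato = np.array([6, 4])       # (sweetness, crunchiness)
    green_bean = np.array([3, 7])   # (sweetness, crunchiness)

    # Euclidean distance: square root of the summed squared feature differences
    distance = np.sqrt(np.sum((tomato - green_bean) ** 2))
    print(distance)                 # about 4.24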
Distance

(Figure omitted: a comparison of Manhattan distance and Euclidean distance between two points.)
Closest Neighbors (figure omitted)
Choosing appropriate k

• Deciding how many neighbors to use for kNN determines how well the model will generalize to future data.
• The balance between overfitting and underfitting the training data is a problem known as the bias-variance tradeoff.
• Choosing a large k reduces the impact of variance caused by noisy data, but can bias the learner such that it runs the risk of ignoring small, but important patterns.
Choosing appropriate k

• In practice, choosing k depends on the difficulty of the concept to be learned and the number of records in the training data.
• Typically, k is set somewhere between 3 and 10. One common practice is to set k equal to the square root of the number of training examples.
• In the classifier, we might set k = 4, because there were 15 example ingredients in the training data and the square root of 15 is 3.87.
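The square-root rule is only a starting point; another common option is to compare several candidate values of k by cross-validation. A hedged sketch using scikit-learn, with the Iris dataset standing in for whatever labeled data is at hand:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Iris is used here only as a stand-in labeled dataset
    X, y = load_iris(return_X_y=True)

    # Try each k in the 3..10 range mentioned above and report mean CV accuracy
    for k in range(3, 11):
        knn = KNeighborsClassifier(n_neighbors=k)
        score = cross_val_score(knn, X, y, cv=5).mean()
        print(f"k={k}: accuracy={score:.3f}")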
Python Packages needed: KNN

• pandas
– Data Analytics
• numpy
– Numerical Computing
• matplotlib.pyplot
– Plotting graphs
• sklearn
– KNN Classes
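Assuming these packages are installed, a typical import block for a KNN script might look as follows (the aliases pd, np and plt are conventions, not requirements):

    import pandas as pd                    # data analytics (tabular data handling)
    import numpy as np                     # numerical computing
    import matplotlib.pyplot as plt        # plotting graphs
    from sklearn.neighbors import KNeighborsClassifier   # KNN classifier class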
Sample Application
Problem Statement:
KNN – Classification : Dataset

Q(X=6, Y=6), Class = ?
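The dataset table from this slide is not reproduced here, so the sketch below uses made-up placeholder points purely to show how the query Q(X=6, Y=6) could be classified with scikit-learn; substitute the actual X/Y values and class labels from the slide.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Placeholder training data (NOT the values from the slide's table)
    X_train = np.array([[1, 2], [2, 1], [3, 3], [6, 5], [7, 7], [8, 6]])
    y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

    knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
    knn.fit(X_train, y_train)

    query = np.array([[6, 6]])                  # Q(X=6, Y=6)
    print(knn.predict(query))                   # predicted class of Q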
Useful resources

• www.pythonprogramminglanguage.com
• www.scikit-learn.org
• www.towardsdatascience.com
• www.medium.com
• www.analyticsvidhya.com
• www.kaggle.com
• www.stephacking.com
• www.github.com
Thank you
