Machine Learning Unit-3.1

UNIT-3: Overview of the Course

• Supervised Learning: Classification


• Introduction
• Classification techniques:
K-nearest Neighbour (KNN)
Decision Tree Algorithm
Support Vector Machine
Naïve Bayes
Logistic Regression
• Recommendation System: Content based and
Collaborative techniques.
UNIT-3: Supervised Learning: Classification
Classification: Introduction
• It is called supervised learning because the process of learning from the
training data by a machine can be related to a teacher supervising the
learning process of a student who is new to the subject.

• Here, the teacher is the training data.

• Training data is past information whose class field, or ‘label’, has a known
value.

• Hence, we say that the ‘training data is labelled’ in the case of supervised
learning.

• Contrary to this, there is no labelled training data for unsupervised learning.

• Some more examples of supervised learning are as follows:


1. Predicting whether a tumour is malignant or benign.
2. Price prediction in domains such as real estate, stocks, etc.
3. Using the test results of newly admitted patients to classify them as high-risk or
low-risk patients.
Classification

• Example: Credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
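This discriminant rule can be sketched in a few lines of Python. The threshold values below are hypothetical, chosen only for illustration (the slides leave θ1 and θ2 unspecified):

```python
# Hypothetical thresholds for the credit-scoring rule above
# (THETA1, THETA2 are illustrative values, not from the slides).
THETA1 = 40_000  # income threshold
THETA2 = 10_000  # savings threshold

def credit_risk(income: float, savings: float) -> str:
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(50_000, 15_000))  # low-risk
print(credit_risk(50_000, 5_000))   # high-risk
```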
Regression

• Example: Price of a used car
• x : car attributes
  y : price
• Linear model: y = w·x + w0
• In general, y = g(x | θ), where g(·) is the model and θ its parameters
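The linear price model can be sketched as follows. The weights w and w0 are made-up illustrative parameters; in a real system they would be fitted to data:

```python
# y = g(x | theta): a linear model with parameters theta = (w, w0).
def g(x: float, w: float, w0: float) -> float:
    return w * x + w0

# Hypothetical parameters: price drops by 500 per unit of mileage x
# (say x is mileage in units of 10,000 km), starting from 20,000.
w, w0 = -500.0, 20_000.0
print(g(4.0, w, w0))  # predicted price for x = 4: 18000.0
```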
KNN Algorithm:
• The k-nearest neighbors (KNN) algorithm is
 a simple,
 easy-to-implement
 supervised machine learning algorithm.

• The K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category most similar to the
available categories.

• The K-NN algorithm stores all the available data and classifies a new data
point based on its similarity to the stored data.

• K-NN algorithm can be used for Regression as well as for


Classification.
• Example: Suppose we have an image of a creature that looks similar to both
a cat and a dog.

• For this identification, we can use the KNN algorithm.

• Our KNN model will find the features of the new data most similar to each
category and, based on those features, put it in either the cat or the dog
category.
• Why do we need a K-NN Algorithm?

• Suppose there are two categories, i.e., Category A and Category B, and
we have a new data point x1 (Input Value).

• Which of these categories will this new data point lie in?

• To solve this type of problem, we need the K-NN algorithm. Consider the
diagram below:
Algorithm: The working of KNN can be explained with the following steps:

Step-1: Select the number K of neighbours.

Step-2: Calculate the Euclidean distance from the new data point to every
training point.

Step-3: Take the K nearest neighbours as per the calculated Euclidean
distances.

Step-4: Among these K neighbours, count the number of data points in each
category.

Step-5: Assign the new data point to the category with the maximum number
of neighbours.

Step-6: Our model is ready.


• Example: We have data from a questionnaire survey and objective testing
with two attributes:
1. Acid Durability and
2. Strength

The task is to classify whether a paper tissue is good or not.

• Four training samples are given:

Sr. No. | X1 = Acid Durability (seconds) | X2 = Strength (kg/m²) | Y = Classification
01      | 7                              | 7                     | Bad
02      | 7                              | 4                     | Bad
03      | 3                              | 4                     | Good
04      | 1                              | 4                     | Good
• Training (sample) data set = (X1, X2) = (7,7), (7,4), (3,4), (1,4)
• Now the factory produces a new paper tissue that passes the laboratory
test with X1 = 3 and X2 = 7.
• Testing sample data = (3, 7)
• Without an expensive survey, can we guess/classify whether this new
paper tissue is Good or Not Good (Bad)?
• So we apply the KNN algorithm.
• Step-1: Determine K = number of nearest neighbours. Suppose K = 3.
• Step-2: Calculate the distance between the query instance (testing
sample) and all the training samples.
The coordinates of the query instance are (3, 7).
We compute the Euclidean distances as follows:
1. d((7,7), (3,7)) = √((7−3)² + (7−7)²) = √16 = 4
2. d((7,4), (3,7)) = √((7−3)² + (4−7)²) = √25 = 5
3. d((3,4), (3,7)) = √((3−3)² + (4−7)²) = √9 = 3
4. d((1,4), (3,7)) = √((1−3)² + (4−7)²) = √13 ≈ 3.6
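These hand calculations can be checked with the standard library's `math.dist`, using the training points and query instance from the example:

```python
# Recompute the Euclidean distance from each training point to the
# query instance (3, 7) of the worked example.
from math import dist

train = [(7, 7), (7, 4), (3, 4), (1, 4)]
query = (3, 7)

for point in train:
    print(point, "->", round(dist(point, query), 1))
```

This prints 4.0, 5.0, 3.0 and 3.6, matching the hand calculation.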
Step-3: Sort the distances in ascending order and determine the nearest
neighbours based on the K-th minimum distance.

X1 | X2 | Euclidean distance (ascending) | Included in the 3 nearest neighbours? | Category
3  | 4  | 3                              | Yes                                   | Good
1  | 4  | 3.6                            | Yes                                   | Good
7  | 7  | 4                              | Yes                                   | Bad
7  | 4  | 5                              | No                                    | Bad

Step-4: Count the number of nearest neighbours in each category.

Conclusion:
• We have 2 Good nearest neighbours and 1 Bad nearest neighbour. Since
2 > 1, we conclude that the new paper tissue with X1 = 3 and X2 = 7
should be included in the Good category.
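The whole worked example can be reproduced in a short pure-Python snippet, with K = 3 and the data from the table above:

```python
# Classify the query (3, 7) with K = 3 using the paper-tissue training data.
from math import dist
from collections import Counter

train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query, K = (3, 7), 3

# Sort training pairs by distance to the query and keep the K nearest.
nearest = sorted(train, key=lambda item: dist(item[0], query))[:K]

# Majority vote among the K nearest neighbours.
label = Counter(lab for _, lab in nearest).most_common(1)[0][0]
print(label)  # Good
```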
The KNN Algorithm:
1. Load the data
2. Initialize K to your chosen number of neighbors
3. For each example in the data
• 3.1 Calculate the distance between the query example
and the current example from the data.
• 3.2 Add the distance and the index of the example to an
ordered collection
4. Sort the ordered collection of distances and indices from
smallest to largest (in ascending order) by the distances
5. Pick the first K entries from the sorted collection
6. Get the labels of the selected K entries
7. If regression, return the mean of the K labels
8. If classification, return the mode of the K labels
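The eight steps above can be sketched as a single function that handles both cases. This is a minimal pure-Python sketch, not an optimized implementation:

```python
# KNN for classification (mode of the K labels) and regression (mean).
from math import dist
from collections import Counter

def knn_predict(data, query, k, mode="classification"):
    """data: list of (point, label) pairs; query: a point tuple; k: int."""
    # Steps 3-5: distance to every example, sorted ascending, first k kept.
    neighbours = sorted(data, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in neighbours]   # step 6
    if mode == "regression":
        return sum(labels) / len(labels)          # step 7: mean of K labels
    return Counter(labels).most_common(1)[0][0]   # step 8: mode of K labels

# Usage with the paper-tissue data from the worked example:
data = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
print(knn_predict(data, (3, 7), k=3))  # Good
```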
• How do we select the value of K in the K-NN algorithm?
Below are some points to remember while selecting the value of K:
1. There is no particular way to determine the best value of K, so we
need to try several values and pick the best among them.

2. A commonly preferred default value for K is 5.

3. A very low value of K, such as K = 1 or K = 2, can be noisy and make
the model sensitive to outliers.

4. Large values of K can be good, but they may cause difficulties (for
example, neighbourhoods so large that they include points from other
classes).
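One common way to put point 1 into practice (not shown in the slides) is to score a few candidate values of K by leave-one-out accuracy on the training data. The last two points below are hypothetical additions, included only to make the toy dataset slightly larger:

```python
# Leave-one-out scoring of candidate K values. The first four points come
# from the worked example; the last two are made-up illustrative points.
from math import dist
from collections import Counter

data = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"),
        ((1, 4), "Good"), ((2, 5), "Good"), ((6, 6), "Bad")]

def predict(train, query, k):
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]

for k in (1, 3, 5):
    # Hold out each point in turn and predict it from the remaining points.
    hits = sum(predict(data[:i] + data[i + 1:], point, k) == label
               for i, (point, label) in enumerate(data))
    print(f"K={k}: leave-one-out accuracy {hits / len(data):.2f}")
```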
Advantages of KNN Algorithm:
1. It is simple to implement.
2. It is robust to noisy training data.
3. It can be more effective when the training data is large.

Disadvantages of KNN Algorithm:

1. The value of K always needs to be determined, which can sometimes be
complex.

2. The computation cost is high, because the distance to every training
sample must be calculated.

3. The algorithm gets significantly slower as the number of examples
and/or predictors (independent variables) increases.
Websites:
1. https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning
2. https://medium.com/@adi.bronshtein
3. https://www.analyticsvidhya.com/blog/2021/05/knn-the-distance-based-machine-learning-algorithm
4. https://www.tutorialspoint.com/
5. https://towardsdatascience.com/
6. https://people.revoledu.com/kardi/tutorial/KNN/KNN_Numerical-example.html
7. https://medium.com/analytics-vidhya
