Unit - 2 ML
Supervised Learning
Contents
Types of Supervised Learning
Supervised Machine Learning Algorithms
k Nearest Neighbors
Regression Models
Naive Bayes Classifiers
Decision Trees
Ensembles of Decision Trees
Kernelized Support Vector Machines
Uncertainty Estimates from Classifiers
Supervised Machine Learning
In supervised learning, a model is trained on a labelled dataset, so that it
learns the mapping from inputs to their labels.
Once training is complete, the model is evaluated on held-out test data
(data it did not see during training), and its predictions are compared
against the known labels.
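As a minimal sketch of this train/evaluate cycle (assuming scikit-learn and its bundled Iris dataset; the classifier choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A small labelled dataset: features X, labels y
X, y = load_iris(return_X_y=True)

# Hold out a test set; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier()
clf.fit(X_train, y_train)             # train on the labelled data
accuracy = clf.score(X_test, y_test)  # evaluate on the held-out test data
```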
KNN Algorithm:
K-Nearest Neighbours (K-NN) is one of the simplest
machine learning algorithms, based on the supervised
learning technique.
K-NN can be used for regression as well as
classification, but it is mostly used for
classification problems.
K-NN is a non-parametric algorithm, which
means it makes no assumptions about the
underlying data distribution.
At training time, K-NN simply stores the dataset;
when it receives a new data point, it classifies
that point into the category most similar to it.
Why do we need a K-NN Algorithm?
Suppose a new data point arrives and could belong to
either of two existing categories. K-NN decides the
category of the new point by looking at the categories
of its nearest labelled neighbours.
How does K-NN work?
The working of K-NN can be explained by the
following steps:
Step-1: Select the number K of neighbours.
Step-2: Calculate the Euclidean distance from the
new data point to every training point.
Step-3: Take the K nearest neighbours as per the
calculated Euclidean distances.
Step-4: Among these K neighbours, count the
number of data points in each category.
Step-5: Assign the new data point to the category
with the most neighbours among the K.
Step-6: Our model is ready.
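The steps above can be sketched in plain Python (the toy training points and labels are made up for illustration):

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours.

    train: list of ((x, y), label) pairs.
    """
    # Step 2: Euclidean distance from new_point to every training point
    dists = [(math.dist(point, new_point), label) for point, label in train]
    # Step 3: take the K nearest neighbours
    nearest = sorted(dists)[:k]
    # Steps 4-5: count labels among them and pick the majority category
    counts = Counter(label for _, label in nearest)
    return counts.most_common(1)[0][0]

# Toy labelled data: two clusters, categories 'A' and 'B'
train = [((1, 1), 'A'), ((2, 1), 'A'), ((1, 2), 'A'),
         ((6, 6), 'B'), ((7, 6), 'B'), ((6, 7), 'B')]
print(knn_classify(train, (2, 2), k=3))  # → 'A'
```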
How to select the value of K in the K-NN Algorithm:
There is no fixed rule for determining the
best value of K, so several values should be
tried and compared. K = 5 is a common default.
A very low value such as K = 1 or K = 2
can be noisy and sensitive to outliers
in the data.
Larger values of K smooth out noise, but if K is
too large the neighbourhood may include points
from other categories and blur the class boundaries.
Advantages of KNN Algorithm:
It is simple to implement.
It is robust to noisy training data.
It can be more effective when the training data
is large.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined,
which can be complex at times.
The computation cost at prediction time is high,
because the distance from the query point to
every training sample must be calculated.
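This cost can be reduced with tree-based neighbour search, which scikit-learn exposes via the `algorithm` parameter (the synthetic data here is only for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: brute-force prediction computes a distance to every
# training sample, so its cost grows with the training-set size
rng = np.random.default_rng(0)
X = rng.random((5000, 10))
y = (X[:, 0] > 0.5).astype(int)

brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
tree = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

# Tree-based search finds the same exact neighbours with fewer
# distance computations, so the predictions agree
same = (brute.predict(X[:100]) == tree.predict(X[:100])).all()
```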
Example
Start by visualizing some data points (the values below are example data,
chosen only for illustration):
import matplotlib.pyplot as plt

# Two features (x, y) and a class label (0 or 1) for each point
x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)
plt.show()
Now we fit the KNN algorithm with K=1:
from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)
And use it to classify a new data point:
new_x = 8
new_y = 21
new_point = [(new_x, new_y)]
prediction = knn.predict(new_point)
print(prediction)