
Unit 3 – Supervised Learning Classification
- Bharti Kungwani
What is the Classification Algorithm?
 The Classification algorithm is a Supervised Learning
technique that is used to identify the category of new
observations on the basis of training data. In
classification, a program learns from the given dataset
or observations and then classifies new observations into
a number of classes or groups, such as Yes or No, 0 or
1, Spam or Not Spam, cat or dog, etc. Classes can be
called targets, labels, or categories.
 In a classification algorithm, the input variable (x) is
mapped to a discrete output variable (y):
 y = f(x), where y is the categorical output
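A minimal sketch of this idea in Python, assuming scikit-learn is available (the slides do not name a library); the data, labels, and choice of classifier are illustrative assumptions:

# Classification learns a mapping f from feature vectors x to discrete labels y.
from sklearn.tree import DecisionTreeClassifier

# Training data: each row of X is an observation; y holds the categorical labels.
X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y = ["cat", "cat", "dog", "dog"]

clf = DecisionTreeClassifier()
clf.fit(X, y)                     # learn f from the given dataset

print(clf.predict([[1.2, 1.9]]))  # -> ['cat']: a discrete class, not a number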
Classification Algorithm
 Classification algorithms can be better understood
using the diagram below, which shows two classes,
Class A and Class B. Points within each class have
features that are similar to one another and dissimilar
to those of the other class.
Classifiers
 The algorithm which implements the classification on a
dataset is known as a classifier.
 There are two types of classification:
 Binary Classifier: If the classification problem has only two
possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.
 Multi-class Classifier: If a classification problem has more
than two outcomes, it is called a Multi-class Classifier.
Examples: classification of types of crops, classification of
types of music.
Learners in Classification Problems:
 In classification problems, there are two types of learners:
 Lazy Learners: A lazy learner first stores the training dataset
and waits until it receives the test dataset. In the lazy learner's
case, classification is done on the basis of the most related data
stored in the training dataset. It takes less time in training but
more time for predictions.
Examples: K-NN algorithm, case-based reasoning
 Eager Learners: Eager learners develop a classification model
based on a training dataset before receiving a test dataset.
Opposite to lazy learners, an eager learner takes more time in
learning and less time in prediction (see the sketch after this
list). Examples: Decision Trees, Naïve Bayes, ANN.
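The following sketch contrasts the two kinds of learners, assuming scikit-learn; K-NN's fit() essentially just stores the data, while a decision tree builds its model up front:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

lazy = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # cheap fit: stores the dataset
eager = DecisionTreeClassifier().fit(X, y)            # costly fit: builds the model

# Both predict the same way, but K-NN searches the stored training data
# for neighbours on every call, while the tree only walks its nodes.
print(lazy.predict([[1, 0.9]]), eager.predict([[1, 0.9]]))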
K-Nearest Neighbor(KNN) Algorithm
for Machine Learning
 K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on
the Supervised Learning technique.
 The K-NN algorithm assumes similarity between the new case/data and the available
cases and puts the new case into the category that is most similar to the available
categories.
 The K-NN algorithm stores all the available data and classifies a new data point based
on similarity. This means that when new data appears, it can be easily classified into a
well-suited category by using the K-NN algorithm.
 K-NN can be used for Regression as well as for Classification, but it is mostly
used for classification problems.
 K-NN is a non-parametric algorithm, which means it does not make any assumption
about the underlying data.
 It is also called a lazy learner algorithm because it does not learn from the training
set immediately; instead, it stores the dataset and, at the time of classification,
performs an action on it.
 At the training phase, the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category most similar to the new data.
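A minimal K-NN usage sketch, again assuming scikit-learn; the toy points and labels below are made up (Category A = 0, Category B = 1):

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=5)  # K = 5
knn.fit(X_train, y_train)                  # lazy: just stores the data

# The new point is assigned the majority class of its 5 nearest neighbours.
print(knn.predict([[2, 2]]))               # -> [0], i.e. Category A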
Example K-NN
 Example: Suppose we have an image of a creature that looks similar to both a cat and
a dog, and we want to know whether it is a cat or a dog. For this identification, we can
use the KNN algorithm, as it works on a similarity measure. Our KNN model will
find the features of the new data set that are similar to the cat and dog images and,
based on the most similar features, put it in either the cat or the dog category.
Types of ML Classification
Algorithms:
 Classification algorithms can be divided into two
main categories:
 Linear Models
◦ Logistic Regression
◦ Support Vector Machines
 Non-linear Models
◦ K-Nearest Neighbours
◦ Kernel SVM
◦ Naïve Bayes
◦ Decision Tree Classification
◦ Random Forest Classification
Why do we need a K-NN Algorithm?
 Suppose there are two categories, Category A and Category B, and we
have a new data point x1. Which of these categories does the data point
belong to? To solve this type of problem, we need the K-NN algorithm. With
the help of K-NN, we can easily identify the category or class of a
particular data point. Consider the diagram below:
How does K-NN work?
 The working of K-NN can be explained on the basis of the
following algorithm (a code sketch of these steps appears below):
 Step-1: Select the number K of neighbors.
 Step-2: Calculate the Euclidean distance from the new data
point to each training data point.
 Step-3: Take the K nearest neighbors as per the calculated
Euclidean distances.
 Step-4: Among these K neighbors, count the number of data
points in each category.
 Step-5: Assign the new data point to the category for which
the number of neighbors is maximum.
 Step-6: Our model is ready.
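A from-scratch sketch of these six steps in plain Python; the training points, labels, and query are made up for illustration:

import math
from collections import Counter

train = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
         ((6, 5), "B"), ((7, 6), "B")]

def knn_classify(query, train, k=5):            # Step-1: choose K
    # Step-2: Euclidean distance from the query to every training point.
    dists = [(math.dist(query, point), label) for point, label in train]
    # Step-3: take the K nearest neighbours.
    nearest = sorted(dists)[:k]
    # Step-4: count the data points in each category among them.
    votes = Counter(label for _, label in nearest)
    # Step-5: assign the category with the maximum number of neighbours.
    return votes.most_common(1)[0][0]

print(knn_classify((2, 2), train, k=5))         # -> 'A' (3 votes to 2)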
KNN
 Suppose we have a new data point and we need to put
it in the required category. Consider the image below:
KNN
 Firstly, we will choose the number of neighbors: we will choose
k = 5.
 Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the straight-line distance between two points,
which we have already studied in geometry. For points (x1, y1) and
(x2, y2) it is calculated as:
 d = √((x2 − x1)² + (y2 − y1)²)
 By calculating the Euclidean distances we get the nearest neighbors:
three nearest neighbors in category A and two nearest neighbors in
category B. Consider the image below:
 Since three of the five nearest neighbors are from category A, the new
data point must belong to category A.
How to select the value of K in the K-
NN Algorithm?
 There is no particular way to determine the best value
for "K", so we need to try several values to find the best
one. The most commonly preferred value for K is 5.
 A very low value for K, such as K = 1 or K = 2, can be
noisy and makes the model sensitive to outliers.
 Large values for K smooth out noise, but they can blur
the boundaries between categories and make the
computation more expensive. One common approach,
sketched below, is to compare several values of K with
cross-validation.
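One way to realize "try some values" is to score each candidate K with cross-validation; a sketch assuming scikit-learn, with a synthetic dataset standing in for real data:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

scores = {}
for k in [1, 3, 5, 7, 9, 11]:   # odd values avoid voting ties in binary problems
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])   # the K with the highest mean accuracy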
Advantages & Disadvantages of KNN
 Advantages of KNN Algorithm:
◦ It is simple to implement.
◦ It is robust to noisy training data.
◦ It can be more effective if the training data is large.
 Disadvantages of KNN Algorithm:
◦ The value of K always needs to be determined, which can be
complex at times.
◦ The computation cost is high, because the distance to every
training sample must be calculated for each prediction.
Example KNN- Numerical
 Classify the query point (X2 = 3, Y2 = 7) on this dataset.
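The dataset table from the slide is not reproduced here, so the sketch below works through the calculation on a small hypothetical dataset; only the query point (3, 7) comes from the slide:

import math
from collections import Counter

# Hypothetical training data: (x, y) -> class.
data = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)

dists = sorted((math.dist(query, p), label) for p, label in data)
print(dists)   # [(3.0,'Good'), (~3.61,'Good'), (4.0,'Bad'), (5.0,'Bad')]

# With K = 3, two of the three nearest neighbours are 'Good'.
k_nearest = dists[:3]
print(Counter(label for _, label in k_nearest).most_common(1))  # [('Good', 2)]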
Naïve Bayes - Classification
Algorithm
 The Naïve Bayes algorithm is a supervised learning algorithm
based on Bayes' theorem and used for solving
classification problems.
 It is mainly used in text classification with high-
dimensional training datasets.
 The Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms; it helps in building fast
machine learning models that can make quick predictions.
 It is a probabilistic classifier, which means it predicts on the
basis of the probability that an object belongs to a class.
 Some popular examples of the Naïve Bayes algorithm are spam
filtering, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
 The name Naïve Bayes is made up of two words,
Naïve and Bayes, which can be described as:
 Naïve: It is called naïve because it assumes that the
occurrence of a certain feature is independent of the
occurrence of other features. For example, if a fruit is
identified on the basis of color, shape, and taste, then a
red, spherical, and sweet fruit is recognized as an apple.
Hence each feature individually contributes to identifying
it as an apple, without depending on the other features.
 Bayes: It is called Bayes because it depends on the
principle of Bayes' Theorem.
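The "naïve" independence assumption means the class-conditional probability of the whole feature set factorizes into per-feature probabilities; a sketch with made-up numbers for the fruit example:

# All probabilities below are illustrative assumptions.
p_red_given_apple = 0.8
p_spherical_given_apple = 0.9
p_sweet_given_apple = 0.7

# P(red, spherical, sweet | apple)
#   ≈ P(red | apple) * P(spherical | apple) * P(sweet | apple)
p_features_given_apple = (p_red_given_apple
                          * p_spherical_given_apple
                          * p_sweet_given_apple)
print(p_features_given_apple)  # 0.504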
Bayes' Theorem
 Bayes' theorem, also known as Bayes' Rule or Bayes' law,
is used to determine the probability of a hypothesis given
prior knowledge. It depends on conditional probability.
 The formula for Bayes' theorem is:
 P(A|B) = P(B|A) · P(A) / P(B)
 Where,
 P(A|B) is the Posterior probability: the probability of hypothesis A
given the observed event B.
 P(B|A) is the Likelihood: the probability of the evidence B given
that hypothesis A is true.
 P(A) is the Prior probability: the probability of the hypothesis before
observing the evidence.
 P(B) is the Marginal probability: the probability of the evidence.
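A worked instance of the formula with made-up numbers: suppose 30% of emails are spam (prior), 40% of spam emails contain the word "offer" (likelihood), and 14% of all emails contain "offer" (evidence):

p_spam = 0.30               # P(A): prior
p_offer_given_spam = 0.40   # P(B|A): likelihood
p_offer = 0.14              # P(B): marginal probability of the evidence

# P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 3))  # 0.857: posterior P(spam | "offer")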
Working of Naïve Bayes' Classifier:
 Convert the given dataset into frequency tables.
 Generate a likelihood table by finding the probabilities
of the given features.
 Now, use Bayes' theorem to calculate the posterior
probability.
Problem: If the weather is sunny, should
the player play or not?
 Data Set (a worked sketch with assumed counts follows below)
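The slide's data set is not reproduced here, so the sketch below assumes hypothetical frequency counts to walk through the three steps (frequency table, likelihood table, posterior):

# Assumed frequency table: 14 days in total, 9 with Play = Yes;
# the weather was Sunny on 5 days, 3 of them Yes and 2 of them No.
p_yes = 9 / 14                # prior P(Yes)
p_no = 5 / 14                 # prior P(No)
p_sunny = 5 / 14              # evidence P(Sunny)
p_sunny_given_yes = 3 / 9     # likelihood P(Sunny | Yes)
p_sunny_given_no = 2 / 5      # likelihood P(Sunny | No)

# Posteriors via Bayes' theorem:
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # = 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # = 0.40

print(p_yes_given_sunny > p_no_given_sunny)  # True -> the player should play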
Advantages & Disadvantages of Naïve
Bayes Classifier:
 Advantages of Naïve Bayes Classifier:
◦ Naïve Bayes is one of the fastest and easiest ML algorithms for
predicting the class of a dataset.
◦ It can be used for binary as well as multi-class classification.
◦ It performs well in multi-class predictions compared to
other algorithms.
◦ It is the most popular choice for text classification problems.
 Disadvantages of Naïve Bayes Classifier:
◦ Naïve Bayes assumes that all features are independent or
unrelated, so it cannot learn relationships between features.
Applications of Naïve Bayes Classifier:
 It is used for credit scoring.
 It is used in medical data classification.
 It can be used for real-time predictions because the Naïve
Bayes Classifier is an eager learner.
 It is used in text classification, such as spam
filtering and sentiment analysis.