
Unit-III

Classification
Topics to Be Covered
• What is Classification?
• General approach to Classification
• K-Nearest Neighbor Algorithm
• Logistic regression
• Decision Trees
• Naive Bayesian
• Support Vector Machine (SVM)
What is Classification?
• Classification is a supervised machine learning method in which
the model tries to predict the correct label for a given input.

• In classification, the model is first trained on the training
data, then evaluated on test data, and finally used to make
predictions on unseen data.

For example, an algorithm can learn to predict whether

🡪 a given email is spam or ham (not spam)

🡪 a tumor is malignant or benign

Classification model

🡪 The target feature is of categorical type.

🡪 The target categorical feature is known as the class.
Classification Terminologies in ML
• Classifier – An algorithm that maps the input data to a specific category.

• Classification Model – The model draws conclusions from the input data it was
trained on and predicts the class or category for new data.

• Feature – An individual measurable property of the phenomenon being observed.

• Binary Classification – Classification with two outcomes, e.g.
True or False / 1 or 0 / Yes or No.

• Multi-Class Classification – Classification with more than two classes; in
multi-class classification each sample is assigned to one and only one label or
target.

• Multi-label Classification – Classification where each sample may be
assigned to a set of labels or targets (a short sketch follows this list).
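To make the three settings concrete, here is a minimal sketch of how the targets look in each case (the arrays, class encodings and example labels below are hypothetical illustrations, not data from this unit):

import numpy as np

# Binary classification: one label per sample, two possible values.
y_binary = np.array([0, 1, 1, 0])          # e.g. spam (1) vs. ham (0)

# Multi-class classification: one label per sample, more than two classes.
y_multiclass = np.array([0, 2, 1, 2])      # e.g. T-shirt sizes S=0, M=1, L=2

# Multi-label classification: each sample may carry several labels at once,
# usually encoded as a binary indicator matrix (samples x labels).
y_multilabel = np.array([[1, 0, 1],        # sample 0 has labels 0 and 2
                         [0, 1, 0],        # sample 1 has label 1 only
                         [1, 1, 1]])       # sample 2 has all three labels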
Binary Classification
Multi-Class Classification
Multi-label Classification
Classification Usecases
• Image classification
• Disease prediction
• Win–loss prediction of games
• Prediction of natural calamities such as earthquakes and floods
• Handwriting recognition
• Document Classification
• Spam Filters
Classification Model Steps in ML

1. Problem Identification
2. Identification of Required Data
3. Data Pre-processing
4. Definition of Training Data Set
5. Algorithm Selection
6. Training
7. Evaluation with the Test Data Set
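These steps map directly onto a typical workflow in code. Below is a minimal sketch using scikit-learn (its bundled Iris dataset stands in for the "required data"; all class and function names used are from scikit-learn's public API), covering pre-processing, training-set definition, algorithm selection, training and evaluation:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Steps 2-3: obtain the required data and prepare a scaler for pre-processing
X, y = load_iris(return_X_y=True)
scaler = StandardScaler()

# Step 4: define the training data set (hold out 30% of the data for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Steps 5-6: select an algorithm and train it
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Step 7: evaluate with the test data set
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))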
Algorithms for Classification
• K-Nearest Neighbour
• Logistic Regression
• Decision Tree
• Support Vector Machine
• Naive Bayes
• Random Forest
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbors is one of the simplest supervised machine
learning algorithms used for classification. It classifies a data point
based on its neighbors’ classifications: it stores all available cases and
classifies new cases based on feature similarity.

• It is a lazy learning algorithm that stores all instances of the
training data in an n-dimensional space.

• Classification is computed from a simple majority vote of the k nearest
neighbors of each point.

• To label a new point, it looks at the labeled points closest to that new
point, also known as its nearest neighbors. Those neighbors vote, and
whichever label most of the neighbors have becomes the label of the new
point. The “k” is the number of neighbors it checks.
KNN
K-Nearest Neighbor (KNN) Algorithm
• The distance function can be the Euclidean, Manhattan, Minkowski
or Hamming distance.

How to Choose the Factor ‘K’?

• The KNN algorithm is based on feature similarity. Selecting the right K
value is a process called parameter tuning, and it is important for
achieving higher accuracy.

• A common heuristic is k ≈ sqrt(n), where n is the total number of
training data points.

• An odd value of ‘k’ is preferred to avoid ties in the majority vote.

KNN works well when the data is labeled, noise-free and the dataset is small.
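A minimal sketch of the sqrt(n) heuristic with the odd-k adjustment (the helper function name is my own, not part of any library):

import math

def choose_k(n_samples: int) -> int:
    """Heuristic: k is about sqrt(n), forced to be odd to avoid tied votes."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

print(choose_k(25))   # 5
print(choose_k(100))  # 11 (sqrt(100) = 10, bumped to the next odd number)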
kNN algorithm
• Input: training data set, test data set (or data points), and the value of ‘k’ (i.e. the
number of nearest neighbours to be considered)

Steps:

• Step-1: Choose the K value.

• Step-2: Calculate similarity based on a distance function.

• Step-3: Find the K nearest neighbours and rank them by minimal distance.

• Step-4: Among these k neighbours, count the number of data points in
each category.

• Step-5: Assign the new data point to the category with the maximum
number of neighbours.
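The steps above can be implemented directly. A minimal from-scratch sketch in Python, using Euclidean distance and a simple majority vote (the function name and toy data are hypothetical, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest training points."""
    # Step-2: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step-3: indices of the k nearest neighbours (smallest distances)
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count labels among the neighbours and return the majority
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two features per point, two classes
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), k=3))  # prints "A"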
How Does KNN work?
• Consider a dataset with two variables,
height (cm) and weight (kg), in which each
data point is classified as Normal or
Underweight.
How Does KNN work?
• Suppose we have the height, weight and T-shirt
size of some customers, and we need to
predict the T-shirt size of a new customer
given only their height and weight. The data,
including height, weight and T-shirt size, is
shown below.

If a customer has a height of 161 cm and a weight of
61 kg, what would his T-shirt size be?
How Does KNN work?
• Step 1: Calculate similarity based on a distance function.
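A minimal sketch of this step (the customer records below are hypothetical stand-ins for the table shown on the slide; the original table is not reproduced here):

import math

# Hypothetical (height cm, weight kg, T-shirt size) records
customers = [
    (158, 58, "M"), (160, 59, "M"), (160, 60, "M"),
    (163, 61, "L"), (165, 63, "L"), (168, 66, "L"),
]
new_customer = (161, 61)  # height and weight of the customer to classify

# Step 1: Euclidean distance from the new customer to every known customer
for height, weight, size in customers:
    d = math.sqrt((height - new_customer[0]) ** 2 + (weight - new_customer[1]) ** 2)
    print(f"({height} cm, {weight} kg, {size}) -> distance {d:.2f}")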
Another Problem
K-Nearest Neighbor (KNN) Algorithm
Applications
• Recommender Systems
• Document / Content Searching
Advantages
• Extremely simple algorithm – easy to understand
• Very effective in certain situations, e.g. for recommender system
design
• Training is very fast, since almost no computation is done during the
training phase
Disadvantages
• Does not build a model from the training data (lazy learner); all work
is deferred to prediction time
• Computation cost at prediction time is high, because the distance to
every training sample must be calculated.
Metrics to Evaluate ML Classification Algorithms
• True positives: the number of positive observations the model
correctly predicted as positive.

• False positives: the number of negative observations the model
incorrectly predicted as positive.

• True negatives: the number of negative observations the model
correctly predicted as negative.

• False negatives: the number of positive observations the model
incorrectly predicted as negative.
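From these four counts the usual classification metrics follow. A minimal sketch (the labels and predictions are hypothetical; the accuracy, precision, recall and F1 formulas are the standard definitions):

# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)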
