Machine Learning Unit-III
Classification
Topics to Be Covered
• What is Classification?
• General approach to Classification
• K-Nearest Neighbor Algorithm
• Logistic regression
• Decision Trees
• Naive Bayes
• Support Vector Machine (SVM)
What is Classification?
• Classification is a supervised machine learning method in which
the model tries to predict the correct label for a given input.
• The target feature is of categorical type.
General approach to Classification
1. Problem Identification
2. Identification of Required Data
3. Data Pre-processing
4. Definition of Training Data Set
5. Algorithm Selection
6. Training
7. Evaluation with the Test Data Set
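The last steps of this workflow (defining the training set, training, and evaluating on the test set) can be sketched in plain Python. This is only an illustration: the data is made up, the helper names `train_test_split`, `train_majority_baseline`, and `accuracy` are hypothetical, and the "algorithm" is a placeholder that always predicts the most common training label.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Step 4: shuffle the labeled rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_majority_baseline(train_rows):
    """Steps 5-6 with a placeholder algorithm: always predict the most common label."""
    most_common_label = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: most_common_label

def accuracy(predict, test_rows):
    """Step 7: fraction of test rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_rows if predict(features) == label)
    return correct / len(test_rows)

# Hypothetical, already pre-processed data: (features, label) pairs.
rows = [((i,), "A" if i < 6 else "B") for i in range(8)]
train_rows, test_rows = train_test_split(rows)
model = train_majority_baseline(train_rows)
print(accuracy(model, test_rows))
```

Any of the classifiers covered in this unit could take the place of the baseline, as long as it exposes the same "features in, label out" prediction function.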
Algorithms for Classification
• K-Nearest Neighbor (KNN)
• Logistic regression
• Decision tree
• Support Vector Machine (SVM)
• Naive Bayes
• Random forest
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbors is one of the simplest supervised machine
learning algorithms used for classification. It stores all available
labeled cases and classifies a new case according to the cases most
similar to it.
• To label a new point, it looks at the labeled points closest to that new
point, known as its nearest neighbors. Those neighbors vote, and
whichever label most of them have becomes the label of the new
point. The “k” is the number of neighbors it checks.
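The voting rule can be sketched in a few lines of Python. The function name `majority_vote` and the example labels are hypothetical, and the sketch assumes the labels of the k nearest neighbors have already been collected.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the label that occurs most often among the neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

# With k = 3 neighbors labeled red, blue, red, the new point is labeled red.
print(majority_vote(["red", "blue", "red"]))  # red
```

Choosing an odd k for a two-class problem is a common way to avoid tied votes.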
K-Nearest Neighbor (KNN) Algorithm
• The distance between points can be measured with the Euclidean,
Manhattan, Minkowski, or Hamming distance function
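For reference, the four distance functions can be written as follows. This is a minimal sketch using their standard definitions; the function names and the sample inputs are chosen here for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences along each coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalization: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))      # 5.0
print(manhattan((0, 0), (3, 4)))      # 7
print(hamming("karolin", "kathrin"))  # 3
```

Euclidean, Manhattan, and Minkowski distances apply to numeric features, while Hamming distance is typically used for categorical or binary features.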
Steps:
• Step-2: Find the K nearest neighbors and rank them by increasing distance
• Step-3: Among these K neighbors, count the number of data points in
each category.
• Step-5: Assign the new data point to the category that has the largest
number of neighbors.
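The listed steps can be put together in one short function. This is a sketch under stated assumptions: Euclidean distance is used (via `math.dist`, available from Python 3.8), and the function name `knn_classify` and the sample points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training_points, new_point, k):
    """training_points is a list of (features, label) pairs."""
    # Rank the labeled points by distance to the new point and keep the k closest.
    ranked = sorted(training_points, key=lambda item: math.dist(item[0], new_point))
    nearest = ranked[:k]
    # Count the number of neighbors in each category.
    counts = Counter(label for _, label in nearest)
    # Assign the category that has the largest number of neighbors.
    return counts.most_common(1)[0][0]

# Made-up points in two categories.
points = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify(points, (1.5, 1.5), k=3))  # A
print(knn_classify(points, (5, 6), k=3))      # B
```

Note that there is no separate training phase here: all the work happens when a new point is classified, which is why KNN is often described as a lazy learner.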
How Does KNN work?
• Consider a dataset with two variables,
height (cm) and weight (kg), in which each
data point is classified as Normal or
Underweight
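A dataset of this shape could be classified as sketched below. The records are invented purely for illustration (they are not the values from the slide and carry no medical meaning), and the function name `classify` is hypothetical.

```python
import math
from collections import Counter

# Hypothetical (height in cm, weight in kg) records, made up for illustration only.
people = [
    ((160, 45), "Underweight"),
    ((165, 48), "Underweight"),
    ((170, 50), "Underweight"),
    ((168, 62), "Normal"),
    ((172, 66), "Normal"),
    ((178, 72), "Normal"),
]

def classify(new_person, k=3):
    # Sort the records by Euclidean distance to the new person.
    by_distance = sorted(people, key=lambda rec: math.dist(rec[0], new_person))
    # Let the k closest records vote on the label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(classify((169, 60)))  # Normal
```

For the new person (169, 60), the three closest records are (168, 62), (172, 66), and (170, 50), so two of the three votes go to Normal.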
How Does KNN work?
• Suppose we have the height, weight, and T-shirt
size of some customers, and we need to
predict the T-shirt size of a new customer
given only that customer's height and weight.
The data, including height, weight, and
T-shirt size, is shown below