ML - MU - Unit - 2 - Supervised Learning-Classification Techniques
Computer Engineering
Machine Learning
Sem 7
Unit #2 & 3
A classification model tries to predict the correct label for given input data.
Applications of Classification:
Face/Image Recognition: classify faces or objects in images, used in identification systems, access control, and surveillance.
Sentiment Analysis: classify text (e.g., social media posts) to determine sentiment (positive, negative, neutral) and understand public opinion and brand perception.
Disease Diagnosis: classify diseases or predict the likelihood of certain conditions, assisting in medical diagnosis.
Email Spam Filtering: classify emails as either spam or non-spam, helping in filtering unwanted or malicious emails.
Land Cover Classification in Remote Sensing: classify land cover types (e.g., forests, urban areas, water bodies) in satellite or aerial imagery, aiding in environmental monitoring, urban planning, and natural resource management.
Toxic Comment Classification: classify text comments as toxic or non-toxic, helping to identify and moderate harmful or abusive content on online platforms.
Handwriting Recognition: classify handwritten characters or text, finding applications in optical character recognition (OCR) systems and digitizing handwritten documents.
Fraud Detection: identify fraudulent transactions or activities, playing a critical role in financial institutions, e-commerce platforms, and security systems.
Document Classification: categorize documents, such as news articles, legal documents, or customer support tickets, into relevant categories, facilitating efficient document retrieval and organization.
Stock Market Prediction: classify stocks as buy, sell, or hold based on historical market data and indicators.
Recommendation Systems: predict user preferences and classify items or content for recommendation.
Learners in Classification Problems:

Lazy Learners:
A lazy learner first stores the training dataset and waits until it receives the test dataset.
In the lazy learner's case, classification is done on the basis of the most related data stored in the training dataset.
It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.

Eager Learners:
Eager learners develop a classification model based on a training dataset before receiving a test dataset.
Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction.
Example: Logistic Regression, Support Vector Machine, Decision Trees, Naïve Bayes, ANN.

Lazy Learners vs Eager Learners
Logistic Regression
Introduction, Types of Logistic Regression, Sigmoid Function, Example, Advantages & Disadvantages
Logistic regression is one of the most popular machine learning algorithms; it comes under the supervised learning technique.
Sigmoid Function: Instead of fitting a straight line, logistic regression passes the linear score z through an "S"-shaped curve. The S-form curve is called the sigmoid function or the logistic function, σ(z) = 1 / (1 + e^(−z)), which maps any real value into the range (0, 1).
Example: z = −5 + 1.5·X
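As a quick illustration (not from the original slides), here is a minimal Python sketch that passes the linear score z = −5 + 1.5·X through the sigmoid to obtain class probabilities; the X values used are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative model from the slide: z = -5 + 1.5 * X
X = np.array([0.0, 2.0, 3.33, 5.0, 8.0])
z = -5 + 1.5 * X
probs = sigmoid(z)            # P(y = 1 | X)
labels = (probs >= 0.5)       # threshold at 0.5 to get class labels

for x, p, y in zip(X, probs, labels):
    print(f"X = {x:4.2f}  ->  P(y=1) = {p:.3f}  ->  class {int(y)}")
```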
Advantages and Disadvantages of Logistic Regression

Advantages of Logistic Regression Algorithm:
Logistic Regression performs well when the dataset is linearly separable.
Logistic Regression not only gives a measure of how relevant a predictor (coefficient size) is, but also its direction of association (positive or negative).

Disadvantages of Logistic Regression Algorithm:
If the number of observations is smaller than the number of features, Logistic Regression should not be used; otherwise it may lead to overfitting.
Support Vector Machine (SVM)

Note: Don't get confused between SVM and logistic regression.
The SVM algorithm finds the points from both classes that lie closest to the decision boundary (hyperplane). These points are called support vectors.
The distance between the support vectors and the hyperplane is called the margin.
Non-Linear SVM:
For data that is not linearly separable in 2-D, we add a third dimension z (for example, z = x² + y²). With this third dimension, the sample space becomes as shown in image (a).
SVM will now divide the dataset into classes by finding a separating hyperplane. Since we are in 3-D space, it looks like a plane parallel to the x-axis.
If we convert it back to 2-D space with z = 1, the boundary becomes a circle around the data, as shown in image (b).
Different Kernel Functions:
Linear Kernel: K(x, y) = x · y
Polynomial Kernel: K(x, y) = (x · y + c)^d
Sigmoid Kernel: K(x, y) = tanh(γ · x · y + c)
RBF (Radial Basis Function) Kernel: K(x, y) = exp(−γ · ‖x − y‖²)
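A small scikit-learn sketch comparing these kernels; the make_moons toy dataset and the default hyperparameters are assumptions chosen purely for illustration:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed toy dataset: two interleaving half-moons (not linearly separable)
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ["linear", "poly", "sigmoid", "rbf"]:
    clf = SVC(kernel=kernel, gamma="scale")
    clf.fit(X_train, y_train)
    print(f"{kernel:8s} kernel -> test accuracy = {clf.score(X_test, y_test):.2f}")
```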
Disadvantages of SVM Algorithm:
SVM doesn't perform very well when the data set has more noise, i.e., when the target classes are overlapping.
DEMO (SVM):
https://colab.research.google.com/drive/1Yj4t10E7jqxLkGTJeNc-aV7_8epgY3iR?usp=sharing
K-Nearest Neighbours (KNN)
Introduction, Algorithm, Example
The K-NN algorithm compares a new data entry to the
values in a given data set (with different classes or
categories).
How to Choose the Value of K in the K-NN Algorithm:
Choosing a very low value will most likely lead to inaccurate predictions.
The commonly used value of K is 5.
Always use an odd number as the value of K.
K-Nearest Neighbors Classifier: Example With a Data Set

BRIGHTNESS   SATURATION   CLASS
40           20           Red
50           50           Blue
60           90           Blue
10           25           Red
70           70           Blue
60           10           Red
25           80           Blue

The table above represents our data set. We have two columns, Brightness and Saturation. Each row in the table has a class of either Red or Blue. Let's assume the value of K is 5. We introduce a new data entry:

BRIGHTNESS   SATURATION   CLASS
20           35           ?

Apply KNN to find the class of the new entry.
Step #1 - Assign a value to K (here, K = 5).
Step #2 - Calculate the distance between the new entry and each existing entry using the Euclidean distance:
distance = √((X₂ − X₁)² + (Y₂ − Y₁)²)
Where:
X₂ = new entry's brightness (20), X₁ = existing entry's brightness,
Y₂ = new entry's saturation (35), Y₁ = existing entry's saturation.
Step #3 - Find the K nearest neighbours (smallest distances) and assign the new entry to the class that appears most often among them.
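A minimal Python sketch (assuming plain Euclidean distance and simple majority voting, as described above) that reproduces the Brightness/Saturation example:

```python
from collections import Counter
from math import sqrt

# Data set from the slide: (brightness, saturation) -> class
data = [((40, 20), "Red"), ((50, 50), "Blue"), ((60, 90), "Blue"),
        ((10, 25), "Red"), ((70, 70), "Blue"), ((60, 10), "Red"),
        ((25, 80), "Blue")]
new_entry = (20, 35)

def knn_predict(data, query, k):
    # Step 2: Euclidean distance from the query to every stored row
    ranked = sorted(data, key=lambda row: sqrt((row[0][0] - query[0]) ** 2 +
                                               (row[0][1] - query[1]) ** 2))
    # Step 3: majority vote among the k nearest neighbours
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3, 5):
    print(f"k = {k}: new entry classified as {knn_predict(data, new_entry, k)}")
```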
Example: Apply KNN to find the class of a new entry (take K = 1, 2, 5).
Ans: ???
Advantages of KNN Algorithm:
It is simple to implement.
Bayes' Theorem (in the case of a single feature):

P(class | feature) = [ P(feature | class) × P(class) ] / P(feature)

Above, P(class | feature) is the posterior probability, P(feature | class) is the likelihood, P(class) is the prior probability of the class, and P(feature) is the prior probability of the feature (the evidence).
Step 1: Create a frequency table for the weather feature.

Whether     No   Yes
Overcast    0    4
Sunny       3    2
Rainy       2    3
Total:      5    9

Step 2: Create a likelihood table by finding the probabilities.

Whether     No   Yes
Overcast    0    4    P(Overcast) = 4/14 = 0.29
Sunny       3    2    P(Sunny) = 5/14 = 0.36
Rainy       2    3    P(Rainy) = 5/14 = 0.36

Step 3: Now suppose you want to calculate the probability of playing when the weather is overcast.
Probability of playing:
P(Yes | Overcast) = P(Overcast | Yes) × P(Yes) / P(Overcast) = (0.44 × 0.64) / 0.29 = 0.98 (approximately 1, since all four overcast days were "Yes")
Probability of not playing:
P(No | Overcast) = P(Overcast | No) × P(No) / P(Overcast) = (0 × 0.36) / 0.29 = 0

Step 4: See which class has a higher probability; the input belongs to the higher-probability class.
P(Yes | Overcast) ≈ 0.98 and P(No | Overcast) = 0, so when the weather is overcast we predict "Yes" (play).
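The same calculation can be checked with a few lines of Python; all values are read from the tables above:

```python
# Values read from the frequency and likelihood tables above
p_overcast = 4 / 14                 # P(Overcast)
p_yes, p_no = 9 / 14, 5 / 14        # priors P(Yes), P(No)
p_overcast_given_yes = 4 / 9        # P(Overcast | Yes)
p_overcast_given_no = 0 / 5         # P(Overcast | No)

# Bayes' theorem: P(class | Overcast) = P(Overcast | class) * P(class) / P(Overcast)
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
p_no_given_overcast = p_overcast_given_no * p_no / p_overcast

print(f"P(Yes | Overcast) = {p_yes_given_overcast:.2f}")  # 1.00 -> play
print(f"P(No  | Overcast) = {p_no_given_overcast:.2f}")   # 0.00 -> don't play
```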
Example:

Fruit ID   Color (Feature)   Class
1          Red               Apple
2          Red               Apple
3          Green             Apple
4          Orange            Orange
5          Orange            Orange
6          Green             Apple
Now suppose you want to calculate the probability of Apple when the Color is Red.
Probability of Apple:
P(Apple | Red) = P(Red | Apple) × P(Apple) / P(Red) = (0.5 × 0.67) / 0.33 = 1
Now suppose you want to calculate the probability of Orange when the Color is Red.
Probability of Orange:
P(Orange | Red) = P(Red | Orange) × P(Orange) / P(Red) = (0 × 0.33) / 0.33 = 0
Step 4: See which class has a higher probability; the input belongs to the higher-probability class.
P(Apple | Red) = 1 and P(Orange | Red) = 0, so a red fruit is classified as Apple.
P(Teen) = 0.6
P(Adult) = 0.4
Likelihoods:
Classification: For a new person whose favorite drink is "Juice", the person is
classified as "Teen" based on the higher posterior probability.
Example: Apply Naive Bayes to the data set below. Given a new instance with an input feature (e.g., Age Group = Youth), calculate the probability of each class.

Age Group   Buys Computer?
Youth       No
Youth       No
Middle      Yes
Senior      Yes
Senior      Yes
Senior      No
Middle      Yes
Youth       No
Youth       Yes
Senior      Yes

P(Yes) = 5 / 10 = 0.5
P(No) = 5 / 10 = 0.5
Example: Classify a new email using two binary features, Free and Win. The training set contains 10 emails, 5 Spam and 5 Not Spam, so P(Spam) = P(Not Spam) = 0.5.

Step 2: Create a likelihood table for each feature.

Free (Feature 2)   Spam   Not Spam   Likelihood (Spam)                Likelihood (Not Spam)
Yes                4      2          P(Free=Yes | Spam) = 4/5 = 0.8   P(Free=Yes | Not Spam) = 2/5 = 0.4
No                 1      3          P(Free=No | Spam) = 1/5 = 0.2    P(Free=No | Not Spam) = 3/5 = 0.6
Total              5      5

Win (Feature 1)    Spam   Not Spam   Likelihood (Spam)                Likelihood (Not Spam)
Yes                4      2          P(Win=Yes | Spam) = 4/5 = 0.8    P(Win=Yes | Not Spam) = 2/5 = 0.4
No                 1      3          P(Win=No | Spam) = 1/5 = 0.2     P(Win=No | Not Spam) = 3/5 = 0.6
Total              5      5
Step 3: Apply the Bayes formula and calculate the posterior probability for each target value. The common denominator P(Free=Yes, Win=No) is the same for both classes, so it can be ignored for the comparison.

Probability of Spam for the new email (Free = Yes, Win = No):
P(Spam | Free=Yes, Win=No) ∝ P(Free=Yes | Spam) × P(Win=No | Spam) × P(Spam) = 0.8 × 0.2 × 0.5 = 0.08

Probability of Not Spam for the new email (Free = Yes, Win = No):
P(Not Spam | Free=Yes, Win=No) ∝ P(Free=Yes | Not Spam) × P(Win=No | Not Spam) × P(Not Spam) = 0.4 × 0.6 × 0.5 = 0.12
Step 4: See which class has a higher probability; the input belongs to the higher-probability class.
Since 0.12 > 0.08, we classify the new email as "Not Spam".
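A short Python sketch of the same calculation; the likelihoods and priors are taken directly from the tables above:

```python
# Priors and likelihoods taken from the tables above
prior = {"Spam": 0.5, "Not Spam": 0.5}
likelihood = {
    "Spam":     {"Free=Yes": 0.8, "Win=No": 0.2},
    "Not Spam": {"Free=Yes": 0.4, "Win=No": 0.6},
}

# New email to classify: Free = Yes, Win = No
evidence = ["Free=Yes", "Win=No"]

# Naive Bayes: multiply the class prior by each feature likelihood
scores = {}
for cls in prior:
    score = prior[cls]
    for feature in evidence:
        score *= likelihood[cls][feature]
    scores[cls] = score

print(scores)                                           # ~{'Spam': 0.08, 'Not Spam': 0.12}
print("Predicted class:", max(scores, key=scores.get))  # Not Spam
```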
Decision Tree

Decision Tree Terminologies:
Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into
two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after reaching a leaf node.
Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Entropy (used to compute Information Gain):
Entropy(S) = −P(yes)·log₂ P(yes) − P(no)·log₂ P(no)
Information Gain = Entropy(S) − [ (Weighted Average) × Entropy(each feature) ]
Where,
S = the set of samples
P(yes) = probability of yes
P(no) = probability of no
Gini Index:
Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
Gini Index = 1 − Σⱼ Pⱼ²
An attribute with a low Gini index should be preferred over one with a high Gini index.
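A minimal Python sketch of the Gini index for a binary node; the class counts used here are arbitrary examples:

```python
def gini_index(class_counts):
    """Gini = 1 - sum(p_j^2) over the classes at a node."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_index([5, 5]))   # 0.5  -> maximally impure 50/50 split
print(gini_index([9, 1]))   # 0.18 -> much purer node, preferred
print(gini_index([10, 0]))  # 0.0  -> pure node
```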
ID3 Algorithm:
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM) – take the feature with the highest Information Gain as the best feature.
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; call such a final node a leaf node.
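A small Python sketch of the entropy and information-gain formulas above; the parent node and the split counts in the example call are hypothetical values, not taken from the slides:

```python
from math import log2

def entropy(pos, neg):
    """Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no)."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * log2(p)
    return e

def information_gain(parent, subsets):
    """IG = Entropy(parent) - weighted average entropy of the subsets."""
    total = sum(p + n for p, n in subsets)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - weighted

# Hypothetical split: parent node has 9 Yes / 5 No, and the candidate attribute
# splits it into subsets with (6 Yes, 2 No) and (3 Yes, 3 No)
print(information_gain((9, 5), [(6, 2), (3, 3)]))
```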
Example: Apply the ID3 algorithm to the data set below.

Temperature   Humidity   Play
T             T          Yes
T             T          No
F             T          Yes
F             F          No
T             F          Yes
T             F          No
T             F          No

Ans: ??????

Final Tree:
DEMO (Decision Tree):
https://colab.research.google.com/drive/1Yj4t10E7jqxLkGTJeNc-aV7_8epgY3iR?usp=sharing
Disadvantages of the Decision Tree:
It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
For more class labels, the computational complexity of the decision tree may increase.
Ensemble Methods
Bagging – Random Forest Algorithm, Boosting – XGBoost
Ensemble simply means combining multiple models.
Bagging, also known as Bootstrap Aggregation, serves as the ensemble technique in the
Random Forest algorithm. Here are the steps involved in Bagging:
Selection of Subset: Bagging starts by choosing a random sample, or subset, from the
entire dataset.
Bootstrap Sampling: Each model is then created from these samples, called Bootstrap
Samples, which are taken from the original data with replacement. This process is
known as row sampling.
Majority Voting: The final output is determined by combining the results of all models
through majority voting. The most commonly predicted outcome among the models is
selected.
Aggregation: This step, which involves combining all the results and generating the final
output based on majority voting, is known as aggregation.
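A small scikit-learn sketch of Bagging following the steps above; the synthetic dataset and parameter values are assumptions made purely for illustration (BaggingClassifier's default base learner is a decision tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Assumed synthetic dataset, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 bootstrap samples (rows drawn with replacement), one tree per sample;
# predictions are combined by majority voting (aggregation)
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("Bagging test accuracy:", round(bag.score(X_test, y_test), 3))
```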
Random Forest:
Step 1: In the Random Forest model, a subset of data points and a subset of features is selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set having k records.
Step 2: Individual decision trees are constructed for each sample.
Step 3: Each decision tree generates an output.
Step 4: The final output is taken by majority voting (for classification) or averaging (for regression).
It's obvious that all three models work in completely different ways. For instance, the linear regression model tries to capture linear relationships in the data, while the decision tree model attempts to capture the non-linearity in the data.
How about, instead of using any one of these models for making the final predictions, we use a combination of all of these models? A sketch of this idea is shown below.
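One simple way to combine such different models is a voting ensemble. The sketch below is illustrative only: the synthetic regression data is assumed, and k-nearest neighbours is used as a stand-in for the third model, which is not named in the surviving text:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Assumed synthetic regression data, only to illustrate the idea
X, y = make_regression(n_samples=400, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [("linear", LinearRegression()),
          ("tree", DecisionTreeRegressor(random_state=0)),
          ("knn", KNeighborsRegressor())]

# Combine the three very different models; the ensemble averages their predictions
ensemble = VotingRegressor(estimators=models)
for name, model in models + [("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    print(f"{name:8s} R^2 = {model.score(X_test, y_test):.3f}")
```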