Unit 4
Both classification and clustering are data mining techniques used to group data.
However, they differ in their approach: classification is supervised (it learns from data that is already labelled with classes), while clustering is unsupervised (it groups unlabelled data by similarity).
For example, a loan-approval classifier might be trained on applicant features such as:
○ Income
○ Credit Score
○ Loan Repayment History
Once trained, the model can classify new applicants as either "Approved" or "Rejected."
Steps in Classification
To understand different classification techniques better, let's first compare them based on
learning type, working principle, advantages, disadvantages, and common applications.
……………………………………………………………………………………………..
Decision Tree Classification
Example:
Imagine you're trying to decide whether to play tennis based on the weather. The decision
depends on factors like Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool),
Humidity (High, Normal), and Wind (Weak, Strong).
1. Select the Best Attribute: Choose the attribute that best separates the data into
distinct classes. This is often done using metrics like Information Gain or Gini Index.
2. Create a Node: Make a decision node that splits on the best attribute.
3. Split the Dataset: Divide the dataset into subsets where each subset contains data
with the same value for the chosen attribute.
4. Repeat: For each subset, repeat the process using the remaining attributes. Continue
until one of the stopping conditions is met (e.g., all instances in a subset belong to the
same class, no more attributes to split on).
The resulting decision tree can be read as follows:
● If the outlook is overcast, play tennis.
● If it's sunny, check the humidity:
○ High humidity: don't play.
○ Normal humidity: play.
● If it's rainy, check the wind:
○ Weak wind: play.
○ Strong wind: don't play.
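Step 1 above (selecting the best attribute) can be sketched by computing Information Gain on a toy slice of the weather data. This is a minimal sketch: the six rows, their labels, and the function names are illustrative, not taken from the text.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on the attribute at attr_index."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g)
                   for g in groups.values())
    return base - weighted

# Toy weather rows: (Outlook, Wind) -> Play tennis?
rows = [("Sunny", "Weak"), ("Sunny", "Strong"), ("Overcast", "Weak"),
        ("Rain", "Weak"), ("Rain", "Strong"), ("Overcast", "Strong")]
labels = ["No", "No", "Yes", "Yes", "No", "Yes"]

gain_outlook = information_gain(rows, labels, 0)
gain_wind = information_gain(rows, labels, 1)
```

On this toy data, Outlook has the higher gain, so it would be chosen as the root split, matching the tree described above.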
Advantages:
Disadvantages:
Naive Bayes Classification
Example:
Suppose you're classifying emails as 'Spam' or 'Not Spam' based on the presence of certain
words.
Steps:
1. Calculate Prior Probabilities: Determine the prior probability of each class (e.g., the
proportion of emails that are spam and not spam).
2. Calculate Likelihoods: For each word, calculate the likelihood of it appearing in
spam and not spam emails.
3. Apply Bayes' Theorem: For a new email, calculate the posterior probability for each
class using the prior probabilities and likelihoods.
4. Classify: Assign the class with the higher posterior probability to the email.
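The four steps above can be sketched as a tiny Naive Bayes classifier. The four-email corpus and the use of Laplace (add-one) smoothing are assumptions added for this sketch.

```python
import math
from collections import Counter, defaultdict

# Illustrative labelled corpus.
emails = [("win money now", "spam"), ("limited offer win", "spam"),
          ("meeting schedule today", "ham"), ("project meeting now", "ham")]

# Step 1: prior counts per class; Step 2: per-class word counts.
priors = Counter(label for _, label in emails)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in emails:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def posterior_scores(text):
    """Step 3: unnormalised log-posteriors with add-one smoothing."""
    scores = {}
    for label in priors:
        total = sum(word_counts[label].values())
        score = math.log(priors[label] / len(emails))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return scores

def classify(text):
    """Step 4: pick the class with the higher posterior."""
    scores = posterior_scores(text)
    return max(scores, key=scores.get)
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps unseen words from zeroing out a class.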
Lazy-Learner
Lazy learners are machine learning models that do not build a model during training.
Instead, they store the training data and classify new instances only when needed, by
comparing them with the stored data.
"Lazy learners delay processing until a new data point needs to be classified, making
them simple but computationally expensive."
Example: Classifying a new fruit based on its features (e.g., weight, color) by looking at the 'k' most
similar fruits in the dataset.
Need:
Suppose there are two categories, Category A and Category B, and a new data point x1
arrives. To decide which of these categories the point belongs to, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a new
data point.
Steps:
The working of K-NN can be explained with the following steps. Suppose we have a new
data point that we need to assign to one of the two categories:
○ First, choose the number of neighbors; here we choose k = 5.
○ Next, calculate the Euclidean distance between the new point and each existing data
point. The Euclidean distance between two points (x1, y1) and (x2, y2), familiar from
geometry, is:
d = √((x2 − x1)² + (y2 − y1)²)
○ By calculating the Euclidean distances, we find the 5 nearest neighbors: three in
Category A and two in Category B.
○ Since the majority (3 of the 5) of the nearest neighbors belong to Category A, the new
data point is assigned to Category A.
…………………………………………………………………………………………………
Example: Classifying Fruits Based on Features 🍎🍊
Suppose we classify a new fruit based on:
✔ Weight
✔ Color
● Let’s set k = 3.
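The fruit example can be sketched as follows. The training fruits, the made-up 0-1 colour score, and the use of raw (unscaled) features are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative labelled fruits: (weight_g, colour_score) -> label.
training = [((150, 0.90), "apple"), ((170, 0.85), "apple"),
            ((140, 0.95), "apple"), ((130, 0.20), "orange"),
            ((120, 0.25), "orange"), ((135, 0.15), "orange")]

def knn_classify(point, k=3):
    """Majority vote among the k nearest fruits by Euclidean distance."""
    nearest = sorted(training,
                     key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that with raw features the weight (in grams) dominates the distance; in practice features are usually scaled to comparable ranges before applying k-NN.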
Disadvantages of k-NN
Rule-Based Classification
Rule-based classification is a method where IF-THEN rules are used to classify data. Instead
of using a mathematical model like Decision Trees or SVM, this approach directly defines
rules for classification.
Example:
A bank wants to classify loan applicants as Low Risk or High Risk based on
income and credit score.
✔ Rule 1: IF income > 50,000 AND credit score > 700 THEN Low Risk.
✔ Rule 2: IF income < 30,000 AND credit score < 600 THEN High Risk.
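Rules 1 and 2 can be sketched as a first-match rule list in Python; the fallback label for applicants who match neither rule is an assumption added for the sketch.

```python
def classify_applicant(income, credit_score):
    """Apply IF-THEN rules in order; first matching rule wins."""
    rules = [
        # Rule 1: IF income > 50,000 AND credit score > 700 THEN Low Risk.
        (lambda i, c: i > 50_000 and c > 700, "Low Risk"),
        # Rule 2: IF income < 30,000 AND credit score < 600 THEN High Risk.
        (lambda i, c: i < 30_000 and c < 600, "High Risk"),
    ]
    for condition, label in rules:
        if condition(income, credit_score):
            return label
    # Assumed fallback for applicants matched by no rule.
    return "Needs Review"
```

Checking rules in a fixed order is one simple way to handle conflicts: when several rules could fire, the earliest one in the list decides.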
3. Handle Conflicts:
Student | Math | English | Grade
A | 85 | 90 | Excellent
B | 40 | 60 | Average
C | 75 | 70 | Good
D | 20 | 30 | Poor
Generated Rules:
● New student: Math = 78, English = 72
● Matches Rule 3 → Classified as "Good"
…………………………………………………………………………………………………
Classification by Backpropagation (Neural Networks)
Introduction to Backpropagation
1. Input Layer
2. Hidden Layers
3. Output Layer
● Produces final classification results (e.g., Spam/Not Spam, Digit 0-9, Disease/No
Disease).
Example:
For classifying handwritten digits (0-9), a neural network processes image pixels and
outputs a probability for each digit. The highest probability determines the final
classification.
● Steps 1-3 are repeated multiple times (Epochs) until the model achieves high
classification accuracy.
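The forward-backward-update cycle repeated over epochs can be sketched in NumPy for a tiny network. The XOR dataset, the layer sizes, the learning rate, and the epoch count are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR problem: 2 inputs -> 1 binary output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 units (sizes are illustrative).
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr, losses = 1.0, []
for epoch in range(2000):
    # 1. Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # 2. Backward pass: propagate error gradients toward the input.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # 3. Update weights by gradient descent.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

Each loop iteration is one epoch of steps 1-3; over the epochs the mean squared error on the training set decreases.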
Advantages of Backpropagation
Disadvantages
Applications:
Medical Diagnosis:
● Predicting Diseases from Symptoms (e.g., COVID-19 detection from lung scans).
Fraud Detection in Banking:
Speech Recognition:
Image Classification:
…………………………………………………………………………………………………
Support Vector Machine (SVM)
"SVM finds the optimal boundary that maximizes the margin between two classes,
ensuring better generalization on unseen data."
📌 Example:
● In 2D space, a hyperplane is a straight line.
● In 3D space, a hyperplane is a plane.
● In higher dimensions, a hyperplane is a flat (n − 1)-dimensional decision boundary.
● In a 2D plot of two separable classes, the vertical line in the middle would be the hyperplane separating them.
Among many possible hyperplanes, SVM finds the one that maximizes the margin
(distance between the closest points of both classes).
📌 Example:
● If two groups of students are separated by a line based on height and weight, SVM
finds the line that best separates them with the maximum margin.
Larger margin → Better generalization to new data.
Smaller margin → Higher risk of misclassification.
Sometimes, data cannot be separated by a straight line. In such cases, SVM uses the Kernel
Trick to transform the data into higher dimensions where it becomes linearly separable.
📌 Example:
● Imagine a dataset where points of Class A are inside a circle, and Class B points are
outside.
● In 2D space, this is not separable by a straight line.
● SVM applies a Kernel function to project the data into 3D space, where a hyperplane
can separate the classes.
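The circle example can be made concrete with an explicit feature map. This is not a full SVM implementation, just a sketch of the lifting idea behind the Kernel Trick; the radii and the lifted coordinate x² + y² are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ring(n, r_lo, r_hi):
    """Sample n points uniformly in an annulus of radii [r_lo, r_hi)."""
    r = rng.uniform(r_lo, r_hi, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Class A: inside radius 1. Class B: in a ring of radius 2-3.
A = ring(50, 0.0, 1.0)
B = ring(50, 2.0, 3.0)

def lift(points):
    """Feature map: lift (x, y) to (x, y, x^2 + y^2) in 3D."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])

# No straight line separates A and B in 2D, but in the lifted space
# the horizontal plane z = 2.5 separates them perfectly.
sep = 2.5
A3, B3 = lift(A), lift(B)
```

The third coordinate is just the squared distance from the origin, so "inside the circle" versus "outside the circle" becomes "below the plane" versus "above the plane".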
Disadvantages of SVM
❌ Slow for Large Datasets – Training time increases as data size grows.
❌ Difficult to Interpret – Unlike Decision Trees, SVM results are not easily explainable.
❌ Choosing the Right Kernel Can Be Tricky – Requires tuning and testing different kernels.
Medical Diagnosis:
Handwriting Recognition:
Bioinformatics:
"A model is only useful if it makes accurate predictions on new, unseen data."
Example:
If a spam filter correctly classifies 90 out of 100 emails, the accuracy is:
Accuracy = Correct Predictions / Total Predictions = 90 / 100 = 90%
❌ Not useful when one class dominates (e.g., detecting rare diseases).
2. Confusion Matrix (Detailed Performance Analysis)
A confusion matrix provides a detailed breakdown of how well a model classifies different
categories.
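The matrix and the metrics derived from it can be computed by hand for a binary problem. The spam/ham labels below are an illustrative toy example.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Return (TP, FP, FN, TN) for the given positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham",  "ham", "ham", "spam", "ham", "ham"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Here the model misses one spam email (a false negative) and wrongly flags one ham email (a false positive), which the plain accuracy number alone would not reveal.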
If a model performs poorly, we can enhance its accuracy using these techniques:
1. Feature Selection
2. Hyperparameter Tuning
● Optimize model settings (e.g., Decision Tree depth, SVM kernel type).
F1-Score: Harmonic mean of Precision & Recall; used when both precision and recall are
important.
……………………………………………………………………………………………..
Clustering