Unit 4
Both classification and clustering are data mining techniques used to group data.
However, they differ in their approach: classification is supervised (it learns from data that is already labelled with classes), while clustering is unsupervised (it groups unlabelled data by similarity).
For example, a loan-approval classifier might be trained on applicant features such as:
○ Income
○ Credit Score
○ Loan Repayment History
Once trained, the model can classify new applicants as either "Approved" or "Rejected."
Steps in Classification
To understand different classification techniques better, let's first compare them based on
learning type, working principle, advantages, disadvantages, and common applications.
……………………………………………………………………………………………..
Decision Tree Classification
Example:
Imagine you're trying to decide whether to play tennis based on the weather. The decision
depends on factors like Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool),
Humidity (High, Normal), and Wind (Weak, Strong).
1. Select the Best Attribute: Choose the attribute that best separates the data into
distinct classes. This is often done using metrics like Information Gain or Gini Index.
2. Create a Node: Make a decision node that splits on the best attribute.
3. Split the Dataset: Divide the dataset into subsets where each subset contains data
with the same value for the chosen attribute.
4. Repeat: For each subset, repeat the process using the remaining attributes. Continue
until one of the stopping conditions is met (e.g., all instances in a subset belong to the
same class, no more attributes to split on).
The resulting decision tree can be read as follows:
● If the outlook is overcast, play tennis.
● If it's sunny, check the humidity:
○ High humidity: don't play.
○ Normal humidity: play.
● If it's rainy, check the wind:
○ Weak wind: play.
○ Strong wind: don't play.
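Step 1 above (selecting the best attribute) can be sketched by computing Information Gain on a toy slice of the weather data. This is a minimal sketch: the six rows, their labels, and the function names are illustrative, not taken from the text.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on the attribute at attr_index."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g)
                   for g in groups.values())
    return base - weighted

# Toy weather rows: (Outlook, Wind) -> Play tennis?
rows = [("Sunny", "Weak"), ("Sunny", "Strong"), ("Overcast", "Weak"),
        ("Rain", "Weak"), ("Rain", "Strong"), ("Overcast", "Strong")]
labels = ["No", "No", "Yes", "Yes", "No", "Yes"]

gain_outlook = information_gain(rows, labels, 0)
gain_wind = information_gain(rows, labels, 1)
```

On this toy data, Outlook has the higher gain, so it would be chosen as the root split, matching the tree described above.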
Advantages:
Disadvantages:
Naive Bayes Classification
Example:
Suppose you're classifying emails as 'Spam' or 'Not Spam' based on the presence of certain
words.
Steps:
1. Calculate Prior Probabilities: Determine the prior probability of each class (e.g., the
proportion of emails that are spam and not spam).
2. Calculate Likelihoods: For each word, calculate the likelihood of it appearing in
spam and not spam emails.
3. Apply Bayes' Theorem: For a new email, calculate the posterior probability for each
class using the prior probabilities and likelihoods.
4. Classify: Assign the class with the higher posterior probability to the email.
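The four steps above can be sketched as a tiny Naive Bayes classifier. The four-email corpus and the use of Laplace (add-one) smoothing are assumptions added for this sketch.

```python
import math
from collections import Counter, defaultdict

# Illustrative labelled corpus.
emails = [("win money now", "spam"), ("limited offer win", "spam"),
          ("meeting schedule today", "ham"), ("project meeting now", "ham")]

# Step 1: prior counts per class; Step 2: per-class word counts.
priors = Counter(label for _, label in emails)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in emails:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def posterior_scores(text):
    """Step 3: unnormalised log-posteriors with add-one smoothing."""
    scores = {}
    for label in priors:
        total = sum(word_counts[label].values())
        score = math.log(priors[label] / len(emails))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return scores

def classify(text):
    """Step 4: pick the class with the higher posterior."""
    scores = posterior_scores(text)
    return max(scores, key=scores.get)
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps unseen words from zeroing out a class.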
Lazy-Learner
Lazy learners are machine learning models that do not build a model during training.
Instead, they store the training data and classify new instances only when needed, by
comparing them with the stored data.
"Lazy learners delay processing until a new data point needs to be classified, making
them simple but computationally expensive."
Example: Classifying a new fruit based on its features (e.g., weight, color) by looking at the 'k' most
similar fruits in the dataset.
Need:
Suppose there are two categories, Category A and Category B, and a new data point x1
arrives. To decide which of these categories the point belongs to, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a new
data point.
Steps:
The working of K-NN can be explained with the following steps. Suppose we have a new
data point that we need to assign to one of the two categories:
○ First, choose the number of neighbors; here we choose k = 5.
○ Next, calculate the Euclidean distance between the new point and each existing data
point. The Euclidean distance between two points (x1, y1) and (x2, y2), familiar from
geometry, is:
d = √((x2 − x1)² + (y2 − y1)²)
○ By calculating the Euclidean distances, we find the 5 nearest neighbors: three in
Category A and two in Category B.
○ Since the majority (3 of the 5) of the nearest neighbors belong to Category A, the new
data point is assigned to Category A.
…………………………………………………………………………………………………
Example: Classifying Fruits Based on Features 🍎🍊
Suppose we classify a new fruit based on:
✔ Weight
✔ Color
● Let’s set k = 3.
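The fruit example can be sketched as follows. The training fruits, the made-up 0-1 colour score, and the use of raw (unscaled) features are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative labelled fruits: (weight_g, colour_score) -> label.
training = [((150, 0.90), "apple"), ((170, 0.85), "apple"),
            ((140, 0.95), "apple"), ((130, 0.20), "orange"),
            ((120, 0.25), "orange"), ((135, 0.15), "orange")]

def knn_classify(point, k=3):
    """Majority vote among the k nearest fruits by Euclidean distance."""
    nearest = sorted(training,
                     key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that with raw features the weight (in grams) dominates the distance; in practice features are usually scaled to comparable ranges before applying k-NN.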
Disadvantages of k-NN
Rule-Based Classification
Rule-based classification is a method where IF-THEN rules are used to classify data. Instead
of using a mathematical model like Decision Trees or SVM, this approach directly defines
rules for classification.
Example:
A bank wants to classify loan applicants as Low Risk or High Risk based on
income and credit score.
✔ Rule 1: IF income > 50,000 AND credit score > 700 THEN Low Risk.
✔ Rule 2: IF income < 30,000 AND credit score < 600 THEN High Risk.
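Rules 1 and 2 can be sketched as a first-match rule list in Python; the fallback label for applicants who match neither rule is an assumption added for the sketch.

```python
def classify_applicant(income, credit_score):
    """Apply IF-THEN rules in order; first matching rule wins."""
    rules = [
        # Rule 1: IF income > 50,000 AND credit score > 700 THEN Low Risk.
        (lambda i, c: i > 50_000 and c > 700, "Low Risk"),
        # Rule 2: IF income < 30,000 AND credit score < 600 THEN High Risk.
        (lambda i, c: i < 30_000 and c < 600, "High Risk"),
    ]
    for condition, label in rules:
        if condition(income, credit_score):
            return label
    # Assumed fallback for applicants matched by no rule.
    return "Needs Review"
```

Checking rules in a fixed order is one simple way to handle conflicts: when several rules could fire, the earliest one in the list decides.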
3. Handle Conflicts:
Student | Math | English | Grade
A | 85 | 90 | Excellent
B | 40 | 60 | Average
C | 75 | 70 | Good
D | 20 | 30 | Poor
Generated Rules:
● New student: Math = 78, English = 72
● Matches Rule 3 → Classified as "Good"
…………………………………………………………………………………………………
Classification by Backpropagation (Neural Networks)
Introduction to Backpropagation
1. Input Layer
2. Hidden Layers
3. Output Layer
● Produces final classification results (e.g., Spam/Not Spam, Digit 0-9, Disease/No
Disease).
Example:
For classifying handwritten digits (0-9), a neural network processes image pixels and
outputs a probability for each digit. The highest probability determines the final
classification.
● Steps 1-3 are repeated multiple times (Epochs) until the model achieves high
classification accuracy.
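The forward-backward-update cycle repeated over epochs can be sketched in NumPy for a tiny network. The XOR dataset, the layer sizes, the learning rate, and the epoch count are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR problem: 2 inputs -> 1 binary output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 units (sizes are illustrative).
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr, losses = 1.0, []
for epoch in range(2000):
    # 1. Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # 2. Backward pass: propagate error gradients toward the input.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # 3. Update weights by gradient descent.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

Each loop iteration is one epoch of steps 1-3; over the epochs the mean squared error on the training set decreases.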
Advantages of Backpropagation
Disadvantages
Applications:
Medical Diagnosis:
● Predicting Diseases from Symptoms (e.g., COVID-19 detection from lung scans).
Fraud Detection in Banking:
Speech Recognition:
Image Classification:
…………………………………………………………………………………………………
Support Vector Machine (SVM)
"SVM finds the optimal boundary that maximizes the margin between two classes,
ensuring better generalization on unseen data."
📌 Example:
● In 2D space, a hyperplane is a straight line.
● In 3D space, a hyperplane is a plane.
● In higher dimensions, a hyperplane is a flat (n − 1)-dimensional decision boundary.
● In a 2D plot of two separable classes, the vertical line in the middle would be the hyperplane separating them.
Among many possible hyperplanes, SVM finds the one that maximizes the margin
(distance between the closest points of both classes).
📌 Example:
● If two groups of students are separated by a line based on height and weight, SVM
finds the line that best separates them with the maximum margin.
Larger margin → Better generalization to new data.
Smaller margin → Higher risk of misclassification.
Sometimes, data cannot be separated by a straight line. In such cases, SVM uses the Kernel
Trick to transform the data into higher dimensions where it becomes linearly separable.
📌 Example:
● Imagine a dataset where points of Class A are inside a circle, and Class B points are
outside.
● In 2D space, this is not separable by a straight line.
● SVM applies a Kernel function to project the data into 3D space, where a hyperplane
can separate the classes.
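The circle example can be made concrete with an explicit feature map. This is not a full SVM implementation, just a sketch of the lifting idea behind the Kernel Trick; the radii and the lifted coordinate x² + y² are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ring(n, r_lo, r_hi):
    """Sample n points uniformly in an annulus of radii [r_lo, r_hi)."""
    r = rng.uniform(r_lo, r_hi, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Class A: inside radius 1. Class B: in a ring of radius 2-3.
A = ring(50, 0.0, 1.0)
B = ring(50, 2.0, 3.0)

def lift(points):
    """Feature map: lift (x, y) to (x, y, x^2 + y^2) in 3D."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])

# No straight line separates A and B in 2D, but in the lifted space
# the horizontal plane z = 2.5 separates them perfectly.
sep = 2.5
A3, B3 = lift(A), lift(B)
```

The third coordinate is just the squared distance from the origin, so "inside the circle" versus "outside the circle" becomes "below the plane" versus "above the plane".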
Disadvantages of SVM
❌ Slow for Large Datasets – Training time increases as data size grows.
❌ Difficult to Interpret – Unlike Decision Trees, SVM results are not easily explainable.
❌ Choosing the Right Kernel Can Be Tricky – Requires tuning and testing different kernels.
Medical Diagnosis:
Handwriting Recognition:
Bioinformatics:
"A model is only useful if it makes accurate predictions on new, unseen data."
Example:
If a spam filter correctly classifies 90 out of 100 emails, the accuracy is:
Accuracy = Correct Predictions / Total Predictions = 90 / 100 = 90%
❌ Not useful when one class dominates (e.g., detecting rare diseases).
2. Confusion Matrix (Detailed Performance Analysis)
A confusion matrix provides a detailed breakdown of how well a model classifies different
categories.
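The matrix and the metrics derived from it can be computed by hand for a binary problem. The spam/ham labels below are an illustrative toy example.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Return (TP, FP, FN, TN) for the given positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham",  "ham", "ham", "spam", "ham", "ham"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Here the model misses one spam email (a false negative) and wrongly flags one ham email (a false positive), which the plain accuracy number alone would not reveal.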
If a model performs poorly, we can enhance its accuracy using these techniques:
1. Feature Selection
2. Hyperparameter Tuning
● Optimize model settings (e.g., Decision Tree depth, SVM kernel type).
F1-Score: Harmonic mean of Precision & Recall; used when both precision and recall are
important.
……………………………………………………………………………………………..
Clustering