Classification Notes

Classification is a data mining technique used to categorize data objects into predefined classes based on their attributes, essential for supervised machine learning. It involves a two-step process of model learning and classification, with applications in areas like spam detection, medical diagnosis, and image recognition. Key algorithms include Decision Trees, Bayesian classifiers, and K-nearest neighbors, while challenges include data cleaning and relevance.


Unit-4

Classification
Classification is a technique in data mining that categorizes data objects into predefined classes, categories, or groups based on their features or attributes. Put another way, data classification is a process in which data is organized or labeled into predefined categories or classes based on its characteristics. The goal of classification is to assign new, unseen data instances to the correct predefined categories. This process is a fundamental task in supervised machine learning and data mining.

General Approach to Classification:


How does classification work?

Data classification is a two-step process, consisting of a learning step (where a classification model is
constructed) and a classification step (where the model is used to predict class labels for given data).

 In the first step (learning), we build a classification model from previous data (training or sample data).
 In the second step (classification), we determine whether the model's accuracy is acceptable and, if so, use the model to classify new data (see the sketch below).
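
A minimal sketch of this two-step workflow, assuming scikit-learn is available (the dataset is synthetic and purely illustrative):

    # Classification workflow sketch: learn a model, check it, classify new data
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Synthetic labeled data standing in for "previous data"
    X, y = make_classification(n_samples=200, n_features=4, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    # Step 1 (learning): build a classification model from the training data
    model = DecisionTreeClassifier(max_depth=3, random_state=42)
    model.fit(X_train, y_train)

    # Step 2 (classification): check that accuracy is acceptable on held-out
    # data, then use the model to classify new, unseen instances
    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
    print("Predicted class:", model.predict(X_test[:1]))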

Applications:

Data classification has numerous applications across various domains due to its ability to automatically
categorize data into predefined classes. Here are some key applications:

1. Spam Detection: Identifying and filtering out spam emails from legitimate emails. Example: Email
services like Gmail and Outlook use classification algorithms to move spam emails to a spam folder.

2. Medical Diagnosis: Predicting diseases or health conditions based on patient data. Example:
Classifying patients as diabetic or non-diabetic based on their medical history and test results.

3. Credit Scoring: Assessing the creditworthiness of individuals or businesses. Example: Banks and
financial institutions classify loan applicants into risk categories (e.g., low risk, high risk) based on their
financial history and credit score.

4. Image Recognition: Identifying objects, people, or scenes in images. Example: Facial recognition
systems classify images of faces to identify individuals for security purposes.

5. Sentiment Analysis: Determining the sentiment expressed in text data (e.g., positive, negative,
neutral). Example: Companies analyze customer reviews to classify them as positive or negative
feedback.

6. Document Classification: Categorizing documents into predefined categories. Example: News articles can be classified into topics such as sports, politics, or entertainment.
7. Customer Segmentation: Grouping customers based on their behavior or characteristics for targeted
marketing. Example: Retailers classify customers into segments (e.g., frequent buyers, occasional
shoppers) to personalize marketing campaigns.

8. Fraud Detection: Identifying fraudulent activities in transactions. Example: Credit card companies
classify transactions as fraudulent or legitimate based on transaction patterns.

9. Voice Recognition: Identifying spoken words or phrases. Example: Virtual assistants like Siri and
Alexa classify audio input to recognize commands and provide appropriate responses.

10. Behavioral Targeting: Delivering personalized advertisements based on user behavior. Example:
Online advertising platforms classify users based on their browsing history to show relevant ads.

Prediction
Prediction, in the context of data science and machine learning, refers to the process of using a trained
model to make forecasts or estimations about future or unseen data. The goal of prediction is to generate
accurate and meaningful insights by using patterns learned from historical or labeled data.

Classification and Regression are the two major types of prediction problems, where classification is used
to predict discrete or nominal values, while regression is used to predict continuous or ordered values.
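
The same contrast in miniature, assuming scikit-learn and NumPy (the tiny arrays are made up): a classifier returns a discrete label, while a regressor returns a continuous value.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one feature, four instances
    y_class = np.array([0, 0, 1, 1])             # discrete labels -> classification
    y_value = np.array([1.5, 3.1, 4.4, 6.2])     # continuous target -> regression

    print(LogisticRegression().fit(X, y_class).predict([[2.5]]))  # a class label
    print(LinearRegression().fit(X, y_value).predict([[2.5]]))    # a continuous value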

Issues Regarding Classification


Classification tasks can face several issues related to data preparation, including data cleaning, relevance,
and data transformation.

1. Data Cleaning: Data cleaning involves identifying and correcting (or removing) inaccuracies and
inconsistencies in the data to improve its quality.
 Missing Values: Missing data points can lead to biased or incorrect models.
 Noisy Data: Data with errors, outliers, or irrelevant information can distort model predictions.
 Duplicate Records: Multiple entries of the same data can skew results.

2. Relevance: Relevance involves ensuring that the features (variables) used in the classification task are
important and useful for making accurate predictions.
 Irrelevant Features: Features that do not contribute to the prediction can add noise and reduce model performance.
 Redundant Features: Highly correlated features can provide duplicate information, making the model more complex than necessary.

3. Data Transformation: Data transformation involves converting data into a suitable format or structure
for analysis, which can include scaling, encoding, or creating new features.
 Scaling Issues: Features with different scales can disproportionately affect the model.
 Categorical Data: Many algorithms require numerical input, so categorical data must be transformed appropriately.
 Non-linear Relationships: Some relationships between features and the target variable may be non-linear and need transformation (see the sketch below).
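
A brief sketch of two common transformations, assuming scikit-learn (the values are illustrative): scaling numeric features that sit on very different ranges, and one-hot encoding a categorical feature into numerical columns.

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Scaling: age (tens) and income (tens of thousands) differ wildly in scale
    X_numeric = np.array([[25, 30000], [40, 85000], [58, 120000]], dtype=float)
    X_scaled = StandardScaler().fit_transform(X_numeric)  # zero mean, unit variance

    # Encoding: categorical values become indicator (0/1) columns
    X_categorical = np.array([["red"], ["green"], ["red"]])
    X_encoded = OneHotEncoder().fit_transform(X_categorical).toarray()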
Algorithms

1. Decision Tree Induction
  ID3 (Iterative Dichotomiser 3)
  C4.5
  CART (Classification and Regression Trees)
  Random Forest (an ensemble of decision trees)
2. Bayes Classification Methods
  Bayes' Theorem
  Naive Bayesian Classification
3. Rule-Based Classification
4. Lazy Learning (Learn from your neighbors)
  K-nearest neighbors
Decision Tree Induction
Decision tree induction is a popular and powerful method for classification and regression tasks in
machine learning. It involves creating a model that predicts the value of a target variable by learning
simple decision rules inferred from the data features.

Steps in Decision Tree Induction:

1. Select the Best Attribute: Choose the feature that best splits the data according to a specific criterion, such as information gain for ID3 or Gini impurity for CART (both criteria are sketched in code after this list).
2. Create a Decision Node: Create a node in the tree that represents the selected attribute.
3. Split the Data: Divide the dataset into subsets based on the selected attribute's values.
4. Repeat: Recursively apply the above steps to each subset.
5. Stopping Criteria: The recursion stops when one of the following conditions is met:
o All instances in a subset belong to the same class.
o No further attributes are left to split the data.
o The tree reaches a predefined maximum depth or a minimum number of instances per
node.
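
As referenced in step 1, here is a minimal sketch of the two splitting criteria, written in plain Python with NumPy (the toy labels are illustrative):

    import numpy as np
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a label list (the criterion behind information gain / ID3)
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gini(labels):
        # Gini impurity of a label list (the criterion used by CART)
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def information_gain(parent, subsets):
        # Entropy of the parent minus the weighted entropy of its subsets
        n = len(parent)
        weighted = sum(len(s) / n * entropy(s) for s in subsets)
        return entropy(parent) - weighted

    # A perfect split of ["yes","yes","no","no"] yields the maximum gain of 1 bit
    print(information_gain(["yes", "yes", "no", "no"],
                           [["yes", "yes"], ["no", "no"]]))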
Advantages:

 Interpretability: Decision trees are easy to understand and interpret.
 Non-Parametric: They make no assumptions about the distribution of the data.
 Feature Importance: They provide insight into feature importance based on the splits.

Disadvantages:

 Overfitting: Decision trees can easily overfit the training data, especially when they are deep.
 Instability: Small changes in the data can lead to significant changes in the tree structure.
 Bias towards Features with More Levels: Features with many levels can dominate splits, leading to biased models.

ID3 (Iterative Dichotomiser 3)


 Uses information gain to select the best attribute for splitting.
 Constructs a tree by recursively partitioning the data.
 Stops when all instances in a node belong to the same class or when no more attributes are available.

C4.5

 Uses gain ratio (an extension of information gain) to handle attributes with many values.
 Handles both categorical and continuous attributes.
 Prunes the tree after creation to remove branches that do not improve classification.
 Can handle missing values in the data.
CART (Classification and Regression Trees)

 Uses Gini impurity (or entropy) to split the data for classification tasks.
 Uses variance reduction to split the data for regression tasks.
 Constructs binary trees, where each node has exactly two children.
 Prunes the tree using cost-complexity pruning to avoid overfitting.

Random Forest (An Ensemble of Decision Trees)

 Uses bootstrap aggregating (bagging) to create multiple subsets of the training data.
 Each tree is trained on a different subset, and a random subset of features is considered at each split.
 Reduces overfitting by averaging the results of many trees.
 Provides a measure of feature importance based on how much each feature improves the split criterion (see the sketch below).
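
A usage sketch, assuming scikit-learn's RandomForestClassifier on synthetic data: bagging and per-split feature sampling happen inside the estimator, and the averaged impurity decrease is exposed as feature importances.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)

    # 100 trees, each trained on a bootstrap sample; at every split only a
    # random subset of the features (sqrt of the total) is considered
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    random_state=0)
    forest.fit(X, y)

    # Built-in feature-importance measure, averaged over all trees
    print(forest.feature_importances_)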

Bayes Classification Methods


“What are Bayesian classifiers?”

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class.
Bayesian classification is based on Bayes’ theorem, described next. Studies comparing classification
algorithms have found a simple Bayesian classifier known as the naive Bayesian classifier to be
comparable in performance with decision tree and selected neural network classifiers. Bayesian classifiers
have also exhibited high accuracy and speed when applied to large databases. Naive Bayesian classifiers
assume that the effect of an attribute value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence. It is made to simplify the
computations involved and, in this sense, is considered “naive”.


Bayes Theorem
Bayes' theorem is one of the most widely used concepts in machine learning. It gives the probability of one event, under uncertain knowledge, when a related event is already known to have occurred.

Bayes' theorem can be derived from the product rule and the conditional probability of events X and Y:

o According to the product rule, the probability of X and Y occurring together can be expressed as:

P(X ∩ Y) = P(X|Y) P(Y) {equation 1}

o Equivalently, conditioning on X instead:

P(X ∩ Y) = P(Y|X) P(X) {equation 2}

Equating the right-hand sides of equations 1 and 2 and dividing by P(Y) gives:

P(X|Y) = P(Y|X) P(X) / P(Y)

This equation is known as Bayes' rule or Bayes' theorem. It holds for any events X and Y with P(Y) > 0; the events need not be independent.

o P(X|Y) is called the posterior, which is what we need to calculate; it is the updated probability of the hypothesis after considering the evidence.
o P(Y|X) is called the likelihood; it is the probability of the evidence when the hypothesis is true.
o P(X) is called the prior probability; it is the probability of the hypothesis before considering the evidence.
o P(Y) is called the marginal probability; it is the probability of the evidence under any consideration.

Hence, Bayes Theorem can be written as: posterior = likelihood * prior / evidence
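
A worked instance of the formula in plain Python; the prevalence and test rates below are hypothetical, chosen only for illustration:

    # Hypothetical numbers: a disease with 1% prevalence, a test with 95%
    # sensitivity and a 10% false-positive rate
    prior = 0.01                  # P(X): hypothesis before the evidence
    likelihood = 0.95             # P(Y|X): evidence given the hypothesis
    false_positive_rate = 0.10    # P(Y|not X)

    # Marginal P(Y): total probability of a positive test
    evidence = likelihood * prior + false_positive_rate * (1 - prior)

    # Bayes' theorem: posterior = likelihood * prior / evidence
    posterior = likelihood * prior / evidence
    print(posterior)              # ~0.088: still unlikely after one positive test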
Naive Bayesian Classification

The naive Bayesian classifier applies Bayes' theorem under the class conditional independence assumption described above: for each class, every attribute's effect is treated as independent of the others, and a tuple is assigned the class with the highest posterior probability.
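
A minimal sketch, assuming scikit-learn's GaussianNB on synthetic data: each feature's class-conditional likelihood is modeled independently (the naive assumption), and predict_proba returns class membership probabilities.

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=200, n_features=4, random_state=1)

    # Fit per-class, per-feature Gaussians under class conditional independence
    nb = GaussianNB()
    nb.fit(X, y)

    # Class membership probabilities for one tuple, then the predicted class
    print(nb.predict_proba(X[:1]))
    print(nb.predict(X[:1]))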
Lazy Learning (Learn from your neighbors)

K-Nearest Neighbors

K-nearest neighbors (K-NN) is a lazy learner: it simply stores the training data and defers all computation to classification time. A new instance is assigned the majority class among its k closest training instances.

∴ In the worked example, the neighbors at ranks 1 and 2 are classified as 1 and the neighbor at rank 3 as 0; by majority vote, the new instance (BMI = 43.6, Age = 40) is classified as Sugar = 1.
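
The same idea can be sketched with scikit-learn's KNeighborsClassifier; the training rows below are hypothetical stand-ins for the source's BMI/Age/Sugar table, arranged so that the query instance (BMI = 43.6, Age = 40) receives label 1 by majority vote among its three nearest neighbors:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical training data: columns are (BMI, Age), labels are Sugar
    X = np.array([[33.6, 50], [26.6, 30], [23.4, 40],
                  [43.1, 67], [35.3, 23], [35.9, 67]])
    y = np.array([1, 0, 0, 0, 1, 1])

    # "Lazy" learner: fit() essentially stores the data; the distance
    # computations happen at classification time
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)

    # Majority vote among the 3 nearest neighbors of the new instance
    print(knn.predict([[43.6, 40]]))   # -> [1]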

Problems
1. Decision Tree Induction (ID3 and CART)
2. Naive Bayesian Classification
3. K-NN

Confusion Matrix in Classification and Prediction:

A classification matrix, also known as a confusion matrix, is a table used to evaluate the performance of a
classification algorithm. It compares the actual labels with the predicted labels generated by the model.

Structure of the Confusion Matrix

A confusion matrix for a binary classification problem typically looks like this:

                         Predicted Positive       Predicted Negative
    Actual Positive      True Positive (TP)       False Negative (FN)
    Actual Negative      False Positive (FP)      True Negative (TN)
Key Metrics Derived from the Confusion Matrix

1. Accuracy: The proportion of correctly classified instances (both true positives and true negatives) among the total instances: Accuracy = (TP + TN) / (TP + TN + FP + FN).

2. Precision (Positive Predictive Value): The proportion of true positive predictions among all positive predictions: Precision = TP / (TP + FP).

3. Recall (Sensitivity, True Positive Rate): The proportion of true positive predictions among all actual positive instances: Recall = TP / (TP + FN).

4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics: F1 = 2 × (Precision × Recall) / (Precision + Recall).

5. Specificity (True Negative Rate): The proportion of true negative predictions among all actual negative instances: Specificity = TN / (TN + FP).

6. False Positive Rate (FPR): The proportion of false positive predictions among all actual negative instances: FPR = FP / (FP + TN) = 1 − Specificity.

7. False Negative Rate (FNR): The proportion of false negative predictions among all actual positive instances: FNR = FN / (FN + TP) = 1 − Recall.

Example

Consider a binary classification problem where a model is used to predict whether an email is spam (positive class) or not spam (negative class).
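
As a sketch with purely hypothetical counts (illustrative only), the four cells of such a matrix and the metrics above can be computed directly:

    # Hypothetical confusion-matrix counts for the spam example
    TP, FN = 40, 10    # actual spam: caught vs. missed
    FP, TN = 5, 45     # actual non-spam: wrongly flagged vs. correctly passed

    total = TP + TN + FP + FN
    accuracy    = (TP + TN) / total                               # 0.85
    precision   = TP / (TP + FP)                                  # ~0.889
    recall      = TP / (TP + FN)                                  # 0.80
    f1          = 2 * precision * recall / (precision + recall)   # ~0.842
    specificity = TN / (TN + FP)                                  # 0.90
    fpr         = FP / (FP + TN)                                  # 0.10
    fnr         = FN / (FN + TP)                                  # 0.20
    print(accuracy, precision, recall, f1, specificity, fpr, fnr)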
Difference between prediction and classification

 Classification is the process of identifying which category a new observation belongs to, based on a training data set containing observations whose category membership is known. Prediction is the process of identifying missing or unavailable numerical values for a new observation.

 In classification, the accuracy depends on finding the class label correctly. In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data.

 In classification, the model is known as the classifier. In prediction, the model is known as the predictor.

 A classifier is constructed to find categorical labels. A predictor is constructed to estimate a continuous-valued function or ordered value.

 For example, grouping patients based on their medical records is classification; predicting the correct treatment for a particular disease for a person is prediction.

Difference between classification and clustering

 Classification is supervised: it assigns new observations to predefined classes learned from labeled training data. Clustering is unsupervised: it groups observations by similarity, with no predefined classes or labels.

 In classification, the number and meaning of the classes are known in advance; in clustering, the groups are discovered from the data itself.