
UNIT III

Classification:
 There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends. These two forms are as follows −
1. Classification
2. Prediction
 Classification models predict categorical class labels, while prediction models predict continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky,
 or a prediction model to predict the expenditures in dollars of potential customers on computer equipment, given their income and occupation.
Classification
 Classification is a form of data analysis that extracts models describing important data classes.
 Such models, called classifiers, predict categorical (discrete, unordered) class labels.
 For example, we can build a classification model to categorize bank loan applications as either
safe or risky. Such analysis can help provide us with a better understanding of the data at large.
 Classification:
 predicts categorical class labels
 classifies data based on the training set and the values of a class-label attribute, and uses the resulting model to classify new data
General Approach to Classification
 Data classification is a two-step process, consisting of a
1. learning step (where a classification model is constructed) and
2. a classification step (where the model is used to predict class labels for given data).
General Approach for Building Classification Model
The figure below illustrates the two steps: a learning algorithm is applied to the training set to induce a model (induction), and the model is then applied to the test set to deduce class labels (deduction).

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Training Set → Learning algorithm (Induction) → Model

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Test Set → Apply Model (Deduction) → predicted class labels

Introduction to Data Mining, 2nd Edition
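The induction/deduction loop above can be sketched in Python. OneR, a simple single-attribute rule learner, stands in for the learning algorithm here — the figure does not name a specific algorithm, so this is only an illustrative choice, and Attrib3 is left out to keep all values categorical:

```python
from collections import Counter, defaultdict

def one_r(rows, attrs, label="Class"):
    """OneR: pick the single attribute whose per-value majority rule
    makes the fewest mistakes on the training set."""
    best = None
    for a in attrs:
        groups = defaultdict(Counter)
        for r in rows:
            groups[r[a]][r[label]] += 1
        # majority class for each attribute value
        rule = {v: c.most_common(1)[0][0] for v, c in groups.items()}
        errors = sum(1 for r in rows if rule[r[a]] != r[label])
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best[0], best[1]

# Training set from the figure (Attrib3 omitted for simplicity)
train = [
    {"Attrib1": "Yes", "Attrib2": "Large",  "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "No"},
    {"Attrib1": "Yes", "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Large",  "Class": "Yes"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "Yes", "Attrib2": "Large",  "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "Yes"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "Yes"},
]

# Induction: learn the model
attr, rule = one_r(train, ["Attrib1", "Attrib2"])
print("induced model:", attr, rule)

# Deduction: apply the model to unlabeled test tuples
test = [{"Attrib1": "No", "Attrib2": "Small"},
        {"Attrib1": "Yes", "Attrib2": "Medium"}]
print([rule[r[attr]] for r in test])
```

On this data OneR selects Attrib2, predicting Yes for Small and No otherwise.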
How Does Classification Work?
 The Data Classification process includes two steps −
1. Building the Classifier or Model
2. Using Classifier for Classification

 Building the Classifier or Model


 This step is the learning step or the learning phase.
 In this step the classification algorithms build the classifier.
 The classifier is built from the training set made up of database tuples and their associated class
labels.
 Each tuple in the training set belongs to a predefined class, as determined by the class-label attribute. Tuples can also be referred to as samples, objects, or data points.
 Using Classifier for Classification
 In this step, the classifier is used for classification. Here the test data is used to estimate the
accuracy of classification rules. The classification rules can be applied to the new data tuples if the
accuracy is considered acceptable.
An example: model construction

Training Data:

NAME   RANK            YEARS  TENURED
Mary   Assistant Prof  3      no
James  Assistant Prof  7      yes
Bill   Professor       2      yes
John   Associate Prof  7      yes
Mark   Assistant Prof  6      no
Annie  Associate Prof  3      no

The classification algorithm learns a classifier (model) from the training data, here the rule:

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

An example: model usage

Testing Data:

NAME  RANK            YEARS  TENURED
Tom   Assistant Prof  2      no
Lisa  Associate Prof  7      no
Jack  Professor       5      yes
Ann   Assistant Prof  7      yes

The testing data is used to estimate the classifier's accuracy. If the accuracy is acceptable, the classifier is applied to unseen data, e.g. (Jeff, Professor, 4): Tenured? → Yes.

Data mining: Classification
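The learned rule from this example can be written directly as a Python function and applied both to the unseen tuple (Jeff, Professor, 4) and to the testing data to estimate accuracy:

```python
def tenured(rank, years):
    # Learned rule from the training data:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank == "Professor" or years > 6 else "no"

# Model usage on the unseen tuple (Jeff, Professor, 4)
print(tenured("Professor", 4))  # -> yes

# Accuracy estimate on the testing data
testing = [("Tom",  "Assistant Prof", 2, "no"),
           ("Lisa", "Associate Prof", 7, "no"),
           ("Jack", "Professor",      5, "yes"),
           ("Ann",  "Assistant Prof", 7, "yes")]
correct = sum(tenured(rank, years) == actual
              for _, rank, years, actual in testing)
print(f"accuracy = {correct}/{len(testing)}")  # 3/4: Lisa is misclassified
```

The misclassified tuple (Lisa) illustrates why the testing step matters: the rule is not perfect, and the accuracy estimate tells us whether it is acceptable before applying it to new data.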
Classification and Prediction Issues
 The major issue is preparing the data for classification and prediction. Preparing the data involves the following activities −
 Data Cleaning − Data cleaning involves removing noise and treating missing values. Noise is removed by applying smoothing techniques, and missing values are handled by replacing a missing value with the most commonly occurring value for that attribute.
 Relevance Analysis − The database may also contain irrelevant attributes. Correlation analysis is used to determine whether any two given attributes are related.
 Data Transformation and Reduction − The data can be transformed by any of the following methods:
 Normalization − The data is transformed using normalization. Normalization involves scaling all values of a given attribute so that they fall within a small specified range. Normalization is useful when the learning step uses neural networks or other methods involving distance measurements.
 Generalization − The data can also be transformed by generalizing it to a higher-level concept. For this purpose we can use concept hierarchies.
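Min-max normalization, one common way of scaling an attribute into a small specified range, can be sketched as follows (the income values are made up for illustration):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so they fall in [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo)
            for v in values]

# Hypothetical income attribute, rescaled to [0, 1]
incomes = [12000, 35000, 58000, 98000]
normalized = min_max_normalize(incomes)
print(normalized)
```

After normalization the smallest value maps to 0 and the largest to 1, so attributes measured on very different scales contribute comparably to distance-based learning methods.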
Classification vs. prediction
 Classification:
 predicts categorical class labels
 classifies data based on the training set and the values in a
classification attribute and uses it in classifying new data
 Prediction:
 models continuous-valued functions
 predicts unknown or missing values
Classification Algorithms
 A classification algorithm analyzes a given data set, takes each instance of it, and assigns the instance to a particular class such that the classification error is least. It is used to extract models that define important data classes within the given data set. Classification is a two-step process.
 During the first step, the model is created by applying a classification algorithm to the training data set.
 Then, in the second step, the extracted model is tested against a predefined test data set to measure the trained model's performance and accuracy.
Applications of Classification Algorithms
 Email spam classification
 Predicting bank customers' willingness to repay loans
 Identifying cancerous tumor cells
 Sentiment analysis
 Drug classification
 Facial key-point detection
Classification Techniques

 Classification consists of assigning a class label to a set of unclassified cases.

 Supervised Classification
 The set of possible classes is known in advance.

 Unsupervised Classification
 Set of possible classes is not known. After classification we can try to assign a
name to that class.
 Unsupervised classification is called clustering.
Supervised learning
 Supervised learning, as the name indicates, has the presence of a supervisor as a teacher.
 Basically, supervised learning is when we teach or train the machine using data that is well labeled, which means some data is already tagged with the correct answer.
 After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labeled data.



Example
 If the shape of the object is rounded, has a depression at the top, and is red in color, then it will be labeled as Apple.
 If the shape of the object is a long curving cylinder with a green-yellow color, then it will be labeled as Banana.
 Now suppose that, after training on the data, you are given a new, separate fruit (say a banana) from the basket and asked to identify it.
 Since the machine has already learned from the previous data, it now has to use that knowledge wisely.
 It will first classify the fruit by its shape and color, confirm the fruit's name as BANANA, and put it in the Banana category.
 Thus the machine learns from training data (the basket of fruits) and then applies that knowledge to test data (the new fruit).
Supervised Classification
 Supervised learning is classified into two categories of algorithms:
 Classification: A classification problem is when the output variable is a category, such as "Red" or "Blue", or "disease" and "no disease".
 Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
Unsupervised learning
 Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
 Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
 Unlike supervised learning, no teacher is provided, which means no training will be given to the machine.
 Therefore the machine is restricted to finding the hidden structure in unlabeled data by itself.

Example
For instance, suppose the machine is given an image containing both dogs and cats, which it has never seen before.
 Thus the machine has no idea about the features of dogs and cats, so it can't categorize the image as 'dogs and cats'.
 But it can categorize the pictures according to their similarities, patterns, and differences,
 i.e., we can easily categorize the picture into two parts:
 the first part may contain all pictures having dogs in them, and the second part may contain all pictures having cats in them.
 Here the machine didn't learn anything beforehand, which means there is no training data or examples.
Classification Techniques

 A number of classification techniques are known, which can be broadly classified into the following categories:

1. Statistical-Based Methods
•Regression
•Bayesian Classifier

2. Distance-Based Classification
•Simple approach
•K-Nearest Neighbours

3. Decision Tree-Based Classification
•ID3, C4.5, CART

4. Classification using Machine Learning (SVM)

5. Classification using Neural Networks (ANN)

Linear regression

 Linear regression is the simplest form of regression. It attempts to model the relationship between two variables by fitting a linear equation to observed data.
 Linear regression attempts to find the mathematical relationship between the variables.
 If the outcome is a straight line, it is considered a linear model; if it is a curved line, it is a non-linear model.
Linear regression contd..

 The relationship between the dependent variable and a single independent variable is given by a straight line:
Y = α + βX
 The model 'Y' is a linear function of 'X'.
 The value of 'Y' increases or decreases in a linear manner as the value of 'X' changes.
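A minimal ordinary-least-squares fit of Y = α + βX can be sketched in pure Python; the sample data below is hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = alpha + beta * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = my - beta * mx
    return alpha, beta

# Hypothetical observations that are roughly linear in X
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
alpha, beta = fit_line(xs, ys)
print(alpha, beta)  # intercept and slope of the fitted line
```

The fitted α and β minimize the sum of squared vertical distances between the observed points and the line, which is the standard criterion for linear regression.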
Statistical-Based Algorithms - Regression
 Logistic Regression is a classification algorithm, not a regression algorithm. It estimates discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).
 The values obtained always lie between 0 and 1, since it predicts a probability.
 Logistic regression can be used in applications such as:
 Credit scoring
 Measuring the success rates of marketing campaigns
 Predicting revenue for a particular product
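The reason logistic regression's output always lies between 0 and 1 is the sigmoid function, which squashes any linear score into (0, 1); the probability is then thresholded to get a class label. The coefficients below are hypothetical — in practice they are fit by maximum likelihood:

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(income, alpha=-3.0, beta=0.05):
    # alpha and beta are made-up coefficients for illustration only
    p = sigmoid(alpha + beta * income)          # P(approve | income)
    return p, ("approve" if p >= 0.5 else "reject")

p, label = predict(100)  # hypothetical income in thousands
print(round(p, 3), label)
```

A score of 0 maps to probability 0.5, large positive scores approach 1, and large negative scores approach 0 — which is exactly why the predicted values never leave the 0-to-1 range.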
Bayesian Classification.

 Bayesian classification uses Bayes' theorem to predict the probability of an event. Bayesian classifiers are statistical classifiers grounded in the Bayesian understanding of probability. The theorem expresses how a degree of belief, expressed as a probability, should change to account for evidence.
 Bayes' theorem is named after Thomas Bayes, who first used conditional probability to provide an algorithm that uses evidence to calculate limits on an unknown parameter.
Bayesian Classifier

 Principle
 If it walks like a duck, quacks like a duck, then it is probably a duck
Bayesian Classifier
 A statistical classifier

 Performs probabilistic prediction, i.e., predicts class membership probabilities

 Foundation

 Based on Bayes’ Theorem.

 Assumptions
1.The classes are mutually exclusive and exhaustive.
2.The attributes are independent given the class.

 Called “Naïve” classifier because of these assumptions.


 Empirically proven to be useful.
Example: Bayesian Classification

 Example 8.2: Air Traffic Data
 Let us consider a set of observations recorded in a database regarding the arrival of airplanes on routes from any airport to New Delhi under certain conditions.
Air-Traffic Data

Day       Season  Fog     Rain    Class
Weekday   Spring  None    None    On Time
Weekday   Winter  None    Slight  On Time
Weekday   Winter  None    None    On Time
Holiday   Winter  High    Slight  Late
Saturday  Summer  Normal  None    On Time
Weekday   Autumn  Normal  None    Very Late
Holiday   Summer  High    Slight  On Time
Sunday    Summer  Normal  None    On Time
Weekday   Winter  High    Heavy   Very Late
Weekday   Summer  None    Slight  On Time
Saturday  Spring  High    Heavy   Cancelled
Weekday   Summer  High    Slight  On Time
Weekday   Winter  Normal  None    Late
Weekday   Summer  High    None    On Time
Weekday   Winter  Normal  Heavy   Very Late
Saturday  Autumn  High    Slight  On Time
Weekday   Autumn  None    Heavy   On Time
Holiday   Spring  Normal  Slight  On Time
Weekday   Spring  Normal  None    On Time
Weekday   Spring  Normal  Heavy   On Time
Air-Traffic Data

 In this database, there are four attributes
A = [Day, Season, Fog, Rain]
with 20 tuples.
 The categories of classes are:
C = [On Time, Late, Very Late, Cancelled]
 Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example:
Weekday  Winter  High  None  ???
 The classification technique should eventually map this tuple to an accurate class.
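A minimal Naive Bayes computation over the table as transcribed above can be sketched as follows. It uses plain relative frequencies with no smoothing; note that the scores are small and unnormalized, and that with this copy of the data the highest score goes to Late (the textbook's original table may differ slightly, so its worked answer may differ):

```python
from collections import Counter

# The 20 training tuples (Day, Season, Fog, Rain, Class) as transcribed
data = [
    ("Weekday","Spring","None","None","On Time"),
    ("Weekday","Winter","None","Slight","On Time"),
    ("Weekday","Winter","None","None","On Time"),
    ("Holiday","Winter","High","Slight","Late"),
    ("Saturday","Summer","Normal","None","On Time"),
    ("Weekday","Autumn","Normal","None","Very Late"),
    ("Holiday","Summer","High","Slight","On Time"),
    ("Sunday","Summer","Normal","None","On Time"),
    ("Weekday","Winter","High","Heavy","Very Late"),
    ("Weekday","Summer","None","Slight","On Time"),
    ("Saturday","Spring","High","Heavy","Cancelled"),
    ("Weekday","Summer","High","Slight","On Time"),
    ("Weekday","Winter","Normal","None","Late"),
    ("Weekday","Summer","High","None","On Time"),
    ("Weekday","Winter","Normal","Heavy","Very Late"),
    ("Saturday","Autumn","High","Slight","On Time"),
    ("Weekday","Autumn","None","Heavy","On Time"),
    ("Holiday","Spring","Normal","Slight","On Time"),
    ("Weekday","Spring","Normal","None","On Time"),
    ("Weekday","Spring","Normal","Heavy","On Time"),
]

priors = Counter(row[-1] for row in data)   # class frequencies
n = len(data)

def score(instance, cls):
    """P(cls) * product over attributes of P(attribute value | cls)."""
    s = priors[cls] / n
    for i, v in enumerate(instance):
        match = sum(1 for row in data if row[-1] == cls and row[i] == v)
        s *= match / priors[cls]
    return s

instance = ("Weekday", "Winter", "High", "None")
scores = {c: score(instance, c) for c in priors}
best = max(scores, key=scores.get)
for c, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{c:10s} {s:.4f}")
print("predicted class:", best)
```

Cancelled scores exactly zero because no Cancelled tuple has Day = Weekday, which is one motivation for smoothing techniques (e.g. Laplace correction) in practical Naive Bayes implementations.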


Bayesian Classifier

 In many applications, the relationship between the attribute set and the class variable is non-deterministic.
 In other words, a test instance cannot be assigned a class label with certainty.
 In such a situation, the classification can be achieved probabilistically.
 The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.
 More precisely, the Bayesian classifier uses Bayes' Theorem of probability for classification.
 Before discussing the Bayesian classifier, we should have a quick look at the theory of probability and then Bayes' Theorem.
Naive Bayes Algorithm
 Naïve Bayes algorithm is a supervised learning algorithm.
 It is a classification technique based on Bayes’ Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
 For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or upon the existence of the other features,
all of these properties independently contribute to the probability that this fruit is an apple and
that is why it is known as ‘Naive’.
 It is mainly used in text classification that includes a high-dimensional training dataset.
Bayes' Theorem:

•The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) · P(A) / P(B)

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
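The theorem can be checked numerically. The numbers below are hypothetical (a rare condition with a fairly accurate diagnostic test), chosen only to illustrate how prior, likelihood, and marginal combine into the posterior:

```python
# Hypothetical illustration: a condition with 1% prevalence and a test
# with 90% sensitivity and a 5% false-positive rate.
p_a = 0.01             # P(A): prior probability of the hypothesis
p_b_given_a = 0.90     # P(B|A): likelihood of the evidence if A is true
p_b_given_not_a = 0.05 # P(B|not A): false-positive rate

# Marginal probability of the evidence, P(B), via total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) by Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))
```

Even with an accurate test, the posterior stays modest because the prior is so small — a classic illustration of how Bayes' theorem updates a degree of belief rather than replacing it with the likelihood alone.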
 Bayesian interpretation:
In the Bayesian interpretation, probability measures a "degree of belief." Bayes' theorem connects the degree of belief in a hypothesis before and after accounting for evidence. For example, let us consider a coin. If we toss a coin, we get either heads or tails, and the chance of either outcome is 50%. If the coin is flipped a number of times and the outcomes are observed, the degree of belief may rise, fall, or remain the same depending on the outcomes.
Bayesian network:
 A Bayesian Network falls under the class of Probabilistic Graphical Modelling (PGM) procedures, which are used to compute uncertainties using the concept of probability. Generally known as Belief Networks, Bayesian Networks model uncertainties using Directed Acyclic Graphs (DAGs).
 A Directed Acyclic Graph is used to represent a Bayesian Network and, like any other statistical graph, a DAG consists of a set of nodes and links, where the links signify the connections between the nodes.
 The nodes represent random variables, and the edges define the relationships between these variables.
 A DAG models the uncertainty of an event taking place based on the Conditional Probability Distribution (CPD) of each random variable. A Conditional Probability Table (CPT) is used to represent the CPD of each variable in the network.
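A toy two-node network (with made-up CPT values) shows how a DAG plus CPTs determines joint and marginal probabilities:

```python
# Hypothetical two-node network: Rain -> WetGrass.
# The root's CPT is a single probability; the child's CPT maps each
# parent value to the probability of the child being True.
p_rain = 0.2                               # CPT of Rain
p_wet_given_rain = {True: 0.9, False: 0.1} # CPT of WetGrass given Rain

def joint(rain, wet):
    """Joint probability factorizes along the DAG:
    P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)."""
    pr = p_rain if rain else 1 - p_rain
    pw = p_wet_given_rain[rain] if wet else 1 - p_wet_given_rain[rain]
    return pr * pw

# Marginal P(WetGrass = True): sum the joint over the parent's values
p_wet = joint(True, True) + joint(False, True)
print(round(p_wet, 2))  # 0.2*0.9 + 0.8*0.1 = 0.26
```

This factorization along the edges of the DAG is exactly what makes Bayesian networks compact: each variable needs a CPT only over its parents, not over every other variable in the network.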
Distance-Based Algorithms:
Simple Approach
 Strategy:
 Each item mapped to a class is more similar to the other items in that class than to items in other classes.
 Similarity or distance measures may be used to identify the alikeness of different items in the database.
 Using a similarity measure is simpler for classification, since the classes are known in advance.
 If the class definition is in the form of an information retrieval query, the classification problem is to determine the similarity between each query and each tuple.
Simple Approach
 Here we assume that:
 Each tuple ti in the database is defined as a vector <ti1, ti2, …, tik> of numeric values.
 Each class Cj is defined by a tuple <Cj1, Cj2, …, Cjk> of numeric values.
 Definition:
 Given a database D = {t1, t2, …, tn} of tuples, where each tuple ti = <ti1, ti2, …, tik> contains numeric values, and a set of classes C = {C1, …, Cm}, where each class Cj = <Cj1, Cj2, …, Cjk> consists of numeric values, the classification problem is to assign each ti to the class Cj such that sim(ti, Cj) ≥ sim(ti, Cl) for all Cl ∈ C where Cl ≠ Cj.
 Method:
 For similarity measures, a representative vector for each class must be determined.
 Calculate the center of each region.
 Place each item in the class to whose center it is most similar.
 Algorithm:
 Each tuple is assigned to the class with the nearest center.
Input:
t // tuple to be classified
C1, …, Cm // class centers
Output:
c // class to which t is assigned
Algorithm:
dist = ∞;
for i := 1 to m do
  if dist(Ci, t) < dist then
    c = i;
    dist = dist(Ci, t);
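The algorithm above can be sketched as runnable Python. The class centers here are hypothetical, assumed to have been computed beforehand (e.g. as the mean of each class region), and Euclidean distance is used as the dissimilarity measure:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(t, centers):
    """Assign tuple t to the class whose center is closest."""
    best_class, best_dist = None, math.inf
    for cls, center in centers.items():
        d = euclidean(t, center)
        if d < best_dist:
            best_class, best_dist = cls, d
    return best_class

# Hypothetical class centers for two classes
centers = {"C1": (1.0, 1.0), "C2": (5.0, 5.0)}
print(nearest_center((1.5, 2.0), centers))  # closer to C1's center
```

Minimizing distance is equivalent to maximizing similarity here, so this matches the definition sim(ti, Cj) ≥ sim(ti, Cl) given above.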
K Nearest Neighbors
 Here we consider the K nearest neighbors of a particular item.
 Assumptions:
 The entire training set includes not only the data in the set but also the desired classification of each item.
 The training data itself becomes the model.
 To classify a new item, its distance to each item in the training set must be determined.
 The K closest entries in the training set are considered further.
 The new item is placed in the class that contains the most items from this set of K closest items.
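The steps above can be sketched as a short Python implementation; the training points are made up for illustration:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (vector, label) pairs. Returns the majority label
    among the k training items nearest to query."""
    # sort the whole training set by distance to the query, keep k closest
    neighbors = sorted(train,
                       key=lambda item: math.dist(item[0], query))[:k]
    # majority vote among the k nearest neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: the training set IS the model
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.0), "B"), ((4.2, 4.1), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))  # -> A
```

Note that no learning step precedes classification: all work happens at query time, which is why KNN is often called a lazy learner.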