
Unit: 5

Classification and Prediction


Content
• Classification vs. prediction; issues regarding classification and prediction
• Statistical-based algorithms, distance-based algorithms, decision tree-based algorithms, neural network-based algorithms, rule-based algorithms, combining techniques
• Accuracy and error measures; evaluation of the accuracy of a classifier or predictor
• Neural network prediction methods: linear and nonlinear regression, logistic regression
• Introduction of tools such as DB Miner / WEKA / DTREG DM tools
Formal Classification and Prediction Definition:

• Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.

• Such analysis can help provide us with a better understanding of the data at large.

• Classification predicts categorical (discrete, unordered) labels; prediction models continuous-valued functions.
Classification
• The goal of data classification is to organize and categorize data into distinct classes.
• A model is first created based on the data distribution.
• The model is then used to classify new data.
• Given the model, a class can be predicted for new data.
• Generally speaking, classification is used for discrete and nominal values.
Prediction
• The goal of prediction is to forecast or deduce the value of an attribute based on the values of other attributes.

• A model is first created based on the data distribution.

• The model is then used to predict future or unknown values.
Summarization of Classification and Prediction:

• If forecasting a discrete value → Classification

• If forecasting a continuous value → Prediction

Understanding Classification and Prediction
Cont..
Classification:
• Suppose from your past data (train data) you come to know that your best friend likes the above movies.
• Now a new movie (test data) is released, and you want to know whether your best friend will like it or not.
• If you are confident about the chances of your friend liking that movie, you can take your friend to the movie this weekend.
• If you observe the problem closely, it is simply whether your friend will like the movie or not.
• Finding a solution to this type of problem is called classification. This is because we are classifying things into the classes they belong to (yes or no, like or dislike).
Cont..
• Keep in mind, here we are forecasting a discrete value (classification), and this classification belongs to supervised learning.
• This is because you are learning it from your train data.
• Most classification is binary classification, in which we have to predict whether the output belongs to class 1 or class 2 (class 1: yes, class 2: no).
• We can use classification for predicting more classes too, e.g., colors: RED, GREEN, BLUE, YELLOW, ORANGE.
Cont..
Prediction:
• Suppose from your past data (train data) you come to know that your best friend liked the above movies, and you also know how many times your friend watched each particular movie.
• Now a new movie (test data) is released, and you are going to find out how many times your friend will watch this newly released movie: 5 times, 6 times, 10 times, anything.
• If you observe the problem closely, it is about finding a count; sometimes we describe this as predicting a value.
Cont..
• Keep in mind, here we are forecasting a continuous value (prediction), and this prediction also belongs to supervised learning.
• This is because you are learning it from your train data.
Difference between Discrete data values
and Continuous data values
• Discrete data values can only take certain values.

• For example, you can have a certain number of friends, like 4 or 5, but you can't have 4.5 (four and a half) friends.

• These types of data values are called discrete.

• The weight of an object or the height of a person can take any value in a range; these types of data are called continuous data.
How Does Classification Work?

• With the help of a bank loan application, let us understand the working of classification.

• The data classification process includes two steps:

• Building the Classifier or Model (Learning)

• Using the Classifier for Classification (Classification)
How Does Classification Work?
• Building the Classifier or Model

• This step is the learning step or the learning phase.

• In this step, the classification algorithms build the classifier.
• The classifier is built from the training set, made up of database tuples and their associated class labels.
• Each tuple that constitutes the training set is referred to as a training tuple.
• These tuples can also be referred to as samples, objects, or data points.
How Does Classification Work?
• Using the Classifier for Classification

• In this step, the classifier is used for classification.

• Here, the test data are used to estimate the accuracy of the classification rules.
• The classification rules can be applied to new data tuples if the accuracy is considered acceptable.
How Does Classification Work?
Learning Phase
[Figure: training data are analyzed by a classification algorithm, producing the model as classification rules.]
How Does Classification Work?
• Learning Phase:

• Training data are analyzed by a classification algorithm.

• Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules.
How Does Classification Work?
Classification Phase
[Figure: the classification rules are applied to test data, and then to new data tuples.]
How Does Classification Work?
• Classification:

• Test data are used to estimate the accuracy of the classification rules.

• If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
Classification and Prediction Issues
• The major issue is preparing the data for
Classification and Prediction. Preparing the data
involves the following activities:

• Data Cleaning
• Relevance Analysis
• Data Transformation and reduction
– Normalization
– Generalization
Classification and Prediction Issues
• Data Cleaning:

• Data cleaning involves removing the noise and treating missing values.
• The noise is removed by applying smoothing techniques, and the problem of missing values is solved by replacing a missing value with the most commonly occurring value for that attribute.
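
As an illustration, here is a minimal sketch of this mode-based imputation using pandas; the attribute names and values are made up, not from the slides' data set:

```python
import pandas as pd

# Illustrative training data with a missing income value
df = pd.DataFrame({
    "income": ["high", "medium", None, "medium", "low"],
    "buys_computer": ["no", "yes", "yes", "yes", "no"],
})

# Replace each missing value with the most commonly
# occurring value (the mode) of that attribute
mode_value = df["income"].mode()[0]          # "medium"
df["income"] = df["income"].fillna(mode_value)
print(df)
```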
Classification and Prediction Issues
• Relevance Analysis:

• The database may also have irrelevant attributes, and the data may be redundant.

• Correlation analysis is used to know whether any two given attributes are related.

• Hence, relevance analysis, in the form of correlation analysis and attribute subset selection, can be used to detect attributes that do not contribute to the classification or prediction task.
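
A small sketch of correlation analysis with pandas; the table and attribute names are assumed for illustration:

```python
import pandas as pd

# Illustrative numeric attributes
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 62],
    "income": [30, 42, 61, 66, 80],   # in thousands
    "id":     [4, 1, 5, 2, 3],        # record id, carries no signal
})

# Pearson correlation between every pair of attributes:
# a strongly correlated pair (age/income here) suggests redundancy,
# while an attribute uncorrelated with everything (id) is a
# candidate for removal during attribute subset selection
print(df.corr())
```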
Classification and Prediction Issues
• Data Transformation and Reduction − The data can be transformed by any of the following methods.

• Normalization − The data is transformed using normalization.
• Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0.
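
A small sketch of min-max normalization as described above; the income values are illustrative:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo)
            for v in values]

incomes = [12000, 35000, 58000, 98000]
print(min_max_normalize(incomes))   # [0.0, 0.267..., 0.534..., 1.0]
```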
Classification and Prediction Issues
• Data Transformation and Reduction −

• Generalization − The data can also be transformed by generalizing it to a higher-level concept. For this purpose, we can use concept hierarchies.

• For example, numeric values for the attribute income can be generalized to discrete ranges, such as low, medium, and high.
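
A minimal sketch of such a concept-hierarchy generalization for income; the cut points are assumed for illustration:

```python
def generalize_income(value):
    """Map a numeric income to a discrete range (concept hierarchy).
    The cut points used here are hypothetical."""
    if value < 40000:
        return "low"
    elif value < 80000:
        return "medium"
    return "high"

print([generalize_income(v) for v in [12000, 58000, 98000]])
# ['low', 'medium', 'high']
```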
Classification and Prediction Methods

Classification:
- Decision Tree
- Bayesian Classification
- Rule Based Classification
- Neural Network
Types of Algorithms
• Statistical-Based Algorithms: Naïve Bayes, Linear Discriminant
Analysis (LDA)
• Distance-Based Algorithms: k-Nearest Neighbors (k-NN),
Support Vector Machines (SVM)
• Decision Tree-Based Algorithms: ID3, C4.5, CART, Random
Forest
• Neural Network-Based Algorithms: Multi-Layer Perceptron
(MLP), Deep Neural Networks (DNN)
• Rule-Based Algorithms: Rule Induction, Associative
Classification
• Combining Techniques: Ensemble Methods (Bagging, Boosting,
Stacking)
Decision Tree Induction
 Used in classification or in regression.
 A decision tree is a structure that includes a root node, branches, and leaf nodes.
 Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label.
 The topmost node in the tree is the root node.
 Types –
- ID3 (Iterative Dichotomiser 3)
- C4.5
- CART (Classification and Regression Trees)
Decision Tree Representation
[Figure: a tree with a root node at the top, branches leading down, and leaf nodes at the bottom; each leaf holds one of the set of possible answers (class labels).]

Flow
• Dataset → Algorithm → Model/Classifier → Class Label

Example (splitting node tested over the entire data set D):
• Root / splitting node: Employed?
– no → test credit score: High → Approve, Low → Reject
– yes → test income: High → Approve, Low → Reject
Decision Tree (cont..)
• The following decision tree is for the concept buys_computer; it indicates whether a customer at a company is likely to buy a computer or not. Each internal node represents a test on an attribute (a feature/test attribute). Each leaf node represents a class.
Classification by Decision Tree Induction
•Decision tree
◦ Flowchart-like structure
◦ Internal node denotes a test on an attribute
◦ Branch represents an outcome of the test
◦ Leaf nodes represent class labels or class distribution

•Decision tree generation consists of two phases


◦ Tree construction
◦ At start, all the training examples are at the root
◦ Partition examples recursively based on selected attributes
◦ Tree pruning
◦ Identify and remove branches that reflect noise or outliers
Important Terms for Decision Tree
Attribute Selection Measures
- It's a heuristic for selecting the splitting criterion that "best" separates a given data partition, D, of class-labeled training tuples into individual classes.
- Also known as splitting rules.
• Information Gain
• Entropy
• Gini Index
Information Gain
• Information gain can be used for continuous-valued (numeric) attributes.

• The attribute which has the highest information gain is selected for the split.

• Assume that there are two classes, P (positive) and N (negative).

• Suppose we have S samples; out of these, p samples belong to class P and n samples belong to class N.

• The amount of information needed to decide whether an arbitrary sample in S belongs to P or N is defined as

I(p, n) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
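
A small Python sketch of this measure; the 9 yes / 5 no class counts are taken from the buys_computer example used later in these slides:

```python
from math import log2

def info(p, n):
    """Expected information I(p, n) needed to classify a sample in S,
    where p samples belong to class P and n to class N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # treat 0 * log2(0) as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

# buys_computer data: 9 'yes' and 5 'no' tuples
print(round(info(9, 5), 3))            # 0.940
```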
Entropy (E)
• Entropy is the measure of impurity, disorder, or uncertainty in a bunch of examples.

• What does entropy basically do?

• Entropy controls how a decision tree decides to split the data.
• It actually affects how a decision tree draws its boundaries.
Gini Index
• An alternative method to information gain is called the Gini index.
• Gini is used in CART (Classification and Regression Trees).
• If a data set T contains examples from n classes, the Gini index, gini(T), is defined as

Gini(T) = 1 − Σ_j (p_j)^2

• n: the number of classes

• p_j: the relative frequency (probability) that an example in T belongs to class j
Gini Index (cont…)
• After splitting T into two subsets T1 and T2 with sizes N1 and N2 (N = N1 + N2), the Gini index of the split data is defined as

Gini_split(T) = (N1/N) · Gini(T1) + (N2/N) · Gini(T2)
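
A minimal sketch of both formulas; the class counts in the demo are assumed from the standard buys_computer table used in the worked example below (9 yes / 5 no overall; income ∈ {low, medium} giving 7 yes / 3 no, and the rest 2 yes / 2 no):

```python
def gini(counts):
    """Gini(T) = 1 - sum(p_j^2) for the class counts in partition T."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(part1, part2):
    """Weighted Gini index after a binary split into T1 and T2."""
    n1, n2 = sum(part1), sum(part2)
    n = n1 + n2
    return (n1 / n) * gini(part1) + (n2 / n) * gini(part2)

print(round(gini([9, 5]), 3))                 # 0.459
print(round(gini_split([7, 3], [2, 2]), 3))   # 0.443
```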


Decision Tree Example
[Worked decision-tree examples appear as figures in the original slides.]
Gini Index
• It is used in CART.
• The Gini index measures the impurity of D, a data partition or set of training tuples, as

Gini(D) = 1 − Σ_{i=1..m} (p_i)^2

• where p_i is the probability that a tuple in D belongs to class C_i and is estimated by |C_i,D|/|D|.
• The sum is computed over m classes.
Gini Index Example
Gini Index
• Let D be the training data of the table, where 9 tuples belong to the class buys_computer = yes and the remaining 5 tuples belong to the class buys_computer = no.

• A (root) node N is created for the tuples in D. We first use the equation for the Gini index to compute the impurity of D:

Gini(D) = 1 − (9/14)^2 − (5/14)^2 = 0.459
Gini Index
• To find the splitting criterion for the tuples in D, we need to
compute the Gini index for each attribute.

• Let’s start with the attribute income and consider each of the
possible splitting subsets.

• Consider the subset {low, medium}.

• This would result in 10 tuples in partition D1 satisfying the


condition “income ∈ {low, medium}.”

• The remaining four tuples of D would be assigned to partition D2.


Gini Index
• The Gini index value computed based on this partitioning is

Gini_{income ∈ {low, medium}}(D) = (10/14) · Gini(D1) + (4/14) · Gini(D2)
= (10/14) · (1 − (7/10)^2 − (3/10)^2) + (4/14) · (1 − (2/4)^2 − (2/4)^2)
= 0.443
= Gini_{income ∈ {high}}(D)
Gini Index
• Similarly, the Gini index values for splits on the remaining subsets are: 0.458 (for the subsets {low, high} and {medium}) and 0.450 (for the subsets {medium, high} and {low}).

• Therefore, the best binary split for attribute income is on {low, medium} (or {high}) because it minimizes the Gini index.

• Evaluating age, we obtain {youth, senior} (or {middle_aged}) as the best split for age, with a Gini index of 0.357; the attributes student and credit_rating are both binary, with Gini index values of 0.367 and 0.429, respectively.
Gini Index
• The attribute age and splitting subset {youth, senior} therefore give the minimum Gini index overall, with a reduction in impurity of 0.459 − 0.357 = 0.102.

• The binary split "age ∈ {youth, senior}" results in the maximum reduction in impurity of the tuples in D and is returned as the splitting criterion.

• Node N is labeled with the criterion, two branches are grown from it, and the tuples are partitioned accordingly.

• Hence, the Gini index selects a binary split on age at the root node, unlike the multiway split on age produced by information gain.
Bayesian Classification
• “What are Bayesian classifiers?”

• Bayesian classifiers are statistical classifiers.

• They can predict class membership probabilities, such as the


probability that a given tuple belongs to a particular class.

• Bayesian classification is based on Bayes’ theorem.

• Bayesian classifiers have also exhibited high accuracy and speed


when applied to large databases.
Bayesian Classification
• Naïve Bayesian classifiers assume that

"The effect of an attribute value on a given class is independent of the values of the other attributes."

• This assumption is called class conditional independence.
Bayesian Algorithm
• Bayes' theorem provides a way of calculating the posterior probability, P(c|x), from P(c), P(x), and P(x|c):

P(c|x) = P(x|c) · P(c) / P(x)

• The naïve Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.

• This assumption is called class conditional independence.
Bayesian Algorithm

• P(c|x) is the posterior probability of class (target) given predictor


(attribute).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood which is the probability of predictor given
class.
• P(x) is the prior probability of predictor.
Bayesian Example
• Predicting a class label using naïve Bayesian classification.

• We wish to predict the class label of a tuple using naïve Bayesian


classification, given the same training data “buys_computer”

• The data tuples are described by the attributes age, income,


student, and credit rating.

• The class label attribute, buys_computer, has two distinct values, namely {yes, no}.

• Let C1 correspond to the class buys_computer = yes.

• Let C2 correspond to the class buys_computer = no.
Bayesian Example
• Predicting a class label using naïve Bayesian classification.

• The tuple we wish to classify is

X = (age = youth, income = medium, student = yes, credit_rating = fair)

We need to maximize P(X|Ci)P(Ci), for i = 1, 2.

P(Ci), the prior probability of each class, can be computed based on the training tuples:

P(buys_computer = yes) = 9/14 = 0.643

P(buys_computer = no) = 5/14 = 0.357
Bayesian Example
• Predicting a class label using naïve Bayesian classification.

To compute P(X|Ci), for i = 1, 2, we compute the following conditional probabilities:

P(age = youth | buys_computer = yes) = 2/9 = 0.222
P(age = youth | buys_computer = no) = 3/5 = 0.600

P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.400

P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.200

P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400
Bayesian Example
Using the above probabilities, we obtain

P(X | buys_computer = yes) = P(age = youth | yes) * P(income = medium | yes) *
P(student = yes | yes) * P(credit_rating = fair | yes)
= 0.222 * 0.444 * 0.667 * 0.667
= 0.044

Similarly,

P(X | buys_computer = no) = 0.600 * 0.400 * 0.200 * 0.400
= 0.019
Bayesian Example
To find the class, Ci, that maximizes P(X|Ci)P(Ci), we compute

P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 * 0.643 = 0.028

P(X | buys_computer = no) P(buys_computer = no) = 0.019 * 0.357 = 0.007

Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
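
The same computation as a short Python sketch, reusing the probabilities from the example above:

```python
# Priors and conditional probabilities from the worked example
p_yes, p_no = 9/14, 5/14

likelihood_yes = (2/9) * (4/9) * (6/9) * (6/9)  # age, income, student, credit
likelihood_no  = (3/5) * (2/5) * (1/5) * (2/5)

score_yes = likelihood_yes * p_yes   # ~0.028
score_no  = likelihood_no  * p_no    # ~0.007

label = "yes" if score_yes > score_no else "no"
print(f"buys_computer = {label}")    # yes
```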
Bayesian Classification Example
[The weather ("play tennis") training table is shown as a figure in the original slides.]
Bayesian Algorithm
• The posterior probability can be calculated by first constructing a frequency table for each attribute against the target,

• then transforming the frequency tables into likelihood tables,

• and finally using the naïve Bayesian equation to calculate the posterior probability for each class.

• The class with the highest posterior probability is the outcome of the prediction.
Bayesian Algorithm
[The frequency tables and the likelihood tables for all four predictors are shown as figures in the original slides.]
Bayesian Algorithm
• An unseen sample X = <rain, hot, high, false>

• P(X|p)·P(p) = P(rain|p) * P(hot|p) * P(high|p) * P(false|p) * P(p)
= (3/9) * (2/9) * (3/9) * (6/9) * (9/14)
= 0.333 * 0.222 * 0.333 * 0.667 * 0.643
≈ 0.0106

• P(X|n)·P(n) = P(rain|n) * P(hot|n) * P(high|n) * P(false|n) * P(n)
= (2/5) * (2/5) * (4/5) * (2/5) * (5/14)
= 0.4 * 0.4 * 0.8 * 0.4 * 0.357
≈ 0.0183

• Since P(X|n)·P(n) > P(X|p)·P(p), sample X is classified in class n (don't play).
Rule Based Classification
• In this section, we look at rule-based classifiers, where the learned
model is represented as a set of IF-THEN rules.

• We first examine how such rules are used for classification.

Using IF-THEN Rules for Classification

• Rules are a good way of representing information or bits of


knowledge.
• A rule-based classifier uses a set of IF-THEN rules for classification.

• An IF-THEN rule is an expression of the form

IF condition THEN conclusion


Rule Based Classification
• An example is rule R1,

R1: IF age = youth AND student = yes THEN buys computer = yes.

• The “IF”-part (or left-hand side) of a rule is known as the rule


antecedent or precondition.

• The “THEN”-part (or right-hand side) is the rule consequent.

• In the rule antecedent, the condition consists of one or more


attribute tests (such as age = youth, and student = yes) that are
logically ANDed.

• The rule’s consequent contains a class prediction (in this case, we


are predicting whether a customer will buy a computer).
Rule Based Classification
• R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)

• If the condition (that is, all of the attribute tests) in a rule antecedent holds true for a given tuple,

• we say that the rule antecedent is satisfied (or simply, that the rule is satisfied) and that the rule covers the tuple.
Cont..
• General form: IF <condition> AND <condition> THEN <conclusion>
• R1: IF outlook = sunny AND humidity = high THEN play = no
• R2: IF outlook = sunny THEN play = yes
Rule Based Classification
• A rule R can be assessed by its coverage and accuracy.

• Given a tuple, X, from a class-labeled data set, D:
• let n_covers be the number of tuples covered by R,
• let n_correct be the number of tuples correctly classified by R,
• and let |D| be the number of tuples in D.

• We can define the coverage and accuracy of R as

coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
Rule Based Classification

• That is, a rule’s coverage is the percentage of tuples that are


covered by the rule.
(i.e., whose attribute values hold true for the rule’s antecedent).

• For a rule’s accuracy, we look at the tuples that it covers and see
what percentage of them the rule can correctly classify.
Rule Based Classification

• Consider rule R1 above, which covers 2 of the 14 tuples. It can


correctly classify both tuples.

• Therefore, coverage(R1) = (2/14)*100


= 14.28%

accuracy(R1) = (2/2)*100
= 100%.
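
A minimal sketch of these two measures; the three-tuple data set is made up for illustration (the slides' example uses the 14-tuple buys_computer table):

```python
def evaluate_rule(rule, dataset, target):
    """Coverage and accuracy of an IF-THEN rule over a labeled data set.
    `rule` is a pair (antecedent_predicate, predicted_class)."""
    antecedent, predicted = rule
    covered = [t for t in dataset if antecedent(t)]
    correct = [t for t in covered if t[target] == predicted]
    coverage = len(covered) / len(dataset)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# R1: IF age = youth AND student = yes THEN buys_computer = yes
r1 = (lambda t: t["age"] == "youth" and t["student"] == "yes", "yes")

data = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]
print(evaluate_rule(r1, data, "buys_computer"))  # (0.333..., 1.0)
```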
Neural Network
• A "neural network" (NN) is a mathematical or computational model based on biological neural networks.

• It is a set of connected input/output units in which each connection has a weight associated with it.

• Advantages:

◦ Prediction accuracy is generally high

◦ Fast evaluation of the learned target function
Neural Network
• Disadvantages:

◦ long training time


◦ difficult to understand the learned function (weights)
◦ not easy to incorporate domain knowledge
Neurons
[Figure: a biological neuron, showing the cell body, dendrites, axon, and axon terminals.]
Neural Network
[Figure: a single unit weights each input (x1·w1, x2·w2, ...), sums them, and applies an activation function F to produce the output y.]
Neural Network
• Network Training:

• The ultimate objective of training is to obtain a set of weights


that makes almost all the tuples in the training data classified
correctly.
•Steps (a minimal sketch follows this list):
◦ Initialize weights with random values.
◦ Feed the input tuples into the network one by one.
◦ For each unit
◦ Compute the net input to the unit as a linear combination of all the
inputs to the unit
◦ Compute the output value using the activation function
◦ Compute the error
◦ Update the weights and the bias
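
Below is a toy implementation of these steps for a single sigmoid unit (not a full multilayer backpropagation network); learning the AND function is an assumed example:

```python
import random
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def train(samples, labels, epochs=100, lr=0.1):
    n = len(samples[0])
    # Initialize weights and bias with random values
    weights = [random.uniform(-0.5, 0.5) for _ in range(n)]
    bias = random.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, target in zip(samples, labels):   # feed tuples one by one
            # Net input: linear combination of all inputs to the unit
            net = sum(w * xi for w, xi in zip(weights, x)) + bias
            out = sigmoid(net)                    # activation function
            err = target - out                    # compute the error
            grad = err * out * (1 - out)          # sigmoid derivative term
            # Update the weights and the bias
            weights = [w + lr * grad * xi for w, xi in zip(weights, x)]
            bias += lr * grad
    return weights, bias

# Toy example: learn the AND function
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train(X, y, epochs=5000, lr=0.5)
print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x in X])
# typically [0, 0, 0, 1]
```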
Other Classification Methods
•k-nearest neighbor classifier
•case-based reasoning
•Genetic algorithm

•Rough set approach

•Fuzzy set approaches


K-nearest neighbor classifier
• A powerful classification algorithm used in pattern recognition.

• K nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).

KNN: Classification Approach

• An object (a new instance) is classified by a majority vote of its neighbors' classes.

• The object is assigned to the most common class amongst its K nearest neighbors (as measured by a distance function).
K-nearest neighbor classifier
• Distance is measured for continuous variables using the Euclidean distance formula:

d(X, Y) = sqrt( (x1 − y1)^2 + (x2 − y2)^2 + ... + (xn − yn)^2 )
K-nearest neighbor Algorithm
• All the instances correspond to points in an n-dimensional feature space.

• Each instance is represented with a set of numerical attributes.

• Each training example consists of a feature vector and an associated class label.

• Classification is done by comparing the feature vector of a new example E with those of the K nearest training points:

• Select the K nearest examples to E in the training set.

• Assign E to the most common class among its K nearest neighbors (a sketch follows below).
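
A minimal sketch of the algorithm; the 2-D points and labels are made up for illustration:

```python
from collections import Counter
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, labels, query, k=3):
    """Assign `query` the majority class among its k nearest neighbors."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors
X = [(1.0, 1.0), (1.2, 0.8), (6.0, 6.2), (5.8, 6.1)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (1.1, 0.9)))   # 'A'
```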


K-nearest neighbor Algorithm
• Strengths of KNN

• Very simple.
• Can be applied to the data from any distribution.
• Good classification if the number of samples is large enough.

• Weaknesses of KNN

• Takes more time to classify a new example:
• it needs to calculate and compare the distance from the new example to all other examples.
• Choosing k may be tricky.
• Needs a large number of samples for accuracy.
Prediction
• Regression:

• Regression is a data mining technique used to predict a range of


numeric values (also called continuous values), given a particular
dataset.

• For example, regression might be used to predict the cost of a


product or service, given other variables.

• Regression is used across multiple industries for business and


marketing planning, financial forecasting, environmental modeling
and analysis of trends.
Prediction
• Regression involves ,
Predictor variable (the values which are known) and
Response variable (values to be predicted).

There are 2 types of regression:

1) Linear Regression
2) Multiple Regression
Linear regression
• It is the simplest form of regression. Linear regression attempts to model the relationship between two variables by fitting a linear equation to the observed data.

• Linear regression attempts to find the mathematical relationship between variables.

• If the outcome is a straight line, then it is considered a linear model; if it is a curved line, then it is a nonlinear model.

• The relationship between the dependent variable and the single independent variable is given by a straight line:

Y = α + βX
Linear regression
• The model 'Y' is a linear function of 'X'.

• The value of 'Y' increases or decreases in a linear manner as the value of 'X' changes.
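
A small sketch of fitting Y = α + βX by least squares with NumPy; the observations are illustrative:

```python
import numpy as np

# Illustrative observations of X and Y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Least-squares fit of a degree-1 polynomial: returns [beta, alpha]
beta, alpha = np.polyfit(x, y, deg=1)
print(f"Y = {alpha:.2f} + {beta:.2f} X")   # roughly Y = 0.15 + 1.95 X
```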
Multiple Regression
• Multiple linear regression is an extension of linear regression analysis.

• It uses two or more independent variables to predict a single continuous dependent variable.

Y = a0 + a1·X1 + a2·X2 + ... + ak·Xk + e

where,
'Y' is the response variable,
X1, X2, ..., Xk are the independent predictors,
'e' is the random error,
a0, a1, a2, ..., ak are the regression coefficients.
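
A minimal sketch of estimating the coefficients a0, a1, a2 by least squares; the data are made up for illustration:

```python
import numpy as np

# Two independent predictors X1, X2 and a continuous response Y
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
Y = np.array([7.1, 6.9, 14.2, 13.8, 20.1])

# Prepend an intercept column and solve for a0, a1, a2
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a0, a1, a2 = coef
print(f"Y = {a0:.2f} + {a1:.2f} X1 + {a2:.2f} X2")
```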
Accuracy and Error Measures
• Accuracy, Precision, Recall, F1-Score
• Mean Squared Error (MSE), Root Mean Squared Error (RMSE)
• Receiver Operating Characteristic (ROC) Curve, Area Under Curve (AUC)

Evaluating Classifier or Predictor Accuracy
• Cross-Validation (k-fold, Leave-One-Out)
• Confusion Matrix
• Bias-Variance Analysis
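
A small sketch of the confusion-matrix-based measures listed above (MSE/RMSE and ROC/AUC are omitted for brevity); the labels are illustrative:

```python
def classification_metrics(y_true, y_pred, positive="yes"):
    """Confusion-matrix-based accuracy, precision, recall, and F1."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

print(classification_metrics(["yes", "no", "yes", "no"],
                              ["yes", "yes", "yes", "no"]))
# (0.75, 0.666..., 1.0, 0.8)
```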
Data Mining Tools
• WEKA
• DB Miner
• DTREG
 Thank You 
