Data Mining - Classification & Prediction

There are two main forms of data analysis: classification and prediction. Classification models predict categorical class labels, while prediction models predict continuous numeric values. Some examples given include classifying loan applications as safe or risky using classification, and predicting customer expenditures using prediction. The document then discusses the processes of building classification models, using classifiers, and some common issues around preparing data for classification and prediction like data cleaning, normalization, and attribute relevance analysis.

Uploaded by

Tdx mentor


There are two forms of data analysis that can be used to extract models that describe
important classes or predict future data trends. These two forms are as follows −

 Classification
 Prediction
Classification models predict categorical class labels, and prediction models predict
continuous-valued functions.
For example, we can build a classification model to categorize bank loan applications as
either safe or risky, or a prediction model to predict the expenditures in dollars of potential
customers on computer equipment given their income and occupation.

What is classification?
Following are examples of cases where the data analysis task is Classification −
 A bank loan officer wants to analyze the data in order to know which customers (loan
applicants) are risky and which are safe.
 A marketing manager at a company needs to analyze whether a customer with a given
profile will buy a new computer.
In both of the above examples, a model or classifier is constructed to predict the categorical
labels. These labels are risky or safe for loan application data and yes or no for marketing
data.

What is prediction?
Following is an example of a case where the data analysis task is Prediction −
Suppose the marketing manager needs to predict how much a given customer will spend
during a sale at his company. In this example the task is to predict a numeric value.
Therefore the data analysis task is an example of numeric prediction. In this case, a model or
a predictor will be constructed that predicts a continuous-valued function or ordered value.
Note − Regression analysis is a statistical methodology that is most often used for numeric
prediction.
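
As a rough illustration of numeric prediction by regression, the sketch below fits a straight line by least squares and uses it to predict spending. The income and spending figures, and the names fit_line and predict, are invented for this example and do not come from the text.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict a continuous value for a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: income (thousands of dollars) and
# amount spent during a sale (dollars).
incomes = [20, 40, 60, 80]
spend = [100, 200, 300, 400]

model = fit_line(incomes, spend)
print(predict(model, 50))  # predicted spend for an income of 50
```

The output is a number rather than a class label, which is what separates a predictor from a classifier.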

How Does Classification Work?

With the help of the bank loan application that we have discussed above, let us understand
the working of classification. The Data Classification process includes two steps −

 Building the Classifier or Model

 Using Classifier for Classification
Building the Classifier or Model
 This step is the learning step or the learning phase.
 In this step the classification algorithms build the classifier.
 The classifier is built from the training set made up of database tuples and their
associated class labels.
 Each tuple that constitutes the training set is assumed to belong to a predefined
category or class. These tuples can also be referred to as samples, objects or data points.

Using Classifier for Classification

In this step, the classifier is used for classification. Here the test data is used to estimate the
accuracy of classification rules. The classification rules can be applied to the new data tuples
if the accuracy is considered acceptable.
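
The two steps can be sketched as follows. The text does not name a classification algorithm, so this sketch assumes a 1-nearest-neighbour classifier, whose learning step simply stores the training tuples. The loan tuples, the 0.8 accuracy threshold and all function names are invented for illustration.

```python
def classify(training_set, x):
    """Return the label of the training tuple closest to x."""
    def dist2(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, label = min(training_set, key=lambda t: dist2(t[0], x))
    return label


def accuracy(training_set, test_set):
    """Fraction of test tuples whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_set if classify(training_set, x) == y)
    return correct / len(test_set)


# Hypothetical tuples: (income, debt) in thousands of dollars, with a class label.
training_set = [((80, 5), "safe"), ((75, 10), "safe"),
                ((20, 30), "risky"), ((25, 40), "risky")]
test_set = [((70, 8), "safe"), ((30, 35), "risky"), ((60, 12), "safe")]

acc = accuracy(training_set, test_set)

# Apply the classifier to a new tuple only if the accuracy is acceptable.
ACCEPTABLE = 0.8  # assumed threshold
new_label = classify(training_set, (22, 33)) if acc >= ACCEPTABLE else None
print(acc, new_label)
```

The test tuples are kept separate from the training tuples so that the accuracy estimate is not computed on data the classifier has already seen.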

Classification and Prediction Issues

The major issue is preparing the data for Classification and Prediction. Preparing the data
involves the following activities −
 Data Cleaning − Data cleaning involves removing noise and treating
missing values. Noise is removed by applying smoothing techniques, and the
problem of missing values is solved by replacing a missing value with the most
commonly occurring value for that attribute.
 Relevance Analysis − The database may also contain irrelevant attributes. Correlation
analysis is used to know whether any two given attributes are related.
 Data Transformation and reduction − The data can be transformed by any of the
following methods.
o Normalization − The data is transformed using normalization. Normalization
involves scaling all values for a given attribute so that they fall
within a small specified range. Normalization is used when neural networks or
methods involving distance measurements are used in the learning step.
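
The data cleaning and relevance analysis steps above can be sketched as below. The sketch assumes missing values are marked with None and uses the Pearson correlation coefficient as the correlation measure. The text does not specify either choice, and the function names and sample values are invented.

```python
from collections import Counter
from math import sqrt


def fill_missing_with_mode(values):
    """Replace None entries with the most commonly occurring known value."""
    known = [v for v in values if v is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical attribute with one missing value.
housing = ["own", "rent", None, "own"]
cleaned = fill_missing_with_mode(housing)

# Hypothetical numeric attributes; a coefficient near +1 or -1 suggests
# the two attributes are strongly related, so one may be redundant.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(cleaned, r)
```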

Data Normalization in Data Mining

Last Updated: 25-06-2019

Normalization is used to scale the data of an attribute so that it falls in a smaller range, such as
-1.0 to 1.0 or 0.0 to 1.0. It is generally useful for classification algorithms.

Need of Normalization –

Normalization is generally required when we are dealing with attributes on different scales;
otherwise, an equally important attribute on a lower scale may lose effectiveness because
another attribute has values on a larger scale.
In simple words, when multiple attributes are present but have values on different
scales, this may lead to poor data models while performing data mining operations. So they are
normalized to bring all the attributes onto the same scale.

Methods of Data Normalization –

 Decimal Scaling
 Min-Max Normalization
 z-Score Normalization (zero-mean Normalization)
Decimal Scaling Method For Normalization –

It normalizes by moving the decimal point of the values of the data. To normalize the data by this
technique, we divide each value by a power of 10 chosen from the maximum absolute value of the
data. The data value, vi, is normalized to vi' by using the formula below –

$$v_i' = \frac{v_i}{10^{j}}$$

where j is the smallest integer such that max(|vi'|) < 1.

EXAMPLE
Let the input data be: -10, 201, 301, -401, 501, 601, 701
To normalize the above data,
Step 1: Maximum absolute value in the given data (m): 701
Step 2: Divide the given data by 1000 (i.e., j = 3)
Result: The normalized data is: -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701
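
The worked example can be reproduced with a short sketch. The function name decimal_scale is chosen for illustration only.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest
    non-negative integer that makes the largest absolute value < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


data = [-10, 201, 301, -401, 501, 601, 701]
scaled, j = decimal_scale(data)
print(j, scaled)
```

Searching for j in a loop, rather than counting the digits of the maximum, keeps the condition max(|vi'|) < 1 explicit, including the case where the maximum is itself a power of 10.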

Min-Max Normalization –

In this technique of data normalization, a linear transformation is performed on the original data.
The minimum and maximum values of the data are fetched and each value is replaced according to the
following formula.

$$v' = \frac{v - \min(A)}{\max(A) - \min(A)} \left(\text{new\_max}(A) - \text{new\_min}(A)\right) + \text{new\_min}(A)$$

Where A is the attribute data,

Min(A), Max(A) are the minimum and maximum values of A respectively.
v’ is the new value of each entry in data.
v is the old value of each entry in data.
new_max(A), new_min(A) are the max and min values of the target range (i.e., the boundary values
of the required range) respectively.
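
A minimal sketch of min-max normalization follows. The function name, the default target range of 0.0 to 1.0 and the sample values are assumptions made for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula divides by (max - min), so a constant attribute cannot be scaled.
        raise ValueError("all values are equal; min-max scaling is undefined")
    span = new_max - new_min
    return [(v - lo) / (hi - lo) * span + new_min for v in values]


data = [10, 20, 30]
unit_range = min_max_normalize(data)             # target range 0.0 to 1.0
symmetric = min_max_normalize(data, -1.0, 1.0)   # target range -1.0 to 1.0
print(unit_range, symmetric)
```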

Z-score normalization –

In this technique, values are normalized based on the mean and standard deviation of the data A.
The formula used is:

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

v’, v are the new and old values of each entry in the data respectively. σA, Ā are the standard
deviation and mean of A respectively.
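
A minimal sketch of z-score normalization follows. It assumes the population standard deviation (dividing by n); the text does not say whether the population or sample form is intended. The function name and sample values are invented.

```python
from math import sqrt


def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)  # population form, assumed
    if std == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    return [(v - mean) / std for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
z = z_score_normalize(data)
print(z)
```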

o Generalization − The data can also be transformed by generalizing it to a
higher-level concept. For this purpose we can use concept hierarchies.
Note − Data can also be reduced by some other methods such as wavelet transformation,
binning, histogram analysis, and clustering.
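
Generalization with a concept hierarchy can be sketched as a lookup from low-level values to higher-level concepts. The city-to-country mapping, the age boundaries and the function names below are invented for illustration. The age mapping also works as a simple form of binning.

```python
# Hypothetical concept hierarchy for a categorical attribute: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}


def generalize_city(city):
    """Replace a city with its parent concept in the hierarchy."""
    return CITY_TO_COUNTRY.get(city, "unknown")


def generalize_age(age):
    """Replace a numeric age with a higher-level label (assumed boundaries)."""
    if age < 30:
        return "youth"
    if age < 60:
        return "middle_aged"
    return "senior"


cities = [generalize_city(c) for c in ["Toronto", "Chicago", "Paris"]]
ages = [generalize_age(a) for a in [22, 45, 71]]
print(cities, ages)
```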

Comparison of Classification and Prediction Methods

Here are the criteria for comparing the methods of Classification and Prediction −
 Accuracy − The accuracy of a classifier refers to its ability to predict the
class label correctly, and the accuracy of a predictor refers to how well a given
predictor can guess the value of the predicted attribute for new data.
 Speed − This refers to the computational cost of generating and using the classifier or
predictor.
 Robustness − It refers to the ability of the classifier or predictor to make correct
predictions from noisy data.
 Scalability − Scalability refers to the ability to construct the classifier or predictor
efficiently given a large amount of data.
 Interpretability − It refers to the level of understanding and insight that the
classifier or predictor provides.
