3 1 Chapter 3 Normalization

This document discusses techniques for major data preprocessing including normalization, data transformation, data discretization, and data reduction. Normalization techniques such as range normalization and z-score normalization are presented. Data transformation strategies like histograms and clustering are overviewed.

Uploaded by

mazeen naser


Chapter 3-1

Techniques for Major Data Preprocessing

• Normalization

3/7/2023 1
• Data Transformation and Data Discretization
• This section presents methods of data transformation.
• In this preprocessing step, the data are transformed or consolidated so
that the resulting mining process may be more efficient, and the
patterns found may be easier to understand.

• Data Transformation Strategies Overview
In data transformation, the data are transformed or consolidated into
forms appropriate for mining.
• Strategies for data transformation include the following:

• Data Transformation by Normalization
Data Normalization
• Motivation: The measurement unit used can affect the data analysis.
• For example, changing measurement units from meters to inches for height, or from
kilograms to pounds for weight, may lead to very different results.
• To help avoid dependence on the choice of measurement units, the data should be normalized or
standardized.

• This involves transforming the data to fall within a smaller or common range such as [-1,1] or
[0.0, 1.0].

• Normalizing the data attempts to give all attributes an equal weight.

• Normalization is particularly useful for classification algorithms involving neural networks or
distance measurements such as nearest-neighbor classification and clustering
• If using the neural network backpropagation algorithm for classification mining, normalizing
the input values for each attribute will help speed up the learning phase.
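To illustrate why the choice of unit matters for distance-based methods, here is a minimal sketch. The three people and their measurements are hypothetical values chosen for illustration, and `euclidean` and `nearest` are helper names introduced here, not part of the slides.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(query, candidates):
    """Return the name of the candidate point closest to the query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))


# Hypothetical people described as (height in meters, weight in kilograms).
query_m = (1.60, 60.0)
others_m = {"B": (1.90, 61.0), "C": (1.61, 64.0)}

# The same people with height converted from meters to centimeters.
query_cm = (160.0, 60.0)
others_cm = {"B": (190.0, 61.0), "C": (161.0, 64.0)}

nearest_in_meters = nearest(query_m, others_m)
nearest_in_cm = nearest(query_cm, others_cm)
print(nearest_in_meters, nearest_in_cm)  # B C
```

With height in meters the weight attribute dominates the distance, so B is the nearest neighbor. With height in centimeters the height attribute dominates, so C is. Normalizing both attributes to a common range removes this dependence on units.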

• Range Normalization
Let X be an attribute and let x1, x2, ..., xn be a random sample drawn from X. In range
normalization each value is scaled by the sample range r̂ of X:

$$x_i' = \frac{x_i - \min_i\{x_i\}}{\hat{r}} = \frac{x_i - \min_i\{x_i\}}{\max_i\{x_i\} - \min_i\{x_i\}}$$

• After transformation the new attribute takes on values in the range [0, 1].
• Example: Let X take the values 12, 14, 18, 23. Normalize these data.
Solution:
Step 1: Find max X = 23 and min X = 12, so the range is max X - min X = 23 - 12 = 11.
Step 2: x1' = (12 - 12)/11 = 0, x2' = (14 - 12)/11 = 0.1818,
x3' = (18 - 12)/11 = 0.5455, x4' = (23 - 12)/11 = 1

Normalized X = 0, 0.1818, 0.5455, 1

Note: This rule can also be applied in Excel or MATLAB.
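The same calculation can also be sketched in Python. The function name `range_normalize` is an illustrative choice, and the guard against a zero range is an added assumption that the slides do not discuss.

```python
def range_normalize(values):
    """Scale each value to [0, 1] using the sample range (max - min)."""
    lo, hi = min(values), max(values)
    sample_range = hi - lo
    if sample_range == 0:
        # Assumption: constant attributes are rejected instead of dividing by zero.
        raise ValueError("all values are equal; the sample range is zero")
    return [(x - lo) / sample_range for x in values]


normalized = range_normalize([12, 14, 18, 23])
print([round(v, 4) for v in normalized])  # [0.0, 0.1818, 0.5455, 1.0]
```

The smallest value always maps to 0 and the largest to 1, which is why the transformed attribute lies in [0, 1].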

• Example 2: Consider the table below.

• Normalize the Income data using the rule above.

• Solution
• Second method: scaling by the maximum value
• Normalized xi = xi / max xi
• Example: Let X take the values 12, 14, 18, 23. Normalize these data.

• Solution: max xi = 23

• x1' = 12/23 = 0.5217, x2' = 14/23 = 0.6087, x3' = 18/23 = 0.7826,
x4' = 23/23 = 1
• The normalized data = 0.5217, 0.6087, 0.7826, 1
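A minimal Python sketch of this second method follows. The name `max_scale` is hypothetical, and the check for a zero maximum is an added assumption.

```python
def max_scale(values):
    """Divide every value by the largest value in the sample."""
    largest = max(values)
    if largest == 0:
        # Assumption: a zero maximum is rejected instead of dividing by zero.
        raise ValueError("the maximum value is zero")
    return [x / largest for x in values]


scaled_by_max = max_scale([12, 14, 18, 23])
print([round(v, 4) for v in scaled_by_max])  # [0.5217, 0.6087, 0.7826, 1.0]
```

Unlike range normalization, the smallest value does not map to 0 here. For positive data the result lies in (0, 1].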

• Example 3: Consider the table below.

• Normalize the Income data using the rule above.
• There are many methods for data normalization
1- Min-max normalization performs a linear transformation on the original
data. Suppose that min_A and max_A are the minimum and maximum values
of an attribute, A. Min-max normalization maps a value, v_i, of A to v_i' in the
range [new_min_A, new_max_A] by computing

$$v_i' = \frac{v_i - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$

• Min-max normalization preserves the relationships among the original data
values. It will encounter an "out-of-bounds" error if a future input case for
normalization falls outside of the original data range for A.
• Example: Min-max normalization. Suppose that the minimum and
maximum values for the attribute income are $12,000 and $98,000,
respectively. We would like to map income to the range [0.0, 1.0]. By min-max
normalization, a value of $73,600 for income is transformed to

$$\frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716$$
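The income example can be checked with a short Python sketch. The function name `min_max_normalize` and its parameter names are illustrative, and the error raised for a zero-width original range is an added assumption.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Linearly map v from [min_a, max_a] to [new_min, new_max]."""
    if max_a == min_a:
        # Assumption: a zero-width original range is treated as an error.
        raise ValueError("min_a and max_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


income = min_max_normalize(73_600, 12_000, 98_000)
print(round(income, 3))  # 0.716
```

Passing a different `new_min` and `new_max`, for example -1.0 and 1.0, maps the same value into another target range.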

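• In z-score normalization (or zero-mean normalization), the values for an
attribute, A, are normalized based on the mean (i.e., average) and standard
deviation of A. A value, v_i, of A is normalized to v_i' by computing

$$v_i' = \frac{v_i - \bar{A}}{\sigma_A}$$

where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation, respectively, of attribute A.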
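A minimal sketch of z-score normalization, applied to the sample values 12, 14, 18, 23 used earlier in these slides. The slides do not say whether the population or the sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`), and `z_score_normalize` is a hypothetical name.

```python
import statistics


def z_score_normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / std for v in values]


z_scores = z_score_normalize([12, 14, 18, 23])
print([round(z, 3) for z in z_scores])
```

After the transformation the values have mean 0 and standard deviation 1, so attributes measured in different units become directly comparable.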

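• Normalization by decimal scaling normalizes by moving the decimal point of the
values of attribute A. A value, v_i, of A is normalized to v_i' by computing

$$v_i' = \frac{v_i}{10^j}$$

• where j is the smallest integer such that max(|v_i'|) < 1.

Example 3.6 Decimal scaling. Suppose that the recorded values of A range
from -986 to 917. The maximum absolute value of A is 986. To normalize by
decimal scaling, we therefore divide each value by 1000 (i.e., j = 3) so that -986
normalizes to -0.986 and 917 normalizes to 0.917.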
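Example 3.6 can be reproduced with a short Python sketch. The function names are hypothetical; the loop simply searches for the smallest non-negative j that brings the largest absolute value below 1.

```python
def decimal_scaling_exponent(values):
    """Smallest non-negative integer j with max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j


def decimal_scale(values):
    """Return the scaled values together with the exponent j that was used."""
    j = decimal_scaling_exponent(values)
    return [v / 10 ** j for v in values], j


scaled, j = decimal_scale([-986, 917])
print(j, scaled)  # 3 [-0.986, 0.917]
```

Only the exponent j needs to be stored in order to apply the same scaling to later values.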

• Note that normalization can change the original data quite a bit, especially
when using z-score normalization or decimal scaling. It is also necessary to
save the normalization parameters (e.g., the mean and standard deviation
if using z-score normalization) so that future data can be normalized in a
uniform manner
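One way to keep the normalization parameters is sketched below. It assumes z-score normalization with the population standard deviation, and it uses JSON only as an example storage format. `ZScoreParams`, `fit_z_score`, and `apply_z_score` are hypothetical names introduced for illustration.

```python
import json
import statistics
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ZScoreParams:
    """Parameters learned from the original data."""
    mean: float
    std: float


def fit_z_score(training_values):
    """Compute the mean and (population) standard deviation once."""
    std = statistics.pstdev(training_values)
    if std == 0:
        raise ValueError("standard deviation is zero")
    return ZScoreParams(mean=statistics.fmean(training_values), std=std)


def apply_z_score(value, params):
    """Normalize a value with previously saved parameters."""
    return (value - params.mean) / params.std


params = fit_z_score([12, 14, 18, 23])
saved = json.dumps(asdict(params))            # persist the parameters
restored = ZScoreParams(**json.loads(saved))  # reload them later
future_value = apply_z_score(20, restored)    # normalize a new observation
print(restored, future_value)
```

The new observation is normalized with the stored mean and standard deviation. Recomputing them from the new data would put old and new values on different scales.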

• Data Reduction

• The basic idea of data reduction is to shrink the data representation, trading accuracy for
speed in response to the need to obtain quick approximate answers to queries on very large
databases. Some of the data reduction techniques are as follows:

1- Histograms
2- Clustering
3- Sampling
4- Construction of Index Trees
5- Singular Value Decomposition
6- Wavelets
7- Regression
8- Log-linear models
• Histograms
• Histograms use binning to approximate data distributions and are a popular
form of data reduction.
• A histogram for an attribute, A, partitions the data distribution of A into
disjoint subsets, referred to as buckets or bins.
• If each bucket represents only a single attribute–value/frequency pair, the
buckets are called singleton buckets.
• Often, buckets instead represent continuous ranges for the given attribute.
• Example 3.3 Histograms. The following data are a list of AllElectronics prices for
commonly sold items (rounded to the nearest dollar). The numbers have been
sorted: 1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14, 15, 15, 15, 15, 15,
15, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 25,
25, 25, 25, 25, 28, 28, 30, 30, 30.
• Reduce the data above using a histogram.
Solution
• Step 1: Record the data using singleton buckets (bins).
• Step 2: Fill in the table.

Price ($)   1  5  8  10  12  14  15  18  20  21  25  28  30
Frequency   2  5  2   4   1   3   6   8   7   4   5   2   3
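The frequency table can be derived from the sorted price list with a few lines of Python. This sketch uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Sorted AllElectronics prices from Example 3.3.
prices = [
    1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14,
    15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18,
    20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
    25, 25, 25, 25, 25, 28, 28, 30, 30, 30,
]

# One singleton bucket per distinct price: (price, frequency).
singleton_buckets = sorted(Counter(prices).items())

for price, frequency in singleton_buckets:
    print(f"{price:>3}  {frequency}")
```

The 52 original values are reduced to 13 (price, frequency) pairs without losing any information.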

• Draw the figure
[Figure: chart of the singleton buckets, plotting the Data and Frequency series for the 13 distinct prices]
• An equal-width histogram for price, where values are aggregated so that each
bucket has a uniform width of $10.

[Figure: equal-width histogram for price; x-axis: price buckets 1-10, 11-20, 21-30; y-axis: frequency]
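The bucket counts of the equal-width histogram can be computed from the singleton frequencies of Example 3.3. This is a minimal sketch: `equal_width_histogram` is a hypothetical helper, and starting the first bucket at $1 is an assumption chosen to match the 1-10, 11-20, 21-30 buckets.

```python
from collections import Counter

# Singleton buckets from Example 3.3: price -> frequency.
frequencies = {1: 2, 5: 5, 8: 2, 10: 4, 12: 1, 14: 3, 15: 6,
               18: 8, 20: 7, 21: 4, 25: 5, 28: 2, 30: 3}


def equal_width_histogram(freqs, width, start=1):
    """Aggregate value frequencies into buckets of uniform width."""
    buckets = Counter()
    for value, count in freqs.items():
        index = (value - start) // width        # which bucket the value falls in
        low = start + index * width             # inclusive lower bound
        buckets[(low, low + width - 1)] += count
    return dict(sorted(buckets.items()))


histogram = equal_width_histogram(frequencies, width=10)
for (low, high), count in histogram.items():
    print(f"{low}-{high}: {count}")
# 1-10: 13
# 11-20: 25
# 21-30: 14
```

Three numbers now summarize all 52 prices. The individual prices inside each bucket can no longer be recovered, which is the accuracy traded for a smaller representation.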

• Data Compression - The basic idea of this theory is to
compress the given data by encoding it in terms of the
following:
1- Decision Trees
2- Clusters
3- Association Rules
4- Bits
• Pattern Discovery - The basic idea of this theory is to
discover patterns occurring in a database. The following
areas contribute to this theory:
1- Machine Learning
2- Neural Networks
3- Association Mining
4- Sequential Pattern Matching
5- Clustering
• END
