Data Normalization and Standardization

Peshawa Jammal Muhammad Ali

Department of Software Engineering, Koya University, Kurdistan Region, Iraq.
[email protected]

Please send me your comments by email so that I can improve this document.

Abstract
This paper aims to clarify how and why data are normalized or standardized. These two processes are used in the data preprocessing stage, in which the data are prepared to be processed later by a data mining or machine learning technique such as a support vector machine or a neural network. Both methods rescale the data set. They are helpful in some cases and necessary in others, and most data mining and machine learning tools, such as Weka and MATLAB, include both as preprocessing options. This paper briefly defines the two techniques and shows how they are used.

Normalization
Normalization is the process of casting the data into a specific range, such as [0, 1] or [-1, +1]. It is required when there are large differences between the ranges of different features. This scaling method is most useful when the data set does not contain outliers, because a single extreme value sets min or max and squeezes the remaining values into a narrow part of the target range. The theoretical background of normalization can be easily understood from Figure (1). To cast the data to the range [0, 1]:

From the similar triangles in Figure (1):

$$\frac{\text{valueAfterNormalization} - 0}{1 - 0} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\frac{\text{valueAfterNormalization}}{1} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\text{valueAfterNormalization} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

or

$$x' = \frac{x - \min}{\max - \min}$$
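As a minimal sketch of the [0, 1] formula in Python, assuming the data are a plain list of numbers: the function name `min_max_normalize` is hypothetical, min and max are taken from the list itself, and the sample values are the ones used in the z-score example later in this document.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] with x' = (x - min) / (max - min).

    Assumes at least two distinct values; otherwise max - min is zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("max and min are equal; the range is zero")
    return [(x - lo) / (hi - lo) for x in values]


data = [-20, -6, 0, 40, 70, 120]
print([round(v, 4) for v in min_max_normalize(data)])
# [0.0, 0.1, 0.1429, 0.4286, 0.6429, 1.0]
```

Because min and max come from the data, one outlier would stretch the range and push every other value toward one end of [0, 1].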

Denormalization
Denormalization reverses normalization, so it applies only to data that were normalized earlier. For example, to denormalize data from the range [0, 1], the following equation can be used:

$$x = [x' \cdot (\max - \min)] + \min$$

where x' is the normalized value, x is the denormalized value, and min and max are the same values used previously in the normalization process.
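The round trip can be sketched as follows, assuming min and max were saved from the normalization step. `min_max_denormalize` is a hypothetical helper name; the comparison uses a tolerance because floating-point arithmetic may not reproduce the original values bit for bit.

```python
import math


def min_max_denormalize(normalized, lo, hi):
    """Invert [0, 1] normalization: x = x' * (max - min) + min.

    lo and hi must be the same min and max used when normalizing.
    """
    return [xn * (hi - lo) + lo for xn in normalized]


data = [-20, -6, 0, 40, 70, 120]
lo, hi = min(data), max(data)                       # saved during normalization
normalized = [(x - lo) / (hi - lo) for x in data]   # x' = (x - min) / (max - min)
restored = min_max_denormalize(normalized, lo, hi)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(restored, data)))  # True
```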

To normalize the data to the range [-1, +1], see Figure (2):

$$\frac{\text{valueAfterNormalization} - (-1)}{1 - (-1)} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\frac{\text{valueAfterNormalization} + 1}{2} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\text{valueAfterNormalization} = 2\left(\frac{\text{valueBeforeNormalization} - \min}{\max - \min}\right) - 1$$

or

$$x' = 2\left(\frac{x - \min}{\max - \min}\right) - 1$$
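A minimal Python sketch of the [-1, +1] formula on the same example values; `normalize_minus1_plus1` is a hypothetical name, and min and max are assumed to come from the data.

```python
def normalize_minus1_plus1(values):
    """Scale values to [-1, +1] with x' = 2 * (x - min) / (max - min) - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("max and min are equal; the range is zero")
    return [2 * (x - lo) / (hi - lo) - 1 for x in values]


data = [-20, -6, 0, 40, 70, 120]
print([round(v, 4) for v in normalize_minus1_plus1(data)])
# [-1.0, -0.8, -0.7143, -0.1429, 0.2857, 1.0]
```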

Denormalization from the range [-1, +1]

$$x = \left[\left(\frac{x' + 1}{2}\right)(\max - \min)\right] + \min$$
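The inverse can be checked on three normalized values. This sketch assumes min = -20 and max = 120 (taken from the example data used in this document) and uses a hypothetical helper name. Note that x' = 0 maps back to the midpoint of the original range.

```python
def denormalize_minus1_plus1(normalized, lo, hi):
    """Invert [-1, +1] normalization: x = ((x' + 1) / 2) * (max - min) + min."""
    return [(xn + 1) / 2 * (hi - lo) + lo for xn in normalized]


print(denormalize_minus1_plus1([-1.0, 0.0, 1.0], -20, 120))
# [-20.0, 50.0, 120.0]
```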

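In WEKA, the formula for the range [-1, +1] is organized as follows:

$$
\begin{aligned}
x' &= 2\left(\frac{x - \min}{\max - \min}\right) - 1 \\
x' &= \frac{x - \min}{\frac{\max - \min}{2}} - 1 = \left[\frac{x - \min - \frac{\max - \min}{2}}{\frac{\max - \min}{2}}\right] \\
x' &= \left[\frac{x - \min - \frac{\max}{2} + \frac{\min}{2}}{\frac{\max - \min}{2}}\right] = \left[\frac{x - \frac{\max}{2} - \frac{\min}{2}}{\frac{\max - \min}{2}}\right] \\
x' &= \left[\frac{x - \left(\frac{\max}{2} + \frac{\min}{2}\right)}{\frac{\max - \min}{2}}\right] \\
x' &= \frac{x - \frac{\max + \min}{2}}{\frac{\max - \min}{2}}
\end{aligned}
$$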
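In the final form, each value is centred on the midpoint of the original range, (max + min)/2, and divided by half the range, (max - min)/2. The short Python check below confirms numerically that this form agrees with 2(x - min)/(max - min) - 1 on the example data; the function names are hypothetical and this is not WEKA's own code.

```python
import math


def weka_style(x, lo, hi):
    """x' = (x - (max + min) / 2) / ((max - min) / 2): midpoint-centred, half-range scaled."""
    return (x - (hi + lo) / 2) / ((hi - lo) / 2)


def direct_form(x, lo, hi):
    """x' = 2 * (x - min) / (max - min) - 1, the form derived for the range [-1, +1]."""
    return 2 * (x - lo) / (hi - lo) - 1


data = [-20, -6, 0, 40, 70, 120]
lo, hi = min(data), max(data)
print(all(math.isclose(weka_style(x, lo, hi), direct_form(x, lo, hi), abs_tol=1e-12)
          for x in data))  # True
```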
Z-score standardization
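Z-score standardization transforms a data set so that it has mean = 0 and standard deviation = 1. This scaling method is most useful when the data follow a normal (Gaussian) distribution. If the data do not follow a normal distribution, it can cause problems, for example when the resulting z-scores are interpreted as if they came from a normal distribution.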

Example: -20, -6, 0, 40, 70, 120

$$\text{Mean} = \frac{-20 - 6 + 0 + 40 + 70 + 120}{6} = \frac{204}{6} = 34$$

$$sd = \sqrt{\frac{(-20-34)^2 + (-6-34)^2 + (0-34)^2 + (40-34)^2 + (70-34)^2 + (120-34)^2}{6}}$$

$$sd = \sqrt{\frac{14400}{6}} = \sqrt{2400} = 48.98979$$
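The calculation above divides the sum of squared deviations by n = 6, i.e. it uses the population standard deviation. Dividing by n - 1 instead gives the sample standard deviation, which is a different number. A minimal Python check of both, using only the standard `math` and `statistics` modules:

```python
import math
import statistics

data = [-20, -6, 0, 40, 70, 120]

mean = sum(data) / len(data)  # 204 / 6
# Population standard deviation: divide the sum of squared deviations by n (here 6).
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

print(mean)                                        # 34.0
print(round(sd, 5))                                # 48.98979
print(math.isclose(sd, statistics.pstdev(data)))   # True
print(round(statistics.stdev(data), 5))            # 53.66563 (sample sd, divides by n - 1)
```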

z-score standardization

$$x'' = \frac{x - \text{mean}}{sd} = \frac{-20 - 34}{48.98979} = -1.10227$$

The other values are transformed in the same way, giving:

-20 → -1.10227
-6 → -0.8165
0 → -0.69402
40 → 0.122474
70 → 0.734847
120 → 1.755468
Computing the mean and standard deviation of these new values (dividing by n, as above) gives mean = 0 and sd = 1.
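Putting the steps together, a minimal sketch of z-score standardization with the population standard deviation, to match the worked example; `z_score_standardize` is a hypothetical name. The printed values agree with the list above to five decimal places, and the last two lines check the mean and standard deviation of the result.

```python
import math
import statistics


def z_score_standardize(values):
    """Return (x - mean) / sd for each value, using the population sd (divide by n)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; all values are equal")
    return [(x - mean) / sd for x in values]


data = [-20, -6, 0, 40, 70, 120]
z = z_score_standardize(data)
print([round(v, 5) for v in z])
# [-1.10227, -0.8165, -0.69402, 0.12247, 0.73485, 1.75547]
print(math.isclose(statistics.fmean(z), 0.0, abs_tol=1e-12))  # True
print(math.isclose(statistics.pstdev(z), 1.0))                # True
```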

Important note:

However, normalization and standardization (N/S) are _not_ appropriate where the raw measurement itself is desirable, or where the N/S is irreversible, because much of the information in the raw measurement is then lost. This point was raised in a note by Kevin Hankins ([email protected]).


