Data Normalization and Standardization
Normalization
Normalization is the process of rescaling the data to a specific range, such as 0 to 1 or -1 to
+1. It is required when the ranges of different features differ greatly. This scaling method is
useful when the data set does not contain outliers, because a single extreme value sets min or
max and compresses all other values into a narrow band. The theoretical background of
normalization can be easily understood from Figure (1). To cast the data to the range 0, 1:
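From the proportionality shown in the figure:

$$\frac{\text{valueAfterNormalization} - 0}{1 - 0} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

$$\frac{\text{valueAfterNormalization}}{1} = \frac{\text{valueBeforeNormalization} - \min}{\max - \min}$$

or

$$x' = \frac{x - \min}{\max - \min}$$

where x is the value before normalization and x' is the value after normalization.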
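The [0, 1] formula can be sketched in a few lines of Python. This is only an illustration: the function name `min_max_normalize` and the sample values are assumptions, not part of the original text, and the check for max = min is an added guard against division by zero.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # max - min would be zero, so the formula is undefined.
        raise ValueError("all values are equal; cannot normalize")
    return [(x - lo) / (hi - lo) for x in values]


# Illustrative data (an assumption for this sketch).
sample = [-20, -6, 0, 40, 70, 120]
normalized = min_max_normalize(sample)
```

The smallest input maps to 0 and the largest to 1; every other value falls proportionally in between.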
Denormalization
This process should be carried out if normalization was applied and values on the original
scale are needed. For example, to denormalize data from the range 0, 1 the equation below
can be used:

$$x = x'(\max - \min) + \min$$

where x' is the normalized data and x is the denormalized data; min and max are the same values
used previously in the normalization process.
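Denormalization inverts the [0, 1] mapping: solving x' = (x - min)/(max - min) for x gives x = x'(max - min) + min. A minimal Python sketch follows; the function name `denormalize` and the sample numbers are illustrative assumptions.

```python
def denormalize(normalized, lo, hi):
    """Map values from [0, 1] back to the original range [lo, hi].

    lo and hi must be the same min and max used during normalization.
    """
    return [x * (hi - lo) + lo for x in normalized]


# Assumed example: the original data spanned -20 to 120.
restored = denormalize([0.0, 0.5, 1.0], lo=-20, hi=120)
```

Because the inverse needs the original min and max, those two values have to be stored alongside the normalized data.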
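To cast the data to the range -1, +1 instead:

$$x' = 2\,\frac{x - \min}{\max - \min} - 1$$

which can be rearranged as

$$x' = \frac{x - \left(\frac{\max}{2} + \frac{\min}{2}\right)}{\frac{\max - \min}{2}} = \frac{x - \frac{\max + \min}{2}}{\frac{\max - \min}{2}}$$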
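For the range -1, +1, the scaling x' = 2(x - min)/(max - min) - 1 and its centered form x' = (x - (max + min)/2) / ((max - min)/2) should give the same result. The sketch below, with hypothetical function names and assumed sample data, computes both so they can be compared.

```python
def normalize_symmetric(values):
    """Scale values to [-1, 1] using 2 * (x - min) / (max - min) - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot normalize")
    return [2 * (x - lo) / (hi - lo) - 1 for x in values]


def normalize_symmetric_centered(values):
    """Same mapping, written as (x - midpoint) / half_range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot normalize")
    midpoint = (hi + lo) / 2
    half_range = (hi - lo) / 2
    return [(x - midpoint) / half_range for x in values]


# Illustrative data (an assumption for this sketch).
sample = [-20, -6, 0, 40, 70, 120]
scaled_a = normalize_symmetric(sample)
scaled_b = normalize_symmetric_centered(sample)
```

The centered form makes the interpretation explicit: subtract the midpoint of the range, then divide by half of its width.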
Z-score standardization
Standardization rescales a data set so that its mean is 0 and its standard deviation is 1.
This scaling method is useful when the data follows a normal (Gaussian) distribution; if the
data does not follow a normal distribution, standardization can cause problems.
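For example, for the values -20, -6, 0, 40, 70, 120:

$$\text{Mean} = \frac{-20 - 6 + 0 + 40 + 70 + 120}{6} = 34$$

$$sd = 48.98979$$

Here sd is the population standard deviation (dividing by N = 6). The z-score of the first value is:

$$x'' = \frac{x - \text{mean}}{sd} = \frac{-20 - 34}{48.98979} = -1.1022$$

The remaining values give -0.8165, -0.69402, 0.122474, 0.734847, and 1.755468.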
Now, if you calculate the average and sd of these new values you will see that the mean
is zero and sd=1.
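The worked example can be reproduced with Python's standard `statistics` module. This is a sketch only: the variable names are illustrative, and it assumes the population standard deviation (`pstdev`, dividing by N), which is the choice that matches sd = 48.98979.

```python
import statistics

# Values from the worked example.
data = [-20, -6, 0, 40, 70, 120]

mean = statistics.mean(data)    # 34
sd = statistics.pstdev(data)    # population sd, about 48.98979

# z-score: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mean) / sd for x in data]

# The standardized values should have mean 0 and population sd 1,
# up to floating-point rounding.
check_mean = statistics.mean(z_scores)
check_sd = statistics.pstdev(z_scores)
```

Using the sample standard deviation (`statistics.stdev`, dividing by N - 1) would give slightly different z-scores, and their population sd would no longer be exactly 1.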
Important note:
However, the point must be made that normalization and standardization (N/S) are not
suitable where the raw measurement itself is desirable, or where the N/S is irreversible and
therefore loses much of the information in the raw measurement. This is according to a note
made by Kevin Hankins.