Data Normalization in Data Mining
INTRODUCTION:
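Data normalization is a technique used in data mining to transform the values of a
dataset into a common scale. This is important because many machine learning
algorithms are sensitive to the scale of the input features and can produce better
results when the data is normalized.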
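There are several different normalization techniques that can be used in data
mining, including:
1. Min-Max normalization: This technique scales the values of a feature to a range
between 0 and 1. This is done by subtracting the minimum value of the feature
from each value, and then dividing by the range of the feature.
2. Z-score normalization: This technique scales the values of a feature to have a
mean of 0 and a standard deviation of 1. This is done by subtracting the mean of
the feature from each value, and then dividing by the standard deviation.
3. Decimal Scaling: This technique scales the values of a feature by dividing them
by a power of 10.
4. Logarithmic transformation: This technique applies a logarithmic transformation
to the values of a feature. This can be useful for data with a wide range of values,
as it compresses large values more strongly than small ones.
5. Root transformation: This technique applies a square root transformation to the
values of a feature. This can be useful for data with a wide range of values, as it
compresses large values more strongly than small ones.

It's important to note that normalization should be applied only to the input
features, not the target variable, and that different normalization techniques may
work better for different types of data and models.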
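
The logarithmic and root transformations mentioned in the list above can be sketched as follows. This is only an illustration: the function names and the sample `incomes` values are made up for this example, and using `log(1 + v)` instead of a plain logarithm is an assumption made here so that zero values can be handled. Both functions assume non-negative inputs.

```python
import math

def log_transform(values):
    """Apply log(1 + v) to each value; assumes every value is >= 0."""
    return [math.log1p(v) for v in values]

def sqrt_transform(values):
    """Apply the square root to each value; assumes every value is >= 0."""
    return [math.sqrt(v) for v in values]

# Hypothetical feature whose values span several orders of magnitude.
incomes = [0, 9, 99, 9999]

log_scaled = log_transform(incomes)
root_scaled = sqrt_transform(incomes)

print(log_scaled)
print(root_scaled)
```

Both transformations keep the ordering of the values while shrinking the gaps between large values much more than the gaps between small ones.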
In conclusion, normalization is an important step in data mining, as it can help to
improve the performance of machine learning algorithms by scaling the input
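features to a common scale. This can help to reduce the impact of outliers and
improve the accuracy of the model.

Normalization is used to scale the data of an attribute so that it falls in a smaller
range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally useful for classification
algorithms.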
Need of Normalization –
Normalization is generally required when we are dealing with attributes on
different scales. Otherwise, an equally important attribute on a lower scale may
lose its effectiveness because another attribute has values on a larger scale. In
simple words, when multiple attributes have values on different scales, this may
lead to poor data models while performing data mining operations. So they are
normalized to bring all the attributes onto the same scale.
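
To make the effect of mismatched scales concrete, here is a small sketch that compares Euclidean distances before and after scaling. The records, the attribute ranges, and the helper names are all hypothetical and chosen only for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max(v, lo, hi):
    """Scale v into [0, 1] given the attribute's assumed min and max."""
    return (v - lo) / (hi - lo)

# Hypothetical attribute ranges: age in years, income in currency units.
AGE_RANGE = (18, 70)
INCOME_RANGE = (20_000, 120_000)

def scale(record):
    age, income = record
    return (min_max(age, *AGE_RANGE), min_max(income, *INCOME_RANGE))

p = (25, 50_000)
q = (60, 50_500)  # very different age, almost the same income
r = (26, 52_000)  # almost the same age, slightly different income

# On the raw data, income dominates and q looks closer to p than r does.
raw_pq, raw_pr = euclidean(p, q), euclidean(p, r)

# After scaling, both attributes contribute and r is the closer record.
scaled_pq = euclidean(scale(p), scale(q))
scaled_pr = euclidean(scale(p), scale(r))

print(raw_pq, raw_pr)
print(scaled_pq, scaled_pr)
```

On the raw values the 35-year age gap between `p` and `q` is almost invisible next to the income differences. Once both attributes are on the same scale, the distance ordering flips.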
Decimal Scaling Method for Normalization –
It normalizes by moving the decimal point of the values of the data. To normalize the
data by this technique, we divide each value by a power of 10 that is chosen from the
maximum absolute value of the data. The data value v_i is normalized to v_i' by using
the formula below –

$$v_i' = \frac{v_i}{10^j}$$

where j is the smallest integer such that max(|v_i'|) < 1.

Example: let the input data be -10, 201, 301, -401, 501, 601, 701. To normalize the
above data:
Step 1: Find the maximum absolute value in the given data (m): 701.
Step 2: Divide the given data by 1000 (i.e. j = 3).
Result: The normalized data is -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701.
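
The decimal scaling steps can be sketched in a few lines of Python. The function name `decimal_scale` is an illustrative choice, not taken from any library; the input is the example data from this section.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest
    non-negative integer such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

data = [-10, 201, 301, -401, 501, 601, 701]
scaled, j = decimal_scale(data)

print(j)
print(scaled)
```

For this data the loop stops at j = 3, so every value is divided by 1000, which matches the worked example.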
Min-Max Normalization –
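In this technique of data normalization, a linear transformation is performed on the
original data. The minimum and maximum values of the data are fetched and each value
is replaced according to the following formula:

$$v' = \frac{v - \min(A)}{\max(A) - \min(A)} \bigl(\text{new\_max}(A) - \text{new\_min}(A)\bigr) + \text{new\_min}(A)$$

Where A is the attribute data, and min(A) and max(A) are the minimum and maximum
values of A respectively. v' is the new value of each entry in the data and v is the
old value of each entry in the data. new_max(A) and new_min(A) are the maximum and
minimum values of the required range (i.e. the boundary values of the range).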
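
A minimal sketch of min-max normalization, assuming the usual form of the formula with a configurable target range. The function name, the default range of 0 to 1, and the sample `marks` values are assumptions made for this example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)]
    onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    span = new_max - new_min
    return [(v - lo) / (hi - lo) * span + new_min for v in values]

marks = [8, 10, 15, 20]  # hypothetical attribute values

unit_range = min_max_normalize(marks)                 # target range 0.0 to 1.0
symmetric = min_max_normalize(marks, -1.0, 1.0)       # target range -1.0 to 1.0

print(unit_range)
print(symmetric)
```

The smallest value always maps to `new_min` and the largest to `new_max`. The zero-range check is a design choice for this sketch; a constant attribute could also be mapped to a fixed value instead of raising an error.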
Z-score normalization –
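In this technique, values are normalized based on the mean and standard deviation of
the data A. The formula used is:

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

where v' and v are the new and old values of each entry in the data, and \bar{A} and
\sigma_A are the mean and standard deviation of A respectively.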
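
Z-score normalization can be sketched with Python's standard `statistics` module. Using the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption made for this example, as are the function name and the sample data.

```python
import statistics

def z_score_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; cannot normalize")
    return [(v - mean) / std for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, std 2
z = z_score_normalize(data)

print(z)
print(statistics.fmean(z), statistics.pstdev(z))
```

After the transformation the values have a mean of 0 and a standard deviation of 1, so a value of 2.0 means "two standard deviations above the mean".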
ADVANTAGES OR DISADVANTAGES:
Data normalization in data mining can have a number of advantages and
disadvantages.
Advantages :
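1. Improved performance of machine learning algorithms: Normalization can help to
improve the performance of machine learning algorithms by scaling the input
features to a common scale. This can help to reduce the impact of outliers and
improve the accuracy of the model.
2. Better handling of outliers: Normalization can help to reduce the impact of
outliers by scaling the data to a common scale, which can make the outliers less
influential.
3. Improved interpretability of results: Normalization can make it easier to interpret
the results of a machine learning model, as the inputs will be on a common scale.
4. Better generalization: Normalization can help to improve the generalization of a
model, by reducing the impact of outliers and by making the model less sensitive
to the scale of the inputs.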
Disadvantages :
1. Loss of information: Normalization can result in a loss of information if the
original scale of the input features is important.
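2. Impact on outliers: Normalization can make it harder to detect outliers as they will
be scaled along with the rest of the data.
3. Impact on interpretability: Normalization can make it harder to interpret the
results of a machine learning model, as the inputs will be on a common scale,
which may not align with the original scale of the data.
4. Additional computational costs: Normalization can add additional computational
costs to the data mining process, as it requires additional processing time to scale
the data.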
In conclusion, data normalization can have both advantages and disadvantages. It
can improve the performance of machine learning algorithms and make it easier
to interpret the results. However, it can also result in a loss of information and
make it harder to detect outliers. It's important to weigh the pros and cons of data
normalization and carefully assess the risks and benefits before implementing it.
Article Contributed By: deepak_jain (@deepak_jain)