Data Normalization

The document discusses data normalization techniques in data mining. It explains that normalization transforms data values into a common scale to improve machine learning algorithm performance by making the data less sensitive to differences in feature scales. Common normalization methods include min-max normalization, z-score normalization, decimal scaling, and logarithmic/root transformations. Normalization can improve accuracy but may also result in information loss or make outliers harder to detect. Overall, normalization is useful for data preprocessing but its costs and benefits must be considered for each application.

Data Normalization in Data Mining


INTRODUCTION:

Data normalization is a technique used in data mining to transform the values of a dataset into a common scale. This is important because many machine learning algorithms are sensitive to the scale of the input features and can produce better results when the data is normalized.

There are several different normalization techniques that can be used in data mining, including:

1. Min-Max normalization: This technique scales the values of a feature to a range between 0 and 1. This is done by subtracting the minimum value of the feature from each value and then dividing by the range of the feature.
2. Z-score normalization: This technique scales the values of a feature to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean of the feature from each value and then dividing by the standard deviation.
3. Decimal scaling: This technique scales the values of a feature by dividing them by a power of 10.
4. Logarithmic transformation: This technique applies a logarithmic transformation to the values of a feature. This can be useful for data with a wide range of values, as it can help to reduce the impact of outliers.
5. Root transformation: This technique applies a square root transformation to the values of a feature. Like the logarithmic transformation, it can be useful for data with a wide range of values, as it can help to reduce the impact of outliers (see the sketch after this list).

It is important to note that normalization should be applied only to the input features, not the target variable, and that different normalization techniques may work better for different types of data and models.
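As a quick illustration of items 4 and 5, here is a minimal Python sketch of the logarithmic and square-root transformations. The sample array, and the use of log1p (log(1 + v)) to keep zero values finite, are assumptions made for this example rather than part of the original article.

```python
# Minimal sketch of logarithmic and root transformations (items 4 and 5).
# Assumes non-negative feature values; log1p is used so zeros stay finite.
import numpy as np

values = np.array([0.0, 1.0, 10.0, 100.0, 1000.0, 10000.0])  # wide-range feature

log_transformed = np.log1p(values)    # log(1 + v) compresses large values
root_transformed = np.sqrt(values)    # square root gives a milder compression

print(log_transformed)
print(root_transformed)
```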

In summary, normalization is an important step in data mining, as it can help to improve the performance of machine learning algorithms by scaling the input features to a common scale. This can help to reduce the impact of outliers and improve the accuracy of the model.

Normalization is used to scale the data of an attribute so that it falls in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally useful for classification algorithms.

Need for Normalization –

Normalization is generally required when we are dealing with attributes on different scales; otherwise, an equally important attribute (on a lower scale) may be diluted in effectiveness because another attribute has values on a larger scale. In simple words, when multiple attributes have values on different scales, data mining operations may produce poor models. The attributes are therefore normalized to bring them all onto the same scale.

Methods of Data Normalization –

Decimal Scaling
Min-Max Normalization
z-Score Normalization (zero-mean Normalization)

Decimal Scaling Method For Normalization –

It normalizes the data by moving the decimal point of its values. To normalize the data by this technique, we divide each value by 10^j, where j is chosen from the maximum absolute value of the data. A data value vi is normalized to vi' using the formula below:

vi' = vi / 10^j

where j is the smallest integer such that max(|vi'|) < 1.

Example –

Let the input data be: -10, 201, 301, -401, 501, 601, 701. To normalize the above data:

Step 1: The maximum absolute value in the given data is 701.
Step 2: Divide the given data by 1000 (i.e., j = 3).

Result: The normalized data is: -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701
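A minimal Python sketch of decimal scaling that reproduces this example; the helper name decimal_scale and the way j is computed from the maximum absolute value are illustrative choices, not part of the original article.

```python
# Minimal sketch of decimal scaling: divide every value by 10^j, where j is
# the smallest integer that makes max(|vi'|) < 1.
import math

def decimal_scale(values):
    max_abs = max(abs(v) for v in values)
    j = math.floor(math.log10(max_abs)) + 1 if max_abs > 0 else 0
    return [v / (10 ** j) for v in values], j

data = [-10, 201, 301, -401, 501, 601, 701]
normalized, j = decimal_scale(data)
print(j)           # 3, so the data is divided by 1000
print(normalized)  # [-0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701]
```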

Min-Max Normalization –

In this technique of data normalization, a linear transformation is performed on the original data. The minimum and maximum values of the data are fetched and each value is replaced according to the following formula:

v' = ((v - Min(A)) / (Max(A) - Min(A))) * (new_max(A) - new_min(A)) + new_min(A)

where A is the attribute data, Min(A) and Max(A) are the minimum and maximum values of A respectively, v is the old value of an entry, v' is its new value, and new_min(A) and new_max(A) are the boundary values of the required target range.
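A minimal Python sketch of this formula; the function and parameter names (min_max_normalize, new_min, new_max) and the sample list are illustrative assumptions, with new_min and new_max defaulting to the common [0, 1] range.

```python
# Minimal sketch of min-max normalization following the formula above.
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:                      # constant attribute: map everything to new_min
        return [new_min for _ in values]
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]

marks = [8.0, 10.0, 15.0, 20.0]
print(min_max_normalize(marks))              # scaled to [0, 1]
print(min_max_normalize(marks, 0.0, 100.0))  # scaled to [0, 100]
```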

Z-score normalization –

In this technique, values are normalized based on the mean and standard deviation of the data A. The formula used is:

v' = (v - Ā) / σA

where v and v' are the old and new values of each entry in the data respectively, and Ā and σA are the mean and standard deviation of A respectively.
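A minimal Python sketch of z-score normalization as defined above; the use of the population standard deviation (statistics.pstdev) and the sample data are assumptions made for this example.

```python
# Minimal sketch of z-score (zero-mean) normalization: subtract the mean of
# the attribute and divide by its standard deviation.
import statistics

def z_score_normalize(values):
    mean = statistics.mean(values)     # Ā, the mean of A
    std = statistics.pstdev(values)    # σA, the (population) standard deviation of A
    return [(v - mean) / std for v in values]

data = [10.0, 20.0, 30.0, 40.0, 50.0]
print(z_score_normalize(data))  # resulting values have mean 0 and standard deviation 1
```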

ADVANTAGES AND DISADVANTAGES:

Data normalization in data mining can have a number of advantages and disadvantages.

Advantages:

1. Improved performance of machine learning algorithms: Normalization can help to improve the performance of machine learning algorithms by scaling the input features to a common scale. This can help to reduce the impact of outliers and improve the accuracy of the model.
2. Better handling of outliers: Normalization can help to reduce the impact of outliers by scaling the data to a common scale, which can make the outliers less influential.
3. Improved interpretability of results: Normalization can make it easier to interpret the results of a machine learning model, as the inputs will be on a common scale.
4. Better generalization: Normalization can help to improve the generalization of a model by reducing the impact of outliers and by making the model less sensitive to the scale of the inputs.

Disadvantages:

1. Loss of information: Normalization can result in a loss of information if the original scale of the input features is important.
2. Impact on outliers: Normalization can make it harder to detect outliers, as they will be scaled along with the rest of the data.
3. Impact on interpretability: Normalization can make it harder to interpret
the results of a machine learning model, as the inputs will be on a
common scale, which may not align with the original scale of the data.
4. Additional computational costs: Normalization can add additional
computational costs to the data mining process, as it requires additional
processing time to scale the data.
In conclusion, data normalization has both advantages and disadvantages. It can improve the performance of machine learning algorithms and make it easier to interpret the results. However, it can also result in a loss of information and make it harder to detect outliers. It is important to weigh the pros and cons of data normalization and carefully assess the risks and benefits before implementing it.
