DMDW 5

Normalization is a process used to standardize data values that are measured on different scales. It is often necessary prior to performing data analysis to avoid attributes with larger ranges dominating over others. There are several normalization methods including min-max normalization, z-score normalization, and decimal scaling. Dimensionality reduction techniques can also be applied to reduce the number of random variables under consideration by obtaining a set of principal variables.


Dr. Amiya Ranjan Panda


 Normalization is generally required when multiple attributes have values on different scales; otherwise this may lead to poor data models while performing data mining operations.
 Without it, the effectiveness of an equally important attribute (on a lower scale) may be diluted because another attribute has values on a larger scale.
 Heterogeneous data with different units usually needs to be normalized. If the data share the same unit and the same order of magnitude, normalization may not be necessary.
 Unless normalized during pre-processing, variables with disparate ranges or varying precision acquire different driving values (influence) in the analysis.
 Normalization is normally done when a distance computation is involved in the algorithm.
 Methods of Data Normalization:
◦ Decimal Scaling
◦ Min-Max Normalization
◦ z-Score Normalization (zero-mean Normalization)

 There are several approaches to normalization which can be used in deep learning models:
 Batch Normalization
 Layer Normalization
 Group Normalization
 Instance Normalization
 Weight Normalization
◦ Decimal Scaling Method For Normalization
- It normalizes by moving the decimal point of the values of the data.
- To normalize the data by this technique, we divide each value of the data by the maximum absolute value of the data.
- The data value vi is normalized to v'i by using the formula
    v'i = vi / 10^j
- where j is the smallest integer such that max(|v'i|) < 1.

In this technique, the computation is scaled in terms of decimals; that is, each value is scaled (divided) by a power of ten, 10^j.

Example:
- Let the input data be: -15, 121, 201, 421, 561, 601, 850
- To normalize the above data:
- Step 1: Maximum absolute value in the given data: 850
- Step 2: Divide the given data by 1000 (i.e. j = 3)
- Result: The normalized data is: -0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85
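
A minimal sketch of decimal scaling in Python (added here for illustration; the function name and driver code are our own, not from the slides):

# Decimal scaling: divide every value by 10^j, where j is the smallest
# integer such that max(|v'|) < 1.
def decimal_scaling(values):
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

data = [-15, 121, 201, 421, 561, 601, 850]
print(decimal_scaling(data))
# [-0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85]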
◦ Min-Max Normalization
- In this technique of data normalization, a linear transformation is performed on the original data.
- The minimum and maximum values of the data are fetched and each value is replaced according to the following formula:
    v' = ((v - min(A)) / (max(A) - min(A))) * (new_max(A) - new_min(A)) + new_min(A)
- Where A is the attribute,
- min(A), max(A) are the minimum and maximum values of A respectively,
- v is the old value of each entry in the data,
- v' is the new value of each entry in the data,
- new_min(A), new_max(A) are the minimum and maximum values of the target range (i.e. the boundary values of the required range) respectively.

Example: if we normalize the Marks column into the range 0 to 1, we get the following:

  Roll No   Marks   Normalized Marks
  1         10      0
  2         15      0.1
  3         50      0.8
  4         60      1
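
A minimal sketch of min-max normalization reproducing the Marks example above (illustrative Python, not from the slides):

# Min-max normalization: linearly rescale values into [new_min, new_max].
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    return [
        (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
        for v in values
    ]

marks = [10, 15, 50, 60]
print(min_max_normalize(marks))   # [0.0, 0.1, 0.8, 1.0]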
◦ z-Score Normalization (zero-mean Normalization)
- In this technique, values are normalized based on the mean and standard deviation of the data A.
- It is also called the Standard Deviation method.
- The data can be normalized using the z-score, computed with the formula below:
    v' = (v - μ) / σ
- where μ is the mean and σ is the standard deviation of A,
- v is the old value of each entry in the data,
- v' is the z-score-normalized value of each entry in the data.
Example: the mean is 33.75 and the (sample) standard deviation is approximately 24.96:

  Roll No   Marks   z-score
  1         10      -0.951587303
  2         15      -0.751253134
  3         50       0.651086049
  4         60       1.051754387
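
A minimal sketch of z-score normalization reproducing the table above (illustrative Python; the slides' figures correspond to the sample standard deviation, i.e. the n - 1 denominator):

import statistics

# z-score: subtract the mean and divide by the standard deviation.
def z_score_normalize(values):
    mean = statistics.mean(values)
    std = statistics.stdev(values)   # sample standard deviation (n - 1)
    return [(v - mean) / std for v in values]

marks = [10, 15, 50, 60]
print([round(z, 4) for z in z_score_normalize(marks)])
# [-0.9516, -0.7513, 0.6511, 1.0518]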
z-Score Normalization (zero-mean Normalization)
The normal distribution is a probability function that describes how the
values of a variable are distributed.
No matter what μ and σ are,
the area between μ-σ and μ+σ is about 68%;
the area between μ-2σ and μ+2σ is about 95%; and
the area between μ-3σ and μ+3σ is about 99.7%.
Almost all values fall within 3 standard deviations.
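
As a quick numerical check of this 68-95-99.7 rule (the integrals below state it formally), here is a short sketch; the use of scipy.stats is an assumption of this write-up, not part of the slides:

from scipy.stats import norm

# The coverage within k standard deviations is the same for every mu and sigma,
# so checking the standard normal distribution is enough.
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973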

∫ from μ-σ to μ+σ of (1/(σ√(2π))) e^(-(x-μ)²/(2σ²)) dx ≈ 0.68

∫ from μ-2σ to μ+2σ of (1/(σ√(2π))) e^(-(x-μ)²/(2σ²)) dx ≈ 0.95

∫ from μ-3σ to μ+3σ of (1/(σ√(2π))) e^(-(x-μ)²/(2σ²)) dx ≈ 0.997
 Data aggregation is any process in which data is brought together and
conveyed in a summary form. It is typically used prior to the performance
of a statistical analysis.
 Combining two or more attributes (or objects) into a single attribute (or
object). Data aggregation is an element of business intelligence (BI)
solutions.
 Data aggregation generally works on big data or data marts that do not provide enough information value as a whole.
 Data aggregation is useful for everything from finance or business strategy
decisions to product, pricing, operations, and marketing strategies.
 Purpose
◦ Data reduction
 Reduce the number of attributes or objects
◦ Change of scale
 Cities aggregated into regions, states, countries, etc.
 Days aggregated into weeks, months, or years
◦ More “stable” data
 Aggregated data tends to have less variability
Examples of Data Aggregation
Companies often collect data from their online customers and website visitors.
For example, I use Google Analytics to see where my users are from, what kind of content they like, and so on.
For example,
• Google collects data in the form of cookies to show targeted advertisements to its users.
• Facebook does the same thing by collecting and analyzing information and showing ads to its users.
In marketing: you can aggregate your data from a particular campaign, looking at how it performed over time and with specific cohorts.

The retail industry: retailers must continuously gather fresh information about their competitors' product offerings, promotions, and prices.

The travel industry: uses include competitive price monitoring, competitor research, gaining market intelligence, customer sentiment analysis, and capturing images and descriptions for the services on their online travel sites.

The healthcare industry: data aggregation can help maintain transparency and trust between healthcare providers and patients.
 Time aggregation
◦ Aggregating data points for a single resource over a specified period.
 Spatial aggregation
◦ Aggregating data points for a group of resources over a specified period (see the sketch below).
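
A minimal sketch of time and spatial aggregation with pandas (the column names and sample values are illustrative assumptions, not from the slides):

import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "region": ["East", "West", "East", "West", "East", "West"],
    "sales": [100, 80, 120, 90, 110, 95],
})

# Time aggregation: roll daily data points up to weekly totals.
weekly = df.resample("W", on="date")["sales"].sum()

# Spatial aggregation: summarize data points per group of resources (regions).
by_region = df.groupby("region")["sales"].agg(["sum", "mean"])

print(weekly)
print(by_region)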
 Aggregating data can be a remarkably manual process, especially if you need it in the early stages.
➢ Go through an Excel sheet and reformat it so it looks like your other data sources.
➢ Then create charts to compare the performance/budget/progress of your multiple analyses.

 If you want to go for the automated process, it typically means implementing third-party software/code/algorithms, sometimes called middleware, that can pull data automatically from your database sources.
 So, manual and automated data aggregation are both possible, depending on your domain's requirements.
 When dimensionality increases, data becomes increasingly sparse in the space that it occupies.
 The curse of dimensionality basically means that the error increases with the increase in the number of features.
 Complexity (running time) increases with the dimension d.
 If we have more features than observations, we run the risk of massively overfitting our model; this would generally result in terrible out-of-sample performance.
 Definitions of density and distance between points, which are critical for clustering and outlier detection, become less meaningful (illustrated in the sketch below).
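
The following small experiment (an illustration added here, not from the slides) shows this effect: as the dimension grows, the gap between the nearest and farthest neighbour of a query point shrinks relative to the nearest distance, so distance-based notions lose contrast.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))          # 500 random points in d dimensions
    query = rng.random(d)                  # a random query point
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d = {d:5d}   relative distance contrast = {contrast:.3f}")
# The contrast shrinks toward 0 as d increases.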
 Dimensionality reduction is a method of converting high-dimensional variables into lower-dimensional variables while retaining the essential information in the data.
 Dimensionality reduction is used to reduce the feature space to a set of principal features.
 Purpose:
◦ Avoid curse of dimensionality
◦ Reduce amount of time and memory required by data mining algorithms
◦ Allow data to be more easily visualized
◦ May help to eliminate irrelevant features or reduce noise

 Techniques
◦ Principal Components Analysis (PCA)
◦ Singular Value Decomposition
◦ Others: supervised and non-linear techniques
 Feature selection
 Feature Extraction (reduction)
A process that chooses an optimal subset of features according to an objective function
 Objectives
◦ To reduce dimensionality and remove noise
◦ To improve mining performance
 Speed of learning
 Predictive accuracy
 Simplicity and comprehensibility of mined results
 Another way to reduce dimensionality of data
 Redundant features
◦ Duplicate much or all of the information contained in
one or more other attributes
◦ Example: purchase price of a product and the amount of
sales tax paid
 Irrelevant features
◦ Contain no information that is useful for the data mining
task at hand
◦ Example: students' ID is often irrelevant to the task of
predicting students' GPA
 Many techniques developed, especially for
classification
 Create new attributes that can capture the
important information in a data set much
more efficiently than the original attributes

 Three general methodologies:


◦ Feature extraction
 Example: extracting edges from images
◦ Feature construction
 Example: dividing mass by volume to get density
◦ Mapping data to new space
 Example: Fourier and wavelet analysis
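
A small sketch of feature construction and of mapping data to a new space (illustrative Python with NumPy; the signal and attributes are made-up examples, not from the slides):

import numpy as np

# Feature construction: divide mass by volume to get density.
mass = np.array([10.0, 20.0, 30.0])
volume = np.array([2.0, 5.0, 10.0])
density = mass / volume                    # a new, more informative attribute

# Mapping to a new space: describe a signal by its frequency content (Fourier).
t = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
spectrum = np.abs(np.fft.rfft(signal))
print(density)                             # [5. 4. 3.]
print(np.sort(spectrum.argsort()[-2:]))    # dominant frequency bins: [ 5 20 ]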
 Feature reduction refers to the mapping of the
original high-dimensional data onto a lower
dimensional space
 Given a set of data points {x1, x2, ..., xn} in d dimensions, compute their low-dimensional representation:
xi ∈ R^d → yi ∈ R^p (p << d)
The criterion for feature reduction can differ based on the problem setting:
◦ Unsupervised setting: minimize the information loss
◦ Supervised setting: maximize the class discrimination
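
A minimal sketch of this mapping using PCA from scikit-learn (PCA follows the unsupervised criterion of keeping the directions that preserve the most variance; the random data is purely illustrative, not from the slides):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))         # n = 100 points in d = 10 dimensions

pca = PCA(n_components=2)              # target dimension p = 2 (p << d)
Y = pca.fit_transform(X)               # the low-dimensional representation y_i

print(Y.shape)                         # (100, 2)
print(pca.explained_variance_ratio_)   # variance retained by each component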
 Feature reduction
◦ All original features are used
◦ The transformed features are linear combinations
of the original features
 Feature selection
◦ Only a subset of the original features are selected
 Filter model
◦ Separating feature selection from classifier learning
◦ Relying on general characteristics of data (information,
distance, dependence, consistency)
◦ No bias toward any learning algorithm, fast
 Wrapper model
◦ Relying on a predetermined classification algorithm
◦ Using predictive accuracy as goodness measure
◦ High accuracy, computationally expensive
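
A hedged sketch contrasting the two models with scikit-learn (the dataset and the choice of estimator are illustrative assumptions, not from the slides):

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter model: rank features by a general statistic (ANOVA F-score),
# independently of any learning algorithm - fast, no learner bias.
filter_sel = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("Filter keeps features:", filter_sel.get_support(indices=True))

# Wrapper model: recursive feature elimination driven by a predetermined
# classifier - usually more accurate but computationally more expensive.
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("Wrapper keeps features:", wrapper_sel.get_support(indices=True))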
 Typical Error Matrix (columns = Reference Data, rows = Classified Data):

                             Reference Data
                             Class A            Class B
 Classified Data   Class A   TRUE POSITIVE      FALSE POSITIVE
                   Class B   FALSE NEGATIVE     TRUE NEGATIVE

• Diagonals represent sites classified correctly according to reference data.
• Off-diagonals were misclassified.
➢ Overall Accuracy essentially tells us, out of all of the reference sites, what proportion were mapped correctly.
Overall Accuracy = (TP+TN)/(TP+TN+FP+FN)

➢ Individual Class Accuracy is calculated by dividing the number of correctly classified pixels in each category by either the total number of pixels in the corresponding column (Producer's accuracy) or the corresponding row (User's accuracy).

➢ Producer's Accuracy is the map accuracy from the point of view of the map maker (the producer).
Producerʼs Accuracy (Class A) = TP/(TP+FN)
Producerʼs Accuracy (Class B) = TN/(FP+TN)
➢ User's Accuracy is the accuracy from the point of view of a map user, not the map maker. The User's accuracy essentially tells us how often the class on the map will actually be present on the ground. This is referred to as reliability.
Userʼs Accuracy (Class A) = TP/(TP+FP)
Userʼs Accuracy (Class B) = TN/(TN+FN)
Overall Accuracy = (TP+TN)/(TP+TN+FP+FN)
Producerʼs Accuracy (Class A) = TP/(TP+FN)
Producerʼs Accuracy (Class B) = TN/(FP+TN)
Userʼs Accuracy (Class A) = TP/(TP+FP)
Userʼs Accuracy (Class B) = TN/(TN+FN)
Accuracy on preceding slide:
• Overall Accuracy = 92.4%
• Producerʼs Accuracy (Class A) = 89.9%
• Producerʼs Accuracy (Class B) = 94.7%
• Userʼs Accuracy (Class A) = 94.2%
• Userʼs Accuracy (Class B) = 90.7%
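
A minimal sketch that computes these measures for a two-class error matrix (the TP/FP/FN/TN counts below are made-up illustrative values, not the ones behind the percentages above):

# Accuracy measures derived from a 2x2 error matrix.
def accuracy_measures(tp, fp, fn, tn):
    return {
        "overall":    (tp + tn) / (tp + tn + fp + fn),
        "producer_A": tp / (tp + fn),   # column-wise (reference class A)
        "producer_B": tn / (fp + tn),   # column-wise (reference class B)
        "user_A":     tp / (tp + fp),   # row-wise (classified as A)
        "user_B":     tn / (tn + fn),   # row-wise (classified as B)
    }

print(accuracy_measures(tp=50, fp=5, fn=10, tn=85))
# {'overall': 0.9, 'producer_A': 0.833..., 'producer_B': 0.944...,
#  'user_A': 0.909..., 'user_B': 0.894...}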
