DMDW 5
Example:
- Let the input data be: -15, 121, 201, 421, 561, 601, 850
- To normalize the above data:
- Step 1: Maximum absolute value in the given data (m): 850
- Step 2: Divide the given data by 1000 (i.e., j = 3, the smallest integer j for which m / 10^j < 1: 850 / 1000 = 0.85)
- Result: The normalized data is: -0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85
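The two steps above amount to decimal scaling: v' = v / 10^j, where j is the smallest integer for which every |v'| < 1. Below is a minimal Python sketch that reproduces the example. The function name `decimal_scaling` is hypothetical and does not come from the slides.

```python
def decimal_scaling(values):
    """Decimal scaling: v' = v / 10**j, with j the smallest integer
    such that max(|v'|) < 1. Returns (scaled values, j)."""
    m = max(abs(v) for v in values)  # Step 1: maximum absolute value
    j = 0
    while m / 10**j >= 1:            # Step 2: grow j until m / 10**j < 1
        j += 1
    return [v / 10**j for v in values], j


data = [-15, 121, 201, 421, 561, 601, 850]
scaled, j = decimal_scaling(data)
print(j, scaled)  # 3 [-0.015, 0.121, 0.201, 0.421, 0.561, 0.601, 0.85]
```

Because the loop uses `>= 1`, a maximum that is an exact power of ten (e.g. 1000) produces j = 4. This keeps every scaled value strictly below 1.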
◦ Min-Max Normalization
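- In this technique of data normalization, a linear transformation is
performed on the original data.
- The minimum and maximum values of attribute A are found, and each value v
is replaced by v' according to the following formula:
$$v' = \frac{v - \min_A}{\max_A - \min_A}\left(\mathit{new\_max}_A - \mathit{new\_min}_A\right) + \mathit{new\_min}_A$$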
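Here is a minimal Python sketch of min-max normalization. It assumes the standard formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function name and the default target range [0, 1] are illustrative choices, not taken from the slides.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: the formula would divide by zero.
        raise ValueError("min-max normalization is undefined for a constant attribute")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


data = [-15, 121, 201, 421, 561, 601, 850]
print(min_max_normalize(data))          # -15 maps to 0.0, 850 maps to 1.0
print(min_max_normalize(data, -1, 1))   # same data mapped onto [-1, 1]
```

The minimum always lands on `new_min` and the maximum on `new_max`. Unlike decimal scaling, this output therefore depends on the observed range, so a later value outside [min, max] would fall outside the target range.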
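Normal distribution (mean $\mu$, standard deviation $\sigma$):
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .68$$
$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .95$$
$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .997$$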
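The three integrals above form the 68-95-99.7 rule for a normal distribution. Substituting z = (x - mu) / sigma turns each integral into P(|Z| < k) for a standard normal Z, and that probability equals erf(k / sqrt(2)). The sketch below checks the three values with Python's `math.erf`. The helper name is hypothetical.

```python
import math


def prob_within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2).
    After standardizing, this is P(|Z| < k) = erf(k / sqrt(2)),
    independent of mu and sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within_k_sigma(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```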
Data aggregation is any process in which data is brought together and
expressed in summary form. It is typically performed before a statistical
analysis.
It combines two or more attributes (or objects) into a single attribute (or
object). Data aggregation is a component of business intelligence (BI)
solutions.
Data aggregation is generally applied to big data or data marts that, in raw
form, do not provide enough information value.
Data aggregation is useful for everything from finance and business strategy
decisions to product, pricing, operations, and marketing strategies.
Purpose
◦ Data reduction
Reduce the number of attributes or objects
◦ Change of scale
Cities aggregated into regions, states, countries, etc.
Days aggregated into weeks, months, or years
◦ More “stable” data
Aggregated data tends to have less variability
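The "change of scale" purpose (days aggregated into months) can be sketched in a few lines of Python. The sales figures and the function name below are hypothetical and only illustrate the idea. Five daily records reduce to three monthly totals, which is also a small instance of data reduction.

```python
from collections import defaultdict
from datetime import date


def aggregate_by_month(daily):
    """Sum (date, amount) records into totals keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in daily:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


# Hypothetical daily sales figures, for illustration only.
daily_sales = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 20), 80.0),
    (date(2024, 2, 3), 200.0),
    (date(2024, 2, 14), 50.0),
    (date(2024, 3, 1), 75.0),
]
monthly = aggregate_by_month(daily_sales)
print(monthly)  # {(2024, 1): 200.0, (2024, 2): 250.0, (2024, 3): 75.0}
```

Summing is only one choice of aggregate. A mean, count, or maximum per month would follow the same grouping pattern.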
Examples of Data Aggregation
Companies often collect data from their online customers and website visitors.
For example, I use Google Analytics to see where my users are from, what
kind of content they like, etc.
For example,
• Google collects data in the form of cookies to show targeted
advertisements to its users.
• Facebook does the same, collecting and analyzing user information
to show ads to its users.
In marketing: you can aggregate the data from a particular campaign to see
how it performed over time and with specific cohorts.
In retail: companies must continually gather fresh information about
their competitors’ product offerings, promotions, and prices.
Techniques
◦ Principal Components Analysis (PCA)
◦ Singular Value Decomposition
◦ Others: supervised and non-linear techniques
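PCA and SVD are closely linked. One common way to compute PCA is to center the data and take its SVD, because the right singular vectors are the principal directions. The sketch below uses that approach and assumes NumPy is available; it is not the only way to compute PCA. The data is hypothetical: the second attribute is exactly twice the first, so the first component should capture essentially all of the variance.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X (n samples x d attributes) onto the top-k
    principal components, computed via SVD of the centered data.
    Returns (projected data, components, explained-variance ratio)."""
    Xc = X - X.mean(axis=0)                    # center each attribute
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)              # variance along each component
    ratio = var / var.sum()
    components = Vt[:k]                        # rows are principal directions
    return Xc @ components.T, components, ratio


# Hypothetical data: attribute 2 = 2 * attribute 1 (fully redundant).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x, 2 * x])
Z, comps, ratio = pca(X, k=1)
print(Z.shape)  # (5, 1)
print(ratio)    # first entry is (numerically) 1
```

Reducing from 2 attributes to 1 loses no information here, because the second attribute duplicates the first.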
Feature selection
Feature Extraction (reduction)
A process that chooses an optimal subset
of features according to an objective
function
Objectives
◦ To reduce dimensionality and remove noise
◦ To improve mining performance
Speed of learning
Predictive accuracy
Simplicity and comprehensibility of mined results
Another way to reduce the dimensionality of data
Redundant features
◦ Duplicate much or all of the information contained in
one or more other attributes
◦ Example: purchase price of a product and the amount of
sales tax paid
Irrelevant features
◦ Contain no information that is useful for the data mining
task at hand
◦ Example: students' ID is often irrelevant to the task of
predicting students' GPA
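The two kinds of unwanted features above can be illustrated with a simple correlation-based filter. This is one basic technique among many, and it is not presented as the method the slides have in mind. The filter drops a feature as irrelevant when its correlation with the target is weak, and as redundant when it is highly correlated with a feature already kept. The thresholds, function names, and student data below are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant column carries no information
    return cov / (sa * sb)


def select_features(features, target, relevance=0.1, redundancy=0.95):
    """features: dict of name -> column values. Returns the kept names."""
    score = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = []
    for name in sorted(features, key=score.get, reverse=True):
        if score[name] < relevance:
            continue  # irrelevant: little linear relation to the target
        if any(abs(pearson(features[name], features[k])) > redundancy for k in kept):
            continue  # redundant: duplicates a feature already kept
        kept.append(name)
    return kept


# Hypothetical data: study_minutes duplicates study_hours,
# and student_id is unrelated to GPA.
features = {
    "study_hours": [2, 4, 6, 8, 10],
    "study_minutes": [120, 240, 360, 480, 600],
    "student_id": [104, 101, 103, 105, 102],
}
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
print(select_features(features, gpa))  # keeps one of the two study-time columns
```

Pearson correlation only detects linear relationships. A feature with a non-linear link to the target could be wrongly discarded as irrelevant by this filter.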
Many techniques have been developed, especially for
classification
Create new attributes that can capture the
important information in a data set much
more efficiently than the original attributes
Reference Data
• Off-diagonal cells are the
misclassified sites.
➢ Overall Accuracy tells us, out of all of the reference sites, what
proportion were mapped correctly.
Overall Accuracy = (TP+TN)/(TP+TN+FP+FN)
➢ Producer's Accuracy is the map accuracy from the point of view of the
map maker (the producer).
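Both accuracy measures can be computed from a confusion matrix. The sketch below assumes that rows are the map (classified) classes and columns are the reference classes. Under that layout, overall accuracy is the diagonal sum divided by the total, and producer's accuracy for a class is its diagonal cell divided by that class's reference (column) total. The counts and function names are hypothetical. The binary case reproduces (TP+TN)/(TP+TN+FP+FN).

```python
def overall_accuracy(matrix):
    """Correctly mapped sites (diagonal) / all reference sites."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def producers_accuracy(matrix):
    """Per class: correctly mapped sites / reference sites of that class.
    Assumes rows = map classes, columns = reference classes."""
    n = len(matrix)
    return [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical binary counts.
tp, fn, fp, tn = 40, 10, 5, 45
binary = [
    [tp, fp],  # mapped positive: reference positive, reference negative
    [fn, tn],  # mapped negative: reference positive, reference negative
]
print(overall_accuracy(binary))    # (40 + 45) / 100 = 0.85
print(producers_accuracy(binary))  # [40/50, 45/50] = [0.8, 0.9]
```

The same functions work unchanged for a matrix with more than two classes.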