Data Preprocessing
By:
Parul Chauhan
Assistant Prof.
Data Mining
▶ Huge amounts of data are added every day to our computer
networks, the World Wide Web, and various storage
devices, from sources such as media, Facebook, and science.
▶ Example:
▶ Walmart handles hundreds of millions of transactions
per week at thousands of branches.
▶ 1. Missing Values
▶ 2. Noisy Data
iii. Regression
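Smoothing by regression fits the data to a function and replaces each observed value with the value the function predicts. A minimal sketch, assuming simple linear regression (one predictor x, a least-squares straight line); the helper names `fit_line` and `smooth_by_regression` are hypothetical:

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smooth_by_regression(xs, ys):
    """Replace each y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


print(smooth_by_regression([0, 1, 2], [0, 3, 0]))  # noisy middle point
```

Here the spike at x = 1 is flattened: the fitted line is y = 1, so every point is smoothed to 1.0.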
a) Smoothing by bin means
Bin 1: 9, 9, 9
Bin 2: 22, 22, 22
Bin 3: 29, 29, 29
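Smoothing by bin means sorts the values, partitions them into equal-frequency bins, and replaces every value in a bin with that bin's mean. A minimal sketch; the input prices 4, 8, 15, 21, 21, 24, 25, 28, 34 are a hypothetical example chosen because they produce the bins above (the original values are not shown on this slide), and the function names are hypothetical:

```python
def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's mean."""
    smoothed = []
    for b in bins:
        mean = sum(b) / len(b)
        smoothed.append([mean] * len(b))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical input
for i, b in enumerate(smooth_by_bin_means(equal_frequency_bins(prices, 3)), start=1):
    print(f"Bin {i}: " + ", ".join(f"{v:g}" for v in b))
```

This prints Bin 1: 9, 9, 9 / Bin 2: 22, 22, 22 / Bin 3: 29, 29, 29. Smoothing by bin medians or bin boundaries would change only `smooth_by_bin_means`.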
a) Smoothing by bin means
Bin 1: 451, 451, 451
Bin 2: 547, 547, 547
Bin 3: 656, 656, 656
Bin 4: 779, 779, 779
▶ For example,
▶ The data set will likely be huge! Complex data analysis and
mining on huge amounts of data can take a long time,
making such analysis impractical or infeasible.
a) Dimensionality reduction
c) Data compression
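Dimensionality reduction keeps fewer attributes while preserving most of the information. As one illustration, this sketch uses principal component analysis (PCA), a technique swapped in here since the slide does not name a specific method; it assumes NumPy is available, and `pca_reduce` is a hypothetical name:

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto their top-k principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centred data
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # shape (n_rows, k)


# Two perfectly correlated attributes collapse to a single component.
Z = pca_reduce([[1, 1], [2, 2], [3, 3], [4, 4]], k=1)
print(Z.shape)
```

The sign of each component is arbitrary in SVD, so only magnitudes and variance are meaningful when comparing results.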
▶ The following data are a list of AllElectronics prices for commonly sold
items (rounded to the nearest dollar). The numbers have been sorted:
1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15, 18, 18, 18,
18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 25, 25, 25, 25, 25, 28,
28, 30, 30, 30.
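One way to summarize these 52 sorted prices is an equal-width histogram, which stores only a count per price range instead of every value. A minimal sketch, assuming integer prices and buckets of width 10 ($1 to $10, $11 to $20, $21 to $30); the bucket width and the function name are assumptions, not taken from the slide:

```python
prices = [1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14,
          15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18,
          20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 25, 25, 25, 25, 25,
          28, 28, 30, 30, 30]


def equal_width_histogram(values, low, high, width):
    """Count integer values in buckets [low, low+width-1], [low+width, ...], ..."""
    counts = {}
    start = low
    while start <= high:
        end = start + width - 1
        counts[(start, end)] = sum(1 for v in values if start <= v <= end)
        start += width
    return counts


for (start, end), count in equal_width_histogram(prices, 1, 30, 10).items():
    print(f"{start}-{end}: {count}")
```

This prints 1-10: 13, 11-20: 25, and 21-30: 14, so three counts stand in for 52 values.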
b) Z-score
c) Decimal scaling
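Z-score normalization maps a value v of attribute A to v' = (v − mean_A) / σ_A, and decimal scaling maps it to v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1. A minimal sketch of both; it assumes the population standard deviation for σ_A, and the function names are hypothetical:

```python
from statistics import mean, pstdev


def z_score(values):
    """Normalize to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making every |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scaling([-986, 917]))
```

For the range −986 to 917, the largest absolute value is 986, so j = 3 and the values become −0.986 and 0.917.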