3 Data Preprocessing
• Data Integration
• Data Transformation
• Data Reduction
Data Cleaning
• First step of data pre-processing
– Handling noise
– Correcting inconsistency
• If the data contains missing values for some of its attributes, then
– They can be handled in one of the following ways:
• Use the mean of the attribute value to fill the missing value
– Used when the tuple has several attributes with missing values
– Bayesian formalism
– Etc.
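The mean-fill option above can be sketched as follows. This is a minimal illustration, assuming missing values are represented as None; the function name fill_missing_with_mean and the sample ages are hypothetical and not from the slides.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Illustrative attribute column with two missing entries (assumed data).
ages = [20, None, 40, 60, None]
filled_ages = fill_missing_with_mean(ages)
print(filled_ages)
```

Mean filling keeps every tuple but shrinks the attribute's variance, so it is usually a reasonable choice only when few values are missing.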
Noise
• Noise
• A random error or variance in a measured variable
• Limitation of technology
Regression
Clustering
Handling Noise
Binning
• This method works on
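A short sketch of binning as a noise-handling step. The slides do not say which variant is meant, so this assumes one common form: sort the values, split them into equal-frequency bins, and replace each value with its bin mean. The function name smooth_by_bin_means and the price values are illustrative.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into consecutive bins of bin_size items,
    and replace every value with the mean of its bin."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        # Every member of the bin is replaced by the same bin mean.
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Illustrative data (assumed): nine prices, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)
print(smoothed_prices)
```

Because each value is replaced using only its sorted neighbours, this is a local smoothing: larger bins smooth more aggressively but blur more detail.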
• Different techniques
– Wavelet Transform
• Two types
– Parametric
• Stores only the model parameters instead of the original data
• Methods: regression and log-linear models
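To illustrate the parametric idea, the sketch below fits a straight line by ordinary least squares and keeps only two parameters (slope and intercept) in place of the original points. It assumes simple linear regression on one attribute; the function name fit_line and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data (assumed): points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

# The reduced representation is just these two numbers.
slope, intercept = fit_line(xs, ys)

# Values are reconstructed (approximately, in general) from the parameters.
approx_ys = [slope * x + intercept for x in xs]
print(slope, intercept)
```

With real data the reconstruction is only approximate, so the saving in storage is traded against the residual error of the fitted model.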
– Non-Parametric