Data Preprocessing
• Data Quality
• Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation
Data Quality: Why Preprocess the Data?
Major Tasks in Data Preprocessing
• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes, or files
• Data reduction
– Dimensionality reduction
– Numerosity reduction
– Data compression
• Data transformation
– Normalization
– Concept hierarchy generation
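As an illustration of two of the tasks listed above, the following is a minimal sketch of one data cleaning step (filling in missing values) and one data transformation step (normalization). The function names `fill_missing` and `min_max_normalize`, the use of `None` to mark a missing value, the choice of the mean as the fill value, and the sample `ages` list are all assumptions made for this example; they are only one of several reasonable options for each task.

```python
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # assumed strategy; a median or a constant would also work
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map every one to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical attribute with one missing value.
ages = [20, None, 40, 60]
cleaned = fill_missing(ages)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

Cleaning is applied before normalization here because the minimum and maximum cannot be computed while missing entries are still present.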