
data preprocessing

Chapter 2 discusses data preprocessing, emphasizing the importance of data quality through measures such as accuracy, completeness, and consistency. It outlines major tasks involved in preprocessing, including data cleaning, integration, reduction, and transformation. Each task is defined with specific techniques and objectives to enhance data usability.

Uploaded by jawaharb40

Chapter 2: Data Preprocessing

• Data Preprocessing: An Overview

– Data Quality

– Major Tasks in Data Preprocessing

• Data Cleaning

• Data Integration

• Data Reduction

• Data Transformation

Data Quality: Why Preprocess the Data?

• Measures for data quality: A multidimensional view


– Accuracy: correct or wrong, accurate or not
– Completeness: not recorded, unavailable, …
– Consistency: some modified but some not, dangling, …
– Timeliness: are the data updated in a timely manner?
– Believability: how much are the data trusted to be correct?
– Interpretability: how easily can the data be understood?
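
Of the measures above, completeness is the easiest to quantify. The following is a minimal sketch, assuming missing values are stored as None; the records, the field names, and the completeness helper are hypothetical and only illustrate the idea of measuring the fraction of recorded values per attribute.

```python
# Hypothetical records; None marks a value that was not recorded.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": None},
    {"id": 4, "age": 41, "city": "Pune"},
]


def completeness(rows, field):
    """Return the fraction of rows in which `field` has a recorded value."""
    if not rows:
        return 0.0
    recorded = sum(1 for row in rows if row.get(field) is not None)
    return recorded / len(rows)


# Report completeness per attribute.
for field in ("id", "age", "city"):
    print(field, completeness(records, field))
```

A low score for an attribute signals that it needs attention during data cleaning, for example by filling in or discarding the missing values.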

Major Tasks in Data Preprocessing

• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes, or files
• Data reduction
– Dimensionality reduction
– Numerosity reduction
– Data compression
• Data transformation
– Normalization
– Concept hierarchy generation
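
Two of the tasks above can be sketched in a few lines: filling in missing values (data cleaning) and normalization (data transformation). The slides do not name specific techniques, so this sketch assumes mean imputation and min-max scaling, which are common choices but not the only ones. The function names and the income values are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: no known values")
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to the lower bound.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Hypothetical attribute with one missing entry.
incomes = [30000, None, 50000, 70000]
cleaned = fill_missing_with_mean(incomes)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

Cleaning runs before normalization here because min-max scaling needs every value to be numeric; the order may differ in other pipelines.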
