Lecture 6
3rd Year
Spring 2025
◼ Data Preprocessing
◼ Data Quality
◼ Data Cleaning
◼ Data Integration
◼ Data Reduction
◼ Data Transformation
Data Preprocessing: Why Preprocess the Data?
Data Quality
◼ Elements defining data quality
◼ Accuracy: correct or wrong, accurate or not
data)
◼ Believability: how much the data are trusted by users? Based on the data
inconsistencies
◼ Data Integration
◼ Integration of multiple databases or files
◼ Data Reduction
◼ Dimensionality reduction
◼ Numerosity reduction
◼ Data Transformation
◼ Normalization
dependent
Observed value is the actual count.
Chi-Square Calculation: An Example
[Contingency table: observed (actual) and expected counts, by gender (male, female)]
Chi-square > critical value: 507.93 > 3.841
Therefore, there is a strong correlation between gender and preferred reading.
From the contingency table:
male and like science fiction: observed value (250) > expected value (90)
female and don’t like science fiction: observed value (1000) > expected value (840)
Therefore, the result shows specifically that:
male and like science fiction are correlated in the group
female and don’t like science fiction are correlated in the group
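The chi-square comparison above can be reproduced with a short calculation. The Python sketch below is illustrative only. The observed counts 250 and 1000 are the ones quoted in this lecture; the other two cells (200 and 50) are an assumption, chosen because they reproduce the quoted expected value of 90 and a statistic that agrees with 507.93 up to rounding. The function name `chi_square` is hypothetical.

```python
# Observed counts.
# Rows: like science fiction, do not like science fiction.
# Columns: male, female.
# 250 and 1000 are quoted in the lecture; 200 and 50 are ASSUMED values.
observed = [
    [250, 200],
    [50, 1000],
]


def chi_square(table):
    """Return (chi-square statistic, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count of a cell = (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (observed - expected)^2 / expected over every cell.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


chi2, expected = chi_square(observed)
CRITICAL_VALUE = 3.841  # critical value quoted in the lecture

print(f"chi-square = {chi2:.2f}")
print("expected counts:", expected)
print("statistic exceeds critical value:", chi2 > CRITICAL_VALUE)
```

Comparing each observed cell with its expected count is what identifies which combinations drive the correlation: a cell whose observed count is well above its expected count occurs together more often than independence would predict.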
Correlation Analysis (Numeric Data)
Correlation Coefficient
Relation between two numeric attributes:
A: Students’ grades    B: Studying hours
a1                     b1
….                     ….
an                     bn
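As a rough illustration of how a correlation coefficient between two numeric attributes can be computed, here is a minimal Python sketch. It assumes the Pearson product-moment definition, since the lecture's own formula is not shown here, and it borrows the five (A, B) pairs from the covariance example in this lecture rather than real grade and study-hour data. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    # Numerator: sum of products of deviations from the means.
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))

    # Denominator: square root of the product of the sums of squared deviations.
    denominator = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return numerator / denominator


# Values taken from the covariance example in this lecture (attributes A and B).
A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]

r = pearson_r(A, B)
print(f"r = {r:.4f}")
```

A result close to +1 indicates that the two attributes tend to rise together, a result close to -1 that one falls as the other rises, and a result near 0 that there is no linear relation.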
Covariance (Numeric Data)
Notice: the covariance formula is very close to the correlation formula.
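The similarity can be made explicit by writing the two formulas side by side. The notation below is an assumption, since the formulas on the slide are not reproduced here: $\bar{A}$ and $\bar{B}$ are the means, $\sigma_A$ and $\sigma_B$ the (population) standard deviations, and $n$ the number of tuples.

```latex
\mathrm{Cov}(A,B) = \frac{1}{n}\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B}),
\qquad
r_{A,B} = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{n\,\sigma_A\,\sigma_B}
        = \frac{\mathrm{Cov}(A,B)}{\sigma_A\,\sigma_B}
```

Under these definitions the correlation coefficient is the covariance divided by the product of the two standard deviations, so both always have the same sign.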
Covariance: An Example
A B
2 5
3 8
5 10
4 11
6 14
Positively dependent
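The conclusion above can be checked numerically. The following Python sketch uses the A and B values from the table and assumes the population form of covariance (dividing by n, not n - 1), which may differ from the convention used on the original slide. The helper names `mean` and `covariance` are hypothetical.

```python
# Values from the example table.
A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance: average product of deviations from the means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


cov_ab = covariance(A, B)

# Equivalent shortcut: E(A*B) - E(A) * E(B).
shortcut = mean([x * y for x, y in zip(A, B)]) - mean(A) * mean(B)

print(f"Cov(A, B) = {cov_ab:.2f}")
print(f"shortcut  = {shortcut:.2f}")
print("positively dependent:", cov_ab > 0)
```

A positive covariance means A and B tend to be above (or below) their means together, which is what "positively dependent" expresses.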
Covariance: Another Example