Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Knowledge Discovery Process
[Figure: the knowledge discovery pipeline, with machine learning as one of its steps]
The Moviegoer Example
Moviegoer Database - Tasks
• Classification
– Determine gender based on age, source and movies seen
– Determine source based on gender, age and movies seen
• Estimation
– For estimation, you need a continuous variable, e.g. age
– Estimate age as a function of source, gender and past movies
• Clustering
– Find groupings of movies that are often seen by the same people
– Find groupings of people that tend to see the same movies
Moviegoer Database - Tasks
• Affinity grouping
– Association rules: which movies go together?
– Need to create “transactions” for each moviegoer containing the movies seen by that person
– May result in association rules such as “people who see movie X also tend to see movie Y”
Data Quality: Why Preprocess the Data?
• Measures for data quality: A multidimensional view
– Accuracy: correct or wrong, accurate or not
– Completeness: not recorded, unavailable, …
– Consistency: some modified but some not, dangling, …
– Timeliness: timely update?
– Believability: how much can the data be trusted to be correct?
– Interpretability: how easily the data can be understood?
Can decisions based on the data be trusted?
There is a better chance of discovering useful knowledge when the data is clean.
Major Tasks in Data Preprocessing
• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers, and resolve
inconsistencies
• Data integration
– Integration of multiple databases, data cubes, or files
• Data transformation
– Normalization
• Data Discretization
– Part of data reduction but with particular importance, especially for numerical data
• Data Reduction
– Obtains a reduced representation of the data that is smaller in volume but produces the same or similar analytical results
Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Data Cleaning
• Data in the Real World Is Dirty: Lots of potentially incorrect data, e.g., faulty instruments,
human or computer errors, transmission errors
– incomplete: lacking attribute values, lacking certain attributes of interest, or containing
only aggregate data
• e.g., Occupation=“ ” (missing data)
– noisy: containing noise, errors, or outliers
• e.g., Salary=“−10” (an error)
– inconsistent: containing discrepancies in codes or names, e.g.,
• Age=“42”, Birthday=“03/07/2010”
• Was rating “1, 2, 3”, now rating “A, B, C”
• discrepancy between duplicate records
– Intentional (e.g., disguised missing data)
• Jan. 1 as everyone’s birthday?
Incomplete (Missing) Data
• Data is not always available
– E.g., many tuples have no recorded value for several attributes, such as
customer income in sales data
• Missing data may be due to
– equipment malfunction
– inconsistent with other recorded data and thus deleted
– data not entered due to misunderstanding
– certain data may not be considered important at the time of entry
– history or changes of the data were not registered
• Missing data may need to be inferred
How to Handle Missing Data?
• Ignore the record: usually done when class label is missing (when doing
classification)—not effective when the % of missing values per attribute varies
considerably
• Fill in the missing value manually: tedious + infeasible?
• Fill it in automatically with
– a global constant: e.g., “unknown” (which effectively creates a new class!)
– the attribute mean
– the attribute mean for all samples belonging to the same class: smarter (see the sketch below)
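The three automatic strategies above fit in a few lines. Below is a minimal pandas sketch on a made-up table; the column names income and cls are hypothetical, not from the slides.

```python
import pandas as pd
import numpy as np

# Hypothetical toy data: 'income' has missing values, 'cls' is the class label.
df = pd.DataFrame({
    "income": [30_000, np.nan, 52_000, np.nan, 61_000, 58_000],
    "cls":    ["low",  "low",  "high", "high", "high", "high"],
})

# Option 1: fill with a global constant.
const_filled = df["income"].fillna(-1)

# Option 2: fill with the overall attribute mean.
mean_filled = df["income"].fillna(df["income"].mean())

# Option 3 (smarter): fill with the mean of samples in the same class.
class_mean_filled = df.groupby("cls")["income"].transform(
    lambda s: s.fillna(s.mean())
)
print(class_mean_filled)
```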
Imputation of Missing Data (Basic)
• Imputation denotes a procedure that replaces the missing values in a dataset with plausible values,
– i.e. by exploiting relationships among correlated attributes of the dataset.
Attribute 1   Attribute 2   Attribute 3   Attribute 4
20            cool          high          false
?             cool          high          true
20            cool          high          true
20            mild          low           false
30            cool          normal        false
10            mild          high          true

If we consider only {attribute#2}, the value “cool” appears in 4 records. Among the other “cool” records, Attribute 1 is 20 twice and 30 once, so:
Probability of imputing value (20) = 66.6%
Probability of imputing value (30) = 33.3%
Imputation of Missing Data (Basic)
For {attribute#4}, the value “true” appears in 3 records of the same table. Among the other “true” records, Attribute 1 is 20 once and 10 once, so:
Probability of imputing value (20) = 50%
Probability of imputing value (10) = 50%

For {attribute#2, attribute#3}, the value pair {“cool”, “high”} appears in only 2 other records, both with Attribute 1 = 20, so:
Probability of imputing value (20) = 100%
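A short sketch of this frequency-based imputation, using the toy table above. Attribute indices are 0-based, and the helper name imputation_probabilities is made up for illustration.

```python
from collections import Counter

# The slide's toy table; None marks the missing Attribute 1 value.
records = [
    (20,   "cool", "high",   False),
    (None, "cool", "high",   True),
    (20,   "cool", "high",   True),
    (20,   "mild", "low",    False),
    (30,   "cool", "normal", False),
    (10,   "mild", "high",   True),
]

def imputation_probabilities(records, missing_idx, match_idx):
    """Estimate P(candidate value) for attribute `missing_idx`,
    conditioning on the attributes in `match_idx` of the missing record."""
    missing = next(r for r in records if r[missing_idx] is None)
    matches = [r[missing_idx] for r in records
               if r is not missing
               and all(r[i] == missing[i] for i in match_idx)
               and r[missing_idx] is not None]
    counts = Counter(matches)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

print(imputation_probabilities(records, 0, [1]))     # ~{20: 0.67, 30: 0.33}
print(imputation_probabilities(records, 0, [3]))     # {20: 0.5, 10: 0.5}
print(imputation_probabilities(records, 0, [1, 2]))  # {20: 1.0}
```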
Methods of Treating Missing Data
• K-Nearest Neighbor (k-NN) approach
– k-NN imputes a missing attribute value on the basis of the record’s K nearest neighbors, which are determined using a distance measure over the known attributes.
– Once the K neighbors are determined, the missing value is imputed with the mean, median, or mode of the known values of that attribute among the neighbors.
[Figure: the record with the missing value and its nearest neighbors among the other dataset records]
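For numeric data, scikit-learn's KNNImputer implements this idea (it replaces the missing entry with the neighbors' mean); the toy matrix below is made up.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up numeric matrix; np.nan marks the value to be imputed.
X = np.array([
    [20.0,   1.0, 3.0],
    [np.nan, 1.1, 2.9],   # record with a missing value
    [21.0,   0.9, 3.1],
    [45.0,   5.0, 9.0],
    [47.0,   5.2, 9.1],
])

# Neighbours are found with a distance measure over the known attributes;
# the missing entry is replaced by the mean of that attribute over the
# k nearest neighbours (here rows 0 and 2, giving 20.5).
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```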
Noisy Data
• Noise: random error or variance in a measured variable
• Incorrect attribute values may be due to
– faulty data collection instruments
– data entry problems
– data transmission problems
– technology limitation
– inconsistency in naming convention
• Other data problems which require data cleaning
– duplicate records
– incomplete data
– inconsistent data
How to Handle Noisy Data?
• Binning
– first sort data and partition into (equal-frequency) bins
– then one can smooth by bin means, smooth by bin median,
smooth by bin boundaries, etc.
• Regression
– smooth by fitting the data into regression functions
• Clustering
– detect and remove outliers
• Combined computer and human inspection
– detect suspicious values and have a human check them (e.g., to deal with
possible outliers)
Noisy Data (Binning Methods)
Sorted data for price (in dollars), with a bin depth of 3:
4, 8, 15, 21, 21, 24, 25, 28, 34
* Partition into (equi-depth) bins:
- Bin 1: 4, 8, 15
- Bin 2: 21, 21, 24
- Bin 3: 25, 28, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9
- Bin 2: 22, 22, 22
- Bin 3: 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 15
- Bin 2: 21, 21, 24
- Bin 3: 25, 25, 34
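The two smoothings above can be reproduced with a short NumPy sketch (equal-frequency bins of depth 3, as in the example):

```python
import numpy as np

prices = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34])  # already sorted
depth = 3                                               # values per bin
bins = prices.reshape(-1, depth)                        # equal-frequency bins

# Smoothing by bin means: every value becomes its bin's mean.
by_means = np.repeat(bins.mean(axis=1), depth)

# Smoothing by bin boundaries: every value snaps to the closer of
# its bin's minimum or maximum.
by_boundaries = np.where(
    bins - bins.min(axis=1, keepdims=True)
    <= bins.max(axis=1, keepdims=True) - bins,
    bins.min(axis=1, keepdims=True),
    bins.max(axis=1, keepdims=True),
).ravel()

print(by_means)        # [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]
print(by_boundaries)   # [ 4  4 15 21 21 24 25 25 34]
```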
Noisy Data (Binning Methods)
Sorted data for price (in dollars), with a bin depth of 4:
4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
Noisy Data (Clustering)
• Outliers may be detected by clustering, where similar values are
organized into groups or “clusters”.
• Values which fall outside of the set of clusters may be considered outliers.
Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Data Integration
• Data integration:
– Combines data from multiple sources into a coherent store
• Schema integration: e.g., A.cust-id ≡ B.cust-#
– Integrate metadata from different sources
• Entity identification problem:
– Identify real world entities from multiple data sources, e.g., Bill Clinton =
William Clinton
• Detecting and resolving data value conflicts
– For the same real world entity, attribute values from different sources are
different
– Possible reasons: different representations, different scales, e.g., metric vs.
British units
Handling Redundancy in Data Integration
• Redundant data often occur when integrating multiple databases
– Object identification: The same attribute or object may have
different names in different databases
– Derivable data: One attribute may be a “derived” attribute in
another table, e.g., annual revenue
• Redundant attributes may be detected by correlation analysis and covariance
analysis
• Careful integration of the data from multiple sources may help
reduce/avoid redundancies and inconsistencies and improve mining
speed and quality
Correlation Analysis (Nominal Data)
• Χ2 (chi-square) test
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
• The larger the Χ2 value, the more likely the variables are related
• The cells that contribute the most to the Χ2 value are those whose actual count
is very different from the expected count
Chi-Square Calculation: An Example
• Null Hypothesis: A & B are independent (not correlated)
• Alternate Hypothesis: A & B are dependent (correlated)
                          Gender
                          Male        Female       Sum (row)
Preferred    fiction      250 (90)    200 (360)    450
Reading      non-fiction  50 (210)    1000 (840)   1050
             Sum (col.)   300         1200         1500
• Χ2 (chi-square) calculation (numbers in parenthesis are expected counts
calculated based on the data distribution in the two categories)
$$\chi^2 = \frac{(250-90)^2}{90} + \frac{(50-210)^2}{210} + \frac{(200-360)^2}{360} + \frac{(1000-840)^2}{840} = 507.93$$
Chi Square Calculation: An Example
Degrees of freedom = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1, where r is the number of distinct values of variable A and c is the number of distinct values of variable B.
At a significance level of 0.001, the critical value of the chi-square distribution with 1 degree of freedom is 10.828. Since 507.93 > 10.828, the null hypothesis is rejected.
This shows that Gender and Preferred Reading are strongly correlated in the given group.
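The same test is available in SciPy. The sketch below reproduces the expected counts and the chi-square statistic of the example; correction=False disables the continuity correction so that the hand calculation is matched exactly.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the slide: rows = preferred reading, cols = gender.
observed = np.array([[250, 200],     # fiction:     male, female
                     [50, 1000]])    # non-fiction: male, female

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(expected)   # [[ 90. 360.] [210. 840.]]
print(chi2, dof)  # ~507.93, 1
print(p)          # far below 0.001 -> reject independence
```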
Correlation Analysis (Numeric Data)
• Correlation coefficient (also called Pearson’s product-moment coefficient)

$$r_{A,B} = \frac{\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})}{(n-1)\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} a_i b_i \;-\; n\bar{A}\bar{B}}{(n-1)\,\sigma_A \sigma_B}$$

where n is the number of records, $\bar{A}$ and $\bar{B}$ are the respective means of A and B, $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B, and $\sum a_i b_i$ is the sum of the AB cross-products.
• If rA,B > 0, A and B are positively correlated (A’s values increase as B’s do). The higher the value, the stronger the correlation.
• rA,B = 0: uncorrelated (no linear relationship); rA,B < 0: negatively correlated
Covariance (Numeric Data)
• Expected values of A and B: $E(A) = \bar{A}$, $E(B) = \bar{B}$
• Covariance of A and B: $\mathrm{Cov}(A,B) = E\big[(A-\bar{A})(B-\bar{B})\big] = \frac{1}{n}\sum_{i=1}^{n} a_i b_i - \bar{A}\bar{B}$
• Positively correlated (positive covariance): the attributes tend to rise together, e.g., the stock prices of both companies rise together
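As a quick check of the two formulas above, here is a small NumPy sketch with made-up paired observations (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical paired observations of two numeric attributes A and B
# (e.g. weekly stock prices of two companies); the numbers are made up.
A = np.array([6.0, 5.0, 4.0, 3.0, 2.0])
B = np.array([20.0, 10.0, 14.0, 5.0, 5.0])

# Covariance: E[(A - mean_A)(B - mean_B)]  (population form, divide by n)
cov = np.mean((A - A.mean()) * (B - B.mean()))

# Pearson correlation: covariance scaled by the two standard deviations.
# Whether n or n-1 is used cancels out, so r is the same either way.
r = cov / (A.std() * B.std())

print(cov)                      # ~7.0 -> positive: A and B rise together
print(r)                        # ~0.87
print(np.corrcoef(A, B)[0, 1])  # same value from NumPy's built-in
```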
Visually Evaluating Correlation
[Figure: scatter plots illustrating correlations ranging from –1 to 1]
Pearson Correlation
Pearson Correlation (Shortcut)
Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Data Transformation
• A function that maps the entire set of values of a given attribute to a new set of replacement
values such that each old value can be identified with one of the new values
• Methods
– Smoothing: Remove noise from data
– Attribute/feature selection
• A subset of the attributes is selected for further processing
– Normalization: Scaled to fall within a smaller, specified range
• min-max normalization
• z-score normalization
• normalization by decimal scaling
– Discretization: supervised
Normalization
• Works on numeric attributes
• Attribute normalization maps one range of values onto another
• Usual ranges: -1 to +1, or 0 to 1.
• Issues: this might introduce distortions or biases into the data
• So you need to understand the properties and potential weaknesses
of the method before using it
Normalization: min-max
• Min-max normalization: to [new_min_A, new_max_A]

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A$$

– Ex. Let income range from $12,000 to $98,000 be normalized to [0.0, 1.0]. Then $73,600 is
mapped to $\frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}(1.0 - 0) + 0 = 0.716$
• Positive: min-max normalization preserves all relationships of data values
exactly and doesn't introduce any potential biases
• Negative: If a future input case falls outside the original data range, an “out of
bounds” error will occur
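A minimal sketch of the formula, plus one way to clip a future out-of-range value (the helper name min_max_normalize is made up):

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v from [min_a, max_a] onto [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# The slide's income example: $73,600 in [$12,000, $98,000] -> [0.0, 1.0]
print(round(min_max_normalize(73_600, 12_000, 98_000), 3))   # 0.716

# One way to handle a future out-of-range value (next slide): clip it.
v = min_max_normalize(120_000, 12_000, 98_000)                # > 1.0
print(min(max(v, 0.0), 1.0))                                  # 1.0
```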
Normalization (Dealing with out-of-range values)
• Ignore that the range has been exceeded
– ….but does this affect the quality of the model?
• Ignore the out-of-range instances
– Reducing the number of instances reduces the confidence that the sample
represents the population
– Introduces bias
• Clip the out-of-range values
– E.g. if the value > 1, assign 1 to it. If value < 0, assign 0 to it.
– Information content on the limits is distorted by projecting multiple values
to a single value.
Normalization (z-score)
• Normalization of a value v of attribute A based on the mean and the
standard deviation of the attribute
– The mean and standard deviation depend on the data

$$v' = \frac{v - \mu_A}{\sigma_A}$$

– Ex. Let μ = 54,000 and σ = 16,000. Then $\frac{73{,}600 - 54{,}000}{16{,}000} = 1.225$
• When should z-score be used rather than min-max normalization?
Normalization: decimal scaling
• Moves the decimal point of the values of A by j positions, where j is the
smallest number of positions such that the maximum absolute value falls in [0, 1]

$$v' = \frac{v}{10^{\,j}}$$

• E.g. if v ranges between −98 and 9,738, then j = 4 means
that v' ranges between −0.0098 and 0.9738
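Both z-score normalization and decimal scaling are one-liners in NumPy. The sketch below reuses the slides' numbers; the extra values 38,000 and 120 are made up to fill out the example.

```python
import numpy as np

# z-score normalization: (v - mean) / std.  Using the slide's parameters
# mu = 54,000 and sigma = 16,000 reproduces 1.225 for v = 73,600.
values = np.array([73_600.0, 54_000.0, 38_000.0])   # 38,000 is made up
mu, sigma = 54_000.0, 16_000.0
print((values - mu) / sigma)    # [ 1.225  0.    -1.   ]

# Decimal scaling: divide by 10**j with the smallest j that brings the
# largest absolute value into [0, 1].
v = np.array([-98.0, 120.0, 9_738.0])               # 120 is made up
j = int(np.ceil(np.log10(np.abs(v).max())))
print(j, v / 10**j)             # 4 [-0.0098  0.012   0.9738]
```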
Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Data Discretization
• The task of attribute (feature)-discretization techniques is to
discretize the values of continuous features into a small number
of intervals, where each interval is mapped to a discrete symbol.
• Advantages:
– Simplified data description and easier-to-understand data and final data-mining results
– A smaller set of interesting rules is mined
– Decreased end-result processing time
– Improved end-result accuracy
Entropy Based Discretization
• Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the
information (weighted entropy) after partitioning is

$$E(S,T) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2), \qquad \mathrm{Ent}(S_1) = -\sum_{i} p_i \log_2 p_i$$

• S1 and S2 correspond to the samples in S satisfying the conditions A < T and A ≥ T, respectively.
• p_i is the probability of class i in S1, determined by dividing the number of samples of
class i in S1 by the total number of samples in S1.
• The boundary that minimizes the entropy function over all possible boundaries is selected as the
binary discretization.
• The process is applied recursively to the partitions obtained until some stopping criterion is met.
Entropy Based Discretization: Example
ID:    1   2   3   4   5   6   7   8   9
Age:   21  22  24  25  27  27  27  35  41
Grade: F   F   P   F   P   P   P   P   P
• Let Grade be the class attribute. Use entropy-based
discretization to divide the range of ages into different
discrete intervals.
• There are 6 possible boundaries. They are 21.5, 23, 24.5,
26, 31, and 38.
• Let us consider the boundary at T = 21.5.
Let S1 = {21}
Let S2 = {22, 24, 25, 27, 27, 27, 35, 41}
Entropy Based Discretization: Example
ID:    1   2   3   4   5   6   7   8   9
Age:   21  22  24  25  27  27  27  35  41
Grade: F   F   P   F   P   P   P   P   P
• The number of elements in S1 and S2 are:
|S1| = 1
|S2| = 8
• The entropy of S1 is

$$\mathrm{Ent}(S_1) = -P(\text{Grade}=\mathrm{F})\log_2 P(\text{Grade}=\mathrm{F}) - P(\text{Grade}=\mathrm{P})\log_2 P(\text{Grade}=\mathrm{P}) = -(1/1)\log_2(1/1) - (0/1)\log_2(0/1) = 0$$

• The entropy of S2 is

$$\mathrm{Ent}(S_2) = -P(\text{Grade}=\mathrm{F})\log_2 P(\text{Grade}=\mathrm{F}) - P(\text{Grade}=\mathrm{P})\log_2 P(\text{Grade}=\mathrm{P}) = -(2/8)\log_2(2/8) - (6/8)\log_2(6/8) = 0.5 + 0.311 = 0.811$$
Entropy Based Discretization: Example
• Hence, the entropy after partitioning at T = 21.5 is

$$E(S,T) = \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2) = \frac{1}{9}\,\mathrm{Ent}(S_1) + \frac{8}{9}\,\mathrm{Ent}(S_2) = (1/9)(0) + (8/9)(0.811) = 0.721$$
Entropy Based Discretization: Example
• The entropies after partitioning are computed for all the boundaries:
T = 21.5 → E(S, 21.5)
T = 23 → E(S, 23)
…
T = 38 → E(S, 38)
• Select the boundary with the smallest entropy. Suppose the best boundary is T = 23.
• Now recursively apply entropy discretization to both partitions:

ID:    1   2   3   4   5   6   7   8   9
Age:   21  22  24  25  27  27  27  35  41
Grade: F   F   P   F   P   P   P   P   P
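The whole boundary search fits in a short sketch. It recomputes E(S, T) for every candidate boundary of the toy table and returns the minimizing one; on these particular numbers the minimum happens to fall at T = 26, while the slide's "suppose T = 23" merely illustrates the recursive step. The helper names entropy and best_split are made up.

```python
import math
from collections import Counter

ages   = [21, 22, 24, 25, 27, 27, 27, 35, 41]
grades = ["F", "F", "P", "F", "P", "P", "P", "P", "P"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return the boundary T minimizing E(S, T), together with that entropy."""
    pairs = sorted(zip(values, labels))
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # 21.5, 23, ..., 38
    best = None
    for t in candidates:
        left  = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        e = (len(left) / len(pairs)) * entropy(left) \
            + (len(right) / len(pairs)) * entropy(right)
        if best is None or e < best[1]:
            best = (t, e)
    return best

print(best_split(ages, grades))   # (26.0, ~0.361) on this toy table
```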
Outline
• Data Preprocessing
– Data Quality
– Major Tasks
• Data Cleaning
• Data Integration
• Data Transformation
• Data Discretization
• Data Reduction
Data Reduction
• Data is often too large and reducing data can improve performance
• Data reduction consists of reducing the representation of the dataset
while producing the same or almost the same results
• Data reduction includes:
– Aggregation, dimensionality reduction, discretization, numerosity reduction
Dimensionality reduction
• Feature selection (or attribute subset selection)
– Select only the necessary attributes
– The goal is to find a minimum set of attributes such that the resulting
probability distribution of the data classes is as close as possible to the
distribution obtained using all attributes
• Example Technique:
– Decision-tree induction
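One common way to realize this in practice is to fit a decision tree and keep only the attributes it actually splits on. A minimal scikit-learn sketch on made-up data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Made-up data: only the first of four attributes determines the class.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Attributes the tree never splits on get importance 0 and can be dropped.
print(tree.feature_importances_)
selected = np.flatnonzero(tree.feature_importances_ > 0)
print("selected attribute indices:", selected)
```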
Summary
• Data quality: accuracy, completeness, consistency, timeliness, believability,
interpretability
• Data cleaning: e.g. missing/noisy values, outliers
• Data integration from multiple sources:
– Remove redundancies
– Detect inconsistencies
• Data transformation
– Dimensionality Reduction
– Normalization
– Discretization