
2 Data Preprocessing

IT 326: Data Mining


1st term 2023-2024

Chapter 2, “Data Mining: Concepts and Techniques” (4th ed.)


Outline
2

 Data Preprocessing
 Data Quality
 Major Tasks in Data Preprocessing
 Data Cleaning
 Data Integration
 Data Transformation
 Data Reduction
 Feature Selection
 Summary
Data Preprocessing:
3

Why Preprocess the Data?

LOW-quality data, when data mining is applied, leads to LOW-quality results.
Applying preprocessing to ENHANCE the quality of the data leads to ENHANCED quality of the results.
Data Quality
4

 Elements defining data quality:


 Accuracy: correct or wrong, accurate or not
 Completeness: not recorded, unavailable, …

 Consistency: inconsistent naming, coding, format …

 Timeliness: timely updated?

 Believability: how much are the data trusted by users?

 Interpretability: how easily are the data understood?


Major Tasks in Data Preprocessing

 Data cleaning
 Fill in missing values, smooth noisy data, identify
or remove outliers, and resolve inconsistencies.
 Data integration
 Integration of multiple databases or files.
 Data reduction
 Dimensionality reduction.
 Numerosity reduction.
 Data transformation
 Normalization.
 Concept hierarchy generation.
 Discretization

Data Cleaning
6

 Data in the real world is dirty: lots of potentially incorrect data, e.g., faulty instruments,
human or computer error, transmission errors.
 Incomplete: lacking attribute values, or containing only aggregate data.
◼ e.g., Occupation=“ ” (missing data)
 Inconsistent: containing discrepancies in codes or names, e.g.,
◼ Age=“42”, Birthday=“03/07/2010”.
 Noisy: containing noise, errors, or outliers.
◼ e.g., Salary=“−10” (an error)
 Intentional (e.g., disguised missing data)
◼ Jan. 1 as everyone’s birthday?

Other issues affecting the quality of data


Data Cleaning: Incomplete (Missing) Values
7

 Data is not always available.


 e.g., many tuples have no recorded value for several attributes, such as customer
income in sales data.
 Missing data may be due to:
 equipment malfunction.

 inconsistent with other recorded data and thus deleted.


◼ (e.g., occupation is “teacher” but number of students is zero; age does not match date of birth)

 data not entered due to misunderstanding.


 certain data may not be considered important at the time of entry.

 Missing data may need to be inferred.


Data Cleaning: Incomplete (Missing) Values
8

How to Handle Missing Values?

 Ignore the tuple: usually done when class label is missing (when doing
classification)
◼ effective when the tuple contains several attributes with missing values
◼ not effective when the percentage of missing values per attribute varies considerably.
 Fill in the missing value (a code sketch follows this list):
1) Manually: time-consuming and infeasible for large data sets with many missing values.
2) Use a global constant (such as a label like “Unknown”, −∞, or “NA”).
3) Use the central tendency for the attribute (e.g., the mean or median)
4) Use the attribute mean/median for all samples belonging to the same class.
5) Use the most probable value.
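As a concrete illustration of options 2)–4), here is a minimal pandas sketch; the column names and
values are made up for illustration and are not part of the slides.

```python
import pandas as pd
import numpy as np

# Illustrative data: 'income' has missing values, 'class' is the class label.
df = pd.DataFrame({
    "income": [52000, np.nan, 61000, np.nan, 47000, 58000],
    "class":  ["low", "low", "high", "high", "low", "high"],
})

# 2) Global constant: flag missing values with a sentinel.
filled_constant = df["income"].fillna(-1)   # or a label such as "Unknown" for nominal data

# 3) Central tendency: fill with the attribute's overall median.
filled_median = df["income"].fillna(df["income"].median())

# 4) Class-wise central tendency: fill with the mean of the same class.
filled_by_class = df["income"].fillna(
    df.groupby("class")["income"].transform("mean")
)
print(filled_by_class)
```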
Data Cleaning: Noisy Data
9

 Noise: random error or variance in a measured variable.

 Incorrect attribute values may be due to:


 faulty data collection instruments.
 data entry problems.
 data transmission problems.

 Smooth out the data to remove noise.


Data Cleaning: Noisy Data
10

How to Handle Noisy Data?


 Binning
 First, sort data and partition into (equal-frequency) bins.
 Then smooth by bin means, smooth by bin median, smooth by bin boundaries, etc.
 Regression
 Smooth by fitting the data into regression functions.
 Outlier Analysis
 Detect and remove outliers using clustering.
◼ Values that fall outside of the set of clusters may be considered outliers.
 Combined computer and human inspection.
Data Cleaning: Noisy Data

Example: Binning to Handle Noise

 Equal-frequency binning: divides the range into N intervals, each containing approximately the
same number of samples.
 Equal-width binning: divides the range into N intervals of equal size.
Width = (max − min) / #bins
Example (sorted data: 4, 8, 15, 21, 21, 24, 25, 28, 34; 3 bins): Width = (34 − 4) / 3 = 10
Intervals: [4–13], [14–23], [24–34]
Bin 1: 4, 8
Bin 2: 15, 21, 21
Bin 3: 24, 25, 28, 34
Figure 3.2 – Binning methods for data smoothing.
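As a rough illustration (not the textbook's code) of the two partitioning schemes, the sketch below
bins the nine sorted values above into three equal-frequency and three equal-width bins and smooths
the equal-frequency bins by their means.

```python
import numpy as np

data = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34])  # already sorted
n_bins = 3

# Equal-frequency binning: each bin gets roughly the same number of samples.
eq_freq_bins = np.array_split(data, n_bins)

# Smoothing by bin means: every value in a bin is replaced by the bin's mean.
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in eq_freq_bins])
print(smoothed)   # [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]

# Equal-width binning: intervals of equal size over [min, max].
width = (data.max() - data.min()) / n_bins            # (34 - 4) / 3 = 10
edges = data.min() + width * np.arange(n_bins + 1)    # [4, 14, 24, 34]
labels = np.digitize(data, edges[1:-1], right=False)  # bin index for each value
for i in range(n_bins):
    print(f"Bin {i + 1}: {data[labels == i]}")        # [4 8], [15 21 21], [24 25 28 34]
```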
Data Integration
12

 Data integration:
 The merging of data from multiple sources into a coherent store.

 Challenges:
 Entity identification problem: How to match schemas and objects from different
sources?
 Redundancy and Correlation Analysis: Are any attributes correlated?
Data Integration: Challenges
13

Entity Identification Problem


 How can equivalent real-world entities from multiple data sources be matched up?
 The same attribute or object may have different names in different databases.
 Example: attribute names A.cust-id vs. B.cust-#; attribute values Bill Clinton vs. William Clinton.
 If both are kept → redundancy.
 Schema integration and object matching are tricky:
 Integrate metadata from different sources.
 Match equivalent real-world entities from multiple sources.
 Detecting and resolving data value conflicts due to different representations, different scales.
 Metadata can be used to avoid errors in schema integration.
Data Integration: Challenges
14

Attribute Redundancy and Correlation Analysis


 Redundant data often occur when integrating multiple databases.
 Causes of redundancy:
 An attribute may be redundant if it can be “derived” from another attribute or set
of attributes.
 Inconsistencies in attribute naming can also cause redundancies in the resulting
dataset.
 Careful integration of the data from multiple sources may help
reduce/avoid redundancies and inconsistencies and improve mining
speed and quality.
Data Integration: Challenges
15

Handling Redundancy with Correlation Analysis


 Some redundancies can be detected by correlation analysis.
 Given two attributes, correlation analysis can measure how strongly one
attribute implies the other, based on the available data.
 Each data type has its own correlation measure:
 Nominal data: χ² (chi-square) test
 Numeric data: correlation coefficient and covariance
Data Integration: Challenges (Correlation Analysis )
16

 Nominal Data: Χ2 (chi-square)


 Observed value is the actual frequency (counted from data)
 Expected value is the expected frequency (calculated by formula).
χ² = Σ (observed − expected)² / expected

 Expected frequencies are calculated using:

e_ij = ( count(A = aᵢ) × count(B = bⱼ) ) / n

 The larger the χ² value, the more likely the two attributes are related (correlated).
 The cells that contribute the most to the χ² value are those whose actual count is very different
from the expected count.
Data Integration: Challenges (Correlation Analysis )
17

□ Nominal Data: Χ2 (chi-square) Example


Contingency table (n = 1500): observed counts, with expected counts in parentheses

                 male          Female        Sum (row)
fiction          250 (90)      200 (360)     450
Not fiction      50 (210)      1000 (840)    1050
Sum (col.)       300           1200          1500

(The underlying dataset records, for each of the 1500 people, their Gender and Preferred reading,
e.g., Male–Fiction, Female–Not fiction, Female–Fiction, …)

 The expected frequencies (the numbers in parentheses) are computed as, for example:

e₁₁ = e(male, fiction) = ( count(male) × count(fiction) ) / n = (300 × 450) / 1500 = 90

 χ² (chi-square) calculation to test the correlation between “preferred reading” and “gender”:

χ² = (250 − 90)²/90 + (50 − 210)²/210 + (200 − 360)²/360 + (1000 − 840)²/840 = 507.93
Data Integration: Challenges (Correlation Analysis )
18

□ Nominal Data: Χ2 (chi-square) Example


Null hypothesis H₀: the two attributes (“preferred reading” and “gender”) are independent.

1. Calculate the degrees of freedom: df = (r − 1)(c − 1), where r is the number of rows and c is the
   number of columns. Here df = (2 − 1)(2 − 1) = 1.
2. Set the significance level α, e.g., α = 0.001.
3. Find the rejection (critical) value from the χ² table: for df = 1 and α = 0.001, the critical
   value is 10.827.
4. Evaluate the result: if χ² > critical value, H₀ is rejected → the attributes are not independent
   → they are correlated. Here χ² = 507.93 >>> 10.827, so H₀ is rejected: “preferred reading” and
   “gender” are not independent → they are strongly correlated in the given group of people.
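To make the arithmetic concrete, here is a small sketch (not part of the slides) that recomputes the
example with SciPy's chi2_contingency; correction=False is passed so that the plain χ² formula above
is applied without Yates' continuity correction.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed contingency table: rows = {fiction, not fiction}, cols = {male, female}.
observed = np.array([[250,  200],
                     [ 50, 1000]])

# correction=False reproduces the plain chi-square formula used on the slide.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(expected)          # [[ 90. 360.]
                         #  [210. 840.]]
print(chi2, dof)         # ≈ 507.93, dof = 1
print(p_value < 0.001)   # True -> reject H0: the attributes are correlated
```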
Data Integration: Challenges (Correlation Analysis )
19

□ Numeric Data: Correlation coefficient

r(A,B) = Σᵢ (aᵢ − Ā)(bᵢ − B̄) / (n·σA·σB) = ( Σᵢ aᵢbᵢ − n·Ā·B̄ ) / (n·σA·σB)

where n is the number of tuples, Ā and B̄ are the respective means of A and B, σA and σB are
the respective standard deviations of A and B, and Σ(aᵢbᵢ) is the sum of the AB cross-product.

 r(A,B) > 0: A and B are positively correlated
 r(A,B) = 0: independent (no linear correlation)
 r(A,B) < 0: A and B are negatively correlated


Data Integration: Challenges (Correlation Analysis )
20

□ Numeric Data: Covariance

Cov(A, B) = E[ (A − Ā)(B − B̄) ] = Σᵢ (aᵢ − Ā)(bᵢ − B̄) / n

where n is the number of tuples, and Ā and B̄ are the respective means (expected values) of A and B.

□ Covariance can be simplified as:

Cov(A, B) = E(A·B) − Ā·B̄

We will use the simplified equation for calculations.

□ Covariance is related to the correlation coefficient:

r(A,B) = Cov(A, B) / (σA·σB)
Data Integration: Challenges (Correlation Analysis )
21

□ Numeric Data: Covariance

◼ Positive covariance: IF A and B both tend to be larger than their expected values
THEN Cov(A,B) > 0 → they rise together
◼ Negative covariance: IF A is larger than its expected value & B is smaller than its expected value
THEN Cov(A,B) < 0.
◼ Independence: IF A and B are independent THEN Cov(A,B) = 0.

◼ But the converse is not true.


▪ Cov(A,B) = 0 does NOT imply that A and B are independent.
▪ Some pairs of random variables may have a covariance of 0 but are not independent.
▪ Only under some additional assumptions does a covariance of 0 imply independence.
Data Integration: Challenges (Correlation Analysis )
22

□ Numeric Data: Covariance Example

Stock prices observed at five time points for AllElectronics and HighTech:

Time point    AllElectronics    HighTech
T1            6                 20
T2            5                 10
T3            4                 14
T4            3                 5
T5            2                 5

1. E(AllElectronics) = (6 + 5 + 4 + 3 + 2) / 5 = 20 / 5 = 4
2. E(HighTech) = (20 + 10 + 14 + 5 + 5) / 5 = 54 / 5 = 10.80
3. Cov(AllElectronics, HighTech) = (6×20 + 5×10 + 4×14 + 3×5 + 2×5) / 5 − 4 × 10.80
   = 50.20 − 43.20 = 7

Since the covariance is positive, the two stock prices tend to rise together.
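The same numbers can be checked quickly with NumPy (a sketch, not from the slides); note that np.cov
divides by n − 1 by default, so ddof=0 is passed to match the population formula used above.

```python
import numpy as np

all_electronics = np.array([6, 5, 4, 3, 2])
high_tech       = np.array([20, 10, 14, 5, 5])

# Simplified formula from the slide: Cov(A, B) = E(A*B) - mean(A)*mean(B)
cov_simplified = np.mean(all_electronics * high_tech) - all_electronics.mean() * high_tech.mean()
print(round(cov_simplified, 2))                                  # 7.0

# Same value via np.cov; ddof=0 uses the population formula (divide by n).
print(round(np.cov(all_electronics, high_tech, ddof=0)[0, 1], 2))  # 7.0

# Correlation coefficient: Cov(A, B) / (sigma_A * sigma_B)
print(np.corrcoef(all_electronics, high_tech)[0, 1])             # ≈ 0.87 -> positively correlated
```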
Data Transformation
23

 Data are transformed or consolidated into forms appropriate for mining.


 the resulting mining process may be more efficient, and the patterns found may be
easier to understand.

 Attribute Transformation: A function that maps the entire set of values


of a given attribute to a new set of replacement values such that each old
value can be identified with one of the new values.
Data Transformation

24

Common attribute transformations include:
 Discretization
 Encoding
 Aggregation
 Normalization

(Figure example of z-score normalization: 1,500 − mean = 1,500 − 1,485.70 = 14.30;
14.30 / stdev = 14.30 / 718.27 = 0.0199.)
Data Transformation: Strategies
25

 Smoothing: Remove noise from data.

 Attribute construction: New attributes constructed from the given ones. New
attributes are added to help the mining process.

 Aggregation: Summary or aggregation operations are applied to the data.


 e.g., daily sales data may be aggregated so as to compute monthly and annual total amounts.

 Normalization: the attribute data are scaled so as to fall within a smaller, specified range,
such as [−1.0, 1.0] or [0.0, 1.0].
Data Transformation: Strategies
26

 Discretization: divide the range of continuous attribute into intervals. Numerous continuous
attribute values are replaced by small interval labels.
 Example: a numeric attribute (e.g., age)
◼ Raw values are replaced by interval labels (e.g., 0–10, 11–20, etc.) or conceptual labels (e.g., youth, adult,
senior).
◼ The labels can be recursively organized into higher-level concepts, resulting in a concept hierarchy for this
numeric attribute.

More than one concept hierarchy can be


defined for the same attribute to
accommodate the needs of various users.

 Concept hierarchy generation for nominal data: replacing low level concepts by higher level
concepts
 i.e. attributes such as street can be generalized to higher-level concepts, like city or country.
Data Transformation: Normalization
27

 The measurement unit used can affect the data analysis.


 For example, changing measurement units from meters to inches for height (2.54 cm = 1 inch), or
from kilograms to pounds for weight (1 kg = 2.2 pounds), may lead to very different results.
 In general, expressing an attribute in smaller units will lead to a larger range for that
attribute, and thus tend to give such an attribute greater effect or “weight.”
 To help avoid dependence on the choice of measurement units, the data should be normalized or
standardized.
 Normalization is transforming the data to fall within a smaller or common range such
as [-1, 1] or [0.0, 1.0].
 Normalizing the data gives all attributes an equal weight.
 For distance-based methods, normalization helps prevent attributes with initially large ranges (e.g.,
income) from outweighing attributes with initially smaller ranges (e.g., binary attributes).
Data Transformation: Normalization
28

 Min-max normalization: maps a value v of attribute A to the range [new_minA, new_maxA]:

v' = ((v − minA) / (maxA − minA)) × (new_maxA − new_minA) + new_minA

Example: Let income range from $12,000 to $98,000, normalized to [0.0, 1.0]. Then $73,600 is mapped to
((73,600 − 12,000) / (98,000 − 12,000)) × (1.0 − 0) + 0 = 0.716
 Z-score normalization (μ: mean, σ: standard deviation of A):

v' = (v − μA) / σA

Example: Let μ = 54,000 and σ = 16,000. Then 73,600 is normalized to (73,600 − 54,000) / 16,000 = 1.225

 Decimal scaling normalization: v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1.

Example: Let A range from −986 to 917. The maximum absolute value is 986, so j = 3 (divide by 1,000),
which normalizes A to [−0.986, 0.917].
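All three schemes are easy to express directly; below is a minimal NumPy sketch (illustrative, not
from the slides) applied to the income and decimal-scaling figures used above.

```python
import numpy as np

def min_max(v, new_min=0.0, new_max=1.0):
    """Min-max normalization to [new_min, new_max]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min()) * (new_max - new_min) + new_min

def z_score(v):
    """Z-score normalization: (v - mean) / std."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def decimal_scaling(v):
    """Divide by 10^j with the smallest j such that max(|v'|) < 1."""
    v = np.asarray(v, dtype=float)
    j = int(np.ceil(np.log10(np.abs(v).max() + 1)))
    return v / 10 ** j

income = np.array([12_000, 54_000, 73_600, 98_000])
print(min_max(income))                         # 73,600 -> 0.716..., as in the slide example
print((73_600 - 54_000) / 16_000)              # 1.225, the slide's z-score example (given mean/std)
print(decimal_scaling(np.array([-986, 917])))  # [-0.986  0.917]
```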
Data Reduction
29

Data reduction techniques can be applied to obtain a reduced representation of the data
set that is much smaller in volume, yet closely maintains the integrity of the original data.

 Mining on the reduced data set should be more efficient yet produce the same (or
almost the same) analytical results.

 Data reduction strategies include:


◼ Dimensionality reduction
◼ Numerosity reduction
◼ Data compression.
Data Reduction: Dimensionality Reduction
30

Dimensionality reduction is the process of reducing the number of attributes under consideration.
Including:
 Data compression techniques transform or project the original data onto a smaller space, such
as Wavelet transforms and principal components analysis (PCA).

 Attribute subset selection removes irrelevant, weakly relevant, or redundant attributes or


dimensions.
◼ Irrelevant attributes: attributes contain no information that is useful for the data mining task at hand.
◼ Redundant attributes: attributes duplicate much or all of the information contained in one or more other attributes.
 Attribute construction (Start with one column only, progressively adding one column at a time,
i.e., the column that produces the highest increase in performance)

Why ? to improve quality and efficiency of the mining process. Mining on a reduced set of attributes reduces the
number of attributes appearing in the discovered patterns, helping to make the patterns easier to understand.
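As one concrete route for the PCA-based reduction mentioned above, here is a minimal scikit-learn
sketch; the iris data and the choice of two components are purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)         # 150 samples, 4 numeric attributes

# Standardize first so attributes with larger ranges do not dominate (cf. normalization).
X_std = StandardScaler().fit_transform(X)

# Project the 4 original attributes onto 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)      # share of variance kept by each component
```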
Data Reduction: Numerosity Reduction
31

Numerosity Reduction : reduce data volume by choosing alternative, smaller forms


of data representation.

 Two types: Parametric methods and Non-parametric methods

 Parametric methods:
 Assume the data fits some model → estimate model parameters → store only the parameters →
discard the data (except possible outliers).
 Methods: Regression and Log-Linear Models.
Data Reduction: Numerosity Reduction
32

 Non-parametric methods: Do not assume models.


 Histogram: Divide data into buckets and store average (sum) for
each bucket
 Clustering: Partition data set into clusters based on similarity,
and store cluster representation (e.g., centroid and diameter)
only.
 Sampling: obtaining a small sample s to represent the whole
data set N.
◼ Key: Choose a representative subset of the data.
 Data cube aggregation: Data can be aggregated so that the resulting data summarize the
original data (smaller in volume), without loss of the information necessary for the analysis task.
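For the sampling and histogram strategies in particular, here is a short pandas sketch (the data set,
sample fraction, and bucket count are illustrative, not from the slides).

```python
import pandas as pd
import numpy as np

# Illustrative data set N with 100,000 tuples.
rng = np.random.default_rng(0)
df = pd.DataFrame({"income": rng.normal(50_000, 15_000, 100_000),
                   "age": rng.integers(18, 90, 100_000)})

# Simple random sample without replacement: keep 1% of the tuples as the sample s.
sample = df.sample(frac=0.01, random_state=0)
print(len(sample))            # 1,000 rows representing the full data set

# Histogram-style reduction: 10 equal-width buckets, store only each bucket's mean.
buckets = pd.cut(df["income"], bins=10)
print(df.groupby(buckets, observed=True)["income"].mean())
```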
Data Reduction: Data Compression
33

 Obtain a reduced or “compressed” representation of the original data.


 Two types :
 Lossless: if the original data can be reconstructed from the compressed data without any information loss
 Lossy: if we can reconstruct only an approximation of the original data.

(Figure: original data → compressed data; lossless compression reconstructs the original data
exactly, lossy compression reconstructs only an approximation of the original data.)

 Dimensionality and numerosity reduction may also be considered as forms of data


compression.
Feature Selection Methods
34

• Feature selection is the process of removing redundant or irrelevant features from the
original data set.
– a process of selecting the smallest subset of informative features that are most predictive of the
related class.

• This maximizes the classifier’s ability to classify samples accurately.

• As a result, the running time of the classifier that processes the data decreases and its accuracy
increases, because irrelevant features can introduce noisy data that negatively affects
classification accuracy.

Chapter 7, “Data Mining: Concepts and Techniques” (4th ed.)


Feature Selection vs Dimensionality Reduction
35

• While both Feature Selection and Dimensionality Reduction methods are used to reduce the
number of features in a dataset, there is an important difference.

• Feature selection is simply selecting and excluding given features without


changing them such as
– Remove features with missing values
– Remove features with low variance
– Remove highly correlated features

• Dimensionality reduction transforms features into a lower dimension. (ex: PCA)

https://towardsdatascience.com/feature-selection-and-dimensionality-reduction-f488d1a035de
Feature Selection Methods
36

• The feature selection methods can be classified into four categories:

Filter FS Methods
– Independent of the classification algorithm.
– Computationally simple and fast.
– Worse classification performance.

Wrapper FS Methods
– Model hypothesis search within the feature subset search space.
– High classification accuracy.
– Computationally complex, expensive, and slow.

Embedded FS Methods
– Feature selection is embedded in the classifier.
– Less computational than wrapper approaches.

Hybrid FS Methods
– Search in the combined space of optimal feature subsets and hypotheses.
– Offer a good tradeoff between filter and wrapper approaches.
Filter FS Methods
37

• This method selects the feature without depending upon the type of classifier used.
• It does that by using statistical tests to find correlations between a feature and a class.
• The advantage of this method is that it is simple and independent of the type of classifier used,
so feature selection needs to be done only once (e.g., as a preprocessing step).
• The drawback of this method is that it ignores the interaction with the classifier, ignores
feature dependencies, and considers each feature separately.

*https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/
Filter FS Methods
38

 Univariate Filter Methods


1. Individual features are ranked according to specific criteria
2. The top N features are selected

 It may select redundant features because the relationship between individual


features is not taken into account while making decisions.

 Examples of criteria include variance and correlation of the feature.


 Variance thresholds remove features whose values don’t change much from observation to
observation (i.e. their variance falls below a threshold). These features provide little value.
 Correlation examines each feature individually to determine the strength of the relationship of the
feature with the response variable.
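A possible sketch of these two univariate criteria, using scikit-learn's VarianceThreshold plus a
simple correlation ranking; the features, the 0.05 variance threshold, and the data are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Illustrative data: three candidate features and a numeric response y.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "f_constant": np.ones(200) * 5 + rng.normal(0, 0.01, 200),  # nearly constant
    "f_noise":    rng.normal(0, 1, 200),
    "f_signal":   rng.normal(0, 1, 200),
})
y = 3 * X["f_signal"] + rng.normal(0, 0.5, 200)

# Variance threshold: drop features whose variance falls below 0.05.
vt = VarianceThreshold(threshold=0.05)
vt.fit(X)
kept = X.columns[vt.get_support()]
print(list(kept))                                   # ['f_noise', 'f_signal']

# Correlation criterion: rank each kept feature by |correlation| with the response.
corr_with_y = X[kept].apply(lambda col: np.corrcoef(col, y)[0, 1]).abs()
print(corr_with_y.sort_values(ascending=False))     # f_signal ranks highest
```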
Filter FS Methods
39

 Multivariate filter methods


 It calculates all pair-wise relationships among features according to a criterion

 They are capable of removing redundant features from the data since they take the
mutual relationship between the features into account.

 An example of criteria is Correlation of the features.


 Correlation thresholds remove features that are highly correlated with others (i.e. its values change
very similarly to another’s). These features provide redundant information.
Wrapper FS Methods
40

• In this method feature selection depends on the classifier, i.e., it uses the result of the
classifier to determine the goodness of a given feature or attribute.
• It does that by training the model using a subset of the features; after that, this method tries
to improve the model by adding/removing features.
• The advantage of this method is that it removes the drawback of the filter method, i.e., it
includes the interaction with the classifier and also takes feature dependencies into account.
• The drawback of this method is that it is slower than the filter method because it also takes
these dependencies into account.
• The quality of the feature selection is directly measured by the performance of the
classifier.
Wrapper FS Methods
41

(Figure: wrapper feature selection — candidate subsets drawn from the set of all features are
evaluated by the classifier.)
Wrapper FS Methods
42

 Step Forward Feature Selection


1. It starts with an empty set of features.
2. The performance of the classifier is evaluated with respect to each feature. The
feature that performs the best is selected out of all the features.
3. The first feature is tried in combination with all the other features. The
combination of two features that yield the best algorithm performance is
selected.
4. The process continues until the specified number of features are selected.
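One way to realize this procedure is a sketch that leans on scikit-learn's SequentialFeatureSelector
rather than hand-rolling the loop; the estimator, dataset, and the target of 2 features are
illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step forward selection: start from an empty set and greedily add the feature
# that most improves cross-validated accuracy, until 2 features are selected.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",      # use "backward" for step backward selection
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())      # boolean mask over the 4 iris features
```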
Wrapper FS Methods
43

 Step Backwards Feature Selection


1. It starts from the set of all features
2. One feature is removed in round-robin fashion from the feature set and the
performance of the classifier is evaluated. The feature set that yields the best
performance is retained.
3. Another feature is then removed in a round-robin fashion, and the performance of each
combination of features excluding the 2 removed features is evaluated.
4. This process continues until the specified number of features remain in the
dataset.
Wrapper FS Methods
44

 Exhaustive Feature Selection


1. The performance of a machine learning algorithm is evaluated against all possible combinations
of the features in the dataset.
2. The feature subset that yields best performance is selected.

 It is the most exhaustive (brute-force) of all the wrapper methods, since it tries every
combination of features and selects the best.
 It can be much slower than the step forward and step backward methods, since it
evaluates all feature combinations.


Embedded FS Methods
45

• This approach consists of algorithms which simultaneously perform model fitting and
feature selection.
• Examples of classifiers include decision tree (C4.5) and random forest.
• The advantage of this method is that it is less computationally intensive than a
wrapper approach.
• The accuracy of the classifier depends not only on the classification algorithm but
also on the feature selection method used.
• Selection of irrelevant and inappropriate features may confuse the classifier and
lead to incorrect results.
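As an illustration, here is a sketch under the assumption that a tree ensemble's built-in feature
importances are used as the embedded criterion; the dataset, forest size, and "mean" threshold are
illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# The forest is fitted once; feature importances fall out of the fitting itself.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Keep only features whose importance exceeds the mean importance.
selector = SelectFromModel(forest, threshold="mean", prefit=True)
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)    # e.g. (569, 30) -> (569, ~10)
```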
Embedded FS Methods
46

Source: https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/
Hybrid FS Methods
47

 It combines the best properties of filters and wrappers.


 First, a filter method is used to reduce the dimension of the feature space, possibly obtaining
several candidate subsets.


 Then, a wrapper is employed to find the best candidate subset.

 Hybrid methods usually achieve the high accuracy characteristic of wrappers and the high
efficiency characteristic of filters.
Summary
48

 Data quality: accuracy, completeness, consistency, timeliness, believability, interpretability.


 Data cleaning: e.g. missing/noisy values, outliers.
 Data integration from multiple sources: Entity identification problem, correlation analysis.
 Data transformation:
 Normalization
 Concept hierarchy generation
 Data reduction:
 Dimensionality reduction
 Numerosity reduction
 Data compression
 Feature selection:
 Filter Feature Selection
 Wrapper Feature Selection
 Hybrid Feature Selection
