Course Code: CSA3002
MACHINE LEARNING ALGORITHMS
Course Type: LPC – 2-2-3
Course Objectives
• The objective of the course is to familiarize learners with the concepts of machine learning algorithms and to develop practical skills through experiential learning techniques.
Course Outcomes
At the end of the course, students should be able to
1. Understand training and testing of datasets using machine learning techniques.
2. Apply optimization and parameter tuning techniques for machine learning algorithms.
3. Apply machine learning models to solve a variety of problems.
4. Apply machine learning algorithms to create models.
DATA PRE-PROCESSING
Table of Contents:
• Why preprocess the data?
• Data cleaning
• Data integration and transformation
Why Is Data Dirty?
• Incomplete data may come from
• “Not applicable” data value when collected
• Different considerations between the time when the data was collected
and when it is analyzed.
• Human/hardware/software problems
• Noisy data (incorrect values) may come from
• Faulty data collection instruments
• Human or computer error at data entry
• Errors in data transmission
• Duplicate records also need data cleaning
Why Is Data Preprocessing Important?
• No quality data, no quality mining results!
• Quality decisions must be based on quality data
• e.g., duplicate or missing data may cause incorrect or even
misleading statistics.
• Data warehouse needs consistent integration of quality data
• Data extraction, cleaning, and transformation comprise the majority of the work of building a data warehouse
Multi-Dimensional Measure of Data Quality
• A well-accepted multidimensional view:
• Accuracy
• Completeness
• Consistency
• Timeliness
• Believability
• Value added
• Interpretability
• Accessibility
• Broad categories:
• Intrinsic, contextual, representational, and accessibility
Major Tasks in Data Preprocessing
• Data cleaning
• Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
• Data integration
• Integration of multiple databases, data cubes, or files
• Data transformation
• Normalization and aggregation
• Data reduction
• Obtains reduced representation in volume but produces the same or
similar analytical results
• Data discretization
• Part of data reduction but with particular importance, especially for
numerical data
Forms of Data Preprocessing
Data Cleaning
• Importance
• “Data cleaning is one of the biggest problems in data
warehousing”—Ralph Kimball
• “Data cleaning is the number one problem in data
warehousing”—DCI survey
• Data cleaning tasks
• Fill in missing values
• Identify outliers and smooth out noisy data
• Correct inconsistent data
• Resolve redundancy caused by data integration
Missing Data
• Data is not always available
• E.g., many tuples have no recorded value for several attributes, such
as customer income in sales data
• Missing data may be due to
• equipment malfunction
• data deleted because it was inconsistent with other recorded data
• data not entered due to misunderstanding
• certain data not considered important at the time of entry
• failure to register the history or changes of the data
• Missing data may need to be inferred.
How to Handle Missing Data?
• Ignore the tuple: usually done when class label is missing (assuming the
tasks in classification—not effective when the percentage of missing values
per attribute varies considerably.
• Fill in the missing value manually: tedious + infeasible?
• Fill in it automatically with
• a global constant : e.g., “unknown”, a new class?!
• the attribute mean
• the attribute mean for all samples belonging to the same class: smarter
• the most probable value: inference-based such as Bayesian formula or
decision tree
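A minimal pandas sketch (not part of the original slides) of the automatic fill-in strategies above; the column names income and class are hypothetical:

```python
import pandas as pd

# Hypothetical toy data with missing income values
df = pd.DataFrame({
    "income": [52000.0, None, 61000.0, None, 47000.0, 58000.0],
    "class":  ["A", "A", "B", "B", "A", "B"],
})

# Fill with a global constant (a sentinel meaning "unknown")
df["income_const"] = df["income"].fillna(-1)

# Fill with the overall attribute mean
df["income_mean"] = df["income"].fillna(df["income"].mean())

# Fill with the attribute mean of samples belonging to the same class (smarter)
df["income_class_mean"] = df["income"].fillna(
    df.groupby("class")["income"].transform("mean")
)

print(df)
```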
Noisy Data
• Noise: random error or variance in a measured variable
• Incorrect attribute values may be due to
• faulty data collection instruments
• data entry problems
• data transmission problems
• technology limitation
• inconsistency in naming convention
• Other data problems that require data cleaning
• duplicate records
• incomplete data
• inconsistent data
How to Handle Noisy Data?
• Binning
• first sort data and partition into (equal-frequency) bins
• then one can smooth by bin means, smooth by bin median,
smooth by bin boundaries, etc.
• Regression
• smooth by fitting the data into regression functions
• Clustering
• detect and remove outliers
• Combined computer and human inspection
• detect suspicious values and check by human (e.g., deal with
possible outliers)
Binning
• Binning methods smooth a sorted data value by consulting
its “neighborhood,” that is, the values around it. The sorted
values are distributed into a number of “buckets,” or bins.
Because binning methods consult the neighborhood of
values, they perform local smoothing.
• In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. For example, the mean of the values 4, 8, 9, and 15 in Bin 1 is 9; therefore, each original value in this bin is replaced by the value 9.
• Similarly, smoothing by bin medians can be employed, in which each bin value is replaced by the bin median.
• In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as the bin boundaries. Each bin value is then replaced by the closest boundary value.
Binning Methods for Data Smoothing
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
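A short NumPy sketch (an illustration, not from the slides) that reproduces the example above, assuming equal-frequency bins of four values each:

```python
import numpy as np

prices = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Partition the sorted data into equal-frequency bins of 4 values each
bins = np.sort(prices).reshape(-1, 4)

# Smoothing by bin means: replace every value with its bin's (rounded) mean
means = np.round(bins.mean(axis=1)).astype(int)
by_means = np.repeat(means, 4).reshape(-1, 4)

# Smoothing by bin boundaries: replace every value with the nearer of the
# bin's minimum and maximum
lo = bins.min(axis=1, keepdims=True)
hi = bins.max(axis=1, keepdims=True)
by_bounds = np.where(bins - lo <= hi - bins, lo, hi)

print(by_means)   # [[ 9  9  9  9] [23 23 23 23] [29 29 29 29]]
print(by_bounds)  # [[ 4  4  4 15] [21 21 25 25] [26 26 26 34]]
```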
Regression
• Data smoothing can also be done by regression, a technique that conforms data values to a function. Linear regression involves finding the “best” line to fit two attributes (or variables) so that one attribute can be used to predict the other (see the sketch below).
[Figure: a fitted regression line y = x + 1; for a given value X1, the smoothed value Y1’ on the line replaces the observed value Y1.]
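As a hedged illustration (synthetic data, not from the slides), a least-squares line can be fitted with NumPy and used to smooth the observed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy observations roughly following y = x + 1
x = np.linspace(0, 10, 20)
y = x + 1 + rng.normal(scale=0.5, size=x.size)

# Fit the "best" (least-squares) line and replace noisy y-values
# with the fitted predictions
slope, intercept = np.polyfit(x, y, deg=1)
y_smoothed = slope * x + intercept

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```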
Cluster Analysis
Outliers may be detected by
clustering, for example, where
similar values are organized into
groups, or “clusters.” Intuitively,
values that fall outside of the set of clusters may be considered outliers (see the sketch below).
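A small scikit-learn sketch of this idea; the data, the number of clusters, and the distance threshold are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two well-separated groups of values plus one obvious outlier
values = np.concatenate([
    rng.normal(10, 1, 50),
    rng.normal(50, 1, 50),
    [200.0],
]).reshape(-1, 1)

# Organize the values into clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(values)

# Distance of each value to its assigned cluster centroid
dist = np.abs(values - kmeans.cluster_centers_[kmeans.labels_]).ravel()

# Flag values that fall far outside their cluster (heuristic threshold)
outliers = values.ravel()[dist > 3 * dist.std()]
print(outliers)  # expected to contain the value 200.0
```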
Data Integration
• Data integration:
• Combines data from multiple sources into a coherent store
• Schema integration: e.g., A.cust-id ≡ B.cust-#
• Integrate metadata from different sources
• 1. Entity identification problem:
• Identify real world entities from multiple data sources, e.g.,
Bill Clinton = William Clinton
• Detecting and resolving data value conflicts
• For the same real world entity, attribute values from different
sources are different
• Possible reasons: different representations, different scales,
e.g., metric vs. British units
2. Handling Redundancy in Data Integration
• Redundant data often occur when integrating multiple databases
• Object identification: The same attribute or object may
have different names in different databases
• Derivable data: One attribute may be a “derived” attribute
in another table, e.g., annual revenue
• Redundant attributes may be detected by correlation analysis (see the sketch below)
• Careful integration of the data from multiple sources may help
reduce/avoid redundancies and inconsistencies and improve
mining speed and quality
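A hypothetical pandas sketch of correlation analysis for redundancy detection; the column names and the 0.95 threshold are assumptions for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

monthly = rng.normal(10_000, 2_000, 200)
df = pd.DataFrame({
    "monthly_revenue": monthly,
    "annual_revenue":  monthly * 12,               # derivable, hence redundant
    "num_employees":   rng.integers(5, 500, 200),
})

# Absolute pairwise (Pearson) correlations between numeric attributes
corr = df.corr().abs()

# Keep each pair once (upper triangle) and report highly correlated pairs
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
print(pairs[pairs > 0.95])
```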
3. Tuple Duplication
• At the time data integration, duplication should also be
detected at the tuple level (e.g., where there are two or
more identical tuples for a given unique data entry
case).
• The use of denormalized tables (often done to improve
performance by avoiding joins) is another source of data
redundancy.
• Inconsistencies often arise between various duplicates,
due to inaccurate data entry or updating some but not
all data occurrences.
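A minimal pandas sketch (hypothetical table) for detecting and removing duplicate tuples:

```python
import pandas as pd

# Hypothetical integrated customer table containing a duplicated tuple
df = pd.DataFrame({
    "cust_id": [101, 102, 102, 103],
    "name":    ["Ann", "Bob", "Bob", "Eve"],
    "city":    ["Pune", "Delhi", "Delhi", "Goa"],
})

# Detect duplicate tuples (both copies are shown) ...
print(df[df.duplicated(keep=False)])

# ... and keep a single copy of each
deduplicated = df.drop_duplicates()
print(deduplicated)
```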
4. Data Value Conflict Detection and Resolution
• Data integration also involves the detection and resolution of data value conflicts.
• For example, for the same real-world entity, attribute values from different sources may
differ. This may be due to differences in representation, scaling, or encoding. For
instance, a weight attribute may be stored in metric units in one system and British
imperial units in another.
• For a hotel chain, the price of rooms in different cities may involve not only different
currencies but also different services (e.g., free breakfast) and taxes.
Data Transformation
• Strategies for data transformation include the following:
• Smoothing: remove noise from data
• Aggregation: summarization, data cube construction
• Generalization: concept hierarchy climbing
• Normalization: scaled to fall within a small, specified range
• min-max normalization
• z-score normalization
• normalization by decimal scaling
• Attribute/feature construction
• New attributes constructed from the given ones
1. Data Transformation: Normalization
• Min-max normalization: to [new_min_A, new_max_A]
  v' = ((v − min_A) / (max_A − min_A)) × (new_max_A − new_min_A) + new_min_A
• Ex. Let income range from $12,000 to $98,000 be normalized to [0.0, 1.0]. Then $73,600 is mapped to
  ((73,600 − 12,000) / (98,000 − 12,000)) × (1.0 − 0) + 0 = 0.716
• Z-score normalization (μ_A: mean, σ_A: standard deviation of A):
  v' = (v − μ_A) / σ_A
• Ex. Let μ_A = 54,000 and σ_A = 16,000. Then 73,600 is mapped to (73,600 − 54,000) / 16,000 = 1.225
• Normalization by decimal scaling:
  v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1
• Ex. Suppose the recorded values of A range from −986 to 917. The maximum absolute value of A is 986. To normalize by decimal scaling, we therefore divide each value by 1,000 (i.e., j = 3), so that −986 normalizes to −0.986 and 917 normalizes to 0.917.
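A small NumPy sketch reproducing the income example above under the stated assumptions (mean 54,000 and standard deviation 16,000 for the z-score case):

```python
import numpy as np

income = np.array([12_000, 54_000, 73_600, 98_000], dtype=float)

# Min-max normalization to [0.0, 1.0]
mn, mx = income.min(), income.max()
minmax = (income - mn) / (mx - mn) * (1.0 - 0.0) + 0.0

# Z-score normalization, using the slide's mean and standard deviation
zscore = (income - 54_000) / 16_000

# Decimal scaling: divide by 10^j so that the maximum absolute value is below 1
j = int(np.ceil(np.log10(np.abs(income).max())))
decimal = income / 10 ** j

print(round(minmax[2], 3))  # 0.716 -> $73,600 mapped into [0, 1]
print(round(zscore[2], 3))  # 1.225 -> z-score of $73,600
print(decimal)
```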
Pre-Processing
Table of Contents:
• Data reduction
• Discretization and concept hierarchy generation
Module Outcome:
• Apply various pre-processing techniques to a dataset
Data Reduction Strategies
• Why data reduction?
• A database/data warehouse may store terabytes of data
• Complex data analysis/mining may take a very long time to run on the
complete data set
• Data reduction
• Obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results
• Data reduction strategies
1) Dimensionality reduction
2) Numerosity reduction
3) Data compression
Dimensionality reduction
• It is the process of reducing the number of random variables or attributes
under consideration.
• Attribute subset selection is a method of dimensionality reduction in which
irrelevant, weakly relevant, or redundant attributes or dimensions are
detected and removed
• Dimensionality reduction methods include wavelet transforms and principal components analysis, which transform or project the original data onto a smaller space.
Attribute Subset Selection
• Feature selection (i.e., attribute subset selection):
• It is the process of selecting a subset of relevant features for use in model
construction.
Heuristic methods (due to exponential # of choices):
• Step-wise forward selection
• Step-wise backward elimination
• Combining forward selection and backward elimination
• Decision-tree induction
Heuristic Feature Selection Methods
• Several heuristic feature selection methods:
• Best step-wise feature selection:
• The best single feature is picked first
• Then the next best feature conditioned on the first, ...
• Step-wise feature elimination:
• Repeatedly eliminate the worst feature
• Best combined feature selection and elimination (see the sketch below)
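As a hedged sketch of step-wise forward selection (the dataset, classifier, and target of five features are illustrative choices), scikit-learn's SequentialFeatureSelector can be used:

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True, as_frame=True)

# Step-wise forward selection: greedily add the single feature that most
# improves cross-validated accuracy until 5 features have been chosen
selector = SequentialFeatureSelector(
    KNeighborsClassifier(),
    n_features_to_select=5,
    direction="forward",   # "backward" gives step-wise elimination instead
)
selector.fit(X, y)

print(list(X.columns[selector.get_support()]))
```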
Dimensionality Reduction: Principal
Component Analysis (PCA)
• Given N data vectors in n dimensions, find k ≤ n orthogonal vectors (principal components) that can best be used to represent the data.
• PCA is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set (see the sketch below).
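A minimal scikit-learn sketch of PCA; the Iris data and k = 2 are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize, then project the 4-dimensional data onto k = 2 orthogonal
# principal components that retain most of the variance
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance kept per component
```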
Image Compression
Image compression is the process of encoding or
converting an image file in such a way that it consumes
less space than the original file.
Numerosity reduction
• Numerosity reduction techniques replace the
original data volume by alternative, smaller forms of
data representation. These techniques may be
parametric or nonparametric.
• For parametric methods, a model is used to
estimate the data, so that typically only the data
parameters need to be stored, instead of the actual
data. (Outliers may also be stored.) Regression and
log-linear models are examples.
• Nonparametric methods for storing reduced representations of the data include histograms, clustering, sampling, and data cube aggregation.
Data Reduction Method: Histograms
• A histogram represents data in terms of frequency. It uses binning to approximate the data distribution and is a popular form of data reduction.
[Figure: histogram of price values with bins at 10,000, 30,000, 50,000, 70,000, and 90,000; frequencies on a 0–40 scale.]
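A short NumPy sketch of histogram-based reduction; the synthetic prices and the choice of five equal-width bins are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(0, 100_000, 10_000)

# Replace 10,000 raw values by 5 equal-width bins and their frequencies
counts, edges = np.histogram(prices, bins=5, range=(0, 100_000))

for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:>9,.0f}, {hi:>9,.0f}): {c}")
```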
Data Reduction Method: Clustering
• Partition data set into clusters based on similarity, and store cluster
representation (e.g., centroid and diameter) only
• Can be very effective if data is clustered
• Can have hierarchical clustering and be stored in multi-dimensional index
tree structures
• There are many choices of clustering definitions and clustering algorithms
Data compression
• In data compression, transformations are applied so as to obtain a
reduced or “compressed” representation of the original data.
• If the original data can be reconstructed from the compressed data
without any information loss, the data reduction is called lossless.
• If, instead, we can reconstruct only an approximation of the original
data, then the data reduction is called lossy.
• Dimensionality reduction and numerosity reduction techniques can
also be considered forms of data compression.
Data Reduction Method: Sampling
• Sampling: obtaining a small sample s to represent the whole
data set N.
• Techniques:
• 1) Simple random sample without replacement (SRSWOR) of size s
• 2) Simple random sample with replacement (SRSWR) of size s
• 3) Cluster sample (see the sketch below)
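A pandas/NumPy sketch of the three techniques; the toy table and the use of the group column as "clusters" are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.normal(size=1_000),
    "group": rng.choice(["A", "B", "C", "D"], size=1_000),
})
s = 100

# 1) Simple random sample without replacement (SRSWOR) of size s
srswor = df.sample(n=s, replace=False, random_state=0)

# 2) Simple random sample with replacement (SRSWR) of size s
srswr = df.sample(n=s, replace=True, random_state=0)

# 3) Cluster sample: randomly pick whole clusters (here, the groups)
#    and keep all of their tuples
chosen = rng.choice(df["group"].unique(), size=2, replace=False)
cluster_sample = df[df["group"].isin(chosen)]

print(len(srswor), len(srswr), len(cluster_sample))
```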
Data Discretization
• Data discretization is defined as a process of converting
continuous data attribute values into a finite set of intervals
and associating with each interval some
specific data value.
Discretization
• Three types of attributes:
• Nominal — values from an unordered set, e.g., color, profession
• Ordinal — values from an ordered set, e.g., military or academic rank
• Continuous (numeric) — integer or real numbers
• Discretization:
• Divide the range of a continuous attribute into intervals
• Some classification algorithms only accept categorical attributes.
• Reduce data size by discretization
Discretization
• Discretization
• Reduce the number of values for a given continuous attribute by dividing
the range of the attribute into intervals
• Interval labels can then be used to replace actual data values
• Supervised vs. unsupervised
• Split (top-down) vs. merge (bottom-up)
• Discretization can be performed recursively on an attribute
Discretization for Numeric Data
• Typical methods (all of them can be applied recursively):
• Binning
• Top-down split, unsupervised
• Histogram analysis
• Clustering analysis
• Either top-down split or bottom-up merge, unsupervised
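A small pandas sketch of unsupervised discretization; the ages and the interval labels are hypothetical:

```python
import pandas as pd

ages = pd.Series([6, 12, 15, 21, 23, 30, 34, 41, 55, 62, 70, 81])

# Equal-width binning (top-down split, unsupervised)
equal_width = pd.cut(ages, bins=3, labels=["young", "middle", "senior"])

# Equal-frequency binning
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(pd.DataFrame({"age": ages,
                    "equal_width": equal_width,
                    "equal_freq": equal_freq}))
```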