III Unit Mtech 2023
• (global, local, optimization-based)
Sampling means selecting the group that you will actually collect data from
in your research
Purposive sampling: the sample units are selected with a definite purpose.
Sample Size
Architecture for feature subset
selection
Entropy-based discretization is a supervised, top-down splitting
approach. It uses class distribution information in its computation and
determination of split-points: the method chooses the value of A that
gives the minimum entropy as a split-point, and recursively partitions
the resulting intervals to arrive at a hierarchical discretization.
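A minimal Python sketch of a single entropy-minimizing split for one numeric attribute A; the function names and the recursion note are illustrative assumptions, not part of the slides:

import numpy as np

def entropy(labels):
    # Shannon entropy of a set of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split_point(values, labels):
    # Pick the value of attribute A whose binary split minimizes the
    # weighted entropy of the two resulting intervals
    values, labels = np.asarray(values), np.asarray(labels)
    best_v, best_e = None, float("inf")
    for v in np.unique(values)[:-1]:              # candidate split points
        left, right = labels[values <= v], labels[values > v]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e

# Recursively applying best_split_point to each resulting interval
# (until a stopping criterion is met) yields the hierarchical discretization.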
d = √[(x₂ – x₁)² + (y₂ – y₁)²]
The Density-based Clustering tool works by detecting areas where points are
concentrated and where they are separated by areas that are empty or sparse.
• Centroid-based Clustering.
• Density-based Clustering.
• Distribution-based Clustering.
• Hierarchical Clustering.
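As an illustration of density-based clustering (using the Euclidean distance d above), a minimal sketch with scikit-learn's DBSCAN; the data and the eps / min_samples values are invented for this example:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs of points plus a few sparse "noise" points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2)),
               rng.uniform(-2.0, 7.0, size=(5, 2))])

# eps: neighbourhood radius, measured with the Euclidean distance
# min_samples: points required within that radius to count as a dense region
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(np.unique(labels))   # cluster ids; -1 marks points in sparse/empty areas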
Data transformation
• Data transformation is the process of converting, cleansing, and structuring
raw data into a format that is more suitable for analysis. It improves data
quality, makes the data compatible with the requirements of a particular task
or system, and thereby supports decision-making and helps propel the growth
of an organization.
Types
The most common types of data transformation are:
• Constructive: The data transformation process adds, copies, or replicates
data.
• Destructive: The system deletes fields or records.
• Aesthetic: The transformation standardizes the data to meet requirements
or parameters.
Phases
• The four stages of digital transformation, and what is needed to move
forward:
• Planning.
• Implementation.
• Acceleration.
• Measurement.
Techniques
• Data transformation techniques such as manipulation,
normalization, attribute construction,
generalization, discretization, aggregation,
and smoothing help solve various
problems that arise in data analysis projects
(a short sketch follows below).
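A small sketch of a few of these techniques (normalization, discretization, aggregation, and smoothing by bin means) using pandas; the column names and values are made up for illustration:

import pandas as pd

# A tiny illustrative table
df = pd.DataFrame({"dept": ["A", "A", "B", "B"],
                   "salary": [30000, 52000, 48000, 90000]})

# Normalization: min-max scale salary into [0, 1]
df["salary_norm"] = (df["salary"] - df["salary"].min()) / (df["salary"].max() - df["salary"].min())

# Discretization: continuous salary -> categorical bands
df["salary_band"] = pd.cut(df["salary"],
                           bins=[0, 40000, 60000, float("inf")],
                           labels=["low", "medium", "high"])

# Aggregation: average salary per department
dept_avg = df.groupby("dept")["salary"].mean()

# Smoothing by bin means: replace each value with the mean of its band
df["salary_smooth"] = df.groupby("salary_band", observed=True)["salary"].transform("mean")
print(df)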
• What are the 5 levels of transformation?
• The five stages of change are precontemplation, contemplation,
preparation, action, and maintenance.
CLASSIFIER
SVM—History and Applications
• Vapnik and colleagues (1992)—groundwork from Vapnik & Chervonenkis’
statistical learning theory in 1960s
• Features: training can be slow but accuracy is high owing to their ability to
model complex nonlinear decision boundaries (margin maximization)
• Used for both classification and prediction
• Applications:
– handwritten digit recognition, object recognition, speaker
identification, benchmarking time-series prediction tests
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple
with its associated class label yi
There are infinite lines (hyperplanes) separating the two classes but we want to
find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., maximum
marginal hyperplane (MMH)
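A minimal scikit-learn sketch of training a maximum-margin (linear) SVM; the toy tuples and labels are invented for illustration:

from sklearn import svm

# Toy 2-D training tuples Xi with class labels yi (made-up data)
X = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]]
y = [0, 0, 0, 1, 1, 1]

# A linear-kernel SVC searches for the separating hyperplane with the
# largest margin, i.e. the maximum marginal hyperplane (MMH)
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)          # the tuples that define the margin
print(clf.predict([[2, 2], [6, 6]])) # -> [0 1]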
Bayesian Classification
• Statistical classifiers
• Based on Bayes' theorem
• Naïve Bayesian classification
• Class conditional independence
• Bayesian belief networks
Lecture-35 - Bayesian Classification
Bayesian Theorem: Basics
• Let X be a data sample ("evidence"); its class label is unknown
• Let H be the hypothesis that X belongs to class C
• Classification is to determine P(C|X), the probability that the hypothesis holds
given the observed data sample X
• P(C) (prior probability): the initial probability of class C
– E.g., X will buy a computer, regardless of age, income, …
• P(X): the probability that the sample data is observed
• P(X|C) (likelihood): the probability of observing the sample X, given that the
hypothesis holds
– E.g., given that X will buy a computer, the probability that X is 31..40 with
medium income
Bayesian Theorem
• Given training data X, the posterior probability of a class C, P(C|X), follows
Bayes' theorem:
P(C|X) = P(X|C) P(C) / P(X)
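A tiny worked instance of the theorem (the probability values below are made up purely for illustration):

# Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X)
p_c   = 0.30   # prior P(C): a customer buys a computer
p_x_c = 0.60   # likelihood P(X|C): customer is 31..40 given that they buy
p_x   = 0.40   # evidence P(X): customer is 31..40 overall

p_c_x = p_x_c * p_c / p_x
print(p_c_x)   # posterior P(C|X) = 0.45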
Towards Naïve Bayesian Classifier
• Let D be a training set of tuples and their associated
class labels, and each tuple is represented by an n-D
attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori,
i.e., the maximal P(Ci|X)
• This can be derived from Bayes' theorem (1 ≤ i ≤ m):
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized
Derivation of Naïve Bayes Classifier
• A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between
attributes):
P(X|Ci) = ∏(k = 1..n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
• This greatly reduces the computation cost: only the class
distribution needs to be counted
• If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having
value xk for Ak divided by |Ci, D| (# of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed
based on a Gaussian distribution with mean μ and
standard deviation σ:
g(x, μ, σ) = [1 / (√(2π) · σ)] · e^(−(x − μ)² / (2σ²))
and P(xk|Ci) = g(xk, μCi, σCi)
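A short sketch of this Gaussian likelihood for a continuous attribute; the mean and standard deviation values are illustrative, standing in for estimates computed from the tuples of class Ci:

import math

def gaussian(x, mu, sigma):
    # g(x, mu, sigma) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)**2 / (2 * sigma**2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# e.g. attribute 'age' in class Ci with sample mean 38 and standard deviation 12
p_age_given_ci = gaussian(35, mu=38, sigma=12)
print(p_age_given_ci)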
Naïve Bayesian Classifier: Training Dataset
Class labels:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'
Data sample to classify:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
Naïve Bayesian Classifier: An Example
• P(Ci): P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
• P(xk|Ci) for each attribute value of X:
P(age <= 30 | yes) = 2/9            P(age <= 30 | no) = 3/5
P(income = medium | yes) = 4/9      P(income = medium | no) = 2/5
P(student = yes | yes) = 6/9        P(student = yes | no) = 1/5
P(credit_rating = fair | yes) = 6/9 P(credit_rating = fair | no) = 2/5
• P(X|yes) = 2/9 × 4/9 × 6/9 × 6/9 = 0.044;  P(X|no) = 3/5 × 2/5 × 1/5 × 2/5 = 0.019
• P(X|yes) P(yes) = 0.044 × 0.643 = 0.028 > P(X|no) P(no) = 0.019 × 0.357 = 0.007,
so X is classified as buys_computer = "yes"
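A minimal from-scratch sketch that reproduces these counts and products from the training table above (written only for illustration, not a library implementation):

from collections import Counter

# (age, income, student, credit_rating, buys_computer) tuples from the table above
data = [
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),       (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),          (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),  ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),         (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),      (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")              # the sample X to classify

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)                        # prior P(Ci)
    for k, value in enumerate(x):                  # multiply by each P(xk|Ci)
        n_match = sum(1 for row in data if row[-1] == c and row[k] == value)
        score *= n_match / n_c
    scores[c] = score

print(scores)                                      # approx {'no': 0.007, 'yes': 0.028}
print(max(scores, key=scores.get))                 # 'yes' -> X buys a computer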
Naïve Bayesian Classifier: Comments
• Advantages
– Easy to implement
– Good results obtained in most of the cases
• Disadvantages
– Assumption: class conditional independence, therefore loss of
accuracy
– Practically, dependencies exist among variables
• E.g., salary and age; or symptoms (fever, cough, etc.) and diseases
(lung cancer, diabetes, etc.)
• Dependencies among these cannot be modeled by the Naïve Bayesian
classifier
• How to deal with these dependencies?
– Bayesian Belief Networks
Play-tennis example: estimating P(xi|C)
Training data:
Outlook    Temperature   Humidity   Windy   Class
sunny      hot           high       false   N
sunny      hot           high       true    N
overcast   hot           high       false   P
rain       mild          high       false   P
rain       cool          normal     false   P
rain       cool          normal     true    N
overcast   cool          normal     true    P
sunny      mild          high       false   N
sunny      cool          normal     false   P
rain       mild          normal     false   P
sunny      mild          normal     true    P
overcast   mild          high       true    P
overcast   hot           normal     false   P
rain       mild          high       true    N

Class priors:  P(p) = 9/14,  P(n) = 5/14

outlook:      P(sunny|p) = 2/9      P(sunny|n) = 3/5
              P(overcast|p) = 4/9   P(overcast|n) = 0
              P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9        P(hot|n) = 2/5
              P(mild|p) = 4/9       P(mild|n) = 2/5
              P(cool|p) = 3/9       P(cool|n) = 1/5
humidity:     P(high|p) = 3/9       P(high|n) = 4/5
              P(normal|p) = 6/9     P(normal|n) = 2/5
windy:        P(true|p) = 3/9       P(true|n) = 3/5
              P(false|p) = 6/9      P(false|n) = 2/5
Play-tennis example: classifying X
• An unseen sample X = (rain, hot, high, false)
• P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
• P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
• Since P(X|n)·P(n) > P(X|p)·P(p), sample X is classified as class n (do not play)
How effective are Bayesian classifiers?
Using IF-THEN Rules for Classification
Rule Extraction from a Decision Tree
[Decision-tree figure: the root node splits on age]