Supervised Learning
Algorithms
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rule-based classification
Classification by back propagation
Support Vector Machines (SVM)
Model selection
Summary
Objectives
Learn basic techniques for data classification and prediction.
Understand the difference between the following:
supervised classification
prediction
unsupervised classification
What is Classification?
The goal of data classification is to organize and categorize data
in distinct classes.
A model is first created.
The model is then used to classify new data.
Given the model, a class can be predicted for new
data.
Classification = prediction for discrete and nominal values
What is Prediction?
The goal of prediction is to forecast or deduce the value of an attribute based
on values of other attributes.
A model is first created based on the data distribution.
The model is then used to predict future or unknown values
In Machine Learning:
If forecasting a discrete value → Classification
If forecasting a continuous value → Prediction
Classification Example
Example training database
Two predictor attributes: Age and Car-type (Sport, Minivan and Truck)
Age is numeric, Car-type is a categorical attribute
Class label indicates whether the person bought the product
Dependent attribute is categorical

Age  Car  Class
20   M    Yes
30   M    Yes
25   T    No
30   S    Yes
40   S    Yes
20   T    No
30   M    Yes
25   M    Yes
40   M    Yes
20   S    No
Regression (Prediction) Example
Example training database
Two predictor attributes: Age and Car-type (Sport, Minivan and Truck)
Spent indicates how much the person spent during a recent visit to the web site
Dependent attribute is numerical

Age  Car  Spent
20   M    $200
30   M    $150
25   T    $300
30   S    $220
40   S    $400
20   T    $80
30   M    $100
25   M    $125
40   M    $500
20   S    $420
Supervised and Unsupervised
Supervised Classification = Classification
We know the class labels and the number of classes
Unsupervised Classification = Clustering
We do not know the class labels and may not know the
number of classes
Preparing Data Before Classification
Data transformation:
Discretization of continuous data
Normalization to [-1..1] or [0..1]
Data Cleaning:
Smoothing to reduce noise
Relevance Analysis:
Feature selection to eliminate irrelevant attributes
Applications
Credit approval
Target marketing
Medical diagnosis
Defective parts identification in manufacturing
Crime zoning
Treatment effectiveness analysis
Classification is a 3-step process
1. Model construction (Learning):
Each tuple is assumed to belong to a predefined class, as
determined by one of the attributes, called the class label.
The set of all tuples used for construction of the model is
called training set.
The model is represented in one of the following forms:
Classification rules (IF-THEN statements)
Decision tree
Mathematical formulae
1. Classification Process (Learning)
Training Data (class label: Credit rating):

Name   Income  Age       Credit rating
Samir  Low     <30       bad
Ahmed  Medium  [30..40]  good
Salah  High    <30       good
Ali    Medium  >40       good
Sami   Low     [30..40]  good
Emad   Medium  <30       bad

Classification Method → Classification Model, e.g.:
IF Income = 'High' OR Age > 30 THEN Class = 'Good'
OR a Decision Tree
OR a Mathematical Formula
Classification is a 3-step process
2. Model Evaluation (Accuracy):
Estimate accuracy rate of the model based on a test set.
The known label of test sample is compared with the classified
result from the model.
Accuracy rate is the percentage of test set samples that are
correctly classified by the model.
Test set is independent of the training set; otherwise over-fitting will
occur.
2. Classification Process (Accuracy Evaluation)
Test Data (known class vs. model prediction):

Name   Income  Age       Credit rating  Model
Naser  Low     <30       Bad            Bad
Lutfi  Medium  <30       Bad            good
Adel   High    >40       good           good
Fahd   Medium  [30..40]  good           good

Accuracy = 75% (3 of the 4 test samples are classified correctly)
Classification is a three-step process
3. Model Use (Classification):
The model is used to classify unseen objects.
Give a class label to a new tuple
Predict the value of an attribute (prediction)
3. Classification Process (Use)
A new, unseen tuple is fed to the Classification Model:

Name   Income  Age   Credit rating
Adham  Low     <30   ?
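As a concrete illustration of the three steps, here is a minimal sketch (not part of the original slides). It assumes scikit-learn and pandas are available; the tiny credit-rating data simply mirrors the example tables above, and the encoder/model choices are just one reasonable setup.

```python
# Minimal sketch of the 3-step classification process (illustrative only).
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1: model construction (learning) from a labeled training set
train = pd.DataFrame({
    "Income": ["Low", "Medium", "High", "Medium", "Low", "Medium"],
    "Age":    ["<30", "30-40", "<30", ">40", "30-40", "<30"],
    "Credit": ["bad", "good", "good", "good", "good", "bad"],
})
enc = OrdinalEncoder()
X_train = enc.fit_transform(train[["Income", "Age"]])
model = DecisionTreeClassifier().fit(X_train, train["Credit"])

# Step 2: model evaluation on an independent test set
test = pd.DataFrame({
    "Income": ["Low", "Medium", "High", "Medium"],
    "Age":    ["<30", "<30", ">40", "30-40"],
    "Credit": ["bad", "bad", "good", "good"],
})
X_test = enc.transform(test[["Income", "Age"]])
print("accuracy:", accuracy_score(test["Credit"], model.predict(X_test)))

# Step 3: model use -- classify a previously unseen tuple
new_tuple = pd.DataFrame({"Income": ["Low"], "Age": ["<30"]})
print("predicted class:", model.predict(enc.transform(new_tuple))[0])
```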
Classification Methods
Decision Tree Induction
Neural Networks
Bayesian Classification
Association-Based Classification
K-Nearest Neighbour
Case-Based Reasoning
Genetic Algorithms
Rough Set Theory
Fuzzy Sets
Etc.
Comparing Classification and Prediction Methods
Accuracy: the ability of the model to correctly predict the
class label of new or previously unseen data.
classifier accuracy: predicting the class label of
new or previously unseen data.
predictor accuracy: predicting the value of the target
attribute for new or previously unseen data.
Speed (computational cost)
time to construct the model (training time)
time to use the model (classification/prediction
time)
Comparing Classification and Prediction Methods
Robustness: handling noise and missing values
(ability of model to make correct predictions)
Scalability: the ability to construct the model
efficiently given large amounts of data.
Interpretability: the level of understanding and insight
provided by the model (classifier or predictor).
Other measures, e.g., goodness of rules, such as
decision tree size.
Decision Tree
What is a Decision Tree?
A decision tree is a flow-chart-like tree structure.
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf node represents class label
A quick recap of Linear Regression – Linear models
Can Linear Regression help us in this scenario?
How does a Decision Tree come to the rescue?
Simplicity
Feature selection
Handling different types of data
What is a tree?
What is a Decision Tree (DT)?
Use cases of Decision Trees
What is the root node of a Decision Tree?
What is a decision node of a Decision Tree?
What are the leaf nodes of a Decision Tree?
CART (Classification and Regression Tree)
CART: Classification Tree
CART: Regression Tree
How to build a Decision Tree?
ID3 (Iterative Dichotomiser 3)
What is Entropy?
Entropy: It is used to measure disorder in the system. If, in a particular node,
all examples are positive OR all examples are negative (i.e. all examples belong to
the same class), then it is a homogeneous set of examples and entropy is low.
However, if we have two classes and half of the examples belong to one
class and half belong to the other class, then entropy is high.

Entropy(S) = − Σ_{i=1..m} pi · log2(pi)
Entropy of heterogeneous data
Information Gain (IG)

Entropy(S, A) = Σ_{j=1..v} (|Dj| / |D|) · I(Dj)

Gain(S, A) = Entropy(S) − Entropy(S, A)
Calculate Entropy & Information
Gain to build a Decision Tree
Step 1: Let’s calculate Entropy for
entire sample
Step 2: Calculate Entropy for each
column
Entropy(S, A) = Σ_{j=1..v} (|Dj| / |D|) · I(Dj)
Step 3: Calculate Information Gain
Information Gain from all
attributes
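The slides compute these quantities by hand; the following is a small Python sketch (not part of the original material) of the same entropy and information-gain formulas, with a tiny made-up dataset only to show the call.

```python
# Illustrative helpers implementing the entropy and information-gain formulas above.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_j |Dj|/|D| * Entropy(Dj),
    where the Dj are the partitions of the rows by the value of `attribute`."""
    total = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value, count in Counter(r[attribute] for r in rows).items():
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (count / n) * entropy(subset)
    return total - remainder

# Tiny made-up example of the call
rows = [
    {"age": "<=30", "buys": "no"}, {"age": "<=30", "buys": "no"},
    {"age": "31..40", "buys": "yes"}, {"age": ">40", "buys": "yes"},
]
print(information_gain(rows, "age", "buys"))
```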
How does the tree look initially
Build Decision Tree –But what
next?
Build Decision Tree –next is here
How does the tree look finally?
Decision rules
Sample Decision Tree
Excellent customers
Fair customers
[Scatter plot: customers plotted by Income (x-axis: 2000, 6000, 10000) and Age (y-axis: 20, 50, 80)]
Tree: Income < 6K → No;  Income >= 6K → YES
Sample Decision Tree
[Same scatter plot of Age vs. Income, now partitioned by two splits]
Tree: Income < 6K → NO;  Income >= 6K → test Age: Age < 50 → NO, Age >= 50 → Yes
Decision-Tree Classification Methods
The basic top-down decision tree generation approach usually
consists of two phases:
1. Tree construction
At the start, all the training examples are at the root.
Examples are partitioned recursively based on selected
attributes.
2. Tree pruning
Aims at removing tree branches that may reflect
noise in the training data and lead to errors when
classifying test data, thereby improving classification accuracy.
How to Specify Test Condition?
Depends on attribute types
Nominal
Ordinal
Continuous
Depends on number of ways to split
2-way split
Multi-way split
Splitting Based on Nominal Attributes
Multi-way split: Use as many partitions as distinct values.
    CarType → {Family}, {Sports}, {Luxury}
Binary split: Divides values into two subsets.
Need to find the optimal partitioning.
    CarType → {Sports, Luxury} vs. {Family}   OR   CarType → {Family, Luxury} vs. {Sports}
Splitting Based on Ordinal Attributes
Multi-way split: Use as many partitions as distinct values.
    Size → {Small}, {Medium}, {Large}
Binary split: Divides values into two subsets.
Need to find the optimal partitioning.
    Size → {Small, Medium} vs. {Large}   OR   Size → {Medium, Large} vs. {Small}
    (a split such as Size → {Small, Large} vs. {Medium} breaks the ordering of the ordinal attribute)
Splitting Based on Continuous Attributes
Different ways of handling
Discretization to form an ordinal categorical
attribute
Static – discretize once at the beginning
Dynamic – ranges can be found by equal
interval bucketing, equal frequency bucketing
(percentiles), or clustering.
Binary Decision: (A < v) or (A ≥ v)
consider all possible splits and find the best cut
Splitting Based on Continuous Attributes
(i) Binary split:     Taxable Income > 80K? → Yes / No
(ii) Multi-way split: Taxable Income? → < 10K, [10K,25K), [25K,50K), [50K,80K), > 80K
Tree Induction
Greedy strategy.
Split the records based on an attribute test that
optimizes a certain criterion.
Issues
Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Determine when to stop splitting
How to determine the Best Split
Excellent customers vs. fair customers
[Two candidate splits of the "Customers" node: by Income (<10K vs. >=10K) or by Age (young vs. old); the split that better separates excellent from fair customers is preferred]
Algorithm for Decision Tree Induction
Basic algorithm
Tree is constructed in a top-down recursive divide-and-conquer
manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized
in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
Conditions for stopping partitioning
There are no remaining attributes for further partitioning
There are no samples left
Classification Algorithms
ID3
Uses information gain
C4.5
Uses Gain Ratio
Decision Tree Induction: Training Dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Output: A Decision Tree for “buys_computer”
age?
  <=30:   student?        no → no;  yes → yes
  31..40: yes
  >40:    credit rating?  excellent → no;  fair → yes
ID3
Attribute Selection Measure: Information
Gain
Notations:
Let D, the data partition, be a training set of
class-labeled tuples.
Suppose the class label attribute has m distinct
values defining m distinct classes, Ci (for i = 1,
….. , m).
Let Ci,D be the set of tuples of class Ci in D. Let
| D | and |Ci,D | denote the number of tuples in
D and Ci,D, respectively.
Attribute Selection Measure:
Information Gain
Select the attribute with the highest information gain for
current node
Let pi be the probability that an arbitrary tuple in D
belongs to class Ci, estimated by |Ci, D|/|D|
Expected information needed to classify a given tuple in D:
(log function base 2 is used since the info. is encoded in bits.)
Info(D) = − Σ_{i=1..m} pi · log2(pi)
Info (D) is just the average amount of information needed to identify the class
label of a tuple in D. Info (D) is also known as the entropy of D.
Now, suppose we were to partition the tuples in D on some attribute A having
v distinct values, (a1, a2, …. , av), as observed from the training data. If A is
discrete-valued, then it gives the v outcomes of a test on A. Attribute A can be
used to split D into v partitions or subsets, (D1, D2, ….., Dv).
Attribute Selection Measure:
Information Gain
Information needed (after using A to split D into v
partitions) to classify D:

Info_A(D) = Σ_{j=1..v} (|Dj| / |D|) · I(Dj)
The term |Dj| / |D| acts as the weight of the j th partition. Info A(D) is the
expected information required to classify a tuple from D based on the
partitioning by A.
Information gained by branching on attribute A
Gain(A) = Info(D) − Info_A(D)
Attribute Selection: Information Gain
In the training data set, the class label attribute,
buys_computer, has two distinct values (namely,
{yes, no}); therefore, there are two distinct classes (m = 2).
Let class P correspond to yes and class N correspond to no.
There are 9 samples of class yes and 5 samples of class
no.
To compute the information gain of each attribute, we first
use Equation 1, to compute the expected information
needed to classify a given sample.
Attribute Selection: Information Gain
Class P: buys_computer = "yes" (9 samples)
Class N: buys_computer = "no" (5 samples)

Info(D) = I(9,5) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) = 0.940

age     pi  ni  I(pi, ni)
<=30    2   3   0.971
31…40   4   0   0
>40     3   2   0.971

Info_age(D) = (5/14)·I(2,3) + (4/14)·I(4,0) + (5/14)·I(3,2) = 0.694

The term (5/14)·I(2,3) means that "age <=30" has 5 out of 14 samples, with 2 yes'es and 3 no's.

Hence Gain(age) = Info(D) − Info_age(D) = 0.246

Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
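These numbers can be reproduced directly from the class counts in the table. A short sketch (not from the slides), in plain Python 3:

```python
# Reproducing the slide's numbers for Gain(age) from the raw class counts.
from math import log2

def info(*counts):
    """I(c1, c2, ...) = -sum_i p_i * log2(p_i) for the given class counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c)

info_D = info(9, 5)                                                   # 0.940
info_age = 5/14 * info(2, 3) + 4/14 * info(4, 0) + 5/14 * info(3, 2)  # 0.694
gain_age = info_D - info_age                                          # 0.246
print(round(info_D, 3), round(info_age, 3), round(gain_age, 3))
```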
Final decision tree:
age?
  <=30:   student?        no → no;  yes → yes
  31..40: yes
  >40:    credit rating?  excellent → no;  fair → yes
Why are decision tree classifiers so popular?
The construction of decision tree classifiers does not require any
domain knowledge or parameter setting, and therefore is appropriate
for knowledge discovery.
Decision trees can handle high dimensional data.
Representation of acquired knowledge in tree form is generally easy
for humans to understand.
The learning and classification steps of decision tree induction are
simple and fast.
In general, decision tree classifiers have good accuracy. However,
successful use may depend on the data at hand.
Decision tree induction algorithms have been used for classification in
many application areas, such as medicine, manufacturing and
production, financial analysis, and molecular biology.
Decision trees are the basis of several commercial rule induction
systems.
Gain Ratio for Attribute Selection
The information gain measure is biased toward tests with many outcomes. That is,
it prefers to select attributes having a large number of values.
For example, consider an attribute that acts as a unique identifier, such as product
ID.
A split on product ID would result in a large number of partitions (as many as
there are values), each one containing just one tuple.
Because each partition is pure, the information required to classify data set D
based on this partitioning would be Infoproduct ID(D) = 0.
Therefore, the information gained by partitioning on this attribute is maximal.
Clearly, such a partitioning is useless for classification.
Gain Ratio for Attribute Selection
C4.5, a successor of ID3, uses an extension to information gain known as gain
ratio.
(normalization to information gain)
SplitInfo_A(D) = − Σ_{j=1..v} (|Dj| / |D|) · log2(|Dj| / |D|)

GainRatio(A) = Gain(A) / SplitInfo_A(D)

(A test on income splits the data into three partitions, namely low, medium and high, containing four, six and four tuples.)

Ex. SplitInfo_income(D) = −(4/14)·log2(4/14) − (6/14)·log2(6/14) − (4/14)·log2(4/14) = 1.557
gain_ratio(income) = 0.029 / 1.557 = 0.019
The attribute with the maximum gain ratio is selected as the splitting attribute.
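The same gain-ratio computation as a small Python sketch (not from the slides); the partition sizes 4, 6, 4 and Gain(income) = 0.029 come from the slides above.

```python
# Computing SplitInfo and GainRatio for the income attribute (illustrative only).
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum_j |Dj|/|D| * log2(|Dj|/|D|)."""
    n = sum(partition_sizes)
    return -sum((s / n) * log2(s / n) for s in partition_sizes if s)

si_income = split_info([4, 6, 4])   # ~1.557
gain_income = 0.029                 # from the information-gain slide
print("GainRatio(income) =", round(gain_income / si_income, 3))
```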
Comparing Attribute Selection Measures
Information gain:
biased towards multi valued attributes
Gain ratio:
tends to prefer unbalanced splits in which one
partition is much smaller than the others
Challenge with Decision Tree
models
Random Forest helps overcome this
challenge
Let's understand what overfitting is
How overfitting causes a challenge in Decision Trees
Random Forest to the rescue
R2 is a measure of the goodness of fit of a model.
Random Forest
What is Bagging?
Types of Ensemble learning
What is ensemble learning?
Ensemble
Why Random Forest is called
Random?
Row level randomness in Random
Forest
Column level randomness in
Random Forest
Example
RS: Row sample
FS: Feature/Column sample
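As a hedged illustration (not from the slides), the two kinds of randomness map onto scikit-learn's RandomForestClassifier parameters; the dataset here is synthetic.

```python
# Row-level (RS) and column-level (FS) randomness in a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # row-level randomness: each tree sees a bootstrap sample
    max_features="sqrt",  # column-level randomness: random feature subset per split
    random_state=0,
).fit(X, y)

print("training accuracy:", forest.score(X, y))
```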
Low Bias: if we grow the decision tree to its complete depth, it fits the training dataset
very closely, so the training error will be very low.
High Variance: whenever we get new test data, such a fully grown decision tree is prone to give
a large amount of error.
How does Random Forest work in
Regression?
How does Random Forest work in
Classification?
Benefits of Random Forest
Use cases of Random Forest
Naïve Bayes classifier
Naïve Bayes classifier
Background
Classification algorithms that differentiate between classes on the basis of
definite decision boundaries.
Classification algorithms that learn boundaries between classes.
Classification algorithms that construct decision boundaries separating
classes are called discriminative models.
Background
What if we differentiate between two classes by
analyzing probability distribution of data…
What is Naïve Bayes?
A Naive Bayes classifier is an algorithm that learns
the probability that an object with certain
features belongs to a particular group or class.
Where is Naïve Bayes used?
Advantages of Naïve Bayes
Basics of Probability
What is Probability?
What is Probability?
Probability explained through an
example
John’s emails have multiple occurrences
of the word ‘Lottery’. Let’s analyze them
closely..
Analyze Emails with word “lottery”
Let us consider two simple events..
Let us consider two simple events in
Emails
Appearance of “lottery” in spam and
genuine emails
Compute probability of word ‘lottery’
appearing in emails
Let us explore different types of
probabilities…
Types of Probabilities: Joint Probability
Types of Probabilities: Joint Probability
Venn Diagram for representing count of
events
Let us compute joint probability of word
‘lottery’ appearing in spam
Types of Probabilities: Marginal
Probability
Types of Probabilities: Marginal
Probability
Types of Probabilities: Conditional
Probability
Types of Probabilities: Conditional
Probability
Probability of an event given that another event has already occurred
is called conditional probability.
Conditional Probability
For example, suppose you go out for lunch at the same
place and time every Friday and you are served lunch
within 15 minutes with probability 0.9. However, given
that you notice that the restaurant is exceptionally busy,
the probability of being served lunch within 15 minutes
may reduce to 0.7. This is the conditional probability of
being served lunch within 15 minutes given that the
restaurant is exceptionally busy.
The usual notation for "event A occurs given that event B
has occurred" is "A | B" (A given B). The symbol | is a
vertical line and does not imply division.
P(A | B) denotes the probability that event A will occur
given that event B has occurred already.
Conditional Probability
A rule that can be used to determine a conditional
probability from unconditional probabilities is:
P(A | B) = P(A ∩ B) / P(B)
where:
P(A | B) = the (conditional) probability that event A will
occur given that event B has occurred already.
P(A ∩ B) = the (unconditional) probability that event A and
event B both occur.
P(B) = the (unconditional) probability that event B occurs.
Naive Bayes classifier
Bayesian classifiers are statistical classifiers. They can
predict class membership probabilities, such as the
probability that a given tuple belongs to a particular class.
Bayesian classification is based on Bayes’ Theorem.
It is based on the simplifying assumption that the attribute
values are conditionally independent given the class.
A naive Bayes classifier assumes that the presence (or
absence) of a particular feature of a class is unrelated to
the presence (or absence) of any other feature, given the
class variable.
Naive Bayes classifier
For example, a fruit may be considered to be an apple if it
is red, round, and about 4" in diameter. A naive Bayes
classifier considers all these features to contribute
independently to the probability that this fruit is an apple,
whether or not they're in fact related to each other or to
the existence of the other features.
This significantly reduces the computation cost, since
calculating each of the P(ai | vj) requires only a
frequency count over the tuples in the training data with
class value equal to vj.
Bayes Theorem : Basics
Let X be a data sample : class label is unknown
Let H be a hypothesis that X belongs to a specified class C
For classification problems, we want to determine P(H|X), the probability that
the hypothesis holds given the observed data sample X
P(H) (prior probability), the initial probability
E.g., X will buy computer, regardless of age, income or any other
information, for that matter.
P(H|X) (posterior probability), the probability that the
hypothesis H holds given the observed data sample X
For example, suppose our world of data tuples is confined to customers
described by the attributes age and income, respectively,
and that X is a 35-year-old customer with an income of $40,000.
Suppose that H is the hypothesis that our customer will buy a computer.
Then P(H|X) reflects the probability that customer X will buy a computer
given that we know the customer’s age and income.
Bayesian Theorem
Given data X, posteriori probability of a hypothesis H, P(H|X), follows the
Bayes theorem
P(H | X) = P(X | H) · P(H) / P(X)
P(X|H) is the descriptor posterior probability of X conditioned on H.
That is, it is the probability that a customer, X, is 35 years old and
earns $40,000, given that we know the customer will buy a computer.
Predicts X belongs to Ci if the probability P(Ci|X) is the highest among all the
P(Ck|X) for all the k classes.
Practical difficulty: require initial knowledge of many probabilities.
Bayesian Theorem
Assume a target function f : X → Y (a function f with domain X and
codomain Y). The elements of X are called arguments of f. For each
argument x, the corresponding unique y in the codomain is called the
function value at x, or the image of x under f.
Each instance X is described by attributes <a1, a2, a3, …, an>.
The most probable value of f(X) is vMAP.
Using Bayes' theorem we can write the expression as:

vMAP = argmax_{vj ∈ V} P(vj | a1, a2, …, an)
     = argmax_{vj ∈ V} P(a1, a2, …, an | vj) · P(vj) / P(a1, a2, …, an)
     = argmax_{vj ∈ V} P(a1, a2, …, an | vj) · P(vj)

With the naive conditional-independence assumption this becomes:

vMAP = argmax_{vj ∈ V} P(vj) · Π_i P(ai | vj)

The denominator does not depend on the choice of vj and thus can be
omitted from the argmax argument.
Bayesian Theorem
In mathematics, argmax stands for
the argument of the maximum, that is to say,
the set of points of the given argument for which
the given function attains its maximum value.
Towards Naïve Bayesian Classifier
Let D be a training set of tuples and their associated class
labels, and each tuple is represented by an n- dimensional
attribute vector X = (x1, x2,…, xn), showing n
measurements made on the tuple from n attributes.
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum posteriori, i.e., the
maximal P(Ci|X)
This can be derived from Bayes’ theorem
P(Ci | X) = P(X | Ci) · P(Ci) / P(X)
Since P(X) is constant for all classes, only
P(Ci | X) ∝ P(X | Ci) · P(Ci)
needs to be maximized.
Bayesian Classifier – Basic Equation
P(C | X) = P(C) · P(X | C) / P(X)

P(C): class prior probability            P(X | C): descriptor posterior probability (likelihood)
P(C | X): class posterior probability    P(X): descriptor prior probability
Naïve Bayesian Classifier: Training Dataset
Classes:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'
Data sample to classify:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Naïve Bayesian Classifier: An Example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
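A small sketch (not from the slides) reproducing this naive Bayes computation from the conditional probabilities listed above, in plain Python 3:

```python
# Naive Bayes decision for X = (age<=30, income=medium, student=yes, credit=fair).
p_yes, p_no = 9/14, 5/14

# P(attribute value | class), taken from the counts listed above
likelihood_yes = (2/9) * (4/9) * (6/9) * (6/9)
likelihood_no  = (3/5) * (2/5) * (1/5) * (2/5)

score_yes = likelihood_yes * p_yes   # ~0.028
score_no  = likelihood_no * p_no     # ~0.007
print("buys_computer =", "yes" if score_yes > score_no else "no")
```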
Training Data
Outlook Temp Humidity Windy Play?
sunny hot high FALSE No
sunny hot high TRUE No
overcast hot high FALSE Yes
rainy mild high FALSE Yes
rainy cool normal FALSE Yes
rainy cool Normal TRUE No
overcast cool Normal TRUE Yes
sunny mild High FALSE No
sunny cool Normal FALSE Yes
rainy mild Normal FALSE Yes
sunny mild normal TRUE Yes
overcast mild High TRUE Yes
overcast hot Normal FALSE Yes
rainy mild high TRUE No
P(yes) = 9/14
P(no) = 5/14
Bayesian Classifier – Probabilities for the weather data
Frequency Tables
Outlook  | No | Yes      Temp. | No | Yes      Humidity | No | Yes      Windy | No | Yes
Sunny    | 3  | 2        Hot   | 2  | 2        High     | 4  | 3        False | 2  | 6
Overcast | 0  | 4        Mild  | 2  | 4        Normal   | 1  | 6        True  | 3  | 3
Rainy    | 2  | 3        Cool  | 1  | 3

Likelihood Tables
Outlook  | No  | Yes     Temp. | No  | Yes     Humidity | No  | Yes     Windy | No  | Yes
Sunny    | 3/5 | 2/9     Hot   | 2/5 | 2/9     High     | 4/5 | 3/9     False | 2/5 | 6/9
Overcast | 0/5 | 4/9     Mild  | 2/5 | 4/9     Normal   | 1/5 | 6/9     True  | 3/5 | 3/9
Rainy    | 2/5 | 3/9     Cool  | 1/5 | 3/9
Bayesian Classifier – Predicting a new day
Outlook  Temp.  Humidity  Windy  Play
sunny    cool   high      true   ?        ← new day X: which class?

P(yes|X) = p(sunny|yes) × p(cool|yes) × p(high|yes) × p(true|yes) × p(yes)
         = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053  =>  0.0053 / (0.0053 + 0.0206) = 0.205
P(no|X)  = p(sunny|no) × p(cool|no) × p(high|no) × p(true|no) × p(no)
         = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206  =>  0.0206 / (0.0053 + 0.0206) = 0.795
Bayesian Classifier – zero frequency problem
What if a descriptor value doesn’t occur with every class value
P(outlook=overcast|No)=0
Remedy: add 1 to the count for every descriptor-class combination
(Laplace Estimator)
Outlook  | No  | Yes      Temp. | No  | Yes      Humidity | No  | Yes      Windy | No  | Yes
Sunny    | 3+1 | 2+1      Hot   | 2+1 | 2+1      High     | 4+1 | 3+1      False | 2+1 | 6+1
Overcast | 0+1 | 4+1      Mild  | 2+1 | 4+1      Normal   | 1+1 | 6+1      True  | 3+1 | 3+1
Rainy    | 2+1 | 3+1      Cool  | 1+1 | 3+1
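A minimal sketch of the Laplace estimator (not from the slides): add 1 to every descriptor-class count so no conditional probability becomes zero. The Outlook-given-No counts are taken from the tables above.

```python
# Add-1 (Laplace) smoothing of conditional probabilities.
outlook_counts_no = {"sunny": 3, "overcast": 0, "rainy": 2}

def laplace_likelihoods(counts, alpha=1):
    """P(value | class) with add-alpha smoothing over the value counts."""
    total = sum(counts.values()) + alpha * len(counts)
    return {v: (c + alpha) / total for v, c in counts.items()}

print(laplace_likelihoods(outlook_counts_no))
# P(overcast | No) is now 1/8 instead of 0/5
```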
Bayesian Classifier – General Equation
P(Ck | X) = P(X | Ck) · P(Ck) / P(X)

Likelihood: P(X | Ck)
Continuous variable (Gaussian density):
P(x | C) = 1 / (2πσ²)^(1/2) · exp( −(x − μ)² / (2σ²) )
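The Gaussian likelihood as a small helper function (not from the slides); as an illustration it is applied to the temperature statistics for Play = yes (mean 73, std dev 6.2) given later in the numeric weather example.

```python
# Gaussian likelihood for a continuous attribute.
from math import exp, pi, sqrt

def gaussian_likelihood(x, mean, std):
    """P(x | C) for a normally distributed attribute."""
    return (1.0 / (sqrt(2 * pi) * std)) * exp(-((x - mean) ** 2) / (2 * std ** 2))

print(gaussian_likelihood(66, mean=73.0, std=6.2))   # p(temp = 66 | yes)
```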
Bayesian Classifier – Dealing with numeric attributes
EXAMPLE-I
Department status age salary
Sales senior 31. . .35 41K.. .45K
Sales junior 26. . .30 26K.. .30K
Sales junior 31. . .35 31K.. .35K
systems junior 21. . .25 31K.. .35K
systems senior 31. . .35 66K.. .70K
systems junior 26. . .30 31K.. .35K
systems senior 41. . .45 66K.. .70K
marketing senior 26. . .30 46K.. .50K
marketing junior 31. . .35 41K.. .45K
secretary senior 46. . .50 41K.. .45K
secretary junior 26. . .30 26K.. .30K
Define Bayesian Classification .Given a data tuple having the values
“systems”, “26. . . 30”, and “41K.. .45K” for the attributes department, age,
and salary, respectively, what would be a naive Bayesian classification of the
status for the given data tuple ?
Example- continuous attributes
Consider the training dataset as shown in below table. Let Play be the class label attribute. There
are two distinct classes, namely, yes and no and two numeric attributes namely “temp” and
“humidity”.
Outlook Temp Humidity Windy Play?
sunny 85 85 FALSE No
sunny 80 90 TRUE No
overcast 83 86 FALSE Yes
rainy 70 96 FALSE Yes
rainy 68 80 FALSE Yes
rainy 65 70 TRUE No
overcast 64 65 TRUE Yes
sunny 72 95 FALSE No
sunny 69 70 FALSE Yes
rainy 75 80 FALSE Yes
sunny 75 70 TRUE Yes
overcast 72 90 TRUE Yes
overcast 81 75 FALSE Yes
rainy 71 91 TRUE No
Given a data tuple having the values “sunny”, 66, 89 and “true” for the attributes outlook, temp.,
humidity and windy respectively, what would be a naive Bayesian classification of the Play for the
given tuple?
Example- continuous attributes
The numeric weather data with summary statistics
Outlook (counts / likelihoods):
  sunny:    yes 2 (2/9),  no 3 (3/5)
  overcast: yes 4 (4/9),  no 0 (0/5)
  rainy:    yes 3 (3/9),  no 2 (2/5)

Temperature:
  yes values: 83, 70, 68, 64, 69, 75, 75, 72, 81   → mean 73,   std dev 6.2
  no  values: 85, 80, 65, 72, 71                   → mean 74.6, std dev 7.9

Humidity:
  yes values: 86, 96, 80, 65, 70, 80, 70, 90, 75   → mean 79.1, std dev 10.2
  no  values: 85, 90, 70, 95, 91                   → mean 86.2, std dev 9.7

Windy (counts / likelihoods):
  false: yes 6 (6/9), no 2 (2/5)
  true:  yes 3 (3/9), no 3 (3/5)

Play: yes 9 (9/14), no 5 (5/14)
Artificial Neural Networks
(ANN)
Neural Networks -Origin
Best learning system known to us?
How does the brain work?
In the brain, a neuron has three principal components:
1. Dendrites:- that carry electrical signals into the cell body.
2. Cell Body:- effectively sums and thresholds these incoming signals.
3. Axon:- is a single long fiber that carries the signal from the cell body out to other
neurons.
4. The point of contact between an axon of one cell and a dendrite of another cell is
called a ‘synapse’
Background: ANN Vs Brain
ANN                                        Brain
Simple (few neurons and connections)       Complex (~10^11 neurons and ~10^15 connections)
Dedicated to a specific purpose            Generalized for all purposes
Response time is fast (nanoseconds)        Response time is slow (milliseconds)
Design is regular                          Design is arbitrary
Activities are synchronous                 Activities are asynchronous
What is a neural network?
Perceptron
Perceptrons can only model linearly separable functions.
We need to use a multi-layer perceptron to tackle non-linear problems.
Perceptron
Activation Functions
Multi Layer Perceptron
Multi Layer Perceptron
General Structure of ANN
[Diagram: inputs x1 … x5 feed the input layer, which connects to a hidden layer and then to the output layer]
θj is the bias of the unit. The bias acts as a threshold, which is used to adjust
the output along with the weighted sum of the inputs to the neuron. Therefore the
bias is a constant which helps the model fit the given data as well as possible.
ANN
X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0

[Black box: inputs X1, X2, X3 → output Y]
Output Y is 1 if at least two of the three inputs are equal to 1.
ANN
[Perceptron diagram: input nodes X1, X2, X3, each connected with weight 0.3 to the output node Y, which has threshold t = 0.4]

X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0

Y = I(0.3·X1 + 0.3·X2 + 0.3·X3 − 0.4 > 0)
where I(z) = 1 if z is true, 0 otherwise
Artificial Neural Networks
Model is an assembly of inter-connected nodes and weighted links
Output node sums up each of its input values according to the weights of its links
The output node's sum is compared against some threshold t
[Diagram: inputs X1, X2, X3 with weights w1, w2, w3 feeding an output node Y with threshold t]

Perceptron Model:
Y = I( Σ_i wi·Xi − t )
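A tiny sketch (not from the slides) of the perceptron from the previous example, with the weights 0.3, 0.3, 0.3 and threshold t = 0.4, in plain Python 3:

```python
# Perceptron with a step (indicator) activation.
def perceptron(x1, x2, x3, weights=(0.3, 0.3, 0.3), t=0.4):
    """Y = 1 if the weighted sum of the inputs exceeds the threshold t."""
    s = weights[0] * x1 + weights[1] * x2 + weights[2] * x3
    return 1 if s - t > 0 else 0

# Reproduces the truth table: Y is 1 when at least two inputs are 1
for row in [(1, 0, 0), (1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1)]:
    print(row, "->", perceptron(*row))
```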
Given the net input Ij to unit j, the output Oj of unit j is computed as
Oj = 1 / (1 + e^(−Ij)).
This function is also referred to as a squashing function, because it maps a large
input domain onto the smaller range of 0 to 1.
Where Do The Weights Come From?
Where Do The Weights Come From?
How Do Perceptrons Learn?
Learning Algorithms:
Back propagation for classification
What is backpropagation
Backpropagation is a neural network learning algorithm.
There are many different kinds of neural networks and
neural network algorithms.
The most popular neural network algorithm is
backpropagation, which gained repute in the 1980s.
A multilayer feed-forward network is the type of neural network
on which the backpropagation algorithm performs learning.
Backpropagation learns a set of weights that fits the
training data so as to minimize the mean squared distance
between the network's class prediction and the known
target value of the tuples.
Major Steps for Back Propagation Network
Constructing a network
input data representation
selection of number of layers, number of nodes in each
layer.
Training the network using training data
Pruning the network
Interpret the results
A Multi-Layer Feed-Forward Neural Network
[Diagram: inputs x1 … x5 → input layer → hidden layer (weights wij) → output layer → y]

Net input to unit j:   Ij = Σ_i wij·Oi + θj
Output of unit j:      Oj = 1 / (1 + e^(−Ij))
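A minimal forward-pass sketch of these two formulas (not from the slides); NumPy is assumed, and the weights and bias are made-up illustrative values.

```python
# One forward pass through a single unit.
import numpy as np

def unit_output(inputs, weights, bias):
    """Oj = sigmoid(Ij) with Ij = sum_i wij * Oi + theta_j."""
    net_input = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-net_input))

x = np.array([1.0, 0.0, 1.0])          # example input tuple
w = np.array([0.2, -0.3, 0.4])         # hypothetical weights into unit j
print(unit_output(x, w, bias=-0.4))    # output O_j of the unit
```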
How A Multi-Layer Neural Network Works?
The inputs to the network correspond to the attributes measured for each
training tuple
Inputs are fed simultaneously into the units making up the input layer
They are then weighted and fed simultaneously to a hidden layer
The number of hidden layers is arbitrary, although usually only one
The weighted outputs of the last hidden layer are input to units making up the
output layer, which gives out the network's prediction
The network is feed-forward in that none of the weights cycles back to an
input unit or to an output unit of a previous layer
Defining a Network Topology
First decide the network topology: # of units in the input layer, # of
hidden layers (if > 1), # of units in each hidden layer, and # of units in
the output layer
Normalizing the input values for each attribute measured in the training
tuples to [0.0—1.0]
One input unit per domain value
Output, if for classification and more than two classes, one output unit
per class is used
If a trained network's accuracy is unacceptable,
repeat the training process with a different network topology or a
different set of initial weights
Backpropagation
Iteratively process a set of training tuples & compare the network's
prediction with the actual known target value
For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target
value
Modifications are made in the “backwards” direction: from the output
layer, through each hidden layer down to the first hidden layer, hence
“backpropagation”
Steps
Initialize weights and biases in the network
Propagate the inputs forward (by applying activation function)
Backpropagate the error (by updating weights and biases)
Terminating condition (when error is very small, etc.)
Backpropagation: Algorithm
Backpropagation
[Same multi-layer network diagram: x1 … x5 → input layer → hidden layer → output layer → y]

For an output unit j:  Errj = Oj·(1 − Oj)·(Tj − Oj)     (Oj = generated value, Tj = correct target value)
For a hidden unit j:   Errj = Oj·(1 − Oj)·Σ_k Errk·wjk
Weight update:         wij = wij + (l)·Errj·Oi
Bias update:           θj = θj + (l)·Errj
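A hedged sketch (not from the slides) of the update rules for a single output unit; the numbers are illustrative only, not taken from the worked example that follows.

```python
# Backpropagation updates for one output unit, following the formulas above.
def backprop_output_unit(o_j, target, inputs, weights, bias, lr):
    """Update the weights and bias feeding output unit j."""
    err_j = o_j * (1 - o_j) * (target - o_j)            # Err_j for an output unit
    new_weights = [w + lr * err_j * o_i for w, o_i in zip(weights, inputs)]
    new_bias = bias + lr * err_j                         # theta_j update
    return err_j, new_weights, new_bias

err, w, b = backprop_output_unit(
    o_j=0.6, target=1.0, inputs=[0.5, 0.2],
    weights=[0.3, -0.1], bias=0.05, lr=0.9)
print(err, w, b)
```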
Example - Sample calculations for learning by
the backpropagation algorithm
The figure shows a multilayer feed-forward neural network.
Let the learning rate be 0.9.
The first training tuple is X = (1, 0, 1), whose class label is 1.
Neural Network as a Classifier
Weakness
Long training time
Require a number of parameters, e.g., the network topology or ``structure."
Poor interpretability: Difficult to interpret the symbolic meaning behind the learned
weights and of ``hidden units" in the network
Strength
High tolerance to noisy data as well as their ability to classify patterns on which
they have not been trained.
They are well-suited for continuous-valued inputs and outputs, unlike most decision
tree algorithms.
They have been successful on a wide array of real-world data, including
handwritten character recognition, pathology and laboratory medicine, and training
a computer to pronounce English text.
Neural network algorithms are inherently parallel; parallelization techniques can be
used to speed up the computation process.
These above factors contribute toward the usefulness of neural networks for
classification and prediction in machine learning.
K Nearest Neighbor
Lazy vs. Eager Learning
The classification methods —decision tree
induction, Bayesian classification, classification by
backpropagation, support vector machines—are
all examples of eager learners.
Lazy vs. Eager Learning
Lazy vs. eager learning
Lazy learning (instance-based learning): Simply stores
training data (or only minor processing) and waits until it is given a
test tuple.
Eager learning : Given a set of training set, constructs a
classification model before receiving new (e.g., test) data to
classify.
We can think of the learned model as being ready and
eager to classify previously unseen tuples.
Lazy: less time in training but more time in predicting, so lazy learners
can be computationally expensive at classification time.
Lazy Learner: Instance-Based Methods
Instance-based learning:
Store training examples and delay the processing
(“lazy evaluation”) until a new instance must be
classified.
Typical approaches
k-nearest neighbor approach
Instances represented as points in a Euclidean
space.
Case-based reasoning
Uses symbolic representations and knowledge-
based inference.
The k-Nearest Neighbor Algorithm
The k-nearest-neighbor method was first described in the early 1950s.
It has since been widely used in the area of pattern recognition.
Nearest-neighbor classifiers are based on learning by analogy, that is,
by comparing a given test tuple with training tuples that are similar to
it.
The training tuples are described by n attributes.
Each tuple represents a point in an n-dimensional space.
In this way, all of the training tuples are stored in an n-dimensional
pattern space.
When given an unknown tuple, a k-nearest-neighbor classifier
searches the pattern space for the k training tuples that are closest to
the unknown tuple. These k training tuples are the k “nearest
neighbors” of the unknown tuple.
The k-Nearest Neighbor Algorithm
“Closeness” is defined in terms of a distance metric, such
as Euclidean distance. The Euclidean distance between
two points or tuples, say, X1 = (x11, x12, …, x1n) and
X2 = (x21, x22, …, x2n), is
dist(X1, X2) = sqrt( Σ_{i=1..n} (x1i − x2i)² )
For k-nearest-neighbor classification, the unknown tuple is
assigned the most common class among its k nearest
neighbors. When k = 1, the unknown tuple is assigned the
class of the training tuple that is closest to it in pattern
space.
Nearest neighbor classifiers can also be used for
prediction, that is, to return a real-valued prediction for a
given unknown tuple.
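A bare-bones k-nearest-neighbor sketch (not from the slides) using the Euclidean distance above and a majority vote; the tiny training set is made up for illustration.

```python
# Minimal k-nearest-neighbor classifier.
from collections import Counter
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k):
    """train: list of (feature_tuple, label); returns the majority label of the
    k training tuples closest to the query point."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.5, 2.0), "+"), ((5.0, 5.0), "-"),
         ((6.0, 5.5), "-"), ((5.5, 6.5), "-")]
print(knn_predict(train, query=(5.2, 5.1), k=3))   # -> "-"
```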
Example-
Instance-Based Classification
A KNN classifier assigns a test instance the majority class associated
with its K nearest training instances. Distance between instances is
measured using the Euclidean distance.
Suppose we have the following training set of positive (+) and
negative (-) instances and a single test instance (o).
All instances are projected onto a vector space of two real-valued
features: X and Y.
Contd…
(a) What would be the class assigned to this test instance for K=1 .
KNN assigns a test instance the target class associated with the
majority of the test instance’s K nearest neighbors. For K=1, this test
instance would be predicted negative because its single nearest
neighbor is negative.
(b) What would be the class assigned to this test instance for K=3.
KNN assigns a test instance the target class associated with the
majority of the test instance’s K nearest neighbors. For K=3, this test
instance would be predicted negative. Out of its three nearest
neighbors, two are negative and one is positive.
Advantages of KNN
Example of application of KNN
KNN(K Nearest Neighbor) in a
nutshell
How does KNN work?
Let’s take a simple example of
Classification
Step 1: Build neighborhood
Step 2: Find distance from query point to
each point in neighborhood
FYI: Distance measures for
continuous data
Step 3: Assign to class
Classification with KNN: Loan
default data
Step 1: Build neighborhood
Classification with KNN: Build
neighborhood
Step 2: Measure distance from each
data point
Step 2: Graphical representation of
distance
Step 3: Assign to class based on
majority vote
KNN for Regression: Let's work
with the Loan data set
Step 1: Define ‘K’(number of
neighbors)
Step 2: Measure distance from each
data point
Regression with KNN: Predict
income of Query point
What should be the value of K?
Case Study: Identify whether a
website is malicious or not
Identify whether a website is
malicious or not: Data Attributes
Metrics for Performance Evaluation of
Classifier
Focus on the predictive capability of a model
Rather than how fast it takes to classify or build models,
scalability, etc.
Confusion Matrix:
                        PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL   Class=Yes      a           b
CLASS    Class=No       c           d

a: TP (true positive)   b: FN (false negative)
c: FP (false positive)  d: TN (true negative)
Metrics for Performance Evaluation of
Classifier
                              PREDICTED CLASS
                              Class=Yes (Positive)   Class=No (Negative)
ACTUAL   Class=Yes (Positive)       a                      b
CLASS    Class=No  (Negative)       c                      d
The entries in the confusion matrix have the
following meaning :
a is the number of correct predictions that an instance is positive,
b is the number of incorrect predictions that an instance is negative,
c is the number of incorrect predictions that an instance is positive, and
d is the number of correct predictions that an instance is negative.
Metrics for Performance Evaluation of
Classifier
The accuracy (AC) is the proportion of the total number of
predictions that were correct. It is determined using the
equation:
Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Consider a 2-class problem
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
If model predicts everything to be class 0, accuracy is
9990/10000 = 99.9 %
Accuracy is misleading because model does not detect any
class 1 example
Contd…
The recall or true positive rate (TP) is the proportion of
positive cases that were correctly identified, as calculated
using the equation:
TP rate (recall) = a / (a + b) = TP / (TP + FN)
The false positive rate (FP) is the proportion of negative
cases that were incorrectly classified as positive, as
calculated using the equation:
FP rate = c / (c + d) = FP / (FP + TN)
Contd…
The true negative rate (TN) is defined as the
proportion of negative cases that were classified
correctly, as calculated using the equation:
TN rate = d / (c + d) = TN / (TN + FP)
The false negative rate (FN) is the proportion of
positive cases that were incorrectly classified as
negative, as calculated using the equation:
FN rate = b / (a + b) = FN / (FN + TP)
Contd…..
The precision (P) is the proportion of the
predicted positive cases that were correct, as
calculated using the equation:
P = a / (a + c) = TP / (TP + FP)
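The measures above can be computed together from the confusion-matrix entries. This is a small sketch (not from the slides); the counts used correspond to the spam example that follows (70 true positives, 40 false negatives, 10 false positives, 380 true negatives out of 500 emails).

```python
# Accuracy, recall, false-positive rate and precision from confusion-matrix counts.
def classification_metrics(a, b, c, d):
    """a = TP, b = FN, c = FP, d = TN."""
    return {
        "accuracy":  (a + d) / (a + b + c + d),
        "recall":    a / (a + b),        # true positive rate
        "fp_rate":   c / (c + d),
        "precision": a / (a + c),
    }

print(classification_metrics(a=70, b=40, c=10, d=380))
```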
Example
Suppose we train a model to predict whether an email is Spam or
Not Spam. After training the model, we apply it to a test set of 500
new email messages (also labeled) and the model produces the
contingency matrix below.
(a) Compute the precision of this model with respect to the Spam class.
Precision with respect to SPAM = # correctly predicted as SPAM / #
predicted as SPAM
= 70 / (70 + 10) = 70 / 80.
Cond…
(b) Compute the recall of this model with respect to the Spam class.
recall with respect to SPAM = # correctly predicted as SPAM / # truly
SPAM
= 70 / (70 + 40) = 70 / 110.
High precision and low recall with respect to SPAM: whatever
the model classifies as SPAM is probably SPAM. However, many emails
that are truly SPAM are misclassified as NOT SPAM, i.e. false
negatives.
High recall and low precision with respect to SPAM: the model
catches nearly all the SPAM emails, but also incorrectly classifies some genuine
emails as SPAM, i.e. false positives.
End of Presentation