Learning
• Learning is essential for unknown environments, i.e., when the designer
lacks omniscience
• Learning is useful as a system construction method, i.e., expose the
agent to reality rather than trying to write it down
• Learning modifies the agent's decision mechanisms to improve
performance
Types of Learning
• Supervised learning:
• Correct answer for each example. Answer can be a numeric
variable, categorical variable etc.
• Both inputs and outputs are given
• The outputs are typically provided by a friendly teacher.
• Unsupervised learning:
• Correct answers not given – just examples
• The agent can learn relationships among its percepts, and how they
change over time
• Reinforcement learning:
• Occasional rewards
• The agent receives some evaluation of its actions (such as a fine for
stealing bananas), but is not told the correct action (such as how to
buy bananas).
Decision Tree
• A decision tree takes as input an object or situation described by a set of
properties, and outputs a yes/no “decision”.
• Decision tree induction is one of the simplest and yet most successful
forms of machine learning. We first describe the representation (the
hypothesis space) and then show how to learn a good hypothesis.
• Each node tests the value of an input attribute
• Branches from the node correspond to possible values of the attribute
• Leaf nodes supply the values to be returned if that leaf is reached
Simple Examples
• Decision: Whether to wait for a table at a restaurant.
1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar: whether the restaurant has a bar for waiting customers.
3. Fri/Sat: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (None, Some, Full).
6. Price: the restaurant's price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, Burger).
10. WaitEstimate: 0–10, 10–30, 30–60, or >60 minutes.
Representation
• To draw a decision tree from a dataset of some attributes:
● Each node corresponds to a splitting attribute.
● Each arc is a possible value of that attribute.
● Splitting attribute is selected to be the most informative among
the attributes.
● Entropy is used to measure how informative a node is.
● The algorithm uses the criterion of information gain to determine
the goodness of a split.
● The attribute with the greatest information gain is chosen as the
splitting attribute, and the data set is split on all distinct values of
that attribute.
• Classification Methods
Gini Index
• Used in CART, SLIQ, SPRINT to determine the best split.
• Tree Growing Process:
• Find each predictor’s best split.
• Find the node’s best split : Among the best splits found in step 1,
choose the one that maximizes the splitting criterion.
• Split the node using its best split found in step 2 if the stopping
rules are not satisfied.
• Finding the Best Split: GINI Index
• GINI index for a given node t:
GINI(t) = 1 − Σj [ p(j | t) ]²
where p(j | t) is the relative frequency of class j at node t
● Maximum (1 − 1/n): records equally distributed among n classes
● Minimum (0): all records in one class
Example
Revision Of Classification
Decision Tree
Naïve Bayes
Inductive Learning (Learning From Examples)
Different kinds of learning…
Supervised learning: someone gives us examples with the right answer
(output/label) for those examples. We have to predict the right answer
for unseen examples.
Unsupervised learning: we see examples only but get no feedback (no
labels/output). We need to find patterns in the data.
Semi-supervised learning: a small amount of labeled data plus a large
amount of unlabeled data.
Reinforcement learning: we take actions and get rewards. We have to
learn how to get high rewards.
Inductive Machine Learning Process …
(Process flow) Preparation of Training Data → Learning Scheme OR
Learning Algorithm → Trained Model → Testing and Evaluation on Test
Data → Deploy Trained & Tested Model in Domain
Machine Learning Process
(Diagram: the Learning Scheme OR Learning Algorithm at the center of
the machine learning process)
Supervised Machine Learning Types
Classification
Association
Regression / Numeric Prediction
Also known as predictive learning
Classification Algorithms
There are many algorithms for classification. Some of the
best known are the following:
Decision trees
Naive Bayes classifier
Linear Regression
Logistic regression
Neural networks
Perceptron
Multi Layer Neural Networks
Support vector machines (SVM)
Quadratic classifiers
Instance Based Learning
k-nearest neighbor
Decision Tree Learning
Classification training data may be used to create a
decision tree (model), which consists of:
Nodes:
Represent tests on attributes/features of the training data
Edges:
Correspond to the outcomes of a test and connect to the
next node or leaf
Leaves:
Predict the class (y)
Nominal and numeric attributes
Nominal:
number of children usually equal to the number of attribute values
attribute won’t get tested more than once
Other possibility: division into two subsets
Numeric:
test whether value is greater or less than constant
attribute may get tested several times
Other possibility: three-way split (or multi-way split)
Integer: less than, equal to, greater than
Real: below, within, above
Classification using Decision Tree
Example (Restaurant)
Decision Tree of Restaurant Example
Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type    Est    WillWait
Yes  No   Yes  Yes  Full  $      No    Yes  French  30–60  ?
Criterion for attribute selection
There is no way to efficiently search through all 2^(2^n) possible
trees
Want to get the smallest tree
Heuristic: choose the attribute that produces the
“purest” nodes
Strategy: choose attribute that gives greatest
information gain
Purity check of an attribute
(Figure: examples of pure and impure nodes)
Measuring Purity
Question is How to Measure Purity?
Two Methods are used
Entropy Based Information Gain
Gini Index
Information gain is used in ID3 and its successors, e.g. C4.5 / C5.0
(implemented in Weka as J48)
The Gini index is used in the CART algorithm as
implemented in Python's scikit-learn
Decision Trees
There are four cases to consider while creating a decision tree:
1. If the remaining examples are all positive (or all negative), then answer Yes
or No.
2. If there are some positive and some negative examples, then choose the
best attribute to split them.
3. No examples left: if there are no examples left, it means that no example
has been observed for this combination of attribute values, and we return a
default value calculated from the plurality classification of the parent
node's examples.
4. Noise in the data (no attributes left): if there are no attributes left, but both
positive and negative examples, it means that these examples have exactly
the same description but different classifications. This can happen because
there is error or noise in the data, because the domain is nondeterministic,
or because we can't observe an attribute that would distinguish the
examples. The best we can do is return the plurality classification of the
remaining examples.
A minimal code sketch of this recursion appears below.
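As an illustration only, here is a minimal Python sketch of the four-case recursion
described above, using entropy-based information gain. The helper names
(plurality_value, best_attribute) and the nested-dict tree representation are choices
made for this sketch, not part of any particular library.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy (in bits) of a list of class labels
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def plurality_value(examples):
    # Most common class among the examples (the "default" answer)
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def best_attribute(attributes, examples):
    # Attribute with the highest information gain
    base = entropy([e["class"] for e in examples])
    def gain(a):
        remainder = 0.0
        for v in set(e[a] for e in examples):
            subset = [e for e in examples if e[a] == v]
            remainder += len(subset) / len(examples) * entropy([e["class"] for e in subset])
        return base - remainder
    return max(attributes, key=gain)

def decision_tree_learning(examples, attributes, parent_examples):
    if not examples:                              # case 3: no examples left
        return plurality_value(parent_examples)
    classes = set(e["class"] for e in examples)
    if len(classes) == 1:                         # case 1: all positive or all negative
        return classes.pop()
    if not attributes:                            # case 4: no attributes left (noise)
        return plurality_value(examples)
    a = best_attribute(attributes, examples)      # case 2: split on the best attribute
    tree = {a: {}}
    for v in set(e[a] for e in examples):
        subset = [e for e in examples if e[a] == v]
        tree[a][v] = decision_tree_learning(subset, [x for x in attributes if x != a], examples)
    return tree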
Example
(Weather Data)
Which attribute to select?
Computing information
Measure information in bits
Formula to calculate Entropy
entropy(p1, p2, ..., pn) = −p1 log2(p1) − p2 log2(p2) − ... − pn log2(pn)
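A quick numeric check of the formula (base-2 logarithms, so the result is in bits),
written as a small Python helper of our own:

from math import log2

def entropy(*probs):
    # entropy(p1, ..., pn) = -sum(pi * log2(pi)), with 0 * log(0) taken as 0
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy(2/5, 3/5))     # 0.971 bits, as in the Outlook = Sunny branch below
print(entropy(0.5, 0.5))     # 1.0 bit: maximal for two equally likely classes
print(entropy(9/14, 5/14))   # 0.940 bits: the whole weather data set (9 yes, 5 no)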
Constructing DT
1. Calculate the entropy of each branch (each value of the feature)
2. Compute the node's average entropy: the weighted average of the
branch entropies
3. Compute the node's information gain: the total information minus the
node's average entropy
4. Select the node (attribute) with the highest information gain
5. Expand the branches that still contain mixed classes and repeat
steps 1–4
Example: attribute Outlook
Outlook = Sunny:
info([2,3]) = entropy(2/5, 3/5) = −(2/5) log(2/5) − (3/5) log(3/5) = 0.971 bits
Outlook = Overcast:
info([4,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
(Note: log(0) is normally undefined, but 0 · log(0) is taken to be 0.)
Outlook = Rainy:
info([3,2]) = entropy(3/5, 2/5) = −(3/5) log(3/5) − (2/5) log(2/5) = 0.971 bits
Expected information for the attribute:
info([2,3], [4,0], [3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971
= 0.693 bits
Which attribute to select?
Computing information gain
• Information gain: information before splitting – information
after splitting
• Gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2])
= [−(9/14) log(9/14) − (5/14) log(5/14)] − 0.693
= 0.940 − 0.693
= 0.247 bits
• Information gain for attributes from weather data:
Gain(Outlook ) = 0.247 bits
Gain(Temperature ) = 0.029 bits
Gain(Humidity ) = 0.152 bits
Gain(Windy ) = 0.048 bits
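The gains above can be reproduced directly from the class counts in the weather
data. A small sketch, with the [yes, no] counts per attribute value hard-coded from
the frequency table shown later in these notes:

from math import log2

def entropy_from_counts(counts):
    # Entropy (in bits) of a class-count vector such as [9, 5]
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(branch_counts):
    # Gain = entropy of the whole node - weighted average entropy of its branches
    totals = [sum(c) for c in branch_counts]
    n = sum(totals)
    overall = [sum(col) for col in zip(*branch_counts)]   # e.g. [9, 5]
    remainder = sum(t / n * entropy_from_counts(c) for t, c in zip(totals, branch_counts))
    return entropy_from_counts(overall) - remainder

attributes = {
    "Outlook":     [[2, 3], [4, 0], [3, 2]],   # Sunny, Overcast, Rainy
    "Temperature": [[2, 2], [4, 2], [3, 1]],   # Hot, Mild, Cool
    "Humidity":    [[3, 4], [6, 1]],           # High, Normal
    "Windy":       [[6, 2], [3, 3]],           # False, True
}
for name, counts in attributes.items():
    print(f"Gain({name}) = {info_gain(counts):.3f} bits")
# Expected: Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048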
Which attribute to select?
Gain(Outlook ) = 0.247 bits
Gain(Temperature ) = 0.029 bits
Gain(Humidity ) = 0.152 bits
Gain(Windy ) = 0.048 bits
Continuing to split
gain(Temperature ) = 0.571 bits
gain(Humidity ) = 0.971 bits
gain(Windy ) = 0.020 bits
Final decision tree
Note: not all leaves need to be pure; sometimes
identical instances have different classes
Splitting stops when data can’t be split any further
Activity: Restaurant
• Construct the decision tree on a page and submit it
• Clearly calculate the information gain of every feature
• Submit by this Sunday
CART Algorithm
Many alternative measures to information gain exist
Most popular alternative: the Gini index, used e.g. in CART
(Classification And Regression Trees)
The average Gini index is used (instead of average entropy / information)
The average Gini index is minimized (rather than maximizing information gain)
Impurity measure for each feature value:
Gini(t) = 1 − Σj [ p(j | t) ]²
Average Gini for an attribute:
Gini(A) = Σv (nv / n) × Gini(v), summed over the values v of attribute A
CART Algorithm
• Gini(Outlook = Sunny) = 1 − (2/5)² − (3/5)² = 0.48
• Gini(Outlook = Overcast) = 1 − (4/4)² − (0/4)² = 0
• Gini(Outlook = Rainy) = 1 − (3/5)² − (2/5)² = 0.48
Average Gini for the attribute:
Gini(Outlook) = (5/14) × 0.48 + (4/14) × 0 + (5/14) × 0.48
= 0.342
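The same numbers can be checked with a few lines of Python, again hard-coding the
[yes, no] counts from the weather data:

def gini_from_counts(counts):
    # Gini impurity of a class-count vector, e.g. [2, 3] -> 0.48
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def average_gini(branch_counts):
    # Weighted average Gini over the branches of one attribute
    n = sum(sum(c) for c in branch_counts)
    return sum(sum(c) / n * gini_from_counts(c) for c in branch_counts)

outlook = [[2, 3], [4, 0], [3, 2]]   # Sunny, Overcast, Rainy: [yes, no]
print([round(gini_from_counts(c), 2) for c in outlook])   # [0.48, 0.0, 0.48]
print(round(average_gini(outlook), 3))                    # 0.343 (0.342 above, after truncation)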
Which attribute to select?
Activity: Weather
• Construct the decision tree
on a page and submit it
• Clearly calculate the Gini
measure for every feature
• Submit by this Sunday
Properties of Good Measure
We expect an information measure to have the
following properties:
1. When the number of either yes's or no's is
zero, the information is zero.
2. When the numbers of yes's and no's are equal,
the information reaches a maximum.
3. The information should obey the multistage
property.
Information gain based on entropy has all of the
above properties.
ISSUES With Training Data
Missing data: not all attribute values are known; the learner needs a way to handle
them during both training and testing.
Multivalued attributes: when an attribute has many values (e.g., ExactTime or an ID),
information gain gives an inappropriately high indication of the attribute's usefulness.
One solution is to use the gain ratio, as in C4.5.
Continuous and integer-valued input attributes: continuous or integer-valued
attributes such as Height and Weight have an infinite set of possible values.
Instead, a split point is chosen; for example, at a given node in the tree the test might be
Weight > 160. Candidate split points are found by first sorting the values.
Splitting on numeric attributes is the most expensive part of real-world decision tree
learning applications; a sketch of the split-point search follows below.
Continuous-valued output attributes: if we are trying to predict a numerical output
value, such as the price of an apartment, then we need a regression tree rather than a
classification tree.
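As an illustration of the split-point idea (not any particular package's
implementation), here is a minimal sketch that sorts a numeric attribute and scores
every midpoint between adjacent distinct values by information gain. The Weight
values and labels below are made up purely for illustration:

from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_split_point(values, labels):
    # Return (threshold, gain) for the best binary test "value > threshold"
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        remainder = (len(left) / len(pairs)) * entropy(left) \
                  + (len(right) / len(pairs)) * entropy(right)
        if base - remainder > best[1]:
            best = (t, base - remainder)
    return best

weights = [120, 135, 150, 155, 162, 170, 180, 200]
labels  = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
print(best_split_point(weights, labels))   # threshold 158.5, gain 1.0 bit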
DT Discussion
A decision-tree learning system for real-world applications must be able to
handle all of these problems.
Handling continuous-valued variables is especially important, because both
physical and financial processes provide numerical data. Several commercial
packages have been built that meet these criteria.
In many areas of industry and commerce, decision trees are usually the first
method tried when a classification method is to be extracted from a data set.
One important property of decision trees is that it is possible for a human to
understand the reason for the output of the learning algorithm. (Indeed, this is
a legal requirement for financial decisions that are subject to anti-
discrimination laws.)
This is a property not shared by some other representations, such as neural
networks.
Loading Data
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
# Load the weather data set from CSV and inspect it
df = pandas.read_csv("weatherdata_converted.csv")
print(df)
print(df.info())
Splitting the data
features=['Outlook', 'Temperature', 'Humidity', 'Windy']
X = df[features]
y = df['Play']
print(X)
print(y)
print(df.groupby('Play').size())
Decision Tree (Entropy)
# Train a decision tree using entropy-based information gain
dtree1 = DecisionTreeClassifier(criterion="entropy")
dtree1.fit(X, y)
# Plot the learned tree with the feature names
tree.plot_tree(dtree1, feature_names=features)
Decision Tree (Gini Index)
# Classifier and prediction: train a tree using the Gini index (sklearn's default criterion)
dtree = DecisionTreeClassifier(criterion="gini")
dtree = dtree.fit(X, y)
#tree.plot_tree(dtree, feature_names=features)
print(X.iloc[1,:])   # one example row, used for prediction below
Prediction
# Predict for rows 1-2 and compare with the true labels
print(dtree.predict(X.iloc[1:3,:]))
print(y.iloc[1:3])
# Accuracy measured on the training data itself
x_predict = dtree.predict(X)
print(x_predict)
print("Accuracy : ", metrics.accuracy_score(y, x_predict)*100)
Statistical modeling (Naïve Bayes)
Two assumptions: Attributes are
equally important
statistically independent (given the class value)
I.e., knowing the value of one attribute says nothing
about the value of another (if the class is known)
Independence assumption is never correct!
But … this scheme works well in practice
Bayes’s rule
Probability of event H given evidence E:
Pr[H | E] = Pr[E | H] Pr[H] / Pr[E]
A priori probability of H: Pr[H]
Probability of event before evidence is seen
A posteriori probability of H: Pr[H | E]
Probability of event after evidence is seen
Naïve Bayes for classification
Classification learning: what’s the probability
of the class given an instance?
• Evidence E = instance’s non-class attribute
values
• Event H = class value of instance
Naïve assumption: evidence splits into parts
(i.e. attributes) that are independent
Pr[H | E] = Pr[E1 | H] × Pr[E2 | H] × ... × Pr[En | H] × Pr[H] / Pr[E]
Weather data example
Evidence E (a new day):
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Probability of class "yes":
P(yes | E) = P(Outlook = Sunny | yes)
           × P(Temperature = Cool | yes)
           × P(Humidity = High | yes)
           × P(Windy = True | yes)
           × P(yes) / P(E)
           = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / P(E)
The weather data (training set):
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
Prior Probabilities for weather data
(Training)
Probabilities for weather data
Attribute     Value     Yes count  No count  P(value | yes)  P(value | no)
Outlook       Sunny     2          3         2/9             3/5
Outlook       Overcast  4          0         4/9             0/5
Outlook       Rainy     3          2         3/9             2/5
Temperature   Hot       2          2         2/9             2/5
Temperature   Mild      4          2         4/9             2/5
Temperature   Cool      3          1         3/9             1/5
Humidity      High      3          4         3/9             4/5
Humidity      Normal    6          1         6/9             1/5
Windy         False     6          2         6/9             2/5
Windy         True      3          3         3/9             3/5
Priors:       Play = yes: 9/14     Play = no: 5/14
• A new day:
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Likelihood of "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
Likelihood of "no"  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Normalizing the likelihoods so that the probabilities sum to 1:
P(yes | E) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%
P(no | E)  = 0.0206 / (0.0053 + 0.0206) ≈ 79.5%  →  predict NO
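A few lines of Python reproduce this hand calculation; the conditional probabilities
are copied from the table above for the new day (Sunny, Cool, High, True):

p_given_yes = [2/9, 3/9, 3/9, 3/9]   # P(value | yes) for Outlook, Temp, Humidity, Windy
p_given_no  = [3/5, 1/5, 4/5, 3/5]   # P(value | no)
prior_yes, prior_no = 9/14, 5/14

like_yes = prior_yes
for p in p_given_yes:
    like_yes *= p
like_no = prior_no
for p in p_given_no:
    like_no *= p

print(round(like_yes, 4), round(like_no, 4))                   # 0.0053 0.0206
total = like_yes + like_no
print(round(like_yes / total, 3), round(like_no / total, 3))   # 0.205 0.795 -> predict "no"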
How Naïve Bayes Deals….
Zero Frequency Problem
Missing Values
Numeric Values
The “zero-frequency problem”
What if an attribute value doesn’t occur with every class
value?
(e.g. “Outlook = Overcast” for class “no”)
Probability will be zero!
A posteriori probability will also be zero!
(No matter how likely the other values are!)
Remedy: add 1 to the count for every attribute value-
class combination (Laplace estimator)
Result: probabilities will never be zero!
(also: stabilizes probability estimates)
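For instance, applying the Laplace estimator to the Outlook counts for class "no"
(Sunny 3, Overcast 0, Rainy 2) looks like the sketch below; adding 1 per value is the
simplest choice, and other small constants can be used instead:

# Raw Outlook counts for class "no": Sunny 3, Overcast 0, Rainy 2
counts = {"Sunny": 3, "Overcast": 0, "Rainy": 2}

# Laplace estimator: add 1 to every attribute value / class count
smoothed = {v: c + 1 for v, c in counts.items()}
total = sum(smoothed.values())                 # 5 examples + 3 values = 8
probs = {v: c / total for v, c in smoothed.items()}
print(probs)   # {'Sunny': 0.5, 'Overcast': 0.125, 'Rainy': 0.375} - no zero probabilities left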
Missing values
Training: instance is not included in frequency
count for attribute value-class combination
Classification: attribute will be omitted from
calculation
Example:
Outlook Temp. Humidity Windy Play
? Cool High True ?
Numeric attributes
Outlook Temperature Humidity Windy Play
Sunny 85 85 FALSE no
Sunny 80 90 TRUE no
Overcast 83 86 FALSE yes
Rainy 70 96 FALSE yes
Rainy 68 80 FALSE yes
Rainy 65 70 TRUE no
Overcast 64 65 TRUE yes
Sunny 72 95 FALSE no
Sunny 69 70 FALSE yes
Rainy 75 80 FALSE yes
Sunny 75 70 TRUE yes
Overcast 72 90 TRUE yes
Overcast 81 75 FALSE yes
Rainy 71 91 TRUE no
Numeric attributes
• Usual assumption: attributes have a normal or Gaussian probability
distribution (given the class)
• The probability density function for the normal distribution is defined
by two parameters:
• Sample mean μ
• Standard deviation σ
• The density function is then
f(x) = (1 / (σ √(2π))) × exp(−(x − μ)² / (2σ²))
Statistics for weather data
Outlook:      Sunny 2/9 (yes), 3/5 (no);  Overcast 4/9 (yes), 0/5 (no);  Rainy 3/9 (yes), 2/5 (no)
Temperature:  yes: mean 73, std dev 6.2;  no: mean 75, std dev 7.9
Humidity:     yes: mean 79, std dev 10.2; no: mean 86, std dev 9.7
Windy:        False 6/9 (yes), 2/5 (no);  True 3/9 (yes), 3/5 (no)
Play:         yes 9/14;  no 5/14
• Example density value:
f(temperature = 66 | yes) = (1 / (6.2 √(2π))) × exp(−(66 − 73)² / (2 × 6.2²)) = 0.0340
Classifying a new day
• A new day: Outlook Temp. Humidity Windy Play
Sunny 66 90 true ?
Likelihood of “yes” = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of “no”  = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
P(“yes”) = 0.000036 / (0.000036 + 0.000108) = 25%
P(“no”)  = 0.000108 / (0.000036 + 0.000108) = 75%
• Missing values during training are not included
in calculation of mean and standard deviation
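The density values used above can be reproduced in Python. Here the mean and
standard deviation are computed directly from the numeric weather data (sample
standard deviation, i.e. dividing by n - 1):

from math import sqrt, pi, exp

def gaussian_density(x, mu, sigma):
    # Normal probability density f(x) with mean mu and standard deviation sigma
    return (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Temperature values of the nine "yes" days in the numeric weather table
temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
mu = sum(temps_yes) / len(temps_yes)                                        # 73.0
sigma = sqrt(sum((t - mu) ** 2 for t in temps_yes) / (len(temps_yes) - 1))  # about 6.2

print(round(mu, 1), round(sigma, 1))
print(round(gaussian_density(66, mu, sigma), 4))   # about 0.034, the value used above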
Naïve Bayes: discussion
Naïve Bayes works surprisingly well (even if
independence assumption is clearly violated)
Why? Because classification doesn’t require
accurate probability estimates as long as
maximum probability is assigned to correct class
However: adding too many redundant attributes
will cause problems (e.g. identical attributes)
Note also: many numeric attributes are not
normally distributed (kernel density estimators can be used instead)
Multinomial naïve Bayes
• Version of naïve Bayes used for document classification using bag of words
model
• n1, n2, ..., nk: number of times word i occurs in the document
• P1, P2, ..., Pk: probability of obtaining word i when sampling from documents
in class H
• Probability of observing a particular document E given the class H
(based on the multinomial distribution):
Pr[E | H] = N! × Π i (Pi^ni / ni!),  where N = n1 + n2 + ... + nk
• Note that this expression ignores the probability of generating a document of
the right length
• This probability is assumed to be constant for all classes
Multinomial naïve Bayes
• Suppose the dictionary has two words, yellow and blue
• Suppose P(yellow | H) = 75% and P(blue | H) = 25%
• Suppose E is the document "blue yellow blue" (blue twice, yellow once)
• Probability of observing the document:
P({blue yellow blue} | H) = 3! × (0.75¹ / 1!) × (0.25² / 2!) = 9/64 ≈ 0.14
Suppose there is another class H' that has
P(yellow | H') = 10% and P(blue | H') = 90%:
P({blue yellow blue} | H') = 3! × (0.1¹ / 1!) × (0.9² / 2!) = 243/1000 ≈ 0.24
• Need to take prior probability of class into account to make the final
classification using Bayes’ rule
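These multinomial probabilities can be checked by hand or with a short Python
helper; the word counts for "blue yellow blue" are yellow = 1, blue = 2:

from math import factorial

def multinomial_doc_prob(word_counts, word_probs):
    # Pr[document | class] under the multinomial model:
    # N! * prod(Pi^ni / ni!), with N the total number of words
    n_total = sum(word_counts)
    prob = factorial(n_total)
    for n_i, p_i in zip(word_counts, word_probs):
        prob *= p_i ** n_i / factorial(n_i)
    return prob

counts = [1, 2]                                    # [yellow, blue]
print(multinomial_doc_prob(counts, [0.75, 0.25]))  # 0.140625 = 9/64 (class H)
print(multinomial_doc_prob(counts, [0.10, 0.90]))  # about 0.243 = 243/1000 (class H')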
Categorical Naïve Bayes
import pandas
from sklearn.naive_bayes import CategoricalNB
from sklearn import metrics
df = pandas.read_csv("weatherdata_converted.csv")
print(df)
features=['Outlook', 'Temperature', 'Humidity', 'Windy']
print(df.info())
# Treat the four features as categorical variables
df['Outlook'] = df['Outlook'].astype('category')
df['Temperature'] = df['Temperature'].astype('category')
df['Humidity'] = df['Humidity'].astype('category')
df['Windy'] = df['Windy'].astype('category')
X = df[features]
y = df['Play']
Categorical Naïve Bayes
print(df.info())
print(df.describe())
print(df.groupby('Play').size())
clf = CategoricalNB()
clf.fit(X, y)
y_pred = clf.predict(X)
print(y_pred)
print("Accuracy : ", metrics.accuracy_score(y, y_pred)*100)
Gaussian Naïve Bayes (Iris flower data set)
The dataset contains a set of 150 records under five attributes/features:
petal length, petal width, sepal length, sepal width, and species (the class).
Gaussian Naïve Bayes (Iris flower data set)
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
iris= datasets.load_iris()
X = iris.data
y = iris.target
print(X.shape)
print(y.shape)
nb = GaussianNB()
nb.fit(X, y)
# Predict on the training data and report accuracy
y_pred = nb.predict(X)
print(y_pred)
print("Accuracy : ", metrics.accuracy_score(y, y_pred)*100)