
Assignment 4 DT NB LR Solution

The document discusses three questions related to supervised learning techniques. Question 1 uses a decision tree to predict student performance. Question 2 uses Naive Bayes classification to predict customer purchases. Question 3 uses logistic regression to predict customer purchases.

Uploaded by

Yvette Su

Assignment 4: Supervised Learning

Question 1
A college professor wants to predict whether a student will pass or fail a course
based on two factors: hours of study per week and attendance rate. The professor
collected data from 20 students. Use the Decision Tree method to compute the best
attribute for the first node splitting.
(Note: for simplicity of hand computation, the possible values for Hours of Study/Week
are binned as hours <= 7 or hours > 7, while those of Attendance Rate % are
binned as rate <= 80% or rate > 80%.)

Dataset:
Student_ID Hours of Study/Week Attendance Rate % Result
1 10 90 Pass
2 4 80 Fail
3 8 95 Pass
4 3 70 Fail
5 12 85 Pass
6 6 80 Fail
7 9 90 Pass
8 5 75 Fail
9 11 93 Pass
10 4 70 Fail
11 10 88 Pass
12 3 60 Fail
13 8 95 Pass
14 2 65 Fail
15 12 90 Pass
16 6 85 Fail
17 9 92 Pass
18 5 70 Fail
19 11 91 Pass
20 4 75 Fail

Solution

Step 1: Calculate Entropy of the Parent Node

First, we need to calculate the entropy of the parent node (Result):

Entropy(Result) = -(P(Pass) * log2(P(Pass)) + P(Fail) * log2(P(Fail)))

There are 10 Pass and 10 Fail cases out of 20 students:

Entropy(Result) = -(10/20 * log2(10/20) + 10/20 * log2(10/20)) = 1
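This arithmetic can be checked with a short Python snippet (the `entropy` helper is my own naming, not part of the assignment):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# Parent node: 10 Pass and 10 Fail out of 20 students
print(entropy([10, 10]))  # 1.0
```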

Step 2: Calculate Entropies and Information Gains of the Alternative Splitting Attributes

Next, we need to calculate the entropy of each attribute (Hours of Study/Week and
Attendance Rate (%)) and their information gain:
1. Hours of Study/Week:
- Split into two groups: <=7 hours and >7 hours
- Group 1 (<=7 hours, 10 students): 10 Fail, 0 Pass
- Group 2 (>7 hours, 10 students): 0 Fail, 10 Pass

In the table, every student studying 7 hours or fewer failed and every student studying more than 7 hours passed, so both child nodes are pure:

Entropy(Hours <= 7) = -(10/10 * log2(10/10)) = 0
Entropy(Hours > 7) = -(10/10 * log2(10/10)) = 0
Information Gain (Hours) = Entropy(Result) - (10/20 * Entropy(Hours <= 7) + 10/20 *
Entropy(Hours > 7)) = 1 - (0.5 * 0 + 0.5 * 0) = 1

2. Attendance Rate (%):
- Split into two groups: <=80% and >80%
- Group 1 (<=80%, 9 students): 9 Fail, 0 Pass
- Group 2 (>80%, 11 students): 1 Fail (student 16), 10 Pass

Entropy(Attendance <= 80) = -(9/9 * log2(9/9)) = 0
Entropy(Attendance > 80) = -(1/11 * log2(1/11) + 10/11 * log2(10/11)) ≈ 0.44
Information Gain (Attendance) = Entropy(Result) - (9/20 * Entropy(Attendance <= 80)
+ 11/20 * Entropy(Attendance > 80)) = 1 - (9/20 * 0 + 11/20 * 0.44) ≈ 0.76

Step 3: Choose the Splitting Attribute based on the Obtained Information Gains

Now, we compare the information gain of both attributes:
- Information Gain (Hours) = 1.00
- Information Gain (Attendance) ≈ 0.76

Since the information gain for Hours of Study/Week is higher than for Attendance
Rate (the hours split in fact separates Pass from Fail perfectly), the best attribute
for the first node split is Hours of Study/Week. The decision tree will split the
parent node on Hours of Study/Week (<= 7 and > 7).
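The group counts and information gains can be recomputed directly from the dataset table; the following Python sketch (the helper names `entropy` and `info_gain` are my own) does exactly that:

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def info_gain(rows, test):
    """Information gain of a binary test over (features, label) rows."""
    labels = [y for _, y in rows]
    gain = entropy([labels.count("Pass"), labels.count("Fail")])
    for branch in (True, False):
        part = [y for x, y in rows if test(x) is branch]
        if part:
            gain -= (len(part) / len(rows)) * entropy(
                [part.count("Pass"), part.count("Fail")])
    return gain

# (hours, attendance, result) for the 20 students in the table
data = [(10, 90, "Pass"), (4, 80, "Fail"), (8, 95, "Pass"), (3, 70, "Fail"),
        (12, 85, "Pass"), (6, 80, "Fail"), (9, 90, "Pass"), (5, 75, "Fail"),
        (11, 93, "Pass"), (4, 70, "Fail"), (10, 88, "Pass"), (3, 60, "Fail"),
        (8, 95, "Pass"), (2, 65, "Fail"), (12, 90, "Pass"), (6, 85, "Fail"),
        (9, 92, "Pass"), (5, 70, "Fail"), (11, 91, "Pass"), (4, 75, "Fail")]

rows = [((h, a), r) for h, a, r in data]
ig_hours = info_gain(rows, lambda x: x[0] <= 7)
ig_att = info_gain(rows, lambda x: x[1] <= 80)
print(round(ig_hours, 4), round(ig_att, 4))  # 1.0 0.7583
```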

Question 2: Naïve Bayesian Classifier in Customer Targeting

Objective: In this assignment, students will manually compute the probabilities using
the Naive Bayesian classifier and predict if a customer will make a purchase based
on the given dataset.

Scenario: Suppose you are a marketing manager at a retail company. You have
collected data on customers' past behavior regarding their visits and purchases. The
dataset contains information about the day of the week, the customers' age group,
and their income level, along with whether they made a purchase. Your task is to use
the Naive Bayesian classifier to predict if a customer will make a purchase on a
given day based on their age group and income level.

Dataset:
Day Day of Week Age Group Income Level Purchase
1 Monday Young High No
2 Monday Young High No
3 Tuesday Young High Yes
4 Wednesday Middle High Yes
5 Wednesday Senior Normal Yes
6 Wednesday Senior Normal No
7 Thursday Senior Normal Yes
8 Friday Middle High No
9 Friday Senior Normal Yes
10 Saturday Middle Normal Yes
11 Sunday Middle Normal Yes
12 Monday Middle High Yes
13 Tuesday Young Normal Yes
14 Wednesday Middle High No

Task: Suppose a new customer visits the store on a Friday, and their age group is
"Senior" and income level is "High." Predict if this customer will make a purchase
using the Naive Bayesian classifier. Follow the steps below and provide your answer
(round the final answer to 1 decimal place).

a) Calculate the prior probabilities for each class (Purchase = Yes, Purchase = No).
b) Calculate the conditional probabilities for each feature (Day of Week, Age Group,
and Income Level) given each class.
c) Calculate the probability to infer the class for the incoming new customer.
d) Determine the class for the incoming new customer.

Solution (marking note: it is also correct if students applied Laplacian correction to
ALL probability terms.)

Step 1: Estimate Probabilities (Priors and Conditionals)
a) Prior probabilities: P(Purchase = Yes) = 9/14 ≈ 0.64 and P(Purchase = No) = 5/14 ≈ 0.36
b) Conditional probabilities:
P(Friday | Purchase = Yes) = 1/9 ≈ 0.11
P(Friday | Purchase = No) = 1/5 = 0.20
P(Senior | Purchase = Yes) = 3/9 ≈ 0.33
P(Senior | Purchase = No) = 1/5 = 0.20
P(High | Purchase = Yes) = 3/9 ≈ 0.33
P(High | Purchase = No) = 4/5 = 0.80

Step 2: Infer the Probability of Assigning a Class to the Incoming Example

The probability used to infer the class of the incoming example is

P(Purchase = Yes | Friday, Senior, High)
= P(Purchase = Yes, Friday, Senior, High) / P(Friday, Senior, High)
= P(Friday|Yes) P(Senior|Yes) P(High|Yes) P(Yes) / Σ_{X ∈ {Yes, No}} P(Friday|X) P(Senior|X) P(High|X) P(X)
= (0.11 × 0.33 × 0.33 × 0.64) / (0.11 × 0.33 × 0.33 × 0.64 + 0.2 × 0.2 × 0.8 × 0.36)
= 0.00766 / (0.00766 + 0.01152)
≈ 0.40 < 0.5
Step 3: State Your Conclusion

Since the posterior probability is below 0.5, the incoming customer is predicted not
to make a purchase. Hence its class is Purchase = No.
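The same computation can be carried out with exact fractions; a short Python sketch follows (the `score` helper is my own naming). Exact arithmetic gives P(Yes | Friday, Senior, High) = 25/61 ≈ 0.41, slightly above the value obtained from pre-rounded conditionals, but still below 0.5, so the predicted class is No either way:

```python
from fractions import Fraction as F

# 14 training rows from the table: (day of week, age group, income, purchase)
rows = [("Monday", "Young", "High", "No"), ("Monday", "Young", "High", "No"),
        ("Tuesday", "Young", "High", "Yes"), ("Wednesday", "Middle", "High", "Yes"),
        ("Wednesday", "Senior", "Normal", "Yes"), ("Wednesday", "Senior", "Normal", "No"),
        ("Thursday", "Senior", "Normal", "Yes"), ("Friday", "Middle", "High", "No"),
        ("Friday", "Senior", "Normal", "Yes"), ("Saturday", "Middle", "Normal", "Yes"),
        ("Sunday", "Middle", "Normal", "Yes"), ("Monday", "Middle", "High", "Yes"),
        ("Tuesday", "Young", "Normal", "Yes"), ("Wednesday", "Middle", "High", "No")]

def score(cls, day, age, income):
    """Unnormalized naive-Bayes score: prior times the three conditionals."""
    sub = [r for r in rows if r[3] == cls]
    prior = F(len(sub), len(rows))
    def cond(i, v):
        return F(sum(1 for r in sub if r[i] == v), len(sub))
    return prior * cond(0, day) * cond(1, age) * cond(2, income)

s_yes = score("Yes", "Friday", "Senior", "High")
s_no = score("No", "Friday", "Senior", "High")
p_yes = s_yes / (s_yes + s_no)
print(p_yes, float(p_yes))  # 25/61 -> ~0.41
```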

Question 3

A marketing team at a small company wants to predict if a customer will make a
purchase based on their age and income. They collected data from 10 customers.
Use logistic regression to create a model that predicts the likelihood of a customer
making a purchase. For simplicity, assume the following logistic regression
coefficients have been provided:

Intercept: -15
Age Coefficient: 0.5
Income Coefficient: 0.01

Dataset:
Customer_ID Age Income in 1000s Purchase
1 25 40 No
2 35 60 Yes
3 30 50 Yes
4 20 30 No
5 40 70 Yes
6 22 32 No
7 45 80 Yes
8 27 45 No
9 38 65 Yes
10 24 38 No

Task: Calculate the probability of making a purchase for a new customer with an age
of 32 and an income of 55,000 (i.e., 55 in 1000s) using the provided coefficients.
Compute by hand and state your classification decision.

Solution

Step 1: Calculate the linear combination of the weight coefficients and the input features

Linear Combination = Intercept + (Age Coefficient * Age) + (Income Coefficient * Income)
Linear Combination = (-15) + (0.5 * 32) + (0.01 * 55) = -15 + 16 + 0.55 = 1.55

Step 2: Calculate the probability using the logistic function

Probability = 1 / (1 + e^(-Linear Combination))
Probability = 1 / (1 + e^(-1.55)) ≈ 0.8249

Step 3: Make the classification decision based on the probability

If Probability >= 0.5, the customer is predicted to make a purchase. If
Probability < 0.5, the customer is predicted not to make a purchase.

Since the calculated probability (0.8249) is greater than or equal to 0.5, the model
predicts that the new customer with an age of 32 and an income of 55,000 will make
a purchase.
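As a numeric check, here is a minimal Python sketch of the logistic step. It assumes the sign convention that reproduces the quoted probability of ≈0.8249, namely z = -15 + 0.5 × Age + 0.01 × Income with income in thousands:

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Assumed coefficients: intercept -15, age 0.5, income 0.01 (income in 1000s)
z = -15 + 0.5 * 32 + 0.01 * 55   # linear score = 1.55
p = sigmoid(z)
decision = "Purchase" if p >= 0.5 else "No purchase"
print(round(p, 4), decision)  # 0.8249 Purchase
```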
