Assignment 4 DT NB LR Solution
Question 1
A college professor wants to predict whether a student will pass or fail a course
based on two factors: hours of study per week and attendance rate. The professor
collected data from 20 students. Use the Decision Tree method to determine the best
attribute for the first node split.
(Note: For simplicity of hand computing, the possible values for Hours of Study/Week
are binned as hours <= 7 or hours > 7, while those for Attendance Rate % are
binned as rate <= 80% or rate > 80%.)
Dataset:
Student_ID Hours of Study/Week Attendance Rate % Result
1 10 90 Pass
2 4 80 Fail
3 8 95 Pass
4 3 70 Fail
5 12 85 Pass
6 6 80 Fail
7 9 90 Pass
8 5 75 Fail
9 11 93 Pass
10 4 70 Fail
11 10 88 Pass
12 3 60 Fail
13 8 95 Pass
14 2 65 Fail
15 12 90 Pass
16 6 85 Fail
17 9 92 Pass
18 5 70 Fail
19 11 91 Pass
20 4 75 Fail
Solution
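The parent node has 10 Pass and 10 Fail students, so
Entropy(parent) = -(10/20)log2(10/20) - (10/20)log2(10/20) = 1.0.
Hours of Study/Week: the <= 7 branch contains all 10 Fail students and the > 7 branch
contains all 10 Pass students. Both branches are pure (entropy 0), so
Gain(Hours) = 1.0 - 0 = 1.0.
Attendance Rate: the <= 80% branch contains 9 Fail students (entropy 0), while the
> 80% branch contains 10 Pass students and 1 Fail student (Student 16, at 85%), with
entropy -(10/11)log2(10/11) - (1/11)log2(1/11) ≈ 0.4395. The weighted entropy is
(11/20)(0.4395) ≈ 0.2417, so Gain(Attendance) ≈ 1.0 - 0.2417 = 0.7583.
Since the information gain for Hours of Study per Week (1.0) is higher than for
Attendance Rate (≈ 0.7583), the best attribute for the first node split is Hours of
Study per Week. The decision tree will split the parent node based on Hours of
Study/Week (<= 7 and > 7), which produces two pure child nodes.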
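The entropy and information-gain arithmetic for this split can be checked with a short Python sketch. It hard-codes the 20 rows from the table and applies the assignment's bins (hours > 7, attendance > 80%); the helper names `entropy` and `information_gain` are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter

# (hours of study per week, attendance rate %, result) for Students 1-20
DATA = [
    (10, 90, "Pass"), (4, 80, "Fail"), (8, 95, "Pass"), (3, 70, "Fail"),
    (12, 85, "Pass"), (6, 80, "Fail"), (9, 90, "Pass"), (5, 75, "Fail"),
    (11, 93, "Pass"), (4, 70, "Fail"), (10, 88, "Pass"), (3, 60, "Fail"),
    (8, 95, "Pass"), (2, 65, "Fail"), (12, 90, "Pass"), (6, 85, "Fail"),
    (9, 92, "Pass"), (5, 70, "Fail"), (11, 91, "Pass"), (4, 75, "Fail"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels (illustrative helper)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, split):
    """Gain of a binary split; `split` maps a row to True/False (its bin)."""
    gain = entropy([r[2] for r in rows])          # entropy of the parent node
    for side in (True, False):
        subset = [r[2] for r in rows if split(r) == side]
        if subset:                                # subtract weighted child entropy
            gain -= len(subset) / len(rows) * entropy(subset)
    return gain


gain_hours = information_gain(DATA, lambda r: r[0] > 7)        # <= 7 vs > 7
gain_attendance = information_gain(DATA, lambda r: r[1] > 80)  # <= 80% vs > 80%

print(f"Gain(Hours of Study/Week) = {gain_hours:.4f}")
print(f"Gain(Attendance Rate)     = {gain_attendance:.4f}")
print("Best first split:",
      "Hours of Study/Week" if gain_hours > gain_attendance else "Attendance Rate")
```

Run as-is, the sketch should print gains of 1.0000 for Hours of Study/Week and 0.7583 for Attendance Rate.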
Question 2
Objective: In this question, students will manually compute the probabilities using
the Naive Bayesian classifier and predict whether a customer will make a purchase
based on the given dataset.
Scenario: Suppose you are a marketing manager at a retail company. You have
collected data on customers' past behavior regarding their visits and purchases. The
dataset contains information about the day of the week, the customers' age group,
and their income level, along with whether they made a purchase. Your task is to use
the Naive Bayesian classifier to predict if a customer will make a purchase on a
given day based on their age group and income level.
Dataset:
Day Day of Week Age Group Income Level Purchase
1 Monday Young High No
2 Monday Young High No
3 Tuesday Young High Yes
4 Wednesday Middle High Yes
5 Wednesday Senior Normal Yes
6 Wednesday Senior Normal No
7 Thursday Senior Normal Yes
8 Friday Middle High No
9 Friday Senior Normal Yes
10 Saturday Middle Normal Yes
11 Sunday Middle Normal Yes
12 Monday Middle High Yes
13 Tuesday Young Normal Yes
14 Wednesday Middle High No
Task: Suppose a new customer visits the store on a Friday, their age group is
"Senior", and their income level is "High". Predict whether this customer will make
a purchase using the Naive Bayesian classifier. Follow the steps below and provide
your answer (round the final answer to 1 decimal place).
a) Calculate the prior probabilities for each class (Purchase = Yes, Purchase = No).
b) Calculate the conditional probabilities for each feature (Day of Week, Age Group,
and Income Level) given each class.
c) Calculate the probabilities needed to infer the class of the incoming new customer.
d) Determine the class for the incoming new customer.
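Steps a) to d) can be sketched in Python as follows. This is a minimal sketch assuming plain relative-frequency estimates with no Laplace smoothing (the task does not mention smoothing); `Fraction` keeps the arithmetic exact so the printed values can be compared with hand calculations, and `naive_bayes` is a hypothetical helper name.

```python
from fractions import Fraction

FEATURES = ("Day of Week", "Age Group", "Income Level")

# (day of week, age group, income level, purchase) for Days 1-14, in table order
DATA = [
    ("Monday", "Young", "High", "No"),
    ("Monday", "Young", "High", "No"),
    ("Tuesday", "Young", "High", "Yes"),
    ("Wednesday", "Middle", "High", "Yes"),
    ("Wednesday", "Senior", "Normal", "Yes"),
    ("Wednesday", "Senior", "Normal", "No"),
    ("Thursday", "Senior", "Normal", "Yes"),
    ("Friday", "Middle", "High", "No"),
    ("Friday", "Senior", "Normal", "Yes"),
    ("Saturday", "Middle", "Normal", "Yes"),
    ("Sunday", "Middle", "Normal", "Yes"),
    ("Monday", "Middle", "High", "Yes"),
    ("Tuesday", "Young", "Normal", "Yes"),
    ("Wednesday", "Middle", "High", "No"),
]


def naive_bayes(rows, query):
    """Hypothetical helper: score each class by P(class) * prod P(feature = value | class).

    Uses raw relative frequencies (assumption: no Laplace smoothing) and returns
    the unnormalized scores plus the posteriors normalized to sum to 1.
    """
    scores = {}
    for label in sorted({r[-1] for r in rows}):
        in_class = [r for r in rows if r[-1] == label]
        prior = Fraction(len(in_class), len(rows))                    # step a)
        print(f"P({label}) = {prior}")
        score = prior
        for i, value in enumerate(query):                              # step b)
            cond = Fraction(sum(r[i] == value for r in in_class), len(in_class))
            print(f"  P({FEATURES[i]} = {value} | {label}) = {cond}")
            score *= cond
        scores[label] = score                                          # step c)
    total = sum(scores.values())
    return scores, {label: s / total for label, s in scores.items()}


scores, posteriors = naive_bayes(DATA, ("Friday", "Senior", "High"))
for label in scores:
    print(f"{label}: score = {scores[label]} ~ {float(scores[label]):.4f}, "
          f"P({label} | x) ~ {float(posteriors[label]):.1f}")
prediction = max(posteriors, key=posteriors.get)                        # step d)
print("Prediction: Purchase =", prediction)
```

Under these assumptions the priors are P(Yes) = 9/14 and P(No) = 5/14, the score for Yes is 1/126 ≈ 0.0079 and the score for No is 2/175 ≈ 0.0114, so the sketch predicts Purchase = No, with normalized probabilities of about 0.6 for No and 0.4 for Yes.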
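Question 3
A logistic regression model that predicts Purchase from a customer's Age and Income
(in 1000s) has the following parameters: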
Intercept: 15
Age Coefficient: -0.5
Income Coefficient: -0.01
Dataset:
Customer_ID Age Income in 1000s Purchase
1 25 40 No
2 35 60 Yes
3 30 50 Yes
4 20 30 No
5 40 70 Yes
6 22 32 No
7 45 80 Yes
8 27 45 No
9 38 65 Yes
10 24 38 No
Task: Calculate the probability of making a purchase for a new customer with an age
of 32 and an income of 55,000 using the provided coefficients. Compute the result by
hand and state your classification decision.
Solution
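Step 1: Calculate the linear combination of the weight coefficients and the input
features (income is expressed in 1000s, so 55,000 enters as 55):
z = 15 + (-0.5)(32) + (-0.01)(55) = 15 - 16 - 0.55 = -1.55
Step 2: Apply the sigmoid (logistic) function to obtain the probability of a purchase:
P(Purchase = Yes) = 1 / (1 + e^(-z)) = 1 / (1 + e^(1.55)) ≈ 1 / (1 + 4.7115) ≈ 0.1751
Step 3: Classify using a threshold of 0.5.
Since the calculated probability (0.1751) is less than 0.5, the model predicts that
the new customer with an age of 32 and an income of 55,000 will not make a purchase.
(The complementary probability of no purchase is 1 - 0.1751 = 0.8249.)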
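A short Python sketch of this logistic-regression calculation follows, assuming the standard sigmoid P = 1 / (1 + e^(-z)), income entered in 1000s, and a 0.5 decision threshold. The constant names and the `purchase_probability` function are illustrative, not from the assignment.

```python
import math

# Illustrative constant names; values are the coefficients given in the question.
INTERCEPT = 15.0
AGE_COEF = -0.5
INCOME_COEF = -0.01  # income is measured in 1000s


def purchase_probability(age, income_thousands):
    """Sigmoid of the linear combination of intercept, coefficients, and features."""
    z = INTERCEPT + AGE_COEF * age + INCOME_COEF * income_thousands
    return 1.0 / (1.0 + math.exp(-z))


p = purchase_probability(32, 55)          # income 55,000 -> 55 (in 1000s)
decision = "Yes" if p >= 0.5 else "No"    # 0.5 threshold
print(f"P(Purchase = Yes) = {p:.4f} -> predict Purchase = {decision}")
```

Under these assumptions z = -1.55, and the sketch prints a purchase probability of about 0.1751, which falls below the 0.5 threshold.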