DM_Practice_Problem_Set-2
4 Why do we often use one-hot encoding for Nominal attributes
but integer encoding for Ordinal attributes?
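As a minimal sketch of the two encodings, assuming plain Python and the eye color and education values that appear in the attribute table of this set (the ranking High School < Master's < PhD is an assumption made for illustration, and the helper names are hypothetical):

```python
# Nominal attribute: categories have no inherent order.
eye_colors = ["Brown", "Blue", "Green"]
# Ordinal attribute: levels listed from lowest to highest (assumed ranking).
education_order = ["High School", "Master's", "PhD"]

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

def ordinal_code(value, ordered_levels):
    """Return the rank of `value` within an explicitly ordered list of levels."""
    return ordered_levels.index(value)

blue_vec = one_hot("Blue", eye_colors)
phd_code = ordinal_code("PhD", education_order)

print(blue_vec)  # [0, 1, 0]
print(phd_code)  # 2
```

The one-hot vectors are all equally far apart, so no artificial order is introduced, while the integer codes keep the ranking of the ordinal levels.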
5 What is the key difference between Binary and Nominal
attributes? Can a binary attribute also be a nominal attribute?
Explain with an example.
6 If a dataset contains:
• Customer Age
• Product Category
• Purchase Amount
• Membership Status (Gold, Silver, Bronze)
ID | Name | Eye Color (Nominal) | Income ($) (Numeric) | Likes Spicy Food (Binary) | Education Level (Ordinal)
1  | John | Brown               | 50,000               | Yes                       | Master’s
2  | Emma | Blue                | 70,000               | No                        | High School
3  | Alex | Green               | 40,000               | Yes                       | PhD
11 Convert the sentence “Data Science is awesome” into a
one-hot encoded representation using the vocabulary
{“Data”, “Science”, “is”, “awesome”, “Machine”, “learning”}
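One way to set this up is sketched below. It assumes the vocabulary is indexed in the order listed, that the sentence is split on whitespace, and that matching is case-sensitive; the helper name `one_hot_word` is hypothetical.

```python
vocabulary = ["Data", "Science", "is", "awesome", "Machine", "learning"]
sentence = "Data Science is awesome"

# Map each vocabulary word to its position (assumed: order as listed).
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot_word(word):
    """Return a vector of len(vocabulary) with a 1 at the word's index."""
    vec = [0] * len(vocabulary)
    vec[index[word]] = 1
    return vec

# One vector per token of the sentence.
encoded = [one_hot_word(w) for w in sentence.split()]

for w, v in zip(sentence.split(), encoded):
    print(w, v)
```

Each of the four words maps to a six-dimensional vector; the positions for “Machine” and “learning” stay 0 in every row because those words do not occur in the sentence.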
Person | Age    | Income | Student | Credit Score | Buy Product
1      | Young  | High   | No      | Fair         | No
2      | Young  | High   | No      | Excellent    | No
3      | Middle | High   | No      | Fair         | Yes
4      | Senior | Medium | No      | Fair         | Yes
5      | Senior | Low    | Yes     | Fair         | Yes
6      | Senior | Low    | Yes     | Excellent    | No
7      | Middle | Low    | Yes     | Excellent    | Yes
8      | Young  | Medium | No      | Fair         | No
9      | Young  | Low    | Yes     | Fair         | Yes
10     | Senior | Medium | Yes     | Fair         | Yes
11     | Young  | Medium | Yes     | Excellent    | Yes
12     | Middle | Medium | No      | Excellent    | Yes
13     | Middle | High   | Yes     | Fair         | Yes
14     | Senior | Medium | No      | Excellent    | No
2. Calculate the Information Gain for each feature: Age, Income, Student, and Credit Score.
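The Information Gain is given by:
$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c$$
where $S_v$ is the subset of $S$ with $A = v$ and $p_c$ is the fraction of examples in $S$ with class $c$.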
3. Extract at least two IF-THEN rules for predicting Buy Product.
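The information gain calculation in item 2 can be checked with a short script. This is a minimal sketch that assumes the standard entropy-based definition (base-2 logarithm) and copies the 14 rows of the table above; the names `ROWS`, `FEATURES`, `entropy`, and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

FEATURES = ["Age", "Income", "Student", "Credit Score"]
# Rows copied from the table: (Age, Income, Student, Credit Score, Buy Product)
ROWS = [
    ("Young", "High", "No", "Fair", "No"),
    ("Young", "High", "No", "Excellent", "No"),
    ("Middle", "High", "No", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Excellent", "No"),
    ("Middle", "Low", "Yes", "Excellent", "Yes"),
    ("Young", "Medium", "No", "Fair", "No"),
    ("Young", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "Yes", "Fair", "Yes"),
    ("Young", "Medium", "Yes", "Excellent", "Yes"),
    ("Middle", "Medium", "No", "Excellent", "Yes"),
    ("Middle", "High", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Excellent", "No"),
]

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the target minus the weighted entropy after splitting on `col`."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {g:.3f}")
```

Under these assumptions Age has the largest gain. For item 3, note that every Age = Middle row in the table has Buy Product = Yes, which suggests one candidate rule: IF Age = Middle THEN Buy Product = Yes.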
21 The table below represents sales performance ($1000s) and
marketing expenses ($1000s) over six months:
1. Compute the Pearson Correlation Coefficient r between sales and marketing spend.
2. If a Dice score above 0.75 is considered a good segmentation, does the model perform well?
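Both quantities named above have standard definitions that are easy to compute by hand or in code. The sketch below assumes the usual formulas, r = cov(x, y) / (σ_x σ_y) and Dice = 2|A ∩ B| / (|A| + |B|). All numbers in it are hypothetical placeholders chosen as a sanity check; they are not the values from the problem's table.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def dice_score(predicted, truth):
    """Dice coefficient between two sets of labelled items (e.g. pixel ids)."""
    predicted, truth = set(predicted), set(truth)
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))

# Hypothetical data: perfectly linear, so r should be 1.
sales = [10, 20, 30, 40, 50, 60]
marketing = [1, 2, 3, 4, 5, 6]
r = pearson_r(sales, marketing)

# Hypothetical masks: 4 shared items out of 5 + 5, so Dice = 8 / 10.
dice = dice_score({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})

print(round(r, 3), dice)
```

Substituting the actual six monthly values and the actual segmentation counts gives the answers asked for; the Dice result is then compared against the 0.75 threshold.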
25 A new predictive model for tuberculosis detection was
applied to 300 patients. The results are as follows:
• 120 patients actually had tuberculosis.
• Out of those 120, the model correctly identified 90 as positive.
• The model incorrectly classified 30 tuberculosis patients as negative (false negatives).
• Out of 180 healthy patients, the model correctly classified 150 as negative.
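The counts above define a confusion matrix. The sketch below derives the one missing cell (false positives = 180 − 150) and computes the metrics usually read from such a matrix; which of these the problem asks for is an assumption, and the variable names are illustrative.

```python
total = 300
tp = 90                    # tuberculosis patients correctly flagged positive
fn = 30                    # tuberculosis patients missed (false negatives)
actual_negative = 180      # healthy patients
tn = 150                   # healthy patients correctly flagged negative
fp = actual_negative - tn  # derived: healthy patients flagged positive

sensitivity = tp / (tp + fn)    # recall / true positive rate
specificity = tn / (tn + fp)    # true negative rate
precision = tp / (tp + fp)      # positive predictive value
accuracy = (tp + tn) / total
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"FP={fp} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"precision={precision:.3f} accuracy={accuracy:.3f} F1={f1:.3f}")
```

As a consistency check, the four cells add up to the 300 patients: 90 + 30 + 150 + 30.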