Mid-Semester Regular Data Mining QP v1 PDF
Q.1 (a) Identify which of the following activities are data mining activities. Justify your answer.
1) Computing the total sales of a company.
2) Predicting the outcomes of tossing a (fair) pair of dice.
3) Predicting the future stock price of a company using historical records.
4) Dividing the customers of a company according to their demographics.
Q.1 (b) Consider the following employee dataset, made available to you for carrying out a data
mining activity. What are four potential issues with this dataset?
Name  Age  DateOfJoining  Designation     DateOfBirth
A     34   15-Jan-2015    Sr Engineer     Feb 24, 1981
B     33   27-Jan-2015                    Mar 27, 1982
A     34   15-Jan-2015    Sr Engineer     Feb 24, 1981
C     32   30-Jan-2015    Staff Engineer  Nov 25,1982
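Some of the issues in a table like this can be surfaced mechanically. The minimal Python sketch below, run on a transcription of the four rows, flags exact duplicate records, empty fields, and date columns that do not share one layout. The names EMPLOYEES, DATE_LAYOUTS, date_layout and find_issues, and the two regular-expression date patterns, are illustrative assumptions. Semantic problems such as Age being derivable from DateOfBirth are not checked.

```python
import re
from collections import Counter

# Hypothetical transcription of the Q.1 (b) table; "" marks B's missing Designation.
COLUMNS = ("Name", "Age", "DateOfJoining", "Designation", "DateOfBirth")
EMPLOYEES = [
    ("A", "34", "15-Jan-2015", "Sr Engineer", "Feb 24, 1981"),
    ("B", "33", "27-Jan-2015", "", "Mar 27, 1982"),
    ("A", "34", "15-Jan-2015", "Sr Engineer", "Feb 24, 1981"),
    ("C", "32", "30-Jan-2015", "Staff Engineer", "Nov 25,1982"),
]

# Assumed patterns for the two date layouts that appear in the table.
DATE_LAYOUTS = {
    "DD-Mon-YYYY": re.compile(r"\d{1,2}-[A-Za-z]{3}-\d{4}"),
    "Mon DD, YYYY": re.compile(r"[A-Za-z]{3} \d{1,2}, ?\d{4}"),
}


def date_layout(value):
    """Name of the first layout matching the whole value, else 'unrecognised'."""
    for name, pattern in DATE_LAYOUTS.items():
        if pattern.fullmatch(value):
            return name
    return "unrecognised"


def find_issues(columns, rows):
    """Report duplicate records, missing values and mixed date layouts."""
    issues = []
    # Exact duplicates: the same value in every column.
    for row, count in Counter(rows).items():
        if count > 1:
            issues.append(f"duplicate record for {row[0]} ({count} copies)")
    # Empty or whitespace-only fields.
    for row in rows:
        for column, value in zip(columns, row):
            if not value.strip():
                issues.append(f"missing {column} for {row[0]}")
    # Date columns that do not all use a single layout.
    date_idx = [columns.index(c) for c in ("DateOfJoining", "DateOfBirth")]
    layouts = {date_layout(row[i]) for row in rows for i in date_idx}
    if len(layouts) > 1:
        issues.append("mixed date layouts: " + "; ".join(sorted(layouts)))
    return issues


for issue in find_issues(COLUMNS, EMPLOYEES):
    print(issue)
```

The checks are deliberately simple: exact-match duplicates miss near-duplicates (for example, the same person with a differently spelled name), which would need a similarity measure instead.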
Q.1 (c) Statistical inference can indirectly support the data preparation (pre-processing) phase.
In light of this, how can the following inference be used?
“A variable X is positively and uniformly correlated with another variable Y.”
Q.2. Consider the following ordered list of observations of a variable. Answer the following:
25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 41, 42, 42, 99
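Typical questions on a list like this concern central tendency, spread and outliers; which quantities are actually asked for is an assumption here. The minimal Python sketch below computes them with the standard statistics module. It assumes the 1.5 * IQR outlier rule and the “exclusive” quartile method (the default of statistics.quantiles); other textbook quartile conventions can give slightly different Q1 and Q3 values.

```python
import statistics

# Observations from Q.2 (already sorted).
DATA = [25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 41, 42, 42, 99]

mean = statistics.mean(DATA)
median = statistics.median(DATA)
mode = statistics.mode(DATA)

# Quartiles via the "exclusive" method (statistics.quantiles default).
q1, q2, q3 = statistics.quantiles(DATA, n=4)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in DATA if x < low_fence or x > high_fence]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"Q1={q1} Q3={q3} IQR={iqr} fences=({low_fence}, {high_fence})")
print(f"outliers={outliers}")
```

On this list the sketch gives a mean of about 39.07, a median and mode of 35, Q1 = 33 and Q3 = 41, and flags 99 as the only value outside the fences (21, 53). That single outlier is what pulls the mean above the median.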
Q.3 (a) The following dataset describes data collected on 14 patients and their diabetes status.
Considering a decision tree classifier for this binary classification problem, use entropy and
information gain to identify which attribute, Exercise or Blood pressure level, should be selected
at the root node.
Sl#  Exercise  Blood pressure level  Follow good diet?  Class
1    Yes       High                  No                 -ve
2    Yes       High                  Yes                +ve
3    No        High                  No                 +ve
4    Moderate  High                  No                 +ve
5    Moderate  Normal                No                 +ve
6    Moderate  Normal                Yes                -ve
7    No        Normal                Yes                +ve
8    Yes       High                  No                 -ve
9    Yes       Normal                No                 +ve
10   Moderate  Normal                No                 +ve
11   Yes       Normal                Yes                +ve
12   No        High                  Yes                +ve
13   No        Normal                No                 +ve
14   Moderate  High                  Yes                -ve
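The root-node choice can be checked numerically with H(S) = -Σ p_i log2 p_i and Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v), where S_v is the subset of rows with value v for attribute A. The minimal Python sketch below assumes the table is transcribed with class labels normalised to +ve/-ve. The names PATIENTS, entropy and information_gain are illustrative, and only the two candidate attributes named in the question are compared.

```python
from collections import Counter
from math import log2

# Transcribed rows: (Exercise, Blood pressure level, Follow good diet?, Class)
PATIENTS = [
    ("Yes", "High", "No", "-ve"),
    ("Yes", "High", "Yes", "+ve"),
    ("No", "High", "No", "+ve"),
    ("Moderate", "High", "No", "+ve"),
    ("Moderate", "Normal", "No", "+ve"),
    ("Moderate", "Normal", "Yes", "-ve"),
    ("No", "Normal", "Yes", "+ve"),
    ("Yes", "High", "No", "-ve"),
    ("Yes", "Normal", "No", "+ve"),
    ("Moderate", "Normal", "No", "+ve"),
    ("Yes", "Normal", "Yes", "+ve"),
    ("No", "High", "Yes", "+ve"),
    ("No", "Normal", "No", "+ve"),
    ("Moderate", "High", "Yes", "-ve"),
]
CANDIDATES = {"Exercise": 0, "Blood pressure level": 1}
CLASS_INDEX = 3


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index, class_index=CLASS_INDEX):
    """Parent entropy minus the size-weighted entropy of each attribute value."""
    parent = entropy([r[class_index] for r in rows])
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[class_index] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return parent - remainder


gains = {name: information_gain(PATIENTS, idx) for name, idx in CANDIDATES.items()}
root = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
print(f"Root attribute: {root}")
```

With 10 +ve and 4 -ve rows, H(S) is about 0.863. The sketch gives a gain of about 0.170 for Exercise and about 0.075 for Blood pressure level, so Exercise would be selected at the root under this computation.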
Q.3 (b) The following table shows the results of classification for a two-class problem, where ‘Y’
and ‘N’ are the two classes. Calculate the F-score of the classifier for class ‘Y’.
                 Predicted Y   Predicted N   Total
Actual Y         900           100           1000
Actual N         200           800           1000
Total            1100          900           2000
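Treating ‘Y’ as the positive class, the counts needed from the confusion matrix are TP = 900 (actual Y, predicted Y), FP = 200 (actual N, predicted Y) and FN = 100 (actual Y, predicted N). The minimal Python sketch below computes precision, recall and the balanced F-score (F1) from these counts. The function name f_score and its optional beta parameter are illustrative assumptions; beta = 1 gives the usual F-score.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score for the positive class; beta = 1 gives the usual F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Class 'Y' as positive, counts read from the confusion matrix.
tp = 900  # actual Y, predicted Y
fp = 200  # actual N, predicted Y
fn = 100  # actual Y, predicted N
print(f"precision = {tp / (tp + fp):.3f}")
print(f"recall    = {tp / (tp + fn):.3f}")
print(f"F1        = {f_score(tp, fp, fn):.3f}")
```

Here precision = 900/1100 (about 0.818) and recall = 900/1000 = 0.9, so F1 = 2 * 900 / (2 * 900 + 200 + 100) = 1800/2100, about 0.857.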