National Institute of Technology Rourkela

Department of Computer Science and Engineering


B.Tech. (7th Semester) Mid Semester Examination (September), 2017
Subject: Data Warehousing and Mining (CS 425)
Unnecessarily long answers may attract negative marks. The question paper has 2 pages.

Full Marks: 30 Time: 2 Hours

1. Modified Weighted k-NNC assigns a non-linear weight $e^{-d}$ to each nearest neighbor of an unseen object $u$,
where $d$ is the distance from the unseen object to that neighbor. The sorted distances from the unseen object to its
neighbors (nearest first) and their class labels are given. Identify the class label of the unseen object.
$\text{K-NN}(u) = \{x_1^{C_1}, x_2^{C_2}, x_3^{C_2}, x_4^{C_3}, x_5^{C_2}\}$, where the superscript denotes the class label.
Distance vector from $u$ to the neighbors $= (1, 4, 5, 7, 10)$. [3]
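A minimal Python sketch of the weighting rule as stated in Question 1: every neighbour votes for its own class with weight e^(−d), and the class with the largest total weight wins. The function name `weighted_knn` is our own illustration, not something defined in the question.

```python
import math
from collections import defaultdict


def weighted_knn(labels, distances):
    """Modified weighted k-NN: each neighbour adds e^(-d) to its class's score."""
    scores = defaultdict(float)
    for label, d in zip(labels, distances):
        scores[label] += math.exp(-d)
    # The predicted class is the one with the largest accumulated weight.
    return max(scores, key=scores.get), dict(scores)


# Neighbours of u in sorted order, with their class labels and distances.
labels = ["C1", "C2", "C2", "C3", "C2"]
distances = [1, 4, 5, 7, 10]
winner, scores = weighted_knn(labels, distances)
print(winner, scores)
```

Under this rule C1 scores e^(−1) ≈ 0.368, while C2 scores e^(−4) + e^(−5) + e^(−10) ≈ 0.025, so the single close C1 neighbour outweighs the three distant C2 neighbours that an unweighted majority vote would follow.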
2. A training dataset with two attributes X and Y and two classes "+" and "−" is given in Table 1. Each
attribute can take values from {0, 1, 2}. Answer the following questions.
(a) Build a decision tree on the training dataset.
(b) The concept for the "+" class is Y = 1 and the concept for the "−" class is X = 0 ∨ X = 2. Does your
decision tree capture this concept?
(c) What are the accuracy, precision, recall and F1-measure of the decision tree on the training set?
(d) What are the accuracy, precision, recall and F1-measure of the decision tree on the training set if the
following cost matrix is considered?
$$
C(i, j) =
\begin{cases}
0 & \text{if } i = j,\\
1 & \text{if } i = +,\ j = -,\\
\dfrac{\#\text{ instances of class } -}{\#\text{ instances of class } +} & \text{if } i = -,\ j = +.
\end{cases}
$$
[3 + 1 + 3 + 3]
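A minimal sketch of parts (a), (c) and (d), under several assumptions not fixed by the question: the tree is grown ID3-style on information gain (entropy) over the counts transcribed from Table 1 (which lists no (2, 2) row); a branch with no training instances inherits its parent's label; and leaves are labelled by majority when costs are equal. For (d) we assume C(i, j) is the cost of predicting class i when the true class is j, so a missed "+" costs #"−"/#"+" and a false "+" costs 1; if your convention is the reverse, swap `cost_fn` and `cost_fp`. The helpers `build`, `leaf_label`, `predict` and `evaluate` are hypothetical names.

```python
import math

# Table 1 transcribed as {(X, Y): (#"+", #"-")}; the pair (2, 2) does not appear in it.
TABLE1 = {
    (0, 0): (0, 100), (1, 0): (0, 0), (2, 0): (0, 100),
    (0, 1): (10, 100), (1, 1): (10, 0), (2, 1): (10, 100),
    (0, 2): (0, 100), (1, 2): (0, 0),
}
ATTR_INDEX = {"X": 0, "Y": 1}  # position of each attribute in an (X, Y) key
VALUES = (0, 1, 2)             # every attribute takes values from {0, 1, 2}


def counts(rows):
    """Total (#"+", #"-") over a list of (X, Y) keys."""
    return (sum(TABLE1[r][0] for r in rows), sum(TABLE1[r][1] for r in rows))


def entropy(pos, neg):
    total = pos + neg
    return -sum(n / total * math.log2(n / total) for n in (pos, neg) if n) if total else 0.0


def leaf_label(pos, neg, cost_fn=1.0, cost_fp=1.0):
    """Cheapest label: '-' costs pos*cost_fn (missed '+'), '+' costs neg*cost_fp."""
    return "+" if pos * cost_fn > neg * cost_fp else "-"


def build(rows, attrs, parent_label, cost_fn=1.0, cost_fp=1.0):
    """ID3-style tree: split on the attribute with the highest information gain."""
    pos, neg = counts(rows)
    if pos + neg == 0:
        return parent_label  # empty branch inherits its parent's label
    here = leaf_label(pos, neg, cost_fn, cost_fp)
    if pos == 0 or neg == 0 or not attrs:
        return here

    def split(attr):
        i = ATTR_INDEX[attr]
        return {v: [r for r in rows if r[i] == v] for v in VALUES}

    def gain(attr):
        rem = sum(sum(counts(g)) / (pos + neg) * entropy(*counts(g))
                  for g in split(attr).values())
        return entropy(pos, neg) - rem

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    return (best, {v: build(g, rest, here, cost_fn, cost_fp)
                   for v, g in split(best).items()})


def predict(tree, x, y):
    """Walk from the root until a leaf label (a string) is reached."""
    while not isinstance(tree, str):
        attr, children = tree
        tree = children[(x, y)[ATTR_INDEX[attr]]]
    return tree


def evaluate(tree):
    """Accuracy, precision, recall and F1 for class '+' on the training counts."""
    tp = fp = fn = tn = 0
    for (x, y), (pos, neg) in TABLE1.items():
        if predict(tree, x, y) == "+":
            tp, fp = tp + pos, fp + neg
        else:
            fn, tn = fn + pos, tn + neg
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1


if __name__ == "__main__":
    keys = list(TABLE1)
    plain = build(keys, ["X", "Y"], "-")
    print("(a)", plain)
    print("(c)", evaluate(plain))
    # (d) Assumed reading: a missed "+" costs #"-"/#"+", a false "+" costs 1.
    n_pos, n_neg = counts(keys)
    costly = build(keys, ["X", "Y"], "-", cost_fn=n_neg / n_pos, cost_fp=1.0)
    print("(d)", evaluate(costly))
```

With equal costs the root splits on X and the tree predicts "+" only for X = 1, so it does not capture the Y = 1 concept (accuracy 510/530, precision 1, recall 1/3, F1 0.5). Under the assumed costs the leaves for (0, 1) and (2, 1) flip to "+", giving recall 1 at the price of precision 30/230.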

3. Consider the dataset given in Table 1 and predict the class label of an unknown instance with X = 2, Y = 2
using a k-NN classifier (K = 111). [4]
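A minimal sketch for Question 3, assuming Euclidean distance and treating each count in Table 1 as that many individual training instances; instances at equal distance are taken in table order (with K = 111 the cut falls inside the all-"+" group at (1, 1), so the tie rule does not affect the result). The helper `knn_predict` is a hypothetical name.

```python
import math

# Table 1 transcribed as {(X, Y): (#"+", #"-")}.
TABLE1 = {
    (0, 0): (0, 100), (1, 0): (0, 0), (2, 0): (0, 100),
    (0, 1): (10, 100), (1, 1): (10, 0), (2, 1): (10, 100),
    (0, 2): (0, 100), (1, 2): (0, 0),
}


def knn_predict(query, k):
    """Unweighted k-NN over the individual instances implied by the counts."""
    instances = []
    for (x, y), (pos, neg) in TABLE1.items():
        d = math.dist(query, (x, y))  # assumption: Euclidean distance
        instances += [(d, "+")] * pos + [(d, "-")] * neg
    instances.sort(key=lambda t: t[0])  # stable sort keeps table order for ties
    votes = {"+": 0, "-": 0}
    for _, label in instances[:k]:
        votes[label] += 1
    return max(votes, key=votes.get), votes


label, votes = knn_predict((2, 2), 111)
print(label, votes)
```

The 110 instances at (2, 1) lie at distance 1 and the 111th neighbour comes from (1, 1) at distance √2, giving 11 "+" votes against 100 "−" votes, so the sketch predicts "−".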
4. Consider the dataset given in Table 2 (overleaf) and predict the class label of an unknown object
X = (Yes, Single, Low) using a Naive Bayes classifier. [5]
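A minimal sketch of an unsmoothed (maximum-likelihood) Naive Bayes classifier on the rows of Table 2: the score of class c is P(c) multiplied by P(x_i | c) for each attribute, all estimated from raw counts. `naive_bayes` is a hypothetical helper name.

```python
from collections import Counter

# Table 2 rows: (Home Owner, Marital Status, Income, Defaulter class).
TABLE2 = [
    ("Yes", "Single", "High", "No"),
    ("No", "Married", "Medium", "No"),
    ("No", "Single", "Low", "No"),
    ("Yes", "Married", "High", "No"),
    ("No", "Divorced", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("Yes", "Divorced", "High", "No"),
    ("No", "Single", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("No", "Single", "Low", "Yes"),
]


def naive_bayes(rows, x):
    """Unsmoothed Naive Bayes: score(c) = P(c) * prod_i P(x_i | c) from raw counts."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(rows)  # prior P(c)
        for i, value in enumerate(x):
            n_match = sum(1 for r in rows if r[-1] == c and r[i] == value)
            score *= n_match / n_c  # likelihood P(x_i | c)
        scores[c] = score
    return max(scores, key=scores.get), scores


label, scores = naive_bayes(TABLE2, ("Yes", "Single", "Low"))
print(label, scores)
```

The "Yes" score is exactly 0 because no defaulter in Table 2 owns a home. With Laplace smoothing both scores become non-zero, but "No" still wins (about 0.037 versus 0.020).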
5. What is model overfitting? How do you estimate the generalization error of a decision tree? [4]
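One common way to estimate generalization error from the training data alone is the pessimistic estimate, which adds a complexity penalty Ω (often 0.5) per leaf to the training errors: e'(T) = (e(T) + Ω·k) / N for a tree with k leaves over N training records. A held-out validation set is the other usual approach. The sketch below assumes the per-leaf penalty form; the tree sizes in the example are hypothetical.

```python
def pessimistic_error(train_errors, n_leaves, n_records, penalty=0.5):
    """Pessimistic generalization-error estimate: (e(T) + penalty * leaves) / N."""
    return (train_errors + penalty * n_leaves) / n_records


# Hypothetical trees on N = 24 records: a larger tree with fewer training errors
# versus a smaller tree with more training errors.
big = pessimistic_error(train_errors=4, n_leaves=7, n_records=24)
small = pessimistic_error(train_errors=6, n_leaves=4, n_records=24)
print(f"big tree: {big:.4f}, small tree: {small:.4f}")
```

The penalty term is what lets the estimate disagree with the raw training error: a tree that fits the training set better is only preferred if the reduction in errors outweighs the cost of its extra leaves.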
6. Apply a Random Forest with T = 3 trees. Each tree is built on a single attribute from a bootstrap sample of
5 instances. Identify the class label of X = (Yes, Single, Low) using Table 2. [4]
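A minimal sketch of the forest's mechanics, under assumptions the question leaves open: tree t is built on attribute t (Home Owner, Marital Status, Income in turn); each tree is a one-level split that maps every attribute value to its majority class in the bootstrap sample, falling back to the sample's overall majority for unseen values; ties go to the class encountered first; and bootstrap samples are drawn with replacement using a fixed seed. Because the samples are random, the predicted label depends on the seed. `train_stump` and `random_forest` are hypothetical names.

```python
import random
from collections import Counter

# Table 2 rows: (Home Owner, Marital Status, Income, Defaulter class).
TABLE2 = [
    ("Yes", "Single", "High", "No"),
    ("No", "Married", "Medium", "No"),
    ("No", "Single", "Low", "No"),
    ("Yes", "Married", "High", "No"),
    ("No", "Divorced", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("Yes", "Divorced", "High", "No"),
    ("No", "Single", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("No", "Single", "Low", "Yes"),
]
N_ATTRS = 3  # Home Owner, Marital Status, Income


def train_stump(sample, attr):
    """One-attribute tree: each value of `attr` predicts its majority class in the sample."""
    fallback = Counter(r[-1] for r in sample).most_common(1)[0][0]
    per_value = {}
    for r in sample:
        per_value.setdefault(r[attr], Counter())[r[-1]] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in per_value.items()}
    # Values never seen in the bootstrap sample fall back to the sample's majority class.
    return lambda x: rule.get(x[attr], fallback)


def random_forest(rows, x, n_trees=3, sample_size=5, seed=0):
    rng = random.Random(seed)
    votes = Counter()
    for t in range(n_trees):
        sample = [rng.choice(rows) for _ in range(sample_size)]  # bootstrap: with replacement
        attr = t % N_ATTRS  # assumption: tree t is built on attribute t
        votes[train_stump(sample, attr)(x)] += 1
    return votes.most_common(1)[0][0], votes


label, votes = random_forest(TABLE2, ("Yes", "Single", "Low"))
print(label, dict(votes))
```

With three trees and two classes the final majority vote can never tie, so only ties inside an individual stump depend on the first-encountered rule.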

Table 1: Training data for Question No. 2 and Question No. 3


X   Y   #Instances (+)   #Instances (−)
0   0   0                100
1   0   0                0
2   0   0                100
0   1   10               100
1   1   10               0
2   1   10               100
0   2   0                100
1   2   0                0

Table 2: Dataset for Question No. 4 and Question No. 6


Tid   Home Owner   Marital Status   Income   Defaulter (Class)
1 Yes Single High No
2 No Married Medium No
3 No Single Low No
4 Yes Married High No
5 No Divorced Low Yes
6 No Married Low No
7 Yes Divorced High No
8 No Single Low Yes
9 No Married Low No
10 No Single Low Yes
