National Institute of Technology Rourkela: Department of Computer Science and Engineering

The document contains 6 questions on data mining techniques. Question 1 asks for the class label of an unseen data point using a modified weighted k-nearest neighbors classifier. Question 2 involves building a decision tree on sample data and calculating various evaluation metrics. Question 3 asks for the class of an unknown data point using k-nearest neighbors. Questions 4 and 6 involve predicting the class of an unknown data point using Naive Bayes and Random Forest classifiers, respectively, on another sample dataset. Question 5 asks for a definition of model overfitting and for how to estimate the generalization error of a decision tree.

Uploaded by

Raj Gupta

National Institute of Technology Rourkela

Department of Computer Science and Engineering

B.Tech. (7th Semester) Mid Semester Examination (September), 2017
Subject: Data Warehousing and Mining (CS 425)
Unnecessarily long answers may attract negative marks. This is a 2-page question paper.

Full Marks: 30 Time: 2 Hours

1. The Modified Weighted k-NN classifier (k-NNC) assigns a non-linear weight $e^{-d}$ to each nearest neighbor of an unseen object $u$,
where $d$ is the distance from the unseen object to that neighbor. The sorted distances from the unseen object to its
neighbors (first to last) and the neighbors' class labels are given. Identify the class label of the unseen object.
K-NN(u) = { $x_1^{C_1}, x_2^{C_2}, x_3^{C_2}, x_4^{C_3}, x_5^{C_2}$ }. The superscript represents the class label.
Distance vector from u to the neighbors = (1, 4, 5, 7, 10) [3]
2. A training dataset is given in Table 1 with two attributes X and Y, and two classes "+" and "−". Each
attribute can take values from {0, 1, 2}. Answer the following questions.
(a) Build a decision tree on the training dataset.
(b) The concept for the "+" class is Y = 1 and the concept for the "−" class is X = 0 ∨ X = 2. Does your
decision tree capture this concept?
(c) What are the accuracy, precision, recall and F1-measure of the decision tree on the training set?
(d) What are the accuracy, precision, recall and F1-measure of the decision tree on the training set if
the following cost matrix is considered?


$$
C(i, j) =
\begin{cases}
0 & \text{if } i = j; \\
1 & \text{if } i = +,\ j = -; \\
\dfrac{\#\,\text{"−" instances}}{\#\,\text{"+" instances}} & \text{if } i = -,\ j = +.
\end{cases}
$$
[3 + 1 + 3 + 3]

3. Consider the dataset given in Table 1 and predict the class label of an unknown instance with X = 2, Y = 2
using a KNN classifier (K = 111). [4]
4. Consider the dataset given in Table 2 (overleaf) and predict the class label of an unknown object X =
(Yes, Single, Low) using a Naive Bayes classifier. [5]
5. What is model over-fitting? How do you estimate the generalization error of a decision tree? [4]
6. Apply Random Forest with T = 3. Each tree is built with one attribute, from a bootstrap sample of
5 instances. Identify the class label of X = (Yes, Single, Low) (Table 2). [4]
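
The weighting rule described in Question 1 can be illustrated with a short sketch. It assumes the class score is the sum of the weights $e^{-d}$ of the neighbors in that class and that the highest score wins; the paper states the weight but not the aggregation step, so that part is an assumption. The names `neighbors` and `weighted_vote` are illustrative.

```python
import math
from collections import defaultdict

# Neighbors of u from Question 1 as (class label, distance) pairs.
neighbors = [("C1", 1), ("C2", 4), ("C2", 5), ("C3", 7), ("C2", 10)]

def weighted_vote(neighbors):
    """Sum the weight e^(-d) per class; return (winning label, per-class totals)."""
    totals = defaultdict(float)
    for label, d in neighbors:
        totals[label] += math.exp(-d)  # non-linear weight from the question
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

label, totals = weighted_vote(neighbors)
print(label, totals)
```

Under this rule the single neighbor at distance 1 contributes about 0.368, more than the three C2 neighbors together (about 0.025), so the weighted vote differs from a plain majority vote.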

Table 1: Training data for Question No. 2 and Question No. 3

X   Y   #Instances
        +     -
0   0   0     100
1   0   0     0
2   0   0     100
0   1   10    100
1   1   10    0
2   1   10    100
0   2   0     100
1   2   0     0
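
The metrics asked for in Question 2(c) can be computed from per-row counts, as in the sketch below. It uses only the rows of Table 1 as listed above. The `predict` function is a hypothetical stand-in for whatever tree is built in part (a), not the expected answer, and `evaluate` is an illustrative helper name.

```python
# Rows of Table 1 as listed: (X, Y, count of "+", count of "-").
table1 = [
    (0, 0, 0, 100), (1, 0, 0, 0), (2, 0, 0, 100),
    (0, 1, 10, 100), (1, 1, 10, 0), (2, 1, 10, 100),
    (0, 2, 0, 100), (1, 2, 0, 0),
]

def predict(x, y):
    # Hypothetical tree, for illustration only: predict "+" when X = 1.
    return "+" if x == 1 else "-"

def evaluate(rows, predict):
    """Return accuracy, precision, recall and F1 with "+" as the positive class."""
    tp = fp = fn = tn = 0
    for x, y, n_pos, n_neg in rows:
        if predict(x, y) == "+":
            tp += n_pos   # "+" instances predicted "+"
            fp += n_neg   # "-" instances predicted "+"
        else:
            fn += n_pos   # "+" instances predicted "-"
            tn += n_neg   # "-" instances predicted "-"
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

accuracy, precision, recall, f1 = evaluate(table1, predict)
print(accuracy, precision, recall, f1)
```

Swapping in a different `predict` function gives the same four metrics for any other candidate tree.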

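For Question 3, a k-nearest-neighbor vote over weighted rows can be sketched as follows. The sketch assumes Euclidean distance and breaks distance ties by row order; the paper specifies neither. It uses only the rows of Table 1 as listed above, and `knn_predict` is an illustrative name.

```python
import math

# Rows of Table 1 as listed: (X, Y, count of "+", count of "-").
table1 = [
    (0, 0, 0, 100), (1, 0, 0, 0), (2, 0, 0, 100),
    (0, 1, 10, 100), (1, 1, 10, 0), (2, 1, 10, 100),
    (0, 2, 0, 100), (1, 2, 0, 0),
]

def knn_predict(rows, query, k):
    """Majority vote among the k nearest training instances."""
    # One group per (point, class) with its distance and instance count.
    groups = []
    for x, y, n_pos, n_neg in rows:
        d = math.dist((x, y), query)  # Euclidean distance (assumption)
        if n_pos:
            groups.append((d, "+", n_pos))
        if n_neg:
            groups.append((d, "-", n_neg))
    groups.sort(key=lambda g: g[0])  # stable sort: ties keep row order

    votes = {"+": 0, "-": 0}
    remaining = k
    for d, label, count in groups:
        take = min(count, remaining)  # take instances until k are used
        votes[label] += take
        remaining -= take
        if remaining == 0:
            break
    return max(votes, key=votes.get), votes

label, votes = knn_predict(table1, (2, 2), k=111)
print(label, votes)
```

With the rows as listed, the 111 nearest instances are the 110 instances at (2, 1) plus one of the "+" instances at (1, 1), so tie-breaking does not affect this particular query.
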
Table 2: Dataset: Question 4 and Question 6

Tid  Home Owner  Marital Status  Income  Defaulter (Class)
1    Yes         Single          High    No
2    No          Married         Medium  No
3    No          Single          Low     No
4    Yes         Married         High    No
5    No          Divorced        Low     Yes
6    No          Married         Low     No
7    Yes         Divorced        High    No
8    No          Single          Low     Yes
9    No          Married         Low     No
10   No          Single          Low     Yes
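
For Question 4, the Naive Bayes score of each class is the class prior multiplied by the per-attribute conditional frequencies. The sketch below assumes plain frequency estimates with no Laplace smoothing, which the paper does not specify; `naive_bayes_scores` is an illustrative name. Exact fractions are used so the intermediate values can be checked by hand.

```python
from fractions import Fraction

# Table 2 rows: (home owner, marital status, income, defaulter class).
table2 = [
    ("Yes", "Single", "High", "No"),
    ("No", "Married", "Medium", "No"),
    ("No", "Single", "Low", "No"),
    ("Yes", "Married", "High", "No"),
    ("No", "Divorced", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("Yes", "Divorced", "High", "No"),
    ("No", "Single", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("No", "Single", "Low", "Yes"),
]

def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(attribute = value | class) per class."""
    scores = {}
    for cls in sorted({row[-1] for row in rows}):
        members = [row for row in rows if row[-1] == cls]
        score = Fraction(len(members), len(rows))  # class prior
        for i, value in enumerate(query):
            matches = sum(1 for row in members if row[i] == value)
            score *= Fraction(matches, len(members))  # unsmoothed estimate
        scores[cls] = score
    return scores

scores = naive_bayes_scores(table2, ("Yes", "Single", "Low"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Because no defaulter in Table 2 is a home owner, the unsmoothed score for class "Yes" is zero; a smoothed estimate would give a small non-zero value instead.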

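For Question 6, one possible reading of the setup is sketched below: each of the T = 3 trees is a one-attribute stump trained on a bootstrap sample of 5 rows, and the forest takes a majority vote. Assigning attribute t to tree t, the fixed seed, and the fallback to the sample's majority class for unseen values are all assumptions. Since a bootstrap sample is random, a hand-worked answer depends on the samples drawn.

```python
import random
from collections import Counter

# Table 2 rows: (home owner, marital status, income, defaulter class).
table2 = [
    ("Yes", "Single", "High", "No"),
    ("No", "Married", "Medium", "No"),
    ("No", "Single", "Low", "No"),
    ("Yes", "Married", "High", "No"),
    ("No", "Divorced", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("Yes", "Divorced", "High", "No"),
    ("No", "Single", "Low", "Yes"),
    ("No", "Married", "Low", "No"),
    ("No", "Single", "Low", "Yes"),
]

def train_stump(sample, attr):
    """One-attribute tree: map each attribute value to its majority class."""
    by_value = {}
    for row in sample:
        by_value.setdefault(row[attr], Counter())[row[-1]] += 1
    # Fallback for values absent from the sample (assumption).
    default = Counter(row[-1] for row in sample).most_common(1)[0][0]
    rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return lambda x: rules.get(x[attr], default)

def random_forest_predict(rows, query, n_trees=3, sample_size=5, seed=0):
    rng = random.Random(seed)  # fixed seed only for repeatability
    votes = []
    for t in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        sample = [rng.choice(rows) for _ in range(sample_size)]
        stump = train_stump(sample, attr=t % 3)  # tree t uses attribute t
        votes.append(stump(query))
    return Counter(votes).most_common(1)[0][0], votes

label, votes = random_forest_predict(table2, ("Yes", "Single", "Low"))
print(label, votes)
```

Changing `seed` changes the bootstrap samples and may change the individual votes.
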