
Data Mining

Sample Midterm Questions (Last Modified 2/17/19)

Please note that the purpose here is to give you an idea about the level of detail of the questions on the
midterm exam. These sample questions are not meant to be exhaustive and you may certainly find
topics on the midterm that are not covered here at all. Your midterm will include more questions than
this.

1. Sometimes a data set is partitioned such that a validation set is provided. What is the purpose
of the validation set?

2. Are decision trees easy to interpret (circle one): Yes No

3. How can you convert a decision tree into a rule set? Explain the process.
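(Not part of the exam: as a toy illustration of the conversion, each root-to-leaf path becomes one IF-THEN rule, with the tests along the path ANDed together. The tree below and its attribute names are hypothetical.)

```python
# Toy decision tree: an internal node is (attribute, {value: subtree});
# a plain string is a leaf holding the class label. Structure is hypothetical.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one IF-THEN rule."""
    if isinstance(node, str):                       # leaf: emit one rule
        body = " AND ".join(f"{a}={v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class={node}"]
    attr, branches = node
    rules = []
    for value, subtree in branches.items():         # walk every branch
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)  # e.g. IF outlook=sunny AND humidity=high THEN class=no
```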

4. List two reasons why data mining is popular now but wasn't as popular 20 years ago.

5. How does an ordinal feature differ from a nominal feature? Explain in one or two sentences.

6. Sally measures the pressure of all of the tires coming into her garage for an oil change and
records the values. Unknown to her, her tire gauge is miscalibrated and adds 3 psi to each
reading. According to the definition of noise used by our textbook, is the error introduced by
the tire gauge considered noise? Answer "yes" or "no" and justify your answer.
Name: _______________________________

7. For a two-class classification problem, with a Positive class P and a negative class N, we can
describe the performance of the algorithm using the following terms: TP, FP, TN, and FN.

a) What do each of these terms refer to?

TP:

TN:

FP:

FN:

b) Place the 4 terms listed above in part a into the appropriate slots in the table below.

Predicted

Positive Negative

Positive
Actual
Negative

c) Provide the formula for accuracy in terms of TP, TN, FP, and FN.

d) Provide the formula for precision and recall using TP, TN, FP, and FN.

Precision =

Recall =
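(Not part of the exam: a minimal self-check sketch of the metrics in parts (c) and (d); the counts used below are made up.)

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard two-class metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # of predicted positives, fraction correct
    recall    = tp / (tp + fn)   # of actual positives, fraction found
    return accuracy, precision, recall

# Hypothetical counts for illustration:
acc, prec, rec = classification_metrics(tp=40, tn=50, fp=10, fn=0)
print(acc, prec, rec)  # 0.9 0.8 1.0
```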

8. If we build a classifier and evaluate it on the training set and the test set:

a) Which data set would we expect to have the higher accuracy: training set test set

b) Which data set provides the better estimate of accuracy on new data: training set test set

9. A learning curve shows the performance of a classifier as the training set size increases.
Assume that training set size is plotted on the x-axis and accuracy is plotted on the y axis.

a) On the figure below, plot a typical/expected learning curve when the accuracy is measured
on the 1) training set data and 2) the test set data (i.e., draw two curves). Should there be
any difference? If so, comment on the expected difference.
[Figure: empty plot axes; x-axis = training set size, y-axis = accuracy]

10. You need to split on attribute a1 in your decision tree. The attribute has 8 values. Why might a
two-way split be better than an 8-way split? What might be a problem with the 8-way split?

Entropy(t) = − Σ_j p(j|t) log p(j|t)        GINI(t) = 1 − Σ_j [p(j|t)]²

11. Given a training set with 5+ and 10- examples,

a) What is the entropy value associated with this data set? You need not simplify your
answer to get a numerical answer.

b) What is the Gini value associated with this data set? In this case you should simplify your
result, although you may express the answer as a fraction rather than a decimal.

c) If you generated a decision tree with just the root node for the examples in this data set,
what class value would you assign and what would be the training-set error rate associated
with this (very short) decision tree?

12. The nearest neighbor algorithm relies on having a good notion of similarity, or distance. In
class we discussed several factors that can make it non-trivial to define a good similarity
metric. What were two of those factors?

13. Which classifier induction algorithm can generate the most expressive classifiers, in terms
of the decision boundaries that can be formed? Which is the least expressive? Rank order
the algorithms from most to least expressive, and briefly justify your ordering.

The induction algorithms are: decision trees, linear classifiers, and nearest neighbor.

14. What is the curse of dimensionality?

15. What does it mean if the rule set for a rule learner is exhaustive?

16. Does the Ripper Rule Learner build rules from general to specific or specific to general?
