
INSY 5339 - Data Mining Exam #2 Review

The document provides a review of topics to study for Exam #2 in an INSY 5339 Data Mining course. Students should know methods of evaluation like holdout validation and cross validation. They should understand techniques for handling class imbalance, such as over-sampling and under-sampling. Additionally, students should be familiar with performance metrics, data preparation methods, data representation formats, estimating generalization error, and data mining methods including decision trees and association rules.

Uploaded by

LaluMohan Kc

INSY 5339 – Data Mining

Exam #2 Review

• Know the following methods of estimation: holdout, random subsampling, cross-validation, and stratified sampling (see the lecture notes on Model Evaluation).
• Know what the following methods for treating the class imbalance problem do: over-sampling, under-sampling, and SMOTE.
• Know the different metrics for performance evaluation: accuracy, cost, precision, recall, F-measure, etc. (see the lecture notes on Model Evaluation).
• Know the following data preparation methods: sampling and discretization (equal frequency vs. equal width).
• Know the following data representation formats: decision tables, decision trees, classification rules, and regression equations.
• Know the methods for estimating generalization error: optimistic vs. pessimistic (see the example in the lecture notes on Decision Trees).
• Know the following data mining methods: ZeroR, OneR (see the example in the lecture notes on Basic Classification Methods), Prism (see the example in lecture 7 on Classification Methods II), instance-based learning (k-NN; see the sample 1-NN problem provided as a separate attachment), Decision Tables, Linear Regression, Association Rules (you should be able to find the association rules that meet a given confidence level; see the example in the lecture notes on Intro to Machine Learning), and Decision Trees (using either the entropy measure or the Gini coefficient; see the sample DTL problem provided as a separate attachment).
• Know how to construct the ROC curve given the posterior probabilities of predicting the class (see the example in the lecture notes on Model Evaluation).
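
For the estimation methods in the first item, the following is a minimal sketch of stratified k-fold cross-validation. It is not taken from the lecture notes: the function names and the nine-row label list are hypothetical, and dealing each class round-robin across folds is just one simple way to keep class proportions similar.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each row index to one of k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the rows of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def cross_validation_splits(labels, k):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = stratified_folds(labels, k)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

# Hypothetical labels: 6 "yes" rows followed by 3 "no" rows.
labels = ["yes"] * 6 + ["no"] * 3
for train, test in cross_validation_splits(labels, 3):
    print("test rows:", test, "classes:", [labels[i] for i in test])
```

With these labels every test fold holds two "yes" rows and one "no" row, matching the 2:1 ratio of the full data. Holdout is the special case of a single train/test split, and random subsampling repeats holdout with different random splits.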
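
For the class imbalance item, the sketch below contrasts the three treatments. It is a simplified illustration under stated assumptions: the data points are made up, the function names are hypothetical, and the `smote` function only shows the core idea (interpolating between a minority point and one of its k nearest minority neighbours), not a full implementation.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority rows (sampling without replacement)."""
    return rng.sample(majority, target_size)

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points on segments between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]                  # 4 rows
majority = [(float(x), float(y)) for x in range(5, 9) for y in range(5, 9)]  # 16 rows

oversampled = random_oversample(minority, len(majority), rng)
undersampled = random_undersample(majority, len(minority), rng)
synthetic = smote(minority, len(majority) - len(minority), 2, rng)
print(len(oversampled), len(undersampled), len(minority) + len(synthetic))
```

Over-sampling repeats existing minority rows, under-sampling discards majority rows, and SMOTE adds new minority rows that lie between existing ones rather than exact copies.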
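
For the performance metrics item, here is a short sketch that computes the listed metrics from confusion-matrix counts. The counts and the misclassification costs are hypothetical values chosen for illustration, and the F-measure shown is the balanced F1 form.

```python
def evaluation_metrics(tp, fp, fn, tn, cost_fp=1.0, cost_fn=1.0):
    """Compute common metrics from confusion-matrix counts.

    cost_fp and cost_fn are assumed per-error costs; correct predictions cost 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "cost": fp * cost_fp + fn * cost_fn,
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives; a missed positive costs 5x a false alarm.
metrics = evaluation_metrics(tp=40, fp=10, fn=20, tn=30, cost_fp=1.0, cost_fn=5.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here accuracy is 0.7, precision is 0.8, and recall is about 0.667, which shows how a single accuracy figure can hide the gap between the two error types.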
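
For the discretization item, the sketch below contrasts equal-width and equal-frequency binning on a small, made-up list of values. The function names are hypothetical, and the equal-frequency version assigns bins by rank, which is only one of several ways to handle ties.

```python
def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes the values are not all identical
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Give each bin (roughly) the same number of values, based on rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

values = [0, 4, 12, 16, 16, 18, 24, 26, 28]  # hypothetical attribute values
print(equal_width_bins(values, 3))      # [0, 0, 1, 1, 1, 1, 2, 2, 2]
print(equal_frequency_bins(values, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width bins can end up with very different counts (2, 4, and 3 values here), while equal-frequency bins hold 3 values each but cover intervals of different widths.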
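
For the generalization error item, the sketch below assumes the commonly taught form of the two estimates: the optimistic estimate takes the training error as is, and the pessimistic estimate adds a fixed penalty (often 0.5) per leaf node. The lecture notes may define them differently, so treat the formula, the penalty value, and the numbers as assumptions.

```python
def optimistic_error(training_errors, n_instances):
    """Optimistic estimate: generalization error equals training error."""
    return training_errors / n_instances

def pessimistic_error(training_errors, n_leaves, n_instances, penalty=0.5):
    """Pessimistic estimate: add an assumed penalty for every leaf of the tree."""
    return (training_errors + penalty * n_leaves) / n_instances

# Hypothetical tree: 30 leaves, 10 training errors on 1000 training instances.
print(optimistic_error(10, 1000))       # 1.0% training error
print(pessimistic_error(10, 30, 1000))  # (10 + 30 * 0.5) / 1000 = 2.5%
```

Under this form, a larger tree must reduce training errors by more than its added leaf penalty to count as an improvement.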
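
For ZeroR and OneR in the data mining methods item, the following sketch runs both on eight made-up rows. The dataset, attribute names, and function names are hypothetical; ties between attributes are broken by keeping the first attribute examined.

```python
from collections import Counter, defaultdict

def zero_r(rows, target):
    """ZeroR: ignore all attributes and always predict the majority class."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]

def one_r(rows, attributes, target):
    """OneR: build one rule set per attribute and keep the one with fewest errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

raw = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("overcast", "true", "yes"),
    ("rainy", "false", "yes"), ("rainy", "true", "no"),
    ("rainy", "false", "yes"), ("sunny", "true", "yes"),
]
rows = [dict(zip(("outlook", "windy", "play"), r)) for r in raw]

print(zero_r(rows, "play"))
attr, rule, errors = one_r(rows, ["outlook", "windy"], "play")
print(attr, rule, errors)
```

On this data ZeroR predicts "yes" (5 of 8 rows), and OneR picks `outlook` with 2 training errors versus 3 for `windy`.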
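
For the instance-based (k-NN) method, here is a minimal sketch using Euclidean distance and majority voting. The training points and queries are invented for illustration and are not the sample 1-NN problem from the attachment.

```python
import math
from collections import Counter

def knn_predict(training, query, k=1):
    """Predict the majority class among the k training points nearest to query."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (point, class label).
training = [((1, 1), "A"), ((2, 2), "A"), ((6, 5), "B"), ((7, 7), "B")]

print(knn_predict(training, (3, 3), k=1))  # nearest neighbour is (2, 2)
print(knn_predict(training, (5, 5), k=3))  # neighbours: (6, 5), (7, 7), (2, 2)
```

With k=1 the prediction simply copies the label of the single closest training point; in practice, attributes are usually normalized first so that no single attribute dominates the distance.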
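
For the association rules method, the sketch below computes support and confidence and lists the rules from one itemset that meet a minimum confidence. The transactions and function names are hypothetical; confidence of A → B is computed as count(A and B) / count(A).

```python
from itertools import combinations

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

def rules_from_itemset(itemset, transactions, min_conf):
    """All rules antecedent -> consequent from itemset with confidence >= min_conf."""
    rules = []
    for size in range(1, len(itemset)):
        for items in combinations(sorted(itemset), size):
            antecedent = frozenset(items)
            consequent = itemset - antecedent
            conf = confidence(antecedent, consequent, transactions)
            if conf >= min_conf:
                rules.append((set(antecedent), set(consequent), conf))
    return rules

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

for lhs, rhs, conf in rules_from_itemset(frozenset({"bread", "milk"}), transactions, 0.7):
    print(lhs, "->", rhs, f"confidence={conf:.2f}")
```

Both bread → milk and milk → bread reach a confidence of 0.75 here, so both pass a 0.7 threshold and neither passes 0.8.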
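
For decision trees, the sketch below computes the entropy and Gini impurity of a node and the gain of a candidate split. The class counts are hypothetical (9 positive and 5 negative rows split three ways), and `split_gain` is an illustrative helper name rather than anything from the lecture notes.

```python
import math

def entropy(counts):
    """Entropy in bits of a node with the given class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini impurity of a node with the given class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = [9, 5]                      # hypothetical [yes, no] counts at the node
children = [[2, 3], [4, 0], [3, 2]]  # counts in each branch after the split

print(f"entropy={entropy(parent):.3f}  gini={gini(parent):.3f}")
print(f"information gain={split_gain(parent, children, entropy):.3f}")
print(f"gini reduction={split_gain(parent, children, gini):.3f}")
```

The tree-growing step repeats this calculation for every candidate attribute and splits on the one with the largest gain.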
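
For the ROC curve item, one way to build the curve is to sort the instances by posterior probability of the positive class, lower the threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The scores and labels below are made up, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points; labels are 1 for positive and 0 for negative."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all instances sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical P(positive | x)
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.3f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a curve that rises quickly toward the top-left corner indicates a ranking that places positives ahead of negatives.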
