Classification 2

This document discusses various classification methods for data mining including decision trees, Bayesian classification, support vector machines, and neural networks. It also covers evaluating classification models using metrics like accuracy and error, as well as model selection techniques such as ROC curves and t-tests.

Classification

Data Mining:
Data Mining Methods
with Dr. Qin Lv
Learning objective: Apply techniques
for classification and explain how they
work. Evaluate and compare methods.
Classification
Ø Supervised learning
• Training set with predefined class labels
Ø Decision tree induction
• Top-down, recursive, attribute selection & split
Ø Bayesian classification
• Probability, naïve assumption, belief network
Support Vector Machines (SVM)
Ø Objects w/ class label
• (X1, y1), …, (Xn, yn)
Ø Separating hyperplane
• Maximum margin
• Maximum margin hyperplane
• Support vectors
SVM: Linear Separability
Ø Linearly separable
• Original space
Ø Linearly inseparable
• => map to a higher-dimensional space
• Dot product on transformed data is mathematically
equivalent to applying a kernel function to the original data (the kernel trick)
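The kernel equivalence above can be checked numerically. This is a minimal sketch (not from the lecture) using the degree-2 polynomial kernel, whose explicit feature map for 2-D points is known in closed form:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel applied directly in the original space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Dot product in the transformed (higher-dimensional) space ...
lhs = np.dot(phi(x), phi(z))
# ... equals the kernel evaluated on the original data.
rhs = poly_kernel(x, z)
print(lhs, rhs)  # both 16.0
```

The kernel never materializes the higher-dimensional vectors, which is what makes SVMs practical in transformed spaces.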
Neural Network
Ø Input, hidden, output layers
• #layers, #units/layer
Ø Weighted connections
• Initial weights, adjustments
Ø Feedforward
• Observations => classification
Ø Backpropagation
• Classification error => weight adjustment
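The feedforward and backpropagation steps can be sketched with a tiny one-hidden-layer network. Layer sizes, learning rate, epoch count, and the XOR toy data are illustrative choices, not from the lecture:

```python
import numpy as np

# Toy network: 2 input units, 4 hidden units, 1 output unit, trained on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden weights
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

losses = []
for _ in range(2000):
    # Feedforward: observations => classification
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))
    # Backpropagation: classification error => weight adjustment
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(losses[0], losses[-1])  # error shrinks as weights are adjusted
```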
Neuron: Hidden/Output Layer Unit
Deep Neural Network
Ø Advances in data, computation, model
Ø E.g., convolutional neural network (CNN)
• Activation function, regularization, attention, …
Classification Methods
Ø Decision tree induction: Efficient, easy to interpret
Ø Bayesian classification
• Efficient, explainable, incremental
Ø Support vector machines: Good/high performance
Ø Neural networks
• Complex, high performance, poor interpretability
Ensemble
Ø Combined use of multiple models
• Analogy: consulting multiple medical doctors
Ø Bagging: equal weights, majority voting
• Training set: random sample with replacement
Ø Boosting: weighted votes
• Adjust weights to focus more on misclassified cases
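Bagging as described above can be sketched in a few lines. The base learner here is a simple threshold stump and the 1-D toy data is hypothetical; both are assumptions for illustration:

```python
import random

# Bagging: each base model trains on a bootstrap sample (random sample
# with replacement); prediction combines equal-weight majority votes.

def train_stump(sample):
    """Pick the threshold on x that minimizes training errors."""
    best = None
    for t in sorted({x for x, _ in sample}):
        errors = sum((x >= t) != label for x, label in sample)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

def bagging(data, n_models=11, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        stumps.append(train_stump(boot))
    def predict(x):  # majority voting with equal weights
        votes = sum(x >= t for t in stumps)
        return votes > len(stumps) / 2
    return predict

# Toy 1-D data: label is True when x >= 5.
data = [(x, x >= 5) for x in range(10)]
predict = bagging(data)
print(predict(8), predict(1))
```

Boosting differs in that each new model's training distribution and each model's vote are reweighted based on earlier misclassifications.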
Model Evaluation
Ø Holdout, random sampling
• Split into training set (for model construction) & test set
Ø (k-fold) Cross-validation
• Split into k partitions; each partition is used once for testing while the remaining (k-1) are used for training
Ø Bootstrapping: (e.g., .632 bootstrapping)
• Random sample with replacement
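The k-fold splitting scheme can be sketched as an index generator (a minimal sketch, not a library implementation):

```python
# k-fold cross-validation splits: cut the data into k partitions; each
# partition serves once as the test set while the remaining k-1
# partitions form the training set.

def k_fold_splits(n, k):
    indices = list(range(n))
    fold_size, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test

for train, test in k_fold_splits(10, 5):
    print(len(train), len(test))  # 8 2 on every fold
```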
Classification Accuracy (1)
Ø Confusion matrix (e.g., fraud detection)

                          Predicted Class
                          Yes               No
  Actual Class   Yes      True Positive     False Negative
                 No       False Positive    True Negative

Ø Sensitivity: t_pos / pos
Ø Specificity: t_neg / neg
Ø Precision: t_pos / (t_pos + f_pos)
Ø Accuracy: (t_pos + t_neg) / (pos + neg)
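These metrics follow directly from the confusion-matrix counts. The counts below are hypothetical, chosen only to exercise the formulas:

```python
# Metrics from confusion-matrix counts (hypothetical fraud-detection data).
t_pos, f_neg = 90, 10    # actual positives: pos = 100
f_pos, t_neg = 30, 870   # actual negatives: neg = 900
pos, neg = t_pos + f_neg, f_pos + t_neg

sensitivity = t_pos / pos                 # true positive rate: 0.9
specificity = t_neg / neg                 # true negative rate: 870/900
precision = t_pos / (t_pos + f_pos)       # 90/120 = 0.75
accuracy = (t_pos + t_neg) / (pos + neg)  # 960/1000 = 0.96
print(sensitivity, precision, accuracy)
```

Note how a high accuracy can coexist with a modest precision when negatives dominate, which is why fraud detection looks beyond accuracy alone.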
Classification Accuracy (2)
Ø Costs and benefits of TP, TN, FP, FN
• E.g., fraud detection, medical diagnosis
• False positive: a normal case is flagged as fraud
• False negative: a fraud is misclassified as normal
Ø Multi-class classification
• Exact match: i.e., predicted class = actual class
• Some classes may be more similar (or different)
Prediction Error
Ø E.g., stock price, travel time
Ø Difference between predicted values and actual values
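For numeric prediction, the difference between predicted and actual values is typically summarized as mean absolute error (MAE) or root mean squared error (RMSE). The values below are hypothetical:

```python
import math

# Error measures for numeric prediction (e.g., travel time in minutes).
actual    = [10.0, 12.0, 11.0, 15.0]
predicted = [11.0, 11.0, 11.0, 13.0]

errs = [p - a for p, a in zip(predicted, actual)]
mae = sum(abs(e) for e in errs) / len(errs)             # (1+1+0+2)/4 = 1.0
rmse = math.sqrt(sum(e * e for e in errs) / len(errs))  # sqrt(6/4)
print(mae, rmse)
```

RMSE penalizes large individual errors more heavily than MAE, which matters when occasional big misses are costly.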
Model Selection: ROC Curve
Ø X: false positive rate
• f_pos / neg
Ø Y: true positive rate
• t_pos / pos
Ø Area under curve (AUC): accuracy
• Diagonal line: 0.5 accuracy (random guessing)
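An ROC curve can be traced by sweeping a threshold over classifier scores and plotting TPR (t_pos / pos) against FPR (f_pos / neg); the area is then a trapezoidal sum. Scores and labels below are hypothetical:

```python
# ROC sketch: sort by score descending; each example moves the curve
# up (a true positive) or right (a false positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   1,    0,   0,   0  ]  # 1 = positive class
pos = sum(labels); neg = len(labels) - pos

points = [(0.0, 0.0)]
tp = fp = 0
for s, y in sorted(zip(scores, labels), reverse=True):
    if y == 1: tp += 1
    else:      fp += 1
    points.append((fp / neg, tp / pos))  # (FPR, TPR)

# Trapezoidal area under the curve; 0.5 would be the diagonal line.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(points[-1], auc)
```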
Model Selection: T-test
Ø Two models M1 and M2, k-fold cross-validation
• err(M1)1, …, err(M1)k vs. err(M2)1, …, err(M2)k
Ø Choose model with lower mean error?
• Statistically significant? Or by chance?
Ø T-test
T-test Example
Ø 10-fold cross-validation
Ø T-table (e.g., https://www.itl.nist.gov/div898/handbook/eda/section3/eda3672.htm)
Ø Degrees of freedom: v = 10-1 = 9
Ø Significance level: a = 0.05
Ø Two-sided test: 1-a/2 = 1-0.05/2 = 0.975
Ø Check T-table (v=9, 0.975): critical value is 2.262
Ø Compute t; the difference is statistically significant if |t| > 2.262
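The paired t-statistic over the k folds can be computed directly. The per-fold error differences d_i = err(M1)_i - err(M2)_i below are hypothetical values chosen to exercise the formula:

```python
import math

# Paired t-test for 10-fold cross-validation (v = k - 1 = 9 degrees
# of freedom, a = 0.05 two-sided, critical value 2.262).
d = [0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02, 0.03]
k = len(d)
mean = sum(d) / k
var = sum((x - mean) ** 2 for x in d) / (k - 1)  # sample variance
t = mean / math.sqrt(var / k)

significant = abs(t) > 2.262
print(round(t, 3), significant)  # t ~ 7.746, well above 2.262
```

Here M1's consistently higher per-fold error yields a large t, so the mean-error gap is unlikely to be by chance.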
Summary: Classification
Ø Supervised learning
• Training set with predefined class labels
Ø Methods
• Decision tree, Bayesian, SVM, neural network, ensemble
Ø Model evaluation, model selection
• Confusion matrix, accuracy, error, ROC curve, T-test
