Classification 2
Data Mining:
Data Mining Methods
with Dr. Qin Lv
Learning objective: Apply techniques
for classification and explain how they
work. Evaluate and compare methods.
Classification
Ø Supervised learning
• Training set with predefined class labels
Ø Decision tree induction
• Top-down, recursive, attribute selection & split
Ø Bayesian classification
• Probability, naïve assumption, belief network
Support Vector Machines (SVM)
Ø Objects w/ class label
• (X1, y1), …, (Xn, yn)
Ø Separating hyperplane
• Maximum margin
• Maximum margin hyperplane
• Support vectors
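To make the margin and support vectors concrete, here is a minimal sketch that evaluates a given candidate hyperplane w·x + b = 0 on toy data. It does not search for the maximum margin hyperplane; it only measures the margin of the one supplied. The function names, the ±1 label encoding, and the data values are assumptions made for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support_vectors(w, b, points, labels, tol=1e-9):
    """Return the margin of the hyperplane and the points that attain it.

    Labels are assumed to be +1 / -1. A negative margin means that at
    least one point lies on the wrong side of the hyperplane.
    """
    dists = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    margin = min(dists)
    # Support vectors: the training points closest to the hyperplane.
    support = [x for x, d in zip(points, dists) if abs(d - margin) <= tol]
    return margin, support

# Toy data (illustrative values only).
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
labels = [-1, -1, 1, 1]

# Candidate hyperplane: x1 + x2 - 5 = 0.
margin, support = margin_and_support_vectors((1.0, 1.0), -5.0, points, labels)
```

For this toy data the two closest points, (2, 2) and (3, 3), are the support vectors, and the margin is half the distance between them.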
SVM: Linear Separability
Ø Linearly separable
• Original space
Ø Linearly inseparable
• => map to a higher-dimensional space
• Dot product on transformed data is mathematically
equivalent to applying a kernel function to original data
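The equivalence between a dot product on transformed data and a kernel on the original data can be checked numerically. The sketch below assumes one specific kernel, the degree-2 polynomial kernel K(x, z) = (x·z)^2 on 2-D inputs, together with its explicit 3-D feature map; the slides do not name a particular kernel, so this choice is illustrative.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit map from 2-D to 3-D for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
lhs = dot(phi(x), phi(z))   # dot product in the transformed space
rhs = poly2_kernel(x, z)    # kernel applied to the original data
```

Both values agree up to floating-point rounding, so the classifier never needs to build the higher-dimensional vectors explicitly.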
Neural Network
Ø Input, hidden, output layers
• #layers, #units/layer
Ø Weighted connections
• Initial weights, adjustments
Ø Feedforward
• Observations => classification
Ø Backpropagation
• Classification error => weight adjustment
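The feedforward and backpropagation steps above can be sketched for a very small network. This is a minimal illustration, assuming one hidden layer, sigmoid units, a single output unit, squared-error loss, and arbitrary initial weights; none of these choices come from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_h, b_h, w_o, b_o):
    """Feedforward pass: input -> hidden layer -> single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_h, b_h)]
    output = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)
    return hidden, output

def backprop_step(x, target, W_h, b_h, w_o, b_o, lr=0.5):
    """One gradient step on the squared error 0.5 * (output - target)^2."""
    hidden, output = forward(x, W_h, b_h, w_o, b_o)
    # Error term at the output unit; the sigmoid derivative is o * (1 - o).
    delta_o = (output - target) * output * (1.0 - output)
    # Error terms at the hidden units, propagated backward through w_o.
    delta_h = [delta_o * w * h * (1.0 - h) for w, h in zip(w_o, hidden)]
    # Adjusted weights and biases (the inputs are left unmodified).
    new_w_o = [w - lr * delta_o * h for w, h in zip(w_o, hidden)]
    new_b_o = b_o - lr * delta_o
    new_W_h = [[w - lr * d * xi for w, xi in zip(row, x)]
               for row, d in zip(W_h, delta_h)]
    new_b_h = [b - lr * d for b, d in zip(b_h, delta_h)]
    return new_W_h, new_b_h, new_w_o, new_b_o

# Arbitrary initial weights and one training example (illustrative only).
x, target = (1.0, 0.0), 1.0
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.3], 0.1)

_, out_before = forward(x, *params)
params_after = backprop_step(x, target, *params)
_, out_after = forward(x, *params_after)
```

After one adjustment the output for this example moves toward the target of 1, which is the effect backpropagation is meant to have on each training observation.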
Neuron: Hidden/Output Layer Unit
Deep Neural Network
Ø Advances in data, computation, model
Ø E.g., convolutional neural network (CNN)
• Activation function, regularization, attention, …
Classification Methods
Ø Decision tree induction: Efficient, easy to interpret
Ø Bayesian classification
• Efficient, explainable, incremental
Ø Support vector machines: Good/high performance
Ø Neural networks
• Complex, high performance, poor interpretability
Ensemble
Ø Combined use of multiple models
• Analogy: consulting multiple medical doctors
Ø Bagging: equal weights, majority voting
• Training set: random sample with replacement
Ø Boosting: weighted votes
• Adjust weights to focus more on misclassified cases
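The voting and reweighting ideas above can be sketched in a few helper functions. The base models are left out; the functions only combine predictions that are assumed to exist already. The boosting update shown is an AdaBoost-style rule (scale the weights of correctly classified samples by error / (1 - error), then renormalize), which is one common choice and an assumption here, not necessarily the variant used in the lecture.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Bagging training set: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Bagging: every model's vote counts equally."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, model_weights):
    """Boosting: each model's vote is scaled by that model's weight."""
    totals = {}
    for label, weight in zip(predictions, model_weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def reweight(sample_weights, correct, error_rate):
    """AdaBoost-style update (assumed variant): shrink the weights of
    correctly classified samples, then renormalize so that misclassified
    samples carry relatively more weight. Requires 0 < error_rate < 1."""
    factor = error_rate / (1.0 - error_rate)
    scaled = [w * factor if ok else w
              for w, ok in zip(sample_weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Hypothetical predictions from three models for one test case.
votes = ["yes", "no", "yes"]
bagged = majority_vote(votes)                      # equal weights
boosted = weighted_vote(votes, [0.2, 0.9, 0.3])    # weighted votes

# Four samples with equal weight; only the last one was misclassified.
new_weights = reweight([0.25] * 4, [True, True, True, False], 0.25)

sample = bootstrap_sample(list(range(10)), random.Random(0))
```

With equal votes the majority label wins, whereas a single highly weighted model can outvote two weaker ones. After reweighting, the misclassified sample holds half of the total weight.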
Model Evaluation
Ø Holdout, random sampling
• Split into training set (for model construction) & test set
Ø (k-fold) Cross-validation
• Split into k partitions; each partition is used once for testing, with the other (k-1) for training
Ø Bootstrapping (e.g., .632 bootstrap)
• Random sample with replacement
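The splitting schemes above can be sketched as index generators. This is a minimal illustration using only the standard library; the function names and the seed parameter are assumptions. The last helper shows where the ".632" figure comes from: the chance that a given item is drawn at least once in n draws with replacement approaches 1 - 1/e, about 0.632.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_split(n, seed=0):
    """Sample n indices with replacement for training; indices that were
    never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test

def expected_unique_fraction(n):
    """Probability that a given item is drawn at least once in n draws."""
    return 1.0 - (1.0 - 1.0 / n) ** n

splits = list(k_fold_splits(10, 5))
boot_train, boot_test = bootstrap_split(1000)
fraction = expected_unique_fraction(1000)
```

Each index lands in exactly one test fold under k-fold cross-validation, while a bootstrap training set repeats some items and leaves the rest for testing.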
Classification Accuracy (1)
Ø Confusion matrix: actual class vs. predicted class