Classification
•Classification is a supervised
machine learning technique used
in data mining to assign
categories or labels to data points
based on their features.
•It is widely used in applications
such as spam detection, fraud
detection, sentiment analysis, and
medical diagnosis.
Classification Algorithms
• Decision Tree
• Naïve Bayes
• k-Nearest Neighbors (k-NN)
• Support Vector Machine (SVM)
• Neural Networks (Deep Learning)
• Random Forest
• Logistic Regression
A. Decision Tree
B. Naïve Bayes
F. Random Forest
•An ensemble of multiple decision trees for
improved accuracy.
•Reduces overfitting compared to a single
decision tree.
G. Logistic Regression
Sigmoid Function: σ(z) = 1 / (1 + e^(-z))
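A minimal Python sketch of the sigmoid and of how logistic regression applies it to a weighted sum of the features. The names `predict_proba`, `weights`, and `bias` are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into a value between 0 and 1."""
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Probability of the positive class for one feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# A score of 0 sits exactly on the decision boundary.
print(sigmoid(0.0))
print(predict_proba([1.0, -1.0], 0.0, [2.0, 2.0]))
```

A prediction is usually turned into a class label by thresholding the probability, commonly at 0.5.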
C. Linear Regression
K-Means Clustering Steps:
1.Choose K cluster centroids.
2.Assign each data point to the nearest centroid.
3.Update centroids based on assigned points.
4.Repeat until centroids stabilize.
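The four steps above are those of k-means clustering and can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: it assumes points are tuples of numbers, uses squared Euclidean distance, and picks the initial centroids at random from the data. The names `kmeans`, `sq_dist`, and `mean` are chosen for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose K centroids (assumption: K distinct data points at random).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each data point to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: update each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(centroids)
```

The result depends on the initial centroids, so practical use often runs the algorithm several times with different seeds and keeps the best clustering.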
C. Hierarchical Clustering
•Builds a tree-like dendrogram to show
relationships between data points.
•Two types:
• Agglomerative (Bottom-Up) –
Merges smaller clusters into larger
ones.
• Divisive (Top-Down) –
Splits larger clusters into smaller
ones.
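A small sketch of the agglomerative (bottom-up) approach: every point starts as its own cluster, and the two closest clusters are merged repeatedly. It assumes single linkage (cluster distance = distance between the closest pair of points) and Euclidean distance; both are choices made for this example. The recorded merge distances are what a dendrogram would plot as branch heights.

```python
import math

def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]   # start: one cluster per point
    merges = []                        # (cluster_a, cluster_b, distance)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters, merges

pts = [(1.0,), (2.0,), (9.0,), (10.0,), (25.0,)]
clusters, merges = agglomerative(pts, n_clusters=2)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

This brute-force version recomputes every pairwise distance on each pass, so it is only suitable for small data sets.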
Key Parameters (DBSCAN):
•Epsilon (ε): Defines neighborhood radius.
•MinPts: Minimum points required to form a
dense cluster.
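ε and MinPts are the two parameters of density-based clustering (DBSCAN). The sketch below shows how they interact. It assumes Euclidean distance and counts a point as its own neighbor when comparing against MinPts; the function name `dbscan` and the label `NOISE = -1` are conventions chosen for this example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)      # None = not visited yet

    def neighbors(i):
        # All points within radius eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:       # not dense enough to start a cluster
            labels[i] = NOISE
            continue
        labels[i] = cluster_id         # i is a core point: grow a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:     # former noise becomes a border point
                labels[j] = cluster_id
            if labels[j] is not None:  # already assigned: do not expand
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep growing
                queue.extend(nbrs)
        cluster_id += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

A larger ε or a smaller MinPts makes clusters easier to form; the isolated point at (50, 50) has no neighbors within ε and is labeled as noise.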
Decision-Based Algorithms in Data Mining
Types:
•Steps (Random Forest):
• Create multiple decision trees from random
subsets of data.
• Aggregate the results (majority vote for
classification, averaging for regression).
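The two steps above (build many trees on random subsets, then take a majority vote) can be illustrated with a deliberately simplified sketch. Assumptions for this example: each "tree" is a one-level decision stump rather than a full decision tree, each stump sees one randomly chosen feature, and only the classification case is shown. All function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, feature):
    """One-level tree: pick the threshold with the fewest training errors."""
    values = sorted({r[feature] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2      # midpoint between neighboring values
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        l_lab, r_lab = majority(left), majority(right)
        errors = (sum(y != l_lab for y in left)
                  + sum(y != r_lab for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:                   # feature is constant in this sample
        m = majority(labels)
        return (feature, float("inf"), m, m)
    return (feature, best[1], best[2], best[3])

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Random subset of the data: bootstrap sample drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        feature = rng.randrange(n_features)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    # Aggregate the results: majority vote over all trees.
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)

rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["a"] * 4 + ["b"] * 4
forest = train_forest(rows, labels)
print(predict(forest, (1.5, 2)), predict(forest, (8.5, 9)))
```

For regression, the vote in `predict` would be replaced by the average of the trees' numeric outputs.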