Notes on ML Basics (Classifier, Types of Classification Algorithms, AUC-ROC Curve, Cross-Validation)
The algorithm that performs classification on a dataset is known as a classifier. There
are two types of classifiers:
Binary Classifier: If the classification problem has only two possible outcomes, the
classifier is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG,
etc.
Multi-class Classifier: If the classification problem has more than two possible outcomes,
the classifier is called a multi-class classifier.
Examples: classification of crop types, classification of music genres.
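To make the distinction concrete, here is a minimal sketch in Python. It assumes a model has already produced scores; the function names, the 0.5 threshold, and the crop scores are hypothetical and only illustrate that a binary classifier picks between two labels while a multi-class classifier picks one label out of several.

```python
def classify_binary(score, threshold=0.5):
    """Two possible outcomes: compare one score against a threshold."""
    return "SPAM" if score >= threshold else "NOT SPAM"


def classify_multiclass(class_scores):
    """More than two outcomes: return the label with the highest score."""
    return max(class_scores, key=class_scores.get)


# Hypothetical scores, as if produced by an already trained model.
binary_label = classify_binary(0.82)
crop_label = classify_multiclass({"wheat": 0.2, "rice": 0.7, "maize": 0.1})

print(binary_label)
print(crop_label)
```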
Types of ML Classification Algorithms:
Classification algorithms can be broadly divided into two categories:
Linear Models
o Logistic Regression
o Support Vector Machines (with a linear kernel; kernelized SVMs are non-linear)
Non-linear Models
o K-Nearest Neighbours
o Naïve Bayes
o Decision Tree
o Random Forest
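The difference between the two categories can be sketched in plain Python. A linear model decides by which side of a straight line (or hyperplane) a point falls on, while a non-linear model such as K-Nearest Neighbours decides by looking at nearby training points. The weights below are hand-picked for illustration, not learned, and the XOR data is a standard toy case that no single straight line can separate.

```python
from collections import Counter


def linear_predict(weights, bias, x):
    """Linear model: label depends on the sign of w . x + b."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else 0


def knn_predict(train_x, train_y, x, k=1):
    """K-Nearest Neighbours: majority vote among the k closest training points."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# XOR: the label is 1 only when exactly one coordinate is 1.
xor_x = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_y = [0, 1, 1, 0]

# Hand-picked (assumed) line x0 + x1 - 0.5 = 0: it misclassifies (1, 1).
linear_preds = [linear_predict((1.0, 1.0), -0.5, x) for x in xor_x]

# 1-NN on points close to each corner recovers the XOR labels.
queries = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
knn_preds = [knn_predict(xor_x, xor_y, q, k=1) for q in queries]

print(linear_preds)
print(knn_preds)
```

This only shows one particular line failing on XOR; the general point is that a non-linear model can represent decision boundaries that are not straight.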
AUC-ROC curve:
ROC stands for Receiver Operating Characteristic, and AUC stands for
Area Under the Curve.
The ROC curve is a graph that shows the performance of a classification model at different
classification thresholds.
It is used to visualize the performance of a binary classification model. The AUC summarizes
the curve as a single number: 1.0 means the model ranks every positive above every negative,
and 0.5 is no better than random guessing.
The ROC curve plots TPR (True Positive Rate) on the Y-axis against FPR (False Positive
Rate) on the X-axis.
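As a rough sketch of how the curve is built, the Python below sweeps a threshold over the predicted scores, computes TPR = TP / (TP + FN) and FPR = FP / (FP + TN) at each threshold, and estimates the area with the trapezoidal rule. The function names and the four-sample dataset are made up for illustration, and the code assumes both classes are present in the labels.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive
    # when its score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative labels (1 = positive) and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
roc_auc = auc(points)

print(points)
print(roc_auc)
```

Each point on the curve corresponds to one threshold, which is why a single model yields a whole curve rather than a single TPR/FPR pair.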
What is Cross-Validation?
Cross-validation is a technique used in machine learning to evaluate the performance of a
model on unseen data. In its most common form, k-fold cross-validation, the available data
is divided into k folds (subsets); one fold is used as the validation set and the model is
trained on the remaining k - 1 folds.
This process is repeated k times, each time using a different fold as the validation set.
Finally, the results from each validation step are averaged to produce a more robust estimate
of the model’s performance than a single train/validation split would give. Cross-validation
is an important step in the machine learning process and helps to ensure that the model
selected for deployment is robust and generalizes well to new data.
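The procedure described above can be sketched in plain Python as follows. The "model" here is a toy nearest-mean classifier on one feature, chosen only so the example is self-contained; the function names and the data are hypothetical. The folds are contiguous and unshuffled, which assumes the data is already in a mixed order (in practice the data is usually shuffled first).

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def fit_nearest_mean(xs, ys):
    """Toy model: remember the mean feature value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def cross_validate(xs, ys, k):
    """Return per-fold accuracies and their average."""
    scores = []
    for train, val in k_fold_splits(len(xs), k):
        model = fit_nearest_mean([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in val)
        scores.append(correct / len(val))
    return scores, sum(scores) / len(scores)


# Illustrative, well-separated data: class 0 near 1-2, class 1 near 8-9.
xs = [1.0, 9.0, 2.0, 8.0, 1.5, 8.5, 2.5, 9.5, 0.5, 7.5]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

fold_scores, mean_score = cross_validate(xs, ys, k=5)

print(fold_scores)
print(mean_score)
```

Every sample is used for validation exactly once and for training k - 1 times, which is what makes the averaged score less dependent on one particular split.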