BSC ML CH1
Unit 1: Binary and multiclass classification, Evaluation measures for supervised learning,
k-Nearest Neighbor algorithm
Machine learning is the field of study that gives computers the capability to learn
without being explicitly programmed.
[Figure: Traditional programming: input data + algorithm -> traditional program -> output. Machine learning: input data + output -> machine learning -> program.]
Relationship Between AI, ML, DL and DS
Types
Supervised Learning
• Supervised learning is when we train the machine using well-labeled data, i.e. each training example comes with its correct output.
• After training, the machine is given a new set of examples (data), and the supervised learning
algorithm uses what it learned from the training examples to predict the outcome for them.
Classification
• The Classification algorithm is a Supervised Learning
technique that is used to identify the category of new
observations on the basis of training data.
• In Classification, a program learns from the given
dataset or observations and then classifies new
observations into a number of classes or groups,
such as Yes or No, 0 or 1, Spam or Not Spam, cat or
dog, etc. Classes are also called targets, labels or
categories.
• Types:
➢ Binary Classifier: If the classification problem has only
two possible outcomes, then it is called a Binary
Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.
➢ Multi-class Classifier: If a classification problem has
more than two outcomes, then it is called a Multi-class
Classifier.
Examples: Classification of types of crops, classification
of types of music.
Binary Classification
• It is the task of classifying given data into one of two classes. It is basically a prediction
about which of two groups an item belongs to.
• This method is essential for tasks like email spam detection and medical diagnostics.
It provides a clear decision boundary between the two classes.
• Popular algorithms used for binary classification are:
• Logistic Regression
• k-Nearest Neighbors
• Decision Trees
• Support Vector Machine
• Naive Bayes
Multiclass Classification
Multi-class classification is the task of classifying elements into more than two classes. Unlike binary classification, it is not restricted to two classes; the number of classes can be any value greater than two.
Comparison
Machine Learning Models
[Figure: taxonomy of machine learning models. Task driven: Regression, Classification. Data driven: Clustering (divide by similarity), Association (identify sequences), Dimensionality Reduction (compress data based on features). Algorithms shown: Linear Regression, Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, Neural Networks, Naïve Bayes.]
Model Evaluation
⮚ Train/Test is a method to measure the accuracy of your model.
⮚ It is called Train/Test because you split the data set into two sets: a training set and a testing
set.
⮚ Example: 80% for training, and 20% for testing.
⮚ You train the model using the training set.
⮚ You test the model using the testing set.
⮚ Training the model means creating (fitting) the model.
⮚ Testing the model means measuring its accuracy on data it has not seen during training.
⮚ We can measure model performance by two methods. Accuracy simply means the proportion of values correctly
predicted.
1. Confusion Matrix
2. Classification Measure
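The train/test split described above can be sketched in a few lines. This is only an illustrative sketch using the Python standard library; the function name split_train_test, the toy data and the fixed seed are assumptions made for the example, not part of the original material.

```python
import random

def split_train_test(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle the data and split it into a training set and a testing set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)      # fixed seed makes the split repeatable
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data (hypothetical): 10 one-feature samples with binary labels
X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 8 2
```

The model would then be fitted on X_train / y_train only and scored on X_test / y_test, so that the reported accuracy reflects data the model has not seen.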
Confusion Matrix
• The confusion matrix is also known as the error matrix and is represented by a table which describes the
performance of a classification model.
• It is a two-dimensional matrix where each row represents the instances in a predicted class while each
column represents the instances in an actual class, or the other way around.
Here, TP (True Positive) means the observation is positive and is predicted as positive,
FP (False Positive) means observation is negative but is predicted as positive,
TN (True Negative) means the observation is negative and is predicted as negative
and FN (False Negative) means the observation is positive but it is predicted as negative.
• Actual values =
[‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
• Predicted values =
[‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
• A good model is one which has high TP and TN rates and low FP and FN
rates.
• If you have an imbalanced dataset to work with, it is usually better to
look at the confusion matrix than at accuracy alone when evaluating your machine learning
model.
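As a minimal sketch, the four confusion matrix cells for the dog/cat lists above can be tallied directly. Treating 'dog' as the positive class is an assumption made for this example (the slides do not say which class is positive), and confusion_counts is a hypothetical helper name.

```python
actual = ['dog', 'cat', 'dog', 'cat', 'dog',
          'dog', 'cat', 'dog', 'cat', 'dog',
          'dog', 'dog', 'dog', 'cat', 'dog',
          'dog', 'cat', 'dog', 'dog', 'cat']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog',
             'dog', 'cat', 'cat', 'cat', 'cat',
             'dog', 'dog', 'dog', 'cat', 'dog',
             'dog', 'cat', 'dog', 'dog', 'cat']

def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN and FN for one chosen positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1      # predicted positive, actually positive
            else:
                fp += 1      # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1      # predicted negative, actually positive
            else:
                tn += 1      # predicted negative, actually negative
    return tp, fp, tn, fn

# Assumption: 'dog' is the positive class
tp, fp, tn, fn = confusion_counts(actual, predicted, positive='dog')
print(tp, fp, tn, fn)  # 11 1 6 2
```

So 17 of the 20 predictions are correct: 11 dogs and 6 cats are recognised, 2 dogs are mistaken for cats and 1 cat is mistaken for a dog.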
2. Classification Measure
• Basically, it is an extended version of the confusion matrix.
• There are measures other than the confusion matrix which can
help achieve better understanding and analysis of our model and
its performance.
a. Accuracy
b. Precision
c. Recall (TPR, Sensitivity)
d. F1-Score
e. FPR (Type I Error)
f. FNR (Type II Error)
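The two error rates at the end of this list can be sketched with their standard definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The counts used below (TP = 11, FP = 1, TN = 6, FN = 2) come from tallying the dog/cat lists shown earlier with 'dog' assumed to be the positive class; the function names are hypothetical.

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly predicted positive (Type I error rate)."""
    return fp / (fp + tn) if (fp + tn) else 0.0

def false_negative_rate(fn, tp):
    """Share of actual positives that were wrongly predicted negative (Type II error rate)."""
    return fn / (fn + tp) if (fn + tp) else 0.0

# Counts from the dog/cat example, assuming 'dog' is the positive class
tp, fp, tn, fn = 11, 1, 6, 2

fpr = false_positive_rate(fp, tn)   # 1 / 7
fnr = false_negative_rate(fn, tp)   # 2 / 13
print(round(fpr, 3), round(fnr, 3))  # 0.143 0.154
```

Note that FNR and recall (TPR) are complements: TPR = 1 - FNR.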
Accuracy
➢ Accuracy is the ratio of the total number of correct predictions and the total number of
predictions.
➢ Accuracy is, simply put, the total proportion of observations that have been correctly
predicted.
➢ We can use accuracy when we are interested in predicting both 0 and 1 correctly and our
dataset is balanced enough.
➢ The formula for calculating accuracy is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
A common complaint about accuracy is that it fails when the classes are imbalanced.
For example, if the data contains only 10% positive instances, a majority baseline classifier which always assigns
the negative label would reach 90% accuracy, since it would correctly predict 90% of the instances. But such a
classifier is useless: it never identifies a single positive instance.
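The imbalance problem can be reproduced with a small sketch. The dataset below is hypothetical (10 positives and 90 negatives, matching the 10% figure in the text), and the "classifier" simply predicts the negative label every time.

```python
# Hypothetical imbalanced dataset: 10 positives (1) and 90 negatives (0)
y_true = [1] * 10 + [0] * 90

# Majority baseline: always predict the negative label
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Fraction of the actual positives that the baseline finds
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
positives_found = true_positives / sum(y_true)

print(accuracy)         # 0.9
print(positives_found)  # 0.0
```

The accuracy looks respectable, yet not one positive instance is detected, which is why accuracy alone is a poor guide on imbalanced data.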
Precision
• Precision is the ratio between the True Positives and all predicted Positives: Precision = TP / (TP + FP).
• Precision is a measure of how many of the positive predictions made are correct (true
positives).
• Precision is a good measure to use when the cost of a False Positive is high.
Recall
• Recall is the measure of our model correctly identifying the actual positives: Recall = TP / (TP + FN).
• For example, for all the patients who actually have heart disease, recall tells us how
many we correctly identified as having heart disease.
• Recall also gives a measure of how completely our model is able to identify
the relevant data. We refer to it as Sensitivity or True Positive Rate.
• In most cases, we want both our precision and recall to be high, but this is often not
possible.
• Typically, raising precision lowers recall and vice versa.
• So to balance these we have another metric called F1 Score.
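The tension between precision and recall can be seen by moving the decision threshold of a classifier that outputs scores. The sketch below uses made-up scores and labels (assumptions for illustration only) and a hypothetical helper precision_recall; it is not tied to any particular model.

```python
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

def precision_recall(labels, scores, threshold):
    """Predict positive when score >= threshold, then compute precision and recall."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for threshold in (0.25, 0.5, 0.8):
    p, r = precision_recall(labels, scores, threshold)
    print(threshold, round(p, 3), round(r, 3))
# 0.25 0.714 1.0
# 0.5 0.8 0.8
# 0.8 1.0 0.4
```

On this toy data a strict threshold gives perfect precision but misses most positives, while a lenient threshold finds every positive at the cost of more false positives.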
F1 Score
The F1 score is a machine learning evaluation metric that combines the precision and recall scores of a model
into a single number, their harmonic mean: F1 = 2 * Precision * Recall / (Precision + Recall).
The F1 score is a popular performance measure for classification and often preferred over accuracy when data is
unbalanced, such as when the number of examples belonging to one class significantly outnumbers those found in the
other class.
The F1 score might be a better measure to use if we need to seek a balance between Precision and Recall.
Advantages:
• Very small precision or recall will result in lower overall score. Thus it helps balance the two metrics.
• If you choose your positive class as the one with fewer samples, F1-score can help balance the metric across
positive/negative samples.
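The first advantage follows from F1 being the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch is given below; the function name is hypothetical, and the values 11/12 and 11/13 are the precision and recall of the earlier dog/cat example under the assumption that 'dog' is the positive class.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Dog/cat example, assuming 'dog' is positive: TP = 11, FP = 1, FN = 2
precision, recall = 11 / 12, 11 / 13
print(round(f1_from_precision_recall(precision, recall), 2))  # 0.88

# One very small component pulls F1 down far more than a plain average would
print(round(f1_from_precision_recall(1.0, 0.1), 3))  # 0.182
print(round((1.0 + 0.1) / 2, 3))                     # 0.55
```

With precision 1.0 and recall 0.1, the arithmetic mean still looks acceptable at 0.55, while F1 drops to about 0.18 and exposes the weak recall.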
AUC-ROC
• The Area Under the ROC Curve (AUC) is used as a measure of the quality of a classification model.
The ROC curve shows the performance of the classifier at various threshold settings, and the
AUC summarizes that curve in a single number.
• It measures the overall performance of the binary classification model.
• As both TPR and FPR range from 0 to 1, the area always lies between
0 and 1, and a greater value of AUC denotes better model performance.
• Our main goal is to maximize this area in order to have the highest TPR and
lowest FPR at any given threshold.
• It represents the probability that our model ranks a randomly chosen positive instance above a
randomly chosen negative one, i.e. its ability to distinguish between the two
classes which are present in our target.
• A higher X-axis value (FPR) indicates a higher number of false positives
relative to true negatives.
• A poor model has AUC near 0.5, which means it has no ability to separate the classes;
an AUC near 0 means the model separates the classes but predicts them the wrong way round.
AUC = 0.5
Implies that the classifier performs no better than random guessing and is unable to differentiate the positive and negative
classes.
AUC = 1.0
Implies that the classifier is perfect: it separates the positive and negative classes without error.
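These AUC values can be checked with a small sketch that uses the ranking interpretation of AUC: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. The function name auc_by_ranking and the toy scores are assumptions for illustration; this pairwise loop is meant for small examples, not large datasets.

```python
def auc_by_ranking(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
print(auc_by_ranking(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0  (perfect separation)
print(auc_by_ranking(labels, [0.5, 0.5, 0.5, 0.5]))   # 0.5  (no separation)
print(auc_by_ranking(labels, [0.9, 0.8, 0.2, 0.1]))   # 0.0  (classes reversed)
print(auc_by_ranking(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The third case shows that an AUC of 0 is not "no information": the scores order the classes perfectly, only in the wrong direction.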
• The ROC (Receiver Operating Characteristic) curve tells us how well the model can
distinguish between two things (e.g. whether a patient has a disease or not).
• Better models can accurately distinguish between the two classes, whereas a poor model
will have difficulty distinguishing between them.
• ROC Curves and AUC in Python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
import plotly.express as px
import pandas as pd

# Random classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)

# Fit a logistic regression model
model = LogisticRegression()
model.fit(X, y)
• Now we want to evaluate how good our model is using ROC curves. To do this, we need to find the FPR and TPR for various
threshold values.
# predicted probability of the positive class for each sample
probs = model.predict_proba(X)[:, 1]
# calculate ROC curve: the function returns the false positive rates,
# the true positive rates, and the thresholds they correspond to
fpr, tpr, thresholds = roc_curve(y, probs)
# calculate AUC
auc = roc_auc_score(y, probs)
print('AUC: %.3f' % auc)
K-Nearest Neighbor (K-NN)
• Suppose there are two categories, Category A and Category B, and we have a new data
point x1. Which of these categories does this data point belong to?
• To solve this type of problem, we can use the K-NN algorithm.
• With the help of K-NN, we can easily identify the category or class of a particular data point.
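The Category A / Category B situation can be sketched as follows. This is only a minimal illustration under stated assumptions: Euclidean distance and a simple majority vote among the k nearest training points (the usual textbook formulation), with made-up 2-D points and a hypothetical function name knn_predict.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Most frequent label among the k nearest neighbours
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: three points per category
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ['A', 'A', 'A', 'B', 'B', 'B']

x1 = (2.0, 2.0)  # new data point to classify
print(knn_predict(train_points, train_labels, x1, k=3))  # A
```

Here the three nearest neighbours of x1 all belong to Category A, so x1 is assigned to A; an odd k is a common choice in the two-class case because it avoids tied votes.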
How does K-NN work?