
Model Selection &

Evaluating Classifiers
Outline
• Model in ML
• Selecting a Model
• Training a Model (for Supervised Learning) (Holdout
method, K-fold Cross-validation method, Bootstrap
sampling, Lazy vs. Eager learner)
• Underfitting and Overfitting
• Evaluating Performance of a Model
Model in ML
The basic learning process in machine learning can be divided into three key parts:

• Data Input: This is the initial step where information or data is collected and provided to the learning system. In
machine learning, it refers to the dataset that contains examples and features used for training.
• Abstraction: After receiving the data, the system or learner abstracts relevant patterns, features, or concepts
from it. In machine learning, it involves extracting meaningful patterns and relationships from the data using
algorithms and statistical methods.
• Generalization: This is the process of applying the abstracted knowledge or patterns to make predictions or
decisions beyond the specific examples or data points that were initially provided. In machine learning, it's the
model's capacity to make accurate predictions or classifications on unseen data based on what it has learned
from the training data.

⮚ Abstraction is a significant step as it represents raw input data in a summarized and structured
format, such that a meaningful insight is obtained from the data.
⮚ This structured representation of raw input data to the meaningful pattern is called a model.
⮚ The model might take different forms: a mathematical equation, a graph or tree structure, a
computational block, etc.
Selecting a model
There are three broad categories of machine learning approaches used for resolving different types
of problems. They are:
1. Supervised
▪ Classification
▪ Regression
2. Unsupervised
▪ Clustering
▪ Association analysis
3. Reinforcement

For each of these cases, the model that has to be created/trained is different. Multiple factors
play a role when we try to select a model for solving a machine learning problem. The
most important factors are:
i. The kind of problem we want to solve using machine learning
ii. The nature of the underlying data.
Machine learning algorithms are broadly of two types:
• Models for supervised learning, which primarily focus on solving predictive problems
• Models for unsupervised learning, which solve descriptive problems
Predictive models
• Predictive models try to predict certain value using the values in an input data set
• Predictive models used for prediction of target features of categorical value are known as
classification models. The target feature is known as a class, and the categories into which the
class values are divided are called levels.
• Examples:
▪ Predicting win/loss in a cricket match
▪ Predicting whether a transaction is fraud
▪ Predicting whether a customer may move to another product
• Some of the popular classification models include: k-Nearest Neighbor (kNN), Naïve Bayes and
Decision Tree.

• Predictive models may also be used to predict numerical values of the target feature based on the
predictor features. The models which are used for prediction of the numerical value of the target feature
of a data instance are known as regression models.
• Examples:
▪ Prediction of revenue growth in the succeeding year
▪ Prediction of rainfall amount in the coming monsoon
▪ Prediction of potential flu patients and demand for flu shots next winter
• Some of the popular regression models include: Linear Regression and Logistic Regression models.
Descriptive models
• Descriptive models are used to describe a data set or gain insight from a data set.
• There is no target feature or single feature of interest in case of unsupervised
learning. Based on the value of all features, interesting patterns or insights are
derived about the data set.
• Descriptive models which group together similar data instances, i.e. data
instances having a similar value of the different features are called clustering
models.
• Examples of clustering include:
▪ Customer grouping or segmentation based on social, demographic, ethnic,
etc. factors
▪ Grouping of music based on different aspects like genre, language,
time period, etc.
▪ Grouping of commodities in an inventory
• The most popular model for clustering is k-Means
Training a model (for Supervised Learning)
• Machine learning model training is a fundamental process in AI, enabling
computers to learn and make intelligent decisions.
• It involves teaching algorithms to recognize patterns, relationships, and
trends in data to make predictions or decisions.
• Training starts with a dataset containing examples and corresponding
outcomes.
• The model learns to generalize and make predictions on new, unseen data.
• Once trained, the model can be used to make predictions, classify objects, or
offer recommendations.
• Effective model training is critical for various applications in industries like
healthcare, finance, autonomous vehicles, and natural language processing.
• There are various methods for training models, such as:
▪ Holdout method
▪ K-fold Cross-validation method
▪ Bootstrap sampling
▪ Lazy vs. Eager learner
Holdout method
• Holdout Method involves splitting the dataset into two parts, typically a training set and a test set. The
training set is used to train the model, and the test set is used to evaluate its performance.
• Training and Test Data Split: Typically, 70%–80% of labeled input data is used for training, and 20%–30% is
used for testing, but other proportions are also acceptable.
• Random Data Split: To ensure both training and test data are similar, random partitioning is done
using random numbers. (In some cases, data is divided into three parts: training, test, and validation
data; validation data is used iteratively to refine the model.)
• Problem of Imbalanced Data: Imbalanced distribution of classes in training and test data can occur
despite random sampling, particularly when certain classes have much fewer examples.
• To address the problem of imbalanced data, stratified random sampling can be used, which divides
data into homogeneous groups and selects random samples from each group to ensure balanced
proportions.

Figure 1: Holdout method
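As an illustration, here is a minimal sketch of the holdout method with a stratified random split, assuming scikit-learn is available; the breast-cancer dataset and the decision-tree classifier are stand-ins chosen only for the example:

```python
# Holdout method: split the labeled data into a training and a test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 70/30 split; stratify=y performs stratified random sampling so class
# proportions stay balanced in both partitions (avoids imbalanced splits).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```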


K-fold Cross-validation method
• In k-fold cross-validation, the data set is divided into k completely distinct, non-
overlapping random partitions called folds.
• The value of ‘k’ in k-fold cross-validation can be set to any number.
• There are two approaches which are extremely popular:

▪ Leave-one-out cross-validation (LOOCV)


o LOOCV is an extreme case of cross-validation where one data instance is
used as test data at a time.
o It aims to maximize the amount of data used for model training.
o The number of iterations in LOOCV equals the total number of data points in
the dataset.
o LOOCV is computationally expensive due to the large number of iterations.
o As a result, it is not commonly used in practice due to its high computational
cost.
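A minimal LOOCV sketch, assuming scikit-learn; the Iris data and the kNN classifier are stand-ins, and the number of iterations printed illustrates why the cost grows with the data set size:

```python
# Leave-one-out CV: each of the n instances is the test set exactly once,
# so n models are trained in total -- thorough but computationally expensive.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=LeaveOneOut())
print(f"{len(scores)} iterations, mean LOOCV accuracy = {scores.mean():.3f}")
```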
K-fold Cross-validation method (continued)
10-fold cross-validation (10-fold CV)
o is a widely used approach for assessing model
performance.
o The dataset is divided into 10 equal-sized folds,
each comprising about 10% of the data.
o Records in a fold are randomly sampled to ensure
a fair representation.
o In each of the 10 iterations, one fold is designated
as the test data, while the remaining 9 folds (90%
of the data) are used for training.
o This process is repeated 10 times, with a different
fold as the test data in each iteration.
o The average performance across all iterations is
reported to evaluate the model.

Figure 2: Overall approach for k-fold cross-validation
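A sketch of 10-fold CV under the same assumptions (scikit-learn; logistic regression as a stand-in model); StratifiedKFold additionally keeps class proportions similar across folds:

```python
# 10-fold CV: every fold serves as test data exactly once; the 10 scores
# are averaged to report the model's performance.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(f"mean 10-fold accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```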
Bootstrap Sampling method
• Bootstrap sampling is a popular method for creating training and test data sets from an input data set.
• It utilizes Simple Random Sampling with Replacement (SRSWR), a well-known technique in sampling
theory for drawing random samples.
• In contrast to k-fold cross-validation, which divides data into separate partitions for testing and training,
bootstrapping randomly selects data instances from the input data set.
• Bootstrapping allows for the possibility of the same data instance being picked multiple times during the
sampling process.
• As a result, it can create one or more training data sets with 'n' data instances, and some instances may
be repeated.
• This technique is particularly useful for input data sets of small size, i.e. those having very few
data instances.

Figure 3: Bootstrap sampling
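A minimal bootstrap-sampling sketch with NumPy (a toy data set of 10 instances, chosen only for illustration); note how SRSWR can pick the same instance more than once, while the instances never drawn can serve as test data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10
data = np.arange(n)                              # toy data set: 10 instances

boot_idx = rng.choice(n, size=n, replace=True)   # SRSWR: repeats possible
train = data[boot_idx]                           # bootstrap training sample
oob = np.setdiff1d(data, boot_idx)               # "out-of-bag" instances

print("bootstrap sample:", train)
print("out-of-bag (test) instances:", oob)
```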


Lazy vs. Eager learner method
• Eager Learning:
▪ Follows standard machine learning principles, constructing a generalized, input-independent target
function during training.
▪ Typical machine learning steps of abstraction and generalization are involved.
▪ Results in a trained model at the end of the learning phase.
▪ Eager learners are prepared with a model for classification when test data is received.
▪ Learning phase is time-consuming.
▪ Algorithms using eager learning include Decision Trees, Support Vector Machines, Neural Networks,
etc.
• Lazy Learning:
▪ Skips the abstraction and generalization processes, essentially not 'learning' in the traditional sense.
▪ Utilizes training data as-is and employs it for classifying unlabelled test data.
▪ Relies heavily on the given training data, making it known as rote learning or instance learning.
▪ Also referred to as non-parametric learning.
▪ Training phase is quick because little learning occurs.
▪ Classification can be time-consuming as each test data point is compared to training data.
▪ k-Nearest Neighbors is a popular algorithm for lazy learning.
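To make the contrast concrete, the sketch below (assuming scikit-learn; the timings are illustrative only) compares a lazy kNN learner with an eager decision tree: for kNN the fit step mostly stores the data and prediction does the real work, while the tree spends its time in training:

```python
import time
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier   # lazy learner
from sklearn.tree import DecisionTreeClassifier      # eager learner

X, y = load_breast_cancer(return_X_y=True)

for name, clf in [("kNN (lazy)", KNeighborsClassifier()),
                  ("decision tree (eager)", DecisionTreeClassifier())]:
    t0 = time.perf_counter()
    clf.fit(X, y)              # lazy: essentially just stores the data
    t1 = time.perf_counter()
    clf.predict(X)             # lazy: real work happens at prediction time
    t2 = time.perf_counter()
    print(f"{name}: fit {t1 - t0:.4f}s, predict {t2 - t1:.4f}s")
```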
Underfitting and Overfitting
• If the target function is kept too simple, it may not capture the essential nuances of the
underlying data. A typical case of underfitting occurs when trying to represent non-linear
data with a linear model.
• Many times underfitting happens due to unavailability of sufficient training data. Underfitting
results in poor performance on the training data as well as poor generalization to test data.
Underfitting can be avoided by:
❑ using more training data
❑ increasing model complexity, e.g. by adding relevant features
• Overfitting refers to a situation where the model has been designed in such a way that it
emulates the training data too closely. In such a case, any specific deviation in the training
data, like noise or outliers, gets embedded in the model.
• It adversely impacts the performance of the model on the test data. Overfitting, in many
cases, occurs as a result of trying to fit an excessively complex model to closely match the
training data. Overfitting can be avoided by:
❑ using re-sampling techniques like k-fold cross-validation
❑ holding back a validation data set
❑ removing the nodes which have little or no predictive power for the given machine
learning problem
Figure 4: Underfitting and overfitting of models
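The effect can be reproduced with a small sketch (assuming scikit-learn; the noisy sine data and the degrees 1/4/15 are arbitrary choices for illustration): a degree-1 polynomial underfits, degree 15 overfits, and cross-validation exposes both:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)   # noisy sine

for degree in (1, 4, 15):          # too simple / reasonable / too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()   # mean 5-fold R^2
    print(f"degree {degree:2d}: mean CV score = {score:.2f}")
```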
Evaluating Performance of a Model
Supervised learning - classification

• For any classification model, model accuracy is given by the total number of correct classifications
(either as the class of interest, i.e. True Positives, or as not the class of interest, i.e. True Negatives)
divided by the total number of classifications done.
• A matrix containing correct and incorrect predictions in the form of True Positives, False Positives,
False Negatives and True Negatives is known as a confusion matrix.

Figure 5: Details of model classification
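A minimal sketch of building a confusion matrix with scikit-learn, using toy binary labels (1 = the class of interest) chosen only for illustration:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # toy ground-truth labels
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # toy model predictions

# For binary labels {0, 1}, ravel() returns counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")   # TP=4 TN=4 FP=1 FN=1
```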


Supervised learning classification model evaluation metrics

• Key performance metrics and evaluation techniques for classification models:

⮚ Accuracy
⮚ Error rate
⮚ Sensitivity
⮚ Specificity
⮚ Precision
⮚ Recall
⮚ F-measure
⮚ Receiver operating characteristic (ROC) curves
⮚ Area under curve (AUC)
Supervised learning classification model evaluation- Accuracy
Accuracy is a measure of how many predictions a classification model got correct, expressed
as the ratio of correctly predicted instances to the total instances in the data set. It is
measured as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Supervised learning classification model evaluation- Error Rate

The error rate is the complement of accuracy. It represents the proportion of incorrect
predictions in relation to the total instances. It is measured as:

Error rate = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy
Supervised learning classification model evaluation- Sensitivity
The sensitivity of a model measures the proportion of positive cases (TP examples)
which were correctly classified. It is measured as:

Sensitivity = TP / (TP + FN)
Supervised learning classification model evaluation- Specificity
Specificity of a model measures the proportion of negative examples which have
been correctly classified. It is measured as:

Specificity = TN / (TN + FP)

A higher value of specificity indicates better model performance. However, a
conservative approach that reduces False Negatives might actually push up the
number of False Positives.
Supervised learning classification model evaluation- Precision

Precision, also known as Positive Predictive Value, assesses the accuracy of
positive predictions. It is the ratio of true positives to the total instances predicted as
positive:

Precision = TP / (TP + FP)
Supervised learning classification model evaluation- Recall
Recall indicates the proportion of correct predictions of positives to the total number of
actual positives. In the case of win/loss prediction in cricket, recall represents what
proportion of the total wins were predicted correctly:

Recall = TP / (TP + FN)
Supervised learning classification model evaluation- F-Measure
F-measure is another measure of model performance which combines precision
and recall. It is calculated as the harmonic mean of precision and recall:

F-measure = (2 × Precision × Recall) / (Precision + Recall)
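Putting the formulas above together, here is a short sketch that computes every metric from the confusion-matrix counts of the earlier toy example (TP=4, TN=4, FP=1, FN=1):

```python
tp, tn, fp, fn = 4, 4, 1, 1   # counts from the toy confusion matrix above

accuracy    = (tp + tn) / (tp + tn + fp + fn)               # 0.80
error_rate  = 1 - accuracy                                  # 0.20
sensitivity = tp / (tp + fn)                                # recall, 0.80
specificity = tn / (tn + fp)                                # 0.80
precision   = tp / (tp + fp)                                # 0.80
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)

print(accuracy, error_rate, sensitivity, specificity, precision, f_measure)
```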
Supervised learning classification model evaluation-
Receiver operating characteristic (ROC)
• Receiver Operating Characteristic (ROC) curve helps in visualizing the performance of a
classification model. It shows the efficiency of a model in the detection of true positives
while avoiding the occurrence of false positives.
• In the ROC curve, the FP rate is plotted on the horizontal axis against the true positive rate
on the vertical axis at different classification thresholds. If we assume a lower value of the
classification threshold, the model classifies more items as positive. Hence, the counts of
both False Positives and True Positives increase.
Supervised learning classification model evaluation- Area Under Curve
• The area under curve (AUC) value, as shown in
Figure 6.a, is the area of the two-dimensional
space under the curve extending from (0, 0) to
(1, 1), where each point on the curve gives a pair
of true and false positive rates at a specific
classification threshold.
• This curve gives an indication of the predictive
quality of a model. The AUC value ranges from 0 to 1,
with an AUC of 0.5 indicating that the classifier
has no predictive ability (equivalent to random guessing).
• Figure 6.b shows the curves of two classifiers –
classifier 1 and classifier 2. Quite obviously, the
AUC of classifier 1 is more than the AUC of
classifier 2. So, we can draw the inference that
classifier 1 is better than classifier 2.
Figure 6: ROC curve
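A sketch of computing the ROC points and the AUC with scikit-learn (dataset and model are stand-ins); roc_curve returns one (FP rate, TP rate) point per threshold, and roc_auc_score gives the area under that curve:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Scores are the predicted probabilities of the positive class.
model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, scores)   # one point per threshold
print("AUC:", roc_auc_score(y_te, scores))
```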