
TYCS RKT COLLEGE ULHASNAGAR ASST. PROF SHREYA TIWARI

UNIT 2: CHAP 4: MODEL EVALUATION AND SELECTION

1) Techniques for evaluating model performance

1. Accuracy: Accuracy is a measure of the overall correctness of the model. It is the ratio of correctly predicted instances to the total instances.
Accuracy = Number of Correct Predictions / Total Number of Predictions

Real-life Example: Email Spam Detection. Consider an email spam detection system. If the system correctly classifies 900 out of 1000 emails as either spam or not spam, the accuracy is 900/1000 = 0.90, or 90%.

2. Precision: Precision is a measure of the accuracy of the positive predictions. It is the ratio of correctly predicted positive observations to the total predicted positives.
Precision = TP / (TP + FP)

Real-life Example: Medical Test for a Disease. Imagine a medical test for a disease. Precision would be the proportion of patients correctly diagnosed with the disease among those predicted to have it. If the test correctly identifies 80 out of 100 patients with the disease, and 20 of the positive predictions were false alarms, precision is 80 / (80 + 20) = 0.80, or 80%.

3. Recall (Sensitivity or True Positive Rate): Recall is a measure of the ability of the model to capture all the relevant instances. It is the ratio of correctly predicted positive observations to all observations in the actual positive class.
Recall = TP / (TP + FN)


Real-life Example: Airport Security Screening. In airport security screening, recall would be the ability of the system to correctly identify all dangerous items (true positives) among all actual dangerous items, even if it means some non-dangerous items are incorrectly flagged. If the system identifies 90 out of 100 dangerous items but misses 10, recall is 90/100 = 0.90, or 90%.

4. F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balance between precision and recall, especially when they have an uneven distribution.
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

Real-life Example: Text Classification. Consider a text classification model that identifies spam messages. The F1-score would be useful when you want to balance the need to correctly identify spam messages (precision) with the need to capture all spam messages (recall).
In summary, accuracy, precision, recall, and F1-score are metrics used to
evaluate the performance of classification models. They provide insights into
different aspects of the model's performance and are chosen based on the
specific goals and requirements of the application.
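As an illustration, here is a minimal sketch (assuming scikit-learn and made-up spam/not-spam labels, not any dataset from these notes) that computes all four metrics:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth labels and model predictions (1 = spam, 0 = not spam)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))   # correct predictions / total predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall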
2) Confusion matrix
In machine learning, classification is the process of categorizing a given set of data into different categories. To measure the performance of a classification model, we use the confusion matrix.
What is a Confusion Matrix?
A confusion matrix is a matrix that summarizes the performance of a machine
learning model on a set of test data. It is a means of displaying the number of
accurate and inaccurate instances based on the model’s predictions. It is often
used to measure the performance of classification models, which aim to predict
a categorical label for each input instance.

The matrix displays the number of instances produced by the model on the test
data.
• True positives (TP): occur when the model predicts positive and the actual value is positive.
• True negatives (TN): occur when the model predicts negative and the actual value is negative.
• False positives (FP): occur when the model predicts positive but the actual value is negative.
• False negatives (FN): occur when the model predicts negative but the actual value is positive.
Why do we need a Confusion Matrix?
When assessing a classification model’s performance, a confusion matrix is
essential. It offers a thorough analysis of true positive, true negative, false
positive, and false negative predictions, facilitating a more profound
comprehension of a model’s recall, accuracy, precision, and overall
effectiveness in class distinction. When there is an uneven class distribution in a
dataset, this matrix is especially helpful in evaluating a model’s performance
beyond basic accuracy metrics.
Let's understand the confusion matrix with an example:
Confusion Matrix for binary classification
A 2x2 confusion matrix is shown below for image recognition of a Dog image vs. a Not Dog image.

                        Actual: Dog             Actual: Not Dog
Predicted: Dog          True Positive (TP)      False Positive (FP)
Predicted: Not Dog      False Negative (FN)     True Negative (TN)

• True Positive (TP): the number of instances where both the predicted and actual values are Dog.
• True Negative (TN): the number of instances where both the predicted and actual values are Not Dog.
• False Positive (FP): the number of instances where the prediction is Dog but the actual value is Not Dog.
• False Negative (FN): the number of instances where the prediction is Not Dog but the actual value is Dog.
Example for binary classification problems


Index   Actual     Predicted   Result
1       Dog        Dog         TP
2       Dog        Not Dog     FN
3       Dog        Dog         TP
4       Not Dog    Not Dog     TN
5       Dog        Dog         TP
6       Not Dog    Dog         FP
7       Dog        Dog         TP
8       Dog        Dog         TP
9       Not Dog    Not Dog     TN
10      Not Dog    Not Dog     TN
• Actual Dog Counts = 6
• Actual Not Dog Counts = 4
• True Positive Counts = 5
• False Positive Counts = 1
• True Negative Counts = 3
• False Negative Counts = 1
                        Actual: Dog               Actual: Not Dog
Predicted: Dog          True Positive (TP = 5)    False Positive (FP = 1)
Predicted: Not Dog      False Negative (FN = 1)   True Negative (TN = 3)
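These counts can also be reproduced programmatically. Below is a minimal sketch assuming scikit-learn, using the ten actual/predicted labels from the worked example above; the labels argument fixes the row and column order of the matrix:

from sklearn.metrics import confusion_matrix

# The ten actual and predicted labels from the worked example above
actual    = ["Dog", "Dog", "Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]

# Rows = actual class, columns = predicted class, ordered as [Dog, Not Dog]
cm = confusion_matrix(actual, predicted, labels=["Dog", "Not Dog"])
print(cm)
# [[5 1]   -> TP = 5, FN = 1
#  [1 3]]  -> FP = 1, TN = 3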

3) ROC/AUC curve

What is the AUC-ROC curve?


The AUC-ROC curve, or Area Under the Receiver Operating
Characteristic curve, is a graphical representation of the performance of a
binary classification model at various classification thresholds. It is
commonly used in machine learning to assess the ability of a model to
distinguish between two classes, typically the positive class (e.g.,
presence of a disease) and the negative class (e.g., absence of a disease).

Receiver Operating Characteristics (ROC) Curve


ROC stands for Receiver Operating Characteristics, and the ROC curve is the
graphical representation of the effectiveness of the binary classification model.
It plots the true positive rate (TPR) vs the false positive rate (FPR) at different
classification thresholds.
Area Under the Curve (AUC):
AUC stands for the Area Under the Curve and represents the area under the ROC curve. It measures the overall performance of the binary classification model. Since both TPR and FPR range from 0 to 1, the area always lies between 0 and 1, and a greater value of AUC denotes better model performance. Our main goal is to maximize this area in order to have the highest TPR and lowest FPR at a given threshold. The AUC measures the probability that the model will assign a randomly chosen positive instance a higher predicted probability than a randomly chosen negative instance.

It represents the probability with which our model can distinguish between the
two classes present in our target.

Key terms used in AUC and ROC Curve


1. TPR and FPR
The ROC curve is a graph that shows the performance of a classification model at all possible thresholds (a threshold is a particular value beyond which you say a point belongs to a particular class). The curve is plotted between two parameters:
• TPR – True Positive Rate: TPR = TP / (TP + FN)
• FPR – False Positive Rate: FPR = FP / (FP + TN)
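
A minimal sketch of computing and plotting the ROC curve and AUC with scikit-learn is given below; the synthetic dataset, logistic regression model, and plotting details are illustrative assumptions, not part of these notes:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Synthetic binary-classification data for illustration
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]         # predicted probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)   # TPR and FPR at each threshold
auc = roc_auc_score(y_test, scores)                # area under the ROC curve

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")           # chance-level diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()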


4) Cross-validation

Cross-validation is a resampling technique used in machine learning to assess the performance of a predictive model. It involves partitioning the dataset into subsets, training the model on one subset (the training set), and evaluating it on the complementary subset (the validation or test set). This process is repeated multiple times, and the performance metrics are averaged over the iterations to obtain a more robust estimate of the model's performance.


K-Fold Cross-Validation:


Definition: K-Fold Cross-Validation is a popular technique where the original dataset is divided into k equal-sized subsets (folds). The model is trained k times, each time using k-1 folds as the training set and the remaining fold as the validation set. The performance metrics are then averaged over the k iterations.
Procedure:
1. Partitioning: Divide the dataset into k equal-sized subsets (folds).
2. Training and Validation: For each fold, train the model on the remaining k-1 folds and validate it on the current fold.
3. Performance Metrics: Compute the performance metrics (e.g., accuracy, precision, recall) on the validation set for each iteration.
4. Average Performance: Average the performance metrics over the k iterations to obtain a more reliable estimate of the model's performance.
Advantages:
• Provides a more accurate estimate of the model's performance compared to a single train-test split.
• Utilizes the entire dataset for both training and validation, reducing bias and variance.
• Helps detect overfitting by evaluating the model on multiple subsets of the data.
Disadvantages:
• Computationally expensive, especially for large datasets and complex models, as it involves training the model multiple times.
• Not suitable for time-series data or data with temporal dependencies, as it may break the temporal structure of the data.
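
A minimal K-Fold sketch using scikit-learn is given below; the Iris dataset and decision tree classifier are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 folds: train on 4 folds, validate on the remaining fold, 5 times in total
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=kf, scoring="accuracy")

print("Per-fold accuracy:", scores)
print("Average accuracy :", scores.mean())   # averaged over the k iterations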
Stratified Cross-Validation:
Definition: Stratified Cross-Validation is a variation of k-fold cross-validation where the class distribution in the dataset is preserved in each fold. This technique is particularly useful for classification problems with imbalanced class distributions.
Procedure:
1. Stratification: Stratify the dataset based on the target variable (class labels) to ensure that each fold has a similar class distribution as the original dataset.
2. Partitioning: Divide the dataset into k equal-sized subsets (folds), ensuring that each fold maintains the class proportions of the original dataset.
3. Training and Validation: For each fold, train the model on the remaining k-1 folds and validate it on the current fold.
4. Performance Metrics: Compute the performance metrics on the validation set for each iteration.
5. Average Performance: Average the performance metrics over the k iterations to obtain a more reliable estimate of the model's performance.
Advantages:
• Ensures that each fold represents the overall class distribution of the dataset, making the evaluation more representative.
• Helps prevent bias in the performance estimate, especially for imbalanced datasets.
Disadvantages:
• May not always be feasible or necessary, especially for well-balanced datasets.
• Can be computationally expensive, especially for large datasets and complex models.
In summary, both k-fold cross-validation and stratified cross-validation are essential techniques for evaluating the performance of machine learning models. The choice between them depends on the specific characteristics of the dataset and the problem at hand, such as class distribution imbalance.
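
A minimal stratified cross-validation sketch using scikit-learn's StratifiedKFold is given below; the imbalanced synthetic dataset and logistic regression model are illustrative assumptions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative imbalanced dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Each validation fold keeps roughly the same 90/10 class proportions as the full dataset
for train_idx, val_idx in skf.split(X, y):
    print("Fold class counts:", np.bincount(y[val_idx]))

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf, scoring="f1")
print("Average F1 over folds:", scores.mean())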

5) Hyperparameter tuning and model selection

Hyperparameter tuning and model selection are crucial steps in the process of
model evaluation and selection, especially in machine learning tasks. Let's delve
into each of these concepts:
Hyperparameter Tuning:
Definition: Hyperparameter tuning involves finding the optimal values for the
hyperparameters of a machine learning model. Hyperparameters are
configuration settings that are set before the learning process begins and control
the learning process itself.
Procedure:
1. Define Hyperparameters: Identify the hyperparameters of the model
that need to be tuned. These could include parameters such as learning
rate, regularization strength, tree depth, etc.

2. Select Search Method: Choose a search method to explore the hyperparameter space. Common methods include grid search, random search, and Bayesian optimization.
3. Define Search Space: Define the range or set of possible values for each
hyperparameter to be explored during the search.
4. Evaluate Performance: Train the model with different combinations of
hyperparameters and evaluate its performance using cross-validation or a
validation set.
5. Select Best Model: Choose the combination of hyperparameters that
results in the best performance metric(s) on the validation set.
6. Test Final Model: Validate the selected model on a separate test set to
obtain an unbiased estimate of its performance.
Importance: Hyperparameter tuning is crucial because the choice of
hyperparameters can significantly impact the performance and generalization
ability of a machine learning model. By finding the optimal values for
hyperparameters, we can improve the model's accuracy and robustness.
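As a concrete sketch of this procedure (assuming scikit-learn's GridSearchCV, a random forest classifier, and a hypothetical parameter grid on an illustrative dataset):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 1 and 3: hyperparameters to tune and their search space (hypothetical grid)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Steps 2, 4, 5: grid search with 5-fold cross-validation, keeping the best combination
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Best CV accuracy    :", search.best_score_)

# Step 6: unbiased estimate on the held-out test set
print("Test accuracy       :", search.score(X_test, y_test))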
Model Selection:
Definition: Model selection involves choosing the best model architecture or
algorithm for a given machine learning task among a set of candidate models.
Procedure:
1. Define Candidate Models: Select a set of candidate models or
algorithms that are suitable for the problem at hand. This could include
various machine learning algorithms (e.g., linear regression, decision
trees, support vector machines) or different architectures of deep learning
models.
2. Train and Evaluate Models: Train each candidate model on the training
data and evaluate its performance using cross-validation or a validation
set.
3. Compare Performance: Compare the performance metrics (e.g.,
accuracy, precision, recall, F1-score) of the candidate models to
determine which one performs best on the validation set.
4. Select Best Model: Choose the model with the highest performance
metric(s) as the final model.


5. Test Final Model: Validate the selected model on a separate test set to
obtain an unbiased estimate of its performance.
Importance: Model selection is critical because different models have different
strengths and weaknesses, and the choice of model can significantly impact the
overall performance of the machine learning system. By comparing and
selecting the best-performing model, we can build a more accurate and effective
predictive model for the task at hand.
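A minimal model-selection sketch comparing a few candidate algorithms with cross-validation is given below; the candidates and dataset are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1: candidate models
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
}

# Steps 2-3: evaluate each candidate with 5-fold cross-validation and compare
results = {name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
           for name, model in candidates.items()}
for name, score in results.items():
    print(f"{name}: {score:.3f}")

# Step 4: select the best-performing candidate
best = max(results, key=results.get)
print("Selected model:", best)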
Here are simplified steps for hyperparameter tuning and model selection:
1. Problem Definition:
• Clearly define the problem you want to solve, such as customer churn
prediction, spam detection, or disease diagnosis.
2. Data Collection:
• Gather relevant data that includes features (input variables) and labels
(output variable) for your problem. Ensure the data is clean and properly
formatted.
3. Split Data:
• Split the dataset into training and test sets. The training set will be used to
train models, and the test set will be used for evaluation.
4. Model Selection:
• Choose candidate machine learning algorithms suitable for your problem.
Consider algorithms like Logistic Regression, Decision Trees, Random
Forest, Support Vector Machines, etc.
5. Hyperparameter Tuning:
• For each selected algorithm, identify hyperparameters to tune. These are
parameters that control the learning process, such as regularization
strength, tree depth, or learning rate.
• Use techniques like Grid Search or Random Search to explore different
combinations of hyperparameters.
• Train models using different hyperparameter configurations on the
training set and evaluate their performance using cross-validation.
6. Evaluation:


• Evaluate the performance of each model using appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score) on the validation set.
• Select the model with the best performance metrics as the final model.
7. Test:
• Validate the selected model on the test set to obtain an unbiased estimate
of its performance.
• Ensure that the model generalizes well to unseen data and performs
consistently.
8. Deployment and Monitoring:
• Deploy the selected model into production to make predictions on new
data.
• Monitor the model's performance over time and periodically re-evaluate
and re-tune hyperparameters if necessary.
By following these steps, you can effectively tune hyperparameters and select
the best-performing model for your machine learning task, ensuring accurate
predictions and successful deployment in real-world applications.
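Tying these steps together, here is a minimal end-to-end sketch; the dataset, candidate models, and hyperparameter grids are all illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Steps 2-3: collect data and split into training and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 4-5: candidate models, each with a small hypothetical hyperparameter grid
candidates = {
    "logreg": (LogisticRegression(max_iter=5000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(random_state=42), {"n_estimators": [100, 200]}),
}

# Step 6: tune each candidate with 5-fold cross-validation and keep the best validation score
best_name, best_search = None, None
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy").fit(X_train, y_train)
    print(name, "best CV accuracy:", search.best_score_)
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

# Step 7: unbiased estimate of the selected model on the held-out test set
print("Selected:", best_name, "| test accuracy:", best_search.score(X_test, y_test))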
