Unit 2 Chap 4
Real-life Example: Medical Test for a Disease
Imagine a medical test for a disease. Precision is the proportion of patients correctly diagnosed with the disease among all those predicted to have it. If the test correctly identifies 80 out of 100 patients predicted to have the disease, and the remaining 20 positive predictions were false alarms, precision is 80 / (80 + 20) = 0.80, i.e. 80%.
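Worked out in code with the numbers above (80 true positives, 20 false positives):

```python
# Precision = TP / (TP + FP) for the medical-test example.
tp = 80  # patients correctly diagnosed with the disease
fp = 20  # positive predictions that were false alarms

precision = tp / (tp + fp)
print(precision)  # 0.8
```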
The matrix displays the counts of each prediction outcome produced by the model on the test data:
• True positives (TP): the model correctly predicts the positive class for a data point.
• True negatives (TN): the model correctly predicts the negative class for a data point.
• False positives (FP): the model predicts positive, but the actual class is negative.
• False negatives (FN): the model predicts negative, but the actual class is positive.
Why do we need a Confusion Matrix?
When assessing a classification model’s performance, a confusion matrix is essential. It breaks predictions down into true positives, true negatives, false positives, and false negatives, giving a clearer picture of a model’s accuracy, precision, recall, and overall ability to distinguish between classes. This matrix is especially helpful when the dataset has an uneven class distribution, where basic accuracy alone can be misleading.
Let’s understand the confusion matrix with an example.
Confusion Matrix for Binary Classification
A 2x2 confusion matrix is shown below for an image-recognition task that labels each image as Dog or Not Dog.
Index:   1   2   3   4   5   6   7   8   9   10
Result:  TP  FN  TP  TN  TP  FP  TP  TP  TN  TN
• Actual Dog Counts = 6
• Actual Not Dog Counts = 4
• True Positive Counts = 5
• False Positive Counts = 1
• True Negative Counts = 3
• False Negative Counts = 1
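The counts above can be tallied directly from the per-instance results, and the standard metrics derived from them (a minimal sketch in plain Python):

```python
# Tally prediction outcomes from the Dog / Not Dog example above.
results = ["TP", "FN", "TP", "TN", "TP", "FP", "TP", "TP", "TN", "TN"]

tp = results.count("TP")  # 5
fn = results.count("FN")  # 1
fp = results.count("FP")  # 1
tn = results.count("TN")  # 3

# Metrics derived from the four counts.
accuracy = (tp + tn) / len(results)  # (5 + 3) / 10 = 0.8
precision = tp / (tp + fp)           # 5 / 6
recall = tp / (tp + fn)              # 5 / 6

print(tp, fn, fp, tn, accuracy)
```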
The AUC (Area Under the ROC Curve) represents the probability with which our model can distinguish between the two classes present in our target.
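This probabilistic interpretation (commonly associated with the ROC AUC) can be computed directly as the fraction of (positive, negative) pairs where the positive example receives the higher score; the labels and scores below are illustrative:

```python
# AUC as the probability that a randomly chosen positive example
# scores higher than a randomly chosen negative one (ties count 0.5).

def auc_by_pairs(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(auc_by_pairs(labels, scores))
```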
4) Cross-validation
K-Fold Cross-Validation:
Advantages:
• Utilizes the entire dataset for both training and validation, reducing bias and variance.
• Helps detect overfitting by evaluating the model on multiple subsets of the data.
Disadvantages:
• Computationally expensive, especially for large datasets and complex models, as it involves training and evaluating the model k times.
MAIL ID: [email protected] Contact No. 9619374538
TYCS RKT COLLEGE ULHASNAGAR ASST. PROF SHREYA TIWARI
3. Training and Validation: For each fold, train the model on the remaining k-1 folds and validate it on the current fold.
4. Performance Metrics: Compute the performance metrics on the validation set for each iteration.
5. Average Performance: Average the performance metrics over the k iterations to obtain a more reliable estimate of the model's performance.
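The steps above can be sketched in plain Python; the toy "model" (mean of the training labels) and scoring function are illustrative stand-ins, and in practice a library helper such as scikit-learn's KFold would be used:

```python
# Minimal k-fold cross-validation sketch (illustrative data and model).
# Step 3: train on k-1 folds, validate on the held-out fold.
# Steps 4-5: collect one metric per fold, then average.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        extra = 1 if i < n % k else 0
        folds.append(list(range(start, start + fold_size + extra)))
        start += fold_size + extra
    return folds

def cross_validate(data, k, train_fn, score_fn):
    folds = k_fold_indices(len(data), k)
    scores = []
    for i in range(k):
        valid = [data[j] for j in folds[i]]
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_fn(train)                # step 3: fit on k-1 folds
        scores.append(score_fn(model, valid))  # step 4: metric per fold
    return sum(scores) / k                     # step 5: average performance

# Toy example: the "model" is the mean of the training values,
# scored by negative mean absolute error on the validation fold.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_fn = lambda xs: sum(xs) / len(xs)
score_fn = lambda m, xs: -sum(abs(x - m) for x in xs) / len(xs)
print(cross_validate(data, 3, train_fn, score_fn))
```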
Advantages:
• Ensures that each fold represents the overall class distribution of the dataset, making the performance estimate more reliable, especially for imbalanced datasets.
Hyperparameter tuning and model selection are crucial steps in the process of
model evaluation and selection, especially in machine learning tasks. Let's delve
into each of these concepts:
Hyperparameter Tuning:
Definition: Hyperparameter tuning involves finding the optimal values for the
hyperparameters of a machine learning model. Hyperparameters are
configuration settings that are set before the learning process begins and control
the learning process itself.
Procedure:
1. Define Hyperparameters: Identify the hyperparameters of the model
that need to be tuned. These could include parameters such as learning
rate, regularization strength, tree depth, etc.
5. Test Final Model: Validate the selected model on a separate test set to
obtain an unbiased estimate of its performance.
Importance: Model selection is critical because different models have different
strengths and weaknesses, and the choice of model can significantly impact the
overall performance of the machine learning system. By comparing and
selecting the best-performing model, we can build a more accurate and effective
predictive model for the task at hand.
Here are simplified steps for hyperparameter tuning and model selection:
1. Problem Definition:
• Clearly define the problem you want to solve, such as customer churn
prediction, spam detection, or disease diagnosis.
2. Data Collection:
• Gather relevant data that includes features (input variables) and labels
(output variable) for your problem. Ensure the data is clean and properly
formatted.
3. Split Data:
• Split the dataset into training and test sets. The training set will be used to
train models, and the test set will be used for evaluation.
4. Model Selection:
• Choose candidate machine learning algorithms suitable for your problem.
Consider algorithms like Logistic Regression, Decision Trees, Random
Forest, Support Vector Machines, etc.
5. Hyperparameter Tuning:
• For each selected algorithm, identify hyperparameters to tune. These are
parameters that control the learning process, such as regularization
strength, tree depth, or learning rate.
• Use techniques like Grid Search or Random Search to explore different
combinations of hyperparameters.
• Train models using different hyperparameter configurations on the
training set and evaluate their performance using cross-validation.
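A minimal Grid Search loop for step 5 might look like the sketch below; the hyperparameter names and the scoring function are illustrative stand-ins for a real model's hyperparameters and its cross-validated score:

```python
import itertools

# Hypothetical hyperparameter grid (names are illustrative).
param_grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "max_depth": [2, 4, 8],
}

def cv_score(params):
    """Stand-in for 'train with these params and cross-validate'.
    This toy score peaks at learning_rate=0.1, max_depth=4."""
    return (-abs(params["learning_rate"] - 0.1)
            - abs(params["max_depth"] - 4) / 10)

# Exhaustively try every combination and keep the best-scoring one.
best_params, best_score = None, float("-inf")
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = cv_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'learning_rate': 0.1, 'max_depth': 4}
```

Random Search follows the same pattern, but samples configurations from the grid at random instead of enumerating all of them.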
6. Evaluation: