
Unit – V: Model Evaluation

Introduction

The generalization of a machine learning model is its ability to classify or forecast new data. When we train a model on a dataset and then provide it with new data that was absent from the training set, the model may still perform well. Such a model is generalizable. It does not have to work on all data types, only on data from similar domains or datasets.

What Is Unseen Data?

It is important to understand what unseen data is. Unseen data is data that is new to the model and was not part of its training. Models naturally perform better on observations they have seen before, so the real benefit comes from models that can perform well even on unseen data.

Benefits of Generalization

Generalization can also be seen as a process for improving performance. Deep learning models can analyze and learn patterns present in datasets, but they are also prone to overfitting. Generalization techniques help manage this overfitting so that the model does not fit the training data too rigidly, which allows it to predict patterns it has not seen before. In short, generalization describes how well a model makes correct predictions on new data after being trained on the training set.

Elements for Generalization of Models

Since generalization is usually an advantage, it is worth examining some of the factors that influence it during the model design cycle.

Nature of the Algorithm (Model-Centric Approach)

All models behave differently: how they treat data and how they optimize their performance varies. Decision trees, for example, are non-parametric, which makes them prone to overfitting. To address generalization, the nature of the algorithm should be considered deliberately. High model complexity often comes at a cost: the more complex a model is, the more easily it overfits. Regularization can strike a balance, achieving generalization while avoiding overfitting. For deep networks, changing the network structure by reducing the number of weights, or constraining the network parameters (the values of the weights), can do the trick.
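
As an illustration, here is a minimal sketch (assuming scikit-learn and a synthetic dataset; the network size and alpha values are arbitrary demonstration choices, not taken from the text) of how an L2 penalty constrains the weights of a small network:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Synthetic binary classification data (hypothetical example)
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The same network without regularization (alpha=0) and with an
    # L2 penalty (alpha=1.0) that shrinks the weight values.
    for alpha in [0.0, 1.0]:
        net = MLPClassifier(hidden_layer_sizes=(100,), alpha=alpha,
                            max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        print(f"alpha={alpha}: train accuracy {net.score(X_train, y_train):.2f}, "
              f"test accuracy {net.score(X_test, y_test):.2f}")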

The Nature of the Dataset

The other side is the dataset used for training. Sometimes datasets are too uniform, with samples that differ little from each other. A dataset of bicycles, for example, may be so uniform that it cannot be used to detect motorcycles. To achieve a generalized machine learning model, the dataset should contain diversity: adding many different kinds of samples provides a wide range to learn from and helps the model train toward the best achievable generalization. During training, we can also use cross-validation techniques such as K-fold (see the sketch below) to check that the model remains sound while we target generalization.
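
A minimal sketch (assuming scikit-learn; the five-fold split and toy array are arbitrary demonstration choices) of how K-fold cross-validation rotates which part of the data is held out:

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)  # ten toy samples (hypothetical data)

    # Each iteration trains on four folds and validates on the fifth.
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, val_idx) in enumerate(kf.split(X)):
        print(f"fold {i}: train on {train_idx}, validate on {val_idx}")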

Non-Generalization of Models

Sometimes models do not require generalization; they should do only what they are strictly expected to do. Whether that is best depends on the application. I may want my model, trained on images of motorcycles, to identify all similar vehicles, including bicycles and even wheelchairs; that would make it very robust. In another application this may be undesirable: we may want a model trained on motorcycles to identify motorcycles strictly and not bicycles, for example to count the motorcycles in a parking lot while ignoring the bicycles.

Using the factors above, we can decide and control when we do or do not want generalization. Since generalization carries risks, specialized non-generalized models are preferable when the means are available: a separate model can be developed for bicycles and another for wheelchairs. When resources such as time and data are scarce, generalization techniques can then be utilized.

Non-Generalization/Generalization and Overfitting of Models

Non-generalization is closely related to overfitting: a non-generalizable model is often an overfitted one, and once overfitting is resolved, generalization becomes more achievable. We do not want an overfitted model, one that has memorized the training dataset and knows nothing else; it performs well on the training data but not on new inputs. Another case is the underfit model, which does not understand the problem, performs poorly even on the training dataset, and does not perform on new inputs either. We do not want this either. The remaining scenario is the good fit model, which appropriately learns the training dataset and generalizes to new inputs.

A good fit is what we need to target when we want a model that can be generalized.

Sample Evaluation Metrics

Evaluation metrics measure the performance of a machine learning model and are an integral component of any data science project. They aim to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data.

Confusion Matrix

A confusion matrix is a matrix representation of the prediction results of a binary classification test. It is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.

The confusion matrix itself is relatively simple to understand, but the related terminology can be
confusing.
Each prediction falls into one of four outcomes, based on how it matches up to the actual value:

● True Positive (TP): Predicted True and True in reality.

● True Negative (TN): Predicted False and False in reality.

● False Positive (FP): Predicted True and False in reality.

● False Negative (FN): Predicted False and True in reality.
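
As a quick illustration, here is a sketch (assuming scikit-learn; the label vectors are made up for demonstration) of obtaining these four counts from predictions:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual values (hypothetical)
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (hypothetical)

    # For binary labels, ravel() flattens the 2x2 matrix as TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1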

Now let us understand this concept using hypothesis testing.

A hypothesis is a speculation or theory, based on insufficient evidence, that lends itself to further testing and experimentation. With further testing, a hypothesis can usually be proven true or false.

A Null Hypothesis is a hypothesis that says there is no statistical significance between the two
variables in the hypothesis. It is the hypothesis that the researcher is trying to disprove.

Ideally, we reject the null hypothesis when it is false, and we accept the null hypothesis when it is indeed true.

Even though hypothesis tests are meant to be reliable, there are two types of errors that can
occur.

These errors are known as Type I and Type II errors.

For example, when examining the effectiveness of a drug, the null hypothesis would be that the
drug does not affect a disease.

Type I Error: equivalent to False Positives (FP).

The first kind of error involves the rejection of a null hypothesis that is actually true.

Type II Error: equivalent to False Negatives (FN).

The second kind of error occurs when we accept a null hypothesis that is actually false. This sort of error is also referred to as an error of the second kind.

Accuracy

Overall, how often is the classifier correct?

Accuracy = (TP+TN)/total

When our classes are roughly equal in size, we can use accuracy, since it reports the proportion of correctly classified values.

Accuracy is a common evaluation metric for classification problems. It’s the number of correct
predictions made as a ratio of all predictions made.

Misclassification Rate (Error Rate): Overall, how often is it wrong? Since accuracy is the percentage we correctly classified (the success rate), it follows that the error rate (the percentage we got wrong) can be calculated as follows:

Misclassification Rate = (FP+FN)/total

Precision

When it predicts yes, how often is it correct?

Precision = TP/predicted yes = TP/(TP+FP)

Recall or Sensitivity

When it’s actually yes, how often does it predict yes?

True Positive Rate = TP/actual yes = TP/(TP+FN)

Recall gives us the true positive rate (TPR), which is the ratio of true positives to everything
positive.
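
Putting these formulas together, a minimal sketch (reusing the hypothetical counts from the confusion matrix example above):

    # Hypothetical counts from a confusion matrix
    tp, tn, fp, fn = 3, 3, 1, 1
    total = tp + tn + fp + fn

    accuracy = (tp + tn) / total    # share of all predictions that are correct
    error_rate = (fp + fn) / total  # share of all predictions that are wrong
    precision = tp / (tp + fp)      # of everything predicted yes
    recall = tp / (tp + fn)         # of everything actually yes

    print(f"accuracy={accuracy:.2f}, error={error_rate:.2f}, "
          f"precision={precision:.2f}, recall={recall:.2f}")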
What is Cross-Validation?

Cross validation is a technique used in machine learning to evaluate the performance of a model
on unseen data. It involves dividing the available data into multiple folds or subsets, using one of
these folds as a validation set, and training the model on the remaining folds. This process is
repeated multiple times, each time using a different fold as the validation set. Finally, the results
from each validation step are averaged to produce a more robust estimate of the model’s
performance. Cross validation is an important step in the machine learning process and helps to
ensure that the model selected for deployment is robust and generalizes well to new data.
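
A minimal sketch of this procedure (assuming scikit-learn; the logistic regression model and five folds are arbitrary choices for demonstration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, random_state=0)  # synthetic data

    # Train on four folds and validate on the fifth, rotating five times,
    # then average the five scores into one more robust estimate.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(f"fold scores: {scores}")
    print(f"mean accuracy: {scores.mean():.2f}")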

What is cross-validation used for?

The main purpose of cross validation is to prevent over fitting, which occurs when a model is
trained too well on the training data and performs poorly on new, unseen data. By evaluating the
model on multiple validation sets, cross validation provides a more realistic estimate of the
model’s generalization performance, i.e., its ability to perform well on new, unseen data.

Types of Cross-Validation

There are several types of cross-validation techniques, including k-fold cross-validation, leave-one-out cross-validation, holdout validation, and stratified cross-validation. The choice of technique depends on the size and nature of the data, as well as the specific requirements of the modeling problem.

Underfitting and Overfitting

When we talk about a machine learning model, we are really talking about how well it performs, that is, its accuracy and its prediction errors. Suppose we are designing a machine learning model. A model is said to be a good machine learning model if it generalizes any new input data from the problem domain in a proper way; this lets us make predictions about future data that the model has never seen. To check how well a machine learning model learns and generalizes to new data, we look at overfitting and underfitting, which are largely responsible for the poor performance of machine learning algorithms.

Bias and Variance in Machine Learning


● Bias: Bias refers to the error due to overly simplistic assumptions in the learning algorithm. These assumptions make the model easier to comprehend and learn but might not capture the underlying complexities of the data. It is the error due to the model’s inability to represent the true relationship between input and output accurately. When a model performs poorly on both the training and testing data, it has high bias caused by an overly simple model, indicating underfitting.

● Variance: Variance, on the other hand, is the error due to the model’s sensitivity to
fluctuations in the training data. It’s the variability of the model’s predictions for different
instances of training data. High variance occurs when a model learns the training data’s
noise and random fluctuations rather than the underlying pattern. As a result, the model
performs well on the training data but poorly on the testing data, indicating overfitting.

Underfitting

A statistical model or a machine learning algorithm is said to underfit when it is too simple to capture the complexities of the data. Underfitting represents the model’s inability to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an underfit model is inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need more complex models, enhanced feature representation, and less regularization.

Note: An underfitting model has high bias and low variance.

Reasons for Underfitting

1. The model is too simple, so it may not be capable of representing the complexities in the data.

2. The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.

3. The size of the training dataset is not large enough.

4. Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.

5. Features are not scaled.

Techniques to Reduce Underfitting

1. Increase model complexity (see the sketch after this list).

2. Increase the number of features, performing feature engineering.

3. Remove noise from the data.

4. Increase the number of epochs or increase the duration of training to get better results.
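
For instance, here is a minimal sketch (assuming scikit-learn; the cubic data and polynomial degree are arbitrary demonstration choices) of reducing underfitting by increasing model complexity with added polynomial features:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = X.ravel() ** 3 + rng.normal(scale=1.0, size=200)  # synthetic cubic signal

    # A plain line underfits a cubic relationship; polynomial features
    # give the model enough complexity to capture it.
    line = LinearRegression().fit(X, y)
    cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)
    print(f"linear R^2: {line.score(X, y):.2f}")   # low score: underfit
    print(f"cubic  R^2: {cubic.score(X, y):.2f}")  # high score: good fit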

Overfitting

A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained on too much detail, it starts learning from the noise and inaccurate entries in the data set, and testing on new data then reveals high variance. The model fails to categorize the data correctly because it has absorbed too many details and too much noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Solutions include using a linear algorithm if the data is linear, or constraining parameters such as the maximal depth when using decision trees.

Reasons for Overfitting:

1. High variance and low bias.

2. The model is too complex.

3. The training dataset is too small.

Techniques to Reduce Overfitting

1. Increase training data.

2. Reduce model complexity.


3. Early stopping during the training phase (monitor the loss during training and stop as soon as the validation loss begins to increase; see the sketch after this list).

4. Ridge Regularization and Lasso Regularization.

5. Use dropout for neural networks to tackle overfitting.
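
As one example of early stopping, a minimal sketch (assuming scikit-learn's SGDClassifier; the hyperparameter values are arbitrary demonstration choices):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=1000, random_state=0)  # synthetic data

    # Hold out 10% of the training data as a validation set and stop
    # training once the validation score stops improving for 5 epochs.
    clf = SGDClassifier(early_stopping=True, validation_fraction=0.1,
                        n_iter_no_change=5, max_iter=1000, random_state=0)
    clf.fit(X, y)
    print(f"stopped after {clf.n_iter_} epochs")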

Lasso Regression vs. Ridge Regression

● Penalty term: Lasso penalizes the sum of the absolute values of the coefficients (L1); Ridge penalizes the sum of the squared coefficients (L2).

● Coefficient shrinkage: Lasso applies strong shrinkage and can drive coefficients to exactly zero; Ridge applies moderate shrinkage, leaving coefficients close to zero.

● Feature selection: Lasso automatically selects relevant features; Ridge retains all features and reduces the impact of the less important ones.

● Interpretability: Lasso can provide a sparse model with selected features; Ridge retains all features, giving a less sparse model.

● Bias-variance trade-off: Lasso is more biased but has less variance; Ridge is less biased but has more variance.

● Computational complexity: Lasso can be computationally expensive; Ridge is generally less computationally expensive.
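
To see the shrinkage difference in practice, a minimal sketch (assuming scikit-learn; the synthetic data and alpha value are arbitrary demonstration choices):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    # Synthetic data where only 5 of 20 features actually matter.
    X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)

    # Lasso drives irrelevant coefficients to exactly zero;
    # Ridge only shrinks them toward zero.
    print(f"Lasso zero coefficients: {np.sum(lasso.coef_ == 0)} of 20")
    print(f"Ridge zero coefficients: {np.sum(ridge.coef_ == 0)} of 20")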

When to Use Ridge Regression?


Ridge regression is useful in several scenarios where linear regression is applied. Here are some
situations when Ridge regression can be beneficial:

● Multicollinearity: When the independent variables in a regression model are highly correlated, it becomes challenging to estimate their individual effects accurately. Ridge regression addresses this issue by adding a regularization term that reduces the impact of multicollinearity. It shrinks the regression coefficients, preventing them from taking extreme values and improving the stability of the model.

● Overfitting: Overfitting occurs when a regression model performs well on the training
data but fails to generalize well to new, unseen data. It often happens when the model
becomes too complex, capturing noise or irregularities specific to the training set. Ridge
regression helps mitigate overfitting by adding a penalty term that discourages large
coefficient values. By shrinking the coefficients, it reduces the complexity of the model
and improves its generalization ability.

● High-Dimensional Datasets: In datasets with many features relative to the number of observations, traditional regression models may need a larger sample size. Ridge regression can handle such high-dimensional datasets effectively. Shrinking the coefficients prevents individual predictors from dominating the model and reduces the risk of overfitting, even in cases with fewer observations than predictors.

● Prediction Accuracy: When the main objective is accurate prediction rather than
interpreting individual coefficients, ridge regression can be advantageous. By reducing
the variance of coefficient estimates, it enhances the stability of the model, resulting in
improved prediction performance on new data.

● Bias-Variance Trade-off: Ridge regression allows control over the bias-variance trade-off. In linear regression, reducing the bias (making the model more flexible) often leads to increased variance (model sensitivity to fluctuations in the training data). Ridge regression introduces a regularization parameter, often denoted as lambda (λ), that controls the amount of regularization applied. By tuning this parameter, you can balance bias and variance, choosing a model that optimally fits the data (a tuning sketch follows below).
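
Here is a minimal sketch of tuning that parameter (assuming scikit-learn, where lambda is exposed as alpha; the candidate grid is an arbitrary demonstration choice):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import RidgeCV

    X, y = make_regression(n_samples=100, n_features=20, noise=10.0,
                           random_state=0)  # synthetic data

    # RidgeCV cross-validates each candidate alpha (lambda) and keeps
    # the value that best balances bias and variance.
    model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
    print(f"chosen alpha: {model.alpha_}")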

FAQ

1. How can overfitting and underfitting impact the reliability of predictions?


2. Define precision and recall in the context of binary classification. When might
you prioritize precision over recall, and vice versa?
3. What is underfitting, and how does it differ from overfitting? How does model
complexity impact the likelihood of underfitting or overfitting?
4. Explain the purpose of Ridge Regression in the context of linear regression.
5. Define grid search and its role in hyperparameter tuning. What are the
advantages and disadvantages of using grid search?
6. Explain the concepts of model parameters and hyperparameters in data science.
