Bias Variance
What is Variance?
The variability of a model's prediction for a given data point, which tells us the
spread of our predictions, is called the variance of the model. A model with high
variance has a very complex fit to the training data and is therefore not able to
predict accurately on data it hasn't seen before. As a result, such models perform
very well on training data but have high error rates on test data. When a model
has high variance, it is said to be overfitting the data.
Overfitting means fitting the training set very accurately with a complex curve or
high-order hypothesis, but this is not a solution, because the error on unseen data
remains high. While training a model, variance should therefore be kept low.
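In place of a plot, here is a minimal sketch of what a high-variance fit looks like (assuming NumPy is available; the sine curve, noise level, and polynomial degree are arbitrary illustrative choices, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a simple underlying curve (illustrative choice)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 200)

# A deliberately over-flexible polynomial: complex enough to chase the noise
coeffs = np.polyfit(x_train, y_train, deg=12)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.4f}")  # typically near zero: the curve hugs the training points
print(f"test MSE:  {test_mse:.4f}")   # typically much larger: the fit does not generalize
```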
The best fit is given by the hypothesis at the tradeoff point. An error-versus-complexity graph shows this tradeoff:

[Figure: total error vs. model complexity, marking the region for the least value of total error]

This region is the best point at which to train the algorithm, giving low error on
training as well as test data.
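One way to locate that tradeoff region empirically is to sweep model complexity and watch where test error bottoms out. The sketch below (the same illustrative NumPy setup as above, with polynomial degree standing in for complexity) is one such sweep, not a prescribed procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 30)
x_test = np.sort(rng.uniform(0, 1, 300))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 300)

# Sweep polynomial degree as a stand-in for model complexity
for deg in range(1, 13):
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}  train {train_mse:.3f}  test {test_mse:.3f}")

# Training error keeps falling as complexity grows; test error falls and then
# rises again. The degree with the lowest test error marks the tradeoff region.
```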
Whenever we discuss model prediction, it is important to understand the prediction
errors: bias and variance. There is a tradeoff between a model's ability to
minimize bias and its ability to minimize variance. A proper understanding of
these errors helps us not only to build accurate models but also to avoid the
mistakes of overfitting and underfitting.
So let's start with the basics and see how they make a difference to our machine
learning models.
What is bias?
Bias is the difference between the average prediction of our model and the correct
value we are trying to predict. A model with high bias pays very little attention
to the training data and oversimplifies the model. It always leads to high error
on both training and test data.
What is variance?
Variance is the variability of model prediction for a given data point or value,
which tells us the spread of our predictions. A model with high variance pays a
lot of attention to the training data and does not generalize to data it hasn't
seen before. As a result, such models perform very well on training data but have
high error rates on test data.
Mathematically
Let the variable we are trying to predict be Y and the other covariates be X. We
assume there is a relationship between the two such that

Y = f(X) + e

where e is the error term, normally distributed with a mean of 0.
We build a model f^(X) of f(X) using linear regression or any other modeling
technique.
So the expected squared error at a point x is

Err(x) = E[(Y - f^(x))^2]

which decomposes into

Err(x) = (E[f^(x)] - f(x))^2 + E[(f^(x) - E[f^(x)])^2] + σ_e^2

Err(x) = Bias^2 + Variance + Irreducible Error
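This decomposition can be checked numerically. The following sketch (a toy simulation where we choose the true f and the noise level ourselves, so both are assumptions of the example) refits a model on many fresh training sets and estimates bias^2 and variance at a fixed point x:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):                  # the true f(X), known here because we generate the data
    return np.sin(2 * np.pi * x)

sigma_e = 0.3              # standard deviation of the noise term e
x0 = 0.3                   # the fixed query point x
degree = 3                 # complexity of the fitted model f^(X)

# Refit f^ on many independently drawn training sets and predict at x0
preds = []
for _ in range(2000):
    x = rng.uniform(0, 1, 40)
    y = f(x) + rng.normal(0, sigma_e, 40)
    coeffs = np.polyfit(x, y, degree)
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2      # (E[f^(x)] - f(x))^2
variance = preds.var()                     # E[(f^(x) - E[f^(x)])^2]
total = bias_sq + variance + sigma_e ** 2  # Err(x) = Bias^2 + Variance + sigma_e^2
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, Err(x0) = {total:.4f}")
```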
[Figure: bulls-eye diagram; the center of the target represents perfect predictions, each hit one model's prediction]

In this diagram, the center of the target is a model that perfectly predicts the
correct values. As we move away from the bulls-eye, our predictions get worse and
worse. We can repeat our process of model building to get separate hits on the
target.
In supervised learning, underfitting happens when a model is unable to capture the
underlying pattern of the data. Such models usually have high bias and low
variance. It happens when we have too little data to build an accurate model, or
when we try to fit a linear model to nonlinear data. Models of this kind, such as
linear and logistic regression, are too simple to capture the complex patterns in
the data.
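As a quick illustration (again a toy NumPy sketch with an arbitrary sine-shaped ground truth), fitting a straight line to clearly nonlinear data shows the underfitting signature: train and test error both high and roughly equal.

```python
import numpy as np

rng = np.random.default_rng(7)
x_train = rng.uniform(0, 1, 50)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 50)
x_test = rng.uniform(0, 1, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.1, 50)

# A straight line (degree-1 polynomial) is too simple to follow the sine curve
coeffs = np.polyfit(x_train, y_train, 1)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
# Both errors come out large and similar: the signature of high bias, low variance.
```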
In supervised learning, overfitting happens when our model captures the noise
along with the underlying pattern in the data. It happens when we train our model
for too long on a noisy dataset. These models have low bias and high variance.
They tend to be very complex models, like decision trees, which are prone to
overfitting.
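Since the text names decision trees, here is a brief sketch (assuming scikit-learn is installed; the dataset is synthetic and illustrative) of an unconstrained tree memorizing noise, with a depth-capped tree shown for contrast:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X_train = rng.uniform(0, 1, (100, 1))
y_train = np.sin(2 * np.pi * X_train[:, 0]) + rng.normal(0, 0.3, 100)
X_test = rng.uniform(0, 1, (100, 1))
y_test = np.sin(2 * np.pi * X_test[:, 0]) + rng.normal(0, 0.3, 100)

# An unconstrained tree grows until it memorizes every noisy training label
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# Capping depth trades a little bias for much lower variance
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", deep), ("depth-3", shallow)]:
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
# The unconstrained tree reaches near-zero training error but a much higher
# test error, while the shallow tree keeps the two closer together.
```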