Machine Learning - Unit 3
Bias is a phenomenon that skews the result of an algorithm in favour of or against an idea. Bias is considered a systematic error that occurs in the machine learning model itself due to incorrect assumptions made in the ML process. Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. (Figure: illustration of bias.)
Bias and variance are components of reducible error. Reducing error requires selecting models that have appropriate complexity and flexibility, as well as suitable training data.
Low bias: A low-bias model makes fewer assumptions about the form of the target function.
High bias: A high-bias model makes more assumptions, and as a result it is unable to capture the important features of the dataset. A high-bias model also cannot perform well on new data.
Examples of machine learning algorithms with low bias are decision trees, k-nearest neighbours and support vector machines. Algorithms with high bias include linear regression, linear discriminant analysis and logistic regression.
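A minimal Python sketch (not part of the original notes; the synthetic data and model settings are assumptions for illustration) contrasting a high-bias algorithm (linear regression) with a low-bias algorithm (a decision tree) on non-linear data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic non-linear data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

linear = LinearRegression().fit(X, y)                 # high bias: assumes a straight line
tree = DecisionTreeRegressor(max_depth=6).fit(X, y)   # low bias: few assumptions about the form

print("Linear regression training MSE:", mean_squared_error(y, linear.predict(X)))
print("Decision tree training MSE:   ", mean_squared_error(y, tree.predict(X)))

The high-bias model cannot capture the sine-shaped trend no matter how much data it sees, which is exactly the behaviour described above.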
Q.3 Define variance. Explain low and high variance. How can high variance be reduced?
Ans. Variance indicates how much the estimate of the target function will alter if different
training data were used. In other words, variance describes how much a random variable
differs from its expected value.
* Low variance means there is a small variation in the prediction of the target function
with changes in the training data set. High variance shows a large variation in the
prediction of the target function with changes in the training dataset.
* Variance comes from highly complex models with a large number of features.
* In supervised learning, the class value assigned by the learning model built from the training data may differ from the actual class value. This error in learning can be of two types: errors due to 'bias' and errors due to 'variance'.
1. Low bias, low variance: The combination of low bias and low variance is an ideal machine learning model. However, it is not achievable in practice.
2. Low bias, high variance: With low bias and high variance, model predictions are inconsistent but accurate on average. This case occurs when the model learns with a large number of parameters, and it leads to overfitting.
3. High bias, low variance: With high bias and low variance, predictions are consistent but inaccurate on average. This case occurs when a model does not learn well from the training dataset or uses very few parameters. It leads to underfitting problems in the model.
4. High bias, high variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.
Interpretation (worked example): if the variance of a set of exam scores is 50, then on average each score deviates from the mean score by approximately √50 ≈ 7.07 points (the standard deviation).
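A small sketch (the exam scores below are made-up, chosen only for illustration) showing how a variance value is turned into a typical deviation via the square root:

import numpy as np

scores = np.array([65, 72, 80, 58, 75, 70, 62, 78])  # hypothetical exam scores
variance = np.var(scores)            # population variance
std_dev = np.sqrt(variance)          # standard deviation = sqrt(variance)

print(f"Variance: {variance:.2f}")
print(f"Typical deviation from the mean: {std_dev:.2f} points")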
Underfitting
A statistical model or a machine learning algorithm is said to be underfitting when it cannot capture the underlying trend of the data.
Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or the algorithm does not fit the data well enough.
It usually happens when we have too little data to build an accurate model, or when we try to build a linear model from non-linear data.
In such cases the rules of the machine learning model are too simple and flexible to be applied to such minimal data, and therefore the model will probably make a lot of wrong predictions.
Underfitting can be avoided by using more data and also by reducing the number of features through feature selection. In a nutshell, underfitting means high bias and low variance.
Underfitting examples:
1. The learning time may be prohibitively large, so the learning stage was terminated prematurely.
2. The learner did not use a sufficient number of iterations.
3. The learner tries to fit a straight line to a training set whose examples exhibit a quadratic
nature.
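A short sketch of example 3 above (the quadratic data are synthetic, assumed for illustration): fitting a straight line to data that actually follows a quadratic curve leaves a large training error, the classic symptom of underfitting.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1, size=100)   # quadratic relationship

line = LinearRegression().fit(x, y)               # model too simple for this data
print("Training MSE of the straight line:", mean_squared_error(y, line.predict(x)))
# High error even on the training data itself indicates high bias (underfitting).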
Overfitting
A statistical model is said to be overfitted when it is trained so much on the training data that it starts learning from the noise and inaccurate data entries in our data set.
The model then does not categorize the data correctly, because of too many details and noise.
Common causes of overfitting are non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore end up building unrealistic models.
A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to use parameters such as the maximal depth if we are using decision trees.
In a nutshell, Overfitting - High variance and low bias.
Example:
Suppose we want to predict exam scores based on hours studied.
Dataset: pairs of hours studied (x) and the corresponding exam score.
Model: we use a complex polynomial model: Score = 2x^3 - 5x^2 + 3x + 1
Problem: the model fits the training data perfectly, but it is too complex.
Overfitting: the model overfits the data because it is too complex and captures noise.
Solution: use regularization techniques, simplify the model, or collect more data.
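A hedged sketch of the idea above (the hours-studied data are synthetic, and the polynomial degrees and alpha value are arbitrary choices): a very high-degree polynomial reaches a tiny training error but a much larger test error, while a simpler regularized model generalizes better.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
hours = rng.uniform(0, 10, size=30).reshape(-1, 1)
score = 5 * hours.ravel() + 40 + rng.normal(0, 5, size=30)   # roughly linear ground truth

X_tr, X_te, y_tr, y_te = train_test_split(hours, score, test_size=0.3, random_state=0)

complex_model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
simple_model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))

for name, model in [("degree-9 polynomial", complex_model),
                    ("degree-2 + ridge   ", simple_model)]:
    model.fit(X_tr, y_tr)
    print(name,
          "train MSE:", round(mean_squared_error(y_tr, model.predict(X_tr)), 1),
          "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 1))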
Ans: Reducing underfitting and overfitting in machine learning requires carefully selecting and
tuning techniques during model training. Here's a brief explanation of techniques to address each
issue:
1. Reducing Underfitting:
Underfitting occurs when a model is too simple to capture the underlying patterns in the data.
Increase Model Complexity: Use more complex models or add more layers/neurons for neural
networks.
Feature Engineering: Add relevant features or use polynomial features to capture more variance
in data.
Decrease Regularization: Reduce penalties like L1/L2 regularization to allow the model to fit the
data better.
Train Longer: Increase training epochs to allow the model to learn better from the data.
Hyperparameter Tuning: Adjust parameters like learning rate, tree depth, or number of
estimators.
2. Reducing Overfitting:
Overfitting happens when a model learns noise or irrelevant details from the training data.
Cross-Validation: Use techniques like k-fold cross-validation to ensure the model generalizes well
to unseen data.
Pruning (for tree-based models): Limit the depth of trees or remove unnecessary branches.
Dropout (for neural networks): Randomly drop neurons during training to prevent dependency
on specific features.
Data Augmentation: Increase training data size by introducing variations (e.g., rotating images or
adding noise).
Simplify the Model: Reduce model complexity by lowering the number of parameters or layers.
Early Stopping: Monitor validation loss during training and stop when performance starts to
degrade.
Increase Training Data: More data helps the model generalize better.
By balancing these techniques, you can improve your model's performance and achieve a good
trade-off between bias and variance.
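A minimal sketch of two of the techniques listed above, k-fold cross-validation and limiting tree depth (pruning); the dataset and depth values are assumptions used only for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in [None, 3]:      # None = fully grown tree, 3 = depth-limited (pruned) tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)    # 5-fold cross-validation
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")

Comparing cross-validated accuracy rather than training accuracy is what reveals whether the extra complexity of the unpruned tree actually generalizes.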
Q9. Difference between overfitting and underfitting?
Aspect-by-aspect comparison:
Performance: An overfitted model performs well on training data but poorly on test data; an underfitted model performs poorly on both training and test data.
Error type: Overfitting corresponds to high variance and low bias; underfitting corresponds to high bias and low variance.
Model complexity: An overfitted model is too complex; an underfitted model is too simple.
Training data: An overfitted model learns the noise and details in the training data; an underfitted model fails to capture the underlying patterns in the training data.
Generalization: Overfitting gives poor generalization to new data; underfitting gives poor performance on both seen and unseen data.
Symptoms: Overfitting shows low training error and high testing error; underfitting shows high training and testing error.
The simple linear regression model is represented as:
Y = a0 + a1X + ε
Here,
Y = dependent variable (target variable)
X = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The values of the x and y variables are the training dataset used for the linear regression model representation.
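A tiny sketch (the x and y values are made up so that the true relationship is Y = 2X + 1) fitting this equation and reading back a0 and a1:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])      # independent variable X
y = np.array([3, 5, 7, 9, 11])               # dependent variable Y (exactly 2x + 1 here)

model = LinearRegression().fit(x, y)
print("a0 (intercept):  ", model.intercept_)   # approximately 1
print("a1 (coefficient):", model.coef_[0])     # approximately 2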
Lasso Regression
The cost function for Lasso regression is:
J(w) = (1/2) ∑(y − h(y))² + λ ∑|w|
where y is the actual value, h(y) denotes the predicted value, w denotes the feature coefficients, and λ controls the strength of the penalty.
Lasso regression can reduce certain coefficients exactly to zero, in effect performing feature selection. This is very helpful with high-dimensional datasets where many features may be unnecessary or redundant. The resulting model is less complex and easier to interpret, and by minimizing overfitting it frequently exhibits improved predictive performance.
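A small scikit-learn sketch (synthetic data, arbitrary alpha) of the feature-selection effect described above: with only a few informative features, Lasso drives the remaining coefficients exactly to zero.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# 10 features, of which only 3 actually influence the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)      # alpha plays the role of the penalty strength λ
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Non-zero features:", int(np.sum(lasso.coef_ != 0)), "out of", X.shape[1])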
Ridge Regression
To combat the issue of overfitting in linear regression models, ridge regression is a regularization
approach. The size of the coefficients is reduced and overfitting is prevented by adding a penalty term to
the cost function of linear regression. The penalty term regulates the magnitude of the coefficients in the
model and is proportional to the sum of squared coefficients. The coefficients shrink toward zero when
the penalty term's value is raised, lowering the model's variance.
J(w) = (1/2) ∑(y − h(y))² + λ ∑w²
where y is the actual value, h(y) denotes the predicted value, w denotes the feature coefficients, and λ controls the strength of the penalty.
Ridge regression works best when there are several small to medium-sized coefficients and when all features are significant. It is also computationally more efficient than some other regularization methods. Ridge regression's primary drawback is that it does not eliminate any features, which may not always be desirable. Whether to use ridge or another regularization approach depends on the specific situation and the characteristics of the data.
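A companion sketch for ridge (again with synthetic data and arbitrary alpha values) showing coefficients shrinking toward zero as the penalty grows, without any of them being eliminated:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

for alpha in [0.1, 1.0, 100.0]:         # larger alpha = stronger penalty term
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: coefficients =", np.round(ridge.coef_, 2))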
Q13. What is gradient descent? Explain the gradient descent algorithm and its limitations.
Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning, used to train machine learning and deep learning models. It helps in finding a local minimum of a function.
Gradient descent is a mathematical algorithm that helps train machine learning models and
neural networks by finding the best weights and biases to minimize errors between predicted
and actual results.
Algorithm:
In gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
* Gradient descent is popular for very large-scale optimization problems because it is easy to implement, can handle black-box functions, and each iteration is cheap.
* Given a differentiable scalar field f(x) and an initial guess x1, gradient descent iteratively moves the guess toward lower values of f by taking steps in the direction of the negative gradient −∇f(x).
* Locally, the negated gradient is the steepest-descent direction, i.e., the direction in which x would need to move in order to decrease f the fastest. The algorithm typically converges to a local minimum, but may in rare cases stop at a saddle point, or not move at all if x lies at a local maximum.
* The gradient gives the slope of the curve at that x, and its direction points toward an increase in the function. So we change x in the opposite direction to lower the function value: x_new = x − η ∇f(x), where η is the learning rate (step size).
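A toy implementation of the update rule above (the function f(x) = (x − 3)², the starting point, and the learning rate are chosen purely for illustration):

def gradient(x):
    return 2 * (x - 3)            # derivative of f(x) = (x - 3)^2

x = 10.0                          # initial guess x1
learning_rate = 0.1               # step size η

for step in range(100):
    x = x - learning_rate * gradient(x)    # move against the gradient

print("Approximate minimizer:", round(x, 4))   # converges close to x = 3

The same loop also illustrates the limitations mentioned in the question: with a poorly chosen learning rate it can overshoot or crawl, and on non-convex functions it only finds a local minimum.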
2. Regression Metrics:
Mean Absolute Error (MAE): The average of absolute differences between predicted and actual values.
Mean Squared Error (MSE): The average of squared differences between predicted and actual values;
penalizes larger errors more heavily.
Root Mean Squared Error (RMSE): The square root of MSE, providing error in the same unit as the target
variable.
R-squared: Indicates how well the model explains the variability of the target variable.
3. Clustering Metrics:
Silhouette Score: Measures how similar an object is to its own cluster compared to other clusters.
Davies-Bouldin Index: Measures the average similarity ratio of a cluster to its most similar cluster.
Adjusted Rand Index (ARI): Compares the similarity between predicted and true cluster assignments.
These metrics guide model selection, tuning, and deployment by revealing strengths and weaknesses in
performance.
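A short sketch computing the three clustering metrics above with scikit-learn (the blob dataset and k-means settings are assumptions for illustration):

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, adjusted_rand_score

X, true_labels = make_blobs(n_samples=300, centers=3, random_state=0)
pred_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Silhouette score:    ", silhouette_score(X, pred_labels))
print("Davies-Bouldin index:", davies_bouldin_score(X, pred_labels))
print("Adjusted Rand index: ", adjusted_rand_score(true_labels, pred_labels))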
Advantages of MAE
The MAE you get is in the same unit as the output variable.
It is more robust to outliers than squared-error metrics such as MSE.
2] RMSE
RMSE is the square root of the value produced by the mean squared error function. It helps us measure the difference between the estimated and actual values of a model parameter.
Using RMSE, we can easily measure the efficiency of the model.
There is no universal threshold for a "good" RMSE, since it depends on the scale of the target variable; for example, if a score below 180 is acceptable for a particular problem, an RMSE above 180 indicates that we need to apply feature selection and hyper-parameter tuning to the model.
Advantages of RMSE
The output value you get is in the same unit as the required output variable which makes
interpretation of loss easy.
Disadvantages of RMSE
It is less robust to outliers than MAE, because squaring the errors gives large errors disproportionately more weight.
3] R^2 (R-SQUARE)
The R2 score is a metric that tells you how well your model performs.
MAE and MSE depend on the context (the scale of the target variable), whereas the R2 score is independent of context.
Hence, R2 is also known as the coefficient of determination, or sometimes the goodness of fit.
How do we interpret the R2 score? R2 = 1 − (sum of squared errors of the regression line) / (sum of squared errors of the mean line). If the R2 score is zero, the regression line is no better than the mean line: the ratio equals 1, so 1 − 1 = 0. In this case the two lines effectively overlap, the model's performance is at its worst, and it is not able to take any advantage of the input features.
The second case is when the R2 score is 1. This happens when the ratio term is zero, i.e., when the regression line makes no mistakes at all and is perfect, which is not possible in the real world. So we can conclude that as our regression line moves toward perfection, the R2 score moves toward one and the model's performance improves.
The normal case is when the R2 score is between zero and one, for example 0.8, which means the model is able to explain 80 percent of the variance in the data.
4] MSE
Mean squared error (MSE) is a metric used to measure the average squared difference between the predicted values and the actual values in the dataset. It is calculated by taking the average of the squared residuals, where a residual is the difference between the predicted value and the actual value for each data point. The MSE value provides a way to analyze the accuracy of the model.
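A closing sketch that computes MAE, MSE, RMSE and R2 for a handful of made-up predictions, matching the definitions given above (the y_true and y_pred values are hypothetical):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([50, 60, 70, 80, 90])     # hypothetical actual values
y_pred = np.array([52, 58, 75, 78, 88])     # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                         # RMSE = square root of MSE
r2 = r2_score(y_true, y_pred)               # 1 - SS_res / SS_tot

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R^2={r2:.3f}")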