Machine Learning - Unit 3

Q.1 What is Bias in machine learning? Or explain Bias?

 Bias is a phenomenon that skews the result of an algorithm in favor of or against an idea.
Bias is considered a systematic error that occurs in the machine learning model itself due
to incorrect assumptions in the ML process. Bias is the difference between the average
prediction of our model and the correct value which we are trying to predict.

Bias and variance are components of the reducible error. Reducing error requires selecting models
that have appropriate complexity and flexibility, as well as suitable training data.

Low bias: A low bias model will make fewer assumptions about the form of the target
function.

High bias: A model with a high bias makes more assumptions and the model becomes unable
to capture the important features of our dataset. A high bias model also cannot perform well on
new data.

Examples of machine learning algorithms with low bias are decision trees, k-nearest
neighbours and support vector machines. Algorithms with high bias are linear regression, linear
discriminant analysis and logistic regression.

Q.2 How to reduce the high bias ?


 Ans. If the average predicted values are far off from the actual values, then the bias is
high. High bias causes the algorithm to miss relevant relationships between the input and
output variables. When a model has high bias, it implies that the model is too simple and
does not capture the complexity of the data, thus underfitting the data. Low variance
means there is a small variation in the prediction of the target function with changes in
the training dataset, while high variance shows a large variation in the prediction of the
target function with changes in the training dataset. High bias can be identified when we
have a high training error and the validation or test error is similar to the training error.
The following methods are used to reduce high bias:

1. Increase the input features as the model is underfitted.

2. Decrease the regularization term.

3. Use more complex models, such as including some polynomial features.

Q.3 Define variance. Explain low and high variance? How to reduce high variance
?
Ans. Variance indicates how much the estimate of the target function will alter if different
training data were used. In other words, variance describes how much a random variable
differs from its expected value.

* Variance depends on the training set used: it measures the inconsistency of the predictions
obtained with different training sets; it is not a measure of overall accuracy.

* Low variance means there is a small variation in the prediction of the target function
with changes in the training data set. High variance shows a large variation in the
prediction of the target function with changes in the training dataset.

* Variance comes from highly complex models with a large number of features.

1. Models with high bias will have low variance.

2. Models with high variance will have a low bias.

* Following methods are used to reduce high variance:

1. Reduce the input features or number of parameters as a model is overfitted.

2. Do not use an overly complex model.

3. Increase the training data.

4. Increase the regularization term.


Q4. Explain the Bias-Variance Trade-off?
 If the algorithm is too simple (a hypothesis with a linear equation), it may be in a
high-bias and low-variance condition and thus be error-prone. If the algorithm fits too
complex a hypothesis (a high-degree equation), it may be in a high-variance and low-bias
condition; in this latter case it will not perform well on new entries. There is something
between both of these conditions, known as the Trade-off or Bias-Variance Trade-off. This
trade-off in complexity is why there is a trade-off between bias and variance: an algorithm
cannot be more complex and less complex at the same time.
We try to optimize the total error of the model by using the Bias-Variance Trade-off:
Total Error = Bias² + Variance + Irreducible Error
In experimental practice we observe an important phenomenon called the bias-variance
dilemma.

* In supervised learning, the class value assigned by the learning model built based on the
training data may differ from the actual class value. This error in learning can be of two
types, errors due to 'bias' and error due to 'variance'.

1. Low-bias, low-variance: The combination of low bias and low variance shows an ideal
machine learning model. However, it is practically not possible.

2. Low-bias, high-variance: With low bias and high variance, model predictions are
inconsistent but accurate on average. This case occurs when the model learns with a large
number of parameters and hence leads to overfitting.

3. High-bias, low-variance: With high bias and low variance, predictions are consistent but
inaccurate on average. This case occurs when a model does not learn well from the
training dataset or uses too few parameters. It leads to underfitting problems in the model.

4. High-bias, high-variance: With high bias and high variance, predictions are inconsistent
and also inaccurate on average.
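The simulation below is a minimal sketch (not part of the original notes; the sine target, noise level and polynomial degrees are illustrative assumptions) that estimates the bias² and variance terms of the total-error formula above by refitting models of increasing complexity on many resampled training sets: the degree-1 model shows high bias and low variance, the degree-9 model the opposite.

```python
# Hedged sketch: estimate bias^2 and variance for polynomial fits of a
# noisy sine curve over many resampled training sets.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)      # assumed ground-truth function
x_test = np.linspace(0, 1, 50)                # fixed evaluation points
n_trials, n_train, noise = 200, 30, 0.3

for degree in (1, 3, 9):                      # simple -> complex hypotheses
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        x_tr = rng.uniform(0, 1, n_train)
        y_tr = true_f(x_tr) + rng.normal(0, noise, n_train)
        coeffs = np.polyfit(x_tr, y_tr, degree)      # least-squares fit
        preds[t] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    variance = preds.var(axis=0).mean()
    print(f"degree={degree}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```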

Q5.Explain the Following term :Variance


 Definition: Variance is the average of the squared differences between each data
point and the mean value of the dataset. In machine learning, variance indicates how much
the estimate of the target function will alter if different training data were used. In other
words, variance describes how much a random variable differs from its expected value.

* Variance depends on the training set used: it measures the inconsistency of the predictions
obtained with different training sets; it is not a measure of overall accuracy.
Mathematical Formula:

Variance (σ²) = Σ(xi - μ)² / N


where:

xi = individual data point


μ = mean value of the dataset
N = total number of data points
Σ = summation symbol
Example:
Suppose we have a dataset of exam scores with the following values:
Score: 70, 75, 80, 85, 90
To calculate the variance:

1. Calculate the mean value:

μ = (70 + 75 + 80 + 85 + 90) / 5
μ = 80

2. Calculate the squared differences:

(70 - 80)² = (-10)² = 100
(75 - 80)² = (-5)² = 25
(80 - 80)² = 0² = 0
(85 - 80)² = 5² = 25
(90 - 80)² = 10² = 100

3. Calculate the sum of the squared differences:

Σ(xi - μ)² = 100 + 25 + 0 + 25 + 100 = 250

4. Calculate the variance:
σ² = Σ(xi - μ)² / N
σ² = 250 / 5
σ² = 50

Interpretation:
The variance of the exam scores is 50. This means that, on average, each score deviates
from the mean score by approximately √50 ≈ 7.07 points.
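A quick check of this worked example (not part of the original notes; it simply reproduces the arithmetic above with NumPy's population variance):

```python
# Verify the exam-score variance example: population variance = Σ(xi - μ)² / N.
import numpy as np

scores = np.array([70, 75, 80, 85, 90])
variance = scores.var(ddof=0)      # divide by N, matching the formula above
std_dev = np.sqrt(variance)

print(variance)            # 50.0
print(round(std_dev, 2))   # 7.07
```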

In machine learning, variance is an important concept in:

1. Regression analysis: Variance is used to measure the spread of the residuals.


2. Neural networks: Variance is used to initialize weights and biases.
3. Regularization techniques: Variance is used to penalize large weights.
Q6. What is the difference between bias and variance?

| Bias | Variance |
| --- | --- |
| 1. Bias is the difference between the average prediction and the correct value. | 1. Variance is the amount that the prediction will change if different training data sets were used. |
| 2. The model is incapable of locating patterns in the dataset it was trained on, and it produces inaccurate results for both seen and unseen data. | 2. The model recognizes the majority of the dataset's patterns and can even learn from the noise or data that isn't vital to its operation. |
| 3. Low bias models: k-nearest neighbours, decision trees and support vector machines. | 3. Low variance models: linear regression and logistic regression. |
| 4. High bias models: linear regression and logistic regression. | 4. High variance models: k-nearest neighbours, decision trees and support vector machines. |
| 5. Characteristics: ignores relevant features; misses underlying patterns. | 5. Characteristics: captures noise in the training data; overreacts to minor variation. |

Q7.What is overfitting and underfitting in machine learning model?


Explain with example. & Explain the following terms overfitting &
underfitting?

Underfitting
A statistical model or a machine learning algorithm is said to underfit when it cannot
capture the underlying trend of the data.
Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means
that our model or algorithm does not fit the data well enough.
It usually happens when we have too little data to build an accurate model, or when we try to
build a linear model with non-linear data.
In such cases the rules of the machine learning model are too simple and flexible to be applied to
such minimal data, and therefore the model will probably make a lot of wrong predictions.
Underfitting can be avoided by using more data and also by reducing the features through feature
selection. In a nutshell: underfitting = high bias and low variance.

Underfitting examples:
1. The learning time may be prohibitively large and the learning stage was prematurely
terminated.
2. The learner did not use a sufficient number of iterations.
3. The learner tries to fit a straight line to a training set whose examples exhibit a quadratic
nature.
Overfitting

A statistical model is said to be overfitted when we train it with a lot of data. When a model gets
trained with so much data, it starts learning from the noise and inaccurate data entries in our
data set.
Then the model does not categorize the data correctly, because of too many details and noise.
The causes of overfitting are non-parametric and non-linear methods, because these types of
machine learning algorithms have more freedom in building the model based on the dataset and
therefore they can build unrealistic models.
A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to use
parameters such as the maximal depth if we are using decision trees.
In a nutshell: overfitting = high variance and low bias.

Example:
Suppose we want to predict exam scores based on hours studied.

Dataset:

| Hours Studied | Exam Score |
| --- | --- |
| 2 | 40 |
| 4 | 60 |
| 6 | 80 |
| 8 | 90 |
| 10 | 95 |

Model:
We use a complex polynomial model: Score = 2x^3 - 5x^2 + 3x + 1
Problem:
The model fits the training data perfectly, but it's too complex.
Overfitting:
The model overfits the data because it's too complex and captures noise.
Solution:
Use regularization techniques, simplify the model, or collect more data.
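A hedged illustration of this example (the data come from the table above; the choice of a degree-4 polynomial as the "too complex" model is an assumption): a degree-4 fit has one coefficient per data point, so it reaches zero training error by memorizing all five points, which is the overfitting symptom described above, while the straight line keeps small but non-zero training errors.

```python
# Compare training residuals of a simple line and a memorizing polynomial.
import numpy as np

hours = np.array([2, 4, 6, 8, 10], dtype=float)
scores = np.array([40, 60, 80, 90, 95], dtype=float)

line = np.polyfit(hours, scores, 1)       # simple model
quartic = np.polyfit(hours, scores, 4)    # complex model, one coefficient per point

print("line residuals   :", np.round(scores - np.polyval(line, hours), 1))
print("quartic residuals:", np.round(scores - np.polyval(quartic, hours), 1))  # all ~0
```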

How to avoid overfitting and underfitting model ?


Ans. 1. Following methods are used to avoid overfitting:
Cross validation
Training with more data
Removing features
Early stopping the training
Regularization
Ensembling

2. Following methods are used to avoid underfitting:

By increasing the training time of the model.
By increasing the number of features.
Q.8 Explain in Brief Technique to Reduce Underfitting and Overfitting in
Machine Learning???

Ans: Reducing underfitting and overfitting in machine learning requires carefully selecting and
tuning techniques during model training. Here's a brief explanation of techniques to address each
issue:
1. Reducing Underfitting:

Underfitting occurs when a model is too simple to capture the underlying patterns in the data.

Increase Model Complexity: Use more complex models or add more layers/neurons for neural
networks.
Feature Engineering: Add relevant features or use polynomial features to capture more variance
in data.
Decrease Regularization: Reduce penalties like L1/L2 regularization to allow the model to fit the
data better.
Train Longer: Increase training epochs to allow the model to learn better from the data.
Hyperparameter Tuning: Adjust parameters like learning rate, tree depth, or number of
estimators.

2. Reducing Overfitting:

Overfitting happens when a model learns noise or irrelevant details from the training data.

Regularization: Apply L1 (lasso) or L2 (ridge) regularization to penalize large coefficients and
prevent over-complexity.

Cross-Validation: Use techniques like k-fold cross-validation to ensure the model generalizes well
to unseen data.

Pruning (for tree-based models): Limit the depth of trees or remove unnecessary branches.

Dropout (for neural networks): Randomly drop neurons during training to prevent dependency
on specific features.

Data Augmentation: Increase training data size by introducing variations (e.g., rotating images or
adding noise).

Simplify the Model: Reduce model complexity by lowering the number of parameters or layers.

Early Stopping: Monitor validation loss during training and stop when performance starts to
degrade.

Increase Training Data: More data helps the model generalize better.
By balancing these techniques, you can improve your model's performance and achieve a good
trade-off between bias and variance.
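As a concrete sketch (not from the original notes; scikit-learn, a Ridge model and a synthetic dataset are assumptions), the snippet below combines two of the techniques listed above, L2 regularization and k-fold cross-validation, to check how the penalty strength affects generalization:

```python
# Hedged sketch: stronger L2 penalty (larger alpha) = simpler model;
# 5-fold cross-validation estimates how well each setting generalizes.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")
```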
Q9. Difference between overfitting and underfitting?

| Overfitting | Underfitting |
| --- | --- |
| The model performs well on training data but poorly on test data. | The model performs poorly on both training and test data. |
| Error type: high variance, low bias. | Error type: high bias, low variance. |
| Model complexity: too complex. | Model complexity: too simple. |
| Training data: learns noise and details in the training data. | Training data: fails to capture underlying patterns in the training data. |
| Performance: poor generalization to new data. | Performance: poor performance on both seen and unseen data. |
| Symptoms: low training error, high testing error. | Symptoms: high training and testing error. |

Q10. Define Regression and Explain different regression models??


Ans:- Regression finds correlations between dependent and independent variables. If the
desired output consists of one or more continuous variables, then the task is called
regression.
Therefore, regression algorithms help predict continuous variables such as house prices,
market trends, weather patterns, oil and gas prices, etc. When the targets in a dataset are
real numbers, the machine learning task is known as regression, and each sample in the
dataset has a real-valued output or target.
01. Linear regression model:
A linear regression model is used to depict a relationship between variables that are
proportional to each other. This means that the dependent variable increases/decreases
with the independent variable.
In the graphical representation, it has a straight linear line plotted between the variables.
Even if the points are not exactly in a straight line (which is always the case) we can still
see a pattern and make sense of it.
For example, as the age of a person increases, the level of glucose in their body increases
as well.
02. Non-linear regression model:
A non-linear regression model allows for a more complex and flexible relationship
between variables. The relationship is described by a non-linear function rather than a
straight line. This function can have multiple parameters which you can estimate from the
gathered data by using statistical analysis.
This model is useful when you cannot capture the relationship between the variables
using a linear model. It provides a powerful tool to analyze data and uncover complex
relationships between your dependent and independent variables.
03. Multiple regression model:
A multiple regression model is used when there is more than one independent variable
affecting a dependent variable. While predicting the outcome variable, it is important to
measure how each of the independent variables moves in their environment and how
their changes will affect the output or target variable.
For example, the chances of a student failing their test can be dependent on various input
variables like hard work, family issues, health issues, etc.

04. Stepwise regression modeling:


Unlike the above-mentioned regression model types, stepwise regression modeling is
more of a technique used when various input variables are affecting one output variable.
The analyst automatically starts with the input variable that is most directly correlated
with the output variable and builds a model out of it. The rest of the variables come into
the picture when the analyst decides to refine the model.
The analyst may add the remaining inputs one after the other based on their significance
and the extent to which they affect the target variable.
For example, vegetable prices have increased in a certain area. The reason behind the
event can be anything from natural calamities to transport and supply chain management.
When an analyst decides to plot it on a graph, they will pick the most obvious reason first,
say heavy rainfall in the agricultural regions.
05.Logistic regression
Logistic regression is one of the types of regression analysis technique, which gets used
when the dependent variable is discrete. Example: 0 or 1, true or false, etc. This means the
target variable can have only two values, and a sigmoid curve denotes the relation
between the target variable and the independent variable.
Logit function is used in Logistic Regression to measure the relationship between the
target variable and independent variables. Below is the equation that denotes the logistic
regression.
logit(p) = ln(p / (1 − p)) = b0 + b1X1 + b2X2 + b3X3 + … + bkXk
06. Lasso Regression
Lasso Regression is one of the types of regression in machine learning that performs
regularization along with feature selection. It penalizes the absolute size of the regression
coefficients. As a result, coefficient values get nearer to zero, which does not happen in
the case of Ridge Regression.
Because of this, Lasso Regression performs feature selection, which allows selecting a subset of
features from the dataset to build the model. In the case of Lasso Regression, only the
required features are used, and the others are made zero. This helps in avoiding
overfitting in the model. In case the independent variables are highly collinear, Lasso
Regression picks only one variable and makes the other variables shrink to zero.

07. Polynomial Regression


Polynomial Regression is another one of the types of regression analysis techniques in
machine learning, which is the same as Multiple Linear Regression with a little
modification. In Polynomial Regression, the relationship between independent and
dependent variables, that is X and Y, is denoted by the n-th degree.
08. Ridge Regression
This is another one of the types of regression in machine learning which is usually used
when there is a high correlation between the independent variables. This is because, in the
case of multicollinear data, the least-squares estimates give unbiased values, but when the
collinearity is very high, there can be some bias. Therefore, a bias (penalty) term is
introduced in the equation of Ridge Regression. This is a powerful regression method
where the model is less susceptible to overfitting.

Q11. Explain in brief linear regression??? Or what is linear regression?


Linear regression is one of the easiest and most popular Machine Learning algorithms. It is
a statistical method that is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as sales, salary, age, product
price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y)
and one or more independent variables (x), hence it is called linear regression. Since linear
regression shows the linear relationship, it finds how the value of the dependent variable
changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables:

Y = a0 + a1X + ε

Here,
Y = Dependent Variable (Target Variable)
X = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The values of the x and y variables are the training dataset for the linear regression
model representation.
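A minimal sketch of fitting this equation (not from the original notes; the tiny dataset is made up for illustration) using ordinary least squares in NumPy:

```python
# Fit Y = a0 + a1*X by least squares on a small illustrative dataset.
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)        # independent variable
Y = np.array([30, 35, 42, 48, 55], dtype=float)   # dependent variable

a1, a0 = np.polyfit(X, Y, 1)    # slope (a1) and intercept (a0)
print(f"Y = {a0:.2f} + {a1:.2f} * X")
print("prediction for X = 6:", round(a0 + a1 * 6, 2))
```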

Q12. Explain lasso Regression and Ridge Regression??


Lasso Regression
Lasso regression, commonly referred to as L1 regularization, is a method for stopping overfitting in linear
regression models by including a penalty term in the cost function. In contrast to Ridge regression, it adds
the total of the absolute values of the coefficients rather than the sum of the squared coefficients.

Lasso regression attempts to minimize the following cost function:

J(w) = (1/2) * Σ(y − h(y))² + λ * Σ|w|

where y is the actual value, h(y) denotes the predicted value, w denotes the feature coefficients, and λ is the strength of the penalty term.
Lasso regression can reduce certain coefficients to zero, conducting feature selection in effect. With high-
dimensional datasets where many characteristics could be unnecessary or redundant, this is very helpful.
The resultant model is less complex and easier to understand, and by minimizing overfitting, it frequently
exhibits improved predictive performance
Ridge Regression
To combat the issue of overfitting in linear regression models, ridge regression is a regularization
approach. The size of the coefficients is reduced and overfitting is prevented by adding a penalty term to
the cost function of linear regression. The penalty term regulates the magnitude of the coefficients in the
model and is proportional to the sum of squared coefficients. The coefficients shrink toward zero when
the penalty term's value is raised, lowering the model's variance.

Ridge regression attempts to minimize the following cost function:

J(w) = (1/2) * Σ(y − h(y))² + λ * Σw²

where y is the actual value, h(y) denotes the predicted value, w denotes the feature coefficients, and λ is the strength of the penalty term.
Ridge regression works best when there are several tiny to medium-sized coefficients and when all
characteristics are significant. Also, it is computationally more effective than other regularization
methods. Ridge regression's primary drawback is that it does not erase any characteristics, which may
not always be a good thing. The specific situation at hand and the qualities of the data will determine
whether to use Ridge or another regularization approach.
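A hedged comparison sketch (scikit-learn, a synthetic dataset and alpha = 1.0 are assumptions, not part of the notes) showing the key behavioural difference described above: Lasso can drive some coefficients exactly to zero, while Ridge only shrinks them.

```python
# Count how many coefficients each penalty drives exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrunk, but non-zero

print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```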
Q13. What is Gradient descent? Explain Gradient descent algorithm and
also its limitation
Gradient Descent is defined as one of the most commonly used iterative optimization
algorithms of machine learning to train the machine learning and deep learning models. It
helps in finding the local minimum of a function.
Gradient descent is a mathematical algorithm that helps train machine learning models and
neural networks by finding the best weights and biases to minimize errors between predicted
and actual results.
Algorithm:
In gradient descent, one takes steps proportional to the negative of the gradient of the function
at the current point.
* Gradient descent is popular for very large-scale optimization problems because it is easy to
implement, can handle black-box functions, and each iteration is cheap.
* Given a differentiable scalar field f(x) and an initial guess x1, gradient descent iteratively
moves the guess toward lower values of f by taking steps in the direction of the negative
gradient −∇f(x).

* Locally, the negated gradient is the steepest-descent direction, i.e., the direction in which x
would need to move in order to decrease f the fastest. The algorithm typically converges to
a local minimum, but may rarely reach a saddle point, or not move at all if x lies at a local
maximum.

* The gradient gives the slope of the curve at that x, and its direction points toward an
increase in the function. So we change x in the opposite direction to lower the function value:

x_new = x_old − α ∇f(x_old), where α is the learning rate.

How Gradient Descent Works:-


1. Initialize parameters: Choose initial values for the model's parameters.
2. Calculate cost: Compute the cost function using the current parameters.
3. Calculate gradients: Compute the partial derivatives of the cost function with respect to
each parameter.
4. Update parameters: Update the parameters using the gradients and a learning rate.
5. Repeat: Steps 2-4 until convergence or a stopping criterion is reached.
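The snippet below is a minimal sketch of these five steps (not from the original notes; the quadratic cost f(w) = (w − 3)², the learning rate and the tolerance are illustrative assumptions):

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3)
# and whose minimum is at w = 3.
def gradient_descent(lr=0.1, n_iters=100, tol=1e-8):
    w = 0.0                            # 1. initialize parameter
    cost = (w - 3) ** 2
    for _ in range(n_iters):           # 5. repeat until convergence
        cost = (w - 3) ** 2            # 2. calculate cost
        grad = 2 * (w - 3)             # 3. calculate gradient
        new_w = w - lr * grad          # 4. update parameter
        if abs(new_w - w) < tol:       # stopping criterion
            break
        w = new_w
    return w, cost

w_opt, final_cost = gradient_descent()
print(w_opt, final_cost)   # w approaches 3, cost approaches 0
```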
Limitations of gradient descent:-
Gradient descent is relatively slow close to the minimum; technically, its asymptotic rate of
convergence is inferior to many other methods.
For poorly conditioned convex problems, gradient descent increasingly 'zigzags', as the
gradients point nearly orthogonally to the shortest direction to the minimum point.
Q14. Compare Batch Gradient and Stochastic gradient Descent??
Q15. Write short note on Stochastic gradient descent algorithm and Batch
gradient descent algorithm
Ans :- Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is used for
optimizing machine learning models. It addresses the computational inefficiency of traditional Gradient
Descent methods when dealing with large datasets in machine learning projects.
In SGD, instead of using the entire dataset for each iteration, only a single random training example (or a
small batch) is selected to calculate the gradient and update the model parameters. This random selection
introduces randomness into the optimization process, hence the term “stochastic” in Stochastic Gradient
Descent.
Stochastic Gradient Descent Algorithm
Initialization: Randomly initialize the parameters of the model.
Set Parameters: Determine the number of iterations and the learning rate (alpha) for updating the
parameters.
Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the
maximum number of iterations:
- Shuffle the training dataset to introduce randomness.
- Iterate over each training example (or a small batch) in the shuffled order.
- Compute the gradient of the cost function with respect to the model parameters using the current training
example (or batch).
- Update the model parameters by taking a step in the direction of the negative gradient, scaled by the
learning rate.
- Evaluate the convergence criteria, such as the change in the cost function between iterations.
Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations
is reached, return the optimized model parameters.
Batch Gradient descent
In Batch Gradient Descent, all the training data is taken into consideration to take a single step. We take the
average of the gradients of all the training examples and then use that mean gradient to update our
parameters. So that’s just one step of gradient descent in one epoch.
Batch Gradient Descent is great for convex or relatively smooth error manifolds. In this case, we move
somewhat directly towards an optimum solution.
The graph of cost vs epochs is also quite smooth because we are averaging over all the gradients of training
data for a single step. The cost keeps on decreasing over the epochs.

Mini Batch Gradient Descent


Batch Gradient Descent can be used for smoother curves. SGD can be used when the dataset is large. Batch
Gradient Descent converges directly to minima. SGD converges faster for larger datasets. But, since in SGD
we use only one example at a time, we cannot implement the vectorized implementation on it. This can
slow down the computations. To tackle this problem, a mixture of Batch Gradient Descent and SGD is used.
We neither use the whole dataset at once nor a single example at a time. We use a batch of a fixed
number of training examples, smaller than the actual dataset, and call it a mini-batch. Doing this helps
us achieve the advantages of both of the former variants. So, after creating the mini-batches of fixed
size, we do the following steps in one epoch:
Pick a mini-batch
Feed it to Neural Network
Calculate the mean gradient of the mini-batch
Use the mean gradient we calculated in step 3 to update the weights
Repeat steps 1–4 for the mini-batches we created
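A compact sketch of these steps (not from the original notes; the one-feature linear model, the synthetic data and the hyper-parameters are assumptions) implementing mini-batch gradient descent without a neural network, so the mechanics stay visible:

```python
# Mini-batch gradient descent for y ≈ w*x + b on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 4.0 * x + 2.0 + rng.normal(0, 1.0, 200)   # true w = 4, true b = 2

w, b, lr, batch_size = 0.0, 0.0, 0.01, 32
for epoch in range(50):
    idx = rng.permutation(len(x))                     # shuffle each epoch
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]         # 1. pick a mini-batch
        err = (w * x[batch] + b) - y[batch]           # 2. "feed" the batch to the model
        grad_w = 2 * np.mean(err * x[batch])          # 3. mean gradient of the mini-batch
        grad_b = 2 * np.mean(err)
        w -= lr * grad_w                              # 4. update the weights
        b -= lr * grad_b                              # 5. loop repeats steps 1-4

print(round(w, 2), round(b, 2))   # should approach 4.0 and 2.0
```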
Q16. Write short note on Evaluation metrics???
Evaluation metrics in machine learning are used to assess the performance of a model, enabling
practitioners to understand its effectiveness and make improvements. The choice of metric depends on the
type of problem (classification, regression, or clustering) and the specific goals of the model. Common
evaluation metrics include:
1. Classification Metrics:

Accuracy: The ratio of correctly predicted instances to the total instances.


Precision: The ratio of true positives to the sum of true positives and false positives; measures correctness
of positive predictions.
Recall (Sensitivity): The ratio of true positives to the sum of true positives and false negatives; measures
completeness of positive predictions.
F1 Score: The harmonic mean of precision and recall, useful for imbalanced datasets.
ROC-AUC: The area under the Receiver Operating Characteristic curve, showing the trade-off between true
positive rate and false positive rate.

2. Regression Metrics:
Mean Absolute Error (MAE): The average of absolute differences between predicted and actual values.
Mean Squared Error (MSE): The average of squared differences between predicted and actual values;
penalizes larger errors more heavily.
Root Mean Squared Error (RMSE): The square root of MSE, providing error in the same unit as the target
variable.
R-squared: Indicates how well the model explains the variability of the target variable.

3. Clustering Metrics:

Silhouette Score: Measures how similar an object is to its own cluster compared to other clusters.
Davies-Bouldin Index: Measures the average similarity ratio of a cluster to its most similar cluster.
Adjusted Rand Index (ARI): Compares the similarity between predicted and true cluster assignments.
These metrics guide model selection, tuning, and deployment by revealing strengths and weaknesses in
performance.
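A small sketch computing the classification metrics above for a hypothetical set of predictions (scikit-learn and the toy labels/scores are assumptions, not part of the notes):

```python
# Classification metrics on toy binary labels and predicted scores.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]                        # hard predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]  # scores for ROC-AUC

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))
```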

Q17.Explain the following terms:-1]MAE 2]RMSE 3]R^2(R2) 4] MSE


1] MAE:- MAE is a very simple metric which calculates the absolute difference between actual
and predicted values.
To better understand, let's take an example: you have input data and output data and use
Linear Regression, which draws a best-fit line.
Now you have to find the MAE of your model: find the difference between the actual value
and the predicted value, take its absolute value, sum all these errors, and divide by the
total number of observations. This is the MAE, and we aim for a minimum MAE because it is a loss.

Advantages of MAE
The MAE you get is in the same unit as the output variable.
It is more robust to outliers than squared-error metrics.
2] RMSE
RMSE is the square root of the value obtained from the mean squared error function. It helps us
measure the difference between the estimated and actual values of a parameter of the model.
Using RMSE, we can easily measure the efficiency of the model.
A model is considered to work well if its RMSE is small relative to the scale of the target
variable; if the RMSE value is too high, we need to apply feature selection and hyper-parameter
tuning to the model.

Advantages of RMSE
The output value you get is in the same unit as the required output variable which makes
interpretation of loss easy.

Disadvantages of RMSE

 It is not as robust to outliers as MAE.

 Most of the time people use RMSE as an evaluation metric, and when you are
working with deep learning techniques the most preferred metric is RMSE.

3] R² [R-SQUARE]
 The R² score is a metric that tells the performance of your model.
 In contrast, MAE and MSE depend on the context, whereas the R² score is independent of context.
 Hence, R² is also known as the Coefficient of Determination, or sometimes as Goodness of Fit.
 Now, how do you interpret the R² score? If the R² score is zero, the error of the regression
line equals the error of the mean line, so the ratio is 1 and 1 − 1 is zero. In this case the two lines
overlap, the model performance is worst, and it is not able to take any advantage of the output column.
 The second case is when the R² score is 1: this happens when the error term of the regression
line is zero, i.e., the regression line makes no mistakes and is perfect, which is not possible in the
real world. So we can conclude that as our regression line moves towards perfection, the R² score
moves towards one, and the model performance improves.
 The normal case is when the R² score is between zero and one, e.g., 0.8, which means your model is
able to explain 80 per cent of the variance of the data.

4] MSE
MSE Mean squared error (MSE) is a metric used to measure the average squared difference
between the predicted values and the actual values in the dataset. It is calculated by taking
the average of the squared residuals, where the residual is the difference between predicted
value and the actual value for each data point. The MSE value provides a way to analyze the
accuracy of the model.
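A hedged sketch (plain NumPy; the toy predictions are made up) computing the four metrics of this question according to the definitions above:

```python
# MAE, MSE, RMSE and R^2 for a small set of illustrative predictions.
import numpy as np

y_true = np.array([40, 60, 80, 90, 95], dtype=float)
y_pred = np.array([45, 58, 75, 92, 96], dtype=float)

errors = y_true - y_pred
mae  = np.mean(np.abs(errors))                 # mean absolute error
mse  = np.mean(errors ** 2)                    # mean squared error
rmse = np.sqrt(mse)                            # root mean squared error
r2   = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R^2={r2:.3f}")
```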
ML UNIT 3

1. What is bias in machine learning? Or explain bias?
2. How to reduce high bias?
3. Define variance, explain low and high variance, and how to reduce high variance?
4. Explain the bias-variance trade-off?
5. Explain the following term: variance
6. What is the difference between bias and variance?
7. What is overfitting and underfitting in a machine learning model? Explain with an
example, and explain the terms overfitting and underfitting.
8. Explain in brief techniques to reduce underfitting and overfitting in machine learning.
9. Difference between overfitting and underfitting.
10. Define regression and explain different regression models.
11. Explain in brief linear regression, or what is linear regression?
12. Explain lasso regression and ridge regression, and the difference
between lasso and ridge regression.
13. What is gradient descent? Explain the gradient descent algorithm and also its limitations.
14. Compare batch gradient descent and stochastic gradient descent.
15. Write a short note on the stochastic gradient descent algorithm and the batch gradient
descent algorithm.
16. Write a short note on evaluation metrics.
17. Explain the following terms: 1] MAE 2] RMSE 3] R² (R-square) 4] MSE
