ML Unit-III Notes
Independent Variable / Predictor: The factors which affect the dependent variable, or which are used to predict its value.
Outliers: An outlier is an observation that has either a very low or a very high value in comparison to the other observed values.
Multicollinearity: If the independent variables are highly correlated with each other, then such a condition
is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most affecting
variable.
Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, then such a
problem is called overfitting. And if our algorithm does not perform well even with the training dataset, then such a problem is
called underfitting.
Regression Analysis
Regression is a method for understanding the relationship between independent variables or features and
a dependent variable or outcome.
Regression analysis is a form of predictive modelling technique which investigates the relationship
between a dependent variable and independent variable
Regression analysis is one of the most basic tools in the area of machine learning used for prediction.
Regression analysis is an integral part of any forecasting or predictive model, so is a common method
found in machine learning powered predictive analytics.
Machine learning regression generally involves plotting a line of best fit through the data points.
Using this plot, the machine learning model can make predictions about the data.
Regression Analysis
In simple words, "Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the regression
line is minimum."
The distance between each point and the line is minimised to achieve the best fit line
The distance between data points and line tells whether a model has captured a strong relationship or
not.
Regression analysis is a statistical method to model the relationship between a dependent (target) variable
and one or more independent (predictor) variables.
It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression Analysis
Regression is a supervised learning technique which helps in finding the correlation between variables
and enables us to predict the continuous output variable based on the one or more predictor variables.
It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect
relationship between variables.
This approach requires labelled input and output training data to train the model.
Example: Consider a company that runs various advertisements every year and records the sales obtained from them.
Now, the company wants to spend $200 on advertisement in the year 2023 and wants to know the
prediction about the sales for this year. To solve such prediction problems in machine
learning, we need regression analysis.
Uses of Regression Analysis
Prediction of rain using temperature and other factors
Trend forecasting
Forecasting continuous outcomes like house prices, stock prices, sales, salary changes, weather
conditions, etc.
Predicting the success of future retail sales or marketing campaigns to ensure resources are used
effectively.
Uses of Regression Analysis
Predicting customer or user trends, such as on streaming services or ecommerce websites.
Linear equations can be used to represent the relationship between two variables, most commonly x and y.
To form the simplest linear relationship, we can make our two variables equal: y = x
By plugging numbers into the equation, we can find some example values of x and y:
x:  0  1  2  3
y:  0  1  2  3
If we plot those points in the xy-plane, we create a line.
Linear Relationship
Criteria for an Equation to Qualify as a Linear Relationship
The equation can have at most two variables.
All the variables in the equation are to the first power. None are squared or cubed or taken to any other power.
Linear relationships such as y = 2 and y = x all graph out as straight lines. When graphing y = 2, you get a
line going horizontally at the 2 mark on the y-axis. When graphing y = x, you get a diagonal line crossing
the origin.
Linear Relationship
The concept of a linear relationship is used in the linear regression algorithm to show a relationship between
a dependent (y) variable and one or more independent (x) variables, hence the name linear regression.
Linear regression finds how the value of the dependent variable is changing according to the value of
the independent variable.
Examples of Linear Relationship
Linear relationships are very common in our everyday life, even if we aren't consciously aware of them.
Take, for example, how fast things such as cars and trains can go. Have you ever thought about how their
speeds are calculated? When a police officer gives someone a speeding ticket, how do they know for sure
if the person was speeding? Well, they use a simple linear relationship called the rate formula.
Speed of Object = Distance / Time
Another example is that of converting temperature from Fahrenheit to Celsius. If you live in the United
States, you probably use Fahrenheit, but if you discuss weather with a friend who lives in a different part of
the world, you may need to convert the temperature to Celsius. You can use the conversion formula to
convert one temperature type to the other: Celsius = (Fahrenheit – 32) × 5/9
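As a quick illustration (a sketch that is not part of the original notes), both of these everyday linear relationships can be written as small Python functions:

# Two everyday linear relationships, illustrated in Python.

def speed(distance_km: float, time_h: float) -> float:
    """Rate formula: speed = distance / time."""
    return distance_km / time_h

def fahrenheit_to_celsius(f: float) -> float:
    """Linear conversion: C = (F - 32) * 5/9."""
    return (f - 32) * 5 / 9

print(speed(120, 2))                 # 60.0 km/h
print(fahrenheit_to_celsius(98.6))   # 37.0 degrees Celsius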
Measures of Linear Relationship
Numerical measures of linear relationship that provide the direction (Positive or Negative) and strength
(Strong or weak relationship) of the linear relationship between two interval variables:
Measures of Linear Relationship
Linear relationship can be measured by using following
Covariance
Coefficient of Correlation
Coefficient of Determination
❑It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you
how two variables vary together.
❑Covariance is a statistical tool that is used to determine the direction of the relationship between the
movements of two random variables.
❑When two stocks tend to move together, they are seen as having a positive covariance; when they move
inversely, the covariance is negative.
❑The covariance equation is used to determine the direction of the relationship between two variables
❑This nature of relationship is determined by the sign (positive or negative) of the covariance value.
❑In other words, whether they tend to move in the same or opposite directions.
Covariance
The magnitude of covariance describes the strength of the association
Unfortunately, the magnitude may be difficult to judge. For example, if you’re told that the covariance
between two variables is 500, does this mean that there is a strong linear relationship? The answer is that it
is difficult to say, because the magnitude of the covariance depends on the units in which the variables are measured.
When two variables move in the same direction (both increase or both decrease), the covariance will be a
large positive number.
When two variables move in the opposite direction, the covariance will be a large negative number.
❑Positive Covariance
• A positive covariance between two variables indicates that these variables tend to be higher or lower
at the same time.
• In other words, a positive covariance between variables x and y indicates that x is higher than average at
the same times that y is higher than average, and vice versa.
• When charted on a two-dimensional graph, the data points will tend to slope upwards.
❑Negative Covariance
• When the calculated covariance is less than zero, this indicates that the two variables have an
inverse relationship.
• In other words, an x value that is lower than average tends to be paired with a y that is greater than
average, and vice versa.
Covariance Formula
❑Formula:
Cov(x, y) = Σ (xi – x̄)(yi – ȳ) / N
❑Where,
• xi = data value of x
• yi = data value of y
• x̄ = mean of x
• ȳ = mean of y
• N = number of data values.
Covariance
❑Below figure shows the covariance of X and Y.
❑If cov(X, Y) is greater than zero, then we can say that the covariance for any two variables is positive and
both the variables move in the same direction.
❑If cov(X, Y) is less than zero, then we can say that the covariance for any two variables is negative and both
the variables move in the opposite direction.
❑If cov(X, Y) is zero, then we can say that there is no relation between two variables.
❑The relationship between the correlation coefficient and covariance is given by;
Correlation, ρ(X, Y) = Cov(X, Y) / (σX σY)
❑Where:
• ρ(X,Y) = correlation between the variables X and Y
• Cov(X,Y) = covariance between the variables X and Y
• σX = standard deviation of the X variable
• σY = standard deviation of the Y variable
Covariance
Question:
Calculate the covariance for the following data:
X:  2   8   18   20   28   30
Y:  5   12  18   23   45   50
❑Solution:
Number of observations = 6
Mean of X = 17.67
Mean of Y = 25.5
Cov(X, Y) = (⅙) [(2 – 17.67)(5 – 25.5) + (8 – 17.67)(12 – 25.5) + (18 – 17.67)(18 – 25.5) + (20 – 17.67)(23 – 25.5)
+ (28 – 17.67)(45 – 25.5) + (30 – 17.67)(50 – 25.5)]
Cov(X, Y) = 157.83
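The worked example can be checked numerically. The following sketch (illustrative, not part of the original notes) computes the same population covariance with NumPy, and also the correlation coefficient using the relation ρ(X, Y) = Cov(X, Y)/(σX σY) given earlier:

import numpy as np

x = np.array([2, 8, 18, 20, 28, 30])
y = np.array([5, 12, 18, 23, 45, 50])

# Population covariance (divide by N, matching the worked example above)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(round(cov_xy, 2))   # 157.83

# Correlation coefficient: Cov(X, Y) / (sigma_X * sigma_Y)
rho = cov_xy / (x.std() * y.std())
print(round(rho, 4))      # roughly 0.95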
Coefficient of Correlation
❑Correlation is used to test relationships between quantitative variables or categorical variables.
❑An example of data that have a high correlation:
• Researchers have found a direct correlation between smoking and lung cancer.
❑Some examples of data that have a low correlation (or none at all):
• The cost of a car wash and how long it takes to buy a soda inside the station.
Coefficient of Correlation
❑Correlations are useful because if you can find out what relationship variables have, you can
make predictions about future behaviour.
❑Knowing what the future holds is very important in the social sciences like government and healthcare.
Businesses also use these statistics for budgets and business plans.
❑Correlation means association - more precisely, it is a measure of the extent to which two variables are
related. There are three possible results of a correlational study:
• A positive correlation,
• A negative correlation,
• No correlation.
Coefficient of Correlation
❑A Positive Correlation:
• It is a relationship between two variables in which both variables move in the same direction.
• That is, one variable increases as the other variable increases, or one variable decreases while
the other decreases.
• An example of positive correlation would be height and weight. Taller people tend to be heavier.
❑A Negative Correlation:
• Relationship between two variables in which an increase in one variable is associated with a decrease
in the other.
• An example of negative correlation would be height above sea level and temperature. As you climb the
mountain (increase in height) it gets colder (decrease in temperature).
❑A zero Correlation exists when there is no relationship between two variables. For example there is no
relationship between the amount of tea drunk and level of intelligence.
Coefficient of Correlation
❑A correlation can be expressed visually by drawing a scattergram (also known as a scatterplot, scatter
graph, scatter chart, or scatter diagram).
❑A scattergram is a graphical display that shows the relationships or associations between two numerical
variables (or co-variables), which are represented as points (or dots) for each pair of score.
❑A scattergraph indicates the strength and direction of the correlation between the co-variables.
❑A correlation coefficient of -1 means that for every increase in one variable, there is a decrease of a
fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost)
perfect correlation with speed.
❑A correlation coefficient of +1 means that for every increase in one variable, there is an increase of a
fixed proportion in the other.
❑Zero means that an increase in one variable is associated with neither an increase nor a decrease in the
other. The two just aren’t related.
❑One of the most commonly used formulas is Pearson’s correlation coefficient formula.
❑One of the most commonly used formulas is Pearson’s correlation coefficient formula.
Coefficient of Correlation
Two other formulas are commonly used: the sample correlation coefficient and the population correlation
coefficient.
Sample correlation coefficient:  r = sxy / (sx sy)
where sx and sy are the sample standard deviations, and sxy is the sample covariance.
Population correlation coefficient:  ρ = σxy / (σx σy)
where σx and σy are the population standard deviations, and σxy is the population covariance.
Coefficient of Correlation
❑The drawback of the coefficient of correlation is that, except for the three values (-1, 0, +1), we cannot
precisely interpret the correlation.
❑For example, suppose that we calculated the coefficient of correlation to be -0.4. What does this tell us?
Because 0.4 is closer to 0 than 1, we judge that the linear relationship is weak
❑In many applications, we need a better interpretation than the “linear relationship is weak”
Correlation Example
❑The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day.
Here are their figures for the last 12 days:
Ice Cream Sales vs Temperature
Temperature °C    Ice Cream Sales
14.2°             $215
16.4°             $325
11.9°             $185
15.2°             $332
18.5°             $406
22.1°             $522
19.4°             $412
25.1°             $614
23.4°             $544
18.1°             $421
22.6°             $445
17.2°             $408
❑The same data can also be shown as a scatter plot. [Scatter plot: Ice Cream Sales vs Temperature]
In fact the correlation is 0.9575.
Calculating Correlation (Pearson's Correlation)
❑Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales is y):
❑Step 1: Find the mean of x and the mean of y
❑Step 2: Subtract the mean of x from every x value (call them "a"), and subtract the mean of y from every y
value (call them "b")
❑Step 3: Calculate a × b, a² and b² for every value
❑Step 4: Sum up a × b, sum up a² and sum up b²
❑Step 5: Divide the sum of ab by the square root of [(sum of a²) × (sum of b²)]
❑This is how the Ice Cream example above is calculated (values rounded to 1 or 0 decimal places).
Calculating Correlation (Pearson's Correlation)
❑ Formula:
r = Σ(ab) / √( Σa² × Σb² )
Where:
Σ is Sigma, the symbol for "sum up"
a is each x-value minus the mean of x
b is each y-value minus the mean of y
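The five steps can also be carried out directly in code. The sketch below (illustrative only, not from the original notes) reproduces the ice cream example and recovers the stated correlation of about 0.9575:

import numpy as np

# Ice cream example: temperature (x) vs sales (y)
x = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
y = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])

a = x - x.mean()                                            # Step 2
b = y - y.mean()
r = (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())    # Steps 3-5
print(round(r, 4))                                          # 0.9575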
Coefficient of Determination (R- Squared)
It is calculated by squaring the coefficient of correlation.
The coefficient of determination (R-squared) is used to analyse how differences in one variable can be
explained by differences in a second variable. For example, when a lioness gets pregnant has a direct
relation to when she gives birth.
More specifically, R-squared gives you the percentage of variation in y explained by the x-variables. Its
range is from 0 to 1.
The coefficient of determination measures the amount of variation in the dependent variable that is
explained by the regression model:
R2 = 1 – (SSR/SST)
Where the SSR (sum of squared residuals) calculates the difference between the observations of the dependent variable yi and the
corresponding values predicted by the model pi.
The sum of squared residuals is also known as the sum of squared errors.
The SST (total sum of squares) calculates the sum of squares between the observations of the dependent variable and their mean.
Coefficient of Determination (R- Squared)
For Example: Assume R2 = 0.68
It can be referred that 68% of the changeability of the dependent output attribute can be explained by the model
while the remaining 32 % of the variability is still unaccounted for.
R2 indicates the proportion of data points which lie within the line created by the regression equation. A higher value
of R2 is desirable as it indicates better results.
Case 1: Model gives accurate results
Actual (y)   Predicted (p)   Error (E1 = y – p)   SSR = (E1)²   E2 = y – Mean   SST = (E2)²
10           10              0                    0             -10             100
20           20              0                    0             0               0
30           30              0                    0             10              100
Sum of SSR = 0, Sum of SST = 200
R2 = 1 – (0/200) = 1
Coefficient of Determination (R- Squared)
Case 2: Model gives same result always (here the prediction is always 20, which is also the mean of y)
Actual (y)   Predicted (p)   Error (E1 = y – p)   SSR = (E1)²   E2 = y – Mean   SST = (E2)²
10           20              -10                  100           -10             100
20           20              0                    0             0               0
30           20              10                   100           10              100
Sum of SSR = 200, Sum of SST = 200
R2 = 1 – (200/200) = 0
Coefficient of Determination (R- Squared)
Case 3: Model gives ambiguous result
Actual (y)   Predicted (p)   Error (E1 = y – p)   SSR = (E1)²   E2 = y – Mean   SST = (E2)²
10           30              -20                  400           -10             100
20           10              10                   100           0               0
30           20              10                   100           10              100
Sum of SSR = 600, Sum of SST = 200
R2 = 1 – (600/200) = -2
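The three cases can be reproduced with a small helper that implements R2 = 1 – SSR/SST. This is an illustrative sketch, not part of the original notes; the Case 3 predictions follow the reconstruction in the table above:

import numpy as np

def r_squared(actual, predicted):
    """R^2 = 1 - SSR/SST, following the definition used above."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ssr = np.sum((actual - predicted) ** 2)          # sum of squared residuals
    sst = np.sum((actual - actual.mean()) ** 2)      # total sum of squares
    return 1 - ssr / sst

y = [10, 20, 30]
print(r_squared(y, [10, 20, 30]))   # Case 1: 1.0 (perfect predictions)
print(r_squared(y, [20, 20, 20]))   # Case 2: 0.0 (always predicts the mean)
print(r_squared(y, [30, 10, 20]))   # Case 3: -2.0 (worse than predicting the mean)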
Least Square
The least squares method is a formula used to measure the accuracy of a straight line in depicting the data that
was used to generate it. That is, the formula determines the line of best fit.
The least squares criterion is determined by minimizing the sum of squares created by a mathematical
function.
A square is determined by squaring the distance between a data point and the regression line or mean
value of the data set.
The least squares criterion method is used throughout finance, economics, and investing.
Line of Best Fit / Regression line / Trend line
A Line of best fit is a straight line that represents the best approximation of a scatter plot of data points.
It is used to study the nature of the relationship between those points.
It is an output of regression analysis and can be used as a prediction tool.
Line of best fit refers to a line through a scatter plot of data points that best expresses the relationship
between those points.
The equation of the best fitting line is:
Y` = A + bX, where Y` is the predicted value, A is the Y-intercept, b is the slope and X is the independent variable.
The error (residual) for each point is E = Y – Y`, where Y is the actual observed value.
The line of best fit is the line that best fits a set of data, providing a visual demonstration of the relationship between the data points.
Each point of data represents the relationship between a known independent variable and an unknown
dependent variable.
The least squares method is a statistical procedure to find the best fit for a set of data points by
minimizing the sum of the offsets or residuals of points from the plotted curve.
The least squares method provides the overall rationale for the placement of the line of best fit among the
data points being studied.
An analyst using the least squares method will generate a line of best fit that explains the potential
relationship between the independent and dependent variables.
An example of the least squares method is an analyst who wishes to test the relationship between a
company’s stock returns, and the returns of the index for which the stock is a component. In this example,
the analyst seeks to test the dependence of the stock returns on the index returns.
To achieve this, all of the returns are plotted on a chart. The index returns are then designated as the
independent variable, and the stock returns are the dependent variable. The line of best fit provides the
analyst with a line showing the relationship between the dependent and independent variables.
For analysts, the method can help to quantify the relationship between two or more variables, such as a
stock’s share price and its earnings per share (EPS). By performing this type of analysis, investors often try
to predict the future behaviour of stock prices or other factors.
To illustrate, consider the case of an investor considering whether to invest in a gold mining company.
The investor might wish to know how sensitive the company’s stock price is to changes in the market price
of gold. To study this, the investor could use the least squares method to trace the relationship between
those two variables over time onto a scatter plot. This analysis could help the investor predict the degree
to which the stock’s price would likely rise or fall for any given increase or decrease in the price of
gold.
Coefficient Calculation with Least Square Method / Least Square
Regression
The equation to find the best fitting line is:
Y` = A + bX where,
Y` denotes the predicted value / dependent variable
b denotes the slope /gradient of the line
X denotes the independent variable
A is the Y intercept
Coefficient Calculation with Least Square Method / Least Square
Regression
The coefficients A & b are derived using calculus so that the sum of squared deviations is minimized. The
resulting formulas are:
b = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)²
A = Ȳ – b X̄
where X̄ and Ȳ are the means of X and Y.
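A minimal sketch of these formulas in Python, using small hypothetical data (the X and Y values below are made up purely for illustration):

import numpy as np

# Least squares fit of Y` = A + bX, using the closed-form formulas above.
X = np.array([1, 2, 3, 4, 5], dtype=float)     # hypothetical data
Y = np.array([2, 4, 5, 4, 5], dtype=float)

b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
A = Y.mean() - b * X.mean()
print(A, b)            # intercept and slope of the best-fit line

Y_pred = A + b * X     # predictions from the fitted line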
Coefficient Calculation with Least Square Method / Least Square
Regression
Least Square Method Graph
In linear regression, the line of best fit is a straight line. [Figure: scatter plot of data points with the fitted straight line.]
The given data points are to be minimized by the method of reducing residuals or offsets of each point
from the line. The vertical offsets are generally used in surface, polynomial and hyperplane problems, while
perpendicular offsets are utilized in common practice.
Uses of Least Square Method
The least squares method is used to estimate the relationship between variables in many different fields, from
anthropology to zoology.
Regression is mostly used for finding out the relationship between variables and for forecasting.
Different regression models differ based on the kind of relationship between the dependent and
independent variables they consider, and the number of independent variables used.
Simple Linear Regression
Linear regression is a statistical regression method which is used for predictive analysis.
It is one of the simplest and easiest algorithms; it works on regression and shows the relationship
between continuous variables / quantities.
Linear regression shows the linear relationship between the independent variable (X-axis) and the
dependent variable (Y-axis), hence called linear regression.
If there is only one input variable (x), then such linear regression is called simple linear regression. And
if there is more than one input variable, then such linear regression is called multiple linear regression.
Simple Linear regression is a linear regression technique which plots a straight line within data points to
minimise error between the line and the data points.
Simple Linear Regression
Outliers may be a common occurrence in simple linear regression because of the straight line of best
fit.
While training and building a regression model, the coefficients of this line are learned and fitted to the
training data.
The aim of the training is to find the best fit line such that cost function is minimized. [The cost
function helps in measuring the error]. During the training process, we try to minimize the error between
actual and predicted values and thus minimizing the cost function.
The relationship between the variables in the linear regression model can be explained with an example:
here we are predicting the salary of an employee on the basis of years of experience.
Simple Linear Regression
Below is the mathematical equation for linear regression to predict the dependent variable (Y) based on values
of the independent variable (X). It can be used for cases where we want to predict some continuous
quantity:
Y = b0 + b1X
where b0 is the intercept and b1 is the slope (coefficient) of the line.
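As a hedged illustration of fitting this equation in practice, the sketch below uses scikit-learn on hypothetical experience/salary data (the numbers are made up, not from the notes):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (X) vs salary (Y)
X = np.array([[1], [2], [3], [4], [5], [6]])
Y = np.array([30000, 35000, 42000, 48000, 52000, 61000])

model = LinearRegression().fit(X, Y)
print(model.intercept_, model.coef_[0])   # learned b0 and b1
print(model.predict([[7]]))               # predicted salary for 7 years of experience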
Applications of Simple Regression
Predicting the house price based on the size of the house, availability of schools in the area, and other
essential factors
Predicting the sales revenue of a company based on data such as the previous sales of the company
Predicting the temperature of any day based on data such as wind speed, humidity, atmospheric pressure
Salary forecasting
Demand Forecasting – To predict demand for goods and services. For example, restaurant chains can
predict the quantity of food depending on weather.
Real estate prediction – To model residential home prices as a function of the home’s living area,
bathrooms, no. of bedrooms, lot size
Applications of Simple Regression
Arriving at ETAs in traffic.
Risk Analysis for disease – For example; To analyse the effect of a proposed radiation treatment on
reducing tumour sizes based on patient attributes such as age or weight
Economic Growth – Used to determine the Economic Growth of a country or a state in coming quarter; can
also be used to predict the GDP of a country.
Product Price – Can be used to predict what would be the price of a product in the future.
Score Prediction – To predict the no. of runs a player would score in the coming matches based on
previous performance
Simple Regression Use Case
Advantages & Disadvantages
Advantages:
It handles overfitting well using dimensionality reduction techniques, regularization and cross-
validation
Disadvantages:
It is prone to multicollinearity
Multiple Linear Regression
In Simple Linear Regression, a single Independent/Predictor(X) variable is used to model the dependent
variable (Y). But there may be various cases in which the dependent variable is affected by more than one
predictor variable; for such cases, the Multiple Linear Regression algorithm is used.
For MLR, the dependent or target variable(Y) must be the continuous/real, but the predictor or
independent variable may be of continuous or categorical form.
Each feature variable must model a linear relationship with the dependent variable.
The technique enables analysts to determine the variation of the model and the relative contribution of
each independent variable in the total variance.
Moreover, Multiple Linear Regression is an extension of Simple Linear regression as it takes more than one
predictor variable to predict the dependent / response / feature variable. We can define it as:
Multiple Linear Regression
“Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent
variable.”
It can also be non-linear, where the dependent and independent variables do not follow a straight line.
You can use multiple linear regression when you want to know:
How strong the relationship is between two or more independent variables and one dependent
variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
The value of the dependent variable at a certain value of the independent variables (e.g. the expected
yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Multiple Linear Regression
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables
x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same form is applied for the
multiple linear regression equation, which becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
Y = Output/Response variable
b0 = the y-intercept
b1, b2, ..., bn = the coefficients of the predictor variables x1, x2, ..., xn
The regression residuals must be normally distributed. Multiple linear regression also assumes that the
amount of error in the residuals is similar at each point of the linear model (homoscedasticity).
In a normal distribution, data is symmetrically distributed with no skew. When plotted on a graph, the
data follows a bell shape, with most values clustering around a central region and tapering off as they
go further away from the center.
Normal distributions are also called Gaussian distributions or bell curves because of their shape.
MLR assumes little or no multicollinearity (correlation between the independent variable) in data.
Multiple Linear Regression
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
You are a public health researcher interested in social factors that influence heart disease. You survey
500 towns and gather data on the percentage of people in each town who smoke, the percentage of
people in each town who bike to work, and the percentage of people in each town who have heart
disease. Because you have two independent variables and one dependent variable, and all your
variables are quantitative, you can use multiple linear regression to analyze the relationship between
them.
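A minimal sketch of the heart disease example above, using hypothetical numbers (the town percentages below are invented purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data for a few towns: [% smoking, % biking] -> % heart disease
X = np.array([
    [25.0,  5.0],
    [30.0,  3.0],
    [15.0, 20.0],
    [10.0, 35.0],
    [20.0, 12.0],
])
y = np.array([12.0, 15.0, 7.0, 4.0, 9.0])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # b0 and [b1, b2]
print(model.predict([[22.0, 10.0]]))    # predicted % heart disease for a new town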
Polynomial Regression
It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial
terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
It is a linear model with some modification in order to increase the accuracy.
It makes use of a linear regression model to fit the complicated and non-linear functions and datasets.
Polynomial Regression
Hence, "In Polynomial regression, the original features are converted into Polynomial features of
required degree (2,3,..,n) and then modeled using a linear model."
If we apply a linear model to a linear dataset, then it provides a good result, as we have seen in Simple
Linear Regression. But if we apply the same model, without any modification, to a non-linear dataset, then
it will produce very poor output: the loss function will increase, the error rate will be high, and
accuracy will decrease.
So for such cases, where data points are arranged in a non-linear fashion, we need the Polynomial
Regression model. We can understand it in a better way using the below comparison diagram of the
linear dataset and non-linear dataset.
Polynomial Regression
Consider a dataset which is arranged non-linearly. If we try to cover it with a
linear model, then we can clearly see that it hardly covers any data point. On the other hand, a curve, which is
what the Polynomial model fits, is suitable to cover most of the data points.
Hence, if the datasets are arranged in a non-linear fashion, then we should use the Polynomial Regression
model instead of Simple Linear Regression.
A Polynomial Regression algorithm is also called Polynomial Linear Regression because it does not
depend on the variables, instead, it depends on the coefficients, which are arranged in a linear
fashion.
Equation of Polynomial Regression
Simple Linear Regression equation:    y = b0 + b1x
Multiple Linear Regression equation:  y = b0 + b1x1 + b2x2 + ... + bnxn
Polynomial Regression equation:       y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
When we compare the above three equations, we can clearly see that all three equations are polynomial
equations but differ in the degree of the variables.
The Simple and Multiple Linear equations are polynomial equations of degree 1, while the
polynomial regression equation is a linear equation of degree n. So if we add higher-degree terms to our linear
equations, they are converted into polynomial regression equations.
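A small sketch of polynomial regression in Python: the original feature is converted into polynomial features of degree 2 and then modelled with an ordinary linear model, as described above. The data values are hypothetical:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: y roughly follows a quadratic curve
x = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([2.1, 4.9, 10.2, 17.1, 26.3, 37.0])

# Convert the original feature into polynomial features of degree 2,
# then fit an ordinary linear model on those features.
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)
model = LinearRegression().fit(x_poly, y)

print(model.predict(poly.transform([[7.0]])))   # prediction for x = 7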
Metric for regression
Most beginners and practitioners often do not pay enough attention to model performance.
The goal is to build a well-generalized model: a machine learning model cannot have 100 per cent
accuracy, otherwise it is likely a biased model. This leads to the concepts of
overfitting and underfitting.
It is necessary to obtain good accuracy on the training data, but it is also important to get a genuine and
reliable result on unseen data; otherwise, the model is of no use.
So to build and deploy a generalized model we require to evaluate the model on different metrics which
helps us to better optimize the performance, fine-tune it, and obtain a better result.
There are five error metrics that are commonly used for evaluating and reporting the performance of a
regression model; they are:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R-Squared
Adjusted R-Squared
Mean Absolute Error (MAE)
MAE is a very simple metric which calculates the average of absolute difference between actual and
predicted values.
To better understand, let’s take an example you have input data and output data and use Linear
Regression, which draws a best-fit line.
Now you have to find the MAE of your model, which is basically the mistake made by the model, known
as the error. Find the absolute difference between each actual value and its predicted value; that is the absolute error.
Then, to get the mean absolute error of the complete dataset,
sum all the absolute errors and divide them by the total number of observations. This is the MAE, and we
aim to get a minimum MAE because this is a loss.
MAE = (1/N) Σ |Yi – Ŷi|
• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
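A minimal sketch of MAE in code (the actual/predicted values are made up for illustration):

import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: average of |Yi - Yi_hat|."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(actual - predicted))

print(mae([10, 20, 30], [12, 18, 33]))   # (2 + 2 + 3) / 3 = 2.33...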
Mean Absolute Error (MAE)
If we don’t take the absolute values, then the negative difference will cancel out the positive
difference and we will be left with a zero upon summation.
A small MAE suggests the model is great at prediction, while a large MAE suggests that your model
may have trouble in certain areas. MAE of 0 means that your model is a perfect predictor of the
outputs.
Advantages of MAE
The MAE you get is in the same unit as the output variable.
Disadvantages of MAE
The graph of MAE is not differentiable everywhere (it has a corner at zero error), so gradient-based
optimizers such as gradient descent cannot be applied to it directly and need workarounds.
Mean Squared Error (MSE)
This is the mean / average of the squared difference of the actual value in the dataset and the value
predicted by the model.
Here, the error term is squared and thus more sensitive to outliers as compared to Mean Absolute Error
(MAE).
MSE uses the square operation to remove the sign of each error value and to punish large errors.
As we take the square of the error, the effect of larger errors becomes more pronounced than that of smaller errors,
hence the model can now focus more on the larger errors.
MSE = (1/N) Σ (Yi – Ŷi)²
• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
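A minimal sketch of MSE in code, using the same illustrative values as the MAE sketch:

import numpy as np

def mse(actual, predicted):
    """Mean Squared Error: average of (Yi - Yi_hat)^2."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean((actual - predicted) ** 2)

print(mse([10, 20, 30], [12, 18, 33]))   # (4 + 4 + 9) / 3 = 5.67...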
Mean Squared Error (MSE)
The MSE will be large if there are outliers in the dataset, this is not the case with MAE.
MSE focuses on larger errors, as when we are squaring the error the effect of large errors becomes
more prominent.
If the errors are small (lower than one), squaring them makes them even smaller, which leads to underestimating the model’s error.
Advantages of MSE
The graph of MSE is differentiable, so you can easily use it as a loss function.
Disadvantages of MSE
The value you get after calculating MSE is in the squared unit of the output. For example, if the output variable is
in metres (m), then after calculating MSE the output we get is in metres squared.
Root Mean Squared Error (RMSE)
As the name itself makes clear, RMSE is simply the square root of the mean squared error.
It is the square root of the average squared difference between the real value and the predicted value. By taking the square
root of MSE, we get the Root Mean Square Error.
We want the value of RMSE to be as low as possible, as lower the RMSE value is, the better the model is
with its predictions. A Higher RMSE indicates that there are large deviations between the predicted and
actual value.
RMSE = √( (1/N) Σ (Yi – Ŷi)² )
• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
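A minimal sketch of RMSE in code, again with the same illustrative values:

import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

print(rmse([10, 20, 30], [12, 18, 33]))   # sqrt(5.67) ≈ 2.38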
Advantages of RMSE
The output value you get is in the same unit as the required output variable which makes
interpretation of loss easy.
Disadvantages of RMSE
It is not that robust to outliers as compared to MAE.