Linear Regression
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
This file is meant for personal use by [email protected] only. 1
Sharing or publishing the contents in part or full is liable for legal action.
Linear Regression
• Linear regression is a simple statistical regression method used for predictive analysis that shows the relationship between
the continuous independent variable (X-axis) and the continuous dependent variable (Y-axis).
• In regression, we try to calculate the best fit line which describes the relationship between the predictors and dependent
variable.
• The equation of the best fit line is Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term.
• When we have one independent variable, we call it Simple Linear Regression (SLR). If the number of independent variables is more than one, we call it Multiple Linear Regression (MLR).
• SLR Example: You are a social researcher interested in the relationship between income and happiness.
• MLR Example: The selling price of a house can depend on the desirability of the location, the number of bedrooms, the number of
bathrooms, the year the house was built, the square footage of the lot and a number of other factors.
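As an illustrative sketch of the SLR example above, the income–happiness relationship can be fit by least squares. All of the numbers below are hypothetical, purely for demonstration:

```python
import numpy as np

# Toy income-vs-happiness data (hypothetical survey values)
income = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])   # X, in $1000s
happiness = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8])      # Y, survey score

# Fit Y = a + b*X by least squares; np.polyfit returns [slope, intercept]
b, a = np.polyfit(income, happiness, deg=1)

predicted = a + b * income
residuals = happiness - predicted   # the error term e for each observation
```

With an intercept in the model, the residuals always average to zero; here the positive slope b reflects the assumed positive income–happiness relationship.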
Assumptions
Simple Linear Regression:
• Linearity: The relationship between independent variables and the mean of the dependent variable is linear.
• Homoscedasticity: The variance of residuals should be equal.
• Independence: Observations are independent of each other.
• Normality: For any fixed value of an independent variable, the dependent variable is normally distributed.
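These assumptions can be checked empirically on the residuals. A minimal sketch, using simulated data that satisfies the assumptions by construction; the thresholds below are rough rules of thumb, not formal statistical tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2.0 * x + rng.normal(0, 1, 200)   # linear signal + equal-variance noise

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Normality (rough check): residual skewness should be near zero
skew_res = stats.skew(residuals)

# Homoscedasticity (rough check): residual variance should be similar
# in the low-x half and the high-x half of the data
order = np.argsort(x)
half = len(x) // 2
ratio = residuals[order[half:]].var() / residuals[order[:half]].var()
```

In practice, formal diagnostics (e.g. Q-Q plots, the Breusch-Pagan test) are used instead of these crude split-half checks.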
Linear Regression Model Representation
• The representation is a linear equation that combines a set of input values (x) and predicted output (y).
• As such, both the input values (x) and the output value are numeric.
• For example, in a simple regression problem (a single x and a single y), the form of the model would be
Y= β0 + β1x
• For example, in a multiple linear regression problem, the form of the model would be
Y = β0 + β1x1 + β2x2 + … + βnxn
• In higher dimensions, when we have more than one input (x), the fitted line is called a plane or a hyperplane.
• The coefficients β0, β1, …, βn are estimated using Ordinary Least Squares (OLS). The goal is to choose the coefficients that minimize the sum of squared differences between the actual and the predicted values.
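A minimal sketch of OLS estimation on toy data. The numbers are constructed so that y = 1 + 2x1 + 2x2 exactly, purely for illustration:

```python
import numpy as np

# Toy multiple-regression data: y depends on two inputs (hypothetical numbers)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([7.0, 7.0, 15.0, 15.0, 21.0])   # exactly 1 + 2*x1 + 2*x2

# Add a column of ones so beta[0] plays the role of the intercept β0
X1 = np.column_stack([np.ones(len(X)), X])

# OLS solves the normal equations beta = (X^T X)^(-1) X^T y;
# np.linalg.lstsq is the numerically stable way to do this
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ beta
rss = np.sum((y - y_hat) ** 2)   # OLS minimizes this sum of squared residuals
```

Because the toy data is noise-free, the recovered coefficients match [1, 2, 2] and the residual sum of squares is essentially zero.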
Performance of Regression
By using MAE (Mean Absolute Error), we calculate the average absolute difference between the actual values and the predicted values.
Mean Squared Error (MSE) is defined as the mean (average) of the squared differences between the actual and the predicted values.
[email protected]
YHZEPDBA51
RMSE (Root Mean Squared Error) is the square root of the MSE, i.e., the square root of the average squared difference between the actual and the predicted values.
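The three metrics can be computed directly from their definitions; the actual/predicted values below are toy numbers for illustration:

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 6.5, 9.5])   # every prediction is off by 0.5

errors = actual - predicted
mae = np.mean(np.abs(errors))    # Mean Absolute Error  -> 0.5
mse = np.mean(errors ** 2)       # Mean Squared Error   -> 0.25
rmse = np.sqrt(mse)              # Root Mean Squared Error -> 0.5
```

Note that MSE squares the errors, so it penalizes large errors more heavily than MAE, while RMSE brings the units back to those of the target variable.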
R-squared values
R-square depicts the percentage of the variation in the dependent variable explained by the independent variables in the model:
R² = 1 − RSS/TSS
RSS = Residual sum of squares: it measures the difference between the predicted and the actual output. It is defined as follows:
RSS = Σ(yᵢ − ŷᵢ)²
TSS = Total sum of squares: it is the sum of squared deviations of the data points from the mean of the response variable:
TSS = Σ(yᵢ − ȳ)²
Adjusted R² will improve only if the added variable makes a significant contribution to the model; it adds a penalty for each extra variable:
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)
where R² is the R-square value, n = total number of observations, and k = total number of variables used in the model.
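A small sketch computing R² and adjusted R² from these definitions. The actual/predicted values are toy numbers, and k = 1 assumes a single-predictor model:

```python
import numpy as np

actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
predicted = np.array([10.5, 11.5, 14.5, 15.5, 18.5, 19.5])
n = len(actual)   # number of observations
k = 1             # number of predictors (assumed for this example)

rss = np.sum((actual - predicted) ** 2)        # residual sum of squares
tss = np.sum((actual - actual.mean()) ** 2)    # total sum of squares
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
```

Adjusted R² is always at most R², and the gap widens as more variables are added without improving the fit.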
Advantages and Disadvantages
Advantages
• Linear regression performs well when the relationship between the variables is approximately linear
• Easy to implement and train the model
• Overfitting can be handled using dimensionality reduction techniques, cross-validation, and regularization
Disadvantages
• Sometimes a lot of feature engineering is required
[email protected]
YHZEPDBA51
1. When will you use linear regression?
Linear Regression analysis is used when you want to predict a continuous dependent variable from a number of independent
variables.
2. How do you explain linear regression in layman's terms?
Linear regression is a way to explain the relationship between a dependent variable and one or more explanatory variables
using a straight line.
3. What is heteroscedasticity?
Heteroscedasticity is the opposite of homoscedasticity: the error terms do not have equal variance across observations. To correct this, a log transformation of the dependent variable is commonly used.
5. What are the possible ways of improving the accuracy of a linear regression model?
There could be multiple ways of improving the accuracy of a linear regression model; the most commonly used are as follows:
Outlier Treatment: Regression is sensitive to outliers, so it is important to treat them appropriately. Replacing the values with the mean, median, mode, or a percentile, depending on the distribution, can prove useful.
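As a sketch of one such treatment, outliers can be flagged with the common 1.5×IQR rule and replaced with the median. Both the data and the choice of rule are illustrative assumptions:

```python
import numpy as np

# Hypothetical feature with one extreme outlier
values = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 250.0])

# Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replace flagged outliers with the median (one of the treatments above)
median = np.median(values)
treated = np.where((values < lower) | (values > upper), median, values)
```

Here the extreme value 250 is replaced by the median, while the remaining observations pass through unchanged.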
6. What if data is not normally distributed?
Use transformations such as the power-law transformation, log transformation, Box-Cox, exponential transformation, etc.
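A sketch of such transformations on simulated right-skewed data. The lognormal sample is an assumption for illustration; note that both the log and Box-Cox transforms require strictly positive values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # right-skewed data

log_t = np.log(skewed)                  # log transform
boxcox_t, lam = stats.boxcox(skewed)    # Box-Cox estimates lambda automatically

skew_before = stats.skew(skewed)        # strongly positive for lognormal data
skew_after = stats.skew(log_t)          # near zero: log of lognormal is normal
```

For this particular distribution the log transform is ideal by construction; on real data, Box-Cox lets the data choose the transformation via the estimated lambda.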
7. Is feature scaling required in linear regression?
Yes, feature scaling is required if you are using gradient descent to fit the linear regression.
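A minimal standardization sketch; the house-feature numbers are hypothetical:

```python
import numpy as np

# Two features on very different scales (hypothetical house data)
sqft = np.array([850.0, 1200.0, 1500.0, 2000.0, 2400.0])
bedrooms = np.array([1.0, 2.0, 3.0, 3.0, 4.0])
X = np.column_stack([sqft, bedrooms])

# Standardize each column to zero mean and unit variance, so gradient
# descent takes comparably sized steps along every coefficient
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```

Without scaling, the loss surface is elongated along the square-footage axis and gradient descent converges slowly or oscillates.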
8. How is the best fit line selected?
The least sum of squares of errors is used as the cost function for linear regression. For all possible lines, calculate the sum of squares of errors; the line with the least sum of squared errors is the best fit line.
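This idea can be illustrated by brute force: evaluate the sum of squared errors over a grid of candidate lines and keep the smallest. The data is a toy example, and a coarse grid stands in for "all possible lines" (OLS finds the exact minimizer analytically):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.0])   # roughly y = 2x

def sse(a, b):
    """Sum of squared errors for the candidate line y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# Search a coarse grid of candidate (intercept, slope) pairs
best = min(((a, b) for a in np.linspace(-1, 1, 21)
                   for b in np.linspace(0, 4, 41)),
           key=lambda p: sse(*p))
```

The winning line has intercept near 0 and slope near 2, matching the pattern in the data; OLS reaches the same answer without any search.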