Linear Regression
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
This file is meant for personal use by [email protected] only. 1
Sharing or publishing the contents in part or full is liable for legal action.
Linear Regression
• Linear regression is a simple statistical regression method used for predictive analysis that shows the relationship between
the continuous independent variable (X-axis) and the continuous dependent variable (Y-axis).
• In regression, we try to calculate the best fit line which describes the relationship between the predictors and dependent
variable.
• The equation of the best fit line is Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term.
• When we have one independent variable, we call it Simple Linear Regression (SLR). If the number of independent variables is more than one, we call it Multiple Linear Regression (MLR).
• SLR Example: You are a social researcher interested in the relationship between income and happiness.
• MLR Example: The selling price of a house can depend on the desirability of the location, the number of bedrooms, the number of
bathrooms, the year the house was built, the square footage of the lot and a number of other factors.
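As an illustrative sketch of the SLR example above, the income–happiness relationship can be fit by least squares. All of the numbers below are hypothetical, purely for demonstration:

```python
import numpy as np

# Toy income-vs-happiness data (hypothetical survey values)
income = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])   # X, in $1000s
happiness = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8])      # Y, survey score

# Fit Y = a + b*X by least squares; np.polyfit returns [slope, intercept]
b, a = np.polyfit(income, happiness, deg=1)

predicted = a + b * income
residuals = happiness - predicted   # the error term e for each observation
```

With an intercept in the model, the residuals always average to zero; here the positive slope b reflects the assumed positive income–happiness relationship.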
Assumptions
Simple Linear Regression:
• Linearity: The relationship between independent variables and the mean of the dependent variable is linear.
• Homoscedasticity: The variance of residuals should be equal.
• Independence: Observations are independent of each other.
• Normality: For any fixed value of an independent variable, the dependent variable is normally distributed.
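These assumptions can be checked empirically on the residuals. A minimal sketch, using simulated data that satisfies the assumptions by construction; the thresholds below are rough rules of thumb, not formal statistical tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2.0 * x + rng.normal(0, 1, 200)   # linear signal + equal-variance noise

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Normality (rough check): residual skewness should be near zero
skew_res = stats.skew(residuals)

# Homoscedasticity (rough check): residual variance should be similar
# in the low-x half and the high-x half of the data
order = np.argsort(x)
half = len(x) // 2
ratio = residuals[order[half:]].var() / residuals[order[:half]].var()
```

In practice, formal diagnostics (e.g. Q-Q plots, the Breusch-Pagan test) are used instead of these crude split-half checks.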
Linear Regression Model Representation
• The representation is a linear equation that combines a set of input values (x) and predicted output (y).
• As such, both the input values (x) and the output value are numeric.
• For example, in a simple regression problem (a single x and a single y), the form of the model would be
Y= β0 + β1x
• For example, in a multiple linear regression problem, the form of the model would be
Y = β0 + β1x1 + β2x2 + … + βnxn
• In higher dimensions, when we have more than one input (x), the fitted line is called a plane or a hyperplane.
• The coefficients β0, β1, …, βn are estimated using Ordinary Least Squares (OLS). The goal is to choose the coefficients that minimize the sum of squared differences between the actual and the predicted values.
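A minimal sketch of OLS estimation on toy data. The numbers are constructed so that y = 1 + 2x1 + 2x2 exactly, purely for illustration:

```python
import numpy as np

# Toy multiple-regression data: y depends on two inputs (hypothetical numbers)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([7.0, 7.0, 15.0, 15.0, 21.0])   # exactly 1 + 2*x1 + 2*x2

# Add a column of ones so beta[0] plays the role of the intercept β0
X1 = np.column_stack([np.ones(len(X)), X])

# OLS solves the normal equations beta = (X^T X)^(-1) X^T y;
# np.linalg.lstsq is the numerically stable way to do this
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ beta
rss = np.sum((y - y_hat) ** 2)   # OLS minimizes this sum of squared residuals
```

Because the toy data is noise-free, the recovered coefficients match [1, 2, 2] and the residual sum of squares is essentially zero.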
Performance of Regression
By using MAE (Mean Absolute Error), we calculate the average absolute difference between the actual values and the predicted values.
Mean Squared Error (MSE) is defined as the mean (average) of the squared differences between the actual and the predicted values.
[email protected]
YHZEPDBA51
RMSE (Root Mean Squared Error) is the square root of the MSE, i.e., the square root of the average squared difference between the actual and the predicted values.
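The three metrics can be computed directly from their definitions; the actual/predicted values below are toy numbers for illustration:

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 6.5, 9.5])   # every prediction is off by 0.5

errors = actual - predicted
mae = np.mean(np.abs(errors))    # Mean Absolute Error  -> 0.5
mse = np.mean(errors ** 2)       # Mean Squared Error   -> 0.25
rmse = np.sqrt(mse)              # Root Mean Squared Error -> 0.5
```

Note that MSE squares the errors, so it penalizes large errors more heavily than MAE, while RMSE brings the units back to those of the target variable.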
R-squared values
R-square depicts the percentage of the variation in the dependent variable explained by the independent variables in the model:
R² = 1 − RSS/TSS
RSS = Residual sum of squares: it measures the difference between the predicted and the actual output. It is defined as follows:
RSS = Σ(yᵢ − ŷᵢ)²
TSS = Total sum of squares: it is the sum of squared deviations of the data points from the mean of the response variable:
TSS = Σ(yᵢ − ȳ)²
Adjusted R² will improve only if the added variable makes a significant contribution to the model; it adds a penalty for each extra variable:
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)
where R² is the R-square value, n = total number of observations, and k = total number of variables used in the model.
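A small sketch computing R² and adjusted R² from these definitions. The actual/predicted values are toy numbers, and k = 1 assumes a single-predictor model:

```python
import numpy as np

actual = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
predicted = np.array([10.5, 11.5, 14.5, 15.5, 18.5, 19.5])
n = len(actual)   # number of observations
k = 1             # number of predictors (assumed for this example)

rss = np.sum((actual - predicted) ** 2)        # residual sum of squares
tss = np.sum((actual - actual.mean()) ** 2)    # total sum of squares
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
```

Adjusted R² is always at most R², and the gap widens as more variables are added without improving the fit.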
Advantages and Disadvantages
Advantages
• Linear regression performs well when the relationship between the variables is approximately linear
• Easy to implement and train the model
• Overfitting can be handled using dimensionality reduction techniques, cross-validation, and regularization
Disadvantages
• Sometimes a lot of feature engineering is required
[email protected]
YHZEPDBA51
1. When will you use linear regression?
Linear Regression analysis is used when you want to predict a continuous dependent variable from a number of independent
variables.
2. How do you explain linear regression in layman's terms?
Linear regression is a way to explain the relationship between a dependent variable and one or more explanatory variables
using a straight line.
3. What is heteroscedasticity?
Heteroscedasticity is the opposite of homoscedasticity: the error terms do not have equal variance across observations. To correct this, a log transformation of the dependent variable is commonly used.
5. What are the possible ways of improving the accuracy of a linear regression model?
There could be multiple ways of improving the accuracy of a linear regression model; the most commonly used are as follows:
Outlier Treatment: Regression is sensitive to outliers, so it is important to treat them appropriately. Replacing the values with the mean, median, mode, or a percentile, depending on the distribution, can prove useful.
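As a sketch of one such treatment, outliers can be flagged with the common 1.5×IQR rule and replaced with the median. Both the data and the choice of rule are illustrative assumptions:

```python
import numpy as np

# Hypothetical feature with one extreme outlier
values = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 250.0])

# Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replace flagged outliers with the median (one of the treatments above)
median = np.median(values)
treated = np.where((values < lower) | (values > upper), median, values)
```

Here the extreme value 250 is replaced by the median, while the remaining observations pass through unchanged.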
6. What if data is not normally distributed?
Use transformations such as the power-law transformation, log transformation, Box-Cox, exponential transformation, etc.
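A sketch of such transformations on simulated right-skewed data. The lognormal sample is an assumption for illustration; note that both the log and Box-Cox transforms require strictly positive values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # right-skewed data

log_t = np.log(skewed)                  # log transform
boxcox_t, lam = stats.boxcox(skewed)    # Box-Cox estimates lambda automatically

skew_before = stats.skew(skewed)        # strongly positive for lognormal data
skew_after = stats.skew(log_t)          # near zero: log of lognormal is normal
```

For this particular distribution the log transform is ideal by construction; on real data, Box-Cox lets the data choose the transformation via the estimated lambda.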
7. Is feature scaling required in linear regression?
Yes, feature scaling is required if you are using gradient descent to fit the linear regression.
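A minimal standardization sketch; the house-feature numbers are hypothetical:

```python
import numpy as np

# Two features on very different scales (hypothetical house data)
sqft = np.array([850.0, 1200.0, 1500.0, 2000.0, 2400.0])
bedrooms = np.array([1.0, 2.0, 3.0, 3.0, 4.0])
X = np.column_stack([sqft, bedrooms])

# Standardize each column to zero mean and unit variance, so gradient
# descent takes comparably sized steps along every coefficient
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```

Without scaling, the loss surface is elongated along the square-footage axis and gradient descent converges slowly or oscillates.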
8. How is the best fit line selected?
The least sum of squares of errors is used as the cost function for linear regression. For all possible lines, calculate the sum of squares of errors; the line with the least sum of squared errors is the best fit line.
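This idea can be illustrated by brute force: evaluate the sum of squared errors over a grid of candidate lines and keep the smallest. The data is a toy example, and a coarse grid stands in for "all possible lines" (OLS finds the exact minimizer analytically):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.0])   # roughly y = 2x

def sse(a, b):
    """Sum of squared errors for the candidate line y = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

# Search a coarse grid of candidate (intercept, slope) pairs
best = min(((a, b) for a in np.linspace(-1, 1, 21)
                   for b in np.linspace(0, 4, 41)),
           key=lambda p: sse(*p))
```

The winning line has intercept near 0 and slope near 2, matching the pattern in the data; OLS reaches the same answer without any search.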