
Day7-Linear Regression New

Uploaded by

Shashank Ediga


Linear Regression

Introduction to Regression Analysis

• Regression analysis is a predictive modeling technique.

• It estimates the relationship between two or more variables: how the dependent variable changes when one of the independent variables changes.

• It predicts the value of a dependent variable from the value of at least one independent variable, and explains the impact of changes in an independent variable on the dependent variable.

Dependent variable: the variable we wish to explain; the main factor you are trying to understand and predict.
Independent variable: a variable used to explain the dependent variable; a factor that might influence the dependent variable.
Linear Regression
Linear regression attempts to model the relationship between two variables by fitting a linear equation (a straight line) to the observed data. One variable is considered to be the explanatory variable (e.g. your income), and the other is considered to be the dependent variable (e.g. your expenses).

The sloped straight line that best fits the given data is called the regression line. It is also called the best-fit line.
Uses of Regression
Three major uses for regression analysis are:

1. Determining the strength of predictors: identifying the strength of the effect that the independent variable(s) have on a dependent variable. For example, what is the strength of the relationship between dose and effect, sales and marketing spending, or age and income?

2. Forecasting an effect or the impact of changes: how much the dependent variable changes with a change in one or more independent variables. For example, "How much additional sales income do I get for each additional $1000 spent on marketing?"

3. Trend forecasting: for example, "What will the price of gold be in 6 months?"

Where is Linear Regression used?

• Evaluating Trends and Sales Estimates

• Analyzing the impact of Price Changes

• Assessing risk in the financial services and insurance domains

Linear regression equation
Mathematically, a linear regression is defined by this equation:

y = mx + c

Where:

• x is an independent variable.

• y is a dependent variable.

• c is the Y-intercept, which is the expected mean value of y when all x variables are equal to 0. On a regression graph, it's the point where the line crosses the Y axis.

• m is the slope of a regression line, which is the rate of change for y as x changes.

[Figure: a regression line with positive slope m and y-intercept c; one axis is labelled "Speed of Vehicle". Example line shown: Y = 0.4X + 2.4, where c = 2.4.]
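The slides do not state how m and c are chosen. Assuming the usual ordinary least-squares criterion (the line that minimizes the sum of squared vertical distances between the data points and the line), the closed-form solution for n observations can be sketched as follows. The symbols x_i, y_i, \bar{x} and \bar{y} (the sample means) are notation introduced here for illustration and do not appear in the slides.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```

With these definitions the fitted line always passes through the point (\bar{x}, \bar{y}).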
Linear regression in Excel with Analysis ToolPak
Enable the Analysis ToolPak add-in:
1. In Excel, click File > Options.
2. In the Excel Options dialog box, select Add-ins on the left sidebar, make sure Excel Add-ins is selected in the Manage box, and click Go.
3. In the Add-ins dialog box, tick Analysis ToolPak, and click OK.
Source data (X range: independent variable, Rain Fall; Y range: dependent variable, Umbrellas sold)

Month     Rain Fall (mm)   Umbrellas sold
Jan'18    82               15
Feb       92.5             25
Mar       83.2             17
Apr       97.7             28
May       131.9            41
Jun       141.3            47
Jul       165.4            50
Aug       140              46
Sep       126.7            37
Oct       97.8             22
Nov       86.2             20
Dec       99.6             30
Jan'19    87               14
Feb       97.5             27
Mar       88.2             14
Apr       102.7            30
May       123              43
Jun       146.3            49
Jul       160              49
Aug       145              44
Sep       131.7            39
Oct       118              36
Nov       91.2             20
Dec       104.6            32
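The same fit can be reproduced outside Excel. The following is a minimal sketch in plain Python, assuming the ordinary least-squares method; the function name `fit_line` and the variable names are hypothetical, chosen only for this illustration. Its slope and intercept should agree with the coefficients that Excel reports for this data.

```python
# Rainfall (mm) and umbrellas sold, copied from the source data table.
rainfall = [82, 92.5, 83.2, 97.7, 131.9, 141.3, 165.4, 140, 126.7, 97.8, 86.2, 99.6,
            87, 97.5, 88.2, 102.7, 123, 146.3, 160, 145, 131.7, 118, 91.2, 104.6]
umbrellas = [15, 25, 17, 28, 41, 47, 50, 46, 37, 22, 20, 30,
             14, 27, 14, 30, 43, 49, 49, 44, 39, 36, 20, 32]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rainfall, umbrellas)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Writing the formula out by hand keeps the sketch free of third-party dependencies; a numerical library would normally be used for larger data sets.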
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.957666798
R Square             0.917125697
Adjusted R Square    0.913358683
Standard Error       3.58141382
Observations         24

ANOVA
             df   SS            MS         F          Significance F
Regression   1    3122.774784   3122.775   243.4623   2.21604E-13
Residual     22   282.1835489   12.82652
Total        23   3404.958333

                 Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept        -19.07410899   3.372182168      -5.65631   1.09E-05   -26.06758677   -12.08063122   -26.06758677   -12.08063122
Rain Fall (mm)   0.45000132     0.02884018       15.60328   2.22E-13   0.390190448    0.509812192    0.390190448    0.509812192
Linear regression equation
Mathematically, a linear regression is defined by this equation:

y = bx + a

• x is an independent variable.
• y is a dependent variable.

• a is the Y-intercept, which is the expected mean value of y when all x variables are equal to
0. On a regression graph, it's the point where the line crosses the Y axis.

• b is the slope of a regression line, which is the rate of change for y as x changes.

Y (number of umbrellas sold) = Rainfall coefficient * X (average monthly rainfall) + Intercept

With the coefficients from the regression output rounded to two decimals:

Y = 0.45 * X - 19.07

For example, with the average monthly rainfall equal to 82 mm, the umbrella sales would be approximately 17.8:
0.45 * 82 - 19.074 = 17.8
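The worked example above can be checked with a few lines of Python. This is only a sketch: the coefficient values are copied from the Excel output, while the constant and function names are hypothetical.

```python
# Coefficients copied from the Coefficients column of the Excel output.
INTERCEPT = -19.07410899
RAINFALL_COEF = 0.45000132


def predict_umbrellas(rainfall_mm):
    """Estimate umbrellas sold for a given average monthly rainfall in mm."""
    return RAINFALL_COEF * rainfall_mm + INTERCEPT


estimate = predict_umbrellas(82)
print(round(estimate, 1))  # prints 17.8
```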

RESIDUAL OUTPUT

Regression analysis output: residuals

If you compare the estimated and actual number of umbrellas sold for the monthly rainfall of 82 mm, you will see that these numbers are slightly different:

• Estimated: 17.8 (calculated above)
• Actual: 15 (row 2 of the source data)

Why the difference? Because independent variables are never perfect predictors of the dependent variables. The residuals help you understand how far the actual values are from the predicted values (residual = actual value - predicted value):

Observation   Predicted Umbrellas sold   Residuals
1             17.82599924                -2.825999237
2             22.5510131                 2.448986904
3             18.36600082                -1.366000821
4             24.89101996                3.10898004
5             40.2810651                 0.7189349
6             44.51107751                2.488922493
7             55.35610932                -5.356109317
8             43.92607579                2.073924208
9             37.94105824                -0.941058237
10            24.93602009                -2.936020092
11            19.71600478                0.283995219
12            25.74602247                4.253977532
13            20.07600584                -6.076005837
14            24.8010197                 2.198980304
15            20.61600742                -6.616007421
16            27.14102656                2.858973441
17            36.27605335                6.723946647
18            46.76108411                2.238915893
19            52.92610219                -3.926102189
20            46.17608239                -2.176082391
21            40.19106484                -1.191064836
22            34.02604675                1.973953246
23            21.96601138                -1.966011381
24            27.99602907                4.003970933
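As a rough check on the residual table, the first three rows can be recomputed from the coefficients and the source data. This is a sketch only; the variable names are hypothetical and the coefficients are the values reported in the Excel output.

```python
# Coefficients copied from the Excel output.
INTERCEPT = -19.07410899
RAINFALL_COEF = 0.45000132

# First three observations from the source data: (rainfall in mm, umbrellas sold).
observations = [(82, 15), (92.5, 25), (83.2, 17)]

residuals = []
for rainfall_mm, actual in observations:
    predicted = RAINFALL_COEF * rainfall_mm + INTERCEPT
    # A residual is the actual value minus the predicted value.
    residuals.append(actual - predicted)

for number, residual in enumerate(residuals, start=1):
    print(f"Observation {number}: residual = {residual:.3f}")
```

A negative residual means the model over-estimated sales for that month; a positive one means it under-estimated them.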
Linear regression graph in Excel
1. Select the two columns with your data, including headers (E4:F28).
2. On the Insert tab, in the Charts group, click the Scatter chart icon, and select the Scatter thumbnail (the first one).

[Steps 3 to 5 are shown only as screenshots in the slides.]
Regression analysis output: Summary Output
This part tells you how well the calculated linear regression equation fits your source data.

Here's what each piece of information means:

Multiple R. It is the Correlation Coefficient that measures the strength of a linear relationship between two variables. The correlation coefficient can be any value between -1 and 1, and its absolute value indicates the relationship strength. The larger the absolute value, the stronger the relationship:
• 1 means a perfect positive relationship
• -1 means a perfect negative relationship
• 0 means no linear relationship at all

R Square. It is the Coefficient of Determination, which is used as an indicator of the goodness of fit. It shows the proportion of the variance in the dependent variable that the regression line explains. The R2 value is calculated as 1 minus the ratio of the residual sum of squares to the total sum of squares, where the total sum of squares is the sum of the squared deviations of the original data from the mean.
In our example, R2 is 0.92 (rounded to 2 digits), which is fairly good. It means that about 92% of the variation in the dependent variable (y-values, umbrellas sold) is explained by the independent variable (x-values, rainfall). As a rough rule of thumb, an R Square of 95% or more is considered a good fit, although acceptable thresholds vary by field.

Adjusted R Square. It is the R Square adjusted for the number of independent variables in the model. You will want to use this value instead of R Square for multiple regression analysis.

Standard Error. It is another goodness-of-fit measure that shows the precision of your regression analysis: the smaller the number, the more certain you can be about your regression equation. While R2 represents the percentage of the dependent variable's variance that is explained by the model, Standard Error is an absolute measure that shows the typical distance of the data points from the regression line, in the units of the dependent variable.

Observations. It is simply the number of observations in your model.
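The statistics described above can be recomputed from the source data. The sketch below uses plain Python and the standard textbook formulas for simple linear regression; these formulas and all variable names are assumptions made for illustration, since the slides do not spell out how Excel derives the values.

```python
import math

# Rainfall (mm) and umbrellas sold, copied from the source data table.
rainfall = [82, 92.5, 83.2, 97.7, 131.9, 141.3, 165.4, 140, 126.7, 97.8, 86.2, 99.6,
            87, 97.5, 88.2, 102.7, 123, 146.3, 160, 145, 131.7, 118, 91.2, 104.6]
umbrellas = [15, 25, 17, 28, 41, 47, 50, 46, 37, 22, 20, 30,
             14, 27, 14, 30, 43, 49, 49, 44, 39, 36, 20, 32]

n = len(rainfall)  # Observations
k = 1              # number of independent variables

# Ordinary least-squares fit of umbrellas on rainfall.
mean_x = sum(rainfall) / n
mean_y = sum(umbrellas) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, umbrellas))
         / sum((x - mean_x) ** 2 for x in rainfall))
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in rainfall]

# Sums of squares: total variation and the part left unexplained.
ss_total = sum((y - mean_y) ** 2 for y in umbrellas)
ss_residual = sum((y - p) ** 2 for y, p in zip(umbrellas, predicted))

r_square = 1 - ss_residual / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(ss_residual / (n - k - 1))

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"Standard Error    {standard_error:.6f}")
print(f"Observations      {n}")
```

Note that with a single independent variable, Multiple R is the absolute value of the correlation between x and y, which is why it is computed here as the square root of R Square.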
Regression analysis output: ANOVA
The second part of the output is Analysis of Variance (ANOVA):

Basically, it splits the sum of squares into individual components that give information about the levels of variability within your regression model:
• df is the number of degrees of freedom associated with each source of variance.
• SS is the sum of squares. The smaller the Residual SS compared with the Total SS, the better your model fits the data.
• MS is the mean square (SS divided by df).
• F is the F statistic, used in the F-test of the null hypothesis that the model explains none of the variance. It tests the overall significance of the model.
• Significance F is the P-value of F.

The ANOVA part is rarely used for a simple linear regression analysis in Excel, but you should definitely have a close look at the last component. The Significance F value gives an idea of how reliable (statistically significant) your results are. If Significance F is less than 0.05 (5%), your model is OK. If it is greater than 0.05, you'd probably better choose another independent variable. In our example, Significance F is 2.21604E-13, far below 0.05.
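The ANOVA decomposition can be sketched in plain Python as well. The formulas below are the standard ones for simple linear regression and the variable names are hypothetical. Significance F is not computed, because the Python standard library has no F distribution; if SciPy is installed, `scipy.stats.f.sf(f_statistic, df_regression, df_residual)` would be one way to obtain it.

```python
# Rainfall (mm) and umbrellas sold, copied from the source data table.
rainfall = [82, 92.5, 83.2, 97.7, 131.9, 141.3, 165.4, 140, 126.7, 97.8, 86.2, 99.6,
            87, 97.5, 88.2, 102.7, 123, 146.3, 160, 145, 131.7, 118, 91.2, 104.6]
umbrellas = [15, 25, 17, 28, 41, 47, 50, 46, 37, 22, 20, 30,
             14, 27, 14, 30, 43, 49, 49, 44, 39, 36, 20, 32]

n = len(rainfall)

# Ordinary least-squares fit of umbrellas on rainfall.
mean_x = sum(rainfall) / n
mean_y = sum(umbrellas) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, umbrellas))
         / sum((x - mean_x) ** 2 for x in rainfall))
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in rainfall]

# Split the total sum of squares into explained and unexplained parts.
ss_total = sum((y - mean_y) ** 2 for y in umbrellas)
ss_residual = sum((y - p) ** 2 for y, p in zip(umbrellas, predicted))
ss_regression = ss_total - ss_residual

# Degrees of freedom: one predictor, n - 2 for the residual, n - 1 in total.
df_regression = 1
df_residual = n - 2
df_total = n - 1

# Mean squares and the F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"Regression  df={df_regression}  SS={ss_regression:.4f}  MS={ms_regression:.4f}  F={f_statistic:.4f}")
print(f"Residual    df={df_residual}  SS={ss_residual:.4f}  MS={ms_residual:.4f}")
print(f"Total       df={df_total}  SS={ss_total:.4f}")
```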
