Regression Analysis Using Excel
Contents
Regression Model
Regression Analysis in Excel
Simple Linear Regression
Correlation
How To Do A Regression in Excel
Slope
Intercept
ANOVA
References
Regression Model
β1 is constant
What is it?
Determines if Y depends on X and provides a math equation for the relationship (continuous data)
Examples:
Process conditions
m = slope = rise / run
b = Y intercept = the Y value at the point where the line intersects the Y axis
[Figure: a straight line on X and Y axes with the rise, the run, and the intercept b marked]
A simple linear relationship can be described mathematically by
Y = mX + b
Simple Linear Regression
slope = rise / run = (6 - 3) / (10 - 4) = 1/2
intercept = 1
Y = 0.5X + 1
[Figure: a straight line on X and Y axes (X ticks at 0, 5, 10) with the rise and the run marked]
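The rise-over-run calculation can be sketched in Python. The two points (4, 3) and (10, 6) are read off the worked example; the function name `line_through` is illustrative and does not come from the slides.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise / run
    intercept = y1 - slope * x1     # value of Y where X = 0
    return slope, intercept

slope, intercept = line_through((4, 3), (10, 6))
print(f"Y = {slope}X + {intercept}")
```

With these two points the sketch gives a slope of 0.5 and an intercept of 1, matching Y = 0.5X + 1.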
Simple Regression Example
The size and rent of apartments will be analyzed in EXCEL.

Size (square feet)    Rent ($)
 726                   935
 700                   875
 956                  1150
1100                  1400
1285                  1650
1985                  2300
1369                  1800
1175                  1400
1225                  1450
1245                  1100
1259                  1700
1150                  1200
 896                  1150
1361                  1600
1040                  1650
 755                  1200
1000                   800
1200                  1750
Scatter Plot
[Figure: scatter plot of Rent (vertical axis, 500 to 2500) against Size (horizontal axis, 500 to 2100)]
Scatter plot suggests that there is a ‘linear’ relationship between Rent and Size
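As a rough cross-check outside EXCEL, a least-squares line can be fitted in plain Python. This is a minimal sketch that assumes the first data column is Size and the second is Rent, and it uses only the 18 rows that appear in the data table. The EXCEL output is based on 25 observations, so the coefficients from this sketch should not be expected to reproduce the EXCEL values. The helper name `fit_line` is illustrative.

```python
# Size (square feet) and Rent ($) for the rows listed in the data table.
size = [726, 700, 956, 1100, 1285, 1985, 1369, 1175, 1225,
        1245, 1259, 1150, 896, 1361, 1040, 755, 1000, 1200]
rent = [935, 875, 1150, 1400, 1650, 2300, 1800, 1400, 1450,
        1100, 1700, 1200, 1150, 1600, 1650, 1200, 800, 1750]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(size, rent)
# Residual = observed rent minus the rent predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(size, rent)]
print(f"Rent = {intercept:.3f} + {slope:.3f} * Size")
```

A positive slope here agrees with the upward trend seen in the scatter plot.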
Interpreting EXCEL output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.85
R Square             0.72
Adjusted R Square    0.71
Standard Error       194.60
Observations         25

ANOVA
              df    SS             MS             F              Significance F
Regression     1    2268776.545    2268776.545    59.91376452    7.51833E-08
Residual      23    870949.4547    37867.3676
Total         24    3139726
Regression Equation
Rent = 177.121+1.065*Size
Interpretation of the Regression Coefficient
Regression Equation:
Rent = 177.121+1.065*Size
Thus, when Size=1000
Rent=177.121+1.065*1000=$1242 (rounded)
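The prediction step can be written as a small function. The coefficients are the ones in the regression equation; the function name `predict_rent` is illustrative.

```python
INTERCEPT = 177.121  # from the regression equation
SLOPE = 1.065        # additional rent ($) per additional square foot

def predict_rent(size):
    """Predicted rent ($) for an apartment of the given size (square feet)."""
    return INTERCEPT + SLOPE * size

print(round(predict_rent(1000)))
```

For Size=1000 this reproduces the rounded value of $1242.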
Using Regression for Prediction – Caution!
[Figure: the extrapolated relationship compared with the true relationship]
The sign of r is the same as that of the coefficient of X (Size) in the regression equation (in our
case the sign is positive). Also, if you look at the scatter plot, you will note that the sign should
be positive.
The “Coefficient of Determination”, r-squared (sometimes R-squared), defines the amount of the variation in Y that is attributable to variation in X.
Getting r2 from EXCEL
It is important to remember that r-squared is never negative. It is the square of the coefficient of
correlation r. In our case, r2=0.72 suggests that 72% of the variation in Rent is explained by the
variation in Size. The higher the value of r2, the better the simple regression model fits the data.
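The values reported by EXCEL can be recomputed from the sums of squares in the ANOVA table. This is a sketch using the standard identities r2 = SS Regression / SS Total and adjusted r2 = 1 - (1 - r2)(n - 1)/(n - 2) for one predictor. The variable names are illustrative.

```python
import math

ss_regression = 2268776.545  # SS for Regression in the ANOVA table
ss_total = 3139726.0         # SS for Total in the ANOVA table
n = 25                       # Observations

r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)     # positive root, because the slope is positive
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(round(r, 2), round(r_squared, 2), round(adj_r_squared, 2))
```

Rounded to two decimals these agree with Multiple R, R Square and Adjusted R Square in the output.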
Standard Error (SE)
[Figure: plot of Rent ($) against Size (square feet)]
Getting the Standard Error (SE) from EXCEL
In our example, the standard error associated with estimating rent is $194.60.
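The standard error in the output is the square root of the residual mean square. A short sketch that recomputes it from the ANOVA row for Residual (variable names are illustrative):

```python
import math

ss_residual = 870949.4547  # SS for Residual in the ANOVA table
df_residual = 23           # n - 2 for a simple regression with n = 25

ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)

print(f"{standard_error:.2f}")
```

The result agrees with the reported 194.60 to two decimals.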
Is the Simple Regression Model Statistically Valid?
H0: Slope = 0
H1: Slope ≠ 0
What could we say about the linear relationship between X and Y if the slope
were zero?
Using coefficient information for testing if slope=0
P-value = 7.52E-08 = 7.52*10^-8 = 0.0000000752
t-stat=7.740 and P-value=7.52E-08. The P-value is very small. If it is smaller than our α level,
we reject the null hypothesis; otherwise we do not. If α=0.05, we reject the null hypothesis and
conclude that the slope is not zero. The same result holds at α=0.01 because the P-value is smaller
than 0.01. Thus, at the 0.05 (or 0.01) level, we conclude that the slope is NOT zero, implying that
our model is statistically valid.
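The decision rule can be sketched in a few lines of Python. With a single predictor, the t-statistic for the slope is the square root of the ANOVA F-statistic (carrying the sign of the slope), which gives a quick consistency check on the reported t-stat. The variable names are illustrative.

```python
import math

f_stat = 59.91376452   # F from the ANOVA table
p_value = 7.51833e-08  # Significance F, equal to the slope P-value here

# One predictor and a positive slope, so t is the positive root of F.
t_stat = math.sqrt(f_stat)
print(f"t-stat = {t_stat:.3f}")

decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decisions[alpha]}")
```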
Using ANOVA for testing if slope=0 in EXCEL
F=59.91376 and P-value=7.51833E-08. The P-value is again very small. If it is smaller than our
α level, we reject the null hypothesis; otherwise we do not. Thus, at the 0.05 (or 0.01) level, the
slope is NOT zero, implying that our model is statistically valid. This is the same conclusion we
reached using the t-test.
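The F value itself follows from the ANOVA table: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch with illustrative variable names:

```python
ss_regression, df_regression = 2268776.545, 1
ss_residual, df_residual = 870949.4547, 23

ms_regression = ss_regression / df_regression  # MS for Regression
ms_residual = ss_residual / df_residual        # MS for Residual
f_stat = ms_regression / ms_residual

print(f"F = {f_stat:.4f}")
```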
Confidence Interval for the Slope of Size
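A confidence interval for the slope has the form slope ± t* × SE(slope). The sketch below rests on several assumptions that are not stated in the output shown: a 95% confidence level, the standard table value t* = 2.0687 for 23 degrees of freedom, and a standard error backed out as slope / t-stat from the rounded values 1.065 and 7.740. Treat the result as approximate.

```python
slope = 1.065    # coefficient of Size in the regression equation
t_stat = 7.740   # reported t-stat for the slope
t_crit = 2.0687  # assumed: two-sided 95% critical value of t with 23 df

se_slope = slope / t_stat    # standard error implied by the rounded values
margin = t_crit * se_slope
lower, upper = slope - margin, slope + margin

print(f"{lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval lies entirely above zero, which is consistent with rejecting slope=0 at the 0.05 level.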
How To Do A Regression in Excel
Select Scatter and this chart sub-type
Highlight only cells that contain the x and y values
Enter a chart title
Label x and y axes
Store on a new worksheet
Name the worksheet
1. Click on grey background -- Delete
2. Click on any horizontal line -- Delete
3. Click on legend -- Delete
Right mouse click on any data point.
Select “Add Trendline”.
Select Linear from the trendline options.
Looks linear.
Return to Original Worksheet.
Go to Tools Menu
Select Data Analysis
Select Regression
1. Highlight cells of y-variable
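In the regression output:
Multiple R = r
R Square = r2
Adjusted R Square = adj r2
Standard Error = s
Observations = n
SS for Regression = SSR
SS for Residual = SSE
SS for Total = SSTOTAL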
Graphical investigation:
• side-by-side box plots
• multiple histograms
[Figure: days (vertical axis, 10 to 13) by treatment group (A, B, P)]
What does ANOVA do?
At its simplest (there are extensions) ANOVA tests the
following hypotheses:
H0: The means of all the groups are equal.
Ha: Not all the means are equal.
Group i has
• $n_i$ = # of individuals in group i
• $x_{ij}$ = value for individual j in group i
• $\bar{x}_i$ = mean for group i
• $s_i$ = standard deviation for group i
How ANOVA works (outline)
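Between Group Variation is built from $(\bar{x}_i - \bar{x})^2$.
Within Group Variation is built from $(x_{ij} - \bar{x}_i)^2$.
The ANOVA F-statistic is a ratio of the Between Group Variation
divided by the Within Group Variation:
$$F = \frac{\text{Between}}{\text{Within}} = \frac{MSG}{MSE}$$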
How are These Computations Made?
Between groups: $(\bar{x}_i - \bar{x})^2$
Within groups: $(x_{ij} - \bar{x}_i)^2$
An Even Smaller Example
ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         5.127333     2    2.563667    10.21575    0.008394    4.737416
Within Groups          1.756667     7    0.250952
Total                  6.884        9
SS stands for sum of squares
• ANOVA splits this into 3 parts
$$\sum_{obs} (x_{ij} - \bar{x}_i)^2 \qquad \sum_{obs} (\bar{x}_i - \bar{x})^2 \qquad \sum_{obs} (x_{ij} - \bar{x})^2$$
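The entries of the small ANOVA table can be checked against one another. This sketch uses only the SS and df values from that table and the relations MS = SS / df and F = MS Between / MS Within. The variable names are illustrative.

```python
ss_between, df_between = 5.127333, 2
ss_within, df_within = 1.756667, 7

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# The Between and Within sums of squares add up to the Total.
ss_total = ss_between + ss_within

print(f"MS Between = {ms_between:.6f}")
print(f"MS Within  = {ms_within:.6f}")
print(f"F = {f_stat:.4f}, SS Total = {ss_total:.3f}")
```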
ANOVA Output
Analysis of Variance for days
Source       DF    SS       MS       F       P
treatment     2    34.74    17.37    6.45    0.006
Error        22    59.26     2.69
Total        24    94.00
F is the ratio Mean Square Between / Mean Square Within = MSG / MSE
If we ignore the groups for a moment and just compute the standard deviation of the
entire data set, we see
$$s^2 = \frac{\sum (x_{ij} - \bar{x})^2}{n - 1} = \frac{SST}{DFT} = MST$$
So $SST = (n - 1)s^2$, and $MST = s^2$. That is, SST and MST measure the TOTAL
variation in the data set.
Connections between SSE, MSE, and Standard Deviation
Remember:
$$s_i^2 = \frac{\sum_j (x_{ij} - \bar{x}_i)^2}{n_i - 1} = \frac{SS[\text{Within Group } i]}{df_i}$$
So $SS[\text{Within Group } i] = (s_i^2)(df_i)$
This means that we can compute SSE from the standard deviations and sizes
(df) of each group:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_I - 1)s_I^2}{n - I} = \frac{(df_1)s_1^2 + (df_2)s_2^2 + \dots + (df_I)s_I^2}{df_1 + df_2 + \dots + df_I}$$
so MSE is the pooled estimate of variance:
$$s_p^2 = \frac{SSE}{DFE} = MSE$$
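The pooled variance can be computed directly from group sizes and standard deviations. This is a minimal sketch; the function name `pooled_variance` and the example numbers are hypothetical and are not taken from the slides.

```python
def pooled_variance(sizes, sds):
    """Pooled variance (MSE) from group sizes n_i and standard deviations s_i."""
    dfs = [n - 1 for n in sizes]                      # df_i = n_i - 1
    sse = sum(df * s ** 2 for df, s in zip(dfs, sds)) # SSE = sum of s_i^2 * df_i
    dfe = sum(dfs)                                    # DFE = n - I
    return sse / dfe

# Hypothetical example: two groups of 3 with standard deviations 1.0 and 3.0.
mse = pooled_variance([3, 3], [1.0, 3.0])
print(mse)
```

In the hypothetical example, SSE = 2(1) + 2(9) = 20 and DFE = 4, so the pooled variance is 5.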
In Summary
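$$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = s^2 (DFT)$$
$$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2 = \sum_{groups} s_i^2 (df_i)$$
$$SSG = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_{groups} n_i (\bar{x}_i - \bar{x})^2$$
$$SSE + SSG = SST; \quad MS = \frac{SS}{DF}; \quad F = \frac{MSG}{MSE}$$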
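The summary relations can be traced end to end on a tiny data set. The three groups below are hypothetical values chosen so the arithmetic is easy to follow by hand; they are not data from the slides.

```python
from statistics import mean

# Hypothetical measurements for three groups (not data from the slides).
groups = {
    "A": [5.0, 6.0, 7.0],
    "B": [7.0, 8.0, 9.0],
    "P": [9.0, 10.0, 11.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total, within-group (error) and between-group sums of squares.
sst = sum((x - grand_mean) ** 2 for x in all_values)
sse = sum((x - mean(values)) ** 2 for values in groups.values() for x in values)
ssg = sum(len(values) * (mean(values) - grand_mean) ** 2
          for values in groups.values())

dfg = len(groups) - 1                 # number of groups minus 1
dfe = len(all_values) - len(groups)   # n minus number of groups
msg = ssg / dfg
mse = sse / dfe
f_stat = msg / mse

print(f"SSG = {ssg}, SSE = {sse}, SST = {sst}, F = {f_stat}")
```

Here SSG = 24, SSE = 6 and SST = 30, so SSE + SSG = SST holds, and F = 12 / 1 = 12.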
R2 Statistic
$$R^2 = \frac{SS[\text{Between}]}{SS[\text{Total}]} = \frac{SSG}{SST}$$
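Applied to the "Analysis of Variance for days" output, the statistic is a single division. A short sketch using the SS values from that table (variable names are illustrative):

```python
ssg = 34.74  # SS for treatment (between groups)
sse = 59.26  # SS for Error (within groups)
sst = 94.00  # SS for Total

r_squared = ssg / sst
print(f"R2 = {r_squared:.2f}")
```

Under this reading, roughly 37% of the total variation in days is associated with differences between the treatment groups.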
Once ANOVA indicates that the groups do not all appear to have the same means,
what do we do?
         A         B
B   -3.685
     0.435
P   -4.863    -3.238
    -0.859     0.766

These give 98.01% CIs for each pairwise difference.
Only P vs A is significant (both values have the same sign).
98% CI for A-P is (-4.86, -0.86)
Tukey’s Method in R
References
http://www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel
http://office.microsoft.com/en-001/excel-help/slope-HP005209264.aspx
http://office.microsoft.com/en-in/excel-help/intercept-HP005209143.aspx
http://capacitas.wordpress.com/2013/01/14/forecasting-basic-time-series-decomposition-in-excel/
THANK YOU