Simple Linear Regression
Xi 60 61 63 64 64 65 65 66 67 67 68
Yi 105 110 115 120 118 124 127 134 145 138 150
[Figure: scatter plot of Weight (lbs) against Height (Inches)]
Y = f(X)
Y = a + bX
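$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

Where

$$a = \frac{\sum_{i=1}^{n} y_i}{n} - b\,\frac{\sum_{i=1}^{n} x_i}{n}$$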
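The summation formulas for a and b can also be evaluated directly in code. The following is a minimal Python sketch, not a replacement for the calculator procedure; the function name `least_squares_line` and the toy data are hypothetical and chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + bX using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept
    return a, b


# Hypothetical toy data lying exactly on Y = 1 + 2X
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

The same function can be called with the Xi and Yi rows of a data table in place of the toy lists.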
Xi 60 61 63 64 64 65 65 66 67 67 68
Yi 105 110 115 120 118 124 127 134 145 138 150
This varies with different calculators, but your calculator should have a
“quick reference” card or info sheet stuck against its panel to show you
the keys.
Then press the keys for A and B to get the values for the least squares
line.
a = -111.2830
b = 3.6540
Y=-111.2830 + 3.6540 X
Where Y=weight, and X=height.
[Figure: linear regression line Y = -111.2830 + 3.6540X plotted against the scatter plot points; axes: Height (Inches) and Weight (lbs)]
Y = f(70 inches)
Y = -111.2830 + 3.6540 (70)
  = 144.497 lbs
Using calculators, simply type in 70, then press ŷ.
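The prediction step amounts to plugging the height into the fitted line. A small Python sketch, using the coefficients quoted in these notes (the function name `predict_weight` is hypothetical):

```python
# Coefficients as quoted in these notes for the height/weight example
a = -111.2830
b = 3.6540


def predict_weight(height_inches):
    """Evaluate the fitted line Y = a + bX at the given height."""
    return a + b * height_inches


print(round(predict_weight(70), 3))  # 144.497
```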
r = 0.913150717
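For readers without a calculator that reports r, the standard Pearson correlation coefficient can be computed from the same sums used for the least squares line. This is a sketch under that assumption; the function name `pearson_r` and the toy data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("X or Y has no variation; r is undefined")
    return numerator / denominator


# Hypothetical toy data on a perfect straight line, so r is 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Values of r close to +1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.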
Practice Exercises:
Temp (°C)     28    29    30    32    33    35    38    42    46
Life (hrs)  1000   980   890   950   921   885   900   880   850
Height, Y        Amount, X
28.1 27.6 6
32.3 33.2 7
34.8 35.0 8
38.2 39.4 9
43.5 46.8 10
City size   30   50   75  100  150  200  175  120
Food        40   37   40   42   41   45   44   37
Clothing    10   20   20   15   16   12   14   10
Housing     15   20   19   23   26   28   26   24
City size in millions of people, all expenditures in thousands of US
dollars.
a. Fit a simple linear model relating city size and annual expenses
per family.
b. Using the correlation coefficient between city size and annual
expenses per family, state whether there is a strong or weak
correlation.
c. What would be the expected annual family expense from a city
of 65 million people?
d. Can city size predict food expenditure better than it predicts
annual family expenditure? Use the correlation coefficient
for your answer.
Other Linear Regression Models: Exponential Regression,
Power Regression.
Sometimes, the data that you have may not fit into a simple
linear model. However, if you transform the original data pair via
some function like finding its natural logarithm or its inverse, you can
transform data that is inherently not linear into linear values to fit into
our simple linear regression model.
X  -2.0  -0.4   1.5   2.4   2.7   3.5   4.6   5.3   5.8   6.4   6.8
Y   5.3   8.8  13.3  17.9  18.9  24.4  28.3  34.0  44.0  55.1  72.2
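As an illustration of the transformation idea, an exponential model y = A·e^(Bx) becomes linear after taking natural logarithms: ln y = ln A + Bx. The sketch below fits that line with the simple linear regression formulas and converts back. The function name `fit_exponential` and the data are hypothetical (generated from a known curve so the result can be checked); a power model y = A·x^B could be handled the same way by taking logarithms of both x and y.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * exp(B * x) by regressing ln(y) on x; requires all y > 0."""
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    sum_x = sum(xs)
    sum_v = sum(ln_ys)
    sum_xv = sum(x * v for x, v in zip(xs, ln_ys))
    sum_x2 = sum(x * x for x in xs)

    B = (n * sum_xv - sum_x * sum_v) / (n * sum_x2 - sum_x ** 2)  # slope
    ln_A = sum_v / n - B * sum_x / n                              # intercept
    return math.exp(ln_A), B


# Hypothetical data generated exactly from y = 2 * exp(0.5 * x)
xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
A, B = fit_exponential(xs, ys)
print(round(A, 6), round(B, 6))
```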
$$\hat{y} = b_0 + b_1 X_1 + \cdots + b_k X_k$$
What this lecture note shows instead is how to use Microsoft™ Excel
worksheets to compute these coefficients and to determine which
subset of the available variables Xi should be included in a multiple
linear regression model.
Let’s say that a country’s GNP is thought to be predicted by
three indicator variables: total consumption in the capital city X1,
total investments made by the citizens X2, and the city’s
government expenditure X3.
The following table shows the values of each variable Xi and the
true GNP during that year.
X1 X2 X3 Y (GNP)
50 10 100 330
50 20 150 260
50 30 200 290
50 40 280 306
70 50 240 300
70 70 350 260
80 80 200 200
80 90 750 520
2. Load up Excel from your computer and type in the data on the
worksheet.
3. Then go to the menu item Tools-Data Analysis, and choose
Regression from the Data Analysis dialogue box.
5. Press OK and you should have the report after the dialogue
box on the next page:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.974163
R Square 0.948993
Adjusted R Square 0.910738
Standard Error 28.13851
Observations 8
ANOVA
df SS MS F Significance F
Regression 3 58924.4 19641.47 24.80685 0.004794
Residual 4 3167.103 791.7758
Total 7 62091.5
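The entries of this report are tied together by simple arithmetic. The sketch below recomputes R Square, Adjusted R Square, Standard Error and F from the sums of squares in the ANOVA table, using the standard definitions of those quantities (assumed here, since the report itself does not spell them out). The variable names are hypothetical.

```python
import math

n = 8   # observations
k = 3   # predictors (regression degrees of freedom)

ss_regression = 58924.4
ss_residual = 3167.103
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)

standard_error = math.sqrt(ms_residual)
f_stat = ms_regression / ms_residual

print(round(r_square, 6), round(adj_r_square, 6))
print(round(standard_error, 5), round(f_stat, 4))
```

The results should agree with the report to the printed precision, apart from rounding in the inputs.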
Look at the p-values in the bottom (third) table of the report;
all p-values must be below the significance level α.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.947926
R Square 0.898564
Adjusted R Square 0.85799
Standard Error 35.49168
Observations 8
ANOVA
df SS MS F Significance F
Regression 2 55793.2 27896.6 22.14614 0.003277
Residual 5 6298.298 1259.66
Total 7 62091.5
Since all p-values are below 0.05, stop. The efficient model
to predict Y contains X2 and X3, and the model is:
Y = 245.7008 - 2.35443 X2 + 0.624944 X3.
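For readers who want to check the reduced model outside Excel, the coefficients can be obtained by solving the least squares normal equations. The following is a minimal pure-Python sketch, not the procedure Excel uses internally; the helper names `solve_linear_system` and `fit_multiple_regression` are hypothetical, and the data are the X2, X3 and GNP columns from the table above.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple_regression(columns, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*X1 + ... + bk*Xk."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


x2 = [10, 20, 30, 40, 50, 70, 80, 90]
x3 = [100, 150, 200, 280, 240, 350, 200, 750]
gnp = [330, 260, 290, 306, 300, 260, 200, 520]

b0, b2, b3 = fit_multiple_regression([x2, x3], gnp)
print(round(b0, 4), round(b2, 5), round(b3, 6))
```

The printed values should match the model coefficients quoted above to the precision shown there.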
Practice Exercise:
X1 X2 X3 X4 X5 X6 Y
7 9 10 12 13 22 995
6 10 18 18 19 25 1325
8 9 7 10 17 26 1452
6 7 9 25 37 27 1735
7 8 10 35 22 28 2188
8 9 11 18 15 29 1435
9 7 6 51 18 30 2980
10 6 8 16 41 32 1470
9 2 17 28 36 35 1240