Lecture 3
Simple Linear Regression
Advertising Spending and Sales
Advertising data:
obs    TV      radio   newspaper   sales
⋯      ⋯       ⋯       ⋯           ⋯
197    94.2    4.9     8.1         9.7      (observation j)
What is Predictive Analytics?
Advertising & Sales
Can we predict Sales using data on TV, Radio, and Newspaper spending? How?
A Few Important Questions
• Is there a relationship between advertising spending and sales?
The Simplest Model
Let j ∈ {TVAd, RadioAd, NewspaperAd}.
Sales = β0 + β1 Xj + ϵj
(β0 + β1 Xj is the non-random part; ϵj is the random part.)
Assumptions:
1. E[ϵj] = 0
2. Var(ϵj) = σ²
3. ϵj ∼ N(0, σ²), independent and identically distributed (i.i.d.)
4. corr(Xj, ϵj) = 0
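A minimal sketch of what these assumptions look like in simulated data. The parameter values (β0 = 7, β1 = 0.05, σ = 3), the budget range, and the helper name `simulate_sales` are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def simulate_sales(x, beta0, beta1, sigma, rng):
    """Draw Sales = beta0 + beta1 * x + eps with eps ~ N(0, sigma^2), i.i.d."""
    eps = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # random part
    return beta0 + beta1 * x + eps                         # non-random part plus noise

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 300.0, size=10_000)   # hypothetical ad budgets
sales = simulate_sales(x, beta0=7.0, beta1=0.05, sigma=3.0, rng=rng)

# The errors are recoverable here only because we chose the true parameters.
eps = sales - (7.0 + 0.05 * x)
print("mean of errors:", eps.mean())              # close to 0 (assumption 1)
print("std of errors:", eps.std())                # close to sigma (assumption 2)
print("corr(x, eps):", np.corrcoef(x, eps)[0, 1]) # close to 0 (assumption 4)
```

With real data the errors are never observed, so these checks can only be approximated through the residuals of a fitted line.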
Visualization of Linear Regression
Why Linear Models?
Linear Models are:
The Simple Linear Regression Model
A single predictor Xj can be:
• quantitative
• qualitative
• a transformation (e.g., log)
• a basis expansion (e.g., squares)
• a numeric coding of a qualitative variable
• an interaction
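The predictor types listed above can be sketched as plain array operations. The budgets and the `region` variable are made-up illustrations, not part of the Advertising data.

```python
import numpy as np

tv = np.array([100.0, 50.0, 20.0, 150.0])                  # made-up quantitative predictor
region = np.array(["urban", "rural", "urban", "rural"])    # made-up qualitative predictor

log_tv = np.log(tv)                             # transformation
tv_sq = tv ** 2                                 # basis expansion (squares)
is_urban = (region == "urban").astype(float)    # numeric 0/1 coding of a qualitative variable
tv_x_urban = tv * is_urban                      # interaction of TV with the urban indicator

print(log_tv, tv_sq, is_urban, tv_x_urban)
```

Each derived column can then play the role of Xj in the simple regression.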
Linear vs. Non-linear Model
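Linear: the model is linear in the parameters
$$\text{Sales} = \beta_0 + \beta_1 X_j^2 + \epsilon$$
$$\log(\text{Sales}) = \beta_0 + \beta_1 \log(X_j) + \beta_2 X_j^2 + \epsilon$$
Non-linear: the model is NOT linear in the parameters
$$\text{Sales} = \frac{\beta_0^2}{\beta_1 + X_j} + \epsilon$$
$$\frac{\log(\text{Sales})}{\log(X_j + 1)} = \beta_0^2 \exp(\beta_1 X_j) + \log(\beta_2)\log(X_j) + \epsilon$$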
Which line (model) explains the data better?
How do sales depend on the TV advertising budget?
Sales = β0 + β1 TVAd + ϵ, with intercept β0 and slope β1
Candidate lines:
(β0 = 1, β1 = 1.5)
(β0 = 4, β1 = 1)
(β0 = 4, β1 = 0.8)
Estimation
Let our fitted line be ŷ = f̂(TVAd) = β̂0 + β̂1 TVAd.
Residual: $e_i = y_i - \hat{f}(x_i) = y_i - \hat{y}_i$
[Figure: residuals of the fitted line, plotted against TV]
The line explains the data better (on average) if and only if its residuals are smaller overall.
We minimize the Residual Sum of Squares:
$$\min_{\beta_0,\beta_1} \mathrm{RSS} = \min_{\beta_0,\beta_1} \left[ \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right]$$
[Figure: RSS as a function of β0 and β1]
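As a rough illustration of ranking lines by RSS, the sketch below scores the three candidate (β0, β1) pairs from the earlier slide. The five data points are made up for the example, and `rss` is a hypothetical helper name.

```python
import numpy as np

def rss(beta0, beta1, x, y):
    """Residual sum of squares of the line beta0 + beta1 * x."""
    residuals = y - (beta0 + beta1 * x)
    return float(np.sum(residuals ** 2))

# Made-up observations, not the Advertising data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([4.1, 4.9, 6.2, 6.8, 8.1])

candidates = [(1.0, 1.5), (4.0, 1.0), (4.0, 0.8)]   # (beta0, beta1) pairs
rss_by_line = {c: rss(c[0], c[1], x, y) for c in candidates}
best_line = min(rss_by_line, key=rss_by_line.get)   # candidate with the smallest RSS

for line, value in rss_by_line.items():
    print(line, round(value, 2))
print("best candidate:", best_line)
```

On these made-up points the line (β0 = 4, β1 = 1) has the smallest RSS; with other data the ranking can differ.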
OLS
$$\min_{\beta_0,\beta_1} \mathrm{RSS} = \min_{\beta_0,\beta_1} \left[ \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right]$$
⟹
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 0.0475$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 7.03$$
[Figure: Sales (vertical axis) against TV (horizontal axis)]
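The closed-form OLS estimates can be sketched directly from the two formulas. The function name `ols_simple` and the toy data are assumptions for illustration; the Advertising data itself is not reproduced here.

```python
import numpy as np

def ols_simple(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return float(beta0_hat), float(beta1_hat)

# Noiseless toy data on the line y = 2 + 3x, so OLS should recover it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
b0, b1 = ols_simple(x, y)
print(b0, b1)
```

On noisy data the same function should agree with a general-purpose fitter such as `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.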
Assessing Accuracy of Coefficient Estimates
• Now, we need to know how “good” our estimates are…
(i.e. how well does this line capture the pattern in the data?)
[Figure: Sales against TV with the fitted line; β̂0 = 7.03, β̂1 = 0.0475]
Let the true data-generating process (DGP) be Y = β0 + β1X + ϵ = 2 + 3X + ϵ.
The unobserved errors vary from sample to sample, so the line estimated over each sample differs.
[Figure: two panels of Y against X]
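A small simulation sketch of this sample-to-sample variation, using the DGP Y = 2 + 3X + ϵ from the slide. The error standard deviation (2), the sample size (50), the range of X, and the number of repeated samples are assumptions chosen for the example.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    return y_bar - slope * x_bar, slope

rng = np.random.default_rng(1)
n_samples, n_obs = 2000, 50
slopes = np.empty(n_samples)
for s in range(n_samples):
    x = rng.uniform(-2.0, 2.0, size=n_obs)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 2.0, size=n_obs)  # fresh errors in every sample
    _, slopes[s] = fit_line(x, y)

print("average estimated slope:", slopes.mean())  # near the true slope of 3
print("spread of the estimates:", slopes.std())   # what the standard error measures
```

Each individual sample gives a different slope, but the estimates scatter around the true value.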
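Assessing Accuracy of Coefficient Estimates
But we only have one sample and finitely many observations… We have formulas for the standard error: how far (on average) the estimate differs from the true value.
$$\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] \qquad \mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$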
Do we know σ²?
We do not observe σ², but we can use its estimate:
$$\hat{\sigma}^2 = \frac{\mathrm{RSS}}{n-2} = \frac{\sum_{i=1}^{n} e_i^2}{n-2}$$
so that
$$\mathrm{SE}(\hat{\beta}_0)^2 = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] \qquad \mathrm{SE}(\hat{\beta}_1)^2 = \frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
For the regression of Sales on TV:
$$\hat{\sigma} = \sqrt{\frac{\mathrm{RSS}}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}} = 3.259$$
[Figure: Sales against TV]
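A minimal sketch of how σ̂² and the two standard errors could be computed from one sample. The function name `ols_with_se` and the four toy observations are assumptions for illustration.

```python
import numpy as np

def ols_with_se(x, y):
    """Return (beta0_hat, beta1_hat, sigma2_hat, se_beta0, se_beta1)."""
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    beta0 = y_bar - beta1 * x_bar
    residuals = y - (beta0 + beta1 * x)
    sigma2_hat = np.sum(residuals ** 2) / (n - 2)          # RSS / (n - 2)
    se_beta0 = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))
    se_beta1 = np.sqrt(sigma2_hat / sxx)
    return tuple(float(v) for v in (beta0, beta1, sigma2_hat, se_beta0, se_beta1))

# Toy data, small enough to verify by hand.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
beta0, beta1, sigma2_hat, se_beta0, se_beta1 = ols_with_se(x, y)
print(beta0, beta1, sigma2_hat, se_beta0, se_beta1)
```

For these four points RSS = 1.8, so σ̂² = 1.8 / 2 = 0.9, and the standard errors follow from the formulas above.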
Confidence Interval
Approximate 95% confidence interval: β̂ ± 2 ⋅ SE(β̂)
        lower    estimate   upper
β̂0     6.12     7.03       7.94
β̂1     0.042    0.047      0.053
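A worked-arithmetic sketch of the ± 2 ⋅ SE rule, using the interval endpoints reported for β̂0. The standard error is backed out from the interval width rather than quoted from the lecture, so treat it as an implied, rounded value; the helper names are hypothetical.

```python
def approx_ci(estimate, se, k=2.0):
    """Interval estimate +/- k * SE (k = 2 gives roughly 95% coverage)."""
    return estimate - k * se, estimate + k * se

def implied_se(lower, upper, k=2.0):
    """Standard error implied by an interval of the form estimate +/- k * SE."""
    return (upper - lower) / (2.0 * k)

se_b0 = implied_se(6.12, 7.94)            # from the reported interval for the intercept
lower_b0, upper_b0 = approx_ci(7.03, se_b0)
print(se_b0, lower_b0, upper_b0)
```

The same two helpers apply to β̂1, although the three-decimal rounding of its endpoints makes the implied standard error less precise.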
Hypothesis Testing (A two-tailed test)
Null hypothesis: β1 = 0 ⟺ there is NOT a significant effect of X on Y
Is β̂1 sufficiently far from zero to reject the null? How far?
Test statistic: t = (β̂1 − 0) / SE(β̂1)
Reject the null at the 5% level if |t| > 1.96 (2.5% in each tail).
[Figure: distribution of t with 2.5% rejection regions below −1.96 and above 1.96]
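A small sketch of the two-tailed test using only the standard library. It relies on the normal approximation implied by the ±1.96 cutoffs; with few observations a t distribution with n − 2 degrees of freedom would be the usual choice. The estimate and standard error in the example are made up.

```python
import math

def t_statistic(beta_hat, se, null_value=0.0):
    """Distance of the estimate from the null value, in standard-error units."""
    return (beta_hat - null_value) / se

def two_sided_p_normal(t):
    """P(|Z| > |t|) for a standard normal Z (normal approximation)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def reject_null(t, critical=1.96):
    """Two-tailed decision at the 5% level."""
    return abs(t) > critical

t = t_statistic(beta_hat=0.5, se=0.2)   # made-up estimate and standard error
p_value = two_sided_p_normal(t)
print(t, p_value, reject_null(t))
```

Here t = 2.5 exceeds 1.96, so the null β1 = 0 would be rejected at the 5% level for these made-up numbers.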
Goodness-of-Fit
TSS: Total Sum of Squares
$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
This is the total variation in y. Our model Y = β0 + β1X will be closer to f if we explain a higher proportion of TSS by using X.
RSS: Residual Sum of Squares
$$\mathrm{RSS} = \sum_{i=1}^{n} e_i^2$$
This is our estimate of the variation in ϵ.
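$$r = \mathrm{corr}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
If we have only one predictor X ⟹ r² = R², i.e., how strongly X and Y are linearly related.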
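A quick numerical check, on made-up data, that R² from the fitted line equals the squared correlation when there is a single predictor. The function name `r_squared` is a hypothetical helper.

```python
import numpy as np

def r_squared(x, y):
    """R^2 = 1 - RSS/TSS for the least-squares line of y on x."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    rss = np.sum((y - (beta0 + beta1 * x)) ** 2)
    tss = np.sum((y - y_bar) ** 2)
    return float(1.0 - rss / tss)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
R2 = r_squared(x, y)
r = float(np.corrcoef(x, y)[0, 1])
print(R2, r ** 2)
```

For these points TSS = 5 and RSS = 1.8, so R² = 0.64, and r = 0.8 gives the same value when squared.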
Multiple Linear Regression
What if we have p variables: X1, …, Xp?
obs    X1      X2      X3          Y
       TV      radio   newspaper   sales
⋯      ⋯       ⋯       ⋯           ⋯
197    94.2    4.9     8.1         9.7
Question:
If we consider three separate simple linear regressions:
Sales = β0,TV + β1,TV TVAd + ϵ          (ϵ ← RadioAd & NewspaperAd are here)
Sales = β0,Ra + β1,Ra RadioAd + ϵ       (ϵ ← TVAd & NewspaperAd are here)
Sales = β0,Ne + β1,Ne NewspaperAd + ϵ   (ϵ ← TVAd & RadioAd are here)
The Meaning of Multiple Linear Regression
Y = β0 + β1X1 + ⋯ + βjXj + ⋯ + βpXp + ϵ,   p ≥ 2
Visualization of Multiple Linear Regression
Residuals: ei = yi − f̂(xi,1, …, xi,p)
Difference between model and data
[Figure: Sales plotted against TV and Radio]
Sales = β0,TV + β1,TV TVAd + ϵ

Simple regression of sales on radio: Sales = β0,Ra + β1,Ra RadioAd + ϵ
              Coefficient   Std. error   t-statistic   p-value
Intercept     9.312         0.563        16.54         < 0.0001
radio         0.203         0.020        9.92          < 0.0001

Simple regression of sales on newspaper: Sales = β0,Ne + β1,Ne NewspaperAd + ϵ
              Coefficient   Std. error   t-statistic   p-value
Intercept     12.351        0.621        19.88         < 0.0001
newspaper     0.055         0.017        3.30          0.00115

TABLE 3.3. More simple linear regression models for the Advertising data. Coefficients of the simple linear regression model for number of units sold on Top: radio advertising budget and Bottom: newspaper advertising budget. A $1,000 increase in spending on radio advertising is associated with an average increase in sales by around 203 units, while the same increase in spending on newspaper advertising is associated with an average increase in sales by around 55 units. (Note that the sales variable is in thousands of units, and the radio and newspaper variables are in thousands of dollars.)

Suppose you enter a new market: should you advertise in the newspaper?
Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

              Coefficient   Std. error   t-statistic   p-value
Intercept     2.939         0.3119       9.42          < 0.0001
TV            0.046         0.0014       32.81         < 0.0001
radio         0.189         0.0086       21.89         < 0.0001
newspaper     −0.001        0.0059       −0.18         0.8599

TABLE 3.4. For the Advertising data, least squares coefficient estimates of the multiple linear regression of number of units sold on radio, TV, and newspaper advertising budgets.

Here Xj represents the jth predictor and βj quantifies the association between that variable and the response. We interpret βj as the average effect on Y of a one unit increase in Xj, holding all other predictors fixed. In the advertising example, (3.19) becomes

sales = β0 + β1 × TV + β2 × radio + β3 × newspaper + ϵ.   (3.20)

We control for other explanations.
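Columns of the data: 1, x1, x2, x3, and the response y.
Design matrix (a column of ones for the intercept, then one column per predictor):
$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1,p} \\ 1 & x_{21} & x_{22} & \dots & x_{2,p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{n,p} \end{bmatrix}$$
Estimated residual: ê = y − ŷ
[Figure: the vectors x1, x2, and ŷ]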
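A minimal sketch of multiple regression in matrix form, using `np.linalg.lstsq` on a design matrix with a leading column of ones. The data are simulated; the coefficient values, the predictor ranges, and the noise level are assumptions, not estimates from the Advertising data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
tv = rng.uniform(0.0, 10.0, size=n)          # simulated predictors (hypothetical units)
radio = rng.uniform(0.0, 10.0, size=n)
newspaper = rng.uniform(0.0, 10.0, size=n)

X = np.column_stack([np.ones(n), tv, radio, newspaper])   # design matrix with intercept column
true_beta = np.array([3.0, 0.05, 0.2, 0.0])                # assumed coefficients
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
y_hat = X @ beta_hat                               # fitted values
e_hat = y - y_hat                                  # estimated residuals

print("beta_hat:", beta_hat)
print("X^T e_hat:", X.T @ e_hat)   # numerically zero: residuals are orthogonal to every column of X
```

The orthogonality of ê to the columns of X is the geometric content of the least-squares fit: ŷ is the projection of y onto the space spanned by those columns.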
Correlated Information
[Figure: vectors x1 (Radio) and x2 (Newspaper)]
The extra information added by NewspaperAd has no impact on sales once RadioAd is controlled for.
What is Predictive Analytics?
Advertising & Sales
i.e., Sales = β0 + (β1 > 0) TVAd + (β2 > 0) RadioAd + (β3 > 0) NewspaperAd + ϵ
Omitted Variables Bias
Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ
If we instead fit Sales = β̂0 + β̂1 RadioAd + ϵ′, the estimate β̂1 (here the coefficient on RadioAd) is biased “upward.”
The effect of the other advertising media on Sales is reflected in β̂1 (they are positively correlated with RadioAd).
Endogeneity
Why?
Because the omitted media end up in the error term of the RadioAd-only regression:
ϵ′ := β1 TVAd + β3 NewspaperAd + ϵ
so ϵ′ is correlated with RadioAd, which violates the assumption corr(X, ϵ) = 0.
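A simulation sketch of omitted variable bias with one omitted predictor. Everything here is assumed for illustration: the coefficients, the strength of the correlation between the two media, and the helper name `fit`. It also checks the in-sample identity that the short-regression slope equals the long-regression slope plus the omitted coefficient times the slope from regressing the omitted variable on the included one.

```python
import numpy as np

def fit(columns, y):
    """Least-squares coefficients with an intercept: [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(3)
n = 5000
radio = rng.normal(0.0, 1.0, size=n)
tv = 0.6 * radio + rng.normal(0.0, 1.0, size=n)    # TV positively correlated with radio (assumed)
sales = 1.0 + 0.5 * tv + 0.3 * radio + rng.normal(0.0, 1.0, size=n)

long_beta = fit([radio, tv], sales)    # [intercept, radio, tv]: controls for TV
short_beta = fit([radio], sales)       # [intercept, radio]: TV is left in the error
delta = fit([radio], tv)[1]            # slope of TV on radio

implied_short = long_beta[1] + long_beta[2] * delta
print("radio coefficient, controlling for TV:", long_beta[1])
print("radio coefficient, TV omitted:", short_beta[1])
print("long coefficient plus bias term:", implied_short)
```

Because the omitted variable has a positive effect and is positively correlated with radio, the radio-only coefficient comes out larger than the one that controls for TV.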