
SHERVIN SHAHROKHI TEHRANI

Predictive Analytics With SAS


Lecture Three: Linear Regression I

Prof. Ryan Webb

1
Simple Linear Regression

2
Advertising Spending and Sales
Advertising data (one row per observation, j = 1, …, 200):

Obs    TV     radio  newspaper  sales
  1   230.1   37.8     69.2      22.1
  2    44.5   39.3     45.1      10.4
  3    17.2   45.9     69.3       9.3
  4   151.5   41.3     58.5      18.5
  5   180.8   10.8     58.4      12.9
  ⋯
197    94.2    4.9      8.1       9.7
198   177.0    9.3      6.4      12.8
199   283.6   42.0     66.2      25.5
200   232.1    8.6      8.7      13.4

3
What is Predictive Analytics?
Advertising & Sales

A firm's advertising data across 200 markets:

• The X-axis shows the advertising budget, in thousands of dollars
• The Y-axis shows sales, in thousands of units

Can we predict Sales using data on TV, Radio, and Newspaper spending? How?

4
A Few Important Questions
• Is there a relationship between advertising spending and sales?

• How strong is the relationship between advertising and sales?

• Which media contributes to sales?

• How accurately can we estimate the effect of each medium on sales?

• How accurately can we predict future sales?

• Is the relationship linear?

• Is there synergy among the advertising media?

5
The Simplest Model
Let j ∈ {TVAd, RadioAd, NewspaperAd}:

Sales = β0 + β1 Xj + ϵj
        (β0 + β1 Xj is the non-random part; ϵj is the random part)

Assumptions:
1. E[ϵj] = 0
2. Var(ϵj) = σ²
3. ϵj ∼ N(0, σ²), and the errors are independent and identically distributed (i.i.d.)
4. corr(Xj, ϵj) = 0
6
Visualization of Linear Regression

7
Why Linear Models?
Linear models are:

1. Simple and interpretable (they were developed before computers…)

2. Sometimes able to outperform fancier non-linear methods, when

• the error variance is relatively high, or

• data is limited
8
The Simple Linear Regression Model
A single predictor Xj can be:
• quantitative
• qualitative
• a transformation (e.g. log)
• a basis expansion (e.g. squares)
• a numeric coding of a qualitative variable
• an interaction

Linear Regression Model:   Y = f(X) + ϵ
                             = β0 + β1 X + ϵ

The model is linear in the parameters; β0 and β1 are the unknown parameters/coefficients.
9
Linear vs. Non-linear Models

Linear (linear in the parameters):
  Sales = β0 + β1 Xj + ϵ
  Sales = β0 + β1 Xj² + ϵ
  Sales = β0 + β1 Xj + β2 Xj² + ϵ
  log(Sales) = β0 + β1 log(Xj) + β2 Xj² + ϵ

Non-linear (NOT linear in the parameters):
  Sales = β0^β1 + β1 Xj + ϵ
  Sales = β0² / (β1 + Xj) + ϵ
  Sales = β0² exp(β1 Xj) + log(β2) log(Xj) + ϵ
  log(Sales) / log(Xj + 1) = β0² exp(β1 Xj) + log(β2) log(Xj) + ϵ
11
Which Line (Model) Explains the Data Better?
How do sales depend on the TV advertising budget?

  Sales = β0 + β1 TVAd + ϵ      (β0: intercept, β1: slope)

Candidate lines: (β0 = 1, β1 = 1.5), (β0 = 4, β1 = 1), (β0 = 4, β1 = 0.8)

• β0 is the average sales when TVAd = 0.
• β1 is the marginal increase in sales if TVAd goes up by 1 unit.
12
Estimation
Let our line be ŷ = f̂(x) = β̂0 + β̂1 TVAd.

Residual:
  e_i = y_i − f̂(x_i) = y_i − ŷ_i

This is not a "random error"; it is the difference between the data and the prediction.

Residual Sum of Squares:
  RSS = e1² + e2² + … + en²

[Figure: scatter plot of Sales vs. TV with a fitted line and the residuals drawn as vertical segments.]
13
The line explains the data better (on average)

if and only if

the Residual Sum of Squares is the smallest,

if and only if

MSE_train is the smallest.

14
We minimize the Residual Sum of Squares:

  min_{β0, β1} RSS = min_{β0, β1} Σ_{i=1}^{n} (y_i − β0 − β1 x_i)²

[Figure: the RSS surface over (β0, β1), shown both as a 3D surface and as a contour plot.]
15
OLS

  min_{β0, β1} RSS = min_{β0, β1} Σ_{i=1}^{n} (y_i − β0 − β1 x_i)²

This method is called "ordinary least squares" (OLS).
16
OLS

  min_{β0, β1} RSS = min_{β0, β1} Σ_{i=1}^{n} (y_i − β0 − β1 x_i)²

⟹

  β̂1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² = 0.0475

  β̂0 = ȳ − β̂1 x̄ = 7.03

An extra $1,000 in TV advertising increases sales by about 47.5 units.

[Figure: Sales vs. TV with the fitted least-squares line.]

17
Assessing Accuracy of Coefficient Estimates
• Now we need to know how "good" our estimates are
  (i.e., how well does this line capture the pattern in the data?)

• Why? How far, and how confidently, can we use it to answer a question?
  (Prediction or Inference)

[Figure: Sales vs. TV with the fitted line: β̂1 = 0.0475, β̂0 = 7.03.]

18
Assessing Accuracy of Coefficient Estimates
Let the true DGP be Y = β0 + β1 X + ϵ = 2 + 3X + ϵ.

[Figure: left, an example of one simulated dataset with its fitted line; right, the estimated lines from 10 simulated datasets.]

• There is variation in the estimates over each sample.

• This is due to the randomness of the sample: the unobserved errors vary from sample to sample.

19
Assessing Accuracy of Coefficient Estimates
But we only have one sample and finite observations… We have formulas for the standard error: how far (on average) the estimate differs:

  SE(β̂0)² = σ² [ 1/n + x̄² / Σ_i (x_i − x̄)² ]        SE(β̂1)² = σ² / Σ_i (x_i − x̄)²

Is there a problem completing the formulas above?
20
Assessing Accuracy of Coefficient Estimates
But we only have one sample and finite observations… We have formulas for the standard error: how far (on average) the estimate differs:

  SE(β̂0)² = σ² [ 1/n + x̄² / Σ_i (x_i − x̄)² ]        SE(β̂1)² = σ² / Σ_i (x_i − x̄)²

Do we know σ²?
21
Assessing Accuracy of Coefficient Estimates
We do not know σ², but we can use its estimate:

  σ̂² = RSS / (n − 2) = Σ_i ê_i² / (n − 2)

  SE(β̂0)² = σ̂² [ 1/n + x̄² / Σ_i (x_i − x̄)² ]        SE(β̂1)² = σ̂² / Σ_i (x_i − x̄)²

[Figure: Sales vs. TV with the fitted line and residuals.]
22
  σ̂ = sqrt( RSS / (n − 2) ) = sqrt( Σ_i ê_i² / (n − 2) )    — the "average" size of a residual

  σ̂ = 3.259

Even if the model were correct, any sales prediction would be off by about 3,260 units, on average (irreducible error).

[Figure: Sales vs. TV with the fitted line.]

23
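The residual standard error can be sketched as follows, again on the assumed five-observation subset (so σ̂ here differs from the full-sample 3.259):

```python
import math

# Residual standard error sigma_hat = sqrt(RSS / (n - 2)), sketched on the
# same assumed five-observation subset (not the full n = 200 data).
tv    = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
n = len(tv)
x_bar, y_bar = sum(tv) / n, sum(sales) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv, sales)) \
     / sum((x - x_bar) ** 2 for x in tv)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(tv, sales)]
rss = sum(e ** 2 for e in residuals)
sigma_hat = math.sqrt(rss / (n - 2))   # the "average" size of a residual
```

The divisor n − 2 (rather than n) accounts for the two estimated coefficients β̂0 and β̂1.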
Confidence Interval

  β̂ ± 2 · SE(β̂)      (more precisely, β̂ − 1.96 · SE(β̂) to β̂ + 1.96 · SE(β̂))

For the Sales data:

  β̂0:  [6.12, 7.94]    (estimate 7.03)
  β̂1:  [0.042, 0.053]  (estimate 0.047)
24
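A 95% interval for β̂1 can be sketched on the same assumed five-observation subset. Note that the slides' full-sample interval [0.042, 0.053] is far tighter because n = 200 there, and the 1.96 cutoff is only a normal approximation at small n:

```python
import math

# Approximate 95% CI for beta1_hat on the assumed five-observation subset:
# beta1_hat +/- 1.96 * SE(beta1_hat), with SE^2 = sigma_hat^2 / sum (x_i - x_bar)^2.
tv    = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
n = len(tv)
x_bar, y_bar = sum(tv) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in tv)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv, sales)) / sxx
b0 = y_bar - b1 * x_bar

rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv, sales))
sigma_hat = math.sqrt(rss / (n - 2))

se_b1 = math.sqrt(sigma_hat ** 2 / sxx)      # SE(beta1_hat)
ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)  # approximate 95% interval
```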
Hypothesis Testing (A two-tailed test)
Null Hypothesis: β1 = 0.  There IS NOT a significant effect of X on Y.

Alternative Hypothesis: β1 ≠ 0.  There IS a significant effect of X on Y.

Is β̂1 sufficiently far from zero to reject the null? How far?

25
Hypothesis Testing (A two-tailed test)
Null Hypothesis: β1 = 0.  There IS NOT a significant effect of X on Y.

Alternative Hypothesis: β1 ≠ 0.  There IS a significant effect of X on Y.

Test statistic:
  t = (β̂1 − 0) / SE(β̂1)

It is unlikely (5%) that the t-stat falls in the rejection regions by chance, so reject the null!

[Figure: standard normal density with 2.5% rejection regions beyond t = −1.96 and t = 1.96.]
26
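The two-tailed test can be sketched on the same assumed subset, rejecting the null β1 = 0 at the 5% level when |t| > 1.96 (the slides' normal approximation):

```python
import math

# t-test sketch on the assumed five-observation subset:
# t = (beta1_hat - 0) / SE(beta1_hat); reject beta1 = 0 when |t| > 1.96.
tv    = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
n = len(tv)
x_bar, y_bar = sum(tv) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in tv)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv, sales)) / sxx
b0 = y_bar - b1 * x_bar
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv, sales))
se_b1 = math.sqrt((rss / (n - 2)) / sxx)

t = (b1 - 0) / se_b1
reject_null = abs(t) > 1.96
```

Even on this tiny assumed subset, the TV effect clears the 1.96 cutoff.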
Goodness-of-Fit

TSS: Total Sum of Squares                              Y = f(X) + ϵ

  TSS = Σ_{i=1}^{n} (y_i − ȳ)²

This is the total variation in y. Our model Y = β0 + β1 X will be closer to f if we explain a higher proportion of TSS by using X.

RSS: Residual Sum of Squares

  RSS = Σ_{i=1}^{n} ê_i²

This is our estimate of the variation in ϵ.

27
Goodness-of-Fit

R² statistic: the proportion of variance explained, i.e., how much variation is removed by the regression.

  TSS = Σ_{i=1}^{n} (y_i − ȳ)²      (total variation in y)

  RSS = Σ_{i=1}^{n} ê_i²            (our estimate of the variation in ϵ)

  R² = (TSS − RSS) / TSS = 1 − RSS / TSS

28
Goodness-of-Fit

R² statistic: the proportion of variance explained, i.e., how much variation is removed by the regression.

  r = corr(X, Y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / sqrt( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² )

  R² = (TSS − RSS) / TSS = 1 − RSS / TSS

If we have only one predictor X ⟹ r² = R², i.e., R² measures how strongly X and Y are linearly connected.

29
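These goodness-of-fit quantities can be sketched on the same assumed five-observation subset, including the single-predictor identity r² = R²:

```python
import math

# Goodness-of-fit sketch on the assumed five-observation subset:
# TSS, RSS, R^2, and the check that r^2 = R^2 with one predictor.
tv    = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
n = len(tv)
x_bar, y_bar = sum(tv) / n, sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in tv)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

tss = sum((y - y_bar) ** 2 for y in sales)                   # total variation in y
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv, sales)) # unexplained variation
r2 = 1 - rss / tss                                           # R^2

r = sxy / math.sqrt(sxx * tss)                               # corr(X, Y)
# With a single predictor, r**2 equals r2 (up to floating-point error).
```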
Multiple Linear Regression

30
What if we have p variables: X1, ⋯, Xp?

        X1      X2     X3         Y
Obs     TV      radio  newspaper  sales
  1    230.1    37.8     69.2      22.1
  2     44.5    39.3     45.1      10.4
  3     17.2    45.9     69.3       9.3
  4    151.5    41.3     58.5      18.5
  5    180.8    10.8     58.4      12.9
  ⋯
197     94.2     4.9      8.1       9.7
198    177.0     9.3      6.4      12.8
199    283.6    42.0     66.2      25.5
200    232.1     8.6      8.7      13.4
31
Question:

Should we consider three separate simple linear regressions:

  Sales = β0Tv + β1Tv TVAd + ϵ
  Sales = β0Ra + β1Ra RadioAd + ϵ
  Sales = β0Ne + β1Ne NewspaperAd + ϵ

Or a multiple linear regression:

  Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

32
If we consider three separate simple linear regressions:

  Sales = β0Tv + β1Tv TVAd + ϵ           (RadioAd & NewspaperAd are in the error term here)
  Sales = β0Ra + β1Ra RadioAd + ϵ        (TVAd & NewspaperAd are here)
  Sales = β0Ne + β1Ne NewspaperAd + ϵ    (TVAd & RadioAd are here)

1. The multiple regression Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ provides more information about the advertising effect of each medium (when the others are present): what will be the effect of the advertising budgets (X_TVAd, X_RadioAd, X_Newspaper)?

2. If there is any correlation between Xj and Xj′, then it is a violation of corr(Xj, ϵ) = 0. This is called Omitted Variable Bias.
33

The Meaning of Multiple Linear Regression

  Y = β0 + β1 X1 + ⋯ + βj Xj + ⋯ + βp Xp + ϵ,    p ≥ 2

The effect of Xj on Y, holding all other predictors fixed:
if Xj increases by 1 unit, then Y will increase by βj, holding all other predictors fixed.

34
Visualization of Multiple Linear Regression

The model is Y = β0 + β1 X1 + ⋯ + βj Xj + ⋯ + βp Xp + ϵ,    p ≥ 2

Residuals:
  e_i = y_i − f̂(x_{i,1}, …, x_{i,p})
The difference between the model and the data.

[Figure: Sales as a fitted plane over TV and Radio, with residuals shown as vertical segments.]
35
Sales = β0Tv + β1Tv TVAd + ϵ

Simple regression of sales on radio:   Sales = β0Ra + β1Ra RadioAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept       9.312        0.563         16.54      < 0.0001
radio           0.203        0.020          9.92      < 0.0001

Simple regression of sales on newspaper:   Sales = β0Ne + β1Ne NewspaperAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept      12.351        0.621         19.88      < 0.0001
newspaper       0.055        0.017          3.30      0.00115

TABLE 3.3. More simple linear regression models for the Advertising data. Coefficients of the simple linear regression of number of units sold on (top) radio advertising budget and (bottom) newspaper advertising budget. A $1,000 increase in spending on radio advertising is associated with an average increase in sales of around 203 units, while the same increase in spending on newspaper advertising is associated with an average increase of around 55 units. (Sales are in thousands of units; the budgets are in thousands of dollars.)

Suppose you enter a new market: should you advertise in the newspaper?

36
Sales = β0Tv + β1Tv TVAd + ϵ

Simple regression of sales on radio:   Sales = β0Ra + β1Ra RadioAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept       9.312        0.563         16.54      < 0.0001
radio           0.203        0.020          9.92      < 0.0001

Simple regression of sales on newspaper:   Sales = β0Ne + β1Ne NewspaperAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept      12.351        0.621         19.88      < 0.0001
newspaper       0.055        0.017          3.30      0.00115

Multiple regression:   Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept       2.939        0.3119         9.42      < 0.0001
TV              0.046        0.0014        32.81      < 0.0001
radio           0.189        0.0086        21.89      < 0.0001
newspaper      −0.001        0.0059        −0.18      0.8599

TABLE 3.4. For the Advertising data, least squares coefficient estimates of the multiple linear regression of number of units sold on radio, TV, and newspaper advertising budgets. The multiple regression controls for the other explanations.

37

Design matrix (columns: 1, x1, x2, x3; response y):

      ⎡ 1  x11  x12  …  x1,p ⎤
  X = ⎢ 1  x21  x22  …  x2,p ⎥
      ⎢ ⋮   ⋮    ⋮        ⋮  ⎥
      ⎣ 1  xn1  xn2  …  xn,p ⎦

Estimated residual:   ê = y − ŷ

[Figure: y projected onto the plane spanned by the column vectors x1 and x2; ê is orthogonal to that plane.]
38
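With the design matrix X in hand, the OLS coefficients solve the normal equations (XᵀX)β = Xᵀy. A self-contained sketch (Python with a hand-rolled Gaussian elimination; the course itself fits this in SAS, and only the first five Advertising rows are used here as an assumed subset):

```python
# A sketch of multiple OLS via the normal equations (X^T X) beta = X^T y,
# solved with Gaussian elimination. Uses only the first five Advertising
# rows (an assumed subset; the slides fit all 200 markets).
rows = [  # TV, radio, newspaper, sales
    (230.1, 37.8, 69.2, 22.1),
    ( 44.5, 39.3, 45.1, 10.4),
    ( 17.2, 45.9, 69.3,  9.3),
    (151.5, 41.3, 58.5, 18.5),
    (180.8, 10.8, 58.4, 12.9),
]
X = [[1.0, r[0], r[1], r[2]] for r in rows]  # leading column of 1s for the intercept
y = [r[3] for r in rows]
n, p = len(X), len(X[0])

# Normal equations: A = X^T X (p x p), b = X^T y (length p)
A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]

# Gaussian elimination with partial pivoting, then back substitution
for col in range(p):
    piv = max(range(col, p), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for row in range(col + 1, p):
        f = A[row][col] / A[col][col]
        for k in range(col, p):
            A[row][k] -= f * A[col][k]
        b[row] -= f * b[col]
beta = [0.0] * p
for row in range(p - 1, -1, -1):
    beta[row] = (b[row] - sum(A[row][k] * beta[k] for k in range(row + 1, p))) / A[row][row]

# The residual vector e = y - X beta is orthogonal to every column of X,
# which is exactly the geometric projection picture above.
resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
```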
Correlated Information

[Figure: the predictor vectors x1 (Radio) and x2 (Newspaper), nearly aligned.]

The extra information added by NewspaperAd has no impact on sales once RadioAd is controlled for.

The variation in Sales can be explained by RadioAd.
39
Correlated Information
Leaving out important explanatory variables can severely bias parameter estimates if:
1. The omitted variable is correlated with the response, and
2. The omitted variable is correlated with other predictors.

[Figure: correlated predictor vectors x1 and x2.]
40
What is Predictive Analytics?
Advertising & Sales

We observe that advertising (in any medium) has a positive effect on Sales:

  Sales = β0 + (β1 > 0) TVAd + (β2 > 0) RadioAd + (β3 > 0) NewspaperAd + ϵ

41
Sales = β0Tv + β1Tv TVAd + ϵ

Simple regression of sales on radio:   Sales = β0Ra + β1Ra RadioAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept       9.312        0.563         16.54      < 0.0001
radio           0.203        0.020          9.92      < 0.0001

Simple regression of sales on newspaper:   Sales = β0Ne + β1Ne NewspaperAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept      12.351        0.621         19.88      < 0.0001
newspaper       0.055        0.017          3.30      0.00115

Multiple regression:   Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

             Coefficient   Std. error   t-statistic   p-value
Intercept       2.939        0.3119         9.42      < 0.0001
TV              0.046        0.0014        32.81      < 0.0001
radio           0.189        0.0086        21.89      < 0.0001
newspaper      −0.001        0.0059        −0.18      0.8599

(Tables 3.3 and 3.4 for the Advertising data, repeated: in the multiple regression, which controls for the other media, the newspaper coefficient is no longer significant.)

42
Omitted Variable Bias

  Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

If we instead fit    Sales-hat = β̂0 + β̂1 X_RadioAd    (error ϵ′),

the estimated coefficient β̂1 will be too big: it is biased "upward."
The effects of the other advertising media on Sales are reflected in β̂1 (they are positively correlated).

Endogeneity
43
Omitted Variable Bias

  Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

If we instead fit    Sales-hat = β̂0 + β̂1 X_RadioAd    (error ϵ′),

the estimated coefficient β̂1 will be too big: it is biased "upward."

Why?
44
Omitted Variable Bias

  Sales = β0 + β1 TVAd + β2 RadioAd + β3 NewspaperAd + ϵ

If we instead fit    Sales-hat = β̂0 + β̂1 X_RadioAd + ϵ′,    where ϵ′ := β0 + β1 TVAd + β3 NewspaperAd + ϵ absorbs the omitted terms,

then corr(X_RadioAd, ϵ′) ≠ 0.

This violates a key assumption of linear regression.

The bias can go up or down, depending on the correlation and the model.
45
Directions of Bias
A data scientist should think about the possible directions of biases:

True Model:       Y = β0 + β1 X1 + β2 X2 + ϵ
Estimated Model:  Ŷ = β̂0 + β̂1 X1

Bias in β̂1:
            Corr(X1, X2) > 0    Corr(X1, X2) < 0
  β2 > 0    Positive Bias       Negative Bias
  β2 < 0    Negative Bias       Positive Bias


46
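The top-left cell of the table above can be sketched with a small simulation (all numbers here are assumed for illustration): with β2 > 0 and Corr(X1, X2) > 0, the short regression of Y on X1 alone is biased upward.

```python
# A sketch of omitted-variable bias direction: true model Y = 1 + 2*X1 + 3*X2
# with beta2 = 3 > 0 and Corr(X1, X2) > 0, so regressing Y on X1 alone
# should overstate beta1 (true value 2). All numbers are assumed.
import random

random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * v + random.gauss(0, 0.5) for v in x1]   # positively correlated with X1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]     # true beta1 = 2, beta2 = 3

x_bar = sum(x1) / n
y_bar = sum(y) / n
b1_short = sum((a - x_bar) * (c - y_bar) for a, c in zip(x1, y)) \
           / sum((a - x_bar) ** 2 for a in x1)
# Omitting X2 pushes the estimate toward 2 + 3 * 0.8 = 4.4, well above 2.
```

Flipping the sign of either β2 or the X1-X2 correlation flips the bias direction, matching the other cells of the table.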
What is Predictive Analytics?
Advertising & Sales

• Would including Price change the results of our regression on advertising?

• Would including salesforce experience change the results of our regression on advertising?

47
