
ETC3550/ETC5550

Applied forecasting

Ch7. Regression models


OTexts.org/fpp3/
Outline

1 The linear model with time series
2 Some useful predictors for linear models
3 Residual diagnostics
4 Selecting predictors and forecast evaluation
5 Forecasting with regression
6 Matrix formulation
7 Correlation, causation and forecasting
1 The linear model with time series
Multiple regression and forecasting

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt

yt is the variable we want to predict: the “response” variable.
Each xj,t is numerical and is called a “predictor”. They are usually assumed to be known for all past and future times.
The coefficients β1, . . . , βk measure the effect of each predictor after taking account of the effect of all other predictors in the model. That is, the coefficients measure the marginal effects.
εt is a white noise error term.
Example: US consumption expenditure

[Quarterly time plots of changes in US Consumption, Income, Production, Savings and Unemployment.]
Example: US consumption expenditure
[Scatterplot matrix of the five series. Pairwise correlations:]

              Income      Production   Savings      Unemployment
Consumption   0.384***    0.529***     −0.257***    −0.527***
Income                    0.269***     0.720***     −0.224**
Production                             −0.059       −0.768***
Savings                                              0.106
Example: US consumption expenditure
fit_consMR <- us_change %>%
  model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)

## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.906 -0.158 -0.036 0.136 1.155
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.25311 0.03447 7.34 5.7e-12 ***
## Income 0.74058 0.04012 18.46 < 2e-16 ***
## Production 0.04717 0.02314 2.04 0.043 *
## Unemployment -0.17469 0.09551 -1.83 0.069 .
## Savings -0.05289 0.00292 -18.09 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.31 on 193 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.763
Example: US consumption expenditure

[Time plot of actual (Data) and fitted values of the percent change in US consumption expenditure.]
Example: US consumption expenditure

[Scatterplot of fitted (predicted) values against actual values of the percentage change in US consumption expenditure.]
Example: US consumption expenditure

fit_consMR %>% gg_tsresiduals()


[Residual diagnostic plots: time plot of the innovation residuals, ACF of the residuals, and histogram of the residuals.]
2 Some useful predictors for linear models
Trend

Linear trend
xt = t,   t = 1, 2, . . . , T
Strong assumption that trend will continue.
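A minimal sketch of fitting a linear trend (hypothetical tsibble my_ts with a numeric column y); the trend() special generates the predictor xt = t:

my_ts %>%
  model(TSLM(y ~ trend()))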
Nonlinear trend

Piecewise linear trend with bend at τ

$$x_{1,t} = t, \qquad x_{2,t} = \begin{cases} 0 & t < \tau \\ t - \tau & t \ge \tau \end{cases}$$

Quadratic or higher order trend

$$x_{1,t} = t, \quad x_{2,t} = t^2, \quad \dots$$

NOT RECOMMENDED!
Dummy variables
If a categorical variable takes
only two values (e.g., ‘Yes’ or
‘No’), then an equivalent
numerical variable can be
constructed taking value 1 if
yes and 0 if no. This is called
a dummy variable.

Dummy variables
If there are more than two
categories, then the variable
can be coded using several
dummy variables (one fewer
than the total number of
categories).

Beware of the dummy variable trap!

Using one dummy for each category gives too many dummy
variables!
The regression will then be singular and inestimable.
Either omit the constant, or omit the dummy for one category.
The coefficients of the dummies are relative to the omitted
category.

Uses of dummy variables

Seasonal dummies
For quarterly data: use 3 dummies
For monthly data: use 11 dummies
For daily data: use 6 dummies
What to do with weekly data?

Outliers
If there is an outlier, you can use a dummy variable to remove its effect.

Public holidays
For daily data: if it is a public holiday, dummy=1, otherwise dummy=0 (see the sketch below).
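A minimal sketch of a public-holiday dummy, assuming a hypothetical daily tsibble daily_sales (index Date, response Sales) and a hypothetical vector of holiday dates holiday_dates:

daily_sales %>%
  mutate(Holiday = as.integer(Date %in% holiday_dates)) %>%  # dummy = 1 on a public holiday, 0 otherwise
  model(TSLM(Sales ~ trend() + season() + Holiday))          # season() adds the seasonal dummies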
Beer production revisited

[Time plot of Australian quarterly beer production (megalitres).]

Regression model
yt = β0 + β1 t + β2 d2,t + β3 d3,t + β4 d4,t + εt
di,t = 1 if t is quarter i and 0 otherwise.
Beer production revisited
fit_beer <- recent_production %>% model(TSLM(Beer ~ trend() + season()))
report(fit_beer)

## Series: Beer
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.9 -7.6 -0.5 8.0 21.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.8004 3.7335 118.33 < 2e-16 ***
## trend() -0.3403 0.0666 -5.11 2.7e-06 ***
## season()year2 -34.6597 3.9683 -8.73 9.1e-13 ***
## season()year3 -17.8216 4.0225 -4.43 3.4e-05 ***
## season()year4 72.7964 4.0230 18.09 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.2 on 69 degrees of freedom
Beer production revisited
augment(fit_beer) %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Beer, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = "Megalitres", title = "Australian quarterly beer production") +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00"))

[Time plot of actual (Data) and fitted values of Australian quarterly beer production.]
Beer production revisited
augment(fit_beer) %>%
  ggplot(aes(x = Beer, y = .fitted, colour = factor(quarter(Quarter)))) +
  geom_point() +
  labs(y = "Fitted", x = "Actual values", title = "Quarterly beer production") +
  scale_colour_brewer(palette = "Dark2", name = "Quarter") +
  geom_abline(intercept = 0, slope = 1)

[Scatterplot of fitted against actual quarterly beer production, coloured by quarter, with a 45° reference line.]
Beer production revisited

fit_beer %>% gg_tsresiduals()


[Residual diagnostic plots for the beer regression: time plot of the innovation residuals, ACF of the residuals, and histogram of the residuals.]
Beer production revisited

fit_beer %>% forecast %>% autoplot(recent_production)

[Forecasts of Australian quarterly beer production from the regression model, with 80% and 95% prediction intervals.]
Fourier series

Periodic seasonality can be handled using pairs of Fourier terms:

$$s_k(t) = \sin\!\left(\frac{2\pi k t}{m}\right), \qquad c_k(t) = \cos\!\left(\frac{2\pi k t}{m}\right)$$

$$y_t = a + bt + \sum_{k=1}^{K} \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t$$

Every periodic function can be approximated by sums of sin and cos terms for large enough K.
Choose K by minimizing the AICc.
Called “harmonic regression”.
TSLM(y ~ trend() + fourier(K))
Harmonic regression: beer production
fourier_beer <- recent_production %>% model(TSLM(Beer ~ trend() + fourier(K=2)))
report(fourier_beer)

## Series: Beer
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.9 -7.6 -0.5 8.0 21.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 446.8792 2.8732 155.53 < 2e-16 ***
## trend() -0.3403 0.0666 -5.11 2.7e-06 ***
## fourier(K = 2)C1_4 8.9108 2.0112 4.43 3.4e-05 ***
## fourier(K = 2)S1_4 -53.7281 2.0112 -26.71 < 2e-16 ***
## fourier(K = 2)C2_4 -13.9896 1.4226 -9.83 9.3e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.2 on 69 degrees of freedom
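Note: for quarterly data (m = 4), the K = 2 Fourier terms are sin(2πt/4), cos(2πt/4) and cos(πt); the term sin(πt) is identically zero for integer t and is dropped, which is why only three Fourier coefficients appear above. These three terms span the same space as the three quarterly dummies, so the fit is identical to the dummy-variable model (same residual quantiles and residual standard error of 12.2).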
Harmonic regression: eating-out expenditure
aus_cafe <- aus_retail %>%
  filter(
    Industry == "Cafes, restaurants and takeaway food services",
    year(Month) %in% 2004:2018
  ) %>%
  summarise(Turnover = sum(Turnover))
aus_cafe %>% autoplot(Turnover)

[Monthly time plot of total turnover of Australian cafes, restaurants and takeaway food services, 2004–2018.]
Harmonic regression: eating-out expenditure
fit <- aus_cafe %>%
  model(
    K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
    K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
    K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
    K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
    K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
    K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
  )
glance(fit) %>% select(.model, r_squared, adj_r_squared, AICc)

## # A tibble: 6 x 4
## .model r_squared adj_r_squared AICc
## <chr> <dbl> <dbl> <dbl>
## 1 K1 0.962 0.962 -1085.
## 2 K2 0.966 0.965 -1099.
## 3 K3 0.976 0.975 -1160.
## 4 K4 0.980 0.979 -1183.
## 5 K5 0.985 0.984 -1234.
## 6 K6 0.985 0.984 -1232.
Harmonic regression: eating-out expenditure

[Forecast plots of the log-transformed TSLM with trend() + fourier(K) for K = 1, . . . , 6, each with 80% and 95% prediction intervals. AICc values: −1085 (K = 1), −1099 (K = 2), −1160 (K = 3), −1183 (K = 4), −1234 (K = 5), −1232 (K = 6).]
Intervention variables

Spikes
Equivalent to a dummy variable for handling an outlier.

Steps
Variable takes value 0 before the intervention and 1 afterwards.

Change of slope
Variables take values 0 before the intervention and values {1, 2, 3, . . . } afterwards.
Holidays

For monthly data


Christmas: always in December, so part of the monthly seasonal effect.
Easter: use a dummy variable vt = 1 if any part of Easter is in that month, vt = 0 otherwise.
Ramadan and Chinese New Year similar.
Distributed lags

Lagged values of a predictor.


Example: x is advertising which has a delayed effect

x1 = advertising for previous month;


x2 = advertising for two months previously;
...
xm = advertising for m months previously.
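A minimal sketch of constructing distributed-lag predictors, assuming a hypothetical monthly tsibble sales_data with columns Sales and Adverts:

sales_data %>%
  mutate(
    Adverts_lag1 = lag(Adverts, 1),   # advertising one month earlier
    Adverts_lag2 = lag(Adverts, 2),   # advertising two months earlier
    Adverts_lag3 = lag(Adverts, 3)
  ) %>%
  filter(!is.na(Adverts_lag3)) %>%    # drop the leading rows with missing lags
  model(TSLM(Sales ~ Adverts + Adverts_lag1 + Adverts_lag2 + Adverts_lag3))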
Example: Boston marathon winning times
marathon <- boston_marathon %>%
  filter(Event == "Men's open division") %>%
  select(-Event) %>%
  mutate(Minutes = as.numeric(Time) / 60)
marathon %>% autoplot(Minutes) + labs(y = "Winning times in minutes")

[Annual time plot of Boston marathon winning times in minutes.]
Example: Boston marathon winning times
fit_trends <- marathon %>%
  model(
    # Linear trend
    linear = TSLM(Minutes ~ trend()),
    # Exponential trend
    exponential = TSLM(log(Minutes) ~ trend()),
    # Piecewise linear trend
    piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
  )

fit_trends

## # A mable: 1 x 3
## linear exponential piecewise
## <model> <model> <model>
## 1 <TSLM> <TSLM> <TSLM>
Example: Boston marathon winning times
fit_trends %>% forecast(h=10) %>% autoplot(marathon)

[Forecasts of Boston marathon winning times from the linear, exponential and piecewise trend models, with 95% prediction intervals.]
Example: Boston marathon winning times
fit_trends %>%
  select(piecewise) %>%
  gg_tsresiduals()

[Residual diagnostic plots for the piecewise trend model: time plot of the innovation residuals, ACF of the residuals, and histogram of the residuals.]
3 Residual diagnostics
Multiple regression and forecasting

For forecasting purposes, we require the following assumptions:

εt are uncorrelated and zero mean
εt are uncorrelated with each xj,t.

It is useful to also have εt ∼ N(0, σ²) when producing prediction intervals or doing statistical tests.
Residual plots

Useful for spotting outliers and whether the linear model was appropriate.
Scatterplot of residuals εt against each predictor xj,t.
Scatterplot of residuals against the fitted values ŷt.
Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.
Residual patterns

If a plot of the residuals vs any predictor in the model shows a pattern, then the relationship is nonlinear.
If a plot of the residuals vs any predictor not in the model shows a pattern, then the predictor should be added to the model.
If a plot of the residuals vs fitted values shows a pattern, then there is heteroscedasticity in the errors. (Could try a transformation.)
A sketch of these residual plots follows below.
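A minimal sketch of these plots for the US consumption model fitted earlier (fit_consMR): residuals against fitted values, and residuals against one predictor (Income):

augment(fit_consMR) %>%
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Fitted values", y = "Residuals")

us_change %>%
  left_join(residuals(fit_consMR), by = "Quarter") %>%
  ggplot(aes(x = Income, y = .resid)) +
  geom_point() +
  labs(y = "Residuals")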
4 Selecting predictors and forecast evaluation
Comparing regression models

Computer output for regression will always give the R² value. This is a useful summary of the model.
It is equal to the square of the correlation between y and ŷ.
It is often called the “coefficient of determination”.
It can also be calculated as follows:

$$R^2 = \frac{\sum (\hat{y}_t - \bar{y})^2}{\sum (y_t - \bar{y})^2}$$

It is the proportion of variance accounted for (explained) by the predictors.
Comparing regression models
However . . .
R² does not allow for “degrees of freedom”.
Adding any variable tends to increase the value of R², even if that variable is irrelevant.

To overcome this problem, we can use adjusted R²:

$$\bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-k-1}$$

where k = no. of predictors and T = no. of observations.

Maximizing R̄² is equivalent to minimizing σ̂²:

$$\hat{\sigma}^2 = \frac{1}{T-k-1}\sum_{t=1}^{T} \varepsilon_t^2$$
Akaike’s Information Criterion

AIC = −2 log(L) + 2(k + 2)

where L is the likelihood and k is the number of predictors in the model.

AIC penalizes terms more heavily than R̄².
Minimizing the AIC is asymptotically equivalent to minimizing MSE via leave-one-out cross-validation (for any linear regression).
Corrected AIC

For small values of T, the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed:

$$\text{AIC}_{\text{c}} = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}$$

As with the AIC, the AICc should be minimized.
Bayesian Information Criterion

BIC = −2 log(L) + (k + 2) log(T)

where L is the likelihood and k is the number of predictors in the model.

BIC penalizes terms more heavily than AIC.
Also called SBIC and SC.
Minimizing BIC is asymptotically equivalent to leave-v-out cross-validation when v = T[1 − 1/(log(T) − 1)].
Leave-one-out cross-validation

For regression, leave-one-out cross-validation is faster and more efficient than time-series cross-validation.
Select one observation for the test set, and use the remaining observations in the training set. Compute the error on the test observation.
Repeat using each possible observation as the test set.
Compute the accuracy measure over all errors.
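For a linear regression, the CV statistic does not require refitting the model T times; assuming the standard hat-matrix shortcut, it can be computed from a single fit as

$$\text{CV} = \frac{1}{T}\sum_{t=1}^{T}\left[\frac{e_t}{1-h_t}\right]^2,$$

where e_t is the residual from the full fit and h_t is the t-th diagonal element of the hat matrix H = X(X′X)⁻¹X′.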
Cross-validation

Traditional evaluation
[Diagram: a single split of the series into training data followed by test data.]

Time series cross-validation
[Diagram: a sequence of growing training sets, each followed by a single test observation (h = 1).]

Leave-one-out cross-validation
[Diagram: each observation in turn is held out as the test set (h = 1), with all remaining observations used for training.]

CV = MSE on test sets.
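The measures above (adjusted R², CV, AIC, AICc, BIC) can be extracted in one step from a fitted TSLM; a minimal sketch using the US consumption model fitted earlier:

glance(fit_consMR) %>%
  select(adj_r_squared, CV, AIC, AICc, BIC)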
Choosing regression variables

Best subsets regression
Fit all possible regression models using one or more of the predictors.
Choose the best model based on one of the measures of predictive ability (CV, AIC, AICc).

Warning!
If there are a large number of predictors, this is not possible.
For example, 44 predictors leads to 18 trillion possible models!
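Checking the arithmetic: with 44 predictors there are 2^44 − 1 = 17,592,186,044,415 ≈ 1.8 × 10^13 models using at least one predictor, i.e. roughly 18 trillion.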
Choosing regression variables

Backwards stepwise regression
Start with a model containing all variables.
Try subtracting one variable at a time. Keep the model if it has lower CV or AICc.
Iterate until no further improvement.

Notes
Stepwise regression is not guaranteed to lead to the best possible model.
Inference on coefficients of final model will be wrong.
5 Forecasting with regression
Ex-ante versus ex-post forecasts

Ex ante forecasts are made using only information available in advance.
  → require forecasts of predictors.
Ex post forecasts are made using later information on the predictors.
  → useful for studying behaviour of forecasting models.
Trend, seasonal and calendar variables are all known in advance, so these don’t need to be forecast.
Scenario based forecasting

Assumes possible scenarios for the predictor variables


Prediction intervals for scenario based forecasts do not include
the uncertainty associated with the future values of the
predictor variables.

Building a predictive regression model

If getting forecasts of predictors is difficult, you can use lagged predictors instead:
yt = β0 + β1 x1,t−h + · · · + βk xk,t−h + εt
A different model for each forecast horizon h (a sketch follows below).
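A minimal sketch for h = 4 quarters ahead, using the us_change data from earlier with Income and Savings as example predictors, each lagged by the forecast horizon:

us_change %>%
  mutate(
    Income_lag4  = lag(Income, 4),
    Savings_lag4 = lag(Savings, 4)
  ) %>%
  filter(!is.na(Income_lag4)) %>%                        # drop the first h rows with missing lags
  model(TSLM(Consumption ~ Income_lag4 + Savings_lag4))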
US Consumption

fit_consBest <- us_change %>%
  model(
    TSLM(Consumption ~ Income + Savings + Unemployment)
  )

future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) %>%
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) %>%
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario"
)

fc <- forecast(fit_consBest, new_data = future_scenarios)
US Consumption
us_change %>% autoplot(Consumption) +
  labs(y = "% change in US consumption") +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")

[Time plot of the percentage change in US consumption with forecasts under the Increase and Decrease scenarios, showing 80% and 95% prediction intervals.]
6 Matrix formulation
Matrix formulation

yt = β0 + β1 x1,t + β2 x2,t + · · · + βk xk,t + εt

Let y = (y1, . . . , yT)′, ε = (ε1, . . . , εT)′, β = (β0, β1, . . . , βk)′ and

$$X = \begin{bmatrix}
1 & x_{1,1} & x_{2,1} & \dots & x_{k,1} \\
1 & x_{1,2} & x_{2,2} & \dots & x_{k,2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1,T} & x_{2,T} & \dots & x_{k,T}
\end{bmatrix}.$$

Then
y = Xβ + ε.
Matrix formulation

Least squares estimation

Minimize: (y − Xβ)′(y − Xβ)

Differentiating with respect to β gives

$$\hat{\beta} = (X'X)^{-1}X'y$$

(The “normal equation”.)

$$\hat{\sigma}^2 = \frac{1}{T-k-1}(y - X\hat{\beta})'(y - X\hat{\beta})$$

Note: If you fall for the dummy variable trap, (X′X) is a singular matrix.
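A minimal sketch of the normal equation in base R, assuming a hypothetical data frame df with response y and predictors x1 and x2:

X <- model.matrix(~ x1 + x2, data = df)     # design matrix with an intercept column
y <- df$y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # (X'X)^{-1} X'y
cbind(beta_hat, coef(lm(y ~ x1 + x2, data = df)))   # agrees with lm()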
Likelihood

If the errors are iid and normally distributed, then
y ∼ N(Xβ, σ²I).

So the likelihood is

$$L = \frac{1}{\sigma^T(2\pi)^{T/2}}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right)$$

which is maximized when (y − Xβ)′(y − Xβ) is minimized.

So MLE = OLS.
Multiple regression forecasts

Optimal forecasts

$$\hat{y}^* = \text{E}(y^* \mid y, X, x^*) = x^*\hat{\beta} = x^*(X'X)^{-1}X'y$$

where x* is a row vector containing the values of the predictors for the forecasts (in the same format as X).

Forecast variance

$$\text{Var}(y^* \mid X, x^*) = \sigma^2\left[1 + x^*(X'X)^{-1}(x^*)'\right]$$

This ignores any errors in x*.
95% prediction intervals assuming normal errors:

$$\hat{y}^* \pm 1.96\sqrt{\text{Var}(y^* \mid X, x^*)}$$

A sketch of these calculations follows below.
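Continuing the base R sketch from the matrix formulation (hypothetical X, y and beta_hat), a point forecast and approximate 95% prediction interval for a new row of predictor values xstar:

xstar  <- c(1, 0.5, 2)                       # hypothetical values: intercept, x1 = 0.5, x2 = 2
sigma2 <- sum((y - X %*% beta_hat)^2) / (nrow(X) - ncol(X))   # sigma-hat^2 with T - k - 1 denominator
yhat   <- drop(xstar %*% beta_hat)                            # point forecast x* beta-hat
v      <- sigma2 * (1 + drop(xstar %*% solve(t(X) %*% X) %*% xstar))   # forecast variance
yhat + c(-1.96, 1.96) * sqrt(v)              # ignores any error in xstar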
7 Correlation, causation and forecasting
Correlation is not causation

When x is useful for predicting y, it is not necessarily causing y.


e.g., predict number of drownings y using number of ice-creams
sold x.
Correlations are useful for forecasting, even when there is no
causality.
Better models usually involve causal relationships (e.g.,
temperature x and people z to predict drownings y).

Multicollinearity

In regression analysis, multicollinearity occurs when:


Two predictors are highly correlated (i.e., the correlation
between them is close to ±1).
A linear combination of some of the predictors is highly
correlated with another predictor.
A linear combination of one subset of predictors is highly
correlated with a linear combination of another subset of
predictors.

Multicollinearity

If multicollinearity exists. . .
the numerical estimates of coefficients may be wrong (worse in
Excel than in a statistics package)
don’t rely on the p-values to determine significance.
there is no problem with model predictions provided the
predictors used for forecasting are within the range used for
fitting.
omitting variables can help.
combining variables can help.