cross-correlation function and lagged regression

The document discusses the relationship between two time series, specifically using cross correlation functions (CCF) to identify lagged predictors. It provides examples of how to analyze the Southern Oscillation Index and fish populations, demonstrating the use of R for calculating CCF and building regression models. The document also highlights the importance of considering time series structures and potential complications in the analysis.

Uploaded by

xiaohujin

8.2 Cross Correlation Functions and Lagged Regressions

The basic problem we’re considering is the description and modeling of the relationship between two time series.

In the relationship between two time series (y_t and x_t), the series y_t may be related to past lags of the x-series. The sample cross correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of y_t.

In R, the sample CCF is defined as the set of sample correlations between x_{t+h} and y_t for h = 0, ±1, ±2, ±3, and so on. A negative value for h is a correlation between the x-variable at a time before t and the y-variable at time t. For instance, consider h = −2. The CCF value would give the correlation between x_{t−2} and y_t.

• When one or more x_{t+h}, with h negative, are predictors of y_t, it is sometimes said that x leads y.

• When one or more x_{t+h}, with h positive, are predictors of y_t, it is sometimes said that x lags y.

In some problems, the goal may be to identify which variable is leading and which is lagging. In many problems we consider, though, we’ll treat the x-variable(s) as leading the y-variable because we will want to use values of the x-variable to predict future values of y.

Thus, we’ll usually be looking at what’s happening at the negative values of h on the CCF plot.
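The sign convention can be checked on simulated data. The sketch below is not part of the original notes: it builds a hypothetical pair of series in which y_t depends on x_{t−3}, so x leads y by three time steps, and the largest cross correlation should then appear at h = −3. The names e, x, y, cc, and peak_lag are illustrative.

```r
# Simulated illustration (assumed data, not the SOI/recruitment series).
set.seed(1)
n <- 300
e <- rnorm(n + 3)

x <- ts(e[4:(n + 3)])                     # x_t
y <- ts(2 * e[1:n] + rnorm(n, sd = 0.5))  # y_t = 2 * x_{t-3} + noise

# ccf(x, y) estimates the correlation between x_{t+h} and y_t.
cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# Lag h with the largest absolute cross correlation.
peak_lag <- as.numeric(cc$lag)[which.max(abs(as.numeric(cc$acf)))]
peak_lag  # a negative value means x leads y
```

Swapping the arguments, ccf(y, x), would mirror the plot so that the same relationship shows up at a positive lag.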

Note to Minitab Users: Minitab calculates its sample CCF as the set of sample correlations between x_t and y_{t+h}. Hence, the “x leading y” side of the plot is for h positive. That’s where x comes before y in time.

Transfer Function Models

In a full transfer function model, we model y_t as potentially a function of past lags of y_t and current and past lags of the x-variables. We also usually model the time series structure of the x-variables as well. We’ll take all of that on next week. This week we’ll just look at the use of the CCF to identify some relatively simple regression structures for modeling y_t.

Sample CCF in R

The CCF command is

ccf(x-variable name, y-variable name)

If you wish to specify how many lags to show, add that number as the third argument of the command (the lag.max argument). For instance, ccf(x, y, 50) will give the CCF for values of h = 0, ±1, …, ±50.
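As a quick check of the lag argument, the following sketch (on made-up white-noise series, not the data from these notes) requests 50 lags by name and confirms that the returned object covers h = −50, …, 50. Setting plot = FALSE only suppresses the graph; the series names x and y are placeholders.

```r
# Two unrelated simulated series, used only to inspect the ccf object.
set.seed(2)
x <- ts(rnorm(200))
y <- ts(rnorm(200))

# lag.max is the third argument of ccf(); naming it makes the call clearer.
cc <- ccf(x, y, lag.max = 50, plot = FALSE)

range(cc$lag)    # smallest and largest lag returned
length(cc$acf)   # one correlation per lag, including h = 0
```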

Example: Southern Oscillation Index and Fish Populations in the southern hemisphere.

The text describes the relationship between a measure of weather called the Southern Oscillation Index (SOI) and “recruitment,” a measure of the fish population in the southern hemisphere. The data are monthly estimates for n = 453 months. We see SOI as a potential predictor of recruit.

The data are in two different files. The CCF below was created with these commands:

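soi = scan("soi.dat")      # read the SOI values
rec = scan("recruit.dat")  # read the recruitment values
soi = ts(soi)              # convert both to time series objects
rec = ts(rec)
ccf(soi, rec)              # plot the sample CCF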

The most dominant cross correlations occur somewhere between h = −10 and about h = −4. It’s difficult to read the lags exactly from the plot, so we might want to give an object name to the ccf and then list the object contents. The following two commands will do that for our example.

ccfvalues = ccf(soi,rec)
ccfvalues

The result, showing lag (the h in x_{t+h}) and correlation with y_t:

   -23    -22    -21    -20    -19    -18    -17    -16    -15    -14    -13
 0.235  0.125  0.000 -0.108 -0.198 -0.253 -0.222 -0.149 -0.092 -0.076 -0.103

   -12    -11    -10     -9     -8     -7     -6     -5     -4     -3     -2
-0.175 -0.267 -0.369 -0.476 -0.560 -0.598 -0.599 -0.527 -0.297 -0.146 -0.042

    -1      0      1      2      3      4      5      6      7      8      9
 0.011  0.025 -0.013 -0.086 -0.154 -0.228 -0.259 -0.232 -0.144 -0.017  0.094

    10     11     12     13     14     15     16     17     18     19     20
 0.154  0.174  0.162  0.118  0.043 -0.057 -0.129 -0.156 -0.131 -0.049  0.060

The correlations are largest in magnitude, and nearly equal, at h = −5, −6, −7, and −8, with tapering occurring in both directions from that peak. Note that the correlations in this region are negative, indicating that an above-average value of SOI is likely to lead to a below-average value of “recruit” about 6 months later. And a below-average value of SOI is associated with a likely above-average recruit value about 6 months later.
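The peak can also be located numerically from the listed values. The sketch below copies the correlations for lags −12 through −2 from the listing and picks out the most negative one; the 0.5 cutoff used to show the cluster around the peak is an arbitrary choice for illustration, not a rule from these notes.

```r
# Cross correlations copied from the listing, lags -12 through -2.
h <- -12:-2
r <- c(-0.175, -0.267, -0.369, -0.476, -0.560, -0.598,
       -0.599, -0.527, -0.297, -0.146, -0.042)

strongest <- h[which.min(r)]   # lag with the most negative correlation
cluster <- h[abs(r) > 0.5]     # lags whose |correlation| exceeds 0.5

strongest
cluster
```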

Scatterplots

In the "astsa" library that we’ve been using, Stoffer included a script that produces scatterplots of y_t versus x_{t+h} for negative h, from 0 back to a lag that you specify. The command is lag2.plot.

The result of the command lag2.plot(soi, rec, 10) is shown below. In each plot, the recruit variable is on the vertical axis and a past lag of SOI is on the horizontal axis. Correlation values are given on each plot.
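The correlations behind such lagged scatterplots can be sketched in base R. The helper below, lagged_cor, is a hypothetical name and not an astsa function; it pairs y_t with x_{t−h} for h = 0, 1, …, max_lag and computes an ordinary correlation on the overlapping pairs. Because it uses only the overlapping observations, its values may differ slightly from those of ccf(). The data are simulated so that y depends on x four steps earlier.

```r
# Simulated illustration (assumed data): y_t = -3 * x_{t-4} + noise.
set.seed(3)
n <- 200
e <- rnorm(n + 4)
x <- e[5:(n + 4)]             # x_t
y <- -3 * e[1:n] + rnorm(n)   # y_t

# Correlation between y_t and x_{t-h}, h = 0..max_lag, on overlapping pairs.
lagged_cor <- function(x, y, max_lag) {
  n <- length(y)
  r <- sapply(0:max_lag, function(h) cor(x[1:(n - h)], y[(1 + h):n]))
  setNames(r, paste0("lag", 0:max_lag))
}

r <- lagged_cor(x, y, 10)
round(r, 2)

# One of the scatterplots, drawn only in an interactive session.
if (interactive()) plot(x[1:(n - 4)], y[5:n], xlab = "x(t-4)", ylab = "y(t)")
```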

Regression Models

There are a lot of models that we could try based on the CCF and lagged scatterplots for these data. For demonstration purposes, we’ll first try a multiple regression in which y_t, the recruit variable, is a linear function of (past) lags 5, 6, 7, 8, 9, and 10 of the SOI variable. That model works fairly well. Following is some R output. All coefficients are statistically significant and the R-squared is about 62%.

The residuals, however, have an AR(2) structure, as seen in the graph following the regression output. We might try the method described in Lesson 8.1 to adjust for that, but we’ll take a different approach that we’ll describe after the output display.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   69.2743     0.8703  79.601  < 2e-16 ***
soilag5      -23.8255     2.7657  -8.615  < 2e-16 ***
soilag6      -15.3775     3.1651  -4.858 1.65e-06 ***
soilag7      -11.7711     3.1665  -3.717 0.000228 ***
soilag8      -11.3008     3.1664  -3.569 0.000398 ***
soilag9       -9.1525     3.1651  -2.892 0.004024 **
soilag10     -16.7219     2.7693  -6.038 3.33e-09 ***

Residual standard error: 17.42 on 436 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared: 0.6251, Adjusted R-squared: 0.62
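Leftover autocorrelation of this kind can be reproduced on simulated data. In the sketch below (assumed data and names, not the SOI example), y follows an AR(1) recursion driven by x three steps earlier. A regression on the lagged x alone recovers the slope reasonably well but leaves strongly autocorrelated residuals. The simulated structure is AR(1), simpler than the AR(2) pattern reported for the real data.

```r
# Simulated illustration: y_t = 0.7 * y_{t-1} + 2 * x_{t-3} + noise.
set.seed(10)
n <- 1000
e <- rnorm(n + 3)
x <- ts(e[4:(n + 3)])                 # x_t
d <- 2 * e[1:n] + rnorm(n)            # 2 * x_{t-3} + noise
y <- ts(as.numeric(stats::filter(d, 0.7, method = "recursive")))

# Regress y on the lagged x only, ignoring the dependence on y_{t-1}.
dat <- ts.intersect(y, xlag3 = stats::lag(x, -3))
fit_x <- lm(y ~ xlag3, data = dat)
coef(fit_x)

# Lag-1 autocorrelation of the residuals (element 1 of $acf is lag 0).
res_acf <- acf(residuals(fit_x), lag.max = 5, plot = FALSE)
r1 <- res_acf$acf[2]
r1
```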

Next week we’ll discuss more about ways to interpret the CCF. One feature that will be described in more detail (with the “why”) is that a peak in a CCF followed by a tapering pattern is an indicator that lag 1 and possibly lag 2 values of the y-variable may be helpful predictors.
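A small simulation can make the “peak followed by tapering” pattern concrete. The sketch below is an illustration under assumed settings, not a derivation from these notes: y depends on its own lag 1 and on x three steps earlier, with x simulated as white noise. The CCF then peaks at h = −3 and shrinks gradually at h = −4, −5, …, while staying near zero at h = −2.

```r
# Simulated illustration: y_t = 0.7 * y_{t-1} + 2 * x_{t-3} + noise.
set.seed(42)
n <- 2000
e <- rnorm(n + 3)
x <- ts(e[4:(n + 3)])                 # x_t (white noise)
d <- 2 * e[1:n] + rnorm(n)            # 2 * x_{t-3} + noise
y <- ts(as.numeric(stats::filter(d, 0.7, method = "recursive")))

cc <- ccf(x, y, lag.max = 8, plot = FALSE)

# Name each correlation by its lag h so values can be looked up directly.
r <- setNames(as.numeric(cc$acf), as.numeric(cc$lag))
round(r[c("-6", "-5", "-4", "-3", "-2", "-1")], 2)
```

The tapering on the far side of the peak comes from the autoregressive term in y, which is why lags of y are natural candidates for the regression.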

So, our try number 2 for a regression model will be to use lag 1 and lag 2 values of the y-variable as well as lags 5 through 10 of the x-variable as linear predictors. Here’s the outcome:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.43047    1.33384   8.570  < 2e-16 ***
reclag1       1.25702    0.04316  29.128  < 2e-16 ***
reclag2      -0.41946    0.04120 -10.182  < 2e-16 ***
soilag5     -21.19210    1.11838 -18.949  < 2e-16 ***
soilag6       9.77648    1.56238   6.257  9.4e-10 ***
soilag7      -1.19189    1.32247  -0.901   0.3679
soilag8      -2.17345    1.30806  -1.662   0.0973 .
soilag9       0.56520    1.30035   0.435   0.6640
soilag10     -2.58630    1.19529  -2.164   0.0310 *

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.034 on 434 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared: 0.9392, Adjusted R-squared: 0.938

The R-squared value has gone to about 94%. Not all sample coefficients are statistically significant. Although it’s dangerous to drop too much from a model at once, we might think about dropping lags 7, 8, 9, and maybe 10 of SOI from the model. You might disagree with dropping lag 10 of SOI, but we’ll try it because it seems odd to have a “stray” term like that.

So our third attempt is to predict y_t using lags 1 and 2 of itself and lags 5 and 6 of the x-variable (SOI). Here’s what happens:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.78498    1.00171   8.770  < 2e-16 ***
reclag1       1.24575    0.04314  28.879  < 2e-16 ***
reclag2      -0.37193    0.03846  -9.670  < 2e-16 ***
soilag5     -20.83776    1.10208 -18.908  < 2e-16 ***
soilag6       8.55600    1.43146   5.977 4.68e-09 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.069 on 442 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.9375, Adjusted R-squared: 0.937
F-statistic: 1658 on 4 and 442 DF, p-value: < 2.2e-16

All sample coefficients are significant and the R-squared is about 94%. The ACF and PACF of the residuals look pretty good. There’s a barely significant residual autocorrelation at lag 4 which we may or may not want to worry about.
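One way to judge whether a residual autocorrelation is “barely significant” is to compare it with the approximate 95% limits for white noise, ±1.96/sqrt(n). The sketch below applies that check to a simulated example (assumed data and names, not the SOI model) in which y is regressed on its own lag 1 and on a lagged x, and reports which residual lags, if any, fall outside the limits.

```r
# Simulated illustration: y_t = 0.7 * y_{t-1} + 2 * x_{t-3} + noise.
set.seed(10)
n <- 1000
e <- rnorm(n + 3)
x <- ts(e[4:(n + 3)])                 # x_t
d <- 2 * e[1:n] + rnorm(n)            # 2 * x_{t-3} + noise
y <- ts(as.numeric(stats::filter(d, 0.7, method = "recursive")))

# Include the lag of y as well as the lag of x.
dat <- ts.intersect(y, ylag1 = stats::lag(y, -1), xlag3 = stats::lag(x, -3))
fit_xy <- lm(y ~ ylag1 + xlag3, data = dat)
res <- residuals(fit_xy)

# Approximate 95% limits for the sample ACF of white noise.
bound <- qnorm(0.975) / sqrt(length(res))

# Residual autocorrelations at lags 1..10 (lag 0 dropped).
res_acf <- acf(res, lag.max = 10, plot = FALSE)$acf[-1]
outside <- which(abs(res_acf) > bound)   # lags outside the limits, if any
round(res_acf, 3)
outside
```

With ten lags inspected, an occasional value just outside the limits is expected by chance, which is one reason a single marginal lag may not be worth chasing.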

Complications

The CCF pattern is affected by the underlying time series structures of the two variables and the trend each series has. It often (perhaps most often) is helpful to de-trend and/or take into account the univariate ARIMA structure of the x-variable before graphing the CCF. We’ll play with this a bit in the homework this week and will take it on more fully next week.
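As a rough preview of what “taking into account the structure of the x-variable” can look like, the sketch below uses one common approach, often called prewhitening. It is an assumption about the method, not necessarily the procedure these notes will use: fit an AR(1) model to x, apply the same filter to both series, and compare the CCF before and after. The two simulated series are generated independently, so any apparent cross correlation in the raw CCF is spurious.

```r
# Two independent AR(1) series (assumed data for illustration).
set.seed(7)
n <- 500
x <- arima.sim(list(ar = 0.8), n = n)
y <- arima.sim(list(ar = 0.8), n = n)

raw <- ccf(x, y, lag.max = 20, plot = FALSE)

# Fit an AR(1) to x and use its coefficient to filter both series.
fit <- ar(x, order.max = 1, aic = FALSE)
phi <- fit$ar
xw <- x[-1] - phi * x[-n]   # filtered x: x_t - phi * x_{t-1}
yw <- y[-1] - phi * y[-n]   # the same filter applied to y

pw <- ccf(ts(xw), ts(yw), lag.max = 20, plot = FALSE)

# Count lags falling outside the approximate 95% white-noise limits.
bound <- qnorm(0.975) / sqrt(n)
c(raw = sum(abs(raw$acf) > bound), prewhitened = sum(abs(pw$acf) > bound))
```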

R code

Here’s the full R code for this handout. The alldata = ts.intersect() command defines the lagged variables and preserves proper alignment between all of them. The tryit = lm() commands specify the various regression models and save the results as named objects.

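library(astsa)

# Read the two series and convert them to time series objects
soi = scan("soi.dat")
rec = scan("recruit.dat")
soi = ts(soi)
rec = ts(rec)

# Sample CCF (plotted and listed) and lagged scatterplots
ccfvalues = ccf(soi, rec)
ccfvalues
lag2.plot(soi, rec, 10)

# Define the lagged variables and keep them aligned in time
alldata = ts.intersect(rec, reclag1 = lag(rec, -1), reclag2 = lag(rec, -2),
                       soilag5 = lag(soi, -5), soilag6 = lag(soi, -6),
                       soilag7 = lag(soi, -7), soilag8 = lag(soi, -8),
                       soilag9 = lag(soi, -9), soilag10 = lag(soi, -10))

# Model 1: lags 5 through 10 of SOI
tryit = lm(rec ~ soilag5 + soilag6 + soilag7 + soilag8 + soilag9 + soilag10,
           data = alldata)
summary(tryit)
acf2(residuals(tryit))

# Model 2: add lags 1 and 2 of recruitment
tryit2 = lm(rec ~ reclag1 + reclag2 + soilag5 + soilag6 + soilag7 + soilag8 +
              soilag9 + soilag10, data = alldata)
summary(tryit2)
acf2(residuals(tryit2))

# Model 3: lags 1 and 2 of recruitment, lags 5 and 6 of SOI
tryit3 = lm(rec ~ reclag1 + reclag2 + soilag5 + soilag6, data = alldata)
summary(tryit3)
acf2(residuals(tryit3))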
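The alignment that ts.intersect() performs can be checked on a tiny series. In the sketch below, z is a made-up series whose values equal their time index, so it is easy to see that lag(z, -1) places z_{t−1} alongside z_t and that rows without a complete set of lags are dropped. The call is written as stats::lag because some packages (dplyr, for example) define their own lag function that behaves differently.

```r
# A toy series whose values equal their time index: z_t = t for t = 1..6.
z <- ts(1:6)

# lag(z, -1) shifts the series forward in time, so at time t it holds z_{t-1}.
aligned <- ts.intersect(z,
                        zlag1 = stats::lag(z, -1),
                        zlag2 = stats::lag(z, -2))
aligned

# Only times 3..6 remain, because earlier times lack a lag-2 value.
time(aligned)
```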
