Temporal and Spatial Models
Anol Bhattacherjee, Ph.D., University of South Florida
Temporal and spatial data are observations that are related to each other in time or space: this quarter’s sales is in proximity to last quarter’s sales.

Examples:
This quarter’s sales is close to (in time) last quarter’s sales – temporal association.
A home’s value is close to (in space) home values nearby – spatial association.

Linear trend model coefficients (quarterly sales regressed on time t):

             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  1530.181   105.041     14.57    <2e-16 ***
t            66.828     3.206       20.84    <2e-16 ***

If neighboring observations carry important information, then you should consider spatial and/or temporal model components, in addition to classical regression components.
The Statistical Challenge
How to incorporate the effect of neighboring (or close) observations?
Unlike multivariate regression, we are not predicting future values of Y based on independent
variables X, but based on previous values of the same variable Y.
Temporal dependency:
If this quarter’s sales depend on last quarter’s sales, then Sales_t = f(Sales_{t-1})
Sales can be modeled as a lag-variable, using Time (t) as a predictor.
Spatial dependency:
If a home’s value depends on home values nearby, then the average value of all nearby homes
can be used as a predictor.
We can also incorporate the variance (volatility), the maximum value, the minimum value, etc., of nearby home values.
Other Challenges of Spatial Models
Spatial models are usually more challenging than temporal models, because:
We have to define an appropriate distance metric, such as Euclidean distance.
Example: Home A is 5 miles from home B and 10 miles from home C.
Or more generally, define a similarity metric.
Example: Product A is closer to product B in the associated feature space.
We must also specify the reach of the spatial dependency:
Should we only include homes no further than 5 miles away? Or no further than 10 miles away?
Incorporating temporal and spatial dependencies is not hard, but it requires some intuition and thought, as the sketch below illustrates.
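A minimal sketch of these ideas in R, assuming a hypothetical data frame homes with columns value, x, and y, a Euclidean distance metric, and a 5-mile reach:

# Pairwise Euclidean distances between homes (homes is a hypothetical data frame)
d <- as.matrix(dist(homes[, c("x", "y")]))

radius <- 5   # reach of the spatial dependency (e.g., 5 miles)
n <- nrow(homes)
neighbor_mean <- sapply(1:n, function(i) {
  nbr <- which(d[i, ] <= radius & (1:n) != i)  # neighbors within the radius, excluding home i
  if (length(nbr) == 0) NA else mean(homes$value[nbr])
})

# neighbor_mean (and likewise the variance, max, or min of nearby values)
# can then enter a regression as an additional predictor.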
Additive Seasonality Model
How to incorporate seasonality:
Seasonality is a categorical variable.
Add seasonality to a trend model using dummy variables.
The four quarters in the Coca-Cola data imply three dummy variables.
Seasonal model: Y_t = β_0 + β_t t + β_1 D_1 + β_2 D_2 + β_3 D_3 + ε

Quarter  Sales    t  D  D_1  D_2  D_3
Q1-86    1734.83  1  1  1    0    0
Q2-86    2244.96  2  2  0    1    0
Q3-86    2533.80  3  3  0    0    1
Q4-86    2154.96  4  4  0    0    0
Q1-87    1547.82  5  1  1    0    0
Q2-87    2104.41  6  2  0    1    0
Q3-87    2014.36  7  3  0    0    1
Q4-87    1991.75  8  4  0    0    0
Q1-88    1869.05  9  1  1    0    0

Interpretation:
Multiple R-squared: 0.9438, Adjusted R-squared: 0.9394
F-statistic: 214 on 4 and 51 DF, p-value: < 2.2e-16
Which is the better model? Why?
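A minimal sketch of fitting this model in R, assuming a hypothetical data frame coke holding the columns shown above:

# Additive seasonality model: linear trend plus quarterly dummies
mod_seasonal <- lm(Sales ~ t + D_1 + D_2 + D_3, data = coke)
summary(mod_seasonal)

# An equivalent fit, letting R build the dummies from the quarter indicator D
# (same fitted values, different baseline quarter)
mod_seasonal2 <- lm(Sales ~ t + factor(D), data = coke)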
[Figure: Residual vs. Time plots for the additive seasonality model and the linear trend model.]
Which Model Fits the Data Better?
[Figure: Residual distribution histograms for the two models.]
Predictive Accuracy on Holdout Set

How to measure predictive performance:
Create a training data set and a holdout (test) data set.
Training: Q1-86 to Q4-99; holdout: the remaining quarters.
Estimate the model using the training data set, then use the estimated model to predict sales for the holdout data.
Compute the root mean square error (RMSE) for the holdout data:
RMSE (Linear Trend) = $729 million.
RMSE (Additive Seasonality) = $498 million.
[Figure: Actual vs. predicted sales (trend and seasonal models) over the holdout quarters.]
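A minimal sketch of the holdout computation, assuming hypothetical training and test data frames train_df and test_df split as above:

# Fit on the training quarters, predict the holdout quarters
mod  <- lm(Sales ~ t + D_1 + D_2 + D_3, data = train_df)
pred <- predict(mod, newdata = test_df)

# Root mean square error on the holdout set
sqrt(mean((test_df$Sales - pred)^2))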
Autocorrelation
Patterns in the residual ACF (exponential decay, positive/negative swings, etc.) are bad.
Two ways of modeling autocorrelation:
Explicitly, as a lag variable.
By modeling the response variable as a function of lagged error terms: moving average (MA) models.
Other problems:
Non-zero mean.
[Figure: ACF plots of the residual series (Series res).]
Model with Lag Variables
Model specification
Let Y_t denote sales at time t.
Create a “new” variable Y_{t-1} (the lag-1 variable).
Model: Y_t = β_0 + β_1 Y_{t-1} + Σ β_i x_i + ε
Lags may span more than one time unit:
Different lag models (lag-1, lag-2, lag-3, etc.) can be compared to determine optimum lag duration.
How many lags should we include?
“Lag” Variable
Quarter Sales Lag.Sales t D D_1 D_2 D_3
Q2-86 2244.96 1734.83 2 2 0 1 0
Q3-86 2533.8 2244.96 3 3 0 0 1
Q4-86 2154.96 2533.8 4 4 0 0 0
Q1-87 1547.82 2154.96 5 1 1 0 0
Q2-87 2104.41 1547.82 6 2 0 1 0
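A minimal sketch of building the lag variable and fitting the lag model in R, again assuming the hypothetical coke data frame:

# Lag-1 sales: shift the series by one quarter (the first row has no lag and drops out)
coke$Lag.Sales <- c(NA, head(coke$Sales, -1))

# Trend + lag + seasonal model
mod_lag <- lm(Sales ~ Lag.Sales + t + D_1 + D_2 + D_3, data = coke)
summary(mod_lag)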
Interpreting Lag Model Results

Lag model (Multiple R-squared: 0.9761, Adjusted R-squared: 0.9736):

             Estimate   Std. Error  Pr(>|t|)
(Intercept)  147.3157   164.4734    0.3748
Lag.Sales    0.7373     0.0925      0.0000
t            18.0255    6.4156      0.0071
D_1          21.0050    77.0229     0.7862
D_2          879.0750   84.2993     0.0000
D_3          207.9091   71.9296     0.0057

Seasonal (trend + dummy) model, for comparison (Multiple R-squared: 0.945, Adjusted R-squared: 0.9406):

             Estimate   Std. Error  Pr(>|t|)
(Intercept)  1339.6120  102.8780    0.0000
t            67.6190    2.3680      0.0000
D_1          -207.6500  107.2790    0.0586
D_2          506.9240   105.3540    0.0000
D_3          334.6560   105.2740    0.0025

Questions:
The coefficient 0.74 implies that:
A. Last quarter’s sales have no impact.
B. Sales decrease by 0.74 every quarter.
C. Seasonally adjusted and detrended sales increase by 0.74 every quarter.
D. None of the above.
How do the results compare against the linear trend and seasonal models?
Trend + Lag + Seasonal
[Figure: Residual vs. Time and Actual vs. Fitted plots for the Trend + Lag + Seasonal model.]
RMSE (Trend) = $729 million.
RMSE (Trend + Seasonal) = $498 million.
[Figure: Actual vs. predicted sales over the holdout quarters.]
ACF Plots
[Figure: ACF and PACF plots of the residual series (Series res).]
Diagnostic checking:
Compare model statistics (AIC, BIC, SBIC) to choose the best model.
Plot residual ACF: should be random (no pattern), i.e., white noise.
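For example, with hypothetical fitted models mod_trend, mod_seasonal, and mod_lag from earlier, the diagnostics might look like:

# Compare information criteria across candidate models (lower is better;
# note the lag model uses one fewer observation)
AIC(mod_trend, mod_seasonal, mod_lag)
BIC(mod_trend, mod_seasonal, mod_lag)

# Residual ACF and PACF: white noise should show no significant spikes
res <- residuals(mod_lag)
acf(res)
pacf(res)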
ACF and PACF
Autocorrelation Function (ACF):
Measures the correlation between observation Y_t and observation Y_{t-p}, located p periods apart:
ρ_p = Corr(Y_t, Y_{t-p}) = Cov(Y_t, Y_{t-p}) / (√Var(Y_t) · √Var(Y_{t-p})) = γ_p / γ_0
Used to estimate the order q in MA(q) models: the ACF of an MA(q) process cuts off after lag q.
Partial Autocorrelation Function (PACF):
The autocorrelation of a signal with itself at different points in time, with the linear dependency of the signal at shorter lags removed.
Used to estimate the order p in AR(p) models: the PACF of an AR(p) process cuts off after lag p.
AirPassengers Data
Monthly ticket sales counts (in thousands) for 1949-1960.
We wish to predict ticket sales for the next 5 years.
data(AirPassengers)        # monthly passenger counts, 1949-1960
ts <- AirPassengers        # note: this variable shadows the name of the ts() constructor
ts
start(ts)                  # first observation: January 1949
end(ts)                    # last observation: December 1960
class(ts)                  # "ts" (time series object)
frequency(ts)              # 12 observations per year
cycle(ts)                  # position of each observation within the year
plot(ts)
abline(lm(ts ~ time(ts)), col="red")   # overlay the linear trend
boxplot(ts ~ cycle(ts))    # distribution of counts by month
Questions:
What are we trying to depict in this boxplot?
What inferences can we draw from this boxplot?
AirPassengers: Stationarity
plot(ts)
abline(lm(ts ~ time(ts)), col="red")
plot(log(ts))
plot(diff(log(ts)))

Questions:
What did the log transform do?
What did the differencing do?
Do we have a stationary time series?
[Figure: ACF and PACF of diff(log(ts)), annotated at lag 2 (p = 2) and lag 1 (q = 1).]
AirPassengers: ARIMA with Cross-Validation

model <- arima(log(ts), c(2,1,1), seasonal=list(order=c(2,1,1), period=12))
predicted <- predict(model, n.ahead=5*12)   # forecast the next 5 years
predicted <- exp(predicted$pred)            # undo the log transform
predicted <- round(predicted, 0)
predicted
ts.plot(ts, predicted, lty=c(1,3))

# Cross-validation
# Training data: 1949-1958; test data: 1959-1960
train <- ts(ts, frequency=12, start=c(1949,1), end=c(1958,12))
model <- arima(log(train), c(2,1,1),
               seasonal=list(order=c(2,1,1), period=12))
predicted <- predict(model, n.ahead=2*12)
predicted <- exp(predicted$pred)
predicted <- round(predicted, 0)
original <- tail(ts, 24)
original - predicted                        # forecast errors on the test months
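To compare the ARIMA forecasts against the earlier models on the same footing, a holdout RMSE can be computed from the objects above:

# Root mean square error over the 24 test months
sqrt(mean((original - predicted)^2))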
Home price data (first eight rows):

ID  PRICE  NBROOM  DWELL  NBATH  PATIO  FIREPL  AC  X    Y
1   47     4       0      1      0      0       0   907  534
2   113    7       1      2.5    1      1       1   922  574
3   165    7       1      2.5    1      1       0   920  581
4   104.3  7       1      2.5    1      1       1   923  578
5   62.5   7       1      1.5    1      1       0   918  574
6   70     6       1      2.5    1      1       0   900  577
7   127.5  6       1      2.5    1      1       1   918  576
8   53     8       1      1.5    1      0       0   907  576

OLS estimates:

             Estimate  Std. Error  P-value
(Intercept)  15.4510   5.5480      0.0059
NBROOM       1.1510    1.2430      0.3556
NBATH        8.3310    2.2010      0.0002
PATIO        17.2780   3.4680      0.0000
FIREPL       17.1310   3.0270      0.0000
AC           12.7670   2.8330      0.0000
[Figure: Home locations plotted by Longitude and Latitude.]
The colors and diamond size are proportional to the price of a house.
What can we learn from this graph?
How can we model this effect?
How to Measure ‘Space’?
We must define space in order to measure its effects.
Naive method: Regional dummy variables, e.g., for zip codes.
Weight matrix: n x n neighborhood structure, where: 0 = not neighbor, 1 = neighbor.
Sample region and units (a 3 × 3 grid):

1 2 3
4 5 6
7 8 9

Simple neighborhood matrix (0 = not neighbor, 1 = neighbor):

    1 2 3 4 5 6 7 8 9
1   0 1 0 1 0 0 0 0 0
2   1 0 1 0 1 0 0 0 0
3   0 1 0 0 0 1 0 0 0
4   1 0 0 0 1 0 1 0 0
5   0 1 0 1 0 1 0 1 0
6   0 0 1 0 1 0 0 0 1
7   0 0 0 1 0 0 0 1 0
8   0 0 0 0 1 0 1 0 1
9   0 0 0 0 0 1 0 1 0
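A minimal sketch of building this neighborhood structure in R with the spdep package (assuming it is installed):

library(spdep)

nb <- cell2nb(3, 3, type = "rook")  # edge-sharing contiguity on a 3 x 3 grid
W  <- nb2mat(nb, style = "B")       # binary 0/1 neighbor matrix, as in the table above
W

# Row-standardized weights (each row sums to 1), the usual input to spatial models
lw <- nb2listw(nb, style = "W")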
Spatial Lag Model

Spatial autocorrelation in the response variable:
Y = ρWY + βX + ε
W: spatial weight matrix; ρ: spatial coefficient.
Incorporates spatial effects by including a spatially lagged dependent variable as an additional predictor.

Initial OLS model (AIC = 1793):

             Estimate  Std. Error  P-value
(Intercept)  15.4510   5.5480      0.0059
NBROOM       1.1510    1.2430      0.3556
NBATH        8.3310    2.2010      0.0002
PATIO        17.2780   3.4680      0.0000
FIREPL       17.1310   3.0270      0.0000
AC           12.7670   2.8330      0.0000

Spatial lag model (AIC = 1739):

             Estimate  Std. Error  P-value
(Intercept)  -2.6764   4.8670      0.5824
NBROOM       1.2673    1.0487      0.2269
NBATH        7.6529    1.8711      0.0000
PATIO        11.9579   2.9539      0.0001
FIREPL       11.1740   2.6072      0.0000
AC           8.4183    2.4014      0.0005
Spatial coefficient: Rho = 0.49961; p-value = 6.6502e-14

OLS vs. spatial lag results:
Some of the estimates are smaller in the lag model.
The intercept term switched signs and is no longer significant.
What happened? Which model is better?
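A minimal sketch of fitting such a spatial lag model in R with the spdep and spatialreg packages, assuming a hypothetical data frame homes with the columns above:

library(spdep)       # neighbor construction
library(spatialreg)  # lagsarlm()

coords <- cbind(homes$X, homes$Y)
nb <- knn2nb(knearneigh(coords, k = 5))  # e.g., 5 nearest neighbors as the reach
lw <- nb2listw(nb, style = "W")          # row-standardized weight matrix W

ols  <- lm(PRICE ~ NBROOM + NBATH + PATIO + FIREPL + AC, data = homes)
slag <- lagsarlm(PRICE ~ NBROOM + NBATH + PATIO + FIREPL + AC,
                 data = homes, listw = lw)

AIC(ols); AIC(slag)  # compare fit, as in the AIC values above
summary(slag)        # rho is the spatial coefficient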
OLS vs. Spatial Lag
Certain predictors (e.g., presence of a patio or fireplace) lost much of their importance in predicting home prices once neighboring homes were included (via the spatial lag ρWY).
Why?
Houses located in the same area tend to have similar features, e.g., fireplaces and patios in
wealthy neighborhoods, no central AC in poorer neighborhoods.
Hence, prices of neighboring houses already factor in the price effect of these “expected” features.
Lack of these features may change the price, but not by much.
Implication:
Better to buy a low-end house in an expensive neighborhood rather than a high-end house in an
inexpensive neighborhood?
Key Takeaways
Modeling temporal and spatial dependencies in data presents unique challenges such as
autocorrelation and location correlation.
Statistical models are available to account for these dependencies:
Additive seasonality model.
Lag model.
AR, MA, ARMA, ARIMA models.
Spatial lag model.
Assessing model quality:
Use estimates from the training set to predict values in the test set (predictive accuracy).
Measures of model fit such as AIC, along with visual examination of residuals, are also needed.
Such analysis provide better insight into relationship among data not available from OLS
models.