Chapter 1 - Lecture
A time series is a sequence of ordered data. The “ordering” generally refers to time, but other orderings
could be envisioned (e.g., over space). In this class, we will be concerned exclusively with time series
that are
Time series data arise in a variety of fields. Here are just a few examples.
• In business, we observe daily closing stock prices, weekly interest rates, quarterly sales, monthly price
indices, etc.
• In agriculture, we observe annual yields (e.g., crop production), daily crop prices, annual livestock
production, etc.
• In engineering, we observe electric signals, voltage measurements, etc.
• In natural sciences, we observe chemical yields, turbulence in ocean waves, earth tectonic plate
positions, etc.
• In medicine, we observe EKG measurements on patients, drug concentrations, blood pressure
readings, etc.
• In epidemiology, we observe the number of flu cases per day, the number of health-care clinic visits
per week, annual tuberculosis counts, etc.
• In meteorology, we observe hourly wind speeds, daily high temperatures, annual rainfall, earthquake
frequency, etc.
• In social sciences, we observe annual birth and death rates, accident frequencies, crime rates, school
enrollments, etc.
The time series plot is the most basic graphical display in the analysis of time series data. The plot is
basically a scatterplot of $Y_t$ versus $t$, with straight lines connecting the points. Notationally, $Y_t$ denotes the value of
the variable $Y$ at time $t$, for $t = 1, 2, \ldots, n$. The subscript $t$ tells us to which time point the measurement
$Y_t$ corresponds. Note that in the sequence $Y_1, Y_2, \ldots, Y_n$, the subscripts are very important because they
correspond to a particular ordering of the data. This is perhaps a change in mindset from other methods
courses, where the time element is ignored.
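A minimal base-R sketch of the time series plot just described, assuming simulated values in place of a real series (the numbers are random and illustrative only): `ts()` attaches the index $t = 1, \ldots, n$ to the values, and `type = "o"` draws the points joined by straight lines. The second `plot()` call builds the same picture by hand from the $(t, Y_t)$ pairs.

```r
# Simulated series standing in for Y_1, ..., Y_n (illustrative only)
set.seed(1)
y <- ts(round(rnorm(20, mean = 15, sd = 5), 1), start = 1, frequency = 1)

# time(y) recovers the index t attached to each Y_t
t_index <- as.numeric(time(y))

# Time series plot: Y_t versus t, points joined by straight lines
plot(y, type = "o", xlab = "t", ylab = "Y", main = "Time series plot (simulated data)")

# Same plot built explicitly from the (t, Y_t) pairs
plot(t_index, as.numeric(y), type = "o", xlab = "t", ylab = "Y",
     main = "Y_t versus t (simulated data)")
```

Because the ordering lives in `time(y)`, reordering the values would produce a different plot even though the set of values is unchanged.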
• What are the noticeable patterns?
– a typical year had about 15 inches of rainfall
– there is considerable variation over the years, i.e., some years are low, some high, and many are in between
∗ The year 1884 was an exceptionally wet year
∗ The year 1989 was quite dry
• Prediction?
– there is little information about this year’s rainfall amount from last year’s amount, i.e., the plot
shows no "trends".
– there is little correlation between last year’s rainfall amount and this year’s amount
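library(TSA)  # TSA package: larain data and helpers such as zlag() and season()
data(larain)
# type="o" draws the points and joins consecutive points with lines
plot(larain, ylab="Inches", xlab="Year", type="o",
     main="Time Series Plot of Los Angeles Annual Rainfall")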
[Figure: Time Series Plot of Los Angeles Annual Rainfall (x-axis: Year; y-axis: Inches)]
[Figure: Scatterplot of LA Rainfall versus Last Year's LA Rainfall (y-axis: Inches)]
This plot is called a lag-1 scatterplot: it plots the observed data against the lag-1 series, i.e., it is the
scatterplot of the data points $(Y_1, Y_2), (Y_2, Y_3), \ldots, (Y_{n-1}, Y_n)$, and it graphically describes the degree of
correlation between rainfall in one year and rainfall in the next.
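The pairs behind a lag-1 scatterplot can be built directly from the observations. A minimal base-R sketch, assuming a short made-up vector `y` in place of the LA rainfall data: dropping the last value gives $Y_1, \ldots, Y_{n-1}$ (previous year), dropping the first gives $Y_2, \ldots, Y_n$ (current year), and `cor()` on the two aligned vectors measures the year-to-year correlation.

```r
# Made-up values standing in for Y_1, ..., Y_n (illustrative only)
y <- c(12.1, 18.4, 9.7, 15.2, 21.0, 11.3, 14.8, 17.5)
n <- length(y)

prev <- y[-n]  # Y_1, ..., Y_{n-1}: x-coordinates (previous value)
curr <- y[-1]  # Y_2, ..., Y_n:     y-coordinates (current value)
pairs_lag1 <- cbind(prev, curr)  # rows are (Y_1, Y_2), ..., (Y_{n-1}, Y_n)

plot(prev, curr, xlab = "Previous value", ylab = "Current value",
     main = "Lag-1 scatterplot (made-up data)")

r1 <- cor(prev, curr)  # correlation between consecutive observations
```

A series of length $n$ yields only $n - 1$ pairs, since $Y_1$ has no predecessor.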
– strong correlation between the same month across years: the correlation in the lag-12 scatterplot is about 0.97
– seasonality can be used for prediction.
library(TSA)
data(tempdub)  # monthly average temperatures, Dubuque, Iowa
plot(tempdub, ylab="Temperature", xlab="Time", type="o",
     main="Average Monthly Temperatures, Dubuque, Iowa")
[Figure: Average Monthly Temperatures, Dubuque, Iowa (x-axis: Time; y-axis: Temperature)]
[Figure: Average Monthly Temperature versus Previous Year's Average Monthly Temperature]
cor(tempdub,zlag(tempdub,12),use="complete.obs")
## [1] 0.9702201
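In the call above, `zlag(tempdub, 12)` shifts the series by 12 positions and, as we understand TSA's documentation, pads the first 12 entries with `NA`; `use = "complete.obs"` then drops those months, which have no partner a year earlier. A base-R sketch of the same computation, assuming a hypothetical helper `lag_pad()` (not part of TSA) and simulated monthly data with a 12-month cycle in place of `tempdub`:

```r
# Hypothetical helper mimicking an NA-padded lag: element t holds x[t - k]
lag_pad <- function(x, k) c(rep(NA, k), head(as.numeric(x), -k))

# Simulated monthly temperatures with a yearly cycle (illustrative only)
set.seed(42)
idx <- 1:144
temp <- 50 + 20 * cos(2 * pi * idx / 12) + rnorm(length(idx), sd = 2)

# Same month one year apart: pairs (Y_{t-12}, Y_t)
r12 <- cor(temp, lag_pad(temp, 12), use = "complete.obs")
# Six months apart: opposite phases of the cycle
r6 <- cor(temp, lag_pad(temp, 6), use = "complete.obs")

# Dropping the NA rows is the same as correlating the aligned subvectors
n <- length(temp)
r12_direct <- cor(temp[13:n], temp[1:(n - 12)])
```

In this simulation the lag-12 correlation is strongly positive and the lag-6 correlation strongly negative, the same kind of seasonal signature that the 0.97 above reflects for Dubuque temperatures.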
library(TSA)
data(oilfilters)
plot(oilfilters, ylab="Sales", xlab="Time", type="l",
     main="Monthly Oil Filter Sales with Month Symbol")
# Overlay each observation as the first letter of its month (from season())
points(y=oilfilters, x=time(oilfilters), pch=as.vector(season(oilfilters)))
[Figure: Monthly Oil Filter Sales with Month Symbol (x-axis: Time; y-axis: Sales); each point is labeled by the first letter of its month]
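The month symbols drawn by the `points()` call above come from TSA's `season()`. In base R, `cycle()` gives each observation's position within the year (1 for January through 12 for December), which can be mapped to one-letter labels through the built-in `month.abb`. A sketch, assuming a simulated monthly series that starts in January (not the `oilfilters` data):

```r
# Simulated monthly series starting January 2000 (illustrative only)
set.seed(7)
sales <- ts(3000 + 1000 * sin(2 * pi * (1:36) / 12) + rnorm(36, sd = 200),
            start = c(2000, 1), frequency = 12)

# cycle() returns 1..12 for Jan..Dec; keep the first letter of each month name
month_letter <- substr(month.abb[cycle(sales)], 1, 1)

# Draw the line, then overlay each point as its month's initial
plot(sales, type = "l", ylab = "Sales", xlab = "Time",
     main = "Monthly series with month symbols (simulated)")
points(x = time(sales), y = sales, pch = month_letter)
```

Labeling points this way makes it easy to spot which months sit consistently high or low, which is how seasonality shows up in the oil filter plot.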
plot(y=oilfilters, x=zlag(oilfilters,12), ylab="Sales", xlab="Sales 12 months earlier",
     main="Lag-12 scatterplot")
[Figure: Lag-12 scatterplot of monthly oil filter sales]
cor(oilfilters,zlag(oilfilters,12),use="complete.obs")
## [1] 0.8084015
plot(y=oilfilters, x=zlag(oilfilters,1), ylab="Sales", xlab="Previous month's sales",
     main="Lag-1 scatterplot")
[Figure: Lag-1 scatterplot of monthly oil filter sales]
cor(oilfilters,zlag(oilfilters,1),use="complete.obs")
## [1] 0.3142145
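The contrast between the lag-1 correlation (0.31) and the lag-12 correlation (0.81) is what points to a yearly pattern. One way to see this at a glance is to compute the lag-$k$ scatterplot correlation for several values of $k$ and look for the peak. A sketch, assuming hypothetical helpers `lag_pad()` (a stand-in for `zlag`) and `lag_cor()`, and simulated monthly data rather than the oil filter series:

```r
# Hypothetical stand-in for zlag(): element t holds x[t - k], first k are NA
lag_pad <- function(x, k) c(rep(NA, k), head(as.numeric(x), -k))

# Correlation of the lag-k scatterplot, dropping the k incomplete pairs
lag_cor <- function(x, k) cor(as.numeric(x), lag_pad(x, k), use = "complete.obs")

# Simulated monthly series: yearly cycle plus noise (illustrative only)
set.seed(3)
x <- 100 + 30 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 5)

lags <- 1:12
r <- sapply(lags, function(k) lag_cor(x, k))
names(r) <- paste0("lag", lags)
round(r, 2)

best_lag <- lags[which.max(r)]  # lag with the strongest positive correlation
```

For a series with a 12-month cycle the correlations dip to negative values around lag 6 and peak again at lag 12.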
– strong correlation between neighboring years: the lag-1 correlation is about 0.84
[Figure: Time series plot of global temperature deviations (x-axis: Year)]
[Figure: Lag-1 Scatterplot of global temperature deviations]
cor(globaltemps,zlag(globaltemps,1),use="complete.obs")
## [,1]
## V1 0.8421212
The purpose of time series analysis is generally twofold:
1. to model the stochastic (random) mechanism that gives rise to the series of data;
2. to predict (forecast) the future values of the series based on its previous history.
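As a minimal illustration of the second goal for a seasonal series like those above, the simplest forecast built from previous history reuses the value from the same month one year earlier, $\hat{Y}_{n+j} = Y_{n+j-12}$. This "seasonal naive" rule is a baseline chosen here for illustration, not a model from these notes, and the function name `seasonal_naive()` is hypothetical.

```r
# Hypothetical baseline: forecast each of the next h months by the value
# observed in the same month of the most recent year (requires h <= period)
seasonal_naive <- function(y, h, period = 12) {
  y <- as.numeric(y)
  n <- length(y)
  stopifnot(h >= 1, h <= period, n >= period)
  y[(n - period + 1):(n - period + h)]
}

# Simulated monthly data with a yearly cycle (illustrative only)
set.seed(11)
y <- 20 + 10 * cos(2 * pi * (1:48) / 12) + rnorm(48, sd = 1)

fc <- seasonal_naive(y, h = 3)  # forecasts for months 49, 50, 51
```

Models developed later in the course aim to do better than such a rule by describing the stochastic mechanism explicitly.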
NOTES: For time series data, we get to see only a single measurement from a population (at time t) instead
of a sample of measurements at a fixed point in time (cross-sectional data).
1. The special feature of time series data is that they are not independent! Instead, observations are
correlated through time.
• Correlated data are generally more difficult to analyze.
• Statistical theory in the absence of independence becomes markedly more difficult.
2. Most classical statistical methods (e.g., regression, analysis of variance, etc.) assume that observations
are statistically independent. For example, in the simple linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
or an ANOVA model like
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk},$$
we typically assume that the error terms are independent and identically distributed (iid) normal
random variables with mean 0 and constant variance.
3. There can be additional trends or seasonal variation patterns (seasonality) that may be difficult to
identify and model.
4. The data may be highly non-normal in appearance and be possibly contaminated by outliers.
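To see why independence is the exception for time series, compare the lag-1 correlation of independent noise (the assumption behind the regression and ANOVA errors above) with that of a series in which each value carries over part of the previous one. A sketch with simulated data; the recursion $Y_t = 0.8\,Y_{t-1} + e_t$ is only an illustrative choice of correlated process:

```r
set.seed(2024)
n <- 500
e <- rnorm(n)  # iid N(0, 1) values: independent by construction

# Recursive filter: y[t] = e[t] + 0.8 * y[t - 1], so neighbors are correlated
y <- as.numeric(stats::filter(e, filter = 0.8, method = "recursive"))

# Lag-1 correlations from the pairs (Y_1, Y_2), ..., (Y_{n-1}, Y_n)
r_iid <- cor(e[-n], e[-1])
r_dep <- cor(y[-n], y[-1])
round(c(independent = r_iid, correlated = r_dep), 2)
```

The independent series gives a lag-1 correlation near 0, while the filtered series gives one near 0.8, so treating its observations as iid would ignore most of the structure in the data.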
Our overarching goal in this course is to build (and use) time series models for data. This breaks down into
different parts.
3. Model diagnostics
• Use statistical inference and graphical displays to check how well the model fits the data.
• This part of the analysis may suggest the candidate model is inadequate and may point to more
appropriate models.