
DATA 315

Introduction to Time Series and Forecasting

Lengyi Han, UBC

1
Chapter 1: Introduction

Example 1
Electroless nickel concentrations in a chrome plating process* were
measured at the beginning of each eight-hour work shift for a period of
25 days. A concentration of 4.5 ounces per gallon is considered optimal
in this application.

* The chrome plating process is a method of applying a thin layer of chromium onto a substrate
(metal or alloy) through an electroplating procedure.
https://www.sea.org.uk/blog/a-brief-guide-to-the-chrome-plating-process/
2
Chapter 1: Introduction

Example 1

* https://rexplating.com/decorative-chrome-plating/

3
Chapter 1: Introduction

Example 1
Question: Interest here is in determining whether the distribution of
concentration measurements changes at any point in time.

4
Chapter 1: Examples

[Figure: time plot of the nickel concentrations (oz./gal.) against Time (days)]

5
[Figure: sample ACF of the series "nickel" (ACF against Lag, lags 0 to 6)]
## tau = 0.302, 2-sided pvalue =0.00029528
Chapter 1: Examples

[Figure: time plot of the nickel concentrations (oz./gal.) against Time (days)]

6
##
## Call:
## lm(formula = NKL ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45313 -0.10964 -0.01179 0.12733 0.48605
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.481333 0.044843 99.934 < 2e-16
## x 0.010872 0.003016 3.604 0.000569
##
## Residual standard error: 0.1884 on 73 degrees of freedom
## Multiple R-squared:  0.1511,  Adjusted R-squared:  0.1394
## F-statistic: 12.99 on 1 and 73 DF, p-value: 0.000569
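
A minimal sketch of how a fit like this could be produced. The nickel data themselves are not reproduced in these notes, so NKL below is a simulated stand-in (an assumption), with x giving the day of each of the three daily shift measurements:

set.seed(315)
x   <- rep(1:25, each = 3)                        # day index: three shifts per day for 25 days
NKL <- 4.48 + 0.011 * x + rnorm(75, sd = 0.19)    # stand-in for the 75 concentration measurements
summary(lm(NKL ~ x))                              # regression of concentration on day, as above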
Chapter 1: Examples

Compare with a simulated time series that has no trend:


y <- rnorm(75, 4.5, 0.2)       # 75 independent values around 4.5, no trend
x <- rep(1:25, rep(3, 25))     # day index: three shifts per day for 25 days

ts.plot(y)

7
[Figure: time plot of the simulated series y against Time]

summary(lm(y ~ x))

##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.47760 -0.13391 -0.02575 0.12465 0.59900
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.468052 0.048689 91.766 <2e-16
## x -0.001106 0.003275 -0.338 0.737
##
## Residual standard error: 0.2045 on 73 degrees of freedom
## Multiple R-squared:  0.001559,  Adjusted R-squared:  -0.0121
## F-statistic: 0.114 on 1 and 73 DF, p-value: 0.7366
Chapter 1: Examples

Example 2
This time series records the number of lynx trapped in Hudson’s Bay Company
territory, and later Canada, in the years 1821 through 1934.

8
Chapter 1: Examples
Example 2
[Figure: time plot of the lynx series (Number Trapped against Time), 1821-1934]

9
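R's built-in lynx dataset records the annual trappings for 1821-1934 and appears to match the series shown above; a minimal sketch to reproduce such a plot:

plot(lynx, ylab = "Number Trapped", xlab = "Time")   # annual lynx trappings, 1821-1934
acf(lynx)                                            # the periodic behaviour also shows up in the sample ACF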
Chapter 1: Examples

Example 2
Notice the periodic behaviour of this time series.

Such time series are of interest in population ecology studies, where one is interested in concepts such as carrying capacity (i.e. how many predators such as lynx a landscape can support) or in understanding the natural decreases and increases in such a population.

10
Chapter 1: Examples

Example 3
Global average temperatures are recorded in terms of number of Celsius
degrees above a baseline temperature. The baseline temperature is the
average temperature for the year 1990.

11
Chapter 1: Examples
Example 3
[Figure: time plot of Change in Temperature against Time, 1880-2020]

12
Chapter 1: Examples
Example 3
[Figure: time plot of Change in Temperature against Time, 1880-2020]

13
Chapter 1: Examples

Example 3
Question: The main interest in such a series is whether the apparently
increasing trend is simply due to random chance or whether there is a
true structural change in the data.*

* These data are from Datahub: https://datahub.io/core/global-temp.


14
Chapter 1: Examples

Example 4
Plot of the daily closing prices for the DAX (German) stock index.

15
Chapter 1: Examples
Example 4

[Figure: left panel, DAX closing price against Time (years); right panel, log returns against Time (years), 1991-1998]

16
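The plots above can be approximated from R's built-in EuStockMarkets data, whose DAX column covers roughly the same 1991-1998 period (an assumption about the source of the figures):

dax <- EuStockMarkets[, "DAX"]                 # daily closing prices of the DAX
par(mfrow = c(1, 2))
plot(dax, ylab = "closing price", xlab = "Time (years)")
plot(diff(log(dax)), ylab = "log returns", xlab = "Time (years)")   # successive differences of the logs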
Chapter 1: Examples

Example 4
Question:

• Model the trend of the daily closing price for prediction purposes.

• Understand the behaviour of such financial processes from other perspectives. The right panel contains a plot of the successive differences of the stock index data after taking logs. This plot reveals structure, particularly concerning the changes in the degree of variability, or volatility, in the returns. Being able to model such data satisfactorily has proved useful in pricing a large number of financial investment products.

17
Chapter 1: Comparison with Trace Plots from 4 Examples
[Figure: trace plots of the four examples: nickel concentration (oz./gal.) against Time (days); lynx Number Trapped against Time; Change in Temperature against Time; DAX closing price against Time (years)]


Chapter 1: What is a Time Series?

• Observe one object

• Repeatedly measure one phenomenon of the same object at different times

• Keep a record of the measurements


Chapter 1: What if

• We observe more than one object: LONGITUDINAL DATA

• We repeatedly measure one phenomenon of the same object at different locations, indexed by longitude and latitude: SPATIAL DATA
Chapter 1: Time series modelling

We are trying to find hidden structure behind the noisy data.

18
Chapter 1: Time series modelling

[Figure: time plot of the nickel concentrations (oz./gal.) against Time (days)]

19
Chapter 1: Time series modelling

When confronted with a new set of time series data, the data analyst typically proceeds through the following steps:

1. Exploratory data analysis

2. Model identification

3. Parameter estimation

4. Confirmatory analysis

5. Forecasting

20
Chapter 1: Necessary background probability and statistics concepts

• the normal distribution and density functions

• the expected value of a random variable and of a sum of random variables

• the variance and standard deviation of a random variable and of a sum of random variables

• the covariance and correlation of a pair of random variables

• maximum likelihood estimation

21
Chapter 1: Basic features of a probability model

Continuous probability distributions have a probability density function, or pdf. A pdf is always nonnegative, and the area under the density curve is exactly 1.0. That is,

$$f(x) \ge 0 \quad \text{for all } x$$

and

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

The probability that a random variable X with density function f(x) takes a value in an interval [a, b] is calculated as

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

22
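As a sketch of how such probabilities can be evaluated numerically in R, take f(x) = exp(-x) on [0, ∞) (the exponential(1) density, an assumed example rather than one from the notes):

f <- function(x) exp(-x)                    # an assumed pdf for illustration
integrate(f, lower = 0, upper = Inf)$value  # total area under the density: 1
a <- 0.5; b <- 2
integrate(f, lower = a, upper = b)$value    # P(a <= X <= b)
exp(-a) - exp(-b)                           # closed-form check of the same probability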
Chapter 1: Basic features of a probability model

Such probabilities are also expressed in terms of the cumulative distribution function, or cdf:

$$F(y) = P(X \le y) = \int_{-\infty}^{y} f(x)\,dx.$$

Note that the probability density function can be recovered from the cumulative distribution function by differentiation:

$$f(x) = F'(x).$$

23
Chapter 1: Probability Example

For a given value of α > −1, consider the function

$$f(x) = \begin{cases} (\alpha + 1)x^{\alpha}, & x \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}$$

• Is f(x) a pdf?

• What is the cdf corresponding to the pdf f(x)?

24
Chapter 1: Probability Example

• For a given value of α > −1, the function f(x) is nonnegative everywhere, and

$$\int_0^1 (\alpha + 1)x^{\alpha}\,dx = 1.$$

• The cdf is

$$F(x) = \int_{-\infty}^{x} (\alpha + 1)y^{\alpha}\,dy = x^{\alpha + 1}, \quad \text{for } x \in [0, 1].$$

25
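A quick numerical check of these two results in R, for one assumed value of α:

alpha <- -0.5
f <- function(x) (alpha + 1) * x^alpha
integrate(f, lower = 0, upper = 1)$value    # the pdf integrates to 1
Fcdf <- function(x) x^(alpha + 1)           # the cdf derived above
integrate(f, lower = 0, upper = 0.25)$value # P(X <= 0.25) by integration
Fcdf(0.25)                                  # agrees with the cdf: 0.25^0.5 = 0.5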
Chapter 1: Probability Example

The normal pdf is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$$

where µ is a real-valued constant and σ > 0.

• f(x) is nonnegative everywhere.

• It can be shown that it integrates to 1.

• The cdf cannot be written down in closed form. Instead, probability values are calculated using numerical methods.

26
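In R, the numerical evaluation of the normal cdf is provided by pnorm(); a small sketch with assumed values of µ and σ:

mu <- 4.5; sigma <- 0.2                          # assumed parameter values
pnorm(4.8, mean = mu, sd = sigma)                # P(X <= 4.8)
pnorm(4.8, mu, sigma) - pnorm(4.2, mu, sigma)    # P(4.2 <= X <= 4.8)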
Chapter 1: Expected value

The expectation of a single (continuous) random variable X, or the expected value of X, can be written as

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

where f(x) is the probability density function of X.

• E[X] is the mean of X.

• The expected value gives us a single number that, at least in a rough sense, conveys a typical value for the random variable.

• It is sometimes called a measure of location, since it specifies the location of the distribution along the real axis.

27
Chapter 1: Expected Value Example

For the density function f(x) = (α + 1)x^α, for x ∈ [0, 1], we have

$$E[X] = \int_0^1 x(\alpha + 1)x^{\alpha}\,dx = \frac{\alpha + 1}{\alpha + 2}. \tag{1}$$

For the specific case where α = −0.5, this gives E[X] = 1/3. A commonly used alternate notation for the mean of a distribution is µ.

28
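A numerical check of equation (1) for α = −0.5 (the answer should be 1/3):

alpha <- -0.5
integrate(function(x) x * (alpha + 1) * x^alpha, 0, 1)$value   # approximately 0.3333
(alpha + 1) / (alpha + 2)                                      # closed form from (1)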
Chapter 1: Expected Value Example

The expected value of the normal distribution is

$$E[X] = \int_{-\infty}^{\infty} \frac{x}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\,dx = \mu.$$

29
Chapter 1: Expected Value

Other types of expected value can be calculated by integration. For continuous functions g(x), we have

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

When a is a nonrandom constant and g(x) = ax, we have

$$E[aX] = \int_{-\infty}^{\infty} a x f(x)\,dx = a \int_{-\infty}^{\infty} x f(x)\,dx = a E[X].$$

Similarly, it can be shown that

$$E[X + a] = E[X] + a.$$

When g(x) = x² and the probability density function is as above, we have

$$E[X^2] = \int_0^1 x^2 (\alpha + 1)x^{\alpha}\,dx = \frac{\alpha + 1}{\alpha + 3}.$$

30
Chapter 1: Example

The expected value of X² under the normal distribution is

$$E[X^2] = \int_{-\infty}^{\infty} \frac{x^2}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\,dx = \mu^2 + \sigma^2.$$

31
Chapter 1: Variance

• E[X] tells us about location, but not the whole story.

• The variance is one way to measure the variability of a random variable.

• Denoting the mean of X by µ, we have

$$\mathrm{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$

• An algebraically equivalent expression is

$$\mathrm{Var}(X) = E[X^2] - \mu^2.$$

• The variance is the average squared deviation from the mean.

Together, E[X] and Var(X) summarize the location and spread of a distribution.

32
Chapter 1: Example

For the distribution with pdf f(x) = (α + 1)x^α, the variance is

$$\mathrm{Var}(X) = \frac{\alpha + 1}{\alpha + 3} - \frac{(\alpha + 1)^2}{(\alpha + 2)^2} = \frac{\alpha + 1}{(\alpha + 3)(\alpha + 2)^2}.$$

33
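A Monte Carlo check of this formula, using inverse-cdf sampling (since F(x) = x^(α+1) on [0, 1], X = U^(1/(α+1)) for U ~ Uniform(0, 1)):

alpha <- -0.5
set.seed(42)
x <- runif(1e5)^(1 / (alpha + 1))              # draws from the pdf (alpha + 1) * x^alpha
var(x)                                         # simulation estimate of Var(X)
(alpha + 1) / ((alpha + 3) * (alpha + 2)^2)    # formula above: about 0.0889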
Chapter 1: Example

The variance of X under the normal distribution is

$$\mathrm{Var}(X) = E[X^2] - \mu^2 = \sigma^2.$$

34
Chapter 1: Variance

Question:

Beyond being a single number, what does the value of Var(X) tell us? In particular, if you know the variances of two random variables, Var(X) = a and Var(Y) = b, what does that tell you?

35
Chapter 1: Variance

• A small value of Var(X) implies that there is more certainty about the value of X; it will tend to take values close to µ when Var(X) is very small.

• The distribution will be more spread out when Var(X) is large.

Can you draw two normal distributions with E[X] = E[Y] but Var(X) ≠ Var(Y)? (A sketch in R follows.)

36
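A sketch of the requested picture: two normal densities with the same mean but different variances.

curve(dnorm(x, mean = 0, sd = 1), from = -6, to = 6, ylab = "density")  # Var = 1
curve(dnorm(x, mean = 0, sd = 2), add = TRUE, lty = 2)                  # Var = 4, same mean
legend("topright", legend = c("Var(X) = 1", "Var(Y) = 4"), lty = 1:2)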
Chapter 1: Variance

Note also that

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X) \tag{2}$$

for any nonrandom constant a, and

$$\mathrm{Var}(X + a) = \mathrm{Var}(X). \tag{3}$$

37
Chapter 1: Standard deviation

• The standard deviation is √Var(X).

• Both the standard deviation √Var(X) and the variance Var(X) summarize the spread or variability in a probability distribution.

38
Chapter 1: Sample

• A sample of size n: randomly choose n objects from the population.

• Denote a sample of measurements as x1, x2, . . . , xn.

• We use lower-case letters to indicate that these are the observed measurements on individual observations.

39
Chapter 1: Calculating the mean from a sample

• The sample mean is the average of the sample values:

$$\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j.$$

• The sample mean can be calculated in R using the mean() function.

40
Chapter 1: Calculating the variance from a sample

• The sample variance is

$$s^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})^2.$$

• The sample variance can be computed in R with the var() function.

• The sample standard deviation is s = √s².

• It can be computed in R using sd().

41
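A short sketch comparing the formulas above with R's built-in functions, on an arbitrary made-up sample:

x <- c(4.4, 4.6, 4.5, 4.8, 4.3, 4.7)            # a made-up sample
n <- length(x)
sum(x) / n;  mean(x)                            # sample mean, by formula and by mean()
sum((x - mean(x))^2) / (n - 1);  var(x)         # sample variance, by formula and by var()
sqrt(var(x));  sd(x)                            # sample standard deviation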
Chapter 1: Modelling more than one random variable

• Often multiple random variables are to be analyzed. Information on k continuously measured random variables X1, X2, . . . , Xk can be summarized by their distribution, which often has a probability density function f(x1, x2, . . . , xk).

• We call the function f a joint pdf to emphasize that there is more than one random variable involved.

42
Chapter 1: Example

Consider the function f(x1, x2, x3) = (2/3)(x1 + x2 + x3), for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, and 0 ≤ x3 ≤ 1, with f(x1, x2, x3) = 0 otherwise.

f is a valid joint pdf for 3 random variables, because it is nonnegative everywhere and because its (triple) integral evaluates to 1. (A quick numerical check follows.)

43
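A Monte Carlo check that the triple integral over the unit cube is 1 (the cube has volume 1, so the integral equals the average value of f over uniform draws):

set.seed(1)
n <- 1e5
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
mean(2/3 * (x1 + x2 + x3))                     # close to 1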
Chapter 1: Example

Let X1 and X2 be random variables taken from a population with joint pdf

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}$$

where

$$q = \frac{1}{1 - \rho^2}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)$$

and z1 = (x1 − µ1)/σ1 and z2 = (x2 − µ2)/σ2.

The joint pdf f is called the bivariate normal density function. Both X1 and X2 are normally distributed; the expected value of Xi is µi and its variance is σi², for i = 1, 2. The parameter ρ is referred to as the correlation coefficient.

44
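The bivariate normal density can be written directly from the formula above; a sketch (the default parameter values are arbitrary choices, not from the notes):

dbvnorm <- function(x1, x2, mu1 = 0, mu2 = 0, sigma1 = 1, sigma2 = 1, rho = 0.5) {
  z1 <- (x1 - mu1) / sigma1
  z2 <- (x2 - mu2) / sigma2
  q  <- (z1^2 - 2 * rho * z1 * z2 + z2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sigma1 * sigma2 * sqrt(1 - rho^2))
}
dbvnorm(0, 0)                         # density at the mean
dbvnorm(0, 0, rho = 0) - dnorm(0)^2   # with rho = 0 the joint pdf factors into two N(0,1) densities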
Chapter 1: Probability calculations

Probabilities are calculated by integration. For the case of two random variables X1 and X2, we can calculate the probability that X1 ∈ A and X2 ∈ B as

$$P(X_1 \in A,\; X_2 \in B) = \int_A \int_B f(x_1, x_2)\,dx_1\,dx_2.$$

Here, f(x1, x2) represents the joint probability density function for the random variables X1 and X2. Usually, this kind of calculation must be carried out using numerical methods.

45
Chapter 1: Covariance

Expected values of functions of two random variables are calculated using double integrals:

$$E[g(X_1, X_2)] = \int_A \int_B g(x_1, x_2) f(x_1, x_2)\,dx_1\,dx_2.$$

Here g() is a continuous function of 2 variables, and the integrals are assumed to be taken over the support of the function f() (i.e. wherever it is strictly positive).

46
Chapter 1: Covariance

• The covariance of two random variables, say X1 and X2, gives a measure of their association.

• The formula is

$$\mathrm{Cov}(X_1, X_2) = E[X_1 X_2] - E[X_1]E[X_2].$$

• Cov(X1, X2) ≠ 0 implies that the distribution of values of the pair of random variables will have (at least a vague) linear relationship.

  – When Cov(X1, X2) > 0, they are said to be positively associated.

  – When Cov(X1, X2) < 0, they are said to be negatively associated.

• When Cov(X1, X2) = 0, the two random variables have neither a positive nor a negative linear association.

• Cov(aX1, bX2) = ab Cov(X1, X2), where a and b are constants.

47
Chapter 1: Correlation

$$\rho = \mathrm{Corr}(X_1, X_2) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1)\,\mathrm{Var}(X_2)}}.$$

• When the covariance is 0, the correlation is also 0; we say the random variables are uncorrelated.

• −1 ≤ ρ ≤ 1

• Correlation measures the strength of the linear association between two random variables.

48
Chapter 1: Calculation of covariance and correlation for a sample

For a sample {(x11, x21), (x12, x22), . . . , (x1n, x2n)}, the sample covariance is given by

$$c = \frac{1}{n - 1} \sum_{j=1}^{n} (x_{1j} - \bar{x}_1)(x_{2j} - \bar{x}_2).$$

The sample correlation is given by

$$r = \frac{c}{s_1 s_2}$$

where s1 and s2 are the sample standard deviations of the samples of x1's and x2's respectively.

49
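A sketch comparing these formulas with R's cov() and cor(), using simulated data:

set.seed(7)
x1 <- rnorm(50)
x2 <- 0.6 * x1 + rnorm(50, sd = 0.8)              # positively associated with x1
n  <- length(x1)
c_hat <- sum((x1 - mean(x1)) * (x2 - mean(x2))) / (n - 1)
c_hat;  cov(x1, x2)                               # sample covariance, by formula and by cov()
c_hat / (sd(x1) * sd(x2));  cor(x1, x2)           # sample correlation, by formula and by cor()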
Chapter 1: Independence

When a random variable X1 provides no predictive information about another random variable X2, we say that X2 is independent of X1. Mathematically, X1 and X2 are independent if their joint density is

$$f(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2) \tag{4}$$

where the two functions on the right are the respective marginal densities of X1 and X2.

50
Ch1: Expected values of products of independent random variables

For independent X1 and X2, we have

$$E[X_1 X_2] = E[X_1]\,E[X_2].$$

In fact, for any functions g1(x) and g2(x), we have the more general result that

$$E[g_1(X_1)\, g_2(X_2)] = E[g_1(X_1)]\,E[g_2(X_2)]$$

when X1 and X2 are independent.

51
Chapter 1: Maximum likelihood Estimation (MLE)

In what kind of scenario can MLE be used?

• A probability model is assumed for the given data.

• The unknown parameters of that probability model are to be estimated.

• The objective is to find the parameter values which give the highest probability of observing the data under the given model.

52
Chapter 1: Example

Suppose X1 and X2 are observations from a population governed by the joint pdf

$$f(x_1, x_2) = \frac{2}{3}\left(x_1 + \theta e^{-\theta x_2}\right), \quad \text{for } x_1 \in (0, 1),\; x_2 > 0,$$

and f(x1, x2) = 0 otherwise.

• The goal is to find the value of θ which maximizes f(x1, x2) at the sampled values of x1 and x2.

53
Chapter 1: Likelihood function

• Let X denote the vector of random variables, that is, X = (X1, X2, · · · , Xn).

• Let x denote the sample (x1, x2, · · · , xn).

• Suppose f(x|θ) = f(x1, x2, · · · , xn|θ) is the joint pdf or pmf of the sample. When we know the distribution of the population, in this joint pdf or pmf:

  – the parameter θ is known;

  – the values x are unknown.

• Once we have observed the sample but do not know the parameter, L(θ; x) = f(x1, x2, · · · , xn; θ) is called the likelihood function. It is the same joint pdf or pmf, but now the unknown is the parameter.

54
Chapter 1: Likelihood function in the discrete case

$$L(\theta; x) = P_{\theta}(X = x)$$

• Suppose there are two possible values of θ, namely θ1 and θ2.

• With the observed sample, suppose

$$P_{\theta_1}(X = x) = L(\theta_1; x) > L(\theta_2; x) = P_{\theta_2}(X = x).$$

Based on the observed sample, which of θ1 and θ2 is more likely to have generated the sample? Why?

55
Chapter 1: Likelihood function in the discrete case

If the sample we observed really was randomly selected from a population with parameter θ, then θ = θ1 is more likely than θ = θ2 to have produced the sample we observed.

56
Chapter 1: Likelihood function in the continuous case

$$L(\theta; x) = f_{\theta}(x)$$

• Suppose there are two possible values of θ, namely θ1 and θ2.

• With the observed sample, suppose

$$f_{\theta_1}(x) = L(\theta_1; x) > L(\theta_2; x) = f_{\theta_2}(x).$$

Based on the observed sample, which of θ1 and θ2 is more likely to have generated the sample? Why?

57
Chapter 1: Example

Suppose X1 and X2 are observations from a population governed by the joint pdf given earlier, with observed values x1 = 0.2 and x2 = 5:

x1 <- 0.2
x2 <- 5
fn <- function(x) 2/3 * (x1 + x * exp(-x * x2))   # likelihood as a function of theta
curve(fn, main = expression(paste("f(", theta, "; x1=0.2, x2=5)")),
      xlab = expression(theta), ylab = expression(f(theta)))

58
[Figure: f(θ; x1=0.2, x2=5) plotted against θ on [0, 1]]
Chapter 1: Example

Suppose X1 and X2 are observations from the same joint pdf, now with observed values x1 = 0.8 and x2 = 1.5:

x1 <- 0.8
x2 <- 1.5
fn <- function(x) 2/3 * (x1 + x * exp(-x * x2))
curve(fn, main = expression(paste("f(", theta, "; x1=0.8, x2=1.5)")),
      xlab = expression(theta), ylab = expression(f(theta)))

59
[Figure: f(θ; x1=0.8, x2=1.5) plotted against θ on [0, 1]]
Chapter 1: Example

Suppose X1 and X2 are observations from the same joint pdf, now with observed values x1 = 0.8 and x2 = 5:

x1 <- 0.8
x2 <- 5
fn <- function(x) 2/3 * (x1 + x * exp(-x * x2))
curve(fn, main = expression(paste("f(", theta, "; x1=0.8, x2=5)")),
      xlab = expression(theta), ylab = expression(f(theta)))

60
[Figure: f(θ; x1=0.8, x2=5) plotted against θ on [0, 1]]
Chapter 1: Example

The log likelihood is

$$\ell(\theta) = \log L(\theta) = \log f(\theta; x_1, x_2) = \log(2/3) + \log\!\left(x_1 + \theta e^{-\theta x_2}\right).$$

• Differentiating this with respect to θ gives

$$\ell'(\theta) = \frac{e^{-\theta x_2}(1 - \theta x_2)}{x_1 + \theta e^{-\theta x_2}}.$$

• Setting the derivative to 0 yields a single equation in the unknown θ. After some algebraic simplification, we find that the solution is θ̂ = 1/x2. This is the maximum likelihood estimator for θ.

61
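A numerical check of this result: for the observed values x1 = 0.2 and x2 = 5 used earlier, the likelihood should peak at θ = 1/x2 = 0.2.

x1 <- 0.2; x2 <- 5
lik <- function(theta) 2/3 * (x1 + theta * exp(-theta * x2))
optimize(lik, interval = c(0, 1), maximum = TRUE)$maximum    # approximately 0.2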
Chapter 1: MLE

• When the observations are sampled independently of each other, the likelihood function can be written down as a product of individual, or marginal, pdfs. Then the log likelihood becomes a sum of the logarithms of the marginal pdfs:

$$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta).$$

• In the preceding example, the random variables were not independent of each other, so we could not write the log likelihood in this way. We had to work with the log of the joint pdf. With time series data, we usually have some form of dependence, so the approach to maximum likelihood estimation is based directly on the joint pdf of the data.

62
Chapter 1: Large sample properties of maximum likelihood estimators

• The maximum likelihood estimator of a parameter θ can be shown to be approximately normally distributed, provided that

  – the probability model satisfies fairly general conditions (these hold for most of the models considered in these notes), and

  – the sample of independent observations is large enough.

• The mean of this distribution is equal to the true parameter value.

• The variance is equal to the inverse of the negative second derivative of the log likelihood.

• The expected value of the negative second derivative is often referred to as the Fisher information.

• If there are multiple parameters, the second derivative is really a matrix of second derivatives, and the diagonal elements of its (negative) inverse are used to compute the standard errors of the corresponding parameters. (A sketch follows.)

63
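A sketch of these ideas in R, using a simple independent normal sample (an illustration, not the course example): the hessian returned by optim() for the negative log likelihood is the observed information, and the diagonal of its inverse gives the squared standard errors.

set.seed(315)
x <- rnorm(200, mean = 5, sd = 2)                  # simulated data
negloglik <- function(par) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
fit <- optim(c(mean(x), sd(x)), negloglik, hessian = TRUE)
fit$par                                            # MLEs of (mu, sigma)
sqrt(diag(solve(fit$hessian)))                     # standard errors from the inverse observed information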
