
The Complete Guide to Time Series Analysis and Forecasting
Whether we wish to predict trends in financial
markets or electricity consumption, time is an
important factor that must be considered in our
models. For example, it would be useful to forecast
at what hour of the day electricity consumption will
peak, so as to adjust the price or the production of
electricity.

Enter time series. A time series is simply a series of
data points ordered in time. In a time series, time is
often the independent variable and the goal is
usually to make a forecast for the future.

However, there are other aspects that come into
play when dealing with time series.

Is it stationary?

Is there a seasonality?

Is the target variable autocorrelated?

Autocorrelation
Informally, autocorrelation is the similarity
between observations as a function of the time lag
between them.

Example of an autocorrelation plot

Above is an example of an autocorrelation plot.
Looking closely, you realize that the first value and
the 24th value have a high autocorrelation.
Similarly, the 12th and 36th observations are highly
correlated. This means that we will find a very
similar value every 24 units of time.

Notice how the plot looks like a sinusoidal function.
This is a hint of seasonality, and you can find its
value by finding the period in the plot above, which
here gives 24h.

Seasonality
Seasonality refers to periodic fluctuations. For
example, electricity consumption is high during the
day and low at night, or online sales increase during
Christmas before slowing down again.

Example of seasonality

As you can see above, there is a clear daily
seasonality. Every day, you see a peak towards the
evening, and the lowest points are at the beginning
and the end of each day.

Remember that seasonality can also be derived from
an autocorrelation plot if it has a sinusoidal shape.
Simply look at the period: it gives the length of the
season.

Stationarity
Stationarity is an important characteristic of time
series. A time series is said to be stationary if its
statistical properties do not change over time. In
other words, it has constant mean and variance,
and its covariance depends only on the lag between
observations, not on time itself.

Example of a stationary process

Looking again at the same plot, we see that the
process above is stationary: the mean and variance
do not vary over time.

Often, stock prices are not a stationary process,
since we might see a growing trend, or their volatility
might increase over time (meaning that the variance
is changing).

Ideally, we want a stationary time series for
modelling. Of course, not all of them are stationary,
but we can apply different transformations to make
them stationary.

How to test if a process is stationary
You may have noticed that the title of the plot
above mentions Dickey-Fuller. This is the statistical
test that we run to determine if a time series is
stationary or not.

Without going into the technicalities of the Dickey-
Fuller test, it tests the null hypothesis that a unit
root is present.

If the p-value is above a chosen threshold (commonly
0.05), we cannot reject the null hypothesis, and the
process is not considered stationary.

Otherwise, the null hypothesis is rejected, and the
process is considered to be stationary.
As an example, the process below is not stationary.
Notice how the mean is not constant through time.

Example of a non-stationary process

Modelling time series
There are many ways to model a time series in order
to make predictions. Here, I will present:

 moving average

 exponential smoothing

 ARIMA

Moving average
The moving average model is probably the most
naive approach to time series modelling. This model
simply states that the next observation is the mean
of all past observations.

Although simple, this model might be surprisingly
good, and it represents a good starting point.

Otherwise, the moving average can be used to
identify interesting trends in the data. We can define
a window and apply the moving average model
to smooth the time series and highlight different
trends.

Example of a moving average on a 24h window

In the plot above, we applied the moving average
model with a 24h window. The green
line smoothed the time series, and we can see that
there are two peaks in a 24h period.

Of course, the longer the window, the smoother the
trend will be. Below is an example of a moving
average on a smaller window.

Example of a moving average on a 12h window
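Both ideas above can be sketched in a few lines of pandas, assuming a short hypothetical series:

```python
import pandas as pd

series = pd.Series([3.0, 5.0, 4.0, 6.0, 7.0])

# Naive forecast: the next observation is the mean of all past observations.
next_forecast = series.mean()

# Windowed moving average for smoothing: the wider the window, the smoother.
smoothed = series.rolling(window=3).mean()

print(next_forecast)   # 5.0
print(smoothed.tolist())
```

The first entries of the rolling mean are NaN, since a 3-point window is not yet full there.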

Exponential smoothing
Exponential smoothing uses a similar logic to the
moving average, but this time, a different,
decreasing weight is assigned to each observation.
In other words, less importance is given to
observations as we move further from the present.

Mathematically, exponential smoothing is expressed
as:

ŷ(t) = α·x(t) + (1 − α)·ŷ(t−1)      (exponential smoothing expression)

Here, alpha is a smoothing factor that takes
values between 0 and 1. It determines how fast the
weight decreases for previous observations.

Example of exponential smoothing

From the plot above, the dark blue line represents
the exponential smoothing of the time series using a
smoothing factor of 0.3, while the orange line uses a
smoothing factor of 0.05.

As you can see, the smaller the smoothing factor,
the smoother the time series will be. This makes
sense, because as the smoothing factor approaches
0, we approach the moving average model.
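The recursion above can be written out directly. A minimal sketch, assuming the series is a plain list of floats:

```python
def exponential_smoothing(series, alpha):
    """result[t] = alpha * series[t] + (1 - alpha) * result[t-1]."""
    result = [series[0]]  # the first value is carried over unchanged
    for x in series[1:]:
        result.append(alpha * x + (1 - alpha) * result[-1])
    return result

data = [3.0, 5.0, 4.0, 6.0]  # hypothetical observations
smoothed = exponential_smoothing(data, alpha=0.3)
print(smoothed)
```

With a small alpha like 0.3, each new observation only nudges the smoothed value, which is why the output trails the raw data.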
Double exponential smoothing
Double exponential smoothing is used when there is
a trend in the time series. In that case, we apply
exponential smoothing recursively, twice.

Mathematically:

s(t) = α·x(t) + (1 − α)·(s(t−1) + b(t−1))
b(t) = β·(s(t) − s(t−1)) + (1 − β)·b(t−1)
ŷ(t+1) = s(t) + b(t)      (double exponential smoothing expression)

where s(t) is the smoothed level and b(t) is the
trend estimate.

Here, beta is the trend smoothing factor, and it
takes values between 0 and 1.

Below, you can see how different values
of alpha and beta affect the shape of the time
series.
Example of double exponential smoothing
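The level/trend recursion can be sketched as follows (Holt's method), again assuming a plain list of hypothetical observations:

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's method: smooth the level and the trend separately."""
    level, trend = series[0], series[1] - series[0]
    result = [series[0]]
    for x in series[1:]:
        last_level = level
        level = alpha * x + (1 - alpha) * (level + trend)  # level update
        trend = beta * (level - last_level) + (1 - beta) * trend  # trend update
        result.append(level + trend)  # one-step-ahead forecast
    return result

data = [3.0, 5.0, 4.0]  # hypothetical observations
smoothed = double_exponential_smoothing(data, alpha=0.9, beta=0.9)
print(smoothed)
```

Note that the initial trend is seeded from the first two observations; other initializations are possible.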

Triple exponential smoothing
This method extends double exponential smoothing
by adding a seasonal smoothing factor. Of
course, this is useful if you notice seasonality in your
time series.

Mathematically, triple exponential smoothing is


expressed as:
Triple exponential smoothing expression

Where gamma is the seasonal smoothing factor
and L is the length of the season.

Seasonal autoregressive integrated moving average model (SARIMA)
SARIMA is actually the combination of simpler
models into a complex model that can handle
time series exhibiting non-stationary properties and
seasonality.

First, we have the autoregression model
AR(p). This is basically a regression of the time
series onto itself. Here, we assume that the current
value depends on its previous values with some lag.
It takes a parameter p which represents the
maximum lag. To find it, we look at the partial
autocorrelation plot and identify the lag after which
most lags are not significant.

In the example below, p would be 4.

Example of a partial autocorrelation plot

Then, we add the moving average model MA(q).
This takes a parameter q which represents the
biggest lag after which other lags are not significant
on the autocorrelation plot.

Below, q would be 4.

Example of an autocorrelation plot

After, we add the order of integration I(d). The
parameter d represents the number of differences
required to make the series stationary.

Finally, we add the final component: seasonality
S(P, D, Q, s), where s is simply the season’s length.
Furthermore, this component requires the
parameters P and Q, which are the same as p and q
but for the seasonal component. Finally, D is the
order of seasonal integration, representing the
number of differences required to remove
seasonality from the series.

Combining it all, we get the SARIMA(p, d, q)(P, D,
Q, s) model.

The main takeaway is: before modelling with
SARIMA, we must apply transformations to our time
series to remove seasonality and any non-stationary
behaviours.

That was a lot of theory to wrap our heads around!
Let’s apply the techniques discussed above in our
first project.

We will try to predict the stock price of a specific
company. Now, predicting the stock price is virtually
impossible. However, it remains a fun exercise and it
will be a good way to practice what we have learned.
Project 1 — Predicting stock price
We will use the historical stock price of the New
Germany Fund (GF) to try to predict the closing price
in the next five trading days.

Import the data

First, we import some libraries that will be helpful
throughout our analysis. We also define the mean
absolute percentage error (MAPE), as this will be
our error metric.
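The metric itself is a one-liner. A sketch, assuming arrays of actual and predicted values with no zeros among the actuals:

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes no zeros in y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(mean_absolute_percentage_error([100, 200], [110, 190]))
```

With actuals [100, 200] and predictions [110, 190], the individual errors are 10% and 5%, giving a MAPE of 7.5%.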

Then, we import our dataset and preview the
first ten entries. You should get:
First 10 entries of the dataset

As you can see, we have a few entries concerning a
different stock than the New Germany Fund (GF).
Also, we have an entry concerning intraday
information, but we only want end-of-day (EOD)
information.

Clean the data

First, we remove unwanted entries.

Then, we remove unwanted columns, as we solely
want to focus on the stock’s closing price.
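The two cleaning steps can be sketched with pandas. Note that the column names below (SYMBOL, TYPE, CLOSE) are hypothetical stand-ins; the real dataset's columns may differ:

```python
import pandas as pd

# Hypothetical frame standing in for the raw dataset.
df = pd.DataFrame({
    "SYMBOL": ["GF", "GF", "XYZ", "GF"],
    "TYPE": ["EOD", "EOD", "EOD", "Intraday"],
    "CLOSE": [13.1, 13.4, 50.0, 13.2],
})

# Keep only New Germany Fund end-of-day rows...
df = df[(df["SYMBOL"] == "GF") & (df["TYPE"] == "EOD")]
# ...and only the closing price column.
df = df[["CLOSE"]]
print(df)
```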

If you preview the dataset, you should see:

Clean dataset

Awesome! We are ready for exploratory data
analysis!

Exploratory Data Analysis (EDA)

We plot the closing price over the entire time period
of our dataset.

You should get:

Closing price of the New Germany Fund (GF)

Clearly, you see that this is not
a stationary process, and it is hard to tell if there is
some kind of seasonality.

Moving average
Let’s use the moving average model to smooth our
time series. For that, we will use a helper function
that runs the moving average model on a
specified time window and plots the resulting
smoothed curve:
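A sketch of such a helper, assuming a pandas Series and matplotlib (`plot_moving_average` is an illustrative name, not necessarily the article's exact function):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

def plot_moving_average(series, window):
    """Plot the series with its rolling mean over `window` points."""
    rolling_mean = series.rolling(window=window).mean()
    plt.figure(figsize=(12, 6))
    plt.plot(series, label="Actual")
    plt.plot(rolling_mean, "g", label=f"Rolling mean ({window} days)")
    plt.title(f"Moving average, window = {window}")
    plt.legend()
    return rolling_mean

# Hypothetical closing prices, just to exercise the helper.
series = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0, 12.5, 14.0])
smoothed = plot_moving_average(series, window=5)
```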

Using a time window of 5 days, we get:

Smoothed curve by the previous trading week

As you can see, we can hardly see a trend, because
the smoothed curve is too close to the actual curve.
Let’s see the result of smoothing by the previous
month, and the previous quarter.
Smoothed by the previous month (30 days)

Smoothed by the previous quarter (90 days)

Trends are easier to spot now. Notice how the 30-
day and 90-day trends show a downward curve at the
end. This might mean that the stock is likely to go
down in the following days.

Exponential smoothing
Now, let’s use exponential smoothing to see if it
can pick up a better trend.

Here, we use 0.05 and 0.3 as values for
the smoothing factor. Feel free to try other values
and see what the result is.
Exponential smoothing

As you can see, an alpha value of 0.05 smoothed the
curve while picking up most of the upward and
downward trends.

Now, let’s use double exponential smoothing.

Double exponential smoothing

And you get:

Double exponential smoothing

Again, experiment with
different alpha and beta combinations to get
better-looking curves.

Modelling
As outlined previously, we must turn our series into
a stationary process in order to model it. Therefore,
let’s apply the Dickey-Fuller test to see if it is a
stationary process:

You should see:

By the Dickey-Fuller test, the time series is
unsurprisingly non-stationary. Also, looking at the
autocorrelation plot, we see that the autocorrelation
is very high, and it seems that there is no clear
seasonality.

Therefore, to get rid of the high autocorrelation and
to make the process stationary, let’s take the first
difference (line 23 in the code block). We simply
subtract the time series from itself with a lag of one
day, and we get:
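The first-difference step can be sketched with pandas (the closing prices below are hypothetical; the real series comes from the dataset):

```python
import pandas as pd

# Hypothetical closing prices.
close = pd.Series([13.0, 13.5, 13.2, 13.8])

# Subtract the series from itself with a lag of one day.
first_diff = close - close.shift(1)   # equivalently: close.diff()
print(first_diff.tolist())
```

The first entry is NaN, since there is no previous day to subtract; it is usually dropped before modelling.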
Awesome! Our series is now stationary and we can
start modelling!

SARIMA

Now, for SARIMA, we first fix a few parameters
and define ranges of values for the others, to
generate a list of all possible combinations of p, q, d,
P, Q, D, s.

Now, in the code cell above, we have 625 different
combinations! We will try each combination and train
SARIMA with each, so as to find the best performing
model. This might take a while depending on your
computer’s processing power.

Once this is done, we print out a summary of the
best model, and you should see:

Awesome! We finally predict the closing price of the
next five trading days and evaluate the MAPE of the
model.

In this case, we have a MAPE of 0.79%, which is
very good!
Compare the predicted price to actual data

Now, to compare our prediction with actual data, we
take financial data from Yahoo Finance and create a
dataframe.

Then, we make a plot to see how far we were from
the actual closing prices:

Comparison of predicted and actual closing prices

It seems that we are a bit off in our predictions. In
fact, the predicted price is essentially flat, meaning
that our model is probably not performing well.
Again, this is not due to our procedure, but to the
fact that predicting stock prices is essentially
impossible.

From the first project, we learned the entire
procedure of making a time series stationary before
using SARIMA to model it. It is a long and tedious
process, with a lot of manual tweaking.

Now, let’s introduce Facebook’s Prophet. It is a
forecasting tool available in both Python and R. This
tool allows both experts and non-experts to produce
high-quality forecasts with minimal effort.

Let’s see how we can use it in this second project!

Project 2 — Predict air quality with Prophet
The title says it all: we will use Prophet to help us
predict air quality!

The full notebook and dataset can be found here.


Let’s make some predictions!

Import the data

As always, we start by importing some useful
libraries. We then print out the first five rows:

First five entries of the dataset

As you can see, the dataset contains information
about the concentrations of different gases,
recorded every hour of each day. You can
find a description of all features here.

If you explore the dataset a bit more, you will notice
that there are many instances of the value -200. Of
course, it does not make sense to have a negative
concentration, so we will need to clean the data
before modelling.

Data cleaning and feature engineering

Here, we start off by parsing the date column to
turn its strings into actual dates.

Then, we turn all the measurements into floats.

After, we aggregate the data by day, by taking the
average of each measurement.

At this point, we still have some NaN values that we
need to get rid of. Therefore, we remove the columns
that have more than 8 NaNs. That way, we can then
remove rows containing NaN values without losing
too much data.

Finally, we aggregate the data by week, because it
will give a smoother trend to analyze.
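The pipeline above can be sketched with pandas. The tiny frame below is a hypothetical stand-in for the real air-quality dataset (which has one row per hour and several gas columns):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data with -200 marking missing readings.
dates = pd.date_range("2004-03-10", periods=48, freq="h")
df = pd.DataFrame({"Date": dates,
                   "NOx(GT)": np.where(np.arange(48) % 7 == 0, -200.0, 150.0)})

df["Date"] = pd.to_datetime(df["Date"])       # parse the date column
df["NOx(GT)"] = df["NOx(GT)"].astype(float)   # measurements as floats
df = df.replace(-200.0, np.nan)               # -200 marks missing readings

daily = df.set_index("Date").resample("D").mean()  # aggregate by day
# (on the full dataset: drop columns with more than 8 NaNs, then NaN rows)
daily = daily.dropna()
weekly = daily.resample("W").mean()                # aggregate by week
print(weekly)
```

Replacing -200 with NaN before averaging matters: pandas' `mean()` skips NaN by default, so the sentinel values no longer drag the daily averages down.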

We can plot the trends of each chemical. Here, we
show that of NOx.
NOx concentration

Oxides of nitrogen are very harmful, as they react to
form smog and acid rain, and are responsible for
the formation of fine particles and ground-level
ozone. These have adverse health effects, so the
concentration of NOx is a key feature of air quality.

Modelling

We will solely focus on modelling the NOx
concentration. Therefore, we remove all other
irrelevant columns.
Then, we import Prophet.

Prophet requires the date column to be
named ds and the feature column to be named y, so
we make the appropriate changes.

At this point, our data looks like this:

Then, we define a training set. For that, we will hold
out the last 30 entries for prediction and validation.

Afterwards, we simply initialize Prophet, fit the
model to the data, and make predictions!
You should see the following:

Here, yhat represents the prediction,
while yhat_lower and yhat_upper represent the
lower and upper bounds of the prediction
respectively.

Prophet allows you to easily plot the forecast, and
we get:
NOx concentration forecast

As you can see, Prophet simply used a straight
downward line to predict the concentration of NOx in
the future.

Then, we check if the time series has any interesting
features, such as seasonality:
Here, Prophet only identified a downward trend with
no seasonality.

Evaluating the model’s performance by calculating
its mean absolute percentage error (MAPE) and
mean absolute error (MAE), we see that the MAPE is
13.86% and the MAE is 109.32, which is not that
bad! Remember that we did not fine-tune the model
at all.

Finally, we just plot the forecast with its upper and
lower bounds:
Forecast of the average weekly NOx concentration
