A Complete Tutorial on Time Series Modeling in R

TAVISH SRIVASTAVA, DECEMBER 16, 2015

Overview

Time Series Analysis and Time Series Modeling are powerful forecasting tools
Prior knowledge of the statistical theory behind time series is useful before attempting time series modeling
ARMA and ARIMA are important models for performing Time Series Analysis

Introduction

‘Time’ is the most important factor which ensures success in a business. It’s difficult to keep up with the pace of time. But technology has developed some powerful methods with which we can ‘see things’ ahead of time. Don’t worry, I am not talking about a time machine. Let’s be realistic here!

I’m talking about the methods of prediction & forecasting. One such method, which deals with time-based data, is Time Series Modeling. As the name suggests, it involves working on time-based data (years, days, hours, minutes) to derive hidden insights and support informed decision making.

Time series models are very useful when you have serially correlated data. Most business houses work on time series data to analyze next year’s sales numbers, website traffic, competitive position and much more. However, it is also one of the areas which many analysts do not understand.

So, if you aren’t sure about the complete process of time series modeling, this guide will introduce you to the various levels of time series modeling and its related techniques.

Table of Contents
1. Basics – Time Series Modeling
2. Exploration of Time Series Data in R
3. Introduction to ARMA Time Series Modeling
4. Framework and Application of ARIMA Time Series Modeling

Time to get started!

1. Basics – Time Series Modeling

Let’s begin with the basics: stationary series, random walks, the Rho coefficient and the Dickey-Fuller test of stationarity. If these terms are already scaring you, don’t worry – they will become clear in a bit and I bet you will start enjoying the subject as I explain it.

Stationary Series

There are three basic criteria for a series to be classified as a stationary series:

1. The mean of the series should not be a function of time; rather, it should be a constant. The image below has the left hand graph satisfying the condition whereas the graph in red has a time dependent mean.

2. The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph depicts what is and what is not a stationary series. (Notice the varying spread of distribution in the right hand graph.)

3. The covariance of the i-th term and the (i + m)-th term should not be a function of time; it may depend only on the lag m. In the following graph, you will notice the spread becomes closer as the time increases. Hence, the covariance is not constant with time for the ‘red series’.

Why do I care about ‘stationarity’ of a time series?

The reason I took up this section first is that unless your time series is stationary, you cannot build a time series model of the kind covered here. In cases where the stationarity criteria are violated, the first requisite is to stationarize the time series and then try stochastic models to predict it. There are multiple ways of bringing about this stationarity, such as detrending and differencing.

Random Walk

This is the most basic concept of time series. You might know the concept well. But I have found many people in the industry who interpret a random walk as a stationary process. In this section, with the help of some mathematics, I will make this concept crystal clear forever. Let’s take an example.

Example: Imagine a girl moving randomly on a giant chess board. In this case, the next position of the girl is only dependent on the last position.

(Source: http://scifun.chem.wisc.edu/WOP/RandomWalk.html)

Now imagine you are sitting in another room and are not able to see the girl. You want to predict the position of the girl with time. How accurate will you be? Of course, you will become more and more inaccurate as the position of the girl changes. At t=0 you know exactly where the girl is. At the next step, she can move to any of 8 squares, and hence your probability of guessing right dips to 1/8 instead of 1, and it keeps on going down. Now let’s try to formulate this series:

X(t) = X(t-1) + Er(t)

where Er(t) is the error at time point t. This is the randomness the girl brings at every point in time.

Now, if we recursively substitute all the Xs, we finally end up with the following equation:

X(t) = X(0) + Sum(Er(1),Er(2),Er(3).....Er(t))

Now, let’s try validating our assumptions of a stationary series on this random walk formulation:

1. Is the mean constant?

E[X(t)] = E[X(0)] + Sum(E[Er(1)],E[Er(2)],E[Er(3)].....E[Er(t)])

We know that the expectation of any error is zero, as it is random.

Hence we get E[X(t)] = E[X(0)] = Constant.

2. Is the variance constant?

Var[X(t)] = Var[X(0)] + Sum(Var[Er(1)],Var[Er(2)],Var[Er(3)].....Var[Er(t)])

Since the starting point X(0) is known, Var[X(0)] = 0, and since the errors are independent with the same variance:

Var[X(t)] = t * Var(Error) = Time dependent.

Hence, we infer that the random walk is not a stationary process, as it has a time-variant variance. Also, if we check the covariance, we see that it too is dependent on time.
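If you want to check this result numerically, here is a minimal sketch in R. It assumes standard normal errors and X(0) = 0; the seed, the number of walks and the walk length are arbitrary values chosen only for illustration.

```r
set.seed(42)      # arbitrary seed, only for reproducibility
n_walks <- 2000   # number of independent random walks (illustrative choice)
n_steps <- 100    # length of each walk (illustrative choice)

# Each column is one walk: X(t) = X(0) + Er(1) + ... + Er(t), with X(0) = 0
errors <- matrix(rnorm(n_walks * n_steps), nrow = n_steps, ncol = n_walks)
walks  <- apply(errors, 2, cumsum)

# Mean and variance across the walks at each time point t
mean_t <- rowMeans(walks)
var_t  <- apply(walks, 1, var)

# The mean stays near zero, while the variance grows roughly like t * Var(Error)
print(round(mean_t[c(1, 25, 50, 100)], 2))
print(round(var_t[c(1, 25, 50, 100)], 1))
```

The variance row should grow roughly in proportion to t, which is exactly the time dependence that breaks stationarity.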

Let’s spice things up a bit.

We already know that a random walk is a non-stationary process. Let us introduce a new coefficient in the equation to see if we can make the formulation stationary.

Introduced coefficient : Rho

X(t) = Rho * X(t-1) + Er(t)

Now, we will vary the value of Rho to see if we can make the series stationary. Here we will interpret the scatter visually and not do any test to check stationarity.

Let’s start with a perfectly stationary series with Rho = 0. Here is the plot for the time series:

Increasing the value of Rho to 0.5 gives us the following graph:

You might notice that our cycles have become broader, but essentially there does not seem to be a serious violation of the stationarity assumptions. Let’s now take a more extreme case of Rho = 0.9.

We still see that X returns from extreme values to zero after some intervals. This series also does not violate stationarity significantly. Now, let’s take a look at the random walk with Rho = 1.

This obviously is a violation of the stationarity conditions. What makes Rho = 1 a special case which comes out badly in the stationarity test? We will find the mathematical reason for this.

Let’s take the expectation on each side of the equation “X(t) = Rho * X(t-1) + Er(t)”

E[X(t)] = Rho * E[X(t-1)]

This equation is very insightful. The next X (at time point t) is being pulled down to Rho * the last value of X.

For instance, if X(t – 1) = 1, E[X(t)] = 0.5 (for Rho = 0.5). Now, if X moves in any direction away from zero, it is pulled back towards zero in the next step. The only component which can drive it even further is the error term, which is equally likely to go in either direction. What happens when Rho becomes 1? No force can pull X down in the next step.
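The four cases discussed above (Rho = 0, 0.5, 0.9 and 1) can be redrawn with a short simulation. This is only a sketch: the helper name simulate_rho, the series length of 500 and the standard normal errors are my own assumptions, not something fixed by the discussion above.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# Hypothetical helper: simulate X(t) = Rho * X(t-1) + Er(t) with N(0, 1) errors
simulate_rho <- function(rho, n = 500) {
  er <- rnorm(n)
  x <- numeric(n)
  x[1] <- er[1]
  for (t in 2:n) {
    x[t] <- rho * x[t - 1] + er[t]
  }
  x
}

rhos   <- c(0, 0.5, 0.9, 1)
series <- lapply(rhos, simulate_rho)

# One panel per value of Rho
op <- par(mfrow = c(2, 2))
for (i in seq_along(rhos)) {
  plot(series[[i]], type = "l", main = paste("Rho =", rhos[i]),
       xlab = "t", ylab = "X(t)")
}
par(op)
```

With Rho below 1 the series keeps returning to zero; with Rho = 1 it typically wanders away and does not come back in any regular fashion.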

Dickey Fuller Test of Stationarity

What you just learnt in the last section is formally known as the Dickey-Fuller test. Here is a small tweak which is made to our equation to convert it to a Dickey-Fuller test:

X(t) = Rho * X(t-1) + Er(t)

=> X(t) - X(t-1) = (Rho - 1) X(t - 1) + Er(t)

We have to test whether Rho – 1 is significantly different from zero. The null hypothesis is that Rho – 1 = 0, i.e. the series is a random walk. If the null hypothesis gets rejected, we have a stationary time series.
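To make the regression behind the test concrete, here is a rough sketch that estimates (Rho – 1) by regressing the differences on the lagged values. The helper name df_coefficient and the simulated series are assumptions for illustration. Note that this only produces the coefficient; a proper test compares its t-statistic against Dickey-Fuller critical values rather than the usual t-table, which is what ready-made test functions take care of.

```r
set.seed(7)   # arbitrary seed, only for reproducibility
n  <- 300
er <- rnorm(n)

random_walk <- cumsum(er)                                            # Rho = 1
ar_half <- as.numeric(stats::filter(er, 0.5, method = "recursive"))  # Rho = 0.5

# Hypothetical helper: estimate (Rho - 1) in
#   X(t) - X(t-1) = (Rho - 1) * X(t-1) + Er(t)
df_coefficient <- function(x) {
  dx   <- diff(x)             # X(t) - X(t-1)
  xlag <- x[-length(x)]       # X(t-1)
  fit  <- lm(dx ~ 0 + xlag)   # no intercept, matching the equation above
  unname(coef(fit)[1])
}

df_coefficient(random_walk)   # should be close to 0
df_coefficient(ar_half)       # should be close to -0.5
```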

Stationarity testing and converting a series into a stationary series are the most critical processes in time series modelling. You need to be comfortable with every detail of this concept before moving on to the next step of time series modelling.

Let’s now consider an example to show you what a time series looks like.

2. Exploration of Time Series Data in R

Here we’ll learn to handle time series data in R. Our scope will be restricted to exploring a time series data set; we will not go into building time series models.

I have used an inbuilt data set of R called AirPassengers. The dataset consists of monthly totals of international airline passengers from 1949 to 1960.

Loading the Data Set

Following is the code which will help you load the data set and print a few top-level metrics.

> data(AirPassengers)
> class(AirPassengers)

[1] "ts"

#This tells you that the data series is in a time series format

> start(AirPassengers)

[1] 1949 1

#This is the start of the time series

> end(AirPassengers)

[1] 1960 12

#This is the end of the time series

> frequency(AirPassengers)

[1] 12

#The cycle of this time series is 12 months in a year

> summary(AirPassengers)
Min. 1st Qu. Median Mean 3rd Qu. Max.

104.0 180.0 265.5 280.3 360.5 622.0

Detailed Metrics

#The number of passengers are distributed across the spectrum

> plot(AirPassengers)

#This will plot the time series

> abline(reg=lm(AirPassengers~time(AirPassengers)))

#This will fit a regression line through the series

Here are a few more operations you can do:

> cycle(AirPassengers)

#This will print the cycle across years.

> plot(aggregate(AirPassengers,FUN=mean))

#This will aggregate the cycles and display a year on year trend

> boxplot(AirPassengers~cycle(AirPassengers))

#Box plot across months will give us a sense on seasonal effect

Important Inferences

1. The year-on-year trend clearly shows that the number of passengers has been increasing without fail.
2. The variance and the mean value in July and August are much higher than in the rest of the months.
3. Even though the mean value of each month is quite different, their variance is small. Hence, we have a strong seasonal effect with a cycle of 12 months or less.
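If you would like to put numbers behind the box plot, the monthly means and variances can be tabulated directly. This is a small sketch using base R only; the variable names are illustrative.

```r
data(AirPassengers)

# Month index (1 = January, ..., 12 = December) for every observation
month <- as.integer(cycle(AirPassengers))

# Mean and variance of the passenger counts for each calendar month
monthly_mean <- tapply(as.numeric(AirPassengers), month, mean)
monthly_var  <- tapply(as.numeric(AirPassengers), month, var)

round(rbind(mean = monthly_mean, variance = monthly_var), 1)
```

Reading across the columns shows how the summer months compare with the rest of the year.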

Exploring data becomes most important in a time series model – without this exploration, you will not know whether a series is stationary or not. As in this case, we already know many details about the kind of model we are looking for.

Let’s now take up a few time series models and their characteristics. We will also take this problem forward and make a few predictions.

3. Introduction to ARMA Time Series Modeling

ARMA models are commonly used in time series modeling. In an ARMA model, AR stands for auto-regression and MA stands for moving average. If these words sound intimidating to you, worry not – I’ll simplify these concepts in the next few minutes for you!

We will now develop a knack for these terms and understand the characteristics associated with these models. But before we start, you should remember that AR and MA models are not applicable to non-stationary series.

In case you get a non-stationary series, you first need to stationarize the series (by taking differences / transformations) and then choose from the available time series models.

First, I’ll explain each of these two models (AR & MA) individually. Next, we will look at the characteristics of these models.

Auto-Regressive Time Series Model

Let’s understand AR models using the case below:

The current GDP of a country, say x(t), is dependent on last year’s GDP, i.e. x(t – 1). The hypothesis is that the total value of products & services produced in a country in a fiscal year (known as GDP) is dependent on the manufacturing plants / services set up in the previous year and the newly set up industries / plants / services in the current year. But the primary component of the GDP is the former one.

Hence, we can formally write the equation of GDP as:

x(t) = alpha * x(t – 1) + error (t)

This equation is known as the AR(1) formulation. The numeral one (1) denotes that the next instance depends directly only on the previous instance. The alpha is a coefficient which we seek so as to minimize the error function. Notice that x(t-1) is in turn linked to x(t-2) in the same fashion. Hence, any shock to x(t) will gradually fade off in the future.

For instance, let’s say x(t) is the number of juice bottles sold in a city on a particular day. During winters, very few vendors purchased juice bottles. Suddenly, on a particular day, the temperature rose and the demand for juice bottles soared to 1000. However, after a few days, the climate became cold again. But, since people had got used to drinking juice during the hot days, 50% of them were still drinking juice during the cold days. In the following days, the proportion went down to 25% (50% of 50%) and then gradually to a small number after a significant number of days. The following graph explains the inertia property of an AR series:
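The juice example can be written out in a couple of lines. This sketch assumes alpha = 0.5 and a one-off shock of 1000 bottles, as in the story above, and ignores any further errors after the shock.

```r
alpha <- 0.5    # AR(1) coefficient from the juice example
shock <- 1000   # one-off jump in demand on day 0
days  <- 0:6

# With no further errors, x(t) = alpha * x(t-1), so the shock decays as alpha^days
effect <- shock * alpha^days
data.frame(day = days, effect = effect)
```

The effect halves every day (1000, 500, 250, ...) and never drops to exactly zero, which is the inertia of an AR series.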

Moving Average Time Series Model

Let’s take another case to understand the moving average time series model.

A manufacturer produces a certain type of bag, which was readily available in the market. The market being competitive, the sale of the bag stood at zero for many days. So, one day he experimented with the design and produced a different type of bag. This type of bag was not available anywhere in the market. Thus, he was able to sell the entire stock of 1000 bags (let’s call this x(t)). The demand got so high that the bag ran out of stock. As a result, some 100-odd customers couldn’t purchase this bag. Let’s call this gap the error at that time point. With time, the bag lost its woo factor. But still a few customers were left who had gone empty-handed the previous day. Following is a simple formulation to depict the scenario:

x(t) = beta * error(t-1) + error (t)

If we try plotting this graph, it will look something like this:

Did you notice the difference between the MA and AR models? In the MA model, noise / shock quickly vanishes with time. In the AR model, the effect of the shock lasts much longer.
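To see how quickly a shock disappears in an MA(1) series, here is a tiny sketch of the formulation above. The value beta = 0.5 and the single shock of 100 are assumed purely for illustration.

```r
beta  <- 0.5                     # assumed MA(1) coefficient
error <- c(100, 0, 0, 0, 0, 0)   # a single shock at the first time point

# x(t) = beta * error(t-1) + error(t); the error before the first point is taken as 0
lagged_error <- c(0, head(error, -1))
x <- beta * lagged_error + error
x
```

The shock shows up at the first point, leaves a trace of beta * 100 at the second point, and is gone completely from the third point onwards.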

Difference between AR and MA models

The primary difference between an AR and an MA model is based on the correlation between time series objects at different time points. In an MA model, the correlation between x(t) and x(t-n) for n > order of MA is always zero. This directly flows from the fact that the covariance between x(t) and x(t-n) is zero for MA models (something which we can infer from the example taken in the previous section). In the AR model, however, the correlation of x(t) and x(t-n) gradually declines as n becomes larger. This difference can be exploited to tell whether we have an AR model or an MA model. The correlation plot can give us the order of the MA model.
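This difference can be checked against the theoretical correlations, which base R computes with ARMAacf() from the stats package. The coefficients of 0.5 below are assumed values for illustration.

```r
# Theoretical autocorrelations at lags 0 to 5
ar_acf <- ARMAacf(ar = 0.5, lag.max = 5)   # AR(1) with alpha = 0.5
ma_acf <- ARMAacf(ma = 0.5, lag.max = 5)   # MA(1) with beta = 0.5

round(rbind(AR1 = ar_acf, MA1 = ma_acf), 3)
```

The AR(1) row decays geometrically and never reaches exactly zero, while the MA(1) row is zero for every lag beyond 1.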

Exploiting ACF and PACF plots

Once we have got the stationary time series, we must answer two primary questions:

Q1. Is it an AR or MA process?

Q2. What order of AR or MA process do we need to use?

The trick to solve these questions is available in the previous section. Didn’t you notice?

The first question can be answered using the Total Correlation Chart (also known as the Auto-correlation Function / ACF). The ACF is a plot of the total correlation between different lag functions. For instance, in the GDP problem, the GDP at time point t is x(t). We are interested in the correlation of x(t) with x(t-1), x(t-2) and so on. Now let’s reflect on what we have learnt above.

In a moving average series of lag n, we will not get any correlation between x(t) and x(t – n – 1). Hence, the total correlation chart cuts off at the nth lag. So it becomes simple to find the lag for an MA series. For an AR series, this correlation will gradually go down without any cut-off value. So what do we do if it is an AR series?

Here is the second trick. If we find the partial correlation of each lag, it will cut off after the degree of the AR series. For instance, if we have an AR(1) series and we exclude the effect of the 1st lag (x(t-1)), our 2nd lag (x(t-2)) is independent of x(t). Hence, the partial correlation function (PACF) will drop sharply after the 1st lag. Following are the examples which will clarify any doubts you have on this concept:

ACF PACF

The blue line above marks the values that are significantly different from zero. Clearly, the graph above has a cut off on the PACF curve after the 2nd lag, which means this is most likely an AR(2) process.

ACF PACF

Clearly, the graph above has a cut off on the ACF curve after the 2nd lag, which means this is most likely an MA(2) process.
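Plots of this kind can be regenerated from simulated data. The sketch below uses arima.sim(), acf(), pacf() and ARMAacf() from base R; the coefficients (0.5, 0.3), the series length and the seed are assumed values, so the resulting pictures will not match the original figures exactly.

```r
set.seed(10)   # arbitrary seed, only for reproducibility

# Simulated AR(2) and MA(2) series with assumed coefficients
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
ma2 <- arima.sim(model = list(ma = c(0.5, 0.3)), n = 500)

op <- par(mfrow = c(2, 2))
acf(ar2, main = "AR(2): ACF")
pacf(ar2, main = "AR(2): PACF")   # expected to cut off after lag 2
acf(ma2, main = "MA(2): ACF")     # expected to cut off after lag 2
pacf(ma2, main = "MA(2): PACF")
par(op)

# Theoretical counterparts: PACF of the AR(2) and ACF of the MA(2)
ar2_pacf <- ARMAacf(ar = c(0.5, 0.3), lag.max = 6, pacf = TRUE)
ma2_acf  <- ARMAacf(ma = c(0.5, 0.3), lag.max = 6)
round(ar2_pacf, 3)
round(ma2_acf, 3)
```

Keep in mind that the ACF is always 1 at lag 0, so counting starts from lag 1 when reading off the order.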

Till now, we have covered how to identify the type of stationary series using ACF & PACF plots. Now, I’ll introduce you to a comprehensive framework to build a time series model. In addition, we’ll also discuss the practical applications of time series modelling.

4. Framework and Application of ARIMA Time Series Modeling

A quick revision: so far we’ve learnt the basics of time series modeling, time series in R and ARMA modeling. Now is the time to join these pieces and make an interesting story.

Overview of the Framework

This framework (shown below) specifies the step by step approach on ‘How to do a Time Series Analysis’:

As you would be aware, the first three steps have already been discussed above. Nevertheless, they are delineated briefly below:

Step 1: Visualize the Time Series

It is essential to analyze the trends prior to building any kind of time series model. The details we are interested in pertain to any kind of trend, seasonality or random behaviour in the series. We have covered this in the second part of this article.

Step 2: Stationarize the Series

Once we know the patterns, trends, cycles and seasonality, we can check if the series is stationary or not. Dickey-Fuller is one of the popular tests to check this. We have covered this test in the first part of this article. It doesn’t end here! What if the series is found to be non-stationary?

There are three commonly used techniques to make a time series stationary:

1. Detrending : Here, we simply remove the trend component from the time series. For instance, suppose the equation of my time series is:

x(t) = (mean + trend * t) + error

We’ll simply remove the part in the parentheses and build a model for the rest.

2. Differencing : This is a commonly used technique to remove non-stationarity. Here we try to model the differences of the terms and not the actual terms. For instance,

x(t) – x(t-1) = ARMA (p , q)

This differencing is called the Integration part in AR(I)MA. Now, we have three parameters:

p : AR

d : I

q : MA

3. Seasonality : Seasonality can easily be incorporated in the ARIMA model directly. More on this is discussed in the applications part below.
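The first two techniques can be sketched on a simulated series of the form x(t) = (mean + trend * t) + error. The mean of 10, the trend of 0.5 and the standard normal errors are assumed values for illustration.

```r
set.seed(3)   # arbitrary seed, only for reproducibility
time_index <- 1:200
x <- (10 + 0.5 * time_index) + rnorm(200)   # x(t) = (mean + trend * t) + error

# 1. Detrending: fit the linear trend and keep what is left over
detrended <- residuals(lm(x ~ time_index))

# 2. Differencing: model x(t) - x(t-1) instead of x(t)
differenced <- diff(x)

c(slope_before        = unname(coef(lm(x ~ time_index))[2]),
  slope_after_detrend = unname(coef(lm(detrended ~ time_index))[2]),
  mean_of_differences = mean(differenced))
```

After detrending, the fitted slope is essentially zero. After differencing, the trend collapses into a constant (about 0.5 here), so the differenced series no longer drifts with time.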

Step 3: Find Optimal Parameters

The parameters p,d,q can be found using ACF and PACF plots. In addition, if both the ACF and the PACF decrease gradually, it indicates that we need to make the time series stationary and introduce a value for “d”.

Step 4: Build ARIMA Model

With the parameters in hand, we can now try to build the ARIMA model. The values found in the previous section might be approximate estimates, and we need to explore more (p,d,q) combinations. The one with the lowest BIC and AIC should be our choice. We can also try some models with a seasonal component, in case we notice any seasonality in the ACF/PACF plots.
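One way to organise this search is to fit a handful of candidate orders and tabulate their AIC and BIC. The sketch below is only an illustration: it assumes the log of the AirPassengers series and a fixed seasonal part of (0,1,1) with period 12, and the list of candidate orders is an arbitrary shortlist, not an exhaustive grid.

```r
data(AirPassengers)
y <- log(AirPassengers)

# Arbitrary shortlist of (p, d, q) candidates
orders <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1), c(0, 1, 2))

results <- do.call(rbind, lapply(orders, function(ord) {
  fit <- arima(y, order = ord,
               seasonal = list(order = c(0, 1, 1), period = 12))
  data.frame(p = ord[1], d = ord[2], q = ord[3],
             AIC = AIC(fit), BIC = BIC(fit))
}))

# Lowest AIC first
results[order(results$AIC), ]
```

AIC and BIC do not always agree on the winner; BIC penalises extra parameters more heavily, so it tends to favour the smaller model.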

Step 5: Make Predictions

Once we have the final ARIMA model, we are ready to make predictions for future time points. We can also visualize the trends to cross validate if the model works fine.

Applications of Time Series Model

Now, we’ll use the same example that we have used above. Then, using time series, we’ll make future predictions. We recommend you check out the example before proceeding further.

Where did we start?

Following is the plot of the number of passengers over the years. Try to make observations on this plot before moving further in the article.

Here are my observations:

1. There is a trend component which grows the number of passengers year by year.

2. There looks to be a seasonal component which has a cycle of less than 12 months.

3. The variance in the data keeps on increasing with time.

We know that we need to address two issues before we test for stationarity. One, we need to remove unequal variances. We do this by taking the log of the series. Two, we need to address the trend component. We do this by taking the difference of the series. Now, let’s test the resultant series.

library(tseries)   # adf.test() comes from the tseries package

adf.test(diff(log(AirPassengers)), alternative="stationary", k=0)

Augmented Dickey-Fuller Test

data: diff(log(AirPassengers))

Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01

alternative hypothesis: stationary

We see that the series is stationary enough to do any kind of time series modelling.
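To see why the log was taken before differencing, you can compare the spread of the first and the last year of data on both scales. This is a rough sketch; comparing only the years 1949 and 1960 is a simplification chosen for illustration.

```r
data(AirPassengers)

first_year <- window(AirPassengers, start = c(1949, 1), end = c(1949, 12))
last_year  <- window(AirPassengers, start = c(1960, 1), end = c(1960, 12))

# How much larger is the within-year spread at the end than at the start?
ratio_raw <- sd(last_year) / sd(first_year)
ratio_log <- sd(log(last_year)) / sd(log(first_year))

c(raw = ratio_raw, log = ratio_log)
```

On the raw scale the spread grows several-fold over the years, while on the log scale the ratio is much closer to 1.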

The next step is to find the right parameters to be used in the ARIMA model. We already know that the ‘d’ component is 1, as we need 1 difference to make the series stationary. We find the remaining parameters using the correlation plots. Following are the ACF plots for the series:

#ACF Plots

acf(log(AirPassengers))

What do you see in the chart shown above?

Clearly, the decay of the ACF chart is very slow, which means that the series is not stationary. We have already discussed above that we now intend to regress on the difference of logs rather than the log directly. Let’s see how the ACF and PACF curves come out after regressing on the difference.

acf(diff(log(AirPassengers)))
pacf(diff(log(AirPassengers)))

Clearly, the ACF plot cuts off after the first lag. Hence, we understand that the value of p should be 0, as the ACF is the curve getting a cut off, while the value of q should be 1 or 2. After a few iterations, we found that (0,1,1) as (p,d,q) comes out to be the combination with the least AIC and BIC.

Let’s fit an ARIMA model and predict the next 10 years. Also, we will try fitting a seasonal component in the ARIMA formulation. Then, we will visualize the prediction along with the training data. You can use the following code to do the same:

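# Seasonal ARIMA (0,1,1)(0,1,1) with a 12-month period, fitted on the log scale
(fit <- arima(log(AirPassengers), c(0, 1, 1),seasonal = list(order = c(0, 1, 1), period = 12)))

# Forecast 10 years (120 months) ahead
pred <- predict(fit, n.ahead = 10*12)

# 2.718 approximates e, so this undoes the log; exp(pred$pred) is the exact inverse
ts.plot(AirPassengers,2.718^pred$pred, log = "y", lty = c(1,3))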
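The predict() call also returns standard errors (pred$se), which can be turned into an approximate forecast band. The sketch below refits the same model, uses exp() for the back-transformation and assumes a normal approximation with 1.96 standard errors for a roughly 95% band; the 24-month horizon is an arbitrary choice for illustration.

```r
data(AirPassengers)

fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 24)

# Back-transform the point forecast and an approximate 95% band from the log scale
point_forecast <- exp(pred$pred)
lower <- exp(pred$pred - 1.96 * pred$se)
upper <- exp(pred$pred + 1.96 * pred$se)

ts.plot(AirPassengers, point_forecast, lower, upper,
        log = "y", lty = c(1, 3, 2, 2))
```

The band widens as the horizon grows, which is a useful reminder of how uncertain a forecast 10 years ahead really is.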

Projects

Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Test the techniques discussed in this post and accelerate your learning in Time Series Analysis with the following Practice Problems:

Practice Problem: Food Demand Forecasting Challenge - Forecast the demand of meals for a meal delivery company

Practice Problem: Time Series Analyses - Forecast the passenger traffic for an intra-city rail system

End Notes

With this, we come to the end of this tutorial on Time Series Modelling. I hope this will help you improve your knowledge of working with time-based data. To reap maximum benefits out of this tutorial, I’d suggest you practice these R codes side by side and check your progress.

Did you find the article useful? Share with us if you have done a similar kind of analysis before. Do let us know your thoughts about this article in the box below.

Tavish Srivastava
Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-
science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including
Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and
Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory
or even movie related to this idea.


50 COMMENTS

DR.D.K.SAMUEL Reply
December 16, 2015 at 5:27 am

Really useful. Please also write on how to make weather data into a times series for further analysis in R

DR SAHUL BHARTI Reply

December 16, 2015 at 5:56 am

Hi
I am a medical specialist (MD Pediatrics) with further training in research and statistics (Panjab University, Chandigarh). In our medical
settings, time series data are often seen in ICU and anesthesia related research where patients are continuously monitored for days or
even weeks generating such data. Frankly speaking, your article has clearly decoded this arcane process of time series analysis with
quite wonderful insight into its practical relevance. Fabulous article Mr Tavish, kindly write more about ARIMA modelling.
Thanks a lot
Dr Sahul Bharti

BASEER Reply
December 16, 2015 at 6:37 am

great article to start with timeseries mod

SHIVAM BANSAL Reply

December 16, 2015 at 6:52 am

Awesome Tutorial.
Big fan of you Tavish, your articles are really great. Explanations in beautiful manner.

ANKUR BHARGAVA Reply

December 16, 2015 at 7:19 am

Please elucidate on PACF part of MA series.

Thanks

TAVISH SRIVASTAVA Reply

December 16, 2015 at 9:04 am

PACF is not really required for MA models, as the degree of MA can be found from ACF directly.

RASHEED4DEM Reply
October 4, 2016 at 2:35 pm

Thank u it really help a lot

HUGO Reply
December 16, 2015 at 1:02 pm

Hi Tavish.
First of all, congratulations on your work around here. It’s been very useful. Thank you
I have a doubt and I hope that you can help me

I performed a Dickey-Fuller test on both series ; AirPassengers and diff( log(AirPassengers))

Here the results:

Augmented Dickey-Fuller Test

data: diff(log(AirPassengers))
Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary

and

Augmented Dickey-Fuller Test

data: diff(log(AirPassengers))
Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary

In both tests I got a small p-value that allows me to reject the non-stationary hypothesis. Am I right?

If so, the first series is already stationary??

This means that if I had performed a stationarity test on the original series, I would have moved on to the next step.

Thank you in advance.

HUGO Reply
December 16, 2015 at 1:05 pm

Now with the right results .

Augmented Dickey-Fuller Test

data: AirPassengers
Dickey-Fuller = -4.6392, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary

Augmented Dickey-Fuller Test

data: diff(log(AirPassengers))
Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary

RAM Reply
December 18, 2015 at 10:45 pm

@Hugo,

Yes, the adf.test(AirPassengers) indicates that the series is stationary. This is a bit misleading.

Reason: This test first does a de-trend on the series, (ie., removes the trend component), then checks for stationarity. Hence it flags the
series as stationary.

There is another test in package fUnitRoots. Please try this code:

## Start
install.packages("fUnitRoots") # If you already have installed this package, you can omit this line
library(fUnitRoots);
adfTest(AirPassengers);
adfTest(log(AirPassengers));
adfTest(diff(AirPassengers));
## End

Hope this helps..

ARATI Reply
May 3, 2016 at 3:37 pm

thanks Ram, I had the same question as Hugo and your explanation helped
I just wanted to point out for the benefit of anyone else looking at this that R is case-sensitive: do not forget to capitalize the T in adfTest, else your function will not work.

ANNE Reply
March 23, 2018 at 2:44 am

If I use the diff(AirPassengers) dataset and test it with adfTest, it reports the series as stationary

AJAY Reply
December 16, 2015 at 6:32 pm

Fortunately the auto.arima function allows us to model time series quite nicely though it is quite useful to know the basics. Here is some
code I wrote on the same data

http://rpubs.com/ajaydecis/ts

SRINI Reply
December 18, 2015 at 10:21 am

Awesome Tavish! Short, crisp and absolutely crystal clear

TOUSIFAHAMED Reply
December 20, 2015 at 6:56 am

Thanks for the post.

Awesome explanation..!

MIKE RAI Reply

December 20, 2015 at 5:54 pm

Rohit,

Please be more specific, and provide the location of the discussion on lnkd, so that Tavish can respond appropriately..

IAN M Reply
December 26, 2015 at 3:54 pm

The pairs of graphs introducing the concepts worked really well. I found the use of english letters for all the formulae clear.

BA Reply
December 28, 2015 at 10:01 am

Hi, thanks for the tutorial. I have just one comment for the identification of MA order. We have been taught that the length of the first line
of the ACF curve is always equal to 1 [because it’s cov(Xt, Xt)/(sigma(Xt)*sigma(Xt) = 1]. So we dont look at this line, we start counting
after this line. If that’s the case, your first MA example should be MA(1) instead of MA(2)

CAROL ZHANG Reply

January 1, 2016 at 9:20 am

Hi Tavish,

One question about the ADF test.

adf.test(diff(log(AirPassengers)), alternative="stationary", k=0)

How shall we decide on the value for k? I tried to run another version without specifying k, and the default value used is k = 5 (aka lag order = 5).

Many thanks!

ABDI KENESA Reply

January 6, 2016 at 6:41 am

Thank you.
Your article is great

STEVEN F CHAPMAN, PH.D. Reply

February 18, 2016 at 12:56 pm

Is there any way we can get a PDF of this? I would like to use it to introduce my staff to trend analysis and some errors to look out for–

SURYA PRAKASH Reply

March 11, 2016 at 6:45 am

Why did we take d as 1 in this example?

PINTU SENGUPTA Reply
September 8, 2016 at 9:44 am

We have differenced the series once and get to see that the trend is removed. Had the trend still been there, we would have differenced the series once again. This series did not need to be differenced more than once; hence d=1.

GURU Reply
March 19, 2016 at 1:24 pm

This article was very helpful

SIDHRAJ Reply
March 26, 2016 at 1:49 pm

why the author not answer the questions…..

this force us to look for better articles and doubt this one.

AMY Reply
April 18, 2016 at 4:28 am

Please explain the parameters to this last line of code

ts.plot(AirPassengers,2.718^pred$pred, log = "y", lty = c(1,3))

ARATI Reply
May 3, 2016 at 5:43 pm

Hi,
After you run this
pred <- predict(APmodel, n.ahead=10*12)

take a look at 'pred'

It is a list of 2 (pred and se – I assume these are predictions and errors.)
I would suggest using a name other than pred in the predict function to avoid confusion , I used the following

APforecast <- predict(APmodel, n.ahead=10*12)

So APforecast is a list of pred and se and we need to plot the pred values , ie APforecast$pred
Also we did the arima on log of AirPassengers, so the forecast we have got is actually log of the true forecast. Hence we need to find the
log inverse of what we have got.
ie. log(forecast) = APforecast$pred
so forecast = e ^ APforecast$pred
e= 2.718
If you find that confusing, I would suggest reading up on natural logarithms and their inverse

the log = "y" is to plot on a logarithmic scale – this is not needed, try the function without it and with and observe the results.

The lty bit I have not figured out yet. Drop it and try the ts.plot, it works fine.

BRANDON Reply
September 6, 2016 at 11:54 pm

Hey Amy, ts.plot() will plot several time series on the same plot. The first two entries are the two time series he’s plotting. The last two
entries are visual parameters (we’ll come back to those). The first entry plots the AirPassengers time series as a dark, continuous line.
The second entry is also a time series, but it is a little more confusing: "2.718^pred$pred". First, you have to know what pred$pred is.
The function predict() here is a generic function that works differently for different classes plugged into it (it says so if you type
?predict). The class we’re working with is an Arima class. If you type ?predict.Arima you will find a good description of what the function
is all about. predict.Arima() returns something with a "pred" part (the predictions) and a "se" part (the standard errors). We want the "pred"
part, hence pred$pred. So, pred$pred is a time series, and 2.718^pred$pred is one too. You have to remember that 2.718 is approximately
the constant e, and then this makes sense. He’s just undoing the log that he placed on the data when he created "fit".

As for the last two parameters, log = "y" sets the y-axis to be on a log scale. And finally, lty = c(1,3) will set the line type (lty) to 1 (solid)
for the original time series and 3 (dotted) for the predicted time series.
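
To make the explanation above concrete, here is a minimal sketch in R. The model specification (the seasonal component and the object name fit) is an assumption based on the names used in this thread, not a confirmed copy of the article's code. It also uses exp() in place of 2.718^, since exp() is the exact inverse of log().

```r
# Assumed model: ARIMA(0,1,1) with a seasonal (0,1,1) part of period 12,
# fitted on the log of the built-in AirPassengers series.
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# predict() dispatches to predict.Arima() and returns a list with
# $pred (forecasts on the log scale) and $se (their standard errors).
pred <- predict(fit, n.ahead = 10 * 12)

# Undo the log transform; exp(x) is e^x with e at full precision.
forecast_values <- exp(pred$pred)

# Solid line (lty 1) for the observed series, dotted line (lty 3) for the
# forecast, with the y-axis drawn on a log scale.
ts.plot(AirPassengers, forecast_values, log = "y", lty = c(1, 3))
```

Dropping log = "y" only changes how the axis is drawn; the forecast values themselves stay the same.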

MUSTAFA Reply
April 21, 2016 at 12:45 pm

Thanks a lot! Very useful article.

LUCA NICOLI Reply

May 4, 2016 at 4:24 pm

Hi, It is very interesting.

Can you make the same example with Python code?

REDDAIAH B N Reply
May 24, 2016 at 1:23 pm

Hi Tavish,
Thank you very much for the nice explanation about time series using ARIMA.
However I have the following the queries regarding the analysis.

1. Are ACF and PACF both used to find the p and q values for ARIMA? Is the ACF alone not enough to find p and q?
If not, can you explain the importance of the PACF?

Thanks in advance…….:)

MUHAMMAD ARIF Reply

June 1, 2016 at 7:01 am

If non-stationarity is present in the data, can we still analyse that data?

PARIND Reply
June 6, 2016 at 12:00 pm

Hey Tavish, really enjoyed the content,

Just a small doubt: can you please elaborate on covariance in the context of stationarity? I understand the covariance term in general, but
I cannot picture it for a time series. Can you please help me understand the third condition of a stationary series, i.e. “The covariance of the i
th term and the (i + m) th term should not be a function of time”? Please help me understand it from a data perspective, e.g. if I have sales
data for each date, how would you explain covariance with a real-life example using daily sales data?

PARTH GERA Reply

July 4, 2016 at 10:52 am

Hi Tavish, thanks a lot. This article was immensely helpful.

I just had one small issue. After the last step, if I want to extract the predicted values from the curve, how do I do that?

RAM Reply
July 4, 2016 at 2:26 pm

@Parth,

You get the predicted values from the variable pred.

pred is a list with two items: pred and se (prediction and standard error).
To see the predictions, use this command: print(pred$pred)

PARTH GERA Reply

July 4, 2016 at 5:30 pm

Hi Ram,
Thanks for your help . Yeah, print(pred$pred) would give us log of the predicted values. print(2.718^pred$pred) would give us the actual
predicted values.
Thanks

RAM Reply
July 5, 2016 at 12:18 am

Yes, if you use ‘log’ when creating the model, you will use antilog or exponent to get the predicted values. If you create a model without
the log function, you will not use exponent to get the predicted values
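
As a rough sketch of the extraction discussed above, the predictions can be collected into a data frame next to their time index. The model call, the 24-month horizon, the column names and the file name in the comment are assumptions made for illustration.

```r
# Assumed model, fitted on the log scale as discussed in this thread.
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 24)

# Predicted values, back-transformed with exp() because the model was
# fitted on log(AirPassengers). The standard errors stay on the log scale.
predicted <- data.frame(
  time     = as.numeric(time(pred$pred)),
  forecast = as.numeric(exp(pred$pred)),
  log_se   = as.numeric(pred$se)
)

# Actual (observed) values with the same kind of time index.
actual <- data.frame(
  time       = as.numeric(time(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)

print(head(predicted))
# To save the values outside R (hypothetical file name):
# write.csv(predicted, "predicted.csv", row.names = FALSE)
```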

MANPREET Reply
August 1, 2016 at 10:14 am

How can I extract the data for the predicted and actual values from R?

AKSHAY Reply
August 8, 2016 at 9:49 pm

hello,
the data you used in your tutorial, AirPassengers, is already a time series object.
my question is, HOW can i make/prepare my own time series object?
i currently have a historical currency exchange data set, with first column being date, and the rest 20 columns are titled by country, and
their values are the exchange rate.
after i convert my date column into date object, when i use the same commands used in your tutorial, the results are funny.
for example, start(data$Date) will give me a result of:
[1] 1 1
and frequency(data$Date) will return:
[1] 1
can you please explain HOW to prepare our data accordingly so we can use the functions?
thank you!

BRANDON Reply
September 7, 2016 at 12:03 am

If you type in ?ts then you should be on your way. You only need a (single) time series, a frequency, and a start date. The examples at
the bottom of the documentation should be very helpful. I’m guessing you’d write something like ts( your_timeseries_data, frequency =
365, start = c(1980, 153)) for instance if your data started on the 153rd day of 1980.
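
A minimal sketch of that suggestion follows, using made-up data. The data frame rates, its USD column and the random values are hypothetical stand-ins for the exchange-rate table described above. The key point is that ts() is given the value column, not the Date column, which is probably why start(data$Date) and frequency(data$Date) returned 1.

```r
# Hypothetical daily exchange-rate data: one Date column and one rate column.
set.seed(1)
rates <- data.frame(
  Date = seq(as.Date("1980-06-01"), by = "day", length.out = 730),
  USD  = 1 + cumsum(rnorm(730, sd = 0.01))
)

# Derive the start year and the day of the year from the first date.
first_day  <- rates$Date[1]
start_year <- as.integer(format(first_day, "%Y"))
start_doy  <- as.integer(format(first_day, "%j"))

# Build the time series from the values; the dates only supply the start.
usd_ts <- ts(rates$USD, frequency = 365, start = c(start_year, start_doy))

print(start(usd_ts))
print(frequency(usd_ts))
```

With 20 currency columns, the same call can be repeated per column, or a matrix of the value columns can be passed to ts() to get a multivariate series.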

IMAMHIDAYAT Reply
August 30, 2016 at 7:26 am

Thank you very much…

RAM Reply
September 10, 2016 at 12:11 am

What was the format of your date value before you converted it? If you post a few rows from your data, perhaps we can help.

VISHWANATH Reply
September 17, 2016 at 11:09 pm

Thank you, It was very helpful for me

KEVIN Reply
October 2, 2016 at 11:35 pm

Hi, thanks for the article! I’m still unclear how the parameters (p,d,q) = (0,1,1) were found from the ACF and PACF. I understand d, but not
p or q. What do you mean when you say ‘cutting off’?

AMY Reply
October 5, 2016 at 5:18 pm

Hi Kevin,

The ACF plot is a bar chart of the correlation coefficients between a time series and lags of itself.
The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself.

To find p and q you need to look at the ACF and PACF plots. They are interpreted as follows:

AR(p) model: the ACF plot tails off* but the PACF plot cuts off** after p lags.
MA(q) model: the PACF plot tails off but the ACF plot cuts off after q lags.
ARMA(p,q) model: both the ACF and PACF plots tail off; you can try different combinations of p and q, starting with smaller values.
ARIMA(p,d,q) model: an ARMA(p,q) model applied after differencing the series d times to make it stationary.

Use AIC and BIC to find the most appropriate model. Lower values of AIC and BIC are desirable.

*Tails off means the plot decays slowly, i.e. it has significant spikes at higher lags too.
**Cuts off means the bar is significant at lag p (or lag q for the ACF) and not significant at any higher-order lag.
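
The rules above can be tried out with a short sketch in R. The three candidate orders and the seasonal component are assumptions chosen for illustration; the significance bound is the usual approximate 95% limit that acf() and pacf() draw on their plots.

```r
# Differenced log series: d = 1 removes the trend.
x <- diff(log(AirPassengers))

# Compute ACF and PACF without plotting; drop lag 0 from the ACF.
acf_vals  <- acf(x, plot = FALSE)$acf[-1]
pacf_vals <- as.numeric(pacf(x, plot = FALSE)$acf)

# Approximate 95% significance bound for a white-noise series.
bound <- qnorm(0.975) / sqrt(length(x))
print(which(abs(acf_vals) > bound))   # lags with significant ACF bars
print(which(abs(pacf_vals) > bound))  # lags with significant PACF bars

# Compare a few small (p, d, q) candidates by AIC (lower is better).
# The seasonal part is an assumption, not taken from this thread.
candidates <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1))
fits <- lapply(candidates, function(ord) {
  arima(log(AirPassengers), order = ord,
        seasonal = list(order = c(0, 1, 1), period = 12))
})
aics <- sapply(fits, AIC)
names(aics) <- sapply(candidates, paste, collapse = ",")
print(aics)
best <- names(which.min(aics))
print(best)
```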

Here is a link that might help you understand the concept further: http://people.duke.edu/~rnau/arimrule.htm

Hope this helps.

JOHNNY Reply
October 11, 2016 at 4:16 pm

Hi. Great article. I am working on a g-force dataset (with both positive and negative values) and am having trouble with the log function: it
produces NaNs, and I am not sure how to go about addressing this.

Any help would be appreciated.

AMIT KUMAR JAIN Reply

November 19, 2016 at 5:56 pm

GREAT ARTICLE… THANK YOU TAVISH!!!!

One strong suggestion to Analytics Vidhya: please add a PDF download link (without advertisements) to articles like this one. It
would be really helpful for someone like me who is building a repository of awesome articles to learn from!

PARTH GERA Reply

November 19, 2016 at 7:22 pm

Hi Tavish, great article. I had one doubt. In the last step, while fitting the ARIMA model, you used log(AirPassengers) instead of
diff(log(AirPassengers)).
Why is that so? log(AirPassengers) isn’t a stationary series, right?

DAZ Reply
May 20, 2017 at 3:51 am

Just an FYI for R newbies: I don’t think it’s mentioned above, but to run adf.test you will need to install the tseries package.
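
A small sketch of that setup step is below. The series passed to adf.test() and the choice of arguments are assumptions for illustration; the check with requireNamespace() just avoids an error when the package is missing.

```r
# Series to test: the differenced log of AirPassengers (assumed here).
x <- diff(log(AirPassengers))

if (requireNamespace("tseries", quietly = TRUE)) {
  # Augmented Dickey-Fuller test; a small p-value suggests stationarity.
  result <- tseries::adf.test(x, alternative = "stationary")
  print(result$p.value)
} else {
  message('Package "tseries" is not installed; run install.packages("tseries") first.')
}
```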

GAURAV SHARMA Reply

April 29, 2018 at 2:24 am

It’s handled by specifying c(0, 1, 1) while fitting. Here the first 1 (the d term) denotes differencing, which makes the series stationary.
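
One way to check this is to compare the two approaches directly. The sketch below assumes a plain, non-seasonal ARIMA(0,1,1) to keep the comparison simple: fitting with d = 1 on the log series should give nearly the same MA coefficient as fitting an MA(1) without a mean on the manually differenced series.

```r
y <- log(AirPassengers)

# Let arima() do the differencing via the d term of order = c(p, d, q).
fit_d1 <- arima(y, order = c(0, 1, 1))

# Difference by hand, then fit the same MA(1) with no mean term.
fit_manual <- arima(diff(y), order = c(0, 0, 1), include.mean = FALSE)

# The two MA(1) estimates should agree closely.
print(coef(fit_d1))
print(coef(fit_manual))
```

The practical benefit of letting arima() difference internally is that predict() then returns forecasts on the original (log) level rather than on the differenced scale.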