05 ASAP Time Series Forecasting - Day 5-7


FOUNDATION TO DATA SCIENCE

Business Analytics

MODULE II-Unit:3
Time Series Forecasting-Forecasting Data
Prof. Dr. George Mathew
B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
Time Series Forecasting
2.3.1 What is Time-Series analysis
2.3.1.1 Decomposition
2.3.2 Correlation and Causation
2.3.3 Autocorrelation
2.3.4 Forecasting Data
2.3.5 Autoregressive Modeling, Moving averages and ARIMA
2.3.6 Neural networks for Time series data
Forecasting Data (Day-5)
Forecasting Data
A forecast is a prediction of some future event or events. As
suggested by Niels Bohr, making good predictions is not always
easy. Forecasting is an important problem that spans many fields
including business and industry, government, economics,
environmental sciences, medicine, social science, politics, and
finance. Forecasting problems are often classified as short-term,
medium-term, and long-term.
Short-term forecasting problems involve predicting events only a few
time periods (days, weeks, and months) into the future.
Medium-term forecasts extend from 1 to 2 years into the future, and
long-term forecasting problems can extend beyond that by many
years. Short- and medium-term forecasts are required for activities
that range from operations management to budgeting and selecting
new research and development projects.
Forecasting Data
Long-term forecasts impact issues such as strategic planning. Short- and
medium-term forecasting is typically based on identifying, modeling, and
extrapolating the patterns found in historical data. Because these historical
data usually exhibit inertia and do not change dramatically very quickly,
statistical methods are very useful for short- and medium-term forecasting.

Most forecasting problems involve the use of time series data. A time
series is a time-oriented or chronological sequence of observations on a
variable of interest.

For example, Figure 1.1 shows the market yield on US Treasury Securities
at 10-year constant maturity from April 1953 through December 2006.
This graph is called a time series plot.
The reason that forecasting is so important is that prediction of future
events is a critical input into many types of planning and decision-
making processes, with application to areas such as the following:
1. Operations Management. Business organizations routinely use
forecasts of product sales or demand for services in order to schedule
production, control inventories, manage the supply chain, determine
staffing requirements, and plan capacity. Forecasts may also be used
to determine the mix of products or services to be offered and the
locations at which products are to be produced.
2. Marketing. Forecasting is important in many marketing decisions.
Forecasts of sales response to advertising expenditures, new
promotions, or changes in pricing policies enable businesses to
evaluate their effectiveness, determine whether goals are being met,
and make adjustments.
3. Finance and Risk Management. Investors in financial assets are
interested in forecasting the returns from their investments. These
assets include but are not limited to stocks, bonds, and commodities;
other investment decisions can be made relative to forecasts of
interest rates, options, and currency exchange rates. Financial risk
management requires forecasts of the volatility of asset returns so
that the risks associated with investment portfolios can be evaluated
and insured, and so that financial derivatives can be properly priced.
4. Economics. Governments, financial institutions, and policy
organizations require forecasts of major economic variables, such as
gross domestic product, population growth, unemployment, interest
rates, inflation, job growth, production, and consumption. These
forecasts are an integral part of the guidance behind monetary and
fiscal policy, and budgeting plans and decisions made by
governments. They are also instrumental in the strategic planning
decisions made by business organizations and financial institutions.
5. Industrial Process Control. Forecasts of the future values of critical
quality characteristics of a production process can help determine
when important controllable variables in the process should be
changed, or if the process should be shut down and overhauled.
Feedback and feedforward control schemes are widely used in
monitoring and adjustment of industrial processes, and predictions of
the process output are an integral part of these schemes.
6. Demography. Forecasts of population by country and regions are
made routinely, often stratified by variables such as gender, age,
and race. Demographers also forecast births, deaths, and migration
patterns of populations. Governments use these forecasts for planning
policy and social service actions, such as spending on health care,
retirement programs, and antipoverty programs. Many businesses
use forecasts of populations by age groups to make strategic plans
regarding developing new product lines or the types of services that
will be offered.
There are two broad types of forecasting techniques:
qualitative methods and quantitative methods.
Qualitative forecasting techniques are often subjective in nature and
require judgment on the part of experts. Qualitative forecasts are
often used in situations where there is little or no historical data on
which to base the forecast. An example would be the introduction of
a new product, for which there is no relevant history. In this situation,
the company might use the expert opinion of sales and marketing
personnel to subjectively estimate product sales during the new
product introduction phase of its life cycle.
Sometimes qualitative forecasting methods make use of marketing
tests, surveys of potential customers, and experience with the sales
performance of other products (both their own and those of
competitors). However, although some data analysis may be
performed, the basis of the forecast is subjective judgment.
Perhaps the most formal and widely known qualitative forecasting
technique is the Delphi Method. It employs a panel of experts who
are assumed to be knowledgeable about the problem. The panel
members are physically separated to avoid their deliberations being
impacted either by social pressures or by a single dominant
individual. Each panel member responds to a questionnaire
containing a series of questions and returns the information to a
coordinator. Following the first questionnaire, subsequent questions
are submitted to the panelists along with information about the
opinions of the panel as a group. This allows panelists to review their
predictions relative to the opinions of the entire group. After several
rounds, it is hoped that the opinions of the panelists converge to a
consensus, although achieving a consensus is not required and
justified differences of opinion can be included in the outcome.
Quantitative forecasting techniques make formal use of historical
data and a forecasting model. The model formally summarizes
patterns in the data and expresses a statistical relationship between
previous and current values of the variable. Then the model is used to
project the patterns in the data into the future. In other words, the
forecasting model is used to extrapolate past and current behavior
into the future.
The three most widely used are regression models, smoothing
models, and general time series models.
Regression models make use of relationships between the variable of
interest and one or more related predictor variables. Sometimes
regression models are called causal forecasting models, because the
predictor variables are assumed to describe the forces that cause or
drive the observed values of the variable of interest. An example
would be using data on house purchases as a predictor variable to
forecast furniture sales.
The method of least squares is the formal basis of most
regression models. Smoothing models typically employ a
simple function of previous observations to provide a
forecast of the variable of interest. These methods may
have a formal statistical basis, but they are often used and
justified heuristically on the basis that they are easy to use
and produce satisfactory results. General time series
models employ the statistical properties of the historical
data to specify a formal model and then estimate the
unknown parameters of this model (usually) by least
squares. In subsequent chapters, we will discuss all three
types of quantitative forecasting models.
The form of the forecast can be important. We typically think of a
forecast as a single number that represents our best estimate of the
future value of the variable of interest. Statisticians would call this a
point estimate or point forecast. Now these forecasts are almost
always wrong; that is, we experience forecast error. Consequently,
it is usually a good practice to accompany a forecast with an estimate
of how large a forecast error might be experienced. One way to do
this is to provide a prediction interval (PI) to accompany the point
forecast. The PI is a range of values for the future observation, and it
is likely to prove far more useful in decision-making than a single
number. We will show how to obtain PIs for most of the
forecasting methods discussed in the book.
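As a simple illustration: under the common assumption of approximately normal forecast errors, a 95% PI is often computed as the point forecast plus or minus 1.96 times the estimated standard deviation of the forecast error.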
Other important features of the forecasting problem are the forecast
horizon and the forecast interval.
The forecast horizon is the number of future periods for which
forecasts must be produced. The horizon is often dictated by the
nature of the problem. For example, in production planning, forecasts
of product demand may be made on a monthly basis.
Because of the time required to change or modify a production
schedule, ensure that sufficient raw material and component parts are
available from the supply chain, and plan the delivery of completed
goods to customers or inventory facilities, it would be necessary to
forecast up to 3 months ahead. The forecast horizon is also often
called the forecast lead time.
The forecast interval is the frequency with which new forecasts are
prepared. For example, in production planning, we might forecast
demand on a monthly basis, for up to 3 months in the future (the lead
time or horizon), and prepare a new forecast each month.
Thus the forecast interval is 1 month, the same as the
basic period of time for which each forecast is made.
If the forecast lead time is always the same length,
say, T periods, and the forecast is revised each time period,
then we are employing a rolling or moving horizon
forecasting approach.
This system updates or revises the forecasts for T−1
of the periods in the horizon and computes a forecast for
the newest period T. This rolling horizon approach to
forecasting is widely used when the lead time is several
periods long.
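As a rough sketch of this bookkeeping (assuming a hypothetical fit_and_forecast(history, T) function that refits a model on the history and returns T-step-ahead forecasts; the names are illustrative, not from the text):

    # Rolling (moving) horizon: each period we refit on the updated history,
    # which revises the forecasts for the T-1 remaining periods and adds a
    # forecast for the newest period T.
    def rolling_horizon(observations, T, fit_and_forecast, warmup=24):
        history = list(observations[:warmup])
        all_forecasts = []
        for y in observations[warmup:]:
            all_forecasts.append(fit_and_forecast(history, T))
            history.append(y)  # observe the new actual value, then roll forward
        return all_forecasts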
SOME EXAMPLES OF TIME SERIES
Time series plots can reveal patterns such as randomness, trends, level shifts, periods
or cycles, unusual observations, or a combination of these.
The sales of a mature pharmaceutical product may remain relatively
flat in the absence of changes to marketing or manufacturing strategies.
Weekly sales of a generic pharmaceutical product shown in Figure
appear to be constant over time, at about 10,400 × 10³ units, in a random sequence
with no obvious patterns.
To assure conformance with customer requirements and product specifications, the
production of chemicals is monitored by many characteristics.
These may be input variables such as temperature and flow rate, and output
properties such as viscosity and purity.
Due to the continuous nature of chemical manufacturing processes,
output properties often are positively autocorrelated; that is, a value
above the long-run average tends to be followed by other values above the average,
while a value below the average tends to be followed by other
values below the average.
THE FORECASTING PROCESS
Forecasting Data
Forecasting is a common statistical task in business, where it
helps to inform decisions about the scheduling of production,
transportation and personnel, and provides a guide to long-
term strategic planning. However, business forecasting is often
done poorly, and is frequently confused with planning and
goals.
The appropriate forecasting methods depend largely on what
data are available. If there is no data available, or if the data
available are not relevant to the forecasts, then qualitative
forecasting methods must be used. These methods are not
purely guesswork—there are well-developed structured
approaches to obtaining good forecasts without using historical
data.
Forecasting Data
Quantitative forecasting can be applied when two conditions
are satisfied:
1. numerical information about the past is available;
2. it is reasonable to assume that some aspects of the past
patterns will continue into the future.
There is a wide range of quantitative forecasting methods,
often developed within specific disciplines for specific
purposes. Each method has its own properties,
accuracies, and costs that must be considered when choosing
a specific method. Most quantitative prediction problems use
either time series data (collected at regular intervals over time)
or cross-sectional data (collected at a single point in time).
Time series forecasting
Examples of time series data include:
● Daily IBM stock prices
(2_GM_PracticalTimeSeriesAnalysis_Moving_Averages_Ch2.ipynb)
● Monthly rainfall
● Quarterly sales results for Amazon
● Annual Google profits
When forecasting time series data, the aim is to estimate how
the sequence of observations will continue into the future.
Time series forecasting
Figure below shows the quarterly Australian beer production
from 1992 to the second quarter of 2010.
Time series forecasting
The blue lines show forecasts for the next two years. Notice how the
forecasts have captured the seasonal pattern seen in the historical data
and replicated it for the next two years. The dark shaded region shows 80%
prediction intervals. That is, each future value is expected to lie in the dark
shaded region with a probability of 80%. The light shaded region shows
95% prediction intervals. These prediction intervals are a useful way of
displaying the uncertainty in forecasts. In this case the forecasts are
expected to be accurate, and hence the prediction intervals are quite
narrow.
The simplest time series forecasting methods use only information on the
variable to be forecast, and make no attempt to discover the factors that
affect its behaviour. Therefore they will extrapolate trends and seasonal
patterns, but they ignore all other information such as marketing initiatives,
competitor activity, changes in economic conditions, and so on. Time series
models used for forecasting include decomposition models, exponential
smoothing models and ARIMA models.
Predictor variables and time series forecasting
Predictor variables are often useful in time series forecasting.
For example, suppose we wish to forecast the hourly electricity
demand (ED) of a hot region during the summer period. A
model with predictor variables might be of the form

ED = f(current temperature, strength of economy, population, time of day, day of week, error).

The relationship is not exact — there will always be changes in
electricity demand that cannot be accounted for by the
predictor variables. The “error” term on the right allows
for random variation and the effects of relevant variables that
are not included in the model. We call this an explanatory
model because it helps explain what causes the variation in
electricity demand.
Predictor variables and time series forecasting
Because the electricity demand data form a time series, we
could also use a time series model for forecasting. In this case,
a suitable time series forecasting equation is of the form

ED(t+1) = f(ED(t), ED(t-1), ED(t-2), ED(t-3), ..., error),

where t is the present hour, t+1 is the next hour, t-1 is the
previous hour, t-2 is two hours ago, and so on. Here,
prediction of the future is based on past values of a
variable, but not on external variables which may affect the
system. Again, the “error” term on the right allows for random
variation and the effects of relevant variables that are not
included in the model.
Predictor variables and time series forecasting
There is also a third type of model which combines the
features of the above two models. For example, it might be
given by

ED(t+1) = f(ED(t), current temperature, time of day, day of week, error).

These types of “mixed models” have been given various
names in different disciplines. They are known as dynamic
regression models, panel data models, longitudinal models,
transfer function models, and linear system models (assuming
that f is linear).
Predictor variables and time series forecasting
An explanatory model is useful because it incorporates information about
other variables, rather than only historical values of the variable to be
forecast. However, there are several reasons a forecaster might select a
time series model rather than an explanatory or mixed model. First, the
system may not be understood, and even if it was understood it may be
extremely difficult to measure the relationships that are assumed
to govern its behaviour. Second, it is necessary to know or forecast the
future values of the various predictors in order to be able to forecast the
variable of interest, and this may be too difficult. Third, the main concern
may be only to predict what will happen, not to know why it happens.
Finally, the time series model may give more accurate forecasts than an
explanatory or mixed model.
The model to be used in forecasting depends on the resources and data
available, the accuracy of the competing models, and the way in which the
forecasting model is to be used. (…End Day-5)
Practical: https://bit.ly/DSATS4cast2022ASAP (already completed)
Statistical Models for Time Series
The specific models we will discuss are:
• Autoregressive (AR) models
• Moving average (MA) models, and
• Autoregressive integrated moving average
(ARIMA) models
These models have traditionally been the
workhorses of time series forecasting, and
they continue to be applied in a wide range of
situations, from academic research to
industry modeling.
Statistical Models for Time Series
Assumptions with respect to the behavior of the time series
• The time series has a linear response to its predictors.
• No input variable is constant over time or perfectly correlated
with another input variable. This simply extends the traditional
linear regression requirement of independent variables to
account for the temporal dimension of the data.
Assumptions with respect to the error
• For each point in time, the expected value of the error, given all
explanatory variables for all time periods (forward and backward),
is 0.
• The error at any given time period is uncorrelated with the
inputs at any time period in the past or future. So a plot of the
autocorrelation function of the errors will not indicate any pattern.
• Variance of the error is independent of time.
Autoregressive Models
The autoregressive (AR) model relies on the intuition that the
past predicts the future and so posits a time series process in
which the value at a point in time t is a function
of the series’s values at earlier points in time.
The simplest AR model, an AR(1) model, describes a system
as follows:
yt = b0 + b1 × yt-1 + et
The value of the series at time t is a function of a constant b0,
its value at the previous time step multiplied by another
constant (b1 × yt-1), and an error term et that also varies with
time. This error term is assumed to have a constant
variance and a mean of 0.
We denote an autoregressive term that looks back only to the
immediately prior time as an AR(1) model because it includes
a lookback of one lag.
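A minimal Python sketch of this process (the constants b0 and b1 and the noise scale are illustrative choices, not values from the text):

    import numpy as np

    rng = np.random.default_rng(42)
    b0, b1, n = 0.5, 0.7, 200   # illustrative constants; |b1| < 1 keeps the series stable
    e = rng.normal(loc=0.0, scale=1.0, size=n)  # error term: mean 0, constant variance
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = b0 + b1 * y[t - 1] + e[t]   # yt = b0 + b1*yt-1 + et
    print(y[:5])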
Autoregressive Models
Incidentally, the AR(1) model has an identical form to a simple
linear regression model with only one explanatory variable.
That is, it maps to:
Y = b0 + b1 × x + e
We can calculate both the expected value of yt and its
variance, given yt–1, if we know the value of b0 and b1.
E(yt | yt-1) = b0 + b1 × yt-1
Var(yt | yt-1) = Var(et) = Var(e)
Stationarity is a key concept in time series analysis because
it is required by many time series models, including AR models.
Strong stationarity requires that the distribution of the random
variables output by a process remain the same over time, so, for
example, it demands that the joint statistical distribution of y1, y2, y3 be
the same as that of y101, y102, y103 for any measure of that distribution,
not just the first and second moments (the mean and variance).
Autoregressive Models
First we plot the data in chronological order (Figure 6-1). Since we will
model this as an AR process, we look to the PACF to set a cutoff on the
order of the process (Figure 6-2).

Figure 6-1. Daily number of banking orders
Autoregressive Models

Figure 6-2. PACF of the untransformed orders time series pictured in Figure 6-1.
Autoregressive Models
We can see that the value of the PACF crosses the 5%
significance threshold at lag 3. This is consistent with the
results from the ar() function available in R’s stats package.
ar() automatically chooses the order of an autoregressive
model if one is not specified.
If we look at the documentation for the ar() function, we can
see that the order selected is determined (with the default
parameters we left undisturbed) based on the Akaike
information criterion (AIC). This is helpful to know because it
shows that the visual selection we made by examining the
PACF is consistent with the selection that would be made by
minimizing an information criterion. These are two different
ways of selecting the order of the model, but in this case
they are consistent.
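The text works in R; a roughly equivalent sketch in Python with statsmodels (the simulated AR(3) data are illustrative, chosen so the PACF should cut off at lag 3):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_pacf
    from statsmodels.tsa.ar_model import AutoReg, ar_select_order

    # Illustrative stationary AR(3) series.
    rng = np.random.default_rng(0)
    e = rng.normal(size=500)
    y = np.zeros(500)
    for t in range(3, 500):
        y[t] = 0.4 * y[t - 1] + 0.2 * y[t - 2] + 0.2 * y[t - 3] + e[t]

    plot_pacf(y, lags=20)   # visual order selection: where does the PACF cut off?
    plt.show()

    sel = ar_select_order(y, maxlag=13, ic="aic")  # analogous to R's ar() with AIC
    print(sel.ar_lags)                             # expected: lags [1, 2, 3]
    fit = AutoReg(y, lags=sel.ar_lags).fit()
    print(fit.params)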
Interpreting the autoregressive model
We know that in the autoregressive model, or
AR for short, the current value is predicted based
solely on previous values. Basically, this is a
linear model in which the current period's value is
derived from a sum of past values, each multiplied
by a numeric coefficient.
Using AR(p), we can choose the number of
lagged values we want to include in the model,
where p represents the model's order.
autoregressive model, also known as an AR(1),
would look like this, for example, if X is a time-
series variable:
Interpreting the autoregressive model
Xt = C + ϕ1Xt-1 + ϵt
where Xt-1 represents the previous period's value of X.
With weekly data, t represents this week, while t-1 represents
last week; as a result, Xt-1 reflects last week's value.
What is ϕ1?
The coefficient ϕ1 is the numeric constant by which the lagged
variable (Xt-1) is multiplied; in other words, it is the portion of
the previous value that carries over into the current one.
These coefficients should be kept between -1 and 1, because
when the absolute value of the coefficient exceeds 1, the series
will explode exponentially over time.
What is ϵt?
This value is known as the residual, representing the difference
between our period-t prediction and the correct value
(ϵt = Xt - X̂t).
Autoregressive model
Many regression models use linear combinations of predictors to forecast a variable.
In contrast, autoregressive models use the variable's past values to determine the
future value.
An AR(1) autoregressive process depends on the value immediately preceding the
current value, while an AR(2) process uses the previous two values to calculate the
current value. An AR(0) process is white noise, which does not depend on past terms at all.
What is time series forecasting?
In time series forecasting, time series data is analyzed through statistics and
mathematical modeling to predict and inform strategic decisions.
Forecasting can provide insight into the likelihood of certain outcomes. Generally, a
more extensive dataset leads to more accurate forecasting.
Predictions and forecasts are generally synonymous, but there is a notable difference
between them. Prediction refers to data at a general future point in time, whereas
forecasting focuses on data at a specific future point in time.
The analysis of time series is often combined with series forecasting. In time series
analysis, models are developed to understand the underlying causes of the data. By
analyzing outcomes, you can understand "why" they occur. As a result, forecasting
takes the next step of extrapolating the future from the knowledge derived from the
past.
Akaike Information Criterion
The AIC of a model is equal to AIC = 2k − 2 ln(L), where k is the number of
parameters of the model and L is the maximum likelihood value for that
function. In general, we want to lessen the complexity of the model (i.e.,
lessen k) while increasing the likelihood/goodness-of-fit of the model (i.e.,
L). So we will favor models with smaller AIC values over those with greater
AIC values.
A likelihood function is a measure of how likely a particular set of
parameters for a function is relative to other parameters for that function
given the data. So imagine you're fitting a linear regression of y on x with the
following data:
x  y
1  1
2  2
3  3
If you were fitting this to the model y = b × x, your likelihood function would
tell you that an estimate of b = 1 was far more likely than an estimate of b =
0. You can think of likelihood functions as a tool for helping you identify the
most likely true parameters of a model given a set of data.
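A sketch of this comparison in Python (statsmodels OLS; the data are illustrative, roughly y = x with noise so that b = 1 has the highest likelihood):

    import numpy as np
    import statsmodels.api as sm

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # roughly y = x, with noise

    m1 = sm.OLS(y, x).fit()                    # y = b*x        (k = 1 parameter)
    m2 = sm.OLS(y, sm.add_constant(x)).fit()   # y = b0 + b1*x  (k = 2 parameters)

    # Favor the smaller AIC: the extra intercept must pay for itself
    # through a sufficiently higher likelihood.
    print(m1.params, m1.aic)
    print(m2.params, m2.aic)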
Forecasting with an AR(p) process
AR(p) Models Are Moving Window Functions
Forecasting many steps into the future
So far we have done a single-step-ahead forecast.
However, we may want to predict even further into the future.
Suppose we wanted to produce a two-step-ahead forecast
instead of a one-step-ahead forecast: what we would do is first
produce the one-step-ahead forecast, and then use it to furnish
the yt value we need to predict yt+1.
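A sketch of this recursion for an AR(1) (the coefficients b0 and b1 are illustrative; in practice they come from a fitted model):

    def forecast_ar1(y_last, b0, b1, steps):
        """Iterated multi-step forecasting: feed each one-step-ahead
        forecast back in as the 'previous value' for the next step."""
        forecasts = []
        y_hat = y_last
        for _ in range(steps):
            y_hat = b0 + b1 * y_hat   # the error term has expected value 0, so it drops out
            forecasts.append(y_hat)
        return forecasts

    print(forecast_ar1(y_last=2.0, b0=0.5, b1=0.7, steps=2))  # two-step-ahead forecast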
Moving Average Models
A moving average (MA) model relies on a picture of a process in which the
value at each point in time is a function of the recent past value “error”
terms, each of which is independent from the others. We will review this
model in the same series of steps we used to study AR models.
In many cases, an MA process can be expressed as an infinite order AR
process. Likewise, in many cases an AR process can be expressed as an
infinite order MA process
The model
A moving average model can be expressed similarly to an autoregressive
model except that the terms included in the linear equation refer to present
and past error terms rather than present and past values of the process
itself. So an MA model of order q is expressed as:
yt = μ + et + θ1 × et-1 + θ2 × et-2 + ... + θq × et-q
Do not confuse the MA model with a moving average. They are
not the same thing. Once you know how to fit a moving average
process, you can even compare the fit of an MA model to a moving
average of the underlying time series.
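In Python, an MA(q) model can be fit as an ARIMA(0, 0, q); a sketch with statsmodels on simulated MA(2) data (the θ values are illustrative), with a plain moving average alongside for contrast:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Illustrative MA(2) data: yt = mu + et + 0.6*et-1 + 0.3*et-2
    rng = np.random.default_rng(1)
    e = rng.normal(size=500)
    y = 5.0 + e[2:] + 0.6 * e[1:-1] + 0.3 * e[:-2]

    ma_fit = ARIMA(y, order=(0, 0, 2)).fit()   # MA(q) as ARIMA(0, 0, q)
    print(ma_fit.params)                       # estimates of mu, theta1, theta2, sigma^2

    # A moving average (the smoother) is NOT the MA model:
    rolling_mean = pd.Series(y).rolling(window=3).mean()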
Handling Missing Data
Backward Fill: each missing value is replaced by the next observed value
in the series. (Its counterpart, forward fill, carries the last observed
value forward.)
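A minimal pandas sketch of both fills (illustrative data):

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0])
    forward = s.ffill()    # carry the last observed value forward
    backward = s.bfill()   # pull the next observed value back
    print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, 5.0]
    # Caution: backward fill uses future values, which can leak
    # information when the series is used for forecasting.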
Autoregressive Integrated Moving Average Models
Now that we have examined AR and MA models individually, we look to
the Autoregressive Integrated Moving Average (ARIMA) model, which
combines these, recognizing that the same time series can have both
underlying AR and MA model dynamics. This alone would lead us to an
ARMA model, but we extend to the ARIMA model, which accounts for
differencing, a way of removing trends and rendering a time series
stationary.
What Is Differencing?
As discussed earlier in the book, differencing is converting a time series of
values into a time series of changes in values over time. Most often this is
done by calculating pairwise differences of adjacent points in time, so that
the value of the differenced series at a time t is the value at time t minus the
value at time t – 1. However, differencing can also be performed on
different lag windows, as convenient.
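A pandas sketch of both kinds of differencing (illustrative values):

    import pandas as pd

    y = pd.Series([10, 12, 15, 14, 18])
    d1 = y.diff()        # value at time t minus value at time t-1
    d2 = y.diff(2)       # differencing over a lag window of 2 instead
    print(d1.tolist())   # [nan, 2.0, 3.0, -1.0, 4.0]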
ARIMA models continue to deliver near state-of-the-art performance,
particularly in cases of small data sets where more sophisticated machine
learning or deep learning models are not at their best. However, even
ARIMA models pose the danger of overfitting despite their relative
simplicity.
Rolling-window analysis of a time-series model assesses the
stability of the model over time. A common time-series model
assumption is that the coefficients are constant with respect to
time.
Rolling Window regression
Performing a rolling regression (a regression with a rolling time window)
simply means that you conduct regressions over and over again, with
subsamples of your original full sample.
For example, you could perform the regressions using windows with a size
of 50 each, i.e., from 1:50, then from 51:100, etc.
Another approach would be to apply overlapping windows with a size of 50
each. So for example using 1:50, then 41:90 etc. (cutting off the last 10
elements in each succeeding subsample regression).
As a result you will receive a time series of your regression coefficients,
which you can then analyze.
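A sketch using statsmodels' RollingOLS (note that RollingOLS steps the window forward one observation at a time, i.e., 1:50, 2:51, ...; the non-overlapping variants described above would be done by subsetting manually; the data here are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.regression.rolling import RollingOLS

    rng = np.random.default_rng(0)
    x = rng.normal(size=300)
    y = 2.0 * x + rng.normal(size=300)   # illustrative data with a true slope of 2

    X = sm.add_constant(pd.Series(x, name="x"))
    res = RollingOLS(y, X, window=50).fit()
    print(res.params.tail())   # a time series of coefficients to analyze for stability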
Seasonal differencing
The seasonal difference of a time series is the
series of changes from one season to the next. For
monthly data, in which there are 12 periods in a
season, the seasonal difference of Y at period t
is Y(t)-Y(t-12). Seasonal differencing,
like nonseasonal differencing, can be performed as
an analysis option within the time series procedures.
If the seasonal difference of Y is "pure noise"
(constant variance, no autocorrelation, etc.), then Y
is described by a seasonal random walk model: each
value is a random step away from the value that
occurred exactly one season ago.
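A one-line pandas sketch (monthly data, so a season is 12 periods; the series is illustrative):

    import pandas as pd

    idx = pd.date_range("2020-01-01", periods=36, freq="MS")  # monthly data
    y = pd.Series(range(36), index=idx)
    seasonal_diff = y.diff(12)   # Y(t) - Y(t-12)
    print(seasonal_diff.dropna().head())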
Practicals

1. Moving Average:
2_GM_PracticalTimeSeriesAnalysis_Moving_Averages_Ch2-1.ipynb
2. Seasonal Differencing:
2_GM_Practical_TimeSeries_Seasonal_Differencing_Ch2-2.ipynb
Basic steps in a forecasting task (Day-6)
Basic steps in a forecasting task
A forecasting task usually involves five basic steps.
Step 1: Problem definition
Often this is the most difficult part of forecasting.
Defining the problem carefully requires an
understanding of the way the forecasts will be used,
who requires the forecasts, and how the forecasting
function fits within the organisation requiring the
forecasts. A forecaster needs to spend time talking
to everyone who will be involved in collecting data,
maintaining databases, and using the forecasts for
future planning.
Basic steps in a forecasting task
Step 2: Gathering information.
There are always at least two kinds of information required:
(a) statistical data, and
(b) the accumulated expertise of the people who collect the
data and use the forecasts.
Often, it will be difficult to obtain enough historical data to be
able to fit a good statistical model. Occasionally, old data will
be less useful due to structural changes in the system being
forecast; then we may choose to use only the most recent
data. However, remember that good statistical models will
handle evolutionary changes in the system;
don’t throw away good data unnecessarily.
Basic steps in a forecasting task
Step 3: Preliminary (exploratory) analysis.
Always start by graphing the data. Are there consistent patterns? Is there
a significant trend? Is seasonality important? Is there evidence of the
presence of business cycles?
Are there any outliers in the data that need to be explained by those with
expert knowledge? How strong are the relationships among the variables
available for analysis? Various tools have been developed to help with
this analysis.
Step 4: Choosing and fitting models.
The best model to use depends on the availability of historical data, the
strength of relationships between the forecast variable and any
explanatory variables, and the way in which the forecasts are to be used.
It is common to compare two or three potential models. Each model is
itself an artificial construct that is based on a set of assumptions
(explicit and implicit) and usually involves one or more parameters which
must be estimated using the known historical data.
Basic steps in a forecasting task
Step 5: Using and evaluating a forecasting model.
Once a model has been selected and its parameters
estimated, the model is used to make forecasts. The
performance of the model can only be properly evaluated
after the data for the forecast period have become available.
A number of methods have been developed to help in
assessing the accuracy of forecasts.
There are also organizational issues in using and
acting on the forecasts. When using a forecasting model in
practice, numerous practical issues arise such as how to
handle missing values and outliers, or how to deal with short
time series.
The statistical forecasting perspective
The thing we are trying to forecast is unknown (or we would not
be forecasting it), and so we can think of it as a random variable. For
example, the total sales for next month could take a range of possible
values, and until we add up the actual sales at the end of the month, we
don’t know what the value will be. So until we know the sales for next
month, it is a random quantity.
Because next month is relatively close, we usually have a good
idea what the likely sales values could be. On the other hand, if we are
forecasting the sales for the same month next year, the possible values it
could take are much more variable. In most forecasting situations, the
variation associated with the thing we are forecasting will shrink as the
event approaches. In other words, the further ahead we forecast, the
more uncertain we are.
We can imagine many possible futures, each yielding a different
value for the thing we wish to forecast. Plotted in black in the figure below
are the total international visitors to Australia from 1980 to 2015. Also
shown are ten possible futures from 2016–2025.
The statistical forecasting perspective
When we obtain a forecast, we are estimating the
middle of the range of possible values the random variable
could take. Often, a forecast is accompanied by a prediction
interval giving a range of values the random variable could
take with relatively high probability.
For example, a 95% prediction interval contains a
range of values which should include the actual future value
with probability 95%. Instead of plotting individual possible
futures as shown in the figure above, we usually show these
prediction intervals instead. The plot below shows 80% and
95% intervals for the future Australian international visitors.
The blue line is the average of the possible future values,
which we call the point forecasts.
The statistical forecasting perspective
We will use the subscript t for time. For example, yt will denote the
observation at time t. Suppose we denote all the information we have
observed as I, and we want to forecast yt. We then write yt | I, meaning “the
random variable yt given what we know in I”. The set of values that this
random variable could take, along with their relative probabilities, is
known as the “probability distribution” of yt | I. In forecasting, we call this
the forecast distribution.
When we talk about the “forecast”, we usually mean the average
value of the forecast distribution, and we put a “hat” over y to show this.
Thus, we write the forecast of yt as ŷt (y-hat), meaning the average of the
possible values that yt could take, given everything we know. Occasionally,
we will use ŷt to refer to the median (or middle value) of the forecast
distribution instead.
How to Interpret ARIMA Results
SARIMAX stands for Seasonal AutoRegressive
Integrated Moving Average with eXogenous
regressors.
• Dep. Variable – What we’re trying to predict.
• Model – The type of model we’re using. AR, MA,
ARIMA.
• Date – The date we ran the model
• Time – The time the model finished
• Sample – The range of the data
• No. Observations – The number of observations
How to Interpret ARIMA Results
Log-Likelihood

The log-likelihood function identifies a distribution that fits best with the
sampled data. While it’s useful, AIC and BIC punish the model for complexity,
which helps make our ARIMA model parsimonious.
Akaike’s Information Criterion (AIC) helps determine the strength of the linear
regression model. The AIC penalizes a model for adding parameters since
adding more parameters will always increase the maximum likelihood value.
Bayesian Information Criterion
Bayesian Information Criterion (BIC), like the AIC, also punishes a model for
complexity, but it also incorporates the number of rows in the data.
Hannan-Quinn Information Criterion
Hannan-Quinn Information Criterion (HQIC), like AIC and BIC, is another
criterion for model selection; however, it’s not used as often in practice.
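A sketch of producing such a results table with statsmodels (the series and the (1, 1, 1) order are illustrative):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(2)
    y = np.cumsum(rng.normal(size=200))   # illustrative nonstationary series

    fit = SARIMAX(y, order=(1, 1, 1)).fit(disp=False)
    print(fit.summary())   # Dep. Variable, Model, Date, Time, Sample, No. Observations,
                           # Log Likelihood, AIC, BIC, HQIC, coefficient table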
