Business Forecasting
Talk @XIM University
When I go to any university and tell people that my job is time series forecasting
and machine learning, usually one of two things happens:
• "...so, can you predict the stock market?"
• "I'll tell you how, and we're all going to be rich!"
• El Niño
• Seismic events
❖ In ancient Babylon, forecasters would foretell the future based on the distribution of maggots in a rotten sheep's liver.
❖ Beginning in 800 BC, a priestess known as the Oracle of Delphi would answer questions about the future at the Temple of Apollo on Greece's Mount Parnassus.
Forecasters are to blame!
❖ Forecasters had a tougher time under the emperor Constantius, who
issued a decree in AD 357 forbidding anyone “to consult a soothsayer, a
mathematician, or a forecaster -- May curiosity to foretell the future be
silenced forever.”
❖ But it did nothing but pour with rain the whole time,
leaving her with a cold. Gabitova has asked the court to
order the weather service to pay the cost of her travel.
Reputations can be made and lost
Some Misconceptions (Low Expectations): Our forecasts will always be inaccurate, so we should
concentrate our efforts elsewhere.
“I think there is a world market for maybe five computers.” (Chairman of IBM, 1943)
“There is no reason anyone would want a computer in their home.” (President, DEC, 1977)
“There’s no chance that the iPhone is going to get any significant market share. No chance.”
(Steve Ballmer, CEO Microsoft, April 2007)
“We’re going to be opening relatively soon . . . The virus . . . will go away in April.”
(Donald Trump, February 2020)
"Prediction is very difficult, especially if it's about the future!" - Niels Bohr
Reputations can be made and lost
Some Misconceptions (High Expectations): If only we had the latest forecasting technology,
then all our problems could be solved.
• A discrete time series is one in which the set of time points at which observations are made is a discrete set (e.g., all of the above, including irregularly spaced data).
• A continuous time series is obtained when observations are made continuously over some time interval (e.g., an ECG trace).
• Forecasting is estimating how the sequence of observations will continue into the future (e.g., forecasting of major economic variables such as GDP, unemployment, inflation, exchange rates, production, and consumption).
• Forecasting is very difficult, since it’s about the future! (e.g., forecasts of daily cases of COVID-19)
Use of Time Series Data
• Trend (𝑇𝑡): a pattern exists when there is a long-term increase or decrease in the data.
• Seasonal (𝑆𝑡): a pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
• Cyclic (𝐶𝑡): a pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
• Additive decomposition: 𝑌𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝐶𝑡 + 𝐼𝑡, where 𝐼𝑡 is the irregular (remainder) component.
• Multiplicative decomposition: 𝑌𝑡 = 𝑇𝑡 × 𝑆𝑡 × 𝐶𝑡 × 𝐼𝑡
• A stationary series is roughly horizontal, has constant variance, and shows no patterns predictable in the long term.
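The additive decomposition above can be sketched in code. This is a minimal illustration, not a production routine: the trend is estimated with a centered moving average, the seasonal component as the average detrended value per season (the cyclic component is folded into the trend here), and the toy quarterly data is assumed purely for demonstration.

```python
# Minimal classical additive decomposition: Y_t = T_t + S_t + I_t.
# Trend via a centered moving average, seasonal via per-season averages.

def decompose_additive(y, m):
    n = len(y)
    half = m // 2
    # Centered moving average as the trend estimate (2xM MA when m is even).
    trend = [None] * n
    for t in range(half, n - half):
        if m % 2 == 0:
            w = y[t - half:t + half + 1]
            trend[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / m
        else:
            trend[t] = sum(y[t - half:t + half + 1]) / m
    # Seasonal component: average detrended value for each season.
    buckets = [[] for _ in range(m)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % m].append(y[t] - trend[t])
    seasonal = [sum(b) / len(b) for b in buckets]
    # Center the seasonal indices so they sum to zero.
    mean_s = sum(seasonal) / m
    seasonal = [s - mean_s for s in seasonal]
    return trend, seasonal

# Toy series: a linear trend plus a fixed quarterly pattern.
y = [10 + 0.5 * t + [2, -1, -2, 1][t % 4] for t in range(24)]
trend, seasonal = decompose_additive(y, 4)
print([round(s, 2) for s in seasonal])  # recovers [2.0, -1.0, -2.0, 1.0]
```

Because the toy series is exactly linear-plus-seasonal, the decomposition recovers the seasonal pattern exactly; on real data the components are only estimates.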
Stationary Time Series
Stationary Series:
• A series free from trend and seasonal patterns
• A stationary time series exhibits similar statistical behavior over time; this is often characterized by a probability distribution that is constant in time
• Transformations matter because some models return better results when the form is linear. In that case we want to transform the data into some linear form, and for that we need to understand the nature of the data.
Log Transformation
𝑦′(𝑡) = (𝑦(𝑡)^𝜆 − 1)/𝜆, if 𝜆 ≠ 0
Since lim_{𝜆→0} (𝑥^𝜆 − 1)/𝜆 = ln 𝑥,
𝑦′(𝑡) = log 𝑦(𝑡), if 𝜆 = 0
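The transformation above (commonly known as the Box-Cox family) is straightforward to implement. A minimal sketch; the sample values and choices of λ are illustrative:

```python
import math

# Box-Cox transform: y' = (y**lam - 1)/lam for lam != 0;
# as lam -> 0 this converges to the natural log.

def box_cox(y, lam):
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

y = [1.0, 10.0, 100.0]
print(box_cox(y, 0))    # log transform (lam = 0)
print(box_cox(y, 0.5))  # square-root-style transform
```

Note that for a small λ such as 1e-6, the λ ≠ 0 branch is numerically close to the log branch, which is the limit statement above in action.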
Air Passengers data
• Here we can see that the trend is increasing, and so is the magnitude of the fluctuations.
• There is seasonality: in specific months of each year the passenger count rises and then dips.
• The seasonal pattern is the same but its magnitude increases over time, which the model has to take into account.
Transformation on Air Passengers data
1. One-Step Forecasting
2. Multi-Step Forecasting
   - Incremental multi-step
In one-step forecasting we forecast only one point: the very next time step.
Multi-Step Forecast
— The term used is "forecast horizon": the number of steps into the future being forecast.
Remember: these are forecasting methods, not types of models.
Incremental Multi-Step Forecast
The model is trained on a fixed window of p days.
In the first iteration we forecast day 4. One more day is still to be forecast, so we take the (forecast) day 4 as an input.
Then, instead of growing the window, we eliminate the earliest observation to keep the window length p consistent.
This is how incremental multi-step forecasting works in a nutshell.
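The loop above can be sketched as follows. The "model" here is a hypothetical stand-in (a simple window mean); any trained one-step model could take its place:

```python
# Incremental (recursive) multi-step forecasting: a one-step model is applied
# repeatedly; each forecast is appended to the input window and the oldest
# value is dropped, so the window length p stays constant.

def one_step(window):
    # Hypothetical stand-in for any trained one-step model.
    return sum(window) / len(window)

def recursive_forecast(history, p, h):
    window = list(history[-p:])
    forecasts = []
    for _ in range(h):
        yhat = one_step(window)
        forecasts.append(yhat)
        window.append(yhat)  # feed the forecast back in as an input
        window.pop(0)        # drop the oldest point to keep p consistent
    return forecasts

print(recursive_forecast([1, 2, 3, 4, 5], p=3, h=2))
```

The design trade-off: recursive forecasting needs only a one-step model, but forecast errors compound because later steps consume earlier forecasts as inputs.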
Multi Output Forecast
Here, we feed the time series in and get all h days out at once.
Multi-step Forecasting
Type — Uses
Short-term & Medium-term — Budgeting and selecting new research for development projects
2. Data collection: Consists of obtaining the relevant history for the variable(s) to be forecasted,
including historical information on potential predictor variables.
3. Preliminary data analysis: Needed for selection of the appropriate forecasting model, to
identify the patterns such as trends, seasonal and other cyclic components. Numerical
summaries such as the sample mean, standard deviation, percentiles, stationarity, nonlinearity,
and autocorrelations need to be computed and evaluated. Unusual observations or potential
outliers need to be identified and flagged for possible further study. If predictor variables are
involved, scatter plots of each pair of variables should be examined.
Business Forecasting Process
4. Model selection and fitting: Choosing one or more forecasting models and fitting the
model to the data (estimating the unknown parameters of the model).
5. Forecast model deployment: Consists of using the model to forecast the future values of
the variable of interest for the customer.
6. Monitoring forecasting model performance: An ongoing activity after the model has been
deployed to ensure that the model is still performing satisfactorily. Monitoring of forecast
errors is an essential part of good forecasting system design.
Auto-regression Analysis
• Time series data are data collected on the same observational unit at multiple time periods.
• A forecasting model built on regression: can we predict the trend (USD vs. INR) at a future time, say 2025?
Some Notations and Concepts
• 𝑌𝑡 = Value of Y in a period t
• Data set [Y1, Y2, … YT-1, YT]: T observations on the time series random variable Y
There are four ways to represent time series data for auto-regression analysis
• Difference: the first difference of a series 𝑌𝑡 is its change between period 𝑡 and 𝑡 − 1, that is, 𝑦𝑡 = 𝑌𝑡 − 𝑌𝑡−1
• Percentage: 𝑦𝑡 = (𝑌𝑡 − 𝑌𝑡−1)/𝑌𝑡−1 × 100
Related Concepts and Notations
Assumptions
2. Stationarity: A time series 𝑌𝑡 is stationary if its probability distribution does not change over time,
that is, if the joint distribution of 𝑌𝑖+1 , 𝑌𝑖+2 , 𝑌𝑖+3 , … . 𝑌𝑖+𝑇 does not depend on 𝑖.
Stationarity implies that history is relevant: it requires the future to be like the past (in a probabilistic sense).
Auto-regression analysis assumes that 𝑌𝑡 is both uniform (regularly sampled) and stationary.
Auto-correlation coefficient
The correlation of a series with its own lagged values is called autocorrelation (also called serial
correlation)
Example
• For the given data, say ρ1 = 0.84 between two given consecutive years
o This implies that the Dollars per Pound is highly serially correlated
• Similarly, we can determine ρ2 , ρ3,…. etc.
Popular Forecasting Techniques
Random Walk
Introduction
• In time series analysis, a random walk is a stochastic process where future values are determined
by previous values plus a random shock.
• Understanding the random walk is crucial for modeling and forecasting in various fields.
What is it?
• It applies to those stocks that follow a random walk, i.e., whose price is not predictable.
• "Sometimes the best-fitting model is in fact a random walk."
• The series goes either up or down randomly, with a 50-50 chance of each direction.
• It is impossible to predict the next step, since the chances are 50-50.
Characteristics of a Random Walk
• Constant Drift:
• A random walk with drift exhibits a constant average rate of change over time.
• The drift reflects the average rate of change in the series.
• Unpredictable Movement:
• Future values of a random walk are unpredictable.
• Each step depends solely on the current position and a random shock, making forecasting challenging.
• No Auto-correlation in the Steps:
• The increments (steps) of a random walk have no autocorrelation.
• The correlation between increments at different lags is close to zero.
Gaussian Random Walk
• A Gaussian random walk is one in which the up and down moves are drawn from the Gaussian distribution
• Here the price does not move 1 unit up or down; it moves by any value drawn from the Gaussian distribution, hence the name Gaussian Random Walk:
𝑛𝑒𝑤 = 𝑜𝑙𝑑 + 𝑒
• Because of the unpredictability, all we know about the error is that 𝑒 ∼ 𝑁(0, 𝜎²)
• Something interesting that we can do with log
• The general formula for the new price (in a random walk with drift) is:
𝑛𝑒𝑤 = 𝑜𝑙𝑑 + 𝜇 + 𝑒
• 𝜇 = Drift, which controls the trend of the time series.
• 𝑒 = the noise, 𝑒 ∼ 𝑁(0, 𝜎²)
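The general formula above can be simulated directly. A minimal sketch; the start value, drift μ, and σ are illustrative choices:

```python
import random

# Gaussian random walk with drift: new = old + mu + e, where e ~ N(0, sigma^2).

def gaussian_random_walk(n, start=100.0, mu=0.1, sigma=1.0, seed=42):
    random.seed(seed)  # fixed seed so the path is reproducible
    path = [start]
    for _ in range(n - 1):
        path.append(path[-1] + mu + random.gauss(0.0, sigma))
    return path

path = gaussian_random_walk(250)
print(round(path[0], 2), round(path[-1], 2))
```

With μ = 0.1 the path drifts upward on average by 0.1 per step; setting μ = 0 gives a driftless Gaussian random walk.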
Naïve Forecasting Methods
Average Method
• Here, the forecasts of all future values are equal to the average (or "mean") of the historical data. If we let the historical data be denoted by 𝑦₁, …, 𝑦_𝑇, then we can write the forecasts as
ŷ_(𝑇+ℎ|𝑇) = ȳ = (𝑦₁ + ⋯ + 𝑦_𝑇)/𝑇.
• The notation ŷ_(𝑇+ℎ|𝑇) is a short-hand for the estimate of 𝑦_(𝑇+ℎ) based on the data 𝑦₁, …, 𝑦_𝑇.
Naive Method
• For naïve forecasts, we simply set all forecasts to be the value of the last observation.
That is,
ŷ_(𝑇+ℎ|𝑇) = 𝑦_𝑇.
• This method works remarkably well for many economic and financial time series.
Seasonal Naive Method
• A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to
the last observed value from the same season (e.g., the same month of the previous year). Formally,
the forecast for time 𝑇 + ℎ is written as
ŷ_(𝑇+ℎ|𝑇) = 𝑦_(𝑇+ℎ−𝑚(𝑘+1)),
where 𝑚 = the seasonal period, and 𝑘 is the integer part of (ℎ − 1)/𝑚 (i.e., the number of complete
years in the forecast period prior to time 𝑇 + ℎ ). This looks more complicated than it really is.
• For example, with monthly data, the forecast for all future February values is equal to the last
observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last
observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and
quarters, and for other seasonal periods.
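The three benchmark methods above can be sketched in a few lines; the toy quarterly series is assumed for illustration:

```python
# Average, naive, and seasonal naive forecasts. For the seasonal naive
# forecast, k = (h - 1) // m counts the complete seasonal cycles before
# time T + h, matching the formula in the text.

def average_forecast(y, h):
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    return [y[-1]] * h

def seasonal_naive_forecast(y, h, m):
    T = len(y)
    out = []
    for step in range(1, h + 1):
        k = (step - 1) // m
        out.append(y[T + step - m * (k + 1) - 1])  # -1 for 0-based indexing
    return out

y = [10, 20, 30, 40, 12, 22, 32, 42]        # two "years" of quarterly data
print(naive_forecast(y, 2))                 # [42, 42]
print(seasonal_naive_forecast(y, 4, m=4))   # [12, 22, 32, 42]
```

Simple as they are, these methods serve as the standard benchmarks: a more elaborate model is only worth its complexity if it beats them.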
Exponential Smoothing
Finally at Non-Naive Forecasting!
• Holt Model
• Trending, Non Seasonal (Average)
• Holt Winters
• Trending, Seasonal (Amazing)
The Simple Moving Averages
• Simple is simple. No argument.
• It is used to display jagged data in a smooth manner
• Helps to show the overall trend of the time series
• Used to convey the intuition of the data without showing much detail
• This is the Exponentially Weighted Moving Average (EWMA), where the weight assigned to a data point decreases exponentially the older it gets.
• The moving average is designed so that older observations are given lower weights. The weights fall exponentially as the data points get older – hence the name exponentially weighted.
Why Exponential?
Current Situation:
EWMA_𝑡 = 𝛼𝑥_𝑡 + (1 − 𝛼)x̄_(𝑡−1)
Which includes:
EWMA_𝑡 = 𝛼𝑥_𝑡 + (1 − 𝛼)[𝛼𝑥_(𝑡−1) + (1 − 𝛼)x̄_(𝑡−2)]
Which becomes:
EWMA_𝑡 = 𝛼𝑥_𝑡 + (1 − 𝛼)𝛼𝑥_(𝑡−1) + (1 − 𝛼)²x̄_(𝑡−2)
Again:
EWMA_𝑡 = 𝛼𝑥_𝑡 + (1 − 𝛼)𝛼𝑥_(𝑡−1) + (1 − 𝛼)²[𝛼𝑥_(𝑡−2) + (1 − 𝛼)x̄_(𝑡−3)]
Which becomes:
EWMA_𝑡 = 𝛼𝑥_𝑡 + (1 − 𝛼)𝛼𝑥_(𝑡−1) + (1 − 𝛼)²𝛼𝑥_(𝑡−2) + (1 − 𝛼)³x̄_(𝑡−3)
We can see that the weights decay as:
(1 − 𝛼) → (1 − 𝛼)² → (1 − 𝛼)³ → ⋯ → (1 − 𝛼)^(𝑡−1)
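The recursion above can be implemented directly. A minimal sketch, initializing the EWMA with the first observation (one common convention among several):

```python
# EWMA recursion: ewma_t = alpha * x_t + (1 - alpha) * ewma_{t-1}, which
# unrolls into exponentially decaying weights on older observations.

def ewma(x, alpha):
    out = [x[0]]  # initialize with the first observation
    for v in x[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(ewma(x, alpha=0.5))  # [1.0, 1.5, 2.25, 3.125]
```

A larger α tracks the latest observations more closely; a smaller α smooths more aggressively.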
SES Model
• A model with no trend and no seasonality
• As can be seen in the image, the forecast line goes in the upward direction.
• This is the result of intercept = level and slope = trend (better than the SES model, which is just a horizontal line).
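The level-plus-slope idea can be sketched as a minimal implementation of Holt's linear trend method; the smoothing parameters α and β and the initialization are illustrative choices:

```python
# Holt's linear trend method: a level and a slope are updated each step,
# and the h-step-ahead forecast is level + h * slope.

def holt(y, alpha=0.8, beta=0.2, h=3):
    level, slope = y[0], y[1] - y[0]  # simple initialization
    for v in y[1:]:
        prev_level = level
        level = alpha * v + (1 - alpha) * (level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
    return [level + (i + 1) * slope for i in range(h)]

# On a perfectly linear series, the forecasts continue the line.
print(holt([10, 12, 14, 16, 18], h=3))
```

Unlike SES, whose forecasts are a horizontal line at the last smoothed level, Holt's forecasts extend the estimated trend.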
Holt Winter Model
Definition
An autoregressive model (also called an AR model) is used to model the future behavior of time-ordered data, using data from its past behavior.
• Essentially, it is a linear regression of the variable on one or more of its own lagged values in a given time series:
𝑌𝑡 = 𝑓(𝑌_(𝑡−1), 𝑌_(𝑡−2), …, 𝑌_(𝑡−𝑝))
Auto-Regression Model for Forecasting
Definition
• A natural starting point for forecasting model is to use past values of 𝑌, that is, 𝑌𝑡−1 , 𝑌𝑡−2 , … to predict 𝑌𝑡 .
• An auto-regression is a regression model in which 𝑌𝑡 is regressed against its own lagged values.
• In 𝑝𝑡ℎ order auto-regression (denoted as AR(p)), 𝑌𝑡 is regressed against, 𝑌𝑡−1 , 𝑌𝑡−2 , … , 𝑌𝑡−𝑝
𝒑-th order Auto-regression Model
𝑌𝑡 = 𝛽₀ + Σ_{𝑖=1}^{𝑝} 𝛽𝑖 𝑌_(𝑡−𝑖) + 𝜀𝑡
where 𝛽₀, 𝛽₁, …, 𝛽𝑝 are called the autoregression coefficients and 𝜀𝑡 is the noise (residual) term; in practice it is assumed to be Gaussian white noise.
For example, AR(1) is 𝑌𝑡 = 𝛽₀ + 𝛽₁ 𝑌_(𝑡−1) + 𝜀𝑡
The task in AR analysis is to derive the "best" values for 𝛽𝑖, 𝑖 = 0, 1, …, 𝑝, given a time series 𝑌₁, 𝑌₂, …, 𝑌_(𝑇−1), 𝑌_𝑇
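Deriving the "best" β values can be done by ordinary least squares on the lagged values. A minimal sketch using numpy; the simulated AR(1) and its parameters are illustrative:

```python
import numpy as np

# Estimate AR(p) coefficients by OLS: regress Y_t on an intercept and
# its first p lags.

def fit_ar(y, p):
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i:len(y) - i] for i in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta  # [beta_0, beta_1, ..., beta_p]

# Simulate an AR(1) with beta_0 = 1, beta_1 = 0.6 and recover the coefficients.
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(500):
    y.append(1.0 + 0.6 * y[-1] + rng.normal(0, 0.5))
beta = fit_ar(y, p=1)
print(beta.round(2))  # close to [1.0, 0.6]
```

OLS is only one estimator here; Yule-Walker and maximum likelihood are common alternatives that give similar answers on long, stationary series.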
Computing AR Coefficients
• AR: autoregressive (lagged observations as inputs); I: integrated (differencing to make the series stationary); MA: moving average (lagged errors as inputs).
• The model is expressed as ARIMA 𝑝, 𝑑, 𝑞 where 𝑝, 𝑑 𝑎𝑛𝑑 𝑞 are integer parameter values that decide the structure of
the model.
• More precisely, 𝑝 𝑎𝑛𝑑 𝑞 are the order of the AR model and the MA model respectively, and parameter d is the level of
differencing applied to the data.
• For 𝑑 = 0 the model reduces to ARMA(𝑝, 𝑞): 𝑦𝑡 = 𝑐 + 𝜙₁𝑦_(𝑡−1) + ⋯ + 𝜙𝑝𝑦_(𝑡−𝑝) + 𝜀𝑡 + 𝜃₁𝜀_(𝑡−1) + ⋯ + 𝜃𝑞𝜀_(𝑡−𝑞), where 𝑦𝑡 is the actual value, 𝜀𝑡 is the random error at time 𝑡, and 𝜙𝑖 and 𝜃𝑗 are the coefficients of the model.
• It is assumed that 𝜀_(𝑡−1) = 𝑦_(𝑡−1) − ŷ_(𝑡−1) has zero mean with constant variance, and satisfies the i.i.d. condition.
• Three basic Steps: Model identification, Parameter Estimation, and Diagnostic Checking.
ARIMA model
[Figures: the data, its ACF plot, and its PACF plot]
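The ACF used in the identification step (and, analogously, the PACF) can be computed from scratch. A minimal sketch of the sample autocorrelation function; the white-noise example is illustrative:

```python
import numpy as np

# Sample autocorrelation function: correlation of the demeaned series
# with its own lagged values, normalized by the lag-0 value.

def sample_acf(y, nlags):
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.dot(y, y)
    return [np.dot(y[:-k], y[k:]) / denom if k else 1.0
            for k in range(nlags + 1)]

rng = np.random.default_rng(1)
white_noise = rng.normal(size=1000)
acf = sample_acf(white_noise, 3)
print([round(v, 2) for v in acf])  # lag 0 is 1; other lags near 0
```

In Box-Jenkins identification, a sharply cut-off ACF suggests the MA order q, while a sharply cut-off PACF suggests the AR order p.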
Multivariate Forecasting
VAR(p)
• VAR models (vector autoregressive models) are used for multivariate time series. The structure is that
each variable is a linear function of past lags of itself and past lags of the other variables.
• As an example suppose that we measure three different time series variables, denoted by x(t,1), x(t,2), and
x(t,3).
• Each variable is a linear function of the lag 1 values for all variables in the set.
VAR(p)
• In a VAR(2) model, the lag-2 values of all variables are added to the right sides of the equations. In the case of three x-variables (or time series), there would be six predictors on the right side of each equation: three lag-1 terms and three lag-2 terms.
• In general, for a VAR(p) model, the first p lags of each variable in the system would be used as
regression predictors for each variable.
• VAR models are a specific case of more general VARMA models. VARMA models for multivariate
time series include the VAR structure above along with moving average terms for each variable. More
generally yet, these are special cases of ARMAX models that allow for the addition of other predictors
that are outside the multivariate set of principal interest.
It arose from macroeconomic data, where large changes in the data permanently affect the level of the series:
𝐱_𝑡 = 𝜙𝐱_(𝑡−1) + 𝐰_𝑡
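The equation-by-equation structure of a VAR can be sketched with OLS. A minimal VAR(1) illustration using numpy; the simulated coefficient matrix is an assumption for the demo:

```python
import numpy as np

# VAR(1) by OLS: each variable is regressed on the lag-1 values of all
# variables; with 2-D targets, lstsq fits all equations at once.

def fit_var1(data):
    data = np.asarray(data, dtype=float)  # shape (T, n_vars)
    X = np.column_stack([np.ones(len(data) - 1), data[:-1]])
    Y = data[1:]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coefs  # row 0: intercepts; rows 1..n: lag-1 coefficients

# Simulate a stationary 2-variable VAR(1) and recover its coefficient matrix.
A = np.array([[0.5, 0.1], [0.2, 0.4]])
rng = np.random.default_rng(2)
x = np.zeros((800, 2))
for t in range(1, 800):
    x[t] = A @ x[t - 1] + rng.normal(0, 0.3, size=2)
coefs = fit_var1(x)
print(coefs[1:].T.round(2))  # should be close to A
```

For a VAR(p), the design matrix simply gains columns for lags 2 through p of every variable, exactly as the text describes.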
Autoregressive Neural Network
Neural Network
• A feed-forward neural network comprises a series of hidden layers between the input and output layers.
• The outputs of the nodes in one layer are inputs to the next layer. The inputs to each node are combined
using a weighted linear combination. The result is then modified by a nonlinear function before being
output.
𝑧𝑗 = 𝑏𝑗 + Σ_{𝑖=1}^{4} 𝑤_(𝑖,𝑗) 𝑥𝑖 .
• In the hidden layer, this is then modified using a nonlinear function such as a sigmoid,
𝑠(𝑧) = 1/(1 + 𝑒^(−𝑧)),
to give the input for the next layer. This tends to reduce the effect of extreme input values, thus making the
network somewhat robust to outliers.
ARNN for Forecasting
• With time series data, lagged values of the time series can be used as inputs to a neural network; this is typically called an autoregressive neural network (ARNN)
• ARNN(𝑝, k) is a feed-forward network with one hidden layer, with 𝑝 lagged inputs and 𝑘
nodes in the hidden layer.
• ARNN(𝑝, 0) model is equivalent to an ARIMA(𝑝, 0,0) model, but without the restrictions on the
parameters to ensure stationarity.
• With seasonal data, it is useful to also add the last observed values from the same season as
inputs.
• For time series, a common default for 𝑝 is the optimal number of lags (according to the AIC) for a linear AR(𝑝) model. If 𝑘 is not specified, it is set to 𝑘 = (𝑝 + 1)/2 (rounded to the nearest integer).
• When it comes to forecasting, the network is applied iteratively.
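An ARNN(p, k) with one sigmoid hidden layer can be sketched from scratch. A minimal illustration trained by plain gradient descent; the architecture sizes, learning rate, and epoch count are illustrative choices, and forecasting is applied iteratively as described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# ARNN(p, k): one hidden layer with k sigmoid nodes, p lagged inputs,
# a linear output node, trained by full-batch gradient descent on MSE.
def train_arnn(y, p=2, k=3, lr=0.05, epochs=2000, seed=0):
    y = np.asarray(y, dtype=float)
    X = np.array([y[t - p:t] for t in range(p, len(y))])
    target = y[p:]
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0, 0.5, (p, k)), np.zeros(k)
    W2, b2 = rng.normal(0, 0.5, k), 0.0
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                 # hidden layer
        err = (h @ W2 + b2) - target             # output error
        gW2, gb2 = h.T @ err / len(X), err.mean()
        gh = np.outer(err, W2) * h * (1 - h)     # backprop through sigmoid
        gW1, gb1 = X.T @ gh / len(X), gh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda w: float(sigmoid(np.asarray(w, dtype=float) @ W1 + b1) @ W2 + b2)

# Iterative forecasting: each prediction becomes an input for the next step.
y = [float(np.sin(0.3 * t)) for t in range(200)]
model = train_arnn(y, p=2, k=3)
window, forecasts = list(y[-2:]), []
for _ in range(3):
    nxt = model(window)
    forecasts.append(nxt)
    window = window[1:] + [nxt]
print([round(v, 2) for v in forecasts])
```

This toy net is trained far from optimally; the point is the structure — lagged inputs, one nonlinear hidden layer, and iterative application for multi-step forecasts.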
Forecast Evaluation
Performance metrics such as mean absolute error (MAE), root mean square error (RMSE), and mean absolute percent error (MAPE) are used to evaluate the performance of different forecasting models for the unemployment rate data sets:
𝑅𝑀𝑆𝐸 = √((1/𝑛) Σ_{𝑖=1}^{𝑛} (𝑦𝑖 − ŷ𝑖)²);
𝑀𝐴𝐸 = (1/𝑛) Σ_{𝑖=1}^{𝑛} |𝑦𝑖 − ŷ𝑖|;
𝑀𝐴𝑃𝐸 = (1/𝑛) Σ_{𝑖=1}^{𝑛} |(𝑦𝑖 − ŷ𝑖)/𝑦𝑖|,
where 𝑦𝑖 is the actual output, ŷ𝑖 is the predicted output, and 𝑛 denotes the number of data points.
By definition, the lower the value of these performance metrics, the better the performance of the forecasting model concerned.
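The three metrics can be computed directly; the toy actual/predicted values are illustrative:

```python
# RMSE, MAE, and MAPE as defined above. Note that MAPE is undefined
# whenever any actual value y_i is zero.

def rmse(y, yhat):
    return (sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)) ** 0.5

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    return sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

y, yhat = [100, 200, 300], [110, 190, 330]
print(round(rmse(y, yhat), 2), round(mae(y, yhat), 2), round(mape(y, yhat), 4))
# 19.15 16.67 0.0833
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free but breaks down near zero actuals — which is why all three are usually reported together.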
Quick Survey on Forecasting Tools
Reference Book