Study Notes For Business Forecasting
Introduction to Forecasting
Forecasting involves predicting future outcomes based on historical data and analysis. It is essential
in business for planning, decision-making, inventory management, and financial projections. There
are various strategies and models used depending on the context and type of data.
1. Qualitative Forecasting
o Relies on expert judgement, opinions, and market research rather than numerical data; useful when historical data is scarce or unreliable.
2. Quantitative Forecasting
o Uses historical numerical data and statistical models to project future values.
Time Series Models: Use past data points over time to predict future values.
o Moving Average: Averages data over a fixed number of past periods to smooth fluctuations.
o Seasonality: Identifying recurring patterns over a specific period (e.g., quarterly sales trends).
o Trend Analysis: Observing the overall direction of data over time to forecast growth or decline.
Causal Models: Assume that the variable to be forecasted is influenced by one or more other variables.
3. Scenario Forecasting
o Builds forecasts under several alternative future scenarios (e.g., best case, most likely, worst case) rather than a single point prediction.
4. Combination Forecasting
o Combines forecasts from several methods or models, which often improves accuracy over any single approach.
Evaluating Forecasts
1. Error Metrics:
o Mean Squared Error (MSE): Squares the errors before averaging, so larger deviations are penalised more heavily (a short numerical sketch follows this list).
2. Model Fit Issues:
o Overfitting: When a model is too complex and captures noise instead of the underlying trend.
o Underfitting: When the model is too simple and fails to capture the true data patterns.
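The MSE metric above can be computed directly. A minimal Python sketch, assuming NumPy is available; the sales figures are invented purely for illustration:

```python
import numpy as np

def mean_squared_error(actual, forecast):
    """Average of squared forecast errors; squaring penalises large deviations more."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean((actual - forecast) ** 2)

# Illustrative monthly sales (actual) against two competing forecasts
actual     = [120, 135, 150, 160]
forecast_a = [118, 138, 149, 163]   # small errors
forecast_b = [100, 160, 130, 190]   # larger errors

print(mean_squared_error(actual, forecast_a))  # 5.75
print(mean_squared_error(actual, forecast_b))  # 581.25
```

Comparing the two values shows how squaring magnifies the penalty on the larger errors of the second forecast.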
Challenges in Forecasting
Changing Environments: Economic, political, or technological shifts can make past data less
relevant.
Selection of Models: Choosing the wrong model can result in significant forecasting errors.
Topic 2: Decomposition of time series data
Introduction to Time Series Decomposition
Decomposition is a method used to break down time series data into its underlying components to
better understand the structure and patterns. Time series data consists of observations taken over
time at regular intervals (e.g., daily sales, monthly temperature).
Time series data can typically be decomposed into three primary components:
1. Trend (T)
o Definition: The long-term movement or direction in the data over a period of time.
2. Seasonality (S)
o Purpose: Captures periodic variations in data, often driven by external factors such
as weather, holidays, or business cycles.
3. Residual / Irregular (R)
o Purpose: Represents the noise or randomness in the data, often short-term and non-recurring.
1. Additive Decomposition
o Model: Y(t)=T(t)+S(t)+R(t)
o Explanation: The components are added together. This method is used when the size of the seasonal fluctuations and residuals stays roughly constant, regardless of the level of the trend.
2. Multiplicative Decomposition
o Model: Y(t)=T(t)×S(t)×R(t)
o Explanation: The components are multiplied together. This method is used when the
size of seasonal fluctuations and residuals increase or decrease in proportion to the
trend.
Steps in Decomposition
o Estimate the trend component, for example with a centred moving average over one full seasonal cycle.
o Calculate the seasonal indices by averaging the detrended data over the period where seasonality repeats (e.g., monthly for yearly seasonality).
o Subtract (additive model) or divide (multiplicative model) the trend and seasonal components from the original data to isolate the residual component.
o The remaining residual data can be analysed for any randomness or irregularities.
These steps are illustrated in the sketch below.
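A minimal sketch of classical decomposition, assuming pandas and statsmodels are installed; the monthly series is simulated purely for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: upward trend + yearly seasonality + noise
idx = pd.date_range("2020-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
t = np.arange(48)
y = pd.Series(100 + 0.5 * t                          # trend T(t)
              + 10 * np.sin(2 * np.pi * t / 12)      # seasonality S(t)
              + rng.normal(0, 2, 48),                # residual R(t)
              index=idx)

# Additive decomposition: Y(t) = T(t) + S(t) + R(t)
result = seasonal_decompose(y, model="additive", period=12)
print(result.seasonal.head(12))      # one full cycle of seasonal indices
print(result.trend.dropna().head())  # smoothed trend estimate
print(result.resid.dropna().head())  # leftover residual component
```

The additive model fits here because the simulated seasonal swing has constant size; model="multiplicative" would suit a series whose swings grow with the trend.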
Applications of Decomposition
Sales Forecasting: Helps businesses understand seasonality in sales and long-term trends for better inventory management.
Climate Data: Used to differentiate between long-term climate changes (trend) and recurring
weather patterns (seasonality).
Limitations of Decomposition
Assumes that the components are static over time (e.g., seasonality remains constant).
May not be suitable for highly complex or irregular time series data.
1. Smoothing Techniques
Introduction to Smoothing
Smoothing is a simple time series forecasting technique used to reduce noise and reveal
important patterns like trends and seasonality in data.
It works by averaging adjacent data points to smooth out short-term fluctuations, helping
highlight the overall direction of the data.
Types of Smoothing Methods
1. Simple Moving Average
o Definition: Averages the most recent fixed number of observations, giving each equal weight.
o Use case: Best for data without significant trends or seasonality. Helps reduce random variations.
o Limitations: Not ideal for time series with strong trends or seasonality as it lags behind the data.
2. Weighted Moving Average
o Definition: Assigns different weights to past data points, giving more importance to recent observations.
o Use case: Good for situations where recent data should have more influence on the forecast.
3. Simple Exponential Smoothing
o Definition: Averages past data points using exponentially decreasing weights, with more emphasis on recent data.
o Use case: Suitable for data with no trend or seasonality. Captures recent changes without lagging as much as moving averages.
A short sketch of the first and third methods follows this list.
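A minimal sketch, assuming pandas and statsmodels are available; the demand figures and the smoothing constant of 0.3 are illustrative choices, not recommendations:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Illustrative weekly demand with no clear trend or seasonality
demand = pd.Series([52, 48, 55, 50, 47, 53, 51, 49, 54, 50])

# Simple moving average over the last 3 periods (equal weights)
sma = demand.rolling(window=3).mean()
print(sma.tail(3))

# Simple exponential smoothing: exponentially decreasing weights on older data
fit = SimpleExpSmoothing(demand).fit(smoothing_level=0.3, optimized=False)
print(fit.forecast(1))   # one-step-ahead forecast
```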
Introduction to Holt-Winters
The Holt-Winters method (triple exponential smoothing) extends exponential smoothing to data that show both a trend and seasonality.
It combines three components: level (current value), trend, and seasonality, using separate smoothing equations for each.
1. Additive Holt-Winters
o Formula (seasonal update): S(t)=γ[Y(t)−L(t)]+(1−γ)S(t−p), where L(t) is the smoothed level, γ the seasonal smoothing constant, and p the seasonal period.
o Use case: Suitable when seasonality remains constant over time, regardless of the trend.
2. Multiplicative Holt-Winters
o Formula (seasonal update): S(t)=γ[Y(t)/L(t)]+(1−γ)S(t−p)
o Use case: Suitable when the magnitude of seasonality increases or decreases over time, and the seasonal effect is proportional to the level.
Smoothing Parameters
Level (α): Controls how much weight is placed on the current level compared to past data.
Trend (β): Controls how quickly the trend estimate responds to recent changes in direction.
Seasonality (γ): Controls how quickly the seasonal indices are updated with new observations.
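A hedged sketch of fitting Holt-Winters with statsmodels' ExponentialSmoothing class; the monthly sales series is simulated, and the smoothing constants α, β, γ are chosen automatically by the fit:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulated monthly sales with a trend and yearly seasonality
idx = pd.date_range("2020-01", periods=36, freq="MS")
t = np.arange(36)
sales = pd.Series(200 + 2 * t + 15 * np.sin(2 * np.pi * t / 12), index=idx)

# Additive Holt-Winters: seasonal swings roughly constant in size
hw_add = ExponentialSmoothing(sales, trend="add", seasonal="add",
                              seasonal_periods=12).fit()

# Multiplicative seasonality: swings proportional to the level
hw_mul = ExponentialSmoothing(sales, trend="add", seasonal="mul",
                              seasonal_periods=12).fit()

print(hw_add.params["smoothing_level"])   # fitted alpha
print(hw_add.forecast(6))                 # six months ahead
print(hw_mul.forecast(6))
```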
Applications
Business and Economics: Used for sales forecasting, demand prediction, and inventory management.
Climate Data: Smoothing temperature trends over time or predicting seasonal weather
patterns.
Finance: Stock market predictions based on trends and seasonality patterns in prices.
Advantages
Flexibility: Holt-Winters can adapt to various data patterns with both trends and seasonality.
Limitations
Assumes Fixed Patterns: Holt-Winters assumes trends and seasonal patterns remain
consistent over time, which may not always be the case.
Requires Parameter Tuning: Choosing appropriate smoothing constants (α, β, γ) can be
challenging and may require optimization.
1. Stochastic Models
Definition
Stochastic models describe processes that evolve over time with an element of randomness, rather than in a purely deterministic way.
They are often used to forecast time series data by accounting for both deterministic and random components.
Key Characteristics
Time-Dependence: They are primarily used to model processes that change over time, such
as stock prices, weather conditions, or economic trends.
Applications
Finance: Modelling stock prices (e.g., using the Geometric Brownian Motion in the Black-
Scholes model).
2. Auto-Regressive (AR) Models
Definition
Auto-regressive (AR) models are a class of stochastic models where future values of a time
series are expressed as linear combinations of its past values and a stochastic (random) error
term.
AR models are foundational in time series forecasting, where "auto" refers to the self-
dependence of the series.
AR(p) Model
o An AR(p) model uses the p most recent values of the series:
Y(t) = c + ϕ1·Y(t−1) + ϕ2·Y(t−2) + ... + ϕp·Y(t−p) + ϵ(t)
o ϕ1, ..., ϕp are the autoregressive coefficients and c is a constant.
o ϵ(t) is the error term, representing the randomness or noise in the model.
Key Concepts
Stationarity: For AR models, the time series should be stationary, meaning its statistical
properties (mean, variance) do not change over time.
Lag Length (p): The number of past terms (lags) used in the model. Selecting the optimal lag
length is critical to prevent overfitting or underfitting.
AR(1) Model
AR(1): Simplest form of auto-regressive model, where the current value depends only on the immediate past value: Y(t) = c + ϕ1·Y(t−1) + ϵ(t)
o Example: Stock price tomorrow depends on today’s price, plus a random shock.
AR(2) Model
AR(2): The current value depends on the two most recent past values: Y(t) = c + ϕ1·Y(t−1) + ϕ2·Y(t−2) + ϵ(t)
3. Methods of Estimation
Ordinary Least Squares (OLS): The most common method for estimating the coefficients
ϕ1,ϕ2,...ϕp , by minimizing the sum of squared errors between the observed and predicted
values.
Maximum Likelihood Estimation (MLE): Another method that maximizes the likelihood that
the observed data was generated by the given model.
Yule-Walker Equations: Solving these equations provides an alternative method for
estimating AR model parameters based on the autocovariance of the time series.
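The estimation methods above can be tried in a minimal sketch using statsmodels' AutoReg, which fits the coefficients by conditional least squares; the series is simulated so the true ϕ values are known:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(2) process: Y(t) = 0.6*Y(t-1) + 0.2*Y(t-2) + e(t)
rng = np.random.default_rng(1)
n = 300
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

# Fit an AR(2) model
res = AutoReg(y, lags=2).fit()
print(res.params)                        # constant, phi_1, phi_2 (near 0.6 and 0.2)
print(res.predict(start=n, end=n + 4))   # five-step-ahead forecasts
```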
4. Properties of AR Models
Autocorrelation Function (ACF):
o The ACF measures the correlation between a time series and its lagged values.
o In an AR model, the ACF tails off gradually as the lag increases, and this behaviour helps identify the order of the model.
Partial Autocorrelation Function (PACF):
o The PACF measures the correlation between a time series and a given lag after removing the influence of intermediate lags.
o In an AR model, the PACF cuts off after p lags, meaning the model is of order p if the PACF shows significant correlation only for the first p lags.
Order Selection Criteria:
o Akaike Information Criterion (AIC): Balances goodness of fit against the number of parameters; lower values indicate a better model.
o Bayesian Information Criterion (BIC): Like AIC, but BIC imposes a stronger penalty for model complexity (number of parameters).
The sketch below illustrates ACF/PACF inspection and automatic lag selection.
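A sketch of using ACF/PACF plots and an information criterion to choose the lag length, assuming statsmodels and matplotlib are available; the AR(2) series is simulated:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import ar_select_order

# Stationary series simulated from an AR(2) process
rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

# ACF tails off gradually; PACF should cut off after lag p (= 2 here)
plot_acf(y, lags=20)
plot_pacf(y, lags=20)
plt.show()

# Automatic lag selection by information criterion (BIC penalises extra lags more)
selection = ar_select_order(y, maxlag=10, ic="bic")
print(selection.ar_lags)   # lags chosen for the AR model
```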
5. Model Diagnostics
Residual Analysis: After fitting an AR model, it’s important to check if the residuals (errors)
behave like white noise, meaning they are uncorrelated and have constant variance. This
ensures the model captures all the patterns in the data.
6. Applications of AR Models
Finance: Forecasting stock prices, exchange rates, or interest rates based on historical data.
Energy Sector: Predicting electricity demand or prices based on past usage data.
7. Limitations of AR Models
Assumes Linearity: AR models assume a linear relationship between past and current values,
which may not hold in all cases.
Stationarity Requirement: Non-stationary data requires transformation (e.g., differencing)
before applying AR models.
Conclusion
Stochastic models, particularly auto-regressive models, provide a powerful framework for forecasting
time series data. AR models capture the dependence of current values on past observations, making
them widely applicable across finance, economics, and other fields. Understanding their
assumptions, estimation methods, and diagnostics is crucial for effective implementation.
1. Stationarity
Definition
A time series is said to be stationary if its statistical properties, such as the mean, variance,
and autocovariance, remain constant over time.
Stationarity is a crucial assumption for many time series models, including Moving Average
(MA) and Auto-Regressive (AR) models.
Types of Stationarity
Strict Stationarity: A series is strictly stationary if the joint distribution of any subset of the
series is the same, regardless of time shifts.
Weak or Second-Order Stationarity: A series is weakly stationary if the mean, variance, and
autocovariance are constant over time. This is the most common form of stationarity used in
time series analysis.
Constant Mean: The expected value of the series is constant, i.e., E[Y(t)] = μ
Constant Variance: The variance of the series does not change over time, i.e., Var[Y(t)] = σ².
Constant Covariance: The covariance between Y(t) and Y(t+k) depends only on the time
difference k, not on the actual time t.
Why Stationarity Matters
Model Simplicity: Many time series models (AR, MA, ARMA) assume stationarity because stationary processes are easier to model and forecast.
Predictability: Stationary series have properties that are constant over time, making it easier
to predict future values.
Testing for Stationarity
Visual Inspection: Plotting the series can reveal trends, seasonality, or changing variance, which indicate non-stationarity.
Statistical Tests:
o Augmented Dickey-Fuller (ADF) Test: Tests for the presence of a unit root, where the
null hypothesis is that the series is non-stationary.
o KPSS Test: Tests whether a time series is stationary around a deterministic trend.
Transformations to Achieve Stationarity
Differencing: Subtracting consecutive values of the series can help eliminate trends and make a series stationary.
Log Transformation: Applying a logarithmic transformation can stabilize the variance in time
series data.
Detrending: Removing a deterministic trend from the series can help achieve stationarity.
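A short sketch of testing for stationarity and applying first differencing, assuming pandas and statsmodels are installed; the random-walk series is simulated for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

# Simulated non-stationary series: a random walk with drift
rng = np.random.default_rng(3)
y = pd.Series(np.cumsum(rng.normal(0.2, 1.0, 200)))

# ADF test: null hypothesis = the series has a unit root (non-stationary)
print("ADF p-value:", adfuller(y)[1])   # large -> cannot reject non-stationarity

# KPSS test: null hypothesis = the series is stationary
print("KPSS p-value:", kpss(y, regression="c", nlags="auto")[1])

# First differencing removes the stochastic trend
dy = y.diff().dropna()
print("ADF p-value after differencing:", adfuller(dy)[1])   # small -> stationary
```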
2. Moving Average (MA) Models
Definition
Moving Average (MA) models are a type of stochastic time series model where the current
value of the series is modelled as a linear combination of past error terms (shocks or white
noise).
MA(q) Model
An MA(q) model refers to a moving average model of order q, where q is the number of past error terms used to predict the current value:
Y(t) = μ + ϵ(t) + θ1·ϵ(t−1) + θ2·ϵ(t−2) + ... + θq·ϵ(t−q)
o μ is the mean of the series and θ1, ..., θq are the moving average coefficients.
o q is the order of the MA process, representing how many past error terms influence the current value.
Key Concepts
White Noise: The error terms ϵ(t) are assumed to be independently and identically
distributed (i.i.d.) with a mean of 0 and constant variance.
Finite Memory: Unlike AR models, which have an infinite memory (due to past values), MA
models have finite memory. After q periods, past errors have no effect on the current value.
3. Interpretation of MA Models
In an MA model, the ACF cuts off after q lags. This means that autocorrelations exist up to
the q-th lag and are zero afterward. This is useful for identifying the order q of the MA
model.
In an MA model, the PACF exhibits a gradual decay rather than a sharp cutoff.
o MA(1): The current value of Y(t) depends on the current error ϵ(t) and the immediately previous error ϵ(t−1).
o MA(2): The current value of Y(t) depends on the errors at the current time and up to two previous time periods.
4. Estimation of MA Models
Maximum Likelihood Estimation (MLE): Commonly used to estimate the coefficients θ1, θ2, ..., θq in an MA model.
Least Squares Estimation: Another approach where coefficients are estimated by minimizing
the sum of squared errors between the observed and predicted values.
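A minimal sketch of estimating an MA(1) model by maximum likelihood, using statsmodels' ARIMA class with order (0, 0, 1); the series is simulated so the true θ is known:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an MA(1) process: Y(t) = e(t) + 0.7*e(t-1)
rng = np.random.default_rng(4)
e = rng.normal(size=301)
y = e[1:] + 0.7 * e[:-1]

# An MA(q) model is ARIMA(0, 0, q); coefficients are estimated by MLE
res = ARIMA(y, order=(0, 0, 1)).fit()
print(res.params)       # const, ma.L1 (near 0.7), sigma2
print(res.forecast(3))  # forecasts revert to the mean after q steps (finite memory)
```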
Invertibility of MA Models
Invertibility Condition: To ensure a unique and stable solution, MA models are required to
be invertible. This means the model can be written as an infinite AR process, allowing for
better parameter estimation.
Condition for Invertibility: The roots of the characteristic equation should lie outside the
unit circle.
5. Applications of MA Models
Finance: Forecasting short-term fluctuations in stock prices, interest rates, or currency
exchange rates.
6. Comparing AR and MA Models
Dependency:
o AR models express the current value as a linear combination of past values of the series itself.
o MA models express the current value as a linear combination of past error terms (shocks).
Memory:
o AR models have an infinite memory, meaning the entire past can influence the
current value.
o MA models have finite memory, meaning only a limited number of past errors affect
the current value.
ACF/PACF Behaviour:
o AR models have a gradually decaying ACF and a PACF that cuts off after p lags.
o MA models have an ACF that cuts off after q lags and a PACF that decays gradually.
7. Limitations of MA Models
Assumption of White Noise: MA models assume the error terms are white noise, which may
not always hold true in real-world data.
Conclusion
Stationarity is a critical assumption for time series modelling, ensuring that the statistical
properties of the series remain constant over time.
Moving Average (MA) models capture the relationship between a time series and past
random shocks (errors). MA models are useful for short-term forecasting and are commonly
applied in finance, economics, and engineering. Understanding the interplay between
stationarity and model behaviour is key to building accurate time series forecasts.
Topic 5: Non-Stationary models
1. Definition of Non-Stationarity
A time series is non-stationary when its statistical properties (mean, variance, and
autocovariance) change over time.
Non-stationary behaviour can arise due to trends, seasonality, or varying levels of volatility in the data.
Non-stationary models are required to analyse and forecast time series data where such
properties evolve over time.
2. Types of Non-Stationarity
1. Trend Non-Stationarity:
o Occurs when the series contains a long-term upward or downward movement, so its mean changes over time.
2. Seasonal Non-Stationarity:
o Occurs when a time series has a repeating pattern or seasonality that changes the
mean or variance at different intervals.
o Common in sales, weather, and production data where cyclical patterns recur over
time.
3. Variance Non-Stationarity:
o When the variance of the series increases or decreases over time (e.g., financial time
series like stock prices often exhibit changing volatility).
4. Structural Changes:
o Occurs when abrupt shifts (e.g., policy changes, financial crises, or regime changes) alter the level or behaviour of the series.
3. Testing for Non-Stationarity
Augmented Dickey-Fuller (ADF) Test: A hypothesis test used to detect the presence of a unit root, with the null hypothesis being that the series is non-stationary.
Phillips-Perron (PP) Test: Similar to ADF but more robust to autocorrelation and
heteroscedasticity in the error terms.
KPSS Test: Tests for stationarity around a deterministic trend. The null hypothesis is that the
series is stationary.
4. Models for Non-Stationary Data
1. Random Walk:
o A random walk is a special type of non-stationary model where the value of the series is the sum of its previous value and a random shock (error term): Y(t) = Y(t−1) + ϵ(t)
o The random walk model assumes no mean reversion, and past shocks accumulate, leading to non-stationarity.
o Often used in financial markets where stock prices and exchange rates follow random walks.
2. ARIMA (Auto-Regressive Integrated Moving Average):
o The integrated part (denoted as "I" in ARIMA) represents the number of differences required to transform a non-stationary series into a stationary one.
o ARIMA(p, d, q): p is the auto-regressive order, d the degree of differencing, and q the moving average order.
o Formula: Δ^d Y(t) = α0 + α1·Δ^d Y(t−1) + α2·Δ^d Y(t−2) + ... + ϵ(t) + θ1·ϵ(t−1) + ...
3. SARIMA (Seasonal ARIMA):
o SARIMA models extend ARIMA by adding terms to capture seasonal patterns in non-stationary data.
o s: Length of the seasonal cycle (e.g., 12 for monthly data with annual seasonality).
o Useful for modelling data with seasonal trends like sales, weather data, and production cycles.
4. Holt's Linear Trend Method:
o The model forecasts the level and the trend separately and is often used for series with trends but without seasonal patterns.
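A hedged sketch of fitting ARIMA and SARIMA models with statsmodels' SARIMAX class; the monthly series, the orders, and the seasonal period are illustrative choices, not prescriptions:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly series with a trend and yearly seasonality
idx = pd.date_range("2018-01", periods=72, freq="MS")
rng = np.random.default_rng(5)
t = np.arange(72)
y = pd.Series(50 + 0.8 * t + 12 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 3, 72), index=idx)

# Non-seasonal ARIMA(1, 1, 1): one difference (d = 1) removes the trend
arima_fit = SARIMAX(y, order=(1, 1, 1)).fit(disp=False)

# SARIMA(1,1,1)(1,1,1)_12: seasonal differencing handles the yearly pattern
sarima_fit = SARIMAX(y, order=(1, 1, 1),
                     seasonal_order=(1, 1, 1, 12)).fit(disp=False)

print(arima_fit.aic, sarima_fit.aic)   # compare fit via information criteria
print(sarima_fit.forecast(12))         # one year of forecasts
```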
5. Techniques for Handling Non-Stationarity
1. Differencing:
o First differencing: Subtract the value of the series at time t−1 from the value at time t: ΔY(t)=Y(t)−Y(t−1)
2. Detrending:
o Fit a deterministic trend (e.g., a regression on time) and subtract it from the series.
3. Transformation:
o Apply a variance-stabilising transformation, such as the logarithm, when the variance grows with the level of the series.
6. Applications of Non-Stationary Models
1. Finance:
o Non-stationary models are widely used in financial time series such as stock prices,
interest rates, and exchange rates, which often follow random walks or exhibit long-
term trends.
o ARIMA and random walk models are common tools for forecasting asset prices and
managing portfolio risk.
2. Macroeconomic Data:
o Economic indicators like GDP, inflation, and unemployment rates often exhibit trends
and require differencing or transformation to become stationary.
o Non-stationary models help forecast these variables and assess long-term policy
impacts.
3. Climate and Environmental Data:
o Climate models and temperature forecasts often use SARIMA models to capture seasonal effects and long-term non-stationary trends.
4. Retail and Demand Forecasting:
o Retail sales and demand data exhibit non-stationary patterns due to seasonal trends and growth.
o SARIMA models are used to forecast demand, manage inventory, and optimize
pricing strategies.
7. Limitations
Model Complexity: Non-stationary models, particularly SARIMA and ARIMA, can be more complex to estimate and interpret due to the need for differencing and seasonal components.
Overfitting Risk: Including too many parameters (e.g., high differencing orders or seasonal
terms) can lead to overfitting and reduce the model's predictive power.
Conclusion
Non-stationary models are essential for time series data that exhibit trends, seasonality, or
changing variance.
ARIMA and SARIMA models are widely used to handle non-stationary time series by
applying differencing to achieve stationarity.
Understanding and applying techniques like differencing, transformation, and detrending are
crucial to modelling non-stationary time series accurately.
Topic 6: Generalised Least Squares models
1. Introduction
Generalized Least Squares (GLS) is a statistical method used to estimate the parameters of a linear regression model when there is heteroskedasticity or autocorrelation in the errors (i.e., when the error terms are not independently and identically distributed).
In traditional Ordinary Least Squares (OLS), one of the key assumptions is that the errors are
homoscedastic (constant variance) and uncorrelated. GLS relaxes this assumption by
adjusting the model to account for these issues.
2. Situations Where OLS Assumptions Fail
1. Heteroskedasticity: When the variance of the error terms is not constant across observations.
2. Autocorrelation: When the error terms are correlated with one another, which often occurs in time series data.
3. Concept of GLS
GLS modifies the standard OLS approach by transforming the model to correct for
heteroskedasticity or autocorrelation in the residuals.
The key idea behind GLS is to apply a transformation to the regression equation that makes
the errors homoscedastic and uncorrelated, allowing for more efficient estimation of model
parameters.
4. OLS vs GLS
o OLS: Assumes the errors are homoscedastic and uncorrelated; minimizes the sum of squared residuals.
o GLS: Allows the errors to be heteroskedastic or correlated (autocorrelation); minimizes a weighted sum of squared residuals.
5. GLS Procedure
1. Error Structure:
o In GLS, we assume that the error terms have a non-constant variance or are correlated: Var(ϵ) = Σ, where Σ is a covariance matrix of the error terms, which may represent either heteroskedasticity or autocorrelation.
2. Transformation:
o The regression equation is transformed (e.g., pre-multiplied by a matrix based on Σ) so that the transformed errors are homoscedastic and uncorrelated.
3. Estimation:
o OLS is applied to the transformed model.
4. GLS Estimator:
o After the transformation, the GLS estimator becomes: β_GLS = (X′Σ⁻¹X)⁻¹X′Σ⁻¹Y
o This formula accounts for the covariance structure of the errors, leading to more efficient parameter estimates than OLS.
6. Feasible Generalized Least Squares (FGLS)
In practice, the exact structure of the covariance matrix Σ is often unknown. In such cases, Feasible Generalized Least Squares (FGLS) is used.
FGLS estimates the covariance matrix Σ from the data and then uses this estimate to perform GLS.
Steps:
1. Estimate the model by OLS and obtain the residuals.
2. Use the OLS residuals to estimate the covariance structure of the errors (heteroskedasticity or autocorrelation).
3. Re-estimate the model by GLS using the estimated covariance matrix.
FGLS is an iterative process, where the covariance matrix is refined at each step to improve
the model's efficiency.
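A minimal sketch of FGLS for AR(1)-autocorrelated errors using statsmodels' GLSAR, which iteratively estimates the error autocorrelation from OLS residuals; the data are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Simulated regression y = 2 + 1.5*x + e, with AR(1)-autocorrelated errors
rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 2.0 + 1.5 * x + e
X = sm.add_constant(x)

# OLS ignores the error structure (unbiased here, but inefficient)
ols_fit = sm.OLS(y, X).fit()

# Feasible GLS: estimate the AR(1) error correlation from OLS residuals, then re-fit
gls_model = sm.GLSAR(y, X, rho=1)
gls_fit = gls_model.iterative_fit(maxiter=5)

print(ols_fit.params, gls_fit.params)   # both near (2.0, 1.5)
print(gls_model.rho)                    # estimated error autocorrelation (near 0.7)
```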
7. Applications of GLS
Time Series Data:
GLS is particularly useful in time series data where autocorrelation is common, such as in economic and financial data.
Panel Data:
In panel data models, where observations are collected across individuals and time,
heteroskedasticity and cross-sectional dependence are common, making GLS suitable.
Econometrics:
GLS is widely used in econometrics for models with heteroskedastic errors, common in
macroeconomic and microeconomic data.
Advantages:
Provides efficient estimates of the model parameters when the error terms are
heteroskedastic or autocorrelated.
Can handle a broader range of data patterns, making it applicable in various fields.
Disadvantages:
Requires an accurate estimate of the error covariance structure (which may not always be
easy to obtain).
FGLS can be computationally intensive and iterative, and incorrect estimation of the error
structure may lead to inefficient estimates.
Conclusion
Generalized Least Squares (GLS) is an important technique when dealing with non-constant variance (heteroskedasticity) or correlated error terms (autocorrelation) in regression models.
By accounting for these issues, GLS produces more efficient parameter estimates than OLS,
making it useful in a variety of real-world applications, particularly in time series and panel
data analysis.
Understanding how to apply GLS and FGLS is crucial for analysing and forecasting data where
OLS assumptions are violated.
Topic 7: GARCH models
Introduction to GARCH
GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models describe time series whose variance (volatility) changes over time.
They are designed to capture and forecast changing volatility patterns, which are common in financial data such as stock prices, exchange rates, and interest rates.
Key features of financial time series that motivate GARCH models:
o Volatility Clustering: Financial time series often exhibit periods of high volatility
followed by periods of low volatility.
o Leverage Effect: Negative shocks may lead to higher volatility than positive shocks of
the same magnitude.
o Heavy Tails: Returns data may exhibit fat tails (i.e., extreme returns occur more
frequently than predicted by normal distributions).
Objective: GARCH models aim to capture the dynamic nature of volatility and provide better
forecasts of future volatility.
1. ARCH Model:
o The ARCH(q) model, introduced by Robert Engle in 1982, is the foundation for GARCH. The conditional variance depends on past squared shocks: σ²(t) = α0 + α1·ϵ²(t−1) + ... + αq·ϵ²(t−q)
2. GARCH Model:
o A GARCH(p, q) model adds lagged conditional variances, so volatility depends on both past shocks and past variances: σ²(t) = α0 + Σ αi·ϵ²(t−i) + Σ βj·σ²(t−j)
o α0 is a constant term.
o αi measure how strongly volatility reacts to past shocks, and βj measure how persistent volatility is.
Estimation and Diagnostics
1. Estimation:
o Parameters of the GARCH model (e.g., αi and βj) are typically estimated using
maximum likelihood estimation (MLE).
o Software packages like R, Python (statsmodels, arch), and MATLAB can be used for
estimation.
2. Diagnostics:
o Model Fit: Check if the GARCH model adequately captures volatility clustering and
other features in the data.
o Goodness-of-Fit Tests: Use statistical tests and information criteria (AIC, BIC) to
assess model performance.
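A short sketch of estimating a GARCH(1,1) model with the third-party arch package mentioned above; the return series here is simulated noise purely for illustration, so the fitted parameters are not meaningful:

```python
import numpy as np
import pandas as pd
from arch import arch_model   # third-party `arch` package

# Illustrative daily percentage returns (here just simulated noise)
rng = np.random.default_rng(7)
returns = pd.Series(rng.normal(0, 1, 1000))

# GARCH(1,1): conditional variance driven by the last shock and the last variance
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")
res = model.fit(disp="off")

print(res.params)               # mu, omega, alpha[1], beta[1]
fc = res.forecast(horizon=5)    # volatility forecasts
print(fc.variance.iloc[-1])     # 1- to 5-day-ahead conditional variance forecasts
```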
Applications of GARCH Models
Volatility Forecasting: GARCH models are used to forecast future volatility, which is crucial for
risk management, option pricing, and portfolio optimization.
Risk Management: Quantifying risk through Value-at-Risk (VaR) and Conditional Value-at-Risk
(CVaR) based on GARCH forecasts.
Financial Market Analysis: Analysing and predicting market behaviour, including asset pricing
and market efficiency.
Advantages:
Dynamic Volatility Modelling: Captures changing volatility over time, offering better forecasts
than models with constant variance.
Flexibility: Can be extended to include asymmetries and other features observed in financial
data.
Limitations:
Overfitting Risk: More complex specifications with many parameters can overfit the data, reducing out-of-sample forecast accuracy.
Conclusion
GARCH models are essential tools for analysing and forecasting volatility in financial markets.
By capturing volatility clustering and changing variance over time, GARCH models provide
valuable insights for risk management, financial analysis, and decision-making.
Understanding and applying various GARCH models and their extensions allows for more
accurate volatility forecasts and improved financial modelling.