
Time Series Forecasting

The function pd.plotting.lag_plot(df['TEMP'], lag=1) is used to create a lag plot for time series data. A lag plot helps visualize the relationship between a time series and its lagged version, which is useful for identifying patterns like autocorrelation in the data.
What is a Lag Plot?
A lag plot plots each data point in the time series against the previous data
point (for lag=1). If there is a strong correlation between successive values (i.e.,
TEMP(t) and TEMP(t-1)), the points will be aligned in a pattern (often linear). If
the points are scattered randomly, it suggests that there is little to no
correlation between the current and previous values.
Explanation of Key Elements:
 df['TEMP']: This is the time series data (in your case, temperature).
 lag=1: Specifies the lag interval. A lag=1 means each value is plotted
against the value that comes immediately before it (i.e., one step back in
the series).
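A minimal sketch of producing this plot (assuming df is a DataFrame with a 'TEMP' column, as in the notes):

import matplotlib.pyplot as plt
import pandas as pd

pd.plotting.lag_plot(df['TEMP'], lag=1)  # scatter of TEMP(t) against TEMP(t+1)
plt.title('Lag-1 plot of TEMP')
plt.show()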
How to Interpret the Plot:
 Strong autocorrelation: If the points on the plot form a clear diagonal
pattern (linearly or non-linearly), it indicates that there is a strong
relationship between the current value and the previous value of the
time series.
 Weak/no autocorrelation: If the points are scattered without any clear
pattern, it suggests little or no relationship between successive values.
Example Interpretation:
 If your temperature data (TEMP) shows a linear pattern in the lag plot, it
suggests that the current temperature values are highly correlated with
the previous temperature values. This can indicate some predictability or
persistence in the time series.
 If the plot is more of a random scatter, it implies that there is little to no
autocorrelation, meaning the current values do not depend much on
past values.
When to Use Lag Plots?
 Detecting autocorrelation: They are typically used to check for
autocorrelation, which is common in time series data like temperatures,
stock prices, etc.
 Checking stationarity: A lag plot can give hints on whether the data is
stationary or non-stationary (i.e., whether the statistical properties of
the data change over time).

Note: The past values are known as lags, so t-1 is lag 1, t-2 is lag 2, and so on.

In your code, you're testing various imputation methods for handling missing temperature (TEMP) values on a subset of the data. Let's break down each technique and how it works (a short code sketch follows the list):
1. Forward Fill (ffill):
 Method: The missing values are filled by propagating the last observed value
forward.
 Use Case: This works well when you assume that temperature doesn't change
drastically in a short period, i.e., the temperature in the next hour is likely close to
the last known value.

2. Rolling Mean with a Window = 3:


 Method: The missing values are imputed using the rolling mean (average of the last 3
values). The min_periods=1 ensures that if fewer than 3 values are available, it still
calculates the mean.
 Use Case: This smoothens out short-term fluctuations, which might be useful when
you want to account for gradual temperature changes over time.
3. Impute Data from 1 Year Ago:
 Method: Here, you're trying to fill missing values using the temperature from the
same time in the previous year. However, this method relies on the assumption that the temperature on a given date last year will be similar this year, which may not hold.
Issue: This method assumes you have data exactly one year prior for each missing value,
which might not always be available or applicable. Additionally, weather patterns can vary
significantly year-to-year, making this imputation less reliable.
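A minimal sketch of the three approaches above (assuming an hourly DataFrame df with a DatetimeIndex and a 'TEMP' column; the new column names are illustrative):

import pandas as pd

# 1. Forward fill: propagate the last observed value
df['TEMP_ffill'] = df['TEMP'].ffill()

# 2. Rolling mean with window=3; min_periods=1 lets early/partial windows still produce a value
rolling_mean = df['TEMP'].rolling(window=3, min_periods=1).mean()
df['TEMP_roll'] = df['TEMP'].fillna(rolling_mean)

# 3. Value from roughly one year earlier (8760 hours assumed for hourly data)
df['TEMP_lastyear'] = df['TEMP'].fillna(df['TEMP'].shift(365 * 24))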

A moving average in time series analysis is a method used to smooth out short-term fluctuations and highlight longer-term trends or cycles. It's a simple yet powerful technique often employed to identify trends, forecast future data points, and remove noise from the data.
Key Concepts:
1. Smoothing: By averaging a number of consecutive data points, you
reduce the effect of random fluctuations or noise, making the overall
trend easier to see.
2. Window Size (n): The number of consecutive data points considered for
calculating each average. The choice of window size depends on the
specific context and desired level of smoothing:
o A smaller window will be more sensitive to recent changes but
may capture more noise.
o A larger window will smooth the data more but may miss
important changes.

Types of Moving Averages:


1. Simple Moving Average (SMA): The average of the last n observations. It gives equal weight to all observations within the window.
SMAt = (Xt + Xt−1 + … + Xt−n+1) / n
where Xt is the data point at time t, and n is the window size.


2. Weighted Moving Average (WMA): A moving average where more
recent observations are given more weight than older ones. Weights can
be assigned linearly or exponentially, depending on the approach.
3. Exponential Moving Average (EMA): Similar to the weighted moving average, but the weights decay exponentially. The most recent data points have the highest weight, and the contribution of older data points diminishes quickly.
EMAt = α·Xt + (1 − α)·EMAt−1
where α is the smoothing factor (usually between 0 and 1).


Applications:
 Trend Identification: By smoothing out fluctuations, moving averages
help to clearly identify upward or downward trends in the data.
 Forecasting: Moving averages can be used as a baseline model for
predicting future values, assuming the trend continues.
 Stock Market: Commonly used in stock market analysis to track the
direction of stock prices over time (e.g., 50-day and 200-day moving
averages).
Example:

If you have daily sales data over 10 days:


X=[20,22,19,24,25,27,29,31,30,32]
A 3-day simple moving average (SMA) forecast for time t = 4 averages the first three observations:
SMA(4) = (20 + 22 + 19) / 3 = 20.33
This smooths the daily fluctuations by averaging them.
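A minimal sketch verifying the worked example with pandas (the series below is the sales data from the example):

import pandas as pd

sales = pd.Series([20, 22, 19, 24, 25, 27, 29, 31, 30, 32])
sma3 = sales.rolling(window=3).mean()          # simple moving average, equal weights
ema3 = sales.ewm(span=3, adjust=False).mean()  # exponential moving average, recent values weighted more
print(sma3.iloc[2])                            # 20.33 -> average of the first three observations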
Limitations:
 Lag: Moving averages introduce a lag because they are based on past
data, which can make them slower to react to recent changes.
 Window Size Sensitivity: Choosing an inappropriate window size may
either over-smooth or under-smooth the data.
Smoothing techniques in time series analysis help to reduce noise or random
fluctuations in data and reveal underlying patterns, such as trends or
seasonality. These techniques are crucial when analyzing time series data, as
they make it easier to identify key patterns that might otherwise be hidden due
to volatility in the data.
Here are some of the most commonly used smoothing techniques:
1. Moving Averages
Moving averages are simple techniques used to smooth data by averaging a set
number of consecutive data points. There are several variations:
 Simple Moving Average (SMA): Averages the last n data points, giving
equal weight to each.
 Weighted Moving Average (WMA): Assigns different weights to data
points within the window, with more recent points typically having
higher weights.
 Exponential Moving Average (EMA): Applies exponentially decreasing
weights, with the most recent data point having the highest weight. This
technique reacts more quickly to recent changes.
2. Exponential Smoothing
Exponential smoothing is a popular technique that uses weighted averages, but
with weights that decrease exponentially as observations get older.
 Single Exponential Smoothing (SES): Suitable for series without trend or
seasonality. It uses a smoothing factor (alpha) to give more importance
to recent observations.

 Double Exponential Smoothing: Accounts for trends by including a trend component. This method smooths both the level and trend over time.
 Triple Exponential Smoothing (Holt-Winters): Handles data with both
trends and seasonality by using three components—level, trend, and
seasonality.
3. LOESS (Local Regression)
LOESS (Locally Estimated Scatterplot Smoothing) is a non-parametric technique
that fits multiple regressions to localized subsets of the data. It's useful when
you don't want to assume a specific form for the trend (linear, quadratic, etc.).
 Lowess (Locally Weighted Scatterplot Smoothing) is a variant of LOESS
where weights are applied to points within the local neighborhood.
The key advantage of LOESS is its flexibility, as it adapts to complex patterns
and works well for non-linear relationships.
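A minimal sketch of LOWESS smoothing with statsmodels (assuming df has a numeric 'Value' column; frac controls how large each local neighbourhood is):

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

x = np.arange(len(df))
smoothed = lowess(df['Value'], x, frac=0.1, return_sorted=False)  # smoothed values aligned with x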

In time series analysis, checking for stationarity typically comes before applying any smoothing techniques. Here's why:
1. Stationarity is Essential for Analysis
Many time series models (most notably ARIMA and SARIMA) assume that the data is stationary, meaning that its
statistical properties—mean, variance, and autocorrelation—do not
change over time. If your series is non-stationary, it can lead to
inaccurate modeling and poor forecasting.
2. Smoothing is a Preprocessing Tool
Smoothing techniques like moving averages, exponential smoothing,
or LOESS are generally used for reducing noise or revealing trends.
Smoothing does not address stationarity issues like trends,
seasonality, or unit roots directly. It may help in visualizing the trend
or seasonality, but it doesn't remove them or make the data
stationary.
Process Overview:
1. Check for Stationarity First:
o Visual Inspection: Plot the data and look for signs of non-
stationarity (e.g., clear trends, seasonality, or varying
variance).
o Statistical Tests: Use tests like the Augmented Dickey-
Fuller (ADF) test or the KPSS test to formally check for
stationarity.
 If the data is non-stationary, steps like differencing,
detrending, or deseasonalizing may be needed to
make it stationary.
2. Apply Smoothing (if needed):
o After addressing stationarity, you can apply smoothing
techniques to reduce noise, detect trends, or visualize the
underlying structure of the data.
o In some cases, smoothing may make certain patterns
more apparent, but it doesn't solve stationarity issues.
When Smoothing Helps Before Stationarity:
In rare cases, smoothing can be applied to highlight trends or cycles
before stationarity checks. However, this is mainly for exploratory
analysis, not as a substitute for making the series stationary.
Example Workflow:
1. Check for stationarity with a test (e.g., ADF test):
o If non-stationary, perform transformations like
differencing or detrending.
o If stationary, proceed to modeling.
2. Apply smoothing to reduce noise or visualize the trend (if
needed) after stationarity is achieved.
Conclusion:
It’s generally better to check for stationarity first, transform the data
if needed, and then apply smoothing for noise reduction or
visualization. Smoothing doesn't directly make the series stationary;
it’s primarily used for detecting trends, not for making the data
suitable for modeling.

The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a given time series is stationary or not. It specifically tests for the presence of a unit root, which indicates non-stationarity.
Why Stationarity is Important:
A stationary time series has constant statistical properties over time
(such as mean, variance, and autocorrelation), which is a key
assumption for many time series models. If a time series is non-
stationary, these properties change over time, making it challenging
to model or forecast.
Key Concepts in ADF Test:
 Null Hypothesis (H₀): The series has a unit root (i.e., it is non-
stationary).
 Alternative Hypothesis (H₁): The series does not have a unit
root (i.e., it is stationary).
If the test statistic is sufficiently negative, it suggests that the null
hypothesis can be rejected, meaning the series is stationary.
ADF Test Procedure:
The ADF test expands on the simpler Dickey-Fuller test by including
lagged differences of the series to account for higher-order
autocorrelation. This prevents autocorrelation from biasing the test
results.

The regression estimated by the ADF test is:
ΔYt = α + βt + γ·Y(t−1) + δ1·ΔY(t−1) + δ2·ΔY(t−2) + … + δp·ΔY(t−p) + εt
Where:
 Yt is the time series at time t
 ΔYt=Yt−Yt−1 is the difference between consecutive values.
 α is a constant (optional).
 βt represents a time trend (optional).
 γ is the coefficient on the lagged level Y(t−1) of the series, which the test focuses on.
 δ1,δ2,.. are coefficients for the lagged differences of the series.
 ϵt is white noise.
Interpretation of ADF Test Results:
 Test Statistic: This value is compared to critical values from the Dickey-
Fuller distribution. If the test statistic is smaller than the critical value,
you reject the null hypothesis.
 p-value: If the p-value is below a significance level (usually 0.05), you
reject the null hypothesis, indicating that the series is stationary.
 Critical Values: These are threshold values at different significance levels
(1%, 5%, and 10%) that you compare against the test statistic.
Possible Outcomes:
1. Reject the null hypothesis (H₀): The series is stationary (no unit root is
present).
2. Fail to reject the null hypothesis: The series is non-stationary (a unit
root is present).
Example Workflow:
1. Check for stationarity using the ADF test.
2. If the series is non-stationary, apply transformations (e.g., differencing,
detrending).
3. Retest the stationarity of the transformed series.
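A minimal sketch of this workflow (assuming df has a numeric 'Value' column):

from statsmodels.tsa.stattools import adfuller

def adf_pvalue(series, name=''):
    stat, pvalue = adfuller(series.dropna())[:2]
    print(f'{name} ADF statistic: {stat:.3f}, p-value: {pvalue:.4f}')
    return pvalue

# 1. Test the raw series
p = adf_pvalue(df['Value'], 'raw')

# 2. If non-stationary, difference once and retest
if p >= 0.05:
    p = adf_pvalue(df['Value'].diff(), 'differenced')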

Simple moving average (SMA) and differencing are different techniques
used in time series analysis, although they both aim to handle certain aspects
of time series data.
1. Simple Moving Average (SMA):
 Purpose: SMA is primarily used for smoothing a time series by averaging
data points over a fixed window, which reduces noise and makes trends
more apparent.
 How it works: It calculates the average of a fixed number of previous
data points. For example, a 3-period moving average would calculate the
average of the last three observations for each point in time.
Formula for a 3-point SMA:
SMAt = (Yt−1 + Yt−2 + Yt−3) / 3
Where Yt is the original time series.
 Effect: Smoothing, but it doesn't make the series stationary or remove
trends directly. It only makes the overall trend smoother by averaging
out short-term fluctuations.
2. Differencing:
 Purpose: Differencing is specifically used to remove trends or
seasonality and make a non-stationary time series stationary.
 How it works: It calculates the difference between consecutive data
points. First-order differencing, for example, takes the difference
between the current value and the previous value to remove a linear
trend.
Formula for first-order differencing:
ΔYt=Yt−Yt−1
Where Yt is the original time series and ΔYt is the differenced series.
 Effect: Differencing can remove both trends and seasonal patterns,
which helps in achieving stationarity. It is a common technique to
transform a non-stationary series into a stationary one.
Key Differences:

Aspect | Simple Moving Average (SMA) | Differencing
Goal | Smooth the data and reduce noise | Remove trends and make the series stationary
How it works | Averages the values over a fixed window | Takes the difference between consecutive data points
Effect on trends | Retains long-term trends; only smooths short-term fluctuations | Removes trends and long-term patterns
Effect on stationarity | Does not directly address stationarity | Helps make the series stationary by removing trends or seasonality
Application | Often used for visualization and detecting trends | Commonly used as a preprocessing step for time series modeling
When to Use:
 Use SMA when you want to smooth the data and visualize the overall
trend without altering the original time series structure.
 Use differencing when your time series is non-stationary and you want
to remove trends or seasonality to meet the assumptions of certain time
series models (like ARIMA).
Example:
Suppose you have a time series with a clear upward trend:
1. Applying SMA will smooth the noise but the upward trend will remain.
2. Applying differencing will remove the trend, and the transformed series
may become stationary.
In Summary:
 SMA is a smoothing technique.
 Differencing is used to remove trends and achieve stationarity.
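A minimal sketch contrasting the two on an artificial trending series (illustrative data only):

import numpy as np
import pandas as pd

trend_series = pd.Series(np.arange(100, dtype=float) + np.random.normal(0, 2, 100))

smoothed = trend_series.rolling(window=5).mean()  # noise reduced, upward trend remains
differenced = trend_series.diff().dropna()        # trend removed; mean is roughly constant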

Seasonality in Time Series


Seasonality refers to patterns or cycles in a time series that repeat at regular
intervals due to seasonal factors like weather, holidays, or financial cycles.
These seasonal variations can help in forecasting by highlighting predictable
changes over time.
For example, sales of winter clothing typically peak during colder months, or
electricity usage might increase during the summer due to air conditioning
demands. Detecting and analyzing seasonality allows us to model and predict
these variations.
How to Detect Seasonality
1. Visual Inspection (Time Series Plot)
o Plot the time series data to visually inspect recurring patterns.
Seasonal fluctuations often appear as consistent peaks and
troughs.
o Example tools:
 matplotlib or plotly in Python for visualization.
 Seaborn’s line plots.
2. Autocorrelation Function (ACF)
o The ACF measures the correlation between observations at
different lags. Peaks in the ACF at specific lags suggest seasonality.
o Example:
 Use the plot_acf() function from the statsmodels library in
Python to visualize the autocorrelations.
3. Decomposition (Additive/Multiplicative)
o Time series can be decomposed into three components:
 Trend: The long-term direction of the series.
 Seasonality: The repeating short-term cycle.
 Residual: The remainder after removing the trend and
seasonality.
o Use Seasonal Decomposition of Time Series (STL decomposition)
to separate these components. This can help identify the seasonal
component explicitly.
o Example:
 statsmodels.tsa.seasonal_decompose() in Python.
4. Fourier Transform
o Use a Fourier transform to detect dominant frequencies in the
data, which correspond to seasonal cycles. This is useful when
there are multiple overlapping seasonal patterns.
o Example:
 np.fft.fft() in Python to compute the Fourier transform.
5. Seasonal Subseries Plot
o Group the time series by the season (e.g., by month for monthly
data) and plot the values for each season over time. This highlights
the consistency of seasonal patterns.
Analyzing Seasonality
1. Seasonal Strength
o The strength of seasonality can be quantified by the variance of
the seasonal component relative to the residual variance. If the
seasonal variance is high, seasonality plays a major role in the time
series.
2. Modeling Seasonality
o If seasonality is detected, you can incorporate it into models:
 ARIMA/SARIMA models: These can handle seasonality by
adding seasonal terms (e.g., SARIMA adds seasonal
components to the ARIMA model).
 Exponential Smoothing (Holt-Winters): This method
includes seasonal components and is widely used for
forecasting seasonal data.
3. Periodicity Analysis
o If the period of the seasonality is unknown, you can use tools like
periodograms or Fourier analysis to determine the periodicity.
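A minimal sketch of two of the checks above (assuming a monthly series in df['Value']; period=12 is an assumption about the seasonal cycle):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import seasonal_decompose

plot_acf(df['Value'].dropna(), lags=36)  # spikes at lags 12, 24, 36 suggest yearly seasonality

result = seasonal_decompose(df['Value'], model='additive', period=12)
result.plot()                            # trend, seasonal and residual components
plt.show()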
For ACF and PACF, these two articles are good references:
https://towardsdatascience.com/significance-of-acf-and-pacf-plots-in-time-series-analysis-2fa11a5d10a8
https://www.kaggle.com/code/iamleonie/time-series-interpreting-acf-and-pacf/notebook

ACF (Autocorrelation Function) is a tool used to measure the correlation between a time series and its lagged versions over various lags. It helps in identifying patterns like seasonality, trend, and noise in time series data.
Key Concepts:
 Autocorrelation: Measures the correlation between a time series and its
lagged values.
 Lag: The time shift or displacement between data points in a time series.
The ACF plot is particularly useful for:
 Identifying seasonality: If there is a repeating pattern at regular
intervals, you'll see spikes at the corresponding lags.
 Determining the memory of the series: It shows how much the past
values of the series influence its future values.
Interpreting the ACF Plot
 Significant peaks: Lags with autocorrelation values above or below the
confidence interval lines (typically shown as a blue dashed line) are
statistically significant.
 Seasonality: If spikes occur at regular intervals (e.g., every 12 lags for
monthly data), this suggests a seasonal pattern.
 Gradual decay: A slow decay of autocorrelation over lags indicates the
presence of a trend in the data.
 Sharp cutoff: A sharp drop in autocorrelation after a few lags indicates a
short memory in the time series, often characteristic of stationary series.
Practical Uses of ACF:
1. Detecting seasonality: Regular peaks at specific lags indicate seasonal
cycles.
2. Model selection: ACF helps in deciding the lag orders for ARIMA models.
For example, in an MA(q) model the ACF cuts off after q lags, while the PACF (discussed below) is used to choose the AR order p.

The Autocorrelation Function (ACF) is a mathematical tool used to measure the correlation between a time series and its lagged values. It quantifies the relationship between an observation at a given time point and observations at previous time points.
The ACF provides information about the strength and direction of linear
dependence between the values of a time series at different lags. It helps
identify any patterns or dependencies in the data. The ACF values range from -
1 to 1, where:
 ACF value of 1 indicates a perfect positive correlation between the
current observation and the lagged observation.
 ACF value of -1 indicates a perfect negative correlation between the
current observation and the lagged observation.
 ACF value of 0 indicates no correlation between the current observation
and the lagged observation.
By plotting the ACF values against different lags, we can analyze the
autocorrelation structure of the time series and identify any significant patterns
or dependencies.

The Partial Autocorrelation Function (PACF) is used in time series analysis to measure the correlation between a time series and its lagged versions while controlling for the values of the intervening lags. It helps in identifying the order of an autoregressive (AR) model.
How is PACF Different from ACF?
 ACF measures the correlation between a time series and its lagged
values, considering all previous lags.
 PACF shows the direct correlation between a time series and its lagged
values, after removing the effects of any intermediate lags.
Key Usage of PACF:
 Identifying the AR order: The PACF plot is used to determine the number
of significant lags (lag order) to use in an autoregressive model (AR). If
the PACF plot cuts off after a few lags, it suggests an AR model with that
number of lags.
 Stationarity: Similar to ACF, PACF can help in diagnosing stationarity, as
non-stationary data will often show high PACF values for many lags.
Interpreting the PACF Plot:
1. Significant spikes: The first lag that shows a significant spike (above the
confidence interval) indicates the appropriate lag for an AR model.
2. Cutoff: If the PACF cuts off after a few lags, it indicates that an
autoregressive (AR) model with those lag orders is appropriate.
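A minimal sketch of plotting ACF and PACF side by side (assuming df['Value'] has already been differenced if needed):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(df['Value'].dropna(), lags=40, ax=axes[0])   # helps choose the MA order q
plot_pacf(df['Value'].dropna(), lags=40, ax=axes[1])  # helps choose the AR order p
plt.show()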

Simple Definition: PACF correlates a time series with a lagged copy of itself while removing the effects of any intermediate lags.

Time series decomposition involves breaking down a time series into its
fundamental components: trend, seasonality, and residuals (noise or
irregularities). This allows for better understanding and modeling of the data.
The most commonly used method for decomposition is based on moving
averages and is available in libraries like statsmodels.
Types of Decomposition:
1. Additive Decomposition: Assumes that the components add up to
create the time series.
o Y(t) = Trend(t) + Seasonality(t) + Residual(t)
2. Multiplicative Decomposition: Assumes that the components multiply
to create the time series.
o Y(t) = Trend(t) × Seasonality(t) × Residual(t)

Achieving stationarity in time series data is critical for many statistical models,
especially those in forecasting, such as ARIMA, where assumptions of
stationarity are often required. A stationary time series has properties like
constant mean, variance, and autocovariance over time.
Here are some methods to transform a non-stationary time series into a
stationary one:
1. Differencing
 Description: Subtract the previous observation from the current
observation to remove trends and make the series stationary.
 How to apply:
df_diff = df['Value'].diff().dropna()
 Order of differencing:
o First-order differencing: Removes linear trends.
o Second-order differencing: May be necessary if the trend is
quadratic or higher-order.
2. Transformation (Logarithm, Square Root, etc.)
 Description: Apply transformations to stabilize the variance
(heteroscedasticity).
 Log Transformation: Used to reduce large fluctuations in variance.
import numpy as np
df_log = np.log(df['Value'])
 Square Root Transformation: Works similarly to log transformation but is
less aggressive.
df_sqrt = np.sqrt(df['Value'])
 Power Transformation: The Box-Cox or Yeo-Johnson transformations can
stabilize variance and make the series more stationary.
from scipy.stats import boxcox
df_boxcox, lmbda = boxcox(df['Value'])
3. De-trending
 Description: Remove the underlying trend in the data.
 Fitting a regression: Fit a polynomial or linear regression model to the
series and then subtract the trend component from the original data.
from scipy import signal
detrended = signal.detrend(df['Value'])
4. Seasonal Decomposition
 Description: Remove seasonality by decomposing the series into trend,
seasonality, and residuals using methods like additive or multiplicative
decomposition.
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df['Value'], model='additive', period=12)
df_deseasonalized = df['Value'] - decomposition.seasonal
5. Moving Average Smoothing
 Description: Smooth out short-term fluctuations to capture long-term
trends. After smoothing, you can subtract the smoothed series from the
original to remove the trend.
df_smooth = df['Value'].rolling(window=12).mean()
df_detrended = df['Value'] - df_smooth
6. Difference from the Mean
 Description: Subtract the mean of the series from each data point to
stabilize the mean.
df_mean_diff = df['Value'] - df['Value'].mean()
7. Exponentially Weighted Moving Average (EWMA)
 Description: Use EWMA to smooth the time series and remove short-
term variations. Then subtract the smoothed series from the original to
remove the trend.
df_ewma = df['Value'].ewm(span=12).mean()
df_detrended_ewma = df['Value'] - df_ewma
8. Unit Root Test (Dickey-Fuller)
 Description: Use statistical tests like the Augmented Dickey-Fuller (ADF)
test to check stationarity. If the test statistic is greater than the critical
value, the series is non-stationary, and differencing or other techniques
may be needed.
from statsmodels.tsa.stattools import adfuller
result = adfuller(df['Value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
 If the p-value is less than 0.05, the series is stationary. If not, you may
need to apply differencing or other transformations.
9. Seasonal Differencing
 Description: Differencing at seasonal lags, i.e., subtracting the value from
the same point in the previous season.
o For monthly data, differencing at a lag of 12 would subtract each
value from its value 12 months ago:
df_seasonal_diff = df['Value'].diff(12).dropna()
10. Variance Stabilization Using the Hodrick-Prescott Filter
 Description: Decompose a time series into a trend and a cyclical
component using the Hodrick-Prescott filter to remove long-term trend.
from statsmodels.tsa.filters.hp_filter import hpfilter
cycle, trend = hpfilter(df['Value'], lamb=1600)  # lambda=1600 for quarterly data; adjust for other frequencies
df_detrended = df['Value'] - trend
11. Combining Techniques
 Sometimes a single method might not fully achieve stationarity. You
might need to:
o First, log transform to stabilize variance.
o Then, apply differencing to remove trends.
o Finally, seasonally difference to address seasonality.
How to Check for Stationarity:
Once you've applied a method, you should check whether the time series has
become stationary by:
 Plotting the series: Visually inspect for a constant mean and variance.
 Performing the ADF Test: To verify statistical stationarity.
 Plotting ACF/PACF: For stationarity, the autocorrelations should drop off
after a few lags and show no significant autocorrelation.

The KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test is another popular test for checking the stationarity of a time series. While the ADF (Augmented Dickey-Fuller) test tests the null hypothesis that a series has a unit root (i.e., it is non-stationary), the KPSS test has the opposite null hypothesis — it tests whether the time series is stationary around a deterministic trend.
Here’s how to use the KPSS test in Python:
Null and Alternate Hypotheses:
 Null Hypothesis (H0): The series is stationary.
 Alternative Hypothesis (H1): The series is non-stationary.
If the p-value is small (typically < 0.05), you reject the null hypothesis and
conclude that the series is non-stationary.
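A minimal sketch of running the KPSS test (assuming df['Value']; regression='c' tests level stationarity, 'ct' tests trend stationarity):

from statsmodels.tsa.stattools import kpss

stat, pvalue, lags, crit = kpss(df['Value'].dropna(), regression='c', nlags='auto')
print('KPSS statistic:', stat)
print('p-value:', pvalue)  # p < 0.05 -> reject H0 of stationarity -> series is non-stationary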

In summary, the ADF test's alternative hypothesis is that the series is (difference) stationary, while the KPSS test checks for trend stationarity in a series.

Types of Stationarity
 Strict Stationary: A strict stationary series satisfies the mathematical
definition of a stationary process. For a strict stationary series, the mean,
variance and covariance are not the function of time. The aim is to
convert a non-stationary series into a strict stationary series for making
predictions.

 Trend Stationary: A series that has no unit root but exhibits a trend is referred to as a trend stationary series. Once the trend is removed, the resulting series will be strict stationary. The KPSS test classifies a series as stationary in the absence of a unit root, which means the series can be strict stationary or trend stationary.
 Difference Stationary: A time series that can be made strict stationary by differencing falls under difference stationary. The ADF test is also known as a difference stationarity test.


It’s always better to apply both the tests, so that we are sure that the series is
truly stationary. Let us look at the possible outcomes of applying these
stationary tests.
 Case 1: Both tests conclude that the series is not stationary -> series is
not stationary
 Case 2: Both tests conclude that the series is stationary -> series is
stationary
 Case 3: KPSS = stationary and ADF = not stationary -> trend stationary,
remove the trend to make series strict stationary
 Case 4: KPSS = not stationary and ADF = stationary -> difference
stationary, use differencing to make series stationary
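A minimal sketch combining the two tests to classify a series according to the four cases above (assuming df['Value']):

from statsmodels.tsa.stattools import adfuller, kpss

adf_p = adfuller(df['Value'].dropna())[1]
kpss_p = kpss(df['Value'].dropna(), regression='c', nlags='auto')[1]

adf_stationary = adf_p < 0.05      # ADF: reject the unit-root null -> stationary
kpss_stationary = kpss_p >= 0.05   # KPSS: fail to reject stationarity -> stationary

if adf_stationary and kpss_stationary:
    print('Series is stationary')
elif not adf_stationary and not kpss_stationary:
    print('Series is not stationary')
elif kpss_stationary and not adf_stationary:
    print('Trend stationary: remove the trend')
else:
    print('Difference stationary: apply differencing')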

Making a Time Series Stationary


Now that we are familiar with the concept of stationarity and its different
types, we can finally move on to actually making our series stationary. Always
keep in mind that in order to use time series forecasting models, it is necessary
to convert any non-stationary series to a stationary series first.
Differencing
In this method, we compute the difference of consecutive terms in the series.
Differencing is typically performed to get rid of the varying mean.
Mathematically, differencing can be written as:
yt‘ = yt – y(t-1)
where yt is the value at a time t
Seasonal Differencing
In seasonal differencing, instead of calculating the difference between
consecutive values, we calculate the difference between an observation and a
previous observation from the same season. For example, an observation taken
on a Monday will be subtracted from an observation taken on the previous
Monday. Mathematically it can be written as:
yt‘ = yt – y(t-n)

Transformation
Transformations are used to stabilize the non-constant variance of a series.
Common transformation methods include power transform, square root, and
log transform.
The Box-Cox transformation is a technique used to stabilize variance and make
data more closely approximate a normal distribution. It's particularly useful
when the data exhibits skewness. The transformation is defined as follows:

y(λ) = (y^λ − 1) / λ, if λ ≠ 0
y(λ) = ln(y), if λ = 0
Where:
 y is the original data.
 λ is the transformation parameter that determines the power to which
all data points are raised.
Steps to Perform a Box-Cox Transformation:
1. Positive Data: The Box-Cox transformation can only be applied to
positive values. If the data has negative values or zeros, a constant must
be added to make all values positive.
2. Estimate Lambda: The transformation seeks to find an optimal value for λ. This value is usually determined via maximum likelihood estimation (MLE).
3. Apply the Transformation: Depending on the value of λ, you apply the corresponding formula to transform the data.

The Box-Cox transformation only searches for the best value of λ, which typically varies from −5 to 5. A value of λ is considered best if it can approximate the non-normal curve to a normal curve.
Does Box-Cox always work? No. Box-Cox does not guarantee normality because it never actually tests for normality; it only searches for the λ that gives the smallest standard deviation of the transformed data.
Therefore, it is absolutely necessary to always check the transformed data for
normality using a probability plot.
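A minimal sketch of a Box-Cox transform followed by a probability-plot check (assuming df['Value'] contains only positive values):

import matplotlib.pyplot as plt
from scipy import stats

transformed, lmbda = stats.boxcox(df['Value'])
print('Estimated lambda:', lmbda)

stats.probplot(transformed, dist='norm', plot=plt)  # points close to the line suggest near-normality
plt.show()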
AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are
statistical metrics used for model selection, particularly in time series and
regression analysis. Both criteria provide a way to evaluate and compare
models, balancing model fit and complexity to help avoid overfitting. Here’s a
breakdown of each:
1. Akaike Information Criterion (AIC)
 Definition: AIC estimates the relative quality of a statistical model for a
given set of data.
 Formula: AIC= 2k − 2ln(L) where:
o k is the number of parameters in the model,
o L is the likelihood of the model (how well it fits the data).
 Interpretation: Lower AIC values indicate a better model, considering
both goodness of fit and model complexity. Adding more parameters
improves fit but increases AIC, as AIC penalizes model complexity to
avoid overfitting.
 Usage: Use AIC to compare models; the model with the lowest AIC is
often preferred.
2. Bayesian Information Criterion (BIC)
 Definition: BIC is similar to AIC but applies a stricter penalty for the
number of parameters, especially as the dataset size increases.
 Formula: BIC = k ln(n) -2 ln(L) where:
o n is the number of observations in the dataset,
o k and L are as defined above.
 Interpretation: Like AIC, a lower BIC indicates a preferable model.
However, since BIC has a larger penalty for complexity when n is large, it
often selects simpler models than AIC.
 Usage: BIC is often preferred when working with large datasets, as it
favors simpler models more strongly.
When to Use AIC vs. BIC
 AIC is often chosen for exploratory analysis or when model fit is the
primary concern, as it is generally more permissive with adding
parameters.
 BIC is more conservative and is often chosen in large datasets or when
model simplicity is prioritized.
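A minimal sketch comparing two candidate ARIMA orders by AIC and BIC (assuming a univariate series df['Value']; the orders are illustrative):

from statsmodels.tsa.arima.model import ARIMA

for order in [(1, 1, 1), (2, 1, 2)]:
    fit = ARIMA(df['Value'], order=order).fit()
    print(order, 'AIC:', round(fit.aic, 2), 'BIC:', round(fit.bic, 2))  # lower is better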

ARIMA (AutoRegressive Integrated Moving Average) is a popular statistical model used for time series forecasting. It combines three components:
 AR (AutoRegressive): This component captures the relationship between
an observation and a number of lagged observations (previous time
points).
 I (Integrated): This part involves differencing the raw observations to
make the time series stationary, which means the statistical properties
(like mean and variance) are constant over time.
 MA (Moving Average): This component models the relationship
between an observation and a residual error from a moving average
model applied to lagged observations.
AIC (Akaike Information Criterion)
AIC is a measure used to evaluate how well a model fits a given dataset, while
penalizing for the number of parameters used in the model. The goal is to
identify the model that best explains the data without overfitting. The formula
for AIC is:
AIC = 2k − 2ln(L)
Where:
 k is the number of parameters in the model.
 L is the maximized likelihood of the model.
Interpretation of AIC
 Lower AIC values indicate a better-fitting model: When comparing
different models, the one with the lowest AIC is generally preferred.
 Penalty for complexity: The term 2k imposes a penalty for models with more parameters to discourage overfitting.
Automated ARIMA
Automated ARIMA (often referred to as auto-ARIMA) is a technique that
automatically selects the best ARIMA model parameters (p, d, q) based on the
AIC. Here’s a simplified approach to building an automated version of an
ARIMA model:
1. Data Preparation: Ensure your time series data is cleaned and formatted
correctly.
2. Stationarity Check: Use tests like the Augmented Dickey-Fuller test to
check for stationarity. If the series is not stationary, differencing will be
applied.
3. Model Selection: Use a library that implements auto-ARIMA, which will
iterate through possible combinations of p, d, and q values, fitting
models and calculating AIC for each.
4. Fit the Best Model: Select the model with the lowest AIC and fit it to the
data.
5. Forecasting: Use the fitted model for making future predictions.
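A minimal sketch using the pmdarima library's auto_arima (assuming a univariate series df['Value']; set seasonal=True and m to the season length for seasonal data):

import pmdarima as pm

model = pm.auto_arima(
    df['Value'],
    seasonal=False,
    stepwise=True,
    information_criterion='aic',
    trace=True,               # print the AIC of each candidate (p, d, q)
)
print(model.summary())
forecast = model.predict(n_periods=12)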

What is Average Forecasting?


Average forecasting calculates the mean of past data points and uses that
mean to predict future values. This method assumes that future observations
will continue to follow a stable, consistent pattern like the historical data.
Formula
The formula for average forecasting is:
Forecast(t+1) = (y1 + y2 + … + yt) / t, i.e., the mean of all observed values.

When to Use Average Forecasting:


 Stable Data: If the historical data shows no clear trend or seasonal
variation.
 Short-Term Forecasting: It works best for short-term predictions where
significant changes in the trend are unlikely.
 Baseline Forecast: It can serve as a simple baseline forecast to compare
more complex models.

Limitations:
 Ignores Trends and Seasonality: It assumes that future observations will
be similar to past averages, which can be problematic if the data has
trends or seasonality.
 Over-Simplification: It may not capture fluctuations or changes in the
data that more advanced forecasting methods might detect.
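A minimal sketch of this baseline (assuming df['Value']):

mean_forecast = df['Value'].mean()  # the same mean is used as the forecast for every future period
print('Average forecast:', round(mean_forecast, 2))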

Naive forecasting is a straightforward forecasting method used primarily for time series data. It involves predicting future values based on the most recent observation. This approach is particularly useful in contexts where historical data may not exhibit a clear trend or seasonality, making it difficult to apply more complex forecasting techniques.
Key Characteristics of Naive Forecasting
1. Simplicity: Naive forecasting is very easy to understand and implement.
It requires minimal statistical knowledge and computational resources.
2. Assumption of Stability: This method assumes that future values will be
similar to the most recent observed value, making it suitable for
stationary time series data.
3. No Historical Trends or Patterns: Naive forecasting does not consider
past trends, seasonal patterns, or other historical factors; it simply uses
the latest observation as the forecast.
How It Works
The basic idea of naive forecasting is to set the forecast for the next time
period equal to the most recent observed value. The formula can be expressed
as:
Forecast(t+1) = Actual(t)
Where:
 Forecast(t+1) is the predicted value for the next period.
 Actual(t) is the actual value observed in the most recent time period.
Applications
Naive forecasting can be applied in various domains, including:
 Sales Forecasting: Predicting future sales based on the latest sales
figures.
 Demand Planning: Estimating future demand for products or services.
 Inventory Management: Forecasting stock levels based on recent
inventory data.
 Weather Forecasting: Predicting future weather conditions based on the
most recent observations.
Advantages
 Easy to Implement: No need for complex calculations or software.
 Quick to Update: The forecast can be updated easily as new data
becomes available.
 Useful for Short-Term Forecasting: It can be effective in the short term
when trends are not apparent.
Limitations
 Ignores Trends and Seasonality: Naive forecasting may not perform well
when there are clear trends or seasonal patterns in the data.
 Limited Accuracy: It is often less accurate compared to more
sophisticated forecasting methods, especially for longer-term
predictions.
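A minimal sketch of a naive forecast (assuming df['Value'] is ordered in time):

naive_forecast = df['Value'].iloc[-1]  # forecast for the next period = most recent observation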
Seasonal naive forecasting is an extension of naive forecasting that is
specifically designed to handle time series data exhibiting seasonal patterns.
This method predicts future values based on the most recent observation from
the same season (or period) in the previous year. It is particularly useful for
data with clear seasonality, such as retail sales, temperature changes, and
tourism data.
Key Characteristics of Seasonal Naive Forecasting
1. Seasonal Focus: Unlike basic naive forecasting, which uses the last
observed value, seasonal naive forecasting uses the value from the same
season in the previous cycle (e.g., month, quarter) for predictions.
2. Simplicity: Similar to naive forecasting, it is easy to implement and
requires minimal computational effort.
3. Assumes Seasonal Stability: This method assumes that seasonal
patterns will continue to repeat over time.
How It Works
The basic idea is to set the forecast for a future time period equal to the last
observed value from the same season in the previous cycle.
For example, if you have monthly data, the forecast for January 2025 would be
equal to the value from January 2024.
Formula
The formula can be expressed as:
Forecast(t + h) = Actual(t + h − m)
Where:
 Forecast(t + h) is the predicted value for period t + h.
 Actual(t + h − m) is the value observed in the same season of the previous cycle.
 m is the length of the season (e.g., 12 for monthly data).
Applications
Seasonal naive forecasting can be applied in various domains, including:
 Retail: Predicting future sales based on sales figures from the same
month in the previous year.
 Tourism: Estimating future visitor numbers based on previous years’ data
for the same season.
 Agriculture: Forecasting crop yields based on historical seasonal data.
Advantages
 Simplicity: Easy to understand and implement.
 Effective for Seasonal Data: Performs well when data shows clear
seasonal patterns.
 Quick to Update: Forecasts can be updated easily as new data becomes
available.
Limitations
 Ignores Trends: Does not account for trends or changes outside the
seasonal pattern.
 Limited to Seasonal Data: Not suitable for time series data that do not
exhibit seasonality.
 Static Approach: Assumes that past seasonal patterns will hold true in
the future without considering other influencing factors.
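A minimal sketch of a seasonal naive forecast for monthly data (assuming df['Value'] and a season length m=12):

m = 12
seasonal_naive_forecast = df['Value'].iloc[-m:].values  # last full season, reused as the next season's forecast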
A drift model is a type of statistical model used in time series analysis to
capture the underlying trend or movement in the data over time. It is
particularly useful when you want to predict future values based on historical
data, especially when there is a systematic upward or downward trend (drift) in
the data.
Key Characteristics of Drift Models
1. Trend Representation: Drift models explicitly represent trends in time
series data, allowing for better predictions when the data shows
consistent growth or decline over time.
2. Simple Structure: The mathematical formulation of drift models is
typically straightforward, often using linear regression to model the
trend.
3. Adaptability: Drift models can be adjusted or updated as new data
points become available, allowing them to remain relevant over time.
How It Works
In a basic drift model, the predicted value for the next time step can be
expressed as:

Yt = μ + φ·Y(t−1) + εt
Where:
 Yt is the predicted value at time t.
 μ is the drift term, representing the average change per time step.
 φ is the coefficient that determines how much the previous observation influences the current one.
 εt is the error term, usually assumed to be normally distributed with mean zero.
Limitations
 Assumes Linear Trends: Drift models may not capture nonlinear trends
effectively.
 Sensitive to Outliers: Extreme values in the data can significantly affect
the model's predictions.
 Ignores Seasonality: Basic drift models do not account for seasonal
patterns unless specifically modified to do so.
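A minimal sketch of the classic drift forecast, which extends the average historical change from the last observation (assuming df['Value'] is an ordered series; h is the forecast horizon):

y = df['Value']
drift = (y.iloc[-1] - y.iloc[0]) / (len(y) - 1)  # average change per time step
h = 5
drift_forecast = [y.iloc[-1] + drift * step for step in range(1, h + 1)]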

https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/
The Holt-Winters model, also known as the Triple Exponential Smoothing
model, is a popular time series forecasting method that accounts for
seasonality, trends, and levels in the data.
For Triple Exponential Smoothing (Holt-Winters):
 You do not need to make the data stationary. This model can handle
non-stationary data with trend and seasonality by directly modeling
these components.

It is particularly effective for datasets that exhibit both trends and seasonal
patterns.
Key Features of the Holt-Winters Model
1. Level: The average value in the series at the current time.
2. Trend: The long-term increase or decrease in the data.
3. Seasonality: The repeating fluctuations over a specific period.
The Holt-Winters model consists of three main components:
 Level component (Lt)
 Trend component (Tt)
 Seasonal component (St)

Types of Holt-Winters Models


1. Additive Model: Used when the seasonal variations are roughly constant
over time.
 Use this when the magnitude of seasonal variations remains
relatively constant over time.
 Example: The increase in complaints during holiday seasons (like
November) is by a similar number every year.

o The forecast is calculated as: Ft+h = Lt + h·Tt + St+h−m, where m is the length of the season.


2. Multiplicative Model: Used when the seasonal variations change
proportionally to the level of the series.
 Use this when the magnitude of seasonal variations increases or
decreases proportionally with the level of the series (i.e., the higher the
trend, the larger the seasonal effect).

 Example: As your company grows, the complaints during holiday seasons become proportionally larger every year.
o The forecast is calculated as: Ft+h = (Lt + h·Tt) × St+h−m
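A minimal sketch of fitting Holt-Winters with statsmodels (assuming a monthly series df['Value']; use seasonal='mul' when seasonal swings grow with the level):

from statsmodels.tsa.holtwinters import ExponentialSmoothing

model = ExponentialSmoothing(
    df['Value'],
    trend='add',
    seasonal='add',
    seasonal_periods=12,
).fit()

forecast = model.forecast(12)  # forecast the next 12 periods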

Confidence Intervals: The return_conf_int=True argument in the predict method indicates that you want to receive the confidence intervals along with the forecasted values.
 A confidence interval gives a range of values within which we
expect the actual future values to fall, with a certain level of
confidence (commonly 95%).
 For example, if your forecasted value is 100 with a 95%
confidence interval of [90, 110], it means you can be 95%
confident that the actual value will lie between 90 and 110.
Lower and Upper Bounds: conf_int typically returns a DataFrame
with two columns:
 Lower Bound: The lower end of the confidence interval.
 Upper Bound: The upper end of the confidence interval.
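A minimal sketch of requesting confidence intervals from a fitted pmdarima model (assuming model is the auto_arima fit from the earlier sketch):

forecast, conf_int = model.predict(n_periods=12, return_conf_int=True, alpha=0.05)

lower_bound = conf_int[:, 0]  # lower end of each 95% interval
upper_bound = conf_int[:, 1]  # upper end of each 95% interval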

Facebook Prophet is a forecasting tool designed for producing high-quality forecasts for time series data that may exhibit various patterns. Here are some scenarios where using Facebook Prophet would be particularly beneficial:
1. Time Series Data with Seasonality
 If your data exhibits strong seasonal effects (e.g., daily, weekly,
yearly patterns), Prophet can model these seasonal
components effectively.
2. Missing Data
 Prophet can handle missing data points quite well. If your time
series has gaps or irregular intervals, Prophet can still generate
reliable forecasts.
3. Trend Changes
 When your time series data shows abrupt changes in trends
(e.g., a significant shift upward or downward), Prophet allows
you to add changepoints manually or automatically detect
them.
4. Holiday Effects
 If your data is affected by holidays or special events (e.g., retail
sales spikes during holidays), you can easily incorporate these
effects into the model.
5. Non-Expert Users
 Prophet is designed to be user-friendly and allows non-experts
to generate forecasts without deep knowledge of statistical
modeling. It requires minimal preprocessing and has simple
parameter tuning options.
6. Quick Prototyping
 If you need to develop a forecasting solution quickly, Prophet
allows for fast experimentation and can provide decent results
without extensive tuning.
7. Daily Observations
 It works particularly well with daily time series data. While it
can handle other frequencies, it's most effective with daily
observations.
8. Scalability
 If you're working with larger datasets or want to generate
forecasts for multiple time series simultaneously, Prophet is
designed to scale efficiently.
9. Interpretable Outputs
 The decomposition of the forecast into trend, seasonality, and
holidays makes it easier to understand the underlying factors
influencing the forecast, which can be valuable for decision-
making.
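A minimal sketch of using Prophet (assuming history_df is a DataFrame with the two columns Prophet requires: 'ds' for dates and 'y' for values):

from prophet import Prophet

m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(history_df)

future = m.make_future_dataframe(periods=30)  # 30 extra daily rows
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())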
