Time Series Forecasting
Note: The past values are known as lags, so t-1 is lag 1, t-2 is lag 2, and so on.
In your code, you're testing various imputation methods for handling missing temperature
(TEMP) values using a subset of the data. Let's break down each technique you're trying and
how it works:
1. Forward Fill (ffill):
Method: The missing values are filled by propagating the last observed value
forward.
Use Case: This works well when you assume that temperature doesn't change
drastically in a short period, i.e., the temperature in the next hour is likely close to
the last known value.
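A minimal sketch of forward fill with pandas — the DataFrame name `df` and the small example series are illustrative assumptions, not your actual data:

```python
import pandas as pd
import numpy as np

# Hypothetical hourly temperature readings with gaps
df = pd.DataFrame({'TEMP': [21.5, np.nan, np.nan, 22.1, np.nan, 21.8]})

# Propagate the last observed value forward into each gap
df['TEMP_ffill'] = df['TEMP'].ffill()
print(df)
```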
The Augmented Dickey-Fuller (ADF) test for stationarity is based on the regression:

$$\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \delta_2 \Delta Y_{t-2} + \dots + \epsilon_t$$

Where:
$Y_t$ is the time series at time $t$.
$\Delta Y_t = Y_t - Y_{t-1}$ is the difference between consecutive values.
$\alpha$ is a constant (optional).
$\beta t$ represents a time trend (optional).
$\gamma$ is the coefficient on the lagged level of the series, which the test focuses on.
$\delta_1, \delta_2, \dots$ are coefficients for the lagged differences of the series.
$\epsilon_t$ is white noise.
Interpretation of ADF Test Results:
Test Statistic: This value is compared to critical values from the Dickey-Fuller
distribution. If the test statistic is smaller than the critical value,
you reject the null hypothesis.
p-value: If the p-value is below a significance level (usually 0.05), you
reject the null hypothesis, indicating that the series is stationary.
Critical Values: These are threshold values at different significance levels
(1%, 5%, and 10%) that you compare against the test statistic.
Possible Outcomes:
1. Reject the null hypothesis (H₀): The series is stationary (no unit root is
present).
2. Fail to reject the null hypothesis: The series is non-stationary (a unit
root is present).
Example Workflow:
1. Check for stationarity using the ADF test.
2. If the series is non-stationary, apply transformations (e.g., differencing,
detrending).
3. Retest the stationarity of the transformed series.
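A hedged sketch of that workflow — the series `y` is an illustrative random walk, not your data; `adfuller` is from statsmodels:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Illustrative non-stationary series: a random walk
y = pd.Series(np.random.randn(200).cumsum())

# 1. Check for stationarity
stat, pvalue = adfuller(y)[:2]
print(f'Original series: p-value = {pvalue:.3f}')

# 2. If non-stationary, apply a transformation (here, differencing)
if pvalue > 0.05:
    y_diff = y.diff().dropna()
    # 3. Retest the stationarity of the transformed series
    stat, pvalue = adfuller(y_diff)[:2]
    print(f'Differenced series: p-value = {pvalue:.3f}')
```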
Yes, simple moving average (SMA) and differencing are different techniques
used in time series analysis: one smooths the series, while the other
transforms it toward stationarity.
1. Simple Moving Average (SMA):
Purpose: SMA is primarily used for smoothing a time series by averaging
data points over a fixed window, which reduces noise and makes trends
more apparent.
How it works: It calculates the average of a fixed number of previous
data points. For example, a 3-period moving average would calculate the
average of the last three observations for each point in time.
Formula for a 3-point SMA:

$$SMA_t = \frac{Y_{t-1} + Y_{t-2} + Y_{t-3}}{3}$$

Where $Y_t$ is the original time series.
Effect: Smoothing, but it doesn't make the series stationary or remove
trends directly. It only makes the overall trend smoother by averaging
out short-term fluctuations.
2. Differencing:
Purpose: Differencing is specifically used to remove trends or
seasonality and make a non-stationary time series stationary.
How it works: It calculates the difference between consecutive data
points. First-order differencing, for example, takes the difference
between the current value and the previous value to remove a linear
trend.
Formula for first-order differencing:

$$\Delta Y_t = Y_t - Y_{t-1}$$

Where $Y_t$ is the original time series and $\Delta Y_t$ is the differenced series.
Effect: Differencing can remove both trends and seasonal patterns,
which helps in achieving stationarity. It is a common technique to
transform a non-stationary series into a stationary one.
Key Differences:

| Aspect | Simple Moving Average (SMA) | Differencing |
|--------|-----------------------------|--------------|
| Goal | Smooth the data and reduce noise | Remove trends and make the series stationary |
| How it works | Averages the values over a fixed window | Takes the difference between consecutive data points |
| Effect on trends | Retains long-term trends, only smooths short-term fluctuations | Removes trends and long-term patterns |
| Effect on stationarity | Does not directly address stationarity | Helps make the series stationary by removing trends or seasonality |
| Application | Often used for visualization and detecting trends | Commonly used as a preprocessing step for time series modeling |
When to Use:
Use SMA when you want to smooth the data and visualize the overall
trend without altering the original time series structure.
Use differencing when your time series is non-stationary and you want
to remove trends or seasonality to meet the assumptions of certain time
series models (like ARIMA).
Example:
Suppose you have a time series with a clear upward trend:
1. Applying SMA will smooth the noise but the upward trend will remain.
2. Applying differencing will remove the trend, and the transformed series
may become stationary.
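A minimal sketch of this contrast — the trending series and all names here are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Illustrative series with a clear upward trend plus noise
rng = np.random.default_rng(0)
y = pd.Series(np.arange(100, dtype=float) + rng.normal(scale=2, size=100))

# 1. SMA smooths the noise, but the upward trend remains
sma = y.rolling(window=5).mean()

# 2. First-order differencing removes the trend; the result
#    fluctuates around the (constant) average step size
diff = y.diff().dropna()

print('SMA still rises over the sample:', sma.iloc[-1] - sma.iloc[4])
print('Differenced series mean (approx. slope):', diff.mean())
```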
In Summary:
SMA is a smoothing technique.
Differencing is used to remove trends and achieve stationarity.
Simple Definition: Correlate a time series with a lagged copy of itself, removing the
effects of the intermediate lags (this is the partial autocorrelation, PACF).
Time series decomposition involves breaking down a time series into its
fundamental components: trend, seasonality, and residuals (noise or
irregularities). This allows for better understanding and modeling of the data.
The most commonly used method for decomposition is based on moving
averages and is available in libraries like statsmodels.
Types of Decomposition:
1. Additive Decomposition: Assumes that the components add up to
create the time series.
o $Y(t) = \text{Trend}(t) + \text{Seasonality}(t) + \text{Residual}(t)$
2. Multiplicative Decomposition: Assumes that the components multiply
to create the time series.
o $Y(t) = \text{Trend}(t) \times \text{Seasonality}(t) \times \text{Residual}(t)$
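A short sketch using `seasonal_decompose` from statsmodels — the monthly frequency, `period=12`, and the synthetic series are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Illustrative monthly series: linear trend + yearly seasonality + noise
idx = pd.date_range('2018-01', periods=60, freq='MS')
rng = np.random.default_rng(0)
y = pd.Series(np.arange(60) + 10 * np.sin(2 * np.pi * np.arange(60) / 12)
              + rng.normal(size=60), index=idx)

# Additive decomposition into trend, seasonal, and residual components
result = seasonal_decompose(y, model='additive', period=12)
result.plot()  # one panel per component
```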
Achieving stationarity in time series data is critical for many statistical models,
especially those in forecasting, such as ARIMA, where assumptions of
stationarity are often required. A stationary time series has properties like
constant mean, variance, and autocovariance over time.
Here are some methods to transform a non-stationary time series into a
stationary one:
1. Differencing
Description: Subtract the previous observation from the current
observation to remove trends and make the series stationary.
How to apply:
```python
df_diff = df['Value'].diff().dropna()
```
Order of differencing:
o First-order differencing: Removes linear trends.
o Second-order differencing: May be necessary if the trend is
quadratic or higher-order.
2. Transformation (Logarithm, Square Root, etc.)
Description: Apply transformations to stabilize the variance
(heteroscedasticity).
Log Transformation: Used to reduce large fluctuations in variance.
```python
import numpy as np

df_log = np.log(df['Value'])
```
Square Root Transformation: Works similarly to log transformation but is
less aggressive.
```python
df_sqrt = np.sqrt(df['Value'])
```
Power Transformation: The Box-Cox or Yeo-Johnson transformations can
stabilize variance and make the series more stationary.
```python
from scipy.stats import boxcox

df_boxcox, lmbda = boxcox(df['Value'])
```
3. De-trending
Description: Remove the underlying trend in the data.
Fitting a regression: Fit a polynomial or linear regression model to the
series and then subtract the trend component from the original data.
```python
from scipy import signal

detrended = signal.detrend(df['Value'])
```
4. Seasonal Decomposition
Description: Remove seasonality by decomposing the series into trend,
seasonality, and residuals using methods like additive or multiplicative
decomposition.
```python
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(df['Value'], model='additive', period=12)
df_deseasonalized = df['Value'] - decomposition.seasonal
```
5. Moving Average Smoothing
Description: Smooth out short-term fluctuations to capture long-term
trends. After smoothing, you can subtract the smoothed series from the
original to remove the trend.
```python
df_smooth = df['Value'].rolling(window=12).mean()
df_detrended = df['Value'] - df_smooth
```
6. Difference from the Mean
Description: Subtract the mean of the series from each data point to center
the series around zero. Note that this removes only a constant offset, not a trend.
```python
df_mean_diff = df['Value'] - df['Value'].mean()
```
7. Exponentially Weighted Moving Average (EWMA)
Description: Use EWMA to smooth the time series and remove short-
term variations. Then subtract the smoothed series from the original to
remove the trend.
```python
df_ewma = df['Value'].ewm(span=12).mean()
df_detrended_ewma = df['Value'] - df_ewma
```
8. Unit Root Test (Dickey-Fuller)
Description: Use statistical tests like the Augmented Dickey-Fuller (ADF)
test to check stationarity. If the test statistic is greater than the critical
value, the series is non-stationary, and differencing or other techniques
may be needed.
```python
from statsmodels.tsa.stattools import adfuller

result = adfuller(df['Value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
```
If the p-value is less than 0.05, you reject the null hypothesis and can treat the
series as stationary. If not, you may need to apply differencing or other transformations.
9. Seasonal Differencing
Description: Differencing at seasonal lags, i.e., subtracting the value from
the same point in the previous season.
o For monthly data, differencing at a lag of 12 subtracts the value from
12 months earlier from each observation:
```python
df_seasonal_diff = df['Value'].diff(12).dropna()
```
10. Detrending Using the Hodrick-Prescott Filter
Description: Decompose a time series into a trend and a cyclical
component using the Hodrick-Prescott filter, then subtract the long-term trend.
```python
from statsmodels.tsa.filters.hp_filter import hpfilter

# lamb=1600 is the conventional choice for quarterly data; adjust accordingly
cycle, trend = hpfilter(df['Value'], lamb=1600)
df_detrended = df['Value'] - trend
```
11. Combining Techniques
Sometimes a single method might not fully achieve stationarity. You
might need to:
o First, log transform to stabilize variance.
o Then, apply differencing to remove trends.
o Finally, seasonally difference to address seasonality.
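A combined sketch under the same assumptions as the snippets above (a positive-valued `df['Value']` with monthly seasonality):

```python
import numpy as np

# 1. Log transform to stabilize variance (requires positive values)
y_log = np.log(df['Value'])

# 2. First-order differencing to remove the trend
y_diff = y_log.diff()

# 3. Seasonal differencing (lag 12) to address yearly seasonality
y_final = y_diff.diff(12).dropna()
```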
How to Check for Stationarity:
Once you've applied a method, you should check whether the time series has
become stationary by:
Plotting the series: Visually inspect for a constant mean and variance.
Performing the ADF Test: To verify statistical stationarity.
Plotting ACF/PACF: For stationarity, the autocorrelations should drop off
after a few lags and show no significant autocorrelation.
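For the ACF/PACF check, a minimal sketch with statsmodels' plotting helpers — the name `y_final` refers to the transformed series from the sketch above:

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# For a stationary series, the autocorrelations should decay quickly
plot_acf(y_final, lags=24)
plot_pacf(y_final, lags=24)
```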
Types of Stationarity
Strict Stationary: A strict stationary series satisfies the mathematical
definition of a stationary process. For a strict stationary series, the mean,
variance, and covariance are not functions of time. The aim is to
convert a non-stationary series into a strict stationary series for making
predictions.
Trend Stationary: A series that has no unit root but exhibits a trend is
referred to as trend stationary. Once the trend is removed, the
resulting series will be strict stationary. The KPSS test classifies a series
as stationary based on the absence of a unit root. This means that the series can
be either strict stationary or trend stationary.
Transformation
Transformations are used to stabilize the non-constant variance of a series.
Common transformation methods include power transform, square root, and
log transform.
The Box-Cox transformation is a technique used to stabilize variance and make
data more closely approximate a normal distribution. It's particularly useful
when the data exhibits skewness. The transformation is defined as follows:
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y), & \lambda = 0 \end{cases}$$

Where:
y is the original data.
λ is the transformation parameter that determines the power to which
all data points are raised.
Steps to Perform a Box-Cox Transformation:
1. Positive Data: The Box-Cox transformation can only be applied to
positive values. If the data has negative values or zeros, a constant must
be added to make all values positive.
2. Estimate Lambda: The transformation seeks to find an optimal value for
λ. This value is usually determined via maximum likelihood
estimation (MLE).
3. Apply the Transformation: Depending on the value of λ, you
apply the corresponding formula to transform the data.
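A sketch of these steps with scipy — the skewed series and the shift constant are illustrative assumptions:

```python
import numpy as np
from scipy.stats import boxcox

# Illustrative right-skewed series
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=200)

# 1. Ensure positivity (add a constant if zeros or negatives are present)
if (y <= 0).any():
    y = y + abs(y.min()) + 1e-6

# 2. & 3. Estimate lambda by MLE and apply the transformation in one call
y_bc, lmbda = boxcox(y)
print('Estimated lambda:', round(lmbda, 3))
```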
Limitations:
Ignores Trends and Seasonality: It assumes that future observations will
be similar to past averages, which can be problematic if the data has
trends or seasonality.
Over-Simplification: It may not capture fluctuations or changes in the
data that more advanced forecasting methods might detect.
The seasonal naive method forecasts each period using the observed value from
the same period in the previous seasonal cycle:

$$\text{Forecast}(t + n) = \text{Actual}(t - m + n)$$

Where:
Forecast(t + n) is the predicted value for the next period.
Actual(t − m) is the actual value observed in the same period of the previous cycle.
n is the length of the season (e.g., 12 for monthly data).
m is the number of periods back to the same season (e.g., 12 for monthly data).
Applications
Seasonal naive forecasting can be applied in various domains, including:
Retail: Predicting future sales based on sales figures from the same
month in the previous year.
Tourism: Estimating future visitor numbers based on previous years’ data
for the same season.
Agriculture: Forecasting crop yields based on historical seasonal data.
Advantages
Simplicity: Easy to understand and implement.
Effective for Seasonal Data: Performs well when data shows clear
seasonal patterns.
Quick to Update: Forecasts can be updated easily as new data becomes
available.
Limitations
Ignores Trends: Does not account for trends or changes outside the
seasonal pattern.
Limited to Seasonal Data: Not suitable for time series data that do not
exhibit seasonality.
Static Approach: Assumes that past seasonal patterns will hold true in
the future without considering other influencing factors.
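A minimal seasonal naive sketch in pandas — the monthly frequency, season length of 12, and the synthetic sales series are assumptions:

```python
import numpy as np
import pandas as pd

# Illustrative monthly sales with yearly seasonality
idx = pd.date_range('2020-01', periods=36, freq='MS')
sales = pd.Series(100 + 20 * np.sin(2 * np.pi * np.arange(36) / 12), index=idx)

# Seasonal naive: each forecast is simply the value from 12 months earlier
forecast = sales.shift(12)
print(forecast.dropna().head())
```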
A drift model is a type of statistical model used in time series analysis to
capture the underlying trend or movement in the data over time. It is
particularly useful when you want to predict future values based on historical
data, especially when there is a systematic upward or downward trend (drift) in
the data.
Key Characteristics of Drift Models
1. Trend Representation: Drift models explicitly represent trends in time
series data, allowing for better predictions when the data shows
consistent growth or decline over time.
2. Simple Structure: The mathematical formulation of drift models is
typically straightforward, often using linear regression to model the
trend.
3. Adaptability: Drift models can be adjusted or updated as new data
points become available, allowing them to remain relevant over time.
How It Works
In a basic drift model, the predicted value for the next time step can be
expressed as:
$$Y_t = \mu + \phi Y_{t-1} + \epsilon_t$$

Where:
$Y_t$ is the predicted value at time $t$.
$\mu$ is the drift term, representing the average change per time step.
$\phi$ is the coefficient that determines how much the previous
observation influences the current observation.
$\epsilon_t$ is the error term, which is usually assumed to be normally
distributed with mean zero.
Limitations
Assumes Linear Trends: Drift models may not capture nonlinear trends
effectively.
Sensitive to Outliers: Extreme values in the data can significantly affect
the model's predictions.
Ignores Seasonality: Basic drift models do not account for seasonal
patterns unless specifically modified to do so.
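A minimal sketch of a random-walk-with-drift forecast, where the drift μ is estimated as the average historical change — the series and horizon are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Illustrative series with a systematic upward drift
rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(0.5 + rng.normal(size=100)))

# Estimate the drift mu as the average change per time step
mu = y.diff().mean()

# Forecast h steps ahead: last observation plus h times the drift
h = np.arange(1, 13)
forecast = y.iloc[-1] + h * mu
print(forecast[:3])
```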
https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/
The Holt-Winters model, also known as the Triple Exponential Smoothing
model, is a popular time series forecasting method that accounts for
seasonality, trends, and levels in the data.
For Triple Exponential Smoothing (Holt-Winters):
You do not need to make the data stationary. This model can handle
non-stationary data with trend and seasonality by directly modeling
these components.
It is particularly effective for datasets that exhibit both trends and seasonal
patterns.
Key Features of the Holt-Winters Model
1. Level: The average value in the series at the current time.
2. Trend: The long-term increase or decrease in the data.
3. Seasonality: The repeating fluctuations over a specific period.
The Holt-Winters model consists of three main components:
Level component ($L_t$)
Trend component ($T_t$)
Seasonal component ($S_t$)
For the multiplicative seasonal form, the h-step-ahead forecast combines them as:

$$F_{t+h} = (L_t + h\,T_t) \times S_{t+h}$$
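A short sketch with statsmodels' `ExponentialSmoothing` — the additive trend, multiplicative seasonality, `seasonal_periods=12`, and the synthetic monthly series are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Illustrative monthly series with a trend and multiplicative seasonality
idx = pd.date_range('2018-01', periods=72, freq='MS')
t = np.arange(72)
y = pd.Series((50 + t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=idx)

# Fit Holt-Winters (triple exponential smoothing); no differencing needed,
# since the model handles trend and seasonality directly
model = ExponentialSmoothing(y, trend='add', seasonal='mul',
                             seasonal_periods=12).fit()
forecast = model.forecast(12)  # next 12 months
print(forecast.head())
```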