Time Forecast Project SAMPLE REPORT
BUSINESS REPORT
NANDINI PRIYA M
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Overall Trend: The graph suggests a positive overall trend in soft drink sales over the roughly
15-year period (1980 to 1995). While there are fluctuations, the general direction is upward,
indicating increasing consumption of soft drinks over time.
Monthly Trends: The x-axis represents months (Jan to Nov), suggesting we're looking at monthly
sales data for each year.
Yearly Comparison: Each line represents a different year from 1980 to 1995. This allows us to
compare sales patterns across different years.
Overall Upward Trend: Most years show an overall upward trend from January to November. This
indicates a general increase in sales of the product over these months.
Seasonal Peak: There seems to be a noticeable peak in sales around September-November for
most years. This strongly suggests a seasonal effect, perhaps related to holiday shopping, a specific
event, or a change in weather.
Year-to-Year Variability: While there's a general upward trend and seasonal peak, there's
significant year-to-year variability. Some years show steeper increases, while others have more
moderate growth or even declines in certain months.
Multiplicative Decomposition
Problem 2
Data Pre-processing- Missing value treatment - Visualize the processed data - Train-test split
The graph visually separates the soft drink sales data into training and test sets, highlighting a clear upward
trend and seasonal patterns across both periods. The test data (orange) shows a continuation of the patterns
observed in the training data (blue), indicating the potential for effective forecasting.
The training time instances represent a contiguous sequence from 1 to 130, implying a time-series dataset
where each instance likely corresponds to a sequential time point. The test time instances, however, start
much later (256) and also form a contiguous sequence, indicating a gap between the training and testing
periods, which is crucial for evaluating a model's ability to forecast beyond the known data.
Problem 3
Model Building - Original Data- Build forecasting models - Linear regression - Simple Average -
Moving Average - Exponential Models (Single, Double, Triple) - Check the performance of the models
built
Linear Regression
The graph shows the linear regression forecast over the test period, attempting to predict the soft
drink sales trend. While the regression line captures a slight upward trend in the test data, it does
not reflect the significant seasonal fluctuations seen in both the training and test sets. This
suggests that a simple linear regression on time is not the most appropriate model for forecasting
this time series.
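As an illustration of why a regression on time yields a straight-line forecast, here is a minimal sketch in plain Python. It is not the code used for this report: the function names and the toy series are hypothetical, and it assumes the test period is numbered as a direct continuation of the training time index.

```python
def fit_time_trend(values):
    """Least-squares fit of values on the time index 1..n; returns (intercept, slope)."""
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_time_trend(intercept, slope, n_train, horizon):
    """Extend the fitted line over the next `horizon` time points (assumed contiguous)."""
    return [intercept + slope * (n_train + h) for h in range(1, horizon + 1)]


# Hypothetical toy series: a rising trend plus a bump every fourth period.
train = [10 + 2 * t + (5 if t % 4 == 0 else 0) for t in range(1, 13)]
intercept, slope = fit_time_trend(train)
trend_forecast = forecast_time_trend(intercept, slope, len(train), 4)
print(trend_forecast)
```

Every forecast step differs from the previous one by exactly the slope, so the recurring bump in the toy data never appears in the forecast.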
Naïve Forecast
The graph shows a "Naive Forecast" applied to the test data, where the forecast is a flat line using
the last observed value from the training set. This forecast fails to capture the seasonal fluctuations
and upward trend present in the test data, indicating that it's a poor fit for this time series. The Naive
Forecast's simplicity leads to significant underestimation of the true values.
The graph shows a Simple Average Forecast applied to the test data. The forecast is a horizontal line
representing the average of the training data. This forecast fails to capture both the upward trend
and the strong seasonality present in the test data. It consistently underestimates the actual values,
indicating that a simple average is not an appropriate model for this time series. The forecast's
flatness suggests it's ignoring the crucial temporal patterns, leading to poor predictive performance.
Both models exhibit extremely high RMSE values, indicating very poor predictive performance. The
Simple Average Model has a higher RMSE than the Naive Model, suggesting that, for this dataset,
using the last observed value (Naive Model) is better than using the average of the training data.
However, neither model is adequate for accurate forecasting: both fail to capture the underlying
patterns in the time series.
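The comparison above can be sketched in a few lines of plain Python. This is an illustrative sketch with hypothetical toy numbers, not the report's data or code; it only shows how the two flat forecasts and their RMSE are obtained.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_forecast(train, horizon):
    """Repeat the last training observation for every future step."""
    return [train[-1]] * horizon


def simple_average_forecast(train, horizon):
    """Repeat the mean of the training data for every future step."""
    mean = sum(train) / len(train)
    return [mean] * horizon


# Hypothetical upward-trending toy series.
train = [100, 120, 140, 160]
test = [180, 200, 220]

naive_rmse = rmse(test, naive_forecast(train, len(test)))
average_rmse = rmse(test, simple_average_forecast(train, len(test)))
print(naive_rmse, average_rmse)
```

On a series that trends upward, the last observation sits closer to the future values than the training mean does, which is why the naive forecast has the lower RMSE in this toy case.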
Moving Average
Naive and Simple Average Models Perform Poorly: Both the Naive Model and the Simple Average
Model have very high RMSE values, indicating significant inaccuracies in their predictions.
Trailing Moving Averages Improve Performance: The trailing moving average models, especially
the 2-point moving average, show a substantial improvement in RMSE compared to the naive and
simple average models.
Smaller Window Size is Better: The 2-point trailing moving average has the lowest RMSE among all
the models, suggesting that a smaller window size captures the patterns in the data more effectively.
As the window size increases (4, 6, 9 points), the RMSE also increases, indicating a decrease in
accuracy.
Moving Averages Still Have Limitations: While the moving averages perform better than the naive
and simple average, their RMSE values are still relatively high, suggesting that they might not be
capturing all the complexities of the time series, such as strong seasonality and potential trends.
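A trailing moving average can be sketched as follows. This is a plain-Python illustration on a hypothetical toy series, and it assumes the average at each time point includes the current observation and the preceding ones; the report does not state exactly how its moving averages were computed.

```python
def trailing_moving_average(values, window):
    """Mean of the current point and the previous window-1 points.

    Returns None for positions where the window is not yet full.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical steadily rising series.
series = [10, 20, 30, 40, 50, 60]
ma2 = trailing_moving_average(series, 2)
ma4 = trailing_moving_average(series, 4)
print(ma2)
print(ma4)
```

Under that assumption, a 2-point window stays within 5 units of the toy series while a 4-point window lags by 15, which would be one reason smaller windows show lower RMSE.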
Model Comparison
DES Performs Better: The Double Exponential Smoothing (DES) model, with an RMSE of
614456.734894, performs significantly better than the Simple Exponential Smoothing (SES) model,
which has an RMSE of 797167.792.
DES Captures Trend: The lower RMSE of DES suggests that it is better at capturing the trend
component in the data compared to SES. DES accounts for trend and level, while SES only accounts
for level.
Both Models Still Show High Error: While DES is an improvement over SES, both models still have
relatively high RMSE values. This suggests that neither model is ideal for forecasting this time series,
and more complex models that handle seasonality might be necessary.
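To make the level-plus-trend idea concrete, here is a minimal sketch of double exponential smoothing (Holt's linear method) in plain Python. The initialisation (first value as level, first difference as trend), the smoothing parameters, and the toy series are assumptions chosen for illustration, not the settings used in this report.

```python
def holt_forecast(values, alpha, beta, horizon):
    """Double exponential smoothing: smooth a level and a trend, then extrapolate."""
    level = values[0]
    trend = values[1] - values[0]  # assumed simple initialisation
    for y in values[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical series with a pure linear trend and no seasonality.
linear = [10 + 3 * t for t in range(10)]
holt_out = holt_forecast(linear, alpha=0.5, beta=0.3, horizon=3)
print(holt_out)
```

On a purely linear series the forecast continues the line, but nothing in the recursion models a repeating seasonal pattern, which matches the limitation noted above.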
Problem 4
Check for Stationarity- Check for stationarity - Make the data stationary (if needed)
Non-Stationary: A p-value of 0.9952 is much greater than the common significance level of 0.05.
We therefore fail to reject the null hypothesis of non-stationarity and treat the time series as
non-stationary.
Trend Remains (Exponential Growth): The graph still shows an upward trend in the logged data. A
roughly linear trend on the log scale is consistent with exponential growth in the original soft
drink sales.
Seasonality Remains (Multiplicative Seasonality): The seasonal fluctuations are still visible in the
logged data. If their magnitude in the original series increases with the level of the series, the
seasonality is multiplicative, and the log transformation turns it into an additive pattern of roughly
constant size.
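The effect of a log transformation on multiplicative seasonality can be illustrated with a small plain-Python sketch. The yearly levels and seasonal factors below are hypothetical and are not taken from the soft drink data.

```python
import math

# Hypothetical yearly levels and within-year seasonal factors.
levels = [100, 200, 400]
seasonal_factors = [0.8, 1.0, 1.2, 1.0]

# Multiplicative seasonality: each observation is level * seasonal factor.
years = [[level * s for s in seasonal_factors] for level in levels]

# Peak-to-trough swing per year, on the raw scale and on the log scale.
raw_swings = [max(y) - min(y) for y in years]
log_swings = [math.log(max(y)) - math.log(min(y)) for y in years]

print(raw_swings)
print(log_swings)
```

The raw swing doubles each time the level doubles, while the swing on the log scale stays at log(1.5) for every year.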
Problem 5
Model Building - Stationary Data- Generate ACF & PACF Plot and find the AR, MA values. - Build
different ARIMA models - Auto ARIMA - Manual ARIMA - Build different SARIMA models - Auto
SARIMA - Manual SARIMA - Check the performance of the models built
AR(2) Model Selected: The selection of an ARIMA(2, 0, 0) model suggests that the autoregressive
components (AR terms) are the most significant in modeling the time series. The lack of differencing
(I(0)) implies the series was likely stationary or made stationary through prior transformations (e.g.,
log transformation).
High RMSE: The RMSE of 999.19256 is relatively high, indicating that the model's predictions have
a significant average error. This suggests the model, while considered the "best" among the tested
ARIMA models, is not highly accurate.
ARMA model building to estimate the best 'p' and 'q' (lowest AIC approach)
ARIMA(1, 0, 1) - AIC: -350.8225798335556
The ARIMA(1, 0, 1) model, with an AIC of -350.82, suggests a potential improvement over the
ARIMA(2, 0, 0) model (based on the lower AIC). This indicates that a model with one autoregressive
(AR) term and one moving average (MA) term might better balance model fit and complexity for this
time series data.
Data Split: The graph shows the training data (blue), test data (orange), and the forecasted
sales (green).
Strong Seasonality: Both the training and test data exhibit strong seasonal patterns with
regular peaks and troughs.
RMSE Value: The RMSE of 100.968926 indicates the average magnitude of the errors between the
forecasted values and the actual values in the test set.
Model Fit: The RMSE suggests that the ARIMA(3, 0, 3) model is providing forecasts that are, on
average, about 100.97 units away from the actual values.
ARIMA model building to estimate the best 'p', 'd', and 'q' parameters (lowest AIC approach)
ARIMA(1, 0, 1) - AIC: -350.8225798335556
Low AIC: The AIC value of -350.8225798335556 is relatively low. AIC is a measure of model fit that
penalizes model complexity. A lower AIC generally indicates a better balance between model fit and
complexity.
Potential for Good Fit: The low AIC suggests that the ARIMA(1, 0, 1) model might be a good fit for
the data compared to other models with higher AIC values.
Data Split: The graph shows the training data (blue), test data (orange), and the forecasted
sales (green).
Strong Seasonality: Both the training and test data exhibit strong seasonal patterns with
regular peaks and troughs.
Upward Trend: There's a general upward trend visible in the data over the years.
ARIMA(2, 0, 0) - High RMSE: The ARIMA(2, 0, 0) model, which is essentially an AR(2) model (only
autoregressive terms), has the highest RMSE of 999.19256.
ARIMA(3, 0, 3) - Lower RMSE: Both the "Best ARMA Model" (ARIMA(3, 0, 3)) and the "Best ARIMA
Model" (also ARIMA(3, 0, 3)) have a significantly lower RMSE of 796.420052.
ARIMA(3, 0, 3) Selected: The ARIMA(3, 0, 3) model, which includes both autoregressive (AR) and
moving average (MA) terms, is selected as the best model, both for ARMA and general ARIMA.
Inference:
ARIMA(3, 0, 3) Superior: The ARIMA(3, 0, 3) model demonstrates better forecasting accuracy
than the ARIMA(2, 0, 0) model, as evidenced by its lower RMSE.
SARIMA MODEL
Data Split: The graph shows the training data (blue), test data (orange), and the forecasted
sales (green).
Strong Seasonality: Both the training and test data exhibit strong seasonal patterns with
regular peaks and troughs.
Upward Trend: There's a general upward trend visible in the data over the years.
Forecasted Sales (Green): The forecasted sales (green) attempts to follow the pattern in the
test data but shows some deviations.
Better Fit Than Previous Models: Compared to the flat forecasts we saw earlier, this forecast
(green) appears to capture the seasonality and trend to a greater extent.
Inference:
Improved Model: The forecasting model used here is better than the simple models (naive,
average) and exponential smoothing models we saw previously. It seems to be capturing the
seasonal patterns and the upward trend more effectively.
In summary, the SARIMAX(1, 0, 1)x(1, 0, 1, 12) model, which incorporates seasonal components, is
the most accurate model based on RMSE, demonstrating the importance of accounting for
seasonality in this time series. The ARIMA(3, 0, 3) model also performs better than the AR(2) model,
indicating the usefulness of moving average terms.
Problem 6
Models Compared:
1. Regression on Time: (Linear Regression) - Very high RMSE, poor fit.
2. Naive Model: (Last observed value) - High RMSE, poor fit.
3. Simple Average Model: (Mean of training data) - Very high RMSE, poor fit.
4. Trailing Moving Average (Various Window Sizes): Improved over naive and simple average,
but still high RMSE and doesn't capture seasonality.
5. Simple Exponential Smoothing (SES): High RMSE, flat forecast, doesn't capture trend or
seasonality.
6. Double Exponential Smoothing (DES): Improved over SES by capturing trend, but still high
RMSE and doesn't capture seasonality.
7. ARIMA(2, 0, 0): (AR(2) Model) - High RMSE, doesn't capture seasonality.
8. ARIMA(3, 0, 3): (ARMA Model) - Lower RMSE than AR(2), but still moderate and potential
overfitting.
9. SARIMA(1, 0, 1)x(1, 0, 1, 12): (Seasonal ARIMA) - Lowest RMSE, captures both trend and
seasonality.
Performance Comparison:
RMSE: The Root Mean Squared Error (RMSE) is a key metric for evaluating model
performance. Lower RMSE indicates better accuracy.
  - Worst: Naive, Simple Average, Regression on Time (very high RMSE).
  - Poor: SES, DES, ARIMA(2, 0, 0) (high RMSE).
  - Moderate: ARIMA(3, 0, 3) (moderate RMSE).
  - Best: SARIMA(1, 0, 1)x(1, 0, 1, 12) (lowest RMSE).
AIC/BIC: Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are used
to compare model fit while penalizing complexity. Lower values are better.
  - Best: SARIMA(1, 0, 1)x(1, 0, 1, 12) (lowest AIC, BIC).
  - Better: ARIMA(3, 0, 3) (lower AIC, BIC than ARIMA(2, 0, 0)).
Visual Fit: Comparing the forecasted values to the actual test data visually:
  - Worst: Naive, Simple Average, SES (flat forecasts).
  - Poor: Regression on Time, DES (miss seasonality).
  - Moderate: ARIMA(3, 0, 3) (follows trend, some seasonality).
  - Best: SARIMA(1, 0, 1)x(1, 0, 1, 12) (captures seasonality and trend).
Residual Analysis: Residual diagnostics for the SARIMA(1, 0, 1)x(1, 0, 1, 12) model showed:
  - Random residuals.
Observed Data (Blue): The blue line represents the actual observed sales data for soft
drinks over time.
Forecast Data (Orange): The orange line represents the forecasted sales data for the next 12
months (as indicated by the title).
Time Range: The graph covers the time period from approximately 1980 to 1996.
Actionable Insights & Recommendations- Conclude with the key takeaways (actionable insights and
recommendations) for the business
Non-Stationarity: The series exhibits a clear non-stationary pattern. There's an apparent upward
trend from 1980 to around 1988, followed by a potential level shift and then a more stable (but still
fluctuating) pattern. This non-stationarity suggests that standard time series models like ARIMA
might need differencing to achieve stationarity.
Level Shift/Structural Break: Around 1986-1988, there's a significant jump in the "No_Shoe_Sales."
This indicates a potential structural break or level shift.
Volatility: The series shows varying degrees of volatility. The period from 1986-1990 exhibits higher
volatility compared to the periods before and after.
Potential Seasonality? While not immediately obvious, there might be some underlying seasonality
in the data.
Additive decomposition
Data Pre-processing- Missing value treatment - Visualize the processed data - Train-test split
Monthly Data: The data is organized monthly, indicated by the "Month Year" column and the date format in
the index.
Time Range:
Training Data: Spans from January 1980 to October 1990.
Test Data: Spans from November 1990 to July 1995.
Target Variable: The "No_Shoe_Sales" column represents the number of shoe sales, which is the target
variable we want to forecast.
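The stated date ranges imply a chronological split with 130 training months and 57 test months. A minimal plain-Python sketch of such a split follows; the helper name is hypothetical, and the total of 187 months is derived here from the stated start and end dates rather than quoted from the report.

```python
def month_range(start_year, start_month, count):
    """Return `count` consecutive (year, month) pairs starting at the given month."""
    months = []
    year, month = start_year, start_month
    for _ in range(count):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


# January 1980 through July 1995 is 187 months (derived from the stated ranges).
all_months = month_range(1980, 1, 187)

# Chronological split: no shuffling, the test set follows the training set in time.
train_months, test_months = all_months[:130], all_months[130:]

print(train_months[0], train_months[-1])
print(test_months[0], test_months[-1], len(test_months))
```

Keeping the split in time order matters for forecasting, because the model must be evaluated on periods that come after everything it was trained on.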
Problem 3
Model Building - Original Data- Build forecasting models - Linear regression - Simple Average -
Moving Average - Exponential Models (Single, Double, Triple) - Check the performance of the models
built
Linear Regression
Extremely High RMSE: The RMSE value of 331,850.445317 is exceptionally high. This suggests that
the model's predictions are significantly far from the actual values in the test set.
Poor Model Performance: A high RMSE indicates poor model performance. The model is not
accurately capturing the patterns in the data and is producing large errors.
Naïve Forecast
High RMSE: The RMSE value of 11605.175439 is relatively high. This suggests that the Naive
Forecast model's predictions are significantly different from the actual values in the test set.
Poor Model Performance: A high RMSE indicates poor model performance. The Naive Forecast, as
expected, is not accurately capturing the patterns in the data and is producing large errors.
Simple Average Outperforms Naive: The Simple Average Model has a significantly lower RMSE
(4974.097699) compared to the Naive Model (11605.175439). This means the Simple Average Model
is providing more accurate forecasts than the Naive Model on the test data.
Still High RMSE: While the Simple Average Model performs better than the Naive Model, its RMSE
of 4974.097699 is still relatively high. This suggests that the model's predictions are still significantly
different from the actual values in the test set.
Multiple Moving Averages: The plot shows the original training data (blue line) along with moving
average forecasts calculated using different window sizes: 2, 4, 6, and 9 points.
Smoothing Effect: All the moving average lines show a smoothing effect compared to the original
data, reducing the fluctuations and highlighting the underlying trends.
Varying Degrees of Smoothing: The degree of smoothing varies with the window size. Larger
window sizes result in smoother forecasts.
For 2 point Moving Average Model forecast on the Training Data, RMSE is 2096.333
For 4 point Moving Average Model forecast on the Training Data, RMSE is 3456.840
For 6 point Moving Average Model forecast on the Training Data, RMSE is 4244.904
For 9 point Moving Average Model forecast on the Training Data, RMSE is 4983.745
2-Point Trailing Moving Average Performs Best: The 2-point trailing moving average has the
lowest RMSE (2096.333333) among all the models. This indicates that it provides the most accurate
forecasts on the test data compared to the other models.
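Simple Average Outperforms Naive and the 9-Point Moving Average: The Simple Average Model
(RMSE: 4974.097699) outperforms the Naive Model (RMSE: 11605.175439) and, marginally, the 9-point
trailing moving average (RMSE: 4983.745). The 2-, 4-, and 6-point trailing moving averages all have
a lower RMSE than the Simple Average Model.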
RMSE Increases with Larger Moving Average Windows: As the window size of the trailing moving
average increases (from 2 to 9), the RMSE also increases. This suggests that larger window sizes lead
to less accurate forecasts on the test data.
Model Comparison:
Naive Model (Worst): The Naive Model has the highest RMSE, indicating the poorest
performance.
Simple Average (Better): The Simple Average Model performs better than the Naive Model.
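Ranking the RMSE values reported in this section makes the ordering of the models easy to check. The short plain-Python sketch below only sorts the figures quoted above; the dictionary labels are chosen here for readability.

```python
# RMSE values as reported in this section.
reported_rmse = {
    "Naive": 11605.175439,
    "Simple Average": 4974.097699,
    "2-point trailing MA": 2096.333,
    "4-point trailing MA": 3456.840,
    "6-point trailing MA": 4244.904,
    "9-point trailing MA": 4983.745,
}

# Sort model names from lowest (best) to highest (worst) RMSE.
ranking = sorted(reported_rmse, key=reported_rmse.get)

for name in ranking:
    print(f"{name:<22}{reported_rmse[name]:>14.3f}")
```

Sorted this way, the Simple Average Model falls between the 6-point and 9-point trailing moving averages, with the Naive Model last.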
Model Comparison
Simple and Double Exponential Smoothing (SES & DES): The plot compares the forecasts
generated by Simple Exponential Smoothing (SES) with alpha = 0.99 and Double Exponential
Smoothing (DES) with alpha = 0.099 and beta = 0.0001.
Training and Test Split: The data is split into training (blue line) and test (orange line) sets.
Poor Fit: Both SES and DES forecasts show a poor fit to the test data.
Specific Inferences:
1. Simple Exponential Smoothing (SES) with Alpha = 0.99 (Green Line):
  - High Alpha Value: As discussed previously, an alpha of 0.99 makes the forecast
    essentially a Naive Forecast, heavily influenced by the last training data point.
  - Constant Forecast: The forecast (green line) is a flat, horizontal line.
  - Poor Fit: The SES forecast doesn't capture the fluctuations in the test data.
2. Double Exponential Smoothing (DES) with Alpha = 0.099 and Beta = 0.0001 (Red Line):
  - Low Alpha and Beta: Low alpha and beta values indicate that the model gives more
    weight to past observations and less weight to recent changes.
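The role of alpha can be seen in a minimal plain-Python sketch of simple exponential smoothing. The toy history and the initialisation with the first observation are assumptions made for illustration; this is not the report's code or data.

```python
def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing: the forecast is the final smoothed level, held flat."""
    level = values[0]  # assumed initialisation with the first observation
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


# Hypothetical short history ending in a jump to 260.
history = [200, 220, 210, 260]

near_naive = ses_forecast(history, alpha=0.99, horizon=3)
smooth = ses_forecast(history, alpha=0.1, horizon=3)

print(near_naive)
print(smooth)
```

With alpha = 0.99 the flat forecast sits almost exactly on the last observation, which is the naive forecast; with alpha = 0.1 older observations dominate and the forecast stays near the earlier level.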
SES Significantly Outperforms DES: The Simple Exponential Smoothing (SES) model with alpha =
0.99 has a significantly lower RMSE (109.515017) compared to the Double Exponential Smoothing
(DES) model with alpha = 1 and beta = 0.0189 (20949.171963). This indicates that the SES model
provides much more accurate forecasts on the test data than the DES model.
DES Performance is Poor: The DES model's RMSE of 20949.171963 is extremely high. This suggests
that the DES model is performing very poorly and is not capturing the patterns in the data effectively.
SES Performance is Relatively Good: The SES model's RMSE of 109.515017 is relatively low. This
indicates that the SES model is providing reasonably accurate forecasts on the test data, at least
compared to the DES model.
SES Performs Best: The Simple Exponential Smoothing (SES) model with alpha = 0.99 has the
lowest RMSE (109.515017) among the three models. This indicates that it provides the most accurate
forecasts on the test data.
DES Performs Worst: The Double Exponential Smoothing (DES) model with alpha = 1 and beta =
0.0189 has the highest RMSE (20949.171963). This suggests that it's performing very poorly and is
not capturing the patterns in the data effectively.
TES Performance is Intermediate: The Triple Exponential Smoothing (TES) model with alpha =
0.25, beta = 0.0, and gamma = 0.74 has an RMSE of 8625.626132. It performs better than the DES
model but worse than the SES model.
Model Comparison:
SES (Best): The SES model provides the most accurate forecasts.
DES (Worst): The DES model performs the poorest.
TES (Intermediate): The TES model's performance is in between the other two.
SES Performs Best: The Simple Exponential Smoothing (SES) model with alpha = 0.99 has the
lowest RMSE (109.515017) among all four models. This indicates that it provides the most accurate
forecasts on the test data.
DES Performs Worst: The Double Exponential Smoothing (DES) model with alpha = 1 and beta =
0.0189 has the highest RMSE (20949.171963). This suggests that it's performing very poorly and is
not capturing the patterns in the data effectively.
TES Performance Varies: The two Triple Exponential Smoothing (TES) models have different RMSE
values:
TES (Alpha = 0.25, Beta = 0.0, Gamma = 0.74) RMSE: 8625.626132
TES (Alpha = 0.74, Beta = 2.73e-06, Gamma = 5.2e-07) RMSE: 5251.999169
TES with Higher Alpha Performs Better: The TES model with alpha = 0.74, beta = 2.73e-06, and
gamma = 5.2e-07 performs better than the TES model with alpha = 0.25, beta = 0.0, and gamma =
0.74. This indicates that the parameter choices significantly impact the performance of the TES
model.
Model Comparison (Overall):
SES (Best): Provides the most accurate forecasts.
TES (Variable): Performance depends on parameter choices. The TES with higher alpha
performs better.
DES (Worst): Performs the poorest.
Check for Stationarity- Check for stationarity - Make the data stationary (if needed)
Non-Stationary: The p-value (0.4222) is greater than the typical significance level of 0.05. We
therefore fail to reject the null hypothesis of non-stationarity and treat the time series as
non-stationary.
Stationarity Improvement: The differenced series appears to be more stationary than the original
series. The trend component, which was evident in the original series, has been removed.
Fluctuations Around Zero: The differenced series fluctuates around zero, indicating that there's no
clear upward or downward trend.
Volatility: The differenced series still exhibits volatility. The magnitude of fluctuations varies over
time.
Logarithmic Scale: The y-axis represents the logarithm of "No_Shoe_Sales," not the raw sales
values.
Trend and Level Shift (Log Scale):
Stationarity: The series appears to be stationary. It fluctuates around zero with no clear trend.
Volatility: The volatility seems relatively consistent across the series, indicating that the log
transformation has helped stabilize the variance.
Random Fluctuations: The series shows random fluctuations, suggesting that the trend and level
shift have been effectively removed.
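The combined effect of a log transformation and first differencing can be sketched in plain Python. The series below is a hypothetical one growing 10% per period, not the shoe sales data, and the order of operations (log first, then difference) is an assumption about how the stationary series was produced.

```python
import math


def difference(values, lag=1):
    """Difference a series at the given lag: values[t] - values[t - lag]."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Hypothetical series with 10% growth per period.
growth = [100 * 1.1 ** t for t in range(6)]

raw_diff = difference(growth)
log_diff = difference([math.log(v) for v in growth])

print(raw_diff)
print(log_diff)
```

Differencing the raw series still leaves steps that grow over time, whereas differencing the logged series gives a constant step of log(1.1), with no remaining trend in either level or spread.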
Problem 5
Model Building - Stationary Data- Generate ACF & PACF Plot and find the AR, MA values. - Build
different ARIMA models - Auto ARIMA - Manual ARIMA - Build different SARIMA models - Auto
SARIMA - Manual SARIMA - Check the performance of the models built
Relatively Low RMSE: An RMSE of 111.48 suggests that the best AR model is providing relatively
accurate forecasts on the test data.
Model Fit: The low RMSE indicates that the AR model is capturing the patterns in the data
reasonably well and is producing forecasts that are close to the actual values.
ARMA model building to estimate the best 'p' and 'q' (lowest AIC approach)
ARIMA(1, 0, 1) Model Fit: The AIC value of -270.11436 represents the goodness of fit of the
ARIMA(1, 0, 1) model to the data, taking into account the model's complexity.
Model Comparison: To determine if this AIC value is good, it needs to be compared with the AIC
values of other ARIMA models fitted to the same dataset.
ARIMA(3, 0, 3) Model: The model fitted is an ARIMA(3, 0, 3) model. This means it includes three
autoregressive (AR) terms, no differencing (I), and three moving average (MA) terms.
Significant AR and MA Terms: All the AR and MA terms are statistically significant (p-value =
0.000), except for the constant term. This suggests that past values of the "No_Shoe_Sales" variable
and past forecast errors have a significant impact on the current values.
Insignificant Constant: The constant term is not statistically significant (p-value = 0.904). This
might suggest that there's no significant overall mean level in the time series after accounting for the
AR and MA terms.
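Residual Variance: The estimated variance of the residuals (sigma2) is statistically different from
zero (p-value = 0.000). This only confirms that the residual variance is non-zero; whether the
residuals behave like random noise has to be checked separately with residual diagnostics.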
RMSE Value: The RMSE of 100.968926 summarizes the typical magnitude of the errors between the
forecasted values and the actual values in the test set.
Model Fit: The RMSE suggests that the ARIMA(3, 0, 3) model is providing forecasts that are
typically about 100.97 units away from the actual values.
ARIMA model building to estimate the best 'p', 'd', and 'q' parameters (lowest AIC approach)
ARIMA(1, 0, 1) Model Fit: The AIC value of -270.11436 represents the goodness of fit of the
ARIMA(1, 0, 1) model to the data, taking into account the model's complexity.
ARIMA(3, 0, 3) is Better: The ARIMA(3, 0, 3) model has a lower RMSE (100.968926) compared to
the ARIMA(2, 0, 0) model (111.480284). This indicates that the ARIMA(3, 0, 3) model provides more
accurate forecasts on the test data.
RMSE Difference: The difference in RMSE values is approximately 10.51 units (111.480284 -
100.968926), a reduction of about 9.4%. This is a noticeable improvement in forecast accuracy with
the ARIMA(3, 0, 3) model.
Model Complexity: The ARIMA(3, 0, 3) model is more complex than the ARIMA(2, 0, 0) model, as it
includes more AR and MA terms. However, because the lower RMSE is measured on held-out test
data, the added complexity appears justified.
SARIMA MODEL
Model Fit: The AIC value of -334.884 summarizes the fit of the SARIMA(1, 0, 1)x(1, 0, 1, 12) model
to the data, taking into account the model's complexity. It is useful only in comparison with other
candidate models fitted to the same data, where the lowest AIC is preferred.
SARIMAX(1, 0, 1)x(1, 0, 1, 12) Model: The model fitted is a Seasonal ARIMA model with:
Non-seasonal components: AR(1), I(0), MA(1)
Seasonal components: Seasonal AR(1) at lag 12, Seasonal I(0), Seasonal MA(1) at lag 12
Seasonal period: 12 (likely monthly data with yearly seasonality).
Significant Parameters: All the AR and MA terms (both non-seasonal and seasonal) are statistically
significant (reported p-value = 0.000, i.e. p < 0.001). This indicates that both past values and past
forecast errors, including those at seasonal lags, help explain the current values.
Residual Variance: The variance of the residuals (sigma2) is statistically significant (p < 0.001).
This only confirms that the estimated error variance is non-zero; it does not show whether the
residuals are random noise, which is assessed from the residual diagnostics.
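Model Fit:
AIC, BIC, HQIC: The AIC (-358.169), BIC (-342.204), and HQIC (-351.696) are used to assess
the model fit. When comparing models fitted to the same data, lower values indicate a better
balance between fit and complexity.
Log Likelihood: The Log Likelihood (184.085) measures how well the model fits the data; higher
values indicate a closer fit, before any penalty for the number of parameters.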
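The three criteria are all derived from the log likelihood. As a worked check, assume k = 5 estimated parameters (the four AR and MA coefficients plus sigma2, with no constant); k is an assumption based on the parameters discussed above, and n denotes the number of observations used in the fit.

```latex
\begin{align*}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} = 2(5) - 2(184.085) = -358.170 \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L} \\
\mathrm{HQIC} &= 2k\ln(\ln n) - 2\ln\hat{L}
\end{align*}
```

The AIC matches the reported -358.169 up to rounding of the log likelihood. The reported BIC and HQIC are consistent with n of roughly 180: 5 ln(180) - 368.170 is about -342.205, and 10 ln(ln 180) - 368.170 is about -351.697. The value of n is inferred from these formulas, not stated in the report.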
Fluctuations Around Zero: The residuals fluctuate around zero, suggesting that the model has
captured the mean structure of the time series reasonably well.
No Clear Trend or Pattern: There's no clear trend or pattern in the residuals, indicating that the
model has captured the trend and seasonality (if present) effectively.
Outliers: There are a few outliers (points far from zero), especially around 1986-1988. This
suggests that the model might not have captured some unusual events or structural changes during
that period.
Constant Variance: The variance of the residuals appears to be relatively constant over time, so
there is no clear sign of heteroscedasticity left in the residuals.
Plot 2: Histogram Plus Estimated Density
Approximately Normal Distribution: The histogram and KDE are approximately bell-shaped and
centered around zero, suggesting that the residuals are approximately normally distributed.
Slight Deviations from Normality: There are some slight deviations from the standard normal
distribution, particularly in the tails. This suggests that the residuals might not be perfectly normally
distributed.
Skewness: The distribution appears to be slightly skewed to the left, meaning the left tail is longer:
there are a few relatively large negative residuals. This does not by itself mean that negative
residuals outnumber positive ones.
Model Adequacy: The plots suggest that the model is reasonably adequate for forecasting the time
series. The residuals are centered around zero, have no clear trend or pattern, and are approximately
normally distributed.
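The visual reading of the residual plots can be backed by simple summary statistics. The sketch below computes the mean, skewness, excess kurtosis, and share of negative residuals. The residual values are hypothetical, chosen to show that a left-skewed distribution can still contain mostly positive residuals; the function name `residual_summary` is illustrative.

```python
import numpy as np

def residual_summary(residuals):
    """Moment-based summary of a residual series."""
    x = np.asarray(residuals, dtype=float)
    centered = x - x.mean()
    std = np.sqrt(np.mean(centered ** 2))  # population standard deviation
    return {
        "mean": float(x.mean()),
        "skewness": float(np.mean(centered ** 3) / std ** 3),
        "excess_kurtosis": float(np.mean(centered ** 4) / std ** 4 - 3.0),
        "share_negative": float(np.mean(x < 0)),
    }

# Hypothetical residuals: nine small positive values and one large negative miss
residuals = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -9.0]

summary = residual_summary(residuals)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this example the skewness is negative (about -2.667) even though only 10% of the residuals are negative, which is why skewness should be read as a statement about tail length rather than about counts.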
Problem 6
Compare the performance of the models: Compare the performance of all the models built - Choose the
best model with proper rationale - Rebuild the best model using the entire data - Make a forecast for
the next 12 months
Problem 7
Actionable Insights & Recommendations: Conclude with the key takeaways (actionable insights and
recommendations) for the business