Statistics Project SEM1 Notes
Part-A:
To address the requirements outlined in your project, let's break down the tasks step by step:
1. Data Exploration: Visualize the time series data to understand its characteristics, including trends, seasonality,
and any potential outliers. You can use line plots, histograms, or other appropriate visualizations.
2. Exponential Smoothing: Estimate exponential smoothing models such as Simple Exponential Smoothing
(SES), Holt's Linear Trend method, or Holt-Winters' seasonal method to capture trends and seasonality.
3. ARIMA/SARIMA: Fit Autoregressive Integrated Moving Average (ARIMA) or Seasonal ARIMA (SARIMA)
models to capture any autocorrelation, trends, and seasonality in the data. Use ACF and PACF plots to identify
candidate model orders, and check the residuals of the fitted model for remaining autocorrelation.
4. Forecasting: Forecast the average prices for the 6 months from October 2023 to March 2024 using the chosen
models.
5. Evaluation: Evaluate the accuracy of the forecasts against the actual data for the period October 2023 to
March 2024. Calculate relevant metrics (e.g., Mean Absolute Error, Mean Squared Error) to assess forecast
performance.
6. Adequacy for Forecasting: Provide commentary on the adequacy of the chosen optimal model for forecasting
purposes. Consider factors such as model assumptions, forecast horizon, and robustness.
7. Visualizations: Include relevant visualizations (e.g., time series plots, forecast vs. actual plots) to support your
analysis and conclusions.
8. References: Provide proper citations for data sources, models, and methodologies used in your analysis.
Document the process thoroughly, including any assumptions made, methodology choices, and
interpretations of results.
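The accuracy metrics named in the Evaluation task can be computed directly from the six actual and forecast values. A minimal NumPy sketch follows; the function name forecast_errors and the numbers are hypothetical placeholders, not project data. RMSE and MAPE are included only as common companions to MAE and MSE.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Return common point-forecast accuracy metrics for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape:
        raise ValueError("actual and forecast must have the same length")
    errors = actual - forecast
    mae = np.mean(np.abs(errors))          # Mean Absolute Error
    mse = np.mean(errors ** 2)             # Mean Squared Error
    rmse = np.sqrt(mse)                    # Root Mean Squared Error (same units as the data)
    # MAPE is undefined when any actual value is zero, so report NaN in that case.
    mape = np.nan if np.any(actual == 0) else np.mean(np.abs(errors / actual)) * 100
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Hypothetical actual and forecast values for Oct 2023 to Mar 2024.
actual = [102.0, 104.0, 101.0, 99.0, 103.0, 105.0]
forecast = [100.0, 103.0, 102.0, 101.0, 102.0, 104.0]
m = forecast_errors(actual, forecast)
print(m)
```

When comparing several models, the one with the lowest error on the held-out months is usually preferred, but it is worth checking that the ranking is consistent across MAE and RMSE, since RMSE penalizes large misses more heavily.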
In the preliminary assessment step of time series analysis, exploratory data analysis (EDA) involves examining
the characteristics of the time series data to gain insights into its structure, patterns, and potential issues.
Here are some common techniques for conducting EDA on time series data:
1. Time Series Plot: Plot the time series data over time to visualize its general trend, seasonality, and any
outliers or irregularities. This can be done using a simple line plot with time on the x-axis and the variable of
interest on the y-axis.
2. Seasonal Decomposition: Decompose the time series into its trend, seasonal, and residual components using
methods like classical (moving-average) seasonal decomposition or STL (Seasonal-Trend decomposition using
LOESS). This helps identify underlying patterns and seasonal fluctuations.
3. Histogram and Density Plot: Examine the distribution of the data using histograms or density plots to
understand its variability and skewness. This can suggest whether a transformation (e.g., log or Box-Cox) is
needed to stabilize the variance; because a histogram ignores time order, it says little on its own about stationarity.
4. Autocorrelation and Partial Autocorrelation Plots: Plot the autocorrelation function (ACF) and partial
autocorrelation function (PACF) to identify the presence of autocorrelation in the data. This helps in determining
the order of autoregressive (AR) and moving average (MA) components in ARIMA modeling.
5. Box Plot or Violin Plot: Visualize the distribution of the data across different time periods, such as months or
seasons, using box plots or violin plots. This can reveal any systematic patterns or differences between time
periods.
6. Trend Smoothing: Estimate the trend with moving averages or exponentially weighted moving averages
(EWMA). Subtracting the smoothed trend leaves the seasonal and irregular (noise) components, which helps in
understanding the underlying patterns and trends.
7. Summary Statistics: Calculate summary statistics such as mean, median, standard deviation, minimum, and
maximum values to describe the central tendency and variability of the data.
8. Lag Plots: Create lag plots to visualize the relationship between the time series data and its lagged values.
This can help identify potential autocorrelation and guide the selection of lag orders in ARIMA modeling.
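Several of the techniques above (summary statistics, moving-average trend estimation, seasonal indices, autocorrelation, lag correlation, and per-month grouping for box plots) can be computed with pandas and NumPy alone. A minimal sketch, assuming a hypothetical monthly series with a regular DatetimeIndex; the synthetic data and the helper name sample_acf are illustrative only. The trend uses the centred 2x12 moving average of classical decomposition for monthly data, not STL.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(len(idx))
y = pd.Series(50 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 1, len(t)), index=idx)

# Summary statistics: count, mean, std, min, quartiles, max.
print(y.describe())

# Trend: centred 2x12 moving average (13 symmetric weights summing to 1).
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = pd.Series(np.nan, index=y.index)
trend.iloc[6:-6] = np.convolve(y.to_numpy(), weights, mode="valid")

# Additive seasonal indices: mean detrended value per calendar month,
# centred so the 12 indices sum to zero.
detrended = y - trend
seasonal_idx = detrended.groupby(detrended.index.month).mean()
seasonal_idx -= seasonal_idx.mean()
seasonal = pd.Series(seasonal_idx.reindex(y.index.month).to_numpy(), index=y.index)
resid = y - trend - seasonal

def sample_acf(x, nlags):
    """Sample autocorrelation r_k for k = 0..nlags (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom for k in range(nlags + 1)])

# Values outside roughly +/-1.96/sqrt(n) suggest significant autocorrelation.
acf = sample_acf(y, nlags=24)
bound = 1.96 / np.sqrt(len(y))
print("lags with |ACF| > bound:", np.flatnonzero(np.abs(acf[1:]) > bound) + 1)

# Lag plot summary: Pearson correlation between y_t and y_{t-1}.
print("lag-1 correlation:", round(y.autocorr(lag=1), 3))

# Box plot input: distribution of values for each calendar month.
print(y.groupby(y.index.month).describe()[["mean", "std"]])
```

A slowly decaying ACF, as a trending series like this one produces, is a usual sign that differencing is needed before reading ARIMA orders off the ACF and PACF.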
Code Explanation:
1) Syntax: df.set_index('Date', inplace=True)  # assuming df is the DataFrame name
This sets the 'Date' column as the index of the DataFrame df.
With inplace=True, set_index modifies df directly instead of creating a new DataFrame. The call itself returns
None, so do not write df = df.set_index('Date', inplace=True).
After this line runs, df has the 'Date' column as its index, and 'Date' is no longer a regular column. This is
helpful for time series analysis because rows can be selected by date (e.g., df.loc['2023-10':'2024-03']), provided
the 'Date' column was first converted to datetime with pd.to_datetime so that the index is a DatetimeIndex.
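The following sketch shows the full sequence: convert the dates, set the index, then slice by date. The column name AvgPrice and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Date' stored as text, plus an average-price column.
df = pd.DataFrame({
    "Date": ["2023-08-01", "2023-09-01", "2023-10-01", "2023-11-01"],
    "AvgPrice": [1.52, 1.55, 1.61, 1.58],
})

# Convert text to datetime first so the new index is a DatetimeIndex.
df["Date"] = pd.to_datetime(df["Date"])

# Modifies df in place and returns None.
df.set_index("Date", inplace=True)
# Equivalent without inplace: df = df.set_index("Date")

print(type(df.index).__name__)       # DatetimeIndex
print(df.loc["2023-10":"2023-11"])   # partial-string date slicing (both ends inclusive)
```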
2) Syntax: plt.gca().xaxis.set_major_locator(YearLocator())
plt.gca(): This function gets the current Axes instance in the current figure.
"gca" stands for "get current axes".
xaxis: This attribute of the Axes instance represents the x-axis (an XAxis object).
set_major_locator(YearLocator()): This method sets the locator that decides where the major ticks on the
x-axis go. YearLocator (imported from matplotlib.dates) places one tick per year, by default on 1 January;
YearLocator(base=2) would place a tick every second year. The x-axis must hold dates for this to work.
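A minimal plotting sketch, assuming a hypothetical monthly price series. It uses the explicit ax object, which is the same Axes that plt.gca() would return here, and adds a DateFormatter so each yearly tick is labeled with the four-digit year. The Agg backend and the output filename are assumptions so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter, YearLocator

# Hypothetical monthly average prices.
idx = pd.date_range("2018-01-01", "2023-09-01", freq="MS")
y = pd.Series(np.linspace(1.2, 1.6, len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(y.index, y.values)
ax.xaxis.set_major_locator(YearLocator())          # one major tick per year (1 January)
ax.xaxis.set_major_formatter(DateFormatter("%Y"))  # label ticks as e.g. 2019
ax.set_xlabel("Date")
ax.set_ylabel("Average price")
fig.tight_layout()
fig.savefig("price_plot.png")                      # hypothetical output file
```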
These three simple models provide a starting point for time series forecasting and can be
useful for establishing baseline performance metrics. However, they may not capture more
complex patterns or dynamics present in real-world time series data. Hence, more
sophisticated models are often required for accurate forecasting in practical applications.
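These notes do not list the three simple models. As an assumption, the sketch below uses three common baselines: the mean method, the naive method (repeat the last value), and the seasonal naive method (repeat the value from the same season one cycle earlier). The function names and the data are hypothetical.

```python
import numpy as np

def mean_forecast(y, h):
    """Every future value equals the historical mean."""
    return np.full(h, np.mean(np.asarray(y, dtype=float)))

def naive_forecast(y, h):
    """Every future value equals the last observation."""
    return np.full(h, float(np.asarray(y)[-1]))

def seasonal_naive_forecast(y, h, m):
    """Each future value equals the observation from the same season in the last full cycle of length m."""
    last_cycle = np.asarray(y, dtype=float)[-m:]
    return np.array([last_cycle[i % m] for i in range(h)])

# Hypothetical series with period m = 3.
y = np.array([10, 12, 14, 11, 13, 15, 12, 14, 16], dtype=float)
print(mean_forecast(y, 2))                 # mean of the history repeated
print(naive_forecast(y, 3))                # last value repeated
print(seasonal_naive_forecast(y, 4, m=3))  # last cycle repeated
```

A more sophisticated model is only worth keeping if it beats these baselines on the held-out months.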
Exponential Smoothing:
Simple Exponential Smoothing (SES):
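SES tracks a single level, updated as level_t = α·y_t + (1 − α)·level_{t−1}, and forecasts every future period with the last level, so its forecasts are flat. A minimal NumPy sketch of that recursion follows. It assumes the first observation as the starting level and picks α by a grid search that minimizes the in-sample sum of squared one-step-ahead errors; the function names and the price values are hypothetical. In practice, statsmodels' SimpleExpSmoothing can estimate α and the initial level.

```python
import numpy as np

def ses(y, alpha, level0=None):
    """Run the SES recursion; return one-step-ahead fitted values and the final level."""
    y = np.asarray(y, dtype=float)
    level = y[0] if level0 is None else level0   # assumed starting level
    fitted = np.empty_like(y)
    for t, obs in enumerate(y):
        fitted[t] = level                        # forecast of y[t] made before seeing it
        level = alpha * obs + (1 - alpha) * level
    return fitted, level

def ses_forecast(y, alpha, h):
    """Flat h-step forecast equal to the final smoothed level."""
    _, level = ses(y, alpha)
    return np.full(h, level)

def best_alpha(y, grid=None):
    """Pick alpha by minimizing the in-sample sum of squared one-step-ahead errors."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(0.01, 0.99, 99) if grid is None else np.asarray(grid)
    sse = [np.sum((y - ses(y, a)[0]) ** 2) for a in grid]
    return float(grid[int(np.argmin(sse))])

prices = [1.52, 1.55, 1.61, 1.58, 1.60, 1.63]   # hypothetical monthly prices
alpha = best_alpha(prices)
print(alpha, ses_forecast(prices, alpha, h=6))
```

With α close to 1 the level follows the latest observation (approaching the naive method); with α close to 0 it approaches a long-run average.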
Holt's Linear Trend method:
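Holt's method adds a trend term to SES: level_t = α·y_t + (1 − α)(level_{t−1} + trend_{t−1}) and trend_t = β(level_t − level_{t−1}) + (1 − β)·trend_{t−1}, with the h-step forecast level_T + h·trend_T. A minimal sketch follows, assuming a simple initialization (first value as level, first difference as trend) and fixed α and β rather than estimated ones; the function names are hypothetical. statsmodels' Holt class estimates these parameters from the data.

```python
import numpy as np

def holt(y, alpha, beta):
    """Run Holt's linear trend recursions; return the final level and trend."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]            # simple assumed initialization
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

def holt_forecast(y, alpha, beta, h):
    """Straight-line forecast: level_T + k * trend_T for k = 1..h."""
    level, trend = holt(y, alpha, beta)
    return level + trend * np.arange(1, h + 1)

# A perfectly linear series is extrapolated exactly.
print(holt_forecast([2, 4, 6, 8], alpha=0.5, beta=0.3, h=3))
```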
Holt-Winters' seasonal method:
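The additive Holt-Winters method adds a seasonal term with period m:
level_t = α(y_t − s_{t−m}) + (1 − α)(level_{t−1} + trend_{t−1}),
trend_t = β(level_t − level_{t−1}) + (1 − β)·trend_{t−1},
s_t = γ(y_t − level_{t−1} − trend_{t−1}) + (1 − γ)·s_{t−m}.
The h-step forecast is level_T + h·trend_T plus the matching seasonal term from the last cycle. A minimal sketch with fixed α, β, γ follows. It uses a simple heuristic initialization from the first two seasons, which is an assumption; libraries use more refined or optimized starting values. The function name is hypothetical.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters with fixed smoothing parameters; returns an h-step forecast."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    # Heuristic initialization (assumption): first-season mean as level,
    # change between the first two season means as trend per period.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)                # season[t] holds s_t
    for t in range(m, len(y)):
        s_prev = season[t - m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev)
    last_cycle = season[-m:]                    # seasonal states of the final cycle
    return np.array([level + (k + 1) * trend + last_cycle[k % m] for k in range(h)])

# A purely seasonal series (period 4) is reproduced by the forecast.
y = np.tile([1.0, 3.0, 2.0, 4.0], 3)
print(holt_winters_additive(y, m=4, alpha=0.3, beta=0.1, gamma=0.2, h=6))
```

For the project itself, statsmodels' ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit() estimates the parameters, and its .forecast(6) method produces the six-month forecast. When the seasonal swings grow with the level of the series, a multiplicative seasonal component may fit better.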