Statistics Project SEM1 Notes

Uploaded by

mrarcadian26

Statistics Project SEM1 notes:

Part-A:
To address the requirements outlined in your project, let's break down the tasks step by step:

### Preliminary Assessment of the Time Series:

1. Data Loading: Load the 'CocoaPrices.csv' dataset into your preferred data analysis environment (e.g., Python
with pandas library, R, etc.).

2. Data Exploration: Visualize the time series data to understand its characteristics, including trends, seasonality,
and any potential outliers. You can use line plots, histograms, or other appropriate visualizations.
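
The loading step above can be sketched in Python with pandas. This is a minimal sketch, not the project's actual code: the column names 'Date' and 'Price' are assumptions about the layout of 'CocoaPrices.csv', and a tiny in-memory CSV with made-up values stands in for the real file.

```python
import io

import pandas as pd


def load_prices(source):
    """Read a CSV with a date column and return a frame indexed by date."""
    # 'Date' and 'Price' are assumed column names; adjust to the real file.
    df = pd.read_csv(source, parse_dates=["Date"])
    df = df.set_index("Date").sort_index()
    return df


# Tiny in-memory stand-in for 'CocoaPrices.csv' (values are made up).
sample_csv = io.StringIO(
    "Date,Price\n"
    "2023-01-01,2600\n"
    "2023-02-01,2650\n"
    "2023-03-01,2750\n"
    "2023-04-01,2900\n"
)
df = load_prices(sample_csv)

print(df.head())                        # first rows
print(df.index.min(), df.index.max())   # covered date range
print(df["Price"].isna().sum())         # number of missing values
```

With the real file, the call would be load_prices("CocoaPrices.csv"), and df["Price"].plot() would then give a first line plot of the series.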

### Estimation and Discussion of Suitable Time Series Models:

1. Simple Time Series Models: Consider basic models such as the mean, naive, or random walk models as
baseline benchmarks.

2. Exponential Smoothing: Estimate exponential smoothing models such as Simple Exponential Smoothing
(SES), Holt's Linear Trend method, or Holt-Winters' seasonal method to capture trends and seasonality.

3. ARIMA/SARIMA: Fit Autoregressive Integrated Moving Average (ARIMA) or Seasonal ARIMA (SARIMA)
models to capture any autocorrelation, trends, and seasonality in the data. Conduct appropriate diagnostic tests
(e.g., ACF, PACF plots) to identify the model orders.
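
To make the exponential smoothing idea concrete, here is a hand-written sketch of Simple Exponential Smoothing. It is an illustration under stated assumptions, not a library implementation: the function name ses_forecast is hypothetical, the level is initialised at the first observation (one common choice among several), and the numbers are made up.

```python
import numpy as np


def ses_forecast(y, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    y = np.asarray(y, dtype=float)
    level = y[0]  # assumed initialisation: start the level at the first value
    for value in y[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level  # SES forecasts are flat: the same value for every horizon


prices = [100.0, 110.0, 105.0, 120.0]  # made-up numbers
print(ses_forecast(prices, alpha=0.5))
```

Setting alpha=1 reproduces the naive forecast (the last observation), while values of alpha near 0 give a forecast that reacts very slowly to new data.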

### Model Evaluation and Forecasting:

1. Training Set Selection: Use data up to and including September 2023 as the training set.

2. Forecasting: Forecast the average prices for the 6 months from October 2023 to March 2024 using the chosen
models.

3. Evaluation: Evaluate the accuracy of the forecasts against the actual data for the period October 2023 to
March 2024. Calculate relevant metrics (e.g., Mean Absolute Error, Mean Squared Error) to assess forecast
performance.
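
The split and evaluation steps can be sketched as follows. This is a sketch on a synthetic monthly series with made-up values, and it uses the naive benchmark as the forecast purely for illustration; any fitted model's forecasts could be scored the same way.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the cocoa prices (made-up values).
dates = pd.date_range("2022-01-01", "2024-03-01", freq="MS")
prices = pd.Series(np.linspace(2000.0, 3300.0, len(dates)), index=dates)

train = prices.loc[:"2023-09"]          # up to and including September 2023
test = prices.loc["2023-10":"2024-03"]  # October 2023 to March 2024

# Naive benchmark: repeat the last training value over the whole horizon.
forecast = pd.Series(train.iloc[-1], index=test.index)

errors = test - forecast
mae = errors.abs().mean()       # Mean Absolute Error
mse = (errors ** 2).mean()      # Mean Squared Error
print(f"MAE={mae:.2f}  MSE={mse:.2f}")
```

Because the test months are held out before any model is fitted, the two metrics measure genuine out-of-sample accuracy over the 6-month horizon.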

### Discussion of Optimal Model:

1. Model Selection: Discuss your choice of an 'optimum' model based on forecast accuracy, diagnostic tests, and
model simplicity.

2. Adequacy for Forecasting: Provide commentary on the adequacy of the chosen optimal model for forecasting
purposes. Consider factors such as model assumptions, forecast horizon, and robustness.

### Report Writing:

1. Organization: Structure your report with clear sections for each task, including introduction, data description,
model estimation, forecasting, evaluation, and conclusion.

2. Clarity and Interpretation: Clearly present your findings, interpretations, and conclusions in a concise and
understandable manner.

3. Visualizations: Include relevant visualizations (e.g., time series plots, forecast vs. actual plots) to support your
analysis and conclusions.

4. References: Provide proper citations for data sources, models, and methodologies used in your analysis.

Be sure to document your process thoroughly, including any assumptions made, methodology choices, and
interpretations of results.

EDA process to do:

In the preliminary assessment step of time series analysis, exploratory data analysis (EDA) involves examining
the characteristics of the time series data to gain insights into its structure, patterns, and potential issues.

Here are some common techniques for conducting EDA on time series data:

1. Time Series Plot: Plot the time series data over time to visualize its general trend, seasonality, and any
outliers or irregularities. This can be done using a simple line plot with time on the x-axis and the variable of
interest on the y-axis.

2. Seasonal Decomposition: Decompose the time series into its trend, seasonal, and residual components using
methods like classical decomposition or seasonal-trend decomposition using LOESS (STL). This helps identify
underlying patterns and seasonal fluctuations.

3. Histogram and Density Plot: Examine the distribution of the data using histograms or density plots to
understand its variability and skewness. Strong skewness can suggest that a transformation (for example, taking
logs) is worth considering before modeling.

4. Autocorrelation and Partial Autocorrelation Plots: Plot the autocorrelation function (ACF) and partial
autocorrelation function (PACF) to identify the presence of autocorrelation in the data. This helps in determining
the order of autoregressive (AR) and moving average (MA) components in ARIMA modeling.

5. Box Plot or Violin Plot: Visualize the distribution of the data across different time periods, such as months or
seasons, using box plots or violin plots. This can reveal any systematic patterns or differences between time
periods.

6. Time Series Decomposition: Estimate the trend by smoothing the series with moving averages or exponentially
weighted moving averages (EWMA), then treat what remains as seasonality and noise. This can help in
understanding the underlying patterns and trends.

7. Summary Statistics: Calculate summary statistics such as mean, median, standard deviation, minimum, and
maximum values to describe the central tendency and variability of the data.

8. Lag Plots: Create lag plots to visualize the relationship between the time series data and its lagged values.
This can help identify potential autocorrelation and guide the selection of lag orders in ARIMA modeling.
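
A few of the EDA steps above (summary statistics, month-by-month comparison, and autocorrelation) can be sketched with pandas and NumPy alone. The data here are synthetic and made up (a linear trend plus a yearly cycle), and the acf helper is a hypothetical hand-written function shown for illustration rather than a library routine.

```python
import numpy as np
import pandas as pd


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array(
        [np.sum(x[k:] * x[: len(x) - k]) / denom for k in range(max_lag + 1)]
    )


# Synthetic monthly data (made up): upward trend plus a yearly cycle.
dates = pd.date_range("2018-01-01", periods=72, freq="MS")
t = np.arange(72)
prices = pd.Series(2000 + 10 * t + 100 * np.sin(2 * np.pi * t / 12), index=dates)

print(prices.describe())                                  # summary statistics
print(prices.groupby(prices.index.month).mean())          # average by calendar month
print(acf(prices, max_lag=12).round(2))                   # ACF of the raw series
print(acf(prices.diff().dropna(), max_lag=12).round(2))   # ACF after first differencing
```

An ACF that decays slowly from values near 1 is a typical sign of a trend, which is why the differenced series is inspected as well.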

Code Explanation:
1) Syntax: df.set_index('Date', inplace=True)  # df is the name of our DataFrame

This line sets the 'Date' column as the index of the DataFrame df. Because inplace=True is passed, it modifies df
directly instead of returning a new DataFrame.

Here's what each part of the code does:

• df: This is your DataFrame containing the data.

• set_index('Date'): This method sets the 'Date' column as the index of the DataFrame.

• inplace=True: This parameter is optional and defaults to False. When set to True, the method modifies the
existing DataFrame and returns None rather than a new DataFrame.

So, after executing this line of code, your DataFrame df will have the 'Date' column as its index. This is
helpful for time series analysis because, once the dates are parsed as datetimes, you can select rows by date.
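
A short sketch of what the date index makes possible. The 'Price' column and its values are made up for illustration; only the 'Date' column name comes from the line being explained.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["2023-08-01", "2023-09-01", "2023-10-01"],
        "Price": [3400.0, 3600.0, 3700.0],  # made-up values
    }
)
df["Date"] = pd.to_datetime(df["Date"])  # strings -> timestamps
df.set_index("Date", inplace=True)       # modifies df; the call itself returns None

print(df.loc["2023-09"])     # all rows in September 2023
print(df.loc[:"2023-09"])    # everything up to and including September 2023

# Equivalent without inplace: assign the returned frame instead.
# df = df.set_index("Date")
```

Selecting with a partial date string such as "2023-09" only works when the index holds real datetimes, which is why the conversion with pd.to_datetime comes first.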

2) Syntax: plt.gca().xaxis.set_major_locator(YearLocator())
• plt.gca(): This function gets the current Axes instance in the current figure. "gca" stands for "get current axes".
• xaxis: This attribute of the Axes instance represents the x-axis.
• set_major_locator(YearLocator()): This method sets the major locator for the x-axis ticks. YearLocator()
(imported from matplotlib.dates) is a locator that places ticks at regular intervals of years, by default one tick
per year.

So, by calling plt.gca().xaxis.set_major_locator(YearLocator()), you place one major x-axis tick per year, which
makes a long time series plot easier to read. The locator only controls where the ticks go; how they are labelled
is controlled by the tick formatter.
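
A minimal plotting sketch showing the locator in context. The data are made up, the "Agg" backend is chosen only so the script runs without a display, and the DateFormatter line is an addition for illustration (it is not part of the line explained above).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter, YearLocator

# Made-up monthly values covering three years.
dates = pd.date_range("2021-01-01", "2023-12-01", freq="MS").to_pydatetime()
values = list(range(len(dates)))

fig = plt.figure()
plt.plot(dates, values)
ax = plt.gca()                                      # current Axes
ax.xaxis.set_major_locator(YearLocator())           # one major tick per year
ax.xaxis.set_major_formatter(DateFormatter("%Y"))   # label each tick with the year only
fig.canvas.draw()                                   # render so tick labels are populated
```

In an interactive session, plt.show() would replace the explicit fig.canvas.draw() call.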

TIME SERIES MODELS

1. Mean Model:
• Description: The mean model is one of the simplest time series models, where the prediction
for each future time point is simply the mean of all past observations. It assumes that the time
series data fluctuates around a constant average value, and future values are expected to be
similar to the historical average.
• Formula: $\hat{Y}_{t+1} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, where $\hat{Y}_{t+1}$ is the predicted value
for the next time point, $Y_i$ represents the observed values in the historical data, and $n$ is
the total number of observations.
• Usage: The mean model is often used as a baseline or benchmark model for time series
forecasting. It provides a simple reference point for evaluating the performance of more
complex models.
• Assumptions: The mean model assumes that the underlying process generating the time
series data is stationary and does not exhibit any trend or seasonality. It also assumes that the
mean of the time series remains constant over time.

2. Naive Model:
• Description: The naive model is even simpler than the mean model, where the prediction for
each future time point is equal to the last observed value in the time series. It assumes that
future values will remain constant and equal to the most recent observation.
• Formula: $\hat{Y}_{t+1} = Y_t$, where $\hat{Y}_{t+1}$ is the predicted value for the next time
point, and $Y_t$ represents the last observed value in the time series.
• Usage: The naive model is often used as a baseline for comparison with more sophisticated
forecasting methods. Despite its simplicity, it can perform well for series that behave like a
random walk, such as many financial and commodity price series.
• Assumptions: The naive model assumes that there are no systematic patterns or trends in the
time series data, and that future values will be similar to the most recent observation.

3. Random Walk Model:
• Description: The random walk model is a stochastic process where each value in the
time series is equal to the previous value plus some random noise. It assumes that future
values are driven by the most recent observation, but also incorporates random fluctuations or
shocks.
• Formula: $Y_t = Y_{t-1} + \epsilon_t$, where $Y_t$ represents the value at time $t$, $Y_{t-1}$
represents the value at the previous time point, and $\epsilon_t$ represents a random error term.
• Usage: The random walk model is commonly used for modeling and forecasting processes
that exhibit persistence or autocorrelation in their time series structure. It is the special case
ARIMA(0,1,0) and can be extended to more general autoregressive integrated moving average
(ARIMA) models.
• Assumptions: The random walk model assumes that the random error terms $\epsilon_t$ are
independent and identically distributed (i.i.d.), and that there are no systematic trends (no drift) or
other patterns in the time series data.

These three simple models provide a starting point for time series forecasting and can be
useful for establishing baseline performance metrics. However, they may not capture more
complex patterns or dynamics present in real-world time series data. Hence, more
sophisticated models are often required for accurate forecasting in practical applications.
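
The three baseline models can be sketched in a few lines of NumPy. The function names are hypothetical, the numbers are made up, and drawing the random walk shocks from a normal distribution is an assumption made only for this illustration (the model itself just requires i.i.d. errors).

```python
import numpy as np


def mean_forecast(y, horizon):
    """Every future value is the average of all past observations."""
    return np.full(horizon, np.mean(y))


def naive_forecast(y, horizon):
    """Every future value is the last observed value."""
    return np.full(horizon, y[-1])


def simulate_random_walk(start, n_steps, sigma, seed=0):
    """Y_t = Y_{t-1} + eps_t, with eps_t drawn i.i.d. from N(0, sigma^2) (assumed)."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=n_steps)
    return start + np.cumsum(shocks)


history = np.array([10.0, 12.0, 11.0, 15.0])  # made-up numbers
print(mean_forecast(history, 3))   # three copies of the historical mean
print(naive_forecast(history, 3))  # three copies of the last observation
print(simulate_random_walk(start=15.0, n_steps=5, sigma=1.0))
```

Since the shocks of a driftless random walk have mean zero, its point forecast coincides with the naive forecast; the simulation only shows how individual paths can wander away from it.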

Exponential Smoothing:
Simple Exponential Smoothing (SES):
Holt's Linear Trend method:
Holt-Winters' seasonal method:
