Time Series Analysis

The document provides a comprehensive guide on time series analysis using Python, including data preprocessing, visualization, and testing for stationarity. It covers techniques such as differencing, seasonal decomposition, and model fitting using ARIMA and SARIMAX. The document also highlights the importance of determining the right order of differencing and includes practical examples with code snippets.

Uploaded by

Daniel Wu

---------------------------------------Time Series Analysis----------------------------------

# date_parser: this specifies a function which converts an input string
# into a datetime value. By default pandas reads dates in the
# format 'YYYY-MM-DD HH:MM:SS'. If the data is not in this format,
# the format has to be defined manually. Something similar to the
# dateparse function defined here can be used for this purpose.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
%matplotlib inline

# Convert dates to the correct time series date format if needed
# (pd.datetime is gone from recent pandas versions, so use datetime.strptime)

dateparse = lambda dates: datetime.strptime(dates, '%Y-%m')

# 1 Read in the time series
# Option A: parse the dates and set the index while reading

df = pd.read_csv('D:\\For Dan\\Learning\\Web\\AirPassengers.csv',
                 parse_dates=['Month'], index_col='Month')
# If the dates are not parsed correctly, apply the lambda function above
# with date_parser=dateparse (newer pandas versions prefer date_format='%Y-%m')

# Check the data, drop any N/A rows

df = df.dropna()

# 2 Preprocessing, also check for missing values
# Option B: if the file was read without parse_dates/index_col,
# convert the 'Month' column and set it as the index afterwards

# df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
# df.set_index('Month', inplace=True)
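
As a rough, self-contained illustration of the date handling above, the sketch below parses 'YYYY-MM' strings from a small in-memory CSV and sets them as the index. The three sample rows and the 'Passengers' column name are made up for the example and are not taken from the actual AirPassengers.csv file.

```python
import io
import pandas as pd

# Hypothetical sample in the same 'YYYY-MM' layout as the Month column
csv_text = "Month,Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"

raw = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to datetimes, then use it as the index
raw['Month'] = pd.to_datetime(raw['Month'], format='%Y-%m')
sample = raw.set_index('Month')

print(sample.index)              # a DatetimeIndex, one entry per month
print(sample.isna().sum().sum()) # total number of missing values
```

With a DatetimeIndex in place, date-based slicing such as sample.loc['1949-02':] works directly.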

----------------Visualize the Data

plt.rcParams['figure.figsize'] = 8,4
df.plot()

-----------------Testing For Stationarity

### Dickey-Fuller test for time series stationarity
from statsmodels.tsa.stattools import adfuller

# Check whether the p-value is below 5% (or 1%), i.e. whether Ho can be
# rejected at the 95% (or 99%) confidence level

adfuller(df['Sales'])

# Ho: the series is non-stationary (it has a unit root)

# H1: the series is stationary

def adf_test(values):
    result = adfuller(values)
    labels = ['ADF Test Statistic', 'p-value', '#Lags Used',
              'Number of Observations Used']
    for value, label in zip(result, labels):
        print(label + ' : ' + str(value))
    if result[1] <= 0.05:
        print("strong evidence against the null hypothesis (Ho), reject the null "
              "hypothesis. Data has no unit root and is stationary")
    else:
        print("weak evidence against the null hypothesis, time series has a unit root, "
              "indicating it is non-stationary")

adf_test(df['Sales'])

------------Difference
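df['Sales First Difference'] = df['Sales'] - df['Sales'].shift(1)
# Since the data is seasonal (the sales cycle usually repeats every year, so the
# seasonal period is 12 months), also take a seasonal difference
df['Seasonal First Difference'] = df['Sales'] - df['Sales'].shift(12)

# Or, equivalently, use diff(): diff() gives the first difference and
# diff(12) gives the seasonal difference
df['Sales'].diff()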
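
A tiny worked example may help confirm that subtracting a shifted copy and calling diff() give the same result. The five values are invented, and a lag of 2 stands in for the 12-month seasonal lag so the series can stay short.

```python
import pandas as pd

s = pd.Series([10, 12, 15, 11, 18], dtype=float)

manual = s - s.shift(1)   # NaN, 2, 3, -4, 7
builtin = s.diff()        # same values as manual

# Lag-2 difference, the short-series analogue of diff(12)
seasonal = s.diff(2)      # NaN, NaN, 5, -1, 3

print(manual.equals(builtin))
print(seasonal.tolist())
```

Each difference of lag k leaves k leading NaN values, which is why the later steps call dropna() before testing.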

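# Check for a constant mean/std with rolling statistics (the plot below uses the
# original series; swap in a differenced column to check after differencing)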

plt.rcParams['figure.figsize'] = 8,4
rolmean = df['Sales'].rolling(12).mean()
rolstd = df['Sales'].rolling(12).std()
orig = plt.plot(df['Sales'], color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend()
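
The rolling statistics used in the plot can be checked by hand on a short series. This is only a sketch with invented numbers and a window of 3 instead of 12.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], dtype=float)

# Each value summarizes the current point and the two before it;
# the first window-1 entries are NaN because the window is not full yet
rolmean = s.rolling(3).mean()
rolstd = s.rolling(3).std()

print(rolmean.tolist())
print(rolstd.tolist())
```

For a stationary series both lines should stay roughly flat over time; here the rolling mean climbs steadily, which is the signature of a trend.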

# Run the adfuller test again; make sure to dropna first
# (adf_test prints its own output and returns None, so it is called directly)

adf_test(df['Sales'].dropna())
print('\n')
adf_test(df['Sales First Difference'].dropna())
print('\n')
adf_test(df['Seasonal First Difference'].dropna())

df['Seasonal First Difference'].plot()

----------------Decomposing
from statsmodels.tsa.seasonal import seasonal_decompose

#Also make sure to dropna

dfs = df['Seasonal First Difference'].dropna()
decomposition = seasonal_decompose(dfs)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

How to determine the right order of differencing?

The right order of differencing is the minimum differencing required to get a
near-stationary series that roams around a defined mean and whose ACF plot drops
to zero fairly quickly.

If the autocorrelations are positive for many lags (10 or more), the series needs
further differencing. On the other hand, if the lag-1 autocorrelation itself is
too negative, the series is probably over-differenced.

If you can't decide between two orders of differencing, go with the order that
gives the smallest standard deviation in the differenced series.

If your series is slightly under-differenced, adding one or more additional AR
terms usually makes up for it. Likewise, if it is slightly over-differenced, try
adding an additional MA term.
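
The rules of thumb above can be tried out on a synthetic series. The sketch below builds a random walk (which needs exactly one difference) and reports the standard deviation and lag-1 autocorrelation for d = 0, 1 and 2. The helper lag1_autocorr, the seed and the series are assumptions made for this illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=500))  # random walk, so d=1 is appropriate

summary = {}
for d in range(3):
    diffed = np.diff(series, n=d)         # n=0 returns the series unchanged
    summary[d] = (float(np.std(diffed)), lag1_autocorr(diffed))
    print('d =', d, 'std =', summary[d][0], 'lag-1 acf =', summary[d][1])
```

In this setup d=1 should give the smallest standard deviation, d=0 a lag-1 autocorrelation close to 1 (under-differenced), and d=2 a clearly negative one (over-differenced).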

------------fit the model find p d q

---1 plot acf / pacf

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# iloc[13:] skips the leading NaN rows created by the 12-month shift
plt.figure(figsize=(12, 8))
plot_acf(df['Seasonal First Difference'].iloc[13:], lags=40)
plot_pacf(df['Seasonal First Difference'].iloc[13:], lags=40)

---2 plot acf / pacf

# df.value is the value column of the series (df['Sales'] in the examples above)

# PACF plot of the 1st differenced series (helps choose p)
plt.rcParams.update({'figure.figsize': (9, 3), 'figure.dpi': 120})

fig, axes = plt.subplots(1, 2, sharex=True)

axes[0].plot(df.value.diff()); axes[0].set_title('1st Differencing')
axes[1].set(ylim=(0, 5))
plot_pacf(df.value.diff().dropna(), ax=axes[1])

plt.show()

# ACF plot of the 1st differenced series (helps choose q)
plt.rcParams.update({'figure.figsize': (9, 3), 'figure.dpi': 120})

fig, axes = plt.subplots(1, 2, sharex=True)

axes[0].plot(df.value.diff()); axes[0].set_title('1st Differencing')
axes[1].set(ylim=(0, 1.2))
plot_acf(df.value.diff().dropna(), ax=axes[1])

plt.show()

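# For non-seasonal data

# p=1, d=1, q=0 or 1
# p is the order of the AR term
# q is the order of the MA term
# d is the number of differences required to make the time series stationary

# statsmodels.tsa.arima_model is no longer available in recent statsmodels
# releases; the current ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(df['Sales'], order=(1, 1, 1))
model_fit = model.fit()
model_fit.summary()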

df['forecast']=model_fit.predict(start=90,end=103,dynamic=True)
df[['Sales','forecast']].plot(figsize=(12,8))
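
To make the meaning of the AR order p more concrete, here is a minimal sketch that simulates an AR(1) process (p=1) and recovers its coefficient with a plain least-squares fit. It uses only NumPy and is a deliberately simplified stand-in, not what ARIMA.fit does internally; the coefficient 0.7, the seed and all names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 0.7      # assumed true AR(1) coefficient
n = 2000

# AR(1): each value is phi times the previous value plus fresh noise
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# Least-squares regression of y[t] on y[t-1] (no intercept)
phi_hat = float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))
print('estimated AR coefficient:', phi_hat)
```

With p=1 the model needs only this single coefficient; a higher p adds one coefficient per additional lag.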

# seasonal_order=(P, D, Q, s): the seasonal AR order, seasonal differencing order,
# seasonal MA order, and the number of periods per season (12 for monthly data)

import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(df['Sales'], order=(1, 1, 1),
                                  seasonal_order=(1, 1, 1, 12))
results = model.fit()

# See how our forecast fits the actual values

df['forecast'] = results.predict(start=90, end=103, dynamic=True)
df[['Sales', 'forecast']].plot(figsize=(12, 8))

# Predict the future months

from pandas.tseries.offsets import DateOffset
future_dates = [df.index[-1] + DateOffset(months=x) for x in range(0, 24)]

# future_dates[0] is the last observed month, so skip it
future_datest_df = pd.DataFrame(index=future_dates[1:], columns=df.columns)

future_df = pd.concat([df, future_datest_df])
future_df['forecast'] = results.predict(start=104, end=120, dynamic=True)
future_df[['Sales', 'forecast']].plot(figsize=(12, 8))
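
The index-extension step can be tried in isolation. The sketch below uses a made-up three-month frame and, as a design choice for the example, reindex instead of concat to append the empty future rows; the data and names are assumptions.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical three-month history
df_small = pd.DataFrame(
    {'Sales': [10.0, 12.0, 11.0]},
    index=pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01']),
)

# Start at 1 so the last observed month is not repeated
future_dates = [df_small.index[-1] + DateOffset(months=x) for x in range(1, 4)]

# Reindexing onto the extended index adds NaN rows for the future months
full_index = df_small.index.append(pd.DatetimeIndex(future_dates))
future_small = df_small.reindex(full_index)

print(future_small)
```

The NaN rows are placeholders that a forecast column can later fill, since assignment aligns on the date index.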

---------------------------An End-to-End Project on Time Series Analysis and Forecasting with Python---------------------------------------------------------

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

# You can specify usecols when reading in files, or drop the unused columns later
df = pd.read_excel('C:\\Users\\wooju\\Desktop\\Python Programing\\Python Learning Journey\\Dataset\\Superstore.xls',
                   sheet_name='Orders', usecols=['Order Date', 'Segment'])

If you are interested in summarizing all of the sales by month, you can use the
resample function. The tricky part about resample is that it only operates on a
datetime-like index. In this data set the data is not indexed by the date column,
so resample will not work without restructuring the data. To make it work, use
set_index to make the date column the index and then resample.
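
A minimal sketch of that set_index-then-resample pattern, using a few invented orders. The 'Sales' column and its values are assumptions for illustration (the read_excel call above only loads 'Order Date' and 'Segment'), and 'MS' labels each month by its first day.

```python
import pandas as pd

# Hypothetical order lines; not taken from the Superstore file
orders = pd.DataFrame({
    'Order Date': pd.to_datetime(['2017-01-05', '2017-01-20', '2017-02-03',
                                  '2017-02-28', '2017-03-15']),
    'Sales': [100.0, 50.0, 25.0, 75.0, 10.0],
})

# resample needs a datetime index, so move the date column into the index first
monthly = orders.set_index('Order Date')['Sales'].resample('MS').sum()

print(monthly)
```

Swapping sum() for mean() gives average sales per month instead of the monthly total.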
