Time Series

Steps in a Time Series EDA workflow

1. Data Ingestion
2. EDA of the Data
3. Preprocessing of the Data
4. Model Building
5. Model Evaluation

Data Ingestion Steps:

1. Import the required libraries such as numpy, pandas, matplotlib, seaborn, etc.
2. Load the time series data into a pandas dataframe
3. Set the datetime column as the index of the dataframe
4. Check the datatype of the index and convert it to datetime if necessary
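The steps above can be collapsed into a single `read_csv` call. A minimal, self-contained sketch (an in-memory CSV stands in for TSLA.csv, which we don't have here):

```python
import io
import pandas as pd

# Tiny CSV standing in for TSLA.csv; with a real file, pass the filename instead.
csv_text = """Date,Close
2023-01-01,102.37
2023-01-02,103.27
2023-01-03,104.66
"""

# parse_dates + index_col do steps 2-4 in one call: load the data,
# convert Date to datetime64, and set it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(df.index.dtype)  # datetime64[ns]
```

With `index_col="Date"` and `parse_dates=["Date"]` together, no separate `to_datetime` / `set_index` step is needed.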

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
import warnings
warnings.filterwarnings('ignore')

df=pd.read_csv('TSLA.csv')
df

Date Open High Low Close Volume Dividends Stock Splits

0 2023-01-01 102.264052 102.844516 102.016732 102.375100 190884 0.0 0.0

1 2023-01-02 103.164210 103.568883 103.072105 103.268399 144529 0.0 0.0

2 2023-01-03 104.642948 104.945523 104.396706 104.661726 114590 0.0 0.0

3 2023-01-04 107.383841 107.749974 107.409781 107.514532 144406 0.0 0.0

4 2023-01-05 109.751399 109.687393 108.002799 109.147197 152652 0.0 0.0

... ... ... ... ... ... ... ... ...

360 2023-12-27 274.683259 274.739668 274.622839 274.681922 198906 0.0 0.0

361 2023-12-28 275.187029 275.220635 274.802580 275.070082 171058 0.0 0.0

362 2023-12-29 276.618878 277.740538 276.938281 277.099232 108824 0.0 0.0

363 2023-12-30 277.458843 278.365180 277.325499 277.716507 119610 0.0 0.0

364 2023-12-31 277.943161 278.736790 276.368373 277.682775 106382 0.0 0.0

365 rows × 8 columns

df.isnull().sum()

Date            0
Open            0
High            0
Low             0
Close           0
Volume          0
Dividends       0
Stock Splits    0
dtype: int64

Now we perform univariate analysis on the Close price.

df = df[['Date','Close']]
df
Date Close

0 2023-01-01 102.375100

1 2023-01-02 103.268399

2 2023-01-03 104.661726

3 2023-01-04 107.514532

4 2023-01-05 109.147197

... ... ...

360 2023-12-27 274.681922

361 2023-12-28 275.070082

362 2023-12-29 277.099232

363 2023-12-30 277.716507

364 2023-12-31 277.682775

365 rows × 2 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null object
1 Close 365 non-null float64
dtypes: float64(1), object(1)
memory usage: 5.8+ KB

df["Date"]=pd.to_datetime(df.Date)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null datetime64[ns]
1 Close 365 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB

stock_df=df.set_index("Date")
stock_df

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

Why we convert the Date column into the index:

1. Retrieving the data becomes easy
2. Visualization becomes easy
3. The libraries we use for time series data, such as statsmodels and scipy, expect the data to have a datetime index.
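For example, a datetime index enables partial-string slicing, which is what makes retrieval easy. A toy series stands in for stock_df here so the sketch is self-contained:

```python
import numpy as np
import pandas as pd

# Toy daily series with a datetime index, standing in for stock_df["Close"].
idx = pd.date_range("2023-01-01", periods=365, freq="D")
s = pd.Series(np.arange(365.0), index=idx, name="Close")

# Partial-string indexing selects a whole month...
june = s.loc["2023-06"]
print(len(june))  # 30

# ...or an arbitrary date range.
q1 = s.loc["2023-01-01":"2023-03-31"]
print(len(q1))  # 90
```

The same `.loc` calls would raise a KeyError on a plain RangeIndex, which is why setting the datetime index first matters.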

EDA of the Data

1. Summary statistics such as mean, median, mode, etc.
2. Visualize the time series data
3. Stationarity check using the augmented Dickey-Fuller test
4. Check for autocorrelation using the autocorrelation function (ACF)
5. Check for outliers
6. Check the partial autocorrelation function (PACF), which helps choose the ARIMA order

Preprocessing of the data

1. Fill the missing values (not required here)
2. Convert the data into a stationary time series
3. Normalize the data if necessary (not required here)
4. Split the data into train and test sets
5. Clean the data by removing outliers (not required here)

stock_df.describe()

Close
count    365.000000
mean     199.661626
std       51.101389
min      102.375100
25%      147.327615
50%      205.663111
75%      238.942848
max      277.716507

stock_df.head()

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

plt.plot(stock_df)
plt.show()

plt.hist(stock_df)
plt.show()

# distplot is deprecated in recent seaborn; histplot with kde=True is the replacement
sns.histplot(stock_df, kde=True)
plt.show()

# plotting close price

plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Close Prices', fontsize = 20)
plt.yticks(fontsize = 15)
plt.plot(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price', fontsize = 30)
plt.show()
# plotting the distribution of close prices

plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Close Prices', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Frequency', fontsize = 20)
plt.yticks(fontsize = 15)
plt.hist(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price Distribution', fontsize = 30)
plt.show()

# Style and figure size


plt.style.use('ggplot')
plt.figure(figsize=(18, 8))

# Labeling
plt.xlabel('Dates', fontsize=20)
plt.xticks(fontsize=15)
plt.ylabel('Close Prices', fontsize=20)
plt.yticks(fontsize=15)

# Plotting the distribution (Kernel Density Estimate plot)


sns.kdeplot(stock_df['Close'], color='blue', linewidth=3)

# Title
plt.title('Tesla Stock Closing Price Distribution', fontsize=30)

plt.grid(True)
plt.show()

stock_df["Close"]

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64

# Rolling mean with a window size of 120

rolemean = stock_df["Close"].rolling(120).mean()
rolemean
Close

Date

2023-01-01 NaN

2023-01-02 NaN

2023-01-03 NaN

2023-01-04 NaN

2023-01-05 NaN

... ...

2023-12-27 254.024091

2023-12-28 254.389435

2023-12-29 254.769750

2023-12-30 255.152673

2023-12-31 255.535278

365 rows × 1 columns

dtype: float64

# Rolling standard deviation with the same window size of 120

rolestd = stock_df["Close"].rolling(120).std()
rolestd

Close

Date

2023-01-01 NaN

2023-01-02 NaN

2023-01-03 NaN

2023-01-04 NaN

2023-01-05 NaN

... ...

2023-12-27 14.628715

2023-12-28 14.602063

2023-12-29 14.594201

2023-12-30 14.588376

2023-12-31 14.572035

365 rows × 1 columns

dtype: float64

plt.plot(stock_df.Close)
plt.plot(rolemean)
plt.plot(rolestd)

[<matplotlib.lines.Line2D at 0x7da8e8eca110>]

from statsmodels.tsa.stattools import adfuller
adft=adfuller(stock_df['Close'])

pd.Series(adft[0:4],index=["test stats","p value","lag","data points"])

test stats      -1.893196
p value          0.335269
lag              0.000000
data points    364.000000
dtype: float64

# null hypothesis: the data is not stationary
# alternate hypothesis: the data is stationary
# p value = 0.335269

# if p < 0.05: reject the null hypothesis (stationary)
# if p > 0.05: fail to reject the null hypothesis (non-stationary)
# here p = 0.335 > 0.05, so the series is non-stationary

def test_stationarity(timeseries):
    # Determining rolling statistics
    rolmean = timeseries.rolling(48).mean()  # rolling mean
    rolstd = timeseries.rolling(48).std()    # rolling standard deviation

    # Plotting rolling statistics
    plt.figure(figsize=(18, 8))
    plt.grid('both')
    plt.plot(timeseries, color='blue', label='Original', linewidth=3)
    plt.plot(rolmean, color='red', label='Rolling Mean', linewidth=3)
    plt.plot(rolstd, color='black', label='Rolling Std', linewidth=4)
    plt.legend(loc='best', fontsize=20, shadow=True, facecolor='lightgray', edgecolor='k')
    plt.title('Rolling Mean and Standard Deviation', fontsize=25)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print("Results of Dickey-Fuller Test:")
    adft = adfuller(timeseries, autolag='AIC')

    # Displaying the output of the Dickey-Fuller test
    output = pd.Series(adft[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in adft[4].items():
        output[f'Critical Value ({key})'] = value
    print(output)

test_stationarity(stock_df.Close)

Results of Dickey-Fuller Test:


Test Statistic -1.893196
p-value 0.335269
#Lags Used 0.000000
Number of Observations Used 364.000000
Critical Value (1%) -3.448443
Critical Value (5%) -2.869513
Critical Value (10%) -2.571018
dtype: float64

# check for outliers

sns.boxplot(stock_df.Close)

<Axes: ylabel='Close'>

#Time series decomposition


from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(stock_df[["Close"]],period=12)
result.plot()
plt.show()

result.seasonal

seasonal

Date

2023-01-01 -0.049962

2023-01-02 0.098094

2023-01-03 -0.012132

2023-01-04 0.071651

2023-01-05 0.282969

... ...

2023-12-27 -0.049962

2023-12-28 0.098094

2023-12-29 -0.012132

2023-12-30 0.071651

2023-12-31 0.282969

365 rows × 1 columns

dtype: float64

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(stock_df.Close)   # shows the correlation of the series with itself at different lags
plot_pacf(stock_df.Close)
plt.show()

df_close = stock_df["Close"]
df_close
Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64


df_close = df_close.diff()   # first-order differencing
df_close = df_close.dropna() # the first value becomes NaN after differencing

test_stationarity(df_close)
Results of Dickey-Fuller Test:
Test Statistic -5.281090
p-value 0.000006
#Lags Used 7.000000
Number of Observations Used 356.000000
Critical Value (1%) -3.448853
Critical Value (5%) -2.869693
Critical Value (10%) -2.571114
dtype: float64
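One caveat worth keeping in mind: after differencing, the model works on the differenced scale, so forecasts must be inverted back to the price scale with a cumulative sum. A toy sketch (synthetic prices, not the TSLA data):

```python
import pandas as pd

# Toy price series standing in for stock_df["Close"].
prices = pd.Series([100.0, 102.0, 101.0, 105.0],
                   index=pd.date_range("2023-01-01", periods=4, freq="D"))

diffed = prices.diff().dropna()  # this is what we model after differencing

# Invert the transform: cumulative-sum the differences and add back the
# first observed price, recovering the original price scale.
restored = diffed.cumsum() + prices.iloc[0]

print(restored.tolist())  # [102.0, 101.0, 105.0]
```

The same trick applied to forecasted differences (anchored at the last observed price) turns differenced-scale predictions into price predictions.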

Perform a train-test split of the time series

df_close[0:-60]  # training data

Close

Date

2023-01-02 0.893299

2023-01-03 1.393326

2023-01-04 2.852806

2023-01-05 1.632665

2023-01-06 0.547885

... ...

2023-10-28 -0.800922

2023-10-29 2.113884

2023-10-30 0.623045

2023-10-31 -0.615555

2023-11-01 1.324551

304 rows × 1 columns

dtype: float64

df_close[-60:]#testing data

Close

Date

2023-11-02 -0.166372

2023-11-03 -1.044071

2023-11-04 -0.820504

2023-11-05 1.426642

2023-11-06 0.892228

2023-11-07 -0.178966

2023-11-08 1.583613

2023-11-09 -0.351853

2023-11-10 -0.247485

2023-11-11 -0.109094

2023-11-12 0.598929

2023-11-13 0.740941

2023-11-14 0.324071

2023-11-15 0.351657

2023-11-16 0.286591

2023-11-17 -0.604990

2023-11-18 0.541170

2023-11-19 0.030313

2023-11-20 -0.329250

2023-11-21 -0.608250

2023-11-22 0.342926

2023-11-23 0.701763

2023-11-24 1.912550

2023-11-25 0.010383

2023-11-26 1.710726

2023-11-27 1.372687

2023-11-28 -0.686699

2023-11-29 0.954674

2023-11-30 -0.466640

2023-12-01 -2.042970

2023-12-02 0.638001

2023-12-03 -1.164504

2023-12-04 1.029772

2023-12-05 0.085211

2023-12-06 1.994400

2023-12-07 2.125879

2023-12-08 -0.086252

2023-12-09 -0.577820

2023-12-10 -0.268866

2023-12-11 -0.369897

2023-12-12 -0.074961

2023-12-13 0.421395

2023-12-14 0.703201

2023-12-15 1.114858

2023-12-16 0.921257

2023-12-17 -0.380721

2023-12-18 -1.068875

2023-12-19 1.768739

2023-12-20 0.322271

2023-12-21 -0.512357

2023-12-22 0.072897

2023-12-23 -1.323132

2023-12-24 0.106720

2023-12-25 -0.330315

2023-12-26 1.021908

2023-12-27 1.472730

2023-12-28 0.388160

2023-12-29 2.029151

2023-12-30 0.617275

2023-12-31 -0.033732

dtype: float64

# split the data into train and test sets
train_data = df_close[0:-60]
test_data = df_close[-60:]

plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.ylabel('Closing Prices', fontsize = 20)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.plot(train_data, 'green', label='Train data', linewidth = 5)
plt.plot(test_data, 'blue', label='Test data', linewidth = 5)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')

<matplotlib.legend.Legend at 0x7da92d05a890>

Model building in Time Series


# this time we use the ARIMA model
stock_df["Close"]

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64

365-60  # intended training size; note df_close has 364 rows after differencing, so train_data actually has 304

305

import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

history = [x for x in train_data]


history

[0.8932991355088831,
1.3933263239742928,
2.852806278873757,
1.6326649420308286,
0.5478846329143892,
0.7696058342437482,
0.9625287430472156,
0.018231373084361735,
1.1077005040734065,
0.7497050402641605,
1.972947678168694,
0.40951767041809717,
1.1636240679130196,
1.2304959594809617,
0.18371201236193713,
2.4325497305482457,
-0.26391312202788697,
1.442625668491928,
-0.5318829334557194,
-2.413729246967847,
1.6712015109161484,
0.9317498913066089,
0.0698322548046093,
2.807609973898053,
-1.0265658609344541,
0.43923686869243284,
0.5207027332374281,
1.6977267356932941,
2.4579251484588553,
0.3907230052264481,
1.0285667069413762,
-0.4168248436978672,
-1.31013559865616,
-0.06614521282753572,
0.7182963135563796,
1.4772153494880627,
1.995977100207341,
-0.2607738551311627,
0.18751835521305793,
-0.1470745941979601,
-0.9302395758612363,
-1.175458451390483,
2.2248009695441056,
-0.02044980691758269,
0.19315671831344616,
-0.6671134180505476,
1.1357477139460173,
-0.8821431823303101,
0.35407616995422586,
-0.8295849273184785,
0.9959651825627134,
-0.06403624432709876,
-0.1254060218069526,
0.20201766368549556,
0.8513926357980495,
0.44520452621046047,
0.3740700567433919,
0.532466961761088,
-0.12477596304930216,
-0.01570540903489359,
0.1639978537589286,
-0.7857140190270115,
-1.2328669657189266,
1.1792026695864877,
-0.017910958370322305,
-1.4507965353131453,
1.5357081015208678,
-0.61471958905355,
0.41149953226522484,
1.0698880810996627,
0.8176723681895055,
1.6438973978009699,
-0.5466558718802901,
0.5599350920078052,
-0.17443936153424033,
-0.5136371255328243,
0.187439400478155,
0.5121577171133822,
0.4071279719513825,
-0.760302556354219,
1.2999894672940684,
0.667526643148932,
-0.7276142970002581,
1.912569456755989,
2.027267021302805,
2.3918575744206407,
-0.13105016392063362,
-0.17900711779094536,
1.4184955199386025,
0.38286385364656894,
1.3036863601885216,
0.5610416187842588,
1.835680127192603,
0.8274719712972285,
0.962973437639846,
0.43310279806178187,
2.0598256806272843,
0.8728259721148106,
1.1014355498728605,
2.7299102840982528,
-1.5219601321712162,
-0.5815011359226219,
1.236853495737762,
-0.2699754161442627,
1.9296890641726065,
0.6184725435801113,
-0.29059930396928735,
2.217915352321569,
2.415112577533364,
2.2554566313530415,
1.5617323934664569,
-0.9518744601043636,
2.5516783572660984,
0.41447323494159605,
1.4482892620561643,
1.3321046160507137,
0.23776838206009643,
1.2299132591679154,
1.5062757615648366,
0.7259905772023387,
-0.14271730621246093,
0.5038740865232683,
1.8941510595801674,
-0.20365688303775187,
0.04264250908897793,
0.3792145940770979,
2.384819429402455,
1.2267840907422567,
1.0470496207943256,
-0.9197373312707668,
1.105424410204506,
-0.12503909465468155,
0.4347971674349935,
-0.00047127816949910084,
1.4605530632271666,
0.8567678108843779,
0.4095450043537312,
0.9541648088898853,
-0.9846694681493204,
-1.0661810820396056,
1.0496226968534188,
0.6124782628608898,
1.3059418277300097,
3.0336473041813576,
1.6702186490335293,
-0.47266247272744977,
1.2718941508587136,
-1.0736219471799018,
0.56404071487745,
0.050955787937795094,
2.648999286346992,
-0.3239580264257995,
-0.4083176648713902,
0.736210905074671,
-0.47007975917725275,
1.2075272192262787,
-0.4642814851631556,
-0.648229305801209,
0.42947268677431794,
-0.21222472933988,
2.547204410309206,
1.4523247896722182,
0.14151962258091544,
0.021271593676090106,
0.2571075913099321,
0.41571940289978215,
-0.7933694966874612,
0.8053587181372848,
1.0925945456034185,
1.7128518843369989,
0.7429764591254013,
1.1948999450876556,
-0.12492871305823883,
-0.3450670390670041,
0.7897461900880103,
-0.1497014015942284,
0.08206034496333814,
-0.01971566829175231,
0.138933152095575,
0.4939003011037073,
-0.5239700280359045,
-1.155146324585104,
-0.5870156275400689,
0.686234065534677,
-1.335096902104965,
-0.445184759171525,
0.7279536732392273,
-0.5440324664817808,
2.058874594525122,
-0.27255959791790474,
-0.2724127433110368,
0.8499787304334347,
-0.7095231836120774,
1.1477017576514754,
0.40817688196634094,
1.3651581417331329,
1.3606390307013214,
2.229787485292121,
1.8274741977712665,
0.26484370559887793,
0.5955447688372146,
1.5488637598278103,
1.334956406980524,
1.048843017829114,
-1.0411073790201328,
0.12284371446423847,
-0.3749746318142684,
1.3140512763593222,
0.3287323868915166,
1.1016466869869816,
0.5882608524204045,
1.3703320827427206,
-0.1815827758780415,
0.2385799079773676,
-0.47976109312293147,
0.6954842758049438,
0.6059251624713227,
3.2347373527486525,
0.33844309895349056,
-0.303456523336763,
-0.002355769474434055,
-0.04602240039588423,
1.1438055364243667,
-1.6396435925654487,
1.2086371235202478,
0.5362330707300487,
0.9004626050362674,
-0.7666321295496061,
1.1056824150820432,
-1.0717533413471472,
-0.35213855788359183,
0.10624865407280026,
0.7082703765054248,
-0.3099064479051208,
0.9823418343268884,
2.307936411124075,
-2.0564375154723677,
1.2698091022463132,
0.8851898436832926,
0.24098021966440797,
-0.08847856271148657,
0.2326345171115065,
0.30425645941497237,
0.004451985076087794,
-1.0164009236605978,
1.7421066978022566,
1.9739693751788252,
-0.38503585552368236,
-1.0979274424854566,
0.9869216436693193,
-0.021929141850222322,
0.6957551437645861,
0.10236768419537157,
1.0597859114010078,
1.3301609680348747,
-0.2621673797302151,
-1.0454573238503997,
-1.2060857036777577,
1.3796231147639162,
-0.9547226304205481,
0.27181288446479357,
0.3146558113507183,
-0.013781623356692307,
-1.435210642150338,
0.5538345451957127,
1.220740058501633,
0.6369303829316095,
0.25667955803731957,
0.4518904213170458,
0.8895449833110263,
-2.5488895093606345,
2.8701273263094436,
0.8968927087922793,
-0.08302734093248887,
-0.12824623694370985,
1.1801893787638278,
0.7215067046920751,
-2.1639138156295985,
2.479478937589903,
0.7481824800907759,
1.2967698907965826,
-0.12351585784816166,
1.714674382931122,
1.537481772986979,
0.9174169024166474,
-1.1684285505278353,
2.4679278706622654,
0.9080225036720435,
1.7723986293189284,
-0.24381132693795848,
-0.014754413257094257,
2.781816436072006,
-0.2887306275033268,
-0.20006864541986147,
2.056440550381069,
0.5849053174142966,
1.3754351704423016,
-0.32557474963380173,
1.0517465564922759,
-0.8009215689510256,
2.113884101388976,
0.6230446831372092,
-0.6155548195604865,
1.3245506334627066]

# train an ARIMA model, passing the data in as `history`

model = ARIMA(history, order=(1,1,1))
model = model.fit()
model.summary()

SARIMAX Results
Dep. Variable:    y                 No. Observations:    304
Model:            ARIMA(1, 1, 1)    Log Likelihood       -444.626
Date:             Sat, 02 Nov 2024  AIC                  895.251
Time:             00:05:55          BIC                  906.392
Sample:           0 - 304           HQIC                 899.708
Covariance Type:  opg

          coef      std err   z         P>|z|   [0.025   0.975]
ar.L1    -0.1192    0.054    -2.195     0.028   -0.226   -0.013
ma.L1    -0.9370    0.023   -41.528     0.000   -0.981   -0.893
sigma2    1.0933    0.091    11.989     0.000    0.915    1.272

Ljung-Box (L1) (Q):      0.01   Jarque-Bera (JB):  0.09
Prob(Q):                 0.91   Prob(JB):          0.95
Heteroskedasticity (H):  1.07   Skew:              0.00
Prob(H) (two-sided):     0.75   Kurtosis:          2.91

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

len(history)

304

model.forecast()

array([0.56016095])

mean_squared_error([test_data[0]],model.forecast())

0.527849948848813
np.sqrt(mean_squared_error([test_data[0]],model.forecast()))

0.7265328270964864

def train_arima_model(x, y, arima_order):
    # prepare the training dataset and a predictions list
    history = [v for v in x]
    predictions = list()
    for t in range(len(y)):
        # refit on all data seen so far, forecast one step ahead
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit()
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(y[t])
    # calculate the out-of-sample error
    rmse = np.sqrt(mean_squared_error(y, predictions))
    return rmse

# evaluate different combinations of p, d and q values to find the best order for the ARIMA model
def evaluate_models(dataset, test, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p, d, q)
                try:
                    rmse = train_arima_model(dataset, test, order)
                    if rmse < best_score:
                        best_score, best_cfg = rmse, order
                    print('ARIMA%s RMSE=%.3f' % (order, rmse))
                except Exception:
                    continue
    print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))

p_values=range(0,3)
d_values=range(0,3)
q_values=range(0,3)
evaluate_models(train_data,test_data,p_values,d_values,q_values)

ARIMA(0, 0, 0) RMSE=0.932
ARIMA(0, 0, 1) RMSE=0.940
ARIMA(0, 0, 2) RMSE=0.940
ARIMA(0, 1, 0) RMSE=1.237
ARIMA(0, 1, 1) RMSE=0.933
ARIMA(0, 1, 2) RMSE=0.958
ARIMA(0, 2, 0) RMSE=2.140
ARIMA(0, 2, 1) RMSE=1.239
ARIMA(0, 2, 2) RMSE=0.938
ARIMA(1, 0, 0) RMSE=0.941
ARIMA(1, 0, 1) RMSE=0.941
ARIMA(1, 0, 2) RMSE=0.953
ARIMA(1, 1, 0) RMSE=1.097
ARIMA(1, 1, 1) RMSE=0.955
ARIMA(1, 1, 2) RMSE=0.968
ARIMA(1, 2, 0) RMSE=1.604
ARIMA(1, 2, 1) RMSE=1.098
ARIMA(1, 2, 2) RMSE=0.959
ARIMA(2, 0, 0) RMSE=0.940
ARIMA(2, 0, 1) RMSE=0.953
ARIMA(2, 0, 2) RMSE=0.913
ARIMA(2, 1, 0) RMSE=1.045
ARIMA(2, 1, 1) RMSE=0.960
ARIMA(2, 1, 2) RMSE=0.957
ARIMA(2, 2, 0) RMSE=1.303
ARIMA(2, 2, 1) RMSE=1.047
ARIMA(2, 2, 2) RMSE=0.965
Best ARIMA(2, 0, 2) RMSE=0.913

history = [x for x in train_data]
predictions = list()
for i in range(len(test_data)):
    model = ARIMA(history, order=(2,0,0))
    model = model.fit()
    fc = model.forecast()  # one-step-ahead forecast
    predictions.append(fc)
    history.append(test_data[i])
print(f"my RMSE {np.sqrt(mean_squared_error(test_data, predictions))}")

my RMSE 0.9404476463697088

plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()

fc_series=pd.Series(predictions,index=test_data.index)

#plot
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train_data, label='Training', color = 'blue')
plt.plot(test_data, label='Test', color = 'green', linewidth = 3)
plt.plot(fc_series, label='Forecast', color = 'red')
plt.title('Forecast vs Actuals on test data')
plt.legend(loc='upper left', fontsize=8)
plt.show()

# plot the forecast for the next 60 days using plot_predict

from statsmodels.graphics.tsaplots import plot_predict

fig = plt.figure(figsize=(18,8))
ax1 = fig.add_subplot(111)  # these are the forecasted next 60 days of data
plot_predict(model, start=1, end=len(df_close)+60, ax=ax1)
plt.grid('both')
plt.legend(['Forecast', 'Close', '95% confidence interval'], fontsize = 20, shadow=True, facecolor='lightblue', edgecolor='k')
plt.show()
history = [x for x in train_data]
predictions = list()
conf_list = list()
for t in range(len(test_data)):
    model = sm.tsa.statespace.SARIMAX(history, order=(0,1,0), seasonal_order=(1,1,1,3))
    model_fit = model.fit()
    fc = model_fit.forecast()
    predictions.append(fc)
    history.append(test_data[t])
print('RMSE OF SARIMA Model:', np.sqrt(mean_squared_error(test_data, predictions)))  # for comparison, the ARIMA RMSE was 0.9404

RMSE OF SARIMA Model: 1.2743650895214962

plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()