Time Series

Steps in a Time Series EDA workflow

1. Data Ingestion
2. EDA of the Data
3. Preprocessing of the Data
4. Model Building
5. Model Evaluation

Data Ingestion Steps:

1. Import the required libraries such as numpy, pandas, matplotlib, seaborn, etc.
2. Load the time series data into a pandas dataframe
3. Set the datetime column as the index of the dataframe
4. Check the datatype of the index and convert it to datetime if necessary
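The steps above can be collapsed into a single `read_csv` call. A minimal, self-contained sketch (an in-memory CSV stands in for TSLA.csv, which we don't have here):

```python
import io
import pandas as pd

# Tiny CSV standing in for TSLA.csv; with a real file, pass the filename instead.
csv_text = """Date,Close
2023-01-01,102.37
2023-01-02,103.27
2023-01-03,104.66
"""

# parse_dates + index_col do steps 2-4 in one call: load the data,
# convert Date to datetime64, and set it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(df.index.dtype)  # datetime64[ns]
```

With `index_col="Date"` and `parse_dates=["Date"]` together, no separate `to_datetime` / `set_index` step is needed.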

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sys
import warnings
warnings.filterwarnings('ignore')

df=pd.read_csv('TSLA.csv')
df

Date Open High Low Close Volume Dividends Stock Splits

0 2023-01-01 102.264052 102.844516 102.016732 102.375100 190884 0.0 0.0

1 2023-01-02 103.164210 103.568883 103.072105 103.268399 144529 0.0 0.0

2 2023-01-03 104.642948 104.945523 104.396706 104.661726 114590 0.0 0.0

3 2023-01-04 107.383841 107.749974 107.409781 107.514532 144406 0.0 0.0

4 2023-01-05 109.751399 109.687393 108.002799 109.147197 152652 0.0 0.0

... ... ... ... ... ... ... ... ...

360 2023-12-27 274.683259 274.739668 274.622839 274.681922 198906 0.0 0.0

361 2023-12-28 275.187029 275.220635 274.802580 275.070082 171058 0.0 0.0

362 2023-12-29 276.618878 277.740538 276.938281 277.099232 108824 0.0 0.0

363 2023-12-30 277.458843 278.365180 277.325499 277.716507 119610 0.0 0.0

364 2023-12-31 277.943161 278.736790 276.368373 277.682775 106382 0.0 0.0

365 rows × 8 columns

df.isnull().sum()

Date            0
Open            0
High            0
Low             0
Close           0
Volume          0
Dividends       0
Stock Splits    0
dtype: int64

Now we perform univariate analysis on the Close price.

df = df[['Date','Close']]
df
Date Close

0 2023-01-01 102.375100

1 2023-01-02 103.268399

2 2023-01-03 104.661726

3 2023-01-04 107.514532

4 2023-01-05 109.147197

... ... ...

360 2023-12-27 274.681922

361 2023-12-28 275.070082

362 2023-12-29 277.099232

363 2023-12-30 277.716507

364 2023-12-31 277.682775

365 rows × 2 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null object
1 Close 365 non-null float64
dtypes: float64(1), object(1)
memory usage: 5.8+ KB

df["Date"]=pd.to_datetime(df.Date)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 365 non-null datetime64[ns]
1 Close 365 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB

stock_df=df.set_index("Date")
stock_df

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

Why we convert the Date column into the index:

1. Retrieving the data becomes easy
2. Visualization becomes easy
3. The libraries we use for time series data, such as statsmodels and scipy, expect the data to have a datetime index.
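For example, a datetime index enables partial-string slicing, which is what makes retrieval easy. A toy series stands in for stock_df here so the sketch is self-contained:

```python
import numpy as np
import pandas as pd

# Toy daily series with a datetime index, standing in for stock_df["Close"].
idx = pd.date_range("2023-01-01", periods=365, freq="D")
s = pd.Series(np.arange(365.0), index=idx, name="Close")

# Partial-string indexing selects a whole month...
june = s.loc["2023-06"]
print(len(june))  # 30

# ...or an arbitrary date range.
q1 = s.loc["2023-01-01":"2023-03-31"]
print(len(q1))  # 90
```

The same `.loc` calls would raise a KeyError on a plain RangeIndex, which is why setting the datetime index first matters.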

EDA of the Data

1. Summary statistics such as mean, median, mode, etc.
2. Visualize the time series data
3. Stationarity check using the augmented Dickey-Fuller test
4. Check for autocorrelation using the autocorrelation function (ACF)
5. Check for outliers
6. Check the partial autocorrelation function (PACF), which helps choose the ARIMA order

Preprocessing of the data

1. Fill the missing values (not required here)
2. Convert the data into a stationary time series
3. Normalize the data if necessary (not required here)
4. Split the data into train and test sets
5. Clean the data by removing outliers (not required here)

stock_df.describe()

Close
count    365.000000
mean     199.661626
std       51.101389
min      102.375100
25%      147.327615
50%      205.663111
75%      238.942848
max      277.716507

stock_df.head()

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

plt.plot(stock_df)
plt.show()

plt.hist(stock_df)
plt.show()

# distplot is deprecated in recent seaborn; histplot with kde=True is the replacement
sns.histplot(stock_df, kde=True)
plt.show()

# plotting close price

plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Close Prices', fontsize = 20)
plt.yticks(fontsize = 15)
plt.plot(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price', fontsize = 30)
plt.show()
# plotting the distribution of close prices

plt.style.use('ggplot')
plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Close Prices', fontsize = 20)
plt.xticks(fontsize = 15)
plt.ylabel('Frequency', fontsize = 20)
plt.yticks(fontsize = 15)
plt.hist(stock_df['Close'], linewidth = 3, color = 'blue')
plt.title('Tesla Stock Closing Price Distribution', fontsize = 30)
plt.show()

# Style and figure size


plt.style.use('ggplot')
plt.figure(figsize=(18, 8))

# Labeling
plt.xlabel('Dates', fontsize=20)
plt.xticks(fontsize=15)
plt.ylabel('Close Prices', fontsize=20)
plt.yticks(fontsize=15)

# Plotting the distribution (Kernel Density Estimate plot)


sns.kdeplot(stock_df['Close'], color='blue', linewidth=3)

# Title
plt.title('Tesla Stock Closing Price Distribution', fontsize=30)

plt.grid(True)
plt.show()

stock_df["Close"]

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64

# Rolling mean with a window size of 120

rolemean = stock_df["Close"].rolling(120).mean()
rolemean
Close

Date

2023-01-01 NaN

2023-01-02 NaN

2023-01-03 NaN

2023-01-04 NaN

2023-01-05 NaN

... ...

2023-12-27 254.024091

2023-12-28 254.389435

2023-12-29 254.769750

2023-12-30 255.152673

2023-12-31 255.535278

365 rows × 1 columns

dtype: float64

# Rolling standard deviation with the same window size of 120

rolestd = stock_df["Close"].rolling(120).std()
rolestd

Close

Date

2023-01-01 NaN

2023-01-02 NaN

2023-01-03 NaN

2023-01-04 NaN

2023-01-05 NaN

... ...

2023-12-27 14.628715

2023-12-28 14.602063

2023-12-29 14.594201

2023-12-30 14.588376

2023-12-31 14.572035

365 rows × 1 columns

dtype: float64

plt.plot(stock_df.Close)
plt.plot(rolemean)
plt.plot(rolestd)

[<matplotlib.lines.Line2D at 0x7da8e8eca110>]

from statsmodels.tsa.stattools import adfuller
adft=adfuller(stock_df['Close'])

pd.Series(adft[0:4],index=["test stats","p value","lag","data points"])

test stats      -1.893196
p value          0.335269
lag              0.000000
data points    364.000000
dtype: float64

# null hypothesis: the data is not stationary
# alternate hypothesis: the data is stationary
# p value = 0.335269

# if p < 0.05: reject the null hypothesis (stationary)
# if p > 0.05: fail to reject the null hypothesis (non-stationary)
# here p = 0.335 > 0.05, so the series is non-stationary

def test_stationarity(timeseries):
    # Determining rolling statistics
    rolmean = timeseries.rolling(48).mean()  # rolling mean
    rolstd = timeseries.rolling(48).std()    # rolling standard deviation

    # Plotting rolling statistics
    plt.figure(figsize=(18, 8))
    plt.grid('both')
    plt.plot(timeseries, color='blue', label='Original', linewidth=3)
    plt.plot(rolmean, color='red', label='Rolling Mean', linewidth=3)
    plt.plot(rolstd, color='black', label='Rolling Std', linewidth=4)
    plt.legend(loc='best', fontsize=20, shadow=True, facecolor='lightgray', edgecolor='k')
    plt.title('Rolling Mean and Standard Deviation', fontsize=25)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print("Results of Dickey-Fuller Test:")
    adft = adfuller(timeseries, autolag='AIC')

    # Displaying the output of the Dickey-Fuller test
    output = pd.Series(adft[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in adft[4].items():
        output[f'Critical Value ({key})'] = value
    print(output)

test_stationarity(stock_df.Close)

Results of Dickey-Fuller Test:


Test Statistic -1.893196
p-value 0.335269
#Lags Used 0.000000
Number of Observations Used 364.000000
Critical Value (1%) -3.448443
Critical Value (5%) -2.869513
Critical Value (10%) -2.571018
dtype: float64

# check for outliers

sns.boxplot(stock_df.Close)

<Axes: ylabel='Close'>

#Time series decomposition


from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(stock_df[["Close"]],period=12)
result.plot()
plt.show()

result.seasonal

seasonal

Date

2023-01-01 -0.049962

2023-01-02 0.098094

2023-01-03 -0.012132

2023-01-04 0.071651

2023-01-05 0.282969

... ...

2023-12-27 -0.049962

2023-12-28 0.098094

2023-12-29 -0.012132

2023-12-30 0.071651

2023-12-31 0.282969

365 rows × 1 columns

dtype: float64

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(stock_df.Close)   # shows the correlation of the series with itself at different lags
plot_pacf(stock_df.Close)
plt.show()

df_close = stock_df["Close"]
df_close
Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64


df_close = df_close.diff()   # first-order differencing
df_close = df_close.dropna() # the first value becomes NaN after differencing

test_stationarity(df_close)
Results of Dickey-Fuller Test:
Test Statistic -5.281090
p-value 0.000006
#Lags Used 7.000000
Number of Observations Used 356.000000
Critical Value (1%) -3.448853
Critical Value (5%) -2.869693
Critical Value (10%) -2.571114
dtype: float64
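One caveat worth keeping in mind: after differencing, the model works on the differenced scale, so forecasts must be inverted back to the price scale with a cumulative sum. A toy sketch (synthetic prices, not the TSLA data):

```python
import pandas as pd

# Toy price series standing in for stock_df["Close"].
prices = pd.Series([100.0, 102.0, 101.0, 105.0],
                   index=pd.date_range("2023-01-01", periods=4, freq="D"))

diffed = prices.diff().dropna()  # this is what we model after differencing

# Invert the transform: cumulative-sum the differences and add back the
# first observed price, recovering the original price scale.
restored = diffed.cumsum() + prices.iloc[0]

print(restored.tolist())  # [102.0, 101.0, 105.0]
```

The same trick applied to forecasted differences (anchored at the last observed price) turns differenced-scale predictions into price predictions.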

Perform a train-test split of the time series

df_close[0:-60]  # training data

Close

Date

2023-01-02 0.893299

2023-01-03 1.393326

2023-01-04 2.852806

2023-01-05 1.632665

2023-01-06 0.547885

... ...

2023-10-28 -0.800922

2023-10-29 2.113884

2023-10-30 0.623045

2023-10-31 -0.615555

2023-11-01 1.324551

304 rows × 1 columns

dtype: float64

df_close[-60:]#testing data

Close

Date

2023-11-02 -0.166372

2023-11-03 -1.044071

2023-11-04 -0.820504

2023-11-05 1.426642

2023-11-06 0.892228

2023-11-07 -0.178966

2023-11-08 1.583613

2023-11-09 -0.351853

2023-11-10 -0.247485

2023-11-11 -0.109094

2023-11-12 0.598929

2023-11-13 0.740941

2023-11-14 0.324071

2023-11-15 0.351657

2023-11-16 0.286591

2023-11-17 -0.604990

2023-11-18 0.541170

2023-11-19 0.030313

2023-11-20 -0.329250

2023-11-21 -0.608250

2023-11-22 0.342926

2023-11-23 0.701763

2023-11-24 1.912550

2023-11-25 0.010383

2023-11-26 1.710726

2023-11-27 1.372687

2023-11-28 -0.686699

2023-11-29 0.954674

2023-11-30 -0.466640

2023-12-01 -2.042970

2023-12-02 0.638001

2023-12-03 -1.164504

2023-12-04 1.029772

2023-12-05 0.085211

2023-12-06 1.994400

2023-12-07 2.125879

2023-12-08 -0.086252

2023-12-09 -0.577820

2023-12-10 -0.268866

2023-12-11 -0.369897

2023-12-12 -0.074961

2023-12-13 0.421395

2023-12-14 0.703201

2023-12-15 1.114858

2023-12-16 0.921257

2023-12-17 -0.380721

2023-12-18 -1.068875

2023-12-19 1.768739

2023-12-20 0.322271

2023-12-21 -0.512357

2023-12-22 0.072897

2023-12-23 -1.323132

2023-12-24 0.106720

2023-12-25 -0.330315

2023-12-26 1.021908

2023-12-27 1.472730

2023-12-28 0.388160

2023-12-29 2.029151

2023-12-30 0.617275

2023-12-31 -0.033732

dtype: float64

# split the data into train and test sets
train_data = df_close[0:-60]
test_data = df_close[-60:]

plt.figure(figsize=(18,8))
plt.grid(True)
plt.xlabel('Dates', fontsize = 20)
plt.ylabel('Closing Prices', fontsize = 20)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.plot(train_data, 'green', label='Train data', linewidth = 5)
plt.plot(test_data, 'blue', label='Test data', linewidth = 5)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')

<matplotlib.legend.Legend at 0x7da92d05a890>

Model building in Time Series


# this time we use the ARIMA model
stock_df["Close"]

Close

Date

2023-01-01 102.375100

2023-01-02 103.268399

2023-01-03 104.661726

2023-01-04 107.514532

2023-01-05 109.147197

... ...

2023-12-27 274.681922

2023-12-28 275.070082

2023-12-29 277.099232

2023-12-30 277.716507

2023-12-31 277.682775

365 rows × 1 columns

dtype: float64

365-60  # intended training size; note df_close has 364 rows after differencing, so train_data actually has 304

305

import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

history = [x for x in train_data]


history

[0.8932991355088831,
1.3933263239742928,
2.852806278873757,
1.6326649420308286,
0.5478846329143892,
0.7696058342437482,
0.9625287430472156,
0.018231373084361735,
1.1077005040734065,
0.7497050402641605,
1.972947678168694,
0.40951767041809717,
1.1636240679130196,
1.2304959594809617,
0.18371201236193713,
2.4325497305482457,
-0.26391312202788697,
1.442625668491928,
-0.5318829334557194,
-2.413729246967847,
1.6712015109161484,
0.9317498913066089,
0.0698322548046093,
2.807609973898053,
-1.0265658609344541,
0.43923686869243284,
0.5207027332374281,
1.6977267356932941,
2.4579251484588553,
0.3907230052264481,
1.0285667069413762,
-0.4168248436978672,
-1.31013559865616,
-0.06614521282753572,
0.7182963135563796,
1.4772153494880627,
1.995977100207341,
-0.2607738551311627,
0.18751835521305793,
-0.1470745941979601,
-0.9302395758612363,
-1.175458451390483,
2.2248009695441056,
-0.02044980691758269,
0.19315671831344616,
-0.6671134180505476,
1.1357477139460173,
-0.8821431823303101,
0.35407616995422586,
-0.8295849273184785,
0.9959651825627134,
-0.06403624432709876,
-0.1254060218069526,
0.20201766368549556,
0.8513926357980495,
0.44520452621046047,
0.3740700567433919,
0.532466961761088,
-0.12477596304930216,
-0.01570540903489359,
0.1639978537589286,
-0.7857140190270115,
-1.2328669657189266,
1.1792026695864877,
-0.017910958370322305,
-1.4507965353131453,
1.5357081015208678,
-0.61471958905355,
0.41149953226522484,
1.0698880810996627,
0.8176723681895055,
1.6438973978009699,
-0.5466558718802901,
0.5599350920078052,
-0.17443936153424033,
-0.5136371255328243,
0.187439400478155,
0.5121577171133822,
0.4071279719513825,
-0.760302556354219,
1.2999894672940684,
0.667526643148932,
-0.7276142970002581,
1.912569456755989,
2.027267021302805,
2.3918575744206407,
-0.13105016392063362,
-0.17900711779094536,
1.4184955199386025,
0.38286385364656894,
1.3036863601885216,
0.5610416187842588,
1.835680127192603,
0.8274719712972285,
0.962973437639846,
0.43310279806178187,
2.0598256806272843,
0.8728259721148106,
1.1014355498728605,
2.7299102840982528,
-1.5219601321712162,
-0.5815011359226219,
1.236853495737762,
-0.2699754161442627,
1.9296890641726065,
0.6184725435801113,
-0.29059930396928735,
2.217915352321569,
2.415112577533364,
2.2554566313530415,
1.5617323934664569,
-0.9518744601043636,
2.5516783572660984,
0.41447323494159605,
1.4482892620561643,
1.3321046160507137,
0.23776838206009643,
1.2299132591679154,
1.5062757615648366,
0.7259905772023387,
-0.14271730621246093,
0.5038740865232683,
1.8941510595801674,
-0.20365688303775187,
0.04264250908897793,
0.3792145940770979,
2.384819429402455,
1.2267840907422567,
1.0470496207943256,
-0.9197373312707668,
1.105424410204506,
-0.12503909465468155,
0.4347971674349935,
-0.00047127816949910084,
1.4605530632271666,
0.8567678108843779,
0.4095450043537312,
0.9541648088898853,
-0.9846694681493204,
-1.0661810820396056,
1.0496226968534188,
0.6124782628608898,
1.3059418277300097,
3.0336473041813576,
1.6702186490335293,
-0.47266247272744977,
1.2718941508587136,
-1.0736219471799018,
0.56404071487745,
0.050955787937795094,
2.648999286346992,
-0.3239580264257995,
-0.4083176648713902,
0.736210905074671,
-0.47007975917725275,
1.2075272192262787,
-0.4642814851631556,
-0.648229305801209,
0.42947268677431794,
-0.21222472933988,
2.547204410309206,
1.4523247896722182,
0.14151962258091544,
0.021271593676090106,
0.2571075913099321,
0.41571940289978215,
-0.7933694966874612,
0.8053587181372848,
1.0925945456034185,
1.7128518843369989,
0.7429764591254013,
1.1948999450876556,
-0.12492871305823883,
-0.3450670390670041,
0.7897461900880103,
-0.1497014015942284,
0.08206034496333814,
-0.01971566829175231,
0.138933152095575,
0.4939003011037073,
-0.5239700280359045,
-1.155146324585104,
-0.5870156275400689,
0.686234065534677,
-1.335096902104965,
-0.445184759171525,
0.7279536732392273,
-0.5440324664817808,
2.058874594525122,
-0.27255959791790474,
-0.2724127433110368,
0.8499787304334347,
-0.7095231836120774,
1.1477017576514754,
0.40817688196634094,
1.3651581417331329,
1.3606390307013214,
2.229787485292121,
1.8274741977712665,
0.26484370559887793,
0.5955447688372146,
1.5488637598278103,
1.334956406980524,
1.048843017829114,
-1.0411073790201328,
0.12284371446423847,
-0.3749746318142684,
1.3140512763593222,
0.3287323868915166,
1.1016466869869816,
0.5882608524204045,
1.3703320827427206,
-0.1815827758780415,
0.2385799079773676,
-0.47976109312293147,
0.6954842758049438,
0.6059251624713227,
3.2347373527486525,
0.33844309895349056,
-0.303456523336763,
-0.002355769474434055,
-0.04602240039588423,
1.1438055364243667,
-1.6396435925654487,
1.2086371235202478,
0.5362330707300487,
0.9004626050362674,
-0.7666321295496061,
1.1056824150820432,
-1.0717533413471472,
-0.35213855788359183,
0.10624865407280026,
0.7082703765054248,
-0.3099064479051208,
0.9823418343268884,
2.307936411124075,
-2.0564375154723677,
1.2698091022463132,
0.8851898436832926,
0.24098021966440797,
-0.08847856271148657,
0.2326345171115065,
0.30425645941497237,
0.004451985076087794,
-1.0164009236605978,
1.7421066978022566,
1.9739693751788252,
-0.38503585552368236,
-1.0979274424854566,
0.9869216436693193,
-0.021929141850222322,
0.6957551437645861,
0.10236768419537157,
1.0597859114010078,
1.3301609680348747,
-0.2621673797302151,
-1.0454573238503997,
-1.2060857036777577,
1.3796231147639162,
-0.9547226304205481,
0.27181288446479357,
0.3146558113507183,
-0.013781623356692307,
-1.435210642150338,
0.5538345451957127,
1.220740058501633,
0.6369303829316095,
0.25667955803731957,
0.4518904213170458,
0.8895449833110263,
-2.5488895093606345,
2.8701273263094436,
0.8968927087922793,
-0.08302734093248887,
-0.12824623694370985,
1.1801893787638278,
0.7215067046920751,
-2.1639138156295985,
2.479478937589903,
0.7481824800907759,
1.2967698907965826,
-0.12351585784816166,
1.714674382931122,
1.537481772986979,
0.9174169024166474,
-1.1684285505278353,
2.4679278706622654,
0.9080225036720435,
1.7723986293189284,
-0.24381132693795848,
-0.014754413257094257,
2.781816436072006,
-0.2887306275033268,
-0.20006864541986147,
2.056440550381069,
0.5849053174142966,
1.3754351704423016,
-0.32557474963380173,
1.0517465564922759,
-0.8009215689510256,
2.113884101388976,
0.6230446831372092,
-0.6155548195604865,
1.3245506334627066]

# train an ARIMA model, passing the data in as `history`

model = ARIMA(history, order=(1,1,1))
model = model.fit()
model.summary()

SARIMAX Results
Dep. Variable:    y                 No. Observations:    304
Model:            ARIMA(1, 1, 1)    Log Likelihood       -444.626
Date:             Sat, 02 Nov 2024  AIC                  895.251
Time:             00:05:55          BIC                  906.392
Sample:           0 - 304           HQIC                 899.708
Covariance Type:  opg

          coef      std err   z         P>|z|   [0.025   0.975]
ar.L1    -0.1192    0.054    -2.195     0.028   -0.226   -0.013
ma.L1    -0.9370    0.023   -41.528     0.000   -0.981   -0.893
sigma2    1.0933    0.091    11.989     0.000    0.915    1.272

Ljung-Box (L1) (Q):      0.01   Jarque-Bera (JB):  0.09
Prob(Q):                 0.91   Prob(JB):          0.95
Heteroskedasticity (H):  1.07   Skew:              0.00
Prob(H) (two-sided):     0.75   Kurtosis:          2.91

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

len(history)

304

model.forecast()

array([0.56016095])

mean_squared_error([test_data[0]],model.forecast())

0.527849948848813
np.sqrt(mean_squared_error([test_data[0]],model.forecast()))

0.7265328270964864

def train_arima_model(x, y, arima_order):
    # prepare the training dataset and a predictions list
    history = [v for v in x]
    predictions = list()
    for t in range(len(y)):
        # refit on all data seen so far, forecast one step ahead
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit()
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(y[t])
    # calculate the out-of-sample error
    rmse = np.sqrt(mean_squared_error(y, predictions))
    return rmse

# evaluate different combinations of p, d and q values to find the best order for the ARIMA model
def evaluate_models(dataset, test, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p, d, q)
                try:
                    rmse = train_arima_model(dataset, test, order)
                    if rmse < best_score:
                        best_score, best_cfg = rmse, order
                    print('ARIMA%s RMSE=%.3f' % (order, rmse))
                except Exception:
                    continue
    print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))

p_values=range(0,3)
d_values=range(0,3)
q_values=range(0,3)
evaluate_models(train_data,test_data,p_values,d_values,q_values)

ARIMA(0, 0, 0) RMSE=0.932
ARIMA(0, 0, 1) RMSE=0.940
ARIMA(0, 0, 2) RMSE=0.940
ARIMA(0, 1, 0) RMSE=1.237
ARIMA(0, 1, 1) RMSE=0.933
ARIMA(0, 1, 2) RMSE=0.958
ARIMA(0, 2, 0) RMSE=2.140
ARIMA(0, 2, 1) RMSE=1.239
ARIMA(0, 2, 2) RMSE=0.938
ARIMA(1, 0, 0) RMSE=0.941
ARIMA(1, 0, 1) RMSE=0.941
ARIMA(1, 0, 2) RMSE=0.953
ARIMA(1, 1, 0) RMSE=1.097
ARIMA(1, 1, 1) RMSE=0.955
ARIMA(1, 1, 2) RMSE=0.968
ARIMA(1, 2, 0) RMSE=1.604
ARIMA(1, 2, 1) RMSE=1.098
ARIMA(1, 2, 2) RMSE=0.959
ARIMA(2, 0, 0) RMSE=0.940
ARIMA(2, 0, 1) RMSE=0.953
ARIMA(2, 0, 2) RMSE=0.913
ARIMA(2, 1, 0) RMSE=1.045
ARIMA(2, 1, 1) RMSE=0.960
ARIMA(2, 1, 2) RMSE=0.957
ARIMA(2, 2, 0) RMSE=1.303
ARIMA(2, 2, 1) RMSE=1.047
ARIMA(2, 2, 2) RMSE=0.965
Best ARIMA(2, 0, 2) RMSE=0.913

history = [x for x in train_data]
predictions = list()
for i in range(len(test_data)):
    model = ARIMA(history, order=(2,0,0))
    model = model.fit()
    fc = model.forecast()  # one-step-ahead forecast
    predictions.append(fc)
    history.append(test_data[i])
print(f"my RMSE {np.sqrt(mean_squared_error(test_data, predictions))}")

my RMSE 0.9404476463697088

plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()

fc_series=pd.Series(predictions,index=test_data.index)

#plot
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train_data, label='Training', color = 'blue')
plt.plot(test_data, label='Test', color = 'green', linewidth = 3)
plt.plot(fc_series, label='Forecast', color = 'red')
plt.title('Forecast vs Actuals on test data')
plt.legend(loc='upper left', fontsize=8)
plt.show()

# plot the forecast for the next 60 days using plot_predict

from statsmodels.graphics.tsaplots import plot_predict

fig = plt.figure(figsize=(18,8))
ax1 = fig.add_subplot(111)  # these are the forecasted next 60 days of data
plot_predict(model, start=1, end=len(df_close)+60, ax=ax1)
plt.grid('both')
plt.legend(['Forecast', 'Close', '95% confidence interval'], fontsize = 20, shadow=True, facecolor='lightblue', edgecolor='k')
plt.show()
history = [x for x in train_data]
predictions = list()
conf_list = list()
for t in range(len(test_data)):
    model = sm.tsa.statespace.SARIMAX(history, order=(0,1,0), seasonal_order=(1,1,1,3))
    model_fit = model.fit()
    fc = model_fit.forecast()
    predictions.append(fc)
    history.append(test_data[t])
print('RMSE OF SARIMA Model:', np.sqrt(mean_squared_error(test_data, predictions)))  # for comparison, the ARIMA RMSE was 0.9404

RMSE OF SARIMA Model: 1.2743650895214962

plt.figure(figsize=(18,8))
plt.grid(True)
plt.plot(range(len(test_data)), test_data,label='True Test Close Value',linewidth = 5)
plt.plot(range(len(predictions)), predictions, label = 'Predictions on test data', linewidth = 5)
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.legend(fontsize = 20, shadow=True, facecolor='lightpink', edgecolor = 'k')
plt.show()