
Multivariate Multi-step Time Series Forecasting using Stacked LSTM Sequence-to-Sequence Autoencoder in TensorFlow 2.0 / Keras

ADVANCED   DEEP LEARNING   PYTHON   STRUCTURED DATA   TECHNIQUE   TIME SERIES FORECASTING

Overview

This article shows how to create a stacked sequence-to-sequence LSTM model for time series forecasting in Keras / TF 2.0.

Prerequisites: The reader should already be familiar with neural networks and, in particular, recurrent neural networks (RNNs). Knowledge of LSTM or GRU models is also preferable. If you are not familiar with LSTM, I would suggest you first read LSTM - Long Short-Term Memory.

Introduction

In Sequence to Sequence learning, an RNN model is trained to map an input sequence to an output sequence. The input and output need not be of the same length. The seq2seq model contains two RNNs, e.g., LSTMs, which act as an encoder and a decoder. The encoder converts the given input sequence into a fixed-length vector, which acts as a summary of the input sequence.

This fixed-length vector is called the context vector. The context vector is given as input to the decoder, and the final encoder state is used as the initial decoder state, to predict the output sequence. Sequence to Sequence learning is used in language translation, speech recognition, time series forecasting, etc.
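As a minimal illustration of the context vector in Keras (the 64-unit size and the 10x7 input shape here are only for illustration, not the article's model), an LSTM created with return_state=True returns its final hidden and cell states along with its output; these two states form the summary that is handed to the decoder:

inputs = tf.keras.layers.Input(shape=(10, 7))    # (timesteps, features)
output, state_h, state_c = tf.keras.layers.LSTM(64, return_state=True)(inputs)
context = [state_h, state_c]                     # used to initialise the decoder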

We will use sequence to sequence learning for time series forecasting, since this architecture makes it easy to produce a multi-step forecast. We will add two layers to the architecture: a RepeatVector layer and a TimeDistributed dense layer.

The RepeatVector layer repeats the context vector we get from the encoder so it can be passed as input to the decoder. We repeat it n times, where n is the number of future steps we want to forecast. The decoder then produces an output at every time step, and the TimeDistributed dense layer applies the same fully connected dense layer to each of those time steps, giving a separate output per step. TimeDistributed is a wrapper that allows applying a layer to every temporal slice of an input.
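To make the roles of these two layers concrete, here is a small standalone sketch (the 5-step horizon, 64 units, and 7 output features are illustrative assumptions, not the article's model):

context = tf.keras.layers.Input(shape=(64,))           # summary vector from the encoder
repeated = tf.keras.layers.RepeatVector(5)(context)    # shape becomes (batch, 5, 64)
decoded = tf.keras.layers.LSTM(64, return_sequences=True)(repeated)
per_step = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(7))(decoded)   # (batch, 5, 7)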

We will stack additional LSTM layers on both the encoder and the decoder side of the sequence to sequence model. Stacking LSTMs may increase the model's ability to learn more complex representations of our time-series data in the hidden layers, by capturing information at different levels.
Code

The data used is Individual household electric power consumption. You can download the dataset from this link.

Importing Libraries

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import tensorflow as tf
import os

Now load the dataset into a pandas data frame.

df = pd.read_csv(r'household_power_consumption.txt', sep=';', header=0,
                 low_memory=False, infer_datetime_format=True,
                 parse_dates={'datetime': [0, 1]}, index_col=['datetime'])
df.head()

 
Imputing Null Values

df = df.replace('?', np.nan)
df.isnull().sum()

Now we will create a function that imputes missing values by replacing them with the value recorded at the same time on the previous day.

def fill_missing(values):
    # replace each missing value with the value one day (60*24 minutes) earlier
    one_day = 60 * 24
    for row in range(df.shape[0]):
        for col in range(df.shape[1]):
            if np.isnan(values[row][col]):
                values[row, col] = values[row - one_day, col]

df = df.astype('float32')
fill_missing(df.values)
df.isnull().sum()

Downsampling of Data from minutes to Days

There are around 20 lakh (two million) minute-level observations recorded. Let's make the data simpler by downsampling it from a frequency of minutes to days.

daily_df = df.resample('D').sum()
daily_df.head()

 
 

Train - Test Split

After downsampling, the number of instances is 1442. We will split the dataset into train and test data in a 75:25 ratio of the instances (0.75 × 1442 ≈ 1081).

train_df,test_df = daily_df[1:1081], daily_df[1081:]

Scaling the values 

All the columns in the data frame are on different scales. Now we will scale the values to the range -1 to 1 for faster training of the models.

train = train_df
scalers = {}
for i in train_df.columns:
    scaler = MinMaxScaler(feature_range=(-1, 1))
    s_s = scaler.fit_transform(train[i].values.reshape(-1, 1))
    s_s = np.reshape(s_s, len(s_s))
    scalers['scaler_' + i] = scaler
    train[i] = s_s

test = test_df
for i in train_df.columns:
    scaler = scalers['scaler_' + i]
    s_s = scaler.transform(test[i].values.reshape(-1, 1))
    s_s = np.reshape(s_s, len(s_s))
    scalers['scaler_' + i] = scaler
    test[i] = s_s

Converting the series to samples 

Now we will write a function that uses a sliding window approach to transform our series into samples of past input observations and future output observations, so that supervised learning algorithms can be applied.

def split_series(series, n_past, n_future):
    # n_past ==> no of past observations
    # n_future ==> no of future observations
    X, y = list(), list()
    for window_start in range(len(series)):
        past_end = window_start + n_past
        future_end = past_end + n_future
        if future_end > len(series):
            break
        # slicing the past and future parts of the window
        past, future = series[window_start:past_end, :], series[past_end:future_end, :]
        X.append(past)
        y.append(future)
    return np.array(X), np.array(y)

For this case, let's assume that, given the observations of the past 10 days, we need to forecast the observations of the next 5 days.

n_past = 10
n_future = 5
n_features = 7

Now convert both the train and test data into samples using the split_series function.

X_train, y_train = split_series(train.values, n_past, n_future)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_features))
y_train = y_train.reshape((y_train.shape[0], y_train.shape[1], n_features))

X_test, y_test = split_series(test.values, n_past, n_future)
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], n_features))
y_test = y_test.reshape((y_test.shape[0], y_test.shape[1], n_features))
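As a quick sanity check (not part of the original walkthrough), you can print the resulting shapes; with n_past = 10, n_future = 5, and 7 features, they should come out as (samples, 10, 7) for the inputs and (samples, 5, 7) for the targets:

print(X_train.shape, y_train.shape)   # e.g. (num_train_samples, 10, 7) (num_train_samples, 5, 7)
print(X_test.shape, y_test.shape)     # e.g. (num_test_samples, 10, 7) (num_test_samples, 5, 7)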

Model Architecture

Now we will create two models with the architectures described below.

E1D1 ==> Sequence to Sequence Model with one encoder layer and one decoder layer.

# E1D1
# n_features ==> no of features at each timestep in the data.

encoder_inputs = tf.keras.layers.Input(shape=(n_past, n_features))
encoder_l1 = tf.keras.layers.LSTM(100, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]

decoder_inputs = tf.keras.layers.RepeatVector(n_future)(encoder_outputs1[0])

decoder_l1 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_inputs, initial_state=encoder_states1)
decoder_outputs1 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(decoder_l1)

model_e1d1 = tf.keras.models.Model(encoder_inputs, decoder_outputs1)

model_e1d1.summary()

E2D2 ==> Sequence to Sequence Model with two encoder layers and two decoder layers.

# E2D2
# n_features ==> no of features at each timestep in the data.

encoder_inputs = tf.keras.layers.Input(shape=(n_past, n_features))
encoder_l1 = tf.keras.layers.LSTM(100, return_sequences=True, return_state=True)
encoder_outputs1 = encoder_l1(encoder_inputs)
encoder_states1 = encoder_outputs1[1:]
encoder_l2 = tf.keras.layers.LSTM(100, return_state=True)
encoder_outputs2 = encoder_l2(encoder_outputs1[0])
encoder_states2 = encoder_outputs2[1:]

decoder_inputs = tf.keras.layers.RepeatVector(n_future)(encoder_outputs2[0])

decoder_l1 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_inputs, initial_state=encoder_states1)
decoder_l2 = tf.keras.layers.LSTM(100, return_sequences=True)(decoder_l1, initial_state=encoder_states2)
decoder_outputs2 = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_features))(decoder_l2)

model_e2d2 = tf.keras.models.Model(encoder_inputs, decoder_outputs2)

model_e2d2.summary()
Training the models

I have used the Adam optimizer and the Huber loss as the loss function. Let's compile and train the models.

# decay the learning rate by 10% every epoch
reduce_lr = tf.keras.callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.90 ** x)

model_e1d1.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e1d1 = model_e1d1.fit(X_train, y_train, epochs=25,
                              validation_data=(X_test, y_test),
                              batch_size=32, verbose=0, callbacks=[reduce_lr])

model_e2d2.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.Huber())
history_e2d2 = model_e2d2.fit(X_train, y_train, epochs=25,
                              validation_data=(X_test, y_test),
                              batch_size=32, verbose=0, callbacks=[reduce_lr])
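Since matplotlib was imported earlier, one optional way to inspect training (a small addition, not part of the original walkthrough) is to plot the training and validation loss stored in the History objects:

plt.plot(history_e1d1.history['loss'], label='E1D1 train')
plt.plot(history_e1d1.history['val_loss'], label='E1D1 validation')
plt.plot(history_e2d2.history['loss'], label='E2D2 train')
plt.plot(history_e2d2.history['val_loss'], label='E2D2 validation')
plt.xlabel('Epoch')
plt.ylabel('Huber loss')
plt.legend()
plt.show()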

 
 

Prediction on test samples

pred_e1d1 = model_e1d1.predict(X_test)
pred_e2d2 = model_e2d2.predict(X_test)

Inverse Scaling of the predicted values

Now we will convert the predictions back to their original scale.

# apply the inverse transform column by column, using the scaler fitted for each feature
for index, i in enumerate(train_df.columns):
    scaler = scalers['scaler_' + i]
    pred_e1d1[:, :, index] = scaler.inverse_transform(pred_e1d1[:, :, index])
    pred_e2d2[:, :, index] = scaler.inverse_transform(pred_e2d2[:, :, index])
    y_train[:, :, index] = scaler.inverse_transform(y_train[:, :, index])
    y_test[:, :, index] = scaler.inverse_transform(y_test[:, :, index])

Checking Error

Now we will calculate the mean absolute error for each feature on every forecast day.

from sklearn.metrics import mean_absolute_error

for index, i in enumerate(train_df.columns):
    print(i)
    for j in range(1, 6):
        print("Day ", j, ":")
        print("MAE-E1D1 : ", mean_absolute_error(y_test[:, j - 1, index], pred_e1d1[:, j - 1, index]), end=", ")
        print("MAE-E2D2 : ", mean_absolute_error(y_test[:, j - 1, index], pred_e2d2[:, j - 1, index]))
    print()
    print()

 
From the above output, we can observe that in some cases the E2D2 model has performed better than the E1D1 model, with lower error. Training several models with different numbers of stacked layers and combining them into an ensemble can also perform well.
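As a minimal sketch of the ensembling idea (not part of the original code), the two models' inverse-scaled predictions could simply be averaged and evaluated in the same way, for example on the first forecast day:

pred_ensemble = (pred_e1d1 + pred_e2d2) / 2.0
for index, i in enumerate(train_df.columns):
    print(i, ": MAE-ensemble (Day 1) :",
          mean_absolute_error(y_test[:, 0, index], pred_ensemble[:, 0, index]))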

Note: The results vary from dataset to dataset. Stacking more layers may also lead to overfitting, so the number of stacked layers acts as a hyperparameter.

Links

Here’s the link for the code. 

Conclusion

Congratulations, you have learned how to implement multivariate multi-step time series forecasting using
TF 2.0 / Keras. This is my first attempt at writing a blog. So please share your opinion in the comments
section below.

Thanks for reading.

References:

1. https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
2. https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
3. https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

Article URL: https://www.analyticsvidhya.com/blog/2020/10/multivariate-multi-step-time-series-forecasting-using-stacked-lstm-sequence-to-sequence-autoencoder-in-tensorflow-2-0-keras/

Jagadeesh23
