Recurrent Neural Networks
Basics and Implementation
2016.10.05
KSC 2016
What you will learn about RNNs
What are Recurrent Neural Networks?
How to build an RNN model
How to prepare time series data for RNN models
How to run and evaluate the graph
How to predict using an RNN as a regression model
Contents
Overview of TensorFlow
Recurrent Neural Networks (RNN)
RNN Implementation
Case studies
Case study #1: MNIST using RNN
Case study #2: sine function
Case study #3: electricity price forecasting
Conclusions
Q&A
TensorFlow
Open Source Software Library for Machine Intelligence
Prerequisite
Software
TensorFlow (r0.10)
Python (2.7.6)
Numpy (1.11.1)
Pandas (0.18.1)
Tutorials
Recurrent Neural Networks, TensorFlow Tutorials
Sequence-to-Sequence Models, TensorFlow Tutorials
Blog Posts
Understanding LSTM Networks (Chris Olah @ colah.github.io)
Introduction to Recurrent Networks in TensorFlow (Danijar Hafner @ danijar.com)
Book
Deep Learning, I. Goodfellow, Y. Bengio, and A. Courville, MIT Press, 2016
Recurrent Neural Networks
Neural Networks vs. Recurrent Neural Networks
Neural networks: inputs and outputs are independent of each other
Recurrent neural networks: sequential inputs and outputs
Recurrent Neural Networks (RNN)
x_t : the input at time step t
s_t : the hidden state at time step t
o_t : the output at time step t
Image from WILDML.com: Recurrent Neural Networks Tutorial, Part 1 - Introduction to RNNs
Overall procedure: RNN
Initialization
All zeros
Random values (dependent on activation function)
Xavier initialization [1]:
Random values in the interval [-1/sqrt(n), 1/sqrt(n)],
where n is the number of incoming connections from the previous layer
[1] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks (2010)
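As a rough illustration (not from the original slides), this kind of initialization can be sketched in NumPy; the function name and sizes below are only examples:

import numpy as np

def xavier_init(n_in, n_out):
    # Draw uniform random values from [-1/sqrt(n), 1/sqrt(n)],
    # where n is the number of incoming connections
    bound = 1.0 / np.sqrt(n_in)
    return np.random.uniform(-bound, bound, size=(n_in, n_out))

W = xavier_init(100, 100)  # e.g. hidden-to-hidden weights for 100 units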
Overall procedure: RNN
Initialization
Forward Propagation
s_t = f(U x_t + W s_{t-1})
s_t : new state
s_{t-1} : old state
x_t : input vector at some time step
The function f is usually a nonlinearity such as tanh or ReLU
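A minimal NumPy sketch of this step (illustrative only; the parameter matrices U, W, V and their shapes are assumptions, following the notation above):

import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    # New hidden state from the current input and the old state
    s_t = np.tanh(U.dot(x_t) + W.dot(s_prev))
    # Output at this time step (scores before softmax)
    o_t = V.dot(s_t)
    return s_t, o_t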
Overall procedure: RNN
Initialization
Forward Propagation
Calculating the loss
y_t : the labeled data (true values)
o_t : the output data (predictions)
Cross-entropy loss:
L(y, o) = -(1/N) Σ_n y_n log(o_n)
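An illustrative NumPy version of this loss, assuming y holds one-hot labels and o the predicted probabilities for N examples:

import numpy as np

def cross_entropy_loss(y, o):
    # L(y, o) = -(1/N) * sum_n y_n * log(o_n)
    N = y.shape[0]
    return -np.sum(y * np.log(o)) / N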
Overall procedure: RNN
Initialization
Forward Propagation
Calculating the loss
Stochastic Gradient Descent (SGD)
Push the parameters in a direction that reduces the error
The directions are given by the gradients of the loss with respect to the parameters:
∂L/∂U, ∂L/∂V, ∂L/∂W
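Schematically, one SGD step just moves each shared parameter against its gradient (a sketch; params and grads are assumed dictionaries holding U, V, W and their BPTT gradients):

def sgd_step(params, grads, learning_rate=0.01):
    # Move each parameter a small step against its gradient
    for name in params:
        params[name] -= learning_rate * grads[name]
    return params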
Overall procedure: RNN
Initialization
Forward Propagation
Calculating the loss
Stochastic Gradient Descent (SGD)
Backpropagation Through Time (BPTT)
Long-term dependencies
vanishing/exploding gradient problem
Vanishing gradient over time
Standard RNN with sigmoid
The sensitivity to the input values decays over time
The network forgets the previous input
Long Short-Term Memory (LSTM) [2]
The cell remembers the input as long as it wants
The output can be used anytime it wants
[2] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks (2012)
Standard RNN
Simple tanh layer
Blog post by C. Olah. Understanding LSTM Networks (2015)
Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM)
Cell state = conveyor belt!
Forget
Input
Update
Output
Long Short-Term Memory (LSTM)
Forget gate
LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
The decision about what information to throw away from the cell state is made by a sigmoid layer called the forget gate layer.
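In the standard notation from Olah's post (added here, since the slide's equation image is not included):
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)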
Long Short-Term Memory (LSTM)
Input gate
Decide what new information we're going to store in the cell state
First, the input gate layer decides which values we'll update
Next, a tanh layer creates a vector of new candidate values
Finally, combine the two to create an update to the state
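The corresponding equations, in the same notation (not shown on the original slide):
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)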
Long Short-Term Memory (LSTM)
Update
Forget the old thing
Add the new thing
This is where we'd actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps.
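In equation form (same notation, added for reference):
C_t = f_t * C_{t-1} + i_t * C̃_t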
Long Short-Term Memory (LSTM)
Output
The output will be based on the cell state.
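In equation form (same notation, added for reference):
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)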
Gated Recurrent Unit (GRU)
Combine the forget and input gates into a single update gate
Merge the cell state and hidden state
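The standard GRU equations, added here for comparison with the LSTM above:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t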
LSTM vs GRU
Design Patterns for RNN
RNN Sequences
Task (input → output):
Image classification: fixed-sized image → fixed-sized class
Image captioning: image → sentence of words
Sentiment analysis: sentence → positive or negative sentiment
Machine translation: sentence in English → sentence in French
Video classification: video sequence → label for each frame
Blog post by A. Karpathy. The Unreasonable Effectiveness of Recurrent Neural Networks (2015)
RNN Implementation
Recurrent States
Choose RNN cell type
Use multiple RNN cells
Input layer
Prepare time series data as RNN input
Data splitting
Connect input and recurrent layers
Output layer
Add DNN layer
Add regression model
Create RNN model for regression
Train & Prediction
1) Choose the RNN cell type
Neural Network RNN Cells (tf.nn.rnn_cell)
BasicRNNCell (tf.nn.rnn_cell.BasicRNNCell)
activation : tanh()
num_units : The number of units in the RNN cell
BasicLSTMCell (tf.nn.rnn_cell.BasicLSTMCell)
The implementation is based on RNN Regularization[3]
activation : tanh()
state_is_tuple : 2-tuples of the accepted and returned states
GRUCell (tf.nn.rnn_cell.GRUCell)
Gated Recurrent Unit cell[4]
activation : tanh()
LSTMCell (tf.nn.rnn_cell.LSTMCell)
use_peepholes (bool) : diagonal/peephole connections[5].
cell_clip (float) : the cell state is clipped by this value prior to the cell output activation.
num_proj (int): The output dimensionality for the projection matrices
[3] W. Zaremba, I. Sutskever, and O. Vinyals, Recurrent Neural Network Regularization (2014)
[4] K. Cho et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014)
[5] H. Sak et al., Long short-term memory recurrent neural network architectures for large scale acoustic modeling (2014)
LAB-1) Choose the RNN Cell type
import tensorflow as tf
num_units = 100
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
rnn_cell = tf.nn.rnn_cell.GRUCell(num_units)
rnn_cell = tf.nn.rnn_cell.LSTMCell(num_units)
2) Use multiple RNN cells
RNN Cell wrapper (tf.nn.rnn_cell.MultiRNNCell)
Create an RNN cell composed sequentially of a number of RNN cells.
RNN Dropout wrapper (tf.nn.rnn_cell.DropoutWrapper)
Add dropout to inputs and outputs of the given cell.
RNN Embedding wrapper (tf.nn.rnn_cell.EmbeddingWrapper)
Add input embedding to the given cell.
Ex) word2vec, GloVe
RNN Input Projection wrapper (tf.nn.rnn_cell.InputProjectionWrapper)
Add input projection to the given cell.
RNN Output Projection wrapper (tf.nn.rnn_cell.OutputProjectionWrapper)
Add output projection to the given cell.
LAB-2) Use multiple RNN cells
rnn_cell = tf.nn.rnn_cell.DropoutWrapper(rnn_cell, input_keep_prob=0.8, output_keep_prob=0.8)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell([rnn_cell] * depth)
[Diagram: a stack of depth GRU/LSTM cells, with dropout (keep probability 0.8) applied to the inputs and outputs of each cell]
3) Prepare the time series data
Split raw data into train, validation, and test datasets
split_data [6]
data : raw data
val_size : the ratio of validation set (ex. val_size=0.2)
test_size : the ratio of test set (ex. test_size=0.2)
def split_data(data, val_size=0.2, test_size=0.2):
    ntest = int(round(len(data) * (1 - test_size)))
    nval = int(round(len(data.iloc[:ntest]) * (1 - val_size)))
    df_train, df_val, df_test = data.iloc[:nval], data.iloc[nval:ntest], data.iloc[ntest:]
    return df_train, df_val, df_test
[6] M. Mourafiq, tensorflow-lstm-regression (code: https://github.com/mouradmourafiq/tensorflow-lstm-regression)
LAB-3) Prepare the time series data
train, val, test = split_data(raw_data, val_size=0.2, test_size=0.2)
Raw data (100%) is first split into Train (80%) and Test (20%).
The Train part is then split again into Train (80%) and Validation (20%).
Final split: Train 64%, Validation 16%, Test 20%.
3) Prepare the time series data
Generate sequence pair (x, y)
rnn_data [6]
labels : False for input data (x), True for target data (y)
time_steps : the number of time steps per window
data : our data
def rnn_data(data, time_steps, labels=False):
"""
creates new data frame based on previous observation
* example:
l = [1, 2, 3, 4, 5]
time_steps = 2
-> labels == False [[1, 2], [2, 3], [3, 4]]
-> labels == True [3, 4, 5]
"""
rnn_df = []
for i in range(len(data) - time_steps):
if labels:
try:
rnn_df.append(data.iloc[i + time_steps].as_matrix())
except AttributeError:
rnn_df.append(data.iloc[i + time_steps])
else:
data_ = data.iloc[i: i + time_steps].as_matrix()
rnn_df.append(data_ if len(data_.shape) > 1 else [[i] for i in data_])
return np.array(rnn_df)
LAB-3) Prepare the time series data
time_steps = 10
train_x = rnn_data(df_train, time_steps, labels=False)
train_y = rnn_data(df_train, time_steps, labels=True)
Example: df_train = [1, 2, 3, ..., 10000] with time_steps = 10
train_x: x#01 = [1, 2, ..., 10], x#02 = [2, 3, ..., 11], ..., x#9990 = [9990, 9991, ..., 9999]
train_y: y#01 = 11, y#02 = 12, ..., y#9990 = 10000
4) Split our data
Split time series data into smaller tensors
split (tf.split)
split_dim : the dimension to split along (here 1, the time dimension)
num_split : time_steps
value : our data
split_squeeze (tf.contrib.learn.ops.split_squeeze)
Splits input on given dimension and then squeezes that dimension.
Args: dim, num_split, tensor_in
From r0.10, split_squeeze is deprecated and will be removed after 2016-08-01. Use tf.unpack instead.
LAB-4) Split our data
time_steps = 10
x_split = split_squeeze(1, time_steps, x_data)
Example: split_squeeze splits x#01 = [1, 2, 3, ..., 10] along the time dimension into 10 separate values: 1, 2, 3, ..., 10
5) Connect input and recurrent layers
Create a recurrent neural network specified by RNNCell
rnn (tf.nn.rnn)
Args:
cell : an instance of RNNCell
inputs : list of inputs, tensor shape = [batch_size, input_size]
Returns:
(outputs, state)
outputs : list of outputs
state : the final state
dynamic_rnn (tf.nn.dynamic_rnn)
Args:
cell : an instance of RNNCell
inputs : the RNN inputs, a single tensor of shape [batch_size, max_time, input_size]
Returns:
(outputs, state)
outputs : the RNN output
state : the final state
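A hedged sketch of the dynamic_rnn variant (the placeholder shape and sizes below are assumptions; unlike tf.nn.rnn, no per-step splitting of the input is needed):

import tensorflow as tf

num_units, time_steps, input_size = 100, 10, 1

# One tensor of shape [batch_size, time_steps, input_size]
x_data = tf.placeholder(tf.float32, [None, time_steps, input_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, state = tf.nn.dynamic_rnn(cell, x_data, dtype=tf.float32)
# outputs: [batch_size, time_steps, num_units], state: the final state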
LAB-5) Connect input and recurrent layers
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell([rnn_cell] * depth)
x_split = tf.split(1, time_steps, x_data)   # split along the time dimension
output, state = tf.nn.rnn(stacked_lstm, x_split, dtype=tf.float32)
[Diagram: the stacked LSTM unrolled over the time steps; each input feeds the bottom layer and the top layer produces the output at that step]
6) Output Layer
Add DNN layer
dnn (tf.contrib.learn.ops.dnn)
Args: input_layer, hidden_units
Add Linear Regression
linear_regression (tf.contrib.learn.models.linear_regression)
Args: X, y
LAB-6) Output Layer
dnn_output = dnn(rnn_output, [10, 10])
LSTM_Regressor = linear_regression(dnn_output, y)
[Diagram: LSTM outputs → DNN layer 1 (10 hidden units) → DNN layer 2 (10 hidden units) → linear regression]
7) Create RNN model for regression
TensorFlowEstimator (tf.contrib.learn.TensorFlowEstimator)
regressor = learn.TensorFlowEstimator(
    model_fn=LSTM_Regressor, n_classes=0, verbose=1,
    steps=TRAINING_STEPS, optimizer='Adagrad',
    learning_rate=0.03, batch_size=BATCH_SIZE)
regressor.fit(X['train'], y['train'])
predicted = regressor.predict(X['test'])
mse = mean_squared_error(y['test'], predicted)
MNIST using RNN
https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series/blob/master/mnist-rnn.ipynb
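A minimal sketch of the idea in that notebook (not its exact code), assuming the r0.10-era API used elsewhere in this tutorial: each 28x28 image is fed to the RNN as 28 time steps of 28 pixels, and the last output is used for classification.

import tensorflow as tf

time_steps, input_size, num_units, n_classes = 28, 28, 128, 10

x = tf.placeholder(tf.float32, [None, time_steps, input_size])
y = tf.placeholder(tf.float32, [None, n_classes])

# Split each image into a list of 28 row vectors, one per time step
x_split = tf.split(1, time_steps, x)                 # list of [batch, 1, 28]
x_split = [tf.squeeze(x_, [1]) for x_ in x_split]    # list of [batch, 28]

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, state = tf.nn.rnn(cell, x_split, dtype=tf.float32)

# Classify from the last time step's output
W = tf.Variable(tf.truncated_normal([num_units, n_classes], stddev=0.1))
b = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(outputs[-1], W) + b
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, y))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)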
Case study #2: sine function
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.contrib import learn
from sklearn.metrics import mean_squared_error, mean_absolute_error
from lstm_predictor import generate_data, lstm_model
Libraries
numpy: package for scientific computing
matplotlib: 2D plotting library
tensorflow: open source software library for machine intelligence
learn: Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
mse: "mean squared error" as evaluation metric
lstm_predictor: our lstm class
Case study #2: sine function
LOG_DIR = './ops_logs'
TIMESTEPS = 5
RNN_LAYERS = [{'steps': TIMESTEPS}]
DENSE_LAYERS = [10, 10]
TRAINING_STEPS = 100000
BATCH_SIZE = 100
PRINT_STEPS = TRAINING_STEPS / 100
Parameter definitions
LOG_DIR: directory for log files (used by TensorBoard)
TIMESTEPS: RNN time steps
RNN_LAYERS: RNN layer information
DENSE_LAYERS: sizes of the DNN layers; [10, 10] means two dense layers with 10 hidden units each
TRAINING_STEPS: number of training steps
BATCH_SIZE: mini-batch size
PRINT_STEPS: how often to report progress (every 1% of the training steps)
Case study #2: sine function
X, y = generate_data(np.sin, np.linspace(0, 100, 10000), TIMESTEPS, seperate=False)
Generate waveform
fct: function
x: observation
time_steps: timesteps
seperate: check multimodality
Case study #2: sine function
regressor = learn.TensorFlowEstimator(
    model_fn=lstm_model(TIMESTEPS, RNN_LAYERS, DENSE_LAYERS),
    n_classes=0, verbose=1, steps=TRAINING_STEPS,
    optimizer='Adagrad', learning_rate=0.03, batch_size=BATCH_SIZE)
Create a regressor with TF Learn
model_fn: regression model
n_classes: 0 for regression
verbose: verbosity level
steps: training steps
optimizer: optimizer type ("SGD", "Adam", "Adagrad")
learning_rate: learning rate
batch_size: mini-batch size
Case study #2: sine function
validation_monitor = learn.monitors.ValidationMonitor(
X['val'], y['val'], every_n_steps=PRINT_STEPS,
early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'],
monitors=[validation_monitor], logdir=LOG_DIR)
predicted = regressor.predict(X['test'])
mse = mean_squared_error(y['test'], predicted)
print ("Error: %f" % mse)
Error: 0.000294
Case study #2: sine function
plot_predicted, = plt.plot(predicted, label='predicted')
plot_test, = plt.plot(y['test'], label='test')
plt.legend(handles=[plot_predicted, plot_test])
Energy forecasting problems
Given the history of an energy signal (e.g. load, price, generation) and of external signals (e.g. weather) up to the current time, together with external forecasts (e.g. weather forecasts), produce a forecast of the energy signal.
Dataset: Historical Data (2015-16) Prices
Prices (EUR/MWh)
Hourly real electricity price for MIBEL (the Portuguese (PT) area)
Duration: Jan 1st, 2015 (UTC 00:00) to Feb 2nd, 2016 (UTC 23:00)
Dataset: Historical Data (2015-16) Prices
date (UTC) Price
01/01/2015 0:00 48.1
01/01/2015 1:00 47.33
01/01/2015 2:00 42.27
01/01/2015 3:00 38.41
01/01/2015 4:00 35.72
01/01/2015 5:00 35.13
01/01/2015 6:00 36.22
01/01/2015 7:00 32.4
01/01/2015 8:00 36.6
01/01/2015 9:00 43.1
01/01/2015 10:00 45.14
01/01/2015 11:00 45.14
01/01/2015 12:00 47.35
01/01/2015 13:00 47.35
01/01/2015 14:00 43.61
01/01/2015 15:00 44.91
01/01/2015 16:00 48.1
01/01/2015 17:00 58.02
01/01/2015 18:00 61.01
01/01/2015 19:00 62.69
01/01/2015 20:00 60.41
01/01/2015 21:00 58.15
01/01/2015 22:00 53.6
01/01/2015 23:00 47.34
Case study #3: Electricity Price Forecasting
dateparse = lambda dates: pd.datetime.strptime(dates, '%d/%m/%Y %H:%M')
rawdata = pd.read_csv("./input/ElectricityPrice/RealMarketPriceDataPT.csv",
parse_dates={'timeline': ['date', '(UTC)']},
index_col='timeline', date_parser=dateparse)
X, y = load_csvdata(rawdata, TIMESTEPS, seperate=False)
Tensorboard: Main Graph
Tensorboard: RNN
Tensorboard: DNN
Tensorboard: Linear Regression
Tensorboard: Loss
Tensorboard: Histogram
Conclusion
LSTM and GRU
Data preparation
The RNN source code in TensorFlow is simple, but the time required for training is painful.
Q&A
Taegyun Jeon, PhD
Senior Researcher, R&D Center, SATREC INITIATIVE
Contact: [email protected], [email protected]
GitHub for this tutorial: https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series