01 Module 2 Neural Network Based Reinforcement Learning

Q Networks

Learning Objectives
● TD-Gammon

● Deep Q Networks

○ The Loss Function

○ Memory

○ Code
Agenda
TD-Gammon

Deep Q Networks - Loss

Deep Q Networks - Memory

Deep Q Networks - Code


From Chess to Backgammon
Q Tables vs Q Networks

Q-table

State | Left | Down | Right | Up
  0   |  0   |  0   |   0   |  0
  1   |  0   |  0   |   0   |  0
  2   |  0   |  0   |   0   |  0
  3   |  0   |  0   |   0   |  0
Q Tables vs Q Networks

Q-table approximation: the same table, approximated by a neural network.

[diagram: Input: State Features → Hidden Layer 1 → Hidden Layer 2 → Output: TD(λ)]
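A minimal sketch of the tabular form above (illustrative, not from the lab code); the network version replaces this lookup with a function of the state features:

import numpy as np

# One row per state (0-3), one column per action (Left, Down, Right, Up).
q_table = np.zeros((4, 4))
q_value = q_table[2, 1]   # Q(state=2, action=Down), 0.0 until updated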
Agenda
TD-Gammon

Deep Q Networks - Loss

Deep Q Networks - Memory

Deep Q Networks - Code


Deep Q Learning

Deep Reinforcement Learning


Deep Q Learning - Loss Function

Q(st,at) = Q(st,at) + αt(rt + γ · maxa{Q(st+1,a)} − Q(st,at))


Deep Q Learning - Loss Function

Q(st,at) = Q(st,at) + αt(rt + γ · maxa{Q(st+1,a)} − Q(st,at))

Δw = α(r + γ · maxa{Q(st+1,a,w)} − Q(st,at,w))∇wQ(s,a,w)


Deep Q Learning - Training

1 Cycle Returns:
● 1 State
● 1 Chosen Action
● 1 Reward
● 1 State Prime

[diagram: Agent sends an Action to the Environment; the Environment returns the new State and a Reward]

Training steps:
1. Feed in State Prime: the network outputs Q(s’, a0), Q(s’, a1), Q(s’, a2).
2. Take the max value: max Q.
3. Calculate the label: r + γ · maxa{Q(st+1,a,w)}.
4. Apply the label to the chosen action; the other actions get 0.
5. Train on State.
Deep Q Learning - Loss Function

Δw = α(r + γ · maxa{Q(st+1,a,w)} − Q(st,at,w))∇wQ(s,a,w)

Label: y = r + γ · maxa{Q(st+1,a,w)}        Predicted Value: ŷ = Q(st,at,w)

Loss = (y − ŷ)²
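A minimal numeric sketch of the label and loss for a single transition (illustrative values and names, not from the lab code):

import numpy as np

gamma = 0.9
reward = 1.0
q_s_prime = np.array([0.2, 0.5, 0.1])   # Q(s', a) for each action, from the network
q_predicted = 0.3                        # Q(s, a) for the action actually taken

label = reward + gamma * np.max(q_s_prime)   # y = 1.45
loss = (label - q_predicted) ** 2            # (y - ŷ)² = 1.3225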
Agenda
TD-Gammon

Deep Q Networks - Loss

Deep Q Networks - Memory

Deep Q Networks - Code


Experience Replay
Deep Q Learning - Memory

1 Cycle Returns:
● 1 State
● 1 Chosen Action
● 1 Reward
● 1 State Prime

[diagram: Agent sends an Action to the Environment; the Environment returns the new State and a Reward]
Deep Q Learning - Memory

1 Cycle Returns:
● 1 State
● 1 Chosen Action
● 1 Reward
● 1 State Prime

Memory Buffer

Idx | state | action | reward | state prime
 0  |  s0   |   a0   |   r0   |     s1
 1  |  s1   |   a1   |   r1   |     s2
 2  |  s2   |   a2   |   r2   |     s3
...
Deep Q Learning - Memory

Memory Buffer

Idx | state | action | reward | state prime
 0  |  s0   |   a0   |   r0   |     s1
 1  |  s1   |   a1   |   r1   |     s2
 2  |  s2   |   a2   |   r2   |     s3
...

Training Sample

Idx | state | action | reward | state prime
 2  |  s2   |   a2   |   r2   |     s3
11  |  s11  |  a11   |  r11   |     s12
25  |  s25  |  a25   |  r25   |     s26
...
The Memory Buffer

from collections import deque
import numpy as np


class Memory():
    def __init__(self, memory_size, batch_size):
        self.buffer = deque(maxlen=memory_size)
        self.batch_size = batch_size

    def add(self, experience):
        # Adds a (state, action, reward, state_prime, done) tuple.
        self.buffer.append(experience)

    def sample(self):
        # Draw a random batch of experiences without replacement.
        buffer_size = len(self.buffer)
        index = np.random.choice(
            np.arange(buffer_size), size=self.batch_size, replace=False)
        batch = [self.buffer[i] for i in index]
        return batch
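A minimal usage sketch of the Memory class above (the memory_size and batch_size values here are illustrative, not the lab's settings):

import numpy as np

memory = Memory(memory_size=10000, batch_size=64)

# Store one cycle's experience: (state, action, reward, state_prime, done).
state = np.zeros(4)
memory.add((state, 1, 0.0, np.zeros(4), False))

# Once enough experiences are stored, sample a training batch.
if len(memory.buffer) >= memory.batch_size:
    batch = memory.sample()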
Experience Replay
Agenda
TD-Gammon

Deep Q Networks - Loss

Deep Q Networks - Memory

Deep Q Networks - Code


Deep Q Learning - Network

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model


def deep_q_network(state_shape, action_size, learning_rate, hidden_neurons):
    state_input = Input(state_shape, name='frames')

    hidden_1 = Dense(hidden_neurons, activation='relu')(state_input)
    hidden_2 = Dense(hidden_neurons, activation='relu')(hidden_1)
    q_values = Dense(action_size)(hidden_2)

    model = Model(inputs=[state_input], outputs=q_values)

    optimizer = tf.keras.optimizers.RMSprop(lr=learning_rate)
    model.compile(loss='mse', optimizer=optimizer)
    return model
Deep Q Learning - Network (advanced)

from tensorflow.keras.layers import Multiply


def deep_q_network(state_shape, action_size, learning_rate, hidden_neurons):
    state_input = Input(state_shape, name='frames')
    actions_input = Input((action_size,), name='mask')

    hidden_1 = Dense(hidden_neurons, activation='relu')(state_input)
    hidden_2 = Dense(hidden_neurons, activation='relu')(hidden_1)
    q_values = Dense(action_size)(hidden_2)
    masked_q_values = Multiply()([q_values, actions_input])

    model = Model(inputs=[state_input, actions_input], outputs=masked_q_values)

    optimizer = tf.keras.optimizers.RMSprop(lr=learning_rate)
    model.compile(loss='mse', optimizer=optimizer)
    return model
Deep Q Learning - Network (advanced)

The mask input controls which Q values pass through the network's output:

● Training: the mask is the one-hot of the chosen action, e.g. [0, 0, 1, 0] for actions a0–a3, so the loss only applies to the action that was actually taken.
● Predicting: the mask is all ones, e.g. [1, 1, 1, 1], so every action's Q value comes through and the agent can take the argmax.
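A minimal sketch of the two masks (illustrative, for a 4-action agent):

import numpy as np

action_size = 4
chosen_action = 2

train_mask = np.zeros((1, action_size))
train_mask[0, chosen_action] = 1          # [[0., 0., 1., 0.]]

predict_mask = np.ones((1, action_size))  # [[1., 1., 1., 1.]]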
Deep Q Learning - The Act Function

def act(self, state, training=False):
    # (Uses the standard-library random module and numpy as np.)
    if training:
        # Random actions until enough simulations to train the model.
        if len(self.memory.buffer) >= self.memory.batch_size:
            self.random_rate *= self.random_decay

        if self.random_rate > np.random.rand():
            return random.randint(0, self.action_size - 1)

    # If not acting randomly, take action with highest predicted value.
    state_batch = np.expand_dims(state, axis=0)
    predict_mask = np.ones((1, self.action_size,))
    action_qs = self.network.predict([state_batch, predict_mask])
    return np.argmax(action_qs[0])
Deep Q Learning - Update Q Function

def update_Q(self):
    state_mb, action_mb, reward_mb, state_prime_mb, done_mb = (
        self.memory.sample())

    # Get Q values for state_prime_mb.
    predict_mask = np.ones(action_mb.shape + (self.action_size,))
    next_q_mb = self.network.predict([state_prime_mb, predict_mask])
    next_q_mb = tf.math.reduce_max(next_q_mb, axis=1)

    # Apply the Bellman Equation.
    target_qs = (next_q_mb * self.memory.gamma) + reward_mb
    target_qs = tf.where(done_mb, reward_mb, target_qs)

    # Match training batch to network output.
    action_mb = tf.convert_to_tensor(action_mb, dtype=tf.int32)
    action_hot = tf.one_hot(action_mb, self.action_size)
    target_mask = tf.multiply(tf.expand_dims(target_qs, -1), action_hot)
    return self.network.train_on_batch([state_mb, action_hot], target_mask)
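Note that update_Q unpacks five column arrays from memory.sample() and reads self.memory.gamma, while the Memory class shown earlier returns a list of experience tuples and stores no gamma. A minimal sketch of a column-wise sample() that would match this unpacking (an assumption to bridge the slides, not the lab code):

def sample(self):
    # Draw a random batch and split it into per-field arrays.
    buffer_size = len(self.buffer)
    index = np.random.choice(
        np.arange(buffer_size), size=self.batch_size, replace=False)
    batch = [self.buffer[i] for i in index]
    states, actions, rewards, state_primes, dones = map(np.array, zip(*batch))
    return states, actions, rewards, state_primes, dones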
Lab
Use Reinforcement
Learning in Trading
Lab Objectives


Screencast
Policy Gradients
Agenda
Policy Gradients

Actor - Critic
Deep Q vs Policy Gradients

Deep Q Network: State Properties → Q(s, a0), Q(s, a1), Q(s, a2)

Policy Gradient: State Properties → P(a0|s), P(a1|s), P(a2|s), e.g. .4 .3 .3 (a probability distribution over the actions)
Policy Gradients - Loss

Chosen Action (one-hot): 0 0 1

Input: State Properties
Policy Gradients - Loss

Δw = α∇wπw(a*, s)

Δw = α∇wlog(πw(a*, s))

Δw = α∇wlog(πw(a, s)) · Gt

where πw(a*, s) is the probability the policy assigns to the chosen action a* in state s, and Gt is the discounted return.
Policy Gradients - Loss

Δw = α∇wlog(πw(a, s)) · Gt

def custom_loss(y_true, y_pred):
    # y_true: one-hot chosen actions; y_pred: predicted action probabilities.
    # g (the discounted return) comes from the enclosing build_networks scope.
    y_pred_clipped = K.clip(y_pred, 1e-8, 1 - 1e-8)
    log_likelihood = y_true * K.log(y_pred_clipped)
    return K.sum(-log_likelihood * g)
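A minimal numeric sketch of this loss for a single step (illustrative values; plain NumPy instead of the Keras backend):

import numpy as np

probs = np.array([0.4, 0.3, 0.3])     # π_w(a|s) from the softmax output
action_one_hot = np.array([0., 0., 1.])
g = 2.0                                # discounted return G_t

log_likelihood = action_one_hot * np.log(np.clip(probs, 1e-8, 1 - 1e-8))
loss = np.sum(-log_likelihood * g)     # -log(0.3) * 2.0 ≈ 2.408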
Policy Gradients - Network

def build_networks(state_shape, action_size, learning_rate, hidden_neurons):
    state_input = Input(state_shape, name='frames')
    g = Input((1,), name='G')
    hidden_1 = Dense(hidden_neurons, activation='relu')(state_input)
    hidden_2 = Dense(hidden_neurons, activation='relu')(hidden_1)
    probabilities = Dense(action_size, activation='softmax')(hidden_2)

    def custom_loss(y_true, y_pred):
        ...  # Previous slide.

    policy = Model(
        inputs=[state_input, g], outputs=[probabilities])
    optimizer = Adam(lr=learning_rate)
    policy.compile(loss=custom_loss, optimizer=optimizer)

    predict = Model(inputs=[state_input], outputs=[probabilities])
    return policy, predict
Policy Gradients - Memory

class Memory():
    def __init__(self, gamma):
        self.buffer = []
        self.gamma = gamma

    def add(self, experience):
        # Adds a (state, action, reward) tuple.
        self.buffer.append(experience)

    def sample(self):
        # Return everything collected so far, column-wise, then clear the buffer.
        batch = np.array(self.buffer).T.tolist()
        states_mb = np.array(batch[0], dtype=np.float32)
        actions_mb = np.array(batch[1], dtype=np.int8)
        rewards_mb = np.array(batch[2], dtype=np.float32)
        self.buffer = []
        return states_mb, actions_mb, rewards_mb
Policy Gradients - Training

def learn(self):
    """Trains the policy network based on stored experiences."""
    # Obtain the episode's experiences from memory.
    state_mb, action_mb, reward_mb = self.memory.sample()
    actions = tf.one_hot(action_mb, self.action_size)

    # Normalized TD(1): discounted returns, standardized.
    discount_mb = np.zeros_like(reward_mb)
    total_rewards = 0
    for t in reversed(range(len(reward_mb))):
        total_rewards = reward_mb[t] + total_rewards * self.memory.gamma
        discount_mb[t] = total_rewards
    discount_mb = (discount_mb - np.mean(discount_mb)) / np.std(discount_mb)

    self.policy.train_on_batch([state_mb, discount_mb], actions)
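A minimal worked example of the reversed discount loop (illustrative rewards, γ = 0.9):

import numpy as np

gamma = 0.9
reward_mb = np.array([1.0, 0.0, 2.0], dtype=np.float32)

discount_mb = np.zeros_like(reward_mb)
total_rewards = 0
for t in reversed(range(len(reward_mb))):
    total_rewards = reward_mb[t] + total_rewards * gamma
    discount_mb[t] = total_rewards
# discount_mb == [2.62, 1.8, 2.0]:
#   G2 = 2.0, G1 = 0.0 + 0.9 * 2.0 = 1.8, G0 = 1.0 + 0.9 * 1.8 = 2.62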


Policy Gradients Overview

def act(self, state):
    state_batch = np.expand_dims(state, axis=0)
    probabilities = self.predict.predict(state_batch)[0]
    action = np.random.choice(self.action_size, p=probabilities)
    return action
Agenda
Policy Gradients

Actor - Critic
Breaking Down Q

Q(s, a) = V(s) + A(s, a)

[diagram: State Inputs feed three networks: Predict, Actor, and Critic]
A2C - Network

def build_networks(state_shape, action_size, actor_lr, critic_lr, neurons):
    state_input = layers.Input(state_shape, name='frames')
    advantages = layers.Input((1,), name='A')  # Now A instead of G.

    hidden_1 = layers.Dense(neurons, activation='relu')(state_input)
    hidden_2 = layers.Dense(neurons, activation='relu')(hidden_1)
    probabilities = layers.Dense(action_size, activation='softmax')(hidden_2)
    value = layers.Dense(1, activation='linear')(hidden_2)

    def custom_loss(y_true, y_pred):
        ...  # Same as before, with advantages in place of g.

    actor = Model(inputs=[state_input, advantages], outputs=[probabilities, value])
    actor.compile(loss=[custom_loss, 'mean_squared_error'], optimizer=Adam(lr=actor_lr))

    critic = Model(inputs=[state_input], outputs=[value])

    predict = Model(inputs=[state_input], outputs=[probabilities])
    return actor, critic, predict
A2C - Training

def learn(self):
    """Trains the A2C networks based on stored experiences."""
    # Obtain a mini-batch from memory.
    state_mb, action_mb, reward_mb, dones_mb, next_v_mb = self.memory.sample()

    # Apply TD(0).
    discount_mb = reward_mb + next_v_mb * self.memory.gamma * (1 - dones_mb)
    state_values = self.critic.predict([state_mb])
    advantages = discount_mb - np.squeeze(state_values)
    self.actor.train_on_batch([state_mb, advantages], [action_mb, discount_mb])
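A minimal numeric sketch of the TD(0) targets and advantages (illustrative values, γ = 0.9):

import numpy as np

gamma = 0.9
reward_mb = np.array([1.0, 0.0])
next_v_mb = np.array([0.5, 0.2])       # critic's value of each next state
dones_mb = np.array([0.0, 1.0])        # second transition ends the episode
state_values = np.array([0.8, 0.1])    # critic's value of each current state

discount_mb = reward_mb + next_v_mb * gamma * (1 - dones_mb)   # [1.45, 0.0]
advantages = discount_mb - state_values                        # [0.65, -0.1]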
Lab
Use Reinforcement
Learning in Trading
Lab Objectives


Screencast
What is LSTM?

Daniel Sparing
Machine Learning Solutions Engineer
Google Cloud
Agenda
Sequence Models

DNNs and RNNs for sequences

RNN limitations

LSTM

Applying LSTM to Time Series Data


Why Sequence Models?
Predict the next word

The cat sat on the ______.


Translate
Why Sequence Models?
Smart Reply
Why Sequence Models?
Speech recognition

Input: Sequence of float vectors (windowed Fourier Transforms)


Output: Different length sequence of characters
Agenda
Sequence Models

DNNs and RNNs for sequences

RNN limitations

LSTM

Applying LSTM to Time Series Data


Feed Forward Networks

[diagram: input layer → hidden layers → output layer]
Feed Forward Networks

Fixed size layers

Inference is stateless

Nodes are unordered


Language as Input
Input to a language model can have variable length. For example,

do you like green eggs and ham ?

four and twenty blackbirds


Language as Input: the “Typical” Approach

[diagram: variable-length sentences → Embedding → Aggregation → fixed-size vectors]

● Words are embedded independently
● Embeddings are aggregated using sum or average
● Yielding vectors of fixed size

do you like green eggs and ham ?

four and twenty blackbirds
Language as Input: the “Typical” Approach

This is essentially the “bag-of-words” approach.
Structure is Important

The cat sat on the mat
sat the on mat cat the

● Certain tasks, structure is essential:
  ○ Humor
  ○ Sarcasm
● Certain tasks, ngrams can get you a long way:
  ○ Sentiment Analysis
  ○ Topic detection
● Specific words can be strong indicators:
  ○ Useless, fantastic (sentiment)
  ○ Hoop, green tea, NASDAQ (topic)
Structure is Hard

Ngrams are the typical way of preserving some structure:

the cat   cat sat   sat on   on the   the mat

Beyond bi- or tri-grams, occurrences become very rare and dimensionality becomes huge (1–10 million+ features).
Big Idea of Recurrent Neural Networks

Let’s wrap a DNN in a for loop!
RNNs: Networks with Loops

[diagram: input xt → module A → output ht, with A’s state looping back into itself]

A: a subgraph of the NN
x(t): RNN input at time t
h(t): RNN state at time t = (hidden state, output)

for t in range(len(x)):
    h_next = A(x[t], h[t-1].hidden)
    h.append(h_next)
loss = sum([loss_fn(y) for y in h.output])

Parts of this slide and the following slides are building on ideas and visualizations from
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Unrolled Recurrent Neural Networks

[diagram: the loop unrolled into one copy of A per time step, each taking xt and emitting ht]

Secret sauce:
● Tie (share) weights of A for all t.
● Backprop updates same weights for all t
  (sum gradients from all t).
RNNs provide temporal context

[diagram: unrolled RNN over inputs x0 … x4 producing states h0 … h4]

I grew up in France… I speak fluent _______.
Agenda
Sequence Models

DNNs and RNNs for sequences

RNN limitations

LSTM

Applying LSTM to Time Series Data


Problems with Long-Term RNNs

[diagram: RNN unrolled over a long sequence x0 … xt+2]

● Problem 1: Gradients exploding¹
● Problem 2: Gradients vanishing¹

1. On the difficulty of training recurrent neural networks
Agenda
Sequence Models

DNNs and RNNs for sequences

RNN limitations

LSTM

Applying LSTM to Time Series Data


Vanishing Gradients - Two Weird Tricks

Standard RNN:

[diagram: the repeating module A contains a single tanh layer]
Vanishing Gradients - Two Weird Tricks
● LSTM: “magic” solution to the vanishing gradient problem
● Trick #1: Memory cell carried over time
● Trick #2: Gates that learn to manage the memory

Long Short Term Memory Networks (LSTM)

[diagram: the repeating LSTM module, with a cell-state line across the top, pointwise × and + operations, and four internal layers: σ, σ, tanh, σ]
LSTM - Cell State

[diagram: the cell state Ct-1 → Ct running across the top of the cell]

● “Conveyor belt”
● LSTM can “add” or “remove” information to the cell state via gates.
Gates: Optionally Let Information Through

● Elementwise sigmoid and elementwise multiplication


● Differentiable: trainable

Forget Gate: What were we talking about?

ft = σ(Wf · [ht-1, xt] + bf)
Input Gate and Candidate State

it = σ(Wi · [ht-1, xt] + bi)
C̃t = tanh(WC · [ht-1, xt] + bC)
Update the Cell State

Ct = ft * Ct-1 + it * C̃t
Output Gate

ot = σ(Wo · [ht-1, xt] + bo)
ht = ot * tanh(Ct)

● Output a filtered version of the cell state

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
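A minimal NumPy sketch of one LSTM step implementing the four equations above (illustrative shapes and random weights, not a trained model):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    # Concatenate previous hidden state and current input: [h_{t-1}, x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)            # forget gate
    i_t = sigmoid(W_i @ z + b_i)            # input gate
    c_tilde = np.tanh(W_c @ z + b_c)        # candidate state
    c_t = f_t * c_prev + i_t * c_tilde      # update cell state
    o_t = sigmoid(W_o @ z + b_o)            # output gate
    h_t = o_t * np.tanh(c_t)                # filtered version of the cell state
    return h_t, c_t

# Example shapes: input size 3, hidden size 4 (so [h, x] has length 7).
rng = np.random.default_rng(0)
W = lambda: rng.normal(size=(4, 7))
b = lambda: np.zeros(4)
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(rng.normal(size=3), h, c, W(), b(), W(), b(), W(), b(), W(), b())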
Apply LSTM to Time Series data
Agenda
Sequence Models

DNNs and RNNs for sequences

RNN limitations

LSTM

Applying LSTM to Time Series Data


Time-series problems are ubiquitous

● How many items will be sold next week? Next month? Next year?

● What is the likelihood there will be a major earthquake (M > 6.7) on the Hayward fault in the next 26 years?

● Is this a fraudulent transaction?

Training a time-series model can require significant feature engineering

Feature (and label!) engineering:

feature1 | feature2 | ... | label
5        | 20.51    | 1   | 1.1
0.8      | -0.51    | 2.9 | -0.82
...      | ...      | ... | ...
NYC real estate data

sale_week
2010-12-26      134640
2011-01-02     1150000
2011-01-09      945000
2011-01-16      995000
2011-01-23     1150000
Sliding Window to create features and label

Example: Create a feature table, window_size = 3, horizon = 1

Input table:
datetime               value
2018-01-01 0:00:00     0.7713206433
2018-01-02 0:00:00     0.02075194936
2018-01-03 0:00:00     0.6336482349
2018-01-04 0:00:00     0.7488038825
2018-01-05 0:00:00     0.4985070123
2018-01-06 0:00:00     0.2247966455
2018-01-07 0:00:00     0.1980628648
2018-01-08 0:00:00     0.7605307122
2018-01-09 0:00:00     0.1691108366
2018-01-10 0:00:00     0.08833981417

Features, label:
pred_datetime          -3_steps        -2_steps        -1_steps        label
2018-01-04 0:00:00     0.7713206433    0.02075194936   0.6336482349    0.7488038825
2018-01-05 0:00:00     0.02075194936   0.6336482349    0.7488038825    0.4985070123
2018-01-06 0:00:00     0.6336482349    0.7488038825    0.4985070123    0.2247966455
2018-01-07 0:00:00     0.7488038825    0.4985070123    0.2247966455    0.1980628648
2018-01-08 0:00:00     0.4985070123    0.2247966455    0.1980628648    0.7605307122
2018-01-09 0:00:00     0.2247966455    0.1980628648    0.7605307122    0.1691108366
2018-01-10 0:00:00     0.1980628648    0.7605307122    0.1691108366    0.08833981417
Create the features and label

import time_series

WINDOW_SIZE = 52 * 1
HORIZON = 4 * 6

df = time_series.create_rolling_features_label(sales,
                                               window_size=WINDOW_SIZE,
                                               pred_offset=HORIZON)

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/blogs/gcp_forecasting/time_series.py
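A minimal pandas sketch of what such a rolling-window transform might look like (illustrative only; the actual implementation lives in the time_series module linked above):

import pandas as pd

def rolling_features_label(values, window_size=3, horizon=1):
    # Features: the `window_size` observations before the prediction date.
    # Label: the observation at the prediction date, `horizon` steps ahead.
    df = pd.DataFrame({'label': values})
    for step in range(window_size, 0, -1):
        df['-%d_steps' % step] = values.shift(step + horizon - 1)
    return df.dropna()

# e.g. with a daily series:
# s = pd.Series([0.77, 0.02, 0.63, 0.75], index=pd.date_range('2018-01-01', periods=4))
# rolling_features_label(s, window_size=3, horizon=1)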
Date features can provide performance lift

dates = df.index
df = time_series.add_date_features(df, dates)

doy | dom | month | year | n_holidays
155 |  3  |   6   | 2012 |     0
162 | 10  |   6   | 2012 |     0
169 | 17  |   6   | 2012 |     0
176 | 24  |   6   | 2012 |     0
183 |  1  |   7   | 2012 |     1
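A minimal pandas sketch of date features like these (illustrative; not the time_series.add_date_features implementation):

import pandas as pd

def add_date_features(df, dates):
    # Day of year, day of month, month, and year from a DatetimeIndex.
    # (n_holidays would additionally need a holiday calendar, omitted here.)
    df = df.copy()
    df['doy'] = dates.dayofyear
    df['dom'] = dates.day
    df['month'] = dates.month
    df['year'] = dates.year
    return df

# e.g.
# dates = pd.date_range('2012-06-03', periods=5, freq='W')
# add_date_features(pd.DataFrame(index=dates), dates)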
Train/test set: split temporally

# Features, label.
X = df.drop('label', axis=1)
y = df['label']

# Train/test split. Splitting on time.
train_ix = time_series.is_between_dates(y.index,
                                        end='2015-12-30')
test_ix = time_series.is_between_dates(y.index,
                                       start='2015-12-30',
                                       end='2018-12-30')
X_train, y_train = X.iloc[train_ix], y.iloc[train_ix]
X_test, y_test = X.iloc[test_ix], y.iloc[test_ix]
[chart: the series split in time into a Training period and a Predicting period]
Baseline model

Simple model: looks at all the history and predicts the next point to be the average of the last 20 observations.

import time_series

baseline_global_metrics = time_series.Metrics(df_baseline.pred,
                                              df_baseline.label)
baseline_global_metrics.report("Global Baseline Model")

"""
Global Baseline Model results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RMSE: 376544.261
MAE: 316352.450
MALR: 0.207
"""

[meme: “My bidirectional LSTM successfully trained” … “It doesn’t beat my baseline model”]
Machine learn: Random Forest

from sklearn.ensemble import RandomForestRegressor

# Train model.
cl = RandomForestRegressor(n_estimators=500,
                           max_features='sqrt', random_state=10, criterion='mse')
cl.fit(X_train, y_train)
pred = cl.predict(X_test)

random_forest_metrics = time_series.Metrics(y_test, pred)
random_forest_metrics.report("Forest Model")
"""
Forest Model results
~~~~~~~~~~~~~~~~~~~~
RMSE: 259388.403
MAE: 202647.688
MALR: 0.125
"""

[meme: “it’s working”]
Machine learn: using LSTM

Instead of the simple Random Forest model, we can also build an LSTM model on the same prepared dataset to attempt to increase model performance.

See the coming Lab for more details.

[diagram: LSTM cell, as shown earlier]
Lab
Use LSTM framework to
set up a simple Buy/Sell
trading model
Lab Objectives


Screencast
