Chap 3.1 Embedding in TensorFlow

Implementing an Autoencoder

in TensorFlow
Dr. Sanjay Chatterji
CS 831
Motivating the Autoencoder Architecture
• We have discussed how each layer learned progressively more relevant
representations of the input.
• The final output is a lower-dimensional representation of the input image.
• Problems with these approaches
• a significant amount of information is lost, and that information may be important for other tasks
• Autoencoder: encoder-decoder
• We first take the input and compress it into a low-dimensional vector.
• Then we invert the computation and reconstruct the original input.
• Surprising effectiveness
Implementing an Autoencoder in TensorFlow
• “Reducing the dimensionality of data with neural networks” by Hinton and
Salakhutdinov in 2006
• They hypothesized that the nonlinear complexity afforded by a neural model would allow it to
capture structure that linear methods, such as PCA, would miss.
• To demonstrate this point, they ran an experiment on MNIST using both an
autoencoder and PCA to reduce the dataset into two-dimensional data points.
• we will recreate their experimental setup to validate this hypothesis and further
explore the architecture and properties of feed-forward autoencoders.
• Because we are essentially applying an inverse operation, we architect the decoder
network so that the autoencoder has the shape of an hourglass.
• The output of the decoder network is a 784-dimensional vector that can be
reconstructed into a 28 × 28 image.
The experimental setup for dimensionality reduction of the MNIST dataset
employed by Hinton and Salakhutdinov, 2006
Implementation details

• The two-dimensional embedding is now treated as the input, and the
network attempts to reconstruct the original image.
• In order to accelerate training, we’ll reuse the batch normalization
strategy.
• We’ll use sigmoidal neurons instead of our usual ReLU neurons.
Implementing Decoder and Layer
def decoder(code, n_code, phase_train):
    with tf.variable_scope("decoder"):
        with tf.variable_scope("hidden_1"):
            hidden_1 = layer(code, [n_code, n_decoder_hidden_1], [n_decoder_hidden_1], phase_train)
        with tf.variable_scope("hidden_2"):
            hidden_2 = layer(hidden_1, [n_decoder_hidden_1, n_decoder_hidden_2], [n_decoder_hidden_2], phase_train)
        with tf.variable_scope("hidden_3"):
            hidden_3 = layer(hidden_2, [n_decoder_hidden_2, n_decoder_hidden_3], [n_decoder_hidden_3], phase_train)
        with tf.variable_scope("output"):
            output = layer(hidden_3, [n_decoder_hidden_3, 784], [784], phase_train)
    return output

def layer(input, weight_shape, bias_shape, phase_train):
    weight_init = tf.random_normal_initializer(stddev=(1.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    logits = tf.matmul(input, W) + b
    # sigmoidal activation applied after the batch normalization strategy reused from earlier
    return tf.nn.sigmoid(layer_batch_norm(logits, weight_shape[1], phase_train))
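
For reference, the matching encoder is the mirror image of this decoder. A minimal sketch, assuming hidden-layer sizes n_encoder_hidden_1/2/3 defined analogously to the decoder's and reusing the layer helper above:

def encoder(x, n_code, phase_train):
    # compresses the 784-dimensional input down to an n_code-dimensional code
    with tf.variable_scope("encoder"):
        with tf.variable_scope("hidden_1"):
            hidden_1 = layer(x, [784, n_encoder_hidden_1], [n_encoder_hidden_1], phase_train)
        with tf.variable_scope("hidden_2"):
            hidden_2 = layer(hidden_1, [n_encoder_hidden_1, n_encoder_hidden_2], [n_encoder_hidden_2], phase_train)
        with tf.variable_scope("hidden_3"):
            hidden_3 = layer(hidden_2, [n_encoder_hidden_2, n_encoder_hidden_3], [n_encoder_hidden_3], phase_train)
        with tf.variable_scope("code"):
            code = layer(hidden_3, [n_encoder_hidden_3, n_code], [n_code], phase_train)
    return code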
Implementation details
• Then, construct a measure (an objective, or loss, function) to see how
well our model functions.
• We measure this by computing the distance between the original 784-dimensional
input and the reconstructed 784-dimensional output.
• L2 norm of the difference between the two vectors: ∥I − O∥₂ = √(Σᵢ (Iᵢ − Oᵢ)²)
• We average this function over the whole minibatch to generate our final
objective function.
• Finally, we’ll train the network using the Adam optimizer, logging a scalar
summary of the error incurred at every minibatch using
tf.scalar_summary.
Implementing Loss and Training
def loss(output, x):
    with tf.variable_scope("training"):
        l2 = tf.sqrt(tf.reduce_sum(tf.square(tf.sub(output, x)), 1))
        train_loss = tf.reduce_mean(l2)
        train_summary_op = tf.scalar_summary("train_cost", train_loss)
        return train_loss, train_summary_op

def training(cost, global_step):
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999,
                                       epsilon=1e-08, use_locking=False, name='Adam')
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op
Implementing an Autoencoder in TensorFlow

• Evaluate the generalizability of the model using the validation dataset.
• Compute the same L2 norm measurement for model evaluation.
• In addition, we’ll collect image summaries so that we can compare both the
input images and the reconstructions.
def image_summary(summary_label, tensor):
    tensor_reshaped = tf.reshape(tensor, [-1, 28, 28, 1])
    return tf.image_summary(summary_label, tensor_reshaped)

def evaluate(output, x):
    with tf.variable_scope("validation"):
        in_im_op = image_summary("input_image", x)
        out_im_op = image_summary("output_image", output)
        l2 = tf.sqrt(tf.reduce_sum(tf.square(tf.sub(output, x, name="val_diff")), 1))
        val_loss = tf.reduce_mean(l2)
        val_summary_op = tf.scalar_summary("val_cost", val_loss)
        return val_loss, in_im_op, out_im_op, val_summary_op
Model building
• Finally, build the model out of these subcomponents and train the model.
• We accept a command-line parameter that determines the number of
neurons in our code layer: $ python autoencoder_mnist.py 2
• We also reconfigure the model saver to maintain more snapshots of our
model.
• We’ll be reloading our most effective model later to compare its
performance to PCA, so we’d like to have access to many snapshots.

• We use summary writers to also capture the image summaries after each
epoch.
parser = argparse.ArgumentParser(description='Test various optimization strategies')
parser.add_argument('n_code', nargs=1, type=str)
args = parser.parse_args()
n_code = args.n_code[0]

mnist = input_data.read_data_sets("data/", one_hot=True)

with tf.Graph().as_default():
    with tf.variable_scope("autoencoder_model"):
        x = tf.placeholder("float", [None, 784])  # mnist data image of shape 28*28=784
        phase_train = tf.placeholder(tf.bool)

        code = encoder(x, int(n_code), phase_train)
        output = decoder(code, int(n_code), phase_train)

        cost, train_summary_op = loss(output, x)
        global_step = tf.Variable(0, name='global_step', trainable=False)
        train_op = training(cost, global_step)
        eval_op, in_im_op, out_im_op, val_summary_op = evaluate(output, x)
        summary_op = tf.merge_all_summaries()

        saver = tf.train.Saver(max_to_keep=200)
        sess = tf.Session()

        train_writer = tf.train.SummaryWriter("mnist_autoencoder_hidden=" + n_code + "_logs/", graph=sess.graph)
        val_writer = tf.train.SummaryWriter("mnist_autoencoder_hidden=" + n_code + "_logs/", graph=sess.graph)

        init_op = tf.initialize_all_variables()
        sess.run(init_op)

        # Training cycle (training_epochs, batch_size, display_step are defined elsewhere in autoencoder_mnist.py)
        for epoch in range(training_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                mbatch_x, mbatch_y = mnist.train.next_batch(batch_size)
                # Fit training using batch data
                _, new_cost, train_summary = sess.run(
                    [train_op, cost, train_summary_op],
                    feed_dict={x: mbatch_x, phase_train: True})
                train_writer.add_summary(train_summary, sess.run(global_step))
                # Compute average loss
                avg_cost += new_cost/total_batch

            # Display logs per epoch step
            if epoch % display_step == 0:
                print "Epoch:", '%04d' % (epoch+1), "cost =", "{:.9f}".format(avg_cost)
                train_writer.add_summary(train_summary, sess.run(global_step))

                val_images = mnist.validation.images
                validation_loss, in_im, out_im, val_summary = sess.run(
                    [eval_op, in_im_op, out_im_op, val_summary_op],
                    feed_dict={x: val_images, phase_train: False})
                val_writer.add_summary(in_im, sess.run(global_step))
                val_writer.add_summary(out_im, sess.run(global_step))
                val_writer.add_summary(val_summary, sess.run(global_step))
                print "Validation Loss:", validation_loss

                saver.save(sess, "mnist_autoencoder_hidden=" + n_code + "_logs/model-checkpoint-" + '%04d' % (epoch+1), global_step=global_step)

        print "Optimization Finished!"

        test_loss = sess.run(eval_op, feed_dict={x: mnist.test.images, phase_train: False})
        print "Test Loss:", test_loss
TensorBoard
• We can visualize the following using TensorBoard:
• TensorFlow graph
• training cost
• validation costs
• image summaries
• TensorBoard visualizations of the costs over the training and validation set.
• In the figure, the image of the 1 on the left is compared to all of the other digits in the
MNIST dataset using the average L2 cost.
• $ tensorboard --logdir ~/path/to/mnist_autoencoder_hidden=2_logs
• Then navigate your browser to http://localhost:6006/.
• We can easily click through the components and delve deeper, tracing
• how data flows up through the various layers of the encoder and through the decoder
• how the optimizer reads the output of our training module
• how gradients in turn affect all of the components of the model
• We also visualize both the training (after each minibatch) and validation costs (after
each epoch), closely monitoring the curves for potential overfitting.
• The TensorBoard visualizations of the costs over the span of training are shown in the
figure.
2D codes produced by PCA and autoencoders
• We produce two-dimensional PCA codes on the MNIST dataset.
from sklearn import decomposition
import input_data

mnist = input_data.read_data_sets("data/", one_hot=False)
pca = decomposition.PCA(n_components=2)
pca.fit(mnist.train.images)
pca_codes = pca.transform(mnist.test.images)
• A one-hot vector is of size 10, with the ith component set to one to represent digit i
and the rest of the components set to zero.
• PCA: It has trouble distinguishing 5’s from 3’s and 8’s, 0’s from 8’s, and 4’s from 9’s.
• Repeating the same experiment with 30-dimensional codes provides significant
improvement to the PCA reconstructions.
• A simple machine learning model more effectively classifies data points consisting
of autoencoder embeddings as compared to PCA embeddings.
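
As an illustrative sketch of this comparison (not part of the original experiment), one could fit a simple classifier to each kind of two-dimensional code. It assumes pca_codes from the snippet above, ae_codes produced by the trained autoencoder as in the script that follows, and integer labels from mnist.test.labels (loaded with one_hot=False):

from sklearn.linear_model import LogisticRegression

def embedding_accuracy(codes, labels):
    # train on the first 9,000 test-set codes, evaluate on the remaining 1,000
    clf = LogisticRegression()
    clf.fit(codes[:9000], labels[:9000])
    return clf.score(codes[9000:], labels[9000:])

print "PCA code accuracy:", embedding_accuracy(pca_codes, mnist.test.labels)
print "Autoencoder code accuracy:", embedding_accuracy(ae_codes, mnist.test.labels)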
Reconstructions for three randomly chosen samples from the test set

Two-dimensional embeddings produced by PCA (left) and by an
autoencoder (right), clustering codes of different digit classes
import tensorflow as tf
import autoencoder_mnist as ae
import argparse
import matplotlib.pyplot as plt  # needed by scatter() below

def scatter(codes, labels):
    colors = [('#27ae60', 'o'), ('#2980b9', 'o'),
              ('#8e44ad', 'o'), ('#f39c12', 'o'),
              ('#c0392b', 'o'), ('#27ae60', 'x'),
              ('#2980b9', 'x'), ('#8e44ad', 'x'),
              ('#c0392b', 'x'), ('#f39c12', 'x')]
    for num in xrange(10):
        plt.scatter([codes[:, 0][i] for i in xrange(len(labels)) if labels[i] == num],
                    [codes[:, 1][i] for i in xrange(len(labels)) if labels[i] == num],
                    7, label=str(num), color=colors[num][0], marker=colors[num][1])
    plt.legend()
    plt.show()

with tf.Graph().as_default():
    with tf.variable_scope("autoencoder_model"):
        x = tf.placeholder("float", [None, 784])
        phase_train = tf.placeholder(tf.bool)

        code = ae.encoder(x, 2, phase_train)
        output = ae.decoder(code, 2, phase_train)
        cost, train_summary_op = ae.loss(output, x)
        global_step = tf.Variable(0, name='global_step', trainable=False)
        train_op = ae.training(cost, global_step)
        eval_op, in_im_op, out_im_op, val_summary_op = ae.evaluate(output, x)

        saver = tf.train.Saver()
        sess = tf.Session()
        # args.savepath comes from an argparse argument (parser setup not shown);
        # mnist and pca_codes come from the PCA snippet above
        saver.restore(sess, args.savepath[0])

        ae_codes = sess.run(code, feed_dict={x: mnist.test.images, phase_train: True})

        scatter(ae_codes, mnist.test.labels)
        scatter(pca_codes, mnist.test.labels)
• The code snippet corrupts the input when the corrupt placeholder is set to 1.
• It refrains from corrupting the input when the corrupt placeholder is set to 0.
• After this modification, we rerun the autoencoder and inspect the resulting reconstructions.
• The denoising autoencoder faithfully replicates the human ability
to fill in missing pixels.
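
The corruption snippet itself is not reproduced here; a minimal sketch of what it might look like, assuming a scalar corrupt placeholder and a random mask that zeroes out roughly half of the pixels:

corrupt = tf.placeholder(tf.float32)        # 1 = corrupt the input, 0 = leave it intact
x = tf.placeholder("float", [None, 784])

# random binary mask over the pixels (values in {0, 1})
corrupting_mask = tf.cast(
    tf.random_uniform(shape=tf.shape(x), minval=0, maxval=2, dtype=tf.int32),
    tf.float32)

# corrupted input when corrupt == 1, untouched input when corrupt == 0
c_x = (x * corrupting_mask) * corrupt + x * (1 - corrupt)
# c_x is then fed to the encoder in place of x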
Word2Vec Framework
• A framework for generating word embeddings by Mikolov et al.
• Two strategies for generating embeddings
• Continuous Bag of Words (CBOW) model
• the encoder creates an embedding from the context and predicts the target word
• useful for smaller datasets
• Skip-Gram model
• takes the target word as input and attempts to predict one of the words in the context
• “the boy went to the bank.” => (context, target) pairs => (input, output) pairs => replace
each word with its unique index i ∈ {0, 1, . . . , |V|−1} in the vocabulary (see the sketch below)
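
A minimal sketch of this pair-generation step (a hypothetical helper, not the framework's own code), using a symmetric context window around each target word:

def skipgram_pairs(token_ids, window=2):
    # returns (input word, context word) index pairs for the Skip-Gram model
    pairs = []
    for pos, target in enumerate(token_ids):
        for offset in range(-window, window + 1):
            ctx = pos + offset
            if offset != 0 and 0 <= ctx < len(token_ids):
                pairs.append((target, token_ids[ctx]))
    return pairs

# e.g. "the boy went to the bank" -> [0, 1, 2, 3, 0, 4] via the vocabulary;
# skipgram_pairs([0, 1, 2, 3, 0, 4]) yields pairs such as (2, 0), (2, 1), (2, 3), ...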
• There is a lookup table with V rows, where the ith row is the
embedding corresponding to the ith vocabulary word.
• The operation can be represented as a product of the transpose of
the lookup table and the one-hot vector representing the input word (see the sketch below).
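
A small sketch of this equivalence (V and d are example values, not from the text), written in the batch-of-row-vectors convention: looking up row i of the table gives the same result as multiplying the table's transpose by the one-hot vector for word i.

V, d = 10000, 128                                    # example vocabulary and embedding sizes
lookup_table = tf.Variable(tf.random_uniform([V, d], -1.0, 1.0))

word_ids = tf.placeholder(tf.int32, shape=[None])    # indices of the input words
one_hot = tf.one_hot(word_ids, depth=V)              # shape [batch, V]

# [batch, V] x [V, d] -> [batch, d]; per word, this is lookup_table^T times the one-hot vector
emb_via_matmul = tf.matmul(one_hot, lookup_table)
# same rows, retrieved directly and far more cheaply
emb_via_lookup = tf.nn.embedding_lookup(lookup_table, word_ids)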
Word2Vec is not a deep machine learning model

• It thematically represents a strategy (finding embeddings using context) that
generalizes to many deep learning models.
• Using Word2Vec embeddings instead of one-hot vectors to represent words will
yield far superior results.
Implementing the Skip-Gram Architecture
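
A sketch of one common way to wire up the Skip-Gram architecture in TensorFlow, using an embedding lookup and a sampled (NCE) loss; the sizes V, d, and num_sampled are illustrative, and this is not necessarily the exact implementation used in the text.

V, d, num_sampled = 10000, 128, 64

train_inputs = tf.placeholder(tf.int32, shape=[None])      # target word indices
train_labels = tf.placeholder(tf.int32, shape=[None, 1])   # context word indices

# the lookup table of word embeddings
embeddings = tf.Variable(tf.random_uniform([V, d], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# weights and biases for the sampled softmax over the vocabulary
nce_weights = tf.Variable(tf.truncated_normal([V, d], stddev=1.0 / d ** 0.5))
nce_biases = tf.Variable(tf.zeros([V]))

loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                   labels=train_labels, inputs=embed,
                   num_sampled=num_sampled, num_classes=V))

train_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)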
Thank You
