Chap 3.1 Embedding in TensorFlow
Dr. Sanjay Chatterji
CS 831
Motivating the Autoencoder Architecture
• We have discussed how each layer learned progressively more relevant
representations of the input.
• The final output is a lower-dimensional representation of the input image.
• Problems with these approaches
• a significant amount of information that may be important for other tasks is lost
• Autoencoder: encoder-decoder
• We first take the input and compress it into a low-dimensional vector.
• Then invert the computation and reconstruct the original input.
• Surprising effectiveness
Implementing an Autoencoder in TensorFlow
• “Reducing the dimensionality of data with neural networks” by Hinton and
Salakhutdinov in 2006
• nonlinear complexities afforded by a neural model would allow them to capture
structure that linear methods, such as PCA, would miss.
• To demonstrate this point, they ran an experiment on MNIST using both an
autoencoder and PCA to reduce the dataset into two-dimensional data points.
• we will recreate their experimental setup to validate this hypothesis and further
explore the architecture and properties of feed-forward autoencoders.
• Because we are essentially applying an inverse operation, we architect the decoder
network so that the autoencoder has the shape of an hourglass (a code sketch of this shape follows below).
• The output of the decoder network is a 784-dimensional vector that can be
reconstructed into a 28 × 28 image.
The experimental setup for dimensionality reduction of the MNIST dataset
employed by Hinton and Salakhutdinov, 2006
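To make the hourglass concrete, the encoder and decoder referenced in the training script can be sketched roughly as follows. This is a minimal sketch, not the exact implementation: the layer helper is a simplified stand-in (the full version would also apply batch normalization controlled by phase_train), and the 1000/500/250 hidden sizes are taken from the Hinton and Salakhutdinov setup.

import tensorflow as tf

def layer(inputs, n_out, phase_train, scope):
    # One fully connected layer with a sigmoid nonlinearity (simplified;
    # phase_train would be used by batch normalization in the full version).
    n_in = inputs.get_shape()[1].value
    with tf.variable_scope(scope):
        W = tf.get_variable("W", [n_in, n_out],
                            initializer=tf.random_normal_initializer(stddev=0.01))
        b = tf.get_variable("b", [n_out], initializer=tf.constant_initializer(0.0))
        return tf.nn.sigmoid(tf.matmul(inputs, W) + b)

def encoder(x, n_code, phase_train):
    # Compress 784 -> 1000 -> 500 -> 250 -> n_code (n_code = 2 in this experiment).
    with tf.variable_scope("encoder"):
        h1 = layer(x, 1000, phase_train, "h1")
        h2 = layer(h1, 500, phase_train, "h2")
        h3 = layer(h2, 250, phase_train, "h3")
        return layer(h3, n_code, phase_train, "code")

def decoder(code, n_code, phase_train):
    # Invert the encoder: n_code -> 250 -> 500 -> 1000 -> 784.
    with tf.variable_scope("decoder"):
        h1 = layer(code, 250, phase_train, "h1")
        h2 = layer(h1, 500, phase_train, "h2")
        h3 = layer(h2, 1000, phase_train, "h3")
        return layer(h3, 784, phase_train, "output")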
Implementation details
• We also use summary writers to capture image summaries after each epoch.
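The training script below relies on loss and evaluate helpers defined elsewhere in the accompanying code. A plausible minimal sketch of those two helpers, using the same legacy summary API, is shown first; the L2 reconstruction cost and the 28 × 28 image summaries follow the description in these slides, while the exact summary names and max_images value are assumptions.

def loss(output, x):
    # L2 reconstruction error between the decoder output and the original input.
    with tf.variable_scope("training"):
        l2 = tf.sqrt(tf.reduce_sum(tf.square(output - x), 1))
        train_loss = tf.reduce_mean(l2)
        train_summary_op = tf.scalar_summary("train_cost", train_loss)
        return train_loss, train_summary_op

def evaluate(output, x):
    # Validation loss plus image summaries of the inputs and their reconstructions.
    with tf.variable_scope("validation"):
        in_im_op = tf.image_summary("input_image",
                                    tf.reshape(x, [-1, 28, 28, 1]), max_images=5)
        out_im_op = tf.image_summary("output_image",
                                     tf.reshape(output, [-1, 28, 28, 1]), max_images=5)
        l2 = tf.sqrt(tf.reduce_sum(tf.square(output - x), 1))
        val_loss = tf.reduce_mean(l2)
        val_summary_op = tf.scalar_summary("validation_cost", val_loss)
        return val_loss, in_im_op, out_im_op, val_summary_op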
# Original example code, using the legacy (pre-1.0) TensorFlow API and Python 2
# print statements. The encoder, decoder, loss, training, and evaluate helpers,
# as well as training_epochs, batch_size, and display_step, are defined elsewhere.
import argparse
import tensorflow as tf
import input_data

parser = argparse.ArgumentParser(description='Test various optimization strategies')
parser.add_argument('n_code', nargs=1, type=str)
args = parser.parse_args()
n_code = args.n_code[0]

mnist = input_data.read_data_sets("data/", one_hot=True)

with tf.Graph().as_default():
    with tf.variable_scope("autoencoder_model"):
        x = tf.placeholder("float", [None, 784])  # mnist data image of shape 28*28=784
        phase_train = tf.placeholder(tf.bool)

        code = encoder(x, int(n_code), phase_train)
        output = decoder(code, int(n_code), phase_train)
        cost, train_summary_op = loss(output, x)

        global_step = tf.Variable(0, name='global_step', trainable=False)
        train_op = training(cost, global_step)
        eval_op, in_im_op, out_im_op, val_summary_op = evaluate(output, x)
        summary_op = tf.merge_all_summaries()
        saver = tf.train.Saver(max_to_keep=200)
        sess = tf.Session()

        train_writer = tf.train.SummaryWriter("mnist_autoencoder_hidden=" + n_code + "_logs/",
                                              graph=sess.graph)
        val_writer = tf.train.SummaryWriter("mnist_autoencoder_hidden=" + n_code + "_logs/",
                                            graph=sess.graph)

        init_op = tf.initialize_all_variables()
        sess.run(init_op)

        # Training cycle
        for epoch in range(training_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                mbatch_x, mbatch_y = mnist.train.next_batch(batch_size)
                # Fit training using batch data
                _, new_cost, train_summary = sess.run(
                    [train_op, cost, train_summary_op],
                    feed_dict={x: mbatch_x, phase_train: True})
                train_writer.add_summary(train_summary, sess.run(global_step))
                # Compute average loss
                avg_cost += new_cost/total_batch

            # Display logs per epoch step
            if epoch % display_step == 0:
                print "Epoch:", '%04d' % (epoch+1), "cost =", "{:.9f}".format(avg_cost)
                train_writer.add_summary(train_summary, sess.run(global_step))

                val_images = mnist.validation.images
                validation_loss, in_im, out_im, val_summary = sess.run(
                    [eval_op, in_im_op, out_im_op, val_summary_op],
                    feed_dict={x: val_images, phase_train: False})
                val_writer.add_summary(in_im, sess.run(global_step))
                val_writer.add_summary(out_im, sess.run(global_step))
                val_writer.add_summary(val_summary, sess.run(global_step))
                print "Validation Loss:", validation_loss

                saver.save(sess, "mnist_autoencoder_hidden=" + n_code + "_logs/model-checkpoint-"
                           + '%04d' % (epoch+1), global_step=global_step)

        print "Optimization Finished!"

        test_loss = sess.run(eval_op, feed_dict={x: mnist.test.images, phase_train: False})
        print "Test Loss:", test_loss
TensorBoard
• We can visualize using TensorBoard
• TensorFlow graph
• training cost
• validation costs
• image summaries
• Figure: TensorBoard visualizations of the costs over the training and validation set.
• Figure: the image of the 1 on the left is compared to all of the other digits in the MNIST dataset by average L2 cost.
• $ tensorboard --logdir ~/path/to/mnist_autoencoder_hidden=2_logs
• Then navigate your browser to http://localhost:6006/.
• We can easily click through the components and delve deeper, tracing
• how data flows up through the various layers of the encoder and through the decoder
• how the optimizer reads the output of our training module
• how gradients in turn affect all of the components of the model
• We also visualize both the training (after each minibatch) and validation costs (after
each epoch), closely monitoring the curves for potential overfitting.
• The TensorBoard visualizations of the costs over the span of training are shown in the figure.
2D codes produced by PCA and autoencoders
• We produce two-dimensional PCA codes on the MNIST dataset:

from sklearn import decomposition
import input_data

mnist = input_data.read_data_sets("data/", one_hot=False)
pca = decomposition.PCA(n_components=2)
pca.fit(mnist.train.images)
pca_codes = pca.transform(mnist.test.images)
• A one-hot vector is of size 10, with the ith component set to one to represent digit i
and the rest of the components set to zero; here we pass one_hot=False so that labels come back as plain digit classes.
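A minimal sketch of how these two-dimensional codes might be plotted, coloring each point by its digit label; the scatter helper and the matplotlib calls are illustrative additions, not part of the original code.

from matplotlib import pyplot as plt
import numpy as np

def scatter(codes, labels):
    # Plot 2D codes, one color per digit class (0-9).
    colors = plt.cm.tab10(np.linspace(0, 1, 10))
    for digit in range(10):
        points = codes[labels == digit]
        plt.scatter(points[:, 0], points[:, 1], s=2,
                    color=colors[digit], label=str(digit))
    plt.legend(markerscale=4)
    plt.show()

scatter(pca_codes, mnist.test.labels)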
• PCA: It has trouble distinguishing 5’s from 3’s and 8’s, 0’s from 8’s, and 4’s from 9’s.
• Repeating the same experiment with 30-dimensional codes significantly improves the PCA reconstructions.
• A simple machine learning model more effectively classifies data points consisting
of autoencoder embeddings as compared to PCA embeddings.
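One way to check this claim is to fit the same simple classifier on both sets of 2D codes and compare held-out accuracy. The sketch below is illustrative only: it assumes ae_codes has already been produced by feeding mnist.test.images through the trained encoder (for example, by restoring a saved checkpoint and evaluating the code tensor), and it uses scikit-learn's logistic regression as the "simple machine learning model".

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def code_accuracy(codes, labels):
    # Train a linear classifier on the 2D codes and report held-out accuracy.
    train_x, test_x, train_y, test_y = train_test_split(
        codes, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_x, train_y)
    return clf.score(test_x, test_y)

# pca_codes from above; ae_codes (assumed) holds the 2D codes from the trained autoencoder.
print("PCA codes:        ", code_accuracy(pca_codes, mnist.test.labels))
print("Autoencoder codes:", code_accuracy(ae_codes, mnist.test.labels))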
Reconstructions for three randomly chosen samples from the test set
• It thematically represents a strategy (finding embeddings using context) that generalizes to many deep learning models.
• Using Word2Vec embeddings instead of one-hot vectors to represent words will yield far superior results.
Implementing the Skip-Gram Architecture
Thank You