Experiment 4
Aim:
To implement autoencoders for a given problem.
Objective:
To implement a basic autoencoder for the MNIST dataset.
To implement various types of autoencoders (denoising, variational, sparse) and compare them.
To understand the process of encoding, decoding, and dimensionality reduction using autoencoders.
Theory:
Autoencoders are a class of artificial neural networks designed to learn efficient
representations of data, typically for the purpose of dimensionality reduction or feature
extraction. They operate by compressing the input into a lower-dimensional latent space
and then reconstructing the original input from this compressed representation. This process
involves two main components: the encoder, which maps the input to the latent space, and the
decoder, which reconstructs the input from that latent code. Autoencoders are
particularly useful in tasks where data needs to be simplified while retaining essential
characteristics, such as image processing, denoising, and anomaly detection.
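As a reference formulation (a standard one, not taken from the lab notebook), an autoencoder with encoder f and decoder g is trained to minimise a reconstruction loss over the dataset:

\[
z = f_\theta(x), \qquad \hat{x} = g_\phi(z), \qquad
\min_{\theta,\phi} \; \frac{1}{N} \sum_{i=1}^{N} \big\lVert x_i - g_\phi\big(f_\theta(x_i)\big) \big\rVert^{2}
\]

Because z has fewer dimensions than x, the network is forced to keep only the information needed to reconstruct the input.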
There are several types of autoencoders, each designed for specific applications and
challenges. Denoising autoencoders introduce noise to the input data during training and
learn to reconstruct the original, clean data, making them effective for noise reduction.
Variational autoencoders (VAEs) take a probabilistic approach by encoding the input data
into a distribution rather than a fixed point, enabling the generation of new, similar data
samples from the latent space. Sparse autoencoders, on the other hand, incorporate
sparsity constraints to promote the learning of more efficient and interpretable features,
leading to a representation that emphasizes the most important aspects of the input data.
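For reference (standard textbook formulations, not reproduced from the lab notebook), the three variants mainly differ in the objective they minimise:

\[
\begin{aligned}
\text{Denoising:}\quad & \min \; \mathbb{E}\,\big\lVert x - g\big(f(\tilde{x})\big)\big\rVert^{2}, \qquad \tilde{x} = x + \text{noise} \\
\text{Variational:}\quad & \max \; \mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big] \;-\; \mathrm{KL}\big(q(z\mid x)\,\Vert\, p(z)\big) \\
\text{Sparse:}\quad & \min \; \big\lVert x - g\big(f(x)\big)\big\rVert^{2} \;+\; \lambda \sum_{j} \lvert h_j \rvert
\end{aligned}
\]

where h_j are the bottleneck activations and the coefficient lambda controls the strength of the sparsity penalty.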
[Figure: An Autoencoder at a Glance]
The versatility of autoencoders has led to their application across various fields, including
image and speech recognition, recommendation systems, and generative modeling. They
can be used for unsupervised learning tasks, where labeled data is scarce, allowing models
to leverage vast amounts of unlabelled data. By capturing the underlying structure of the
input data, autoencoders provide a powerful tool for both data compression and feature
extraction, facilitating subsequent tasks like classification or clustering.
Implementation:
The following is a step-by-step implementation of the autoencoders carried out in the
lab (Link to Notebook ---> AutoEncoders):
Building a basic autoencoder model to encode and then decode the MNIST images
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_dim = x_train.shape[1]  # 784 for the flattened 28x28 MNIST images
encoding_dim = 32             # dimensionality of the latent representation
input_layer = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(input_layer)
decoded = Dense(input_dim, activation='sigmoid')(encoded)
autoencoder = Model(input_layer, decoded)
encoder = Model(input_layer, encoded)  # standalone encoder, used later to inspect the latent space
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # typical choices for MNIST autoencoders
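The notebook trains this model before the reconstruction step below; a minimal sketch of that call, assuming the 50-epoch, batch-size-256 schedule reported in the inferences:

# Train the basic autoencoder to reproduce its own input (assumed schedule: 50 epochs, batch size 256)
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))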
Using the trained basic autoencoder to reconstruct the MNIST data instances
reconstructed = autoencoder.predict(x_test)
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(reconstructed[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Viewing the latent space of the encoded data as produced by the above autoencoder
from sklearn.manifold import TSNE

encoded_data = encoder.predict(x_test)
tsne_results = TSNE(n_components=2).fit_transform(encoded_data)  # project the 32-D codes to 2-D
plt.figure(figsize=(10, 10))
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], alpha=0.5)
plt.title("t-SNE visualization of encoded data")
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.show()
Building a denoising autoencoder and testing it on the noisy MNIST data instances
def add_noise(data, noise_factor=0.5):
    noisy_data = data + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=data.shape)
    noisy_data = np.clip(noisy_data, 0.0, 1.0)
    return noisy_data

x_train_noisy = add_noise(x_train)
x_test_noisy = add_noise(x_test)
input_layer = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(input_layer)
decoded = Dense(input_dim, activation='sigmoid')(encoded)
denoising_autoencoder = Model(input_layer, decoded)
denoising_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # assumed: same setup as the basic model

denoising_autoencoder.fit(x_train_noisy, x_train,
                          epochs=50,
                          batch_size=256,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test))
reconstructed_noisy = denoising_autoencoder.predict(x_test_noisy)
plt.figure(figsize=(20, 6))
for i in range(n):
    plt.subplot(3, n, i + 1)                 # row 1: original image
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(3, n, i + 1 + n)             # row 2: noisy input
    plt.imshow(x_test_noisy[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
    plt.subplot(3, n, i + 1 + 2*n)           # row 3: denoised reconstruction
    plt.imshow(reconstructed_noisy[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
for i in range(n):
    ax = plt.subplot(2, n, i + 1 + n)  # bottom row of the 2 x n grid: decoded images
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
plt.figure(figsize=(10, 8))
plt.scatter(encoded_imgs_2d[:, 0], encoded_imgs_2d[:, 1], c=y_test, cmap='viridis', alpha=0.5)  # colour by digit label
plt.colorbar(label='Digit class')
plt.title('t-SNE visualization of latent space')
plt.show()
Building a variational autoencoder (VAE) with a custom sampling layer and loss trackers
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]
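The notebook also defines a custom training step inside the VAE class (not reproduced above); a minimal sketch of such a method, assuming the standard Keras VAE pattern with a binary cross-entropy reconstruction term and a KL penalty, is given below.

    def train_step(self, data):
        # Sketch of a custom training step for the VAE class above (standard Keras VAE pattern)
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            # Mean binary cross-entropy, scaled back to a per-image sum over the 784 pixels
            reconstruction_loss = tf.reduce_mean(
                keras.losses.binary_crossentropy(data, reconstruction) * 784
            )
            # KL divergence between the approximate posterior and a standard normal prior
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {m.name: m.result() for m in self.metrics}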
latent_dim = 2  # assumed latent dimensionality; 2 keeps the latent space easy to visualise

encoder_inputs = keras.Input(shape=(784,))
x = layers.Dense(256, activation="relu")(encoder_inputs)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(256, activation="relu")(latent_inputs)
decoder_outputs = layers.Dense(784, activation="sigmoid")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
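Before the reconstructions below, the VAE has to be assembled and trained; a minimal sketch, assuming the Adam optimizer and the same 50-epoch, batch-size-256 schedule used for the other models:

# Assemble and train the VAE (assumed optimizer and schedule; the notebook's exact settings may differ)
vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
vae.fit(x_train, epochs=50, batch_size=256)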
encoded_imgs_vae = encoder.predict(x_test)[2]     # index 2 selects z, the sampled latent codes
decoded_imgs_vae = decoder.predict(encoded_imgs_vae)
plot_reconstructed(x_test, decoded_imgs_vae)      # plot_reconstructed is a helper defined in the notebook
New examples as constructed by the Variational Autoencoder
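The sparse autoencoder used below is built in the notebook; a minimal sketch of how it could be constructed, assuming an L1 activity regularizer on the bottleneck layer (the usual way of imposing sparsity in Keras):

from tensorflow.keras import regularizers

# Sparse autoencoder sketch: an L1 penalty on the bottleneck activations encourages sparse codes
sparse_input = Input(shape=(input_dim,))
sparse_code = Dense(encoding_dim, activation='relu',
                    activity_regularizer=regularizers.l1(1e-5))(sparse_input)
sparse_output = Dense(input_dim, activation='sigmoid')(sparse_code)

sparse_autoencoder = Model(sparse_input, sparse_output)
sparse_encoder = Model(sparse_input, sparse_code)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
sparse_autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                       shuffle=True, validation_data=(x_test, x_test))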
Testing the sparse autoencoder on the MNIST test set and visualising its latent space
decoded_imgs_sparse = sparse_autoencoder.predict(x_test)
plot_reconstructed(x_test, decoded_imgs_sparse)
visualize_latent_space(sparse_encoder, x_test)    # visualize_latent_space is a helper defined in the notebook
Inferences:
In the case of the basic autoencoder, training over 50 epochs led to a training loss of
about 0.0912 and a validation loss of about 0.0923; both losses followed an approximately
linear downward trend, with some instability in the training loss.
Comparing the actual MNIST instances with their reconstructions from the basic
autoencoder shows a good degree of similarity, although some edges appear dimmed.
Viewing the latent space of the encoded data (reduced to 2 components) shows reasonably
good separation of the subspace. However, several digit classes still lie in close proximity
to one another, indicating room for improvement.
For the variational autoencoder built above, the VAE loss followed a roughly linear
reduction pattern. Further, the newly constructed digit examples were found to be
legible.
The sparse autoencoder, unlike the first autoencoder model built, kept its training and
validation losses close together, with the two converging over roughly the last 18-20 of
the 50 training epochs.