Expt 5 Expt 6
DESCRIPTION:
The Continuous Bag-of-Words (CBOW) model is a neural network-based approach to
predict a target word based on its surrounding context words.
Figure 1. Neural network for vocabulary size 10 and embedding size 2 (specific for code
given below).
The main stages are as follows:
1. Vocabulary Definition
Defines the vocabulary with indices for each word.
Sets vocab_size = 10, the number of unique words.
2. Model Definition
Input Layer: Accepts context words as input.
Embedding Layer: Maps words to dense vectors of size embedding_dim.
Averaging Layer: Computes the average of embeddings for context words.
Output Layer: A dense layer with softmax activation predicts the target word
probabilities.
3. Model Compilation
Optimizer: Uses Adam for efficient gradient updates.
Loss Function: Categorical crossentropy, suitable for multi-class classification.
4. Training Data Preparation
Defines 10 training examples with context words and their corresponding target words.
Converts context word indices and target words into input and one-hot encoded target
labels.
5. Training the Model
Trains the model for 50 epochs using the prepared data.
6. Network Visualization
NetworkX Visualization: Illustrates the CBOW model architecture, showing:
Input nodes (context words)
Embedding layer
Average layer
Softmax layer
7. Testing the Model
Tests the model on unseen examples and evaluates its predictions.
Displays the context words, predicted target word, actual target word, and predicted
probabilities.
8. Weight Updates (in Training)
Embedding Layer Weights (W): Updated for the context words based on the
hidden layer error.
Output Layer Weights (W'): Updated for all words based on the output layer error.
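The weight updates in stage 8 can be illustrated with a small NumPy sketch of one CBOW training step. This is not the Keras code used in the procedure: it uses plain gradient descent rather than Adam, and the context indices, target index, and learning rate are hypothetical values chosen only for illustration. The sizes follow Figure 1 (vocabulary 10, embedding 2).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 10, 2   # sizes taken from Figure 1

W = rng.normal(scale=0.1, size=(vocab_size, embedding_dim))      # embedding weights W
W_out = rng.normal(scale=0.1, size=(embedding_dim, vocab_size))  # output weights W'

context = [1, 2, 5]   # hypothetical context word indices
target = 4            # hypothetical target word index
lr = 0.1              # assumed learning rate

# Forward pass: average the context embeddings, then softmax over the vocabulary.
h = W[context].mean(axis=0)
scores = h @ W_out
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# Output-layer error: predicted probabilities minus the one-hot target.
err = probs.copy()
err[target] -= 1.0

# Gradients of the cross-entropy loss.
grad_W_out = np.outer(h, err)   # has an entry for every word (column) of W'
grad_h = W_out @ err            # error propagated back to the hidden layer

W_before = W.copy()
W_out -= lr * grad_W_out
# Each context word receives an equal share of the hidden-layer gradient.
for idx in context:
    W[idx] -= lr * grad_h / len(context)

changed_rows = np.flatnonzero(np.any(W != W_before, axis=1))
print("Rows of W that changed:", changed_rows.tolist())
```

Only the rows of W belonging to the context words change, while the gradient for W' covers every word in the vocabulary, which is the distinction stage 8 describes.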
PROCEDURE:
Here is how to implement Word2vec as a CBOW model in Google Colab.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dense, Average
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
# Vocabulary mapping
vocab = {0: "queen", 1: "man", 2: "woman", 3: "child", 4: "king",
         5: "prince", 6: "princess", 7: "throne", 8: "palace", 9: "royal"}
vocab_size = len(vocab) # Vocabulary size = 10
embedding_dim = 5 # Embedding size
context_size = 3 # Context window size
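# One-hot encode each target index (training_data holds (context, target) pairs)
y_train = np.array([tf.keras.utils.to_categorical(target, num_classes=vocab_size)
                    for _, target in training_data])
# Map the most probable output index back to its word
predicted_word_index = np.argmax(prediction)
predicted_word = vocab[predicted_word_index]
actual_word = vocab[actual_word_index]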
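The y_train line above assumes a training_data list of (context, target) pairs. A minimal sketch of how such pairs might be built from a tokenized sentence is given here. The helper make_cbow_pairs, the example sentence, and the window size are hypothetical and are not the training data of the original notebook. np.eye indexing is used as a NumPy stand-in for to_categorical.

```python
import numpy as np

def make_cbow_pairs(tokens, word_to_index, window=1):
    """Return (context_indices, target_index) pairs for a symmetric window."""
    pairs = []
    for i in range(window, len(tokens) - window):
        context = tokens[i - window:i] + tokens[i + 1:i + window + 1]
        pairs.append(([word_to_index[w] for w in context],
                      word_to_index[tokens[i]]))
    return pairs

# Same word/index assignment as the vocab dictionary, inverted for lookup.
word_to_index = {"queen": 0, "man": 1, "woman": 2, "child": 3, "king": 4,
                 "prince": 5, "princess": 6, "throne": 7, "palace": 8, "royal": 9}

# Hypothetical sentence, for illustration only.
tokens = ["king", "queen", "prince", "princess", "palace"]

training_data = make_cbow_pairs(tokens, word_to_index, window=1)

# Context indices become the model input; targets become one-hot rows.
x_train = np.array([context for context, _ in training_data])
y_train = np.eye(len(word_to_index))[[target for _, target in training_data]]

print(training_data)
print(x_train.shape, y_train.shape)
```

Words too close to the sentence boundary to have a full window are skipped here; padding them instead is an equally reasonable choice.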
Tasks:
1. Provide Colab link to the above code. [2 marks] [CO2] [BTL 4]
2. Construct any two sentences so that the vocabulary size is 10. For a context of 2 (forward
and backward of the target), prepare the training and testing data. Include the new
network architecture. Provide the Colab link to the new code. [3 marks] [CO2] [BTL 4]
EXPERIMENT 6: To Implement Gradient-Descent Strategies to Train Neural Networks
AIM:
To implement Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient
Descent for training neural networks.
DESCRIPTION:
There are three types of Gradient Descent based training techniques.
1. Batch Gradient Descent: Processes the entire dataset at once to compute gradients.
o Pros: Accurate gradient estimation.
o Cons: Slow for large datasets, high memory usage.
o Example: Dataset size = 1000, batch size = 1000 (all data).
2. Stochastic Gradient Descent (SGD): Updates model parameters using one data point
at a time.
o Pros: Faster updates, less memory.
o Cons: High variance, noisy convergence.
o Example: Dataset size = 1000, batch size = 1.
3. Mini-Batch Gradient Descent: Divides the dataset into smaller batches and
processes each batch separately.
o Pros: Balance between efficiency and stability.
o Cons: Requires batch size tuning.
o Example: Dataset size = 1000, batch size = 32.
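The three variants differ only in how many samples feed each parameter update. The sketch below makes that concrete on a synthetic linear-regression problem with 1000 samples, matching the examples above. The function train_linear, the synthetic data, the learning rate, and the epoch count are all assumptions made for illustration and are not part of the CIFAR-100 experiment.

```python
import numpy as np

def train_linear(x, y, batch_size, epochs=5, lr=0.05, seed=0):
    """Fit y = x @ w by gradient descent; return the weights and update count."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros(d)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = x[idx] @ w - y[idx]
            grad = 2.0 * x[idx].T @ residual / len(idx)   # gradient of the mean squared error
            w -= lr * grad
            updates += 1
    return w, updates

rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w   # noise-free synthetic targets

results = {}
for name, bs in [("batch", 1000), ("stochastic", 1), ("mini-batch", 32)]:
    results[name] = train_linear(x, y, batch_size=bs)
    w, updates = results[name]
    print(f"{name:>10}: batch_size={bs:4d}, updates={updates:5d}, w={np.round(w, 3)}")
```

With the same number of epochs, batch gradient descent performs one update per epoch, stochastic gradient descent performs 1000, and mini-batch with size 32 performs 32 (the last batch holds the remaining 8 samples).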
This experiment also introduces 5-fold cross-validation and histogram features for images.
A histogram of an image represents the distribution of pixel intensities. For a grayscale
image, the frequency of each intensity level (0 to 255 for 8-bit images) is plotted.
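As a small worked example of an intensity histogram, the sketch below counts the pixels at each level of a tiny 8-bit grayscale array. The 3x3 array is a hypothetical stand-in for a real image.

```python
import numpy as np

# Tiny hypothetical 8-bit grayscale "image" (values 0..255), for illustration only.
image = np.array([[0, 0, 255],
                  [128, 128, 128],
                  [0, 64, 255]], dtype=np.uint8)

# One bin per intensity level: counts[i] is the number of pixels with value i.
counts, _ = np.histogram(image, bins=256, range=(0, 256))

# Dividing by the pixel count turns frequencies into a distribution that sums to 1.
distribution = counts / image.size

for level in np.flatnonzero(counts):
    print(f"intensity {level:3d}: {counts[level]} pixel(s)")
```

Because every bin has width 1, this normalized distribution matches what np.histogram returns with density=True.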
Experiments are conducted on CIFAR-100, a benchmark dataset built into Keras. This
dataset contains 60,000 color images across 100 categories.
PROCEDURE:
We will implement and evaluate three gradient descent techniques (batch, stochastic, and
mini-batch) on the CIFAR-100 dataset using TensorFlow's Keras API. We will also use
5-fold cross-validation to assess model accuracy for all three methods and measure the
execution time of each approach in Google Colab.
Here's how we'll proceed:
1. Data Loading and Preprocessing:
Load CIFAR-100 dataset and normalize pixel values.
Convert class labels to one-hot encoding.
2. Features:
Histogram Computation: Each image's pixel intensity histogram is computed, then
standardized with StandardScaler to serve as the input features.
3. Model Definition:
Shallow Model: A simple neural network with one hidden dense layer.
4. Gradient Descent Methods:
Implement Batch Gradient Descent (one large batch).
Implement Stochastic Gradient Descent (single example updates).
Implement Mini-Batch Gradient Descent (small batch updates).
5. 5-Fold Cross-Validation:
Split data into 5 folds.
Train the model using each gradient method on each fold.
Compute and store accuracy and training time for each fold and gradient method.
6. Comparison and Results:
Compare the mean accuracy and training time for each method.
Present results in a tabular format for clarity.
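To make the fold-splitting step concrete, the sketch below builds the train and validation indices by hand with NumPy. The helper k_fold_indices is hypothetical and is not the scikit-learn KFold used in the procedure. It mimics the contiguous, unshuffled folds that KFold produces by default, on a toy set of 10 samples.

```python
import numpy as np

def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) pairs over contiguous, unshuffled folds."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, n_splits=5))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train_idx.tolist()} val={val_idx.tolist()}")
```

Each sample appears in exactly one validation fold, so every method is evaluated on all of the data once across the five folds.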
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
import time
def compute_histograms(images):
    # Compute a normalized 256-bin intensity histogram for each image
    histograms = []
    for img in images:
        hist, _ = np.histogram(img, bins=256, range=(0, 256), density=True)
        histograms.append(hist)
    return np.array(histograms)
# Hyperparameters
batch_size = 128
mini_batch_size = 32
epochs = 10
kf = KFold(n_splits=5)
input_dim = x_train_hist.shape[1]
# Train with the selected gradient descent method and time it
# (this fragment runs once per method and per fold)
start_time = time.time()
if method == "batch":
    # Batch: the whole training fold forms a single batch
    model.fit(x_train_fold, y_train_fold,
              batch_size=len(x_train_fold), epochs=epochs, verbose=0,
              validation_data=(x_val_fold, y_val_fold))
elif method == "stochastic":
    # Stochastic: one example per update
    model.fit(x_train_fold, y_train_fold,
              batch_size=1, epochs=epochs, verbose=0,
              validation_data=(x_val_fold, y_val_fold))
elif method == "mini_batch":
    # Mini-batch: small batches of mini_batch_size examples
    model.fit(x_train_fold, y_train_fold,
              batch_size=mini_batch_size, epochs=epochs, verbose=0,
              validation_data=(x_val_fold, y_val_fold))
end_time = time.time()
# Evaluate model on the validation fold
loss, accuracy = model.evaluate(x_val_fold, y_val_fold, verbose=0)
results[method]["accuracy"].append(accuracy)
results[method]["time"].append(end_time - start_time)
Tasks:
1. Provide Colab link to the above code. [1 mark] [CO2] [BTL 4]
2. Add an additional hidden layer. Use only the fastest gradient descent technique and
report 5-fold Cross-Validation accuracy. Compare with previous network results.
Provide Colab link to the new code [4 marks] [CO2] [BTL 4]