
EXPERIMENT 5: To Implement Neural Network Embedding with Continuous Bag of Words


AIM:
To implement neural network embeddings with the Continuous Bag-of-Words (CBOW) model in Keras.

DESCRIPTION:
The Continuous Bag-of-Words (CBOW) model is a neural network-based approach to
predict a target word based on its surrounding context words.

Figure 1. Neural network for vocabulary size 10 and embedding size 2 (specific to the code given below).
The main stages are as follows:
1. Vocabulary Definition
Defines the vocabulary with an index for each word.
vocab_size = 10, corresponding to the number of unique words.
2. Model Definition
Input Layer: Accepts context words as input.
Embedding Layer: Maps words to dense vectors of size embedding_dim.
Averaging Layer: Computes the average of embeddings for context words.
Output Layer: A dense layer with softmax activation predicts the target word
probabilities.
3. Model Compilation
Optimizer: Uses Adam for efficient gradient updates.
Loss Function: Categorical crossentropy, suitable for multi-class classification.
4. Training Data Preparation
Defines 10 training examples with context words and their corresponding target words.
Converts context word indices and target words into input and one-hot encoded target
labels.
5. Training the Model
Trains the model for 50 epochs using the prepared data
6. Network Visualization
NetworkX Visualization: Illustrates the CBOW model architecture, showing:
Input nodes (context words)
Embedding layer
Average layer
Softmax layer
7. Testing the Model
Tests the model on unseen examples and evaluates its predictions.
Displays the context words, predicted target word, actual target word, and predicted
probabilities.
8. Weight Updates (in Training)
Embedding Layer Weights (W): updated for the context words based on the hidden-layer error.
Output Layer Weights (W′): updated for all words based on the output-layer error (see the sketch after this list).
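
For concreteness, here is a minimal NumPy sketch of one CBOW training step covering stage 8: the forward pass (embedding lookup, averaging, softmax) followed by the gradient updates to the embedding matrix W and the output matrix W′. The random initialization, learning rate, and single training example are illustrative assumptions; the Keras code in the Procedure section performs the equivalent updates through the Adam optimizer and categorical crossentropy.

import numpy as np

vocab_size, embedding_dim = 10, 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(vocab_size, embedding_dim))       # embedding weights W
W_out = rng.normal(scale=0.1, size=(embedding_dim, vocab_size))   # output weights W'
lr = 0.1

context, target = [0, 1, 2], 4   # e.g. ["queen", "man", "woman"] -> "king"

# Forward pass: average the context embeddings, then softmax over the vocabulary
h = W[context].mean(axis=0)          # hidden vector, shape (embedding_dim,)
scores = h @ W_out                   # shape (vocab_size,)
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# Backward pass (cross-entropy loss): output error and hidden-layer error
e = probs.copy()
e[target] -= 1.0                     # dL/dscores
grad_W_out = np.outer(h, e)          # dL/dW'
grad_h = W_out @ e                   # error propagated back to the hidden layer

# Weight updates: W' for all words, W only for the context words
W_out -= lr * grad_W_out
for idx in context:
    W[idx] -= lr * grad_h / len(context)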

PROCEDURE:
Here's how you can implement Word2Vec as a CBOW model in Google Colab.

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dense, Average
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Vocabulary mapping
vocab = {0: "queen", 1: "man", 2: "woman", 3: "child", 4: "king",
         5: "prince", 6: "princess", 7: "throne", 8: "palace", 9: "royal"}
vocab_size = len(vocab)   # Vocabulary size = 10
embedding_dim = 5         # Embedding size
context_size = 3          # Context window size

# Define the CBOW model
inputs = [Input(shape=(1,), name=f"input_{i}") for i in range(context_size)]
embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim,
                            input_length=1, name="embedding")
embedded_words = [embedding_layer(inp) for inp in inputs]
averaged = Average(name="average")(embedded_words)
# Reshape the output of the Average layer to remove the extra dimension
averaged = tf.keras.layers.Reshape((embedding_dim,))(averaged)  # (None, embedding_dim)
output = Dense(vocab_size, activation="softmax", name="output")(averaged)
model = Model(inputs=inputs, outputs=output)

# Compile the model
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Training data: 10 instances of context words and their target words
training_data = [
    ([0, 1, 2], 4),  # Context: ["queen", "man", "woman"], Target: "king"
    ([1, 2, 3], 5),  # Context: ["man", "woman", "child"], Target: "prince"
    ([2, 3, 4], 6),  # Context: ["woman", "child", "king"], Target: "princess"
    ([4, 5, 6], 7),  # Context: ["king", "prince", "princess"], Target: "throne"
    ([0, 5, 7], 8),  # Context: ["queen", "prince", "throne"], Target: "palace"
    ([5, 6, 8], 9),  # Context: ["prince", "princess", "palace"], Target: "royal"
    ([3, 4, 8], 0),  # Context: ["child", "king", "palace"], Target: "queen"
    ([6, 7, 9], 1),  # Context: ["princess", "throne", "royal"], Target: "man"
    ([7, 8, 9], 2),  # Context: ["throne", "palace", "royal"], Target: "woman"
    ([1, 5, 9], 3)   # Context: ["man", "prince", "royal"], Target: "child"
]

# Prepare training inputs and labels
# X_train is a list of 3 arrays (one per context position), each with shape (10, 1)
X_train = [np.array([[context[i]] for context, _ in training_data])
           for i in range(context_size)]
y_train = np.array([tf.keras.utils.to_categorical(target, num_classes=vocab_size)
                    for _, target in training_data])

# Train the model
model.fit(X_train, y_train, epochs=50, verbose=1)

# Visualize the CBOW network structure, including the softmax layer, using NetworkX
def visualize_cbow_network():
    G = nx.DiGraph()

    # Add nodes for the layers
    G.add_node("Context Word 1", pos=(0, 3))
    G.add_node("Context Word 2", pos=(1, 3))
    G.add_node("Context Word 3", pos=(2, 3))
    G.add_node("Shared Embedding", pos=(1, 2))
    G.add_node("Average Layer", pos=(1, 1.5))
    G.add_node("Softmax Layer (Output)", pos=(1, 1))

    # Add edges to represent connections
    G.add_edges_from([
        ("Context Word 1", "Shared Embedding"),
        ("Context Word 2", "Shared Embedding"),
        ("Context Word 3", "Shared Embedding"),
        ("Shared Embedding", "Average Layer"),
        ("Average Layer", "Softmax Layer (Output)")
    ])

    # Position nodes for visualization
    pos = nx.get_node_attributes(G, 'pos')

    # Draw the network
    plt.figure(figsize=(10, 7))
    nx.draw(G, pos, with_labels=True, node_size=3000,
            node_color="lightblue", font_size=10, font_weight="bold")
    plt.title("CBOW Model Network Visualization with Softmax Layer")
    plt.show()

# Call the function to visualize the CBOW network
visualize_cbow_network()

# Display test results
print("\nTest Data Predictions:")
test_data = [
    ([0, 1, 2], 4),  # Context: ["queen", "man", "woman"], Target: "king"
    ([1, 2, 3], 5),  # Context: ["man", "woman", "child"], Target: "prince"
    ([4, 5, 6], 7)   # Context: ["king", "prince", "princess"], Target: "throne"
]
for i, (context_indices, actual_word_index) in enumerate(test_data):
    context_words = [vocab[idx] for idx in context_indices]

    # Predict the target word (each input has shape (1, 1))
    prediction = model.predict([np.array([[context_indices[0]]]),
                                np.array([[context_indices[1]]]),
                                np.array([[context_indices[2]]])])

    predicted_word_index = np.argmax(prediction)
    predicted_word = vocab[predicted_word_index]
    actual_word = vocab[actual_word_index]

    print(f"Test Input {i + 1}: Context Words = {context_words}")
    print(f"Predicted Probabilities: {prediction}")
    print(f"Predicted Target Word: '{predicted_word}'")
    print(f"Actual Target Word: '{actual_word}'\n")

Tasks:
1. Provide a Colab link to the above code. [2 marks] [CO2] [BTL 4]
2. Construct any two sentences so that the vocabulary size is 10. For a context window of 2 (forward and backward of the target), prepare training and testing data. Include the new network architecture. Provide a Colab link to the new code. [3 marks] [CO2] [BTL 4]
EXPERIMENT 6: To Implement Gradient-Descent Strategies to Train Neural Networks
AIM:
To implement Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent for training neural networks.
DESCRIPTION:
There are three types of gradient-descent-based training techniques; a minimal sketch contrasting their update loops follows the list below.
1. Batch Gradient Descent: Processes the entire dataset at once to compute gradients.
o Pros: Accurate gradient estimation.
o Cons: Slow for large datasets, high memory usage.
o Example: Dataset size = 1000, batch size = 1000 (all data).
2. Stochastic Gradient Descent (SGD): Updates model parameters using one data point
at a time.
o Pros: Faster updates, less memory.
o Cons: High variance, noisy convergence.
o Example: Dataset size = 1000, batch size = 1.
3. Mini-Batch Gradient Descent: Divides the dataset into smaller batches and
processes each batch separately.
o Pros: Balance between efficiency and stability.
o Cons: Requires batch size tuning.
o Example: Dataset size = 1000, batch size = 32.
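
The sketch below contrasts the three update loops on a toy least-squares problem using plain NumPy; the synthetic data, learning rate, and epoch count are illustrative assumptions, not part of the experiment. Only the batch size passed to the inner loop changes between the three strategies.

import numpy as np

# Toy least-squares problem: learn w, b so that y is about 3x + 0.5 (synthetic data)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.1, size=1000)

def grad(w, b, xb, yb):
    # Gradient of the mean squared error over the (mini-)batch (xb, yb)
    err = w * xb + b - yb
    return (2 * err * xb).mean(), (2 * err).mean()

def train(batch_size, lr=0.1, epochs=20):
    # Generic loop: batch / stochastic / mini-batch differ only in batch_size
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            sel = idx[start:start + batch_size]
            gw, gb = grad(w, b, x[sel], y[sel])
            w -= lr * gw
            b -= lr * gb
    return w, b

print("Batch      :", train(batch_size=len(x)))  # one update per epoch (whole dataset)
print("Stochastic :", train(batch_size=1))       # one update per sample
print("Mini-batch :", train(batch_size=32))      # one update per 32 samples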
Additionally, the 5-fold cross-validation technique and histogram features of an image will also be learnt through this experiment. A histogram of an image represents the distribution of pixel intensities: for a grayscale image, the frequency of each intensity level (0 to 255 for 8-bit images) is plotted (a short example follows).
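
As a short example, the snippet below plots the histogram of a synthetic 8-bit grayscale image; the image itself is an illustrative assumption. The experiment code further below computes the same kind of histogram (with density normalization) over the raw CIFAR-100 pixel values.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic 8-bit grayscale "image" (illustrative; any uint8 array works)
img = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)

# Frequency of each intensity level 0..255
hist, bin_edges = np.histogram(img, bins=256, range=(0, 256))

plt.bar(bin_edges[:-1], hist, width=1.0)
plt.xlabel("Pixel intensity (0-255)")
plt.ylabel("Frequency")
plt.title("Grayscale histogram")
plt.show()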
Experiments are conducted on a benchmark dataset built into Keras, CIFAR-100, which contains 60,000 color images across 100 categories.
PROCEDURE:
We will implement and evaluate the performance of three gradient techniques—batch,
stochastic, and mini-batch gradient descent—on the CIFAR-100 dataset using TensorFlow's
Keras API. We will also employ 5-fold cross-validation to assess model accuracy across all
three methods and measure the execution time for each approach in Google Colab.
Here's how we'll proceed:
1. Data Loading and Preprocessing:
Load CIFAR-100 dataset and normalize pixel values.
Convert class labels to one-hot encoding.
2. Features:
Histogram Computation: each image's pixel-intensity histogram is computed (density-normalized) and then standardized with StandardScaler to serve as input features.
3. Model Definition:
Shallow Model: A simple neural network with one hidden dense layer.
4. Gradient Descent Methods:
Implement Batch Gradient Descent (one large batch).
Implement Stochastic Gradient Descent (single example updates).
Implement Mini-Batch Gradient Descent (small batch updates).
5. 5-Fold Cross-Validation:
Split data into 5 folds.
Train the model using each gradient method on each fold.
Compute and store accuracy and training time for each fold and gradient method.
6. Comparison and Results:
Compare the mean accuracy and training time for each method.
Present the results in tabular form for clarity (a tabulation sketch follows the code below).

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
import time

# Load and preprocess CIFAR-100 dataset
from tensorflow.keras.datasets import cifar100
(x_train, y_train), (x_test, y_test) = cifar100.load_data()
num_classes = 100

def compute_histograms(images):
    # Compute a normalized 256-bin intensity histogram for each image
    histograms = []
    for img in images:
        hist, _ = np.histogram(img, bins=256, range=(0, 256), density=True)
        histograms.append(hist)
    return np.array(histograms)

# Compute histograms as features
x_train_hist = compute_histograms(x_train)
x_test_hist = compute_histograms(x_test)

# Standardize the histogram features
scaler = StandardScaler()
x_train_hist = scaler.fit_transform(x_train_hist)
x_test_hist = scaler.transform(x_test_hist)

# Convert labels to one-hot encoding
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Define a shallow neural network model
def create_shallow_model(input_dim):
    model = Sequential([
        Dense(128, activation='relu', input_dim=input_dim),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])
    return model

# Define gradient methods (same optimizer; only the batch size used in fit() differs)
methods = {
    "batch": lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
    "stochastic": lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
    "mini_batch": lambda: tf.keras.optimizers.SGD(learning_rate=0.01)
}

# Hyperparameters
batch_size = 128
mini_batch_size = 32
epochs = 10
kf = KFold(n_splits=5)
input_dim = x_train_hist.shape[1]

# Initialize results dictionary
results = {method: {"accuracy": [], "time": []} for method in methods.keys()}

# Perform 5-fold cross-validation
for fold, (train_idx, val_idx) in enumerate(kf.split(x_train_hist)):
    print(f"Processing Fold {fold + 1}...")
    x_train_fold, x_val_fold = x_train_hist[train_idx], x_train_hist[val_idx]
    y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]

    for method, optimizer_fn in methods.items():
        print(f"  Training with {method} gradient descent...")
        model = create_shallow_model(input_dim)
        model.compile(optimizer=optimizer_fn(),
                      loss='categorical_crossentropy', metrics=['accuracy'])

        start_time = time.time()

        if method == "batch":
            model.fit(x_train_fold, y_train_fold, batch_size=len(x_train_fold),
                      epochs=epochs, verbose=0,
                      validation_data=(x_val_fold, y_val_fold))
        elif method == "stochastic":
            model.fit(x_train_fold, y_train_fold, batch_size=1,
                      epochs=epochs, verbose=0,
                      validation_data=(x_val_fold, y_val_fold))
        elif method == "mini_batch":
            model.fit(x_train_fold, y_train_fold, batch_size=mini_batch_size,
                      epochs=epochs, verbose=0,
                      validation_data=(x_val_fold, y_val_fold))

        end_time = time.time()

        # Evaluate model on the validation fold
        loss, accuracy = model.evaluate(x_val_fold, y_val_fold, verbose=0)
        results[method]["accuracy"].append(accuracy)
        results[method]["time"].append(end_time - start_time)

# Compute average accuracy and time
for method in methods.keys():
    avg_accuracy = np.mean(results[method]["accuracy"])
    avg_time = np.mean(results[method]["time"])
    print(f"{method.capitalize()} Gradient Descent: "
          f"Avg Accuracy: {avg_accuracy:.4f}, Avg Time: {avg_time:.2f} seconds")

Tasks:
1. Provide a Colab link to the above code. [1 mark] [CO2] [BTL 4]
2. Add an additional hidden layer. Use only the fastest gradient descent technique and report the 5-fold cross-validation accuracy. Compare with the previous network's results. Provide a Colab link to the new code. [4 marks] [CO2] [BTL 4]
