AML Lab


List of Programs

1 Gradient descent algorithm
2 Stochastic gradient descent algorithm
3 Neural network with layers
4 Support Vector Machine
5 1D Convolutional Neural Network
6 2D convolutional network
7 Bias, variance, and the trade-off
8 ResNet
9 Sentiment analysis
10 Markov Decision Processes
11 Q-learning algorithm

Exercise 1

Gradient descent algorithm


Question
1. Implement the gradient descent algorithm for linear regression.

Code

import numpy as np
import matplotlib.pyplot as plt


def gradient_descent(X, y, learning_rate, epochs):
    m = 0
    b = 0
    n = len(X)
    costs = []
    for epoch in range(epochs):
        y_pred = m * X + b
        error = y_pred - y
        # Gradients of the cost (1/(2n)) * sum(error**2) w.r.t. m and b
        m -= learning_rate * (1 / n) * np.sum(X * error)
        b -= learning_rate * (1 / n) * np.sum(error)
        if epoch % 100 == 0:
            cost = (1 / (2 * n)) * np.sum(error**2)
            costs.append(cost)
            print(f'Epoch {epoch}, Cost: {cost}')
    # Plot epoch vs. cost
    plt.plot(range(0, epochs, 100), costs, marker='o')
    plt.xlabel('Epoch')
    plt.ylabel('Cost')
    plt.title('Epoch vs. Cost')
    plt.show()
    return m, b


# Generate some random data for demonstration
np.random.seed(42)
X = np.random.rand(100)
y = 2 + 3 * X + np.random.randn(100) * 0.1
learning_rate = 0.1
epochs = 1000
m, b = gradient_descent(X, y, learning_rate, epochs)
print("Slope (m):", m)
print("Intercept (b):", b)

Output
Epoch 0, Cost: 6.201838597515928
Epoch 100, Cost: 0.029416306865408297
Epoch 200, Cost: 0.010164975244755979
Epoch 300, Cost: 0.0055142882242454495
Epoch 400, Cost: 0.0043907872523807415
Epoch 500, Cost: 0.004119374786176467
Epoch 600, Cost: 0.00405380766282203
Epoch 700, Cost: 0.004037968126322113
Epoch 800, Cost: 0.004034141651959707
Epoch 900, Cost: 0.004033217262155361

[Figure: cost plotted against epoch; plot title "Epoch vs. Cost", x-axis "Epoch", y-axis "Cost"]

Output
Slope (m): 2.952757653921563
Intercept (b): 2.022149709011742
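
As a cross-check on the fitted values, the same line can be obtained in closed form by least squares. The sketch below regenerates the exercise data with the same seed and solves for the slope and intercept with np.linalg.lstsq; the names A, m_exact and b_exact are illustrative and not part of the original program.

```python
import numpy as np

# Same synthetic data as the exercise (seed 42, y = 2 + 3x + noise)
np.random.seed(42)
X = np.random.rand(100)
y = 2 + 3 * X + np.random.randn(100) * 0.1

# Least-squares solution of [X, 1] @ [m, b] = y
A = np.column_stack([X, np.ones_like(X)])
(m_exact, b_exact), *_ = np.linalg.lstsq(A, y, rcond=None)

print("Closed-form slope:", m_exact)
print("Closed-form intercept:", b_exact)
```

Gradient descent should approach these values as the number of epochs grows; a remaining gap suggests the iteration has not fully converged.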

Exercise 2

Stochastic gradient descent algorithm


Question
2. Implement the stochastic gradient descent algorithm for logistic regression.

Code

import numpy as np  # for mathematical operations

# Example data (replace with actual data)
X = np.array([[0.5, 1.5], [1.5, 2.0], [3.0, 4.0], [5.0, 6.0]])  # example features
y = np.array([0, 0, 1, 1])  # example labels

# Initialize the weights 'm' and the bias 'c'
m = np.zeros_like(X[0])  # array with shape equal to no. of features
c = 0
LR = 0.0001  # the learning rate
epochs = 50  # no. of passes over the data


# Define sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Perform stochastic gradient descent: one update per data point
for epoch in range(1, epochs + 1):
    # Optionally shuffle the data
    # indices = np.random.permutation(len(X))
    # X, y = X[indices], y[indices]
    for i in range(len(X)):
        # Gradient of the log-likelihood w.r.t. 'm'
        gr_wrt_m = X[i] * (y[i] - sigmoid(np.dot(m.T, X[i]) + c))
        # Gradient of the log-likelihood w.r.t. 'c'
        gr_wrt_c = y[i] - sigmoid(np.dot(m.T, X[i]) + c)
        # Print the gradients
        print(f"Epoch {epoch}, Data Point {i}: Gradient w.r.t m: {gr_wrt_m}, Gradient w.r.t c: {gr_wrt_c}")
        # Update m and c. The gradients above are those of the log-likelihood,
        # which is maximized, so the step is added ('+') rather than subtracted.
        m = m + LR * gr_wrt_m
        c = c + LR * gr_wrt_c

# After the final epoch, use the learned 'm' and 'c' to make predictions
predictions = []
for i in range(len(X)):
    z = np.dot(m, X[i]) + c
    y_pred = sigmoid(z)
    if y_pred >= 0.5:
        predictions.append(1)
    else:
        predictions.append(0)

# 'predictions' holds the predicted class labels for the learned 'm' and 'c'
print("Predictions:", predictions)
print("Optimum weights (m):", m)
print("Optimum bias (c):", c)

Output
Epoch 1, Data Point 0: Gradient w.r.t m: [-0.25 -0.75], Gradient w.r.t c: -0.5
Epoch 1, Data Point 1: Gradient w.r.t m: [-0.74991094 -0.99988125], Gradient w.r.t c: -0.49
Epoch 1, Data Point 2: Gradient w.r.t m: [1.50082494 2.00109992], Gradient w.r.t c: 0.50027
Epoch 1, Data Point 3: Gradient w.r.t m: [2.49956097 2.99947317], Gradient w.r.t c: 0.49991
Epoch 2, Data Point 0: Gradient w.r.t m: [-0.25007971 -0.75023912], Gradient w.r.t c: -0.50
Epoch 2, Data Point 1: Gradient w.r.t m: [-0.7503235 -1.00043133], Gradient w.r.t c: -0.500
Epoch 2, Data Point 2: Gradient w.r.t m: [1.49917499 1.99889998], Gradient w.r.t c: 0.49972
Epoch 2, Data Point 3: Gradient w.r.t m: [2.49525134 2.9943016 ], Gradient w.r.t c: 0.49905
Epoch 3, Data Point 0: Gradient w.r.t m: [-0.2501592 -0.75047759], Gradient w.r.t c: -0.500
Epoch 3, Data Point 1: Gradient w.r.t m: [-0.75073501 -1.00098002], Gradient w.r.t c: -0.50
Epoch 3, Data Point 2: Gradient w.r.t m: [1.49752907 1.99670543], Gradient w.r.t c: 0.49917
Epoch 3, Data Point 3: Gradient w.r.t m: [2.4909521 2.98914252], Gradient w.r.t c: 0.498190
Epoch 4, Data Point 0: Gradient w.r.t m: [-0.25023847 -0.75071541], Gradient w.r.t c: -0.50
Epoch 4, Data Point 1: Gradient w.r.t m: [-0.75114549 -1.00152731], Gradient w.r.t c: -0.50
Epoch 4, Data Point 2: Gradient w.r.t m: [1.49588719 1.99451626], Gradient w.r.t c: 0.49862
Epoch 4, Data Point 3: Gradient w.r.t m: [2.48666327 2.98399592], Gradient w.r.t c: 0.49733
Epoch 5, Data Point 0: Gradient w.r.t m: [-0.25031753 -0.75095259], Gradient w.r.t c: -0.50
Epoch 5, Data Point 1: Gradient w.r.t m: [-0.75155492 -1.00207322], Gradient w.r.t c: -0.50
Epoch 5, Data Point 2: Gradient w.r.t m: [1.49424934 1.99233246], Gradient w.r.t c: 0.49808
Epoch 5, Data Point 3: Gradient w.r.t m: [2.48238484 2.97886181], Gradient w.r.t c: 0.49647
Epoch 6, Data Point 0: Gradient w.r.t m: [-0.25039637 -0.75118912], Gradient w.r.t c: -0.50
Epoch 6, Data Point 1: Gradient w.r.t m: [-0.75196331 -1.00261775], Gradient w.r.t c: -0.50
Epoch 6, Data Point 2: Gradient w.r.t m: [1.49261551 1.99015401], Gradient w.r.t c: 0.49753
Epoch 6, Data Point 3: Gradient w.r.t m: [2.47811681 2.97374017], Gradient w.r.t c: 0.49562
Epoch 7, Data Point 0: Gradient w.r.t m: [-0.250475 -0.75142501], Gradient w.r.t c: -0.5009
Epoch 7, Data Point 1: Gradient w.r.t m: [-0.75237068 -1.0031609 ], Gradient w.r.t c: -0.50
Epoch 7, Data Point 2: Gradient w.r.t m: [1.4909857 1.98798093], Gradient w.r.t c: 0.496995
Epoch 7, Data Point 3: Gradient w.r.t m: [2.47385918 2.96863102], Gradient w.r.t c: 0.49477
Epoch 8, Data Point 0: Gradient w.r.t m: [-0.25055342 -0.75166026], Gradient w.r.t c: -0.50
Epoch 8, Data Point 1: Gradient w.r.t m: [-0.752777 -1.00370267], Gradient w.r.t c: -0.5018
Epoch 8, Data Point 2: Gradient w.r.t m: [1.48935989 1.98581319], Gradient w.r.t c: 0.49645
Epoch 8, Data Point 3: Gradient w.r.t m: [2.46961195 2.96353434], Gradient w.r.t c: 0.49392
Epoch 9, Data Point 0: Gradient w.r.t m: [-0.25063162 -0.75189486], Gradient w.r.t c: -0.50
Epoch 9, Data Point 1: Gradient w.r.t m: [-0.7531823 -1.00424307], Gradient w.r.t c: -0.502
Epoch 9, Data Point 2: Gradient w.r.t m: [1.48773809 1.98365078], Gradient w.r.t c: 0.49591
Epoch 9, Data Point 3: Gradient w.r.t m: [2.46537512 2.95845014], Gradient w.r.t c: 0.49307
Epoch 10, Data Point 0: Gradient w.r.t m: [-0.25070961 -0.75212883], Gradient w.r.t c: -0.5
Epoch 10, Data Point 1: Gradient w.r.t m: [-0.75358657 -1.00478209], Gradient w.r.t c: -0.5
Epoch 10, Data Point 2: Gradient w.r.t m: [1.48612028 1.98149371], Gradient w.r.t c: 0.4953
Epoch 10, Data Point 3: Gradient w.r.t m: [2.46114868 2.95337841], Gradient w.r.t c: 0.4922
Epoch 11, Data Point 0: Gradient w.r.t m: [-0.25078739 -0.75236216], Gradient w.r.t c: -0.5
Epoch 11, Data Point 1: Gradient w.r.t m: [-0.75398981 -1.00531975], Gradient w.r.t c: -0.5
Epoch 11, Data Point 2: Gradient w.r.t m: [1.48450646 1.97934195], Gradient w.r.t c: 0.4948
Epoch 11, Data Point 3: Gradient w.r.t m: [2.45693263 2.94831915], Gradient w.r.t c: 0.4913
Epoch 12, Data Point 0: Gradient w.r.t m: [-0.25086495 -0.75259486], Gradient w.r.t c: -0.5
Epoch 12, Data Point 1: Gradient w.r.t m: [-0.75439203 -1.00585605], Gradient w.r.t c: -0.5
Epoch 12, Data Point 2: Gradient w.r.t m: [1.48289663 1.97719551], Gradient w.r.t c: 0.4942
Epoch 12, Data Point 3: Gradient w.r.t m: [2.45272697 2.94327236], Gradient w.r.t c: 0.4905
Epoch 13, Data Point 0: Gradient w.r.t m: [-0.25094231 -0.75282692], Gradient w.r.t c: -0.5
Epoch 13, Data Point 1: Gradient w.r.t m: [-0.75479323 -1.00639098], Gradient w.r.t c: -0.5

Epoch 13, Data Point 2: Gradient w.r.t m: [1.48129078 1.97505437], Gradient w.r.t c: 0.4937
Epoch 13, Data Point 3: Gradient w.r.t m: [2.44853169 2.93823803], Gradient w.r.t c: 0.4897
Epoch 14, Data Point 0: Gradient w.r.t m: [-0.25101945 -0.75305835], Gradient w.r.t c: -0.5
Epoch 14, Data Point 1: Gradient w.r.t m: [-0.75519341 -1.00692455], Gradient w.r.t c: -0.5
Epoch 14, Data Point 2: Gradient w.r.t m: [1.4796889 1.97291853], Gradient w.r.t c: 0.49322
Epoch 14, Data Point 3: Gradient w.r.t m: [2.4443468 2.93321616], Gradient w.r.t c: 0.48886
Epoch 15, Data Point 0: Gradient w.r.t m: [-0.25109638 -0.75328915], Gradient w.r.t c: -0.5
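
The update rule in this exercise uses x * (y - sigmoid(m.x + c)) and y - sigmoid(m.x + c) as gradients. A minimal sketch, assuming these are meant to be the gradients of the per-sample log-likelihood, compares them with central finite differences at the first data point and the initial parameters. The helper names log_likelihood, analytic_gradient and numeric_gradient are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(m, c, x, y):
    """Log-likelihood of one labelled point under logistic regression."""
    p = sigmoid(sum(mj * xj for mj, xj in zip(m, x)) + c)
    return y * math.log(p) + (1 - y) * math.log(1 - p)


def analytic_gradient(m, c, x, y):
    """Same expressions as gr_wrt_m and gr_wrt_c in the exercise code."""
    residual = y - sigmoid(sum(mj * xj for mj, xj in zip(m, x)) + c)
    return [xj * residual for xj in x], residual


def numeric_gradient(m, c, x, y, eps=1e-6):
    """Central finite differences of the log-likelihood."""
    grad_m = []
    for j in range(len(m)):
        up, down = list(m), list(m)
        up[j] += eps
        down[j] -= eps
        grad_m.append((log_likelihood(up, c, x, y) - log_likelihood(down, c, x, y)) / (2 * eps))
    grad_c = (log_likelihood(m, c + eps, x, y) - log_likelihood(m, c - eps, x, y)) / (2 * eps)
    return grad_m, grad_c


# First data point of the example, at the initial parameters m = 0, c = 0
x0, y0 = [0.5, 1.5], 0
m0, c0 = [0.0, 0.0], 0.0
grad_m, grad_c = analytic_gradient(m0, c0, x0, y0)
num_m, num_c = numeric_gradient(m0, c0, x0, y0)
print(grad_m, grad_c)   # [-0.25, -0.75] -0.5, as in the first output line
print(num_m, num_c)     # numerically close to the line above
```

Because the two agree, adding LR times these quantities moves the parameters uphill on the log-likelihood, which is why the update uses '+'.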

Exercise 3

Neural network with layers


Question
3. Use the MNIST dataset and compose a neural network with layers suitable for image classification.

Code

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the neural network
model = Sequential([
    Flatten(input_shape=(28, 28)),    # Flatten the input images to a 1D array
    Dense(128, activation='relu'),    # Fully connected layer with 128 units and ReLU activation
    Dense(10, activation='softmax')   # Output layer with 10 units (one for each digit) and softmax activation
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
model.evaluate(x_test, y_test)

Output
Epoch 1/5
1875/1875 [==============================] - 1s 601us/step - loss: 0.2611 - accuracy: 0.925
Epoch 2/5
1875/1875 [==============================] - 1s 602us/step - loss: 0.1148 - accuracy: 0.966
Epoch 3/5
1875/1875 [==============================] - 1s 584us/step - loss: 0.0797 - accuracy: 0.976
Epoch 4/5
1875/1875 [==============================] - 1s 586us/step - loss: 0.0603 - accuracy: 0.981
Epoch 5/5

1875/1875 [==============================] - 1s 600us/step - loss: 0.0464 - accuracy: 0.985
313/313 [==============================] - 0s 408us/step - loss: 0.0805 - accuracy: 0.9758
[0.08051188290119171, 0.9757999777793884]
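
To make the layer stack concrete, the sketch below reproduces the forward pass of Flatten, Dense(128, relu) and Dense(10, softmax) in plain NumPy. The weights W1, b1, W2, b2 are random, untrained stand-ins chosen only to have the same shapes as the Keras layers, and the input batch is random data rather than MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical, untrained weights with the same shapes as the Keras layers
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)


def forward(images):
    flat = images.reshape(len(images), -1)   # Flatten: (N, 28, 28) -> (N, 784)
    hidden = relu(flat @ W1 + b1)            # Dense(128, relu)
    return softmax(hidden @ W2 + b2)         # Dense(10, softmax)


batch = rng.random((4, 28, 28))              # stand-in for four normalized images
probs = forward(batch)
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)
```

Each row of probs sums to 1, and n_params counts the weights and biases of the two Dense layers (Flatten has none).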

Exercise 4

Support Vector Machine


Question
4. Implement a Support Vector Machine for classification using the MNIST dataset, with feature extraction using Principal Component Analysis to reduce the dimensionality of the data.

Code

import idx2numpy
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score


# Load the MNIST dataset from the IDX files
def load_mnist_images(filename):
    return idx2numpy.convert_from_file(filename)


def load_mnist_labels(filename):
    return idx2numpy.convert_from_file(filename)


# Replace with the actual path to the MNIST files
X_train = load_mnist_images('train-images-idx3-ubyte/train-images-idx3-ubyte')
y_train = load_mnist_labels('train-labels-idx1-ubyte/train-labels-idx1-ubyte')
X_test = load_mnist_images('t10k-images-idx3-ubyte/t10k-images-idx3-ubyte')
y_test = load_mnist_labels('t10k-labels-idx1-ubyte/t10k-labels-idx1-ubyte')

# Flatten the images (28x28) into vectors of size 784
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Apply PCA to reduce dimensionality
pca = PCA(n_components=0.95)  # Retain 95% of variance
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Train an SVM classifier
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train_pca, y_train)

# Make predictions on the test set
y_pred = svm.predict(X_test_pca)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Output
Accuracy: 0.94
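
The program keeps enough principal components to retain 95% of the variance. A rough NumPy sketch of that selection rule follows, assuming it amounts to taking the smallest number of components whose cumulative explained-variance ratio reaches the threshold. The data here is a small random stand-in, not MNIST, and this is not scikit-learn's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 samples, 20 features with decaying variance
X = rng.normal(size=(200, 20)) * np.linspace(5.0, 0.1, 20)

# Centre the data and take the singular value decomposition
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance explained by each principal component
explained = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 95%
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project onto the first k principal directions
X_reduced = Xc @ Vt[:k].T
print(k, X_reduced.shape)
```

The reduced matrix plays the role of X_train_pca: fewer columns for the SVM to work with, at the cost of the discarded 5% of variance.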

Exercise 5

1D Convolutional Neural Network


Question
5. Implement a 1D Convolutional Neural Network for sequence classification.

Code

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy

# Generate dummy data
# Assuming we have 1000 sequences, each of length 100, with 1 feature per timestep
# And there are 10 classes
num_sequences = 1000
sequence_length = 100
num_features = 1
num_classes = 10

X = np.random.rand(num_sequences, sequence_length, num_features)
y = np.random.randint(0, num_classes, num_sequences)

# Define the 1D CNN model
model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu',
           input_shape=(sequence_length, num_features)),
    MaxPooling1D(pool_size=2),
    Dropout(0.5),
    Conv1D(filters=128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=[SparseCategoricalAccuracy()])

# Print model summary
model.summary()

# Train the model
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)

Output
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape         Param #
=================================================================
 conv1d (Conv1D)                 (None, 98, 64)       256
 max_pooling1d (MaxPooling1D)    (None, 49, 64)       0
 dropout (Dropout)               (None, 49, 64)       0
 conv1d_1 (Conv1D)               (None, 47, 128)      24704
 max_pooling1d_1 (MaxPooling1D)  (None, 23, 128)      0
 flatten_2 (Flatten)             (None, 2944)         0
 dense_2 (Dense)                 (None, 128)          376960
 dropout_1 (Dropout)             (None, 128)          0
 dense_3 (Dense)                 (None, 10)           1290
=================================================================
Total params: 403210 (1.54 MB)
Trainable params: 403210 (1.54 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/10
25/25 [==============================] - 2s 29ms/step - loss: 2.3285 - sparse_categorical_a
Epoch 2/10
25/25 [==============================] - 0s 20ms/step - loss: 2.3042 - sparse_categorical_a
Epoch 3/10
25/25 [==============================] - 0s 20ms/step - loss: 2.2998 - sparse_categorical_a
Epoch 4/10
25/25 [==============================] - 0s 19ms/step - loss: 2.3004 - sparse_categorical_a
Epoch 5/10
25/25 [==============================] - 0s 18ms/step - loss: 2.2965 - sparse_categorical_a
Epoch 6/10
25/25 [==============================] - 0s 19ms/step - loss: 2.3008 - sparse_categorical_a
Epoch 7/10
25/25 [==============================] - 0s 20ms/step - loss: 2.2984 - sparse_categorical_a
Epoch 8/10
25/25 [==============================] - 0s 18ms/step - loss: 2.2977 - sparse_categorical_a
Epoch 9/10
25/25 [==============================] - 0s 19ms/step - loss: 2.2957 - sparse_categorical_a
Epoch 10/10
25/25 [==============================] - 1s 21ms/step - loss: 2.2908 - sparse_categorical_a
<keras.src.callbacks.History at 0x7fa1f120f460>
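
The output shapes and parameter counts in the summary follow from simple arithmetic. The sketch below recomputes them for this model, assuming unpadded ("valid") convolutions with stride 1 and pooling with stride equal to the pool size, which are the Keras defaults used in the program. The helper function names are illustrative.

```python
def conv1d_out(length, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1D convolution."""
    return (length - kernel_size) // stride + 1


def pool1d_out(length, pool_size):
    """Output length of max pooling with stride equal to pool_size."""
    return length // pool_size


def conv1d_params(in_channels, filters, kernel_size):
    return (kernel_size * in_channels + 1) * filters   # +1 for each filter's bias


def dense_params(in_units, out_units):
    return (in_units + 1) * out_units                  # +1 for each unit's bias


shapes = []
length = conv1d_out(100, 3)         # conv1d
shapes.append(length)
length = pool1d_out(length, 2)      # max_pooling1d
shapes.append(length)
length = conv1d_out(length, 3)      # conv1d_1
shapes.append(length)
length = pool1d_out(length, 2)      # max_pooling1d_1
shapes.append(length)

flat = length * 128                 # Flatten: timesteps x filters
total = (conv1d_params(1, 64, 3) + conv1d_params(64, 128, 3)
         + dense_params(flat, 128) + dense_params(128, 10))
print(shapes, flat, total)
```

The numbers match the summary: sequence lengths 98, 49, 47 and 23, a flattened size of 2944, and 403210 parameters in total.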

Exercise 6

2D convolutional network
Question
6. Implement a 2D convolutional network using the CIFAR-10 dataset for image classification.

Code

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, Flatten

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

unique_values_train = np.unique(y_train)
n_features = X_train.shape[1:]  # input shape of one image
n_classes = len(unique_values_train)

# Normalize pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0


def build_neural_network(n_features, n_classes):
    # Define input layer
    inputs = Input(shape=n_features)

    # Add convolutional layers
    network_1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
    network_2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(network_1)
    network_3 = MaxPooling2D(pool_size=(2, 2))(network_2)

    network_4 = Conv2D(128, kernel_size=(3, 3), activation='relu')(network_3)
    network_5 = Conv2D(256, kernel_size=(3, 3), activation='relu')(network_4)
    network_6 = MaxPooling2D(pool_size=(2, 2))(network_5)

    # Flatten the output and add fully connected layers
    network_7 = Flatten()(network_6)
    network_8 = Dense(128, activation='relu')(network_7)

    # Define output layer
    output = Dense(n_classes, activation="softmax", name='output')(network_8)

    # Define the model by specifying the input and output layers
    model = Model(inputs=inputs, outputs=output)
    # print(model.summary())
    return model


# Compile the model. cifar10.load_data() returns integer class labels, so the
# sparse variants of the loss and metric are used; the original
# CategoricalCrossentropy/CategoricalAccuracy would need one-hot labels.
metric = [tf.keras.metrics.SparseCategoricalAccuracy()]
opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.SparseCategoricalCrossentropy()
network = build_neural_network(n_features, n_classes)
network.compile(loss=loss, optimizer=opt, metrics=metric)
network.summary()

Output
Model: "model_1"
_________________________________________________________________
 Layer (type)                    Output Shape          Param #
=================================================================
 input_2 (InputLayer)            [(None, 32, 32, 3)]   0
 conv2d_4 (Conv2D)               (None, 30, 30, 32)    896
 conv2d_5 (Conv2D)               (None, 28, 28, 64)    18496
 max_pooling2d_2 (MaxPooling2D)  (None, 14, 14, 64)    0
 conv2d_6 (Conv2D)               (None, 12, 12, 128)   73856
 conv2d_7 (Conv2D)               (None, 10, 10, 256)   295168
 max_pooling2d_3 (MaxPooling2D)  (None, 5, 5, 256)     0
 flatten_1 (Flatten)             (None, 6400)          0
 dense_1 (Dense)                 (None, 128)           819328
 output (Dense)                  (None, 10)            1290
=================================================================
Total params: 1209034 (4.61 MB)
Trainable params: 1209034 (4.61 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
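
The network ends in a 10-way softmax and is compiled with a cross-entropy loss. Keras offers a categorical variant (one-hot targets) and a sparse variant (integer targets); the NumPy sketch below, with made-up probabilities and labels, illustrates that the two compute the same quantity and differ only in how the targets are encoded. The function names mirror the Keras ones for readability but are plain re-implementations, not the library code.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, n_classes)."""
    encoded = np.zeros((len(labels), n_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def categorical_crossentropy(y_onehot, probs):
    """Mean cross-entropy when targets are one-hot vectors."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))


def sparse_categorical_crossentropy(labels, probs):
    """Mean cross-entropy when targets are integer class indices."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


# Hypothetical softmax outputs for three images over four classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
labels = np.array([0, 1, 3])

loss_onehot = categorical_crossentropy(one_hot(labels, 4), probs)
loss_sparse = sparse_categorical_crossentropy(labels, probs)
print(loss_onehot, loss_sparse)
```

Since cifar10.load_data() returns integer labels, training this model needs either the sparse loss or labels converted to one-hot first.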

Exercise 7

Bias, variance, and the trade-off


Question
7. Evaluate the trained model on the Fashion MNIST dataset and analyze bias, variance, and the trade-off.

Code

import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.utils import resample

# Load Fashion MNIST dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Normalize the pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0

# Flatten the images to make them compatible with logistic regression
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Convert labels to binary (for simplicity, classify between two classes:
# 0 (T-shirt/top) and 1 (Trouser))
binary_indices_train = np.where((y_train == 0) | (y_train == 1))[0]
binary_indices_test = np.where((y_test == 0) | (y_test == 1))[0]
X_train_binary, y_train_binary = X_train[binary_indices_train], y_train[binary_indices_train]
X_test_binary, y_test_binary = X_test[binary_indices_test], y_test[binary_indices_test]

# Convert labels to 0 and 1
y_train_binary = np.where(y_train_binary == 0, 0, 1)
y_test_binary = np.where(y_test_binary == 0, 0, 1)


# Define sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Function to train the model and return predictions
def train_and_predict(X_train, y_train, X_test):
    m = np.zeros(X_train.shape[1])
    c = 0
    LR = 0.0001
    epochs = 50
    # Stochastic gradient ascent on the log-likelihood, one sample at a time
    for epoch in range(1, epochs + 1):
        for i in range(len(X_train)):
            gr_wrt_m = X_train[i] * (y_train[i] - sigmoid(np.dot(m.T, X_train[i]) + c))
            gr_wrt_c = y_train[i] - sigmoid(np.dot(m.T, X_train[i]) + c)
            m = m + LR * gr_wrt_m
            c = c + LR * gr_wrt_c
    # Predicted probabilities for the test set
    predictions = []
    for i in range(len(X_test)):
        z = np.dot(m, X_test[i]) + c
        y_pred = sigmoid(z)
        predictions.append(y_pred)
    return np.array(predictions)


# Train multiple models on bootstrap resamples and collect their predictions
n_models = 10
all_predictions = []
for _ in range(n_models):
    X_resampled, y_resampled = resample(X_train_binary, y_train_binary)
    predictions = train_and_predict(X_resampled, y_resampled, X_test_binary)
    all_predictions.append(predictions)
all_predictions = np.array(all_predictions)

# Calculate the average prediction
average_prediction = np.mean(all_predictions, axis=0)

# Calculate bias (the mean squared error of the averaged prediction,
# i.e. an estimate of the squared bias)
bias = mean_squared_error(y_test_binary, average_prediction)

# Calculate variance (spread of the models around their average)
variance = np.mean(np.var(all_predictions, axis=0))

# Output bias and variance
print("Bias:", bias)
print("Variance:", variance)

Output
Bias: 0.014884992836837167
Variance: 8.482621657344636e-05
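
The two printed quantities are linked: the squared error of the averaged prediction plus the variance across models equals the average squared error of the individual models. The sketch below checks this identity on made-up predictions (10 hypothetical models, 50 test points); it does not retrain the Fashion MNIST models, and the variable names only loosely follow the program above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 10 models, 50 test points, binary targets
y_true = rng.integers(0, 2, size=50).astype(float)
all_predictions = np.clip(y_true + rng.normal(scale=0.2, size=(10, 50)), 0.0, 1.0)

average_prediction = all_predictions.mean(axis=0)

# Squared error of the averaged prediction (the quantity printed as "Bias")
bias_sq = np.mean((y_true - average_prediction) ** 2)
# Spread of the individual models around their average
variance = np.mean(np.var(all_predictions, axis=0))
# Average squared error of the individual models
mean_model_mse = np.mean((all_predictions - y_true) ** 2)

print(bias_sq + variance, mean_model_mse)
```

Read this way, the small variance in the output suggests the resampled models largely agree with one another, so most of the remaining error comes from the bias term.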

Exercise 8

ResNet
Question
8. Build a ResNet model with residual connections and Batch Normalization using the SVHN dataset (Street View House Numbers).

Code

import numpy as np  # needed for np.concatenate below
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation, Add,
                                     GlobalAveragePooling2D, Dense, Flatten)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# Load the SVHN dataset
(ds_train, ds_test), ds_info = tfds.load(
    'svhn_cropped',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


# Preprocess the data
def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.one_hot(label, depth=10)
    return image, label


AUTOTUNE = tf.data.AUTOTUNE
train_ds = ds_train.map(preprocess, num_parallel_calls=AUTOTUNE)
test_ds = ds_test.map(preprocess, num_parallel_calls=AUTOTUNE)

# Batch and prefetch the data
BATCH_SIZE = 32
train_ds = train_ds.cache().shuffle(1000).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
test_ds = test_ds.batch(BATCH_SIZE).cache().prefetch(buffer_size=AUTOTUNE)


# Define ResNet block
def resnet_block(inputs, filters, kernel_size=3, stride=1):
    x = Conv2D(filters, kernel_size=kernel_size, strides=stride, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(x)
    x = BatchNormalization()(x)

    # Project the shortcut with a 1x1 convolution when the shape changes
    if stride != 1 or inputs.shape[-1] != filters:
        inputs = Conv2D(filters, kernel_size=1, strides=stride, padding='same')(inputs)
        inputs = BatchNormalization()(inputs)

    # Residual connection
    x = Add()([x, inputs])
    x = Activation('relu')(x)
    return x


# Build ResNet model
def build_resnet(input_shape, num_classes):
    inputs = Input(shape=input_shape)

    x = Conv2D(32, kernel_size=3, strides=1, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = resnet_block(x, 32)
    x = resnet_block(x, 32)

    x = resnet_block(x, 64, stride=2)
    x = resnet_block(x, 64)

    x = resnet_block(x, 128, stride=2)
    x = resnet_block(x, 128)

    x = GlobalAveragePooling2D()(x)
    x = Flatten()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs, outputs)
    return model


# Build and compile the model
input_shape = (32, 32, 3)
num_classes = 10
model = build_resnet(input_shape, num_classes)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print model summary
model.summary()

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

EPOCHS = 5

# Convert the dataset to NumPy arrays
train_images = []
train_labels = []
for image, label in train_ds:
    train_images.append(image.numpy())
    train_labels.append(label.numpy())

# Concatenate the per-batch tensors along the first axis
train_images = np.concatenate(train_images, axis=0)
train_labels = np.concatenate(train_labels, axis=0)

history = model.fit(datagen.flow(train_images,
                                 train_labels,
                                 batch_size=BATCH_SIZE),
                    validation_data=test_ds,
                    epochs=EPOCHS)
loss, accuracy = model.evaluate(test_ds)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Output
Model: "model"
___________________________________________________________________________________________
 Layer (type)                                        Output Shape         Param #  Connected to
===========================================================================================
 input_1 (InputLayer)                                [(None, 32, 32, 3)]  0        []
 conv2d (Conv2D)                                     (None, 32, 32, 32)   896      ['input_1[0][0]']
 batch_normalization (BatchNormalization)            (None, 32, 32, 32)   128      ['conv2d[0][0]']
 activation (Activation)                             (None, 32, 32, 32)   0        ['batch_normalization[0...
 activation_1 (Activation)                           (None, 32, 32, 32)   0        ['batch_normalization_1...]
 conv2d_2 (Conv2D)                                   (None, 32, 32, 32)   9248     ['activation_1[0][0]']
 batch_normalization_2 (BatchNormalization)          (None, 32, 32, 32)   128      ['conv2d_2[0][0]']
 add (Add)                                           (None, 32, 32, 32)   0        ['batch_normalization_2..., 'activation[0][0]']
 activation_2 (Activation)                           (None, 32, 32, 32)   0        ['add[0][0]']
 conv2d_3 (Conv2D)                                   (None, 32, 32, 32)   9248     ['activation_2[0][0]']
 batch_normalization_3 (BatchNormalization)          (None, 32, 32, 32)   128      ['conv2d_3[0][0]']
 activation_3 (Activation)                           (None, 32, 32, 32)   0        ['batch_normalization_3...]
 conv2d_4 (Conv2D)                                   (None, 32, 32, 32)   9248     ['activation_3[0][0]']
 batch_normalization_4 (BatchNormalization)          (None, 32, 32, 32)   128      ['conv2d_4[0][0]']
 add_1 (Add)                                         (None, 32, 32, 32)   0        ['batch_normalization_4..., 'activation_2[0][0]']
 activation_4 (Activation)                           (None, 32, 32, 32)   0        ['add_1[0][0]']
 conv2d_5 (Conv2D)                                   (None, 16, 16, 64)   18496    ['activation_4[0][0]']
 batch_normalization_5 (BatchNormalization)          (None, 16, 16, 64)   256      ['conv2d_5[0][0]']
 activation_5 (Activation)                           (None, 16, 16, 64)   0        ['batch_normalization_5...]
 conv2d_6 (Conv2D)                                   (None, 16, 16, 64)   36928    ['activation_5[0][0]']
 conv2d_7 (Conv2D)                                   (None, 16, 16, 64)   2112     ['activation_4[0][0]']
 batch_normalization_6 (BatchNormalization)          (None, 16, 16, 64)   256      ['conv2d_6[0][0]']
 batch_normalization_7 (BatchNormalization)          (None, 16, 16, 64)   256      ['conv2d_7[0][0]']
 add_2 (Add)                                         (None, 16, 16, 64)   0        ['batch_normalization_6..., 'batch_normalization_...']
 activation_6 (Activation)                           (None, 16, 16, 64)   0        ['add_2[0][0]']
 conv2d_8 (Conv2D)                                   (None, 16, 16, 64)   36928    ['activation_6[0][0]']
 batch_normalization_8 (BatchNormalization)          (None, 16, 16, 64)   256      ['conv2d_8[0][0]']
 activation_7 (Activation)                           (None, 16, 16, 64)   0        ['batch_normalization_8...]
 conv2d_9 (Conv2D)                                   (None, 16, 16, 64)   36928    ['activation_7[0][0]']
 batch_normalization_9 (BatchNormalization)          (None, 16, 16, 64)   256      ['conv2d_9[0][0]']
 add_3 (Add)                                         (None, 16, 16, 64)   0        ['batch_normalization_9..., 'activation_6[0][0]']
 activation_8 (Activation)                           (None, 16, 16, 64)   0        ['add_3[0][0]']
 conv2d_10 (Conv2D)                                  (None, 8, 8, 128)    73856    ['activation_8[0][0]']
 batch_normalization_10 (BatchNormalization)         (None, 8, 8, 128)    512      ['conv2d_10[0][0]']
 activation_9 (Activation)                           (None, 8, 8, 128)    0        ['batch_normalization_1...']
 conv2d_11 (Conv2D)                                  (None, 8, 8, 128)    147584   ['activation_9[0][0]']
 conv2d_12 (Conv2D)                                  (None, 8, 8, 128)    8320     ['activation_8[0][0]']
 batch_normalization_11 (BatchNormalization)         (None, 8, 8, 128)    512      ['conv2d_11[0][0]']
 batch_normalization_12 (BatchNormalization)         (None, 8, 8, 128)    512      ['conv2d_12[0][0]']
 add_4 (Add)                                         (None, 8, 8, 128)    0        ['batch_normalization_1...', 'batch_normalization_1...']
 activation_10 (Activation)                          (None, 8, 8, 128)    0        ['add_4[0][0]']
 conv2d_13 (Conv2D)                                  (None, 8, 8, 128)    147584   ['activation_10[0][0]']
 batch_normalization_13 (BatchNormalization)         (None, 8, 8, 128)    512      ['conv2d_13[0][0]']
 activation_11 (Activation)                          (None, 8, 8, 128)    0        ['batch_normalization_1...']
 conv2d_14 (Conv2D)                                  (None, 8, 8, 128)    147584   ['activation_11[0][0]']
 batch_normalization_14 (BatchNormalization)         (None, 8, 8, 128)    512      ['conv2d_14[0][0]']
 add_5 (Add)                                         (None, 8, 8, 128)    0        ['batch_normalization_1...', 'activation_10[0][0]']
 activation_12 (Activation)                          (None, 8, 8, 128)    0        ['add_5[0][0]']
 global_average_pooling2d (GlobalAveragePooling2D)   (None, 128)          0        ['activation_12[0][0]']
 flatten (Flatten)                                   (None, 128)          0        ['global_average_poolin...0]']
 dense (Dense)                                       (None, 10)           1290     ['flatten[0][0]']
===========================================================================================
Total params: 699978 (2.67 MB)
Trainable params: 697738 (2.66 MB)
Non-trainable params: 2240 (8.75 KB)

Epoch 1/2
1145/1145 [==============================] - 816s 707ms/step - loss: 0.7138 - accuracy: 0.7
Epoch 2/2
1145/1145 [==============================] - 786s 687ms/step - loss: 0.3181 - accuracy: 0.9

407/407 [==============================] - 73s 179ms/step - loss: 2.2911 - accuracy: 0.1179


Test Accuracy: 11.79%
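
The 2240 non-trainable parameters in the summary come from the BatchNormalization layers. A short sketch of the count follows, assuming each BatchNormalization layer holds four values per channel: gamma and beta (trainable) and the moving mean and moving variance (not trainable). The list bn_channels is read off build_resnet by hand and is illustrative.

```python
def batchnorm_params(channels):
    """Per-layer parameter split for BatchNormalization over `channels` channels."""
    return {"trainable": 2 * channels,       # gamma, beta
            "non_trainable": 2 * channels}   # moving mean, moving variance


# Channel width of every BatchNormalization layer in build_resnet: the stem,
# then two blocks per stage. The first block of the 64- and 128-filter stages
# adds a third BatchNormalization on its 1x1 shortcut.
bn_channels = (
    [32]                        # stem
    + [32, 32] * 2              # two blocks at 32 filters, no projection
    + [64] * 3 + [64] * 2       # projection block + plain block at 64 filters
    + [128] * 3 + [128] * 2     # projection block + plain block at 128 filters
)

non_trainable = sum(batchnorm_params(c)["non_trainable"] for c in bn_channels)
per_layer_32 = sum(batchnorm_params(32).values())
print(len(bn_channels), per_layer_32, non_trainable)
```

This gives 15 BatchNormalization layers, 128 parameters for a 32-channel layer, and 2240 non-trainable parameters overall, in line with the summary.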

Exercise 9

Sentiment analysis
Question
9. Implement a transformer for sentiment analysis using the IMDB movie review dataset.

Code

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, GlobalAveragePooling1D
, LayerNormalization, MultiHeadAttention, Conv1D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import numpy as np
import pandas as pd

# Constants
MAXLEN = 200 # Maximum sequence length
NUM_HEADS = 2 # Number of attention heads (reduced for simplicity)
FF_DIM = 128 # Feed-forward dimension in each Transformer block (reduced for simplicity)
NUM_TRANSFORMER_BLOCKS = 2 # Number of Transformer blocks (reduced for simplicity)
VOCAB_SIZE = 5000 # Vocabulary size
EMBED_DIM = 128 # Embedding dimension (reduced for simplicity)

# Load IMDB dataset


(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=VOCAB_SIZE)
df_train = pd.DataFrame({'review': x_train, 'sentiment': y_train})
df_test = pd.DataFrame({'review': x_test, 'sentiment': y_test})
print(df_train.head)
# print(x_train[0],y_train[0])

# Get the word index


word_index = imdb.get_word_index()
# Reserve the first indices for special tokens
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2 # unknown
word_index["<UNUSED>"] = 3

# Reverse the word index to map integers to words


reverse_word_index = {value: key for key, value in word_index.items()}

# Function to decode reviews back to words


def decode_review(encoded_review):
return ' '.join([reverse_word_index.get(i, '?') for i in encoded_review])

# Print the first 5 reviews and their labels


for i in range(5):
print(f"Review {i+1}: {decode_review(x_train[i])}")
print(f"Label {i+1}: {y_train[i]}")
print()
x_train_df = pd.DataFrame({'review': x_train})

21
EXERCISE 9. SENTIMENT ANALYSIS
y_train_df = pd.DataFrame({'label': y_train})
x_test_df = pd.DataFrame({'review': x_test})
y_test_df = pd.DataFrame({'label': y_test})

# Select half of the dataset using iloc


train_size = len(x_train_df) // 4
test_size = len(x_test_df) // 4

x_train_half = x_train_df.iloc[:train_size]
y_train_half = y_train_df.iloc[:train_size]
x_test_half = x_test_df.iloc[:test_size]
y_test_half = y_test_df.iloc[:test_size]

# Ensure sequences are padded to the same length


x_train_half_padded = pad_sequences(x_train_half['review'].tolist(), maxlen=MAXLEN)
x_test_half_padded = pad_sequences(x_test_half['review'].tolist(), maxlen=MAXLEN)

# Build Transformer model
def build_transformer_model(maxlen, vocab_size, embed_dim, num_heads, ff_dim,
                            num_transformer_blocks):
    inputs = Input(shape=(maxlen,))
    embedding_layer = Embedding(input_dim=vocab_size, output_dim=embed_dim)(inputs)

    # Sinusoidal positional encoding: each sin/cos column pair shares the
    # frequency 1 / 10000**(2i / embed_dim), where 2i is the even column index
    positions = np.arange(maxlen).reshape(-1, 1)
    div_term = 10000 ** (np.arange(0, embed_dim, 2) / embed_dim)
    positional_encoding = np.zeros((maxlen, embed_dim), dtype="float32")
    positional_encoding[:, 0::2] = np.sin(positions / div_term)
    positional_encoding[:, 1::2] = np.cos(positions / div_term)
    x = embedding_layer + positional_encoding

    # Transformer blocks
    for _ in range(num_transformer_blocks):
        # Multi-head self-attention
        x1 = LayerNormalization()(x)
        x2 = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)(x1, x1)
        x = x1 + x2  # Residual connection (added to the normalized input)

        # Feed-forward network (position-wise; ff_dim must equal embed_dim
        # for the residual addition below to be shape-compatible)
        x1 = LayerNormalization()(x)
        x2 = Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x1)
        x = x1 + x2

    # Global average pooling and classification
    x = GlobalAveragePooling1D()(x)
    outputs = Dense(1, activation="sigmoid")(x)

    model = Model(inputs=inputs, outputs=outputs)
    return model

# Build and compile the model
transformer_model = build_transformer_model(MAXLEN, VOCAB_SIZE, EMBED_DIM, NUM_HEADS,
                                            FF_DIM, NUM_TRANSFORMER_BLOCKS)
transformer_model.compile(optimizer=Adam(learning_rate=1e-4), loss="binary_crossentropy",
                          metrics=["accuracy"])

# Print model summary


transformer_model.summary()

# Train the model
transformer_model.fit(
    np.array(x_train_half_padded), np.array(y_train_half['label']),
    epochs=2, batch_size=32,
    validation_data=(np.array(x_test_half_padded), np.array(y_test_half['label'])))

# Evaluate the model
loss, accuracy = transformer_model.evaluate(np.array(x_test_half_padded),
                                            np.array(y_test_half['label']))
print(f"Test Accuracy: {accuracy * 100:.2f}%")

def predict_sentiment(review, model, maxlen):
    # Tokenize with the same offset-by-3 indices that imdb.load_data uses
    review_seq = imdb.get_word_index()
    review_seq = {k: (v + 3) for k, v in review_seq.items()}
    # The IMDB word index is lowercase, so lowercase each word and strip
    # surrounding punctuation; unknown or out-of-range words map to 2 (<UNK>)
    words = [word.strip('.,!?;:"()').lower() for word in review.split()]
    tokenized_review = [review_seq[word]
                        if word in review_seq and review_seq[word] < VOCAB_SIZE
                        else 2 for word in words]
    padded_review = pad_sequences([tokenized_review], maxlen=maxlen)

    # Predict sentiment
    prediction = model.predict(padded_review)[0, 0]
    sentiment = "positive" if prediction >= 0.5 else "negative"
    confidence = prediction if prediction >= 0.5 else 1 - prediction

    return sentiment, confidence

# Example usage of sentiment analysis function


new_review = "This movie good! The acting was bad and the plot was not engaging."
sentiment, confidence = predict_sentiment(new_review, transformer_model, MAXLEN)
print(f"Review: '{new_review}'")
print(f"Predicted Sentiment: {sentiment} (Confidence: {confidence * 100:.2f}%)")
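
The positional-encoding step of the model can be examined on its own. The following is a minimal NumPy sketch of the standard sinusoidal scheme, in which each column pair (2i, 2i+1) shares the frequency 1/10000^(2i/d). The function name `sinusoidal_encoding` is hypothetical and is not part of the listing above; an even `embed_dim` is assumed.

```python
import numpy as np

def sinusoidal_encoding(maxlen, embed_dim):
    """Return a (maxlen, embed_dim) array of sinusoidal position encodings.

    Assumes embed_dim is even.
    """
    positions = np.arange(maxlen).reshape(-1, 1)                   # shape (maxlen, 1)
    div_term = 10000 ** (np.arange(0, embed_dim, 2) / embed_dim)   # shape (embed_dim // 2,)
    encoding = np.zeros((maxlen, embed_dim), dtype="float32")
    encoding[:, 0::2] = np.sin(positions / div_term)  # even columns
    encoding[:, 1::2] = np.cos(positions / div_term)  # odd columns
    return encoding

pe = sinusoidal_encoding(200, 128)
print(pe.shape)   # (200, 128)
print(pe[0, :4])  # position 0: sin(0) = 0 and cos(0) = 1 alternate
```

Because the table depends only on `maxlen` and `embed_dim`, it can be computed once and added to the embedding output as a constant.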

Output

Model: "model_1"
_____________________________________________________________________________________________________________________________________
Layer (type)                                         Output Shape      Param #  Connected to
=====================================================================================================================================
input_2 (InputLayer)                                 [(None, 200)]     0        []
embedding_1 (Embedding)                              (None, 200, 128)  640000   ['input_2[0][0]']
tf.__operators__.add_5 (TFOpLambda)                  (None, 200, 128)  0        ['embedding_1[0][0]']
layer_normalization_4 (LayerNormalization)           (None, 200, 128)  256      ['tf.__operators__.add_...']
multi_head_attention_2 (MultiHeadAttention)          (None, 200, 128)  66048    ['layer_normalization_4..., 'layer_normalization_...']
tf.__operators__.add_6 (TFOpLambda)                  (None, 200, 128)  0        ['layer_normalization_4..., 'multi_head_attention...]']
layer_normalization_5 (LayerNormalization)           (None, 200, 128)  256      ['tf.__operators__.add_...']
conv1d_2 (Conv1D)                                    (None, 200, 128)  16512    ['layer_normalization_5...]
tf.__operators__.add_7 (TFOpLambda)                  (None, 200, 128)  0        ['layer_normalization_5..., 'conv1d_2[0][0]']
layer_normalization_6 (LayerNormalization)           (None, 200, 128)  256      ['tf.__operators__.add_...']
multi_head_attention_3 (MultiHeadAttention)          (None, 200, 128)  66048    ['layer_normalization_6..., 'layer_normalization_...']
tf.__operators__.add_8 (TFOpLambda)                  (None, 200, 128)  0        ['layer_normalization_6..., 'multi_head_attention...]']
layer_normalization_7 (LayerNormalization)           (None, 200, 128)  256      ['tf.__operators__.add_...']
conv1d_3 (Conv1D)                                    (None, 200, 128)  16512    ['layer_normalization_7...]
tf.__operators__.add_9 (TFOpLambda)                  (None, 200, 128)  0        ['layer_normalization_7..., 'conv1d_3[0][0]']
global_average_pooling1d_1 (GlobalAveragePooling1D)  (None, 128)       0        ['tf.__operators__.add_...']
dense_1 (Dense)                                      (None, 1)         129      ['global_average_poolin...][0]']
=====================================================================================================================================
Total params: 806273 (3.08 MB)
Trainable params: 806273 (3.08 MB)
Non-trainable params: 0 (0.00 Byte)
_____________________________________________________________________________________________________________________________________
Epoch 1/2
196/196 [==============================] - 147s 733ms/step - loss: 0.6982 - accuracy: 0.498
Epoch 2/2
196/196 [==============================] - 139s 711ms/step - loss: 0.6922 - accuracy: 0.522
196/196 [==============================] - 37s 188ms/step - loss: 0.6852 - accuracy: 0.5230
Test Accuracy: 52.30%

1/1 [==============================] - 0s 29ms/step


Review: 'This movie good! The acting was very good and the plot was too much engaging.'
Predicted Sentiment: negative (Confidence: 50.31%)
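
How free text is mapped to word indices determines what the model sees at prediction time. The sketch below uses a tiny made-up vocabulary (`toy_index` is hypothetical and stands in for the offset IMDB word index; `encode_review` is likewise a hypothetical helper) to show lowercasing, punctuation stripping, and the fallback to index 2 for unknown or out-of-range words.

```python
import string

UNK = 2  # index reserved for unknown words, as in the word index above

def encode_review(review, index, vocab_size):
    """Map a raw review string to a list of integer tokens."""
    tokens = []
    for raw in review.split():
        word = raw.strip(string.punctuation).lower()
        if not word:
            continue  # the token consisted of punctuation only
        idx = index.get(word, UNK)
        tokens.append(idx if idx < vocab_size else UNK)
    return tokens

# Toy vocabulary (an assumption for illustration, not the real IMDB indices)
toy_index = {"this": 14, "movie": 20, "good": 52, "acting": 116, "rare": 9000}
encoded = encode_review("This movie good! Rare acting?", toy_index, vocab_size=5000)
print(encoded)  # [14, 20, 52, 2, 116]
```

Here "Rare" is in the toy vocabulary but its index exceeds `vocab_size`, so it is replaced by 2, the same treatment a word outside the 5000 most frequent words would receive.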

Exercise 10

Markov Decision Processes


Question
10. Implement a Markov Decision Process (MDP) and find its optimal policy using the given environment's
actions, transitions, and rewards. Print the optimal policy for each state.
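
Value iteration, the method used in the code for this exercise, repeatedly applies the Bellman optimality backup. For deterministic transitions, where action $a$ in state $s$ leads to a single next state $s' = T(s, a)$, and assuming the reward is attached to the state being entered, one common form is:

```latex
V_{k+1}(s) = \max_{a} \bigl[ R(s') + \gamma V_k(s') \bigr],
\qquad
\pi^{*}(s) = \arg\max_{a} \bigl[ R(s') + \gamma V^{*}(s') \bigr],
\qquad
s' = T(s, a)
```

Here $T$, $R$, and $\gamma$ play the roles of `transition_probs`, `rewards`, and `gamma`, and the sweep stops once the largest change in $V$ falls below `theta`.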

Code

import numpy as np

# Define the grid size
grid_size = 4

# Define actions
actions = ['up', 'down', 'left', 'right']

# Define rewards
rewards = np.zeros((grid_size, grid_size))
rewards[3, 3] = 1 # Reward for reaching the goal state

# Define transitions (deterministic: each action leads to exactly one next state,
# and moves into a wall leave the agent in place)
transition_probs = {
    'up': (lambda x, y: (max(x-1, 0), y)),
    'down': (lambda x, y: (min(x+1, grid_size-1), y)),
    'left': (lambda x, y: (x, max(y-1, 0))),
    'right': (lambda x, y: (x, min(y+1, grid_size-1)))
}

def value_iteration(grid_size, rewards, transition_probs, actions, gamma=0.9, theta=1e-6):
    value_table = np.zeros((grid_size, grid_size))
    policy = np.zeros((grid_size, grid_size), dtype=int)

    while True:
        delta = 0
        for x in range(grid_size):
            for y in range(grid_size):
                if (x, y) == (3, 3): # Skip the terminal state
                    continue
                v = value_table[x, y]
                q_values = []
                for action in actions:
                    (next_x, next_y) = transition_probs[action](x, y)
                    # The reward is collected on entering the next state, so the
                    # goal reward propagates back through the grid
                    q_value = rewards[next_x, next_y] + gamma * value_table[next_x, next_y]
                    q_values.append(q_value)
                value_table[x, y] = max(q_values)
                policy[x, y] = np.argmax(q_values) # Ties go to the first action in `actions`
                delta = max(delta, abs(v - value_table[x, y]))
        if delta < theta:
            break
    return policy, value_table

def print_policy(policy, actions):
    policy_arrows = np.full(policy.shape, ' ')
    for x in range(policy.shape[0]):
        for y in range(policy.shape[1]):
            if (x, y) == (3, 3):
                policy_arrows[x, y] = 'G' # Goal state
            else:
                policy_arrows[x, y] = actions[policy[x, y]][0].upper()
    for row in policy_arrows:
        print(' '.join(row))

policy, value_table = value_iteration(grid_size, rewards, transition_probs, actions)

print("Optimal Policy:")
print_policy(policy, actions)

Output
Optimal Policy:
D D D D
D D D D
D D D D
R R R G
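
A printed policy can be sanity-checked by rolling it out from the start state. The sketch below is illustrative only: `rollout` and `example_policy` are hypothetical names, the letters follow the U/D/L/R/G convention of `print_policy`, and moves are clamped at the walls in the same way as `transition_probs`.

```python
# Hypothetical helper: follow a printed policy grid until the goal or a step limit.
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

def rollout(policy_rows, start=(0, 0), max_steps=20):
    """Return the list of visited states; stops at 'G' or after max_steps moves."""
    size = len(policy_rows)
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        symbol = policy_rows[x][y]
        if symbol == 'G':
            break
        dx, dy = MOVES[symbol]
        # Clamp to the grid, mirroring the wall behaviour of the environment
        x = min(max(x + dx, 0), size - 1)
        y = min(max(y + dy, 0), size - 1)
        path.append((x, y))
    return path

# Example grid (an assumption for illustration): go down, then right along the bottom row
example_policy = [
    ['D', 'D', 'D', 'D'],
    ['D', 'D', 'D', 'D'],
    ['D', 'D', 'D', 'D'],
    ['R', 'R', 'R', 'G'],
]
path = rollout(example_policy)
print(path[-1], len(path) - 1)  # (3, 3) 6
```

A policy that never reaches `G`, such as one that points up in every cell, shows up immediately as a path that stays at the start state until `max_steps` runs out.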

Exercise 11

Q-learning algorithm
Question
11. Implement a Q-learning algorithm to solve a tabular reinforcement learning problem using an OpenAI Gym
environment.
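
The heart of tabular Q-learning is a single update per step. As a worked example under the hyperparameters used in this exercise (alpha = 0.1, gamma = 0.99), the sketch below applies the update twice to a zero-initialized entry whose action reaches the goal with reward 1. `q_update` is a hypothetical helper name, not part of the exercise code.

```python
def q_update(old_value, reward, next_max, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: blend the old estimate with the TD target."""
    target = reward + gamma * next_max
    return (1 - alpha) * old_value + alpha * target

first = q_update(old_value=0.0, reward=1.0, next_max=0.0)
second = q_update(old_value=first, reward=1.0, next_max=0.0)
print(round(first, 4), round(second, 4))  # 0.1 0.19
```

The first value matches the only non-zero entry (0.1, for state 14 and action 2) in the earliest Q-table snapshot of the output for this exercise.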

Code

import gym
import numpy as np
import random

# Initialize the FrozenLake environment


env = gym.make("FrozenLake-v1", is_slippery=False)

# Q-learning parameters
alpha = 0.1 # Learning rate
gamma = 0.99 # Discount factor
epsilon = 1.0 # Exploration rate
epsilon_min = 0.1 # Minimum exploration rate
epsilon_decay = 0.995 # Decay rate for exploration probability

# Initialize the Q-table


q_table = np.zeros((env.observation_space.n, env.action_space.n))

# Training parameters
num_episodes = 1000
max_steps_per_episode = 100

# Q-learning algorithm
# Note: this listing uses the classic Gym API (gym < 0.26), in which reset()
# returns only the observation and step() returns (next_state, reward, done, info)
for episode in range(num_episodes):
    state = env.reset()
    done = False
    step = 0
    total_reward = 0

    while not done and step < max_steps_per_episode:
        # Exploration-exploitation tradeoff
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample() # Explore
        else:
            action = np.argmax(q_table[state, :]) # Exploit

        # Take the action and observe the outcome
        next_state, reward, done, _ = env.step(action)

        # Update the Q-table
        old_value = q_table[state, action]
        next_max = np.max(q_table[next_state, :])
        new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
        q_table[state, action] = new_value

        state = next_state
        step += 1
        total_reward += reward

    # Decay the exploration rate (once per episode)
    if epsilon > epsilon_min:
        epsilon *= epsilon_decay

    if (episode + 1) % 100 == 0:
        print(f'Episode {episode + 1}/{num_episodes} - Total reward: {total_reward} - Epsilon: {epsilon}')
        print(f'Q-table snapshot:\n{q_table}')

# Evaluate the agent
num_eval_episodes = 100
total_rewards = 0

for episode in range(num_eval_episodes):
    state = env.reset()
    done = False
    step = 0
    episode_reward = 0

    while not done and step < max_steps_per_episode:
        action = np.argmax(q_table[state, :]) # Always exploit during evaluation
        next_state, reward, done, _ = env.step(action)
        episode_reward += reward
        state = next_state
        step += 1

    total_rewards += episode_reward

average_reward = total_rewards / num_eval_episodes
print(f'Average reward over {num_eval_episodes} evaluation episodes: {average_reward}')

env.close()
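
The exploration schedule can be reproduced without the environment. The sketch below replays the multiplicative decay from the listing (start 1.0, factor 0.995, floor 0.1) and counts how many episodes it takes to reach the floor. Because the decay is applied whenever epsilon is still above `epsilon_min`, the final value ends slightly below 0.1. The clamped variant at the end is an alternative shown for comparison, not something the listing does.

```python
epsilon, epsilon_min, epsilon_decay = 1.0, 0.1, 0.995

episodes_until_floor = 0
while epsilon > epsilon_min:
    epsilon *= epsilon_decay   # same rule as in the training loop
    episodes_until_floor += 1

print(episodes_until_floor, epsilon)

# Alternative (an assumption, not in the listing): clamp so epsilon never drops below the floor
clamped = max(epsilon_min, 1.0 * epsilon_decay ** episodes_until_floor)
print(clamped)  # 0.1
```

The value reached here agrees with the epsilon reported from episode 500 onward in the output of this exercise.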

Output
Episode 100/1000 - Total reward: 0.0 - Epsilon: 0.6057704364907278
Q-table snapshot:
[[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0.1 0. ]
[0. 0. 0. 0. ]]

Episode 200/1000 - Total reward: 0.0 - Epsilon: 0.3669578217261671
Q-table snapshot:
[[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0.1 0. ]
[0. 0. 0. 0. ]]
Episode 300/1000 - Total reward: 1.0 - Epsilon: 0.22229219984074702
Q-table snapshot:
[[6.02547296e-07 1.51495463e-04 0.00000000e+00 2.20400503e-06]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 1.53648100e-03 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 1.03181644e-02 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 4.51467432e-02 0.00000000e+00]
[0.00000000e+00 2.18172292e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 5.69532790e-01 1.83176054e-02]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 400/1000 - Total reward: 1.0 - Epsilon: 0.1346580429260134
Q-table snapshot:
[[9.84775773e-02 8.83934279e-01 9.59888291e-04 2.37303204e-01]
[7.78714740e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[1.94936243e-01 9.28005608e-01 0.00000000e+00 1.80659948e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[8.42392398e-02 0.00000000e+00 9.58402468e-01 1.65558509e-01]
[3.00984444e-01 4.63133002e-02 9.76380016e-01 0.00000000e+00]
[8.96077280e-02 9.89357301e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 4.42146309e-02 5.00856645e-01 0.00000000e+00]
[1.05385835e-01 1.87311993e-01 9.99923823e-01 4.49784855e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 500/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[2.57967746e-01 9.50917648e-01 9.59888291e-04 4.25477197e-01]
[7.78714740e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[2.69194204e-01 9.60574965e-01 0.00000000e+00 3.84676614e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[1.71747383e-01 0.00000000e+00 9.70293865e-01 2.43452026e-01]
[4.79013518e-01 1.45283853e-01 9.80098982e-01 0.00000000e+00]
[5.02068648e-01 9.89999839e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 4.42146309e-02 6.33413939e-01 9.70102057e-02]
[1.84416127e-01 4.04834911e-01 9.99999991e-01 5.02815773e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 600/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.44319831 0.95099002 0.12310583 0.523514 ]
[0.48251959 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.45396019 0.960596 0. 0.49046894]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.3855252 0. 0.970299 0.57497446]
[0.47901352 0.3847749 0.9801 0. ]
[0.59103233 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.80049555 0.09701021]
[0.27840679 0.51601628 1. 0.59349976]
[0. 0. 0. 0. ]]
Episode 700/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.53787186 0.95099005 0.30532645 0.66725256]
[0.67046853 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.50366317 0.96059601 0. 0.57616107]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.58329205 0. 0.970299 0.67687469]
[0.57051419 0.47573037 0.9801 0. ]
[0.663093 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.88928956 0.09701021]
[0.46501218 0.51601628 1. 0.69826843]
[0. 0. 0. 0. ]]
Episode 800/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.61455743 0.95099005 0.47396598 0.76155943]
[0.7814505 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.62488876 0.96059601 0. 0.6455717 ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.6549798 0. 0.970299 0.75115995]
[0.77402109 0.47573037 0.9801 0. ]
[0.74634583 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.89936061 0.09701021]
[0.50655063 0.56341465 1. 0.72645158]
[0. 0. 0. 0. ]]
Episode 900/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.67667275 0.95099005 0.60843455 0.79574437]
[0.83648469 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.71326221 0.96059601 0. 0.72576289]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.71304688 0. 0.970299 0.78912767]
[0.82458289 0.51719403 0.9801 0. ]
[0.80703714 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.92392388 0.09701021]
[0.58198826 0.60607319 1. 0.81368127]
[0. 0. 0. 0. ]]
Episode 1000/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.74843555 0.95099005 0.65121421 0.82343416]
[0.85643383 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.7584305 0.96059601 0. 0.83830325]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.82903825 0. 0.970299 0.84479214]
[0.85042538 0.62833521 0.9801 0. ]
[0.8738945 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.95488444 0.09701021]
[0.67685248 0.76329506 1. 0.81368127]
[0. 0. 0. 0. ]]
Average reward over 100 evaluation episodes: 1.0
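
The learned behaviour can be read directly off the final Q-table by taking the arg-max action in each state. The sketch below copies the last snapshot from the output and follows the greedy actions on a deterministic 4x4 grid. It assumes FrozenLake's action encoding (0 = left, 1 = down, 2 = right, 3 = up) and row-major state numbering; `greedy_path` is a hypothetical helper, and holes are not modelled.

```python
import numpy as np

# Final Q-table snapshot (episode 1000), copied from the output above
q_table = np.array([
    [0.74843555, 0.95099005, 0.65121421, 0.82343416],
    [0.85643383, 0.0, 0.0, 0.0],
    [0.02395337, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.7584305, 0.96059601, 0.0, 0.83830325],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.82903825, 0.0, 0.970299, 0.84479214],
    [0.85042538, 0.62833521, 0.9801, 0.0],
    [0.8738945, 0.99, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.04421463, 0.95488444, 0.09701021],
    [0.67685248, 0.76329506, 1.0, 0.81368127],
    [0.0, 0.0, 0.0, 0.0],
])

ACTION_NAMES = ['left', 'down', 'right', 'up']
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # (row, col) change per action

def greedy_path(q, start=0, goal=15, size=4, max_steps=20):
    """Follow arg-max actions from start; return visited states and action names."""
    state, states, taken = start, [start], []
    for _ in range(max_steps):
        if state == goal:
            break
        action = int(np.argmax(q[state]))
        row, col = divmod(state, size)
        d_row, d_col = DELTAS[action]
        row = min(max(row + d_row, 0), size - 1)  # walls keep the agent in place
        col = min(max(col + d_col, 0), size - 1)
        state = row * size + col
        states.append(state)
        taken.append(ACTION_NAMES[action])
    return states, taken

states, taken = greedy_path(q_table)
print(states)  # [0, 4, 8, 9, 10, 14, 15]
print(taken)   # ['down', 'down', 'right', 'right', 'down', 'right']
```

Reaching the goal in six moves is consistent with the reported average evaluation reward of 1.0, and the largest entry for state 0 (0.95099005) equals 0.99^5, the discounted value of the final reward seen five steps ahead.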
