AML Lab
6. 2D convolutional network
8. ResNet
9. Sentiment analysis
11. Q-learning algorithm
Exercise 1
Gradient descent algorithm
Code
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(X, y, learning_rate, epochs):
    # Fit y = m*X + b by batch gradient descent on the cost (1/(2n)) * sum((m*X + b - y)**2)
    m = 0
    b = 0
    n = len(X)
    costs = []
    for epoch in range(epochs):
        y_pred = m*X + b
        error = y_pred - y
        # Partial derivatives of the cost w.r.t. m and b, scaled by the learning rate
        m -= learning_rate * (1/n) * np.sum(X * error)
        b -= learning_rate * (1/n) * np.sum(error)
        if epoch % 100 == 0:
            # Cost of the parameters used for this epoch's predictions (before the update)
            cost = (1/(2*n)) * np.sum(error**2)
            costs.append(cost)
            print(f'Epoch {epoch}, Cost: {cost}')
    # Plot epoch vs. cost
    plt.plot(range(0, epochs, 100), costs, marker='o')
    plt.xlabel('Epoch')
    plt.ylabel('Cost')
    plt.title('Epoch vs. Cost')
    plt.show()
    return m, b

# Generate some random data for demonstration
np.random.seed(42)
X = np.random.rand(100)
y = 2 + 3*X + np.random.randn(100) * 0.1
learning_rate = 0.1
epochs = 1000
m, b = gradient_descent(X, y, learning_rate, epochs)
print("Slope (m):", m)
print("Intercept (b):", b)
Output
Epoch 0, Cost: 6.201838597515928
Epoch 100, Cost: 0.029416306865408297
Epoch 200, Cost: 0.010164975244755979
Epoch 300, Cost: 0.0055142882242454495
Epoch 400, Cost: 0.0043907872523807415
Epoch 500, Cost: 0.004119374786176467
Epoch 600, Cost: 0.00405380766282203
Epoch 700, Cost: 0.004037968126322113
Epoch 800, Cost: 0.004034141651959707
Epoch 900, Cost: 0.004033217262155361
Figure: Epoch vs. Cost (x-axis: Epoch, y-axis: Cost)
Output
Slope (m): 2.952757653921563
Intercept (b): 2.022149709011742
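The cost minimized above is the ordinary least-squares objective, so the slope and intercept that gradient descent approaches can be cross-checked against a closed-form fit. A minimal sketch, reusing the same data generation, with `np.polyfit` as the reference solver; the 5000 epochs are an assumption, chosen so the iterates settle well beyond the 1000 used above.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, epochs=5000):
    """Same batch update as above, without printing or plotting."""
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        error = m * X + b - y
        m -= learning_rate * (1 / n) * np.sum(X * error)
        b -= learning_rate * (1 / n) * np.sum(error)
    return m, b

np.random.seed(42)
X = np.random.rand(100)
y = 2 + 3 * X + np.random.randn(100) * 0.1

m_gd, b_gd = gradient_descent(X, y)
# Closed-form least-squares fit of a degree-1 polynomial: returns [slope, intercept]
m_ls, b_ls = np.polyfit(X, y, 1)
print("Gradient descent:", m_gd, b_gd)
print("Least squares:   ", m_ls, b_ls)
```

Once converged the two estimates coincide; their remaining distance from the true values 3 and 2 comes from the added noise, not from the optimizer.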
Exercise 2
Stochastic gradient descent algorithm
Code
Output
Epoch 1, Data Point 0: Gradient w.r.t m: [-0.25 -0.75], Gradient w.r.t c: -0.5
Epoch 1, Data Point 1: Gradient w.r.t m: [-0.74991094 -0.99988125], Gradient w.r.t c: -0.49
Epoch 1, Data Point 2: Gradient w.r.t m: [1.50082494 2.00109992], Gradient w.r.t c: 0.50027
Epoch 1, Data Point 3: Gradient w.r.t m: [2.49956097 2.99947317], Gradient w.r.t c: 0.49991
Epoch 2, Data Point 0: Gradient w.r.t m: [-0.25007971 -0.75023912], Gradient w.r.t c: -0.50
Epoch 2, Data Point 1: Gradient w.r.t m: [-0.7503235 -1.00043133], Gradient w.r.t c: -0.500
Epoch 2, Data Point 2: Gradient w.r.t m: [1.49917499 1.99889998], Gradient w.r.t c: 0.49972
Epoch 2, Data Point 3: Gradient w.r.t m: [2.49525134 2.9943016 ], Gradient w.r.t c: 0.49905
Epoch 3, Data Point 0: Gradient w.r.t m: [-0.2501592 -0.75047759], Gradient w.r.t c: -0.500
Epoch 3, Data Point 1: Gradient w.r.t m: [-0.75073501 -1.00098002], Gradient w.r.t c: -0.50
Epoch 3, Data Point 2: Gradient w.r.t m: [1.49752907 1.99670543], Gradient w.r.t c: 0.49917
Epoch 3, Data Point 3: Gradient w.r.t m: [2.4909521 2.98914252], Gradient w.r.t c: 0.498190
Epoch 4, Data Point 0: Gradient w.r.t m: [-0.25023847 -0.75071541], Gradient w.r.t c: -0.50
Epoch 4, Data Point 1: Gradient w.r.t m: [-0.75114549 -1.00152731], Gradient w.r.t c: -0.50
Epoch 4, Data Point 2: Gradient w.r.t m: [1.49588719 1.99451626], Gradient w.r.t c: 0.49862
Epoch 4, Data Point 3: Gradient w.r.t m: [2.48666327 2.98399592], Gradient w.r.t c: 0.49733
Epoch 5, Data Point 0: Gradient w.r.t m: [-0.25031753 -0.75095259], Gradient w.r.t c: -0.50
Epoch 5, Data Point 1: Gradient w.r.t m: [-0.75155492 -1.00207322], Gradient w.r.t c: -0.50
Epoch 5, Data Point 2: Gradient w.r.t m: [1.49424934 1.99233246], Gradient w.r.t c: 0.49808
Epoch 5, Data Point 3: Gradient w.r.t m: [2.48238484 2.97886181], Gradient w.r.t c: 0.49647
Epoch 6, Data Point 0: Gradient w.r.t m: [-0.25039637 -0.75118912], Gradient w.r.t c: -0.50
Epoch 6, Data Point 1: Gradient w.r.t m: [-0.75196331 -1.00261775], Gradient w.r.t c: -0.50
Epoch 6, Data Point 2: Gradient w.r.t m: [1.49261551 1.99015401], Gradient w.r.t c: 0.49753
Epoch 6, Data Point 3: Gradient w.r.t m: [2.47811681 2.97374017], Gradient w.r.t c: 0.49562
Epoch 7, Data Point 0: Gradient w.r.t m: [-0.250475 -0.75142501], Gradient w.r.t c: -0.5009
Epoch 7, Data Point 1: Gradient w.r.t m: [-0.75237068 -1.0031609 ], Gradient w.r.t c: -0.50
Epoch 7, Data Point 2: Gradient w.r.t m: [1.4909857 1.98798093], Gradient w.r.t c: 0.496995
Epoch 7, Data Point 3: Gradient w.r.t m: [2.47385918 2.96863102], Gradient w.r.t c: 0.49477
Epoch 8, Data Point 0: Gradient w.r.t m: [-0.25055342 -0.75166026], Gradient w.r.t c: -0.50
Epoch 8, Data Point 1: Gradient w.r.t m: [-0.752777 -1.00370267], Gradient w.r.t c: -0.5018
Epoch 8, Data Point 2: Gradient w.r.t m: [1.48935989 1.98581319], Gradient w.r.t c: 0.49645
Epoch 8, Data Point 3: Gradient w.r.t m: [2.46961195 2.96353434], Gradient w.r.t c: 0.49392
Epoch 9, Data Point 0: Gradient w.r.t m: [-0.25063162 -0.75189486], Gradient w.r.t c: -0.50
Epoch 9, Data Point 1: Gradient w.r.t m: [-0.7531823 -1.00424307], Gradient w.r.t c: -0.502
Epoch 9, Data Point 2: Gradient w.r.t m: [1.48773809 1.98365078], Gradient w.r.t c: 0.49591
Epoch 9, Data Point 3: Gradient w.r.t m: [2.46537512 2.95845014], Gradient w.r.t c: 0.49307
Epoch 10, Data Point 0: Gradient w.r.t m: [-0.25070961 -0.75212883], Gradient w.r.t c: -0.5
Epoch 10, Data Point 1: Gradient w.r.t m: [-0.75358657 -1.00478209], Gradient w.r.t c: -0.5
Epoch 10, Data Point 2: Gradient w.r.t m: [1.48612028 1.98149371], Gradient w.r.t c: 0.4953
Epoch 10, Data Point 3: Gradient w.r.t m: [2.46114868 2.95337841], Gradient w.r.t c: 0.4922
Epoch 11, Data Point 0: Gradient w.r.t m: [-0.25078739 -0.75236216], Gradient w.r.t c: -0.5
Epoch 11, Data Point 1: Gradient w.r.t m: [-0.75398981 -1.00531975], Gradient w.r.t c: -0.5
Epoch 11, Data Point 2: Gradient w.r.t m: [1.48450646 1.97934195], Gradient w.r.t c: 0.4948
Epoch 11, Data Point 3: Gradient w.r.t m: [2.45693263 2.94831915], Gradient w.r.t c: 0.4913
Epoch 12, Data Point 0: Gradient w.r.t m: [-0.25086495 -0.75259486], Gradient w.r.t c: -0.5
Epoch 12, Data Point 1: Gradient w.r.t m: [-0.75439203 -1.00585605], Gradient w.r.t c: -0.5
Epoch 12, Data Point 2: Gradient w.r.t m: [1.48289663 1.97719551], Gradient w.r.t c: 0.4942
Epoch 12, Data Point 3: Gradient w.r.t m: [2.45272697 2.94327236], Gradient w.r.t c: 0.4905
Epoch 13, Data Point 0: Gradient w.r.t m: [-0.25094231 -0.75282692], Gradient w.r.t c: -0.5
Epoch 13, Data Point 1: Gradient w.r.t m: [-0.75479323 -1.00639098], Gradient w.r.t c: -0.5
Epoch 13, Data Point 2: Gradient w.r.t m: [1.48129078 1.97505437], Gradient w.r.t c: 0.4937
Epoch 13, Data Point 3: Gradient w.r.t m: [2.44853169 2.93823803], Gradient w.r.t c: 0.4897
Epoch 14, Data Point 0: Gradient w.r.t m: [-0.25101945 -0.75305835], Gradient w.r.t c: -0.5
Epoch 14, Data Point 1: Gradient w.r.t m: [-0.75519341 -1.00692455], Gradient w.r.t c: -0.5
Epoch 14, Data Point 2: Gradient w.r.t m: [1.4796889 1.97291853], Gradient w.r.t c: 0.49322
Epoch 14, Data Point 3: Gradient w.r.t m: [2.4443468 2.93321616], Gradient w.r.t c: 0.48886
Epoch 15, Data Point 0: Gradient w.r.t m: [-0.25109638 -0.75328915], Gradient w.r.t c: -0.5
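The code for this exercise is not preserved in this copy. The printed quantities, a gradient vector w.r.t. m and a scalar gradient w.r.t. c for each data point, appear to have the same form as the per-sample logistic-regression update in Exercise 7, so the sketch below illustrates that kind of stochastic update. It is an illustration, not the original code: the four-point dataset, the learning rate and the epoch count are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=1000):
    """Stochastic gradient updates for logistic regression: one step per data point."""
    m = np.zeros(X.shape[1])  # weights
    c = 0.0                   # intercept
    for epoch in range(epochs):
        for i in range(len(X)):
            residual = y[i] - sigmoid(np.dot(m, X[i]) + c)
            gr_wrt_m = X[i] * residual  # gradient of this point's log-likelihood w.r.t. m
            gr_wrt_c = residual         # gradient of this point's log-likelihood w.r.t. c
            # Step uphill on the log-likelihood (equivalently, downhill on cross-entropy)
            m = m + lr * gr_wrt_m
            c = c + lr * gr_wrt_c
    return m, c

# Hypothetical, linearly separable toy data
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])
m, c = sgd_logistic(X, y)
predicted = (sigmoid(X @ m + c) >= 0.5).astype(int)
print(predicted)
```

Updating after every single point, rather than after a full pass as in Exercise 1, is what makes the method stochastic: each step uses a noisy one-sample estimate of the full gradient.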
Exercise 3
Neural network with layers
Code
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Output
Epoch 1/5
1875/1875 [==============================] - 1s 601us/step - loss: 0.2611 - accuracy: 0.925
Epoch 2/5
1875/1875 [==============================] - 1s 602us/step - loss: 0.1148 - accuracy: 0.966
Epoch 3/5
1875/1875 [==============================] - 1s 584us/step - loss: 0.0797 - accuracy: 0.976
Epoch 4/5
1875/1875 [==============================] - 1s 586us/step - loss: 0.0603 - accuracy: 0.981
Epoch 5/5
1875/1875 [==============================] - 1s 600us/step - loss: 0.0464 - accuracy: 0.985
313/313 [==============================] - 0s 408us/step - loss: 0.0805 - accuracy: 0.9758
[0.08051188290119171, 0.9757999777793884]
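The model definition for this exercise is not preserved. The step counts in the log are consistent with Keras's default batch size of 32: the 60,000 MNIST training images give 60000 / 32 = 1875 steps per epoch, and the 10,000 test images give 313 (the last batch is partial). Given the imported Flatten and Dense layers, the NumPy sketch below shows what a Flatten, Dense, Dense stack computes and how its parameters are counted; the hidden width of 128 and the ReLU activation are assumptions, not taken from the lost code.

```python
import numpy as np

def dense_param_count(n_in, n_out):
    """A Dense layer stores an (n_in, n_out) weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images, W1, b1, W2, b2):
    x = images.reshape(images.shape[0], -1)  # Flatten: (N, 28, 28) -> (N, 784)
    h = np.maximum(0.0, x @ W1 + b1)         # Dense(HIDDEN) with ReLU
    return softmax(h @ W2 + b2)              # Dense(10) with softmax: one probability per digit

HIDDEN = 128  # hypothetical hidden-layer width
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (784, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(0, 0.01, (HIDDEN, 10)), np.zeros(10)

probs = forward(rng.random((5, 28, 28)), W1, b1, W2, b2)
total_params = dense_param_count(784, HIDDEN) + dense_param_count(HIDDEN, 10)
print(probs.shape, total_params)  # (5, 10) 101770
```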
Exercise 4
Support vector machine
Code
import idx2numpy
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def load_mnist_labels(filename):
    # Read an IDX-format label file (the format of the original MNIST distribution) into a NumPy array
    return idx2numpy.convert_from_file(filename)
Output
Accuracy: 0.94
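Only the imports and the label loader survive for this exercise. The imports point to a StandardScaler, then PCA, then SVC pipeline. As a sketch of the two preprocessing steps, the code below computes standardization and PCA directly in NumPy on hypothetical random data (the number of components is arbitrary). It mirrors the defaults of scikit-learn's StandardScaler (population standard deviation, scale 1 for constant features) and the `explained_variance_` convention of its PCA (division by n_samples - 1), but it is not a substitute for those classes.

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance columns; constant columns are only centered."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0   # e.g. MNIST border pixels that are always 0
    return (X - mean) / std

def pca_fit_transform(X, n_components):
    """Project centered data onto its leading principal directions using an SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # rows are principal directions, largest variance first
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return Xc @ components.T, components, explained_variance

rng = np.random.default_rng(0)
X = rng.random((200, 20))  # hypothetical data: 200 samples, 20 features
X[:, 0] = 0.0              # one constant feature
Z = standardize(X)
projected, components, explained_variance = pca_fit_transform(Z, n_components=5)
print(projected.shape, explained_variance)
```

The projected features would then be what the SVC is trained on; reducing 784 pixel features to a few dozen components mainly cuts the SVC's training cost.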
Exercise 5
1D convolutional neural network
Code
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy
Output
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d (Conv1D) (None, 98, 64) 256
=================================================================
Total params: 403210 (1.54 MB)
Trainable params: 403210 (1.54 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/10
25/25 [==============================] - 2s 29ms/step - loss: 2.3285 - sparse_categorical_a
Epoch 2/10
25/25 [==============================] - 0s 20ms/step - loss: 2.3042 - sparse_categorical_a
Epoch 3/10
25/25 [==============================] - 0s 20ms/step - loss: 2.2998 - sparse_categorical_a
Epoch 4/10
25/25 [==============================] - 0s 19ms/step - loss: 2.3004 - sparse_categorical_a
Epoch 5/10
25/25 [==============================] - 0s 18ms/step - loss: 2.2965 - sparse_categorical_a
Epoch 6/10
25/25 [==============================] - 0s 19ms/step - loss: 2.3008 - sparse_categorical_a
Epoch 7/10
25/25 [==============================] - 0s 20ms/step - loss: 2.2984 - sparse_categorical_a
Epoch 8/10
25/25 [==============================] - 0s 18ms/step - loss: 2.2977 - sparse_categorical_a
Epoch 9/10
25/25 [==============================] - 0s 19ms/step - loss: 2.2957 - sparse_categorical_a
Epoch 10/10
25/25 [==============================] - 1s 21ms/step - loss: 2.2908 - sparse_categorical_a
<keras.src.callbacks.History at 0x7fa1f120f460>
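The first summary row, `conv1d (Conv1D) (None, 98, 64) 256`, is consistent with single-channel sequences of length 100, 64 filters and kernel size 3 under 'valid' padding: 100 - 3 + 1 = 98 output positions and 3 * 1 * 64 + 64 = 256 parameters. Those sizes are inferred from the summary, not read from the lost model code. A NumPy sketch of the operation under those assumptions:

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """Cross-correlation as in Keras Conv1D with padding='valid' and stride 1.

    x: (length, in_channels); kernels: (kernel_size, in_channels, filters); bias: (filters,)
    Returns an array of shape (length - kernel_size + 1, filters).
    """
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        window = x[t:t + k]  # (kernel_size, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

rng = np.random.default_rng(0)
x = rng.random((100, 1))                # assumed: sequence length 100, one channel
kernels = rng.normal(size=(3, 1, 64))   # assumed: kernel size 3, 64 filters
bias = np.zeros(64)
out = conv1d_valid(x, kernels, bias)
print(out.shape, kernels.size + bias.size)  # (98, 64) 256
```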
Exercise 6
2D convolutional network
Question
6. Implement a 2D convolutional network using the CIFAR-10 dataset for image classification.
Code
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, Flatten

# Load CIFAR-10: 32x32 RGB images in 10 classes
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

unique_values_train = np.unique(y_train)
n_features = X_train.shape[1:]  # (32, 32, 3)
n_classes = len(unique_values_train)

# Normalize pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0

def build_neural_network(n_features, n_classes):
    ## define input layer
    inputs = Input(shape=n_features)
    # print(model.summary())
    return model

# model compile
# CategoricalCrossentropy and CategoricalAccuracy expect one-hot labels, while cifar10 returns
# integer labels (convert with e.g. tf.keras.utils.to_categorical(y_train, n_classes) before fitting)
metric = [tf.keras.metrics.CategoricalAccuracy()]
opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.CategoricalCrossentropy()
network = build_neural_network(n_features, n_classes)
network.compile(loss=loss, optimizer=opt, metrics=metric)
network.summary()
Output
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 32, 32, 3)] 0
=================================================================
Total params: 1209034 (4.61 MB)
Trainable params: 1209034 (4.61 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
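The layers inside `build_neural_network` are not preserved, so the summary above shows only the input layer and the total. To illustrate the two operations the imports point to, here is a NumPy sketch of a stride-1 'valid' Conv2D followed by 2x2 max pooling on one 32x32x3 image. The 32 filters of size 3x3 and the ReLU activation are assumptions: such a layer would hold 3 * 3 * 3 * 32 + 32 = 896 parameters and turn the 32x32 input into a 30x30 map, which pooling halves to 15x15.

```python
import numpy as np

def conv2d_valid(x, kernels, bias):
    """Stride-1, unpadded 2D cross-correlation (Keras Conv2D with padding='valid').

    x: (H, W, C_in); kernels: (kh, kw, C_in, filters); bias: (filters,)
    """
    kh, kw, _, filters = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]  # (kh, kw, C_in)
            out[i, j] = np.tensordot(patch, kernels, axes=3) + bias
    return out

def max_pool2d(x, size=2):
    """Non-overlapping size x size max pooling; leftover rows/columns are dropped."""
    h, w, c = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size]
    return x.reshape(h2, size, w2, size, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # one normalized CIFAR-10-sized image
kernels = rng.normal(size=(3, 3, 3, 32))   # assumed: 32 filters of size 3x3
bias = np.zeros(32)
conv = np.maximum(0.0, conv2d_valid(image, kernels, bias))  # assumed ReLU activation
pooled = max_pool2d(conv)
print(conv.shape, pooled.shape, kernels.size + bias.size)  # (30, 30, 32) (15, 15, 32) 896
```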
Exercise 7
Bias, variance, and the trade-off
Code
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.utils import resample

# Load Fashion MNIST dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Normalize the pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0

# Flatten the images to make them compatible with logistic regression
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Keep two classes for a binary problem (for simplicity): 0 (T-shirt/top) and 1 (Trouser)
binary_indices_train = np.where((y_train == 0) | (y_train == 1))[0]
binary_indices_test = np.where((y_test == 0) | (y_test == 1))[0]
X_train_binary, y_train_binary = X_train[binary_indices_train], y_train[binary_indices_train]
X_test_binary, y_test_binary = X_test[binary_indices_test], y_test[binary_indices_test]

# Convert labels to 0 and 1
y_train_binary = np.where(y_train_binary == 0, 0, 1)
y_test_binary = np.where(y_test_binary == 0, 0, 1)

# Define sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Train logistic regression with per-sample updates and return predicted probabilities for X_test
def train_and_predict(X_train, y_train, X_test):
    m = np.zeros(X_train.shape[1])
    c = 0
    LR = 0.0001
    epochs = 50
    for epoch in range(1, epochs + 1):
        for i in range(len(X_train)):
            # Residual between the label and the predicted probability
            residual = y_train[i] - sigmoid(np.dot(m, X_train[i]) + c)
            gr_wrt_m = X_train[i] * residual
            gr_wrt_c = residual
            # Step along the gradient of the log-likelihood
            m = m + LR * gr_wrt_m
            c = c + LR * gr_wrt_c
    predictions = []
    for i in range(len(X_test)):
        z = np.dot(m, X_test[i]) + c
        y_pred = sigmoid(z)
        predictions.append(y_pred)
    return np.array(predictions)

# Train multiple models on bootstrap resamples and collect their predictions
n_models = 10
all_predictions = []
for _ in range(n_models):
    X_resampled, y_resampled = resample(X_train_binary, y_train_binary)
    predictions = train_and_predict(X_resampled, y_resampled, X_test_binary)
    all_predictions.append(predictions)
all_predictions = np.array(all_predictions)

# Calculate the average prediction
average_prediction = np.mean(all_predictions, axis=0)
# Bias term: squared error of the averaged prediction against the true labels
bias = mean_squared_error(y_test_binary, average_prediction)
# Variance term: spread of the individual models' predictions, averaged over test points
variance = np.mean(np.var(all_predictions, axis=0))
# Output bias and variance
print("Bias:", bias)
print("Variance:", variance)
Output
Bias: 0.014884992836837167
Variance: 8.482621657344636e-05
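Because `np.var` uses ddof=0 by default, the two numbers printed above add up exactly to the squared error of the individual models, averaged over models and test points: for each point, the mean of (y - p_k)^2 over the models equals (y - p_mean)^2 plus the variance of the p_k. A quick numerical check on hypothetical labels and predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=50).astype(float)  # hypothetical 0/1 test labels
all_predictions = rng.random((10, 50))          # hypothetical probabilities from 10 models

average_prediction = all_predictions.mean(axis=0)
bias = np.mean((y - average_prediction) ** 2)         # what mean_squared_error(y, average_prediction) returns
variance = np.mean(np.var(all_predictions, axis=0))    # population variance across models, per point
mean_model_mse = np.mean((all_predictions - y) ** 2)   # average error of the individual models

print(mean_model_mse, bias + variance)
```

Read the other way round, the averaged (bagged) predictor's squared error is lower than that of a typical single model by exactly the variance term.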
Exercise 8
ResNet
Question
8. Build a ResNet model with residual connections and Batch Normalization using the SVHN (Street View House Numbers) dataset.
Code
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation, Add,
                                     GlobalAveragePooling2D, Dense, Flatten)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

AUTOTUNE = tf.data.AUTOTUNE
# Apply the preprocessing function to every example in parallel
train_ds = ds_train.map(preprocess, num_parallel_calls=AUTOTUNE)
test_ds = ds_test.map(preprocess, num_parallel_calls=AUTOTUNE)

    # Identity shortcut: add the block input back before the final activation
    x = Add()([x, inputs])
    x = Activation('relu')(x)
    return x

x = resnet_block(x, 32)
x = resnet_block(x, 32)
x = GlobalAveragePooling2D()(x)
# Flatten is a no-op here: GlobalAveragePooling2D already returns (batch, channels)
x = Flatten()(x)
outputs = Dense(num_classes, activation='softmax')(x)

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

EPOCHS = 5
# Convert the dataset to NumPy arrays
train_images = []
train_labels = []
for image, label in train_ds:
    train_images.append(image.numpy())
    train_labels.append(label.numpy())
Output
Model: "model"
___________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
===========================================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0 []
===========================================================================================
Total params: 699978 (2.67 MB)
Trainable params: 697738 (2.66 MB)
Non-trainable params: 2240 (8.75 KB)
Epoch 1/2
1145/1145 [==============================] - 816s 707ms/step - loss: 0.7138 - accuracy: 0.7
Epoch 2/2
1145/1145 [==============================] - 786s 687ms/step - loss: 0.3181 - accuracy: 0.9
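The opening lines of `resnet_block` are not preserved; only its tail (Add with the block input, then ReLU) survives. The NumPy sketch below shows the usual conv, BatchNorm, ReLU, conv, BatchNorm, add, ReLU pattern of such a block in training mode. To stay short it uses 1x1 convolutions (a per-pixel matrix multiply) instead of the spatial kernels a real block would likely use, and it fixes gamma = 1 and beta = 0; eps = 1e-3 matches the Keras BatchNormalization default. The identity shortcut only works when the block keeps the number of channels unchanged.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for NHWC tensors: per-channel stats over batch, height, width."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_block(x, w1, w2):
    """x: (N, H, W, C); w1, w2: (C, C) weights of hypothetical 1x1 convolutions."""
    channels = x.shape[-1]
    gamma, beta = np.ones(channels), np.zeros(channels)
    h = np.maximum(0.0, batch_norm(x @ w1, gamma, beta))  # conv -> BN -> ReLU
    h = batch_norm(h @ w2, gamma, beta)                   # conv -> BN
    return np.maximum(0.0, h + x)                         # Add (identity shortcut) -> ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8, 32))
out = residual_block(x, rng.normal(size=(32, 32)) * 0.1, rng.normal(size=(32, 32)) * 0.1)
print(out.shape)
```

With both weight matrices at zero the block returns ReLU(x): the shortcut carries the input through unchanged apart from the final activation.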
Exercise 9
Sentiment analysis
Question
9. Implement a transformer for sentiment analysis using the IMDB movie review dataset.
Code
import tensorflow as tf
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding, GlobalAveragePooling1D,
                                     LayerNormalization, MultiHeadAttention, Conv1D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
import numpy as np
import pandas as pd

# Constants
MAXLEN = 200  # Maximum sequence length
NUM_HEADS = 2  # Number of attention heads (reduced for simplicity)
FF_DIM = 128  # Feed-forward dimension in each Transformer block (reduced for simplicity)
NUM_TRANSFORMER_BLOCKS = 2  # Number of Transformer blocks (reduced for simplicity)
VOCAB_SIZE = 5000  # Vocabulary size
EMBED_DIM = 128  # Embedding dimension (reduced for simplicity)

y_train_df = pd.DataFrame({'label': y_train})
x_test_df = pd.DataFrame({'review': x_test})
y_test_df = pd.DataFrame({'label': y_test})
x_train_half = x_train_df.iloc[:train_size]
y_train_half = y_train_df.iloc[:train_size]
x_test_half = x_test_df.iloc[:test_size]
y_test_half = y_test_df.iloc[:test_size]
    # Sinusoidal positional encoding:
    # PE[pos, 2i] = sin(pos / 10000**(2i/d)), PE[pos, 2i+1] = cos(pos / 10000**(2i/d))
    positions = np.arange(maxlen).reshape(-1, 1)
    angle_rates = 1 / 10000**(np.arange(0, embed_dim, 2) / embed_dim)  # one frequency per sin/cos pair
    positional_encoding = np.zeros((maxlen, embed_dim))
    positional_encoding[:, 0::2] = np.sin(positions * angle_rates)
    positional_encoding[:, 1::2] = np.cos(positions * angle_rates)
    x = embedding_layer + positional_encoding
    # Transformer blocks (layer normalization applied before each sub-layer)
    for _ in range(num_transformer_blocks):
        # Multi-head self-attention with a residual connection around it
        x1 = LayerNormalization()(x)
        x2 = MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)(x1, x1)
        x = x + x2
        # Position-wise feed-forward layer (kernel-size-1 convolution) with a residual connection;
        # the addition requires ff_dim == embed_dim (both 128 here)
        x1 = LayerNormalization()(x)
        x2 = Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x1)
        x = x + x2
# Evaluate the model
loss, accuracy = transformer_model.evaluate(np.array(x_test_half_padded), np.array(y_test_half['label']))
print(f"Test Accuracy: {accuracy * 100:.2f}%")

def predict_sentiment(review, model, maxlen):
    # Tokenize and pad the review with the IMDB word index. imdb.load_data shifts word indices
    # by 3 by default (0 = padding, 1 = start, 2 = out-of-vocabulary), so the same shift is applied here.
    review_seq = imdb.get_word_index()
    review_seq = {k: (v + 3) for k, v in review_seq.items()}
    # The word index keys are lowercase, so the review is lowercased before lookup
    tokenized_review = [review_seq[word] if word in review_seq and review_seq[word] < VOCAB_SIZE else 2
                        for word in review.lower().split()]
    padded_review = pad_sequences([tokenized_review], maxlen=maxlen)
    # Predict sentiment
    prediction = model.predict(padded_review)[0, 0]
    sentiment = "positive" if prediction >= 0.5 else "negative"
    confidence = prediction if prediction >= 0.5 else 1 - prediction
    return sentiment, confidence
Output
Model: "model_1"
___________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
===========================================================================================
input_2 (InputLayer) [(None, 200)] 0 []
multi_head_attention_2 (MultiHeadAttention) (None, 200, 128) 66048 ['layer_normalization_4', 'layer_normalization_']
===========================================================================================
Total params: 806273 (3.08 MB)
Trainable params: 806273 (3.08 MB)
Non-trainable params: 0 (0.00 Byte)
___________________________________________________________________________________________
Epoch 1/2
196/196 [==============================] - 147s 733ms/step - loss: 0.6982 - accuracy: 0.498
Epoch 2/2
196/196 [==============================] - 139s 711ms/step - loss: 0.6922 - accuracy: 0.522
196/196 [==============================] - 37s 188ms/step - loss: 0.6852 - accuracy: 0.5230
Test Accuracy: 52.30%
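The `multi_head_attention_2` row in the summary can be checked by hand: with num_heads = 2 and key_dim = embed_dim // num_heads = 64 on 128-dimensional inputs, Keras's MultiHeadAttention holds query, key and value projections of 128 * 2 * 64 weights plus 2 * 64 biases each, and an output projection of 2 * 64 * 128 weights plus 128 biases, 66,048 in total. The NumPy sketch below shows the computation those weights take part in, scaled dot-product self-attention split across heads, for a single sequence; the random weights and the omitted biases are simplifications.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """x: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model). Biases omitted."""
    seq_len, d_model = x.shape
    key_dim = d_model // num_heads

    def split_heads(t):  # (seq_len, d_model) -> (num_heads, seq_len, key_dim)
        return t.reshape(seq_len, num_heads, key_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(key_dim)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ v                                    # (num_heads, seq_len, key_dim)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo, weights

def mha_param_count(d_model, num_heads, key_dim):
    """Query/key/value projections plus the output projection, each with biases."""
    qkv = 3 * (d_model * num_heads * key_dim + num_heads * key_dim)
    out = num_heads * key_dim * d_model + d_model
    return qkv + out

rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 128, 2, 10
Wq, Wk, Wv, Wo = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
x = rng.normal(size=(seq_len, d_model))
output, weights = multi_head_self_attention(x, Wq, Wk, Wv, Wo, num_heads)
print(output.shape, weights.shape, mha_param_count(128, 2, 64))
```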
Exercise 10
Markov decision processes
Code
import numpy as np

# Define actions
actions = ['up', 'down', 'left', 'right']

# Define rewards
rewards = np.zeros((grid_size, grid_size))
rewards[3, 3] = 1  # Reward for reaching the goal state

    # Value iteration: sweep all states until the largest change falls below theta
    while True:
        delta = 0
        for x in range(grid_size):
            for y in range(grid_size):
                if (x, y) == (3, 3):  # Skip the terminal state
                    continue
                v = value_table[x, y]
                q_values = []
                for i, action in enumerate(actions):
                    (next_x, next_y) = transition_probs[action](x, y)
                    q_value = rewards[x, y] + gamma * value_table[next_x, next_y]
                    q_values.append(q_value)
                value_table[x, y] = max(q_values)
                policy[x, y] = np.argmax(q_values)
                delta = max(delta, abs(v - value_table[x, y]))
        if delta < theta:
            break
    return policy, value_table

for x in range(policy.shape[0]):
    for y in range(policy.shape[1]):
        if (x, y) == (3, 3):
            policy_arrows[x, y] = 'G'  # Goal state
        else:
            policy_arrows[x, y] = actions[policy[x, y]][0].upper()
for row in policy_arrows:
    print(' '.join(row))
Output
Optimal Policy:
U U U U
U U U U
U U U U
U U U G
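In the printed policy every non-goal cell shows U. Assuming `value_table` and `policy` start as zero arrays (their initialization is not shown), this follows from the update above: `rewards[x, y]` is the reward of the current cell, which is 0 for every cell that is not skipped, so every Q-value is 0 + gamma * 0, delta stays 0, the loop stops after one sweep, and `np.argmax` over four equal values returns index 0, 'up'. The sketch below instead credits the reward of the cell being entered, `rewards[next_x, next_y]`. The `move` function, the (row, column) convention, gamma = 0.9 and theta = 1e-6 are assumptions standing in for the lost `transition_probs`, `gamma` and `theta`.

```python
import numpy as np

GRID_SIZE = 4
GOAL = (3, 3)
ACTIONS = ['up', 'down', 'left', 'right']
# Hypothetical deterministic moves on (row, column) coordinates
MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def move(x, y, action):
    dx, dy = MOVES[action]
    # Bumping into the edge of the grid leaves the agent where it is
    return min(max(x + dx, 0), GRID_SIZE - 1), min(max(y + dy, 0), GRID_SIZE - 1)

def value_iteration(gamma=0.9, theta=1e-6):
    rewards = np.zeros((GRID_SIZE, GRID_SIZE))
    rewards[GOAL] = 1.0
    value_table = np.zeros((GRID_SIZE, GRID_SIZE))
    policy = np.zeros((GRID_SIZE, GRID_SIZE), dtype=int)
    while True:
        delta = 0.0
        for x in range(GRID_SIZE):
            for y in range(GRID_SIZE):
                if (x, y) == GOAL:  # terminal state keeps value 0
                    continue
                v = value_table[x, y]
                q_values = []
                for action in ACTIONS:
                    next_x, next_y = move(x, y, action)
                    # Credit the reward of the state being entered, not the current one
                    q_values.append(rewards[next_x, next_y] + gamma * value_table[next_x, next_y])
                value_table[x, y] = max(q_values)
                policy[x, y] = int(np.argmax(q_values))
                delta = max(delta, abs(v - value_table[x, y]))
        if delta < theta:
            return policy, value_table

policy, values = value_iteration()
for x in range(GRID_SIZE):
    print(' '.join('G' if (x, y) == GOAL else ACTIONS[policy[x, y]][0].upper()
                   for y in range(GRID_SIZE)))
```

With this change a cell's value shrinks by a factor of gamma per step of distance from the goal, and the greedy policy points down or right toward (3, 3); ties between equally short routes go to the action listed first in ACTIONS.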
Exercise 11
Q-learning algorithm
Question
11. Implement a Q-learning algorithm to solve a tabular reinforcement learning problem using an OpenAI Gym environment.
Code
import gym
import numpy as np
import random

# Q-learning parameters
alpha = 0.1  # Learning rate
gamma = 0.99  # Discount factor
epsilon = 1.0  # Exploration rate
epsilon_min = 0.1  # Minimum exploration rate
epsilon_decay = 0.995  # Decay rate for exploration probability

# Training parameters
num_episodes = 1000
max_steps_per_episode = 100

# Q-learning algorithm
for episode in range(num_episodes):
    # Note: gym >= 0.26 returns (observation, info) from reset(); older versions return only the observation
    state = env.reset()
    done = False
    step = 0
    total_reward = 0
        state = next_state
        step += 1
        total_reward += reward
    if (episode + 1) % 100 == 0:
        print(f'Episode {episode + 1}/{num_episodes} - Total reward: {total_reward} - Epsilon: {epsilon}')
        print(f'Q-table snapshot:\n{q_table}')

    total_rewards += episode_reward
env.close()
Output
Episode 100/1000 - Total reward: 0.0 - Epsilon: 0.6057704364907278
Q-table snapshot:
[[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0.1 0. ]
[0. 0. 0. 0. ]]
Episode 200/1000 - Total reward: 0.0 - Epsilon: 0.3669578217261671
Q-table snapshot:
[[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0.1 0. ]
[0. 0. 0. 0. ]]
Episode 300/1000 - Total reward: 1.0 - Epsilon: 0.22229219984074702
Q-table snapshot:
[[6.02547296e-07 1.51495463e-04 0.00000000e+00 2.20400503e-06]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 1.53648100e-03 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 1.03181644e-02 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 4.51467432e-02 0.00000000e+00]
[0.00000000e+00 2.18172292e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 5.69532790e-01 1.83176054e-02]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 400/1000 - Total reward: 1.0 - Epsilon: 0.1346580429260134
Q-table snapshot:
[[9.84775773e-02 8.83934279e-01 9.59888291e-04 2.37303204e-01]
[7.78714740e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[1.94936243e-01 9.28005608e-01 0.00000000e+00 1.80659948e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[8.42392398e-02 0.00000000e+00 9.58402468e-01 1.65558509e-01]
[3.00984444e-01 4.63133002e-02 9.76380016e-01 0.00000000e+00]
[8.96077280e-02 9.89357301e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 4.42146309e-02 5.00856645e-01 0.00000000e+00]
[1.05385835e-01 1.87311993e-01 9.99923823e-01 4.49784855e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 500/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[2.57967746e-01 9.50917648e-01 9.59888291e-04 4.25477197e-01]
[7.78714740e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[2.69194204e-01 9.60574965e-01 0.00000000e+00 3.84676614e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[1.71747383e-01 0.00000000e+00 9.70293865e-01 2.43452026e-01]
[4.79013518e-01 1.45283853e-01 9.80098982e-01 0.00000000e+00]
[5.02068648e-01 9.89999839e-01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 4.42146309e-02 6.33413939e-01 9.70102057e-02]
[1.84416127e-01 4.04834911e-01 9.99999991e-01 5.02815773e-01]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
Episode 600/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.44319831 0.95099002 0.12310583 0.523514 ]
[0.48251959 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.45396019 0.960596 0. 0.49046894]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.3855252 0. 0.970299 0.57497446]
[0.47901352 0.3847749 0.9801 0. ]
[0.59103233 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.80049555 0.09701021]
[0.27840679 0.51601628 1. 0.59349976]
[0. 0. 0. 0. ]]
Episode 700/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.53787186 0.95099005 0.30532645 0.66725256]
[0.67046853 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.50366317 0.96059601 0. 0.57616107]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.58329205 0. 0.970299 0.67687469]
[0.57051419 0.47573037 0.9801 0. ]
[0.663093 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.88928956 0.09701021]
[0.46501218 0.51601628 1. 0.69826843]
[0. 0. 0. 0. ]]
Episode 800/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.61455743 0.95099005 0.47396598 0.76155943]
[0.7814505 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.62488876 0.96059601 0. 0.6455717 ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.6549798 0. 0.970299 0.75115995]
[0.77402109 0.47573037 0.9801 0. ]
[0.74634583 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.89936061 0.09701021]
[0.50655063 0.56341465 1. 0.72645158]
[0. 0. 0. 0. ]]
Episode 900/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.67667275 0.95099005 0.60843455 0.79574437]
[0.83648469 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.71326221 0.96059601 0. 0.72576289]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.71304688 0. 0.970299 0.78912767]
[0.82458289 0.51719403 0.9801 0. ]
[0.80703714 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.92392388 0.09701021]
[0.58198826 0.60607319 1. 0.81368127]
[0. 0. 0. 0. ]]
Episode 1000/1000 - Total reward: 1.0 - Epsilon: 0.0996820918179746
Q-table snapshot:
[[0.74843555 0.95099005 0.65121421 0.82343416]
[0.85643383 0. 0. 0. ]
[0.02395337 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.7584305 0.96059601 0. 0.83830325]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.82903825 0. 0.970299 0.84479214]
[0.85042538 0.62833521 0.9801 0. ]
[0.8738945 0.99 0. 0. ]
[0. 0. 0. 0. ]
[0. 0. 0. 0. ]
[0. 0.04421463 0.95488444 0.09701021]
[0.67685248 0.76329506 1. 0.81368127]
[0. 0. 0. 0. ]]
Average reward over 100 evaluation episodes: 1.0
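The action-selection and update steps of the training loop are not preserved in this copy. The sketch below shows a standard tabular Q-learning loop with epsilon-greedy exploration and the same hyperparameter values (alpha = 0.1, gamma = 0.99, epsilon decaying from 1.0 by 0.995 per episode). It runs on a hypothetical five-state corridor written in plain Python instead of a Gym environment, so it needs no gym installation. The corridor, its reward of 1 for reaching the last state, the use of max() to floor epsilon, and the seed are all assumptions.

```python
import numpy as np

# Hypothetical corridor: states 0..4, start at 0; reaching state 4 gives reward 1 and ends the episode
N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # actions: 0 = left, 1 = right

def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def q_learning(num_episodes=1000, max_steps=100, alpha=0.1, gamma=0.99,
               epsilon=1.0, epsilon_min=0.1, epsilon_decay=0.995, seed=0):
    rng = np.random.default_rng(seed)
    q_table = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(num_episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise act greedily
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done = env_step(state, action)
            # TD target bootstraps from the best next action unless the episode has ended
            target = reward if done else reward + gamma * np.max(q_table[next_state])
            q_table[state, action] += alpha * (target - q_table[state, action])
            state = next_state
            if done:
                break
        epsilon = max(epsilon_min, epsilon * epsilon_decay)  # decay once per episode
    return q_table

q_table = q_learning()
print(np.round(q_table, 3))
print("Greedy action per non-terminal state:", np.argmax(q_table[:GOAL], axis=1))
```

In the resulting table Q[s, right] approaches gamma ** (3 - s) for s = 0..3, so the greedy action is 'right' in every non-terminal state, while the terminal row is never updated and stays at zero.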