Experiment No-01
Aim: Implementation of a perceptron from scratch
Objective: To study the structure, contents, and working principle of the basic
perceptron.
Perceptron Neural Networks
Rosenblatt created many variations of the perceptron. One of the simplest was a
single-layer network whose weights and biases could be trained to produce a
correct target vector when presented with the corresponding input vector. The
training technique used is called the perceptron learning rule. The perceptron
generated great interest due to its ability to generalize from its training vectors
and learn from initially randomly distributed connections.
Perceptrons are especially suited for simple problems in pattern classification.
They are fast and reliable networks for the problems they can solve. In addition,
an understanding of the operations of the perceptron provides a good basis for
understanding more complex networks.
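The perceptron learning rule described above can be sketched in plain Python. This is a minimal illustration, not Rosenblatt's original formulation: the toy data (the logical AND of two inputs), the learning rate of 1.0, the zero initial weights, and the function names are all assumptions chosen for the example.

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the inputs plus the bias, passed through the step function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)


def train_perceptron(samples, targets, lr=1.0, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - output) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(samples, targets):
            error = t - predict(weights, bias, x)
            if error != 0:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                errors += 1
        if errors == 0:  # every sample is classified correctly, so stop
            break
    return weights, bias


# Toy data (assumed for illustration): logical AND, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, x) for x in X]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

Because AND is linearly separable, the rule stops updating once all four samples are classified correctly. A problem that is not linearly separable, such as XOR, would never reach zero errors with a single perceptron.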
Activation functions
Activation functions are mathematical functions used in perceptrons to determine
the output for a given input. An activation function decides whether the
neuron (perceptron) should be activated or not. It takes in the weighted sum of
the input data, called the activation, and produces an output that can be used
for prediction.
Activation functions are an essential part of Perceptrons and neural networks
because they allow the model to learn and make decisions based on the input
data. They also help to introduce non-linearity into the model, which is necessary
for learning more complex relationships in the data.
Some common types of activation functions used in perceptrons are the sign
function, the Heaviside step function, the sigmoid function, and the ReLU function.
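The four activation functions named above can be written in a few lines of plain Python. This is a sketch for illustration; the treatment of z = 0 in the sign and Heaviside functions is a convention assumed here, and other texts choose differently.

```python
import math


def sign(z):
    """Sign function: +1 for z >= 0, -1 otherwise (z = 0 mapped to +1 by assumption)."""
    return 1 if z >= 0 else -1


def heaviside(z):
    """Heaviside step function: 1 for z >= 0, 0 otherwise (assumed convention at 0)."""
    return 1 if z >= 0 else 0


def sigmoid(z):
    """Sigmoid: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """ReLU: passes positive values through and clips negative values to 0."""
    return max(0.0, z)


# Apply every function to a few sample activations.
for z in (-2.0, 0.0, 2.0):
    print(z, sign(z), heaviside(z), round(sigmoid(z), 4), relu(z))
```

The sign and Heaviside functions give hard yes/no outputs, which suits the classic perceptron. Sigmoid and ReLU are the usual choices when the network is trained with gradient-based methods.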
Implementing Perceptron in Python
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
%matplotlib inline
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
len(x_train)
len(x_test)
x_train[0].shape
plt.matshow(x_train[0])
# Normalizing the dataset
x_train = x_train/255
x_test = x_test/255
# Flatten each 28x28 image into a 784-element vector
# so it can be fed to a dense layer
x_train_flatten = x_train.reshape(len(x_train), 28*28)
x_test_flatten = x_test.reshape(len(x_test), 28*28)
# Single dense layer: 10 output neurons, one per digit class
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(784,),
                       activation='sigmoid')
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(x_train_flatten, y_train, epochs=5)
model.evaluate(x_test_flatten, y_test)
Results:
Experiment No-02
Aim:
Theory:
A Multilayer Perceptron (MLP) is an extension of the basic perceptron
that can handle more complex, non-linear data by using multiple layers
of neurons. It forms the backbone of many modern deep learning
models and is often called a fully connected neural network because
each neuron in one layer is connected to every neuron in the next
layer.
Structure of a Multilayer Perceptron (MLP)
1. Input Layer: The initial layer where data features are fed into the
network. The number of neurons in this layer corresponds to the
number of features in the input data.
2. Hidden Layers: One or more layers between the input and output
layers that allow the network to learn complex patterns. Each
hidden layer applies a linear transformation followed by a non-
linear activation function, such as ReLU.
3. Output Layer: The final layer that outputs predictions. In a
classification task, this layer has one neuron per class with a
softmax activation function (for multi-class classification) or
sigmoid activation (for binary classification). For regression, it
usually has a single neuron without activation.
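The three-layer structure above can be traced with a forward pass written in plain Python. This is only a sketch: the layer sizes (4 input features, 5 hidden neurons, 3 classes), the random weights, and the sample values are assumptions for illustration, and no training is performed.

```python
import math
import random


def dense(inputs, weights, biases):
    """Linear transformation: one weighted sum plus bias per neuron in the layer."""
    return [sum(w * x for w, x in zip(neuron_w, inputs)) + b
            for neuron_w, b in zip(weights, biases)]


def relu(values):
    """Non-linear activation used in the hidden layer."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into class probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (an arbitrary choice for the demo)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
hidden_w, hidden_b = init_layer(4, 5, rng)  # input layer (4 features) -> hidden layer
output_w, output_b = init_layer(5, 3, rng)  # hidden layer -> output layer (3 classes)

features = [0.5, -1.2, 3.0, 0.1]  # one sample with 4 features
hidden = relu(dense(features, hidden_w, hidden_b))
probabilities = softmax(dense(hidden, output_w, output_b))
print(probabilities)
```

For a regression task, the final softmax would be dropped and the output layer reduced to a single neuron, as described in item 3.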
Key Hyperparameters in MLP
Training an MLP involves setting various hyperparameters that impact
the network’s performance. These hyperparameters control aspects of
model architecture, training dynamics, and regularization.
1. Number of Hidden Layers and Neurons per Layer
Definition: The depth (number of layers) and width (number of
neurons per layer) of the network.
Tuning Impact: More hidden layers and neurons generally allow
the network to learn more complex functions, but they can also lead
to overfitting if the network becomes too large for the available
data.
Tuning Strategy: Start with a small network (e.g., 1–2 hidden
layers with a moderate number of neurons) and increase
complexity as needed.
2. Activation Functions
Common Choices: ReLU (Rectified Linear Unit) for hidden layers,
Softmax for multi-class output, and Sigmoid for binary output.
Impact on Training: Activation functions affect how information
flows through the network. ReLU is often used because it
mitigates the vanishing gradient problem and speeds up training.
Tuning Strategy: Generally, ReLU is a good default for hidden
layers, but other options like tanh may sometimes be tested to
see if they improve performance.
3. Learning Rate
Definition: Controls how much the model’s weights are adjusted
with each step of training.
Impact on Training: A high learning rate can lead to faster
convergence but may overshoot optimal values. A low learning
rate is more stable but may lead to slow convergence.
Tuning Strategy: Use learning rate schedules or adaptive
optimizers (e.g., Adam, which adapts the learning rate). Start with
a learning rate around 0.001, then fine-tune by increasing or
decreasing as needed.
4. Batch Size
Definition: The number of training samples processed before
updating model weights.
Impact on Training: Larger batch sizes offer smoother gradients
but require more memory. Smaller batch sizes may introduce
noise, potentially helping generalization.
Tuning Strategy: Common batch sizes are powers of 2 (e.g., 32,
64, 128). Try several values to find the best balance between
training stability and performance.
5. Number of Epochs
Definition: Number of complete passes through the entire training
dataset.
Impact on Training: Too few epochs can lead to underfitting, while
too many can lead to overfitting.
Tuning Strategy: Use early stopping to prevent overfitting and
dynamically set the optimal number of epochs based on validation
performance.
6. Regularization Techniques
Dropout: Randomly drops neurons during training, reducing
overfitting. Common values are 0.2 to 0.5 for dropout rate.
L2 Regularization (Weight Decay): Penalizes large weights by
adding a regularization term to the loss function, helping to
prevent overfitting.
Tuning Strategy: Test different dropout rates and regularization
strengths. Dropout is especially effective for large networks with
many layers.
7. Optimizer Choice
Popular Choices: Adam (adaptive learning rate), SGD (Stochastic
Gradient Descent), RMSprop.
Impact on Training: The choice of optimizer affects convergence
speed and stability. Adam is widely used because of its adaptive
learning rates.
Tuning Strategy: Start with Adam, but for some tasks, SGD with
momentum can improve performance.
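Of the hyperparameters listed above, the learning rate is the easiest to demonstrate in isolation. The sketch below runs plain gradient descent on a one-dimensional toy loss, f(w) = (w - 3)^2, with three learning rates. The loss function, the step count, and the three rates are assumptions chosen for illustration and are not taken from any real training run.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)         # derivative of (w - 3)^2
        w -= learning_rate * gradient  # the learning rate scales every update
    return w


# Compare a low, a moderate, and a too-high learning rate on the same problem.
for lr in (0.001, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"learning rate {lr}: final w = {w:.4f}, "
          f"distance from optimum = {abs(w - 3):.4f}")
```

With the low rate, 50 steps are not enough to get close to the optimum at w = 3. The moderate rate converges, and the high rate overshoots further on every step and diverges. Real loss surfaces are far less regular, so this only illustrates the trade-off described in item 3.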
Results
Experiment No-03
Aim: Hyperparameter Tuning using Grid Search and Random Search
Objective: To find the combination of hyperparameter values that gives the best
model performance on a given task by systematically evaluating candidate
combinations. Grid Search exhaustively tests every combination in a predefined
grid, while Random Search evaluates a random sample of combinations.
Theory
Grid search
Grid search is the simplest algorithm for hyperparameter tuning. We divide the
domain of the hyperparameters into a discrete grid, then try every combination
of values in this grid, computing a performance metric for each one using
cross-validation. The grid point that maximizes the average cross-validation
score is selected as the best combination of hyperparameter values.
Example of a grid search
Grid search is an exhaustive algorithm that covers every combination, so it is
guaranteed to find the best point in the grid (though not necessarily the best
point in the continuous domain between grid values). Its main drawback is that
it is very slow: the number of combinations grows multiplicatively with each
added hyperparameter, and every point in the grid needs k-fold cross-validation,
which requires k training runs. Tuning a model this way can therefore be
computationally expensive. However, when the grid is small enough to afford it,
grid search is a very good choice.
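The exhaustive loop behind grid search can be sketched with the standard library alone. In this sketch, validation_score is a hypothetical stand-in for a real cross-validated score, and the grid values and the fold count are assumptions for illustration.

```python
from itertools import product


def validation_score(learning_rate, hidden_units):
    """Hypothetical stand-in for an average cross-validation score (higher is better)."""
    return -((learning_rate - 0.01) ** 2) * 1000 - ((hidden_units - 64) ** 2) / 1000


param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [32, 64, 128],
}

names = list(param_grid)
best_params, best_score = None, float("-inf")
n_evaluated = 0

# product() yields every combination of the listed values: 3 x 3 = 9 grid points.
for values in product(*param_grid.values()):
    params = dict(zip(names, values))
    score = validation_score(**params)
    n_evaluated += 1
    if score > best_score:
        best_params, best_score = params, score

k_folds = 5  # assumed number of cross-validation folds
print("grid points evaluated:", n_evaluated)
print("training runs with k-fold CV:", n_evaluated * k_folds)
print("best parameters:", best_params)
```

Adding a third hyperparameter with three values would triple the number of grid points, which is why the cost of grid search rises so quickly.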
Random search
Random search is similar to grid search, but instead of using all the points in the
grid, it tests only a randomly selected subset of these points. The smaller this
subset, the faster but less accurate the optimization. The larger this subset, the
more accurate the optimization, but the closer its cost gets to that of a full grid
search.
Example of random search
Random search is a very useful option when you have several hyperparameters
with a fine-grained grid of values. Using a subset of 5 to 100 randomly
selected points, we can usually get a reasonably good set of hyperparameter
values. It is unlikely to be the best point, but it can still be a set of values
that gives us a good model.
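Random search over the same kind of grid can be sketched by sampling a fixed number of points instead of visiting all of them. As before, validation_score is a hypothetical stand-in for a real cross-validated score, and the grid values, the sample size of 10, and the seed are assumptions for illustration.

```python
import random
from itertools import product


def validation_score(learning_rate, hidden_units, batch_size):
    """Hypothetical stand-in for an average cross-validation score (higher is better)."""
    return (-abs(learning_rate - 0.01) * 100
            - abs(hidden_units - 64) / 100
            - abs(batch_size - 64) / 1000)


param_grid = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "hidden_units": [16, 32, 64, 128, 256],
    "batch_size": [32, 64, 128],
}

# The full grid has 4 x 5 x 3 = 60 points.
all_points = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]

rng = random.Random(42)                 # fixed seed so the run is repeatable
sampled = rng.sample(all_points, k=10)  # evaluate only 10 of the 60 points

best_params = max(sampled, key=lambda p: validation_score(**p))
print("points in full grid:", len(all_points))
print("points evaluated:", len(sampled))
print("best sampled parameters:", best_params)
```

The best sampled point is not guaranteed to match the best point of the full grid, which is the trade-off described above. scikit-learn's ParameterSampler provides comparable sampling and also accepts continuous distributions.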
Code:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam, SGD
from sklearn.model_selection import ParameterGrid, ParameterSampler
import numpy as np
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize the data
x_train = x_train / 255.0
x_test = x_test / 255.0
# Convert labels to categorical
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
def build_model(hidden_layers, hidden_units, activation, optimizer):
    """Build and compile an MLP with the given number of hidden layers."""
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28)))
    for _ in range(hidden_layers):
        model.add(Dense(hidden_units, activation=activation))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=optimizer, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# Define parameter grid
param_grid = {
    'hidden_layers': [1, 2],
    'hidden_units': [32, 64],
    'activation': ['relu', 'tanh'],
    'optimizer': ['adam', 'sgd'],
    'batch_size': [32, 64],
    'epochs': [5, 10]
}
grid = ParameterGrid(param_grid)
best_model = None
best_accuracy = 0
# Perform Grid Search
# Note: hyperparameters are selected on test accuracy here; a separate
# validation split would give a less biased estimate.
for params in grid:
    print(f"Testing params: {params}")
    model = build_model(params['hidden_layers'], params['hidden_units'],
                        params['activation'], params['optimizer'])
    model.fit(x_train, y_train, batch_size=params['batch_size'],
              epochs=params['epochs'], verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"Accuracy: {accuracy}")
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model = model
print(f"Best Accuracy: {best_accuracy}")
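# Grid search with scikit-learn's GridSearchCV on a random forest.
# X and y are assumed to be a feature matrix and label vector defined beforehand.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
rf = RandomForestClassifier()
grid_space = {
    'max_depth': [3, 5, 10, None],
    'n_estimators': [10, 100, 200],
    'max_features': [1, 3, 5, 7],
    'min_samples_leaf': [1, 2, 3],
    # An integer min_samples_split must be at least 2, so the value 1 is dropped
    'min_samples_split': [2, 3]
}
grid = GridSearchCV(rf, param_grid=grid_space, cv=3, scoring='accuracy')
model_grid = grid.fit(X, y)
print('Best hyperparameters are: ' + str(model_grid.best_params_))
print('Best score is: ' + str(model_grid.best_score_))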
Challenging Experiment 01
Aim: Implementation and Performance Evaluation of a Multi-Layer Perceptron
(MLP) for Cats and Dogs Image Recognition
Experiment No-04
Aim: MLP Implementation using Keras on the Fashion MNIST Dataset
Objective: To train a Multilayer Perceptron (MLP) on the Fashion MNIST dataset so
that it accurately classifies grayscale images of clothing items from different
categories (like t-shirts, pants, dresses) by predicting the correct clothing class
for each image.
Theory:
Code:
import keras
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline
keras.backend.backend()
fm = keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fm.load_data()
X_train.shape
X_test.shape
X_train[0]
y_train[0]
plt.matshow(X_train[0])
# Normalize training data before training the neural net
X_train = X_train/255
X_test = X_test/255
# Now build the Sequential model and add layers to it
from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation
model = Sequential()
model.add(Flatten(input_shape=[28, 28]))
model.add(Dense(100, activation="relu"))
model.add(Dense(10, activation="softmax"))
# Figure: fashion_neural_net.png
model.summary()
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train, y_train)
model.evaluate(X_test, y_test)
# The output above shows a test accuracy of 82.76%.
# The first value returned by evaluate() is the loss; the second is the accuracy.
plt.matshow(X_test[0])
yp = model.predict(X_test)
np.argmax(yp[0])
class_labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
class_labels[np.argmax(yp[0])]
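The last two steps above (taking the argmax of the softmax output and looking up the class name) can be traced by hand with a small standalone sketch. The probability vector here is a made-up example, not real model output, and predicted_label is a hypothetical helper name.

```python
class_labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def predicted_label(probabilities):
    """Return the class name whose probability is highest (argmax over the vector)."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_labels[index]


# Hypothetical softmax output for one image: ten probabilities, one per class.
example_output = [0.01, 0.02, 0.01, 0.03, 0.02, 0.05, 0.01, 0.10, 0.05, 0.70]
print(predicted_label(example_output))
```

In the experiment itself, np.argmax(yp[0]) performs the same index lookup on the first row of the model's predictions.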
Results: